Yeah. Trying to define music is essentially pointless. Defining speech would be easier.
However, since you mentioned neural nets, you could try to train a net on speech and music (there's an ANN object out there that I've tested) and see what happens. That would be a fun experiment, no idea how well it would work... you'd need a pretty huge training set for it to be even remotely accurate (I would think).

--t3db0t

On Feb 7, 2011, at 1:38 PM, Pedro Lopes wrote:

> First of all, I would take it from another angle:
>
> <this is one possible way, out of zillions>
> if it is speech or not. Thus if the speech recognizer has X % of recognition
> rate, you inherit that percentage. Now you heavily depend on the recognizer;
> some recognizers, like the default Windows one, always try to match the input
> to some string, so they are a bit of garbage in academic terms. What you need
> is a strong open recognizer that can tell you how similar (in %) the sentence
> is to a target sentence in a database.
>
> Why do I suggest this angle?
> - 'Cause I don't wanna think "what is music". Speech is a language, it is
> defined, it is easily structured. Music? Noise is music, drone is music,
> ambient can be non-rhythmical, what about a cappella singing? Will it be
> music? And all those inherited philosophical issues. Furthermore, if you need
> more help, maybe explaining the context will aid us, because if you only care
> about certain "music" it can be easier. ALSO: if you have access to the audio
> data, you can always extract (filter) the music.
>
> </this is one possible way, out of zillions>
>
> best,
> pedro
>
>
> On Mon, Feb 7, 2011 at 5:43 PM, patrick <[email protected]> wrote:
> would it be possible to detect if the incoming audio is music or speech? i
> guess it's very hard, but i was thinking about some methods:
>
> using some kind of frequency detection
> using bonk (if the tempo is stable = music)
> env~ (most music is compressed nowadays)
> training a voice (using neural network?!?)
>
> From the author of aubio:
> Use a few low level features, such as energy of low and high frequency
> bands, spectral spread. In a second step, these approaches are often refined
> using machine learning techniques such as bayesian networks or support vector
> machines.
>
> See for instance these papers:
> http://cobweb.ecn.purdue.edu/~malcolm/interval/1996-085/
> http://www.aclweb.org/anthology/O/O08/O08-1015.pdf
> http://www.hindawi.com/journals/asp/2009/628570.html
>
> i would like to achieve > 90% accuracy if possible. any suggestions are
> welcome!
>
> _______________________________________________
> [email protected] mailing list
> UNSUBSCRIBE and account-management ->
> http://lists.puredata.info/listinfo/pd-list
>
>
> --
> Pedro Lopes (MSc)
> contact: [email protected]
> website: http://web.ist.utl.pt/Pedro.Lopes /
> http://pedrolopesresearch.wordpress.com/ | http://twitter.com/plopesresearch
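
For what it's worth, the low-level features from the aubio author's suggestion quoted above (energy in low vs. high frequency bands, spectral spread) can be sketched offline in Python with NumPy before building anything in Pd. This is only a rough sketch under my own assumptions: the 1000 Hz band split, the Hann window and the function names are all made up for illustration, and it computes features only, it does not classify anything. You would still have to feed these numbers to whatever learner you pick.

```python
import numpy as np

def power_spectrum(frame, sr):
    """Return (freqs, power) for one Hann-windowed frame of samples."""
    windowed = frame * np.hanning(len(frame))
    power = np.abs(np.fft.rfft(windowed)) ** 2
    freqs = np.fft.rfftfreq(len(frame), d=1.0 / sr)
    return freqs, power

def low_band_energy_ratio(frame, sr, split_hz=1000.0):
    """Fraction of spectral energy below split_hz (assumed split point)."""
    freqs, power = power_spectrum(frame, sr)
    total = power.sum()
    if total == 0.0:
        return 0.0  # silent frame: no energy in any band
    return float(power[freqs < split_hz].sum() / total)

def spectral_spread(frame, sr):
    """Power-weighted standard deviation of frequency around the centroid, in Hz."""
    freqs, power = power_spectrum(frame, sr)
    total = power.sum()
    if total == 0.0:
        return 0.0
    centroid = (freqs * power).sum() / total
    variance = (((freqs - centroid) ** 2) * power).sum() / total
    return float(np.sqrt(variance))
```

Run these per frame (say 2048 samples) over labelled speech and music recordings and look at how the values are distributed over time; that would at least show whether the features separate your material before you commit to a training setup.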
