Re: Statistics Tool For Classification/Clustering

2002-02-27 Thread Mark Harrison

Good places to start:

Optimal feature extractors. These are better than PCA because you whiten your
inter class scatter and so put all inter class comparisons on the same
level. Better still, this also reduces your feature vector
dimensionality to c-1 (where c is the number of classes). PCA will not do this.
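A minimal NumPy sketch of one such extractor follows. It assumes that "optimal feature extractor" here means Fisher's linear discriminant, which matches the c-1 dimensionality. The function name `fisher_lda` and the small `ridge` term (added so the within-class scatter stays invertible when there are hundreds of variables) are illustrative assumptions, not part of the post. The sketch whitens the within-class scatter and keeps the leading c-1 discriminant directions.

```python
import numpy as np

def fisher_lda(X, y, ridge=1e-6):
    """Fisher's linear discriminant (hypothetical helper, not from the post).

    X is an (n_files x n_variables) array, y holds the class label of each row.
    Returns (W, Z): W maps centred data onto at most c-1 discriminant axes and
    Z = (X - overall mean) @ W are the projected files.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    classes = np.unique(y)
    overall_mean = X.mean(axis=0)
    d = X.shape[1]
    S_w = np.zeros((d, d))  # within-class (intra class) scatter
    S_b = np.zeros((d, d))  # between-class (inter class) scatter
    for c in classes:
        Xc = X[y == c]
        mc = Xc.mean(axis=0)
        S_w += (Xc - mc).T @ (Xc - mc)
        diff = (mc - overall_mean)[:, None]
        S_b += len(Xc) * (diff @ diff.T)
    S_w += ridge * np.eye(d)  # assumption: tiny ridge keeps S_w invertible
    # Whitening step: with S_w = L L^T, data transformed by L^-1 have identity
    # within-class scatter; then diagonalise the transformed between-class scatter.
    L_inv = np.linalg.inv(np.linalg.cholesky(S_w))
    evals, evecs = np.linalg.eigh(L_inv @ S_b @ L_inv.T)
    keep = np.argsort(evals)[::-1][: len(classes) - 1]  # largest c-1 eigenvalues
    W = L_inv.T @ evecs[:, keep]
    return W, (X - overall_mean) @ W
```

Usage: `W, Z = fisher_lda(X, labels)` with X as a (files x variables) array gives, in Z, c-1 coordinates in which the within-class scatter is the identity matrix.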

Check the stats of each class: is it Gaussian, or some other known pdf? If so,
apply a parametric classifier.
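If each class does look roughly Gaussian, a simple parametric classifier fits one mean and covariance per class and assigns each file to the class with the highest log posterior (a quadratic discriminant). The sketch below assumes multivariate normal classes with priors estimated from class frequencies; the function names and the `ridge` term are hypothetical.

```python
import numpy as np

def fit_gaussian_classes(X, y, ridge=1e-6):
    """Fit a multivariate normal (mean, covariance) and a prior to each class."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    model = {}
    for c in np.unique(y):
        Xc = X[y == c]
        # assumption: a tiny ridge keeps each covariance matrix invertible
        cov = np.cov(Xc, rowvar=False) + ridge * np.eye(X.shape[1])
        model[c] = (Xc.mean(axis=0), cov, len(Xc) / len(X))
    return model

def predict_gaussian(model, X):
    """Assign each row of X to the class with the largest log posterior."""
    X = np.asarray(X, dtype=float)
    classes = list(model)
    scores = []
    for c in classes:
        mean, cov, prior = model[c]
        diff = X - mean
        _, logdet = np.linalg.slogdet(cov)
        # squared Mahalanobis distance of every row from the class mean
        maha = np.einsum("ij,jk,ik->i", diff, np.linalg.inv(cov), diff)
        scores.append(-0.5 * (maha + logdet) + np.log(prior))
    return np.array(classes)[np.argmax(np.vstack(scores), axis=0)]
```

If the per-class pdfs turn out not to be Gaussian, the same structure works with any other fitted density in place of the normal log-likelihood.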

However, you will be lucky to get good classification after this, so you will
probably need non-linear, non-parametric classifiers. Try k-nearest
neighbour, but that might take the age of the Universe, so use a condensing
algorithm first to get a smaller representative set.
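One well-known condensing algorithm is Hart's condensed nearest neighbour rule, sketched below on the assumption that this is the kind of condensing meant: keep a store of samples and absorb any training sample the store misclassifies under 1-NN, repeating until a full pass adds nothing. Function names are hypothetical, and `knn_predict` is a plain brute-force k-NN in which ties go to the smallest label.

```python
import numpy as np

def condense(X, y):
    """Hart's condensed nearest-neighbour rule. Returns the indices of a subset
    that still classifies every training sample correctly under 1-NN."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y)
    in_store = np.zeros(len(X), dtype=bool)
    in_store[0] = True  # seed the store with the first sample
    changed = True
    while changed:
        changed = False
        for i in range(len(X)):
            if in_store[i]:
                continue
            idx = np.flatnonzero(in_store)
            nearest = idx[np.argmin(np.linalg.norm(X[idx] - X[i], axis=1))]
            if y[nearest] != y[i]:  # misclassified: absorb into the store
                in_store[i] = True
                changed = True
    return np.flatnonzero(in_store)

def knn_predict(X_ref, y_ref, X_new, k=1):
    """Brute-force k-nearest-neighbour majority vote against a reference set."""
    X_ref = np.asarray(X_ref, dtype=float)
    y_ref = np.asarray(y_ref)
    preds = []
    for x in np.asarray(X_new, dtype=float):
        nearest = np.argsort(np.linalg.norm(X_ref - x, axis=1))[:k]
        labels, counts = np.unique(y_ref[nearest], return_counts=True)
        preds.append(labels[np.argmax(counts)])
    return np.array(preds)
```

`knn_predict(X[idx], y[idx], X_new, k=3)` then searches only the condensed set. Note that the condensed set depends on the order in which samples are visited.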

Matlab is what I use for coding; there are a lot of free toolboxes around,
though mostly I write my own.

Best wishes

Andrew


"Rishabh Gupta" <[EMAIL PROTECTED]> wrote in message
news:a4eje9$ip8$[EMAIL PROTECTED]...
> Hi All,
> I'm a research student at the Department of Electronics, University of
> York, UK. I'm working on a project related to music analysis and
> classification. I am at the stage where I perform some analysis on music
> files (currently only in MIDI format) and extract about 500 variables that
> are related to music properties like pitch, rhythm, polyphony and volume. I
> am performing basic analysis like mean and standard deviation, but I also
> perform more elaborate analysis like measuring the complexity of melody and
> rhythm.
>
> The aim is that the variables obtained can be used to perform a number of
> different operations.
> - The variables can be used to classify / categorise each piece of
> music, on its own, in terms of some meta classifier (e.g. rock, pop,
> classical).
> - The variables can be used to perform comparison between two files. A
> variable from one music file can be compared to the equivalent variable in
> the other music file. By comparing all the variables in one file with the
> equivalent variables in the other file, an overall similarity measurement
> can be obtained.
>
> The next stage is to test the ability of the variables obtained to
> perform the classification / comparison. I need to identify variables that
> are redundant (redundant in the sense of 'they do not provide any
> information' and 'they provide the same information as another variable')
> so that they can be removed, and I need to identify variables that are
> distinguishing (provide the most information).
>
> My Basic Questions Are:
> - What are the best statistical techniques / methods to apply here?
> E.g. I have looked at Principal Component Analysis; this would
> be a good method to remove the redundant variables and hence reduce the
> amount of data that needs to be processed. Can anyone suggest any other
> sensible statistical analysis methods?
> - What are the ideal tools / software to perform the clustering /
> classification? I have access to SPSS software but I have never used it
> before and am not really sure how to apply it or whether it is any good
> when dealing with 100s of variables.
>
> So far I have been analysing each variable on its own 'by eye' by plotting
> the mean and sd for all music files. However, this approach is not feasible
> in the long term since I am dealing with such a large number of variables.
> In addition, by looking at each variable on its own, I do not find clusters
> / patterns that are only visible through multivariate analysis. If anyone
> can recommend a better approach, I would be very grateful.
>
> Any help or suggestion that can be offered will be greatly appreciated.
>
> Many Thanks!
>
> Rishabh Gupta
>
>




=
Instructions for joining and leaving this list, remarks about the
problem of INAPPROPRIATE MESSAGES, and archives are available at
  http://jse.stat.ncsu.edu/
=



Re: Statistics Tool For Classification/Clustering

2002-02-27 Thread Mark Harrison

Correction (typo): it should read 'whiten intra class scatter'.

"Mark Harrison" <[EMAIL PROTECTED]> wrote in message
news:FIif8.16518$[EMAIL PROTECTED]...
> Optimal feature extractors, that's better than PCA because you whiten your
> inter class scatter and so put all inter class comparisons on the same
> level.
> [ snip ]







Re: Statistics Tool For Classification/Clustering

2002-02-14 Thread Reg Edwards

Rishabh Gupta <[EMAIL PROTECTED]> wrote in message
news:a4eje9$ip8$[EMAIL PROTECTED]...
> Hi All,
> I'm a research student at the Department of Electronics, University of
> York, UK. I'm working on a project related to music analysis and
> classification.
==
Pleased to see you have had many suggestions.

But I would have thought you are sitting right on top of all the books you
may need on the shelves in the university library.








Re: Statistics Tool For Classification/Clustering

2002-02-14 Thread Rishabh Gupta

Hi all,
I received numerous replies to my query. I can't thank everyone
individually, so I want to thank everyone who has replied. I am now looking
through the information and links that you have provided.
Many Thanks For All Your Help!!

Rishabh
"Rishabh Gupta" <[EMAIL PROTECTED]> wrote in message
news:a4eje9$ip8$[EMAIL PROTECTED]...
> [ snip ]







Re: Statistics Tool For Classification/Clustering

2002-02-13 Thread Jim Snow

"Richard Wright" <[EMAIL PROTECTED]> wrote in message
news:[EMAIL PROTECTED]...
> Genres are presumably groups. So linear combinations of variables that
> best separate the genres would be more effectively found by linear
> canonical variates analysis (aka discriminant analysis).
>
> Richard Wright
>
>
> On Thu, 14 Feb 2002 03:18:48 GMT, "Jim Snow" <[EMAIL PROTECTED]>
> wrote:
>
>
> snipped
>
> >My inclination would be to start with an Andrews plot, possibly
> >using principal component scores for about 20 music files from several
> >genres. This will enable you to find linear combinations of variables which
> >best separate the genres. The technique and examples are set out in:
> snipped
>

Andrews plots and similar techniques do not replace discriminant
analysis, which, as Richard Wright said, finds "linear combinations of
variables that best separate the genres". In Gnanadesikan's book, which first
popularised the technique, he examines the variables in the discriminant
space, i.e. a space defined by discriminant functions rather than by
principal components or the original variables.
The techniques are doing different things.
Andrews plots are a way to examine multidimensional data in a
two-dimensional plot. Amongst other things, several dimensions of high
difference between, say, jazz and pop or between jazz and flamenco may
be found which are not necessarily orthogonal.
Andrews plots are a data reduction technique which is, in many
dimensions, analogous to examining a multidimensional cluster of points
from many viewpoints, so that no possible viewpoint is far from one of
those used. Thus virtually all possible discriminant functions are tried and
the interesting ones noted. In a spirit of exploratory data analysis, this
seems useful.
Rishabh Gupta wrote:
"The variables can be used to perform comparison between two files. A
variable from one music file can be compared to the equivalent variable in
the other music file. By comparing all the variables in one file with the
equivalent variables in the other file, an overall similarity measurement can
be obtained."

Andrews plots reveal the directions in which the two files differ.
Incidentally, if the original Andrews weightings are used, the integral of the
squared difference between the two traces over (-pi, pi) equals pi times the
squared Euclidean distance between the two files, so the separation of two
traces does reflect Euclidean distance. Tukey suggested weightings which
examine the multidimensional space more closely but do not have such a simple
interpretation of the difference between traces. I have not used any of this
for some time and I do not have the relevant books, but the material I
referred to on the web should be helpful.
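A small NumPy sketch of the Andrews function, f_x(t) = x1/sqrt(2) + x2 sin t + x3 cos t + x4 sin 2t + x5 cos 2t + ..., for -pi < t < pi, together with a numerical check of the distance property. Function names are hypothetical, and plotting (e.g. one trace per music file with matplotlib) is left out.

```python
import numpy as np

def andrews_curve(x, t):
    """Andrews function f_x(t) = x1/sqrt(2) + x2 sin t + x3 cos t
    + x4 sin 2t + x5 cos 2t + ... evaluated at the points t."""
    x = np.asarray(x, dtype=float)
    t = np.asarray(t, dtype=float)
    f = np.full_like(t, x[0] / np.sqrt(2.0))
    for j in range(1, len(x)):
        k = (j + 1) // 2  # harmonic number: 1, 1, 2, 2, 3, 3, ...
        f += x[j] * (np.sin(k * t) if j % 2 == 1 else np.cos(k * t))
    return f

def squared_curve_distance(x, z, n=4096):
    """Integral over (-pi, pi) of (f_x - f_z)^2 using the rectangle rule on a
    uniform periodic grid, which is exact up to rounding for these curves."""
    t = np.linspace(-np.pi, np.pi, n, endpoint=False)
    diff = andrews_curve(x, t) - andrews_curve(z, t)
    return np.sum(diff ** 2) * (2 * np.pi / n)
```

Because the sine and cosine terms are orthogonal on (-pi, pi), `squared_curve_distance(x, z)` should agree with `np.pi * np.sum((x - z) ** 2)` up to rounding.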

Straightforward discriminant analysis will certainly find the best
linear discriminator in the least squares sense, but stepwise elimination of
variables in this process may result in discarding a variable with intuitive
appeal in favour of one or several variables highly correlated with it, and the
least squares metric may not be the best. For this and other reasons, an
exploratory approach such as the one Rishabh Gupta has begun seems appropriate.
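One way to keep that choice with the analyst rather than with a stepwise procedure is to list highly correlated pairs first and decide, on musical grounds, which member of each pair to keep. A minimal sketch; the 0.9 threshold and the function name are arbitrary assumptions.

```python
import numpy as np

def correlated_pairs(X, names, threshold=0.9):
    """Return (name_a, name_b, r) for every pair of columns of X whose absolute
    Pearson correlation is at least `threshold`, strongest first."""
    R = np.corrcoef(np.asarray(X, dtype=float), rowvar=False)
    pairs = []
    for i in range(R.shape[0]):
        for j in range(i + 1, R.shape[0]):
            if abs(R[i, j]) >= threshold:
                pairs.append((names[i], names[j], float(R[i, j])))
    return sorted(pairs, key=lambda p: -abs(p[2]))
```

Each reported pair is a candidate for 'provides the same information as another variable'; which one to drop is left to judgement.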

   I still hope this helps   Jim Snow









Re: Statistics Tool For Classification/Clustering

2002-02-13 Thread Jay Warner

You might consider a form of PLS (partial least squares): your measurements may be
highly correlated, and only a very few can do you any good. You have a great many
output vars, and few enough inputs.
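A minimal sketch of the NIPALS form of PLS for a single response (PLS1), e.g. a 0/1 indicator for one genre. The single-response setting, the response coding and the function name are assumptions, and this is not SPSS's implementation. In practice only a few components would be kept; with as many components as variables it reproduces ordinary least squares, which makes a convenient sanity check.

```python
import numpy as np

def pls1(X, y, n_components):
    """NIPALS partial least squares for one response variable.

    Returns (coef, intercept) so that predictions are X @ coef + intercept."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    x_mean, y_mean = X.mean(axis=0), y.mean()
    E, f = X - x_mean, y - y_mean  # centred residuals, deflated below
    W, P, q = [], [], []
    for _ in range(n_components):
        w = E.T @ f
        w /= np.linalg.norm(w)  # weights: direction of maximum covariance with y
        t = E @ w  # component scores
        tt = t @ t
        p = E.T @ t / tt  # X loadings
        qa = (f @ t) / tt  # y loading
        E = E - np.outer(t, p)  # deflate X
        f = f - qa * t  # deflate y
        W.append(w)
        P.append(p)
        q.append(qa)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)  # B = W (P^T W)^-1 q
    return coef, y_mean - x_mean @ coef
```

With a small `n_components`, the weight vectors in W show which of the many correlated variables carry the information.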

Jay

Rishabh Gupta wrote:

> [ snip ]

--
Jay Warner
Principal Scientist
Warner Consulting, Inc.
 North Green Bay Road
Racine, WI 53404-1216
USA

Ph: (262) 634-9100
FAX: (262) 681-1133
email: [EMAIL PROTECTED]
web: http://www.a2q.com

The A2Q Method (tm) -- What do you want to improve today?









Re: Statistics Tool For Classification/Clustering

2002-02-13 Thread Richard Wright

Genres are presumably groups. So linear combinations of variables that
best separate the genres would be more effectively found by linear
canonical variates analysis (aka discriminant analysis).

Richard Wright


On Thu, 14 Feb 2002 03:18:48 GMT, "Jim Snow" <[EMAIL PROTECTED]>
wrote:


snipped
>My inclination would be to start with an Andrews plot, possibly
>using principal component scores for about 20 music files from several
>genres. This will enable you to find linear combinations of variables which
>best separate the genres. The technique and examples are set out in:
snipped






Re: Statistics Tool For Classification/Clustering

2002-02-13 Thread Jim Snow


"Rishabh Gupta" <[EMAIL PROTECTED]> wrote in message
news:a4eje9$ip8$[EMAIL PROTECTED]...
> [ snip ]



A useful exposition of techniques for the initial investigation of a
multivariate data set is given at

  http://www.sas.com/service/library/periodicals/obs/obswww22/

If you search the web for "Andrews plots" you will find more.

My inclination would be to start with an Andrews plot, possibly
using principal component scores for about 20 music files from several
genres. This will enable you to find linear combinations of variables which
best separate the genres. The technique and examples are set out in:

  Gnanadesikan: Methods for Statistical Data Analysis of Multivariate
  Observations (Wiley, 1977), but this is an old reference.
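Principal component scores of the kind suggested above can be obtained from an SVD of the data matrix. This sketch assumes each variable is standardised first (sensible when several hundred variables are on different scales); the function name is hypothetical.

```python
import numpy as np

def pc_scores(X, n_components):
    """Principal component scores of the standardised columns of X, plus the
    fraction of total variance explained by each retained component."""
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # z-score each variable
    U, s, Vt = np.linalg.svd(Z, full_matrices=False)
    scores = U[:, :n_components] * s[:n_components]  # same as Z @ Vt[:n].T
    explained = s ** 2 / np.sum(s ** 2)
    return scores, explained[:n_components]
```

The columns of `scores` are mutually uncorrelated, and the first few could then serve as the coefficients of the Andrews traces for each music file.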

I hope this helps   Jim Snow







Re: Statistics Tool For Classification/Clustering

2002-02-13 Thread Art Kendall

Classification is a specialized field. Go to
http://www.pitt.edu/~csna/
and click on
Although this is the Classification Society of North America, members of the
British Classification Society also follow it.

SPSS should be able to handle what you want to do. However, you need
face-to-face consulting/collaboration with someone who does this kind of
analysis. Many of the techniques grew out of psychology, so if CLASS-L doesn't
help you might try your local psychology department.

Rishabh Gupta wrote:

> [ snip ]






Re: Statistics Tool For Classification/Clustering

2002-02-13 Thread M Law

In sci.stat.math Rishabh Gupta <[EMAIL PROTECTED]> wrote:

[ snip ]

It seems that you are new to the field of pattern recognition.
In that case, you may want to check out the classic book
"Pattern Classification" by Duda, Hart and Stork.

A second edition came out in 2001, and you may find insights in it that are
useful for your problem.

M Law





Re: Statistics Tool For Classification/Clustering

2002-02-13 Thread Doug Hoy

"Rishabh Gupta" <[EMAIL PROTECTED]> wrote in
news:a4eje9$ip8$[EMAIL PROTECTED]:

> [ snip ]

In SPSS, Factor Analysis would help you reduce your many variables to a few
bigger, more general ones. Cluster Analysis, in turn, will let you see how
your variables group themselves. The results might look like the following:

Factor 1: (percussiveness)
volume of drums
number of drum types
drum melodies...

Factor 2: (happiness)
minor modes
speed
pitch...

Factor 3: (memorableness)
melodic structure
folk music precursor


The cluster analysis would be similar, but would place the variables on a
branching tree showing, say, that speed and pitch were closer than drum type
and folk precursor. It would be interesting to see how this works.
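A minimal NumPy sketch of clustering the variables themselves (rather than the music files), using 1 - |r| as the distance between two variables and average linkage. Both choices, and the function name, are assumptions; a statistics package will offer other distances and linkage rules.

```python
import numpy as np

def cluster_variables(X, n_clusters):
    """Average-linkage agglomerative clustering of the columns of X, using
    1 - |Pearson r| as the distance between two variables.
    Returns a list of clusters, each a sorted list of column indices."""
    D = 1.0 - np.abs(np.corrcoef(np.asarray(X, dtype=float), rowvar=False))
    clusters = [[j] for j in range(D.shape[0])]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # average linkage: mean distance over all cross-cluster pairs
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] = sorted(clusters[a] + clusters[b])
        del clusters[b]  # b > a, so index a is unaffected
    return clusters
```

Recording the merge distances at each step would give the branching tree described above; stopping at `n_clusters` simply cuts that tree.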

I wonder if you could calculate some kind of fractal dimension for the 
music too?

Doug H

