Re: [R] About clustering techniques

2008-07-30 Thread pacomet
Hi Christian

I've been reading about daisy and think I need to do something like:

 mydaisydata <- daisy(mydata, metric = "euclidean", stand = FALSE)
Error en vector("double", length) :
  tamaño del vector especificado es muy grande
(which means: specified vector size is too big)


mydata is an annual file with 14 columns and 124716 rows. Is it possible that
daisy can't handle this much data? Maybe I'm missing something when using daisy.
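
A rough back-of-the-envelope check suggests why this fails: daisy has to hold at least n(n-1)/2 pairwise dissimilarities as doubles, which for 124716 rows is about 7.8 billion values, on the order of 58 GiB. The sketch below computes that figure and, as one possible workaround, runs clara (from the same cluster package; it works on subsamples and allows NAs) on simulated stand-in data. The column names, the simulated values, and the choices k = 5 and samples = 20 are assumptions for illustration only, not taken from the real file.

```r
library(cluster)

# Size of the dissimilarity object daisy() would need for the full file
n <- 124716
n_pairs <- n * (n - 1) / 2        # number of pairwise dissimilarities
gib <- n_pairs * 8 / 1024^3       # 8 bytes per double, expressed in GiB
cat(sprintf("%.0f pairs, about %.1f GiB\n", n_pairs, gib))

# Simulated stand-in for mydata (hypothetical names and values):
# latitude, longitude and 12 monthly temperatures with some NAs
set.seed(1)
m <- 2000
temp <- matrix(rnorm(m * 12, mean = 15, sd = 5), ncol = 12,
               dimnames = list(NULL, paste0("temp", 1:12)))
temp[sample(length(temp), 1200)] <- NA   # about 5% missing values
sim <- data.frame(lat = runif(m, 36, 44), lon = runif(m, -10, 4), temp)

# clara() clusters subsamples, so it never builds the full n x n object
cl <- clara(sim, k = 5, metric = "euclidean", stand = FALSE, samples = 20)
table(cl$clustering)
```

Whether clara's subsampling is acceptable for the real data is a separate question; the point is only that it avoids the full dissimilarity matrix.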

Another question: if I get daisy running, can I use kmeans like this?

mykmeansdata <- kmeans(mydaisydata, 5)

Or pamk, which I've read gives the optimal number of clusters?
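
On this question, a hedged note: kmeans is designed for a numeric data matrix rather than a dissimilarity object, whereas pam from the cluster package accepts a dissimilarity directly. The sketch below, on a small simulated data set (three artificial groups; all names and values are hypothetical), runs pam on daisy output and picks k by average silhouette width. As far as I know this is also the default criterion used by pamk in fpc, but treat that as an assumption and check the pamk help page.

```r
library(cluster)

set.seed(42)
# Simulated stand-in: 3 groups of 40 rows, 12 "monthly" columns each
centers <- c(5, 15, 25)
x <- do.call(rbind, lapply(centers, function(mu) {
  matrix(rnorm(40 * 12, mean = mu, sd = 1), ncol = 12)
}))
x[sample(length(x), 50)] <- NA    # a few missing monthly values

# Dissimilarities computed on the columns each pair of rows has in common
d <- daisy(as.data.frame(x), metric = "euclidean", stand = FALSE)

# Try several k and keep the one with the largest average silhouette width
ks <- 2:6
asw <- sapply(ks, function(k) pam(d, k = k, diss = TRUE)$silinfo$avg.width)
best_k <- ks[which.max(asw)]

fit <- pam(d, k = best_k, diss = TRUE)
table(fit$clustering)
```

This only works when the dissimilarity object fits in memory, so for the full file it would have to be applied to a subsample.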

Thanks again

-- 
_
El ponent la mou, el llevant la plou
Usuari Linux registrat: 363952
---
Fotos: http://picasaweb.google.es/pacomet


__
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.


Re: [R] About clustering techniques

2008-07-29 Thread ctu

Hi Paco,
I had the same problem as you before, so I simply imputed the missing values.
For example:

newdata <- as.matrix(impute(olddata, fun = "random"))

Then I believe you will be able to analyze your data.

Hopefully it helps.
Chunhao


Quoting pacomet [EMAIL PROTECTED]:


Hello R users

I have been playing with a dataset for some time to do some cluster analysis.
The data set consists of 14 columns, geographical coordinates and monthly
temperatures, in annual files:

latitude - longitude - temperature 1 - ... - temperature 12

I have some missing values in some cases; at some points there may be 8 valid
monthly values and four invalid ones. I don't want to suppress the whole
row with 8 good/4 bad values, as I want to try both annual and monthly analysis.

I first tried kmeans but found a problem with missing values. When I run it
without omitting missing values, kmeans gives an error, and when I exclude
invalid data, too many values are dropped in some years of the data series.

Now I have been reading about pam, pamk and clara; I think they can handle
missing values, but I can't work out how to perform the analysis with
these functions. As I'm neither a statistics nor an R expert, the fpc and
cluster package documentation is not enough for me. If you know of a website
or tutorial explaining how to use those functions, ideally with examples to
check, please post it.

Any other help or suggestion is greatly appreciated.

Thanks in advance

Paco


Re: [R] About clustering techniques

2008-07-29 Thread Christian Hennig

Dear Paco,

In order to use the methods in the cluster package (including pam), look up
the help page of daisy, which can compute dissimilarity matrices that handle
missing values appropriately (in most situations).

A good reference is the Kaufman and Rousseeuw book cited on that help page.
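
To illustrate the point about daisy, here is a minimal sketch on a made-up three-row data set (the column names and values are hypothetical). It shows that daisy still returns a dissimilarity for every pair of rows as long as each pair shares at least one non-missing column, whereas dropping incomplete rows would discard most of the data.

```r
library(cluster)

# Tiny made-up example: two coordinates and two monthly temperatures
temps <- data.frame(
  lat = c(40.0, 40.5, 41.0),
  lon = c(-3.0, -3.5, -4.0),
  jan = c(5.1, NA, 6.3),
  feb = c(6.0, 6.4, NA)
)

# daisy() uses, for each pair of rows, only the columns present in both
d <- daisy(temps, metric = "euclidean", stand = FALSE)
print(d)
any(is.na(d))                 # no missing dissimilarities in this example

# Listwise deletion, by contrast, keeps only the complete rows
rows_kept <- nrow(na.omit(temps))
rows_kept
```

Note that in this sketch the coordinates enter the distance on the same footing as the temperatures; whether that is sensible for the real data is a modelling decision.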

Christian

*** --- ***
Christian Hennig
University College London, Department of Statistical Science
Gower St., London WC1E 6BT, phone +44 207 679 1698
[EMAIL PROTECTED], www.homepages.ucl.ac.uk/~ucakche



Re: [R] About clustering techniques

2008-07-29 Thread Christian Hennig
A quick comment on this: imputation is an option to make things technically
work, but it is not necessarily good. Imputation always introduces some noise,
i.e., it fakes information that is not really there.

Whether it is good depends strongly on the data, the situation and the
imputation method (random often not being a very sensible choice).
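
As a small illustration of how much the imputation method matters, the sketch below uses simulated data where the true values are known (everything here is hypothetical, including the helper impute_col, which is written in base R and is not the impute function quoted below). It compares filling each column's NAs with random draws from the observed values against filling them with the column median, and measures how far the filled-in values are from the truth.

```r
set.seed(7)
n <- 500
truth <- matrix(rnorm(n * 12, mean = 15, sd = 5), ncol = 12)

# Knock out 600 values at random so the truth is known for each gap
obs <- truth
miss <- sample(length(obs), 600)
obs[miss] <- NA

# Hypothetical helper: fill the NAs of one column either with random
# draws from its observed values or with its median
impute_col <- function(v, how) {
  na <- is.na(v)
  fill <- if (how == "random") {
    sample(v[!na], sum(na), replace = TRUE)
  } else {
    median(v[!na])
  }
  v[na] <- fill
  v
}

rand_imp <- apply(obs, 2, impute_col, how = "random")
med_imp  <- apply(obs, 2, impute_col, how = "median")

# Mean absolute error of the imputed cells against the known truth
err_random <- mean(abs(rand_imp[miss] - truth[miss]))
err_median <- mean(abs(med_imp[miss] - truth[miss]))
c(random = err_random, median = err_median)
```

This is not a recommendation of median imputation, which distorts the data in its own way (it shrinks the variance); it only shows on artificial data that the choice of method changes how much noise is added.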


Christian
