f code. On the other hand, the
help page for codetools::checkUsage is quite cryptic. But it's good to know at
least where to look.
Sincerely,
Leonard
From: Ivan Krylov
Sent: Wednesday, February 28, 2024 10:36 AM
To: Leo Mada via R-help
Cc: Leo Mada
Su
On Sat, 24 Feb 2024 03:08:26 +
Leo Mada via R-help writes:
> Are there any tools to extract the function names called by
> reverse-dependencies?
For well-behaved packages that declare their dependencies correctly,
parsing the NAMESPACE for importFrom() and import() calls should give
you the ex
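A minimal sketch of that idea in base R (assumptions: the reverse-dependency is installed locally, and "stats" merely stands in for a real reverse-dependency; for source tarballs you would parse the unpacked NAMESPACE file instead):

```r
## For an installed package, base R can report its declared imports
## without hand-parsing the NAMESPACE:
imp <- getNamespaceImports("stats")
## names(imp) are the packages imported from; each element lists the
## imported symbols (or TRUE for a whole-package import()).
print(unique(names(imp)))

## parseNamespaceFile() reads the importFrom()/import() directives
## straight from the installed NAMESPACE file:
ns <- parseNamespaceFile("stats", file.path(R.home(), "library"))
str(ns$imports)
```

Counting which imported functions co-occur across many such packages would then give the co-occurrence matrix for clustering.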
Dear R Users,
Are there any tools to extract the function names called by
reverse-dependencies?
I would like to group these functions using clustering methods based on the
co-occurrence in the reverse-dependencies.
Utility: It may be possible to split complex packages into modules with fewer
Hello,
I am not at all sure that the following answers the question.
The code below tries to find the optimal number of clusters. One of the
changes I have made to your call to kmeans is to subset DMs without dropping
the dim attribute.
library(cluster)
max_clust <- 10
wss <- numeric(max_clust)
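A hedged completion of that fragment (the elbow heuristic; DMs here is a toy stand-in for the real matrix, and the loop body is an assumption about how the truncated code continued):

```r
library(cluster)   # as in the original snippet

set.seed(1)
DMs <- matrix(rnorm(200), ncol = 2)   # toy stand-in for the real data

max_clust <- 10
wss <- numeric(max_clust)
for (k in seq_len(max_clust)) {
  # total within-cluster sum of squares for each candidate k
  wss[k] <- kmeans(DMs, centers = k, nstart = 10)$tot.withinss
}
## wss shrinks as k grows; look for the "elbow" where the drop levels off
plot(seq_len(max_clust), wss, type = "b", xlab = "k", ylab = "total WSS")
```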
Hi Subhamitra,
I think the fact that you are passing a vector of values rather than a
matrix is part of the problem. As you have only one value for each
country, the points plotted will be the index on the x-axis and the
value for each country on the y-axis. Passing a value for ylim= means
that you
Dear all,
I am doing a one way between subjects anova in an unbalanced data set.
Suppose we have "a" levels of the one factor. I want to merge the not so
significantly different levels into the same cluster.
Can I do a Tukey Kramer HSD and then use the following algorithm:
For i in 2 : "a"
Hello Adrian,
It all depends on what the structure of the dataset is. For instance, you said
that all your values are between -1 and 1. Do the data rows sum-squared up to
1? How about the means? Are they zero? I guess all this has to depend on the
application and how the data were processed or
Dear group,
pardon me for a naive question. I have data matrix (11K rows , 4K columns).
The data range is between -1 to 1. Not strictly integers, but real
numbers with at least place values in millionths.
The data distribution is peculiar (if I do plot(density(myMatrix)), I
get nice bimodal curve
Hi all,
I'm learning about how to do clusters of clients.
I've found this nice presentation on the subject, but the data is not
available to use. I've contacted the author; hope he'll answer soon.
https://ds4ci.files.wordpress.com/2013/09/user08_jimp_custseg_revnov08.pdf
Someone knows similar
Dear Marianna,
the function agnes in library cluster can compute Ward's method from a raw
data matrix (at least this is what the help page suggests).
Also, you may not be using the most recent version of hclust. The most
recent version has a note in its help page that states:
"Two different
Hi everybody, I have a problem with a cluster analysis.
I am trying to use hclust, method=ward.
The Ward method works with SQUARED Euclidean distances.
Hclust demands "a dissimilarity structure as produced by dist".
Yet, dist does not seem to produce a table of squared euclidean distances,
star
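A small sketch of the squared-distance point (USArrests stands in for the real data; note that newer versions of hclust also offer method = "ward.D2", which squares the dissimilarities internally, so plain dist() output is the right input there):

```r
d  <- dist(USArrests)                  # plain Euclidean distances
h1 <- hclust(d^2, method = "ward.D")   # classic Ward needs SQUARED distances
h2 <- hclust(d,   method = "ward.D2")  # squares internally instead
## The two trees agree; h1's merge heights are the squares of h2's.
```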
Dear R help list,
I was just wondering whether there is a way to cluster the documentation files
of data sets in the package documentation index file, so that common prefixes
such as "dat..." are not necessary.
Best wishes,
Alrik
Dr. Al
> Hello,
>
> I am new to R (and a novice at statistics). I have a list of objects,
with
> (ideally) 10 different attributes measured per object. However, in
reality,
> I was not able to obtain all 10 attributes for every object, so there is
> some data missing (unequal number of measured attribut
Hello,
Please advice on encoding data for the following clustering problem.
I have a dataset with car usage info. Dataset has the following fields:
1. Car model (Toyota Celica, BMW, Nissan X-Trail, Mazda Cosmo, etc.)
2. Year built
3. Country where the car runs
4. Distance run by car before ma
of Anthropology
Texas A&M University
College Station, TX 77843-4352
> -Original Message-
> From: r-help-boun...@r-project.org [mailto:r-help-bounces@r-
> project.org] On Behalf Of marco milella
> Sent: Thursday, December 06, 2012 12:08 PM
> To: r-help@r-project.org
> Subject:
Good morning,
I am analyzing a dataset composed of 364 subjects and 13 binary variables
(0,1 = absence,presence).
I am testing possible association (co-presence) of my variables. To do
this, I was trying with cluster analysis.
My main interest is to check for the significance of the obtained clust
Dear R help,
I am trying to cluster my data according to "group" in a data frame such as
the following:
df=data.frame(group=rep(c("a","b","c","d"),10),(replicate(100,rnorm(40
I'm not sure how to tell hclust() that I want to cluster according to the
group variable. For example:
dfclust=hc
Hello playeRs!
I'm working on a project for a client. She's modeling hormone levels
periodically, and trying to develop a model and fit her data to that
model, and subsequently she's trying to cluster individuals based on how
well each fits the model.
I've been looking at grofit for this,
Please read the posting guide for future questions.
I presume you mean using the vegan package? If so, then see this blog
post of mine which shows how to do something similar:
http://wp.me/pZRQ9-73
If you post more details and an example I will help further if the blog
post is not sufficient for
On 30.04.2012 18:44, borinot wrote:
Hello to all,
I'm new to R so I have a lot of problems with it, but I'll only ask the main
one.
I have clustered an environmental matrix
We do not know what that is. Where is the example data? See the posting
guide.
with 2 different methods,
Which
Hello to all,
I'm new to R so I have a lot of problems with it, but I'll only ask the main
one.
I have clustered an environmental matrix with 2 different methods, and I'd
like to plot them in a PCA and a db-RDA. I mean, I want see these clusters
in the plots like points of differents colours, t
Dear R users,
I'm having trouble with calculating pvalues for my 2d dataset. First I
performed clustering and I would like to get some info about the strength
of cluster membership for each point. I've calculated (thanks to nice
people help) the multivariate normal densities (mnd) using dmvnorm fu
Is there a package (and for that matter a function) that I can use to
create clustered wordclouds. The current wordcloud package simply has more
frequent words as larger words, whereas what I want is for the cluster centre
to be the more frequent words, but the closer a word is to another the
higher th
PS to my previous posting: Also have a look at kmeansruns in fpc. This
runs kmeans for several numbers of clusters and decides the number of
clusters by either Calinski&Harabasz or Average Silhouette Width.
Christian
On Wed, 10 Aug 2011, Ken Hutchison wrote:
Hello all,
I am using the clust
There is a number of methods in the literature to decide the number of
clusters for k-means. Probably the most popular one is the Calinski and
Harabasz index, implemented as calinhara in package fpc. A distance
based version (and several other indexes to do this) is in function
cluster.stats in
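The Calinski-Harabasz index mentioned here can also be sketched by hand in base R (fpc::calinhara computes it for you; iris is only a stand-in dataset, and ch_index is a hypothetical helper name):

```r
## CH = (between-cluster SS / (k-1)) / (within-cluster SS / (n-k))
ch_index <- function(x, cl) {
  x <- as.matrix(x); n <- nrow(x); k <- length(unique(cl))
  W <- sum(sapply(split(seq_len(n), cl), function(idx) {
    sum(scale(x[idx, , drop = FALSE], scale = FALSE)^2)  # within-cluster SS
  }))
  B <- sum(scale(x, scale = FALSE)^2) - W                # total SS minus W
  (B / (k - 1)) / (W / (n - k))
}

set.seed(1)
X  <- scale(iris[, 1:4])
ch <- sapply(2:6, function(k) ch_index(X, kmeans(X, k, nstart = 10)$cluster))
which.max(ch) + 1   # candidate number of clusters
```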
On Wed, Aug 10, 2011 at 12:07 PM, Ken Hutchison wrote:
> Hello all,
> I am using the clustering functions in R in order to work with large
> masses of binary time series data, however the clustering functions do not
> seem able to fit this size of practical problem. Library 'hclust' is good
> (t
Try the flow cytometry clustering functions in Bioconductor.
-thomas
On Thu, Aug 11, 2011 at 7:07 AM, Ken Hutchison wrote:
> Hello all,
> I am using the clustering functions in R in order to work with large
> masses of binary time series data, however the clustering functions do not
> see
Hello all,
I am using the clustering functions in R in order to work with large
masses of binary time series data, however the clustering functions do not
seem able to fit this size of practical problem. Library 'hclust' is good
(though it may be sub par for this size of problem, thus doubly poo
Yes absolutely, your explanation makes sense. Thanks very much.
rgds
Paul
--
View this message in context:
http://r.789695.n4.nabble.com/clustering-based-on-most-significant-pvalues-does-not-separate-the-groups-tp3644249p3649233.html
Sent from the R help mailing list archive at Nabble.com.
lp-boun...@r-project.org
> [mailto:r-help-boun...@r-project.org] On Behalf Of pguilha
> Sent: 04 July 2011 19:22
> To: r-help@r-project.org
> Subject: [R] clustering based on most significant pvalues
> does not separate the groups!
>
> Hi all,
>
> I have some microarra
Hi all,
I have some microarray data on 40 samples that fall into two groups. I have
a value for 480k probes for each of those samples. I performed a t test
(rowttests) on each row(giving the indices of the columns for each group)
then used p.adjust() to adjust the pvalues for the number of tests
p
Dear Experts,
I am using the below script to generate the heat map of gene expression
data. I am using Hierarchical Clustering (hclust) for clustering. Now I want
to compare different clustering parameters such as *K-means* clustering, Model
Based Clustering,
I have two queries:
1. How to incorp
Hi Guys
I want to apply a clustering algo to my dataset in order to find the
regions points(X,Y) which have similar values(percent_GC and
mean_phred_quality). Details below.
I have sampled 1% of points from my main data set of 85 million
points. The result is still somewhat large 800K points and
, March 02, 2011 4:08 PM
> To: r-help@r-project.org
> Subject: [R] clustering problem
>
> Hi,
>
> I have a gene expression experiment with 20 samples and 25000 genes each.
> I'd like to perform clustering on these. It turned out to become much
> faster
> when I transform
Don't you expect it to be a lot faster if you cluster 20 items instead of 25000?
-Original Message-
From: r-help-boun...@r-project.org [mailto:r-help-boun...@r-project.org] On
Behalf Of Maxim
Sent: Wednesday, March 02, 2011 4:08 PM
To: r-help@r-project.org
Subject: [R] clustering pr
Hi,
I have a gene expression experiment with 20 samples and 25000 genes each.
I'd like to perform clustering on these. It turned out to become much faster
when I transform the underlying matrix with t(matrix). Unfortunately then
I'm not anymore able to use cutree to access individual clusters. In
After ordering the table of membership degrees, I must get the difference
between the first and second columns, that is, between the first and second
largest membership degrees of object i. This for K=2, K=3, up to K.max=6.
This difference is multiplied by the crisp silhouette index vector (si). Too
it d
There are quite a few packages that work with finite mixtures, as
evidenced by the descriptions here:
http://cran.r-project.org/web/packages/index.html
These might be useful:
http://cran.r-project.org/web/packages/flexmix/index.html
http://cran.r-project.org/web/packages/mclust/index.html
-Ma
Dear R-help,
I am doing clustering via finite mixture model. Please suggest some packages in
R to find clusters via finite mixture model with continuous variables. And
also I wish to verify the distributional properties of the mixture
distributions
by fitting the model with lognormal, gamma, ex
I must get an index (fuzzy silhouette), a weighted average: an average of the
crisp silhouette for every row (i), where the weight of each term is
determined by the difference between the membership degrees of the
corresponding object to its first and second best matching fuzzy clusters.
I need the differe
thank you ,you have been very kind
--
View this message in context:
http://r.789695.n4.nabble.com/clustering-fuzzy-tp3229853p3230228.html
Sent from the R help mailing list archive at Nabble.com.
__
R-help@r-project.org mailing list
https://stat.ethz.c
use 'apply':
> head(x.m)
V2 V3 V4 V5
[1,] 0.66 0.04 0.01 0.30
[2,] 0.02 0.89 0.09 0.00
[3,] 0.06 0.92 0.01 0.01
[4,] 0.07 0.71 0.21 0.01
[5,] 0.10 0.85 0.04 0.01
[6,] 0.91 0.04 0.02 0.02
> x.m.sort <- apply(x.m, 1, sort, decreasing = TRUE)
> head(t(x.m.sort))
[,1] [,2] [,3] [,4]
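Building on that apply() idiom, the gap between the largest and second-largest membership degree per row (the quantity asked about in this thread) could be computed like this (toy membership matrix):

```r
x.m <- rbind(c(0.66, 0.04, 0.01, 0.30),
             c(0.02, 0.89, 0.09, 0.00),
             c(0.06, 0.92, 0.01, 0.01))
## sort each row in decreasing order, then take the top-two difference
top2gap <- apply(x.m, 1, function(r) {
  s <- sort(r, decreasing = TRUE)
  s[1] - s[2]
})
top2gap   # 0.36 0.80 0.86
```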
hello,
I'm Pete. How can I order the rows of a matrix from max to min value?
I have a matrix of membership degrees, with 82 rows (i) and K columns, where
K is the number of clusters.
I need the first and second largest elements of the i-th row.
for example
1 0.66 0.04 0.01 0.30
2 0.02 0.89 0.09 0.00
3 0.06 0.92 0.01 0.01
4
Jüri,
How did you create the output?
An example to cluster transactions with arules can be found in:
Michael Hahsler and Kurt Hornik. Building on the arules infrastructure
for analyzing transaction data with R. In R. Decker and H.-J. Lenz,
editors, /Advances in Data Analysis, Proceedings of t
Hello.
I have a general question regarding to clustering of association rules.
According to http://cran.r-project.org/web/packages/arules/vignettes/arules.pdf
"4.7 Distance based clustering transactions and associations" there is
possibility for creating clusters of association rules.
I do not u
On Oct 30, 2010, at 7:49 AM, dpender wrote:
David Winsemius wrote:
On Oct 29, 2010, at 12:08 PM, David Winsemius wrote:
On Oct 29, 2010, at 11:37 AM, dpender wrote:
Apologies for being vague,
The structure of the output is as follows:
Still no code?
I am using the Clusters functio
David Winsemius wrote:
>
>
> On Oct 29, 2010, at 12:08 PM, David Winsemius wrote:
>
>>
>> On Oct 29, 2010, at 11:37 AM, dpender wrote:
>>
>>> Apologies for being vague,
>>>
>>> The structure of the output is as follows:
>>
>> Still no code?
>>
>
> I am using the Clusters function from the evd
On Oct 29, 2010, at 12:08 PM, David Winsemius wrote:
On Oct 29, 2010, at 11:37 AM, dpender wrote:
Apologies for being vague,
The structure of the output is as follows:
Still no code?
$ cluster1 : Named num [1:131] 3.05 2.71 3.26 2.91 2.88 3.11 3.21
-1 2.97
3.39 ...
..- attr(*, "nam
On Oct 29, 2010, at 11:37 AM, dpender wrote:
Apologies for being vague,
The structure of the output is as follows:
Still no code?
$ cluster1 : Named num [1:131] 3.05 2.71 3.26 2.91 2.88 3.11 3.21
-1 2.97
3.39 ...
..- attr(*, "names")= chr [1:131] "6667" "6668" "6669" "6670" ...
Wi
Apologies for being vague,
The structure of the output is as follows:
$ cluster1 : Named num [1:131] 3.05 2.71 3.26 2.91 2.88 3.11 3.21 -1 2.97
3.39 ...
..- attr(*, "names")= chr [1:131] "6667" "6668" "6669" "6670" ...
With 613 clusters. What I require is abstracting the first and last va
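A sketch of one way to pull the first and last value out of each cluster (a toy list standing in for the clusters() output of evd):

```r
cl <- list(cluster1 = c("6667" = 3.05, "6668" = 2.71, "6669" = 3.26),
           cluster2 = c("6700" = 2.88, "6701" = 3.11))
## one row per cluster, columns = first and last element of that cluster
ends <- t(sapply(cl, function(v) c(first = v[[1]], last = v[[length(v)]])))
ends
```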
On Oct 29, 2010, at 5:14 AM, dpender wrote:
That's helpful but the reason I'm using clusters in evd is that I
need to
specify a time condition to ensure independence.
I believe this is the first we heard about any particular function or
package.
I therefore have an output
We woul
That's helpful but the reason I'm using clusters in evd is that I need to
specify a time condition to ensure independence.
I therefore have an output in the form Cluster[[i]][j-k] where i is the
cluster number and j-k is the range of values above the threshold taking
account of the time condi
John,
Hi, just a general question: when we do hierarchical clustering, should we
compute the dissimilarity matrix based on scaled dataset or non-scaled dataset?
daisy() in cluster package allow standardizing the variables before calculating
dissimilarity matrix;
I'd say that should depend
Hi, just a general question: when we do hierarchical clustering, should we
compute the dissimilarity matrix based on scaled dataset or non-scaled dataset?
daisy() in cluster package allow standardizing the variables before calculating
dissimilarity matrix; but dist() doesn't have that option at
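With dist() the standardization can simply be done first (a sketch on USArrests; note daisy(..., stand = TRUE) standardizes by mean absolute deviation while scale() uses the standard deviation, so the two are similar but not identical):

```r
d_raw    <- dist(USArrests)          # distances on the raw variable scales
d_scaled <- dist(scale(USArrests))   # center and sd-scale each column first
```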
On Oct 28, 2010, at 8:00 AM, dpender wrote:
I am looking to use R in order to determine the number of extreme
events for
a high frequency (20 minutes) dataset of wave heights that spans 25
years
(657,432) data points.
I require the number, spacing and duration of the extreme events as an
I have worked with seismic data measured at 100hz, and had no trouble
locating events in "long" records (several times the size of your
dataset). 20 minutes is high frequency? What kind of waves are
these? What is the wavelength? Some details would help.
albyn
On Thu, Oct 28, 2010 at 05:00:10A
I am looking to use R in order to determine the number of extreme events for
a high frequency (20 minutes) dataset of wave heights that spans 25 years
(657,432) data points.
I require the number, spacing and duration of the extreme events as an
output.
I have briefly used the clusters function i
Hello Steve,
> I've been asked to help evaluate a vegetation data set, specifically to
> examine it for community similarity. The initial problem I see is that the
> data is ordinal. At best this only captures a relative ranking of
> abundance and ordinal ranks are assigned after data collection
cc: r-help@r-project.org
Subject: Re: [R] Clustering with or
Steve -
Take a look at daisy() in the cluster package.
- Phil Spector
Statistical Computing Facility
Department of Statistics
UC Be
Hello
I've been asked to help evaluate a vegetation data set, specifically to
examine it for community similarity. The initial problem I see is that the
data is ordinal. At best this only captures a relative ranking of
abundance and ordinal ranks are assigned after data collection. I've
been
Dear All
Do you know how to make a heatmap and use cosine correlation for
clustering? This is what my colleague can do in gene-math and I want to
do in R but I don't know how to.
Thanks a lot
Leila
__
R-help@r-project.org mailing list
https://stat.
Hi,
Is there a way in R to identify those cluster methods / distance measures
that best reflect predefined cluster groups?
Given 10 observations O1...O10. Optimally, these 10 observations cluster as
follows:
cluster1: O1, O2, O3, O4
cluster2: O5, O6
cluster3: O7, O8, O9, O10.
What I want is a
Hi Ralph,
In case of hclust, the dendrogram does show the "steps" (they are the
heights presented in the graph).
You can present them also in a matrix using "cutree", for example:
dat <- (USArrests)
n <- (dim(dat)[1])
hc <- hclust(dist(USArrests))
cutree(hc, k=1:n)
You might then visualize the
Hi,
I use the following clustering methods and get the
corresponding dendrograms for single, complete, average, ward and
kmeans clustering.
This gives the dendrograms, but doesn't show the calculation-way.
My question: is there a possibility to show this calculation steps
(cluster steps) in matr
Thank you Etienne, this seems to work like a charm. Also thanks to the rest
of you for your help.
Henrik
On 11 June 2010 13:51, Cuvelier Etienne wrote:
>
>
> On 11/06/2010 12:45, Henrik Aldberg wrote:
>
> I have a directed graph which is represented as a matrix on the form
>>
>>
>> 0 4 0 1
Henrik,
the methods you use are NOT applicable to directed graphs; quite the
contrary, they will split up what you want to put together. In
your data, an author never cites himself. Hence, A and B are far more
different than B and D according to the techniques you use.
Please check out Etiennes
Henrik,
Given your initial matrix, that should tell you which authors are
similar/dissimilar to which other authors in terms of which authors they
cite. In this case authors 1 and 3 are most similar because they both
cite authors 2 and 4. Authors 2 and 3 are most different because they
Dave,
I used daisy with the default settings (daisy(M) where M is the matrix).
Henrik
On 11 June 2010 21:57, Dave Roberts wrote:
> Henrik,
>
>The clustering algorithms you refer to (and almost all others) expect
> the matrix to be symmetric. They do not seek a graph-theoretic solution,
>
Henrik,
The clustering algorithms you refer to (and almost all others)
expect the matrix to be symmetric. They do not seek a graph-theoretic
solution, but rather proximity in geometric or topological space.
How did you convert your matrix to a dissimilarity?
Dave Roberts
Henrik Al
On 11/06/2010 12:45, Henrik Aldberg wrote:
I have a directed graph which is represented as a matrix on the form
0 4 0 1
6 0 0 0
0 1 0 5
0 0 4 0
Each row corresponds to an author (A, B, C, D) and the values say how many
times this author has cited the other authors. Hence the first ro
I have a directed graph which is represented as a matrix on the form
0 4 0 1
6 0 0 0
0 1 0 5
0 0 4 0
Each row corresponds to an author (A, B, C, D) and the values say how many
times this author has cited the other authors. Hence the first row says
that author A has cited author B four time
Ah OK, I didn't get your question then.
a dist-object is actually a vector of numbers with a couple of attributes.
You can't just cut out values like that. The hclust function needs a perfect
distance matrix to use the calculations.
shortcut is easy: just do f <- f/(2*max(f)), and all values are b
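A small illustration of that point, plus one safe way to rescale a dist object without breaking it (a sketch; dividing by the maximum is just one possible rescaling):

```r
d <- dist(USArrests[1:5, ])
length(unclass(d))     # 10 numbers: the choose(5, 2) pairwise distances
attr(d, "Size")        # 5 observations
## elementwise arithmetic keeps the attributes, so hclust() still works:
d01 <- d / max(d)      # all dissimilarities now in (0, 1]
hc  <- hclust(d01)
```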
I can't run your code.
Please, just give me whatever comes on your screen when you run:
dput(q)
On Fri, May 28, 2010 at 10:57 PM, Ayesha Khan
wrote:
> I assume my matrix should look something like this?..
>
> >round(distance, 4)
>P00A P00B M02A M02B P04A P04B M06A M06B P0
I assume my matrix should look something like this?..
>round(distance, 4)
P00A P00B M02A M02B P04A P04B M06A M06B P08A
P08B M10A
P00B 0.9678
M02A 1.0054 1.0349
M02B 1.0258 1.0052 1.2106
P04A 1.0247 0.9928 1.0145 0.9260
P04B 0.9898 0.9769 0.9875 0.9855 0.6075
M06A 1.0159 0.
v <- dput(x,"sampledata.txt")
dim(v)
q <- v[1:10,1:10]
f =as.matrix(dist(t(q)))
distB=NULL
for(k in 1:(nrow(f)-1)) for( m in (k+1):ncol(f)) {
if(f[k,m] <2) distB=rbind(distB,c(k,m,f[k,m]))
}
#now distB looks like this
> distB
[,1] [,2] [,3]
[1,]    1    2 1.6275568
[2,]    1    3 0
Yes Joris. I did try that and it does produce the results. I am now
wondering why I wanted a matrix like structure in the first place. However,
I do want 'f' to contain values less than 2 only. but when i try to get rid
of values greater than 2 by doing N <- (f[f<2], f strcuture disrupts and
hclust
errr, forget about the output of dput(q), but keep it in mind for next time.
f = dist(t(q))
hclust(f,method="single")
it's as simple as that.
Cheers
Joris
On Fri, May 28, 2010 at 10:39 PM, Ayesha Khan
wrote:
> v <- dput(x,"sampledata.txt")
> dim(v)
> q <- v[1:10,1:10]
> f =as.matrix(dist(t(q)))
Hi Ayesha,
I wish to help you, but without a simple self contained example that shows
your issue, I will not be able to help.
Try using the ?dput command to create some simple data, and let us see what
you are doing.
Best,
Tal
Contact
Details:---
Thanks Tal & Joris!
I created my distance matrix distA by using the dist() function in R
manipulating my output in order to get a matrix.
distA =as.matrix(dist(t(x2))) # x2 being my original dataset
as according to the documentaion on dist()
For the default method, a "dist" object, or a matrix (of
As Tal said.
Next to that, I read that column1 (and column2?) are supposed to be seen as
factors, not as numerical variables. Did you take that into account somehow?
It's easy to reproduce the error code :
> n <- NULL
> if(n<2)print("This is OK")
Error in if (n < 2) print("This is OK") : argument
Hi Ayesha,
hclust is a way to go (much better then trying to invent the wheel here).
Please add what you used to create:
distA
And create a sample data set to show us what you did, using
dput
Best,
Tal
Contact
Details:---
Con
i have a matrix with the following dimensions
136 3
and it looks something like
[,1] [,2] [,3]
[1,] 402 675 1.802758
[2,] 402 696 1.938902
[3,] 402 699 1.994253
[4,] 402 945 1.898619
[5,] 424 470 1.812857
[6,] 424 905 1.816345
[7,] 470 905 1.871252
[8,
Dear Paco,
as far as I know, there is no such problem with clara, but I may be wrong.
However, in order to help you (though I'm not sure whether I'll be able to
do that), we'd need to understand precisely what you were doing in R and
what your data looks like (code and data; you can show us a r
Hello everyone
I am trying to use CLARA method for finding clusters in my spatial surface
temperature data and noticed one problem. My data are in the form
lat,lon,temperature. I extract lat,lon and cluster number for each point in
the dataset. When I plotted a map of cluster numbers I found empty
On Wednesday 14 October 2009, Paul Evans wrote:
> Hi,
>
> I just wanted to check whether there is a clustering package available for
> ordinal data. My data looks something like: #1 #2 #3 #4.
> A B C D...
> D B C A...
> D C A A...
> where each column represents a sample, and each row some ordin
Hi,
I just wanted to check whether there is a clustering package available for
ordinal data. My data looks something like:
#1 #2 #3 #4.
A B C D...
D B C A...
D C A A...
where each column represents a sample, and each row some ordinal values. I
would like to cluster such that similar samples
I checked the R procedure HCLUST (hierarchical clustering) but it
looks like it requires a full triangular n x n similarity matrix as
input, where n = number of observations. The number of variables is
200.
My data set has n = 50,000 observations (keywords), and I use ad-hoc
similarity measures, n
How can I cluster and order within part of a previous clustering result?
For example, I am clustering and ordering results as follows:
> rows <- 30
> cols <- 3
> x <- matrix(sample(-1:1,rows*cols,replace=T), nrow=rows,
> ncol=cols,dimnames=list(c(paste("R",1:rows,sep="")),c(paste("C",1:cols,sep=
Hi there,
I'm travelling right now so I can't really check this but it seems that
the problem is that cluster.stats needs a partition as input. hclust
doesn't give you a partition but you can generate one from it using
cutree.
BTW, rather use "<-" than "=".
Best wishes,
Christian
On Wed, 1
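A sketch of that fix (USArrests replaces the bupa data; the fpc call is left commented since fpc may not be installed):

```r
d  <- dist(USArrests)
hc <- hclust(d)            # a tree, not a partition
part <- cutree(hc, k = 3)  # cutree() produces the partition cluster.stats() needs
table(part)
## with fpc installed:
## fpc::cluster.stats(d, part)$dunn
```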
Hello,
I am using the Dunn metric, but something is wrong and I don't understand
what this error means. Please can you help me with this?
The instructions are:
# Dunn index
disbupa=dist(bupa[,1:6])
a=hclust(disbupa)
cluster.stats(disbupa,a,bupa[,7])$dunn
And the error is:
Erro
I don't have any experience with your particular problem, but the thing I
notice about mahalanobis is that by default you specify a covariance
matrix, and it uses solve to calculate its inverse. If you could supply the
inverse covariance matrix (and specify inverted=TRUE to mahalanobis), that
mi
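A sketch of the suggested call (toy data; the saving comes from inverting the covariance once and passing inverted = TRUE, instead of letting every call run solve() itself):

```r
set.seed(1)
X     <- matrix(rnorm(100 * 5), ncol = 5)   # toy stand-in for the 6525x17 matrix
S_inv <- solve(cov(X))                      # invert once, reuse everywhere
d2 <- mahalanobis(X, center = colMeans(X), cov = S_inv, inverted = TRUE)
## same result as the default call, which computes the inverse internally:
all.equal(d2, mahalanobis(X, colMeans(X), cov(X)))
```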
Dear R ExpeRts,
I'm having memory difficulties using mahalanobis distance to trying to cluster
in R. I was wondering if anyone has done it with a matrix of 6525x17 (or
something similar to that size). I have a matrix of 6525 genes and 17 samples.
I have my R memory increased to the max and am
It would help a lot if you told us what the error message was, and provided
some data to work with. As it is, we can't even run the function to find
out what goes wrong.
And also, OS, version of R - all that stuff that the posting guide requests.
Sarah
On Sat, Nov 8, 2008 at 10:31 AM, Bryan Rich
I am new to R and have written a function that clusters on subsets of a big
data data set with 60,000 points. I am not sure why, but I keep getting a
run-time error. Any suggestions would be greatly appreciated.
Here is the code:
library(cba)
d<-read.csv("data.csv", header=TRUE)
v<-c(53,54,
Hi all.
I have a large microarray data set that I would like to analyze using
hierarchical clustering. The problem is when I use the command below,
> hc <- hclust(dist(array), "ave")
I get this feedback...
Error in as.vector(x, mode) :
cannot coerce type 'closure' to vector of type 'any'
Can som