Re: K Means Clustering Explanation

2018-03-04 Thread Alessandro Solimando
…each of the ten clusters. On Thu, Mar 1, 2018 2:36 PM, Christoph Brücke carabo...@gmail.com wrote: Hi Matt, the clusters are defined by their centroids / cluster centers. All the points belonging to a certain cluster…

Re: K Means Clustering Explanation

2018-03-02 Thread Matt Hicks
…the whole cluster. Can you be a little bit more specific about your use-case? Best, Christoph. On 01.03.2018 20:53, "Matt Hicks" wrote: I'm using K Means clustering for a project right now, and it's working very well. However, I'd like to determine from the clusters what info…

Re: K Means Clustering Explanation

2018-03-02 Thread Alessandro Solimando
…what I typically do is to convert the cluster centers back to the original input format or, if that is not possible, use the point nearest to the cluster center as a representation of the whole cluster. Can you be a little bit more specific about your use-cas…
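The nearest-point-to-centroid idea quoted here can be sketched in plain Python. The data and helper names below are made up for illustration; this is not Spark API code:

```python
# For each cluster, pick the data point closest to its centroid as a
# human-readable representative of the whole cluster.
# Plain-Python sketch with toy data, not Spark code.

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def representatives(points, assignments, centers):
    """Map each cluster index to the point nearest its center."""
    best = {}
    for p, c in zip(points, assignments):
        d = sq_dist(p, centers[c])
        if c not in best or d < best[c][0]:
            best[c] = (d, p)
    return {c: p for c, (_, p) in best.items()}

points = [(0.0, 0.0), (0.2, 0.2), (9.8, 10.1), (10.0, 10.0)]
assignments = [0, 0, 1, 1]
centers = [(0.05, 0.05), (9.9, 10.0)]
print(representatives(points, assignments, centers))
# {0: (0.0, 0.0), 1: (10.0, 10.0)}
```

On Spark you would do the same per-cluster argmin with an aggregate over the RDD/DataFrame instead of a Python loop.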

Re: K Means Clustering Explanation

2018-03-01 Thread Christoph Brücke
…a representation of the whole cluster. Can you be a little bit more specific about your use-case? Best, Christoph. On 01.03.2018 20:53, "Matt Hicks" wrote: I'm using K Means clustering for a project right now, and it's working very well.

K Means Clustering Explanation

2018-03-01 Thread Matt Hicks
I'm using K Means clustering for a project right now, and it's working very well.  However, I'd like to determine from the clusters what information distinctions define each cluster so I can explain the "reasons" data fits into a specific cluster. Is there a proper way to do this in Spark ML?

Re: K means clustering in spark

2015-12-31 Thread Yanbo Liang
Hi Anjali, the main output of KMeansModel is clusterCenters, which is an Array[Vector]. It has k elements, where k is the number of clusters, and each element is the center of the corresponding cluster. Yanbo. 2015-12-31 12:52 GMT+08:00: Hi, I am trying to use kmeans for clustering in spark using…
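Since the model boils down to its k center vectors, assigning a point is just a nearest-center lookup. A plain-Python sketch with hypothetical hard-coded centers (not the MLlib API itself):

```python
# clusterCenters is conceptually a list of k center vectors; predicting a
# point's cluster is a nearest-center search. Centers here are made up.

centers = [[0.0, 0.0, 0.0], [9.0, 9.0, 9.0]]  # k = 2 centers, 3-d data

def predict(point):
    """Return the index of the closest center (squared Euclidean)."""
    dists = [sum((p - c) ** 2 for p, c in zip(point, ctr))
             for ctr in centers]
    return dists.index(min(dists))

print(predict([0.2, 0.1, 0.0]))  # 0
print(predict([8.7, 9.3, 9.1]))  # 1
```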

K means clustering in spark

2015-12-30 Thread anjali . gautam09
Hi, I am trying to use kmeans for clustering in spark using python. I implemented it on the sample data set that ships with Spark; it's a 3*4 matrix. Can anybody please help me with how the data should be oriented for kmeans? Also, how can I find out what the clusters and their members are? Thanks, A…

Distance Calculation in Spark K means clustering

2015-08-30 Thread ashensw
Hi all, I am currently working on a K means clustering project. I want to get the distance of each data point to its cluster center after building the K means model. Currently I get the cluster center of each data point by sending the JavaRDD which includes all the data points to K…
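Once the centers and each point's assignment are in hand, the distances asked about here are one line of vector math. A plain-Python sketch with made-up data (not Spark code):

```python
import math

# Distance of every point to its assigned cluster center, plus the
# within-set sum of squared errors (WSSSE). Toy data for illustration.

points      = [[0.0, 0.0], [1.0, 0.0], [10.0, 10.0]]
assignments = [0, 0, 1]
centers     = [[0.5, 0.0], [10.0, 10.0]]

def dist(p, c):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, c)))

dists = [dist(p, centers[c]) for p, c in zip(points, assignments)]
wssse = sum(d * d for d in dists)
print(dists)   # [0.5, 0.5, 0.0]
print(wssse)   # 0.5
```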

Spark Taking too long on K-means clustering

2015-08-27 Thread masoom alam
Hi everyone, I am trying to run the KDD data set - basically chapter 5 of the Advanced Analytics with Spark book. The data set is 789MB, but Spark is taking some 3 to 4 hours. Is this normal behaviour, or is some tuning required? The server RAM is 32 GB, but we can only give 4 GB RAM on 64-bit Ub…

Re: Settings for K-Means Clustering in Mlib for large data set

2015-06-23 Thread Xiangrui Meng
…in process_request: self.finish_request(request, client_address); File "/usr/lib64/python2.6/SocketServer.py", line 322, in finish_request: self.RequestHandlerClass(request, client_address, self); File…

Re: Settings for K-Means Clustering in Mlib for large data set

2015-06-19 Thread Rogers Jeffrey
…SocketServer.py", line 617, in __init__: self.handle(); File "/root/spark/python/pyspark/accumulators.py", line 235, in handle: num_updates = read_int(self.rfile); File "/root/spark/python/pyspark/s…

Re: Settings for K-Means Clustering in Mlib for large data set

2015-06-18 Thread Rogers Jeffrey
…read_int: raise EOFError. EOFError … Py4JNetworkError Traceb…

Re: Settings for K-Means Clustering in Mlib for large data set

2015-06-18 Thread Xiangrui Meng
…raise EOFError. EOFError … Py4JNetworkError Traceback (most recent call last) in (): 1 model = KM…

Settings for K-Means Clustering in Mlib for large data set

2015-06-18 Thread Rogers Jeffrey
…(most recent call last) in (): 1 model = KMeans.train(data, 1000, initializationMode="k-means||"); /root/spark/python/pyspark/mllib/clustering.pyc in train(cls, rdd, k, maxIterations, runs, initializationMode, seed, initializationSteps, epsilon)…

Announcement: Generalized K-Means Clustering on Spark

2015-01-25 Thread derrickburns
http://apache-spark-user-list.1001560.n3.nabble.com/Announcement-Generalized-K-Means-Clustering-on-Spark-tp21363.html Sent from the Apache Spark User List mailing list archive at Nabble.com.

Re: k-means clustering

2014-11-25 Thread Yanbo Liang
Pre-processing is the major workload before training a model. MLlib provides TF-IDF calculation, StandardScaler and Normalizer, which are essential for preprocessing and a great help for model training. Take a look at http://spark.apache.org/docs/latest/mllib-feature-extraction.html 2014-11…
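The TF-IDF step Yanbo points to reduces to tf(t, d) · log(N / df(t)). A minimal pure-Python sketch of the concept follows; it is not MLlib's HashingTF/IDF implementation, whose exact IDF formula and hashing scheme differ:

```python
import math

# Minimal TF-IDF: term frequency times log inverse document frequency.
# Toy corpus for illustration only.

docs = [["spark", "kmeans", "cluster"],
        ["spark", "tfidf"],
        ["cluster", "center", "cluster"]]

N = len(docs)
df = {}                      # document frequency of each term
for doc in docs:
    for term in set(doc):
        df[term] = df.get(term, 0) + 1

def tfidf(doc):
    """TF-IDF weight for each distinct term in one document."""
    return {t: doc.count(t) * math.log(N / df[t]) for t in set(doc)}

weights = tfidf(docs[2])
print(weights["cluster"])    # 2 * ln(3/2)
```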

Re: K-means clustering

2014-11-25 Thread Xiangrui Meng
There is a simple example here: https://github.com/apache/spark/blob/master/examples/src/main/python/kmeans.py . You can take advantage of sparsity by computing the distance via inner products: http://spark-summit.org/2014/talk/sparse-data-support-in-mllib-2 -Xiangrui On Tue, Nov 25, 2014 at 2:39
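The inner-product trick Xiangrui links to rests on the identity ||x - c||² = ||x||² - 2·x·c + ||c||²: the squared norms can be precomputed once, leaving only a sparse dot product per point-center pair. A sketch with dict-based sparse vectors (toy data, not MLlib's internal representation):

```python
# Squared distance between a sparse point and a center using
# ||x - c||^2 = ||x||^2 - 2 x.c + ||c||^2, touching only x's nonzeros
# for the dot product. Hypothetical toy data.

def sq_norm_sparse(v):
    return sum(x * x for x in v.values())

def sq_dist_sparse(x, x_sq, c, c_sq):
    dot = sum(v * c.get(i, 0.0) for i, v in x.items())
    return x_sq - 2.0 * dot + c_sq

x = {45: 1.0, 413: 1.0}      # sparse point: index -> value
c = {45: 0.5}                # sparse-ish center
x_sq, c_sq = sq_norm_sparse(x), sq_norm_sparse(c)
print(sq_dist_sparse(x, x_sq, c, c_sq))  # (1-0.5)^2 + 1^2 = 1.25
```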

K-means clustering

2014-11-25 Thread amin mohebbi
I have generated a sparse matrix with python, which has the size of 4000*174000 (.pkl); the following is a small part of this matrix: (0, 45) 1  (0, 413) 1  (0, 445) 1  (0, 107) 4  (0, 80) 2  (0, 352) 1  (0, 157) 1  (0, 191) 1  (0, 315) 1  (0, 395) 4  (0, 282) 3  (0, 184) 1  (0, 403) 1  (0, 1…

Re: k-means clustering

2014-11-20 Thread Jun Yang
Guys, as to the question of pre-processing, you could just migrate your logic to Spark before using K-means. I have only used Scala on Spark, and haven't used the Python binding, but I think the basic steps must be the same. BTW, if your data set is big, with huge sparse-dimension feature vector…

k-means clustering

2014-11-18 Thread amin mohebbi
Hi there, I would like to do "text clustering" using k-means and Spark on a massive dataset. As you know, before running the k-means, I have to do pre-processing methods such as TFIDF and NLTK on my big dataset. The following is my code in python: if __name__ == '__main__': # Clus…

Re: Categorical Features for K-Means Clustering

2014-09-16 Thread Aris
View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Categorical-Features-for-K-Means-Clustering-tp9416p14394.html

Re: Categorical Features for K-Means Clustering

2014-09-16 Thread Sean Owen
…apache-spark-user-list.1001560.n3.nabble.com/Categorical-Features-for-K-Means-Clustering-tp9416p14394.html

Re: Categorical Features for K-Means Clustering

2014-09-16 Thread st553
Does MLlib provide utility functions to do this kind of encoding? -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Categorical-Features-for-K-Means-Clustering-tp9416p14394.html
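The encoding this thread converges on is one-hot: turn each categorical value into a 0/1 indicator vector so k-means can treat it numerically. A small pure-Python sketch with made-up category values (not an MLlib utility):

```python
# One-hot encode a categorical column so it can be fed to k-means
# alongside numeric features. Plain-Python sketch, hypothetical data.

def one_hot(values):
    """Map each distinct category to a 0/1 indicator vector."""
    categories = sorted(set(values))
    index = {cat: i for i, cat in enumerate(categories)}
    vecs = []
    for v in values:
        vec = [0.0] * len(categories)
        vec[index[v]] = 1.0
        vecs.append(vec)
    return categories, vecs

cats, encoded = one_hot(["red", "blue", "red", "green"])
print(cats)        # ['blue', 'green', 'red']
print(encoded[0])  # [0.0, 0.0, 1.0]
```

Note that with Euclidean k-means, all one-hot columns implicitly get equal weight; scaling them relative to the numeric features is a separate judgment call.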

Re: Speeding up K-Means Clustering

2014-07-17 Thread Xiangrui Meng
…g. -Xiangrui. On Thu, Jul 17, 2014 at 1:48 AM, Ravishankar Rajagopalan wrote: I am trying to use MLlib for K-Means clustering on a data set with 1 million rows and 50 columns (all columns have double values) which is on HDFS…

Re: Speeding up K-Means Clustering

2014-07-17 Thread Ravishankar Rajagopalan
…I am trying to use MLlib for K-Means clustering on a data set with 1 million rows and 50 columns (all columns have double values) which is on HDFS (raw txt file is 28 MB). I initially tried the following: val data3 = sc.textFil…

Re: Speeding up K-Means Clustering

2014-07-17 Thread Xiangrui Meng
…1:48 AM, Ravishankar Rajagopalan wrote: I am trying to use MLlib for K-Means clustering on a data set with 1 million rows and 50 columns (all columns have double values) which is on HDFS (raw txt file is 28 MB). I initially tried the following: val data…

Speeding up K-Means Clustering

2014-07-17 Thread Ravishankar Rajagopalan
I am trying to use MLlib for K-Means clustering on a data set with 1 million rows and 50 columns (all columns have double values) which is on HDFS (raw txt file is 28 MB) I initially tried the following: val data3 = sc.textFile("hdfs://...inputData.txt") val parsedData3 = data3.map( _
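The truncated `data3.map( _` step above is the usual parse stage: splitting each text line into a numeric vector before training. A plain-Python sketch of the same transformation, with stand-in lines instead of the HDFS file:

```python
# Turn whitespace-separated numeric lines (as read from a text file)
# into float vectors, the usual prelude to k-means training.
# Stand-in data instead of the HDFS file; not Spark code.

lines = ["0.0 0.1 0.2", "9.0 9.1 9.2"]

vectors = [[float(x) for x in line.split()] for line in lines]
print(vectors)  # [[0.0, 0.1, 0.2], [9.0, 9.1, 9.2]]
```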

Re: Categorical Features for K-Means Clustering

2014-07-11 Thread Wen Phan
…recommendations on incorporating categorical features (attributes) into k-means clustering in Spark? In other words, I want to cluster on a set of attributes that include categorical variables. I know I could probably implement some custom code to parse and calculate my…

Re: Categorical Features for K-Means Clustering

2014-07-11 Thread Sean Owen
…Jul 11, 2014 at 3:07 PM, Wen Phan wrote: Hi Folks, does anyone have experience or recommendations on incorporating categorical features (attributes) into k-means clustering in Spark? In other words, I want to cluster on a set of attributes that include categorical var…

Categorical Features for K-Means Clustering

2014-07-11 Thread Wen Phan
Hi Folks, does anyone have experience or recommendations on incorporating categorical features (attributes) into k-means clustering in Spark? In other words, I want to cluster on a set of attributes that include categorical variables. I know I could probably implement some custom code to