Re: refer to dictionary

2015-03-31 Thread Peng Xia
> On Mar 31, 2015, at 4:43 AM, Peng Xia wrote: > Hi, > I have an RDD (rdd1) where each line is split into an array ["a", "b", "c"], etc. > And I also have a local dictionary (dict1) that stores

refer to dictionary

2015-03-31 Thread Peng Xia
Hi, I have an RDD (rdd1) where each line is split into an array ["a", "b", "c"], etc. And I also have a local dictionary (dict1) that stores the key-value pairs {"a": 1, "b": 2, "c": 3}. I want to replace the keys in the RDD with their corresponding values in the dict: rdd1.map(lambda line: [dict1[item] for item
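
A minimal sketch of one way to finish that map, assuming a live SparkContext; shipping the small dict to the executors as a broadcast variable is the usual pattern (rdd1 and dict1 are the names from the question; the sample data is illustrative):

    from pyspark import SparkContext

    sc = SparkContext(appName="dict-lookup")

    # Each element of rdd1 is a list of keys, e.g. ["a", "b", "c"].
    rdd1 = sc.parallelize([["a", "b", "c"], ["b", "c"]])
    dict1 = {"a": 1, "b": 2, "c": 3}

    # Broadcast the small local dict: each executor gets one read-only copy
    # instead of shipping the dict inside every task closure.
    bdict1 = sc.broadcast(dict1)

    rdd2 = rdd1.map(lambda line: [bdict1.value[item] for item in line])
    print(rdd2.collect())  # [[1, 2, 3], [2, 3]]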

Re: spark there is no space on the disk

2015-03-31 Thread Peng Xia
In Spark 1.0 and later this will be overridden by SPARK_LOCAL_DIRS (Standalone, Mesos) or LOCAL_DIRS (YARN). > On Sat, Mar 14, 2015 at 5:29 PM, Peng Xia wrote: > Hi Sean, > Thanks very much for your reply.

Re: spark there is no space on the disk

2015-03-14 Thread Peng Xia
And I have 2 TB free space on the C drive. On Sat, Mar 14, 2015 at 8:29 PM, Peng Xia wrote: > Hi Sean, > Thanks very much for your reply. > I tried to configure it with the code below: > sf = SparkConf().setAppName("test").set("spark.executor.memory",
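
The quoted configuration is cut off by the archive; below is a hedged reconstruction of that kind of setup. The memory figure and the scratch path are hypothetical stand-ins, not values from the thread:

    from pyspark import SparkConf, SparkContext

    sf = (SparkConf()
          .setAppName("test")
          # Hypothetical value; the actual figure is truncated in the archive.
          .set("spark.executor.memory", "8g")
          # Point shuffle/spill scratch space at the large drive (illustrative
          # path). Note: in Spark 1.0+ spark.local.dir is overridden by
          # SPARK_LOCAL_DIRS (Standalone, Mesos) or LOCAL_DIRS (YARN) if those
          # are set in the workers' environment.
          .set("spark.local.dir", "D:/spark/tmp"))
    sc = SparkContext(conf=sf)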

Re: spark there is no space on the disk

2015-03-14 Thread Peng Xia
Set spark.local.dirs to something more appropriate and larger. > On Sat, Mar 14, 2015 at 2:10 AM, Peng Xia wrote: > Hi, > I was running a logistic regression algorithm on an 8-node Spark cluster; > each node has 8 cores and 56 GB RAM (each node

spark there is no space on the disk

2015-03-13 Thread Peng Xia
Hi, I was running a logistic regression algorithm on an 8-node Spark cluster; each node has 8 cores and 56 GB RAM (each node is running a Windows system), and the drive Spark is installed on has 1.9 TB capacity. The dataset I was training on has around 40 million records with around 6600 features

Re: error on training with logistic regression sgd

2015-03-10 Thread Peng Xia
algorithm in Python. 3. Train a logistic regression model with the converted labeled points. Can anyone give some advice on how to avoid the 2 GB limit, if this is the cause? Thanks very much for the help. Best, Peng. On Mon, Mar 9, 2015 at 3:54 PM, Peng Xia wrote: > Hi, > I was launching
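
The "2 GB" here is presumably Spark's per-partition block-size cap (blocks are limited to Integer.MAX_VALUE bytes). Under that assumption, one common workaround is to raise the partition count so no single partition approaches 2 GB; a sketch with a toy RDD standing in for the real data:

    from pyspark import SparkContext

    sc = SparkContext(appName="avoid-2gb-blocks")

    # Toy stand-in for the real labeled-point RDD.
    points = sc.parallelize(range(1000000))

    # More partitions -> smaller partitions, so each partition (and therefore
    # each shuffle/cache block) stays well under the ~2 GB cap.
    points = points.repartition(64).cache()
    print(points.getNumPartitions())  # 64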

error on training with logistic regression sgd

2015-03-09 Thread Peng Xia
Hi, I was launching a Spark cluster with 4 worker nodes; each worker node has 8 cores and 56 GB RAM, and I was testing my logistic regression problem. The training set is around 1.2 million records. When I was using 2**10 (1024) features, the whole program works fine, but when I use 2**14 features
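
The 2**10 vs 2**14 numbers suggest the features are hashed into a fixed-width vector. The thread doesn't show how that was done, so the following sketch using MLlib's HashingTF is an assumption about the setup, with toy data standing in for the 1.2 million records:

    from pyspark import SparkContext
    from pyspark.mllib.feature import HashingTF
    from pyspark.mllib.regression import LabeledPoint
    from pyspark.mllib.classification import LogisticRegressionWithSGD

    sc = SparkContext(appName="hashed-lr")

    # Toy stand-in for the training set: (label, token list) pairs.
    raw = sc.parallelize([(1.0, ["a", "b"]), (0.0, ["b", "c"])] * 100)

    # Widening numFeatures from 2**10 to 2**14 grows every vector 16x.
    htf = HashingTF(numFeatures=2 ** 14)
    points = raw.map(lambda lt: LabeledPoint(lt[0], htf.transform(lt[1])))
    model = LogisticRegressionWithSGD.train(points.cache(), iterations=10)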

Re: issue on applying SVM to 5 million examples.

2014-10-30 Thread peng xia
Thanks Jimmy, I will give it a try. Thanks very much for all your help. Best, Peng. On Thu, Oct 30, 2014 at 8:19 PM, Jimmy wrote: > sampleRDD.cache() > Sent from my iPhone > On Oct 30, 2014, at 5:01 PM, peng xia wrote: > Hi Xiangrui, > Can you give me
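
Jimmy's one-liner in context: MLlib's SGD trainers make repeated passes over the input, so an uncached RDD is recomputed on every iteration. A sketch under that reading, assuming sampleRDD (the thread's name for the training RDD of LabeledPoints) already exists:

    from pyspark.mllib.classification import SVMWithSGD

    # Cache (or persist) the training RDD before handing it to an iterative
    # MLlib trainer; otherwise every SGD pass recomputes the full lineage.
    sampleRDD.cache()
    model = SVMWithSGD.train(sampleRDD, iterations=100)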

Re: issue on applying SVM to 5 million examples.

2014-10-30 Thread peng xia
> On Thu, Oct 30, 2014 at 11:44 AM, peng xia wrote: > Thanks for all your help. > I think I didn't cache the data. My previous cluster expired and I didn't > have a chance to check the load balancing or the app manager. > Below is my code.

Re: issue on applying SVM to 5 million examples.

2014-10-30 Thread peng xia
> On Thu, Oct 30, 2014 at 9:13 AM, Jimmy wrote: > Watch the app manager; it should tell you what's running and taking > a while... > My guess is it's a "distinct" function on the data. > J > Sent from my iPhone > On Oct 30,

issue on applying SVM to 5 million examples.

2014-10-30 Thread peng xia
Hi, previously we applied the SVM algorithm in MLlib to 5 million records (600 MB); it takes more than 25 minutes to finish. The Spark version we are using is 1.0, and we were running this program on a 4-node cluster. Each node has 4 CPU cores and 11 GB RAM. The 5 million records only have two d
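
For reference, a minimal end-to-end sketch of the kind of MLlib (Spark 1.0-era) SVM run described here, with toy data standing in for the 5 million records; the cache() call is the suggestion that surfaces in the replies above:

    from pyspark import SparkContext
    from pyspark.mllib.regression import LabeledPoint
    from pyspark.mllib.classification import SVMWithSGD

    sc = SparkContext(appName="svm-5m")

    # Toy stand-in for the real 5M-record (600 MB) dataset.
    data = sc.parallelize([LabeledPoint(1.0, [1.0, 0.0]),
                           LabeledPoint(0.0, [0.0, 1.0])] * 1000)

    data.cache()  # avoid recomputing the input on every SGD pass
    model = SVMWithSGD.train(data, iterations=100)
    print(model.predict([1.0, 0.0]))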