On Mar 31, 2015, at 4:43 AM, Peng Xia wrote:
Hi,
I have an RDD (rdd1) where each line is split into an array ["a", "b", "c"],
etc.
And I also have a local dictionary (dict1) that stores key-value pairs {"a": 1,
"b": 2, "c": 3}.
I want to replace each key in the RDD with its corresponding value in
the dict:
rdd1.map(lambda line: [dict1[item] for item in line])
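The lookup itself is plain Python once the dict is in scope; a minimal runnable sketch of the core transformation (the sample data below is illustrative; on a real cluster the same lambda would be passed to rdd1.map):

```python
dict1 = {"a": 1, "b": 2, "c": 3}

# Stand-in for rdd1's contents; in PySpark this would be an RDD of lists.
lines = [["a", "b", "c"], ["c", "a"]]

# The same comprehension that rdd1.map(...) would apply to each element.
mapped = [[dict1[item] for item in line] for line in lines]
print(mapped)  # [[1, 2, 3], [3, 1]]
```

If dict1 is large, broadcasting it first (`b = sc.broadcast(dict1)`, then indexing `b.value` inside the lambda) avoids re-shipping the dictionary with every task.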
>> > In Spark 1.0 and later this will be overridden by
>> > SPARK_LOCAL_DIRS (Standalone, Mesos) or LOCAL_DIRS (YARN)
>> >
>> > On Sat, Mar 14, 2015 at 5:29 PM, Peng Xia
>> wrote:
>> >> Hi Sean,
>> >>
>> >> Thank you very much for your reply.
And I have 2 TB free space on the C drive.
On Sat, Mar 14, 2015 at 8:29 PM, Peng Xia wrote:
> Hi Sean,
>
> Thanks very much for your reply.
> I tried to config it from below code:
>
> sf = SparkConf().setAppName("test").set("spark.executor.memory", …)
> spark.local.dir to something more
> appropriate and larger.
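One concrete way to do that, assuming a standalone deployment like the one described (the directory path below is purely an example; pick any location on the larger drive), is `conf/spark-defaults.conf` rather than code, since on Spark 1.0+ the `SPARK_LOCAL_DIRS` / `LOCAL_DIRS` environment variables can override what is set programmatically:

```
# conf/spark-defaults.conf -- the path is an example only
spark.local.dir    D:/spark-tmp
```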
>
On Sat, Mar 14, 2015 at 2:10 AM, Peng Xia wrote:
Hi
I was running a logistic regression algorithm on an 8-node Spark cluster;
each node has 8 cores and 56 GB RAM (each node is running a Windows
system), and the Spark installation drive has 1.9 TB capacity. The dataset
I was training on has around 40 million records with around 6600
features
algorithm in python.
3. train a logistic regression model with the converted labeled points.
Can anyone give some advice on how to avoid the 2 GB limit, if this is the cause?
Thanks very much for the help.
Best,
Peng
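For context on the 2 GB question above: Spark 1.x cannot ship or cache a single block larger than 2 GB (`Integer.MAX_VALUE` bytes), so the usual workaround is enough partitions that each one stays well below that cap. A back-of-the-envelope helper, assuming a 256 MB-per-partition rule of thumb (both the target size and the example dataset size are illustrative):

```python
import math

INT_MAX_BYTES = 2**31 - 1  # Spark 1.x's per-block ceiling

def min_partitions(total_bytes, target_partition_bytes=256 * 1024 * 1024):
    """Smallest partition count that keeps each partition near the target size."""
    assert target_partition_bytes < INT_MAX_BYTES
    return max(1, math.ceil(total_bytes / target_partition_bytes))

# e.g. repartitioning a ~600 GB dataset:
print(min_partitions(600 * 1024**3))  # 2400
```

The resulting count would then be passed to something like `rdd.repartition(n)` before training.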
On Mon, Mar 9, 2015 at 3:54 PM, Peng Xia wrote:
Hi,
I was launching a Spark cluster with 4 worker nodes; each worker node has
8 cores and 56 GB RAM, and I was testing my logistic regression problem.
The training set is around 1.2 million records. When I was using 2**10
(1024) features, the whole program works fine, but when I use 2**14
features
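Feature counts like 2**10 vs 2**14 suggest hashed feature vectors (e.g. MLlib's HashingTF); a minimal pure-Python sketch of the hashing trick, assuming that is how the vectors were built (the function name and sample tokens are illustrative):

```python
def hashed_features(tokens, num_features=2**14):
    """Map arbitrary tokens into a fixed-width count vector via hashing."""
    vec = [0.0] * num_features
    for t in tokens:
        vec[hash(t) % num_features] += 1.0  # collisions simply add up
    return vec

vec = hashed_features(["cat", "dog", "cat"], num_features=2**10)
# len(vec) == 1024; sum(vec) == 3.0 (one increment per token)
```

A larger num_features reduces hash collisions but multiplies the size of each dense vector, which is why going from 2**10 to 2**14 grows per-record memory 16x.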
Thanks Jimmy.
I will have a try.
Thanks very much for all your help.
Best,
Peng
On Thu, Oct 30, 2014 at 8:19 PM, Jimmy wrote:
> sampleRDD.cache()
>
> Sent from my iPhone
>
> On Oct 30, 2014, at 5:01 PM, peng xia wrote:
>
> Hi Xiangrui,
>
> Can you give me
>
> On Thu, Oct 30, 2014 at 11:44 AM, peng xia wrote:
> > Thanks for all your help.
> > I think I didn't cache the data. My previous cluster has expired, and I
> > didn't have a chance to check the load balancing or the app manager.
> > Below is my code.
> > On Thu, Oct 30, 2014 at 9:13 AM, Jimmy wrote:
> > Watch the app manager; it should tell you what's running and taking
> > a while...
> > My guess is it's a "distinct" function on the data.
> > J
> >
> > Sent from my iPhone
> >
> > On Oct 30,
Hi,
Previously we applied the SVM algorithm in MLlib to 5 million records (600
MB); it took more than 25 minutes to finish.
The Spark version we are using is 1.0, and we were running this program on a
4-node cluster. Each node has 4 CPU cores and 11 GB RAM.
The 5 million records only have two d