Ability to offer initial coefficients in ml.LogisticRegression

2015-10-20 Thread YiZhi Liu
Hi all, I noticed that in ml.classification.LogisticRegression, users cannot set initial coefficients, while this is supported in mllib.classification.LogisticRegressionWithSGD. Sometimes we know that specific coefficients are close to the final optimum, e.g., we usually pick yesterday's
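For readers following along, a minimal sketch of the asymmetry being described: mllib's LogisticRegressionWithSGD accepts initial weights through its run method, while ml.classification.LogisticRegression in 1.5 exposes no equivalent setter. The data path, feature count, and zero vector below are hypothetical placeholders (the poster's actual use case is seeding with yesterday's coefficients).

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.util.MLUtils

val sc = new SparkContext(new SparkConf().setAppName("initial-weights-sketch"))

// Hypothetical training set in LIBSVM format.
val training = MLUtils.loadLibSVMFile(sc, "hdfs:///data/training.libsvm")

// mllib: the caller can pass initial weights explicitly to run().
val numFeatures = training.first().features.size
val initialWeights = Vectors.dense(Array.fill(numFeatures)(0.0)) // e.g. yesterday's coefficients
val model = new LogisticRegressionWithSGD().run(training, initialWeights)

// ml: org.apache.spark.ml.classification.LogisticRegression has no comparable
// setter, which is what this thread is asking for.
```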

Re: MapStatus too large for driver

2015-10-20 Thread yaoqin
In our case, we are dealing with 20 TB of text data, which is split into about 200k map tasks and 200k reduce tasks, and our driver's memory is 15 GB.

Re: MapStatus too large for driver

2015-10-20 Thread Reynold Xin
How big is your driver heap size? And is there any reason why you'd need 200k map and 200k reduce tasks? On Mon, Oct 19, 2015 at 11:59 PM, yaoqin wrote: > Hi everyone, > > When I run a Spark job that contains quite a lot of tasks (in my case > 200,000 * 200,000), the driver ran into an

MapStatus too large for driver

2015-10-20 Thread yaoqin
Hi everyone, When I run a Spark job that contains quite a lot of tasks (in my case 200,000 * 200,000), the driver ran into an OOM mainly caused by MapStatus objects. As shown in the picture below, the RoaringBitmap used to mark which blocks are empty seems to use too much memory. Are there
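To make the scale concrete: the driver keeps one MapStatus per map task, and each status carries per-reduce-partition information (a RoaringBitmap for empty-block tracking in 1.5), so 200k maps times 200k reducers multiplies quickly. Below is a generic, hypothetical sketch of one mitigation, explicitly requesting fewer reduce partitions in the shuffle call; paths and numbers are illustrative, not the poster's job.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val sc = new SparkContext(new SparkConf().setAppName("mapstatus-scale-sketch"))

// Hypothetical input; 20 TB of text would otherwise produce roughly 200k map splits.
val pairs = sc.textFile("hdfs:///data/huge-input")
  .map(line => (line.take(8), 1L))

// Ask for far fewer reduce partitions than map tasks, so each MapStatus
// (one per map task) tracks 20k partitions instead of 200k.
val counts = pairs.reduceByKey(_ + _, 20000)
counts.saveAsTextFile("hdfs:///data/huge-output")
```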

Re: Spark driver reducing total executors count even when Dynamic Allocation is disabled.

2015-10-20 Thread Saisai Shao
Hi Prakhar, now I understand your problem: you expected that an executor killed by the heartbeat mechanism would be launched again, but it is not. I think this problem is fixed in Spark 1.5; you could check this JIRA: https://issues.apache.org/jira/browse/SPARK-8119 Thanks, Saisai

Re: If you use Spark 1.5 and disabled Tungsten mode ...

2015-10-20 Thread Jerry Lam
I disabled it because of the "Could not acquire 65536 bytes of memory" error, which tends to fail the job. So for now, I'm not touching it. On Tue, Oct 20, 2015 at 4:48 PM, charmee wrote: > We had disabled Tungsten after we found a few performance issues, but had to > enable it back

Re: If you use Spark 1.5 and disabled Tungsten mode ...

2015-10-20 Thread Reynold Xin
Jerry - I think that's been fixed in 1.5.1. Do you still see it? On Tue, Oct 20, 2015 at 2:11 PM, Jerry Lam wrote: > I disabled it because of the "Could not acquire 65536 bytes of memory" error, > which tends to fail the job. So for now, I'm not touching it. > > On Tue, Oct 20,

Re: If you use Spark 1.5 and disabled Tungsten mode ...

2015-10-20 Thread charmee
We had disabled Tungsten after we found a few performance issues, but had to re-enable it because we found that with a large number of group-by fields, the shuffle keeps failing when Tungsten is disabled. Here is an excerpt from one of our engineers with his analysis. With Tungsten Enabled
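For reference, a minimal sketch of the switch being discussed: in Spark 1.5 the Tungsten execution path for SQL/DataFrames is controlled by the spark.sql.tungsten.enabled flag (default true). The app name and per-session override below are illustrative.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext

// Disable Tungsten for the whole application (Spark 1.5; the default is "true").
val conf = new SparkConf()
  .setAppName("tungsten-toggle-sketch")
  .set("spark.sql.tungsten.enabled", "false")
val sc = new SparkContext(conf)

// ...or flip it per SQLContext at runtime.
val sqlContext = new SQLContext(sc)
sqlContext.setConf("spark.sql.tungsten.enabled", "true")
```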

Re: If you use Spark 1.5 and disabled Tungsten mode ...

2015-10-20 Thread Jerry Lam
Hi Reynold, Yes, I'm using 1.5.1. I see them quite often. Sometimes it recovers, but sometimes it does not. For one particular job, it failed all the time with the acquire-memory issue. I'm using Spark on Mesos with fine-grained mode. Does it make a difference? Best Regards, Jerry On Tue, Oct
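For readers unfamiliar with the distinction Jerry raises: the Mesos scheduling mode is chosen with spark.mesos.coarse (fine-grained was still the default in 1.5). Whether the mode matters for the acquire-memory error is exactly the open question here; the sketch below only shows how the mode is selected, with a hypothetical master URL.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("mesos-mode-sketch")
  .setMaster("mesos://zk://mesos-master:2181/mesos") // hypothetical Mesos master URL
  .set("spark.mesos.coarse", "true")                 // coarse-grained; omit or set "false" for fine-grained
val sc = new SparkContext(conf)
```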

BUILD SYSTEM: builds are OOMing the jenkins workers, investigating. also need to reboot amp-jenkins-worker-06

2015-10-20 Thread shane knapp
starting this saturday (oct 17), we began getting alerts on the jenkins workers that various processes were dying (specifically ssh). since then, half of our workers have OOMed due to java processes and we've now had to reboot two of them (-05 and -06). if we look at the current machine

Re: BUILD SYSTEM: builds are OOMing the jenkins workers, investigating. also need to reboot amp-jenkins-worker-06

2015-10-20 Thread shane knapp
amp-jenkins-worker-06 is back up. my next bets are on -07 and -08... :\ https://amplab.cs.berkeley.edu/jenkins/computer/ On Tue, Oct 20, 2015 at 3:39 PM, shane knapp wrote: > here's the related stack trace from dmesg... UID 500 is jenkins. > > Out of memory: Kill process

Re: BUILD SYSTEM: builds are OOMing the jenkins workers, investigating. also need to reboot amp-jenkins-worker-06

2015-10-20 Thread shane knapp
ok, based on the timing, i *think* this might be the culprit: https://amplab.cs.berkeley.edu/jenkins/job/Spark-Master-SBT/AMPLAB_JENKINS_BUILD_PROFILE=hadoop1.0,label=spark-test/3814/console On Tue, Oct 20, 2015 at 3:35 PM, shane knapp wrote: > -06 just kinda came back... >

Fwd: If you use Spark 1.5 and disabled Tungsten mode ...

2015-10-20 Thread Reynold Xin
With Jerry's permission, sending this back to the dev list to close the loop. -- Forwarded message -- From: Jerry Lam Date: Tue, Oct 20, 2015 at 3:54 PM Subject: Re: If you use Spark 1.5 and disabled Tungsten mode ... To: Reynold Xin

Re: BUILD SYSTEM: builds are OOMing the jenkins workers, investigating. also need to reboot amp-jenkins-worker-06

2015-10-20 Thread shane knapp
well, it was -08, and ssh stopped working (according to the alerts) just as i was logging in to kill off any errant processes. i've taken that worker offline in jenkins and will be rebooting it asap. on a positive note, i was able to clear out -07 before anything horrible happened to that one.

Re: MapStatus too large for driver

2015-10-20 Thread yaoqin
I tried to use org.apache.spark.util.collection.BitSet instead of RoaringBitmap, and it saves about 20% of the memory but runs much slower. For the 200K-task job, RoaringBitmap uses 3 Long[1024] and 1 Short[3392] = 3*64*1024 + 16*3392 = 250,880 bits; BitSet uses 1 Long[3125] = 3125*64 = 200,000 bits. Memory
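A small sketch that just re-derives the numbers quoted above for 200,000 reduce partitions; it assumes the container counts yaoqin observed (3 Long[1024] arrays plus 1 Short[3392] array for the RoaringBitmap) and is a back-of-the-envelope check, not a measurement.

```scala
val partitions = 200000

// Dense BitSet: one bit per partition, stored in 64-bit words.
val bitSetWords = math.ceil(partitions / 64.0).toLong // 3125 longs
val bitSetBits  = bitSetWords * 64                    // 200,000 bits

// RoaringBitmap, as observed: 3 Long[1024] containers plus 1 Short[3392] array.
val roaringBits = 3L * 1024 * 64 + 3392L * 16         // 250,880 bits

println(s"BitSet: $bitSetBits bits, RoaringBitmap: $roaringBits bits")
println(f"BitSet saves ${(1.0 - bitSetBits.toDouble / roaringBits) * 100}%.1f%% of the bitmap memory")
```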

Set numExecutors by sparklaunch

2015-10-20 Thread qinggangwa...@gmail.com
Hi all, I want to launch a Spark job on YARN from Java, but it seems there is no way to set numExecutors in the SparkLauncher class. Is there any way to set numExecutors? Thanks qinggangwa...@gmail.com
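One commonly suggested workaround (a sketch, not an authoritative answer): SparkLauncher has no dedicated numExecutors setter, but on YARN the executor count is an ordinary Spark config, so it can be passed through setConf. The jar path, main class, and count of 8 below are hypothetical.

```scala
import org.apache.spark.launcher.SparkLauncher

val process = new SparkLauncher()
  .setAppResource("/path/to/my-app.jar")     // hypothetical application jar
  .setMainClass("com.example.MyApp")         // hypothetical main class
  .setMaster("yarn-cluster")
  .setConf("spark.executor.instances", "8")  // the same setting --num-executors maps to
  .launch()                                  // returns a java.lang.Process

process.waitFor()
```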

Re: Spark driver reducing total executors count even when Dynamic Allocation is disabled.

2015-10-20 Thread prakhar jauhari
Thanks Sai for the input. So the problem is: I start my job with a fixed number of executors, but when a host running my executors becomes unreachable, the driver reduces the total number of executors and never increases it again. I have a repro for the issue, attaching logs: the running Spark job is
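To make the reported setup concrete, a sketch of the configuration prakhar describes: dynamic allocation explicitly off and a fixed executor count requested up front (the count of 10 is illustrative). The report is that the driver's executor target still drops after a host loss and is never raised back, which is the behavior SPARK-8119 addresses.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf()
  .setAppName("fixed-executors-sketch")
  .set("spark.dynamicAllocation.enabled", "false") // dynamic allocation off
  .set("spark.executor.instances", "10")           // the same setting --num-executors maps to
val sc = new SparkContext(conf)
```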