Re: PCA OutOfMemoryError

2016-01-17 Thread Bharath Ravi Kumar
t the SVD of the input matrix to the first; EOF is another name for > PCA). > > This takes about 30 minutes to compute the top 20 PCs of a 46.7K-by-6.3M > dense matrix of doubles (~2 Tb), with most of the time spent on the > distributed matrix-vector multiplies. > > Best, > Al
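The reply above computes PCA via an SVD driven by distributed matrix-vector multiplies. A minimal plain-Python sketch of that inner loop, on a hypothetical toy matrix (real solvers use Lanczos-style iterations rather than naive power iteration, and distribute each matvec across the cluster):

```python
import math

def matvec(A, v):
    return [sum(a * x for a, x in zip(row, v)) for row in A]

def transpose(A):
    return [list(col) for col in zip(*A)]

def top_singular_pair(A, iters=100):
    # Power iteration on A^T A: the same repeated matvec pattern a
    # distributed SVD performs across the cluster, shrunk to one machine.
    At = transpose(A)
    v = [1.0] * len(A[0])
    for _ in range(iters):
        w = matvec(At, matvec(A, v))           # one A^T (A v) step
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    Av = matvec(A, v)
    sigma = math.sqrt(sum(x * x for x in Av))  # top singular value
    return sigma, v

sigma, v = top_singular_pair([[3.0, 0.0], [0.0, 1.0]])
# sigma converges to 3.0 and v to [1.0, 0.0] for this toy matrix
```

Normalizing after every step keeps the iterate bounded, which is why only the matvec itself needs to scale with the data.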

Re: PCA OutOfMemoryError

2016-01-12 Thread Bharath Ravi Kumar
Any suggestion/opinion? On 12-Jan-2016 2:06 pm, "Bharath Ravi Kumar" wrote: > We're running PCA (selecting 100 principal components) on a dataset that > has ~29K columns and is 70G in size stored in ~600 parts on HDFS. The > matrix in question is mostly sparse with ten

PCA OutOfMemoryError

2016-01-12 Thread Bharath Ravi Kumar
We're running PCA (selecting 100 principal components) on a dataset that has ~29K columns and is 70G in size, stored in ~600 parts on HDFS. The matrix in question is mostly sparse, with tens of columns populated in most rows but a few rows with thousands of columns populated. We're running spark on m
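For context on why this can OOM even though the input is sparse, a back-of-envelope sketch. The assumption here (mine, not stated in the thread) is that the PCA implementation materializes a dense n-by-n Gramian/covariance matrix on the driver regardless of input sparsity:

```python
# Back-of-envelope driver memory for PCA over ~29K columns.
# Assumption (not from the thread): the PCA path builds a dense
# n x n Gramian on the driver regardless of input sparsity.
n_cols = 29_000
bytes_per_double = 8
gramian_bytes = n_cols * n_cols * bytes_per_double
gramian_gb = gramian_bytes / 1024**3
print(round(gramian_gb, 2))  # roughly 6.27 GB before any working copies
```

That single dense matrix, before the local eigendecomposition makes working copies, is already enough to exhaust a modestly sized driver heap.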

Re: Spark on Mesos / Executor Memory

2015-10-17 Thread Bharath Ravi Kumar
To be precise, the MesosExecutorBackend's Xms & Xmx equal spark.executor.memory. So there's no question of expanding or contracting the memory held by the executor. On Sat, Oct 17, 2015 at 5:38 PM, Bharath Ravi Kumar wrote: > David, Tom, > > Thanks for the explan

Re: Spark on Mesos / Executor Memory

2015-10-17 Thread Bharath Ravi Kumar
t way to solve this is to use a higher > level tool that can run your spark jobs through one mesos framework and > then you can let spark distribute the resources more effectively. > > I hope that helps! > > Tom. > > On 17 Oct 2015, at 06:47, Bharath Ravi Kumar wrote: > >

Re: Spark on Mesos / Executor Memory

2015-10-16 Thread Bharath Ravi Kumar
Could someone who is aware of the reason for such a memory footprint respond? It seems unintuitive and hard to reason about. Thanks, Bharath On Thu, Oct 15, 2015 at 12:29 PM, Bharath Ravi Kumar wrote: > Resending since user@mesos bounced earlier. My apologies. > > On Thu, Oct

Re: Spark on Mesos / Executor Memory

2015-10-15 Thread Bharath Ravi Kumar
Resending since user@mesos bounced earlier. My apologies. On Thu, Oct 15, 2015 at 12:19 PM, Bharath Ravi Kumar wrote: > (Reviving this thread since I ran into similar issues...) > > I'm running two spark jobs (in mesos fine grained mode), each belonging to > a different mesos

Re: Spark on Mesos / Executor Memory

2015-10-14 Thread Bharath Ravi Kumar
(Reviving this thread since I ran into similar issues...) I'm running two spark jobs (in mesos fine grained mode), each belonging to a different mesos role, say low and high. The low:high mesos weights are 1:10. On expected lines, I see that the low priority job occupies cluster resources to the m

Re: Spark on Mesos vs Yarn

2015-05-27 Thread Bharath Ravi Kumar
A follow-up: considering that Spark on Mesos is indeed important to Databricks, its partners and the community, fundamental issues like SPARK-6284 shouldn't be languishing for this long. A Mesos cluster hosting diverse (i.e. multi-tenant) workloads is a common scenario in production for serious us

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-03-19 Thread Bharath Ravi Kumar
in > http://spark.apache.org/docs/latest/running-on-yarn.html > Then I can see exactly whats in the directory. > > Doug > > ps Sorry for the dup message Bharath and Todd, used wrong email address. > > > > On Mar 19, 2015, at 1:19 AM, Bharath Ravi Kumar > wrote:

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-03-18 Thread Bharath Ravi Kumar
3.2 but that was for a cloudera > installation. I am not sure what the HDP version would be to put here. > > -Todd > > On Wed, Mar 18, 2015 at 12:49 AM, Bharath Ravi Kumar > wrote: > >> Hi Todd, >> >> Yes, those entries were present in the conf under the same S

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-03-17 Thread Bharath Ravi Kumar
n your $SPARK_HOME/conf/spark-defaults.conf > file? > > spark.driver.extraJavaOptions -Dhdp.version=2.2.0.0-2041 > spark.yarn.am.extraJavaOptions -Dhdp.version=2.2.0.0-2041 > > > > > On Tue, Mar 17, 2015 at 1:04 AM, Bharath Ravi Kumar > wrote: > >> Still no luck

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-03-16 Thread Bharath Ravi Kumar
Still no luck running purpose-built 1.3 against HDP 2.2 after following all the instructions. Has anyone else faced this issue? On Mon, Mar 16, 2015 at 8:53 PM, Bharath Ravi Kumar wrote: > Hi Todd, > > Thanks for the help. I'll try again after building a distribution with the > 1.3

Re: HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-03-16 Thread Bharath Ravi Kumar
apache-spark-hdp/ > > FWIW spark-1.3.0 appears to be working fine with HDP as well and steps 2a > and 2b are not required. > > HTH > > -Todd > > On Mon, Mar 16, 2015 at 10:13 AM, Bharath Ravi Kumar > wrote: > >> Hi, >> >> Trying to run spark ( 1.2.1

HDP 2.2 AM abort : Unable to find ExecutorLauncher class

2015-03-16 Thread Bharath Ravi Kumar
Hi, Trying to run Spark (1.2.1 built for HDP 2.2) against a YARN cluster results in the AM failing to start with the following error on stderr: Error: Could not find or load main class org.apache.spark.deploy.yarn.ExecutorLauncher An application id was assigned to the job, but there were no logs. Not
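The fix that surfaces in the replies above (quoted in the 2015-03-17 message) is to pin the HDP build number for both the driver and the YARN AM in $SPARK_HOME/conf/spark-defaults.conf:

```
spark.driver.extraJavaOptions    -Dhdp.version=2.2.0.0-2041
spark.yarn.am.extraJavaOptions   -Dhdp.version=2.2.0.0-2041
```

The version string 2.2.0.0-2041 is the one cited in the thread; substitute the build actually installed on the cluster.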

Re: ALS failure with size > Integer.MAX_VALUE

2014-12-15 Thread Bharath Ravi Kumar
Ok. We'll try using it in a test cluster running 1.2. On 16-Dec-2014 1:36 am, "Xiangrui Meng" wrote: Unfortunately, it will depends on the Sorter API in 1.2. -Xiangrui On Mon, Dec 15, 2014 at 11:48 AM, Bharath Ravi Kumar wrote: > Hi Xiangrui, > > The block size limit w

Re: ALS failure with size > Integer.MAX_VALUE

2014-12-14 Thread Bharath Ravi Kumar
s, Bharath On Wed, Dec 3, 2014 at 10:10 PM, Bharath Ravi Kumar wrote: > > Thanks Xiangrui. I'll try out setting a smaller number of item blocks. And > yes, I've been following the JIRA for the new ALS implementation. I'll try > it out when it's ready for tes

Re: ALS failure with size > Integer.MAX_VALUE

2014-12-03 Thread Bharath Ravi Kumar
pache.org/jira/browse/SPARK-3735 > > which I will try to implement in 1.3. I'll ping you when it is ready. > > Best, > Xiangrui > > On Tue, Dec 2, 2014 at 10:40 AM, Bharath Ravi Kumar > wrote: > > Yes, the issue appears to be due to the 2GB block size limitation

Re: ALS failure with size > Integer.MAX_VALUE

2014-12-01 Thread Bharath Ravi Kumar
check for that? > > > > I have been running a very similar use case to yours (with more > constrained > > hardware resources) and I haven’t seen this exact problem but I’m sure > we’ve > > seen similar issues. Please let me know if you have other questions

Re: ALS failure with size > Integer.MAX_VALUE

2014-11-28 Thread Bharath Ravi Kumar
. Thanks, Bharath On Fri, Nov 28, 2014 at 12:00 AM, Bharath Ravi Kumar wrote: > We're training a recommender with ALS in mllib 1.1 against a dataset of > 150M users and 4.5K items, with the total number of training records being > 1.2 Billion (~30GB data). The input data is spre

ALS failure with size > Integer.MAX_VALUE

2014-11-27 Thread Bharath Ravi Kumar
We're training a recommender with ALS in mllib 1.1 against a dataset of 150M users and 4.5K items, with the total number of training records being 1.2 Billion (~30GB data). The input data is spread across 1200 partitions on HDFS. For the training, rank=10, and we've configured {number of user data
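A rough sizing sketch of why serialized blocks can blow past Integer.MAX_VALUE here. The byte count per entry is my illustrative assumption, not a measurement from the thread; the 1.2 billion rating count is from the post above:

```python
# Why a serialized ALS block can exceed Integer.MAX_VALUE bytes.
# Assumed 16 bytes per rating entry (two 4-byte ids + one 8-byte double).
INT_MAX = 2**31 - 1                 # Spark's per-block byte ceiling at the time
ratings = 1_200_000_000
bytes_per_entry = 16
total_bytes = ratings * bytes_per_entry
min_blocks = total_bytes / INT_MAX  # blocks needed even with a perfect spread
print(round(min_blocks, 1))  # about 8.9: single-digit block counts overflow
```

With skewed data the hottest block is far larger than the average, so the practical block count has to be well above this lower bound, which is consistent with the advice in the replies to use more (smaller) blocks.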

Re: OOM with groupBy + saveAsTextFile

2014-11-03 Thread Bharath Ravi Kumar
approach. My bad. On Mon, Nov 3, 2014 at 3:38 PM, Bharath Ravi Kumar wrote: > The result was no different with saveAsHadoopFile. In both cases, I can > see that I've misinterpreted the API docs. I'll explore the API's a bit > further for ways to save the iterable as chun
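The "save the iterable as chunks" idea mentioned above can be sketched in plain Python (a hypothetical helper, not a Spark API call):

```python
from itertools import islice

def chunks(iterable, size):
    # Yield successive fixed-size lists so no single string ever has to
    # hold an entire (possibly huge) group of values.
    it = iter(iterable)
    while True:
        block = list(islice(it, size))
        if not block:
            return
        yield block

lines = [",".join(map(str, block)) for block in chunks(range(10), 4)]
# lines == ["0,1,2,3", "4,5,6,7", "8,9"]
```

Each emitted line is bounded by the chunk size, in contrast to formatting the whole iterable into one string at once.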

Re: OOM with groupBy + saveAsTextFile

2014-11-03 Thread Bharath Ravi Kumar
"save every element of the RDD as one line of text". > It works like TextOutputFormat in Hadoop MapReduce since that's what > it uses. So you are causing it to create one big string out of each > Iterable this way. > > On Sun, Nov 2, 2014 at 4:48 PM, Bharath Ravi Kumar

Re: OOM with groupBy + saveAsTextFile

2014-11-02 Thread Bharath Ravi Kumar
e (heap size too small), or a bug that results in an application > attempting to create a huge array, for example, when the number of elements > in the array are computed using an algorithm that computes an incorrect > size.” > > > > > On 2 Nov, 2014, at 12:25 pm, Bharath

Re: OOM with groupBy + saveAsTextFile

2014-11-01 Thread Bharath Ravi Kumar
Resurfacing the thread. OOM shouldn't be the norm for a common groupBy/sort use case in a framework that leads sorting benchmarks, should it? Or is there something fundamentally wrong in the usage? On 02-Nov-2014 1:06 am, "Bharath Ravi Kumar" wrote: > Hi, > > I'm t

Re: OOM with groupBy + saveAsTextFile

2014-11-01 Thread Bharath Ravi Kumar
Minor clarification: I'm running spark 1.1.0 on JDK 1.8, Linux 64 bit. On Sun, Nov 2, 2014 at 1:06 AM, Bharath Ravi Kumar wrote: > Hi, > > I'm trying to run groupBy(function) followed by saveAsTextFile on an RDD > of count ~ 100 million. The data size is 20GB and groupBy

OOM with groupBy + saveAsTextFile

2014-11-01 Thread Bharath Ravi Kumar
Hi, I'm trying to run groupBy(function) followed by saveAsTextFile on an RDD of count ~ 100 million. The data size is 20GB and groupBy results in an RDD of 1061 keys with values being Iterable>. The job runs on 3 hosts in a standalone setup with each host's executor having 100G RAM and 24 cores de
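When a per-key aggregate is what's actually needed, the usual way out of this OOM is to fold values per key instead of grouping them. A plain-Python contrast on hypothetical toy data, standing in for Spark's groupByKey vs reduceByKey:

```python
from collections import defaultdict

records = [("k1", 3), ("k2", 5), ("k1", 4)]

# groupBy-style: the whole value list per key is held in memory at once,
# so a hot key with millions of values can exhaust the heap.
grouped = defaultdict(list)
for k, v in records:
    grouped[k].append(v)

# combine-style: only one running accumulator per key is kept,
# the shape of reduceByKey / aggregateByKey.
sums = defaultdict(int)
for k, v in records:
    sums[k] += v

# sums == {"k1": 7, "k2": 5}
```

When the full value list genuinely must be written out, the grouping cost is unavoidable, but memory then hinges on streaming the values rather than concatenating them into one string per key.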

Re: OOM writing out sorted RDD

2014-08-09 Thread Bharath Ravi Kumar
Update: as expected, switching to kryo merely delays the inevitable. Does anyone have experience controlling memory consumption while processing (e.g. writing out) imbalanced partitions? On 09-Aug-2014 10:41 am, "Bharath Ravi Kumar" wrote: > Our prototype application reads a 20GB

OOM writing out sorted RDD

2014-08-08 Thread Bharath Ravi Kumar
Our prototype application reads a 20GB dataset from HDFS (nearly 180 partitions), groups it by key, sorts by rank, and writes out to HDFS in that order. The job runs against two nodes (16G, 24 cores per node available to the job). I noticed that the execution plan results in two sortByKey stages, fol

Re: Implementing percentile through top Vs take

2014-07-31 Thread Bharath Ravi Kumar
finitely not done on the driver. It works as you say. Look > at the source code for RDD.takeOrdered, which is what top calls. > > > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/rdd/RDD.scala#L1130 > > On Wed, Jul 30, 2014 at 7:07 PM, Bharath Ra

Implementing percentile through top Vs take

2014-07-30 Thread Bharath Ravi Kumar
I'm looking to select the top n records (by rank) from a data set of a few hundred GB. My understanding is that JavaRDD.top(n, comparator) is entirely a driver-side operation, in that all records are sorted in the driver's memory. I'd prefer an approach where the records are sorted on the cluster an
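As the reply above points out, top is built on takeOrdered and is distributed. A plain-Python sketch of that shape, on hypothetical toy partitions: each partition keeps only its n best records, and the driver merges just n items per partition rather than sorting the full dataset:

```python
import heapq

def top_n(partitions, n):
    # Executor side: each partition retains only its own top n.
    per_part = [heapq.nlargest(n, p) for p in partitions]
    # Driver side: merge the small per-partition results, at most
    # n * num_partitions items rather than the whole dataset.
    return heapq.nlargest(n, [x for part in per_part for x in part])

parts = [[5, 1, 9], [7, 3], [8, 2, 6]]
print(top_n(parts, 3))  # [9, 8, 7]
```

So driver memory scales with n times the partition count, not with the few hundred GB of input.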

Re: Hadoop client protocol mismatch with spark 1.0.1, cdh3u5

2014-07-26 Thread Bharath Ravi Kumar
PM, Bharath Ravi Kumar wrote: > That's right, I'm looking to depend on spark in general and change only > the hadoop client deps. The spark master and slaves use the > spark-1.0.1-bin-hadoop1 binaries from the downloads page. The relevant > snippet from the app

Re: Hadoop client protocol mismatch with spark 1.0.1, cdh3u5

2014-07-25 Thread Bharath Ravi Kumar
ps to clarify what you are depending on? Building > custom Spark and depending on it is a different thing from depending > on plain Spark and changing its deps. I think you want the latter. > > On Fri, Jul 25, 2014 at 5:46 PM, Bharath Ravi Kumar > wrote: > > Thanks for responding

Re: Hadoop client protocol mismatch with spark 1.0.1, cdh3u5

2014-07-25 Thread Bharath Ravi Kumar
linked to your build in your app? > > On Fri, Jul 25, 2014 at 4:32 PM, Bharath Ravi Kumar > wrote: > > Any suggestions to work around this issue ? The pre built spark binaries > > don't appear to work against cdh as documented, unless there's a build > >

Re: Hadoop client protocol mismatch with spark 1.0.1, cdh3u5

2014-07-25 Thread Bharath Ravi Kumar
Any suggestions to work around this issue? The pre-built Spark binaries don't appear to work against CDH as documented, unless there's a build issue, which seems unlikely. On 25-Jul-2014 3:42 pm, "Bharath Ravi Kumar" wrote: > > I'm encountering a hadoop client p

Hadoop client protocol mismatch with spark 1.0.1, cdh3u5

2014-07-25 Thread Bharath Ravi Kumar
I'm encountering a Hadoop client protocol mismatch trying to read from HDFS (cdh3u5) using the pre-built Spark from the downloads page (linked under "For Hadoop 1 (HDP1, CDH3)"). I've also followed the instructions at http://spark.apache.org/docs/latest/hadoop-third-party-distributions.html (i.e.

Re: Execution stalls in LogisticRegressionWithSGD

2014-07-02 Thread Bharath Ravi Kumar
727 SUCCESS PROCESS_LOCAL slave2 2014/07/02 16:01:28 33 s 99 ms Any pointers / diagnosis please? On Thu, Jun 19, 2014 at 10:03 AM, Bharath Ravi Kumar wrote: > Thanks. I'll await the fix to re-run my test. > > > On Thu, Jun 19, 2014 at 8:28 AM,

Re: Execution stalls in LogisticRegressionWithSGD

2014-06-18 Thread Bharath Ravi Kumar
On Tue, Jun 17, 2014 at 7:37 PM, Bharath Ravi Kumar > wrote: > > Couple more points: > > 1)The inexplicable stalling of execution with large feature sets appears > > similar to that reported with the news-20 dataset: > > > http://mail-archives.apache.org/mod_mbox/spark-user/2

Re: Execution stalls in LogisticRegressionWithSGD

2014-06-17 Thread Bharath Ravi Kumar
a JavaPairRDD, Tuple2> is unrelated to mllib. Thanks, Bharath On Wed, Jun 18, 2014 at 7:14 AM, Bharath Ravi Kumar wrote: > Hi Xiangrui , > > I'm using 1.0.0. > > Thanks, > Bharath > On 18-Jun-2014 1:43 am, "Xiangrui Meng" wrote: > >> Hi Bhar

Re: Execution stalls in LogisticRegressionWithSGD

2014-06-17 Thread Bharath Ravi Kumar
Hi Xiangrui , I'm using 1.0.0. Thanks, Bharath On 18-Jun-2014 1:43 am, "Xiangrui Meng" wrote: > Hi Bharath, > > Thanks for posting the details! Which Spark version are you using? > > Best, > Xiangrui > > On Tue, Jun 17, 2014 at 6:48 AM, Bharath Ravi Kum

Execution stalls in LogisticRegressionWithSGD

2014-06-17 Thread Bharath Ravi Kumar
Hi, (Apologies for the long mail, but it's necessary to provide sufficient detail given the number of issues faced.) I'm running into issues testing LogisticRegressionWithSGD on a two-node cluster (each node with 24 cores and 16G available to slaves out of 24G on the system). Here's a descrip

Re: Standalone client failing with docker deployed cluster

2014-05-16 Thread Bharath Ravi Kumar
(Trying to bubble up the issue again...) Any insights (based on the enclosed logs) into why standalone client invocation might fail while issuing jobs through the spark console succeeded? Thanks, Bharath On Thu, May 15, 2014 at 5:08 PM, Bharath Ravi Kumar wrote: > Hi, > > I'

Standalone client failing with docker deployed cluster

2014-05-16 Thread Bharath Ravi Kumar
Hi, I'm running the spark server with a single worker on a laptop using the docker images. The spark shell examples run fine with this setup. However, when a standalone Java client tries to run wordcount on a local file (1 MB in size), the execution fails with the following error on the stdout of