Re: Saving a pyspark.ml.feature.PCA model

2016-07-20 Thread Ajinkya Kale
PM Ajinkya Kale wrote: > I am using google cloud dataproc which comes with spark 1.6.1. So upgrade > is not really an option. > No way / hack to save the models in spark 1.6.1 ? > > On Tue, Jul 19, 2016 at 8:13 PM Shuai Lin wrote: > >> It's added in not-released-y
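One hack that is sometimes suggested for 1.6.x (not from this thread; it reaches into the private _java_obj handle of the PySpark wrapper, so it is unsupported, untested here, and version-specific): pull the principal-components matrix out of the fitted model and persist it yourself, after which scoring reduces to a matrix multiply. `model` is assumed to be a fitted pyspark.ml.feature.PCAModel as in the original question below.

    import numpy as np

    # _java_obj is a private attribute of the Python wrapper; it may change between releases.
    jpc = model._java_obj.pc()                      # underlying Scala PCAModel's principal components
    pc = np.array(list(jpc.toArray())).reshape(
        (jpc.numRows(), jpc.numCols()), order="F")  # DenseMatrix stores values column-major

    np.save("/tmp/pca_components.npy", pc)          # persist any way you like

    # Later, without the PCAModel object: project a raw feature vector manually.
    pc = np.load("/tmp/pca_components.npy")
    projected = np.array([1.0, 0.0, 7.0]).dot(pc)   # vector length must equal pc.shape[0]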

Re: Saving a pyspark.ml.feature.PCA model

2016-07-19 Thread Ajinkya Kale
a/browse/SPARK-13036 > https://github.com/apache/spark/commit/83302c3b > > so i guess you need to wait for 2.0 release (or use the current rc4). > > On Wed, Jul 20, 2016 at 6:54 AM, Ajinkya Kale > wrote: > >> Is there a way to save a pyspark.ml.feature.PCA model ? I know
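For reference, once on Spark 2.0+ (where SPARK-13036 and the commit above landed), PySpark ML persistence for this model looks roughly like the following; the path and toy data are illustrative.

    from pyspark.sql import SparkSession
    from pyspark.ml.linalg import Vectors
    from pyspark.ml.feature import PCA, PCAModel

    spark = SparkSession.builder.appName("pca-save-demo").getOrCreate()
    df = spark.createDataFrame(
        [(Vectors.dense([1.0, 0.0, 7.0]),),
         (Vectors.dense([2.0, 1.0, 5.0]),),
         (Vectors.dense([4.0, 3.0, 2.0]),)],
        ["features"])

    model = PCA(k=2, inputCol="features", outputCol="pca_features").fit(df)
    model.write().overwrite().save("/tmp/pca_model")   # or simply model.save(path)
    reloaded = PCAModel.load("/tmp/pca_model")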

Saving a pyspark.ml.feature.PCA model

2016-07-19 Thread Ajinkya Kale
Is there a way to save a pyspark.ml.feature.PCA model? I know mllib supports saving models, but mllib does not have PCA afaik. How do people do model persistence for inference using the pyspark ml models? I did not find any documentation on model persistence for ml. --ajinkya
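For context, a minimal 1.6-style fit of the model in question (toy data and column names are illustrative); the fitted PCAModel exposes no save() or write() in 1.6.x, which is exactly what the thread is asking about.

    from pyspark import SparkContext
    from pyspark.sql import SQLContext
    from pyspark.mllib.linalg import Vectors   # pyspark.ml used mllib vectors in 1.x
    from pyspark.ml.feature import PCA

    sc = SparkContext(appName="pca-fit-demo")
    sqlContext = SQLContext(sc)

    df = sqlContext.createDataFrame(
        [(Vectors.dense([1.0, 0.0, 7.0]),),
         (Vectors.dense([2.0, 1.0, 5.0]),),
         (Vectors.dense([4.0, 3.0, 2.0]),)],
        ["features"])

    pca = PCA(k=2, inputCol="features", outputCol="pca_features")
    model = pca.fit(df)            # PCAModel; no save()/write() until Spark 2.0
    model.transform(df).show()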

Re: installing packages with pyspark

2016-03-19 Thread Ajinkya Kale
___ > From: Jakob Odersky > Sent: Thursday, March 17, 2016 6:40 PM > Subject: Re: installing packages with pyspark > To: Ajinkya Kale > Cc: > > > Hi, > regarding 1, packages are resolved locally. That means that when you > specify a package, spark-submit will resolv

installing packages with pyspark

2016-03-19 Thread Ajinkya Kale
Hi all, I had a couple of questions. 1. Is there documentation on how to add the graphframes package, or any other package for that matter, on the Google Dataproc managed Spark clusters? 2. Is there a way to add a package to an existing pyspark context through a Jupyter notebook? --aj
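A sketch of the usual notebook-side answer to question 2: set PYSPARK_SUBMIT_ARGS before the first SparkContext is created, since the arguments are handed to spark-submit when the gateway JVM launches. A context that is already running cannot pick up new packages. The graphframes coordinate below is only an example; use whichever build matches your Spark and Scala versions.

    import os
    from pyspark import SparkContext

    # Must run before the first SparkContext is created in the notebook.
    os.environ["PYSPARK_SUBMIT_ARGS"] = (
        "--packages graphframes:graphframes:0.1.0-spark1.6 pyspark-shell")

    sc = SparkContext(appName="graphframes-demo")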

Re: Logistic Regression using ML Pipeline

2016-02-19 Thread Ajinkya Kale
Please take a look at the example here http://spark.apache.org/docs/latest/ml-guide.html#example-pipeline On Thu, Feb 18, 2016 at 9:27 PM Arunkumar Pillai wrote: > Hi > > I'm trying to build logistic regression using ML Pipeline > > val lr = new LogisticRegression() > > lr.setFitIntercept(t
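A PySpark rendering of the linked pipeline example, with the fitIntercept setting from the question folded in; the toy data and column names are assumptions for illustration.

    from pyspark import SparkContext
    from pyspark.sql import SQLContext
    from pyspark.ml import Pipeline
    from pyspark.ml.classification import LogisticRegression
    from pyspark.ml.feature import HashingTF, Tokenizer

    sc = SparkContext(appName="lr-pipeline-demo")
    sqlContext = SQLContext(sc)

    training = sqlContext.createDataFrame(
        [("spark is great", 1.0), ("hadoop mapreduce", 0.0)],
        ["text", "label"])

    tokenizer = Tokenizer(inputCol="text", outputCol="words")
    hashingTF = HashingTF(inputCol="words", outputCol="features")
    lr = LogisticRegression(maxIter=10, regParam=0.01, fitIntercept=True)

    model = Pipeline(stages=[tokenizer, hashingTF, lr]).fit(training)
    model.transform(training).select("text", "probability", "prediction").show()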

Reading multiple avro files from a dir - Spark 1.5.1

2016-01-29 Thread Ajinkya Kale
Trying to load avro from HDFS. I have around 1000 part avro files in a dir. I am using this to read them - val df = sqlContext.read.format("com.databricks.spark.avro").load("path/to/avro/dir") df.select("QUERY").take(50).foreach(println) It works if I pass only 1 or 2 avro files in the path
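The same read rendered in PySpark for reference, assuming the spark-avro package is on the classpath and the path and QUERY column from the question; a Hadoop glob can also be used to point at the part files explicitly.

    from pyspark import SparkContext
    from pyspark.sql import SQLContext

    sc = SparkContext(appName="avro-read-demo")
    sqlContext = SQLContext(sc)

    # Directory form, as in the question; glob patterns such as
    # "path/to/avro/dir/*.avro" are also accepted by load().
    df = sqlContext.read.format("com.databricks.spark.avro").load("path/to/avro/dir")
    for row in df.select("QUERY").take(50):
        print(row)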

Re: HBase 0.98.0 with Spark 1.5.3 issue in yarn-cluster mode

2016-01-22 Thread Ajinkya Kale
I tried --jars which supposedly does that but that did not work. On Fri, Jan 22, 2016 at 4:33 PM Ajinkya Kale wrote: > Hi Ted, > Is there a way for the executors to have the hbase-protocol jar on their > classpath ? > > On Fri, Jan 22, 2016 at 4:00 PM Ted Yu wrote: >
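A commonly suggested alternative to --jars for this particular jar (a sketch, not from this thread): put hbase-protocol on the executor classpath explicitly. The jar path below is hypothetical and must exist on every node. In yarn-cluster mode the driver-side equivalent has to be passed at submit time, because the driver JVM is already running when application code executes.

    from pyspark import SparkConf, SparkContext

    # Hypothetical location of the jar on the cluster nodes.
    hbase_protocol = "/opt/hbase/lib/hbase-protocol-0.98.0-hadoop2.jar"

    conf = (SparkConf()
            .setAppName("hbase-classpath-demo")
            # Prepended to the executor JVM classpath when containers launch.
            .set("spark.executor.extraClassPath", hbase_protocol))
    # spark.driver.extraClassPath cannot be set here in yarn-cluster mode; pass it
    # via --conf on spark-submit or spark-defaults.conf instead.
    sc = SparkContext(conf=conf)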

Re: HBase 0.98.0 with Spark 1.5.3 issue in yarn-cluster mode

2016-01-22 Thread Ajinkya Kale
Hi Ted, Is there a way for the executors to have the hbase-protocol jar on their classpath ? On Fri, Jan 22, 2016 at 4:00 PM Ted Yu wrote: > The class path formations on driver and executors are different. > > Cheers > > On Fri, Jan 22, 2016 at 3:25 PM, Ajinkya Kale > wrote:

Re: HBase 0.98.0 with Spark 1.5.3 issue in yarn-cluster mode

2016-01-22 Thread Ajinkya Kale
Is this issue only when the computations are in distributed mode? If I do (pseudo code): rdd.collect.call_to_hbase I don't get this error, but if I do: rdd.call_to_hbase.collect it throws this error. On Wed, Jan 20, 2016 at 6:50 PM Ajinkya Kale wrote: > Unfortunately I cannot at this mom
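Structurally, that matches the difference between touching HBase only on the driver and touching it inside executors. A sketch with a no-op stand-in for the real HBase call; only the placement of the call matters here.

    from pyspark import SparkContext

    sc = SparkContext(appName="hbase-call-placement-demo")
    rdd = sc.parallelize([("row1", "a"), ("row2", "b")])

    def call_to_hbase(record):
        # Stand-in for the real HBase write/read. Whatever runs here executes in
        # an executor JVM, so the HBase jars (including hbase-protocol) must be
        # on the executors' classpath, not just the driver's.
        pass

    # rdd.collect.call_to_hbase: data is pulled to the driver first, and HBase is
    # only ever touched from the driver process.
    for record in rdd.collect():
        call_to_hbase(record)

    # rdd.call_to_hbase.collect: the HBase call happens inside the executors,
    # which is where the reported classpath error shows up.
    rdd.foreach(call_to_hbase)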

Re: HBase 0.98.0 with Spark 1.5.3 issue in yarn-cluster mode

2016-01-20 Thread Ajinkya Kale
Unfortunately I cannot at this moment (not a decision I can make) :( On Wed, Jan 20, 2016 at 6:46 PM Ted Yu wrote: > I am not aware of a workaround. > > Can you upgrade to 0.98.4+ release ? > > Cheers > > On Wed, Jan 20, 2016 at 6:26 PM, Ajinkya Kale > wrote: >

Re: HBase 0.98.0 with Spark 1.5.3 issue in yarn-cluster mode

2016-01-20 Thread Ajinkya Kale
. > > If still there is problem, please pastebin the stack trace. > > Thanks > > On Wed, Jan 20, 2016 at 5:41 PM, Ajinkya Kale > wrote: > >> >> I have posted this on hbase user list but i thought makes more sense on >> spark user list. >> I am able to rea

HBase 0.98.0 with Spark 1.5.3 issue in yarn-cluster mode

2016-01-20 Thread Ajinkya Kale
I have posted this on the hbase user list but I thought it makes more sense on the spark user list. I am able to read the table in yarn-client mode from spark-shell, but I have exhausted all online forums for options to get it working in yarn-cluster mode through spark-submit. I am using this code-example
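The quoted code-example is cut off above; for orientation, a read along the lines of the stock PySpark HBase input-format example looks roughly like this. The ZooKeeper quorum and table name are placeholders, and the converter classes ship in the Spark examples jar, which, like the HBase jars, must be visible to the executors as well as the driver; that executor-side classpath is the crux of the yarn-cluster problem in this thread.

    from pyspark import SparkContext

    sc = SparkContext(appName="hbase-read-demo")

    conf = {"hbase.zookeeper.quorum": "zk-host",
            "hbase.mapreduce.inputtable": "my_table"}

    hbase_rdd = sc.newAPIHadoopRDD(
        "org.apache.hadoop.hbase.mapreduce.TableInputFormat",
        "org.apache.hadoop.hbase.io.ImmutableBytesWritable",
        "org.apache.hadoop.hbase.client.Result",
        keyConverter="org.apache.spark.examples.pythonconverters."
                     "ImmutableBytesWritableToStringConverter",
        valueConverter="org.apache.spark.examples.pythonconverters."
                       "HBaseResultToStringConverter",
        conf=conf)

    print(hbase_rdd.take(5))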