Bootstrap Action to Install Spark 2.0 on EMR?

2016-07-02 Thread Renxia Wang
Hi all,

Has anybody tried out Spark 2.0 on EMR 4.x? Will it work? I am looking for
a bootstrap action script to install it on EMR; does someone have a
working one to share? Much appreciated!

Best,

Renxia


latest version of Spark to work OK as Hive engine

2016-07-02 Thread Ashok Kumar
Hi,
Looking at the presentation "Hive on Spark is Blazing Fast" ..
Which is the latest version of Spark that can run as an execution engine for Hive, please?
Thanks
P.S. I am aware of Hive on Tez, but that is not what I am interested in here.
Warmest regards

Re: spark parquet too many small files ?

2016-07-02 Thread sri hari kali charan Tummala
Hi Takeshi,

I can't use coalesce in the spark-sql shell, right? I know we can use
coalesce in a Spark Scala application, but in this project we are not
building a jar or using Python; we are just executing a Hive query in the
spark-sql shell and submitting it to YARN in client mode.

Example:
spark-sql --verbose --queue default --name wchargeback_event.sparksql.kali \
  --master yarn-client --driver-memory 15g --executor-memory 15g \
  --num-executors 10 --executor-cores 2 \
  -f /x/home/pp_dt_fin_batch/users/srtummala/run-spark/sql/wtr_full.sql \
  --conf "spark.yarn.executor.memoryOverhead=8000" \
  --conf "spark.sql.shuffle.partitions=50" \
  --conf "spark.kryoserializer.buffer.max.mb=5g" \
  --conf "spark.driver.maxResultSize=20g" \
  --conf "spark.storage.memoryFraction=0.8" \
  --conf "spark.hadoopConfiguration=2560" \
  --conf "spark.dynamicAllocation.enabled=false" \
  --conf "spark.shuffle.service.enabled=false" \
  --conf "spark.executor.instances=10"

Thanks
Sri




On Sat, Jul 2, 2016 at 2:53 AM, Takeshi Yamamuro 
wrote:

> Please also see https://issues.apache.org/jira/browse/SPARK-16188.
>
> // maropu
>
> On Fri, Jul 1, 2016 at 7:39 PM, kali.tumm...@gmail.com <
> kali.tumm...@gmail.com> wrote:
>
>> I found the jira for the issue will there be a fix in future ? or no fix ?
>>
>> https://issues.apache.org/jira/browse/SPARK-6221
>>
>>
>>
>> --
>> View this message in context:
>> http://apache-spark-user-list.1001560.n3.nabble.com/spark-parquet-too-many-small-files-tp27264p27267.html
>> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>
>>
>
>
> --
> ---
> Takeshi Yamamuro
>



-- 
Thanks & Regards
Sri Tummala


Working of Streaming Kmeans

2016-07-02 Thread Biplob Biswas
Hi,

I wanted to ask a very basic question about the working of Streaming Kmeans.

Does the model update only during training (i.e. only when the training
dataset is used), or does it also update in predictOnValues for the test dataset?

Thanks and Regards
Biplob
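
For reference, a minimal sketch (not from the original message) of the StreamingKMeans
calls in question, assuming mllib's StreamingKMeans on DStreams; the stream arguments
are placeholders supplied by the caller:

import org.apache.spark.mllib.clustering.StreamingKMeans
import org.apache.spark.mllib.linalg.Vector
import org.apache.spark.streaming.dstream.DStream

def attachStreamingKMeans(
    trainingStream: DStream[Vector],
    testStream: DStream[(Double, Vector)]): Unit = {
  val model = new StreamingKMeans()
    .setK(3)
    .setDecayFactor(1.0)
    .setRandomCenters(2, 0.0)

  // trainOn is the call that updates the cluster centers as new batches arrive.
  model.trainOn(trainingStream)

  // predictOnValues scores incoming points against the current centers.
  model.predictOnValues(testStream).print()
}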




--
View this message in context: 
http://apache-spark-user-list.1001560.n3.nabble.com/Working-of-Streaming-Kmeans-tp27268.html
Sent from the Apache Spark User List mailing list archive at Nabble.com.

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



RE: Spark 2.0.0-preview ... problem with jackson core version

2016-07-02 Thread Paolo Patierno
Yes ! We got it  :-)

Btw it's not available on Maven yet. :-(

Paolo Patierno
Senior Software Engineer (IoT) @ Red Hat
Microsoft MVP on Windows Embedded & IoT
Microsoft Azure Advisor
Twitter : @ppatierno
Linkedin : paolopatierno
Blog : DevExperience

> From: so...@cloudera.com
> Date: Sat, 2 Jul 2016 15:12:11 +0100
> Subject: Re: Spark 2.0.0-preview ... problem with jackson core version
> To: ppatie...@live.com
> CC: charles.al...@metamarkets.com; user@spark.apache.org
> 
> Ah, it looks like it was 2.5.3 as of 2.0.0-preview:
> 
> https://github.com/apache/spark/blob/2.0.0-preview/pom.xml#L164
> 
> but was updated to 2.6.5 soon after that, since it was 2.6.5 in 2.0.0-RC1:
> 
> https://github.com/apache/spark/blob/v2.0.0-rc1/pom.xml#L163
> 
> On Sat, Jul 2, 2016 at 3:04 PM, Paolo Patierno  wrote:
> > This sounds strange to me because here :
> >
> > https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11/2.0.0-preview
> >
> > I see :
> >
> > com.fasterxml.jackson.module » jackson-module-scala_2.112.5.3
> >
> > So it seems that 2.0.0-preview is bringing jackson module scala 2.5.3 that
> > is what I see.
> >
> >
> >
> >> From: so...@cloudera.com
> >> Date: Sat, 2 Jul 2016 14:32:58 +0100
> >> Subject: Re: Spark 2.0.0-preview ... problem with jackson core version
> >> To: ppatie...@live.com
> >> CC: charles.al...@metamarkets.com; user@spark.apache.org
> >
> >>
> >> This is something to do with your app. The version is 2.6.5 in master
> >> and branch-2.0, and jackson-module-scala is managed to this version
> >> along with all the other jackson artifacts.
> >>
> >> On Sat, Jul 2, 2016 at 1:35 PM, Paolo Patierno  wrote:
> >> > What I see is the following ...
> >> >
> >> > - Working configuration
> >> >
> >> > Spark Version : "2.0.0-SNAPSHOT"
> >> >
> >> > The Vert.x library brings ...
> >> > jackson-annotations:2.6.0
> >> > jackson-core:2.6.1
> >> > jackson-databind:2.6.1
> >> >
> >> > Spark brings
> >> > jackson-annotations:2.6.5
> >> > jackson-core:2.6.5
> >> > jackson-databind:2.6.5
> >> > jackson-module-scala_2.11: 2.6.5
> >> >
> >> > The runtime will use the latest version 2.6.5 and all works fine.
> >> >
> >> > - NOT Working configuration
> >> >
> >> > Spark Version : "2.0.0-preview"
> >> >
> >> > The Vert.x library brings ... (same dependencies as before)
> >> > jackson-annotations:2.6.0
> >> > jackson-core:2.6.1
> >> > jackson-databind:2.6.1
> >> >
> >> > Spark brings
> >> > jackson-module-scala_2.11: 2.5.3
> >> >
> >> > The module scale related to jackson is 2.5.3 ... seems not work with
> >> > above
> >> > 2.6.0/2.6.1 version ... so the exception.
> >> >
> >> >
> >> >> From: so...@cloudera.com
> >> >> Date: Sat, 2 Jul 2016 08:34:45 +0100
> >> >> Subject: Re: Spark 2.0.0-preview ... problem with jackson core version
> >> >> To: charles.al...@metamarkets.com
> >> >> CC: ppatie...@live.com; user@spark.apache.org
> >> >
> >> >>
> >> >> mvn dependency:tree?
> >> >>
> >> >> On Sat, Jul 2, 2016 at 12:46 AM, Charles Allen
> >> >>  wrote:
> >> >> > I'm having the same difficulty porting
> >> >> > https://github.com/metamx/druid-spark-batch/tree/spark2 over to
> >> >> > spark2.x,
> >> >> > where I have to go track down who is pulling in bad jackson versions.
> >> >> >
> >> >>
> >> >> -
> >> >> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> >> >>
> >>
> >> -
> >> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> >>
> 
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> 
  

Re: Spark 2.0.0-preview ... problem with jackson core version

2016-07-02 Thread Sean Owen
Ah, it looks like it was 2.5.3 as of 2.0.0-preview:

https://github.com/apache/spark/blob/2.0.0-preview/pom.xml#L164

but was updated to 2.6.5 soon after that, since it was 2.6.5 in 2.0.0-RC1:

https://github.com/apache/spark/blob/v2.0.0-rc1/pom.xml#L163

On Sat, Jul 2, 2016 at 3:04 PM, Paolo Patierno  wrote:
> This sounds strange to me because here :
>
> https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11/2.0.0-preview
>
> I see :
>
> com.fasterxml.jackson.module » jackson-module-scala_2.112.5.3
>
> So it seems that 2.0.0-preview is bringing jackson module scala 2.5.3 that
> is what I see.
>
>
>
>> From: so...@cloudera.com
>> Date: Sat, 2 Jul 2016 14:32:58 +0100
>> Subject: Re: Spark 2.0.0-preview ... problem with jackson core version
>> To: ppatie...@live.com
>> CC: charles.al...@metamarkets.com; user@spark.apache.org
>
>>
>> This is something to do with your app. The version is 2.6.5 in master
>> and branch-2.0, and jackson-module-scala is managed to this version
>> along with all the other jackson artifacts.
>>
>> On Sat, Jul 2, 2016 at 1:35 PM, Paolo Patierno  wrote:
>> > What I see is the following ...
>> >
>> > - Working configuration
>> >
>> > Spark Version : "2.0.0-SNAPSHOT"
>> >
>> > The Vert.x library brings ...
>> > jackson-annotations:2.6.0
>> > jackson-core:2.6.1
>> > jackson-databind:2.6.1
>> >
>> > Spark brings
>> > jackson-annotations:2.6.5
>> > jackson-core:2.6.5
>> > jackson-databind:2.6.5
>> > jackson-module-scala_2.11: 2.6.5
>> >
>> > The runtime will use the latest version 2.6.5 and all works fine.
>> >
>> > - NOT Working configuration
>> >
>> > Spark Version : "2.0.0-preview"
>> >
>> > The Vert.x library brings ... (same dependencies as before)
>> > jackson-annotations:2.6.0
>> > jackson-core:2.6.1
>> > jackson-databind:2.6.1
>> >
>> > Spark brings
>> > jackson-module-scala_2.11: 2.5.3
>> >
>> > The module scale related to jackson is 2.5.3 ... seems not work with
>> > above
>> > 2.6.0/2.6.1 version ... so the exception.
>> >
>> >
>> >> From: so...@cloudera.com
>> >> Date: Sat, 2 Jul 2016 08:34:45 +0100
>> >> Subject: Re: Spark 2.0.0-preview ... problem with jackson core version
>> >> To: charles.al...@metamarkets.com
>> >> CC: ppatie...@live.com; user@spark.apache.org
>> >
>> >>
>> >> mvn dependency:tree?
>> >>
>> >> On Sat, Jul 2, 2016 at 12:46 AM, Charles Allen
>> >>  wrote:
>> >> > I'm having the same difficulty porting
>> >> > https://github.com/metamx/druid-spark-batch/tree/spark2 over to
>> >> > spark2.x,
>> >> > where I have to go track down who is pulling in bad jackson versions.
>> >> >
>> >>
>> >> -
>> >> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>> >>
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



RE: Spark 2.0.0-preview ... problem with jackson core version

2016-07-02 Thread Paolo Patierno
This sounds strange to me because here :

https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11/2.0.0-preview

I see :

com.fasterxml.jackson.module » jackson-module-scala_2.11 : 2.5.3

So it seems that 2.0.0-preview brings jackson-module-scala 2.5.3, which is
what I see.



> From: so...@cloudera.com
> Date: Sat, 2 Jul 2016 14:32:58 +0100
> Subject: Re: Spark 2.0.0-preview ... problem with jackson core version
> To: ppatie...@live.com
> CC: charles.al...@metamarkets.com; user@spark.apache.org
> 
> This is something to do with your app. The version is 2.6.5 in master
> and branch-2.0, and jackson-module-scala is managed to this version
> along with all the other jackson artifacts.
> 
> On Sat, Jul 2, 2016 at 1:35 PM, Paolo Patierno  wrote:
> > What I see is the following ...
> >
> > - Working configuration
> >
> > Spark Version : "2.0.0-SNAPSHOT"
> >
> > The Vert.x library brings ...
> > jackson-annotations:2.6.0
> > jackson-core:2.6.1
> > jackson-databind:2.6.1
> >
> > Spark brings
> > jackson-annotations:2.6.5
> > jackson-core:2.6.5
> > jackson-databind:2.6.5
> > jackson-module-scala_2.11: 2.6.5
> >
> > The runtime will use the latest version 2.6.5 and all works fine.
> >
> > - NOT Working configuration
> >
> > Spark Version : "2.0.0-preview"
> >
> > The Vert.x library brings ... (same dependencies as before)
> > jackson-annotations:2.6.0
> > jackson-core:2.6.1
> > jackson-databind:2.6.1
> >
> > Spark brings
> > jackson-module-scala_2.11: 2.5.3
> >
> > The module scale related to jackson is 2.5.3 ... seems not work with above
> > 2.6.0/2.6.1 version ... so the exception.
> >
> >
> >> From: so...@cloudera.com
> >> Date: Sat, 2 Jul 2016 08:34:45 +0100
> >> Subject: Re: Spark 2.0.0-preview ... problem with jackson core version
> >> To: charles.al...@metamarkets.com
> >> CC: ppatie...@live.com; user@spark.apache.org
> >
> >>
> >> mvn dependency:tree?
> >>
> >> On Sat, Jul 2, 2016 at 12:46 AM, Charles Allen
> >>  wrote:
> >> > I'm having the same difficulty porting
> >> > https://github.com/metamx/druid-spark-batch/tree/spark2 over to
> >> > spark2.x,
> >> > where I have to go track down who is pulling in bad jackson versions.
> >> >
> >>
> >> -
> >> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> >>
> 
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> 
  

Re: Spark 2.0.0-preview ... problem with jackson core version

2016-07-02 Thread Sean Owen
This is something to do with your app. The version is 2.6.5 in master
and branch-2.0, and jackson-module-scala is managed to this version
along with all the other jackson artifacts.

On Sat, Jul 2, 2016 at 1:35 PM, Paolo Patierno  wrote:
> What I see is the following ...
>
> - Working configuration
>
> Spark Version : "2.0.0-SNAPSHOT"
>
> The Vert.x library brings ...
> jackson-annotations:2.6.0
> jackson-core:2.6.1
> jackson-databind:2.6.1
>
> Spark brings
> jackson-annotations:2.6.5
> jackson-core:2.6.5
> jackson-databind:2.6.5
> jackson-module-scala_2.11: 2.6.5
>
> The runtime will use the latest version 2.6.5 and all works fine.
>
> - NOT Working configuration
>
> Spark Version : "2.0.0-preview"
>
> The Vert.x library brings ... (same dependencies as before)
> jackson-annotations:2.6.0
> jackson-core:2.6.1
> jackson-databind:2.6.1
>
> Spark brings
> jackson-module-scala_2.11: 2.5.3
>
> The module scale related to jackson is 2.5.3 ... seems not work with above
> 2.6.0/2.6.1 version ... so the exception.
>
>
>> From: so...@cloudera.com
>> Date: Sat, 2 Jul 2016 08:34:45 +0100
>> Subject: Re: Spark 2.0.0-preview ... problem with jackson core version
>> To: charles.al...@metamarkets.com
>> CC: ppatie...@live.com; user@spark.apache.org
>
>>
>> mvn dependency:tree?
>>
>> On Sat, Jul 2, 2016 at 12:46 AM, Charles Allen
>>  wrote:
>> > I'm having the same difficulty porting
>> > https://github.com/metamx/druid-spark-batch/tree/spark2 over to
>> > spark2.x,
>> > where I have to go track down who is pulling in bad jackson versions.
>> >
>>
>> -
>> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>>

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



RE: Spark 2.0.0-preview ... problem with jackson core version

2016-07-02 Thread Paolo Patierno
What I see is the following ...

- Working configuration

Spark Version : "2.0.0-SNAPSHOT"

The Vert.x library brings ...
jackson-annotations:2.6.0
jackson-core:2.6.1
jackson-databind:2.6.1

Spark brings
jackson-annotations:2.6.5
jackson-core:2.6.5
jackson-databind:2.6.5
jackson-module-scala_2.11: 2.6.5

The runtime will use the latest version 2.6.5 and all works fine.

- NOT Working configuration

Spark Version : "2.0.0-preview"

The Vert.x library brings ... (same dependencies as before)
jackson-annotations:2.6.0
jackson-core:2.6.1
jackson-databind:2.6.1

Spark brings
jackson-module-scala_2.11: 2.5.3

The Scala module for Jackson is 2.5.3 ... it does not seem to work with the
2.6.0/2.6.1 versions above ... hence the exception.
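
One possible workaround (not from the thread; a sketch assuming an sbt build, with
versions matching the 2.6.5 line that Spark's branch-2.0 later settled on) is to pin
all Jackson artifacts, including jackson-module-scala, to the same version:

// build.sbt (sbt 0.13) -- illustrative; adjust the versions to what your
// dependency tree (e.g. mvn dependency:tree) actually reports.
dependencyOverrides ++= Set(
  "com.fasterxml.jackson.core"   %  "jackson-core"         % "2.6.5",
  "com.fasterxml.jackson.core"   %  "jackson-databind"     % "2.6.5",
  "com.fasterxml.jackson.core"   %  "jackson-annotations"  % "2.6.5",
  "com.fasterxml.jackson.module" %% "jackson-module-scala" % "2.6.5"
)

The Maven equivalent is to declare the same versions in the dependencyManagement
section of your own pom.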


> From: so...@cloudera.com
> Date: Sat, 2 Jul 2016 08:34:45 +0100
> Subject: Re: Spark 2.0.0-preview ... problem with jackson core version
> To: charles.al...@metamarkets.com
> CC: ppatie...@live.com; user@spark.apache.org
> 
> mvn dependency:tree?
> 
> On Sat, Jul 2, 2016 at 12:46 AM, Charles Allen
>  wrote:
> > I'm having the same difficulty porting
> > https://github.com/metamx/druid-spark-batch/tree/spark2 over to spark2.x,
> > where I have to go track down who is pulling in bad jackson versions.
> >
> 
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
> 
  

Spark-13979: issues with hadoopConf

2016-07-02 Thread Gil Vernik
Hello,

Any ideas about this one https://issues.apache.org/jira/browse/SPARK-13979
?
Do others see the same issue?

Thanks
Gil.




Re: Several questions about how pyspark.ml works

2016-07-02 Thread Yanbo Liang
Hi Nick,

Please see my inline reply.

Thanks
Yanbo

2016-06-12 3:08 GMT-07:00 XapaJIaMnu :

> Hey,
>
> I have some additional Spark ML algorithms implemented in scala that I
> would
> like to make available in pyspark. For a reference I am looking at the
> available logistic regression implementation here:
>
>
> https://spark.apache.org/docs/1.6.0/api/python/_modules/pyspark/ml/classification.html
>
> I have couple of questions:
> 1) The constructor for the *class LogisticRegression* as far as I
> understand
> just accepts the arguments and then just constructs the underlying Scala
> object via /py4j/ and parses its arguments. This is done via the line
> *self._java_obj = self._new_java_obj(
> "org.apache.spark.ml.classification.LogisticRegression", self.uid)*
> Is this correct?
> What does the line *super(LogisticRegression, self).__init__()* do?
>

*super(LogisticRegression, self).__init__()* is used to initialize the
*Params* object on the Python side, since we store all params on the Python
side and transfer them to the Scala side when *fit* is called.


>
> Does this mean that any python datastructures used with it will be
> converted
> to java structures once the object is instantiated?
>
> 2) The corresponding model *class LogisticRegressionModel(JavaModel):*
> again
> just instantiates the Java object and nothing else? Is just enough for me
> to
> forward the arguments and instantiate the scala objects?
> Does this mean that when the pipeline is created, even if the pipeline is
> python it expects objects which are underlying scala code instantiated by
> /py4j/. Can one use pure python elements inside the pipeline (dealing with
> RDDs)? What would be the performance implication?
>

*class LogisticRegressionModel(JavaModel)* is only a wrapper around the peer
Scala model object.


>
> Cheers,
>
> Nick
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Several-questions-about-how-pyspark-ml-works-tp27141.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: Trainning a spark ml linear regresion model fail after migrating from 1.5.2 to 1.6.1

2016-07-02 Thread Yanbo Liang
Yes, WeightedLeastSquares currently cannot solve some ill-conditioned
problems; community members have put some effort into resolving this
(SPARK-13777). As a workaround, you can set the solver to "l-bfgs", which
will train the LinearRegressionModel with the L-BFGS optimization method.
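
A minimal sketch of that workaround (not from the thread; column names are
illustrative, assuming spark.ml's LinearRegression, which owns the solver parameter):

import org.apache.spark.ml.regression.LinearRegression

// Forcing the iterative L-BFGS solver bypasses the WeightedLeastSquares
// (normal-equation) path that fails on ill-conditioned problems.
val lr = new LinearRegression()
  .setSolver("l-bfgs")
  .setFeaturesCol("features")
  .setLabelCol("label")

// trainingData is assumed to be a DataFrame with "features" and "label" columns.
// val model = lr.fit(trainingData)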

2016-06-09 7:37 GMT-07:00 chaz2505 :

> I ran into this problem too - it's because WeightedLeastSquares (added in
> 1.6.0 SPARK-10668) is being used on an ill-conditioned problem
> (SPARK-11918). I guess because of the one hot encoding. To get around it
> you
> need to ensure WeightedLeastSquares isn't used. Set parameters to make the
> following false:
>
> $(solver) == "auto" && $(elasticNetParam) == 0.0 &&
>   numFeatures <= WeightedLeastSquares.MAX_NUM_FEATURES) || $(solver) ==
> "normal"
>
> Hope this helps
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Trainning-a-spark-ml-linear-regresion-model-fail-after-migrating-from-1-5-2-to-1-6-1-tp27111p27128.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: Get both feature importance and ROC curve from a random forest classifier

2016-07-02 Thread Yanbo Liang
Hi Mathieu,

Using the new ml package to train a RandomForestClassificationModel, you
can get the feature importances. Then you can convert the prediction result
to an RDD and feed it into BinaryClassificationMetrics for the ROC curve.
You can refer to the following code snippet:

import org.apache.spark.ml.classification.RandomForestClassifier
import org.apache.spark.ml.linalg.Vector  // org.apache.spark.mllib.linalg.Vector on 1.x
import org.apache.spark.mllib.evaluation.BinaryClassificationMetrics
import org.apache.spark.sql.Row

val rf = new RandomForestClassifier()
val model = rf.fit(trainingData)

val predictions = model.transform(testData)

val scoreAndLabels =
  predictions.select(model.getRawPredictionCol, model.getLabelCol).rdd.map {
    case Row(rawPrediction: Vector, label: Double) => (rawPrediction(1), label)
    case Row(rawPrediction: Double, label: Double) => (rawPrediction, label)
  }
val metrics = new BinaryClassificationMetrics(scoreAndLabels)
metrics.roc()
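
For the feature-importance part, the trained model exposes it directly (a one-line
sketch using the same model as above):

// Vector of per-feature importances, indexed like the feature vector.
val importances = model.featureImportances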


Thanks
Yanbo

2016-06-15 7:13 GMT-07:00 matd :

> Hi ml folks !
>
> I'm using a Random Forest for a binary classification.
> I'm interested in getting both the ROC *curve* and the feature importance
> from the trained model.
>
> If I'm not missing something obvious, the ROC curve is only available in
> the
> old mllib world, via BinaryClassificationMetrics. In the new ml package,
> only the areaUnderROC and areaUnderPR are available through
> BinaryClassificationEvaluator.
>
> The feature importance is only available in ml package, through
> RandomForestClassificationModel.
>
> Any idea to get both ?
>
> Mathieu
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Get-both-feature-importance-and-ROC-curve-from-a-random-forest-classifier-tp27175.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe, e-mail: user-unsubscr...@spark.apache.org
> For additional commands, e-mail: user-h...@spark.apache.org
>
>


Re: Enforcing shuffle hash join

2016-07-02 Thread Takeshi Yamamuro
Hi,

No, Spark has no hint to force a shuffle hash join.

// maropu

On Fri, Jul 1, 2016 at 4:56 PM, Lalitha MV  wrote:

> Hi,
>
> In order to force broadcast hash join, we can set
> the spark.sql.autoBroadcastJoinThreshold config. Is there a way to enforce
> shuffle hash join in spark sql?
>
>
> Thanks,
> Lalitha
>



-- 
---
Takeshi Yamamuro


Re: spark parquet too many small files ?

2016-07-02 Thread Takeshi Yamamuro
Please also see https://issues.apache.org/jira/browse/SPARK-16188.

// maropu

On Fri, Jul 1, 2016 at 7:39 PM, kali.tumm...@gmail.com <
kali.tumm...@gmail.com> wrote:

> I found the jira for the issue will there be a fix in future ? or no fix ?
>
> https://issues.apache.org/jira/browse/SPARK-6221
>
>
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/spark-parquet-too-many-small-files-tp27264p27267.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
>
> -
> To unsubscribe e-mail: user-unsubscr...@spark.apache.org
>
>


-- 
---
Takeshi Yamamuro


Re: Thrift JDBC server - why only one per machine and only yarn-client

2016-07-02 Thread Takeshi Yamamuro
This is probably because the current thrift-server implementation has a
`SparkContext` inside
(see:
https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala#L34
).
To support yarn-cluster, we would need to add a lot of functionality to
deploy the thrift-server itself in the cluster.
However, it seems to me there are many technical issues around this.

// maropu

On Fri, Jul 1, 2016 at 1:38 PM, Egor Pahomov  wrote:

> What about yarn-cluster mode?
>
> 2016-07-01 11:24 GMT-07:00 Egor Pahomov :
>
>> Separate bad users with bad quires from good users with good quires.
>> Spark do not provide no scope separation out of the box.
>>
>> 2016-07-01 11:12 GMT-07:00 Jeff Zhang :
>>
>>> I think so, any reason you want to deploy multiple thrift server on one
>>> machine ?
>>>
>>> On Fri, Jul 1, 2016 at 10:59 AM, Egor Pahomov 
>>> wrote:
>>>
 Takeshi, of course I used different HIVE_SERVER2_THRIFT_PORT
 Jeff, thanks, I would try, but from your answer I'm getting the
 feeling, that I'm trying some very rare case?

 2016-07-01 10:54 GMT-07:00 Jeff Zhang :

> This is not a bug, because these 2 processes use the
> same SPARK_PID_DIR which is /tmp by default.  Although you can resolve 
> this
> by using different SPARK_PID_DIR, I suspect you would still have other
> issues like port conflict. I would suggest you to deploy one spark thrift
> server per machine for now. If stick to deploy multiple spark thrift 
> server
> on one machine, then define different SPARK_CONF_DIR, SPARK_LOG_DIR and
> SPARK_PID_DIR for your 2 instances of spark thrift server. Not sure if
> there's other conflicts. but please try first.
>
>
> On Fri, Jul 1, 2016 at 10:47 AM, Egor Pahomov 
> wrote:
>
>> I get
>>
>> "org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as
>> process 28989.  Stop it first."
>>
>> Is it a bug?
>>
>> 2016-07-01 10:10 GMT-07:00 Jeff Zhang :
>>
>>> I don't think the one instance per machine is true.  As long as you
>>> resolve the conflict issue such as port conflict, pid file, log file and
>>> etc, you can run multiple instances of spark thrift server.
>>>
>>> On Fri, Jul 1, 2016 at 9:32 AM, Egor Pahomov >> > wrote:
>>>
 Hi, I'm using Spark Thrift JDBC server and 2 limitations are really
 bother me -

 1) One instance per machine
 2) Yarn client only(not yarn cluster)

 Are there any architectural reasons for such limitations? About
 yarn-client I might understand in theory - master is the same process 
 as a
 server, so it makes some sense, but it's really inconvenient - I need 
 a lot
 of memory on my driver machine. Reasons for one instance per machine I 
 do
 not understand.

 --


 *Sincerely yoursEgor Pakhomov*

>>>
>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>>>
>>
>>
>>
>> --
>>
>>
>> *Sincerely yoursEgor Pakhomov*
>>
>
>
>
> --
> Best Regards
>
> Jeff Zhang
>



 --


 *Sincerely yoursEgor Pakhomov*

>>>
>>>
>>>
>>> --
>>> Best Regards
>>>
>>> Jeff Zhang
>>>
>>
>>
>>
>> --
>>
>>
>> *Sincerely yoursEgor Pakhomov*
>>
>
>
>
> --
>
>
> *Sincerely yoursEgor Pakhomov*
>



-- 
---
Takeshi Yamamuro


Re: Ideas to put a Spark ML model in production

2016-07-02 Thread Yanbo Liang
Let's suppose you have trained a LogisticRegressionModel and saved it at
"/tmp/lr-model". You can copy the directory to the production environment
and use it to make predictions on users' new data. You can refer to the
following code snippet:

import org.apache.spark.ml.classification.LogisticRegressionModel

val model = LogisticRegressionModel.load("/tmp/lr-model")
val data = newDataset  // a DataFrame of new user data to score
val prediction = model.transform(data)

However, we usually save/load a PipelineModel, which includes the necessary
feature transformers along with the trained model, rather than the single
model; the operations are similar.
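
A short sketch of the PipelineModel variant (not from the thread; paths are placeholders):

import org.apache.spark.ml.PipelineModel

// In the training environment, after pipeline.fit(trainingData):
// fittedPipeline.write.overwrite().save("/tmp/lr-pipeline-model")

// In production, load the whole fitted pipeline and score new data with it.
val pipelineModel = PipelineModel.load("/tmp/lr-pipeline-model")
val scored = pipelineModel.transform(newDataset)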

Thanks
Yanbo

2016-06-23 10:54 GMT-07:00 Saurabh Sardeshpande :

> Hi all,
>
> How do you reliably deploy a spark model in production? Let's say I've
> done a lot of analysis and come up with a model that performs great. I have
> this "model file" and I'm not sure what to do with it. I want to build some
> kind of service around it that takes some inputs, converts them into a
> feature, runs the equivalent of 'transform', i.e. predict the output and
> return the output.
>
> At the Spark Summit I heard a lot of talk about how this will be easy to
> do in Spark 2.0, but I'm looking for an solution sooner. PMML support is
> limited and the model I have can't be exported in that format.
>
> I would appreciate any ideas around this, especially from personal
> experiences.
>
> Regards,
> Saurabh
>


Re: Spark 2.0.0-preview ... problem with jackson core version

2016-07-02 Thread Sean Owen
mvn dependency:tree?

On Sat, Jul 2, 2016 at 12:46 AM, Charles Allen
 wrote:
> I'm having the same difficulty porting
> https://github.com/metamx/druid-spark-batch/tree/spark2 over to spark2.x,
> where I have to go track down who is pulling in bad jackson versions.
>

-
To unsubscribe e-mail: user-unsubscr...@spark.apache.org



Re: Custom Optimizer

2016-07-02 Thread Yanbo Liang
Spark MLlib does not support plugging in a custom optimizer, since the
optimizer interface is private.

Thanks
Yanbo

2016-06-23 16:56 GMT-07:00 Stephen Boesch :

> My team has a custom optimization routine that we would have wanted to
> plug in as a replacement for the default  LBFGS /  OWLQN for use by some of
> the ml/mllib algorithms.
>
> However it seems the choice of optimizer is hard-coded in every algorithm
> except LDA: and even in that one it is only a choice between the internally
> defined Online or batch version.
>
> Any suggestions on how we might be able to incorporate our own optimizer?
> Or do we need to roll all of our algorithms from top to bottom - basically
> side stepping ml/mllib?
>
> thanks
> stephen
>


Re: Spark ML - Java implementation of custom Transformer

2016-07-02 Thread Yanbo Liang
Hi Mehdi,

Could you share your code so we can help you figure out the problem?
Actually, JavaTestParams works well, but there is a compatibility issue
with JavaDeveloperApiExample.
We removed JavaDeveloperApiExample temporarily in Spark 2.0 so as not to
confuse users. Since the solution for the compatibility issue has been
figured out, we will add it back in 2.1.

Thanks
Yanbo
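
For reference, a minimal Scala sketch (not from the thread, which is about the Java
API; names are illustrative) of a UnaryTransformer whose copy simply delegates to
defaultCopy:

import org.apache.spark.ml.UnaryTransformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.types.{DataType, StringType}

// Illustrative transformer that lower-cases a string column.
class LowerCaseTransformer(override val uid: String)
  extends UnaryTransformer[String, String, LowerCaseTransformer] {

  def this() = this(Identifiable.randomUID("lowerCase"))

  override protected def createTransformFunc: String => String = _.toLowerCase

  override protected def outputDataType: DataType = StringType

  // The usual pattern when the transformer has no extra constructor state:
  // delegate copy to defaultCopy, which re-creates the instance with the same UID.
  override def copy(extra: ParamMap): LowerCaseTransformer = defaultCopy(extra)
}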

2016-06-27 11:57 GMT-07:00 Mehdi Meziane :

> Hi all,
>
> We have some problems while implementing custom Transformers in JAVA
> (SPARK 1.6.1).
> We do override the method copy, but it crashes with an AbstractMethodError.
>
> If we extends the UnaryTransformer, and do not override the copy method,
> it works without any error.
>
> We tried to write the copy like in these examples :
>
> https://github.com/apache/spark/blob/branch-2.0/mllib/src/test/java/org/apache/spark/ml/param/JavaTestParams.java
>
> https://github.com/eBay/Spark/blob/branch-1.6/examples/src/main/java/org/apache/spark/examples/ml/JavaDeveloperApiExample.java
>
> None of it worked.
>
> The copy is defined in the Params class as :
>
>   /**
>* Creates a copy of this instance with the same UID and some extra
> params.
>* Subclasses should implement this method and set the return type
> properly.
>*
>* @see [[defaultCopy()]]
>*/
>   def copy(extra: ParamMap): Params
>
> Any idea?
> Thanks,
>
> Mehdi
>