Bootstrap Action to Install Spark 2.0 on EMR?
Hi all, Has anybody tried out Spark 2.0 on EMR 4.x? Will it work? I am looking for a bootstrap action script to install it on EMR; does someone have a working one to share? Much appreciated! Best, Renxia
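For context on what such a bootstrap action might look like: a minimal sketch, assuming Spark 2.0 is not in the EMR 4.x application catalog and that you stage your own tarball in S3. The bucket, paths, and tarball name below are hypothetical; note that Spark 2.0 later shipped as a supported application with EMR 5.x, which avoids this entirely.

```bash
#!/usr/bin/env bash
# Hypothetical EMR bootstrap action: install a custom Spark build on every node.
# Assumes you have uploaded a Spark 2.0 tarball to your own S3 bucket.
set -euo pipefail

SPARK_TARBALL="s3://my-bucket/spark/spark-2.0.0-bin-hadoop2.7.tgz"  # hypothetical path
INSTALL_DIR="/opt"

# Fetch and unpack the tarball, then point a stable symlink at it.
aws s3 cp "$SPARK_TARBALL" /tmp/spark.tgz
sudo mkdir -p "$INSTALL_DIR"
sudo tar -xzf /tmp/spark.tgz -C "$INSTALL_DIR"
sudo ln -sfn "$INSTALL_DIR/spark-2.0.0-bin-hadoop2.7" /opt/spark
```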
latest version of Spark to work OK as Hive engine
Hi, Looking at the presentation "Hive on Spark is Blazing Fast": which is the latest version of Spark that can run as an execution engine for Hive, please? Thanks. P.S. I am aware of Hive on Tez, but that is not what I am interested in here. Warmest regards
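For reference, Hive on Spark is selected per session or globally via the hive.execution.engine property; the supported Spark version depends on the Hive release, so check the Hive/Spark compatibility matrix before picking one. A hedged sketch (the table name is hypothetical):

```bash
# Hypothetical smoke test: run one query with Spark as Hive's execution engine.
# hive.execution.engine=spark requires a Hive build configured against a
# compatible Spark version (see the Hive on Spark compatibility matrix).
hive -e "set hive.execution.engine=spark; select count(*) from some_table;"
```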
Re: spark parquet too many small files ?
Hi Takeshi, I can't use coalesce in the spark-sql shell, right? I know we can use coalesce in a Spark application written in Scala, but in this project we are not building a jar or using Python; we just execute a Hive query in the spark-sql shell and submit it to YARN in client mode. Example:

spark-sql --verbose --queue default --name wchargeback_event.sparksql.kali \
  --master yarn-client --driver-memory 15g --executor-memory 15g \
  --num-executors 10 --executor-cores 2 \
  -f /x/home/pp_dt_fin_batch/users/srtummala/run-spark/sql/wtr_full.sql \
  --conf "spark.yarn.executor.memoryOverhead=8000" \
  --conf "spark.sql.shuffle.partitions=50" \
  --conf "spark.kyroserializer.buffer.max.mb=5g" \
  --conf "spark.driver.maxResultSize=20g" \
  --conf "spark.storage.memoryFraction=0.8" \
  --conf "spark.hadoopConfiguration=2560" \
  --conf "spark.dynamicAllocation.enabled=false" \
  --conf "spark.shuffle.service.enabled=false" \
  --conf "spark.executor.instances=10"

Thanks Sri

On Sat, Jul 2, 2016 at 2:53 AM, Takeshi Yamamuro wrote:
> Please also see https://issues.apache.org/jira/browse/SPARK-16188.
>
> // maropu
>
> On Fri, Jul 1, 2016 at 7:39 PM, kali.tumm...@gmail.com wrote:
>> I found the jira for the issue; will there be a fix in the future, or no fix?
>>
>> https://issues.apache.org/jira/browse/SPARK-6221
>> [...]

--
Thanks & Regards
Sri Tummala
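One way to bound the output file count from pure SQL, without coalesce: a DISTRIBUTE BY clause forces a shuffle, so the number of output files is capped by spark.sql.shuffle.partitions. A hedged sketch (the table and column names are hypothetical; adapt to your query):

```bash
# Hypothetical: cap Parquet output at ~10 files from the spark-sql shell.
# DISTRIBUTE BY repartitions the result before the write, so the file count
# is bounded by spark.sql.shuffle.partitions.
spark-sql --master yarn-client \
  --conf "spark.sql.shuffle.partitions=10" \
  -e "INSERT OVERWRITE TABLE target_parquet_table
      SELECT * FROM source_table
      DISTRIBUTE BY some_key_column"
```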
Working of Streaming Kmeans
Hi, I wanted to ask a very basic question about how Streaming KMeans works. Does the model update only during training (i.e. when the training dataset is used), or does it also update in predictOnValues for the test dataset? Thanks and Regards, Biplob -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Working-of-Streaming-Kmeans-tp27268.html Sent from the Apache Spark User List mailing list archive at Nabble.com. - To unsubscribe e-mail: user-unsubscr...@spark.apache.org
RE: Spark 2.0.0-preview ... problem with jackson core version
Yes! We got it :-) Btw it's not available on Maven yet. :-(

Paolo Patierno
Senior Software Engineer (IoT) @ Red Hat
Microsoft MVP on Windows Embedded & IoT
Microsoft Azure Advisor
Twitter : @ppatierno
Linkedin : paolopatierno
Blog : DevExperience

> From: so...@cloudera.com
> Date: Sat, 2 Jul 2016 15:12:11 +0100
> Subject: Re: Spark 2.0.0-preview ... problem with jackson core version
> To: ppatie...@live.com
> CC: charles.al...@metamarkets.com; user@spark.apache.org
>
> Ah, it looks like it was 2.5.3 as of 2.0.0-preview:
>
> https://github.com/apache/spark/blob/2.0.0-preview/pom.xml#L164
>
> but was updated to 2.6.5 soon after that, since it was 2.6.5 in 2.0.0-RC1:
>
> https://github.com/apache/spark/blob/v2.0.0-rc1/pom.xml#L163
> [...]
Re: Spark 2.0.0-preview ... problem with jackson core version
Ah, it looks like it was 2.5.3 as of 2.0.0-preview:

https://github.com/apache/spark/blob/2.0.0-preview/pom.xml#L164

but was updated to 2.6.5 soon after that, since it was 2.6.5 in 2.0.0-RC1:

https://github.com/apache/spark/blob/v2.0.0-rc1/pom.xml#L163

On Sat, Jul 2, 2016 at 3:04 PM, Paolo Patierno wrote:
> This sounds strange to me because here:
>
> https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11/2.0.0-preview
>
> I see:
>
> com.fasterxml.jackson.module » jackson-module-scala_2.11 : 2.5.3
>
> So it seems that 2.0.0-preview is bringing jackson-module-scala 2.5.3, which
> is what I see.
> [...]
RE: Spark 2.0.0-preview ... problem with jackson core version
This sounds strange to me because here:

https://mvnrepository.com/artifact/org.apache.spark/spark-core_2.11/2.0.0-preview

I see:

com.fasterxml.jackson.module » jackson-module-scala_2.11 : 2.5.3

So it seems that 2.0.0-preview is bringing jackson-module-scala 2.5.3, which is what I see.

> From: so...@cloudera.com
> Date: Sat, 2 Jul 2016 14:32:58 +0100
> Subject: Re: Spark 2.0.0-preview ... problem with jackson core version
> To: ppatie...@live.com
> CC: charles.al...@metamarkets.com; user@spark.apache.org
>
> This is something to do with your app. The version is 2.6.5 in master
> and branch-2.0, and jackson-module-scala is managed to this version
> along with all the other jackson artifacts.
> [...]
Re: Spark 2.0.0-preview ... problem with jackson core version
This is something to do with your app. The version is 2.6.5 in master and branch-2.0, and jackson-module-scala is managed to this version along with all the other jackson artifacts.

On Sat, Jul 2, 2016 at 1:35 PM, Paolo Patierno wrote:
> What I see is the following ...
> [...]
RE: Spark 2.0.0-preview ... problem with jackson core version
What I see is the following ...

- Working configuration

Spark version: "2.0.0-SNAPSHOT"

The Vert.x library brings:
jackson-annotations:2.6.0
jackson-core:2.6.1
jackson-databind:2.6.1

Spark brings:
jackson-annotations:2.6.5
jackson-core:2.6.5
jackson-databind:2.6.5
jackson-module-scala_2.11:2.6.5

The runtime will use the latest version, 2.6.5, and all works fine.

- NOT working configuration

Spark version: "2.0.0-preview"

The Vert.x library brings (same dependencies as before):
jackson-annotations:2.6.0
jackson-core:2.6.1
jackson-databind:2.6.1

Spark brings:
jackson-module-scala_2.11:2.5.3

The jackson scala module is 2.5.3, which does not seem to work with the 2.6.0/2.6.1 versions above; hence the exception.

> From: so...@cloudera.com
> Date: Sat, 2 Jul 2016 08:34:45 +0100
> Subject: Re: Spark 2.0.0-preview ... problem with jackson core version
> To: charles.al...@metamarkets.com
> CC: ppatie...@live.com; user@spark.apache.org
>
> mvn dependency:tree?
>
> On Sat, Jul 2, 2016 at 12:46 AM, Charles Allen wrote:
> > I'm having the same difficulty porting
> > https://github.com/metamx/druid-spark-batch/tree/spark2 over to spark2.x,
> > where I have to go track down who is pulling in bad jackson versions.
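Following up on the "mvn dependency:tree?" suggestion in the quoted reply: the tree can be filtered to just the jackson artifacts to see who pulls in which version. A sketch, to be run in your own project's directory:

```bash
# Show only jackson artifacts in the dependency tree; -Dverbose also lists
# versions that were omitted due to conflict resolution.
mvn dependency:tree -Dverbose \
  -Dincludes="com.fasterxml.jackson.core:*,com.fasterxml.jackson.module:*"
```

If a transitive 2.5.3 jackson-module-scala shows up, one common fix is to pin all jackson artifacts to a single version in your own pom's dependencyManagement section.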
Spark-13979: issues with hadoopConf
Hello, Any ideas about this one: https://issues.apache.org/jira/browse/SPARK-13979 ? Do others see the same issues? Thanks, Gil.
Re: Several questions about how pyspark.ml works
Hi Nick,

Please see my replies inline.

Thanks
Yanbo

2016-06-12 3:08 GMT-07:00 XapaJIaMnu:
> Hey,
>
> I have some additional Spark ML algorithms implemented in Scala that I would
> like to make available in pyspark. For reference I am looking at the
> available logistic regression implementation here:
>
> https://spark.apache.org/docs/1.6.0/api/python/_modules/pyspark/ml/classification.html
>
> I have a couple of questions:
> 1) The constructor for the class LogisticRegression, as far as I understand,
> just accepts the arguments and then constructs the underlying Scala object
> via py4j and parses its arguments. This is done via the line
> self._java_obj = self._new_java_obj(
>     "org.apache.spark.ml.classification.LogisticRegression", self.uid)
> Is this correct?
> What does the line super(LogisticRegression, self).__init__() do?

super(LogisticRegression, self).__init__() is used to initialize the Params object on the Python side, since we store all params on the Python side and transfer them to the Scala side when calling fit.

> Does this mean that any Python data structures used with it will be
> converted to Java structures once the object is instantiated?
>
> 2) The corresponding model, class LogisticRegressionModel(JavaModel), again
> just instantiates the Java object and nothing else? Is it enough for me to
> just forward the arguments and instantiate the Scala objects?
> Does this mean that when the pipeline is created, even if the pipeline is
> Python, it expects objects whose underlying Scala code is instantiated by
> py4j? Can one use pure Python elements inside the pipeline (dealing with
> RDDs)? What would be the performance implication?

class LogisticRegressionModel(JavaModel) is only a wrapper of the peer Scala model object.
> Cheers,
>
> Nick
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Several-questions-about-how-pyspark-ml-works-tp27141.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Training a spark ml linear regression model fails after migrating from 1.5.2 to 1.6.1
Yes, WeightedLeastSquares currently cannot solve some ill-conditioned problems; community members have put some effort into resolving this (SPARK-13777). As a workaround, you can set the solver to "l-bfgs", which will train the LinearRegressionModel with the L-BFGS optimization method.

2016-06-09 7:37 GMT-07:00 chaz2505:
> I ran into this problem too - it's because WeightedLeastSquares (added in
> 1.6.0, SPARK-10668) is being used on an ill-conditioned problem
> (SPARK-11918). I guess because of the one-hot encoding. To get around it
> you need to ensure WeightedLeastSquares isn't used. Set parameters to make
> the following false:
>
> ($(solver) == "auto" && $(elasticNetParam) == 0.0 &&
> numFeatures <= WeightedLeastSquares.MAX_NUM_FEATURES) || $(solver) == "normal"
>
> Hope this helps
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Trainning-a-spark-ml-linear-regresion-model-fail-after-migrating-from-1-5-2-to-1-6-1-tp27111p27128.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Get both feature importance and ROC curve from a random forest classifier
Hi Mathieu,

Using the new ml package to train a RandomForestClassificationModel, you can get the feature importances directly. Then you can convert the prediction result to an RDD and feed it into BinaryClassificationMetrics for the ROC curve. You can refer to the following code snippet:

val rf = new RandomForestClassifier()
val model = rf.fit(trainingData)
val predictions = model.transform(testData)
val scoreAndLabels =
  predictions.select(model.getRawPredictionCol, model.getLabelCol).rdd.map {
    case Row(rawPrediction: Vector, label: Double) => (rawPrediction(1), label)
    case Row(rawPrediction: Double, label: Double) => (rawPrediction, label)
  }
val metrics = new BinaryClassificationMetrics(scoreAndLabels)
metrics.roc()

Thanks
Yanbo

2016-06-15 7:13 GMT-07:00 matd:
> Hi ml folks!
>
> I'm using a Random Forest for a binary classification.
> I'm interested in getting both the ROC *curve* and the feature importance
> from the trained model.
>
> If I'm not missing something obvious, the ROC curve is only available in
> the old mllib world, via BinaryClassificationMetrics. In the new ml package,
> only areaUnderROC and areaUnderPR are available through
> BinaryClassificationEvaluator.
>
> The feature importance is only available in the ml package, through
> RandomForestClassificationModel.
>
> Any idea how to get both?
>
> Mathieu
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/Get-both-feature-importance-and-ROC-curve-from-a-random-forest-classifier-tp27175.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.
Re: Enforcing shuffle hash join
Hi,

No, Spark has no hint to force a shuffle hash join.

// maropu

On Fri, Jul 1, 2016 at 4:56 PM, Lalitha MV wrote:
> Hi,
>
> In order to force a broadcast hash join, we can set the
> spark.sql.autoBroadcastJoinThreshold config. Is there a way to enforce a
> shuffle hash join in Spark SQL?
>
> Thanks,
> Lalitha

--
---
Takeshi Yamamuro
Re: spark parquet too many small files ?
Please also see https://issues.apache.org/jira/browse/SPARK-16188.

// maropu

On Fri, Jul 1, 2016 at 7:39 PM, kali.tumm...@gmail.com wrote:
> I found the jira for the issue; will there be a fix in the future, or no fix?
>
> https://issues.apache.org/jira/browse/SPARK-6221
>
> --
> View this message in context:
> http://apache-spark-user-list.1001560.n3.nabble.com/spark-parquet-too-many-small-files-tp27264p27267.html
> Sent from the Apache Spark User List mailing list archive at Nabble.com.

--
---
Takeshi Yamamuro
Re: Thrift JDBC server - why only one per machine and only yarn-client
This is probably because the current thrift-server implementation has a `SparkContext` inside (see:
https://github.com/apache/spark/blob/master/sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/SparkSQLEnv.scala#L34
). To support yarn-cluster, we would need to add a lot of functionality to deploy the thrift-server itself in a cluster, and it seems to me there are many technical issues around this.

// maropu

On Fri, Jul 1, 2016 at 1:38 PM, Egor Pahomov wrote:
> What about yarn-cluster mode?
>
> 2016-07-01 11:24 GMT-07:00 Egor Pahomov:
>> To separate bad users with bad queries from good users with good queries.
>> Spark does not provide scope separation out of the box.
>>
>> 2016-07-01 11:12 GMT-07:00 Jeff Zhang:
>>> I think so, any reason you want to deploy multiple thrift servers on one
>>> machine?
>>>
>>> On Fri, Jul 1, 2016 at 10:59 AM, Egor Pahomov wrote:
>>>> Takeshi, of course I used a different HIVE_SERVER2_THRIFT_PORT.
>>>> Jeff, thanks, I will try, but from your answer I'm getting the feeling
>>>> that I'm trying some very rare case?
>>>>
>>>> 2016-07-01 10:54 GMT-07:00 Jeff Zhang:
>>>>> This is not a bug; these 2 processes use the same SPARK_PID_DIR,
>>>>> which is /tmp by default. Although you can resolve this by using a
>>>>> different SPARK_PID_DIR, I suspect you would still have other issues
>>>>> like port conflicts. I would suggest you deploy one spark thrift server
>>>>> per machine for now. If you stick to deploying multiple spark thrift
>>>>> servers on one machine, then define a different SPARK_CONF_DIR,
>>>>> SPARK_LOG_DIR and SPARK_PID_DIR for your 2 instances. Not sure if
>>>>> there are other conflicts, but please try first.
>>>>>
>>>>> On Fri, Jul 1, 2016 at 10:47 AM, Egor Pahomov wrote:
>>>>>> I get
>>>>>>
>>>>>> "org.apache.spark.sql.hive.thriftserver.HiveThriftServer2 running as
>>>>>> process 28989. Stop it first."
>>>>>>
>>>>>> Is it a bug?
>>>>>>
>>>>>> 2016-07-01 10:10 GMT-07:00 Jeff Zhang:
>>>>>>> I don't think the one instance per machine is true. As long as you
>>>>>>> resolve the conflict issues such as port conflict, pid file, log
>>>>>>> file, etc., you can run multiple instances of spark thrift server.
>>>>>>>
>>>>>>> On Fri, Jul 1, 2016 at 9:32 AM, Egor Pahomov wrote:
>>>>>>>> Hi, I'm using the Spark Thrift JDBC server and 2 limitations
>>>>>>>> really bother me:
>>>>>>>> 1) One instance per machine
>>>>>>>> 2) Yarn client only (not yarn cluster)
>>>>>>>> Are there any architectural reasons for such limitations? About
>>>>>>>> yarn-client I might understand in theory - the master is the same
>>>>>>>> process as the server, so it makes some sense, but it's really
>>>>>>>> inconvenient - I need a lot of memory on my driver machine. The
>>>>>>>> reasons for one instance per machine I do not understand.
>>>>>>>>
>>>>>>>> --
>>>>>>>> Sincerely yours, Egor Pakhomov

--
---
Takeshi Yamamuro
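Jeff's per-instance-directory advice can be sketched as a small launcher for a second thrift server on the same machine. The directory paths and port are hypothetical; the point is that each instance gets its own conf, log, and pid locations plus a distinct thrift port:

```bash
#!/usr/bin/env bash
# Hypothetical second Spark Thrift Server instance on one machine.
# Each instance needs its own conf/log/pid dirs and its own thrift port,
# otherwise the pid-file check reports "running as process N. Stop it first."
set -euo pipefail

export SPARK_CONF_DIR=/etc/spark/conf-instance2    # hypothetical path
export SPARK_LOG_DIR=/var/log/spark-instance2      # hypothetical path
export SPARK_PID_DIR=/var/run/spark-instance2      # hypothetical path
export HIVE_SERVER2_THRIFT_PORT=10001              # default instance keeps 10000

mkdir -p "$SPARK_LOG_DIR" "$SPARK_PID_DIR"
"$SPARK_HOME/sbin/start-thriftserver.sh" --master yarn-client
```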
Re: Ideas to put a Spark ML model in production
Let's suppose you have trained a LogisticRegressionModel and saved it at "/tmp/lr-model". You can copy the directory to the production environment and use it to make predictions on users' new data. You can refer to the following code snippet:

val model = LogisticRegressionModel.load("/tmp/lr-model")
val data = newDataset
val prediction = model.transform(data)

However, usually we save/load a PipelineModel, which includes the necessary feature transformers and the model training process, rather than the single model; the operations are similar.

Thanks
Yanbo

2016-06-23 10:54 GMT-07:00 Saurabh Sardeshpande:
> Hi all,
>
> How do you reliably deploy a spark model in production? Let's say I've
> done a lot of analysis and come up with a model that performs great. I have
> this "model file" and I'm not sure what to do with it. I want to build some
> kind of service around it that takes some inputs, converts them into a
> feature vector, runs the equivalent of 'transform', i.e. predicts the output,
> and returns the output.
>
> At the Spark Summit I heard a lot of talk about how this will be easy to
> do in Spark 2.0, but I'm looking for a solution sooner. PMML support is
> limited and the model I have can't be exported in that format.
>
> I would appreciate any ideas around this, especially from personal
> experiences.
>
> Regards,
> Saurabh
Re: Spark 2.0.0-preview ... problem with jackson core version
mvn dependency:tree?

On Sat, Jul 2, 2016 at 12:46 AM, Charles Allen wrote:
> I'm having the same difficulty porting
> https://github.com/metamx/druid-spark-batch/tree/spark2 over to spark2.x,
> where I have to go track down who is pulling in bad jackson versions.
Re: Custom Optimizer
Spark MLlib does not support plugging in a custom optimizer, since the optimizer interface is private.

Thanks
Yanbo

2016-06-23 16:56 GMT-07:00 Stephen Boesch:
> My team has a custom optimization routine that we would have wanted to plug
> in as a replacement for the default LBFGS / OWLQN used by some of the
> ml/mllib algorithms.
>
> However, it seems the choice of optimizer is hard-coded in every algorithm
> except LDA, and even there it is only a choice between the internally
> defined online and batch versions.
>
> Any suggestions on how we might be able to incorporate our own optimizer?
> Or do we need to roll all of our algorithms from top to bottom, basically
> side-stepping ml/mllib?
>
> thanks
> stephen
Re: Spark ML - Java implementation of custom Transformer
Hi Mehdi,

Could you share your code so we can help you figure out the problem? Actually, JavaTestParams works well, but there is a compatibility issue with JavaDeveloperApiExample. We removed JavaDeveloperApiExample temporarily in Spark 2.0 to avoid confusing users. Since a solution for the compatibility issue has been figured out, we will add it back in 2.1.

Thanks
Yanbo

2016-06-27 11:57 GMT-07:00 Mehdi Meziane:
> Hi all,
>
> We have some problems implementing custom Transformers in Java (Spark 1.6.1).
> We override the copy method, but it crashes with an AbstractMethodError.
>
> If we extend UnaryTransformer and do not override the copy method, it works
> without any error.
>
> We tried to write the copy like in these examples:
>
> https://github.com/apache/spark/blob/branch-2.0/mllib/src/test/java/org/apache/spark/ml/param/JavaTestParams.java
>
> https://github.com/eBay/Spark/blob/branch-1.6/examples/src/main/java/org/apache/spark/examples/ml/JavaDeveloperApiExample.java
>
> Neither worked.
>
> The copy method is defined in the Params class as:
>
>   /**
>    * Creates a copy of this instance with the same UID and some extra params.
>    * Subclasses should implement this method and set the return type properly.
>    *
>    * @see [[defaultCopy()]]
>    */
>   def copy(extra: ParamMap): Params
>
> Any idea?
> Thanks,
>
> Mehdi