Re: Is there any Spark implementation for Item-based Collaborative Filtering?
The latest version of MLlib has collaborative filtering built in, no? J

On Nov 30, 2014, at 9:36 AM, shahab shahab.mok...@gmail.com wrote: Hi, I just wonder if there is any implementation of item-based collaborative filtering in Spark? Best, /Shahab
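[Editor's note: the collaborative filtering that ships with MLlib is matrix-factorization based (ALS) rather than a classic item-based neighborhood method. A minimal sketch of the built-in API, assuming Spark 1.x and a ratings file in "user,item,rating" format; the path and field layout are hypothetical:]

import org.apache.spark.mllib.recommendation.{ALS, Rating}

// Hypothetical input: one "user,item,rating" triple per line
val ratings = sc.textFile("hdfs:///data/ratings.csv").map { line =>
  val Array(user, item, rating) = line.split(',')
  Rating(user.toInt, item.toInt, rating.toDouble)
}

// Train a matrix-factorization model (rank, iterations, lambda values are illustrative)
val model = ALS.train(ratings, 10, 20, 0.01)

// Score a single user/item pair
val score = model.predict(42, 17)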
Re: how to convert System.currentTimeMillis to calendar time
You could also use the Joda-Time library, which has a ton of other great options in it. J

On Thu, Nov 13, 2014 at 10:40 AM, Akhil Das ak...@sigmoidanalytics.com wrote: This way?

scala> val epoch = System.currentTimeMillis
epoch: Long = 1415903974545
scala> val date = new Date(epoch)
date: java.util.Date = Fri Nov 14 00:09:34 IST 2014

Thanks, Best Regards

On Thu, Nov 13, 2014 at 10:17 PM, spr s...@yarcdata.com wrote: Apologies for what seems an egregiously simple question, but I can't find the answer anywhere. I have timestamps from the Spark Streaming Time() interface, in milliseconds since an epoch, and I want to print out a human-readable calendar date and time. How does one do that?
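[Editor's note: since Joda-Time came up, here is a minimal sketch of the same conversion with it, assuming the joda-time jar is on the classpath; the format pattern is just an example:]

import org.joda.time.DateTime
import org.joda.time.format.DateTimeFormat

val epoch = System.currentTimeMillis
// Wrap the epoch milliseconds in a DateTime and render it in a readable form
val dt = new DateTime(epoch)
val formatted = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss").print(dt)
println(formatted)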
Re: which is the recommended workflow engine for Apache Spark jobs?
I have used Oozie for all our workflows with Spark apps, but you will have to use a Java action as the workflow element. I am interested in anyone's experience with Luigi and/or any other tools.

On Mon, Nov 10, 2014 at 10:34 AM, Adamantios Corais adamantios.cor...@gmail.com wrote: I have some previous experience with Apache Oozie from when I was developing in Apache Pig. Now I am working exclusively with Apache Spark and I am looking for a tool with similar functionality. Is Oozie recommended? What about Luigi? What do you use / recommend?
Re: Unable to use HiveContext in spark-shell
Can you be more specific? What version of Spark, Hive, Hadoop, etc.? What are you trying to do? What are the issues you are seeing? J

On Thu, Nov 6, 2014 at 9:22 AM, tridib tridib.sama...@live.com wrote: Help please!
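[Editor's note: for anyone hitting this later, a minimal sketch of the usual way to get a HiveContext inside the 1.x spark-shell, assuming Spark was built with Hive support and hive-site.xml is on the classpath:]

// sc is the SparkContext the shell already provides
val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)

// Run a simple query against the Hive metastore to confirm it works
hiveContext.sql("SHOW TABLES").collect().foreach(println)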
Re: Spark v Redshift
This is pretty spot on, though I would also add that the Spark features touted around speed all depend on caching the data in memory; reading off disk still takes time, i.e. pulling the data into an RDD. This is the reason Spark is great for ML: the data is used over and over again to fit models, so it is pulled into memory once and then analyzed repeatedly by the algorithms. Other DB systems read and write to disk repeatedly and are thus slower, as is Mahout (though it is being ported over to Spark as well to compete with MLlib). J

On Tue, Nov 4, 2014 at 3:51 PM, Matei Zaharia matei.zaha...@gmail.com wrote: Is this about Spark SQL vs Redshift, or Spark in general? Spark in general provides a broader set of capabilities than Redshift because it has APIs in general-purpose languages (Java, Scala, Python) and libraries for things like machine learning and graph processing. For example, you might use Spark to do the ETL that will put data into a database such as Redshift, or you might pull data out of Redshift into Spark for machine learning. On the other hand, if *all* you want to do is SQL and you are okay with the set of data formats and features in Redshift (i.e. you can express everything using its UDFs and you have a way to get data in), then Redshift is a complete service which will do more management out of the box. Matei

On Nov 4, 2014, at 3:11 PM, agfung agf...@gmail.com wrote: I'm in the midst of a heated debate about the use of Redshift vs Spark with a colleague. We keep trading anecdotes and links back and forth (e.g. an Airbnb post from 2013 or the AMPLab benchmarks), and we don't seem to be getting anywhere. So before we start down the prototype/benchmark road, and in desperation of finding *some* kind of objective third-party perspective, I was wondering if anyone who has used both in 2014 would care to provide commentary about the sweet-spot use cases / gotchas for non-trivial use (e.g. a simple filter scan isn't really interesting). Soft issues like operational maintenance and time spent developing vs out of the box are interesting too...
Re: Spark + Tableau
What ODBC driver are you using? We recently got the Hortonworks ODBC drivers working on a Windows box but were having issues on a Mac.

On Oct 30, 2014, at 4:23 AM, Bojan Kostic blood9ra...@gmail.com wrote: I'm testing the beta driver from Databricks for Tableau, and unfortunately I have encountered some issues. While a beeline connection works without problems, Tableau can't connect to the Spark thrift server.

Error from the driver (Tableau): Unable to connect to the ODBC Data Source. Check that the necessary drivers are installed and that the connection properties are valid. [Simba][SparkODBC] (34) Error from Spark: ETIMEDOUT. Unable to connect to the server test.server.com. Check that the server is running and that you have access privileges to the requested database. Unable to connect to the server. Check that the server is running and that you have access privileges to the requested database.

Exception on the Thrift server:

java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
at org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.thrift.transport.TTransportException
at org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:182)
at org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
at org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
at org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
at org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
... 4 more

Is there anyone else testing this driver, or has anyone seen this message? Best regards, Bojan Kostić
Re: issue on applying SVM to 5 million examples.
Watch the app manager; it should tell you what's running and taking a while... My guess is it's a distinct function on the data. J

On Oct 30, 2014, at 8:22 AM, peng xia toxiap...@gmail.com wrote: Hi, Previously we applied the SVM algorithm in MLlib to 5 million records (600 MB), and it took more than 25 minutes to finish. The Spark version we are using is 1.0 and we were running this program on a 4-node cluster. Each node has 4 CPU cores and 11 GB RAM. The 5 million records only have two distinct records (one positive and one negative); the others are all duplicates. Does anyone have any idea why it takes so long on this small data set? Thanks, Best, Peng
Re: issue on applying SVM to 5 million examples.
sampleRDD.cache()

On Oct 30, 2014, at 5:01 PM, peng xia toxiap...@gmail.com wrote: Hi Xiangrui, Can you give me a code example of caching, as I am new to Spark? Thanks, Best, Peng

On Thu, Oct 30, 2014 at 6:57 PM, Xiangrui Meng men...@gmail.com wrote: Then caching should solve the problem. Otherwise, it is just loading and parsing data from disk for each iteration. -Xiangrui

On Thu, Oct 30, 2014 at 11:44 AM, peng xia toxiap...@gmail.com wrote: Thanks for all your help. I think I didn't cache the data. My previous cluster has expired and I don't have a chance to check the load balance or app manager. Below is my code. There are 18 features for each record and I am using the Scala API.

import org.apache.spark.SparkConf
import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.rdd._
import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors
import java.util.Calendar

object BenchmarkClassification {
  def main(args: Array[String]) {
    // Load and parse the data file
    val conf = new SparkConf()
      .setAppName("SVM")
      .set("spark.executor.memory", "8g")
      // .set("spark.executor.extraJavaOptions", "-Xms8g -Xmx8g")
    val sc = new SparkContext(conf)
    val data = sc.textFile(args(0))
    val parsedData = data.map { line =>
      val parts = line.split(',')
      LabeledPoint(parts(0).toDouble, Vectors.dense(parts.tail.map(x => x.toDouble)))
    }
    val testData = sc.textFile(args(1))
    val testParsedData = testData.map { line =>
      val parts = line.split(',')
      LabeledPoint(parts(0).toDouble, Vectors.dense(parts.tail.map(x => x.toDouble)))
    }
    // Run training algorithm to build the model
    val numIterations = 20
    val model = SVMWithSGD.train(parsedData, numIterations)
    // Evaluate model on training examples and compute training error
    // val labelAndPreds = testParsedData.map { point =>
    //   val prediction = model.predict(point.features)
    //   (point.label, prediction)
    // }
    // val trainErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble / testParsedData.count
    // println("Training Error = " + trainErr)
    println(Calendar.getInstance().getTime())
  }
}

Thanks, Best, Peng

On Thu, Oct 30, 2014 at 1:23 PM, Xiangrui Meng men...@gmail.com wrote: Did you cache the data and check the load balancing? How many features? Which API are you using, Scala, Java, or Python? -Xiangrui

On Thu, Oct 30, 2014 at 9:13 AM, Jimmy ji...@sellpoints.com wrote: Watch the app manager; it should tell you what's running and taking a while... My guess is it's a distinct function on the data. J

On Oct 30, 2014, at 8:22 AM, peng xia toxiap...@gmail.com wrote: Hi, Previously we applied the SVM algorithm in MLlib to 5 million records (600 MB), and it took more than 25 minutes to finish. The Spark version we are using is 1.0 and we were running this program on a 4-node cluster. Each node has 4 CPU cores and 11 GB RAM. The 5 million records only have two distinct records (one positive and one negative); the others are all duplicates. Does anyone have any idea why it takes so long on this small data set? Thanks, Best, Peng
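[Editor's note: to make the caching suggestion concrete, here is a minimal sketch of caching the training RDD before handing it to SVMWithSGD, so each of the 20 iterations reuses the in-memory data instead of re-reading and re-parsing the file. Variable names follow the code quoted above:]

import org.apache.spark.mllib.classification.SVMWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.mllib.linalg.Vectors

val parsedData = sc.textFile(args(0)).map { line =>
  val parts = line.split(',')
  LabeledPoint(parts(0).toDouble, Vectors.dense(parts.tail.map(_.toDouble)))
}.cache()   // keep the parsed points in memory across SGD iterations

val model = SVMWithSGD.train(parsedData, 20)

parsedData.unpersist()   // optionally release the cached blocks when done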
Re: TaskNotSerializableException when running through Spark shell
I actually only ran into this issue recently, after we upgraded to Spark 1.1. Within the REPL for Spark 1.0 everything works fine, but within the REPL for 1.1 it does not. FYI, I am also only doing simple regex-matching functions within an RDD... When I run the same code as an app everything works fine, which leads me to believe it is a bug within the REPL for 1.1. Can anyone else confirm this?

On Thu, Oct 16, 2014 at 7:56 AM, Akshat Aranya aara...@gmail.com wrote: Hi, Can anyone explain how things get captured in a closure when running through the REPL? For example:

def foo(..) = { .. }
rdd.map(foo)

sometimes complains about classes not being serializable that are completely unrelated to foo. This happens even when I write it as:

object Foo {
  def foo(..) = { .. }
}
rdd.map(Foo.foo)

It also doesn't happen all the time.
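[Editor's note: a common workaround when the REPL drags its enclosing line objects into a closure is to copy whatever the closure needs into a local val, or keep the function in a standalone serializable object, so that only that value is captured. A minimal sketch of the pattern with illustrative names and paths, not taken from this thread:]

import scala.util.matching.Regex

object Matchers extends Serializable {
  // Self-contained function: captures nothing from the shell environment
  def matchesError(line: String): Boolean = "ERROR|FATAL".r.findFirstIn(line).isDefined
}

val rdd = sc.textFile("hdfs:///logs/app.log")   // hypothetical path

// Option 1: reference the serializable object directly
val errors1 = rdd.filter(Matchers.matchesError _)

// Option 2: copy the needed value into a local val before using it in the closure,
// so the closure captures only the Regex rather than the surrounding REPL line object
val pattern: Regex = "ERROR|FATAL".r
val errors2 = rdd.filter(line => pattern.findFirstIn(line).isDefined)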
Re: Exception while reading SendingConnection to ConnectionManagerId
Does anyone know anything about this error? Thank you!

On Wed, Oct 15, 2014 at 3:38 PM, Jimmy Li jimmy...@bluelabs.com wrote: Hi there, I'm running Spark on EC2 and am running into an error there that I don't get locally. Here's the error:

11335 [handle-read-write-executor-3] ERROR org.apache.spark.network.SendingConnection - Exception while reading SendingConnection to ConnectionManagerId([IP HERE]) java.nio.channels.ClosedChannelException

Does anyone know what might be causing this? Spark is running on my EC2 instances. Thanks, Jimmy
Exception while reading SendingConnection to ConnectionManagerId
Hi there, I'm running Spark on EC2 and am running into an error there that I don't get locally. Here's the error:

11335 [handle-read-write-executor-3] ERROR org.apache.spark.network.SendingConnection - Exception while reading SendingConnection to ConnectionManagerId([IP HERE]) java.nio.channels.ClosedChannelException

Does anyone know what might be causing this? Spark is running on my EC2 instances. Thanks, Jimmy
Re: Spark can't find jars
So the only way that I could make this work was to build a fat jar file, as suggested earlier. To me (and I am no expert) it seems like this is a bug. Everything was working for me prior to our upgrade to Spark 1.1 on Hadoop 2.2, but now it does not: i.e. packaging my jars locally, then pushing them out to the cluster and pointing them to the corresponding dependent jars. Sorry I cannot be more help! J

On Tue, Oct 14, 2014 at 4:59 AM, Christophe Préaud christophe.pre...@kelkoo.com wrote: Hello, I have already posted a message with the exact same problem, and proposed a patch (the subject is Application failure in yarn-cluster mode). Can you test it and see if it works for you? I would be glad too if someone can confirm that it is a bug in Spark 1.1.0. Regards, Christophe.

On 14/10/2014 03:15, Jimmy McErlain wrote: BTW this has always worked for me before, until we upgraded the cluster to Spark 1.1.1... J

On Mon, Oct 13, 2014 at 5:39 PM, HARIPRIYA AYYALASOMAYAJULA aharipriy...@gmail.com wrote: Hello, Can you check if the jar file is available in the target/scala-2.10 folder? When you use sbt package to make the jar file, that is where the jar file will be located. The following command works well for me:

spark-submit --class "Classname" --master yarn-cluster jarfile(with complete path)

Can you try checking with this initially and later add other options?

On Mon, Oct 13, 2014 at 7:36 PM, Jimmy ji...@sellpoints.com wrote: Having the exact same error with the exact same jar... Do you work for Altiscale? :) J

On Oct 13, 2014, at 5:33 PM, Andy Srine andy.sr...@gmail.com wrote: Hi guys, Spark rookie here. I am getting a file-not-found exception on the --jars. This is in yarn-cluster mode and I am running the following command on our recently upgraded Spark 1.1.1 environment:

./bin/spark-submit --verbose --master yarn --deploy-mode cluster --class myEngine --driver-memory 1g --driver-library-path /hadoop/share/hadoop/mapreduce/lib/hadoop-lzo-0.4.18-201406111750.jar --executor-memory 5g --executor-cores 5 --jars /home/andy/spark/lib/joda-convert-1.2.jar --queue default --num-executors 4 /home/andy/spark/lib/my-spark-lib_1.0.jar

This is the error I am hitting. Any tips would be much appreciated. The file permissions look fine on my local disk.

14/10/13 22:49:39 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED
14/10/13 22:49:39 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
Exception in thread "Driver" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 1.0 failed 4 times, most recent failure: Lost task 3.3 in stage 1.0 (TID 12, 122-67.vb2.company.com): java.io.FileNotFoundException: ./joda-convert-1.2.jar (Permission denied)
java.io.FileOutputStream.open(Native Method)
java.io.FileOutputStream.<init>(FileOutputStream.java:221)
com.google.common.io.Files$FileByteSink.openStream(Files.java:223)
com.google.common.io.Files$FileByteSink.openStream(Files.java:211)

Thanks, Andy
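[Editor's note: since the workaround that worked here was a fat jar, here is a minimal sketch of one way to build it with the sbt-assembly plugin. The plugin version, dependencies, and file layout below are illustrative, not taken from this thread:]

// project/plugins.sbt
addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")

// build.sbt
import AssemblyKeys._

assemblySettings

name := "my-spark-app"

scalaVersion := "2.10.4"

libraryDependencies ++= Seq(
  // Spark itself is already on the cluster, so mark it "provided" to keep the fat jar small
  "org.apache.spark" %% "spark-core" % "1.1.0" % "provided",
  "org.joda" % "joda-convert" % "1.2"
)

Running "sbt assembly" then produces a single jar under target/scala-2.10/ that bundles the application and its dependencies, which can be passed to spark-submit without a --jars list.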
Re: Spark can't find jars
Having the exact same error with the exact same jar... Do you work for Altiscale? :) J

On Oct 13, 2014, at 5:33 PM, Andy Srine andy.sr...@gmail.com wrote: Hi guys, Spark rookie here. I am getting a file-not-found exception on the --jars. This is in yarn-cluster mode and I am running the following command on our recently upgraded Spark 1.1.1 environment:

./bin/spark-submit --verbose --master yarn --deploy-mode cluster --class myEngine --driver-memory 1g --driver-library-path /hadoop/share/hadoop/mapreduce/lib/hadoop-lzo-0.4.18-201406111750.jar --executor-memory 5g --executor-cores 5 --jars /home/andy/spark/lib/joda-convert-1.2.jar --queue default --num-executors 4 /home/andy/spark/lib/my-spark-lib_1.0.jar

This is the error I am hitting. Any tips would be much appreciated. The file permissions look fine on my local disk.

14/10/13 22:49:39 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED
14/10/13 22:49:39 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
Exception in thread "Driver" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 1.0 failed 4 times, most recent failure: Lost task 3.3 in stage 1.0 (TID 12, 122-67.vb2.company.com): java.io.FileNotFoundException: ./joda-convert-1.2.jar (Permission denied)
java.io.FileOutputStream.open(Native Method)
java.io.FileOutputStream.<init>(FileOutputStream.java:221)
com.google.common.io.Files$FileByteSink.openStream(Files.java:223)
com.google.common.io.Files$FileByteSink.openStream(Files.java:211)

Thanks, Andy
Re: Spark can't find jars
BTW this has always worked for me before, until we upgraded the cluster to Spark 1.1.1... J

On Mon, Oct 13, 2014 at 5:39 PM, HARIPRIYA AYYALASOMAYAJULA aharipriy...@gmail.com wrote: Hello, Can you check if the jar file is available in the target/scala-2.10 folder? When you use sbt package to make the jar file, that is where the jar file will be located. The following command works well for me:

spark-submit --class "Classname" --master yarn-cluster jarfile(with complete path)

Can you try checking with this initially and later add other options?

On Mon, Oct 13, 2014 at 7:36 PM, Jimmy ji...@sellpoints.com wrote: Having the exact same error with the exact same jar... Do you work for Altiscale? :) J

On Oct 13, 2014, at 5:33 PM, Andy Srine andy.sr...@gmail.com wrote: Hi guys, Spark rookie here. I am getting a file-not-found exception on the --jars. This is in yarn-cluster mode and I am running the following command on our recently upgraded Spark 1.1.1 environment:

./bin/spark-submit --verbose --master yarn --deploy-mode cluster --class myEngine --driver-memory 1g --driver-library-path /hadoop/share/hadoop/mapreduce/lib/hadoop-lzo-0.4.18-201406111750.jar --executor-memory 5g --executor-cores 5 --jars /home/andy/spark/lib/joda-convert-1.2.jar --queue default --num-executors 4 /home/andy/spark/lib/my-spark-lib_1.0.jar

This is the error I am hitting. Any tips would be much appreciated. The file permissions look fine on my local disk.

14/10/13 22:49:39 INFO yarn.ApplicationMaster: Unregistering ApplicationMaster with FAILED
14/10/13 22:49:39 INFO impl.AMRMClientImpl: Waiting for application to be successfully unregistered.
Exception in thread "Driver" java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162)
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 3 in stage 1.0 failed 4 times, most recent failure: Lost task 3.3 in stage 1.0 (TID 12, 122-67.vb2.company.com): java.io.FileNotFoundException: ./joda-convert-1.2.jar (Permission denied)
java.io.FileOutputStream.open(Native Method)
java.io.FileOutputStream.<init>(FileOutputStream.java:221)
com.google.common.io.Files$FileByteSink.openStream(Files.java:223)
com.google.common.io.Files$FileByteSink.openStream(Files.java:211)

Thanks, Andy
Re: Print Decision Tree Models
Yeah, I'm using 1.0.0. Thanks for taking the time to check!

On Oct 1, 2014, at 8:48 PM, Xiangrui Meng men...@gmail.com wrote: Which Spark version are you using? It works in 1.1.0 but not in 1.0.0. -Xiangrui

On Wed, Oct 1, 2014 at 2:13 PM, Jimmy McErlain ji...@sellpoints.com wrote: So I am trying to print the model output from MLlib, however I am only getting things like the following:

org.apache.spark.mllib.tree.model.DecisionTreeModel@1120c600
0.17171527904439082
0.8282847209556092
5273125.0
2.5435412E7

from the following code:

val trainErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble / cleanedData2.count
val trainSucc = labelAndPreds.filter(r => r._1 == r._2).count.toDouble / cleanedData2.count
val trainErrCount = labelAndPreds.filter(r => r._1 != r._2).count.toDouble
val trainSuccCount = labelAndPreds.filter(r => r._1 == r._2).count.toDouble
print(model)
println(trainErr)
println(trainSucc)
println(trainErrCount)
println(trainSuccCount)

I have also tried the following:

val model_string = model.toString()
print(model_string)

and I still do not get the model to print, only where it resides in memory. Thanks, J
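[Editor's note: for readers on a newer release, later 1.x versions of MLlib expose a toDebugString method on DecisionTreeModel that prints the full tree structure, which is usually what people want here. A minimal sketch, assuming a DecisionTreeModel has already been trained and that your version provides the method:]

// model is an org.apache.spark.mllib.tree.model.DecisionTreeModel trained elsewhere
println("Learned classification tree:\n" + model.toDebugString)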
Re: Window comparison matching using the sliding window functionality: feasibility
Not sure if this is what you are after, but it's based on a moving average within Spark... I was building an ARIMA model on top of Spark and this helped me out a lot: http://stackoverflow.com/questions/23402303/apache-spark-moving-average

On Tue, Sep 30, 2014 at 8:19 AM, nitinkak001 nitinkak...@gmail.com wrote: Any ideas, guys? Trying to find some information online. Not much luck so far.
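[Editor's note: for context, a minimal sketch of one way to compute a simple moving average over an ordered RDD using the sliding helper in MLlib (a developer API in the 1.x line); the window size and toy data are illustrative:]

import org.apache.spark.mllib.rdd.RDDFunctions._

val series = sc.parallelize(1 to 100).map(_.toDouble)   // toy time series, already in order

val windowSize = 5
// sliding(n) yields overlapping windows of n consecutive elements, spanning partition boundaries
val movingAvg = series.sliding(windowSize).map(window => window.sum / windowSize)

movingAvg.take(5).foreach(println)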