Re: Is there any Spark implementation for Item-based Collaborative Filtering?

2014-11-30 Thread Jimmy
The latest version of MLlib has it built in, no?
J

Sent from my iPhone
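[For what it's worth, the recommender MLlib ships is ALS (matrix-factorization collaborative filtering) rather than classic item-item similarity. A minimal sketch, assuming a "userId,itemId,rating" text file; the file name and parameters are only illustrative:]

  import org.apache.spark.mllib.recommendation.{ALS, Rating}

  // Parse "userId,itemId,rating" lines into MLlib Rating objects
  val ratings = sc.textFile("ratings.csv").map { line =>
    val Array(user, item, rating) = line.split(',')
    Rating(user.toInt, item.toInt, rating.toDouble)
  }

  // Train a factorization model: rank 10, 10 iterations, lambda 0.01 (illustrative values)
  val model = ALS.train(ratings, 10, 10, 0.01)

  // Predict the rating user 1 would give item 42
  val prediction = model.predict(1, 42)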

 On Nov 30, 2014, at 9:36 AM, shahab shahab.mok...@gmail.com wrote:
 
 Hi,
 
 I'm just wondering: is there any implementation of item-based Collaborative 
 Filtering in Spark?
 
 best,
 /Shahab




Re: how to convert System.currentTimeMillis to calendar time

2014-11-13 Thread Jimmy McErlain
You could also use the Joda-Time library, which has a ton of other great
options in it.
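[For example, a minimal Joda-Time sketch; the pattern string is just an illustration:]

  import org.joda.time.DateTime
  import org.joda.time.format.DateTimeFormat

  val epoch = System.currentTimeMillis
  // Wrap the epoch millis and format them as a human-readable calendar date/time
  val formatted = DateTimeFormat.forPattern("yyyy-MM-dd HH:mm:ss").print(new DateTime(epoch))
  println(formatted)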
J
--
JIMMY MCERLAIN
Data Scientist (Nerd)
"If we can't double your sales, one of us is in the wrong business."
E: ji...@sellpoints.com | M: 510.303.7751

On Thu, Nov 13, 2014 at 10:40 AM, Akhil Das ak...@sigmoidanalytics.com
wrote:

 This way?

 scala> import java.util.Date
 import java.util.Date

 scala> val epoch = System.currentTimeMillis
 epoch: Long = 1415903974545

 scala> val date = new Date(epoch)
 date: java.util.Date = Fri Nov 14 00:09:34 IST 2014



 Thanks
 Best Regards

 On Thu, Nov 13, 2014 at 10:17 PM, spr s...@yarcdata.com wrote:

 Apologies for what seems an egregiously simple question, but I can't find
 the
 answer anywhere.

 I have timestamps from the Spark Streaming Time() interface, in
 milliseconds
 since an epoch, and I want to print out a human-readable calendar date and
 time.  How does one do that?








Re: which is the recommended workflow engine for Apache Spark jobs?

2014-11-10 Thread Jimmy McErlain
I have used Oozie for all our workflows with Spark apps, but you will have
to use a Java action as the workflow element.  I am interested in anyone's
experience with Luigi and/or any other tools.


On Mon, Nov 10, 2014 at 10:34 AM, Adamantios Corais 
adamantios.cor...@gmail.com wrote:

 I have some previous experience with Apache Oozie while I was developing
 in Apache Pig. Now, I am working explicitly with Apache Spark and I am
 looking for a tool with similar functionality. Is Oozie recommended? What
 about Luigi? What do you use / recommend?




-- 


Nothing under the sun is greater than education. By educating one person
and sending him/her into the society of his/her generation, we make a
contribution extending a hundred generations to come.
-Jigoro Kano, Founder of Judo-


Re: Unable to use HiveContext in spark-shell

2014-11-06 Thread Jimmy McErlain
Can you be more specific: what version of Spark, Hive, Hadoop, etc.?
What are you trying to do?  What are the issues you are seeing?
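[For reference, the usual spark-shell setup is a minimal sketch like the following, assuming your Spark build includes Hive support (e.g. built with -Phive):]

  // In spark-shell, sc is already provided
  val hiveContext = new org.apache.spark.sql.hive.HiveContext(sc)
  // HiveContext uses the HiveQL dialect by default
  hiveContext.sql("SHOW TABLES").collect().foreach(println)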
J

On Thu, Nov 6, 2014 at 9:22 AM, tridib tridib.sama...@live.com wrote:

 Help please!







Re: Spark v Redshift

2014-11-04 Thread Jimmy McErlain
This is pretty spot on, though I would also add that the Spark features
touted around speed all depend on caching the data in memory; reading off
disk still takes time, i.e. pulling the data into an RDD.  This is the
reason Spark is great for ML: the data is used over and over again to fit
models, so it is pulled into memory once and then analyzed repeatedly by
the algorithms.  Other systems read from and write to disk repeatedly and
are thus slower; Mahout is one example (though it is being ported over to
Spark as well to compete with MLlib)...
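[As a rough sketch of that reuse pattern; the path and the filter predicate are just placeholders:]

  // Read and parse once, keep the partitions in memory
  val training = sc.textFile("hdfs:///data/training.csv").cache()

  // Work after the first action hits memory, not disk
  val rows = training.count()                          // first action materializes the cache
  val positives = training.filter(_.endsWith(",1")).count()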

J

On Tue, Nov 4, 2014 at 3:51 PM, Matei Zaharia matei.zaha...@gmail.com
wrote:

 Is this about Spark SQL vs Redshift, or Spark in general? Spark in general
 provides a broader set of capabilities than Redshift because it has APIs in
 general-purpose languages (Java, Scala, Python) and libraries for things
 like machine learning and graph processing. For example, you might use
 Spark to do the ETL that will put data into a database such as Redshift, or
 you might pull data out of Redshift into Spark for machine learning. On the
 other hand, if *all* you want to do is SQL and you are okay with the set of
 data formats and features in Redshift (i.e. you can express everything
 using its UDFs and you have a way to get data in), then Redshift is a
 complete service which will do more management out of the box.

 Matei

  On Nov 4, 2014, at 3:11 PM, agfung agf...@gmail.com wrote:
 
  I'm in the midst of a heated debate about the use of Redshift v Spark
 with a
  colleague.  We keep trading anecdotes and links back and forth (eg airbnb
  post from 2013 or amplab benchmarks), and we don't seem to be getting
  anywhere.
 
  So before we start down the prototype /benchmark road, and in desperation
  of finding *some* kind of objective third party perspective,  was
 wondering
  if anyone who has used both in 2014 would care to provide commentary
 about
  the sweet spot use cases / gotchas for non trivial use (eg a simple
 filter
  scan isn't really interesting).  Soft issues like operational maintenance
  and time spent developing v out of the box are interesting too...
 
 
 




Re: Spark + Tableau

2014-10-30 Thread Jimmy
What ODBC driver are you using? We recently got the Hortonworks ODBC drivers 
working on a Windows box but were having issues on a Mac.



Sent from my iPhone

 On Oct 30, 2014, at 4:23 AM, Bojan Kostic blood9ra...@gmail.com wrote:
 
 I'm testing the beta driver from Databricks for Tableau,
 and unfortunately I've encountered some issues.
 While a beeline connection works without problems, Tableau can't connect to
 the Spark Thrift server.
 
 Error from driver(Tableau):
 Unable to connect to the ODBC Data Source. Check that the necessary drivers
 are installed and that the connection properties are valid.
 [Simba][SparkODBC] (34) Error from Spark: ETIMEDOUT.
 
 Unable to connect to the server test.server.com. Check that the server is
 running and that you have access privileges to the requested database.
 Unable to connect to the server. Check that the server is running and that
 you have access privileges to the requested database.
 
 Exception on Thrift server:
 java.lang.RuntimeException: org.apache.thrift.transport.TTransportException
at
 org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:219)
at
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:189)
at
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)
 Caused by: org.apache.thrift.transport.TTransportException
at
 org.apache.thrift.transport.TIOStreamTransport.read(TIOStreamTransport.java:132)
at
 org.apache.thrift.transport.TTransport.readAll(TTransport.java:84)
at
 org.apache.thrift.transport.TSaslTransport.receiveSaslMessage(TSaslTransport.java:182)
at
 org.apache.thrift.transport.TSaslServerTransport.handleSaslStartMessage(TSaslServerTransport.java:125)
at
 org.apache.thrift.transport.TSaslTransport.open(TSaslTransport.java:253)
at
 org.apache.thrift.transport.TSaslServerTransport.open(TSaslServerTransport.java:41)
at
 org.apache.thrift.transport.TSaslServerTransport$Factory.getTransport(TSaslServerTransport.java:216)
... 4 more
 
 Is there anyone else who's testing this driver, or has anyone seen this
 message?
 
 Best regards
 Bojan Kostić
 
 
 



Re: issue on applying SVM to 5 million examples.

2014-10-30 Thread Jimmy
Watch the app manager; it should tell you what's running and taking a while... My 
guess is it's a distinct function on the data.
J

Sent from my iPhone

 On Oct 30, 2014, at 8:22 AM, peng xia toxiap...@gmail.com wrote:
 
 Hi,
 
  
 
 Previously we applied the SVM algorithm in MLlib to 5 million records (600 MB),
 and it takes more than 25 minutes to finish.
 The Spark version we are using is 1.0, and we were running this program on a
 4-node cluster. Each node has 4 CPU cores and 11 GB of RAM.
 
 The 5 million records contain only two distinct records (one positive and one
 negative); the others are all duplicates.
 
 Does anyone have an idea why it takes so long on such small data?
 
  
 
 Thanks,
 Best,
 
 Peng


Re: issue on applying SVM to 5 million examples.

2014-10-30 Thread Jimmy
sampleRDD.cache()
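[For example, based on the parsing code further down in this thread, a minimal sketch where only the .cache() call is new:]

  // Persist the parsed training data in memory so each SGD iteration
  // reuses it instead of re-reading and re-parsing the text file from disk
  val parsedData = data.map { line =>
    val parts = line.split(',')
    LabeledPoint(parts(0).toDouble, Vectors.dense(parts.tail.map(_.toDouble)))
  }.cache()

  val model = SVMWithSGD.train(parsedData, numIterations)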

Sent from my iPhone

 On Oct 30, 2014, at 5:01 PM, peng xia toxiap...@gmail.com wrote:
 
 Hi Xiangrui, 
 
 Can you give me a code example of caching, as I am new to Spark?
 
 Thanks,
 Best,
 Peng
 
 On Thu, Oct 30, 2014 at 6:57 PM, Xiangrui Meng men...@gmail.com wrote:
 Then caching should solve the problem. Otherwise, it is just loading
 and parsing data from disk for each iteration. -Xiangrui
 
 On Thu, Oct 30, 2014 at 11:44 AM, peng xia toxiap...@gmail.com wrote:
  Thanks for all your help.
  I think I didn't cache the data. My previous cluster has expired and I don't
  have a chance to check the load balancing or the app manager.
  Below is my code.
  There are 18 features for each record and I am using the Scala API.
 
  import org.apache.spark.SparkConf
  import org.apache.spark.SparkContext
  import org.apache.spark.SparkContext._
  import org.apache.spark.rdd._
  import org.apache.spark.mllib.classification.SVMWithSGD
  import org.apache.spark.mllib.regression.LabeledPoint
  import org.apache.spark.mllib.linalg.Vectors
  import java.util.Calendar

  object BenchmarkClassification {
    def main(args: Array[String]) {
      // Load and parse the data file
      val conf = new SparkConf()
        .setAppName("SVM")
        .set("spark.executor.memory", "8g")
        // .set("spark.executor.extraJavaOptions", "-Xms8g -Xmx8g")
      val sc = new SparkContext(conf)
      val data = sc.textFile(args(0))
      val parsedData = data.map { line =>
        val parts = line.split(',')
        LabeledPoint(parts(0).toDouble, Vectors.dense(parts.tail.map(x => x.toDouble)))
      }
      val testData = sc.textFile(args(1))
      val testParsedData = testData.map { line =>
        val parts = line.split(',')
        LabeledPoint(parts(0).toDouble, Vectors.dense(parts.tail.map(x => x.toDouble)))
      }

      // Run training algorithm to build the model
      val numIterations = 20
      val model = SVMWithSGD.train(parsedData, numIterations)

      // Evaluate model on training examples and compute training error
      // val labelAndPreds = testParsedData.map { point =>
      //   val prediction = model.predict(point.features)
      //   (point.label, prediction)
      // }
      // val trainErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble / testParsedData.count
      // println("Training Error = " + trainErr)
      println(Calendar.getInstance().getTime())
    }
  }
 
 
 
 
  Thanks,
  Best,
  Peng
 
  On Thu, Oct 30, 2014 at 1:23 PM, Xiangrui Meng men...@gmail.com wrote:
 
  Did you cache the data and check the load balancing? How many
  features? Which API are you using, Scala, Java, or Python? -Xiangrui
 
  On Thu, Oct 30, 2014 at 9:13 AM, Jimmy ji...@sellpoints.com wrote:
   Watch the app manager; it should tell you what's running and taking
   a while...
   My guess is it's a distinct function on the data.
   J
  
   Sent from my iPhone
  
   On Oct 30, 2014, at 8:22 AM, peng xia toxiap...@gmail.com wrote:
  
   Hi,
  
  
  
    Previously we applied the SVM algorithm in MLlib to 5 million records
    (600 MB), and it takes more than 25 minutes to finish.
    The Spark version we are using is 1.0, and we were running this program
    on a 4-node cluster. Each node has 4 CPU cores and 11 GB of RAM.
   
    The 5 million records contain only two distinct records (one positive and
    one negative); the others are all duplicates.
   
    Does anyone have an idea why it takes so long on such small data?
  
  
  
   Thanks,
   Best,
  
   Peng
 
 
 


Re: TaskNotSerializableException when running through Spark shell

2014-10-16 Thread Jimmy McErlain
I actually only ran into this issue recently, after we upgraded to Spark
1.1.  Within the REPL for Spark 1.0 everything worked fine, but within the
REPL for 1.1 it does not.  FYI, I am also only doing simple regex-matching
functions within an RDD... When I run the same code as an app, everything
works fine, which leads me to believe that it is a bug in the REPL for 1.1.

Can anyone else confirm this?
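
[For what it's worth, the usual workaround sketch when the REPL drags unrelated state into a closure is to reference only local vals inside the transformation; the names and regex below are made up for illustration:]

  // Pull everything the closure needs into local vals, so Spark serializes
  // only those instead of the enclosing REPL line object
  def countMatches(rdd: org.apache.spark.rdd.RDD[String]): Long = {
    val pattern = "ERROR.*".r                        // illustrative regex
    rdd.filter(line => pattern.findFirstIn(line).isDefined).count()
  }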


On Thu, Oct 16, 2014 at 7:56 AM, Akshat Aranya aara...@gmail.com wrote:

 Hi,

 Can anyone explain how things get captured in a closure when running
 through the REPL?  For example:

 def foo(..) = { .. }

 rdd.map(foo)

 sometimes complains about classes not being serializable that are
 completely unrelated to foo.  This happens even when I write it like this:

 object Foo {
   def foo(..) = { .. }
 }

 rdd.map(Foo.foo)

 It also doesn't happen all the time.



Re: Exception while reading SendingConnection to ConnectionManagerId

2014-10-16 Thread Jimmy Li
Does anyone know anything re: this error? Thank you!

On Wed, Oct 15, 2014 at 3:38 PM, Jimmy Li jimmy...@bluelabs.com wrote:

 Hi there, I'm running spark on ec2, and am running into an error there
 that I don't get locally. Here's the error:

 11335 [handle-read-write-executor-3] ERROR
 org.apache.spark.network.SendingConnection  - Exception while reading
 SendingConnection to ConnectionManagerId([IP HERE])
 java.nio.channels.ClosedChannelException

 Does anyone know what might be causing this? Spark is running on my ec2
 instances.

 Thanks,
 Jimmy



Exception while reading SendingConnection to ConnectionManagerId

2014-10-15 Thread Jimmy Li
Hi there, I'm running spark on ec2, and am running into an error there that
I don't get locally. Here's the error:

11335 [handle-read-write-executor-3] ERROR
org.apache.spark.network.SendingConnection  - Exception while reading
SendingConnection to ConnectionManagerId([IP HERE])
java.nio.channels.ClosedChannelException

Does anyone know what might be causing this? Spark is running on my ec2
instances.

Thanks,
Jimmy


Re: Spark can't find jars

2014-10-14 Thread Jimmy McErlain
So the only way that I could make this work was to build a fat jar file, as
suggested earlier.  To me (and I am no expert) it seems like this is a
bug.  Everything was working for me prior to our upgrade to Spark 1.1 on
Hadoop 2.2, but now it does not, i.e. packaging my jars locally, then
pushing them out to the cluster and pointing them at the corresponding
dependent jars.
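
[For anyone else hitting this, a rough sketch of the fat-jar route with sbt-assembly; the plugin and library versions here are assumptions, adjust to your build:]

  // project/plugins.sbt
  addSbtPlugin("com.eed3si9n" % "sbt-assembly" % "0.11.2")
  // (depending on the sbt-assembly version you may also need its assemblySettings in build.sbt)

  // build.sbt: mark Spark itself as provided so it is not bundled
  libraryDependencies ++= Seq(
    "org.apache.spark" %% "spark-core"   % "1.1.0" % "provided",
    "org.joda"          % "joda-convert" % "1.2"
  )

  // then run "sbt assembly" and pass the single assembled jar to spark-submit
  // instead of listing dependencies with --jars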

Sorry I cannot be more help!
J

On Tue, Oct 14, 2014 at 4:59 AM, Christophe Préaud 
christophe.pre...@kelkoo.com wrote:

  Hello,

 I have already posted a message with the exact same problem, and proposed
 a patch (the subject is "Application failure in yarn-cluster mode").
 Can you test it and see if it works for you?
 I would also be glad if someone could confirm that it is a bug in Spark 1.1.0.

 Regards,
 Christophe.


 On 14/10/2014 03:15, Jimmy McErlain wrote:

 BTW this has always worked for me before until we upgraded the cluster to
 Spark 1.1.1...
 J

 On Mon, Oct 13, 2014 at 5:39 PM, HARIPRIYA AYYALASOMAYAJULA 
 aharipriy...@gmail.com wrote:

  Hello,

   Can you check if the jar file is available in the target/scala-2.10
  folder?

   When you use sbt package to make the jar file, that is where the jar
  file would be located.

   The following command works well for me:

   spark-submit --class "ClassName" --master yarn-cluster jarfile (with complete path)

  Can you try checking with this initially and later add the other options?

 On Mon, Oct 13, 2014 at 7:36 PM, Jimmy ji...@sellpoints.com wrote:

  Having the exact same error with the exact same jar Do you work
 for Altiscale? :)
 J

 Sent from my iPhone

 On Oct 13, 2014, at 5:33 PM, Andy Srine andy.sr...@gmail.com wrote:

   Hi Guys,


  Spark rookie here. I am getting a file not found exception on the
 --jars. This is on the yarn cluster mode and I am running the following
 command on our recently upgraded Spark 1.1.1 environment.


  ./bin/spark-submit --verbose --master yarn --deploy-mode cluster
 --class myEngine --driver-memory 1g --driver-library-path
 /hadoop/share/hadoop/mapreduce/lib/hadoop-lzo-0.4.18-201406111750.jar
 --executor-memory 5g --executor-cores 5 --jars
 /home/andy/spark/lib/joda-convert-1.2.jar --queue default --num-executors 4
 /home/andy/spark/lib/my-spark-lib_1.0.jar


  This is the error I am hitting. Any tips would be much appreciated.
 The file permissions looks fine on my local disk.


  14/10/13 22:49:39 INFO yarn.ApplicationMaster: Unregistering
 ApplicationMaster with FAILED

 14/10/13 22:49:39 INFO impl.AMRMClientImpl: Waiting for application to
 be successfully unregistered.

 Exception in thread Driver java.lang.reflect.InvocationTargetException

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

 at java.lang.reflect.Method.invoke(Method.java:606)

 at
 org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162)

 Caused by: org.apache.spark.SparkException: Job aborted due to stage
 failure: Task 3 in stage 1.0 failed 4 times, most recent failure: Lost task
 3.3 in stage 1.0 (TID 12, 122-67.vb2.company.com):
 java.io.FileNotFoundException: ./joda-convert-1.2.jar (Permission denied)

 java.io.FileOutputStream.open(Native Method)

 java.io.FileOutputStream.init(FileOutputStream.java:221)


 com.google.common.io.Files$FileByteSink.openStream(Files.java:223)


 com.google.common.io.Files$FileByteSink.openStream(Files.java:211)


 Thanks,
 Andy




   --
 Regards,
 Haripriya Ayyalasomayajula




 --
 Kelkoo SAS
 Société par Actions Simplifiée
 Au capital de € 4.168.964,30
 Siège social : 8, rue du Sentier 75002 Paris
 425 093 069 RCS Paris

 Ce message et les pièces jointes sont confidentiels et établis à
 l'attention exclusive de leurs destinataires. Si vous n'êtes pas le
 destinataire de ce message, merci de le détruire et d'en avertir
 l'expéditeur.



Re: Spark can't find jars

2014-10-13 Thread Jimmy
Having the exact same error with the exact same jar... Do you work for 
Altiscale? :) 
J

Sent from my iPhone

 On Oct 13, 2014, at 5:33 PM, Andy Srine andy.sr...@gmail.com wrote:
 
 Hi Guys,
 
 Spark rookie here. I am getting a file not found exception on the --jars. 
 This is on the yarn cluster mode and I am running the following command on 
 our recently upgraded Spark 1.1.1 environment.
 
 ./bin/spark-submit --verbose --master yarn --deploy-mode cluster --class 
 myEngine --driver-memory 1g --driver-library-path 
 /hadoop/share/hadoop/mapreduce/lib/hadoop-lzo-0.4.18-201406111750.jar 
 --executor-memory 5g --executor-cores 5 --jars 
 /home/andy/spark/lib/joda-convert-1.2.jar --queue default --num-executors 4 
 /home/andy/spark/lib/my-spark-lib_1.0.jar
 
 This is the error I am hitting. Any tips would be much appreciated. The file 
 permissions look fine on my local disk.
 
 14/10/13 22:49:39 INFO yarn.ApplicationMaster: Unregistering 
 ApplicationMaster with FAILED
 14/10/13 22:49:39 INFO impl.AMRMClientImpl: Waiting for application to be 
 successfully unregistered.
 Exception in thread Driver java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
   at java.lang.reflect.Method.invoke(Method.java:606)
   at 
 org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162)
 Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: 
 Task 3 in stage 1.0 failed 4 times, most recent failure: Lost task 3.3 in 
 stage 1.0 (TID 12, 122-67.vb2.company.com): java.io.FileNotFoundException: 
 ./joda-convert-1.2.jar (Permission denied)
 java.io.FileOutputStream.open(Native Method)
 java.io.FileOutputStream.init(FileOutputStream.java:221)
 com.google.common.io.Files$FileByteSink.openStream(Files.java:223)
 com.google.common.io.Files$FileByteSink.openStream(Files.java:211)
 
 Thanks,
 Andy
 


Re: Spark can't find jars

2014-10-13 Thread Jimmy McErlain
BTW, this has always worked for me until we upgraded the cluster to
Spark 1.1.1...
J

On Mon, Oct 13, 2014 at 5:39 PM, HARIPRIYA AYYALASOMAYAJULA 
aharipriy...@gmail.com wrote:

 Hello,

 Can you check if the jar file is available in the target/scala-2.10
 folder?

 When you use sbt package to make the jar file, that is where the jar file
 would be located.

 The following command works well for me:

 spark-submit --class "ClassName" --master yarn-cluster jarfile (with complete path)

 Can you try checking with this initially and later add the other options?

 On Mon, Oct 13, 2014 at 7:36 PM, Jimmy ji...@sellpoints.com wrote:

 Having the exact same error with the exact same jar Do you work for
 Altiscale? :)
 J

 Sent from my iPhone

 On Oct 13, 2014, at 5:33 PM, Andy Srine andy.sr...@gmail.com wrote:

 Hi Guys,


 Spark rookie here. I am getting a file not found exception on the --jars.
 This is on the yarn cluster mode and I am running the following command on
 our recently upgraded Spark 1.1.1 environment.


 ./bin/spark-submit --verbose --master yarn --deploy-mode cluster --class
 myEngine --driver-memory 1g --driver-library-path
 /hadoop/share/hadoop/mapreduce/lib/hadoop-lzo-0.4.18-201406111750.jar
 --executor-memory 5g --executor-cores 5 --jars
 /home/andy/spark/lib/joda-convert-1.2.jar --queue default --num-executors 4
 /home/andy/spark/lib/my-spark-lib_1.0.jar


 This is the error I am hitting. Any tips would be much appreciated. The
 file permissions looks fine on my local disk.


 14/10/13 22:49:39 INFO yarn.ApplicationMaster: Unregistering
 ApplicationMaster with FAILED

 14/10/13 22:49:39 INFO impl.AMRMClientImpl: Waiting for application to be
 successfully unregistered.

 Exception in thread Driver java.lang.reflect.InvocationTargetException

 at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)

 at
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)

 at
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)

 at java.lang.reflect.Method.invoke(Method.java:606)

 at
 org.apache.spark.deploy.yarn.ApplicationMaster$$anon$2.run(ApplicationMaster.scala:162)

 Caused by: org.apache.spark.SparkException: Job aborted due to stage
 failure: Task 3 in stage 1.0 failed 4 times, most recent failure: Lost task
 3.3 in stage 1.0 (TID 12, 122-67.vb2.company.com):
 java.io.FileNotFoundException: ./joda-convert-1.2.jar (Permission denied)

 java.io.FileOutputStream.open(Native Method)

 java.io.FileOutputStream.init(FileOutputStream.java:221)

 com.google.common.io.Files$FileByteSink.openStream(Files.java:223)

 com.google.common.io.Files$FileByteSink.openStream(Files.java:211)


 Thanks,
 Andy




 --
 Regards,
 Haripriya Ayyalasomayajula




Re: Print Decision Tree Models

2014-10-01 Thread Jimmy
Yeah I'm using 1.0.0 and thanks for taking the time to check! 

Sent from my iPhone
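
[For reference, a minimal sketch of printing the model on releases that support it; as Xiangrui notes below, 1.0.0 does not, and toDebugString in particular is a later MLlib addition:]

  // Short summary (depth and node count) on versions where toString is overridden
  println(model)
  // Full tree structure, on MLlib versions that provide toDebugString
  println(model.toDebugString)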

 On Oct 1, 2014, at 8:48 PM, Xiangrui Meng men...@gmail.com wrote:
 
 Which Spark version are you using? It works in 1.1.0 but not in 1.0.0. 
 -Xiangrui
 
 On Wed, Oct 1, 2014 at 2:13 PM, Jimmy McErlain ji...@sellpoints.com wrote:
 So I am trying to print the model output from MLlib however I am only 
 getting things like the following:
 org.apache.spark.mllib.tree.model.DecisionTreeModel@1120c600
 0.17171527904439082
 0.8282847209556092
 5273125.0
 2.5435412E7
 
 from the following code:
   val trainErr = labelAndPreds.filter(r => r._1 != r._2).count.toDouble / cleanedData2.count
   val trainSucc = labelAndPreds.filter(r => r._1 == r._2).count.toDouble / cleanedData2.count
   val trainErrCount = labelAndPreds.filter(r => r._1 != r._2).count.toDouble
   val trainSuccCount = labelAndPreds.filter(r => r._1 == r._2).count.toDouble
   
   print(model)
   println(trainErr)
   println(trainSucc)
   println(trainErrCount)
   println(trainSuccCount)
 
 I have also tried the following:
   val model_string = model.toString()
   print(model_string)
 
 And I still do not get the model contents to print, only where it resides in memory.
 
 Thanks,
 J
 
 
 
 
 
 
 


Re: Window comparison matching using the sliding window functionality: feasibility

2014-09-30 Thread Jimmy McErlain
Not sure if this is what you are after, but it's based on a moving average
within Spark...  I was building an ARIMA model on top of Spark and this
helped me out a lot:

http://stackoverflow.com/questions/23402303/apache-spark-moving-average
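
[The core of that approach is MLlib's sliding transformation; a minimal moving-average sketch, with the window size and input values chosen purely for illustration:]

  import org.apache.spark.mllib.rdd.RDDFunctions._

  val values = sc.parallelize(Seq(1.0, 2.0, 3.0, 4.0, 5.0, 6.0))
  // sliding(3) yields overlapping windows of 3 consecutive elements;
  // averaging each window gives the moving average
  val movingAvg = values.sliding(3).map(w => w.sum / w.length)
  movingAvg.collect().foreach(println)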

On Tue, Sep 30, 2014 at 8:19 AM, nitinkak001 nitinkak...@gmail.com wrote:

 Any ideas guys?

 Trying to find some information online. Not much luck so far.


