Re: DecisionTree Algorithm used in Spark MLLib

2015-01-01 Thread Manish Amde
Hi Anoop, The Spark decision tree implementation supports regression and multi-class classification, continuous and categorical features, and pruning; it does not support missing features at present. You can probably think of it as distributed CART, though personally I always find the acronyms confusing
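
For reference, a minimal sketch of training a classifier with the MLlib DecisionTree API (Spark 1.2-era); the data path, the categorical-feature mapping, and the parameter values below are placeholders, not anything from the original thread:

    import org.apache.spark.mllib.tree.DecisionTree
    import org.apache.spark.mllib.util.MLUtils

    // Load an RDD[LabeledPoint] in LIBSVM format (placeholder path)
    val data = MLUtils.loadLibSVMFile(sc, "data/sample_libsvm_data.txt")

    val numClasses = 3                          // multi-class classification
    val categoricalFeaturesInfo = Map(3 -> 4)   // hypothetical: feature 3 has 4 categories
    val impurity = "gini"
    val maxDepth = 5
    val maxBins = 32

    val model = DecisionTree.trainClassifier(data, numClasses, categoricalFeaturesInfo,
      impurity, maxDepth, maxBins)

    println(model.predict(data.first().features))

DecisionTree.trainRegressor takes the same arguments minus numClasses, with an impurity such as "variance".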

Re: DAG info

2015-01-01 Thread Josh Rosen
This log message is normal; in this case, this message is saying that the final stage needed to compute your job does not have any dependencies / parent stages and that there are no parent stages that need to be computed. On Thu, Jan 1, 2015 at 11:02 PM, shahid wrote: > hi guys > > > i have just
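
As a hypothetical illustration (not the original poster's code): a job with no shuffle produces a single stage with no parents, while a shuffle introduces a parent stage that has to run first.

    // Map-only job: the final stage has no parent stages,
    // so the DAGScheduler logs "Parents of final stage: List()"
    sc.textFile("input.txt").map(_.length).count()

    // A shuffle (reduceByKey) gives the final stage a parent (the map-side stage)
    sc.textFile("input.txt")
      .flatMap(_.split(" "))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .count()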

Re: Compile error from Spark 1.2.0

2015-01-01 Thread zigen
Thank you for filing the ticket. On 2015/01/02 15:45, Akhil Das wrote: > Yep, Opened SPARK-5054 > > Thanks > Best Regards > >> On Tue, Dec 30, 2014 at 5:52 AM, Michael Armbrust >> wrote: >> Yeah, this looks like a regression in the API due to the addition of >> arbitrary decimal support. Can you

DAG info

2015-01-01 Thread shahid
Hi guys, I have just started using Spark and I am getting this as an INFO message: 15/01/02 11:54:17 INFO DAGScheduler: Parents of final stage: List() 15/01/02 11:54:17 INFO DAGScheduler: Missing parents: List() 15/01/02 11:54:17 INFO DAGScheduler: Submitting Stage 6 (PythonRDD[12] at RDD at PythonRDD.scala:

Re: Compile error from Spark 1.2.0

2015-01-01 Thread Akhil Das
Yep, Opened SPARK-5054 Thanks Best Regards On Tue, Dec 30, 2014 at 5:52 AM, Michael Armbrust wrote: > Yeah, this looks like a regression in the API due to the addition of > arbitrary decimal support. Can you open a JIRA? > > On Sun, Dec 28, 20

Re: Spark app performance

2015-01-01 Thread Raghavendra Pandey
I have seen that link. I am using an RDD of byte arrays and Kryo serialization. Inside mapPartition, when I measure the time it is never more than 1 ms, whereas the total time taken by the application is around 30 min. The codebase has a lot of dependencies. I'm trying to come up with a simple version where I can reproduce this
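
One caveat when timing inside mapPartitions (a hedged sketch, not the original code, and process is a placeholder function): if the block returns a lazy iterator, the measured time only covers building the iterator, not the actual processing, which can make every partition appear to finish in under a millisecond.

    val timed = rdd.mapPartitions { iter =>
      val start = System.nanoTime()
      // Materialize the partition so the work actually happens inside the timed block;
      // returning the lazy iterator directly would make elapsedMs misleadingly small.
      val out = iter.map(process).toArray
      val elapsedMs = (System.nanoTime() - start) / 1e6
      println(s"partition processed in $elapsedMs ms")
      out.iterator
    }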

Re: sparkContext.textFile does not honour the minPartitions argument

2015-01-01 Thread Rishi Yadav
Hi Ankit, The optional number-of-partitions argument is for increasing the number of partitions above the default, not for reducing it below the default. On Thu, Jan 1, 2015 at 10:43 AM, Aniket Bhatnagar < aniket.bhatna...@gmail.com> wrote: > I am trying to read a file into a single partition but it seems like > sparkContext.textFil
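
In other words (a sketch with a placeholder path): minPartitions is a lower bound, so to end up with exactly one partition you can coalesce after reading.

    // minPartitions only raises the partition count above the default; it will not force it down to 1
    val rdd = sc.textFile("hdfs:///path/to/file", minPartitions = 1)

    // To actually get a single partition, coalesce it (reducing partitions needs no shuffle)
    val single = rdd.coalesce(1)
    println(single.partitions.length)  // 1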

Re: Trying to make spark-jobserver work with yarn

2015-01-01 Thread Fernando O.
Thanks Akhil, that will help a lot ! It turned out that spark-jobserver does not work in "development mode" but if you deploy a server it works (looks like the dependencies when running jobserver from sbt are not right) On Thu, Jan 1, 2015 at 5:22 AM, Akhil Das wrote: > Hi Fernando, > > He

Re: FlatMapValues

2015-01-01 Thread Sanjay Subramanian
Thanks, let me try that out. From: Hitesh Khamesra To: Sanjay Subramanian Cc: Kapil Malik ; Sean Owen ; "user@spark.apache.org" Sent: Thursday, January 1, 2015 9:46 AM Subject: Re: FlatMapValues How about this..apply flatmap on per line. And in that function, parse each line and r

sparkContext.textFile does not honour the minPartitions argument

2015-01-01 Thread Aniket Bhatnagar
I am trying to read a file into a single partition but it seems like sparkContext.textFile ignores the passed minPartitions value. I know I can repartition the RDD, but I was curious to know whether this is expected behaviour or a bug that needs to be investigated further.

Re: spark ignoring all memory settings and defaulting to 512MB?

2015-01-01 Thread Kevin Burton
OK.. we need to get these centralized somewhere, as the documentation for spark-env.sh sends people far far far off in the wrong direction. Maybe remove all the directives in that script in favor of a link to a page that is more live and can be updated? Kevin On Thu, Jan 1, 2015 at 12:43 AM, Sean

Re: JdbcRDD

2015-01-01 Thread Sujee
Hi, I wrote a blog post about this. http://www.sparkexpert.com/2015/01/02/load-database-data-into-spark-using-jdbcrdd-in-java/ -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/JdbcRDD-tp19233p20939.html Sent from the Apache Spark User List mailing list archi
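
For anyone who wants a quick Scala sketch in addition to the blog post (connection string, table, bounds, and column names below are all placeholders):

    import java.sql.DriverManager
    import org.apache.spark.rdd.JdbcRDD

    val url = "jdbc:mysql://localhost:3306/testdb"   // placeholder connection string

    val rows = new JdbcRDD(
      sc,
      () => DriverManager.getConnection(url, "user", "password"),
      "SELECT id, name FROM people WHERE id >= ? AND id <= ?",  // the two ?s receive the partition bounds
      1, 1000,   // lowerBound, upperBound
      10,        // numPartitions
      rs => (rs.getInt("id"), rs.getString("name")))

    println(rows.count())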

Re: JdbcRDD and ClassTag issue

2015-01-01 Thread Sujee
Hi, I encountered the same issue and solved it. Please check my blog post http://www.sparkexpert.com/2015/01/02/load-database-data-into-spark-using-jdbcrdd-in-java/ Thank you -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/JdbcRDD-and-ClassTag-issue-tp18570

Re: FlatMapValues

2015-01-01 Thread Hitesh Khamesra
How about this: apply flatMap per line, and in that function parse each line and return all the columns as per your need. On Wed, Dec 31, 2014 at 10:16 AM, Sanjay Subramanian < sanjaysubraman...@yahoo.com.invalid> wrote: > hey guys > > Some of u may care :-) but this is just give u a backgroun
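
Roughly what that could look like (a hedged sketch; the delimiter and column positions are assumptions, not Sanjay's actual format):

    val lines = sc.textFile("input.csv")  // placeholder path

    // One input line can produce zero or more output records
    val records = lines.flatMap { line =>
      val cols = line.split(',')
      if (cols.length < 3) Seq.empty                      // skip malformed lines
      else Seq((cols(0), cols(1)), (cols(0), cols(2)))    // emit whichever column pairs are needed
    }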

Re: Reading nested JSON data with Spark SQL

2015-01-01 Thread Pankaj Narang
Oops, sqlContext.setConf("spark.sql.parquet.binaryAsString", "true") solved the issue. Important for everyone. -- View this message in context: http://apache-spark-user-list.1001560.n3.nabble.com/Reading-nested-JSON-data-with-Spark-SQL-tp19310p20936.html Sent from the Apache Spark U
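
For context, the flag has to be set before the Parquet data is read, e.g. (the path is a placeholder, using the Spark 1.2-era API):

    // Read Parquet BINARY columns back as Strings instead of Array[Byte]
    sqlContext.setConf("spark.sql.parquet.binaryAsString", "true")

    val rows = sqlContext.parquetFile("/path/to/data.parquet")
    rows.registerTempTable("data")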

Re: Reading nested JSON data with Spark SQL

2015-01-01 Thread Pankaj Narang
Also, it looks like when I store the Strings in Parquet and try to fetch them using Spark code, I get a ClassCastException. Below is how my array of strings is saved: each character's ASCII value is present in an array of ints. res25: Array[Seq[String]] = Array(ArrayBuffer(Array(104, 116, 116, 112, 58

Only One Kafka receiver is running in spark irrespective of multiple DStreams

2015-01-01 Thread Tapas Swain
Hi All, I am consuming an 8-partition Kafka topic through multiple DStreams and processing them in Spark. But irrespective of the multiple input DStreams, the Spark master UI is showing only one receiver. The following is the consumer part of the Spark code: int numStreams = 8; List> kafkaSt
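
For what it's worth, the usual multiple-receiver pattern in Scala looks like the sketch below (ZooKeeper quorum, consumer group, and topic name are placeholders); each receiver still needs its own core, and the per-receiver streams are normally unioned before processing:

    import org.apache.spark.streaming.kafka.KafkaUtils

    val numStreams = 8
    val kafkaStreams = (1 to numStreams).map { _ =>
      KafkaUtils.createStream(ssc, "zk-host:2181", "my-consumer-group", Map("my-topic" -> 1))
    }

    // Union the per-receiver streams into a single DStream before processing
    val messages = ssc.union(kafkaStreams)
    messages.count().print()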

Re: spark ignoring all memory settings and defaulting to 512MB?

2015-01-01 Thread Sean Owen
You don't in general configure Spark with environment variables. They exist but largely for backwards compatibility. Use arguments like --executor-memory on spark-submit, which are explained in the docs and the help message. It is possible to directly set the system properties with -D too if you ne
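
For example (the class name, jar, and memory values below are arbitrary):

    spark-submit \
      --class com.example.MyApp \
      --executor-memory 4g \
      --driver-memory 2g \
      myapp.jar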

Re: Some Serious Issue with Spark Streaming ? Blocks Getting Removed and Jobs have Failed..

2015-01-01 Thread Sean Owen
I believe the message merely means that a block has been removed from memory because either it is not needed or because it is also persisted on disk and memory is low. It does not mean data is lost. What is the end problem you observe? This does not match the problem you link to in the mailing list

Re: Spark app performance

2015-01-01 Thread Akhil Das
It would be great if you could share the piece of code running inside your mapPartition; I'm assuming you are creating/handling a lot of complex objects, and hence it slows down the performance. Here's a link to performance tuning if you haven't seen it
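
One of the things the tuning guide covers is registering the classes you shuffle with Kryo, roughly like this (the record class is a placeholder, and registerKryoClasses requires Spark 1.2+):

    import org.apache.spark.SparkConf

    case class MyRecord(id: Long, payload: Array[Byte])  // placeholder class

    val conf = new SparkConf()
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
      .registerKryoClasses(Array(classOf[MyRecord]))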

Re: Reading nested JSON data with Spark SQL

2015-01-01 Thread Pankaj Narang
Hi, I am having a similar problem and tried your solution with the Spark 1.2 build with Hadoop. I am saving objects to Parquet files where some fields are of type Array. When I fetch them as below, I get java.lang.ClassCastException: [B cannot be cast to java.lang.CharSequence def fetchTags(rows

Re: spark stream + cassandra (execution on event)

2015-01-01 Thread Akhil Das
One approach would be to create an event-based streaming pipeline: your Spark Streaming job listens on a socket (or whatever) for the event to happen, and once it happens, it hits your Cassandra and does the work. Thanks Best Regards On Wed, Dec 31, 2014 at 3:14 PM, Oleg Ruchovets wrote: > Hi .
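
A minimal sketch of that idea (host, port, and batch interval are placeholders, and the actual Cassandra read/write is left as a comment since it depends on Oleg's setup):

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val ssc = new StreamingContext(sc, Seconds(5))

    // Listen for "event" messages on a socket
    val events = ssc.socketTextStream("localhost", 9999)

    events.foreachRDD { rdd =>
      if (rdd.take(1).nonEmpty) {
        // An event arrived: kick off the Cassandra read/write here
        // (e.g. via the spark-cassandra-connector)
        println("event received, triggering the Cassandra work")
      }
    }

    ssc.start()
    ssc.awaitTermination()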

Re: NoSuchMethodError: com.typesafe.config.Config.getDuration with akka-http/akka-stream

2015-01-01 Thread Akhil Das
It's a Typesafe config jar conflict; you will need to put the jar with the getDuration method in the first position of your classpath. Thanks Best Regards On Wed, Dec 31, 2014 at 4:38 PM, Christophe Billiard < christophe.billi...@gmail.com> wrote: > Hi all, > > I am currently trying to combine datastax's "s

Re: Trying to make spark-jobserver work with yarn

2015-01-01 Thread Akhil Das
Hi Fernando, Here's a simple log parser/analyser written in Scala (you can run it without spark-shell/submit). https://github.com/sigmoidanalytics/Test Basically, to run a Spark job without spark-submit or the shell you need a build file
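
A minimal build.sbt for such a standalone job might look like the following (the name and versions are assumptions, roughly matching what was current at the time):

    name := "log-analyzer"

    scalaVersion := "2.10.4"

    libraryDependencies += "org.apache.spark" %% "spark-core" % "1.2.0"

With that in place the job can be started with a plain "sbt run", assuming the application sets its own master (e.g. local[*]) in code.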

Re: Some Serious Issue with Spark Streaming ? Blocks Getting Removed and Jobs have Failed..

2015-01-01 Thread zgm
I am also seeing this error in a YARN Spark Streaming (1.2.0) application. Tim Smith wrote > Similar issue (Spark 1.0.0). Streaming app runs for a few seconds > before these errors start to pop all over the driver logs: > > 14/09/12 17:30:23 WARN TaskSetManager: Loss was due to java.lang.Except