Re: Using data frames to join separate RDDs in spark streaming

2016-06-04 Thread Cyril Scetbon
Problem solved by creating only one RDD.

> On Jun 1, 2016, at 14:05, Cyril Scetbon wrote:
>
> It seems that to join a DStream with an RDD I can use:
>
> mgs.transform(rdd => rdd.join(rdd1))
>
> or
>
> mgs.foreachRDD(rdd => rdd.join(rdd1))
>
> But, I can
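For reference, a minimal sketch of the transform-based join this thread converges on; the context setup and element types are assumptions, since the archive preview never shows them:

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.streaming.{Seconds, StreamingContext}

    val conf = new SparkConf().setAppName("join-sketch")
    val sc = new SparkContext(conf)
    val ssc = new StreamingContext(sc, Seconds(10))

    // Static reference RDD, keyed the same way as the stream.
    val rdd1 = sc.parallelize(Seq(("id1", "aid1"), ("id2", "aid2")))

    // mgs stands in for the DStream from the thread, assumed to be key/value pairs.
    val mgs = ssc.socketTextStream("localhost", 9999).map(line => (line, line))

    // transform exposes each batch's RDD, so a plain pair-RDD join applies.
    val joined = mgs.transform(batchRdd => batchRdd.join(rdd1))
    joined.print()

    ssc.start()
    ssc.awaitTermination()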

Spark Streaming w/variables used as dynamic queries

2016-06-03 Thread Cyril Scetbon
Hey guys, can someone help me solve the current issue? My code is the following:

var arr = new ArrayBuffer[String]()

sa_msgs.map(x => x._1)
  .foreachRDD { rdd =>
    arr = new ArrayBuffer[String]()
  } (2)

sa_msgs.map(x => x._1)
  .foreachRDD { rdd =>
    arr ++=
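A hedged sketch of the driver-side version of this pattern: foreachRDD's body runs on the driver once per batch, so rebuilding the buffer from a collect there is safe, whereas mutating it from inside map or filter closures (which run on executors) would not update the driver's copy. Only sa_msgs and the pair element come from the preview; everything else is assumed:

    import scala.collection.mutable.ArrayBuffer

    var arr = ArrayBuffer.empty[String]
    sa_msgs.map(_._1).foreachRDD { rdd =>
      // Driver-side, once per batch: pull the keys back and rebuild the buffer.
      arr = ArrayBuffer(rdd.collect(): _*)
    }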

Re: Using data frames to join separate RDDs in spark streaming

2016-06-01 Thread Cyril Scetbon
It seems that to join a DStream with an RDD I can use:

mgs.transform(rdd => rdd.join(rdd1))

or

mgs.foreachRDD(rdd => rdd.join(rdd1))

But I can't see why rdd1.toDF("id","aid") really causes SPARK-5063.

> On Jun 1, 2016, at 12:00, Cyril Scetbon wrote:
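A hedged sketch of the DataFrame variant, under the assumption that the stream carries (id, payload) pairs: transform's body executes on the driver, where SQLContext access is legal, whereas the SPARK-5063 error appears when driver-only calls end up inside closures shipped to executors:

    import org.apache.spark.sql.SQLContext

    val enriched = mgs.transform { batchRdd =>
      // Driver-side per batch, so toDF is legal here.
      val sqlContext = SQLContext.getOrCreate(batchRdd.sparkContext)
      import sqlContext.implicits._
      val batchDf = batchRdd.toDF("id", "payload")
      val refDf = rdd1.toDF("id", "aid")
      batchDf.join(refDf, "id").rdd
    }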

Using data frames to join separate RDDs in spark streaming

2016-06-01 Thread Cyril Scetbon
do it, or do I need to do everything using RDD joins only? And if I need to use only RDD joins, how do I do it, as I have an RDD (rdd1) and a DStream (mgs)? Thanks -- Cyril SCETBON

Re: Joining an RDD to a DataFrame

2016-05-12 Thread Cyril Scetbon
DataFrame.scala:152) Is there a better way to query nested objects, and to join between a DF containing nested objects and another regular data frame (yes, that's the current case)?

> On May 9, 2016, at 00:42, Cyril Scetbon wrote:
>
> Hi Ashish,
>
> The issue is not relate
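Absent the full schema from the thread, a hedged sketch of one common approach: project nested struct fields into top-level columns with dot notation, then join on the flattened result (all names here are illustrative):

    // df is assumed to hold a nested "user" struct; adjust the paths otherwise.
    val flat = df.select(df("user.id").as("id"), df("user.name").as("name"))
    val joined = flat.join(otherDf, "id")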

Re: Joining an RDD to a DataFrame

2016-05-08 Thread Cyril Scetbon
comment on it and maybe give me advice. Thank you.

> On May 8, 2016, at 22:12, Ashish Dubey wrote:
>
> Is there any reason you don't want to convert this - I don't think a join
> between an RDD and a DF is supported.
>
> On Sat, May 7, 2016 at 11:41 PM, Cyril Scetbon

Joining an RDD to a DataFrame

2016-05-07 Thread Cyril Scetbon
Hi, I have an RDD built during a Spark Streaming job and I'd like to join it to a DataFrame (E/S input) to enrich it. It seems that I can't join the RDD and the DF without first converting the RDD to a DF (tell me if I'm wrong). Here are the schemas of both DFs:

scala> df
res32: org.apache.spark
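A minimal sketch of the conversion the thread converges on: turn the RDD into a DataFrame first, then join DataFrame-to-DataFrame. The case class, the field names, and the esDf handle for the E/S input are assumptions:

    // Assumes a SQLContext named sqlContext is in scope (as in spark-shell 1.x).
    import sqlContext.implicits._

    case class Event(id: String, payload: String)

    // Convert the streaming-side RDD, then join on the shared key column.
    val rddDf = rdd.map { case (id, payload) => Event(id, payload) }.toDF()
    val enriched = rddDf.join(esDf, "id")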

Re: spark-shell failing but pyspark works

2016-04-04 Thread Cyril Scetbon
The only way I've found to make it work now is by using the current Spark context and changing its configuration using spark-shell options, which is really different from pyspark, where you can't instantiate a new one, initialize it, etc.

> On Apr 4, 2016, at 18:16, Cyril Scetbon w
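Concretely, the reuse described above looks like this inside spark-shell, where a SparkContext already exists as sc; the batch interval is arbitrary, and configuration changes go on the spark-shell command line (e.g. --conf key=value) rather than on a fresh SparkConf:

    import org.apache.spark.streaming.{Seconds, StreamingContext}

    // Build on the shell's existing context instead of constructing a second
    // SparkContext, which fails by default (one SparkContext per JVM).
    val ssc = new StreamingContext(sc, Seconds(5))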

Re: spark-shell failing but pyspark works

2016-04-04 Thread Cyril Scetbon
set("spark.hadoop.validateOutputSpecs", "false") > > And that will work > > Have you tried it? > > HTH > > Dr Mich Talebzadeh > > LinkedIn > https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw >

Re: spark-shell failing but pyspark works

2016-04-04 Thread Cyril Scetbon
> Dr Mich Talebzadeh
>
> LinkedIn: https://www.linkedin.com/profile/view?id=AAEWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
> http://talebzadehmich.wordpress.com

Re: spark-shell failing but pyspark works

2016-04-02 Thread Cyril Scetbon
Nobody has any idea?

> On Mar 31, 2016, at 23:22, Cyril Scetbon wrote:
>
> Hi,
>
> I'm having issues creating a StreamingContext with Scala using spark-shell.
> It tries to access the localhost interface and the Application Master is not
> running on t

spark-shell failing but pyspark works

2016-03-31 Thread Cyril Scetbon
seems there are differences. Are there any parameters set using Scala but not Python? Thanks -- Cyril SCETBON

Re: StateSpec raises error "missing arguments for method"

2016-03-27 Thread Cyril Scetbon
scala> val spec = StateSpec.function(mappingFunction)
spec: org.apache.spark.streaming.StateSpec[String,Int,Int,Option[String]] = StateSpecImpl()

This code comes from one of the Spark examples. Can anyone explain the difference between the two? Regards

> On Mar 27, 2016, at 23:57, Cyril
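The difference being asked about is most likely eta-expansion: in some REPL contexts, passing a plain method name where a function value is expected fails with "missing arguments for method", and appending an underscore converts the method explicitly. A sketch under that assumption:

    import org.apache.spark.streaming.{State, StateSpec}

    def mappingFunction(key: String, value: Option[Int], state: State[Int]): Option[String] =
      Option(key)

    // May fail with "missing arguments for method mappingFunction":
    // val spec = StateSpec.function(mappingFunction)

    // Explicit eta-expansion always compiles:
    val spec = StateSpec.function(mappingFunction _)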

StateSpec raises error "missing arguments for method"

2016-03-27 Thread Cyril Scetbon
Hi, I'm testing some sample code and I get this error. Here is the sample code I use:

scala> import org.apache.spark.streaming._
import org.apache.spark.streaming._

scala> def mappingFunction(key: String, value: Option[Int], state: State[Int]): Option[String] = {
     |   Option(key)
     | }
ma

Re: python support of mapWithState

2016-03-24 Thread Cyril Scetbon
> sync since they are both in the JVM and a bit more work is needed to expose
> the same functionality from the JVM in Python (or re-implement the Scala
> code in Python where appropriate).
>
> On Thursday, March 24, 2016, Cyril Scetbon wrote:

python support of mapWithState

2016-03-24 Thread Cyril Scetbon
Hi guys, it seems that mapWithState is not supported in Python. Am I right? Is there a reason there are 3 supported languages and one is behind the other two? Thanks

Re: Spark Streaming with Kafka DirectStream

2016-02-17 Thread Cyril Scetbon
I don't think we can print an integer value in a Spark Streaming process, as opposed to a Spark job. I think I can print the content of an RDD but not debug messages. Am I wrong?

Cyril Scetbon

> On Feb 17, 2016, at 12:51 AM, ayan guha wrote:
>
> Hi
>
> You can alway
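Printing from a streaming job is possible, with one caveat: foreachRDD's body executes on the driver once per batch, so a println there reaches the submitting console, while printlns placed inside map or filter run on executors and land in executor logs instead. A sketch (the stream name is assumed):

    stream.foreachRDD { rdd =>
      // Driver-side, once per batch.
      println(s"partitions in this batch: ${rdd.partitions.size}")
    }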

Re: Spark Streaming with Kafka DirectStream

2016-02-16 Thread Cyril Scetbon
erstanding.

> Direct stream generates 1 RDD per batch; however, the number of partitions in
> that RDD = the number of partitions in the Kafka topic.
>
> On Wed, Feb 17, 2016 at 12:18 PM, Cyril Scetbon wrote:
> Hi guys,
>
> I'm makin

Spark Streaming with Kafka DirectStream

2016-02-16 Thread Cyril Scetbon
Hi guys, I'm running some tests with Spark and Kafka using a Python script. I use the second method, the one that doesn't need any receiver (Direct Approach). It should adapt the number of RDDs to the number of partitions in the topic, and I'm trying to verify that. What's the easiest way to verify it? I also
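For reference, the receiver-less direct stream under test, shown in its Spark 1.x Scala form to match the other sketches here (the thread itself used Python; broker address and topic name are placeholders). Combined with a driver-side foreachRDD println, this is one way to check that each batch RDD has as many partitions as the topic:

    import kafka.serializer.StringDecoder
    import org.apache.spark.streaming.kafka.KafkaUtils

    val kafkaParams = Map("metadata.broker.list" -> "localhost:9092")
    val stream = KafkaUtils.createDirectStream[String, String, StringDecoder, StringDecoder](
      ssc, kafkaParams, Set("mytopic"))

    // One line per batch on the driver's console.
    stream.foreachRDD(rdd => println(s"partitions = ${rdd.partitions.size}"))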