SparkSQL on hive error

2015-10-27 Thread Anand Nalya
Hi, I have a partitioned table in Hive (Avro) that I can query fine from the Hive CLI. When using SparkSQL, I'm able to query some of the partitions, but I'm getting an exception on others. The query is: sqlContext.sql("select * from myTable where source='http' and date = 20150812").take(
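
For reference, a minimal sketch of the pattern being described, assuming an existing SparkContext named sc and the table and column names quoted above (Spark 1.x API):

    import org.apache.spark.sql.hive.HiveContext

    // Sketch only: query a single partition of the Avro-backed table.
    // Per the thread, some partitions return fine while others throw.
    val sqlContext = new HiveContext(sc)
    val rows = sqlContext.sql(
      "select * from myTable where source='http' and date = 20150812")
    rows.take(10).foreach(println)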

Re: Checkpoint file not found

2015-08-03 Thread Anand Nalya
(notFilter).checkpoint(interval) toNotUpdate.foreachRDD(rdd => pending = rdd) Thanks On 3 August 2015 at 13:09, Tathagata Das wrote: > Can you tell us more about the streaming app? Which DStream operations are you > using? > On Sun, Aug 2, 2015 at 9:14 PM, Anand Nalya wrote: >
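
A sketch of how those fragments may fit together; pending, notFilter, interval and toNotUpdate appear in the thread, while everything else is an assumption (an existing SparkContext sc and a DStream[(String, Long)] named dstream):

    import org.apache.spark.rdd.RDD
    import org.apache.spark.streaming.Seconds

    // Keep a reference to the latest batch's RDD in a driver-side var,
    // checkpointing the filtered stream at a fixed interval.
    var pending: RDD[(String, Long)] = sc.emptyRDD[(String, Long)]

    val interval = Seconds(60)
    val toNotUpdate = dstream.filter(notFilter).checkpoint(interval)

    toNotUpdate.foreachRDD { rdd =>
      pending = rdd
    }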

Checkpoint file not found

2015-08-02 Thread Anand Nalya
Hi, I'm writing a Streaming application in Spark 1.3. After running for some time, I'm getting the following exception. I'm sure that no other process is modifying the HDFS file. Any idea what might be the cause of this? 15/08/02 21:24:13 ERROR scheduler.DAGSchedulerEventProcessLoop: DAGSchedulerEv

Re: updateStateByKey schedule time

2015-07-21 Thread Anand Nalya
I also ran into a similar use case. Is this possible? On 15 July 2015 at 18:12, Michel Hubert wrote: > Hi, > I want to implement a time-out mechanism in the updateStateByKey(…) > routine. > But is there a way to retrieve the time of the start of the batch > corresponding to the
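
One common workaround for such a timeout (an assumption on my part, not a conclusion from the thread): carry a last-seen timestamp inside the state and expire keys by returning None once they have been silent too long. A sketch, assuming a DStream[(String, Long)] named dstream and a checkpointed StreamingContext (updateStateByKey requires ssc.checkpoint(...)):

    // State is (runningSum, lastSeenMillis); returning None drops the key.
    val timeoutMs = 60 * 1000L

    def updateWithTimeout(values: Seq[Long],
                          state: Option[(Long, Long)]): Option[(Long, Long)] = {
      val now = System.currentTimeMillis()  // stand-in for the batch time
      if (values.nonEmpty) {
        val sum = state.map(_._1).getOrElse(0L) + values.sum
        Some((sum, now))                    // key is active: refresh timestamp
      } else {
        state.filter { case (_, lastSeen) => now - lastSeen < timeoutMs }
      }
    }

    val counts = dstream.updateStateByKey(updateWithTimeout _)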

Re: Breaking lineage and reducing stages in Spark Streaming

2015-07-10 Thread Anand Nalya
e before using foreach(println). If the RDD is huge, you >> definitely don't want to do that. >> Hope this helps. >> dean

Re: Breaking lineage and reducing stages in Spark Streaming

2015-07-09 Thread Anand Nalya
On Thu, Jul 9, 2015 at 5:56 AM, Anand Nalya wrote: >> The data coming from dstream have the same key

Re: Breaking lineage and reducing stages in Spark Streaming

2015-07-09 Thread Anand Nalya
it? > On Thu, Jul 9, 2015 at 3:17 AM, Anand Nalya wrote: >> That's from the Streaming tab of the Spark 1.4 WebUI. >> On 9 July 2015 at 15:35, Michel Hubert wrote: >>> Hi, >>> I was just wondering h

Re: Breaking lineage and reducing stages in Spark Streaming

2015-07-09 Thread Anand Nalya
That's from the Streaming tab of the Spark 1.4 WebUI. On 9 July 2015 at 15:35, Michel Hubert wrote: > Hi, > I was just wondering how you generated the second image with the charts. > What product? > *From:* Anand Nalya [mailto:anand.na...@gmail.com]

Breaking lineage and reducing stages in Spark Streaming

2015-07-09 Thread Anand Nalya
Hi, I have an application in which an RDD is being updated with tuples coming from RDDs in a DStream, with the following pattern: dstream.foreachRDD(rdd => { myRDD = myRDD.union(rdd.filter(myfilter)).reduceByKey(_+_) }) I'm using cache() and checkpointing to cache results. Over time, the lineage o
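
One common way to keep that lineage bounded (a general technique, not necessarily the thread's accepted answer) is to checkpoint the accumulated RDD every few batches so the union/reduceByKey chain gets cut. A sketch, assuming sc.setCheckpointDir(...) has been called and that myRDD, dstream and myfilter are as described above:

    import org.apache.spark.rdd.RDD

    var myRDD: RDD[(String, Long)] = sc.emptyRDD[(String, Long)]
    var batch = 0L
    val checkpointEvery = 10   // hypothetical cadence

    dstream.foreachRDD { rdd =>
      myRDD = myRDD.union(rdd.filter(myfilter)).reduceByKey(_ + _).cache()
      batch += 1
      if (batch % checkpointEvery == 0) {
        myRDD.checkpoint()   // mark for checkpointing: truncates the lineage
        myRDD.count()        // force a job so the checkpoint is materialized
      }
    }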

[no subject]

2015-07-07 Thread Anand Nalya
Hi, Suppose I have an RDD that is loaded from some file, and I also have a DStream with data coming from some stream. I want to keep unioning some of the tuples from the DStream into my RDD. For this I can use something like this: var myRDD: RDD[(String, Long)] = sc.fromText... dstream.f
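
A minimal sketch of that pattern; the load call, path and predicate are hypothetical stand-ins for the elided parts, and sc and dstream are assumed to exist:

    import org.apache.spark.rdd.RDD

    // Base RDD loaded from a file (hypothetical path and parsing).
    var myRDD: RDD[(String, Long)] = sc.textFile("hdfs:///data/base.txt")
      .map(line => (line, 1L))

    // Fold selected tuples from every batch into the accumulated RDD.
    dstream.foreachRDD { rdd =>
      val selected = rdd.filter { case (_, n) => n > 0 }   // hypothetical filter
      myRDD = myRDD.union(selected)
    }

Note that without caching or checkpointing, the union chain above grows without bound, which is exactly the problem raised in the "Breaking lineage" thread above.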

Split RDD into two in a single pass

2015-07-06 Thread Anand Nalya
Hi, I have an RDD which I want to split into two disjoint RDDs with a boolean function. I can do this with the following: val rdd1 = rdd.filter(f) val rdd2 = rdd.filter(fnot) I'm assuming that each of the above statements will traverse the RDD once, thus resulting in 2 passes. Is there a way of do
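
For context, core Spark has no built-in single-pass split of one RDD into two; the usual advice is to cache the source so the two filter jobs at least avoid recomputing it. A sketch with a hypothetical predicate, assuming an RDD[Int] named rdd:

    // The source is computed once; both filters then read from memory.
    val cached = rdd.cache()

    val f: Int => Boolean = _ % 2 == 0   // hypothetical predicate
    val rdd1 = cached.filter(f)
    val rdd2 = cached.filter(x => !f(x))

Each filter is still a separate job, but the expensive upstream work runs only once.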

Array fields in dataframe.write.jdbc

2015-07-02 Thread Anand Nalya
Hi, I'm using Spark 1.4. I have an array field in my data frame, and when I'm trying to write this dataframe to Postgres, I'm getting the following exception: Exception in thread "main" java.lang.IllegalArgumentException: Can't translate null value for field StructField(filter,ArrayType(StringType,fa
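
One possible workaround (an assumption, not a confirmed fix from the thread): Spark 1.4's JDBC writer has no mapping for ArrayType columns, so flatten the array column ("filter", per the exception) into a delimited string before writing. A sketch with hypothetical connection details, assuming a DataFrame named df:

    import org.apache.spark.sql.functions.{col, udf}

    // Join the string array into one comma-separated string; keep nulls null.
    val arrayToString = udf((arr: Seq[String]) =>
      if (arr == null) null else arr.mkString(","))
    val flattened = df.withColumn("filter", arrayToString(col("filter")))

    val props = new java.util.Properties()
    props.setProperty("user", "postgres")    // hypothetical credentials
    props.setProperty("password", "secret")

    flattened.write.jdbc(
      "jdbc:postgresql://localhost/mydb",    // hypothetical URL
      "mytable", props)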