Streaming write to orc problem

2022-04-22 Thread hsy...@gmail.com
Hello all, I'm just trying to build a pipeline reading data from a streaming source and writing it to an ORC file, but I don't see any file written to the file system, nor any exceptions. Here is an example: val df = spark.readStream.format("...") .option("Topic", "Some topic

I got weird error from a join

2018-02-21 Thread hsy...@gmail.com
from pyspark.sql import Row A_DF = sc.parallelize( [ Row(id='A123', name='a1'), Row(id='A234', name='a2') ]).toDF() B_DF = sc.parallelize( [ Row(id='A123', pid='A234', ename='e1') ]).toDF() join_df = B_DF.join(A_DF, B_DF.id==A_DF.id).drop(B_DF.id) final_joi

Questions about using pyspark 2.1.1 pushing data to kafka

2018-01-23 Thread hsy...@gmail.com
I have questions about using pyspark 2.1.1 to push data to kafka. I don't see any pyspark streaming API to write data directly to kafka; if there is one, or an example, please point me to the right page. I implemented my own way, which uses a global kafka producer and pushes the data picked from foreac
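The usual working pattern is a sketch like the one below, not the poster's exact code: create the producer inside foreachPartition, so each partition on the executor builds its own producer, rather than relying on a global one from the driver, which cannot be pickled. The kafka-python dependency, broker address, and topic name are all assumptions here.

```python
# Hypothetical sketch: per-partition Kafka producer for pyspark 2.1.1.
# A module-level ("global") producer cannot be pickled and shipped to the
# executors; creating it inside the partition function avoids that.
def send_partition(rows):
    from kafka import KafkaProducer           # kafka-python, imported on the executor
    producer = KafkaProducer(bootstrap_servers="broker:9092")  # placeholder address
    for row in rows:
        producer.send("some-topic", str(row).encode("utf-8"))  # placeholder topic
    producer.flush()                           # push anything still buffered
    producer.close()

# wiring it into a stream:
# dstream.foreachRDD(lambda rdd: rdd.foreachPartition(send_partition))
```

One producer per partition (rather than per record) keeps connection overhead low while staying serializable.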

Question about accumulator

2018-01-23 Thread hsy...@gmail.com
I have a small application like this acc = sc.accumulator(5) def t_f(x): global acc sleep(5) acc += x def f(x): global acc thread = Thread(target = t_f, args = (x,)) thread.start() # thread.join() # without this it doesn't work rdd = sc.parallelize([1,2,4,1]) rdd.for

Re: How to do an interactive Spark SQL

2014-07-23 Thread hsy...@gmail.com
Anyone have any idea on this? On Tue, Jul 22, 2014 at 7:02 PM, hsy...@gmail.com wrote: > But how do they do the interactive sql in the demo? > https://www.youtube.com/watch?v=dJQ5lV5Tldw > > And if it can work in local mode, I think it should be able to work in > cluste

Re: How to do an interactive Spark SQL

2014-07-22 Thread hsy...@gmail.com
rent stream after the StreamingContext has started. > > Tobias > > > On Wed, Jul 23, 2014 at 9:55 AM, hsy...@gmail.com > wrote: > >> For example, this is what I tested and work on local mode, what it does >> is it get data and sql query both from kafka and do sql on

Re: How to do an interactive Spark SQL

2014-07-22 Thread hsy...@gmail.com
+ r.mkString(",") + "\n" }) producer.send(new KeyedMessage[String, String](outputTopic, s"SQL: $sqlS \n $result")) }) ssc.start() ssc.awaitTermination() On Tue, Jul 22, 2014 at 5:10 PM, Zongheng Yang wrote: > Can you paste a small code example

Re: How to do an interactive Spark SQL

2014-07-22 Thread hsy...@gmail.com
? What do you mean by "cannot share the sql to all workers"? > > On Tue, Jul 22, 2014 at 4:03 PM, hsy...@gmail.com > wrote: > > Hi guys, > > > > I'm able to run some Spark SQL example but the sql is static in the > code. I > > would like to know is

How to do an interactive Spark SQL

2014-07-22 Thread hsy...@gmail.com
Hi guys, I'm able to run some Spark SQL examples, but the sql is static in the code. I would like to know whether there is a way to read the sql from somewhere else (a shell, for example). I could read the sql statement from kafka/zookeeper, but I cannot share the sql with all workers. broadcast seems not working for up

Re: Task not serializable: java.io.NotSerializableException: org.apache.spark.SparkContext

2014-07-21 Thread hsy...@gmail.com
I have the same problem On Sat, Jul 19, 2014 at 12:31 AM, lihu wrote: > Hi, > Everyone. I have the following piece of code. When I run it, > it produces the error below; it seems that the SparkContext is not > serializable, but I do not try to use the SparkContext except in the broadca

Re: Difference among batchDuration, windowDuration, slideDuration

2014-07-17 Thread hsy...@gmail.com
Thanks Tathagata, so can I say the RDD size (from the stream) is the window size, and the overlap between 2 adjacent RDDs is the slide size? But I still don't understand what the batch size is; why do we need it, since data processing is RDD by RDD, right? And does spark chop the data into RDDs at the very beg
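The relationship between the three durations works out concretely with illustrative numbers (not from the thread): the batch duration is how often the stream is chopped into source RDDs, and windowing then recombines several of those batches.

```python
# Worked numbers for the three DStream durations (values are illustrative):
batch = 2    # seconds: the stream is chopped into one source RDD per 2 s of data
window = 10  # seconds: each windowed RDD covers the most recent 10 s
slide = 4    # seconds: a new windowed RDD is emitted every 4 s

# window and slide must both be multiples of the batch duration:
assert window % batch == 0 and slide % batch == 0

batches_per_window = window // batch   # 5 source RDDs combined per window
batches_per_slide = slide // batch     # move forward 2 source RDDs per emission
overlap = window - slide               # 6 s shared by adjacent windows
print(batches_per_window, batches_per_slide, overlap)   # 5 2 6
```

So the window size is not the RDD size from the stream: the batch duration fixes the source RDD size, and each windowed RDD is a union of several consecutive source RDDs.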

Difference among batchDuration, windowDuration, slideDuration

2014-07-16 Thread hsy...@gmail.com
When I'm reading the API of spark streaming, I'm confused by the 3 different durations: StreamingContext(conf: SparkConf, batchDuration: Duration

Re: SQL + streaming

2014-07-15 Thread hsy...@gmail.com
tors, so you won't find anything in the driver logs! > So try doing a collect, or take on the RDD returned by sql query and print > that. > > TD > > > On Tue, Jul 15, 2014 at 4:28 PM, hsy...@gmail.com > wrote: > >> By the way, have you ever run SQL and stream to

Re: SQL + streaming

2014-07-15 Thread hsy...@gmail.com
By the way, have you ever run SQL and stream together? Do you know any example that works? Thanks! On Tue, Jul 15, 2014 at 4:28 PM, hsy...@gmail.com wrote: > Hi Tathagata, > > I could see the output of count, but no sql results. Running in standalone mode is > meaningless for me and I ju

Re: SQL + streaming

2014-07-15 Thread hsy...@gmail.com
see output? > Also, I recommend going through the previous step-by-step approach to > narrow down where the problem is. > > TD > > > On Mon, Jul 14, 2014 at 9:15 PM, hsy...@gmail.com > wrote: > >> Actually, I deployed this on yarn cluster(spark-submit) and I could

Re: How to kill running spark yarn application

2014-07-15 Thread hsy...@gmail.com
>> it is terminated as well. Sorry, I cannot reproduce it. >> >> >> On Mon, Jul 14, 2014 at 7:36 PM, hsy...@gmail.com >> wrote: >> >>> Before "yarn application -kill" If you do jps You'll have a list >>> of SparkSubmit and ApplicationM

Re: SQL + streaming

2014-07-14 Thread hsy...@gmail.com
If you can get that to work, then I would test the Spark SQL > stuff. > > TD > > > On Mon, Jul 14, 2014 at 5:25 PM, hsy...@gmail.com > wrote: > >> No errors but no output either... Thanks! >> >> >> On Mon, Jul 14, 2014 at 4:59 PM, Tathagata Das &

Re: SQL + streaming

2014-07-14 Thread hsy...@gmail.com
throwing error? No errors but no output > either? > > TD > > > On Mon, Jul 14, 2014 at 4:06 PM, hsy...@gmail.com > wrote: > >> Hi All, >> >> Couple days ago, I tried to integrate SQL and streaming together. My >> understanding is I can transform RDD from

Re: How to kill running spark yarn application

2014-07-14 Thread hsy...@gmail.com
is is what I did 2 hours > ago. > > Sorry I cannot provide more help. > > > Sent from my iPhone > > On 14 Jul, 2014, at 6:05 pm, "hsy...@gmail.com" wrote: > > yarn-cluster > > > On Mon, Jul 14, 2014 at 2:44 PM, Jerry Lam wrote: > >> Hi Si

SQL + streaming

2014-07-14 Thread hsy...@gmail.com
Hi All, A couple of days ago, I tried to integrate SQL and streaming together. My understanding is that I can transform each RDD from the DStream into a SchemaRDD and execute SQL on each RDD, but I had no luck. Would you guys help me take a look at my code? Thank you very much! object KafkaSpark { def main(args: Arr

Re: How to kill running spark yarn application

2014-07-14 Thread hsy...@gmail.com
yarn-cluster On Mon, Jul 14, 2014 at 2:44 PM, Jerry Lam wrote: > Hi Siyuan, > > I wonder if you --master yarn-cluster or yarn-client? > > Best Regards, > > Jerry > > > On Mon, Jul 14, 2014 at 5:08 PM, hsy...@gmail.com > wrote: > >> Hi all, >

How to kill running spark yarn application

2014-07-14 Thread hsy...@gmail.com
Hi all, A newbie question: I start a spark yarn application through spark-submit. How do I kill this app? I can kill the yarn app with "yarn application -kill appid", but the application master is still running. What's the proper way to shut down the entire app? Best, Siyuan

Difference between SparkSQL and shark

2014-07-10 Thread hsy...@gmail.com
I have a newbie question. What is the difference between SparkSQL and Shark? Best, Siyuan

Re: Some question about SQL and streaming

2014-07-10 Thread hsy...@gmail.com
now if it is the best way, but it works. > > Tobias > > > On Thu, Jul 10, 2014 at 4:21 AM, hsy...@gmail.com > wrote: > >> Hi guys, >> >> I'm a new user to spark. I would like to know is there an example of how >> to use spark SQL and spark streaming together? My use case is I want to do >> some SQL on the input stream from kafka. >> Thanks! >> >> Best, >> Siyuan >> > >

Some question about SQL and streaming

2014-07-09 Thread hsy...@gmail.com
Hi guys, I'm a new user to spark. I would like to know whether there is an example of how to use Spark SQL and Spark Streaming together. My use case is that I want to do some SQL on the input stream from kafka. Thanks! Best, Siyuan