Re: Re: About the question of Spark Structured Streaming window output

2018-08-26 Thread z...@zjdex.com
Hi Jungtaek Lim & Gerard Maas: Thanks very much. When I feed in three batches of data as follows:
batch 0: 2018-08-27 09:53:00,1 2018-08-27 09:53:01,1
batch 1: 2018-08-27 11:04:00,1 2018-08-27 11:04:01,1
batch 2: 2018-08-27 11:17:00,1 2018-08-27 11:17:01,1
the agg result of time "2018-08-27

java.io.NotSerializableException: org.apache.spark.sql.TypedColumn

2018-08-26 Thread zzcclp
Hi dev: I am using spark-shell to run the example in section 'http://spark.apache.org/docs/2.2.2/sql-programming-guide.html#type-safe-user-defined-aggregate-functions', and it fails with an error: *Caused by: java.io.NotSerializableException: org.apache.spark.sql.TypedColumn Serialization

Re: How to deal with context dependent computing?

2018-08-26 Thread JF Chen
Thanks Sonal. For example, I have data like the following:
login 2018/8/27 10:00
logout 2018/8/27 10:05
login 2018/8/27 10:08
logout 2018/8/27 10:15
login 2018/8/27 11:08
logout 2018/8/27 11:32
Now I want to calculate the time between each login and logout. For example, I should get 5 min, 7 min, 24
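To make the login/logout pairing concrete, here is a minimal plain-Python sketch (not Spark code, and not necessarily how the thread resolved it). It assumes the events belong to a single user, are already sorted by time, and strictly alternate; `EVENTS` and `session_minutes` are hypothetical names introduced only for illustration.

```python
from datetime import datetime

# Sample events copied from the message: (event type, timestamp string).
EVENTS = [
    ("login", "2018/8/27 10:00"),
    ("logout", "2018/8/27 10:05"),
    ("login", "2018/8/27 10:08"),
    ("logout", "2018/8/27 10:15"),
    ("login", "2018/8/27 11:08"),
    ("logout", "2018/8/27 11:32"),
]


def session_minutes(events):
    """Pair each login with the next logout and return durations in whole minutes.

    Assumes `events` is sorted by time and belongs to one user.
    """
    durations = []
    login_time = None
    for kind, stamp in events:
        when = datetime.strptime(stamp, "%Y/%m/%d %H:%M")
        if kind == "login":
            # Remember the most recent login until a logout closes it.
            login_time = when
        elif kind == "logout" and login_time is not None:
            durations.append(int((when - login_time).total_seconds() // 60))
            login_time = None
    return durations


print(session_minutes(EVENTS))
```

In a distributed setting the same idea would need the rows grouped per user and ordered by timestamp before pairing; that part is left out of this sketch.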

Re: Spark Structured Streaming using S3 as data source

2018-08-26 Thread Burak Yavuz
Yes, the checkpoint makes sure that you start off from where you left off.
On Sun, Aug 26, 2018 at 2:22 AM sherif98 wrote:
> I have data that is continuously pushed to multiple S3 buckets. I want to set
> up a structured streaming application that uses the S3 buckets as the data
> source and
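As a rough illustration of the checkpoint mentioned in the reply, the sketch below shows where a checkpoint location is set on a PySpark Structured Streaming query with a file source. It is a sketch under assumptions, not the poster's application: the function name `start_file_stream`, the schema, the CSV input format, the Parquet sink, and all paths are placeholders, and the stream-stream join from the question is omitted.

```python
def start_file_stream(spark, input_path, output_path, checkpoint_path):
    """Start a file-source streaming query that records progress in a checkpoint.

    `spark` is an existing SparkSession. The schema and all paths are
    placeholders chosen for this example only.
    """
    events = (
        spark.readStream
        # Hypothetical schema; streaming file sources need one up front.
        .schema("ts TIMESTAMP, value INT")
        .csv(input_path)
    )
    return (
        events.writeStream
        .format("parquet")
        .option("path", output_path)
        # Progress is tracked here, so a restart with the same location
        # resumes instead of starting over.
        .option("checkpointLocation", checkpoint_path)
        .start()
    )
```

A call might look like `start_file_stream(spark, "s3a://some-bucket/in/", "s3a://some-bucket/out/", "s3a://some-bucket/checkpoints/q1/")`, where the bucket names are made up. Restarting the application with the same checkpoint location is what lets the query pick up from where it stopped.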

Spark Structured Streaming using S3 as data source

2018-08-26 Thread sherif98
I have data that is continuously pushed to multiple S3 buckets. I want to set up a structured streaming application that uses the S3 buckets as the data source and does stream-stream joins. My question is: if the application goes down for some reason, will restarting the application continue