Hi Jungtaek Lim & Gerard Maas:
Thanks very much.
When I feed in three batches of data like the following:
batch 0:
2018-08-27 09:53:00,1
2018-08-27 09:53:01,1
batch 1:
2018-08-27 11:04:00,1
2018-08-27 11:04:01,1
batch 2:
2018-08-27 11:17:00,1
2018-08-27 11:17:01,1
the agg result of time "2018-08-27
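(The window size used in the query isn't shown in the thread, but the way each event timestamp lands in a tumbling event-time window can be sketched in plain Python; the 10-minute window below is an illustrative assumption, not taken from the original query.)

```python
from datetime import datetime, timedelta

def window_start(ts: datetime, window: timedelta) -> datetime:
    """Floor an event timestamp to the start of its tumbling window."""
    epoch = datetime(1970, 1, 1)
    return ts - ((ts - epoch) % window)

# The six events from the three batches above
events = [
    ("2018-08-27 09:53:00", 1),
    ("2018-08-27 09:53:01", 1),
    ("2018-08-27 11:04:00", 1),
    ("2018-08-27 11:04:01", 1),
    ("2018-08-27 11:17:00", 1),
    ("2018-08-27 11:17:01", 1),
]

# Sum the values per window, as a windowed groupBy/agg would
counts = {}
for ts_str, v in events:
    ts = datetime.strptime(ts_str, "%Y-%m-%d %H:%M:%S")
    w = window_start(ts, timedelta(minutes=10))
    counts[w] = counts.get(w, 0) + v

for w in sorted(counts):
    print(w, counts[w])
```

With a 10-minute window this yields one row per window (09:50, 11:00, 11:10), each with count 2; in Spark the same grouping is expressed with `window($"timestamp", "10 minutes")`.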
Hi dev:
I am using spark-shell to run the example from the section
'http://spark.apache.org/docs/2.2.2/sql-programming-guide.html#type-safe-user-defined-aggregate-functions',
and I get an error:
Caused by: java.io.NotSerializableException:
org.apache.spark.sql.TypedColumn
Serialization
Thanks Sonal.
For example, I have data as follows:
login 2018/8/27 10:00
logout 2018/8/27 10:05
login 2018/8/27 10:08
logout 2018/8/27 10:15
login 2018/8/27 11:08
logout 2018/8/27 11:32
Now I want to calculate the time between each login and logout. For this
example, I should get 5 min, 7 min, and 24 min.
Yes, the checkpoint makes sure that you start off from where you left off.
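Conceptually, a checkpoint is just durable state (source offsets plus stream state) that the restarted query reloads before reading anything new. A toy plain-Python sketch of the resume-from-offset idea (the file name and JSON layout here are illustrative, not Spark's actual checkpoint format):

```python
import json
import os
import tempfile

def process(records, ckpt_path):
    """Process only records past the last checkpointed offset, then advance it."""
    offset = 0
    if os.path.exists(ckpt_path):
        with open(ckpt_path) as f:
            offset = json.load(f)["offset"]
    handled = records[offset:]               # only the unseen tail
    with open(ckpt_path, "w") as f:          # commit the new offset
        json.dump({"offset": len(records)}, f)
    return handled

ckpt = os.path.join(tempfile.mkdtemp(), "offsets.json")
print(process(["a", "b", "c"], ckpt))        # first run: all records
print(process(["a", "b", "c", "d"], ckpt))   # after a "restart": only "d"
```

Spark does the offset bookkeeping for you when you set `checkpointLocation` on the query, which is why a restart picks up where the previous run stopped.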
On Sun, Aug 26, 2018 at 2:22 AM sherif98
wrote:
> I have data that is continuously pushed to multiple S3 buckets. I want to
> set
> up a structured streaming application that uses the S3 buckets as the data
> source and
I have data that is continuously pushed to multiple S3 buckets. I want to set
up a structured streaming application that uses the S3 buckets as the data
source and do stream-stream joins.
My question is: if the application goes down for some reason, will restarting
the application continue from where it left off?