Re: How to prevent and track data loss/dropped due to watermark during structure streaming aggregation

2020-02-07 Thread stevech.hu
Thanks Jungtaek. I could not remove the watermark, but setting it to 0 works for me.
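[For readers of the archive: "setting 0" here means passing a zero delay to withWatermark. A minimal sketch, with the column name "CurrentTime" taken from the query quoted in the original post below:

  .withWatermark("CurrentTime", "0 seconds")

Because the query stamps rows with current_timestamp() at processing time, no row can arrive with an event time behind the watermark, so a zero delay effectively stops rows from being dropped by the aggregation state.]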

Re: How to prevent and track data loss/dropped due to watermark during structure streaming aggregation

2020-01-23 Thread stevech.hu
Does anyone know the answers, or have pointers? Thanks.

How to prevent and track data loss/dropped due to watermark during structure streaming aggregation

2020-01-18 Thread stevech.hu
We have a scenario where we group raw records by correlation id every 3 minutes and append the grouped result to an HDFS store. Below is an example of our query:

  val df = records.readStream.format("SomeDataSource")
    .selectExpr("current_timestamp() as CurrentTime", "*")
    .withWatermark("Curr
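[The quoted query is cut off in the archive. A minimal sketch of what the full pipeline might look like, assuming a hypothetical "CorrelationId" grouping column, the zero-second watermark the later reply settled on, and placeholder source/sink names; count() stands in for whatever aggregation the poster actually ran:

  import org.apache.spark.sql.functions._

  // spark is the SparkSession; "SomeDataSource" is the poster's custom format
  val df = spark.readStream
    .format("SomeDataSource")
    .load()
    .selectExpr("current_timestamp() as CurrentTime", "*")
    .withWatermark("CurrentTime", "0 seconds")
    .groupBy(window(col("CurrentTime"), "3 minutes"), col("CorrelationId"))
    .agg(count("*").as("RecordCount"))

  // Append mode: each window is emitted once the watermark passes its end
  df.writeStream
    .format("parquet")
    .option("path", "hdfs:///some/output/path")
    .option("checkpointLocation", "hdfs:///some/checkpoint/path")
    .outputMode("append")
    .start()

]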

Issue with structured streaming custom data source V2

2019-09-04 Thread stevech.hu
Hi, I started building a custom data source using the Data Source V2 API to stream data incrementally from Azure Data Lake (HDFS) in Spark 2.4.0. I used the Kafka source as the reference to implement the DataSourceV2/MicroBatchReader/InputPartitionReader/InputPartition/OffsetV2 classes and got everything working
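[For readers of the archive: a rough sketch of how the interfaces named in this post fit together in the Spark 2.4 V2 API. All "Adls"-prefixed names are hypothetical, and the offset and record bookkeeping is elided:

  import java.util.Optional

  import org.apache.spark.sql.catalyst.InternalRow
  import org.apache.spark.sql.sources.v2.{DataSourceOptions, DataSourceV2, MicroBatchReadSupport}
  import org.apache.spark.sql.sources.v2.reader.{InputPartition, InputPartitionReader}
  import org.apache.spark.sql.sources.v2.reader.streaming.{MicroBatchReader, Offset}
  import org.apache.spark.sql.types.StructType

  // Entry point: spark.readStream.format(...) resolves to this class.
  class AdlsSource extends DataSourceV2 with MicroBatchReadSupport {
    override def createMicroBatchReader(
        schema: Optional[StructType],
        checkpointLocation: String,
        options: DataSourceOptions): MicroBatchReader = new AdlsMicroBatchReader(options)
  }

  // An offset only needs a stable JSON form; a sequence number or file position works.
  case class AdlsOffset(position: Long) extends Offset {
    override def json(): String = position.toString
  }

  class AdlsMicroBatchReader(options: DataSourceOptions) extends MicroBatchReader {
    private var startOff = AdlsOffset(0L)
    private var endOff = AdlsOffset(0L)

    override def setOffsetRange(start: Optional[Offset], end: Optional[Offset]): Unit = {
      startOff = start.orElse(AdlsOffset(0L)).asInstanceOf[AdlsOffset]
      // A real source would default `end` to the newest data it can see.
      endOff = end.orElse(startOff).asInstanceOf[AdlsOffset]
    }

    override def getStartOffset: Offset = startOff
    override def getEndOffset: Offset = endOff
    override def deserializeOffset(json: String): Offset = AdlsOffset(json.toLong)
    override def commit(end: Offset): Unit = ()   // free any state up to `end`
    override def stop(): Unit = ()

    override def readSchema(): StructType = StructType(Nil)  // real schema goes here

    override def planInputPartitions(): java.util.List[InputPartition[InternalRow]] =
      java.util.Collections.singletonList(
        new AdlsInputPartition(startOff, endOff): InputPartition[InternalRow])
  }

  class AdlsInputPartition(start: AdlsOffset, end: AdlsOffset)
      extends InputPartition[InternalRow] {
    override def createPartitionReader(): InputPartitionReader[InternalRow] =
      new AdlsPartitionReader(start, end)
  }

  class AdlsPartitionReader(start: AdlsOffset, end: AdlsOffset)
      extends InputPartitionReader[InternalRow] {
    override def next(): Boolean = false     // advance over records in [start, end)
    override def get(): InternalRow = null   // return the current row
    override def close(): Unit = ()
  }

]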