Thanks guys!
I am using SSS to backfill the past 3 months of data. I thought I could use SSS
for both the historical data and the new data. I just realized that SSS is not
appropriate for backfilling, since the watermark relies on receivedAt, which
could be 3 months old. I will use a batch job for the backfill and use SSS for new data.
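A batch backfill along those lines could look like the following minimal sketch; the input/output paths and the partition column are assumptions for illustration only:

```scala
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("backfill").getOrCreate()

// One-off batch job: read the historical data, deduplicate it on the same
// keys the streaming query uses, and write it to the same partitioned layout.
// All paths and the partition column are hypothetical placeholders.
spark.read
  .parquet("s3://bucket/raw/")          // historical input (assumed path)
  .dropDuplicates("Id", "receivedAt")   // same dedup keys as the streaming query
  .write
  .mode("append")
  .partitionBy("availabilityDomain")    // assumed partition column
  .parquet("s3://bucket/deduped/")      // assumed output path
```

Because a batch job sees the whole input at once, no watermark or streaming state is involved, which is why it suits the backfill better.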
The query makes the state grow infinitely. Could you consider applying a
watermark to "receivedAt" to let the unnecessary part of the state be cleared
out? Other than a watermark, you could implement TTL-based eviction via
flatMapGroupsWithState, though you'll need to implement your own custom
"dropDuplicates".
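In code, the watermark suggestion could look like this minimal sketch; the 1-hour delay is an assumed value that should be tuned to how late your data can arrive:

```scala
// With a watermark on the event-time column, dropDuplicates can discard
// state older than the watermark instead of keeping it forever.
// "1 hour" is an assumed allowed lateness, not a recommendation.
val deduped = df
  .withWatermark("receivedAt", "1 hour")
  .dropDuplicates("Id", "receivedAt")
```

Note that receivedAt must be a timestamp column for the watermark to apply, and duplicates arriving later than the watermark delay will no longer be caught.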
March 1, 2019
Use https://github.com/chermenin/spark-states instead
On Sun, Mar 10, 2019 at 8:51 PM Arun Mahadevan wrote:
Read the link carefully,
This solution is available (*only*) in Databricks Runtime.
You can enable RocksDB-based state management by setting the following
configuration in the SparkSession before starting the streaming query.
spark.conf.set(
  "spark.sql.streaming.stateStore.providerClass",
  "co
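As a hedged example, wiring in the open-source RocksDB provider mentioned above (chermenin/spark-states) would look roughly like this; the exact provider class name is an assumption taken from that library's documentation and should be verified against the version you depend on:

```scala
// Point Structured Streaming at a custom state store implementation.
// The class name below is assumed from the chermenin/spark-states project;
// check it against the library version you actually use.
spark.conf.set(
  "spark.sql.streaming.stateStore.providerClass",
  "ru.chermenin.spark.sql.execution.streaming.state.RocksDbStateStoreProvider")
```

This keeps large deduplication state on disk via RocksDB rather than in executor memory, which is the main appeal over the default HDFS-backed in-memory provider.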
Hi,
I have a very simple SSS pipeline which does:

val query = df
  .dropDuplicates(Array("Id", "receivedAt"))
  .withColumn(timePartitionCol, timestamp_udfnc(col("receivedAt")))
  .writeStream
  .format("parquet")
  .partitionBy("availabilityDomain", timePartitionCol)
  .trigger(Trigger.Processin
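For reference, a self-contained sketch of how the truncated query above might continue, with the watermark suggestion applied; the watermark delay, trigger interval, checkpoint location, and output path are all assumptions:

```scala
import org.apache.spark.sql.functions.col
import org.apache.spark.sql.streaming.Trigger

// Hedged reconstruction of the pipeline, with an event-time watermark added
// so dropDuplicates state can be evicted. Intervals and paths are placeholders.
val query = df
  .withWatermark("receivedAt", "1 hour")              // assumed allowed lateness
  .dropDuplicates("Id", "receivedAt")
  .withColumn(timePartitionCol, timestamp_udfnc(col("receivedAt")))
  .writeStream
  .format("parquet")
  .partitionBy("availabilityDomain", timePartitionCol)
  .trigger(Trigger.ProcessingTime("1 minute"))        // assumed interval
  .option("checkpointLocation", "/tmp/checkpoints")   // assumed path
  .option("path", "/tmp/output")                      // assumed path
  .start()

query.awaitTermination()
```

The key change versus the original is withWatermark before dropDuplicates; without it, the deduplication state is never purged, which matches the unbounded growth described in the thread.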