Re: [SPARK STRUCTURED STREAMING]: Alternatives to using Foreach sink in pyspark

2017-07-26 Thread Priyank Shrivastava
> On Tue, Jul 25, 2017 at 6:05 PM, Priyank Shrivastava <priy...@asperasoft.com> wrote:
>> I am trying to write key-values to Redis using a DataStreamWriter object using PySpark Structured Streaming APIs. I am using Spark 2.2.
>> Since the Foreach Sink

Re: [SPARK STRUCTURED STREAMING]: Alternatives to using Foreach sink in pyspark

2017-07-28 Thread Priyank Shrivastava
…ease-of-use from non-JVM languages.
> On Wed, Jul 26, 2017 at 11:49 AM, Priyank Shrivastava <priy...@asperasoft.com> wrote:
>> Thanks TD. I am going to try the Python-Scala hybrid approach by using Scala only for the custom Redis sink and Python for

Re: [SPARK STRUCTURED STREAMING]: Alternatives to using Foreach sink in pyspark

2017-07-28 Thread Priyank Shrivastava
…across language boundaries.

Python:
    df = spark.readStream...
    # Python logic
    df.createOrReplaceTempView("tmp1")

Scala:
    val df = spark.table("tmp1")
    df.writeStream
      .foreach(...)

> On Fri, Jul 28, 20…

Re: [SPARK STRUCTURED STREAMING]: Alternatives to using Foreach sink in pyspark

2017-07-28 Thread Priyank Shrivastava
Also, in your example, doesn't the temp view need to be accessed using the same SparkSession on the Scala side? Since I am not using a notebook, how can I get access to the same SparkSession in Scala?
> On Fri, Jul 28, 2017 at 3:17 PM, Priyank Shrivastava <priy...@asperasoft.com> wrote:
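For reference, one way around the same-session requirement is a global temp view, which is scoped to the whole Spark application rather than to a single SparkSession. A minimal Python sketch (batch data for brevity; view names are placeholders), with the Scala-side lookup shown as a comment:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("tempview-sketch").getOrCreate()
    df = spark.range(10)

    df.createOrReplaceTempView("tmp_local")  # visible only through `spark`
    df.createGlobalTempView("tmp_global")    # visible to every session in the app

    other = spark.newSession()
    other.table("global_temp.tmp_global").show()  # resolves across sessions
    # other.table("tmp_local")                    # would fail: session-scoped
    # Scala side, same application: spark.table("global_temp.tmp_global")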

[Structured Streaming] Recommended way of joining streams

2017-08-09 Thread Priyank Shrivastava
I have streams of data coming in from various applications through Kafka. These streams are converted into dataframes in Spark. I would like to join these dataframes on a common ID they all contain. Since joining streaming dataframes is currently not supported, what is the current recommended
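For later readers: Spark 2.3 added stream-stream joins, so DataFrames built from separate Kafka streams can be joined directly on the shared ID. A minimal sketch using the built-in rate source so it runs standalone; column names are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("stream-join-sketch").getOrCreate()

    left = (spark.readStream.format("rate").load()
            .selectExpr("value AS id", "timestamp AS left_ts"))
    right = (spark.readStream.format("rate").load()
             .selectExpr("value AS id", "timestamp AS right_ts"))

    # Inner equi-join on the common id column. Without watermarks Spark must
    # buffer all past rows as state, so production jobs should add them.
    joined = left.join(right, "id")

    query = joined.writeStream.format("console").start()
    query.awaitTermination()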

[SPARK STRUCTURED STREAMING]: Alternatives to using Foreach sink in pyspark

2017-07-25 Thread Priyank Shrivastava
I am trying to write key-values to Redis using a DataStreamWriter object using PySpark Structured Streaming APIs. I am using Spark 2.2. Since the Foreach Sink is not supported for Python, I am
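For later readers: Spark 2.4 eventually added a Python foreach sink, which makes the direct approach possible without a Scala hybrid. A minimal sketch, assuming the third-party redis-py package; connection details and column names are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("redis-foreach-sketch").getOrCreate()

    class RedisWriter:
        # Called once per partition per epoch, per the foreach contract.
        def open(self, partition_id, epoch_id):
            import redis  # local import so the class ships cleanly to executors
            self.client = redis.Redis(host="localhost", port=6379)
            return True

        def process(self, row):
            self.client.set(row["key"], row["value"])

        def close(self, error):
            self.client.close()

    df = (spark.readStream.format("rate").load()
          .selectExpr("CAST(value AS STRING) AS key",
                      "CAST(timestamp AS STRING) AS value"))

    query = df.writeStream.foreach(RedisWriter()).start()
    query.awaitTermination()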

[Structured Streaming] Avoiding multiple streaming queries

2018-02-12 Thread Priyank Shrivastava
I have a Structured Streaming query which sinks to Kafka. This query has complex aggregation logic. I would like to sink the output DF of this query to multiple Kafka topics, each partitioned on a different ‘key’ column. I don’t want to have multiple Kafka sinks for each of the different
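One detail that may help: the Kafka sink reads an optional per-row "topic" column, so a single sink can fan out to several topics as long as the "topic" option is left unset. A minimal sketch; the routing rule, column names, and broker address are placeholders:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("multi-topic-sketch").getOrCreate()

    events = (spark.readStream.format("rate").load()
              .selectExpr("value % 2 AS region", "value"))

    out = events.selectExpr(
        # Per-row routing: each row lands in the topic named here.
        "CASE WHEN region = 0 THEN 'topic_a' ELSE 'topic_b' END AS topic",
        "CAST(region AS STRING) AS key",   # Kafka partitioning key
        "CAST(value AS STRING) AS value")

    query = (out.writeStream
             .format("kafka")
             .option("kafka.bootstrap.servers", "localhost:9092")
             .option("checkpointLocation", "/tmp/multi-topic-ckpt")
             .start())
    query.awaitTermination()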

[Structured Streaming] Restarting streaming query on exception/termination

2018-04-20 Thread Priyank Shrivastava
What's the right way of programmatically restarting a Structured Streaming query which has terminated due to an exception? Example code or a reference would be appreciated. Could it be done from within the onQueryTerminated() event handler of the StreamingQueryListener class? Priyank
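One common pattern is a driver-side supervision loop rather than restarting from inside onQueryTerminated, since that callback runs on an internal listener thread. A minimal sketch; the sink, checkpoint path, and retry policy are placeholders:

    import time
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("restart-sketch").getOrCreate()

    def start_query():
        df = spark.readStream.format("rate").load()
        return (df.writeStream
                .format("console")
                .option("checkpointLocation", "/tmp/restart-ckpt")
                .start())

    while True:
        query = start_query()
        try:
            query.awaitTermination()  # raises on query failure
            break                     # clean stop: do not restart
        except Exception as err:
            print(f"Query failed ({err}); restarting in 5s")
            time.sleep(5)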

Re: [Structured Streaming] Restarting streaming query on exception/termination

2018-04-23 Thread Priyank Shrivastava
…, 2018 at 8:52 PM, formice <51296...@qq.com> wrote:
> Standalone: add the config (1) --deploy-mode cluster (2) --supervise.
> Example: spark-submit --master spark://master:7077 --deploy-mode cluster --supervise ..
>
> -- Original Message -----

[Structured Streaming] Application Updates in Production

2018-03-21 Thread Priyank Shrivastava
I am using Structured Streaming with Spark 2.2. We are using Kafka as our source and are using checkpoints for failure recovery and end-to-end exactly-once guarantees. I would like to get some more information on how to handle updates to the application when there is a change in stateful operations
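For context: the checkpoint encodes the query's stateful plan, so the Structured Streaming docs disallow changing stateful operations across restarts from the same checkpoint; the usual approach is to deploy the updated application against a fresh checkpoint directory and choose where to resume in Kafka. A minimal sketch with placeholder paths and topic names:

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("upgrade-sketch").getOrCreate()

    df = (spark.readStream.format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "events")
          .option("startingOffsets", "latest")  # consulted only for a new checkpoint
          .load())

    # v2 uses a NEW checkpoint directory because its stateful logic changed;
    # the old directory remains tied to the old version of the query.
    query = (df.writeStream
             .format("console")
             .option("checkpointLocation", "/tmp/app-ckpt-v2")
             .start())
    query.awaitTermination()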