Re: Spark DataSets and multiple write(.) calls

2018-11-20 Thread Dipl.-Inf. Rico Bergmann
d from the checkpointed state, avoiding recomputation. On Mon, Nov 19, 2018 at 7:51 AM Dipl.-Inf. Rico Bergmann <i...@ricobergmann.de> wrote: Thanks for your advice. But I'm using batch processing. Does anyone have a solution for the batch processing

Spark DataSets and multiple write(.) calls

2018-11-19 Thread Dipl.-Inf. Rico Bergmann
Hi! I have a SparkSQL program with one input and 6 outputs (writes). When executing this program, every call to write(.) executes the plan. My problem is that I want all these writes to happen in parallel (inside one execution plan), because all writes share a common and compute-intensive
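One way to avoid re-running the shared part of the plan for each write (a hedged sketch, not the poster's code; paths and the app name are placeholders) is to persist the common DataFrame once and launch the write jobs from separate threads so Spark can schedule them concurrently:

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

val spark = SparkSession.builder.appName("multi-write").getOrCreate()

// The expensive, shared part of the plan: computed once, reused by all writes.
val common: DataFrame = spark.read.parquet("/data/input")  // placeholder input
common.persist()
common.count()  // materialize the cache before starting the concurrent writes

// Launch the six writes as concurrent jobs; each reads from the cached data
// instead of recomputing the full upstream plan.
val outputs = Seq("/out/a", "/out/b", "/out/c", "/out/d", "/out/e", "/out/f")
val jobs = outputs.map { path =>
  Future { common.write.mode("overwrite").parquet(path) }
}
Await.result(Future.sequence(jobs), Duration.Inf)
common.unpersist()
```

Whether the jobs actually overlap depends on available executors and the scheduler; with the FAIR scheduler enabled the writes can share cluster resources instead of running strictly one after another.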

Re: Spark DataSets and multiple write(.) calls

2018-11-19 Thread Dipl.-Inf. Rico Bergmann
have to make a new connection "per batch" instead of creating one long-lasting connection for the pipeline as such. I.e. you might have to implement some sort of connection pooling yourself, depending on the sink. Regards, Magnus. On Mon, Nov 19,

Spark 2.2.1 Dataframes multiple joins bug?

2020-03-23 Thread Dipl.-Inf. Rico Bergmann
Hi all! Is it possible that Spark creates duplicate rows under certain circumstances when doing multiple joins? What I did:
buse.count
res0: Long = 20554365
buse.alias("buse").join(bdef.alias("bdef"), $"buse._c4" === $"bdef._c4").count
res1: Long = 20554365
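One non-bug explanation, illustrated with toy data (hypothetical, not the poster's tables): an inner join can keep the total count unchanged while still duplicating rows, whenever the rows it drops (unmatched keys) happen to balance the rows it multiplies (keys repeated on the other side):

```scala
import spark.implicits._  // assumes an active SparkSession named `spark`

val left  = Seq((1, "a"), (2, "b")).toDF("k", "l")   // key 2 has no match
val right = Seq((1, "x"), (1, "y")).toDF("k", "r")   // key 1 appears twice

left.count()                   // 2
left.join(right, "k").count()  // also 2: k=2 is dropped, but k=1 is duplicated
```

So an unchanged count after a join does not prove the absence of duplicates; a more direct check is whether the join key is unique on each side, e.g. `bdef.groupBy($"_c4").count().filter($"count" > 1)`.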

Re: Structured Streaming Microbatch Semantics

2021-03-05 Thread Dipl.-Inf. Rico Bergmann
ng on this email's technical content is explicitly disclaimed. The author will in no case be liable for any monetary damages arising from such loss, damage or destruction. On Fri, 5 Mar 2021 at 08:06, Dipl.-Inf. Rico Bergmann <i...@ricobergmann.de> wrote: Hi all! I'm u

Re: Structured Streaming Microbatch Semantics

2021-03-05 Thread Dipl.-Inf. Rico Bergmann
lting HTTP records, maybe consider splitting the pipeline into two parts: - process the trigger event, pull data from HTTP, write to Kafka - perform the structured streaming ingestion. Kind regards. Dipl.-Inf. Rico Bergmann <i...@ricobergmann.de> wrote on Fri, 5 Mar 2021 at 09:06:

Structured Streaming Microbatch Semantics

2021-03-05 Thread Dipl.-Inf. Rico Bergmann
Hi all! I'm using Spark Structured Streaming for a data ingestion pipeline. Basically the pipeline reads events (notifications of newly available data) from a Kafka topic and then queries a REST endpoint to get the real data (within a flatMap). For a single event the pipeline creates a few
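The pipeline described above might look roughly like this (a sketch under assumptions: the broker address, topic name, endpoint URL, and the `parseRecords` helper are all placeholders, not the poster's code):

```scala
import spark.implicits._  // assumes an active SparkSession named `spark`

// Read notification events from Kafka; each message value carries the id of
// a newly available data set.
val events = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "notifications")
  .load()
  .selectExpr("CAST(value AS STRING) AS id")
  .as[String]

// For each event, fetch the real data from the REST endpoint. A single event
// may expand into several records, hence flatMap.
val records = events.flatMap { id =>
  val body = scala.io.Source.fromURL(s"https://api.example.com/data/$id").mkString
  parseRecords(body)  // hypothetical parser returning a Seq of records
}
```

Note that the HTTP call runs on the executors once per event, so each micro-batch's processing time is coupled to the endpoint's latency and availability.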

Updating Broadcast Variable in Spark Streaming 2.4.4

2022-07-22 Thread Dipl.-Inf. Rico Bergmann
Hi folks! I'm trying to implement an update of a broadcast variable in Spark Streaming. The idea is that whenever some configuration value has changed (this is periodically checked by the driver), the existing broadcast variable is unpersisted and then re-broadcast. In a local test setup
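The unpersist-and-rebroadcast pattern described here can be sketched as follows (hedged: `loadConfig` and the `@volatile` holder class are assumptions for illustration, not the poster's implementation):

```scala
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

def loadConfig(): Map[String, String] = sys.env  // placeholder config source

class BroadcastHolder(sc: SparkContext) {
  @volatile private var bc: Broadcast[Map[String, String]] =
    sc.broadcast(loadConfig())

  // Called periodically on the driver.
  def refreshIfChanged(): Unit = {
    val latest = loadConfig()
    if (latest != bc.value) {
      bc.unpersist(blocking = true)  // executors drop the stale copy
      bc = sc.broadcast(latest)      // new value ships with subsequent tasks
    }
  }

  def get: Broadcast[Map[String, String]] = bc
}
```

The crucial detail is to fetch the current broadcast on the driver at the start of each batch (e.g. inside foreachRDD) and reference that local Broadcast in the task closure; capturing the holder itself would pull the non-serializable SparkContext into the tasks, and capturing the original Broadcast once means executors never see the updated value.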

Updating Broadcast Variable in Spark Streaming 2.4.4

2022-09-28 Thread Dipl.-Inf. Rico Bergmann
Hi folks! I'm trying to implement an update of a broadcast variable in Spark Streaming. The idea is that whenever some configuration value has changed (this is periodically checked by the driver), the existing broadcast variable is unpersisted and then re-broadcast. In a local test setup