Hi folks!
I'm trying to implement an update of a broadcast var in Spark Streaming.
The idea is that whenever some configuration value has changed (this is
periodically checked by the driver) the existing broadcast variable is
unpersisted and then (re-)broadcasted.
In a local test setup (usin
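A minimal sketch of that unpersist-and-rebroadcast pattern on the driver
(BroadcastWrapper is an illustrative name, not from the original mail):

import scala.reflect.ClassTag
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

// Driver-side holder for a broadcast variable that can be replaced.
class BroadcastWrapper[T: ClassTag](sc: SparkContext, initial: T) {
  @volatile private var bc: Broadcast[T] = sc.broadcast(initial)

  // Fetch the current broadcast per batch (e.g. inside foreachRDD)
  // so the tasks of the next batch see the new value.
  def broadcast: Broadcast[T] = bc

  // Driver-only: called when the periodic check sees a config change.
  def update(newValue: T): Unit = {
    bc.unpersist(blocking = true) // drop the old copies on the executors
    bc = sc.broadcast(newValue)   // re-broadcast the new value
  }
}

The detail to watch is that executor code must obtain the current
Broadcast each batch (via wrapper.broadcast.value) instead of capturing
one Broadcast instance once for the whole application.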
resulting HTTP
records, maybe consider splitting the pipeline into two parts (see the
sketch below):
- process the trigger event, pull the data via HTTP, write it to Kafka
- perform the structured streaming ingestion
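A rough sketch of that split, assuming a hand-over topic named
"raw-data" (the topic name, fetchFromRest and onTrigger are made-up
placeholders):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Part 1: a plain Kafka producer job, no Spark involved.
val props = new Properties()
props.put("bootstrap.servers", "broker:9092")
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
val producer = new KafkaProducer[String, String](props)

def fetchFromRest(event: String): Seq[String] = ??? // the HTTP pull

// For each trigger event: pull the real data, hand it over to Kafka.
def onTrigger(event: String): Unit =
  fetchFromRest(event).foreach { r =>
    producer.send(new ProducerRecord[String, String]("raw-data", r))
  }

// Part 2 is then an ordinary structured streaming ingestion reading
// from "raw-data" via spark.readStream.format("kafka").

One benefit of this split is that retries and backpressure for the REST
calls can be handled independently of the streaming query.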
Kind regards
Dipl.-Inf. Rico Bergmann <i...@ricobergmann.de> wrote on Fri, 5 Mar 2021 at 09:06:
On Fri, 5 Mar 2021 at 08:06, Dipl.-Inf. Rico Bergmann
<i...@ricobergmann.de> wrote:
Hi all!
I'm using Spark structured streaming for a data ingestion pipeline.
Basically, the pipeline reads events (notifications of newly available
data) from a Kafka topic and then queries a REST endpoint (within a
flatMap) to get the real data.
For one single event the pipeline creates a few t
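That fan-out step might look roughly like this, using mapPartitions so
one (hypothetical) HTTP client is created per partition rather than per
event; fetchRecords stands in for the REST call:

import org.apache.spark.sql.{Dataset, SparkSession}

val spark = SparkSession.builder.appName("ingest").getOrCreate()
import spark.implicits._

// hypothetical: one notification -> the real records behind it
def fetchRecords(client: java.net.http.HttpClient, event: String): Seq[String] = ???

val events: Dataset[String] = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "notifications")
  .load()
  .selectExpr("CAST(value AS STRING)")
  .as[String]

// one event fans out into many data records
val data: Dataset[String] = events.mapPartitions { it =>
  val client = java.net.http.HttpClient.newHttpClient() // one per partition
  it.flatMap(e => fetchRecords(client, e))
}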
Hi all!
Is it possible that, under certain circumstances, Spark creates
duplicate rows when doing multiple joins?
What I did:
buse.count
res0: Long = 20554365
buse.alias("buse").join(bdef.alias("bdef"), $"buse._c4"===$"bdef._c4").count
res1: Long = 20554365
buse.alias("buse").join(bdef.alia
> the checkpointed state, avoiding recomputation.
>
> On Mon, Nov 19, 2018 at 7:51 AM Dipl.-Inf. Rico Bergmann
> <i...@ricobergmann.de> wrote:
>
> Thanks for your advice. But I'm using batch processing. Does
> anyone have a solution for the batch proce
> you have to make a new connection "per
> batch" instead of creating one long-lasting connection for the
> pipeline as such. That is, you might have to implement some sort of
> connection pooling yourself, depending on the sink.
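A minimal sketch of such pooling (ConnectionPool, SinkConnection,
createConnection and stream are made-up names; the shape is one lazily
created connection per executor JVM, reused across batches):

// hypothetical sink connection type and factory
trait SinkConnection { def send(record: String): Unit }
def createConnection(): SinkConnection = ???

// one lazily created connection per executor JVM
object ConnectionPool {
  lazy val connection: SinkConnection = createConnection()
}

// stream: a DStream[String] (assumed)
stream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    val conn = ConnectionPool.connection // reused, not re-created per batch
    records.foreach(conn.send)
    // no close here: the pool owns the connection's lifecycle
  }
}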
>
> Regards,
>
> Magnus
>
>
> On Mon, No
Hi!
I have a SparkSQL program with one input and 6 outputs (writes). When
executing this program, every call to write(...) executes the plan. My
problem is that I want all these writes to happen in parallel (inside
one execution plan), because all writes share a common,
compute-intensive subpart.
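One common workaround, sketched under the assumption that the shared
intermediate result fits in cache: persist the common subplan once,
then launch the six writes from concurrent driver threads so their jobs
run in parallel (input, expensivePart and the output paths are
placeholders):

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// compute and cache the expensive common subplan once
val common = expensivePart(input).cache()
common.count() // materialize before fanning out

// each write is still its own job, but the jobs run concurrently
// and all of them reuse the cached common result
val writes = (1 to 6).map { i =>
  Future { common.write.mode("overwrite").parquet(s"/data/out$i") }
}
Await.result(Future.sequence(writes), Duration.Inf)

This is not one execution plan, but it avoids recomputing the common
subpart and lets the writes overlap.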