Hi folks!
I'm trying to implement an update of a broadcast var in Spark Streaming.
The idea is that whenever some configuration value has changed (this is
periodically checked by the driver) the existing broadcast variable is
unpersisted and then (re-)broadcasted.
In a local test setup (usin
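A minimal sketch of that unpersist-and-rebroadcast pattern on the driver
(BroadcastWrapper is an illustrative name, not from the original mail):

import scala.reflect.ClassTag
import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

// Driver-side holder for a broadcast variable that can be replaced.
class BroadcastWrapper[T: ClassTag](sc: SparkContext, initial: T) {
  @volatile private var bc: Broadcast[T] = sc.broadcast(initial)

  // Fetch the current broadcast per batch (e.g. inside foreachRDD)
  // so the tasks of the next batch see the new value.
  def broadcast: Broadcast[T] = bc

  // Driver-only: called when the periodic check sees a config change.
  def update(newValue: T): Unit = {
    bc.unpersist(blocking = true) // drop the old copies on the executors
    bc = sc.broadcast(newValue)   // re-broadcast the new value
  }
}

The detail to watch is that executor code must obtain the current
Broadcast each batch (via wrapper.broadcast.value) instead of capturing
one Broadcast instance once for the whole application.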
resulting HTTP
records, maybe consider splitting the pipeline into two parts (see the
sketch below):
- process the trigger event, pull the data via HTTP, write it to Kafka
- perform the structured streaming ingestion
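A rough sketch of that split, assuming a hand-over topic named
"raw-data" (the topic name, fetchFromRest and onTrigger are made-up
placeholders):

import java.util.Properties
import org.apache.kafka.clients.producer.{KafkaProducer, ProducerRecord}

// Part 1: a plain Kafka producer job, no Spark involved.
val props = new Properties()
props.put("bootstrap.servers", "broker:9092")
props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer")
props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer")
val producer = new KafkaProducer[String, String](props)

def fetchFromRest(event: String): Seq[String] = ??? // the HTTP pull

// For each trigger event: pull the real data, hand it over to Kafka.
def onTrigger(event: String): Unit =
  fetchFromRest(event).foreach { r =>
    producer.send(new ProducerRecord[String, String]("raw-data", r))
  }

// Part 2 is then an ordinary structured streaming ingestion reading
// from "raw-data" via spark.readStream.format("kafka").

One benefit of this split is that retries and backpressure for the REST
calls can be handled independently of the streaming query.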
Kind regards
Dipl.-Inf. Rico Bergmann <i...@ricobergmann.de> wrote on Fri, 5 Mar 2021 at 09:06:
On Fri, 5 Mar 2021 at 08:06, Dipl.-Inf. Rico Bergmann
<i...@ricobergmann.de> wrote:
Hi all!
I'm using Spark structured streaming for a data ingestion pipeline.
Basically, the pipeline reads events (notifications of newly available
data) from a Kafka topic and then queries a REST endpoint (within a
flatMap) to get the real data.
For one single event the pipeline creates a few t
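That fan-out step might look roughly like this, using mapPartitions so
one (hypothetical) HTTP client is created per partition rather than per
event; fetchRecords stands in for the REST call:

import org.apache.spark.sql.{Dataset, SparkSession}

val spark = SparkSession.builder.appName("ingest").getOrCreate()
import spark.implicits._

// hypothetical: one notification -> the real records behind it
def fetchRecords(client: java.net.http.HttpClient, event: String): Seq[String] = ???

val events: Dataset[String] = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "notifications")
  .load()
  .selectExpr("CAST(value AS STRING)")
  .as[String]

// one event fans out into many data records
val data: Dataset[String] = events.mapPartitions { it =>
  val client = java.net.http.HttpClient.newHttpClient() // one per partition
  it.flatMap(e => fetchRecords(client, e))
}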
Hi all!
Is it possible that, under certain circumstances, Spark creates
duplicate rows when doing multiple joins?
What I did:
buse.count
res0: Long = 20554365
buse.alias("buse").join(bdef.alias("bdef"), $"buse._c4"===$"bdef._c4").count
res1: Long = 20554365
buse.alias("buse").join(bdef.alia
> the checkpointed state, avoiding recomputation.
>
> On Mon, Nov 19, 2018 at 7:51 AM Dipl.-Inf. Rico Bergmann
> <i...@ricobergmann.de> wrote:
>
> Thanks for your advice. But I'm using batch processing. Does
> anyone have a solution for the batch proce
> you have to make a new connection "per
> batch" instead of creating one long-lasting connection for the
> pipeline as such. That is, you might have to implement some sort of
> connection pooling yourself, depending on the sink.
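A minimal sketch of such pooling (ConnectionPool, SinkConnection,
createConnection and stream are made-up names; the shape is one lazily
created connection per executor JVM, reused across batches):

// hypothetical sink connection type and factory
trait SinkConnection { def send(record: String): Unit }
def createConnection(): SinkConnection = ???

// one lazily created connection per executor JVM
object ConnectionPool {
  lazy val connection: SinkConnection = createConnection()
}

// stream: a DStream[String] (assumed)
stream.foreachRDD { rdd =>
  rdd.foreachPartition { records =>
    val conn = ConnectionPool.connection // reused, not re-created per batch
    records.foreach(conn.send)
    // no close here: the pool owns the connection's lifecycle
  }
}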
>
> Regards,
>
> Magnus
>
>
> On Mon, No
Hi!
I have a SparkSQL program with one input and 6 outputs (writes). When
executing this program, every call to write(...) executes the plan. My
problem is that I want all these writes to happen in parallel (inside
one execution plan), because all writes share a common,
compute-intensive subpart.
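One common workaround, sketched under the assumption that the shared
intermediate result fits in cache: persist the common subplan once,
then launch the six writes from concurrent driver threads so their jobs
run in parallel (input, expensivePart and the output paths are
placeholders):

import scala.concurrent.{Await, Future}
import scala.concurrent.ExecutionContext.Implicits.global
import scala.concurrent.duration.Duration

// compute and cache the expensive common subplan once
val common = expensivePart(input).cache()
common.count() // materialize before fanning out

// each write is still its own job, but the jobs run concurrently
// and all of them reuse the cached common result
val writes = (1 to 6).map { i =>
  Future { common.write.mode("overwrite").parquet(s"/data/out$i") }
}
Await.result(Future.sequence(writes), Duration.Inf)

This is not one execution plan, but it avoids recomputing the common
subpart and lets the writes overlap.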