Updating Broadcast Variable in Spark Streaming 2.4.4

2022-09-28 Thread Dipl.-Inf. Rico Bergmann
Hi folks! I'm trying to implement an update of a broadcast var in Spark Streaming. The idea is that whenever some configuration value has changed (this is periodically checked by the driver) the existing broadcast variable is unpersisted and then (re-)broadcasted. In a local test setup (usi

Updating Broadcast Variable in Spark Streaming 2.4.4

2022-07-22 Thread Dipl.-Inf. Rico Bergmann
Hi folks! I'm trying to implement an update of a broadcast var in Spark Streaming. The idea is that whenever some configuration value has changed (this is periodically checked by the driver) the existing broadcast variable is unpersisted and then (re-)broadcasted. In a local test setup (usin
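A minimal sketch of the pattern described above (check on the driver, unpersist, re-broadcast), assuming the Java DStream API of Spark 2.4.x. `RefreshableBroadcast` and the `loader` supplier are hypothetical names used only for illustration, not Spark API. The loader stands in for however the configuration is actually read.

```java
import java.util.Objects;
import java.util.function.Supplier;

import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.broadcast.Broadcast;

/**
 * Hypothetical driver-side holder that re-broadcasts a value whenever the
 * loaded value changes. It is not Serializable and must stay on the driver.
 */
class RefreshableBroadcast<T> {
    private final JavaSparkContext jsc;
    private final Supplier<T> loader; // hypothetical: reads the current config on the driver
    private T current;
    private Broadcast<T> broadcast;

    RefreshableBroadcast(JavaSparkContext jsc, Supplier<T> loader) {
        this.jsc = jsc;
        this.loader = loader;
        this.current = loader.get();
        this.broadcast = jsc.broadcast(current);
    }

    /** Call on the driver once per batch; returns the broadcast to capture in executor closures. */
    synchronized Broadcast<T> get() {
        T latest = loader.get();
        if (!Objects.equals(latest, current)) {
            // Drop the cached copies of the old value on the executors without blocking,
            // then broadcast the new value.
            broadcast.unpersist(false);
            current = latest;
            broadcast = jsc.broadcast(latest);
        }
        return broadcast;
    }
}
```

In a streaming job, `holder.get()` would be called inside the function passed to `transform` or `foreachRDD`, which Spark invokes on the driver once per batch, and only the returned `Broadcast` would be captured by the executor-side lambdas. The old value is unpersisted rather than destroyed, so a task that still uses it can have it re-sent from the driver. If checkpointing is enabled, the function passed to `transform` is serialized as well, so the holder may have to be reached through a static reference instead of being captured.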

Re: Cast int to string not possible?

2022-02-17 Thread Rico Bergmann
shot of the exact CAST statement that you are > using? Did you use the SQL method mentioned by me earlier? > > Regards, > Gourav Sengupta > >> On Thu, Feb 17, 2022 at 12:17 PM Rico Bergmann wrote: >> hi! >> >> Casting another int column that is not a pa

Re: Cast int to string not possible?

2022-02-17 Thread Rico Bergmann
G has never been an issue for >> me. >> >> Can you just help us with the output of : df.printSchema() ? >> >> I prefer to use SQL, and the method I use for casting is: CAST(<> name>> AS STRING) <>. >> >> Regards, >> Gourav >>

Re: Cast int to string not possible?

2022-02-16 Thread Rico Bergmann
the column and the data type of the column. Best, Rico. > On 17.02.2022 at 03:17, Morven Huang wrote: > > Hi Rico, do you have a code snippet? I have no problem casting int to string. > >> On 17 Feb 2022 at 00:26, Rico Bergmann wrote: >> >> Hi! >> >> I am

Cast int to string not possible?

2022-02-16 Thread Rico Bergmann
Hi! I am reading a partitioned DataFrame into Spark using automatic type inference for the partition columns. For one partition column the data contains integers, therefore Spark uses IntegerType for this column. In general, this is supposed to be a StringType column. So I tried to cast this co
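Two possible ways to end up with a string column, sketched in Java under the assumption that the data is stored as Parquet (the thread does not name the format): switch off partition-column type inference, which is controlled by the `spark.sql.sources.partitionColumnTypeInference.enabled` setting, or keep inference and cast the column afterwards. `PartitionColumnAsString` is a hypothetical helper class.

```java
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

import static org.apache.spark.sql.functions.col;

/** Hypothetical helper: two ways to get a partition column typed as string. */
class PartitionColumnAsString {

    /** Option 1: disable partition type inference, so every partition column is read as a string. */
    static Dataset<Row> readWithoutInference(SparkSession spark, String path) {
        // Session-wide setting: it also affects later reads in this session.
        spark.conf().set("spark.sql.sources.partitionColumnTypeInference.enabled", "false");
        return spark.read().parquet(path);
    }

    /** Option 2: keep inference and cast the inferred column back to a string afterwards. */
    static Dataset<Row> castToString(Dataset<Row> df, String column) {
        return df.withColumn(column, col(column).cast("string"));
    }
}
```

Disabling inference affects all partition columns, whereas the cast touches only one. The cast cannot restore the original text, though: a partition value such as `007` may already have been parsed as the integer 7 and would then come back as `"7"`.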

Re: Spark DataFrame CodeGeneration in Java generates Scala specific code?

2021-04-29 Thread Rico Bergmann
Indeed, adding public constructors solved the problem... Thanks a lot! > On 29.04.2021 at 18:53, Rico Bergmann wrote: > > > It didn’t have it. So I added public no-args and all-args constructors. But I > still get the same error > > > >>> On 29.0
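For reference, a bean of the shape that solved the problem: a public, `Serializable` class with public no-argument and all-arguments constructors plus getter/setter pairs, which is the shape Spark's bean encoder (`Encoders.bean(MyPojo.class)`) works with. The fields `key` and `count` are hypothetical, since the original class body is elided in the thread. As a public top-level class it would live in its own file, `MyPojo.java`.

```java
import java.io.Serializable;

/** Hypothetical stand-in for the thread's MyPojo: a public JavaBean. */
public class MyPojo implements Serializable {
    private static final long serialVersionUID = 1L;

    private String key;  // hypothetical field
    private long count;  // hypothetical field

    // Public no-arg constructor: generated code instantiates the bean through it.
    public MyPojo() {
    }

    // Public all-args constructor, convenient for building instances by hand.
    public MyPojo(String key, long count) {
        this.key = key;
        this.count = count;
    }

    public String getKey() { return key; }
    public void setKey(String key) { this.key = key; }

    public long getCount() { return count; }
    public void setCount(long count) { this.count = count; }
}
```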

Re: Spark DataFrame CodeGeneration in Java generates Scala specific code?

2021-04-29 Thread Rico Bergmann
> On Thu, Apr 29, 2021 at 9:55 AM Rico Bergmann wrote: >> Here is the relevant generated code and the Exception stacktrace. >> >> The problem in the generated code is at line 35. >>

Re: Spark DataFrame CodeGeneration in Java generates Scala specific code?

2021-04-29 Thread Rico Bergmann
is looking for > members of a companion object when there is none here. Can you show any more > of the stack trace or generated code? > >> On Thu, Apr 29, 2021 at 7:40 AM Rico Bergmann wrote: >> Hi all! >> >> A simplified code snippet of what my Spark pipe

Spark DataFrame CodeGeneration in Java generates Scala specific code?

2021-04-29 Thread Rico Bergmann
Hi all! A simplified code snippet of what my Spark pipeline written in Java does: public class MyPojo implements Serializable { ... // some fields with Getter and Setter } a custom Aggregator (defined in the Driver class): public static class MyAggregator extends org.apache.spark.sql.expressions

Re: Structured Streaming Microbatch Semantics

2021-03-05 Thread Dipl.-Inf. Rico Bergmann
resulting HTTP records, you might consider splitting the pipeline into two parts: - process the trigger event, pull the data via HTTP, write it to Kafka - perform the Structured Streaming ingestion Kind regards Dipl.-Inf. Rico Bergmann <mailto:i...@ricobergmann.de>> wrote on Fri, 5 March 2021 at 09:06

Re: Structured Streaming Microbatch Semantics

2021-03-05 Thread Dipl.-Inf. Rico Bergmann
On Fri, 5 Mar 2021 at 08:06, Dipl.-Inf. Rico Bergmann <mailto:i...@ricobergmann.de>

Structured Streaming Microbatch Semantics

2021-03-05 Thread Dipl.-Inf. Rico Bergmann
Hi all! I'm using Spark Structured Streaming for a data ingestion pipeline. Basically, the pipeline reads events (notifications of newly available data) from a Kafka topic and then queries a REST endpoint to get the actual data (within a flatMap). For a single event the pipeline creates a few t

Spark 2.2.1 Dataframes multiple joins bug?

2020-03-23 Thread Dipl.-Inf. Rico Bergmann
Hi all! Is it possible that, under certain circumstances, Spark creates duplicate rows when doing multiple joins? What I did: buse.count res0: Long = 20554365 buse.alias("buse").join(bdef.alias("bdef"), $"buse._c4"===$"bdef._c4").count res1: Long = 20554365 buse.alias("buse").join(bdef.alia
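The first join returns exactly `buse.count` rows, which is what an inner equi-join produces when every `_c4` value in `bdef` is unique. If a later join returns more rows, the usual cause is duplicate join keys on the joined side rather than a bug, although the truncated thread does not show which applied here. This plain-Java sketch (an illustration of join cardinality, not Spark code; `JoinCardinality` is a hypothetical name) computes the row count of an inner equi-join as the sum, over matching keys, of left multiplicity times right multiplicity. In Spark itself, duplicate keys could be listed with something like `bdef.groupBy("_c4").count().filter("count > 1")`.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of inner equi-join cardinality (not Spark code). */
class JoinCardinality {

    /** Counts how often each non-null key occurs. */
    static Map<String, Long> keyCounts(List<String> keys) {
        Map<String, Long> counts = new HashMap<>();
        for (String k : keys) {
            if (k != null) {
                counts.merge(k, 1L, Long::sum);
            }
        }
        return counts;
    }

    /**
     * Number of rows an inner equi-join returns: every left row is emitted once
     * per matching right row. Null keys never match, as with SQL equality.
     */
    static long innerJoinCount(List<String> leftKeys, List<String> rightKeys) {
        Map<String, Long> right = keyCounts(rightKeys);
        long rows = 0;
        for (String k : leftKeys) {
            if (k != null) {
                rows += right.getOrDefault(k, 0L);
            }
        }
        return rows;
    }
}
```

With left keys `a, b, c`, unique right keys give 3 rows, while right keys `a, a, b, c` give 4, because both `a` rows on the right match the single `a` on the left.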

Re: Spark DataSets and multiple write(.) calls

2018-11-20 Thread Dipl.-Inf. Rico Bergmann
> the checkpointed state, avoiding recomputation. > > On Mon, Nov 19, 2018 at 7:51 AM Dipl.-Inf. Rico Bergmann > mailto:i...@ricobergmann.de>> wrote: > > Thanks for your advice. But I'm using batch processing. Does > anyone have a solution for the batch proce

Re: Spark DataSets and multiple write(.) calls

2018-11-19 Thread Dipl.-Inf. Rico Bergmann
u have to make a new connection "per > batch" instead of creating one long-lasting connection for the > pipeline as such. I.e. you might have to implement some sort of > connection pooling yourself, depending on the sink. > > Regards, > > Magnus > > > On Mon, No
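Magnus's suggestion of per-batch connections backed by a self-made pool could look roughly like this minimal sketch. `SimplePool` is a hypothetical helper: it is unbounded and does no validation or eviction, so a real sink would also have to deal with broken or stale connections.

```java
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.function.Supplier;

/** Hypothetical minimal connection pool: unbounded, no validation, no eviction. */
class SimplePool<C extends AutoCloseable> implements AutoCloseable {
    private final ConcurrentLinkedQueue<C> idle = new ConcurrentLinkedQueue<>();
    private final Supplier<C> factory; // opens a new connection to the sink

    SimplePool(Supplier<C> factory) {
        this.factory = factory;
    }

    /** Reuses an idle connection if one is available, otherwise opens a new one. */
    C borrow() {
        C c = idle.poll();
        return c != null ? c : factory.get();
    }

    /** Hands a connection back so a later batch or partition can reuse it. */
    void release(C c) {
        idle.offer(c);
    }

    /** Closes all idle connections; borrowed ones are the borrower's responsibility. */
    @Override
    public void close() throws Exception {
        C c;
        while ((c = idle.poll()) != null) {
            c.close();
        }
    }
}
```

On the executors, one such pool per JVM would typically be kept in a static field and used inside `foreachPartition`: borrow a connection at the start of the partition, write the records, and release it in a `finally` block.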

Spark DataSets and multiple write(.) calls

2018-11-19 Thread Dipl.-Inf. Rico Bergmann
Hi! I have a SparkSQL program with one input and 6 outputs (write). When executing this program, every call to write(.) executes the plan. My problem is that I want all these writes to happen in parallel (inside one execution plan), because all writes have a common and compute-intensive subpar
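One common approach for the batch case, sketched here under the assumption that the six writes are independent of each other: persist the shared, compute-intensive intermediate result once, then start the writes from separate driver threads so that Spark can schedule their jobs concurrently. Each write is still its own job rather than part of a single plan, but the shared part is computed only once. `SharedSubplanWriter` and the `writers` callbacks are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Consumer;

import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.storage.StorageLevel;

/** Hypothetical helper: compute a shared subplan once, then run several writes concurrently. */
class SharedSubplanWriter {

    static void writeAll(Dataset<Row> common, List<Consumer<Dataset<Row>>> writers) throws Exception {
        common.persist(StorageLevel.MEMORY_AND_DISK());
        try {
            common.count(); // materialize the cache before the writes start
            ExecutorService pool = Executors.newFixedThreadPool(Math.max(1, writers.size()));
            try {
                List<Future<?>> pending = new ArrayList<>();
                for (Consumer<Dataset<Row>> writer : writers) {
                    // Each submitted write triggers its own Spark job from its own driver thread.
                    pending.add(pool.submit(() -> writer.accept(common)));
                }
                for (Future<?> f : pending) {
                    f.get(); // wait for completion; rethrows a failed write as ExecutionException
                }
            } finally {
                pool.shutdown();
            }
        } finally {
            common.unpersist();
        }
    }
}
```

By default, concurrent jobs within one application are scheduled FIFO; setting `spark.scheduler.mode` to `FAIR` lets them share the executors more evenly.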

Re: Strange codegen error for SortMergeJoin in Spark 2.2.1

2018-06-08 Thread Rico Bergmann
entry with a small program that > can reproduce this problem? > > Best Regards, > Kazuaki Ishizaki > > > > From: Rico Bergmann > To: "user@spark.apache.org" > Date: 2018/06/05 19:58 > Subject: Stran

Strange codegen error for SortMergeJoin in Spark 2.2.1

2018-06-05 Thread Rico Bergmann
Hi! I get a strange error when executing a complex SQL query involving 4 tables that are left-outer-joined: Caused by: org.codehaus.commons.compiler.CompileException: File 'generated.java', Line 37, Column 18: failed to compile: org.codehaus.commons.compiler.CompileException: File 'generated.ja
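Kazuaki Ishizaki's reply asks for a small program that reproduces the problem. Until the bug is fixed, it may help to run the query with whole-stage code generation turned off via `spark.sql.codegen.wholeStage`, assuming the failing class comes from whole-stage codegen; this usually makes the query slower. `CodegenWorkaround` is a hypothetical helper, sketched in Java.

```java
import java.util.List;

import org.apache.spark.sql.Row;
import org.apache.spark.sql.SparkSession;

/** Hypothetical helper: run one query with whole-stage codegen disabled. */
class CodegenWorkaround {
    private static final String KEY = "spark.sql.codegen.wholeStage";

    /** Runs the query with whole-stage codegen off, then restores the previous setting. */
    static List<Row> collectWithoutWholeStageCodegen(SparkSession spark, String sql) {
        String previous = spark.conf().get(KEY, "true");
        spark.conf().set(KEY, "false");
        try {
            // The action must run while the flag is off, because the physical plan is built lazily.
            return spark.sql(sql).collectAsList();
        } finally {
            spark.conf().set(KEY, previous);
        }
    }
}
```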

Problem running Kubernetes example v2.2.0-kubernetes-0.5.0

2018-04-11 Thread Rico Bergmann
Hi! I was trying to get the SparkPi example running using the spark-on-k8s distro from kubespark. But I get the following error: + /sbin/tini -s -- driver [FATAL tini (11)] exec driver failed: No such file or directory Did anyone get the example running on a Kubernetes cluster? Best, Rico. invo