Hi folks!
I'm trying to implement an update of a broadcast var in Spark Streaming.
The idea is that whenever some configuration value has changed (this is
periodically checked by the driver) the existing broadcast variable is
unpersisted and then (re-)broadcast.
In a local test setup (usin…
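
A minimal sketch of the unpersist-and-rebroadcast pattern being described, with loadConfig as a hypothetical placeholder for the periodic config check (tasks must re-fetch the handle each batch, e.g. inside foreachRDD, to see the new value):

import org.apache.spark.SparkContext
import org.apache.spark.broadcast.Broadcast

object ConfigBroadcast {
  @volatile private var handle: Broadcast[Map[String, String]] = _

  def get(sc: SparkContext, loadConfig: () => Map[String, String]): Broadcast[Map[String, String]] = {
    if (handle == null) synchronized {
      if (handle == null) handle = sc.broadcast(loadConfig())
    }
    handle
  }

  // Called on the driver whenever the config value has changed.
  def refresh(sc: SparkContext, newConfig: Map[String, String]): Unit = synchronized {
    if (handle != null) handle.unpersist(blocking = true) // drop stale copies on the executors
    handle = sc.broadcast(newConfig)
  }
}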
…shot of the exact CAST statement that you are
> using? Did you use the SQL method mentioned by me earlier?
>
> Regards,
> Gourav Sengupta
>
>> On Thu, Feb 17, 2022 at 12:17 PM Rico Bergmann wrote:
>> Hi!
>>
>> Casting another int column that is not a partition column…
…STRING has never been an issue for
>> me.
>>
>> Can you just help us with the output of: df.printSchema()?
>>
>> I prefer to use SQL, and the method I use for casting is: CAST(<<column name>> AS STRING) <<column name>>.
>>
>> Regards,
>> Gourav
>>
…the column and the data type of the column.
Best, Rico.
> On 17.02.2022 at 03:17, Morven Huang wrote:
>
> Hi Rico, do you have any code snippet? I have no problem casting int to string.
>
>> On Feb 17, 2022 at 12:26 AM, Rico Bergmann wrote:
>>
>> Hi!
>>
>> I am…
Hi!
I am reading a partitioned DataFrame into Spark using automatic type inference
for the partition columns. For one partition column the data contains an
integer, therefore Spark uses IntegerType for this column. In general this is
supposed to be a StringType column. So I tried to cast this column…
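
A minimal sketch of the two casting routes discussed in this thread (the path and the column name "code" are made up for illustration):

// Hypothetical layout: data partitioned by a column "code" whose values
// look like integers, so inference yields IntegerType.
val df = spark.read.parquet("/data/events")

// DataFrame API route:
val casted = df.withColumn("code", df("code").cast("string"))

// SQL route, along the lines Gourav suggests:
df.createOrReplaceTempView("events")
val castedSql = spark.sql("SELECT *, CAST(code AS STRING) AS code_str FROM events")

Alternatively, setting spark.sql.sources.partitionColumnTypeInference.enabled to false turns off partition-column type inference entirely, in which case partition columns stay StringType in the first place.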
Indeed, adding public constructors solved the problem...
Thanks a lot!
> On 29.04.2021 at 18:53, Rico Bergmann wrote:
>
>
> It didn't have one. So I added public no-args and all-args constructors. But I
> still get the same error.
>
>
>
>>> On 29.0…
> On Thu, Apr 29, 2021 at 9:55 AM Rico Bergmann wrote:
>> Here is the relevant generated code and the exception stack trace.
>>
>> The problem in the generated code is at line 35.
>>
…is looking for
> members of a companion object when there is none here. Can you show any more
> of the stack trace or generated code?
>
>> On Thu, Apr 29, 2021 at 7:40 AM Rico Bergmann wrote:
>> Hi all!
>>
>> A simplified code snippet of what my Spark pipe…
Hi all!
A simplified code snippet of what my Spark pipeline written in Java does:

public class MyPojo implements Serializable {
  ... // some fields with getters and setters
}

and a custom Aggregator (defined in the driver class):

public static class MyAggregator extends
    org.apache.spark.sql.expressions…
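
The eventual fix (see the reply above: adding public constructors solved it) matches how Spark's bean encoder works: it instantiates the class reflectively, so it needs a public no-arg constructor plus bean-style accessors. A sketch of that requirement, shown in Scala against the same Encoders.bean API:

import org.apache.spark.sql.{Encoder, Encoders}

// The bean must expose a public no-arg constructor and getters/setters,
// otherwise the generated deserializer code fails to compile at runtime.
class MyPojo() extends Serializable {
  private var value: Int = 0
  def getValue: Int = value
  def setValue(v: Int): Unit = { value = v }
}

val pojoEncoder: Encoder[MyPojo] = Encoders.bean(classOf[MyPojo])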
…resulting HTTP
records, maybe you consider splitting the pipeline into two parts:
- process trigger event, pull data from HTTP, write to Kafka
- perform structured streaming ingestion
Kind regards
Dipl.-Inf. Rico Bergmann <i...@ricobergmann.de> wrote on Fri, 5 Mar 2021 at 09:06…
On Fri, 5 Mar 2021 at 08:06, Dipl.-Inf. Rico Bergmann
<i...@ricobergmann.de> wrote:
Hi all!
I'm using Spark structured streaming for a data ingestion pipeline.
Basically the pipeline reads events (notifications of new available
data) from a Kafka topic and then queries a REST endpoint to get the
real data (within a flatMap).
For one single event the pipeline creates a few t…
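
A rough sketch of the pipeline shape being described; the topic name and the REST lookup are hypothetical placeholders:

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder.appName("ingest").getOrCreate()
import spark.implicits._

// Hypothetical REST call: one notification fans out into many records.
def fetchFromRest(eventId: String): Seq[String] = ??? // placeholder

val events = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker:9092")
  .option("subscribe", "notifications")
  .load()
  .selectExpr("CAST(value AS STRING) AS eventId")
  .as[String]

// The flatMap mentioned above: each event expands into the fetched records.
val records = events.flatMap(fetchFromRest _)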
Hi all!
Is it possible that Spark creates under certain circumstances duplicate
rows when doing multiple joins?
What I did:
buse.count
res0: Long = 20554365
buse.alias("buse").join(bdef.alias("bdef"), $"buse._c4"===$"bdef._c4").count
res1: Long = 20554365
buse.alias("buse").join(bdef.alia
> …the checkpointed state, avoiding recomputing.
>
> On Mon, Nov 19, 2018 at 7:51 AM Dipl.-Inf. Rico Bergmann
> <i...@ricobergmann.de> wrote:
>
> Thanks for your advice. But I'm using batch processing. Does
> anyone have a solution for the batch processing case?
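
For the batch case, a minimal sketch of the standard options: persist() to reuse a result within one application, or Dataset.checkpoint to cut the lineage via a reliable checkpoint directory (paths here are made up):

spark.sparkContext.setCheckpointDir("/tmp/spark-checkpoints")

val expensive = spark.read.parquet("/data/in").groupBy("key").count()

// Either cache it for reuse within this job ...
val cached = expensive.persist()

// ... or truncate the lineage entirely (checkpoint() is eager by default).
val checkpointed = expensive.checkpoint()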
…you have to make a new connection "per
> batch" instead of creating one long-lasting connection for the
> pipeline as such, i.e. you might have to implement some sort of
> connection pooling by yourself depending on the sink.
>
> Regards,
>
> Magnus
>
>
> On Mon, No…
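
A common shape for the per-partition connection handling Magnus describes. Connection and ConnectionPool are hypothetical stand-ins for whatever client the sink uses (JDBC, HTTP, ...):

import org.apache.spark.sql.Row

// Hypothetical sink client and pool; swap in your real driver.
trait Connection { def send(row: Row): Unit; def close(): Unit }
object ConnectionPool {
  def borrow(): Connection = ???            // e.g. backed by a small per-executor queue
  def release(conn: Connection): Unit = ???
}

// df is the output DataFrame of the pipeline.
df.foreachPartition { (rows: Iterator[Row]) =>
  val conn = ConnectionPool.borrow()        // one connection per partition, reused across rows
  try rows.foreach(conn.send)
  finally ConnectionPool.release(conn)      // hand it back instead of tearing it down
}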
Hi!
I have a SparkSQL program having one input and 6 outputs (writes). When
executing this program, every call to write(...) executes the plan. My
problem is that I want all these writes to happen in parallel (inside
one execution plan), because all writes have a common and compute-
intensive subpart…
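
The usual workaround is to materialize the shared, expensive part once so the six writes do not each recompute it. A sketch, where buildCommonPart and the filters/paths are hypothetical:

import spark.implicits._

val common = buildCommonPart(spark).persist() // or .checkpoint()
common.count() // force materialization once, before the writes

// Each write now reuses the cached result instead of re-running the subplan.
common.filter($"kind" === "a").write.parquet("/out/a")
common.filter($"kind" === "b").write.parquet("/out/b")
// ... four more writes ...

This removes the recomputation, though the writes still run one after another from the driver; to overlap them, each write can be submitted from its own thread (e.g. one Future per write), since Spark's scheduler accepts concurrent jobs.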
…entry with a small program that
> can reproduce this problem?
>
> Best Regards,
> Kazuaki Ishizaki
>
>
>
> From: Rico Bergmann
> To: "user@spark.apache.org"
> Date: 2018/06/05 19:58
> Subject: Stran…
Hi!
I get a strange error when executing a complex SQL query involving 4
tables that are left-outer-joined:

Caused by: org.codehaus.commons.compiler.CompileException: File
'generated.java', Line 37, Column 18: failed to compile:
org.codehaus.commons.compiler.CompileException: File 'generated.ja…
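
For anyone who finds this in the archive: failures inside generated.java can often be worked around (not fixed) by disabling whole-stage code generation, so the query runs on the interpreted path at some performance cost:

// Workaround, not a fix: skip compiling the failing generated code.
spark.conf.set("spark.sql.codegen.wholeStage", "false")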
Hi!
I was trying to get the SparkPi example running using the spark-on-k8s
distro from kubespark. But I get the following error:

+ /sbin/tini -s -- driver
[FATAL tini (11)] exec driver failed: No such file or directory

Did anyone get the example running on a Kubernetes cluster?
Best,
Rico.