I am writing a data-profiling application that needs to iterate over a large
.gz file (imported as a Dataset). Each hashmap maps a row value to the number
of times it occurs in the column. There is one hashmap per column, and all of
them are serialized to JSON at the end.
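The per-column counting structure described above can be sketched in plain Python (the function and variable names here are illustrative, not from the original application; a real Spark job would distribute this, e.g. via a `groupBy`/count per column):

```python
import json
from collections import Counter

def profile_columns(rows, column_names):
    """Build one value-count map per column, then serialize to JSON.

    `rows` is an iterable of tuples (one element per column), so a large
    file can be streamed row by row without loading it fully into memory.
    """
    counters = {name: Counter() for name in column_names}
    for row in rows:
        for name, value in zip(column_names, row):
            counters[name][value] += 1
    # One map per column: {column -> {value -> occurrence count}}
    return json.dumps({name: dict(c) for name, c in counters.items()})

# Example: two columns, three rows
rows = [("a", 1), ("b", 1), ("a", 2)]
print(profile_columns(rows, ["col1", "col2"]))
```

Note that JSON object keys are always strings, so non-string row values (like the integers above) are stringified when the maps are serialized.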
Hi users, I'm Jungtaek Lim, one of the contributors to the streaming part of
Spark. Recently I proposed a new feature: native support for session windows
[1]. While it also tackles the edge case that map/flatMapGroupsWithState
doesn't cover for session windows, its major benefit is mostly better
usability of session windows.
Hi,
Using the master branch, I tried to perform SQL aggregation on batch_df in
foreachBatch. Only the DataFrame API methods work, not Spark SQL queries
against the temp view (registered via createOrReplaceTempView).
Is this supported? I'd really appreciate your help.
Hi all,
I understood from previous threads that the Data Source V2 API will see some
changes in Spark 2.4.0; however, I can't seem to find what those changes
are.
Is there some documentation which summarizes the changes?
The only mention I seem to find is this pull request:
https://github.com/apa
--
Around 500 KB each time I call the function (~150 times)
From: Felix Cheung
Sent: 26 September 2018 14:57
To: Junior Alvarez; user@spark.apache.org
Subject: Re: spark.lapply
It looks like the native R process is terminated by a buffer overflow. Do you
know how much data is involved?