Hi folks,
Does anyone know how long Spark will continue to maintain version 2.4?
Thanks.
--
Best regards,
Netanel Malka.
Hi,
This code hangs indefinitely at the last line (the .map()).
Interestingly, if I run the same code at the beginning of my application
(removing the .write step), it executes as expected. Otherwise, the code
appears further along in my application, which is where it hangs. The
debugging
Currently we have a "Stateful" Spark Structured Streaming job that computes
aggregates for each ID. I need to implement a new requirement: if the
number of incoming messages for a particular ID exceeds a certain value,
then add this ID to a blacklist and remove its state. Going
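In Spark this kind of per-key state is usually handled with mapGroupsWithState / flatMapGroupsWithState, where the handler can call state.remove(). The decision logic itself can be sketched in plain Python (the threshold value, the `update_state` helper, and the dict/set stand-ins for Spark state are all illustrative assumptions, not Spark API):

```python
# Sketch of the per-ID update that a flatMapGroupsWithState handler
# would perform. All names here are illustrative, not Spark API.

THRESHOLD = 100  # assumed blacklist cutoff

def update_state(state, blacklist, msg_id, batch_count):
    """Add batch_count messages for msg_id; blacklist the ID and drop
    its state once the running total exceeds THRESHOLD."""
    if msg_id in blacklist:
        return                 # already blacklisted: ignore further messages
    state[msg_id] = state.get(msg_id, 0) + batch_count
    if state[msg_id] > THRESHOLD:
        blacklist.add(msg_id)
        del state[msg_id]      # analogous to state.remove() in Spark

state, blacklist = {}, set()
update_state(state, blacklist, "id-1", 60)
update_state(state, blacklist, "id-1", 60)  # total 120 > 100 -> blacklisted
```

The key point is that blacklisting and state removal happen in the same handler invocation, so no further state accumulates for that ID.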
BTW, we are seeing this message as well:
"org.apache.kafka.common.KafkaException: Producer closed while send in
progress". I am assuming this happens because of the previous issue
("producer has been closed"), right? Or are they unrelated? Please advise.
Thanks.
On Tue, Nov 10, 2020 at 11:17
Thanks for the reply. We are on Spark 2.4. Is there no way to get this
fixed in Spark 2.4?
On Mon, Nov 2, 2020 at 8:32 PM Jungtaek Lim
wrote:
> Which Spark version do you use? There's a known issue on Kafka producer
> pool in Spark 2.x which was fixed in Spark 3.0, so you may want to check
>
Hello,
I have a use case where I have to stream events from Kafka to a JDBC sink.
Kafka producers write events in bursts of hourly batches.
I started with a structured streaming approach, but it turns out that
Structured Streaming has no built-in JDBC sink. I found an implementation
in Apache Bahir, but
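A common workaround for a missing sink is foreachBatch, which hands each micro-batch DataFrame to an arbitrary writer function (which can then call df.write.jdbc). The sink side of that pattern can be sketched without Spark, using sqlite3 as a stand-in for the JDBC target (the `events` table and `write_batch` name are assumptions for illustration):

```python
import sqlite3

def write_batch(conn, batch_id, rows):
    """What a foreachBatch callback would do: write one micro-batch
    transactionally to the database target."""
    with conn:  # commit on success, roll back on error
        conn.executemany(
            "INSERT INTO events (batch_id, payload) VALUES (?, ?)",
            [(batch_id, payload) for payload in rows],
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (batch_id INTEGER, payload TEXT)")
write_batch(conn, 0, ["a", "b"])  # first micro-batch
write_batch(conn, 1, ["c"])       # second micro-batch
count = conn.execute("SELECT COUNT(*) FROM events").fetchone()[0]
```

Writing each batch in one transaction (and keying rows by batch_id) also gives a hook for idempotent re-writes if a batch is replayed after a failure.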
Hi,
We have many Spark jobs that create multiple small files. We would like to
improve analysts' read performance, so I'm testing for the optimal Parquet
file size.
I've found that the optimal file size should be around 1 GB, and not less
than 128 MB, depending on the size of the data.
I took
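One way to steer output files toward a target size is to repartition (or coalesce) to roughly dataset size divided by target file size before writing. A quick arithmetic sketch, taking the ~1 GB target from the numbers above (the `num_output_files` helper and the 10 GB example dataset are assumptions):

```python
TARGET_FILE_BYTES = 1024 ** 3  # ~1 GB per file, per the guideline above

def num_output_files(dataset_bytes):
    """Partition count that yields output files near the target size."""
    return max(1, round(dataset_bytes / TARGET_FILE_BYTES))

# e.g. a 10 GB dataset would be written as 10 partitions of ~1 GB each
files = num_output_files(10 * 1024 ** 3)
```

The dataset size here would come from the actual on-disk (compressed) size of a sample write, since Parquet compression makes in-memory size a poor predictor.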
Hi,
In Spark I explicitly specify the format of the table to be created:
sqltext = """
CREATE TABLE test.randomDataPy(
ID INT
, CLUSTERED INT
, SCATTERED INT
, RANDOMISED INT
, RANDOM_STRING VARCHAR(50)
, SMALL_VC VARCHAR(50)
, PADDING VARCHAR(4000)
Hi all, I am trying to build a 3.0.1 distribution of Spark 3 using
./dev/make-distribution.sh --name spark3-hive12 --pip --tgz -Phive-1.2
-Phadoop-2.7 -Pyarn
The problem is that Maven can't find the right profiles for Hive, and the
build ends without the Hive jars.
++ /Users/reireirei/spark/spark/build/mvn
I am very new to both Spark and Spark Structured Streaming. I have to
write an application that receives very large CSV files in an HDFS
folder. The app must take each file, and for each row it must read some
rows from a Cassandra database (not many rows will be returned for each
row in the CSV).
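Per-row queries against Cassandra are usually expressed as a join against the Cassandra table (e.g. via the spark-cassandra-connector) rather than one query per CSV row. The shape of that lookup can be sketched in plain Python, with a dict standing in for the Cassandra table (all names and data here are illustrative):

```python
import csv
import io

# Stand-in for the Cassandra table, keyed by the CSV's lookup column.
cassandra_rows = {
    "k1": [("k1", "x")],
    "k2": [("k2", "y"), ("k2", "z")],
}

def enrich(csv_text, table):
    """For each CSV row, fetch the (few) matching rows from the table."""
    out = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        out.extend(table.get(row["key"], []))
    return out

result = enrich("key\nk1\nk2\n", cassandra_rows)
```

In Spark the same shape becomes a join keyed on the CSV's lookup column, which lets the connector batch the reads instead of issuing one round trip per row.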