Yes. Disk spill can be a huge performance hit; with smaller partitions you
may avoid it and possibly complete your job faster. I hope you don't get
OOM.
On Sat, 18 Jan 2020 at 10:06, Arwin Tio wrote:
> Okay! I didn't realize you can pump those partition numbers up that high.
> 15000 partitions
Okay! I didn't realize you can pump those partition numbers up that high. 15000
partitions still failed. I am trying 3 partitions now. There is still some
disk spill but it is not that high.
Thanks,
Arwin
From: Chris Teoh
Sent: January 17, 2020 7:32 PM
To:
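As a rough illustration of the partition-count discussion above, here is a minimal sketch of how one might estimate a partition count from the input size. The helper name `suggest_num_partitions` and the 128 MiB target are assumptions for illustration, not values from the thread; the right target depends on executor memory and the shape of the job.

```python
def suggest_num_partitions(total_bytes, target_partition_bytes=128 * 1024 * 1024):
    """Estimate how many partitions keep each one near a target size.

    Hypothetical helper: the 128 MiB default is an assumed starting point,
    not a recommendation from the thread.
    """
    if total_bytes < 0 or target_partition_bytes <= 0:
        raise ValueError("sizes must be non-negative / positive")
    # Integer ceiling division, so a partial final partition still counts.
    return max(1, -(-total_bytes // target_partition_bytes))


# Possible use with a DataFrame `df` (assumed to exist):
#   df = df.repartition(suggest_num_partitions(input_size_in_bytes))
```

Smaller partitions mean less data per task, which is what reduces the chance of spilling to disk, at the cost of more tasks to schedule.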
I need to extract a value from a PySpark structured streaming Dataframe to
a string variable to check something.
I tried this code.
agentName = kinesisDF.select(kinesisDF.agentName.getItem(0).alias("agentName")).collect()[0][0]
This works on a non-streaming Dataframe only. In a streaming Datafra
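One possible way to read a value out of a streaming DataFrame is `foreachBatch` (available in PySpark 2.4 and later), which hands each micro-batch to a function as an ordinary DataFrame. The sketch below assumes `agentName` is an array column as in the code above; the names `latest_agent_name`, `capture_agent_name`, and `start_capture` are hypothetical, and this is not necessarily the approach the thread settled on.

```python
# Driver-side holder for the most recent value (hypothetical name).
latest_agent_name = {"value": None}


def capture_agent_name(batch_df, batch_id):
    """Called once per micro-batch; batch_df is a non-streaming DataFrame."""
    # Take the first element of the agentName array from at most one row.
    rows = batch_df.selectExpr("agentName[0] AS agentName").take(1)
    if rows:
        latest_agent_name["value"] = rows[0][0]


def start_capture(streaming_df):
    """Start a streaming query that runs capture_agent_name on each batch."""
    return streaming_df.writeStream.foreachBatch(capture_agent_name).start()


# Possible use, assuming kinesisDF is the streaming DataFrame from the question:
#   query = start_capture(kinesisDF)
```

The value only becomes available after at least one non-empty micro-batch has been processed, so any check on it has to tolerate `None` at first.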
Given a session/context, we can get the UI web URL like this:
sparkSession.sparkContext.uiWebUrl
This gives me something like http://node-name.cluster-name:4040. If
opening this from outside the cluster (e.g., my laptop), this redirects
via HTTP 302 to something like
http://node-name.cluster-name:
Hi,
I have a question about designing a monitoring PySpark script for a large
volume of source JSON data coming from more than 100 Kafka topics.
These topics are stored under separate buckets in AWS S3. Each of the
Kafka topics has terabytes of JSON data with respect to the
partition
Sorry, but my original solution is incorrect.
1. Glue Crawlers are not supposed to set the spark.sql.sources.schema.*
properties, but Spark SQL should. The default in Spark 2.4 for
spark.sql.hive.caseSensitiveInferenceMode is INFER_AND_SAVE which means that
Spark infers the schema from the underlyi
This bug happens because the Glue table's SERDEPROPERTIES is missing two
important properties:
spark.sql.sources.schema.numParts
spark.sql.sources.schema.part.0
To solve the problem, I had to add those two properties via the Glue console
(couldn't do it with ALTER TABLE …)
I guess thi