Daniel Haviv wrote
> Hi,
> I'm trying to debug an issue with Spark so I've set log level to DEBUG but
> at the same time I'd like to avoid the httpclient.wire's verbose output by
> setting it to WARN.
>
> I tried the following log4j.properties config but I'm still getting DEBUG
> outputs for
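The config itself was cut off above; a minimal log4j.properties sketch of the described intent (root logger at DEBUG, wire logging quieted). The logger name `httpclient.wire` assumes commons-httpclient; HttpClient 4.x logs under `org.apache.http.wire` instead, so both are shown:

```properties
# Root logger at DEBUG, appender as in Spark's conf/log4j.properties template
log4j.rootCategory=DEBUG, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n

# Quiet only the HTTP wire loggers
log4j.logger.httpclient.wire=WARN
log4j.logger.org.apache.http.wire=WARN
```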
Abdeali Kothari wrote
> I am using Spark 2.3.0 and trying to read a CSV file which has 500
> records.
> When I try to read it, spark says that it has two stages: 10, 11 and then
> they join into stage 12.
What's your CSV size per file? I think the Spark optimizer may put many files
into one task when
Abdeali Kothari wrote
> My entire CSV is less than 20KB.
> By somewhere in between, I do a broadcast join with 3500 records in
> another
> file.
> After the broadcast join I have a lot of processing to do. Overall, the
> time to process a single record goes up-to 5mins on 1 executor
>
> I'm
So can you read the file on the executor side?
I think a file passed with --files my.app.conf would be added to the
classpath, and you can use it directly.
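For illustration, a hedged sketch of the submit command being described; the class name `com.example.MyApp` and jar name are hypothetical placeholders:

```
# my.app.conf is shipped to every executor's working directory,
# and for JVM applications it ends up on the executor classpath
spark-submit \
  --files my.app.conf \
  --class com.example.MyApp \
  my-app.jar
```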
--
Sent from: http://apache-spark-user-list.1001560.n3.nabble.com/
What's your Spark version?
Have you added the Hadoop native library to your path? For example,
"spark.executor.extraJavaOptions -Djava.library.path=/hadoop-native/" in
spark-defaults.conf.
You can't; SparkContext is a singleton object. You have to use the Hadoop
library or an AWS client to read files on S3.
Hi,
I have an application that runs on a Spark 2.4.4 cluster and transforms
two RDDs to DataFrames with `rdd.toDF()`, then outputs them to files.
To optimize slave resource usage, the application executes the jobs in
multiple threads. The code snippet looks like this:
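The original snippet is not included; below is a minimal Python sketch of the multi-threaded submission pattern being described. `save_one` is a hypothetical stand-in for the real per-RDD work (`rdd.toDF()` followed by a write); the point is the threading structure, where both jobs share one SparkContext and run concurrently:

```python
from concurrent.futures import ThreadPoolExecutor

def save_one(name):
    # Stand-in for: df = rdd.toDF(); df.write.parquet(path)
    # In the real application this triggers a Spark job.
    return f"wrote {name}"

# Submit both output jobs at once instead of sequentially,
# so executors freed by one job can be used by the other.
with ThreadPoolExecutor(max_workers=2) as pool:
    futures = [pool.submit(save_one, n) for n in ("rdd_a", "rdd_b")]
    results = [f.result() for f in futures]
```

Spark's scheduler is thread-safe, so submitting actions from multiple driver threads like this is supported; the jobs then compete for the same executor slots.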
And I found that