But I still have one question: I find the task number in the stage is 3.
Where does this 3 come from? How can I increase the parallelism?
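A minimal sketch of the relationship in question, assuming a hypothetical
DataFrame named df (a stage runs one task per partition, so 3 tasks usually
means 3 partitions):

# Sketch: the task count of a stage matches the partition count of its data.
# `df` is a hypothetical DataFrame, not one from this thread.
print(df.rdd.getNumPartitions())   # e.g. 3 -> 3 tasks in the stage
df = df.repartition(48)            # illustrative target partition count
print(df.rdd.getNumPartitions())   # 48 -> 48 tasks in the next stage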
Regards,
Junfeng Chen
On Tue, Apr 10, 2018 at 11:31 AM, Junfeng Chen wrote:
> Yeah, I have increased the executor number and executor cores, and it runs
> normally now.
Hi All,
We are running spark 2.1.1 on Hadoop YARN 2.6.5.
We found the pyspark.daemon process consuming more than 300 GB of memory.
However, according to
https://cwiki.apache.org/confluence/display/SPARK/PySpark+Internals, the
daemon process shouldn't have this problem.
Also, we found the daemon proces
Yeah, I have increased the executor number and executor cores, and it runs
normally now. The HDP Spark 2 has only 2 executors and 1 executor core by
default.
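For reference, a minimal sketch of overriding those defaults when building
the session (the config keys are standard Spark settings; the app name and
the values 8 and 4 are illustrative, not from the thread):

from pyspark.sql import SparkSession

# Sketch: raise the executor count and cores at session creation.
# These keys correspond to the --num-executors / --executor-cores flags.
spark = (SparkSession.builder
         .appName("parallelism-example")          # hypothetical name
         .config("spark.executor.instances", "8") # illustrative value
         .config("spark.executor.cores", "4")     # illustrative value
         .getOrCreate())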
Regards,
Junfeng Chen
On Tue, Apr 10, 2018 at 10:19 AM, Saisai Shao wrote:
> In YARN mode, only two executors are assigned to process the
Hi Han,
You may be seeing the same issue I described here:
https://issues.apache.org/jira/browse/SPARK-22342?focusedCommentId=16411780&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16411780
Do you see "TASK_LOST" in your driver logs? I got past that issue by
updat
>
> In YARN mode, only two executors are assigned to process the tasks; since
> one executor can process only one task at a time, they need 6 min in total.
>
This is not true. You should set --executor-cores/--num-executors to
increase the task parallelism across executors. To be fair, a Spark
application should ha
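A rough sketch of the arithmetic behind that advice, assuming an existing
SparkSession named spark (the config keys are real Spark settings; the
fallback values echo the HDP defaults mentioned earlier in the thread):

# Sketch: total concurrent tasks = executor instances x cores per executor.
num_executors = int(spark.conf.get("spark.executor.instances", "2"))
cores_per_executor = int(spark.conf.get("spark.executor.cores", "1"))
print("Up to {} tasks can run at once".format(num_executors * cores_per_executor))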
I found the potential reason.
In local mode, all tasks in one stage run concurrently, while tasks in
YARN mode run in sequence.
For example, in one stage each task costs 3 min. In local mode they will
run together and cost 3 min in total. In YARN mode, only two executors
are assigned to p
Hi,
what I am curious about is the reassignment of df.
Can you please look at the explain plan of df after the statement df =
df.join(df_t.select("ID"), ["ID"])? And then compare it with the explain
plan of df1 after the statement df1 = df.join(df_t.select("ID"), ["ID"])?
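A sketch of the comparison being asked for, using df and df_t as in the
thread's snippets (the underlying data is not shown here):

# Sketch: print both extended plans and compare them by eye.
df1 = df.join(df_t.select("ID"), ["ID"])
df1.explain(True)   # plan when the result is bound to a new name

df = df.join(df_t.select("ID"), ["ID"])
df.explain(True)    # plan after reassigning df to its own join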
It's late here, but I am ye
Hi Spark Users,
The following code snippet fails with an "attribute missing" error even
though the attribute exists. This bug is triggered by a particular sequence
of "select", "groupBy" and "join". Note that if I take away the "select" in
#line B, the code runs without error. However, the "select
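The snippet itself is cut off above; a hypothetical reconstruction of the
shape of the sequence described (column names and data are invented for
illustration, not taken from the original code):

from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("repro-sketch").getOrCreate()
df = spark.createDataFrame(
    [(1, "a", 10), (1, "b", 20), (2, "c", 30)],
    ["id", "tag", "value"])

selected = df.select("id", "value")                            # line B
grouped = selected.groupBy("id").agg(F.sum("value").alias("total"))
joined = df.join(grouped, "id")                                # join back to df
joined.show()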
Hey Felix,
I've already tried with
.format("memory")
.queryName("tableName")
but it still doesn't work for the second query; it just stalls the
program, waiting for new data for the first query.
Here's my code:
from pyspark.sql import SparkSession
from pyspark.sql.functions import split
sp
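The code above is truncated; for reference, a self-contained sketch of the
memory-sink pattern being discussed (the socket source, host, port, and app
name are hypothetical; "tableName" is the query name used in the thread):

from pyspark.sql import SparkSession
from pyspark.sql.functions import split, explode

spark = SparkSession.builder.appName("memory-sink-sketch").getOrCreate()

# Hypothetical streaming source; the thread's actual source is cut off.
lines = (spark.readStream
         .format("socket")
         .option("host", "localhost")
         .option("port", 9999)
         .load())

words = lines.select(explode(split(lines.value, " ")).alias("word"))

# Memory sink: results become queryable as an in-memory table.
query = (words.writeStream
         .format("memory")
         .queryName("tableName")
         .outputMode("append")
         .start())

# The in-memory table can then be queried with plain SQL:
spark.sql("SELECT * FROM tableName").show()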
Hi Jorn,
I checked the log info of my application:
The ResultStage 3 (Parquet writing) takes a very long time, nearly 300 s,
while the total processing time of this loop is 6 min.
Regards,
Junfeng Chen
On Mon, Apr 9, 2018 at 2:12 PM, Jörn Franke wrote:
> Probably network / shuffling cost? Or broa