[jira] [Commented] (SPARK-12179) Spark SQL get different result with the same code
[ https://issues.apache.org/jira/browse/SPARK-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15670679#comment-15670679 ]

Herman van Hovell commented on SPARK-12179:
-------------------------------------------

[~litao1990] is this still a problem?

> Spark SQL get different result with the same code
> -------------------------------------------------
>
>                 Key: SPARK-12179
>                 URL: https://issues.apache.org/jira/browse/SPARK-12179
>             Project: Spark
>          Issue Type: Bug
>          Components: Spark Core, SQL
>    Affects Versions: 1.3.0, 1.3.1, 1.3.2, 1.4.0, 1.4.1, 1.4.2, 1.5.0, 1.5.1, 1.5.2, 1.5.3
>         Environment: hadoop version: 2.5.0-cdh5.3.2
> spark version: 1.5.3
> run mode: yarn-client
>            Reporter: Tao Li
>            Priority: Critical
>
> I run the sql in yarn-client mode, but get different result each time.
> As you can see the example, I get the different shuffle write with the same shuffle read in two jobs with the same code.
> Some of my spark app runs well, but some always met this problem. And I met this problem on spark 1.3, 1.4 and 1.5 version.
> Can you give me some suggestions about the possible causes or how do I figure out the problem?
> 1. First Run
> Details for Stage 9 (Attempt 0)
> Total Time Across All Tasks: 5.8 min
> Shuffle Read: 24.4 MB / 205399
> Shuffle Write: 6.8 MB / 54934
> 2. Second Run
> Details for Stage 9 (Attempt 0)
> Total Time Across All Tasks: 5.6 min
> Shuffle Read: 24.4 MB / 205399
> Shuffle Write: 6.8 MB / 54905

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-12179) Spark SQL get different result with the same code
[ https://issues.apache.org/jira/browse/SPARK-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060213#comment-15060213 ]

Davies Liu commented on SPARK-12179:
------------------------------------

I think this UDF is not thread-safe: rowNum and comparedColumn are static fields, so they will be updated by multiple threads.
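To make the thread-safety concern concrete, here is a minimal sketch. It is not the reporter's UDF: `SharedStateDemo`, `StaticNumberer`, `InstanceNumberer` and `numbersForA` are hypothetical names invented for illustration. The interleaving of two callers is simulated in a single thread so the outcome is reproducible; with real executor threads the interleaving, and therefore the numbers, could vary from run to run.

```java
import java.util.ArrayList;
import java.util.List;

public class SharedStateDemo {

    interface Numberer {
        int next(String key);
    }

    /** Keeps its state in static fields, so every instance in the JVM shares it. */
    static final class StaticNumberer implements Numberer {
        private static String lastKey;
        private static int rowNum;

        static void reset() {
            lastKey = null;
            rowNum = 0;
        }

        public int next(String key) {
            if (!key.equals(lastKey)) { // key changed: start a new group
                lastKey = key;
                rowNum = 0;
            }
            return ++rowNum;
        }
    }

    /** Same logic, but each instance owns its state. */
    static final class InstanceNumberer implements Numberer {
        private String lastKey;
        private int rowNum;

        public int next(String key) {
            if (!key.equals(lastKey)) {
                lastKey = key;
                rowNum = 0;
            }
            return ++rowNum;
        }
    }

    /**
     * Alternates rows of partitions a and b between two numberers, the way two
     * tasks in one JVM might interleave, and returns the numbers given to a.
     */
    static List<Integer> numbersForA(Numberer forA, Numberer forB, String[] a, String[] b) {
        List<Integer> out = new ArrayList<>();
        for (int i = 0; i < Math.max(a.length, b.length); i++) {
            if (i < a.length) {
                out.add(forA.next(a[i]));
            }
            if (i < b.length) {
                forB.next(b[i]);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String[] a = {"x.com", "x.com", "x.com"};
        String[] b = {"y.com", "y.com", "y.com"};

        StaticNumberer.reset();
        // Shared static state: the other caller keeps resetting the counter.
        System.out.println(numbersForA(new StaticNumberer(), new StaticNumberer(), a, b)); // prints [1, 1, 1]

        // Per-instance state: each caller numbers its own rows.
        System.out.println(numbersForA(new InstanceNumberer(), new InstanceNumberer(), a, b)); // prints [1, 2, 3]
    }
}
```

With shared static state, a filter such as `r<=10` can let through a different set of rows depending on how the callers happen to interleave, which would be consistent with results that change between runs.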
[jira] [Commented] (SPARK-12179) Spark SQL get different result with the same code
[ https://issues.apache.org/jira/browse/SPARK-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15060217#comment-15060217 ]

Davies Liu commented on SPARK-12179:
------------------------------------

Which version of Spark are you using?
[jira] [Commented] (SPARK-12179) Spark SQL get different result with the same code
[ https://issues.apache.org/jira/browse/SPARK-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15048496#comment-15048496 ]

Tao Li commented on SPARK-12179:
--------------------------------

I tried Spark's built-in Row_Number() window function, but the problem is still there.

select $DATE as date, 'main' as type, host, rfhost, rfpv
from (
  select Row_Number() OVER (partition by host ORDER BY host, rfpv desc) r, host, rfhost, rfpv
  from (
    select delhost(t0.host) as host, delhost(t0.rfhost) as rfhost, sum(t0.rfpv) as rfpv
    from (
      select h.host as host, i.rfhost as rfhost, i.rfpv as rfpv
      from (
        select parse_url(ur,'HOST') as host, count(1) as pv
        from custom.web_sogourank_orc_zlib
        where logdate>=$starttime and logdate<=$endtime
        group by parse_url(ur,'HOST')
        order by pv desc
        limit 1
      ) h
      left outer join (
        select parse_url(ur,'HOST') as host, parse_url(rf,'HOST') as rfhost, count(*) as rfpv
        from custom.web_sogourank_orc_zlib
        where logdate>=$starttime and logdate<=$endtime
        group by parse_url(ur,'HOST'), parse_url(rf,'HOST')
      ) i
      on h.host = i.host
    ) t0
    group by delhost(t0.host), delhost(t0.rfhost)
    distribute by host
    sort by host, rfpv desc
  ) t1
) t2
where r<=10
[jira] [Commented] (SPARK-12179) Spark SQL get different result with the same code
[ https://issues.apache.org/jira/browse/SPARK-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047186#comment-15047186 ]

Davies Liu commented on SPARK-12179:
------------------------------------

Could you also test 1.6-RC1?

I'm also wondering: the window function `row_number` has only been available since Spark 1.4, so how can you run this query against 1.3?
[jira] [Commented] (SPARK-12179) Spark SQL get different result with the same code
[ https://issues.apache.org/jira/browse/SPARK-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047193#comment-15047193 ]

Davies Liu commented on SPARK-12179:
------------------------------------

There are two directions for narrowing down the problem:
1) simplify the query until removing anything more makes the problem go away
2) remove the customized configurations (for example, extraJavaOptions) until removing anything more makes the problem go away.

This could be a critical bug; hopefully we can find a way to fix it.
[jira] [Commented] (SPARK-12179) Spark SQL get different result with the same code
[ https://issues.apache.org/jira/browse/SPARK-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047962#comment-15047962 ]

Tao Li commented on SPARK-12179:
--------------------------------

Sorry, row_number is a UDF that I wrote myself, not Spark's built-in function. Do I still need to test it on 1.6-RC1 and 1.3?
[jira] [Commented] (SPARK-12179) Spark SQL get different result with the same code
[ https://issues.apache.org/jira/browse/SPARK-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047938#comment-15047938 ]

Tao Li commented on SPARK-12179:
--------------------------------

OK, I will try that.
[jira] [Commented] (SPARK-12179) Spark SQL get different result with the same code
[ https://issues.apache.org/jira/browse/SPARK-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15047965#comment-15047965 ]

Tao Li commented on SPARK-12179:
--------------------------------

The row_number implementation is as follows:

package UDF;

import org.apache.hadoop.hive.ql.exec.UDF;

public class RowNumber extends UDF {
    private static int MAX_VALUE = 50;
    private static String[] comparedColumn = new String[MAX_VALUE];
    private static int rowNum = 1;

    public int evaluate(Object[] args) {
        String[] columnValue = new String[args.length];
        for (int i = 0; i < args.length; i++) {
            columnValue[i] = (args[i] == null ? "" : args[i].toString());
        }
        if (rowNum == 1) {
            for (int i = 0; i < columnValue.length; i++) {
                comparedColumn[i] = columnValue[i];
            }
        }
        for (int i = 0; i < columnValue.length; i++) {
            if (!comparedColumn[i].equals(columnValue[i])) {
                for (int j = 0; j < columnValue.length; j++) {
                    comparedColumn[j] = columnValue[j];
                }
                rowNum = 1;
                return rowNum++;
            }
        }
        return rowNum++;
    }
}
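For comparison, here is one possible way to express the same numbering logic with per-instance state instead of static fields. This is only a sketch under assumptions, not a confirmed fix from this thread: `RowNumberSketch` is a hypothetical name, the class deliberately does not extend Hive's `UDF` so that it runs standalone, and it assumes that each task would get its own instance. Like the posted UDF, it still relies on rows arriving grouped by the compared columns.

```java
import java.util.Arrays;

/** Hypothetical per-instance variant of the posted row-numbering logic. */
public class RowNumberSketch {
    // Compared column values of the previous row; null before the first row.
    private String[] previous = null;
    // Number handed out to the previous row within the current group.
    private int rowNum = 0;

    public int evaluate(Object... args) {
        // Normalise the compared columns to strings, mapping null to "".
        String[] current = new String[args.length];
        for (int i = 0; i < args.length; i++) {
            current[i] = (args[i] == null) ? "" : args[i].toString();
        }
        // A change in any compared column starts a new group.
        if (!Arrays.equals(previous, current)) {
            previous = current;
            rowNum = 0;
        }
        return ++rowNum;
    }

    public static void main(String[] args) {
        RowNumberSketch rn = new RowNumberSketch();
        String[] hosts = {"a.com", "a.com", "b.com", "b.com"};
        int[] got = new int[hosts.length];
        for (int i = 0; i < hosts.length; i++) {
            got[i] = rn.evaluate(hosts[i]);
        }
        System.out.println(Arrays.toString(got)); // prints [1, 2, 1, 2]
    }
}
```

Because the state lives in the instance, a second `RowNumberSketch` used elsewhere in the same JVM does not disturb the numbering of the first one.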
[jira] [Commented] (SPARK-12179) Spark SQL get different result with the same code
[ https://issues.apache.org/jira/browse/SPARK-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045028#comment-15045028 ]

Sean Owen commented on SPARK-12179:
-----------------------------------

Yes, that's the critical information. I think it's hard to parse and debug this big query. Can you narrow this down to something more reproducible? Is the underlying data changing?
[jira] [Commented] (SPARK-12179) Spark SQL get different result with the same code
[ https://issues.apache.org/jira/browse/SPARK-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045014#comment-15045014 ]

Tao Li commented on SPARK-12179:
--------------------------------

[~sowen] My SQL and command-line parameters are as follows:

DATE=$1
starttime=$DATE"00"
endtime=$DATE"23"
sql="
select $DATE as date, 'main' as type, host, rfhost, rfpv
from (
  select row_number(t1.host) r, host, rfhost, rfpv
  from (
    select delhost(t0.host) as host, delhost(t0.rfhost) as rfhost, sum(t0.rfpv) as rfpv
    from (
      select h.host as host, i.rfhost as rfhost, i.rfpv as rfpv
      from (
        select parse_url(ur,'HOST') as host, count(1) as pv
        from mytable
        where logdate>=$starttime and logdate<=$endtime
        group by parse_url(ur,'HOST')
        order by pv desc
        limit 1
      ) h
      left outer join (
        select parse_url(ur,'HOST') as host, parse_url(rf,'HOST') as rfhost, count(*) as rfpv
        from mytable
        where logdate>=$starttime and logdate<=$endtime
        group by parse_url(ur,'HOST'), parse_url(rf,'HOST')
      ) i
      on h.host = i.host
    ) t0
    group by delhost(t0.host), delhost(t0.rfhost)
    distribute by host
    sort by host, rfpv desc
  ) t1
) t2
where r<=10
"
/opt/spark/bin/spark-sql \
--master yarn-client \
--executor-memory 5G --num-executors 70 --executor-cores 1 --conf spark.yarn.executor.memoryOverhead=2048 --conf spark.executor.extraJavaOptions="-XX:MaxPermSize=256m -XX:+CMSClassUnloadingEnabled -XX:MaxDirectMemorySize=1536m -XX:MaxTenuringThreshold=1 -Xmn100m -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintGCApplicationStoppedTime -XX:+PrintHeapAtGC -XX:+PrintGCApplicationConcurrentTime -Xloggc:gc.log -XX:+HeapDumpOnOutOfMemoryError -XX:+UseConcMarkSweepGC -XX:+CMSParallelRemarkEnabled -XX:+UseCMSCompactAtFullCollection -XX:+UseCMSInitiatingOccupancyOnly -XX:CMSInitiatingOccupancyFraction=10 -XX:+UseCompressedOops" \
--driver-memory 3G --conf spark.driver.maxResultSize=2G --conf spark.driver.extraJavaOptions="-XX:MaxPermSize=256m -XX:+CMSClassUnloadingEnabled -XX:+PrintGC -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -XX:+PrintGCDateStamps -XX:+PrintHeapAtGC -XX:+PrintGCApplicationStoppedTime -XX:+PrintGCApplicationConcurrentTime -Xloggc:gc.log -XX:+HeapDumpOnOutOfMemoryError" \
--conf spark.yarn.am.memory=2G --conf spark.yarn.am.extraJavaOptions="-XX:MaxPermSize=125m -XX:+CMSClassUnloadingEnabled" \
--conf spark.sql.shuffle.partitions=2000 \
--conf spark.executor.userClassPathFirst=true \
-i init.hql -e "${sql}" -S > log.$DATE
[jira] [Commented] (SPARK-12179) Spark SQL get different result with the same code
[ https://issues.apache.org/jira/browse/SPARK-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045053#comment-15045053 ]

Tao Li commented on SPARK-12179:
--------------------------------

The query runs on a Hive table, and the Hive data is not changing. I think many factors could cause this problem, such as:
1. Are there differences between the environments of the Hadoop nodes?
2. Is there a bug in Spark shuffle?
3. Is there a classpath or jar version problem?
4. Is it a Hive compatibility problem?

I think I can make some progress by looking into the "shuffle write" number displayed in the web UI. Why is the shuffle write different? How is the shuffle write number computed? What factors could make the shuffle write differ? I will work on this and try to figure it out.

[~srowen] If you have any ideas or experience with this, please let me know. Thank you very much!
[jira] [Commented] (SPARK-12179) Spark SQL get different result with the same code
[ https://issues.apache.org/jira/browse/SPARK-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045012#comment-15045012 ]

Tao Li commented on SPARK-12179:
--------------------------------

In the same stage, with the same shuffle read, I would expect the same shuffle write, but it differed between the two jobs even though all the tasks succeeded.
[jira] [Commented] (SPARK-12179) Spark SQL get different result with the same code
[ https://issues.apache.org/jira/browse/SPARK-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045015#comment-15045015 ]

Sean Owen commented on SPARK-12179:
-----------------------------------

I don't see that you've shown results at all. What do you expect to see vs what do you see?
[jira] [Commented] (SPARK-12179) Spark SQL get different result with the same code
[ https://issues.apache.org/jira/browse/SPARK-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045010#comment-15045010 ]

Tao Li commented on SPARK-12179:
--------------------------------

[~srowen] In this case there are no task failures, and I set spark.speculation=false, but I still get different query results.
[jira] [Commented] (SPARK-12179) Spark SQL get different result with the same code
[ https://issues.apache.org/jira/browse/SPARK-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15045024#comment-15045024 ]

Tao Li commented on SPARK-12179:
--------------------------------

For example:

1. On the first run, I got:
20151204  main  google    google    1234
20151204  main  facebook  facebook  12345
20151204  main  github    github    123456

2. On the second run, I got:
20151204  main  google    google    1234
20151204  main  facebook  facebook  22345
20151204  main  twitter   twitter   12345

You can see:
1. "google" is the same
2. "facebook" is different
3. the first run has "github" but no "twitter"; the second run has "twitter" but no "github"
[jira] [Commented] (SPARK-12179) Spark SQL get different result with the same code
[ https://issues.apache.org/jira/browse/SPARK-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046240#comment-15046240 ]

Tao Li commented on SPARK-12179:

Yes, my Spark version is 1.5.3-SNAPSHOT, and the issue SPARK-11009 is already fixed in my current Spark version.
[jira] [Commented] (SPARK-12179) Spark SQL get different result with the same code
[ https://issues.apache.org/jira/browse/SPARK-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046243#comment-15046243 ]

Tao Li commented on SPARK-12179:

I see there were some exceptions in my executors' stderr log:

```
15/12/08 00:47:44 INFO broadcast.TorrentBroadcast: Started reading broadcast variable 24
15/12/08 00:47:44 INFO storage.MemoryStore: ensureFreeSpace(45407) called with curMem=417720, maxMem=2893440614
15/12/08 00:47:44 INFO storage.MemoryStore: Block broadcast_24_piece0 stored as bytes in memory (estimated size 44.3 KB, free 2.7 GB)
15/12/08 00:47:44 INFO broadcast.TorrentBroadcast: Reading broadcast variable 24 took 34 ms
15/12/08 00:47:44 INFO storage.MemoryStore: ensureFreeSpace(527088) called with curMem=463127, maxMem=2893440614
15/12/08 00:47:44 INFO storage.MemoryStore: Block broadcast_24 stored as values in memory (estimated size 514.7 KB, free 2.7 GB)
15/12/08 00:47:44 WARN conf.Configuration: org.apache.hadoop.hive.conf.LoopingByteArrayInputStream@661ec4ee:an attempt to override final parameter: mapreduce.reduce.speculative; Ignoring.
15/12/08 00:47:45 INFO metastore.HiveMetaStore: 0: Opening raw store with implemenation class:org.apache.hadoop.hive.metastore.ObjectStore
15/12/08 00:47:45 INFO metastore.ObjectStore: ObjectStore, initialize called
15/12/08 00:47:45 WARN metastore.HiveMetaStore: Retrying creating default database after error: Class org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
javax.jdo.JDOFatalUserException: Class org.datanucleus.api.jdo.JDOPersistenceManagerFactory was not found.
    at javax.jdo.JDOHelper.invokeGetPersistenceManagerFactoryOnImplementation(JDOHelper.java:1175)
    at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:808)
    at javax.jdo.JDOHelper.getPersistenceManagerFactory(JDOHelper.java:701)
    at org.apache.hadoop.hive.metastore.ObjectStore.getPMF(ObjectStore.java:365)
    at org.apache.hadoop.hive.metastore.ObjectStore.getPersistenceManager(ObjectStore.java:394)
    at org.apache.hadoop.hive.metastore.ObjectStore.initialize(ObjectStore.java:291)
    at org.apache.hadoop.hive.metastore.ObjectStore.setConf(ObjectStore.java:258)
    at org.apache.hadoop.util.ReflectionUtils.setConf(ReflectionUtils.java:73)
    at org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:133)
    at org.apache.hadoop.hive.metastore.RawStoreProxy.<init>(RawStoreProxy.java:57)
    at org.apache.hadoop.hive.metastore.RawStoreProxy.getProxy(RawStoreProxy.java:66)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.newRawStore(HiveMetaStore.java:593)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.getMS(HiveMetaStore.java:571)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.createDefaultDB(HiveMetaStore.java:620)
    at org.apache.hadoop.hive.metastore.HiveMetaStore$HMSHandler.init(HiveMetaStore.java:461)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.<init>(RetryingHMSHandler.java:66)
    at org.apache.hadoop.hive.metastore.RetryingHMSHandler.getProxy(RetryingHMSHandler.java:72)
    at org.apache.hadoop.hive.metastore.HiveMetaStore.newRetryingHMSHandler(HiveMetaStore.java:5762)
    at org.apache.hadoop.hive.metastore.HiveMetaStoreClient.<init>(HiveMetaStoreClient.java:199)
    at org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient.<init>(SessionHiveMetaStoreClient.java:74)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:57)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
    at org.apache.hadoop.hive.metastore.MetaStoreUtils.newInstance(MetaStoreUtils.java:1521)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.<init>(RetryingMetaStoreClient.java:86)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:132)
    at org.apache.hadoop.hive.metastore.RetryingMetaStoreClient.getProxy(RetryingMetaStoreClient.java:104)
    at org.apache.hadoop.hive.ql.metadata.Hive.createMetaStoreClient(Hive.java:3005)
    at org.apache.hadoop.hive.ql.metadata.Hive.getMSC(Hive.java:3024)
    at org.apache.hadoop.hive.ql.metadata.Hive.getAllDatabases(Hive.java:1234)
    at org.apache.hadoop.hive.ql.metadata.Hive.reloadFunctions(Hive.java:174)
    at org.apache.hadoop.hive.ql.metadata.Hive.<init>(Hive.java:166)
    at org.apache.hadoop.hive.ql.plan.PlanUtils.configureJobPropertiesForStorageHandler(PlanUtils.java:803)
    at org.apache.hadoop.hive.ql.plan.PlanUtils.configureInputJobPropertiesForStorageHandler(PlanUtils.java:782)
    at
```
[jira] [Commented] (SPARK-12179) Spark SQL get different result with the same code
[ https://issues.apache.org/jira/browse/SPARK-12179?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15046014#comment-15046014 ]

Davies Liu commented on SPARK-12179:

This may be related to https://issues.apache.org/jira/browse/SPARK-11009, but that was fixed in 1.5.2+. Can you really reproduce it on 1.5.2+?