[jira] [Updated] (SPARK-25717) Insert overwrite a recreated external and partitioned table may result in incorrect query results
[ https://issues.apache.org/jira/browse/SPARK-25717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinhua Fu updated SPARK-25717:
------------------------------
Description:

Consider the following scenario:

{code:java}
spark.range(100).createTempView("temp")
(0 until 3).foreach { _ =>
  spark.sql("drop table if exists tableA")
  spark.sql("create table if not exists tableA(a int) partitioned by (p int) location 'file:/e:/study/warehouse/tableA'")
  spark.sql("insert overwrite table tableA partition(p=1) select * from temp")
  spark.sql("select count(1) from tableA where p=1").show
}
{code}

We expect the count to always be 100, but the actual results are as follows:

{code:java}
+--------+
|count(1)|
+--------+
|     100|
+--------+

+--------+
|count(1)|
+--------+
|     200|
+--------+

+--------+
|count(1)|
+--------+
|     300|
+--------+
{code}

When Spark executes an `insert overwrite` command, it first looks up the existing (historical) partitions and then deletes them from the file system. But for a recreated external, partitioned table, the partitions were all removed from the metastore by the `drop table` command while the data files were left in place, so the overwrite finds no partitions to clean up. The historical data is therefore preserved, which leads to incorrect query results.

was:

Consider the following scenario:

{code:java}
spark.range(100).createTempView("temp")
(0 until 3).foreach { _ =>
  spark.sql("drop table if exists tableA")
  spark.sql("create table if not exists tableA(a int) partitioned by (p int) location 'file:/e:/study/warehouse/tableA'")
  spark.sql("insert overwrite table tableA partition(p=1) select * from temp")
  spark.sql("select count(1) from tableA where p=1").show
}
{code}

We expect the count to always be 100, but the actual results are as follows:

{code:java}
+--------+
|count(1)|
+--------+
|     100|
+--------+

+--------+
|count(1)|
+--------+
|     200|
+--------+

+--------+
|count(1)|
+--------+
|     300|
+--------+
{code}

When Spark executes an `insert overwrite` command, it first looks up the existing (historical) partitions and then deletes them from the file system. But for a recreated external, partitioned table, the partitions were all deleted by the `drop table` command. So the historical data is preserved, which leads to incorrect query results.

> Insert overwrite a recreated external and partitioned table may result in incorrect query results
> --------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-25717
>                 URL: https://issues.apache.org/jira/browse/SPARK-25717
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.2
>            Reporter: Jinhua Fu
>            Priority: Major
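For readers hitting this before a fix lands, the sketch below shows one possible workaround under the scenario above; it is not the proposed fix, and the partition location is simply the one from the reproduction. The idea is to delete the stale partition directory yourself before re-running the overwrite, so files left behind by the previous incarnation of the external table cannot survive.

{code:java}
import org.apache.hadoop.fs.Path

// Hypothetical cleanup step, using the partition location from the reproduction above.
val partitionDir = new Path("file:/e:/study/warehouse/tableA/p=1")
val fs = partitionDir.getFileSystem(spark.sparkContext.hadoopConfiguration)
if (fs.exists(partitionDir)) {
  fs.delete(partitionDir, true) // recursively remove the files left by the dropped table
}
spark.sql("insert overwrite table tableA partition(p=1) select * from temp")
spark.sql("select count(1) from tableA where p=1").show // now 100 on every iteration
{code}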
[jira] [Created] (SPARK-25717) Insert overwrite a recreated external and partitioned table may result in incorrect query results
Jinhua Fu created SPARK-25717:
----------------------------------

             Summary: Insert overwrite a recreated external and partitioned table may result in incorrect query results
                 Key: SPARK-25717
                 URL: https://issues.apache.org/jira/browse/SPARK-25717
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.2
            Reporter: Jinhua Fu


Consider the following scenario:

{code:java}
spark.range(100).createTempView("temp")
(0 until 3).foreach { _ =>
  spark.sql("drop table if exists tableA")
  spark.sql("create table if not exists tableA(a int) partitioned by (p int) location 'file:/e:/study/warehouse/tableA'")
  spark.sql("insert overwrite table tableA partition(p=1) select * from temp")
  spark.sql("select count(1) from tableA where p=1").show
}
{code}

We expect the count to always be 100, but the actual results are as follows:

{code:java}
+--------+
|count(1)|
+--------+
|     100|
+--------+

+--------+
|count(1)|
+--------+
|     200|
+--------+

+--------+
|count(1)|
+--------+
|     300|
+--------+
{code}

When Spark executes an `insert overwrite` command, it first looks up the existing (historical) partitions and then deletes them from the file system. But for a recreated external, partitioned table, the partitions were all deleted by the `drop table` command. So the historical data is preserved, which leads to incorrect query results.
[jira] [Created] (SPARK-25701) Supports calculation of table statistics from partition's catalog statistics
Jinhua Fu created SPARK-25701:
----------------------------------

             Summary: Supports calculation of table statistics from partition's catalog statistics
                 Key: SPARK-25701
                 URL: https://issues.apache.org/jira/browse/SPARK-25701
             Project: Spark
          Issue Type: Improvement
          Components: SQL
    Affects Versions: 2.3.2
            Reporter: Jinhua Fu


When obtaining table statistics, if the `totalSize` of the table is not defined, we fall back to HDFS to get the table statistics when `spark.sql.statistics.fallBackToHdfs` is `true`; otherwise the default value (`spark.sql.defaultSizeInBytes`) is used. Fortunately, in most cases the data is written into the table by an insert command, which saves the data size in the metadata, so it is possible to use that metadata to calculate the table statistics.
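To make the proposal concrete, here is a rough sketch (not the actual Spark implementation) of how table statistics could be derived from per-partition catalog statistics. It assumes the per-partition size is recorded under the `totalSize` parameter, as Hive's insert commands do, and it is written as if it ran inside Spark's own sql package (SessionCatalog is not a public API); the database and table names are placeholders.

{code:java}
import org.apache.spark.sql.catalyst.TableIdentifier

val catalog = spark.sessionState.catalog
val partitions = catalog.listPartitions(TableIdentifier("tableA", Some("db")))
// Collect the totalSize recorded for each partition, if any.
val sizes = partitions.map(_.parameters.get("totalSize").map(_.toLong))
// Only trust the metadata when every partition has a recorded size; otherwise
// fall back to HDFS or spark.sql.defaultSizeInBytes as today.
val tableSizeFromMetadata: Option[BigInt] =
  if (sizes.nonEmpty && sizes.forall(_.isDefined)) Some(sizes.flatten.map(BigInt(_)).sum)
  else None
{code}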
[jira] [Updated] (SPARK-25404) Staging path may not be in the expected place when the table path contains the stagingDir string
[ https://issues.apache.org/jira/browse/SPARK-25404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinhua Fu updated SPARK-25404:
------------------------------
Description:

Consider the following scenario:

{code:java}
SET hive.exec.stagingdir=temp;
CREATE TABLE tempTableA(key int) location '/spark/temp/tempTableA';
INSERT OVERWRITE TABLE tempTableA SELECT 1;
{code}

We expect the staging path to be under the table path, such as '/spark/temp/tempTableA/.hive-stagingXXX' (SPARK-20594), but it actually ends up as '/spark/tempXXX'.

I'm not quite sure why the 'if ... else ...' is used when computing the stagingDir, but it may be the cause of this bug.

{code:java}
// SaveAsHiveFile.scala
private def getStagingDir(
    inputPath: Path,
    hadoopConf: Configuration,
    stagingDir: String): Path = {
  ..
  var stagingPathName: String =
    if (inputPathName.indexOf(stagingDir) == -1) {
      new Path(inputPathName, stagingDir).toString
    } else {
      // The 'indexOf' may not get the expected position, and this may be the cause of this bug.
      inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
    }
  ..
}
{code}

was:

Consider the following scenario:

{code:java}
SET hive.exec.stagingdir=temp;
CREATE TABLE tempTableA(key int) location '/spark/temp/tempTableA';
INSERT OVERWRITE TABLE tempTableA SELECT 1;
{code}

We expect the staging path to be under the table path, such as '/spark/temp/tempTableA/.hive-stagingXXX' (SPARK-20594), but it actually ends up as '/spark/tempXXX'.

I'm not quite sure why the 'if ... else ...' is used when computing the stagingDir, but it may be the cause of this bug.

{code:java}
// SaveAsHiveFile.scala
private def getStagingDir(
    inputPath: Path,
    hadoopConf: Configuration,
    stagingDir: String): Path = {
  ..
  var stagingPathName: String =
    if (inputPathName.indexOf(stagingDir) == -1) {
      new Path(inputPathName, stagingDir).toString
    } else {
      // The 'indexOf' may get expect position, and this may be the cause of this bug.
      inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
    }
  ..
}
{code}

> Staging path may not be in the expected place when the table path contains the stagingDir string
> -------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-25404
>                 URL: https://issues.apache.org/jira/browse/SPARK-25404
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.1
>            Reporter: Jinhua Fu
>            Priority: Minor
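The suspicion above can be checked in isolation. The snippet below simply re-runs the published branch on the example paths; it is not a proposed fix. Because the table path "/spark/temp/tempTableA" contains the stagingDir string "temp", the else-branch truncates the path instead of appending the staging directory under the table location.

{code:java}
val inputPathName = "/spark/temp/tempTableA"
val stagingDir = "temp"
val stagingPathName =
  if (inputPathName.indexOf(stagingDir) == -1) {
    new org.apache.hadoop.fs.Path(inputPathName, stagingDir).toString
  } else {
    // indexOf finds "temp" at position 7 (the /spark/temp/ directory), so the
    // substring ends there and the table directory name is cut off.
    inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
  }
println(stagingPathName) // prints "/spark/temp", not "/spark/temp/tempTableA/temp"
{code}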
[jira] [Updated] (SPARK-25404) Staging path may not be in the expected place when the table path contains the stagingDir string
[ https://issues.apache.org/jira/browse/SPARK-25404?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinhua Fu updated SPARK-25404:
------------------------------
Description:

Consider the following scenario:

{code:java}
SET hive.exec.stagingdir=temp;
CREATE TABLE tempTableA(key int) location '/spark/temp/tempTableA';
INSERT OVERWRITE TABLE tempTableA SELECT 1;
{code}

We expect the staging path to be under the table path, such as '/spark/temp/tempTableA/.hive-stagingXXX' (SPARK-20594), but it actually ends up as '/spark/tempXXX'.

I'm not quite sure why the 'if ... else ...' is used when computing the stagingDir, but it may be the cause of this bug.

{code:java}
// SaveAsHiveFile.scala
private def getStagingDir(
    inputPath: Path,
    hadoopConf: Configuration,
    stagingDir: String): Path = {
  ..
  var stagingPathName: String =
    if (inputPathName.indexOf(stagingDir) == -1) {
      new Path(inputPathName, stagingDir).toString
    } else {
      // The 'indexOf' may get expect position, and this may be the cause of this bug.
      inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
    }
  ..
}
{code}

was:

Consider the following scenario:

{code:java}
SET hive.exec.stagingdir=temp;
CREATE TABLE tempTableA(key int) location '/spark/temp/tempTableA';
INSERT OVERWRITE TABLE tempTableA SELECT 1;
{code}

We expect the staging path to be under the table path, such as '/spark/temp/tempTableA/.hive-stagingXXX' (SPARK-20594), but it actually ends up as '/spark/tempXXX'.

I'm not quite sure why the 'if ... else ...' is used when computing the stagingDir, but it may be the cause of this bug.

{code:java}
private def getStagingDir(
    inputPath: Path,
    hadoopConf: Configuration,
    stagingDir: String): Path = {
  ..
  var stagingPathName: String =
    if (inputPathName.indexOf(stagingDir) == -1) {
      new Path(inputPathName, stagingDir).toString
    } else {
      inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
    }
  ..
}
{code}

> Staging path may not be in the expected place when the table path contains the stagingDir string
> -------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-25404
>                 URL: https://issues.apache.org/jira/browse/SPARK-25404
>             Project: Spark
>          Issue Type: Bug
>          Components: SQL
>    Affects Versions: 2.3.1
>            Reporter: Jinhua Fu
>            Priority: Minor
[jira] [Created] (SPARK-25404) Staging path may not be in the expected place when the table path contains the stagingDir string
Jinhua Fu created SPARK-25404:
----------------------------------

             Summary: Staging path may not be in the expected place when the table path contains the stagingDir string
                 Key: SPARK-25404
                 URL: https://issues.apache.org/jira/browse/SPARK-25404
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.3.1
            Reporter: Jinhua Fu


Consider the following scenario:

{code:java}
SET hive.exec.stagingdir=temp;
CREATE TABLE tempTableA(key int) location '/spark/temp/tempTableA';
INSERT OVERWRITE TABLE tempTableA SELECT 1;
{code}

We expect the staging path to be under the table path, such as '/spark/temp/tempTableA/.hive-stagingXXX' (SPARK-20594), but it actually ends up as '/spark/tempXXX'.

I'm not quite sure why the 'if ... else ...' is used when computing the stagingDir, but it may be the cause of this bug.

{code:java}
private def getStagingDir(
    inputPath: Path,
    hadoopConf: Configuration,
    stagingDir: String): Path = {
  ..
  var stagingPathName: String =
    if (inputPathName.indexOf(stagingDir) == -1) {
      new Path(inputPathName, stagingDir).toString
    } else {
      inputPathName.substring(0, inputPathName.indexOf(stagingDir) + stagingDir.length)
    }
  ..
}
{code}
[jira] [Updated] (SPARK-20758) Add Constant propagation optimization
[ https://issues.apache.org/jira/browse/SPARK-20758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinhua Fu updated SPARK-20758:
------------------------------
Issue Type: New JIRA Project  (was: Improvement)

> Add Constant propagation optimization
> --------------------------------------
>
>                 Key: SPARK-20758
>                 URL: https://issues.apache.org/jira/browse/SPARK-20758
>             Project: Spark
>          Issue Type: New JIRA Project
>          Components: SQL
>    Affects Versions: 2.1.1
>            Reporter: Tejas Patil
>            Assignee: Tejas Patil
>            Priority: Minor
>             Fix For: 2.3.0
>
> Constant propagation involves substituting attributes that can be statically evaluated in expressions. It's a pretty common optimization in the compiler world.
> e.g.
> {noformat}
> SELECT * FROM table WHERE i = 5 AND j = i + 3
> {noformat}
> can be re-written as:
> {noformat}
> SELECT * FROM table WHERE i = 5 AND j = 8
> {noformat}
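For readers unfamiliar with the optimization, the toy sketch below illustrates the idea (it is deliberately simplified and is not Catalyst's actual rule): once "i = 5" is known, occurrences of i elsewhere in the conjunction can be replaced by the literal 5 and folded, which is how "i = 5 AND j = i + 3" becomes "i = 5 AND j = 8".

{code:java}
sealed trait Expr
case class Attr(name: String) extends Expr
case class Lit(value: Int) extends Expr
case class Add(left: Expr, right: Expr) extends Expr

// Replace attributes that are known constants and fold additions of literals.
def propagate(e: Expr, consts: Map[String, Int]): Expr = e match {
  case Attr(n) if consts.contains(n) => Lit(consts(n))
  case Add(l, r) =>
    (propagate(l, consts), propagate(r, consts)) match {
      case (Lit(a), Lit(b)) => Lit(a + b)
      case (pl, pr)         => Add(pl, pr)
    }
  case other => other
}

// propagate(Add(Attr("i"), Lit(3)), Map("i" -> 5)) == Lit(8)
{code}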
[jira] [Created] (SPARK-21786) The 'spark.sql.parquet.compression.codec' configuration doesn't take effect on tables with partition field(s)
Jinhua Fu created SPARK-21786:
----------------------------------

             Summary: The 'spark.sql.parquet.compression.codec' configuration doesn't take effect on tables with partition field(s)
                 Key: SPARK-21786
                 URL: https://issues.apache.org/jira/browse/SPARK-21786
             Project: Spark
          Issue Type: Bug
          Components: SQL
    Affects Versions: 2.2.0
            Reporter: Jinhua Fu


For tables created like the one below, 'spark.sql.parquet.compression.codec' doesn't take any effect when inserting data. And because the default compression codec is 'uncompressed', if I want to change the compression codec, I have to change it with 'set parquet.compression='.

In contrast, tables without any partition field work normally with 'spark.sql.parquet.compression.codec', and their default compression codec is 'snappy', but for them 'parquet.compression' seems to no longer take effect.

Should we use the 'spark.sql.parquet.compression.codec' configuration uniformly?

{code:java}
CREATE TABLE Test_Parquet(provincecode int, citycode int, districtcode int)
PARTITIONED BY (p_provincecode int)
STORED AS PARQUET;

INSERT OVERWRITE TABLE Test_Parquet select * from TableB;
{code}
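Until the behaviour is unified, the workaround mentioned above looks roughly like the following; the table names are the ones from the example, and which property should win is exactly what this ticket questions:

{code:java}
// Reported as having no effect for the partitioned table Test_Parquet.
spark.sql("SET spark.sql.parquet.compression.codec=gzip")
// Reported workaround: set the Parquet/Hive property directly before the insert.
spark.sql("SET parquet.compression=gzip")
spark.sql("INSERT OVERWRITE TABLE Test_Parquet SELECT * FROM TableB")
{code}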
[jira] [Created] (SPARK-21135) On the history server page, the duration of incomplete applications should be hidden instead of showing up as 0
Jinhua Fu created SPARK-21135:
----------------------------------

             Summary: On the history server page, the duration of incomplete applications should be hidden instead of showing up as 0
                 Key: SPARK-21135
                 URL: https://issues.apache.org/jira/browse/SPARK-21135
             Project: Spark
          Issue Type: Improvement
          Components: Web UI
    Affects Versions: 2.2.1
            Reporter: Jinhua Fu
            Priority: Minor


On the history server page, the duration of incomplete applications should be hidden instead of showing up as 0.

In addition, an application that aborts abnormally (for example, one killed in the background or whose driver goes down) will always be treated as an incomplete application, and I'm not sure whether this is a problem.
[jira] [Updated] (SPARK-21018) "Completed Jobs" and "Completed Stages" support pagination
[ https://issues.apache.org/jira/browse/SPARK-21018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinhua Fu updated SPARK-21018:
------------------------------
Description:

When using the Thrift server, the number of jobs and stages may be very large, and if they are not paginated, the page will be very long and slow to load, especially when spark.ui.retainedJobs is set to a large value. So I suggest that "Completed Jobs" and "Completed Stages" support pagination.

I'd like to change them to a paged display similar to the tasks table on the "Details for Stage" page.

was:

When using the Thrift server, the number of jobs and stages may be very large, and if they are not paginated, the page will be very long and slow to load, especially when spark.ui.retainedJobs is set to a large value. So I suggest that "Completed Jobs" and "Completed Stages" support pagination.

> "Completed Jobs" and "Completed Stages" support pagination
> -----------------------------------------------------------
>
>                 Key: SPARK-21018
>                 URL: https://issues.apache.org/jira/browse/SPARK-21018
>             Project: Spark
>          Issue Type: Improvement
>          Components: Web UI
>    Affects Versions: 2.0.2
>            Reporter: Jinhua Fu
>            Priority: Minor
>         Attachments: CompletedJobs.png, PagedTasks.png
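For context, the situation that motivates this is easy to reach with a long-running Thrift server and a large retention setting; the values below are only illustrative, not recommendations:

{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("thrift-server-ui-example")
  .config("spark.ui.retainedJobs", "10000")   // illustrative: keeps the Jobs page very long
  .config("spark.ui.retainedStages", "10000") // illustrative: keeps the Stages page very long
  .getOrCreate()
{code}

With settings like these, the "Completed Jobs" and "Completed Stages" tables can grow to thousands of rows on a single page, which is what pagination would address.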
[jira] [Updated] (SPARK-21018) "Completed Jobs" and "Completed Stages" support pagination
[ https://issues.apache.org/jira/browse/SPARK-21018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinhua Fu updated SPARK-21018:
------------------------------
Attachment: PagedTasks.png

> "Completed Jobs" and "Completed Stages" support pagination
> -----------------------------------------------------------
>
>                 Key: SPARK-21018
>                 URL: https://issues.apache.org/jira/browse/SPARK-21018
>             Project: Spark
>          Issue Type: Improvement
>          Components: Web UI
>    Affects Versions: 2.0.2
>            Reporter: Jinhua Fu
>            Priority: Minor
>         Attachments: CompletedJobs.png, PagedTasks.png
[jira] [Updated] (SPARK-21018) "Completed Jobs" and "Completed Stages" support pagination
[ https://issues.apache.org/jira/browse/SPARK-21018?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinhua Fu updated SPARK-21018:
------------------------------
Attachment: CompletedJobs.png

> "Completed Jobs" and "Completed Stages" support pagination
> -----------------------------------------------------------
>
>                 Key: SPARK-21018
>                 URL: https://issues.apache.org/jira/browse/SPARK-21018
>             Project: Spark
>          Issue Type: Improvement
>          Components: Web UI
>    Affects Versions: 2.0.2
>            Reporter: Jinhua Fu
>            Priority: Minor
>         Attachments: CompletedJobs.png
[jira] [Created] (SPARK-21018) "Completed Jobs" and "Completed Stages" support pagination
Jinhua Fu created SPARK-21018:
----------------------------------

             Summary: "Completed Jobs" and "Completed Stages" support pagination
                 Key: SPARK-21018
                 URL: https://issues.apache.org/jira/browse/SPARK-21018
             Project: Spark
          Issue Type: Improvement
          Components: Web UI
    Affects Versions: 2.0.2
            Reporter: Jinhua Fu
            Priority: Minor


When using the Thrift server, the number of jobs and stages may be very large, and if they are not paginated, the page will be very long and slow to load, especially when spark.ui.retainedJobs is set to a large value. So I suggest that "Completed Jobs" and "Completed Stages" support pagination.
[jira] [Commented] (SPARK-20591) Succeeded tasks num not equal in job page and job detail page on spark web ui when speculative task(s) exist
[ https://issues.apache.org/jira/browse/SPARK-20591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15996681#comment-15996681 ]

Jinhua Fu commented on SPARK-20591:
-----------------------------------

Does this need to be fixed, and may I take this PR?

> Succeeded tasks num not equal in job page and job detail page on spark web ui when speculative task(s) exist
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-20591
>                 URL: https://issues.apache.org/jira/browse/SPARK-20591
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 2.0.2
>            Reporter: Jinhua Fu
>            Priority: Minor
>         Attachments: job detail page(stages).png, job page.png
[jira] [Updated] (SPARK-20591) Succeeded tasks num not equal in job page and job detail page on spark web ui when speculative task(s) exist
[ https://issues.apache.org/jira/browse/SPARK-20591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinhua Fu updated SPARK-20591:
------------------------------
Description:

When spark.speculation is enabled and there are speculative tasks, the succeeded-task count on the job page includes the speculative tasks, but the counts on the job detail page (the per-stage table) do not.

When the job page shows more succeeded tasks than total tasks, I suspect some tasks ran slowly and want to know which ones and why, but I have to check every stage to find the speculative tasks, because speculative tasks are not included in each stage's succeeded-task count.

Can this be improved?

Update: two screenshots attached. The succeeded-task count is 557 on the job page but 550 (by summing) on the job detail page (stages); the extra 7 tasks are speculative tasks.

was:

When spark.speculation is enabled and there are speculative tasks, the succeeded-task count on the job page includes the speculative tasks, but the counts on the job detail page (the per-stage table) do not.

When the job page shows more succeeded tasks than total tasks, I suspect some tasks ran slowly and want to know which ones and why, but I have to check every stage to find the speculative tasks, because speculative tasks are not included in each stage's succeeded-task count.

Can this be improved?

> Succeeded tasks num not equal in job page and job detail page on spark web ui when speculative task(s) exist
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-20591
>                 URL: https://issues.apache.org/jira/browse/SPARK-20591
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 2.0.2
>            Reporter: Jinhua Fu
>            Priority: Minor
>         Attachments: job detail page(stages).png, job page.png
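For anyone trying to reproduce the mismatch, the precondition is simply speculative execution; a minimal, illustrative configuration is sketched below (the threshold values are defaults or examples, not recommendations):

{code:java}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder()
  .appName("speculation-ui-example")
  .config("spark.speculation", "true")           // enable speculative execution
  .config("spark.speculation.quantile", "0.75")  // default: fraction of tasks finished before speculating
  .config("spark.speculation.multiplier", "1.5") // default: how much slower a task must be to be speculated
  .getOrCreate()
{code}

Once some speculative attempts succeed, the job page counts both the original and the speculative attempt while the per-stage table counts only one, which produces the 557 vs. 550 difference shown in the screenshots.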
[jira] [Updated] (SPARK-20591) Succeeded tasks num not equal in job page and job detail page on spark web ui when speculative task(s) exist
[ https://issues.apache.org/jira/browse/SPARK-20591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinhua Fu updated SPARK-20591:
------------------------------
Attachment: job detail page(stages).png

> Succeeded tasks num not equal in job page and job detail page on spark web ui when speculative task(s) exist
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-20591
>                 URL: https://issues.apache.org/jira/browse/SPARK-20591
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 2.0.2
>            Reporter: Jinhua Fu
>            Priority: Minor
>         Attachments: job detail page(stages).png, job page.png
[jira] [Updated] (SPARK-20591) Succeeded tasks num not equal in job page and job detail page on spark web ui when speculative task(s) exist
[ https://issues.apache.org/jira/browse/SPARK-20591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinhua Fu updated SPARK-20591:
------------------------------
Attachment: (was: screenshot-1.png)

> Succeeded tasks num not equal in job page and job detail page on spark web ui when speculative task(s) exist
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-20591
>                 URL: https://issues.apache.org/jira/browse/SPARK-20591
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 2.0.2
>            Reporter: Jinhua Fu
>            Priority: Minor
>         Attachments: job detail page(stages).png, job page.png
[jira] [Updated] (SPARK-20591) Succeeded tasks num not equal in job page and job detail page on spark web ui when speculative task(s) exist
[ https://issues.apache.org/jira/browse/SPARK-20591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinhua Fu updated SPARK-20591:
------------------------------
Attachment: screenshot-1.png

> Succeeded tasks num not equal in job page and job detail page on spark web ui when speculative task(s) exist
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-20591
>                 URL: https://issues.apache.org/jira/browse/SPARK-20591
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 2.0.2
>            Reporter: Jinhua Fu
>            Priority: Minor
>         Attachments: job page.png, screenshot-1.png
[jira] [Updated] (SPARK-20591) Succeeded tasks num not equal in job page and job detail page on spark web ui when speculative task(s) exist
[ https://issues.apache.org/jira/browse/SPARK-20591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinhua Fu updated SPARK-20591:
------------------------------
Attachment: job page.png

> Succeeded tasks num not equal in job page and job detail page on spark web ui when speculative task(s) exist
> --------------------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-20591
>                 URL: https://issues.apache.org/jira/browse/SPARK-20591
>             Project: Spark
>          Issue Type: Bug
>          Components: Web UI
>    Affects Versions: 2.0.2
>            Reporter: Jinhua Fu
>            Priority: Minor
>         Attachments: job page.png
[jira] [Created] (SPARK-20591) Succeeded tasks num not equal in job page and job detail page on spark web ui when speculative task(s) exist
Jinhua Fu created SPARK-20591:
----------------------------------

             Summary: Succeeded tasks num not equal in job page and job detail page on spark web ui when speculative task(s) exist
                 Key: SPARK-20591
                 URL: https://issues.apache.org/jira/browse/SPARK-20591
             Project: Spark
          Issue Type: Improvement
          Components: Web UI
    Affects Versions: 2.0.2
            Reporter: Jinhua Fu


When spark.speculation is enabled and there are speculative tasks, the succeeded-task count on the job page includes the speculative tasks, but the counts on the job detail page (the per-stage table) do not.

When the job page shows more succeeded tasks than total tasks, I suspect some tasks ran slowly and want to know which ones and why, but I have to check every stage to find the speculative tasks, because speculative tasks are not included in each stage's succeeded-task count.

Can this be improved?
[jira] [Updated] (SPARK-20150) Add permsize statistics for worker memory which may be very useful for the memory usage assessment
[ https://issues.apache.org/jira/browse/SPARK-20150?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jinhua Fu updated SPARK-20150:
------------------------------
Summary: Add permsize statistics for worker memory which may be very useful for the memory usage assessment  (was: Can the spark add a mechanism for permsize statistics which may be very useful for the memory usage assessment)

> Add permsize statistics for worker memory which may be very useful for the memory usage assessment
> ----------------------------------------------------------------------------------------------------
>
>                 Key: SPARK-20150
>                 URL: https://issues.apache.org/jira/browse/SPARK-20150
>             Project: Spark
>          Issue Type: Wish
>          Components: Web UI
>    Affects Versions: 2.0.2
>            Reporter: Jinhua Fu
[jira] [Created] (SPARK-20150) Can the spark add a mechanism for permsize statistics which may be very useful for the memory usage assessment
Jinhua Fu created SPARK-20150:
----------------------------------

             Summary: Can the spark add a mechanism for permsize statistics which may be very useful for the memory usage assessment
                 Key: SPARK-20150
                 URL: https://issues.apache.org/jira/browse/SPARK-20150
             Project: Spark
          Issue Type: Wish
          Components: Web UI
    Affects Versions: 2.0.2
            Reporter: Jinhua Fu


It seems worker memory is only assigned to the executor heap, which is usually not enough for estimating the whole cluster's memory usage, especially when memory becomes a bottleneck of the cluster. In many cases we found an executor's real memory usage was much larger than its heap size, which forces me to check every application's real memory expenditure.

This could be improved by adding a mechanism for Non-Heap (permsize) statistics, shown only as extra memory usage, which would have no effect on the current worker memory allocation and statistics. The permsize can be obtained easily from the executor Java options.
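As a small illustration of the "obtained easily from the executor Java options" remark, a hypothetical UI-side helper could pull the perm-gen limit out of spark.executor.extraJavaOptions like this (the option string below is an example, not a recommended setting):

{code:java}
// Example value as it might appear in spark.executor.extraJavaOptions.
val executorJavaOpts = "-XX:MaxPermSize=256m -XX:+UseConcMarkSweepGC"

// Extract the MaxPermSize value, if present.
val permSize: Option[String] =
  "-XX:MaxPermSize=(\\S+)".r.findFirstMatchIn(executorJavaOpts).map(_.group(1))

// permSize == Some("256m"); adding this to the executor heap gives a better
// estimate of the executor's real memory footprint than the heap alone.
{code}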
[jira] [Comment Edited] (SPARK-20120) spark-sql CLI support silent mode
[ https://issues.apache.org/jira/browse/SPARK-20120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944635#comment-15944635 ]

Jinhua Fu edited comment on SPARK-20120 at 3/28/17 6:51 AM:
------------------------------------------------------------

Good idea. I agree with you! The "-S" option does not seem to be effective.

was (Author: jinhua fu):
Good idea. I agree with you!

> spark-sql CLI support silent mode
> ----------------------------------
>
>                 Key: SPARK-20120
>                 URL: https://issues.apache.org/jira/browse/SPARK-20120
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Yuming Wang
>
> It is similar to Hive silent mode, just show the query result. see:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli
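For reference, the Hive-style invocation this ticket asks spark-sql to honour looks like the line below (the query is illustrative); per the comment above, passing -S did not actually silence the output at the time of writing:

{noformat}
spark-sql -S -e "SELECT count(*) FROM some_table"
{noformat}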
[jira] [Commented] (SPARK-20120) spark-sql CLI support silent mode
[ https://issues.apache.org/jira/browse/SPARK-20120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15944635#comment-15944635 ]

Jinhua Fu commented on SPARK-20120:
-----------------------------------

Good idea. I agree with you!

> spark-sql CLI support silent mode
> ----------------------------------
>
>                 Key: SPARK-20120
>                 URL: https://issues.apache.org/jira/browse/SPARK-20120
>             Project: Spark
>          Issue Type: Improvement
>          Components: SQL
>    Affects Versions: 2.1.0
>            Reporter: Yuming Wang
>
> It is similar to Hive silent mode, just show the query result. see:
> https://cwiki.apache.org/confluence/display/Hive/LanguageManual+Cli