[jira] [Assigned] (SPARK-31684) Overwrite partition failed with 'WRONG FS' when the target partition does not belong to the same filesystem as the table
[ https://issues.apache.org/jira/browse/SPARK-31684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31684: Assignee: (was: Apache Spark) > Overwrite partition failed with 'WRONG FS' when the target partition does not > belong to the same filesystem as the table > -- > > Key: SPARK-31684 > URL: https://issues.apache.org/jira/browse/SPARK-31684 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0, 3.1.0 >Reporter: Kent Yao >Priority: Blocker > > With https://issues.apache.org/jira/browse/SPARK-18107, we disable the > underlying replace (overwrite) and instead do the delete on the Spark side and only the > copy on the Hive side, to bypass the performance issue - > https://issues.apache.org/jira/browse/HIVE-11940 > > However, if the table location and the partition location do not belong to > the same [[FileSystem]], we should not disable the Hive overwrite. Otherwise, > Hive will use the [[FileSystem]] instance belonging to the table location to > copy files, which will fail [[FileSystem#checkPath]]; > see > https://github.com/apache/hive/blob/rel/release-2.3.7/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L1648-L1659 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
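The check described in the ticket boils down to comparing the Hadoop FileSystem that the table location resolves to with the one the partition location resolves to. A minimal sketch of that comparison, assuming a Hadoop Configuration is at hand; the helper name sameFileSystem and its arguments are illustrative only, not the actual Spark patch:

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.Path

// Illustrative sketch, not the actual fix: check whether the table location and
// the partition location resolve to the same Hadoop FileSystem.
def sameFileSystem(tableLoc: String, partitionLoc: String, hadoopConf: Configuration): Boolean = {
  val tableFs = new Path(tableLoc).getFileSystem(hadoopConf)
  val partitionFs = new Path(partitionLoc).getFileSystem(hadoopConf)
  // Compare the resolved FileSystem URIs (scheme + authority). If they differ,
  // Hive's FileSystem#checkPath rejects the copy with a "Wrong FS" error.
  tableFs.getUri == partitionFs.getUri
}
{code}

Only when both locations resolve to the same FileSystem is it safe to keep the "delete in Spark, copy in Hive" path; otherwise the plain Hive overwrite has to be used so that FileSystem#checkPath does not reject the copy.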
[jira] [Commented] (SPARK-31684) Overwrite partition failed with 'WRONG FS' when the target partition does not belong to the same filesystem as the table
[ https://issues.apache.org/jira/browse/SPARK-31684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105142#comment-17105142 ] Apache Spark commented on SPARK-31684: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/28511 > Overwrite partition failed with 'WRONG FS' when the target partition does not > belong to the same filesystem as the table > -- > > Key: SPARK-31684 > URL: https://issues.apache.org/jira/browse/SPARK-31684 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0, 3.1.0 >Reporter: Kent Yao >Priority: Blocker > > With https://issues.apache.org/jira/browse/SPARK-18107, we disable the > underlying replace (overwrite) and instead do the delete on the Spark side and only the > copy on the Hive side, to bypass the performance issue - > https://issues.apache.org/jira/browse/HIVE-11940 > > However, if the table location and the partition location do not belong to > the same [[FileSystem]], we should not disable the Hive overwrite. Otherwise, > Hive will use the [[FileSystem]] instance belonging to the table location to > copy files, which will fail [[FileSystem#checkPath]]; > see > https://github.com/apache/hive/blob/rel/release-2.3.7/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L1648-L1659 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31684) Overwrite partition failed with 'WRONG FS' when the target partition does not belong to the same filesystem as the table
[ https://issues.apache.org/jira/browse/SPARK-31684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31684: Assignee: Apache Spark > Overwrite partition failed with 'WRONG FS' when the target partition does not > belong to the same filesystem as the table > -- > > Key: SPARK-31684 > URL: https://issues.apache.org/jira/browse/SPARK-31684 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0, 3.1.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Blocker > > With https://issues.apache.org/jira/browse/SPARK-18107, we disable the > underlying replace (overwrite) and instead do the delete on the Spark side and only the > copy on the Hive side, to bypass the performance issue - > https://issues.apache.org/jira/browse/HIVE-11940 > > However, if the table location and the partition location do not belong to > the same [[FileSystem]], we should not disable the Hive overwrite. Otherwise, > Hive will use the [[FileSystem]] instance belonging to the table location to > copy files, which will fail [[FileSystem#checkPath]]; > see > https://github.com/apache/hive/blob/rel/release-2.3.7/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L1648-L1659 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31684) Overwrite partition failed with 'WRONG FS' when the target partition does not belong to the same filesystem as the table
[ https://issues.apache.org/jira/browse/SPARK-31684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-31684: - Description: With https://issues.apache.org/jira/browse/SPARK-18107, we disable the underlying replace (overwrite) and instead do the delete on the Spark side and only the copy on the Hive side, to bypass the performance issue - https://issues.apache.org/jira/browse/HIVE-11940 However, if the table location and the partition location do not belong to the same [[FileSystem]], we should not disable the Hive overwrite. Otherwise, Hive will use the [[FileSystem]] instance belonging to the table location to copy files, which will fail [[FileSystem#checkPath]]; see https://github.com/apache/hive/blob/rel/release-2.3.7/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L1648-L1659 was: With https://issues.apache.org/jira/browse/SPARK-18107, we conditionally disable the underlying replace (overwrite) and instead do the delete on the Spark side and only the copy on the Hive side, to bypass the performance issue - https://issues.apache.org/jira/browse/HIVE-11940 Additionally, if the table location and the partition location do not belong to the same [[FileSystem]], we should not disable the Hive overwrite. Otherwise, Hive will use the [[FileSystem]] instance belonging to the table location to copy files, which will fail [[FileSystem#checkPath]]; see https://github.com/apache/hive/blob/rel/release-2.3.7/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L1648-L1659 > Overwrite partition failed with 'WRONG FS' when the target partition does not > belong to the same filesystem as the table > -- > > Key: SPARK-31684 > URL: https://issues.apache.org/jira/browse/SPARK-31684 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0, 3.1.0 >Reporter: Kent Yao >Priority: Blocker > > With https://issues.apache.org/jira/browse/SPARK-18107, we disable the > underlying replace (overwrite) and instead do the delete on the Spark side and only the > copy on the Hive side, to bypass the performance issue - > https://issues.apache.org/jira/browse/HIVE-11940 > > However, if the table location and the partition location do not belong to > the same [[FileSystem]], we should not disable the Hive overwrite. Otherwise, > Hive will use the [[FileSystem]] instance belonging to the table location to > copy files, which will fail [[FileSystem#checkPath]]; > see > https://github.com/apache/hive/blob/rel/release-2.3.7/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L1648-L1659 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31684) Overwrite partition failed with 'WRONG FS' when the target partition does not belong to the same filesystem as the table
Kent Yao created SPARK-31684: Summary: Overwrite partition failed with 'WRONG FS' when the target partition does not belong to the same filesystem as the table Key: SPARK-31684 URL: https://issues.apache.org/jira/browse/SPARK-31684 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.5, 2.3.4, 2.2.3, 2.1.3, 3.0.0, 3.1.0 Reporter: Kent Yao With https://issues.apache.org/jira/browse/SPARK-18107, we conditionally disable the underlying replace (overwrite) and instead do the delete on the Spark side and only the copy on the Hive side, to bypass the performance issue - https://issues.apache.org/jira/browse/HIVE-11940 Additionally, if the table location and the partition location do not belong to the same [[FileSystem]], we should not disable the Hive overwrite. Otherwise, Hive will use the [[FileSystem]] instance belonging to the table location to copy files, which will fail [[FileSystem#checkPath]]; see https://github.com/apache/hive/blob/rel/release-2.3.7/ql/src/java/org/apache/hadoop/hive/ql/metadata/Hive.java#L1648-L1659 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-30699) GMM blockify input vectors
[ https://issues.apache.org/jira/browse/SPARK-30699?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] zhengruifeng resolved SPARK-30699. -- Fix Version/s: 3.1.0 Resolution: Fixed Issue resolved by pull request 27473 [https://github.com/apache/spark/pull/27473] > GMM blockify input vectors > -- > > Key: SPARK-30699 > URL: https://issues.apache.org/jira/browse/SPARK-30699 > Project: Spark > Issue Type: Sub-task > Components: ML, PySpark >Affects Versions: 3.1.0 >Reporter: zhengruifeng >Assignee: zhengruifeng >Priority: Minor > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31683) Make Prometheus output consistent with DropWizard 4.1 result
[ https://issues.apache.org/jira/browse/SPARK-31683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105058#comment-17105058 ] Apache Spark commented on SPARK-31683: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/28510 > Make Prometheus output consistent with DropWizard 4.1 result > > > Key: SPARK-31683 > URL: https://issues.apache.org/jira/browse/SPARK-31683 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > SPARK-29032 adds Prometheus support. > After that, SPARK-29674 upgraded DropWizard for JDK9+ support and causes > difference in output labels and number of keys. > > This issue aims to fix this inconsistency in Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31683) Make Prometheus output consistent with DropWizard 4.1 result
[ https://issues.apache.org/jira/browse/SPARK-31683?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105056#comment-17105056 ] Apache Spark commented on SPARK-31683: -- User 'dongjoon-hyun' has created a pull request for this issue: https://github.com/apache/spark/pull/28510 > Make Prometheus output consistent with DropWizard 4.1 result > > > Key: SPARK-31683 > URL: https://issues.apache.org/jira/browse/SPARK-31683 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > SPARK-29032 adds Prometheus support. > After that, SPARK-29674 upgraded DropWizard for JDK9+ support and causes > difference in output labels and number of keys. > > This issue aims to fix this inconsistency in Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31683) Make Prometheus output consistent with DropWizard 4.1 result
[ https://issues.apache.org/jira/browse/SPARK-31683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31683: Assignee: Apache Spark > Make Prometheus output consistent with DropWizard 4.1 result > > > Key: SPARK-31683 > URL: https://issues.apache.org/jira/browse/SPARK-31683 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Assignee: Apache Spark >Priority: Major > > SPARK-29032 adds Prometheus support. > After that, SPARK-29674 upgraded DropWizard for JDK9+ support and causes > difference in output labels and number of keys. > > This issue aims to fix this inconsistency in Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31683) Make Prometheus output consistent with DropWizard 4.1 result
[ https://issues.apache.org/jira/browse/SPARK-31683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31683: Assignee: (was: Apache Spark) > Make Prometheus output consistent with DropWizard 4.1 result > > > Key: SPARK-31683 > URL: https://issues.apache.org/jira/browse/SPARK-31683 > Project: Spark > Issue Type: Sub-task > Components: Spark Core >Affects Versions: 3.0.0 >Reporter: Dongjoon Hyun >Priority: Major > > SPARK-29032 adds Prometheus support. > After that, SPARK-29674 upgraded DropWizard for JDK9+ support and causes > difference in output labels and number of keys. > > This issue aims to fix this inconsistency in Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31683) Make Prometheus output consistent with DropWizard 4.1 result
Dongjoon Hyun created SPARK-31683: - Summary: Make Prometheus output consistent with DropWizard 4.1 result Key: SPARK-31683 URL: https://issues.apache.org/jira/browse/SPARK-31683 Project: Spark Issue Type: Sub-task Components: Spark Core Affects Versions: 3.0.0 Reporter: Dongjoon Hyun SPARK-29032 adds Prometheus support. After that, SPARK-29674 upgraded DropWizard for JDK9+ support and causes difference in output labels and number of keys. This issue aims to fix this inconsistency in Spark. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31655) Upgrade snappy to version 1.1.7.5
[ https://issues.apache.org/jira/browse/SPARK-31655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105052#comment-17105052 ] Apache Spark commented on SPARK-31655: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/28509 > Upgrade snappy to version 1.1.7.5 > - > > Key: SPARK-31655 > URL: https://issues.apache.org/jira/browse/SPARK-31655 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.0.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Minor > Fix For: 3.1.0 > > > Upgrade snappy to version 1.1.7.5 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31655) Upgrade snappy to version 1.1.7.5
[ https://issues.apache.org/jira/browse/SPARK-31655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105049#comment-17105049 ] Apache Spark commented on SPARK-31655: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/28508 > Upgrade snappy to version 1.1.7.5 > - > > Key: SPARK-31655 > URL: https://issues.apache.org/jira/browse/SPARK-31655 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.0.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Minor > Fix For: 3.1.0 > > > Upgrade snappy to version 1.1.7.5 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31655) Upgrade snappy to version 1.1.7.5
[ https://issues.apache.org/jira/browse/SPARK-31655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105043#comment-17105043 ] Apache Spark commented on SPARK-31655: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/28507 > Upgrade snappy to version 1.1.7.5 > - > > Key: SPARK-31655 > URL: https://issues.apache.org/jira/browse/SPARK-31655 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.0.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Minor > Fix For: 3.1.0 > > > Upgrade snappy to version 1.1.7.5 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31655) Upgrade snappy to version 1.1.7.5
[ https://issues.apache.org/jira/browse/SPARK-31655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105042#comment-17105042 ] Apache Spark commented on SPARK-31655: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/28507 > Upgrade snappy to version 1.1.7.5 > - > > Key: SPARK-31655 > URL: https://issues.apache.org/jira/browse/SPARK-31655 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.0.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Minor > Fix For: 3.1.0 > > > Upgrade snappy to version 1.1.7.5 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31655) Upgrade snappy to version 1.1.7.5
[ https://issues.apache.org/jira/browse/SPARK-31655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105026#comment-17105026 ] Apache Spark commented on SPARK-31655: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/28506 > Upgrade snappy to version 1.1.7.5 > - > > Key: SPARK-31655 > URL: https://issues.apache.org/jira/browse/SPARK-31655 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.0.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Minor > Fix For: 3.1.0 > > > Upgrade snappy to version 1.1.7.5 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31655) Upgrade snappy to version 1.1.7.5
[ https://issues.apache.org/jira/browse/SPARK-31655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105025#comment-17105025 ] Apache Spark commented on SPARK-31655: -- User 'AngersZh' has created a pull request for this issue: https://github.com/apache/spark/pull/28506 > Upgrade snappy to version 1.1.7.5 > - > > Key: SPARK-31655 > URL: https://issues.apache.org/jira/browse/SPARK-31655 > Project: Spark > Issue Type: Bug > Components: Build >Affects Versions: 3.0.0 >Reporter: angerszhu >Assignee: angerszhu >Priority: Minor > Fix For: 3.1.0 > > > Upgrade snappy to version 1.1.7.5 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31682) Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default
[ https://issues.apache.org/jira/browse/SPARK-31682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wuyi updated SPARK-31682: - Parent: (was: SPARK-30098) Issue Type: Improvement (was: Sub-task) > Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default > - > > Key: SPARK-31682 > URL: https://issues.apache.org/jira/browse/SPARK-31682 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: wuyi >Priority: Major > > According to the latest status of [[DISCUSS] Resolve ambiguous parser rule > between two "create > table"s|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Resolve-ambiguous-parser-rule-between-two-quot-create-table-quot-s-td29051i20.html#a29355], > there might be a choice to turn this conf on by default to unblock the Spark > 3.0 release. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31682) Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default
[ https://issues.apache.org/jira/browse/SPARK-31682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] wuyi updated SPARK-31682: - Parent: SPARK-31085 Issue Type: Sub-task (was: Improvement) > Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default > - > > Key: SPARK-31682 > URL: https://issues.apache.org/jira/browse/SPARK-31682 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: wuyi >Priority: Major > > According to the latest status of [[DISCUSS] Resolve ambiguous parser rule > between two "create > table"s|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Resolve-ambiguous-parser-rule-between-two-quot-create-table-quot-s-td29051i20.html#a29355], > there might be a choice to turn this conf on by default to unblock the Spark > 3.0 release. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31682) Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default
[ https://issues.apache.org/jira/browse/SPARK-31682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31682: Assignee: (was: Apache Spark) > Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default > - > > Key: SPARK-31682 > URL: https://issues.apache.org/jira/browse/SPARK-31682 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: wuyi >Priority: Major > > According to the latest status of [[DISCUSS] Resolve ambiguous parser rule > between two "create > table"s|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Resolve-ambiguous-parser-rule-between-two-quot-create-table-quot-s-td29051i20.html#a29355], > there might be a choice to turn this conf on by default to unblock the Spark > 3.0 release. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31682) Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default
[ https://issues.apache.org/jira/browse/SPARK-31682?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31682: Assignee: Apache Spark > Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default > - > > Key: SPARK-31682 > URL: https://issues.apache.org/jira/browse/SPARK-31682 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: wuyi >Assignee: Apache Spark >Priority: Major > > According to the latest status of [[DISCUSS] Resolve ambiguous parser rule > between two "create > table"s|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Resolve-ambiguous-parser-rule-between-two-quot-create-table-quot-s-td29051i20.html#a29355], > there might be a choice to turn this conf on by default to unblock the Spark > 3.0 release. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31682) Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default
[ https://issues.apache.org/jira/browse/SPARK-31682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17105016#comment-17105016 ] Apache Spark commented on SPARK-31682: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/28500 > Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default > - > > Key: SPARK-31682 > URL: https://issues.apache.org/jira/browse/SPARK-31682 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: wuyi >Priority: Major > > According to the latest status of [[DISCUSS] Resolve ambiguous parser rule > between two "create > table"s|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Resolve-ambiguous-parser-rule-between-two-quot-create-table-quot-s-td29051i20.html#a29355], > there might be a choice to turn this conf on by default to unblock the Spark > 3.0 release. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31682) Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default
wuyi created SPARK-31682: Summary: Turn on spark.sql.legacy.createHiveTableByDefault.enabled by default Key: SPARK-31682 URL: https://issues.apache.org/jira/browse/SPARK-31682 Project: Spark Issue Type: Sub-task Components: SQL Affects Versions: 3.0.0 Reporter: wuyi According to the latest status of [[DISCUSS] Resolve ambiguous parser rule between two "create table"s|http://apache-spark-developers-list.1001551.n3.nabble.com/DISCUSS-Resolve-ambiguous-parser-rule-between-two-quot-create-table-quot-s-td29051i20.html#a29355], there might be a choice to turn this conf on by default to unblock the Spark 3.0 release. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
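For context, the conf decides which kind of table a bare CREATE TABLE statement produces. A rough sketch of how it is exercised, assuming a SparkSession named spark and that the conf can be set at the session level (both are assumptions here, not something stated in the ticket):

{code:scala}
// Illustrative only; whether the default should change is exactly what the
// linked [DISCUSS] thread debates.
spark.sql("SET spark.sql.legacy.createHiveTableByDefault.enabled=true")

// With the legacy flag on, a CREATE TABLE without a USING / STORED AS clause is
// treated as a Hive serde table; with it off, it becomes a native data source table.
spark.sql("CREATE TABLE t (id INT, name STRING)")
spark.sql("DESCRIBE EXTENDED t").show(truncate = false)
{code}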
[jira] [Resolved] (SPARK-31393) Show the correct alias in schema for expression
[ https://issues.apache.org/jira/browse/SPARK-31393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon resolved SPARK-31393. -- Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28164 [https://github.com/apache/spark/pull/28164] > Show the correct alias in schema for expression > --- > > Key: SPARK-31393 > URL: https://issues.apache.org/jira/browse/SPARK-31393 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > Fix For: 3.0.0 > > > Some Spark SQL functions implement their alias in an inelegant way. > For example, BitwiseCount overrides the sql method: > {code:java} > override def sql: String = s"bit_count(${child.sql})" > {code} > I don't think this is elegant enough, because `Expression` already gives the following definition: > {code:java} > def sql: String = { > val childrenSQL = children.map(_.sql).mkString(", ") > s"$prettyName($childrenSQL)" > } > {code} > By this definition, BitwiseCount should override the `prettyName` method instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
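The point of the description is that a concrete expression should reuse Expression.sql's generic formatting and only override prettyName. A self-contained mini-model of that pattern; Expr, Col and BitwiseCountLike are made-up stand-ins for the Catalyst classes, just to keep the sketch runnable outside Spark:

{code:scala}
// The base trait builds sql from prettyName, so a concrete function only needs
// to override prettyName to get the right alias.
trait Expr {
  def children: Seq[Expr]
  def prettyName: String = getClass.getSimpleName.toLowerCase
  def sql: String = {
    val childrenSQL = children.map(_.sql).mkString(", ")
    s"$prettyName($childrenSQL)"
  }
}

case class Col(name: String) extends Expr {
  override def children: Seq[Expr] = Nil
  override def sql: String = name
}

// Overriding prettyName (rather than sql) keeps the generic formatting logic.
case class BitwiseCountLike(child: Expr) extends Expr {
  override def children: Seq[Expr] = Seq(child)
  override def prettyName: String = "bit_count"
}

// BitwiseCountLike(Col("a")).sql == "bit_count(a)"
{code}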
[jira] [Assigned] (SPARK-31393) Show the correct alias in schema for expression
[ https://issues.apache.org/jira/browse/SPARK-31393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Hyukjin Kwon reassigned SPARK-31393: Assignee: jiaan.geng > Show the correct alias in schema for expression > --- > > Key: SPARK-31393 > URL: https://issues.apache.org/jira/browse/SPARK-31393 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: jiaan.geng >Assignee: jiaan.geng >Priority: Major > > Some Spark SQL functions implement their alias in an inelegant way. > For example, BitwiseCount overrides the sql method: > {code:java} > override def sql: String = s"bit_count(${child.sql})" > {code} > I don't think this is elegant enough, because `Expression` already gives the following definition: > {code:java} > def sql: String = { > val childrenSQL = children.map(_.sql).mkString(", ") > s"$prettyName($childrenSQL)" > } > {code} > By this definition, BitwiseCount should override the `prettyName` method instead. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31559) AM starts with initial fetched tokens in any attempt
[ https://issues.apache.org/jira/browse/SPARK-31559?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Marcelo Masiero Vanzin resolved SPARK-31559. Fix Version/s: 3.0.0 Assignee: Jungtaek Lim Resolution: Fixed > AM starts with initial fetched tokens in any attempt > > > Key: SPARK-31559 > URL: https://issues.apache.org/jira/browse/SPARK-31559 > Project: Spark > Issue Type: Bug > Components: YARN >Affects Versions: 3.0.0 >Reporter: Jungtaek Lim >Assignee: Jungtaek Lim >Priority: Major > Fix For: 3.0.0 > > > The issue only occurs in yarn-cluster mode. > The submitter obtains delegation tokens for yarn-cluster mode and adds these > credentials to the launch context. The AM will be launched with these > credentials, and the AM and the driver are able to leverage these tokens. > In YARN cluster mode, the driver is launched in the AM, which in turn initializes the > token manager (while initializing SparkContext) and obtains delegation tokens > (and schedules their renewal) if both principal and keytab are available. > That said, even if we provide a principal and keytab to run an application in > yarn-cluster mode, the AM always starts with the initial tokens from the launch context > until the token manager runs and obtains delegation tokens. > So there's a "gap": if user code (the driver) accesses an external system that requires > delegation tokens (e.g. HDFS) before initializing SparkContext, it cannot > leverage the tokens the token manager will obtain. This will make the application > fail if the AM is killed and relaunched "after" the initial tokens have expired. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
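One hedged workaround sketch for the "gap" described above is for driver code that must touch HDFS before SparkContext exists to authenticate from the keytab itself, rather than relying on the tokens shipped in the AM launch context. This only illustrates the Hadoop UGI API, not the change that resolved this ticket; the principal, keytab and HDFS path are placeholders:

{code:scala}
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}
import org.apache.hadoop.security.UserGroupInformation

val hadoopConf = new Configuration()
UserGroupInformation.setConfiguration(hadoopConf)
// Authenticate from the keytab directly instead of relying on the (possibly
// expired) delegation tokens from the AM launch context.
UserGroupInformation.loginUserFromKeytab("user@EXAMPLE.COM", "/path/to/user.keytab")

val fs = FileSystem.get(hadoopConf)
val exists = fs.exists(new Path("/some/input/path"))
println(s"input exists: $exists")
{code}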
[jira] [Resolved] (SPARK-31671) Wrong error message in VectorAssembler when column lengths can not be inferred
[ https://issues.apache.org/jira/browse/SPARK-31671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-31671. -- Fix Version/s: 2.4.7 3.0.0 Assignee: YijieFan Resolution: Fixed Resolved by https://github.com/apache/spark/pull/28487 > Wrong error message in VectorAssembler when column lengths can not be > inferred > --- > > Key: SPARK-31671 > URL: https://issues.apache.org/jira/browse/SPARK-31671 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.4.4 > Environment: Mac OS catalina >Reporter: YijieFan >Assignee: YijieFan >Priority: Minor > Fix For: 3.0.0, 2.4.7 > > Original Estimate: 72h > Remaining Estimate: 72h > > In VectorAssembler when input column lengths can not be inferred and > handleInvalid = "keep", it will throw a runtime exception with message like > below > _Can not infer column lengths with handleInvalid = "keep". *Consider using > VectorSizeHint*_ > *_|to add metadata for columns: [column1, column2]_* > However, even if you set vector size hint for *column1*, the message remains, > and will not change to *[column2]* only. This is not consistent with the > description in the error message. > This introduce difficulties when I try to resolve this exception, for I do > not know which column required vectorSizeHint. This is especially troublesome > when you have a large number of columns to deal with. > Here is a simple example: > > {code:java} > // create a df without vector size > val df = Seq( > (Vectors.dense(1.0), Vectors.dense(2.0)) > ).toDF("n1", "n2") > // only set vector size hint for n1 column > val hintedDf = new VectorSizeHint() > .setInputCol("n1") > .setSize(1) > .transform(df) > // assemble n1, n2 > val output = new VectorAssembler() > .setInputCols(Array("n1", "n2")) > .setOutputCol("features") > .setHandleInvalid("keep") > .transform(hintedDf) > // because only n1 has vector size, the error message should tell us to set > vector size for n2 too > output.show() > {code} > Expected error message: > > {code:java} > Can not infer column lengths with handleInvalid = "keep". Consider using > VectorSizeHint to add metadata for columns: [n2]. > {code} > Actual error message: > {code:java} > Can not infer column lengths with handleInvalid = "keep". Consider using > VectorSizeHint to add metadata for columns: [n1, n2]. > {code} > I change one line in VectorAssembler.scala, so that it can work properly as > expected. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
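The reported one-line fix amounts to listing only the columns whose vector length really is unknown. A standalone sketch of that filtering using the ML attribute metadata; the helper name columnsMissingSize is illustrative, not the actual patch:

{code:scala}
import org.apache.spark.ml.attribute.AttributeGroup
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.types.StructField

// Report only the vector columns whose length cannot be read from the column
// metadata, instead of every input column.
def columnsMissingSize(df: DataFrame, inputCols: Seq[String]): Seq[String] =
  inputCols.filter { c =>
    val field: StructField = df.schema(c)
    AttributeGroup.fromStructField(field).size < 0 // -1 means "size unknown"
  }

// In the example from the report, columnsMissingSize(hintedDf, Seq("n1", "n2"))
// would return Seq("n2"), matching the expected error message.
{code}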
[jira] [Commented] (SPARK-20007) Make SparkR apply() functions robust to workers that return empty data.frame
[ https://issues.apache.org/jira/browse/SPARK-20007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104969#comment-17104969 ] Apache Spark commented on SPARK-20007: -- User 'liangz1' has created a pull request for this issue: https://github.com/apache/spark/pull/28504 > Make SparkR apply() functions robust to workers that return empty data.frame > > > Key: SPARK-20007 > URL: https://issues.apache.org/jira/browse/SPARK-20007 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.2.0 >Reporter: Hossein Falaki >Priority: Major > Labels: bulk-closed > > When using {{gapply()}} (or other members of the {{apply()}} family) with a > schema, Spark will try to parse the data returned from the R process on each > worker as Spark DataFrame Rows based on the schema. In this case our provided > schema suggests that we have six columns. When an R worker returns results to the > JVM, SparkSQL will try to access its columns one by one and cast them to the > proper types. If the R worker returns nothing, the JVM will throw an > {{ArrayIndexOutOfBoundsException}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31671) Wrong error message in VectorAssembler when column lengths can not be inferred
[ https://issues.apache.org/jira/browse/SPARK-31671?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-31671: - Affects Version/s: (was: 3.0.1) Labels: (was: pull-request-available) > Wrong error message in VectorAssembler when column lengths can not be > inferred > --- > > Key: SPARK-31671 > URL: https://issues.apache.org/jira/browse/SPARK-31671 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.4.4 > Environment: Mac OS catalina >Reporter: YijieFan >Priority: Minor > Original Estimate: 72h > Remaining Estimate: 72h > > In VectorAssembler when input column lengths can not be inferred and > handleInvalid = "keep", it will throw a runtime exception with message like > below > _Can not infer column lengths with handleInvalid = "keep". *Consider using > VectorSizeHint*_ > *_|to add metadata for columns: [column1, column2]_* > However, even if you set vector size hint for *column1*, the message remains, > and will not change to *[column2]* only. This is not consistent with the > description in the error message. > This introduce difficulties when I try to resolve this exception, for I do > not know which column required vectorSizeHint. This is especially troublesome > when you have a large number of columns to deal with. > Here is a simple example: > > {code:java} > // create a df without vector size > val df = Seq( > (Vectors.dense(1.0), Vectors.dense(2.0)) > ).toDF("n1", "n2") > // only set vector size hint for n1 column > val hintedDf = new VectorSizeHint() > .setInputCol("n1") > .setSize(1) > .transform(df) > // assemble n1, n2 > val output = new VectorAssembler() > .setInputCols(Array("n1", "n2")) > .setOutputCol("features") > .setHandleInvalid("keep") > .transform(hintedDf) > // because only n1 has vector size, the error message should tell us to set > vector size for n2 too > output.show() > {code} > Expected error message: > > {code:java} > Can not infer column lengths with handleInvalid = "keep". Consider using > VectorSizeHint to add metadata for columns: [n2]. > {code} > Actual error message: > {code:java} > Can not infer column lengths with handleInvalid = "keep". Consider using > VectorSizeHint to add metadata for columns: [n1, n2]. > {code} > I change one line in VectorAssembler.scala, so that it can work properly as > expected. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-20007) Make SparkR apply() functions robust to workers that return empty data.frame
[ https://issues.apache.org/jira/browse/SPARK-20007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104967#comment-17104967 ] Apache Spark commented on SPARK-20007: -- User 'liangz1' has created a pull request for this issue: https://github.com/apache/spark/pull/28504 > Make SparkR apply() functions robust to workers that return empty data.frame > > > Key: SPARK-20007 > URL: https://issues.apache.org/jira/browse/SPARK-20007 > Project: Spark > Issue Type: Bug > Components: SparkR >Affects Versions: 2.2.0 >Reporter: Hossein Falaki >Priority: Major > Labels: bulk-closed > > When using {{gapply()}} (or other members of the {{apply()}} family) with a > schema, Spark will try to parse the data returned from the R process on each > worker as Spark DataFrame Rows based on the schema. In this case our provided > schema suggests that we have six columns. When an R worker returns results to the > JVM, SparkSQL will try to access its columns one by one and cast them to the > proper types. If the R worker returns nothing, the JVM will throw an > {{ArrayIndexOutOfBoundsException}}. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31666) Cannot map hostPath volumes to container
[ https://issues.apache.org/jira/browse/SPARK-31666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104947#comment-17104947 ] Stephen Hopper commented on SPARK-31666: I was able to fix the issue by patching Spark. As noted in this issue: [https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/828] I applied versions of these two PRs (for Spark 3.0) (with some minor tweaks to make them compatible with 2.4.5): [https://github.com/apache/spark/pull/22323] [https://github.com/apache/spark/pull/24879] I then rebuilt Spark (as well as spark-submit and spark-operator) and it's working now. However, this is still going to be an issue for anyone on 2.4.5 as the docs state that hostPath directories should be useable. Should I open a PR to backport this fix for Spark 2.4.6? When is Spark 3.0 coming out? > Cannot map hostPath volumes to container > > > Key: SPARK-31666 > URL: https://issues.apache.org/jira/browse/SPARK-31666 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.5 >Reporter: Stephen Hopper >Priority: Major > > I'm trying to mount additional hostPath directories as seen in a couple of > places: > [https://aws.amazon.com/blogs/containers/optimizing-spark-performance-on-kubernetes/] > [https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/user-guide.md#using-volume-for-scratch-space] > [https://spark.apache.org/docs/latest/running-on-kubernetes.html#using-kubernetes-volumes] > > However, whenever I try to submit my job, I run into this error: > {code:java} > Uncaught exception in thread kubernetes-executor-snapshots-subscribers-1 │ > io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: > POST at: https://kubernetes.default.svc/api/v1/namespaces/my-spark-ns/pods. > Message: Pod "spark-pi-1588970477877-exec-1" is invalid: > spec.containers[0].volumeMounts[1].mountPath: Invalid value: "/tmp1": must be > unique. 
Received status: Status(apiVersion=v1, code=422, > details=StatusDetails(causes=[StatusCause(field=spec.containers[0].volumeMounts[1].mountPath, > message=Invalid value: "/tmp1": must be unique, reason=FieldValueInvalid, > additionalProperties={})], group=null, kind=Pod, > name=spark-pi-1588970477877-exec-1, retryAfterSeconds=null, uid=null, > additionalProperties={}), kind=Status, message=Pod > "spark-pi-1588970477877-exec-1" is invalid: > spec.containers[0].volumeMounts[1].mountPath: Invalid value: "/tmp1": must be > unique, metadata=ListMeta(_continue=null, remainingItemCount=null, > resourceVersion=null, selfLink=null, additionalProperties={}), > reason=Invalid, status=Failure, additionalProperties={}).{code} > > This is my spark-submit command (note: I've used my own build of spark for > kubernetes as well as a few other images that I've seen floating around (such > as this one seedjeffwan/spark:v2.4.5) and they all have this same issue): > {code:java} > bin/spark-submit \ > --master k8s://https://my-k8s-server:443 \ > --deploy-mode cluster \ > --name spark-pi \ > --class org.apache.spark.examples.SparkPi \ > --conf spark.executor.instances=2 \ > --conf spark.kubernetes.container.image=my-spark-image:my-tag \ > --conf spark.kubernetes.driver.pod.name=sparkpi-test-driver \ > --conf spark.kubernetes.namespace=my-spark-ns \ > --conf > spark.kubernetes.executor.volumes.hostPath.spark-local-dir-2.mount.path=/tmp1 > \ > --conf > spark.kubernetes.executor.volumes.hostPath.spark-local-dir-2.options.path=/tmp1 > \ > --conf spark.local.dir="/tmp1" \ > --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark > local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar 2{code} > Any ideas on what's causing this? > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31681) Python multiclass logistic regression evaluate should return LogisticRegressionSummary
[ https://issues.apache.org/jira/browse/SPARK-31681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104936#comment-17104936 ] Apache Spark commented on SPARK-31681: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/28503 > Python multiclass logistic regression evaluate should return > LogisticRegressionSummary > -- > > Key: SPARK-31681 > URL: https://issues.apache.org/jira/browse/SPARK-31681 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Priority: Minor > > {code:java} > def evaluate(self, dataset): > .. > java_blr_summary = self._call_java("evaluate", dataset) > return BinaryLogisticRegressionSummary(java_blr_summary) > {code} > We should return LogisticRegressionSummary instead of > BinaryLogisticRegressionSummary for multiclass LogisticRegression -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31681) Python multiclass logistic regression evaluate should return LogisticRegressionSummary
[ https://issues.apache.org/jira/browse/SPARK-31681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31681: Assignee: Apache Spark > Python multiclass logistic regression evaluate should return > LogisticRegressionSummary > -- > > Key: SPARK-31681 > URL: https://issues.apache.org/jira/browse/SPARK-31681 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Assignee: Apache Spark >Priority: Minor > > {code:java} > def evaluate(self, dataset): > .. > java_blr_summary = self._call_java("evaluate", dataset) > return BinaryLogisticRegressionSummary(java_blr_summary) > {code} > We should return LogisticRegressionSummary instead of > BinaryLogisticRegressionSummary for multiclass LogisticRegression -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31681) Python multiclass logistic regression evaluate should return LogisticRegressionSummary
[ https://issues.apache.org/jira/browse/SPARK-31681?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31681: Assignee: (was: Apache Spark) > Python multiclass logistic regression evaluate should return > LogisticRegressionSummary > -- > > Key: SPARK-31681 > URL: https://issues.apache.org/jira/browse/SPARK-31681 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Priority: Minor > > {code:java} > def evaluate(self, dataset): > .. > java_blr_summary = self._call_java("evaluate", dataset) > return BinaryLogisticRegressionSummary(java_blr_summary) > {code} > We should return LogisticRegressionSummary instead of > BinaryLogisticRegressionSummary for multiclass LogisticRegression -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31681) Python multiclass logistic regression evaluate should return LogisticRegressionSummary
[ https://issues.apache.org/jira/browse/SPARK-31681?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104935#comment-17104935 ] Apache Spark commented on SPARK-31681: -- User 'huaxingao' has created a pull request for this issue: https://github.com/apache/spark/pull/28503 > Python multiclass logistic regression evaluate should return > LogisticRegressionSummary > -- > > Key: SPARK-31681 > URL: https://issues.apache.org/jira/browse/SPARK-31681 > Project: Spark > Issue Type: Bug > Components: ML, PySpark >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Priority: Minor > > {code:java} > def evaluate(self, dataset): > .. > java_blr_summary = self._call_java("evaluate", dataset) > return BinaryLogisticRegressionSummary(java_blr_summary) > {code} > We should return LogisticRegressionSummary instead of > BinaryLogisticRegressionSummary for multiclass LogisticRegression -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31681) Python multiclass logistic regression evaluate should return LogisticRegressionSummary
Huaxin Gao created SPARK-31681: -- Summary: Python multiclass logistic regression evaluate should return LogisticRegressionSummary Key: SPARK-31681 URL: https://issues.apache.org/jira/browse/SPARK-31681 Project: Spark Issue Type: Bug Components: ML, PySpark Affects Versions: 3.1.0 Reporter: Huaxin Gao {code:java} def evaluate(self, dataset): .. java_blr_summary = self._call_java("evaluate", dataset) return BinaryLogisticRegressionSummary(java_blr_summary) {code} We should return LogisticRegressionSummary instead of BinaryLogisticRegressionSummary for multiclass LogisticRegression -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31680) Support Java 8 datetime types by Random data generator
[ https://issues.apache.org/jira/browse/SPARK-31680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31680: Assignee: Apache Spark > Support Java 8 datetime types by Random data generator > -- > > Key: SPARK-31680 > URL: https://issues.apache.org/jira/browse/SPARK-31680 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Apache Spark >Priority: Major > > Currently, RandomDataGenerator.forType can generate: > * java.sql.Date values for DateType > * java.sql.Timestamp values for TimestampType > The ticket aims to support java.time.Instant for TimestampType and > java.time.LocalDate for DateType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31680) Support Java 8 datetime types by Random data generator
[ https://issues.apache.org/jira/browse/SPARK-31680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104884#comment-17104884 ] Apache Spark commented on SPARK-31680: -- User 'MaxGekk' has created a pull request for this issue: https://github.com/apache/spark/pull/28502 > Support Java 8 datetime types by Random data generator > -- > > Key: SPARK-31680 > URL: https://issues.apache.org/jira/browse/SPARK-31680 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > Currently, RandomDataGenerator.forType can generate: > * java.sql.Date values for DateType > * java.sql.Timestamp values for TimestampType > The ticket aims to support java.time.Instant for TimestampType and > java.time.LocalDate for DateType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31680) Support Java 8 datetime types by Random data generator
[ https://issues.apache.org/jira/browse/SPARK-31680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31680: Assignee: (was: Apache Spark) > Support Java 8 datetime types by Random data generator > -- > > Key: SPARK-31680 > URL: https://issues.apache.org/jira/browse/SPARK-31680 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Priority: Major > > Currently, RandomDataGenerator.forType can generate: > * java.sql.Date values for DateType > * java.sql.Timestamp values for TimestampType > The ticket aims to support java.time.Instant for TimestampType and > java.time.LocalDate for DateType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31680) Support Java 8 datetime types by Random data generator
Maxim Gekk created SPARK-31680: -- Summary: Support Java 8 datetime types by Random data generator Key: SPARK-31680 URL: https://issues.apache.org/jira/browse/SPARK-31680 Project: Spark Issue Type: Test Components: SQL Affects Versions: 3.1.0 Reporter: Maxim Gekk Currently, RandomDataGenerator.forType can generate: * java.sql.Date values for DateType * java.sql.Timestamp values for TimestampType The ticket aims to support java.time.Instant for TimestampType and java.time.LocalDate for DateType. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
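A rough sketch of what generating the two new external types could look like; the real RandomDataGenerator draws from Catalyst's valid ranges and also mixes in special values, so the bounds and seed below are arbitrary placeholders:

{code:scala}
import java.time.{Instant, LocalDate}
import scala.util.Random

val rng = new Random(42)

// Roughly the years 1970-2100, at millisecond precision to keep the sketch simple
def randomInstant(): Instant =
  Instant.ofEpochMilli((rng.nextDouble() * 4102444800000L).toLong)

// Roughly the years 1970-2170
def randomLocalDate(): LocalDate =
  LocalDate.ofEpochDay(rng.nextInt(365 * 200).toLong)

println(randomInstant())
println(randomLocalDate())
{code}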
[jira] [Resolved] (SPARK-31456) If shutdownhook is added with priority Integer.MIN_VALUE, it's supposed to be called the last, but it gets called before other positive priority shutdownhook
[ https://issues.apache.org/jira/browse/SPARK-31456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun resolved SPARK-31456. --- Fix Version/s: 3.0.0 2.4.6 Resolution: Fixed Issue resolved by pull request 28494 [https://github.com/apache/spark/pull/28494] > If shutdownhook is added with priority Integer.MIN_VALUE, it's supposed to be > called the last, but it gets called before other positive priority > shutdownhook > - > > Key: SPARK-31456 > URL: https://issues.apache.org/jira/browse/SPARK-31456 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5 > Environment: macOS Mojave 10.14.6 >Reporter: Xiaolei Liu >Assignee: Oleg Kuznetsov >Priority: Major > Fix For: 2.4.6, 3.0.0 > > > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala > Since shutdownHookManager use below method to do the comparison. > override def compareTo(other: SparkShutdownHook): Int = { > other.priority - priority > } > Which will cause : > (Int)(25 - Integer.MIN_VALUE) < 0 > Then the shutdownhook with Integer.Min_VALUE would not be called the last. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
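The arithmetic problem is easy to see in isolation: 25 - Integer.MIN_VALUE overflows to a negative Int, so a hook with the lowest possible priority compares as if it had a high one. A standalone sketch, where Hook is a stand-in for SparkShutdownHook, with the usual overflow-safe comparison:

{code:scala}
case class Hook(priority: Int) extends Comparable[Hook] {
  // Buggy form from the report: `other.priority - priority` overflows, e.g.
  // 25 - Integer.MIN_VALUE wraps around to a negative Int.
  // override def compareTo(other: Hook): Int = other.priority - priority

  // Overflow-safe form: higher priority sorts first.
  override def compareTo(other: Hook): Int =
    java.lang.Integer.compare(other.priority, priority)
}

val hooks = Seq(Hook(25), Hook(Int.MinValue), Hook(0)).sorted
// With the safe comparison: List(Hook(25), Hook(0), Hook(-2147483648))
println(hooks)
{code}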
[jira] [Updated] (SPARK-31456) If shutdownhook is added with priority Integer.MIN_VALUE, it's supposed to be called the last, but it gets called before other positive priority shutdownhook
[ https://issues.apache.org/jira/browse/SPARK-31456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31456: -- Fix Version/s: (was: 2.4.6) 2.4.7 > If shutdownhook is added with priority Integer.MIN_VALUE, it's supposed to be > called the last, but it gets called before other positive priority > shutdownhook > - > > Key: SPARK-31456 > URL: https://issues.apache.org/jira/browse/SPARK-31456 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5 > Environment: macOS Mojave 10.14.6 >Reporter: Xiaolei Liu >Assignee: Oleg Kuznetsov >Priority: Major > Fix For: 3.0.0, 2.4.7 > > > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala > Since shutdownHookManager use below method to do the comparison. > override def compareTo(other: SparkShutdownHook): Int = { > other.priority - priority > } > Which will cause : > (Int)(25 - Integer.MIN_VALUE) < 0 > Then the shutdownhook with Integer.Min_VALUE would not be called the last. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31456) If shutdownhook is added with priority Integer.MIN_VALUE, it's supposed to be called the last, but it gets called before other positive priority shutdownhook
[ https://issues.apache.org/jira/browse/SPARK-31456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun reassigned SPARK-31456: - Assignee: Oleg Kuznetsov > If shutdownhook is added with priority Integer.MIN_VALUE, it's supposed to be > called the last, but it gets called before other positive priority > shutdownhook > - > > Key: SPARK-31456 > URL: https://issues.apache.org/jira/browse/SPARK-31456 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5 > Environment: macOS Mojave 10.14.6 >Reporter: Xiaolei Liu >Assignee: Oleg Kuznetsov >Priority: Major > > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala > Since shutdownHookManager use below method to do the comparison. > override def compareTo(other: SparkShutdownHook): Int = { > other.priority - priority > } > Which will cause : > (Int)(25 - Integer.MIN_VALUE) < 0 > Then the shutdownhook with Integer.Min_VALUE would not be called the last. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31456) If shutdownhook is added with priority Integer.MIN_VALUE, it's supposed to be called the last, but it gets called before other positive priority shutdownhook
[ https://issues.apache.org/jira/browse/SPARK-31456?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Dongjoon Hyun updated SPARK-31456: -- Affects Version/s: 1.6.3 2.0.2 2.1.3 2.2.3 2.3.4 > If shutdownhook is added with priority Integer.MIN_VALUE, it's supposed to be > called the last, but it gets called before other positive priority > shutdownhook > - > > Key: SPARK-31456 > URL: https://issues.apache.org/jira/browse/SPARK-31456 > Project: Spark > Issue Type: Bug > Components: Spark Core >Affects Versions: 1.6.3, 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5 > Environment: macOS Mojave 10.14.6 >Reporter: Xiaolei Liu >Priority: Major > > https://github.com/apache/spark/blob/master/core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala > Since ShutdownHookManager uses the method below for the comparison: > override def compareTo(other: SparkShutdownHook): Int = { > other.priority - priority > } > the Int subtraction overflows, so > (Int)(25 - Integer.MIN_VALUE) < 0 > and the shutdown hook added with priority Integer.MIN_VALUE is not called last. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
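A minimal sketch of the overflow described in SPARK-31456, using a stand-in class; the overflow-safe comparison shown here is only an illustration, not necessarily the exact change made in pull request 28494.

{code:scala}
// Stand-in for the class in core/src/main/scala/org/apache/spark/util/ShutdownHookManager.scala.
class SparkShutdownHook(val priority: Int) extends Comparable[SparkShutdownHook] {
  // Buggy form quoted in the report: the Int subtraction overflows for extreme priorities.
  //   override def compareTo(other: SparkShutdownHook): Int = other.priority - priority

  // Overflow-safe alternative: compare without subtracting (descending by priority).
  override def compareTo(other: SparkShutdownHook): Int =
    java.lang.Integer.compare(other.priority, priority)
}

// Why the buggy form misorders a MIN_VALUE hook: the subtraction wraps around to a
// negative Int, so a hook registered with Integer.MIN_VALUE sorts before positive ones.
println(25 - Integer.MIN_VALUE)  // prints a negative number because of Int overflow
{code}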
[jira] [Commented] (SPARK-27249) Developers API for Transformers beyond UnaryTransformer
[ https://issues.apache.org/jira/browse/SPARK-27249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104661#comment-17104661 ] Nick Afshartous commented on SPARK-27249: - [~enrush] Hi Everett, can you please chime in on the thread in the PR? There's a question about whether or not the need is covered by existing APIs. > Developers API for Transformers beyond UnaryTransformer > --- > > Key: SPARK-27249 > URL: https://issues.apache.org/jira/browse/SPARK-27249 > Project: Spark > Issue Type: New Feature > Components: ML >Affects Versions: 3.1.0 >Reporter: Everett Rush >Priority: Minor > Labels: starter > Attachments: Screen Shot 2020-01-17 at 4.20.57 PM.png > > Original Estimate: 96h > Remaining Estimate: 96h > > It would be nice to have a developers' API for dataset transformations that > need more than one column from a row (i.e. UnaryTransformer takes one column in > and outputs one column) or that contain objects too expensive to initialize > repeatedly in a UDF, such as a database connection. > > Design: > Abstract class PartitionTransformer extends Transformer and defines the > partition transformation function as Iterator[Row] => Iterator[Row] > NB: This parallels the UnaryTransformer createTransformFunc method > > When developers subclass this transformer, they can provide their own schema > for the output Row, in which case the PartitionTransformer creates a row > encoder and executes the transformation. Alternatively, the developer can set > the output DataType and output column name; the PartitionTransformer class will then > create a new schema and a row encoder, and execute the transformation. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
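A rough sketch of the developers' API proposed above, assuming the Iterator[Row] => Iterator[Row] design from the ticket; the class and method names are illustrative only and do not exist in Spark.

{code:scala}
import org.apache.spark.ml.Transformer
import org.apache.spark.ml.param.ParamMap
import org.apache.spark.ml.util.Identifiable
import org.apache.spark.sql.{DataFrame, Dataset, Row}
import org.apache.spark.sql.catalyst.encoders.RowEncoder
import org.apache.spark.sql.types.StructType

// Hypothetical base class: subclasses transform a whole partition at a time, so
// expensive resources (e.g. a database connection) can be set up once per partition.
abstract class PartitionTransformer(override val uid: String) extends Transformer {
  def this() = this(Identifiable.randomUID("partitionTransformer"))

  // Parallels UnaryTransformer.createTransformFunc, but over partitions instead of values.
  protected def createPartitionFunc: Iterator[Row] => Iterator[Row]

  // Schema of the rows produced by createPartitionFunc, supplied by the subclass.
  protected def outputSchema(inputSchema: StructType): StructType

  override def transformSchema(schema: StructType): StructType = outputSchema(schema)

  override def transform(dataset: Dataset[_]): DataFrame = {
    val schema = outputSchema(dataset.schema)
    // Build a row encoder for the declared output schema and run the partition function.
    dataset.toDF().mapPartitions(createPartitionFunc)(RowEncoder(schema))
  }

  override def copy(extra: ParamMap): Transformer = defaultCopy(extra)
}
{code}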
[jira] [Assigned] (SPARK-31663) Grouping sets with having clause returns the wrong result
[ https://issues.apache.org/jira/browse/SPARK-31663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31663: Assignee: (was: Apache Spark) > Grouping sets with having clause returns the wrong result > - > > Key: SPARK-31663 > URL: https://issues.apache.org/jira/browse/SPARK-31663 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0 >Reporter: Yuanjian Li >Priority: Major > Labels: correctness > > Grouping sets with having clause returns the wrong result when the condition > of having contained conflicting naming. See the below example: > {code:java} > select sum(a) as b FROM VALUES (1, 10), (2, 20) AS T(a, b) group by GROUPING > SETS ((b), (a, b)) having b > 10{code} > The `b` in `having b > 10` should be resolved as `T.b` not `sum(a)`, so the > right result should be > {code:java} > +---+ > | b| > +---+ > | 2| > | 2| > +---+{code} > instead of an empty result. > The root cause is similar to SPARK-31519, it's caused by we parsed HAVING as > Filter(..., Agg(...)) and resolved these two parts in different rules. The > CUBE and ROLLUP have the same issue. > Other systems worked as expected, I checked PostgreSQL 9.6 and MS SQL Server > 2017. > > For Apache Spark 2.0.2 ~ 2.3.4, the following query is tested. > {code:java} > spark-sql> select sum(a) as b from t group by b grouping sets(b) having b > > 10; > Time taken: 0.194 seconds > hive> select sum(a) as b from t group by b grouping sets(b) having b > 10; > 2 > Time taken: 1.605 seconds, Fetched: 1 row(s) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31663) Grouping sets with having clause returns the wrong result
[ https://issues.apache.org/jira/browse/SPARK-31663?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31663: Assignee: Apache Spark > Grouping sets with having clause returns the wrong result > - > > Key: SPARK-31663 > URL: https://issues.apache.org/jira/browse/SPARK-31663 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0 >Reporter: Yuanjian Li >Assignee: Apache Spark >Priority: Major > Labels: correctness > > Grouping sets with having clause returns the wrong result when the condition > of having contained conflicting naming. See the below example: > {code:java} > select sum(a) as b FROM VALUES (1, 10), (2, 20) AS T(a, b) group by GROUPING > SETS ((b), (a, b)) having b > 10{code} > The `b` in `having b > 10` should be resolved as `T.b` not `sum(a)`, so the > right result should be > {code:java} > +---+ > | b| > +---+ > | 2| > | 2| > +---+{code} > instead of an empty result. > The root cause is similar to SPARK-31519, it's caused by we parsed HAVING as > Filter(..., Agg(...)) and resolved these two parts in different rules. The > CUBE and ROLLUP have the same issue. > Other systems worked as expected, I checked PostgreSQL 9.6 and MS SQL Server > 2017. > > For Apache Spark 2.0.2 ~ 2.3.4, the following query is tested. > {code:java} > spark-sql> select sum(a) as b from t group by b grouping sets(b) having b > > 10; > Time taken: 0.194 seconds > hive> select sum(a) as b from t group by b grouping sets(b) having b > 10; > 2 > Time taken: 1.605 seconds, Fetched: 1 row(s) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31663) Grouping sets with having clause returns the wrong result
[ https://issues.apache.org/jira/browse/SPARK-31663?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104528#comment-17104528 ] Apache Spark commented on SPARK-31663: -- User 'xuanyuanking' has created a pull request for this issue: https://github.com/apache/spark/pull/28501 > Grouping sets with having clause returns the wrong result > - > > Key: SPARK-31663 > URL: https://issues.apache.org/jira/browse/SPARK-31663 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.0.2, 2.1.3, 2.2.3, 2.3.4, 2.4.5, 3.0.0 >Reporter: Yuanjian Li >Priority: Major > Labels: correctness > > Grouping sets with a HAVING clause return the wrong result when the HAVING > condition contains a conflicting name. See the example below: > {code:java} > select sum(a) as b FROM VALUES (1, 10), (2, 20) AS T(a, b) group by GROUPING > SETS ((b), (a, b)) having b > 10{code} > The `b` in `having b > 10` should be resolved as `T.b`, not `sum(a)`, so the > right result should be > {code:java} > +---+ > | b| > +---+ > | 2| > | 2| > +---+{code} > instead of an empty result. > The root cause is similar to SPARK-31519: we parse HAVING as > Filter(..., Agg(...)) and resolve these two parts in different rules. > CUBE and ROLLUP have the same issue. > Other systems work as expected; I checked PostgreSQL 9.6 and MS SQL Server > 2017. > > For Apache Spark 2.0.2 ~ 2.3.4, the following query was tested. > {code:java} > spark-sql> select sum(a) as b from t group by b grouping sets(b) having b > > 10; > Time taken: 0.194 seconds > hive> select sum(a) as b from t group by b grouping sets(b) having b > 10; > 2 > Time taken: 1.605 seconds, Fetched: 1 row(s) {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
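The report's reproduction, restated as a spark-shell snippet for convenience; the qualified-column variant in the trailing comment is only a hypothetical way to make the intended binding explicit, not a verified workaround.

{code:scala}
// Runs the query from the description; on affected versions it returns an empty
// result, while the correct answer is two rows with b = 2.
val result = spark.sql("""
  SELECT sum(a) AS b
  FROM VALUES (1, 10), (2, 20) AS T(a, b)
  GROUP BY GROUPING SETS ((b), (a, b))
  HAVING b > 10
""")
result.show()
// Hypothetical: writing HAVING T.b > 10 would make the reference unambiguous, so the
// filter could not be resolved against the sum(a) alias.
{code}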
[jira] [Updated] (SPARK-31575) Synchronise global JVM security configuration modification
[ https://issues.apache.org/jira/browse/SPARK-31575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen updated SPARK-31575: - Priority: Minor (was: Major) > Synchronise global JVM security configuration modification > -- > > Key: SPARK-31575 > URL: https://issues.apache.org/jira/browse/SPARK-31575 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Gabor Somogyi >Assignee: Gabor Somogyi >Priority: Minor > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31575) Synchronise global JVM security configuration modification
[ https://issues.apache.org/jira/browse/SPARK-31575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-31575. -- Fix Version/s: 3.1.0 Assignee: Gabor Somogyi Resolution: Fixed Resolved by https://github.com/apache/spark/pull/28368 > Synchronise global JVM security configuration modification > -- > > Key: SPARK-31575 > URL: https://issues.apache.org/jira/browse/SPARK-31575 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.1.0 >Reporter: Gabor Somogyi >Assignee: Gabor Somogyi >Priority: Major > Fix For: 3.1.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
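A minimal sketch of the technique named in the title, i.e. serialising modifications of the process-wide JVM security configuration behind a single lock; the lock object and helper are hypothetical and not necessarily how pull request 28368 implements it.

{code:scala}
import javax.security.auth.login.Configuration

// Hypothetical lock object; Configuration.setConfiguration mutates global JVM state,
// so concurrent callers must agree on one monitor to avoid racing each other.
object GlobalSecurityConfLock

def withJaasConfiguration[T](conf: Configuration)(body: => T): T =
  GlobalSecurityConfLock.synchronized {
    val previous = Configuration.getConfiguration
    Configuration.setConfiguration(conf)
    try body
    finally Configuration.setConfiguration(previous)  // restore the global state
  }
{code}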
[jira] [Resolved] (SPARK-31667) Python side flatten the result dataframe of ANOVATest/ChisqTest/FValueTest
[ https://issues.apache.org/jira/browse/SPARK-31667?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Sean R. Owen resolved SPARK-31667. -- Fix Version/s: 3.1.0 Assignee: Huaxin Gao Resolution: Fixed Resolved by https://github.com/apache/spark/pull/28483 > Python side flatten the result dataframe of ANOVATest/ChisqTest/FValueTest > -- > > Key: SPARK-31667 > URL: https://issues.apache.org/jira/browse/SPARK-31667 > Project: Spark > Issue Type: New Feature > Components: ML, PySpark >Affects Versions: 3.1.0 >Reporter: Huaxin Gao >Assignee: Huaxin Gao >Priority: Minor > Fix For: 3.1.0 > > > Add Python version of > {code:java} > @Since("3.1.0") > def test( > dataset: DataFrame, > featuresCol: String, > labelCol: String, > flatten: Boolean): DataFrame > {code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
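For reference, a spark-shell usage of the Scala API whose Python counterpart this ticket adds, assuming Spark 3.1.0+ where the `flatten` overload from the description exists; the sample data is made up.

{code:scala}
import org.apache.spark.ml.linalg.Vectors
import org.apache.spark.ml.stat.ChiSquareTest
import spark.implicits._

val df = Seq(
  (0.0, Vectors.dense(0.5, 10.0)),
  (1.0, Vectors.dense(1.5, 20.0)),
  (1.0, Vectors.dense(1.5, 30.0))
).toDF("label", "features")

// flatten = true returns one row per feature instead of a single row of vector-valued
// columns, which is what the Python side is expected to mirror.
ChiSquareTest.test(df, "features", "label", flatten = true).show(truncate = false)
{code}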
[jira] [Commented] (SPARK-30098) Use default datasource as provider for CREATE TABLE syntax
[ https://issues.apache.org/jira/browse/SPARK-30098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104479#comment-17104479 ] Apache Spark commented on SPARK-30098: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/28500 > Use default datasource as provider for CREATE TABLE syntax > -- > > Key: SPARK-30098 > URL: https://issues.apache.org/jira/browse/SPARK-30098 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > Labels: release-notes > Fix For: 3.0.0 > > > Change the default provider from `hive` to the value of > `spark.sql.sources.default` for the "CREATE TABLE" syntax, to make it > consistent with the DataFrameWriter.saveAsTable API. > Also, this is friendlier to end users, since Spark is well known for using > parquet (the default value of `spark.sql.sources.default`) as its default I/O > format. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-30098) Use default datasource as provider for CREATE TABLE syntax
[ https://issues.apache.org/jira/browse/SPARK-30098?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104478#comment-17104478 ] Apache Spark commented on SPARK-30098: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/28500 > Use default datasource as provider for CREATE TABLE syntax > -- > > Key: SPARK-30098 > URL: https://issues.apache.org/jira/browse/SPARK-30098 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.0.0 >Reporter: wuyi >Assignee: wuyi >Priority: Major > Labels: release-notes > Fix For: 3.0.0 > > > Change the default provider from `hive` to the value of > `spark.sql.sources.default` for the "CREATE TABLE" syntax, to make it > consistent with the DataFrameWriter.saveAsTable API. > Also, this is friendlier to end users, since Spark is well known for using > parquet (the default value of `spark.sql.sources.default`) as its default I/O > format. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
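A small spark-shell sketch of the behaviour change described above, assuming a Spark build that includes this change; the table names are illustrative.

{code:scala}
// With this change, a CREATE TABLE statement without USING / STORED AS picks up the
// provider from spark.sql.sources.default (parquet by default) instead of the Hive serde.
spark.sql("CREATE TABLE t_default (id INT, name STRING)")
spark.sql("CREATE TABLE t_orc (id INT) USING orc")  // an explicit provider is unaffected

// The Provider field of DESCRIBE EXTENDED should reflect the default datasource.
spark.sql("DESCRIBE EXTENDED t_default").show(truncate = false)
{code}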
[jira] [Assigned] (SPARK-31665) Test parquet dictionary encoding of random dates/timestamps
[ https://issues.apache.org/jira/browse/SPARK-31665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan reassigned SPARK-31665: --- Assignee: Maxim Gekk > Test parquet dictionary encoding of random dates/timestamps > --- > > Key: SPARK-31665 > URL: https://issues.apache.org/jira/browse/SPARK-31665 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > > Currently, dictionary encoding is not tested in ParquetHadoopFsRelationSuite > test "test all data types" because dates and timestamps are uniformly > distributed, and dictionary encoding is not applied for the types in fact. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Resolved] (SPARK-31665) Test parquet dictionary encoding of random dates/timestamps
[ https://issues.apache.org/jira/browse/SPARK-31665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Wenchen Fan resolved SPARK-31665. - Fix Version/s: 3.0.0 Resolution: Fixed Issue resolved by pull request 28481 [https://github.com/apache/spark/pull/28481] > Test parquet dictionary encoding of random dates/timestamps > --- > > Key: SPARK-31665 > URL: https://issues.apache.org/jira/browse/SPARK-31665 > Project: Spark > Issue Type: Test > Components: SQL >Affects Versions: 3.1.0 >Reporter: Maxim Gekk >Assignee: Maxim Gekk >Priority: Major > Fix For: 3.0.0 > > > Currently, dictionary encoding is not tested in ParquetHadoopFsRelationSuite > test "test all data types" because dates and timestamps are uniformly > distributed, and dictionary encoding is not applied for the types in fact. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31666) Cannot map hostPath volumes to container
[ https://issues.apache.org/jira/browse/SPARK-31666?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104391#comment-17104391 ] Hyukjin Kwon commented on SPARK-31666: -- No idea. Something must have been wrong. > Cannot map hostPath volumes to container > > > Key: SPARK-31666 > URL: https://issues.apache.org/jira/browse/SPARK-31666 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.5 >Reporter: Stephen Hopper >Priority: Major > > I'm trying to mount additional hostPath directories as seen in a couple of > places: > [https://aws.amazon.com/blogs/containers/optimizing-spark-performance-on-kubernetes/] > [https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/user-guide.md#using-volume-for-scratch-space] > [https://spark.apache.org/docs/latest/running-on-kubernetes.html#using-kubernetes-volumes] > > However, whenever I try to submit my job, I run into this error: > {code:java} > Uncaught exception in thread kubernetes-executor-snapshots-subscribers-1 │ > io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: > POST at: https://kubernetes.default.svc/api/v1/namespaces/my-spark-ns/pods. > Message: Pod "spark-pi-1588970477877-exec-1" is invalid: > spec.containers[0].volumeMounts[1].mountPath: Invalid value: "/tmp1": must be > unique. Received status: Status(apiVersion=v1, code=422, > details=StatusDetails(causes=[StatusCause(field=spec.containers[0].volumeMounts[1].mountPath, > message=Invalid value: "/tmp1": must be unique, reason=FieldValueInvalid, > additionalProperties={})], group=null, kind=Pod, > name=spark-pi-1588970477877-exec-1, retryAfterSeconds=null, uid=null, > additionalProperties={}), kind=Status, message=Pod > "spark-pi-1588970477877-exec-1" is invalid: > spec.containers[0].volumeMounts[1].mountPath: Invalid value: "/tmp1": must be > unique, metadata=ListMeta(_continue=null, remainingItemCount=null, > resourceVersion=null, selfLink=null, additionalProperties={}), > reason=Invalid, status=Failure, additionalProperties={}).{code} > > This is my spark-submit command (note: I've used my own build of spark for > kubernetes as well as a few other images that I've seen floating around (such > as this one seedjeffwan/spark:v2.4.5) and they all have this same issue): > {code:java} > bin/spark-submit \ > --master k8s://https://my-k8s-server:443 \ > --deploy-mode cluster \ > --name spark-pi \ > --class org.apache.spark.examples.SparkPi \ > --conf spark.executor.instances=2 \ > --conf spark.kubernetes.container.image=my-spark-image:my-tag \ > --conf spark.kubernetes.driver.pod.name=sparkpi-test-driver \ > --conf spark.kubernetes.namespace=my-spark-ns \ > --conf > spark.kubernetes.executor.volumes.hostPath.spark-local-dir-2.mount.path=/tmp1 > \ > --conf > spark.kubernetes.executor.volumes.hostPath.spark-local-dir-2.options.path=/tmp1 > \ > --conf spark.local.dir="/tmp1" \ > --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark > local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar 2{code} > Any ideas on what's causing this? > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Reopened] (SPARK-31666) Cannot map hostPath volumes to container
[ https://issues.apache.org/jira/browse/SPARK-31666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Stephen Hopper reopened SPARK-31666: [~hyukjin.kwon] why was this closed and marked as invalid? > Cannot map hostPath volumes to container > > > Key: SPARK-31666 > URL: https://issues.apache.org/jira/browse/SPARK-31666 > Project: Spark > Issue Type: Bug > Components: Kubernetes >Affects Versions: 2.4.5 >Reporter: Stephen Hopper >Priority: Major > > I'm trying to mount additional hostPath directories as seen in a couple of > places: > [https://aws.amazon.com/blogs/containers/optimizing-spark-performance-on-kubernetes/] > [https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/blob/master/docs/user-guide.md#using-volume-for-scratch-space] > [https://spark.apache.org/docs/latest/running-on-kubernetes.html#using-kubernetes-volumes] > > However, whenever I try to submit my job, I run into this error: > {code:java} > Uncaught exception in thread kubernetes-executor-snapshots-subscribers-1 │ > io.fabric8.kubernetes.client.KubernetesClientException: Failure executing: > POST at: https://kubernetes.default.svc/api/v1/namespaces/my-spark-ns/pods. > Message: Pod "spark-pi-1588970477877-exec-1" is invalid: > spec.containers[0].volumeMounts[1].mountPath: Invalid value: "/tmp1": must be > unique. Received status: Status(apiVersion=v1, code=422, > details=StatusDetails(causes=[StatusCause(field=spec.containers[0].volumeMounts[1].mountPath, > message=Invalid value: "/tmp1": must be unique, reason=FieldValueInvalid, > additionalProperties={})], group=null, kind=Pod, > name=spark-pi-1588970477877-exec-1, retryAfterSeconds=null, uid=null, > additionalProperties={}), kind=Status, message=Pod > "spark-pi-1588970477877-exec-1" is invalid: > spec.containers[0].volumeMounts[1].mountPath: Invalid value: "/tmp1": must be > unique, metadata=ListMeta(_continue=null, remainingItemCount=null, > resourceVersion=null, selfLink=null, additionalProperties={}), > reason=Invalid, status=Failure, additionalProperties={}).{code} > > This is my spark-submit command (note: I've used my own build of spark for > kubernetes as well as a few other images that I've seen floating around (such > as this one seedjeffwan/spark:v2.4.5) and they all have this same issue): > {code:java} > bin/spark-submit \ > --master k8s://https://my-k8s-server:443 \ > --deploy-mode cluster \ > --name spark-pi \ > --class org.apache.spark.examples.SparkPi \ > --conf spark.executor.instances=2 \ > --conf spark.kubernetes.container.image=my-spark-image:my-tag \ > --conf spark.kubernetes.driver.pod.name=sparkpi-test-driver \ > --conf spark.kubernetes.namespace=my-spark-ns \ > --conf > spark.kubernetes.executor.volumes.hostPath.spark-local-dir-2.mount.path=/tmp1 > \ > --conf > spark.kubernetes.executor.volumes.hostPath.spark-local-dir-2.options.path=/tmp1 > \ > --conf spark.local.dir="/tmp1" \ > --conf spark.kubernetes.authenticate.driver.serviceAccountName=spark > local:///opt/spark/examples/jars/spark-examples_2.11-2.4.5.jar 2{code} > Any ideas on what's causing this? > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31678) PrintStackTrace for Spark SQL CLI when error occurs
[ https://issues.apache.org/jira/browse/SPARK-31678?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104352#comment-17104352 ] Apache Spark commented on SPARK-31678: -- User 'yaooqinn' has created a pull request for this issue: https://github.com/apache/spark/pull/28499 > PrintStackTrace for Spark SQL CLI when error occurs > --- > > Key: SPARK-31678 > URL: https://issues.apache.org/jira/browse/SPARK-31678 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.5, 3.0.0, 3.1.0 >Reporter: Kent Yao >Priority: Minor > > When I was finding the root cause of > https://issues.apache.org/jira/browse/SPARK-31675, I noticed that it is very > difficult for me to see what was actually going on, since it output nothing > else but > {code:java} > Error in query: java.lang.IllegalArgumentException: Wrong FS: > hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc/.hive-staging_hive_2020-05-11_17-10-27_123_6306294638950056285-1/-ext-1/part-0-badf2a31-ab36-4b60-82a1-0848774e4af5-c000, > expected: hdfs://cluster1 > {code} > It is really hard for us to find causes through such a simple error message. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31678) PrintStackTrace for Spark SQL CLI when error occurs
[ https://issues.apache.org/jira/browse/SPARK-31678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31678: Assignee: Apache Spark > PrintStackTrace for Spark SQL CLI when error occurs > --- > > Key: SPARK-31678 > URL: https://issues.apache.org/jira/browse/SPARK-31678 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.5, 3.0.0, 3.1.0 >Reporter: Kent Yao >Assignee: Apache Spark >Priority: Minor > > When I was finding the root cause of > https://issues.apache.org/jira/browse/SPARK-31675, I noticed that it is very > difficult for me to see what was actually going on, since it output nothing > else but > {code:java} > Error in query: java.lang.IllegalArgumentException: Wrong FS: > hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc/.hive-staging_hive_2020-05-11_17-10-27_123_6306294638950056285-1/-ext-1/part-0-badf2a31-ab36-4b60-82a1-0848774e4af5-c000, > expected: hdfs://cluster1 > {code} > It is really hard for us to find causes through such a simple error message. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31678) PrintStackTrace for Spark SQL CLI when error occurs
[ https://issues.apache.org/jira/browse/SPARK-31678?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31678: Assignee: (was: Apache Spark) > PrintStackTrace for Spark SQL CLI when error occurs > --- > > Key: SPARK-31678 > URL: https://issues.apache.org/jira/browse/SPARK-31678 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 2.4.5, 3.0.0, 3.1.0 >Reporter: Kent Yao >Priority: Minor > > When I was finding the root cause of > https://issues.apache.org/jira/browse/SPARK-31675, I noticed that it is very > difficult for me to see what was actually going on, since it output nothing > else but > {code:java} > Error in query: java.lang.IllegalArgumentException: Wrong FS: > hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc/.hive-staging_hive_2020-05-11_17-10-27_123_6306294638950056285-1/-ext-1/part-0-badf2a31-ab36-4b60-82a1-0848774e4af5-c000, > expected: hdfs://cluster1 > {code} > It is really hard for us to find causes through such a simple error message. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31679) Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite: Failed to create new KafkaAdminClient
[ https://issues.apache.org/jira/browse/SPARK-31679?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104322#comment-17104322 ] Gabor Somogyi commented on SPARK-31679: --- Started to work on this. > Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite: Failed > to create new KafkaAdminClient > -- > > Key: SPARK-31679 > URL: https://issues.apache.org/jira/browse/SPARK-31679 > Project: Spark > Issue Type: Bug > Components: Structured Streaming, Tests >Affects Versions: 3.0.0 >Reporter: Gabor Somogyi >Priority: Major > > https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122389/testReport/ > {code:java} > Failed > org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.(It is not a test it > is a sbt.testing.SuiteSelector) > Failing for the past 1 build (Since Failed#122389 ) > Took 34 sec. > Error Message > org.apache.kafka.common.KafkaException: Failed to create new KafkaAdminClient > Stacktrace > sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to > create new KafkaAdminClient > at > org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:479) > at org.apache.kafka.clients.admin.Admin.create(Admin.java:61) > at > org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:39) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:267) > at > org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:290) > at > org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49) > at > org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212) > at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) > at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) > at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58) > at > org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) > at > org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) > at sbt.ForkMain$Run$2.call(ForkMain.java:296) > at sbt.ForkMain$Run$2.call(ForkMain.java:286) > at java.util.concurrent.FutureTask.run(FutureTask.java:266) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) > at java.lang.Thread.run(Thread.java:748) > Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: > javax.security.auth.login.LoginException: Client not found in Kerberos > database (6) - Client not found in Kerberos database > at > org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:172) > at > org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:157) > at > org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:73) > at > org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:105) > at > org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:454) > ... 
17 more > Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: > Client not found in Kerberos database (6) - Client not found in Kerberos > database > at > com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804) > at > com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) > at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) > at > sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) > at > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) > at java.lang.reflect.Method.invoke(Method.java:498) > at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) > at > javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) > at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) > at java.security.AccessController.doPrivileged(Native Method) > at > javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) > at javax.security.auth.login.LoginContext.login(LoginContext.java:587) > at > org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:60) > at > org.apache.kafka.common.security.kerberos.KerberosLogin.login(KerberosLogin.ja
[jira] [Created] (SPARK-31679) Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite: Failed to create new KafkaAdminClient
Gabor Somogyi created SPARK-31679: - Summary: Flaky test: org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite: Failed to create new KafkaAdminClient Key: SPARK-31679 URL: https://issues.apache.org/jira/browse/SPARK-31679 Project: Spark Issue Type: Bug Components: Structured Streaming, Tests Affects Versions: 3.0.0 Reporter: Gabor Somogyi https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/122389/testReport/ {code:java} Failed org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.(It is not a test it is a sbt.testing.SuiteSelector) Failing for the past 1 build (Since Failed#122389 ) Took 34 sec. Error Message org.apache.kafka.common.KafkaException: Failed to create new KafkaAdminClient Stacktrace sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: Failed to create new KafkaAdminClient at org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:479) at org.apache.kafka.clients.admin.Admin.create(Admin.java:61) at org.apache.kafka.clients.admin.AdminClient.create(AdminClient.java:39) at org.apache.spark.sql.kafka010.KafkaTestUtils.setupEmbeddedKafkaServer(KafkaTestUtils.scala:267) at org.apache.spark.sql.kafka010.KafkaTestUtils.setup(KafkaTestUtils.scala:290) at org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite.beforeAll(KafkaDelegationTokenSuite.scala:49) at org.scalatest.BeforeAndAfterAll.liftedTree1$1(BeforeAndAfterAll.scala:212) at org.scalatest.BeforeAndAfterAll.run(BeforeAndAfterAll.scala:210) at org.scalatest.BeforeAndAfterAll.run$(BeforeAndAfterAll.scala:208) at org.apache.spark.SparkFunSuite.run(SparkFunSuite.scala:58) at org.scalatest.tools.Framework.org$scalatest$tools$Framework$$runSuite(Framework.scala:317) at org.scalatest.tools.Framework$ScalaTestTask.execute(Framework.scala:510) at sbt.ForkMain$Run$2.call(ForkMain.java:296) at sbt.ForkMain$Run$2.call(ForkMain.java:286) at java.util.concurrent.FutureTask.run(FutureTask.java:266) at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) at java.lang.Thread.run(Thread.java:748) Caused by: sbt.ForkMain$ForkError: org.apache.kafka.common.KafkaException: javax.security.auth.login.LoginException: Client not found in Kerberos database (6) - Client not found in Kerberos database at org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:172) at org.apache.kafka.common.network.ChannelBuilders.create(ChannelBuilders.java:157) at org.apache.kafka.common.network.ChannelBuilders.clientChannelBuilder(ChannelBuilders.java:73) at org.apache.kafka.clients.ClientUtils.createChannelBuilder(ClientUtils.java:105) at org.apache.kafka.clients.admin.KafkaAdminClient.createInternal(KafkaAdminClient.java:454) ... 
17 more Caused by: sbt.ForkMain$ForkError: javax.security.auth.login.LoginException: Client not found in Kerberos database (6) - Client not found in Kerberos database at com.sun.security.auth.module.Krb5LoginModule.attemptAuthentication(Krb5LoginModule.java:804) at com.sun.security.auth.module.Krb5LoginModule.login(Krb5LoginModule.java:617) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:498) at javax.security.auth.login.LoginContext.invoke(LoginContext.java:755) at javax.security.auth.login.LoginContext.access$000(LoginContext.java:195) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:682) at javax.security.auth.login.LoginContext$4.run(LoginContext.java:680) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.login.LoginContext.invokePriv(LoginContext.java:680) at javax.security.auth.login.LoginContext.login(LoginContext.java:587) at org.apache.kafka.common.security.authenticator.AbstractLogin.login(AbstractLogin.java:60) at org.apache.kafka.common.security.kerberos.KerberosLogin.login(KerberosLogin.java:103) at org.apache.kafka.common.security.authenticator.LoginManager.(LoginManager.java:62) at org.apache.kafka.common.security.authenticator.LoginManager.acquireLoginManager(LoginManager.java:105) at org.apache.kafka.common.network.SaslChannelBuilder.configure(SaslChannelBuilder.java:158) ... 21 more Caused by: sbt.ForkMain$ForkError: sun.security.krb5.KrbException: Client not
[jira] [Created] (SPARK-31678) PrintStackTrace for Spark SQL CLI when error occurs
Kent Yao created SPARK-31678: Summary: PrintStackTrace for Spark SQL CLI when error occurs Key: SPARK-31678 URL: https://issues.apache.org/jira/browse/SPARK-31678 Project: Spark Issue Type: Improvement Components: SQL Affects Versions: 2.4.5, 3.0.0, 3.1.0 Reporter: Kent Yao When I was finding the root cause of https://issues.apache.org/jira/browse/SPARK-31675, I noticed that it is very difficult for me to see what was actually going on, since it output nothing else but {code:java} Error in query: java.lang.IllegalArgumentException: Wrong FS: hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc/.hive-staging_hive_2020-05-11_17-10-27_123_6306294638950056285-1/-ext-1/part-0-badf2a31-ab36-4b60-82a1-0848774e4af5-c000, expected: hdfs://cluster1 {code} It is really hard for us to find causes through such a simple error message. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31676) QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0)
[ https://issues.apache.org/jira/browse/SPARK-31676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31676: Assignee: Apache Spark > QuantileDiscretizer raise error parameter splits given invalid value (splits > array includes -0.0 and 0.0) > - > > Key: SPARK-31676 > URL: https://issues.apache.org/jira/browse/SPARK-31676 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.4.5, 3.0.0 >Reporter: Weichen Xu >Assignee: Apache Spark >Priority: Major > > Reproduce code > {code} > import scala.util.Random > val rng = new Random(3) > val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ > Array.fill(20)(0.0) ++ Array.fill(20)(-0.0) > import spark.implicits._ > val df1 = sc.parallelize(a1, 2).toDF("id") > import org.apache.spark.ml.feature.QuantileDiscretizer > val qd = new > QuantileDiscretizer().setInputCol("id").setOutputCol("out").setNumBuckets(200).setRelativeError(0.0) > val model = qd.fit(df1) > {code} > Raise error like: > at org.apache.spark.ml.param.Param.validate(params.scala:76) > at org.apache.spark.ml.param.ParamPair.(params.scala:634) > at org.apache.spark.ml.param.Param.$minus$greater(params.scala:85) > at org.apache.spark.ml.param.Params.set(params.scala:713) > at org.apache.spark.ml.param.Params.set$(params.scala:712) > at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:41) > at org.apache.spark.ml.feature.Bucketizer.setSplits(Bucketizer.scala:77) > at > org.apache.spark.ml.feature.QuantileDiscretizer.fit(QuantileDiscretizer.scala:231) > ... 49 elided > java.lang.IllegalArgumentException: quantileDiscretizer_479bb5a3ca99 > parameter splits given invalid value [-Infinity,-0.9986765732730827,..., > -0.0, 0.0, ..., 0.9907184077958491,Infinity] > 0.0 > -0.0 is False, which break the paremater validation check. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31676) QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0)
[ https://issues.apache.org/jira/browse/SPARK-31676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104269#comment-17104269 ] Apache Spark commented on SPARK-31676: -- User 'WeichenXu123' has created a pull request for this issue: https://github.com/apache/spark/pull/28498 > QuantileDiscretizer raise error parameter splits given invalid value (splits > array includes -0.0 and 0.0) > - > > Key: SPARK-31676 > URL: https://issues.apache.org/jira/browse/SPARK-31676 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.4.5, 3.0.0 >Reporter: Weichen Xu >Priority: Major > > Reproduce code > {code} > import scala.util.Random > val rng = new Random(3) > val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ > Array.fill(20)(0.0) ++ Array.fill(20)(-0.0) > import spark.implicits._ > val df1 = sc.parallelize(a1, 2).toDF("id") > import org.apache.spark.ml.feature.QuantileDiscretizer > val qd = new > QuantileDiscretizer().setInputCol("id").setOutputCol("out").setNumBuckets(200).setRelativeError(0.0) > val model = qd.fit(df1) > {code} > Raise error like: > at org.apache.spark.ml.param.Param.validate(params.scala:76) > at org.apache.spark.ml.param.ParamPair.(params.scala:634) > at org.apache.spark.ml.param.Param.$minus$greater(params.scala:85) > at org.apache.spark.ml.param.Params.set(params.scala:713) > at org.apache.spark.ml.param.Params.set$(params.scala:712) > at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:41) > at org.apache.spark.ml.feature.Bucketizer.setSplits(Bucketizer.scala:77) > at > org.apache.spark.ml.feature.QuantileDiscretizer.fit(QuantileDiscretizer.scala:231) > ... 49 elided > java.lang.IllegalArgumentException: quantileDiscretizer_479bb5a3ca99 > parameter splits given invalid value [-Infinity,-0.9986765732730827,..., > -0.0, 0.0, ..., 0.9907184077958491,Infinity] > 0.0 > -0.0 is False, which break the paremater validation check. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31676) QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0)
[ https://issues.apache.org/jira/browse/SPARK-31676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31676: Assignee: (was: Apache Spark) > QuantileDiscretizer raise error parameter splits given invalid value (splits > array includes -0.0 and 0.0) > - > > Key: SPARK-31676 > URL: https://issues.apache.org/jira/browse/SPARK-31676 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.4.5, 3.0.0 >Reporter: Weichen Xu >Priority: Major > > Reproduce code > {code} > import scala.util.Random > val rng = new Random(3) > val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ > Array.fill(20)(0.0) ++ Array.fill(20)(-0.0) > import spark.implicits._ > val df1 = sc.parallelize(a1, 2).toDF("id") > import org.apache.spark.ml.feature.QuantileDiscretizer > val qd = new > QuantileDiscretizer().setInputCol("id").setOutputCol("out").setNumBuckets(200).setRelativeError(0.0) > val model = qd.fit(df1) > {code} > Raise error like: > at org.apache.spark.ml.param.Param.validate(params.scala:76) > at org.apache.spark.ml.param.ParamPair.(params.scala:634) > at org.apache.spark.ml.param.Param.$minus$greater(params.scala:85) > at org.apache.spark.ml.param.Params.set(params.scala:713) > at org.apache.spark.ml.param.Params.set$(params.scala:712) > at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:41) > at org.apache.spark.ml.feature.Bucketizer.setSplits(Bucketizer.scala:77) > at > org.apache.spark.ml.feature.QuantileDiscretizer.fit(QuantileDiscretizer.scala:231) > ... 49 elided > java.lang.IllegalArgumentException: quantileDiscretizer_479bb5a3ca99 > parameter splits given invalid value [-Infinity,-0.9986765732730827,..., > -0.0, 0.0, ..., 0.9907184077958491,Infinity] > 0.0 > -0.0 is False, which break the paremater validation check. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31676) QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0)
[ https://issues.apache.org/jira/browse/SPARK-31676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-31676: --- Description: Reproduce code {code: scala} import scala.util.Random val rng = new Random(3) val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ Array.fill(20)(0.0) ++ Array.fill(20)(-0.0) import spark.implicits._ val df1 = sc.parallelize(a1, 2).toDF("id") import org.apache.spark.ml.feature.QuantileDiscretizer val qd = new QuantileDiscretizer().setInputCol("id").setOutputCol("out").setNumBuckets(200).setRelativeError(0.0) val model = qd.fit(df1) {code} Raise error like: at org.apache.spark.ml.param.Param.validate(params.scala:76) at org.apache.spark.ml.param.ParamPair.(params.scala:634) at org.apache.spark.ml.param.Param.$minus$greater(params.scala:85) at org.apache.spark.ml.param.Params.set(params.scala:713) at org.apache.spark.ml.param.Params.set$(params.scala:712) at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:41) at org.apache.spark.ml.feature.Bucketizer.setSplits(Bucketizer.scala:77) at org.apache.spark.ml.feature.QuantileDiscretizer.fit(QuantileDiscretizer.scala:231) ... 49 elided java.lang.IllegalArgumentException: quantileDiscretizer_479bb5a3ca99 parameter splits given invalid value [-Infinity,-0.9986765732730827,..., -0.0, 0.0, ..., 0.9907184077958491,Infinity] 0.0 > -0.0 is False, which break the paremater validation check. was: Reproduce code {code: scala} import scala.util.Random val rng = new Random(3) val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ Array.fill(20)(0.0) ++ Array.fill(20)(-0.0) import spark.implicits._ val df1 = sc.parallelize(a1, 2).toDF("id") import org.apache.spark.ml.feature.QuantileDiscretizer val qd = new QuantileDiscretizer().setInputCol("id").setOutputCol("out").setNumBuckets(200).setRelativeError(0.0) val model = qd.fit(df1) {code} Raise error like: at org.apache.spark.ml.param.Param.validate(params.scala:76) at org.apache.spark.ml.param.ParamPair.(params.scala:634) at org.apache.spark.ml.param.Param.$minus$greater(params.scala:85) at org.apache.spark.ml.param.Params.set(params.scala:713) at org.apache.spark.ml.param.Params.set$(params.scala:712) at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:41) at org.apache.spark.ml.feature.Bucketizer.setSplits(Bucketizer.scala:77) at org.apache.spark.ml.feature.QuantileDiscretizer.fit(QuantileDiscretizer.scala:231) ... 49 elided java.lang.IllegalArgumentException: quantileDiscretizer_479bb5a3ca99 parameter splits given invalid value [-Infinity,-0.9986765732730827,..., -0.0, 0.0, ..., 0.9907184077958491,Infinity] 0.0 > -0.0 is False, which break the paremater validation check. 
> QuantileDiscretizer raise error parameter splits given invalid value (splits > array includes -0.0 and 0.0) > - > > Key: SPARK-31676 > URL: https://issues.apache.org/jira/browse/SPARK-31676 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.4.5, 3.0.0 >Reporter: Weichen Xu >Priority: Major > > Reproduce code > {code: scala} > import scala.util.Random > val rng = new Random(3) > val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ > Array.fill(20)(0.0) ++ Array.fill(20)(-0.0) > import spark.implicits._ > val df1 = sc.parallelize(a1, 2).toDF("id") > import org.apache.spark.ml.feature.QuantileDiscretizer > val qd = new > QuantileDiscretizer().setInputCol("id").setOutputCol("out").setNumBuckets(200).setRelativeError(0.0) > val model = qd.fit(df1) > {code} > Raise error like: > at org.apache.spark.ml.param.Param.validate(params.scala:76) > at org.apache.spark.ml.param.ParamPair.(params.scala:634) > at org.apache.spark.ml.param.Param.$minus$greater(params.scala:85) > at org.apache.spark.ml.param.Params.set(params.scala:713) > at org.apache.spark.ml.param.Params.set$(params.scala:712) > at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:41) > at org.apache.spark.ml.feature.Bucketizer.setSplits(Bucketizer.scala:77) > at > org.apache.spark.ml.feature.QuantileDiscretizer.fit(QuantileDiscretizer.scala:231) > ... 49 elided > java.lang.IllegalArgumentException: quantileDiscretizer_479bb5a3ca99 > parameter splits given invalid value [-Infinity,-0.9986765732730827,..., > -0.0, 0.0, ..., 0.9907184077958491,Infinity] > 0.0 > -0.0 is False, which break the paremater validation check. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31676) QuantileDiscretizer raise error parameter splits given invalid value (splits array includes -0.0 and 0.0)
[ https://issues.apache.org/jira/browse/SPARK-31676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Weichen Xu updated SPARK-31676: --- Description: Reproduce code {code} import scala.util.Random val rng = new Random(3) val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ Array.fill(20)(0.0) ++ Array.fill(20)(-0.0) import spark.implicits._ val df1 = sc.parallelize(a1, 2).toDF("id") import org.apache.spark.ml.feature.QuantileDiscretizer val qd = new QuantileDiscretizer().setInputCol("id").setOutputCol("out").setNumBuckets(200).setRelativeError(0.0) val model = qd.fit(df1) {code} Raise error like: at org.apache.spark.ml.param.Param.validate(params.scala:76) at org.apache.spark.ml.param.ParamPair.(params.scala:634) at org.apache.spark.ml.param.Param.$minus$greater(params.scala:85) at org.apache.spark.ml.param.Params.set(params.scala:713) at org.apache.spark.ml.param.Params.set$(params.scala:712) at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:41) at org.apache.spark.ml.feature.Bucketizer.setSplits(Bucketizer.scala:77) at org.apache.spark.ml.feature.QuantileDiscretizer.fit(QuantileDiscretizer.scala:231) ... 49 elided java.lang.IllegalArgumentException: quantileDiscretizer_479bb5a3ca99 parameter splits given invalid value [-Infinity,-0.9986765732730827,..., -0.0, 0.0, ..., 0.9907184077958491,Infinity] 0.0 > -0.0 is False, which break the paremater validation check. was: Reproduce code {code: scala} import scala.util.Random val rng = new Random(3) val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ Array.fill(20)(0.0) ++ Array.fill(20)(-0.0) import spark.implicits._ val df1 = sc.parallelize(a1, 2).toDF("id") import org.apache.spark.ml.feature.QuantileDiscretizer val qd = new QuantileDiscretizer().setInputCol("id").setOutputCol("out").setNumBuckets(200).setRelativeError(0.0) val model = qd.fit(df1) {code} Raise error like: at org.apache.spark.ml.param.Param.validate(params.scala:76) at org.apache.spark.ml.param.ParamPair.(params.scala:634) at org.apache.spark.ml.param.Param.$minus$greater(params.scala:85) at org.apache.spark.ml.param.Params.set(params.scala:713) at org.apache.spark.ml.param.Params.set$(params.scala:712) at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:41) at org.apache.spark.ml.feature.Bucketizer.setSplits(Bucketizer.scala:77) at org.apache.spark.ml.feature.QuantileDiscretizer.fit(QuantileDiscretizer.scala:231) ... 49 elided java.lang.IllegalArgumentException: quantileDiscretizer_479bb5a3ca99 parameter splits given invalid value [-Infinity,-0.9986765732730827,..., -0.0, 0.0, ..., 0.9907184077958491,Infinity] 0.0 > -0.0 is False, which break the paremater validation check. 
> QuantileDiscretizer raise error parameter splits given invalid value (splits > array includes -0.0 and 0.0) > - > > Key: SPARK-31676 > URL: https://issues.apache.org/jira/browse/SPARK-31676 > Project: Spark > Issue Type: Bug > Components: ML >Affects Versions: 2.4.5, 3.0.0 >Reporter: Weichen Xu >Priority: Major > > Reproduce code > {code} > import scala.util.Random > val rng = new Random(3) > val a1 = Array.tabulate(200)(_=>rng.nextDouble * 2.0 - 1.0) ++ > Array.fill(20)(0.0) ++ Array.fill(20)(-0.0) > import spark.implicits._ > val df1 = sc.parallelize(a1, 2).toDF("id") > import org.apache.spark.ml.feature.QuantileDiscretizer > val qd = new > QuantileDiscretizer().setInputCol("id").setOutputCol("out").setNumBuckets(200).setRelativeError(0.0) > val model = qd.fit(df1) > {code} > Raise error like: > at org.apache.spark.ml.param.Param.validate(params.scala:76) > at org.apache.spark.ml.param.ParamPair.(params.scala:634) > at org.apache.spark.ml.param.Param.$minus$greater(params.scala:85) > at org.apache.spark.ml.param.Params.set(params.scala:713) > at org.apache.spark.ml.param.Params.set$(params.scala:712) > at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:41) > at org.apache.spark.ml.feature.Bucketizer.setSplits(Bucketizer.scala:77) > at > org.apache.spark.ml.feature.QuantileDiscretizer.fit(QuantileDiscretizer.scala:231) > ... 49 elided > java.lang.IllegalArgumentException: quantileDiscretizer_479bb5a3ca99 > parameter splits given invalid value [-Infinity,-0.9986765732730827,..., > -0.0, 0.0, ..., 0.9907184077958491,Infinity] > 0.0 > -0.0 is False, which break the paremater validation check. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31677) Use KVStore to cache stream query progress
[ https://issues.apache.org/jira/browse/SPARK-31677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104265#comment-17104265 ] Apache Spark commented on SPARK-31677: -- User 'uncleGen' has created a pull request for this issue: https://github.com/apache/spark/pull/28497 > Use KVStore to cache stream query progress > -- > > Key: SPARK-31677 > URL: https://issues.apache.org/jira/browse/SPARK-31677 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.4.5, 3.0.0 >Reporter: Genmao Yu >Priority: Major > > 1. Streaming query progress information are cached twice in *StreamExecution* > and *StreamingQueryStatusListener*. It is memory-wasting. We can make this > two usage unified. > 2. Use *KVStore* instead to cache streaming query progress information. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31677) Use KVStore to cache stream query progress
[ https://issues.apache.org/jira/browse/SPARK-31677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31677: Assignee: Apache Spark > Use KVStore to cache stream query progress > -- > > Key: SPARK-31677 > URL: https://issues.apache.org/jira/browse/SPARK-31677 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.4.5, 3.0.0 >Reporter: Genmao Yu >Assignee: Apache Spark >Priority: Major > > 1. Streaming query progress information are cached twice in *StreamExecution* > and *StreamingQueryStatusListener*. It is memory-wasting. We can make this > two usage unified. > 2. Use *KVStore* instead to cache streaming query progress information. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Assigned] (SPARK-31677) Use KVStore to cache stream query progress
[ https://issues.apache.org/jira/browse/SPARK-31677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31677: Assignee: (was: Apache Spark) > Use KVStore to cache stream query progress > -- > > Key: SPARK-31677 > URL: https://issues.apache.org/jira/browse/SPARK-31677 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.4.5, 3.0.0 >Reporter: Genmao Yu >Priority: Major > > 1. Streaming query progress information are cached twice in *StreamExecution* > and *StreamingQueryStatusListener*. It is memory-wasting. We can make this > two usage unified. > 2. Use *KVStore* instead to cache streaming query progress information. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31677) Use KVStore to cache stream query progress
[ https://issues.apache.org/jira/browse/SPARK-31677?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104264#comment-17104264 ] Apache Spark commented on SPARK-31677: -- User 'uncleGen' has created a pull request for this issue: https://github.com/apache/spark/pull/28497 > Use KVStore to cache stream query progress > -- > > Key: SPARK-31677 > URL: https://issues.apache.org/jira/browse/SPARK-31677 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.4.5, 3.0.0 >Reporter: Genmao Yu >Priority: Major > > 1. Streaming query progress information are cached twice in *StreamExecution* > and *StreamingQueryStatusListener*. It is memory-wasting. We can make this > two usage unified. > 2. Use *KVStore* instead to cache streaming query progress information. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31677) Use KVStore to cache stream query progress
[ https://issues.apache.org/jira/browse/SPARK-31677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated SPARK-31677: -- Environment: (was: 1. Streaming query progress information are cached twice in *StreamExecution* and *StreamingQueryStatusListener*. It is memory-wasting. We can make this two usage unified. 2. Use *KVStore* instead to cache streaming query progress information.) > Use KVStore to cache stream query progress > -- > > Key: SPARK-31677 > URL: https://issues.apache.org/jira/browse/SPARK-31677 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.4.5, 3.0.0 >Reporter: Genmao Yu >Priority: Major > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Updated] (SPARK-31677) Use KVStore to cache stream query progress
[ https://issues.apache.org/jira/browse/SPARK-31677?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Genmao Yu updated SPARK-31677: -- Description: 1. Streaming query progress information is cached twice, in *StreamExecution* and *StreamingQueryStatusListener*, which wastes memory. We can unify these two usages. 2. Use *KVStore* instead to cache streaming query progress information. > Use KVStore to cache stream query progress > -- > > Key: SPARK-31677 > URL: https://issues.apache.org/jira/browse/SPARK-31677 > Project: Spark > Issue Type: Improvement > Components: Structured Streaming >Affects Versions: 2.4.5, 3.0.0 >Reporter: Genmao Yu >Priority: Major > > 1. Streaming query progress information is cached twice, in *StreamExecution* > and *StreamingQueryStatusListener*, which wastes memory. We can unify these > two usages. > 2. Use *KVStore* instead to cache streaming query progress information. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Created] (SPARK-31677) Use KVStore to cache stream query progress
Genmao Yu created SPARK-31677: - Summary: Use KVStore to cache stream query progress Key: SPARK-31677 URL: https://issues.apache.org/jira/browse/SPARK-31677 Project: Spark Issue Type: Improvement Components: Structured Streaming Affects Versions: 2.4.5, 3.0.0 Environment: 1. Streaming query progress information is cached twice, in *StreamExecution* and *StreamingQueryStatusListener*, which wastes memory. We can unify these two usages. 2. Use *KVStore* instead to cache streaming query progress information. Reporter: Genmao Yu -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
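For readers skimming the thread, the sketch below shows roughly what caching progress events in a single in-memory KVStore could look like. It is an illustration only, not the SPARK-31677 change itself: the ProgressEntry wrapper and its key format are hypothetical, and the store API is assumed to match org.apache.spark.util.kvstore as shipped with Spark.
{code:scala}
// Hedged sketch: keep one bounded copy of streaming progress updates in an
// in-memory KVStore instead of duplicating them in StreamExecution and
// StreamingQueryStatusListener. ProgressEntry is a made-up wrapper class.
import org.apache.spark.util.kvstore.{InMemoryStore, KVIndex}

class ProgressEntry(val queryId: String, val batchId: Long, val json: String) {
  // Natural key the KVStore uses to index and look up entries.
  @KVIndex def id: String = s"$queryId-$batchId"
}

val store = new InMemoryStore()

// Write a progress update; json would normally come from StreamingQueryProgress.json.
store.write(new ProgressEntry("query-1", 42L, """{"batchId": 42}"""))

// Read a single entry back by its natural key, or count what is currently cached.
val entry = store.read(classOf[ProgressEntry], "query-1-42")
val cached = store.count(classOf[ProgressEntry])
{code}
Sharing one store like this between the execution and the listener is the memory saving the ticket is after; capping and evicting old entries would still need to be handled by the caller.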
[jira] [Created] (SPARK-31676) QuantileDiscretizer raises error: parameter splits given invalid value (splits array includes -0.0 and 0.0)
Weichen Xu created SPARK-31676: -- Summary: QuantileDiscretizer raises error: parameter splits given invalid value (splits array includes -0.0 and 0.0) Key: SPARK-31676 URL: https://issues.apache.org/jira/browse/SPARK-31676 Project: Spark Issue Type: Bug Components: ML Affects Versions: 2.4.5, 3.0.0 Reporter: Weichen Xu Reproduce code:
{code:scala}
import scala.util.Random
val rng = new Random(3)
val a1 = Array.tabulate(200)(_ => rng.nextDouble * 2.0 - 1.0) ++ Array.fill(20)(0.0) ++ Array.fill(20)(-0.0)

import spark.implicits._
val df1 = sc.parallelize(a1, 2).toDF("id")

import org.apache.spark.ml.feature.QuantileDiscretizer
val qd = new QuantileDiscretizer().setInputCol("id").setOutputCol("out").setNumBuckets(200).setRelativeError(0.0)
val model = qd.fit(df1)
{code}
This raises an error like:
{noformat}
java.lang.IllegalArgumentException: quantileDiscretizer_479bb5a3ca99 parameter splits given invalid value [-Infinity,-0.9986765732730827,..., -0.0, 0.0, ..., 0.9907184077958491,Infinity]
  at org.apache.spark.ml.param.Param.validate(params.scala:76)
  at org.apache.spark.ml.param.ParamPair.<init>(params.scala:634)
  at org.apache.spark.ml.param.Param.$minus$greater(params.scala:85)
  at org.apache.spark.ml.param.Params.set(params.scala:713)
  at org.apache.spark.ml.param.Params.set$(params.scala:712)
  at org.apache.spark.ml.PipelineStage.set(Pipeline.scala:41)
  at org.apache.spark.ml.feature.Bucketizer.setSplits(Bucketizer.scala:77)
  at org.apache.spark.ml.feature.QuantileDiscretizer.fit(QuantileDiscretizer.scala:231)
  ... 49 elided
{noformat}
0.0 > -0.0 is false, which breaks the parameter validation check. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
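The last sentence of the report can be demonstrated without Spark at all: IEEE 754 treats -0.0 and 0.0 as equal, so any strictly-increasing check over a splits array containing both values has to fail. The snippet below is only an illustration of that check and of one possible normalization; it is not QuantileDiscretizer's actual validation code.
{code:scala}
// Plain Scala illustration of the failing check (not Spark's validator).
val splits = Array(Double.NegativeInfinity, -0.5, -0.0, 0.0, 0.5, Double.PositiveInfinity)

// A strictly-increasing check, as a splits validator would apply it.
def strictlyIncreasing(xs: Array[Double]): Boolean =
  xs.sliding(2).forall { case Array(a, b) => a < b }

println(-0.0 == 0.0)                // true: IEEE 754 equality
println(0.0 > -0.0)                 // false: the pair is not strictly increasing
println(strictlyIncreasing(splits)) // false: such splits would be rejected

// One possible caller-side fix: collapse -0.0 to 0.0 before validating.
val normalized = splits.map(v => if (v == 0.0) 0.0 else v).distinct
println(strictlyIncreasing(normalized)) // true
{code}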
[jira] [Updated] (SPARK-31675) Fail to insert data into a table with a remote location, caused by the Hive encryption check
[ https://issues.apache.org/jira/browse/SPARK-31675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kent Yao updated SPARK-31675: - Description: Before this fix https://issues.apache.org/jira/browse/HIVE-14380 in Hive 2.2.0, when moving files from the staging dir to the final table dir, Hive will do an encryption check on the srcPaths and destPaths:
{code:java}
// Some comments here
if (!isSrcLocal) {
  // For NOT local src file, rename the file
  if (hdfsEncryptionShim != null
      && (hdfsEncryptionShim.isPathEncrypted(srcf) || hdfsEncryptionShim.isPathEncrypted(destf))
      && !hdfsEncryptionShim.arePathsOnSameEncryptionZone(srcf, destf)) {
    LOG.info("Copying source " + srcf + " to " + destf + " because HDFS encryption zones are different.");
    success = FileUtils.copy(srcf.getFileSystem(conf), srcf, destf.getFileSystem(conf), destf,
        true,    // delete source
        replace, // overwrite destination
        conf);
  } else {
{code}
The hdfsEncryptionShim instance holds a global FileSystem instance belonging to the default FileSystem. It causes failures when checking a path that belongs to a remote file system. For example:
{code:sql}
key  int  NULL

# Detailed Table Information
Database  bdms_hzyaoqin_test_2
Table  abc
Owner  bdms_hzyaoqin
Created Time  Mon May 11 15:14:15 CST 2020
Last Access  Thu Jan 01 08:00:00 CST 1970
Created By  Spark 2.4.3
Type  MANAGED
Provider  hive
Table Properties  [transient_lastDdlTime=1589181255]
Location  hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc
Serde Library  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat  org.apache.hadoop.mapred.TextInputFormat
OutputFormat  org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Storage Properties  [serialization.format=1]
Partition Provider  Catalog
Time taken: 0.224 seconds, Fetched 18 row(s)
{code}
The table abc belongs to the remote hdfs 'hdfs://cluster2', and when we run the command below via a Spark SQL job whose default fs is 'hdfs://cluster1':
{code:sql}
insert into bdms_hzyaoqin_test_2.abc values(1);
{code}
it fails with:
{code:java}
Error in query: java.lang.IllegalArgumentException: Wrong FS: hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc/.hive-staging_hive_2020-05-11_17-10-27_123_6306294638950056285-1/-ext-1/part-0-badf2a31-ab36-4b60-82a1-0848774e4af5-c000, expected: hdfs://cluster1
{code}

was:
Before this fix https://issues.apache.org/jira/browse/HIVE-14380 in Hive 2.2.0, when moving files from the staging dir to the final table dir, Hive will do an encryption check on the srcPaths and destPaths:
{code:java}
// Some comments here
if (!isSrcLocal) {
  // For NOT local src file, rename the file
  if (hdfsEncryptionShim != null
      && (hdfsEncryptionShim.isPathEncrypted(srcf) || hdfsEncryptionShim.isPathEncrypted(destf))
      && !hdfsEncryptionShim.arePathsOnSameEncryptionZone(srcf, destf)) {
    LOG.info("Copying source " + srcf + " to " + destf + " because HDFS encryption zones are different.");
    success = FileUtils.copy(srcf.getFileSystem(conf), srcf, destf.getFileSystem(conf), destf,
        true,    // delete source
        replace, // overwrite destination
        conf);
  } else {
{code}
The hdfsEncryptionShim instance holds a global FileSystem instance belonging to the default FileSystem. It causes failures when checking a path that belongs to a remote file system. For example:
{code:sql}
key  int  NULL

# Detailed Table Information
Database  bdms_hzyaoqin_test_2
Table  abc
Owner  bdms_hzyaoqin
Created Time  Mon May 11 15:14:15 CST 2020
Last Access  Thu Jan 01 08:00:00 CST 1970
Created By  Spark 2.4.3
Type  MANAGED
Provider  hive
Table Properties  [transient_lastDdlTime=1589181255]
Location  hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc
Serde Library  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat  org.apache.hadoop.mapred.TextInputFormat
OutputFormat  org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Storage Properties  [serialization.format=1]
Partition Provider  Catalog
Time taken: 0.224 seconds, Fetched 18 row(s)
{code}
The table abc belongs to the remote hdfs 'hdfs://cluster2', and when we run the command below via a Spark SQL job whose default fs is 'hdfs://cluster1':
{code:sql}
insert into bdms_hzyaoqin_test_2.abc values(1);
{code}
it fails with:
{code:java}
Error in query: java.lang.IllegalArgumentException: Wrong FS: hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc/.hive-staging_hive_2020-05-11_17-10-27_123_6306294638950056285-1/-ext-1/part-0-badf2a31-ab36-4b60-82a1-0848774e4af5-c000, expected: hdfs://cluster1

// Some comments here
public String getFoo() {
  return foo;
}
{code}
> Fail to inse
[jira] [Created] (SPARK-31675) Fail to insert data into a table with a remote location, caused by the Hive encryption check
Kent Yao created SPARK-31675: Summary: Fail to insert data into a table with a remote location, caused by the Hive encryption check Key: SPARK-31675 URL: https://issues.apache.org/jira/browse/SPARK-31675 Project: Spark Issue Type: Bug Components: SQL Affects Versions: 2.4.6, 3.0.0, 3.1.0 Reporter: Kent Yao Before this fix https://issues.apache.org/jira/browse/HIVE-14380 in Hive 2.2.0, when moving files from the staging dir to the final table dir, Hive will do an encryption check on the srcPaths and destPaths:
{code:java}
// Some comments here
if (!isSrcLocal) {
  // For NOT local src file, rename the file
  if (hdfsEncryptionShim != null
      && (hdfsEncryptionShim.isPathEncrypted(srcf) || hdfsEncryptionShim.isPathEncrypted(destf))
      && !hdfsEncryptionShim.arePathsOnSameEncryptionZone(srcf, destf)) {
    LOG.info("Copying source " + srcf + " to " + destf + " because HDFS encryption zones are different.");
    success = FileUtils.copy(srcf.getFileSystem(conf), srcf, destf.getFileSystem(conf), destf,
        true,    // delete source
        replace, // overwrite destination
        conf);
  } else {
{code}
The hdfsEncryptionShim instance holds a global FileSystem instance belonging to the default FileSystem. It causes failures when checking a path that belongs to a remote file system. For example:
{code:sql}
key  int  NULL

# Detailed Table Information
Database  bdms_hzyaoqin_test_2
Table  abc
Owner  bdms_hzyaoqin
Created Time  Mon May 11 15:14:15 CST 2020
Last Access  Thu Jan 01 08:00:00 CST 1970
Created By  Spark 2.4.3
Type  MANAGED
Provider  hive
Table Properties  [transient_lastDdlTime=1589181255]
Location  hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc
Serde Library  org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe
InputFormat  org.apache.hadoop.mapred.TextInputFormat
OutputFormat  org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat
Storage Properties  [serialization.format=1]
Partition Provider  Catalog
Time taken: 0.224 seconds, Fetched 18 row(s)
{code}
The table abc belongs to the remote hdfs 'hdfs://cluster2', and when we run the command below via a Spark SQL job whose default fs is 'hdfs://cluster1':
{code:sql}
insert into bdms_hzyaoqin_test_2.abc values(1);
{code}
it fails with:
{code:java}
Error in query: java.lang.IllegalArgumentException: Wrong FS: hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc/.hive-staging_hive_2020-05-11_17-10-27_123_6306294638950056285-1/-ext-1/part-0-badf2a31-ab36-4b60-82a1-0848774e4af5-c000, expected: hdfs://cluster1

// Some comments here
public String getFoo() {
  return foo;
}
{code}
-- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
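The 'Wrong FS' failure above comes down to resolving a path against the default FileSystem instead of the FileSystem the path actually lives on. The sketch below only illustrates that distinction with the Hadoop FileSystem API, reusing the cluster names from the example; it is not the SPARK-31675 patch.
{code:scala}
// Illustration of the mismatch, assuming fs.defaultFS points at hdfs://cluster1.
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.fs.{FileSystem, Path}

val conf = new Configuration()
val remotePath = new Path("hdfs://cluster2/user/warehouse/bdms_hzyaoqin_test.db/abc")

// The default FileSystem is bound to hdfs://cluster1, so handing it a
// hdfs://cluster2 path trips FileSystem#checkPath with "Wrong FS: ...".
val defaultFs = FileSystem.get(conf)
// defaultFs.exists(remotePath)   // would throw IllegalArgumentException: Wrong FS

// Resolving the FileSystem from the path itself picks the right cluster.
val pathFs = remotePath.getFileSystem(conf)
pathFs.exists(remotePath)
{code}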
[jira] [Commented] (SPARK-31634) "show tables like" support for SQL wildcard characters (% and _)
[ https://issues.apache.org/jira/browse/SPARK-31634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104223#comment-17104223 ] pavithra ramachandran commented on SPARK-31634: --- [~yumwang] I see that SHOW TABLES uses the catalog, and there is an open Jira on the Hive side. Once that gets fixed, it will work in Spark. Or do you want us to handle it separately in Spark? > "show tables like" support for SQL wildcard characters (% and _) > > > Key: SPARK-31634 > URL: https://issues.apache.org/jira/browse/SPARK-31634 > Project: Spark > Issue Type: Improvement > Components: SQL >Affects Versions: 3.1.0 >Reporter: Yuming Wang >Priority: Major > > https://docs.snowflake.com/en/sql-reference/sql/show-tables.html > https://clickhouse.tech/docs/en/sql-reference/statements/show/ > https://www.mysqltutorial.org/mysql-show-tables/ > https://issues.apache.org/jira/browse/HIVE-23359 -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
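For context, the snippet below sketches the behaviour being requested: SQL-style wildcards ('%' for any sequence of characters, '_' for exactly one) in SHOW TABLES LIKE. The table names are made up, and at the time of this ticket Spark's pattern syntax accepts '*' and '|' rather than the SQL wildcards, so this is illustrative only.
{code:scala}
// Requested behaviour (illustrative only): SQL wildcards in SHOW TABLES LIKE.
spark.sql("CREATE TABLE sales_2019 (id INT) USING parquet")
spark.sql("CREATE TABLE sales_2020 (id INT) USING parquet")

// '%' should match any character sequence: both tables are listed.
spark.sql("SHOW TABLES LIKE 'sales%'").show()

// '_' should match exactly one character: only sales_2019 is listed.
spark.sql("SHOW TABLES LIKE 'sales_201_'").show()
{code}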
[jira] [Updated] (SPARK-30331) The final AdaptiveSparkPlan event is not marked with `isFinalPlan=true`
[ https://issues.apache.org/jira/browse/SPARK-30331?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manu Zhang updated SPARK-30331: --- Parent: SPARK-31412 Issue Type: Sub-task (was: Bug) > The final AdaptiveSparkPlan event is not marked with `isFinalPlan=true` > --- > > Key: SPARK-30331 > URL: https://issues.apache.org/jira/browse/SPARK-30331 > Project: Spark > Issue Type: Sub-task > Components: SQL >Affects Versions: 3.0.0 >Reporter: Manu Zhang >Assignee: Manu Zhang >Priority: Minor > Fix For: 3.0.0 > > > This is because the final AdaptiveSparkPlan event is sent out before the > {{isFinalPlan}} variable is set to `true`. It would fail any listener attempting > to catch the final event by pattern matching `isFinalPlan=true`. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
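The kind of listener the description refers to would look roughly like the sketch below. It assumes the SQL UI event class org.apache.spark.sql.execution.ui.SparkListenerSQLAdaptiveExecutionUpdate and matches 'isFinalPlan=true' in the plan description; with the bug, the last update is emitted while the flag still reads false, so the match never fires.
{code:scala}
// Hedged sketch of a listener trying to catch the final adaptive plan.
import org.apache.spark.scheduler.{SparkListener, SparkListenerEvent}
import org.apache.spark.sql.execution.ui.SparkListenerSQLAdaptiveExecutionUpdate

class FinalPlanListener extends SparkListener {
  override def onOtherEvent(event: SparkListenerEvent): Unit = event match {
    case e: SparkListenerSQLAdaptiveExecutionUpdate
        if e.physicalPlanDescription.contains("isFinalPlan=true") =>
      // Before the fix this branch is never reached: the last update is sent
      // out before isFinalPlan flips to true.
      println(s"Final plan for execution ${e.executionId}")
    case _ => // ignore other events
  }
}

// Registration: spark.sparkContext.addSparkListener(new FinalPlanListener())
{code}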
[jira] [Updated] (SPARK-31658) SQL UI doesn't show write commands of AQE plan
[ https://issues.apache.org/jira/browse/SPARK-31658?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Manu Zhang updated SPARK-31658: --- Parent: SPARK-31412 Issue Type: Sub-task (was: Improvement) > SQL UI doesn't show write commands of AQE plan > -- > > Key: SPARK-31658 > URL: https://issues.apache.org/jira/browse/SPARK-31658 > Project: Spark > Issue Type: Sub-task > Components: SQL, Web UI >Affects Versions: 3.0.0 >Reporter: Manu Zhang >Assignee: Manu Zhang >Priority: Minor > Fix For: 3.0.0 > > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org For additional commands, e-mail: issues-h...@spark.apache.org
[jira] [Commented] (SPARK-31620) TreeNodeException: Binding attribute, tree: sum#19L
[ https://issues.apache.org/jira/browse/SPARK-31620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104151#comment-17104151 ] Apache Spark commented on SPARK-31620: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/28496 > TreeNodeException: Binding attribute, tree: sum#19L > --- > > Key: SPARK-31620 > URL: https://issues.apache.org/jira/browse/SPARK-31620 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.4, 2.4.5, 3.0.0 >Reporter: Yuming Wang >Priority: Major > > {noformat} > scala> spark.sql("create temporary view t1 as select * from values (1, 2) as > t1(a, b)") > res0: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("create temporary view t2 as select * from values (3, 4) as > t2(c, d)") > res1: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("select sum(if(c > (select a from t1), d, 0)) as csum from > t2").show > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute, tree: sum#19L > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:75) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:399) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:237) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:397) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:350) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:368) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:427) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:427) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:237) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:397) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:350) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:298) > at > 
org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:74) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.$anonfun$bindReferences$1(BoundAttribute.scala:96) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReferences(BoundAttribute.scala:96) > at > org.apache.spark.sql.execution.aggregate.HashAggregateExec.$anonfun$doConsumeWithoutKeys$4(HashAggregateExec.scala:348) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.executio
[jira] [Assigned] (SPARK-31620) TreeNodeException: Binding attribute, tree: sum#19L
[ https://issues.apache.org/jira/browse/SPARK-31620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31620: Assignee: Apache Spark > TreeNodeException: Binding attribute, tree: sum#19L > --- > > Key: SPARK-31620 > URL: https://issues.apache.org/jira/browse/SPARK-31620 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.4, 2.4.5, 3.0.0 >Reporter: Yuming Wang >Assignee: Apache Spark >Priority: Major > > {noformat} > scala> spark.sql("create temporary view t1 as select * from values (1, 2) as > t1(a, b)") > res0: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("create temporary view t2 as select * from values (3, 4) as > t2(c, d)") > res1: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("select sum(if(c > (select a from t1), d, 0)) as csum from > t2").show > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute, tree: sum#19L > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:75) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:399) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:237) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:397) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:350) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:368) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:427) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:427) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:237) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:397) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:350) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:298) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:74) > at > 
org.apache.spark.sql.catalyst.expressions.BindReferences$.$anonfun$bindReferences$1(BoundAttribute.scala:96) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReferences(BoundAttribute.scala:96) > at > org.apache.spark.sql.execution.aggregate.HashAggregateExec.$anonfun$doConsumeWithoutKeys$4(HashAggregateExec.scala:348) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.execution.aggregate.HashAggregateExec.doConsumeWithoutKeys(HashAggregateExec.scala:347) > at >
[jira] [Commented] (SPARK-31620) TreeNodeException: Binding attribute, tree: sum#19L
[ https://issues.apache.org/jira/browse/SPARK-31620?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17104148#comment-17104148 ] Apache Spark commented on SPARK-31620: -- User 'Ngone51' has created a pull request for this issue: https://github.com/apache/spark/pull/28496 > TreeNodeException: Binding attribute, tree: sum#19L > --- > > Key: SPARK-31620 > URL: https://issues.apache.org/jira/browse/SPARK-31620 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.4, 2.4.5, 3.0.0 >Reporter: Yuming Wang >Priority: Major > > {noformat} > scala> spark.sql("create temporary view t1 as select * from values (1, 2) as > t1(a, b)") > res0: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("create temporary view t2 as select * from values (3, 4) as > t2(c, d)") > res1: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("select sum(if(c > (select a from t1), d, 0)) as csum from > t2").show > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute, tree: sum#19L > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:75) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:399) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:237) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:397) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:350) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:368) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:427) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:427) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:237) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:397) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:350) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:298) > at > 
org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:74) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.$anonfun$bindReferences$1(BoundAttribute.scala:96) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReferences(BoundAttribute.scala:96) > at > org.apache.spark.sql.execution.aggregate.HashAggregateExec.$anonfun$doConsumeWithoutKeys$4(HashAggregateExec.scala:348) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.executio
[jira] [Assigned] (SPARK-31620) TreeNodeException: Binding attribute, tree: sum#19L
[ https://issues.apache.org/jira/browse/SPARK-31620?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Apache Spark reassigned SPARK-31620: Assignee: (was: Apache Spark) > TreeNodeException: Binding attribute, tree: sum#19L > --- > > Key: SPARK-31620 > URL: https://issues.apache.org/jira/browse/SPARK-31620 > Project: Spark > Issue Type: Bug > Components: SQL >Affects Versions: 2.3.4, 2.4.5, 3.0.0 >Reporter: Yuming Wang >Priority: Major > > {noformat} > scala> spark.sql("create temporary view t1 as select * from values (1, 2) as > t1(a, b)") > res0: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("create temporary view t2 as select * from values (3, 4) as > t2(c, d)") > res1: org.apache.spark.sql.DataFrame = [] > scala> spark.sql("select sum(if(c > (select a from t1), d, 0)) as csum from > t2").show > org.apache.spark.sql.catalyst.errors.package$TreeNodeException: Binding > attribute, tree: sum#19L > at > org.apache.spark.sql.catalyst.errors.package$.attachTree(package.scala:56) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:75) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$$anonfun$bindReference$1.applyOrElse(BoundAttribute.scala:74) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$1(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:72) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:309) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:399) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:237) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:397) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:350) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$transformDown$3(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChild$2(TreeNode.scala:368) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$4(TreeNode.scala:427) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at > scala.collection.IndexedSeqOptimized.foreach(IndexedSeqOptimized.scala:36) > at > scala.collection.IndexedSeqOptimized.foreach$(IndexedSeqOptimized.scala:33) > at scala.collection.mutable.WrappedArray.foreach(WrappedArray.scala:38) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.AbstractTraversable.map(Traversable.scala:108) > at > org.apache.spark.sql.catalyst.trees.TreeNode.$anonfun$mapChildren$1(TreeNode.scala:427) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:237) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:397) > at > org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:350) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:314) > at > org.apache.spark.sql.catalyst.trees.TreeNode.transform(TreeNode.scala:298) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReference(BoundAttribute.scala:74) > at > 
org.apache.spark.sql.catalyst.expressions.BindReferences$.$anonfun$bindReferences$1(BoundAttribute.scala:96) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.catalyst.expressions.BindReferences$.bindReferences(BoundAttribute.scala:96) > at > org.apache.spark.sql.execution.aggregate.HashAggregateExec.$anonfun$doConsumeWithoutKeys$4(HashAggregateExec.scala:348) > at > scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:238) > at scala.collection.immutable.List.foreach(List.scala:392) > at scala.collection.TraversableLike.map(TraversableLike.scala:238) > at scala.collection.TraversableLike.map$(TraversableLike.scala:231) > at scala.collection.immutable.List.map(List.scala:298) > at > org.apache.spark.sql.execution.aggregate.HashAggregateExec.doConsumeWithoutKeys(HashAggregateExec.scala:347) > at > org.apache.spark.sql.exe