[jira] [Resolved] (SPARK-31615) Pretty string output for sql method of RuntimeReplaceable expressions

2020-05-06 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31615?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-31615.
--
Fix Version/s: 3.1.0
 Assignee: Kent Yao
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/28420

> Pretty string output for sql method of RuntimeReplaceable expressions
> -
>
> Key: SPARK-31615
> URL: https://issues.apache.org/jira/browse/SPARK-31615
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Minor
> Fix For: 3.1.0
>
>
> RuntimeReplaceable expressions are replaced at runtime, so their original 
> parameters are never resolved to PrettyAttribute; if we implement their `sql` 
> methods directly on top of their parameters' `sql` methods, the output remains 
> a debug-style string.
> e.g. the `sql` method of `to_timestamp` (see the sketch below)
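> 
> A minimal spark-shell sketch of the symptom (assuming an active `spark` 
> session; the pre-fix output is paraphrased here, not quoted from Spark): the 
> auto-generated column name is derived from the expression's `sql`, so a 
> RuntimeReplaceable expression such as `to_timestamp` can surface an internal, 
> debug-style string instead of a pretty `to_timestamp('2020-05-06')` form.
> {code:scala}
> spark.sql("SELECT to_timestamp('2020-05-06')").columns.foreach(println)
> {code}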






[jira] [Resolved] (SPARK-31631) Fix test flakiness caused by MiniKdc which throws "address in use" BindException

2020-05-06 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31631?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-31631.
--
Fix Version/s: 3.1.0
 Assignee: Kent Yao
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/28442

> Fix test flakiness caused by MiniKdc which throws "address in use" 
> BindException
> 
>
> Key: SPARK-31631
> URL: https://issues.apache.org/jira/browse/SPARK-31631
> Project: Spark
>  Issue Type: Bug
>  Components: Tests
>Affects Versions: 3.0.0, 3.1.0
>Reporter: Kent Yao
>Assignee: Kent Yao
>Priority: Major
> Fix For: 3.1.0
>
>
> {code:java}
> [info] org.apache.spark.sql.kafka010.KafkaDelegationTokenSuite *** ABORTED 
> *** (15 seconds, 426 milliseconds)
> [info]   java.net.BindException: Address already in use
> [info]   at sun.nio.ch.Net.bind0(Native Method)
> [info]   at sun.nio.ch.Net.bind(Net.java:433)
> [info]   at sun.nio.ch.Net.bind(Net.java:425)
> [info]   at 
> sun.nio.ch.ServerSocketChannelImpl.bind(ServerSocketChannelImpl.java:223)
> [info]   at sun.nio.ch.ServerSocketAdaptor.bind(ServerSocketAdaptor.java:74)
> [info]   at 
> org.apache.mina.transport.socket.nio.NioSocketAcceptor.open(NioSocketAcceptor.java:198)
> [info]   at 
> org.apache.mina.transport.socket.nio.NioSocketAcceptor.open(NioSocketAcceptor.java:51)
> [info]   at 
> org.apache.mina.core.polling.AbstractPollingIoAcceptor.registerHandles(AbstractPollingIoAcceptor.java:547)
> [info]   at 
> org.apache.mina.core.polling.AbstractPollingIoAcceptor.access$400(AbstractPollingIoAcceptor.java:68)
> [info]   at 
> org.apache.mina.core.polling.AbstractPollingIoAcceptor$Acceptor.run(AbstractPollingIoAcceptor.java:422)
> [info]   at 
> org.apache.mina.util.NamePreservingRunnable.run(NamePreservingRunnable.java:64)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> [info]   at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> [info]   at java.lang.Thread.run(Thread.java:748)
> {code}
> This is an issue fixed in Hadoop 2.8.0:
> https://issues.apache.org/jira/browse/HADOOP-12656
> Until we drop Hadoop 2.7.x, we may need to apply the approach from HBase first 
> (see the sketch below):
> https://issues.apache.org/jira/browse/HBASE-14734
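> 
> A minimal sketch of the retry idea (an assumed helper, not the actual Spark or 
> HBase test code): catch the BindException thrown when MiniKdc's randomly 
> chosen port is already taken and simply try again, so a fresh port is picked 
> on the next attempt, e.g. `startWithRetries(3)(() => setUpMiniKdc())` where 
> `setUpMiniKdc` is whatever allocates a port and starts the KDC.
> {code:scala}
> import java.net.BindException
> 
> // Retry `start` up to `attempts` times when the chosen port is already bound.
> def startWithRetries[T](attempts: Int)(start: () => T): T = {
>   try start()
>   catch {
>     case _: BindException if attempts > 1 =>
>       // "Address already in use": the caller picks a new random port, try again.
>       startWithRetries(attempts - 1)(start)
>   }
> }
> {code}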






[jira] [Assigned] (SPARK-31656) AFT blockify input vectors

2020-05-06 Thread zhengruifeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng reassigned SPARK-31656:


Assignee: zhengruifeng

> AFT blockify input vectors
> --
>
> Key: SPARK-31656
> URL: https://issues.apache.org/jira/browse/SPARK-31656
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.1.0
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Minor
>







[jira] [Assigned] (SPARK-31656) AFT blockify input vectors

2020-05-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31656:


Assignee: Apache Spark

> AFT blockify input vectors
> --
>
> Key: SPARK-31656
> URL: https://issues.apache.org/jira/browse/SPARK-31656
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.1.0
>Reporter: zhengruifeng
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Commented] (SPARK-31656) AFT blockify input vectors

2020-05-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101349#comment-17101349
 ] 

Apache Spark commented on SPARK-31656:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/28473

> AFT blockify input vectors
> --
>
> Key: SPARK-31656
> URL: https://issues.apache.org/jira/browse/SPARK-31656
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.1.0
>Reporter: zhengruifeng
>Priority: Minor
>







[jira] [Assigned] (SPARK-31656) AFT blockify input vectors

2020-05-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31656:


Assignee: (was: Apache Spark)

> AFT blockify input vectors
> --
>
> Key: SPARK-31656
> URL: https://issues.apache.org/jira/browse/SPARK-31656
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.1.0
>Reporter: zhengruifeng
>Priority: Minor
>







[jira] [Commented] (SPARK-31655) Upgrade snappy to version 1.1.7.5

2020-05-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101348#comment-17101348
 ] 

Apache Spark commented on SPARK-31655:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/28472

> Upgrade snappy to version 1.1.7.5
> -
>
> Key: SPARK-31655
> URL: https://issues.apache.org/jira/browse/SPARK-31655
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: angerszhu
>Priority: Minor
>
> Upgrade snappy to version 1.1.7.5






[jira] [Assigned] (SPARK-31655) Upgrade snappy to version 1.1.7.5

2020-05-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31655:


Assignee: Apache Spark

> Upgrade snappy to version 1.1.7.5
> -
>
> Key: SPARK-31655
> URL: https://issues.apache.org/jira/browse/SPARK-31655
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: angerszhu
>Assignee: Apache Spark
>Priority: Minor
>
> Upgrade snappy to version 1.1.7.5






[jira] [Commented] (SPARK-31655) Upgrade snappy to version 1.1.7.5

2020-05-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31655?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101347#comment-17101347
 ] 

Apache Spark commented on SPARK-31655:
--

User 'AngersZh' has created a pull request for this issue:
https://github.com/apache/spark/pull/28472

> Upgrade snappy to version 1.1.7.5
> -
>
> Key: SPARK-31655
> URL: https://issues.apache.org/jira/browse/SPARK-31655
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: angerszhu
>Priority: Minor
>
> Upgrade snappy to version 1.1.7.5






[jira] [Assigned] (SPARK-31655) Upgrade snappy to version 1.1.7.5

2020-05-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31655?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31655:


Assignee: (was: Apache Spark)

> Upgrade snappy to version 1.1.7.5
> -
>
> Key: SPARK-31655
> URL: https://issues.apache.org/jira/browse/SPARK-31655
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: angerszhu
>Priority: Minor
>
> Upgrade snappy to version 1.1.7.5






[jira] [Created] (SPARK-31656) AFT blockify input vectors

2020-05-06 Thread zhengruifeng (Jira)
zhengruifeng created SPARK-31656:


 Summary: AFT blockify input vectors
 Key: SPARK-31656
 URL: https://issues.apache.org/jira/browse/SPARK-31656
 Project: Spark
  Issue Type: Sub-task
  Components: ML
Affects Versions: 3.1.0
Reporter: zhengruifeng









[jira] [Assigned] (SPARK-30660) LinearRegression blockify input vectors

2020-05-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-30660:


Assignee: zhengruifeng  (was: Apache Spark)

> LinearRegression blockify input vectors
> ---
>
> Key: SPARK-30660
> URL: https://issues.apache.org/jira/browse/SPARK-30660
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.0.0
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Minor
>







[jira] [Commented] (SPARK-30660) LinearRegression blockify input vectors

2020-05-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101340#comment-17101340
 ] 

Apache Spark commented on SPARK-30660:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/28471

> LinearRegression blockify input vectors
> ---
>
> Key: SPARK-30660
> URL: https://issues.apache.org/jira/browse/SPARK-30660
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.0.0
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Minor
>







[jira] [Commented] (SPARK-30660) LinearRegression blockify input vectors

2020-05-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-30660?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101341#comment-17101341
 ] 

Apache Spark commented on SPARK-30660:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/28471

> LinearRegression blockify input vectors
> ---
>
> Key: SPARK-30660
> URL: https://issues.apache.org/jira/browse/SPARK-30660
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.0.0
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Minor
>







[jira] [Assigned] (SPARK-30660) LinearRegression blockify input vectors

2020-05-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30660?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-30660:


Assignee: Apache Spark  (was: zhengruifeng)

> LinearRegression blockify input vectors
> ---
>
> Key: SPARK-30660
> URL: https://issues.apache.org/jira/browse/SPARK-30660
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.0.0
>Reporter: zhengruifeng
>Assignee: Apache Spark
>Priority: Minor
>







[jira] [Closed] (SPARK-5195) when a Hive table is queried with an alias, the cached data loses effectiveness.

2020-05-06 Thread yixiaohua (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-5195?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

yixiaohua closed SPARK-5195.


> when a Hive table is queried with an alias, the cached data loses effectiveness.
> 
>
> Key: SPARK-5195
> URL: https://issues.apache.org/jira/browse/SPARK-5195
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.2.0
>Reporter: yixiaohua
>Assignee: yixiaohua
>Priority: Major
> Fix For: 1.3.0
>
>
> Override MetastoreRelation's sameResult method so that it compares only the 
> database name and table name.
> Previously, after:
> cache table t1;
> select count() from t1;
> the query reads data from memory, but the query below does not; instead it 
> reads from HDFS:
> select count() from t1 t;
> Cached data is keyed by the logical plan and looked up via sameResult, so when 
> the table is queried with an alias its logical plan is not the same as the 
> plan without the alias and the cache is missed. Hence the change: make 
> sameResult compare only the database name and table name (see the sketch below).
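> 
> A minimal sketch of the reported symptom (this era used HiveContext; a modern 
> SparkSession is shown only for brevity, and `t1` is an assumed table name):
> {code:scala}
> spark.sql("CACHE TABLE t1")
> spark.sql("SELECT count(*) FROM t1").show()   // served from the in-memory cache
> spark.sql("SELECT count(*) FROM t1 t").show() // alias changes the logical plan, cache is missed
> {code}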






[jira] [Created] (SPARK-31655) Upgrade snappy to version 1.1.7.5

2020-05-06 Thread angerszhu (Jira)
angerszhu created SPARK-31655:
-

 Summary: Upgrade snappy to version 1.1.7.5
 Key: SPARK-31655
 URL: https://issues.apache.org/jira/browse/SPARK-31655
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: angerszhu


Upgrade snappy to version 1.1.7.5
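 
For reference, a hedged sketch of the dependency bump expressed as sbt 
coordinates (the actual Spark change edits the Maven pom; the group/artifact 
shown are the usual snappy-java ones):
{code:scala}
libraryDependencies += "org.xerial.snappy" % "snappy-java" % "1.1.7.5"
{code}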






[jira] [Commented] (SPARK-29803) remove all instances of 'from __future__ import print_function'

2020-05-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29803?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101315#comment-17101315
 ] 

Apache Spark commented on SPARK-29803:
--

User 'tianshizz' has created a pull request for this issue:
https://github.com/apache/spark/pull/28470

> remove all instances of 'from __future__ import print_function' 
> 
>
> Key: SPARK-29803
> URL: https://issues.apache.org/jira/browse/SPARK-29803
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, PySpark, Tests
>Affects Versions: 3.0.0
>Reporter: Shane Knapp
>Priority: Major
> Attachments: print_function_list.txt
>
>
> there are 135 python files in the spark repo that need to have `from 
> __future__ import print_function` removed (see attached file 
> 'print_function_list.txt').
>  






[jira] [Assigned] (SPARK-29803) remove all instances of 'from __future__ import print_function'

2020-05-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-29803:


Assignee: Apache Spark

> remove all instances of 'from __future__ import print_function' 
> 
>
> Key: SPARK-29803
> URL: https://issues.apache.org/jira/browse/SPARK-29803
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, PySpark, Tests
>Affects Versions: 3.0.0
>Reporter: Shane Knapp
>Assignee: Apache Spark
>Priority: Major
> Attachments: print_function_list.txt
>
>
> there are 135 python files in the spark repo that need to have `from 
> __future__ import print_function` removed (see attached file 
> 'print_function_list.txt').
>  






[jira] [Assigned] (SPARK-29803) remove all instances of 'from __future__ import print_function'

2020-05-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-29803:


Assignee: (was: Apache Spark)

> remove all instances of 'from __future__ import print_function' 
> 
>
> Key: SPARK-29803
> URL: https://issues.apache.org/jira/browse/SPARK-29803
> Project: Spark
>  Issue Type: Sub-task
>  Components: Build, PySpark, Tests
>Affects Versions: 3.0.0
>Reporter: Shane Knapp
>Priority: Major
> Attachments: print_function_list.txt
>
>
> there are 135 python files in the spark repo that need to have `from 
> __future__ import print_function` removed (see attached file 
> 'print_function_list.txt').
>  






[jira] [Assigned] (SPARK-29802) update remaining python scripts in repo to python3 shebang

2020-05-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-29802:


Assignee: (was: Apache Spark)

> update remaining python scripts in repo to python3 shebang
> --
>
> Key: SPARK-29802
> URL: https://issues.apache.org/jira/browse/SPARK-29802
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.0.0
>Reporter: Shane Knapp
>Priority: Major
>
> there are a bunch of scripts in the repo that need to have their shebang 
> updated to python3:
> {noformat}
> dev/create-release/releaseutils.py:#!/usr/bin/env python
> dev/create-release/generate-contributors.py:#!/usr/bin/env python
> dev/create-release/translate-contributors.py:#!/usr/bin/env python
> dev/github_jira_sync.py:#!/usr/bin/env python
> dev/merge_spark_pr.py:#!/usr/bin/env python
> python/pyspark/version.py:#!/usr/bin/env python
> python/pyspark/find_spark_home.py:#!/usr/bin/env python
> python/setup.py:#!/usr/bin/env python{noformat}






[jira] [Commented] (SPARK-29802) update remaining python scripts in repo to python3 shebang

2020-05-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101310#comment-17101310
 ] 

Apache Spark commented on SPARK-29802:
--

User 'tianshizz' has created a pull request for this issue:
https://github.com/apache/spark/pull/28469

> update remaining python scripts in repo to python3 shebang
> --
>
> Key: SPARK-29802
> URL: https://issues.apache.org/jira/browse/SPARK-29802
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.0.0
>Reporter: Shane Knapp
>Priority: Major
>
> there are a bunch of scripts in the repo that need to have their shebang 
> updated to python3:
> {noformat}
> dev/create-release/releaseutils.py:#!/usr/bin/env python
> dev/create-release/generate-contributors.py:#!/usr/bin/env python
> dev/create-release/translate-contributors.py:#!/usr/bin/env python
> dev/github_jira_sync.py:#!/usr/bin/env python
> dev/merge_spark_pr.py:#!/usr/bin/env python
> python/pyspark/version.py:#!/usr/bin/env python
> python/pyspark/find_spark_home.py:#!/usr/bin/env python
> python/setup.py:#!/usr/bin/env python{noformat}






[jira] [Assigned] (SPARK-29802) update remaining python scripts in repo to python3 shebang

2020-05-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-29802:


Assignee: Apache Spark

> update remaining python scripts in repo to python3 shebang
> --
>
> Key: SPARK-29802
> URL: https://issues.apache.org/jira/browse/SPARK-29802
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.0.0
>Reporter: Shane Knapp
>Assignee: Apache Spark
>Priority: Major
>
> there are a bunch of scripts in the repo that need to have their shebang 
> updated to python3:
> {noformat}
> dev/create-release/releaseutils.py:#!/usr/bin/env python
> dev/create-release/generate-contributors.py:#!/usr/bin/env python
> dev/create-release/translate-contributors.py:#!/usr/bin/env python
> dev/github_jira_sync.py:#!/usr/bin/env python
> dev/merge_spark_pr.py:#!/usr/bin/env python
> python/pyspark/version.py:#!/usr/bin/env python
> python/pyspark/find_spark_home.py:#!/usr/bin/env python
> python/setup.py:#!/usr/bin/env python{noformat}






[jira] [Resolved] (SPARK-30659) LogisticRegression blockify input vectors

2020-05-06 Thread zhengruifeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-30659?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng resolved SPARK-30659.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 28458
[https://github.com/apache/spark/pull/28458]

> LogisticRegression blockify input vectors
> -
>
> Key: SPARK-30659
> URL: https://issues.apache.org/jira/browse/SPARK-30659
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.0.0
>Reporter: zhengruifeng
>Assignee: zhengruifeng
>Priority: Minor
> Fix For: 3.1.0
>
>







[jira] [Resolved] (SPARK-31212) Failure of casting the '1000-02-29' string to the date type

2020-05-06 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-31212.
--
Resolution: Won't Fix

Won't fix in Spark 2.4.x. See also 
https://github.com/apache/spark/pull/28445#issuecomment-624455200

> Failure of casting the '1000-02-29' string to the date type
> ---
>
> Key: SPARK-31212
> URL: https://issues.apache.org/jira/browse/SPARK-31212
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.5
>Reporter: Maxim Gekk
>Priority: Major
>
> '1000-02-29' is a valid date in the Julian calendar used by Spark 2.4.5 for 
> dates before 1582-10-15, but casting the string to the date type fails:
> {code:scala}
> scala> val df = 
> Seq("1000-02-29").toDF("dateS").select($"dateS".cast("date").as("date"))
> df: org.apache.spark.sql.DataFrame = [date: date]
> scala> df.show
> +----+
> |date|
> +----+
> |null|
> +----+
> {code}
> Creating a dataset from java.sql.Date w/ the same input string works 
> correctly:
> {code:scala}
> scala> val df2 = 
> Seq(java.sql.Date.valueOf("1000-02-29")).toDF("dateS").select($"dateS".as("date"))
> df2: org.apache.spark.sql.DataFrame = [date: date]
> scala> df2.show
> +----------+
> |      date|
> +----------+
> |1000-02-29|
> +----------+
> {code}






[jira] [Resolved] (SPARK-31647) Deprecate 'spark.sql.optimizer.metadataOnly'

2020-05-06 Thread Takeshi Yamamuro (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31647?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Takeshi Yamamuro resolved SPARK-31647.
--
Fix Version/s: 3.0.0
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/28459

> Deprecate 'spark.sql.optimizer.metadataOnly'
> 
>
> Key: SPARK-31647
> URL: https://issues.apache.org/jira/browse/SPARK-31647
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.0.0
>
>
> This optimization can cause a potential correctness issue; see also 
> SPARK-26709.
> It also seems difficult to extend: basically every available function would 
> have to be whitelisted.
> It seems we should rather deprecate and then remove it.
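> 
> The flag in question, for reference (a sketch; in releases where the config 
> still exists it can be disabled explicitly to sidestep the SPARK-26709 
> correctness issue):
> {code:scala}
> spark.conf.set("spark.sql.optimizer.metadataOnly", "false")
> {code}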






[jira] [Created] (SPARK-31654) sequence producing inconsistent intervals for month step

2020-05-06 Thread Roman Yalki (Jira)
Roman Yalki created SPARK-31654:
---

 Summary: sequence producing inconsistent intervals for month step
 Key: SPARK-31654
 URL: https://issues.apache.org/jira/browse/SPARK-31654
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.4
Reporter: Roman Yalki


Taking an example from [https://spark.apache.org/docs/latest/api/sql/]
{code:java}
> SELECT sequence(to_date('2018-01-01'), to_date('2018-03-01'), interval 1 
> month);{code}
[2018-01-01,2018-02-01,2018-03-01]

If `stop` is extended to the end of the year, some intervals are returned as 
the last day of the month, whereas the first day of the month is expected:
{code:java}
> SELECT sequence(to_date('2018-01-01'), to_date('2019-01-01'), interval 1 
> month){code}
[2018-01-01, 2018-02-01, 2018-03-01, *2018-03-31, 2018-04-30, 2018-05-31, 
2018-06-30, 2018-07-31, 2018-08-31, 2018-09-30, 2018-10-31*, 2018-12-01, 
2019-01-01]

 






[jira] [Commented] (SPARK-31365) Enable nested predicate pushdown per data sources

2020-05-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101237#comment-17101237
 ] 

Apache Spark commented on SPARK-31365:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/28468

> Enable nested predicate pushdown per data sources
> -
>
> Key: SPARK-31365
> URL: https://issues.apache.org/jira/browse/SPARK-31365
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: DB Tsai
>Assignee: L. C. Hsieh
>Priority: Blocker
> Fix For: 3.0.0
>
>
> Currently, nested predicate pushdown is either on or off for all data sources. 
> We should create a configuration for each supported data source (a sketch 
> follows below).
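> 
> A hypothetical illustration of the idea (the config key below is an assumption 
> made for this sketch, not necessarily the name chosen in the PR): allow nested 
> predicate pushdown to be enabled per source rather than globally.
> {code:scala}
> // e.g. keep nested predicate pushdown for Parquet only
> spark.conf.set("spark.sql.optimizer.nestedPredicatePushdown.supportedFileSources", "parquet")
> {code}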






[jira] [Commented] (SPARK-31365) Enable nested predicate pushdown per data sources

2020-05-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31365?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101235#comment-17101235
 ] 

Apache Spark commented on SPARK-31365:
--

User 'viirya' has created a pull request for this issue:
https://github.com/apache/spark/pull/28468

> Enable nested predicate pushdown per data sources
> -
>
> Key: SPARK-31365
> URL: https://issues.apache.org/jira/browse/SPARK-31365
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: DB Tsai
>Assignee: L. C. Hsieh
>Priority: Blocker
> Fix For: 3.0.0
>
>
> Currently, nested predicate pushdown is either on or off for all data sources. 
> We should create a configuration for each supported data source.






[jira] [Assigned] (SPARK-31653) setuptools needs to be installed before anything else

2020-05-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31653:


Assignee: Holden Karau  (was: Apache Spark)

> setuptools needs to be installed before anything else
> -
>
> Key: SPARK-31653
> URL: https://issues.apache.org/jira/browse/SPARK-31653
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.6
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Blocker
>
> One of the early packages we install as part of the release build in the 
> Dockerfile now requires setuptools to be pre-installed.






[jira] [Assigned] (SPARK-31653) setuptools needs to be installed before anything else

2020-05-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31653:


Assignee: Apache Spark  (was: Holden Karau)

> setuptools needs to be installed before anything else
> -
>
> Key: SPARK-31653
> URL: https://issues.apache.org/jira/browse/SPARK-31653
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.6
>Reporter: Holden Karau
>Assignee: Apache Spark
>Priority: Blocker
>
> One of the early packages we install as part of the release build in the 
> Dockerfile now requires setuptools to be pre-installed.






[jira] [Commented] (SPARK-31653) setuptools needs to be installed before anything else

2020-05-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31653?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101219#comment-17101219
 ] 

Apache Spark commented on SPARK-31653:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/28467

> setuptools needs to be installed before anything else
> -
>
> Key: SPARK-31653
> URL: https://issues.apache.org/jira/browse/SPARK-31653
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.6
>Reporter: Holden Karau
>Priority: Blocker
>
> One of the early packages we install as part of the release build in the 
> Dockerfile now requires setuptools to be pre-installed.






[jira] [Assigned] (SPARK-31653) setuptools needs to be installed before anything else

2020-05-06 Thread Holden Karau (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Holden Karau reassigned SPARK-31653:


Assignee: Holden Karau

> setuptools needs to be installed before anything else
> -
>
> Key: SPARK-31653
> URL: https://issues.apache.org/jira/browse/SPARK-31653
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.6
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Blocker
>
> One of the early packages we install as part of the release build in the 
> Dockerfile now requires setuptools to be pre-installed.






[jira] [Resolved] (SPARK-31653) setuptools needs to be installed before anything else

2020-05-06 Thread Holden Karau (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Holden Karau resolved SPARK-31653.
--
Fix Version/s: 2.4.6
   Resolution: Fixed

> setuptools needs to be installed before anything else
> -
>
> Key: SPARK-31653
> URL: https://issues.apache.org/jira/browse/SPARK-31653
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.6
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Blocker
> Fix For: 2.4.6
>
>
> One of the early packages we install as part of the release build in the 
> Dockerfile now requires setuptools to be pre-installed.






[jira] [Updated] (SPARK-31653) setuptools needs to be installed before anything else

2020-05-06 Thread Holden Karau (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31653?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Holden Karau updated SPARK-31653:
-
Priority: Blocker  (was: Major)

> setuptools needs to be installed before anything else
> -
>
> Key: SPARK-31653
> URL: https://issues.apache.org/jira/browse/SPARK-31653
> Project: Spark
>  Issue Type: Bug
>  Components: Build
>Affects Versions: 2.4.6
>Reporter: Holden Karau
>Priority: Blocker
>
> One of the early packages we install as part of the release build in the 
> Dockerfile now requires setuptools to be pre-installed.






[jira] [Created] (SPARK-31653) setuptools needs to be installed before anything else

2020-05-06 Thread Holden Karau (Jira)
Holden Karau created SPARK-31653:


 Summary: setuptools needs to be installed before anything else
 Key: SPARK-31653
 URL: https://issues.apache.org/jira/browse/SPARK-31653
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 2.4.6
Reporter: Holden Karau


One of the early packages we install as part of the release build in the 
Dockerfile now requires setuptools to be pre-installed.






[jira] [Commented] (SPARK-29250) Upgrade to Hadoop 3.2.1

2020-05-06 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101110#comment-17101110
 ] 

Dongjoon Hyun commented on SPARK-29250:
---

Thanks for the update. For now, we are not aiming to downgrade Hadoop 3.2.x to 
Hadoop 3.1.x. If we find a way to handle Apache Hive 1.2's and Hive 2.3's Guava 
dependency, it will be with Hadoop 3.2+. Since the Apache Hive community 
doesn't have any plan for a Guava update in `branch-2.3`, the Apache Spark 
community needs another way to handle them. For Apache Hive 1.2, I want to drop 
it as soon as possible after the Apache Spark 3.0.0 release. That will narrow 
this issue down to Apache Hive 2.3's Guava dependency.

> Upgrade to Hadoop 3.2.1
> ---
>
> Key: SPARK-29250
> URL: https://issues.apache.org/jira/browse/SPARK-29250
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Comment Edited] (SPARK-29250) Upgrade to Hadoop 3.2.1

2020-05-06 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-29250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101110#comment-17101110
 ] 

Dongjoon Hyun edited comment on SPARK-29250 at 5/6/20, 7:25 PM:


Thanks for the update. For now, we are not aiming to downgrade from Hadoop 
3.2.x to Hadoop 3.1.x. If we find a way to handle Apache Hive 1.2's and Hive 
2.3's Guava dependency, it will be with Hadoop 3.2+. Since the Apache Hive 
community doesn't have any plan for a Guava update in `branch-2.3`, the Apache 
Spark community needs another way to handle them. For Apache Hive 1.2, I want 
to drop it as soon as possible after the Apache Spark 3.0.0 release. That will 
narrow this issue down to Apache Hive 2.3's Guava dependency.


was (Author: dongjoon):
Thanks for the update. For now, we are not aiming to downgrade Hadoop 3.2.x to 
Hadoop 3.1.x. If we find a way how to handle Apache Hive 1.2 and Hive 2.3's 
guava dependency, it will be with Hadoop 3.2+. Since Apache Hive community 
doesn't have any plan for Guava update in `branch-2.3`, Apache Spark community 
needs another way to handle them. For Apache Hive 1.2, I want to drop it as 
soon as possible after Apache Spark 3.0.0 release. It will narrow down this 
issue to Apache Hive 2.3's guava dependency.

> Upgrade to Hadoop 3.2.1
> ---
>
> Key: SPARK-29250
> URL: https://issues.apache.org/jira/browse/SPARK-29250
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Priority: Major
>







[jira] [Commented] (SPARK-31361) Rebase datetime in parquet/avro according to file metadata

2020-05-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101089#comment-17101089
 ] 

Apache Spark commented on SPARK-31361:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/28466

> Rebase datetime in parquet/avro according to file metadata
> --
>
> Key: SPARK-31361
> URL: https://issues.apache.org/jira/browse/SPARK-31361
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Blocker
> Fix For: 3.0.0
>
>







[jira] [Commented] (SPARK-31361) Rebase datetime in parquet/avro according to file metadata

2020-05-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17101088#comment-17101088
 ] 

Apache Spark commented on SPARK-31361:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/28466

> Rebase datetime in parquet/avro according to file metadata
> --
>
> Key: SPARK-31361
> URL: https://issues.apache.org/jira/browse/SPARK-31361
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Blocker
> Fix For: 3.0.0
>
>







[jira] [Commented] (SPARK-5300) Spark loads file partitions in inconsistent order on native filesystems

2020-05-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-5300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100976#comment-17100976
 ] 

Apache Spark commented on SPARK-5300:
-

User 'wetneb' has created a pull request for this issue:
https://github.com/apache/spark/pull/28465

> Spark loads file partitions in inconsistent order on native filesystems
> ---
>
> Key: SPARK-5300
> URL: https://issues.apache.org/jira/browse/SPARK-5300
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 1.1.0, 1.2.0
> Environment: Linux, EXT4, for example.
>Reporter: Ewan Higgs
>Priority: Major
>
> Discussed on user list in April 2014:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-reads-partitions-in-a-wrong-order-td4818.html
> And on dev list January 2015:
> http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-order-guarantees-td10142.html
> When using a file system which isn't HDFS, file partitions ('part-0, 
> part-1', etc.) are not guaranteed to load in the same order. This means 
> previously sorted RDDs will be loaded out of order. 
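> 
> A minimal sketch of a workaround under assumed conditions (a spark-shell 
> session with `sc` in scope and a hypothetical path), not the proposed fix: if 
> downstream logic depends on the original order, re-establish it explicitly 
> rather than relying on the listing order of the part files.
> {code:scala}
> val lines = sc.textFile("/data/previously_sorted_output") // part-file order is not guaranteed off HDFS
> val reordered = lines.sortBy(identity)                    // re-sort by the original key before relying on order
> {code}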






[jira] [Commented] (SPARK-5300) Spark loads file partitions in inconsistent order on native filesystems

2020-05-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-5300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100973#comment-17100973
 ] 

Apache Spark commented on SPARK-5300:
-

User 'wetneb' has created a pull request for this issue:
https://github.com/apache/spark/pull/28465

> Spark loads file partitions in inconsistent order on native filesystems
> ---
>
> Key: SPARK-5300
> URL: https://issues.apache.org/jira/browse/SPARK-5300
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output
>Affects Versions: 1.1.0, 1.2.0
> Environment: Linux, EXT4, for example.
>Reporter: Ewan Higgs
>Priority: Major
>
> Discussed on user list in April 2014:
> http://apache-spark-user-list.1001560.n3.nabble.com/Spark-reads-partitions-in-a-wrong-order-td4818.html
> And on dev list January 2015:
> http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-order-guarantees-td10142.html
> When using a file system which isn't HDFS, file partitions ('part-0, 
> part-1', etc.) are not guaranteed to load in the same order. This means 
> previously sorted RDDs will be loaded out of order. 






[jira] [Commented] (SPARK-21770) ProbabilisticClassificationModel: Improve normalization of all-zero raw predictions

2020-05-06 Thread David Mavashev (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-21770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100948#comment-17100948
 ] 

David Mavashev commented on SPARK-21770:


Hi,

I'm hitting the above issue: the whole job fails because of a single row that 
gets an all-zero probability vector:

 
{code:java}
class: SparkException, cause: Failed to execute user defined 
function($anonfun$2: 
(struct,values:array>) => 
struct,values:array>) 
org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in 
stage 10251.0 failed 1 times, most recent failure: Lost task 5.0 in stage 
10251.0 (TID 128916, localhost, executor driver): 
org.apache.spark.SparkException: Failed to execute user defined 
function($anonfun$2: 
(struct,values:array>) => 
struct,values:array>)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at 
org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$27.apply(RDD.scala:972)
at 
org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$27.apply(RDD.scala:972)
at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: requirement failed: Can't 
normalize the 0-vector.
at scala.Predef$.require(Predef.scala:224)
at 
org.apache.spark.ml.classification.ProbabilisticClassificationModel$.normalizeToProbabilitiesInPlace(ProbabilisticClassifier.scala:244)
at 
org.apache.spark.ml.classification.DecisionTreeClassificationModel.raw2probabilityInPlace(DecisionTreeClassifier.scala:198)
at 
org.apache.spark.ml.classification.ProbabilisticClassificationModel.raw2probability(ProbabilisticClassifier.scala:172)
at 
org.apache.spark.ml.classification.ProbabilisticClassificationModel$$anonfun$2.apply(ProbabilisticClassifier.scala:124)
at 
org.apache.spark.ml.classification.ProbabilisticClassificationModel$$anonfun$2.apply(ProbabilisticClassifier.scala:124)
... 19 more
{code}
What is the correct way to handle this? It happens randomly on models we 
generate with the Random Forest classifier.

 

> ProbabilisticClassificationModel: Improve normalization of all-zero raw 
> predictions
> ---
>
> Key: SPARK-21770
> URL: https://issues.apache.org/jira/browse/SPARK-21770
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 2.3.0
>Reporter: Siddharth Murching
>Assignee: Weichen Xu
>Priority: Minor
> Fix For: 2.3.0
>
>
> Given an n-element raw prediction vector of all-zeros, 
> ProbabilisticClassifierModel.normalizeToProbabilitiesInPlace() should output 
> a probability vector of all-equal 1/n entries
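> 
> A minimal standalone sketch of that behaviour (not Spark's internal code): 
> fall back to a uniform 1/n distribution when the raw prediction sums to zero, 
> instead of failing with "Can't normalize the 0-vector".
> {code:scala}
> def normalizeToProbabilities(raw: Array[Double]): Array[Double] = {
>   val sum = raw.sum
>   if (sum == 0.0) Array.fill(raw.length)(1.0 / raw.length) // all-zero raw scores: uniform probabilities
>   else raw.map(_ / sum)                                     // otherwise: plain normalization
> }
> {code}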






[jira] [Comment Edited] (SPARK-21770) ProbabilisticClassificationModel: Improve normalization of all-zero raw predictions

2020-05-06 Thread David Mavashev (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-21770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100948#comment-17100948
 ] 

David Mavashev edited comment on SPARK-21770 at 5/6/20, 4:24 PM:
-

Hi,

I'm using version 2.4.5 and hitting the above issue: the whole job fails 
because of a single row that gets an all-zero probability vector:

 
{code:java}
class: SparkException, cause: Failed to execute user defined 
function($anonfun$2: 
(struct,values:array>) => 
struct,values:array>) 
org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in 
stage 10251.0 failed 1 times, most recent failure: Lost task 5.0 in stage 
10251.0 (TID 128916, localhost, executor driver): 
org.apache.spark.SparkException: Failed to execute user defined 
function($anonfun$2: 
(struct,values:array>) => 
struct,values:array>)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at 
org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$27.apply(RDD.scala:972)
at 
org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$27.apply(RDD.scala:972)
at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at 
org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2101)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
at org.apache.spark.scheduler.Task.run(Task.scala:123)
at 
org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: requirement failed: Can't 
normalize the 0-vector.
at scala.Predef$.require(Predef.scala:224)
at 
org.apache.spark.ml.classification.ProbabilisticClassificationModel$.normalizeToProbabilitiesInPlace(ProbabilisticClassifier.scala:244)
at 
org.apache.spark.ml.classification.DecisionTreeClassificationModel.raw2probabilityInPlace(DecisionTreeClassifier.scala:198)
at 
org.apache.spark.ml.classification.ProbabilisticClassificationModel.raw2probability(ProbabilisticClassifier.scala:172)
at 
org.apache.spark.ml.classification.ProbabilisticClassificationModel$$anonfun$2.apply(ProbabilisticClassifier.scala:124)
at 
org.apache.spark.ml.classification.ProbabilisticClassificationModel$$anonfun$2.apply(ProbabilisticClassifier.scala:124)
... 19 more
{code}
What is the correct way to handle this? It happens randomly on models we 
generate with the Random Forest classifier.

 


was (Author: davidmav86):
Hi,

I'm hitting the above issue, in which the whole job is failing because of a 
single row that gets a 0 vector probabilities:

 
{code:java}
class: SparkException, cause: Failed to execute user defined 
function($anonfun$2: 
(struct,values:array>) => 
struct,values:array>) 
org.apache.spark.SparkException: Job aborted due to stage failure: Task 5 in 
stage 10251.0 failed 1 times, most recent failure: Lost task 5.0 in stage 
10251.0 (TID 128916, localhost, executor driver): 
org.apache.spark.SparkException: Failed to execute user defined 
function($anonfun$2: 
(struct,values:array>) => 
struct,values:array>)
at 
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown
 Source)
at 
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at 
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$13$$anon$1.hasNext(WholeStageCodegenExec.scala:636)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$class.foreach(Iterator.scala:893)
at scala.collection.AbstractIterator.foreach(Iterator.scala:1336)
at 
org.apache.spark.rdd.RDD$$anonfun$foreach$1$$anonfun$apply$27.apply(RDD.scala:972)
at 

[jira] [Assigned] (SPARK-31652) Add ANOVASelector and FValueSelector to PySpark

2020-05-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31652:


Assignee: Apache Spark

> Add ANOVASelector and FValueSelector to PySpark
> ---
>
> Key: SPARK-31652
> URL: https://issues.apache.org/jira/browse/SPARK-31652
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.1.0
>Reporter: Huaxin Gao
>Assignee: Apache Spark
>Priority: Major
>
> Add ANOVASelector and FValueSelector to PySpark






[jira] [Assigned] (SPARK-31652) Add ANOVASelector and FValueSelector to PySpark

2020-05-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31652:


Assignee: (was: Apache Spark)

> Add ANOVASelector and FValueSelector to PySpark
> ---
>
> Key: SPARK-31652
> URL: https://issues.apache.org/jira/browse/SPARK-31652
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.1.0
>Reporter: Huaxin Gao
>Priority: Major
>
> Add ANOVASelector and FValueSelector to PySpark






[jira] [Commented] (SPARK-31652) Add ANOVASelector and FValueSelector to PySpark

2020-05-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100940#comment-17100940
 ] 

Apache Spark commented on SPARK-31652:
--

User 'huaxingao' has created a pull request for this issue:
https://github.com/apache/spark/pull/28464

> Add ANOVASelector and FValueSelector to PySpark
> ---
>
> Key: SPARK-31652
> URL: https://issues.apache.org/jira/browse/SPARK-31652
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, PySpark
>Affects Versions: 3.1.0
>Reporter: Huaxin Gao
>Priority: Major
>
> Add ANOVASelector and FValueSelector to PySpark






[jira] [Created] (SPARK-31652) Add ANOVASelector and FValueSelector to PySpark

2020-05-06 Thread Huaxin Gao (Jira)
Huaxin Gao created SPARK-31652:
--

 Summary: Add ANOVASelector and FValueSelector to PySpark
 Key: SPARK-31652
 URL: https://issues.apache.org/jira/browse/SPARK-31652
 Project: Spark
  Issue Type: Sub-task
  Components: ML, PySpark
Affects Versions: 3.1.0
Reporter: Huaxin Gao


Add ANOVASelector and FValueSelector to PySpark






[jira] [Commented] (SPARK-19371) Cannot spread cached partitions evenly across executors

2020-05-06 Thread Thunder Stumpges (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-19371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100878#comment-17100878
 ] 

Thunder Stumpges commented on SPARK-19371:
--

Thank you for the comments [~honor], [~danmeany], and [~lebedev]! I am glad to 
see there are others with this issue. We have had to "just live with it" for 
all these years, and this job is STILL running in production, every 10 seconds, 
wasting computing resources and time due to imbalanced cached partitions across 
the executors.

> Cannot spread cached partitions evenly across executors
> ---
>
> Key: SPARK-19371
> URL: https://issues.apache.org/jira/browse/SPARK-19371
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.1
>Reporter: Thunder Stumpges
>Priority: Major
>  Labels: bulk-closed
> Attachments: RDD Block Distribution on two executors.png, Unbalanced 
> RDD Blocks, and resulting task imbalance.png, Unbalanced RDD Blocks, and 
> resulting task imbalance.png, execution timeline.png
>
>
> Before running an intensive iterative job (in this case a distributed topic 
> model training), we need to load a dataset and persist it across executors. 
> After loading from HDFS and persisting, the partitions are spread unevenly 
> across executors (based on the initial scheduling of the reads, which is not 
> data-locality sensitive). The partition sizes are even, just not their 
> distribution over executors. We currently have no way to force the partitions 
> to spread evenly, and as the iterative algorithm begins, tasks are 
> distributed to executors based on this initial load, forcing some very 
> unbalanced work.
> This has been mentioned a 
> [number|http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-Partitions-not-distributed-evenly-to-executors-tt16988.html#a17059]
>  of 
> [times|http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tt26502.html]
>  in 
> [various|http://apache-spark-user-list.1001560.n3.nabble.com/Partitions-are-get-placed-on-the-single-node-tt26597.html]
>  user/dev group threads.
> None of the discussions I could find had solutions that worked for me. Here 
> are examples of things I have tried. All resulted in partitions in memory 
> that were NOT evenly distributed to executors, causing future tasks to be 
> imbalanced across executors as well.
> *Reduce Locality*
> {code}spark.shuffle.reduceLocality.enabled=false/true{code}
> *"Legacy" memory mode*
> {code}spark.memory.useLegacyMode = true/false{code}
> *Basic load and repartition*
> {code}
> val numPartitions = 48*16
> val df = sqlContext.read.
> parquet("/data/folder_to_load").
> repartition(numPartitions).
> persist
> df.count
> {code}
> *Load and repartition to 2x partitions, then shuffle repartition down to 
> desired partitions*
> {code}
> val numPartitions = 48*16
> val df2 = sqlContext.read.
> parquet("/data/folder_to_load").
> repartition(numPartitions*2)
> val df = df2.repartition(numPartitions).
> persist
> df.count
> {code}
> It would be great if, when persisting an RDD/DataFrame, we could request 
> that those partitions be stored evenly across executors in preparation for 
> future tasks.
> I'm not sure if this is a more general issue (i.e. not just involving 
> persisting RDDs), but for the persisted in-memory case, it can make a HUGE 
> difference in the overall running time of the remaining work.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31609) Add VarianceThresholdSelector to PySpark

2020-05-06 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31609?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-31609.
--
Fix Version/s: 3.1.0
 Assignee: Huaxin Gao
   Resolution: Fixed

Resolved by https://github.com/apache/spark/pull/28409

> Add VarianceThresholdSelector to PySpark
> 
>
> Key: SPARK-31609
> URL: https://issues.apache.org/jira/browse/SPARK-31609
> Project: Spark
>  Issue Type: New Feature
>  Components: ML, PySpark
>Affects Versions: 3.1.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Minor
> Fix For: 3.1.0
>
>
> Add VarianceThresholdSelector to PySpark



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-31648) Filtering is supported only on partition keys of type string Issue

2020-05-06 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100820#comment-17100820
 ] 

angerszhu edited comment on SPARK-31648 at 5/6/20, 1:51 PM:


[~Rajesh Tadi]

Can you show your reproduce detail code process? thanks

 

I can't reproduce this in 2.4.0 and master branch


was (Author: angerszhuuu):
[~Rajesh Tadi]

Can you show your reproduce detail code process?

 

I can't reproduce this in 2.4.0 and master branch

> Filtering is supported only on partition keys of type string Issue
> --
>
> Key: SPARK-31648
> URL: https://issues.apache.org/jira/browse/SPARK-31648
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Rajesh Tadi
>Priority: Major
> Attachments: Spark Bug.txt
>
>
> When I submit a SQL query with a partition filter I see the error below. I tried 
> setting the Spark configuration spark.sql.hive.manageFilesourcePartitions to 
> false but I still see the same issue.
> java.lang.RuntimeException: Caught Hive MetaException attempting to get 
> partition metadata by filter from Hive.
> java.lang.reflect.InvocationTargetException: 
> org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported 
> only on partition keys of type string
> org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported 
> only on partition keys of type string
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31648) Filtering is supported only on partition keys of type string Issue

2020-05-06 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100820#comment-17100820
 ] 

angerszhu commented on SPARK-31648:
---

[~Rajesh Tadi]

Can you share the detailed code you used, so we can try to reproduce this?

 

I can't reproduce this on 2.4.0 or on the master branch.

> Filtering is supported only on partition keys of type string Issue
> --
>
> Key: SPARK-31648
> URL: https://issues.apache.org/jira/browse/SPARK-31648
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Rajesh Tadi
>Priority: Major
> Attachments: Spark Bug.txt
>
>
> When I submit a SQL query with a partition filter I see the error below. I tried 
> setting the Spark configuration spark.sql.hive.manageFilesourcePartitions to 
> false but I still see the same issue.
> java.lang.RuntimeException: Caught Hive MetaException attempting to get 
> partition metadata by filter from Hive.
> java.lang.reflect.InvocationTargetException: 
> org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported 
> only on partition keys of type string
> org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported 
> only on partition keys of type string
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-20628) Keep track of nodes which are going to be shut down & avoid scheduling new tasks

2020-05-06 Thread wuyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-20628?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100793#comment-17100793
 ] 

wuyi commented on SPARK-20628:
--

Hi [~holden], is this ticket resolved by 
[https://github.com/apache/spark/pull/26440]?

> Keep track of nodes which are going to be shut down & avoid scheduling new 
> tasks
> 
>
> Key: SPARK-20628
> URL: https://issues.apache.org/jira/browse/SPARK-20628
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
> Fix For: 3.1.0
>
>
> Keep track of nodes which are going to be shut down. We considered adding 
> this for YARN but took a different approach; for environments where we can't 
> control instance termination, though (EC2, GCE, etc.), this may make more sense.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31648) Filtering is supported only on partition keys of type string Issue

2020-05-06 Thread Rajesh Tadi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100789#comment-17100789
 ] 

Rajesh Tadi commented on SPARK-31648:
-

[~angerszhuuu] I have tried creating the table using Spark SQL and also using 
DataFrames in Scala. Both ways I see the same issue.

> Filtering is supported only on partition keys of type string Issue
> --
>
> Key: SPARK-31648
> URL: https://issues.apache.org/jira/browse/SPARK-31648
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Rajesh Tadi
>Priority: Major
> Attachments: Spark Bug.txt
>
>
> When I submit a SQL query with a partition filter I see the error below. I tried 
> setting the Spark configuration spark.sql.hive.manageFilesourcePartitions to 
> false but I still see the same issue.
> java.lang.RuntimeException: Caught Hive MetaException attempting to get 
> partition metadata by filter from Hive.
> java.lang.reflect.InvocationTargetException: 
> org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported 
> only on partition keys of type string
> org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported 
> only on partition keys of type string
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31650) SQL UI doesn't show metrics and whole stage codegen in AQE

2020-05-06 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan resolved SPARK-31650.
-
Fix Version/s: 3.0.0
   Resolution: Fixed

Issue resolved by pull request 28460
[https://github.com/apache/spark/pull/28460]

> SQL UI doesn't show metrics and whole stage codegen in AQE
> --
>
> Key: SPARK-31650
> URL: https://issues.apache.org/jira/browse/SPARK-31650
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: before_aqe_ui.png
>
>
> When AQE is enabled and the query contains subqueries, the SQL UI may not show 
> metrics and whole stage codegen.
> Here's a demo that reproduces it:
>  
> {code:java}
> spark.range(1).toDF("value").write.parquet("/tmp/p1")
> spark.range(1).toDF("value").write.parquet("/tmp/p2")
> spark.read.parquet("/tmp/p1").createOrReplaceTempView("t1")
> spark.read.parquet("/tmp/p2").createOrReplaceTempView("t2")
> spark.sql("select * from t1 where value=(select Max(value) from t2)").show()
> {code}
>  
>  
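Note: the snippet above does not show AQE being turned on; presumably the report assumes it is already enabled, for example via the standard config (a sketch, not part of the original reproduction):

{code:scala}
// Enable adaptive query execution before running the reproduction above.
spark.conf.set("spark.sql.adaptive.enabled", "true")
{code}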



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31650) SQL UI doesn't show metrics and whole stage codegen in AQE

2020-05-06 Thread Wenchen Fan (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Wenchen Fan reassigned SPARK-31650:
---

Assignee: wuyi

> SQL UI doesn't show metrics and whole stage codegen in AQE
> --
>
> Key: SPARK-31650
> URL: https://issues.apache.org/jira/browse/SPARK-31650
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: wuyi
>Assignee: wuyi
>Priority: Major
> Attachments: before_aqe_ui.png
>
>
> When AQE is enabled and the query contains subqueries, the SQL UI may not show 
> metrics and whole stage codegen.
> Here's a demo that reproduces it:
>  
> {code:java}
> spark.range(1).toDF("value").write.parquet("/tmp/p1")
> spark.range(1).toDF("value").write.parquet("/tmp/p2")
> spark.read.parquet("/tmp/p1").createOrReplaceTempView("t1")
> spark.read.parquet("/tmp/p2").createOrReplaceTempView("t2")
> spark.sql("select * from t1 where value=(select Max(value) from t2)").show()
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31399) Closure cleaner broken in Scala 2.12

2020-05-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31399?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100721#comment-17100721
 ] 

Apache Spark commented on SPARK-31399:
--

User 'rednaxelafx' has created a pull request for this issue:
https://github.com/apache/spark/pull/28463

> Closure cleaner broken in Scala 2.12
> 
>
> Key: SPARK-31399
> URL: https://issues.apache.org/jira/browse/SPARK-31399
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.5, 3.0.0
>Reporter: Wenchen Fan
>Assignee: Kris Mok
>Priority: Blocker
>
> The `ClosureCleaner` only supports Scala functions, and it uses the following 
> check to catch closures:
> {code}
>   // Check whether a class represents a Scala closure
>   private def isClosure(cls: Class[_]): Boolean = {
> cls.getName.contains("$anonfun$")
>   }
> {code}
> This doesn't work in 3.0 any more as we upgrade to Scala 2.12 and most Scala 
> functions become Java lambdas.
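To illustrate the point (this is only an illustration of the name pattern, not the approach taken in the actual fix): in 2.12 a closure typically surfaces as a synthetic class whose name contains "$Lambda$" (for example `$Lambda$2438/170049100` in the serialization stack below), which the check above never matches. A purely name-based check would therefore have to look something like:

{code:scala}
// Sketch only: a name-based test that also catches the Java lambdas produced
// for Scala 2.12 closures, in addition to 2.11-style anonfun classes.
private def mightBeClosure(cls: Class[_]): Boolean = {
  val name = cls.getName
  name.contains("$anonfun$") || name.contains("$Lambda$")
}
{code}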
> As an example, the following code works well in Spark 2.4 Spark Shell:
> {code}
> scala> :pa
> // Entering paste mode (ctrl-D to finish)
> import org.apache.spark.sql.functions.lit
> case class Foo(id: String)
> val col = lit("123")
> val df = sc.range(0,10,1,1).map { _ => Foo("") }
> // Exiting paste mode, now interpreting.
> import org.apache.spark.sql.functions.lit
> defined class Foo
> col: org.apache.spark.sql.Column = 123
> df: org.apache.spark.rdd.RDD[Foo] = MapPartitionsRDD[5] at map at :20
> {code}
> But fails in 3.0
> {code}
> scala> :pa
> // Entering paste mode (ctrl-D to finish)
> import org.apache.spark.sql.functions.lit
> case class Foo(id: String)
> val col = lit("123")
> val df = sc.range(0,10,1,1).map { _ => Foo("") }
> // Exiting paste mode, now interpreting.
> org.apache.spark.SparkException: Task not serializable
>   at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:396)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:386)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:159)
>   at org.apache.spark.SparkContext.clean(SparkContext.scala:2371)
>   at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:422)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
>   at org.apache.spark.rdd.RDD.map(RDD.scala:421)
>   ... 39 elided
> Caused by: java.io.NotSerializableException: org.apache.spark.sql.Column
> Serialization stack:
>   - object not serializable (class: org.apache.spark.sql.Column, value: 
> 123)
>   - field (class: $iw, name: col, type: class org.apache.spark.sql.Column)
>   - object (class $iw, $iw@2d87ac2b)
>   - element of array (index: 0)
>   - array (class [Ljava.lang.Object;, size 1)
>   - field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, 
> type: class [Ljava.lang.Object;)
>   - object (class java.lang.invoke.SerializedLambda, 
> SerializedLambda[capturingClass=class $iw, 
> functionalInterfaceMethod=scala/Function1.apply:(Ljava/lang/Object;)Ljava/lang/Object;,
>  implementation=invokeStatic 
> $anonfun$df$1$adapted:(L$iw;Ljava/lang/Object;)LFoo;, 
> instantiatedMethodType=(Ljava/lang/Object;)LFoo;, numCaptured=1])
>   - writeReplace data (class: java.lang.invoke.SerializedLambda)
>   - object (class $Lambda$2438/170049100, $Lambda$2438/170049100@d6b8c43)
>   at 
> org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:41)
>   at 
> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
>   at 
> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
>   at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:393)
>   ... 47 more
> {code}
> **Apache Spark 2.4.5 with Scala 2.12**
> {code}
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.4.5
>       /_/
> Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_242)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> :pa
> // Entering paste mode (ctrl-D to finish)
> import org.apache.spark.sql.functions.lit
> case class Foo(id: String)
> val col = lit("123")
> val df = sc.range(0,10,1,1).map { _ => Foo("") }
> // Exiting paste mode, now interpreting.
> org.apache.spark.SparkException: Task not serializable
>   at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:403)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:393)
>   at 

[jira] [Assigned] (SPARK-31399) Closure cleaner broken in Scala 2.12

2020-05-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31399:


Assignee: Apache Spark  (was: Kris Mok)

> Closure cleaner broken in Scala 2.12
> 
>
> Key: SPARK-31399
> URL: https://issues.apache.org/jira/browse/SPARK-31399
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.5, 3.0.0
>Reporter: Wenchen Fan
>Assignee: Apache Spark
>Priority: Blocker
>
> The `ClosureCleaner` only supports Scala functions, and it uses the following 
> check to catch closures:
> {code}
>   // Check whether a class represents a Scala closure
>   private def isClosure(cls: Class[_]): Boolean = {
> cls.getName.contains("$anonfun$")
>   }
> {code}
> This doesn't work in 3.0 any more as we upgrade to Scala 2.12 and most Scala 
> functions become Java lambdas.
> As an example, the following code works well in Spark 2.4 Spark Shell:
> {code}
> scala> :pa
> // Entering paste mode (ctrl-D to finish)
> import org.apache.spark.sql.functions.lit
> case class Foo(id: String)
> val col = lit("123")
> val df = sc.range(0,10,1,1).map { _ => Foo("") }
> // Exiting paste mode, now interpreting.
> import org.apache.spark.sql.functions.lit
> defined class Foo
> col: org.apache.spark.sql.Column = 123
> df: org.apache.spark.rdd.RDD[Foo] = MapPartitionsRDD[5] at map at :20
> {code}
> But fails in 3.0
> {code}
> scala> :pa
> // Entering paste mode (ctrl-D to finish)
> import org.apache.spark.sql.functions.lit
> case class Foo(id: String)
> val col = lit("123")
> val df = sc.range(0,10,1,1).map { _ => Foo("") }
> // Exiting paste mode, now interpreting.
> org.apache.spark.SparkException: Task not serializable
>   at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:396)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:386)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:159)
>   at org.apache.spark.SparkContext.clean(SparkContext.scala:2371)
>   at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:422)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
>   at org.apache.spark.rdd.RDD.map(RDD.scala:421)
>   ... 39 elided
> Caused by: java.io.NotSerializableException: org.apache.spark.sql.Column
> Serialization stack:
>   - object not serializable (class: org.apache.spark.sql.Column, value: 
> 123)
>   - field (class: $iw, name: col, type: class org.apache.spark.sql.Column)
>   - object (class $iw, $iw@2d87ac2b)
>   - element of array (index: 0)
>   - array (class [Ljava.lang.Object;, size 1)
>   - field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, 
> type: class [Ljava.lang.Object;)
>   - object (class java.lang.invoke.SerializedLambda, 
> SerializedLambda[capturingClass=class $iw, 
> functionalInterfaceMethod=scala/Function1.apply:(Ljava/lang/Object;)Ljava/lang/Object;,
>  implementation=invokeStatic 
> $anonfun$df$1$adapted:(L$iw;Ljava/lang/Object;)LFoo;, 
> instantiatedMethodType=(Ljava/lang/Object;)LFoo;, numCaptured=1])
>   - writeReplace data (class: java.lang.invoke.SerializedLambda)
>   - object (class $Lambda$2438/170049100, $Lambda$2438/170049100@d6b8c43)
>   at 
> org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:41)
>   at 
> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
>   at 
> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
>   at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:393)
>   ... 47 more
> {code}
> **Apache Spark 2.4.5 with Scala 2.12**
> {code}
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.4.5
>       /_/
> Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_242)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> :pa
> // Entering paste mode (ctrl-D to finish)
> import org.apache.spark.sql.functions.lit
> case class Foo(id: String)
> val col = lit("123")
> val df = sc.range(0,10,1,1).map { _ => Foo("") }
> // Exiting paste mode, now interpreting.
> org.apache.spark.SparkException: Task not serializable
>   at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:403)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:393)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
>   at 

[jira] [Assigned] (SPARK-31399) Closure cleaner broken in Scala 2.12

2020-05-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31399?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31399:


Assignee: Kris Mok  (was: Apache Spark)

> Closure cleaner broken in Scala 2.12
> 
>
> Key: SPARK-31399
> URL: https://issues.apache.org/jira/browse/SPARK-31399
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core
>Affects Versions: 2.4.5, 3.0.0
>Reporter: Wenchen Fan
>Assignee: Kris Mok
>Priority: Blocker
>
> The `ClosureCleaner` only supports Scala functions, and it uses the following 
> check to catch closures:
> {code}
>   // Check whether a class represents a Scala closure
>   private def isClosure(cls: Class[_]): Boolean = {
> cls.getName.contains("$anonfun$")
>   }
> {code}
> This doesn't work in 3.0 any more as we upgrade to Scala 2.12 and most Scala 
> functions become Java lambdas.
> As an example, the following code works well in Spark 2.4 Spark Shell:
> {code}
> scala> :pa
> // Entering paste mode (ctrl-D to finish)
> import org.apache.spark.sql.functions.lit
> case class Foo(id: String)
> val col = lit("123")
> val df = sc.range(0,10,1,1).map { _ => Foo("") }
> // Exiting paste mode, now interpreting.
> import org.apache.spark.sql.functions.lit
> defined class Foo
> col: org.apache.spark.sql.Column = 123
> df: org.apache.spark.rdd.RDD[Foo] = MapPartitionsRDD[5] at map at :20
> {code}
> But fails in 3.0
> {code}
> scala> :pa
> // Entering paste mode (ctrl-D to finish)
> import org.apache.spark.sql.functions.lit
> case class Foo(id: String)
> val col = lit("123")
> val df = sc.range(0,10,1,1).map { _ => Foo("") }
> // Exiting paste mode, now interpreting.
> org.apache.spark.SparkException: Task not serializable
>   at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:396)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:386)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:159)
>   at org.apache.spark.SparkContext.clean(SparkContext.scala:2371)
>   at org.apache.spark.rdd.RDD.$anonfun$map$1(RDD.scala:422)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>   at 
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:112)
>   at org.apache.spark.rdd.RDD.withScope(RDD.scala:414)
>   at org.apache.spark.rdd.RDD.map(RDD.scala:421)
>   ... 39 elided
> Caused by: java.io.NotSerializableException: org.apache.spark.sql.Column
> Serialization stack:
>   - object not serializable (class: org.apache.spark.sql.Column, value: 
> 123)
>   - field (class: $iw, name: col, type: class org.apache.spark.sql.Column)
>   - object (class $iw, $iw@2d87ac2b)
>   - element of array (index: 0)
>   - array (class [Ljava.lang.Object;, size 1)
>   - field (class: java.lang.invoke.SerializedLambda, name: capturedArgs, 
> type: class [Ljava.lang.Object;)
>   - object (class java.lang.invoke.SerializedLambda, 
> SerializedLambda[capturingClass=class $iw, 
> functionalInterfaceMethod=scala/Function1.apply:(Ljava/lang/Object;)Ljava/lang/Object;,
>  implementation=invokeStatic 
> $anonfun$df$1$adapted:(L$iw;Ljava/lang/Object;)LFoo;, 
> instantiatedMethodType=(Ljava/lang/Object;)LFoo;, numCaptured=1])
>   - writeReplace data (class: java.lang.invoke.SerializedLambda)
>   - object (class $Lambda$2438/170049100, $Lambda$2438/170049100@d6b8c43)
>   at 
> org.apache.spark.serializer.SerializationDebugger$.improveException(SerializationDebugger.scala:41)
>   at 
> org.apache.spark.serializer.JavaSerializationStream.writeObject(JavaSerializer.scala:47)
>   at 
> org.apache.spark.serializer.JavaSerializerInstance.serialize(JavaSerializer.scala:101)
>   at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:393)
>   ... 47 more
> {code}
> **Apache Spark 2.4.5 with Scala 2.12**
> {code}
> Welcome to
>       ____              __
>      / __/__  ___ _____/ /__
>     _\ \/ _ \/ _ `/ __/  '_/
>    /___/ .__/\_,_/_/ /_/\_\   version 2.4.5
>       /_/
> Using Scala version 2.12.10 (OpenJDK 64-Bit Server VM, Java 1.8.0_242)
> Type in expressions to have them evaluated.
> Type :help for more information.
> scala> :pa
> // Entering paste mode (ctrl-D to finish)
> import org.apache.spark.sql.functions.lit
> case class Foo(id: String)
> val col = lit("123")
> val df = sc.range(0,10,1,1).map { _ => Foo("") }
> // Exiting paste mode, now interpreting.
> org.apache.spark.SparkException: Task not serializable
>   at 
> org.apache.spark.util.ClosureCleaner$.ensureSerializable(ClosureCleaner.scala:403)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:393)
>   at org.apache.spark.util.ClosureCleaner$.clean(ClosureCleaner.scala:162)
>   at 

[jira] [Commented] (SPARK-31648) Filtering is supported only on partition keys of type string Issue

2020-05-06 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100639#comment-17100639
 ] 

angerszhu commented on SPARK-31648:
---

[~Rajesh Tadi]

Is there any way to reproduce this bug?

> Filtering is supported only on partition keys of type string Issue
> --
>
> Key: SPARK-31648
> URL: https://issues.apache.org/jira/browse/SPARK-31648
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Rajesh Tadi
>Priority: Major
> Attachments: Spark Bug.txt
>
>
> When I submit a SQL query with a partition filter I see the error below. I tried 
> setting the Spark configuration spark.sql.hive.manageFilesourcePartitions to 
> false but I still see the same issue.
> java.lang.RuntimeException: Caught Hive MetaException attempting to get 
> partition metadata by filter from Hive.
> java.lang.reflect.InvocationTargetException: 
> org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported 
> only on partition keys of type string
> org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported 
> only on partition keys of type string
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-31648) Filtering is supported only on partition keys of type string Issue

2020-05-06 Thread Rajesh Tadi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100629#comment-17100629
 ] 

Rajesh Tadi edited comment on SPARK-31648 at 5/6/20, 9:52 AM:
--

[~angerszhuuu] Below is the SQL I have used.

 

select * from testdb.partbuck_test where country_cd='India';

 

My table structure will look similar as below.

Schema:

 
||col_name||data_type||comment||
|ID|bigint|null|
|NAME|string|null|
|COUNTRY_CD|string|null|
|# Partition Information| | |
|# col_name|data_type|comment|
|COUNTRY_CD|string|null|

 

 


was (Author: rajesh tadi):
[~angerszhuuu] Below is the SQL I have used.

 

select * from testdb.partbuck_test where country_cd='India';

 

My table structure will look similar as below.

Schema:

+------------------------+-----------+---------+
|col_name                |data_type  |comment  |
+------------------------+-----------+---------+
|ID                      |bigint     |null     |
|NAME                    |string     |null     |
|.                       |...        |null     |
|.                       |...        |null     |
|.                       |...        |null     |
|COUNTRY_CD              |string     |null     |
|# Partition Information |           |         |
|# col_name              |data_type  |comment  |
|COUNTRY_CD              |string     |null     |
+------------------------+-----------+---------+

 

> Filtering is supported only on partition keys of type string Issue
> --
>
> Key: SPARK-31648
> URL: https://issues.apache.org/jira/browse/SPARK-31648
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Rajesh Tadi
>Priority: Major
> Attachments: Spark Bug.txt
>
>
> When I submit a SQL query with a partition filter I see the error below. I tried 
> setting the Spark configuration spark.sql.hive.manageFilesourcePartitions to 
> false but I still see the same issue.
> java.lang.RuntimeException: Caught Hive MetaException attempting to get 
> partition metadata by filter from Hive.
> java.lang.reflect.InvocationTargetException: 
> org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported 
> only on partition keys of type string
> org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported 
> only on partition keys of type string
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31651) Improve handling the case where different barrier sync types in a single sync

2020-05-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31651:


Assignee: Apache Spark

> Improve handling the case where different barrier sync types in a single sync
> -
>
> Key: SPARK-31651
> URL: https://issues.apache.org/jira/browse/SPARK-31651
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: wuyi
>Assignee: Apache Spark
>Priority: Major
>
> Currently, we call cleanupBarrierStage when different barrier sync types are 
> detected within a single sync. This causes a problem: a new `ContextBarrierState` 
> can be created again if further requesters are still on the way, and their 
> corresponding tasks will then fail because they are killed, rather than because 
> different barrier sync types were detected.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31651) Improve handling the case where different barrier sync types in a single sync

2020-05-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31651?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100635#comment-17100635
 ] 

Apache Spark commented on SPARK-31651:
--

User 'Ngone51' has created a pull request for this issue:
https://github.com/apache/spark/pull/28462

> Improve handling the case where different barrier sync types in a single sync
> -
>
> Key: SPARK-31651
> URL: https://issues.apache.org/jira/browse/SPARK-31651
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: wuyi
>Priority: Major
>
> Currently, we call cleanupBarrierStage when different barrier sync types are 
> detected within a single sync. This causes a problem: a new `ContextBarrierState` 
> can be created again if further requesters are still on the way, and their 
> corresponding tasks will then fail because they are killed, rather than because 
> different barrier sync types were detected.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31651) Improve handling the case where different barrier sync types in a single sync

2020-05-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31651:


Assignee: (was: Apache Spark)

> Improve handling the case where different barrier sync types in a single sync
> -
>
> Key: SPARK-31651
> URL: https://issues.apache.org/jira/browse/SPARK-31651
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: wuyi
>Priority: Major
>
> Currently, we call cleanupBarrierStage when different barrier sync types are 
> detected within a single sync. This causes a problem: a new `ContextBarrierState` 
> can be created again if further requesters are still on the way, and their 
> corresponding tasks will then fail because they are killed, rather than because 
> different barrier sync types were detected.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31648) Filtering is supported only on partition keys of type string Issue

2020-05-06 Thread Rajesh Tadi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100629#comment-17100629
 ] 

Rajesh Tadi commented on SPARK-31648:
-

[~angerszhuuu] Below is the SQL I have used.

 

select * from testdb.partbuck_test where country_cd='India';

 

My table structure will look similar as below.

Schema:

+------------------------+-----------+---------+
|col_name                |data_type  |comment  |
+------------------------+-----------+---------+
|ID                      |bigint     |null     |
|NAME                    |string     |null     |
|.                       |...        |null     |
|.                       |...        |null     |
|.                       |...        |null     |
|COUNTRY_CD              |string     |null     |
|# Partition Information |           |         |
|# col_name              |data_type  |comment  |
|COUNTRY_CD              |string     |null     |
+------------------------+-----------+---------+
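For reference, a minimal reconstruction of the setup described above (purely an assumption pieced together from the schema and query shown; the storage format and any bucketing are guesses and may not match the real table):

{code:scala}
// Hypothetical reconstruction of the reported table and query (Hive-style DDL).
spark.sql("""
  CREATE TABLE testdb.partbuck_test (ID BIGINT, NAME STRING)
  PARTITIONED BY (COUNTRY_CD STRING)
  STORED AS PARQUET
""")

// The query that reportedly triggers the MetaException.
spark.sql("SELECT * FROM testdb.partbuck_test WHERE country_cd = 'India'").show()
{code}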

 

> Filtering is supported only on partition keys of type string Issue
> --
>
> Key: SPARK-31648
> URL: https://issues.apache.org/jira/browse/SPARK-31648
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Rajesh Tadi
>Priority: Major
> Attachments: Spark Bug.txt
>
>
> When I submit a SQL query with a partition filter I see the error below. I tried 
> setting the Spark configuration spark.sql.hive.manageFilesourcePartitions to 
> false but I still see the same issue.
> java.lang.RuntimeException: Caught Hive MetaException attempting to get 
> partition metadata by filter from Hive.
> java.lang.reflect.InvocationTargetException: 
> org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported 
> only on partition keys of type string
> org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported 
> only on partition keys of type string
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-31470) Introduce SORTED BY clause in CREATE TABLE statement

2020-05-06 Thread Vikas Reddy Aravabhumi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100620#comment-17100620
 ] 

Vikas Reddy Aravabhumi edited comment on SPARK-31470 at 5/6/20, 9:42 AM:
-

[~yumwang] Could you please let us know the ETA for this fix?


was (Author: vikasreddy):
[~yumwang] Could you please let us know the ETA of this fix?

> Introduce SORTED BY clause in CREATE TABLE statement
> 
>
> Key: SPARK-31470
> URL: https://issues.apache.org/jira/browse/SPARK-31470
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> We usually sort on frequently filtered columns when writing data to improve 
> query performance. But this information is not captured in the table metadata.
>  
> {code:sql}
> CREATE TABLE t(day INT, hour INT, year INT, month INT)
> USING parquet
> PARTITIONED BY (year, month)
> SORTED BY (day, hour);
> {code}
>  
> Impala, Oracle and redshift support this clause:
> https://issues.apache.org/jira/browse/IMPALA-4166
> https://docs.oracle.com/database/121/DWHSG/attcluster.htm#GUID-DAECFBC5-FD1A-45A5-8C2C-DC9884D0857B
> https://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data-compare-sort-styles.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31470) Introduce SORTED BY clause in CREATE TABLE statement

2020-05-06 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-31470:

Description: 
We usually sort on frequently filtered columns when writing data to improve 
query performance. But there is no these info in the table information.
 

{code:sql}
CREATE TABLE t(day INT, hour INT, year INT, month INT)
USING parquet
PARTITIONED BY (year, month)
SORTED BY (day, hour);
{code}

 

Impala, Oracle and redshift support this clause:
https://issues.apache.org/jira/browse/IMPALA-4166
https://docs.oracle.com/database/121/DWHSG/attcluster.htm#GUID-DAECFBC5-FD1A-45A5-8C2C-DC9884D0857B
https://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data-compare-sort-styles.html

  was:
We usually sort on frequently filtered columns when writing data to improve 
query performance. But there is no these info in the table information.
 

{code:sql}
CREATE TABLE t(day INT, hour INT, year INT, month INT)
USING parquet
PARTITIONED BY (year, month)
SORTED BY (day, hour);
{code}

 

Impala and redshift support this clause:
https://issues.apache.org/jira/browse/IMPALA-4166
https://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data-compare-sort-styles.html


> Introduce SORTED BY clause in CREATE TABLE statement
> 
>
> Key: SPARK-31470
> URL: https://issues.apache.org/jira/browse/SPARK-31470
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> We usually sort on frequently filtered columns when writing data to improve 
> query performance. But this information is not captured in the table metadata.
>  
> {code:sql}
> CREATE TABLE t(day INT, hour INT, year INT, month INT)
> USING parquet
> PARTITIONED BY (year, month)
> SORTED BY (day, hour);
> {code}
>  
> Impala, Oracle and redshift support this clause:
> https://issues.apache.org/jira/browse/IMPALA-4166
> https://docs.oracle.com/database/121/DWHSG/attcluster.htm#GUID-DAECFBC5-FD1A-45A5-8C2C-DC9884D0857B
> https://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data-compare-sort-styles.html
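Until such a clause exists, a rough approximation (only a sketch; the path, table name, and bucket count below are placeholders, and neither form records a SORTED BY property on the table) is to sort explicitly at write time, or to go through the bucketed-table API:

{code:scala}
// Sort rows within each task's output before a partitioned Parquet write (sketch).
df.sortWithinPartitions("day", "hour")
  .write
  .partitionBy("year", "month")
  .parquet("/tmp/t")

// Or record the sort columns in the metastore via a bucketed table (bucket count assumed).
df.write
  .partitionBy("year", "month")
  .bucketBy(8, "day")
  .sortBy("day", "hour")
  .saveAsTable("t")
{code}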



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31651) Improve handling the case where different barrier sync types in a single sync

2020-05-06 Thread wuyi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuyi updated SPARK-31651:
-
Summary: Improve handling the case where different barrier sync types in a 
single sync  (was: Improve handling for the case of different barrier sync 
types in a single sync)

> Improve handling the case where different barrier sync types in a single sync
> -
>
> Key: SPARK-31651
> URL: https://issues.apache.org/jira/browse/SPARK-31651
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: wuyi
>Priority: Major
>
> Currently, we call cleanupBarrierStage when different barrier sync types are 
> detected within a single sync. This causes a problem: a new `ContextBarrierState` 
> can be created again if further requesters are still on the way, and their 
> corresponding tasks will then fail because they are killed, rather than because 
> different barrier sync types were detected.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31651) Improve handling for the case of different barrier sync types in a single sync

2020-05-06 Thread wuyi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31651?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuyi updated SPARK-31651:
-
Description: 
Currently, we use cleanupBarrierStage when detecting different barrier sync 
types in a single sync. This cause a problem that a new `ContextBarrierState` 
can be created again if there's following requesters on the way. And those 
corresponding tasks will fail because of killing instead of different barrier 
sync types detected.

 

 

  was:
Currently, we use cleanupBarrierStage when detecting different barrier sync 
types in a single sync. This cause a problem that a new `ContextBarrierState` 
can be created again if there's following requesters on the way. And those 
corresponding tasks will fail because of killing instead of different barrier 
sync types deteced.

 

 


> Improve handling for the case of different barrier sync types in a single sync
> --
>
> Key: SPARK-31651
> URL: https://issues.apache.org/jira/browse/SPARK-31651
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: wuyi
>Priority: Major
>
> Currently, we call cleanupBarrierStage when different barrier sync types are 
> detected within a single sync. This causes a problem: a new `ContextBarrierState` 
> can be created again if further requesters are still on the way, and their 
> corresponding tasks will then fail because they are killed, rather than because 
> different barrier sync types were detected.
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31651) Improve handling for the case of different barrier sync types in a single sync

2020-05-06 Thread wuyi (Jira)
wuyi created SPARK-31651:


 Summary: Improve handling for the case of different barrier sync 
types in a single sync
 Key: SPARK-31651
 URL: https://issues.apache.org/jira/browse/SPARK-31651
 Project: Spark
  Issue Type: Improvement
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: wuyi


Currently, we call cleanupBarrierStage when different barrier sync types are 
detected within a single sync. This causes a problem: a new `ContextBarrierState` 
can be created again if further requesters are still on the way, and their 
corresponding tasks will then fail because they are killed, rather than because 
different barrier sync types were detected.
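For context, "different barrier sync types in a single sync" refers to some tasks of a barrier stage calling barrier() while others call allGather() for the same sync step. A minimal sketch of that situation from the task side (assuming a two-partition barrier RDD; allGather is available since 3.0):

{code:scala}
// Sketch: two tasks of the same barrier stage issue different sync types.
import org.apache.spark.BarrierTaskContext

rdd.barrier().mapPartitions { iter =>
  val ctx = BarrierTaskContext.get()
  if (ctx.partitionId() == 0) {
    ctx.barrier()               // one task uses a plain barrier sync
  } else {
    ctx.allGather("payload")    // the other uses allGather in the same sync
  }
  iter
}.collect()
{code}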

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31470) Introduce SORTED BY clause in CREATE TABLE statement

2020-05-06 Thread Vikas Reddy Aravabhumi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31470?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100620#comment-17100620
 ] 

Vikas Reddy Aravabhumi commented on SPARK-31470:


[~yumwang] Could you please let us know the ETA of this fix?

> Introduce SORTED BY clause in CREATE TABLE statement
> 
>
> Key: SPARK-31470
> URL: https://issues.apache.org/jira/browse/SPARK-31470
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> We usually sort on frequently filtered columns when writing data to improve 
> query performance. But this information is not captured in the table metadata.
>  
> {code:sql}
> CREATE TABLE t(day INT, hour INT, year INT, month INT)
> USING parquet
> PARTITIONED BY (year, month)
> SORTED BY (day, hour);
> {code}
>  
> Impala and redshift support this clause:
> https://issues.apache.org/jira/browse/IMPALA-4166
> https://docs.aws.amazon.com/redshift/latest/dg/t_Sorting_data-compare-sort-styles.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31650) SQL UI doesn't show metrics and whole stage codegen in AQE

2020-05-06 Thread wuyi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuyi updated SPARK-31650:
-
Issue Type: Bug  (was: Test)

> SQL UI doesn't show metrics and whole stage codegen in AQE
> --
>
> Key: SPARK-31650
> URL: https://issues.apache.org/jira/browse/SPARK-31650
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: wuyi
>Priority: Major
> Attachments: before_aqe_ui.png
>
>
> When AQE is enabled and the query contains subqueries, the SQL UI may not show 
> metrics and whole stage codegen.
> Here's a demo that reproduces it:
>  
> {code:java}
> spark.range(1).toDF("value").write.parquet("/tmp/p1")
> spark.range(1).toDF("value").write.parquet("/tmp/p2")
> spark.read.parquet("/tmp/p1").createOrReplaceTempView("t1")
> spark.read.parquet("/tmp/p2").createOrReplaceTempView("t2")
> spark.sql("select * from t1 where value=(select Max(value) from t2)").show()
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31361) Rebase datetime in parquet/avro according to file metadata

2020-05-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100600#comment-17100600
 ] 

Apache Spark commented on SPARK-31361:
--

User 'MaxGekk' has created a pull request for this issue:
https://github.com/apache/spark/pull/28461

> Rebase datetime in parquet/avro according to file metadata
> --
>
> Key: SPARK-31361
> URL: https://issues.apache.org/jira/browse/SPARK-31361
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Wenchen Fan
>Assignee: Wenchen Fan
>Priority: Blocker
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31127) Add abstract Selector

2020-05-06 Thread zhengruifeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng reassigned SPARK-31127:


Assignee: Huaxin Gao

> Add abstract Selector
> -
>
> Key: SPARK-31127
> URL: https://issues.apache.org/jira/browse/SPARK-31127
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.1.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
>
> Add an abstract Selector. Move the code shared by ChiSqSelector and 
> FValueSelector into Selector.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31127) Add abstract Selector

2020-05-06 Thread zhengruifeng (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

zhengruifeng resolved SPARK-31127.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 27978
[https://github.com/apache/spark/pull/27978]

> Add abstract Selector
> -
>
> Key: SPARK-31127
> URL: https://issues.apache.org/jira/browse/SPARK-31127
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML
>Affects Versions: 3.1.0
>Reporter: Huaxin Gao
>Assignee: Huaxin Gao
>Priority: Major
> Fix For: 3.1.0
>
>
> Add an abstract Selector. Move the code shared by ChiSqSelector and 
> FValueSelector into Selector.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31650) SQL UI doesn't show metrics and whole stage codegen in AQE

2020-05-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31650:


Assignee: Apache Spark

> SQL UI doesn't show metrics and whole stage codegen in AQE
> --
>
> Key: SPARK-31650
> URL: https://issues.apache.org/jira/browse/SPARK-31650
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: wuyi
>Assignee: Apache Spark
>Priority: Major
> Attachments: before_aqe_ui.png
>
>
> When AQE is enabled and the query contains subqueries, the SQL UI may not show 
> metrics and whole stage codegen.
> Here's a demo that reproduces it:
>  
> {code:java}
> spark.range(1).toDF("value").write.parquet("/tmp/p1")
> spark.range(1).toDF("value").write.parquet("/tmp/p2")
> spark.read.parquet("/tmp/p1").createOrReplaceTempView("t1")
> spark.read.parquet("/tmp/p2").createOrReplaceTempView("t2")
> spark.sql("select * from t1 where value=(select Max(value) from t2)").show()
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31650) SQL UI doesn't show metrics and whole stage codegen in AQE

2020-05-06 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31650?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100555#comment-17100555
 ] 

Apache Spark commented on SPARK-31650:
--

User 'Ngone51' has created a pull request for this issue:
https://github.com/apache/spark/pull/28460

> SQL UI doesn't show metrics and whole stage codegen in AQE
> --
>
> Key: SPARK-31650
> URL: https://issues.apache.org/jira/browse/SPARK-31650
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: wuyi
>Priority: Major
> Attachments: before_aqe_ui.png
>
>
> When AQE is enabled and the query contains subqueries, the SQL UI may not show 
> metrics and whole stage codegen.
> Here's a demo that reproduces it:
>  
> {code:java}
> spark.range(1).toDF("value").write.parquet("/tmp/p1")
> spark.range(1).toDF("value").write.parquet("/tmp/p2")
> spark.read.parquet("/tmp/p1").createOrReplaceTempView("t1")
> spark.read.parquet("/tmp/p2").createOrReplaceTempView("t2")
> spark.sql("select * from t1 where value=(select Max(value) from t2)").show()
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-31650) SQL UI doesn't show metrics and whole stage codegen in AQE

2020-05-06 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-31650:


Assignee: (was: Apache Spark)

> SQL UI doesn't show metrics and whole stage codegen in AQE
> --
>
> Key: SPARK-31650
> URL: https://issues.apache.org/jira/browse/SPARK-31650
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: wuyi
>Priority: Major
> Attachments: before_aqe_ui.png
>
>
> When AQE is enabled and the query contains subqueries, the SQL UI may not show 
> metrics and whole stage codegen.
> Here's a demo that reproduces it:
>  
> {code:java}
> spark.range(1).toDF("value").write.parquet("/tmp/p1")
> spark.range(1).toDF("value").write.parquet("/tmp/p2")
> spark.read.parquet("/tmp/p1").createOrReplaceTempView("t1")
> spark.read.parquet("/tmp/p2").createOrReplaceTempView("t2")
> spark.sql("select * from t1 where value=(select Max(value) from t2)").show()
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-31648) Filtering is supported only on partition keys of type string Issue

2020-05-06 Thread angerszhu (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31648?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100551#comment-17100551
 ] 

angerszhu commented on SPARK-31648:
---

[~Rajesh Tadi]

Can you share your SQL statement and table schema details?
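
For illustration, a minimal hypothetical sketch of the kind of setup that hits this 
check (table name, columns, and values are assumptions, not taken from this ticket): 
a Hive table whose partition key is not a string, filtered on that key.

{code:java}
// Hypothetical sketch; all identifiers are assumptions. The partition column "dt"
// is an INT, so pushing the partition filter down to a Hive metastore that only
// supports string partition-key filters raises the MetaException quoted below.
spark.sql("CREATE TABLE demo_part (value STRING) PARTITIONED BY (dt INT) STORED AS PARQUET")
spark.sql("INSERT INTO demo_part PARTITION (dt=20200506) VALUES ('a')")
spark.sql("SELECT * FROM demo_part WHERE dt = 20200506").show()
{code}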

> Filtering is supported only on partition keys of type string Issue
> --
>
> Key: SPARK-31648
> URL: https://issues.apache.org/jira/browse/SPARK-31648
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Rajesh Tadi
>Priority: Major
> Attachments: Spark Bug.txt
>
>
> When I submit a SQL query with a partition filter, I see the error below. I tried 
> setting the Spark configuration spark.sql.hive.manageFilesourcePartitions to 
> false but I still see the same issue.
> java.lang.RuntimeException: Caught Hive MetaException attempting to get 
> partition metadata by filter from Hive.
> java.lang.reflect.InvocationTargetException: 
> org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported 
> only on partition keys of type string
> org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported 
> only on partition keys of type string
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-31460) spark-sql-kafka source in spark 2.4.4 causes reading stream failure frequently

2020-05-06 Thread Gabor Somogyi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31460?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabor Somogyi resolved SPARK-31460.
---
Resolution: Information Provided

> spark-sql-kafka source in spark 2.4.4 causes reading stream failure frequently
> --
>
> Key: SPARK-31460
> URL: https://issues.apache.org/jira/browse/SPARK-31460
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.4.4
>Reporter: vinay
>Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Spark 2.4.4 provides the source "spark-sql-kafka-0-10_2.11".
>  
> When I read from my kafka-0.10.2.11 cluster, it frequently throws the error 
> "*java.util.concurrent.TimeoutException: Cannot fetch record for offset x 
> in 1000 milliseconds*" and the job then fails.
>  
> This issue was seen before on 2.3 according to ticket 23829, and an 
> upgrade to Spark 2.4 was supposed to solve it.
>  
> {code:java}
> compile group: 'org.apache.spark', name: 'spark-sql-kafka-0-10_2.11', 
> version: '2.4.4'{code}
> Here is the error stack.
> {code:java}
> org.apache.spark.SparkException: Writing job aborted.
>  
> org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:92)
>  
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>  
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>  
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>  
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>  org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>  org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
>  org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:296)
>  
> org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3389)
>  org.apache.spark.sql.Dataset$$anonfun$collect$1.apply(Dataset.scala:2788)
>  org.apache.spark.sql.Dataset$$anonfun$collect$1.apply(Dataset.scala:2788)
>  org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3370)
>  
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>  
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>  
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>  org.apache.spark.sql.Dataset.withAction(Dataset.scala:3369)
>  org.apache.spark.sql.Dataset.collect(Dataset.scala:2788)
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5$$anonfun$apply$17.apply(MicroBatchExecution.scala:540)
>  
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>  
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>  
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5.apply(MicroBatchExecution.scala:535)
>  
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
>  
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:534)
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:198)
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
>  
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
>  
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:166)
>  
> org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
>  
> 

[jira] [Commented] (SPARK-31460) spark-sql-kafka source in spark 2.4.4 causes reading stream failure frequently

2020-05-06 Thread Gabor Somogyi (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-31460?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100549#comment-17100549
 ] 

Gabor Somogyi commented on SPARK-31460:
---

Please re-open if the suggestion didn't help.
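
For readers hitting the same timeout, a hedged sketch of the tuning that is usually 
suggested for it. kafkaConsumer.pollTimeoutMs is the documented Kafka source option 
for the executor-side poll timeout, which appears to be the window in the quoted 
TimeoutException; the broker address, topic name, and timeout value below are 
assumptions.

{code:java}
// Hedged sketch, not the resolution of this ticket: raise the executor-side poll
// timeout of the Kafka source so slow fetches are not cut off after a short window.
val stream = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "broker1:9092")  // assumed broker address
  .option("subscribe", "events")                      // assumed topic name
  .option("kafkaConsumer.pollTimeoutMs", "10000")     // assumed value
  .load()
{code}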

> spark-sql-kafka source in spark 2.4.4 causes reading stream failure frequently
> --
>
> Key: SPARK-31460
> URL: https://issues.apache.org/jira/browse/SPARK-31460
> Project: Spark
>  Issue Type: Bug
>  Components: Structured Streaming
>Affects Versions: 2.4.4
>Reporter: vinay
>Priority: Major
>   Original Estimate: 24h
>  Remaining Estimate: 24h
>
> Spark 2.4.4 provides the source "spark-sql-kafka-0-10_2.11".
>  
> When I read from my kafka-0.10.2.11 cluster, it frequently throws the error 
> "*java.util.concurrent.TimeoutException: Cannot fetch record for offset x 
> in 1000 milliseconds*" and the job then fails.
>  
> This issue was seen before on 2.3 according to ticket 23829, and an 
> upgrade to Spark 2.4 was supposed to solve it.
>  
> {code:java}
> compile group: 'org.apache.spark', name: 'spark-sql-kafka-0-10_2.11', 
> version: '2.4.4'{code}
> Here is the error stack.
> {code:java}
> org.apache.spark.SparkException: Writing job aborted.
>  
> org.apache.spark.sql.execution.datasources.v2.WriteToDataSourceV2Exec.doExecute(WriteToDataSourceV2Exec.scala:92)
>  
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:131)
>  
> org.apache.spark.sql.execution.SparkPlan$$anonfun$execute$1.apply(SparkPlan.scala:127)
>  
> org.apache.spark.sql.execution.SparkPlan$$anonfun$executeQuery$1.apply(SparkPlan.scala:155)
>  
> org.apache.spark.rdd.RDDOperationScope$.withScope(RDDOperationScope.scala:151)
>  org.apache.spark.sql.execution.SparkPlan.executeQuery(SparkPlan.scala:152)
>  org.apache.spark.sql.execution.SparkPlan.execute(SparkPlan.scala:127)
>  org.apache.spark.sql.execution.SparkPlan.getByteArrayRdd(SparkPlan.scala:247)
>  org.apache.spark.sql.execution.SparkPlan.executeCollect(SparkPlan.scala:296)
>  
> org.apache.spark.sql.Dataset.org$apache$spark$sql$Dataset$$collectFromPlan(Dataset.scala:3389)
>  org.apache.spark.sql.Dataset$$anonfun$collect$1.apply(Dataset.scala:2788)
>  org.apache.spark.sql.Dataset$$anonfun$collect$1.apply(Dataset.scala:2788)
>  org.apache.spark.sql.Dataset$$anonfun$52.apply(Dataset.scala:3370)
>  
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>  
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>  
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
>  org.apache.spark.sql.Dataset.withAction(Dataset.scala:3369)
>  org.apache.spark.sql.Dataset.collect(Dataset.scala:2788)
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5$$anonfun$apply$17.apply(MicroBatchExecution.scala:540)
>  
> org.apache.spark.sql.execution.SQLExecution$$anonfun$withNewExecutionId$1.apply(SQLExecution.scala:78)
>  
> org.apache.spark.sql.execution.SQLExecution$.withSQLConfPropagated(SQLExecution.scala:125)
>  
> org.apache.spark.sql.execution.SQLExecution$.withNewExecutionId(SQLExecution.scala:73)
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch$5.apply(MicroBatchExecution.scala:535)
>  
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
>  
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
> org.apache.spark.sql.execution.streaming.MicroBatchExecution.org$apache$spark$sql$execution$streaming$MicroBatchExecution$$runBatch(MicroBatchExecution.scala:534)
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply$mcV$sp(MicroBatchExecution.scala:198)
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1$$anonfun$apply$mcZ$sp$1.apply(MicroBatchExecution.scala:166)
>  
> org.apache.spark.sql.execution.streaming.ProgressReporter$class.reportTimeTaken(ProgressReporter.scala:351)
>  
> org.apache.spark.sql.execution.streaming.StreamExecution.reportTimeTaken(StreamExecution.scala:58)
> org.apache.spark.sql.execution.streaming.MicroBatchExecution$$anonfun$runActivatedStream$1.apply$mcZ$sp(MicroBatchExecution.scala:166)
>  
> org.apache.spark.sql.execution.streaming.ProcessingTimeExecutor.execute(TriggerExecutor.scala:56)
>  
> 

[jira] [Created] (SPARK-31650) SQL UI doesn't show metrics and whole stage codegen in AQE

2020-05-06 Thread wuyi (Jira)
wuyi created SPARK-31650:


 Summary: SQL UI doesn't show metrics and whole stage codegen in AQE
 Key: SPARK-31650
 URL: https://issues.apache.org/jira/browse/SPARK-31650
 Project: Spark
  Issue Type: Test
  Components: SQL
Affects Versions: 3.0.0
Reporter: wuyi
 Attachments: before_aqe_ui.png

When AQE is enabled and the query contains subqueries, the SQL UI may not show 
metrics and whole stage codegen.

Here's a demo to reproduce it:

 
{code:java}
spark.range(1).toDF("value").write.parquet("/tmp/p1")

spark.range(1).toDF("value").write.parquet("/tmp/p2")

spark.read.parquet("/tmp/p1").createOrReplaceTempView("t1")

spark.read.parquet("/tmp/p2").createOrReplaceTempView("t2")


spark.sql("select * from t1 where value=(select Max(value) from t2)").show()
{code}
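
The demo above assumes adaptive query execution is already turned on; a minimal 
sketch of that setup, assuming the standard spark.sql.adaptive.enabled switch is 
how AQE was enabled for this report:

{code:java}
// Minimal setup sketch (assumption: AQE enabled via the standard configuration key).
// Run this before the reproduce demo above so the query goes through the AQE path.
spark.conf.set("spark.sql.adaptive.enabled", "true")
{code}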
 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31650) SQL UI doesn't show metrics and whole stage codegen in AQE

2020-05-06 Thread wuyi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31650?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

wuyi updated SPARK-31650:
-
Attachment: before_aqe_ui.png

> SQL UI doesn't show metrics and whole stage codegen in AQE
> --
>
> Key: SPARK-31650
> URL: https://issues.apache.org/jira/browse/SPARK-31650
> Project: Spark
>  Issue Type: Test
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: wuyi
>Priority: Major
> Attachments: before_aqe_ui.png
>
>
> When AQE is enabled and the query contains subqueries, the SQL UI may not show 
> metrics and whole stage codegen.
> Here's a demo to reproduce it:
>  
> {code:java}
> spark.range(1).toDF("value").write.parquet("/tmp/p1")
> spark.range(1).toDF("value").write.parquet("/tmp/p2")
> spark.read.parquet("/tmp/p1").createOrReplaceTempView("t1")
> spark.read.parquet("/tmp/p2").createOrReplaceTempView("t2")
> spark.sql("select * from t1 where value=(select Max(value) from t2)").show()
> {code}
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-31649) Spread partitions evenly to spark executors

2020-05-06 Thread serdar onur (Jira)
serdar onur created SPARK-31649:
---

 Summary: Spread partitions evenly to spark executors
 Key: SPARK-31649
 URL: https://issues.apache.org/jira/browse/SPARK-31649
 Project: Spark
  Issue Type: New Feature
  Components: Spark Core
Affects Versions: 2.4.4
Reporter: serdar onur


The year is 2020 and I am still trying to find a solution to this. I totally 
understand what [~thunderstumpges] was trying to achieve, and I am trying to 
achieve the same. For a tool like Spark, it is unacceptable not to be able to 
distribute the created partitions to the executors evenly. We can 
create a custom partitioner to distribute the data to the partitions evenly by 
creating our own partition index. I was under the impression that a similar 
approach could be applied to spread these partitions to the executors 
evenly (using some sort of executor index for selecting executors during 
partition distribution). I have been googling this for a day now, and I am very 
disappointed to say that so far this does not seem to be possible.

Note: I am disappointed that the issue below was put into a resolved state 
without anything actually being done about it.

https://issues.apache.org/jira/browse/SPARK-19371



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-19371) Cannot spread cached partitions evenly across executors

2020-05-06 Thread serdar onur (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-19371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17100517#comment-17100517
 ] 

serdar onur commented on SPARK-19371:
-

The year is 2020 and I am still trying to find a solution to this. I totally 
understand what [~thunderstumpges] was trying to achieve, and I am trying to 
achieve the same. For a tool like Spark, it is unacceptable not to be able to 
distribute the created partitions to the executors evenly. We can 
create a custom partitioner to distribute the data to the partitions evenly by 
creating our own partition index. I was under the impression that a similar 
approach could be applied to spread these partitions to the executors 
evenly (using some sort of executor index for selecting executors during 
partition distribution). I have been googling this for a day now, and I am very 
disappointed to say that so far this does not seem to be possible.

> Cannot spread cached partitions evenly across executors
> ---
>
> Key: SPARK-19371
> URL: https://issues.apache.org/jira/browse/SPARK-19371
> Project: Spark
>  Issue Type: Bug
>Affects Versions: 1.6.1
>Reporter: Thunder Stumpges
>Priority: Major
>  Labels: bulk-closed
> Attachments: RDD Block Distribution on two executors.png, Unbalanced 
> RDD Blocks, and resulting task imbalance.png, Unbalanced RDD Blocks, and 
> resulting task imbalance.png, execution timeline.png
>
>
> Before running an intensive iterative job (in this case a distributed topic 
> model training), we need to load a dataset and persist it across executors. 
> After loading from HDFS and persisting, the partitions are spread unevenly 
> across executors (based on the initial scheduling of the reads, which are not 
> data-locality sensitive). The partition sizes are even, just not their 
> distribution over executors. We currently have no way to force the partitions 
> to spread evenly, and as the iterative algorithm begins, tasks are 
> distributed to executors based on this initial load, forcing some very 
> unbalanced work.
> This has been mentioned a 
> [number|http://apache-spark-developers-list.1001551.n3.nabble.com/RDD-Partitions-not-distributed-evenly-to-executors-tt16988.html#a17059]
>  of 
> [times|http://apache-spark-user-list.1001560.n3.nabble.com/Spark-work-distribution-among-execs-tt26502.html]
>  in 
> [various|http://apache-spark-user-list.1001560.n3.nabble.com/Partitions-are-get-placed-on-the-single-node-tt26597.html]
>  user/dev group threads.
> None of the discussions I could find had solutions that worked for me. Here 
> are examples of things I have tried. All resulted in partitions in memory 
> that were NOT evenly distributed to executors, causing future tasks to be 
> imbalanced across executors as well.
> *Reduce Locality*
> {code}spark.shuffle.reduceLocality.enabled=false/true{code}
> *"Legacy" memory mode*
> {code}spark.memory.useLegacyMode = true/false{code}
> *Basic load and repartition*
> {code}
> val numPartitions = 48*16
> val df = sqlContext.read.
> parquet("/data/folder_to_load").
> repartition(numPartitions).
> persist
> df.count
> {code}
> *Load and repartition to 2x partitions, then shuffle repartition down to 
> desired partitions*
> {code}
> val numPartitions = 48*16
> val df2 = sqlContext.read.
> parquet("/data/folder_to_load").
> repartition(numPartitions*2)
> val df = df2.repartition(numPartitions).
> persist
> df.count
> {code}
> It would be great if, when persisting an RDD/DataFrame, we could request 
> that those partitions be stored evenly across executors in preparation for 
> future tasks. 
> I'm not sure if this is a more general issue (i.e. not just involving 
> persisting RDDs), but for the persisted in-memory case, it can make a HUGE 
> difference in the overall running time of the remaining work.
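
As a footnote to the attempts quoted above, a hedged diagnostic sketch (the names 
are assumptions and df stands for the persisted DataFrame from the quoted snippets) 
for checking where the cached partitions actually landed, by tagging each partition 
with the executor evaluating it:

{code:java}
// Hedged diagnostic sketch: count partitions per executor for a persisted DataFrame.
// SparkEnv.get.executorId identifies the executor evaluating each partition.
import org.apache.spark.SparkEnv

val perExecutor = df.rdd
  .mapPartitions(_ => Iterator((SparkEnv.get.executorId, 1)))
  .reduceByKey(_ + _)
  .collect()

perExecutor.foreach { case (executor, partitions) =>
  println(s"$executor -> $partitions partitions")
}
{code}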



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31648) Filtering is supported only on partition keys of type string Issue

2020-05-06 Thread Rajesh Tadi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Tadi updated SPARK-31648:

Description: 
When I submit a SQL query with a partition filter, I see the error below. I tried 
setting the Spark configuration spark.sql.hive.manageFilesourcePartitions to false 
but I still see the same issue.

java.lang.RuntimeException: Caught Hive MetaException attempting to get 
partition metadata by filter from Hive.

java.lang.reflect.InvocationTargetException: 
org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only 
on partition keys of type string

org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only 
on partition keys of type string

 

 

  was:
When I submit a SQL query with a partition filter, I see the error below. I tried 
setting the Spark configuration spark.sql.hive.manageFilesourcePartitions to false 
but I still see the same issue.

java.lang.RuntimeException: Caught Hive MetaException attempting to get 
partition metadata by filter from Hive.

java.lang.reflect.InvocationTargetException: 
org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only 
on partition keys of type string

org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only 
on partition keys of type string


> Filtering is supported only on partition keys of type string Issue
> --
>
> Key: SPARK-31648
> URL: https://issues.apache.org/jira/browse/SPARK-31648
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Rajesh Tadi
>Priority: Major
> Attachments: Spark Bug.txt
>
>
> When I submit a SQL query with a partition filter, I see the error below. I tried 
> setting the Spark configuration spark.sql.hive.manageFilesourcePartitions to 
> false but I still see the same issue.
> java.lang.RuntimeException: Caught Hive MetaException attempting to get 
> partition metadata by filter from Hive.
> java.lang.reflect.InvocationTargetException: 
> org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported 
> only on partition keys of type string
> org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported 
> only on partition keys of type string
>  
>  



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31648) Filtering is supported only on partition keys of type string Issue

2020-05-06 Thread Rajesh Tadi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Tadi updated SPARK-31648:

Description: 
When I submit a SQL query with a partition filter, I see the error below. I tried 
setting the Spark configuration spark.sql.hive.manageFilesourcePartitions to false 
but I still see the same issue.

java.lang.RuntimeException: Caught Hive MetaException attempting to get 
partition metadata by filter from Hive.

java.lang.reflect.InvocationTargetException: 
org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only 
on partition keys of type string

org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only 
on partition keys of type string

  was:
When I submit a SQL query with a partition filter, I see the error below. I tried 
setting the Spark configuration spark.sql.hive.manageFilesourcePartitions to false 
but I still see the same issue.

java.lang.RuntimeException: Caught Hive MetaException attempting to get 
partition metadata by filter from Hive.

java.lang.reflect.InvocationTargetException: 
org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only 
on partition keys of type string

org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only 
on partition keys of type string

 

[^Spark Bug.txt]


> Filtering is supported only on partition keys of type string Issue
> --
>
> Key: SPARK-31648
> URL: https://issues.apache.org/jira/browse/SPARK-31648
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Rajesh Tadi
>Priority: Major
> Attachments: Spark Bug.txt
>
>
> When I submit a SQL query with a partition filter, I see the error below. I tried 
> setting the Spark configuration spark.sql.hive.manageFilesourcePartitions to 
> false but I still see the same issue.
> java.lang.RuntimeException: Caught Hive MetaException attempting to get 
> partition metadata by filter from Hive.
> java.lang.reflect.InvocationTargetException: 
> org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported 
> only on partition keys of type string
> org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported 
> only on partition keys of type string



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31648) Filtering is supported only on partition keys of type string Issue

2020-05-06 Thread Rajesh Tadi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Tadi updated SPARK-31648:

Description: 
When I submit a SQL query with a partition filter, I see the error below. I tried 
setting the Spark configuration spark.sql.hive.manageFilesourcePartitions to false 
but I still see the same issue.

java.lang.RuntimeException: Caught Hive MetaException attempting to get 
partition metadata by filter from Hive.

java.lang.reflect.InvocationTargetException: 
org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only 
on partition keys of type string

org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only 
on partition keys of type string

 

[^Spark Bug.txt]

  was:
When I submit a SQL query with a partition filter, I see the error below. I tried 
setting the Spark configuration spark.sql.hive.manageFilesourcePartitions to false 
but I still see the same issue.

java.lang.RuntimeException: Caught Hive MetaException attempting to get 
partition metadata by filter from Hive.

java.lang.reflect.InvocationTargetException: 
org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only 
on partition keys of type string

org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only 
on partition keys of type string

Summary: Filtering is supported only on partition keys of type string 
Issue  (was: Filtering is supported only on partition keys of type string)

> Filtering is supported only on partition keys of type string Issue
> --
>
> Key: SPARK-31648
> URL: https://issues.apache.org/jira/browse/SPARK-31648
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Rajesh Tadi
>Priority: Major
> Attachments: Spark Bug.txt
>
>
> When I submit a SQL query with a partition filter, I see the error below. I tried 
> setting the Spark configuration spark.sql.hive.manageFilesourcePartitions to 
> false but I still see the same issue.
> java.lang.RuntimeException: Caught Hive MetaException attempting to get 
> partition metadata by filter from Hive.
> java.lang.reflect.InvocationTargetException: 
> org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported 
> only on partition keys of type string
> org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported 
> only on partition keys of type string
>  
> [^Spark Bug.txt]



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31648) Filtering is supported only on partition keys of type string

2020-05-06 Thread Rajesh Tadi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Tadi updated SPARK-31648:

Description: 
When I submit a SQL query with a partition filter, I see the error below. I tried 
setting the Spark configuration spark.sql.hive.manageFilesourcePartitions to false 
but I still see the same issue.

java.lang.RuntimeException: Caught Hive MetaException attempting to get 
partition metadata by filter from Hive.

java.lang.reflect.InvocationTargetException: 
org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only 
on partition keys of type string

org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only 
on partition keys of type string

  was:
When I submit a SQL query with a partition filter, I see the error below. I tried 
setting the Spark configuration spark.sql.hive.manageFilesourcePartitions to false 
but had no luck.

java.lang.RuntimeException: Caught Hive MetaException attempting to get 
partition metadata by filter from Hive.

java.lang.reflect.InvocationTargetException: 
org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only 
on partition keys of type string

org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported only 
on partition keys of type string


> Filtering is supported only on partition keys of type string
> 
>
> Key: SPARK-31648
> URL: https://issues.apache.org/jira/browse/SPARK-31648
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Rajesh Tadi
>Priority: Major
> Attachments: Spark Bug.txt
>
>
> When I submit a SQL query with a partition filter, I see the error below. I tried 
> setting the Spark configuration spark.sql.hive.manageFilesourcePartitions to 
> false but I still see the same issue.
> java.lang.RuntimeException: Caught Hive MetaException attempting to get 
> partition metadata by filter from Hive.
> java.lang.reflect.InvocationTargetException: 
> org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported 
> only on partition keys of type string
> org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported 
> only on partition keys of type string



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31648) Filtering is supported only on partition keys of type string

2020-05-06 Thread Rajesh Tadi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Tadi updated SPARK-31648:

Attachment: Spark Bug.txt

> Filtering is supported only on partition keys of type string
> 
>
> Key: SPARK-31648
> URL: https://issues.apache.org/jira/browse/SPARK-31648
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.4
>Reporter: Rajesh Tadi
>Priority: Major
> Attachments: Spark Bug.txt
>
>
> When I submit a SQL query with a partition filter, I see the error below. I tried 
> setting the Spark configuration spark.sql.hive.manageFilesourcePartitions to 
> false but had no luck.
> java.lang.RuntimeException: Caught Hive MetaException attempting to get 
> partition metadata by filter from Hive.
> java.lang.reflect.InvocationTargetException: 
> org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported 
> only on partition keys of type string
> org.apache.hadoop.hive.metastore.api.MetaException: Filtering is supported 
> only on partition keys of type string



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-31648) Filtering is supported only on partition keys of type string

2020-05-06 Thread Rajesh Tadi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Tadi updated SPARK-31648:

  Docs Text: 
java.lang.RuntimeException: Caught Hive MetaException attempting to get 
partition metadata by filter from Hive. You can set the Spark configuration 
setting spark.sql.hive.manageFilesourcePartitions to false to work around this 
problem, however this will result in degraded performance. Please report a bug: 
https://issues.apache.org/jira/browse/SPARK
at 
org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:775)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:679)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:677)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:275)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:213)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:212)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:258)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:677)
at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1221)
at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1214)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.listPartitionsByFilter(HiveExternalCatalog.scala:1214)
at 
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.listPartitionsByFilter(ExternalCatalogWithListener.scala:254)
at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitionsByFilter(SessionCatalog.scala:962)
at 
org.apache.spark.sql.execution.datasources.CatalogFileIndex.filterPartitions(CatalogFileIndex.scala:73)
at 
org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(PruneFileSourcePartitions.scala:63)
at 
org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(PruneFileSourcePartitions.scala:27)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:259)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:259)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:258)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.transformDown(AnalysisHelper.scala:149)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:329)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:327)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:264)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.transformDown(AnalysisHelper.scala:149)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:329)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:327)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:264)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
at 

[jira] [Updated] (SPARK-31648) Filtering is supported only on partition keys of type string

2020-05-06 Thread Rajesh Tadi (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31648?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Rajesh Tadi updated SPARK-31648:

Docs Text:   (was: java.lang.RuntimeException: Caught Hive MetaException 
attempting to get partition metadata by filter from Hive. You can set the Spark 
configuration setting spark.sql.hive.manageFilesourcePartitions to false to 
work around this problem, however this will result in degraded performance. 
Please report a bug: https://issues.apache.org/jira/browse/SPARK
at 
org.apache.spark.sql.hive.client.Shim_v0_13.getPartitionsByFilter(HiveShim.scala:775)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:679)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$getPartitionsByFilter$1.apply(HiveClientImpl.scala:677)
at 
org.apache.spark.sql.hive.client.HiveClientImpl$$anonfun$withHiveState$1.apply(HiveClientImpl.scala:275)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.liftedTree1$1(HiveClientImpl.scala:213)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.retryLocked(HiveClientImpl.scala:212)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.withHiveState(HiveClientImpl.scala:258)
at 
org.apache.spark.sql.hive.client.HiveClientImpl.getPartitionsByFilter(HiveClientImpl.scala:677)
at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1221)
at 
org.apache.spark.sql.hive.HiveExternalCatalog$$anonfun$listPartitionsByFilter$1.apply(HiveExternalCatalog.scala:1214)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.withClient(HiveExternalCatalog.scala:97)
at 
org.apache.spark.sql.hive.HiveExternalCatalog.listPartitionsByFilter(HiveExternalCatalog.scala:1214)
at 
org.apache.spark.sql.catalyst.catalog.ExternalCatalogWithListener.listPartitionsByFilter(ExternalCatalogWithListener.scala:254)
at 
org.apache.spark.sql.catalyst.catalog.SessionCatalog.listPartitionsByFilter(SessionCatalog.scala:962)
at 
org.apache.spark.sql.execution.datasources.CatalogFileIndex.filterPartitions(CatalogFileIndex.scala:73)
at 
org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(PruneFileSourcePartitions.scala:63)
at 
org.apache.spark.sql.execution.datasources.PruneFileSourcePartitions$$anonfun$apply$1.applyOrElse(PruneFileSourcePartitions.scala:27)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:259)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$2.apply(TreeNode.scala:259)
at 
org.apache.spark.sql.catalyst.trees.CurrentOrigin$.withOrigin(TreeNode.scala:70)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:258)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.transformDown(AnalysisHelper.scala:149)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:329)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:327)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:264)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
at 
org.apache.spark.sql.catalyst.plans.logical.AnalysisHelper$class.transformDown(AnalysisHelper.scala:149)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.transformDown(LogicalPlan.scala:29)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$transformDown$1.apply(TreeNode.scala:264)
at 
org.apache.spark.sql.catalyst.trees.TreeNode$$anonfun$4.apply(TreeNode.scala:329)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.mapProductIterator(TreeNode.scala:187)
at org.apache.spark.sql.catalyst.trees.TreeNode.mapChildren(TreeNode.scala:327)
at 
org.apache.spark.sql.catalyst.trees.TreeNode.transformDown(TreeNode.scala:264)
at 
org.apache.spark.sql.catalyst.plans.logical.LogicalPlan.org$apache$spark$sql$catalyst$plans$logical$AnalysisHelper$$super$transformDown(LogicalPlan.scala:29)
at