[jira] [Assigned] (SPARK-32982) Remove hive-1.2 profiles in PIP installation option

2020-09-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-32982:


Assignee: Hyukjin Kwon

> Remove hive-1.2 profiles in PIP installation option
> ---
>
> Key: SPARK-32982
> URL: https://issues.apache.org/jira/browse/SPARK-32982
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
>
> Hive 1.2 is a fork that we should remove. It's best not to expose this
> distribution via pip.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32982) Remove hive-1.2 profiles in PIP installation option

2020-09-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-32982.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 29858
[https://github.com/apache/spark/pull/29858]

> Remove hive-1.2 profiles in PIP installation option
> ---
>
> Key: SPARK-32982
> URL: https://issues.apache.org/jira/browse/SPARK-32982
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Hyukjin Kwon
>Priority: Major
> Fix For: 3.1.0
>
>
> Hive 1.2 is a fork that we should remove. It's best not to expose this
> distribution via pip.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32714) Port pyspark-stubs

2020-09-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32714?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-32714.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 29591
[https://github.com/apache/spark/pull/29591]

> Port pyspark-stubs
> --
>
> Key: SPARK-32714
> URL: https://issues.apache.org/jira/browse/SPARK-32714
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Maciej Szymkiewicz
>Priority: Major
> Fix For: 3.1.0
>
>
> Port https://github.com/zero323/pyspark-stubs into PySpark. This was being 
> discussed in dev mailing list. See also 
> http://apache-spark-developers-list.1001551.n3.nabble.com/Re-PySpark-Revisiting-PySpark-type-annotations-td26232.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32971) Support dynamic PVC creation/deletion for K8s executors

2020-09-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201257#comment-17201257
 ] 

Apache Spark commented on SPARK-32971:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/29861

> Support dynamic PVC creation/deletion for K8s executors
> ---
>
> Key: SPARK-32971
> URL: https://issues.apache.org/jira/browse/SPARK-32971
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32971) Support dynamic PVC creation/deletion for K8s executors

2020-09-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201256#comment-17201256
 ] 

Apache Spark commented on SPARK-32971:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/29861

> Support dynamic PVC creation/deletion for K8s executors
> ---
>
> Key: SPARK-32971
> URL: https://issues.apache.org/jira/browse/SPARK-32971
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32984) Improve showing the differences between approved and actual plans

2020-09-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32984:


Assignee: (was: Apache Spark)

> Improve showing the differences between approved and actual plans
> -
>
> Key: SPARK-32984
> URL: https://issues.apache.org/jira/browse/SPARK-32984
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.1.0
>Reporter: wuyi
>Priority: Major
>
> It's hard to find the difference between the approved and actual plans since
> the plans of TPC-DS queries are often huge. We could add a hint, e.g., a caret
> (^), to help developers locate the differences quickly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32984) Improve showing the differences between approved and actual plans

2020-09-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32984:


Assignee: Apache Spark

> Improve showing the differences between approved and actual plans
> -
>
> Key: SPARK-32984
> URL: https://issues.apache.org/jira/browse/SPARK-32984
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.1.0
>Reporter: wuyi
>Assignee: Apache Spark
>Priority: Major
>
> It's hard to find the difference between the approved and actual plans since
> the plans of TPC-DS queries are often huge. We could add a hint, e.g., a caret
> (^), to help developers locate the differences quickly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32984) Improve showing the differences between approved and actual plans

2020-09-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32984:


Assignee: (was: Apache Spark)

> Improve showing the differences between approved and actual plans
> -
>
> Key: SPARK-32984
> URL: https://issues.apache.org/jira/browse/SPARK-32984
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.1.0
>Reporter: wuyi
>Priority: Major
>
> It's hard to find the difference between the approved and actual plans since
> the plans of TPC-DS queries are often huge. We could add a hint, e.g., a caret
> (^), to help developers locate the differences quickly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32984) Improve showing the differences between approved and actual plans

2020-09-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32984?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201245#comment-17201245
 ] 

Apache Spark commented on SPARK-32984:
--

User 'Ngone51' has created a pull request for this issue:
https://github.com/apache/spark/pull/29860

> Improve showing the differences between approved and actual plans
> -
>
> Key: SPARK-32984
> URL: https://issues.apache.org/jira/browse/SPARK-32984
> Project: Spark
>  Issue Type: Test
>  Components: Tests
>Affects Versions: 3.1.0
>Reporter: wuyi
>Priority: Major
>
> It's hard to find the difference between the approved and actual plans since
> the plans of TPC-DS queries are often huge. We could add a hint, e.g., a caret
> (^), to help developers locate the differences quickly.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32984) Improve showing the differences between approved and actual plans

2020-09-23 Thread wuyi (Jira)
wuyi created SPARK-32984:


 Summary: Improve showing the differences between approved and 
actual plans
 Key: SPARK-32984
 URL: https://issues.apache.org/jira/browse/SPARK-32984
 Project: Spark
  Issue Type: Test
  Components: Tests
Affects Versions: 3.1.0
Reporter: wuyi


It's hard to find the difference between the approved and actual plans since the
plans of TPC-DS queries are often huge. We could add a hint, e.g., a caret (^), to
help developers locate the differences quickly.
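
To illustrate the idea only (this is a made-up helper, not the actual change in
Spark's test code), a minimal sketch that marks the first differing character of
two plan strings with a caret could look like this:
{code:java}
// Hypothetical sketch: print two plan strings and point a caret (^) at the first
// position where they differ, so the mismatch is easy to spot in huge plans.
def diffHint(approved: String, actual: String): Unit = {
  val firstDiff = approved.zip(actual).indexWhere { case (a, b) => a != b } match {
    case -1 => math.min(approved.length, actual.length) // one string is a prefix of the other
    case i  => i
  }
  println(s"approved: $approved")
  println(s"actual  : $actual")
  println("          " + (" " * firstDiff) + "^")
}

diffHint("HashAggregate(keys=[a#1], functions=[sum(b#2)])",
         "HashAggregate(keys=[a#1], functions=[avg(b#2)])")
{code}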



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32983) Spark SQL INTERSECT ALL does not keep all rows.

2020-09-23 Thread Will Du (Jira)
Will Du created SPARK-32983:
---

 Summary: Spark SQL INTERSECT ALL does not keep all rows.
 Key: SPARK-32983
 URL: https://issues.apache.org/jira/browse/SPARK-32983
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.1, 3.0.0, 2.4.6
Reporter: Will Du


Spark SQL INTERSECT ALL should keep all rows, but it actually removes duplicates,
just like Spark SQL INTERSECT.

with base as (
  select 1 as id union all select 2 as id
), a as (
  select 1 as id union all select 3 as id)
select * from a INTERSECT ALL select * from base;

with base as (
  select 1 as id union all select 2 as id
), a as (
  select 1 as id union all select 3 as id)
select * from a INTERSECT select * from base;

Both of the above queries return a single record, 1.

I think the 1st query should return

1

1
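
For reference, a minimal sketch of how the two operators are expected to differ
under standard SQL bag semantics when genuine duplicates exist on both sides
(assuming a running SparkSession named `spark`; whether Spark's output matches
this expectation is exactly what this report questions):
{code:java}
// INTERSECT ALL should keep each common row min(leftCount, rightCount) times,
// while INTERSECT keeps it only once.
val q = """
  WITH base AS (SELECT 1 AS id UNION ALL SELECT 1 AS id UNION ALL SELECT 2 AS id),
       a    AS (SELECT 1 AS id UNION ALL SELECT 1 AS id UNION ALL SELECT 3 AS id)
  SELECT * FROM a INTERSECT ALL SELECT * FROM base
"""
spark.sql(q).show()  // expected under bag semantics: two rows with id = 1
{code}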



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32977) [SQL] JavaDoc on Default Save mode Incorrect

2020-09-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-32977:
-

Assignee: Russell Spitzer

> [SQL] JavaDoc on Default Save mode Incorrect
> 
>
> Key: SPARK-32977
> URL: https://issues.apache.org/jira/browse/SPARK-32977
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Russell Spitzer
>Assignee: Russell Spitzer
>Priority: Major
>
> The JavaDoc says that the default save mode depends on the DataSource
> version, which is incorrect. It is always SaveMode.ErrorIfExists.
> http://apache-spark-developers-list.1001551.n3.nabble.com/DatasourceV2-Default-Mode-for-DataFrameWriter-not-Dependent-on-DataSource-Version-td29434.html
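
For reference, a minimal sketch of the documented default behavior (assuming a
running SparkSession named `spark` and a scratch path; illustrative only, not
taken from the Spark test suite):
{code:java}
// Without an explicit mode, DataFrameWriter uses SaveMode.ErrorIfExists,
// so writing to an existing path fails regardless of the data source version.
import org.apache.spark.sql.SaveMode

val df = spark.range(10).toDF("id")
df.write.parquet("/tmp/spark32977")                            // first write succeeds
df.write.parquet("/tmp/spark32977")                            // fails: path already exists
df.write.mode(SaveMode.Overwrite).parquet("/tmp/spark32977")   // explicit mode overrides the default
{code}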



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32977) [SQL] JavaDoc on Default Save mode Incorrect

2020-09-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-32977.
---
Fix Version/s: 3.0.2
   3.1.0
   Resolution: Fixed

Issue resolved by pull request 29853
[https://github.com/apache/spark/pull/29853]

> [SQL] JavaDoc on Default Save mode Incorrect
> 
>
> Key: SPARK-32977
> URL: https://issues.apache.org/jira/browse/SPARK-32977
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Russell Spitzer
>Assignee: Russell Spitzer
>Priority: Major
> Fix For: 3.1.0, 3.0.2
>
>
> The JavaDoc says that the default save mode depends on the DataSource
> version, which is incorrect. It is always SaveMode.ErrorIfExists.
> http://apache-spark-developers-list.1001551.n3.nabble.com/DatasourceV2-Default-Mode-for-DataFrameWriter-not-Dependent-on-DataSource-Version-td29434.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32982) Remove hive-1.2 profiles in PIP installation option

2020-09-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201230#comment-17201230
 ] 

Apache Spark commented on SPARK-32982:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/29858

> Remove hive-1.2 profiles in PIP installation option
> ---
>
> Key: SPARK-32982
> URL: https://issues.apache.org/jira/browse/SPARK-32982
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Hive 1.2 is a fork that we should remove. It's best not to expose this
> distribution via pip.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32982) Remove hive-1.2 profiles in PIP installation option

2020-09-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32982?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201232#comment-17201232
 ] 

Apache Spark commented on SPARK-32982:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/29858

> Remove hive-1.2 profiles in PIP installation option
> ---
>
> Key: SPARK-32982
> URL: https://issues.apache.org/jira/browse/SPARK-32982
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Hive 1.2 is a fork that we should remove. It's best not to expose this
> distribution via pip.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32982) Remove hive-1.2 profiles in PIP installation option

2020-09-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32982:


Assignee: (was: Apache Spark)

> Remove hive-1.2 profiles in PIP installation option
> ---
>
> Key: SPARK-32982
> URL: https://issues.apache.org/jira/browse/SPARK-32982
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Hive 1.2 is a fork that we should remove. It's best not to expose this
> distribution via pip.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32982) Remove hive-1.2 profiles in PIP installation option

2020-09-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32982:


Assignee: Apache Spark

> Remove hive-1.2 profiles in PIP installation option
> ---
>
> Key: SPARK-32982
> URL: https://issues.apache.org/jira/browse/SPARK-32982
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Assignee: Apache Spark
>Priority: Major
>
> Hive 1.2 is a fork that we should remove. It's best not to expose this
> distribution via pip.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32982) Remove hive-1.2 profiles in PIP installation option

2020-09-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32982?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-32982:
-
Issue Type: Improvement  (was: Bug)

> Remove hive-1.2 profiles in PIP installation option
> ---
>
> Key: SPARK-32982
> URL: https://issues.apache.org/jira/browse/SPARK-32982
> Project: Spark
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Hyukjin Kwon
>Priority: Major
>
> Hive 1.2 is a fork that we should remove. It's best not to expose this
> distribution via pip.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32982) Remove hive-1.2 profiles in PIP installation option

2020-09-23 Thread Hyukjin Kwon (Jira)
Hyukjin Kwon created SPARK-32982:


 Summary: Remove hive-1.2 profiles in PIP installation option
 Key: SPARK-32982
 URL: https://issues.apache.org/jira/browse/SPARK-32982
 Project: Spark
  Issue Type: Bug
  Components: Build
Affects Versions: 3.1.0
Reporter: Hyukjin Kwon


Hive 1.2 is a fork that we should remove. It's best not to expose this
distribution via pip.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32971) Support dynamic PVC creation/deletion for K8s executors

2020-09-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32971?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201225#comment-17201225
 ] 

Apache Spark commented on SPARK-32971:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/29859

> Support dynamic PVC creation/deletion for K8s executors
> ---
>
> Key: SPARK-32971
> URL: https://issues.apache.org/jira/browse/SPARK-32971
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32981) Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution

2020-09-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201222#comment-17201222
 ] 

Apache Spark commented on SPARK-32981:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/29858

> Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution
> -
>
> Key: SPARK-32981
> URL: https://issues.apache.org/jira/browse/SPARK-32981
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.1.0
>
>
> Apache Spark 3.0 switched its Hive execution version from 1.2 to 2.3, but we
> still provide the unofficial forked Hive 1.2 version in our distribution.
> This issue aims to remove it from Apache Spark 3.1.0.
> {code}
> spark-3.0.1-bin-hadoop2.7-hive1.2.tgz
> spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.asc
> spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.sha512 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32981) Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution

2020-09-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201219#comment-17201219
 ] 

Apache Spark commented on SPARK-32981:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/29858

> Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution
> -
>
> Key: SPARK-32981
> URL: https://issues.apache.org/jira/browse/SPARK-32981
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.1.0
>
>
> Apache Spark 3.0 switched its Hive execution version from 1.2 to 2.3, but we
> still provide the unofficial forked Hive 1.2 version in our distribution.
> This issue aims to remove it from Apache Spark 3.1.0.
> {code}
> spark-3.0.1-bin-hadoop2.7-hive1.2.tgz
> spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.asc
> spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.sha512 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32972) Pass all `mllib` module UTs in Scala 2.13

2020-09-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201217#comment-17201217
 ] 

Apache Spark commented on SPARK-32972:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/29857

> Pass all `mllib` module UTs in Scala 2.13
> -
>
> Key: SPARK-32972
> URL: https://issues.apache.org/jira/browse/SPARK-32972
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, MLlib
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Major
>
> There are 51 failed Scala tests and 3 failed Java tests in the `mllib` module;
> the failed cases are as follows:
> *Java:*
>  * org.apache.spark.mllib.fpm.JavaPrefixSpanSuite ( 1 FAILED)
>  * org.apache.spark.ml.feature.JavaVectorIndexerSuite ( 1 FAILED)
>  * org.apache.spark.ml.feature.JavaWord2VecSuite ( 1 FAILED)
> *Scala:*
>  * MatrixFactorizationModelSuite ( 1 FAILED)
>  * LDASuite ( 1 FAILED)
>  * MLTestSuite ( 1 FAILED)
>  * PrefixSpanSuite ( 1 FAILED)
>  * BucketedRandomProjectionLSHSuite ( 3 FAILED)
>  * Word2VecSuite ( 3 FAILED)
>  * Word2VecSuite ( 5 FAILED)
>  * MinHashLSHSuite ( 3 FAILED)
>  * DecisionTreeSuite ( 1 FAILED)
>  * FPGrowthSuite ( 2 FAILED)
>  * NaiveBayesSuite ( 2 FAILED)
>  * NGramSuite ( 4 FAILED)
>  * RFormulaSuite ( 4 FAILED)
>  * GradientBoostedTreesSuite ( 1 FAILED)
>  * StopWordsRemoverSuite ( 10 FAILED)
>  * RandomForestSuite ( 1 FAILED)
>  * PrefixSpanSuite ( 4 FAILED)
>  * StringIndexerSuite ( 2 FAILED)
>  * IDFSuite ( 1 FAILED)
>  * RandomForestRegressorSuite ( 1 FAILED)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32972) Pass all `mllib` module UTs in Scala 2.13

2020-09-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32972:


Assignee: Apache Spark

> Pass all `mllib` module UTs in Scala 2.13
> -
>
> Key: SPARK-32972
> URL: https://issues.apache.org/jira/browse/SPARK-32972
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, MLlib
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Assignee: Apache Spark
>Priority: Major
>
> There are 51 failed Scala tests and 3 failed Java tests in the `mllib` module;
> the failed cases are as follows:
> *Java:*
>  * org.apache.spark.mllib.fpm.JavaPrefixSpanSuite ( 1 FAILED)
>  * org.apache.spark.ml.feature.JavaVectorIndexerSuite ( 1 FAILED)
>  * org.apache.spark.ml.feature.JavaWord2VecSuite ( 1 FAILED)
> *Scala:*
>  * MatrixFactorizationModelSuite ( 1 FAILED)
>  * LDASuite ( 1 FAILED)
>  * MLTestSuite ( 1 FAILED)
>  * PrefixSpanSuite ( 1 FAILED)
>  * BucketedRandomProjectionLSHSuite ( 3 FAILED)
>  * Word2VecSuite ( 3 FAILED)
>  * Word2VecSuite ( 5 FAILED)
>  * MinHashLSHSuite ( 3 FAILED)
>  * DecisionTreeSuite ( 1 FAILED)
>  * FPGrowthSuite ( 2 FAILED)
>  * NaiveBayesSuite ( 2 FAILED)
>  * NGramSuite ( 4 FAILED)
>  * RFormulaSuite ( 4 FAILED)
>  * GradientBoostedTreesSuite ( 1 FAILED)
>  * StopWordsRemoverSuite ( 10 FAILED)
>  * RandomForestSuite ( 1 FAILED)
>  * PrefixSpanSuite ( 4 FAILED)
>  * StringIndexerSuite ( 2 FAILED)
>  * IDFSuite ( 1 FAILED)
>  * RandomForestRegressorSuite ( 1 FAILED)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32972) Pass all `mllib` module UTs in Scala 2.13

2020-09-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32972:


Assignee: (was: Apache Spark)

> Pass all `mllib` module UTs in Scala 2.13
> -
>
> Key: SPARK-32972
> URL: https://issues.apache.org/jira/browse/SPARK-32972
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, MLlib
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Major
>
> There are 51 failed Scala tests and 3 failed Java tests in the `mllib` module;
> the failed cases are as follows:
> *Java:*
>  * org.apache.spark.mllib.fpm.JavaPrefixSpanSuite ( 1 FAILED)
>  * org.apache.spark.ml.feature.JavaVectorIndexerSuite ( 1 FAILED)
>  * org.apache.spark.ml.feature.JavaWord2VecSuite ( 1 FAILED)
> *Scala:*
>  * MatrixFactorizationModelSuite ( 1 FAILED)
>  * LDASuite ( 1 FAILED)
>  * MLTestSuite ( 1 FAILED)
>  * PrefixSpanSuite ( 1 FAILED)
>  * BucketedRandomProjectionLSHSuite ( 3 FAILED)
>  * Word2VecSuite ( 3 FAILED)
>  * Word2VecSuite ( 5 FAILED)
>  * MinHashLSHSuite ( 3 FAILED)
>  * DecisionTreeSuite ( 1 FAILED)
>  * FPGrowthSuite ( 2 FAILED)
>  * NaiveBayesSuite ( 2 FAILED)
>  * NGramSuite ( 4 FAILED)
>  * RFormulaSuite ( 4 FAILED)
>  * GradientBoostedTreesSuite ( 1 FAILED)
>  * StopWordsRemoverSuite ( 10 FAILED)
>  * RandomForestSuite ( 1 FAILED)
>  * PrefixSpanSuite ( 4 FAILED)
>  * StringIndexerSuite ( 2 FAILED)
>  * IDFSuite ( 1 FAILED)
>  * RandomForestRegressorSuite ( 1 FAILED)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32972) Pass all `mllib` module UTs in Scala 2.13

2020-09-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201216#comment-17201216
 ] 

Apache Spark commented on SPARK-32972:
--

User 'LuciferYang' has created a pull request for this issue:
https://github.com/apache/spark/pull/29857

> Pass all `mllib` module UTs in Scala 2.13
> -
>
> Key: SPARK-32972
> URL: https://issues.apache.org/jira/browse/SPARK-32972
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, MLlib
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Major
>
> There are 51 failed Scala tests and 3 failed Java tests in the `mllib` module;
> the failed cases are as follows:
> *Java:*
>  * org.apache.spark.mllib.fpm.JavaPrefixSpanSuite ( 1 FAILED)
>  * org.apache.spark.ml.feature.JavaVectorIndexerSuite ( 1 FAILED)
>  * org.apache.spark.ml.feature.JavaWord2VecSuite ( 1 FAILED)
> *Scala:*
>  * MatrixFactorizationModelSuite ( 1 FAILED)
>  * LDASuite ( 1 FAILED)
>  * MLTestSuite ( 1 FAILED)
>  * PrefixSpanSuite ( 1 FAILED)
>  * BucketedRandomProjectionLSHSuite ( 3 FAILED)
>  * Word2VecSuite ( 3 FAILED)
>  * Word2VecSuite ( 5 FAILED)
>  * MinHashLSHSuite ( 3 FAILED)
>  * DecisionTreeSuite ( 1 FAILED)
>  * FPGrowthSuite ( 2 FAILED)
>  * NaiveBayesSuite ( 2 FAILED)
>  * NGramSuite ( 4 FAILED)
>  * RFormulaSuite ( 4 FAILED)
>  * GradientBoostedTreesSuite ( 1 FAILED)
>  * StopWordsRemoverSuite ( 10 FAILED)
>  * RandomForestSuite ( 1 FAILED)
>  * PrefixSpanSuite ( 4 FAILED)
>  * StringIndexerSuite ( 2 FAILED)
>  * IDFSuite ( 1 FAILED)
>  * RandomForestRegressorSuite ( 1 FAILED)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32975) [K8S] - executor fails to be restarted after it goes to ERROR/Failure state

2020-09-23 Thread Tibor Fasanga (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201207#comment-17201207
 ] 

Tibor Fasanga commented on SPARK-32975:
---

Note that the main problem is that the executor POD quits with an error while the
Spark driver and the Spark operator think it is still running; therefore the
executor is never restarted.

This is an intermittent problem. Our testing shows that it happens frequently
when both of the following are true:
 # the driver POD has a sidecar container, and
 # the sidecar container takes a long time to initialize and start (the delay is
caused by the time required to pull the sidecar container's image)

In other words, this problem manifests itself when there is a delay between 
starting the driver *container* and the time the driver *POD* is fully started 
(the POD contains the driver container and the sidecar container).

In this case we see the following events in the description of the driver POD
(see the "_Pulling image "registry.nspos.nokia.local/fluent/fluent-bit:1.5.5"_"
event that is present in this case):
{code:java}
Events:
  Type     Reason       Age    From               Message
  ----     ------       ----   ----               -------
  Normal   Scheduled           default-scheduler  Successfully assigned default/act-pipeline-app-driver to node5
  Warning  FailedMount  20m    kubelet, node5     MountVolume.SetUp failed for volume "spark-conf-volume" : configmap "act-pipeline-app-1600699152173-driver-conf-map" not found
  Normal   Pulled       20m    kubelet, node5     Container image "registry.nspos.nokia.local/nspos-pki-container:20.9.0-rel.1" already present on machine
  Normal   Created      20m    kubelet, node5     Created container nspos-pki
  Normal   Started      20m    kubelet, node5     Started container nspos-pki
  Normal   Pulling      20m    kubelet, node5     Pulling image "registry.nspos.nokia.local/analytics-rtanalytics-pipeline-app:20.9.0-rel.48"
  Normal   Pulled       19m    kubelet, node5     Successfully pulled image "registry.nspos.nokia.local/analytics-rtanalytics-pipeline-app:20.9.0-rel.48"
  Normal   Created      19m    kubelet, node5     Created container spark-kubernetes-driver
  Normal   Started      19m    kubelet, node5     Started container spark-kubernetes-driver
  Normal   Pulling      19m    kubelet, node5     Pulling image "registry.nspos.nokia.local/fluent/fluent-bit:1.5.5"
  Normal   Pulled       18m    kubelet, node5     Successfully pulled image "registry.nspos.nokia.local/fluent/fluent-bit:1.5.5"
  Normal   Created      18m    kubelet, node5     Created container log-sidecar
  Normal   Started      18m    kubelet, node5     Started container log-sidecar
{code}
Note: The message "_MountVolume.SetUp failed for volume "spark-conf-volume" : 
configmap "act-pipeline-app-1600699152173-driver-conf-map" not found_" seems to 
be unrelated and does not seem to cause any problems.

> [K8S] - executor fails to be restarted after it goes to ERROR/Failure state
> ---
>
> Key: SPARK-32975
> URL: https://issues.apache.org/jira/browse/SPARK-32975
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Scheduler
>Affects Versions: 2.4.4
>Reporter: Shenson Joseph
>Priority: Critical
>
> We are using version v1beta2-1.1.2-2.4.5 of the operator with spark-2.4.4.
> Spark executors keep getting killed with exit code 1, and we see the
> following exception in the executor, which goes to an error state. Once this
> error happens, the driver doesn't restart the executor.
>  
> Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
> at 
> org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:64)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:281)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
> Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult:
> at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
> at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
> at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:201)
> at 
> org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:65)
> at 
> org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64)
> at 

[jira] [Assigned] (SPARK-32971) Support dynamic PVC creation/deletion for K8s executors

2020-09-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-32971:
-

Assignee: Dongjoon Hyun

> Support dynamic PVC creation/deletion for K8s executors
> ---
>
> Key: SPARK-32971
> URL: https://issues.apache.org/jira/browse/SPARK-32971
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32971) Support dynamic PVC creation/deletion for K8s executors

2020-09-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32971?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-32971.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 29846
[https://github.com/apache/spark/pull/29846]

> Support dynamic PVC creation/deletion for K8s executors
> ---
>
> Key: SPARK-32971
> URL: https://issues.apache.org/jira/browse/SPARK-32971
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.1.0
>
>




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27733) Upgrade to Avro 1.10.0

2020-09-23 Thread Michael Heuer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201156#comment-17201156
 ] 

Michael Heuer commented on SPARK-27733:
---

I can also participate in the Parquet sync meeting.  Since this involves 
coordination across several different projects (Spark, Parquet, Avro, Hive, 
possibly others), will that be an adequate venue for discussion and decision 
making?

> Upgrade to Avro 1.10.0
> --
>
> Key: SPARK-27733
> URL: https://issues.apache.org/jira/browse/SPARK-27733
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, SQL
>Affects Versions: 3.1.0
>Reporter: Ismaël Mejía
>Priority: Minor
>
> Avro 1.9.2 was released with many nice features, including reduced size (1 MB
> less), removed dependencies (no paranamer, no shaded guava), and security
> updates, so it is probably a worthwhile upgrade.
> Avro 1.10.0 has since been released and this upgrade is still not done.
> At the moment (2020/08) there is still a blocker: Hive-related transitive
> dependencies bring in older versions of Avro, so this is effectively blocked
> until HIVE-21737 is solved.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12312) JDBC connection to Kerberos secured databases fails on remote executors

2020-09-23 Thread John Lonergan (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-12312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201154#comment-17201154
 ] 

John Lonergan commented on SPARK-12312:
---

Yes - the driver wrapper I wrote accepted either a keytab (KT) or a ticket cache.

See the reference I gave.

If I recall correctly, we provided the option of selecting either of these
approaches:
UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, keytabFile)

OR
UserGroupInformation.getUGIFromTicketCache(principal, cache)

JL
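
A rough sketch of the two approaches mentioned above, using Hadoop's
UserGroupInformation API (the JDBC URL, principal, and file paths are
placeholders, not values from the original wrapper):
{code:java}
import java.security.PrivilegedExceptionAction
import java.sql.{Connection, DriverManager}
import org.apache.hadoop.security.UserGroupInformation

val principal = "user@EXAMPLE.COM"

// Option 1: log in from a keytab and get a UGI back.
val ugiFromKeytab =
  UserGroupInformation.loginUserFromKeytabAndReturnUGI(principal, "/etc/security/user.keytab")

// Option 2: reuse an existing ticket cache (ticket cache path first, then principal).
val ugiFromCache =
  UserGroupInformation.getUGIFromTicketCache("/tmp/krb5cc_1000", principal)

// Open the JDBC connection inside doAs so the Kerberos credentials are used.
val conn = ugiFromKeytab.doAs(new PrivilegedExceptionAction[Connection] {
  override def run(): Connection =
    DriverManager.getConnection("jdbc:sqlserver://db.example.com;integratedSecurity=true")
})
{code}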

On Wed, 23 Sep 2020 at 22:17, Prakash Rajendran (Jira) 



> JDBC connection to Kerberos secured databases fails on remote executors
> ---
>
> Key: SPARK-12312
> URL: https://issues.apache.org/jira/browse/SPARK-12312
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2, 2.4.2
>Reporter: nabacg
>Assignee: Gabor Somogyi
>Priority: Minor
>
> When loading DataFrames from a JDBC data source with Kerberos authentication,
> remote executors (yarn-client/cluster etc. modes) fail to establish a
> connection due to the lack of a Kerberos ticket or the ability to generate one.
> This is a real issue when trying to ingest data from kerberized data sources
> (SQL Server, Oracle) in an enterprise environment where exposing simple
> authentication access is not an option due to IT policy issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32980) Launcher Client tests flake with minikube

2020-09-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-32980.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 29854
[https://github.com/apache/spark/pull/29854]

> Launcher Client tests flake with minikube
> -
>
> Key: SPARK-32980
> URL: https://issues.apache.org/jira/browse/SPARK-32980
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 3.1.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
> Fix For: 3.1.0
>
>
> Launcher Client tests flake with minikube



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32937) DecomissionSuite in k8s integration tests is failing.

2020-09-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-32937:
-

Assignee: Holden Karau

> DecomissionSuite in k8s integration tests is failing.
> -
>
> Key: SPARK-32937
> URL: https://issues.apache.org/jira/browse/SPARK-32937
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Assignee: Holden Karau
>Priority: Major
>
> Logs from the failing test, copied from Jenkins. As of now, it always fails.
> {code}
> - Test basic decommissioning *** FAILED ***
>   The code passed to eventually never returned normally. Attempted 182 times 
> over 3.00377927275 minutes. Last failure message: "++ id -u
>   + myuid=185
>   ++ id -g
>   + mygid=0
>   + set +e
>   ++ getent passwd 185
>   + uidentry=
>   + set -e
>   + '[' -z '' ']'
>   + '[' -w /etc/passwd ']'
>   + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
>   + SPARK_CLASSPATH=':/opt/spark/jars/*'
>   + env
>   + grep SPARK_JAVA_OPT_
>   + sort -t_ -k4 -n
>   + sed 's/[^=]*=\(.*\)/\1/g'
>   + readarray -t SPARK_EXECUTOR_JAVA_OPTS
>   + '[' -n '' ']'
>   + '[' 3 == 2 ']'
>   + '[' 3 == 3 ']'
>   ++ python3 -V
>   + pyv3='Python 3.7.3'
>   + export PYTHON_VERSION=3.7.3
>   + PYTHON_VERSION=3.7.3
>   + export PYSPARK_PYTHON=python3
>   + PYSPARK_PYTHON=python3
>   + export PYSPARK_DRIVER_PYTHON=python3
>   + PYSPARK_DRIVER_PYTHON=python3
>   + '[' -n '' ']'
>   + '[' -z ']'
>   + '[' -z x ']'
>   + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*'
>   + case "$1" in
>   + shift 1
>   + CMD=("$SPARK_HOME/bin/spark-submit" --conf 
> "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client 
> "$@")
>   + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf 
> spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file 
> /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner 
> local:///opt/spark/tests/decommissioning.py
>   20/09/17 11:06:56 WARN NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
>   Starting decom test
>   Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
>   20/09/17 11:06:56 INFO SparkContext: Running Spark version 3.1.0-SNAPSHOT
>   20/09/17 11:06:57 INFO ResourceUtils: 
> ==
>   20/09/17 11:06:57 INFO ResourceUtils: No custom resources configured for 
> spark.driver.
>   20/09/17 11:06:57 INFO ResourceUtils: 
> ==
>   20/09/17 11:06:57 INFO SparkContext: Submitted application: PyMemoryTest
>   20/09/17 11:06:57 INFO ResourceProfile: Default ResourceProfile created, 
> executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , 
> memory -> name: memory, amount: 1024, script: , vendor: ), task resources: 
> Map(cpus -> name: cpus, amount: 1.0)
>   20/09/17 11:06:57 INFO ResourceProfile: Limiting resource is cpus at 1 
> tasks per executor
>   20/09/17 11:06:57 INFO ResourceProfileManager: Added ResourceProfile id: 0
>   20/09/17 11:06:57 INFO SecurityManager: Changing view acls to: 185,jenkins
>   20/09/17 11:06:57 INFO SecurityManager: Changing modify acls to: 185,jenkins
>   20/09/17 11:06:57 INFO SecurityManager: Changing view acls groups to: 
>   20/09/17 11:06:57 INFO SecurityManager: Changing modify acls groups to: 
>   20/09/17 11:06:57 INFO SecurityManager: SecurityManager: authentication 
> enabled; ui acls disabled; users  with view permissions: Set(185, jenkins); 
> groups with view permissions: Set(); users  with modify permissions: Set(185, 
> jenkins); groups with modify permissions: Set()
>   20/09/17 11:06:57 INFO Utils: Successfully started service 'sparkDriver' on 
> port 7078.
>   20/09/17 11:06:57 INFO SparkEnv: Registering MapOutputTracker
>   20/09/17 11:06:57 INFO SparkEnv: Registering BlockManagerMaster
>   20/09/17 11:06:57 INFO BlockManagerMasterEndpoint: Using 
> org.apache.spark.storage.DefaultTopologyMapper for getting topology 
> information
>   20/09/17 11:06:57 INFO BlockManagerMasterEndpoint: 
> BlockManagerMasterEndpoint up
>   20/09/17 11:06:57 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
>   20/09/17 11:06:57 INFO DiskBlockManager: Created local directory at 
> /var/data/spark-7985c075-3b02-42ec-9111-cefba535adf0/blockmgr-3bd403d0-6689-46be-997e-5bc699ecefd3
>   20/09/17 11:06:57 INFO MemoryStore: MemoryStore started with capacity 593.9 
> MiB
>   20/09/17 11:06:57 INFO SparkEnv: Registering OutputCommitCoordinator
>   20/09/17 11:06:58 INFO Utils: Successfully started service 'SparkUI' on 
> port 4040.
>   20/09/17 11:06:58 INFO 

[jira] [Assigned] (SPARK-32980) Launcher Client tests flake with minikube

2020-09-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-32980:
-

Assignee: Holden Karau

> Launcher Client tests flake with minikube
> -
>
> Key: SPARK-32980
> URL: https://issues.apache.org/jira/browse/SPARK-32980
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 3.1.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
>
> Launcher Client tests flake with minikube



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32937) DecomissionSuite in k8s integration tests is failing.

2020-09-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-32937.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 29854
[https://github.com/apache/spark/pull/29854]

> DecomissionSuite in k8s integration tests is failing.
> -
>
> Key: SPARK-32937
> URL: https://issues.apache.org/jira/browse/SPARK-32937
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Assignee: Holden Karau
>Priority: Major
> Fix For: 3.1.0
>
>
> Logs from the failing test, copied from Jenkins. As of now, it always fails.
> {code}
> - Test basic decommissioning *** FAILED ***
>   The code passed to eventually never returned normally. Attempted 182 times 
> over 3.00377927275 minutes. Last failure message: "++ id -u
>   + myuid=185
>   ++ id -g
>   + mygid=0
>   + set +e
>   ++ getent passwd 185
>   + uidentry=
>   + set -e
>   + '[' -z '' ']'
>   + '[' -w /etc/passwd ']'
>   + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
>   + SPARK_CLASSPATH=':/opt/spark/jars/*'
>   + env
>   + grep SPARK_JAVA_OPT_
>   + sort -t_ -k4 -n
>   + sed 's/[^=]*=\(.*\)/\1/g'
>   + readarray -t SPARK_EXECUTOR_JAVA_OPTS
>   + '[' -n '' ']'
>   + '[' 3 == 2 ']'
>   + '[' 3 == 3 ']'
>   ++ python3 -V
>   + pyv3='Python 3.7.3'
>   + export PYTHON_VERSION=3.7.3
>   + PYTHON_VERSION=3.7.3
>   + export PYSPARK_PYTHON=python3
>   + PYSPARK_PYTHON=python3
>   + export PYSPARK_DRIVER_PYTHON=python3
>   + PYSPARK_DRIVER_PYTHON=python3
>   + '[' -n '' ']'
>   + '[' -z ']'
>   + '[' -z x ']'
>   + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*'
>   + case "$1" in
>   + shift 1
>   + CMD=("$SPARK_HOME/bin/spark-submit" --conf 
> "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client 
> "$@")
>   + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf 
> spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file 
> /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner 
> local:///opt/spark/tests/decommissioning.py
>   20/09/17 11:06:56 WARN NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
>   Starting decom test
>   Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
>   20/09/17 11:06:56 INFO SparkContext: Running Spark version 3.1.0-SNAPSHOT
>   20/09/17 11:06:57 INFO ResourceUtils: 
> ==
>   20/09/17 11:06:57 INFO ResourceUtils: No custom resources configured for 
> spark.driver.
>   20/09/17 11:06:57 INFO ResourceUtils: 
> ==
>   20/09/17 11:06:57 INFO SparkContext: Submitted application: PyMemoryTest
>   20/09/17 11:06:57 INFO ResourceProfile: Default ResourceProfile created, 
> executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , 
> memory -> name: memory, amount: 1024, script: , vendor: ), task resources: 
> Map(cpus -> name: cpus, amount: 1.0)
>   20/09/17 11:06:57 INFO ResourceProfile: Limiting resource is cpus at 1 
> tasks per executor
>   20/09/17 11:06:57 INFO ResourceProfileManager: Added ResourceProfile id: 0
>   20/09/17 11:06:57 INFO SecurityManager: Changing view acls to: 185,jenkins
>   20/09/17 11:06:57 INFO SecurityManager: Changing modify acls to: 185,jenkins
>   20/09/17 11:06:57 INFO SecurityManager: Changing view acls groups to: 
>   20/09/17 11:06:57 INFO SecurityManager: Changing modify acls groups to: 
>   20/09/17 11:06:57 INFO SecurityManager: SecurityManager: authentication 
> enabled; ui acls disabled; users  with view permissions: Set(185, jenkins); 
> groups with view permissions: Set(); users  with modify permissions: Set(185, 
> jenkins); groups with modify permissions: Set()
>   20/09/17 11:06:57 INFO Utils: Successfully started service 'sparkDriver' on 
> port 7078.
>   20/09/17 11:06:57 INFO SparkEnv: Registering MapOutputTracker
>   20/09/17 11:06:57 INFO SparkEnv: Registering BlockManagerMaster
>   20/09/17 11:06:57 INFO BlockManagerMasterEndpoint: Using 
> org.apache.spark.storage.DefaultTopologyMapper for getting topology 
> information
>   20/09/17 11:06:57 INFO BlockManagerMasterEndpoint: 
> BlockManagerMasterEndpoint up
>   20/09/17 11:06:57 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
>   20/09/17 11:06:57 INFO DiskBlockManager: Created local directory at 
> /var/data/spark-7985c075-3b02-42ec-9111-cefba535adf0/blockmgr-3bd403d0-6689-46be-997e-5bc699ecefd3
>   20/09/17 11:06:57 INFO MemoryStore: MemoryStore started with capacity 593.9 
> MiB
>   20/09/17 11:06:57 INFO SparkEnv: Registering 

[jira] [Resolved] (SPARK-32981) Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution

2020-09-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-32981.
---
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 29856
[https://github.com/apache/spark/pull/29856]

> Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution
> -
>
> Key: SPARK-32981
> URL: https://issues.apache.org/jira/browse/SPARK-32981
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
> Fix For: 3.1.0
>
>
> Apache Spark 3.0 switched its Hive execution version from 1.2 to 2.3, but we
> still provide the unofficial forked Hive 1.2 version in our distribution.
> This issue aims to remove it from Apache Spark 3.1.0.
> {code}
> spark-3.0.1-bin-hadoop2.7-hive1.2.tgz
> spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.asc
> spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.sha512 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32981) Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution

2020-09-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201141#comment-17201141
 ] 

Apache Spark commented on SPARK-32981:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/29856

> Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution
> -
>
> Key: SPARK-32981
> URL: https://issues.apache.org/jira/browse/SPARK-32981
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> Apache Spark 3.0 switched its Hive execution version from 1.2 to 2.3, but we
> still provide the unofficial forked Hive 1.2 version in our distribution.
> This issue aims to remove it from Apache Spark 3.1.0.
> {code}
> spark-3.0.1-bin-hadoop2.7-hive1.2.tgz
> spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.asc
> spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.sha512 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32981) Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution

2020-09-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201138#comment-17201138
 ] 

Apache Spark commented on SPARK-32981:
--

User 'dongjoon-hyun' has created a pull request for this issue:
https://github.com/apache/spark/pull/29856

> Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution
> -
>
> Key: SPARK-32981
> URL: https://issues.apache.org/jira/browse/SPARK-32981
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> Apache Spark 3.0 switched its Hive execution version from 1.2 to 2.3, but we
> still provide the unofficial forked Hive 1.2 version in our distribution.
> This issue aims to remove it from Apache Spark 3.1.0.
> {code}
> spark-3.0.1-bin-hadoop2.7-hive1.2.tgz
> spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.asc
> spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.sha512 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32981) Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution

2020-09-23 Thread Dongjoon Hyun (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-32981:
-

Assignee: Dongjoon Hyun

> Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution
> -
>
> Key: SPARK-32981
> URL: https://issues.apache.org/jira/browse/SPARK-32981
> Project: Spark
>  Issue Type: Task
>  Components: Build
>Affects Versions: 3.1.0
>Reporter: Dongjoon Hyun
>Assignee: Dongjoon Hyun
>Priority: Major
>
> Apache Spark 3.0 switches its Hive execution version from 1.2 to 2.3, but we 
> still provide the unofficial forked Hive 1.2 version from our distribution. 
> This issue aims to remove it from Apache Spark 3.1.0.
> {code}
> spark-3.0.1-bin-hadoop2.7-hive1.2.tgz
> spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.asc
> spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.sha512 
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32981) Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 distribution

2020-09-23 Thread Dongjoon Hyun (Jira)
Dongjoon Hyun created SPARK-32981:
-

 Summary: Remove hive-1.2/hadoop-2.7 from Apache Spark 3.1 
distribution
 Key: SPARK-32981
 URL: https://issues.apache.org/jira/browse/SPARK-32981
 Project: Spark
  Issue Type: Task
  Components: Build
Affects Versions: 3.1.0
Reporter: Dongjoon Hyun


Apache Spark 3.0 switched its Hive execution version from 1.2 to 2.3, but we
still provide the unofficial forked Hive 1.2 version in our distribution.
This issue aims to remove it from Apache Spark 3.1.0.
{code}
spark-3.0.1-bin-hadoop2.7-hive1.2.tgz
spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.asc
spark-3.0.1-bin-hadoop2.7-hive1.2.tgz.sha512 
{code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32067) [K8S] Executor pod template ConfigMap of ongoing submission got inadvertently altered by subsequent submission

2020-09-23 Thread James Yu (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32067?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

James Yu updated SPARK-32067:
-
Affects Version/s: (was: 2.4.6)
   (was: 3.0.0)
   2.4.7
   3.0.1

> [K8S] Executor pod template ConfigMap of ongoing submission got inadvertently 
> altered by subsequent submission
> --
>
> Key: SPARK-32067
> URL: https://issues.apache.org/jira/browse/SPARK-32067
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.7, 3.0.1
>Reporter: James Yu
>Priority: Minor
>
> THE BUG:
> The bug is reproducible by spark-submitting two different apps (app1 and app2) 
> with different executor pod templates (e.g., different labels) to K8s 
> sequentially, with app2 launching while app1 is still in the middle of 
> ramping up all its executor pods. The unwanted result is that some launched 
> executor pods of app1 end up having app2's executor pod template applied to 
> them.
> The root cause appears to be that app1's podspec-configmap gets overwritten by 
> app2 during the overlapping launch periods, because both apps use the same 
> ConfigMap name. As a result, some of app1's executor pods that are ramped up 
> after app2 is launched are inadvertently created with app2's pod template. 
> The issue can be seen as follows:
> First, after submitting app1, you get these configmaps:
> {code:java}
> NAMESPACENAME   DATAAGE
> default  app1--driver-conf-map  1   9m46s
> default  podspec-configmap  1   12m{code}
> Then submit app2 while app1 is still ramping up its executors. The 
> podspec-configmap is modified by app2.
> {code:java}
> NAMESPACENAME   DATAAGE
> default  app1--driver-conf-map  1   11m43s
> default  app2--driver-conf-map  1   10s
> default  podspec-configmap  1   13m57s{code}
>  
> PROPOSED SOLUTION:
> Properly prefix the podspec-configmap for each submitted app, ideally the 
> same way as the driver configmap:
> {code:java}
> NAMESPACENAME   DATAAGE
> default  app1--driver-conf-map  1   11m43s
> default  app1--podspec-configmap1   13m57s
> default  app2--driver-conf-map  1   10s 
> default  app2--podspec-configmap1   3m{code}
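
A minimal sketch of the prefixing idea above (the helper and the exact
"-podspec-configmap" suffix are illustrative assumptions for this sketch, not
Spark's actual K8s resource-naming code):

{code:python}
# Illustrative only: a per-submission prefix makes the executor pod template
# ConfigMap name unique, so concurrent submissions no longer overwrite each other.
def podspec_configmap_name(app_resource_prefix: str) -> str:
    # Mirrors the per-app naming already used for the driver ConfigMap.
    return f"{app_resource_prefix}-podspec-configmap"

print(podspec_configmap_name("app1"))  # app1-podspec-configmap
print(podspec_configmap_name("app2"))  # app2-podspec-configmap, distinct from app1's
{code}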



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27733) Upgrade to Avro 1.10.0

2020-09-23 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201109#comment-17201109
 ] 

Dongjoon Hyun commented on SPARK-27733:
---

[~smilegator]. Shall we discuss dropping Hive 1.2.1 then?
> Let us wait for one more month. If no one is complaining about the quality of 
> Hive 2.3.x, we can discuss whether we can drop Hive 1.2.1 as Hive execution 
> in Spark 3.1. 

> Upgrade to Avro 1.10.0
> --
>
> Key: SPARK-27733
> URL: https://issues.apache.org/jira/browse/SPARK-27733
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, SQL
>Affects Versions: 3.1.0
>Reporter: Ismaël Mejía
>Priority: Minor
>
> Avro 1.9.2 was released with many nice features including reduced size (1MB 
> less), and removed dependencies, no paranamer, no shaded guava, security 
> updates, so it is probably a worthwhile upgrade.
> Avro 1.10.0 was released and this is still not done.
> There is at the moment (2020/08) still a blocker because of Hive related 
> transitive dependencies bringing older versions of Avro, so we could say that 
> this is somehow still blocked until HIVE-21737 is solved.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-12312) JDBC connection to Kerberos secured databases fails on remote executors

2020-09-23 Thread Prakash Rajendran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-12312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201101#comment-17201101
 ] 

Prakash Rajendran edited comment on SPARK-12312 at 9/23/20, 9:16 PM:
-

[~johnlon] the above references use keytab and principal to establish Kerberos 
authentication to Oracle. In my scenario, I do not have control over the keytab 
file, as there will be a sidecar which takes care of setting up the krb5 cache 
file in my pod. So the executor has to use only the krb5 cache file. Does this 
scenario also work?


was (Author: prakki79):
[~johnlon] the above references use keytab and principal to establish kerberos 
authentication to Oracle. IN my scenario, I donot have control over keytab 
file, as there will be a sidecar which takes care of detting up the kerb5cache 
file to my pod. So the executor has to use only the krb5cache file. Is this 
scenario also works?

> JDBC connection to Kerberos secured databases fails on remote executors
> ---
>
> Key: SPARK-12312
> URL: https://issues.apache.org/jira/browse/SPARK-12312
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2, 2.4.2
>Reporter: nabacg
>Assignee: Gabor Somogyi
>Priority: Minor
>
> When loading DataFrames from JDBC datasource with Kerberos authentication, 
> remote executors (yarn-client/cluster etc. modes) fail to establish a 
> connection due to lack of Kerberos ticket or ability to generate it. 
> This is a real issue when trying to ingest data from kerberized data sources 
> (SQL Server, Oracle) in enterprise environment where exposing simple 
> authentication access is not an option due to IT policy issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-12312) JDBC connection to Kerberos secured databases fails on remote executors

2020-09-23 Thread Prakash Rajendran (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-12312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201101#comment-17201101
 ] 

Prakash Rajendran commented on SPARK-12312:
---

[~johnlon] the above references use keytab and principal to establish Kerberos 
authentication to Oracle. In my scenario, I do not have control over the keytab 
file, as there will be a sidecar which takes care of setting up the krb5 cache 
file in my pod. So the executor has to use only the krb5 cache file. Does this 
scenario also work?
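
For context, a minimal PySpark sketch of the keytab/principal pattern the
references above describe (the URL, table, keytab path, and principal are
placeholders; whether a sidecar-managed krb5 ticket cache alone is sufficient on
executors is exactly the open question here):

{code:python}
# Hedged sketch, not a verified recipe: the option names follow the
# keytab/principal approach discussed above and may vary by Spark version.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("kerberized-jdbc-sketch").getOrCreate()

df = (spark.read.format("jdbc")
      .option("url", "jdbc:oracle:thin:@//db-host:1521/SERVICE")  # placeholder URL
      .option("dbtable", "SCHEMA.SOME_TABLE")                     # placeholder table
      .option("keytab", "/etc/security/keytabs/app.keytab")       # placeholder path
      .option("principal", "app-user@EXAMPLE.COM")                # placeholder principal
      .load())

df.show(5)
{code}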

> JDBC connection to Kerberos secured databases fails on remote executors
> ---
>
> Key: SPARK-12312
> URL: https://issues.apache.org/jira/browse/SPARK-12312
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 1.5.2, 2.4.2
>Reporter: nabacg
>Assignee: Gabor Somogyi
>Priority: Minor
>
> When loading DataFrames from JDBC datasource with Kerberos authentication, 
> remote executors (yarn-client/cluster etc. modes) fail to establish a 
> connection due to lack of Kerberos ticket or ability to generate it. 
> This is a real issue when trying to ingest data from kerberized data sources 
> (SQL Server, Oracle) in enterprise environment where exposing simple 
> authentication access is not an option due to IT policy issues.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27733) Upgrade to Avro 1.10.0

2020-09-23 Thread Chao Sun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201094#comment-17201094
 ] 

Chao Sun commented on SPARK-27733:
--

[~sha...@uber.com] sure, I can join in the next sync meeting.

> Upgrade to Avro 1.10.0
> --
>
> Key: SPARK-27733
> URL: https://issues.apache.org/jira/browse/SPARK-27733
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, SQL
>Affects Versions: 3.1.0
>Reporter: Ismaël Mejía
>Priority: Minor
>
> Avro 1.9.2 was released with many nice features including reduced size (1MB 
> less), and removed dependencies, no paranamer, no shaded guava, security 
> updates, so it is probably a worthwhile upgrade.
> Avro 1.10.0 was released and this is still not done.
> There is at the moment (2020/08) still a blocker because of Hive related 
> transitive dependencies bringing older versions of Avro, so we could say that 
> this is somehow still blocked until HIVE-21737 is solved.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-27733) Upgrade to Avro 1.10.0

2020-09-23 Thread Dongjoon Hyun (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-27733?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201091#comment-17201091
 ] 

Dongjoon Hyun commented on SPARK-27733:
---

Thank you for pinging me, [~sha...@uber.com]. Sorry, I cannot join there.
cc [~dbtsai]

> Upgrade to Avro 1.10.0
> --
>
> Key: SPARK-27733
> URL: https://issues.apache.org/jira/browse/SPARK-27733
> Project: Spark
>  Issue Type: Improvement
>  Components: Build, SQL
>Affects Versions: 3.1.0
>Reporter: Ismaël Mejía
>Priority: Minor
>
> Avro 1.9.2 was released with many nice features including reduced size (1MB 
> less), and removed dependencies, no paranamer, no shaded guava, security 
> updates, so it is probably a worthwhile upgrade.
> Avro 1.10.0 was released and this is still not done.
> There is at the moment (2020/08) still a blocker because of Hive related 
> transitive dependencies bringing older versions of Avro, so we could say that 
> this is somehow still blocked until HIVE-21737 is solved.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32980) Launcher Client tests flake with minikube

2020-09-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201076#comment-17201076
 ] 

Apache Spark commented on SPARK-32980:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/29854

> Launcher Client tests flake with minikube
> -
>
> Key: SPARK-32980
> URL: https://issues.apache.org/jira/browse/SPARK-32980
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 3.1.0
>Reporter: Holden Karau
>Priority: Major
>
> Launcher Client tests flake with minikube



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32980) Launcher Client tests flake with minikube

2020-09-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32980:


Assignee: Apache Spark

> Launcher Client tests flake with minikube
> -
>
> Key: SPARK-32980
> URL: https://issues.apache.org/jira/browse/SPARK-32980
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 3.1.0
>Reporter: Holden Karau
>Assignee: Apache Spark
>Priority: Major
>
> Launcher Client tests flake with minikube



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32980) Launcher Client tests flake with minikube

2020-09-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32980:


Assignee: (was: Apache Spark)

> Launcher Client tests flake with minikube
> -
>
> Key: SPARK-32980
> URL: https://issues.apache.org/jira/browse/SPARK-32980
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 3.1.0
>Reporter: Holden Karau
>Priority: Major
>
> Launcher Client tests flake with minikube



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32980) Launcher Client tests flake with minikube

2020-09-23 Thread Holden Karau (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201073#comment-17201073
 ] 

Holden Karau commented on SPARK-32980:
--

Our method of getting the service assumes the service is on the first line, but 
when a new version of minikube is released the first few lines are upgrade info.
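
A minimal sketch of parsing by content rather than position (the sample output
and helper below are illustrative, not the actual integration-test code):

{code:python}
# Illustrative only: pick the service URL by looking for a URL-shaped line,
# instead of assuming `minikube service ... --url` prints it on the first line.
sample_output = """* A new version of minikube is available!
* To upgrade, run: minikube update-check
http://192.168.49.2:30080
"""

def extract_service_url(output: str) -> str:
    urls = [line.strip() for line in output.splitlines()
            if line.strip().startswith(("http://", "https://"))]
    if not urls:
        raise ValueError("no service URL found in minikube output")
    return urls[0]

print(extract_service_url(sample_output))  # http://192.168.49.2:30080
{code}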

> Launcher Client tests flake with minikube
> -
>
> Key: SPARK-32980
> URL: https://issues.apache.org/jira/browse/SPARK-32980
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 3.1.0
>Reporter: Holden Karau
>Priority: Major
>
> Launcher Client tests flake with minikube



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32980) Launcher Client tests flake with minikube

2020-09-23 Thread Holden Karau (Jira)
Holden Karau created SPARK-32980:


 Summary: Launcher Client tests flake with minikube
 Key: SPARK-32980
 URL: https://issues.apache.org/jira/browse/SPARK-32980
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes, Tests
Affects Versions: 3.1.0
Reporter: Holden Karau


Launcher Client tests flake with minikube



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32915) RPC implementation to support pushing and merging shuffle blocks

2020-09-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32915:


Assignee: (was: Apache Spark)

> RPC implementation to support pushing and merging shuffle blocks
> 
>
> Key: SPARK-32915
> URL: https://issues.apache.org/jira/browse/SPARK-32915
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Min Shen
>Priority: Major
>
> RPC implementation for the basic functionality in network-common and 
> network-shuffle module to enable pushing blocks on the client side and 
> merging received blocks on the server side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32915) RPC implementation to support pushing and merging shuffle blocks

2020-09-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32915:


Assignee: Apache Spark

> RPC implementation to support pushing and merging shuffle blocks
> 
>
> Key: SPARK-32915
> URL: https://issues.apache.org/jira/browse/SPARK-32915
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Min Shen
>Assignee: Apache Spark
>Priority: Major
>
> RPC implementation for the basic functionality in network-common and 
> network-shuffle module to enable pushing blocks on the client side and 
> merging received blocks on the server side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32915) RPC implementation to support pushing and merging shuffle blocks

2020-09-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32915?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201053#comment-17201053
 ] 

Apache Spark commented on SPARK-32915:
--

User 'Victsm' has created a pull request for this issue:
https://github.com/apache/spark/pull/29855

> RPC implementation to support pushing and merging shuffle blocks
> 
>
> Key: SPARK-32915
> URL: https://issues.apache.org/jira/browse/SPARK-32915
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Min Shen
>Priority: Major
>
> RPC implementation for the basic functionality in network-common and 
> network-shuffle module to enable pushing blocks on the client side and 
> merging received blocks on the server side.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32937) DecomissionSuite in k8s integration tests is failing.

2020-09-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32937:


Assignee: Apache Spark

> DecomissionSuite in k8s integration tests is failing.
> -
>
> Key: SPARK-32937
> URL: https://issues.apache.org/jira/browse/SPARK-32937
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Assignee: Apache Spark
>Priority: Major
>
> Logs from the failing test, copied from jenkins. As of now, it is always 
> failing. 
> {code}
> - Test basic decommissioning *** FAILED ***
>   The code passed to eventually never returned normally. Attempted 182 times 
> over 3.00377927275 minutes. Last failure message: "++ id -u
>   + myuid=185
>   ++ id -g
>   + mygid=0
>   + set +e
>   ++ getent passwd 185
>   + uidentry=
>   + set -e
>   + '[' -z '' ']'
>   + '[' -w /etc/passwd ']'
>   + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
>   + SPARK_CLASSPATH=':/opt/spark/jars/*'
>   + env
>   + grep SPARK_JAVA_OPT_
>   + sort -t_ -k4 -n
>   + sed 's/[^=]*=\(.*\)/\1/g'
>   + readarray -t SPARK_EXECUTOR_JAVA_OPTS
>   + '[' -n '' ']'
>   + '[' 3 == 2 ']'
>   + '[' 3 == 3 ']'
>   ++ python3 -V
>   + pyv3='Python 3.7.3'
>   + export PYTHON_VERSION=3.7.3
>   + PYTHON_VERSION=3.7.3
>   + export PYSPARK_PYTHON=python3
>   + PYSPARK_PYTHON=python3
>   + export PYSPARK_DRIVER_PYTHON=python3
>   + PYSPARK_DRIVER_PYTHON=python3
>   + '[' -n '' ']'
>   + '[' -z ']'
>   + '[' -z x ']'
>   + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*'
>   + case "$1" in
>   + shift 1
>   + CMD=("$SPARK_HOME/bin/spark-submit" --conf 
> "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client 
> "$@")
>   + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf 
> spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file 
> /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner 
> local:///opt/spark/tests/decommissioning.py
>   20/09/17 11:06:56 WARN NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
>   Starting decom test
>   Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
>   20/09/17 11:06:56 INFO SparkContext: Running Spark version 3.1.0-SNAPSHOT
>   20/09/17 11:06:57 INFO ResourceUtils: 
> ==
>   20/09/17 11:06:57 INFO ResourceUtils: No custom resources configured for 
> spark.driver.
>   20/09/17 11:06:57 INFO ResourceUtils: 
> ==
>   20/09/17 11:06:57 INFO SparkContext: Submitted application: PyMemoryTest
>   20/09/17 11:06:57 INFO ResourceProfile: Default ResourceProfile created, 
> executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , 
> memory -> name: memory, amount: 1024, script: , vendor: ), task resources: 
> Map(cpus -> name: cpus, amount: 1.0)
>   20/09/17 11:06:57 INFO ResourceProfile: Limiting resource is cpus at 1 
> tasks per executor
>   20/09/17 11:06:57 INFO ResourceProfileManager: Added ResourceProfile id: 0
>   20/09/17 11:06:57 INFO SecurityManager: Changing view acls to: 185,jenkins
>   20/09/17 11:06:57 INFO SecurityManager: Changing modify acls to: 185,jenkins
>   20/09/17 11:06:57 INFO SecurityManager: Changing view acls groups to: 
>   20/09/17 11:06:57 INFO SecurityManager: Changing modify acls groups to: 
>   20/09/17 11:06:57 INFO SecurityManager: SecurityManager: authentication 
> enabled; ui acls disabled; users  with view permissions: Set(185, jenkins); 
> groups with view permissions: Set(); users  with modify permissions: Set(185, 
> jenkins); groups with modify permissions: Set()
>   20/09/17 11:06:57 INFO Utils: Successfully started service 'sparkDriver' on 
> port 7078.
>   20/09/17 11:06:57 INFO SparkEnv: Registering MapOutputTracker
>   20/09/17 11:06:57 INFO SparkEnv: Registering BlockManagerMaster
>   20/09/17 11:06:57 INFO BlockManagerMasterEndpoint: Using 
> org.apache.spark.storage.DefaultTopologyMapper for getting topology 
> information
>   20/09/17 11:06:57 INFO BlockManagerMasterEndpoint: 
> BlockManagerMasterEndpoint up
>   20/09/17 11:06:57 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
>   20/09/17 11:06:57 INFO DiskBlockManager: Created local directory at 
> /var/data/spark-7985c075-3b02-42ec-9111-cefba535adf0/blockmgr-3bd403d0-6689-46be-997e-5bc699ecefd3
>   20/09/17 11:06:57 INFO MemoryStore: MemoryStore started with capacity 593.9 
> MiB
>   20/09/17 11:06:57 INFO SparkEnv: Registering OutputCommitCoordinator
>   20/09/17 11:06:58 INFO Utils: Successfully started service 'SparkUI' on 
> port 4040.
>   20/09/17 11:06:58 INFO 

[jira] [Assigned] (SPARK-32937) DecomissionSuite in k8s integration tests is failing.

2020-09-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32937:


Assignee: (was: Apache Spark)

> DecomissionSuite in k8s integration tests is failing.
> -
>
> Key: SPARK-32937
> URL: https://issues.apache.org/jira/browse/SPARK-32937
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> Logs from the failing test, copied from jenkins. As of now, it is always 
> failing. 
> {code}
> - Test basic decommissioning *** FAILED ***
>   The code passed to eventually never returned normally. Attempted 182 times 
> over 3.00377927275 minutes. Last failure message: "++ id -u
>   + myuid=185
>   ++ id -g
>   + mygid=0
>   + set +e
>   ++ getent passwd 185
>   + uidentry=
>   + set -e
>   + '[' -z '' ']'
>   + '[' -w /etc/passwd ']'
>   + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
>   + SPARK_CLASSPATH=':/opt/spark/jars/*'
>   + env
>   + grep SPARK_JAVA_OPT_
>   + sort -t_ -k4 -n
>   + sed 's/[^=]*=\(.*\)/\1/g'
>   + readarray -t SPARK_EXECUTOR_JAVA_OPTS
>   + '[' -n '' ']'
>   + '[' 3 == 2 ']'
>   + '[' 3 == 3 ']'
>   ++ python3 -V
>   + pyv3='Python 3.7.3'
>   + export PYTHON_VERSION=3.7.3
>   + PYTHON_VERSION=3.7.3
>   + export PYSPARK_PYTHON=python3
>   + PYSPARK_PYTHON=python3
>   + export PYSPARK_DRIVER_PYTHON=python3
>   + PYSPARK_DRIVER_PYTHON=python3
>   + '[' -n '' ']'
>   + '[' -z ']'
>   + '[' -z x ']'
>   + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*'
>   + case "$1" in
>   + shift 1
>   + CMD=("$SPARK_HOME/bin/spark-submit" --conf 
> "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client 
> "$@")
>   + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf 
> spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file 
> /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner 
> local:///opt/spark/tests/decommissioning.py
>   20/09/17 11:06:56 WARN NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
>   Starting decom test
>   Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
>   20/09/17 11:06:56 INFO SparkContext: Running Spark version 3.1.0-SNAPSHOT
>   20/09/17 11:06:57 INFO ResourceUtils: 
> ==
>   20/09/17 11:06:57 INFO ResourceUtils: No custom resources configured for 
> spark.driver.
>   20/09/17 11:06:57 INFO ResourceUtils: 
> ==
>   20/09/17 11:06:57 INFO SparkContext: Submitted application: PyMemoryTest
>   20/09/17 11:06:57 INFO ResourceProfile: Default ResourceProfile created, 
> executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , 
> memory -> name: memory, amount: 1024, script: , vendor: ), task resources: 
> Map(cpus -> name: cpus, amount: 1.0)
>   20/09/17 11:06:57 INFO ResourceProfile: Limiting resource is cpus at 1 
> tasks per executor
>   20/09/17 11:06:57 INFO ResourceProfileManager: Added ResourceProfile id: 0
>   20/09/17 11:06:57 INFO SecurityManager: Changing view acls to: 185,jenkins
>   20/09/17 11:06:57 INFO SecurityManager: Changing modify acls to: 185,jenkins
>   20/09/17 11:06:57 INFO SecurityManager: Changing view acls groups to: 
>   20/09/17 11:06:57 INFO SecurityManager: Changing modify acls groups to: 
>   20/09/17 11:06:57 INFO SecurityManager: SecurityManager: authentication 
> enabled; ui acls disabled; users  with view permissions: Set(185, jenkins); 
> groups with view permissions: Set(); users  with modify permissions: Set(185, 
> jenkins); groups with modify permissions: Set()
>   20/09/17 11:06:57 INFO Utils: Successfully started service 'sparkDriver' on 
> port 7078.
>   20/09/17 11:06:57 INFO SparkEnv: Registering MapOutputTracker
>   20/09/17 11:06:57 INFO SparkEnv: Registering BlockManagerMaster
>   20/09/17 11:06:57 INFO BlockManagerMasterEndpoint: Using 
> org.apache.spark.storage.DefaultTopologyMapper for getting topology 
> information
>   20/09/17 11:06:57 INFO BlockManagerMasterEndpoint: 
> BlockManagerMasterEndpoint up
>   20/09/17 11:06:57 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
>   20/09/17 11:06:57 INFO DiskBlockManager: Created local directory at 
> /var/data/spark-7985c075-3b02-42ec-9111-cefba535adf0/blockmgr-3bd403d0-6689-46be-997e-5bc699ecefd3
>   20/09/17 11:06:57 INFO MemoryStore: MemoryStore started with capacity 593.9 
> MiB
>   20/09/17 11:06:57 INFO SparkEnv: Registering OutputCommitCoordinator
>   20/09/17 11:06:58 INFO Utils: Successfully started service 'SparkUI' on 
> port 4040.
>   20/09/17 11:06:58 INFO SparkUI: Bound SparkUI to 

[jira] [Commented] (SPARK-32937) DecomissionSuite in k8s integration tests is failing.

2020-09-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201045#comment-17201045
 ] 

Apache Spark commented on SPARK-32937:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/29854

> DecomissionSuite in k8s integration tests is failing.
> -
>
> Key: SPARK-32937
> URL: https://issues.apache.org/jira/browse/SPARK-32937
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 3.1.0
>Reporter: Prashant Sharma
>Priority: Major
>
> Logs from the failing test, copied from jenkins. As of now, it is always 
> failing. 
> {code}
> - Test basic decommissioning *** FAILED ***
>   The code passed to eventually never returned normally. Attempted 182 times 
> over 3.00377927275 minutes. Last failure message: "++ id -u
>   + myuid=185
>   ++ id -g
>   + mygid=0
>   + set +e
>   ++ getent passwd 185
>   + uidentry=
>   + set -e
>   + '[' -z '' ']'
>   + '[' -w /etc/passwd ']'
>   + echo '185:x:185:0:anonymous uid:/opt/spark:/bin/false'
>   + SPARK_CLASSPATH=':/opt/spark/jars/*'
>   + env
>   + grep SPARK_JAVA_OPT_
>   + sort -t_ -k4 -n
>   + sed 's/[^=]*=\(.*\)/\1/g'
>   + readarray -t SPARK_EXECUTOR_JAVA_OPTS
>   + '[' -n '' ']'
>   + '[' 3 == 2 ']'
>   + '[' 3 == 3 ']'
>   ++ python3 -V
>   + pyv3='Python 3.7.3'
>   + export PYTHON_VERSION=3.7.3
>   + PYTHON_VERSION=3.7.3
>   + export PYSPARK_PYTHON=python3
>   + PYSPARK_PYTHON=python3
>   + export PYSPARK_DRIVER_PYTHON=python3
>   + PYSPARK_DRIVER_PYTHON=python3
>   + '[' -n '' ']'
>   + '[' -z ']'
>   + '[' -z x ']'
>   + SPARK_CLASSPATH='/opt/spark/conf::/opt/spark/jars/*'
>   + case "$1" in
>   + shift 1
>   + CMD=("$SPARK_HOME/bin/spark-submit" --conf 
> "spark.driver.bindAddress=$SPARK_DRIVER_BIND_ADDRESS" --deploy-mode client 
> "$@")
>   + exec /usr/bin/tini -s -- /opt/spark/bin/spark-submit --conf 
> spark.driver.bindAddress=172.17.0.4 --deploy-mode client --properties-file 
> /opt/spark/conf/spark.properties --class org.apache.spark.deploy.PythonRunner 
> local:///opt/spark/tests/decommissioning.py
>   20/09/17 11:06:56 WARN NativeCodeLoader: Unable to load native-hadoop 
> library for your platform... using builtin-java classes where applicable
>   Starting decom test
>   Using Spark's default log4j profile: 
> org/apache/spark/log4j-defaults.properties
>   20/09/17 11:06:56 INFO SparkContext: Running Spark version 3.1.0-SNAPSHOT
>   20/09/17 11:06:57 INFO ResourceUtils: 
> ==
>   20/09/17 11:06:57 INFO ResourceUtils: No custom resources configured for 
> spark.driver.
>   20/09/17 11:06:57 INFO ResourceUtils: 
> ==
>   20/09/17 11:06:57 INFO SparkContext: Submitted application: PyMemoryTest
>   20/09/17 11:06:57 INFO ResourceProfile: Default ResourceProfile created, 
> executor resources: Map(cores -> name: cores, amount: 1, script: , vendor: , 
> memory -> name: memory, amount: 1024, script: , vendor: ), task resources: 
> Map(cpus -> name: cpus, amount: 1.0)
>   20/09/17 11:06:57 INFO ResourceProfile: Limiting resource is cpus at 1 
> tasks per executor
>   20/09/17 11:06:57 INFO ResourceProfileManager: Added ResourceProfile id: 0
>   20/09/17 11:06:57 INFO SecurityManager: Changing view acls to: 185,jenkins
>   20/09/17 11:06:57 INFO SecurityManager: Changing modify acls to: 185,jenkins
>   20/09/17 11:06:57 INFO SecurityManager: Changing view acls groups to: 
>   20/09/17 11:06:57 INFO SecurityManager: Changing modify acls groups to: 
>   20/09/17 11:06:57 INFO SecurityManager: SecurityManager: authentication 
> enabled; ui acls disabled; users  with view permissions: Set(185, jenkins); 
> groups with view permissions: Set(); users  with modify permissions: Set(185, 
> jenkins); groups with modify permissions: Set()
>   20/09/17 11:06:57 INFO Utils: Successfully started service 'sparkDriver' on 
> port 7078.
>   20/09/17 11:06:57 INFO SparkEnv: Registering MapOutputTracker
>   20/09/17 11:06:57 INFO SparkEnv: Registering BlockManagerMaster
>   20/09/17 11:06:57 INFO BlockManagerMasterEndpoint: Using 
> org.apache.spark.storage.DefaultTopologyMapper for getting topology 
> information
>   20/09/17 11:06:57 INFO BlockManagerMasterEndpoint: 
> BlockManagerMasterEndpoint up
>   20/09/17 11:06:57 INFO SparkEnv: Registering BlockManagerMasterHeartbeat
>   20/09/17 11:06:57 INFO DiskBlockManager: Created local directory at 
> /var/data/spark-7985c075-3b02-42ec-9111-cefba535adf0/blockmgr-3bd403d0-6689-46be-997e-5bc699ecefd3
>   20/09/17 11:06:57 INFO MemoryStore: MemoryStore started with capacity 593.9 
> MiB
>   20/09/17 11:06:57 INFO SparkEnv: Registering OutputCommitCoordinator
>   20/09/17 11:06:58 INFO Utils: Successfully 

[jira] [Resolved] (SPARK-32979) Spark K8s decom test is broken

2020-09-23 Thread Holden Karau (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Holden Karau resolved SPARK-32979.
--
Resolution: Duplicate

Duplicate of SPARK-32937

> Spark K8s decom test is broken
> --
>
> Key: SPARK-32979
> URL: https://issues.apache.org/jira/browse/SPARK-32979
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core, Tests
>Affects Versions: 3.1.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
>
> Someone changed the logging messages again. Let's fix the test and add some 
> comments about the importance of running the K8s test on changes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32933) Use keyword-only syntax for keyword_only methods

2020-09-23 Thread Maciej Szymkiewicz (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32933?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17201017#comment-17201017
 ] 

Maciej Szymkiewicz commented on SPARK-32933:


Not a problem [~hyukjin.kwon]. My only concern is that we still need a viable 
alternative for capturing arguments. The {{locals}} hack does the job, especially 
if we add a small helper for dropping {{self}}:

{code:python}
def _drop_self(d):
d = copy.copy(d)
del d["self"]
return d 


class BucketedRandomProjectionLSH(_LSH, _BucketedRandomProjectionLSHParams,
  HasSeed, JavaMLReadable, JavaMLWritable):
def __init__(self, *, inputCol=None, outputCol=None, seed=None, 
numHashTables=1,
 bucketLength=None):
kwargs = _drop_self(locals())
...
{code}

Alternatively, we could just provide all the args explicitly.

I guess we could also leverage {{inspect}}, optionally combined with a class 
decorator (just thinking out loud); a rough sketch is below.
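
A rough sketch of that {{inspect}} idea, purely illustrative (the decorator name
is made up for this sketch; {{_input_kwargs}} is meant to mirror the attribute
the current {{keyword_only}} helper populates):

{code:python}
# Sketch only: capture the caller's arguments from the signature instead of locals().
import functools
import inspect

def capture_kwargs(init):
    @functools.wraps(init)
    def wrapper(self, *args, **kwargs):
        bound = inspect.signature(init).bind(self, *args, **kwargs)
        bound.apply_defaults()
        captured = dict(bound.arguments)
        captured.pop("self", None)
        self._input_kwargs = captured  # the attribute keyword_only fills today
        return init(self, *args, **kwargs)
    return wrapper

class Example:
    @capture_kwargs
    def __init__(self, *, degree=2, inputCol=None, outputCol=None):
        pass

e = Example(degree=3, inputCol="features")
print(e._input_kwargs)  # {'degree': 3, 'inputCol': 'features', 'outputCol': None}
{code}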



> Use keyword-only syntax for keyword_only methods
> 
>
> Key: SPARK-32933
> URL: https://issues.apache.org/jira/browse/SPARK-32933
> Project: Spark
>  Issue Type: Improvement
>  Components: PySpark
>Affects Versions: 3.1.0
>Reporter: Maciej Szymkiewicz
>Assignee: Maciej Szymkiewicz
>Priority: Minor
> Fix For: 3.1.0
>
>
> Since 3.0, Python provides syntax for indicating keyword-only arguments ([PEP 
> 3102|https://www.python.org/dev/peps/pep-3102/]).
> It is not a full replacement for our current usage of {{keyword_only}}, but 
> it would allow us to make our expectations explicit:
> {code:python}
> @keyword_only
> def __init__(self, degree=2, inputCol=None, outputCol=None):
> {code}
> {code:python}
> @keyword_only
> def __init__(self, *, degree=2, inputCol=None, outputCol=None):
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32979) Spark K8s decom test is broken

2020-09-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32979:


Assignee: Holden Karau  (was: Apache Spark)

> Spark K8s decom test is broken
> --
>
> Key: SPARK-32979
> URL: https://issues.apache.org/jira/browse/SPARK-32979
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core, Tests
>Affects Versions: 3.1.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
>
> Someone changed the logging messages again. Let's fix the test and add some 
> comments about the importance of running the K8s test on changes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32979) Spark K8s decom test is broken

2020-09-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32979:


Assignee: Holden Karau  (was: Apache Spark)

> Spark K8s decom test is broken
> --
>
> Key: SPARK-32979
> URL: https://issues.apache.org/jira/browse/SPARK-32979
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core, Tests
>Affects Versions: 3.1.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
>
> Someone changed the logging messages again. Let's fix the test and add some 
> comments about the importance of running the K8s test on changes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32979) Spark K8s decom test is broken

2020-09-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32979:


Assignee: Apache Spark  (was: Holden Karau)

> Spark K8s decom test is broken
> --
>
> Key: SPARK-32979
> URL: https://issues.apache.org/jira/browse/SPARK-32979
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core, Tests
>Affects Versions: 3.1.0
>Reporter: Holden Karau
>Assignee: Apache Spark
>Priority: Major
>
> Someone changed the logging messages again. Let's fix the test and add some 
> comments about the importance of running the K8s test on changes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32979) Spark K8s decom test is broken

2020-09-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200999#comment-17200999
 ] 

Apache Spark commented on SPARK-32979:
--

User 'holdenk' has created a pull request for this issue:
https://github.com/apache/spark/pull/29854

> Spark K8s decom test is broken
> --
>
> Key: SPARK-32979
> URL: https://issues.apache.org/jira/browse/SPARK-32979
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core, Tests
>Affects Versions: 3.1.0
>Reporter: Holden Karau
>Assignee: Holden Karau
>Priority: Major
>
> Someone changed the logging messages again. Let's fix the test and add some 
> comments about the importance of running the K8s test on changes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32979) Spark K8s decom test is broken

2020-09-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32979:


Assignee: Apache Spark  (was: Holden Karau)

> Spark K8s decom test is broken
> --
>
> Key: SPARK-32979
> URL: https://issues.apache.org/jira/browse/SPARK-32979
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core, Tests
>Affects Versions: 3.1.0
>Reporter: Holden Karau
>Assignee: Apache Spark
>Priority: Major
>
> Someone changed the logging messages again. Let's fix the test and add some 
> comments about the importance of running the K8s test on changes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32950) No need for some big-endian specific code paths in {On,Off}HeapColumnVector

2020-09-23 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-32950:


Assignee: Michael Munday

> No need for some big-endian specific code paths in {On,Off}HeapColumnVector
> ---
>
> Key: SPARK-32950
> URL: https://issues.apache.org/jira/browse/SPARK-32950
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Michael Munday
>Assignee: Michael Munday
>Priority: Trivial
>  Labels: big-endian
>
> There is no need for a separate code path for big-endian platforms in 
> putFloats and putDoubles in OnHeapColumnVector and OffHeapColumnVector. Since 
> SPARK-26985 was fixed the values have been copied in native byte order so the 
> code required to perform this operation can be the same on both little- and 
> big-endian platforms.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32950) No need for some big-endian specific code paths in {On,Off}HeapColumnVector

2020-09-23 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32950?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-32950.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 29815
[https://github.com/apache/spark/pull/29815]

> No need for some big-endian specific code paths in {On,Off}HeapColumnVector
> ---
>
> Key: SPARK-32950
> URL: https://issues.apache.org/jira/browse/SPARK-32950
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Michael Munday
>Assignee: Michael Munday
>Priority: Trivial
>  Labels: big-endian
> Fix For: 3.1.0
>
>
> There is no need for a separate code path for big-endian platforms in 
> putFloats and putDoubles in OnHeapColumnVector and OffHeapColumnVector. Since 
> SPARK-26985 was fixed the values have been copied in native byte order so the 
> code required to perform this operation can be the same on both little- and 
> big-endian platforms.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-32892) Murmur3 and xxHash64 implementations do not produce the correct results on big-endian platforms

2020-09-23 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen resolved SPARK-32892.
--
Fix Version/s: 3.1.0
   Resolution: Fixed

Issue resolved by pull request 29762
[https://github.com/apache/spark/pull/29762]

> Murmur3 and xxHash64 implementations do not produce the correct results on 
> big-endian platforms
> ---
>
> Key: SPARK-32892
> URL: https://issues.apache.org/jira/browse/SPARK-32892
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.0.1
>Reporter: Michael Munday
>Assignee: Michael Munday
>Priority: Minor
>  Labels: big-endian
> Fix For: 3.1.0
>
>
> The Murmur3 and xxHash64 implementations in Spark do not produce the correct 
> results on big-endian systems. This causes test failures on my target 
> platform (s390x).
> These hash functions require that multi-byte chunks be interpreted as 
> integers encoded in *little-endian* byte order. This requires byte reversal 
> when using multi-byte unsafe operations on big-endian platforms.
> I have a PR ready for discussion and review.
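
As a neutral illustration of the point above (not the actual Spark patch),
reading a multi-byte chunk with an explicit little-endian interpretation yields
the same value everywhere, while a native-order read is platform dependent:

{code:python}
# Illustrative only: these hash functions define their input chunks as little-endian.
import struct
import sys

chunk = bytes([0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08])

(le_value,) = struct.unpack("<q", chunk)      # explicit little-endian: same everywhere
(native_value,) = struct.unpack("=q", chunk)  # native byte order: platform dependent

print(hex(le_value))             # 0x807060504030201 on every platform
print(le_value == native_value)  # True on little-endian, False on big-endian (e.g. s390x)
print(sys.byteorder)
{code}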



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32892) Murmur3 and xxHash64 implementations do not produce the correct results on big-endian platforms

2020-09-23 Thread Sean R. Owen (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Sean R. Owen reassigned SPARK-32892:


Assignee: Michael Munday

> Murmur3 and xxHash64 implementations do not produce the correct results on 
> big-endian platforms
> ---
>
> Key: SPARK-32892
> URL: https://issues.apache.org/jira/browse/SPARK-32892
> Project: Spark
>  Issue Type: Bug
>  Components: Spark Core, SQL
>Affects Versions: 3.0.1
>Reporter: Michael Munday
>Assignee: Michael Munday
>Priority: Minor
>  Labels: big-endian
>
> The Murmur3 and xxHash64 implementations in Spark do not produce the correct 
> results on big-endian systems. This causes test failures on my target 
> platform (s390x).
> These hash functions require that multi-byte chunks be interpreted as 
> integers encoded in *little-endian* byte order. This requires byte reversal 
> when using multi-byte unsafe operations on big-endian platforms.
> I have a PR ready for discussion and review.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32979) Spark K8s decom test is broken

2020-09-23 Thread Holden Karau (Jira)
Holden Karau created SPARK-32979:


 Summary: Spark K8s decom test is broken
 Key: SPARK-32979
 URL: https://issues.apache.org/jira/browse/SPARK-32979
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes, Spark Core, Tests
Affects Versions: 3.1.0
Reporter: Holden Karau
Assignee: Holden Karau


Someone changed the logging messages again. Let's fix the test and add some 
comments about the importance of running the K8s test on changes.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32972) Pass all `mllib` module UTs in Scala 2.13

2020-09-23 Thread Yang Jie (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32972?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yang Jie updated SPARK-32972:
-
Description: 
There are 51 failed Scala tests and 3 failed Java tests in the `mllib` module; 
the failed cases are as follows:

*Java:*
 * org.apache.spark.mllib.fpm.JavaPrefixSpanSuite ( 1 FAILED)
 * org.apache.spark.ml.feature.JavaVectorIndexerSuite ( 1 FAILED)
 * org.apache.spark.ml.feature.JavaWord2VecSuite ( 1 FAILED)

*Scala:*
 * MatrixFactorizationModelSuite ( 1 FAILED)
 * LDASuite ( 1 FAILED)
 * MLTestSuite ( 1 FAILED)
 * PrefixSpanSuite ( 1 FAILED)
 * BucketedRandomProjectionLSHSuite ( 3 FAILED)
 * Word2VecSuite ( 3 FAILED)
 * Word2VecSuite ( 5 FAILED)
 * MinHashLSHSuite ( 3 FAILED)
 * DecisionTreeSuite ( 1 FAILED)
 * FPGrowthSuite ( 2 FAILED)
 * NaiveBayesSuite ( 2 FAILED)
 * NGramSuite ( 4 FAILED)
 * RFormulaSuite ( 4 FAILED)
 * GradientBoostedTreesSuite ( 1 FAILED)
 * StopWordsRemoverSuite ( 10 FAILED)
 * RandomForestSuite ( 1 FAILED)
 * PrefixSpanSuite ( 4 FAILED)
 * StringIndexerSuite ( 2 FAILED)
 * IDFSuite ( 1 FAILED)
 * RandomForestRegressorSuite ( 1 FAILED)

  was:There are 51 Scala test and 3 java test Failed of `mllib` module, details 
will be added later.


> Pass all `mllib` module UTs in Scala 2.13
> -
>
> Key: SPARK-32972
> URL: https://issues.apache.org/jira/browse/SPARK-32972
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, MLlib
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Major
>
> There are 51 failed Scala tests and 3 failed Java tests in the `mllib` module; 
> the failed cases are as follows:
> *Java:*
>  * org.apache.spark.mllib.fpm.JavaPrefixSpanSuite ( 1 FAILED)
>  * org.apache.spark.ml.feature.JavaVectorIndexerSuite ( 1 FAILED)
>  * org.apache.spark.ml.feature.JavaWord2VecSuite ( 1 FAILED)
> *Scala:*
>  * MatrixFactorizationModelSuite ( 1 FAILED)
>  * LDASuite ( 1 FAILED)
>  * MLTestSuite ( 1 FAILED)
>  * PrefixSpanSuite ( 1 FAILED)
>  * BucketedRandomProjectionLSHSuite ( 3 FAILED)
>  * Word2VecSuite ( 3 FAILED)
>  * Word2VecSuite ( 5 FAILED)
>  * MinHashLSHSuite ( 3 FAILED)
>  * DecisionTreeSuite ( 1 FAILED)
>  * FPGrowthSuite ( 2 FAILED)
>  * NaiveBayesSuite ( 2 FAILED)
>  * NGramSuite ( 4 FAILED)
>  * RFormulaSuite ( 4 FAILED)
>  * GradientBoostedTreesSuite ( 1 FAILED)
>  * StopWordsRemoverSuite ( 10 FAILED)
>  * RandomForestSuite ( 1 FAILED)
>  * PrefixSpanSuite ( 4 FAILED)
>  * StringIndexerSuite ( 2 FAILED)
>  * IDFSuite ( 1 FAILED)
>  * RandomForestRegressorSuite ( 1 FAILED)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32977) [SQL] JavaDoc on Default Save mode Incorrect

2020-09-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200935#comment-17200935
 ] 

Apache Spark commented on SPARK-32977:
--

User 'RussellSpitzer' has created a pull request for this issue:
https://github.com/apache/spark/pull/29853

> [SQL] JavaDoc on Default Save mode Incorrect
> 
>
> Key: SPARK-32977
> URL: https://issues.apache.org/jira/browse/SPARK-32977
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Russell Spitzer
>Priority: Major
>
> The JavaDoc says that the default save mode is dependent on DataSource 
> version which is incorrect. It is always ErrorOnExists.
> http://apache-spark-developers-list.1001551.n3.nabble.com/DatasourceV2-Default-Mode-for-DataFrameWriter-not-Dependent-on-DataSource-Version-td29434.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32977) [SQL] JavaDoc on Default Save mode Incorrect

2020-09-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32977:


Assignee: (was: Apache Spark)

> [SQL] JavaDoc on Default Save mode Incorrect
> 
>
> Key: SPARK-32977
> URL: https://issues.apache.org/jira/browse/SPARK-32977
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Russell Spitzer
>Priority: Major
>
> The JavaDoc says that the default save mode is dependent on DataSource 
> version which is incorrect. It is always ErrorOnExists.
> http://apache-spark-developers-list.1001551.n3.nabble.com/DatasourceV2-Default-Mode-for-DataFrameWriter-not-Dependent-on-DataSource-Version-td29434.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32977) [SQL] JavaDoc on Default Save mode Incorrect

2020-09-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32977:


Assignee: Apache Spark

> [SQL] JavaDoc on Default Save mode Incorrect
> 
>
> Key: SPARK-32977
> URL: https://issues.apache.org/jira/browse/SPARK-32977
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Russell Spitzer
>Assignee: Apache Spark
>Priority: Major
>
> The JavaDoc says that the default save mode is dependent on DataSource 
> version which is incorrect. It is always ErrorOnExists.
> http://apache-spark-developers-list.1001551.n3.nabble.com/DatasourceV2-Default-Mode-for-DataFrameWriter-not-Dependent-on-DataSource-Version-td29434.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32977) [SQL] JavaDoc on Default Save mode Incorrect

2020-09-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200932#comment-17200932
 ] 

Apache Spark commented on SPARK-32977:
--

User 'RussellSpitzer' has created a pull request for this issue:
https://github.com/apache/spark/pull/29853

> [SQL] JavaDoc on Default Save mode Incorrect
> 
>
> Key: SPARK-32977
> URL: https://issues.apache.org/jira/browse/SPARK-32977
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Russell Spitzer
>Priority: Major
>
> The JavaDoc says that the default save mode is dependent on DataSource 
> version which is incorrect. It is always ErrorOnExists.
> http://apache-spark-developers-list.1001551.n3.nabble.com/DatasourceV2-Default-Mode-for-DataFrameWriter-not-Dependent-on-DataSource-Version-td29434.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32977) [SQL] JavaDoc on Default Save mode Incorrect

2020-09-23 Thread Russell Spitzer (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32977?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200930#comment-17200930
 ] 

Russell Spitzer commented on SPARK-32977:
-

[~brkyvz] We talked about this a while back, just submitted the PR to fix the 
doc. Could you please review?

> [SQL] JavaDoc on Default Save mode Incorrect
> 
>
> Key: SPARK-32977
> URL: https://issues.apache.org/jira/browse/SPARK-32977
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Russell Spitzer
>Priority: Major
>
> The JavaDoc says that the default save mode is dependent on DataSource 
> version which is incorrect. It is always ErrorOnExists.
> http://apache-spark-developers-list.1001551.n3.nabble.com/DatasourceV2-Default-Mode-for-DataFrameWriter-not-Dependent-on-DataSource-Version-td29434.html



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32978) Incorrect number of dynamic part metric

2020-09-23 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-32978:

Description: 
How to reproduce this issue:
{code:sql}
create table dynamic_partition(i bigint, part bigint) using parquet partitioned 
by (part);
insert overwrite table dynamic_partition partition(part) select id, id % 50 as 
part  from range(1);
{code}

The number of dynamic part should be 50, but it is 800.


  was:
How to reproduce this issue:
{code:sql}
create table dynamic_partition(i bigint, part bigint) using parquet partitioned 
by (part);
insert overwrite table dynamic_partition partition(part) select id, id % 50 as 
part  from range(1);
{code}



> Incorrect number of dynamic part metric
> ---
>
> Key: SPARK-32978
> URL: https://issues.apache.org/jira/browse/SPARK-32978
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> How to reproduce this issue:
> {code:sql}
> create table dynamic_partition(i bigint, part bigint) using parquet 
> partitioned by (part);
> insert overwrite table dynamic_partition partition(part) select id, id % 50 
> as part  from range(1);
> {code}
> The number of dynamic part should be 50, but it is 800.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32978) Incorrect number of dynamic part metric

2020-09-23 Thread Yuming Wang (Jira)
Yuming Wang created SPARK-32978:
---

 Summary: Incorrect number of dynamic part metric
 Key: SPARK-32978
 URL: https://issues.apache.org/jira/browse/SPARK-32978
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yuming Wang
 Attachments: screenshot-1.png

How to reproduce this issue:
{code:sql}
create table dynamic_partition(i bigint, part bigint) using parquet partitioned 
by (part);
insert overwrite table dynamic_partition partition(part) select id, id % 50 as 
part  from range(1);
{code}




--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32978) Incorrect number of dynamic part metric

2020-09-23 Thread Yuming Wang (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32978?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-32978:

Attachment: screenshot-1.png

> Incorrect number of dynamic part metric
> ---
>
> Key: SPARK-32978
> URL: https://issues.apache.org/jira/browse/SPARK-32978
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
> Attachments: screenshot-1.png
>
>
> How to reproduce this issue:
> {code:sql}
> create table dynamic_partition(i bigint, part bigint) using parquet 
> partitioned by (part);
> insert overwrite table dynamic_partition partition(part) select id, id % 50 
> as part  from range(1);
> {code}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32977) [SQL] JavaDoc on Default Save mode Incorrect

2020-09-23 Thread Russell Spitzer (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32977?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Russell Spitzer updated SPARK-32977:

Description: 
The JavaDoc says that the default save mode depends on the DataSource version, 
which is incorrect. It is always ErrorIfExists.

http://apache-spark-developers-list.1001551.n3.nabble.com/DatasourceV2-Default-Mode-for-DataFrameWriter-not-Dependent-on-DataSource-Version-td29434.html

  was:The JavaDoc says that the default save mode is dependent on DataSource 
version which is incorrect. It is always ErrorOnExists.


> [SQL] JavaDoc on Default Save mode Incorrect
> 
>
> Key: SPARK-32977
> URL: https://issues.apache.org/jira/browse/SPARK-32977
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 3.0.1
>Reporter: Russell Spitzer
>Priority: Major
>
> The JavaDoc says that the default save mode depends on the DataSource 
> version, which is incorrect. It is always ErrorIfExists.
> http://apache-spark-developers-list.1001551.n3.nabble.com/DatasourceV2-Default-Mode-for-DataFrameWriter-not-Dependent-on-DataSource-Version-td29434.html
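
A minimal illustration of the actual default behaviour (a sketch; the path is
hypothetical): without an explicit {{mode()}}, a second write to the same location
fails because the target already exists, regardless of the DataSource version.

{code:python}
df = spark.range(10)
df.write.parquet("/tmp/save_mode_demo")  # first write succeeds
df.write.parquet("/tmp/save_mode_demo")  # fails: path already exists (ErrorIfExists)
{code}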



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32977) [SQL] JavaDoc on Default Save mode Incorrect

2020-09-23 Thread Russell Spitzer (Jira)
Russell Spitzer created SPARK-32977:
---

 Summary: [SQL] JavaDoc on Default Save mode Incorrect
 Key: SPARK-32977
 URL: https://issues.apache.org/jira/browse/SPARK-32977
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.1
Reporter: Russell Spitzer


The JavaDoc says that the default save mode depends on the DataSource version, 
which is incorrect. It is always ErrorIfExists.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32976) Support column list in INSERT statement

2020-09-23 Thread Kent Yao (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200888#comment-17200888
 ] 

Kent Yao commented on SPARK-32976:
--

Thanks for pinging me, looking into this.

> Support column list in INSERT statement
> ---
>
> Key: SPARK-32976
> URL: https://issues.apache.org/jira/browse/SPARK-32976
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Xiao Li
>Priority: Major
>
> INSERT currently does not support named column lists.  
> {{INSERT INTO  (col1, col2,…) VALUES( 'val1', 'val2', … )}}
> Note, we assume the column list contains all the column names. Issue an 
> exception if the list is not complete. The column order could be different 
> from the column order defined in the table definition.
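
A sketch of the proposed syntax (not supported at the time of this ticket; the table
and column names are made up for illustration), with the column list in a different
order than the table definition:

{code:python}
spark.sql("create table t (c1 int, c2 string) using parquet")
# proposed: name the columns explicitly, in any order
spark.sql("insert into t (c2, c1) values ('a', 1)")
{code}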



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32976) Support column list in INSERT statement

2020-09-23 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-32976:

Description: 
INSERT currently does not support named column lists.  

{{INSERT INTO  (col1, col2,…) VALUES( 'val1', 'val2', … )}}

Note, we assume the column list contains all the column names. The order could 
be different from the column order defined in the table definition.

  was:
INSERT currently does not support named column lists.  

{{INSERT INTO  (col1, col2,…) VALUES( 'val1', 'val2', … )}}


> Support column list in INSERT statement
> ---
>
> Key: SPARK-32976
> URL: https://issues.apache.org/jira/browse/SPARK-32976
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Xiao Li
>Priority: Major
>
> INSERT currently does not support named column lists.  
> {{INSERT INTO  (col1, col2,…) VALUES( 'val1', 'val2', … )}}
> Note, we assume the column list contains all the column names. The order 
> could be different from the column order defined in the table definition.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32976) Support column list in INSERT statement

2020-09-23 Thread Xiao Li (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32976?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Xiao Li updated SPARK-32976:

Description: 
INSERT currently does not support named column lists.  

{{INSERT INTO  (col1, col2,…) VALUES( 'val1', 'val2', … )}}

Note, we assume the column list contains all the column names. Issue an 
exception if the list is not complete. The column order could be different from 
the column order defined in the table definition.

  was:
INSERT currently does not support named column lists.  

{{INSERT INTO  (col1, col2,…) VALUES( 'val1', 'val2', … )}}

Note, we assume the column list contains all the column names. The order could 
be different from the column order defined in the table definition.


> Support column list in INSERT statement
> ---
>
> Key: SPARK-32976
> URL: https://issues.apache.org/jira/browse/SPARK-32976
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Xiao Li
>Priority: Major
>
> INSERT currently does not support named column lists.  
> {{INSERT INTO  (col1, col2,…) VALUES( 'val1', 'val2', … )}}
> Note, we assume the column list contains all the column names. Issue an 
> exception if the list is not complete. The column order could be different 
> from the column order defined in the table definition.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32976) Support column list in INSERT statement

2020-09-23 Thread Xiao Li (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32976?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200877#comment-17200877
 ] 

Xiao Li commented on SPARK-32976:
-

[~Qin Yao] Are you interested in this?

> Support column list in INSERT statement
> ---
>
> Key: SPARK-32976
> URL: https://issues.apache.org/jira/browse/SPARK-32976
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Xiao Li
>Priority: Major
>
> INSERT currently does not support named column lists.  
> {{INSERT INTO  (col1, col2,…) VALUES( 'val1', 'val2', … )}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-32976) Support column list in INSERT statement

2020-09-23 Thread Xiao Li (Jira)
Xiao Li created SPARK-32976:
---

 Summary: Support column list in INSERT statement
 Key: SPARK-32976
 URL: https://issues.apache.org/jira/browse/SPARK-32976
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 3.1.0
Reporter: Xiao Li


INSERT currently does not support named column lists.  

{{INSERT INTO  (col1, col2,…) VALUES( 'val1', 'val2', … )}}



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32973) FeatureHasher does not check categoricalCols in inputCols

2020-09-23 Thread Sean R. Owen (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32973?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200846#comment-17200846
 ] 

Sean R. Owen commented on SPARK-32973:
--

It looks like "real" is ignored here? I only see two features hashed. That 
would at least be consistent with the comment if so.
We could change it to an error, which seems OK, but maybe just a warning to 
avoid a behavior change?

> FeatureHasher does not check categoricalCols in inputCols
> -
>
> Key: SPARK-32973
> URL: https://issues.apache.org/jira/browse/SPARK-32973
> Project: Spark
>  Issue Type: Improvement
>  Components: Documentation, ML
>Affects Versions: 2.3.0, 2.4.0, 3.0.0, 3.1.0
>Reporter: zhengruifeng
>Priority: Trivial
>
> doc related to {{categoricalCols}}:
> {code:java}
> Numeric columns to treat as categorical features. By default only string and 
> boolean columns are treated as categorical, so this param can be used to 
> explicitly specify the numerical columns to treat as categorical. Note, the 
> relevant columns must also be set in inputCols. {code}
>  
> However, the check to make sure {{categoricalCols}} in {{inputCols}} was 
> never implemented:
> for example, in 2.4.7 and current master(3.1.0):
> {code:java}
> scala> import org.apache.spark.ml.feature._
> import org.apache.spark.ml.feature._
> scala> import org.apache.spark.ml.linalg.{Vector, Vectors}
> import org.apache.spark.ml.linalg.{Vector, Vectors}
> scala> val df = Seq((2.0, 1, "foo"),(3.0, 2, "bar")).toDF("real", "int", 
> "string")
> df: org.apache.spark.sql.DataFrame = [real: double, int: int ... 1 more field]
> scala> val n = 100
> n: Int = 100
> scala> val hasher = new FeatureHasher().setInputCols("int", 
> "string").setCategoricalCols(Array("real")).setOutputCol("features").setNumFeatures(n)
>  
> hasher: org.apache.spark.ml.feature.FeatureHasher = featureHasher_fbe05968b33f
> scala> hasher.transform(df).show
> ++---+--++
> |real|int|string|features|
> ++---+--++
> | 2.0|  1|   foo|(100,[2,39],[1.0,...|
> | 3.0|  2|   bar|(100,[2,42],[2.0,...|
> ++---+--++
> {code}
>  
> CategoricalCols "real" is not in inputCols ("int", "string").
>  
> I think there are two options:
> 1. Remove the comment "Note, the relevant columns must also be set in 
> inputCols.", since this requirement seems unnecessary;
> 2. Add a check to make sure all categoricalCols are in inputCols (a sketch of 
> such a check follows below).
>  
>  
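
If option 2 is the preferred direction, a user-side guard along those lines could
look like this (a PySpark sketch mirroring the Scala reproduction above; not the
actual implementation):

{code:python}
from pyspark.ml.feature import FeatureHasher

hasher = FeatureHasher(inputCols=["int", "string"], categoricalCols=["real"],
                       outputCol="features", numFeatures=100)

# Fail fast if a categorical column is not also listed in inputCols.
missing = set(hasher.getCategoricalCols()) - set(hasher.getInputCols())
if missing:
    raise ValueError(f"categoricalCols not in inputCols: {missing}")
{code}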



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32852) spark.sql.hive.metastore.jars support HDFS location

2020-09-23 Thread Yuming Wang (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32852?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200798#comment-17200798
 ] 

Yuming Wang commented on SPARK-32852:
-

Workaround:
{code:sh}
bin/spark-submit --deploy-mode cluster \
  --conf spark.yarn.dist.archives=/tmp/hive-1.2.1-lib.tgz \
  --conf spark.sql.hive.metastore.jars=./hive-1.2.1-lib.tgz/hive-1.2.1-lib/* \
  --conf "spark.sql.hive.metastore.version=1.2.1"
{code}


> spark.sql.hive.metastore.jars support HDFS location
> ---
>
> Key: SPARK-32852
> URL: https://issues.apache.org/jira/browse/SPARK-32852
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.1.0
>Reporter: Yuming Wang
>Priority: Major
>
> It would be great if {{spark.sql.hive.metastore.jars}} supported an HDFS 
> location. This would be very convenient in cluster mode.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32975) [K8S] - executor fails to be restarted after it goes to ERROR/Failure state

2020-09-23 Thread Shenson Joseph (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32975?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200771#comment-17200771
 ] 

Shenson Joseph commented on SPARK-32975:


[~anirudh] [~eje] [~liyinan926]

> [K8S] - executor fails to be restarted after it goes to ERROR/Failure state
> ---
>
> Key: SPARK-32975
> URL: https://issues.apache.org/jira/browse/SPARK-32975
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Scheduler
>Affects Versions: 2.4.4
>Reporter: Shenson Joseph
>Priority: Critical
>
> We are using version v1beta2-1.1.2-2.4.5 of the operator with Spark 2.4.4.
> Spark executors keep getting killed with exit code 1, and we see the 
> following exception in the executor that goes to the ERROR state. Once this 
> error happens, the driver doesn't restart the executor. 
>  
> Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
> at 
> org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:64)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:281)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
> Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult:
> at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
> at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
> at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
> at 
> org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:201)
> at 
> org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:65)
> at 
> org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:422)
> at 
> org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
> ... 4 more
> Caused by: java.io.IOException: Failed to connect to 
> act-pipeline-app-1600187491917-driver-svc.default.svc:7078
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
> at 
> org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
> at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
> at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
> at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at 
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> Caused by: java.net.UnknownHostException: 
> act-pipeline-app-1600187491917-driver-svc.default.svc
> at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
> at java.net.InetAddress.getAllByName(InetAddress.java:1193)
> at java.net.InetAddress.getAllByName(InetAddress.java:1127)
> at java.net.InetAddress.getByName(InetAddress.java:1077)
> at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:146)
> at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:143)
> at java.security.AccessController.doPrivileged(Native Method)
> at io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:143)
> at 
> io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:43)
> at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:63)
> at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:55)
> at 
> io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:57)
> at 
> io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:32)
> at 
> io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:108)
> at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:208)
> at io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:49)
> at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:188)
> at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:174)
> at 
> io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
> at 
> io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
> at 
> io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
> at 

[jira] [Created] (SPARK-32975) [K8S] - executor fails to be restarted after it goes to ERROR/Failure state

2020-09-23 Thread Shenson Joseph (Jira)
Shenson Joseph created SPARK-32975:
--

 Summary: [K8S] - executor fails to be restarted after it goes to 
ERROR/Failure state
 Key: SPARK-32975
 URL: https://issues.apache.org/jira/browse/SPARK-32975
 Project: Spark
  Issue Type: Bug
  Components: Kubernetes, Scheduler
Affects Versions: 2.4.4
Reporter: Shenson Joseph


We are using version v1beta2-1.1.2-2.4.5 of the operator with Spark 2.4.4.

Spark executors keep getting killed with exit code 1, and we see the following 
exception in the executor that goes to the ERROR state. Once this error 
happens, the driver doesn't restart the executor. 

 

Exception in thread "main" java.lang.reflect.UndeclaredThrowableException
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1713)
at 
org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:64)
at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$.run(CoarseGrainedExecutorBackend.scala:188)
at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$.main(CoarseGrainedExecutorBackend.scala:281)
at 
org.apache.spark.executor.CoarseGrainedExecutorBackend.main(CoarseGrainedExecutorBackend.scala)
Caused by: org.apache.spark.SparkException: Exception thrown in awaitResult:
at org.apache.spark.util.ThreadUtils$.awaitResult(ThreadUtils.scala:226)
at org.apache.spark.rpc.RpcTimeout.awaitResult(RpcTimeout.scala:75)
at org.apache.spark.rpc.RpcEnv.setupEndpointRefByURI(RpcEnv.scala:101)
at 
org.apache.spark.executor.CoarseGrainedExecutorBackend$$anonfun$run$1.apply$mcV$sp(CoarseGrainedExecutorBackend.scala:201)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:65)
at org.apache.spark.deploy.SparkHadoopUtil$$anon$2.run(SparkHadoopUtil.scala:64)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
... 4 more
Caused by: java.io.IOException: Failed to connect to 
act-pipeline-app-1600187491917-driver-svc.default.svc:7078
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:245)
at 
org.apache.spark.network.client.TransportClientFactory.createClient(TransportClientFactory.java:187)
at org.apache.spark.rpc.netty.NettyRpcEnv.createClient(NettyRpcEnv.scala:198)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:194)
at org.apache.spark.rpc.netty.Outbox$$anon$1.call(Outbox.scala:190)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.UnknownHostException: 
act-pipeline-app-1600187491917-driver-svc.default.svc
at java.net.InetAddress.getAllByName0(InetAddress.java:1281)
at java.net.InetAddress.getAllByName(InetAddress.java:1193)
at java.net.InetAddress.getAllByName(InetAddress.java:1127)
at java.net.InetAddress.getByName(InetAddress.java:1077)
at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:146)
at io.netty.util.internal.SocketUtils$8.run(SocketUtils.java:143)
at java.security.AccessController.doPrivileged(Native Method)
at io.netty.util.internal.SocketUtils.addressByName(SocketUtils.java:143)
at io.netty.resolver.DefaultNameResolver.doResolve(DefaultNameResolver.java:43)
at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:63)
at io.netty.resolver.SimpleNameResolver.resolve(SimpleNameResolver.java:55)
at 
io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:57)
at 
io.netty.resolver.InetSocketAddressResolver.doResolve(InetSocketAddressResolver.java:32)
at 
io.netty.resolver.AbstractAddressResolver.resolve(AbstractAddressResolver.java:108)
at io.netty.bootstrap.Bootstrap.doResolveAndConnect0(Bootstrap.java:208)
at io.netty.bootstrap.Bootstrap.access$000(Bootstrap.java:49)
at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:188)
at io.netty.bootstrap.Bootstrap$1.operationComplete(Bootstrap.java:174)
at 
io.netty.util.concurrent.DefaultPromise.notifyListener0(DefaultPromise.java:507)
at 
io.netty.util.concurrent.DefaultPromise.notifyListenersNow(DefaultPromise.java:481)
at 
io.netty.util.concurrent.DefaultPromise.notifyListeners(DefaultPromise.java:420)
at io.netty.util.concurrent.DefaultPromise.trySuccess(DefaultPromise.java:104)
at 
io.netty.channel.DefaultChannelPromise.trySuccess(DefaultChannelPromise.java:82)
at 
io.netty.channel.AbstractChannel$AbstractUnsafe.safeSetSuccess(AbstractChannel.java:978)
at 
io.netty.channel.AbstractChannel$AbstractUnsafe.register0(AbstractChannel.java:512)
at 
io.netty.channel.AbstractChannel$AbstractUnsafe.access$200(AbstractChannel.java:423)
at 

[jira] [Commented] (SPARK-21481) Add indexOf method in ml.feature.HashingTF similar to mllib.feature.HashingTF

2020-09-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-21481?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200757#comment-17200757
 ] 

Apache Spark commented on SPARK-21481:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/29852

> Add indexOf method in ml.feature.HashingTF similar to mllib.feature.HashingTF
> -
>
> Key: SPARK-21481
> URL: https://issues.apache.org/jira/browse/SPARK-21481
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 2.1.0, 2.2.0
>Reporter: Aseem Bansal
>Assignee: Huaxin Gao
>Priority: Minor
> Fix For: 3.0.0
>
>
> If we want to find the index of any input based on the hashing trick, then it 
> is possible in 
> https://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.mllib.feature.HashingTF
>  but not in 
> https://spark.apache.org/docs/2.2.0/api/scala/index.html#org.apache.spark.ml.feature.HashingTF.
> We should allow that for feature parity.
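
For reference, the mllib side already exposes this; a minimal sketch (the term and
feature count are arbitrary):

{code:python}
from pyspark.mllib.feature import HashingTF

tf = HashingTF(numFeatures=100)
tf.indexOf("some term")  # bucket index this term hashes to
{code}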



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-22674) PySpark breaks serialization of namedtuple subclasses

2020-09-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-22674?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200747#comment-17200747
 ] 

Apache Spark commented on SPARK-22674:
--

User 'HyukjinKwon' has created a pull request for this issue:
https://github.com/apache/spark/pull/29851

> PySpark breaks serialization of namedtuple subclasses
> -
>
> Key: SPARK-22674
> URL: https://issues.apache.org/jira/browse/SPARK-22674
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.2.0, 2.3.0
>Reporter: Jonas Amrich
>Priority: Major
>
> PySpark monkey-patches the namedtuple class to make it serializable; however, 
> this breaks serialization of its subclasses. With the current implementation, 
> any subclass will be serialized (and deserialized) as its parent namedtuple. 
> Consider this code, which will fail with {{AttributeError: 'Point' object has 
> no attribute 'sum'}}:
> {code}
> from collections import namedtuple
> Point = namedtuple("Point", "x y")
> class PointSubclass(Point):
> def sum(self):
> return self.x + self.y
> rdd = spark.sparkContext.parallelize([[PointSubclass(1, 1)]])
> rdd.collect()[0][0].sum()
> {code}
> Moreover, as PySpark hijacks all namedtuples in the main module, importing 
> pyspark breaks serialization of namedtuple subclasses even in code that is 
> not related to Spark / distributed execution. I don't see any clean solution 
> to this; a possible workaround may be to limit the serialization hack to 
> direct namedtuple subclasses only, like in 
> https://github.com/JonasAmrich/spark/commit/f3efecee28243380ecf6657fe54e1a165c1b7204
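
One possible user-side mitigation (an untested sketch, not a fix in PySpark itself)
is to give the subclass its own {{__reduce__}} so pickling reconstructs the subclass
rather than the monkey-patched parent:

{code:python}
from collections import namedtuple

Point = namedtuple("Point", "x y")

class PointSubclass(Point):
    def sum(self):
        return self.x + self.y

    # Reconstruct PointSubclass explicitly instead of inheriting the
    # patched namedtuple __reduce__ from Point.
    def __reduce__(self):
        return (PointSubclass, tuple(self))
{code}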



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32965) pyspark reading csv files with utf_16le encoding

2020-09-23 Thread Punit Shah (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200743#comment-17200743
 ] 

Punit Shah commented on SPARK-32965:


It looks similar.  I've attached a utf-16le file to this ticket.  The pyspark 
code is essentially:

spark.read.csv("16le.csv", inferSchema=True, header=True, encoding="utf_16le").

The attached picture shows the result.

> pyspark reading csv files with utf_16le encoding
> 
>
> Key: SPARK-32965
> URL: https://issues.apache.org/jira/browse/SPARK-32965
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.7, 3.0.0, 3.0.1
>Reporter: Punit Shah
>Priority: Major
> Attachments: 16le.csv, 32965.png
>
>
> If you have a file encoded in utf_16le or utf_16be and try to use 
> spark.read.csv("", encoding="utf_16le"), the dataframe isn't 
> rendered properly.
> If you use Python decoding like:
> prdd = spark_session._sc.binaryFiles(path_url).values().flatMap(lambda x : 
> x.decode("utf_16le").splitlines())
> and then do spark.read.csv(prdd), then it works.
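
A self-contained version of that workaround (a sketch; the file name matches the
attachment, and {{spark.sparkContext}} stands in for {{spark_session._sc}}):

{code:python}
path_url = "16le.csv"
prdd = (spark.sparkContext.binaryFiles(path_url)
        .values()
        .flatMap(lambda x: x.decode("utf_16le").splitlines()))
df = spark.read.csv(prdd, inferSchema=True, header=True)
{code}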



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32965) pyspark reading csv files with utf_16le encoding

2020-09-23 Thread Punit Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Punit Shah updated SPARK-32965:
---
Attachment: 32965.png

> pyspark reading csv files with utf_16le encoding
> 
>
> Key: SPARK-32965
> URL: https://issues.apache.org/jira/browse/SPARK-32965
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.7, 3.0.0, 3.0.1
>Reporter: Punit Shah
>Priority: Major
> Attachments: 16le.csv, 32965.png
>
>
> If you have a file encoded in utf_16le or utf_16be and try to use 
> spark.read.csv("", encoding="utf_16le"), the dataframe isn't 
> rendered properly.
> If you use Python decoding like:
> prdd = spark_session._sc.binaryFiles(path_url).values().flatMap(lambda x : 
> x.decode("utf_16le").splitlines())
> and then do spark.read.csv(prdd), then it works.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32306) `approx_percentile` in Spark SQL gives incorrect results

2020-09-23 Thread Hyukjin Kwon (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-32306:
-
Fix Version/s: 3.0.2
   2.4.8

> `approx_percentile` in Spark SQL gives incorrect results
> 
>
> Key: SPARK-32306
> URL: https://issues.apache.org/jira/browse/SPARK-32306
> Project: Spark
>  Issue Type: Documentation
>  Components: PySpark, SQL
>Affects Versions: 2.4.4, 3.0.0, 3.1.0
>Reporter: Sean Malory
>Assignee: Maxim Gekk
>Priority: Major
> Fix For: 2.4.8, 3.0.2, 3.1.0
>
>
> The `approx_percentile` function in Spark SQL does not give the correct 
> result. I'm not sure how incorrect it is; it may just be a boundary issue. 
> From the docs:
> {quote}The accuracy parameter (default: 1) is a positive numeric literal 
> which controls approximation accuracy at the cost of memory. Higher value of 
> accuracy yields better accuracy, 1.0/accuracy is the relative error of the 
> approximation.
> {quote}
> This is not true. Here is a minimal example in `pyspark` where, essentially, 
> the median of 5 and 8 is calculated as 5:
> {code:python}
> import pyspark.sql.functions as psf
> df = spark.createDataFrame(
> [('bar', 5), ('bar', 8)], ['name', 'val']
> )
> median = psf.expr('percentile_approx(val, 0.5, 2147483647)')
> df.groupBy('name').agg(median.alias('median'))  # gives the median as 5
> {code}
> I've tested this with Spark v2.4.4 and pyspark v2.4.5, although I suspect 
> this is an issue with the underlying algorithm.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-32965) pyspark reading csv files with utf_16le encoding

2020-09-23 Thread Punit Shah (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32965?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Punit Shah updated SPARK-32965:
---
Attachment: 16le.csv

> pyspark reading csv files with utf_16le encoding
> 
>
> Key: SPARK-32965
> URL: https://issues.apache.org/jira/browse/SPARK-32965
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.4.7, 3.0.0, 3.0.1
>Reporter: Punit Shah
>Priority: Major
> Attachments: 16le.csv
>
>
> If you have a file encoded in utf_16le or utf_16be and try to use 
> spark.read.csv("", encoding="utf_16le"), the dataframe isn't 
> rendered properly.
> If you use Python decoding like:
> prdd = spark_session._sc.binaryFiles(path_url).values().flatMap(lambda x : 
> x.decode("utf_16le").splitlines())
> and then do spark.read.csv(prdd), then it works.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32972) Pass all `mllib` module UTs in Scala 2.13

2020-09-23 Thread Yang Jie (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32972?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200732#comment-17200732
 ] 

Yang Jie commented on SPARK-32972:
--

Only "training with sample weights" in 
org.apache.spark.ml.regression.RandomForestRegressorSuite remains unfixed, but 
there's no good idea for fixing it yet...

> Pass all `mllib` module UTs in Scala 2.13
> -
>
> Key: SPARK-32972
> URL: https://issues.apache.org/jira/browse/SPARK-32972
> Project: Spark
>  Issue Type: Sub-task
>  Components: ML, MLlib
>Affects Versions: 3.1.0
>Reporter: Yang Jie
>Priority: Major
>
> There are 51 failed Scala tests and 3 failed Java tests in the `mllib` 
> module; details will be added later.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32974) FeatureHasher transform optimization

2020-09-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200721#comment-17200721
 ] 

Apache Spark commented on SPARK-32974:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/29850

> FeatureHasher transform optimization
> 
>
> Key: SPARK-32974
> URL: https://issues.apache.org/jira/browse/SPARK-32974
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.1.0
>Reporter: zhengruifeng
>Priority: Minor
>
> For a numerical column, its output index is a hash of its col_name; we can 
> pre-compute it once instead of computing it on each row.
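
A conceptual sketch of the idea (plain Python, not the actual Scala implementation;
the hash function is a stand-in): hash each numeric column name once up front, then
reuse the indices for every row.

{code:python}
num_features = 100
numeric_cols = ["real", "int"]

def hash_index(name):
    # stand-in for the real MurmurHash-based index
    return hash(name) % num_features

# computed once, before touching any rows
precomputed = {c: hash_index(c) for c in numeric_cols}

def transform_row(row):
    # per-row work only looks up the precomputed index
    return {precomputed[c]: float(row[c]) for c in numeric_cols}
{code}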



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-32974) FeatureHasher transform optimization

2020-09-23 Thread Apache Spark (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-32974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17200719#comment-17200719
 ] 

Apache Spark commented on SPARK-32974:
--

User 'zhengruifeng' has created a pull request for this issue:
https://github.com/apache/spark/pull/29850

> FeatureHasher transform optimization
> 
>
> Key: SPARK-32974
> URL: https://issues.apache.org/jira/browse/SPARK-32974
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.1.0
>Reporter: zhengruifeng
>Priority: Minor
>
> For a numerical column, its output index is a hash of its col_name; we can 
> pre-compute it once instead of computing it on each row.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32974) FeatureHasher transform optimization

2020-09-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32974:


Assignee: (was: Apache Spark)

> FeatureHasher transform optimization
> 
>
> Key: SPARK-32974
> URL: https://issues.apache.org/jira/browse/SPARK-32974
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.1.0
>Reporter: zhengruifeng
>Priority: Minor
>
> For a numerical column, its output index is a hash of its col_name; we can 
> pre-compute it once instead of computing it on each row.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-32974) FeatureHasher transform optimization

2020-09-23 Thread Apache Spark (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-32974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Apache Spark reassigned SPARK-32974:


Assignee: Apache Spark

> FeatureHasher transform optimization
> 
>
> Key: SPARK-32974
> URL: https://issues.apache.org/jira/browse/SPARK-32974
> Project: Spark
>  Issue Type: Improvement
>  Components: ML
>Affects Versions: 3.1.0
>Reporter: zhengruifeng
>Assignee: Apache Spark
>Priority: Minor
>
> For a numerical column, its output index is a hash of its col_name; we can 
> pre-compute it once instead of computing it on each row.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


