[jira] [Commented] (SPARK-23758) MLlib 2.4 Roadmap

2019-07-19 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889370#comment-16889370
 ] 

Dongjoon Hyun commented on SPARK-23758:
---

Thank you so much, [~josephkb]!

> MLlib 2.4 Roadmap
> -
>
> Key: SPARK-23758
> URL: https://issues.apache.org/jira/browse/SPARK-23758
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.4.0
>Reporter: Joseph K. Bradley
>Priority: Major
>
> h1. Roadmap process
> This roadmap is a master list for MLlib improvements we are working on during 
> this release.  This includes ML-related changes in PySpark and SparkR.
> *What is planned for the next release?*
> * This roadmap lists issues which at least one Committer has prioritized.  
> See details below in "Instructions for committers."
> * This roadmap only lists larger or more critical issues.
> *How can contributors influence this roadmap?*
> * If you believe an issue should be in this roadmap, please discuss the issue 
> on JIRA and/or the dev mailing list.  Make sure to ping Committers since at 
> least one must agree to shepherd the issue.
> * For general discussions, use this JIRA or the dev mailing list.  For 
> specific issues, please comment on those issues or the mailing list.
> * Vote for & watch issues which are important to you.
> ** MLlib, sorted by: [Votes | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20ORDER%20BY%20votes%20DESC]
>  or [Watchers | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20ORDER%20BY%20Watchers%20DESC]
> ** SparkR, sorted by: [Votes | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(SparkR)%20ORDER%20BY%20votes%20DESC]
>  or [Watchers | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(SparkR)%20ORDER%20BY%20Watchers%20DESC]
> h2. Target Version and Priority
> This section describes the meaning of Target Version and Priority.
> || Category | Target Version | Priority | Shepherd | Put on roadmap? | In 
> next release? ||
> | [1 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Blocker%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Blocker | *must* | *must* | *must* |
> | [2 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Critical%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Critical | *must* | yes, unless small | *best effort* |
> | [3 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Major%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Major | *must* | optional | *best effort* |
> | [4 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Minor%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Minor | optional | no | maybe |
> | [5 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Trivial%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Trivial | optional | no | maybe |
> | [6 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20"In%20Progress"%2C%20Reopened)%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20"Target%20Version%2Fs"%20in%20(EMPTY)%20AND%20Shepherd%20not%20in%20(EMPTY)%20ORDER%20BY%20priority%20DESC]
>  | (empty) | (any) | yes | no | maybe |
> | [7 | 
> 

[jira] [Commented] (SPARK-28155) Improve SQL optimizer's predicate pushdown performance for cascading joins

2019-07-19 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28155?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889362#comment-16889362
 ] 

Dongjoon Hyun commented on SPARK-28155:
---

This was committed with a wrong JIRA ID, `SPARK-28155`.

> Improve SQL optimizer's predicate pushdown performance for cascading joins
> --
>
> Key: SPARK-28155
> URL: https://issues.apache.org/jira/browse/SPARK-28155
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yesheng Ma
>Assignee: Yesheng Ma
>Priority: Major
> Fix For: 3.0.0
>
>
> The current Catalyst optimizer's predicate pushdown is divided into two 
> separate rules: PushDownPredicate and PushPredicateThroughJoin. This is not 
> efficient for optimizing cascading joins such as TPC-DS q64, where a whole 
> default batch is re-executed just because of this split. We need a more 
> efficient approach that pushes predicates down as far as possible in a single 
> pass.
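> As a minimal illustration (hypothetical tables in a spark-shell session, not 
> part of this ticket), this is the shape of a cascading join where a single 
> predicate should reach every scan in one optimizer pass:
> {code}
> // spark-shell: `spark` and spark.implicits._ are already in scope
> val a = spark.range(100).toDF("id")
> val b = spark.range(100).toDF("id")
> val c = spark.range(100).toDF("id")
> // One filter above two joins; single-pass pushdown should send id < 10
> // below both joins to all three scans in one optimizer iteration.
> val q = a.join(b, "id").join(c, "id").filter($"id" < 10)
> q.explain(true)
> {code}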



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28433) Incorrect assertion in scala test for aarch64 platform

2019-07-19 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889263#comment-16889263
 ] 

Dongjoon Hyun commented on SPARK-28433:
---

I removed `2.4.3` from the affected versions because there is no test using 
those assertions in `branch-2.4`.

> Incorrect assertion in scala test for aarch64 platform
> --
>
> Key: SPARK-28433
> URL: https://issues.apache.org/jira/browse/SPARK-28433
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: huangtianhua
>Assignee: huangtianhua
>Priority: Minor
> Fix For: 3.0.0
>
>
> We ran the Spark unit tests on an aarch64 server, and these two SQL Scala 
> tests failed: 
> - SPARK-26021: NaN and -0.0 in grouping expressions *** FAILED ***
>    2143289344 equaled 2143289344 (DataFrameAggregateSuite.scala:732)
>  - NaN and -0.0 in window partition keys *** FAILED ***
>    2143289344 equaled 2143289344 (DataFrameWindowFunctionsSuite.scala:704)
> We found that the values of floatToRawIntBits(0.0f / 0.0f) and 
> floatToRawIntBits(Float.NaN) are the same (2143289344) on aarch64. At first we 
> thought it was a JDK or Scala issue, but after discussing with jdk-dev and the 
> Scala community (see 
> https://users.scala-lang.org/t/the-value-of-floattorawintbits-0-0f-0-0f-is-different-on-x86-64-and-aarch64-platforms/4845
> ), we believe the value depends on the architecture.
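> A minimal sketch of the observation (illustrative, not from the original 
> report; the vals keep the division from being constant-folded at compile time):
> {code}
> val zero = 0.0f
> val divNaN   = java.lang.Float.floatToRawIntBits(zero / zero)
> val constNaN = java.lang.Float.floatToRawIntBits(Float.NaN)
> // x86-64 (SSE) typically yields divNaN = -4194304 (0xFFC00000) while
> // constNaN = 2143289344 (0x7FC00000), so the raw patterns differ.
> // On aarch64 both are 2143289344, so assertions expecting them to
> // differ fail there.
> println(s"$divNaN $constNaN ${divNaN == constNaN}")
> {code}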



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28433) Incorrect assertion in scala test for aarch64 platform

2019-07-19 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-28433:
--
Affects Version/s: (was: 2.4.3)

> Incorrect assertion in scala test for aarch64 platform
> --
>
> Key: SPARK-28433
> URL: https://issues.apache.org/jira/browse/SPARK-28433
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: huangtianhua
>Assignee: huangtianhua
>Priority: Minor
> Fix For: 3.0.0
>
>
> We ran the Spark unit tests on an aarch64 server, and these two SQL Scala 
> tests failed: 
> - SPARK-26021: NaN and -0.0 in grouping expressions *** FAILED ***
>    2143289344 equaled 2143289344 (DataFrameAggregateSuite.scala:732)
>  - NaN and -0.0 in window partition keys *** FAILED ***
>    2143289344 equaled 2143289344 (DataFrameWindowFunctionsSuite.scala:704)
> We found that the values of floatToRawIntBits(0.0f / 0.0f) and 
> floatToRawIntBits(Float.NaN) are the same (2143289344) on aarch64. At first we 
> thought it was a JDK or Scala issue, but after discussing with jdk-dev and the 
> Scala community (see 
> https://users.scala-lang.org/t/the-value-of-floattorawintbits-0-0f-0-0f-is-different-on-x86-64-and-aarch64-platforms/4845
> ), we believe the value depends on the architecture.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28433) Incorrect assertion in scala test for aarch64 platform

2019-07-19 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-28433.
---
   Resolution: Fixed
 Assignee: huangtianhua
Fix Version/s: 3.0.0

This is resolved via https://github.com/apache/spark/pull/25186

> Incorrect assertion in scala test for aarch64 platform
> --
>
> Key: SPARK-28433
> URL: https://issues.apache.org/jira/browse/SPARK-28433
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0, 2.4.3
>Reporter: huangtianhua
>Assignee: huangtianhua
>Priority: Minor
> Fix For: 3.0.0
>
>
> We ran the Spark unit tests on an aarch64 server, and these two SQL Scala 
> tests failed: 
> - SPARK-26021: NaN and -0.0 in grouping expressions *** FAILED ***
>    2143289344 equaled 2143289344 (DataFrameAggregateSuite.scala:732)
>  - NaN and -0.0 in window partition keys *** FAILED ***
>    2143289344 equaled 2143289344 (DataFrameWindowFunctionsSuite.scala:704)
> We found that the values of floatToRawIntBits(0.0f / 0.0f) and 
> floatToRawIntBits(Float.NaN) are the same (2143289344) on aarch64. At first we 
> thought it was a JDK or Scala issue, but after discussing with jdk-dev and the 
> Scala community (see 
> https://users.scala-lang.org/t/the-value-of-floattorawintbits-0-0f-0-0f-is-different-on-x86-64-and-aarch64-platforms/4845
> ), we believe the value depends on the architecture.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23758) MLlib 2.4 Roadmap

2019-07-19 Thread Joseph K. Bradley (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley resolved SPARK-23758.
---
Resolution: Done

> MLlib 2.4 Roadmap
> -
>
> Key: SPARK-23758
> URL: https://issues.apache.org/jira/browse/SPARK-23758
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 3.0.0
>Reporter: Joseph K. Bradley
>Priority: Major
>
> h1. Roadmap process
> This roadmap is a master list for MLlib improvements we are working on during 
> this release.  This includes ML-related changes in PySpark and SparkR.
> *What is planned for the next release?*
> * This roadmap lists issues which at least one Committer has prioritized.  
> See details below in "Instructions for committers."
> * This roadmap only lists larger or more critical issues.
> *How can contributors influence this roadmap?*
> * If you believe an issue should be in this roadmap, please discuss the issue 
> on JIRA and/or the dev mailing list.  Make sure to ping Committers since at 
> least one must agree to shepherd the issue.
> * For general discussions, use this JIRA or the dev mailing list.  For 
> specific issues, please comment on those issues or the mailing list.
> * Vote for & watch issues which are important to you.
> ** MLlib, sorted by: [Votes | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20ORDER%20BY%20votes%20DESC]
>  or [Watchers | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20ORDER%20BY%20Watchers%20DESC]
> ** SparkR, sorted by: [Votes | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(SparkR)%20ORDER%20BY%20votes%20DESC]
>  or [Watchers | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(SparkR)%20ORDER%20BY%20Watchers%20DESC]
> h2. Target Version and Priority
> This section describes the meaning of Target Version and Priority.
> || Category | Target Version | Priority | Shepherd | Put on roadmap? | In 
> next release? ||
> | [1 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Blocker%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Blocker | *must* | *must* | *must* |
> | [2 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Critical%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Critical | *must* | yes, unless small | *best effort* |
> | [3 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Major%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Major | *must* | optional | *best effort* |
> | [4 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Minor%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Minor | optional | no | maybe |
> | [5 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Trivial%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Trivial | optional | no | maybe |
> | [6 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20"In%20Progress"%2C%20Reopened)%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20"Target%20Version%2Fs"%20in%20(EMPTY)%20AND%20Shepherd%20not%20in%20(EMPTY)%20ORDER%20BY%20priority%20DESC]
>  | (empty) | (any) | yes | no | maybe |
> | [7 | 
> 

[jira] [Updated] (SPARK-23758) MLlib 2.4 Roadmap

2019-07-19 Thread Joseph K. Bradley (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Joseph K. Bradley updated SPARK-23758:
--
Affects Version/s: (was: 3.0.0)
   2.4.0

> MLlib 2.4 Roadmap
> -
>
> Key: SPARK-23758
> URL: https://issues.apache.org/jira/browse/SPARK-23758
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 2.4.0
>Reporter: Joseph K. Bradley
>Priority: Major
>
> h1. Roadmap process
> This roadmap is a master list for MLlib improvements we are working on during 
> this release.  This includes ML-related changes in PySpark and SparkR.
> *What is planned for the next release?*
> * This roadmap lists issues which at least one Committer has prioritized.  
> See details below in "Instructions for committers."
> * This roadmap only lists larger or more critical issues.
> *How can contributors influence this roadmap?*
> * If you believe an issue should be in this roadmap, please discuss the issue 
> on JIRA and/or the dev mailing list.  Make sure to ping Committers since at 
> least one must agree to shepherd the issue.
> * For general discussions, use this JIRA or the dev mailing list.  For 
> specific issues, please comment on those issues or the mailing list.
> * Vote for & watch issues which are important to you.
> ** MLlib, sorted by: [Votes | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20ORDER%20BY%20votes%20DESC]
>  or [Watchers | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20ORDER%20BY%20Watchers%20DESC]
> ** SparkR, sorted by: [Votes | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(SparkR)%20ORDER%20BY%20votes%20DESC]
>  or [Watchers | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(SparkR)%20ORDER%20BY%20Watchers%20DESC]
> h2. Target Version and Priority
> This section describes the meaning of Target Version and Priority.
> || Category | Target Version | Priority | Shepherd | Put on roadmap? | In 
> next release? ||
> | [1 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Blocker%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Blocker | *must* | *must* | *must* |
> | [2 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Critical%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Critical | *must* | yes, unless small | *best effort* |
> | [3 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Major%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Major | *must* | optional | *best effort* |
> | [4 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Minor%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Minor | optional | no | maybe |
> | [5 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Trivial%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Trivial | optional | no | maybe |
> | [6 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20"In%20Progress"%2C%20Reopened)%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20"Target%20Version%2Fs"%20in%20(EMPTY)%20AND%20Shepherd%20not%20in%20(EMPTY)%20ORDER%20BY%20priority%20DESC]
>  | (empty) | (any) | yes | no | maybe |
> | [7 | 
> 

[jira] [Commented] (SPARK-23758) MLlib 2.4 Roadmap

2019-07-19 Thread Joseph K. Bradley (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889240#comment-16889240
 ] 

Joseph K. Bradley commented on SPARK-23758:
---

Ah sorry, we stopped using this.  I'll close it.

> MLlib 2.4 Roadmap
> -
>
> Key: SPARK-23758
> URL: https://issues.apache.org/jira/browse/SPARK-23758
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 3.0.0
>Reporter: Joseph K. Bradley
>Priority: Major
>
> h1. Roadmap process
> This roadmap is a master list for MLlib improvements we are working on during 
> this release.  This includes ML-related changes in PySpark and SparkR.
> *What is planned for the next release?*
> * This roadmap lists issues which at least one Committer has prioritized.  
> See details below in "Instructions for committers."
> * This roadmap only lists larger or more critical issues.
> *How can contributors influence this roadmap?*
> * If you believe an issue should be in this roadmap, please discuss the issue 
> on JIRA and/or the dev mailing list.  Make sure to ping Committers since at 
> least one must agree to shepherd the issue.
> * For general discussions, use this JIRA or the dev mailing list.  For 
> specific issues, please comment on those issues or the mailing list.
> * Vote for & watch issues which are important to you.
> ** MLlib, sorted by: [Votes | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20ORDER%20BY%20votes%20DESC]
>  or [Watchers | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20ORDER%20BY%20Watchers%20DESC]
> ** SparkR, sorted by: [Votes | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(SparkR)%20ORDER%20BY%20votes%20DESC]
>  or [Watchers | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(SparkR)%20ORDER%20BY%20Watchers%20DESC]
> h2. Target Version and Priority
> This section describes the meaning of Target Version and Priority.
> || Category | Target Version | Priority | Shepherd | Put on roadmap? | In 
> next release? ||
> | [1 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Blocker%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Blocker | *must* | *must* | *must* |
> | [2 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Critical%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Critical | *must* | yes, unless small | *best effort* |
> | [3 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Major%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Major | *must* | optional | *best effort* |
> | [4 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Minor%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Minor | optional | no | maybe |
> | [5 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Trivial%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Trivial | optional | no | maybe |
> | [6 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20"In%20Progress"%2C%20Reopened)%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20"Target%20Version%2Fs"%20in%20(EMPTY)%20AND%20Shepherd%20not%20in%20(EMPTY)%20ORDER%20BY%20priority%20DESC]
>  | (empty) | (any) | yes | no | maybe |
> | [7 | 
> 

[jira] [Created] (SPARK-28456) Add a public API `Encoder.copyEncoder` to allow creating Encoder without touching Scala reflections

2019-07-19 Thread Shixiong Zhu (JIRA)
Shixiong Zhu created SPARK-28456:


 Summary: Add a public API `Encoder.copyEncoder` to allow creating 
Encoder without touching Scala reflections
 Key: SPARK-28456
 URL: https://issues.apache.org/jira/browse/SPARK-28456
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 2.4.3
Reporter: Shixiong Zhu
Assignee: Shixiong Zhu


Because `Encoder` is not thread safe, the user cannot reuse an `Encoder` across 
multiple `Dataset`s. However, creating an `Encoder` for a complicated class is 
slow due to Scala reflection. To reduce the cost of Encoder creation, right now 
I usually use the private API `ExpressionEncoder.copy` as follows:

{code}
object FooEncoder {
 private lazy val _encoder: ExpressionEncoder[Foo] = ExpressionEncoder[Foo]()
 implicit def encoder: ExpressionEncoder[Foo] = _encoder.copy()
}
{code}

This PR proposes a new method `copyEncoder` in `Encoder` so that the above 
code can be rewritten using public APIs.

{code}
object FooEncoder {
 private lazy val _encoder: Encoder[Foo] = Encoders.product[Foo]
 implicit def encoder: Encoder[Foo] = _encoder.copyEncoder()
}
{code}

Regarding the method name:
- Why not use `copy`? It conflicts with `case class`'s copy.
- Why not use `clone`? It conflicts with `Object.clone`.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28455) Executor may be timed out too soon because of overflow in tracking code

2019-07-19 Thread Marcelo Vanzin (JIRA)
Marcelo Vanzin created SPARK-28455:
--

 Summary: Executor may be timed out too soon because of overflow in 
tracking code
 Key: SPARK-28455
 URL: https://issues.apache.org/jira/browse/SPARK-28455
 Project: Spark
  Issue Type: Bug
  Components: Spark Core
Affects Versions: 3.0.0
Reporter: Marcelo Vanzin


This affects the new code added in SPARK-27963 (so normal dynamic allocation is 
fine). There's an overflow issue in that code that may cause executors to be 
timed out early with the default configuration.
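
As a hedged illustration of this class of bug (a generic sketch, not Spark's 
actual tracking code): converting a very large configured timeout to nanoseconds 
and adding it to a clock reading can overflow Long, so the computed deadline 
lands in the past and the executor looks expired immediately.

{code}
// Illustrative only; names are hypothetical. With an "effectively infinite"
// timeout, the conversion to nanoseconds wraps around Long.MaxValue.
val timeoutMs = Long.MaxValue / 1000L                         // huge timeout in millis
val deadlineNanos = System.nanoTime() + timeoutMs * 1000000L  // multiplication overflows
// The product wraps to a small negative offset, so the deadline is in the past:
println(deadlineNanos < System.nanoTime())                    // prints true
{code}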



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28454) Validate LongType in _make_type_verifier

2019-07-19 Thread AY (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28454?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16889204#comment-16889204
 ] 

AY commented on SPARK-28454:


[https://github.com/apache/spark/pull/25117] - related PR.

> Validate LongType in _make_type_verifier
> 
>
> Key: SPARK-28454
> URL: https://issues.apache.org/jira/browse/SPARK-28454
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark
>Affects Versions: 2.3.3
>Reporter: AY
>Priority: Major
>
> {{pyspark.sql.types._make_type_verifier}} doesn't validate the LongType value 
> range.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28454) Validate LongType in _make_type_verifier

2019-07-19 Thread AY (JIRA)
AY created SPARK-28454:
--

 Summary: Validate LongType in _make_type_verifier
 Key: SPARK-28454
 URL: https://issues.apache.org/jira/browse/SPARK-28454
 Project: Spark
  Issue Type: Bug
  Components: PySpark
Affects Versions: 2.3.3
Reporter: AY


{{pyspark.sql.types._make_type_verifier}} doesn't validate the LongType value 
range.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28453) Support recursive view syntax

2019-07-19 Thread Peter Toth (JIRA)
Peter Toth created SPARK-28453:
--

 Summary: Support recursive view syntax
 Key: SPARK-28453
 URL: https://issues.apache.org/jira/browse/SPARK-28453
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Peter Toth


PostgreSQL does support recursive view syntax:
{noformat}
CREATE RECURSIVE VIEW nums (n) AS
  VALUES (1)
  UNION ALL
  SELECT n+1 FROM nums WHERE n < 5
{noformat}



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0

2019-07-19 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888987#comment-16888987
 ] 

Dongjoon Hyun commented on SPARK-28444:
---

BTW, I also agree with [~skonto]. I believe the tests will pass because we 
don't use new features of 1.14. K8s itself and the K8s client library should 
provide the compatibility.

> Bump Kubernetes Client Version to 4.3.0
> ---
>
> Key: SPARK-28444
> URL: https://issues.apache.org/jira/browse/SPARK-28444
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 3.0.0, 2.4.3
>Reporter: Patrick Winter
>Priority: Major
>
> Spark is currently using the Kubernetes client version 4.1.2. This client 
> does not support the current Kubernetes version 1.14, as can be seen on the 
> [compatibility 
> matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix].
>  Therefore the Kubernetes client should be bumped up to version 4.3.0.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0

2019-07-19 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888986#comment-16888986
 ] 

Dongjoon Hyun commented on SPARK-28444:
---

According to that matrix, it looks reasonable to make a PR because the current 
version is not officially covered, [~patrick-winter-swisscard].

> Bump Kubernetes Client Version to 4.3.0
> ---
>
> Key: SPARK-28444
> URL: https://issues.apache.org/jira/browse/SPARK-28444
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 3.0.0, 2.4.3
>Reporter: Patrick Winter
>Priority: Major
>
> Spark is currently using the Kubernetes client version 4.1.2. This client 
> does not support the current Kubernetes version 1.14, as can be seen on the 
> [compatibility 
> matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix].
>  Therefore the Kubernetes client should be bumped up to version 4.3.0.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0

2019-07-19 Thread Patrick Winter (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888975#comment-16888975
 ] 

Patrick Winter commented on SPARK-28444:


I agree. I will be on holiday for the next few weeks, but will keep you updated 
once we find out more. Thanks for your help!

> Bump Kubernetes Client Version to 4.3.0
> ---
>
> Key: SPARK-28444
> URL: https://issues.apache.org/jira/browse/SPARK-28444
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 3.0.0, 2.4.3
>Reporter: Patrick Winter
>Priority: Major
>
> Spark is currently using the Kubernetes client version 4.1.2. This client 
> does not support the current Kubernetes version 1.14, as can be seen on the 
> [compatibility 
> matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix].
>  Therefore the Kubernetes client should be bumped up to version 4.3.0.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28440) Use TestingUtils to compare floating point values

2019-07-19 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-28440:
-

Assignee: Ievgen Prokhorenko

> Use TestingUtils to compare floating point values
> -
>
> Key: SPARK-28440
> URL: https://issues.apache.org/jira/browse/SPARK-28440
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, Tests
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Assignee: Ievgen Prokhorenko
>Priority: Minor
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0

2019-07-19 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888970#comment-16888970
 ] 

Stavros Kontopoulos edited comment on SPARK-28444 at 7/19/19 3:26 PM:
--

Right now on master we have 4.1.2 
[https://github.com/apache/spark/blob/453cbf3dd8df5ec4da844c93eb6000610b551541/resource-managers/kubernetes/core/pom.xml#L32].
 Afaik this is the same version for 2.4.3. Something else is not right.


was (Author: skonto):
Right now on master we have 4.1.2 
[https://github.com/apache/spark/blob/453cbf3dd8df5ec4da844c93eb6000610b551541/resource-managers/kubernetes/core/pom.xml#L32].
 Afaik this is the same version for 2.4.3.

> Bump Kubernetes Client Version to 4.3.0
> ---
>
> Key: SPARK-28444
> URL: https://issues.apache.org/jira/browse/SPARK-28444
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 3.0.0, 2.4.3
>Reporter: Patrick Winter
>Priority: Major
>
> Spark is currently using the Kubernetes client version 4.1.2. This client 
> does not support the current Kubernetes version 1.14, as can be seen on the 
> [compatibility 
> matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix].
>  Therefore the Kubernetes client should be bumped up to version 4.3.0.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0

2019-07-19 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888970#comment-16888970
 ] 

Stavros Kontopoulos edited comment on SPARK-28444 at 7/19/19 3:26 PM:
--

Right now on master we have 4.1.2 
[https://github.com/apache/spark/blob/453cbf3dd8df5ec4da844c93eb6000610b551541/resource-managers/kubernetes/core/pom.xml#L32].
 Afaik this is the same version for 2.4.3.


was (Author: skonto):
Right now on master we have 4.1.2 
[https://github.com/apache/spark/blob/453cbf3dd8df5ec4da844c93eb6000610b551541/resource-managers/kubernetes/core/pom.xml#L32].
 Did you try 3.0.0?

> Bump Kubernetes Client Version to 4.3.0
> ---
>
> Key: SPARK-28444
> URL: https://issues.apache.org/jira/browse/SPARK-28444
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 3.0.0, 2.4.3
>Reporter: Patrick Winter
>Priority: Major
>
> Spark is currently using the Kubernetes client version 4.1.2. This client 
> does not support the current Kubernetes version 1.14, as can be seen on the 
> [compatibility 
> matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix].
>  Therefore the Kubernetes client should be bumped up to version 4.3.0.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0

2019-07-19 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888970#comment-16888970
 ] 

Stavros Kontopoulos edited comment on SPARK-28444 at 7/19/19 3:25 PM:
--

Right now on master we have 
[4.1.2|https://github.com/apache/spark/blob/453cbf3dd8df5ec4da844c93eb6000610b551541/resource-managers/kubernetes/core/pom.xml#L32].


was (Author: skonto):
Right now on master we have 
[4.1.2|https://github.com/apache/spark/blob/453cbf3dd8df5ec4da844c93eb6000610b551541/resource-managers/kubernetes/core/pom.xml#L32].

> Bump Kubernetes Client Version to 4.3.0
> ---
>
> Key: SPARK-28444
> URL: https://issues.apache.org/jira/browse/SPARK-28444
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 3.0.0, 2.4.3
>Reporter: Patrick Winter
>Priority: Major
>
> Spark is currently using the Kubernetes client version 4.1.2. This client 
> does not support the current Kubernetes version 1.14, as can be seen on the 
> [compatibility 
> matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix].
>  Therefore the Kubernetes client should be bumped up to version 4.3.0.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0

2019-07-19 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888970#comment-16888970
 ] 

Stavros Kontopoulos edited comment on SPARK-28444 at 7/19/19 3:25 PM:
--

Right now on master we have 4.1.2 
[https://github.com/apache/spark/blob/453cbf3dd8df5ec4da844c93eb6000610b551541/resource-managers/kubernetes/core/pom.xml#L32].
 Did you try 3.0.0?


was (Author: skonto):
Right now on master we have 
[4.1.2|https://github.com/apache/spark/blob/453cbf3dd8df5ec4da844c93eb6000610b551541/resource-managers/kubernetes/core/pom.xml#L32].

> Bump Kubernetes Client Version to 4.3.0
> ---
>
> Key: SPARK-28444
> URL: https://issues.apache.org/jira/browse/SPARK-28444
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 3.0.0, 2.4.3
>Reporter: Patrick Winter
>Priority: Major
>
> Spark is currently using the Kubernetes client version 4.1.2. This client 
> does not support the current Kubernetes version 1.14, as can be seen on the 
> [compatibility 
> matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix].
>  Therefore the Kubernetes client should be bumped up to version 4.3.0.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0

2019-07-19 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888970#comment-16888970
 ] 

Stavros Kontopoulos commented on SPARK-28444:
-

Right now on master we have 
[4.1.2|https://github.com/apache/spark/blob/453cbf3dd8df5ec4da844c93eb6000610b551541/resource-managers/kubernetes/core/pom.xml#L32].

> Bump Kubernetes Client Version to 4.3.0
> ---
>
> Key: SPARK-28444
> URL: https://issues.apache.org/jira/browse/SPARK-28444
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 3.0.0, 2.4.3
>Reporter: Patrick Winter
>Priority: Major
>
> Spark is currently using the Kubernetes client version 4.1.2. This client 
> does not support the current Kubernetes version 1.14, as can be seen on the 
> [compatibility 
> matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix].
>  Therefore the Kubernetes client should be bumped up to version 4.3.0.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0

2019-07-19 Thread Patrick Winter (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16888965#comment-16888965
 ] 

Patrick Winter commented on SPARK-28444:


Using the Kubernetes client in a standalone application works with both version 
4.1.2 and 4.3.0. We used the same credentials as for Spark.

> Bump Kubernetes Client Version to 4.3.0
> ---
>
> Key: SPARK-28444
> URL: https://issues.apache.org/jira/browse/SPARK-28444
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 3.0.0, 2.4.3
>Reporter: Patrick Winter
>Priority: Major
>
> Spark is currently using the Kubernetes client version 4.1.2. This client 
> does not support the current Kubernetes version 1.14, as can be seen on the 
> [compatibility 
> matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix].
>  Therefore the Kubernetes client should be bumped up to version 4.3.0.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28441) PythonUDF used in correlated scalar subquery causes

2019-07-19 Thread Liang-Chi Hsieh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang-Chi Hsieh updated SPARK-28441:

Summary: PythonUDF used in correlated scalar subquery causes   (was: 
udf(max(udf(column))) throws java.lang.UnsupportedOperationException: Cannot 
evaluate expression: udf(null))

> PythonUDF used in correlated scalar subquery causes 
> 
>
> Key: SPARK-28441
> URL: https://issues.apache.org/jira/browse/SPARK-28441
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Huaxin Gao
>Priority: Minor
>
> I found this when doing https://issues.apache.org/jira/browse/SPARK-28277
>  
> {code:java}
> >>> @pandas_udf("string", PandasUDFType.SCALAR)
> ... def noop(x):
> ...     return x.apply(str)
> ... 
> >>> spark.udf.register("udf", noop)
> 
> >>> spark.sql("CREATE OR REPLACE TEMPORARY VIEW t1 as select * from values 
> >>> (\"one\", 1), (\"two\", 2),(\"three\", 3),(\"one\", NULL) as t1(k, v)")
> DataFrame[]
> >>> spark.sql("CREATE OR REPLACE TEMPORARY VIEW t2 as select * from values 
> >>> (\"one\", 1), (\"two\", 22),(\"one\", 5),(\"one\", NULL), (NULL, 5) as 
> >>> t2(k, v)")
> DataFrame[]
> >>> spark.sql("SELECT t1.k FROM t1 WHERE  t1.v <= (SELECT   
> >>> udf(max(udf(t2.v))) FROM     t2 WHERE    udf(t2.k) = udf(t1.k))").show()
> py4j.protocol.Py4JJavaError: An error occurred while calling o65.showString.
> : java.lang.UnsupportedOperationException: Cannot evaluate expression: 
> udf(null)
>  at 
> org.apache.spark.sql.catalyst.expressions.Unevaluable.eval(Expression.scala:296)
>  at 
> org.apache.spark.sql.catalyst.expressions.Unevaluable.eval$(Expression.scala:295)
>  at 
> org.apache.spark.sql.catalyst.expressions.PythonUDF.eval(PythonUDF.scala:52)
> {code}
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28441) PythonUDF used in correlated scalar subquery causes UnsupportedOperationException

2019-07-19 Thread Liang-Chi Hsieh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang-Chi Hsieh updated SPARK-28441:

Summary: PythonUDF used in correlated scalar subquery causes 
UnsupportedOperationException   (was: PythonUDF used in correlated scalar 
subquery causes )

> PythonUDF used in correlated scalar subquery causes 
> UnsupportedOperationException 
> --
>
> Key: SPARK-28441
> URL: https://issues.apache.org/jira/browse/SPARK-28441
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Huaxin Gao
>Priority: Minor
>
> I found this when doing https://issues.apache.org/jira/browse/SPARK-28277
>  
> {code:java}
> >>> @pandas_udf("string", PandasUDFType.SCALAR)
> ... def noop(x):
> ...     return x.apply(str)
> ... 
> >>> spark.udf.register("udf", noop)
> 
> >>> spark.sql("CREATE OR REPLACE TEMPORARY VIEW t1 as select * from values 
> >>> (\"one\", 1), (\"two\", 2),(\"three\", 3),(\"one\", NULL) as t1(k, v)")
> DataFrame[]
> >>> spark.sql("CREATE OR REPLACE TEMPORARY VIEW t2 as select * from values 
> >>> (\"one\", 1), (\"two\", 22),(\"one\", 5),(\"one\", NULL), (NULL, 5) as 
> >>> t2(k, v)")
> DataFrame[]
> >>> spark.sql("SELECT t1.k FROM t1 WHERE  t1.v <= (SELECT   
> >>> udf(max(udf(t2.v))) FROM     t2 WHERE    udf(t2.k) = udf(t1.k))").show()
> py4j.protocol.Py4JJavaError: An error occurred while calling o65.showString.
> : java.lang.UnsupportedOperationException: Cannot evaluate expression: 
> udf(null)
>  at 
> org.apache.spark.sql.catalyst.expressions.Unevaluable.eval(Expression.scala:296)
>  at 
> org.apache.spark.sql.catalyst.expressions.Unevaluable.eval$(Expression.scala:295)
>  at 
> org.apache.spark.sql.catalyst.expressions.PythonUDF.eval(PythonUDF.scala:52)
> {code}
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28441) PythonUDF used in correlated scalar subquery causes UnsupportedOperationException

2019-07-19 Thread Liang-Chi Hsieh (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28441?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liang-Chi Hsieh updated SPARK-28441:

Priority: Major  (was: Minor)

> PythonUDF used in correlated scalar subquery causes 
> UnsupportedOperationException 
> --
>
> Key: SPARK-28441
> URL: https://issues.apache.org/jira/browse/SPARK-28441
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Huaxin Gao
>Priority: Major
>
> I found this when doing https://issues.apache.org/jira/browse/SPARK-28277
>  
> {code:java}
> >>> @pandas_udf("string", PandasUDFType.SCALAR)
> ... def noop(x):
> ...     return x.apply(str)
> ... 
> >>> spark.udf.register("udf", noop)
> 
> >>> spark.sql("CREATE OR REPLACE TEMPORARY VIEW t1 as select * from values 
> >>> (\"one\", 1), (\"two\", 2),(\"three\", 3),(\"one\", NULL) as t1(k, v)")
> DataFrame[]
> >>> spark.sql("CREATE OR REPLACE TEMPORARY VIEW t2 as select * from values 
> >>> (\"one\", 1), (\"two\", 22),(\"one\", 5),(\"one\", NULL), (NULL, 5) as 
> >>> t2(k, v)")
> DataFrame[]
> >>> spark.sql("SELECT t1.k FROM t1 WHERE  t1.v <= (SELECT   
> >>> udf(max(udf(t2.v))) FROM     t2 WHERE    udf(t2.k) = udf(t1.k))").show()
> py4j.protocol.Py4JJavaError: An error occurred while calling o65.showString.
> : java.lang.UnsupportedOperationException: Cannot evaluate expression: 
> udf(null)
>  at 
> org.apache.spark.sql.catalyst.expressions.Unevaluable.eval(Expression.scala:296)
>  at 
> org.apache.spark.sql.catalyst.expressions.Unevaluable.eval$(Expression.scala:295)
>  at 
> org.apache.spark.sql.catalyst.expressions.PythonUDF.eval(PythonUDF.scala:52)
> {code}
>  
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0

2019-07-19 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1671#comment-1671
 ] 

Stavros Kontopoulos edited comment on SPARK-28444 at 7/19/19 1:46 PM:
--

I am not sure this is a k8s client version issue; it seems more like a 
credentials issue. But let's find out. Have you tried to update the k8s client? 
Can you verify that you can/can't create pods with a simple app (outside Spark) 
using the fabric8io k8s client in different versions? Does it work with minikube 1.14?


was (Author: skonto):
I am not sure this is a k8s client version issue; it is more like a credentials 
issue. Have you tried to update the k8s client? Can you verify that you 
can/can't create pods with a simple app (outside Spark) using the fabric8io k8s 
client in different versions? Does it work with minikube 1.14?

> Bump Kubernetes Client Version to 4.3.0
> ---
>
> Key: SPARK-28444
> URL: https://issues.apache.org/jira/browse/SPARK-28444
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 3.0.0, 2.4.3
>Reporter: Patrick Winter
>Priority: Major
>
> Spark is currently using the Kubernetes client version 4.1.2. This client 
> does not support the current Kubernetes version 1.14, as can be seen on the 
> [compatibility 
> matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix].
>  Therefore the Kubernetes client should be bumped up to version 4.3.0.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0

2019-07-19 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1671#comment-1671
 ] 

Stavros Kontopoulos edited comment on SPARK-28444 at 7/19/19 1:46 PM:
--

I am not sure this is a k8s client version issue; it is more like a credentials 
issue. Have you tried to update the k8s client? Can you verify that you 
can/can't create pods with a simple app (outside Spark) using the fabric8io k8s 
client in different versions? Does it work with minikube 1.14?


was (Author: skonto):
I am not sure this is a k8s client version issue; it is more like a credentials 
issue. Have you tried to update the k8s client? Can you verify that you 
can/can't create pods with a simple app (outside Spark) using the fabric8io k8s 
client in different versions? 

> Bump Kubernetes Client Version to 4.3.0
> ---
>
> Key: SPARK-28444
> URL: https://issues.apache.org/jira/browse/SPARK-28444
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 3.0.0, 2.4.3
>Reporter: Patrick Winter
>Priority: Major
>
> Spark is currently using the Kubernetes client version 4.1.2. This client 
> does not support the current Kubernetes version 1.14, as can be seen on the 
> [compatibility 
> matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix].
>  Therefore the Kubernetes client should be bumped up to version 4.3.0.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0

2019-07-19 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1671#comment-1671
 ] 

Stavros Kontopoulos edited comment on SPARK-28444 at 7/19/19 1:45 PM:
--

I am not sure this is a k8s client version issue; it is more like a credentials 
issue. Have you tried to update the k8s client? Can you verify that you 
can/can't create pods with a simple app (outside Spark) using the fabric8io k8s 
client in different versions? 


was (Author: skonto):
I am not sure this is a k8s client version issue; it is more like a credentials 
issue. Have you tried to update the k8s client? Can you verify that you 
can/can't create pods with a simple app (outside Spark) using the fabric8ios 
client in different versions? 

> Bump Kubernetes Client Version to 4.3.0
> ---
>
> Key: SPARK-28444
> URL: https://issues.apache.org/jira/browse/SPARK-28444
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 3.0.0, 2.4.3
>Reporter: Patrick Winter
>Priority: Major
>
> Spark is currently using the Kubernetes client version 4.1.2. This client 
> does not support the current Kubernetes version 1.14, as can be seen on the 
> [compatibility 
> matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix].
>  Therefore the Kubernetes client should be bumped up to version 4.3.0.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0

2019-07-19 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1671#comment-1671
 ] 

Stavros Kontopoulos edited comment on SPARK-28444 at 7/19/19 1:45 PM:
--

I am not sure this is a k8s client version issue; it is more like a credentials 
issue. Have you tried to update the k8s client? Can you verify that you 
can/can't create pods with a simple app (outside Spark) using the fabric8ios 
client in different versions? 


was (Author: skonto):
I am not sure this is a k8s client version issue; it is more like a credentials 
issue. Have you tried to update the k8s client? Can you verify that you can 
create pods with a simple app using the fabric8ios client in different versions?

> Bump Kubernetes Client Version to 4.3.0
> ---
>
> Key: SPARK-28444
> URL: https://issues.apache.org/jira/browse/SPARK-28444
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 3.0.0, 2.4.3
>Reporter: Patrick Winter
>Priority: Major
>
> Spark is currently using the Kubernetes client version 4.1.2. This client 
> does not support the current Kubernetes version 1.14, as can be seen on the 
> [compatibility 
> matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix].
>  Therefore the Kubernetes client should be bumped up to version 4.3.0.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0

2019-07-19 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1671#comment-1671
 ] 

Stavros Kontopoulos edited comment on SPARK-28444 at 7/19/19 1:44 PM:
--

I am not sure this is a k8s client version issue; it is more like a credentials 
issue. Have you tried to update the k8s client? Can you verify that you can 
create pods with a simple app using the fabric8ios client in different versions?


was (Author: skonto):
I am not sure this is a k8s client version issue; it looks more like a credentials 
issue. Have you tried updating the k8s client?

> Bump Kubernetes Client Version to 4.3.0
> ---
>
> Key: SPARK-28444
> URL: https://issues.apache.org/jira/browse/SPARK-28444
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 3.0.0, 2.4.3
>Reporter: Patrick Winter
>Priority: Major
>
> Spark is currently using the Kubernetes client version 4.1.2. This client 
> does not support the current Kubernetes version 1.14, as can be seen on the 
> [compatibility 
> matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix].
>  Therefore the Kubernetes client should be bumped up to version 4.3.0.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0

2019-07-19 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1671#comment-1671
 ] 

Stavros Kontopoulos commented on SPARK-28444:
-

I am not sure this is a k8s client version issue; it looks more like a credentials 
issue. Have you tried updating the k8s client?

> Bump Kubernetes Client Version to 4.3.0
> ---
>
> Key: SPARK-28444
> URL: https://issues.apache.org/jira/browse/SPARK-28444
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 3.0.0, 2.4.3
>Reporter: Patrick Winter
>Priority: Major
>
> Spark is currently using the Kubernetes client version 4.1.2. This client 
> does not support the current Kubernetes version 1.14, as can be seen on the 
> [compatibility 
> matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix].
>  Therefore the Kubernetes client should be bumped up to version 4.3.0.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28452) CSV datasource writer does not support the maxCharsPerColumn option

2019-07-19 Thread Weichen Xu (JIRA)
Weichen Xu created SPARK-28452:
--

 Summary: CSV datasource writer does not support the maxCharsPerColumn 
option
 Key: SPARK-28452
 URL: https://issues.apache.org/jira/browse/SPARK-28452
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 2.4.3
Reporter: Weichen Xu


The CSV datasource reader supports the maxCharsPerColumn option, but the CSV 
datasource writer does not.

Should we make the CSV datasource writer also support maxCharsPerColumn, so that 
the reader and writer have consistent behavior for this option? Otherwise a user 
may write a DataFrame to CSV successfully but then fail to load it back.
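
A minimal sketch of the asymmetry (paths and values are illustrative, not from 
the report):

{code}
// One row with a 100-character column.
val df = spark.range(1).selectExpr("repeat('x', 100) AS wide")

// The writer accepts but ignores maxCharsPerColumn, so the write succeeds.
df.write.mode("overwrite").option("maxCharsPerColumn", "10").csv("/tmp/wide-csv")

// The reader enforces the option, so loading the same data back fails.
spark.read.option("maxCharsPerColumn", "10").csv("/tmp/wide-csv").show()
{code}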



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0

2019-07-19 Thread Patrick Winter (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1653#comment-1653
 ] 

Patrick Winter commented on SPARK-28444:


We had been hitting this one previously, but we are now using a user account 
(.kube/config) on the submit machine, so this does not seem to be the issue.

> Bump Kubernetes Client Version to 4.3.0
> ---
>
> Key: SPARK-28444
> URL: https://issues.apache.org/jira/browse/SPARK-28444
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 3.0.0, 2.4.3
>Reporter: Patrick Winter
>Priority: Major
>
> Spark is currently using the Kubernetes client version 4.1.2. This client 
> does not support the current Kubernetes version 1.14, as can be seen on the 
> [compatibility 
> matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix].
>  Therefore the Kubernetes client should be bumped up to version 4.3.0.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28289) Convert and port 'union.sql' into UDF test base

2019-07-19 Thread Yiheng Wang (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28289?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1647#comment-1647
 ] 

Yiheng Wang commented on SPARK-28289:
-

Here's the PR:

[https://github.com/apache/spark/pull/25202]

[~hyukjin.kwon]

> Convert and port 'union.sql' into UDF test base
> ---
>
> Key: SPARK-28289
> URL: https://issues.apache.org/jira/browse/SPARK-28289
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28451) substr returns different values

2019-07-19 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-28451:
---

 Summary: substr returns different values
 Key: SPARK-28451
 URL: https://issues.apache.org/jira/browse/SPARK-28451
 Project: Spark
  Issue Type: Sub-task
  Components: Tests
Affects Versions: 3.0.0
Reporter: Yuming Wang


PostgreSQL:
{noformat}
postgres=# select substr('1234567890', -1, 5);
 substr
--------
 123
(1 row)

postgres=# select substr('1234567890', 1, -1);
ERROR:  negative substring length not allowed
{noformat}

Spark SQL:
{noformat}
spark-sql> select substr('1234567890', -1, 5);
0
spark-sql> select substr('1234567890', 1, -1);

spark-sql>
{noformat}
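
The divergence comes from how the negative arguments are interpreted: PostgreSQL 
treats a negative start as a position before the string, returning only the 
overlapping characters ('123'), and rejects a negative length outright, while 
Spark interprets a negative start as counting from the end of the string (so -1 
selects the final character, '0') and silently returns an empty result for a 
negative length.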



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0

2019-07-19 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1639#comment-1639
 ] 

Stavros Kontopoulos commented on SPARK-28444:
-

Probably you are hitting this one: 
https://issues.apache.org/jira/browse/SPARK-26833

> Bump Kubernetes Client Version to 4.3.0
> ---
>
> Key: SPARK-28444
> URL: https://issues.apache.org/jira/browse/SPARK-28444
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 3.0.0, 2.4.3
>Reporter: Patrick Winter
>Priority: Major
>
> Spark is currently using the Kubernetes client version 4.1.2. This client 
> does not support the current Kubernetes version 1.14, as can be seen on the 
> [compatibility 
> matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix].
>  Therefore the Kubernetes client should be bumped up to version 4.3.0.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0

2019-07-19 Thread Patrick Winter (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1633#comment-1633
 ] 

Patrick Winter commented on SPARK-28444:


Unfortunately, running spark-submit does not give much more information:

Exception in thread "main" 
io.fabric8.kubernetes.client.KubernetesClientException
 at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:201)
 at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543)
 at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:185)
 at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
 at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
19/07/19 12:43:20 INFO ShutdownHookManager: Shutdown hook called
19/07/19 12:43:20 INFO ShutdownHookManager: Deleting directory 
/tmp/spark-54cf5aa1-7a66-4bb4-8d88-96ac7d2076e2

 

Running the jar directly, we get a little more:

19/07/19 12:45:27 INFO SparkContext: Running Spark version 2.4.2
19/07/19 12:45:27 WARN NativeCodeLoader: Unable to load native-hadoop library 
for your platform... using builtin-java classes where applicable
19/07/19 12:45:27 INFO SparkContext: Submitted application: bigdataAnalyticsPoC
19/07/19 12:45:28 INFO SecurityManager: Changing view acls to: root
19/07/19 12:45:28 INFO SecurityManager: Changing modify acls to: root
19/07/19 12:45:28 INFO SecurityManager: Changing view acls groups to: 
19/07/19 12:45:28 INFO SecurityManager: Changing modify acls groups to: 
19/07/19 12:45:28 INFO SecurityManager: SecurityManager: authentication 
disabled; ui acls disabled; users with view permissions: Set(root); groups with 
view permissions: Set(); users with modify permissions: Set(root); groups with 
modify permissions: Set()
19/07/19 12:45:28 INFO Utils: Successfully started service 'sparkDriver' on 
port 40288.
19/07/19 12:45:28 INFO SparkEnv: Registering MapOutputTracker
19/07/19 12:45:28 INFO SparkEnv: Registering BlockManagerMaster
19/07/19 12:45:28 INFO BlockManagerMasterEndpoint: Using 
org.apache.spark.storage.DefaultTopologyMapper for getting topology information
19/07/19 12:45:28 INFO BlockManagerMasterEndpoint: BlockManagerMasterEndpoint up
19/07/19 12:45:28 INFO DiskBlockManager: Created local directory at 
/tmp/blockmgr-f46c28fd-5c19-441e-9f62-c7d392e2c29a
19/07/19 12:45:28 INFO MemoryStore: MemoryStore started with capacity 2.1 GB
19/07/19 12:45:28 INFO SparkEnv: Registering OutputCommitCoordinator
19/07/19 12:45:28 INFO Utils: Successfully started service 'SparkUI' on port 
4040.
19/07/19 12:45:28 INFO SparkUI: Bound SparkUI to 0.0.0.0, and started at 
http://spark-submit-client-849vj:4040
19/07/19 12:45:29 INFO ExecutorPodsAllocator: Going to request 2 executors from 
Kubernetes.
19/07/19 12:45:29 WARN WatchConnectionManager: Exec Failure: HTTP 403, Status: 
403 - null
java.net.ProtocolException: Expected HTTP 101 response but was '403 Forbidden'
 at okhttp3.internal.ws.RealWebSocket.checkResponse(RealWebSocket.java:216)
 at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:183)
 at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
 at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
19/07/19 12:45:29 WARN ExecutorPodsWatchSnapshotSource: Kubernetes client has 
been closed (this is expected if the application is shutting down.)
19/07/19 12:45:30 ERROR SparkContext: Error initializing SparkContext.
io.fabric8.kubernetes.client.KubernetesClientException
 at 
io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager$2.onFailure(WatchConnectionManager.java:201)
 at okhttp3.internal.ws.RealWebSocket.failWebSocket(RealWebSocket.java:543)
 at okhttp3.internal.ws.RealWebSocket$2.onResponse(RealWebSocket.java:185)
 at okhttp3.RealCall$AsyncCall.execute(RealCall.java:141)
 at okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
 at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
 at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
 at java.lang.Thread.run(Thread.java:748)
19/07/19 12:45:30 INFO SparkUI: Stopped Spark web UI at 
http://spark-submit-client-849vj:4040
19/07/19 12:45:30 INFO KubernetesClusterSchedulerBackend: Shutting down all 
executors
19/07/19 12:45:30 INFO 
KubernetesClusterSchedulerBackend$KubernetesDriverEndpoint: Asking each 
executor to shut down
19/07/19 12:45:30 INFO MapOutputTrackerMasterEndpoint: 
MapOutputTrackerMasterEndpoint stopped!
19/07/19 12:45:30 INFO 

[jira] [Updated] (SPARK-28450) When scanning Hive data from a nonexistent partition, an error is returned

2019-07-19 Thread angerszhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-28450:
--
Attachment: image-2019-07-19-20-51-12-861.png

> When scanning Hive data from a nonexistent partition, an error is returned
> --
>
> Key: SPARK-28450
> URL: https://issues.apache.org/jira/browse/SPARK-28450
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0, 3.0.0
>Reporter: angerszhu
>Priority: Major
> Attachments: image-2019-07-19-20-51-12-861.png
>
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28450) When scanning Hive data from a nonexistent partition, an error is returned

2019-07-19 Thread angerszhu (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

angerszhu updated SPARK-28450:
--
Description: 
When we select data from a partition that does not exist in a Hive partitioned 
table, it returns an error, but it should just return an empty result.

!image-2019-07-19-20-51-12-861.png!
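
A hypothetical sketch of the kind of scenario described (the actual error is only 
shown in the attached screenshot; the table name is illustrative). One common 
variant is a partition that is still registered in the metastore but whose 
directory has been removed:

{code}
spark.sql("CREATE TABLE t (value STRING) PARTITIONED BY (dt STRING)")
spark.sql("ALTER TABLE t ADD PARTITION (dt = '2019-07-19')")
// ... the partition directory is then deleted out-of-band (e.g. via HDFS) ...

// Scanning the partition should return zero rows instead of failing.
spark.sql("SELECT * FROM t WHERE dt = '2019-07-19'").show()
{code}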

> When scanning Hive data from a nonexistent partition, an error is returned
> --
>
> Key: SPARK-28450
> URL: https://issues.apache.org/jira/browse/SPARK-28450
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 2.4.0, 3.0.0
>Reporter: angerszhu
>Priority: Major
> Attachments: image-2019-07-19-20-51-12-861.png
>
>
> When we select data from a partition that does not exist in a Hive partitioned 
> table, it returns an error, but it should just return an empty result.
> !image-2019-07-19-20-51-12-861.png!



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28450) When scanning Hive data from a nonexistent partition, an error is returned

2019-07-19 Thread angerszhu (JIRA)
angerszhu created SPARK-28450:
-

 Summary: When scanning Hive data from a nonexistent partition, an error is 
returned
 Key: SPARK-28450
 URL: https://issues.apache.org/jira/browse/SPARK-28450
 Project: Spark
  Issue Type: Improvement
  Components: SQL
Affects Versions: 2.4.0, 3.0.0
Reporter: angerszhu






--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28449) Missing escape_string_warning and standard_conforming_strings config

2019-07-19 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-28449:

Summary: Missing escape_string_warning and standard_conforming_strings 
config  (was: Missing escape_string_warning/standard_conforming_strings config)

> Missing escape_string_warning and standard_conforming_strings config
> 
>
> Key: SPARK-28449
> URL: https://issues.apache.org/jira/browse/SPARK-28449
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> When on, a warning is issued if a backslash ({{\}}) appears in an ordinary 
> string literal ({{'...'}} syntax) and {{standard_conforming_strings}} is off. 
> The default is {{on}}.
> Applications that wish to use backslash as escape should be modified to use 
> escape string syntax ({{E'...'}}), because the default behavior of ordinary 
> strings is now to treat backslash as an ordinary character, per SQL standard. 
> This variable can be enabled to help locate code that needs to be changed.
>  
> [https://www.postgresql.org/docs/11/runtime-config-compatible.html]



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28449) Missing escape_string_warning/standard_conforming_strings config

2019-07-19 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-28449:

Description: 
When on, a warning is issued if a backslash ({{\}}) appears in an ordinary 
string literal ({{'...'}} syntax) and {{standard_conforming_strings}} is off. 
The default is {{on}}.

Applications that wish to use backslash as escape should be modified to use 
escape string syntax ({{E'...'}}), because the default behavior of ordinary 
strings is now to treat backslash as an ordinary character, per SQL standard. 
This variable can be enabled to help locate code that needs to be changed.

 

[https://www.postgresql.org/docs/11/runtime-config-compatible.html]

  was:
When on, a warning is issued if a backslash ({{\}}) appears in an ordinary 
string literal ({{'...'}} syntax) and {{standard_conforming_strings}} is off. 
The default is {{on}}.

Applications that wish to use backslash as escape should be modified to use 
escape string syntax ({{E'...'}}), because the default behavior of ordinary 
strings is now to treat backslash as an ordinary character, per SQL standard. 
This variable can be enabled to help locate code that needs to be changed.


> Missing escape_string_warning/standard_conforming_strings config
> 
>
> Key: SPARK-28449
> URL: https://issues.apache.org/jira/browse/SPARK-28449
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> When on, a warning is issued if a backslash ({{\}}) appears in an ordinary 
> string literal ({{'...'}} syntax) and {{standard_conforming_strings}} is off. 
> The default is {{on}}.
> Applications that wish to use backslash as escape should be modified to use 
> escape string syntax ({{E'...'}}), because the default behavior of ordinary 
> strings is now to treat backslash as an ordinary character, per SQL standard. 
> This variable can be enabled to help locate code that needs to be changed.
>  
> [https://www.postgresql.org/docs/11/runtime-config-compatible.html]



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28449) Missing escape_string_warning/standard_conforming_strings config

2019-07-19 Thread Yuming Wang (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28449?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yuming Wang updated SPARK-28449:

Summary: Missing escape_string_warning/standard_conforming_strings config  
(was: Missing escape_string_warning config)

> Missing escape_string_warning/standard_conforming_strings config
> 
>
> Key: SPARK-28449
> URL: https://issues.apache.org/jira/browse/SPARK-28449
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Yuming Wang
>Priority: Major
>
> When on, a warning is issued if a backslash ({{\}}) appears in an ordinary 
> string literal ({{'...'}} syntax) and {{standard_conforming_strings}} is off. 
> The default is {{on}}.
> Applications that wish to use backslash as escape should be modified to use 
> escape string syntax ({{E'...'}}), because the default behavior of ordinary 
> strings is now to treat backslash as an ordinary character, per SQL standard. 
> This variable can be enabled to help locate code that needs to be changed.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28449) Missing escape_string_warning config

2019-07-19 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-28449:
---

 Summary: Missing escape_string_warning config
 Key: SPARK-28449
 URL: https://issues.apache.org/jira/browse/SPARK-28449
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yuming Wang


When on, a warning is issued if a backslash ({{\}}) appears in an ordinary 
string literal ({{'...'}} syntax) and {{standard_conforming_strings}} is off. 
The default is {{on}}.

Applications that wish to use backslash as escape should be modified to use 
escape string syntax ({{E'...'}}), because the default behavior of ordinary 
strings is now to treat backslash as an ordinary character, per SQL standard. 
This variable can be enabled to help locate code that needs to be changed.
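
A short illustration of the behavior these two settings control (PostgreSQL 
session; the warning text is approximate):

{noformat}
postgres=# SET standard_conforming_strings = off;
postgres=# SET escape_string_warning = on;
postgres=# SELECT 'a\nb';   -- ordinary literal: escape is processed, warning raised
WARNING:  nonstandard use of escape in a string literal
postgres=# SELECT E'a\nb';  -- escape string syntax: no warning
{noformat}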



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28448) Implement ILIKE operator

2019-07-19 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-28448:
---

 Summary: Implement ILIKE operator
 Key: SPARK-28448
 URL: https://issues.apache.org/jira/browse/SPARK-28448
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yuming Wang


The key word {{ILIKE}} can be used instead of {{LIKE}} to make the match 
case-insensitive according to the active locale. This is not in the SQL 
standard but is a PostgreSQL extension.
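
For illustration, the PostgreSQL behavior being referenced (output shape 
approximate):

{noformat}
postgres=# SELECT 'Spark' LIKE 'spark', 'Spark' ILIKE 'spark';
 ?column? | ?column?
----------+----------
 f        | t
(1 row)
{noformat}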



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28445) Inconsistency between Scala and Python/Panda udfs when groupby with udf() is used

2019-07-19 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28445:
-
Description: 
Python:

{code}
from pyspark.sql.functions import pandas_udf, PandasUDFType

@pandas_udf("int", PandasUDFType.SCALAR)
def noop(x):
 return x

spark.udf.register("udf", noop)

sql("""
 CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
 (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, 
null)
 AS testData(a, b)""")

sql("""SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + 
1)""").show()
{code}

{code}
: org.apache.spark.sql.AnalysisException: expression 'testdata.`a`' is neither 
present in the group by, nor is it an aggregate function. Add to group by or 
wrap in first() (or first_value) if you don't care which value you get.;;
Aggregate [udf((a#0 + 1))], [udf((a#0 + 1)) AS udf((a + 1))#10, udf(count(b#1)) 
AS udf(count(b))#12]
+- SubqueryAlias `testdata`
 +- Project [a#0, b#1]
 +- SubqueryAlias `testData`
 +- LocalRelation [a#0, b#1]
{code}


Scala:

{code}
spark.udf.register("udf", (input: Int) => input)

sql("""
 CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
 (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, 
null)
 AS testData(a, b)""")

sql("""SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + 
1)""").show()
{code}

{code}
+------------+-------------+
|udf((a + 1))|udf(count(b))|
+------------+-------------+
|        null|            1|
|           3|            2|
|           4|            2|
|           2|            2|
+------------+-------------+
{code}


  was:
Python:

{code}
from pyspark.sql.functions import pandas_udf, PandasUDFType

@pandas_udf("int", PandasUDFType.SCALAR)
def noop(x):
 return x

spark.udf.register("udf", noop)

sql("""
 CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
 (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, 
null)
 AS testData(a, b)""")

sql("""SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + 
1)""").show()
{code}

{code}
: org.apache.spark.sql.AnalysisException: expression 'testdata.`a`' is neither 
present in the group by, nor is it an aggregate function. Add to group by or 
wrap in first() (or first_value) if you don't care which value you get.;;
Aggregate [udf((a#0 + 1))], [udf((a#0 + 1)) AS udf((a + 1))#10, udf(count(b#1)) 
AS udf(count(b))#12]
+- SubqueryAlias `testdata`
 +- Project [a#0, b#1]
 +- SubqueryAlias `testData`
 +- LocalRelation [a#0, b#1]
{code}


Scala:

{code}
spark.udf.register("udf", (input: Int) => input)

sql("""
 CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
 (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, 
null)
 AS testData(a, b)""")

sql("""SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + 
1)""").show()
{code}

{code}
+------------+-------------+
|udf((a + 1))|udf(count(b))|
+------------+-------------+
|        null|            1|
|           3|            2|
|           4|            2|
|           2|            2|
+------------+-------------+
{code}



> Inconsistency between Scala and Python/Panda udfs when groupby with udf() is 
> used
> -
>
> Key: SPARK-28445
> URL: https://issues.apache.org/jira/browse/SPARK-28445
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Stavros Kontopoulos
>Priority: Major
>
> Python:
> {code}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> @pandas_udf("int", PandasUDFType.SCALAR)
> def noop(x):
>  return x
> spark.udf.register("udf", noop)
> sql("""
>  CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
>  (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, 
> null)
>  AS testData(a, b)""")
> sql("""SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + 
> 1)""").show()
> {code}
> {code}
> : org.apache.spark.sql.AnalysisException: expression 'testdata.`a`' is 
> neither present in the group by, nor is it an aggregate function. Add to 
> group by or wrap in first() (or first_value) if you don't care which value 
> you get.;;
> Aggregate [udf((a#0 + 1))], [udf((a#0 + 1)) AS udf((a + 1))#10, 
> udf(count(b#1)) AS udf(count(b))#12]
> +- SubqueryAlias `testdata`
>  +- Project [a#0, b#1]
>  +- SubqueryAlias `testData`
>  +- LocalRelation [a#0, b#1]
> {code}
> Scala:
> {code}
> spark.udf.register("udf", (input: Int) => input)
> sql("""
>  CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
>  (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, 
> null)
>  AS testData(a, b)""")
> sql("""SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + 
> 1)""").show()
> {code}
> {code}
> ++-+
> |udf((a + 

[jira] [Updated] (SPARK-28445) Inconsistency between Scala and Python/Panda udfs when groupby with udf() is used

2019-07-19 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon updated SPARK-28445:
-
Description: 
Python:

{code}
from pyspark.sql.functions import pandas_udf, PandasUDFType

@pandas_udf("int", PandasUDFType.SCALAR)
def noop(x):
 return x

spark.udf.register("udf", noop)

sql("""
 CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
 (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, 
null)
 AS testData(a, b)""")

sql("""SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + 
1)""").show()
{code}

{code}
: org.apache.spark.sql.AnalysisException: expression 'testdata.`a`' is neither 
present in the group by, nor is it an aggregate function. Add to group by or 
wrap in first() (or first_value) if you don't care which value you get.;;
Aggregate [udf((a#0 + 1))], [udf((a#0 + 1)) AS udf((a + 1))#10, udf(count(b#1)) 
AS udf(count(b))#12]
+- SubqueryAlias `testdata`
 +- Project [a#0, b#1]
 +- SubqueryAlias `testData`
 +- LocalRelation [a#0, b#1]
{code}


Scala:

{code}
spark.udf.register("udf", (input: Int) => input)

sql("""
 CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
 (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, 
null)
 AS testData(a, b)""")

sql("""SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + 
1)""").show()
{code}

{code}
+------------+-------------+
|udf((a + 1))|udf(count(b))|
+------------+-------------+
|        null|            1|
|           3|            2|
|           4|            2|
|           2|            2|
+------------+-------------+
{code}


  was:
Python:

from pyspark.sql.functions import pandas_udf, PandasUDFType

@pandas_udf("int", PandasUDFType.SCALAR)
def noop(x):
 return x

spark.udf.register("udf", noop)

sql("""
 CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
 (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, 
null)
 AS testData(a, b)""")

sql("""SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + 
1)""").show()
: org.apache.spark.sql.AnalysisException: expression 'testdata.`a`' is neither 
present in the group by, nor is it an aggregate function. Add to group by or 
wrap in first() (or first_value) if you don't care which value you get.;;
Aggregate [udf((a#0 + 1))], [udf((a#0 + 1)) AS udf((a + 1))#10, udf(count(b#1)) 
AS udf(count(b))#12]
+- SubqueryAlias `testdata`
 +- Project [a#0, b#1]
 +- SubqueryAlias `testData`
 +- LocalRelation [a#0, b#1]
Scala:

spark.udf.register("udf", (input: Int) => input)

sql("""
 CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
 (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, 
null)
 AS testData(a, b)""")

sql("""SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + 
1)""").show()
+------------+-------------+
|udf((a + 1))|udf(count(b))|
+------------+-------------+
|        null|            1|
|           3|            2|
|           4|            2|
|           2|            2|
+------------+-------------+


> Inconsistency between Scala and Python/Panda udfs when groupby with udf() is 
> used
> -
>
> Key: SPARK-28445
> URL: https://issues.apache.org/jira/browse/SPARK-28445
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Stavros Kontopoulos
>Priority: Major
>
> Python:
> {code}
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> @pandas_udf("int", PandasUDFType.SCALAR)
> def noop(x):
>  return x
> spark.udf.register("udf", noop)
> sql("""
>  CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
>  (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, 
> null)
>  AS testData(a, b)""")
> sql("""SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + 
> 1)""").show()
> {code}
> {code}
> : org.apache.spark.sql.AnalysisException: expression 'testdata.`a`' is 
> neither present in the group by, nor is it an aggregate function. Add to 
> group by or wrap in first() (or first_value) if you don't care which value 
> you get.;;
> Aggregate [udf((a#0 + 1))], [udf((a#0 + 1)) AS udf((a + 1))#10, 
> udf(count(b#1)) AS udf(count(b))#12]
> +- SubqueryAlias `testdata`
>  +- Project [a#0, b#1]
>  +- SubqueryAlias `testData`
>  +- LocalRelation [a#0, b#1]
> {code}
> Scala:
> {code}
> spark.udf.register("udf", (input: Int) => input)
> sql("""
>  CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
>  (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, 
> null)
>  AS testData(a, b)""")
> sql("""SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + 
> 1)""").show()
> {code}
> {code}
> +------------+-------------+
> |udf((a + 1))|udf(count(b))|
> +------------+-------------+
> |        null|            1|
> |           3|            2|
> |           4|            2|
> |           2|            2|
> +------------+-------------+
> {code}



--
This 

[jira] [Updated] (SPARK-28445) Inconsistency between Scala and Python/Panda udfs when groupby with udf() is used

2019-07-19 Thread Stavros Kontopoulos (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stavros Kontopoulos updated SPARK-28445:

Summary: Inconsistency between Scala and Python/Panda udfs when groupby 
with udf() is used  (was: Inconsistency between Scala and Python/Panda udfs 
when groupby udef() is used)

> Inconsistency between Scala and Python/Panda udfs when groupby with udf() is 
> used
> -
>
> Key: SPARK-28445
> URL: https://issues.apache.org/jira/browse/SPARK-28445
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Stavros Kontopoulos
>Priority: Major
>
> Python:
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> @pandas_udf("int", PandasUDFType.SCALAR)
> def noop(x):
>  return x
> spark.udf.register("udf", noop)
> sql("""
>  CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
>  (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, 
> null)
>  AS testData(a, b)""")
> sql("""SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + 
> 1)""").show()
> : org.apache.spark.sql.AnalysisException: expression 'testdata.`a`' is 
> neither present in the group by, nor is it an aggregate function. Add to 
> group by or wrap in first() (or first_value) if you don't care which value 
> you get.;;
> Aggregate [udf((a#0 + 1))], [udf((a#0 + 1)) AS udf((a + 1))#10, 
> udf(count(b#1)) AS udf(count(b))#12]
> +- SubqueryAlias `testdata`
>  +- Project [a#0, b#1]
>  +- SubqueryAlias `testData`
>  +- LocalRelation [a#0, b#1]
> Scala:
> spark.udf.register("udf", (input: Int) => input)
> sql("""
>  CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
>  (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, 
> null)
>  AS testData(a, b)""")
> sql("""SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + 
> 1)""").show()
> +------------+-------------+
> |udf((a + 1))|udf(count(b))|
> +------------+-------------+
> |        null|            1|
> |           3|            2|
> |           4|            2|
> |           2|            2|
> +------------+-------------+



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28447) ANSI SQL: Unicode escapes in literals

2019-07-19 Thread Yuming Wang (JIRA)
Yuming Wang created SPARK-28447:
---

 Summary: ANSI SQL: Unicode escapes in literals
 Key: SPARK-28447
 URL: https://issues.apache.org/jira/browse/SPARK-28447
 Project: Spark
  Issue Type: Sub-task
  Components: SQL
Affects Versions: 3.0.0
Reporter: Yuming Wang


[https://github.com/postgres/postgres/blob/REL_12_BETA2/src/test/regress/sql/strings.sql#L19-L44]

 
*Feature ID*: F393
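
For reference, the kind of literal this feature covers (PostgreSQL session; 
output shape approximate). {{\0061}} and {{\+000061}} are both Unicode escapes 
for the letter "a":

{noformat}
postgres=# SELECT U&'d\0061t\+000061';
 ?column?
----------
 data
(1 row)
{noformat}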



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28446) Document Kafka Headers support

2019-07-19 Thread Lee Dongjin (JIRA)
Lee Dongjin created SPARK-28446:
---

 Summary: Document Kafka Headers support
 Key: SPARK-28446
 URL: https://issues.apache.org/jira/browse/SPARK-28446
 Project: Spark
  Issue Type: Documentation
  Components: Documentation, Structured Streaming
Affects Versions: 3.0.0
Reporter: Lee Dongjin


This issue is a follow-up to SPARK-23539.

After completing SPARK-23539, the following information about the headers 
functionality should be noted in the Structured Streaming + Kafka Integration 
Guide (a sketch follows this list):
 * The requirements for using the headers functionality (i.e., the Kafka version).
 * How to turn on the headers functionality.
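
A sketch of what the guide could document, assuming the option name introduced 
by SPARK-23539 ({{includeHeaders}}); the servers and topic are illustrative:

{code}
val df = spark.readStream
  .format("kafka")
  .option("kafka.bootstrap.servers", "host1:9092")
  .option("subscribe", "topic1")
  .option("includeHeaders", "true")  // exposes a `headers` column
  .load()

df.selectExpr("CAST(key AS STRING)", "CAST(value AS STRING)", "headers")
{code}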



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-28445) Inconsistency between Scala and Python/Panda udfs when groupby udef() is used

2019-07-19 Thread Stavros Kontopoulos (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28445?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Stavros Kontopoulos updated SPARK-28445:

Component/s: PySpark

> Inconsistency between Scala and Python/Panda udfs when groupby udef() is used
> -
>
> Key: SPARK-28445
> URL: https://issues.apache.org/jira/browse/SPARK-28445
> Project: Spark
>  Issue Type: Bug
>  Components: PySpark, SQL
>Affects Versions: 3.0.0
>Reporter: Stavros Kontopoulos
>Priority: Major
>
> Python:
> from pyspark.sql.functions import pandas_udf, PandasUDFType
> @pandas_udf("int", PandasUDFType.SCALAR)
> def noop(x):
>  return x
> spark.udf.register("udf", noop)
> sql("""
>  CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
>  (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, 
> null)
>  AS testData(a, b)""")
> sql("""SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + 
> 1)""").show()
> : org.apache.spark.sql.AnalysisException: expression 'testdata.`a`' is 
> neither present in the group by, nor is it an aggregate function. Add to 
> group by or wrap in first() (or first_value) if you don't care which value 
> you get.;;
> Aggregate [udf((a#0 + 1))], [udf((a#0 + 1)) AS udf((a + 1))#10, 
> udf(count(b#1)) AS udf(count(b))#12]
> +- SubqueryAlias `testdata`
>  +- Project [a#0, b#1]
>  +- SubqueryAlias `testData`
>  +- LocalRelation [a#0, b#1]
> Scala:
> spark.udf.register("udf", (input: Int) => input)
> sql("""
>  CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
>  (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, 
> null)
>  AS testData(a, b)""")
> sql("""SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + 
> 1)""").show()
> +------------+-------------+
> |udf((a + 1))|udf(count(b))|
> +------------+-------------+
> |        null|            1|
> |           3|            2|
> |           4|            2|
> |           2|            2|
> +------------+-------------+



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28445) Inconsistency between Scala and Python/Panda udfs when groupby udef() is used

2019-07-19 Thread Stavros Kontopoulos (JIRA)
Stavros Kontopoulos created SPARK-28445:
---

 Summary: Inconsistency between Scala and Python/Panda udfs when 
groupby udef() is used
 Key: SPARK-28445
 URL: https://issues.apache.org/jira/browse/SPARK-28445
 Project: Spark
  Issue Type: Bug
  Components: SQL
Affects Versions: 3.0.0
Reporter: Stavros Kontopoulos


Python:

from pyspark.sql.functions import pandas_udf, PandasUDFType

@pandas_udf("int", PandasUDFType.SCALAR)
def noop(x):
 return x

spark.udf.register("udf", noop)

sql("""
 CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
 (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, 
null)
 AS testData(a, b)""")

sql("""SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + 
1)""").show()
: org.apache.spark.sql.AnalysisException: expression 'testdata.`a`' is neither 
present in the group by, nor is it an aggregate function. Add to group by or 
wrap in first() (or first_value) if you don't care which value you get.;;
Aggregate [udf((a#0 + 1))], [udf((a#0 + 1)) AS udf((a + 1))#10, udf(count(b#1)) 
AS udf(count(b))#12]
+- SubqueryAlias `testdata`
 +- Project [a#0, b#1]
 +- SubqueryAlias `testData`
 +- LocalRelation [a#0, b#1]
Scala:

spark.udf.register("udf", (input: Int) => input)

sql("""
 CREATE OR REPLACE TEMPORARY VIEW testData AS SELECT * FROM VALUES
 (1, 1), (1, 2), (2, 1), (2, 2), (3, 1), (3, 2), (null, 1), (3, null), (null, 
null)
 AS testData(a, b)""")

sql("""SELECT udf(a + 1), udf(COUNT(b)) FROM testData GROUP BY udf(a + 
1)""").show()
+------------+-------------+
|udf((a + 1))|udf(count(b))|
+------------+-------------+
|        null|            1|
|           3|            2|
|           4|            2|
|           2|            2|
+------------+-------------+



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0

2019-07-19 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1608#comment-1608
 ] 

Stavros Kontopoulos commented on SPARK-28444:
-

Hi [~patrick-winter-swisscard]. On our CI we are using v1.15 and the tests pass; 
could you add some log output showing why the pods are not created?

We need to be compliant with the compatibility matrix, but we still don't have a 
good answer to the problem of catching up with k8s; it moves fast.

> Bump Kubernetes Client Version to 4.3.0
> ---
>
> Key: SPARK-28444
> URL: https://issues.apache.org/jira/browse/SPARK-28444
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 3.0.0, 2.4.3
>Reporter: Patrick Winter
>Priority: Major
>
> Spark is currently using the Kubernetes client version 4.1.2. This client 
> does not support the current Kubernetes version 1.14, as can be seen on the 
> [compatibility 
> matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix].
>  Therefore the Kubernetes client should be bumped up to version 4.3.0.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0

2019-07-19 Thread Stavros Kontopoulos (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=1608#comment-1608
 ] 

Stavros Kontopoulos edited comment on SPARK-28444 at 7/19/19 11:44 AM:
---

Hi [~patrick-winter-swisscard]. On our CI we are using v1.15 and the tests pass; 
could you add some log output showing why the pods are not created?

We need to be compliant with the compatibility matrix, but we still don't have a 
good answer to the problem of catching up with k8s; it moves fast.


was (Author: skonto):
Hi [~patrick-winter-swisscard]. On our CI we are using v1.15 and the tests pass; 
could you add some log output showing why the pods are not created?

We need to be compliant with the compatibility matrix, but we still don't have a 
good answer to the problem of catching up with k8s; it moves fast.

> Bump Kubernetes Client Version to 4.3.0
> ---
>
> Key: SPARK-28444
> URL: https://issues.apache.org/jira/browse/SPARK-28444
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 3.0.0, 2.4.3
>Reporter: Patrick Winter
>Priority: Major
>
> Spark is currently using the Kubernetes client version 4.1.2. This client 
> does not support the current Kubernetes version 1.14, as can be seen on the 
> [compatibility 
> matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix].
>  Therefore the Kubernetes client should be bumped up to version 4.3.0.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23443) Spark with Glue as external catalog

2019-07-19 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888681#comment-16888681
 ] 

Dongjoon Hyun commented on SPARK-23443:
---

Does Glue support Spark 2.3+? As of now, the AWS Glue Console shows only Spark 2.2 
(Scala/Python 2).

BTW,
- Spark 2.2 reached EOL in January 2019
- Spark 2.3 will reach EOL in August 2019 (next month)
- Python 2.x will reach EOL in January 2020 (PySpark will deprecate Python 2 this 
year).

> Spark with Glue as external catalog
> ---
>
> Key: SPARK-23443
> URL: https://issues.apache.org/jira/browse/SPARK-23443
> Project: Spark
>  Issue Type: New Feature
>  Components: Spark Core
>Affects Versions: 3.0.0
>Reporter: Ameen Tayyebi
>Priority: Major
>
> AWS Glue Catalog is an external Hive metastore backed by a web service. It 
> allows permanent storage of catalog data for BigData use cases.
> To find out more information about AWS Glue, please consult:
>  * AWS Glue - [https://aws.amazon.com/glue/]
>  * Using Glue as a Metastore catalog for Spark - 
> [https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-glue.html]
> Today, the integration of Glue and Spark is through the Hive layer. Glue 
> implements the IMetaStore interface of Hive and for installations of Spark 
> that contain Hive, Glue can be used as the metastore.
> The feature set that Glue supports does not align 1-1 with the set of 
> features that the latest version of Spark supports. For example, the Glue 
> interface supports more advanced partition pruning than the latest version of 
> Hive embedded in Spark.
> To enable a more natural integration with Spark and to allow leveraging the 
> latest features of Glue, without being coupled to Hive, a direct integration 
> through Spark's own Catalog API is proposed. This Jira tracks this work.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-23758) MLlib 2.4 Roadmap

2019-07-19 Thread Dongjoon Hyun (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-23758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888669#comment-16888669
 ] 

Dongjoon Hyun commented on SPARK-23758:
---

I moved this to 3.0.0 because an open `New Feature` JIRA should target `3.0.0`. I 
agree that this looks weird, but I'm not sure I can close this. The roadmap is 
usually managed by the PMC.

Hi, [~josephkb], we already have 2.4.3; can we close this issue?

> MLlib 2.4 Roadmap
> -
>
> Key: SPARK-23758
> URL: https://issues.apache.org/jira/browse/SPARK-23758
> Project: Spark
>  Issue Type: New Feature
>  Components: ML
>Affects Versions: 3.0.0
>Reporter: Joseph K. Bradley
>Priority: Major
>
> h1. Roadmap process
> This roadmap is a master list for MLlib improvements we are working on during 
> this release.  This includes ML-related changes in PySpark and SparkR.
> *What is planned for the next release?*
> * This roadmap lists issues which at least one Committer has prioritized.  
> See details below in "Instructions for committers."
> * This roadmap only lists larger or more critical issues.
> *How can contributors influence this roadmap?*
> * If you believe an issue should be in this roadmap, please discuss the issue 
> on JIRA and/or the dev mailing list.  Make sure to ping Committers since at 
> least one must agree to shepherd the issue.
> * For general discussions, use this JIRA or the dev mailing list.  For 
> specific issues, please comment on those issues or the mailing list.
> * Vote for & watch issues which are important to you.
> ** MLlib, sorted by: [Votes | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20ORDER%20BY%20votes%20DESC]
>  or [Watchers | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(ML%2C%20MLlib)%20ORDER%20BY%20Watchers%20DESC]
> ** SparkR, sorted by: [Votes | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(SparkR)%20ORDER%20BY%20votes%20DESC]
>  or [Watchers | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20component%20in%20(SparkR)%20ORDER%20BY%20Watchers%20DESC]
> h2. Target Version and Priority
> This section describes the meaning of Target Version and Priority.
> || Category | Target Version | Priority | Shepherd | Put on roadmap? | In 
> next release? ||
> | [1 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Blocker%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Blocker | *must* | *must* | *must* |
> | [2 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Critical%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Critical | *must* | yes, unless small | *best effort* |
> | [3 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Major%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Major | *must* | optional | *best effort* |
> | [4 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Minor%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Minor | optional | no | maybe |
> | [5 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)%20AND%20priority%20%3D%20Trivial%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20%22Target%20Version%2Fs%22%20in%20(2.4.0%2C%203.0.0)]
>  | next release | Trivial | optional | no | maybe |
> | [6 | 
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20status%20in%20(Open%2C%20"In%20Progress"%2C%20Reopened)%20AND%20component%20in%20(GraphX%2C%20ML%2C%20MLlib%2C%20SparkR)%20AND%20"Target%20Version%2Fs"%20in%20(EMPTY)%20AND%20Shepherd%20not%20in%20(EMPTY)%20ORDER%20BY%20priority%20DESC]
>  | (empty) | (any) | yes | no | 

[jira] [Commented] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0

2019-07-19 Thread Patrick Winter (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-28444?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16888616#comment-16888616
 ] 

Patrick Winter commented on SPARK-28444:


Our company recently upgraded Kubernetes to 1.14, and since then Spark can't 
create pods anymore, throwing a KubernetesClientException instead.

> Bump Kubernetes Client Version to 4.3.0
> ---
>
> Key: SPARK-28444
> URL: https://issues.apache.org/jira/browse/SPARK-28444
> Project: Spark
>  Issue Type: Dependency upgrade
>  Components: Kubernetes
>Affects Versions: 3.0.0, 2.4.3
>Reporter: Patrick Winter
>Priority: Major
>
> Spark is currently using the Kubernetes client version 4.1.2. This client 
> does not support the current Kubernetes version 1.14, as can be seen on the 
> [compatibility 
> matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix].
>  Therefore the Kubernetes client should be bumped up to version 4.3.0.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-28444) Bump Kubernetes Client Version to 4.3.0

2019-07-19 Thread Patrick Winter (JIRA)
Patrick Winter created SPARK-28444:
--

 Summary: Bump Kubernetes Client Version to 4.3.0
 Key: SPARK-28444
 URL: https://issues.apache.org/jira/browse/SPARK-28444
 Project: Spark
  Issue Type: Dependency upgrade
  Components: Kubernetes
Affects Versions: 2.4.3, 3.0.0
Reporter: Patrick Winter


Spark is currently using the Kubernetes client version 4.1.2. This client does 
not support the current Kubernetes version 1.14, as can be seen on the 
[compatibility 
matrix|https://github.com/fabric8io/kubernetes-client#compatibility-matrix]. 
Therefore the Kubernetes client should be bumped up to version 4.3.0.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28284) Convert and port 'join-empty-relation.sql' into UDF test base

2019-07-19 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon resolved SPARK-28284.
--
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 25127
[https://github.com/apache/spark/pull/25127]

> Convert and port 'join-empty-relation.sql' into UDF test base
> -
>
> Key: SPARK-28284
> URL: https://issues.apache.org/jira/browse/SPARK-28284
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Terry Kim
>Priority: Major
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-28284) Convert and port 'join-empty-relation.sql' into UDF test base

2019-07-19 Thread Hyukjin Kwon (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28284?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Hyukjin Kwon reassigned SPARK-28284:


Assignee: Terry Kim

> Convert and port 'join-empty-relation.sql' into UDF test base
> -
>
> Key: SPARK-28284
> URL: https://issues.apache.org/jira/browse/SPARK-28284
> Project: Spark
>  Issue Type: Sub-task
>  Components: PySpark, SQL, Tests
>Affects Versions: 3.0.0
>Reporter: Hyukjin Kwon
>Assignee: Terry Kim
>Priority: Major
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-28440) Use TestingUtils to compare floating point values

2019-07-19 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28440?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-28440.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

This is resolved via https://github.com/apache/spark/pull/25191

> Use TestingUtils to compare floating point values
> -
>
> Key: SPARK-28440
> URL: https://issues.apache.org/jira/browse/SPARK-28440
> Project: Spark
>  Issue Type: Improvement
>  Components: MLlib, Tests
>Affects Versions: 3.0.0
>Reporter: Dongjoon Hyun
>Priority: Minor
> Fix For: 3.0.0
>
>




--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Assigned] (SPARK-27707) Performance issue using explode

2019-07-19 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun reassigned SPARK-27707:
-

Assignee: Liang-Chi Hsieh

> Performance issue using explode
> ---
>
> Key: SPARK-27707
> URL: https://issues.apache.org/jira/browse/SPARK-27707
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ohad Raviv
>Assignee: Liang-Chi Hsieh
>Priority: Major
>
> This is a corner case of SPARK-21657.
> We have a case where we want to explode an array inside a struct while also 
> keeping some other columns of the struct, and we again hit a huge 
> performance issue.
> Reproduction code (M is the array length, defined elsewhere):
> {code}
> import spark.implicits._  // for .toDF and the $-column syntax
> val df = spark.sparkContext.parallelize(Seq(("1",
>   Array.fill(M)({
>     val i = math.random
>     (i.toString, (i + 1).toString, (i + 2).toString, (i + 3).toString)
>   })))).toDF("col", "arr")
>   .selectExpr("col", "struct(col, arr) as st")
>   .selectExpr("col", "st.col as col1", "explode(st.arr) as arr_col")
> df.write.mode("overwrite").save("/tmp/blah")
> {code}
> A workaround is to project before the explode:
> {code}
> import spark.implicits._
> val df = spark.sparkContext.parallelize(Seq(("1",
>   Array.fill(M)({
>     val i = math.random
>     (i.toString, (i + 1).toString, (i + 2).toString, (i + 3).toString)
>   })))).toDF("col", "arr")
>   .selectExpr("col", "struct(col, arr) as st")
>   .withColumn("col1", $"st.col")
>   .selectExpr("col", "col1", "explode(st.arr) as arr_col")
> df.write.mode("overwrite").save("/tmp/blah")
> {code}
> In this case the pruning optimization done in SPARK-21657:
> {code}
> // prune unrequired references
> case p @ Project(_, g: Generate) if p.references != g.outputSet =>
>   val requiredAttrs = p.references -- g.producedAttributes ++ g.generator.references
>   val newChild = prunedChild(g.child, requiredAttrs)
>   val unrequired = g.generator.references -- p.references
>   val unrequiredIndices = newChild.output.zipWithIndex.filter(t => unrequired.contains(t._1))
>     .map(_._2)
>   p.copy(child = g.copy(child = newChild, unrequiredChildIndex = unrequiredIndices))
> {code}
> doesn't kick in, because `p.references` contains the whole `st` struct as a 
> reference and not just the projected field. As a result the entire struct, 
> including the huge array field, gets duplicated once per array element.
> I know this is kind of a corner case, but it was really non-trivial to 
> understand.
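For anyone reproducing this, a self-contained sketch that makes the plan 
difference visible via explain; it assumes local mode, and the names base, 
slow, fast and M = 10 are illustrative, not from the report:

{code}
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[*]").getOrCreate()
import spark.implicits._

val M = 10  // small here; the report uses a huge array to expose the cost
val base = spark.sparkContext.parallelize(Seq(("1",
  Array.fill(M)({
    val i = math.random
    (i.toString, (i + 1).toString, (i + 2).toString, (i + 3).toString)
  })))).toDF("col", "arr")
  .selectExpr("col", "struct(col, arr) as st")

// Slow: the whole `st` struct (including the array) survives below Generate.
val slow = base.selectExpr("col", "st.col as col1", "explode(st.arr) as arr_col")
// Workaround: project the nested field first, so only `st.arr` is needed.
val fast = base.withColumn("col1", $"st.col")
  .selectExpr("col", "col1", "explode(st.arr) as arr_col")

slow.explain(true)
fast.explain(true)
{code}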



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-27707) Prune unnecessary nested fields from Generate

2019-07-19 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun updated SPARK-27707:
--
Summary: Prune unnecessary nested fields from Generate  (was: Performance 
issue using explode)

> Prune unnecessary nested fields from Generate
> -
>
> Key: SPARK-27707
> URL: https://issues.apache.org/jira/browse/SPARK-27707
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ohad Raviv
>Assignee: Liang-Chi Hsieh
>Priority: Major
> Fix For: 3.0.0
>
>
> This is a corner case of SPARK-21657.
> We have a case where we want to explode an array inside a struct while also 
> keeping some other columns of the struct, and we again hit a huge 
> performance issue.
> Reproduction code (M is the array length, defined elsewhere):
> {code}
> import spark.implicits._  // for .toDF and the $-column syntax
> val df = spark.sparkContext.parallelize(Seq(("1",
>   Array.fill(M)({
>     val i = math.random
>     (i.toString, (i + 1).toString, (i + 2).toString, (i + 3).toString)
>   })))).toDF("col", "arr")
>   .selectExpr("col", "struct(col, arr) as st")
>   .selectExpr("col", "st.col as col1", "explode(st.arr) as arr_col")
> df.write.mode("overwrite").save("/tmp/blah")
> {code}
> A workaround is to project before the explode:
> {code}
> import spark.implicits._
> val df = spark.sparkContext.parallelize(Seq(("1",
>   Array.fill(M)({
>     val i = math.random
>     (i.toString, (i + 1).toString, (i + 2).toString, (i + 3).toString)
>   })))).toDF("col", "arr")
>   .selectExpr("col", "struct(col, arr) as st")
>   .withColumn("col1", $"st.col")
>   .selectExpr("col", "col1", "explode(st.arr) as arr_col")
> df.write.mode("overwrite").save("/tmp/blah")
> {code}
> In this case the pruning optimization done in SPARK-21657:
> {code}
> // prune unrequired references
> case p @ Project(_, g: Generate) if p.references != g.outputSet =>
>   val requiredAttrs = p.references -- g.producedAttributes ++ g.generator.references
>   val newChild = prunedChild(g.child, requiredAttrs)
>   val unrequired = g.generator.references -- p.references
>   val unrequiredIndices = newChild.output.zipWithIndex.filter(t => unrequired.contains(t._1))
>     .map(_._2)
>   p.copy(child = g.copy(child = newChild, unrequiredChildIndex = unrequiredIndices))
> {code}
> doesn't kick in, because `p.references` contains the whole `st` struct as a 
> reference and not just the projected field. As a result the entire struct, 
> including the huge array field, gets duplicated once per array element.
> I know this is kind of a corner case, but it was really non-trivial to 
> understand.
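For intuition about the new summary: the fix effectively lets the optimizer 
push a projection of just the needed nested fields below the Generate. 
Hand-written, illustrative plan shapes only (not actual optimizer output; 
the _extracted_arr name is hypothetical):

{code}
// Before: the whole st struct, including the large array, feeds Generate
Project [col, col1, arr_col]
+- Generate explode(st.arr) AS arr_col
   +- Project [col, st.col AS col1, st]

// After: only the nested fields actually used survive below Generate
Project [col, col1, arr_col]
+- Generate explode(_extracted_arr) AS arr_col
   +- Project [col, st.col AS col1, st.arr AS _extracted_arr]
{code}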



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-27707) Performance issue using explode

2019-07-19 Thread Dongjoon Hyun (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-27707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Dongjoon Hyun resolved SPARK-27707.
---
   Resolution: Fixed
Fix Version/s: 3.0.0

Issue resolved by pull request 24637
[https://github.com/apache/spark/pull/24637]

> Performance issue using explode
> ---
>
> Key: SPARK-27707
> URL: https://issues.apache.org/jira/browse/SPARK-27707
> Project: Spark
>  Issue Type: Improvement
>  Components: SQL
>Affects Versions: 3.0.0
>Reporter: Ohad Raviv
>Assignee: Liang-Chi Hsieh
>Priority: Major
> Fix For: 3.0.0
>
>
> This is a corner case of SPARK-21657.
> We have a case where we want to explode an array inside a struct while also 
> keeping some other columns of the struct, and we again hit a huge 
> performance issue.
> Reproduction code (M is the array length, defined elsewhere):
> {code}
> import spark.implicits._  // for .toDF and the $-column syntax
> val df = spark.sparkContext.parallelize(Seq(("1",
>   Array.fill(M)({
>     val i = math.random
>     (i.toString, (i + 1).toString, (i + 2).toString, (i + 3).toString)
>   })))).toDF("col", "arr")
>   .selectExpr("col", "struct(col, arr) as st")
>   .selectExpr("col", "st.col as col1", "explode(st.arr) as arr_col")
> df.write.mode("overwrite").save("/tmp/blah")
> {code}
> A workaround is to project before the explode:
> {code}
> import spark.implicits._
> val df = spark.sparkContext.parallelize(Seq(("1",
>   Array.fill(M)({
>     val i = math.random
>     (i.toString, (i + 1).toString, (i + 2).toString, (i + 3).toString)
>   })))).toDF("col", "arr")
>   .selectExpr("col", "struct(col, arr) as st")
>   .withColumn("col1", $"st.col")
>   .selectExpr("col", "col1", "explode(st.arr) as arr_col")
> df.write.mode("overwrite").save("/tmp/blah")
> {code}
> In this case the pruning optimization done in SPARK-21657:
> {code}
> // prune unrequired references
> case p @ Project(_, g: Generate) if p.references != g.outputSet =>
>   val requiredAttrs = p.references -- g.producedAttributes ++ g.generator.references
>   val newChild = prunedChild(g.child, requiredAttrs)
>   val unrequired = g.generator.references -- p.references
>   val unrequiredIndices = newChild.output.zipWithIndex.filter(t => unrequired.contains(t._1))
>     .map(_._2)
>   p.copy(child = g.copy(child = newChild, unrequiredChildIndex = unrequiredIndices))
> {code}
> doesn't kick in, because `p.references` contains the whole `st` struct as a 
> reference and not just the projected field. As a result the entire struct, 
> including the huge array field, gets duplicated once per array element.
> I know this is kind of a corner case, but it was really non-trivial to 
> understand.



--
This message was sent by Atlassian JIRA
(v7.6.14#76016)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org