[jira] [Created] (SPARK-32504) Shuffle Storage API: Dynamic updates of shuffle metadata

2020-07-30 Thread Matt Cheah (Jira)
Matt Cheah created SPARK-32504:
--

 Summary: Shuffle Storage API: Dynamic updates of shuffle metadata
 Key: SPARK-32504
 URL: https://issues.apache.org/jira/browse/SPARK-32504
 Project: Spark
  Issue Type: Sub-task
  Components: Shuffle
Affects Versions: 3.0.0
Reporter: Matt Cheah


When using external storage for shuffles as part of the shuffle storage API 
mechanism, it is often desirable to update the metadata associated with 
shuffles, which plugin systems are able to store via 
https://issues.apache.org/jira/browse/SPARK-31801. For example:
 # If data is stored in some replicated manner and the number of replicas is 
updated, then we want the metadata stored on the driver to reflect the new 
number of replicas and where they are located.
 # If data is stored on the mapper's local disk but is asynchronously backed 
up to some external storage medium, then we want to know when a backup becomes 
available externally.

To achieve this, we would need to pass a hook for updating the shuffle metadata 
to the shuffle executor components at the root of the plugin tree on the 
executor side. The executor would establish an RPC connection with the driver 
and send messages to update shuffle metadata accordingly.
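As a rough illustration of the shape such a hook could take (all names below are 
hypothetical, not a final API; the placeholder metadata type stands in for the 
plugin-defined blob proposed in SPARK-31798):

{code:java}
import java.io.Serializable;

// Placeholder for the opaque, plugin-defined metadata blob (SPARK-31798
// proposes a marker type along these lines).
interface MapOutputMetadata extends Serializable {}

// Hook handed to the shuffle executor components when the plugin tree is
// initialized. A storage plugin would call it when, for example, a new replica
// or an external backup of a map output becomes available; the implementation
// would forward the update to the driver over RPC.
interface ShuffleMetadataUpdater {
  void updateMapOutputMetadata(int shuffleId, long mapTaskId, MapOutputMetadata updated);
}
{code}

Whatever backs this interface on the executor side would own the RPC channel 
back to the driver.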






[jira] [Commented] (SPARK-28210) Shuffle Storage API: Reads

2020-07-30 Thread Matt Cheah (Jira)


[ 
https://issues.apache.org/jira/browse/SPARK-28210?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17168287#comment-17168287
 ] 

Matt Cheah commented on SPARK-28210:


[~devaraj] [~tianczha] Thanks for expressing interest in this! This work is 
blocked on the shuffle metadata APIs patch, which can be found here: 
[https://github.com/apache/spark/pull/28618]

I think that after merging the shuffle metadata API change, we can provide the 
appropriate reader APIs and then integrate the usage of shuffle metadata 
accordingly. I originally had a diff here: 
[https://github.com/mccheah/spark/pull/12], but it has fallen far out of sync 
with the patches proposed against upstream Spark.

The proposed reader API can be found in the shuffle API design document: 
[https://docs.google.com/document/d/1Aj6IyMsbS2sdIfHxLvIbHUNjHIWHTabfknIPoxOrTjk/edit#heading=h.cli6x4fsunz6]

But we really cannot make progress here until the shuffle metadata storage APIs 
and their integration are complete. Could you review the Apache patch listed 
above and give your +1 or feedback on how it can be improved, and we can go 
from there?

> Shuffle Storage API: Reads
> --
>
> Key: SPARK-28210
> URL: https://issues.apache.org/jira/browse/SPARK-28210
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle, Spark Core
>Affects Versions: 3.1.0
>Reporter: Matt Cheah
>Priority: Major
>
> As part of the effort to store shuffle data in arbitrary places, this issue 
> tracks implementing an API for reading the shuffle data stored by the write 
> API. Also ensure that the existing shuffle implementation is refactored to 
> use the API.






[jira] [Created] (SPARK-31801) Register shuffle map output metadata with a shuffle output tracker

2020-05-22 Thread Matt Cheah (Jira)
Matt Cheah created SPARK-31801:
--

 Summary: Register shuffle map output metadata with a shuffle 
output tracker
 Key: SPARK-31801
 URL: https://issues.apache.org/jira/browse/SPARK-31801
 Project: Spark
  Issue Type: Sub-task
  Components: Shuffle
Affects Versions: 3.1.0
Reporter: Matt Cheah


Part of the design as discussed in [this 
document|https://docs.google.com/document/d/1Aj6IyMsbS2sdIfHxLvIbHUNjHIWHTabfknIPoxOrTjk/edit#].

Establish a {{ShuffleOutputTracker}} API that resides on the driver, accept the 
map output metadata returned by the map output writers, and send that metadata 
to the output tracker module accordingly.

Requires https://issues.apache.org/jira/browse/SPARK-31798.
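A minimal sketch of what that driver-side hook could look like, assuming an 
opaque, plugin-defined metadata type along the lines of what SPARK-31798 
proposes (names here are illustrative only, not the final API):

{code:java}
import java.io.Serializable;

// Placeholder for the plugin-defined metadata returned by map output writers.
interface MapOutputMetadata extends Serializable {}

// Driver-side tracker hook: the driver hands the metadata it receives from map
// tasks to the plugin, and tells the plugin when a shuffle goes away.
interface ShuffleOutputTracker {
  void registerMapOutput(int shuffleId, int mapIndex, long mapTaskId, MapOutputMetadata metadata);

  void unregisterShuffle(int shuffleId);
}
{code}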






[jira] [Updated] (SPARK-31798) Return map output metadata from shuffle writers

2020-05-22 Thread Matt Cheah (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-31798:
---
Description: 
Part of the overall sub-push for shuffle metadata management as proposed in 
[this 
document|https://docs.google.com/document/d/1Aj6IyMsbS2sdIfHxLvIbHUNjHIWHTabfknIPoxOrTjk/edit].

The shuffle plugin system needs a means for storing custom metadata for shuffle 
readers to eventually use. This task tracks the API changes required for the 
map output writer to return some such metadata back to the driver.

  was:
Part of the overall sub-push for shuffle metadata management as proposed in 
[this 
document|[https://docs.google.com/document/d/1Aj6IyMsbS2sdIfHxLvIbHUNjHIWHTabfknIPoxOrTjk/edit]]

The shuffle plugin system needs a means for storing custom metadata for shuffle 
readers to eventually use. This task tracks the API changes required for the 
map output writer to return some such metadata back to the driver.


> Return map output metadata from shuffle writers
> ---
>
> Key: SPARK-31798
> URL: https://issues.apache.org/jira/browse/SPARK-31798
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.0.0
>Reporter: Matt Cheah
>Priority: Major
>
> Part of the overall sub-push for shuffle metadata management as proposed in 
> [this 
> document|https://docs.google.com/document/d/1Aj6IyMsbS2sdIfHxLvIbHUNjHIWHTabfknIPoxOrTjk/edit].
> The shuffle plugin system needs a means for storing custom metadata for 
> shuffle readers to eventually use. This task tracks the API changes required 
> for the map output writer to return some such metadata back to the driver.






[jira] [Updated] (SPARK-31798) Return map output metadata from shuffle writers

2020-05-22 Thread Matt Cheah (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-31798:
---
Description: 
Part of the overall sub-push for shuffle metadata management as proposed in 
[this 
document|[https://docs.google.com/document/d/1Aj6IyMsbS2sdIfHxLvIbHUNjHIWHTabfknIPoxOrTjk/edit]]

The shuffle plugin system needs a means for storing custom metadata for shuffle 
readers to eventually use. This task tracks the API changes required for the 
map output writer to return some such metadata back to the driver.

  was:
Part of the overall sub-push for shuffle metadata management as proposed in 
[this 
document|[https://docs.google.com/document/d/1Aj6IyMsbS2sdIfHxLvIbHUNjHIWHTabfknIPoxOrTjk/edit]].

The shuffle plugin system needs a means for storing custom metadata for shuffle 
readers to eventually use. This task tracks the API changes required for the 
map output writer to return some such metadata back to the driver.


> Return map output metadata from shuffle writers
> ---
>
> Key: SPARK-31798
> URL: https://issues.apache.org/jira/browse/SPARK-31798
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.0.0
>Reporter: Matt Cheah
>Priority: Major
>
> Part of the overall sub-push for shuffle metadata management as proposed in 
> [this 
> document|[https://docs.google.com/document/d/1Aj6IyMsbS2sdIfHxLvIbHUNjHIWHTabfknIPoxOrTjk/edit]]
> The shuffle plugin system needs a means for storing custom metadata for 
> shuffle readers to eventually use. This task tracks the API changes required 
> for the map output writer to return some such metadata back to the driver.






[jira] [Updated] (SPARK-31798) Return map output metadata from shuffle writers

2020-05-22 Thread Matt Cheah (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-31798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-31798:
---
Description: 
Part of the overall sub-push for shuffle metadata management as proposed in 
[this 
document|[https://docs.google.com/document/d/1Aj6IyMsbS2sdIfHxLvIbHUNjHIWHTabfknIPoxOrTjk/edit]].

The shuffle plugin system needs a means for storing custom metadata for shuffle 
readers to eventually use. This task tracks the API changes required for the 
map output writer to return some such metadata back to the driver.

  was:
Part of the overall sub-push for shuffle metadata management as proposed in[ 
this 
document|[https://docs.google.com/document/d/1Aj6IyMsbS2sdIfHxLvIbHUNjHIWHTabfknIPoxOrTjk/edit]].

The shuffle plugin system needs a means for storing custom metadata for shuffle 
readers to eventually use. This task tracks the API changes required for the 
map output writer to return some such metadata back to the driver.


> Return map output metadata from shuffle writers
> ---
>
> Key: SPARK-31798
> URL: https://issues.apache.org/jira/browse/SPARK-31798
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.0.0
>Reporter: Matt Cheah
>Priority: Major
>
> Part of the overall sub-push for shuffle metadata management as proposed in 
> [this 
> document|[https://docs.google.com/document/d/1Aj6IyMsbS2sdIfHxLvIbHUNjHIWHTabfknIPoxOrTjk/edit]].
> The shuffle plugin system needs a means for storing custom metadata for 
> shuffle readers to eventually use. This task tracks the API changes required 
> for the map output writer to return some such metadata back to the driver.






[jira] [Created] (SPARK-31798) Return map output metadata from shuffle writers

2020-05-22 Thread Matt Cheah (Jira)
Matt Cheah created SPARK-31798:
--

 Summary: Return map output metadata from shuffle writers
 Key: SPARK-31798
 URL: https://issues.apache.org/jira/browse/SPARK-31798
 Project: Spark
  Issue Type: Sub-task
  Components: Shuffle
Affects Versions: 3.0.0
Reporter: Matt Cheah


Part of the overall sub-push for shuffle metadata management as proposed in 
[this 
document|https://docs.google.com/document/d/1Aj6IyMsbS2sdIfHxLvIbHUNjHIWHTabfknIPoxOrTjk/edit].

The shuffle plugin system needs a means for storing custom metadata for shuffle 
readers to eventually use. This task tracks the API changes required for the 
map output writer to return some such metadata back to the driver.
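A minimal sketch of the direction, assuming a plugin-defined metadata type; the 
exact shape is what this API change needs to settle, so the names below are 
illustrative rather than the merged code:

{code:java}
import java.io.Serializable;
import java.util.Optional;

// Placeholder for the plugin-defined metadata blob.
interface MapOutputMetadata extends Serializable {}

// Sketch of what the commit result could carry: the per-partition lengths Spark
// already needs, plus an optional metadata payload to ship back to the driver.
final class MapOutputCommitResult {
  private final long[] partitionLengths;
  private final Optional<MapOutputMetadata> metadata;

  MapOutputCommitResult(long[] partitionLengths, Optional<MapOutputMetadata> metadata) {
    this.partitionLengths = partitionLengths;
    this.metadata = metadata;
  }

  long[] partitionLengths() { return partitionLengths; }

  Optional<MapOutputMetadata> metadata() { return metadata; }
}
{code}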






[jira] [Updated] (SPARK-29072) Properly track shuffle write time with refactor

2019-09-12 Thread Matt Cheah (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-29072:
---
Description: From SPARK-28209, SPARK-28570, and SPARK-28571, we used the 
new shuffle writer plugin API across all the shuffle writers. However, we 
accidentally lost time tracking metrics for shuffle writes in the process, 
particularly for UnsafeShuffleWriter when writing with streams (without 
transferTo), as well as the SortShuffleWriter.  (was: From SPARK-28209, 
SPARK-28570, and SPARK-28571, we used the new shuffle writer plugin API across 
all the shuffle writers. However, some mistakes were made in the process:
 * We lost shuffle write time metrics for the Unsafe and Sort shuffle writer 
implementations
 * There was a duplicate implementation of ShufflePartitionPairsWriter 
introduced

This issue tracks rectifying both situations; the second is related to the 
first.)

> Properly track shuffle write time with refactor
> ---
>
> Key: SPARK-29072
> URL: https://issues.apache.org/jira/browse/SPARK-29072
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.0.0
>Reporter: Matt Cheah
>Priority: Major
>
> From SPARK-28209, SPARK-28570, and SPARK-28571, we used the new shuffle 
> writer plugin API across all the shuffle writers. However, we accidentally 
> lost time tracking metrics for shuffle writes in the process, particularly 
> for UnsafeShuffleWriter when writing with streams (without transferTo), as 
> well as the SortShuffleWriter.
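One illustrative way to recover the metric is to wrap the stream handed out by 
the shuffle writer plugin and charge elapsed write time back to the task's 
shuffle write-time metric. The sketch below is an assumption about the 
approach, not the exact code in the fix; Spark has a similar wrapper for its 
disk writers, but the names here are made up:

{code:java}
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Sketch: time spent inside write() is accumulated so it can be added back to
// the shuffle write-time metric when the writer closes.
final class WriteTimeTrackingOutputStream extends FilterOutputStream {
  private long writeTimeNanos = 0L;

  WriteTimeTrackingOutputStream(OutputStream out) {
    super(out);
  }

  @Override
  public void write(int b) throws IOException {
    long start = System.nanoTime();
    out.write(b);
    writeTimeNanos += System.nanoTime() - start;
  }

  @Override
  public void write(byte[] b, int off, int len) throws IOException {
    long start = System.nanoTime();
    out.write(b, off, len);
    writeTimeNanos += System.nanoTime() - start;
  }

  /** Nanoseconds to add to the task's shuffle write-time metric. */
  long writeTimeNanos() { return writeTimeNanos; }
}
{code}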






[jira] [Updated] (SPARK-29072) Properly track shuffle write time with refactor

2019-09-12 Thread Matt Cheah (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-29072:
---
Description: 
From SPARK-28209, SPARK-28570, and SPARK-28571, we used the new shuffle writer 
plugin API across all the shuffle writers. However, some mistakes were made in 
the process:
 * We lost shuffle write time metrics for the Unsafe and Sort shuffle writer 
implementations
 * There was a duplicate implementation of ShufflePartitionPairsWriter 
introduced

This issue tracks rectifying both situations; the second is related to the 
first.

  was:
From SPARK-28209, SPARK-28570, and SPARK-28571, we used the new shuffle writer 
plugin API across all the shuffle writers. However, some mistakes were made in 
the process:
 * We lost shuffle write time metrics for the Unsafe and Sort shuffle writer 
implementations
 * There was a duplicate implementation of ShufflePartitionPairsWriter 
introduced


> Properly track shuffle write time with refactor
> ---
>
> Key: SPARK-29072
> URL: https://issues.apache.org/jira/browse/SPARK-29072
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.0.0
>Reporter: Matt Cheah
>Priority: Major
>
> From SPARK-28209, SPARK-28570, and SPARK-28571, we used the new shuffle 
> writer plugin API across all the shuffle writers. However, some mistakes were 
> made in the process:
>  * We lost shuffle write time metrics for the Unsafe and Sort shuffle writer 
> implementations
>  * There was a duplicate implementation of ShufflePartitionPairsWriter 
> introduced
> This issue tracks rectifying both situations; the second is related to the 
> first.






[jira] [Updated] (SPARK-29072) Properly track shuffle write time with refactor

2019-09-12 Thread Matt Cheah (Jira)


 [ 
https://issues.apache.org/jira/browse/SPARK-29072?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-29072:
---
Description: 
From SPARK-28209, SPARK-28570, and SPARK-28571, we used the new shuffle writer 
plugin API across all the shuffle writers. However, some mistakes were made in 
the process:
 * We lost shuffle write time metrics for the Unsafe and Sort shuffle writer 
implementations
 * There was a duplicate implementation of ShufflePartitionPairsWriter 
introduced

> Properly track shuffle write time with refactor
> ---
>
> Key: SPARK-29072
> URL: https://issues.apache.org/jira/browse/SPARK-29072
> Project: Spark
>  Issue Type: Bug
>  Components: Shuffle
>Affects Versions: 3.0.0
>Reporter: Matt Cheah
>Priority: Major
>
> From SPARK-28209, SPARK-28570, and SPARK-28571, we used the new shuffle 
> writer plugin API across all the shuffle writers. However, some mistakes were 
> made in the process:
>  * We lost shuffle write time metrics for the Unsafe and Sort shuffle writer 
> implementations
>  * There was a duplicate implementation of ShufflePartitionPairsWriter 
> introduced






[jira] [Created] (SPARK-29072) Properly track shuffle write time with refactor

2019-09-12 Thread Matt Cheah (Jira)
Matt Cheah created SPARK-29072:
--

 Summary: Properly track shuffle write time with refactor
 Key: SPARK-29072
 URL: https://issues.apache.org/jira/browse/SPARK-29072
 Project: Spark
  Issue Type: Bug
  Components: Shuffle
Affects Versions: 3.0.0
Reporter: Matt Cheah









[jira] [Created] (SPARK-28764) Remove unnecessary writePartitionedFile method from ExternalSorter

2019-08-16 Thread Matt Cheah (JIRA)
Matt Cheah created SPARK-28764:
--

 Summary: Remove unnecessary writePartitionedFile method from 
ExternalSorter
 Key: SPARK-28764
 URL: https://issues.apache.org/jira/browse/SPARK-28764
 Project: Spark
  Issue Type: Task
  Components: Shuffle, Tests
Affects Versions: 3.0.0
Reporter: Matt Cheah


Following SPARK-28571, we now use {{ExternalSorter#writePartitionedData}} in 
{{SortShuffleWriter}} when persisting the shuffle data via the shuffle writer 
plugin. However, we left the {{writePartitionedFile}} method on 
{{ExternalSorter}} strictly for tests. We should figure out how to 
refactor those tests to use {{writePartitionedData}} instead of 
{{writePartitionedFile}}.






[jira] [Updated] (SPARK-28607) Don't hold a reference to two partitionLengths arrays

2019-08-02 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-28607:
---
Description: SPARK-28209 introduced the new shuffle writer API and its 
usage in BypassMergeSortShuffleWriter. However, the design of the API forces 
the partition lengths to be tracked both in the implementation of the plugin 
and also by the higher-level writer. This leads to redundant memory usage. We 
should only track the lengths of the partitions in the implementation of the 
plugin and propagate this information back up to the writer as the return value 
of {{commitAllPartitions}}.
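Sketched concretely (illustrative signatures, not the exact merged code), the 
change amounts to the following:

{code:java}
import java.io.IOException;

// Before the change, the plugin tracked partition lengths internally while the
// higher-level writer also kept its own long[] copy. After the change, only
// the plugin tracks them and returns the array once, on commit:
interface ShuffleMapOutputWriterSketch {
  long[] commitAllPartitions() throws IOException;
}
{code}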

> Don't hold a reference to two partitionLengths arrays
> -
>
> Key: SPARK-28607
> URL: https://issues.apache.org/jira/browse/SPARK-28607
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.0.0
>Reporter: Matt Cheah
>Priority: Major
>
> SPARK-28209 introduced the new shuffle writer API and its usage in 
> BypassMergeSortShuffleWriter. However, the design of the API forces the 
> partition lengths to be tracked both in the implementation of the plugin and 
> also by the higher-level writer. This leads to redundant memory usage. We 
> should only track the lengths of the partitions in the implementation of the 
> plugin and propagate this information back up to the writer as the return 
> value of {{commitAllPartitions}}.






[jira] [Created] (SPARK-28607) Don't hold a reference to two partitionLengths arrays

2019-08-02 Thread Matt Cheah (JIRA)
Matt Cheah created SPARK-28607:
--

 Summary: Don't hold a reference to two partitionLengths arrays
 Key: SPARK-28607
 URL: https://issues.apache.org/jira/browse/SPARK-28607
 Project: Spark
  Issue Type: Sub-task
  Components: Shuffle
Affects Versions: 3.0.0
Reporter: Matt Cheah









[jira] [Updated] (SPARK-28209) Shuffle Storage API: Writer API and usage in BypassMergeSortShuffleWriter

2019-07-30 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-28209:
---
Description: Adds the write-side API for storing shuffle data in arbitrary 
storage systems. Also refactor the BypassMergeSortShuffleWriter so that it uses 
this API, demonstrating the API's viability.  (was: Adds the write-side API for 
storing shuffle data in arbitrary storage systems. Also refactor the existing 
shuffle write code so that it uses this API.)
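A rough sketch of the write-side shape being described (illustrative names 
only; the real interfaces live under org.apache.spark.shuffle.api):

{code:java}
import java.io.IOException;
import java.io.OutputStream;

// One writer per map task, handing out a per-reduce-partition writer whose
// bytes the plugin may route anywhere: local disk, a distributed file system,
// an object store, and so on.
interface MapOutputWriterSketch {
  PartitionWriterSketch getPartitionWriter(int reducePartitionId) throws IOException;

  // Commit everything and report per-partition lengths back to Spark.
  long[] commitAllPartitions() throws IOException;
}

interface PartitionWriterSketch {
  OutputStream openStream() throws IOException;
}
{code}

BypassMergeSortShuffleWriter then writes each partition through the stream the 
plugin provides instead of assuming local files, which is what demonstrates the 
API's viability.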

> Shuffle Storage API: Writer API and usage in BypassMergeSortShuffleWriter
> -
>
> Key: SPARK-28209
> URL: https://issues.apache.org/jira/browse/SPARK-28209
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.0.0
>Reporter: Matt Cheah
>Assignee: Matt Cheah
>Priority: Major
> Fix For: 3.0.0
>
>
> Adds the write-side API for storing shuffle data in arbitrary storage 
> systems. Also refactor the BypassMergeSortShuffleWriter so that it uses this 
> API, demonstrating the API's viability.






[jira] [Created] (SPARK-28571) Shuffle storage API: Use API in SortShuffleWriter

2019-07-30 Thread Matt Cheah (JIRA)
Matt Cheah created SPARK-28571:
--

 Summary: Shuffle storage API: Use API in SortShuffleWriter
 Key: SPARK-28571
 URL: https://issues.apache.org/jira/browse/SPARK-28571
 Project: Spark
  Issue Type: Sub-task
  Components: Shuffle
Affects Versions: 3.0.0
Reporter: Matt Cheah


Use the APIs introduced in SPARK-28209 in the SortShuffleWriter.






[jira] [Created] (SPARK-28570) Shuffle Storage API: Use writer API in UnsafeShuffleWriter

2019-07-30 Thread Matt Cheah (JIRA)
Matt Cheah created SPARK-28570:
--

 Summary: Shuffle Storage API: Use writer API in UnsafeShuffleWriter
 Key: SPARK-28570
 URL: https://issues.apache.org/jira/browse/SPARK-28570
 Project: Spark
  Issue Type: Sub-task
  Components: Shuffle
Affects Versions: 3.0.0
Reporter: Matt Cheah


Use the APIs introduced in SPARK-28209 in the UnsafeShuffleWriter.






[jira] [Updated] (SPARK-28209) Shuffle Storage API: Writer API and usage in BypassMergeSortShuffleWriter

2019-07-30 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-28209:
---
Summary: Shuffle Storage API: Writer API and usage in 
BypassMergeSortShuffleWriter  (was: Shuffle Storage API: Writer API)

> Shuffle Storage API: Writer API and usage in BypassMergeSortShuffleWriter
> -
>
> Key: SPARK-28209
> URL: https://issues.apache.org/jira/browse/SPARK-28209
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.0.0
>Reporter: Matt Cheah
>Assignee: Matt Cheah
>Priority: Major
> Fix For: 3.0.0
>
>
> Adds the write-side API for storing shuffle data in arbitrary storage 
> systems. Also refactor the existing shuffle write code so that it uses this 
> API.






[jira] [Updated] (SPARK-28209) Shuffle Storage API: Writer API

2019-07-30 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-28209?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-28209:
---
Summary: Shuffle Storage API: Writer API  (was: Shuffle Storage API: Writes)

> Shuffle Storage API: Writer API
> ---
>
> Key: SPARK-28209
> URL: https://issues.apache.org/jira/browse/SPARK-28209
> Project: Spark
>  Issue Type: Sub-task
>  Components: Shuffle
>Affects Versions: 3.0.0
>Reporter: Matt Cheah
>Assignee: Matt Cheah
>Priority: Major
> Fix For: 3.0.0
>
>
> Adds the write-side API for storing shuffle data in arbitrary storage 
> systems. Also refactor the existing shuffle write code so that it uses this 
> API.






[jira] [Created] (SPARK-28568) Make Javadoc in org.apache.spark.shuffle.api visible

2019-07-30 Thread Matt Cheah (JIRA)
Matt Cheah created SPARK-28568:
--

 Summary: Make Javadoc in org.apache.spark.shuffle.api visible
 Key: SPARK-28568
 URL: https://issues.apache.org/jira/browse/SPARK-28568
 Project: Spark
  Issue Type: Sub-task
  Components: Shuffle
Affects Versions: 3.0.0
Reporter: Matt Cheah









[jira] [Created] (SPARK-28238) DESCRIBE TABLE for Data Source V2 tables

2019-07-02 Thread Matt Cheah (JIRA)
Matt Cheah created SPARK-28238:
--

 Summary: DESCRIBE TABLE for Data Source V2 tables
 Key: SPARK-28238
 URL: https://issues.apache.org/jira/browse/SPARK-28238
 Project: Spark
  Issue Type: New Feature
  Components: SQL
Affects Versions: 3.0.0
Reporter: Matt Cheah


Implement the {{DESCRIBE TABLE}} DDL command for tables that are loaded by V2 
catalogs.
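Once implemented, usage would look like ordinary DDL; the catalog and table 
names below are made up for illustration:

{code:java}
import org.apache.spark.sql.SparkSession;

public class DescribeV2TableExample {
  public static void main(String[] args) {
    SparkSession spark = SparkSession.builder().appName("describe-v2-table").getOrCreate();
    // "testcat" stands in for a configured V2 catalog; the table name is hypothetical.
    spark.sql("DESCRIBE TABLE testcat.db.people").show(false);
    spark.stop();
  }
}
{code}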






[jira] [Created] (SPARK-28212) Shuffle Storage API: Shuffle Cleanup

2019-06-28 Thread Matt Cheah (JIRA)
Matt Cheah created SPARK-28212:
--

 Summary: Shuffle Storage API: Shuffle Cleanup
 Key: SPARK-28212
 URL: https://issues.apache.org/jira/browse/SPARK-28212
 Project: Spark
  Issue Type: Sub-task
  Components: Shuffle
Affects Versions: 3.0.0
Reporter: Matt Cheah


In the shuffle storage API, there should be a plugin point for removing 
shuffles that are no longer used.
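A minimal sketch of such a plugin point (hypothetical names, not the final 
API):

{code:java}
// Called when a shuffle is garbage-collected or the application ends, so the
// plugin can delete any data it wrote to external storage.
interface ShuffleCleanupSketch {
  void removeShuffle(int shuffleId, boolean blocking);
}
{code}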






[jira] [Created] (SPARK-28211) Shuffle Storage API: Driver Lifecycle

2019-06-28 Thread Matt Cheah (JIRA)
Matt Cheah created SPARK-28211:
--

 Summary: Shuffle Storage API: Driver Lifecycle
 Key: SPARK-28211
 URL: https://issues.apache.org/jira/browse/SPARK-28211
 Project: Spark
  Issue Type: Sub-task
  Components: Shuffle
Affects Versions: 3.0.0
Reporter: Matt Cheah


As part of the shuffle storage API, allow users to hook in application-wide 
startup and shutdown methods. This can do things like create tables in the 
shuffle storage database, or register / unregister against file servers.
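A minimal sketch of what those hooks could look like (hypothetical names, not 
the final API):

{code:java}
import java.util.Map;

// Application-wide lifecycle hooks for the driver-side plugin.
interface ShuffleDriverLifecycleSketch {
  // Called once at startup: e.g. create tables in a shuffle-metadata database
  // or register this application with an external file server. The returned
  // map could carry extra configuration down to the executor-side plugin.
  Map<String, String> initializeApplication();

  // Called once at shutdown: unregister and release any external resources.
  void cleanupApplication();
}
{code}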






[jira] [Created] (SPARK-28210) Shuffle Storage API: Reads

2019-06-28 Thread Matt Cheah (JIRA)
Matt Cheah created SPARK-28210:
--

 Summary: Shuffle Storage API: Reads
 Key: SPARK-28210
 URL: https://issues.apache.org/jira/browse/SPARK-28210
 Project: Spark
  Issue Type: Sub-task
  Components: Shuffle
Affects Versions: 3.0.0
Reporter: Matt Cheah


As part of the effort to store shuffle data in arbitrary places, this issue 
tracks implementing an API for reading the shuffle data stored by the write 
API. Also ensure that the existing shuffle implementation is refactored to use 
the API.
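As a rough, hypothetical sketch of what a read-side counterpart to the write 
API could look like (the actual proposal is in the design document linked from 
the parent issue):

{code:java}
import java.io.IOException;
import java.io.InputStream;

// Return a stream over the bytes one map task wrote for one reduce partition,
// wherever the plugin chose to store them.
interface ShuffleBlockReaderSketch {
  InputStream openBlock(int shuffleId, long mapTaskId, int reducePartitionId) throws IOException;
}
{code}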






[jira] [Updated] (SPARK-25299) Use remote storage for persisting shuffle data

2019-06-28 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-25299:
---
Description: 
In Spark, the shuffle primitive requires Spark executors to persist data to the 
local disk of the worker nodes. If executors crash, the external shuffle 
service can continue to serve the shuffle data that was written beyond the 
lifetime of the executor itself. In YARN, Mesos, and Standalone mode, the 
external shuffle service is deployed on every worker node. The shuffle service 
shares local disk with the executors that run on its node.

There are some shortcomings with the way shuffle is fundamentally implemented 
right now. Particularly:
 * If any external shuffle service process or node becomes unavailable, all 
applications that had an executor that ran on that node must recompute the 
shuffle blocks that were lost.
 * Similarly to the above, the external shuffle service must be kept running at 
all times, which may waste resources when no applications are using that 
shuffle service node.
 * Mounting local storage can prevent users from taking advantage of desirable 
isolation benefits from using containerized environments, like Kubernetes. We 
had an external shuffle service implementation in an early prototype of the 
Kubernetes backend, but it was rejected due to its strict requirement to be 
able to mount hostPath volumes or other persistent volume setups.

In the following [architecture discussion 
document|https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.btqugnmt2h40]
 (note: _not_ an SPIP), we brainstorm various high level architectures for 
improving the external shuffle service in a way that addresses the above 
problems. The purpose of this umbrella JIRA is to promote additional discussion 
on how we can approach these problems, both at the architecture level and the 
implementation level. We anticipate filing sub-issues that break down the tasks 
that must be completed to achieve this goal.

Edit June 28 2019: Our SPIP is here: 
[https://docs.google.com/document/d/1d6egnL6WHOwWZe8MWv3m8n4PToNacdx7n_0iMSWwhCQ/edit]

  was:
In Spark, the shuffle primitive requires Spark executors to persist data to the 
local disk of the worker nodes. If executors crash, the external shuffle 
service can continue to serve the shuffle data that was written beyond the 
lifetime of the executor itself. In YARN, Mesos, and Standalone mode, the 
external shuffle service is deployed on every worker node. The shuffle service 
shares local disk with the executors that run on its node.

There are some shortcomings with the way shuffle is fundamentally implemented 
right now. Particularly:
 * If any external shuffle service process or node becomes unavailable, all 
applications that had an executor that ran on that node must recompute the 
shuffle blocks that were lost.
 * Similarly to the above, the external shuffle service must be kept running at 
all times, which may waste resources when no applications are using that 
shuffle service node.
 * Mounting local storage can prevent users from taking advantage of desirable 
isolation benefits from using containerized environments, like Kubernetes. We 
had an external shuffle service implementation in an early prototype of the 
Kubernetes backend, but it was rejected due to its strict requirement to be 
able to mount hostPath volumes or other persistent volume setups.

In the following [architecture discussion 
document|https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.btqugnmt2h40]
 (note: _not_ an SPIP), we brainstorm various high level architectures for 
improving the external shuffle service in a way that addresses the above 
problems. The purpose of this umbrella JIRA is to promote additional discussion 
on how we can approach these problems, both at the architecture level and the 
implementation level. We anticipate filing sub-issues that break down the tasks 
that must be completed to achieve this goal.


> Use remote storage for persisting shuffle data
> --
>
> Key: SPARK-25299
> URL: https://issues.apache.org/jira/browse/SPARK-25299
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Affects Versions: 2.4.0
>Reporter: Matt Cheah
>Priority: Major
>  Labels: SPIP
>
> In Spark, the shuffle primitive requires Spark executors to persist data to 
> the local disk of the worker nodes. If executors crash, the external shuffle 
> service can continue to serve the shuffle data that was written beyond the 
> lifetime of the executor itself. In YARN, Mesos, and Standalone mode, the 
> external shuffle service is deployed on every worker node. The shuffle 
> service shares local disk with the executors that run on 

[jira] [Commented] (SPARK-25299) Use remote storage for persisting shuffle data

2019-06-28 Thread Matt Cheah (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875106#comment-16875106
 ] 

Matt Cheah commented on SPARK-25299:


I also noticed the SPIP document wasn't ever posted on this ticket, so sorry 
about that! Here's the link for everyone who wasn't following along on the 
mailing list: 
[https://docs.google.com/document/d/1d6egnL6WHOwWZe8MWv3m8n4PToNacdx7n_0iMSWwhCQ/edit]

> Use remote storage for persisting shuffle data
> --
>
> Key: SPARK-25299
> URL: https://issues.apache.org/jira/browse/SPARK-25299
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Affects Versions: 2.4.0
>Reporter: Matt Cheah
>Priority: Major
>  Labels: SPIP
>
> In Spark, the shuffle primitive requires Spark executors to persist data to 
> the local disk of the worker nodes. If executors crash, the external shuffle 
> service can continue to serve the shuffle data that was written beyond the 
> lifetime of the executor itself. In YARN, Mesos, and Standalone mode, the 
> external shuffle service is deployed on every worker node. The shuffle 
> service shares local disk with the executors that run on its node.
> There are some shortcomings with the way shuffle is fundamentally implemented 
> right now. Particularly:
>  * If any external shuffle service process or node becomes unavailable, all 
> applications that had an executor that ran on that node must recompute the 
> shuffle blocks that were lost.
>  * Similarly to the above, the external shuffle service must be kept running 
> at all times, which may waste resources when no applications are using that 
> shuffle service node.
>  * Mounting local storage can prevent users from taking advantage of 
> desirable isolation benefits from using containerized environments, like 
> Kubernetes. We had an external shuffle service implementation in an early 
> prototype of the Kubernetes backend, but it was rejected due to its strict 
> requirement to be able to mount hostPath volumes or other persistent volume 
> setups.
> In the following [architecture discussion 
> document|https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.btqugnmt2h40]
>  (note: _not_ an SPIP), we brainstorm various high level architectures for 
> improving the external shuffle service in a way that addresses the above 
> problems. The purpose of this umbrella JIRA is to promote additional 
> discussion on how we can approach these problems, both at the architecture 
> level and the implementation level. We anticipate filing sub-issues that 
> break down the tasks that must be completed to achieve this goal.






[jira] [Created] (SPARK-28209) Shuffle Storage API: Writes

2019-06-28 Thread Matt Cheah (JIRA)
Matt Cheah created SPARK-28209:
--

 Summary: Shuffle Storage API: Writes
 Key: SPARK-28209
 URL: https://issues.apache.org/jira/browse/SPARK-28209
 Project: Spark
  Issue Type: Sub-task
  Components: Shuffle
Affects Versions: 3.0.0
Reporter: Matt Cheah


Adds the write-side API for storing shuffle data in arbitrary storage systems. 
Also refactor the existing shuffle write code so that it uses this API.






[jira] [Commented] (SPARK-25299) Use remote storage for persisting shuffle data

2019-06-28 Thread Matt Cheah (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16875101#comment-16875101
 ] 

Matt Cheah commented on SPARK-25299:


Let's start by making sub-issues. I have a patch staged for master that I 
intend to post by the end of the day.

> Use remote storage for persisting shuffle data
> --
>
> Key: SPARK-25299
> URL: https://issues.apache.org/jira/browse/SPARK-25299
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Affects Versions: 2.4.0
>Reporter: Matt Cheah
>Priority: Major
>  Labels: SPIP
>
> In Spark, the shuffle primitive requires Spark executors to persist data to 
> the local disk of the worker nodes. If executors crash, the external shuffle 
> service can continue to serve the shuffle data that was written beyond the 
> lifetime of the executor itself. In YARN, Mesos, and Standalone mode, the 
> external shuffle service is deployed on every worker node. The shuffle 
> service shares local disk with the executors that run on its node.
> There are some shortcomings with the way shuffle is fundamentally implemented 
> right now. Particularly:
>  * If any external shuffle service process or node becomes unavailable, all 
> applications that had an executor that ran on that node must recompute the 
> shuffle blocks that were lost.
>  * Similarly to the above, the external shuffle service must be kept running 
> at all times, which may waste resources when no applications are using that 
> shuffle service node.
>  * Mounting local storage can prevent users from taking advantage of 
> desirable isolation benefits from using containerized environments, like 
> Kubernetes. We had an external shuffle service implementation in an early 
> prototype of the Kubernetes backend, but it was rejected due to its strict 
> requirement to be able to mount hostPath volumes or other persistent volume 
> setups.
> In the following [architecture discussion 
> document|https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.btqugnmt2h40]
>  (note: _not_ an SPIP), we brainstorm various high level architectures for 
> improving the external shuffle service in a way that addresses the above 
> problems. The purpose of this umbrella JIRA is to promote additional 
> discussion on how we can approach these problems, both at the architecture 
> level and the implementation level. We anticipate filing sub-issues that 
> break down the tasks that must be completed to achieve this goal.






[jira] [Resolved] (SPARK-26874) With PARQUET-1414, Spark can erroneously write empty pages

2019-02-14 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah resolved SPARK-26874.

Resolution: Not A Problem

I did some more digging with [~rdblue] and we discovered that the root of the 
problem is on the Parquet side, specifically in a feature that hasn't been 
released yet. We'll continue pursuing solutions there.

> With PARQUET-1414, Spark can erroneously write empty pages
> --
>
> Key: SPARK-26874
> URL: https://issues.apache.org/jira/browse/SPARK-26874
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output, SQL
>Affects Versions: 2.4.0
>Reporter: Matt Cheah
>Priority: Major
>
> This issue will only come up when Spark upgrades its Parquet dependency to 
> the latest off of parquet-mr/master. This issue is being filed to proactively 
> fix the bug before we upgrade; it's not something that would easily be found 
> in the current unit tests, and it could be missed until the community runs 
> scale tests, e.g. during an RC phase.
> Parquet introduced a new feature to limit the number of rows written to a 
> page in a column chunk - see PARQUET-1414. Previously, Parquet would only 
> flush pages to the column store after the page writer had filled its buffer 
> with a certain amount of bytes. The idea of the Parquet patch was to make 
> page writers flush to the column store upon the writer being given a certain 
> number of rows - the default value is 2.
> The patch makes the Spark Parquet Data Source erroneously write empty pages 
> to column chunks, making the Parquet file ultimately unreadable with 
> exceptions like these:
>  
> {code:java}
> Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value 
> at 0 in block -1 in file 
> file:/private/var/folders/7f/rrg37sj15r33cwlxbhg7fyg5080xxg/T/spark-2c1b1e5d-c132-42fe-98e1-b523b9baa176/parquet-data/part-2-b8b4bb3c-b49f-440b-b96c-53fcd19786ad-c000.snappy.parquet
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
>  at 
> org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
>  at 
> org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
>  at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
>  at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181)
>  ... 18 more
> Caused by: java.lang.IllegalArgumentException: Reading past RLE/BitPacking 
> stream.
>  at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:53)
>  at 
> org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readNext(RunLengthBitPackingHybridDecoder.java:80)
>  at 
> org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readInt(RunLengthBitPackingHybridDecoder.java:62)
>  at 
> org.apache.parquet.column.values.rle.RunLengthBitPackingHybridValuesReader.readInteger(RunLengthBitPackingHybridValuesReader.java:53)
>  at 
> org.apache.parquet.column.impl.ColumnReaderBase$ValuesReaderIntIterator.nextInt(ColumnReaderBase.java:733)
>  at 
> org.apache.parquet.column.impl.ColumnReaderBase.checkRead(ColumnReaderBase.java:567)
>  at 
> org.apache.parquet.column.impl.ColumnReaderBase.consume(ColumnReaderBase.java:705)
>  at 
> org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:30)
>  at 
> org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:47)
>  at 
> org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:84)
>  at 
> org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:271)
>  at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147)
>  at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109)
>  at 
> org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165)
>  at 
> org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
>  ... 22 more
> {code}
> What's happening here is that the reader is being given a page with no 
> values, which Parquet can never handle.
> The root cause is due to the way Spark treats empty (null) records in 
> optional fields. Concretely, in {{ParquetWriteSupport#write}}, we always 
> indicate to the recordConsumer that we are starting a message 
> ({{recordConsumer#startMessage}}). If Spark is given a null field in the row, 
> it still indicates to the record consumer after having ignored the 

[jira] [Updated] (SPARK-26874) With PARQUET-1414, Spark can erroneously write empty pages

2019-02-13 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-26874:
---
Description: 
This issue will only come up when Spark upgrades its Parquet dependency to the 
latest off of parquet-mr/master. This issue is being filed to proactively fix 
the bug before we upgrade; it's not something that would easily be found in 
the current unit tests, and it could be missed until the community runs scale 
tests, e.g. during an RC phase.

Parquet introduced a new feature to limit the number of rows written to a page 
in a column chunk - see PARQUET-1414. Previously, Parquet would only flush 
pages to the column store after the page writer had filled its buffer with a 
certain amount of bytes. The idea of the Parquet patch was to make page writers 
flush to the column store upon the writer being given a certain number of rows 
- the default value is 2.

The patch makes the Spark Parquet Data Source erroneously write empty pages to 
column chunks, making the Parquet file ultimately unreadable with exceptions 
like these:

 
{code:java}
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value 
at 0 in block -1 in file 
file:/private/var/folders/7f/rrg37sj15r33cwlxbhg7fyg5080xxg/T/spark-2c1b1e5d-c132-42fe-98e1-b523b9baa176/parquet-data/part-2-b8b4bb3c-b49f-440b-b96c-53fcd19786ad-c000.snappy.parquet
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
 at 
org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
 at 
org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
 at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
 at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181)
 ... 18 more
Caused by: java.lang.IllegalArgumentException: Reading past RLE/BitPacking 
stream.
 at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:53)
 at 
org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readNext(RunLengthBitPackingHybridDecoder.java:80)
 at 
org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readInt(RunLengthBitPackingHybridDecoder.java:62)
 at 
org.apache.parquet.column.values.rle.RunLengthBitPackingHybridValuesReader.readInteger(RunLengthBitPackingHybridValuesReader.java:53)
 at 
org.apache.parquet.column.impl.ColumnReaderBase$ValuesReaderIntIterator.nextInt(ColumnReaderBase.java:733)
 at 
org.apache.parquet.column.impl.ColumnReaderBase.checkRead(ColumnReaderBase.java:567)
 at 
org.apache.parquet.column.impl.ColumnReaderBase.consume(ColumnReaderBase.java:705)
 at 
org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:30)
 at 
org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:47)
 at 
org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:84)
 at 
org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:271)
 at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147)
 at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109)
 at 
org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165)
 at 
org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109)
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
 ... 22 more
{code}
What's happening here is that the reader is being given a page with no values, 
which Parquet can never handle.

The root cause is due to the way Spark treats empty (null) records in optional 
fields. Concretely, in {{ParquetWriteSupport#write}}, we always indicate to the 
recordConsumer that we are starting a message 
({{recordConsumer#startMessage}}). If Spark is given a null field in the row, 
it still indicates to the record consumer after having ignored the row that the 
message is finished ({{recordConsumer#endMessage}}). The ending of the message 
causes all column writers to increment their row count in the current page by 
1, despite the fact that Spark is not necessarily sending records to the 
underlying page writer. Now suppose the page maximum row count is N; if Spark 
does the above N times in a page, and particularly if Spark cuts a page 
boundary and is subsequently given N empty values for an optional column - the 
column writer will then think it needs to flush the page to the column chunk 
store and will write out an empty page.

This will most likely be manifested in very sparse columns.

A simple demonstration of the issue is given below. Assume this code is 
manually inserted into {{ParquetIOSuite}}:
{code:java}

[jira] [Updated] (SPARK-26874) When we upgrade Parquet to 1.11+, Spark can erroneously write empty pages

2019-02-13 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-26874:
---
Priority: Critical  (was: Major)

> When we upgrade Parquet to 1.11+, Spark can erroneously write empty pages
> -
>
> Key: SPARK-26874
> URL: https://issues.apache.org/jira/browse/SPARK-26874
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output, SQL
>Affects Versions: 2.4.0
>Reporter: Matt Cheah
>Priority: Critical
>
> This issue will only come up when Spark upgrades its Parquet dependency to 
> the latest. This issue is being filed to proactively fix the bug before we 
> upgrade; it's not something that would easily be found in the current unit 
> tests, and it could be missed until the community runs scale tests, e.g. 
> during an RC phase.
> Parquet introduced a new feature to limit the number of rows written to a 
> page in a column chunk - see PARQUET-1414. Previously, Parquet would only 
> flush pages to the column store after the page writer had filled its buffer 
> with a certain amount of bytes. The idea of the Parquet patch was to make 
> page writers flush to the column store upon the writer being given a certain 
> number of rows - the default value is 2.
> The patch makes the Spark Parquet Data Source erroneously write empty pages 
> to column chunks, making the Parquet file ultimately unreadable with 
> exceptions like these:
>  
> {code:java}
> Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value 
> at 0 in block -1 in file 
> file:/private/var/folders/7f/rrg37sj15r33cwlxbhg7fyg5080xxg/T/spark-2c1b1e5d-c132-42fe-98e1-b523b9baa176/parquet-data/part-2-b8b4bb3c-b49f-440b-b96c-53fcd19786ad-c000.snappy.parquet
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
>  at 
> org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
>  at 
> org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
>  at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
>  at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181)
>  ... 18 more
> Caused by: java.lang.IllegalArgumentException: Reading past RLE/BitPacking 
> stream.
>  at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:53)
>  at 
> org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readNext(RunLengthBitPackingHybridDecoder.java:80)
>  at 
> org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readInt(RunLengthBitPackingHybridDecoder.java:62)
>  at 
> org.apache.parquet.column.values.rle.RunLengthBitPackingHybridValuesReader.readInteger(RunLengthBitPackingHybridValuesReader.java:53)
>  at 
> org.apache.parquet.column.impl.ColumnReaderBase$ValuesReaderIntIterator.nextInt(ColumnReaderBase.java:733)
>  at 
> org.apache.parquet.column.impl.ColumnReaderBase.checkRead(ColumnReaderBase.java:567)
>  at 
> org.apache.parquet.column.impl.ColumnReaderBase.consume(ColumnReaderBase.java:705)
>  at 
> org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:30)
>  at 
> org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:47)
>  at 
> org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:84)
>  at 
> org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:271)
>  at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147)
>  at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109)
>  at 
> org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165)
>  at 
> org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
>  ... 22 more
> {code}
> What's happening here is that the reader is being given a page with no 
> values, which Parquet can never handle.
> The root cause is due to the way Spark treats empty (null) records in 
> optional fields. Concretely, in {{ParquetWriteSupport#write}}, we always 
> indicate to the recordConsumer that we are starting a message 
> ({{recordConsumer#startMessage}}). If Spark is given a null field in the row, 
> it still indicates to the record consumer after having ignored the row that 
> the message is finished ({{recordConsumer#endMessage}}). The ending of the 
> message causes all column writers to increment their row count in the current 
> page by 1, despite the fact that 

[jira] [Updated] (SPARK-26874) When we upgrade Parquet to 1.11+, Spark can erroneously write empty pages

2019-02-13 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-26874:
---
Description: 
This issue will only come up when Spark upgrades its Parquet dependency to the 
latest. This issue is being filed to proactively fix the bug before we upgrade; 
it's not something that would easily be found in the current unit tests, and it 
could be missed until the community runs scale tests, e.g. during an RC phase.

Parquet introduced a new feature to limit the number of rows written to a page 
in a column chunk - see PARQUET-1414. Previously, Parquet would only flush 
pages to the column store after the page writer had filled its buffer with a 
certain amount of bytes. The idea of the Parquet patch was to make page writers 
flush to the column store upon the writer being given a certain number of rows 
- the default value is 2.

The patch makes the Spark Parquet Data Source erroneously write empty pages to 
column chunks, making the Parquet file ultimately unreadable with exceptions 
like these:

 
{code:java}
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value 
at 0 in block -1 in file 
file:/private/var/folders/7f/rrg37sj15r33cwlxbhg7fyg5080xxg/T/spark-2c1b1e5d-c132-42fe-98e1-b523b9baa176/parquet-data/part-2-b8b4bb3c-b49f-440b-b96c-53fcd19786ad-c000.snappy.parquet
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
 at 
org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
 at 
org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
 at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
 at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181)
 ... 18 more
Caused by: java.lang.IllegalArgumentException: Reading past RLE/BitPacking 
stream.
 at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:53)
 at 
org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readNext(RunLengthBitPackingHybridDecoder.java:80)
 at 
org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readInt(RunLengthBitPackingHybridDecoder.java:62)
 at 
org.apache.parquet.column.values.rle.RunLengthBitPackingHybridValuesReader.readInteger(RunLengthBitPackingHybridValuesReader.java:53)
 at 
org.apache.parquet.column.impl.ColumnReaderBase$ValuesReaderIntIterator.nextInt(ColumnReaderBase.java:733)
 at 
org.apache.parquet.column.impl.ColumnReaderBase.checkRead(ColumnReaderBase.java:567)
 at 
org.apache.parquet.column.impl.ColumnReaderBase.consume(ColumnReaderBase.java:705)
 at 
org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:30)
 at 
org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:47)
 at 
org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:84)
 at 
org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:271)
 at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147)
 at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109)
 at 
org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165)
 at 
org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109)
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
 ... 22 more
{code}
What's happening here is that the reader is being given a page with no values, 
which Parquet can never handle.

The root cause is the way Spark treats empty (null) values in optional fields. 
Concretely, in {{ParquetWriteSupport#write}}, we always indicate to the 
recordConsumer that we are starting a message 
({{recordConsumer#startMessage}}). If Spark is given a null field in the row, 
it skips writing that field but still tells the record consumer that the 
message is finished ({{recordConsumer#endMessage}}). Ending the message causes 
every column writer to increment its row count in the current page by 1, even 
though Spark did not necessarily send any value to the underlying page writer. 
Now suppose the page's maximum row count is N: if Spark does the above N times 
in a page - in particular, if Spark cuts a page boundary and is then given N 
null values for an optional column - the column writer will think it needs to 
flush the page to the column chunk store and will write out an empty page.

This will most likely be manifested in very sparse columns.
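To make that accounting concrete, below is a small, self-contained Scala 
sketch - a simplified model of the per-page row counting described above, not 
the real org.apache.parquet.column.impl classes - showing how a run of nulls 
in an optional column can cause a page with zero values to be flushed. The 
class and method names and the tiny row-count limit are assumptions chosen for 
illustration only.
{code:scala}
// Simplified model of a Parquet column writer's per-page accounting.
// Illustrative only: names and the row-count limit are assumptions.
object EmptyPageDemo {
  final case class Page(rowCount: Int, valueCount: Int)

  class ColumnWriterModel(pageRowCountLimit: Int) {
    private var rowsInPage = 0
    private var valuesInPage = 0
    private val flushedPages = scala.collection.mutable.ArrayBuffer.empty[Page]

    // Called when Spark actually sends a value for this column.
    def writeValue(): Unit = valuesInPage += 1

    // Called from endMessage() for *every* record, even when the optional
    // field was null and no value was written for this column.
    def endRecord(): Unit = {
      rowsInPage += 1
      if (rowsInPage >= pageRowCountLimit) flushPage()
    }

    private def flushPage(): Unit = {
      flushedPages += Page(rowsInPage, valuesInPage) // may hold zero values!
      rowsInPage = 0
      valuesInPage = 0
    }

    def pages: Seq[Page] = flushedPages.toSeq
  }

  def main(args: Array[String]): Unit = {
    val limit = 3 // tiny stand-in for the real page row-count limit
    val writer = new ColumnWriterModel(limit)
    // First page: rows that actually carry values, cutting a page boundary.
    (1 to limit).foreach { _ => writer.writeValue(); writer.endRecord() }
    // Next `limit` rows are null for this optional column: endRecord() still
    // increments the row count, so the next flush produces an empty page.
    (1 to limit).foreach { _ => writer.endRecord() }
    // Prints: rowCount=3, valueCount=3 then rowCount=3, valueCount=0
    writer.pages.foreach(p => println(s"rowCount=${p.rowCount}, valueCount=${p.valueCount}"))
  }
}
{code}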

A simple demonstration of the issue is given below. Assume this code is 
manually inserted into {{ParquetIOSuite}}:
{code:java}
test("PARQUET-1414 Problems") 

[jira] [Commented] (SPARK-26874) When we upgrade Parquet to 1.11+, Spark can erroneously write empty pages

2019-02-13 Thread Matt Cheah (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16767736#comment-16767736
 ] 

Matt Cheah commented on SPARK-26874:


[~rdblue] [~cloud_fan] - was wondering if you had any thoughts here or can 
possibly confirm my understanding of what's going on.

> When we upgrade Parquet to 1.11+, Spark can erroneously write empty pages
> -
>
> Key: SPARK-26874
> URL: https://issues.apache.org/jira/browse/SPARK-26874
> Project: Spark
>  Issue Type: Bug
>  Components: Input/Output, SQL
>Affects Versions: 2.4.0
>Reporter: Matt Cheah
>Priority: Major
>
> This issue will only come up when Spark upgrades its Parquet dependency to 
> the latest release. It is being filed to proactively fix the bug before we 
> upgrade - it's not something that would easily be found in the current unit 
> tests, and it could be missed until the community runs scale tests, e.g. 
> during an RC phase.
> Parquet introduced a new feature to limit the number of rows written to a 
> page in a column chunk - see PARQUET-1414. Previously, Parquet would only 
> flush pages to the column store after the page writer had filled its buffer 
> with a certain number of bytes. The idea of the Parquet patch is to make 
> page writers flush to the column store once the writer has been given a 
> certain number of rows - the default limit is 20,000.
> The patch makes the Spark Parquet Data Source erroneously write empty pages 
> to column chunks, making the Parquet file ultimately unreadable with 
> exceptions like these:
>  
> {code:java}
> Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value 
> at 0 in block -1 in file 
> file:/private/var/folders/7f/rrg37sj15r33cwlxbhg7fyg5080xxg/T/spark-2c1b1e5d-c132-42fe-98e1-b523b9baa176/parquet-data/part-2-b8b4bb3c-b49f-440b-b96c-53fcd19786ad-c000.snappy.parquet
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
>  at 
> org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
>  at 
> org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
>  at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
>  at 
> org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181)
>  ... 18 more
> Caused by: java.lang.IllegalArgumentException: Reading past RLE/BitPacking 
> stream.
>  at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:53)
>  at 
> org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readNext(RunLengthBitPackingHybridDecoder.java:80)
>  at 
> org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readInt(RunLengthBitPackingHybridDecoder.java:62)
>  at 
> org.apache.parquet.column.values.rle.RunLengthBitPackingHybridValuesReader.readInteger(RunLengthBitPackingHybridValuesReader.java:53)
>  at 
> org.apache.parquet.column.impl.ColumnReaderBase$ValuesReaderIntIterator.nextInt(ColumnReaderBase.java:733)
>  at 
> org.apache.parquet.column.impl.ColumnReaderBase.checkRead(ColumnReaderBase.java:567)
>  at 
> org.apache.parquet.column.impl.ColumnReaderBase.consume(ColumnReaderBase.java:705)
>  at 
> org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:30)
>  at 
> org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:47)
>  at 
> org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:84)
>  at 
> org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:271)
>  at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147)
>  at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109)
>  at 
> org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165)
>  at 
> org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
>  at 
> org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
>  ... 22 more
> {code}
> What's happening here is that the reader is being given a page with no 
> values, which Parquet can never handle.
> The root cause is due to the way Spark treats empty (null) records in 
> optional fields. Concretely, in {{ParquetWriteSupport#write}}, we always 
> indicate to the recordConsumer that we are starting a message 
> ({{recordConsumer#startMessage}}). If Spark is given a null field in the row, 
> it still indicates to the record consumer after having written the row that 
> the message is finished ({{recordConsumer#endMessage}}). The 

[jira] [Created] (SPARK-26874) When we upgrade Parquet to 1.11+, Spark can erroneously write empty pages

2019-02-13 Thread Matt Cheah (JIRA)
Matt Cheah created SPARK-26874:
--

 Summary: When we upgrade Parquet to 1.11+, Spark can erroneously 
write empty pages
 Key: SPARK-26874
 URL: https://issues.apache.org/jira/browse/SPARK-26874
 Project: Spark
  Issue Type: Bug
  Components: Input/Output, SQL
Affects Versions: 2.4.0
Reporter: Matt Cheah


This issue will only come up when Spark upgrades its Parquet dependency to the 
latest release. It is being filed to proactively fix the bug before we upgrade 
- it's not something that would easily be found in the current unit tests, and 
it could be missed until the community runs scale tests, e.g. during an RC 
phase.

Parquet introduced a new feature to limit the number of rows written to a page 
in a column chunk - see PARQUET-1414. Previously, Parquet would only flush 
pages to the column store after the page writer had filled its buffer with a 
certain number of bytes. The idea of the Parquet patch is to make page writers 
flush to the column store once the writer has been given a certain number of 
rows - the default limit is 20,000.

The patch makes the Spark Parquet Data Source erroneously write empty pages to 
column chunks, making the Parquet file ultimately unreadable with exceptions 
like these:

 
{code:java}
Caused by: org.apache.parquet.io.ParquetDecodingException: Can not read value 
at 0 in block -1 in file 
file:/private/var/folders/7f/rrg37sj15r33cwlxbhg7fyg5080xxg/T/spark-2c1b1e5d-c132-42fe-98e1-b523b9baa176/parquet-data/part-2-b8b4bb3c-b49f-440b-b96c-53fcd19786ad-c000.snappy.parquet
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:251)
 at 
org.apache.parquet.hadoop.ParquetRecordReader.nextKeyValue(ParquetRecordReader.java:207)
 at 
org.apache.spark.sql.execution.datasources.RecordReaderIterator.hasNext(RecordReaderIterator.scala:39)
 at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.hasNext(FileScanRDD.scala:101)
 at 
org.apache.spark.sql.execution.datasources.FileScanRDD$$anon$1.nextIterator(FileScanRDD.scala:181)
 ... 18 more
Caused by: java.lang.IllegalArgumentException: Reading past RLE/BitPacking 
stream.
 at org.apache.parquet.Preconditions.checkArgument(Preconditions.java:53)
 at 
org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readNext(RunLengthBitPackingHybridDecoder.java:80)
 at 
org.apache.parquet.column.values.rle.RunLengthBitPackingHybridDecoder.readInt(RunLengthBitPackingHybridDecoder.java:62)
 at 
org.apache.parquet.column.values.rle.RunLengthBitPackingHybridValuesReader.readInteger(RunLengthBitPackingHybridValuesReader.java:53)
 at 
org.apache.parquet.column.impl.ColumnReaderBase$ValuesReaderIntIterator.nextInt(ColumnReaderBase.java:733)
 at 
org.apache.parquet.column.impl.ColumnReaderBase.checkRead(ColumnReaderBase.java:567)
 at 
org.apache.parquet.column.impl.ColumnReaderBase.consume(ColumnReaderBase.java:705)
 at 
org.apache.parquet.column.impl.ColumnReaderImpl.consume(ColumnReaderImpl.java:30)
 at 
org.apache.parquet.column.impl.ColumnReaderImpl.<init>(ColumnReaderImpl.java:47)
 at 
org.apache.parquet.column.impl.ColumnReadStoreImpl.getColumnReader(ColumnReadStoreImpl.java:84)
 at 
org.apache.parquet.io.RecordReaderImplementation.<init>(RecordReaderImplementation.java:271)
 at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:147)
 at org.apache.parquet.io.MessageColumnIO$1.visit(MessageColumnIO.java:109)
 at 
org.apache.parquet.filter2.compat.FilterCompat$NoOpFilter.accept(FilterCompat.java:165)
 at 
org.apache.parquet.io.MessageColumnIO.getRecordReader(MessageColumnIO.java:109)
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.checkRead(InternalParquetRecordReader.java:137)
 at 
org.apache.parquet.hadoop.InternalParquetRecordReader.nextKeyValue(InternalParquetRecordReader.java:222)
 ... 22 more
{code}
What's happening here is that the reader is being given a page with no values, 
which Parquet can never handle.

The root cause is the way Spark treats empty (null) values in optional fields. 
Concretely, in {{ParquetWriteSupport#write}}, we always indicate to the 
recordConsumer that we are starting a message 
({{recordConsumer#startMessage}}). If Spark is given a null field in the row, 
it skips writing that field but still tells the record consumer that the 
message is finished ({{recordConsumer#endMessage}}). Ending the message causes 
every column writer to increment its row count in the current page by 1, even 
though Spark did not necessarily send any value to the underlying page writer. 
Now suppose the page's maximum row count is N: if Spark does the above N times 
in a page - in particular, if Spark cuts a page boundary and is then given N 
null values for an optional column - the column writer will think it needs to 
flush the page to the column chunk store and will write out an empty page.


[jira] [Resolved] (SPARK-26625) spark.redaction.regex should include oauthToken

2019-01-16 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26625?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah resolved SPARK-26625.

   Resolution: Fixed
Fix Version/s: 3.0.0

> spark.redaction.regex should include oauthToken
> ---
>
> Key: SPARK-26625
> URL: https://issues.apache.org/jira/browse/SPARK-26625
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Spark Core
>Affects Versions: 2.4.0
>Reporter: Vinoo Ganesh
>Priority: Critical
> Fix For: 3.0.0
>
>
> The regex (spark.redaction.regex) that is used to decide which config 
> properties or environment settings are sensitive should also include 
> oauthToken to match  spark.kubernetes.authenticate.submission.oauthToken
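For users on releases that predate this fix, one possible mitigation is to 
widen the redaction pattern themselves. The exact pattern below is an 
assumption for illustration, not the value shipped with any particular Spark 
release:
{code:scala}
import org.apache.spark.sql.SparkSession

// Hypothetical example: extend spark.redaction.regex so that any config key
// containing "token" or "oauthToken" is masked in the UI and event logs.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("redaction-example")
  .config("spark.redaction.regex", "(?i)secret|password|token|oauthToken")
  .getOrCreate()
{code}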



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25877) Put all feature-related code in the feature step itself

2018-12-12 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah resolved SPARK-25877.

   Resolution: Fixed
Fix Version/s: 3.0.0

> Put all feature-related code in the feature step itself
> ---
>
> Key: SPARK-25877
> URL: https://issues.apache.org/jira/browse/SPARK-25877
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Marcelo Vanzin
>Priority: Major
> Fix For: 3.0.0
>
>
> This is a child task of SPARK-25874. It covers having all the code related to 
> features in the feature steps themselves, including logic about whether a 
> step should be applied or not.
> Please refer to the parent bug for further details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-26301) Consider switching from putting secret in environment variable directly to using secret reference

2018-12-06 Thread Matt Cheah (JIRA)
Matt Cheah created SPARK-26301:
--

 Summary: Consider switching from putting secret in environment 
variable directly to using secret reference
 Key: SPARK-26301
 URL: https://issues.apache.org/jira/browse/SPARK-26301
 Project: Spark
  Issue Type: New Feature
  Components: Kubernetes
Affects Versions: 3.0.0
Reporter: Matt Cheah


In SPARK-26194 we proposed using an environment variable that is loaded in the 
executor pod spec to share the generated SASL secret key between the driver and 
the executors. However in practice this is very difficult to secure. Most 
traditional Kubernetes deployments will handle permissions by allowing wide 
access to viewing pod specs but restricting access to view Kubernetes secrets. 
Now however any user that can view the pod spec can also view the contents of 
the SASL secrets.

An example use case where this quickly breaks down: a cluster administrator is 
allowed to look at pods that run user code in order to debug failing 
infrastructure, but should not be able to view the contents of secrets or 
other sensitive data from the Spark applications run by their users.

We propose modifying the existing solution to instead automatically create a 
Kubernetes Secret object containing the SASL encryption key, then using the 
[secret reference feature in 
Kubernetes|https://kubernetes.io/docs/concepts/configuration/secret/#using-secrets-as-environment-variables]
 to store the data in the environment variable without putting the secret data 
in the pod spec itself.
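A rough sketch of what this could look like with the fabric8 Kubernetes client 
that the K8s backend already uses is below. The env var name, secret name/key, 
and the exact fluent-builder chaining are illustrative assumptions (the 
builder method names follow fabric8's generated conventions and may differ 
slightly between client versions), not the final design:
{code:scala}
import java.nio.charset.StandardCharsets
import java.util.Base64

import io.fabric8.kubernetes.api.model.{EnvVarBuilder, SecretBuilder}

val secretName = "spark-auth-secret"       // hypothetical name
val secretKey = "auth-secret"              // hypothetical key
val generatedSaslKey: String = sys.env.getOrElse("DEMO_SASL_KEY", "changeme")

// 1) Create a Kubernetes Secret holding the SASL key (data is base64-encoded).
val authSecret = new SecretBuilder()
  .withNewMetadata().withName(secretName).endMetadata()
  .addToData(secretKey,
    Base64.getEncoder.encodeToString(
      generatedSaslKey.getBytes(StandardCharsets.UTF_8)))
  .build()

// 2) Reference the secret from the pod spec via secretKeyRef, so the pod spec
//    itself never carries the secret value.
val authEnvVar = new EnvVarBuilder()
  .withName("SPARK_AUTH_SECRET")           // hypothetical env var name
  .withNewValueFrom()
    .withNewSecretKeyRef()
      .withName(secretName)
      .withKey(secretKey)
    .endSecretKeyRef()
  .endValueFrom()
  .build()
{code}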



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-26194) Support automatic spark.authenticate secret in Kubernetes backend

2018-12-06 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26194?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah resolved SPARK-26194.

   Resolution: Fixed
Fix Version/s: 3.0.0

> Support automatic spark.authenticate secret in Kubernetes backend
> -
>
> Key: SPARK-26194
> URL: https://issues.apache.org/jira/browse/SPARK-26194
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Priority: Major
> Fix For: 3.0.0
>
>
> Currently k8s inherits the default behavior for {{spark.authenticate}}, which 
> is that the user must provide an auth secret.
> k8s doesn't have that requirement and could instead generate its own unique 
> per-app secret, and propagate it to executors.
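A minimal sketch of the per-app secret generation piece, assuming the 
submission client simply draws random bytes and encodes them (the key length 
and encoding are arbitrary choices here, not the actual implementation):
{code:scala}
import java.security.SecureRandom
import java.util.Base64

// Generate a unique per-application auth secret on the submission side.
def generateAuthSecret(numBytes: Int = 32): String = {
  val bytes = new Array[Byte](numBytes)
  new SecureRandom().nextBytes(bytes)
  Base64.getEncoder.encodeToString(bytes)
}

val perAppSecret = generateAuthSecret()
{code}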



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26239) Add configurable auth secret source in k8s backend

2018-12-03 Thread Matt Cheah (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707962#comment-16707962
 ] 

Matt Cheah commented on SPARK-26239:


It could work in client mode but is less useful there overall because the user 
has to determine how to get ahold of that secret file. Nevertheless for cluster 
mode users that have secret file mounting systems for the driver and executors, 
it would be a great start. I can start building the code for this.
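As a sketch of the shape this could take (the config key names below are 
assumptions for illustration, not settings that exist at the time of this 
comment):
{code:scala}
import org.apache.spark.SparkConf

// Hypothetical file-based secret source: driver and executors read the shared
// auth secret from files that the user's own secret-mounting machinery places
// into the containers.
val conf = new SparkConf()
  .set("spark.authenticate", "true")
  .set("spark.authenticate.secret.driver.file", "/etc/spark-secrets/auth-secret")
  .set("spark.authenticate.secret.executor.file", "/etc/spark-secrets/auth-secret")
{code}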

> Add configurable auth secret source in k8s backend
> --
>
> Key: SPARK-26239
> URL: https://issues.apache.org/jira/browse/SPARK-26239
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Priority: Major
>
> This is a follow up to SPARK-26194, which aims to add auto-generated secrets 
> similar to the YARN backend.
> There's a desire to support different ways to generate and propagate these 
> auth secrets (e.g. using things like Vault). Need to investigate:
> - exposing configuration to support that
> - changing SecurityManager so that it can delegate some of the 
> secret-handling logic to custom implementations
> - figuring out whether this can also be used in client-mode, where the driver 
> is not created by the k8s backend in Spark.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25876) Simplify configuration types in k8s backend

2018-11-30 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25876?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah resolved SPARK-25876.

   Resolution: Fixed
Fix Version/s: 3.0.0

> Simplify configuration types in k8s backend
> ---
>
> Key: SPARK-25876
> URL: https://issues.apache.org/jira/browse/SPARK-25876
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Marcelo Vanzin
>Priority: Major
> Fix For: 3.0.0
>
>
> This is a child of SPARK-25874 to deal with the current issues with the 
> different configuration objects used in the k8s backend. Please refer to the 
> parent for further discussion of what this means.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-26239) Add configurable auth secret source in k8s backend

2018-11-30 Thread Matt Cheah (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705273#comment-16705273
 ] 

Matt Cheah edited comment on SPARK-26239 at 11/30/18 8:59 PM:
--

Could we add a simple version that just points to file paths for the executor 
and driver to load, with the secret contents being inside? The user can decide 
how those files are mounted into the containers.


was (Author: mcheah):
Would a simple addition just to point to file paths for the executor and driver 
to load, with the secret contents being inside? The user can decide how those 
files are mounted into the containers.

> Add configurable auth secret source in k8s backend
> --
>
> Key: SPARK-26239
> URL: https://issues.apache.org/jira/browse/SPARK-26239
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Priority: Major
>
> This is a follow up to SPARK-26194, which aims to add auto-generated secrets 
> similar to the YARN backend.
> There's a desire to support different ways to generate and propagate these 
> auth secrets (e.g. using things like Vault). Need to investigate:
> - exposing configuration to support that
> - changing SecurityManager so that it can delegate some of the 
> secret-handling logic to custom implementations
> - figuring out whether this can also be used in client-mode, where the driver 
> is not created by the k8s backend in Spark.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26239) Add configurable auth secret source in k8s backend

2018-11-30 Thread Matt Cheah (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16705273#comment-16705273
 ] 

Matt Cheah commented on SPARK-26239:


Would a simple addition just to point to file paths for the executor and driver 
to load, with the secret contents being inside? The user can decide how those 
files are mounted into the containers.

> Add configurable auth secret source in k8s backend
> --
>
> Key: SPARK-26239
> URL: https://issues.apache.org/jira/browse/SPARK-26239
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 3.0.0
>Reporter: Marcelo Vanzin
>Priority: Major
>
> This is a follow up to SPARK-26194, which aims to add auto-generated secrets 
> similar to the YARN backend.
> There's a desire to support different ways to generate and propagate these 
> auth secrets (e.g. using things like Vault). Need to investigate:
> - exposing configuration to support that
> - changing SecurityManager so that it can delegate some of the 
> secret-handling logic to custom implementations
> - figuring out whether this can also be used in client-mode, where the driver 
> is not created by the k8s backend in Spark.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25957) Skip building spark-r docker image if spark distribution does not have R support

2018-11-21 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25957?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah resolved SPARK-25957.

   Resolution: Fixed
Fix Version/s: 3.0.0

> Skip building spark-r docker image if spark distribution does not have R 
> support
> 
>
> Key: SPARK-25957
> URL: https://issues.apache.org/jira/browse/SPARK-25957
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Nagaram Prasad Addepally
>Priority: Major
> Fix For: 3.0.0
>
>
> [docker-image-tool.sh|https://github.com/apache/spark/blob/master/bin/docker-image-tool.sh]
>  script by default tries to build spark-r image. We may not always build 
> spark distribution with R support. It would be good to skip building and 
> publishing spark-r images if R support is not available in the spark 
> distribution.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26078) WHERE .. IN fails to filter rows when used in combination with UNION

2018-11-15 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-26078:
---
Labels: Correctness  (was: )

> WHERE .. IN fails to filter rows when used in combination with UNION
> 
>
> Key: SPARK-26078
> URL: https://issues.apache.org/jira/browse/SPARK-26078
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1, 2.4.0
>Reporter: Arttu Voutilainen
>Priority: Blocker
>  Labels: correctness
>
> Hey,
> We encountered a case where Spark SQL does not seem to handle WHERE .. IN 
> correctly when used in combination with UNION, and instead also returns rows 
> that do not fulfill the condition. Swapping the order of the datasets in the 
> UNION makes the problem go away. Repro below:
>  
> {code}
> sql = SQLContext(sc)
> a = spark.createDataFrame([{'id': 'a', 'num': 2}, {'id':'b', 'num':1}])
> b = spark.createDataFrame([{'id': 'a', 'num': 2}, {'id':'b', 'num':1}])
> a.registerTempTable('a')
> b.registerTempTable('b')
> bug = sql.sql("""
> SELECT id,num,source FROM
> (
> SELECT id, num, 'a' as source FROM a
> UNION ALL
> SELECT id, num, 'b' as source FROM b
> ) AS c
> WHERE c.id IN (SELECT id FROM b WHERE num = 2)
> """)
> no_bug = sql.sql("""
> SELECT id,num,source FROM
> (
> SELECT id, num, 'b' as source FROM b
> UNION ALL
> SELECT id, num, 'a' as source FROM a
> ) AS c
> WHERE c.id IN (SELECT id FROM b WHERE num = 2)
> """)
> bug.show()
> no_bug.show()
> bug.explain(True)
> no_bug.explain(True)
> {code}
> This results in one extra row in the "bug" DF, coming from DF "b", that 
> should not be there, as it does not satisfy the WHERE .. IN condition:
> {code:java}
> >>> bug.show()
> +---+---+--+
> | id|num|source|
> +---+---+--+
> |  a|  2| a|
> |  a|  2| b|
> |  b|  1| b|
> +---+---+--+
> >>> no_bug.show()
> +---+---+--+
> | id|num|source|
> +---+---+--+
> |  a|  2| b|
> |  a|  2| a|
> +---+---+--+
> {code}
>  The reason can be seen in the query plans:
> {code:java}
> >>> bug.explain(True)
> ...
> == Optimized Logical Plan ==
> Union
> :- Project [id#0, num#1L, a AS source#136]
> :  +- Join LeftSemi, (id#0 = id#4)
> : :- LogicalRDD [id#0, num#1L], false
> : +- Project [id#4]
> :+- Filter (isnotnull(num#5L) && (num#5L = 2))
> :   +- LogicalRDD [id#4, num#5L], false
> +- Join LeftSemi, (id#4#172 = id#4#172)
>:- Project [id#4, num#5L, b AS source#137]
>:  +- LogicalRDD [id#4, num#5L], false
>+- Project [id#4 AS id#4#172]
>   +- Filter (isnotnull(num#5L) && (num#5L = 2))
>  +- LogicalRDD [id#4, num#5L], false
> {code}
> Note the line *+- Join LeftSemi, (id#4#172 = id#4#172)* - this condition 
> seems wrong, and I believe it causes the LeftSemi to return true for all rows 
> in the left-hand-side table, thus failing to filter as the WHERE .. IN 
> should. Compare with the non-buggy version, where both LeftSemi joins have 
> distinct #-things on both sides:
> {code:java}
> >>> no_bug.explain()
> ...
> == Optimized Logical Plan ==
> Union
> :- Project [id#4, num#5L, b AS source#142]
> :  +- Join LeftSemi, (id#4 = id#4#173)
> : :- LogicalRDD [id#4, num#5L], false
> : +- Project [id#4 AS id#4#173]
> :+- Filter (isnotnull(num#5L) && (num#5L = 2))
> :   +- LogicalRDD [id#4, num#5L], false
> +- Project [id#0, num#1L, a AS source#143]
>+- Join LeftSemi, (id#0 = id#4#173)
>   :- LogicalRDD [id#0, num#1L], false
>   +- Project [id#4 AS id#4#173]
>  +- Filter (isnotnull(num#5L) && (num#5L = 2))
> +- LogicalRDD [id#4, num#5L], false
> {code}
>  
> Best,
> -Arttu 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26078) WHERE .. IN fails to filter rows when used in combination with UNION

2018-11-15 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-26078:
---
Labels: Correctness correctness  (was: Correctness)

> WHERE .. IN fails to filter rows when used in combination with UNION
> 
>
> Key: SPARK-26078
> URL: https://issues.apache.org/jira/browse/SPARK-26078
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1, 2.4.0
>Reporter: Arttu Voutilainen
>Priority: Blocker
>  Labels: correctness
>
> Hey,
> We encountered a case where Spark SQL does not seem to handle WHERE .. IN 
> correctly when used in combination with UNION, and instead also returns rows 
> that do not fulfill the condition. Swapping the order of the datasets in the 
> UNION makes the problem go away. Repro below:
>  
> {code}
> sql = SQLContext(sc)
> a = spark.createDataFrame([{'id': 'a', 'num': 2}, {'id':'b', 'num':1}])
> b = spark.createDataFrame([{'id': 'a', 'num': 2}, {'id':'b', 'num':1}])
> a.registerTempTable('a')
> b.registerTempTable('b')
> bug = sql.sql("""
> SELECT id,num,source FROM
> (
> SELECT id, num, 'a' as source FROM a
> UNION ALL
> SELECT id, num, 'b' as source FROM b
> ) AS c
> WHERE c.id IN (SELECT id FROM b WHERE num = 2)
> """)
> no_bug = sql.sql("""
> SELECT id,num,source FROM
> (
> SELECT id, num, 'b' as source FROM b
> UNION ALL
> SELECT id, num, 'a' as source FROM a
> ) AS c
> WHERE c.id IN (SELECT id FROM b WHERE num = 2)
> """)
> bug.show()
> no_bug.show()
> bug.explain(True)
> no_bug.explain(True)
> {code}
> This results in one extra row in the "bug" DF, coming from DF "b", that 
> should not be there, as it does not satisfy the WHERE .. IN condition:
> {code:java}
> >>> bug.show()
> +---+---+--+
> | id|num|source|
> +---+---+--+
> |  a|  2| a|
> |  a|  2| b|
> |  b|  1| b|
> +---+---+--+
> >>> no_bug.show()
> +---+---+--+
> | id|num|source|
> +---+---+--+
> |  a|  2| b|
> |  a|  2| a|
> +---+---+--+
> {code}
>  The reason can be seen in the query plans:
> {code:java}
> >>> bug.explain(True)
> ...
> == Optimized Logical Plan ==
> Union
> :- Project [id#0, num#1L, a AS source#136]
> :  +- Join LeftSemi, (id#0 = id#4)
> : :- LogicalRDD [id#0, num#1L], false
> : +- Project [id#4]
> :+- Filter (isnotnull(num#5L) && (num#5L = 2))
> :   +- LogicalRDD [id#4, num#5L], false
> +- Join LeftSemi, (id#4#172 = id#4#172)
>:- Project [id#4, num#5L, b AS source#137]
>:  +- LogicalRDD [id#4, num#5L], false
>+- Project [id#4 AS id#4#172]
>   +- Filter (isnotnull(num#5L) && (num#5L = 2))
>  +- LogicalRDD [id#4, num#5L], false
> {code}
> Note the line *+- Join LeftSemi, (id#4#172 = id#4#172)* - this condition 
> seems wrong, and I believe it causes the LeftSemi to return true for all rows 
> in the left-hand-side table, thus failing to filter as the WHERE .. IN 
> should. Compare with the non-buggy version, where both LeftSemi joins have 
> distinct #-things on both sides:
> {code:java}
> >>> no_bug.explain()
> ...
> == Optimized Logical Plan ==
> Union
> :- Project [id#4, num#5L, b AS source#142]
> :  +- Join LeftSemi, (id#4 = id#4#173)
> : :- LogicalRDD [id#4, num#5L], false
> : +- Project [id#4 AS id#4#173]
> :+- Filter (isnotnull(num#5L) && (num#5L = 2))
> :   +- LogicalRDD [id#4, num#5L], false
> +- Project [id#0, num#1L, a AS source#143]
>+- Join LeftSemi, (id#0 = id#4#173)
>   :- LogicalRDD [id#0, num#1L], false
>   +- Project [id#4 AS id#4#173]
>  +- Filter (isnotnull(num#5L) && (num#5L = 2))
> +- LogicalRDD [id#4, num#5L], false
> {code}
>  
> Best,
> -Arttu 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26078) WHERE .. IN fails to filter rows when used in combination with UNION

2018-11-15 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-26078:
---
Target Version/s: 2.4.1, 3.0.0

> WHERE .. IN fails to filter rows when used in combination with UNION
> 
>
> Key: SPARK-26078
> URL: https://issues.apache.org/jira/browse/SPARK-26078
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1, 2.4.0
>Reporter: Arttu Voutilainen
>Priority: Blocker
>  Labels: correctness
>
> Hey,
> We encountered a case where Spark SQL does not seem to handle WHERE .. IN 
> correctly when used in combination with UNION, and instead also returns rows 
> that do not fulfill the condition. Swapping the order of the datasets in the 
> UNION makes the problem go away. Repro below:
>  
> {code}
> sql = SQLContext(sc)
> a = spark.createDataFrame([{'id': 'a', 'num': 2}, {'id':'b', 'num':1}])
> b = spark.createDataFrame([{'id': 'a', 'num': 2}, {'id':'b', 'num':1}])
> a.registerTempTable('a')
> b.registerTempTable('b')
> bug = sql.sql("""
> SELECT id,num,source FROM
> (
> SELECT id, num, 'a' as source FROM a
> UNION ALL
> SELECT id, num, 'b' as source FROM b
> ) AS c
> WHERE c.id IN (SELECT id FROM b WHERE num = 2)
> """)
> no_bug = sql.sql("""
> SELECT id,num,source FROM
> (
> SELECT id, num, 'b' as source FROM b
> UNION ALL
> SELECT id, num, 'a' as source FROM a
> ) AS c
> WHERE c.id IN (SELECT id FROM b WHERE num = 2)
> """)
> bug.show()
> no_bug.show()
> bug.explain(True)
> no_bug.explain(True)
> {code}
> This results in one extra row in the "bug" DF, coming from DF "b", that 
> should not be there, as it does not satisfy the WHERE .. IN condition:
> {code:java}
> >>> bug.show()
> +---+---+--+
> | id|num|source|
> +---+---+--+
> |  a|  2| a|
> |  a|  2| b|
> |  b|  1| b|
> +---+---+--+
> >>> no_bug.show()
> +---+---+--+
> | id|num|source|
> +---+---+--+
> |  a|  2| b|
> |  a|  2| a|
> +---+---+--+
> {code}
>  The reason can be seen in the query plans:
> {code:java}
> >>> bug.explain(True)
> ...
> == Optimized Logical Plan ==
> Union
> :- Project [id#0, num#1L, a AS source#136]
> :  +- Join LeftSemi, (id#0 = id#4)
> : :- LogicalRDD [id#0, num#1L], false
> : +- Project [id#4]
> :+- Filter (isnotnull(num#5L) && (num#5L = 2))
> :   +- LogicalRDD [id#4, num#5L], false
> +- Join LeftSemi, (id#4#172 = id#4#172)
>:- Project [id#4, num#5L, b AS source#137]
>:  +- LogicalRDD [id#4, num#5L], false
>+- Project [id#4 AS id#4#172]
>   +- Filter (isnotnull(num#5L) && (num#5L = 2))
>  +- LogicalRDD [id#4, num#5L], false
> {code}
> Note the line *+- Join LeftSemi, (id#4#172 = id#4#172)* - this condition 
> seems wrong, and I believe it causes the LeftSemi to return true for all rows 
> in the left-hand-side table, thus failing to filter as the WHERE .. IN 
> should. Compare with the non-buggy version, where both LeftSemi joins have 
> distinct #-things on both sides:
> {code:java}
> >>> no_bug.explain()
> ...
> == Optimized Logical Plan ==
> Union
> :- Project [id#4, num#5L, b AS source#142]
> :  +- Join LeftSemi, (id#4 = id#4#173)
> : :- LogicalRDD [id#4, num#5L], false
> : +- Project [id#4 AS id#4#173]
> :+- Filter (isnotnull(num#5L) && (num#5L = 2))
> :   +- LogicalRDD [id#4, num#5L], false
> +- Project [id#0, num#1L, a AS source#143]
>+- Join LeftSemi, (id#0 = id#4#173)
>   :- LogicalRDD [id#0, num#1L], false
>   +- Project [id#4 AS id#4#173]
>  +- Filter (isnotnull(num#5L) && (num#5L = 2))
> +- LogicalRDD [id#4, num#5L], false
> {code}
>  
> Best,
> -Arttu 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26078) WHERE .. IN fails to filter rows when used in combination with UNION

2018-11-15 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-26078:
---
Labels: correctness  (was: Correctness correctness)

> WHERE .. IN fails to filter rows when used in combination with UNION
> 
>
> Key: SPARK-26078
> URL: https://issues.apache.org/jira/browse/SPARK-26078
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1, 2.4.0
>Reporter: Arttu Voutilainen
>Priority: Blocker
>  Labels: correctness
>
> Hey,
> We encountered a case where Spark SQL does not seem to handle WHERE .. IN 
> correctly when used in combination with UNION, and instead also returns rows 
> that do not fulfill the condition. Swapping the order of the datasets in the 
> UNION makes the problem go away. Repro below:
>  
> {code}
> sql = SQLContext(sc)
> a = spark.createDataFrame([{'id': 'a', 'num': 2}, {'id':'b', 'num':1}])
> b = spark.createDataFrame([{'id': 'a', 'num': 2}, {'id':'b', 'num':1}])
> a.registerTempTable('a')
> b.registerTempTable('b')
> bug = sql.sql("""
> SELECT id,num,source FROM
> (
> SELECT id, num, 'a' as source FROM a
> UNION ALL
> SELECT id, num, 'b' as source FROM b
> ) AS c
> WHERE c.id IN (SELECT id FROM b WHERE num = 2)
> """)
> no_bug = sql.sql("""
> SELECT id,num,source FROM
> (
> SELECT id, num, 'b' as source FROM b
> UNION ALL
> SELECT id, num, 'a' as source FROM a
> ) AS c
> WHERE c.id IN (SELECT id FROM b WHERE num = 2)
> """)
> bug.show()
> no_bug.show()
> bug.explain(True)
> no_bug.explain(True)
> {code}
> This results in one extra row in the "bug" DF, coming from DF "b", that 
> should not be there, as it does not satisfy the WHERE .. IN condition:
> {code:java}
> >>> bug.show()
> +---+---+--+
> | id|num|source|
> +---+---+--+
> |  a|  2| a|
> |  a|  2| b|
> |  b|  1| b|
> +---+---+--+
> >>> no_bug.show()
> +---+---+--+
> | id|num|source|
> +---+---+--+
> |  a|  2| b|
> |  a|  2| a|
> +---+---+--+
> {code}
>  The reason can be seen in the query plans:
> {code:java}
> >>> bug.explain(True)
> ...
> == Optimized Logical Plan ==
> Union
> :- Project [id#0, num#1L, a AS source#136]
> :  +- Join LeftSemi, (id#0 = id#4)
> : :- LogicalRDD [id#0, num#1L], false
> : +- Project [id#4]
> :+- Filter (isnotnull(num#5L) && (num#5L = 2))
> :   +- LogicalRDD [id#4, num#5L], false
> +- Join LeftSemi, (id#4#172 = id#4#172)
>:- Project [id#4, num#5L, b AS source#137]
>:  +- LogicalRDD [id#4, num#5L], false
>+- Project [id#4 AS id#4#172]
>   +- Filter (isnotnull(num#5L) && (num#5L = 2))
>  +- LogicalRDD [id#4, num#5L], false
> {code}
> Note the line *+- Join LeftSemi, (id#4#172 = id#4#172)* - this condition 
> seems wrong, and I believe it causes the LeftSemi to return true for all rows 
> in the left-hand-side table, thus failing to filter as the WHERE .. IN 
> should. Compare with the non-buggy version, where both LeftSemi joins have 
> distinct #-things on both sides:
> {code:java}
> >>> no_bug.explain()
> ...
> == Optimized Logical Plan ==
> Union
> :- Project [id#4, num#5L, b AS source#142]
> :  +- Join LeftSemi, (id#4 = id#4#173)
> : :- LogicalRDD [id#4, num#5L], false
> : +- Project [id#4 AS id#4#173]
> :+- Filter (isnotnull(num#5L) && (num#5L = 2))
> :   +- LogicalRDD [id#4, num#5L], false
> +- Project [id#0, num#1L, a AS source#143]
>+- Join LeftSemi, (id#0 = id#4#173)
>   :- LogicalRDD [id#0, num#1L], false
>   +- Project [id#4 AS id#4#173]
>  +- Filter (isnotnull(num#5L) && (num#5L = 2))
> +- LogicalRDD [id#4, num#5L], false
> {code}
>  
> Best,
> -Arttu 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-26078) WHERE .. IN fails to filter rows when used in combination with UNION

2018-11-15 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-26078?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-26078:
---
Priority: Blocker  (was: Major)

> WHERE .. IN fails to filter rows when used in combination with UNION
> 
>
> Key: SPARK-26078
> URL: https://issues.apache.org/jira/browse/SPARK-26078
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1, 2.4.0
>Reporter: Arttu Voutilainen
>Priority: Blocker
>
> Hey,
> We encountered a case where Spark SQL does not seem to handle WHERE .. IN 
> correctly when used in combination with UNION, and instead also returns rows 
> that do not fulfill the condition. Swapping the order of the datasets in the 
> UNION makes the problem go away. Repro below:
>  
> {code}
> sql = SQLContext(sc)
> a = spark.createDataFrame([{'id': 'a', 'num': 2}, {'id':'b', 'num':1}])
> b = spark.createDataFrame([{'id': 'a', 'num': 2}, {'id':'b', 'num':1}])
> a.registerTempTable('a')
> b.registerTempTable('b')
> bug = sql.sql("""
> SELECT id,num,source FROM
> (
> SELECT id, num, 'a' as source FROM a
> UNION ALL
> SELECT id, num, 'b' as source FROM b
> ) AS c
> WHERE c.id IN (SELECT id FROM b WHERE num = 2)
> """)
> no_bug = sql.sql("""
> SELECT id,num,source FROM
> (
> SELECT id, num, 'b' as source FROM b
> UNION ALL
> SELECT id, num, 'a' as source FROM a
> ) AS c
> WHERE c.id IN (SELECT id FROM b WHERE num = 2)
> """)
> bug.show()
> no_bug.show()
> bug.explain(True)
> no_bug.explain(True)
> {code}
> This results in one extra row in the "bug" DF, coming from DF "b", that 
> should not be there, as it does not satisfy the WHERE .. IN condition:
> {code:java}
> >>> bug.show()
> +---+---+--+
> | id|num|source|
> +---+---+--+
> |  a|  2| a|
> |  a|  2| b|
> |  b|  1| b|
> +---+---+--+
> >>> no_bug.show()
> +---+---+--+
> | id|num|source|
> +---+---+--+
> |  a|  2| b|
> |  a|  2| a|
> +---+---+--+
> {code}
>  The reason can be seen in the query plans:
> {code:java}
> >>> bug.explain(True)
> ...
> == Optimized Logical Plan ==
> Union
> :- Project [id#0, num#1L, a AS source#136]
> :  +- Join LeftSemi, (id#0 = id#4)
> : :- LogicalRDD [id#0, num#1L], false
> : +- Project [id#4]
> :+- Filter (isnotnull(num#5L) && (num#5L = 2))
> :   +- LogicalRDD [id#4, num#5L], false
> +- Join LeftSemi, (id#4#172 = id#4#172)
>:- Project [id#4, num#5L, b AS source#137]
>:  +- LogicalRDD [id#4, num#5L], false
>+- Project [id#4 AS id#4#172]
>   +- Filter (isnotnull(num#5L) && (num#5L = 2))
>  +- LogicalRDD [id#4, num#5L], false
> {code}
> Note the line *+- Join LeftSemi, (id#4#172 = id#4#172)* - this condition 
> seems wrong, and I believe it causes the LeftSemi to return true for all rows 
> in the left-hand-side table, thus failing to filter as the WHERE .. IN 
> should. Compare with the non-buggy version, where both LeftSemi joins have 
> distinct #-things on both sides:
> {code:java}
> >>> no_bug.explain()
> ...
> == Optimized Logical Plan ==
> Union
> :- Project [id#4, num#5L, b AS source#142]
> :  +- Join LeftSemi, (id#4 = id#4#173)
> : :- LogicalRDD [id#4, num#5L], false
> : +- Project [id#4 AS id#4#173]
> :+- Filter (isnotnull(num#5L) && (num#5L = 2))
> :   +- LogicalRDD [id#4, num#5L], false
> +- Project [id#0, num#1L, a AS source#143]
>+- Join LeftSemi, (id#0 = id#4#173)
>   :- LogicalRDD [id#0, num#1L], false
>   +- Project [id#4 AS id#4#173]
>  +- Filter (isnotnull(num#5L) && (num#5L = 2))
> +- LogicalRDD [id#4, num#5L], false
> {code}
>  
> Best,
> -Arttu 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-26078) WHERE .. IN fails to filter rows when used in combination with UNION

2018-11-15 Thread Matt Cheah (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-26078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16688460#comment-16688460
 ] 

Matt Cheah commented on SPARK-26078:


In line with other correctness issues we have seen lately, and with other 
discussions around these kinds of problems, I'm elevating this to a blocker. 
[~cloud_fan] [~r...@databricks.com] - we should look to get this into 2.4.1 
and 3.0.

> WHERE .. IN fails to filter rows when used in combination with UNION
> 
>
> Key: SPARK-26078
> URL: https://issues.apache.org/jira/browse/SPARK-26078
> Project: Spark
>  Issue Type: Bug
>  Components: SQL
>Affects Versions: 2.3.1, 2.4.0
>Reporter: Arttu Voutilainen
>Priority: Major
>
> Hey,
> We encountered a case where Spark SQL does not seem to handle WHERE .. IN 
> correctly when used in combination with UNION, and instead also returns rows 
> that do not fulfill the condition. Swapping the order of the datasets in the 
> UNION makes the problem go away. Repro below:
>  
> {code}
> sql = SQLContext(sc)
> a = spark.createDataFrame([{'id': 'a', 'num': 2}, {'id':'b', 'num':1}])
> b = spark.createDataFrame([{'id': 'a', 'num': 2}, {'id':'b', 'num':1}])
> a.registerTempTable('a')
> b.registerTempTable('b')
> bug = sql.sql("""
> SELECT id,num,source FROM
> (
> SELECT id, num, 'a' as source FROM a
> UNION ALL
> SELECT id, num, 'b' as source FROM b
> ) AS c
> WHERE c.id IN (SELECT id FROM b WHERE num = 2)
> """)
> no_bug = sql.sql("""
> SELECT id,num,source FROM
> (
> SELECT id, num, 'b' as source FROM b
> UNION ALL
> SELECT id, num, 'a' as source FROM a
> ) AS c
> WHERE c.id IN (SELECT id FROM b WHERE num = 2)
> """)
> bug.show()
> no_bug.show()
> bug.explain(True)
> no_bug.explain(True)
> {code}
> This results in one extra row in the "bug" DF, coming from DF "b", that 
> should not be there, as it does not satisfy the WHERE .. IN condition:
> {code:java}
> >>> bug.show()
> +---+---+--+
> | id|num|source|
> +---+---+--+
> |  a|  2| a|
> |  a|  2| b|
> |  b|  1| b|
> +---+---+--+
> >>> no_bug.show()
> +---+---+--+
> | id|num|source|
> +---+---+--+
> |  a|  2| b|
> |  a|  2| a|
> +---+---+--+
> {code}
>  The reason can be seen in the query plans:
> {code:java}
> >>> bug.explain(True)
> ...
> == Optimized Logical Plan ==
> Union
> :- Project [id#0, num#1L, a AS source#136]
> :  +- Join LeftSemi, (id#0 = id#4)
> : :- LogicalRDD [id#0, num#1L], false
> : +- Project [id#4]
> :+- Filter (isnotnull(num#5L) && (num#5L = 2))
> :   +- LogicalRDD [id#4, num#5L], false
> +- Join LeftSemi, (id#4#172 = id#4#172)
>:- Project [id#4, num#5L, b AS source#137]
>:  +- LogicalRDD [id#4, num#5L], false
>+- Project [id#4 AS id#4#172]
>   +- Filter (isnotnull(num#5L) && (num#5L = 2))
>  +- LogicalRDD [id#4, num#5L], false
> {code}
> Note the line *+- Join LeftSemi, (id#4#172 = id#4#172)* - this condition 
> seems wrong, and I believe it causes the LeftSemi to return true for all rows 
> in the left-hand-side table, thus failing to filter as the WHERE .. IN 
> should. Compare with the non-buggy version, where both LeftSemi joins have 
> distinct #-things on both sides:
> {code:java}
> >>> no_bug.explain()
> ...
> == Optimized Logical Plan ==
> Union
> :- Project [id#4, num#5L, b AS source#142]
> :  +- Join LeftSemi, (id#4 = id#4#173)
> : :- LogicalRDD [id#4, num#5L], false
> : +- Project [id#4 AS id#4#173]
> :+- Filter (isnotnull(num#5L) && (num#5L = 2))
> :   +- LogicalRDD [id#4, num#5L], false
> +- Project [id#0, num#1L, a AS source#143]
>+- Join LeftSemi, (id#0 = id#4#173)
>   :- LogicalRDD [id#0, num#1L], false
>   +- Project [id#4 AS id#4#173]
>  +- Filter (isnotnull(num#5L) && (num#5L = 2))
> +- LogicalRDD [id#4, num#5L], false
> {code}
>  
> Best,
> -Arttu 
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25821) Remove SQLContext methods deprecated as of Spark 1.4

2018-11-13 Thread Matt Cheah (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25821?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685705#comment-16685705
 ] 

Matt Cheah commented on SPARK-25821:


Should we call this out in release notes? It seems pretty obscure and it's not 
clear if users would ever have called these APIs, but it might be worth a 
bullet point or two.

> Remove SQLContext methods deprecated as of Spark 1.4
> 
>
> Key: SPARK-25821
> URL: https://issues.apache.org/jira/browse/SPARK-25821
> Project: Spark
>  Issue Type: Task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Major
> Fix For: 3.0.0
>
>
> There are several SQLContext methods that have been deprecate since Spark 
> 1.4, like:
> {code:java}
> @deprecated("Use read.parquet() instead.", "1.4.0")
> @scala.annotation.varargs
> def parquetFile(paths: String*): DataFrame = {
>   if (paths.isEmpty) {
> emptyDataFrame
>   } else {
> read.parquet(paths : _*)
>   }
> }{code}
> Let's remove them in Spark 3.
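For reference, the non-deprecated equivalent is the DataFrameReader API; a 
minimal usage sketch, assuming a SparkSession named {{spark}} is in scope (as 
in spark-shell) and the paths are placeholders:
{code:scala}
// Replacement for sqlContext.parquetFile(paths: _*)
val df = spark.read.parquet("/data/part1", "/data/part2")
{code}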



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25862) Remove rangeBetween APIs introduced in SPARK-21608

2018-11-13 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25862?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-25862:
---
Labels: release-notes  (was: )

> Remove rangeBetween APIs introduced in SPARK-21608
> --
>
> Key: SPARK-25862
> URL: https://issues.apache.org/jira/browse/SPARK-25862
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>Priority: Major
>  Labels: release-notes
> Fix For: 3.0.0
>
>
> As a follow up to https://issues.apache.org/jira/browse/SPARK-25842, removing 
> the API so we can introduce a new one.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25862) Remove rangeBetween APIs introduced in SPARK-21608

2018-11-13 Thread Matt Cheah (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25862?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16685704#comment-16685704
 ] 

Matt Cheah commented on SPARK-25862:


Adding release-notes label because this seems like a breaking change.

> Remove rangeBetween APIs introduced in SPARK-21608
> --
>
> Key: SPARK-25862
> URL: https://issues.apache.org/jira/browse/SPARK-25862
> Project: Spark
>  Issue Type: Sub-task
>  Components: SQL
>Affects Versions: 2.4.0
>Reporter: Reynold Xin
>Assignee: Reynold Xin
>Priority: Major
>  Labels: release-notes
> Fix For: 3.0.0
>
>
> As a follow up to https://issues.apache.org/jira/browse/SPARK-25842, removing 
> the API so we can introduce a new one.
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25908) Remove old deprecated items in Spark 3

2018-11-13 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25908?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-25908:
---
Labels: release-notes  (was: )

> Remove old deprecated items in Spark 3
> --
>
> Key: SPARK-25908
> URL: https://issues.apache.org/jira/browse/SPARK-25908
> Project: Spark
>  Issue Type: Task
>  Components: Spark Core, SQL
>Affects Versions: 3.0.0
>Reporter: Sean Owen
>Assignee: Sean Owen
>Priority: Major
>  Labels: release-notes
> Fix For: 3.0.0
>
>
> There are many deprecated methods and classes in Spark. They _can_ be removed 
> in Spark 3, and for those that have been deprecated a long time (i.e. since 
> Spark <= 2.0), we should probably do so. This addresses most of these cases, 
> the easiest ones, those that are easy to remove and are old:
>  - Remove some AccumulableInfo .apply() methods
>  - Remove non-label-specific multiclass precision/recall/fScore in favor of 
> accuracy
>  - Remove toDegrees/toRadians in favor of degrees/radians (SparkR: only 
> deprecated)
>  - Remove approxCountDistinct in favor of approx_count_distinct (SparkR: only 
> deprecated)
>  - Remove unused Python StorageLevel constants
>  - Remove Dataset unionAll in favor of union
>  - Remove unused multiclass option in libsvm parsing
>  - Remove references to deprecated spark configs like spark.yarn.am.port
>  - Remove TaskContext.isRunningLocally
>  - Remove ShuffleMetrics.shuffle* methods
>  - Remove BaseReadWrite.context in favor of session
>  - Remove Column.!== in favor of =!=
>  - Remove Dataset.explode
>  - Remove Dataset.registerTempTable
>  - Remove SQLContext.getOrCreate, setActive, clearActive, constructors
> Not touched yet:
>  - everything else in MLLib
>  - HiveContext
>  - Anything deprecated more recently than 2.0.0, generally
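A few migration sketches for the removals listed above, assuming {{df}}, 
{{df1}} and {{df2}} are existing DataFrames (illustrative only):
{code:scala}
val combined = df1.union(df2)             // instead of df1.unionAll(df2)
val notEqual = df("a") =!= df("b")        // instead of df("a") !== df("b")
df.createOrReplaceTempView("my_table")    // instead of df.registerTempTable("my_table")
{code}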



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24834) Utils#nanSafeCompare{Double,Float} functions do not differ from normal java double/float comparison

2018-11-08 Thread Matt Cheah (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16680170#comment-16680170
 ] 

Matt Cheah commented on SPARK-24834:


[~srowen] - I know this is an old ticket, but I wanted to propose re-opening 
it and addressing it for Spark 3.0. My understanding is that this behavior is 
also not consistent with other SQL systems like MySQL and Postgres. Even 
though this would be a behavioral change, one could argue that it is a 
correctness issue, given what users would expect based on the behavior of 
those systems. Would it be reasonable to make the behavior change for Spark 
3.0 and call it out in the release notes?

> Utils#nanSafeCompare{Double,Float} functions do not differ from normal java 
> double/float comparison
> ---
>
> Key: SPARK-24834
> URL: https://issues.apache.org/jira/browse/SPARK-24834
> Project: Spark
>  Issue Type: Improvement
>  Components: Spark Core
>Affects Versions: 2.3.2
>Reporter: Benjamin Duffield
>Priority: Minor
>
> Utils.scala contains two functions `nanSafeCompareDoubles` and 
> `nanSafeCompareFloats` which purport to have special handling of NaN values 
> in comparisons.
> The handling in these functions does not appear to differ from 
> java.lang.Double.compare and java.lang.Float.compare - they seem to produce 
> identical output to the built-in Java comparison functions.
> I think it's clearer to not have these special Utils functions, and instead 
> just use the standard java comparison functions.
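For reference, a quick self-contained check of the built-in comparison's NaN 
and signed-zero ordering (standard JDK behavior, runnable in a Scala REPL):
{code:scala}
// java.lang.Double.compare already imposes a total order: NaN equals NaN and
// sorts above everything else, and -0.0 sorts before 0.0.
println(java.lang.Double.compare(Double.NaN, Double.NaN))              // 0
println(java.lang.Double.compare(Double.NaN, Double.PositiveInfinity)) // 1
println(java.lang.Double.compare(1.0, Double.NaN))                     // -1
println(java.lang.Double.compare(-0.0, 0.0))                           // -1
{code}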



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25875) Merge code to set up driver features for different languages

2018-11-02 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25875?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah resolved SPARK-25875.

Resolution: Fixed

> Merge code to set up driver features for different languages
> 
>
> Key: SPARK-25875
> URL: https://issues.apache.org/jira/browse/SPARK-25875
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Marcelo Vanzin
>Priority: Major
>
> This is the first step for SPARK-25874. Please refer to the parent bug for 
> details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25809) Support additional K8S cluster types for integration tests

2018-11-01 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25809?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah resolved SPARK-25809.

Resolution: Fixed

> Support additional K8S cluster types for integration tests
> --
>
> Key: SPARK-25809
> URL: https://issues.apache.org/jira/browse/SPARK-25809
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.2, 2.4.0
>Reporter: Rob Vesse
>Priority: Major
>
> Currently the Spark on K8S integration tests are hardcoded to use a 
> {{minikube}}-based backend.  It would be nice if developers had more 
> flexibility in the choice of K8S cluster they wish to use for integration 
> testing.  More specifically, it would be useful to be able to use the 
> built-in Kubernetes support in recent Docker releases and to just use a 
> generic K8S cluster.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24434) Support user-specified driver and executor pod templates

2018-10-30 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah resolved SPARK-24434.

Resolution: Fixed

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 
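For a rough idea of how such a template could be consumed, a sketch using the 
fabric8 Kubernetes client (the file path and the layering step are assumptions, 
not the final design):
{noformat}
// Hedged sketch: load a user-supplied pod template and use it as the starting
// point for the driver pod spec. The template path here is illustrative only.
import java.io.File
import io.fabric8.kubernetes.api.model.Pod
import io.fabric8.kubernetes.client.DefaultKubernetesClient

val client = new DefaultKubernetesClient()
val templatePod: Pod = client.pods()
  .load(new File("/opt/spark/conf/driver-pod-template.yaml"))
  .get()
// Spark would then layer its own containers, volumes and labels on top of
// this template instead of building the whole pod spec from configuration.
{noformat}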



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18278) SPIP: Support native submission of spark jobs to a kubernetes cluster

2018-10-15 Thread Matt Cheah (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16650524#comment-16650524
 ] 

Matt Cheah commented on SPARK-18278:


The fork is no longer being maintained, because Kubernetes support is now 
being built upstream and is available on mainline.

Spark 2.3 introduced basic support for Kubernetes. Spark 2.4 will have Python 
support. Dynamic allocation is being reworked, see [my comment 
above|https://issues.apache.org/jira/browse/SPARK-18278?focusedCommentId=16646282=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-16646282].

> SPIP: Support native submission of spark jobs to a kubernetes cluster
> -
>
> Key: SPARK-18278
> URL: https://issues.apache.org/jira/browse/SPARK-18278
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build, Deploy, Documentation, Kubernetes, Scheduler, 
> Spark Core
>Affects Versions: 2.3.0
>Reporter: Erik Erlandson
>Assignee: Anirudh Ramanathan
>Priority: Major
>  Labels: SPIP
> Attachments: SPARK-18278 Spark on Kubernetes Design Proposal Revision 
> 2 (1).pdf
>
>
> A new Apache Spark sub-project that enables native support for submitting 
> Spark applications to a Kubernetes cluster. The submitted application runs 
> in a driver executing on a Kubernetes pod, and executor lifecycles are also 
> managed as pods.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-18278) SPIP: Support native submission of spark jobs to a kubernetes cluster

2018-10-12 Thread Matt Cheah (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-18278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16648208#comment-16648208
 ] 

Matt Cheah commented on SPARK-18278:


[~liushaohui] a significant prerequisite for dynamic allocation is changing 
the way shuffle resiliency works; see 
https://issues.apache.org/jira/browse/SPARK-25299.

> SPIP: Support native submission of spark jobs to a kubernetes cluster
> -
>
> Key: SPARK-18278
> URL: https://issues.apache.org/jira/browse/SPARK-18278
> Project: Spark
>  Issue Type: Umbrella
>  Components: Build, Deploy, Documentation, Kubernetes, Scheduler, 
> Spark Core
>Affects Versions: 2.3.0
>Reporter: Erik Erlandson
>Assignee: Anirudh Ramanathan
>Priority: Major
>  Labels: SPIP
> Attachments: SPARK-18278 Spark on Kubernetes Design Proposal Revision 
> 2 (1).pdf
>
>
> A new Apache Spark sub-project that enables native support for submitting 
> Spark applications to a Kubernetes cluster. The submitted application runs 
> in a driver executing on a Kubernetes pod, and executor lifecycles are also 
> managed as pods.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23429) Add executor memory metrics to heartbeat and expose in executors REST API

2018-09-07 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23429?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah resolved SPARK-23429.

Resolution: Fixed

> Add executor memory metrics to heartbeat and expose in executors REST API
> -
>
> Key: SPARK-23429
> URL: https://issues.apache.org/jira/browse/SPARK-23429
> Project: Spark
>  Issue Type: Sub-task
>  Components: Spark Core
>Affects Versions: 2.2.1
>Reporter: Edwina Lu
>Priority: Major
>
> Add new executor-level memory metrics (jvmUsedMemory, onHeapExecutionMemory, 
> offHeapExecutionMemory, onHeapStorageMemory, offHeapStorageMemory, 
> onHeapUnifiedMemory, and offHeapUnifiedMemory), and expose these via the 
> executors REST API. This information will help provide insight into how 
> executor and driver JVM memory is used across the different memory regions. 
> It can be used to help determine good values for spark.executor.memory, 
> spark.driver.memory, spark.memory.fraction, and spark.memory.storageFraction.
> Add an ExecutorMetrics class, with jvmUsedMemory, onHeapExecutionMemory, 
> offHeapExecutionMemory, onHeapStorageMemory, and offHeapStorageMemory. This 
> will track the memory usage at the executor level. The new ExecutorMetrics 
> will be sent by executors to the driver as part of the Heartbeat. A heartbeat 
> will be added for the driver as well, to collect these metrics for the driver.
> Modify the EventLoggingListener to log ExecutorMetricsUpdate events if there 
> is a new peak value for one of the memory metrics for an executor and stage. 
> Only the ExecutorMetrics will be logged, and not the TaskMetrics, to minimize 
> additional logging. Analysis on a set of sample applications showed an 
> increase of 0.25% in the size of the Spark history log, with this approach.
> Modify the AppStatusListener to collect snapshots of peak values for each 
> memory metric. Each snapshot has the time, jvmUsedMemory, executionMemory and 
> storageMemory, and list of active stages.
> Add the new memory metrics (snapshots of peak values for each memory metric) 
> to the executors REST API.
> This is a subtask for SPARK-23206. Please refer to the design doc for that 
> ticket for more details.
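A rough sketch of the shape of this metrics payload and the peak tracking 
described above (field names follow the description; the layout is 
illustrative, not the merged implementation):
{noformat}
// Hedged sketch of executor-level memory metrics carried on the heartbeat,
// plus driver-side tracking of per-executor, per-stage peaks.
case class ExecutorMemoryMetrics(
    jvmUsedMemory: Long,
    onHeapExecutionMemory: Long,
    offHeapExecutionMemory: Long,
    onHeapStorageMemory: Long,
    offHeapStorageMemory: Long) {
  def onHeapUnifiedMemory: Long = onHeapExecutionMemory + onHeapStorageMemory
  def offHeapUnifiedMemory: Long = offHeapExecutionMemory + offHeapStorageMemory
}

// Only a new peak triggers an event-log entry, which keeps the history log small.
val peaks = scala.collection.mutable.Map.empty[(String, Int), ExecutorMemoryMetrics]

def recordIfNewPeak(executorId: String, stageId: Int, m: ExecutorMemoryMetrics): Boolean = {
  val key = (executorId, stageId)
  val isNewPeak = peaks.get(key).forall(_.jvmUsedMemory < m.jvmUsedMemory)
  if (isNewPeak) peaks(key) = m
  isNewPeak
}
{noformat}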



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25262) Make Spark local dir volumes configurable with Spark on Kubernetes

2018-09-06 Thread Matt Cheah (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25262?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16606522#comment-16606522
 ] 

Matt Cheah commented on SPARK-25262:


With [https://github.com/apache/spark/pull/22323] we allow using tmpfs, but 
other volume types still aren't supported. Is it fine to close this issue, or 
do we want to keep it open to track work on supporting other volume types?

> Make Spark local dir volumes configurable with Spark on Kubernetes
> --
>
> Key: SPARK-25262
> URL: https://issues.apache.org/jira/browse/SPARK-25262
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Rob Vesse
>Priority: Major
>
> As discussed during review of the design document for SPARK-24434, while 
> providing pod templates will allow more in-depth customisation for Spark on 
> Kubernetes, there are some things that cannot be modified because Spark code 
> generates pod specs in very specific ways.
> The particular issue identified relates to handling on {{spark.local.dirs}} 
> which is done by {{LocalDirsFeatureStep.scala}}.  For each directory 
> specified, or a single default if no explicit specification, it creates a 
> Kubernetes {{emptyDir}} volume.  As noted in the Kubernetes documentation 
> this will be backed by the node storage 
> (https://kubernetes.io/docs/concepts/storage/volumes/#emptydir).  In some 
> compute environments this may be extremely undesirable.  For example with 
> diskless compute resources the node storage will likely be a non-performant 
> remote mounted disk, often with limited capacity.  For such environments it 
> would likely be better to set {{medium: Memory}} on the volume per the K8S 
> documentation to use a {{tmpfs}} volume instead.
> Another closely related issue is that users might want to use a different 
> volume type to back the local directories and there is no possibility to do 
> that.
> Pod templates will not really solve either of these issues because Spark is 
> always going to attempt to generate a new volume for each local directory and 
> always going to set these as {{emptyDir}}.
> Therefore the proposal is to make two changes to {{LocalDirsFeatureStep}}:
> * Provide a new config setting to enable using {{tmpfs}} backed {{emptyDir}} 
> volumes
> * Modify the logic to check if there is a volume already defined with the 
> name and if so skip generating a volume definition for it
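For illustration, the kind of volume the first proposed change would produce, 
sketched with the fabric8 client (the boolean flag stands in for the proposed 
config setting, whose final name may differ):
{noformat}
// Hedged sketch: build the emptyDir volume for a local directory, optionally
// backed by tmpfs via medium: Memory.
import io.fabric8.kubernetes.api.model.{Volume, VolumeBuilder}

def localDirVolume(name: String, useTmpfs: Boolean): Volume =
  new VolumeBuilder()
    .withName(name)
    .withNewEmptyDir()
      // "Memory" asks Kubernetes for a tmpfs mount; null keeps node-backed storage.
      .withMedium(if (useTmpfs) "Memory" else null)
    .endEmptyDir()
    .build()

val volume = localDirVolume("spark-local-dir-1", useTmpfs = true)
{noformat}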



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25222) Spark on Kubernetes Pod Watcher dumps raw container status

2018-09-06 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah resolved SPARK-25222.

Resolution: Fixed

> Spark on Kubernetes Pod Watcher dumps raw container status
> --
>
> Key: SPARK-25222
> URL: https://issues.apache.org/jira/browse/SPARK-25222
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0, 2.3.1
>Reporter: Rob Vesse
>Priority: Minor
>
> Spark on Kubernetes provides logging of the pod/container status as a monitor 
> of the job progress.  However, the logger just dumps the raw container status 
> object, leading to fairly unreadable output like so:
> {noformat}
> 18/08/24 09:03:27 INFO LoggingPodStatusWatcherImpl: State changed, new state: 
>pod name: spark-groupby-1535101393784-driver
>namespace: default
>labels: spark-app-selector -> spark-47f7248122b9444b8d5fd3701028a1e8, 
> spark-role -> driver
>pod uid: 88de6467-a77c-11e8-b9da-a4bf0128b75b
>creation time: 2018-08-24T09:03:14Z
>service account name: spark
>volumes: spark-local-dir-1, spark-conf-volume, spark-token-kjxkv
>node name: tab-cmp4
>start time: 2018-08-24T09:03:14Z
>container images: rvesse/spark:latest
>phase: Running
>status: 
> [ContainerStatus(containerID=docker://23ae58571f59505e837dca40455d0347fb90e9b88e2a2b145a38e2919fceb447,
>  image=rvesse/spark:latest, 
> imageID=docker-pullable://rvesse/spark@sha256:92abf0b718743d0f5a26068fc94ec42233db0493c55a8570dc8c851c62a4bc0a,
>  lastState=ContainerState(running=null, terminated=null, waiting=null, 
> additionalProperties={}), name=spark-kubernetes-driver, ready=true, 
> restartCount=0, 
> state=ContainerState(running=ContainerStateRunning(startedAt=Time(time=2018-08-24T09:03:26Z,
>  additionalProperties={}), additionalProperties={}), terminated=null, 
> waiting=null, additionalProperties={}), additionalProperties={})]
> {noformat}
> The {{LoggingPodStatusWatcher}} actually already includes code to nicely 
> format this information but only invokes it at the end of the job:
> {noformat}
> 18/08/24 09:04:07 INFO LoggingPodStatusWatcherImpl: Container final statuses:
>  Container name: spark-kubernetes-driver
>  Container image: rvesse/spark:latest
>  Container state: Terminated
>  Exit code: 0
> {noformat}
> It would be nice if we consistently used this readable formatting throughout 
> the logging.
> We already have patched this on our internal fork and will upstream a fix 
> shortly.
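A minimal sketch of the per-container formatting being asked for, using the 
fabric8 model classes (the helper name is illustrative):
{noformat}
// Hedged sketch: render a fabric8 ContainerStatus in the readable form shown
// in the "final statuses" output, instead of dumping the raw object.
import io.fabric8.kubernetes.api.model.ContainerStatus

def formatContainerStatus(status: ContainerStatus): String = {
  val state = status.getState
  val describedState =
    if (state.getRunning != null) "Running"
    else if (state.getWaiting != null) s"Waiting (${state.getWaiting.getReason})"
    else if (state.getTerminated != null)
      s"Terminated (exit code ${state.getTerminated.getExitCode})"
    else "Unknown"
  s"""Container name: ${status.getName}
     |Container image: ${status.getImage}
     |Container state: $describedState""".stripMargin
}
{noformat}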



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25299) Use remote storage for persisting shuffle data

2018-09-04 Thread Matt Cheah (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16603807#comment-16603807
 ] 

Matt Cheah commented on SPARK-25299:


(Changed the title to "remote storage" for a little more generalization)

> Use remote storage for persisting shuffle data
> --
>
> Key: SPARK-25299
> URL: https://issues.apache.org/jira/browse/SPARK-25299
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Affects Versions: 2.4.0
>Reporter: Matt Cheah
>Priority: Major
>
> In Spark, the shuffle primitive requires Spark executors to persist data to 
> the local disk of the worker nodes. If executors crash, the external shuffle 
> service can continue to serve the shuffle data that was written beyond the 
> lifetime of the executor itself. In YARN, Mesos, and Standalone mode, the 
> external shuffle service is deployed on every worker node. The shuffle 
> service shares local disk with the executors that run on its node.
> There are some shortcomings with the way shuffle is fundamentally implemented 
> right now. Particularly:
>  * If any external shuffle service process or node becomes unavailable, all 
> applications that had an executor that ran on that node must recompute the 
> shuffle blocks that were lost.
>  * Similarly to the above, the external shuffle service must be kept running 
> at all times, which may waste resources when no applications are using that 
> shuffle service node.
>  * Mounting local storage can prevent users from taking advantage of 
> desirable isolation benefits from using containerized environments, like 
> Kubernetes. We had an external shuffle service implementation in an early 
> prototype of the Kubernetes backend, but it was rejected due to its strict 
> requirement to be able to mount hostPath volumes or other persistent volume 
> setups.
> In the following [architecture discussion 
> document|https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.btqugnmt2h40]
>  (note: _not_ an SPIP), we brainstorm various high level architectures for 
> improving the external shuffle service in a way that addresses the above 
> problems. The purpose of this umbrella JIRA is to promote additional 
> discussion on how we can approach these problems, both at the architecture 
> level and the implementation level. We anticipate filing sub-issues that 
> break down the tasks that must be completed to achieve this goal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-25299) Use remote storage for persisting shuffle data

2018-09-04 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-25299:
---
Summary: Use remote storage for persisting shuffle data  (was: Use 
distributed storage for persisting shuffle data)

> Use remote storage for persisting shuffle data
> --
>
> Key: SPARK-25299
> URL: https://issues.apache.org/jira/browse/SPARK-25299
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Affects Versions: 2.4.0
>Reporter: Matt Cheah
>Priority: Major
>
> In Spark, the shuffle primitive requires Spark executors to persist data to 
> the local disk of the worker nodes. If executors crash, the external shuffle 
> service can continue to serve the shuffle data that was written beyond the 
> lifetime of the executor itself. In YARN, Mesos, and Standalone mode, the 
> external shuffle service is deployed on every worker node. The shuffle 
> service shares local disk with the executors that run on its node.
> There are some shortcomings with the way shuffle is fundamentally implemented 
> right now. Particularly:
>  * If any external shuffle service process or node becomes unavailable, all 
> applications that had an executor that ran on that node must recompute the 
> shuffle blocks that were lost.
>  * Similarly to the above, the external shuffle service must be kept running 
> at all times, which may waste resources when no applications are using that 
> shuffle service node.
>  * Mounting local storage can prevent users from taking advantage of 
> desirable isolation benefits from using containerized environments, like 
> Kubernetes. We had an external shuffle service implementation in an early 
> prototype of the Kubernetes backend, but it was rejected due to its strict 
> requirement to be able to mount hostPath volumes or other persistent volume 
> setups.
> In the following [architecture discussion 
> document|https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.btqugnmt2h40]
>  (note: _not_ an SPIP), we brainstorm various high level architectures for 
> improving the external shuffle service in a way that addresses the above 
> problems. The purpose of this umbrella JIRA is to promote additional 
> discussion on how we can approach these problems, both at the architecture 
> level and the implementation level. We anticipate filing sub-issues that 
> break down the tasks that must be completed to achieve this goal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24434) Support user-specified driver and executor pod templates

2018-09-01 Thread Matt Cheah (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24434?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599697#comment-16599697
 ] 

Matt Cheah commented on SPARK-24434:


Everyone, thank you for your contribution to this discussion. It is important 
for us to agree upon constructive next steps to avoid this kind of 
miscommunication in the future.

I have a few notes to make from having collaborated with Yifei and Onur on this 
patch and on behalf of Palantir.

Firstly, we apologize that we did not communicate clearly enough on Apache 
communication channels that we were working on this, and the urgency of which 
we needed this work done. We agree with [~felixcheung]'s assessment that notes 
from the weekly meetings that have bearing on Spark development should be sent 
back to the wider community. We are specifically sorry for not having said 
something to the effect of "I am taking a stab at implementing this at 
https://github.com/... . Stavros, are you cool with that?" Palantir and the 
Kubernetes Big Data group must improve our communication next time.

Secondly, we would suggest that a work in progress patch proposed early on in 
the feature's development would have been helpful for users to prepare to be 
able to use this feature in their internal tools. It's helpful for everyone to 
see the API and expected behavior of a new feature so that they can plan to 
take advantage of that feature ahead of time.

Thirdly, a small clarifying comment on timelines and urgency. While we don't 
see the need for this to be in Spark 2.4, we will be taking the patch ahead of 
time on our fork of Spark which follows the Apache master branch (see 
[https://github.com/palantir/spark]). We were hoping to cherry-pick this patch 
soon but could have been clearer in our communication of this need.

Fourthly, we are sorry for the wording in "On 15 Aug it was discussed that as 
Stavros Kontopoulos was out, and was not actively working on this PR at that 
moment, Yifei Huang and I can take over and start working on this.”: Instead of 
“take over”, we should have said “contribute to this feature” in this comment 
specifically.

Finally, moving forward we are happy to collaborate on what the community 
believes to be the best implementation of this feature. We are happy to use 
Onur's, but we can also use Stavros's. Regardless of the chosen implementation, 
credit should be given to all parties. For example if Onur's implementation is 
chosen, Stavros's design work should be called out in the pull request 
description. Either way, we would like to see this feature merged by Friday, 
September 07, though this will have to be delayed if the Spark 2.4 release 
branch is not cut before that time (since we don't want this going into master 
and ending up in Spark 2.4 as a result).

We are open to feedback on any of the above points and suggestions on how we 
can improve the way we contribute to Spark in the future.

> Support user-specified driver and executor pod templates
> 
>
> Key: SPARK-24434
> URL: https://issues.apache.org/jira/browse/SPARK-24434
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> With more requests for customizing the driver and executor pods coming, the 
> current approach of adding new Spark configuration options has some serious 
> drawbacks: 1) it means more Kubernetes specific configuration options to 
> maintain, and 2) it widens the gap between the declarative model used by 
> Kubernetes and the configuration model used by Spark. We should start 
> designing a solution that allows users to specify pod templates as central 
> places for all customization needs for the driver and executor pods. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-25299) Use distributed storage for persisting shuffle data

2018-08-31 Thread Matt Cheah (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599428#comment-16599428
 ] 

Matt Cheah commented on SPARK-25299:


 

Note that SPARK-1529 was a much earlier feature request that is more or less 
identical to this one, but the old age of SPARK-1529 led me to open this newer 
issue instead of re-opening the old one. If it is preferable to use the old 
issue we can do that as well.

> Use distributed storage for persisting shuffle data
> ---
>
> Key: SPARK-25299
> URL: https://issues.apache.org/jira/browse/SPARK-25299
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Affects Versions: 2.4.0
>Reporter: Matt Cheah
>Priority: Major
>
> In Spark, the shuffle primitive requires Spark executors to persist data to 
> the local disk of the worker nodes. If executors crash, the external shuffle 
> service can continue to serve the shuffle data that was written beyond the 
> lifetime of the executor itself. In YARN, Mesos, and Standalone mode, the 
> external shuffle service is deployed on every worker node. The shuffle 
> service shares local disk with the executors that run on its node.
> There are some shortcomings with the way shuffle is fundamentally implemented 
> right now. Particularly:
>  * If any external shuffle service process or node becomes unavailable, all 
> applications that had an executor that ran on that node must recompute the 
> shuffle blocks that were lost.
>  * Similarly to the above, the external shuffle service must be kept running 
> at all times, which may waste resources when no applications are using that 
> shuffle service node.
>  * Mounting local storage can prevent users from taking advantage of 
> desirable isolation benefits from using containerized environments, like 
> Kubernetes. We had an external shuffle service implementation in an early 
> prototype of the Kubernetes backend, but it was rejected due to its strict 
> requirement to be able to mount hostPath volumes or other persistent volume 
> setups.
> In the following [architecture discussion 
> document|https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.btqugnmt2h40]
>  (note: _not_ an SPIP), we brainstorm various high level architectures for 
> improving the external shuffle service in a way that addresses the above 
> problems. The purpose of this umbrella JIRA is to promote additional 
> discussion on how we can approach these problems, both at the architecture 
> level and the implementation level. We anticipate filing sub-issues that 
> break down the tasks that must be completed to achieve this goal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-25299) Use distributed storage for persisting shuffle data

2018-08-31 Thread Matt Cheah (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-25299?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16599428#comment-16599428
 ] 

Matt Cheah edited comment on SPARK-25299 at 9/1/18 12:27 AM:
-

Note that SPARK-1529 was a much earlier feature request that is more or less 
identical to this one, but the old age of SPARK-1529 led me to open this newer 
issue instead of re-opening the old one. If it is preferable to use the old 
issue we can do that as well.


was (Author: mcheah):
 

Note that SPARK-1529 was a much earlier feature request that is more or less 
identical to this one, but the old age of SPARK-1529 led me to open this newer 
issue instead of re-opening the old one. If it is preferable to use the old 
issue we can do that as well.

> Use distributed storage for persisting shuffle data
> ---
>
> Key: SPARK-25299
> URL: https://issues.apache.org/jira/browse/SPARK-25299
> Project: Spark
>  Issue Type: New Feature
>  Components: Shuffle
>Affects Versions: 2.4.0
>Reporter: Matt Cheah
>Priority: Major
>
> In Spark, the shuffle primitive requires Spark executors to persist data to 
> the local disk of the worker nodes. If executors crash, the external shuffle 
> service can continue to serve the shuffle data that was written beyond the 
> lifetime of the executor itself. In YARN, Mesos, and Standalone mode, the 
> external shuffle service is deployed on every worker node. The shuffle 
> service shares local disk with the executors that run on its node.
> There are some shortcomings with the way shuffle is fundamentally implemented 
> right now. Particularly:
>  * If any external shuffle service process or node becomes unavailable, all 
> applications that had an executor that ran on that node must recompute the 
> shuffle blocks that were lost.
>  * Similarly to the above, the external shuffle service must be kept running 
> at all times, which may waste resources when no applications are using that 
> shuffle service node.
>  * Mounting local storage can prevent users from taking advantage of 
> desirable isolation benefits from using containerized environments, like 
> Kubernetes. We had an external shuffle service implementation in an early 
> prototype of the Kubernetes backend, but it was rejected due to its strict 
> requirement to be able to mount hostPath volumes or other persistent volume 
> setups.
> In the following [architecture discussion 
> document|https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.btqugnmt2h40]
>  (note: _not_ an SPIP), we brainstorm various high level architectures for 
> improving the external shuffle service in a way that addresses the above 
> problems. The purpose of this umbrella JIRA is to promote additional 
> discussion on how we can approach these problems, both at the architecture 
> level and the implementation level. We anticipate filing sub-issues that 
> break down the tasks that must be completed to achieve this goal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25299) Use distributed storage for persisting shuffle data

2018-08-31 Thread Matt Cheah (JIRA)
Matt Cheah created SPARK-25299:
--

 Summary: Use distributed storage for persisting shuffle data
 Key: SPARK-25299
 URL: https://issues.apache.org/jira/browse/SPARK-25299
 Project: Spark
  Issue Type: New Feature
  Components: Shuffle
Affects Versions: 2.4.0
Reporter: Matt Cheah


In Spark, the shuffle primitive requires Spark executors to persist data to the 
local disk of the worker nodes. If executors crash, the external shuffle 
service can continue to serve the shuffle data that was written beyond the 
lifetime of the executor itself. In YARN, Mesos, and Standalone mode, the 
external shuffle service is deployed on every worker node. The shuffle service 
shares local disk with the executors that run on its node.

There are some shortcomings with the way shuffle is fundamentally implemented 
right now. Particularly:
 * If any external shuffle service process or node becomes unavailable, all 
applications that had an executor that ran on that node must recompute the 
shuffle blocks that were lost.
 * Similarly to the above, the external shuffle service must be kept running at 
all times, which may waste resources when no applications are using that 
shuffle service node.
 * Mounting local storage can prevent users from taking advantage of desirable 
isolation benefits from using containerized environments, like Kubernetes. We 
had an external shuffle service implementation in an early prototype of the 
Kubernetes backend, but it was rejected due to its strict requirement to be 
able to mount hostPath volumes or other persistent volume setups.

In the following [architecture discussion 
document|https://docs.google.com/document/d/1uCkzGGVG17oGC6BJ75TpzLAZNorvrAU3FRd2X-rVHSM/edit#heading=h.btqugnmt2h40]
 (note: _not_ an SPIP), we brainstorm various high level architectures for 
improving the external shuffle service in a way that addresses the above 
problems. The purpose of this umbrella JIRA is to promote additional discussion 
on how we can approach these problems, both at the architecture level and the 
implementation level. We anticipate filing sub-issues that break down the tasks 
that must be completed to achieve this goal.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-25264) Fix comma-delineated arguments passed into PythonRunner and RRunner

2018-08-31 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-25264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah resolved SPARK-25264.

   Resolution: Fixed
Fix Version/s: 2.4.0

> Fix comma-delineated arguments passed into PythonRunner and RRunner
> ---
>
> Key: SPARK-25264
> URL: https://issues.apache.org/jira/browse/SPARK-25264
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, PySpark
>Affects Versions: 2.4.0
>Reporter: Ilan Filonenko
>Priority: Major
> Fix For: 2.4.0
>
>
> The arguments passed into the PythonRunner and RRunner are comma-delimited. 
> Because the Runners do an arg.slice(2,...), the delimiter in the entrypoint 
> needs to be a space, as expected by the Runner arguments. 
> This issue was logged here: 
> [https://github.com/GoogleCloudPlatform/spark-on-k8s-operator/issues/273]



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-25152) Enable Spark on Kubernetes R Integration Tests

2018-08-17 Thread Matt Cheah (JIRA)
Matt Cheah created SPARK-25152:
--

 Summary: Enable Spark on Kubernetes R Integration Tests
 Key: SPARK-25152
 URL: https://issues.apache.org/jira/browse/SPARK-25152
 Project: Spark
  Issue Type: Test
  Components: Kubernetes, SparkR
Affects Versions: 2.4.0
Reporter: Matt Cheah


We merged [https://github.com/apache/spark/pull/21584] for SPARK-24433 but we 
had to turn off the integration tests due to issues with the Jenkins 
environment. Re-enable the tests after the environment is fixed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24433) Add Spark R support

2018-08-17 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-24433:
---
Fix Version/s: 2.4.0

> Add Spark R support
> ---
>
> Key: SPARK-24433
> URL: https://issues.apache.org/jira/browse/SPARK-24433
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
> Fix For: 2.4.0
>
>
> This is the ticket to track work on adding support for R binding into the 
> Kubernetes mode. The feature is available in our fork at 
> github.com/apache-spark-on-k8s/spark and needs to be upstreamed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24433) Add Spark R support

2018-08-17 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24433?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah resolved SPARK-24433.

Resolution: Fixed

> Add Spark R support
> ---
>
> Key: SPARK-24433
> URL: https://issues.apache.org/jira/browse/SPARK-24433
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Yinan Li
>Priority: Major
>
> This is the ticket to track work on adding support for R binding into the 
> Kubernetes mode. The feature is available in our fork at 
> github.com/apache-spark-on-k8s/spark and needs to be upstreamed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24960) k8s: explicitly expose ports on driver container

2018-08-01 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah resolved SPARK-24960.

   Resolution: Fixed
Fix Version/s: 2.4.0

> k8s: explicitly expose ports on driver container
> 
>
> Key: SPARK-24960
> URL: https://issues.apache.org/jira/browse/SPARK-24960
> Project: Spark
>  Issue Type: Improvement
>  Components: Deploy, Kubernetes, Scheduler
>Affects Versions: 2.2.0
>Reporter: Adelbert Chang
>Priority: Minor
> Fix For: 2.4.0
>
>
> For the Kubernetes scheduler, the Driver Pod does not explicitly expose its 
> ports. It is possible for a Kubernetes environment to be setup such that Pod 
> ports are closed by default and must be opened explicitly in the Pod spec. In 
> such an environment without this improvement the Driver Service will be 
> unable to route requests (e.g. from the Executors) to the corresponding 
> Driver Pod, which can be observed on the Executor side with this error 
> message:
> {noformat}
> Caused by: java.io.IOException: Failed to connect to 
> org-apache-spark-examples-sparkpi-1519271450264-driver-svc.dev.svc.cluster.local:7078{noformat}
>  
> For posterity, this is a copy of the [original 
> issue|https://github.com/apache-spark-on-k8s/spark/issues/617] filed in the 
> now deprecated {{apache-spark-on-k8s}} repository.
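For illustration, the kind of explicit port declaration this adds to the 
driver container spec (the names and numbers below mirror common Spark-on-K8S 
defaults and are examples, not the exact patch):
{noformat}
// Hedged sketch: declare driver ports explicitly so clusters that close pod
// ports by default can still route executor traffic to the driver.
import io.fabric8.kubernetes.api.model.{ContainerPort, ContainerPortBuilder}

def driverPort(name: String, port: Int): ContainerPort =
  new ContainerPortBuilder()
    .withName(name)
    .withContainerPort(port)
    .build()

val driverPorts = Seq(
  driverPort("driver-rpc-port", 7078),   // spark.driver.port
  driverPort("blockmanager", 7079),      // spark.blockManager.port
  driverPort("spark-ui", 4040)           // spark.ui.port
)
{noformat}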



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24963) Integration tests will fail if they run in a namespace not being the default

2018-07-30 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24963?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah resolved SPARK-24963.

   Resolution: Fixed
Fix Version/s: 2.4.0

> Integration tests will fail if they run in a namespace not being the default
> 
>
> Key: SPARK-24963
> URL: https://issues.apache.org/jira/browse/SPARK-24963
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Stavros Kontopoulos
>Priority: Minor
> Fix For: 2.4.0
>
>
> Related discussion is here: 
> [https://github.com/apache/spark/pull/21748#pullrequestreview-141048893]
> If spark-rbac.yaml is used when the tests are run locally, client mode tests 
> will fail.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23146) Support client mode for Kubernetes cluster backend

2018-07-25 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23146?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah resolved SPARK-23146.

   Resolution: Fixed
Fix Version/s: 2.4.0

> Support client mode for Kubernetes cluster backend
> --
>
> Key: SPARK-23146
> URL: https://issues.apache.org/jira/browse/SPARK-23146
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Anirudh Ramanathan
>Priority: Major
> Fix For: 2.4.0
>
>
> This issue tracks client mode support within Spark when running in the 
> Kubernetes cluster backend.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24825) [K8S][TEST] Kubernetes integration tests don't trace the maven project dependency structure

2018-07-17 Thread Matt Cheah (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24825?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16547137#comment-16547137
 ] 

Matt Cheah commented on SPARK-24825:


We're looking into this now; this particular phase was built out by myself, 
[~ssuchter], [~foxish], and [~ifilonenko]. We are also consulting with some 
folks from RiseLab - [~shaneknapp]. I think we really shouldn't have to run a 
Maven install; the multi-module build should pick up the other modules 
properly. We've likely configured the Maven reactor incorrectly in the 
integration test's pom.xml.

> [K8S][TEST] Kubernetes integration tests don't trace the maven project 
> dependency structure
> ---
>
> Key: SPARK-24825
> URL: https://issues.apache.org/jira/browse/SPARK-24825
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 2.4.0
>Reporter: Matt Cheah
>Priority: Critical
>
> The Kubernetes integration tests will currently fail if maven installation is 
> not performed first, because the integration test build believes it should be 
> pulling the Spark parent artifact from maven central. However, this is 
> incorrect because the integration test should be building the Spark parent 
> pom as a dependency in the multi-module build, and the integration test 
> should just use the dynamically built artifact. Or to put it another way, the 
> integration test builds should never be pulling Spark dependencies from maven 
> central.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24825) [K8S][TEST] Kubernetes integration tests don't trace the maven project dependency structure

2018-07-16 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-24825:
---
Issue Type: Bug  (was: Improvement)

> [K8S][TEST] Kubernetes integration tests don't trace the maven project 
> dependency structure
> ---
>
> Key: SPARK-24825
> URL: https://issues.apache.org/jira/browse/SPARK-24825
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 2.4.0
>Reporter: Matt Cheah
>Priority: Major
>
> The Kubernetes integration tests will currently fail if maven installation is 
> not performed first, because the integration test build believes it should be 
> pulling the Spark parent artifact from maven central. However, this is 
> incorrect because the integration test should be building the Spark parent 
> pom as a dependency in the multi-module build, and the integration test 
> should just use the dynamically built artifact. Or to put it another way, the 
> integration test builds should never be pulling Spark dependencies from maven 
> central.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24825) [K8S][TEST] Kubernetes integration tests don't trace the maven project dependency structure

2018-07-16 Thread Matt Cheah (JIRA)
Matt Cheah created SPARK-24825:
--

 Summary: [K8S][TEST] Kubernetes integration tests don't trace the 
maven project dependency structure
 Key: SPARK-24825
 URL: https://issues.apache.org/jira/browse/SPARK-24825
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes, Tests
Affects Versions: 2.4.0
Reporter: Matt Cheah


The Kubernetes integration tests will currently fail if maven installation is 
not performed first, because the integration test build believes it should be 
pulling the Spark parent artifact from maven central. However, this is 
incorrect because the integration test should be building the Spark parent pom 
as a dependency in the multi-module build, and the integration test should just 
use the dynamically built artifact. Or to put it another way, the integration 
test builds should never be pulling Spark dependencies from maven central.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Updated] (SPARK-24825) [K8S][TEST] Kubernetes integration tests don't trace the maven project dependency structure

2018-07-16 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24825?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah updated SPARK-24825:
---
Priority: Critical  (was: Major)

> [K8S][TEST] Kubernetes integration tests don't trace the maven project 
> dependency structure
> ---
>
> Key: SPARK-24825
> URL: https://issues.apache.org/jira/browse/SPARK-24825
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes, Tests
>Affects Versions: 2.4.0
>Reporter: Matt Cheah
>Priority: Critical
>
> The Kubernetes integration tests will currently fail if maven installation is 
> not performed first, because the integration test build believes it should be 
> pulling the Spark parent artifact from maven central. However, this is 
> incorrect because the integration test should be building the Spark parent 
> pom as a dependency in the multi-module build, and the integration test 
> should just use the dynamically built artifact. Or to put it another way, the 
> integration test builds should never be pulling Spark dependencies from maven 
> central.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24683) SparkLauncher.NO_RESOURCE doesn't work with Java applications

2018-07-02 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24683?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah resolved SPARK-24683.

   Resolution: Fixed
Fix Version/s: 2.4.0

> SparkLauncher.NO_RESOURCE doesn't work with Java applications
> -
>
> Key: SPARK-24683
> URL: https://issues.apache.org/jira/browse/SPARK-24683
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Matt Cheah
>Priority: Critical
> Fix For: 2.4.0
>
>
> On the tip of master, after we merged the Python bindings support, Spark 
> Submit on Kubernetes no longer works with {{SparkLauncher.NO_RESOURCE}}. 
> This is because spark-submit will not set the main class as a child argument.
> See here: 
> https://github.com/apache/spark/blob/1a644afbac35c204f9ad55f86999319a9ab458c6/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L694
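For context, a sketch of the launcher-side usage this breaks (master URL, 
image, and class name are placeholders):
{noformat}
// Hedged sketch: launch a JVM application on Kubernetes without a primary
// resource, relying on spark-submit to forward the main class.
import org.apache.spark.launcher.SparkLauncher

val handle = new SparkLauncher()
  .setMaster("k8s://https://example-apiserver:443")
  .setDeployMode("cluster")
  .setMainClass("com.example.MyApp")
  .setAppResource(SparkLauncher.NO_RESOURCE)
  .setConf("spark.kubernetes.container.image", "example/spark:latest")
  .startApplication()
{noformat}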



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24683) SparkLauncher.NO_RESOURCE doesn't work with Java applications

2018-06-28 Thread Matt Cheah (JIRA)
Matt Cheah created SPARK-24683:
--

 Summary: SparkLauncher.NO_RESOURCE doesn't work with Java 
applications
 Key: SPARK-24683
 URL: https://issues.apache.org/jira/browse/SPARK-24683
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.4.0
Reporter: Matt Cheah


On the tip of master, after we merged the Python bindings support, Spark Submit 
on Kubernetes no longer works with {{SparkLauncher.NO_RESOURCE}}. This is 
because spark-submit will not set the main class as a child argument.

See here: 
https://github.com/apache/spark/blob/1a644afbac35c204f9ad55f86999319a9ab458c6/core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala#L694



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24655) [K8S] Custom Docker Image Expectations and Documentation

2018-06-25 Thread Matt Cheah (JIRA)
Matt Cheah created SPARK-24655:
--

 Summary: [K8S] Custom Docker Image Expectations and Documentation
 Key: SPARK-24655
 URL: https://issues.apache.org/jira/browse/SPARK-24655
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.3.1
Reporter: Matt Cheah


A common use case we want to support with Kubernetes is the usage of custom 
Docker images. Some examples include:
 * A user builds an application using Gradle or Maven, using Spark as a 
compile-time dependency. The application's jars (both the custom-written jars 
and the dependencies) need to be packaged in a docker image that can be run via 
spark-submit.
 * A user builds a PySpark or R application and desires to include custom 
dependencies
 * A user wants to switch the base image from Alpine to CentOS while using 
either built-in or custom jars

We currently do not document how these custom Docker images are supposed to be 
built, nor do we guarantee stability of these Docker images with various 
spark-submit versions. To illustrate how this can break down, suppose for 
example we decide to change the names of environment variables that denote the 
driver/executor extra JVM options specified by 
{{spark.[driver|executor].extraJavaOptions}}. If we change the environment 
variable spark-submit provides then the user must update their custom 
Dockerfile and build new images.

Rather than jumping to an implementation immediately though, it's worth taking 
a step back and considering these matters from the perspective of the end user. 
Towards that end, this ticket will serve as a forum where we can answer at 
least the following questions, and any others pertaining to the matter:
 # What would be the steps a user would need to take to build a custom Docker 
image, given their desire to customize the dependencies and the content (OS or 
otherwise) of said images?
 # How can we ensure the user does not need to rebuild the image if only the 
spark-submit version changes?

The end deliverable for this ticket is a design document, and then we'll create 
sub-issues for the technical implementation and documentation of the contract.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24248) [K8S] Use the Kubernetes cluster as the backing store for the state of pods

2018-06-18 Thread Matt Cheah (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516368#comment-16516368
 ] 

Matt Cheah commented on SPARK-24248:


I've summarized what we ended up going with after some iteration on the PR 
here: 
[https://docs.google.com/document/d/1BWTK76k2242spz66JOx8SKKEl5qFV6Cg9ASxCdmIWbY/edit?usp=sharing].
 Recommendations are still welcome and can be worked on in follow up patches.

> [K8S] Use the Kubernetes cluster as the backing store for the state of pods
> ---
>
> Key: SPARK-24248
> URL: https://issues.apache.org/jira/browse/SPARK-24248
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
> Fix For: 2.4.0
>
>
> We have a number of places in KubernetesClusterSchedulerBackend right now 
> that maintain the state of pods in memory. However, the Kubernetes API can 
> always give us the most up to date and correct view of what our executors are 
> doing. We should consider moving away from in-memory state as much as we can 
> in favor of using the Kubernetes cluster as the source of truth for pod 
> status. Maintaining less state in memory lowers the chance that we 
> accidentally miss updating one of these data structures and break the 
> lifecycle of executors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24584) [K8s] More efficient storage of executor pod state

2018-06-18 Thread Matt Cheah (JIRA)


[ 
https://issues.apache.org/jira/browse/SPARK-24584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16516300#comment-16516300
 ] 

Matt Cheah commented on SPARK-24584:


Related to https://issues.apache.org/jira/browse/SPARK-24248

> [K8s] More efficient storage of executor pod state
> --
>
> Key: SPARK-24584
> URL: https://issues.apache.org/jira/browse/SPARK-24584
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Matt Cheah
>Priority: Major
>
> Currently we store buffers of snapshots in {{ExecutorPodsSnapshotStore}}, 
> where the snapshots are duplicated per subscriber. With hundreds or maybe 
> thousands of executors, this buffering may become untenable. Investigate 
> storing less state while still maintaining the same level-triggered semantics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24584) [K8s] More efficient storage of executor pod state

2018-06-18 Thread Matt Cheah (JIRA)
Matt Cheah created SPARK-24584:
--

 Summary: [K8s] More efficient storage of executor pod state
 Key: SPARK-24584
 URL: https://issues.apache.org/jira/browse/SPARK-24584
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.4.0
Reporter: Matt Cheah


Currently we store buffers of snapshots in {{ExecutorPodsSnapshotStore}}, where 
the snapshots are duplicated per subscriber. With hundreds or maybe thousands 
of executors, this buffering may become untenable. Investigate storing less 
state while still maintaining the same level-triggered semantics.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-24248) [K8S] Use the Kubernetes cluster as the backing store for the state of pods

2018-06-14 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-24248?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah resolved SPARK-24248.

   Resolution: Fixed
Fix Version/s: 2.4.0

> [K8S] Use the Kubernetes cluster as the backing store for the state of pods
> ---
>
> Key: SPARK-24248
> URL: https://issues.apache.org/jira/browse/SPARK-24248
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
> Fix For: 2.4.0
>
>
> We have a number of places in KubernetesClusterSchedulerBackend right now 
> that maintain the state of pods in memory. However, the Kubernetes API can 
> always give us the most up to date and correct view of what our executors are 
> doing. We should consider moving away from in-memory state as much as we can 
> in favor of using the Kubernetes cluster as the source of truth for pod 
> status. Maintaining less state in memory lowers the chance that we 
> accidentally miss updating one of these data structures and break the 
> lifecycle of executors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23010) Add integration testing for Kubernetes backend into the apache/spark repository

2018-06-08 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah resolved SPARK-23010.

   Resolution: Fixed
Fix Version/s: 2.4.0

Some tests are missing but we got a basic set of the tests and the 
infrastructure into master.

> Add integration testing for Kubernetes backend into the apache/spark 
> repository
> ---
>
> Key: SPARK-23010
> URL: https://issues.apache.org/jira/browse/SPARK-23010
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes
>Affects Versions: 2.4.0
>Reporter: Anirudh Ramanathan
>Priority: Major
> Fix For: 2.4.0
>
>
> Add tests for the scheduler backend into apache/spark
> /xref: 
> http://apache-spark-developers-list.1001551.n3.nabble.com/Integration-testing-and-Scheduler-Backends-td23105.html



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Resolved] (SPARK-23984) PySpark Bindings for K8S

2018-06-08 Thread Matt Cheah (JIRA)


 [ 
https://issues.apache.org/jira/browse/SPARK-23984?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Cheah resolved SPARK-23984.

   Resolution: Fixed
Fix Version/s: 2.4.0

> PySpark Bindings for K8S
> 
>
> Key: SPARK-23984
> URL: https://issues.apache.org/jira/browse/SPARK-23984
> Project: Spark
>  Issue Type: New Feature
>  Components: Kubernetes, PySpark
>Affects Versions: 2.3.0
>Reporter: Ilan Filonenko
>Priority: Major
> Fix For: 2.4.0
>
>
> This ticket tracks the ongoing work of upstreaming changes from 
> [https://github.com/apache-spark-on-k8s/spark], specifically the Python 
> bindings for Spark on Kubernetes. 
> The points of focus are: dependency management, increased default values for 
> non-JVM memory overhead, and modified Docker images that include Python 
> support (see the config sketch below). 
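
As an illustration of the memory-overhead point, a hedged config sketch - the master 
URL, image name, and the 0.4 factor are placeholders, shown only to indicate which 
knobs a non-JVM workload would raise, not recommended values:

{code:scala}
import org.apache.spark.SparkConf

object PySparkOnK8sExample {
  // Placeholder values: the master URL and image are examples, and 0.4 simply
  // illustrates raising the default overhead factor for Python (non-JVM) memory use.
  val conf: SparkConf = new SparkConf()
    .setMaster("k8s://https://kubernetes.example.com:6443")
    .set("spark.kubernetes.container.image", "example/spark-py:2.4.0")
    .set("spark.kubernetes.memoryOverheadFactor", "0.4")
}
{code}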



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24248) [K8S] Use the Kubernetes cluster as the backing store for the state of pods

2018-05-10 Thread Matt Cheah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471295#comment-16471295
 ] 

Matt Cheah commented on SPARK-24248:


I see - I suppose that if the watch connection drops, we should periodically try to 
reestablish the connection to the API server, and once it's back up we can do a full 
sync?
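
Roughly what that could look like with the fabric8 client - a sketch only, assuming 
the {{Watcher}} interface with {{eventReceived}}/{{onClose}}; the {{spark-role}} 
label and the one-second backoff are illustrative, not Spark's actual values. List 
all executor pods for a full sync, re-open the watch, and repeat whenever the watch 
closes:

{code:scala}
import io.fabric8.kubernetes.api.model.Pod
import io.fabric8.kubernetes.client.{DefaultKubernetesClient, KubernetesClientException, Watcher}
import io.fabric8.kubernetes.client.Watcher.Action

import scala.collection.JavaConverters._

object ResyncingExecutorWatch {
  private val client = new DefaultKubernetesClient()

  def start(namespace: String, onPodUpdate: Pod => Unit): Unit = {
    // Full sync: list every executor pod so state missed while the watch was down
    // is recovered before we start streaming events again.
    client.pods().inNamespace(namespace).withLabel("spark-role", "executor")
      .list().getItems.asScala.foreach(onPodUpdate)

    // Re-open the watch; when it closes, back off briefly and start over with
    // another full sync.
    client.pods().inNamespace(namespace).withLabel("spark-role", "executor")
      .watch(new Watcher[Pod] {
        override def eventReceived(action: Action, pod: Pod): Unit = onPodUpdate(pod)
        override def onClose(cause: KubernetesClientException): Unit = {
          Thread.sleep(1000)
          start(namespace, onPodUpdate)
        }
      })
  }
}
{code}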

> [K8S] Use the Kubernetes cluster as the backing store for the state of pods
> ---
>
> Key: SPARK-24248
> URL: https://issues.apache.org/jira/browse/SPARK-24248
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> We have a number of places in KubernetesClusterSchedulerBackend right now 
> that maintain the state of pods in memory. However, the Kubernetes API can 
> always give us the most up-to-date and correct view of what our executors are 
> doing. We should consider moving away from in-memory state as much as we can, 
> in favor of using the Kubernetes cluster as the source of truth for pod 
> status. Maintaining less state in memory lowers the chance that we 
> accidentally miss updating one of these data structures and break the 
> lifecycle of executors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24248) [K8S] Use the Kubernetes cluster as the backing store for the state of pods

2018-05-10 Thread Matt Cheah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471280#comment-16471280
 ] 

Matt Cheah edited comment on SPARK-24248 at 5/10/18 11:18 PM:
--

I thought about it a bit more, and believe that we can do most if not all of 
our actions in the Watcher directly.

In other words, can we drive the entire lifecycle of all executors solely via 
watch events? This would be a pretty big rewrite, but you get a large number of 
benefits from this - namely, you remove any need for synchronization or local 
state management at all. {{KubernetesClusterSchedulerBackend}} becomes 
effectively stateless, apart from the parent class's fields.

This would imply at least the following changes, though I'm sure I'm missing 
some:
 * When the watcher receives a modified or error event, check the status of the 
executor, construct the exit reason, and call {{RemoveExecutor}} directly
 * The watcher keeps a running count of active executors and itself triggers 
rounds of creating new executors (instead of the periodic polling)
 * {{KubernetesDriverEndpoint::onDisconnected}} is a tricky one. What I'm 
thinking is that we can just disable the executor but not remove it, counting 
on the Watch to receive an event that would actually trigger removing the 
executor. The idea here is that the status of the pods as reported by the Watch 
should be fully reliable - e.g. whenever any error occurs in the executor such 
that it becomes unusable, the Kubernetes API should report such state. We could 
perhaps make the API's representation of the world more accurate by attaching 
liveness probes to the executor pod.

 


was (Author: mcheah):
I thought about it a bit more, and believe that we can do most if not all of 
our actions in the Watcher directly.

In other words, can we drive the entire lifecycle of all executors solely from 
the perspective of watch events? This would be a pretty big rewrite, but you 
get a large number of benefits from this - namely, you remove any need for 
synchronization or local state management at all. 
{{KubernetesClusterSchedulerBackend}} becomes effectively stateless, apart from 
the parent class's fields.

This would imply at least the following changes, though I'm sure I'm missing 
some:
 * When the watcher receives a modified or error event, check the status of the 
executor, construct the exit reason, and call {{RemoveExecutor}} directly
 * The watcher keeps a running count of active executors and itself triggers 
rounds of creating new executors (instead of the periodic polling)
 * {{KubernetesDriverEndpoint::onDisconnected}} is a tricky one. What I'm 
thinking is that we can just disable the executor but not remove it, counting 
on the Watch to receive an event that would actually trigger removing the 
executor. The idea here is that the status of the pods as reported by the Watch 
should be fully reliable - e.g. whenever any error occurs in the executor such 
that it becomes unusable, the Kubernetes API should report such state. We could 
perhaps make the API's representation of the world more accurate by attaching 
liveness probes to the executor pod.

 

> [K8S] Use the Kubernetes cluster as the backing store for the state of pods
> ---
>
> Key: SPARK-24248
> URL: https://issues.apache.org/jira/browse/SPARK-24248
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> We have a number of places in KubernetesClusterSchedulerBackend right now 
> that maintain the state of pods in memory. However, the Kubernetes API can 
> always give us the most up-to-date and correct view of what our executors are 
> doing. We should consider moving away from in-memory state as much as we can, 
> in favor of using the Kubernetes cluster as the source of truth for pod 
> status. Maintaining less state in memory lowers the chance that we 
> accidentally miss updating one of these data structures and break the 
> lifecycle of executors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24248) [K8S] Use the Kubernetes cluster as the backing store for the state of pods

2018-05-10 Thread Matt Cheah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471280#comment-16471280
 ] 

Matt Cheah commented on SPARK-24248:


I thought about it a bit more, and believe that we can do most if not all of 
our actions in the Watcher directly.

In other words, can we drive the entire lifecycle of all executors solely from 
the perspective of watch events? This would be a pretty big rewrite, but you 
get a large number of benefits from this - namely, you remove any need for 
synchronization or local state management at all. 
{{KubernetesClusterSchedulerBackend}} becomes effectively stateless, apart from 
the parent class's fields.

This would imply at least the following changes, though I'm sure I'm missing 
some:
 * When the watcher receives a modified or error event, check the status of the 
executor, construct the exit reason, and call {{RemoveExecutor}} directly (see 
the sketch after this list)
 * The watcher keeps a running count of active executors and itself triggers 
rounds of creating new executors (instead of the periodic polling)
 * {{KubernetesDriverEndpoint::onDisconnected}} is a tricky one. What I'm 
thinking is that we can just disable the executor but not remove it, counting 
on the Watch to receive an event that would actually trigger removing the 
executor. The idea here is that the status of the pods as reported by the Watch 
should be fully reliable - e.g. whenever any error occurs in the executor such 
that it becomes unusable, the Kubernetes API should report such state. We could 
perhaps make the API's representation of the world more accurate by attaching 
liveness probes to the executor pod.
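
A sketch of the first bullet only, to make the shape concrete - {{removeExecutor}} 
stands in for sending {{RemoveExecutor}} to the driver endpoint, and the 
{{spark-exec-id}} label, the phase names used as terminal states, and the 
exit-reason wording are assumptions rather than the real implementation:

{code:scala}
import io.fabric8.kubernetes.api.model.Pod
import io.fabric8.kubernetes.client.{KubernetesClientException, Watcher}
import io.fabric8.kubernetes.client.Watcher.Action

// Sketch: drive executor removal purely from watch events, with no local state.
class ExecutorLifecycleWatcher(removeExecutor: (String, String) => Unit)
  extends Watcher[Pod] {

  override def eventReceived(action: Action, pod: Pod): Unit = {
    def name = pod.getMetadata.getName
    def execId = pod.getMetadata.getLabels.get("spark-exec-id")
    action match {
      case Action.ERROR =>
        removeExecutor(execId, s"The Kubernetes watch reported an error for pod $name")
      case Action.DELETED =>
        removeExecutor(execId, s"Pod $name was deleted")
      case Action.MODIFIED if isTerminal(pod) =>
        removeExecutor(execId, s"Pod $name reached terminal phase ${pod.getStatus.getPhase}")
      case _ =>
        () // ADDED and healthy MODIFIED events need no removal; counting active
           // executors for allocation rounds would hook in here.
    }
  }

  private def isTerminal(pod: Pod): Boolean = {
    val phase = pod.getStatus.getPhase
    phase == "Failed" || phase == "Succeeded"
  }

  override def onClose(cause: KubernetesClientException): Unit = ()
}
{code}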

 

> [K8S] Use the Kubernetes cluster as the backing store for the state of pods
> ---
>
> Key: SPARK-24248
> URL: https://issues.apache.org/jira/browse/SPARK-24248
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> We have a number of places in KubernetesClusterSchedulerBackend right now 
> that maintain the state of pods in memory. However, the Kubernetes API can 
> always give us the most up-to-date and correct view of what our executors are 
> doing. We should consider moving away from in-memory state as much as we can, 
> in favor of using the Kubernetes cluster as the source of truth for pod 
> status. Maintaining less state in memory lowers the chance that we 
> accidentally miss updating one of these data structures and break the 
> lifecycle of executors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24248) [K8S] Use the Kubernetes cluster as the backing store for the state of pods

2018-05-10 Thread Matt Cheah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24248?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16471206#comment-16471206
 ] 

Matt Cheah commented on SPARK-24248:


[~foxish] [~liyinan926] curious as to what you think about this idea.

> [K8S] Use the Kubernetes cluster as the backing store for the state of pods
> ---
>
> Key: SPARK-24248
> URL: https://issues.apache.org/jira/browse/SPARK-24248
> Project: Spark
>  Issue Type: Improvement
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> We have a number of places in KubernetesClusterSchedulerBackend right now 
> that maintain the state of pods in memory. However, the Kubernetes API can 
> always give us the most up-to-date and correct view of what our executors are 
> doing. We should consider moving away from in-memory state as much as we can, 
> in favor of using the Kubernetes cluster as the source of truth for pod 
> status. Maintaining less state in memory lowers the chance that we 
> accidentally miss updating one of these data structures and break the 
> lifecycle of executors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24248) [K8S] Use the Kubernetes cluster as the backing store for the state of pods

2018-05-10 Thread Matt Cheah (JIRA)
Matt Cheah created SPARK-24248:
--

 Summary: [K8S] Use the Kubernetes cluster as the backing store for 
the state of pods
 Key: SPARK-24248
 URL: https://issues.apache.org/jira/browse/SPARK-24248
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.3.0
Reporter: Matt Cheah


We have a number of places in KubernetesClusterSchedulerBackend right now that 
maintain the state of pods in memory. However, the Kubernetes API can always 
give us the most up-to-date and correct view of what our executors are doing. 
We should consider moving away from in-memory state as much as we can, in favor 
of using the Kubernetes cluster as the source of truth for pod status. 
Maintaining less state in memory lowers the chance that we accidentally miss 
updating one of these data structures and break the lifecycle of executors.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Created] (SPARK-24247) [K8S] currentNodeToLocalTaskCount is unused in KubernetesClusterSchedulerBackend

2018-05-10 Thread Matt Cheah (JIRA)
Matt Cheah created SPARK-24247:
--

 Summary: [K8S] currentNodeToLocalTaskCount is unused in 
KubernetesClusterSchedulerBackend
 Key: SPARK-24247
 URL: https://issues.apache.org/jira/browse/SPARK-24247
 Project: Spark
  Issue Type: Improvement
  Components: Kubernetes
Affects Versions: 2.3.0
Reporter: Matt Cheah


This variable isn't used: 
[https://github.com/apache/spark/blob/master/resource-managers/kubernetes/core/src/main/scala/org/apache/spark/scheduler/cluster/k8s/KubernetesClusterSchedulerBackend.scala#L109]
 - we should either remove it or put it to good use. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24135) [K8s] Executors that fail to start up because of init-container errors are not retried and limit the executor pool size

2018-05-04 Thread Matt Cheah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16464467#comment-16464467
 ] 

Matt Cheah commented on SPARK-24135:


Put up the PR, see above - I created a separate setting for this class of errors.

> [K8s] Executors that fail to start up because of init-container errors are 
> not retried and limit the executor pool size
> ---
>
> Key: SPARK-24135
> URL: https://issues.apache.org/jira/browse/SPARK-24135
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> In KubernetesClusterSchedulerBackend, we detect if executors disconnect after 
> having been started or if executors hit the {{ERROR}} or {{DELETED}} states. 
> When executors fail in these ways, they are removed from the pending 
> executors pool and the driver should retry requesting these executors.
> However, the driver does not handle a different class of error: when the pod 
> enters the {{Init:Error}} state. This state comes up when the executor fails 
> to launch because one of its init-containers fails. Spark itself doesn't 
> attach any init-containers to the executors. However, custom webhooks can 
> run on the cluster and attach init-containers to the executor pods. 
> Additionally, pod presets can specify init containers to run on these pods. 
> Therefore Spark should handle the {{Init:Error}} cases regardless of whether 
> Spark itself is aware of init-containers.
> This class of error is particularly bad because when we hit this state, the 
> failed executor will never start, but it's still seen as pending by the 
> executor allocator. The executor allocator won't request more rounds of 
> executors because its current batch hasn't been resolved to either running or 
> failed. Therefore we end up stuck with the number of executors that 
> successfully started before the faulty one failed to start, potentially 
> creating an artificial resource bottleneck.
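
For illustration, a hedged sketch of the missing check against the fabric8 pod model 
(not the code in the PR): treat a pod whose init container terminated with a non-zero 
exit code - surfaced as {{Init:Error}} by kubectl - as a failed executor instead of 
leaving it in the pending pool:

{code:scala}
import io.fabric8.kubernetes.api.model.Pod

import scala.collection.JavaConverters._

object InitErrorCheck {
  // True when any init container on the pod has terminated with a non-zero exit
  // code, i.e. the state kubectl reports as Init:Error.
  def hasFailedInitContainer(pod: Pod): Boolean = {
    val initStatuses = Option(pod.getStatus)
      .flatMap(status => Option(status.getInitContainerStatuses))
      .map(_.asScala.toList)
      .getOrElse(Nil)

    initStatuses.exists { containerStatus =>
      Option(containerStatus.getState)
        .flatMap(state => Option(state.getTerminated))
        .exists(_.getExitCode.intValue() != 0)
    }
  }
}
{code}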



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24135) [K8s] Executors that fail to start up because of init-container errors are not retried and limit the executor pool size

2018-05-03 Thread Matt Cheah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462106#comment-16462106
 ] 

Matt Cheah edited comment on SPARK-24135 at 5/3/18 8:35 AM:


Not necessarily - if the pods fail to start up, we should retry them 
indefinitely, as a replication controller or a deployment would. One could 
argue that we should be using those higher-level primitives to run executors 
instead of raw pods anyway; it's just that Spark's scheduler code would need 
non-trivial changes to do so right now.


was (Author: mcheah):
Not necessarily - if the pods fail to start up, we should retry them 
indefinitely, as a replication controller or a deployment would. One could 
argue that we should be using those lower level primitives to run executors 
instead of raw pods anyway; it's just that Spark's scheduler code would need 
non-trivial changes to do so right now.

> [K8s] Executors that fail to start up because of init-container errors are 
> not retried and limit the executor pool size
> ---
>
> Key: SPARK-24135
> URL: https://issues.apache.org/jira/browse/SPARK-24135
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> In KubernetesClusterSchedulerBackend, we detect if executors disconnect after 
> having been started or if executors hit the {{ERROR}} or {{DELETED}} states. 
> When executors fail in these ways, they are removed from the pending 
> executors pool and the driver should retry requesting these executors.
> However, the driver does not handle a different class of error: when the pod 
> enters the {{Init:Error}} state. This state comes up when the executor fails 
> to launch because one of its init-containers fails. Spark itself doesn't 
> attach any init-containers to the executors. However, custom webhooks can 
> run on the cluster and attach init-containers to the executor pods. 
> Additionally, pod presets can specify init containers to run on these pods. 
> Therefore Spark should handle the {{Init:Error}} cases regardless of whether 
> Spark itself is aware of init-containers.
> This class of error is particularly bad because when we hit this state, the 
> failed executor will never start, but it's still seen as pending by the 
> executor allocator. The executor allocator won't request more rounds of 
> executors because its current batch hasn't been resolved to either running or 
> failed. Therefore we end up stuck with the number of executors that 
> successfully started before the faulty one failed to start, potentially 
> creating an artificial resource bottleneck.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24135) [K8s] Executors that fail to start up because of init-container errors are not retried and limit the executor pool size

2018-05-03 Thread Matt Cheah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16462106#comment-16462106
 ] 

Matt Cheah commented on SPARK-24135:


Not necessarily - if the pods fail to start up, we should retry them 
indefinitely, as a replication controller or a deployment would. One could 
argue that we should be using those lower level primitives to run executors 
instead of raw pods anyway; it's just that Spark's scheduler code would need 
non-trivial changes to do so right now.

> [K8s] Executors that fail to start up because of init-container errors are 
> not retried and limit the executor pool size
> ---
>
> Key: SPARK-24135
> URL: https://issues.apache.org/jira/browse/SPARK-24135
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> In KubernetesClusterSchedulerBackend, we detect if executors disconnect after 
> having been started or if executors hit the {{ERROR}} or {{DELETED}} states. 
> When executors fail in these ways, they are removed from the pending 
> executors pool and the driver should retry requesting these executors.
> However, the driver does not handle a different class of error: when the pod 
> enters the {{Init:Error}} state. This state comes up when the executor fails 
> to launch because one of its init-containers fails. Spark itself doesn't 
> attach any init-containers to the executors. However, custom webhooks can 
> run on the cluster and attach init-containers to the executor pods. 
> Additionally, pod presets can specify init containers to run on these pods. 
> Therefore Spark should handle the {{Init:Error}} cases regardless of whether 
> Spark itself is aware of init-containers.
> This class of error is particularly bad because when we hit this state, the 
> failed executor will never start, but it's still seen as pending by the 
> executor allocator. The executor allocator won't request more rounds of 
> executors because its current batch hasn't been resolved to either running or 
> failed. Therefore we end up stuck with the number of executors that 
> successfully started before the faulty one failed to start, potentially 
> creating an artificial resource bottleneck.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24135) [K8s] Executors that fail to start up because of init-container errors are not retried and limit the executor pool size

2018-05-02 Thread Matt Cheah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461188#comment-16461188
 ] 

Matt Cheah edited comment on SPARK-24135 at 5/2/18 3:37 PM:


{quote}Restarting seems like it would eventually be limited by the job failure 
limit that Spark already has. If pod startup failures are deterministic the job 
failure count will hit this limit and job will be killed that way.
{quote}
In the case of the executor failing to start at all, this wouldn't be caught by 
Spark's task failure count logic because you're never going to end up 
scheduling tasks on these executors that failed to start.


was (Author: mcheah):
> Restarting seems like it would eventually be limited by the job failure limit 
>that Spark already has. If pod startup failures are deterministic the job 
>failure count will hit this limit and job will be killed that way.

In the case of the executor failing to start at all, this wouldn't be caught by 
Spark's task failure count logic because you're never going to end up 
scheduling tasks on these executors that failed to start.

> [K8s] Executors that fail to start up because of init-container errors are 
> not retried and limit the executor pool size
> ---
>
> Key: SPARK-24135
> URL: https://issues.apache.org/jira/browse/SPARK-24135
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> In KubernetesClusterSchedulerBackend, we detect if executors disconnect after 
> having been started or if executors hit the {{ERROR}} or {{DELETED}} states. 
> When executors fail in these ways, they are removed from the pending 
> executors pool and the driver should retry requesting these executors.
> However, the driver does not handle a different class of error: when the pod 
> enters the {{Init:Error}} state. This state comes up when the executor fails 
> to launch because one of its init-containers fails. Spark itself doesn't 
> attach any init-containers to the executors. However, custom webhooks can 
> run on the cluster and attach init-containers to the executor pods. 
> Additionally, pod presets can specify init containers to run on these pods. 
> Therefore Spark should handle the {{Init:Error}} cases regardless of whether 
> Spark itself is aware of init-containers.
> This class of error is particularly bad because when we hit this state, the 
> failed executor will never start, but it's still seen as pending by the 
> executor allocator. The executor allocator won't request more rounds of 
> executors because its current batch hasn't been resolved to either running or 
> failed. Therefore we end up stuck with the number of executors that 
> successfully started before the faulty one failed to start, potentially 
> creating an artificial resource bottleneck.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Comment Edited] (SPARK-24135) [K8s] Executors that fail to start up because of init-container errors are not retried and limit the executor pool size

2018-05-02 Thread Matt Cheah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16460047#comment-16460047
 ] 

Matt Cheah edited comment on SPARK-24135 at 5/2/18 3:37 PM:


{quote}But I'm not sure how much this buys us because very likely the newly 
requested executors will fail to be initialized,
{quote}
That's entirely up to the behavior of the init container itself - there are 
many reasons to believe that a given init container's logic can be flaky. But 
it's not immediately obvious to me whether the init container's failure should 
count toward a job failure. Job failures shouldn't be caused by failures in the 
framework, and in this case, the framework has added the init-container to 
these pods - in other words, the user's code didn't directly cause the job 
failure.


was (Author: mcheah):
_> But I'm not sure how much this buys us because very likely the newly 
requested executors will fail to be initialized,_

That's entirely up to the behavior of the init container itself - there are 
many reasons to believe that a given init container's logic can be flaky. But 
it's not immediately obvious to me whether the init container's failure should 
count toward a job failure. Job failures shouldn't be caused by failures in the 
framework, and in this case, the framework has added the init-container to 
these pods - in other words, the user's code didn't directly cause the job 
failure.

> [K8s] Executors that fail to start up because of init-container errors are 
> not retried and limit the executor pool size
> ---
>
> Key: SPARK-24135
> URL: https://issues.apache.org/jira/browse/SPARK-24135
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> In KubernetesClusterSchedulerBackend, we detect if executors disconnect after 
> having been started or if executors hit the {{ERROR}} or {{DELETED}} states. 
> When executors fail in these ways, they are removed from the pending 
> executors pool and the driver should retry requesting these executors.
> However, the driver does not handle a different class of error: when the pod 
> enters the {{Init:Error}} state. This state comes up when the executor fails 
> to launch because one of its init-containers fails. Spark itself doesn't 
> attach any init-containers to the executors. However, custom webhooks can 
> run on the cluster and attach init-containers to the executor pods. 
> Additionally, pod presets can specify init containers to run on these pods. 
> Therefore Spark should handle the {{Init:Error}} cases regardless of whether 
> Spark itself is aware of init-containers.
> This class of error is particularly bad because when we hit this state, the 
> failed executor will never start, but it's still seen as pending by the 
> executor allocator. The executor allocator won't request more rounds of 
> executors because its current batch hasn't been resolved to either running or 
> failed. Therefore we end up stuck with the number of executors that 
> successfully started before the faulty one failed to start, potentially 
> creating an artificial resource bottleneck.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org



[jira] [Commented] (SPARK-24135) [K8s] Executors that fail to start up because of init-container errors are not retried and limit the executor pool size

2018-05-02 Thread Matt Cheah (JIRA)

[ 
https://issues.apache.org/jira/browse/SPARK-24135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16461188#comment-16461188
 ] 

Matt Cheah commented on SPARK-24135:


> Restarting seems like it would eventually be limited by the job failure limit 
>that Spark already has. If pod startup failures are deterministic the job 
>failure count will hit this limit and job will be killed that way.

In the case of the executor failing to start at all, this wouldn't be caught by 
Spark's task failure count logic because you're never going to end up 
scheduling tasks on these executors that failed to start.

> [K8s] Executors that fail to start up because of init-container errors are 
> not retried and limit the executor pool size
> ---
>
> Key: SPARK-24135
> URL: https://issues.apache.org/jira/browse/SPARK-24135
> Project: Spark
>  Issue Type: Bug
>  Components: Kubernetes
>Affects Versions: 2.3.0
>Reporter: Matt Cheah
>Priority: Major
>
> In KubernetesClusterSchedulerBackend, we detect if executors disconnect after 
> having been started or if executors hit the {{ERROR}} or {{DELETED}} states. 
> When executors fail in these ways, they are removed from the pending 
> executors pool and the driver should retry requesting these executors.
> However, the driver does not handle a different class of error: when the pod 
> enters the {{Init:Error}} state. This state comes up when the executor fails 
> to launch because one of its init-containers fails. Spark itself doesn't 
> attach any init-containers to the executors. However, custom webhooks can 
> run on the cluster and attach init-containers to the executor pods. 
> Additionally, pod presets can specify init containers to run on these pods. 
> Therefore Spark should handle the {{Init:Error}} cases regardless of whether 
> Spark itself is aware of init-containers.
> This class of error is particularly bad because when we hit this state, the 
> failed executor will never start, but it's still seen as pending by the 
> executor allocator. The executor allocator won't request more rounds of 
> executors because its current batch hasn't been resolved to either running or 
> failed. Therefore we end up stuck with the number of executors that 
> successfully started before the faulty one failed to start, potentially 
> creating an artificial resource bottleneck.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: issues-unsubscr...@spark.apache.org
For additional commands, e-mail: issues-h...@spark.apache.org


