[spark] branch master updated (68c032c -> 93ad26b)
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from 68c032c  [SPARK-33364][SQL] Introduce the "purge" option in TableCatalog.dropTable for v2 catalog
   add 93ad26b  [SPARK-23432][UI] Add executor peak jvm memory metrics in executors page

No new revisions were added by this update.

Summary of changes:
 .../spark/ui/static/executorspage-template.html | 16 +++
 .../org/apache/spark/ui/static/executorspage.js | 52 --
 2 files changed, 64 insertions(+), 4 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (e66201b -> 21413b7)
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from e66201b  [MINOR][SS][DOCS] Update join type in stream static joins code examples
   add 21413b7  [SPARK-30294][SS] Explicitly defines read-only StateStore and optimize for HDFSBackedStateStore

No new revisions were added by this update.

Summary of changes:
 .../state/HDFSBackedStateStoreProvider.scala       |  46 +++--
 .../sql/execution/streaming/state/StateStore.scala | 111 +
 .../execution/streaming/state/StateStoreRDD.scala  | 104 ++-
 .../state/StreamingAggregationStateManager.scala   |  22 ++--
 .../sql/execution/streaming/state/package.scala    |  35 +++
 .../execution/streaming/statefulOperators.scala    |   2 +-
 .../streaming/state/StateStoreSuite.scala          |   4 +-
 7 files changed, 261 insertions(+), 63 deletions(-)
[spark] branch branch-2.4 updated: [MINOR][SS][DOCS] Update join type in stream static joins code examples
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch branch-2.4
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-2.4 by this push:
     new 8684720  [MINOR][SS][DOCS] Update join type in stream static joins code examples
8684720 is described below

commit 868472040a9eb67169086ed0fab0afcbbbd321f9
Author: Sarvesh Dave
AuthorDate: Thu Nov 5 16:22:31 2020 +0900

    [MINOR][SS][DOCS] Update join type in stream static joins code examples

    ### What changes were proposed in this pull request?
    Update the join type in the stream-static joins code examples in the structured streaming programming guide.

    1) The Scala, Java and Python examples share a common issue: the join keyword is "right_join", but it should be "left_outer".
       _Reasons:_
       a) Each of these code snippets is an example of a left outer join, since the streaming df is on the left and the static df is on the right. Also, a right outer join between a streaming df (left) and a static df (right) is not supported.
       b) The keywords "right_join"/"left_join" are unsupported; the valid forms are "right_outer"/"left_outer".
       So all of these code snippets have been updated to "left_outer".

    2) The R example is correct, but it shows a "right_outer" join with the static df on the left and the streaming df on the right. It is changed to "left_outer" to make it consistent with the other three examples in Scala, Java and Python.

    ### Why are the changes needed?
    To fix the mistake in the example code of the documentation.

    ### Does this PR introduce _any_ user-facing change?
    Yes, it is a user-facing change (but a documentation update only).

    **Screenshots 1: Scala/Java/Python example (similar issue)**
    _Before:_ https://user-images.githubusercontent.com/62717942/98155351-19e59400-1efc-11eb-8142-e6a25a5e6497.png
    _After:_ https://user-images.githubusercontent.com/62717942/98155503-5d400280-1efc-11eb-96e1-5ba0f3c35c82.png

    **Screenshots 2: R example (made consistent with the change above)**
    _Before:_ https://user-images.githubusercontent.com/62717942/98155685-ac863300-1efc-11eb-93bc-b7ca4dd34634.png
    _After:_ https://user-images.githubusercontent.com/62717942/98155739-c0ca3000-1efc-11eb-8f95-a7538fa784b7.png

    ### How was this patch tested?
    The change was tested locally.
    1) cd docs/
       SKIP_API=1 jekyll build
    2) Verify the docs/_site/structured-streaming-programming-guide.html file in a browser.

    Closes #30252 from sarveshdave1/doc-update-stream-static-joins.

    Authored-by: Sarvesh Dave
    Signed-off-by: Jungtaek Lim (HeartSaVioR)
    (cherry picked from commit e66201b30bc1f3da7284af14b32e5e6200768dbd)
    Signed-off-by: Jungtaek Lim (HeartSaVioR)
---
 docs/structured-streaming-programming-guide.md | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md
index dce4b35..aac262b 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -1089,7 +1089,7 @@ val staticDf = spark.read. ...
 val streamingDf = spark.readStream. ...
 streamingDf.join(staticDf, "type") // inner equi-join with a static DF
-streamingDf.join(staticDf, "type", "right_join") // right outer join with a static DF
+streamingDf.join(staticDf, "type", "left_outer") // left outer join with a static DF
 {% endhighlight %}
@@ -1100,7 +1100,7 @@ streamingDf.join(staticDf, "type", "right_join") // right outer join with a sta
 Dataset staticDf = spark.read(). ...;
 Dataset streamingDf = spark.readStream(). ...;
 streamingDf.join(staticDf, "type"); // inner equi-join with a static DF
-streamingDf.join(staticDf, "type", "right_join"); // right outer join with a static DF
+streamingDf.join(staticDf, "type", "left_outer"); // left outer join with a static DF
 {% endhighlight %}
@@ -,7 +,7 @@ streamingDf.join(staticDf, "type", "right_join"); // right outer join with a st
 staticDf = spark.read. ...
 streamingDf = spark.readStream. ...
 streamingDf.join(staticDf, "type") # inner equi-join with a static DF
-streamingDf.join(staticDf, "type", "right_join") # right outer join with a static DF
+streamingDf.join(staticDf, "type", "left_outer") # left outer join with a static DF
 {% endhighlight %}
@@ -1123,10 +1123,10 @@ staticDf <- read.df(...)
 streamingDf <- read.stream(...)
 joined <- merge(streamingDf, staticDf, sort = FALSE) # inner equi
[spark] branch branch-3.0 updated: [MINOR][SS][DOCS] Update join type in stream static joins code examples
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new c43231c  [MINOR][SS][DOCS] Update join type in stream static joins code examples
c43231c is described below

commit c43231c6cd2d38fb9e6c6435ca08b735c9aacb61
Author: Sarvesh Dave
AuthorDate: Thu Nov 5 16:22:31 2020 +0900

    [MINOR][SS][DOCS] Update join type in stream static joins code examples

    ### What changes were proposed in this pull request?
    Update the join type in the stream-static joins code examples in the structured streaming programming guide.

    1) The Scala, Java and Python examples share a common issue: the join keyword is "right_join", but it should be "left_outer".
       _Reasons:_
       a) Each of these code snippets is an example of a left outer join, since the streaming df is on the left and the static df is on the right. Also, a right outer join between a streaming df (left) and a static df (right) is not supported.
       b) The keywords "right_join"/"left_join" are unsupported; the valid forms are "right_outer"/"left_outer".
       So all of these code snippets have been updated to "left_outer".

    2) The R example is correct, but it shows a "right_outer" join with the static df on the left and the streaming df on the right. It is changed to "left_outer" to make it consistent with the other three examples in Scala, Java and Python.

    ### Why are the changes needed?
    To fix the mistake in the example code of the documentation.

    ### Does this PR introduce _any_ user-facing change?
    Yes, it is a user-facing change (but a documentation update only).

    **Screenshots 1: Scala/Java/Python example (similar issue)**
    _Before:_ https://user-images.githubusercontent.com/62717942/98155351-19e59400-1efc-11eb-8142-e6a25a5e6497.png
    _After:_ https://user-images.githubusercontent.com/62717942/98155503-5d400280-1efc-11eb-96e1-5ba0f3c35c82.png

    **Screenshots 2: R example (made consistent with the change above)**
    _Before:_ https://user-images.githubusercontent.com/62717942/98155685-ac863300-1efc-11eb-93bc-b7ca4dd34634.png
    _After:_ https://user-images.githubusercontent.com/62717942/98155739-c0ca3000-1efc-11eb-8f95-a7538fa784b7.png

    ### How was this patch tested?
    The change was tested locally.
    1) cd docs/
       SKIP_API=1 jekyll build
    2) Verify the docs/_site/structured-streaming-programming-guide.html file in a browser.

    Closes #30252 from sarveshdave1/doc-update-stream-static-joins.

    Authored-by: Sarvesh Dave
    Signed-off-by: Jungtaek Lim (HeartSaVioR)
    (cherry picked from commit e66201b30bc1f3da7284af14b32e5e6200768dbd)
    Signed-off-by: Jungtaek Lim (HeartSaVioR)
---
 docs/structured-streaming-programming-guide.md | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md
index 69d744d..31b1ca9d 100644
--- a/docs/structured-streaming-programming-guide.md
+++ b/docs/structured-streaming-programming-guide.md
@@ -1115,7 +1115,7 @@ val staticDf = spark.read. ...
 val streamingDf = spark.readStream. ...
 streamingDf.join(staticDf, "type") // inner equi-join with a static DF
-streamingDf.join(staticDf, "type", "right_join") // right outer join with a static DF
+streamingDf.join(staticDf, "type", "left_outer") // left outer join with a static DF
 {% endhighlight %}
@@ -1126,7 +1126,7 @@ streamingDf.join(staticDf, "type", "right_join") // right outer join with a sta
 Dataset staticDf = spark.read(). ...;
 Dataset streamingDf = spark.readStream(). ...;
 streamingDf.join(staticDf, "type"); // inner equi-join with a static DF
-streamingDf.join(staticDf, "type", "right_join"); // right outer join with a static DF
+streamingDf.join(staticDf, "type", "left_outer"); // left outer join with a static DF
 {% endhighlight %}
@@ -1137,7 +1137,7 @@ streamingDf.join(staticDf, "type", "right_join"); // right outer join with a st
 staticDf = spark.read. ...
 streamingDf = spark.readStream. ...
 streamingDf.join(staticDf, "type") # inner equi-join with a static DF
-streamingDf.join(staticDf, "type", "right_join") # right outer join with a static DF
+streamingDf.join(staticDf, "type", "left_outer") # left outer join with a static DF
 {% endhighlight %}
@@ -1149,10 +1149,10 @@ staticDf <- read.df(...)
 streamingDf <- read.stream(...)
 joined <- merge(streamingDf, staticDf, sort = FALSE) # inner equi
[spark] branch master updated (d530ed0 -> e66201b)
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

  from d530ed0  Revert "[SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends"
   add e66201b  [MINOR][SS][DOCS] Update join type in stream static joins code examples

No new revisions were added by this update.

Summary of changes:
 docs/structured-streaming-programming-guide.md | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)
[spark] branch branch-2.4 updated: [MINOR][SS][DOCS] Update join type in stream static joins code examples
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 8684720 [MINOR][SS][DOCS] Update join type in stream static joins code examples 8684720 is described below commit 868472040a9eb67169086ed0fab0afcbbbd321f9 Author: Sarvesh Dave AuthorDate: Thu Nov 5 16:22:31 2020 +0900 [MINOR][SS][DOCS] Update join type in stream static joins code examples ### What changes were proposed in this pull request? Update join type in stream static joins code examples in structured streaming programming guide. 1) Scala, Java and Python examples have a common issue. The join keyword is "right_join", it should be "left_outer". _Reasons:_ a) This code snippet is an example of "left outer join" as the streaming df is on left and static df is on right. Also, right outerjoin between stream df(left) and static df(right) is not supported. b) The keyword "right_join/left_join" is unsupported and it should be "right_outer/left_outer". So, all of these code snippets have been updated to "left_outer". 2) R exmaple is correct, but the example is of "right_outer" with static df (left) and streaming df(right). It is changed to "left_outer" to make it consistent with other three examples of scala, java and python. ### Why are the changes needed? To fix the mistake in example code of documentation. ### Does this PR introduce _any_ user-facing change? Yes, it is a user-facing change (but documentation update only). 
**Screenshots 1: Scala/Java/python example (similar issue)** _Before:_ https://user-images.githubusercontent.com/62717942/98155351-19e59400-1efc-11eb-8142-e6a25a5e6497.png";> _After:_ https://user-images.githubusercontent.com/62717942/98155503-5d400280-1efc-11eb-96e1-5ba0f3c35c82.png";> **Screenshots 2: R example (Make it consistent with above change)** _Before:_ https://user-images.githubusercontent.com/62717942/98155685-ac863300-1efc-11eb-93bc-b7ca4dd34634.png";> _After:_ https://user-images.githubusercontent.com/62717942/98155739-c0ca3000-1efc-11eb-8f95-a7538fa784b7.png";> ### How was this patch tested? The change was tested locally. 1) cd docs/ SKIP_API=1 jekyll build 2) Verify docs/_site/structured-streaming-programming-guide.html file in browser. Closes #30252 from sarveshdave1/doc-update-stream-static-joins. Authored-by: Sarvesh Dave Signed-off-by: Jungtaek Lim (HeartSaVioR) (cherry picked from commit e66201b30bc1f3da7284af14b32e5e6200768dbd) Signed-off-by: Jungtaek Lim (HeartSaVioR) --- docs/structured-streaming-programming-guide.md | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md index dce4b35..aac262b 100644 --- a/docs/structured-streaming-programming-guide.md +++ b/docs/structured-streaming-programming-guide.md @@ -1089,7 +1089,7 @@ val staticDf = spark.read. ... val streamingDf = spark.readStream. ... streamingDf.join(staticDf, "type") // inner equi-join with a static DF -streamingDf.join(staticDf, "type", "right_join") // right outer join with a static DF +streamingDf.join(staticDf, "type", "left_outer") // left outer join with a static DF {% endhighlight %} @@ -1100,7 +1100,7 @@ streamingDf.join(staticDf, "type", "right_join") // right outer join with a sta Dataset staticDf = spark.read(). ...; Dataset streamingDf = spark.readStream(). 
...; streamingDf.join(staticDf, "type"); // inner equi-join with a static DF -streamingDf.join(staticDf, "type", "right_join"); // right outer join with a static DF +streamingDf.join(staticDf, "type", "left_outer"); // left outer join with a static DF {% endhighlight %} @@ -,7 +,7 @@ streamingDf.join(staticDf, "type", "right_join"); // right outer join with a st staticDf = spark.read. ... streamingDf = spark.readStream. ... streamingDf.join(staticDf, "type") # inner equi-join with a static DF -streamingDf.join(staticDf, "type", "right_join") # right outer join with a static DF +streamingDf.join(staticDf, "type", "left_outer") # left outer join with a static DF {% endhighlight %} @@ -1123,10 +1123,10 @@ staticDf <- read.df(...) streamingDf <- read.stream(...) joined <- merge(streamingDf, staticDf, sort = FALSE) # inner equi
[spark] branch branch-3.0 updated: [MINOR][SS][DOCS] Update join type in stream static joins code examples
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new c43231c [MINOR][SS][DOCS] Update join type in stream static joins code examples c43231c is described below commit c43231c6cd2d38fb9e6c6435ca08b735c9aacb61 Author: Sarvesh Dave AuthorDate: Thu Nov 5 16:22:31 2020 +0900 [MINOR][SS][DOCS] Update join type in stream static joins code examples ### What changes were proposed in this pull request? Update join type in stream static joins code examples in structured streaming programming guide. 1) Scala, Java and Python examples have a common issue. The join keyword is "right_join", it should be "left_outer". _Reasons:_ a) This code snippet is an example of "left outer join" as the streaming df is on left and static df is on right. Also, right outerjoin between stream df(left) and static df(right) is not supported. b) The keyword "right_join/left_join" is unsupported and it should be "right_outer/left_outer". So, all of these code snippets have been updated to "left_outer". 2) R exmaple is correct, but the example is of "right_outer" with static df (left) and streaming df(right). It is changed to "left_outer" to make it consistent with other three examples of scala, java and python. ### Why are the changes needed? To fix the mistake in example code of documentation. ### Does this PR introduce _any_ user-facing change? Yes, it is a user-facing change (but documentation update only). 
**Screenshots 1: Scala/Java/python example (similar issue)** _Before:_ https://user-images.githubusercontent.com/62717942/98155351-19e59400-1efc-11eb-8142-e6a25a5e6497.png";> _After:_ https://user-images.githubusercontent.com/62717942/98155503-5d400280-1efc-11eb-96e1-5ba0f3c35c82.png";> **Screenshots 2: R example (Make it consistent with above change)** _Before:_ https://user-images.githubusercontent.com/62717942/98155685-ac863300-1efc-11eb-93bc-b7ca4dd34634.png";> _After:_ https://user-images.githubusercontent.com/62717942/98155739-c0ca3000-1efc-11eb-8f95-a7538fa784b7.png";> ### How was this patch tested? The change was tested locally. 1) cd docs/ SKIP_API=1 jekyll build 2) Verify docs/_site/structured-streaming-programming-guide.html file in browser. Closes #30252 from sarveshdave1/doc-update-stream-static-joins. Authored-by: Sarvesh Dave Signed-off-by: Jungtaek Lim (HeartSaVioR) (cherry picked from commit e66201b30bc1f3da7284af14b32e5e6200768dbd) Signed-off-by: Jungtaek Lim (HeartSaVioR) --- docs/structured-streaming-programming-guide.md | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md index 69d744d..31b1ca9d 100644 --- a/docs/structured-streaming-programming-guide.md +++ b/docs/structured-streaming-programming-guide.md @@ -1115,7 +1115,7 @@ val staticDf = spark.read. ... val streamingDf = spark.readStream. ... streamingDf.join(staticDf, "type") // inner equi-join with a static DF -streamingDf.join(staticDf, "type", "right_join") // right outer join with a static DF +streamingDf.join(staticDf, "type", "left_outer") // left outer join with a static DF {% endhighlight %} @@ -1126,7 +1126,7 @@ streamingDf.join(staticDf, "type", "right_join") // right outer join with a sta Dataset staticDf = spark.read(). ...; Dataset streamingDf = spark.readStream(). 
...; streamingDf.join(staticDf, "type"); // inner equi-join with a static DF -streamingDf.join(staticDf, "type", "right_join"); // right outer join with a static DF +streamingDf.join(staticDf, "type", "left_outer"); // left outer join with a static DF {% endhighlight %} @@ -1137,7 +1137,7 @@ streamingDf.join(staticDf, "type", "right_join"); // right outer join with a st staticDf = spark.read. ... streamingDf = spark.readStream. ... streamingDf.join(staticDf, "type") # inner equi-join with a static DF -streamingDf.join(staticDf, "type", "right_join") # right outer join with a static DF +streamingDf.join(staticDf, "type", "left_outer") # left outer join with a static DF {% endhighlight %} @@ -1149,10 +1149,10 @@ staticDf <- read.df(...) streamingDf <- read.stream(...) joined <- merge(streamingDf, staticDf, sort = FALSE) # inner equi
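The commit message above argues that these snippets must use `left_outer` because the streaming DataFrame sits on the left of the join. For readers without a Spark session at hand, the semantics the corrected examples rely on — every left (streaming-side) row is kept, pairing with a null/`None` right side where the static table has no match — can be sketched in plain Python over in-memory rows (illustrative record shapes only, not the Spark API):

```python
# Plain-Python sketch of left outer join semantics, mirroring the corrected
# streamingDf.join(staticDf, "type", "left_outer") examples from the guide.
# The dict-based rows and field names here are illustrative, not from the commit.

def left_outer(streaming_rows, static_rows, key):
    """Every left (streaming-side) row survives; unmatched rows pair with None."""
    by_key = {}
    for row in static_rows:
        by_key.setdefault(row[key], []).append(row)
    out = []
    for left in streaming_rows:
        matches = by_key.get(left[key])
        if matches is None:
            out.append((left, None))       # unmatched left row is still kept
        else:
            for right in matches:
                out.append((left, right))  # one output row per match
    return out

streaming = [{"type": "click", "value": 1}, {"type": "scroll", "value": 2}]
static = [{"type": "click", "label": "ui"}]
for row in left_outer(streaming, static, "type"):
    print(row)
# ({'type': 'click', 'value': 1}, {'type': 'click', 'label': 'ui'})
# ({'type': 'scroll', 'value': 2}, None)
```

A `right_outer` between a streaming left and static right would instead need to keep unmatched *static* rows, which is why Spark rejects that combination for stream-static joins.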
[spark] branch master updated (d530ed0 -> e66201b)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from d530ed0 Revert "[SPARK-33277][PYSPARK][SQL] Use ContextAwareIterator to stop consuming after the task ends" add e66201b [MINOR][SS][DOCS] Update join type in stream static joins code examples No new revisions were added by this update. Summary of changes: docs/structured-streaming-programming-guide.md | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-2.4 updated: [MINOR][SS][DOCS] Update join type in stream static joins code examples
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new 8684720 [MINOR][SS][DOCS] Update join type in stream static joins code examples 8684720 is described below commit 868472040a9eb67169086ed0fab0afcbbbd321f9 Author: Sarvesh Dave AuthorDate: Thu Nov 5 16:22:31 2020 +0900 [MINOR][SS][DOCS] Update join type in stream static joins code examples ### What changes were proposed in this pull request? Update the join type in the stream-static joins code examples in the Structured Streaming programming guide. 1) The Scala, Java, and Python examples share a common issue: the join keyword is "right_join", but it should be "left_outer". _Reasons:_ a) These snippets are examples of a left outer join, since the streaming DF is on the left and the static DF is on the right. Also, a right outer join between a streaming DF (left) and a static DF (right) is not supported. b) The keywords "right_join"/"left_join" are unsupported; they should be "right_outer"/"left_outer". So all of these code snippets have been updated to "left_outer". 2) The R example is correct, but it shows a "right_outer" join with the static DF (left) and the streaming DF (right). It is changed to "left_outer" for consistency with the other three examples in Scala, Java, and Python. ### Why are the changes needed? To fix the mistake in the documentation's example code. ### Does this PR introduce _any_ user-facing change? Yes, it is a user-facing change (but a documentation update only). 
**Screenshots 1: Scala/Java/Python example (similar issue)** _Before:_ https://user-images.githubusercontent.com/62717942/98155351-19e59400-1efc-11eb-8142-e6a25a5e6497.png _After:_ https://user-images.githubusercontent.com/62717942/98155503-5d400280-1efc-11eb-96e1-5ba0f3c35c82.png **Screenshots 2: R example (make it consistent with the above change)** _Before:_ https://user-images.githubusercontent.com/62717942/98155685-ac863300-1efc-11eb-93bc-b7ca4dd34634.png _After:_ https://user-images.githubusercontent.com/62717942/98155739-c0ca3000-1efc-11eb-8f95-a7538fa784b7.png ### How was this patch tested? The change was tested locally: 1) cd docs/; SKIP_API=1 jekyll build 2) Verify the docs/_site/structured-streaming-programming-guide.html file in a browser. Closes #30252 from sarveshdave1/doc-update-stream-static-joins. Authored-by: Sarvesh Dave Signed-off-by: Jungtaek Lim (HeartSaVioR) (cherry picked from commit e66201b30bc1f3da7284af14b32e5e6200768dbd) Signed-off-by: Jungtaek Lim (HeartSaVioR) --- docs/structured-streaming-programming-guide.md | 10 +- 1 file changed, 5 insertions(+), 5 deletions(-) diff --git a/docs/structured-streaming-programming-guide.md b/docs/structured-streaming-programming-guide.md index dce4b35..aac262b 100644 --- a/docs/structured-streaming-programming-guide.md +++ b/docs/structured-streaming-programming-guide.md @@ -1089,7 +1089,7 @@ val staticDf = spark.read. ... val streamingDf = spark.readStream. ... streamingDf.join(staticDf, "type") // inner equi-join with a static DF -streamingDf.join(staticDf, "type", "right_join") // right outer join with a static DF +streamingDf.join(staticDf, "type", "left_outer") // left outer join with a static DF {% endhighlight %} @@ -1100,7 +1100,7 @@ streamingDf.join(staticDf, "type", "right_join") // right outer join with a sta Dataset<Row> staticDf = spark.read(). ...; Dataset<Row> streamingDf = spark.readStream(). 
...; streamingDf.join(staticDf, "type"); // inner equi-join with a static DF -streamingDf.join(staticDf, "type", "right_join"); // right outer join with a static DF +streamingDf.join(staticDf, "type", "left_outer"); // left outer join with a static DF {% endhighlight %} @@ -,7 +,7 @@ streamingDf.join(staticDf, "type", "right_join"); // right outer join with a st staticDf = spark.read. ... streamingDf = spark.readStream. ... streamingDf.join(staticDf, "type") # inner equi-join with a static DF -streamingDf.join(staticDf, "type", "right_join") # right outer join with a static DF +streamingDf.join(staticDf, "type", "left_outer") # left outer join with a static DF {% endhighlight %} @@ -1123,10 +1123,10 @@ staticDf <- read.df(...) streamingDf <- read.stream(...) joined <- merge(streamingDf, staticDf, sort = FALSE) # inner equi
[spark] branch master updated (9818f07 -> 4b0e23e)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 9818f07 [SPARK-33243][PYTHON][BUILD] Add numpydoc into documentation dependency add 4b0e23e [SPARK-33215][WEBUI] Speed up event log download by skipping UI rebuild No new revisions were added by this update. Summary of changes: .../history/ApplicationHistoryProvider.scala | 7 +++ .../spark/deploy/history/FsHistoryProvider.scala | 34 ++ .../spark/deploy/history/HistoryServer.scala | 5 ++ .../spark/status/api/v1/ApiRootResource.scala | 15 ++ .../status/api/v1/OneApplicationResource.scala | 9 ++-- .../main/scala/org/apache/spark/ui/SparkUI.scala | 5 ++ .../deploy/history/FsHistoryProviderSuite.scala| 54 +- .../spark/deploy/history/HistoryServerSuite.scala | 18 8 files changed, 132 insertions(+), 15 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (369cc61 -> d87a0bb)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 369cc61 Revert "[SPARK-32388][SQL] TRANSFORM with schema-less mode should keep the same with hive" add d87a0bb [SPARK-32862][SS] Left semi stream-stream join No new revisions were added by this update. Summary of changes: .../analysis/UnsupportedOperationChecker.scala | 15 +- .../spark/sql/catalyst/expressions/JoinedRow.scala | 10 + .../analysis/UnsupportedOperationsSuite.scala | 66 ++- .../streaming/StreamingSymmetricHashJoinExec.scala | 121 +++-- .../state/SymmetricHashJoinStateManager.scala | 11 +- .../spark/sql/streaming/StreamingJoinSuite.scala | 502 +++-- 6 files changed, 545 insertions(+), 180 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
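SPARK-32862 adds left semi as a supported stream-stream join type. A left semi join keeps each left row that has at least one right-side match, without duplicating the row per match and without emitting any right-side columns. Those semantics can be sketched in plain Python over in-memory rows (illustrative data and field names, not the Spark implementation, which additionally handles watermarks and join state):

```python
# Plain-Python sketch of left semi join semantics — the join type that
# SPARK-32862 enables for stream-stream joins. Data is illustrative only.

def left_semi(left_rows, right_rows, key):
    """Keep each left row with at least one right match; no per-match
    duplication, and no right-side columns in the output."""
    right_keys = {row[key] for row in right_rows}
    return [row for row in left_rows if row[key] in right_keys]

orders = [{"id": 1, "item": "book"}, {"id": 2, "item": "pen"}]
payments = [{"id": 1}, {"id": 1}]  # duplicate matches still yield one output row
print(left_semi(orders, payments, "id"))
# [{'id': 1, 'item': 'book'}]
```

In the streaming case the interesting part is state management: an unmatched left row cannot be dropped immediately, since a matching right row may still arrive before the watermark passes — which is why this lands in StreamingSymmetricHashJoinExec and SymmetricHashJoinStateManager.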
[spark] branch branch-3.0 updated (0bff1f6 -> 02f80cf)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 0bff1f6 [SPARK-33123][INFRA] Ignore GitHub only changes in Amplab Jenkins build new 15ed312 [SPARK-32557][CORE] Logging and swallowing the exception per entry in History server new 02f80cf Revert "Revert "[SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS"" The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../spark/deploy/history/FsHistoryProvider.scala | 7 +++- .../deploy/history/FsHistoryProviderSuite.scala| 49 ++ 2 files changed, 55 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 02/02: Revert "Revert "[SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS""
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 02f80cf293739f4d2881316897dbdcea74daa0bc
Author: Jungtaek Lim (HeartSaVioR)
AuthorDate: Thu Oct 15 15:28:52 2020 +0900

    Revert "Revert "[SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS""

    This reverts commit e40c147a5d194adbba13f12590959dc68347ec14.
---
 .../spark/deploy/history/FsHistoryProvider.scala   |  3 ++
 .../deploy/history/FsHistoryProviderSuite.scala    | 49 ++
 2 files changed, 52 insertions(+)

diff --git a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
index f5e7c4fa..7e63d55 100644
--- a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
@@ -527,6 +527,9 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
             reader.fileSizeForLastIndex > 0
           } catch {
             case _: FileNotFoundException => false
+            case NonFatal(e) =>
+              logWarning(s"Error while reading new log ${reader.rootPath}", e)
+              false
           }

         case NonFatal(e) =>
diff --git a/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala
index c2f34fc..f3beb35 100644
--- a/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala
+++ b/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala
@@ -1470,6 +1470,55 @@ class FsHistoryProviderSuite extends SparkFunSuite with Matchers with Logging {
     }
   }

+  test("SPARK-33146: don't let one bad rolling log folder prevent loading other applications") {
+    withTempDir { dir =>
+      val conf = createTestConf(true)
+      conf.set(HISTORY_LOG_DIR, dir.getAbsolutePath)
+      val hadoopConf = SparkHadoopUtil.newConfiguration(conf)
+      val fs = new Path(dir.getAbsolutePath).getFileSystem(hadoopConf)
+
+      val provider = new FsHistoryProvider(conf)
+
+      val writer = new RollingEventLogFilesWriter("app", None, dir.toURI, conf, hadoopConf)
+      writer.start()
+
+      writeEventsToRollingWriter(writer, Seq(
+        SparkListenerApplicationStart("app", Some("app"), 0, "user", None),
+        SparkListenerJobStart(1, 0, Seq.empty)), rollFile = false)
+      provider.checkForLogs()
+      provider.cleanLogs()
+      assert(dir.listFiles().size === 1)
+      assert(provider.getListing.length === 1)
+
+      // Manually delete the appstatus file to make an invalid rolling event log
+      val appStatusPath = RollingEventLogFilesWriter.getAppStatusFilePath(new Path(writer.logPath),
+        "app", None, true)
+      fs.delete(appStatusPath, false)
+      provider.checkForLogs()
+      provider.cleanLogs()
+      assert(provider.getListing.length === 0)
+
+      // Create a new application
+      val writer2 = new RollingEventLogFilesWriter("app2", None, dir.toURI, conf, hadoopConf)
+      writer2.start()
+      writeEventsToRollingWriter(writer2, Seq(
+        SparkListenerApplicationStart("app2", Some("app2"), 0, "user", None),
+        SparkListenerJobStart(1, 0, Seq.empty)), rollFile = false)
+
+      // Both folders exist but only one application found
+      provider.checkForLogs()
+      provider.cleanLogs()
+      assert(provider.getListing.length === 1)
+      assert(dir.listFiles().size === 2)
+
+      // Make sure a new provider sees the valid application
+      provider.stop()
+      val newProvider = new FsHistoryProvider(conf)
+      newProvider.checkForLogs()
+      assert(newProvider.getListing.length === 1)
+    }
+  }
+
   /**
    * Asks the provider to check for logs and calls a function to perform checks on the updated
    * app list. Example:

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 01/02: [SPARK-32557][CORE] Logging and swallowing the exception per entry in History server
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 15ed3121994d4c7e6f1db7e0b5b2cea5565b495a
Author: Yan Xiaole
AuthorDate: Sun Aug 9 16:47:31 2020 -0700

    [SPARK-32557][CORE] Logging and swallowing the exception per entry in History server

    ### What changes were proposed in this pull request?
    This PR adds a try catch wrapping the History server scan logic to log and swallow the exception per entry.

    ### Why are the changes needed?
    As discussed in #29350, one entry failure shouldn't affect others.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Manually tested.

    Closes #29374 from yanxiaole/SPARK-32557.

    Authored-by: Yan Xiaole
    Signed-off-by: Dongjoon Hyun
---
 .../scala/org/apache/spark/deploy/history/FsHistoryProvider.scala | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
index c262152..f5e7c4fa 100644
--- a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
@@ -27,6 +27,7 @@ import java.util.zip.ZipOutputStream
 import scala.collection.JavaConverters._
 import scala.collection.mutable
 import scala.io.Source
+import scala.util.control.NonFatal
 import scala.xml.Node

 import com.fasterxml.jackson.annotation.JsonIgnore
@@ -528,7 +529,8 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
           case _: FileNotFoundException => false
         }

-      case _: FileNotFoundException =>
+      case NonFatal(e) =>
+        logWarning(s"Error while filtering log ${reader.rootPath}", e)
         false
     }
   }

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
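The per-entry pattern this commit applies — catch `NonFatal` inside the filter so one bad entry is logged and skipped rather than aborting the whole scan — can be sketched in isolation. The `LogEntry` case class and the failure condition below are hypothetical stand-ins for the history server's log readers:

```scala
import scala.util.control.NonFatal

// Hypothetical stand-in for a history-server log entry.
final case class LogEntry(rootPath: String, valid: Boolean)

object PerEntryScan {
  // Mirrors the SPARK-32557 pattern: a non-fatal failure on one entry is
  // logged and swallowed, so the remaining entries are still considered.
  def scan(entries: Seq[LogEntry]): Seq[LogEntry] =
    entries.filter { entry =>
      try {
        // Stand-in for the real per-entry check (e.g. reading the file size),
        // which may throw for a corrupt or inaccessible entry.
        if (!entry.valid) throw new IllegalStateException(s"bad entry ${entry.rootPath}")
        true
      } catch {
        case NonFatal(e) =>
          Console.err.println(s"Error while filtering log ${entry.rootPath}: ${e.getMessage}")
          false
      }
    }

  def main(args: Array[String]): Unit = {
    val kept = scan(Seq(
      LogEntry("a", valid = true),
      LogEntry("b", valid = false),
      LogEntry("c", valid = true)))
    println(kept.map(_.rootPath).mkString(","))  // a,c
  }
}
```

`NonFatal` deliberately excludes errors like `OutOfMemoryError`, so truly fatal failures still propagate instead of being swallowed per entry.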
[spark] 01/02: [SPARK-32557][CORE] Logging and swallowing the exception per entry in History server
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git commit 15ed3121994d4c7e6f1db7e0b5b2cea5565b495a Author: Yan Xiaole AuthorDate: Sun Aug 9 16:47:31 2020 -0700 [SPARK-32557][CORE] Logging and swallowing the exception per entry in History server ### What changes were proposed in this pull request? This PR adds a try catch wrapping the History server scan logic to log and swallow the exception per entry. ### Why are the changes needed? As discussed in #29350 , one entry failure shouldn't affect others. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manually tested. Closes #29374 from yanxiaole/SPARK-32557. Authored-by: Yan Xiaole Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/deploy/history/FsHistoryProvider.scala | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala index c262152..f5e7c4fa 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala @@ -27,6 +27,7 @@ import java.util.zip.ZipOutputStream import scala.collection.JavaConverters._ import scala.collection.mutable import scala.io.Source +import scala.util.control.NonFatal import scala.xml.Node import com.fasterxml.jackson.annotation.JsonIgnore @@ -528,7 +529,8 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock) case _: FileNotFoundException => false } -case _: FileNotFoundException => +case NonFatal(e) => + logWarning(s"Error while filtering log ${reader.rootPath}", e) false } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 02/02: Revert "Revert "[SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS""
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git commit 02f80cf293739f4d2881316897dbdcea74daa0bc Author: Jungtaek Lim (HeartSaVioR) AuthorDate: Thu Oct 15 15:28:52 2020 +0900 Revert "Revert "[SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS"" This reverts commit e40c147a5d194adbba13f12590959dc68347ec14. --- .../spark/deploy/history/FsHistoryProvider.scala | 3 ++ .../deploy/history/FsHistoryProviderSuite.scala| 49 ++ 2 files changed, 52 insertions(+) diff --git a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala index f5e7c4fa..7e63d55 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala @@ -527,6 +527,9 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock) reader.fileSizeForLastIndex > 0 } catch { case _: FileNotFoundException => false +case NonFatal(e) => + logWarning(s"Error while reading new log ${reader.rootPath}", e) + false } case NonFatal(e) => diff --git a/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala index c2f34fc..f3beb35 100644 --- a/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala +++ b/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala @@ -1470,6 +1470,55 @@ class FsHistoryProviderSuite extends SparkFunSuite with Matchers with Logging { } } + test("SPARK-33146: don't let one bad rolling log folder prevent loading other applications") { +withTempDir { dir => + val conf = createTestConf(true) + conf.set(HISTORY_LOG_DIR, dir.getAbsolutePath) + val hadoopConf = 
SparkHadoopUtil.newConfiguration(conf) + val fs = new Path(dir.getAbsolutePath).getFileSystem(hadoopConf) + + val provider = new FsHistoryProvider(conf) + + val writer = new RollingEventLogFilesWriter("app", None, dir.toURI, conf, hadoopConf) + writer.start() + + writeEventsToRollingWriter(writer, Seq( +SparkListenerApplicationStart("app", Some("app"), 0, "user", None), +SparkListenerJobStart(1, 0, Seq.empty)), rollFile = false) + provider.checkForLogs() + provider.cleanLogs() + assert(dir.listFiles().size === 1) + assert(provider.getListing.length === 1) + + // Manually delete the appstatus file to make an invalid rolling event log + val appStatusPath = RollingEventLogFilesWriter.getAppStatusFilePath(new Path(writer.logPath), +"app", None, true) + fs.delete(appStatusPath, false) + provider.checkForLogs() + provider.cleanLogs() + assert(provider.getListing.length === 0) + + // Create a new application + val writer2 = new RollingEventLogFilesWriter("app2", None, dir.toURI, conf, hadoopConf) + writer2.start() + writeEventsToRollingWriter(writer2, Seq( +SparkListenerApplicationStart("app2", Some("app2"), 0, "user", None), +SparkListenerJobStart(1, 0, Seq.empty)), rollFile = false) + + // Both folders exist but only one application found + provider.checkForLogs() + provider.cleanLogs() + assert(provider.getListing.length === 1) + assert(dir.listFiles().size === 2) + + // Make sure a new provider sees the valid application + provider.stop() + val newProvider = new FsHistoryProvider(conf) + newProvider.checkForLogs() + assert(newProvider.getListing.length === 1) +} + } + /** * Asks the provider to check for logs and calls a function to perform checks on the updated * app list. Example: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated (0bff1f6 -> 02f80cf)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 0bff1f6 [SPARK-33123][INFRA] Ignore GitHub only changes in Amplab Jenkins build new 15ed312 [SPARK-32557][CORE] Logging and swallowing the exception per entry in History server new 02f80cf Revert "Revert "[SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS"" The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../spark/deploy/history/FsHistoryProvider.scala | 7 +++- .../deploy/history/FsHistoryProviderSuite.scala| 49 ++ 2 files changed, 55 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 01/02: [SPARK-32557][CORE] Logging and swallowing the exception per entry in History server
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git commit 15ed3121994d4c7e6f1db7e0b5b2cea5565b495a Author: Yan Xiaole AuthorDate: Sun Aug 9 16:47:31 2020 -0700 [SPARK-32557][CORE] Logging and swallowing the exception per entry in History server ### What changes were proposed in this pull request? This PR adds a try catch wrapping the History server scan logic to log and swallow the exception per entry. ### Why are the changes needed? As discussed in #29350 , one entry failure shouldn't affect others. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manually tested. Closes #29374 from yanxiaole/SPARK-32557. Authored-by: Yan Xiaole Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/deploy/history/FsHistoryProvider.scala | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala index c262152..f5e7c4fa 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala @@ -27,6 +27,7 @@ import java.util.zip.ZipOutputStream import scala.collection.JavaConverters._ import scala.collection.mutable import scala.io.Source +import scala.util.control.NonFatal import scala.xml.Node import com.fasterxml.jackson.annotation.JsonIgnore @@ -528,7 +529,8 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock) case _: FileNotFoundException => false } -case _: FileNotFoundException => +case NonFatal(e) => + logWarning(s"Error while filtering log ${reader.rootPath}", e) false } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated (0bff1f6 -> 02f80cf)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 0bff1f6 [SPARK-33123][INFRA] Ignore GitHub only changes in Amplab Jenkins build new 15ed312 [SPARK-32557][CORE] Logging and swallowing the exception per entry in History server new 02f80cf Revert "Revert "[SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS"" The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../spark/deploy/history/FsHistoryProvider.scala | 7 +++- .../deploy/history/FsHistoryProviderSuite.scala| 49 ++ 2 files changed, 55 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 02/02: Revert "Revert "[SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS""
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git commit 02f80cf293739f4d2881316897dbdcea74daa0bc Author: Jungtaek Lim (HeartSaVioR) AuthorDate: Thu Oct 15 15:28:52 2020 +0900 Revert "Revert "[SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS"" This reverts commit e40c147a5d194adbba13f12590959dc68347ec14. --- .../spark/deploy/history/FsHistoryProvider.scala | 3 ++ .../deploy/history/FsHistoryProviderSuite.scala| 49 ++ 2 files changed, 52 insertions(+) diff --git a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala index f5e7c4fa..7e63d55 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala @@ -527,6 +527,9 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock) reader.fileSizeForLastIndex > 0 } catch { case _: FileNotFoundException => false +case NonFatal(e) => + logWarning(s"Error while reading new log ${reader.rootPath}", e) + false } case NonFatal(e) => diff --git a/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala index c2f34fc..f3beb35 100644 --- a/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala +++ b/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala @@ -1470,6 +1470,55 @@ class FsHistoryProviderSuite extends SparkFunSuite with Matchers with Logging { } } + test("SPARK-33146: don't let one bad rolling log folder prevent loading other applications") { +withTempDir { dir => + val conf = createTestConf(true) + conf.set(HISTORY_LOG_DIR, dir.getAbsolutePath) + val hadoopConf = 
SparkHadoopUtil.newConfiguration(conf) + val fs = new Path(dir.getAbsolutePath).getFileSystem(hadoopConf) + + val provider = new FsHistoryProvider(conf) + + val writer = new RollingEventLogFilesWriter("app", None, dir.toURI, conf, hadoopConf) + writer.start() + + writeEventsToRollingWriter(writer, Seq( +SparkListenerApplicationStart("app", Some("app"), 0, "user", None), +SparkListenerJobStart(1, 0, Seq.empty)), rollFile = false) + provider.checkForLogs() + provider.cleanLogs() + assert(dir.listFiles().size === 1) + assert(provider.getListing.length === 1) + + // Manually delete the appstatus file to make an invalid rolling event log + val appStatusPath = RollingEventLogFilesWriter.getAppStatusFilePath(new Path(writer.logPath), +"app", None, true) + fs.delete(appStatusPath, false) + provider.checkForLogs() + provider.cleanLogs() + assert(provider.getListing.length === 0) + + // Create a new application + val writer2 = new RollingEventLogFilesWriter("app2", None, dir.toURI, conf, hadoopConf) + writer2.start() + writeEventsToRollingWriter(writer2, Seq( +SparkListenerApplicationStart("app2", Some("app2"), 0, "user", None), +SparkListenerJobStart(1, 0, Seq.empty)), rollFile = false) + + // Both folders exist but only one application found + provider.checkForLogs() + provider.cleanLogs() + assert(provider.getListing.length === 1) + assert(dir.listFiles().size === 2) + + // Make sure a new provider sees the valid application + provider.stop() + val newProvider = new FsHistoryProvider(conf) + newProvider.checkForLogs() + assert(newProvider.getListing.length === 1) +} + } + /** * Asks the provider to check for logs and calls a function to perform checks on the updated * app list. Example: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated (0bff1f6 -> 02f80cf)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 0bff1f6 [SPARK-33123][INFRA] Ignore GitHub only changes in Amplab Jenkins build new 15ed312 [SPARK-32557][CORE] Logging and swallowing the exception per entry in History server new 02f80cf Revert "Revert "[SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS"" The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../spark/deploy/history/FsHistoryProvider.scala | 7 +++- .../deploy/history/FsHistoryProviderSuite.scala| 49 ++ 2 files changed, 55 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 02/02: Revert "Revert "[SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS""
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git commit 02f80cf293739f4d2881316897dbdcea74daa0bc Author: Jungtaek Lim (HeartSaVioR) AuthorDate: Thu Oct 15 15:28:52 2020 +0900 Revert "Revert "[SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS"" This reverts commit e40c147a5d194adbba13f12590959dc68347ec14. --- .../spark/deploy/history/FsHistoryProvider.scala | 3 ++ .../deploy/history/FsHistoryProviderSuite.scala| 49 ++ 2 files changed, 52 insertions(+) diff --git a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala index f5e7c4fa..7e63d55 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala @@ -527,6 +527,9 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock) reader.fileSizeForLastIndex > 0 } catch { case _: FileNotFoundException => false +case NonFatal(e) => + logWarning(s"Error while reading new log ${reader.rootPath}", e) + false } case NonFatal(e) => diff --git a/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala index c2f34fc..f3beb35 100644 --- a/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala +++ b/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala @@ -1470,6 +1470,55 @@ class FsHistoryProviderSuite extends SparkFunSuite with Matchers with Logging { } } + test("SPARK-33146: don't let one bad rolling log folder prevent loading other applications") { +withTempDir { dir => + val conf = createTestConf(true) + conf.set(HISTORY_LOG_DIR, dir.getAbsolutePath) + val hadoopConf = 
SparkHadoopUtil.newConfiguration(conf) + val fs = new Path(dir.getAbsolutePath).getFileSystem(hadoopConf) + + val provider = new FsHistoryProvider(conf) + + val writer = new RollingEventLogFilesWriter("app", None, dir.toURI, conf, hadoopConf) + writer.start() + + writeEventsToRollingWriter(writer, Seq( +SparkListenerApplicationStart("app", Some("app"), 0, "user", None), +SparkListenerJobStart(1, 0, Seq.empty)), rollFile = false) + provider.checkForLogs() + provider.cleanLogs() + assert(dir.listFiles().size === 1) + assert(provider.getListing.length === 1) + + // Manually delete the appstatus file to make an invalid rolling event log + val appStatusPath = RollingEventLogFilesWriter.getAppStatusFilePath(new Path(writer.logPath), +"app", None, true) + fs.delete(appStatusPath, false) + provider.checkForLogs() + provider.cleanLogs() + assert(provider.getListing.length === 0) + + // Create a new application + val writer2 = new RollingEventLogFilesWriter("app2", None, dir.toURI, conf, hadoopConf) + writer2.start() + writeEventsToRollingWriter(writer2, Seq( +SparkListenerApplicationStart("app2", Some("app2"), 0, "user", None), +SparkListenerJobStart(1, 0, Seq.empty)), rollFile = false) + + // Both folders exist but only one application found + provider.checkForLogs() + provider.cleanLogs() + assert(provider.getListing.length === 1) + assert(dir.listFiles().size === 2) + + // Make sure a new provider sees the valid application + provider.stop() + val newProvider = new FsHistoryProvider(conf) + newProvider.checkForLogs() + assert(newProvider.getListing.length === 1) +} + } + /** * Asks the provider to check for logs and calls a function to perform checks on the updated * app list. Example: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 01/02: [SPARK-32557][CORE] Logging and swallowing the exception per entry in History server
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git commit 15ed3121994d4c7e6f1db7e0b5b2cea5565b495a Author: Yan Xiaole AuthorDate: Sun Aug 9 16:47:31 2020 -0700 [SPARK-32557][CORE] Logging and swallowing the exception per entry in History server ### What changes were proposed in this pull request? This PR adds a try catch wrapping the History server scan logic to log and swallow the exception per entry. ### Why are the changes needed? As discussed in #29350 , one entry failure shouldn't affect others. ### Does this PR introduce _any_ user-facing change? No. ### How was this patch tested? Manually tested. Closes #29374 from yanxiaole/SPARK-32557. Authored-by: Yan Xiaole Signed-off-by: Dongjoon Hyun --- .../scala/org/apache/spark/deploy/history/FsHistoryProvider.scala | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala index c262152..f5e7c4fa 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala @@ -27,6 +27,7 @@ import java.util.zip.ZipOutputStream import scala.collection.JavaConverters._ import scala.collection.mutable import scala.io.Source +import scala.util.control.NonFatal import scala.xml.Node import com.fasterxml.jackson.annotation.JsonIgnore @@ -528,7 +529,8 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock) case _: FileNotFoundException => false } -case _: FileNotFoundException => +case NonFatal(e) => + logWarning(s"Error while filtering log ${reader.rootPath}", e) false } } - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 02/02: Revert "Revert "[SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS""
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git commit 02f80cf293739f4d2881316897dbdcea74daa0bc Author: Jungtaek Lim (HeartSaVioR) AuthorDate: Thu Oct 15 15:28:52 2020 +0900 Revert "Revert "[SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS"" This reverts commit e40c147a5d194adbba13f12590959dc68347ec14. --- .../spark/deploy/history/FsHistoryProvider.scala | 3 ++ .../deploy/history/FsHistoryProviderSuite.scala| 49 ++ 2 files changed, 52 insertions(+) diff --git a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala index f5e7c4fa..7e63d55 100644 --- a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala +++ b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala @@ -527,6 +527,9 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock) reader.fileSizeForLastIndex > 0 } catch { case _: FileNotFoundException => false +case NonFatal(e) => + logWarning(s"Error while reading new log ${reader.rootPath}", e) + false } case NonFatal(e) => diff --git a/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala index c2f34fc..f3beb35 100644 --- a/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala +++ b/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala @@ -1470,6 +1470,55 @@ class FsHistoryProviderSuite extends SparkFunSuite with Matchers with Logging { } } + test("SPARK-33146: don't let one bad rolling log folder prevent loading other applications") { +withTempDir { dir => + val conf = createTestConf(true) + conf.set(HISTORY_LOG_DIR, dir.getAbsolutePath) + val hadoopConf = 
SparkHadoopUtil.newConfiguration(conf) + val fs = new Path(dir.getAbsolutePath).getFileSystem(hadoopConf) + + val provider = new FsHistoryProvider(conf) + + val writer = new RollingEventLogFilesWriter("app", None, dir.toURI, conf, hadoopConf) + writer.start() + + writeEventsToRollingWriter(writer, Seq( +SparkListenerApplicationStart("app", Some("app"), 0, "user", None), +SparkListenerJobStart(1, 0, Seq.empty)), rollFile = false) + provider.checkForLogs() + provider.cleanLogs() + assert(dir.listFiles().size === 1) + assert(provider.getListing.length === 1) + + // Manually delete the appstatus file to make an invalid rolling event log + val appStatusPath = RollingEventLogFilesWriter.getAppStatusFilePath(new Path(writer.logPath), +"app", None, true) + fs.delete(appStatusPath, false) + provider.checkForLogs() + provider.cleanLogs() + assert(provider.getListing.length === 0) + + // Create a new application + val writer2 = new RollingEventLogFilesWriter("app2", None, dir.toURI, conf, hadoopConf) + writer2.start() + writeEventsToRollingWriter(writer2, Seq( +SparkListenerApplicationStart("app2", Some("app2"), 0, "user", None), +SparkListenerJobStart(1, 0, Seq.empty)), rollFile = false) + + // Both folders exist but only one application found + provider.checkForLogs() + provider.cleanLogs() + assert(provider.getListing.length === 1) + assert(dir.listFiles().size === 2) + + // Make sure a new provider sees the valid application + provider.stop() + val newProvider = new FsHistoryProvider(conf) + newProvider.checkForLogs() + assert(newProvider.getListing.length === 1) +} + } + /** * Asks the provider to check for logs and calls a function to perform checks on the updated * app list. Example: - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated (0bff1f6 -> 02f80cf)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git. from 0bff1f6 [SPARK-33123][INFRA] Ignore GitHub only changes in Amplab Jenkins build new 15ed312 [SPARK-32557][CORE] Logging and swallowing the exception per entry in History server new 02f80cf Revert "Revert "[SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS"" The 2 revisions listed above as "new" are entirely new to this repository and will be described in separate emails. The revisions listed as "add" were already present in the repository and have only been added to this reference. Summary of changes: .../spark/deploy/history/FsHistoryProvider.scala | 7 +++- .../deploy/history/FsHistoryProviderSuite.scala| 49 ++ 2 files changed, 55 insertions(+), 1 deletion(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] 01/02: [SPARK-32557][CORE] Logging and swallowing the exception per entry in History server
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

commit 15ed3121994d4c7e6f1db7e0b5b2cea5565b495a
Author: Yan Xiaole
AuthorDate: Sun Aug 9 16:47:31 2020 -0700

    [SPARK-32557][CORE] Logging and swallowing the exception per entry in History server

    ### What changes were proposed in this pull request?
    This PR adds a try catch wrapping the History server scan logic to log and swallow the exception per entry.

    ### Why are the changes needed?
    As discussed in #29350, one entry failure shouldn't affect others.

    ### Does this PR introduce _any_ user-facing change?
    No.

    ### How was this patch tested?
    Manually tested.

    Closes #29374 from yanxiaole/SPARK-32557.

    Authored-by: Yan Xiaole
    Signed-off-by: Dongjoon Hyun
---
 .../scala/org/apache/spark/deploy/history/FsHistoryProvider.scala | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
index c262152..f5e7c4fa 100644
--- a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
@@ -27,6 +27,7 @@ import java.util.zip.ZipOutputStream
 import scala.collection.JavaConverters._
 import scala.collection.mutable
 import scala.io.Source
+import scala.util.control.NonFatal
 import scala.xml.Node

 import com.fasterxml.jackson.annotation.JsonIgnore
@@ -528,7 +529,8 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
           case _: FileNotFoundException => false
         }

-      case _: FileNotFoundException =>
+      case NonFatal(e) =>
+        logWarning(s"Error while filtering log ${reader.rootPath}", e)
         false
     }
   }

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
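As a rough illustration outside of Spark, the per-entry guard in the diff above can be sketched in plain Scala. `isReadable` and the sample entries below are hypothetical stand-ins for the provider's scan logic, not the real `FsHistoryProvider` API; only the `NonFatal` catch-log-and-skip pattern is taken from the commit.

```scala
import scala.util.control.NonFatal

// Hypothetical stand-in for reading one event-log entry's size.
// NonFatal matches ordinary exceptions but lets truly fatal ones
// (OutOfMemoryError, InterruptedException, ...) propagate.
def isReadable(loadSize: () => Long): Boolean =
  try {
    loadSize() > 0
  } catch {
    case NonFatal(e) =>
      // Log and swallow: one corrupt entry must not abort the whole scan.
      println(s"Error while reading log entry: ${e.getMessage}")
      false
  }

val entries = Seq(
  () => 42L,                                     // healthy entry
  () => throw new IllegalStateException("bad"),  // corrupt entry, skipped
  () => 7L                                       // still scanned afterwards
)
val readableCount = entries.count(isReadable)
```

Counting over all three entries still succeeds; only the corrupt one is dropped, which is the behavior the PR description calls "one entry failure shouldn't affect others".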
[spark] branch branch-3.0 updated: [SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a commit to branch branch-3.0
in repository https://gitbox.apache.org/repos/asf/spark.git

The following commit(s) were added to refs/heads/branch-3.0 by this push:
     new d9669bd  [SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS
d9669bd is described below

commit d9669bdf0ff4ed9951d7077b8dc9ad94507615c5
Author: Adam Binford
AuthorDate: Thu Oct 15 11:59:29 2020 +0900

    [SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS

    ### What changes were proposed in this pull request?
    Adds an additional check for non-fatal errors when attempting to add a new entry to the history server application listing.

    ### Why are the changes needed?
    A bad rolling event log folder (missing appstatus file or no log files) would cause no applications to be loaded by the Spark history server. Figuring out why invalid event log folders are created in the first place will be addressed in separate issues; this just lets the history server skip the invalid folder and successfully load all the valid applications.

    ### Does this PR introduce _any_ user-facing change?
    No

    ### How was this patch tested?
    New UT

    Closes #30037 from Kimahriman/bug/rolling-log-crashing-history.

    Authored-by: Adam Binford
    Signed-off-by: Jungtaek Lim (HeartSaVioR)
    (cherry picked from commit 9ab0ec4e38e5df0537b38cb0f89e004ad57bec90)
    Signed-off-by: Jungtaek Lim (HeartSaVioR)
---
 .../spark/deploy/history/FsHistoryProvider.scala   |  3 ++
 .../deploy/history/FsHistoryProviderSuite.scala    | 49 ++
 2 files changed, 52 insertions(+)

diff --git a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
index c262152..5970708 100644
--- a/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
+++ b/core/src/main/scala/org/apache/spark/deploy/history/FsHistoryProvider.scala
@@ -526,6 +526,9 @@ private[history] class FsHistoryProvider(conf: SparkConf, clock: Clock)
             reader.fileSizeForLastIndex > 0
           } catch {
             case _: FileNotFoundException => false
+            case NonFatal(e) =>
+              logWarning(s"Error while reading new log ${reader.rootPath}", e)
+              false
           }

         case _: FileNotFoundException =>
diff --git a/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala b/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala
index c2f34fc..f3beb35 100644
--- a/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala
+++ b/core/src/test/scala/org/apache/spark/deploy/history/FsHistoryProviderSuite.scala
@@ -1470,6 +1470,55 @@ class FsHistoryProviderSuite extends SparkFunSuite with Matchers with Logging {
     }
   }

+  test("SPARK-33146: don't let one bad rolling log folder prevent loading other applications") {
+    withTempDir { dir =>
+      val conf = createTestConf(true)
+      conf.set(HISTORY_LOG_DIR, dir.getAbsolutePath)
+      val hadoopConf = SparkHadoopUtil.newConfiguration(conf)
+      val fs = new Path(dir.getAbsolutePath).getFileSystem(hadoopConf)
+
+      val provider = new FsHistoryProvider(conf)
+
+      val writer = new RollingEventLogFilesWriter("app", None, dir.toURI, conf, hadoopConf)
+      writer.start()
+
+      writeEventsToRollingWriter(writer, Seq(
+        SparkListenerApplicationStart("app", Some("app"), 0, "user", None),
+        SparkListenerJobStart(1, 0, Seq.empty)), rollFile = false)
+      provider.checkForLogs()
+      provider.cleanLogs()
+      assert(dir.listFiles().size === 1)
+      assert(provider.getListing.length === 1)
+
+      // Manually delete the appstatus file to make an invalid rolling event log
+      val appStatusPath = RollingEventLogFilesWriter.getAppStatusFilePath(new Path(writer.logPath),
+        "app", None, true)
+      fs.delete(appStatusPath, false)
+      provider.checkForLogs()
+      provider.cleanLogs()
+      assert(provider.getListing.length === 0)
+
+      // Create a new application
+      val writer2 = new RollingEventLogFilesWriter("app2", None, dir.toURI, conf, hadoopConf)
+      writer2.start()
+      writeEventsToRollingWriter(writer2, Seq(
+        SparkListenerApplicationStart("app2", Some("app2"), 0, "user", None),
+        SparkListenerJobStart(1, 0, Seq.empty)), rollFile = false)
+
+      // Both folders exist but only one application found
+      provider.checkForLogs()
+      provider.cleanLogs()
+      assert(provider.getListing.length === 1)
+      assert(dir.listFiles().size === 2)
+
+      // Make sure a new provider sees the valid application
+      provider.stop()
+      val newProvider = new FsHistoryProvider(conf)
+      newProvider.checkForLogs()
+      assert(newProvider.getListing.length === 1)
+    }
+  }
[spark] branch master updated (f3ad32f -> 9ab0ec4)
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from  f3ad32f  [SPARK-33026][SQL][FOLLOWUP] metrics name should be numOutputRows
     add  9ab0ec4  [SPARK-33146][CORE] Check for non-fatal errors when loading new applications in SHS

No new revisions were added by this update.

Summary of changes:
 .../spark/deploy/history/FsHistoryProvider.scala   |  3 ++
 .../deploy/history/FsHistoryProviderSuite.scala    | 49 ++
 2 files changed, 52 insertions(+)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (7fdb571 -> d936cb3)
This is an automated email from the ASF dual-hosted git repository.

kabhwan pushed a change to branch master
in repository https://gitbox.apache.org/repos/asf/spark.git.

    from  7fdb571  [SPARK-32890][SQL] Pass all `sql/hive` module UTs in Scala 2.13
     add  d936cb3  [SPARK-26425][SS] Add more constraint checks to avoid checkpoint corruption

No new revisions were added by this update.

Summary of changes:
 .../spark/sql/execution/streaming/FileStreamSource.scala     | 12 +---
 .../spark/sql/execution/streaming/MicroBatchExecution.scala  |  4 +++-
 2 files changed, 12 insertions(+), 4 deletions(-)

-
To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org
For additional commands, e-mail: commits-h...@spark.apache.org
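The kind of defensive constraint check SPARK-26425 adds can be sketched in plain Scala. The `OffsetLog` type, `validateBatch` helper, and error messages below are hypothetical illustrations, not Spark's real `MicroBatchExecution` internals; the idea shown is simply to fail fast when the checkpoint's latest offset entry disagrees with the batch the engine believes it is running.

```scala
// Toy stand-in for a streaming checkpoint's offset log.
final case class OffsetLog(entries: Map[Long, String]) {
  def latestBatchId: Option[Long] =
    if (entries.isEmpty) None else Some(entries.keys.max)
}

// Validate that the checkpoint still reflects the batch we are running;
// a mismatch suggests a concurrent writer or a corrupted checkpoint.
def validateBatch(log: OffsetLog, currentBatchId: Long): Either[String, Unit] =
  log.latestBatchId match {
    case Some(latest) if latest != currentBatchId =>
      Left(s"Concurrent update detected: latest offset is for batch $latest, " +
        s"but this query is running batch $currentBatchId")
    case None =>
      Left(s"Offset log is empty while running batch $currentBatchId; " +
        "the checkpoint may be corrupted")
    case _ =>
      Right(())
  }

val healthy = validateBatch(OffsetLog(Map(3L -> "offsets-3")), 3L)
val stale   = validateBatch(OffsetLog(Map(2L -> "offsets-2")), 3L)
```

Checks like this turn silent checkpoint corruption into an immediate, descriptive failure instead of letting the query keep writing bad state.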
[spark] branch master updated (4a09613 -> db89b0e)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 4a09613 Revert "[SPARK-32772][SQL][FOLLOWUP] Remove legacy silent support mode for spark-sql CLI" add db89b0e [SPARK-32831][SS] Refactor SupportsStreamingUpdate to represent actual meaning of the behavior No new revisions were added by this update. Summary of changes: .../apache/spark/sql/kafka010/KafkaSourceProvider.scala | 5 ++--- ...Update.scala => SupportsStreamingUpdateAsAppend.scala} | 15 +++ .../sql/execution/datasources/noop/NoopDataSource.scala | 5 ++--- .../spark/sql/execution/streaming/StreamExecution.scala | 6 +++--- .../apache/spark/sql/execution/streaming/console.scala| 7 +++ .../execution/streaming/sources/ForeachWriterTable.scala | 7 +++ .../spark/sql/execution/streaming/sources/memory.scala| 7 ++- 7 files changed, 26 insertions(+), 26 deletions(-) rename sql/catalyst/src/main/scala/org/apache/spark/sql/internal/connector/{SupportsStreamingUpdate.scala => SupportsStreamingUpdateAsAppend.scala} (64%) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (3eee915 -> 9151a58)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 3eee915 [MINOR][SQL] Add missing documentation for LongType mapping add 9151a58 [SPARK-31608][CORE][WEBUI][TEST] Add test suites for HybridStore and HistoryServerMemoryManager No new revisions were added by this update. Summary of changes: .../history/HistoryServerMemoryManager.scala | 5 +- .../apache/spark/deploy/history/HybridStore.scala | 6 +- .../deploy/history/FsHistoryProviderSuite.scala| 13 +- .../history/HistoryServerMemoryManagerSuite.scala | 55 + .../spark/deploy/history/HybridStoreSuite.scala| 232 + 5 files changed, 304 insertions(+), 7 deletions(-) create mode 100644 core/src/test/scala/org/apache/spark/deploy/history/HistoryServerMemoryManagerSuite.scala create mode 100644 core/src/test/scala/org/apache/spark/deploy/history/HybridStoreSuite.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new a6df16b [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations a6df16b is described below commit a6df16b36210da32359c77205920eaee98d3e232 Author: Yuanjian Li AuthorDate: Sat Aug 22 21:32:23 2020 +0900 [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations ### What changes were proposed in this pull request? Rephrase the description for some operations to make it clearer. ### Why are the changes needed? Add more detail in the document. ### Does this PR introduce _any_ user-facing change? No, document only. ### How was this patch tested? Document only. Closes #29269 from xuanyuanking/SPARK-31792-follow. Authored-by: Yuanjian Li Signed-off-by: Jungtaek Lim (HeartSaVioR) (cherry picked from commit 8b26c69ce7f905a3c7bbabb1c47ee6a51a23) Signed-off-by: Jungtaek Lim (HeartSaVioR) --- docs/web-ui.md | 10 +- .../spark/sql/execution/streaming/MicroBatchExecution.scala| 3 +-- 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/docs/web-ui.md b/docs/web-ui.md index 134a8c8..fe26043 100644 --- a/docs/web-ui.md +++ b/docs/web-ui.md @@ -425,11 +425,11 @@ queries. Currently, it contains the following metrics. * **Batch Duration.** The process duration of each batch. * **Operation Duration.** The amount of time taken to perform various operations in milliseconds. The tracked operations are listed as follows. -* addBatch: Adds result data of the current batch to the sink. -* getBatch: Gets a new batch of data to process. -* latestOffset: Gets the latest offsets for sources. -* queryPlanning: Generates the execution plan. -* walCommit: Writes the offsets to the metadata log. 
+* addBatch: Time taken to read the micro-batch's input data from the sources, process it, and write the batch's output to the sink. This should take the bulk of the micro-batch's time. +* getBatch: Time taken to prepare the logical query to read the input of the current micro-batch from the sources. +* latestOffset & getOffset: Time taken to query the maximum available offset for this source. +* queryPlanning: Time taken to generates the execution plan. +* walCommit: Time taken to write the offsets to the metadata log. As an early-release version, the statistics page is still under development and will be improved in future releases. diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala index e022bfb..e0731db 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala @@ -566,8 +566,7 @@ class MicroBatchExecution( val nextBatch = new Dataset(lastExecution, RowEncoder(lastExecution.analyzed.schema)) -val batchSinkProgress: Option[StreamWriterCommitProgress] = - reportTimeTaken("addBatch") { +val batchSinkProgress: Option[StreamWriterCommitProgress] = reportTimeTaken("addBatch") { SQLExecution.withNewExecutionId(lastExecution) { sink match { case s: Sink => s.addBatch(currentBatchId, nextBatch) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (12f4331 -> 8b26c69)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 12f4331 [SPARK-32672][SQL] Fix data corruption in boolean bit set compression add 8b26c69 [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations No new revisions were added by this update. Summary of changes: docs/web-ui.md | 10 +- .../spark/sql/execution/streaming/MicroBatchExecution.scala| 3 +-- 2 files changed, 6 insertions(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new a6df16b [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations a6df16b is described below commit a6df16b36210da32359c77205920eaee98d3e232 Author: Yuanjian Li AuthorDate: Sat Aug 22 21:32:23 2020 +0900 [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations ### What changes were proposed in this pull request? Rephrase the description for some operations to make it clearer. ### Why are the changes needed? Add more detail in the document. ### Does this PR introduce _any_ user-facing change? No, document only. ### How was this patch tested? Document only. Closes #29269 from xuanyuanking/SPARK-31792-follow. Authored-by: Yuanjian Li Signed-off-by: Jungtaek Lim (HeartSaVioR) (cherry picked from commit 8b26c69ce7f905a3c7bbabb1c47ee6a51a23) Signed-off-by: Jungtaek Lim (HeartSaVioR) --- docs/web-ui.md | 10 +- .../spark/sql/execution/streaming/MicroBatchExecution.scala| 3 +-- 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/docs/web-ui.md b/docs/web-ui.md index 134a8c8..fe26043 100644 --- a/docs/web-ui.md +++ b/docs/web-ui.md @@ -425,11 +425,11 @@ queries. Currently, it contains the following metrics. * **Batch Duration.** The process duration of each batch. * **Operation Duration.** The amount of time taken to perform various operations in milliseconds. The tracked operations are listed as follows. -* addBatch: Adds result data of the current batch to the sink. -* getBatch: Gets a new batch of data to process. -* latestOffset: Gets the latest offsets for sources. -* queryPlanning: Generates the execution plan. -* walCommit: Writes the offsets to the metadata log. 
+* addBatch: Time taken to read the micro-batch's input data from the sources, process it, and write the batch's output to the sink. This should take the bulk of the micro-batch's time. +* getBatch: Time taken to prepare the logical query to read the input of the current micro-batch from the sources. +* latestOffset & getOffset: Time taken to query the maximum available offset for this source. +* queryPlanning: Time taken to generates the execution plan. +* walCommit: Time taken to write the offsets to the metadata log. As an early-release version, the statistics page is still under development and will be improved in future releases. diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala index e022bfb..e0731db 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala @@ -566,8 +566,7 @@ class MicroBatchExecution( val nextBatch = new Dataset(lastExecution, RowEncoder(lastExecution.analyzed.schema)) -val batchSinkProgress: Option[StreamWriterCommitProgress] = - reportTimeTaken("addBatch") { +val batchSinkProgress: Option[StreamWriterCommitProgress] = reportTimeTaken("addBatch") { SQLExecution.withNewExecutionId(lastExecution) { sink match { case s: Sink => s.addBatch(currentBatchId, nextBatch) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (12f4331 -> 8b26c69)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 12f4331 [SPARK-32672][SQL] Fix data corruption in boolean bit set compression add 8b26c69 [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations No new revisions were added by this update. Summary of changes: docs/web-ui.md | 10 +- .../spark/sql/execution/streaming/MicroBatchExecution.scala| 3 +-- 2 files changed, 6 insertions(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new a6df16b [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations a6df16b is described below commit a6df16b36210da32359c77205920eaee98d3e232 Author: Yuanjian Li AuthorDate: Sat Aug 22 21:32:23 2020 +0900 [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations ### What changes were proposed in this pull request? Rephrase the description for some operations to make it clearer. ### Why are the changes needed? Add more detail in the document. ### Does this PR introduce _any_ user-facing change? No, document only. ### How was this patch tested? Document only. Closes #29269 from xuanyuanking/SPARK-31792-follow. Authored-by: Yuanjian Li Signed-off-by: Jungtaek Lim (HeartSaVioR) (cherry picked from commit 8b26c69ce7f905a3c7bbabb1c47ee6a51a23) Signed-off-by: Jungtaek Lim (HeartSaVioR) --- docs/web-ui.md | 10 +- .../spark/sql/execution/streaming/MicroBatchExecution.scala| 3 +-- 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/docs/web-ui.md b/docs/web-ui.md index 134a8c8..fe26043 100644 --- a/docs/web-ui.md +++ b/docs/web-ui.md @@ -425,11 +425,11 @@ queries. Currently, it contains the following metrics. * **Batch Duration.** The process duration of each batch. * **Operation Duration.** The amount of time taken to perform various operations in milliseconds. The tracked operations are listed as follows. -* addBatch: Adds result data of the current batch to the sink. -* getBatch: Gets a new batch of data to process. -* latestOffset: Gets the latest offsets for sources. -* queryPlanning: Generates the execution plan. -* walCommit: Writes the offsets to the metadata log. 
+* addBatch: Time taken to read the micro-batch's input data from the sources, process it, and write the batch's output to the sink. This should take the bulk of the micro-batch's time. +* getBatch: Time taken to prepare the logical query to read the input of the current micro-batch from the sources. +* latestOffset & getOffset: Time taken to query the maximum available offset for this source. +* queryPlanning: Time taken to generates the execution plan. +* walCommit: Time taken to write the offsets to the metadata log. As an early-release version, the statistics page is still under development and will be improved in future releases. diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala index e022bfb..e0731db 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala @@ -566,8 +566,7 @@ class MicroBatchExecution( val nextBatch = new Dataset(lastExecution, RowEncoder(lastExecution.analyzed.schema)) -val batchSinkProgress: Option[StreamWriterCommitProgress] = - reportTimeTaken("addBatch") { +val batchSinkProgress: Option[StreamWriterCommitProgress] = reportTimeTaken("addBatch") { SQLExecution.withNewExecutionId(lastExecution) { sink match { case s: Sink => s.addBatch(currentBatchId, nextBatch) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (12f4331 -> 8b26c69)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 12f4331 [SPARK-32672][SQL] Fix data corruption in boolean bit set compression add 8b26c69 [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations No new revisions were added by this update. Summary of changes: docs/web-ui.md | 10 +- .../spark/sql/execution/streaming/MicroBatchExecution.scala| 3 +-- 2 files changed, 6 insertions(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch branch-3.0 updated: [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new a6df16b [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations a6df16b is described below commit a6df16b36210da32359c77205920eaee98d3e232 Author: Yuanjian Li AuthorDate: Sat Aug 22 21:32:23 2020 +0900 [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations ### What changes were proposed in this pull request? Rephrase the description for some operations to make it clearer. ### Why are the changes needed? Add more detail in the document. ### Does this PR introduce _any_ user-facing change? No, document only. ### How was this patch tested? Document only. Closes #29269 from xuanyuanking/SPARK-31792-follow. Authored-by: Yuanjian Li Signed-off-by: Jungtaek Lim (HeartSaVioR) (cherry picked from commit 8b26c69ce7f905a3c7bbabb1c47ee6a51a23) Signed-off-by: Jungtaek Lim (HeartSaVioR) --- docs/web-ui.md | 10 +- .../spark/sql/execution/streaming/MicroBatchExecution.scala| 3 +-- 2 files changed, 6 insertions(+), 7 deletions(-) diff --git a/docs/web-ui.md b/docs/web-ui.md index 134a8c8..fe26043 100644 --- a/docs/web-ui.md +++ b/docs/web-ui.md @@ -425,11 +425,11 @@ queries. Currently, it contains the following metrics. * **Batch Duration.** The process duration of each batch. * **Operation Duration.** The amount of time taken to perform various operations in milliseconds. The tracked operations are listed as follows. -* addBatch: Adds result data of the current batch to the sink. -* getBatch: Gets a new batch of data to process. -* latestOffset: Gets the latest offsets for sources. -* queryPlanning: Generates the execution plan. -* walCommit: Writes the offsets to the metadata log. 
+* addBatch: Time taken to read the micro-batch's input data from the sources, process it, and write the batch's output to the sink. This should take the bulk of the micro-batch's time. +* getBatch: Time taken to prepare the logical query to read the input of the current micro-batch from the sources. +* latestOffset & getOffset: Time taken to query the maximum available offset for this source. +* queryPlanning: Time taken to generates the execution plan. +* walCommit: Time taken to write the offsets to the metadata log. As an early-release version, the statistics page is still under development and will be improved in future releases. diff --git a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala index e022bfb..e0731db 100644 --- a/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala +++ b/sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/MicroBatchExecution.scala @@ -566,8 +566,7 @@ class MicroBatchExecution( val nextBatch = new Dataset(lastExecution, RowEncoder(lastExecution.analyzed.schema)) -val batchSinkProgress: Option[StreamWriterCommitProgress] = - reportTimeTaken("addBatch") { +val batchSinkProgress: Option[StreamWriterCommitProgress] = reportTimeTaken("addBatch") { SQLExecution.withNewExecutionId(lastExecution) { sink match { case s: Sink => s.addBatch(currentBatchId, nextBatch) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (12f4331 -> 8b26c69)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 12f4331 [SPARK-32672][SQL] Fix data corruption in boolean bit set compression add 8b26c69 [SPARK-31792][SS][DOC][FOLLOW-UP] Rephrase the description for some operations No new revisions were added by this update. Summary of changes: docs/web-ui.md | 10 +- .../spark/sql/execution/streaming/MicroBatchExecution.scala| 3 +-- 2 files changed, 6 insertions(+), 7 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (ae82768 -> 813532d)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from ae82768 [SPARK-32421][SQL] Add code-gen for shuffled hash join add 813532d [SPARK-32468][SS][TESTS] Fix timeout config issue in Kafka connector tests No new revisions were added by this update. Summary of changes: .../apache/spark/sql/kafka010/KafkaContinuousSourceSuite.scala| 2 +- .../apache/spark/sql/kafka010/KafkaDontFailOnDataLossSuite.scala | 2 +- .../apache/spark/sql/kafka010/KafkaMicroBatchSourceSuite.scala| 8 3 files changed, 6 insertions(+), 6 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (9d7b1d9 -> f602782)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 9d7b1d9 [SPARK-32175][SPARK-32175][FOLLOWUP] Remove flaky test added in add f602782 [SPARK-32482][SS][TESTS] Eliminate deprecated poll(long) API calls to avoid infinite wait in tests No new revisions were added by this update. Summary of changes: .../apache/spark/sql/kafka010/KafkaTestUtils.scala | 47 +++--- 1 file changed, 14 insertions(+), 33 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
[spark] branch master updated (39181ff -> 7b9d755)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from 39181ff [SPARK-32286][SQL] Coalesce bucketed table for shuffled hash join if applicable add 7b9d755 [SPARK-32350][CORE] Add batch-write on LevelDB to improve performance of HybridStore No new revisions were added by this update. Summary of changes: .../org/apache/spark/util/kvstore/LevelDB.java | 73 ++ .../apache/spark/deploy/history/HybridStore.scala | 9 +-- 2 files changed, 66 insertions(+), 16 deletions(-) - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org
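SPARK-32350 speeds up HybridStore by grouping LevelDB puts into a single batch write rather than committing each record individually. A generic sketch of that batching pattern — a toy in-memory store for illustration only, not Spark's `LevelDB.java` or the real LevelDB WriteBatch API:

```python
class BatchedStore:
    """Toy key-value store illustrating write batching: instead of paying
    the commit cost once per put(), writes accumulate and are committed
    together, amortizing per-commit overhead. Illustrative only."""

    def __init__(self):
        self.data = {}
        self.commits = 0  # how many times the commit cost was paid

    def put(self, key, value):
        # Unbatched path: every put is its own commit.
        self.data[key] = value
        self.commits += 1

    def write_batch(self, pairs):
        # Batched path: N puts, one commit.
        self.data.update(pairs)
        self.commits += 1

unbatched = BatchedStore()
for i in range(100):
    unbatched.put(f"k{i}", i)      # 100 commits

batched = BatchedStore()
batched.write_batch({f"k{i}": i for i in range(100)})  # 1 commit

print(unbatched.commits, batched.commits)
```

The stored data ends up identical either way; only the number of commits (and hence the per-write overhead) differs, which is the performance win the change targets.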
[spark] branch master updated (fb51925 -> 9747e8f)
This is an automated email from the ASF dual-hosted git repository. kabhwan pushed a change to branch master in repository https://gitbox.apache.org/repos/asf/spark.git. from fb51925 [SPARK-32335][K8S][TESTS] Remove Python2 test from K8s IT add 9747e8f [SPARK-31831][SQL][TESTS][FOLLOWUP] Put mocks for HiveSessionImplSuite in hive version related subdirectories No new revisions were added by this update. Summary of changes: sql/hive-thriftserver/pom.xml | 12 ++ .../hive/thriftserver/HiveSessionImplSuite.scala | 29 + .../thriftserver/GetCatalogsOperationMock.scala| 50 ++ .../thriftserver/GetCatalogsOperationMock.scala| 50 ++ 4 files changed, 113 insertions(+), 28 deletions(-) create mode 100644 sql/hive-thriftserver/v1.2/src/test/scala/ org/apache/spark/sql/hive/thriftserver/GetCatalogsOperationMock.scala create mode 100644 sql/hive-thriftserver/v2.3/src/test/scala/ org/apache/spark/sql/hive/thriftserver/GetCatalogsOperationMock.scala - To unsubscribe, e-mail: commits-unsubscr...@spark.apache.org For additional commands, e-mail: commits-h...@spark.apache.org