[GitHub] spark pull request #20888: [SPARK-23775][TEST] Make DataFrameRangeSuite not ...

2018-04-20 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20888#discussion_r183067778 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala --- @@ -152,39 +154,53 @@ class DataFrameRangeSuite extends

[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...

2018-04-20 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20997 Taken a look at the pool options I have the feeling it requires more time to come up with a proper solution. Switching back to the SQL code provided one cached consumer approach

[GitHub] spark issue #21105: [SPARK-24022][TEST] Make SparkContextSuite not flaky

2018-04-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/21105 Yes, please see SPARK-22764. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #21105: [SPARK-24022][TEST] Make SparkContextSuite not flaky

2018-04-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/21105 @jiangxb1987 Here it's possible to create a separate context. --- - To unsubscribe, e-mail: reviews-uns

[GitHub] spark issue #21105: [SPARK-24022][TEST] Make SparkContextSuite not flaky

2018-04-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/21105 cc @vanzin @jiangxb1987 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark pull request #21105: [SPARK-24022][TEST] Make SparkContextSuite not fl...

2018-04-19 Thread gaborgsomogyi
GitHub user gaborgsomogyi opened a pull request: https://github.com/apache/spark/pull/21105 [SPARK-24022][TEST] Make SparkContextSuite not flaky ## What changes were proposed in this pull request? SparkContextSuite.test("Cancelling stages/jobs with custom reasons.&qu

[GitHub] spark pull request #20888: [SPARK-23775][TEST] Make DataFrameRangeSuite not ...

2018-04-17 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20888#discussion_r182013606 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala --- @@ -152,39 +154,54 @@ class DataFrameRangeSuite extends

[GitHub] spark pull request #20888: [SPARK-23775][TEST] Make DataFrameRangeSuite not ...

2018-04-17 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20888#discussion_r182011978 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala --- @@ -156,43 +156,52 @@ class DataFrameRangeSuite extends

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-16 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181655790 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-16 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181647681 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-16 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181645665 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-13 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181502862 --- Diff: external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala --- @@ -0,0 +1,111

[GitHub] spark pull request #20888: [SPARK-23775][TEST] Make DataFrameRangeSuite not ...

2018-04-13 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20888#discussion_r181489845 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala --- @@ -152,22 +154,28 @@ class DataFrameRangeSuite extends

[GitHub] spark pull request #20888: [SPARK-23775][TEST] Make DataFrameRangeSuite not ...

2018-04-13 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20888#discussion_r181459717 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala --- @@ -152,22 +154,28 @@ class DataFrameRangeSuite extends

[GitHub] spark pull request #20888: [SPARK-23775][TEST] Make DataFrameRangeSuite not ...

2018-04-13 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20888#discussion_r181326106 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala --- @@ -152,22 +154,28 @@ class DataFrameRangeSuite extends

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181116277 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181109231 --- Diff: external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala --- @@ -0,0 +1,111

[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20997 @koeninger > I don't see an upper bound on the number of consumers per key, nor a way of reaping idle consumers. If the SQL equivalent code is likely to be modified to use

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181098163 --- Diff: external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala --- @@ -0,0 +1,111

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r18108 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181077275 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181076158 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181074560 --- Diff: external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala --- @@ -0,0 +1,111

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181073687 --- Diff: external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala --- @@ -0,0 +1,111

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181072630 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181066695 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181066058 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181063487 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181058027 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181057829 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181057477 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181057345 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181057226 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-11 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180810282 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-11 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180808097 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-11 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180808081 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-11 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180807486 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-11 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180806811 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-11 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180806244 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-11 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180805909 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-11 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180805380 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-11 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180780119 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark issue #20888: [SPARK-23775][TEST] Make DataFrameRangeSuite not flaky

2018-04-11 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20888 Updated the title and the description to reflect the changes. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wa...

2018-04-11 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20888#discussion_r180691534 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala --- @@ -164,10 +164,13 @@ class DataFrameRangeSuite extends

[GitHub] spark issue #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wait for ...

2018-04-10 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20888 Thank for the hints. I've taken a deeper look at the possible solutions and the suggested test. The problem is similar but not the same so I would solve it a different way. So here

[GitHub] spark issue #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wait for ...

2018-04-09 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20888 `SparkStatusTracker` states the following: ``` * These APIs intentionally provide very weak consistency semantics; consumers of these APIs should * be prepared to handle empty

[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...

2018-04-06 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20997 cc @tdas @zsxwing @koeninger --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-06 Thread gaborgsomogyi
GitHub user gaborgsomogyi opened a pull request: https://github.com/apache/spark/pull/20997 [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached consumers in CachedKafkaConsumer ## What changes were proposed in this pull request? `CachedKafkaConsumer` in the project

[GitHub] spark issue #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wait for ...

2018-03-23 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20888 @vanzin @squito yeah, there is an issue with threading as well. I'm just taking a look at it because it's n

[GitHub] spark issue #20836: SPARK-23685 : Fix for the Spark Structured Streaming Kaf...

2018-03-23 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20836 I don't see the problem. You see an exception which tells exactly what can be done: > If you don't want your streaming query to fail on such cases, set the

[GitHub] spark pull request #20836: SPARK-23685 : Fix for the Spark Structured Stream...

2018-03-23 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20836#discussion_r176656905 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala --- @@ -279,9 +279,8 @@ private[kafka010

[GitHub] spark issue #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wait for ...

2018-03-22 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20888 Uploaded logs for the jira. You're right when you pointed out a second issue with the `stageToKill`. The `onJobStart` tries to cancel the same ID

[GitHub] spark issue #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wait for ...

2018-03-22 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20888 I mean on my machine the stage ID is zero for long long time here: ``` DataFrameRangeSuite.stageToKill = TaskContext.get().stageId() ``` and after 200 seconds the other

[GitHub] spark issue #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wait for ...

2018-03-22 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20888 Just an additional info if I execute the test on my machine alone it never pass. --- - To unsubscribe, e-mail: reviews

[GitHub] spark issue #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wait for ...

2018-03-22 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20888 Where do you think the reset should happen? There is already one inside `withSQLConf` which makes a reset before job submit. Related the ID I've just taken a look at the ori

[GitHub] spark pull request #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wa...

2018-03-22 Thread gaborgsomogyi
GitHub user gaborgsomogyi opened a pull request: https://github.com/apache/spark/pull/20888 [SPARK-23775][TEST] DataFrameRangeSuite should wait for first stage ## What changes were proposed in this pull request? DataFrameRangeSuite.test("Cancelling stage in a query

[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink

2018-03-21 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20745 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #20745: [SPARK-23288][SS] Fix output metrics with parquet...

2018-03-20 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20745#discussion_r175994267 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala --- @@ -405,4 +406,55 @@ class FileStreamSinkSuite

[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink

2018-03-20 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20745 I've started history server then executed `test("SPARK-23288 writing and checking output metrics")` with `spark.eventLog.enabled` parameter. Now there is only one entry in

[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink

2018-03-20 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20745 https://user-images.githubusercontent.com/18561820/37696015-b1250bae-2c90-11e8-8ad1-515661487b94.png";> ---

[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink

2018-03-20 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20745 https://user-images.githubusercontent.com/18561820/37695954-5aacaa2a-2c90-11e8-9f73-f57d0e1b27f6.png";> ---

[GitHub] spark issue #20767: [SPARK-23623] [SS] Avoid concurrent use of cached consum...

2018-03-20 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20767 @tdas @zsxwing @koeninger @tedyu do you think it makes sense to make similar step in the DStream area like this and then later follow with the mentioned Apache Common Pool

[GitHub] spark issue #19819: [SPARK-22606][Streaming]Add threadId to the CachedKafkaC...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/19819 It will create a new consumer for each thread. This could be quite resource consuming when several topics shared with thread pools

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175585634 --- Diff: core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala --- @@ -137,16 +138,36 @@ private[deploy] object DependencyUtils

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175585581 --- Diff: core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala --- @@ -137,16 +138,36 @@ private[deploy] object DependencyUtils

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175567819 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -245,6 +245,19 @@ object SparkSubmit extends CommandLineUtils with

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175553185 --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala --- @@ -105,11 +105,17 @@ class SparkSubmitSuite

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175544492 --- Diff: core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala --- @@ -137,16 +137,29 @@ private[deploy] object DependencyUtils

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175543124 --- Diff: core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala --- @@ -137,16 +137,29 @@ private[deploy] object DependencyUtils

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175541913 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -245,6 +245,19 @@ object SparkSubmit extends CommandLineUtils with

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175540847 --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala --- @@ -606,9 +612,12 @@ class SparkSubmitSuite

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175540696 --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala --- @@ -105,11 +105,17 @@ class SparkSubmitSuite

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175533621 --- Diff: core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala --- @@ -137,16 +137,29 @@ private[deploy] object DependencyUtils

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175521372 --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala --- @@ -606,9 +612,12 @@ class SparkSubmitSuite

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175515840 --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala --- @@ -105,11 +105,17 @@ class SparkSubmitSuite

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175529369 --- Diff: core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala --- @@ -137,16 +137,29 @@ private[deploy] object DependencyUtils

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175521925 --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala --- @@ -657,6 +667,31 @@ class SparkSubmitSuite conf3.get

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175523620 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -245,6 +245,19 @@ object SparkSubmit extends CommandLineUtils with

[GitHub] spark issue #20807: SPARK-23660: Fix exception in yarn cluster mode when app...

2018-03-16 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20807 @vanzin sorry, one useless comment left in the code. Just removed. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #20745: [SPARK-23288][SS] Fix output metrics with parquet...

2018-03-16 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20745#discussion_r175186820 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala --- @@ -405,4 +406,52 @@ class FileStreamSinkSuite

[GitHub] spark pull request #20745: [SPARK-23288][SS] Fix output metrics with parquet...

2018-03-16 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20745#discussion_r175186800 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala --- @@ -405,4 +406,52 @@ class FileStreamSinkSuite

[GitHub] spark pull request #20807: SPARK-23660: Fix exception in yarn cluster mode w...

2018-03-16 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20807#discussion_r175156772 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -418,7 +418,19 @@ private[spark] class

[GitHub] spark pull request #20807: SPARK-23660: Fix exception in yarn cluster mode w...

2018-03-16 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20807#discussion_r175156659 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -497,6 +500,8 @@ private[spark] class

[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink

2018-03-16 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20745 ping @koeninger --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #20683: [SPARK-8605] Exclude files in StreamingContext. textFile...

2018-03-16 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20683 Don't really understand the issue itself. Which filesystem used this case? Why is it not possible to use Hadoop-compatible filesystem like HDFS for instance? This supports atomic rename.

[GitHub] spark pull request #20807: SPARK-23660: Fix exception in yarn cluster mode w...

2018-03-15 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20807#discussion_r174983468 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -497,6 +500,8 @@ private[spark] class

[GitHub] spark pull request #20807: SPARK-23660: Fix exception in yarn cluster mode w...

2018-03-15 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20807#discussion_r174983449 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -417,8 +417,11 @@ private[spark] class

[GitHub] spark pull request #20807: SPARK-23660: Fix exception in yarn cluster mode w...

2018-03-15 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20807#discussion_r174983476 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -497,6 +500,8 @@ private[spark] class

[GitHub] spark pull request #20807: SPARK-23660: Fix exception in yarn cluster mode w...

2018-03-13 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20807#discussion_r174045577 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -496,7 +497,7 @@ private[yarn] class

[GitHub] spark issue #20807: SPARK-23660: Fix exception in yarn cluster mode when app...

2018-03-13 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20807 I've executed more invasive tests on the cluster and this PR didn't solve all the issues. As another not so invasive approach tried to catch the exception in `runDriver`

[GitHub] spark pull request #20807: SPARK-23660: Fix exception in yarn cluster mode w...

2018-03-13 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20807#discussion_r174032329 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -496,7 +497,7 @@ private[yarn] class

[GitHub] spark pull request #20807: SPARK-23660: Fix exception in yarn cluster mode w...

2018-03-12 Thread gaborgsomogyi
GitHub user gaborgsomogyi opened a pull request: https://github.com/apache/spark/pull/20807 SPARK-23660: Fix exception in yarn cluster mode when application ended fast ## What changes were proposed in this pull request? Yarn throws the following exception in cluster mode

[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink

2018-03-06 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20745 Re-created the PR because something got stuck in the previous one. cc @tdas @zsxwing @vanzin --- - To unsubscribe, e

[GitHub] spark pull request #20745: [SPARK-23288][SS] Fix output metrics with parquet...

2018-03-05 Thread gaborgsomogyi
GitHub user gaborgsomogyi opened a pull request: https://github.com/apache/spark/pull/20745 [SPARK-23288][SS] Fix output metrics with parquet sink ## What changes were proposed in this pull request? Output metrics were not filled when parquet sink used. This PR

[GitHub] spark issue #20639: [SPARK-23288][SS] Fix output metrics with parquet sink

2018-03-05 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20639 God, seems like stuck somehow. I'll re-create the PR. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apach

[GitHub] spark pull request #20639: [SPARK-23288][SS] Fix output metrics with parquet...

2018-03-05 Thread gaborgsomogyi
Github user gaborgsomogyi closed the pull request at: https://github.com/apache/spark/pull/20639 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #20703: [SPARK-19185][SS] Make Kafka consumer cache configurable

2018-03-02 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20703 @zsxwing I didn't know that the original design of structured streaming is not to share the Kafka consumers. I'll close this PR and take a deeper look at the

[GitHub] spark pull request #20703: [SPARK-19185][SS] Make Kafka consumer cache confi...

2018-03-02 Thread gaborgsomogyi
Github user gaborgsomogyi closed the pull request at: https://github.com/apache/spark/pull/20703 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #20639: [SPARK-23288][SS] Fix output metrics with parquet sink

2018-03-02 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20639 Executed these tests manually again but working fine. Seems like flaky. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #20703: [SPARK-19185][SS] Make Kafka consumer cache confi...

2018-03-01 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20703#discussion_r171741027 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchReader.scala --- @@ -76,6 +76,10 @@ private[kafka010

<    1   2   3   4   5   >