[GitHub] spark pull request #22138: [SPARK-25151][SS] Apply Apache Commons Pool to Ka...

2018-09-06 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/22138#discussion_r215579562 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala --- @@ -18,222 +18,247 @@ package

[GitHub] spark pull request #22138: [SPARK-25151][SS] Apply Apache Commons Pool to Ka...

2018-09-06 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/22138#discussion_r215583862 --- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/FetchedPoolSuite.scala --- @@ -0,0 +1,299 @@ +/* + * Licensed

[GitHub] spark pull request #22138: [SPARK-25151][SS] Apply Apache Commons Pool to Ka...

2018-09-06 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/22138#discussion_r215594790 --- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/FetchedPoolSuite.scala --- @@ -0,0 +1,299 @@ +/* + * Licensed

[GitHub] spark issue #21685: [SPARK-24707][DSTREAMS] Enable spark-kafka-streaming to ...

2018-07-10 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/21685 In the meantime, something came to my mind (the most obvious question). What is the size of the Kafka events being processed? Big events could end up in high polling time. Maybe some

[GitHub] spark pull request #19893: [SPARK-16139][TEST] Add logging functionality for...

2018-01-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/19893#discussion_r161222015 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/SharedSQLContext.scala --- @@ -17,4 +17,22 @@ package

[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink

2018-03-06 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20745 Re-created the PR because something got stuck in the previous one. cc @tdas @zsxwing @vanzin --- - To unsubscribe, e

[GitHub] spark issue #20703: [SPARK-19185][SS] Make Kafka consumer cache configurable

2018-03-01 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20703 cc @vanzin @zsxwing @jose-torres @cloud-fan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark pull request #20703: [SPARK-19185][SS] Make Kafka consumer cache confi...

2018-03-01 Thread gaborgsomogyi
GitHub user gaborgsomogyi opened a pull request: https://github.com/apache/spark/pull/20703 [SPARK-19185][SS] Make Kafka consumer cache configurable ## What changes were proposed in this pull request? Use property `spark.streaming.kafka.consumer.cache.enabled` in structured

[GitHub] spark pull request #20807: SPARK-23660: Fix exception in yarn cluster mode w...

2018-03-12 Thread gaborgsomogyi
GitHub user gaborgsomogyi opened a pull request: https://github.com/apache/spark/pull/20807 SPARK-23660: Fix exception in yarn cluster mode when application ended fast ## What changes were proposed in this pull request? Yarn throws the following exception in cluster mode

[GitHub] spark issue #20807: SPARK-23660: Fix exception in yarn cluster mode when app...

2018-03-13 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20807 I've executed more invasive tests on the cluster and this PR didn't solve all the issues. As another, less invasive approach, I tried to catch the exception in `runDriver` but failed

[GitHub] spark pull request #20807: SPARK-23660: Fix exception in yarn cluster mode w...

2018-03-13 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20807#discussion_r174045577 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -496,7 +497,7 @@ private[yarn] class

[GitHub] spark pull request #20807: SPARK-23660: Fix exception in yarn cluster mode w...

2018-03-13 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20807#discussion_r174032329 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/YarnAllocator.scala --- @@ -496,7 +497,7 @@ private[yarn] class

[GitHub] spark issue #20683: [SPARK-8605] Exclude files in StreamingContext. textFile...

2018-03-16 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20683 Don't really understand the issue itself. Which filesystem is used in this case? Why is it not possible to use a Hadoop-compatible filesystem like HDFS, for instance? This supports atomic rename. [See

[GitHub] spark pull request #20807: SPARK-23660: Fix exception in yarn cluster mode w...

2018-03-15 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20807#discussion_r174983449 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -417,8 +417,11 @@ private[spark] class

[GitHub] spark pull request #20807: SPARK-23660: Fix exception in yarn cluster mode w...

2018-03-15 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20807#discussion_r174983476 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -497,6 +500,8 @@ private[spark] class

[GitHub] spark pull request #20807: SPARK-23660: Fix exception in yarn cluster mode w...

2018-03-15 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20807#discussion_r174983468 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -497,6 +500,8 @@ private[spark] class

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-06 Thread gaborgsomogyi
GitHub user gaborgsomogyi opened a pull request: https://github.com/apache/spark/pull/20997 [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached consumers in CachedKafkaConsumer ## What changes were proposed in this pull request? `CachedKafkaConsumer` in the project

[GitHub] spark issue #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wait for ...

2018-04-10 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20888 Thanks for the hints. I've taken a deeper look at the possible solutions and the suggested test. The problem is similar but not the same, so I would solve it a different way. So here is my

[GitHub] spark issue #20888: [SPARK-23775][TEST] Make DataFrameRangeSuite not flaky

2018-04-11 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20888 Updated the title and the description to reflect the changes.

[GitHub] spark pull request #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wa...

2018-04-11 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20888#discussion_r180691534 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala --- @@ -164,10 +164,13 @@ class DataFrameRangeSuite extends

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-11 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180805909 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-11 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180806244 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-11 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180810282 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-11 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180806811 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-11 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180780119 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-11 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180807486 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-11 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180805380 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-11 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180808081 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-11 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r180808097 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181057226 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181057345 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181057477 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181058027 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181057829 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181109231 --- Diff: external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala --- @@ -0,0 +1,111

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181098163 --- Diff: external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala --- @@ -0,0 +1,111

[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20997 @koeninger > I don't see an upper bound on the number of consumers per key, nor a way of reaping idle consumers. If the SQL equivalent code is likely to be modified to use pool

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181072630 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181074560 --- Diff: external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala --- @@ -0,0 +1,111

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181076158 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181077275 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r18108 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181063487 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181066058 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181073687 --- Diff: external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala --- @@ -0,0 +1,111

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181066695 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-13 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181502862 --- Diff: external/kafka-0-10/src/test/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumerSuite.scala --- @@ -0,0 +1,111

[GitHub] spark pull request #20888: [SPARK-23775][TEST] Make DataFrameRangeSuite not ...

2018-04-13 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20888#discussion_r181459717 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala --- @@ -152,22 +154,28 @@ class DataFrameRangeSuite extends

[GitHub] spark pull request #20888: [SPARK-23775][TEST] Make DataFrameRangeSuite not ...

2018-04-13 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20888#discussion_r181489845 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala --- @@ -152,22 +154,28 @@ class DataFrameRangeSuite extends

[GitHub] spark issue #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wait for ...

2018-04-09 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20888 `SparkStatusTracker` states the following: ``` * These APIs intentionally provide very weak consistency semantics; consumers of these APIs should * be prepared to handle empty

[GitHub] spark pull request #20888: [SPARK-23775][TEST] Make DataFrameRangeSuite not ...

2018-04-13 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20888#discussion_r181326106 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala --- @@ -152,22 +154,28 @@ class DataFrameRangeSuite extends

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-12 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181116277 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...

2018-04-06 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20997 cc @tdas @zsxwing @koeninger

[GitHub] spark pull request #20888: [SPARK-23775][TEST] Make DataFrameRangeSuite not ...

2018-04-17 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20888#discussion_r182011978 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala --- @@ -156,43 +156,52 @@ class DataFrameRangeSuite extends

[GitHub] spark pull request #20888: [SPARK-23775][TEST] Make DataFrameRangeSuite not ...

2018-04-17 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20888#discussion_r182013606 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala --- @@ -152,39 +154,54 @@ class DataFrameRangeSuite extends

[GitHub] spark pull request #21105: [SPARK-24022][TEST] Make SparkContextSuite not fl...

2018-04-19 Thread gaborgsomogyi
GitHub user gaborgsomogyi opened a pull request: https://github.com/apache/spark/pull/21105 [SPARK-24022][TEST] Make SparkContextSuite not flaky ## What changes were proposed in this pull request? SparkContextSuite.test("Cancelling stages/jobs with custom reasons."

[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...

2018-04-21 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20997 In the meantime I found a small glitch in the SQL part. Namely, if a reattempt happens, this line https://github.com/apache/spark/blob/1d758dc73b54e802fdc92be204185fe7414e6553/external/kafka-0

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-16 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181655790 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-16 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181645665 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark pull request #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of ...

2018-04-16 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20997#discussion_r181647681 --- Diff: external/kafka-0-10/src/main/scala/org/apache/spark/streaming/kafka010/KafkaDataConsumer.scala --- @@ -0,0 +1,381

[GitHub] spark issue #21105: [SPARK-24022][TEST] Make SparkContextSuite not flaky

2018-04-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/21105 @jiangxb1987 Here it's possible to create a separate context.

[GitHub] spark issue #21105: [SPARK-24022][TEST] Make SparkContextSuite not flaky

2018-04-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/21105 cc @vanzin @jiangxb1987

[GitHub] spark issue #21105: [SPARK-24022][TEST] Make SparkContextSuite not flaky

2018-04-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/21105 Yes, please see SPARK-22764.

[GitHub] spark issue #20997: [SPARK-19185] [DSTREAMS] Avoid concurrent use of cached ...

2018-04-20 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20997 Having taken a look at the pool options, I have the feeling it requires more time to come up with a proper solution. Switching back to the SQL code provided one cached consumer approach

[GitHub] spark pull request #20888: [SPARK-23775][TEST] Make DataFrameRangeSuite not ...

2018-04-20 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20888#discussion_r183067778 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala --- @@ -152,39 +154,53 @@ class DataFrameRangeSuite extends

[GitHub] spark pull request #20888: [SPARK-23775][TEST] Make DataFrameRangeSuite not ...

2018-04-20 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20888#discussion_r183069064 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameRangeSuite.scala --- @@ -152,39 +154,53 @@ class DataFrameRangeSuite extends

[GitHub] spark pull request #20807: SPARK-23660: Fix exception in yarn cluster mode w...

2018-03-16 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20807#discussion_r175156772 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -418,7 +418,19 @@ private[spark] class

[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink

2018-03-16 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20745 ping @koeninger

[GitHub] spark pull request #20807: SPARK-23660: Fix exception in yarn cluster mode w...

2018-03-16 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20807#discussion_r175156659 --- Diff: resource-managers/yarn/src/main/scala/org/apache/spark/deploy/yarn/ApplicationMaster.scala --- @@ -497,6 +500,8 @@ private[spark] class

[GitHub] spark pull request #20745: [SPARK-23288][SS] Fix output metrics with parquet...

2018-03-16 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20745#discussion_r175186820 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala --- @@ -405,4 +406,52 @@ class FileStreamSinkSuite

[GitHub] spark pull request #20745: [SPARK-23288][SS] Fix output metrics with parquet...

2018-03-16 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20745#discussion_r175186800 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala --- @@ -405,4 +406,52 @@ class FileStreamSinkSuite

[GitHub] spark issue #20807: SPARK-23660: Fix exception in yarn cluster mode when app...

2018-03-16 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20807 @vanzin sorry, one useless comment left in the code. Just removed.

[GitHub] spark issue #19819: [SPARK-22606][Streaming]Add threadId to the CachedKafkaC...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/19819 It will create a new consumer for each thread. This could be quite resource-consuming when several topics are shared with thread pools

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175541913 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -245,6 +245,19 @@ object SparkSubmit extends CommandLineUtils

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175540847 --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala --- @@ -606,9 +612,12 @@ class SparkSubmitSuite

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175543124 --- Diff: core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala --- @@ -137,16 +137,29 @@ private[deploy] object DependencyUtils

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175553185 --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala --- @@ -105,11 +105,17 @@ class SparkSubmitSuite

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175567819 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -245,6 +245,19 @@ object SparkSubmit extends CommandLineUtils

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175540696 --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala --- @@ -105,11 +105,17 @@ class SparkSubmitSuite

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175544492 --- Diff: core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala --- @@ -137,16 +137,29 @@ private[deploy] object DependencyUtils

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175529369 --- Diff: core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala --- @@ -137,16 +137,29 @@ private[deploy] object DependencyUtils

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175521925 --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala --- @@ -657,6 +667,31 @@ class SparkSubmitSuite conf3.get

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175523620 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -245,6 +245,19 @@ object SparkSubmit extends CommandLineUtils

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175515840 --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala --- @@ -105,11 +105,17 @@ class SparkSubmitSuite

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175533621 --- Diff: core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala --- @@ -137,16 +137,29 @@ private[deploy] object DependencyUtils

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175521372 --- Diff: core/src/test/scala/org/apache/spark/deploy/SparkSubmitSuite.scala --- @@ -606,9 +612,12 @@ class SparkSubmitSuite

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175585634 --- Diff: core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala --- @@ -137,16 +138,36 @@ private[deploy] object DependencyUtils

[GitHub] spark pull request #20853: [SPARK-23729][CORE] Respect URI fragment when res...

2018-03-19 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20853#discussion_r175585581 --- Diff: core/src/main/scala/org/apache/spark/deploy/DependencyUtils.scala --- @@ -137,16 +138,36 @@ private[deploy] object DependencyUtils

[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink

2018-03-20 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20745 https://user-images.githubusercontent.com/18561820/37695954-5aacaa2a-2c90-11e8-9f73-f57d0e1b27f6.png

[GitHub] spark pull request #20745: [SPARK-23288][SS] Fix output metrics with parquet...

2018-03-21 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20745#discussion_r175994267 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/FileStreamSinkSuite.scala --- @@ -405,4 +406,55 @@ class FileStreamSinkSuite

[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink

2018-03-21 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20745 retest this please

[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink

2018-03-20 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20745 https://user-images.githubusercontent.com/18561820/37696015-b1250bae-2c90-11e8-8ad1-515661487b94.png

[GitHub] spark issue #20745: [SPARK-23288][SS] Fix output metrics with parquet sink

2018-03-21 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20745 I've started the history server, then executed `test("SPARK-23288 writing and checking output metrics")` with the `spark.eventLog.enabled` parameter. Now there is only one entry in the a

[GitHub] spark issue #20767: [SPARK-23623] [SS] Avoid concurrent use of cached consum...

2018-03-20 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20767 @tdas @zsxwing @koeninger @tedyu do you think it makes sense to take a similar step in the DStream area and then later follow up with the mentioned Apache Commons Pool

[GitHub] spark pull request #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wa...

2018-03-22 Thread gaborgsomogyi
GitHub user gaborgsomogyi opened a pull request: https://github.com/apache/spark/pull/20888 [SPARK-23775][TEST] DataFrameRangeSuite should wait for first stage ## What changes were proposed in this pull request? DataFrameRangeSuite.test("Cancelling stage in a

[GitHub] spark pull request #20836: SPARK-23685 : Fix for the Spark Structured Stream...

2018-03-23 Thread gaborgsomogyi
Github user gaborgsomogyi commented on a diff in the pull request: https://github.com/apache/spark/pull/20836#discussion_r176656905 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala --- @@ -279,9 +279,8 @@ private[kafka010

[GitHub] spark issue #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wait for ...

2018-03-22 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20888 I mean that on my machine the stage ID stays zero for a very long time here: `DataFrameRangeSuite.stageToKill = TaskContext.get().stageId()` and after 200 seconds the other

[GitHub] spark issue #20836: SPARK-23685 : Fix for the Spark Structured Streaming Kaf...

2018-03-23 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20836 I don't see the problem. You see an exception which tells you exactly what can be done: > If you don't want your streaming query to fail on such cases, set the source opt

[GitHub] spark issue #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wait for ...

2018-03-22 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20888 Where do you think the reset should happen? There is already one inside `withSQLConf` which performs a reset before job submission. Regarding the ID, I've just taken a look at the original

[GitHub] spark issue #20888: [SPARK-23775][TEST] DataFrameRangeSuite should wait for ...

2018-03-22 Thread gaborgsomogyi
Github user gaborgsomogyi commented on the issue: https://github.com/apache/spark/pull/20888 Just an additional piece of information: if I execute the test alone on my machine, it never passes.
