[GitHub] spark pull request: [SPARK-11818][REPL] Fix ExecutorClassLoader to...

2015-11-18 Thread HeartSaVioR
GitHub user HeartSaVioR opened a pull request: https://github.com/apache/spark/pull/9812 [SPARK-11818][REPL] Fix ExecutorClassLoader to lookup resources from … …parent class loader Without patch, two additional tests of ExecutorClassLoaderSuite fails

[GitHub] spark pull request: [SPARK-11818][REPL] Fix ExecutorClassLoader to...

2015-11-24 Thread HeartSaVioR
Github user HeartSaVioR commented on the pull request: https://github.com/apache/spark/pull/9812#issuecomment-159347482 @srowen @vanzin @holdenk Thanks for reviewing and merging! --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub

[GitHub] spark pull request: [SPARK-11818][REPL] Fix ExecutorClassLoader to...

2015-11-18 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/9812#discussion_r45288354 --- Diff: repl/src/main/scala/org/apache/spark/repl/ExecutorClassLoader.scala --- @@ -55,6 +57,14 @@ class ExecutorClassLoader(conf: SparkConf, classUri

[GitHub] spark pull request: [SPARK-11818][REPL] Fix ExecutorClassLoader to...

2015-11-18 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/9812#discussion_r45288990 --- Diff: repl/src/main/scala/org/apache/spark/repl/ExecutorClassLoader.scala --- @@ -55,6 +57,14 @@ class ExecutorClassLoader(conf: SparkConf, classUri

[GitHub] spark pull request: [SPARK-11818][REPL] Fix ExecutorClassLoader to...

2015-11-18 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/9812#discussion_r45285698 --- Diff: repl/src/main/scala/org/apache/spark/repl/ExecutorClassLoader.scala --- @@ -55,6 +57,14 @@ class ExecutorClassLoader(conf: SparkConf, classUri

[GitHub] spark pull request: [SPARK-11818][REPL] Fix ExecutorClassLoader to...

2015-11-20 Thread HeartSaVioR
Github user HeartSaVioR commented on the pull request: https://github.com/apache/spark/pull/9812#issuecomment-158338205 @vanzin Thanks for reviewing, I addressed your comment. Please take a look again. --- If your project is set up for it, you can reply to this email and have your

[GitHub] spark pull request: [SPARK-11818][REPL] Fix ExecutorClassLoader to...

2015-11-20 Thread HeartSaVioR
Github user HeartSaVioR commented on the pull request: https://github.com/apache/spark/pull/9812#issuecomment-158377559 Failed tests seems not related. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does

[GitHub] spark pull request: [SPARK-11818][REPL] Fix ExecutorClassLoader to...

2015-11-18 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/9812#discussion_r45260904 --- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala --- @@ -159,6 +159,16 @@ private[spark] object TestUtils { createCompiledClass

[GitHub] spark pull request: [SPARK-11818][REPL] Fix ExecutorClassLoader to...

2015-11-18 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/9812#discussion_r45261349 --- Diff: repl/src/main/scala/org/apache/spark/repl/ExecutorClassLoader.scala --- @@ -55,6 +57,14 @@ class ExecutorClassLoader(conf: SparkConf, classUri

[GitHub] spark pull request #21222: [SPARK-24161][SS] Enable debug package feature on...

2018-05-04 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21222#discussion_r186247474 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala --- @@ -116,6 +168,30 @@ package object debug

[GitHub] spark pull request #21222: [SPARK-24161][SS] Enable debug package feature on...

2018-05-04 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21222#discussion_r186032252 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala --- @@ -88,23 +100,62 @@ package object debug

[GitHub] spark issue #21222: [SPARK-24161][SS] Enable debug package feature on struct...

2018-05-14 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21222 Kindly ping. I guess debugging last batch might not be attractive that much, but printing codegen would be helpful to someone who want to investigate or debug in detail

[GitHub] spark issue #21152: [SPARK-23688][SS] Refactor tests away from rate source

2018-04-27 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21152 @jerryshao Thanks for merging! My Apache JIRA ID is “kabhwan” --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #21337: [SPARK-24234][SS] Reader for continuous processin...

2018-05-16 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21337#discussion_r188632188 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleReadRDD.scala --- @@ -0,0 +1,64

[GitHub] spark pull request #21337: [SPARK-24234][SS] Reader for continuous processin...

2018-05-16 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21337#discussion_r188636306 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/UnsafeRowReceiver.scala --- @@ -0,0 +1,56

[GitHub] spark pull request #21337: [SPARK-24234][SS] Reader for continuous processin...

2018-05-16 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21337#discussion_r188628202 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleReadRDD.scala --- @@ -0,0 +1,64

[GitHub] spark pull request #21337: [SPARK-24234][SS] Reader for continuous processin...

2018-05-16 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21337#discussion_r188638980 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/shuffle/ContinuousShuffleReadSuite.scala --- @@ -0,0 +1,122

[GitHub] spark issue #21388: [SPARK-24336][SQL] Support 'pass through' transformation...

2018-05-22 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21388 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #21388: [SPARK-24336][SQL] Support 'pass through' transfo...

2018-05-21 Thread HeartSaVioR
GitHub user HeartSaVioR opened a pull request: https://github.com/apache/spark/pull/21388 [SPARK-24336][SQL] Support 'pass through' transformation in BasicOperators ## What changes were proposed in this pull request? Enable 'pass through' transformation in BasicOperators

[GitHub] spark issue #21388: [SPARK-24336][SQL] Support 'pass through' transformation...

2018-05-22 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21388 @hvanhovell I also think someone might not want to have reflection magic (I was the one but realized I should do it), so I'm happy to close the PR when others voice same opinion

[GitHub] spark pull request #21385: [SPARK-24234][SS] Support multiple row writers in...

2018-05-23 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21385#discussion_r190131693 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/UnsafeRowReceiver.scala --- @@ -56,20 +69,73 @@ private

[GitHub] spark pull request #21385: [SPARK-24234][SS] Support multiple row writers in...

2018-05-23 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21385#discussion_r190129892 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/UnsafeRowReceiver.scala --- @@ -56,20 +69,73 @@ private

[GitHub] spark pull request #21385: [SPARK-24234][SS] Support multiple row writers in...

2018-05-23 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21385#discussion_r190125731 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/UnsafeRowReceiver.scala --- @@ -56,20 +69,73 @@ private

[GitHub] spark pull request #21385: [SPARK-24234][SS] Support multiple row writers in...

2018-05-23 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21385#discussion_r190120836 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/UnsafeRowReceiver.scala --- @@ -56,20 +69,73 @@ private

[GitHub] spark pull request #21357: [SPARK-24311][SS] Refactor HDFSBackedStateStorePr...

2018-05-17 Thread HeartSaVioR
GitHub user HeartSaVioR opened a pull request: https://github.com/apache/spark/pull/21357 [SPARK-24311][SS] Refactor HDFSBackedStateStoreProvider to remove duplicate… …d logic between operations on delta file and snapshot file ## What changes were proposed in this pull

[GitHub] spark issue #21357: [SPARK-24311][SS] Refactor HDFSBackedStateStoreProvider ...

2018-05-17 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21357 cc. @tdas --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #21388: [SPARK-24336][SQL] Support 'pass through' transformation...

2018-05-22 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21388 Thanks @HyukjinKwon for reviewing. Addressed review comments. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #21497: [SPARK-24466][SS] Fix TextSocketMicroBatchReader ...

2018-06-06 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21497#discussion_r193374662 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/TextSocketStreamSuite.scala --- @@ -256,6 +246,66 @@ class

[GitHub] spark pull request #21497: [SPARK-24466][SS] Fix TextSocketMicroBatchReader ...

2018-06-06 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21497#discussion_r193372564 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/TextSocketStreamSuite.scala --- @@ -256,6 +246,66 @@ class

[GitHub] spark issue #21506: [SPARK-24485][SS] Measure and log elapsed time for files...

2018-06-09 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21506 cc. @tdas @jose-torres @jerryshao @arunmahadevan @HyukjinKwon --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-09 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21469 @jose-torres No problem. I expect there would be some inactive moment in Spark community during spark summit. Addressed comment regarding renaming

[GitHub] spark issue #21357: [SPARK-24311][SS] Refactor HDFSBackedStateStoreProvider ...

2018-06-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21357 Kindly ping again to @tdas And cc. to @jose-torres @jerryshao @HyukjinKwon @arunmahadevan for reviewing

[GitHub] spark issue #21500: Scalable Memory option for HDFSBackedStateStore

2018-06-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21500 @aalobaidi I just would like to see the benefit of unloading the version of state which is expected to be read from the next batch. Totally I agree current mechanism of cache is excessive

[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21469 Kindly ping again to @tdas --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #21497: [SPARK-24466][SS] Fix TextSocketMicroBatchReader to be c...

2018-06-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21497 Kindly ping again to @tdas --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #21506: [SPARK-24485][SS] Measure and log elapsed time for files...

2018-06-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21506 Kindly ping again to @tdas --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #21222: [SPARK-24161][SS] Enable debug package feature on struct...

2018-06-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21222 Kindly ping again to @tdas --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e

[GitHub] spark issue #21357: [SPARK-24311][SS] Refactor HDFSBackedStateStoreProvider ...

2018-06-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21357 retest this, please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

2018-06-11 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21469#discussion_r194585044 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala --- @@ -112,14 +122,19 @@ trait StateStoreWriter

[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

2018-06-11 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21469#discussion_r194613720 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala --- @@ -112,14 +122,19 @@ trait StateStoreWriter

[GitHub] spark pull request #21477: [WIP] [SPARK-24396] [SS] [PYSPARK] Add Structured...

2018-06-07 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21477#discussion_r193740695 --- Diff: python/pyspark/sql/streaming.py --- @@ -843,6 +844,169 @@ def trigger(self, processingTime=None, once=None, continuous=None

[GitHub] spark pull request #21506: [SPARK-24485][SS] Measure and log elapsed time fo...

2018-06-07 Thread HeartSaVioR
GitHub user HeartSaVioR opened a pull request: https://github.com/apache/spark/pull/21506 [SPARK-24485][SS] Measure and log elapsed time for filesystem operations in HDFSBackedStateStoreProvider ## What changes were proposed in this pull request? This patch measures

[GitHub] spark issue #21506: [SPARK-24485][SS] Measure and log elapsed time for files...

2018-06-07 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21506 There're plenty of other debug messages which might hide the log messages added from this patch. Would we want to log them with INFO instead of DEBUG

[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-07 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21469 retest this, please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #21504: [SPARK-24479][SS] Added config for registering streaming...

2018-06-08 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21504 retest this, please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #21500: Scalable Memory option for HDFSBackedStateStore

2018-06-07 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21500 @aalobaidi You can also merge #21506 (maybe with changing log level or modify the patch to set message to INFO level) and see latencies on loading state, snapshotting, cleaning up

[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-07 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21469 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #21500: Scalable Memory option for HDFSBackedStateStore

2018-06-07 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21500 @aalobaidi One thing you may want to be aware is that in point of executor's view, executor must load at least 1 version of state in memory regardless of caching versions. I guess you may

[GitHub] spark issue #21388: [SPARK-24336][SQL] Support 'pass through' transformation...

2018-06-13 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21388 @hvanhovell To be honest, I found the rationalization of the issue from a comment in Spark code: https://github.com/apache/spark/blob/4c388bccf1bcac8f833fd9214096dd164c3ea065/sql

[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

2018-06-11 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21469#discussion_r194563959 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala --- @@ -247,6 +253,14 @@ private

[GitHub] spark pull request #21477: [SPARK-24396] [SS] [PYSPARK] Add Structured Strea...

2018-06-14 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21477#discussion_r195632189 --- Diff: python/pyspark/sql/streaming.py --- @@ -843,6 +843,170 @@ def trigger(self, processingTime=None, once=None, continuous=None

[GitHub] spark pull request #21477: [SPARK-24396] [SS] [PYSPARK] Add Structured Strea...

2018-06-14 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21477#discussion_r195625921 --- Diff: python/pyspark/sql/streaming.py --- @@ -843,6 +843,170 @@ def trigger(self, processingTime=None, once=None, continuous=None

[GitHub] spark pull request #21504: [SPARK-24479][SS] Added config for registering st...

2018-06-07 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21504#discussion_r193945288 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala --- @@ -55,6 +57,19 @@ class StreamingQueryManager

[GitHub] spark pull request #21506: [SPARK-24485][SS] Measure and log elapsed time fo...

2018-06-10 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21506#discussion_r194293251 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala --- @@ -280,38 +278,49 @@ private

[GitHub] spark pull request #21506: [SPARK-24485][SS] Measure and log elapsed time fo...

2018-06-10 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21506#discussion_r194293481 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala --- @@ -280,38 +278,49 @@ private

[GitHub] spark pull request #21506: [SPARK-24485][SS] Measure and log elapsed time fo...

2018-06-10 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21506#discussion_r194295068 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala --- @@ -280,38 +278,49 @@ private

[GitHub] spark issue #21506: [SPARK-24485][SS] Measure and log elapsed time for files...

2018-06-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21506 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #21500: Scalable Memory option for HDFSBackedStateStore

2018-06-11 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21500 After enabling option, I've observed small expected latency whenever starting batch per each partition per each batch. Median/average was 4~50 ms for my case, but max latency was a bit higher

[GitHub] spark issue #21497: [SPARK-24466][SS] Fix TextSocketMicroBatchReader to be c...

2018-06-09 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21497 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #21504: [SPARK-24479][SS] Added config for registering streaming...

2018-06-09 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21504 Test failures were from kafka. retest this, please --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #21500: Scalable Memory option for HDFSBackedStateStore

2018-06-09 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21500 @aalobaidi When starting batch, latest version state is being read to start a new version of state. If the state should be restored from snapshot as well as delta files, it will incur huge

[GitHub] spark pull request #21428: [SPARK-24235][SS] Implement continuous shuffle wr...

2018-05-29 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21428#discussion_r191605388 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/shuffle/ContinuousShuffleSuite.scala --- @@ -58,39 +46,29 @@ class

[GitHub] spark pull request #21428: [SPARK-24235][SS] Implement continuous shuffle wr...

2018-05-29 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21428#discussion_r191629272 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/shuffle/ContinuousShuffleSuite.scala --- @@ -288,4 +267,153 @@ class

[GitHub] spark pull request #21428: [SPARK-24235][SS] Implement continuous shuffle wr...

2018-05-29 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21428#discussion_r191629554 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/shuffle/ContinuousShuffleSuite.scala --- @@ -58,39 +46,29 @@ class

[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total size of states in HDFSBac...

2018-05-31 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21469 cc. @tdas @jose-torres @jerryshao @HyukjinKwon @arunmahadevan --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total size of states in HDFSBac...

2018-06-01 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21469 Thanks @HyukjinKwon for reviewing. Addressed PR title as well as fixing nit. --- - To unsubscribe, e-mail: reviews-unsubscr

[GitHub] spark pull request #21497: [SPARK-24466][SS] Fix TextSocketMicroBatchReader ...

2018-06-05 Thread HeartSaVioR
GitHub user HeartSaVioR opened a pull request: https://github.com/apache/spark/pull/21497 [SPARK-24466][SS] Fix TextSocketMicroBatchReader to be compatible with netcat again ## What changes were proposed in this pull request? TextSocketMicroBatchReader was no longer

[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-04 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21469 @arunmahadevan I didn't add the metric to StateOperatorProgress cause this behavior is specific to HDFSBackedStateStoreProvider (though this is only one implementation available in Apache

[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-05 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21469 Looks like the size is added only once for same identity on SizeEstimator.estimate(), so SizeEstimator.estimate() is working correctly in this case. There might be other valid cases

[GitHub] spark issue #21477: [WIP] [SPARK-24396] [SS] [PYSPARK] Add Structured Stream...

2018-06-05 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21477 Thanks @HyukjinKwon for cc.ing me. I didn't cover the python part on structured streaming so would take some time to cover and going through the code. Hoping I can participate reviewing in time

[GitHub] spark issue #21497: [SPARK-24466][SS] Fix TextSocketMicroBatchReader to be c...

2018-06-05 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21497 retest this please --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail

[GitHub] spark issue #21497: [SPARK-24466][SS] Fix TextSocketMicroBatchReader to be c...

2018-06-05 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21497 Failing tests were below: * org.apache.spark.sql.hive.client.HiveClientSuites.(It is not a test it is a sbt.testing.NestedSuiteSelector

[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-05 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21469 Also added custom metric for the count of versions stored in loadedMaps. This is a new screenshot: https://user-images.githubusercontent.com/1317309/40978481-b46ad324-690e-11e8-9b0f

[GitHub] spark pull request #21497: [SPARK-24466][SS] Fix TextSocketMicroBatchReader ...

2018-06-05 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21497#discussion_r193277616 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/TextSocketStreamSuite.scala --- @@ -35,10 +34,11 @@ import

[GitHub] spark issue #21497: [SPARK-24466][SS] Fix TextSocketMicroBatchReader to be c...

2018-06-05 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21497 @arunmahadevan Yes, before the patch Spark connects to socket server twice: one for getting schema, and another one for reading data. And `-k` flag is only supported for specific

[GitHub] spark issue #21497: [SPARK-24466][SS] Fix TextSocketMicroBatchReader to be c...

2018-06-05 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21497 cc. @tdas @jose-torres @jerryshao @arunmahadevan --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

[GitHub] spark issue #21500: Scalable Memory option for HDFSBackedStateStore

2018-06-05 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21500 I agree that current cache approach may consume excessive memory unnecessarily, and that's also same to my finding in #21469. The issue is not that simple however, because in micro

[GitHub] spark pull request #21477: [WIP] [SPARK-24396] [SS] [PYSPARK] Add Structured...

2018-06-05 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21477#discussion_r193289099 --- Diff: python/pyspark/sql/tests.py --- @@ -1884,7 +1885,164 @@ def test_query_manager_await_termination(self): finally

[GitHub] spark pull request #21477: [WIP] [SPARK-24396] [SS] [PYSPARK] Add Structured...

2018-06-05 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21477#discussion_r193284293 --- Diff: python/pyspark/sql/streaming.py --- @@ -843,6 +844,169 @@ def trigger(self, processingTime=None, once=None, continuous=None

[GitHub] spark pull request #21477: [WIP] [SPARK-24396] [SS] [PYSPARK] Add Structured...

2018-06-05 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21477#discussion_r193289567 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/ForeachWriter.scala --- @@ -20,10 +20,48 @@ package org.apache.spark.sql import

[GitHub] spark pull request #21477: [WIP] [SPARK-24396] [SS] [PYSPARK] Add Structured...

2018-06-05 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21477#discussion_r193291809 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonForeachWriter.scala --- @@ -0,0 +1,161 @@ +/* + * Licensed

[GitHub] spark pull request #21477: [WIP] [SPARK-24396] [SS] [PYSPARK] Add Structured...

2018-06-05 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21477#discussion_r193286066 --- Diff: python/pyspark/sql/streaming.py --- @@ -843,6 +844,169 @@ def trigger(self, processingTime=None, once=None, continuous=None

[GitHub] spark pull request #21477: [WIP] [SPARK-24396] [SS] [PYSPARK] Add Structured...

2018-06-05 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21477#discussion_r193286932 --- Diff: python/pyspark/sql/tests.py --- @@ -1884,7 +1885,164 @@ def test_query_manager_await_termination(self): finally

[GitHub] spark pull request #21477: [WIP] [SPARK-24396] [SS] [PYSPARK] Add Structured...

2018-06-05 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21477#discussion_r193284839 --- Diff: python/pyspark/sql/streaming.py --- @@ -843,6 +844,169 @@ def trigger(self, processingTime=None, once=None, continuous=None

[GitHub] spark pull request #21477: [WIP] [SPARK-24396] [SS] [PYSPARK] Add Structured...

2018-06-05 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21477#discussion_r193285667 --- Diff: python/pyspark/sql/streaming.py --- @@ -843,6 +844,169 @@ def trigger(self, processingTime=None, once=None, continuous=None

[GitHub] spark pull request #21477: [WIP] [SPARK-24396] [SS] [PYSPARK] Add Structured...

2018-06-06 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21477#discussion_r193304316 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/ForeachWriter.scala --- @@ -20,10 +20,48 @@ package org.apache.spark.sql import

[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total estimated size of ...

2018-06-06 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21469#discussion_r193622940 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryListenerSuite.scala --- @@ -231,7 +231,7 @@ class

[GitHub] spark issue #21500: Scalable Memory option for HDFSBackedStateStore

2018-06-06 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21500 @TomaszGaweda @aalobaidi Please correct me if I'm missing here. From every start of batch, state store loads previous version of state so that it can be read and written. If we

[GitHub] spark issue #21500: Scalable Memory option for HDFSBackedStateStore

2018-06-07 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21500 Retaining versions of state is also relevant to do snapshotting the last version in files: HDFSBackedStateStoreProvider doesn't snapshot if the version doesn't exist in loadedMaps. So we may

[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-06-06 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21469 @arunmahadevan Added custom metrics in state store to streaming query status as well. You can see `providerLoadedMapSize` is added to `stateOperators.customMetrics` in below output

[GitHub] spark issue #21445: [SPARK-24404][SS] Increase currentEpoch when meet a Epoc...

2018-05-29 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21445 @LiangchangZ > In the real CP situation, reader and writer may be always in different tasks, right? Continuous mode already supports some valid use cases, and putting

[GitHub] spark issue #21445: [SPARK-24404][SS] Increase currentEpoch when meet a Epoc...

2018-05-28 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21445 @LiangchangZ Looks like the patch is needed only with #21353 #21332 #21293 as of now, right? If then please state the condition in JIRA issue description as well as PR's description so

[GitHub] spark pull request #21469: [SPARK-24441][SS] Expose total size of states in ...

2018-05-31 Thread HeartSaVioR
GitHub user HeartSaVioR opened a pull request: https://github.com/apache/spark/pull/21469 [SPARK-24441][SS] Expose total size of states in HDFSBackedStateStore… …Provider ## What changes were proposed in this pull request? This patch exposes the estimation

[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total size of states in HDFSBac...

2018-06-02 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21469 @jose-torres Ah yes I forgot that shallow copy has been occurring, so while new map should hold necessary size of map entries but row object will be shared across versions. Thanks

[GitHub] spark pull request #21617: [SPARK-24634][SS] Add a new metric regarding numb...

2018-06-25 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21617#discussion_r197981651 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala --- @@ -48,12 +49,13 @@ class StateOperatorProgress private[sql

[GitHub] spark issue #21617: [SPARK-24634][SS] Add a new metric regarding number of r...

2018-06-25 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21617 @jose-torres Yes, you're right. They would be the rows which applies other transformation and filtering, not origin input rows. I just haven't find proper alternative word than "inpu

[GitHub] spark pull request #21617: [SPARK-24634][SS] Add a new metric regarding numb...

2018-06-25 Thread HeartSaVioR
Github user HeartSaVioR commented on a diff in the pull request: https://github.com/apache/spark/pull/21617#discussion_r197986093 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala --- @@ -48,12 +49,13 @@ class StateOperatorProgress private[sql

[GitHub] spark issue #21617: [SPARK-24634][SS] Add a new metric regarding number of r...

2018-06-25 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21617 Abandoning the patch. While I think the JIRA issue is still valid, looks like we should address watermark issue to have correct number of late events. Thanks for reviewing @jose-torres

[GitHub] spark pull request #21617: [SPARK-24634][SS] Add a new metric regarding numb...

2018-06-25 Thread HeartSaVioR
Github user HeartSaVioR closed the pull request at: https://github.com/apache/spark/pull/21617 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark issue #21617: [SPARK-24634][SS] Add a new metric regarding number of r...

2018-06-22 Thread HeartSaVioR
Github user HeartSaVioR commented on the issue: https://github.com/apache/spark/pull/21617 cc. @tdas @zsxwing @jose-torres @jerryshao @arunmahadevan @HyukjinKwon --- - To unsubscribe, e-mail: reviews-unsubscr

  1   2   3   4   5   >