GitHub user HeartSaVioR opened a pull request:
https://github.com/apache/spark/pull/9812
[SPARK-11818][REPL] Fix ExecutorClassLoader to lookup resources from parent class loader
Without the patch, two additional tests in ExecutorClassLoaderSuite fail
Github user HeartSaVioR commented on the pull request:
https://github.com/apache/spark/pull/9812#issuecomment-159347482
@srowen @vanzin @holdenk Thanks for reviewing and merging!
---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/9812#discussion_r45288354
--- Diff:
repl/src/main/scala/org/apache/spark/repl/ExecutorClassLoader.scala ---
@@ -55,6 +57,14 @@ class ExecutorClassLoader(conf: SparkConf, classUri
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/9812#discussion_r45288990
--- Diff:
repl/src/main/scala/org/apache/spark/repl/ExecutorClassLoader.scala ---
@@ -55,6 +57,14 @@ class ExecutorClassLoader(conf: SparkConf, classUri
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/9812#discussion_r45285698
--- Diff:
repl/src/main/scala/org/apache/spark/repl/ExecutorClassLoader.scala ---
@@ -55,6 +57,14 @@ class ExecutorClassLoader(conf: SparkConf, classUri
Github user HeartSaVioR commented on the pull request:
https://github.com/apache/spark/pull/9812#issuecomment-158338205
@vanzin Thanks for reviewing, I addressed your comment. Please take a look
again.
Github user HeartSaVioR commented on the pull request:
https://github.com/apache/spark/pull/9812#issuecomment-158377559
Failed tests seem unrelated.
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/9812#discussion_r45260904
--- Diff: core/src/main/scala/org/apache/spark/TestUtils.scala ---
@@ -159,6 +159,16 @@ private[spark] object TestUtils {
createCompiledClass
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/9812#discussion_r45261349
--- Diff:
repl/src/main/scala/org/apache/spark/repl/ExecutorClassLoader.scala ---
@@ -55,6 +57,14 @@ class ExecutorClassLoader(conf: SparkConf, classUri
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21222#discussion_r186247474
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala ---
@@ -116,6 +168,30 @@ package object debug
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21222#discussion_r186032252
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/debug/package.scala ---
@@ -88,23 +100,62 @@ package object debug
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21222
Kindly ping. I guess debugging the last batch might not be that attractive, but printing the codegen would be helpful to anyone who wants to investigate or debug in detail
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21152
@jerryshao Thanks for merging! My Apache JIRA ID is "kabhwan"
---
-
To unsubscribe, e-mail: reviews-unsubscr
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21337#discussion_r188632188
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleReadRDD.scala
---
@@ -0,0 +1,64
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21337#discussion_r188636306
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/UnsafeRowReceiver.scala
---
@@ -0,0 +1,56
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21337#discussion_r188628202
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleReadRDD.scala
---
@@ -0,0 +1,64
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21337#discussion_r188638980
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/shuffle/ContinuousShuffleReadSuite.scala
---
@@ -0,0 +1,122
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21388
retest this please
GitHub user HeartSaVioR opened a pull request:
https://github.com/apache/spark/pull/21388
[SPARK-24336][SQL] Support 'pass through' transformation in BasicOperators
## What changes were proposed in this pull request?
Enable 'pass through' transformation in BasicOperators
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21388
@hvanhovell
I also think some people might not want reflection magic (I was one of them, but realized I should do it), so I'm happy to close the PR if others voice the same opinion
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21385#discussion_r190131693
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/UnsafeRowReceiver.scala
---
@@ -56,20 +69,73 @@ private
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21385#discussion_r190129892
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/UnsafeRowReceiver.scala
---
@@ -56,20 +69,73 @@ private
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21385#discussion_r190125731
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/UnsafeRowReceiver.scala
---
@@ -56,20 +69,73 @@ private
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21385#discussion_r190120836
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/UnsafeRowReceiver.scala
---
@@ -56,20 +69,73 @@ private
GitHub user HeartSaVioR opened a pull request:
https://github.com/apache/spark/pull/21357
[SPARK-24311][SS] Refactor HDFSBackedStateStoreProvider to remove duplicated logic between operations on delta file and snapshot file
## What changes were proposed in this pull
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21357
cc. @tdas
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21388
Thanks @HyukjinKwon for reviewing. Addressed review comments.
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21497#discussion_r193374662
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/TextSocketStreamSuite.scala
---
@@ -256,6 +246,66 @@ class
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21497#discussion_r193372564
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/TextSocketStreamSuite.scala
---
@@ -256,6 +246,66 @@ class
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21506
cc. @tdas @jose-torres @jerryshao @arunmahadevan @HyukjinKwon
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21469
@jose-torres No problem. I expect there will be some inactivity in the Spark community during Spark Summit. Addressed the comment regarding renaming
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21357
Kindly ping again to @tdas
And cc. to @jose-torres @jerryshao @HyukjinKwon @arunmahadevan for
reviewing
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21500
@aalobaidi
I would just like to see the benefit of unloading the version of state that is expected to be read by the next batch. I totally agree the current cache mechanism is excessive
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21469
Kindly ping again to @tdas
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21497
Kindly ping again to @tdas
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21506
Kindly ping again to @tdas
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21222
Kindly ping again to @tdas
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21357
retest this, please
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21469#discussion_r194585044
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala
---
@@ -112,14 +122,19 @@ trait StateStoreWriter
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21469#discussion_r194613720
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/statefulOperators.scala
---
@@ -112,14 +122,19 @@ trait StateStoreWriter
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21477#discussion_r193740695
--- Diff: python/pyspark/sql/streaming.py ---
@@ -843,6 +844,169 @@ def trigger(self, processingTime=None, once=None,
continuous=None
GitHub user HeartSaVioR opened a pull request:
https://github.com/apache/spark/pull/21506
[SPARK-24485][SS] Measure and log elapsed time for filesystem operations in
HDFSBackedStateStoreProvider
## What changes were proposed in this pull request?
This patch measures
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21506
There are plenty of other debug messages that might hide the log messages added by this patch. Would we want to log them at INFO instead of DEBUG?
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21469
retest this, please
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21504
retest this, please
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21500
@aalobaidi
You can also merge #21506 (maybe changing the log level, or modifying the patch to log at INFO level) and see the latencies of loading state, snapshotting, and cleaning up
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21469
retest this please
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21500
@aalobaidi
One thing you may want to be aware of is that, from the executor's point of view, it must load at least one version of state into memory regardless of how many versions are cached. I guess you may
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21388
@hvanhovell
To be honest, I found the rationale for this issue in a comment in the Spark code:
https://github.com/apache/spark/blob/4c388bccf1bcac8f833fd9214096dd164c3ea065/sql
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21469#discussion_r194563959
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala
---
@@ -247,6 +253,14 @@ private
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21477#discussion_r195632189
--- Diff: python/pyspark/sql/streaming.py ---
@@ -843,6 +843,170 @@ def trigger(self, processingTime=None, once=None,
continuous=None
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21477#discussion_r195625921
--- Diff: python/pyspark/sql/streaming.py ---
@@ -843,6 +843,170 @@ def trigger(self, processingTime=None, once=None,
continuous=None
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21504#discussion_r193945288
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/streaming/StreamingQueryManager.scala
---
@@ -55,6 +57,19 @@ class StreamingQueryManager
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21506#discussion_r194293251
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala
---
@@ -280,38 +278,49 @@ private
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21506#discussion_r194293481
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala
---
@@ -280,38 +278,49 @@ private
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21506#discussion_r194295068
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala
---
@@ -280,38 +278,49 @@ private
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21506
retest this please
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21500
After enabling the option, I've observed a small expected latency at the start of each batch, per partition. The median/average was 4~50 ms in my case, but the max latency was a bit higher
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21497
retest this please
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21504
Test failures were from Kafka.
retest this, please
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21500
@aalobaidi
When starting a batch, the latest version of state is read to start a new version of state. If the state has to be restored from a snapshot as well as delta files, it will incur huge
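The load path described above (start from the nearest snapshot, then replay delta files up to the requested version) can be sketched as a minimal Python model. This is not Spark's Scala implementation; `restore_version` and the data shapes are illustrative assumptions:

```python
def restore_version(version, snapshots, deltas):
    # snapshots: {version: {key: value}} full images on disk
    # deltas:    {version: [(key, value_or_None)]} per-version updates,
    #            where None marks a key removed in that version
    base = max((v for v in snapshots if v <= version), default=None)
    state = dict(snapshots[base]) if base is not None else {}
    # Replay every delta after the snapshot up to the target version;
    # when no recent version is cached in memory, this whole walk sits
    # on the batch's critical path, which is the cost discussed above.
    for v in range((base or 0) + 1, version + 1):
        for key, value in deltas.get(v, []):
            if value is None:
                state.pop(key, None)  # tombstone
            else:
                state[key] = value
    return state
```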
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21428#discussion_r191605388
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/shuffle/ContinuousShuffleSuite.scala
---
@@ -58,39 +46,29 @@ class
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21428#discussion_r191629272
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/shuffle/ContinuousShuffleSuite.scala
---
@@ -288,4 +267,153 @@ class
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21428#discussion_r191629554
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/continuous/shuffle/ContinuousShuffleSuite.scala
---
@@ -58,39 +46,29 @@ class
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21469
cc. @tdas @jose-torres @jerryshao @HyukjinKwon @arunmahadevan
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21469
Thanks @HyukjinKwon for reviewing. Addressed the PR title as well as fixing the nit.
GitHub user HeartSaVioR opened a pull request:
https://github.com/apache/spark/pull/21497
[SPARK-24466][SS] Fix TextSocketMicroBatchReader to be compatible with
netcat again
## What changes were proposed in this pull request?
TextSocketMicroBatchReader was no longer
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21469
@arunmahadevan
I didn't add the metric to StateOperatorProgress because this behavior is specific to HDFSBackedStateStoreProvider (though it is the only implementation available in Apache
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21469
Looks like the size is added only once per identity in SizeEstimator.estimate(), so SizeEstimator.estimate() is working correctly in this case. There might be other valid cases
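The identity-based accounting described above can be illustrated with a small Python sketch. This is not Spark's SizeEstimator; it only mimics the count-each-object-once behavior, so row objects shared across cached state versions do not inflate the total:

```python
import sys

def estimate_size(obj, seen=None):
    # Count each distinct object once, by identity, walking containers.
    if seen is None:
        seen = set()
    if id(obj) in seen:
        return 0
    seen.add(id(obj))
    size = sys.getsizeof(obj)
    if isinstance(obj, dict):
        for k, v in obj.items():
            size += estimate_size(k, seen) + estimate_size(v, seen)
    elif isinstance(obj, (list, tuple, set)):
        for item in obj:
            size += estimate_size(item, seen)
    return size

row = ("key", bytearray(1024))   # one row object shared by two versions
shared = estimate_size([[row], [row]])
copied = estimate_size([[("key", bytearray(1024))],
                        [("key", bytearray(1024))]])
# `shared` is smaller: the shared row's payload is counted once
```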
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21477
Thanks @HyukjinKwon for cc'ing me. I haven't covered the Python side of Structured Streaming, so it will take some time to go through the code. Hoping I can participate in reviewing in time
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21497
retest this please
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21497
Failing tests were as follows:
* org.apache.spark.sql.hive.client.HiveClientSuites.(It is not a test it is
a sbt.testing.NestedSuiteSelector
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21469
Also added a custom metric for the count of versions stored in loadedMaps. This is the new screenshot:
https://user-images.githubusercontent.com/1317309/40978481-b46ad324-690e-11e8-9b0f
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21497#discussion_r193277616
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/sources/TextSocketStreamSuite.scala
---
@@ -35,10 +34,11 @@ import
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21497
@arunmahadevan
Yes, before the patch Spark connected to the socket server twice: once for getting the schema, and once for reading data. And the `-k` flag is only supported by specific
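The compatibility problem can be reproduced with a toy model: a server that, like `nc` without `-k`, exits after serving one connection breaks any client that connects twice (schema probe, then data). A hedged Python sketch with illustrative names, not Spark or netcat code:

```python
import socket
import threading

def serve_once():
    # Toy stand-in for `nc` WITHOUT -k: accept exactly one connection,
    # send a line, then shut the listener down for good.
    srv = socket.socket()
    srv.bind(("127.0.0.1", 0))
    srv.listen(1)
    port = srv.getsockname()[1]
    closed = threading.Event()

    def run():
        conn, _ = srv.accept()
        conn.sendall(b"hello\n")
        conn.close()
        srv.close()          # listener gone after one client
        closed.set()

    threading.Thread(target=run, daemon=True).start()
    return port, closed

port, closed = serve_once()
first = socket.create_connection(("127.0.0.1", port))
line = first.recv(16)        # the schema-probe connection succeeds
first.close()
closed.wait(timeout=5)
try:
    # the second (data) connection, as attempted before the fix, fails
    socket.create_connection(("127.0.0.1", port), timeout=1)
    second_connect_ok = True
except OSError:
    second_connect_ok = False
```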
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21497
cc. @tdas @jose-torres @jerryshao @arunmahadevan
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21500
I agree that the current cache approach may consume excessive memory unnecessarily; that also matches my finding in #21469. The issue is not that simple, however, because in micro
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21477#discussion_r193289099
--- Diff: python/pyspark/sql/tests.py ---
@@ -1884,7 +1885,164 @@ def test_query_manager_await_termination(self):
finally
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21477#discussion_r193284293
--- Diff: python/pyspark/sql/streaming.py ---
@@ -843,6 +844,169 @@ def trigger(self, processingTime=None, once=None,
continuous=None
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21477#discussion_r193289567
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/ForeachWriter.scala
---
@@ -20,10 +20,48 @@ package org.apache.spark.sql
import
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21477#discussion_r193291809
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/execution/python/PythonForeachWriter.scala
---
@@ -0,0 +1,161 @@
+/*
+ * Licensed
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21477#discussion_r193286066
--- Diff: python/pyspark/sql/streaming.py ---
@@ -843,6 +844,169 @@ def trigger(self, processingTime=None, once=None,
continuous=None
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21477#discussion_r193286932
--- Diff: python/pyspark/sql/tests.py ---
@@ -1884,7 +1885,164 @@ def test_query_manager_await_termination(self):
finally
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21477#discussion_r193284839
--- Diff: python/pyspark/sql/streaming.py ---
@@ -843,6 +844,169 @@ def trigger(self, processingTime=None, once=None,
continuous=None
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21477#discussion_r193285667
--- Diff: python/pyspark/sql/streaming.py ---
@@ -843,6 +844,169 @@ def trigger(self, processingTime=None, once=None,
continuous=None
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21477#discussion_r193304316
--- Diff: sql/core/src/main/scala/org/apache/spark/sql/ForeachWriter.scala
---
@@ -20,10 +20,48 @@ package org.apache.spark.sql
import
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21469#discussion_r193622940
--- Diff:
sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingQueryListenerSuite.scala
---
@@ -231,7 +231,7 @@ class
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21500
@TomaszGaweda @aalobaidi
Please correct me if I'm missing something here. At the start of every batch, the state store loads the previous version of state so that it can be read and written. If we
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21500
Retaining versions of state is also relevant to snapshotting the last version to files: HDFSBackedStateStoreProvider doesn't snapshot a version if it doesn't exist in loadedMaps. So we may
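The retention behavior discussed here can be sketched as a small cache model. This is an assumption-level sketch of the loadedMaps idea, not Spark's actual code; `VersionCache` and `retain` are illustrative names:

```python
from collections import OrderedDict

class VersionCache:
    # Keep only the most recent `retain` versions in memory; a version
    # can be snapshotted only while it is still cached, which is why
    # retention policy and snapshotting interact as noted above.
    def __init__(self, retain=2):
        self.retain = retain
        self.versions = OrderedDict()

    def put(self, version, state):
        self.versions[version] = state
        while len(self.versions) > self.retain:
            self.versions.popitem(last=False)  # evict the oldest version

    def can_snapshot(self, version):
        return version in self.versions
```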
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21469
@arunmahadevan
Added custom metrics in state store to streaming query status as well. You
can see `providerLoadedMapSize` is added to `stateOperators.customMetrics` in
below output
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21445
@LiangchangZ
> In the real CP situation, reader and writer may be always in different
tasks, right?
Continuous mode already supports some valid use cases, and putting
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21445
@LiangchangZ
Looks like the patch is needed only with #21353, #21332, and #21293 as of now, right? If so, please state that condition in the JIRA issue description as well as the PR description, so
GitHub user HeartSaVioR opened a pull request:
https://github.com/apache/spark/pull/21469
[SPARK-24441][SS] Expose total size of states in HDFSBackedStateStoreProvider
## What changes were proposed in this pull request?
This patch exposes the estimation
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21469
@jose-torres
Ah yes, I forgot that a shallow copy is being made, so while the new map holds its own map entries, the row objects are shared across versions. Thanks
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21617#discussion_r197981651
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala ---
@@ -48,12 +49,13 @@ class StateOperatorProgress private[sql
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21617
@jose-torres
Yes, you're right. They would be the rows after other transformations and filtering are applied, not the original input rows. I just haven't found a better word than "input"
Github user HeartSaVioR commented on a diff in the pull request:
https://github.com/apache/spark/pull/21617#discussion_r197986093
--- Diff:
sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala ---
@@ -48,12 +49,13 @@ class StateOperatorProgress private[sql
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21617
Abandoning the patch. While I think the JIRA issue is still valid, it looks like we should address the watermark issue to get a correct count of late events. Thanks for reviewing @jose-torres
Github user HeartSaVioR closed the pull request at:
https://github.com/apache/spark/pull/21617
Github user HeartSaVioR commented on the issue:
https://github.com/apache/spark/pull/21617
cc. @tdas @zsxwing @jose-torres @jerryshao @arunmahadevan @HyukjinKwon