[GitHub] spark pull request #23208: [SPARK-25530][SQL] data source v2 API refactor (b...

2018-12-06 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/23208#discussion_r239563967 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/TableProvider.java --- @@ -25,7 +25,10 @@ * The base interface for v2 data

[GitHub] spark pull request #20906: [SPARK-23561][SS] Pull continuous processing out ...

2018-11-29 Thread jose-torres
Github user jose-torres closed the pull request at: https://github.com/apache/spark/pull/20906 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h

[GitHub] spark pull request #20752: [SPARK-23559][SS] Create StreamingDataWriterFacto...

2018-11-29 Thread jose-torres
Github user jose-torres closed the pull request at: https://github.com/apache/spark/pull/20752

[GitHub] spark pull request #20859: [SPARK-23702][SS] Forbid watermarks on both sides...

2018-11-29 Thread jose-torres
Github user jose-torres closed the pull request at: https://github.com/apache/spark/pull/20859

[GitHub] spark pull request #23095: [SPARK-23886][SS] Update query status for Continu...

2018-11-21 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/23095#discussion_r235473858 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala --- @@ -117,6 +117,7 @@ class

[GitHub] spark pull request #23086: [SPARK-25528][SQL] data source v2 API refactor (b...

2018-11-21 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/23086#discussion_r235470285 --- Diff: project/MimaExcludes.scala --- @@ -149,7 +149,8 @@ object MimaExcludes { ProblemFilters.exclude[MissingClassProblem

[GitHub] spark issue #23023: [SPARK-26042][SS][TESTS]Fix a potential hang in KafkaCon...

2018-11-13 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/23023 LGTM

[GitHub] spark issue #22547: [SPARK-25528][SQL] data source V2 read side API refactor...

2018-10-19 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/22547 I agree that we need a shared understanding of the relationship between this work and the new catalog API. I was not under the impression that the primary purpose of v2 is to integrate catalog

[GitHub] spark pull request #22671: [SPARK-25615][SQL][TEST] Improve the test runtime...

2018-10-09 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22671#discussion_r223775935 --- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSinkSuite.scala --- @@ -332,7 +332,9 @@ class KafkaSinkSuite

[GitHub] spark issue #22478: [SPARK-25472][SS] Don't have legitimate stops of streams...

2018-09-19 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/22478 Lgtm pending tests On Wed, Sep 19, 2018 at 5:16 PM Shixiong Zhu wrote: > LGTM pending tests. Could you add [SS] to your title?

[GitHub] spark issue #22388: Revert [SPARK-24882][SQL] improve data source v2 API fro...

2018-09-12 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/22388 MicroBatchExecution.scala and ContinuousExecution.scala look right after the revert, although it would be helpful to understand what the diff is between this and a straight `git revert

[GitHub] spark pull request #22386: [SPARK-25399] Continuous processing state should ...

2018-09-11 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22386#discussion_r216728848 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousExecution.scala --- @@ -391,6 +393,7 @@ class

[GitHub] spark pull request #22245: [SPARK-24882][FOLLOWUP] Fix flaky synchronization...

2018-08-27 Thread jose-torres
GitHub user jose-torres opened a pull request: https://github.com/apache/spark/pull/22245 [SPARK-24882][FOLLOWUP] Fix flaky synchronization in Kafka tests. ## What changes were proposed in this pull request? Fix flaky synchronization in Kafka tests - we need to use the scan

[GitHub] spark pull request #22191: [SPARK-25204][SS] Fix race in rate source test.

2018-08-22 Thread jose-torres
GitHub user jose-torres opened a pull request: https://github.com/apache/spark/pull/22191 [SPARK-25204][SS] Fix race in rate source test. ## What changes were proposed in this pull request? Fix a race in the rate source tests. We need a better way of testing restart

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-21 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r211739221 --- Diff: external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaContinuousTest.scala --- @@ -47,7 +47,9 @@ trait

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-21 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r211639298 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/ContinuousReadSupport.java --- @@ -24,16 +24,17 @@ import

[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress

2018-08-14 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21919 Sure, but I'm not a committer so I can't make that happen. @cloud-fan

[GitHub] spark issue #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-13 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/22009 There's a reasonable chance that the Error adding data: Could not find index of the source to which data was added flakiness in the Kafka suite was caused by this PR. Let me

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-13 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209708483 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/BatchWriteSupportProvider.java --- @@ -21,33 +21,39 @@ import

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-13 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209702134 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-10 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209301920 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209094853 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209059908 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209041048 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/MicroBatchReadSupport.java --- @@ -0,0 +1,49

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209040246 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/MicroBatchReadSupport.java --- @@ -0,0 +1,49

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208983568 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/ContinuousReadSupportProvider.java --- @@ -0,0 +1,73 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208984614 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/DataSourceV2.java --- @@ -23,8 +23,9 @@ * The base interface for data source v2

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208763503 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/SessionConfigSupport.java --- @@ -27,10 +27,10 @@ @InterfaceStability.Evolving

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208984707 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/MicroBatchReadSupportProvider.java --- @@ -0,0 +1,73 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209019530 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208986233 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/BatchReadSupport.java --- @@ -0,0 +1,47 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208987767 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/InputPartition.java --- @@ -22,18 +22,16 @@ import

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209013149 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/PartitionReaderFactory.java --- @@ -0,0 +1,68 @@ +/* + * Licensed

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209018297 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/ContinuousReadSupport.java --- @@ -0,0 +1,79

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208982830 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/BatchReadSupportProvider.java --- @@ -18,19 +18,22 @@ package

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-09 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r209021329 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/streaming/StreamingDataWriterFactory.java --- @@ -0,0 +1,58

[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress

2018-08-08 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21919 No more suggestions, the PR looks fine to me.

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-08 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208642760 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/MicroBatchReadSupport.java --- @@ -0,0 +1,49

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-08 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208641014 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208425737 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208425199 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208392865 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208391449 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/ContinuousReadSupport.java --- @@ -0,0 +1,72

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208373424 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208372469 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/StreamingReadSupport.java --- @@ -0,0 +1,49

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-07 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r208370493 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/ContinuousReadSupport.java --- @@ -0,0 +1,72

[GitHub] spark pull request #22009: [SPARK-24882][SQL] improve data source v2 API

2018-08-06 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/22009#discussion_r207978414 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/memory.scala --- @@ -122,24 +119,22 @@ case class MemoryStream[A : Encoder

[GitHub] spark pull request #21919: [SPARK-24933][SS] Report numOutputRows in SinkPro...

2018-08-06 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21919#discussion_r207961663 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2.scala --- @@ -179,3 +192,24 @@ class

[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress v...

2018-08-02 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21919 If the individual connectors aren't doing the counting, I don't see a good reason to put the data inside WriterCommitMessage instead of just leaving StreamWriterCommitProgress as its own

[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress v...

2018-08-02 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21919 I don't think so. The offsets for the file source need to be consumer owned, because they need to work with files that were generated outside of Spark

[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress v...

2018-08-02 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21919 For file streams, the offsets are just indices into a log the source keeps of which files it's seen. So a file sink doesn't have any access to those offsets

[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress v...

2018-08-02 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21919 Minimum and maximum offset in the sink wouldn't make sense for most sources. There aren't any meaningful values to report for e.g. writing out Parquet files. It'd make sense to put them inside

[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress v...

2018-08-01 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21919 Sure.

[GitHub] spark issue #21919: [SPARK-24933][SS] Report numOutputRows in SinkProgress v...

2018-08-01 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21919 I like the idea of doing this, but I don't think it really belongs as part of the WriterCommitMessage interface. Every connector shouldn't have to independently count its rows; the execution

[GitHub] spark issue #21199: [SPARK-24127][SS] Continuous text socket source

2018-08-01 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21199 The change looks broadly good (and important) to me. I'll defer to @HeartSaVioR wrt the in-depth review; let me know if there are any specific parts I should take a look

[GitHub] spark issue #21948: [SPARK-24991][SQL] use InternalRow in DataSourceWriter

2018-08-01 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21948 lgtm

[GitHub] spark issue #21946: [SPARK-24990][SQL] merge ReadSupport and ReadSupportWith...

2018-08-01 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21946 Wouldn't the redo of the API that we're discussing obsolete this?

[GitHub] spark pull request #21721: [SPARK-24748][SS] Support for reporting custom me...

2018-07-30 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21721#discussion_r206325719 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/SupportsCustomReaderMetrics.java --- @@ -0,0 +1,45

[GitHub] spark pull request #21721: [SPARK-24748][SS] Support for reporting custom me...

2018-07-30 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21721#discussion_r206322694 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/streaming/SupportsCustomReaderMetrics.java --- @@ -0,0 +1,45

[GitHub] spark issue #21921: [SPARK-24971][SQL] remove SupportsDeprecatedScanRow

2018-07-30 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21921 lgtm

[GitHub] spark pull request #21118: SPARK-23325: Use InternalRow when reading with Da...

2018-07-20 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21118#discussion_r204108549 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/DataSourceV2Strategy.scala --- @@ -125,16 +125,13 @@ object

[GitHub] spark issue #21817: [SPARK-24861][SS][test] create corrected temp directorie...

2018-07-19 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21817 lgtm

[GitHub] spark issue #21733: [SPARK-24763][SS] Remove redundant key data from value i...

2018-07-11 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21733 We could still save the value of the option to offsetSeqMetadata and error if it's changed. The value of using an option would just be that there's no global default; a poweruser can set

[GitHub] spark pull request #21700: [SPARK-24717][SS] Split out max retain version of...

2018-07-11 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21700#discussion_r201792030 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/state/HDFSBackedStateStoreProvider.scala --- @@ -239,8 +241,9 @@ private

[GitHub] spark pull request #21700: [SPARK-24717][SS] Split out max retain version of...

2018-07-11 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21700#discussion_r201793040 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala --- @@ -64,6 +64,63 @@ class StateStoreSuite

[GitHub] spark pull request #21733: [SPARK-24763][SS] Remove redundant key data from ...

2018-07-11 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21733#discussion_r201791805 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -825,6 +825,16 @@ object SQLConf { .intConf

[GitHub] spark issue #21469: [SPARK-24441][SS] Expose total estimated size of states ...

2018-07-10 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21469 Sure, I don't mind if we remove that metric.

[GitHub] spark issue #21622: [SPARK-24637][SS] Add metrics regarding state and waterm...

2018-07-10 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21622 lgtm

[GitHub] spark pull request #21700: [SPARK-24717][SS] Split out max retain version of...

2018-07-10 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21700#discussion_r201407516 --- Diff: sql/core/src/main/java/org/apache/spark/sql/streaming/state/BoundedSortedMap.java --- @@ -0,0 +1,145 @@ +/* + * Licensed

[GitHub] spark issue #21721: [SPARK-24748][SS] Support for reporting custom metrics v...

2018-07-10 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21721 Looks fine to me with a MemorySink example. I don't think a formal discussion is super necessary - the major advantage of the mixin model is to let us add things like this without impacting

[GitHub] spark pull request #21721: [SPARK-24748][SS] Support for reporting custom me...

2018-07-10 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21721#discussion_r201402523 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/JsonUtils.scala --- @@ -95,4 +95,25 @@ private object JsonUtils

[GitHub] spark pull request #21733: [SPARK-24763][SS] Remove redundant key data from ...

2018-07-10 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21733#discussion_r201397799 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -825,6 +825,16 @@ object SQLConf { .intConf

[GitHub] spark issue #21305: [SPARK-24251][SQL] Add AppendData logical plan.

2018-07-09 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21305 (The last test failure is a known flaky test I've been working (albeit unsuccessfully so far) to find a solution

[GitHub] spark pull request #21721: [SPARK-24748][SS] Support for reporting custom me...

2018-07-05 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21721#discussion_r200538008 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/JsonUtils.scala --- @@ -95,4 +95,25 @@ private object JsonUtils

[GitHub] spark pull request #21721: [SPARK-24748][SS] Support for reporting custom me...

2018-07-05 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21721#discussion_r200537533 --- Diff: external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaMicroBatchReader.scala --- @@ -379,3 +384,16 @@ private[kafka010

[GitHub] spark pull request #21721: [SPARK-24748][SS] Support for reporting custom me...

2018-07-05 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21721#discussion_r200537276 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/reader/SupportsCustomMetrics.java --- @@ -0,0 +1,30 @@ +/* --- End diff

[GitHub] spark pull request #21721: [SPARK-24748][SS] Support for reporting custom me...

2018-07-05 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21721#discussion_r200537454 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/streaming/progress.scala --- @@ -178,12 +180,18 @@ class SourceProgress protected[sql

[GitHub] spark pull request #21662: [SPARK-24662][SQL][SS] Support limit in structure...

2018-07-05 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21662#discussion_r200454033 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala --- @@ -805,6 +806,75 @@ class StreamSuite extends StreamTest

[GitHub] spark pull request #21662: [SPARK-24662][SQL][SS] Support limit in structure...

2018-07-05 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21662#discussion_r200454145 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkStrategies.scala --- @@ -354,6 +355,24 @@ abstract class SparkStrategies extends

[GitHub] spark pull request #21662: [SPARK-24662][SQL][SS] Support limit in structure...

2018-06-29 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21662#discussion_r199288285 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingLimitExec.scala --- @@ -0,0 +1,96 @@ +/* + * Licensed

[GitHub] spark pull request #21662: [SPARK-24662][SQL][SS] Support limit in structure...

2018-06-29 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21662#discussion_r199284336 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/StreamingLimitExec.scala --- @@ -0,0 +1,96 @@ +/* + * Licensed

[GitHub] spark pull request #21662: [SPARK-24662][SQL][SS] Support limit in structure...

2018-06-29 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21662#discussion_r199284559 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamSuite.scala --- @@ -805,6 +806,75 @@ class StreamSuite extends StreamTest

[GitHub] spark pull request #21662: [SPARK-24662][SQL][SS] Support limit in structure...

2018-06-29 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21662#discussion_r199283820 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/MemorySinkSuite.scala --- @@ -70,35 +68,9 @@ class MemorySinkSuite extends

[GitHub] spark pull request #20351: [SPARK-23014][SS] Fully remove V1 memory sink.

2018-06-27 Thread jose-torres
Github user jose-torres closed the pull request at: https://github.com/apache/spark/pull/20351

[GitHub] spark pull request #21560: [SPARK-24386][SS] coalesce(1) aggregates in conti...

2018-06-27 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21560#discussion_r198571824 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousCoalesceRDD.scala --- @@ -0,0 +1,108

[GitHub] spark issue #21560: [SPARK-24386][SS] coalesce(1) aggregates in continuous p...

2018-06-27 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21560 Sorry, that wasn't meant to be a complete push. Added the tests now.

[GitHub] spark pull request #21560: [SPARK-24386][SS] coalesce(1) aggregates in conti...

2018-06-27 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21560#discussion_r198571496 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -349,6 +349,17 @@ object

[GitHub] spark pull request #21560: [SPARK-24386][SS] coalesce(1) aggregates in conti...

2018-06-26 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21560#discussion_r198337615 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousDataSourceRDD.scala --- @@ -51,7 +51,7 @@ class

[GitHub] spark issue #21617: [SPARK-24634][SS] Add a new metric regarding number of r...

2018-06-25 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21617 Well, "clear" is relative. Since we're trying to provide functionality in the DataFrame API, it's perfectly alright for the RDD graph to end up looking a bit weird. It seems feas

[GitHub] spark pull request #21560: [SPARK-24386][SS] coalesce(1) aggregates in conti...

2018-06-25 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21560#discussion_r197936350 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -349,6 +349,17 @@ object

[GitHub] spark pull request #21560: [SPARK-24386][SS] coalesce(1) aggregates in conti...

2018-06-25 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21560#discussion_r197935989 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -349,6 +349,17 @@ object

[GitHub] spark pull request #21560: [SPARK-24386][SS] coalesce(1) aggregates in conti...

2018-06-25 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21560#discussion_r197933535 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/shuffle/ContinuousShuffleReadRDD.scala --- @@ -61,12 +63,14

[GitHub] spark pull request #21560: [SPARK-24386][SS] coalesce(1) aggregates in conti...

2018-06-25 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21560#discussion_r197933175 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousCoalesceRDD.scala --- @@ -0,0 +1,108

[GitHub] spark pull request #21560: [SPARK-24386][SS] coalesce(1) aggregates in conti...

2018-06-25 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21560#discussion_r197930245 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousCoalesceRDD.scala --- @@ -0,0 +1,108

[GitHub] spark pull request #21560: [SPARK-24386][SS] coalesce(1) aggregates in conti...

2018-06-25 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21560#discussion_r197929943 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousCoalesceRDD.scala --- @@ -0,0 +1,108

[GitHub] spark pull request #21560: [SPARK-24386][SS] coalesce(1) aggregates in conti...

2018-06-25 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21560#discussion_r197929262 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/UnsupportedOperationChecker.scala --- @@ -349,6 +349,17 @@ object

[GitHub] spark pull request #21560: [SPARK-24386][SS] coalesce(1) aggregates in conti...

2018-06-25 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21560#discussion_r197928934 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousDataSourceRDD.scala --- @@ -98,6 +98,10 @@ class

[GitHub] spark issue #21617: [SPARK-24634][SS] Add a new metric regarding number of r...

2018-06-25 Thread jose-torres
Github user jose-torres commented on the issue: https://github.com/apache/spark/pull/21617 LGTM, but note that the rows being counted here are the rows persisted into the state store, which aren't necessarily the input rows. So the side-channel described in the JIRA would

[GitHub] spark pull request #21560: [SPARK-24386][SS] coalesce(1) aggregates in conti...

2018-06-20 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21560#discussion_r196930368 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousDataSourceRDD.scala --- @@ -51,7 +51,7 @@ class

[GitHub] spark pull request #21560: [SPARK-24386][SS] coalesce(1) aggregates in conti...

2018-06-20 Thread jose-torres
Github user jose-torres commented on a diff in the pull request: https://github.com/apache/spark/pull/21560#discussion_r196924994 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/streaming/continuous/ContinuousCoalesceRDD.scala --- @@ -0,0 +1,93
