spark git commit: [SPARK-18031][TESTS] Fix flaky test ExecutorAllocationManagerSuite.basic functionality

2016-12-21 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 3c8861d92 -> 162bdb910 [SPARK-18031][TESTS] Fix flaky test ExecutorAllocationManagerSuite.basic functionality ## What changes were proposed in this pull request? The failure is because in `test("basic functionality")`, it doesn't

spark git commit: [SPARK-18588][SS][KAFKA] Create a new KafkaConsumer when error happens to fix the flaky test

2016-12-21 Thread tdas
Repository: spark Updated Branches: refs/heads/master 354e93618 -> 95efc895e [SPARK-18588][SS][KAFKA] Create a new KafkaConsumer when error happens to fix the flaky test ## What changes were proposed in this pull request? When KafkaSource fails on Kafka errors, we should create a new

spark git commit: [FLAKY-TEST] InputStreamsSuite.socket input stream

2016-12-21 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 021952d58 -> 9a3c5bd70 [FLAKY-TEST] InputStreamsSuite.socket input stream ## What changes were proposed in this pull request?

spark git commit: [FLAKY-TEST] InputStreamsSuite.socket input stream

2016-12-21 Thread tdas
Repository: spark Updated Branches: refs/heads/master 7e8994ffd -> afe36516e [FLAKY-TEST] InputStreamsSuite.socket input stream ## What changes were proposed in this pull request?

spark git commit: [SPARK-18234][SS] Made update mode public

2016-12-21 Thread tdas
- Changed package of InternalOutputModes from o.a.s.sql to o.a.s.sql.catalyst - Added update mode state removing with watermark to StateStoreSaveExec ## How was this patch tested? Added new tests in changed modules Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16360 fr

spark git commit: [SPARK-18234][SS] Made update mode public

2016-12-21 Thread tdas
- Changed package of InternalOutputModes from o.a.s.sql to o.a.s.sql.catalyst - Added update mode state removing with watermark to StateStoreSaveExec ## How was this patch tested? Added new tests in changed modules Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #16360 fr

spark git commit: [SPARK-18588][SS][KAFKA] Create a new KafkaConsumer when error happens to fix the flaky test

2016-12-21 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 0e51bb085 -> 17ef57fe8 [SPARK-18588][SS][KAFKA] Create a new KafkaConsumer when error happens to fix the flaky test ## What changes were proposed in this pull request? When KafkaSource fails on Kafka errors, we should create a new

spark git commit: [SPARK-19876][SS][WIP] OneTime Trigger Executor

2017-03-23 Thread tdas
ean that we should fall back to what's in the offset log. - A OneTime trigger execution that results in an exception being thrown. marmbrus tdas zsxwing Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Tyson Condie <tcon...@gmail.com> Author:

[1/2] spark git commit: [SPARK-20057][SS] Renamed KeyedState to GroupState in mapGroupsWithState

2017-03-22 Thread tdas
Repository: spark Updated Branches: refs/heads/master 80fd07038 -> 82b598b96 http://git-wip-us.apache.org/repos/asf/spark/blob/82b598b9/sql/core/src/test/scala/org/apache/spark/sql/streaming/FlatMapGroupsWithStateSuite.scala

[2/2] spark git commit: [SPARK-20057][SS] Renamed KeyedState to GroupState in mapGroupsWithState

2017-03-22 Thread tdas
is would make it more general if you extend this operation to RelationGroupedDataset and Python APIs. ## How was this patch tested? Existing unit tests. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #17385 from tdas/SPARK-20057. Project: http://git-wip-us.apache.org/repos/a

spark git commit: [SPARK-20165][SS] Resolve state encoder's deserializer in driver in FlatMapGroupsWithStateExec

2017-03-31 Thread tdas
il.com> Closes #17488 from tdas/SPARK-20165. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/567a50ac Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/567a50ac Diff: http://git-wip-us.apache.org/repos/asf/spark/diff

[2/2] spark git commit: [SPARK-19067][SS] Processing-time-based timeout in MapGroupsWithState

2017-03-19 Thread tdas
to address. ## How was this patch tested? New unit tests in - MapGroupsWithStateSuite for timeouts. - StateStoreSuite for new APIs in StateStore. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #17179 from tdas/mapgroupwithstate-timeout. Project: http://git-wip-us.apache.org

[1/2] spark git commit: [SPARK-19067][SS] Processing-time-based timeout in MapGroupsWithState

2017-03-19 Thread tdas
Repository: spark Updated Branches: refs/heads/master 0ee9fbf51 -> 990af630d http://git-wip-us.apache.org/repos/asf/spark/blob/990af630/sql/core/src/main/scala/org/apache/spark/sql/streaming/KeyedState.scala -- diff --git

spark git commit: [SPARK-19986][TESTS] Make pyspark.streaming.tests.CheckpointTests more stable

2017-03-17 Thread tdas
Repository: spark Updated Branches: refs/heads/master 7b5d873ae -> 376d78216 [SPARK-19986][TESTS] Make pyspark.streaming.tests.CheckpointTests more stable ## What changes were proposed in this pull request? Sometimes, CheckpointTests will hang on a busy machine because the streaming jobs

spark git commit: [SPARK-19986][TESTS] Make pyspark.streaming.tests.CheckpointTests more stable

2017-03-17 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.1 710b5554e -> 5fb70831b [SPARK-19986][TESTS] Make pyspark.streaming.tests.CheckpointTests more stable ## What changes were proposed in this pull request? Sometimes, CheckpointTests will hang on a busy machine because the streaming

spark git commit: [SPARK-19873][SS] Record num shuffle partitions in offset log and enforce in next batch.

2017-03-17 Thread tdas
Repository: spark Updated Branches: refs/heads/master 7de66bae5 -> 3783539d7 [SPARK-19873][SS] Record num shuffle partitions in offset log and enforce in next batch. ## What changes were proposed in this pull request? If the user changes the shuffle partition number between batches,

spark git commit: [SPARK-19906][SS][DOCS] Documentation describing how to write queries to Kafka

2017-03-20 Thread tdas
fka. zsxwing tdas Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Tyson Condie <tcon...@gmail.com> Closes #17246 from tcondie/kafka-write-docs. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/

spark git commit: [SPARK-19986][TESTS] Make pyspark.streaming.tests.CheckpointTests more stable

2017-03-17 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.0 fd5149aff -> 6ee7d5bf4 [SPARK-19986][TESTS] Make pyspark.streaming.tests.CheckpointTests more stable ## What changes were proposed in this pull request? Sometimes, CheckpointTests will hang on a busy machine because the streaming

spark git commit: [SPARK-20051][SS] Fix StreamSuite flaky test - recover from v2.1 checkpoint

2017-03-21 Thread tdas
Repository: spark Updated Branches: refs/heads/master 9281a3d50 -> 2d73fcced [SPARK-20051][SS] Fix StreamSuite flaky test - recover from v2.1 checkpoint ## What changes were proposed in this pull request? There is a race condition between calling stop on a streaming query and deleting

spark git commit: [SPARK-20030][SS] Event-time-based timeout for MapGroupsWithState

2017-03-21 Thread tdas
ing `KeyedState.setTimeoutTimestamp`. The key times out when the watermark crosses the timeout timestamp. ## How was this patch tested? Unit tests Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #17361 from tdas/SPARK-20030. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Com

spark git commit: [SPARK-20301][FLAKY-TEST] Fix Hadoop Shell.runCommand flakiness in Structured Streaming tests

2017-04-12 Thread tdas
Repository: spark Updated Branches: refs/heads/master 99a947312 -> 924c42477 [SPARK-20301][FLAKY-TEST] Fix Hadoop Shell.runCommand flakiness in Structured Streaming tests ## What changes were proposed in this pull request? Some Structured Streaming tests show flakiness such as: ``` [info] -

spark git commit: [SPARK-19858][SS] Add output mode to flatMapGroupsWithState and disallow invalid cases

2017-03-08 Thread tdas
Repository: spark Updated Branches: refs/heads/master e9e2c612d -> 1bf901238 [SPARK-19858][SS] Add output mode to flatMapGroupsWithState and disallow invalid cases ## What changes were proposed in this pull request? Add an output mode parameter to `flatMapGroupsWithState` and just define

spark git commit: [SPARK-19719][SS] Kafka writer for both structured streaming and batch queries

2017-03-06 Thread tdas
end and ErrorIfExist supported under identical semantics. Other save modes result in an AnalysisException tdas zsxwing ## How was this patch tested? ### The following unit tests will be included - write to stream with topic field: valid stream write with data that includes an existing topic in the sch

spark git commit: [SPARK-19719][SS] Kafka writer for both structured streaming and batch queries

2017-03-06 Thread tdas
end and ErrorIfExist supported under identical semantics. Other save modes result in an AnalysisException tdas zsxwing ## How was this patch tested? ### The following unit tests will be included - write to stream with topic field: valid stream write with data that includes an existing topic in the schema - wr

spark git commit: [SPARK-20209][SS] Execute next trigger immediately if previous batch took longer than trigger interval

2017-04-05 Thread tdas
ple, ProcessingTimeExecutorSuite does not need to create any context for testing, just needs the StreamManualClock. ## How was this patch tested? Added new unit tests to comprehensively test this behavior. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #17525 from tdas/SPARK-20209. Project: http:

spark git commit: [SPARK-20224][SS] Updated docs for streaming dropDuplicates and mapGroupsWithState

2017-04-05 Thread tdas
ted markdown docs - Updated scala docs - Added scala and Java example ## How was this patch tested? Manually ran examples. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #17539 from tdas/SPARK-20224. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http:

spark git commit: [SPARK-20377][SS] Fix JavaStructuredSessionization example

2017-04-18 Thread tdas
ate when using timeouts. ## How was this patch tested? manually ran the example Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #17676 from tdas/SPARK-20377. (cherry picked from commit 74aa0df8f7f132b62754e5159262e4a5b9b641ab) Signed-off-by: Tathagata Das <tathagata.das1.

spark git commit: [SPARK-20377][SS] Fix JavaStructuredSessionization example

2017-04-18 Thread tdas
hen using timeouts. ## How was this patch tested? manually ran the example Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #17676 from tdas/SPARK-20377. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/74aa

spark git commit: [SPARK-21696][SS] Fix a potential issue that may generate partial snapshot files

2017-08-14 Thread tdas
Repository: spark Updated Branches: refs/heads/master fbc269252 -> 282f00b41 [SPARK-21696][SS] Fix a potential issue that may generate partial snapshot files ## What changes were proposed in this pull request? Directly writing a snapshot file may generate a partial file. This PR changes it

spark git commit: [SPARK-21696][SS] Fix a potential issue that may generate partial snapshot files

2017-08-14 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.2 7b9807754 -> 48bacd36c [SPARK-21696][SS] Fix a potential issue that may generate partial snapshot files ## What changes were proposed in this pull request? Directly writing a snapshot file may generate a partial file. This PR changes

spark git commit: [SPARK-21370][SS] Add test for state reliability when one read-only state store aborts after read-write state store commits

2017-07-12 Thread tdas
Repository: spark Updated Branches: refs/heads/master e16e8c7ad -> e0af76a36 [SPARK-21370][SS] Add test for state reliability when one read-only state store aborts after read-write state store commits ## What changes were proposed in this pull request? During Streaming Aggregation, we have

spark git commit: [SPARK-21409][SS] Follow up PR to allow different types of custom metrics to be exposed

2017-07-17 Thread tdas
PR enables that. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #18661 from tdas/SPARK-21409-2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e9faae13 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree

spark git commit: [SPARK-21462][SS] Added batchId to StreamingQueryProgress.json

2017-07-18 Thread tdas
lso, removed recently added numPartitions from StatefulOperatorProgress as this value does not change through the query run, and there are other ways to find that. ## How was this patch tested? Updated unit tests Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #18675 from t

spark git commit: [SPARK-21464][SS] Minimize deprecation warnings caused by ProcessingTime class

2017-07-19 Thread tdas
ing by removing its uses from tests as much as possible. ## How was this patch tested? Existing tests. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #18678 from tdas/SPARK-21464. (cherry picked from commit 70fe99dc62ef636a99bcb8a580ad4de4dca95181) Signed-off-by: Tathagata Das <ta

spark git commit: [SPARK-21464][SS] Minimize deprecation warnings caused by ProcessingTime class

2017-07-19 Thread tdas
ing its uses from tests as much as possible. ## How was this patch tested? Existing tests. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #18678 from tdas/SPARK-21464. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spa

spark git commit: [SS][MINOR] Fix flaky test in DatastreamReaderWriterSuite. temp checkpoint dir should be deleted

2017-07-06 Thread tdas
row interrupt exception, in which case temporary checkpoint directories will not be deleted, and the test will fail. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #18442 from tdas/DatastreamReaderWriterSuite-fix. (cherry picked from commit 60043f22458668ac7ecba94fa78953f23a6bdcec) S

spark git commit: [SS][MINOR] Fix flaky test in DatastreamReaderWriterSuite. temp checkpoint dir should be deleted

2017-07-06 Thread tdas
upt exception, in which case temporary checkpoint directories will not be deleted, and the test will fail. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #18442 from tdas/DatastreamReaderWriterSuite-fix. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http:

spark git commit: [SPARK-20461][CORE][SS] Use UninterruptibleThread for Executor and fix the potential hang in CachedKafkaConsumer

2017-04-27 Thread tdas
Repository: spark Updated Branches: refs/heads/master 606432a13 -> 01c999e7f [SPARK-20461][CORE][SS] Use UninterruptibleThread for Executor and fix the potential hang in CachedKafkaConsumer ## What changes were proposed in this pull request? This PR changes Executor's threads to

spark git commit: [SPARK-20461][CORE][SS] Use UninterruptibleThread for Executor and fix the potential hang in CachedKafkaConsumer

2017-04-27 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.2 4512e2ae6 -> 753e129f3 [SPARK-20461][CORE][SS] Use UninterruptibleThread for Executor and fix the potential hang in CachedKafkaConsumer ## What changes were proposed in this pull request? This PR changes Executor's threads to

spark git commit: [SPARK-20452][SS][KAFKA] Fix a potential ConcurrentModificationException for batch Kafka DataFrame

2017-04-27 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.2 753e129f3 -> 3d53d825e [SPARK-20452][SS][KAFKA] Fix a potential ConcurrentModificationException for batch Kafka DataFrame ## What changes were proposed in this pull request? Cancel a batch Kafka query but one of the tasks cannot be

spark git commit: [SPARK-20452][SS][KAFKA] Fix a potential ConcurrentModificationException for batch Kafka DataFrame

2017-04-27 Thread tdas
Repository: spark Updated Branches: refs/heads/master 01c999e7f -> 823baca2c [SPARK-20452][SS][KAFKA] Fix a potential ConcurrentModificationException for batch Kafka DataFrame ## What changes were proposed in this pull request? Cancel a batch Kafka query but one of the tasks cannot be cancelled,

spark git commit: [SPARK-21596][SS] Ensure places calling HDFSMetadataLog.get check the return value

2017-08-08 Thread tdas
Repository: spark Updated Branches: refs/heads/master fb54a564d -> 6edfff055 [SPARK-21596][SS] Ensure places calling HDFSMetadataLog.get check the return value ## What changes were proposed in this pull request? When I was investigating a flaky test, I realized that many places don't check

spark git commit: [SPARK-21587][SS] Added filter pushdown through watermarks.

2017-08-09 Thread tdas
Repository: spark Updated Branches: refs/heads/master 2d799d080 -> 0fb73253f [SPARK-21587][SS] Added filter pushdown through watermarks. ## What changes were proposed in this pull request? Push filter predicates through EventTimeWatermark if they're deterministic and do not reference the

spark git commit: [SPARK-21145][SS] Added StateStoreProviderId with queryRunId to reload StateStoreProviders when query is restarted

2017-06-23 Thread tdas
il.com> Closes #18355 from tdas/SPARK-21145. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fe24634d Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/fe24634d Diff: http://git-wip-us.apache.org/repos/asf/spark/diff

spark git commit: [SPARK-20957][SS][TESTS] Fix o.a.s.sql.streaming.StreamingQueryManagerSuite listing

2017-06-05 Thread tdas
Repository: spark Updated Branches: refs/heads/master 06c054411 -> bc537e40a [SPARK-20957][SS][TESTS] Fix o.a.s.sql.streaming.StreamingQueryManagerSuite listing ## What changes were proposed in this pull request? When stopping StreamingQuery, StreamExecution will set `streamDeathCause` then

spark git commit: [SPARK-20957][SS][TESTS] Fix o.a.s.sql.streaming.StreamingQueryManagerSuite listing

2017-06-05 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.2 1388fdd70 -> 421d8ecb8 [SPARK-20957][SS][TESTS] Fix o.a.s.sql.streaming.StreamingQueryManagerSuite listing ## What changes were proposed in this pull request? When stopping StreamingQuery, StreamExecution will set `streamDeathCause`

spark git commit: [SPARK-22018][SQL] Preserve top-level alias metadata when collapsing projects

2017-09-14 Thread tdas
ing the metadata of top-level aliases. ## How was this patch tested? New unit test Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #19240 from tdas/SPARK-22018. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8866

spark git commit: [SPARK-22017] Take minimum of all watermark execs in StreamExecution.

2017-09-15 Thread tdas
Repository: spark Updated Branches: refs/heads/master c7307acda -> 0bad10d3e [SPARK-22017] Take minimum of all watermark execs in StreamExecution. ## What changes were proposed in this pull request? Take the minimum of all watermark exec nodes as the "real" watermark in StreamExecution,
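
The rule in this commit message, taking the minimum across all watermark operators, can be sketched roughly as follows. This is only an illustration: the function name `global_watermark` and the millisecond inputs are hypothetical and are not Spark's API.

```python
from typing import Iterable, Optional


def global_watermark(operator_watermarks_ms: Iterable[int]) -> Optional[int]:
    """Return the smallest per-operator watermark, or None if there are none.

    Hypothetical helper for illustration only; not part of Spark.
    """
    watermarks = list(operator_watermarks_ms)
    # The slowest operator bounds how far the query as a whole has progressed.
    return min(watermarks) if watermarks else None
```

Taking the minimum is the conservative choice: no operator is told that event time has advanced further than its own input has.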

[1/2] spark git commit: [SPARK-22053][SS] Stream-stream inner join in Append Mode

2017-09-21 Thread tdas
Repository: spark Updated Branches: refs/heads/master a8a5cd24e -> f32a84250 http://git-wip-us.apache.org/repos/asf/spark/blob/f32a8425/sql/core/src/test/scala/org/apache/spark/sql/streaming/StateStoreMetricsTest.scala -- diff

[2/2] spark git commit: [SPARK-22053][SS] Stream-stream inner join in Append Mode

2017-09-21 Thread tdas
evented stream-stream join on an empty batch dataframe to be collapsed by the optimizer ## How was this patch tested? - New tests in StreamingJoinSuite - Updated tests UnsupportedOperationSuite Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #19271 from tdas/SPARK-22053. Proje

spark git commit: [SPARK-22187][SS] Update unsaferow format for saved state such that we can set timeouts when state is null

2017-10-04 Thread tdas
ll, and avoid these confusing corner cases. ## How was this patch tested? Refactored tests. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #19416 from tdas/SPARK-22187. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/sp

spark git commit: [SPARK-22238] Fix plan resolution bug caused by EnsureStatefulOpPartitioning

2017-10-14 Thread tdas
Repository: spark Updated Branches: refs/heads/master 014dc8471 -> e8547ffb4 [SPARK-22238] Fix plan resolution bug caused by EnsureStatefulOpPartitioning ## What changes were proposed in this pull request? In EnsureStatefulOpPartitioning, we check that the inputRDD to a SparkPlan has the

spark git commit: [SPARK-21925] Update trigger interval documentation in docs with behavior change in Spark 2.2

2017-09-05 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.2 fb1b5f08a -> 1f7c4869b [SPARK-21925] Update trigger interval documentation in docs with behavior change in Spark 2.2 Forgot to update docs with behavior change. Author: Burak Yavuz Closes #19138 from

spark git commit: [SPARK-21925] Update trigger interval documentation in docs with behavior change in Spark 2.2

2017-09-05 Thread tdas
Repository: spark Updated Branches: refs/heads/master 2974406d1 -> 8c954d2cd [SPARK-21925] Update trigger interval documentation in docs with behavior change in Spark 2.2 Forgot to update docs with behavior change. Author: Burak Yavuz Closes #19138 from

spark git commit: [SPARK-21765] Check that optimization doesn't affect isStreaming bit.

2017-09-06 Thread tdas
Repository: spark Updated Branches: refs/heads/master 36b48ee6e -> acdf45fb5 [SPARK-21765] Check that optimization doesn't affect isStreaming bit. ## What changes were proposed in this pull request? Add an assert in logical plan optimization that the isStreaming bit stays the same, and fix

spark git commit: [SPARK-21788][SS] Handle more exceptions when stopping a streaming query

2017-08-24 Thread tdas
Repository: spark Updated Branches: refs/heads/master 2dd37d827 -> d3abb3699 [SPARK-21788][SS] Handle more exceptions when stopping a streaming query ## What changes were proposed in this pull request? Add more cases we should view as a normal query stop rather than a failure. ## How was

[2/2] spark git commit: [SPARK-22136][SS] Implement stream-stream outer joins.

2017-10-03 Thread tdas
[SPARK-22136][SS] Implement stream-stream outer joins. ## What changes were proposed in this pull request? Allow one-sided outer joins between two streams when a watermark is defined. ## How was this patch tested? new unit tests Author: Jose Torres Closes #19327 from

[1/2] spark git commit: [SPARK-22136][SS] Implement stream-stream outer joins.

2017-10-03 Thread tdas
Repository: spark Updated Branches: refs/heads/master 5f6943345 -> 3099c574c http://git-wip-us.apache.org/repos/asf/spark/blob/3099c574/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingJoinSuite.scala -- diff

spark git commit: [SPARK-22136][SS] Evaluate one-sided conditions early in stream-stream joins.

2017-10-17 Thread tdas
Repository: spark Updated Branches: refs/heads/master e1960c3d6 -> 75d666b95 [SPARK-22136][SS] Evaluate one-sided conditions early in stream-stream joins. ## What changes were proposed in this pull request? Evaluate one-sided conditions early in stream-stream joins. This is in addition to

spark git commit: [SPARK-22278][SS] Expose current event time watermark and current processing time in GroupState

2017-10-17 Thread tdas
Das <tathagata.das1...@gmail.com> Closes #19495 from tdas/SPARK-22278. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f3137fee Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f3137fee Diff: http://git-wip-us.apache.org/repos/

spark git commit: [SPARK-24157][SS] Enabled no-data batches in MicroBatchExecution for streaming aggregation and deduplication.

2018-05-04 Thread tdas
ation Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #21220 from tdas/SPARK-24157. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/47b5b685 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/47b5b685 D

spark git commit: [SPARK-24039][SS] Do continuous processing writes with multiple compute() calls

2018-05-04 Thread tdas
Repository: spark Updated Branches: refs/heads/master d04806a23 -> af4dc5028 [SPARK-24039][SS] Do continuous processing writes with multiple compute() calls ## What changes were proposed in this pull request? Do continuous processing writes with multiple compute() calls. The current

spark git commit: [SPARK-24234][SS] Reader for continuous processing shuffle

2018-05-21 Thread tdas
Repository: spark Updated Branches: refs/heads/master 03e90f65b -> a33dcf4a0 [SPARK-24234][SS] Reader for continuous processing shuffle ## What changes were proposed in this pull request? Read RDD for continuous processing shuffle, as well as the initial RPC-based row receiver.

spark git commit: [SPARK-23416][SS] Add a specific stop method for ContinuousExecution.

2018-05-23 Thread tdas
Repository: spark Updated Branches: refs/heads/master b7a036b75 -> f45793329 [SPARK-23416][SS] Add a specific stop method for ContinuousExecution. ## What changes were proposed in this pull request? Add a specific stop method for ContinuousExecution. The previous StreamExecution.stop()

spark git commit: [SPARK-24234][SS] Support multiple row writers in continuous processing shuffle reader.

2018-05-24 Thread tdas
Repository: spark Updated Branches: refs/heads/master 53c06ddab -> 0fd68cb72 [SPARK-24234][SS] Support multiple row writers in continuous processing shuffle reader. ## What changes were proposed in this pull request?

spark git commit: [SPARK-24158][SS] Enable no-data batches for streaming joins

2018-05-16 Thread tdas
nup. This PR enables it for stream-stream joins. ## How was this patch tested? - Updated join tests. Additionally, updated them to not use `CheckLastBatch` anywhere to set a good precedent for the future. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #21253 from tdas/SPARK-24158.

spark git commit: [SPARK-23503][SS] Enforce sequencing of committed epochs for Continuous Execution

2018-05-18 Thread tdas
Repository: spark Updated Branches: refs/heads/master 710e4e81a -> 434d74e33 [SPARK-23503][SS] Enforce sequencing of committed epochs for Continuous Execution ## What changes were proposed in this pull request? Made changes to EpochCoordinator so that it enforces a commit order. In case a
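
One way to picture "enforcing a commit order" is a small buffer that holds back epochs that become ready out of sequence. The excerpt is cut off before it explains how EpochCoordinator actually handles this, so the sketch below is an assumption, and the class and method names are hypothetical.

```python
class OrderedEpochCommitter:
    """Commit epochs strictly in increasing order (illustrative sketch only)."""

    def __init__(self, first_epoch: int = 0) -> None:
        self._next_epoch = first_epoch  # lowest epoch not yet committed
        self._ready = set()             # epochs ready but waiting on earlier ones
        self.committed = []             # epochs in the order they were committed

    def mark_ready(self, epoch: int) -> None:
        """Record that an epoch is ready, then commit every epoch now in sequence."""
        self._ready.add(epoch)
        while self._next_epoch in self._ready:
            self._ready.remove(self._next_epoch)
            self.committed.append(self._next_epoch)
            self._next_epoch += 1
```

With this shape, an epoch that becomes ready early simply waits in the buffer until every earlier epoch has been committed.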

spark git commit: [SPARK-24396][SS][PYSPARK] Add Structured Streaming ForeachWriter for python

2018-06-15 Thread tdas
Das Closes #21477 from tdas/SPARK-24396. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b5ccf0d3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b5ccf0d3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b5ccf

spark git commit: [SPARK-24397][PYSPARK] Added TaskContext.getLocalProperty(key) in Python

2018-05-31 Thread tdas
ext. It mirrors the Java TaskContext API of returning a string value if the key exists, or None if the key does not exist. ## How was this patch tested? New test added. Author: Tathagata Das Closes #21437 from tdas/SPARK-24397. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: h

spark git commit: [SPARK-24453][SS] Fix error recovering from the failure in a no-data batch

2018-06-05 Thread tdas
lag appropriately. ## How was this patch tested? new unit test Author: Tathagata Das Closes #21491 from tdas/SPARK-24453. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2c2a86b5 Tree: http://git-wip-us.apache.org/repos/

spark git commit: [SPARK-24094][SS][MINOR] Change description strings of v2 streaming sources to reflect the change

2018-04-26 Thread tdas
ion is running. Great for debugging production issues. ## How was this patch tested? Not necessary. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #21160 from tdas/SPARK-24094. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/

spark git commit: [SPARK-22912] v2 data source support in MicroBatchExecution

2018-01-09 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.3 be5991902 -> 44763d93c [SPARK-22912] v2 data source support in MicroBatchExecution ## What changes were proposed in this pull request? Support for v2 data sources in microbatch streaming. ## How was this patch tested? A very basic

spark git commit: [SPARK-22912] v2 data source support in MicroBatchExecution

2018-01-08 Thread tdas
Repository: spark Updated Branches: refs/heads/master eed82a0b2 -> 4f7e75883 [SPARK-22912] v2 data source support in MicroBatchExecution ## What changes were proposed in this pull request? Support for v2 data sources in microbatch streaming. ## How was this patch tested? A very basic new

spark git commit: [SPARK-23144][SS] Added console sink for continuous processing

2018-01-18 Thread tdas
How was this patch tested? new unit test Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20311 from tdas/SPARK-23144. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bf34d665 Tree: http://git-wip-us.apache.org/repos/

spark git commit: [SPARK-23144][SS] Added console sink for continuous processing

2018-01-18 Thread tdas
How was this patch tested? new unit test Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20311 from tdas/SPARK-23144. (cherry picked from commit bf34d665b9c865e00fac7001500bf6d521c2dff9) Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com> Project: http://git-wip-us.a

spark git commit: [SPARK-23143][SS][PYTHON] Added python API for setting continuous trigger

2018-01-18 Thread tdas
Das <tathagata.das1...@gmail.com> Closes #20309 from tdas/SPARK-23143. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2d41f040 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2d41f040 Diff: http://git-wip-us.a

[2/2] spark git commit: [SPARK-22908][SS] Roll forward continuous processing Kafka support with fix to continuous Kafka data reader

2018-01-16 Thread tdas
[SPARK-22908][SS] Roll forward continuous processing Kafka support with fix to continuous Kafka data reader ## What changes were proposed in this pull request? The Kafka reader is now interruptible and can close itself. ## How was this patch tested? I locally ran one of the

[1/2] spark git commit: [SPARK-22908][SS] Roll forward continuous processing Kafka support with fix to continuous Kafka data reader

2018-01-16 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.3 08252bb38 -> 0a441d2ed http://git-wip-us.apache.org/repos/asf/spark/blob/0a441d2e/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceSuite.scala

[2/2] spark git commit: [SPARK-22908][SS] Roll forward continuous processing Kafka support with fix to continuous Kafka data reader

2018-01-16 Thread tdas
[SPARK-22908][SS] Roll forward continuous processing Kafka support with fix to continuous Kafka data reader ## What changes were proposed in this pull request? The Kafka reader is now interruptible and can close itself. ## How was this patch tested? I locally ran one of the

[1/2] spark git commit: [SPARK-22908][SS] Roll forward continuous processing Kafka support with fix to continuous Kafka data reader

2018-01-16 Thread tdas
Repository: spark Updated Branches: refs/heads/master a9b845ebb -> 166705785 http://git-wip-us.apache.org/repos/asf/spark/blob/16670578/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceSuite.scala

spark git commit: [SPARK-23033][SS] Don't use task level retry for continuous processing

2018-01-17 Thread tdas
Repository: spark Updated Branches: refs/heads/master c132538a1 -> 86a845031 [SPARK-23033][SS] Don't use task level retry for continuous processing ## What changes were proposed in this pull request? Continuous processing tasks will fail on any attempt number greater than 0.

spark git commit: [SPARK-23033][SS] Don't use task level retry for continuous processing

2018-01-17 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.3 1a6dfaf25 -> dbd2a5566 [SPARK-23033][SS] Don't use task level retry for continuous processing ## What changes were proposed in this pull request? Continuous processing tasks will fail on any attempt number greater than 0.

spark git commit: [SPARK-23052][SS] Migrate ConsoleSink to data source V2 api.

2018-01-17 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.3 3a80cc59b -> 2a87c3a77 [SPARK-23052][SS] Migrate ConsoleSink to data source V2 api. ## What changes were proposed in this pull request? Migrate ConsoleSink to data source V2 api. Note that this includes a missing piece in

spark git commit: [SPARK-23052][SS] Migrate ConsoleSink to data source V2 api.

2018-01-17 Thread tdas
Repository: spark Updated Branches: refs/heads/master 39d244d92 -> 1c76a91e5 [SPARK-23052][SS] Migrate ConsoleSink to data source V2 api. ## What changes were proposed in this pull request? Migrate ConsoleSink to data source V2 api. Note that this includes a missing piece in

[1/2] spark git commit: [SPARK-22908] Add kafka source and sink for continuous processing.

2018-01-11 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.3 b94debd2b -> f891ee324 http://git-wip-us.apache.org/repos/asf/spark/blob/f891ee32/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala -- diff --git

[1/2] spark git commit: [SPARK-22908] Add kafka source and sink for continuous processing.

2018-01-11 Thread tdas
Repository: spark Updated Branches: refs/heads/master 0b2eefb67 -> 6f7aaed80 http://git-wip-us.apache.org/repos/asf/spark/blob/6f7aaed8/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala -- diff --git

[2/2] spark git commit: [SPARK-22908] Add kafka source and sink for continuous processing.

2018-01-11 Thread tdas
[SPARK-22908] Add kafka source and sink for continuous processing. ## What changes were proposed in this pull request? Add kafka source and sink for continuous processing. This involves two small changes to the execution engine: * Bring data reader close() into the normal data reader thread to

[2/2] spark git commit: [SPARK-22908] Add kafka source and sink for continuous processing.

2018-01-11 Thread tdas
[SPARK-22908] Add kafka source and sink for continuous processing. ## What changes were proposed in this pull request? Add kafka source and sink for continuous processing. This involves two small changes to the execution engine: * Bring data reader close() into the normal data reader thread to

spark git commit: [SPARK-23064][SS][DOCS] Stream-stream joins Documentation - follow up

2018-02-02 Thread tdas
How was this patch tested? N/A Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20494 from tdas/SPARK-23064-2. (cherry picked from commit eaf35de2471fac4337dd2920026836d52b1ec847) Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com> Project: http://git-wip-us.apache.org/repos

spark git commit: [SPARK-23064][SS][DOCS] Stream-stream joins Documentation - follow up

2018-02-02 Thread tdas
tch tested? N/A Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20494 from tdas/SPARK-23064-2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/eaf35de2 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree

spark git commit: [SPARK-23092][SQL] Migrate MemoryStream to DataSourceV2 APIs

2018-02-07 Thread tdas
sts. Author: Tathagata Das <tathagata.das1...@gmail.com> Author: Burak Yavuz <brk...@gmail.com> Closes #20445 from tdas/SPARK-23092. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/30295bf5 Tree: http://git-wip

spark git commit: [SPARK-23406][SS] Enable stream-stream self-joins

2018-02-14 Thread tdas
key#9, value#12] +- Project [value#66 AS value#12]// solution: project with aliases +- LocalRelation [value#66] ``` ## How was this patch tested? New unit test Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20598 from tdas/SPARK-23406. Project: http://git

spark git commit: [SPARK-23454][SS][DOCS] Added trigger information to the Structured Streaming programming guide

2018-02-20 Thread tdas
ove this) Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20631 from tdas/SPARK-23454. (cherry picked from commit 601d653bff9160db8477f86d961e609fc2190237) Signed-off-by: Tathagata Das <ta

spark git commit: [SPARK-23454][SS][DOCS] Added trigger information to the Structured Streaming programming guide

2018-02-20 Thread tdas
ove this) Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20631 from tdas/SPARK-23454. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spa

spark git commit: [SPARK-23484][SS] Fix possible race condition in KafkaContinuousReader

2018-02-21 Thread tdas
rom multiple threads - the query thread at the time of reader factory creation, and the epoch tracking thread at the time of `needsReconfiguration`. ## How was this patch tested? Existing tests. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20655 from tdas/SPARK-23484.

spark git commit: [SPARK-23484][SS] Fix possible race condition in KafkaContinuousReader

2018-02-21 Thread tdas
rom multiple threads - the query thread at the time of reader factory creation, and the epoch tracking thread at the time of `needsReconfiguration`. ## How was this patch tested? Existing tests. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20655 from tdas/SPARK-23484. Proj

[2/2] spark git commit: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2

2018-02-16 Thread tdas
throughput of V1 and V2 using 20M records in a single partition. They were comparable. ## How was this patch tested? Existing tests, few modified to be better tests than the existing ones. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20554 from tdas/SPARK-23362. Project: http://g

[1/2] spark git commit: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2

2018-02-16 Thread tdas
Repository: spark Updated Branches: refs/heads/master c5857e496 -> 0a73aa31f http://git-wip-us.apache.org/repos/asf/spark/blob/0a73aa31/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceSuite.scala

spark git commit: [SPARK-23408][SS] Synchronize successive AddData actions in Streaming*JoinSuite

2018-02-23 Thread tdas
ing sure that the flaky tests are deterministic. ## How was this patch tested? Modified test cases in `Streaming*JoinSuites` where there are consecutive `AddData` actions. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20650 from tdas/SPARK-23408. Project: http:

spark git commit: [SPARK-23491][SS] Remove explicit job cancellation from ContinuousExecution reconfiguring

2018-02-26 Thread tdas
Repository: spark Updated Branches: refs/heads/master 185f5bc7d -> 7ec83658f [SPARK-23491][SS] Remove explicit job cancellation from ContinuousExecution reconfiguring ## What changes were proposed in this pull request? Remove queryExecutionThread.interrupt() from ContinuousExecution. As
