[spark] branch branch-3.2 updated: [SPARK-36132][SS][SQL] Support initial state for batch mode of flatMapGroupsWithState

2021-07-20 Thread tdas
This is an automated email from the ASF dual-hosted git repository. tdas pushed a commit to branch branch-3.2 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.2 by this push: new 0d60cb5 [SPARK-36132][SS][SQL] Support

[spark] branch master updated: [SPARK-36132][SS][SQL] Support initial state for batch mode of flatMapGroupsWithState

2021-07-20 Thread tdas
This is an automated email from the ASF dual-hosted git repository. tdas pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new efcce23 [SPARK-36132][SS][SQL] Support initial

[spark] branch master updated: [SPARK-35800][SS] Improving GroupState testability by introducing TestGroupState

2021-06-22 Thread tdas
This is an automated email from the ASF dual-hosted git repository. tdas pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new dfd7b02 [SPARK-35800][SS] Improving GroupState

[spark] branch master updated: [SPARK-32585][SQL] Support scala enumeration in ScalaReflection

2020-10-01 Thread tdas
This is an automated email from the ASF dual-hosted git repository. tdas pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e62d247 [SPARK-32585][SQL] Support scala

[spark] branch branch-2.4 updated: [SPARK-32794][SS] Fixed rare corner case error in micro-batch engine with some stateful queries + no-data-batches + V1 sources

2020-09-11 Thread tdas
This is an automated email from the ASF dual-hosted git repository. tdas pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new c82b6e4 [SPARK-32794][SS] Fixed rare

[spark] branch branch-3.0 updated: [SPARK-32794][SS] Fixed rare corner case error in micro-batch engine with some stateful queries + no-data-batches + V1 sources

2020-09-09 Thread tdas
This is an automated email from the ASF dual-hosted git repository. tdas pushed a commit to branch branch-3.0 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-3.0 by this push: new e632e7c [SPARK-32794][SS] Fixed rare

[spark] branch master updated: [SPARK-32794][SS] Fixed rare corner case error in micro-batch engine with some stateful queries + no-data-batches + V1 sources

2020-09-09 Thread tdas
This is an automated email from the ASF dual-hosted git repository. tdas pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new e4237bb [SPARK-32794][SS] Fixed rare corner case

[spark] branch master updated: [SPARK-29438][SS] Use partition ID of StateStoreAwareZipPartitionsRDD for determining partition ID of state store in stream-stream join

2020-01-30 Thread tdas
This is an automated email from the ASF dual-hosted git repository. tdas pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new cbb714f [SPARK-29438][SS] Use partition ID

[spark] branch master updated: [SPARK-30609] Allow default merge command resolution to be bypassed by DSv2 tables

2020-01-22 Thread tdas
This is an automated email from the ASF dual-hosted git repository. tdas pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new d2bca8f [SPARK-30609] Allow default merge command

[spark] branch branch-2.4 updated: [SPARK-27453] Pass partitionBy as options in DataFrameWriter

2019-04-16 Thread tdas
This is an automated email from the ASF dual-hosted git repository. tdas pushed a commit to branch branch-2.4 in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/branch-2.4 by this push: new df9a506 [SPARK-27453] Pass partitionBy

[spark] branch master updated: [SPARK-27453] Pass partitionBy as options in DataFrameWriter

2019-04-16 Thread tdas
This is an automated email from the ASF dual-hosted git repository. tdas pushed a commit to branch master in repository https://gitbox.apache.org/repos/asf/spark.git The following commit(s) were added to refs/heads/master by this push: new 26ed65f [SPARK-27453] Pass partitionBy as options

spark git commit: [SPARK-25639][DOCS] Added docs for foreachBatch, python foreach and multiple watermarks

2018-10-08 Thread tdas
tch - Multiple watermark policy - The semantics of what changes are allowed to the streaming between restarts. ## How was this patch tested? No tests Closes #22627 from tdas/SPARK-25639. Authored-by: Tathagata Das Signed-off-by: Tathagata Das (cherry picked from com

spark git commit: [SPARK-25639][DOCS] Added docs for foreachBatch, python foreach and multiple watermarks

2018-10-08 Thread tdas
ple watermark policy - The semantics of what changes are allowed to the streaming between restarts. ## How was this patch tested? No tests Closes #22627 from tdas/SPARK-25639. Authored-by: Tathagata Das Signed-off-by: Tathagata Das Project: http://git-wip-us.apache.org/repos/asf/spark/repo Com

spark git commit: [SPARK-25399][SS] Continuous processing state should not affect microbatch execution jobs

2018-09-11 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.4 3a6ef8b7e -> 0dbf1450f [SPARK-25399][SS] Continuous processing state should not affect microbatch execution jobs ## What changes were proposed in this pull request? The leftover state from running a continuous processing streaming

spark git commit: [SPARK-25399][SS] Continuous processing state should not affect microbatch execution jobs

2018-09-11 Thread tdas
Repository: spark Updated Branches: refs/heads/master 97d4afaa1 -> 9f5c5b4cc [SPARK-25399][SS] Continuous processing state should not affect microbatch execution jobs ## What changes were proposed in this pull request? The leftover state from running a continuous processing streaming job

spark git commit: [SPARK-25204][SS] Fix race in rate source test.

2018-08-23 Thread tdas
Repository: spark Updated Branches: refs/heads/master a9aacdf1c -> 8ed044928 [SPARK-25204][SS] Fix race in rate source test. ## What changes were proposed in this pull request? Fix a race in the rate source tests. We need a better way of testing restart behavior. ## How was this patch

spark git commit: [SPARK-25184][SS] Fixed race condition in StreamExecution that caused flaky test in FlatMapGroupsWithState

2018-08-22 Thread tdas
n is also avoided. ## How was this patch tested? Ran locally many times. Closes #22182 from tdas/SPARK-25184. Authored-by: Tathagata Das Signed-off-by: Tathagata Das Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/31063249 Tr

spark git commit: [MINOR] Added import to fix compilation

2018-08-21 Thread tdas
[error] one error found [error] Compile failed at Aug 21, 2018 4:04:35 PM [12.827s] ``` ## How was this patch tested? It compiles! Closes #22175 from tdas/fix-build. Authored-by: Tathagata Das Signed-off-by: Tathagata Das Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: htt

spark git commit: [SPARK-24441][SS] Expose total estimated size of states in HDFSBackedStateStoreProvider

2018-08-21 Thread tdas
Repository: spark Updated Branches: refs/heads/master ac0174e55 -> 42035a4fe [SPARK-24441][SS] Expose total estimated size of states in HDFSBackedStateStoreProvider ## What changes were proposed in this pull request? This patch exposes the estimation of size of cache (loadedMaps) in

spark git commit: [SPARK-24763][SS] Remove redundant key data from value in streaming aggregation

2018-08-21 Thread tdas
Repository: spark Updated Branches: refs/heads/master 72ecfd095 -> 6c5cb8585 [SPARK-24763][SS] Remove redundant key data from value in streaming aggregation ## What changes were proposed in this pull request? This patch proposes a new flag option for stateful aggregation: remove redundant

spark git commit: [SPARK-24699][SS] Make watermarks work with Trigger.Once by saving updated watermark to commit log

2018-07-23 Thread tdas
red-by: Tathagata Das Co-authored-by: c-horn Author: Tathagata Das Closes #21746 from tdas/SPARK-24699. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/61f0ca4f Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/61f0ca4f Diff: h

spark git commit: [SPARK-22187][SS] Update unsaferow format for saved state in flatMapGroupsWithState to allow timeouts with deleted state

2018-07-19 Thread tdas
citly testing obj-to-row conversion for both state formats. Author: Tathagata Das Closes #21739 from tdas/SPARK-22187-1. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b3d88ac0 Tree: http://git-wip-us.apache.org/repos/a

spark git commit: [SPARK-24717][SS] Split out max retain version of state for memory in HDFSBackedStateStoreProvider

2018-07-19 Thread tdas
Repository: spark Updated Branches: refs/heads/master d05a926e7 -> 8b7d4f842 [SPARK-24717][SS] Split out max retain version of state for memory in HDFSBackedStateStoreProvider ## What changes were proposed in this pull request? This patch proposes breaking down configuration of retaining

spark git commit: [SPARK-24697][SS] Fix the reported start offsets in streaming query progress

2018-07-11 Thread tdas
ing the `committedOffsets`. ## How was this patch tested? Added new tests Author: Tathagata Das Closes #21744 from tdas/SPARK-24697. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ff7f6ef7 Tree: http://git-wip-us.apache.

spark git commit: [SPARK-24730][SS] Add policy to choose max as global watermark when streaming query has multiple watermarks

2018-07-10 Thread tdas
Add a test for recovery from existing checkpoints. ## How was this patch tested? New unit test Author: Tathagata Das Closes #21701 from tdas/SPARK-24730. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6078b891 Tree: http://git-

spark git commit: [SPARK-24662][SQL][SS] Support limit in structured streaming

2018-07-10 Thread tdas
Repository: spark Updated Branches: refs/heads/master e0559f238 -> 32cb50835 [SPARK-24662][SQL][SS] Support limit in structured streaming ## What changes were proposed in this pull request? Support the LIMIT operator in structured streaming. For streams in append or complete output mode, a

spark git commit: [SPARK-24386][SS] coalesce(1) aggregates in continuous processing

2018-06-28 Thread tdas
Repository: spark Updated Branches: refs/heads/master 2224861f2 -> f6e6899a8 [SPARK-24386][SS] coalesce(1) aggregates in continuous processing ## What changes were proposed in this pull request? Provide a continuous processing implementation of coalesce(1), as well as allowing aggregates on

spark git commit: [SPARK-24396][SS][PYSPARK] Add Structured Streaming ForeachWriter for python

2018-06-15 Thread tdas
Das Closes #21477 from tdas/SPARK-24396. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b5ccf0d3 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/b5ccf0d3 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/b5ccf

spark git commit: [SPARK-24453][SS] Fix error recovering from the failure in a no-data batch

2018-06-05 Thread tdas
lag appropriately. ## How was this patch tested? new unit test Author: Tathagata Das Closes #21491 from tdas/SPARK-24453. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2c2a86b5 Tree: http://git-wip-us.apache.org/repos/

spark git commit: [SPARK-24397][PYSPARK] Added TaskContext.getLocalProperty(key) in Python

2018-05-31 Thread tdas
ext. It mirrors the Java TaskContext API of returning a string value if the key exists, or None if the key does not exist. ## How was this patch tested? New test added. Author: Tathagata Das Closes #21437 from tdas/SPARK-24397. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: h

spark git commit: [SPARK-24234][SS] Support multiple row writers in continuous processing shuffle reader.

2018-05-24 Thread tdas
Repository: spark Updated Branches: refs/heads/master 53c06ddab -> 0fd68cb72 [SPARK-24234][SS] Support multiple row writers in continuous processing shuffle reader. ## What changes were proposed in this pull request?

spark git commit: [SPARK-23416][SS] Add a specific stop method for ContinuousExecution.

2018-05-23 Thread tdas
Repository: spark Updated Branches: refs/heads/master b7a036b75 -> f45793329 [SPARK-23416][SS] Add a specific stop method for ContinuousExecution. ## What changes were proposed in this pull request? Add a specific stop method for ContinuousExecution. The previous StreamExecution.stop()

spark git commit: [SPARK-24234][SS] Reader for continuous processing shuffle

2018-05-21 Thread tdas
Repository: spark Updated Branches: refs/heads/master 03e90f65b -> a33dcf4a0 [SPARK-24234][SS] Reader for continuous processing shuffle ## What changes were proposed in this pull request? Read RDD for continuous processing shuffle, as well as the initial RPC-based row receiver.

spark git commit: [SPARK-23503][SS] Enforce sequencing of committed epochs for Continuous Execution

2018-05-18 Thread tdas
Repository: spark Updated Branches: refs/heads/master 710e4e81a -> 434d74e33 [SPARK-23503][SS] Enforce sequencing of committed epochs for Continuous Execution ## What changes were proposed in this pull request? Made changes to EpochCoordinator so that it enforces a commit order. In case a

spark git commit: [SPARK-24158][SS] Enable no-data batches for streaming joins

2018-05-16 Thread tdas
nup. This PR enables it for stream-stream joins. ## How was this patch tested? - Updated join tests. Additionally, updated them to not use `CheckLastBatch` anywhere to set good precedence for future. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #21253 from tdas/SPARK-24158.

spark git commit: [SPARK-24157][SS] Enabled no-data batches in MicroBatchExecution for streaming aggregation and deduplication.

2018-05-04 Thread tdas
ation Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #21220 from tdas/SPARK-24157. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/47b5b685 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/47b5b685 D

spark git commit: [SPARK-24039][SS] Do continuous processing writes with multiple compute() calls

2018-05-04 Thread tdas
Repository: spark Updated Branches: refs/heads/master d04806a23 -> af4dc5028 [SPARK-24039][SS] Do continuous processing writes with multiple compute() calls ## What changes were proposed in this pull request? Do continuous processing writes with multiple compute() calls. The current

spark git commit: [SPARK-24094][SS][MINOR] Change description strings of v2 streaming sources to reflect the change

2018-04-26 Thread tdas
ion is running. Great for debugging production issues. ## How was this patch tested? Not necessary. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #21160 from tdas/SPARK-24094. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/

spark git commit: [SPARK-24050][SS] Calculate input / processing rates correctly for DataSourceV2 streaming sources

2018-04-25 Thread tdas
ath where the execution-metrics-to-source mapping is done directly. Otherwise we fall back to existing mapping logic. ## How was this patch tested? - New unit tests using V2 memory source - Existing unit tests using V1 source Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #21126

spark git commit: [SPARK-24038][SS] Refactor continuous writing to its own class

2018-04-24 Thread tdas
Repository: spark Updated Branches: refs/heads/master 7b1e6523a -> d6c26d1c9 [SPARK-24038][SS] Refactor continuous writing to its own class ## What changes were proposed in this pull request? Refactor continuous writing to its own class. See WIP https://github.com/jose-torres/spark/pull/13

spark git commit: [SPARK-24056][SS] Make consumer creation lazy in Kafka source for Structured streaming

2018-04-24 Thread tdas
ing the KafkaOffsetReader. ## How was this patch tested? Existing unit tests Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #21134 from tdas/SPARK-24056. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7b1e6523 Tree: http:

spark git commit: [SPARK-23004][SS] Ensure StateStore.commit is called only once in a streaming aggregation task

2018-04-23 Thread tdas
t;tathagata.das1...@gmail.com> Closes #21124 from tdas/SPARK-23004. (cherry picked from commit 770add81c3474e754867d7105031a5eaf27159bd) Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.o

spark git commit: [SPARK-23004][SS] Ensure StateStore.commit is called only once in a streaming aggregation task

2018-04-23 Thread tdas
t;tathagata.das1...@gmail.com> Closes #21124 from tdas/SPARK-23004. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/770add81 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/770add81 Diff: http://git-wip-us.apache.org/r

spark git commit: [SPARK-23747][STRUCTURED STREAMING] Add EpochCoordinator unit tests

2018-04-17 Thread tdas
Repository: spark Updated Branches: refs/heads/master 1cc66a072 -> 05ae74778 [SPARK-23747][STRUCTURED STREAMING] Add EpochCoordinator unit tests ## What changes were proposed in this pull request? Unit tests for EpochCoordinator that test correct sequencing of committed epochs. Several

spark git commit: [SPARK-23687][SS] Add a memory source for continuous processing.

2018-04-17 Thread tdas
Repository: spark Updated Branches: refs/heads/master 14844a62c -> 1cc66a072 [SPARK-23687][SS] Add a memory source for continuous processing. ## What changes were proposed in this pull request? Add a memory source for continuous processing. Note that only one of the ContinuousSuite tests is

spark git commit: [SPARK-23966][SS] Refactoring all checkpoint file writing logic in a common CheckpointFileManager interface

2018-04-13 Thread tdas
all `close()` to successfully write the file, or `cancel()` in case of an error. ## How was this patch tested? New tests in `CheckpointFileManagerSuite` and slightly modified existing tests. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #21048 from tdas/SPARK-23966. Proj

spark git commit: [SPARK-23748][SS] Fix SS continuous process doesn't support SubqueryAlias issue

2018-04-12 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.3 908c681c6 -> 2995b79d6 [SPARK-23748][SS] Fix SS continuous process doesn't support SubqueryAlias issue ## What changes were proposed in this pull request? Current SS continuous doesn't support processing on temp table or

spark git commit: [SPARK-23748][SS] Fix SS continuous process doesn't support SubqueryAlias issue

2018-04-12 Thread tdas
Repository: spark Updated Branches: refs/heads/master 682002b6d -> 14291b061 [SPARK-23748][SS] Fix SS continuous process doesn't support SubqueryAlias issue ## What changes were proposed in this pull request? Current SS continuous doesn't support processing on temp table or `df.as("xxx")`,

spark git commit: [SPARK-23099][SS] Migrate foreach sink to DataSourceV2

2018-04-03 Thread tdas
Repository: spark Updated Branches: refs/heads/master 7cf9fab33 -> 66a3a5a2d [SPARK-23099][SS] Migrate foreach sink to DataSourceV2 ## What changes were proposed in this pull request? Migrate foreach sink to DataSourceV2. Since the previous attempt at this PR #20552, we've changed and

spark git commit: [SPARK-23827][SS] StreamingJoinExec should ensure that input data is partitioned into specific number of partitions

2018-03-30 Thread tdas
dant. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20941 from tdas/SPARK-23827. (cherry picked from commit 15298b99ac8944e781328423289586176cf824d7) Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com> Project: http://git-wip-us.apache.org/repos/asf/spark/re

spark git commit: [SPARK-23096][SS] Migrate rate source to V2

2018-03-27 Thread tdas
Repository: spark Updated Branches: refs/heads/master 35997b59f -> c68ec4e6a [SPARK-23096][SS] Migrate rate source to V2 ## What changes were proposed in this pull request? This PR migrate micro batch rate source to V2 API and rewrite UTs to suite V2 test. ## How was this patch tested?

spark git commit: [SPARK-23406][SS] Enable stream-stream self-joins for branch-2.3

2018-03-07 Thread tdas
Relation [value#66] +- Project [(value#12 % 5) AS key#9, value#12] +- Project [value#66 AS value#12]// solution: project with aliases +- LocalRelation [value#66] ``` ## How was this patch tested? New unit test Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #207

spark git commit: [SPARK-23559][SS] Add epoch ID to DataWriterFactory.

2018-03-05 Thread tdas
Repository: spark Updated Branches: refs/heads/master ba622f45c -> b0f422c38 [SPARK-23559][SS] Add epoch ID to DataWriterFactory. ## What changes were proposed in this pull request? Add an epoch ID argument to DataWriterFactory for use in streaming. As a side effect of passing in this

spark git commit: [SPARK-23097][SQL][SS] Migrate text socket source to V2

2018-03-02 Thread tdas
Repository: spark Updated Branches: refs/heads/master 3a4d15e5d -> 707e6506d [SPARK-23097][SQL][SS] Migrate text socket source to V2 ## What changes were proposed in this pull request? This PR moves structured streaming text socket source to V2. Questions: do we need to remove old "socket"

spark git commit: [SPARK-23491][SS] Remove explicit job cancellation from ContinuousExecution reconfiguring

2018-02-26 Thread tdas
Repository: spark Updated Branches: refs/heads/master 185f5bc7d -> 7ec83658f [SPARK-23491][SS] Remove explicit job cancellation from ContinuousExecution reconfiguring ## What changes were proposed in this pull request? Remove queryExecutionThread.interrupt() from ContinuousExecution. As

spark git commit: [SPARK-23408][SS] Synchronize successive AddData actions in Streaming*JoinSuite

2018-02-23 Thread tdas
ing sure that the flaky tests are deterministic. ## How was this patch tested? Modified test cases in `Streaming*JoinSuites` where there are consecutive `AddData` actions. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20650 from tdas/SPARK-23408. Project: http:

spark git commit: [SPARK-23484][SS] Fix possible race condition in KafkaContinuousReader

2018-02-21 Thread tdas
rom multiple threads - the query thread at the time of reader factory creation, and the epoch tracking thread at the time of `needsReconfiguration`. ## How was this patch tested? Existing tests. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20655 from tdas/SPARK-23484.

spark git commit: [SPARK-23484][SS] Fix possible race condition in KafkaContinuousReader

2018-02-21 Thread tdas
rom multiple threads - the query thread at the time of reader factory creation, and the epoch tracking thread at the time of `needsReconfiguration`. ## How was this patch tested? Existing tests. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20655 from tdas/SPARK-23484. Proj

spark git commit: [SPARK-23454][SS][DOCS] Added trigger information to the Structured Streaming programming guide

2018-02-20 Thread tdas
ove this) Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20631 from tdas/SPARK-23454. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spa

spark git commit: [SPARK-23454][SS][DOCS] Added trigger information to the Structured Streaming programming guide

2018-02-20 Thread tdas
ove this) Please review http://spark.apache.org/contributing.html before opening a pull request. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20631 from tdas/SPARK-23454. (cherry picked from commit 601d653bff9160db8477f86d961e609fc2190237) Signed-off-by: Tathagata Das <ta

[1/2] spark git commit: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2

2018-02-16 Thread tdas
Repository: spark Updated Branches: refs/heads/master c5857e496 -> 0a73aa31f http://git-wip-us.apache.org/repos/asf/spark/blob/0a73aa31/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceSuite.scala

[2/2] spark git commit: [SPARK-23362][SS] Migrate Kafka Microbatch source to v2

2018-02-16 Thread tdas
throughput of V1 and V2 using 20M records in a single partition. They were comparable. ## How was this patch tested? Existing tests, few modified to be better tests than the existing ones. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20554 from tdas/SPARK-23362. Project: http://g

spark git commit: [SPARK-23406][SS] Enable stream-stream self-joins

2018-02-14 Thread tdas
key#9, value#12] +- Project [value#66 AS value#12]// solution: project with aliases +- LocalRelation [value#66] ``` ## How was this patch tested? New unit test Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20598 from tdas/SPARK-23406. Project: http://git

spark git commit: [SPARK-23092][SQL] Migrate MemoryStream to DataSourceV2 APIs

2018-02-07 Thread tdas
sts. Author: Tathagata Das <tathagata.das1...@gmail.com> Author: Burak Yavuz <brk...@gmail.com> Closes #20445 from tdas/SPARK-23092. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/30295bf5 Tree: http://git-wip

spark git commit: [SPARK-23064][SS][DOCS] Stream-stream joins Documentation - follow up

2018-02-02 Thread tdas
How was this patch tested? N/A Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20494 from tdas/SPARK-23064-2. (cherry picked from commit eaf35de2471fac4337dd2920026836d52b1ec847) Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com> Project: http://git-wip-us.apache.org/repos

spark git commit: [SPARK-23064][SS][DOCS] Stream-stream joins Documentation - follow up

2018-02-02 Thread tdas
tch tested? N/A Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20494 from tdas/SPARK-23064-2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/eaf35de2 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree

spark git commit: [SPARK-23197][DSTREAMS] Increased timeouts to resolve flakiness

2018-01-23 Thread tdas
tch tested? Multiple rounds of tests. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20371 from tdas/SPARK-23197. (cherry picked from commit 15adcc8273e73352e5e1c3fc9915c0b004ec4836) Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com> Project: http://git-wip-us.apach

spark git commit: [SPARK-23197][DSTREAMS] Increased timeouts to resolve flakiness

2018-01-23 Thread tdas
ted? Multiple rounds of tests. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20371 from tdas/SPARK-23197. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/15adcc82 Tree: http://git-wip-us.apache.org/repos/

spark git commit: [SPARK-23142][SS][DOCS] Added docs for continuous processing

2018-01-18 Thread tdas
fde-75992cc517bd.png) ## How was this patch tested? N/A Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20308 from tdas/SPARK-23142. (cherry picked from commit 4cd2ecc0c7222fef1337e04f1948333296c3be86) Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com> Project:

spark git commit: [SPARK-23142][SS][DOCS] Added docs for continuous processing

2018-01-18 Thread tdas
fde-75992cc517bd.png) ## How was this patch tested? N/A Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20308 from tdas/SPARK-23142. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/4cd2ecc0 Tree: http:

spark git commit: [SPARK-23144][SS] Added console sink for continuous processing

2018-01-18 Thread tdas
How was this patch tested? new unit test Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20311 from tdas/SPARK-23144. (cherry picked from commit bf34d665b9c865e00fac7001500bf6d521c2dff9) Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com> Project: http://git-wip-us.a

spark git commit: [SPARK-23144][SS] Added console sink for continuous processing

2018-01-18 Thread tdas
How was this patch tested? new unit test Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #20311 from tdas/SPARK-23144. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/bf34d665 Tree: http://git-wip-us.apache.org/repos/

spark git commit: [SPARK-23143][SS][PYTHON] Added python API for setting continuous trigger

2018-01-18 Thread tdas
Das <tathagata.das1...@gmail.com> Closes #20309 from tdas/SPARK-23143. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2d41f040 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/2d41f040 Diff: http://git-wip-us.a

spark git commit: [SPARK-23052][SS] Migrate ConsoleSink to data source V2 api.

2018-01-17 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.3 3a80cc59b -> 2a87c3a77 [SPARK-23052][SS] Migrate ConsoleSink to data source V2 api. ## What changes were proposed in this pull request? Migrate ConsoleSink to data source V2 api. Note that this includes a missing piece in

spark git commit: [SPARK-23052][SS] Migrate ConsoleSink to data source V2 api.

2018-01-17 Thread tdas
Repository: spark Updated Branches: refs/heads/master 39d244d92 -> 1c76a91e5 [SPARK-23052][SS] Migrate ConsoleSink to data source V2 api. ## What changes were proposed in this pull request? Migrate ConsoleSink to data source V2 api. Note that this includes a missing piece in

spark git commit: [SPARK-23033][SS] Don't use task level retry for continuous processing

2018-01-17 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.3 1a6dfaf25 -> dbd2a5566 [SPARK-23033][SS] Don't use task level retry for continuous processing ## What changes were proposed in this pull request? Continuous processing tasks will fail on any attempt number greater than 0.

spark git commit: [SPARK-23033][SS] Don't use task level retry for continuous processing

2018-01-17 Thread tdas
Repository: spark Updated Branches: refs/heads/master c132538a1 -> 86a845031 [SPARK-23033][SS] Don't use task level retry for continuous processing ## What changes were proposed in this pull request? Continuous processing tasks will fail on any attempt number greater than 0.

[2/2] spark git commit: [SPARK-22908][SS] Roll forward continuous processing Kafka support with fix to continuous Kafka data reader

2018-01-16 Thread tdas
[SPARK-22908][SS] Roll forward continuous processing Kafka support with fix to continuous Kafka data reader ## What changes were proposed in this pull request? The Kafka reader is now interruptible and can close itself. ## How was this patch tested? I locally ran one of the

[1/2] spark git commit: [SPARK-22908][SS] Roll forward continuous processing Kafka support with fix to continuous Kafka data reader

2018-01-16 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.3 08252bb38 -> 0a441d2ed http://git-wip-us.apache.org/repos/asf/spark/blob/0a441d2e/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceSuite.scala

[2/2] spark git commit: [SPARK-22908][SS] Roll forward continuous processing Kafka support with fix to continuous Kafka data reader

2018-01-16 Thread tdas
[SPARK-22908][SS] Roll forward continuous processing Kafka support with fix to continuous Kafka data reader ## What changes were proposed in this pull request? The Kafka reader is now interruptible and can close itself. ## How was this patch tested? I locally ran one of the

[1/2] spark git commit: [SPARK-22908][SS] Roll forward continuous processing Kafka support with fix to continuous Kafka data reader

2018-01-16 Thread tdas
Repository: spark Updated Branches: refs/heads/master a9b845ebb -> 166705785 http://git-wip-us.apache.org/repos/asf/spark/blob/16670578/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaSourceSuite.scala

[1/2] spark git commit: [SPARK-22908] Add kafka source and sink for continuous processing.

2018-01-11 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.3 b94debd2b -> f891ee324 http://git-wip-us.apache.org/repos/asf/spark/blob/f891ee32/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala -- diff --git

[2/2] spark git commit: [SPARK-22908] Add kafka source and sink for continuous processing.

2018-01-11 Thread tdas
[SPARK-22908] Add kafka source and sink for continuous processing. ## What changes were proposed in this pull request? Add kafka source and sink for continuous processing. This involves two small changes to the execution engine: * Bring data reader close() into the normal data reader thread to

[2/2] spark git commit: [SPARK-22908] Add kafka source and sink for continuous processing.

2018-01-11 Thread tdas
[SPARK-22908] Add kafka source and sink for continuous processing. ## What changes were proposed in this pull request? Add kafka source and sink for continuous processing. This involves two small changes to the execution engine: * Bring data reader close() into the normal data reader thread to

[1/2] spark git commit: [SPARK-22908] Add kafka source and sink for continuous processing.

2018-01-11 Thread tdas
Repository: spark Updated Branches: refs/heads/master 0b2eefb67 -> 6f7aaed80 http://git-wip-us.apache.org/repos/asf/spark/blob/6f7aaed8/sql/core/src/main/scala/org/apache/spark/sql/DataFrameWriter.scala -- diff --git

spark git commit: [SPARK-22912] v2 data source support in MicroBatchExecution

2018-01-09 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.3 be5991902 -> 44763d93c [SPARK-22912] v2 data source support in MicroBatchExecution ## What changes were proposed in this pull request? Support for v2 data sources in microbatch streaming. ## How was this patch tested? A very basic

spark git commit: [SPARK-22912] v2 data source support in MicroBatchExecution

2018-01-08 Thread tdas
Repository: spark Updated Branches: refs/heads/master eed82a0b2 -> 4f7e75883 [SPARK-22912] v2 data source support in MicroBatchExecution ## What changes were proposed in this pull request? Support for v2 data sources in microbatch streaming. ## How was this patch tested? A very basic new

spark git commit: [SPARK-22278][SS] Expose current event time watermark and current processing time in GroupState

2017-10-17 Thread tdas
Das <tathagata.das1...@gmail.com> Closes #19495 from tdas/SPARK-22278. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/f3137fee Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/f3137fee Diff: http://git-wip-us.apache.org/repos/

spark git commit: [SPARK-22136][SS] Evaluate one-sided conditions early in stream-stream joins.

2017-10-17 Thread tdas
Repository: spark Updated Branches: refs/heads/master e1960c3d6 -> 75d666b95 [SPARK-22136][SS] Evaluate one-sided conditions early in stream-stream joins. ## What changes were proposed in this pull request? Evaluate one-sided conditions early in stream-stream joins. This is in addition to

spark git commit: [SPARK-22238] Fix plan resolution bug caused by EnsureStatefulOpPartitioning

2017-10-14 Thread tdas
Repository: spark Updated Branches: refs/heads/master 014dc8471 -> e8547ffb4 [SPARK-22238] Fix plan resolution bug caused by EnsureStatefulOpPartitioning ## What changes were proposed in this pull request? In EnsureStatefulOpPartitioning, we check that the inputRDD to a SparkPlan has the

spark git commit: [SPARK-22187][SS] Update unsaferow format for saved state such that we can set timeouts when state is null

2017-10-04 Thread tdas
ll, and avoid these confusing corner cases. ## How was this patch tested? Refactored tests. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #19416 from tdas/SPARK-22187. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/sp

[1/2] spark git commit: [SPARK-22136][SS] Implement stream-stream outer joins.

2017-10-03 Thread tdas
Repository: spark Updated Branches: refs/heads/master 5f6943345 -> 3099c574c http://git-wip-us.apache.org/repos/asf/spark/blob/3099c574/sql/core/src/test/scala/org/apache/spark/sql/streaming/StreamingJoinSuite.scala -- diff

[2/2] spark git commit: [SPARK-22136][SS] Implement stream-stream outer joins.

2017-10-03 Thread tdas
[SPARK-22136][SS] Implement stream-stream outer joins. ## What changes were proposed in this pull request? Allow one-sided outer joins between two streams when a watermark is defined. ## How was this patch tested? new unit tests Author: Jose Torres Closes #19327 from

[2/2] spark git commit: [SPARK-22053][SS] Stream-stream inner join in Append Mode

2017-09-21 Thread tdas
evented stream-stream join on an empty batch dataframe to be collapsed by the optimizer ## How was this patch tested? - New tests in StreamingJoinSuite - Updated tests UnsupportedOperationSuite Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #19271 from tdas/SPARK-22053. Proje

[1/2] spark git commit: [SPARK-22053][SS] Stream-stream inner join in Append Mode

2017-09-21 Thread tdas
Repository: spark Updated Branches: refs/heads/master a8a5cd24e -> f32a84250 http://git-wip-us.apache.org/repos/asf/spark/blob/f32a8425/sql/core/src/test/scala/org/apache/spark/sql/streaming/StateStoreMetricsTest.scala -- diff

spark git commit: [SPARK-22017] Take minimum of all watermark execs in StreamExecution.

2017-09-15 Thread tdas
Repository: spark Updated Branches: refs/heads/master c7307acda -> 0bad10d3e [SPARK-22017] Take minimum of all watermark execs in StreamExecution. ## What changes were proposed in this pull request? Take the minimum of all watermark exec nodes as the "real" watermark in StreamExecution,

spark git commit: [SPARK-22018][SQL] Preserve top-level alias metadata when collapsing projects

2017-09-14 Thread tdas
ing the metadata of top-level aliases. ## How was this patch tested? New unit test Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #19240 from tdas/SPARK-22018. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8866

spark git commit: [SPARK-21765] Check that optimization doesn't affect isStreaming bit.

2017-09-06 Thread tdas
Repository: spark Updated Branches: refs/heads/master 36b48ee6e -> acdf45fb5 [SPARK-21765] Check that optimization doesn't affect isStreaming bit. ## What changes were proposed in this pull request? Add an assert in logical plan optimization that the isStreaming bit stays the same, and fix

spark git commit: [SPARK-21925] Update trigger interval documentation in docs with behavior change in Spark 2.2

2017-09-05 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.2 fb1b5f08a -> 1f7c4869b [SPARK-21925] Update trigger interval documentation in docs with behavior change in Spark 2.2 Forgot to update docs with behavior change. Author: Burak Yavuz Closes #19138 from

spark git commit: [SPARK-21925] Update trigger interval documentation in docs with behavior change in Spark 2.2

2017-09-05 Thread tdas
Repository: spark Updated Branches: refs/heads/master 2974406d1 -> 8c954d2cd [SPARK-21925] Update trigger interval documentation in docs with behavior change in Spark 2.2 Forgot to update docs with behavior change. Author: Burak Yavuz Closes #19138 from

  1   2   3   4   5   6   7   8   9   >