spark git commit: [SPARK-15593][SQL] Add DataFrameWriter.foreach to allow the user to consume data in ContinuousQuery

2016-06-10 Thread tdas
Repository: spark Updated Branches: refs/heads/master 5a3533e77 -> 00c310133 [SPARK-15593][SQL] Add DataFrameWriter.foreach to allow the user to consume data in ContinuousQuery ## What changes were proposed in this pull request? * Add DataFrameWriter.foreach to allow the user to consume data
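For context, the consumer side of this API is a user-supplied ForeachWriter. A minimal sketch of that contract in Scala (the writer class and its behavior are illustrative; in later releases the entry point moved from DataFrameWriter to DataStreamWriter):

```scala
import org.apache.spark.sql.ForeachWriter

// Invoked once per partition per trigger: open, then process each record, then close.
class PrintlnWriter extends ForeachWriter[String] {
  override def open(partitionId: Long, version: Long): Boolean = true // accept every partition/epoch
  override def process(value: String): Unit = println(value)          // consume one record
  override def close(errorOrNull: Throwable): Unit = ()               // nothing to release
}
```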

spark git commit: [SPARK-15853][SQL] HDFSMetadataLog.get should close the input stream

2016-06-09 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.0 00bbf7873 -> ca0801120 [SPARK-15853][SQL] HDFSMetadataLog.get should close the input stream ## What changes were proposed in this pull request? This PR closes the input stream created in `HDFSMetadataLog.get` ## How was this patch

spark git commit: [SPARK-15853][SQL] HDFSMetadataLog.get should close the input stream

2016-06-09 Thread tdas
Repository: spark Updated Branches: refs/heads/master b914e1930 -> 4d9d9cc58 [SPARK-15853][SQL] HDFSMetadataLog.get should close the input stream ## What changes were proposed in this pull request? This PR closes the input stream created in `HDFSMetadataLog.get` ## How was this patch
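The shape of the fix, sketched with hypothetical names (the real method deserializes batch metadata; the point is the finally block that guarantees the stream is closed):

```scala
import java.io.InputStream
import org.apache.hadoop.fs.{FileSystem, Path}

// Close the stream even when deserialization throws.
def readLogFile[T](fs: FileSystem, file: Path)(deserialize: InputStream => T): T = {
  val input = fs.open(file)
  try deserialize(input)
  finally input.close()
}
```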

spark git commit: [SPARK-15580][SQL] Add ContinuousQueryInfo to make ContinuousQueryListener events serializable

2016-06-07 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.0 ec556fec0 -> 003c44792 [SPARK-15580][SQL] Add ContinuousQueryInfo to make ContinuousQueryListener events serializable ## What changes were proposed in this pull request? This PR adds ContinuousQueryInfo to make

spark git commit: [SPARK-15580][SQL] Add ContinuousQueryInfo to make ContinuousQueryListener events serializable

2016-06-07 Thread tdas
Repository: spark Updated Branches: refs/heads/master 695dbc816 -> 0cfd6192f [SPARK-15580][SQL] Add ContinuousQueryInfo to make ContinuousQueryListener events serializable ## What changes were proposed in this pull request? This PR adds ContinuousQueryInfo to make ContinuousQueryListener

spark git commit: [SPARK-15458][SQL][STREAMING] Disable schema inference for streaming datasets on file streams

2016-05-24 Thread tdas
Closes #13238 from tdas/SPARK-15458. (cherry picked from commit e631b819fe348729aab062207a452b8f1d1511bd) Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1fb7b3a0 Tree:

spark git commit: [SPARK-15458][SQL][STREAMING] Disable schema inference for streaming datasets on file streams

2016-05-24 Thread tdas
Closes #13238 from tdas/SPARK-15458. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/e631b819 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/e631b819 Diff: http://git-wip-us.apache.org/repos/asf/spark/diff/e631b819 Branch: r
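In practice this means a streaming file source needs a user-specified schema. A hedged sketch against the 2.0-era API (the config key, introduced by this change, opts back into inference):

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.types.{LongType, StructType, TimestampType}

val spark = SparkSession.builder.getOrCreate()

// With inference off, supply the schema explicitly ...
val schema = new StructType().add("id", LongType).add("ts", TimestampType)
val events = spark.readStream.schema(schema).json("/data/incoming")

// ... or re-enable the old behavior, accepting its pitfalls on growing directories:
spark.conf.set("spark.sql.streaming.schemaInference", "true")
```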

spark git commit: [SPARK-15428][SQL] Disable multiple streaming aggregations

2016-05-22 Thread tdas
have the necessary support for "delta" to implement correctly. So disabling the support for multiple streaming aggregations. ## How was this patch tested? Additional unit tests Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #13210 from tdas/SPARK-15428. (cherry

spark git commit: [SPARK-15428][SQL] Disable multiple streaming aggregations

2016-05-22 Thread tdas
necessary support for "delta" to implement correctly. So disabling the support for multiple streaming aggregations. ## How was this patch tested? Additional unit tests Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #13210 from tdas/SPARK-15428. Project: http://git-wip-us.a
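Concretely, with `events` a streaming Dataset, the second aggregation below is the kind of plan the analyzer now rejects (illustrative example, not taken from the patch):

```scala
import org.apache.spark.sql.DataFrame

val events: DataFrame = ???                         // a streaming DataFrame of words
val wordCounts = events.groupBy("word").count()     // single streaming aggregation: supported
val histogram = wordCounts.groupBy("count").count() // aggregation over an aggregation: rejected
```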

spark git commit: [SPARK-15103][SQL] Refactored FileCatalog class to allow StreamFileCatalog to infer partitioning

2016-05-04 Thread tdas
partitioning gets inferred, and on reading whether the partitions get pruned correctly based on the query. - Other unit tests are unchanged and pass as expected. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #12879 from tdas/SPARK-15103. (cherry picked fr

spark git commit: [SPARK-15103][SQL] Refactored FileCatalog class to allow StreamFileCatalog to infer partitioning

2016-05-04 Thread tdas
partitioning gets inferred, and on reading whether the partitions get pruned correctly based on the query. - Other unit tests are unchanged and pass as expected. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #12879 from tdas/SPARK-15103. Project: http://git-wip-us.apache.org/repos/asf/s

spark git commit: [SPARK-14860][TESTS] Create a new Waiter in reset to bypass an issue of ScalaTest's Waiter.wait

2016-05-03 Thread tdas
Repository: spark Updated Branches: refs/heads/master 4ad492c40 -> b545d7521 [SPARK-14860][TESTS] Create a new Waiter in reset to bypass an issue of ScalaTest's Waiter.wait ## What changes were proposed in this pull request? This PR updates `QueryStatusCollector.reset` to create Waiter

spark git commit: [SPARK-14860][TESTS] Create a new Waiter in reset to bypass an issue of ScalaTest's Waiter.wait

2016-05-03 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-2.0 c5b7e1f70 -> 31e5a2a76 [SPARK-14860][TESTS] Create a new Waiter in reset to bypass an issue of ScalaTest's Waiter.wait ## What changes were proposed in this pull request? This PR updates `QueryStatusCollector.reset` to create Waiter
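The fix reads naturally as a sketch (class name from the PR description; the ScalaTest import path varies by version): rather than reusing one Waiter, reset swaps in a fresh instance so internal state left behind by a prior wait cannot leak into the next test.

```scala
import org.scalatest.concurrent.AsyncAssertions // Waiters in newer ScalaTest

class QueryStatusCollector extends AsyncAssertions {
  @volatile private var waiter = new Waiter
  def reset(): Unit = { waiter = new Waiter } // sidestep Waiter.wait's internal-state issue
  def dismiss(): Unit = waiter.dismiss()
  def awaitStatus(): Unit = waiter.await()
}
```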

spark git commit: [SPARK-14716][SQL] Added support for partitioning in FileStreamSink

2016-05-03 Thread tdas
e PR). - Updated FileStressSuite to test number of records read from partitioned output files. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #12409 from tdas/streaming-partitioned-parquet. (cherry picked from commit 4ad492c40358d0104db508db98ce0971114b6817) Signed-off-by: Tathaga
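A hedged sketch of what the feature enables, using the 2.0-preview writer API (`startStream` was later renamed; method availability at this commit is an assumption):

```scala
import org.apache.spark.sql.DataFrame

val df: DataFrame = ???                      // a streaming DataFrame with a "date" column
df.write
  .format("parquet")
  .partitionBy("date")                       // one sub-directory per partition value
  .option("checkpointLocation", "/tmp/ckpt")
  .startStream("/data/out")
```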

spark git commit: [SPARK-14555] Second cut of Python API for Structured Streaming

2016-04-28 Thread tdas
Repository: spark Updated Branches: refs/heads/master d584a2b8a -> 78c8aaf84 [SPARK-14555] Second cut of Python API for Structured Streaming ## What changes were proposed in this pull request? This PR adds Python APIs for: - `ContinuousQueryManager` - `ContinuousQueryException` The

spark git commit: [SPARK-14930][SPARK-13693] Fix race condition in CheckpointWriter.stop()

2016-04-27 Thread tdas
s as a whole, they now run in ~220 seconds vs. ~354 before. /cc zsxwing and tdas for review. Author: Josh Rosen <joshro...@databricks.com> Closes #12712 from JoshRosen/fix-checkpoint-writer-race. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache

spark git commit: [SPARK-14719] WriteAheadLogBasedBlockHandler should ignore BlockManager put errors

2016-04-18 Thread tdas
ISK` storage level, and 2. typically, individual blocks may be small enough relative to the total storage memory such that they're able to evict blocks from previous batches, so `put()` failures here may be rare in practice. This patch fixes the faulty test and fixes the bug. /cc tdas Author: Josh Ro

spark git commit: [SPARK-14382][SQL] QueryProgress should be posted after committedOffsets is updated

2016-04-06 Thread tdas
Repository: spark Updated Branches: refs/heads/master 9c6556c5f -> a4ead6d38 [SPARK-14382][SQL] QueryProgress should be posted after committedOffsets is updated ## What changes were proposed in this pull request? Make sure QueryProgress is posted after committedOffsets is updated. If

spark git commit: [SPARK-14109][SQL] Fix HDFSMetadataLog to fallback from FileContext to FileSystem API

2016-03-25 Thread tdas
log directory is concurrently modified. In addition I have also added more tests to increase the code coverage. ## How was this patch tested? Unit test. Tested on cluster with custom file system. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #11925 from tdas/SPARK-14109. Proj

spark git commit: [SPARK-13809][SQL] State store for streaming aggregations

2016-03-23 Thread tdas
ations are set correctly - [ ] Whether recovery works correctly with distributed storage - [x] Basic performance tests - [x] Docs Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #11645 from tdas/state-store. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: ht
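A paraphrase of the design rather than the actual interface: each aggregation partition gets a versioned key-value store whose updates for a batch commit atomically, so recovery can reload a consistent version.

```scala
// Illustrative trait only; the real StateStore API differs in detail.
trait VersionedKeyValueStore[K, V] {
  def get(key: K): Option[V]
  def put(key: K, value: V): Unit
  def commit(): Long // atomically publish this batch's updates; returns the new version
  def abort(): Unit  // discard uncommitted updates on failure
}
```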

spark git commit: [HOTFIX][SQL] Don't stop ContinuousQuery quietly

2016-03-23 Thread tdas
Repository: spark Updated Branches: refs/heads/master 926a93e54 -> abacf5f25 [HOTFIX][SQL] Don't stop ContinuousQuery quietly ## What changes were proposed in this pull request? Try to fix a flaky hang ## How was this patch tested? Existing Jenkins test Author: Shixiong Zhu

spark git commit: [SPARK-13149][SQL] Add FileStreamSource

2016-02-09 Thread tdas
Repository: spark Updated Branches: refs/heads/master 6f710f9fd -> b385ce388 [SPARK-13149][SQL] Add FileStreamSource `FileStreamSource` is an implementation of `org.apache.spark.sql.execution.streaming.Source`. It takes advantage of the existing `HadoopFsRelationProvider` to support various

spark git commit: [SPARK-7799][STREAMING][DOCUMENT] Add the linking and deploying instructions for streaming-akka project

2016-01-26 Thread tdas
Repository: spark Updated Branches: refs/heads/master 08c781ca6 -> cbd507d69 [SPARK-7799][STREAMING][DOCUMENT] Add the linking and deploying instructions for streaming-akka project Since `actorStream` is an external project, we should add the linking and deploying instructions for it. A

spark git commit: [SPARK-12847][CORE][STREAMING] Remove StreamingListenerBus and post all Streaming events to the same thread as Spark events

2016-01-20 Thread tdas
Repository: spark Updated Branches: refs/heads/master e3727c409 -> 944fdadf7 [SPARK-12847][CORE][STREAMING] Remove StreamingListenerBus and post all Streaming events to the same thread as Spark events Including the following changes: 1. Add StreamingListenerForwardingBus to

spark git commit: [SPARK-7799][SPARK-12786][STREAMING] Add "streaming-akka" project

2016-01-20 Thread tdas
Repository: spark Updated Branches: refs/heads/master 944fdadf7 -> b7d74a602 [SPARK-7799][SPARK-12786][STREAMING] Add "streaming-akka" project Include the following changes: 1. Add "streaming-akka" project and org.apache.spark.streaming.akka.AkkaUtils for creating an actorStream 2. Remove
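A sketch of the relocated API as best reconstructed from the commit summary (receiver class, stream type, and an existing StreamingContext `ssc` are assumptions):

```scala
import akka.actor.Props
import org.apache.spark.streaming.akka.{ActorReceiver, AkkaUtils}

class EchoReceiver extends ActorReceiver {
  def receive = { case msg: String => store(msg) } // hand received data to Spark Streaming
}

// Replaces the old StreamingContext.actorStream:
val lines = AkkaUtils.createStream[String](ssc, Props[EchoReceiver], "echo-receiver")
```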

spark git commit: [SPARK-12814][DOCUMENT] Add deploy instructions for Python in flume integration doc

2016-01-18 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 8c2b67f55 -> 7482c7b5a [SPARK-12814][DOCUMENT] Add deploy instructions for Python in flume integration doc This PR added instructions to get flume assembly jar for Python users in the flume integration page like Kafka doc. Author:

spark git commit: [SPARK-12814][DOCUMENT] Add deploy instructions for Python in flume integration doc

2016-01-18 Thread tdas
Repository: spark Updated Branches: refs/heads/master 404190221 -> a973f483f [SPARK-12814][DOCUMENT] Add deploy instructions for Python in flume integration doc This PR added instructions to get flume assembly jar for Python users in the flume integration page like Kafka doc. Author:

spark git commit: [SPARK-12894][DOCUMENT] Add deploy instructions for Python in Kinesis integration doc

2016-01-18 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 7482c7b5a -> d43704d7f [SPARK-12894][DOCUMENT] Add deploy instructions for Python in Kinesis integration doc This PR added instructions to get Kinesis assembly jar for Python users in the Kinesis integration page like Kafka doc.

spark git commit: [SPARK-12894][DOCUMENT] Add deploy instructions for Python in Kinesis integration doc

2016-01-18 Thread tdas
Repository: spark Updated Branches: refs/heads/master 4bcea1b85 -> 721845c1b [SPARK-12894][DOCUMENT] Add deploy instructions for Python in Kinesis integration doc This PR added instructions to get Kinesis assembly jar for Python users in the Kinesis integration page like Kafka doc. Author:

spark git commit: [SPARK-12591][STREAMING] Register OpenHashMapBasedStateMap for Kryo (branch 1.6)

2016-01-08 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 a7c36362f -> 0d96c5453 [SPARK-12591][STREAMING] Register OpenHashMapBasedStateMap for Kryo (branch 1.6) backport #10609 to branch 1.6 Author: Shixiong Zhu Closes #10656 from zsxwing/SPARK-12591-branch-1.6.

spark git commit: [SPARK-12507][STREAMING][DOCUMENT] Expose closeFileAfterWrite and allowBatching configurations for Streaming

2016-01-07 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 6ef823544 -> a7c36362f [SPARK-12507][STREAMING][DOCUMENT] Expose closeFileAfterWrite and allowBatching configurations for Streaming /cc tdas brkyvz Author: Shixiong Zhu <shixi...@databricks.com> Closes #10453 from zsxwing/

spark git commit: [SPARK-12591][STREAMING] Register OpenHashMapBasedStateMap for Kryo

2016-01-07 Thread tdas
Repository: spark Updated Branches: refs/heads/master c94199e97 -> 28e0e500a [SPARK-12591][STREAMING] Register OpenHashMapBasedStateMap for Kryo The default serializer in Kryo is FieldSerializer and it ignores transient fields and never calls `writeObject` or `readObject`. So we should
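One way to get those hooks honored (an illustration of the technique, not necessarily the exact fix in this PR) is to register the class with Kryo's JavaSerializer through a KryoRegistrator:

```scala
import com.esotericsoftware.kryo.Kryo
import com.esotericsoftware.kryo.serializers.JavaSerializer
import org.apache.spark.serializer.KryoRegistrator

class StateMap extends Serializable // stand-in for OpenHashMapBasedStateMap

class StateMapRegistrator extends KryoRegistrator {
  // JavaSerializer delegates to Java serialization, which does call
  // writeObject/readObject and respects transient fields.
  override def registerClasses(kryo: Kryo): Unit =
    kryo.register(classOf[StateMap], new JavaSerializer())
}
```

The registrator would then be wired up via `spark.kryo.registrator` on the SparkConf.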

spark git commit: [SPARK-12507][STREAMING][DOCUMENT] Expose closeFileAfterWrite and allowBatching configurations for Streaming

2016-01-07 Thread tdas
Repository: spark Updated Branches: refs/heads/master 5a4021998 -> c94199e97 [SPARK-12507][STREAMING][DOCUMENT] Expose closeFileAfterWrite and allowBatching configurations for Streaming /cc tdas brkyvz Author: Shixiong Zhu <shixi...@databricks.com> Closes #10453 from zsxwing/strea
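The three keys in question, as documented for Spark 1.6 (set on the SparkConf before the StreamingContext is created):

```scala
import org.apache.spark.SparkConf

val conf = new SparkConf()
  .set("spark.streaming.driver.writeAheadLog.closeFileAfterWrite", "true")   // e.g. for S3-backed WALs
  .set("spark.streaming.receiver.writeAheadLog.closeFileAfterWrite", "true")
  .set("spark.streaming.driver.writeAheadLog.allowBatching", "true")
```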

spark git commit: [SPARK-12429][STREAMING][DOC] Add Accumulator and Broadcast example for Streaming

2015-12-22 Thread tdas
Repository: spark Updated Branches: refs/heads/master 93db50d1c -> 20591afd7 [SPARK-12429][STREAMING][DOC] Add Accumulator and Broadcast example for Streaming This PR adds Scala, Java and Python examples to show how to use Accumulator and Broadcast in Spark Streaming to support

spark git commit: [SPARK-12429][STREAMING][DOC] Add Accumulator and Broadcast example for Streaming

2015-12-22 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 94fb5e870 -> 942c0577b [SPARK-12429][STREAMING][DOC] Add Accumulator and Broadcast example for Streaming This PR adds Scala, Java and Python examples to show how to use Accumulator and Broadcast in Spark Streaming to support
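The heart of those examples is a lazily instantiated singleton, so the accumulator is re-created rather than restored when the StreamingContext recovers from a checkpoint. A sketch with illustrative names:

```scala
import org.apache.spark.{Accumulator, SparkContext}

object DroppedRecordsCounter {
  @volatile private var instance: Accumulator[Long] = _

  // Double-checked locking: created on first use, survives checkpoint recovery.
  def getInstance(sc: SparkContext): Accumulator[Long] = {
    if (instance == null) synchronized {
      if (instance == null) instance = sc.accumulator(0L, "DroppedRecords")
    }
    instance
  }
}
```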

spark git commit: [SPARK-12487][STREAMING][DOCUMENT] Add docs for Kafka message handler

2015-12-22 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 0f905d7df -> 94fb5e870 [SPARK-12487][STREAMING][DOCUMENT] Add docs for Kafka message handler Author: Shixiong Zhu Closes #10439 from zsxwing/kafka-message-handler-doc. (cherry picked from commit

spark git commit: [SPARK-12487][STREAMING][DOCUMENT] Add docs for Kafka message handler

2015-12-22 Thread tdas
Repository: spark Updated Branches: refs/heads/master b374a2583 -> 93db50d1c [SPARK-12487][STREAMING][DOCUMENT] Add docs for Kafka message handler Author: Shixiong Zhu Closes #10439 from zsxwing/kafka-message-handler-doc. Project:

spark git commit: [SPARK-11932][STREAMING] Partition previous TrackStateRDD if partitioner not present

2015-12-07 Thread tdas
RDD checkpoints, there may be a non-zero chance that the saving and recovery fails. To be resilient, this PR repartitions the previous state RDD if the partitioner is not detected. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #9988 from tdas/SPARK-11932. Project: http://git-wip-us.a

spark git commit: [SPARK-11932][STREAMING] Partition previous TrackStateRDD if partitioner not present

2015-12-07 Thread tdas
RDD checkpoints, there may be a non-zero chance that the saving and recovery fails. To be resilient, this PR repartitions the previous state RDD if the partitioner is not detected. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #9988 from tdas/SPARK-11932. (cherry picked fr

spark git commit: [SPARK-12122][STREAMING] Prevent batches from being submitted twice after recovering StreamingContext from checkpoint

2015-12-04 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 2d7c4f6af -> 8f784b864 [SPARK-12122][STREAMING] Prevent batches from being submitted twice after recovering StreamingContext from checkpoint Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #10127 from tdas/SP

spark git commit: [SPARK-12122][STREAMING] Prevent batches from being submitted twice after recovering StreamingContext from checkpoint

2015-12-04 Thread tdas
Repository: spark Updated Branches: refs/heads/master 5011f264f -> 4106d80fb [SPARK-12122][STREAMING] Prevent batches from being submitted twice after recovering StreamingContext from checkpoint Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #10127 from tdas/SP

spark git commit: [SPARK-12058][STREAMING][KINESIS][TESTS] fix Kinesis python tests

2015-12-04 Thread tdas
KPL. cc zsxwing tdas Author: Burak Yavuz <brk...@gmail.com> Closes #10050 from brkyvz/kinesis-py. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/39d5cc8a Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/39d5

spark git commit: [SPARK-12058][STREAMING][KINESIS][TESTS] fix Kinesis python tests

2015-12-04 Thread tdas
cc zsxwing tdas Author: Burak Yavuz <brk...@gmail.com> Closes #10050 from brkyvz/kinesis-py. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/302d68de Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/302d68de D

spark git commit: [FLAKY-TEST-FIX][STREAMING][TEST] Make sure StreamingContexts are shutdown after test

2015-12-03 Thread tdas
Repository: spark Updated Branches: refs/heads/master ad7cea6f7 -> a02d47277 [FLAKY-TEST-FIX][STREAMING][TEST] Make sure StreamingContexts are shutdown after test Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #10124 from tdas/InputStreamSuite-flaky-test. Project: h

spark git commit: [FLAKY-TEST-FIX][STREAMING][TEST] Make sure StreamingContexts are shutdown after test

2015-12-03 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 9d698fc57 -> b1a27d616 [FLAKY-TEST-FIX][STREAMING][TEST] Make sure StreamingContexts are shutdown after test Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #10124 from tdas/InputStreamSuite-flaky-test. (cher

spark git commit: [SPARK-11935][PYSPARK] Send the Python exceptions in TransformFunction and TransformFunctionSerializer to Java

2015-11-25 Thread tdas
Repository: spark Updated Branches: refs/heads/master 88875d941 -> d29e2ef4c [SPARK-11935][PYSPARK] Send the Python exceptions in TransformFunction and TransformFunctionSerializer to Java The Python exception traceback in TransformFunction and TransformFunctionSerializer is not sent back to

spark git commit: [SPARK-11935][PYSPARK] Send the Python exceptions in TransformFunction and TransformFunctionSerializer to Java

2015-11-25 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 b4cf318ab -> 849ddb6ae [SPARK-11935][PYSPARK] Send the Python exceptions in TransformFunction and TransformFunctionSerializer to Java The Python exception traceback in TransformFunction and TransformFunctionSerializer is not sent back to

spark git commit: [SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer

2015-11-20 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 9c8e17984 -> 0c23dd52d [SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer TransformFunction and TransformFunctionSerializer don't rethrow the exception, so when any exception

spark git commit: [SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer

2015-11-20 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.4 5118abb4e -> 94789f374 [SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer TransformFunction and TransformFunctionSerializer don't rethrow the exception, so when any exception

spark git commit: [SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer

2015-11-20 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.5 9a906c1c3 -> e9ae1fda9 [SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer TransformFunction and TransformFunctionSerializer don't rethrow the exception, so when any exception

spark git commit: [SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer

2015-11-20 Thread tdas
Repository: spark Updated Branches: refs/heads/master 9ed4ad426 -> be7a2cfd9 [SPARK-11870][STREAMING][PYSPARK] Rethrow the exceptions in TransformFunction and TransformFunctionSerializer TransformFunction and TransformFunctionSerializer don't rethrow the exception, so when any exception

spark git commit: [SPARK-11812][PYSPARK] invFunc=None works properly with python's reduceByKeyAndWindow

2015-11-19 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.3 5278ef0f1 -> 387d81891 [SPARK-11812][PYSPARK] invFunc=None works properly with python's reduceByKeyAndWindow invFunc is optional and can be None. Instead of invFunc (the parameter) invReduceFunc (a local function) was checked for

spark git commit: [SPARK-11812][PYSPARK] invFunc=None works properly with python's reduceByKeyAndWindow

2015-11-19 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.5 9957925e4 -> 001c44667 [SPARK-11812][PYSPARK] invFunc=None works properly with python's reduceByKeyAndWindow invFunc is optional and can be None. Instead of invFunc (the parameter) invReduceFunc (a local function) was checked for

spark git commit: [SPARK-11812][PYSPARK] invFunc=None works properly with python's reduceByKeyAndWindow

2015-11-19 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 fdffc400c -> abe393024 [SPARK-11812][PYSPARK] invFunc=None works properly with python's reduceByKeyAndWindow invFunc is optional and can be None. Instead of invFunc (the parameter) invReduceFunc (a local function) was checked for

spark git commit: [SPARK-11812][PYSPARK] invFunc=None works properly with python's reduceByKeyAndWindow

2015-11-19 Thread tdas
Repository: spark Updated Branches: refs/heads/master 470007453 -> 599a8c6e2 [SPARK-11812][PYSPARK] invFunc=None works properly with python's reduceByKeyAndWindow invFunc is optional and can be None. Instead of invFunc (the parameter) invReduceFunc (a local function) was checked for

spark git commit: [SPARK-11812][PYSPARK] invFunc=None works properly with python's reduceByKeyAndWindow

2015-11-19 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.4 eda1ff4ee -> 5118abb4e [SPARK-11812][PYSPARK] invFunc=None works properly with python's reduceByKeyAndWindow invFunc is optional and can be None. Instead of invFunc (the parameter) invReduceFunc (a local function) was checked for
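For reference, the two call shapes in their Scala form: with an inverse function the window is maintained incrementally (and checkpointing is required); without one the whole window is recomputed — the latter being what Python's invFunc=None now correctly selects.

```scala
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.dstream.DStream

val pairs: DStream[(String, Int)] = ??? // illustrative keyed stream

val incremental = pairs.reduceByKeyAndWindow(_ + _, _ - _, Seconds(30), Seconds(10))
val recomputed  = pairs.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))
```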

spark git commit: [SPARK-4557][STREAMING] Spark Streaming foreachRDD Java API method should accept a VoidFunction<...>

2015-11-18 Thread tdas
Repository: spark Updated Branches: refs/heads/master 94624eacb -> 31921e0f0 [SPARK-4557][STREAMING] Spark Streaming foreachRDD Java API method should accept a VoidFunction<...> Currently streaming foreachRDD Java API uses a function prototype requiring a return value of null. This PR

spark git commit: [SPARK-4557][STREAMING] Spark Streaming foreachRDD Java API method should accept a VoidFunction<...>

2015-11-18 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 899106cc6 -> c130b8626 [SPARK-4557][STREAMING] Spark Streaming foreachRDD Java API method should accept a VoidFunction<...> Currently streaming foreachRDD Java API uses a function prototype requiring a return value of null. This PR

spark git commit: [SPARK-11791] Fix flaky test in BatchedWriteAheadLogSuite

2015-11-18 Thread tdas
Another solution would be to implement a custom mockito matcher that sorts and then compares the results, but that kind of sounds like overkill to me. Let me know what you think tdas zsxwing Author: Burak Yavuz <brk...@gmail.com> Closes #9790 from brkyvz/fix-flaky-2. Project: http://git-wip-us.a

spark git commit: [SPARK-11791] Fix flaky test in BatchedWriteAheadLogSuite

2015-11-18 Thread tdas
Another solution would be to implement a custom mockito matcher that sorts and then compares the results, but that kind of sounds like overkill to me. Let me know what you think tdas zsxwing Author: Burak Yavuz <brk...@gmail.com> Closes #9790 from brkyvz/fix-flaky-2. (cherry pic

spark git commit: [SPARK-11814][STREAMING] Add better default checkpoint duration

2015-11-18 Thread tdas
interval = batch interval, and RDDs get checkpointed every batch. This PR is to set the checkpoint interval of trackStateByKey to 10 * batch duration. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #9805 from tdas/SPARK-11814. (cherry picked fr

spark git commit: [SPARK-11814][STREAMING] Add better default checkpoint duration

2015-11-18 Thread tdas
interval = batch interval, and RDDs get checkpointed every batch. This PR is to set the checkpoint interval of trackStateByKey to 10 * batch duration. Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #9805 from tdas/SPARK-11814. Project: http://git-wip-us.apache.org/repos/asf/spark/re
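The arithmetic made explicit: with a 10-second batch, state RDDs now checkpoint roughly every 100 seconds instead of every 10. The same effect can be expressed by hand (a sketch, not code from the patch):

```scala
import org.apache.spark.streaming.Seconds
import org.apache.spark.streaming.dstream.DStream

def applyDefaultCheckpointInterval[T](stateStream: DStream[T], batchSeconds: Long): DStream[T] =
  stateStream.checkpoint(Seconds(batchSeconds * 10)) // 10 × the batch interval
```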

spark git commit: [SPARK-11740][STREAMING] Fix the race condition of two checkpoints in a batch

2015-11-17 Thread tdas
Repository: spark Updated Branches: refs/heads/master 936bc0bcb -> 928d63162 [SPARK-11740][STREAMING] Fix the race condition of two checkpoints in a batch We will do checkpoint when generating a batch and completing a batch. When the processing time of a batch is greater than the batch

spark git commit: [SPARK-11740][STREAMING] Fix the race condition of two checkpoints in a batch

2015-11-17 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 1a5dfb706 -> fa9d56f9e [SPARK-11740][STREAMING] Fix the race condition of two checkpoints in a batch We will do checkpoint when generating a batch and completing a batch. When the processing time of a batch is greater than the batch

spark git commit: [SPARK-11740][STREAMING] Fix the race condition of two checkpoints in a batch

2015-11-17 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.5 bdcbbdac6 -> e26dc9642 [SPARK-11740][STREAMING] Fix the race condition of two checkpoints in a batch We will do checkpoint when generating a batch and completing a batch. When the processing time of a batch is greater than the batch

spark git commit: [HOTFIX][STREAMING] Add mockito to fix the compilation error

2015-11-17 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.5 e26dc9642 -> f33e277f9 [HOTFIX][STREAMING] Add mockito to fix the compilation error Added mockito to the test scope to fix the compilation error in branch 1.5 Author: Shixiong Zhu Closes #9782 from

spark git commit: [SPARK-9065][STREAMING][PYSPARK] Add MessageHandler for Kafka Python API

2015-11-17 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 3133d8bd1 -> a7fcc3117 [SPARK-9065][STREAMING][PYSPARK] Add MessageHandler for Kafka Python API Fixed the merge conflicts in #7410 Closes #7410 Author: Shixiong Zhu Author: jerryshao

spark git commit: [SPARK-9065][STREAMING][PYSPARK] Add MessageHandler for Kafka Python API

2015-11-17 Thread tdas
Repository: spark Updated Branches: refs/heads/master b362d50fc -> 75a292291 [SPARK-9065][STREAMING][PYSPARK] Add MessageHandler for Kafka Python API Fixed the merge conflicts in #7410 Closes #7410 Author: Shixiong Zhu Author: jerryshao

spark git commit: [SPARK-11761] Prevent the call to StreamingContext#stop() in the listener bus's thread

2015-11-17 Thread tdas
Repository: spark Updated Branches: refs/heads/master 8fb775ba8 -> 446738e51 [SPARK-11761] Prevent the call to StreamingContext#stop() in the listener bus's thread See discussion toward the tail of https://github.com/apache/spark/pull/9723 From zsxwing: ``` The user should not call stop or

spark git commit: [SPARK-11761] Prevent the call to StreamingContext#stop() in the listener bus's thread

2015-11-17 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 c13f72316 -> 737f07172 [SPARK-11761] Prevent the call to StreamingContext#stop() in the listener bus's thread See discussion toward the tail of https://github.com/apache/spark/pull/9723 From zsxwing: ``` The user should not call

spark git commit: [SPARK-11731][STREAMING] Enable batching on Driver WriteAheadLog by default

2015-11-16 Thread tdas
Repository: spark Updated Branches: refs/heads/master b0c3fd34e -> de5e531d3 [SPARK-11731][STREAMING] Enable batching on Driver WriteAheadLog by default Using batching on the driver for the WriteAheadLog should be an improvement for all environments and use cases. Users will be able to scale

spark git commit: [SPARK-11731][STREAMING] Enable batching on Driver WriteAheadLog by default

2015-11-16 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 f14fb291d -> 38673d7e6 [SPARK-11731][STREAMING] Enable batching on Driver WriteAheadLog by default Using batching on the driver for the WriteAheadLog should be an improvement for all environments and use cases. Users will be able to

spark git commit: [SPARK-6328][PYTHON] Python API for StreamingListener

2015-11-16 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 38673d7e6 -> c83177d30 [SPARK-6328][PYTHON] Python API for StreamingListener Author: Daniel Jalova Closes #9186 from djalova/SPARK-6328. (cherry picked from commit ace0db47141ffd457c2091751038fc291f6d5a8b)

spark git commit: [SPARK-6328][PYTHON] Python API for StreamingListener

2015-11-16 Thread tdas
Repository: spark Updated Branches: refs/heads/master de5e531d3 -> ace0db471 [SPARK-6328][PYTHON] Python API for StreamingListener Author: Daniel Jalova Closes #9186 from djalova/SPARK-6328. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit:

spark git commit: [SPARK-11742][STREAMING] Add the failure info to the batch lists

2015-11-16 Thread tdas
Repository: spark Updated Branches: refs/heads/master 3c025087b -> bcea0bfda [SPARK-11742][STREAMING] Add the failure info to the batch lists (screenshot: https://cloud.githubusercontent.com/assets/1000778/11162322/9b88e204-8a51-11e5-8c57-a44889cab713.png) Author: Shixiong Zhu

spark git commit: [SPARK-11742][STREAMING] Add the failure info to the batch lists

2015-11-16 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 64439f7d6 -> 3bd72eafc [SPARK-11742][STREAMING] Add the failure info to the batch lists (screenshot: https://cloud.githubusercontent.com/assets/1000778/11162322/9b88e204-8a51-11e5-8c57-a44889cab713.png) Author: Shixiong Zhu

spark git commit: [SPARK-11706][STREAMING] Fix the bug that Streaming Python tests cannot report failures

2015-11-13 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 aff44f9a8 -> c3da2bd46 [SPARK-11706][STREAMING] Fix the bug that Streaming Python tests cannot report failures This PR just checks the test results and returns 1 if the test fails, so that `run-tests.py` can mark it fail. Author:

spark git commit: [SPARK-11706][STREAMING] Fix the bug that Streaming Python tests cannot report failures

2015-11-13 Thread tdas
Repository: spark Updated Branches: refs/heads/master ad960885b -> ec80c0c2f [SPARK-11706][STREAMING] Fix the bug that Streaming Python tests cannot report failures This PR just checks the test results and returns 1 if the test fails, so that `run-tests.py` can mark it fail. Author:

spark git commit: [SPARK-11663][STREAMING] Add Java API for trackStateByKey

2015-11-12 Thread tdas
Repository: spark Updated Branches: refs/heads/master 41bbd2300 -> 0f1d00a90 [SPARK-11663][STREAMING] Add Java API for trackStateByKey TODO - [x] Add Java API - [x] Add API tests - [x] Add a function test Author: Shixiong Zhu Closes #9636 from zsxwing/java-track.

spark git commit: [SPARK-11663][STREAMING] Add Java API for trackStateByKey

2015-11-12 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 6c1bf19e8 -> 05666e09b [SPARK-11663][STREAMING] Add Java API for trackStateByKey TODO - [x] Add Java API - [x] Add API tests - [x] Add a function test Author: Shixiong Zhu Closes #9636 from

spark git commit: [SPARK-11290][STREAMING][TEST-MAVEN] Fix the test for maven build

2015-11-12 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 f5c66d163 -> 340ca9e76 [SPARK-11290][STREAMING][TEST-MAVEN] Fix the test for maven build Should not create SparkContext in the constructor of `TrackStateRDDSuite`. This is a follow up PR for #9256 to fix the test for maven build.

spark git commit: [SPARK-11290][STREAMING][TEST-MAVEN] Fix the test for maven build

2015-11-12 Thread tdas
Repository: spark Updated Branches: refs/heads/master 767d288b6 -> f0d3b58d9 [SPARK-11290][STREAMING][TEST-MAVEN] Fix the test for maven build Should not create SparkContext in the constructor of `TrackStateRDDSuite`. This is a follow up PR for #9256 to fix the test for maven build. Author:

spark git commit: [SPARK-11419][STREAMING] Parallel recovery for FileBasedWriteAheadLog + minor recovery tweaks

2015-11-12 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 05666e09b -> 199e4cb21 [SPARK-11419][STREAMING] Parallel recovery for FileBasedWriteAheadLog + minor recovery tweaks The support for closing WriteAheadLog files after writes was just merged in. Closing every file after a write is a

spark git commit: [SPARK-11419][STREAMING] Parallel recovery for FileBasedWriteAheadLog + minor recovery tweaks

2015-11-12 Thread tdas
Repository: spark Updated Branches: refs/heads/master 0f1d00a90 -> 7786f9cc0 [SPARK-11419][STREAMING] Parallel recovery for FileBasedWriteAheadLog + minor recovery tweaks The support for closing WriteAheadLog files after writes was just merged in. Closing every file after a write is a very

spark git commit: [SPARK-11681][STREAMING] Correctly update state timestamp even when state is not updated

2015-11-12 Thread tdas
Timeout is defined as "no data for a while", not "no state update for a while". Fix: update the timestamp when a timeout is specified; otherwise there is no need. Also refactored the code for better testability and added unit tests. Author: Tathagata Das <tathagata.das1...@gmail.co

spark git commit: [SPARK-11681][STREAMING] Correctly update state timestamp even when state is not updated

2015-11-12 Thread tdas
il.com> Closes #9648 from tdas/SPARK-11681. (cherry picked from commit e4e46b20f6475f8e148d5326f7c88c57850d46a1) Signed-off-by: Tathagata Das <tathagata.das1...@gmail.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/c

spark git commit: [SPARK-11639][STREAMING][FLAKY-TEST] Implement BlockingWriteAheadLog for testing the BatchedWriteAheadLog

2015-11-11 Thread tdas
similar problem, but missed it here :( Submitting the fix using a waiter. cc tdas Author: Burak Yavuz <brk...@gmail.com> Closes #9605 from brkyvz/fix-flaky-test. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/27029bc8 T

spark git commit: [SPARK-11335][STREAMING] update kafka direct python docs on how to get the offset ranges for a KafkaRDD

2015-11-11 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 d6d31815f -> f7c6c95f9 [SPARK-11335][STREAMING] update kafka direct python docs on how to get the offset ranges for a KafkaRDD tdas koeninger This updates the Spark Streaming + Kafka Integration Guide doc with a working met

spark git commit: [SPARK-11335][STREAMING] update kafka direct python docs on how to get the offset ranges for a KafkaRDD

2015-11-11 Thread tdas
Repository: spark Updated Branches: refs/heads/master a9a6b80c7 -> dd77e278b [SPARK-11335][STREAMING] update kafka direct python docs on how to get the offset ranges for a KafkaRDD tdas koeninger This updates the Spark Streaming + Kafka Integration Guide doc with a working method to acc
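The documented pattern, shown here in its Scala form (the Python doc mirrors it; `directStream` stands for the stream returned by KafkaUtils.createDirectStream): cast to HasOffsetRanges first thing in foreachRDD, on the driver, before any transformation breaks the RDD-to-Kafka-partition mapping.

```scala
import org.apache.spark.streaming.kafka.{HasOffsetRanges, OffsetRange}

directStream.foreachRDD { rdd =>
  val ranges: Array[OffsetRange] = rdd.asInstanceOf[HasOffsetRanges].offsetRanges
  ranges.foreach { r =>
    println(s"topic=${r.topic} partition=${r.partition} range=[${r.fromOffset}, ${r.untilOffset})")
  }
}
```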

[2/2] spark git commit: [SPARK-11290][STREAMING] Basic implementation of trackStateByKey

2015-11-10 Thread tdas
- [x] state creating, updating, removing - [ ] emitting - [ ] checkpointing - [x] Misc unit tests for State, TrackStateSpec, etc. - [x] Update docs and experimental tags Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #9256 from tdas/trackStateByKey. Project

[1/2] spark git commit: [SPARK-11290][STREAMING] Basic implementation of trackStateByKey

2015-11-10 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 85bc72908 -> daa74be6f http://git-wip-us.apache.org/repos/asf/spark/blob/daa74be6/streaming/src/test/scala/org/apache/spark/streaming/rdd/TrackStateRDDSuite.scala --

[1/2] spark git commit: [SPARK-11290][STREAMING] Basic implementation of trackStateByKey

2015-11-10 Thread tdas
Repository: spark Updated Branches: refs/heads/master bd70244b3 -> 99f5f9886 http://git-wip-us.apache.org/repos/asf/spark/blob/99f5f988/streaming/src/test/scala/org/apache/spark/streaming/rdd/TrackStateRDDSuite.scala -- diff

[2/2] spark git commit: [SPARK-11290][STREAMING] Basic implementation of trackStateByKey

2015-11-10 Thread tdas
- [x] state creating, updating, removing - [ ] emitting - [ ] checkpointing - [x] Misc unit tests for State, TrackStateSpec, etc. - [x] Update docs and experimental tags Author: Tathagata Das <tathagata.das1...@gmail.com> Closes #9256 from tdas/trackStateByKey. (cherry
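A rough sketch of the new API as named in this PR (it shipped in 1.6 renamed to mapWithState, so treat the method name as historical; `keyedStream` is an assumed DStream[(String, Int)]): the tracking function receives each key's new value plus a mutable State handle.

```scala
import org.apache.spark.streaming.{State, StateSpec}

val trackingFunc = (word: String, count: Option[Int], state: State[Int]) => {
  val sum = count.getOrElse(0) + state.getOption.getOrElse(0)
  state.update(sum) // persisted across batches, checkpointed periodically
  (word, sum)       // emitted downstream
}

val tracked = keyedStream.trackStateByKey(StateSpec.function(trackingFunc))
```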

spark git commit: [SPARK-11462][STREAMING] Add JavaStreamingListener

2015-11-09 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 dccc4645d -> ab7da0eae [SPARK-11462][STREAMING] Add JavaStreamingListener Currently, StreamingListener is not Java friendly because it exposes some Scala collections to Java users directly, such as Option, Map. This PR added a Java

spark git commit: [SPARK-11333][STREAMING] Add executorId to ReceiverInfo and display it in UI

2015-11-09 Thread tdas
Repository: spark Updated Branches: refs/heads/master 1f0f14efe -> 6502944f3 [SPARK-11333][STREAMING] Add executorId to ReceiverInfo and display it in UI Expose executorId to `ReceiverInfo` and UI since it's helpful when there are multiple executors running in the same host. Screenshot:

spark git commit: [SPARK-11333][STREAMING] Add executorId to ReceiverInfo and display it in UI

2015-11-09 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 ab7da0eae -> d33f18c42 [SPARK-11333][STREAMING] Add executorId to ReceiverInfo and display it in UI Expose executorId to `ReceiverInfo` and UI since it's helpful when there are multiple executors running in the same host. Screenshot:

spark git commit: Add mockito as an explicit test dependency to spark-streaming

2015-11-09 Thread tdas
Repository: spark Updated Branches: refs/heads/master 6502944f3 -> 1431319e5 Add mockito as an explicit test dependency to spark-streaming While sbt successfully compiles as it properly pulls the mockito dependency, maven builds have broken. We need this in, ASAP. tdas Author: Burak Ya

spark git commit: Add mockito as an explicit test dependency to spark-streaming

2015-11-09 Thread tdas
Repository: spark Updated Branches: refs/heads/branch-1.6 d33f18c42 -> d6f4b56a6 Add mockito as an explicit test dependency to spark-streaming While sbt successfully compiles as it properly pulls the mockito dependency, maven builds have broken. We need this in, ASAP. tdas Author: Bu

spark git commit: [SPARK-11141][STREAMING] Batch ReceivedBlockTrackerLogEvents for WAL writes

2015-11-09 Thread tdas
send AddBlock events to the ReceiverTracker. This PR adds batching of events in the ReceivedBlockTracker so that receivers don't get blocked by the driver for too long. cc zsxwing tdas Author: Burak Yavuz <brk...@gmail.com> Closes #9143 from brkyvz/batch-wal-writes. Project: http:

spark git commit: [SPARK-11141][STREAMING] Batch ReceivedBlockTrackerLogEvents for WAL writes

2015-11-09 Thread tdas
send AddBlock events to the ReceiverTracker. This PR adds batching of events in the ReceivedBlockTracker so that receivers don't get blocked by the driver for too long. cc zsxwing tdas Author: Burak Yavuz <brk...@gmail.com> Closes #9143 from brkyvz/batch-wal-writes. (cherry picked fr
