spark git commit: [SPARK-18744][CORE] Remove workaround for Netty memory leak

2016-12-06 Thread zsxwing
tty/netty/issues/5833 Now we can remove them as it's fixed in Netty 4.0.42.Final. ## How was this patch tested? Jenkins Author: Shixiong Zhu Closes #16167 from zsxwing/remove-netty-workaround. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.o

spark git commit: [SPARK-18671][SS][TEST] Added tests to ensure stability of that all Structured Streaming log formats

2016-12-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master cb1f10b46 -> 1ef6b296d [SPARK-18671][SS][TEST] Added tests to ensure stability of that all Structured Streaming log formats ## What changes were proposed in this pull request? To be able to restart StreamingQueries across Spark version, w

spark git commit: [SPARK-18671][SS][TEST] Added tests to ensure stability of that all Structured Streaming log formats

2016-12-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 ace4079c5 -> d20e0d6b8 [SPARK-18671][SS][TEST] Added tests to ensure stability of that all Structured Streaming log formats ## What changes were proposed in this pull request? To be able to restart StreamingQueries across Spark versio

spark git commit: [SPARK-18694][SS] Add StreamingQuery.explain and exception to Python and fix StreamingQueryException (branch 2.1)

2016-12-05 Thread zsxwing
was this patch tested? Jenkins Author: Shixiong Zhu Closes #16153 from zsxwing/SPARK-18694-2.1. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c6a4e3d9 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c6a4e3d9 Diff: h

spark git commit: [SPARK-18617][SPARK-18560][TESTS] Fix flaky test: StreamingContextSuite. Receiver data should be deserialized properly

2016-12-01 Thread zsxwing
to stop StreamingContext. Otherwise, the latch added in #16091 can be passed too early. ## How was this patch tested? Jenkins Author: Shixiong Zhu Closes #16105 from zsxwing/SPARK-18617-2. (cherry picked from commit 086b0c8f6788b205bc630d5ccf078f77b9751af3) Signed-off-by: Shixiong Zhu Proj

spark git commit: [SPARK-18617][SPARK-18560][TESTS] Fix flaky test: StreamingContextSuite. Receiver data should be deserialized properly

2016-12-01 Thread zsxwing
to stop StreamingContext. Otherwise, the latch added in #16091 can be passed too early. ## How was this patch tested? Jenkins Author: Shixiong Zhu Closes #16105 from zsxwing/SPARK-18617-2. (cherry picked from commit 086b0c8f6788b205bc630d5ccf078f77b9751af3) Signed-off-by: Shixiong Zhu Proj

spark git commit: [SPARK-18617][SPARK-18560][TESTS] Fix flaky test: StreamingContextSuite. Receiver data should be deserialized properly

2016-12-01 Thread zsxwing
top StreamingContext. Otherwise, the latch added in #16091 can be passed too early. ## How was this patch tested? Jenkins Author: Shixiong Zhu Closes #16105 from zsxwing/SPARK-18617-2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/sp

spark git commit: [SPARK-18655][SS] Ignore Structured Streaming 2.0.2 logs in history server

2016-11-30 Thread zsxwing
n.databind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1099) ... ``` This PR just ignores such errors and adds a test to make sure we can read 2.0.2 logs. ## How was this patch tested? `query-event-logs-version-2.0.2.txt` has all types of events generated

spark git commit: [SPARK-18655][SS] Ignore Structured Streaming 2.0.2 logs in history server

2016-11-30 Thread zsxwing
bind.deser.BeanDeserializerBase.deserializeFromObjectUsingNonDefault(BeanDeserializerBase.java:1099) ... ``` This PR just ignores such errors and adds a test to make sure we can read 2.0.2 logs. ## How was this patch tested? `query-event-logs-version-2.0.2.txt` has all types of events generated by

spark git commit: [SPARK-18188] add checksum for blocks of broadcast

2016-11-29 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 ea6957da2 -> 06a56df22 [SPARK-18188] add checksum for blocks of broadcast ## What changes were proposed in this pull request? A TorrentBroadcast is serialized and compressed first, then splitted as fixed size blocks, if any block is c

spark git commit: [SPARK-18188] add checksum for blocks of broadcast

2016-11-29 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 3c0beea47 -> 7d5cb3af7 [SPARK-18188] add checksum for blocks of broadcast ## What changes were proposed in this pull request? A TorrentBroadcast is serialized and compressed first, then splitted as fixed size blocks, if any block is corru
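
The idea summarized in this commit message (serialize and compress a value, split it into fixed-size blocks, and attach a checksum to each block) can be sketched as below. This is only an illustration, not Spark's code: the helper names `to_checked_blocks` and `from_checked_blocks` are hypothetical, and the choice of `pickle`, `zlib`, and Adler-32 is an assumption made for the example.

```python
import pickle
import zlib


def to_checked_blocks(obj, block_size=1024):
    """Serialize and compress obj, split it into fixed-size blocks,
    and pair every block with its Adler-32 checksum."""
    payload = zlib.compress(pickle.dumps(obj))
    blocks = [payload[i:i + block_size] for i in range(0, len(payload), block_size)]
    return [(block, zlib.adler32(block)) for block in blocks]


def from_checked_blocks(checked_blocks):
    """Verify each block against its checksum before reassembling the value."""
    for index, (block, expected) in enumerate(checked_blocks):
        if zlib.adler32(block) != expected:
            # Fail early with a clear message instead of failing later
            # inside decompression or deserialization.
            raise ValueError(f"corrupted block {index}")
    payload = b"".join(block for block, _ in checked_blocks)
    return pickle.loads(zlib.decompress(payload))
```

Checking each block on its own means a corrupted block is reported by index as soon as it is read, rather than surfacing as an opaque decompression or deserialization error.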

spark git commit: [SPARK-18547][CORE] Propagate I/O encryption key when executors register.

2016-11-28 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 45e2b3c0e -> c4cbdc864 [SPARK-18547][CORE] Propagate I/O encryption key when executors register. This change modifies the method used to propagate encryption keys used during shuffle. Instead of relying on YARN's UserGroupInformation cr

spark git commit: [SPARK-18547][CORE] Propagate I/O encryption key when executors register.

2016-11-28 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 1633ff3b6 -> 8b325b17e [SPARK-18547][CORE] Propagate I/O encryption key when executors register. This change modifies the method used to propagate encryption keys used during shuffle. Instead of relying on YARN's UserGroupInformation creden

spark git commit: [SPARK-18588][SS][KAFKA] Ignore the flaky kafka test

2016-11-28 Thread zsxwing
Jenkins Author: Shixiong Zhu Closes #16051 from zsxwing/ignore-flaky-kafka-test. (cherry picked from commit 1633ff3b6c97e33191859f34c868782cbb0972fd) Signed-off-by: Shixiong Zhu Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spar

spark git commit: [SPARK-18588][SS][KAFKA] Ignore the flaky kafka test

2016-11-28 Thread zsxwing
Jenkins Author: Shixiong Zhu Closes #16051 from zsxwing/ignore-flaky-kafka-test. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1633ff3b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/1633ff3b Diff: http:/

spark git commit: [SPARK-18530][SS][KAFKA] Change Kafka timestamp column type to TimestampType

2016-11-22 Thread zsxwing
tch tested? `test("Kafka column types")`. Author: Shixiong Zhu Closes #15969 from zsxwing/SPARK-18530. (cherry picked from commit d0212eb0f22473ee5482fe98dafc24e16ffcfc63) Signed-off-by: Shixiong Zhu Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apa

spark git commit: [SPARK-18530][SS][KAFKA] Change Kafka timestamp column type to TimestampType

2016-11-22 Thread zsxwing
ted? `test("Kafka column types")`. Author: Shixiong Zhu Closes #15969 from zsxwing/SPARK-18530. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d0212eb0 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/d0212e

spark git commit: [SPARK-18425][STRUCTURED STREAMING][TESTS] Test `CompactibleFileStreamLog` directly

2016-11-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 6dbe44891 -> aaa2a173a [SPARK-18425][STRUCTURED STREAMING][TESTS] Test `CompactibleFileStreamLog` directly ## What changes were proposed in this pull request? Right now we are testing the most of `CompactibleFileStreamLog` in `FileSt

spark git commit: [SPARK-18425][STRUCTURED STREAMING][TESTS] Test `CompactibleFileStreamLog` directly

2016-11-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 97a8239a6 -> ebeb0830a [SPARK-18425][STRUCTURED STREAMING][TESTS] Test `CompactibleFileStreamLog` directly ## What changes were proposed in this pull request? Right now we are testing the most of `CompactibleFileStreamLog` in `FileStream

spark git commit: [SPARK-18493] Add missing python APIs: withWatermark and checkpoint to dataframe

2016-11-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 2afc18be2 -> 6dbe44891 [SPARK-18493] Add missing python APIs: withWatermark and checkpoint to dataframe ## What changes were proposed in this pull request? This PR adds two of the newly added methods of `Dataset`s to Python: `withWater

spark git commit: [SPARK-18493] Add missing python APIs: withWatermark and checkpoint to dataframe

2016-11-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master a2d464770 -> 97a8239a6 [SPARK-18493] Add missing python APIs: withWatermark and checkpoint to dataframe ## What changes were proposed in this pull request? This PR adds two of the newly added methods of `Dataset`s to Python: `withWatermark

spark git commit: [SPARK-18187][SQL] CompactibleFileStreamLog should not use "compactInterval" direcly with user setting.

2016-11-18 Thread zsxwing
as this patch tested? When restart a stream, we change the 'spark.sql.streaming.fileSource.log.compactInterval' different with the former one. The primary solution to this issue was given by uncleGen Added extensions include an additional metadata field in OffsetSeq and CompactibleFile

spark git commit: [SPARK-18187][SQL] CompactibleFileStreamLog should not use "compactInterval" direcly with user setting.

2016-11-18 Thread zsxwing
his patch tested? When restart a stream, we change the 'spark.sql.streaming.fileSource.log.compactInterval' different with the former one. The primary solution to this issue was given by uncleGen Added extensions include an additional metadata field in OffsetSeq and CompactibleFile

spark git commit: [SPARK-18459][SPARK-18460][STRUCTUREDSTREAMING] Rename triggerId to batchId and add triggerDetails to json in StreamingQueryStatus (for branch-2.0)

2016-11-16 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 10b36d62a -> 37e6d9930 [SPARK-18459][SPARK-18460][STRUCTUREDSTREAMING] Rename triggerId to batchId and add triggerDetails to json in StreamingQueryStatus (for branch-2.0) This is a fix for branch-2.0 for the earlier PR #15895 ## What

spark git commit: [SPARK-18459][SPARK-18460][STRUCTUREDSTREAMING] Rename triggerId to batchId and add triggerDetails to json in StreamingQueryStatus

2016-11-16 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 c0dbe08d6 -> b86e962c9 [SPARK-18459][SPARK-18460][STRUCTUREDSTREAMING] Rename triggerId to batchId and add triggerDetails to json in StreamingQueryStatus ## What changes were proposed in this pull request? SPARK-18459: triggerId seems

spark git commit: [SPARK-18459][SPARK-18460][STRUCTUREDSTREAMING] Rename triggerId to batchId and add triggerDetails to json in StreamingQueryStatus

2016-11-16 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 608ecc512 -> 0048ce7ce [SPARK-18459][SPARK-18460][STRUCTUREDSTREAMING] Rename triggerId to batchId and add triggerDetails to json in StreamingQueryStatus ## What changes were proposed in this pull request? SPARK-18459: triggerId seems lik

spark git commit: [SPARK-18300][SQL] Fix scala 2.10 build for FoldablePropagation

2016-11-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 3ce057d00 -> 4b35d13ba [SPARK-18300][SQL] Fix scala 2.10 build for FoldablePropagation ## What changes were proposed in this pull request? Commit https://github.com/apache/spark/commit/f14ae4900ad0ed66ba36108b7792d56cd6767a69 broke the sc

spark git commit: [SPARK-18300][SQL] Fix scala 2.10 build for FoldablePropagation

2016-11-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 1126c3194 -> 175c47864 [SPARK-18300][SQL] Fix scala 2.10 build for FoldablePropagation ## What changes were proposed in this pull request? Commit https://github.com/apache/spark/commit/f14ae4900ad0ed66ba36108b7792d56cd6767a69 broke th

spark git commit: [SPARK-18423][STREAMING] ReceiverTracker should close checkpoint dir when stopped even if it was not started

2016-11-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 b424dc947 -> e469d3bad [SPARK-18423][STREAMING] ReceiverTracker should close checkpoint dir when stopped even if it was not started ## What changes were proposed in this pull request? Several tests are being failed on Windows due to t

spark git commit: [SPARK-18423][STREAMING] ReceiverTracker should close checkpoint dir when stopped even if it was not started

2016-11-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 1ae4652b7 -> 503378f10 [SPARK-18423][STREAMING] ReceiverTracker should close checkpoint dir when stopped even if it was not started ## What changes were proposed in this pull request? Several tests are being failed on Windows due to the f

spark git commit: [SPARK-13027][STREAMING] Added batch time as a parameter to updateStateByKey

2016-11-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 0af94e772 -> 5f7a9af66 [SPARK-13027][STREAMING] Added batch time as a parameter to updateStateByKey Added RDD batch time as an input parameter to the update function in updateStateByKey. Author: Aaditya Ramesh Closes #11122 from ara

spark git commit: [SPARK-13027][STREAMING] Added batch time as a parameter to updateStateByKey

2016-11-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 745ab8bc5 -> 6f9e598cc [SPARK-13027][STREAMING] Added batch time as a parameter to updateStateByKey Added RDD batch time as an input parameter to the update function in updateStateByKey. Author: Aaditya Ramesh Closes #11122 from aramesh

spark git commit: [SPARK-17510][STREAMING][KAFKA] config max rate on a per-partition basis

2016-11-14 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master bdfe60ac9 -> 89d1fa58d [SPARK-17510][STREAMING][KAFKA] config max rate on a per-partition basis ## What changes were proposed in this pull request? Allow configuration of max rate on a per-topicpartition basis. ## How was this patch tested

spark git commit: [SPARK-17510][STREAMING][KAFKA] config max rate on a per-partition basis

2016-11-14 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 3c623d226 -> db691f05c [SPARK-17510][STREAMING][KAFKA] config max rate on a per-partition basis ## What changes were proposed in this pull request? Allow configuration of max rate on a per-topicpartition basis. ## How was this patch te

spark git commit: [SPARK-18416][STRUCTURED STREAMING] Fixed temp file leak in state store

2016-11-14 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 666396510 -> c40fcbc14 [SPARK-18416][STRUCTURED STREAMING] Fixed temp file leak in state store ## What changes were proposed in this pull request? StateStore.get() causes temporary files to be created immediately, even if the store is

spark git commit: [SPARK-18416][STRUCTURED STREAMING] Fixed temp file leak in state store

2016-11-14 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 c07fe1c59 -> 3c623d226 [SPARK-18416][STRUCTURED STREAMING] Fixed temp file leak in state store ## What changes were proposed in this pull request? StateStore.get() causes temporary files to be created immediately, even if the store is

spark git commit: [SPARK-18416][STRUCTURED STREAMING] Fixed temp file leak in state store

2016-11-14 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 9d07ceee7 -> bdfe60ac9 [SPARK-18416][STRUCTURED STREAMING] Fixed temp file leak in state store ## What changes were proposed in this pull request? StateStore.get() causes temporary files to be created immediately, even if the store is not

spark git commit: [SPARK-17829][SQL] Stable format for offset log

2016-11-09 Thread zsxwing
treaming/OffsetSuite.scala) Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request. zsxwing marmbrus Author: Tyson Condie Author: Tyson Condie Closes #15626 from tcondie/spark-8360. (cherry picked from commit 3f62e1b5d9e75dc07bac3aa4db3e8d06

spark git commit: [SPARK-17829][SQL] Stable format for offset log

2016-11-09 Thread zsxwing
treaming/OffsetSuite.scala) Please review https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark before opening a pull request. zsxwing marmbrus Author: Tyson Condie Author: Tyson Condie Closes #15626 from tcondie/spark-8360. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Comm

spark git commit: [SPARK-18280][CORE] Fix potential deadlock in `StandaloneSchedulerBackend.dead`

2016-11-08 Thread zsxwing
if launching a new thread to stop the SparkContext. ## How was this patch tested? Jenkins Author: Shixiong Zhu Closes #15775 from zsxwing/SPARK-18280. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/ba80eaf7 Tree: http://git-

spark git commit: [SPARK-18280][CORE] Fix potential deadlock in `StandaloneSchedulerBackend.dead`

2016-11-08 Thread zsxwing
if launching a new thread to stop the SparkContext. ## How was this patch tested? Jenkins Author: Shixiong Zhu Closes #15775 from zsxwing/SPARK-18280. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/8aa419b2 Tree: http://git-

spark git commit: [SPARK-18280][CORE] Fix potential deadlock in `StandaloneSchedulerBackend.dead`

2016-11-08 Thread zsxwing
if launching a new thread to stop the SparkContext. ## How was this patch tested? Jenkins Author: Shixiong Zhu Closes #15775 from zsxwing/SPARK-18280. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/b6de0c98 Tree: http://git-

spark git commit: [SPARK-18283][STRUCTURED STREAMING][KAFKA] Added test to check whether default starting offset in latest

2016-11-07 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master daa975f4b -> b06c23db9 [SPARK-18283][STRUCTURED STREAMING][KAFKA] Added test to check whether default starting offset in latest ## What changes were proposed in this pull request? Added test to check whether default starting offset in lat

spark git commit: [SPARK-18283][STRUCTURED STREAMING][KAFKA] Added test to check whether default starting offset in latest

2016-11-07 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 b5d7217af -> 10525c294 [SPARK-18283][STRUCTURED STREAMING][KAFKA] Added test to check whether default starting offset in latest ## What changes were proposed in this pull request? Added test to check whether default starting offset in

spark git commit: [SPARK-18283][STRUCTURED STREAMING][KAFKA] Added test to check whether default starting offset in latest

2016-11-07 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.1 6b332909f -> 7a84edb24 [SPARK-18283][STRUCTURED STREAMING][KAFKA] Added test to check whether default starting offset in latest ## What changes were proposed in this pull request? Added test to check whether default starting offset in

spark git commit: [SPARK-18144][SQL] logging StreamingQueryListener$QueryStartedEvent

2016-11-01 Thread zsxwing
the original code is actually calling StreamingQueryListenerBus.postToAll() which has no listener at all; we shall post by sparkListenerBus.postToAll(s) and this.postToAll() to trigger local listeners as well as the listeners registered in LiveListenerBus zsxwing ## How was this patch tested?

spark git commit: [SPARK-18144][SQL] logging StreamingQueryListener$QueryStartedEvent

2016-11-01 Thread zsxwing
the original code is actually calling StreamingQueryListenerBus.postToAll() which has no listener at all; we shall post by sparkListenerBus.postToAll(s) and this.postToAll() to trigger local listeners as well as the listeners registered in LiveListenerBus zsxwing ## How was this patch tested?

spark git commit: [SPARK-18144][SQL] logging StreamingQueryListener$QueryStartedEvent

2016-11-01 Thread zsxwing
nal code is actually calling StreamingQueryListenerBus.postToAll() which has no listener at all; we shall post by sparkListenerBus.postToAll(s) and this.postToAll() to trigger local listeners as well as the listeners registered in LiveListenerBus zsxwing ## How was this patch tested?

spark git commit: [SPARK-18030][TESTS] Fix flaky FileStreamSourceSuite by not deleting the files

2016-10-31 Thread zsxwing
uld not delete files because the source maybe is listing files. This PR just removes the delete actions since they are not necessary. ## How was this patch tested? Jenkins Author: Shixiong Zhu Closes #15699 from zsxwing/SPARK-18030. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Com

spark git commit: [SPARK-18030][TESTS] Fix flaky FileStreamSourceSuite by not deleting the files

2016-10-31 Thread zsxwing
ata` should not delete files because the source maybe is listing files. This PR just removes the delete actions since they are not necessary. ## How was this patch tested? Jenkins Author: Shixiong Zhu Closes #15699 from zsxwing/SPARK-18030. (cherry picked from com

spark git commit: [SPARK-18164][SQL] ForeachSink should fail the Spark job if `process` throws exception

2016-10-28 Thread zsxwing
## How was this patch tested? The fixed unit test. Author: Shixiong Zhu Closes #15674 from zsxwing/foreach-sink-error. (cherry picked from commit 59cccbda489f25add3e10997e950de7e88704aa7) Signed-off-by: Shixiong Zhu Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http:/

spark git commit: [SPARK-18164][SQL] ForeachSink should fail the Spark job if `process` throws exception

2016-10-28 Thread zsxwing
How was this patch tested? The fixed unit test. Author: Shixiong Zhu Closes #15674 from zsxwing/foreach-sink-error. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/59cccbda Tree: http://git-wip-us.apache.org/repos/asf/sp

spark git commit: [SPARK-16963][SQL] Fix test "StreamExecution metadata garbage collection"

2016-10-27 Thread zsxwing
use the file list API doesn't guarantee any order of the return list. ## How was this patch tested? Jenkins Author: Shixiong Zhu Closes #15661 from zsxwing/fix-StreamingQuerySuite. (cherry picked from commit 79fd0cc0584e48fb021c4237877b15abbffb319a) Signed-off-by: Shixiong Zhu

spark git commit: [SPARK-16963][SQL] Fix test "StreamExecution metadata garbage collection"

2016-10-27 Thread zsxwing
the file list API doesn't guarantee any order of the return list. ## How was this patch tested? Jenkins Author: Shixiong Zhu Closes #15661 from zsxwing/fix-StreamingQuerySuite. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spa

spark git commit: [SPARK-17813][SQL][KAFKA] Maximum data per trigger

2016-10-27 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 1a4be51d6 -> 6fb1f735f [SPARK-17813][SQL][KAFKA] Maximum data per trigger ## What changes were proposed in this pull request? maxOffsetsPerTrigger option for rate limiting, proportionally based on volume of different topicpartitions.

spark git commit: [SPARK-17813][SQL][KAFKA] Maximum data per trigger

2016-10-27 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 701a9d361 -> 104232580 [SPARK-17813][SQL][KAFKA] Maximum data per trigger ## What changes were proposed in this pull request? maxOffsetsPerTrigger option for rate limiting, proportionally based on volume of different topicpartitions. ##
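
To make "rate limiting, proportionally based on volume of different topicpartitions" concrete, here is a rough sketch of one way such a limit could be split. The function name `rate_limit`, the offset dictionaries, and the floor rounding are all assumptions for illustration; the actual Spark implementation may round or cap differently.

```python
def rate_limit(limit, from_offsets, until_offsets):
    """Cap the total number of offsets read in one trigger at `limit`,
    giving each topic-partition a share proportional to its backlog.

    Both arguments map a topic-partition key to an offset and are
    assumed to have the same keys.
    """
    sizes = {tp: max(until_offsets[tp] - from_offsets[tp], 0) for tp in until_offsets}
    total = sum(sizes.values())
    if total <= limit:
        # The whole backlog fits under the limit, so read everything.
        return dict(until_offsets)
    capped = {}
    for tp, size in sizes.items():
        share = limit * size // total  # floor of the proportional share
        capped[tp] = from_offsets[tp] + share
    return capped
```

With a limit of 10 and backlogs of 30 and 10 offsets, the two partitions would be allotted 7 and 2 offsets respectively; flooring keeps the total at or below the limit.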

spark git commit: [SPARK-16963][STREAMING][SQL] Changes to Source trait and related implementation classes

2016-10-26 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 72b3cff33 -> ea205e376 [SPARK-16963][STREAMING][SQL] Changes to Source trait and related implementation classes ## What changes were proposed in this pull request? This PR contains changes to the Source trait such that the scheduler c

spark git commit: [SPARK-16963][STREAMING][SQL] Changes to Source trait and related implementation classes

2016-10-26 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master a76846cfb -> 5b27598ff [SPARK-16963][STREAMING][SQL] Changes to Source trait and related implementation classes ## What changes were proposed in this pull request? This PR contains changes to the Source trait such that the scheduler can n

spark git commit: [SPARK-13747][SQL] Fix concurrent executions in ForkJoinPool for SQL (branch 2.0)

2016-10-26 Thread zsxwing
hor: Shixiong Zhu Closes #15646 from zsxwing/SPARK-13747-2.0. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/76b71eef Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/76b71eef Diff: http://git-wip-us.apache.org/repos/

spark git commit: [SPARK-18104][DOC] Don't build KafkaSource doc

2016-10-26 Thread zsxwing
kaSource. All KafkaSource APIs are internal. ## How was this patch tested? Verified manually. Author: Shixiong Zhu Closes #15630 from zsxwing/kafka-unidoc. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7d10631c Tree: h

spark git commit: [SPARK-18104][DOC] Don't build KafkaSource doc

2016-10-26 Thread zsxwing
e KafkaSource. All KafkaSource APIs are internal. ## How was this patch tested? Verified manually. Author: Shixiong Zhu Closes #15630 from zsxwing/kafka-unidoc. (cherry picked from commit 7d10631c16b980adf1f55378c128436310daed65) Signed-off-by: Shixiong Zhu Project: http://git-wip-us.apache.

spark git commit: [SPARK-13747][SQL] Fix concurrent executions in ForkJoinPool for SQL

2016-10-26 Thread zsxwing
h tested? Jenkins Author: Shixiong Zhu Closes #15520 from zsxwing/SPARK-13747. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7ac70e7b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7ac70e7b Diff: http:/

spark git commit: [SPARK-16304] LinkageError should not crash Spark executor

2016-10-26 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 b4a7b6551 -> 773fbfef1 [SPARK-16304] LinkageError should not crash Spark executor ## What changes were proposed in this pull request? This patch updates the failure handling logic so Spark executor does not crash when seeing LinkageErr

spark git commit: [SPARK-17894][HOTFIX] Fix broken build from

2016-10-24 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master d479c5262 -> 483c37c58 [SPARK-17894][HOTFIX] Fix broken build from The named parameter in an overridden class isn't supported in Scala 2.10 so was breaking the build. cc zsxwing Author: Kay Ousterhout Closes #15617 from kayou

spark git commit: [SPARK-17624][SQL][STREAMING][TEST] Fixed flaky StateStoreSuite.maintenance

2016-10-24 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 81d6933e7 -> 407c3cedf [SPARK-17624][SQL][STREAMING][TEST] Fixed flaky StateStoreSuite.maintenance ## What changes were proposed in this pull request? The reason for the flakiness was follows. The test starts the maintenance background th

spark git commit: [SPARK-17624][SQL][STREAMING][TEST] Fixed flaky StateStoreSuite.maintenance

2016-10-24 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 bad15bcdf -> 1c1e847bc [SPARK-17624][SQL][STREAMING][TEST] Fixed flaky StateStoreSuite.maintenance ## What changes were proposed in this pull request? The reason for the flakiness was follows. The test starts the maintenance backgroun

spark git commit: [SPARK-18044][STREAMING] FileStreamSource should not infer partitions in every batch

2016-10-24 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 aef65ac02 -> bad15bcdf [SPARK-18044][STREAMING] FileStreamSource should not infer partitions in every batch ## What changes were proposed in this pull request? In `FileStreamSource.getBatch`, we will create a `DataSource` with specifi

spark git commit: [SPARK-17153][SQL] Should read partition data when reading new files in filestream without globbing

2016-10-24 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 064db176d -> aef65ac02 [SPARK-17153][SQL] Should read partition data when reading new files in filestream without globbing ## What changes were proposed in this pull request? When reading file stream with non-globbing path, the result

spark git commit: [STREAMING][KAFKA][DOC] clarify kafka settings needed for larger batches

2016-10-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 3e9840f1d -> d3c78c4f3 [STREAMING][KAFKA][DOC] clarify kafka settings needed for larger batches ## What changes were proposed in this pull request? Minor doc change to mention kafka configuration for larger spark batches. ## How was t

spark git commit: [STREAMING][KAFKA][DOC] clarify kafka settings needed for larger batches

2016-10-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 268ccb9a4 -> c9720b219 [STREAMING][KAFKA][DOC] clarify kafka settings needed for larger batches ## What changes were proposed in this pull request? Minor doc change to mention kafka configuration for larger spark batches. ## How was this

spark git commit: [SPARK-17812][SQL][KAFKA] Assign and specific startingOffsets for structured stream

2016-10-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 b113b5d9f -> 3e9840f1d [SPARK-17812][SQL][KAFKA] Assign and specific startingOffsets for structured stream ## What changes were proposed in this pull request? startingOffsets takes specific per-topicpartition offsets as a json argumen

spark git commit: [SPARK-17812][SQL][KAFKA] Assign and specific startingOffsets for structured stream

2016-10-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 140570252 -> 268ccb9a4 [SPARK-17812][SQL][KAFKA] Assign and specific startingOffsets for structured stream ## What changes were proposed in this pull request? startingOffsets takes specific per-topicpartition offsets as a json argument,
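
As an illustration of passing per-topicpartition offsets as a JSON argument, the sketch below builds such a string. The helper `starting_offsets_json` is hypothetical, and the JSON layout (topic, then partition, then offset, with -2 meaning earliest and -1 meaning latest) follows my understanding of the Spark Kafka source documentation rather than anything quoted in this message, so treat it as an assumption.

```python
import json

# Assumed sentinel values, as described in the Spark Kafka source documentation.
EARLIEST = -2
LATEST = -1


def starting_offsets_json(offsets):
    """Render {topic: {partition: offset}} as a compact JSON string.

    JSON object keys must be strings, so partition numbers are converted.
    """
    normalized = {
        topic: {str(partition): offset for partition, offset in sorted(parts.items())}
        for topic, parts in sorted(offsets.items())
    }
    return json.dumps(normalized, separators=(",", ":"))
```

The resulting string would then be supplied as the value of the `startingOffsets` option when defining the Kafka stream.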

spark git commit: [SPARK-18044][STREAMING] FileStreamSource should not infer partitions in every batch

2016-10-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master c1f344f1a -> 140570252 [SPARK-18044][STREAMING] FileStreamSource should not infer partitions in every batch ## What changes were proposed in this pull request? In `FileStreamSource.getBatch`, we will create a `DataSource` with specified

spark git commit: [SPARK-17929][CORE] Fix deadlock when CoarseGrainedSchedulerBackend reset

2016-10-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 7a531e305 -> c1f344f1a [SPARK-17929][CORE] Fix deadlock when CoarseGrainedSchedulerBackend reset ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-17929 Now `CoarseGrainedSchedulerBackend` res

spark git commit: [SPARK-17929][CORE] Fix deadlock when CoarseGrainedSchedulerBackend reset

2016-10-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 af2e6e0c9 -> b113b5d9f [SPARK-17929][CORE] Fix deadlock when CoarseGrainedSchedulerBackend reset ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-17929 Now `CoarseGrainedSchedulerBackend`

spark git commit: [SPARK-18030][TESTS] Adds more checks to collect more info about FileStreamSourceSuite failure

2016-10-20 Thread zsxwing
ore info. ## How was this patch tested? Jenkins Author: Shixiong Zhu Closes #15577 from zsxwing/SPARK-18030-debug. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/1bb99c48 Tree: http://git-wip-us.apache.org/repos/asf/spark/t

spark git commit: [SPARK-17999][KAFKA][SQL] Add getPreferredLocations for KafkaSourceRDD

2016-10-20 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 4131623a8 -> e8923d21d [SPARK-17999][KAFKA][SQL] Add getPreferredLocations for KafkaSourceRDD ## What changes were proposed in this pull request? The newly implemented Structured Streaming `KafkaSource` did calculate the preferred loc

spark git commit: [SPARK-17999][KAFKA][SQL] Add getPreferredLocations for KafkaSourceRDD

2016-10-20 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 84b245f2d -> 947f4f252 [SPARK-17999][KAFKA][SQL] Add getPreferredLocations for KafkaSourceRDD ## What changes were proposed in this pull request? The newly implemented Structured Streaming `KafkaSource` did calculate the preferred locatio

spark git commit: [SPARK-17711][TEST-HADOOP2.2] Fix hadoop2.2 compilation error

2016-10-18 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 5f20ae039 -> 2629cd746 [SPARK-17711][TEST-HADOOP2.2] Fix hadoop2.2 compilation error ## What changes were proposed in this pull request? Fix hadoop2.2 compilation error. ## How was this patch tested? Existing tests. cc tdas zsxw

spark git commit: [SPARK-17711][TEST-HADOOP2.2] Fix hadoop2.2 compilation error

2016-10-18 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 99943bf69 -> 3796a98cf [SPARK-17711][TEST-HADOOP2.2] Fix hadoop2.2 compilation error ## What changes were proposed in this pull request? Fix hadoop2.2 compilation error. ## How was this patch tested? Existing tests. cc tdas zsxw

spark git commit: [SPARK-17930][CORE] The SerializerInstance instance used when deserializing a TaskResult is not reused

2016-10-18 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 20dd11096 -> 4518642ab [SPARK-17930][CORE] The SerializerInstance instance used when deserializing a TaskResult is not reused ## What changes were proposed in this pull request? The following code is called when the DirectTaskResult instan

spark git commit: [SPARK-17839][CORE] Use Nio's directbuffer instead of BufferedInputStream in order to avoid additional copy from os buffer cache to user buffer

2016-10-17 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master e3bf37fa3 -> c7ac027d5 [SPARK-17839][CORE] Use Nio's directbuffer instead of BufferedInputStream in order to avoid additional copy from os buffer cache to user buffer ## What changes were proposed in this pull request? Currently we use Bu

spark git commit: [SPARK-17678][REPL][BRANCH-1.6] Honor spark.replClassServer.port in scala-2.11 repl

2016-10-13 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-1.6 585c5657f -> 18b173cfc [SPARK-17678][REPL][BRANCH-1.6] Honor spark.replClassServer.port in scala-2.11 repl ## What changes were proposed in this pull request? Spark 1.6 Scala-2.11 repl doesn't honor "spark.replClassServer.port" confi

spark git commit: [SPARK-17834][SQL] Fetch the earliest offsets manually in KafkaSource instead of counting on KafkaConsumer

2016-10-13 Thread zsxwing
the partition offsets, this PR just calls `seekToBeginning` to manually set the earliest offsets for the KafkaSource initial offsets. ## How was this patch tested? Existing tests. Author: Shixiong Zhu Closes #15397 from zsxwing/SPARK-17834. Project: http://git-wip-us.apache.org/repos/asf/spark/r

spark git commit: [SPARK-17834][SQL] Fetch the earliest offsets manually in KafkaSource instead of counting on KafkaConsumer

2016-10-13 Thread zsxwing
the partition offsets, this PR just calls `seekToBeginning` to manually set the earliest offsets for the KafkaSource initial offsets. ## How was this patch tested? Existing tests. Author: Shixiong Zhu Closes #15397 from zsxwing/SPARK-17834. (cherry picked from com

spark git commit: [SPARK-17876] Write StructuredStreaming WAL to a stream instead of materializing all at once

2016-10-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 5903dabc5 -> ab00e410c [SPARK-17876] Write StructuredStreaming WAL to a stream instead of materializing all at once ## What changes were proposed in this pull request? The CompactibleFileStreamLog materializes the whole metadata log i

spark git commit: [SPARK-17876] Write StructuredStreaming WAL to a stream instead of materializing all at once

2016-10-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 21cb59f1c -> edeb51a39 [SPARK-17876] Write StructuredStreaming WAL to a stream instead of materializing all at once ## What changes were proposed in this pull request? The CompactibleFileStreamLog materializes the whole metadata log in me

spark git commit: [SPARK-17782][STREAMING][KAFKA] alternative eliminate race condition of poll twice

2016-10-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 d55ba3063 -> 050b8177e [SPARK-17782][STREAMING][KAFKA] alternative eliminate race condition of poll twice ## What changes were proposed in this pull request? Alternative approach to https://github.com/apache/spark/pull/15387 Author:

spark git commit: [SPARK-17782][STREAMING][KAFKA] alternative eliminate race condition of poll twice

2016-10-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 9ce7d3e54 -> f9a56a153 [SPARK-17782][STREAMING][KAFKA] alternative eliminate race condition of poll twice ## What changes were proposed in this pull request? Alternative approach to https://github.com/apache/spark/pull/15387 Author: cody

spark git commit: [SPARK-17816][CORE][BRANCH-2.0] Fix ConcurrentModificationException issue in BlockStatusesAccumulator

2016-10-11 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 a6b5e1dcc -> 5ec3e6680 [SPARK-17816][CORE][BRANCH-2.0] Fix ConcurrentModificationException issue in BlockStatusesAccumulator ## What changes were proposed in this pull request? Replaced `BlockStatusesAccumulator` with `CollectionAccumu

spark git commit: [SPARK-17816][CORE] Fix ConcurrentModificationException issue in BlockStatusesAccumulator

2016-10-10 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 90217f9de -> 19a5bae47 [SPARK-17816][CORE] Fix ConcurrentModificationException issue in BlockStatusesAccumulator ## What changes were proposed in this pull request? Change the BlockStatusesAccumulator to return immutable object when value

spark git commit: [SPARK-17738][TEST] Fix flaky test in ColumnTypeSuite

2016-10-10 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 d719e9a08 -> ff9f5bbf1 [SPARK-17738][TEST] Fix flaky test in ColumnTypeSuite ## What changes were proposed in this pull request? The default buffer size is not big enough for randomly generated MapType. ## How was this patch tested?

spark git commit: [SPARK-17738][TEST] Fix flaky test in ColumnTypeSuite

2016-10-10 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 03c40202f -> d5ec4a3e0 [SPARK-17738][TEST] Fix flaky test in ColumnTypeSuite ## What changes were proposed in this pull request? The default buffer size is not big enough for randomly generated MapType. ## How was this patch tested? Ran

spark git commit: [SPARK-17782][STREAMING][BUILD] Add Kafka 0.10 project to build modules

2016-10-07 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 f460a199e -> a84d8ef37 [SPARK-17782][STREAMING][BUILD] Add Kafka 0.10 project to build modules ## What changes were proposed in this pull request? This PR adds the Kafka 0.10 subproject to the build infrastructure. This makes sure Kafk

[2/2] spark git commit: [SPARK-17346][SQL][TEST-MAVEN] Add Kafka source for Structured Streaming (branch 2.0)

2016-10-07 Thread zsxwing
/b678e465afa417780b54db0fbbaa311621311f15 into branch 2.0. The only difference is the Spark version in pom file. ## How was this patch tested? Jenkins. Author: Shixiong Zhu Closes #15367 from zsxwing/kafka-source-branch-2.0. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip

[1/2] spark git commit: [SPARK-17346][SQL][TEST-MAVEN] Add Kafka source for Structured Streaming (branch 2.0)

2016-10-07 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 9f2eb27a4 -> f460a199e http://git-wip-us.apache.org/repos/asf/spark/blob/f460a199/external/kafka-0-10-sql/src/test/scala/org/apache/spark/sql/kafka010/KafkaTestUtils.scala -

spark git commit: [SPARK-17707][WEBUI] Web UI prevents spark-submit application to be finished

2016-10-07 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 3487b0203 -> 9f2eb27a4 [SPARK-17707][WEBUI] Web UI prevents spark-submit application to be finished This expands calls to Jetty's simple `ServerConnector` constructor to explicitly specify a `ScheduledExecutorScheduler` that makes daem

spark git commit: [SPARK-17707][WEBUI] Web UI prevents spark-submit application to be finished

2016-10-07 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master dd16b52cf -> cff560755 [SPARK-17707][WEBUI] Web UI prevents spark-submit application to be finished ## What changes were proposed in this pull request? This expands calls to Jetty's simple `ServerConnector` constructor to explicitly speci

spark git commit: [SPARK-17798][SQL] Remove redundant Experimental annotations in sql.streaming

2016-10-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.0 1c2dff1ee -> 225372adf [SPARK-17798][SQL] Remove redundant Experimental annotations in sql.streaming ## What changes were proposed in this pull request? I was looking through API annotations to catch mislabeled APIs, and realized DataS

spark git commit: [SPARK-17798][SQL] Remove redundant Experimental annotations in sql.streaming

2016-10-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 92b7e5728 -> 79accf45a [SPARK-17798][SQL] Remove redundant Experimental annotations in sql.streaming ## What changes were proposed in this pull request? I was looking through API annotations to catch mislabeled APIs, and realized DataStrea
