spark git commit: [SPARK-21546][SS] dropDuplicates should ignore watermark when it's not a key

2017-08-02 Thread zsxwing
ash. This PR fixed this issue. ## How was this patch tested? The new unit test. Author: Shixiong Zhu <shixi...@databricks.com> Closes #18822 from zsxwing/SPARK-21546. (cherry picked from commit 0d26b3aa55f9cc75096b0e2b309f64fe3270b9a5) Signed-off-by: Shixiong Zhu <shixi...@databricks.co

spark git commit: [CORE][MINOR] Improve the error message of checkpoint RDD verification

2017-08-01 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 77cc0d67d -> 4cc704b12 [CORE][MINOR] Improve the error message of checkpoint RDD verification ### What changes were proposed in this pull request? The original error message is pretty confusing. It is unable to tell which number is

spark git commit: [SPARK-21146][CORE] Master/Worker should handle and shutdown when any thread gets UncaughtException

2017-07-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 24367f23f -> e16e8c7ad [SPARK-21146][CORE] Master/Worker should handle and shutdown when any thread gets UncaughtException ## What changes were proposed in this pull request? Adding the default UncaughtExceptionHandler to the Worker. ##

spark git commit: [SPARK-21069][SS][DOCS] Add rate source to programming guide.

2017-07-08 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 9760c15ac -> d0bfc6733 [SPARK-21069][SS][DOCS] Add rate source to programming guide. ## What changes were proposed in this pull request? SPARK-20979 added a new structured streaming source: Rate source. This patch adds the corresponding

spark git commit: [SPARK-21069][SS][DOCS] Add rate source to programming guide.

2017-07-08 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 576fd4c3a -> ab12848d6 [SPARK-21069][SS][DOCS] Add rate source to programming guide. ## What changes were proposed in this pull request? SPARK-20979 added a new structured streaming source: Rate source. This patch adds the

spark git commit: [SPARK-21421][SS] Add the query id as a local property to allow source and sink using it

2017-07-14 Thread zsxwing
ing it. ## How was this patch tested? The new unit test. Author: Shixiong Zhu <shixi...@databricks.com> Closes #18638 from zsxwing/SPARK-21421. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/2d968a07 Tree: http:

spark git commit: [SPARK-21409][SS] Expose state store memory usage in SQL metrics and progress updates

2017-07-17 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 53465075c -> 9d8c83179 [SPARK-21409][SS] Expose state store memory usage in SQL metrics and progress updates ## What changes were proposed in this pull request? Currently, there is no tracking of memory usage of state stores. This JIRA

spark git commit: [SPARK-21253][CORE][HOTFIX] Fix Scala 2.10 build

2017-06-29 Thread zsxwing
Zhu <shixi...@databricks.com> Closes #18478 from zsxwing/SPARK-21253-2. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cfc696f4 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/cfc696f4 Diff: http://git-wip-us.a

spark git commit: [SPARK-21253][CORE][HOTFIX] Fix Scala 2.10 build

2017-06-29 Thread zsxwing
ong Zhu <shixi...@databricks.com> Closes #18478 from zsxwing/SPARK-21253-2. (cherry picked from commit cfc696f4a4289acf132cb26baf7c02c5b6305277) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.a

spark git commit: [SPARK-21188][CORE] releaseAllLocksForTask should synchronize the whole method

2017-06-29 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 18066f2e6 -> f9151bebc [SPARK-21188][CORE] releaseAllLocksForTask should synchronize the whole method ## What changes were proposed in this pull request? Since the objects `readLocksByTask`, `writeLocksByTask` and `info`s are coupled and

spark git commit: [SPARK-21248][SS] The clean up codes in StreamExecution should not be interrupted

2017-07-05 Thread zsxwing
des in StreamExecution is interrupted. It also removes an optimization in `runUninterruptibly` to make sure this method never throws `InterruptedException`. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #18461 from zsxwing/SPARK-21248. Project: http:

spark git commit: [SPARK-21216][SS] Hive strategies missed in Structured Streaming IncrementalExecution

2017-06-28 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 838effb98 -> e68aed70f [SPARK-21216][SS] Hive strategies missed in Structured Streaming IncrementalExecution ## What changes were proposed in this pull request? If someone creates a HiveSession, the planner in `IncrementalExecution`

spark git commit: [SPARK-21329][SS] Make EventTimeWatermarkExec explicitly UnaryExecNode

2017-07-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 40c7add3a -> e5bb26174 [SPARK-21329][SS] Make EventTimeWatermarkExec explicitly UnaryExecNode ## What changes were proposed in this pull request? Making EventTimeWatermarkExec explicitly UnaryExecNode /cc tdas zsxwing ##

spark git commit: [SPARK-21267][SS][DOCS] Update Structured Streaming Documentation

2017-07-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 4e53a4edd -> 576fd4c3a [SPARK-21267][SS][DOCS] Update Structured Streaming Documentation ## What changes were proposed in this pull request? Few changes to the Structured Streaming documentation - Clarify that the entire stream input

spark git commit: [SPARK-19525][CORE] Add RDD checkpoint compression support

2017-04-28 Thread zsxwing
ess` to enable/disable it. Credit goes to aramesh117 Closes #17024 ## How was this patch tested? The new unit test. Author: Shixiong Zhu <shixi...@databricks.com> Author: Aaditya Ramesh <aram...@conviva.com> Closes #17789 from zsxwing/pr17024. Project: http://git-wip-us.apache.org/repos

spark git commit: [SPARK-19525][CORE] Add RDD checkpoint compression support

2017-04-28 Thread zsxwing
ess` to enable/disable it. Credit goes to aramesh117 Closes #17024 ## How was this patch tested? The new unit test. Author: Shixiong Zhu <shixi...@databricks.com> Author: Aaditya Ramesh <aram...@conviva.com> Closes #17789 from zsxwing/pr17024. (cherry pic

spark git commit: [SPARK-21596][SS] Ensure places calling HDFSMetadataLog.get check the return value

2017-08-09 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 7446be332 -> f6d56d2f1 [SPARK-21596][SS] Ensure places calling HDFSMetadataLog.get check the return value Same PR as #18799 but for branch 2.2. Main discussion the other PR. When I was investigating a flaky test, I realized

spark git commit: [SPARK-21597][SS] Fix a potential overflow issue in EventTimeStats

2017-08-02 Thread zsxwing
ted? The new unit tests Author: Shixiong Zhu <shixi...@databricks.com> Closes #18803 from zsxwing/avg. (cherry picked from commit 7f63e85b47a93434030482160e88fe63bf9cff4e) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/re

spark git commit: [SPARK-21597][SS] Fix a potential overflow issue in EventTimeStats

2017-08-02 Thread zsxwing
ted? The new unit tests Author: Shixiong Zhu <shixi...@databricks.com> Closes #18803 from zsxwing/avg. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/7f63e85b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/7f63

spark git commit: [SPARK-21374][CORE] Fix reading globbed paths from S3 into DF with disabled FS cache

2017-08-07 Thread zsxwing
Author: Andrey Taptunov <taptu...@amazon.com> Closes #18848 from zsxwing/review-pr18623. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/43f9c84b Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/43f9c84b Diff:

spark git commit: [SPARK-21565][SS] Propagate metadata in attribute replacement.

2017-08-07 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 43f9c84b6 -> fa92a7be7 [SPARK-21565][SS] Propagate metadata in attribute replacement. ## What changes were proposed in this pull request? Propagate metadata in attribute replacement during streaming execution. This is necessary for

spark git commit: [SPARK-21565][SS] Propagate metadata in attribute replacement.

2017-08-07 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 4f7ec3a31 -> cce25b360 [SPARK-21565][SS] Propagate metadata in attribute replacement. ## What changes were proposed in this pull request? Propagate metadata in attribute replacement during streaming execution. This is necessary for

spark git commit: [SPARK-21517][CORE] Avoid copying memory when transfer chunks remotely

2017-07-25 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 300807c6e -> 16612638f [SPARK-21517][CORE] Avoid copying memory when transfer chunks remotely ## What changes were proposed in this pull request? In our production cluster, OOM happens when NettyBlockRpcServer receives OpenBlocks

spark git commit: [SPARK-19965][SS] DataFrame batch reader may fail to infer partitions when reading FileStreamSink's output

2017-05-03 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 f0e80aa2d -> 36d807906 [SPARK-19965][SS] DataFrame batch reader may fail to infer partitions when reading FileStreamSink's output ## The Problem Right now DataFrame batch reader may fail to infer partitions when reading

spark git commit: [SPARK-19965][SS] DataFrame batch reader may fail to infer partitions when reading FileStreamSink's output

2017-05-03 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 527fc5d0c -> 6b9e49d12 [SPARK-19965][SS] DataFrame batch reader may fail to infer partitions when reading FileStreamSink's output ## The Problem Right now DataFrame batch reader may fail to infer partitions when reading FileStreamSink's

spark git commit: [SPARK-20600][SS] KafkaRelation should be pretty printed in web UI

2017-05-11 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 dd9e3b2c9 -> 5844151bc [SPARK-20600][SS] KafkaRelation should be pretty printed in web UI ## What changes were proposed in this pull request? User-friendly name of `KafkaRelation` in web UI (under Details for Query). ### Before

spark git commit: [SPARK-20600][SS] KafkaRelation should be pretty printed in web UI

2017-05-11 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 3aa4e464a -> 7144b5180 [SPARK-20600][SS] KafkaRelation should be pretty printed in web UI ## What changes were proposed in this pull request? User-friendly name of `KafkaRelation` in web UI (under Details for Query). ### Before

spark git commit: [SPARK-20373][SQL][SS] Batch queries with `Dataset/DataFrame.withWatermark()` does not execute

2017-05-09 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 d191b962d -> 7600a7ab6 [SPARK-20373][SQL][SS] Batch queries with `Dataset/DataFrame.withWatermark()` does not execute ## What changes were proposed in this pull request? Any Dataset/DataFrame batch query with the operation

spark git commit: [SPARK-20373][SQL][SS] Batch queries with `Dataset/DataFrame.withWatermark()` does not execute

2017-05-09 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master f79aa285c -> c0189abc7 [SPARK-20373][SQL][SS] Batch queries with `Dataset/DataFrame.withWatermark()` does not execute ## What changes were proposed in this pull request? Any Dataset/DataFrame batch query with the operation

spark git commit: [SPARK-20717][SS] Minor tweaks to the MapGroupsWithState behavior

2017-05-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master d2416925c -> 499ba2cb4 [SPARK-20717][SS] Minor tweaks to the MapGroupsWithState behavior ## What changes were proposed in this pull request? Timeout and state data are two independent entities and should be settable independently.

spark git commit: [SPARK-20717][SS] Minor tweaks to the MapGroupsWithState behavior

2017-05-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 82ae1f0ac -> a79a120a8 [SPARK-20717][SS] Minor tweaks to the MapGroupsWithState behavior ## What changes were proposed in this pull request? Timeout and state data are two independent entities and should be settable independently.

spark git commit: [SPARK-20716][SS] StateStore.abort() should not throw exceptions

2017-05-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 0bd918f67 -> 82ae1f0ac [SPARK-20716][SS] StateStore.abort() should not throw exceptions ## What changes were proposed in this pull request? StateStore.abort() should do a best effort attempt to clean up temporary resources. It should

spark git commit: [SPARK-20716][SS] StateStore.abort() should not throw exceptions

2017-05-15 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master e1aaab1e2 -> 271175e2b [SPARK-20716][SS] StateStore.abort() should not throw exceptions ## What changes were proposed in this pull request? StateStore.abort() should do a best effort attempt to clean up temporary resources. It should not

spark git commit: [SPARK-20788][CORE] Fix the Executor task reaper's false alarm warning logs

2017-05-17 Thread zsxwing
ask is finishing but being killed at the same time. The fix is pretty easy, just flip the "finished" flag when a task is successful. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #18021 from zsxwing/SPARK-20788. Project: http://git-wip-

spark git commit: [SPARK-20788][CORE] Fix the Executor task reaper's false alarm warning logs

2017-05-17 Thread zsxwing
ask is finishing but being killed at the same time. The fix is pretty easy, just flip the "finished" flag when a task is successful. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #18021 from zsxwing/SPARK-20788. (cherry

spark git commit: [SPARK-13747][CORE] Add ThreadUtils.awaitReady and disallow Await.ready

2017-05-17 Thread zsxwing
low `Await.ready`. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #17763 from zsxwing/awaitready. (cherry picked from commit 324a904d8e80089d8865e4c7edaedb92ab2ec1b2) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project: http://git-wi

spark git commit: [SPARK-13747][CORE] Add ThreadUtils.awaitReady and disallow Await.ready

2017-05-17 Thread zsxwing
low `Await.ready`. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #17763 from zsxwing/awaitready. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/324a904d Tree: http://git-wip-us.a

spark git commit: [SPARK-20529][CORE] Allow worker and master work with a proxy server

2017-05-16 Thread zsxwing
How was this patch tested? The new added unit test. Author: Shixiong Zhu <shixi...@databricks.com> Closes #17821 from zsxwing/SPARK-20529. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/9150bca4 Tree: http://git-wip-us.apache.

spark git commit: [SPARK-20529][CORE] Allow worker and master work with a proxy server

2017-05-16 Thread zsxwing
How was this patch tested? The new added unit test. Author: Shixiong Zhu <shixi...@databricks.com> Closes #17821 from zsxwing/SPARK-20529. (cherry picked from commit 9150bca47e4b8782e20441386d3d225eb5f2f404) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project: http://git-wi

spark git commit: [SPARK-20702][CORE] TaskContextImpl.markTaskCompleted should not hide the original error

2017-05-12 Thread zsxwing
ted` to propagate the original error. It also fixes an issue that `TaskCompletionListenerException.getMessage` doesn't include `previousError`. ## How was this patch tested? New unit tests. Author: Shixiong Zhu <shixi...@databricks.com> Closes #17942 from zsxwing/SPARK-20702. Project: h

spark git commit: [SPARK-20702][CORE] TaskContextImpl.markTaskCompleted should not hide the original error

2017-05-12 Thread zsxwing
ter to `TaskContextImpl.markTaskCompleted` to propagate the original error. It also fixes an issue that `TaskCompletionListenerException.getMessage` doesn't include `previousError`. ## How was this patch tested? New unit tests. Author: Shixiong Zhu <shixi...@databricks.com> Closes #17942 from zsxwing/SPARK-20702. (cher

spark git commit: [SPARK-20714][SS] Fix match error when watermark is set with timeout = no timeout / processing timeout

2017-05-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 7123ec8e1 -> f14246959 [SPARK-20714][SS] Fix match error when watermark is set with timeout = no timeout / processing timeout ## What changes were proposed in this pull request? When watermark is set, and timeout conf is NoTimeout or

spark git commit: [SPARK-20714][SS] Fix match error when watermark is set with timeout = no timeout / processing timeout

2017-05-12 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 7d6ff3910 -> 0d3a63193 [SPARK-20714][SS] Fix match error when watermark is set with timeout = no timeout / processing timeout ## What changes were proposed in this pull request? When watermark is set, and timeout conf is NoTimeout or

spark git commit: [SPARK-20979][SS] Add RateSource to generate values for tests and benchmark

2017-06-12 Thread zsxwing
e added tests. Author: Shixiong Zhu <shixi...@databricks.com> Author: Michael Armbrust <mich...@databricks.com> Closes #18199 from zsxwing/rate. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/74a432d3 Tree: http://git-wip

spark git commit: [SPARK-21123][DOCS][STRUCTURED STREAMING] Options for file stream source are in a wrong table - version to fix 2.1

2017-06-20 Thread zsxwing
ted the structured streaming programming guide. zsxwing This is the PR to fix version 2.1 as discussed in PR #18342 Author: assafmendelson <assaf.mendel...@gmail.com> Closes #18363 from assafmendelson/spark-21123-for-spark2.1. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Com

spark git commit: [SPARK-21147][SS] Throws an analysis exception when a user-specified schema is given in socket/rate sources

2017-06-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master ad459cfb1 -> 7a00c658d [SPARK-21147][SS] Throws an analysis exception when a user-specified schema is given in socket/rate sources ## What changes were proposed in this pull request? This PR proposes to throw an exception if a schema is

spark git commit: [SPARK-21167][SS] Decode the path generated by File sink to handle special characters

2017-06-22 Thread zsxwing
ers. ## How was this patch tested? The added unit test. Author: Shixiong Zhu <shixi...@databricks.com> Closes #18381 from zsxwing/SPARK-21167. (cherry picked from commit d66b143eec7f604595089f72d8786edbdcd74282) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project:

spark git commit: [SPARK-21167][SS] Decode the path generated by File sink to handle special characters

2017-06-22 Thread zsxwing
How was this patch tested? The added unit test. Author: Shixiong Zhu <shixi...@databricks.com> Closes #18381 from zsxwing/SPARK-21167. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/d66b143e Tree: http://git-wip-us.a

spark git commit: [SPARK-20599][SS] ConsoleSink should work with (batch)

2017-06-22 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 19331b8e4 -> e55a105ae [SPARK-20599][SS] ConsoleSink should work with (batch) ## What changes were proposed in this pull request? Currently, if we read a batch and want to display it on the console sink, it will lead to a runtime exception.

spark git commit: [SPARK-21192][SS] Preserve State Store provider class configuration across StreamingQuery restarts

2017-06-23 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 1ebe7ffe0 -> 2ebd0838d [SPARK-21192][SS] Preserve State Store provider class configuration across StreamingQuery restarts ## What changes were proposed in this pull request? If the SQL conf for StateStore provider class is changed

spark git commit: [SPARK-21153] Use project instead of expand in tumbling windows

2017-06-26 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 6b3d02285 -> 5282bae04 [SPARK-21153] Use project instead of expand in tumbling windows ## What changes were proposed in this pull request? Time windowing in Spark currently performs an Expand + Filter, because there is no way to

spark git commit: [SPARK-21123][DOCS][STRUCTURED STREAMING] Options for file stream source are in a wrong table

2017-06-19 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 f7fcdec6c -> 7b50736c4 [SPARK-21123][DOCS][STRUCTURED STREAMING] Options for file stream source are in a wrong table ## What changes were proposed in this pull request? The description for several options of File Source for

spark git commit: [SPARK-21123][DOCS][STRUCTURED STREAMING] Options for file stream source are in a wrong table

2017-06-19 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master e92ffe6f1 -> 66a792cd8 [SPARK-21123][DOCS][STRUCTURED STREAMING] Options for file stream source are in a wrong table ## What changes were proposed in this pull request? The description for several options of File Source for structured

spark git commit: [SPARK-20844] Remove experimental from Structured Streaming APIs

2017-05-26 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 0fd84b05d -> d935e0a9d [SPARK-20844] Remove experimental from Structured Streaming APIs Now that Structured Streaming has been out for several Spark releases and has large production use cases, the `Experimental` label is no longer

spark git commit: [SPARK-20014] Optimize mergeSpillsWithFileStream method

2017-05-26 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master d935e0a9d -> 473d7552a [SPARK-20014] Optimize mergeSpillsWithFileStream method ## What changes were proposed in this pull request? When the individual partition size in a spill is small, mergeSpillsWithTransferTo method does many small

spark git commit: [SPARK-20844] Remove experimental from Structured Streaming APIs

2017-05-26 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 92837aeb4 -> 2b59ed4f1 [SPARK-20844] Remove experimental from Structured Streaming APIs Now that Structured Streaming has been out for several Spark releases and has large production use cases, the `Experimental` label is no longer

spark git commit: [SPARK-19372][SQL] Fix throwing a Java exception at df.filter() due to 64KB bytecode size limit

2017-05-26 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 f99456b5f -> 92837aeb4 [SPARK-19372][SQL] Fix throwing a Java exception at df.filter() due to 64KB bytecode size limit ## What changes were proposed in this pull request? When an expression for `df.filter()` has many nodes (e.g.

spark git commit: [SPARK-20843][CORE] Add a config to set driver terminate timeout

2017-05-26 Thread zsxwing
How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #18126 from zsxwing/SPARK-20843. (cherry picked from commit 6c1dbd6fc8d49acf7c1c902d2ebf89ed5e788a4e) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project: http://git-wip-us.apache.org/

spark git commit: [SPARK-20843][CORE] Add a config to set driver terminate timeout

2017-05-26 Thread zsxwing
How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #18126 from zsxwing/SPARK-20843. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/6c1dbd6f Tree: http://git-wip-us.apache.org/repos/

spark git commit: [SPARK-20907][TEST] Use testQuietly for test suites that generate long log output

2017-05-29 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 dc51be1e7 -> 26640a269 [SPARK-20907][TEST] Use testQuietly for test suites that generate long log output ## What changes were proposed in this pull request? Suppress console output by using `testQuietly` in test suites ## How was

spark git commit: [SPARK-20907][TEST] Use testQuietly for test suites that generate long log output

2017-05-29 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master ef9fd920c -> c9749068e [SPARK-20907][TEST] Use testQuietly for test suites that generate long log output ## What changes were proposed in this pull request? Suppress console output by using `testQuietly` in test suites ## How was this

spark git commit: [SPARK-19372][SQL] Fix throwing a Java exception at df.filter() due to 64KB bytecode size limit

2017-05-16 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 9150bca47 -> 6f62e9d9b [SPARK-19372][SQL] Fix throwing a Java exception at df.filter() due to 64KB bytecode size limit ## What changes were proposed in this pull request? When an expression for `df.filter()` has many nodes (e.g. 400),

[2/2] spark git commit: [SPARK-20883][SPARK-20376][SS] Refactored StateStore APIs and added conf to choose implementation

2017-05-30 Thread zsxwing
[SPARK-20883][SPARK-20376][SS] Refactored StateStore APIs and added conf to choose implementation ## What changes were proposed in this pull request? A bunch of changes to the StateStore APIs and implementation. Current state store API has a bunch of problems that cause too many transient

[1/2] spark git commit: [SPARK-20883][SPARK-20376][SS] Refactored StateStore APIs and added conf to choose implementation

2017-05-30 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 4bb6a53eb -> fa757ee1d http://git-wip-us.apache.org/repos/asf/spark/blob/fa757ee1/sql/core/src/test/scala/org/apache/spark/sql/execution/streaming/state/StateStoreSuite.scala

spark git commit: [SPARK-19968][SS] Use a cached instance of `KafkaProducer` instead of creating one every batch.

2017-05-29 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 1c7db00c7 -> 96a4d1d08 [SPARK-19968][SS] Use a cached instance of `KafkaProducer` instead of creating one every batch. ## What changes were proposed in this pull request? In summary, cost of recreating a KafkaProducer for writing every

spark git commit: [SPARK-19968][SS] Use a cached instance of `KafkaProducer` instead of creating one every batch.

2017-05-29 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 3b79e4cda -> f6730a70c [SPARK-19968][SS] Use a cached instance of `KafkaProducer` instead of creating one every batch. ## What changes were proposed in this pull request? In summary, cost of recreating a KafkaProducer for writing

spark git commit: [SPARK-20894][SS] Resolve the checkpoint location in driver and use the resolved path in state store

2017-05-31 Thread zsxwing
org/apache/spark/sql/execution/datasources/DataSource.scala#L402), it doesn't make things worse. ## How was this patch tested? The new added test. Author: Shixiong Zhu <shixi...@databricks.com> Closes #18149 from zsxwing/SPARK-20894. Project: http://git-wip-us.apache.org/repos/asf/s

spark git commit: [SPARK-20940][CORE] Replace IllegalAccessError with IllegalStateException

2017-05-31 Thread zsxwing
8168 from zsxwing/SPARK-20940. (cherry picked from commit 24db35826a81960f08e3eb68556b0f51781144e1) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/a607a26b Tree:

spark git commit: [SPARK-20940][CORE] Replace IllegalAccessError with IllegalStateException

2017-05-31 Thread zsxwing
8168 from zsxwing/SPARK-20940. (cherry picked from commit 24db35826a81960f08e3eb68556b0f51781144e1) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/cd870c0c Tree:

spark git commit: [SPARK-20940][CORE] Replace IllegalAccessError with IllegalStateException

2017-05-31 Thread zsxwing
8168 from zsxwing/SPARK-20940. (cherry picked from commit 24db35826a81960f08e3eb68556b0f51781144e1) Signed-off-by: Shixiong Zhu <shixi...@databricks.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/dade85f7 Tree:

spark git commit: [SPARK-20940][CORE] Replace IllegalAccessError with IllegalStateException

2017-05-31 Thread zsxwing
org/jira/browse/SPARK-20666) is an example of killing SparkContext due to `IllegalAccessError`). I think the correct type of exception in AccumulatorV2 should be `IllegalStateException`. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #18168 fro

spark git commit: [SPARK-20979][SS] Add RateSource to generate values for tests and benchmark

2017-06-13 Thread zsxwing
e added tests. Author: Shixiong Zhu <shixi...@databricks.com> Author: Michael Armbrust <mich...@databricks.com> Closes #18199 from zsxwing/rate. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/220943d8 Tree: http://git-wip

spark git commit: [SPARK-20464][SS] Add a job group and description for streaming queries and fix cancellation of running jobs using the job group

2017-05-01 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 38edb9256 -> 6f0d29672 [SPARK-20464][SS] Add a job group and description for streaming queries and fix cancellation of running jobs using the job group ## What changes were proposed in this pull request? Job group: adding a job group

spark git commit: [SPARK-20464][SS] Add a job group and description for streaming queries and fix cancellation of running jobs using the job group

2017-05-01 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master ab30590f4 -> 6fc6cf88d [SPARK-20464][SS] Add a job group and description for streaming queries and fix cancellation of running jobs using the job group ## What changes were proposed in this pull request? Job group: adding a job group is

spark git commit: [SPARK-20603][SS][TEST] Set default number of topic partitions to 1 to reduce the load

2017-05-05 Thread zsxwing
PR changes `offsets.topic.num.partitions` from the default value 50 to 1 to make creating `__consumer_offsets` (50 partitions -> 1 partition) much faster. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #17863 from zsxwing/fix-kafka-flaky-te

spark git commit: [SPARK-20603][SS][TEST] Set default number of topic partitions to 1 to reduce the load

2017-05-05 Thread zsxwing
ges `offsets.topic.num.partitions` from the default value 50 to 1 to make creating `__consumer_offsets` (50 partitions -> 1 partition) much faster. ## How was this patch tested? Jenkins Author: Shixiong Zhu <shixi...@databricks.com> Closes #17863 from zsxwing/fix-kafka-flaky-test. P

spark git commit: [SPARK-20874][EXAMPLES] Add Structured Streaming Kafka Source to examples project

2017-05-25 Thread zsxwing
so that people can run `bin/run-example StructuredKafkaWordCount ...`. ## How was this patch tested? manually tested it. Author: Shixiong Zhu <shixi...@databricks.com> Closes #18101 from zsxwing/add-missing-example-dep. (cherry picked from commit 98c3852986a2cb5f2d249d6c8ef602be283bd90e) S

spark git commit: [SPARK-20874][EXAMPLES] Add Structured Streaming Kafka Source to examples project

2017-05-25 Thread zsxwing
ple can run `bin/run-example StructuredKafkaWordCount ...`. ## How was this patch tested? manually tested it. Author: Shixiong Zhu <shixi...@databricks.com> Closes #18101 from zsxwing/add-missing-example-dep. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http:

spark git commit: [SPARK-20792][SS] Support same timeout operations in mapGroupsWithState function in batch queries as in streaming queries

2017-05-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master bbd8d7def -> 9d6661c82 [SPARK-20792][SS] Support same timeout operations in mapGroupsWithState function in batch queries as in streaming queries ## What changes were proposed in this pull request? Currently, in the batch queries, timeout

spark git commit: [SPARK-20792][SS] Support same timeout operations in mapGroupsWithState function in batch queries as in streaming queries

2017-05-21 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 3aad5982a -> cfd1bf0be [SPARK-20792][SS] Support same timeout operations in mapGroupsWithState function in batch queries as in streaming queries ## What changes were proposed in this pull request? Currently, in the batch queries,

spark git commit: [SPARK-20991][SQL] BROADCAST_TIMEOUT conf should be a TimeoutConf

2017-06-05 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master bc537e40a -> 88a23d3de [SPARK-20991][SQL] BROADCAST_TIMEOUT conf should be a TimeoutConf ## What changes were proposed in this pull request? The construction of BROADCAST_TIMEOUT conf should take the TimeUnit argument as a TimeoutConf.

spark git commit: [SPARK-21113][CORE] Read ahead input stream to amortize disk IO cost …

2017-09-18 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 7c7266208 -> 1e978b17d [SPARK-21113][CORE] Read ahead input stream to amortize disk IO cost … Profiling some of our big jobs, we see that around 30% of the time is being spent in reading the spill files from disk. In order to amortize
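
The read-ahead idea named in this commit can be illustrated with a short sketch. This is not the Spark implementation: it is a minimal Python example in which the class name `ReadAheadStream`, the `chunk_size` parameter, and the single-thread prefetch design are all assumptions made for illustration. While the caller processes one chunk, a background thread is already reading the next one, so disk latency overlaps with computation.

```python
import io
from concurrent.futures import ThreadPoolExecutor


class ReadAheadStream:
    """Hypothetical wrapper that prefetches the next chunk in the background."""

    def __init__(self, raw, chunk_size=64 * 1024):
        self._raw = raw
        self._chunk_size = chunk_size
        self._pool = ThreadPoolExecutor(max_workers=1)
        # Start reading the first chunk before anyone asks for it.
        self._pending = self._pool.submit(self._raw.read, self._chunk_size)

    def read_chunk(self):
        """Return the prefetched chunk (b"" at end of stream) and schedule the next read."""
        chunk = self._pending.result()
        if chunk:
            self._pending = self._pool.submit(self._raw.read, self._chunk_size)
        return chunk

    def close(self):
        self._pool.shutdown(wait=True)
        self._raw.close()


def read_all(stream):
    """Drain a ReadAheadStream and return the concatenated bytes."""
    parts = []
    while True:
        chunk = stream.read_chunk()
        if not chunk:
            break
        parts.append(chunk)
    return b"".join(parts)
```

A single worker thread keeps reads sequential, which is why only one chunk is ever in flight; a real implementation would also need to handle read errors and bounded buffers.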

spark git commit: [SPARK-21988] Add default stats to StreamingExecutionRelation.

2017-09-14 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master ddd7f5e11 -> 054ddb2f5 [SPARK-21988] Add default stats to StreamingExecutionRelation. ## What changes were proposed in this pull request? Add default stats to StreamingExecutionRelation. ## How was this patch tested? existing unit tests

spark git commit: [SPARK-22094][SS] processAllAvailable should check the query state

2017-09-21 Thread zsxwing
Zhu <zsxw...@gmail.com> Closes #19314 from zsxwing/SPARK-22094. (cherry picked from commit fedf6961be4e99139eb7ab08d5e6e29187ea5ccf) Signed-off-by: Shixiong Zhu <zsxw...@gmail.com> Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/a

spark git commit: [SPARK-22094][SS] processAllAvailable should check the query state

2017-09-21 Thread zsxwing
uld return. ## How was this patch tested? The new unit test. Author: Shixiong Zhu <zsxw...@gmail.com> Closes #19314 from zsxwing/SPARK-22094. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/fedf6961 Tree: http:

spark git commit: [SPARK-22203][SQL] Add job description for file listing Spark jobs

2017-10-04 Thread zsxwing
Author: Shixiong Zhu <zsxw...@gmail.com> Closes #19432 from zsxwing/SPARK-22203. Project: http://git-wip-us.apache.org/repos/asf/spark/repo Commit: http://git-wip-us.apache.org/repos/asf/spark/commit/c8affec2 Tree: http://git-wip-us.apache.org/repos/asf/spark/tree/c8af

spark git commit: [SPARK-21947][SS] Check and report error when monotonically_increasing_id is used in streaming query

2017-10-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 08b204fd2 -> debcbec74 [SPARK-21947][SS] Check and report error when monotonically_increasing_id is used in streaming query ## What changes were proposed in this pull request? `monotonically_increasing_id` doesn't work in Structured

spark git commit: [MINOR][SS] keyWithIndexToNumValues" -> "keyWithIndexToValue"

2017-10-13 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 3823dc88d -> 1bb8b7604 [MINOR][SS] keyWithIndexToNumValues" -> "keyWithIndexToValue" ## What changes were proposed in this pull request? This PR changes `keyWithIndexToNumValues` to `keyWithIndexToValue`. There will be directories on

spark git commit: [SPARK-9104][CORE] Expose Netty memory metrics in Spark

2017-09-05 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 6a2325448 -> 445f1790a [SPARK-9104][CORE] Expose Netty memory metrics in Spark ## What changes were proposed in this pull request? This PR exposes Netty memory usage for Spark's `TransportClientFactory` and `TransportServer`, including

spark git commit: [SPARK-21901][SS] Define toString for StateOperatorProgress

2017-09-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master acdf45fb5 -> fa0092bdd [SPARK-21901][SS] Define toString for StateOperatorProgress ## What changes were proposed in this pull request? Just `StateOperatorProgress.toString` + few formatting fixes ## How was this patch tested? Local

spark git commit: [SPARK-21901][SS] Define toString for StateOperatorProgress

2017-09-06 Thread zsxwing
Repository: spark Updated Branches: refs/heads/branch-2.2 9afab9a52 -> 342cc2a4c [SPARK-21901][SS] Define toString for StateOperatorProgress ## What changes were proposed in this pull request? Just `StateOperatorProgress.toString` + few formatting fixes ## How was this patch tested? Local

spark git commit: [SPARK-21701][CORE] Enable RPC client to use ` SO_RCVBUF` and ` SO_SNDBUF` in SparkConf.

2017-08-24 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master d3abb3699 -> 763b83ee8 [SPARK-21701][CORE] Enable RPC client to use ` SO_RCVBUF` and ` SO_SNDBUF` in SparkConf. ## What changes were proposed in this pull request? TCP parameters like SO_RCVBUF and SO_SNDBUF can be set in SparkConf, and

spark git commit: [SPARK-21880][WEB UI] In the SQL table page, modify jobs trace information

2017-09-01 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 0bdbefe9d -> 12f0d2422 [SPARK-21880][WEB UI] In the SQL table page, modify jobs trace information ## What changes were proposed in this pull request? As shown below, for example, When the job 5 is running, It was a mistake to think that

spark git commit: [SPARK-22230] Swap per-row order in state store restore.

2017-10-09 Thread zsxwing
Repository: spark Updated Branches: refs/heads/master 155ab6347 -> 71c2b81aa [SPARK-22230] Swap per-row order in state store restore. ## What changes were proposed in this pull request? In state store restore, for each row, put the saved state before the row in the iterator instead of after.
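
The ordering change described in this commit can be sketched in a few lines. This is an illustrative Python example, not Spark's code: `restore`, `saved_state`, and `key_of` are hypothetical names, and a plain dict stands in for the state store. For each input row, the saved state for its key (if any) is emitted before the row itself rather than after it.

```python
def restore(rows, saved_state, key_of):
    """Yield, for each row, its saved state (if present) followed by the row."""
    for row in rows:
        saved = saved_state.get(key_of(row))
        if saved is not None:
            yield saved  # saved state comes first
        yield row        # then the incoming row


# Example: rows are (key, value) pairs; the dict stands in for a state store.
rows = [("a", 1), ("b", 2)]
saved_state = {"a": ("a", 10)}
restored = list(restore(rows, saved_state, key_of=lambda r: r[0]))
```

Here the row keyed `"a"` has saved state and the row keyed `"b"` does not, so only the first row is preceded by a restored entry.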

spark git commit: [SPARK-21988][SS] Implement StreamingRelation.computeStats to fix explain

2017-10-11 Thread zsxwing
ted? - unit tests: `StreamingRelation.computeStats` and `StreamingExecutionRelation.computeStats`. - regression tests: `explain join with a normal source` and `explain join with MemoryStream`. Author: Shixiong Zhu <zsxw...@gmail.com> Closes #19465 from zsxwing/SPARK-21988. Project: http:

spark git commit: [SPARK-22638][SS] Use a separate queue for StreamingQueryListenerBus

2017-12-01 Thread zsxwing
non-streaming events, streaming query listeners don't need to wait for other Spark listeners and can catch up. ## How was this patch tested? Jenkins Author: Shixiong Zhu <zsxw...@gmail.com> Closes #19838 from zsxwing/SPARK-22638. Project: http://git-wip-us.apache.org/repos/asf/spark/re
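
The benefit of a separate queue can be illustrated with a small sketch. It is written in Python under stated assumptions and does not reproduce Spark's listener bus: the class `PerListenerQueueBus` and its methods are hypothetical. Each listener gets its own queue and dispatch thread, so a slow listener delays only its own queue and the others can catch up independently.

```python
import queue
import threading


class PerListenerQueueBus:
    """Hypothetical event bus giving every listener its own queue and thread."""

    _STOP = object()  # sentinel telling a dispatch thread to exit

    def __init__(self):
        self._queues = []
        self._threads = []

    def add_listener(self, listener):
        q = queue.Queue()
        t = threading.Thread(target=self._dispatch, args=(q, listener))
        self._queues.append(q)
        self._threads.append(t)
        t.start()

    def _dispatch(self, q, listener):
        while True:
            event = q.get()
            if event is self._STOP:
                return
            listener(event)

    def post(self, event):
        # Enqueueing never blocks on a slow listener; each queue drains independently.
        for q in self._queues:
            q.put(event)

    def stop(self):
        for q in self._queues:
            q.put(self._STOP)
        for t in self._threads:
            t.join()
```

Listeners should be registered before events are posted, since a listener added later only sees events posted after its registration.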
