[GitHub] spark pull request #19288: [SPARK-22075][ML] unpersist datasets cached by Pe...

2017-09-20 Thread zhengruifeng
GitHub user zhengruifeng opened a pull request: https://github.com/apache/spark/pull/19288 [SPARK-22075][ML] unpersist datasets cached by PeriodicRDDCheckpointer ## What changes were proposed in this pull request? PeriodicRDDCheckpointer will automatically persist the last 3

[GitHub] spark issue #19288: [SPARK-22075][ML] unpersist datasets cached by PeriodicR...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19288 **[Test build #81970 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81970/testReport)** for PR 19288 at commit

[GitHub] spark issue #19145: [spark-21933][yarn] Spark Streaming request more executo...

2017-09-20 Thread klion26
Github user klion26 commented on the issue: https://github.com/apache/spark/pull/19145 @squito I agree with you that this should be handled by YARN. In my opinion, this is a form of defensive programming. Spark Streaming and Structured Streaming will both request more

[GitHub] spark issue #19160: [SPARK-21934][CORE] Expose Shuffle Netty memory usage to...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19160 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81962/ Test PASSed.

[GitHub] spark issue #19181: [SPARK-21907][CORE] oom during spill

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19181 **[Test build #81998 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81998/testReport)** for PR 19181 at commit

[GitHub] spark issue #19287: [SPARK-22074][Core] Task killed by other attempt task sh...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19287 **[Test build #81987 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81987/testReport)** for PR 19287 at commit

[GitHub] spark issue #19287: [SPARK-22074][Core] Task killed by other attempt task sh...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19287 Merged build finished. Test PASSed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark issue #19287: [SPARK-22074][Core] Task killed by other attempt task sh...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19287 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81987/ Test PASSed.

[GitHub] spark issue #18853: [SPARK-21646][SQL] CommonType for binary comparison

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18853 **[Test build #81999 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81999/testReport)** for PR 18853 at commit

[GitHub] spark pull request #15544: [SPARK-17997] [SQL] Add an aggregation function f...

2017-09-20 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/15544#discussion_r139886249 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala --- @@ -0,0 +1,235 @@

[GitHub] spark issue #19286: [SPARK-21338][SQL][FOLLOW-UP] Implement isCascadingTrunc...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19286 **[Test build #81976 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81976/testReport)** for PR 19286 at commit

[GitHub] spark issue #19271: [SPARK-22053][SS] Stream-stream inner join in Append Mod...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19271 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81972/ Test FAILed.

[GitHub] spark issue #19286: [SPARK-21338][SQL][FOLLOW-UP] Implement isCascadingTrunc...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19286 Merged build finished. Test FAILed.

[GitHub] spark issue #19286: [SPARK-21338][SQL][FOLLOW-UP] Implement isCascadingTrunc...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19286 Merged build finished. Test FAILed.

[GitHub] spark issue #19286: [SPARK-21338][SQL][FOLLOW-UP] Implement isCascadingTrunc...

2017-09-20 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19286 retest this please.

[GitHub] spark issue #19288: [SPARK-22075][ML] GBTs unpersist datasets cached by Peri...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19288 **[Test build #81975 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81975/testReport)** for PR 19288 at commit

[GitHub] spark issue #19243: [SPARK-21780][R] Simpler Dataset.sample API in R

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19243 **[Test build #81978 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81978/testReport)** for PR 19243 at commit

[GitHub] spark issue #19281: [SPARK-21998][SQL] SortMergeJoinExec did not calculate i...

2017-09-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19281 > So we ended up not getting the correct outputOrdering during physical planning stage before Sort nodes are added to the children. What's the harm of this? I think only

[GitHub] spark issue #19287: [SPARK-22074][Core] Task killed by other attempt task sh...

2017-09-20 Thread xuanyuanking
Github user xuanyuanking commented on the issue: https://github.com/apache/spark/pull/19287 `signal 9` retest this please

[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19285 **[Test build #81981 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81981/testReport)** for PR 19285 at commit

[GitHub] spark pull request #19277: [SPARK-22058][CORE]the BufferedInputStream will n...

2017-09-20 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19277#discussion_r139894546 --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- @@ -351,11 +351,11 @@ private[spark] object EventLoggingListener

[GitHub] spark pull request #19277: [SPARK-22058][CORE]the BufferedInputStream will n...

2017-09-20 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19277#discussion_r139894343 --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- @@ -351,11 +351,11 @@ private[spark] object EventLoggingListener

[GitHub] spark pull request #19277: [SPARK-22058][CORE]the BufferedInputStream will n...

2017-09-20 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19277#discussion_r139894416 --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- @@ -351,11 +351,11 @@ private[spark] object EventLoggingListener

[GitHub] spark pull request #19246: [SPARK-22025][PySpark] Speeding up fromInternal f...

2017-09-20 Thread maver1ck
Github user maver1ck closed the pull request at: https://github.com/apache/spark/pull/19246

[GitHub] spark pull request #15544: [SPARK-17997] [SQL] Add an aggregation function f...

2017-09-20 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/15544#discussion_r139896879 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala --- @@ -0,0 +1,235 @@

[GitHub] spark issue #19288: [SPARK-22075][ML] GBTs unpersist datasets cached by Peri...

2017-09-20 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/19288 @srowen I checked `LDA`: although `unpersistDataSet` is not called in it, no intermediate cached RDDs are generated after `fit()`. Then I checked `Pregel` and found that each call of

[GitHub] spark issue #19145: [spark-21933][yarn] Spark Streaming request more executo...

2017-09-20 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19145 @klion26, this problem is not specific to Spark Streaming and Structured Streaming; any Spark application can run into it. This is basically a YARN problem and looks hard to

[GitHub] spark issue #19160: [SPARK-21934][CORE] Expose Shuffle Netty memory usage to...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19160 Merged build finished. Test PASSed.

[GitHub] spark issue #19160: [SPARK-21934][CORE] Expose Shuffle Netty memory usage to...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19160 **[Test build #81962 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81962/testReport)** for PR 19160 at commit

[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19285 **[Test build #81971 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81971/testReport)** for PR 19285 at commit

[GitHub] spark pull request #19243: [SPARK-21780][R] Simpler Dataset.sample API in R

2017-09-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19243#discussion_r139884675 --- Diff: R/pkg/R/DataFrame.R --- @@ -998,33 +998,44 @@ setMethod("unique", #' sparkR.session() #' path <- "path/to/file.json" #' df <-

[GitHub] spark pull request #15544: [SPARK-17997] [SQL] Add an aggregation function f...

2017-09-20 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/15544#discussion_r139887066 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala --- @@ -0,0 +1,235 @@

[GitHub] spark pull request #19286: [SPARK-21338][SQL][FOLLOW-UP] Implement isCascadi...

2017-09-20 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19286#discussion_r139887244 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/jdbc/JDBCSuite.scala --- @@ -747,6 +747,19 @@ class JDBCSuite extends SparkFunSuite

[GitHub] spark issue #19289: [SPARK-22076][SQL] Expand.projections should not be a St...

2017-09-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19289 cc @liufengdb @gatorsmile @jiangxb1987

[GitHub] spark pull request #19289: [SPARK-22076][SQL] Expand.projections should not ...

2017-09-20 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/19289 [SPARK-22076][SQL] Expand.projections should not be a Stream ## What changes were proposed in this pull request? Spark with Scala 2.10 fails with a group by cube:

[GitHub] spark issue #19289: [SPARK-22076][SQL] Expand.projections should not be a St...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19289 **[Test build #81974 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81974/testReport)** for PR 19289 at commit

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r139889045 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/SupportsWriteUnsafeRow.java --- @@ -0,0 +1,44 @@ +/* + * Licensed to the

[GitHub] spark issue #19287: [SPARK-22074][Core] Task killed by other attempt task sh...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19287 Merged build finished. Test FAILed.

[GitHub] spark issue #19281: [SPARK-21998][SQL] SortMergeJoinExec did not calculate i...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19281 Merged build finished. Test FAILed.

[GitHub] spark issue #18193: [SPARK-15616] [SQL] CatalogRelation should fallback to H...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18193 Merged build finished. Test FAILed.

[GitHub] spark issue #19281: [SPARK-21998][SQL] SortMergeJoinExec did not calculate i...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19281 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81966/ Test FAILed.

[GitHub] spark issue #19286: [SPARK-21338][SQL][FOLLOW-UP] Implement isCascadingTrunc...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19286 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81973/ Test FAILed.

[GitHub] spark issue #18193: [SPARK-15616] [SQL] CatalogRelation should fallback to H...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18193 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81968/ Test FAILed.

[GitHub] spark issue #19286: [SPARK-21338][SQL][FOLLOW-UP] Implement isCascadingTrunc...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19286 **[Test build #81965 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81965/testReport)** for PR 19286 at commit

[GitHub] spark issue #19271: [SPARK-22053][SS] Stream-stream inner join in Append Mod...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19271 **[Test build #81972 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81972/testReport)** for PR 19271 at commit

[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18805 **[Test build #81967 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81967/testReport)** for PR 18805 at commit

[GitHub] spark issue #19289: [SPARK-22076][SQL] Expand.projections should not be a St...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19289 Merged build finished. Test FAILed.

[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19285 Merged build finished. Test FAILed.

[GitHub] spark issue #19289: [SPARK-22076][SQL] Expand.projections should not be a St...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19289 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81974/ Test FAILed.

[GitHub] spark issue #19288: [SPARK-22075][ML] GBTs unpersist datasets cached by Peri...

2017-09-20 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/19288 retest this please

[GitHub] spark issue #19288: [SPARK-22075][ML] GBTs unpersist datasets cached by Peri...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19288 Merged build finished. Test FAILed.

[GitHub] spark issue #19288: [SPARK-22075][ML] GBTs unpersist datasets cached by Peri...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19288 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81970/ Test FAILed.

[GitHub] spark issue #19287: [SPARK-22074][Core] Task killed by other attempt task sh...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19287 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81969/ Test FAILed.

[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18805 Merged build finished. Test FAILed.

[GitHub] spark issue #19286: [SPARK-21338][SQL][FOLLOW-UP] Implement isCascadingTrunc...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19286 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81965/ Test FAILed.

[GitHub] spark issue #19271: [SPARK-22053][SS] Stream-stream inner join in Append Mod...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19271 Merged build finished. Test FAILed.

[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18805 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81967/ Test FAILed.

[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19285 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81971/ Test FAILed.

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r139889741 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/WriteToDataSourceV2Command.scala --- @@ -0,0 +1,114 @@ +/* + *

[GitHub] spark pull request #15544: [SPARK-17997] [SQL] Add an aggregation function f...

2017-09-20 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/15544#discussion_r139890936 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala --- @@ -0,0 +1,235 @@

[GitHub] spark issue #19289: [SPARK-22076][SQL] Expand.projections should not be a St...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19289 **[Test build #81977 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81977/testReport)** for PR 19289 at commit

[GitHub] spark issue #19277: [SPARK-22058][CORE]the BufferedInputStream will not be c...

2017-09-20 Thread zuotingbing
Github user zuotingbing commented on the issue: https://github.com/apache/spark/pull/19277 I am not sure. If we move this line `val in = new BufferedInputStream(fs.open(log))` into the try-catch block, we have to define `var in: BufferedInputStream = null` beforehand, and use `catch { case e:

[GitHub] spark issue #19277: [SPARK-22058][CORE]the BufferedInputStream will not be c...

2017-09-20 Thread jerryshao
Github user jerryshao commented on the issue: https://github.com/apache/spark/pull/19277 Strictly speaking, this line `new BufferedInputStream(fs.open(log))` can also throw an exception; shouldn't you wrap it in try-catch as well?

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r139892702 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriter.java --- @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache

[GitHub] spark issue #19281: [SPARK-21998][SQL] SortMergeJoinExec did not calculate i...

2017-09-20 Thread maryannxue
Github user maryannxue commented on the issue: https://github.com/apache/spark/pull/19281 Thank you very much for the feedback, @tejasapatil, @gatorsmile! All the suggestions/comments have been addressed by my latest check-in.

[GitHub] spark issue #19281: [SPARK-21998][SQL] SortMergeJoinExec did not calculate i...

2017-09-20 Thread maryannxue
Github user maryannxue commented on the issue: https://github.com/apache/spark/pull/19281 @cloud-fan Please refer to https://issues.apache.org/jira/browse/SPARK-18591. The plan is to apply the optimization during the physical planning stage, specifically when creating Aggregate

[GitHub] spark issue #18636: added support word2vec training with additional data

2017-09-20 Thread LeoIV
Github user LeoIV commented on the issue: https://github.com/apache/spark/pull/18636 The problem emerges in cases where you built a whole pipeline. You have a set of documents you want to classify. These documents have some additional features and they are preprocessed in the

[GitHub] spark pull request #19285: [SPARK-22068][CORE]Reduce the duplicate code betw...

2017-09-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19285#discussion_r139897940 --- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala --- @@ -233,17 +235,13 @@ private[spark] class MemoryStore( }

[GitHub] spark pull request #18805: [SPARK-19112][CORE] Support for ZStandard codec

2017-09-20 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/18805#discussion_r139898945 --- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala --- @@ -216,3 +218,33 @@ private final class SnappyOutputStreamWrapper(os:

[GitHub] spark pull request #19277: [SPARK-22058][CORE]the BufferedInputStream will n...

2017-09-20 Thread zuotingbing
Github user zuotingbing commented on a diff in the pull request: https://github.com/apache/spark/pull/19277#discussion_r139902219 --- Diff: core/src/main/scala/org/apache/spark/scheduler/EventLoggingListener.scala --- @@ -351,11 +351,11 @@ private[spark] object

[GitHub] spark issue #19271: [SPARK-22053][SS] Stream-stream inner join in Append Mod...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19271 **[Test build #81972 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81972/testReport)** for PR 19271 at commit

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-09-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r13951 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataWriter.java --- @@ -0,0 +1,38 @@ +/* + * Licensed to the Apache Software

[GitHub] spark issue #19196: [SPARK-21977] SinglePartition optimizations break certai...

2017-09-20 Thread brkyvz
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/19196 Thanks! Merging to master

[GitHub] spark issue #19289: [SPARK-22076][SQL] Expand.projections should not be a St...

2017-09-20 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/19289 retest this please

[GitHub] spark pull request #19289: [SPARK-22076][SQL] Expand.projections should not ...

2017-09-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19289#discussion_r139890182 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -279,7 +279,13 @@ class Analyzer( * We

[GitHub] spark issue #19288: [SPARK-22075][ML] GBTs unpersist datasets cached by Peri...

2017-09-20 Thread zhengruifeng
Github user zhengruifeng commented on the issue: https://github.com/apache/spark/pull/19288 @srowen In MLlib, `PeriodicRDDCheckpointer` is only used in `GradientBoostedTrees`. I just found that there is another checkpointer, `PeriodicGraphCheckpointer`; I will check it.
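
The pattern this PR thread is concerned with, a helper that keeps only its most recent datasets cached and needs an explicit cleanup call at the end, can be modeled without Spark. The sketch below is a toy under stated assumptions: `FakeDataset`, `BoundedPersister`, `update`, and `unpersistAll` are hypothetical names invented for illustration, and none of this is the code of `PeriodicRDDCheckpointer` itself.

```scala
import scala.collection.mutable

object PeriodicCacheSketch {
  /** Stand-in for an RDD (hypothetical): it only tracks whether it is cached. */
  final class FakeDataset(val id: Int) {
    var cached: Boolean = false
    def persist(): Unit = { cached = true }
    def unpersist(): Unit = { cached = false }
  }

  /** Hypothetical helper that keeps at most `maxPersisted` datasets cached. */
  final class BoundedPersister(maxPersisted: Int) {
    private val persistedQueue = mutable.Queue.empty[FakeDataset]

    /** Caches the new dataset and evicts the oldest ones beyond the limit. */
    def update(ds: FakeDataset): Unit = {
      ds.persist()
      persistedQueue.enqueue(ds)
      while (persistedQueue.size > maxPersisted) {
        persistedQueue.dequeue().unpersist()
      }
    }

    /** Releases whatever is still cached; without this call the last
      * `maxPersisted` datasets stay cached after the loop ends. */
    def unpersistAll(): Unit = {
      while (persistedQueue.nonEmpty) {
        persistedQueue.dequeue().unpersist()
      }
    }
  }
}
```

In this model, a training loop would call `update` once per iteration and `unpersistAll` once after the loop; skipping the final call is the kind of leftover cached data the PR title describes.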

[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...

2017-09-20 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19285 retest this please

[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19285 Merged build finished. Test FAILed.

[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19285 **[Test build #81981 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81981/testReport)** for PR 19285 at commit

[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19285 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81981/ Test FAILed.

[GitHub] spark issue #19246: [SPARK-22025][PySpark] Speeding up fromInternal for Stru...

2017-09-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19246 Yea, thanks. That was a cool patch BTW.

[GitHub] spark issue #19246: [SPARK-22025][PySpark] Speeding up fromInternal for Stru...

2017-09-20 Thread maver1ck
Github user maver1ck commented on the issue: https://github.com/apache/spark/pull/19246 @HyukjinKwon I created this before https://github.com/apache/spark/pull/19249, which greatly decreases the number of function calls. I agree we can close it.

[GitHub] spark issue #19288: [SPARK-22075][ML] GBTs unpersist datasets cached by Peri...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19288 **[Test build #81982 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81982/testReport)** for PR 19288 at commit

[GitHub] spark issue #19277: [SPARK-22058][CORE]the BufferedInputStream will not be c...

2017-09-20 Thread zuotingbing
Github user zuotingbing commented on the issue: https://github.com/apache/spark/pull/19277 @jerryshao If this line `new BufferedInputStream(fs.open(log))` throws an exception, we do not need to close the BufferedInputStream, because its construction failed.
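
The exchange on this PR about `new BufferedInputStream(fs.open(log))` comes down to where the stream is constructed relative to the `try`. A minimal sketch in plain Scala, assuming a hypothetical helper `withBufferedStream` and a generic `open` function standing in for `fs.open(log)` (neither is taken from the PR):

```scala
import java.io.{BufferedInputStream, ByteArrayInputStream, InputStream}

object SafeStreamOpen {
  /** Hypothetical helper: wraps the opened stream, runs `use`, always closes. */
  def withBufferedStream[A](open: () => InputStream)(use: BufferedInputStream => A): A = {
    // Constructed outside the try: if open() throws, no BufferedInputStream
    // exists yet, so there is nothing for this method to close.
    val in = new BufferedInputStream(open())
    try {
      use(in)
    } finally {
      // Runs whether `use` returned or threw; also closes the wrapped stream.
      in.close()
    }
  }

  /** Small usage example: reads the first byte of an in-memory source. */
  def firstByte(bytes: Array[Byte]): Int =
    withBufferedStream(() => new ByteArrayInputStream(bytes))(_.read())
}
```

Constructing the stream before the `try` avoids the `var in: BufferedInputStream = null` plus null-check pattern mentioned in the thread, at the cost of letting an exception from `open()` propagate to the caller uncaught.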

[GitHub] spark issue #19068: [SPARK-21428][SQL][FOLLOWUP]CliSessionState should point...

2017-09-20 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/19068 @yaooqinn This change works well for me, thanks for the fix! After this change, the Hive client for execution (which points to a dummy local metastore) will never be used when running sql

[GitHub] spark issue #19288: [SPARK-22075][ML][GRAPHX] GBTs/Pregel unpersist datasets...

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19288 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/81975/ Test PASSed.

[GitHub] spark issue #19286: [SPARK-21338][SQL][FOLLOW-UP] Implement isCascadingTrunc...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19286 **[Test build #81973 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81973/testReport)** for PR 19286 at commit

[GitHub] spark pull request #15544: [SPARK-17997] [SQL] Add an aggregation function f...

2017-09-20 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/15544#discussion_r139888528 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala --- @@ -0,0 +1,235 @@

[GitHub] spark pull request #19196: [SPARK-21977] SinglePartition optimizations break...

2017-09-20 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19196 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19287: [SPARK-22074][Core] Task killed by other attempt task sh...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19287 **[Test build #81969 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81969/testReport)** for PR 19287 at commit

[GitHub] spark issue #19281: [SPARK-21998][SQL] SortMergeJoinExec did not calculate i...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19281 **[Test build #81966 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81966/testReport)** for PR 19281 at commit

[GitHub] spark issue #19285: [SPARK-22068][CORE]Reduce the duplicate code between put...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19285 **[Test build #81971 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81971/testReport)** for PR 19285 at commit

[GitHub] spark issue #19286: [SPARK-21338][SQL][FOLLOW-UP] Implement isCascadingTrunc...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19286 **[Test build #81973 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81973/testReport)** for PR 19286 at commit

[GitHub] spark issue #19289: [SPARK-22076][SQL] Expand.projections should not be a St...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19289 **[Test build #81974 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81974/testReport)** for PR 19289 at commit

[GitHub] spark issue #19288: [SPARK-22075][ML] GBTs unpersist datasets cached by Peri...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19288 **[Test build #81970 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81970/testReport)** for PR 19288 at commit

[GitHub] spark issue #18193: [SPARK-15616] [SQL] CatalogRelation should fallback to H...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18193 **[Test build #81968 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81968/testReport)** for PR 18193 at commit

[GitHub] spark issue #19289: [SPARK-22076][SQL] Expand.projections should not be a St...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19289 **[Test build #81979 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81979/testReport)** for PR 19289 at commit

[GitHub] spark issue #19281: [SPARK-21998][SQL] SortMergeJoinExec did not calculate i...

2017-09-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19281 **[Test build #81980 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/81980/testReport)** for PR 19281 at commit

[GitHub] spark pull request #15544: [SPARK-17997] [SQL] Add an aggregation function f...

2017-09-20 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/15544#discussion_r139898372 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/aggregate/ApproxCountDistinctForIntervals.scala --- @@ -0,0 +1,235 @@

[GitHub] spark issue #19243: [SPARK-21780][R] Simpler Dataset.sample API in R

2017-09-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19243 Merged build finished. Test PASSed.
