[GitHub] spark issue #19465: [SPARK-21988][SS]Implement StreamingRelation.computeStat...

2017-10-11 Thread joseph-torres
Github user joseph-torres commented on the issue: https://github.com/apache/spark/pull/19465 LGTM --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail:

[GitHub] spark pull request #19269: [SPARK-22026][SQL][WIP] data source v2 write path

2017-10-11 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19269#discussion_r144097061 --- Diff: sql/core/src/main/java/org/apache/spark/sql/sources/v2/writer/DataSourceV2Writer.java --- @@ -0,0 +1,88 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request #19466: [SPARK-22237] [CORE] Fix spark submit file downlo...

2017-10-11 Thread loneknightpy
Github user loneknightpy commented on a diff in the pull request: https://github.com/apache/spark/pull/19466#discussion_r144100747 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmit.scala --- @@ -366,6 +366,16 @@ object SparkSubmit extends CommandLineUtils with

[GitHub] spark pull request #19466: [SPARK-22237] [CORE] Fix spark submit file downlo...

2017-10-11 Thread loneknightpy
Github user loneknightpy closed the pull request at: https://github.com/apache/spark/pull/19466

[GitHub] spark issue #19474: [SPARK-22252][SQL] FileFormatWriter should respect the i...

2017-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19474 **[Test build #82639 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82639/testReport)** for PR 19474 at commit

[GitHub] spark issue #19451: SPARK-22181 Adds ReplaceExceptWithNotFilter rule

2017-10-11 Thread sathiyapk
Github user sathiyapk commented on the issue: https://github.com/apache/spark/pull/19451 @gengliangwang Ready for another review :) > put case ... in a new line Are you sure? I thought according to the coding style, while calling on a partial function if there is

[GitHub] spark pull request #18386: [SPARK-21165] [SQL] [2.2] Use executedPlan instea...

2017-10-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18386#discussion_r144097084 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala --- @@ -111,9 +111,18 @@ object FileFormatWriter

[GitHub] spark issue #19474: [SPARK-22252][SQL] FileFormatWriter should respect the i...

2017-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19474 Merged build finished. Test PASSed.

[GitHub] spark issue #19474: [SPARK-22252][SQL] FileFormatWriter should respect the i...

2017-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19474 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82639/ Test PASSed.

[GitHub] spark pull request #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTas...

2017-10-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18979#discussion_r144094437 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/BasicWriteStatsTracker.scala --- @@ -57,7 +60,14 @@ class

[GitHub] spark issue #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should not coll...

2017-10-11 Thread akopich
Github user akopich commented on the issue: https://github.com/apache/spark/pull/18924 @WeichenXu123, yes sure. But can this wait until this PR is merged?

[GitHub] spark pull request #19448: [SPARK-22217] [SQL] ParquetFileFormat to support ...

2017-10-11 Thread rdblue
Github user rdblue commented on a diff in the pull request: https://github.com/apache/spark/pull/19448#discussion_r144088592 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -138,6 +138,10 @@ class

[GitHub] spark issue #30: SPARK-1004. PySpark on YARN

2017-10-11 Thread swaapnika-guntaka
Github user swaapnika-guntaka commented on the issue: https://github.com/apache/spark/pull/30 I see a Java EOF exception when I run a Python packaged jar (using JDK 8) with Spark 2.2. I'm trying to run it with the command below: `time bash -x $SPARK_HOME/bin/spark-submit

[GitHub] spark issue #19474: [SPARK-22252][SQL] FileFormatWriter should respect the i...

2017-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19474 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82638/ Test PASSed.

[GitHub] spark issue #19474: [SPARK-22252][SQL] FileFormatWriter should respect the i...

2017-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19474 Merged build finished. Test PASSed.

[GitHub] spark pull request #19448: [SPARK-22217] [SQL] ParquetFileFormat to support ...

2017-10-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19448#discussion_r144086925 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCommitterSuite.scala --- @@ -0,0 +1,152 @@ +/* + *

[GitHub] spark issue #19474: [SPARK-22252][SQL] FileFormatWriter should respect the i...

2017-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19474 **[Test build #82638 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82638/testReport)** for PR 19474 at commit

[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...

2017-10-11 Thread markhamstra
Github user markhamstra commented on the issue: https://github.com/apache/spark/pull/19468 > iron out the kinks A large chunk of the difficulty in identifying and ironing out kinks in such a project is the difficulty of writing adequate tests of the scheduler code. I'd

[GitHub] spark issue #19467: [SPARK-22238] Fix plan resolution bug caused by EnsureSt...

2017-10-11 Thread brkyvz
Github user brkyvz commented on the issue: https://github.com/apache/spark/pull/19467 cc @tdas

[GitHub] spark pull request #19448: [SPARK-22217] [SQL] ParquetFileFormat to support ...

2017-10-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19448#discussion_r144082508 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -138,6 +138,10 @@ class

[GitHub] spark pull request #18460: [SPARK-21247][SQL] Type comparison should respect...

2017-10-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/18460#discussion_r144082145 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -100,6 +101,17 @@ object TypeCoercion {

[GitHub] spark issue #19429: [SPARK-20055] [Docs] Added documentation for loading csv...

2017-10-11 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/19429 +1 for more detailed documentation (we should steer away from `inferSchema`)

[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec

2017-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18805 **[Test build #82644 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82644/testReport)** for PR 18805 at commit

[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec

2017-10-11 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/18805 retest this please

[GitHub] spark issue #18805: [SPARK-19112][CORE] Support for ZStandard codec

2017-10-11 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/18805 The [code](https://github.com/luben/zstd-jni/blob/master/src/main/java/com/github/luben/zstd/util/Native.java) overwrites the original exception message that might shed some light on what's going

[GitHub] spark pull request #18460: [SPARK-21247][SQL] Type comparison should respect...

2017-10-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/18460#discussion_r144078834 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -100,6 +101,17 @@ object TypeCoercion {

[GitHub] spark pull request #18460: [SPARK-21247][SQL] Type comparison should respect...

2017-10-11 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18460#discussion_r144075681 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -100,6 +101,17 @@ object TypeCoercion {

[GitHub] spark issue #19473: [SPARK-22251][SQL] Metric 'aggregate time' is incorrect ...

2017-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19473 Merged build finished. Test PASSed.

[GitHub] spark issue #19473: [SPARK-22251][SQL] Metric 'aggregate time' is incorrect ...

2017-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19473 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82637/ Test PASSed.

[GitHub] spark issue #19473: [SPARK-22251][SQL] Metric 'aggregate time' is incorrect ...

2017-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19473 **[Test build #82637 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82637/testReport)** for PR 19473 at commit

[GitHub] spark issue #18460: [SPARK-21247][SQL] Type comparison should respect case-s...

2017-10-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18460 Please let me know if there is anything more to do. Thank you always, @gatorsmile.

[GitHub] spark issue #19471: [SPARK-22245][SQL] partitioned data set should always pu...

2017-10-11 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19471 +1 for this change. BTW, wow, there are lots of test case failures: 81 failures.

[GitHub] spark issue #19448: [SPARK-22217] [SQL] ParquetFileFormat to support arbitra...

2017-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19448 **[Test build #82643 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82643/testReport)** for PR 19448 at commit

[GitHub] spark issue #19466: [SPARK-22237] [CORE] Fix spark submit file download for ...

2017-10-11 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/19466 > here with this change all the resources should be fetched from local driver That's a good point. You should download resources just to add them to the driver's classpath, but executors

[GitHub] spark issue #19472: [WIP][SPARK-22246][SQL] Use MemoryBlock in UnsafeRow, Un...

2017-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19472 **[Test build #82642 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82642/testReport)** for PR 19472 at commit

[GitHub] spark issue #19456: [SPARK] [Scheduler] Configurable default scheduling mode

2017-10-11 Thread blyncsy-david-lewis
Github user blyncsy-david-lewis commented on the issue: https://github.com/apache/spark/pull/19456 I have a multiuser application where I use the userId as the name of the scheduling pool so that users are balanced equally by Spark, and within a user's workload I can set the

[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...

2017-10-11 Thread BryanCutler
Github user BryanCutler commented on the issue: https://github.com/apache/spark/pull/18664 Thanks @ueshin, I agree it is better to convert the timezone to Python system local first and then localize to make it tz-naive, in case the Python system local tz is different than
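The conversion order described in this comment (convert to the local timezone first, then drop the timezone to get a tz-naive value) can be sketched with the Python standard library. This is only an illustration under stated assumptions, not the PR's code: the helper `to_local_naive` is hypothetical, and a fixed UTC-8 offset stands in for the Python system local timezone so that the result is reproducible.

```python
from datetime import datetime, timedelta, timezone

# Assumption: a fixed offset stands in for the Python system local timezone.
ASSUMED_LOCAL_TZ = timezone(timedelta(hours=-8))


def to_local_naive(ts, local_tz=ASSUMED_LOCAL_TZ):
    """Convert a tz-aware timestamp to local wall-clock time, then drop tzinfo."""
    if ts.tzinfo is None:
        raise ValueError("expected a tz-aware timestamp")
    # Step 1: express the same instant in the local timezone.
    local = ts.astimezone(local_tz)
    # Step 2: strip the timezone so the value becomes tz-naive.
    return local.replace(tzinfo=None)


utc_ts = datetime(2017, 10, 11, 18, 30, tzinfo=timezone.utc)
naive_local = to_local_naive(utc_ts)
print(naive_local)  # 2017-10-11 10:30:00
```

Doing the two steps in the opposite order would drop the timezone while the value is still expressed in UTC, so the naive result would show UTC wall-clock time rather than local time.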

[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...

2017-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18664 **[Test build #82641 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82641/testReport)** for PR 18664 at commit

[GitHub] spark pull request #19448: [SPARK-22217] [SQL] ParquetFileFormat to support ...

2017-10-11 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19448#discussion_r144065810 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -138,6 +138,13 @@ class

[GitHub] spark issue #19472: [WIP][SPARK-22246][SQL] Use MemoryBlock in UnsafeRow, Un...

2017-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19472 Merged build finished. Test FAILed.

[GitHub] spark pull request #19448: [SPARK-22217] [SQL] ParquetFileFormat to support ...

2017-10-11 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19448#discussion_r144065074 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -138,6 +138,13 @@ class

[GitHub] spark issue #19472: [WIP][SPARK-22246][SQL] Use MemoryBlock in UnsafeRow, Un...

2017-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19472 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82636/ Test FAILed.

[GitHub] spark issue #19472: [WIP][SPARK-22246][SQL] Use MemoryBlock in UnsafeRow, Un...

2017-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19472 **[Test build #82636 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82636/testReport)** for PR 19472 at commit

[GitHub] spark pull request #19448: [SPARK-22217] [SQL] ParquetFileFormat to support ...

2017-10-11 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19448#discussion_r144065041 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCommitterSuite.scala --- @@ -0,0 +1,149 @@ +/* +

[GitHub] spark issue #18098: [SPARK-16944][Mesos] Improve data locality when launchin...

2017-10-11 Thread PerilousApricot
Github user PerilousApricot commented on the issue: https://github.com/apache/spark/pull/18098 Is there any documentation for this feature? How would I expose my topology to Mesos/Spark?

[GitHub] spark issue #19474: [SPARK-22252][SQL] FileFormatWriter should respect the i...

2017-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19474 **[Test build #82640 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82640/testReport)** for PR 19474 at commit

[GitHub] spark pull request #19337: [SPARK-22114][ML][MLLIB]add epsilon for LDA

2017-10-11 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/19337#discussion_r144060858 --- Diff: mllib/src/main/scala/org/apache/spark/ml/clustering/LDA.scala --- @@ -224,6 +224,24 @@ private[clustering] trait LDAParams extends Params with

[GitHub] spark issue #19474: [SPARK-22252][SQL] FileFormatWriter should respect the i...

2017-10-11 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19474 For a simple command `Seq(1 -> "a").toDF("i", "j").write.parquet("/tmp/qwe")`, the UI before this PR:

[GitHub] spark issue #19474: [SPARK-22252][SQL] FileFormatWriter should respect the i...

2017-10-11 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19474 cc @gatorsmile @viirya

[GitHub] spark issue #19474: [SPARK-22252][SQL] FileFormatWriter should respect the i...

2017-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19474 **[Test build #82639 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82639/testReport)** for PR 19474 at commit

[GitHub] spark issue #19468: [SPARK-18278] [Scheduler] Spark on Kubernetes - Basic Sc...

2017-10-11 Thread foxish
Github user foxish commented on the issue: https://github.com/apache/spark/pull/19468 @skonto, there was some discussion about this on the SPIP. We see them as separate and independent issues, with the pluggable API being a long-term goal. It would involve a working group of people

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-11 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 @viirya: the new data writer API will allow for a broader set of stats to be propagated back from workers. When you are working with the object stores, a useful stat to get back is throttle

[GitHub] spark issue #19041: [SPARK-21097][CORE] Add option to recover cached data

2017-10-11 Thread brad-kaiser
Github user brad-kaiser commented on the issue: https://github.com/apache/spark/pull/19041 Thanks @vanzin, I will work on these comments.

[GitHub] spark issue #19473: [SPARK-22251] Metric 'aggregate time' is incorrect when ...

2017-10-11 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/19473 Could you add [SQL] to the title? That makes it easier for others to scan PRs.
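The title convention requested here, a JIRA ID followed by bracketed component tags such as [SQL], can be checked mechanically. The following is a minimal sketch, assuming titles follow the `[SPARK-NNNNN][COMPONENT] summary` shape seen in this listing; `parse_title`, `has_component_tag`, and the regular expression are hypothetical names for illustration and are not part of Spark's tooling.

```python
import re

# Matches one leading bracketed tag such as "[SPARK-22251]" or "[SQL]",
# allowing optional whitespace before it.
TAG_RE = re.compile(r"\s*\[([^\]]+)\]")


def parse_title(title):
    """Split a PR title into its leading bracketed tags and the remaining summary."""
    tags = []
    pos = 0
    while True:
        m = TAG_RE.match(title, pos)
        if m is None:
            break
        tags.append(m.group(1))
        pos = m.end()
    return tags, title[pos:].strip()


def has_component_tag(title):
    """True if at least one leading tag is neither a JIRA ID nor a WIP marker."""
    tags, _ = parse_title(title)
    return any(not t.startswith("SPARK-") and t != "WIP" for t in tags)


before = "[SPARK-22251] Metric 'aggregate time' is incorrect when codegen is off"
after = "[SPARK-22251][SQL] Metric 'aggregate time' is incorrect when codegen is off"
print(has_component_tag(before), has_component_tag(after))  # False True
```

Note that this sketch treats any other bracketed tag, including a branch marker such as `[2.2]`, as a component; a stricter check would need an explicit list of component names.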

[GitHub] spark issue #19473: [SPARK-22251] Metric 'aggregate time' is incorrect when ...

2017-10-11 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/19473 LGTM

[GitHub] spark issue #19474: [SPARK-22252][SQL] FileFormatWriter should respect the i...

2017-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19474 **[Test build #82638 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82638/testReport)** for PR 19474 at commit

[GitHub] spark pull request #19474: [SPARK-22252][SQL] FileFormatWriter should respec...

2017-10-11 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/19474 [SPARK-22252][SQL] FileFormatWriter should respect the input query schema ## What changes were proposed in this pull request? In https://github.com/apache/spark/pull/18064, we allowed

[GitHub] spark pull request #19077: [SPARK-21860][core]Improve memory reuse for heap ...

2017-10-11 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/19077#discussion_r144038007 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/complexTypeCreator.scala --- @@ -116,9 +116,10 @@ private [sql] object

[GitHub] spark pull request #19077: [SPARK-21860][core]Improve memory reuse for heap ...

2017-10-11 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/19077#discussion_r144037194 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java --- @@ -48,6 +49,15 @@ public long size() { }

[GitHub] spark pull request #19077: [SPARK-21860][core]Improve memory reuse for heap ...

2017-10-11 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/19077#discussion_r144037771 --- Diff: sql/catalyst/src/main/java/org/apache/spark/sql/catalyst/expressions/codegen/UnsafeArrayWriter.java --- @@ -57,7 +57,7 @@ public void

[GitHub] spark pull request #19077: [SPARK-21860][core]Improve memory reuse for heap ...

2017-10-11 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/19077#discussion_r144037069 --- Diff: common/unsafe/src/main/java/org/apache/spark/unsafe/memory/MemoryBlock.java --- @@ -48,6 +49,15 @@ public long size() { }

[GitHub] spark pull request #18386: [SPARK-21165] [SQL] [2.2] Use executedPlan instea...

2017-10-11 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18386#discussion_r144037286 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/FileFormatWriter.scala --- @@ -111,9 +111,18 @@ object FileFormatWriter

[GitHub] spark issue #19456: [SPARK] [Scheduler] Configurable default scheduling mode

2017-10-11 Thread jiangxb1987
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/19456 Could you elaborate on the scenario in which you would need these settings to be configurable?

[GitHub] spark pull request #19464: [SPARK-22233] [core] Allow user to filter out emp...

2017-10-11 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/19464#discussion_r144030461 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -196,7 +196,10 @@ class HadoopRDD[K, V]( // add the credentials here

[GitHub] spark pull request #19316: [SPARK-22097][CORE]Request an accurate memory aft...

2017-10-11 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/19316#discussion_r144034339 --- Diff: core/src/main/scala/org/apache/spark/storage/memory/MemoryStore.scala --- @@ -388,7 +388,13 @@ private[spark] class MemoryStore( //

[GitHub] spark pull request #19464: [SPARK-22233] [core] Allow user to filter out emp...

2017-10-11 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/19464#discussion_r144029852 --- Diff: docs/configuration.md --- @@ -1211,6 +1211,14 @@ Apart from these, the following properties are also available, and may be useful

[GitHub] spark issue #19473: [SPARK-22251] Metric 'aggregate time' is incorrect when ...

2017-10-11 Thread ala
Github user ala commented on the issue: https://github.com/apache/spark/pull/19473 @hvanhovell

[GitHub] spark pull request #19464: [SPARK-22233] [core] Allow user to filter out emp...

2017-10-11 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/19464#discussion_r144031728 --- Diff: core/src/test/scala/org/apache/spark/FileSuite.scala --- @@ -510,4 +510,16 @@ class FileSuite extends SparkFunSuite with LocalSparkContext {

[GitHub] spark pull request #19464: [SPARK-22233] [core] Allow user to filter out emp...

2017-10-11 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/19464#discussion_r144030646 --- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala --- @@ -196,7 +196,10 @@ class HadoopRDD[K, V]( // add the credentials here

[GitHub] spark pull request #19464: [SPARK-22233] [core] Allow user to filter out emp...

2017-10-11 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/19464#discussion_r144031167 --- Diff: core/src/test/scala/org/apache/spark/FileSuite.scala --- @@ -510,4 +510,16 @@ class FileSuite extends SparkFunSuite with LocalSparkContext {

[GitHub] spark issue #19250: [SPARK-12297] Table timezone correction for Timestamps

2017-10-11 Thread zivanfi
Github user zivanfi commented on the issue: https://github.com/apache/spark/pull/19250 @attilajeges has just found a problem with the behavior specified in the requirements: * Partitions of a table can use different file formats. * As a result, a single table can have data

[GitHub] spark issue #19437: [SPARK-22131][MESOS] Mesos driver secrets

2017-10-11 Thread susanxhuynh
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/19437 @vanzin Would you mind reviewing this PR? It is a follow-up to ArtRand's secrets PR.

[GitHub] spark issue #19473: [SPARK-22251] Metric 'aggregate time' is incorrect when ...

2017-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19473 **[Test build #82637 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82637/testReport)** for PR 19473 at commit

[GitHub] spark issue #19437: [SPARK-22131][MESOS] Mesos driver secrets

2017-10-11 Thread susanxhuynh
Github user susanxhuynh commented on the issue: https://github.com/apache/spark/pull/19437 @skonto I haven't tested with TLS; I'll work on that in the next couple of days.

[GitHub] spark pull request #19473: [SPARK-22251] Metric 'aggregate time' is incorrec...

2017-10-11 Thread ala
GitHub user ala opened a pull request: https://github.com/apache/spark/pull/19473 [SPARK-22251] Metric 'aggregate time' is incorrect when codegen is off ## What changes were proposed in this pull request? Adding the code for setting the 'aggregate time' metric to non-codegen

[GitHub] spark pull request #18931: [SPARK-21717][SQL] Decouple consume functions of ...

2017-10-11 Thread a10y
Github user a10y commented on a diff in the pull request: https://github.com/apache/spark/pull/18931#discussion_r144025706 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/WholeStageCodegenExec.scala --- @@ -175,6 +175,25 @@ trait CodegenSupport extends SparkPlan

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-11 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18979 I don't have a strong opinion against this. Incorrect size is an issue, but I can't think of a better solution for now...

[GitHub] spark issue #19472: [WIP][SPARK-22246][SQL] Use MemoryBlock in UnsafeRow, Un...

2017-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19472 **[Test build #82636 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82636/testReport)** for PR 19472 at commit

[GitHub] spark issue #18924: [SPARK-14371] [MLLIB] OnlineLDAOptimizer should not coll...

2017-10-11 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/18924 @akopich LGTM. Also, do you have time to create a PR to resolve the "random seed not working" issue mentioned by @hhbyyh? Thanks!

[GitHub] spark issue #19472: [WIP][SPARK-22246][SQL] Use MemoryBlock in UnsafeRow, Un...

2017-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19472 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82634/ Test FAILed.

[GitHub] spark issue #19472: [WIP][SPARK-22246][SQL] Use MemoryBlock in UnsafeRow, Un...

2017-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19472 Merged build finished. Test FAILed.

[GitHub] spark issue #19472: [WIP][SPARK-22246][SQL] Use MemoryBlock in UnsafeRow, Un...

2017-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19472 **[Test build #82634 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82634/testReport)** for PR 19472 at commit

[GitHub] spark pull request #19448: [SPARK-22217] [SQL] ParquetFileFormat to support ...

2017-10-11 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19448#discussion_r143996060 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -138,6 +138,13 @@ class

[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18029 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82635/ Test PASSed.

[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18029 Merged build finished. Test PASSed.

[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18029 **[Test build #82635 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82635/testReport)** for PR 18029 at commit

[GitHub] spark pull request #19448: [SPARK-22217] [SQL] ParquetFileFormat to support ...

2017-10-11 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19448#discussion_r143992362 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetCommitterSuite.scala --- @@ -0,0 +1,149 @@ +/* +

[GitHub] spark pull request #19448: [SPARK-22217] [SQL] ParquetFileFormat to support ...

2017-10-11 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19448#discussion_r143992319 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -138,6 +138,13 @@ class

[GitHub] spark pull request #19448: [SPARK-22217] [SQL] ParquetFileFormat to support ...

2017-10-11 Thread steveloughran
Github user steveloughran commented on a diff in the pull request: https://github.com/apache/spark/pull/19448#discussion_r143992018 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/parquet/ParquetFileFormat.scala --- @@ -138,6 +138,13 @@ class

[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18029 **[Test build #82635 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82635/testReport)** for PR 18029 at commit

[GitHub] spark pull request #19464: [SPARK-22233] [core] Allow user to filter out emp...

2017-10-11 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19464#discussion_r143984294 --- Diff: core/src/main/scala/org/apache/spark/rdd/NewHadoopRDD.scala --- @@ -122,7 +122,10 @@ class NewHadoopRDD[K, V]( case _ => }

[GitHub] spark issue #19464: [SPARK-22233] [core] Allow user to filter out empty spli...

2017-10-11 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19464 Interesting. On the one hand I don't like adding yet another flag that changes behavior, when the user often can't meaningfully decide to set it. There is probably no value in processing an empty

[GitHub] spark issue #19471: [SPARK-22245][SQL] partitioned data set should always pu...

2017-10-11 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/19471 Fair enough to me. To check whether this change is reasonable, we might send an email to the dev/user lists to solicit feedback. I saw marmbrus doing so when adding the json API;

[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...

2017-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17819 Merged build finished. Test PASSed.

[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...

2017-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17819 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82632/ Test PASSed. ---

[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...

2017-10-11 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17819 **[Test build #82632 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82632/testReport)** for PR 17819 at commit

[GitHub] spark issue #19471: [SPARK-22245][SQL] partitioned data set should always pu...

2017-10-11 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19471 Waiting for more feedback before moving forward :) Another thing I want to point out: for `sql("create table t using parquet options(skipHiveMetadata=true) location '/tmp/t'")`, it works

[GitHub] spark issue #19471: [SPARK-22245][SQL] partitioned data set should always pu...

2017-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19471 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82633/ Test FAILed. ---

[GitHub] spark issue #19471: [SPARK-22245][SQL] partitioned data set should always pu...

2017-10-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19471 Merged build finished. Test FAILed.

[GitHub] spark issue #19471: [SPARK-22245][SQL] partitioned data set should always pu...

2017-10-11 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/19471 Does this change affect some other tests for the overlapped cases like
