[GitHub] spark issue #17239: Using map function in spark for huge operation

2017-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17239 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this

[GitHub] spark pull request #17239: Using map function in spark for huge operation

2017-03-09 Thread nischay21
GitHub user nischay21 opened a pull request: https://github.com/apache/spark/pull/17239 Using map function in spark for huge operation We need to calculate distance matrix like jaccard on huge collection of Dataset in spark. Facing couple of issues. Kindly help us to give

[GitHub] spark issue #17177: [SPARK-19834][SQL] csv escape of quote escape

2017-03-09 Thread jbax
Github user jbax commented on the issue: https://github.com/apache/spark/pull/17177 Doesn't seem correct to me. All test cases are using broken CSV and trigger the parser handling of unescaped quotes, where it tries to rescue the data and produce something sensible. See my test case

[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

2017-03-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17237 **[Test build #74305 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74305/testReport)** for PR 17237 at commit

[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

2017-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17237 Merged build finished. Test FAILed.

[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

2017-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17237 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74305/ Test FAILed.

[GitHub] spark issue #14033: [SPARK-16286][SQL] Implement stack table generating func...

2017-03-09 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/14033 just found out that we didn't implement a type coercion rule for `stack`...

[GitHub] spark issue #17226: [SPARK-19882][SQL] Pivot with null as a distinct pivot v...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17226 For me, LGTM

[GitHub] spark pull request #17188: [SPARK-19751][SQL] Throw an exception if bean cla...

2017-03-09 Thread maropu
Github user maropu commented on a diff in the pull request: https://github.com/apache/spark/pull/17188#discussion_r105343821 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala --- @@ -69,7 +69,8 @@ object JavaTypeInference { *

[GitHub] spark issue #17238: getRackForHost returns None if host is unknown by driver

2017-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17238 Can one of the admins verify this patch?

[GitHub] spark pull request #17238: getRackForHost returns None if host is unknown by...

2017-03-09 Thread morenn520
GitHub user morenn520 opened a pull request: https://github.com/apache/spark/pull/17238 getRackForHost returns None if host is unknown by driver ## What changes were proposed in this pull request? https://issues.apache.org/jira/browse/SPARK-19894 ## How was this

[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

2017-03-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17237 **[Test build #74305 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74305/testReport)** for PR 17237 at commit

[GitHub] spark pull request #17175: [SPARK-19468][SQL] Rewrite physical Project opera...

2017-03-09 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17175#discussion_r105342800 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/basicPhysicalOperators.scala --- @@ -78,9 +78,42 @@ case class ProjectExec(projectList:

[GitHub] spark pull request #17224: [SPARK-19882][SQL] Pivot with null as the dictinc...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon closed the pull request at: https://github.com/apache/spark/pull/17224

[GitHub] spark issue #17224: [SPARK-19882][SQL] Pivot with null as the dictinct pivot...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17224 I am closing this per https://github.com/apache/spark/pull/17226#issuecomment-285597434

[GitHub] spark issue #17226: [SPARK-19882][SQL] Pivot with null as a distinct pivot v...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17226 I see. So, `count` in "**Spark 2.1.0** (and presumably 2.0.x/master)" was unexpectedly introduced by the optimization in SPARK-13749 and this behaviour change between 1.6 and master (whether it

[GitHub] spark pull request #17225: [CORE] Support ZStandard Compression

2017-03-09 Thread dongjinleekr
Github user dongjinleekr commented on a diff in the pull request: https://github.com/apache/spark/pull/17225#discussion_r105342714 --- Diff: core/src/main/scala/org/apache/spark/io/CompressionCodec.scala --- @@ -49,13 +50,14 @@ private[spark] object CompressionCodec {

[GitHub] spark pull request #17188: [SPARK-19751][SQL] Throw an exception if bean cla...

2017-03-09 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/17188#discussion_r105342563 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/JavaTypeInference.scala --- @@ -69,7 +69,8 @@ object JavaTypeInference { *

[GitHub] spark issue #17188: [SPARK-19751][SQL] Throw an exception if bean class has ...

2017-03-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17188 **[Test build #74304 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74304/testReport)** for PR 17188 at commit

[GitHub] spark issue #17188: [SPARK-19751][SQL] Throw an exception if bean class has ...

2017-03-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17188 **[Test build #74303 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74303/testReport)** for PR 17188 at commit

[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

2017-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17237 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74301/ Test FAILed.

[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

2017-03-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17237 **[Test build #74301 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74301/testReport)** for PR 17237 at commit

[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

2017-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17237 Merged build finished. Test FAILed.

[GitHub] spark issue #17237: [SPARK-19852][PYSPARK][ML] Update Python API setHandleIn...

2017-03-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17237 **[Test build #74301 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74301/testReport)** for PR 17237 at commit

[GitHub] spark issue #17188: [SPARK-19751][SQL] Throw an exception if bean class has ...

2017-03-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17188 **[Test build #74302 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74302/testReport)** for PR 17188 at commit

[GitHub] spark pull request #17231: [SPARK-19891][SS] Await Batch Lock notified on st...

2017-03-09 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17231

[GitHub] spark pull request #17237: [SPARK-19852][PYSPARK][ML] Update Python API setH...

2017-03-09 Thread VinceShieh
GitHub user VinceShieh opened a pull request: https://github.com/apache/spark/pull/17237 [SPARK-19852][PYSPARK][ML] Update Python API setHandleInvalid for StringIndexer ## What changes were proposed in this pull request? This PR is to maintain API parity with changes made

[GitHub] spark issue #17231: [SPARK-19891][SS] Await Batch Lock notified on stream ex...

2017-03-09 Thread zsxwing
Github user zsxwing commented on the issue: https://github.com/apache/spark/pull/17231 LGTM. Merging to master and 2.1.

[GitHub] spark pull request #17172: [SPARK-19008][SQL] Improve performance of Dataset...

2017-03-09 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17172

[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

2017-03-09 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17172 cool! merging to master!

[GitHub] spark issue #17236: [SPARK-xxxx][SQL] Cannot run intersect/except with map t...

2017-03-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17236 **[Test build #74300 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74300/testReport)** for PR 17236 at commit

[GitHub] spark issue #17236: [SPARK-xxxx][SQL] Cannot run intersect/except with map t...

2017-03-09 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17236 cc @yhuai @sameeragarwal @hvanhovell

[GitHub] spark pull request #17236: [SPARK-xxxx][SQL] Cannot run intersect/except wit...

2017-03-09 Thread cloud-fan
GitHub user cloud-fan opened a pull request: https://github.com/apache/spark/pull/17236 [SPARK-][SQL] Cannot run intersect/except with map type ## What changes were proposed in this pull request? In spark SQL, map type can't be used in equality test/comparison, and

[GitHub] spark issue #17138: [SPARK-17080] [SQL] join reorder

2017-03-09 Thread wzhfy
Github user wzhfy commented on the issue: https://github.com/apache/spark/pull/17138 @nsyca Yes you're right. There's still much room of optimization. We will improve Spark's optimizer gradually :)

[GitHub] spark issue #17109: [SPARK-19740][MESOS]Add support in Spark to pass arbitra...

2017-03-09 Thread tnachen
Github user tnachen commented on the issue: https://github.com/apache/spark/pull/17109 @srowen @mgummelt PTAL

[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17192#discussion_r105337154 --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R --- @@ -1339,6 +1339,11 @@ test_that("column functions", { expect_equal(collect(select(df,

[GitHub] spark pull request #17177: [SPARK-19834][SQL] csv escape of quote escape

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17177#discussion_r105337093 --- Diff: python/pyspark/sql/readwriter.py --- @@ -693,8 +697,8 @@ def text(self, path, compression=None): @since(2.0) def

[GitHub] spark issue #17164: [SPARK-16844][SQL] Support codegen for sort-based aggrea...

2017-03-09 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/17164 @hvanhovell ping

[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

2017-03-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17192#discussion_r105336865 --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R --- @@ -1339,6 +1339,11 @@ test_that("column functions", { expect_equal(collect(select(df,

[GitHub] spark pull request #17192: [SPARK-19849][SQL] Support ArrayType in to_json t...

2017-03-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17192#discussion_r105336776 --- Diff: R/pkg/inst/tests/testthat/test_sparkSQL.R --- @@ -1339,6 +1339,11 @@ test_that("column functions", { expect_equal(collect(select(df,

[GitHub] spark pull request #17234: [SPARK-19892][MLlib] Implement findAnalogies meth...

2017-03-09 Thread felixcheung
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/17234#discussion_r105336208 --- Diff: R/pkg/DESCRIPTION --- @@ -54,5 +54,5 @@ Collate: 'types.R' 'utils.R' 'window.R' -RoxygenNote: 5.0.1

[GitHub] spark issue #17109: [SPARK-19740][MESOS]Add support in Spark to pass arbitra...

2017-03-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17109 **[Test build #74299 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74299/testReport)** for PR 17109 at commit

[GitHub] spark issue #17109: [SPARK-19740][MESOS]Add support in Spark to pass arbitra...

2017-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17109 Merged build finished. Test PASSed.

[GitHub] spark issue #17109: [SPARK-19740][MESOS]Add support in Spark to pass arbitra...

2017-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17109 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74299/ Test PASSed.

[GitHub] spark issue #17170: [SPARK-19825][R][ML] spark.ml R API for FPGrowth

2017-03-09 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/17170 I can see your point, but renaming it only on the R side is not really addressing the issue. Please feel free to open a JIRA on spark.ml FPGrowth and start a discussion there.

[GitHub] spark issue #17109: [SPARK-19740][MESOS]Add support in Spark to pass arbitra...

2017-03-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17109 **[Test build #74299 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74299/testReport)** for PR 17109 at commit

[GitHub] spark issue #17177: [SPARK-19834][SQL] csv escape of quote escape

2017-03-09 Thread ep1804
Github user ep1804 commented on the issue: https://github.com/apache/spark/pull/17177 Documentation for DataFrameReader, DataFrameWriter, DataStreamReader, readwriter.py and streaming.py are written. Check please.

[GitHub] spark issue #17232: [SPARK-18112] [SQL] Support reading data from Hive 2.1 m...

2017-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17232 Merged build finished. Test FAILed.

[GitHub] spark issue #17232: [SPARK-18112] [SQL] Support reading data from Hive 2.1 m...

2017-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17232 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74298/ Test FAILed.

[GitHub] spark issue #17232: [SPARK-18112] [SQL] Support reading data from Hive 2.1 m...

2017-03-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17232 **[Test build #74298 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74298/testReport)** for PR 17232 at commit

[GitHub] spark issue #17226: [SPARK-19882][SQL] Pivot with null as a distinct pivot v...

2017-03-09 Thread aray
Github user aray commented on the issue: https://github.com/apache/spark/pull/17226 @HyukjinKwon There is an inconsistency/regression but it's not being introduced in this PR, it's already there. Take an example without null as a pivot column value like below. The only difference is

[GitHub] spark issue #17235: [SPARK-19320][MESOS][WIP]allow specifying a hard limit o...

2017-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17235 Can one of the admins verify this patch?

[GitHub] spark pull request #17235: [SPARK-19320][MESOS][WIP]allow specifying a hard ...

2017-03-09 Thread yanji84
GitHub user yanji84 opened a pull request: https://github.com/apache/spark/pull/17235 [SPARK-19320][MESOS][WIP]allow specifying a hard limit on number of gpus required in each spark executor when running on mesos ## What changes were proposed in this pull request?

[GitHub] spark issue #17234: [SPARK-19892][MLlib] Implement findAnalogies method for ...

2017-03-09 Thread benradford
Github user benradford commented on the issue: https://github.com/apache/spark/pull/17234 ok to test Jenkins, add to whitelist

[GitHub] spark issue #17234: [SPARK-19892][MLlib] Implement findAnalogies method for ...

2017-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17234 Can one of the admins verify this patch?

[GitHub] spark pull request #17234: [SPARK-19892][MLlib] Implement findAnalogies meth...

2017-03-09 Thread benradford
GitHub user benradford opened a pull request: https://github.com/apache/spark/pull/17234 [SPARK-19892][MLlib] Implement findAnalogies method for Word2VecModel ## What changes were proposed in this pull request? Added findAnalogies method to Word2VecModel for performing

[GitHub] spark pull request #17174: [SPARK-19145][SQL] Timestamp to String casting is...

2017-03-09 Thread tanejagagan
Github user tanejagagan commented on a diff in the pull request: https://github.com/apache/spark/pull/17174#discussion_r105333124 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/TypeCoercion.scala --- @@ -324,14 +324,22 @@ object TypeCoercion {

[GitHub] spark issue #16952: [SPARK-19620][SQL]Fix incorrect exchange coordinator id ...

2017-03-09 Thread carsonwang
Github user carsonwang commented on the issue: https://github.com/apache/spark/pull/16952 @gatorsmile @cloud-fan @yhuai , can you help review and merge this minor one line fix? The code change itself is straightforward.

[GitHub] spark issue #17123: [SPARK-19781][ML] Handle NULLs as well as NaNs in Bucket...

2017-03-09 Thread crackcell
Github user crackcell commented on the issue: https://github.com/apache/spark/pull/17123 @cloud-fan Would you please review my code again? I'm now using `Option` to handle NULLs. :-)

[GitHub] spark issue #17177: [SPARK-19834][SQL] csv escape of quote escape

2017-03-09 Thread ep1804
Github user ep1804 commented on the issue: https://github.com/apache/spark/pull/17177 An issue is raised for uniVocity parser: https://github.com/uniVocity/univocity-parsers/issues/143

[GitHub] spark pull request #17233: [SPARK-11569][ML] Fix StringIndexer to handle nul...

2017-03-09 Thread crackcell
GitHub user crackcell opened a pull request: https://github.com/apache/spark/pull/17233 [SPARK-11569][ML] Fix StringIndexer to handle null value properly ## What changes were proposed in this pull request? This PR is to enhance StringIndexer with NULL values handling.

[GitHub] spark issue #17177: [SPARK-19834][SQL] csv escape of quote escape

2017-03-09 Thread ep1804
Github user ep1804 commented on the issue: https://github.com/apache/spark/pull/17177 Thank you @HyukjinKwon . I made changes following your comments: * `escapeQuoteEscaping` instead of `escapeEscape` * default value to `\u` (unset) * `withTempPath` *

[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

2017-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17172 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74297/ Test PASSed.

[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

2017-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17172 Merged build finished. Test PASSed.

[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

2017-03-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17172 **[Test build #74297 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74297/testReport)** for PR 17172 at commit

[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

2017-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17172 Merged build finished. Test PASSed.

[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

2017-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17172 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74296/ Test PASSed.

[GitHub] spark issue #17172: [SPARK-19008][SQL] Improve performance of Dataset.map by...

2017-03-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17172 **[Test build #74296 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74296/testReport)** for PR 17172 at commit

[GitHub] spark issue #17226: [SPARK-19882][SQL] Pivot with null as a distinct pivot v...

2017-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17226 Merged build finished. Test PASSed.

[GitHub] spark issue #17226: [SPARK-19882][SQL] Pivot with null as a distinct pivot v...

2017-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17226 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74295/ Test PASSed.

[GitHub] spark issue #17226: [SPARK-19882][SQL] Pivot with null as a distinct pivot v...

2017-03-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17226 **[Test build #74295 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74295/testReport)** for PR 17226 at commit

[GitHub] spark issue #17232: [SPARK-18112] [SQL] Support reading data from Hive 2.1 m...

2017-03-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17232 **[Test build #74298 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74298/testReport)** for PR 17232 at commit

[GitHub] spark pull request #17232: [SPARK-18112] [SQL] Support reading data from Hiv...

2017-03-09 Thread gatorsmile
GitHub user gatorsmile opened a pull request: https://github.com/apache/spark/pull/17232 [SPARK-18112] [SQL] Support reading data from Hive 2.1 metastore [WIP] ### What changes were proposed in this pull request? This PR is to support reading data from Hive 2.1 metastore. Need

[GitHub] spark issue #17231: [SPARK-19891][SS] Await Batch Lock notified on stream ex...

2017-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17231 Merged build finished. Test PASSed.

[GitHub] spark issue #17231: [SPARK-19891][SS] Await Batch Lock notified on stream ex...

2017-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17231 Merged build finished. Test PASSed.

[GitHub] spark issue #17231: [SPARK-19891][SS] Await Batch Lock notified on stream ex...

2017-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17231 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74293/ Test PASSed.

[GitHub] spark issue #17231: [SPARK-19891][SS] Await Batch Lock notified on stream ex...

2017-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17231 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74294/ Test PASSed.

[GitHub] spark issue #17226: [SPARK-19882][SQL] Pivot with null as a distinct pivot v...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17226 > we're not introducing a regression in this PR by fixing the NPE, the answer given by 1.6 was incorrect under any interpretation Right, if it was a bug, then this PR introduces an

[GitHub] spark issue #17231: [SPARK-19891][SS] Await Batch Lock notified on stream ex...

2017-03-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17231 **[Test build #74293 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74293/testReport)** for PR 17231 at commit

[GitHub] spark issue #17231: [SPARK-19891][SS] Await Batch Lock notified on stream ex...

2017-03-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17231 **[Test build #74294 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74294/testReport)** for PR 17231 at commit

[GitHub] spark issue #17231: [SPARK-19891][SS] Await Batch Lock notified on stream ex...

2017-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17231 Merged build finished. Test PASSed.

[GitHub] spark issue #17231: [SPARK-19891][SS] Await Batch Lock notified on stream ex...

2017-03-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17231 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/74292/ Test PASSed.

[GitHub] spark issue #17231: [SPARK-19891][SS] Await Batch Lock notified on stream ex...

2017-03-09 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17231 **[Test build #74292 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/74292/testReport)** for PR 17231 at commit

[GitHub] spark issue #17226: [SPARK-19882][SQL] Pivot with null as a distinct pivot v...

2017-03-09 Thread aray
Github user aray commented on the issue: https://github.com/apache/spark/pull/17226 @HyukjinKwon we're not introducing a regression in this PR by fixing the NPE, the answer given by 1.6 was incorrect under any interpretation. Again, there is a completely separate issue of what the

[GitHub] spark pull request #17224: [SPARK-19882][SQL] Pivot with null as the dictinc...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17224#discussion_r105327492 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -524,15 +529,21 @@ class Analyzer(

[GitHub] spark issue #17226: [SPARK-19882][SQL] Pivot with null as a distinct pivot v...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17226 cc @cloud-fan and @yhuai could you pick up one of them? Let me follow.

[GitHub] spark issue #17226: [SPARK-19882][SQL] Pivot with null as a distinct pivot v...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17226 > Spark 2.0+ with PivotFirst gives a NPE when one of the pivot column values is null. The main thing fixed in this PR. I meant to say it is not fully fixed because it does not NPE but

[GitHub] spark issue #17226: [SPARK-19882][SQL] Pivot with null as a distinct pivot v...

2017-03-09 Thread aray
Github user aray commented on the issue: https://github.com/apache/spark/pull/17226 BTW for 3 above if we decide it should be 0, we can add an initial value for `PivotFirst` to make the fix.

[GitHub] spark issue #17226: [SPARK-19882][SQL] Pivot with null as a distinct pivot v...

2017-03-09 Thread aray
Github user aray commented on the issue: https://github.com/apache/spark/pull/17226 There are three things going on here in your one example. 1. Spark 1.6 [first version with pivot] (and Spark 2.0+ with an aggregate output type unsupported by PivotFirst) gives incorrect

[GitHub] spark pull request #17226: [SPARK-19882][SQL] Pivot with null as a distinct ...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/17226#discussion_r105325964 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -522,7 +522,7 @@ class Analyzer( }

[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-03-09 Thread julienledem
Github user julienledem commented on a diff in the pull request: https://github.com/apache/spark/pull/15821#discussion_r105323172 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/ArrowConverters.scala --- @@ -0,0 +1,411 @@ +/* +* Licensed to the Apache Software

[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-03-09 Thread julienledem
Github user julienledem commented on a diff in the pull request: https://github.com/apache/spark/pull/15821#discussion_r105322928 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/ArrowConverters.scala --- @@ -0,0 +1,411 @@ +/* +* Licensed to the Apache Software

[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-03-09 Thread julienledem
Github user julienledem commented on a diff in the pull request: https://github.com/apache/spark/pull/15821#discussion_r105324269 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/ArrowConvertersSuite.scala --- @@ -0,0 +1,567 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-03-09 Thread julienledem
Github user julienledem commented on a diff in the pull request: https://github.com/apache/spark/pull/15821#discussion_r105321731 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/ArrowConverters.scala --- @@ -0,0 +1,411 @@ +/* +* Licensed to the Apache Software

[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-03-09 Thread julienledem
Github user julienledem commented on a diff in the pull request: https://github.com/apache/spark/pull/15821#discussion_r105322470 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/ArrowConverters.scala --- @@ -0,0 +1,411 @@ +/* +* Licensed to the Apache Software

[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-03-09 Thread julienledem
Github user julienledem commented on a diff in the pull request: https://github.com/apache/spark/pull/15821#discussion_r105322707 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/ArrowConverters.scala --- @@ -0,0 +1,411 @@ +/* +* Licensed to the Apache Software

[GitHub] spark issue #17226: [SPARK-19882][SQL] Pivot with null as a distinct pivot v...

2017-03-09 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/17226 @aray, this is a regression as I described in my PR that is introduced by this optimization. Spark 1.6. ``` +++---+ | a|null| 1| +++---+

[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-03-09 Thread julienledem
Github user julienledem commented on a diff in the pull request: https://github.com/apache/spark/pull/15821#discussion_r105322098 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/ArrowConverters.scala --- @@ -0,0 +1,411 @@ +/* +* Licensed to the Apache Software

[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-03-09 Thread julienledem
Github user julienledem commented on a diff in the pull request: https://github.com/apache/spark/pull/15821#discussion_r105322283 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/ArrowConverters.scala --- @@ -0,0 +1,411 @@ +/* +* Licensed to the Apache Software

[GitHub] spark pull request #15821: [SPARK-13534][PySpark] Using Apache Arrow to incr...

2017-03-09 Thread julienledem
Github user julienledem commented on a diff in the pull request: https://github.com/apache/spark/pull/15821#discussion_r105321652 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/ArrowConverters.scala --- @@ -0,0 +1,411 @@ +/* +* Licensed to the Apache Software
