[GitHub] spark pull request #18958: [SPARK-21745][SQL] Refactor ColumnVector hierarch...

2017-08-20 Thread ueshin
Github user ueshin commented on a diff in the pull request: https://github.com/apache/spark/pull/18958#discussion_r134148287 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java --- @@ -970,30 +458,14 @@ public final int

[GitHub] spark issue #18986: [SPARK-21774][SQL] The rule PromoteStrings should cast a...

2017-08-20 Thread DonnyZone
Github user DonnyZone commented on the issue: https://github.com/apache/spark/pull/18986 @gatorsmile For this issue, I think the behavior in the PromoteStrings rule is reasonable, but there are problems in the underlying converter UTF8String. As described in PR-15880

[GitHub] spark pull request #18968: [SPARK-21759][SQL] In.checkInputDataTypes should ...

2017-08-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18968#discussion_r134147697 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala --- @@ -138,46 +138,56 @@ case class Not(child:

[GitHub] spark issue #18984: [SPARK-21773][BUILD][DOCS] Installs mkdocs if missing in...

2017-08-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18984 Here is the context I got: https://github.com/apache/spark/pull/18702 broke the documentation build in https://amplab.cs.berkeley.edu/jenkins/job/spark-master-docs/.

[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

2017-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18968 **[Test build #80920 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80920/testReport)** for PR 18968 at commit

[GitHub] spark pull request #18958: [SPARK-21745][SQL] Refactor ColumnVector hierarch...

2017-08-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18958#discussion_r134147099 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java --- @@ -307,64 +293,70 @@ public void update(int ordinal,

[GitHub] spark pull request #18968: [SPARK-21759][SQL] In.checkInputDataTypes should ...

2017-08-20 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18968#discussion_r134147067 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala --- @@ -138,46 +138,63 @@ case class Not(child:

[GitHub] spark pull request #18958: [SPARK-21745][SQL] Refactor ColumnVector hierarch...

2017-08-20 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/18958#discussion_r134146811 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnVector.java --- @@ -970,30 +458,14 @@ public final int

[GitHub] spark pull request #18968: [SPARK-21759][SQL] In.checkInputDataTypes should ...

2017-08-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18968#discussion_r134146523 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala --- @@ -138,46 +138,63 @@ case class Not(child:

[GitHub] spark issue #18984: [SPARK-21773][BUILD][DOCS] Installs mkdocs if missing in...

2017-08-20 Thread shaneknapp
Github user shaneknapp commented on the issue: https://github.com/apache/spark/pull/18984 how long has this been failing in this way? i'll take a closer look tomorrow afternoon. On Sun, Aug 20, 2017 at 4:31 AM, Hyukjin Kwon wrote: >

[GitHub] spark pull request #17951: [SPARK-20711][ML] Fix incorrect min/max for NaN v...

2017-08-20 Thread zhengruifeng
Github user zhengruifeng closed the pull request at: https://github.com/apache/spark/pull/17951 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature

[GitHub] spark pull request #18958: [SPARK-21745][SQL] Refactor ColumnVector hierarch...

2017-08-20 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/18958#discussion_r134145753 --- Diff: sql/core/src/main/java/org/apache/spark/sql/execution/vectorized/ColumnarBatch.java --- @@ -307,64 +293,69 @@ public void update(int ordinal,

[GitHub] spark issue #19002: [SPARK-21790][TESTS][FOLLOW-UP] Add filter pushdown veri...

2017-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19002 **[Test build #80919 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80919/testReport)** for PR 19002 at commit

[GitHub] spark pull request #18994: [SPARK-21784][SQL] Adds support for defining info...

2017-08-20 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18994#discussion_r134145171 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/HiveExternalCatalog.scala --- @@ -1214,6 +1246,11 @@ object HiveExternalCatalog {

[GitHub] spark pull request #19002: [SPARK-21790][TESTS][FOLLOW-UP] Add filter pushdo...

2017-08-20 Thread wangyum
Github user wangyum commented on a diff in the pull request: https://github.com/apache/spark/pull/19002#discussion_r134144956 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestUtils.scala --- @@ -39,7 +39,6 @@ import org.apache.spark.sql.catalyst.plans.PlanTest

[GitHub] spark pull request #18994: [SPARK-21784][SQL] Adds support for defining info...

2017-08-20 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18994#discussion_r134144063 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/TableConstraints.scala --- @@ -0,0 +1,323 @@ +/* + * Licensed to the

[GitHub] spark issue #18958: [SPARK-21745][SQL] Refactor ColumnVector hierarchy to ma...

2017-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18958 **[Test build #80918 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80918/testReport)** for PR 18958 at commit

[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

2017-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18999 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80912/ Test PASSed. ---

[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

2017-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18999 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80911/ Test PASSed. ---

[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

2017-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18999 Merged build finished. Test PASSed.

[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

2017-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18999 Merged build finished. Test PASSed.

[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

2017-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18999 **[Test build #80911 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80911/testReport)** for PR 18999 at commit

[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

2017-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18999 **[Test build #80912 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80912/testReport)** for PR 18999 at commit

[GitHub] spark issue #17951: [SPARK-20711][ML] Fix incorrect min/max for NaN value in...

2017-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17951 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80914/ Test FAILed. ---

[GitHub] spark issue #17951: [SPARK-20711][ML] Fix incorrect min/max for NaN value in...

2017-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/17951 Merged build finished. Test FAILed.

[GitHub] spark issue #17951: [SPARK-20711][ML] Fix incorrect min/max for NaN value in...

2017-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17951 **[Test build #80914 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80914/testReport)** for PR 17951 at commit

[GitHub] spark issue #18270: [SPARK-21055][SQL] replace grouping__id with grouping_id...

2017-08-20 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18270 @cenyuhai Are you still working on this? Could you please fix the test?

[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...

2017-08-20 Thread janewangfb
Github user janewangfb commented on the issue: https://github.com/apache/spark/pull/18975 still need to implement the data source table portion.

[GitHub] spark issue #18975: [SPARK-4131] Support "Writing data into the filesystem f...

2017-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18975 **[Test build #80917 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80917/testReport)** for PR 18975 at commit

[GitHub] spark pull request #18975: [SPARK-4131] Support "Writing data into the files...

2017-08-20 Thread janewangfb
Github user janewangfb closed the pull request at: https://github.com/apache/spark/pull/18975

[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18029 **[Test build #80913 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80913/testReport)** for PR 18029 at commit

[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18029 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80913/ Test PASSed. ---

[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18029 Merged build finished. Test PASSed.

[GitHub] spark pull request #19002: [SPARK-21790][TESTS][FOLLOW-UP] Add filter pushdo...

2017-08-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19002#discussion_r134140695 --- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala --- @@ -255,6 +256,18 @@ class

[GitHub] spark pull request #19002: [SPARK-21790][TESTS][FOLLOW-UP] Add filter pushdo...

2017-08-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19002#discussion_r134140826 --- Diff: external/docker-integration-tests/src/test/scala/org/apache/spark/sql/jdbc/OracleIntegrationSuite.scala --- @@ -255,6 +256,18 @@ class

[GitHub] spark issue #19008: [SPARK-21756][SQL]Add JSON option to allow unquoted cont...

2017-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19008 **[Test build #80916 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80916/testReport)** for PR 19008 at commit

[GitHub] spark pull request #18985: [SPARK-21772] Fix staging parent directory for In...

2017-08-20 Thread liupc
Github user liupc closed the pull request at: https://github.com/apache/spark/pull/18985

[GitHub] spark issue #18968: [SPARK-21759][SQL] In.checkInputDataTypes should not wro...

2017-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18968 **[Test build #80915 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80915/testReport)** for PR 18968 at commit

[GitHub] spark issue #19009: [MINOR][CORE]remove scala 's' function

2017-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19009 Can one of the admins verify this patch?

[GitHub] spark issue #17951: [SPARK-20711][ML] Fix incorrect min/max for identical Na...

2017-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17951 **[Test build #80914 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80914/testReport)** for PR 17951 at commit

[GitHub] spark pull request #19009: [MINOR][CORE]remove scala 's' function

2017-08-20 Thread heary-cao
GitHub user heary-cao opened a pull request: https://github.com/apache/spark/pull/19009 [MINOR][CORE]remove scala 's' function ## What changes were proposed in this pull request? remove scala 's' function when output information without taking the value of the

[GitHub] spark issue #18985: [SPARK-21772] Fix staging parent directory for InsertInt...

2017-08-20 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18985 @liupc If you have no more questions about this, can you close this PR? Thank you.

[GitHub] spark issue #18985: [SPARK-21772] Fix staging parent directory for InsertInt...

2017-08-20 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18985 OK. Thanks @liupc.

[GitHub] spark issue #18985: [SPARK-21772] Fix staging parent directory for InsertInt...

2017-08-20 Thread liupc
Github user liupc commented on the issue: https://github.com/apache/spark/pull/18985 Sorry, I think SPARK-18675 has solved this problem. https://issues.apache.org/jira/browse/SPARK-18675 My environment is hive-0.13, spark2.1.0. Two reasons caused this problem.

[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-08-20 Thread yssharma
Github user yssharma commented on the issue: https://github.com/apache/spark/pull/18029 Will wait for @brkyvz , @HyukjinKwon for final ☑️

[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18029 **[Test build #80913 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80913/testReport)** for PR 18029 at commit

[GitHub] spark pull request #18029: [SPARK-20168] [DStream] Add changes to use kinesi...

2017-08-20 Thread yssharma
Github user yssharma commented on a diff in the pull request: https://github.com/apache/spark/pull/18029#discussion_r134138994 --- Diff: external/kinesis-asl/src/main/scala/org/apache/spark/streaming/kinesis/InitialPosition.scala --- @@ -0,0 +1,104 @@ +/* + * Licensed to

[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-08-20 Thread yssharma
Github user yssharma commented on the issue: https://github.com/apache/spark/pull/18029 Added review suggestions @budde !

[GitHub] spark issue #18810: [SPARK-21603][SQL]The wholestage codegen will be much sl...

2017-08-20 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/18810 Btw, as for merged prs, I'm just monitoring TPCDS perf. in [here](https://docs.google.com/spreadsheets/d/1V8xoKR9ElU-rOXMH84gb5BbLEw0XAPTJY8c8aZeIqus/edit#gid=445143188). Also, I wrote a script

[GitHub] spark issue #18906: [SPARK-21692][PYSPARK][SQL] Add nullability support to P...

2017-08-20 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18906 @ptkool Thank you for working on this! I'd like to ask what your use-case is. Users have historically been confused about what nullable means, and we don't think we should give them yet another

[GitHub] spark issue #18810: [SPARK-21603][SQL]The wholestage codegen will be much sl...

2017-08-20 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/18810 yea, I'll do

[GitHub] spark pull request #18968: [SPARK-21759][SQL] In.checkInputDataTypes should ...

2017-08-20 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18968#discussion_r134136888 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala --- @@ -138,46 +138,80 @@ case class Not(child:

[GitHub] spark issue #18810: [SPARK-21603][SQL]The wholestage codegen will be much sl...

2017-08-20 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18810 @maropu Interesting. Would you like to benchmark with #18931 too? It is my attempt to solve long code-gen functions without disabling it.

[GitHub] spark pull request #18966: [SPARK-21751][SQL] CodeGeneraor.splitExpressions ...

2017-08-20 Thread kiszk
Github user kiszk commented on a diff in the pull request: https://github.com/apache/spark/pull/18966#discussion_r134134190 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -582,6 +582,15 @@ object SQLConf { .intConf

[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

2017-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18999 **[Test build #80912 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80912/testReport)** for PR 18999 at commit

[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

2017-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18999 **[Test build #80911 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80911/testReport)** for PR 18999 at commit

[GitHub] spark issue #18748: [SPARK-20679][ML] Support recommending for a subset of u...

2017-08-20 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/18748 Thanks @MLnick . I have double checked my test. Since there is no recommendForUserSubset , my previous test is MLLIB MatrixFactorizationModel::predict(RDD(Int, Int)), which predicts the rating of

[GitHub] spark issue #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample API in Py...

2017-08-20 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18999 https://github.com/apache/spark/pull/18999#discussion_r134131441 looks hidden. I addressed the other comment.

[GitHub] spark issue #18866: [SPARK-21649][SQL] Support writing data into hive bucket...

2017-08-20 Thread jinxing64
Github user jinxing64 commented on the issue: https://github.com/apache/spark/pull/18866 @cloud-fan Would you give some advice on this? Then I can know whether I'm heading in the right direction. I can keep working on it :)

[GitHub] spark pull request #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample AP...

2017-08-20 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/18999#discussion_r134131441 --- Diff: python/pyspark/sql/dataframe.py --- @@ -659,19 +659,77 @@ def distinct(self): return DataFrame(self._jdf.distinct(),

[GitHub] spark issue #18576: [SPARK-21351][SQL] Update nullability based on children'...

2017-08-20 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/18576 ping

[GitHub] spark issue #18810: [SPARK-21603][SQL]The wholestage codegen will be much sl...

2017-08-20 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/18810 (copied from jira for just-in-case) Just for your information, I checked the performance changes of TPCDS before/after the pr #18810; the pr affected Q17/Q66 only (that is, they have too long

[GitHub] spark issue #18986: [SPARK-21774][SQL] The rule PromoteStrings should cast a...

2017-08-20 Thread maropu
Github user maropu commented on the issue: https://github.com/apache/spark/pull/18986 Yea, since this topic is important for some users, I mean we'd better move the doc into `./docs/` (I feel novices don't seem to check the code documents).

[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-08-20 Thread yssharma
Github user yssharma commented on the issue: https://github.com/apache/spark/pull/18029 Will update and post another request soon. Thanks.

[GitHub] spark pull request #19003: [SPARK-21769] [SQL] Add a table-specific option f...

2017-08-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19003#discussion_r134129199 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/client/VersionsSuite.scala --- @@ -763,6 +763,47 @@ class VersionsSuite extends

[GitHub] spark issue #18849: [SPARK-21617][SQL] Store correct table metadata when alt...

2017-08-20 Thread gatorsmile
Github user gatorsmile commented on the issue: https://github.com/apache/spark/pull/18849 If `ALTER TABLE` breaks Hive compatibility, the value of this flag becomes misleading. Currently, the naming of this flag is pretty general. I expect this flag could be used for the

[GitHub] spark pull request #19008: [SPARK-21756][SQL]Add JSON option to allow unquot...

2017-08-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/19008#discussion_r134128223 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/json/JsonParsingOptionsSuite.scala --- @@ -72,6 +72,21 @@ class

[GitHub] spark pull request #18966: [SPARK-21751][SQL] CodeGeneraor.splitExpressions ...

2017-08-20 Thread gatorsmile
Github user gatorsmile commented on a diff in the pull request: https://github.com/apache/spark/pull/18966#discussion_r134128067 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala --- @@ -582,6 +582,15 @@ object SQLConf { .intConf

[GitHub] spark pull request #18968: [SPARK-21759][SQL] In.checkInputDataTypes should ...

2017-08-20 Thread dilipbiswal
Github user dilipbiswal commented on a diff in the pull request: https://github.com/apache/spark/pull/18968#discussion_r134126635 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/predicates.scala --- @@ -138,46 +138,80 @@ case class Not(child:

[GitHub] spark issue #19008: [SPARK-21756][SQL]Add JSON option to allow unquoted cont...

2017-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19008 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80910/ Test PASSed. ---

[GitHub] spark issue #19008: [SPARK-21756][SQL]Add JSON option to allow unquoted cont...

2017-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19008 Merged build finished. Test PASSed.

[GitHub] spark issue #19008: [SPARK-21756][SQL]Add JSON option to allow unquoted cont...

2017-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19008 **[Test build #80910 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80910/testReport)** for PR 19008 at commit

[GitHub] spark issue #18849: [SPARK-21617][SQL] Store correct table metadata when alt...

2017-08-20 Thread vanzin
Github user vanzin commented on the issue: https://github.com/apache/spark/pull/18849 If the flag is set to true, then whenever an "alter table" command is executed, it will follow the "Hive compatible" path, which lets the Hive metastore decide whether the change is valid or not.

[GitHub] spark issue #19001: [SPARK-19256][SQL] Hive bucketing support

2017-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19001 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80908/ Test PASSed. ---

[GitHub] spark issue #19001: [SPARK-19256][SQL] Hive bucketing support

2017-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19001 Merged build finished. Test PASSed.

[GitHub] spark issue #19001: [SPARK-19256][SQL] Hive bucketing support

2017-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19001 **[Test build #80908 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80908/testReport)** for PR 19001 at commit

[GitHub] spark pull request #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample AP...

2017-08-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18999#discussion_r134123916 --- Diff: python/pyspark/sql/dataframe.py --- @@ -659,19 +659,77 @@ def distinct(self): return DataFrame(self._jdf.distinct(), self.sql_ctx)

[GitHub] spark pull request #18281: [SPARK-21027][ML][PYTHON] Added tunable paralleli...

2017-08-20 Thread felixcheung
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18281#discussion_r134123840 --- Diff: mllib/src/main/scala/org/apache/spark/ml/classification/OneVsRest.scala --- @@ -17,29 +17,34 @@ package

[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-08-20 Thread felixcheung
Github user felixcheung commented on the issue: https://github.com/apache/spark/pull/18732 cool - this is a bit understated but potentially huge (to me anyway)

[GitHub] spark pull request #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample AP...

2017-08-20 Thread felixcheung
Github user felixcheung commented on a diff in the pull request: https://github.com/apache/spark/pull/18999#discussion_r134123764 --- Diff: python/pyspark/sql/dataframe.py --- @@ -659,19 +659,77 @@ def distinct(self): return DataFrame(self._jdf.distinct(),

[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

2017-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18953 Merged build finished. Test PASSed.

[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

2017-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18953 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80909/ Test PASSed. ---

[GitHub] spark pull request #18999: [SPARK-21779][PYTHON] Simpler DataFrame.sample AP...

2017-08-20 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/18999#discussion_r134123358 --- Diff: python/pyspark/sql/dataframe.py --- @@ -659,19 +659,77 @@ def distinct(self): return DataFrame(self._jdf.distinct(), self.sql_ctx)

[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

2017-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18953 **[Test build #80909 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80909/testReport)** for PR 18953 at commit

[GitHub] spark issue #19007: [SPARK-21783][SQL]Turn on ORC filter push-down by defaul...

2017-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19007 Merged build finished. Test PASSed.

[GitHub] spark issue #19007: [SPARK-21783][SQL]Turn on ORC filter push-down by defaul...

2017-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19007 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80907/ Test PASSed.

[GitHub] spark issue #19007: [SPARK-21783][SQL]Turn on ORC filter push-down by defaul...

2017-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19007 **[Test build #80907 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80907/testReport)** for PR 19007 at commit

[GitHub] spark issue #19007: [SPARK-21783][SQL]Turn on ORC filter push-down by defaul...

2017-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19007 Merged build finished. Test PASSed.

[GitHub] spark issue #19007: [SPARK-21783][SQL]Turn on ORC filter push-down by defaul...

2017-08-20 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19007 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/80906/ Test PASSed.

[GitHub] spark issue #19007: [SPARK-21783][SQL]Turn on ORC filter push-down by defaul...

2017-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19007 **[Test build #80906 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80906/testReport)** for PR 19007 at commit

[GitHub] spark issue #19008: [SPARK-21756][SQL]Add JSON option to allow unquoted cont...

2017-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19008 **[Test build #80910 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80910/testReport)** for PR 19008 at commit

[GitHub] spark pull request #19008: [SPARK-21756][SQL]Add JSON option to allow unquot...

2017-08-20 Thread vinodkc
GitHub user vinodkc opened a pull request: https://github.com/apache/spark/pull/19008 [SPARK-21756][SQL]Add JSON option to allow unquoted control characters ## What changes were proposed in this pull request? This patch adds allowUnquotedControlChars option in JSON data

[GitHub] spark pull request #19007: [SPARK-21783][SQL]Turn on ORC filter push-down by...

2017-08-20 Thread vinodkc
Github user vinodkc closed the pull request at: https://github.com/apache/spark/pull/19007

[GitHub] spark issue #19007: [SPARK-21783][SQL]Turn on ORC filter push-down by defaul...

2017-08-20 Thread vinodkc
Github user vinodkc commented on the issue: https://github.com/apache/spark/pull/19007 OK, I'm closing my PR. Nowadays, Spark JIRA is not showing PR status. That is why I missed your PR.

[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

2017-08-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/18953 Hi, @cloud-fan . I added `SparkOrcNewRecordReader.java` back to reduce the patch size.

[GitHub] spark issue #19007: [SPARK-21783][SQL]Turn on ORC filter push-down by defaul...

2017-08-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19007 Ur, I made the PR two days ago already, #18991

[GitHub] spark issue #19001: [SPARK-19256][SQL] Hive bucketing support

2017-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19001 **[Test build #80908 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80908/testReport)** for PR 19001 at commit

[GitHub] spark issue #18953: [SPARK-20682][SQL] Update ORC data source based on Apach...

2017-08-20 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18953 **[Test build #80909 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/80909/testReport)** for PR 18953 at commit

[GitHub] spark issue #19001: [SPARK-19256][SQL] Hive bucketing support

2017-08-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19001 Retest this please.

[GitHub] spark pull request #19001: [SPARK-19256][SQL] Hive bucketing support

2017-08-20 Thread dongjoon-hyun
Github user dongjoon-hyun commented on a diff in the pull request: https://github.com/apache/spark/pull/19001#discussion_r134120888 --- Diff: sql/hive/src/main/java/org/apache/hadoop/hive/ql/io/BucketizedSparkRecordReader.java --- @@ -0,0 +1,147 @@ +/** + * Licensed to
