[GitHub] spark pull request #19588: [SPARK-12375][ML] VectorIndexerModel support hand...

2017-11-03 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19588#discussion_r148734195 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/VectorIndexer.scala --- @@ -311,22 +342,39 @@ class VectorIndexerModel private[ml] (

[GitHub] spark issue #19645: [SPARK-22429][STREAMING] Streaming checkpointing code do...

2017-11-03 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19645 Looks good though this would have to be opened vs master. How about just not nulling fs? Rather than reopen it. --- - To

[GitHub] spark issue #19350: [SPARK-22126][ML] Fix model-specific optimization suppor...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19350 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83395/ Test FAILed. ---

[GitHub] spark issue #19350: [SPARK-22126][ML] Fix model-specific optimization suppor...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19350 Merged build finished. Test FAILed. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional

[GitHub] spark pull request #19652: [SPARK-22435][SQL] Support processing array and m...

2017-11-03 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19652#discussion_r148749276 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -1485,21 +1487,27 @@ class SparkSqlAstBuilder(conf: SQLConf)

[GitHub] spark pull request #19640: [SPARK-16986][WEB-UI] Replace GMT with history se...

2017-11-03 Thread dbolshak
Github user dbolshak commented on a diff in the pull request: https://github.com/apache/spark/pull/19640#discussion_r148750999 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -426,6 +426,10 @@ class SparkHadoopUtil extends Logging {

[GitHub] spark pull request #19640: [SPARK-16986][WEB-UI] Replace GMT with history se...

2017-11-03 Thread dbolshak
Github user dbolshak commented on a diff in the pull request: https://github.com/apache/spark/pull/19640#discussion_r148750407 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkHadoopUtil.scala --- @@ -426,6 +426,10 @@ class SparkHadoopUtil extends Logging {

[GitHub] spark pull request #19447: [SPARK-22215][SQL] Add configuration to set the t...

2017-11-03 Thread mgaido91
Github user mgaido91 closed the pull request at: https://github.com/apache/spark/pull/19447 --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org

[GitHub] spark issue #19447: [SPARK-22215][SQL] Add configuration to set the threshol...

2017-11-03 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19447 I am closing this because as @kiszk pointed out in his comment, there is no reliable way to get `SQLConf` here.

[GitHub] spark issue #19646: [SPARK-22147][PYTHON] Fix for createDataFrame from panda...

2017-11-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19646 I think this should be linked to SPARK-22417.

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r148731266 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/planning/SelectedFieldSuite.scala --- @@ -0,0 +1,440 @@ +/* + * Licensed to

[GitHub] spark issue #18029: [SPARK-20168] [DStream] Add changes to use kinesis fetch...

2017-11-03 Thread yssharma
Github user yssharma commented on the issue: https://github.com/apache/spark/pull/18029 @brkyvz could you please have a look if it looks good. Would be great if you're happy with the changes and we could merge it. ---

[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19621 Jenkins, test this please.

[GitHub] spark issue #19625: [SPARK-22407][WEB-UI] Add rdd id column on storage page ...

2017-11-03 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19625 Merged to master

[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19208 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83388/ Test FAILed. ---

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83389/ Test FAILed. ---

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Merged build finished. Test FAILed.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83390/ Test FAILed. ---

[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83392/ Test FAILed. ---

[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-11-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19208 **[Test build #83393 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83393/testReport)** for PR 19208 at commit

[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Merged build finished. Test FAILed.

[GitHub] spark pull request #19646: [SPARK-22147][PYTHON] Fix for createDataFrame fro...

2017-11-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19646#discussion_r148763906 --- Diff: python/pyspark/sql/session.py --- @@ -416,6 +417,50 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19639: [SPARK-22423][SQL] The TestHiveSingleton.scala fi...

2017-11-03 Thread xubo245
Github user xubo245 commented on a diff in the pull request: https://github.com/apache/spark/pull/19639#discussion_r148764458 --- Diff: sql/hive/src/test/scala/org/apache/spark/sql/hive/test/TestHiveSingleton.scala --- @@ -24,7 +24,6 @@ import org.apache.spark.sql.SparkSession

[GitHub] spark pull request #19646: [SPARK-22147][PYTHON] Fix for createDataFrame fro...

2017-11-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19646#discussion_r148763235 --- Diff: python/pyspark/sql/session.py --- @@ -416,6 +417,50 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark issue #19642: [SPARK-22410][SQL] Remove unnecessary output from BatchE...

2017-11-03 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19642 To have a logical python runner, we may need to change some logic of extracting python udfs. May require quite more change than this simple fix. If you prefer it, I can do it. If it is just for

[GitHub] spark pull request #19646: [SPARK-22147][PYTHON] Fix for createDataFrame fro...

2017-11-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19646#discussion_r148771775 --- Diff: python/pyspark/sql/tests.py --- @@ -2592,6 +2592,16 @@ def test_create_dataframe_from_array_of_long(self): df =

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 I can't tell what's causing the build to fail: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83390/console Any ideas? ---

[GitHub] spark issue #19588: [SPARK-12375][ML] VectorIndexerModel support handle unse...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19588 Merged build finished. Test PASSed.

[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19208 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83393/ Test PASSed. ---

[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-11-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19208 **[Test build #83393 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83393/testReport)** for PR 19208 at commit

[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19208 Merged build finished. Test PASSed.

[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-11-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19208 ping @jkbradley Comments all addressed! Pls take a look again. Thanks!

[GitHub] spark pull request #19648: [SPARK-14516][ML][FOLLOW-UP] Move ClusteringEvalu...

2017-11-03 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19648#discussion_r148734734 --- Diff: mllib/src/test/scala/org/apache/spark/ml/evaluation/ClusteringEvaluatorSuite.scala --- @@ -22,15 +22,21 @@ import

[GitHub] spark issue #19588: [SPARK-12375][ML] VectorIndexerModel support handle unse...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19588 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83397/ Test FAILed. ---

[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19381 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83398/ Test FAILed. ---

[GitHub] spark issue #19588: [SPARK-12375][ML] VectorIndexerModel support handle unse...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19588 Merged build finished. Test FAILed.

[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83396/ Test FAILed. ---

[GitHub] spark issue #19621: [SPARK-11215][ML] Add multiple columns support to String...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19621 Merged build finished. Test FAILed.

[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19381 Merged build finished. Test FAILed.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/16578 **[Test build #83387 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83387/testReport)** for PR 16578 at commit

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83387/ Test PASSed. ---

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Merged build finished. Test PASSed.

[GitHub] spark issue #19208: [SPARK-21087] [ML] CrossValidator, TrainValidationSplit ...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19208 Merged build finished. Test FAILed.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/16578 Build finished. Test FAILed.

[GitHub] spark issue #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread mallman
Github user mallman commented on the issue: https://github.com/apache/spark/pull/16578 > Yeah, I think with a config for this optimization is good. I added a config switch, `spark.sql.nestedSchemaPruning.enabled`, which disables the optimizations if set to `false`. By default
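A minimal sketch of how such a switch could be toggled, assuming the key name exactly as quoted in the comment above (released Spark versions may expose it under a different key). The object name, application name, and local master are hypothetical and only there to make the snippet self-contained.

```scala
import org.apache.spark.sql.SparkSession

object NestedPruningToggle {
  def main(args: Array[String]): Unit = {
    // Key name copied from the comment above; this is an assumption,
    // released Spark versions may use a different key.
    val pruningKey = "spark.sql.nestedSchemaPruning.enabled"

    val spark = SparkSession.builder()
      .appName("nested-pruning-toggle") // hypothetical application name
      .master("local[*]")               // local mode, for illustration only
      .config(pruningKey, "false")      // start with the optimization disabled
      .getOrCreate()

    // Read the setting back from the session's runtime configuration.
    println(spark.conf.get(pruningKey))

    // SQL configuration values can generally also be changed on a live session.
    spark.conf.set(pruningKey, "true")

    spark.stop()
  }
}
```

Setting the value through the session builder applies it from startup, while `spark.conf.set` makes it possible to compare query plans with and without the optimization in one session.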

[GitHub] spark issue #19532: [DOC]update the API doc and modify the stage API descrip...

2017-11-03 Thread guoxiaolongzte
Github user guoxiaolongzte commented on the issue: https://github.com/apache/spark/pull/19532 retest this please

[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19381 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83394/ Test FAILed. ---

[GitHub] spark pull request #19350: [SPARK-22126][ML] Fix model-specific optimization...

2017-11-03 Thread WeichenXu123
GitHub user WeichenXu123 reopened a pull request: https://github.com/apache/spark/pull/19350 [SPARK-22126][ML] Fix model-specific optimization support for ML tuning ## What changes were proposed in this pull request? Push down fitting parallelization code from

[GitHub] spark issue #19381: [SPARK-10884][ML] Support prediction on single instance ...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19381 Merged build finished. Test FAILed.

[GitHub] spark pull request #19652: [SPARK-22435][SQL] Support processing array and m...

2017-11-03 Thread jinxing64
GitHub user jinxing64 opened a pull request: https://github.com/apache/spark/pull/19652 [SPARK-22435][SQL] Support processing array and map type using script ## What changes were proposed in this pull request? Currently, it is not supported to use a script (e.g. Python) to

[GitHub] spark pull request #19652: [SPARK-22435][SQL] Support processing array and m...

2017-11-03 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19652#discussion_r148749862 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -1485,21 +1487,27 @@ class SparkSqlAstBuilder(conf: SQLConf)

[GitHub] spark issue #19640: [SPARK-16986][WEB-UI] Replace GMT with history server si...

2017-11-03 Thread wangyum
Github user wangyum commented on the issue: https://github.com/apache/spark/pull/19640 @srowen We can configure the time zone by `spark.history.timeZone` now.

[GitHub] spark pull request #19646: [SPARK-22147][PYTHON] Fix for createDataFrame fro...

2017-11-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19646#discussion_r148770218 --- Diff: python/pyspark/sql/session.py --- @@ -416,6 +417,50 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark pull request #19646: [SPARK-22147][PYTHON] Fix for createDataFrame fro...

2017-11-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19646#discussion_r148773536 --- Diff: python/pyspark/sql/session.py --- @@ -416,6 +417,50 @@ def _createFromLocal(self, data, schema): data = [schema.toInternal(row)

[GitHub] spark issue #19250: [SPARK-12297] Table timezone correction for Timestamps

2017-11-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19250 Why is this patch so complicated? Based on the fact that data sources accept a "timezone" option for read/write, I'd expect it to be just: * when `CreateTable`, set session local

[GitHub] spark issue #19588: [SPARK-12375][ML] VectorIndexerModel support handle unse...

2017-11-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19588 **[Test build #83391 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83391/testReport)** for PR 19588 at commit

[GitHub] spark issue #19642: [SPARK-22410][SQL] Remove unnecessary output from BatchE...

2017-11-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19642 I see. Would it be better to introduce a logical python runner, so that we can do column pruning correctly?

[GitHub] spark issue #19640: [SPARK-16986][WEB-UI] Replace GMT with history server si...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19640 Merged build finished. Test PASSed.

[GitHub] spark issue #19652: [SPARK-22435][SQL] Support processing array and map type...

2017-11-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19652 **[Test build #83399 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83399/testReport)** for PR 19652 at commit

[GitHub] spark issue #19640: [SPARK-16986][WEB-UI] Replace GMT with history server si...

2017-11-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19640 **[Test build #83385 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83385/testReport)** for PR 19640 at commit

[GitHub] spark issue #19640: [SPARK-16986][WEB-UI] Replace GMT with history server si...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19640 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83385/ Test PASSed. ---

[GitHub] spark issue #19586: [SPARK-22367][WIP][CORE] Separate the serialization of c...

2017-11-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19586 You can call `SparkConf#registerKryoClasses` manually, maybe we can also register these ml classes automatically in `KryoSerializer.newKryo` via reflection. cc @yanboliang @srowen ---
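For reference, manual registration through `SparkConf#registerKryoClasses` might look like the sketch below. The choice of `DenseVector` and `SparseVector` as the ML classes to register is an assumption made for illustration (the comment does not name specific classes), and the object and application names are hypothetical. It needs Spark core and the ML linalg classes on the classpath.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.ml.linalg.{DenseVector, SparseVector}

object KryoRegistrationSketch {
  def main(args: Array[String]): Unit = {
    // Register a couple of ML classes with Kryo by hand.
    // Which classes to register is an assumption; pick the ones your job serializes.
    val conf = new SparkConf()
      .setAppName("kryo-registration-sketch") // hypothetical application name
      .setMaster("local[*]")                  // local mode, for illustration only
      .registerKryoClasses(Array(classOf[DenseVector], classOf[SparseVector]))

    // registerKryoClasses records the class names in the configuration,
    // so the registration can be inspected before a context is created.
    println(conf.get("spark.kryo.classesToRegister"))
    println(conf.get("spark.serializer"))
  }
}
```

The automatic registration inside `KryoSerializer.newKryo` that the comment suggests would remove the need for this per-application step; the sketch only covers the manual route.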

[GitHub] spark pull request #16578: [SPARK-4502][SQL] Parquet nested column pruning

2017-11-03 Thread mallman
Github user mallman commented on a diff in the pull request: https://github.com/apache/spark/pull/16578#discussion_r148731634 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/planning/SelectedFieldSuite.scala --- @@ -0,0 +1,440 @@ +/* + * Licensed to

[GitHub] spark issue #19588: [SPARK-12375][ML] VectorIndexerModel support handle unse...

2017-11-03 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/19588 @hhbyyh comments addressed. Thanks!

[GitHub] spark pull request #19652: [SPARK-22435][SQL] Support processing array and m...

2017-11-03 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19652#discussion_r148748292 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/SparkSqlParser.scala --- @@ -1454,22 +1454,24 @@ class SparkSqlAstBuilder(conf: SQLConf)

[GitHub] spark issue #19588: [SPARK-12375][ML] VectorIndexerModel support handle unse...

2017-11-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19588 **[Test build #83391 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83391/testReport)** for PR 19588 at commit

[GitHub] spark issue #19588: [SPARK-12375][ML] VectorIndexerModel support handle unse...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19588 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83391/ Test PASSed. ---

[GitHub] spark issue #19645: [SPARK-22429][STREAMING] Streaming checkpointing code do...

2017-11-03 Thread tmgstevens
Github user tmgstevens commented on the issue: https://github.com/apache/spark/pull/19645 @srowen, okay my issue around compilation issues, so I've changed the base branch. We can not null it, but I guess by re-opening it we stand a better chance of resolving any unexpected

[GitHub] spark pull request #19625: [SPARK-22407][WEB-UI] Add rdd id column on storag...

2017-11-03 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19625

[GitHub] spark pull request #19652: [SPARK-22435][SQL] Support processing array and m...

2017-11-03 Thread jinxing64
Github user jinxing64 commented on a diff in the pull request: https://github.com/apache/spark/pull/19652#discussion_r148775355 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/ScriptTransformationExec.scala --- @@ -267,6 +268,33 @@ private class

[GitHub] spark pull request #19646: [SPARK-22147][PYTHON] Fix for createDataFrame fro...

2017-11-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19646#discussion_r148779399 --- Diff: python/pyspark/sql/session.py --- @@ -512,9 +557,7 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr

[GitHub] spark pull request #19646: [SPARK-22147][PYTHON] Fix for createDataFrame fro...

2017-11-03 Thread HyukjinKwon
Github user HyukjinKwon commented on a diff in the pull request: https://github.com/apache/spark/pull/19646#discussion_r148782931 --- Diff: python/pyspark/sql/session.py --- @@ -512,9 +557,7 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr

[GitHub] spark pull request #19630: wip: [SPARK-22409] Introduce function type argume...

2017-11-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19630#discussion_r148800791 --- Diff: python/pyspark/rdd.py --- @@ -56,6 +56,22 @@ __all__ = ["RDD"] +class PythonEvalType(object): +""" +

[GitHub] spark pull request #19630: wip: [SPARK-22409] Introduce function type argume...

2017-11-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19630#discussion_r148805057 --- Diff: python/pyspark/sql/functions.py --- @@ -2049,133 +2049,18 @@ def map_values(col): # User Defined

[GitHub] spark issue #19640: [SPARK-16986][WEB-UI] Replace GMT with history server si...

2017-11-03 Thread jiangxb1987
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/19640 cc @ueshin

[GitHub] spark issue #19652: [SPARK-22435][SQL] Support processing array and map type...

2017-11-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19652 **[Test build #83399 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/83399/testReport)** for PR 19652 at commit

[GitHub] spark issue #19652: [SPARK-22435][SQL] Support processing array and map type...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19652 Merged build finished. Test PASSed.

[GitHub] spark issue #19652: [SPARK-22435][SQL] Support processing array and map type...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19652 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83399/ Test PASSed. ---

[GitHub] spark pull request #19649: [SPARK-22405][SQL] Add more ExternalCatalogEvent

2017-11-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19649#discussion_r148783615 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala --- @@ -158,7 +173,13 @@ abstract class

[GitHub] spark pull request #19649: [SPARK-22405][SQL] Add more ExternalCatalogEvent

2017-11-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19649#discussion_r148783570 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalog.scala --- @@ -147,7 +154,15 @@ abstract class

[GitHub] spark pull request #19646: [SPARK-22147][PYTHON] Fix for createDataFrame fro...

2017-11-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19646#discussion_r148791016 --- Diff: python/pyspark/sql/session.py --- @@ -512,9 +557,7 @@ def createDataFrame(self, data, schema=None, samplingRatio=None, verifySchema=Tr

[GitHub] spark issue #19642: [SPARK-22410][SQL] Remove unnecessary output from BatchE...

2017-11-03 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19642 Having a python runner operator also has advantages, such as working better with the optimizer, e.g. for column pruning. I am not against this idea. However, since it requires more change, I'd like to have more

[GitHub] spark pull request #19630: wip: [SPARK-22409] Introduce function type argume...

2017-11-03 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19630#discussion_r148807539 --- Diff: python/pyspark/sql/group.py --- @@ -214,11 +214,11 @@ def apply(self, udf): :param udf: A function object returned by

[GitHub] spark pull request #19654: [SPARK-22437][PYSPARK] default mode for jdbc is w...

2017-11-03 Thread mgaido91
GitHub user mgaido91 opened a pull request: https://github.com/apache/spark/pull/19654 [SPARK-22437][PYSPARK] default mode for jdbc is wrongly set to None ## What changes were proposed in this pull request? When writing using jdbc with python currently we are wrongly

[GitHub] spark issue #19645: [SPARK-22429][STREAMING] Streaming checkpointing code do...

2017-11-03 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19645 **[Test build #3978 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/3978/testReport)** for PR 19645 at commit

[GitHub] spark issue #19642: [SPARK-22410][SQL] Remove unnecessary output from BatchE...

2017-11-03 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19642 For the answer to https://github.com/apache/spark/pull/19642#issuecomment-341688601, > To have a logical python runner, we may need to change some logic of extracting python udfs. May

[GitHub] spark pull request #19630: wip: [SPARK-22409] Introduce function type argume...

2017-11-03 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/19630#discussion_r148807433 --- Diff: python/pyspark/sql/functions.py --- @@ -2208,16 +2093,26 @@ def udf(f=None, returnType=StringType()): | 8| JOHN DOE|

[GitHub] spark issue #19630: wip: [SPARK-22409] Introduce function type argument in p...

2017-11-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19630 overall LGTM, but let's not add udf types that are not implemented yet, like the AGGREGATE

[GitHub] spark pull request #19630: wip: [SPARK-22409] Introduce function type argume...

2017-11-03 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/19630#discussion_r148810396 --- Diff: python/pyspark/sql/udf.py --- @@ -0,0 +1,136 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +#

[GitHub] spark issue #19654: [SPARK-22437][PYSPARK] default mode for jdbc is wrongly ...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19654 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/83401/ Test FAILed.

[GitHub] spark issue #19654: [SPARK-22437][PYSPARK] default mode for jdbc is wrongly ...

2017-11-03 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19654 Merged build finished. Test FAILed.

[GitHub] spark pull request #19630: wip: [SPARK-22409] Introduce function type argume...

2017-11-03 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/19630#discussion_r148811820 --- Diff: python/pyspark/sql/group.py --- @@ -214,11 +214,11 @@ def apply(self, udf): :param udf: A function object returned by

[GitHub] spark issue #19532: [DOC]update the API doc and modify the stage API descrip...

2017-11-03 Thread jiangxb1987
Github user jiangxb1987 commented on the issue: https://github.com/apache/spark/pull/19532 lgtm

[GitHub] spark issue #19651: [SPARK-20682][SPARK-15474][SPARK-21791] Add new ORCFileF...

2017-11-03 Thread dongjoon-hyun
Github user dongjoon-hyun commented on the issue: https://github.com/apache/spark/pull/19651 Thank you, @HyukjinKwon!

[GitHub] spark issue #19642: [SPARK-22410][SQL] Remove unnecessary output from BatchE...

2017-11-03 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/19642 Why is column pruning an execution detail? Actually I feel it's weird to have the `ExtractPythonUDFs` rule applying on physical plans. Is there a particular reason?

[GitHub] spark issue #19649: [SPARK-22405][SQL] Add more ExternalCatalogEvent

2017-11-03 Thread hvanhovell
Github user hvanhovell commented on the issue: https://github.com/apache/spark/pull/19649 Looks good, one small question.

[GitHub] spark issue #19642: [SPARK-22410][SQL] Remove unnecessary output from BatchE...

2017-11-03 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19642 I mean that having an individual python runner operator dictates the way we execute python udfs. Currently, python udfs are just normal expressions. It seems to me that logically they are just expressions.

[GitHub] spark issue #19653: [SPARK-22418][SQL][TEST] Add test cases for NULL Handlin...

2017-11-03 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19653 I think that the failure is related to the lack of changes in code outside tests but I am not sure...

[GitHub] spark pull request #19649: [SPARK-22405][SQL] Add more ExternalCatalogEvent

2017-11-03 Thread jiangxb1987
Github user jiangxb1987 commented on a diff in the pull request: https://github.com/apache/spark/pull/19649#discussion_r148810142 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/catalog/ExternalCatalogEventSuite.scala --- @@ -104,6 +109,8 @@ class

[GitHub] spark issue #19654: [SPARK-22437][PYSPARK] default mode for jdbc is wrongly ...

2017-11-03 Thread mgaido91
Github user mgaido91 commented on the issue: https://github.com/apache/spark/pull/19654 I think that the failure is an infra issue.
