[GitHub] spark issue #18711: [SPARK-21506][DOC]The description of "spark.executor.cor...

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18711 **[Test build #82578 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82578/testReport)** for PR 18711 at commit

[GitHub] spark issue #17968: [SPARK-9792] Make DenseMatrix equality semantical

2017-10-10 Thread gglanzani
Github user gglanzani commented on the issue: https://github.com/apache/spark/pull/17968 @WeichenXu123 Done. Let me know. --- - To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands,

[GitHub] spark issue #17968: [SPARK-9792] Make DenseMatrix equality semantical

2017-10-10 Thread gglanzani
Github user gglanzani commented on the issue: https://github.com/apache/spark/pull/17968 @WeichenXu123 Done again!

[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

2017-10-10 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/19438#discussion_r143684083 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/QuantileSummariesSuite.scala --- @@ -58,7 +58,7 @@ class QuantileSummariesSuite

[GitHub] spark issue #19464: Spark 22233

2017-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19464 Can one of the admins verify this patch?

[GitHub] spark issue #18711: [SPARK-21506][DOC]The description of "spark.executor.cor...

2017-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18711 Merged build finished. Test PASSed.

[GitHub] spark issue #18711: [SPARK-21506][DOC]The description of "spark.executor.cor...

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18711 **[Test build #82578 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82578/testReport)** for PR 18711 at commit

[GitHub] spark issue #18711: [SPARK-21506][DOC]The description of "spark.executor.cor...

2017-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/18711 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82578/ Test PASSed.

[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...

2017-10-10 Thread ueshin
Github user ueshin commented on the issue: https://github.com/apache/spark/pull/18664 I'd say I prefer 1, too. I'm just wondering what if we use timestamp in nested types. Currently we don't support nested types but in the future?

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r143684708 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/HadoopUtils.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r143687378 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -0,0 +1,217 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #19106: [SPARK-21770][ML] ProbabilisticClassificationMode...

2017-10-10 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/19106

[GitHub] spark issue #17359: [SPARK-20028][SQL] Add aggreagate expression nGrams

2017-10-10 Thread gczsjdy
Github user gczsjdy commented on the issue: https://github.com/apache/spark/pull/17359 Sorry, but I think this is inactive. Thanks for your attention. @wzhfy @viirya @gatorsmile

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r143685432 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/HadoopUtils.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19218 **[Test build #82580 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82580/testReport)** for PR 19218 at commit

[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2017-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19218 Merged build finished. Test FAILed.

[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2017-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19218 Test FAILed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82579/ Test FAILed.

[GitHub] spark issue #16648: [SPARK-18016][SQL][CATALYST] Code Generation: Constant P...

2017-10-10 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/16648 @bdrillard Thank you very much

[GitHub] spark issue #6751: [SPARK-8300] DataFrame hint for broadcast join.

2017-10-10 Thread fjh100456
Github user fjh100456 commented on the issue: https://github.com/apache/spark/pull/6751 @rxin @marmbrus Is there another way to broadcast a table with spark-sql now, other than `spark.sql.autoBroadcastJoinThreshold`? And if not, is it a good way to broadcast table by user

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143646922 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/pythonLogicalOperators.scala --- @@ -0,0 +1,43 @@ +/* + *

[GitHub] spark pull request #17968: [SPARK-9792] Make DenseMatrix equality semantical

2017-10-10 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17968#discussion_r143656820 --- Diff: python/pyspark/mllib/linalg/__init__.py --- @@ -1131,14 +1131,20 @@ def __getitem__(self, indices): return self.values[i + j

[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19218 **[Test build #82579 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82579/testReport)** for PR 19218 at commit

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r143686181 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/HadoopUtils.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r143688286 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -0,0 +1,217 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark issue #19464: Spark 22233

2017-10-10 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19464 Could you please update the title of this PR appropriately? e.g. `[SPARK-22233][core] ...`

[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19218 **[Test build #82579 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82579/testReport)** for PR 19218 at commit

[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/18732 LGTM

[GitHub] spark issue #18664: [SPARK-21375][PYSPARK][SQL][WIP] Add Date and Timestamp ...

2017-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/18664 > Write Arrow data with SESSION_LOCAL timestamp (as is currently in this PR) BTW, could we just use `DateTimeUtils.defaultTimeZone()` instead of `SQLConf.SESSION_LOCAL_TIMEZONE` if you

[GitHub] spark pull request #19464: Spark 22233

2017-10-10 Thread liutang123
GitHub user liutang123 opened a pull request: https://github.com/apache/spark/pull/19464 Spark 22233 ## What changes were proposed in this pull request? Add the spark.hadoop.filterOutEmptySplit configuration to allow users to filter out empty splits in HadoopRDD. You can merge this

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r143688666 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -0,0 +1,217 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143646526 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/pythonLogicalOperators.scala --- @@ -0,0 +1,43 @@ +/* + *

[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19082 @kiszk Thanks for summarizing the PRs. I just have a question about method inlining by the JIT compiler. So you mean the JIT compiler will inline methods into a larger unit and then do JIT compilation

[GitHub] spark issue #19082: [SPARK-21870][SQL] Split aggregation code into small fun...

2017-10-10 Thread kiszk
Github user kiszk commented on the issue: https://github.com/apache/spark/pull/19082 Let me summarize recent interesting PRs for code generation regarding the JVM bytecode limit for JIT compilation. These PRs encourage applying JIT compilation to more methods, since most JIT compilers

[GitHub] spark pull request #17968: [SPARK-9792] Make DenseMatrix equality semantical

2017-10-10 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17968#discussion_r143656736 --- Diff: python/pyspark/ml/linalg/__init__.py --- @@ -976,14 +976,20 @@ def __getitem__(self, indices): return self.values[i + j *

[GitHub] spark issue #19439: [SPARK-21866][ML][PySpark] Adding spark image reader

2017-10-10 Thread viirya
Github user viirya commented on the issue: https://github.com/apache/spark/pull/19439 I saw there are a few images. I just want to make sure: are those images free of license issues so that they can be included in Spark?

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r143686017 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/HadoopUtils.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r143689246 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -0,0 +1,217 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #17359: [SPARK-20028][SQL] Add aggreagate expression nGra...

2017-10-10 Thread gczsjdy
Github user gczsjdy closed the pull request at: https://github.com/apache/spark/pull/17359

[GitHub] spark pull request #19463: Cleanup comment in RDDSuite test

2017-10-10 Thread sohum2002
Github user sohum2002 closed the pull request at: https://github.com/apache/spark/pull/19463

[GitHub] spark issue #19106: [SPARK-21770][ML] ProbabilisticClassificationModel fix c...

2017-10-10 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19106 Merged to master. I wasn't clear whether this was a pressing problem that needed to be backported.

[GitHub] spark issue #19463: Cleanup comment in RDDSuite test

2017-10-10 Thread srowen
Github user srowen commented on the issue: https://github.com/apache/spark/pull/19463 This comment seems valid. It's stating the question the test is trying to answer. I'd close this please, as it would be trivial even if valid

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r143686577 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/HadoopUtils.scala --- @@ -0,0 +1,107 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r143689831 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -0,0 +1,217 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #19439: [SPARK-21866][ML][PySpark] Adding spark image rea...

2017-10-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/19439#discussion_r143689721 --- Diff: mllib/src/main/scala/org/apache/spark/ml/image/ImageSchema.scala --- @@ -0,0 +1,217 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request #17968: [SPARK-9792] Make DenseMatrix equality semantical

2017-10-10 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17968#discussion_r143702719 --- Diff: python/pyspark/mllib/linalg/__init__.py --- @@ -1131,14 +1131,17 @@ def __getitem__(self, indices): return self.values[i + j

[GitHub] spark pull request #17968: [SPARK-9792] Make DenseMatrix equality semantical

2017-10-10 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/17968#discussion_r143702830 --- Diff: python/pyspark/ml/linalg/__init__.py --- @@ -976,14 +976,18 @@ def __getitem__(self, indices): return self.values[i + j *

[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...

2017-10-10 Thread MLnick
Github user MLnick commented on the issue: https://github.com/apache/spark/pull/17819 Yes, fair enough On Tue, 10 Oct 2017 at 14:09 Liang-Chi Hsieh wrote: > *@viirya* commented on this pull request. > --

[GitHub] spark issue #19337: [SPARK-22114][ML][MLLIB]add epsilon for LDA

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19337 **[Test build #82581 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82581/testReport)** for PR 19337 at commit

[GitHub] spark pull request #18711: [SPARK-21506][DOC]The description of "spark.execu...

2017-10-10 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/18711

[GitHub] spark issue #19337: [SPARK-22114][ML][MLLIB]add epsilon for LDA

2017-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19337 Merged build finished. Test PASSed.

[GitHub] spark issue #19337: [SPARK-22114][ML][MLLIB]add epsilon for LDA

2017-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19337 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82581/ Test PASSed.

[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2017-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19218 Merged build finished. Test PASSed.

[GitHub] spark issue #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer that can...

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/17819 **[Test build #82583 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82583/testReport)** for PR 17819 at commit

[GitHub] spark pull request #19020: [SPARK-3181] [ML] Implement huber loss for Linear...

2017-10-10 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19020#discussion_r143727850 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -208,6 +292,26 @@ class LinearRegression @Since("1.3.0")

[GitHub] spark issue #19337: [SPARK-22114][ML][MLLIB]add epsilon for LDA

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19337 **[Test build #82582 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82582/testReport)** for PR 19337 at commit

[GitHub] spark issue #17357: [SPARK-20025][CORE] Ignore SPARK_LOCAL* env, while deplo...

2017-10-10 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/17357 LGTM, merging to master!

[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19218 **[Test build #82580 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82580/testReport)** for PR 19218 at commit

[GitHub] spark issue #19218: [SPARK-21786][SQL] The 'spark.sql.parquet.compression.co...

2017-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19218 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82580/ Test PASSed.

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-10 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143740636 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/object.scala --- @@ -519,3 +519,4 @@ case class CoGroup(

[GitHub] spark issue #18979: [SPARK-21762][SQL] FileFormatWriter/BasicWriteTaskStatsT...

2017-10-10 Thread steveloughran
Github user steveloughran commented on the issue: https://github.com/apache/spark/pull/18979 Has anyone had a look at this recently? The problem still exists, and while downstream filesystems can address if they recognise the use case & lie about values, they will be

[GitHub] spark pull request #19443: [SPARK-22212][SQL][PySpark] Some SQL functions in...

2017-10-10 Thread jsnowacki
Github user jsnowacki closed the pull request at: https://github.com/apache/spark/pull/19443

[GitHub] spark issue #18732: [SPARK-20396][SQL][PySpark] groupby().apply() with panda...

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/18732 **[Test build #82584 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82584/testReport)** for PR 18732 at commit

[GitHub] spark issue #18711: [SPARK-21506][DOC]The description of "spark.executor.cor...

2017-10-10 Thread cloud-fan
Github user cloud-fan commented on the issue: https://github.com/apache/spark/pull/18711 LGTM, merging to master!

[GitHub] spark pull request #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer t...

2017-10-10 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/17819#discussion_r143733664 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -96,9 +99,71 @@ final class Bucketizer @Since("1.4.0") (@Since("1.4.0")

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-10 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143741944 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/execution/python/ArrowEvalPythonExec.scala --- @@ -44,14 +73,18 @@ case class

[GitHub] spark issue #19337: [SPARK-22114][ML][MLLIB]add epsilon for LDA

2017-10-10 Thread mpjlu
Github user mpjlu commented on the issue: https://github.com/apache/spark/pull/19337 Thanks, @hhbyyh. I will create a JIRA for the Python API.

[GitHub] spark pull request #19337: [SPARK-22114][ML][MLLIB]add epsilon for LDA

2017-10-10 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19337#discussion_r143705705 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/LDASuite.scala --- @@ -119,6 +121,8 @@ class LDASuite extends SparkFunSuite with

[GitHub] spark pull request #19363: [SPARK-22224][Minor]Override toString of KeyValue...

2017-10-10 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request: https://github.com/apache/spark/pull/19363#discussion_r143717543 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/KeyValueGroupedDataset.scala --- @@ -564,4 +565,30 @@ class KeyValueGroupedDataset[K, V]

[GitHub] spark pull request #19337: [SPARK-22114][ML][MLLIB]add epsilon for LDA

2017-10-10 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/19337#discussion_r143723408 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -573,7 +584,8 @@ private[clustering] object OnlineLDAOptimizer {

[GitHub] spark issue #19337: [SPARK-22114][ML][MLLIB]add epsilon for LDA

2017-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19337 Merged build finished. Test PASSed.

[GitHub] spark issue #19337: [SPARK-22114][ML][MLLIB]add epsilon for LDA

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19337 **[Test build #82582 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82582/testReport)** for PR 19337 at commit

[GitHub] spark issue #19337: [SPARK-22114][ML][MLLIB]add epsilon for LDA

2017-10-10 Thread AmplabJenkins
Github user AmplabJenkins commented on the issue: https://github.com/apache/spark/pull/19337 Test PASSed. Refer to this link for build results (access rights to CI server needed): https://amplab.cs.berkeley.edu/jenkins//job/SparkPullRequestBuilder/82582/ Test PASSed.

[GitHub] spark pull request #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer t...

2017-10-10 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/17819#discussion_r143712667 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -108,26 +173,53 @@ final class Bucketizer @Since("1.4.0")

[GitHub] spark pull request #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer t...

2017-10-10 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/17819#discussion_r143713315 --- Diff: examples/src/main/java/org/apache/spark/examples/ml/JavaBucketizerExample.java --- @@ -33,6 +33,13 @@ import

[GitHub] spark pull request #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer t...

2017-10-10 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/17819#discussion_r143723205 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -96,9 +99,71 @@ final class Bucketizer @Since("1.4.0") (@Since("1.4.0")

[GitHub] spark pull request #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer t...

2017-10-10 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/17819#discussion_r143730685 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -24,20 +24,23 @@ import org.apache.spark.annotation.Since import

[GitHub] spark pull request #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer t...

2017-10-10 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/17819#discussion_r143722947 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/BucketizerSuite.scala --- @@ -187,6 +188,196 @@ class BucketizerSuite extends SparkFunSuite with

[GitHub] spark pull request #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer t...

2017-10-10 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/17819#discussion_r143708258 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -24,20 +24,23 @@ import org.apache.spark.annotation.Since import

[GitHub] spark issue #19270: [SPARK-21809] : Change Stage Page to use datatables to s...

2017-10-10 Thread pgandhi999
Github user pgandhi999 commented on the issue: https://github.com/apache/spark/pull/19270 @ajbozarth I do not quite understand what you are saying. Everything seems to be working fine on my test setup. Can you please let me know how I can replicate the issue? Thank you.

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-10 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143740157 --- Diff: python/pyspark/sql/tests.py --- @@ -3376,6 +3376,151 @@ def test_vectorized_udf_empty_partition(self): res =

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-10 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143740078 --- Diff: python/pyspark/sql/tests.py --- @@ -3376,6 +3376,151 @@ def test_vectorized_udf_empty_partition(self): res =

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-10 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143740129 --- Diff: python/pyspark/sql/tests.py --- @@ -3376,6 +3376,151 @@ def test_vectorized_udf_empty_partition(self): res =

[GitHub] spark issue #19443: [SPARK-22212][SQL][PySpark] Some SQL functions in Python...

2017-10-10 Thread HyukjinKwon
Github user HyukjinKwon commented on the issue: https://github.com/apache/spark/pull/19443 Let's resolve it as `Later` for now. Will keep my eyes on similar JIRAs and ping / cc you in the future. Thanks for bearing with me @jsnowacki.

[GitHub] spark issue #19337: [SPARK-22114][ML][MLLIB]add epsilon for LDA

2017-10-10 Thread SparkQA
Github user SparkQA commented on the issue: https://github.com/apache/spark/pull/19337 **[Test build #82581 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/82581/testReport)** for PR 19337 at commit

[GitHub] spark pull request #19337: [SPARK-22114][ML][MLLIB]add epsilon for LDA

2017-10-10 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19337#discussion_r143704472 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/LDASuite.scala --- @@ -119,6 +121,8 @@ class LDASuite extends SparkFunSuite with

[GitHub] spark pull request #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer t...

2017-10-10 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/17819#discussion_r143706308 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -684,6 +684,34 @@ class DataFrameSuite extends QueryTest with

[GitHub] spark pull request #17357: [SPARK-20025][CORE] Ignore SPARK_LOCAL* env, whil...

2017-10-10 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/17357

[GitHub] spark pull request #19337: [SPARK-22114][ML][MLLIB]add epsilon for LDA

2017-10-10 Thread mgaido91
Github user mgaido91 commented on a diff in the pull request: https://github.com/apache/spark/pull/19337#discussion_r143720272 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LDAOptimizer.scala --- @@ -573,7 +584,8 @@ private[clustering] object OnlineLDAOptimizer

[GitHub] spark pull request #19020: [SPARK-3181] [ML] Implement huber loss for Linear...

2017-10-10 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19020#discussion_r143726258 --- Diff: mllib/src/main/scala/org/apache/spark/ml/regression/LinearRegression.scala --- @@ -744,11 +754,20 @@ object LinearRegressionModel extends

[GitHub] spark pull request #19020: [SPARK-3181] [ML] Implement huber loss for Linear...

2017-10-10 Thread WeichenXu123
Github user WeichenXu123 commented on a diff in the pull request: https://github.com/apache/spark/pull/19020#discussion_r143727489 --- Diff: mllib/src/main/scala/org/apache/spark/ml/optim/aggregator/HuberAggregator.scala --- @@ -0,0 +1,145 @@ +/* + * Licensed to the

[GitHub] spark pull request #18732: [SPARK-20396][SQL][PySpark] groupby().apply() wit...

2017-10-10 Thread icexelloss
Github user icexelloss commented on a diff in the pull request: https://github.com/apache/spark/pull/18732#discussion_r143744197 --- Diff: python/pyspark/sql/functions.py --- @@ -2181,30 +2187,66 @@ def udf(f=None, returnType=StringType()): @since(2.3) def

[GitHub] spark pull request #19337: [SPARK-22114][ML][MLLIB]add epsilon for LDA

2017-10-10 Thread mpjlu
Github user mpjlu commented on a diff in the pull request: https://github.com/apache/spark/pull/19337#discussion_r143705102 --- Diff: mllib/src/test/scala/org/apache/spark/ml/clustering/LDASuite.scala --- @@ -119,6 +121,8 @@ class LDASuite extends SparkFunSuite with

[GitHub] spark pull request #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer t...

2017-10-10 Thread viirya
Github user viirya commented on a diff in the pull request: https://github.com/apache/spark/pull/17819#discussion_r143707170 --- Diff: sql/core/src/test/scala/org/apache/spark/sql/DataFrameSuite.scala --- @@ -684,6 +684,34 @@ class DataFrameSuite extends QueryTest with

[GitHub] spark pull request #19438: [SPARK-22208] [SQL] Improve percentile_approx by ...

2017-10-10 Thread wzhfy
Github user wzhfy commented on a diff in the pull request: https://github.com/apache/spark/pull/19438#discussion_r143748535 --- Diff: sql/catalyst/src/test/scala/org/apache/spark/sql/catalyst/util/QuantileSummariesSuite.scala --- @@ -58,7 +58,7 @@ class QuantileSummariesSuite

[GitHub] spark issue #17968: [SPARK-9792] Make DenseMatrix equality semantical

2017-10-10 Thread WeichenXu123
Github user WeichenXu123 commented on the issue: https://github.com/apache/spark/pull/17968 @gatorsmile Add this to white list!

[GitHub] spark pull request #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer t...

2017-10-10 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/17819#discussion_r143728324 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/BucketizerSuite.scala --- @@ -187,6 +188,196 @@ class BucketizerSuite extends SparkFunSuite with

[GitHub] spark pull request #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer t...

2017-10-10 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/17819#discussion_r143730103 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/BucketizerSuite.scala --- @@ -187,6 +188,196 @@ class BucketizerSuite extends SparkFunSuite with

[GitHub] spark pull request #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer t...

2017-10-10 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/17819#discussion_r143728663 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/BucketizerSuite.scala --- @@ -187,6 +188,196 @@ class BucketizerSuite extends SparkFunSuite with

[GitHub] spark pull request #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer t...

2017-10-10 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/17819#discussion_r143728145 --- Diff: mllib/src/main/scala/org/apache/spark/ml/feature/Bucketizer.scala --- @@ -96,9 +99,71 @@ final class Bucketizer @Since("1.4.0") (@Since("1.4.0")

[GitHub] spark pull request #17819: [SPARK-20542][ML][SQL] Add an API to Bucketizer t...

2017-10-10 Thread MLnick
Github user MLnick commented on a diff in the pull request: https://github.com/apache/spark/pull/17819#discussion_r143724079 --- Diff: mllib/src/test/scala/org/apache/spark/ml/feature/BucketizerSuite.scala --- @@ -187,6 +188,196 @@ class BucketizerSuite extends SparkFunSuite with
