[GitHub] spark pull request: [SPARK-5454] More robust handling of self join...

2015-02-11 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4520#discussion_r24507150 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -282,6 +279,29 @@ class Analyzer(catalog: Catalog,

[GitHub] spark pull request: [SPARK-5454] More robust handling of self join...

2015-02-11 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4520#discussion_r24507183 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -282,6 +279,29 @@ class Analyzer(catalog: Catalog,

[GitHub] spark pull request: [SPARK-5454] More robust handling of self join...

2015-02-11 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4520#discussion_r24507245 --- Diff: sql/core/src/test/resources/log4j.properties --- @@ -37,7 +37,10 @@ log4j.appender.FA.Threshold = INFO # Some packages are noisy

[GitHub] spark pull request: [SPARK-5454] More robust handling of self join...

2015-02-11 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/4520#issuecomment-73911621 LGTM in general, except for some minor issues. My original thought on this was to add a new `Project` on top of the `MultiInstanceRelation`(if it

[GitHub] spark pull request: [SPARK-5522] Accelerate the Histroty Server st...

2015-02-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4525#issuecomment-73911610 [Test build #27290 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27290/consoleFull) for PR 4525 at commit

[GitHub] spark pull request: [SPARK-5521] PCA wrapper for easy transform ve...

2015-02-11 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4304#issuecomment-73918335 ok to test --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature

[GitHub] spark pull request: [SPARK-5521] PCA wrapper for easy transform ve...

2015-02-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4304#issuecomment-73920270 [Test build #27291 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27291/consoleFull) for PR 4304 at commit

[GitHub] spark pull request: [SPARK-5521] PCA wrapper for easy transform ve...

2015-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4304#issuecomment-73920276 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-5741][SQL] Support the path contains co...

2015-02-11 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/4532#discussion_r24512732 --- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala --- @@ -248,7 +249,7 @@ private[hive] object HadoopTableReader extends

[GitHub] spark pull request: [SPARK-5521] PCA wrapper for easy transform ve...

2015-02-11 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4304#discussion_r24511390 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala --- @@ -0,0 +1,111 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] spark pull request: [SPARK-5521] PCA wrapper for easy transform ve...

2015-02-11 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4304#discussion_r24511371 --- Diff: docs/mllib-dimensionality-reduction.md --- @@ -157,6 +157,23 @@ val pc: Matrix = mat.computePrincipalComponents(10) // Principal components are

[GitHub] spark pull request: [SPARK-5521] PCA wrapper for easy transform ve...

2015-02-11 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4304#discussion_r24511400 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala --- @@ -0,0 +1,111 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] spark pull request: [SPARK-5521] PCA wrapper for easy transform ve...

2015-02-11 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4304#discussion_r24511394 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala --- @@ -0,0 +1,111 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] spark pull request: [SPARK-5741][SQL] Support the path contains co...

2015-02-11 Thread adrian-wang
Github user adrian-wang commented on the pull request: https://github.com/apache/spark/pull/4532#issuecomment-73919122 ok to test.

[GitHub] spark pull request: [SPARK-5521] PCA wrapper for easy transform ve...

2015-02-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4304#issuecomment-73918912 [Test build #27291 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27291/consoleFull) for PR 4304 at commit

[GitHub] spark pull request: [SPARK-5722][SQL] fix for infer long type in p...

2015-02-11 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/4521#discussion_r24512284 --- Diff: python/pyspark/sql.py --- @@ -605,6 +605,10 @@ def _infer_type(obj): dataType = _type_mappings.get(type(obj)) if dataType is

[GitHub] spark pull request: [SPARK-5722][SQL] fix for infer long type in p...

2015-02-11 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/4521#issuecomment-73920542 @dondrake After adding a comment, I think it's ready to go.

[GitHub] spark pull request: [SPARK-5522] Accelerate the Histroty Server st...

2015-02-11 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/4525#issuecomment-73910687 ok to test. Thanks for working on this!

[GitHub] spark pull request: [SPARK-5521] PCA wrapper for easy transform ve...

2015-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4304#issuecomment-73906552 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-5521] PCA wrapper for easy transform ve...

2015-02-11 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4304#discussion_r24511410 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala --- @@ -0,0 +1,111 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] spark pull request: [SPARK-5521] PCA wrapper for easy transform ve...

2015-02-11 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4304#discussion_r24511397 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala --- @@ -0,0 +1,111 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] spark pull request: [SPARK-5521] PCA wrapper for easy transform ve...

2015-02-11 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4304#discussion_r24511392 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala --- @@ -0,0 +1,111 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] spark pull request: [SPARK-5521] PCA wrapper for easy transform ve...

2015-02-11 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4304#discussion_r24511403 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala --- @@ -0,0 +1,111 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] spark pull request: [SPARK-5521] PCA wrapper for easy transform ve...

2015-02-11 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4304#discussion_r24511379 --- Diff: docs/mllib-feature-extraction.md --- @@ -370,3 +370,58 @@ data2 = labels.zip(normalizer2.transform(features)) {% endhighlight %} /div

[GitHub] spark pull request: [SPARK-5521] PCA wrapper for easy transform ve...

2015-02-11 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4304#discussion_r24511414 --- Diff: mllib/src/test/scala/org/apache/spark/mllib/feature/PCASuite.scala --- @@ -0,0 +1,47 @@ +/* + * Licensed to the Apache Software Foundation

[GitHub] spark pull request: [SPARK-5521] PCA wrapper for easy transform ve...

2015-02-11 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4304#discussion_r24511407 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala --- @@ -0,0 +1,111 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] spark pull request: [SPARK-5521] PCA wrapper for easy transform ve...

2015-02-11 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4304#discussion_r24511384 --- Diff: mllib/src/main/scala/org/apache/spark/mllib/feature/PCA.scala --- @@ -0,0 +1,111 @@ +/* + * Licensed to the Apache Software Foundation (ASF)

[GitHub] spark pull request: [SPARK-5521] PCA wrapper for easy transform ve...

2015-02-11 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4304#discussion_r24511362 --- Diff: docs/mllib-dimensionality-reduction.md --- @@ -157,6 +157,23 @@ val pc: Matrix = mat.computePrincipalComponents(10) // Principal components are

[GitHub] spark pull request: [SPARK-5722][SQL] fix for infer long type in p...

2015-02-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4521#issuecomment-73920435 [Test build #598 has started](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/598/consoleFull) for PR 4521 at commit

[GitHub] spark pull request: [SPARK-5741][SQL] Support the path contains co...

2015-02-11 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/4532#issuecomment-73921513 ok to test

[GitHub] spark pull request: [SPARK-5677] [SPARK-5734] [SQL] [PySpark] Pyth...

2015-02-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4528#issuecomment-73921913 [Test build #27293 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27293/consoleFull) for PR 4528 at commit

[GitHub] spark pull request: [SPARK-5738] [SQL] Reuse mutable row for each ...

2015-02-11 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/4527#issuecomment-73925301 Also, can you add performance numbers?

[GitHub] spark pull request: Remove outdated remark about take(n).

2015-02-11 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4533#issuecomment-73907037 Looking at the implementation, the driver does query one partition at a time for the number of elements it thinks it needs and continues until it is satisfied. I'd

[GitHub] spark pull request: Remove outdated remark about take(n).

2015-02-11 Thread darabos
Github user darabos commented on the pull request: https://github.com/apache/spark/pull/4533#issuecomment-73910087 Oh, thanks. I never looked into how `allowLocal` works. Looks like it results in local execution if the number of affected partitions is 1

[GitHub] spark pull request: [SPARK-5741][SQL] Support the path contains co...

2015-02-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4532#issuecomment-73922024 [Test build #27292 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27292/consoleFull) for PR 4532 at commit

[GitHub] spark pull request: [SPARK-5738] [SQL] Reuse mutable row for each ...

2015-02-11 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/4527#issuecomment-73924548 Thank you for working on it. Seems `new SpecificMutableRow(schema.fields.map(_.dataType))` cannot handle nested structure. I think we need to use the schema to

[GitHub] spark pull request: Remove outdated remark about take(n).

2015-02-11 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4533#issuecomment-73925690 Sounds correct. The subsequent tries do try in parallel. So, I suppose that's pretty good evidence it's parallelized. Unless anyone else speaks up I think this sentence
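
The `take(n)` behaviour described in this thread (query one partition first, then keep going until enough elements are found) can be illustrated with a minimal sketch. This is plain Python, not Spark code: partitions are modelled as lists, `take` is a hypothetical helper, and the growth factor of 4 is an assumption chosen only for illustration. The later batches are scanned sequentially here, whereas the thread says Spark runs the subsequent tries in parallel.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def take(partitions: Sequence[Sequence[T]], n: int) -> List[T]:
    """Collect up to n elements, scanning partitions in growing batches."""
    result: List[T] = []
    scanned = 0  # number of partitions already read
    batch = 1    # the first attempt reads a single partition
    while len(result) < n and scanned < len(partitions):
        for part in partitions[scanned:scanned + batch]:
            # Take only as many elements as are still missing.
            result.extend(part[: n - len(result)])
        scanned += batch
        batch *= 4  # hypothetical growth factor, not taken from the thread
    return result
```

With an empty first partition, the sketch moves on to the next batch of partitions instead of stopping early.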

[GitHub] spark pull request: [SPARK-5677] [SPARK-5734] [SQL] [PySpark] Pyth...

2015-02-11 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/4528#discussion_r24517268 --- Diff: python/pyspark/sql/context.py --- @@ -294,9 +303,9 @@ def applySchema(self, rdd, schema): df =

[GitHub] spark pull request: [SPARK-5503][MLLIB] Example code for Power Ite...

2015-02-11 Thread javadba
Github user javadba commented on a diff in the pull request: https://github.com/apache/spark/pull/4495#discussion_r24521887 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala --- @@ -0,0 +1,149 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-5503][MLLIB] Example code for Power Ite...

2015-02-11 Thread javadba
Github user javadba commented on a diff in the pull request: https://github.com/apache/spark/pull/4495#discussion_r24521948 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala --- @@ -0,0 +1,149 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-5677] [SPARK-5734] [SQL] [PySpark] Pyth...

2015-02-11 Thread yhuai
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/4528#discussion_r24515956 --- Diff: python/pyspark/sql/context.py --- @@ -294,9 +303,9 @@ def applySchema(self, rdd, schema): df =

[GitHub] spark pull request: [SPARK-5677] [SPARK-5734] [SQL] [PySpark] Pyth...

2015-02-11 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/4528#issuecomment-73929153 For functions taking a table name as an input parameter, can we make the parameter name consistent in this PR? There are a few places in Python where we call it `name`.

[GitHub] spark pull request: [SPARK-5454] More robust handling of self join...

2015-02-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4520#issuecomment-73937987 [Test build #27296 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27296/consoleFull) for PR 4520 at commit

[GitHub] spark pull request: SPARK-4588 [MLLIB] [WIP] Add API for feature a...

2015-02-11 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4460#issuecomment-73927224 @mengxr Great, that helps me. I took another shot at implementing the above ideas. - Is package `org.apache.spark.ml.attribute` reasonable? - `FeatureType`

[GitHub] spark pull request: [SPARK-5740] Change comment default value from...

2015-02-11 Thread marmbrus
Github user marmbrus commented on the pull request: https://github.com/apache/spark/pull/4530#issuecomment-73934772 Hive isn't even consistent across versions for this... Also, SQL has a well defined concept for data missing, `null`. Given that, I don't think we should use a

[GitHub] spark pull request: [SPARK-5503][MLLIB] Example code for Power Ite...

2015-02-11 Thread javadba
Github user javadba commented on a diff in the pull request: https://github.com/apache/spark/pull/4495#discussion_r24519073 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala --- @@ -0,0 +1,149 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-5503][MLLIB] Example code for Power Ite...

2015-02-11 Thread javadba
Github user javadba commented on a diff in the pull request: https://github.com/apache/spark/pull/4495#discussion_r24519062 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala --- @@ -0,0 +1,149 @@ +/* + * Licensed to

[GitHub] spark pull request: Fixing SPARK-5744.

2015-02-11 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/4534#issuecomment-73936918 Thanks for doing this, but the title of this PR isn't sufficient. It will become the commit log message, so please update the PR title to adequately describe what

[GitHub] spark pull request: Fixing SPARK-5744.

2015-02-11 Thread srowen
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/4534#discussion_r24520320 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1253,9 +1253,9 @@ abstract class RDD[T: ClassTag]( /** * @return true

[GitHub] spark pull request: Making RDD.isEmpty robust to empty partitions ...

2015-02-11 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/4534#issuecomment-73940039 @tbertelsen Better, but you still should include SPARK-5744 and add [CORE] to the PR title.

[GitHub] spark pull request: §Spark 5502

2015-02-11 Thread zapletal-martin
GitHub user zapletal-martin opened a pull request: https://github.com/apache/spark/pull/4535 §Spark 5502 You can merge this pull request into a Git repository by running: $ git pull https://github.com/zapletal-martin/spark SPARK-5502 Alternatively you can review and apply

[GitHub] spark pull request: §Spark 5502

2015-02-11 Thread zapletal-martin
Github user zapletal-martin closed the pull request at: https://github.com/apache/spark/pull/4535

[GitHub] spark pull request: SPARK-4588 [MLLIB] [WIP] Add API for feature a...

2015-02-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4460#issuecomment-73927632 [Test build #27294 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27294/consoleFull) for PR 4460 at commit

[GitHub] spark pull request: [SPARK-5738] [SQL] Reuse mutable row for each ...

2015-02-11 Thread yhuai
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/4527#issuecomment-73927673 Oh, `enforceCorrectType` will take care of inner structures by calling `asRow`. It would be great if we could use mutable rows for inner structures as well.

[GitHub] spark pull request: SPARK-4588 [MLLIB] [WIP] Add API for feature a...

2015-02-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4460#issuecomment-73927817 [Test build #27294 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27294/consoleFull) for PR 4460 at commit

[GitHub] spark pull request: SPARK-4588 [MLLIB] [WIP] Add API for feature a...

2015-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4460#issuecomment-73927823 Test FAILed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-5677] [SPARK-5734] [SQL] [PySpark] Pyth...

2015-02-11 Thread davies
Github user davies commented on the pull request: https://github.com/apache/spark/pull/4528#issuecomment-73933395 @yhuai We're trying our best to have the same API across Scala, Java and Python, but sometimes we can't, because of the differences between the languages. For those out of

[GitHub] spark pull request: [SPARK-5741][SQL] Support the path contains co...

2015-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4532#issuecomment-73938036 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: #SPARK-2808 update kafka to version 0.8.2

2015-02-11 Thread helena
Github user helena commented on the pull request: https://github.com/apache/spark/pull/3631#issuecomment-73938104 @koeninger this is a definite blocker for me, I'm upgrading the connector to scala 2.11 with a cross build. Let me know if you have time, otherwise I will get back to

[GitHub] spark pull request: SPARK-5744 [CORE] Making RDD.isEmpty robust to...

2015-02-11 Thread tbertelsen
Github user tbertelsen commented on the pull request: https://github.com/apache/spark/pull/4534#issuecomment-73940991 Sorry. Is it good now?

[GitHub] spark pull request: [SPARK-5744] [CORE] Making RDD.isEmpty robust ...

2015-02-11 Thread markhamstra
Github user markhamstra commented on the pull request: https://github.com/apache/spark/pull/4534#issuecomment-73941110 perfect

[GitHub] spark pull request: [SPARK-5522] Accelerate the Histroty Server st...

2015-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4525#issuecomment-73926814 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-5522] Accelerate the Histroty Server st...

2015-02-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4525#issuecomment-73926806 [Test build #27290 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27290/consoleFull) for PR 4525 at commit

[GitHub] spark pull request: [SPARK-5503][MLLIB] Example code for Power Ite...

2015-02-11 Thread javadba
Github user javadba commented on a diff in the pull request: https://github.com/apache/spark/pull/4495#discussion_r24518027 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala --- @@ -0,0 +1,149 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-5503][MLLIB] Example code for Power Ite...

2015-02-11 Thread javadba
Github user javadba commented on a diff in the pull request: https://github.com/apache/spark/pull/4495#discussion_r24517979 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala --- @@ -0,0 +1,149 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-5677] [SPARK-5734] [SQL] [PySpark] Pyth...

2015-02-11 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/4528#discussion_r24518769 --- Diff: python/pyspark/sql/types.py --- @@ -188,6 +199,8 @@ class IntegerType(PrimitiveType): The data type representing int values.

[GitHub] spark pull request: [SPARK-5503][MLLIB] Example code for Power Ite...

2015-02-11 Thread javadba
Github user javadba commented on a diff in the pull request: https://github.com/apache/spark/pull/4495#discussion_r24518800 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala --- @@ -0,0 +1,149 @@ +/* + * Licensed to

[GitHub] spark pull request: [SPARK-5677] [SPARK-5734] [SQL] [PySpark] Pyth...

2015-02-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4528#issuecomment-73935150 [Test build #27295 has started](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27295/consoleFull) for PR 4528 at commit

[GitHub] spark pull request: Fixing SPARK-5744.

2015-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4534#issuecomment-73936109 Can one of the admins verify this patch?

[GitHub] spark pull request: [SPARK-5744] [CORE] Making RDD.isEmpty robust ...

2015-02-11 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4534#issuecomment-73946196 This works, so it's not quite empty partitions: ``` sc.parallelize(Seq[Int](), 1).isEmpty() ``` This also creates an exception, so it's to do with

[GitHub] spark pull request: [SPARK-5740] Change comment default value from...

2015-02-11 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/4530#issuecomment-73932469 This is technically API breaking - and if we want to change it, I think N/A is a better word. @marmbrus @mengxr ?

[GitHub] spark pull request: [SPARK-5503][MLLIB] Example code for Power Ite...

2015-02-11 Thread javadba
Github user javadba commented on a diff in the pull request: https://github.com/apache/spark/pull/4495#discussion_r24518861 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala --- @@ -0,0 +1,149 @@ +/* + * Licensed to

[GitHub] spark pull request: Making RDD.isEmpty robust to empty partitions ...

2015-02-11 Thread tbertelsen
Github user tbertelsen commented on a diff in the pull request: https://github.com/apache/spark/pull/4534#discussion_r24521457 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1253,9 +1253,9 @@ abstract class RDD[T: ClassTag]( /** * @return

[GitHub] spark pull request: [SPARK-5649][SQL] added a rule to check dataty...

2015-02-11 Thread marmbrus
Github user marmbrus commented on a diff in the pull request: https://github.com/apache/spark/pull/4425#discussion_r24525418 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -69,6 +69,7 @@ class Analyzer(catalog: Catalog,

[GitHub] spark pull request: [SPARK-5503][MLLIB] Example code for Power Ite...

2015-02-11 Thread javadba
Github user javadba commented on a diff in the pull request: https://github.com/apache/spark/pull/4495#discussion_r24518525 --- Diff: examples/src/main/scala/org/apache/spark/examples/mllib/PowerIterationClusteringExample.scala --- @@ -0,0 +1,149 @@ +/* + * Licensed to

[GitHub] spark pull request: Fixing SPARK-5744.

2015-02-11 Thread tbertelsen
Github user tbertelsen commented on the pull request: https://github.com/apache/spark/pull/4534#issuecomment-73935576 FYI: The method was introduced in https://github.com/apache/spark/pull/4074

[GitHub] spark pull request: Fixing SPARK-5744.

2015-02-11 Thread tbertelsen
GitHub user tbertelsen opened a pull request: https://github.com/apache/spark/pull/4534 Fixing SPARK-5744. RDD.isEmpty fails when an RDD contains empty partitions. You can merge this pull request into a Git repository by running: $ git pull https://github.com/tbertelsen/spark
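
As a rough illustration of the goal stated in this pull request, here is a minimal sketch of an emptiness check that tolerates empty partitions. It is plain Python rather than Spark: partitions are modelled as lists, and `is_empty` is a hypothetical helper, not the method changed in the pull request.

```python
from typing import Sequence


def is_empty(partitions: Sequence[Sequence[object]]) -> bool:
    """Return True when no partition holds any element.

    Works for zero partitions, for partitions that are all empty,
    and for a mix of empty and non-empty partitions.
    """
    # any() stops at the first non-empty partition it finds.
    return not any(len(part) > 0 for part in partitions)
```

The check never combines per-partition results pairwise, so a partition with no elements contributes nothing rather than requiring special handling.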

[GitHub] spark pull request: [SPARK-5741][SQL] Support the path contains co...

2015-02-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4532#issuecomment-73938027 [Test build #27292 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27292/consoleFull) for PR 4532 at commit

[GitHub] spark pull request: [SPARK-5722][SQL] fix for infer long type in p...

2015-02-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4521#issuecomment-73937983 [Test build #598 has finished](https://amplab.cs.berkeley.edu/jenkins/job/NewSparkPullRequestBuilder/598/consoleFull) for PR 4521 at commit

[GitHub] spark pull request: [SPARK-5677] [SPARK-5734] [SQL] [PySpark] Pyth...

2015-02-11 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/4528#issuecomment-73943042 [Test build #27293 has finished](https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/27293/consoleFull) for PR 4528 at commit

[GitHub] spark pull request: [SPARK-5677] [SPARK-5734] [SQL] [PySpark] Pyth...

2015-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4528#issuecomment-73943049 Test PASSed. Refer to this link for build results (access rights to CI server needed):

[GitHub] spark pull request: [SPARK-5744] [CORE] Making RDD.isEmpty robust ...

2015-02-11 Thread tbertelsen
Github user tbertelsen commented on a diff in the pull request: https://github.com/apache/spark/pull/4534#discussion_r24523768 --- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala --- @@ -1253,9 +1253,9 @@ abstract class RDD[T: ClassTag]( /** * @return

[GitHub] spark pull request: [SPARK-5722][SQL] fix for infer long type in p...

2015-02-11 Thread dondrake
Github user dondrake commented on the pull request: https://github.com/apache/spark/pull/4521#issuecomment-73884346 @rxin I updated the title of the pull request. @davies In regards to inferSchema(), this is a PR for v1.2, I'm going to submit another PR for 1.3 that will use

[GitHub] spark pull request: [SPARK-5090][examples] The improvement of pyth...

2015-02-11 Thread GenTang
Github user GenTang commented on the pull request: https://github.com/apache/spark/pull/3920#issuecomment-73889175 @davies @MLnick Perhaps it is not a good place to discuss this, but I tried the script hbase_outputformat.py in spark 1.2.0 and it caused

[GitHub] spark pull request: [SPARK-5738] [SQL] Reuse mutable row for each ...

2015-02-11 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4527#discussion_r24503655 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JsonRDD.scala --- @@ -39,7 +39,19 @@ private[sql] object JsonRDD extends Logging {

[GitHub] spark pull request: Remove outdated remark about take(n).

2015-02-11 Thread darabos
GitHub user darabos opened a pull request: https://github.com/apache/spark/pull/4533 Remove outdated remark about take(n). Looking at the code, I believe this remark about `take(n)` computing partitions on the driver is no longer correct. Apologies if I'm wrong. This came

[GitHub] spark pull request: [SPARK-5090][examples] The improvement of pyth...

2015-02-11 Thread GenTang
Github user GenTang commented on a diff in the pull request: https://github.com/apache/spark/pull/3920#discussion_r24495658 --- Diff: examples/src/main/scala/org/apache/spark/examples/pythonconverters/HBaseConverters.scala --- @@ -23,15 +23,27 @@ import

[GitHub] spark pull request: [SPARK-5738] [SQL] Reuse mutable row for each ...

2015-02-11 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4527#discussion_r24504239 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JsonRDD.scala --- @@ -39,7 +39,19 @@ private[sql] object JsonRDD extends Logging {

[GitHub] spark pull request: [SPARK-5733] Error Link in Pagination of Histr...

2015-02-11 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4523 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is

[GitHub] spark pull request: [SPARK-5732][CORE]:Add an option to print the ...

2015-02-11 Thread uncleGen
Github user uncleGen commented on a diff in the pull request: https://github.com/apache/spark/pull/4522#discussion_r24495488 --- Diff: core/src/main/scala/org/apache/spark/deploy/SparkSubmitArguments.scala --- @@ -413,10 +413,13 @@ private[spark] class SparkSubmitArguments(args:

[GitHub] spark pull request: [SPARK-5738] [SQL] Reuse mutable row for each ...

2015-02-11 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/4527#discussion_r24503309 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JsonRDD.scala --- @@ -39,7 +39,19 @@ private[sql] object JsonRDD extends Logging {

[GitHub] spark pull request: Remove outdated remark about take(n).

2015-02-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/4533#issuecomment-73884236 Can one of the admins verify this patch?

[GitHub] spark pull request: [MLLIB][SPARK-5502] User guide for isotonic re...

2015-02-11 Thread zapletal-martin
GitHub user zapletal-martin opened a pull request: https://github.com/apache/spark/pull/4536 [MLLIB][SPARK-5502] User guide for isotonic regression User guide for isotonic regression added to docs/mllib-regression.md including code examples for Scala and Java. You can merge this

[GitHub] spark pull request: [SPARK-5622][SQL] add connector configuration ...

2015-02-11 Thread helena
Github user helena commented on a diff in the pull request: https://github.com/apache/spark/pull/4406#discussion_r24527030 --- Diff: sql/hive-thriftserver/src/main/scala/org/apache/spark/sql/hive/thriftserver/HiveThriftServer2.scala --- @@ -35,7 +37,7 @@ import

[GitHub] spark pull request: [SPARK-2808][Streaming][Kafka] update kafka to...

2015-02-11 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4537#issuecomment-73960395 ok to test

[GitHub] spark pull request: [SPARK-5722][SQL] fix for infer long type in p...

2015-02-11 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/4538#issuecomment-73961331 ok to test

[GitHub] spark pull request: [MLLIB][SPARK-5502] User guide for isotonic re...

2015-02-11 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/4536#issuecomment-73961955 @zapletal-martin My bad. It should be `mllib-classification-regression.md`.

[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...

2015-02-11 Thread rxin
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/4524#discussion_r24531964 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/LogicalPlan.scala --- @@ -128,6 +128,29 @@ abstract class LogicalPlan extends

[GitHub] spark pull request: [SPARK-3688][SQL]LogicalPlan can't resolve col...

2015-02-11 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/4524

[GitHub] spark pull request: [MLLIB][SPARK-5502] User guide for isotonic re...

2015-02-11 Thread mengxr
Github user mengxr commented on a diff in the pull request: https://github.com/apache/spark/pull/4536#discussion_r24532955 --- Diff: data/mllib/sample_isotonic_regression_data.csv --- @@ -0,0 +1,101 @@ +4710.28,500.00,1.00 --- End diff -- Btw, we can use `.txt`
