[GitHub] spark pull request: [SPARK-2454] Do not assume drivers and executo...

2014-07-21 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1472#issuecomment-49689468 I have tested this on a standalone cluster, purposefully changing the directory structure of the driver to be different from that of the executors. I was able to

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-21 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15207593 --- Diff: docs/configuration.md --- @@ -195,6 +195,15 @@ Apart from these, the following properties are also available, and may be useful Spark's

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-21 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15207614 --- Diff: python/pyspark/rdd.py --- @@ -1247,15 +1262,16 @@ def combineLocally(iterator): return combiners.iteritems()

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-21 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15207622 --- Diff: python/pyspark/rdd.py --- @@ -168,6 +169,20 @@ def _replaceRoot(self, value): self._sink(1) +def _parse_memory(s):

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-21 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15207630 --- Diff: python/pyspark/serializers.py --- @@ -297,6 +297,33 @@ class MarshalSerializer(FramedSerializer): loads = marshal.loads

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-21 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15207645 --- Diff: python/pyspark/shuffle.py --- @@ -0,0 +1,258 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-21 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15207668 --- Diff: python/pyspark/shuffle.py --- @@ -0,0 +1,258 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-21 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15207662 --- Diff: python/pyspark/shuffle.py --- @@ -0,0 +1,258 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-21 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15207683 --- Diff: python/pyspark/tests.py --- @@ -47,6 +48,40 @@ SPARK_HOME = os.environ["SPARK_HOME"] +class TestMerger(unittest.TestCase): +

[GitHub] spark pull request: SPARK-2565. Update ShuffleReadMetrics as block...

2014-07-21 Thread andrewor14
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1507#issuecomment-49690248 @kayousterhout --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-21 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15207731 --- Diff: python/pyspark/rdd.py --- @@ -1247,15 +1262,16 @@ def combineLocally(iterator): return combiners.iteritems()

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-21 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15207721 --- Diff: python/pyspark/rdd.py --- @@ -1247,15 +1262,16 @@ def combineLocally(iterator): return combiners.iteritems()

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-21 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15207775 --- Diff: python/pyspark/shuffle.py --- @@ -0,0 +1,258 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] spark pull request: [SPARK-2555] Support configuration spark.sched...

2014-07-21 Thread li-zhihui
Github user li-zhihui commented on the pull request: https://github.com/apache/spark/pull/1462#issuecomment-49690527 Sorry @tgravescs @kayousterhout, I was not aware of the issue's seriousness at that time. Thanks @kayousterhout for your coaching.

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-21 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15207812 --- Diff: python/pyspark/shuffle.py --- @@ -0,0 +1,258 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-21 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15207837 --- Diff: python/pyspark/shuffle.py --- @@ -0,0 +1,258 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-21 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15207914 --- Diff: python/pyspark/shuffle.py --- @@ -0,0 +1,258 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-21 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15207936 --- Diff: python/pyspark/shuffle.py --- @@ -0,0 +1,258 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-21 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15207983 --- Diff: python/pyspark/shuffle.py --- @@ -0,0 +1,258 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-21 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15208014 --- Diff: python/pyspark/shuffle.py --- @@ -0,0 +1,258 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-21 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15208037 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala --- @@ -57,7 +57,9 @@ private[spark] class PythonRDD[T: ClassTag]( override

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-21 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15208070 --- Diff: python/pyspark/shuffle.py --- @@ -0,0 +1,258 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-21 Thread mateiz
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15208097 --- Diff: python/pyspark/shuffle.py --- @@ -0,0 +1,258 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] spark pull request: [SPARK-2588][SQL] Add some more DSLs.

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1491#issuecomment-49692220 QA tests have started for PR 1491. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16939/consoleFull

[GitHub] spark pull request: [SPARK-2086] Improve output of toDebugString t...

2014-07-21 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1364

[GitHub] spark pull request: [SPARK-2603][SQL] Remove unnecessary toMap and...

2014-07-21 Thread advancedxy
Github user advancedxy commented on a diff in the pull request: https://github.com/apache/spark/pull/1504#discussion_r15208849 --- Diff: sql/core/src/main/scala/org/apache/spark/sql/json/JsonRDD.scala --- @@ -239,9 +239,9 @@ private[sql] object JsonRDD extends Logging {

[GitHub] spark pull request: [SPARK-2454] Do not assume drivers and executo...

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1472#issuecomment-49693537 QA results for PR 1472: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-21 Thread colorant
Github user colorant commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15208983 --- Diff: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleManager.scala --- @@ -0,0 +1,80 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: (WIP) SPARK-2045 Sort-based shuffle

2014-07-21 Thread colorant
Github user colorant commented on a diff in the pull request: https://github.com/apache/spark/pull/1499#discussion_r15209013 --- Diff: core/src/main/scala/org/apache/spark/shuffle/sort/SortShuffleWriter.scala --- @@ -0,0 +1,156 @@ +/* + * Licensed to the Apache Software

[GitHub] spark pull request: [bagel]unpersist old processed rdd

2014-07-21 Thread adrian-wang
GitHub user adrian-wang opened a pull request: https://github.com/apache/spark/pull/1519 [bagel]unpersist old processed rdd You can merge this pull request into a Git repository by running: $ git pull https://github.com/adrian-wang/spark bagelunpersist Alternatively you can

[GitHub] spark pull request: [bagel]unpersist old processed rdd

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1519#issuecomment-49694004 QA tests have started for PR 1519. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16940/consoleFull

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

2014-07-21 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1505#issuecomment-49694729 If we have fixed all the problems, let's do it. We should add pep8 check to the Jenkins scripts (in /dev/). I'm not 100% positive whether our Jenkins instances have

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

2014-07-21 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1505#issuecomment-49694759 Jenkins, test this please.

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

2014-07-21 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1505#issuecomment-49694764 Jenkins, add to whitelist.

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1505#issuecomment-49694965 QA tests have started for PR 1505. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16941/consoleFull

[GitHub] spark pull request: [SPARK-2514] [mllib] Random RDD generator

2014-07-21 Thread dorx
GitHub user dorx opened a pull request: https://github.com/apache/spark/pull/1520 [SPARK-2514] [mllib] Random RDD generator Utilities for generating random RDDs. RandomRDD and RandomVectorRDD are created instead of using `sc.parallelize(range:Range)` because `Range`

[GitHub] spark pull request: [SPARK-2514] [mllib] Random RDD generator

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1520#issuecomment-49695427 QA tests have started for PR 1520. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16942/consoleFull

[GitHub] spark pull request: [SPARK-2514] [mllib] Random RDD generator

2014-07-21 Thread dorx
Github user dorx commented on the pull request: https://github.com/apache/spark/pull/1520#issuecomment-49695577 @falaki @jkbradley @mengxr

[GitHub] spark pull request: [SPARK-2588][SQL] Add some more DSLs.

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1491#issuecomment-49696891 QA results for PR 1491: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test

[GitHub] spark pull request: SPARK-1416: PySpark support for SequenceFile a...

2014-07-21 Thread rjurney
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-49697592 My data has doubles in it, could that be the issue? Using Python version 2.7.6rc1 (v2.7.6rc1:4913d0e9be30+, Oct 27 2013 20:52:11) SparkContext available as

[GitHub] spark pull request: SPARK-1416: PySpark support for SequenceFile a...

2014-07-21 Thread rjurney
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-49698081 It also looks like we need a custom function to handle the UNION type. I've extended what you wrote for DOUBLE/FLOAT: def unpack(value: Any, schema: Schema):

[GitHub] spark pull request: Adding OWL-QN optimizer for L1 regularizations...

2014-07-21 Thread gzm55
Github user gzm55 commented on the pull request: https://github.com/apache/spark/pull/840#issuecomment-49698091 @codedeft Could you fix the conflicts when merging into master?

[GitHub] spark pull request: SPARK-1416: PySpark support for SequenceFile a...

2014-07-21 Thread rjurney
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-49698499 This file, without any UNIONS, works: https://github.com/miguno/avro-cli-examples/blob/master/twitter.snappy.avro My data is more complex :(

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

2014-07-21 Thread nchammas
Github user nchammas commented on the pull request: https://github.com/apache/spark/pull/1505#issuecomment-49698547 If we have to, we could probably somehow package `pep8` and its dependencies as a standalone. It's doable but I think also a bit ugly and harder to update. As

[GitHub] spark pull request: [bagel]unpersist old processed rdd

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1519#issuecomment-49698613 QA results for PR 1519: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

2014-07-21 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1505#issuecomment-49698681 Let's definitely add the pep8 check in a separate PR. I'm waiting for Jenkins to come back positive before merging this pull request. I think it'd make sense to

[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...

2014-07-21 Thread mengxr
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1425#issuecomment-49698702 @dbtsai I saw why `!(a~==b)` doesn't work but the question was that `!~==` was not used in our tests except the unit tests for itself.
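
For readers unfamiliar with tolerance-based comparison, the idea behind operators such as `~==` and `!~==` can be sketched in Python. This is only an illustration, not the Scala code under review in the pull request: the names `approx_equal` and `approx_not_equal` and the default tolerances are assumptions made for the example.

```python
def approx_equal(a, b, rel_tol=1e-9, abs_tol=0.0):
    """Return True when a and b are equal within a tolerance.

    The allowed difference is the larger of a relative bound (scaled by the
    bigger magnitude of the two inputs) and an absolute bound.
    Hypothetical helper, not part of Spark.
    """
    diff = abs(a - b)
    return diff <= max(rel_tol * max(abs(a), abs(b)), abs_tol)


def approx_not_equal(a, b, rel_tol=1e-9, abs_tol=0.0):
    """Negation of approx_equal, given its own name for readable assertions."""
    return not approx_equal(a, b, rel_tol, abs_tol)


if __name__ == "__main__":
    print(approx_equal(0.1 + 0.2, 0.3))   # rounding error is within tolerance
    print(approx_not_equal(1.0, 1.1))     # clearly different values
```

A purely relative bound cannot match a value against exactly zero, which is why the sketch also accepts an absolute tolerance.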

[GitHub] spark pull request: SPARK-2250: show stage RDDs in UI

2014-07-21 Thread nevillelyh
Github user nevillelyh commented on the pull request: https://github.com/apache/spark/pull/1188#issuecomment-49698747 @andrewor14 ping again? Rebased again and scalastyle/tests passed locally.

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-21 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15211059 --- Diff: core/src/main/scala/org/apache/spark/api/python/PythonRDD.scala --- @@ -57,7 +57,9 @@ private[spark] class PythonRDD[T: ClassTag]( override

[GitHub] spark pull request: SPARK-1416: PySpark support for SequenceFile a...

2014-07-21 Thread ericgarcia
Github user ericgarcia commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-49699078 @rjurney Nice job adding the DOUBLE and FLOAT. I might be able to get the UNIONS working if you post an example .avro file for me to test it with. I neglected the data

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-21 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15211142 --- Diff: python/pyspark/rdd.py --- @@ -1247,15 +1262,16 @@ def combineLocally(iterator): return combiners.iteritems()

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1505#issuecomment-49699504 QA results for PR 1505: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds no public classes. For more information see test

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-21 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15211313 --- Diff: python/pyspark/serializers.py --- @@ -297,6 +297,33 @@ class MarshalSerializer(FramedSerializer): loads = marshal.loads

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-21 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15211390 --- Diff: python/pyspark/shuffle.py --- @@ -0,0 +1,258 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

2014-07-21 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1505#issuecomment-49699903 Merging this in master!

[GitHub] spark pull request: [SPARK-2364][STREAMING] ShuffledDStream run ta...

2014-07-21 Thread guowei2
Github user guowei2 commented on the pull request: https://github.com/apache/spark/pull/1291#issuecomment-49699950 Is there anything wrong with this patch?

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

2014-07-21 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1505#issuecomment-49699932 Now that I've merged this, do you mind submitting a pep8 checker?

[GitHub] spark pull request: [SPARK-2470] PEP8 fixes to PySpark

2014-07-21 Thread asfgit
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1505

[GitHub] spark pull request: [SPARK-2514] [mllib] Random RDD generator

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1520#issuecomment-49700088 QA results for PR 1520: - This patch PASSES unit tests. - This patch merges cleanly. - This patch adds the following public classes (experimental): trait

[GitHub] spark pull request: SPARK-1416: PySpark support for SequenceFile a...

2014-07-21 Thread rjurney
Github user rjurney commented on the pull request: https://github.com/apache/spark/pull/455#issuecomment-49700443 @ericgarcia Thanks! Very exciting. An example file is here: https://drive.google.com/file/d/0B3wy0wXNwbpRekJVaW13cGRKb1U/edit?usp=sharing

[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...

2014-07-21 Thread davies
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15211652 --- Diff: python/pyspark/shuffle.py --- @@ -0,0 +1,258 @@ +# +# Licensed to the Apache Software Foundation (ASF) under one or more +# contributor

[GitHub] spark pull request: [SQL][CORE] SPARK-2102

2014-07-21 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1377#discussion_r15211673 --- Diff: docs/configuration.md --- @@ -382,6 +382,16 @@ Apart from these, the following properties are also available, and may be useful </td>

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-21 Thread sryza
Github user sryza commented on the pull request: https://github.com/apache/spark/pull/1056#issuecomment-49700737 Upmerged and incorporated review comments. Also added a random sleep at the start so that the executor heartbeats are less likely to get in sync.
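
The random initial sleep mentioned above can be sketched as follows. This is a minimal illustration in Python, not the code from the pull request: the function names `initial_delay` and `heartbeat_times` and the 10-second interval are assumptions chosen for the example.

```python
import random


def initial_delay(interval_s, rng):
    """Pick a random start offset between 0 and interval_s for one executor.

    Hypothetical helper, not part of Spark.
    """
    return rng.uniform(0.0, interval_s)


def heartbeat_times(interval_s, count, rng):
    """Return the times (seconds since launch) of the first `count` heartbeats.

    Each executor sleeps a random amount once, then reports at a fixed
    interval, so executors launched together do not all report at the
    same instant.
    """
    start = initial_delay(interval_s, rng)
    return [start + i * interval_s for i in range(count)]


if __name__ == "__main__":
    rng = random.Random(42)  # seeded only to make the demo repeatable
    for executor_id in range(3):
        times = heartbeat_times(10.0, 3, rng)
        print(executor_id, [round(t, 2) for t in times])
```

Only the first heartbeat is shifted; the spacing between later heartbeats stays equal to the interval, so the reporting rate per executor is unchanged.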

[GitHub] spark pull request: Fix for SPARK-2228

2014-07-21 Thread kellrott
Github user kellrott closed the pull request at: https://github.com/apache/spark/pull/1182

[GitHub] spark pull request: SPARK-2099. Report progress while task is runn...

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1056#issuecomment-49700944 QA tests have started for PR 1056. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16944/consoleFull

[GitHub] spark pull request: [SQL][CORE] SPARK-2102

2014-07-21 Thread ianoc
Github user ianoc commented on the pull request: https://github.com/apache/spark/pull/1377#issuecomment-49701073 @pwendell Sounds good to me, updated as per your suggestion.

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-21 Thread chenghao-intel
Github user chenghao-intel commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49574159 Thank you guys, I've updated the code as suggested, and also attached the micro-benchmark result in the PR description.

[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-21 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/1399#discussion_r15155941 --- Diff: bin/spark-sql --- @@ -0,0 +1,81 @@ +#!/usr/bin/env bash + +# +# Licensed to the Apache Software Foundation (ASF) under one or more

[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-21 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/1399#discussion_r15155947 --- Diff: assembly/pom.xml --- @@ -162,6 +162,11 @@ <artifactId>spark-hive_${scala.binary.version}</artifactId>

[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...

2014-07-21 Thread liancheng
Github user liancheng commented on a diff in the pull request: https://github.com/apache/spark/pull/1399#discussion_r15155961 --- Diff: docs/sql-programming-guide.md --- @@ -573,4 +572,170 @@ prefixed with a tick (`'`). Implicit conversions turn these symbols into expres

[GitHub] spark pull request: SPARK-2226: transform HAVING clauses with aggr...

2014-07-21 Thread chenghao-intel
Github user chenghao-intel commented on a diff in the pull request: https://github.com/apache/spark/pull/1497#discussion_r15156055 --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala --- @@ -152,6 +155,34 @@ class Analyzer(catalog: Catalog,

[GitHub] spark pull request: SPARK-2047: Introduce an in-mem Sorter, and us...

2014-07-21 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1502#issuecomment-49574913 I spoke with @aarondav, but I'm not sure we can borrow this code from Java if it is LGPL licensed.

[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-21 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1371#issuecomment-49574961 Are you sure about that? They're forked from Java, not from the Python process. If this is the case, please suggest another way to test this. We can't add a bug

[GitHub] spark pull request: SPARK-2269 Refactor mesos scheduler resourceOf...

2014-07-21 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1487#issuecomment-49575030 Jenkins, test this please. Naw this is a message from jenkins.

[GitHub] spark pull request: SPARK-2269 Refactor mesos scheduler resourceOf...

2014-07-21 Thread pwendell
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1487#discussion_r15156158 --- Diff: core/src/test/scala/org/apache/spark/scheduler/mesos/MesosSchedulerBackendSuite.scala --- @@ -0,0 +1,84 @@ +/* + * Licensed to the Apache

[GitHub] spark pull request: SPARK-2047: Introduce an in-mem Sorter, and us...

2014-07-21 Thread rxin
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1502#issuecomment-49575156 He did it!

[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-21 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/634#discussion_r15156184 --- Diff: yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientClusterScheduler.scala --- @@ -37,14 +37,4 @@ private[spark] class

[GitHub] spark pull request: SPARK-2310. Support arbitrary Spark properties...

2014-07-21 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1253#issuecomment-49575215 I agree, -D is for JVM options, but these are not arbitrary JVM options.

[GitHub] spark pull request: SPARK-2047: Introduce an in-mem Sorter, and us...

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1502#issuecomment-49575204 QA tests have started for PR 1502. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16898/consoleFull

[GitHub] spark pull request: SPARK-2269 Refactor mesos scheduler resourceOf...

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1487#issuecomment-49575213 QA tests have started for PR 1487. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16899/consoleFull

[GitHub] spark pull request: SPARK-2269 Refactor mesos scheduler resourceOf...

2014-07-21 Thread pwendell
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1487#issuecomment-49575273 Don't have a ton of time to look ATM but I kicked off the tests.

[GitHub] spark pull request: SPARK-1707. Remove unnecessary 3 second sleep ...

2014-07-21 Thread lianhuiwang
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/634#discussion_r15156221 --- Diff: yarn/common/src/main/scala/org/apache/spark/scheduler/cluster/YarnClientSchedulerBackend.scala --- @@ -30,6 +30,11 @@ private[spark] class

[GitHub] spark pull request: SPARK-2565. Update ShuffleReadMetrics as block...

2014-07-21 Thread sryza
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/1507 SPARK-2565. Update ShuffleReadMetrics as blocks are fetched You can merge this pull request into a Git repository by running: $ git pull https://github.com/sryza/spark sandy-spark-2565

[GitHub] spark pull request: [SPARK-2103][Streaming] Change to ClassTag for...

2014-07-21 Thread jerryshao
GitHub user jerryshao opened a pull request: https://github.com/apache/spark/pull/1508 [SPARK-2103][Streaming] Change to ClassTag for KafkaInputDStream and fix reflection issue This PR updates previous Manifest for KafkaInputDStream's Decoder to ClassTag, also fix the problem

[GitHub] spark pull request: SPARK-2565. Update ShuffleReadMetrics as block...

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1507#issuecomment-49575625 QA tests have started for PR 1507. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16901/consoleFull

[GitHub] spark pull request: [SPARK-2103][Streaming] Change to ClassTag for...

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1508#issuecomment-49575634 QA tests have started for PR 1508. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16900/consoleFull

[GitHub] spark pull request: SPARK-2047: Introduce an in-mem Sorter, and us...

2014-07-21 Thread aarondav
Github user aarondav commented on the pull request: https://github.com/apache/spark/pull/1502#issuecomment-49575933 In light of that minor issue, I have ported an Apache v2 Timsort (from the Android repos). It's a bit longer, but far more performant (roughly twice as fast!)

[GitHub] spark pull request: SPARK-2047: Introduce an in-mem Sorter, and us...

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1502#issuecomment-49576105 QA tests have started for PR 1502. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16902/consoleFull

[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1440#issuecomment-49576209 QA results for PR 1440: This patch PASSES unit tests. For more information see test

[GitHub] spark pull request: SPARK-2047: Introduce an in-mem Sorter, and us...

2014-07-21 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1502#issuecomment-49576500 Hey Aaron, make sure you add a note in the LICENSE file saying this part of the code is from Android (similar to the other notes there).

[GitHub] spark pull request: SPARK-2047: Introduce an in-mem Sorter, and us...

2014-07-21 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1502#issuecomment-49576760 BTW the API looks good to me! Actually it might allow more efficient ways to keep track of the partition ID for each key in my case too.

[GitHub] spark pull request: [SPARK-2190][SQL] Specialized ColumnType for T...

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1440#issuecomment-49576854 QA results for PR 1440: This patch PASSES unit tests. This patch merges cleanly. This patch adds no public classes. For more information see test

[GitHub] spark pull request: SPARK-2497 Exclude companion classes, with the...

2014-07-21 Thread ScrapCodes
Github user ScrapCodes commented on the pull request: https://github.com/apache/spark/pull/1463#issuecomment-49576941 @pwendell The reason I did this is, if you compile a file A.scala with contents
```scala
@SomeAnnotation object A
```
it will produce two class

[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49577438 QA results for PR 1439: This patch FAILED unit tests. For more information see test

[GitHub] spark pull request: SPARK-2047: Introduce an in-mem Sorter, and us...

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1502#issuecomment-49577557 QA tests have started for PR 1502. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16903/consoleFull

[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-21 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1371#issuecomment-49577963 Jenkins, test this please

[GitHub] spark pull request: [SPARK-2103][Streaming] Change to ClassTag for...

2014-07-21 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1508#issuecomment-49577894 Nice one Jerry! This actually enables using Kafka with non-String data in Java.

[GitHub] spark pull request: SPARK-2310. Support arbitrary Spark properties...

2014-07-21 Thread srowen
Github user srowen commented on the pull request: https://github.com/apache/spark/pull/1253#issuecomment-49577965 Good points. I meant triaging all -D options but yes those then have very 'local' semantics.

[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-21 Thread mateiz
Github user mateiz commented on the pull request: https://github.com/apache/spark/pull/1371#issuecomment-49577994 Actually I see there are some doctests that I missed earlier, maybe that's okay. Though last time it failed Jenkins...

[GitHub] spark pull request: [SPARK-2494] [PySpark] make hash of None consi...

2014-07-21 Thread SparkQA
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1371#issuecomment-49578184 QA tests have started for PR 1371. This patch merges cleanly. View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16906/consoleFull
