[GitHub] spark pull request: [SPARK-2534] Avoid pulling in the entire RDD i...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1450#issuecomment-49261722

Eh, the binary checker is really failing me. Is there a way to disable the binary checker for inner functions? @pwendell
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user aarondav commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15042631

--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -216,17 +216,17 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
     def reducePartition(iter: Iterator[(K, V)]): Iterator[JHashMap[K, V]] = {
       val map = new JHashMap[K, V]
-      iter.foreach { case (k, v) =>
-        val old = map.get(k)
-        map.put(k, if (old == null) v else func(old, v))
+      iter.foreach { pair =>
+        val old = map.get(pair._1)
--- End diff --

This I'm sure has been done a million times, but one more for good luck: here is a `map { case (x, y) => x + y }` versus `foreach { xy => xy._1 + xy._2 }`: http://www.diffchecker.com/vtv6cptx

No new virtual function calls, but one new branch (trivially branch-predictable) and one new throw (which will never be invoked), and several stores/loads. (Perhaps I'm misreading the bytecode, but isn't `astore_2` followed by `aload_2` a no-op?)
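For readers who want to reproduce the comparison: a minimal, self-contained pair of closures in the same two shapes discussed above (toy code, not Spark's). Compiling it and diffing the output of `javap -c` shows the extra destructuring branch and the never-taken `MatchError` throw in the pattern-matching form.

```scala
object ClosureShapes {
  def main(args: Array[String]): Unit = {
    val pairs = Seq((1, 2), (3, 4))
    // Pattern-matching form: the closure destructures the Tuple2 through a
    // match that carries a (never-taken) MatchError branch.
    pairs.foreach { case (x, y) => println(x + y) }
    // Accessor form: plain _1/_2 field reads, no destructuring branch.
    pairs.foreach { xy => println(xy._1 + xy._2) }
  }
}
```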
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15042857

--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -216,17 +216,17 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
     def reducePartition(iter: Iterator[(K, V)]): Iterator[JHashMap[K, V]] = {
       val map = new JHashMap[K, V]
-      iter.foreach { case (k, v) =>
-        val old = map.get(k)
-        map.put(k, if (old == null) v else func(old, v))
+      iter.foreach { pair =>
+        val old = map.get(pair._1)
--- End diff --

I think we tested this in the past and saw a difference, perhaps because it can now throw. But I don't remember the details. Weird that it's also doing those stores.
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15042897

--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -216,17 +216,17 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
     def reducePartition(iter: Iterator[(K, V)]): Iterator[JHashMap[K, V]] = {
       val map = new JHashMap[K, V]
-      iter.foreach { case (k, v) =>
-        val old = map.get(k)
-        map.put(k, if (old == null) v else func(old, v))
+      iter.foreach { pair =>
+        val old = map.get(pair._1)
--- End diff --

There are all kinds of things that are really hard to profile at runtime. For example, the JVM might decide not to JIT the code because it is longer. It might decide not to do it because of exceptions. Branch prediction might stop working because there are too many branches. Etc. Again, I'm only in favor of this change because it is small and it might have an impact. If it were a big, ugly change, I would be against it without profiling.
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15042905

--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -571,12 +571,7 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
       throw new SparkException("Default partitioner cannot partition array keys.")
     }
     val cg = new CoGroupedRDD[K](Seq(self, other1, other2, other3), partitioner)
-    cg.mapValues { case Seq(vs, w1s, w2s, w3s) =>
-      (vs.asInstanceOf[Seq[V]],
-        w1s.asInstanceOf[Seq[W1]],
-        w2s.asInstanceOf[Seq[W2]],
-        w3s.asInstanceOf[Seq[W3]])
-    }
+    cg.mapValues { seq => seq.asInstanceOf[(Seq[V], Seq[W1], Seq[W2], Seq[W3])] }
--- End diff --

And yeah, as Reynold pointed out, this won't actually cast. A Seq is not a Tuple4; you need to make a new one. In this case it might be okay to leave the code as is, because the non-pattern-matching way will be hairier. Basically you'd have to do `seq => (seq(0).asInstanceOf[Seq[V]], seq(1).asInstanceOf[Seq[W1]], etc)`.
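A tiny self-contained illustration of Matei's point (the element types here are illustrative only, not the Spark code): casting the `Seq` itself to a tuple type fails at runtime, while indexing and casting each element works.

```scala
object SeqIsNotATuple {
  def main(args: Array[String]): Unit = {
    val seq: Seq[Any] = Seq(Seq(1), Seq("a"), Seq(2.0), Seq(true))
    // seq.asInstanceOf[(Seq[Int], Seq[String], Seq[Double], Seq[Boolean])]
    // would throw ClassCastException: a Seq is not a Tuple4.
    // The non-pattern-matching rewrite has to rebuild the tuple:
    val tuple = (seq(0).asInstanceOf[Seq[Int]],
                 seq(1).asInstanceOf[Seq[String]],
                 seq(2).asInstanceOf[Seq[Double]],
                 seq(3).asInstanceOf[Seq[Boolean]])
    println(tuple)
  }
}
```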
[GitHub] spark pull request: [branch-0.9] Fix github links in docs
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/1456

[branch-0.9] Fix github links in docs

We moved example code in v1.0. The links are no longer valid if still pointing to `tree/master`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mengxr/spark links-0.9

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1456.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1456

commit b7b926092bd13f4f6adf9ab9871c3e086309ed23
Author: Xiangrui Meng m...@databricks.com
Date: 2014-07-17T06:33:23Z

    change tree/master to tree/branch-0.9 in docs
[GitHub] spark pull request: [SPARK-2299] Consolidate various stageIdTo* ha...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1262#issuecomment-49263949

I pushed a new version. I'd first merge this and then have a separate PR to index the hash table by stageId + attempt. Now it includes @kayousterhout's change. Please take another look.
[GitHub] spark pull request: Required AM memory is amMem, not args.amMem...
GitHub user maji2014 opened a pull request: https://github.com/apache/spark/pull/1457

Required AM memory is amMem, not args.amMemory

The error "ERROR yarn.Client: Required AM memory (1024) is above the max threshold (1048) of this cluster" appears if this code is not changed. Obviously 1024 is less than 1048, so the message is reporting the wrong value: it should report `amMem` (the actual requirement), not `args.amMemory`.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/maji2014/spark master

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1457.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1457

commit 2cc1af6e69fa1dc1a76383e44d9b5389d4e133af
Author: maji2014 ma...@asiainfo-linkage.com
Date: 2014-06-08T11:19:01Z

    Update run-example

    Old code can only be run under spark_home using bin/run-example. The error "./run-example: line 55: ./bin/spark-submit: No such file or directory" appears when running from any other directory. So change this.

commit 9975f13235c4e4f325df8419878b0977db533e5f
Author: derek ma ma...@asiainfo-linkage.com
Date: 2014-07-17T06:35:30Z

    Required AM memory is amMem, not args.amMemory

    The error "ERROR yarn.Client: Required AM memory (1024) is above the max threshold (1048) of this cluster" appears if this code is not changed. Obviously, 1024 is less than 1048. So change this.
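A hedged sketch of the check being described. The variable names follow the PR title; the surrounding YARN `Client` code, and the assumption that `amMem` is the requested memory plus a memory overhead, are not quoted from the patch.

```scala
def verifyAmMemory(requestedMem: Int, memoryOverhead: Int, maxMem: Int): Unit = {
  // The real requirement includes the overhead; reporting requestedMem here
  // is what produced the confusing "(1024) is above the max threshold (1048)".
  val amMem = requestedMem + memoryOverhead
  if (amMem > maxMem) {
    throw new IllegalArgumentException(
      s"Required AM memory ($amMem) is above the max threshold ($maxMem) of this cluster")
  }
}
```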
[GitHub] spark pull request: [branch-0.9] Fix github links in docs
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1456#issuecomment-49263937

This PR only contains changes in docs, and the links were verified using linkchecker.
[GitHub] spark pull request: [branch-0.9] Fix github links in docs
Github user mengxr closed the pull request at: https://github.com/apache/spark/pull/1456
[GitHub] spark pull request: [branch-0.9] Fix github links in docs
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1456#issuecomment-49264105

Merged into branch-0.9.
[GitHub] spark pull request: Required AM memory is amMem, not args.amMem...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1457#issuecomment-49264180

Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-2299] Consolidate various stageIdTo* ha...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1262#issuecomment-49264275

QA tests have started for PR 1262. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16768/consoleFull
[GitHub] spark pull request: Required AM memory is amMem, not args.amMem...
Github user maji2014 commented on the pull request: https://github.com/apache/spark/pull/1457#issuecomment-49264699

Please focus on the second issue, as the first issue is an old patch from June.
[GitHub] spark pull request: SPARK-2294: fix locality inversion bug in Task...
Github user lirui-intel commented on the pull request: https://github.com/apache/spark/pull/1313#issuecomment-49264857

If a TaskSet only contains no-pref tasks, there won't be any delay, because the only valid level is ANY, so everything gets scheduled right away. If a TaskSet contains process-local + no-pref tasks (I suppose it should be process-local + node-local + no-pref, actually, since a process-local task is also node-local?), there'll be a delay on the no-pref tasks, even when all the process-local tasks have been launched. This is because valid levels are not re-computed as tasks finish. Not sure if there'd be too much overhead to do so... (they should also be re-computed when an executor is lost)
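A toy model of the recomputation being suggested (an assumption-laden sketch, not Spark's `TaskSetManager`): valid locality levels derive from which pending-task buckets are non-empty, so they go stale unless re-derived as tasks finish or executors are lost.

```scala
object Locality extends Enumeration {
  val ProcessLocal, NodeLocal, Any = Value
}

// Re-running this as tasks complete (or executors disappear) would drop
// levels whose buckets have drained, letting no-pref tasks run immediately.
def validLevels(pendingPerExecutor: Map[String, Int],
                pendingPerHost: Map[String, Int]): Seq[Locality.Value] = {
  val levels = scala.collection.mutable.ArrayBuffer.empty[Locality.Value]
  if (pendingPerExecutor.values.exists(_ > 0)) levels += Locality.ProcessLocal
  if (pendingPerHost.values.exists(_ > 0)) levels += Locality.NodeLocal
  levels += Locality.Any // no-pref tasks are always eligible at ANY
  levels.toSeq
}
```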
[GitHub] spark pull request: [branch-0.9] bump versions for v0.9.2 release ...
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/1458

[branch-0.9] bump versions for v0.9.2 release candidate

Manually update some version numbers.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mengxr/spark v0.9.2-rc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1458.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1458

commit bc870352f52ad4016a3f21f9f21cc37047536b53
Author: Xiangrui Meng m...@databricks.com
Date: 2014-07-17T05:36:48Z

    bump version numbers to 0.9.2

commit 162af66ad32d98e6a7cfc785af65f85930b63d6e
Author: Xiangrui Meng m...@databricks.com
Date: 2014-07-17T05:37:08Z

    Merge remote-tracking branch 'apache/branch-0.9' into v0.9.2-rc

commit ea2b20576d829fe66c689a08fdcca9851fcc868d
Author: Xiangrui Meng m...@databricks.com
Date: 2014-07-17T05:42:30Z

    update version in SparkBuild

commit 7d0fb769cdb4e0b17f4ed22689d609a3a460c4e8
Author: Xiangrui Meng m...@databricks.com
Date: 2014-07-17T06:33:23Z

    change tree/master to tree/branch-0.9 in docs

commit 2c38419bc02e4f4a830f04da2509f1d73ec7aa83
Author: Xiangrui Meng m...@databricks.com
Date: 2014-07-17T06:41:08Z

    Merge remote-tracking branch 'apache/branch-0.9' into v0.9.2-rc
[GitHub] spark pull request: [branch-0.9] bump versions for v0.9.2 release ...
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1458#issuecomment-49265114

Merged into branch-0.9.
[GitHub] spark pull request: [branch-0.9] bump versions for v0.9.2 release ...
Github user mengxr closed the pull request at: https://github.com/apache/spark/pull/1458
[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49265512

QA results for PR 1439:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental):
  class HadoopTableReader(@transient attributes: Seq[Attribute],

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16766/consoleFull
[GitHub] spark pull request: update CHANGES.txt
GitHub user mengxr opened a pull request: https://github.com/apache/spark/pull/1459

update CHANGES.txt

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/mengxr/spark v0.9.2-rc

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1459.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1459

commit 6fa5a656281dd5df5cf7f72660efd89fe7b8ec8d
Author: Xiangrui Meng m...@databricks.com
Date: 2014-07-17T06:59:57Z

    update CHANGES.txt
[GitHub] spark pull request: [SPARK-2534] Avoid pulling in the entire RDD i...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1450#discussion_r15043414

--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -214,7 +214,7 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
       throw new SparkException("reduceByKeyLocally() does not support array keys")
     }
-    def reducePartition(iter: Iterator[(K, V)]): Iterator[JHashMap[K, V]] = {
+    val reducePartition = (iter: Iterator[(K, V)]) => {
--- End diff --

This is fixed.
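Why convert a `def` to a `val` here? A hedged sketch of the mechanism suggested by the PR title (toy classes, not Spark's): a method belongs to its enclosing object, so eta-expanding it into a function value can produce a closure that retains that whole object, while an anonymous-function `val` captures only what its body uses.

```scala
class Enclosing(huge: Array[Byte]) extends Serializable {
  def incMethod(x: Int): Int = x + 1
  val incVal: Int => Int = x => x + 1

  // (incMethod _) eta-expands to a closure that calls this.incMethod, so
  // serializing it can drag along `this`, including `huge`. incVal's body
  // references nothing from the instance, so it can be serialized alone.
  def closures: (Int => Int, Int => Int) = (incMethod _, incVal)
}
```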
[GitHub] spark pull request: [branch-0.9] Update CHANGES.txt
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1459#issuecomment-49266193

QA tests have started for PR 1459. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16769/consoleFull
[GitHub] spark pull request: [SPARK-2534] Avoid pulling in the entire RDD i...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1450#issuecomment-49266828

QA tests have started for PR 1450. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16770/consoleFull
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15043849

--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -571,12 +571,7 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
       throw new SparkException("Default partitioner cannot partition array keys.")
     }
     val cg = new CoGroupedRDD[K](Seq(self, other1, other2, other3), partitioner)
-    cg.mapValues { case Seq(vs, w1s, w2s, w3s) =>
-      (vs.asInstanceOf[Seq[V]],
-        w1s.asInstanceOf[Seq[W1]],
-        w2s.asInstanceOf[Seq[W2]],
-        w3s.asInstanceOf[Seq[W3]])
-    }
+    cg.mapValues { seq => seq.asInstanceOf[(Seq[V], Seq[W1], Seq[W2], Seq[W3])] }
--- End diff --

Ah, right. I'll leave these as they are for now.
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15043860

--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -216,17 +216,17 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
     def reducePartition(iter: Iterator[(K, V)]): Iterator[JHashMap[K, V]] = {
       val map = new JHashMap[K, V]
-      iter.foreach { case (k, v) =>
-        val old = map.get(k)
-        map.put(k, if (old == null) v else func(old, v))
+      iter.foreach { pair =>
+        val old = map.get(pair._1)
--- End diff --

Also, on removing the calls to `_1`: these values should be in cache, so accesses will be really fast. Will hold off on these for now, but happy to make the change if y'all want.
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user sryza commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15044028

--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -712,8 +701,8 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
     val index = p.getPartition(key)
     def process(it: Iterator[(K, V)]): Seq[V] = {
       val buf = new ArrayBuffer[V]
-      for ((k, v) <- it if k == key) {
--- End diff --

Wait, actually, my understanding of this loop is that it's iterating over every record within the partition. Am I missing something?
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user rxin commented on a diff in the pull request: https://github.com/apache/spark/pull/1447#discussion_r15044062

--- Diff: core/src/main/scala/org/apache/spark/rdd/PairRDDFunctions.scala ---
@@ -712,8 +701,8 @@ class PairRDDFunctions[K, V](self: RDD[(K, V)])
     val index = p.getPartition(key)
     def process(it: Iterator[(K, V)]): Seq[V] = {
       val buf = new ArrayBuffer[V]
-      for ((k, v) <- it if k == key) {
--- End diff --

Actually yes, my bad :)
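For reference, the standard desugaring of the loop under discussion (plain Scala semantics, nothing Spark-specific): the `for` comprehension walks every record in the partition and filters by key, via pattern-matching closures.

```scala
val it = Iterator(("a", 1), ("b", 2), ("a", 3))
val key = "a"
val buf = scala.collection.mutable.ArrayBuffer.empty[Int]
for ((k, v) <- it if k == key) {
  buf += v
}
// The compiler expands the loop to roughly:
//   it.withFilter { case (k, v) => k == key }
//     .foreach { case (k, v) => buf += v }
// i.e. every element is visited, and each closure destructures the pair.
println(buf) // ArrayBuffer(1, 3)
```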
[GitHub] spark pull request: [branch-0.9] Update CHANGES.txt
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1459#issuecomment-49267990

QA results for PR 1459:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16769/consoleFull
[GitHub] spark pull request: [SPARK-2534] Avoid pulling in the entire RDD i...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1450#issuecomment-49269144

I created a JIRA to deal with this and did some initial exploration, but I think I'll need to wait for Prashant to actually do it: https://issues.apache.org/jira/browse/SPARK-2549
[GitHub] spark pull request: [SPARK-2412] CoalescedRDD throws exception wit...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1337#issuecomment-49270261

Okay, I merged this.
[GitHub] spark pull request: [SPARK-2412] CoalescedRDD throws exception wit...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1337
[GitHub] spark pull request: SPARK-2526: Simplify options in make-distribut...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1445
[GitHub] spark pull request: [SPARK-2410][SQL][WIP] Cherry picked Hive Thri...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1399#discussion_r15045195

--- Diff: sbin/start-thriftserver.sh ---
@@ -0,0 +1,24 @@
+#!/usr/bin/env bash
+
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements. See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License. You may obtain a copy of the License at
+#
+#    http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+# Figure out where Spark is installed
+FWDIR="$(cd `dirname $0`/..; pwd)"
+
+CLASS="org.apache.spark.sql.hive.thriftserver.HiveThriftServer2"
+$FWDIR/bin/spark-class $CLASS $@
--- End diff --

Check out `spark-shell` for an example of a user-facing script that triages options to `spark-submit`.
[GitHub] spark pull request: [SPARK-2423] Clean up SparkSubmit for readabil...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1349#issuecomment-49271133

Thanks Andrew, looks good!
[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...
GitHub user davies opened a pull request: https://github.com/apache/spark/pull/1460

[SPARK-2538] [PySpark] Hash based disk spilling aggregation

During aggregation in the Python worker, if the memory usage is above spark.executor.memory, it will do disk-spilling aggregation: the aggregation is split into multiple stages, and in each stage the aggregated data is partitioned by hash and dumped to disk. After all the data has been aggregated, the stages are merged together, partition by partition.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark spill

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1460.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1460

commit f933713ed628779309fab0da76045f8750d6b350
Author: Davies Liu davies@gmail.com
Date: 2014-07-17T08:03:32Z

    Hash based disk spilling aggregation
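A sketch of the algorithm described above, transposed to Scala for consistency with the rest of this digest (the PR itself is Python, and every name below is invented): aggregate in memory until a size limit, hash-partition the map into spill files, repeat, then merge the spill files one partition at a time so only one partition's map is ever resident.

```scala
import java.io._
import scala.collection.mutable

def externalAggregate(
    input: Iterator[(String, Int)],
    combine: (Int, Int) => Int,
    maxInMemory: Int,
    numPartitions: Int = 4): Iterator[(String, Int)] = {
  val current = mutable.HashMap.empty[String, Int]
  val spills = mutable.ArrayBuffer.empty[Array[File]]

  def partition(k: String): Int = (k.hashCode & Int.MaxValue) % numPartitions

  // Hash-partition the in-memory map into one spill file per partition.
  def spill(): Unit = {
    val files = Array.fill(numPartitions)(File.createTempFile("agg", ".spill"))
    val outs = files.map(f => new ObjectOutputStream(new FileOutputStream(f)))
    for ((k, v) <- current) outs(partition(k)).writeObject((k, v))
    outs.foreach(_.close())
    spills += files
    current.clear()
  }

  for ((k, v) <- input) {
    current(k) = current.get(k).map(combine(_, v)).getOrElse(v)
    if (current.size > maxInMemory) spill()
  }
  if (spills.isEmpty) return current.iterator
  spill() // flush the tail so every record lives in exactly one partition

  // Merge the spills, partition by partition.
  (0 until numPartitions).iterator.flatMap { p =>
    val merged = mutable.HashMap.empty[String, Int]
    for (files <- spills) {
      val in = new ObjectInputStream(new FileInputStream(files(p)))
      try {
        while (true) {
          val (k, v) = in.readObject().asInstanceOf[(String, Int)]
          merged(k) = merged.get(k).map(combine(_, v)).getOrElse(v)
        }
      } catch { case _: EOFException => () } finally in.close()
    }
    merged.iterator
  }
}
```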
[GitHub] spark pull request: [SPARK-2423] Clean up SparkSubmit for readabil...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1349
[GitHub] spark pull request: [branch-0.9] Update CHANGES.txt
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1459#issuecomment-49271772

Merged into branch-0.9.
[GitHub] spark pull request: [branch-0.9] Update CHANGES.txt
Github user mengxr closed the pull request at: https://github.com/apache/spark/pull/1459
[GitHub] spark pull request: [SPARK-2299] Consolidate various stageIdTo* ha...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1262#issuecomment-49271814

QA results for PR 1262:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16768/consoleFull
[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1460#issuecomment-49271938

QA tests have started for PR 1460. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16772/consoleFull
[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1361#issuecomment-49272156

Jenkins, add to whitelist.
[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]
Github user mengxr commented on the pull request: https://github.com/apache/spark/pull/1361#issuecomment-49272169

Jenkins, test this please.
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1447#issuecomment-49272425

QA tests have started for PR 1447. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16773/consoleFull
[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1361#issuecomment-49272423

QA tests have started for PR 1361. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16774/consoleFull
[GitHub] spark pull request: SPARK-2553. CoGroupedRDD unnecessarily allocat...
GitHub user sryza opened a pull request: https://github.com/apache/spark/pull/1461

SPARK-2553. CoGroupedRDD unnecessarily allocates a Tuple2 per dependency per key

My humble opinion is that avoiding allocations in this performance-critical section is worth the extra code.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sryza/spark sandy-spark-2553

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1461.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1461

commit 7eaf7f2a18ddea7ee47aacb5c5559c278a924899
Author: Sandy Ryza sa...@cloudera.com
Date: 2014-07-17T08:19:48Z

    SPARK-2553. CoGroupedRDD unnecessarily allocates a Tuple2 per dependency per key
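A generic, hedged illustration of the class of fix the title describes (toy code; the actual CoGroupedRDD patch differs): replacing a per-record tuple or wrapper allocation in the hot path with a small mutable holder allocated once per key.

```scala
import scala.collection.mutable

// One growable buffer per dependency, allocated once per key...
final class CoGroupBufs(numDeps: Int) {
  val bufs: Array[mutable.ArrayBuffer[Any]] =
    Array.fill(numDeps)(mutable.ArrayBuffer.empty[Any])
}

val grouped = mutable.HashMap.empty[String, CoGroupBufs]

// ...so the per-record hot path below appends without building a fresh
// Tuple2 or other wrapper for every incoming record.
def insert(key: String, depNum: Int, value: Any): Unit = {
  val entry = grouped.getOrElseUpdate(key, new CoGroupBufs(2))
  entry.bufs(depNum) += value
}
```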
[GitHub] spark pull request: [SPARK-2555] Support configuration spark.sched...
Github user li-zhihui commented on the pull request: https://github.com/apache/spark/pull/1462#issuecomment-49284642

@tgravescs
[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1460#issuecomment-49286439

QA results for PR 1460:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds the following public classes (experimental):
  class Merger(object):
  class AutoSerializer(FramedSerializer):

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16772/consoleFull
[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1361#issuecomment-49287245

QA results for PR 1361:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16774/consoleFull
[GitHub] spark pull request: SPARK-2519 part 2. Remove pattern matching on ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1447#issuecomment-49287721

QA results for PR 1447:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16773/consoleFull
[GitHub] spark pull request: [SPARK-2125] Add sort flag and move sort into ...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1210#issuecomment-49293649

QA results for PR 1210:
- This patch FAILED unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16776/consoleFull
[GitHub] spark pull request: SPARK-2497 Exclude companion classes, with the...
GitHub user ScrapCodes opened a pull request: https://github.com/apache/spark/pull/1463

SPARK-2497 Exclude companion classes, with their corresponding objects.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ScrapCodes/spark-1 SPARK-2497/mima-exclude-all

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1463.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1463

commit 11eff8f7fe2a5dcd3bd6b0ed38c51c9f914e96c3
Author: Prashant Sharma prashan...@imaginea.com
Date: 2014-07-17T12:48:49Z

    SPARK-2497 Exclude companion classes, with their corresponding objects.
[GitHub] spark pull request: SPARK-2497 Exclude companion classes, with the...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1463#issuecomment-49302217

QA tests have started for PR 1463. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16777/consoleFull
[GitHub] spark pull request: [SPARK-2460] Optimize SparkContext.hadoopFile ...
Github user lianhuiwang commented on a diff in the pull request: https://github.com/apache/spark/pull/1385#discussion_r15055059

--- Diff: core/src/main/scala/org/apache/spark/rdd/HadoopRDD.scala ---
@@ -128,25 +123,13 @@ class HadoopRDD[K, V](
   // Returns a JobConf that will be used on slaves to obtain input splits for Hadoop reads.
   protected def getJobConf(): JobConf = {
-    val conf: Configuration = broadcastedConf.value.value
-    if (conf.isInstanceOf[JobConf]) {
-      // A user-broadcasted JobConf was provided to the HadoopRDD, so always use it.
-      conf.asInstanceOf[JobConf]
-    } else if (HadoopRDD.containsCachedMetadata(jobConfCacheKey)) {
+    val conf: JobConf = broadcastedConf.value.value
+    if (HadoopRDD.containsCachedMetadata(jobConfCacheKey)) {
--- End diff --

Yes, I agree with you. `broadcastedConf` is cached by the BlockManager in `Broadcast`.
[GitHub] spark pull request: SPARK-2481: The environment variables SPARK_HI...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1341#issuecomment-49303258

QA tests have started for PR 1341. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16778/consoleFull
[GitHub] spark pull request: [WIP]When the executor is thrown OutOfMemoryEr...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1387#issuecomment-49303783

QA tests have started for PR 1387. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16779/consoleFull
[GitHub] spark pull request: SPARK-2497 Exclude companion classes, with the...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1463#issuecomment-49314286

QA results for PR 1463:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16777/consoleFull
[GitHub] spark pull request: SPARK-2481: The environment variables SPARK_HI...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1341#issuecomment-49316014

QA results for PR 1341:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16778/consoleFull
[GitHub] spark pull request: The driver perform garbage collection, when th...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1387#issuecomment-49316425

QA results for PR 1387:
- This patch PASSES unit tests.
- This patch merges cleanly.
- This patch adds no public classes.

For more information see test output: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16779/consoleFull
[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1425#issuecomment-49318202

QA tests have started for PR 1425. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16780/consoleFull
[GitHub] spark pull request: [SPARK-2479][MLlib] Comparing floating-point n...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1425#issuecomment-49321741

QA tests have started for PR 1425. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16781/consoleFull
[GitHub] spark pull request: [Spark 2557] fix LOCAL_N_REGEX in createTaskSc...
GitHub user advancedxy opened a pull request: https://github.com/apache/spark/pull/1464

[Spark 2557] fix LOCAL_N_REGEX in createTaskScheduler and make local-n and local-n-failures consistent

[SPARK-2557](https://issues.apache.org/jira/browse/SPARK-2557)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/advancedxy/spark SPARK-2557

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1464.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1464

commit 3bbc668a9994a76683926902131ab06a90d96a85
Author: Ye Xianjin advance...@gmail.com
Date: 2014-07-17T15:12:00Z

    fix LOCAL_N_REGEX regular expression and make local_n_failures accept * as all cores on the computer

commit d844d67120edcc31409fabf59328eb8a651bbad3
Author: Ye Xianjin advance...@gmail.com
Date: 2014-07-17T15:21:16Z

    add local-*-n-failures, bad-local-n, bad-local-n-failures test case
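A sketch of the master-URL parsing involved, consistent with the PR description (the exact patterns in the patch may differ): the fix lets the `*` form, meaning all cores, match both the plain `local[n]` variant and the `local[n, maxFailures]` variant.

```scala
// local[N] or local[*]
val LOCAL_N_REGEX = """local\[([0-9]+|\*)\]""".r
// local[N, maxFailures] or local[*, maxFailures]
val LOCAL_N_FAILURES_REGEX = """local\[([0-9]+|\*)\s*,\s*([0-9]+)\]""".r

def parse(master: String): String = master match {
  case "local" => "1 thread"
  case LOCAL_N_REGEX(threads) =>
    if (threads == "*") "all cores" else s"$threads threads"
  case LOCAL_N_FAILURES_REGEX(threads, maxFailures) =>
    s"$threads threads, $maxFailures task failures allowed"
  case _ => "not a local master URL"
}

// parse("local[*]") == "all cores"; parse("local[4, 2]") hits the failures
// variant; parse("local[x]") falls through to the default case.
```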
[GitHub] spark pull request: [Spark 2557] fix LOCAL_N_REGEX in createTaskSc...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1464#issuecomment-49323035

Can one of the admins verify this patch?
[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/1439#discussion_r15067812

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/execution/HiveTableScan.scala ---
@@ -67,95 +61,12 @@ case class HiveTableScan(
   }

   @transient
-  private[this] val hadoopReader = new HadoopTableReader(relation.tableDesc, context)
-
-  /**
-   * The hive object inspector for this table, which can be used to extract values from the
-   * serialized row representation.
-   */
-  @transient
-  private[this] lazy val objectInspector =
-    relation.tableDesc.getDeserializer.getObjectInspector.asInstanceOf[StructObjectInspector]
-
-  /**
-   * Functions that extract the requested attributes from the hive output. Partitioned values are
-   * casted from string to its declared data type.
-   */
-  @transient
-  protected lazy val attributeFunctions: Seq[(Any, Array[String]) => Any] = {
-    attributes.map { a =>
-      val ordinal = relation.partitionKeys.indexOf(a)
-      if (ordinal >= 0) {
-        val dataType = relation.partitionKeys(ordinal).dataType
-        (_: Any, partitionKeys: Array[String]) => {
-          castFromString(partitionKeys(ordinal), dataType)
-        }
-      } else {
-        val ref = objectInspector.getAllStructFieldRefs
-          .find(_.getFieldName == a.name)
-          .getOrElse(sys.error(s"Can't find attribute $a"))
-        val fieldObjectInspector = ref.getFieldObjectInspector
-
-        val unwrapHiveData = fieldObjectInspector match {
-          case _: HiveVarcharObjectInspector =>
-            (value: Any) => value.asInstanceOf[HiveVarchar].getValue
-          case _: HiveDecimalObjectInspector =>
-            (value: Any) => BigDecimal(value.asInstanceOf[HiveDecimal].bigDecimalValue())
-          case _ =>
-            identity[Any] _
-        }
-
-        (row: Any, _: Array[String]) => {
-          val data = objectInspector.getStructFieldData(row, ref)
-          val hiveData = unwrapData(data, fieldObjectInspector)
-          if (hiveData != null) unwrapHiveData(hiveData) else null
-        }
-      }
-    }
-  }
+  private[this] val hadoopReader = new HadoopTableReader(attributes, relation, context)

   private[this] def castFromString(value: String, dataType: DataType) = {
     Cast(Literal(value), dataType).eval(null)
   }

-  private def addColumnMetadataToConf(hiveConf: HiveConf) {
--- End diff --

I would keep it. It is important to set the needed columns in the conf, so RCFile and ORC can know which columns should be skipped. Also, it seems `hiveConf.set(serdeConstants.LIST_COLUMN_TYPES, columnTypeNames)` and `hiveConf.set(serdeConstants.LIST_COLUMNS, columnInternalNames)` will be used to push down filters.
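A hedged sketch of the conf entries named in the comment above (the method shape and parameters are assumptions; only the two `serdeConstants` keys and the method name come from the diff): columnar formats such as RCFile and ORC consult these properties to decide which columns to materialize, so dropping this setup would silently disable column pruning.

```scala
import org.apache.hadoop.hive.conf.HiveConf
import org.apache.hadoop.hive.serde.serdeConstants

def addColumnMetadataToConf(hiveConf: HiveConf,
                            columnNames: Seq[String],
                            columnTypes: Seq[String]): Unit = {
  // Columnar readers check these keys to skip columns the scan doesn't need.
  hiveConf.set(serdeConstants.LIST_COLUMNS, columnNames.mkString(","))
  hiveConf.set(serdeConstants.LIST_COLUMN_TYPES, columnTypes.mkString(","))
}
```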
[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...
Github user yhuai commented on a diff in the pull request: https://github.com/apache/spark/pull/1439#discussion_r15068484

--- Diff: sql/hive/src/main/scala/org/apache/spark/sql/hive/TableReader.scala ---
@@ -156,33 +158,43 @@ class HadoopTableReader(@transient _tableDesc: TableDesc, @transient sc: HiveCon
     }

     // Create local references so that the outer object isn't serialized.
-    val tableDesc = _tableDesc
+    val tableDesc = relation.tableDesc
     val broadcastedHiveConf = _broadcastedHiveConf
     val localDeserializer = partDeserializer
+    val mutableRow = new GenericMutableRow(attributes.length)
+
+    // split the attributes (output schema) into 2 categories:
+    // (partition keys, ordinal), (normal attributes, ordinal), the ordinal mean the
+    // position of the in the output Row.
--- End diff --

position of the attribute?
[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1460#issuecomment-49334156

QA tests have started for PR 1460. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16782/consoleFull
[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...
GitHub user kbzod opened a pull request: https://github.com/apache/spark/pull/1465

SPARK-2083 Add support for spark.local.maxFailures configuration property

The logic in `SparkContext` for creating a new task scheduler now looks for a spark.local.maxFailures property to specify the number of task failures in a local job that will cause the job to fail. Its default is the prior fixed value of 1 (no retries). The patch includes documentation updates and new unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/kbzod/spark SPARK-2083

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1465.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message:

    This closes #1465

commit a8b369dad154b7773ec8b535cf399b428ec3952c
Author: Bill Havanki bhava...@cloudera.com
Date: 2014-07-17T16:01:22Z

    SPARK-2083 Add support for spark.local.maxFailures configuration property
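A hedged sketch of the lookup the description implies (the property name and default come from the PR text; the `SparkConf` usage is assumed rather than quoted from the patch):

```scala
import org.apache.spark.SparkConf

// Default of 1 preserves the old behavior: a local job fails on the first
// task failure unless spark.local.maxFailures raises the limit.
def localMaxTaskFailures(conf: SparkConf): Int =
  conf.getInt("spark.local.maxFailures", 1)
```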
[GitHub] spark pull request: SPARK-2083 Add support for spark.local.maxFail...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1465#issuecomment-49335741 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-2411] Add a history-not-found page to s...
Github user andrewor14 commented on the pull request: https://github.com/apache/spark/pull/1336#issuecomment-49336964 I have updated the screenshots again. Anything else? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-1478.2 Fix incorrect NioServerSocketChan...
GitHub user srowen opened a pull request: https://github.com/apache/spark/pull/1466

SPARK-1478.2 Fix incorrect NioServerSocketChannelFactory constructor call

The line break inadvertently meant this was interpreted as a call to the no-arg constructor, which doesn't even exist in older versions of Netty. (Also fixed a val name typo.)

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/srowen/spark SPARK-1478.2

Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1466.patch

To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1466

commit 59c3501262d1358f5dc5580c19f2d30858cfcabc
Author: Sean Owen sro...@gmail.com
Date: 2014-07-17T17:23:48Z

Line break caused Scala to interpret NioServerSocketChannelFactory constructor as the no-arg version, which is not even present in some versions of Netty

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-2340] Resolve event logging and History...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1280#issuecomment-49337830

QA tests have started for PR 1280. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16784/consoleFull

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-1478.2 Fix incorrect NioServerSocketChan...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1466#issuecomment-49337826

QA tests have started for PR 1466. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16783/consoleFull

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49338675

I think we are not clear on the boundary between a `TableReader` and a physical `TableScan` operator (e.g. `HiveTableScan`). It seems we just want `TableReader` to create `RDD`s (general-purpose work) and, inside a `TableScan` operator, to create Catalyst `Row`s (table-specific work). However, when we look at `HadoopTableReader`, it is actually a `HiveTableReader`: for every Hive partition, we create a `HadoopRDD` (requiring Hive-specific code) and deserialize Hive rows. I am not sure `TableReader` is a good abstraction. I think it makes sense to remove the `TableReader` trait and add an abstract `TableScan` class (inheriting `LeafNode`); all existing table scan operators would inherit this abstract `TableScan` class. If we think it is the right approach, I can do it in another PR. @marmbrus, @liancheng, @rxin, @chenghao-intel thoughts?

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15072735

--- Diff: python/pyspark/rdd.py ---
@@ -168,6 +170,123 @@ def _replaceRoot(self, value):
             self._sink(1)
 
 
+class Merger(object):
+
+    """
+    External merger will dump the aggregated data into disks when memory usage is above
+    the limit, then merge them together.
+
+    >>> combiner = lambda x, y:x+y
+    >>> merger = Merger(combiner, 10)
+    >>> N = 1
+    >>> merger.merge(zip(xrange(N), xrange(N)) * 10)
+    >>> merger.spills
+    100
+    >>> sum(1 for k,v in merger.iteritems())
+    1
+    """
+
+    PARTITIONS = 64
+    BATCH = 1000
+
+    def __init__(self, combiner, memory_limit=256, path="/tmp/pyspark", serializer=None):
--- End diff --

This should actually rotate among the storage directories in `spark.local.dir`. Check out how the DiskStore works in Java.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
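To make the suggestion above concrete, here is a minimal sketch of rotating spill files across the configured local directories, in the spirit of Java's DiskStore. The `SPARK_LOCAL_DIRS` environment variable and the directory layout are assumptions for illustration, not the implementation that was eventually merged:

```python
import os

def get_local_dirs(sub):
    # Assumption: the worker exposes spark.local.dir to Python as
    # SPARK_LOCAL_DIRS, possibly a comma-separated list of directories.
    path = os.environ.get("SPARK_LOCAL_DIRS", "/tmp")
    return [os.path.join(d.strip(), "python", str(os.getpid()), sub)
            for d in path.split(",")]

def get_spill_dir(dirs, spill_count):
    # Rotate among the local dirs so spill I/O is spread over all disks,
    # the way DiskStore distributes blocks across spark.local.dir entries.
    d = dirs[spill_count % len(dirs)]
    if not os.path.exists(d):
        os.makedirs(d)
    return d
```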
[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15072768

--- Diff: python/pyspark/rdd.py ---
@@ -168,6 +170,123 @@ def _replaceRoot(self, value):
             self._sink(1)
 
 
+class Merger(object):
+
+    """
+    External merger will dump the aggregated data into disks when memory usage is above
+    the limit, then merge them together.
+
+    >>> combiner = lambda x, y:x+y
+    >>> merger = Merger(combiner, 10)
+    >>> N = 1
+    >>> merger.merge(zip(xrange(N), xrange(N)) * 10)
+    >>> merger.spills
+    100
+    >>> sum(1 for k,v in merger.iteritems())
+    1
+    """
+
+    PARTITIONS = 64
+    BATCH = 1000
+
+    def __init__(self, combiner, memory_limit=256, path="/tmp/pyspark", serializer=None):
+        self.combiner = combiner
+        self.path = os.path.join(path, str(os.getpid()))
+        self.memory_limit = memory_limit
+        self.serializer = serializer or BatchedSerializer(AutoSerializer(), 1024)
+        self.item_limit = None
+        self.data = {}
+        self.pdata = []
+        self.spills = 0
+
+    def used_memory(self):
+        rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
+        if platform.system() == 'Linux':
+            rss >>= 10
+        elif platform.system() == 'Darwin':
+            rss >>= 20
--- End diff --

We also need to make it work on Windows, or at least fail gracefully. Do you know how to get this number on Windows? You can spin up a Windows machine on EC2 to try it.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
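On the Windows question: the `resource` module is Unix-only, so one hedged approach is to try it first and fall back to the third-party `psutil` package, or to 0 (which effectively disables spilling) when neither is available. A sketch of the idea, not what the PR ended up doing:

```python
import platform

def used_memory():
    """Return resident set size in MB, or 0 if it cannot be determined."""
    try:
        import resource  # raises ImportError on Windows
        rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
        if platform.system() == 'Linux':
            return rss >> 10   # ru_maxrss is in KB on Linux
        elif platform.system() == 'Darwin':
            return rss >> 20   # ru_maxrss is in bytes on OS X
        return rss
    except ImportError:
        pass
    try:
        import psutil  # optional third-party dependency
        return psutil.Process().memory_info().rss >> 20  # bytes -> MB
    except ImportError:
        return 0  # unknown platform: never report memory pressure
```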
[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15072860

--- Diff: python/pyspark/rdd.py ---
@@ -168,6 +170,123 @@ def _replaceRoot(self, value):
             self._sink(1)
 
 
+class Merger(object):
+
+    """
+    External merger will dump the aggregated data into disks when memory usage is above
+    the limit, then merge them together.
+
+    >>> combiner = lambda x, y:x+y
+    >>> merger = Merger(combiner, 10)
+    >>> N = 1
+    >>> merger.merge(zip(xrange(N), xrange(N)) * 10)
+    >>> merger.spills
+    100
+    >>> sum(1 for k,v in merger.iteritems())
+    1
+    """
+
+    PARTITIONS = 64
+    BATCH = 1000
+
+    def __init__(self, combiner, memory_limit=256, path="/tmp/pyspark", serializer=None):
+        self.combiner = combiner
+        self.path = os.path.join(path, str(os.getpid()))
+        self.memory_limit = memory_limit
+        self.serializer = serializer or BatchedSerializer(AutoSerializer(), 1024)
--- End diff --

Probably need to make the batch size an argument and pass in SparkContext._batchSize.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15072948

--- Diff: python/pyspark/rdd.py ---
@@ -1247,15 +1366,12 @@ def combineLocally(iterator):
             return combiners.iteritems()
         locally_combined = self.mapPartitions(combineLocally)
         shuffled = locally_combined.partitionBy(numPartitions)
-
+
+        executorMemory = self.ctx._jsc.sc().executorMemory()
         def _mergeCombiners(iterator):
-            combiners = {}
-            for (k, v) in iterator:
-                if k not in combiners:
-                    combiners[k] = v
-                else:
-                    combiners[k] = mergeCombiners(combiners[k], v)
-            return combiners.iteritems()
+            merger = Merger(mergeCombiners, executorMemory * 0.7)
--- End diff --

We probably want to use the same memory as the Java append-only map, which is spark.shuffle.memoryFraction * executorMemory. Check how that gets created in ExternalAppendOnlyMap.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
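A sketch of how the Python side might mirror the JVM setting pointed to above; `spark.shuffle.memoryFraction` is the property ExternalAppendOnlyMap consults, but reading it through `ctx._conf` and the 0.3 default are assumptions to verify against the Scala code:

```python
def shuffle_memory_limit(ctx):
    # Match the JVM's external append-only map budget:
    # spark.shuffle.memoryFraction * executor memory (MB).
    fraction = float(ctx._conf.get("spark.shuffle.memoryFraction", "0.3"))
    executor_memory = ctx._jsc.sc().executorMemory()  # assumed to be in MB
    return executor_memory * fraction
```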
[GitHub] spark pull request: [SPARK-2411] Add a history-not-found page to s...
Github user pwendell commented on a diff in the pull request: https://github.com/apache/spark/pull/1336#discussion_r15072986

--- Diff: core/src/main/scala/org/apache/spark/deploy/master/Master.scala ---
@@ -667,29 +664,47 @@ private[spark] class Master(
    */
   def rebuildSparkUI(app: ApplicationInfo): Boolean = {
     val appName = app.desc.name
-    val eventLogDir = app.desc.eventLogDir.getOrElse { return false }
+    val eventLogDir = app.desc.eventLogDir.getOrElse {
+      // Event logging is not enabled for this application
+      app.desc.appUiUrl = "/history/not-found"
+      return false
+    }
     val fileSystem = Utils.getHadoopFileSystem(eventLogDir)
     val eventLogInfo = EventLoggingListener.parseLoggingInfo(eventLogDir, fileSystem)
     val eventLogPaths = eventLogInfo.logPaths
     val compressionCodec = eventLogInfo.compressionCodec
-    if (!eventLogPaths.isEmpty) {
-      try {
-        val replayBus = new ReplayListenerBus(eventLogPaths, fileSystem, compressionCodec)
-        val ui = new SparkUI(
-          new SparkConf, replayBus, appName + " (completed)", "/history/" + app.id)
-        replayBus.replay()
-        app.desc.appUiUrl = ui.basePath
-        appIdToUI(app.id) = ui
-        webUi.attachSparkUI(ui)
-        return true
-      } catch {
-        case e: Exception =>
-          logError("Exception in replaying log for application %s (%s)".format(appName, app.id), e)
-      }
-    } else {
-      logWarning("Application %s (%s) has no valid logs: %s".format(appName, app.id, eventLogDir))
+
+    if (eventLogPaths.isEmpty) {
--- End diff --

What happens here if `eventLogDir` doesn't exist?

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15073011

--- Diff: python/pyspark/serializers.py ---
@@ -297,6 +297,33 @@ class MarshalSerializer(FramedSerializer):
     loads = marshal.loads
 
 
+class AutoSerializer(FramedSerializer):
+
+    """
+    Choose marshal or cPickle as serialization protocol autumatically
+    """
+
+    def __init__(self):
+        FramedSerializer.__init__(self)
+        self._type = None
+
+    def dumps(self, obj):
+        try:
+            if self._type is not None:
+                raise TypeError("fallback")
+            return 'M' + marshal.dumps(obj)
+        except Exception:
+            self._type = 'P'
+            return 'P' + cPickle.dumps(obj, -1)
--- End diff --

If the objects are not marshal-able but are pickle-able, is there a big performance cost to throwing an exception on each write? Would be good to test this, because if not, we can make this serializer our default where we now use Pickle. Even if there is a cost, maybe we can do something where if 10% of the objects written fail to marshal, we switch to always using pickle.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
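The cost question above is easy to measure directly. A rough Python 2 micro-benchmark along these lines would show how expensive the raise-and-catch path is relative to going straight to cPickle; the test class and iteration count are illustrative only:

```python
import marshal
import cPickle
import timeit

class Unmarshalable(object):
    # marshal rejects user-defined class instances, forcing the fallback
    pass

obj = Unmarshalable()

def dumps_with_fallback(o):
    # Mimics AutoSerializer.dumps: try marshal, fall back to cPickle
    try:
        return 'M' + marshal.dumps(o)
    except Exception:
        return 'P' + cPickle.dumps(o, -1)

t_fallback = timeit.timeit(lambda: dumps_with_fallback(obj), number=100000)
t_pickle = timeit.timeit(lambda: cPickle.dumps(obj, -1), number=100000)
print "fallback: %.3fs  plain cPickle: %.3fs" % (t_fallback, t_pickle)
```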
[GitHub] spark pull request: [SPARK-2523] [SQL] [WIP] Hadoop table scan bug...
Github user yhuai commented on the pull request: https://github.com/apache/spark/pull/1439#issuecomment-49341477

@chenghao-intel explained the root cause in https://issues.apache.org/jira/browse/SPARK-2523. Basically, we should use partition-specific `ObjectInspector`s to extract fields instead of using the `ObjectInspector` set in the `TableDesc`. If partitions of a table use different SerDes, their `ObjectInspector`s will be different. @chenghao-intel, can you add unit tests? Is there any Hive query test that can be included?

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15073144

--- Diff: python/pyspark/serializers.py ---
@@ -297,6 +297,33 @@ class MarshalSerializer(FramedSerializer):
     loads = marshal.loads
 
 
+class AutoSerializer(FramedSerializer):
+
+    """
+    Choose marshal or cPickle as serialization protocol autumatically
+    """
+
+    def __init__(self):
+        FramedSerializer.__init__(self)
+        self._type = None
+
+    def dumps(self, obj):
+        try:
+            if self._type is not None:
+                raise TypeError("fallback")
+            return 'M' + marshal.dumps(obj)
+        except Exception:
+            self._type = 'P'
+            return 'P' + cPickle.dumps(obj, -1)
+
+    def loads(self, stream):
+        _type = stream[0]
+        if _type == 'M':
+            return marshal.loads(stream[1:])
+        elif _type == 'P':
+            return cPickle.loads(stream[1:])
--- End diff --

Seems to be more efficient to read a byte with stream.read(1) than to index into the stream, but I'm not sure.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
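For reference, the two access patterns being compared look like this; whether `loads` receives an in-memory string or a file-like object depends on how FramedSerializer frames the data, so both variants are shown as a sketch:

```python
from cStringIO import StringIO

data = 'M' + 'payload'

# Variant 1: slicing a string that is already fully in memory
tag, body = data[0], data[1:]

# Variant 2: reading one byte from a file-like stream, then the rest
stream = StringIO(data)
tag2 = stream.read(1)
body2 = stream.read()

assert (tag, body) == (tag2, body2)
```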
[GitHub] spark pull request: SPARK-2497 Exclude companion classes, with the...
Github user pwendell commented on the pull request: https://github.com/apache/spark/pull/1463#issuecomment-49341909

@ScrapCodes could you explain a bit more how this fixes SPARK-2497? If I look at the original false positive, the issue reported was not with a companion class. It was actually in a method of the object itself. Also, the method was annotated, not the object.

```
Source: https://github.com/apache/spark/pull/886/files#diff-0f907b47af6261abe00eb31097a9493bR41
Error:
[error]  * method calculate(Double,Double)Double in object org.apache.spark.mllib.tree.impurity.Gini's type has changed; was (Double,Double)Double, is now: (Array[Double],Double)Double
[error]    filter with: ProblemFilters.exclude[IncompatibleMethTypeProblem]("org.apache.spark.mllib.tree.impurity.Gini.calculate")
```

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15073412

--- Diff: python/pyspark/rdd.py ---
@@ -168,6 +170,123 @@ def _replaceRoot(self, value):
             self._sink(1)
 
 
+class Merger(object):
+
+    """
+    External merger will dump the aggregated data into disks when memory usage is above
+    the limit, then merge them together.
+
+    >>> combiner = lambda x, y:x+y
+    >>> merger = Merger(combiner, 10)
+    >>> N = 1
+    >>> merger.merge(zip(xrange(N), xrange(N)) * 10)
+    >>> merger.spills
+    100
+    >>> sum(1 for k,v in merger.iteritems())
+    1
+    """
+
+    PARTITIONS = 64
+    BATCH = 1000
+
+    def __init__(self, combiner, memory_limit=256, path="/tmp/pyspark", serializer=None):
+        self.combiner = combiner
+        self.path = os.path.join(path, str(os.getpid()))
+        self.memory_limit = memory_limit
+        self.serializer = serializer or BatchedSerializer(AutoSerializer(), 1024)
+        self.item_limit = None
+        self.data = {}
+        self.pdata = []
+        self.spills = 0
+
+    def used_memory(self):
+        rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
+        if platform.system() == 'Linux':
+            rss >>= 10
+        elif platform.system() == 'Darwin':
+            rss >>= 20
+        return rss
+
+    def merge(self, iterator):
+        iterator = iter(iterator)
+        d = self.data
+        comb = self.combiner
+        c = 0
+        for k, v in iterator:
+            if k in d:
+                d[k] = comb(d[k], v)
+            else:
+                d[k] = v
+
+            if self.item_limit is not None:
+                continue
+
+            c += 1
+            if c % self.BATCH == 0 and self.used_memory() > self.memory_limit:
+                self.item_limit = c
+                self._first_spill()
+                self._partitioned_merge(iterator)
+                return
+
+    def _partitioned_merge(self, iterator):
+        comb = self.combiner
+        c = 0
+        for k, v in iterator:
+            d = self.pdata[hash(k) % self.PARTITIONS]
+            if k in d:
+                d[k] = comb(d[k], v)
+            else:
+                d[k] = v
+            c += 1
+            if c >= self.item_limit:
+                self._spill()
+                c = 0
+
+    def _first_spill(self):
+        path = os.path.join(self.path, str(self.spills))
+        if not os.path.exists(path):
+            os.makedirs(path)
+        streams = [open(os.path.join(path, str(i)), 'w')
+                   for i in range(self.PARTITIONS)]
--- End diff --

In the future we might want to compress these with GzipFile, but for now just add a TODO.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-1215 [MLLIB]: Clustering: Index out of b...
Github user jkbradley commented on a diff in the pull request: https://github.com/apache/spark/pull/1407#discussion_r15073983

--- Diff: mllib/src/main/scala/org/apache/spark/mllib/clustering/LocalKMeans.scala ---
@@ -59,6 +59,11 @@ private[mllib] object LocalKMeans extends Logging {
         cumulativeScore += weights(j) * KMeans.pointCost(curCenters, points(j))
         j += 1
       }
+      if (j == 0) {
+        logWarning("kMeansPlusPlus initialization ran out of distinct points for centers." +
+          s" Using duplicate point for center k = $i.")
+        j = 1
--- End diff --

I'll go with the second suggestion.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-2542] Exit Code Class should be renamed...
GitHub user sarutak opened a pull request: https://github.com/apache/spark/pull/1467 [SPARK-2542] Exit Code Class should be renamed and placed package properly You can merge this pull request into a Git repository by running: $ git pull https://github.com/sarutak/spark master Alternatively you can review and apply these changes as the patch at: https://github.com/apache/spark/pull/1467.patch To close this pull request, make a commit to your master/trunk branch with (at least) the following in the commit message: This closes #1467 commit 01acd8e9033045d9d5f9d5c230c427a007535cdb Author: Kousuke Saruta saru...@oss.nttdata.co.jp Date: 2014-07-17T18:11:34Z Renamed and repackaged ExecutorExitCode.scala properly --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-2542] Exit Code Class should be renamed...
Github user AmplabJenkins commented on the pull request: https://github.com/apache/spark/pull/1467#issuecomment-49343702 Can one of the admins verify this patch? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: Streaming mllib [SPARK-2438][MLLIB]
Github user freeman-lab commented on the pull request: https://github.com/apache/spark/pull/1361#issuecomment-49344174

Looks like the basic test for correct final params passes, but not the stricter test for improvement on every update. Both pass locally. My guess is that it's running a bit slower on Jenkins, so the updates don't complete fast enough (I can create a failure locally by making the test data rate too high). I'll play with this; it might work to just slow down the data rate.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1460#issuecomment-49346100

QA results for PR 1460:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds the following public classes (experimental):
  class Merger(object):
  class AutoSerializer(FramedSerializer):

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16782/consoleFull

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-1215 [MLLIB]: Clustering: Index out of b...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1407#issuecomment-49347076

QA tests have started for PR 1407. This patch merges cleanly.
View progress: https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16785/consoleFull

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-1215 [MLLIB]: Clustering: Index out of b...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1407#issuecomment-49347163

QA results for PR 1407:
- This patch FAILED unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16785/consoleFull

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-695] In DAGScheduler's getPreferredLocs...
Github user markhamstra commented on a diff in the pull request: https://github.com/apache/spark/pull/1362#discussion_r15076386

--- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala ---
@@ -1107,7 +1106,6 @@ class DAGScheduler(
             case shufDep: ShuffleDependency[_, _, _] =>
               val mapStage = getShuffleMapStage(shufDep, stage.jobId)
               if (!mapStage.isAvailable) {
-                visitedStages += mapStage
--- End diff --

@mateiz It looks to me like this is what was always intended, but the `visitedStages` check has actually been missing for the past couple of years:

```scala
if (!mapStage.isAvailable && !visitedStages(mapStage)) {
  visitedStages += mapStage
  visit(mapStage.rdd)
}  // Otherwise there's no need to follow the dependency back
```

Making that change works for me in the sense that all of the test suites continue to pass, but I haven't yet got any relative performance numbers. Probably needs to go in a separate JIRA & PR.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: Modify default YARN memory_overhead-- from an ...
Github user nishkamravi2 commented on the pull request: https://github.com/apache/spark/pull/1391#issuecomment-49348179

Bringing the discussion back online. Thanks for all the input so far. Ran a few experiments yesterday and today. The number of executors (which was the other main handle we wanted to factor in) doesn't seem to have any noticeable impact. Tried a few other parameters such as num_partitions and default_parallelism, but nothing sticks. Confirmed the proportionality with container size. Have also been trying to tune the multiplier to minimize potential waste, and I think 6% (as opposed to 7% as we currently have) is the lowest we should go. Modifying the PR accordingly.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-1478.2 Fix incorrect NioServerSocketChan...
Github user tdas commented on a diff in the pull request: https://github.com/apache/spark/pull/1466#discussion_r15076631

--- Diff: external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeInputDStream.scala ---
@@ -153,15 +153,15 @@ class FlumeReceiver(
   private def initServer() = {
     if (enableDecompression) {
-      val channelFactory = new NioServerSocketChannelFactory
-        (Executors.newCachedThreadPool(), Executors.newCachedThreadPool());
-      val channelPipelieFactory = new CompressionChannelPipelineFactory()
+      val channelFactory = new NioServerSocketChannelFactory(Executors.newCachedThreadPool(),
+        Executors.newCachedThreadPool())
+      val channelPipelineFactory = new CompressionChannelPipelineFactory()
--- End diff --

Good catch on the spelling mistake!

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-1478.2 Fix incorrect NioServerSocketChan...
Github user srowen commented on a diff in the pull request: https://github.com/apache/spark/pull/1466#discussion_r15076673

--- Diff: external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeInputDStream.scala ---
@@ -153,15 +153,15 @@ class FlumeReceiver(
   private def initServer() = {
     if (enableDecompression) {
-      val channelFactory = new NioServerSocketChannelFactory
-        (Executors.newCachedThreadPool(), Executors.newCachedThreadPool());
-      val channelPipelieFactory = new CompressionChannelPipelineFactory()
+      val channelFactory = new NioServerSocketChannelFactory(Executors.newCachedThreadPool(),
+        Executors.newCachedThreadPool())
+      val channelPipelineFactory = new CompressionChannelPipelineFactory()
       new NettyServer(
         responder,
         new InetSocketAddress(host, port),
-        channelFactory,
-        channelPipelieFactory,
+        channelFactory,
+        channelPipelineFactory,
--- End diff --

There was a typo in the variable name. The other line was just the IDE stripping a trailing space, I think.

On Jul 17, 2014 7:52 PM, Tathagata Das notificati...@github.com wrote:

> In external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeInputDStream.scala:
>
>        new NettyServer(
>          responder,
>          new InetSocketAddress(host, port),
> -        channelFactory,
> -        channelPipelieFactory,
> +        channelFactory,
> +        channelPipelineFactory,
>
> What changed here? Was there a formatting issue?

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-695] In DAGScheduler's getPreferredLocs...
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1362#discussion_r15076921 --- Diff: core/src/main/scala/org/apache/spark/Dependency.scala --- @@ -32,8 +32,6 @@ abstract class Dependency[T](val rdd: RDD[T]) extends Serializable /** * :: DeveloperApi :: - * Base class for dependencies where each partition of the parent RDD is used by at most one - * partition of the child RDD. Narrow dependencies allow for pipelined execution. --- End diff -- Sounds good. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-695] In DAGScheduler's getPreferredLocs...
Github user mateiz commented on a diff in the pull request: https://github.com/apache/spark/pull/1362#discussion_r15076891 --- Diff: core/src/main/scala/org/apache/spark/scheduler/DAGScheduler.scala --- @@ -1107,7 +1106,6 @@ class DAGScheduler( case shufDep: ShuffleDependency[_, _, _] = val mapStage = getShuffleMapStage(shufDep, stage.jobId) if (!mapStage.isAvailable) { -visitedStages += mapStage --- End diff -- Hmm, yeah, it seems that should be there. On the other hand, will tracking the visited RDDs be enough? I guess we need to look into it, but it doesn't seem urgent. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-2534] Avoid pulling in the entire RDD i...
Github user rxin commented on the pull request: https://github.com/apache/spark/pull/1450#issuecomment-49350307 Merged in master. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-2340] Resolve event logging and History...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1280#issuecomment-49350379

QA results for PR 1280:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16784/consoleFull

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-2534] Avoid pulling in the entire RDD i...
Github user asfgit closed the pull request at: https://github.com/apache/spark/pull/1450 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: SPARK-1478.2 Fix incorrect NioServerSocketChan...
Github user SparkQA commented on the pull request: https://github.com/apache/spark/pull/1466#issuecomment-49350436

QA results for PR 1466:
- This patch PASSES unit tests.
- This patch merges cleanly
- This patch adds no public classes

For more information see test output:
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/16783/consoleFull

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] spark pull request: [SPARK-2538] [PySpark] Hash based disk spillin...
Github user davies commented on a diff in the pull request: https://github.com/apache/spark/pull/1460#discussion_r15078093

--- Diff: python/pyspark/rdd.py ---
@@ -168,6 +170,123 @@ def _replaceRoot(self, value):
             self._sink(1)
 
 
+class Merger(object):
+
+    """
+    External merger will dump the aggregated data into disks when memory usage is above
+    the limit, then merge them together.
+
+    >>> combiner = lambda x, y:x+y
+    >>> merger = Merger(combiner, 10)
+    >>> N = 1
+    >>> merger.merge(zip(xrange(N), xrange(N)) * 10)
+    >>> merger.spills
+    100
+    >>> sum(1 for k,v in merger.iteritems())
+    1
+    """
+
+    PARTITIONS = 64
+    BATCH = 1000
+
+    def __init__(self, combiner, memory_limit=256, path="/tmp/pyspark", serializer=None):
+        self.combiner = combiner
+        self.path = os.path.join(path, str(os.getpid()))
+        self.memory_limit = memory_limit
+        self.serializer = serializer or BatchedSerializer(AutoSerializer(), 1024)
+        self.item_limit = None
+        self.data = {}
+        self.pdata = []
+        self.spills = 0
+
+    def used_memory(self):
+        rss = resource.getrusage(resource.RUSAGE_SELF).ru_maxrss
+        if platform.system() == 'Linux':
+            rss >>= 10
+        elif platform.system() == 'Darwin':
+            rss >>= 20
+        return rss
+
+    def merge(self, iterator):
+        iterator = iter(iterator)
+        d = self.data
+        comb = self.combiner
+        c = 0
+        for k, v in iterator:
+            if k in d:
+                d[k] = comb(d[k], v)
+            else:
+                d[k] = v
+
+            if self.item_limit is not None:
+                continue
+
+            c += 1
+            if c % self.BATCH == 0 and self.used_memory() > self.memory_limit:
+                self.item_limit = c
+                self._first_spill()
+                self._partitioned_merge(iterator)
+                return
+
+    def _partitioned_merge(self, iterator):
+        comb = self.combiner
+        c = 0
+        for k, v in iterator:
+            d = self.pdata[hash(k) % self.PARTITIONS]
+            if k in d:
+                d[k] = comb(d[k], v)
+            else:
+                d[k] = v
+            c += 1
+            if c >= self.item_limit:
+                self._spill()
+                c = 0
+
+    def _first_spill(self):
+        path = os.path.join(self.path, str(self.spills))
+        if not os.path.exists(path):
+            os.makedirs(path)
+        streams = [open(os.path.join(path, str(i)), 'w')
+                   for i in range(self.PARTITIONS)]
--- End diff --

Because the data is compressed and decompressed only once each way, it's better to use a lightweight compression method such as LZ4/Snappy/LZO. Personally, I prefer LZ4 because it has a similar compression ratio to Snappy, LZO, and LZF but much higher performance. I will add them as part of BatchedSerializer.

--- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
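As a dependency-free baseline for the compression idea discussed above, the spill streams could be wrapped in stdlib `gzip` at a low compression level; LZ4/Snappy/LZO would require third-party bindings. This is a sketch of the approach, not the code that was merged:

```python
import gzip
import os

def open_spill_streams(path, partitions, compress=True):
    """Open one (optionally gzip-compressed) write stream per partition."""
    streams = []
    for i in range(partitions):
        name = os.path.join(path, str(i))
        if compress:
            # compresslevel=1 trades ratio for speed, in the spirit of LZ4;
            # GzipFile objects support write()/close() like plain files, so
            # the merger's serializer can use them unchanged.
            streams.append(gzip.open(name, 'wb', compresslevel=1))
        else:
            streams.append(open(name, 'wb'))
    return streams
```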