[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-06-17 Thread ash211
Github user ash211 commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-46396473
  
Took a look through and didn't see any issues with the merge.  Thanks to
everyone who helped me get this in!


On Tue, Jun 17, 2014 at 2:48 PM, Reynold Xin 
wrote:

> There was a conflict that I had to merge manually. Take a look at master
> to make sure everything is ok. I did compile and ran a couple things.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-06-17 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/spark/pull/369




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-06-17 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-46348842
  
There was a conflict that I had to merge manually. Take a look at master to 
make sure everything is ok. I did compile and ran a couple things.





[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-06-17 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-46345169
  
This looks good to me. I will merge it.






[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-06-17 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-46343296
  
I will test this today.




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-06-16 Thread ash211
Github user ash211 commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-46255197
  
Ping: I think this is ready for merging.  There has been some discussion (in 
two phases!) about how to pass the implicit parameters to the methods, but I 
think we're in a good place with these right now and can expect to keep this 
API going forward.




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-06-08 Thread ash211
Github user ash211 commented on a diff in the pull request:

https://github.com/apache/spark/pull/369#discussion_r13530130
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -423,6 +423,18 @@ abstract class RDD[T: ClassTag](
   def ++(other: RDD[T]): RDD[T] = this.union(other)
 
   /**
+   * Return this RDD sorted by the given key function.
+   */
+  def sortBy[K](
+  f: (T) ⇒ K,
+  ascending: Boolean = true,
+  numPartitions: Int = this.partitions.size)
+  (implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T] =
--- End diff --

Do we need to discuss further how the parameters are passed, or is the 
current approach sufficient?  I'm not sure what the implications for binary 
compatibility would be if we later switched between the two styles (context 
bound vs. explicit implicit parameters).




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-06-04 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/369#discussion_r13423508
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -423,6 +423,18 @@ abstract class RDD[T: ClassTag](
   def ++(other: RDD[T]): RDD[T] = this.union(other)
 
   /**
+   * Return this RDD sorted by the given key function.
+   */
+  def sortBy[K](
+  f: (T) ⇒ K,
+  ascending: Boolean = true,
+  numPartitions: Int = this.partitions.size)
+  (implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T] =
--- End diff --

https://issues.apache.org/jira/browse/SPARK-1540

https://github.com/apache/spark/commit/640f9a0efefd42cff86aecd4878a3a57f5ae85fa


On Wed, Jun 4, 2014 at 8:14 PM, Wenchen Fan 
wrote:

> In core/src/main/scala/org/apache/spark/rdd/RDD.scala:
>
> > @@ -423,6 +423,18 @@ abstract class RDD[T: ClassTag](
> >def ++(other: RDD[T]): RDD[T] = this.union(other)
> >
> >/**
> > +   * Return this RDD sorted by the given key function.
> > +   */
> > +  def sortBy[K](
> > +  f: (T) ⇒ K,
> > +  ascending: Boolean = true,
> > +  numPartitions: Int = this.partitions.size)
> > +  (implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T] =
>
> @markhamstra I checked RDD.scala and
> found the style you talked about. But one thing I don't understand is: def
> countByValue()(implicit ord: Ordering[T] = null). I can't find the use of
> implicit Ordering[T] anywhere inside the method, and I compiled Spark
> successfully with this implicit Ordering[T] removed. Did I miss something
> here?




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-06-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/369#discussion_r13422474
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -423,6 +423,18 @@ abstract class RDD[T: ClassTag](
   def ++(other: RDD[T]): RDD[T] = this.union(other)
 
   /**
+   * Return this RDD sorted by the given key function.
+   */
+  def sortBy[K](
+  f: (T) ⇒ K,
+  ascending: Boolean = true,
+  numPartitions: Int = this.partitions.size)
+  (implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T] =
--- End diff --

@markhamstra I checked RDD.scala and found the style you talked about. But 
one thing I don't understand is 
`def countByValue()(implicit ord: Ordering[T] = null)`. I can't find any use of 
the implicit `Ordering[T]` inside the method, and I compiled Spark successfully 
with this implicit `Ordering[T]` removed. Did I miss something here?




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-06-04 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/369#discussion_r13419732
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -423,6 +423,18 @@ abstract class RDD[T: ClassTag](
   def ++(other: RDD[T]): RDD[T] = this.union(other)
 
   /**
+   * Return this RDD sorted by the given key function.
+   */
+  def sortBy[K](
+  f: (T) ⇒ K,
+  ascending: Boolean = true,
+  numPartitions: Int = this.partitions.size)
+  (implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T] =
--- End diff --

If I understand what you are claiming, then I think you are mistaken: a 
context bound and an explicit implicit parameter can be combined.  They end up 
in a single implicit parameter list, though, so I don't think you can supply an 
argument for only one of the implicits:
```
scala> def foo[A: ClassTag](implicit ord: Ordering[A]) = (implicitly[ClassTag[A]], ord)
foo: [A](implicit evidence$1: scala.reflect.ClassTag[A], implicit ord: scala.math.Ordering[A])(scala.reflect.ClassTag[A], scala.math.Ordering[A])

scala> foo[Int]
res1: (scala.reflect.ClassTag[Int], scala.math.Ordering[Int]) = (Int,scala.math.Ordering$Int$@5e10a811)

scala> foo[Int](scala.reflect.classTag[Int], Ordering[Int])
res2: (scala.reflect.ClassTag[Int], scala.math.Ordering[Int]) = (Int,scala.math.Ordering$Int$@5e10a811)

scala> foo[Int](scala.reflect.classTag[Int], Ordering[Int].reverse)
res3: (scala.reflect.ClassTag[Int], scala.math.Ordering[Int]) = (Int,scala.math.Ordering$$anon$4@7293296)
```




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-06-04 Thread aarondav
Github user aarondav commented on a diff in the pull request:

https://github.com/apache/spark/pull/369#discussion_r13418102
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -423,6 +423,18 @@ abstract class RDD[T: ClassTag](
   def ++(other: RDD[T]): RDD[T] = this.union(other)
 
   /**
+   * Return this RDD sorted by the given key function.
+   */
+  def sortBy[K](
+  f: (T) ⇒ K,
+  ascending: Boolean = true,
+  numPartitions: Int = this.partitions.size)
+  (implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T] =
--- End diff --

See the discussion at the beginning of this PR: the reason is that the user 
may want to specify the `Ordering` explicitly, so we didn't want the sugar. 
Also, I believe you can't combine a context bound for `ClassTag` with an 
implicit argument list.
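
To illustrate what supplying the `Ordering` explicitly looks like, here is a minimal sketch. It uses a hypothetical local stand-in for `sortBy` that works on a plain `Seq` rather than an RDD; the `ExplicitOrderingSketch` object, its `sortBy` helper, and the `Person` case class are assumptions made for illustration, not code from this PR. Only the shape of the implicit parameter list mirrors the signature in the diff.

```scala
import scala.reflect.{ClassTag, classTag}

object ExplicitOrderingSketch {
  final case class Person(first: String, last: String, age: Int)

  // Hypothetical stand-in for RDD.sortBy that sorts a local Seq.
  // ctag is unused here; it is kept only to mirror the two-implicit signature.
  def sortBy[T, K](data: Seq[T], f: T => K, ascending: Boolean = true)(
      implicit ord: Ordering[K], ctag: ClassTag[K]): Seq[T] =
    data.sortBy(f)(if (ascending) ord else ord.reverse)

  val people: Seq[Person] = Seq(
    Person("Bob", "Smith", 50),
    Person("Jane", "Smith", 40),
    Person("Thomas", "Williams", 30),
    Person("Karen", "Williams", 60))

  // Both implicits are resolved by the compiler.
  val byAge: Seq[Person] = sortBy(people, (p: Person) => p.age)

  // Both implicits are supplied by hand; they share one parameter list,
  // so passing one means passing the other as well.
  val byAgeReversed: Seq[Person] =
    sortBy(people, (p: Person) => p.age)(Ordering[Int].reverse, classTag[Int])
}
```

Because the two implicits live in a single parameter list, a caller who wants a custom `Ordering` also has to pass the `ClassTag`, which is the trade-off of this style.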




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-06-04 Thread markhamstra
Github user markhamstra commented on a diff in the pull request:

https://github.com/apache/spark/pull/369#discussion_r13387198
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -423,6 +423,18 @@ abstract class RDD[T: ClassTag](
   def ++(other: RDD[T]): RDD[T] = this.union(other)
 
   /**
+   * Return this RDD sorted by the given key function.
+   */
+  def sortBy[K](
+  f: (T) ⇒ K,
+  ascending: Boolean = true,
+  numPartitions: Int = this.partitions.size)
+  (implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T] =
--- End diff --

That substitution certainly could be made since it is functionally 
equivalent.  Whether it should be made is mostly a question of style.  Since 
`ord` and `ctag` are not used explicitly here, there is no real use to 
enumerating them as implicit parameters; but the rest of RDD.scala doesn't use 
the context bound sugar, so there is some value in consistently maintaining 
that style. 
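
The equivalence mentioned above can be sketched outside Spark. In this minimal example the `ContextBoundSketch` object and both `describe*` helpers are hypothetical names invented for illustration; the first helper spells out the implicit parameters the way the PR's `sortBy` does, and the second uses the context-bound sugar.

```scala
import scala.reflect.ClassTag

object ContextBoundSketch {
  // Explicit implicit parameter list, the style used in the PR's signature.
  def describeExplicit[K](xs: Seq[K])(implicit ord: Ordering[K], ctag: ClassTag[K]): String = {
    val sorted = xs.sorted(ord).mkString(",")
    ctag.runtimeClass.getSimpleName + ":" + sorted
  }

  // Context-bound sugar: the compiler adds the same two implicit parameters,
  // which are then looked up by type instead of by name.
  def describeSugar[K: Ordering: ClassTag](xs: Seq[K]): String = {
    val sorted = xs.sorted.mkString(",")
    implicitly[ClassTag[K]].runtimeClass.getSimpleName + ":" + sorted
  }
}
```

Both helpers return the same string for the same input, so the choice between them is a matter of style and of whether the body needs to refer to the implicits by name.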




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-06-04 Thread cloud-fan
Github user cloud-fan commented on a diff in the pull request:

https://github.com/apache/spark/pull/369#discussion_r13376621
  
--- Diff: core/src/main/scala/org/apache/spark/rdd/RDD.scala ---
@@ -423,6 +423,18 @@ abstract class RDD[T: ClassTag](
   def ++(other: RDD[T]): RDD[T] = this.union(other)
 
   /**
+   * Return this RDD sorted by the given key function.
+   */
+  def sortBy[K](
+  f: (T) ⇒ K,
+  ascending: Boolean = true,
+  numPartitions: Int = this.partitions.size)
+  (implicit ord: Ordering[K], ctag: ClassTag[K]): RDD[T] =
--- End diff --

Why not use a context bound, like `def sortBy[K : Ordering : ClassTag]`?




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-06-03 Thread pwendell
Github user pwendell commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-45053945
  
@rxin want to give this a final sign off?




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-25 Thread ash211
Github user ash211 commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-44148572
  
Any objections to merging this `.sortBy()` method?




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-19 Thread ash211
Github user ash211 commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-43574435
  
Success!




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-43572007
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/15087/




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-43572005
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-43569435
  
Merged build started. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-19 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-43569426
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-43052152
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14969/




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-43052151
  
Merged build finished. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-43049565
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-14 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-43049575
  
Merged build started. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-13 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/369#discussion_r12621775
  
--- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala ---
@@ -541,14 +543,64 @@ class RDDSuite extends FunSuite with 
SharedSparkContext {
 }
   }
 
+  test("sortByKey") {
+val data = sc.parallelize(Seq("5|50|A","4|60|C", "6|40|B"))
+
+val col1 = Array("4|60|C", "5|50|A", "6|40|B")
+val col2 = Array("6|40|B", "5|50|A", "4|60|C")
+val col3 = Array("5|50|A", "6|40|B", "4|60|C")
+
+assert(data.sortBy(_.split("\\|")(0)).collect() === col1)
+assert(data.sortBy(_.split("\\|")(1)).collect() === col2)
+assert(data.sortBy(_.split("\\|")(2)).collect() === col3)
+  }
+
+  test("sortByKey ascending parameter") {
+val data = sc.parallelize(Seq("5|50|A","4|60|C", "6|40|B"))
+
+val asc = Array("4|60|C", "5|50|A", "6|40|B")
+val desc = Array("6|40|B", "5|50|A", "4|60|C")
+
+assert(data.sortBy(_.split("\\|")(0), true).collect() === asc)
+assert(data.sortBy(_.split("\\|")(0), false).collect() === desc)
+  }
+
+  // issues with serialization of Ordering in the test
+  ignore("sortByKey with explicit ordering") {
+val data = sc.parallelize(Seq("Bob|Smith|50",
+  "Jane|Smith|40",
+  "Thomas|Williams|30",
+  "Karen|Williams|60"))
+
+val ageOrdered = Array("Thomas|Williams|30",
+   "Jane|Smith|40",
+   "Bob|Smith|50",
+   "Karen|Williams|60")
+
+// last name, then first name
+val nameOrdered = Array("Bob|Smith|50",
+"Jane|Smith|40",
+"Karen|Williams|60",
+"Thomas|Williams|30")
+
+def parse(s: String): Person = {
--- End diff --

If you change this to a closure, it will pass without the serialization 
problem, e.g.:
```scala
val parse = (s: String) => {
  val split = s.split("\\|")
  Person(split(0), split(1), split(2).toInt)
}
```
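
One plausible reading of why the closure helps: a method defined with `def` belongs to its enclosing instance, so a function that calls it has to capture that instance, while a self-contained function value captures nothing. The sketch below demonstrates this with plain Java serialization rather than Spark; `ClosureCaptureSketch`, `Suite`, and `canSerialize` are hypothetical names, and the behaviour shown assumes a recent Scala version where function literals compile to serializable lambdas.

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

object ClosureCaptureSketch {
  // Deliberately not Serializable, standing in for a test suite class.
  class Suite {
    def parseAge(s: String): Int = s.split("\\|")(2).toInt

    // Calls a method of this instance, so the function captures `this`.
    def viaMethod: String => Int = (s: String) => parseAge(s)

    // Self-contained function value: the body makes no reference to `this`.
    def viaFunctionValue: String => Int = (s: String) => s.split("\\|")(2).toInt
  }

  // Attempts Java serialization and reports whether it succeeded.
  def canSerialize(obj: AnyRef): Boolean =
    try {
      val out = new ObjectOutputStream(new ByteArrayOutputStream())
      out.writeObject(obj)
      out.close()
      true
    } catch {
      case _: NotSerializableException => false
    }
}
```

Calling `canSerialize` on each of the two functions shows whether the enclosing `Suite` was dragged into the serialized closure.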




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-13 Thread ash211
Github user ash211 commented on a diff in the pull request:

https://github.com/apache/spark/pull/369#discussion_r12621830
  
--- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala ---
@@ -541,14 +543,64 @@ class RDDSuite extends FunSuite with 
SharedSparkContext {
 }
   }
 
+  test("sortByKey") {
+val data = sc.parallelize(Seq("5|50|A","4|60|C", "6|40|B"))
+
+val col1 = Array("4|60|C", "5|50|A", "6|40|B")
+val col2 = Array("6|40|B", "5|50|A", "4|60|C")
+val col3 = Array("5|50|A", "6|40|B", "4|60|C")
+
+assert(data.sortBy(_.split("\\|")(0)).collect() === col1)
+assert(data.sortBy(_.split("\\|")(1)).collect() === col2)
+assert(data.sortBy(_.split("\\|")(2)).collect() === col3)
+  }
+
+  test("sortByKey ascending parameter") {
+val data = sc.parallelize(Seq("5|50|A","4|60|C", "6|40|B"))
+
+val asc = Array("4|60|C", "5|50|A", "6|40|B")
+val desc = Array("6|40|B", "5|50|A", "4|60|C")
+
+assert(data.sortBy(_.split("\\|")(0), true).collect() === asc)
+assert(data.sortBy(_.split("\\|")(0), false).collect() === desc)
+  }
+
+  // issues with serialization of Ordering in the test
+  ignore("sortByKey with explicit ordering") {
+val data = sc.parallelize(Seq("Bob|Smith|50",
+  "Jane|Smith|40",
+  "Thomas|Williams|30",
+  "Karen|Williams|60"))
+
+val ageOrdered = Array("Thomas|Williams|30",
+   "Jane|Smith|40",
+   "Bob|Smith|50",
+   "Karen|Williams|60")
+
+// last name, then first name
+val nameOrdered = Array("Bob|Smith|50",
+"Jane|Smith|40",
+"Karen|Williams|60",
+"Thomas|Williams|30")
+
+def parse(s: String): Person = {
--- End diff --

I'll take a pass through and see what I come up with.  Thanks!


On Tue, May 13, 2014 at 10:52 PM, Reynold Xin 
wrote:

> In core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala:
>
> > +  "Jane|Smith|40",
> > +  "Thomas|Williams|30",
> > +  "Karen|Williams|60"))
> > +
> > +val ageOrdered = Array("Thomas|Williams|30",
> > +   "Jane|Smith|40",
> > +   "Bob|Smith|50",
> > +   "Karen|Williams|60")
> > +
> > +// last name, then first name
> > +val nameOrdered = Array("Bob|Smith|50",
> > +"Jane|Smith|40",
> > +"Karen|Williams|60",
> > +"Thomas|Williams|30")
> > +
> > +def parse(s: String): Person = {
>
> if you change this to a closure, then it will pass without the
> serialization problem, e.g.
>
> val parse = (s: String) => {
>   val split = s.split("\\|")
>   Person(split(0), split(1), split(2).toInt)
> }




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-13 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/369#discussion_r12621760
  
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDD.scala ---
@@ -150,6 +153,18 @@ class JavaRDD[T](val rdd: RDD[T])(implicit val 
classTag: ClassTag[T])
 rdd.setName(name)
 this
   }
+
+  /**
+   * Return this RDD sorted by the given key function.
+   */
+  def sortBy[S](f: JFunction[T, S], ascending: Boolean, numPartitions: 
Int): JavaRDD[T] = {
+import scala.collection.JavaConverters._
+def fn = (x: T) => f.call(x)
+implicit val ordering = 
com.google.common.collect.Ordering.natural().asInstanceOf[Ordering[S]]
--- End diff --

Actually, I have to focus on something else, and since this might be too 
last-minute for 1.0 given that it introduces new APIs, we have more time to push 
it into 1.1.  Do you mind fixing this yourself? The problem is that you should 
have cast the type to `Comparator[S]` instead of `Ordering[S]` (which is Scala's 
`Ordering` in this scope).
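
As a side note on the Java/Scala mismatch, a Java `Comparator` can also be wrapped in a Scala `Ordering` explicitly rather than cast. The sketch below is not what this PR does; `ComparatorOrderingSketch` and `toOrdering` are hypothetical names, and `String.CASE_INSENSITIVE_ORDER` is used only as a readily available `Comparator`.

```scala
import java.util.Comparator

object ComparatorOrderingSketch {
  // Wrap a Java Comparator in a Scala Ordering by delegating compare,
  // instead of casting one type to the other.
  def toOrdering[S](cmp: Comparator[S]): Ordering[S] = new Ordering[S] {
    override def compare(a: S, b: S): Int = cmp.compare(a, b)
  }

  // A Comparator from the Java standard library.
  val javaCmp: Comparator[String] = java.lang.String.CASE_INSENSITIVE_ORDER

  // The wrapped comparator can be passed wherever an Ordering is expected.
  val sorted: List[String] = List("b", "A", "c").sorted(toOrdering(javaCmp))
}
```

Delegation keeps the types honest: an unchecked `asInstanceOf` to the wrong `Ordering` compiles but fails only when the value is used.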




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-13 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/369#discussion_r12621764
  
--- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala ---
@@ -541,14 +543,64 @@ class RDDSuite extends FunSuite with 
SharedSparkContext {
 }
   }
 
+  test("sortByKey") {
+val data = sc.parallelize(Seq("5|50|A","4|60|C", "6|40|B"))
+
+val col1 = Array("4|60|C", "5|50|A", "6|40|B")
+val col2 = Array("6|40|B", "5|50|A", "4|60|C")
+val col3 = Array("5|50|A", "6|40|B", "4|60|C")
+
+assert(data.sortBy(_.split("\\|")(0)).collect() === col1)
+assert(data.sortBy(_.split("\\|")(1)).collect() === col2)
+assert(data.sortBy(_.split("\\|")(2)).collect() === col3)
+  }
+
+  test("sortByKey ascending parameter") {
+val data = sc.parallelize(Seq("5|50|A","4|60|C", "6|40|B"))
+
+val asc = Array("4|60|C", "5|50|A", "6|40|B")
+val desc = Array("6|40|B", "5|50|A", "4|60|C")
+
+assert(data.sortBy(_.split("\\|")(0), true).collect() === asc)
+assert(data.sortBy(_.split("\\|")(0), false).collect() === desc)
+  }
+
+  // issues with serialization of Ordering in the test
+  ignore("sortByKey with explicit ordering") {
+val data = sc.parallelize(Seq("Bob|Smith|50",
+  "Jane|Smith|40",
+  "Thomas|Williams|30",
+  "Karen|Williams|60"))
+
+val ageOrdered = Array("Thomas|Williams|30",
+   "Jane|Smith|40",
+   "Bob|Smith|50",
+   "Karen|Williams|60")
+
+// last name, then first name
+val nameOrdered = Array("Bob|Smith|50",
+"Jane|Smith|40",
+"Karen|Williams|60",
+"Thomas|Williams|30")
+
+def parse(s: String): Person = {
+  val split = s.split("\\|")
+  Person(split(0), split(1), split(2).toInt)
+}
+
+import scala.reflect.classTag
+assert(data.sortBy(parse, false, 2)(AgeOrdering, classTag[Person]) === ageOrdered)
--- End diff --

ascending should be true, not false




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-13 Thread ash211
Github user ash211 commented on a diff in the pull request:

https://github.com/apache/spark/pull/369#discussion_r12621569
  
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDD.scala ---
@@ -150,6 +153,18 @@ class JavaRDD[T](val rdd: RDD[T])(implicit val 
classTag: ClassTag[T])
 rdd.setName(name)
 this
   }
+
+  /**
+   * Return this RDD sorted by the given key function.
+   */
+  def sortBy[S](f: JFunction[T, S], ascending: Boolean, numPartitions: 
Int): JavaRDD[T] = {
+import scala.collection.JavaConverters._
+def fn = (x: T) => f.call(x)
+implicit val ordering = 
com.google.common.collect.Ordering.natural().asInstanceOf[Ordering[S]]
--- End diff --

Thank you!  I'm not incredibly familiar with Java/Scala interactions so
would appreciate the assistance.  Also, didn't know Scala had an Ordering.

Using Guava for the default comparator was based on the
`sortByKey(Boolean)` method in `JavaPairRDD.scala`


On Tue, May 13, 2014 at 10:35 PM, Reynold Xin 
wrote:

> In core/src/main/scala/org/apache/spark/api/java/JavaRDD.scala:
>
> > @@ -150,6 +153,18 @@ class JavaRDD[T](val rdd: RDD[T])(implicit val 
classTag: ClassTag[T])
> >  rdd.setName(name)
> >  this
> >}
> > +
> > +  /**
> > +   * Return this RDD sorted by the given key function.
> > +   */
> > +  def sortBy[S](f: JFunction[T, S], ascending: Boolean, numPartitions: 
Int): JavaRDD[T] = {
> > +import scala.collection.JavaConverters._
> > +def fn = (x: T) => f.call(x)
> > +implicit val ordering = 
com.google.common.collect.Ordering.natural().asInstanceOf[Ordering[S]]
>
> I will submit a pr to fix these.




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-13 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/369#discussion_r12621500
  
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDD.scala ---
@@ -150,6 +153,18 @@ class JavaRDD[T](val rdd: RDD[T])(implicit val classTag: ClassTag[T])
     rdd.setName(name)
     this
   }
+
+  /**
+   * Return this RDD sorted by the given key function.
+   */
+  def sortBy[S](f: JFunction[T, S], ascending: Boolean, numPartitions: Int): JavaRDD[T] = {
+    import scala.collection.JavaConverters._
+    def fn = (x: T) => f.call(x)
+    implicit val ordering = com.google.common.collect.Ordering.natural().asInstanceOf[Ordering[S]]
--- End diff --

I will submit a PR to fix these.




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-13 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/369#discussion_r12621340
  
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDD.scala ---
@@ -150,6 +153,18 @@ class JavaRDD[T](val rdd: RDD[T])(implicit val classTag: ClassTag[T])
     rdd.setName(name)
     this
   }
+
+  /**
+   * Return this RDD sorted by the given key function.
+   */
+  def sortBy[S](f: JFunction[T, S], ascending: Boolean, numPartitions: Int): JavaRDD[T] = {
+    import scala.collection.JavaConverters._
+    def fn = (x: T) => f.call(x)
+    implicit val ordering = com.google.common.collect.Ordering.natural().asInstanceOf[Ordering[S]]
--- End diff --

I think the problem here is that `Ordering` in this scope is just Scala's
`Ordering`, so the cast targets `scala.math.Ordering` rather than Guava's class.
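
A minimal Scala sketch of the distinction raised here, assuming a hand-written
`java.util.Comparator` as a stand-in for Guava's `Ordering.natural()` (the
object and method names are illustrative and not from the PR). A `Comparator`
is not a `scala.math.Ordering`, so casting one to the other should fail at
runtime, whereas `Ordering.comparatorToOrdering` wraps it instead:

```scala
import java.util.Comparator

object ComparatorBridge {
  // A plain Java comparator, standing in for Guava's Ordering.natural(),
  // which is likewise a java.util.Comparator and not a scala.math.Ordering.
  val javaNatural: Comparator[String] = new Comparator[String] {
    override def compare(a: String, b: String): Int = a.compareTo(b)
  }

  // Wrap rather than cast: comparatorToOrdering adapts any Comparator.
  val scalaOrdering: Ordering[String] = Ordering.comparatorToOrdering(javaNatural)

  // True if casting the comparator straight to scala.math.Ordering
  // throws a ClassCastException when the result is used.
  def castFails: Boolean =
    try {
      val o = javaNatural.asInstanceOf[Ordering[String]]
      o.compare("a", "b")
      false
    } catch {
      case _: ClassCastException => true
    }

  // Sort with the wrapped ordering passed explicitly.
  def sorted(xs: Seq[String]): Seq[String] = xs.sorted(scalaOrdering)
}
```

The wrapped value can be used anywhere an implicit or explicit Scala
`Ordering` is expected, which is presumably what the Java-facing `sortBy`
needs.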




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-13 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/369#discussion_r12621237
  
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDD.scala ---
@@ -150,6 +153,18 @@ class JavaRDD[T](val rdd: RDD[T])(implicit val classTag: ClassTag[T])
     rdd.setName(name)
     this
   }
+
+  /**
+   * Return this RDD sorted by the given key function.
+   */
+  def sortBy[S](f: JFunction[T, S], ascending: Boolean, numPartitions: Int): JavaRDD[T] = {
+    import scala.collection.JavaConverters._
+    def fn = (x: T) => f.call(x)
+    implicit val ordering = com.google.common.collect.Ordering.natural().asInstanceOf[Ordering[S]]
--- End diff --

How does this actually work? Shouldn't natural() return a Comparator?




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-13 Thread ash211
Github user ash211 commented on a diff in the pull request:

https://github.com/apache/spark/pull/369#discussion_r12621279
  
--- Diff: core/src/main/scala/org/apache/spark/api/java/JavaRDD.scala ---
@@ -150,6 +153,18 @@ class JavaRDD[T](val rdd: RDD[T])(implicit val classTag: ClassTag[T])
     rdd.setName(name)
     this
   }
+
+  /**
+   * Return this RDD sorted by the given key function.
+   */
+  def sortBy[S](f: JFunction[T, S], ascending: Boolean, numPartitions: Int): JavaRDD[T] = {
+    import scala.collection.JavaConverters._
+    def fn = (x: T) => f.call(x)
+    implicit val ordering = com.google.common.collect.Ordering.natural().asInstanceOf[Ordering[S]]
--- End diff --

Ordering.natural() returns an Ordering object, which implements the
Comparator interface.


http://docs.guava-libraries.googlecode.com/git-history/release/javadoc/com/google/common/collect/Ordering.html#natural()


On Tue, May 13, 2014 at 10:16 PM, Reynold Xin 
wrote:

> In core/src/main/scala/org/apache/spark/api/java/JavaRDD.scala:
>
> > @@ -150,6 +153,18 @@ class JavaRDD[T](val rdd: RDD[T])(implicit val classTag: ClassTag[T])
> >      rdd.setName(name)
> >      this
> >    }
> > +
> > +  /**
> > +   * Return this RDD sorted by the given key function.
> > +   */
> > +  def sortBy[S](f: JFunction[T, S], ascending: Boolean, numPartitions: Int): JavaRDD[T] = {
> > +    import scala.collection.JavaConverters._
> > +    def fn = (x: T) => f.call(x)
> > +    implicit val ordering = com.google.common.collect.Ordering.natural().asInstanceOf[Ordering[S]]
>
> How does this actually work? Shouldn't natural() return a Comparator?




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-42904709
  
Merged build finished. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-42904649
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-42904665
  
Merged build started. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-42906068
  
Merged build finished. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-42906072
  
Merged build started. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-42906069
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14919/




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-12 Thread ash211
Github user ash211 commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-42905647
  
Fixed up with an unfortunate force push.  I typically use merges for these
sorts of things but forgot how bad they look on GitHub.  I will remember to use
rebase for GitHub projects in the future.

Any comments on the Java/Python bridges?




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-42906147
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14920/




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-42905778
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-42906146
  
Merged build finished. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-42905496
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-42905502
  
Merged build started. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-12 Thread markhamstra
Github user markhamstra commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-42905001
  
I'll say it again: when you are preparing a PR, you're better off rebasing
than merging.

Within a clone of your GitHub repo:

git pull --rebase g...@github.com:apache/spark.git master
git push origin +YourBranchName





[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-12 Thread ash211
Github user ash211 commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-42873882
  
Ping




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-12 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-42904711
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14915/




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-12 Thread ash211
Github user ash211 commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-42904551
  
That merge makes the GitHub diffs really unclean. One sec and I'll redo it as
a cleaner merge.




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-12 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/369#discussion_r12554718
  
--- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala ---
@@ -541,6 +543,46 @@ class RDDSuite extends FunSuite with SharedSparkContext {
     }
   }
 
+  test("sortByKey") {
+    val data = sc.parallelize(Seq("5|50|A","4|60|C", "6|40|B"))
+
+    val col1 = Array("4|60|C", "5|50|A", "6|40|B")
+    val col2 = Array("6|40|B", "5|50|A", "4|60|C")
+    val col3 = Array("5|50|A", "6|40|B", "4|60|C")
+
+    assert(data.sortBy(_.split("\\|")(0)).collect === col1)
+    assert(data.sortBy(_.split("\\|")(1)).collect === col2)
+    assert(data.sortBy(_.split("\\|")(2)).collect === col3)
+  }
+
+  test("sortByKey ascending parameter") {
+    val data = sc.parallelize(Seq("5|50|A","4|60|C", "6|40|B"))
+
+    val asc = Array("4|60|C", "5|50|A", "6|40|B")
+    val desc = Array("6|40|B", "5|50|A", "4|60|C")
+
+    assert(data.sortBy(_.split("\\|")(0), true).collect === asc)
+    assert(data.sortBy(_.split("\\|")(0), false).collect === desc)
+  }
+
+  // issues with serialization of Ordering in the test
+  ignore("sortByKey with explicit ordering") {
+    val data = sc.parallelize(Seq("Bob|Smith|50", "Jane|Smith|40", "Thomas|Williams|30", "Karen|Williams|60"))
--- End diff --

A few of the lines in this function exceed 100 characters.




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-12 Thread rxin
Github user rxin commented on a diff in the pull request:

https://github.com/apache/spark/pull/369#discussion_r12554704
  
--- Diff: core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala ---
@@ -541,6 +543,46 @@ class RDDSuite extends FunSuite with SharedSparkContext {
     }
   }
 
+  test("sortByKey") {
+    val data = sc.parallelize(Seq("5|50|A","4|60|C", "6|40|B"))
+
+    val col1 = Array("4|60|C", "5|50|A", "6|40|B")
+    val col2 = Array("6|40|B", "5|50|A", "4|60|C")
+    val col3 = Array("5|50|A", "6|40|B", "4|60|C")
+
+    assert(data.sortBy(_.split("\\|")(0)).collect === col1)
--- End diff --

Mind adding parentheses to all the `collect` calls?




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-12 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-42891085
  
This looks good. This would be a useful API to add to Spark to facilitate
sorting. Do you mind looking into the Java and Python APIs for this as well?





[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-42391059
  
Merged build started. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-42391094
  
Merged build finished. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-42391095
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14758/




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-07 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-42391055
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-05-07 Thread ash211
Github user ash211 commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-42391069
  
Would appreciate getting this into 1.0 but it can wait until later as well.




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-40267040
  
Merged build finished. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-40267041
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/14067/




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-40267020
  
Merged build started. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-04-11 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-40267016
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-04-11 Thread ash211
Github user ash211 commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-40266965
  
Moved the Ordering outside of the test class, but it still brings in a 
SparkConf from somewhere that fails to serialize.  Sorry, I'm not much of a 
Scala expert so will need some handholding here.




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-40036254
  

Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13973/




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-40036252
  
Merged build finished. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-04-09 Thread aarondav
Github user aarondav commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-40034991
  
You'll have to define your Ordering outside of the test class; otherwise it
will try to bring the test class's implicit "RDDSuite.this" into the
closure. You can define it inside an RDDSuite companion object, which probably
preserves the desired visibility semantics.
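
The capture problem described above can be reproduced without Spark. This is a
minimal sketch, assuming a hypothetical `FakeSuite` class in place of
`RDDSuite` and plain Java serialization in place of Spark's task
serialization. The inner ordering deliberately reads a field of the enclosing
instance so that the reference to `this` is explicit:

```scala
import java.io.{ByteArrayOutputStream, NotSerializableException, ObjectOutputStream}

// Hypothetical stand-in for a test suite; note that it is not Serializable.
class FakeSuite {
  val descending = true // instance state read by the ordering below

  // Reads `descending`, so the anonymous class keeps a reference
  // to the enclosing FakeSuite instance.
  val capturing: Ordering[String] = new Ordering[String] {
    def compare(a: String, b: String): Int =
      if (descending) b.compareTo(a) else a.compareTo(b)
  }
}

object FakeSuite {
  // Defined in the companion object: there is no enclosing instance to capture.
  val standalone: Ordering[String] = new Ordering[String] {
    def compare(a: String, b: String): Int = b.compareTo(a)
  }
}

object SerializationCheck {
  // Returns true if Java serialization of `value` succeeds.
  def serializes(value: AnyRef): Boolean = {
    val out = new ObjectOutputStream(new ByteArrayOutputStream())
    try {
      out.writeObject(value)
      true
    } catch {
      case _: NotSerializableException => false
    } finally {
      out.close()
    }
  }
}
```

When compiled as ordinary top-level definitions,
`SerializationCheck.serializes(new FakeSuite().capturing)` should be false and
`SerializationCheck.serializes(FakeSuite.standalone)` should be true, which
mirrors the suggestion to move the Ordering into a companion object.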




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-40034670
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-40034678
  
Merged build started. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-04-09 Thread ash211
Github user ash211 commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-40034695
  
I tried writing a test for the Ordering parameter but am getting issues 
with SparkConf not being serializable.  I suspect that the surrounding context 
of the Ordering parameter is causing issues.  When I comment out the last two 
lines that use the Ordering, the class compiles and works (though obviously 
doesn't actually test anything).

```
[info] - sortByKey with explicit ordering *** FAILED *** (11 milliseconds)
[info]   org.apache.spark.SparkException: Job aborted: Task not serializable: java.io.NotSerializableException: org.apache.spark.SparkConf
[info]   at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1017)
[info]   at org.apache.spark.scheduler.DAGScheduler$$anonfun$org$apache$spark$scheduler$DAGScheduler$$abortStage$1.apply(DAGScheduler.scala:1015)
[info]   at scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
[info]   at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:47)
[info]   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$abortStage(DAGScheduler.scala:1015)
[info]   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitMissingTasks(DAGScheduler.scala:778)
[info]   at org.apache.spark.scheduler.DAGScheduler.org$apache$spark$scheduler$DAGScheduler$$submitStage(DAGScheduler.scala:721)
[info]   at org.apache.spark.scheduler.DAGScheduler.processEvent(DAGScheduler.scala:551)
[info]   at org.apache.spark.scheduler.DAGScheduler$$anonfun$start$1$$anon$2$$anonfun$receive$1.applyOrElse(DAGScheduler.scala:190)
[info]   at akka.actor.ActorCell.receiveMessage(ActorCell.scala:498)
[info]   at akka.actor.ActorCell.invoke(ActorCell.scala:456)
[info]   at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:237)
[info]   at akka.dispatch.Mailbox.run(Mailbox.scala:219)
[info]   at akka.dispatch.ForkJoinExecutorConfigurator$AkkaForkJoinTask.exec(AbstractDispatcher.scala:386)
[info]   at scala.concurrent.forkjoin.ForkJoinTask.doExec(ForkJoinTask.java:260)
[info]   at scala.concurrent.forkjoin.ForkJoinPool$WorkQueue.runTask(ForkJoinPool.java:1339)
[info]   at scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
[info]   at scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
```

Any ideas on how to properly call it with this extra implicit parameter?




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-04-09 Thread aarondav
Github user aarondav commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-40033173
  
Ah, alas, looks like the context bound just desugars to the exact same 
thing here, except we've just reordered the implicit arguments. We might be 
able to get around this by clever usage of implicit conversions, but it almost 
certainly wouldn't be worth it. At least having the whole parameter list 
explicitly listed out makes it more obvious that you can specify the Ordering, 
even if we can't remove the ugliness of also specifying a ClassTag too.
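
For illustration, here is a minimal sketch of the two ways a caller can
supply the Ordering when a method takes both an `Ordering` and a `ClassTag`
implicitly. The local `sortBy` below is a hypothetical analogue operating on a
`Seq`, not the RDD method, and the `Person` fields are assumed; only the
parameter-list shape is taken from the compiler error in this thread:

```scala
import scala.reflect.ClassTag

object ImplicitArgs {
  // Assumed shape of the key type; the thread only names the type Person.
  final case class Person(first: String, last: String, age: Int)

  // Hypothetical analogue with two implicit parameters. A context-bound form,
  // `def sortBy[T, K: Ordering: ClassTag](...)`, desugars to the same shape.
  def sortBy[T, K](data: Seq[T])(f: T => K)(implicit ord: Ordering[K], ctag: ClassTag[K]): Seq[T] =
    data.sortBy(f)(ord)

  def parse(line: String): Person = {
    val parts = line.split("\\|")
    Person(parts(0), parts(1), parts(2).toInt)
  }

  val byAge: Ordering[Person] = Ordering.by((p: Person) => p.age)

  val people = Seq("Bob|Smith|50", "Jane|Smith|40", "Thomas|Williams|30", "Karen|Williams|60")

  // Option 1: pass every implicit argument explicitly,
  // summoning the ClassTag with `implicitly`.
  val explicitBoth: Seq[String] =
    sortBy(people)(parse)(byAge, implicitly[ClassTag[Person]])

  // Option 2: bring the Ordering into implicit scope and let the
  // compiler fill in both arguments.
  val viaImplicitScope: Seq[String] = {
    implicit val ord: Ordering[Person] = byAge
    sortBy(people)(parse)
  }
}
```

Both options sort the sample lines by age. Option 2 avoids spelling out the
ClassTag, at the cost of making the choice of Ordering less visible at the
call site.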




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-40033138
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-40033139
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13970/




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-04-09 Thread ash211
Github user ash211 commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-40031981
  
That gets me closer, but it looks like you then have to pass all the 
implicit parameters or none of them.  I tried passing in the ord parameter by 
name as well, but that still didn't work.

Any other ideas?

Personally I would have encoded my Ordering requirement as the output of my 
sortBy function rather than passing in a key function as well as an ordering on 
that key.  I can see that some people might prefer to have both if they're 
re-using an Ordering from elsewhere.

```
[error] /Users/aash/git/spark/core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala:556: not enough arguments for method sortBy: (implicit ord: Ordering[Person], implicit ctag: scala.reflect.ClassTag[Person])org.apache.spark.rdd.RDD[String].
[error] Unspecified value parameter ctag.
[error] assert(data.sortBy(parse, false, 2)(NameOrdering) === nameOrdered)
[error]
```
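
The alternative mentioned above, encoding the ordering in the value returned by the key function, can be sketched on plain Scala collections. A `Seq` stands in for the RDD here, `Person` and `parse` follow the test from this thread, and the object name is made up for illustration:

```scala
object KeyEncodesOrderingSketch {
  case class Person(first: String, last: String, age: Int)

  def parse(s: String): Person = {
    val split = s.split("\\|")
    Person(split(0), split(1), split(2).toInt)
  }

  val data = Seq("Bob|Smith|50", "Jane|Smith|40", "Thomas|Williams|30", "Karen|Williams|60")

  // The key is (last, first); the standard library already provides an
  // Ordering for tuples whose elements are ordered, so none is passed here.
  val nameOrdered: Seq[String] = data.sortBy { s =>
    val p = parse(s)
    (p.last, p.first)
  }

  // The key is the age; the default Ordering[Int] applies.
  val ageOrdered: Seq[String] = data.sortBy(s => parse(s).age)
}
```

With this style no explicit Ordering argument is needed at the call site, at the cost of not being able to reuse an Ordering defined elsewhere.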




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-04-09 Thread aarondav
Github user aarondav commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-40031655
  
It's likely you'll have to specify the classtag as an implicit parameter of 
its own, by appending the following argument: `(implicit ctag: ClassTag[K])`, 
and remove the context bound in the type signature.
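
A minimal sketch of that shape, assuming both implicits end up in a single parameter list (Scala 2 allows only one implicit list per method). The `sortBy` here is a hypothetical stand-in on a `Seq`, not the actual RDD method, and the object name is illustrative:

```scala
import scala.reflect.{ClassTag, classTag}

object ExplicitImplicitListSketch {
  case class Person(first: String, last: String, age: Int)

  // No context bound: both implicit parameters are named in one list,
  // with the Ordering first (assumed signature, for illustration only).
  def sortBy[T, K](data: Seq[T])(f: T => K, ascending: Boolean = true)(
      implicit ord: Ordering[K], ctag: ClassTag[K]): Seq[T] =
    data.sortBy(f)(if (ascending) ord else ord.reverse)

  def parse(s: String): Person = {
    val split = s.split("\\|")
    Person(split(0), split(1), split(2).toInt)
  }

  object NameOrdering extends Ordering[Person] {
    // last name, then first name
    def compare(a: Person, b: Person): Int =
      Ordering[(String, String)].compare((a.last, a.first), (b.last, b.first))
  }

  val data = Seq("Bob|Smith|50", "Jane|Smith|40", "Thomas|Williams|30", "Karen|Williams|60")

  // When the implicit list is given explicitly, every parameter in it must be supplied.
  val nameOrdered: Seq[String] =
    sortBy(data)(parse, ascending = true)(NameOrdering, classTag[Person])
}
```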




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-04-09 Thread ash211
Github user ash211 commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-40031343
  
How would you write the signature with the implicit Ordering parameter as 
well?

I tried the definition and test below:

```
  def sortBy[K: ClassTag](
      f: (T) ⇒ K,
      ascending: Boolean = true,
      numPartitions: Int = this.partitions.size)
      (implicit ord: Ordering[K]): RDD[T] =
    this.keyBy[K](f)
        .sortByKey(ascending, numPartitions)
        .values
```

```
  test("sortByKey with explicit ordering") {
    val data = sc.parallelize(Seq("Bob|Smith|50", "Jane|Smith|40",
      "Thomas|Williams|30", "Karen|Williams|60"))

    val ageOrdered = Array("Thomas|Williams|30", "Jane|Smith|40",
      "Bob|Smith|50", "Karen|Williams|60")
    // last name, then first name
    val nameOrdered = Array("Bob|Smith|50", "Jane|Smith|40",
      "Karen|Williams|60", "Thomas|Williams|30")

    case class Person(first: String, last: String, age: Int)

    def parse(s: String): Person = {
      val split = s.split("\\|")
      Person(split(0), split(1), split(2).toInt)
    }

    object AgeOrdering extends Ordering[Person] {
      def compare(a:Person, b:Person) = a.age compare b.age
    }

    object NameOrdering extends Ordering[Person] {
      def compare(a:Person, b:Person) =
        implicitly[Ordering[Tuple2[String,String]]].compare((a.last, a.first), (b.last, b.first))
    }

    assert(data.sortBy(parse, false, 2)(AgeOrdering) === ageOrdered)
    assert(data.sortBy(parse, false, 2)(NameOrdering) === nameOrdered)
  }
```

But I got an error indicating that I'd have to pass in the ClassTag explicitly as well:

```
[error] /Users/aash/git/spark/core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala:555: not enough arguments for method sortBy: (implicit evidence$5: scala.reflect.ClassTag[Person], implicit ord: Ordering[Person])org.apache.spark.rdd.RDD[String].
[error] Unspecified value parameter ord.
[error] assert(data.sortBy(parse, false, 2)(AgeOrdering) === ageOrdered)
[error]^
[error] /Users/aash/git/spark/core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala:556: not enough arguments for method sortBy: (implicit evidence$5: scala.reflect.ClassTag[Person], implicit ord: Ordering[Person])org.apache.spark.rdd.RDD[String].
[error] Unspecified value parameter ord.
[error] assert(data.sortBy(parse, false, 2)(NameOrdering) === nameOrdered)
[error]^
[error] two errors found
[error] (core/test:compile) Compilation failed
[error] Total time: 9 s, completed Apr 10, 2014 1:13:27 AM
aash@aash-mbp ~/git/spark$
```

Is there a way to order the implicit parameters so that the Ordering comes 
before the ClassTag?






[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-40031354
  
Merged build started. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-40031345
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-04-09 Thread rxin
Github user rxin commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-40010754
  
Through this pull request, I just realized how inefficient the sorting code
was (one that takes Ordered). I'm glad you are using Ordering instead. We
should change the sortByKey implementation ...

For the test code, do you mind removing the extra spaces you used to align
=s?

And can we take an implicit Ordering like the way Scala lib does? This way,
the user can also change the ordering.
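
For reference, the Scala collections `sortBy` takes its `Ordering` as an implicit parameter, so callers get the default ordering for free and can override it when needed. A small sketch follows; the `Person` class, the sample data, and the object name are made up for illustration:

```scala
object StdlibSortBySketch {
  case class Person(name: String, age: Int)

  val people = List(Person("Bob", 50), Person("Jane", 40), Person("Thomas", 30))

  // Default: the implicit Ordering[Int] is picked up automatically.
  val youngestFirst: List[String] = people.sortBy(_.age).map(_.name)

  // Override: pass a different Ordering in the implicit parameter list.
  val oldestFirst: List[String] = people.sortBy(_.age)(Ordering[Int].reverse).map(_.name)
}
```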






[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-39976773
  
All automated tests passed.
Refer to this link for build results: 
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13950/




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-39976772
  
Merged build finished. All automated tests passed.




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-39971959
  
Merged build started. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-04-09 Thread AmplabJenkins
Github user AmplabJenkins commented on the pull request:

https://github.com/apache/spark/pull/369#issuecomment-39971934
  
 Merged build triggered. 




[GitHub] spark pull request: SPARK-1063 Add .sortBy(f) method on RDD

2014-04-09 Thread ash211
GitHub user ash211 opened a pull request:

https://github.com/apache/spark/pull/369

SPARK-1063 Add .sortBy(f) method on RDD

This never got merged from the apache/incubator-spark repo (which is now 
deleted) but there had been several rounds of code review on this PR there.

I think this is ready for merging.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/ash211/spark sortby

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/spark/pull/369.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #369


commit ca4490da536578ef4650039b099db8dafc9d6b66
Author: Andrew Ash 
Date:   2014-01-24T08:26:55Z

Add .sortBy(f) method on RDD

commit 0f685fd17584061d4b18419234bedb79843a0813
Author: Andrew Ash 
Date:   2014-02-14T06:11:27Z

Merge remote-tracking branch 'origin/master' into sortby

Conflicts:
core/src/main/scala/org/apache/spark/rdd/RDD.scala
core/src/test/scala/org/apache/spark/rdd/RDDSuite.scala

commit 7db3e849c5a9e4a3189ea594e349835cef6d307e
Author: Andrew Ash 
Date:   2014-02-14T06:27:06Z

Support ascending and numPartitions params in sortBy()

commit 381eef23f59a44b0555de9bb63fc8e598595ef32
Author: Andrew Ash 
Date:   2014-02-14T06:32:19Z

Correct silly typo

commit 8c53298cfeebcba7e08ef8c586816e7513daf11b
Author: Andrew Ash 
Date:   2014-02-25T00:19:02Z

Actually use ascending and numPartitions parameters



