Github user reggert commented on the issue:

    https://github.com/apache/spark/pull/15899
  
    I don't get why you say that it "doesn't even work in general". Under what 
circumstances doesn't it work? I've never run into any problems with it. 
    
    The "simple syntactic sugar" allows very clear, concise code to be written 
in many cases, and even lets you take advantage of Scala pattern matching for 
filtering. For example:
    ```scala
    val strings = sparkContext.parallelize(List("1213,999", "abc", "456,789"))
    val NumberPairString = """(\d{1,5}),(\d{1,5})""".r
    val numbers = for (NumberPairString(a, b) <- strings; n <- Seq(a, b)) yield 
n.toInt
    // numbers.collect() yields Array[Int](1213, 999, 456, 789)
    ```
    
    Without the `for` comprehension, you wind up with a significantly uglier
and somewhat confusing chain of calls:
    
    ```scala
    val strings = sparkContext.parallelize(List("1213,999", "abc", "456,789"))
    val NumberPairString = """(\d{1,5}),(\d{1,5})""".r
    val numbers = strings.filter {
      case NumberPairString(_, _) => true
      case _ => false
    }.flatMap {
      case NumberPairString(a, b) => Seq(a, b)
    }.map(_.toInt)
    // numbers.collect() yields Array[Int](1213, 999, 456, 789)
    ```
    There are alternate ways to write this (e.g., with a single `flatMap` on a 
pattern match function that returns either 0 or 2 elements), but none of them 
are as clean and concise as the `for` version.
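
    For reference, the single-`flatMap` alternative mentioned above might look
something like this (a sketch, reusing `strings` and `NumberPairString` from
the examples above; the partial match is made total with a `case _` that
returns an empty `Seq`):

    ```scala
    val strings = sparkContext.parallelize(List("1213,999", "abc", "456,789"))
    val NumberPairString = """(\d{1,5}),(\d{1,5})""".r
    // One pass: non-matching strings contribute 0 elements, matches contribute 2.
    val numbers = strings.flatMap {
      case NumberPairString(a, b) => Seq(a.toInt, b.toInt)
      case _                      => Seq.empty[Int]
    }
    ```

    This avoids the separate `filter` step, but the intent is arguably less
obvious at a glance than the `for` version.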

