[GitHub] spark issue #14256: [SPARK-16620][CORE] Add back the tokenization process in...

srowen Tue, 19 Jul 2016 01:05:47 -0700

Github user srowen commented on the issue:

    https://github.com/apache/spark/pull/14256
  
    SPARK-16613 is different, I believe.
    
    You reported a `StackOverflowError`, and indeed the existing `pipe` methods 
just call themselves; I can't figure out why. It happened in 
https://github.com/apache/spark/commit/279bd4aa5fddbabdb0383a3f6f0fc8d91780e092 
and unless I'm totally missing something, that's just a small but bad error. They 
need to call the main `pipe` overload.
    
    The cleanup to `PipedRDD` constructors also lost the `tokenize` call. These 
simpler `pipe` overloads do need to invoke it.
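
    A minimal sketch of the shape I mean, not the actual Spark code: 
`PipeSketch` and its signatures are hypothetical stand-ins, and the 
commented-out line is only an illustration of how an overload can end up 
calling itself. The simpler overloads tokenize the command string and then 
delegate to the main overload.

```scala
import java.util.StringTokenizer
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for illustration; not Spark's RDD or PipedRDD.
object PipeSketch {
  // Split a command string into words using StringTokenizer's default delimiters.
  def tokenize(command: String): Seq[String] = {
    val buf = new ArrayBuffer[String]
    val tok = new StringTokenizer(command)
    while (tok.hasMoreTokens) buf += tok.nextToken()
    buf.toSeq
  }

  // "Main" overload: takes the already-tokenized command.
  // Here it just returns the tokens so the delegation is easy to observe.
  def pipe(command: Seq[String], env: Map[String, String]): Seq[String] = command

  // Illustrative broken shape: the argument is still a String, so the call
  // resolves to this same method and recurses until StackOverflowError.
  // def pipe(command: String): Seq[String] = pipe(command)

  // Intended shape: tokenize first, then call the main overload.
  def pipe(command: String): Seq[String] =
    pipe(tokenize(command), Map.empty[String, String])

  def pipe(command: String, env: Map[String, String]): Seq[String] =
    pipe(tokenize(command), env)
}
```

    With this shape, `PipeSketch.pipe("grep -i foo")` reaches the main overload 
with three separate tokens rather than one string.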
    
    This is certainly my fault as I was reviewing and suggested some cleanup 
that ultimately led to losing this functionality.
    
    (Also, I don't really like using `StringTokenizer` instead of just splitting 
on whitespace, but that's maybe not the thing to deal with now.)
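
    For reference, the two approaches are not quite interchangeable. The 
sketch below is only an illustration (the object and method names are made 
up): a plain regex split keeps an empty leading element when the string 
starts with whitespace and returns one empty element for an empty string, 
while `StringTokenizer` returns no empty tokens in either case.

```scala
import java.util.StringTokenizer

// Hypothetical helper for comparison only; not part of Spark.
object TokenizeComparison {
  // StringTokenizer with its default delimiters (space, tab, newline, CR, form feed).
  def viaTokenizer(s: String): List[String] = {
    val tok = new StringTokenizer(s)
    var out = List.empty[String]
    while (tok.hasMoreTokens) out = tok.nextToken() :: out
    out.reverse
  }

  // Plain regex split on runs of whitespace, with no trimming.
  def viaSplit(s: String): List[String] = s.split("\\s+").toList

  def main(args: Array[String]): Unit = {
    val cmd = "  grep  -i foo"
    println(viaTokenizer(cmd)) // three tokens: grep, -i, foo
    println(viaSplit(cmd))     // four elements: an empty string, then grep, -i, foo
  }
}
```

    So a switch to splitting would need at least a `trim` and an empty-string 
check to keep the current behavior.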


