Github user srowen commented on a diff in the pull request:
https://github.com/apache/spark/pull/12309#discussion_r59609962
--- Diff: core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala ---
@@ -144,7 +142,8 @@ private[spark] class PipedRDD[T: ClassTag](
new Thread(s"stdin writer for $command") {
override def run(): Unit = {
TaskContext.setTaskContext(context)
- val out = new PrintWriter(proc.getOutputStream)
+ val out = new PrintWriter(new BufferedWriter(
--- End diff --
Buffering here is probably a decent idea, with a small buffer. Is it even
necessary to make it configurable? 8K is pretty standard; you've found that a
larger buffer (32K?) is better. Would you ever want to turn it off, or make it
much larger than that? I ask because making it configurable means changing a
public API, and that's going to require additional steps.
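Roughly what I have in mind, as an illustrative sketch only (the fixed 8 KB
size and the `cat`/`proc` stand-ins are examples, not the actual patch):

    import java.io.{BufferedWriter, OutputStreamWriter, PrintWriter}

    // Stand-in for the process PipedRDD launches; "cat" is just an example command.
    val proc = new ProcessBuilder("cat").start()

    // Buffer the child's stdin with a fixed 8 KB buffer; no new public API needed.
    val out = new PrintWriter(new BufferedWriter(
      new OutputStreamWriter(proc.getOutputStream), 8192))

    out.println("hello")
    out.close()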
Separately, this needs to specify UTF-8 encoding. Actually, we have the same
problem in the stderr and stdout readers above: they rely on the platform
encoding. I can sort of see an argument that using the platform encoding makes
sense when dealing with platform binaries, but there's still no particular
reason to expect the JVM default to match whatever encoding a given binary is
using.
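As a sketch of the encoding side, again with illustrative stand-ins rather
than the actual PipedRDD code, both the writer and the readers can be pinned
to UTF-8 explicitly:

    import java.io.{BufferedWriter, OutputStreamWriter, PrintWriter}
    import java.nio.charset.StandardCharsets
    import scala.io.{Codec, Source}

    val proc = new ProcessBuilder("cat").start()

    // stdin writer pinned to UTF-8 instead of the platform default
    val out = new PrintWriter(new BufferedWriter(
      new OutputStreamWriter(proc.getOutputStream, StandardCharsets.UTF_8)))

    // stdout and stderr readers pinned to UTF-8 as well
    val stdoutLines = Source.fromInputStream(proc.getInputStream)(Codec.UTF8).getLines()
    val stderrLines = Source.fromInputStream(proc.getErrorStream)(Codec.UTF8).getLines()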