Github user srowen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/12309#discussion_r60207018
  
    --- Diff: core/src/main/scala/org/apache/spark/rdd/PipedRDD.scala ---
    @@ -144,7 +142,8 @@ private[spark] class PipedRDD[T: ClassTag](
         new Thread(s"stdin writer for $command") {
           override def run(): Unit = {
             TaskContext.setTaskContext(context)
    -        val out = new PrintWriter(proc.getOutputStream)
    +        val out = new PrintWriter(new BufferedWriter(
    --- End diff ---
    
    Hm, is it crazy to just set a large buffer size? 1M isn't that much, since 
an executor isn't generally going to run lots of processes. I was thinking 
some conf parameter might be nicer than an API param, but I suppose the right 
buffer size depends on what you're doing.
    
    Maybe an API method argument isn't so bad. I was going to say that this needs 
to be plumbed through to the Java API, but its `pipe` method already lacks most 
of the args of the core Scala version. Maybe it's sensible to do this for 
2.0.0 and fix up the Java API then? (The Python API works entirely differently 
here anyway.)
    
    @tejasapatil and erm, @andrewor14 or @tgravescs any particular opinions on 
adding a bufferSize arg to the pipe method?
    
    Yes, if there's an encoding issue, it's not introduced by this change, of 
course. In general we never want to depend on the platform encoding. I wonder 
... if this is one exception, since it's communicating with platform binaries. 
It's still not so great, but on second thought maybe that can be left alone for 
now.
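
    For reference, a minimal JVM sketch of what the buffered stdin writer with a 
configurable buffer size could look like. The helper name `stdinWriter` and the 
1 MiB default are hypothetical, not the actual patch, and the explicit UTF-8 
charset reflects the encoding aside above rather than the platform-default 
behavior of the code in the diff:

    ```java
    import java.io.BufferedWriter;
    import java.io.ByteArrayOutputStream;
    import java.io.OutputStream;
    import java.io.OutputStreamWriter;
    import java.io.PrintWriter;
    import java.nio.charset.StandardCharsets;

    public class PipeBufferSketch {
        // Hypothetical helper mirroring the idea under discussion: wrap the
        // process's stdin in a BufferedWriter with a caller-chosen buffer size,
        // and use an explicit charset instead of the platform default.
        static PrintWriter stdinWriter(OutputStream stdin, int bufferSize) {
            return new PrintWriter(new BufferedWriter(
                new OutputStreamWriter(stdin, StandardCharsets.UTF_8),
                bufferSize));
        }

        public static void main(String[] args) throws Exception {
            // Stand-in for proc.getOutputStream(), so the sketch is runnable
            // without launching a real subprocess.
            ByteArrayOutputStream sink = new ByteArrayOutputStream();
            try (PrintWriter out = stdinWriter(sink, 1 << 20)) { // 1 MiB buffer
                out.println("hello");
            } // close() flushes the buffer through to the sink
            System.out.print(sink.toString("UTF-8"));
        }
    }
    ```

    Whether the buffer size comes from an API argument or a conf parameter, the 
wrapping itself is the same; only where `bufferSize` is sourced from changes.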


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---
