Github user marmbrus commented on a diff in the pull request:

    https://github.com/apache/spark/pull/6882#discussion_r33111047
  
    --- Diff: core/src/main/scala/org/apache/spark/util/Utils.scala ---
    @@ -2333,3 +2333,36 @@ private[spark] class RedirectThread(
         }
       }
     }
    +
    +/**
    + * An [[OutputStream]] that will store the last 10 kilobytes (by default) 
written to it
    + * in a circular buffer. The current contents of the buffer can be 
accessed using
    + * the toString method.
    + */
    +private[spark] class CircularBuffer(sizeInByte: Int = 10240) extends 
java.io.OutputStream {
    +  var pos: Int = 0
    +  var buffer = new Array[Int](sizeInByte / 4)
    --- End diff --
    
    Okay, that is correct but the class comments do not line up with what the 
implementation does anymore.  The contract of an `OutputStream` is that each 
time you call `write()` it takes in a byte value (a number between 0-255).  The 
24 high-order bits of the value are ignored.  The JVM represents this as an 
`Int` only to mirror the InputStream, which needs to distinguish -1 from 255.
    
    Those details aside, if you tell me that you are going to store `the last 
10 kilobytes`, I would expect that you are going to store the input I gave you 
as a result of the last 10240 invocations of the write function call.  I don't 
care how you are actually representing the data internally.  As such, it seems 
that you are lying to the user and only storing a 25% of bytes that you 
promised to.


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to