[GitHub] spark pull request: Added a FastByteArrayOutputStream that exposes...

srowen Sat, 12 Apr 2014 05:31:26 -0700

Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/397#issuecomment-40279192
  
    You could deprecate `toByteArray` and override it to throw an exception, 
etc., to be extra-safe. The inherited methods "work"; the result just may not 
mean much on its own. Your class still has methods like `close()` either way. 
Dunno, it still seems simpler than the duplication.
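
    As a rough sketch of that suggestion, assuming the goal is a subclass of 
`java.io.ByteArrayOutputStream` that hands out its internal buffer without 
copying: the class name `ExposedByteArrayOutputStream` and the method 
`toByteBuffer()` are made up for illustration and are not taken from the PR.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

// Hypothetical class name; not the class proposed in the PR.
class ExposedByteArrayOutputStream extends ByteArrayOutputStream {

    // Wraps the inherited protected fields `buf` and `count` without copying.
    // The view is only meaningful until the next write, which may reallocate `buf`.
    synchronized ByteBuffer toByteBuffer() {
        return ByteBuffer.wrap(buf, 0, count);
    }

    /** @deprecated copies the whole buffer; use toByteBuffer() instead. */
    @Deprecated
    @Override
    public synchronized byte[] toByteArray() {
        throw new UnsupportedOperationException("use toByteBuffer() to avoid a copy");
    }
}
```

    Everything else (`write`, `size()`, `close()`) is inherited unchanged, 
which is the point of subclassing rather than duplicating the class.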
    
    What's the compaction for? If you've got a series of ~2GB containers, I'd 
assume you'd fill them each pretty completely and transparently split a big 
write across the existing and next buffer. It saves a huge allocation, which 
could fail.
    
    (In the grow() method, you would have to check that the new doubled size 
hasn't overflowed!)
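
    One way to write that check, as a sketch only: `GrowthMath` and 
`grownCapacity` are hypothetical names, and the `Integer.MAX_VALUE - 8` cap is 
an assumption borrowed from the conservative limit used in JDK collection 
classes, not something from the PR.

```java
// Hypothetical helper showing an overflow-safe capacity doubling.
class GrowthMath {
    // Assumed conservative cap; some JVMs cannot allocate arrays of
    // exactly Integer.MAX_VALUE elements.
    static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;

    static int grownCapacity(int oldCapacity, int minCapacity) {
        // A negative minCapacity means the caller's count + len overflowed.
        if (minCapacity < 0) {
            throw new OutOfMemoryError("required capacity overflows int");
        }
        // Double in long arithmetic so the intermediate value cannot wrap.
        long doubled = 2L * oldCapacity;
        long target = Math.max(doubled, (long) minCapacity);
        if (target > MAX_ARRAY_SIZE) {
            if (minCapacity > MAX_ARRAY_SIZE) {
                throw new OutOfMemoryError("required capacity exceeds array limit");
            }
            target = MAX_ARRAY_SIZE;
        }
        return (int) target;
    }
}
```

    With plain `int` arithmetic, doubling any capacity of 2^30 or more wraps 
to a negative number, which is the case the check has to catch.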
    
    I agree with the use of `ByteBuffer`, but I suppose I'm pointing out that 
it has to be used in several other places in the code that use `byte[]` right 
now in order to get the benefit. I understand that wasn't the direct purpose 
of the code you're working on, but it is the purpose of this PR, I think. In 
which case, it's perhaps better to leverage your direction.
    
    A simpler step in your direction could be the basis for the change that 
this PR is trying for. That's why I wonder if this piece could have a simpler, 
stand-alone purpose.



