Github user srowen commented on the pull request:

    https://github.com/apache/spark/pull/215#issuecomment-38824292
  
    @pwendell @velvia Yes almost done now. A few preliminary notes, which might 
affect all your opinions on this:
    
    - The uses of `FastBuffered{Input,Output}Stream` look safe to replace with 
`Buffered{Input,Output}Stream` because none of their special methods are used, 
and any performance differences look trivial for purposes here
    - `FastByteArrayInputStream`'s difference seems to be just different 
behavior in case of a 0-byte read at EOF. The docs say the difference is a fix; 
to me the required API is kinda ambiguous and this is unlikely to be an issue 
anyway.
    - Use of 32-bit MurmurHash3 is easily replaced with Guava
    - Use of `FastByteArrayOutputStream` is a little more contentious, since it 
lets you access the raw byte array rather than copy it. However, in 2 of 3 
uses, the byte array is copied anyway due to a call to `trim()`. Still there is 
1 use where it will make a difference.
    - It's functionally easy to replace uses of the maps with 
`mutable.Map[T,Long]`. They may be slower. There's one case in `examples` where 
it looks like a lot of care was taken to use this efficiently. This can be 
handled a bit specially. It's only in examples.
    
    Keep in mind it's also possible to write some shading exclusion rules to 
trim out parts of fastutil that aren't needed for sure. That's another way to 
go.
    
    Once I get it finished and tests pass I'll open a PR just for a look at 
what it concretely looks like.
    
    _Other small note: the project seems clearly labeled as AL2.0 licensed, but 
lots of the source I'm looking through is labeled LGPL 2.1. I think this must 
be an oversight but gave me pause._


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to