Re: [protobuf] using compression on protobuf messages

2010-06-22 Thread Evan Jones

On Jun 22, 2010, at 13:54 , sheila miguez wrote:

When I have a message to compress, I know the size of the byte array
stream buffer to allocate. Then call the writeTo on it. Is there
anything I should do other than this, given a message? writeTo should
be pretty performant, yes? In unit test, when measuring the speed that
takes, it is pretty good.


I don't quite understand what you are doing. Are you allocating a  
ByteArrayOutputStream, writing the message to it, then passing the  
byte[] from the ByteArrayOutputStream to some LZO library? You could  
just call message.toByteArray() if that is what you want, which will  
be faster.


I haven't tested this carefully, but my experience is that if you want  
the absolute best performance while using the Java API:


* If you are writing to an OutputStream, you want to re-use a single
CodedOutputStream. It has an internal buffer, and allocating this
buffer multiple times seems to slow things down. You probably want
this option if you are writing many messages. It's typically pretty
easy to provide your own implementation of OutputStream if you need to
pass data to something else (e.g. LZO).


* If you have a byte[] array that is big enough, pass it in to  
CodedOutputStream.newInstance() to avoid an extra copy.


* If you just want a byte[] array that is the exact right size, just
call message.toByteArray().
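
A rough sketch of the two byte[] options above, assuming a recent
protobuf-java (3.x) on the classpath. The class and method names
(ByteArrayOptions, writeInto, exactCopy) are made up for illustration;
only the CodedOutputStream and MessageLite calls come from the library.

```java
import com.google.protobuf.CodedOutputStream;
import com.google.protobuf.MessageLite;
import java.io.IOException;

public class ByteArrayOptions {
    // Serialize into a caller-supplied buffer that is already big enough,
    // avoiding the extra allocation and copy. Returns the bytes written.
    public static int writeInto(MessageLite message, byte[] buffer) throws IOException {
        int size = message.getSerializedSize();
        if (size > buffer.length) {
            throw new IllegalArgumentException("buffer too small: need " + size);
        }
        // Wrap exactly the region we intend to fill.
        CodedOutputStream coded = CodedOutputStream.newInstance(buffer, 0, size);
        message.writeTo(coded);
        // Throws if the message did not fill the region exactly.
        coded.checkNoSpaceLeft();
        return size;
    }

    // Let the library allocate a byte[] of exactly the right size.
    public static byte[] exactCopy(MessageLite message) {
        return message.toByteArray();
    }
}
```

Whether the first variant is measurably faster depends on message size
and how often the buffer can be re-used, so it is worth benchmarking
before adopting it.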



Does the LZO library have an OutputStream API? This would allow you to  
compress large protobuf messages as they are written out, rather than  
needing to serialize the entire thing to a byte[] array, then compress  
it. This could be better, but as always you'll have to measure it.
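
To illustrate the streaming idea, here is a minimal sketch that uses
only the JDK. GZIPOutputStream stands in for an LZO stream (it is not
LZO), and StreamWriter is a hypothetical interface with the same shape
as the message's writeTo(OutputStream) method, so the example runs
without protobuf.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.util.zip.GZIPOutputStream;

public class StreamingCompress {
    // Hypothetical stand-in: same shape as MessageLite.writeTo(OutputStream).
    public interface StreamWriter {
        void writeTo(OutputStream out) throws IOException;
    }

    // Compress whatever the source writes, as it is written, without first
    // materializing the uncompressed bytes in a separate byte[].
    public static byte[] compress(StreamWriter source) throws IOException {
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        // Closing the compressing stream finishes the gzip trailer.
        try (OutputStream compressed = new GZIPOutputStream(sink)) {
            source.writeTo(compressed);
        }
        return sink.toByteArray();
    }
}
```

With protobuf on the classpath, the call would look something like
StreamingCompress.compress(message::writeTo), and the sink could be any
OutputStream (a socket, a file, a servlet response) rather than a
ByteArrayOutputStream.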


Hope this helps,

Evan

--
Evan Jones
http://evanjones.ca/

--
You received this message because you are subscribed to the Google Groups Protocol 
Buffers group.
To post to this group, send email to proto...@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.



Re: [protobuf] using compression on protobuf messages

2010-06-22 Thread sheila miguez
On Tue, Jun 22, 2010 at 1:28 PM, Evan Jones ev...@mit.edu wrote:
 On Jun 22, 2010, at 13:54 , sheila miguez wrote:

 When I have a message to compress, I know the size of the byte array
 stream buffer to allocate. Then call the writeTo on it. Is there
 anything I should do other than this, given a message? writeTo should
 be pretty performant, yes? In unit test, when measuring the speed that
 takes, it is pretty good.

 I don't quite understand what you are doing. Are you allocating a
 ByteArrayOutputStream, writing the message to it, then passing the byte[]
 from the ByteArrayOutputStream to some LZO library? You could just call
 message.toByteArray() if that is what you want, which will be faster.

I've got a servlet filter which wraps the HttpServletResponse. So, the
servlet response's output stream, which is wrapped in a stream from
the lzo library, is compressing data as it is getting written to.

 I haven't tested this carefully, but my experience is that if you want the
 absolute best performance while using the Java API:

 [helpful info]

 Does the LZO library have an OutputStream API? This would allow you to

Yes.

For the curious, I'm using code from http://github.com/kevinweil/hadoop-lzo

 Hope this helps,

 Evan

Thank you


-- 
sheila




Re: [protobuf] using compression on protobuf messages

2010-06-22 Thread Evan Jones

On Jun 22, 2010, at 15:35 , sheila miguez wrote:

I've got a servlet filter which wraps the HttpServletResponse. So, the
servlet response's output stream, which is wrapped in a stream from
the lzo library, is compressing data as it is getting written to.


Ah, so the best case is probably message.writeTo(servletOutputStream).
If you are writing multiple messages, you'll probably want to
explicitly create a single CodedOutputStream to write all of them.
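
Something along these lines, as a sketch only. It assumes
protobuf-java 3.x (writeUInt32NoTag is the 3.x name for writing a raw
varint). The varint length prefix before each message is my addition
so that a reader can tell where one message ends; the class name
MultiMessageWriter is made up.

```java
import com.google.protobuf.CodedOutputStream;
import com.google.protobuf.MessageLite;
import java.io.IOException;
import java.io.OutputStream;

public class MultiMessageWriter {
    // Write every message through ONE CodedOutputStream so its internal
    // buffer is allocated once rather than once per message.
    public static void writeAll(Iterable<? extends MessageLite> messages,
                                OutputStream out) throws IOException {
        CodedOutputStream coded = CodedOutputStream.newInstance(out);
        for (MessageLite message : messages) {
            // Assumption: length-prefix each message so the stream can be split
            // again on the reading side (same framing as writeDelimitedTo).
            coded.writeUInt32NoTag(message.getSerializedSize());
            message.writeTo(coded);
        }
        // Push any bytes still in the internal buffer to the underlying stream.
        coded.flush();
    }
}
```

Here out would be the servlet response's (LZO-wrapped) output stream.
Because the framing matches writeDelimitedTo, the receiving side should
be able to read the messages back with parseDelimitedFrom.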


If you experiment with this and find something different, I would be  
interested to know.


Evan

--
Evan Jones
http://evanjones.ca/
