Re: [protobuf] Re: How can I reset a FileInputStream?

Kenton Varda Fri, 29 Jan 2010 16:16:15 -0800

We can't add a new virtual method to the ZeroCopyInputStream interface
because it would break existing implementations.


Besides that, it's unclear what the Reset() method means in an abstract
sense.  Yes, you can define it for the particular set of streams that you're
thinking of, but what does it mean in general?

What should ArrayInputStream::Reset() do?  In this case Reset() is
nonsensical.

What should IstreamInputStream::Reset() do?  Should it only discard its own
buffer, or should it also reset the underlying istream?  If that istream is
itself wrapping a file descriptor, and you're trying to seek that file
descriptor directly, then you need to reset the istream.  But maybe the user
is actually calling IstreamInputStream::Reset() because they have seeked the
istream itself and what IstreamInputStream to acknowledge this.  Who knows?
 But you can't say that Reset() is only propagated down the stack by *some*
implementations and not others.

No, we won't be adding a Reset() method because the meaning is unclear.

Meanwhile, you seem to have made an argument against
FileInputStream::Seek():  Any streams layered on top of it will be broken if
you Seek() the stream under them.  So you have to have some way to reset
those streams, and the problem starts again!

Please just don't add anything new.  If you are unhappy with what
ZeroCopy{Input,Output}Stream provide, you can always just create your own
stream framework to use.  All you have to do is write a ZeroCopyStream
wrapper around your own interface and protocol buffers will then be able to
use it.  Presumably you'd only construct the wrapper object immediately
before parsing/serializing and destroy it immediately after, so you could
allocate it on the stack and avoid malloc overhead altogether.  This way you
can do whatever you want with your streams.

On Fri, Jan 29, 2010 at 9:39 AM, Jacob Rief <jacob.r...@gmail.com> wrote:

> Hello Kenton,
> here I added a small testing program, to show the difference together
> with the Seek()-patch.
> The benchmarks:
> duration with Seek: 00:00:01.285103
> duration conventional: 00:00:01.544068
>
> duration with Seek: 00:00:01.295928
> duration conventional: 00:00:01.565688
>
> duration with Seek: 00:00:01.292432
> duration conventional: 00:00:01.563996
>
> duration with Seek: 00:00:01.292950
> duration conventional: 00:00:01.619057
>
> duration with Seek: 00:00:01.302095
> duration conventional: 00:00:01.569414
>
> duration with Seek: 00:00:01.291782
> duration conventional: 00:00:01.549302
>
> duration with Seek: 00:00:01.289552
> duration conventional: 00:00:01.568220
>
> as I told in my last email, I did not overload operator new.
> Everything I use is an out-of-the-box Fedora 11 with their default
> libraries: gcc-c++-4.4.1-2.fc11.x86_64, libstdc++-4.4.1-2.fc11.x86_64
> and glibc-2.10.1-5.x86_64.
> Sure, at Google the default allocator might be tcmalloc, but this adds
> another layer of complexity to the building step. IMO releasing and
> reallocating three or more blocks of memory always is less efficient
> than just resetting the internal state of an object, regardless which
> allocator you use.
>
> I thought about a possible FileIn/OutputStream::Seek function and
> after some rethinking I have to admit, that I don't really like that
> idea. Let me explain why:
> Assume you have an AESEncryptionInputStream stacked upon a
> CompressionInputStream stacked upon a FileInputStream. Each stream
> knows about its lower level which is a ZeroCopyInputStream. If for
> some reason, you have to change the session key (which for instance is
> encrypted using RSA) in the AESEncryptionInputStream, you must reset
> all the internal buffers to a clean state. Then however,
> ZeroCopyInputStream needs a virtual Reset() function so that each
> stacked Stream can call Reset() on its lower level. This would allow
> to flexibly stack different kinds of compatible streams over each
> other. Without such a Reset() function the caller has to keep track
> about deallocating and reallocating all the streams, which certainly
> makes that code much harder to understand than just using one single
> Reset() statement which does everything for the caller.
>
> BTW: As you know, I wrote a streaming class using the LZMAlgorithm.
> This class has a better compression ratio than GzipIn/OutputStream and
> another big advantage: The LzipInputStream'ing class is able to
> resynchronize to the beginning of any compression member inside an
> LZMA-stream, so one can seek to any position of the underlying file
> and still read data. In order to achieve this, I had to convince
> Antonio Diaz, author of the LZMA library, to add a reset function to
> its decoder. Otherwise it would have been much more complicate.
> Thus it would be really nice to also have a Reset() function on the
> "other side" of the interface.
>
> Regards, Jacob
>
>
> 2010/1/28 Kenton Varda <ken...@google.com>:
> > Your new results seem to indicate that the extra malloc/free can cost
> > between 300ns and 1500ns.  At the low end, the overhead is negligible and
> > not worth complicating the interface.  At the high end, your results seem
> > hard to believe, unless your memory allocator is extremely poor.  And
> even
> > if that is the case, why the variability?  Note that the memory allocator
> is
> > part of libc, not GCC, so it's the libc version that matters.
> > I wanted the source code so that I could run the test using tcmalloc
> > instead.  Feel free to try this yourself:
> > http://code.google.com/p/google-perftools/
> > Even if we did add the Seek() method, using tcmalloc is probably a good
> idea
> > for any C++ app that cares about performance.
> > On Thu, Jan 21, 2010 at 3:14 PM, Jacob Rief <jacob.r...@gmail.com>
> wrote:
>
> --
> You received this message because you are subscribed to the Google Groups
> "Protocol Buffers" group.
> To post to this group, send email to proto...@googlegroups.com.
> To unsubscribe from this group, send email to
> protobuf+unsubscr...@googlegroups.com<protobuf%2bunsubscr...@googlegroups.com>
> .
> For more options, visit this group at
> http://groups.google.com/group/protobuf?hl=en.
>
>

-- 
You received this message because you are subscribed to the Google Groups 
"Protocol Buffers" group.
To post to this group, send email to proto...@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.

Re: [protobuf] Re: How can I reset a FileInputStream?

Reply via email to