Re: [protobuf] Re: How can I reset a FileInputStream?

2010-02-01 Thread Kenton Varda
On Sat, Jan 30, 2010 at 3:03 PM, Jacob Rief jacob.r...@gmail.com wrote:

  Please just don't add anything new.  If you are unhappy with what
  ZeroCopy{Input,Output}Stream provide, you can always just create your own
  stream framework to use.

 Well, I have to live with that decision. Maybe in the future some
 other people have similar use cases. Maybe in version 3?


I don't think this is that bad.  The ZeroCopyInputStream interface was
really designed to be an adaptor for using other streaming frameworks with
ProtocolBuffers.  It's not meant to be a streaming framework in itself --
that would be outside the scope of the library.  So, I'd rather keep the
interface as minimal as possible, providing only what is strictly necessary
for protocol buffers to operate.  I don't think Seek() or Reset() are
strictly necessary, and since it's not clear to me that either one is
necessarily correct, I'd just rather not get into it.


 Just for curiosity, the protobuf code is really easy to read and to
 understand. The only thing is disliked is the mapping of class names
 to filenames. Is all the code inside Google written that clearly?


Thanks!  I don't know if I can give an unbiased answer to this question,
since I wrote most of the proto2 code, so of course to *me* it is the
clearest code in all of Google.  ;)  I think in general the code here tends
to be pretty high-quality due to our policy of reviewing every change
pre-checkin.  However, there is certainly some ugly code, mostly code which
evolved slowly over time with many different authors.

If you want to find out for yourself, we're hiring!  :)

-- 
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To post to this group, send email to proto...@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.



Re: [protobuf] Re: How can I reset a FileInputStream?

2010-01-30 Thread Jacob Rief
Hello Kenton,

2010/1/30 Kenton Varda ken...@google.com:
 We can't add a new virtual method to the ZeroCopyInputStream interface
 because it would break existing implementations.

but only on a binary level. God's sake you are not Microsoft :)

 Besides that, it's unclear what the Reset() method means in an abstract
 sense.  Yes, you can define it for the particular set of streams that you're
 thinking of, but what does it mean in general?

Put the object into a state equivalent to a state immediately after
construction. The Reset() button should be pressed only when you
know what you are doing - otherwise you will loose valuable data.

 What should ArrayInputStream::Reset() do?  In this case Reset() is
 nonsensical.

the same as its constructor without reallocation, ie.:
void ArrayInputStream::Reset() {
  position_ = 0;
  last_returned_size_ = 0;
}

 What should IstreamInputStream::Reset() do?  Should it only discard its own
 buffer, or should it also reset the underlying istream?  If that istream is

void IstreamInputStream::Reset() {
  impl_.Reset();
}

since impl_ is a CopyingInputStreamAdaptor which itself IS-A
ZeroCopyInputStream its Reset() is just another implementation, ie.

void CopyingInputStreamAdaptor::Reset() {
  position_ = 0;
  buffer_used_ = 0;
  backup_bytes_ = 0;
}

 itself wrapping a file descriptor, and you're trying to seek that file
 descriptor directly, then you need to reset the istream.  But maybe the user
 is actually calling IstreamInputStream::Reset() because they have seeked the
 istream itself and what IstreamInputStream to acknowledge this.  Who knows?
  But you can't say that Reset() is only propagated down the stack by *some*
 implementations and not others.

Since the creator of IstreamInputStream is the owner of the file
descriptor, its his responsibility to seek to whatever location
desired.

 No, we won't be adding a Reset() method because the meaning is unclear.
 Meanwhile, you seem to have made an argument against
 FileInputStream::Seek():  Any streams layered on top of it will be broken if
 you Seek() the stream under them.  So you have to have some way to reset
 those streams, and the problem starts again!

Exactly! Therefore instead of destroying and recreating them, a much
simpler Reset() function would do the job. The design of the streaming
classes is to consider a stream which can move only forward. It was
not designed for moving backwards or random access.

 Please just don't add anything new.  If you are unhappy with what
 ZeroCopy{Input,Output}Stream provide, you can always just create your own
 stream framework to use.

Well, I have to live with that decision. Maybe in the future some
other people have similar use cases. Maybe in version 3?

Just for curiosity, the protobuf code is really easy to read and to
understand. The only thing is disliked is the mapping of class names
to filenames. Is all the code inside Google written that clearly?

Regards, Jacob

-- 
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To post to this group, send email to proto...@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.



Re: [protobuf] Re: How can I reset a FileInputStream?

2010-01-29 Thread Kenton Varda
We can't add a new virtual method to the ZeroCopyInputStream interface
because it would break existing implementations.

Besides that, it's unclear what the Reset() method means in an abstract
sense.  Yes, you can define it for the particular set of streams that you're
thinking of, but what does it mean in general?

What should ArrayInputStream::Reset() do?  In this case Reset() is
nonsensical.

What should IstreamInputStream::Reset() do?  Should it only discard its own
buffer, or should it also reset the underlying istream?  If that istream is
itself wrapping a file descriptor, and you're trying to seek that file
descriptor directly, then you need to reset the istream.  But maybe the user
is actually calling IstreamInputStream::Reset() because they have seeked the
istream itself and what IstreamInputStream to acknowledge this.  Who knows?
 But you can't say that Reset() is only propagated down the stack by *some*
implementations and not others.

No, we won't be adding a Reset() method because the meaning is unclear.

Meanwhile, you seem to have made an argument against
FileInputStream::Seek():  Any streams layered on top of it will be broken if
you Seek() the stream under them.  So you have to have some way to reset
those streams, and the problem starts again!

Please just don't add anything new.  If you are unhappy with what
ZeroCopy{Input,Output}Stream provide, you can always just create your own
stream framework to use.  All you have to do is write a ZeroCopyStream
wrapper around your own interface and protocol buffers will then be able to
use it.  Presumably you'd only construct the wrapper object immediately
before parsing/serializing and destroy it immediately after, so you could
allocate it on the stack and avoid malloc overhead altogether.  This way you
can do whatever you want with your streams.

On Fri, Jan 29, 2010 at 9:39 AM, Jacob Rief jacob.r...@gmail.com wrote:

 Hello Kenton,
 here I added a small testing program, to show the difference together
 with the Seek()-patch.
 The benchmarks:
 duration with Seek: 00:00:01.285103
 duration conventional: 00:00:01.544068

 duration with Seek: 00:00:01.295928
 duration conventional: 00:00:01.565688

 duration with Seek: 00:00:01.292432
 duration conventional: 00:00:01.563996

 duration with Seek: 00:00:01.292950
 duration conventional: 00:00:01.619057

 duration with Seek: 00:00:01.302095
 duration conventional: 00:00:01.569414

 duration with Seek: 00:00:01.291782
 duration conventional: 00:00:01.549302

 duration with Seek: 00:00:01.289552
 duration conventional: 00:00:01.568220

 as I told in my last email, I did not overload operator new.
 Everything I use is an out-of-the-box Fedora 11 with their default
 libraries: gcc-c++-4.4.1-2.fc11.x86_64, libstdc++-4.4.1-2.fc11.x86_64
 and glibc-2.10.1-5.x86_64.
 Sure, at Google the default allocator might be tcmalloc, but this adds
 another layer of complexity to the building step. IMO releasing and
 reallocating three or more blocks of memory always is less efficient
 than just resetting the internal state of an object, regardless which
 allocator you use.

 I thought about a possible FileIn/OutputStream::Seek function and
 after some rethinking I have to admit, that I don't really like that
 idea. Let me explain why:
 Assume you have an AESEncryptionInputStream stacked upon a
 CompressionInputStream stacked upon a FileInputStream. Each stream
 knows about its lower level which is a ZeroCopyInputStream. If for
 some reason, you have to change the session key (which for instance is
 encrypted using RSA) in the AESEncryptionInputStream, you must reset
 all the internal buffers to a clean state. Then however,
 ZeroCopyInputStream needs a virtual Reset() function so that each
 stacked Stream can call Reset() on its lower level. This would allow
 to flexibly stack different kinds of compatible streams over each
 other. Without such a Reset() function the caller has to keep track
 about deallocating and reallocating all the streams, which certainly
 makes that code much harder to understand than just using one single
 Reset() statement which does everything for the caller.

 BTW: As you know, I wrote a streaming class using the LZMAlgorithm.
 This class has a better compression ratio than GzipIn/OutputStream and
 another big advantage: The LzipInputStream'ing class is able to
 resynchronize to the beginning of any compression member inside an
 LZMA-stream, so one can seek to any position of the underlying file
 and still read data. In order to achieve this, I had to convince
 Antonio Diaz, author of the LZMA library, to add a reset function to
 its decoder. Otherwise it would have been much more complicate.
 Thus it would be really nice to also have a Reset() function on the
 other side of the interface.

 Regards, Jacob


 2010/1/28 Kenton Varda ken...@google.com:
  Your new results seem to indicate that the extra malloc/free can cost
  between 300ns and 1500ns.  At the low end, the overhead is negligible 

Re: [protobuf] Re: How can I reset a FileInputStream?

2010-01-27 Thread Kenton Varda
Your new results seem to indicate that the extra malloc/free can cost
between 300ns and 1500ns.  At the low end, the overhead is negligible and
not worth complicating the interface.  At the high end, your results seem
hard to believe, unless your memory allocator is extremely poor.  And even
if that is the case, why the variability?  Note that the memory allocator is
part of libc, not GCC, so it's the libc version that matters.

I wanted the source code so that I could run the test using tcmalloc
instead.  Feel free to try this yourself:

http://code.google.com/p/google-perftools/

Even if we did add the Seek() method, using tcmalloc is probably a good idea
for any C++ app that cares about performance.

On Thu, Jan 21, 2010 at 3:14 PM, Jacob Rief jacob.r...@gmail.com wrote:

 Hello Kenton,
 I added this code to my working project, but it would be too messy to
 send the whole project. I could write a small testing application.
 BTW, I also added the Seek() function to FileOutputStream, a) for
 symmetry reasons and b) because in one occasion I have to reposition
 the file descriptor while writing. Everything is included in this
 patch.

 Regards, Jacob

 2010/1/21 Kenton Varda ken...@google.com:
  Can you send me the complete source file for this benchmark?


-- 
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To post to this group, send email to proto...@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.



[protobuf] Re: How can I reset a FileInputStream?

2010-01-21 Thread Jacob Rief
Hello Kenton,

2010/1/20 Kenton Varda ken...@google.com:
 (1) Normally micro-benchmarks involve running the operation in a loop many
 times so that the total time is closer to 1s or more, not running the
 operation once and trying to time that.  System clocks are not very accurate
 at that scale, and depending on what kind of clock it is, it may actually
 take significantly longer to read the lock than it does not allocate memory.
 (2) Your benchmark does not include the time spent actually reading the
 file, which is what I asserted would be much slower than re-allocating the
 buffer.  Sure, the seek itself is fast but it is pointless without actually
 reading.

now I modified the benchmark, now the code looks like this

boost::posix_time::ptime time0(boost::posix_time::microsec_clock::local_time());
boost::posix_time::ptime time1(boost::posix_time::microsec_clock::local_time());
for (int i = 0; i100; ++i) {
       const void* data;
       int size;
       fileInStream-Seek(offset, whence);
       fileInStream-Next(data, size);
}
boost::posix_time::ptime time2(boost::posix_time::microsec_clock::local_time());
for (int i = 0; i100; ++i) {
       const void* data;
       int size;
       ::lseek64(fileDescriptor, offset, whence);
       fileInStream.reset(new
google::protobuf::io::FileInputStream(fileDescriptor));
       fileInStream-Next(data, size);
}
boost::posix_time::ptime time3(boost::posix_time::microsec_clock::local_time());
std::cerr  t1:    boost::posix_time::time_period(time1,
time2).length()   t2:   boost::posix_time::time_period(time2,
time3).length()  std::endl;

The difference now is less significant, but still measurable:
t1: 00:00:02.068949 t2: 00:00:02.389942
t1: 00:00:02.092842 t2: 00:00:02.429206
t1: 00:00:02.080614 t2: 00:00:02.394708
t1: 00:00:02.094289 t2: 00:00:02.429952
t1: 00:00:02.323403 t2: 00:00:03.723459
t1: 00:00:02.151486 t2: 00:00:03.711809
t1: 00:00:02.084442 t2: 00:00:02.416326
t1: 00:00:02.052930 t2: 00:00:02.383500

 (3) What memory allocator are you using?  With tcmalloc, a malloc/free pair
 should take around 50ns, two orders of magnitude less than your 4us
 measurement.

The 'new' operator is not overloaded. I use gcc-Version 4.4.1 20090725
(Red Hat 4.4.1-2)

Regards, Jacob

 On Wed, Jan 20, 2010 at 2:17 PM, Jacob Rief jacob.r...@gmail.com wrote:

 Hello Kenton,
 now I did some benchmarks, while Seek'ing though a FileInputStream.
 The testing code looks like this:

  boost::posix_time::ptime
 t0(boost::posix_time::microsec_clock::local_time()); // initialize
 boost::posix_time
  boost::shared_ptrgoogle::protobuf::io::FileInputStream
 fileInStream = new
 google::protobuf::io::FileInputStream(fileDescriptor);
  boost::posix_time::ptime
 t1(boost::posix_time::microsec_clock::local_time());
   // using Seek(), the function available through my patch
  fileInStream-Seek(offset, whence);
  boost::posix_time::ptime
 t2(boost::posix_time::microsec_clock::local_time());
  // this is the default method of achieving the same
  ::lseek64(fileDescriptor, offset, whence);
  fileInStream.reset(new
 google::protobuf::io::FileInputStream(fileDescriptor));
  boost::posix_time::ptime
 t3(boost::posix_time::microsec_clock::local_time());
  std::cerr  t1:          boost::posix_time::time_period(t1,
 t2).length()   t2:   boost::posix_time::time_period(t2,
 t3).length()  std::endl;

 and on my Intel Core2 Duo CPU E8400 (3.00GHz) with 4GB of RAM,
 gcc-Version 4.4.1 20090725, compiled with -O2
 I get these numbers:
 t1: 00:00:00.01 t2: 00:00:00.03
 t1: 00:00:00.01 t2: 00:00:00.03
 t1: 00:00:00.01 t2: 00:00:00.04
 t1: 00:00:00.01 t2: 00:00:00.07
 t1: 00:00:00.01 t2: 00:00:00.02
 t1: 00:00:00.01 t2: 00:00:00.03
 t1: 00:00:00.02 t2: 00:00:00.03
 t1: 00:00:00.01 t2: 00:00:00.04
 t1: 00:00:00.01 t2: 00:00:00.04
 t1: 00:00:00.01 t2: 00:00:00.03
 t1: 00:00:00.01 t2: 00:00:00.04

 In absolute numbers, ~1 microsecond compared to 3-4 microseconds is
 not a big difference,
 but from a relative point of view, direct Seek'ing is much faster than
 object recreation. And since
 I have to seek a lot in the FileInputStream, the measured times will
 accumulate.

 Regards, Jacob

 2010/1/19 Kenton Varda ken...@google.com:
  Did you do any tests to determine if the performance difference is
  relevant?
 
  On Mon, Jan 18, 2010 at 3:14 PM, Jacob Rief jacob.r...@gmail.com
  wrote:
 
  Hello Kenton,
 
  2010/1/18 Kenton Varda ken...@google.com:
  (...snip...)
   As for code cleanliness, I find the Reset() method awkward since the
   user
   has to remember to call it at the same time as they do some other
   operation,
   like seeking the file descriptor.  Either calling Reset() or seeking
   the
   file descriptor alone will put the object in an inconsistent state.
    It
   might make more sense to offer an actual Seek() method which can
   safely
   perform both operations together with an interface that is not so
   

Re: [protobuf] Re: How can I reset a FileInputStream?

2010-01-21 Thread Kenton Varda
Can you send me the complete source file for this benchmark?

On Thu, Jan 21, 2010 at 11:50 AM, Jacob Rief jacob.r...@gmail.com wrote:

 Hello Kenton,

 2010/1/20 Kenton Varda ken...@google.com:
  (1) Normally micro-benchmarks involve running the operation in a loop
 many
  times so that the total time is closer to 1s or more, not running the
  operation once and trying to time that.  System clocks are not very
 accurate
  at that scale, and depending on what kind of clock it is, it may actually
  take significantly longer to read the lock than it does not allocate
 memory.
  (2) Your benchmark does not include the time spent actually reading the
  file, which is what I asserted would be much slower than re-allocating
 the
  buffer.  Sure, the seek itself is fast but it is pointless without
 actually
  reading.

 now I modified the benchmark, now the code looks like this

 boost::posix_time::ptime
 time0(boost::posix_time::microsec_clock::local_time());
 boost::posix_time::ptime
 time1(boost::posix_time::microsec_clock::local_time());
 for (int i = 0; i100; ++i) {
const void* data;
int size;
fileInStream-Seek(offset, whence);
fileInStream-Next(data, size);
 }
 boost::posix_time::ptime
 time2(boost::posix_time::microsec_clock::local_time());
 for (int i = 0; i100; ++i) {
const void* data;
int size;
::lseek64(fileDescriptor, offset, whence);
fileInStream.reset(new
 google::protobuf::io::FileInputStream(fileDescriptor));
fileInStream-Next(data, size);
 }
 boost::posix_time::ptime
 time3(boost::posix_time::microsec_clock::local_time());
 std::cerr  t1:boost::posix_time::time_period(time1,
 time2).length()   t2:   boost::posix_time::time_period(time2,
 time3).length()  std::endl;

 The difference now is less significant, but still measurable:
 t1: 00:00:02.068949 t2: 00:00:02.389942
 t1: 00:00:02.092842 t2: 00:00:02.429206
 t1: 00:00:02.080614 t2: 00:00:02.394708
 t1: 00:00:02.094289 t2: 00:00:02.429952
 t1: 00:00:02.323403 t2: 00:00:03.723459
 t1: 00:00:02.151486 t2: 00:00:03.711809
 t1: 00:00:02.084442 t2: 00:00:02.416326
 t1: 00:00:02.052930 t2: 00:00:02.383500

  (3) What memory allocator are you using?  With tcmalloc, a malloc/free
 pair
  should take around 50ns, two orders of magnitude less than your 4us
  measurement.

 The 'new' operator is not overloaded. I use gcc-Version 4.4.1 20090725
 (Red Hat 4.4.1-2)

 Regards, Jacob

  On Wed, Jan 20, 2010 at 2:17 PM, Jacob Rief jacob.r...@gmail.com
 wrote:
 
  Hello Kenton,
  now I did some benchmarks, while Seek'ing though a FileInputStream.
  The testing code looks like this:
 
   boost::posix_time::ptime
  t0(boost::posix_time::microsec_clock::local_time()); // initialize
  boost::posix_time
   boost::shared_ptrgoogle::protobuf::io::FileInputStream
  fileInStream = new
  google::protobuf::io::FileInputStream(fileDescriptor);
   boost::posix_time::ptime
  t1(boost::posix_time::microsec_clock::local_time());
// using Seek(), the function available through my patch
   fileInStream-Seek(offset, whence);
   boost::posix_time::ptime
  t2(boost::posix_time::microsec_clock::local_time());
   // this is the default method of achieving the same
   ::lseek64(fileDescriptor, offset, whence);
   fileInStream.reset(new
  google::protobuf::io::FileInputStream(fileDescriptor));
   boost::posix_time::ptime
  t3(boost::posix_time::microsec_clock::local_time());
   std::cerr  t1:  boost::posix_time::time_period(t1,
  t2).length()   t2:   boost::posix_time::time_period(t2,
  t3).length()  std::endl;
 
  and on my Intel Core2 Duo CPU E8400 (3.00GHz) with 4GB of RAM,
  gcc-Version 4.4.1 20090725, compiled with -O2
  I get these numbers:
  t1: 00:00:00.01 t2: 00:00:00.03
  t1: 00:00:00.01 t2: 00:00:00.03
  t1: 00:00:00.01 t2: 00:00:00.04
  t1: 00:00:00.01 t2: 00:00:00.07
  t1: 00:00:00.01 t2: 00:00:00.02
  t1: 00:00:00.01 t2: 00:00:00.03
  t1: 00:00:00.02 t2: 00:00:00.03
  t1: 00:00:00.01 t2: 00:00:00.04
  t1: 00:00:00.01 t2: 00:00:00.04
  t1: 00:00:00.01 t2: 00:00:00.03
  t1: 00:00:00.01 t2: 00:00:00.04
 
  In absolute numbers, ~1 microsecond compared to 3-4 microseconds is
  not a big difference,
  but from a relative point of view, direct Seek'ing is much faster than
  object recreation. And since
  I have to seek a lot in the FileInputStream, the measured times will
  accumulate.
 
  Regards, Jacob
 
  2010/1/19 Kenton Varda ken...@google.com:
   Did you do any tests to determine if the performance difference is
   relevant?
  
   On Mon, Jan 18, 2010 at 3:14 PM, Jacob Rief jacob.r...@gmail.com
   wrote:
  
   Hello Kenton,
  
   2010/1/18 Kenton Varda ken...@google.com:
   (...snip...)
As for code cleanliness, I find the Reset() method awkward since
 the
user
has to remember to call it at the same time as they do some other
operation,
like seeking the file descriptor.  

Re: [protobuf] Re: How can I reset a FileInputStream?

2010-01-20 Thread Kenton Varda
(1) Normally micro-benchmarks involve running the operation in a loop many
times so that the total time is closer to 1s or more, not running the
operation once and trying to time that.  System clocks are not very accurate
at that scale, and depending on what kind of clock it is, it may actually
take significantly longer to read the lock than it does not allocate memory.

(2) Your benchmark does not include the time spent actually reading the
file, which is what I asserted would be much slower than re-allocating the
buffer.  Sure, the seek itself is fast but it is pointless without actually
reading.

(3) What memory allocator are you using?  With tcmalloc, a malloc/free pair
should take around 50ns, two orders of magnitude less than your 4us
measurement.

On Wed, Jan 20, 2010 at 2:17 PM, Jacob Rief jacob.r...@gmail.com wrote:

 Hello Kenton,
 now I did some benchmarks, while Seek'ing though a FileInputStream.
 The testing code looks like this:

  boost::posix_time::ptime
 t0(boost::posix_time::microsec_clock::local_time()); // initialize
 boost::posix_time
  boost::shared_ptrgoogle::protobuf::io::FileInputStream
 fileInStream = new
 google::protobuf::io::FileInputStream(fileDescriptor);
  boost::posix_time::ptime
 t1(boost::posix_time::microsec_clock::local_time());
   // using Seek(), the function available through my patch
  fileInStream-Seek(offset, whence);
  boost::posix_time::ptime
 t2(boost::posix_time::microsec_clock::local_time());
  // this is the default method of achieving the same
  ::lseek64(fileDescriptor, offset, whence);
  fileInStream.reset(new
 google::protobuf::io::FileInputStream(fileDescriptor));
  boost::posix_time::ptime
 t3(boost::posix_time::microsec_clock::local_time());
  std::cerr  t1:  boost::posix_time::time_period(t1,
 t2).length()   t2:   boost::posix_time::time_period(t2,
 t3).length()  std::endl;

 and on my Intel Core2 Duo CPU E8400 (3.00GHz) with 4GB of RAM,
 gcc-Version 4.4.1 20090725, compiled with -O2
 I get these numbers:
 t1: 00:00:00.01 t2: 00:00:00.03
 t1: 00:00:00.01 t2: 00:00:00.03
 t1: 00:00:00.01 t2: 00:00:00.04
 t1: 00:00:00.01 t2: 00:00:00.07
 t1: 00:00:00.01 t2: 00:00:00.02
 t1: 00:00:00.01 t2: 00:00:00.03
 t1: 00:00:00.02 t2: 00:00:00.03
 t1: 00:00:00.01 t2: 00:00:00.04
 t1: 00:00:00.01 t2: 00:00:00.04
 t1: 00:00:00.01 t2: 00:00:00.03
 t1: 00:00:00.01 t2: 00:00:00.04

 In absolute numbers, ~1 microsecond compared to 3-4 microseconds is
 not a big difference,
 but from a relative point of view, direct Seek'ing is much faster than
 object recreation. And since
 I have to seek a lot in the FileInputStream, the measured times will
 accumulate.

 Regards, Jacob

 2010/1/19 Kenton Varda ken...@google.com:
  Did you do any tests to determine if the performance difference is
 relevant?
 
  On Mon, Jan 18, 2010 at 3:14 PM, Jacob Rief jacob.r...@gmail.com
 wrote:
 
  Hello Kenton,
 
  2010/1/18 Kenton Varda ken...@google.com:
  (...snip...)
   As for code cleanliness, I find the Reset() method awkward since the
   user
   has to remember to call it at the same time as they do some other
   operation,
   like seeking the file descriptor.  Either calling Reset() or seeking
 the
   file descriptor alone will put the object in an inconsistent state.
  It
   might make more sense to offer an actual Seek() method which can
 safely
   perform both operations together with an interface that is not so easy
   to
   misuse.
 
  I agree, a 64bit Seek function would definitely be safer.
  Here is a patch for version 2.3.0 to add FileInputStream::Seek(). The
  patch also fixes a compiler warning.
  I also adopted my library to use this function and it seems to work.
  Regards, Jacob
 
 

-- 

You received this message because you are subscribed to the Google Groups "Protocol Buffers" group.

To post to this group, send email to proto...@googlegroups.com.

To unsubscribe from this group, send email to protobuf+unsubscr...@googlegroups.com.

For more options, visit this group at http://groups.google.com/group/protobuf?hl=en.



[protobuf] Re: How can I reset a FileInputStream?

2010-01-17 Thread Jacob Rief
Hello Kenton,

 What makes you think it is inefficient?  It does mean the buffer has to be
 re-allocated but with a decent malloc implementation that shouldn't take
 long.  Certainly the actual reading from the file would take longer.  Have
 you seen performance problems with this approach?

Well, in order to see any performance penalties, I would have to
implement FileInputStream::Reset() and compare the results with the
current implementation, (I can do that, if there is enough interest).
I reviewed the implementation and I saw that by reinstantiating a
FileInputStream object, 3 destructors and 3 constructors have to be
called, where one (CopyingInputStreamAdaptor) invalidates a buffer
which in the Next() step immediately afterwards has to be
reallocated. A Reset() function would avoid these unnecessary steps.

 If there really is a performance problem with allocating new objects, then
 sure.

From the performance point of view, its certainly not a big issue, but
from the code cleanness point of view, it is.
I have written a class named LzipInputStream, which offers a Reset()
functionality to randomly access any part of the uncompressed input
stream without having to decompress everything. Therefore this Reset()
function is called quite often and it has to destroy and recreate its
lower layer, ie. the FileInputStream. If each stackable ...InputStream
offers a Reset() function, the upper layer then only has to call Reset
() on the lower layer, instead of keeping track how to reconstruct the
lower layered FileInputStream object.

Regards, Jacob
-- 
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To post to this group, send email to proto...@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.