Re: [protobuf] Re: Opensourcing LzLib Protocol Buffer Code

2010-04-01 Thread Jacob Rief
Hello Ernest,
here is the latest version of the lzip input/output stream classes. I
have fixed some issues since the last published version. These two classes
are, in my humble opinion, stable now. Any code reviews are welcome!

Kenton, if there is a repository for external utility classes like
this one, please let me know. And if my other library gets into a
state where I can publish it, I will create a project together with
its repository containing the above classes. However, since this is
work in progress with lots of changes to my API, I prefer to keep it
unpublished for now.

Regards, Jacob

-- 
You received this message because you are subscribed to the Google Groups 
Protocol Buffers group.
To post to this group, send email to proto...@googlegroups.com.
To unsubscribe from this group, send email to 
protobuf+unsubscr...@googlegroups.com.
For more options, visit this group at 
http://groups.google.com/group/protobuf?hl=en.

// This file contains the declaration of classes
// LzipInputStream and LzipOutputStream used to compress
// and decompress Google's Protocol Buffer streams using
// the Lempel-Ziv-Markov algorithm.
//
// Derived from
// http://protobuf.googlecode.com/svn/tags/2.2.0/src/google/protobuf/io/gzip_stream.h
// Copyright 2010 by Jacob Rief jacob.r...@gmail.com

#ifndef GOOGLE_PROTOBUF_IO_LZIP_STREAM_H__
#define GOOGLE_PROTOBUF_IO_LZIP_STREAM_H__

#include <stdint.h>
#include <lzlib.h>
#include <google/protobuf/io/zero_copy_stream.h>

namespace google {
namespace protobuf {
namespace io {

// A ZeroCopyInputStream that reads compressed data through lzlib.
class LIBPROTOBUF_EXPORT LzipInputStream : public ZeroCopyInputStream {
 public:
  explicit LzipInputStream(ZeroCopyInputStream* sub_stream);

  virtual ~LzipInputStream();

  // Releases the decoder.
  bool Close();

  // Resets the underlying input stream, resetting the decompressor and all
  // counters.
  void Reset();

  // Forwards the underlying InputStream to the beginning of the next
  // compression member. Use this function after repositioning the underlying
  // stream, or in case a stream error occurred.
  bool Forward();

  // In case of an error, check the reason here.
  inline LZ_Errno ErrorCode() const {
    return errno_;
  }

  // --- implements ZeroCopyInputStream ---
  bool Next(const void** data, int* size);
  void BackUp(int count);
  bool Skip(int count);
  int64 ByteCount() const;

 private:
  GOOGLE_DISALLOW_EVIL_CONSTRUCTORS(LzipInputStream);

  void Decompress();

  // compressed input stream
  ZeroCopyInputStream* sub_stream_;
  bool finished_;

  // plain text output stream
  const int output_buffer_length_;
  void* const output_buffer_;
  uint8_t* output_position_;
  uint8_t* next_out_;
  int avail_out_;

  // Lzip decoder
  LZ_Decoder* decoder_;
  LZ_Errno errno_;
};

class LIBPROTOBUF_EXPORT LzipOutputStream : public ZeroCopyOutputStream {
 public:
  // Create a LzipOutputStream with default options.
  explicit LzipOutputStream(ZeroCopyOutputStream* sub_stream,
                            size_t compression_level = 5,
                            int64_t member_size = kint64max);

  virtual ~LzipOutputStream();

  // Flushes data written so far to zipped data in the underlying stream.
  // It is the caller's responsibility to flush the underlying stream if
  // necessary.
  // Compression may be less efficient stopping and starting around flushes.
  // Returns true if no error.
  bool Flush();

  // Flushes data written so far to zipped data in the underlying stream
  // and restarts a new LZIP member. It is the caller's responsibility to
  // flush the underlying stream if necessary.
  // Compression is a lot more inefficient when restarting a new member,
  // rather than calling Flush().
  // Returns true if no error.
  bool Restart();

  // Writes out all data and closes the lzip stream.
  // It is the caller's responsibility to close the underlying stream if
  // necessary.
  // Returns true if no error.
  bool Close();

  // --- implements ZeroCopyOutputStream ---
  bool Next(void** data, int* size);
  void BackUp(int count);
  int64 ByteCount() const;
  void Reset();

 private:
  GOOGLE_DISALLOW_EVIL_CONSTRUCTORS(LzipOutputStream);

  void Compress(bool flush = false);

  // plain text input stream
  const int input_buffer_length_;
  void* const input_buffer_;
  uint8_t* input_position_;
  uint8_t* const input_buffer_end_;

  // compressed output stream
  ZeroCopyOutputStream* sub_stream_;
  bool finished_;

  // Lzip encoder
  struct Options {
int dictionary_size; // 4KiB..512MiB
int match_len_limit; // 5..273
  };
  static const Options options[9];

  LZ_Encoder* encoder_;
  const uint64_t member_size_;
  LZ_Errno errno_;
};

}  // namespace io
}  // namespace protobuf
}  // namespace google

#endif  // GOOGLE_PROTOBUF_IO_LZIP_STREAM_H__

[protobuf] Re: Opensourcing LzLib Protocol Buffer Code

2010-03-26 Thread Jacob Rief
Hello Ernest,
this code is part of a private project in progress and it works well
in that context. Unfortunately the Google guys had no use case for it,
therefore they did not want to incorporate it into their code base;
maybe they just suffer from not-invented-here syndrome. When my project
is ready to be published, I will add that code there. If I can get
write access to a PB-related Google repository, I will use that. The
reason I have not published anything yet is that I did not want to
start a Google project just to publish two files.
Regards, Jacob

2010/3/23 Ernest Lee hellfir...@gmail.com:
 Hello Jacob Rief,

 I have noticed you haven't publicly said anything about your LZMA protobuf
 storage. Is it dead? Can you post it on your google code project? What has
 happened to it?

 Thanks.





Re: [protobuf] Re: How can I reset a FileInputStream?

2010-01-30 Thread Jacob Rief
Hello Kenton,

2010/1/30 Kenton Varda ken...@google.com:
 We can't add a new virtual method to the ZeroCopyInputStream interface
 because it would break existing implementations.

but only on a binary level. For God's sake, you are not Microsoft :)

 Besides that, it's unclear what the Reset() method means in an abstract
 sense.  Yes, you can define it for the particular set of streams that you're
 thinking of, but what does it mean in general?

Put the object into a state equivalent to the state immediately after
construction. The Reset() button should be pressed only when you
know what you are doing - otherwise you will lose valuable data.

 What should ArrayInputStream::Reset() do?  In this case Reset() is
 nonsensical.

the same as its constructor, without reallocation, i.e.:
void ArrayInputStream::Reset() {
  position_ = 0;
  last_returned_size_ = 0;
}

 What should IstreamInputStream::Reset() do?  Should it only discard its own
 buffer, or should it also reset the underlying istream?  If that istream is

void IstreamInputStream::Reset() {
  impl_.Reset();
}

since impl_ is a CopyingInputStreamAdaptor, which itself IS-A
ZeroCopyInputStream, its Reset() is just another implementation, i.e.:

void CopyingInputStreamAdaptor::Reset() {
  position_ = 0;
  buffer_used_ = 0;
  backup_bytes_ = 0;
}

 itself wrapping a file descriptor, and you're trying to seek that file
 descriptor directly, then you need to reset the istream.  But maybe the user
 is actually calling IstreamInputStream::Reset() because they have seeked the
 istream itself and want IstreamInputStream to acknowledge this.  Who knows?
  But you can't say that Reset() is only propagated down the stack by *some*
 implementations and not others.

Since the creator of IstreamInputStream is the owner of the file
descriptor, it is his responsibility to seek to whatever location is
desired.

 No, we won't be adding a Reset() method because the meaning is unclear.
 Meanwhile, you seem to have made an argument against
 FileInputStream::Seek():  Any streams layered on top of it will be broken if
 you Seek() the stream under them.  So you have to have some way to reset
 those streams, and the problem starts again!

Exactly! Therefore instead of destroying and recreating them, a much
simpler Reset() function would do the job. The design of the streaming
classes is to consider a stream which can move only forward. It was
not designed for moving backwards or random access.

 Please just don't add anything new.  If you are unhappy with what
 ZeroCopy{Input,Output}Stream provide, you can always just create your own
 stream framework to use.

Well, I have to live with that decision. Maybe in the future some
other people have similar use cases. Maybe in version 3?

Just out of curiosity: the protobuf code is really easy to read and to
understand. The only thing I disliked is the mapping of class names
to filenames. Is all the code inside Google written that clearly?

Regards, Jacob




[protobuf] Re: How can I reset a FileInputStream?

2010-01-21 Thread Jacob Rief
Hello Kenton,

2010/1/20 Kenton Varda ken...@google.com:
 (1) Normally micro-benchmarks involve running the operation in a loop many
 times so that the total time is closer to 1s or more, not running the
 operation once and trying to time that.  System clocks are not very accurate
 at that scale, and depending on what kind of clock it is, it may actually
 take significantly longer to read the clock than it does to allocate memory.
 (2) Your benchmark does not include the time spent actually reading the
 file, which is what I asserted would be much slower than re-allocating the
 buffer.  Sure, the seek itself is fast but it is pointless without actually
 reading.

I have now modified the benchmark; the code looks like this:

boost::posix_time::ptime time0(boost::posix_time::microsec_clock::local_time());
boost::posix_time::ptime time1(boost::posix_time::microsec_clock::local_time());
for (int i = 0; i < 100; ++i) {
  const void* data;
  int size;
  fileInStream->Seek(offset, whence);
  fileInStream->Next(&data, &size);
}
boost::posix_time::ptime time2(boost::posix_time::microsec_clock::local_time());
for (int i = 0; i < 100; ++i) {
  const void* data;
  int size;
  ::lseek64(fileDescriptor, offset, whence);
  fileInStream.reset(new google::protobuf::io::FileInputStream(fileDescriptor));
  fileInStream->Next(&data, &size);
}
boost::posix_time::ptime time3(boost::posix_time::microsec_clock::local_time());
std::cerr << "t1: " << boost::posix_time::time_period(time1, time2).length()
          << " t2: " << boost::posix_time::time_period(time2, time3).length()
          << std::endl;

The difference now is less significant, but still measurable:
t1: 00:00:02.068949 t2: 00:00:02.389942
t1: 00:00:02.092842 t2: 00:00:02.429206
t1: 00:00:02.080614 t2: 00:00:02.394708
t1: 00:00:02.094289 t2: 00:00:02.429952
t1: 00:00:02.323403 t2: 00:00:03.723459
t1: 00:00:02.151486 t2: 00:00:03.711809
t1: 00:00:02.084442 t2: 00:00:02.416326
t1: 00:00:02.052930 t2: 00:00:02.383500

 (3) What memory allocator are you using?  With tcmalloc, a malloc/free pair
 should take around 50ns, two orders of magnitude less than your 4us
 measurement.

The 'new' operator is not overloaded. I use gcc version 4.4.1 20090725
(Red Hat 4.4.1-2).

Regards, Jacob

 On Wed, Jan 20, 2010 at 2:17 PM, Jacob Rief jacob.r...@gmail.com wrote:

 Hello Kenton,
 now I did some benchmarks, while Seek'ing though a FileInputStream.
 The testing code looks like this:

  boost::posix_time::ptime t0(boost::posix_time::microsec_clock::local_time());
  boost::shared_ptr<google::protobuf::io::FileInputStream> fileInStream(
      new google::protobuf::io::FileInputStream(fileDescriptor));
  boost::posix_time::ptime t1(boost::posix_time::microsec_clock::local_time());
  // using Seek(), the function available through my patch
  fileInStream->Seek(offset, whence);
  boost::posix_time::ptime t2(boost::posix_time::microsec_clock::local_time());
  // this is the default method of achieving the same
  ::lseek64(fileDescriptor, offset, whence);
  fileInStream.reset(new google::protobuf::io::FileInputStream(fileDescriptor));
  boost::posix_time::ptime t3(boost::posix_time::microsec_clock::local_time());
  std::cerr << "t1: " << boost::posix_time::time_period(t1, t2).length()
            << " t2: " << boost::posix_time::time_period(t2, t3).length()
            << std::endl;

 and on my Intel Core2 Duo CPU E8400 (3.00GHz) with 4GB of RAM,
 gcc-Version 4.4.1 20090725, compiled with -O2
 I get these numbers:
 t1: 00:00:00.01 t2: 00:00:00.03
 t1: 00:00:00.01 t2: 00:00:00.03
 t1: 00:00:00.01 t2: 00:00:00.04
 t1: 00:00:00.01 t2: 00:00:00.07
 t1: 00:00:00.01 t2: 00:00:00.02
 t1: 00:00:00.01 t2: 00:00:00.03
 t1: 00:00:00.02 t2: 00:00:00.03
 t1: 00:00:00.01 t2: 00:00:00.04
 t1: 00:00:00.01 t2: 00:00:00.04
 t1: 00:00:00.01 t2: 00:00:00.03
 t1: 00:00:00.01 t2: 00:00:00.04

 In absolute numbers, ~1 microsecond compared to 3-4 microseconds is
 not a big difference,
 but from a relative point of view, direct Seek'ing is much faster than
 object recreation. And since
 I have to seek a lot in the FileInputStream, the measured times will
 accumulate.

 Regards, Jacob

 2010/1/19 Kenton Varda ken...@google.com:
  Did you do any tests to determine if the performance difference is
  relevant?
 
  On Mon, Jan 18, 2010 at 3:14 PM, Jacob Rief jacob.r...@gmail.com
  wrote:
 
  Hello Kenton,
 
  2010/1/18 Kenton Varda ken...@google.com:
  (...snip...)
   As for code cleanliness, I find the Reset() method awkward since the
   user
   has to remember to call it at the same time as they do some other
   operation,
   like seeking the file descriptor.  Either calling Reset() or seeking
   the
   file descriptor alone will put the object in an inconsistent state.
    It
   might make more sense to offer an actual Seek() method which can
   safely
   perform both operations together with an interface that is not so

[protobuf] Re: How can I reset a FileInputStream?

2010-01-17 Thread Jacob Rief
Hello Kenton,

 What makes you think it is inefficient?  It does mean the buffer has to be
 re-allocated but with a decent malloc implementation that shouldn't take
 long.  Certainly the actual reading from the file would take longer.  Have
 you seen performance problems with this approach?

Well, in order to see any performance penalties, I would have to
implement FileInputStream::Reset() and compare the results with the
current implementation (I can do that, if there is enough interest).
I reviewed the implementation and saw that by reinstantiating a
FileInputStream object, three destructors and three constructors have to
be called, where one (CopyingInputStreamAdaptor) invalidates a buffer
which then has to be reallocated in the immediately following Next()
step. A Reset() function would avoid these unnecessary steps.

 If there really is a performance problem with allocating new objects, then
 sure.

From the performance point of view, it's certainly not a big issue, but
from the code cleanliness point of view, it is.
I have written a class named LzipInputStream, which offers a Reset()
functionality to randomly access any part of the uncompressed input
stream without having to decompress everything. Therefore this Reset()
function is called quite often, and it has to destroy and recreate its
lower layer, i.e. the FileInputStream. If each stackable ...InputStream
offered a Reset() function, the upper layer would only have to call
Reset() on the lower layer, instead of keeping track of how to
reconstruct the lower-layered FileInputStream object.

Regards, Jacob




[protobuf] How can I reset a FileInputStream?

2010-01-13 Thread Jacob Rief
Hello Kenton,

currently I have the following problem: I have a very big file with
many small messages serialized with Protobuf. Each message contains
its own separator and thus can be found even in an unsynchronized
stream. I move through this file using lseek64, because
FileInputStream::Skip only works in the forward direction and
FileInputStream::BackUp can move back only up to the current buffer
boundary. Since I am the owner of the file descriptor, also used by
FileInputStream, I can randomly seek to any position in the file.
However, after seek'ing, my FileInputStream is obviously in an unusable
state and has to be reset. Currently the only feasible solution is to
replace the current FileInputStream object by a new one - which is
somehow quite inefficient!

Wouldn't it make sense to add a member function which resets a
FileInputStream to the state of a natively opened and repositioned
file descriptor? Or is there any other solution to randomly access the
raw content of the file, say by wrapping seek?

Regards,
Jacob




Re: [protobuf] Re: Protocol Buffers using Lzip

2009-12-11 Thread Jacob Rief
Hello Chris,

2009/12/10 Christopher Smith cbsm...@gmail.com:
 One compression algo that I thought would be particularly useful with PB's
 would be LZO. It lines up nicely with PB's goals of being fast and compact.
 Have you thought about allowing an integrated LZO stream?

 --Chris

My goal is to compress huge amounts (>5 GB) of small serialized chunks
(~150...500 bytes) into a single stream, while still being able to
randomly access each part of it without having to decompress the whole
stream. GzipOutputStream (with level 5) reduces the size to about 40%
compared to the uncompressed binary stream, whereas my
LzipOutputStream (with level 5) reduces the size to about 20%. The
difficulty with gzip is to find synchronizing boundaries in the stream
during uncompression.
If your aim is to exchange small messages, say by RPC, then a fast but
less efficient algorithm is the right choice. If however you want to
store huge amounts of data permanently, your requirements may be
different.

In my opinion, generic streaming classes such as
ZeroCopyIn/OutputStream should offer different compression algorithms
for different purposes. LZO has advantages if used for communication
of small to medium-sized chunks of data. LZMA, on the other hand, has
advantages if you have to store lots of data for the long term. GZIP is
somewhere in the middle. Unfortunately Kenton has another opinion
about adding too many compression streaming classes.

Today I studied the API of LZO. From what I have seen, I think one
could implement two LzoIn/OutputStream classes. LZO compression
however has a small drawback, let me explain why: The LZO API is not
intended to be used for streams. Instead it always compresses and
decompresses a whole block. This is different behaviour than gzip and
lzip, which are intended to compress streams. A compression class has
a fixed sized buffer of typically 8 or 64kB. If this buffer is filled
with data, lzip and gzip digest the input and you can start to fill
the buffer from the beginning. On the other hand, the LZO compressor
has to compress the whole buffer in one step. The next block then has
to be concatenated with the already compressed data, which means that
during decompression you have to fiddle these chunks apart.

If your intention is to compress a chunk of data with, say less than
64kB each, and then to put it on the wire, then LZO is the right
solution for you. For my requirements, as you will understand now, LZO
does not really fit well.
If there is a strong interest in an alternative Protocol Buffer
compression stream, don't hesitate to contact me.

Jacob





[protobuf] Protocol Buffers using Lzip

2009-12-08 Thread Jacob Rief
Hello Brian, hello Kenton, hello list,
as an alternative to GzipInputStream and GzipOutputStream I have
written a compression and an uncompression stream class which are
stackable into Protocol Buffers streams. They are named
LzipInputStream and LzipOutputStream and use the Lempel-Ziv-Markov
chain algorithm, as implemented by LZIP
http://www.nongnu.org/lzip/lzip.html

An advantage of using Lzip instead of Gzip is that Lzip supports
multi-member compression. So one can jump into the stream at any
position, forward up to the next synchronization boundary, and start
reading from there.
Using the default compression level, Lzip has a better compression
ratio at the cost of being slower than Gzip, but when Lzip is used
with a low compression level, speed and output size of Lzip are
comparable to those of Gzip.

I would like to donate these classes to the ProtoBuf software
repository. They will be released under an OSS license, compatible with
LZIP's and Google's. Could someone please check them and tell me in what
kind of repository I can publish them? In Google's license agreement
there is a passage stating: Neither the name of Google Inc. nor the
names of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
Since I have to use the name google in the C++ namespace of
LzipIn/OutputStream, I hereby ask for permission to do so.

Comments are appreciated,
Jacob



// This file contains the implementation of classes
// LzipInputStream and LzipOutputStream used to compress
// and decompress Google's Protocol Buffer streams using
// the Lempel-Ziv-Markov algorithm.
//
// Derived from
// http://protobuf.googlecode.com/svn/tags/2.2.0/src/google/protobuf/io/gzip_stream.cc
// Copyright 2009 by Jacob Rief jacob.r...@gmail.com
// Evaluation copy - don't use in production code

#include "lzip_stream.h"
#include <google/protobuf/stubs/common.h>

namespace google {
namespace protobuf {
namespace io {

static const int kDefaultBufferSize = 8192;

// === LzipInputStream ===

LzipInputStream::LzipInputStream(ZeroCopyInputStream* sub_stream) :
  sub_stream_(sub_stream),
  finished_(false),
  output_buffer_length_(kDefaultBufferSize),
  output_buffer_(operator new(output_buffer_length_)),
  output_position_(NULL),
  next_out_(NULL),
  avail_out_(0),
  errno_(LZ_ok)
{
  GOOGLE_CHECK(output_buffer_ != NULL);
  decoder_ = LZ_decompress_open();
  errno_ = LZ_decompress_errno(decoder_);
  GOOGLE_CHECK(errno_ == LZ_ok);
}

LzipInputStream::~LzipInputStream() {
  if (decoder_ != NULL) {
Close();
  }
  if (output_buffer_ != NULL) {
operator delete(output_buffer_);
  }
}

bool LzipInputStream::Close() {
  errno_ = LZ_decompress_errno(decoder_);
  bool ok = LZ_decompress_close(decoder_) == LZ_ok;
  decoder_ = NULL;
  return ok;
}

// --- implements ZeroCopyInputStream ---
bool LzipInputStream::Next(const void** data, int* size) {
  GOOGLE_CHECK_GE(next_out_, output_position_);
  if (next_out_ == output_position_) {
    if (finished_ && LZ_decompress_finished(decoder_))
      return false;
    output_position_ = next_out_ = static_cast<uint8_t*>(output_buffer_);
    avail_out_ = output_buffer_length_;
    Decompress();
  }
  *data = output_position_;
  *size = next_out_ - output_position_;
  output_position_ = next_out_;
  return true;
}

void LzipInputStream::BackUp(int count) {
  GOOGLE_CHECK_GE(output_position_ - static_cast<uint8_t*>(output_buffer_),
                  count);
  output_position_ -= count;
}

bool LzipInputStream::Skip(int count) {
  const void* data;
  int size;
  bool ok = Next(&data, &size);
  while (ok && (size < count)) {
    count -= size;
    ok = Next(&data, &size);
  }
  if (size > count) {
    BackUp(size - count);
  }
  return ok;
}

int64 LzipInputStream::ByteCount() const {
  return LZ_decompress_total_out_size(decoder_);
}

// --- private ---
void LzipInputStream::Decompress() {
  GOOGLE_CHECK_GT(avail_out_, 0);
  if (!finished_) {
    int avail_in;
    const void* next_in;
    if (sub_stream_->Next(&next_in, &avail_in)) {
      int bytes_written = LZ_decompress_write(
          decoder_, static_cast<const uint8_t*>(next_in), avail_in);
      errno_ = LZ_decompress_errno(decoder_);
      GOOGLE_CHECK(errno_ == LZ_ok);
      GOOGLE_CHECK_GE(bytes_written, 0);
      sub_stream_->BackUp(avail_in - bytes_written);
    } else {
      GOOGLE_CHECK(LZ_decompress_finish(decoder_) == LZ_ok);
      finished_ = true;
    }
  }
  int bytes_read = LZ_decompress_read(decoder_, next_out_, avail_out_);
  errno_ = LZ_decompress_errno(decoder_);
  GOOGLE_CHECK(errno_ == LZ_ok);
  GOOGLE_CHECK_GE(bytes_read, 0);
  next_out_ += bytes_read;
  avail_out_ -= bytes_read;
}

Unable to read from concatenated zip's using GzipInputStream in protobuf-2.2.0 with zlib-1.2.3

2009-10-18 Thread Jacob Rief

I use protobuf to write self-delimited messages to a file. When I use
FileOutputStream, I can close the stream, reopen it at a later time
for writing, close it again and then parse the whole file. When I
try to do the same job after writing with GzipOutputStream, then
parsing with GzipInputStream, I can read up to the end of the first
chunk, but then CodedInputStream::ReadRaw returns false and my
application loses its sync. If however I first uncompress the written
file with gunzip and then use FileInputStream to decode it, everything
works fine. Also, if I lseek the file descriptor onto the beginning of
the second chunk (1f 8b 08 ...) and create a new GzipInputStream
object using that file descriptor, I can read everything.

I did some debugging and found out that when I use a zipped file with
one chunk (the normal case) and hit the EOF, in GzipInputStream::Next
Inflate() returns Z_STREAM_END and zcontext_.avail_in is 0. When I do
the same tests with a concatenated file, when reaching the end of the
first chunk, in GzipInputStream::Next Inflate() returns Z_STREAM_END
and zcontext_.avail_in is 1129, which means that zlib has some
unprocessed bytes in the input buffer.