Is this a case of needing to delimit the input? I'm not familiar with SplitterInputStream, but I'm wondering if it does the right thing for this to work.
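In case it helps, here is a rough sketch of what I mean by delimiting: a base-128 varint length prefix in front of each record, which is (roughly) what the protobuf Java `writeDelimitedTo`/`parseDelimitedFrom` pair does. This is plain Java with no protobuf dependency, and the class/method names and the sample bytes are all made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class DelimitedSketch {
    // Write the record's length as a base-128 varint, then the record bytes.
    static void writeDelimited(OutputStream out, byte[] record) throws IOException {
        int len = record.length;
        while ((len & ~0x7F) != 0) {
            out.write((len & 0x7F) | 0x80);  // low 7 bits, continuation bit set
            len >>>= 7;
        }
        out.write(len);
        out.write(record);
    }

    // Read one length-prefixed record; return null at a clean end of stream.
    static byte[] readDelimited(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) return null;            // no more records
        int len = b & 0x7F;
        int shift = 7;
        while ((b & 0x80) != 0) {
            b = in.read();
            if (b == -1) throw new IOException("truncated varint");
            len |= (b & 0x7F) << shift;
            shift += 7;
        }
        byte[] buf = new byte[len];
        int pos = 0;
        while (pos < len) {
            int n = in.read(buf, pos, len - pos);
            if (n == -1) throw new IOException("truncated record");
            pos += n;
        }
        return buf;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < 20; i++) {
            // stand-in for one serialized message
            writeDelimited(out, new byte[]{10, 2, 121, 121, 16, 1});
        }
        InputStream in = new ByteArrayInputStream(out.toByteArray());
        int count = 0;
        while (readDelimited(in) != null) count++;
        System.out.println(count); // 20 records recovered
    }
}
```

Without some prefix like this, a parser handed the raw concatenation of 20 messages has no way to tell where one record ends and the next begins.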
--Chris

On Thu, Feb 18, 2010 at 12:56 PM, Kenton Varda <[email protected]> wrote:
> Please reply-all so the mailing list stays CC'd. I don't know anything
> about the libraries you are using, so I can't really help you further.
> Maybe someone else can.
>
> On Thu, Feb 18, 2010 at 12:46 PM, Yang <[email protected]> wrote:
>
>> Thanks Kenton,
>>
>> I thought the same. What I did was use a splitter stream to split the
>> actual input stream into two, dumping one copy out for debugging and
>> feeding the other to PB.
>>
>> My code for Hadoop is:
>>
>> public void readFields(DataInput in) {
>>     SplitterInputStream ios = new SplitterInputStream(in);
>>     pb_object = MyPBClass.parseFrom(ios);
>> }
>>
>> SplitterInputStream dumps out the actual bytes, and the resulting byte
>> stream is indeed (decimal)
>>
>> 10 2 79 79 16 1 ... repeating 20 times
>>
>> which is 20 records of
>>
>> message {
>>   1: string name;  // taking a value of "yy"
>>   2: i32 Id;       // taking a value of 1
>> }
>>
>> Indeed, the dumped-out byte stream is the same whether compression is
>> on or off.
>>
>> On Thu, Feb 18, 2010 at 12:03 PM, Kenton Varda <[email protected]> wrote:
>>
>>> You should verify that the bytes that come out of the InputStream
>>> really are the exact same bytes that were written by the serializer to
>>> the OutputStream originally. You could do this by computing a checksum
>>> at both ends and printing it, then inspecting visually. You'll probably
>>> find that the bytes differ somehow, or don't end at the same point.
>>>
>>> On Thu, Feb 18, 2010 at 2:48 AM, Yang <[email protected]> wrote:
>>>
>>>> I tried to use protocol buffers in Hadoop.
>>>>
>>>> So far it works fine with SequenceFile, after I hooked it up with a
>>>> simple wrapper. But after I put a compressor into SequenceFile, it
>>>> fails: it reads all the messages and yet still wants to advance the
>>>> read pointer, and then readTag() returns 0, so mergeFrom() returns a
>>>> message with no fields set.
>>>>
>>>> Is anybody who is familiar with both SequenceFile and protocol
>>>> buffers able to tell why it fails like this? I find it hard to
>>>> understand, because the InputStream is simply the same whether or not
>>>> it comes through a compressor.
>>>>
>>>> Thanks,
>>>> Yang
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Protocol Buffers" group.
>>>> To post to this group, send email to [email protected].
>>>> To unsubscribe from this group, send email to
>>>> [email protected].
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/protobuf?hl=en.
>>>
>>
>

--
Chris
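Kenton's suggestion above, comparing a checksum on the writer side against one on the reader side, can be sketched like this. It is plain Java using `CheckedOutputStream`/`CheckedInputStream` from `java.util.zip`; the sample bytes are made up, and in the real setup the wrapped streams would be the SequenceFile output and input streams:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;

public class ChecksumSketch {
    public static void main(String[] args) throws IOException {
        byte[] record = {10, 2, 121, 121, 16, 1}; // stand-in for serialized bytes

        // Writer side: wrap the OutputStream, note the checksum after writing.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        CheckedOutputStream cout = new CheckedOutputStream(sink, new CRC32());
        cout.write(record);
        long writeCrc = cout.getChecksum().getValue();

        // Reader side: wrap the InputStream, note the checksum after reading.
        CheckedInputStream cin = new CheckedInputStream(
                new ByteArrayInputStream(sink.toByteArray()), new CRC32());
        while (cin.read() != -1) { /* consume everything, as the parser would */ }
        long readCrc = cin.getChecksum().getValue();

        // If these differ, the bytes were altered or cut short in transit.
        System.out.println(writeCrc == readCrc); // true
    }
}
```

Printing the two values on each side of the compressor would show quickly whether the compressed path really hands the parser the same bytes, and whether it stops at the same point.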
