Is this a case of needing to delimit the input? I'm not familiar with SplitterInputStream, but I'm wondering if it does the right thing for this to work.
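In case it helps, here is a rough sketch of what I mean by delimiting: a base-128 varint length prefix in front of each record, which is (roughly) what the protobuf Java `writeDelimitedTo`/`parseDelimitedFrom` pair does. This is plain Java with no protobuf dependency, and the class/method names and the sample bytes are all made up for illustration:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

public class DelimitedSketch {
    // Write the record's length as a base-128 varint, then the record bytes.
    static void writeDelimited(OutputStream out, byte[] record) throws IOException {
        int len = record.length;
        while ((len & ~0x7F) != 0) {
            out.write((len & 0x7F) | 0x80);  // low 7 bits, continuation bit set
            len >>>= 7;
        }
        out.write(len);
        out.write(record);
    }

    // Read one length-prefixed record; return null at a clean end of stream.
    static byte[] readDelimited(InputStream in) throws IOException {
        int b = in.read();
        if (b == -1) return null;            // no more records
        int len = b & 0x7F;
        int shift = 7;
        while ((b & 0x80) != 0) {
            b = in.read();
            if (b == -1) throw new IOException("truncated varint");
            len |= (b & 0x7F) << shift;
            shift += 7;
        }
        byte[] buf = new byte[len];
        int pos = 0;
        while (pos < len) {
            int n = in.read(buf, pos, len - pos);
            if (n == -1) throw new IOException("truncated record");
            pos += n;
        }
        return buf;
    }

    public static void main(String[] args) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < 20; i++) {
            // stand-in for one serialized message
            writeDelimited(out, new byte[]{10, 2, 121, 121, 16, 1});
        }
        InputStream in = new ByteArrayInputStream(out.toByteArray());
        int count = 0;
        while (readDelimited(in) != null) count++;
        System.out.println(count); // 20 records recovered
    }
}
```

Without some prefix like this, a parser handed the raw concatenation of 20 messages has no way to tell where one record ends and the next begins.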
--Chris

On Thu, Feb 18, 2010 at 12:56 PM, Kenton Varda <[email protected]> wrote:
> Please reply-all so the mailing list stays CC'd. I don't know anything
> about the libraries you are using, so I can't really help you further.
> Maybe someone else can.
>
> On Thu, Feb 18, 2010 at 12:46 PM, Yang <[email protected]> wrote:
>
>> Thanks Kenton,
>>
>> I thought the same. What I did was use a splitter stream to split the
>> actual input stream into two, dumping one copy out for debugging and
>> feeding the other to PB.
>>
>> My code for Hadoop is:
>>
>> public void readFields(DataInput in) {
>>     SplitterInputStream ios = new SplitterInputStream(in);
>>     pb_object = MyPBClass.parseFrom(ios);
>> }
>>
>> SplitterInputStream dumps out the actual bytes, and the resulting byte
>> stream is indeed (decimal)
>>
>> 10 2 79 79 16 1 ... repeating 20 times
>>
>> which is 20 records of
>>
>> message {
>>   1: string name;  // taking a value of "yy"
>>   2: i32 Id;       // taking a value of 1
>> }
>>
>> Indeed, the dumped-out byte stream is the same whether compression is
>> on or off.
>>
>> On Thu, Feb 18, 2010 at 12:03 PM, Kenton Varda <[email protected]> wrote:
>>
>>> You should verify that the bytes that come out of the InputStream
>>> really are the exact same bytes that were written by the serializer to
>>> the OutputStream originally. You could do this by computing a checksum
>>> at both ends and printing it, then inspecting visually. You'll probably
>>> find that the bytes differ somehow, or don't end at the same point.
>>>
>>> On Thu, Feb 18, 2010 at 2:48 AM, Yang <[email protected]> wrote:
>>>
>>>> I tried to use protocol buffers in Hadoop.
>>>>
>>>> So far it works fine with SequenceFile, after I hooked it up with a
>>>> simple wrapper. But after I put a compressor into SequenceFile, it
>>>> fails: it reads all the messages and yet still wants to advance the
>>>> read pointer, and then readTag() returns 0, so mergeFrom() returns a
>>>> message with no fields set.
>>>>
>>>> Is anybody who is familiar with both SequenceFile and protocol
>>>> buffers able to tell why it fails like this? I find it hard to
>>>> understand, because the InputStream is simply the same whether or not
>>>> it comes through a compressor.
>>>>
>>>> Thanks,
>>>> Yang
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google
>>>> Groups "Protocol Buffers" group.
>>>> To post to this group, send email to [email protected].
>>>> To unsubscribe from this group, send email to
>>>> [email protected].
>>>> For more options, visit this group at
>>>> http://groups.google.com/group/protobuf?hl=en.
>>>
>>
>

--
Chris
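Kenton's suggestion above, comparing a checksum on the writer side against one on the reader side, can be sketched like this. It is plain Java using `CheckedOutputStream`/`CheckedInputStream` from `java.util.zip`; the sample bytes are made up, and in the real setup the wrapped streams would be the SequenceFile output and input streams:

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.util.zip.CRC32;
import java.util.zip.CheckedInputStream;
import java.util.zip.CheckedOutputStream;

public class ChecksumSketch {
    public static void main(String[] args) throws IOException {
        byte[] record = {10, 2, 121, 121, 16, 1}; // stand-in for serialized bytes

        // Writer side: wrap the OutputStream, note the checksum after writing.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        CheckedOutputStream cout = new CheckedOutputStream(sink, new CRC32());
        cout.write(record);
        long writeCrc = cout.getChecksum().getValue();

        // Reader side: wrap the InputStream, note the checksum after reading.
        CheckedInputStream cin = new CheckedInputStream(
                new ByteArrayInputStream(sink.toByteArray()), new CRC32());
        while (cin.read() != -1) { /* consume everything, as the parser would */ }
        long readCrc = cin.getChecksum().getValue();

        // If these differ, the bytes were altered or cut short in transit.
        System.out.println(writeCrc == readCrc); // true
    }
}
```

Printing the two values on each side of the compressor would show quickly whether the compressed path really hands the parser the same bytes, and whether it stops at the same point.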
