You can first try setting io.skip.checksum.errors to true, which will
ignore bad checksums.

>>At Facebook, we also had a requirement to ignore corrupt/bad data - but it
>>has not been committed yet. Yongqiang, what is the jira number?
There does not seem to be a jira for this issue.
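
On the open question of whether the hex dump in the task log is the
actual data: decoding a piece of it is a quick way to check. A minimal
sketch in Python, using only the first wrapped chunk of the dump quoted
below (the variable names are just for illustration):

```python
# First wrapped chunk of the hex dump from the quoted task log
# (the text that follows "b[1024, 1536]=").
hex_chunk = "7374796c653d22666f6e742d73697a653a20313270743b223e3c623e266e627370"

# Each pair of hex digits is one byte of the block that failed the checksum.
decoded = bytes.fromhex(hex_chunk).decode("ascii")
print(decoded)
```

This chunk decodes to an HTML fragment (style="font-size: 12pt;"><b>&nbsp),
so the dump appears to be raw bytes from the file content.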

thanks
yongqiang
2011/1/31 Namit Jain <nj...@fb.com>:
>
>
> On 1/31/11 7:46 AM, "Laurent Laborde" <kerdez...@gmail.com> wrote:
>
>>On Fri, Jan 28, 2011 at 8:05 AM, Laurent Laborde <kerdez...@gmail.com>
>>wrote:
>>> On Fri, Jan 28, 2011 at 1:12 AM, Namit Jain <nj...@fb.com> wrote:
>>>> Hi Laurent,
>>>>
>>>> 1. Are you saying that _top.sql did not exist in the home directory?
>>>> Or that _top.sql existed, but hive was not able to read it after
>>>> loading?
>>>
>>> It exists, it's loaded, and I can see it in Hive's warehouse
>>> directory.
>>> It's just impossible to query it.
>>>
>>>> 2. I don't think reserved words are documented anywhere. Can you file
>>>> a jira for this?
>>>
>>> Ok; will do that today.
>>>
>>>> 3. The bad row is printed in the task log.
>>>>
>>>> 1. 2011-01-27 11:11:07,046 INFO org.apache.hadoop.fs.FSInputChecker:
>>>> Found checksum error: b[1024,
>>>>
>>>>1536]=7374796c653d22666f6e742d73697a653a20313270743b223e3c623e266e627370
>>>>3b2
>>>>
>>>>66e6273703b266e6273703b202a202838302920416d69656e733a3c2f623e3c2f7370616
>>>>e3e
>>>>
>>>>3c2f7370616e3e5c6e20203c2f703e5c6e20203c703e5c6e202020203c7370616e207374
>>>>796
>>>>
>>>>c653d22666f66742d66616d696c793a2068656c7665746963613b223e3c7370616e20737
>>>>479
>>>>
>>>>6c653d22666f6e742d73697a653a20313270743b223e3c623e266e6273703b266e627370
>>>>3b2
>>>>
>>>>66e6273703b266e6273703b266e6273703b266e6273703b266e6273703b266e6273703b2
>>>>66e
>>>>
>>>>6273703b206f203132682c2050697175652d6e6971756520646576616e74206c65205265
>>>>637
>>>>
>>>>46f7261742e3c2f623e3c2f7370616e3e3c2f7370616e3e5c6e20203c2f703e5c6e20203
>>>>c70
>>>>
>>>>3e5c6e202020203c7370616e207374796c653d22666f6e742d66616d696c793a2068656c
>>>>766
>>>>
>>>>5746963613b223e3c7370616e207374796c653d22666f6e742d73697a653a20313270743
>>>>b22
>>>>
>>>>3e3c623e266e6273703b266e6273703b266e6273703b266e6273703b266e6273703b266e
>>>>627
>>>>
>>>>3703b266e6273703b266e6273703b266e6273703b206f2031346833302c204d6169736f6
>>>>e20
>>>>
>>>>6465206c612063756c747572652e3c2f623e3c2f7370616e3e3c2f7370616e3e5c6e2020
>>>>3c2
>>>> f703e5c6e20203c703e5c6e202020203c7370616e207374796c653d
>>>
>>> Is this the actual data ?
>>>
>>>> 2. org.apache.hadoop.fs.ChecksumException: Checksum error:
>>>> /blk_2466764552666222475:of:/user/hive/warehouse/article/article.copy
>>>> at 23446528
>>>
>>> 23446528 is the line number ?
>>>
>>> thank you
>>
>>optional question (the previous ones are still open) :
>>is there a way to tell hive to ignore invalid data ? (if the problem
>>is invalid data)
>>
>
> Currently, no.
> At Facebook, we also had a requirement to ignore corrupt/bad data - but it
> has not been committed yet. Yongqiang, what is the jira number?
>
>
> Thanks,
> -namit
>
>
>>
>>--
>>Laurent "ker2x" Laborde
>>Sysadmin & DBA at http://www.over-blog.com/
>
>
