You can first try setting io.skip.checksum.errors to true, which will ignore bad checksums.
>>In facebook, we also had a requirement to ignore corrupt/bad data - but it
>>has not been committed yet. Yongqiang, what is the jira number ?

There seems to be no JIRA for this issue.

thanks
yongqiang

2011/1/31 Namit Jain <nj...@fb.com>:
>
> On 1/31/11 7:46 AM, "Laurent Laborde" <kerdez...@gmail.com> wrote:
>
>>On Fri, Jan 28, 2011 at 8:05 AM, Laurent Laborde <kerdez...@gmail.com> wrote:
>>> On Fri, Jan 28, 2011 at 1:12 AM, Namit Jain <nj...@fb.com> wrote:
>>>> Hi Laurent,
>>>>
>>>> 1. Are you saying that _top.sql did not exist in the home directory.
>>>> Or that, _top.sql existed, but hive was not able to read it after loading
>>>
>>> It exist, it's loaded, and i can see it in the hive's warehouse directory.
>>> it's just impossible to query it.
>>>
>>>> 2. I don't think reserved words are documented somewhere. Can you file a
>>>> jira for this ?
>>>
>>> Ok; will do that today.
>>>
>>>> 3. The bad row is printed in the task log.
>>>>
>>>> 1. 2011-01-27 11:11:07,046 INFO org.apache.hadoop.fs.FSInputChecker: Found
>>>> checksum error: b[1024, 1536]=
>>>> 7374796c653d22666f6e742d73697a653a20313270743b223e3c623e266e6273703b2
>>>> 66e6273703b266e6273703b202a202838302920416d69656e733a3c2f623e3c2f7370616e3e
>>>> 3c2f7370616e3e5c6e20203c2f703e5c6e20203c703e5c6e202020203c7370616e207374796
>>>> c653d22666f66742d66616d696c793a2068656c7665746963613b223e3c7370616e20737479
>>>> 6c653d22666f6e742d73697a653a20313270743b223e3c623e266e6273703b266e6273703b2
>>>> 66e6273703b266e6273703b266e6273703b266e6273703b266e6273703b266e6273703b266e
>>>> 6273703b206f203132682c2050697175652d6e6971756520646576616e74206c65205265637
>>>> 46f7261742e3c2f623e3c2f7370616e3e3c2f7370616e3e5c6e20203c2f703e5c6e20203c70
>>>> 3e5c6e202020203c7370616e207374796c653d22666f6e742d66616d696c793a2068656c766
>>>> 5746963613b223e3c7370616e207374796c653d22666f6e742d73697a653a20313270743b22
>>>> 3e3c623e266e6273703b266e6273703b266e6273703b266e6273703b266e6273703b266e627
>>>> 3703b266e6273703b266e6273703b266e6273703b206f2031346833302c204d6169736f6e20
>>>> 6465206c612063756c747572652e3c2f623e3c2f7370616e3e3c2f7370616e3e5c6e20203c2
>>>> f703e5c6e20203c703e5c6e202020203c7370616e207374796c653d
>>>
>>> Is this the actual data ?
>>>
>>>> 2. org.apache.hadoop.fs.ChecksumException: Checksum error:
>>>> /blk_2466764552666222475:of:/user/hive/warehouse/article/article.copy at
>>>> 23446528
>>>
>>> 23446528 is the line number ?
>>>
>>> thank you
>>
>>optional question (the previous ones are still open) :
>>is there a way to tell hive to ignore invalid data ? (if the problem
>>is invalid data)
>>
>
> Currently, not.
> In facebook, we also had a requirement to ignore corrupt/bad data - but it
> has not been committed yet. Yongqiang, what is the jira number ?
>
> Thanks,
> -namit
>
>>
>>--
>>Laurent "ker2x" Laborde
>>Sysadmin & DBA at http://www.over-blog.com/
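On the still-open question in the quoted thread about whether the hex string in the task log is the actual data: it looks like a hex dump of the raw bytes in the block that failed the checksum, so decoding it should show the stored content. Below is a minimal sketch in Python, assuming the bytes are UTF-8 (or plain ASCII) text. The helper name decode_hex_dump is made up for illustration, and only the first chunk of the logged hex is used. As far as I know, the number after "at" in a ChecksumException is a byte position in the file, not a line number.

```python
import string

# First chunk of the hex string from the FSInputChecker log line quoted above.
hex_chunk = "7374796c653d22666f6e742d73697a653a20313270743b223e3c623e266e627370"


def decode_hex_dump(hex_text: str) -> str:
    """Decode a hex dump into text (hypothetical helper, not part of Hadoop).

    Anything that is not a hex digit (quote markers such as '>', spaces,
    newlines) is dropped first, so a pasted multi-line dump can be passed in.
    """
    cleaned = "".join(ch for ch in hex_text if ch in string.hexdigits)
    if len(cleaned) % 2:
        # Drop a trailing half byte left over from a truncated dump.
        cleaned = cleaned[:-1]
    # Assumption: the data is UTF-8 text; undecodable bytes are replaced.
    return bytes.fromhex(cleaned).decode("utf-8", errors="replace")


decoded = decode_hex_dump(hex_chunk)
print(decoded)
```

The decoded chunk is an HTML fragment, which suggests the log is showing article content from the table's data file rather than some internal structure.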