On 1/31/11 7:46 AM, "Laurent Laborde" <kerdez...@gmail.com> wrote:
>On Fri, Jan 28, 2011 at 8:05 AM, Laurent Laborde <kerdez...@gmail.com> wrote:
>> On Fri, Jan 28, 2011 at 1:12 AM, Namit Jain <nj...@fb.com> wrote:
>>> Hi Laurent,
>>>
>>> 1. Are you saying that _top.sql did not exist in the home directory.
>>> Or that, _top.sql existed, but hive was not able to read it after loading
>>
>> It exist, it's loaded, and i can see it in the hive's warehouse directory.
>> it's just impossible to query it.
>>
>>> 2. I don't think reserved words are documented somewhere. Can you file a
>>> jira for this ?
>>
>> Ok; will do that today.
>>
>>> 3. The bad row is printed in the task log.
>>>
>>> 1. 2011-01-27 11:11:07,046 INFO org.apache.hadoop.fs.FSInputChecker: Found
>>> checksum error: b[1024,
>>> 1536]=7374796c653d22666f6e742d73697a653a20313270743b223e3c623e266e6273703b2
>>> 66e6273703b266e6273703b202a202838302920416d69656e733a3c2f623e3c2f7370616e3e
>>> 3c2f7370616e3e5c6e20203c2f703e5c6e20203c703e5c6e202020203c7370616e207374796
>>> c653d22666f66742d66616d696c793a2068656c7665746963613b223e3c7370616e20737479
>>> 6c653d22666f6e742d73697a653a20313270743b223e3c623e266e6273703b266e6273703b2
>>> 66e6273703b266e6273703b266e6273703b266e6273703b266e6273703b266e6273703b266e
>>> 6273703b206f203132682c2050697175652d6e6971756520646576616e74206c65205265637
>>> 46f7261742e3c2f623e3c2f7370616e3e3c2f7370616e3e5c6e20203c2f703e5c6e20203c70
>>> 3e5c6e202020203c7370616e207374796c653d22666f6e742d66616d696c793a2068656c766
>>> 5746963613b223e3c7370616e207374796c653d22666f6e742d73697a653a20313270743b22
>>> 3e3c623e266e6273703b266e6273703b266e6273703b266e6273703b266e6273703b266e627
>>> 3703b266e6273703b266e6273703b266e6273703b206f2031346833302c204d6169736f6e20
>>> 6465206c612063756c747572652e3c2f623e3c2f7370616e3e3c2f7370616e3e5c6e20203c2
>>> f703e5c6e20203c703e5c6e202020203c7370616e207374796c653d
>>
>> Is this the actual data ?
>>
>>> 2. org.apache.hadoop.fs.ChecksumException: Checksum error:
>>> /blk_2466764552666222475:of:/user/hive/warehouse/article/article.copy at
>>> 23446528
>>
>> 23446528 is the line number ?
>>
>> thank you
>
>optional question (the previous ones are still open) :
>is there a way to tell hive to ignore invalid data ? (if the problem
>is invalid data)
>

Currently, not. In facebook, we also had a requirement to ignore
corrupt/bad data - but it has not been committed yet.
Yongqiang, what is the jira number ?

Thanks,
-namit

>
>--
>Laurent "ker2x" Laborde
>Sysadmin & DBA at http://www.over-blog.com/
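
On the "Is this the actual data ?" question quoted above: one way to check is to decode the hex string that FSInputChecker printed after `b[1024, 1536]=`. The sketch below is only an illustration. `decode_checksum_dump` is a hypothetical helper written for this thread, not part of Hadoop or Hive, and it assumes the dump is plain hex of UTF-8 (or ASCII) text that may have picked up whitespace and `>` quote markers from mail wrapping.

```python
import re


def decode_checksum_dump(dump: str) -> str:
    """Decode the hex payload of a 'b[start, end]=<hex>' log fragment.

    Hypothetical helper for illustration; not a Hadoop or Hive API.
    """
    # Keep only what follows '=' when the 'b[1024, 1536]=' prefix is present.
    payload = dump.split("=", 1)[1] if "=" in dump else dump
    # Drop whitespace and mail-quote markers introduced by line wrapping.
    hex_digits = re.sub(r"[^0-9a-fA-F]", "", payload)
    # An odd number of digits means the dump was cut mid-byte; drop the last nibble.
    if len(hex_digits) % 2:
        hex_digits = hex_digits[:-1]
    # Replace undecodable bytes instead of failing, since the block is known to be corrupt.
    return bytes.fromhex(hex_digits).decode("utf-8", errors="replace")


# Start of the dump from the task log, including one mail-wrap artifact.
sample = (
    "b[1024, 1536]="
    "7374796c653d22666f6e742d73697a653a20313270743b223e3c623e266e627370\n"
    ">>>3b2"
)
print(decode_checksum_dump(sample))
```

Run on the start of the dump, this prints HTML markup (a `style="font-size: 12pt;"` attribute followed by `<b>&nbsp;`), which suggests the bytes are real article content rather than something generated by the logger. As far as I can tell, the number after `at` in the ChecksumException message is a byte position in the file, not a line number, but that is worth confirming against the Hadoop source for the version in use.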