On 1/31/11 7:46 AM, "Laurent Laborde" <kerdez...@gmail.com> wrote:

>On Fri, Jan 28, 2011 at 8:05 AM, Laurent Laborde <kerdez...@gmail.com>
>wrote:
>> On Fri, Jan 28, 2011 at 1:12 AM, Namit Jain <nj...@fb.com> wrote:
>>> Hi Laurent,
>>>
>>> 1. Are you saying that _top.sql did not exist in the home directory?
>>> Or that _top.sql existed, but Hive was not able to read it after
>>> loading?
>>
>> It exists, it's loaded, and I can see it in Hive's warehouse
>> directory.
>> It's just impossible to query it.
>>
>>> 2. I don't think reserved words are documented anywhere. Can you file
>>> a JIRA for this?
>>
>> Ok; will do that today.
>>
>>> 3. The bad row is printed in the task log.
>>>
>>> 1. 2011-01-27 11:11:07,046 INFO org.apache.hadoop.fs.FSInputChecker:
>>> Found checksum error: b[1024, 1536]=
>>> 7374796c653d22666f6e742d73697a653a20313270743b223e3c623e266e6273703b2
>>> 66e6273703b266e6273703b202a202838302920416d69656e733a3c2f623e3c2f7370616e3e
>>> 3c2f7370616e3e5c6e20203c2f703e5c6e20203c703e5c6e202020203c7370616e207374796
>>> c653d22666f66742d66616d696c793a2068656c7665746963613b223e3c7370616e20737479
>>> 6c653d22666f6e742d73697a653a20313270743b223e3c623e266e6273703b266e6273703b2
>>> 66e6273703b266e6273703b266e6273703b266e6273703b266e6273703b266e6273703b266e
>>> 6273703b206f203132682c2050697175652d6e6971756520646576616e74206c65205265637
>>> 46f7261742e3c2f623e3c2f7370616e3e3c2f7370616e3e5c6e20203c2f703e5c6e20203c70
>>> 3e5c6e202020203c7370616e207374796c653d22666f6e742d66616d696c793a2068656c766
>>> 5746963613b223e3c7370616e207374796c653d22666f6e742d73697a653a20313270743b22
>>> 3e3c623e266e6273703b266e6273703b266e6273703b266e6273703b266e6273703b266e627
>>> 3703b266e6273703b266e6273703b266e6273703b206f2031346833302c204d6169736f6e20
>>> 6465206c612063756c747572652e3c2f623e3c2f7370616e3e3c2f7370616e3e5c6e20203c2
>>> f703e5c6e20203c703e5c6e202020203c7370616e207374796c653d
>>
>> Is this the actual data ?
>>
>>> 2. org.apache.hadoop.fs.ChecksumException: Checksum error:
>>> /blk_2466764552666222475:of:/user/hive/warehouse/article/article.copy
>>> at 23446528
>>
>> 23446528 is the line number ?
>>
>> thank you
>
>optional question (the previous ones are still open) :
>is there a way to tell hive to ignore invalid data ? (if the problem
>is invalid data)
>

Not currently.
At Facebook, we also had a requirement to ignore corrupt/bad data, but that
work has not been committed yet. Yongqiang, what is the JIRA number?
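
On the two open questions about the task log: the hex string appears to be
the raw bytes of the buffer span b[1024, 1536] printed as hex, and 23446528
is most likely a byte position in the file rather than a line number. Here is
a minimal Python sketch for checking both. The helper names are made up for
illustration, and the 512-byte checksum chunk size is an assumption (it is
the usual io.bytes.per.checksum default, and it matches the 512-byte span in
the log). The "foft-family" fragment is copied from the dump; comparing it
with the "font-family" that appears later in the same dump is only one
possible reading of where the corruption sits.

```python
# Hex fragments copied from the task log quoted in this thread.
FIRST_BYTES = "7374796c653d22666f6e742d73697a653a20313270743b223e3c623e266e6273703b"
SUSPECT = "666f66742d66616d696c79"   # appears once in the dump
EXPECTED = "666f6e742d66616d696c79"  # appears later in the same dump


def decode_hex_dump(hex_text: str) -> str:
    """Decode a hex dump to text, ignoring whitespace and a dangling nibble."""
    cleaned = "".join(hex_text.split())
    if len(cleaned) % 2:
        cleaned = cleaned[:-1]  # drop an incomplete trailing byte
    return bytes.fromhex(cleaned).decode("utf-8", errors="replace")


def differing_bits(a_hex: str, b_hex: str) -> int:
    """Count the bits that differ between two equal-length hex strings."""
    a, b = bytes.fromhex(a_hex), bytes.fromhex(b_hex)
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def checksum_chunk(position: int, bytes_per_checksum: int = 512):
    """Return (chunk index, offset inside chunk), assuming 512-byte chunks."""
    return divmod(position, bytes_per_checksum)


print(decode_hex_dump(FIRST_BYTES))       # start of the logged span as text
print(decode_hex_dump(SUSPECT))           # the fragment that looks damaged
print(differing_bits(SUSPECT, EXPECTED))  # how many bits separate the two
print(checksum_chunk(23446528))           # where the reported position falls
```

If the position lands exactly on a chunk boundary, that would fit the idea
that it marks the start of the 512-byte chunk whose checksum failed.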


Thanks,
-namit


>
>-- 
>Laurent "ker2x" Laborde
>Sysadmin & DBA at http://www.over-blog.com/
