Thank you for your replies. I reinstalled Hadoop and Hive, switched from Cloudera CDH3 to CDH2, and restarted everything from scratch. I have set io.skip.checksum.errors=true
and I still have the same error. :( What's wrong? The dataset comes from a PostgreSQL database and is consistent.

On Tue, Feb 1, 2011 at 6:57 AM, Aaron Kimball <akimbal...@gmail.com> wrote:
> In MapReduce, filenames that begin with an underscore are "hidden" files and
> are not enumerated by FileInputFormat (Hive, I believe, processes tables
> with TextInputFormat and SequenceFileInputFormat, both descendants of this
> class).
> Using "_foo" as a hidden/ignored filename is conventional in the Hadoop
> world. This is different than the UNIX convention of using ".foo", but
> that's software engineering for you. ;)
> This is unlikely to change soon; MapReduce emits files with names like
> "_SUCCESS" into directories to indicate successful job completion.
> Directories such as "_tmp" and "_logs" also appear in datasets, and are
> therefore ignored as input by MapReduce-based tools, but those metadata
> names are established in other projects.
> If you run 'hadoop fs -mv /path/to/_top.sql /path/to/top.sql', that should
> make things work for you.
> - Aaron
>
> On Mon, Jan 31, 2011 at 10:21 AM, yongqiang he <heyongqiang...@gmail.com> wrote:
>>
>> You can first try to set io.skip.checksum.errors to true, which will
>> ignore bad checksums.
>>
>> >> In facebook, we also had a requirement to ignore corrupt/bad data - but
>> >> it has not been committed yet. Yongqiang, what is the jira number?
>>
>> There seems to be no jira for this issue.
>>
>> thanks
>> yongqiang
>>
>> 2011/1/31 Namit Jain <nj...@fb.com>:
>> > On 1/31/11 7:46 AM, "Laurent Laborde" <kerdez...@gmail.com> wrote:
>> >
>> >> On Fri, Jan 28, 2011 at 8:05 AM, Laurent Laborde <kerdez...@gmail.com> wrote:
>> >>> On Fri, Jan 28, 2011 at 1:12 AM, Namit Jain <nj...@fb.com> wrote:
>> >>>> Hi Laurent,
>> >>>>
>> >>>> 1. Are you saying that _top.sql did not exist in the home directory?
>> >>>> Or that _top.sql existed, but Hive was not able to read it after loading?
>> >>>
>> >>> It exists, it's loaded, and I can see it in Hive's warehouse directory.
>> >>> It's just impossible to query it.
>> >>>
>> >>>> 2. I don't think reserved words are documented anywhere. Can you file
>> >>>> a jira for this?
>> >>>
>> >>> OK, I will do that today.
>> >>>
>> >>>> 3. The bad row is printed in the task log:
>> >>>>
>> >>>> 1. 2011-01-27 11:11:07,046 INFO org.apache.hadoop.fs.FSInputChecker: Found
>> >>>> checksum error: b[1024,
>> >>>> 1536]=7374796c653d22666f6e742d73697a653a20313270743b223e3c623e266e6273703b2
>> >>>> 66e6273703b266e6273703b202a202838302920416d69656e733a3c2f623e3c2f7370616e3e
>> >>>> 3c2f7370616e3e5c6e20203c2f703e5c6e20203c703e5c6e202020203c7370616e207374796
>> >>>> c653d22666f66742d66616d696c793a2068656c7665746963613b223e3c7370616e20737479
>> >>>> 6c653d22666f6e742d73697a653a20313270743b223e3c623e266e6273703b266e6273703b2
>> >>>> 66e6273703b266e6273703b266e6273703b266e6273703b266e6273703b266e6273703b266e
>> >>>> 6273703b206f203132682c2050697175652d6e6971756520646576616e74206c65205265637
>> >>>> 46f7261742e3c2f623e3c2f7370616e3e3c2f7370616e3e5c6e20203c2f703e5c6e20203c70
>> >>>> 3e5c6e202020203c7370616e207374796c653d22666f6e742d66616d696c793a2068656c766
>> >>>> 5746963613b223e3c7370616e207374796c653d22666f6e742d73697a653a20313270743b22
>> >>>> 3e3c623e266e6273703b266e6273703b266e6273703b266e6273703b266e6273703b266e627
>> >>>> 3703b266e6273703b266e6273703b266e6273703b206f2031346833302c204d6169736f6e20
>> >>>> 6465206c612063756c747572652e3c2f623e3c2f7370616e3e3c2f7370616e3e5c6e20203c2
>> >>>> f703e5c6e20203c703e5c6e202020203c7370616e207374796c653d
>> >>>
>> >>> Is this the actual data?
>> >>>
>> >>>> 2. org.apache.hadoop.fs.ChecksumException: Checksum error:
>> >>>> /blk_2466764552666222475:of:/user/hive/warehouse/article/article.copy at
>> >>>> 23446528
>> >>>
>> >>> Is 23446528 the line number?
>> >>>
>> >>> Thank you.
>> >>
>> >> An optional question (the previous ones are still open):
>> >> is there a way to tell Hive to ignore invalid data (if the problem
>> >> is invalid data)?
>> >
>> > Currently, not.
>> > In facebook, we also had a requirement to ignore corrupt/bad data - but
>> > it has not been committed yet. Yongqiang, what is the jira number?
>> >
>> > Thanks,
>> > -namit
>> >
>> >> --
>> >> Laurent "ker2x" Laborde
>> >> Sysadmin & DBA at http://www.over-blog.com/

--
Laurent "ker2x" Laborde
Sysadmin & DBA at http://www.over-blog.com/
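[Editor's note] For Yongqiang's suggestion, the property can be set per-session in the Hive CLI (`set io.skip.checksum.errors=true;`) or cluster-wide in the Hadoop configuration. A sketch of the configuration form, assuming only the property name given in the thread; daemons must be restarted for a file-based change to take effect:

```xml
<!-- Sketch: add to core-site.xml (property name taken from the thread;
     it tells readers to skip entries with bad checksums instead of
     throwing ChecksumException). -->
<property>
  <name>io.skip.checksum.errors</name>
  <value>true</value>
</property>
```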
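[Editor's note] The bad-row bytes quoted in Namit's log excerpt are plain hex-encoded ASCII, so they can be inspected directly. A minimal sketch, decoding just the first bytes of the dump (the hex prefix below is copied verbatim from the excerpt above); it shows the row is HTML markup from the article table:

```python
# Decode the leading bytes of the b[1024, 1536] hex dump quoted in the
# task log above. The dump is hex-encoded ASCII, so bytes.fromhex()
# recovers the row text directly.
hex_prefix = "7374796c653d22666f6e742d73697a653a20313270743b223e3c623e"
text = bytes.fromhex(hex_prefix).decode("ascii")
print(text)  # style="font-size: 12pt;"><b>
```

Decoding the rest of the dump the same way yields more of the HTML row ("* (80) Amiens: ... Pique-nique devant le Rectorat ..."), which is one way to answer the "Is this the actual data?" question above.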
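[Editor's note] Aaron's hidden-file rule can be sketched as follows. This is illustrative Python, not Hadoop's actual Java code (the real check lives in FileInputFormat's path filter): a file is skipped as input when its base name starts with "_" or ".".

```python
# Sketch of the hidden-file convention described above: MapReduce input
# formats skip files whose base name starts with "_" or ".".
# (Illustrative only; Hadoop's real filter is Java code in FileInputFormat.)
import os.path

def is_visible_to_input_format(path: str) -> bool:
    """Return True if an input format would enumerate this file as input."""
    name = os.path.basename(path)
    return not (name.startswith("_") or name.startswith("."))

print(is_visible_to_input_format("/user/hive/warehouse/top/_top.sql"))  # False
print(is_visible_to_input_format("/user/hive/warehouse/top/top.sql"))   # True
```

This is why renaming `_top.sql` to `top.sql` with `hadoop fs -mv` makes the data queryable.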