> This is definitely a curious problem.
>
It's data corruption. The file is tab-separated, so I put together a quick Perl
pipe to count the tabs on each line (tabs become dashes, and uniq -c tallies
each distinct pattern):
-bash-3.2$ hadoop fs -cat /user/hive/warehouse/ushb/2010-10-25/data-2010-10-25 \
    | perl -pe 's/[^\t\n]//g' | perl -pe 's/\t/-/g' | sort | uniq -c
The STDOUT was slightly disturbing:
1 --
1552318 -------
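For anyone who wants to run the same sanity check without Perl, the per-line
tab count can be sketched with awk. This is a minimal, self-contained version
against a made-up local file (sample.tsv stands in for the HDFS file above):

```shell
# Stand-in data: two well-formed rows (2 tabs each) and one short row (1 tab).
printf 'a\tb\tc\nd\te\tf\ng\th\n' > sample.tsv

# With -F'\t', NF is the number of tab-separated fields, so NF - 1 is the
# tab count per line; sort | uniq -c tallies how many lines share each count.
awk -F'\t' '{print NF - 1}' sample.tsv | sort | uniq -c
```

A healthy tab-separated file should produce exactly one count. Anything else
points at truncated or corrupt rows, which is what the lone `--` line above
revealed.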
The STDERR even more so:
11/05/04 11:07:49 INFO hdfs.DFSClient: No node available for block:
blk_-1511269407958713809_10494
file=/user/hive/warehouse/ushb/2010-10-25/data-2010-10-25
11/05/04 11:07:49 INFO hdfs.DFSClient: Could not obtain block
blk_-1511269407958713809_10494 from any node: java.io.IOException: No live
nodes contain current block. Will get new block locations from namenode and
retry...
11/05/04 11:07:52 INFO hdfs.DFSClient: No node available for block:
blk_-1511269407958713809_10494
file=/user/hive/warehouse/ushb/2010-10-25/data-2010-10-25
11/05/04 11:07:52 INFO hdfs.DFSClient: Could not obtain block
blk_-1511269407958713809_10494 from any node: java.io.IOException: No live
nodes contain current block. Will get new block locations from namenode and
retry...
11/05/04 11:07:58 WARN hdfs.DFSClient: DFS Read: java.io.IOException: Could
not obtain block: blk_-1511269407958713809_10494
file=/user/hive/warehouse/ushb/2010-10-25/data-2010-10-25
at
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.chooseDataNode(DFSClient.java:1977)
at
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1784)
at
org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1932)
at java.io.DataInputStream.read(DataInputStream.java:83)
(...etc)
cat: Could not obtain block: blk_-1511269407958713809_10494
file=/user/hive/warehouse/ushb/2010-10-25/data-2010-10-25
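In case it helps anyone hitting the same error: since the client can't find
any live node holding that block, a usual next step is to ask the namenode
where the file's blocks are supposed to live. A sketch using the stock HDFS
fsck tool (same path as above; needs to run against the live cluster):

```
-bash-3.2$ hadoop fsck /user/hive/warehouse/ushb/2010-10-25/data-2010-10-25 \
    -files -blocks -locations
```

If fsck reports the block as missing or corrupt on every replica, the data is
gone from HDFS's point of view, which would line up with the truncated read.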
--
Tim Ellis
Riot Games