[jira] Commented: (HDFS-1459) NullPointerException in DataInputStream.readInt

2010-10-24 Thread Hajo Nils Krabbenhöft (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12924306#action_12924306
 ] 

Hajo Nils Krabbenhöft commented on HDFS-1459:
-

I found this in my datanode logs:

2010-10-20 15:31:17,154 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.17.5.3:50010, 
storageID=DS-266784496-78.46.65.54-50010-1287004808819, infoPort=50075, 
ipcPort=50020):DataXceiver
java.io.IOException: xceiverCount 257 exceeds the limit of concurrent xcievers 
256
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:88)
at java.lang.Thread.run(Thread.java:619)

2010-10-20 15:31:19,115 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: 
DatanodeRegistration(10.17.5.3:50010, 
storageID=DS-266784496-78.46.65.54-50010-1287004808819, infoPort=50075, 
ipcPort=50020):Got exception while serving blk_-8099607957427967059_1974 to 
/10.17.5.4:
java.net.SocketTimeoutException: 48 millis timeout while waiting for 
channel to be ready for write. ch : java.nio.channels.SocketChannel[connected 
local=/10.17.5.3:50010 remote=/10.17.5.4:51336]
at 
org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
at 
org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:159)
at 
org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:198)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendChunks(BlockSender.java:313)
at 
org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:401)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:180)
at 
org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:95)
at java.lang.Thread.run(Thread.java:619)

So far, using the following configuration snippet seems to fix the problem:

<property>
  <name>dfs.datanode.handler.count</name>
  <value>40</value>
  <description>The number of server threads for the datanode.</description>
</property>

<property>
  <name>dfs.namenode.handler.count</name>
  <value>40</value>
  <description>The number of server threads for the namenode.</description>
</property>

<property>
  <name>dfs.datanode.max.xcievers</name>
  <value>2048</value>
  <description>The maximum number of threads that can be connected to a
  datanode simultaneously. Default value is 256.
  </description>
</property>


So the underlying problem seems to be that when the max xcievers limit is 
reached, the client is not notified and therefore reports unusable error 
messages.
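
For completeness, a minimal sketch to double-check which limits the daemons will 
actually pick up, assuming the stock org.apache.hadoop.conf.Configuration API and 
that the hdfs-site.xml containing the snippet above is on the classpath:

import org.apache.hadoop.conf.Configuration;

public class PrintDfsLimits {
  public static void main(String[] args) {
    Configuration conf = new Configuration();
    // Assumption: the hdfs-site.xml edited above is on the classpath.
    conf.addResource("hdfs-site.xml");

    // The fallbacks are the shipped defaults for this Hadoop line; note that
    // the property name really is spelled "xcievers".
    System.out.println("dfs.datanode.max.xcievers  = "
        + conf.getInt("dfs.datanode.max.xcievers", 256));
    System.out.println("dfs.datanode.handler.count = "
        + conf.getInt("dfs.datanode.handler.count", 3));
    System.out.println("dfs.namenode.handler.count = "
        + conf.getInt("dfs.namenode.handler.count", 10));
  }
}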

[jira] Commented: (HDFS-1459) NullPointerException in DataInputStream.readInt

2010-10-20 Thread Konstantin Boudnik (JIRA)

[ 
https://issues.apache.org/jira/browse/HDFS-1459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12923075#action_12923075
 ] 

Konstantin Boudnik commented on HDFS-1459:
--

Thanks for opening a new JIRA, Hajo. One thing: in the future, please try to 
limit the 'Description' field to a short, self-explanatory diagnosis of the 
problem and post any error messages, code snippets, etc. as 'Comment' messages.

 NullPointerException in DataInputStream.readInt
 ---

 Key: HDFS-1459
 URL: https://issues.apache.org/jira/browse/HDFS-1459
 Project: Hadoop HDFS
  Issue Type: Bug
Affects Versions: 0.20.1
 Environment: Debian 64 bit
 Cloudera Hadoop
Reporter: Hajo Nils Krabbenhöft

 First, here's my source code accessing the HDFS:
 final FSDataInputStream indexFile = getFile(bucketPathStr,
     Integer.toString(hashTableId) + ".index");
 indexFile.seek(bucketId * 4);
 int bucketStart = ByteSwapper.swap(indexFile.readInt());
 int bucketEnd = ByteSwapper.swap(indexFile.readInt());
 final FSDataInputStream dataFile = getFile(bucketPathStr,
     Integer.toString(hashTableId) + ".data");
 dataFile.seek(bucketStart * (2 + Hasher.getConfigHashLength()) * 4);
 for (int hash = bucketStart; hash < bucketEnd; hash++) {
   int RimageIdA = ByteSwapper.swap(dataFile.readInt());
   int RimageIdB = ByteSwapper.swap(dataFile.readInt());
   // ... read hash of length Hasher.getConfigHashLength() and work with it
 }
 As you can see, I read the row range from the X.index file and then read those 
 rows from X.data. The index file is always exactly 6,710,888 bytes in length.
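
 For clarity, the offset arithmetic the snippet above relies on, assuming 4-byte 
 ints throughout (the helper names are illustrative only, not from the code above):

 // Each index entry is a single 4-byte int, so bucket i's boundary sits at byte i * 4.
 static long indexOffset(long bucketId) {
   return bucketId * 4L;
 }

 // Each data record is (2 + hashLength) ints: two image ids plus the hash words.
 static long dataOffset(long row, int hashLength) {
   return row * (2L + hashLength) * 4L;
 }
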
 As for the data file, everything works fine with 50 different 1.35 GB (22-block) 
 data files, and it fails every time I try with 50 different 2.42 GB (39-block) 
 data files, so the bug clearly depends on the file size.
 I checked ulimit and the number of network connections, and neither is maxed out 
 when the error occurs. The stack trace I get is:
 java.lang.NullPointerException
   at 
 org.apache.hadoop.hdfs.DFSClient$DFSInputStream.readBuffer(DFSClient.java:1703)
   at 
 org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1755)
   at 
 org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1680)
   at java.io.DataInputStream.readInt(DataInputStream.java:370)
 ...
   at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:50)
   at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:358)
   at org.apache.hadoop.mapred.MapTask.run(MapTask.java:307)
   at org.apache.hadoop.mapred.Child.main(Child.java:170)
 which leads me to believe that DFSClient.blockSeekTo returns a non-null 
 chosenNode but leaves blockReader = null.
 Since the exact same jar works flawlessly with small data files and fails 
 reliably with big data files, I'm wondering how this could possibly depend on 
 the file's size or block count (DFSClient.java line 1628+):
 s = socketFactory.createSocket();
 NetUtils.connect(s, targetAddr, socketTimeout);
 s.setSoTimeout(socketTimeout);
 Block blk = targetBlock.getBlock();
 blockReader = BlockReader.newBlockReader(s, src, blk.getBlockId(), 
 blk.getGenerationStamp(),
 offsetIntoBlock, blk.getNumBytes() - offsetIntoBlock,
 buffersize, verifyChecksum, clientName);
 return chosenNode;
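
 If that hypothesis holds, a guard of roughly this shape after the newBlockReader 
 call (a sketch of the idea only, not the actual DFSClient code) would surface a 
 descriptive IOException instead of the bare NullPointerException above:

 blockReader = BlockReader.newBlockReader(s, src, blk.getBlockId(),
     blk.getGenerationStamp(), offsetIntoBlock,
     blk.getNumBytes() - offsetIntoBlock, buffersize, verifyChecksum, clientName);
 if (blockReader == null) {
   // Name the block and datanode we failed against (for example a datanode that
   // refused the connection because its xceiver limit was exhausted) rather than
   // letting readBuffer() dereference the null later.
   throw new IOException("Could not obtain BlockReader for block " + blk
       + " from datanode " + targetAddr);
 }
 return chosenNode;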

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.