[jira] [Resolved] (HDFS-10502) Enabled memory locking and now HDFS won't start up

Chris Nauroth (JIRA) Wed, 08 Jun 2016 09:56:09 -0700

     [ https://issues.apache.org/jira/browse/HDFS-10502?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]


Chris Nauroth resolved HDFS-10502.
----------------------------------
    Resolution: Invalid

Hello [~machey].  I recommend taking these questions to the 
u...@hadoop.apache.org mailing list.  We use JIRA for tracking confirmed bugs 
and feature requests.  We use u...@hadoop.apache.org for usage advice and 
troubleshooting.

Regarding whether or not this is a recommended approach, I think it depends on 
a few other factors.  Is the intent to use these cached files from Hadoop 
workloads, such as MapReduce jobs or Hive queries?  If not, then I wonder if 
your use case might be better served by something more directly focused on 
general caching use cases, such as Redis or memcached.  If your use case does 
involve Hadoop integration, then certainly Centralized Cache Management is 
worth exploring.

Regarding the timeouts, I can tell from the exception that this is the 
heartbeat RPC sent from the DataNode to the NameNode.  I recommend 
investigating connectivity between the DataNode and the NameNode and examining 
the logs from both sides to try to determine if something is going wrong in the 
handling of the heartbeat message.  On one hand, a heartbeat timeout is not an 
error condition that is specific to Centralized Cache Management.  It could 
happen whether or not you're using that feature.  On the other hand, the 
heartbeat message does contain some optional information about the state of 
cache capacity and current usage at the DataNode.  That information would 
trigger special handling logic at the NameNode side, so I suppose there is a 
chance that something in that logic is hanging up the heartbeat handling.  
Investigating the logs might reveal more.
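
As a first step in the connectivity check described above, a small probe can 
confirm that the NameNode RPC port accepts TCP connections at all.  The 
following is only a minimal sketch: the class name RpcPortProbe and its probe 
method are hypothetical and not part of Hadoop, and the default of 
localhost:8020 is taken from the exception in the report.  Note that the 
reported trace shows a socket that was already connected and then timed out on 
read, so a successful connect only rules out basic reachability problems.  It 
does not show that the NameNode is processing heartbeats promptly.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper, not part of Hadoop: checks whether a TCP connection
// to a host:port (for example the NameNode RPC endpoint) can be opened.
public class RpcPortProbe {

    /**
     * Tries to open a TCP connection within timeoutMillis.
     * Returns the elapsed connect time in milliseconds, or -1 if the
     * connection was refused, timed out, or the host could not be resolved.
     */
    static long probe(String host, int port, int timeoutMillis) {
        long start = System.nanoTime();
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return (System.nanoTime() - start) / 1_000_000L;
        } catch (IOException e) {
            return -1L;
        }
    }

    public static void main(String[] args) {
        // Defaults mirror the endpoint shown in the reported exception.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 8020;

        long elapsedMillis = probe(host, port, 5000);
        if (elapsedMillis >= 0) {
            System.out.println("Connected to " + host + ":" + port
                    + " in " + elapsedMillis + " ms");
        } else {
            System.out.println("Could not connect to " + host + ":" + port);
        }
    }
}
```

If the probe connects quickly while the DataNode still logs 60000 ms read 
timeouts, the NameNode logs and a thread dump of the NameNode process are the 
more likely places to find the cause.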

u...@hadoop.apache.org would be a good forum for further discussion of both of 
these topics.

> Enabled memory locking and now HDFS won't start up
> --------------------------------------------------
>
>                 Key: HDFS-10502
>                 URL: https://issues.apache.org/jira/browse/HDFS-10502
>             Project: Hadoop HDFS
>          Issue Type: Bug
>          Components: fs
>    Affects Versions: 2.7.2
>         Environment: RHEL 6.8
>            Reporter: Chris Machemer
>
> My goal is to speed up reads.  I have about 500k small files (2k to 15k) and 
> I'm trying to use HDFS as a cache for serialized instances of Java objects.
>
> I've written the code to construct and serialize all the objects out to HDFS, 
> and am now hoping to improve read performance, because accessing the objects 
> from disk-based storage is proving to be too slow for my application's SLAs.
>
> So my first question is: is using memory locking and hdfs cacheadmin pools 
> and directives the right way to cache my objects in memory, or should I 
> create RAM disks and use memory-based storage instead?
>
> If hdfs cacheadmin is the way to go (it's the path I'm going down so far), 
> then I need to figure out whether what's happening is a bug or whether I've 
> configured something wrong.  When I start up HDFS with a gig of memory locked 
> (both in limits.d for ulimit -l and also in hdfs-site.xml), the server 
> starts up and presumably tries to cache things into memory, and I get hours 
> and hours of timeouts in the logs like this:
>
> 2016-06-08 07:42:50,856 WARN org.apache.hadoop.hdfs.server.datanode.DataNode: IOException in offerService
> java.net.SocketTimeoutException: Call From stgb-fe1.litle.com/10.1.9.66 to localhost:8020 failed on socket timeout exception: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:51647 remote=localhost/127.0.0.1:8020]; For more details see:  http://wiki.apache.org/hadoop/SocketTimeout
>       at sun.reflect.GeneratedConstructorAccessor8.newInstance(Unknown Source)
>       at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>       at java.lang.reflect.Constructor.newInstance(Constructor.java:526)
>       at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:792)
>       at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:751)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1479)
>       at org.apache.hadoop.ipc.Client.call(Client.java:1412)
>       at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:229)
>       at com.sun.proxy.$Proxy13.sendHeartbeat(Unknown Source)
>       at org.apache.hadoop.hdfs.protocolPB.DatanodeProtocolClientSideTranslatorPB.sendHeartbeat(DatanodeProtocolClientSideTranslatorPB.java:153)
>       at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.sendHeartBeat(BPServiceActor.java:554)
>       at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.offerService(BPServiceActor.java:653)
>       at org.apache.hadoop.hdfs.server.datanode.BPServiceActor.run(BPServiceActor.java:824)
>       at java.lang.Thread.run(Thread.java:745)
> Caused by: java.net.SocketTimeoutException: 60000 millis timeout while waiting for channel to be ready for read. ch : java.nio.channels.SocketChannel[connected local=/127.0.0.1:51647 remote=localhost/127.0.0.1:8020]
>       at org.apache.hadoop.net.SocketIOWithTimeout.doIO(SocketIOWithTimeout.java:164)
>       at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:161)
>       at org.apache.hadoop.net.SocketInputStream.read(SocketInputStream.java:131)
>       at java.io.FilterInputStream.read(FilterInputStream.java:133)
>       at java.io.FilterInputStream.read(FilterInputStream.java:133)
>       at org.apache.hadoop.ipc.Client$Connection$PingInputStream.read(Client.java:520)
>       at java.io.BufferedInputStream.fill(BufferedInputStream.java:235)
>       at java.io.BufferedInputStream.read(BufferedInputStream.java:254)
>       at java.io.DataInputStream.readInt(DataInputStream.java:387)
>       at org.apache.hadoop.ipc.Client$Connection.receiveRpcResponse(Client.java:1084)
>       at org.apache.hadoop.ipc.Client$Connection.run(Client.java:979)



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
