[jira] [Commented] (HBASE-9393) Hbase does not closing a closed socket resulting in many CLOSE_WAIT
[ https://issues.apache.org/jira/browse/HBASE-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15105939#comment-15105939 ] Colin Patrick McCabe commented on HBASE-9393: -
Unfortunately, this is kind of a complex topic. In HDFS, sockets for input streams are managed by the {{Peer}} class. {{Peers}} can either be "owned" by {{DFSInputStream}} objects, or stored in the {{PeerCache}}. The {{PeerCache}} already has appropriate timeouts and won't keep open too many sockets. However, there is no limit to how long a {{DFSInputStream}} could hold on to a {{Peer}}.

There are a few ways to minimize the number of open peers.
1. If HBase only ever called positional read (pread), the {{DFSInputStream}} object would never own a {{Peer}}, so this issue would not arise.
2. If HBase called {{DFSInputStream#unbuffer}}, any open peers would be closed, even though the stream would continue to be open.
3. If HDFS had a timeout for how long it would hold onto a {{Peer}}, that could limit the number of open sockets.

Configuring HBase to periodically close open streams is not necessary; it's strictly worse than option #2. I believe there is an option to do #1 even right now. Can't HBase be configured to use only pread and never stateful read? #2 would require a code change to HBase; #3 would require a code change to HDFS.

Are you running out of file descriptors? What's the user-visible problem here?

> Hbase does not closing a closed socket resulting in many CLOSE_WAIT
> Key: HBASE-9393
> URL: https://issues.apache.org/jira/browse/HBASE-9393
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.94.2, 0.98.0
> Environment: Centos 6.4 - 7 regionservers/datanodes, 8 TB per node, 7279 regions
> Reporter: Avi Zrachya
>
> HBase does not close a dead connection with the datanode.
> This results in over 60K CLOSE_WAIT sockets, and at some point HBase can not connect to the datanode because there are too many mapped sockets from one host to another on the same port.
> The example below is with a low CLOSE_WAIT count because we had to restart hbase to solve the problem; later in time it will increase to 60-100K sockets in CLOSE_WAIT
> [root@hd2-region3 ~]# netstat -nap |grep CLOSE_WAIT |grep 21592 |wc -l
> 13156
> [root@hd2-region3 ~]# ps -ef |grep 21592
> root 17255 17219 0 12:26 pts/0 00:00:00 grep 21592
> hbase 21592 1 17 Aug29 ? 03:29:06 /usr/java/jdk1.6.0_26/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx8000m -ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode -Dhbase.log.dir=/var/log/hbase -Dhbase.log.file=hbase-hbase-regionserver-hd2-region3.swnet.corp.log ...

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
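The bounded-cache behavior described for the {{PeerCache}} can be illustrated with a toy model (this is not the HDFS implementation; the class name and capacity below are made up for illustration): an access-ordered map that evicts the least-recently-used idle entry once capacity is exceeded, so idle sockets can never pile up without limit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Toy stand-in for a bounded peer/socket cache: once more than
// `capacity` idle entries are cached, the least-recently-used one
// is evicted (and its socket would be closed at that point).
class BoundedPeerCache<K, V> extends LinkedHashMap<K, V> {
    private final int capacity;

    BoundedPeerCache(int capacity) {
        super(16, 0.75f, true); // true = access order, giving LRU eviction
        this.capacity = capacity;
    }

    @Override
    protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
        return size() > capacity;
    }
}
```

A {{DFSInputStream}} that "owns" a peer bypasses any such bound, which is exactly why the options above focus on getting peers back into (or out of) the cache.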
[jira] [Commented] (HBASE-9393) Hbase does not closing a closed socket resulting in many CLOSE_WAIT
[ https://issues.apache.org/jira/browse/HBASE-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15102383#comment-15102383 ] Colin Patrick McCabe commented on HBASE-9393: -
The timeout that I'm talking about is inside DFSClient.java, not inside HBase. HDFS-4911 fixed a problem where the timeout was too long. Can you be a little bit clearer on what you'd like to implement, and what you see as the problem here?
[jira] [Commented] (HBASE-9393) Hbase does not closing a closed socket resulting in many CLOSE_WAIT
[ https://issues.apache.org/jira/browse/HBASE-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15098796#comment-15098796 ] Colin Patrick McCabe commented on HBASE-9393: -
The client should be configured so that it closes sockets a short time after the server does. In other words, its timeout should be slightly longer than the server's. Suggest checking your timeout configuration (this was too long in older versions of Hadoop).
[jira] [Updated] (HBASE-14451) Move on to htrace-4.0.1 (from htrace-3.2.0)
[ https://issues.apache.org/jira/browse/HBASE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HBASE-14451: -
Summary: Move on to htrace-4.0.1 (from htrace-3.2.0) (was: Move on to htrace-4.0.0 (from htrace-3.2.0))

> Move on to htrace-4.0.1 (from htrace-3.2.0)
> Key: HBASE-14451
> URL: https://issues.apache.org/jira/browse/HBASE-14451
> Project: HBase
> Issue Type: Task
> Reporter: stack
> Assignee: stack
> Attachments: 14451.txt, 14451v2.txt, 14451v3.txt, 14451v4.txt, 14451v5.txt, 14451v6.txt, 14451v7.txt, 14451v8.txt, 14451v9.txt
>
> htrace-4.0.0 was just released with a new API. Get up on it.
[jira] [Commented] (HBASE-14451) Move on to htrace-4.0.0 (from htrace-3.2.0)
[ https://issues.apache.org/jira/browse/HBASE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14903324#comment-14903324 ] Colin Patrick McCabe commented on HBASE-14451: -
Thanks for this, [~stack].

{{ResultBoundedCompletionService}}: it seems like {{Tracer}} should be an argument to the constructor here, rather than pulled from {{Tracer#curThreadTracer}}.

{code}
requestHeaderBuilder.setTraceInfo(TracingProtos.RPCTInfo.newBuilder().
    setParentId(spanId.getHigh()).
    setTraceId(spanId.getLow()));
{code}

Not sure if it matters, but for consistency, we should probably set {{TraceId}} to {{spanId#getHigh}}, since those are the 64 bits that are conserved between parent and child (in single-parent scenarios). Same comment in {{RpcClientImpl.java}}.

{code}
protected void tracedWriteRequest(Call call, int priority, TraceScope traceScope) throws IOException {
  try {
    writeRequest(call, priority, traceScope);
  } finally {
    if (traceScope != null) traceScope.close();
  }
}
{code}

Do we need this method any more? It seems like the calls to {{writeRequest}} are already wrapped in try...catch blocks that we could use a traceScope with.

{{RpcClientImpl.java}}: there is a lot of awkwardness here with trying to get the current thread tracer. Shouldn't the {{RpcClientImpl}} have its own {{Tracer}} object internally and just use that for everything? Same comment for {{RecoverableZooKeeper}}.

{{hbase-default.xml}}: should we also document {{hbase.htrace.sampler.classes}}?

In general, {{Tracer#curThreadTracer}} is a hack. It may be helpful in some legacy code, but in general you should pass tracers around "normally" -- i.e. when the {{HRegionServer}} creates objects to do what it needs to do, it should pass them its own tracer. Remember that worker threads won't have a current tracer when they're first created. It is always safer and cleaner to pass the {{Tracer}} object you want in explicitly than to rely on {{curThreadTracer}}.
I don't see where we're creating the Tracer for the HBase client. I only see us creating a tracer for the RegionServer.

{code}
cmccabe@keter:~/hbase1> git grep Tracer.Builder
hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java: this.tracer = new Tracer.Builder("RegionServer").
hbase-server/src/test/java/org/apache/hadoop/hbase/PerformanceEvaluation.java: this.tracer = new Tracer.Builder().name("Client").
hbase-server/src/test/java/org/apache/hadoop/hbase/trace/TestHTraceHooks.java: new Tracer.Builder().name("test").conf(new HBaseHTraceConfiguration(conf)).build()) {
{code}

(Note that the second two grep results are unit tests, and so don't count here.) We should trace the HBaseClient as well as the region server. And probably we need another tracer for the HBase Master?

RingBufferTruck: I thought HBase was more of a series of tubes?
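The point about worker threads is easy to demonstrate with a self-contained sketch ({{MiniTracer}} and the thread-local below are stand-ins for illustration, not the htrace API): a "current tracer" stored in a thread-local is invisible to freshly created threads, whereas a constructor-injected tracer travels with the object that owns it.

```java
// Stand-ins for illustration only -- not the htrace API.
class MiniTracer {
    final String name;
    MiniTracer(String name) { this.name = name; }
}

class TracerDemo {
    // The "curThreadTracer" pattern: a thread-local holding the tracer.
    static final ThreadLocal<MiniTracer> CUR = new ThreadLocal<>();

    // Returns what a freshly created worker thread sees as its
    // "current tracer" -- which is nothing, because a plain
    // ThreadLocal is not inherited by child threads.
    static MiniTracer workerSeesTracer() {
        MiniTracer[] seen = new MiniTracer[1];
        Thread worker = new Thread(() -> seen[0] = CUR.get());
        worker.start();
        try {
            worker.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return seen[0];
    }

    // The injected pattern: the tracer is a constructor argument,
    // so it is available no matter which thread uses the object.
    static class RpcClient {
        final MiniTracer tracer;
        RpcClient(MiniTracer tracer) { this.tracer = tracer; }
    }
}
```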
[jira] [Commented] (HBASE-14451) Move on to htrace-4.0.0 (from htrace-3.2.0)
[ https://issues.apache.org/jira/browse/HBASE-14451?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14901651#comment-14901651 ] Colin Patrick McCabe commented on HBASE-14451: -
[~stack]: right, to get "cross-cutting" tracing with HTrace you need to have the same major version in all your components. You can still get tracing in just HBase with any version you choose. Hadoop 2.8 will have htrace 4.0.
[jira] [Commented] (HBASE-9393) Hbase does not closing a closed socket resulting in many CLOSE_WAIT
[ https://issues.apache.org/jira/browse/HBASE-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14535512#comment-14535512 ] Colin Patrick McCabe commented on HBASE-9393: -
CDH4.4 had some configuration defaults that weren't the best; they were improved in later versions. It is getting pretty old now, so I would suggest just upgrading. If that's not possible, then you could check out some of the recent HBaseCon talks about tuning HBase and HDFS performance. I think this jira should be closed, since I don't see any bug here. If we get more information about something specific we could improve, we can reopen it.
[jira] [Commented] (HBASE-13060) Don't use deprecated HTrace API addKVAnnotation(byte[], byte[])
[ https://issues.apache.org/jira/browse/HBASE-13060?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14324981#comment-14324981 ] Colin Patrick McCabe commented on HBASE-13060: -
It looks like the String-based API didn't make it into HTrace 3.1.0. I guess we will have to wait on this one.

> Don't use deprecated HTrace API addKVAnnotation(byte[], byte[])
> Key: HBASE-13060
> URL: https://issues.apache.org/jira/browse/HBASE-13060
> Project: HBase
> Issue Type: Bug
> Reporter: Colin Patrick McCabe
> Priority: Critical
>
> Let's avoid using the deprecated HTrace API addKVAnnotation(byte[], byte[]).
[jira] [Created] (HBASE-13060) Don't use deprecated HTrace API addKVAnnotation(byte[], byte[])
Colin Patrick McCabe created HBASE-13060:
Summary: Don't use deprecated HTrace API addKVAnnotation(byte[], byte[])
Key: HBASE-13060
URL: https://issues.apache.org/jira/browse/HBASE-13060
Project: HBase
Issue Type: Bug
Reporter: Colin Patrick McCabe
Priority: Critical

Let's avoid using the deprecated HTrace API addKVAnnotation(byte[], byte[]).
[jira] [Updated] (HBASE-12899) HBase should prefix htrace configuration keys with "hbase.htrace" rather than just "hbase."
[ https://issues.apache.org/jira/browse/HBASE-12899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HBASE-12899: -
Attachment: HBASE-12899.002.patch

OK. We can set the new configuration keys when we see the old ones, and print a warning, if that's helpful. Here is a new patch that does this. We know which config keys existed in the pre-3.1 world, so we can just include deprecations for those.

> HBase should prefix htrace configuration keys with "hbase.htrace" rather than just "hbase."
> Key: HBASE-12899
> URL: https://issues.apache.org/jira/browse/HBASE-12899
> Project: HBase
> Issue Type: Improvement
> Affects Versions: 1.0.0
> Reporter: Colin Patrick McCabe
> Attachments: HBASE-12899.001.patch, HBASE-12899.002.patch
>
> In Hadoop, we pass all configuration keys starting with "hadoop.htrace" to htrace. So "hadoop.htrace.sampler.fraction" gets passed to HTrace as sampler.fraction, and so forth.
> For consistency, it seems like HBase should prefix htrace configuration keys with "hbase.htrace" rather than just "hbase."
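The deprecation approach described in that comment can be sketched as follows (the specific old/new key names below are illustrative, not the exact set from the patch): copy each recognized pre-3.1 key to its "hbase.htrace." counterpart when the new key is absent, and emit a warning.

```java
import java.util.HashMap;
import java.util.Map;

class HTraceKeyDeprecation {
    // Illustrative mapping from pre-3.1 keys to the new prefixed keys;
    // the real patch would enumerate the actual legacy keys.
    static final Map<String, String> OLD_TO_NEW = new HashMap<>();
    static {
        OLD_TO_NEW.put("hbase.sampler.fraction", "hbase.htrace.sampler.fraction");
        OLD_TO_NEW.put("hbase.spanreceiver.classes", "hbase.htrace.spanreceiver.classes");
    }

    // Returns a copy of the configuration with deprecated keys
    // mirrored onto their new names; new keys always win.
    static Map<String, String> applyDeprecations(Map<String, String> conf) {
        Map<String, String> out = new HashMap<>(conf);
        for (Map.Entry<String, String> e : OLD_TO_NEW.entrySet()) {
            String oldKey = e.getKey(), newKey = e.getValue();
            if (out.containsKey(oldKey) && !out.containsKey(newKey)) {
                out.put(newKey, out.get(oldKey));
                System.err.println("Config key " + oldKey
                    + " is deprecated; use " + newKey + " instead");
            }
        }
        return out;
    }
}
```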
[jira] [Commented] (HBASE-12899) HBase should prefix htrace configuration keys with "hbase.htrace" rather than just "hbase."
[ https://issues.apache.org/jira/browse/HBASE-12899?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14286476#comment-14286476 ] Colin Patrick McCabe commented on HBASE-12899: -
Thanks for the review, Nick. HTrace-for-hbase isn't really deployed in production yet. I would be surprised if anyone but a handful of devs had used this. I think now is the right time to change the prefix so that we don't get conflicts between htrace and hbase configuration keys. If we start with compatibility shims, we'll have to carry them around forever (speaking from Hadoop experience) and there's little benefit. We only just got an HTrace release last week! :)
[jira] [Updated] (HBASE-12899) HBase should prefix htrace configuration keys with "hbase.htrace" rather than just "hbase."
[ https://issues.apache.org/jira/browse/HBASE-12899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HBASE-12899: -
Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-12899) HBase should prefix htrace configuration keys with "hbase.htrace" rather than just "hbase."
[ https://issues.apache.org/jira/browse/HBASE-12899?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HBASE-12899: -
Attachment: HBASE-12899.001.patch
[jira] [Created] (HBASE-12899) HBase should prefix htrace configuration keys with "hbase.htrace" rather than just "hbase."
Colin Patrick McCabe created HBASE-12899:
Summary: HBase should prefix htrace configuration keys with "hbase.htrace" rather than just "hbase."
Key: HBASE-12899
URL: https://issues.apache.org/jira/browse/HBASE-12899
Project: HBase
Issue Type: Improvement
Affects Versions: 1.0.0
Reporter: Colin Patrick McCabe

In Hadoop, we pass all configuration keys starting with "hadoop.htrace" to htrace. So "hadoop.htrace.sampler.fraction" gets passed to HTrace as sampler.fraction, and so forth.
For consistency, it seems like HBase should prefix htrace configuration keys with "hbase.htrace" rather than just "hbase."
[jira] [Commented] (HBASE-9393) Hbase does not closing a closed socket resulting in many CLOSE_WAIT
[ https://issues.apache.org/jira/browse/HBASE-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14118943#comment-14118943 ] Colin Patrick McCabe commented on HBASE-9393: -
Best guess is that you didn't apply your configuration to HBase, which is the DFSClient in this scenario. Suggest posting to hdfs-u...@apache.org
[jira] [Commented] (HBASE-10689) Explore advisory caching for MR over snapshot scans
[ https://issues.apache.org/jira/browse/HBASE-10689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13926010#comment-13926010 ] Colin Patrick McCabe commented on HBASE-10689: -- No problem. I agree it can get a bit confusing. It would be nice to see some numbers from tweaking readahead on HBase when you guys get a chance. I guess the gain will depend partly on how much caching HBase is doing. If HBase is caching that extra 4 MB that it read, then it's not such a loss. If it's throwing that away, then making readahead shorter may be a big gain. > Explore advisory caching for MR over snapshot scans > --- > > Key: HBASE-10689 > URL: https://issues.apache.org/jira/browse/HBASE-10689 > Project: HBase > Issue Type: Improvement > Components: mapreduce, Performance >Reporter: Nick Dimiduk > > Per > [comment|https://issues.apache.org/jira/browse/HBASE-10660?focusedCommentId=13921730&page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-13921730] > on HBASE-10660, explore using the new HDFS advisory caching feature > introduced in HDFS-4817 for TableSnapshotInputFormat. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10689) Explore advisory caching for MR over snapshot scans
[ https://issues.apache.org/jira/browse/HBASE-10689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13925386#comment-13925386 ] Colin Patrick McCabe commented on HBASE-10689: -
[~stack], there are multiple kinds of caching in HDFS. The path-based caching added in HDFS-4949 caches at the file level, so you are right that it is not that useful for HBase. The advisory caching API is a little different. It allows the application to control how much readahead HDFS does, and a little bit about how the page cache is used. When HBase reads a 64 KB chunk, HDFS will currently load a 4 MB segment off the disk. The rest of that 4 MB is thrown away unless HBase uses it. HBase could avoid this issue by calling DFSInputStream#setReadahead(65536). Unless HBase is doing something smart with the rest of that 4 MB, it seems like this might be a good idea?
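Back-of-envelope arithmetic for the 64 KB read / 4 MB readahead scenario described above (a rough model; it ignores any sequential reuse of the readahead window): if only the requested chunk is consumed, over 98% of each disk read is thrown away.

```java
class ReadaheadWaste {
    static final long READAHEAD_BYTES = 4L * 1024 * 1024; // 4 MB readahead window
    static final long CHUNK_BYTES = 64L * 1024;           // one 64 KB HBase read

    // Bytes loaded from disk but never handed to the application,
    // assuming the rest of the readahead window goes unused.
    static long wastedBytes() {
        return READAHEAD_BYTES - CHUNK_BYTES;
    }

    static double wastedFraction() {
        return (double) wastedBytes() / READAHEAD_BYTES;
    }
}
```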
[jira] [Commented] (HBASE-10052) use HDFS advisory caching to avoid caching HFiles that are not going to be read again (because they are being compacted)
[ https://issues.apache.org/jira/browse/HBASE-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13840355#comment-13840355 ] Colin Patrick McCabe commented on HBASE-10052: -
bq. One thing to be wary of: during the compaction, readers are still accessing the old files, so if you're compacting large files, this could really hurt read latency during compactions (assuming that people are relying on linux LRU in addition to hbase-internal LRU for performance).

That's a fair point.

bq. In most cases, as soon as the compaction is complete, we end up removing the input files anyway (thus removing from cache), right?

Unlinking a file doesn't remove that file from the buffer cache. If the unlinked file is no longer referenced (certainly the case here), it will be removed over time, as other things evict it. In the meantime, having those pages buffered means that something else isn't.

When doing the fadvise work, I remember us coming up with a crude hack that did fadvise from HBase during compactions and seeing some performance gain. But it seems like it might be workload-dependent. It's a shame that there isn't a way to tell Linux to do a read without caching; that's really what we want here. Instead, we just have a way of nuking the cache for a range of the file if it exists, which is not at all the same thing. I took a look at the Linux source tree again today, and {{FADV_NOREUSE}} was still a no-op :(

bq. Hmm, ok, moving out until we have something with a quantified benefit.

Yeah, it would be interesting to see some test numbers. I also wonder if we could somehow quantify how often the HBase LRU hits.
> use HDFS advisory caching to avoid caching HFiles that are not going to be > read again (because they are being compacted) > > > Key: HBASE-10052 > URL: https://issues.apache.org/jira/browse/HBASE-10052 > Project: HBase > Issue Type: Improvement >Reporter: Colin Patrick McCabe >Assignee: Andrew Purtell >Priority: Minor > Fix For: 0.98.1, 0.99.0 > > > HBase can benefit from doing dropbehind during compaction since compacted > files are not read again. HDFS advisory caching, introduced in HDFS-4817, > can help here. The right API here is {{DataInputStream#setDropBehind}}. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-10052) use HDFS advisory caching to avoid caching HFiles that are not going to be read again (because they are being compacted)
[ https://issues.apache.org/jira/browse/HBASE-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13838543#comment-13838543 ] Colin Patrick McCabe commented on HBASE-10052: -
[~enis] It would be interesting to experiment with using drop-behind on HBase's block files. However, in my experiments at least, this wasn't a performance win, since HBase still relies on the OS page cache in some cases. It's been a while since I did them, though.
[jira] [Commented] (HBASE-10052) use HDFS advisory caching to avoid caching HFiles that are not going to be read again (because they are being compacted)
[ https://issues.apache.org/jira/browse/HBASE-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834477#comment-13834477 ] Colin Patrick McCabe commented on HBASE-10052: -
[~andrew.purt...@gmail.com] you can take it if you want
[jira] [Updated] (HBASE-10052) use HDFS advisory caching to avoid caching HFiles that are not going to be read again (because they are being compacted)
[ https://issues.apache.org/jira/browse/HBASE-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Colin Patrick McCabe updated HBASE-10052: -
Description: HBase can benefit from doing dropbehind during compaction since compacted files are not read again. HDFS advisory caching, introduced in HDFS-4817, can help here. The right API here is {{DataInputStream#setDropBehind}}. (was: HBase can benefit from doing dropbehind during compaction since compacted files are not read again. HDFS advisory caching, introduced in HDFS-4817, can help here. The right API here is {{DataOutputStream#setDropBehind}}.)
[jira] [Commented] (HBASE-10052) use HDFS advisory caching to avoid caching HFiles that are not going to be read again (because they are being compacted)
[ https://issues.apache.org/jira/browse/HBASE-10052?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13834203#comment-13834203 ] Colin Patrick McCabe commented on HBASE-10052: -
[~andrew.purt...@gmail.com]: sorry, I did mean DFSInputStream, since the files not being used again would be the compactees. And yeah, reflection would be a good way to do this and support older Hadoops (see the {{CanSetDropBehind}} interface)
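The reflection idea mentioned in that comment can be sketched like this ({{FakeStream}} below is a stand-in for illustration; the real target would be a Hadoop {{FSDataInputStream}}): look up {{setDropBehind(Boolean)}} at runtime and treat its absence as "no hint available" rather than an error, so the same code runs against older Hadoop versions.

```java
import java.lang.reflect.Method;

class DropBehindUtil {
    // Invoke setDropBehind(Boolean) reflectively if the stream supports it;
    // returns false (and does nothing) when the method is absent, as on
    // Hadoop versions without advisory caching.
    static boolean trySetDropBehind(Object stream, boolean drop) {
        try {
            Method m = stream.getClass().getMethod("setDropBehind", Boolean.class);
            m.invoke(stream, drop);
            return true;
        } catch (ReflectiveOperationException e) {
            return false;
        }
    }
}

// Stand-in for a stream that does expose the method.
class FakeStream {
    Boolean dropBehind;
    public void setDropBehind(Boolean drop) { this.dropBehind = drop; }
}
```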
[jira] [Created] (HBASE-10052) use HDFS advisory caching to avoid caching HFiles that are not going to be read again (because they are being compacted)
Colin Patrick McCabe created HBASE-10052: Summary: use HDFS advisory caching to avoid caching HFiles that are not going to be read again (because they are being compacted) Key: HBASE-10052 URL: https://issues.apache.org/jira/browse/HBASE-10052 Project: HBase Issue Type: Improvement Reporter: Colin Patrick McCabe HBase can benefit from doing dropbehind during compaction since compacted files are not read again. HDFS advisory caching, introduced in HDFS-4817, can help here. The right API here is {{DataOutputStream#setDropBehind}}.
[jira] [Commented] (HBASE-9393) Hbase does not close a closed socket resulting in many CLOSE_WAIT
[ https://issues.apache.org/jira/browse/HBASE-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792938#comment-13792938 ] Colin Patrick McCabe commented on HBASE-9393: - I guess I should also explain why this doesn't happen in branch-1 of Hadoop. The reason is that Hadoop-1 had no socket cache and no grace period before the sockets were closed. The client simply opened a new socket each time, performed the op, and then closed it. This would result in (basically) no sockets in {{CLOSE_WAIT}}. Remember, a socket sits in {{CLOSE_WAIT}} when the peer has closed its end of the connection and the local side has not yet called {{close}}. Keeping sockets open is an optimization, but one that may require you to raise your maximum number of file descriptors. If you are not happy with this tradeoff, you can set {{dfs.client.socketcache.capacity}} to {{0}} and {{dfs.datanode.socket.reuse.keepalive}} to {{0}} to get the old branch-1 behavior. It will be slower, though. > Hbase does not close a closed socket resulting in many CLOSE_WAIT > > > Key: HBASE-9393 > URL: https://issues.apache.org/jira/browse/HBASE-9393 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.2 > Environment: Centos 6.4 - 7 regionservers/datanodes, 8 TB per node, > 7279 regions >Reporter: Avi Zrachya > > HBase does not close a dead connection with the datanode. > This results in over 60K CLOSE_WAIT sockets, and at some point HBase can not connect > to the datanode because too many mapped sockets from one host to another on > the same port. 
> The example below is with a low CLOSE_WAIT count because we had to restart > hbase to solve the problem; later in time it will increase to 60-100K sockets > in CLOSE_WAIT > [root@hd2-region3 ~]# netstat -nap |grep CLOSE_WAIT |grep 21592 |wc -l > 13156 > [root@hd2-region3 ~]# ps -ef |grep 21592 > root 17255 17219 0 12:26 pts/000:00:00 grep 21592 > hbase21592 1 17 Aug29 ?03:29:06 > /usr/java/jdk1.6.0_26/bin/java -XX:OnOutOfMemoryError=kill -9 %p -Xmx8000m > -ea -XX:+UseConcMarkSweepGC -XX:+CMSIncrementalMode > -Dhbase.log.dir=/var/log/hbase > -Dhbase.log.file=hbase-hbase-regionserver-hd2-region3.swnet.corp.log ...
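The two client-side settings mentioned in the comment above would go in {{hdfs-site.xml}}. This is a sketch, not a recommendation: the key names are the ones used in the Hadoop 2 line of that era, and should be verified against your Hadoop version before use.

```xml
<!-- hdfs-site.xml sketch: disable the DFS client socket cache and the
     DataNode keepalive grace period, restoring the branch-1
     open/use/close behavior (fewer CLOSE_WAIT sockets, slower reads). -->
<property>
  <name>dfs.client.socketcache.capacity</name>
  <value>0</value>
</property>
<property>
  <name>dfs.datanode.socket.reuse.keepalive</name>
  <value>0</value>
</property>
```
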
[jira] [Commented] (HBASE-9393) Hbase does not close a closed socket resulting in many CLOSE_WAIT
[ https://issues.apache.org/jira/browse/HBASE-9393?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13792915#comment-13792915 ] Colin Patrick McCabe commented on HBASE-9393: - I looked into this issue. I found a few things: The HDFS socket cache is too small by default and times out too quickly. Its default size is 16, but HBase seems to be opening many more connections to the DN than that. In this situation, sockets must inevitably be opened and then discarded, leading to sockets in {{CLOSE_WAIT}}. When you use positional read (aka {{pread}}), we grab a socket from the cache, read from it, and then immediately put it back. When you seek and then call {{read}}, we don't put the socket back at the end. The assumption behind the normal {{read}} method is that you are probably going to call {{read}} again, so it holds on to the socket until something else comes up (such as closing the stream). In many scenarios, this can lead to {{seek+read}} generating more sockets in {{CLOSE_WAIT}} than {{pread}}. I don't think we want to alter this HDFS behavior, since it's helpful in the case that you're reading through the entire file from start to finish, which many HDFS clients do. It allows us to make certain optimizations, such as reading a few kilobytes at a time even if the user only asks for a few bytes at a time. These optimizations are unavailable with {{pread}} because it creates a new {{BlockReader}} each time. So as far as recommendations for HBase go: * use short-circuit reads whenever possible, since in many cases you can avoid needing a socket at all and just reuse the same file descriptor * set the socket cache to a bigger size and adjust the timeouts to be longer (I may explore changing the defaults in HDFS...) * if you are going to keep files open for a while and do random reads, use {{pread}}, never {{seek+read}}. 
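The pread-versus-seek+read distinction in the comment above can be illustrated with plain NIO. In HDFS, {{FSDataInputStream}} exposes the analogous pair ({{read(long position, ...)}} versus {{seek()}} followed by {{read()}}); the sketch below is JDK-only {{FileChannel}} code, not HDFS code, showing the stateless-versus-stateful contrast:

```java
import java.nio.ByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class PreadVsSeekRead {
    public static void main(String[] args) throws Exception {
        Path tmp = Files.createTempFile("pread-demo", ".dat");
        Files.write(tmp, "hello, world".getBytes(StandardCharsets.UTF_8));

        try (FileChannel ch = FileChannel.open(tmp, StandardOpenOption.READ)) {
            // pread-style: the position is passed per call and channel state
            // is untouched. In HDFS this path borrows a socket from the cache
            // and returns it immediately.
            ByteBuffer buf = ByteBuffer.allocate(5);
            ch.read(buf, 7);                       // read "world" at offset 7
            System.out.println(new String(buf.array(), StandardCharsets.UTF_8));
            System.out.println(ch.position());     // still 0: no state changed

            // seek+read-style: mutate the channel position, then read. In
            // HDFS this path keeps the socket alive for the expected next
            // read(), which is what leaves sockets held open.
            ch.position(0).read(ByteBuffer.allocate(5));
            System.out.println(ch.position());     // 5: state advanced
        }
        Files.delete(tmp);
    }
}
```

The stateful variant is an optimization for sequential scans; the stateless one is the right shape for the long-lived, randomly-read HFiles described in this thread.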
[jira] [Commented] (HBASE-8337) Investigate why disabling hadoop short circuit read is required to make recovery tests pass consistently under hadoop2
[ https://issues.apache.org/jira/browse/HBASE-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13641069#comment-13641069 ] Colin Patrick McCabe commented on HBASE-8337: - bq. Maybe I'm missing it in the discussion above, but why is this only a problem with hadoop2 and not hadoop1? Is SCR not enabled by default in hadoop1? Probably because of HDFS-4595. > Investigate why disabling hadoop short circuit read is required to make > recovery tests pass consistently under hadoop2 > -- > > Key: HBASE-8337 > URL: https://issues.apache.org/jira/browse/HBASE-8337 > Project: HBase > Issue Type: Sub-task > Components: hadoop2, test >Affects Versions: 0.98.0, 0.95.1 >Reporter: Jonathan Hsieh >Priority: Critical > Fix For: 0.95.1 > > > HBASE-7636 makes some TestDistributedLogSplitting pass consistently by > disabling hdfs short circuit reads. > HBASE-8349 makes datanode node death recovery pass consistently by disabling > hdfs short circuit reads. > This will likely require configuration modifications to fix and may have > different fixes for hadoop1, hadoop2 (HDFS-2246), and hadoop3 (HDFS-347)... -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8337) Investigate why disabling hadoop short circuit read is required to make recovery tests pass consistently under hadoop2
[ https://issues.apache.org/jira/browse/HBASE-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13640976#comment-13640976 ] Colin Patrick McCabe commented on HBASE-8337: - bq. Looks like this has allowed us to get away with things we shouldn't. Tested using the same User for master and all regionservers in the minicluster, with 0.94 branch and the default Hadoop 1. TestMasterZKSessionRecovery OOMEs after surefire tries to parse a 180 MB logfile full of IOExceptions. As soon as one regionserver aborts, its filesystem is cached and/or closed by user, the master file system's DFS client is closed, and all hell breaks loose. You can use {{FileSystem#newInstance}} to prevent this problem.
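The hazard described above comes from Hadoop's {{FileSystem}} cache: {{FileSystem.get()}} hands every caller with the same key a single shared instance, so one regionserver closing it kills the master's handle too, while {{FileSystem.newInstance()}} bypasses the cache. The toy model below mimics just those cache semantics with a plain-JDK map; {{Fs}} and the string key are hypothetical stand-ins, not Hadoop code:

```java
import java.util.HashMap;
import java.util.Map;

public class FsCacheDemo {
    // Toy stand-in for Hadoop's FileSystem, keyed (in real Hadoop) by
    // (URI scheme, configuration, user).
    static class Fs {
        boolean closed;
        void close() { closed = true; }
    }
    static final Map<String, Fs> CACHE = new HashMap<>();

    // get(): one shared instance per key -- closing it affects every caller.
    static Fs get(String key) {
        return CACHE.computeIfAbsent(key, k -> new Fs());
    }

    // newInstance(): a private instance the caller alone owns and closes.
    static Fs newInstance(String key) {
        return new Fs();
    }

    public static void main(String[] args) {
        Fs master = get("hdfs@hbase");
        Fs regionserver = get("hdfs@hbase");
        regionserver.close();                 // aborting RS closes the shared fs
        System.out.println(master.closed);    // true: master's handle died too

        Fs isolated = newInstance("hdfs@hbase");
        System.out.println(isolated.closed);  // false: private instance survives
    }
}
```

In a minicluster where master and regionservers share one user, the shared-instance path is exactly the failure mode the quoted test hit.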
[jira] [Commented] (HBASE-8337) Investigate why disabling hadoop short circuit read is required to make recovery tests pass consistently under hadoop2
[ https://issues.apache.org/jira/browse/HBASE-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13639970#comment-13639970 ] Colin Patrick McCabe commented on HBASE-8337: - Actually, the 2.0 series will have SCR; it's just not there now.
[jira] [Commented] (HBASE-8337) Investigate why disabling hadoop short circuit read is required to make recovery tests pass consistently under hadoop2
[ https://issues.apache.org/jira/browse/HBASE-8337?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13632467#comment-13632467 ] Colin Patrick McCabe commented on HBASE-8337: - It's good to have HDFS-347 merged into trunk. I hope that the merge into branch-2 will not take too long. Regardless, I think the bottom line is that HBase needs to have support for both new-style and old-style short-circuit local reads, at least for a while. Even if we had HDFS-347 in branch-2 now, you'd still need this support to test against branch-1, or test against a Windows-based HDFS cluster.
[jira] [Commented] (HBASE-7636) TestDistributedLogSplitting#testThreeRSAbort fails against hadoop 2.0
[ https://issues.apache.org/jira/browse/HBASE-7636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630477#comment-13630477 ] Colin Patrick McCabe commented on HBASE-7636: - bq. Is HDFS-347 the "new" version? Yes. bq. Is the current short circuit read (HDFS-2246?) in hadoop1 different from hadoop2's? It's very similar. bq. What is the fix version/target version for HDFS-347 – what HDFS branches are you targeting? It's going to be merged to trunk at first. bq. From HBase's point of view, this is purely a unit test fix and specifically for the MiniDFSCluster. Do you think disabling the SCR feature for unit tests is a prudent idea? If you are just trying to test functionality rather than performance, it's easiest to keep it off. > TestDistributedLogSplitting#testThreeRSAbort fails against hadoop 2.0 > - > > Key: HBASE-7636 > URL: https://issues.apache.org/jira/browse/HBASE-7636 > Project: HBase > Issue Type: Sub-task > Components: hadoop2, test >Affects Versions: 0.95.0 >Reporter: Ted Yu >Assignee: Jonathan Hsieh > Fix For: 0.98.0, 0.95.1 > > Attachments: hbase-7636.v2.patch, hbase-7636.v3.patch > > > From > https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/364/testReport/org.apache.hadoop.hbase.master/TestDistributedLogSplitting/testThreeRSAbort/ > : > {code} > 2013-01-21 11:49:34,276 DEBUG > [MASTER_SERVER_OPERATIONS-juno.apache.org,57966,1358768818594-0] > client.HConnectionManager$HConnectionImplementation(956): Looked up root > region location, connection=hconnection 0x12f19fe; > serverName=juno.apache.org,55531,1358768819479 > 2013-01-21 11:49:34,278 INFO > [MASTER_SERVER_OPERATIONS-juno.apache.org,57966,1358768818594-0] > catalog.CatalogTracker(576): Failed verification of .META.,,1 at > address=juno.apache.org,57582,1358768819456; > org.apache.hadoop.hbase.ipc.HBaseClient$FailedServerException: This server is > in the failed servers list: juno.apache.org/67.195.138.61:57582 > {code}
[jira] [Commented] (HBASE-7636) TestDistributedLogSplitting#testThreeRSAbort fails against hadoop 2.0
[ https://issues.apache.org/jira/browse/HBASE-7636?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13630453#comment-13630453 ] Colin Patrick McCabe commented on HBASE-7636: - I was not able to find any HDFS logs attached (but maybe I didn't look in the right place?) If you can find where the test puts those, your answer is almost certainly in there. And it's almost certainly a configuration problem. In general, configuring short circuit reads is complex. For old-style short-circuit reads, you need to set up special permissions on your DataNode storage directories. You also need to specify the right user for {{dfs.block.local-path-access.user}}. For new-style short-circuit reads, you need to have {{libhadoop.so}} installed, and possibly be running with the native profile {{-Pnative}} so that Maven will set up {{LD_LIBRARY_PATH}} correctly. Then you need to set a valid socket path. It's probably best to wait until we finish merging the HDFS-347 branch (vote was successful, now we just need to do the work in svn), and then I'll help you set up the conf for this test. You probably want to have a fallback for if native code is not enabled. 
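The configuration the comment above walks through might be sketched as the {{hdfs-site.xml}} fragment below. {{dfs.client.read.shortcircuit}} and {{dfs.domain.socket.path}} are the standard new-style (HDFS-347) keys and {{dfs.block.local-path-access.user}} the old-style (HDFS-2246) one; the values shown are illustrative, and the socket path must point at a DataNode-owned location:

```xml
<!-- hdfs-site.xml sketch: new-style (HDFS-347) short-circuit reads.
     Also requires libhadoop.so on the client's library path. -->
<property>
  <name>dfs.client.read.shortcircuit</name>
  <value>true</value>
</property>
<property>
  <name>dfs.domain.socket.path</name>
  <value>/var/run/hadoop-hdfs/dn._PORT</value>
</property>
<!-- Old-style (HDFS-2246) instead whitelists the reading user and relies
     on direct access to the DataNode's block files: -->
<property>
  <name>dfs.block.local-path-access.user</name>
  <value>hbase</value>
</property>
```

As the comment notes, a client that cannot load native code should fall back to ordinary remote reads rather than fail.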
[jira] [Commented] (HBASE-6686) HFile Quarantine fails with missing dirs in hadoop 2.0
[ https://issues.apache.org/jira/browse/HBASE-6686?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1349#comment-1349 ] Colin Patrick McCabe commented on HBASE-6686: - As far as I can see, hadoop-1 returns null, not an empty list, when the directory does not exist. Admittedly, I only checked DistributedFileSystem.java on line 279 (DistributedFileSystem#listStatus). Maybe there's some other override that does it, but... seems questionable. You're right that this is an exception in hadoop-2 / cdh4. > HFile Quarantine fails with missing dirs in hadoop 2.0 > --- > > Key: HBASE-6686 > URL: https://issues.apache.org/jira/browse/HBASE-6686 > Project: HBase > Issue Type: Bug > Components: hbck >Affects Versions: 0.92.2, 0.96.0, 0.94.2 >Reporter: Jonathan Hsieh >Assignee: Jonathan Hsieh > Fix For: 0.92.2, 0.96.0, 0.94.2 > > Attachments: hbase-6686-94-92.patch > > > Two unit tests fail because listStatus's semantics change between hadoop 1.0 > and hadoop 2.0. (specifically -- hadoop 1.0 returns empty array if used on > dir that does not exist, but hadoop 2.0 throws FileNotFoundException). > here's the exception: > {code} > 2012-08-28 16:01:19,789 WARN [Thread-3155] hbck.HFileCorruptionChecker(230): > Failed to quaratine an HFile in regiondir > hdfs://localhost:38096/user/jenkins/hbase/testQuarantineMissingFamdir/34b2e072b33052bf4875f85513e9c669 > java.io.FileNotFoundException: File > hdfs://localhost:38096/user/jenkins/hbase/testQuarantineMissingFamdir/34b2e072b33052bf4875f85513e9c669/fam > does not exist. 
> at > org.apache.hadoop.hdfs.DistributedFileSystem.listStatus(DistributedFileSystem.java:406) > at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1341) > at org.apache.hadoop.fs.FileSystem.listStatus(FileSystem.java:1381) > at > org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker.checkColFamDir(HFileCorruptionChecker.java:152) > at > org.apache.hadoop.hbase.util.TestHBaseFsck$2$1.checkColFamDir(TestHBaseFsck.java:1401) > at > org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker.checkRegionDir(HFileCorruptionChecker.java:185) > at > org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker$RegionDirChecker.call(HFileCorruptionChecker.java:267) > at > org.apache.hadoop.hbase.util.hbck.HFileCorruptionChecker$RegionDirChecker.call(HFileCorruptionChecker.java:258) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441) > at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303) > at java.util.concurrent.FutureTask.run(FutureTask.java:138) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$301(ScheduledThreadPoolExecutor.java:98) > at > java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:206) > at > java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908) > at java.lang.Thread.run(Thread.java:662) > {code}
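A common fix for this semantics change is a defensive wrapper that normalizes "directory missing" to an empty result, whichever way the underlying API signals it. The sketch below uses plain {{java.io.File}}, whose {{listFiles()}} happens to share the null-on-missing-dir behavior attributed to hadoop-1; {{safeList}} is a hypothetical helper, not HBase code:

```java
import java.io.File;

public class SafeListDemo {
    // Normalize a missing directory to an empty array. java.io.File signals
    // it with a null return (like the hadoop-1 listStatus behavior described
    // above); hadoop-2's FileSystem#listStatus throws FileNotFoundException
    // instead, so a real HBase-side wrapper would also catch that checked
    // exception and return an empty array there.
    static File[] safeList(File dir) {
        File[] entries = dir.listFiles();
        return entries != null ? entries : new File[0];
    }

    public static void main(String[] args) {
        File missing = new File("/no/such/dir/hopefully");
        System.out.println(missing.listFiles());        // null: the raw quirk
        System.out.println(safeList(missing).length);   // 0: normalized
    }
}
```

Callers can then iterate the result unconditionally instead of special-casing each Hadoop version at every call site.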