[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429262#comment-13429262 ]

Dave Revell commented on HBASE-6358:

{quote}If the size and speed don't matter, then wouldn't you have just used a normal (non-bulk-load) MR job to load the data?{quote}

There are other reasons to atomically load hfiles even for non-huge datasets, such as ETL and restoring backups. And atomicity could have some benefits for certain use cases. But it's probably not asking too much for people with these use cases to use a distributed hfile loader that depends on mapreduce, so I'm willing to concede the point.

@Todd, would you be in favor of adding another JIRA ticket for a distributed bulk loader, and having this ticket be blocked until it's done? I think it should be blocked so we don't remove the current bulkload-from-remote-fs capability without offering an alternative, though the user does have the option of running distcp themselves.

Bulkloading from remote filesystem is problematic
-------------------------------------------------

                 Key: HBASE-6358
                 URL: https://issues.apache.org/jira/browse/HBASE-6358
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 0.94.0
            Reporter: Dave Revell
            Assignee: Dave Revell
         Attachments: 6358-suggestion.txt, HBASE-6358-trunk-v1.diff, HBASE-6358-trunk-v2.diff, HBASE-6358-trunk-v3.diff

Bulk loading hfiles that don't live on the same filesystem as HBase can cause problems for subtle reasons. In Store.bulkLoadHFile(), the regionserver will copy the source hfile to its own filesystem if it's not already there. Since this can take a long time for large hfiles, it's likely that the client will time out and retry. When the client retries repeatedly, there may be several bulkload operations in flight for the same hfile, causing lots of unnecessary IO and tying up handler threads. This can seriously impact performance. In my case, the cluster became unusable and the regionservers had to be kill -9'ed.

Possible solutions:
# Require that hfiles already be on the same filesystem as HBase in order for bulkloading to succeed. The copy could be handled by LoadIncrementalHFiles before the regionserver is called.
# Others? I'm not familiar with Hadoop IPC so there may be tricks to extend the timeout or something else.

I'm willing to write a patch but I'd appreciate recommendations on how to proceed.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429359#comment-13429359 ]

Todd Lipcon commented on HBASE-6358:

bq. @Todd, would you be in favor of adding another JIRA ticket for a distributed bulk loader, and having this ticket be blocked until it's done? I think it should be blocked so we don't remove the current bulkload from remote fs capability without offering an alternative, though the user does have the option of running distcp themselves.

I could go either way on this. Up to folks who are more actively contributing code than I :)
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428217#comment-13428217 ]

Dave Revell commented on HBASE-6358:

@Todd, can you explain your original rationale for using srcFs.equals(destFs) as a way of checking whether the src and dest filesystems are the same? Also, as an HDFS expert, can you suggest the best possible way for checking whether the underlying filesystems are actually the same FS?
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428291#comment-13428291 ]

Todd Lipcon commented on HBASE-6358:

Hmm... I don't know if I thought about it in a huge amount of detail. The original idea was to allow you to run an MR job on one cluster, and then LoadIncrementalHFiles on your HBase cluster, which uses a different HDFS. I was thinking there would be an advantage here over the distcp-then-load approach, because the region server doing the copy would end up with a local replica after the load. That said, I didn't think through the timeout implications, which seem to be the issue discussed in this JIRA.

As for how to determine if they're the same, the .equals() call is supposed to do that, but perhaps it's not working right?
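In Hadoop, whether two paths are "on the same filesystem" effectively comes down to the filesystem URI: the scheme (e.g. hdfs) and the authority (the namenode host:port). As a rough, self-contained illustration of that comparison using plain java.net.URI (the class and method names here are made up for the sketch; this is not the HBase or Hadoop implementation):

```java
import java.net.URI;

public class FsIdentity {
    // Two paths live on the "same filesystem" when their URIs share a
    // scheme (e.g. hdfs) and an authority (the namenode host:port).
    // A path under a different namenode belongs to a remote filesystem.
    static boolean sameFileSystem(URI a, URI b) {
        return eq(a.getScheme(), b.getScheme()) && eq(a.getAuthority(), b.getAuthority());
    }

    private static boolean eq(String x, String y) {
        return x == null ? y == null : x.equalsIgnoreCase(y);
    }

    public static void main(String[] args) {
        URI hbaseRoot = URI.create("hdfs://nn1:8020/hbase");
        URI sameNn    = URI.create("hdfs://nn1:8020/user/me/hfiles");
        URI otherNn   = URI.create("hdfs://nn2:8020/staging/hfiles");
        System.out.println(sameFileSystem(hbaseRoot, sameNn));  // true
        System.out.println(sameFileSystem(hbaseRoot, otherNn)); // false
    }
}
```

If FileSystem.equals() were comparing object identity or cached instances rather than the underlying URI, two handles to the same HDFS could compare unequal, which may be the failure mode being asked about here.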
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428295#comment-13428295 ]

Dave Revell commented on HBASE-6358:

Thanks Todd. My new plan is to keep the copy, but do it in LoadIncrementalHFiles instead of in Store. This keeps the existing use cases and test cases intact, but works around the timeout/retry problem.
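A minimal sketch of the shape of that plan, using java.nio purely for illustration (a real patch would use Hadoop's FileSystem API; ClientSideCopy and stageForBulkLoad are hypothetical names): the expensive copy happens in the client before the regionserver RPC, so the RPC itself stays short and cannot time out mid-copy.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ClientSideCopy {
    // Copy the hfile into a staging directory on the destination side
    // before asking the regionserver to bulk-load it. The slow copy now
    // happens client-side; the server call only ever sees a local path.
    static Path stageForBulkLoad(Path srcHFile, Path stagingDir) throws IOException {
        Files.createDirectories(stagingDir);
        Path staged = stagingDir.resolve(srcHFile.getFileName());
        Files.copy(srcHFile, staged, StandardCopyOption.REPLACE_EXISTING);
        return staged; // pass this path, not srcHFile, to the regionserver
    }
}
```

The trade-off discussed below is that this funnels all the data through the single client node running LoadIncrementalHFiles.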
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428337#comment-13428337 ]

Todd Lipcon commented on HBASE-6358:

The problem of doing it automatically in LoadIncrementalHFiles (i.e. the client) is that it is going to be very slow for any non-trivial amount of data to funnel it through this single node. Here's an alternate idea:

1. In this JIRA, change the RS side to fail if the filesystem doesn't match.
2. Separately, add a new DistributedLoadIncrementalHFiles program which acts as a combination of distcp and LoadIncrementalHFiles. For each RS (or perhaps for each region), it would create one map task, with a locality hint to that server. Then the task would copy the relevant file (achieving a local replica) and make the necessary call to load the file.

Between step 1 and step 2, users would have to use distcp and sacrifice locality. But, with the current scheme, they already don't get locality for the common case where the MR job runs on the same cluster as HBase. Thoughts?
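The task-planning half of the proposed DistributedLoadIncrementalHFiles could look roughly like the following stand-in sketch (everything here is hypothetical, including the class and field names; a real implementation would build MapReduce InputSplits whose getLocations() return the hinted server):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class DistributedLoadSketch {
    // One copy+load task per region, hinted to run on that region's
    // server, so the copy leaves a local block replica behind.
    static final class LoadTask {
        final String hfile;
        final String preferredHost;
        LoadTask(String hfile, String preferredHost) {
            this.hfile = hfile;
            this.preferredHost = preferredHost;
        }
    }

    static List<LoadTask> planTasks(Map<String, String> regionToServer,
                                    Map<String, String> regionToHFile) {
        List<LoadTask> tasks = new ArrayList<>();
        for (Map.Entry<String, String> e : regionToServer.entrySet()) {
            String hfile = regionToHFile.get(e.getKey());
            if (hfile != null) {
                // Locality hint: schedule the map task on the region's server.
                tasks.add(new LoadTask(hfile, e.getValue()));
            }
        }
        return tasks;
    }
}
```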
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428368#comment-13428368 ]

Dave Revell commented on HBASE-6358:

@Todd, that idea seems fine to me overall. If we just did the slow copy in LoadIncrementalHFiles as I suggested earlier, users would still have the option of doing distcp before calling LoadIncrementalHFiles if they need performance. This has the benefits of:
# not breaking the current use case of non-local bulk loading when size or speed requirements are modest
# not requiring a new DistributedLoadIncrementalHFiles utility

This scheme would not give locality, though, so users with serious performance requirements might not be satisfied.
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428508#comment-13428508 ]

Todd Lipcon commented on HBASE-6358:

bq. not breaking the current use case of non-local bulk loading when size or speed requirements are modest

If the size and speed don't matter, then wouldn't you have just used a normal (non-bulk-load) MR job to load the data? I think funneling the load through one host basically defeats the purpose of bulk load. Perhaps it could be available as an option for people just testing out, but I would prefer the default to be a failure, and you have to enable the copy with a {{-copyToCluster}} flag or something.
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427664#comment-13427664 ]

Zhihong Ted Yu commented on HBASE-6358:
---

{code}
+      throw new IOException(errMsg);
{code}

Should DoNotRetryIOException be thrown instead?
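For context: HBase clients treat DoNotRetryIOException as a signal to fail immediately rather than retry, which matters here because retries are exactly what amplified the problem. A toy sketch of that retry-loop distinction (FailFastException and callWithRetries are stand-ins, not the actual HBase client code):

```java
import java.util.concurrent.Callable;

public class RetrySketch {
    // Stand-in for DoNotRetryIOException: a marker type the retry loop
    // rethrows immediately instead of retrying the call.
    static class FailFastException extends Exception {
        FailFastException(String msg) { super(msg); }
    }

    static <T> T callWithRetries(Callable<T> call, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (FailFastException e) {
                throw e; // permanent condition: do not retry
            } catch (Exception e) {
                last = e; // transient failure: try again
            }
        }
        throw last;
    }
}
```

Throwing a plain IOException from the server would put the client on the retry path; a DoNotRetryIOException would surface the error to the caller on the first attempt.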
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427711#comment-13427711 ]

Dave Revell commented on HBASE-6358:

[~zhi...@ebaysf.com]: yes, I'll change it.
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427761#comment-13427761 ]

Hadoop QA commented on HBASE-6358:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12538970/HBASE-6358-trunk-v2.diff
against trunk revision .

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).
-1 findbugs. The patch appears to introduce 10 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
    org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks
    org.apache.hadoop.hbase.regionserver.TestHRegionServerBulkLoad
    org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery
    org.apache.hadoop.hbase.master.TestMasterNoCluster
    org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles
    org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface
    org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2489//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2489//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2489//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2489//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2489//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2489//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2489//console

This message is automatically generated.
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427772#comment-13427772 ]

Dave Revell commented on HBASE-6358:

Patch v3 uploaded
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427786#comment-13427786 ]

Hadoop QA commented on HBASE-6358:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12538970/HBASE-6358-trunk-v2.diff
against trunk revision .

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).
-1 findbugs. The patch appears to introduce 10 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
    org.apache.hadoop.hbase.regionserver.TestHRegionServerBulkLoad
    org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery
    org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles
    org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface
    org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2492//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2492//console

This message is automatically generated.
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427805#comment-13427805 ]

Hadoop QA commented on HBASE-6358:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12538985/HBASE-6358-trunk-v3.diff
against trunk revision.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).
-1 findbugs. The patch appears to introduce 10 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.client.TestAdmin
org.apache.hadoop.hbase.regionserver.TestHRegionServerBulkLoad
org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery
org.apache.hadoop.hbase.master.TestMasterNoCluster
org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles
org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2493//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2493//console

This message is automatically generated.
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427810#comment-13427810 ]

Zhihong Ted Yu commented on HBASE-6358:
---------------------------------------

With a little more information added to the log, you would see that the following check is inaccurate:
{code}
if (!srcFs.equals(fs)) {
{code}
srcFs is: DFS[DFSClient[clientName=DFSClient_hb_rs_192.168.0.13,55152,1343960033351, ugi=zhihyu.hfs.0]]
fs is: org.apache.hadoop.hbase.fs.HFileSystem@580a00fd

I suggest using fs.getHomeDirectory() for the comparison: it includes hostname, port number, and home path.
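To make the comparison Ted describes concrete: two FileSystem objects can point at the same cluster yet fail object equality, while comparing hostname, port, and scheme (the information carried by a home-directory path) gives the intended answer. Below is a minimal, Hadoop-free sketch of that URI-based check; the `sameFileSystem` helper and the example URIs are hypothetical illustrations, not code from the patch.

```java
import java.net.URI;
import java.util.Objects;

public class FsCompare {
    // Hypothetical helper: treat two paths as living on the same filesystem
    // when their URIs agree on scheme and authority (hostname:port). Plain
    // object equality on FileSystem instances (srcFs.equals(fs)) compares
    // wrapper objects and can report "different" for the same cluster.
    static boolean sameFileSystem(URI a, URI b) {
        return Objects.equals(a.getScheme(), b.getScheme())
            && Objects.equals(a.getAuthority(), b.getAuthority());
    }

    public static void main(String[] args) {
        URI src    = URI.create("hdfs://namenode.example.com:8020/bulk/hfiles/f1");
        URI dst    = URI.create("hdfs://namenode.example.com:8020/hbase/data");
        URI remote = URI.create("hdfs://other-nn.example.com:8020/bulk/hfiles/f1");

        System.out.println(sameFileSystem(src, dst));    // same scheme and authority: true
        System.out.println(sameFileSystem(src, remote)); // different host: false
    }
}
```

In real Hadoop code the equivalent information comes from `FileSystem.getUri()` (or, as suggested above, from comparing home-directory paths), rather than from raw `java.net.URI` values.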
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427829#comment-13427829 ]

Hadoop QA commented on HBASE-6358:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12538994/6358-suggestion.txt
against trunk revision.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).
-1 findbugs. The patch appears to introduce 10 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.master.TestAssignmentManager

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2495//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2495//console

This message is automatically generated.
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426680#comment-13426680 ]

Harsh J commented on HBASE-6358:
--------------------------------

Hey Dave, will you be doing the patch? We can probably deprecate this feature in 0.96 and remove it in the release after that?
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426747#comment-13426747 ]

Dave Revell commented on HBASE-6358:
------------------------------------

@Harsh, yes I will, sorry for the delay. I can have it within a week.
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13410658#comment-13410658 ]

Andrew Purtell commented on HBASE-6358:
---------------------------------------

bq. Require that hfiles already be on the same filesystem as HBase in order for bulkloading to succeed.

+1. This will avoid surprises like the one reported on this issue.
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13409896#comment-13409896 ]

Harsh J commented on HBASE-6358:
--------------------------------

I sort of agree, though it is also more of a best-practice thing. If you bulk load remotely with only one or very few requests per source at a time, and with a high RPC timeout at the client (so that it does not retry too often), then it should be more tolerable. But in any case, having the RS do FS copies will indeed make it slow. I ran into a very similar issue, and the tweak I had to suggest was indeed to distcp/cp the data first and bulk load next. HBASE-6350 (logging improvements for ops) and HBASE-6339 (a possible optimization that turned out negative in the end) came out of it.
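The copy-first workflow Harsh describes looks roughly like this from the client side. This is a sketch only: the hostnames, ports, paths, and table name are placeholders, and the commands assume a Hadoop/HBase installation where `hadoop distcp` and the `LoadIncrementalHFiles` tool are available.

```shell
# 1. Copy the generated hfiles onto the filesystem HBase itself uses
#    (distcp for large datasets; plain `hadoop fs -cp` also works for small ones).
hadoop distcp hdfs://etl-nn.example.com:8020/output/hfiles \
              hdfs://hbase-nn.example.com:8020/tmp/hfiles

# 2. Bulk load from the filesystem local to HBase, so the regionserver
#    only has to move files into place rather than copy them across clusters.
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
      hdfs://hbase-nn.example.com:8020/tmp/hfiles mytable
```

With this split, the long-running network transfer happens outside the regionserver's handler threads, so the RPC timeout-and-retry loop described in this issue never starts.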
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13409898#comment-13409898 ]

Todd Lipcon commented on HBASE-6358:
------------------------------------

Yea, I originally wrote the code that did the copy, but in hindsight I think it was a mistake. I think we should remove that capability and have the code fail if the filesystem doesn't match.