[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic

2012-08-06 Thread Dave Revell (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429262#comment-13429262 ]

Dave Revell commented on HBASE-6358:


{quote}If the size and speed don't matter, then wouldn't you have just used a 
normal (non-bulk-load) MR job to load the data?{quote}

There are other reasons to atomically load hfiles even for non-huge datasets, 
such as ETL and restoring backups. And atomicity could have some benefits for 
certain use cases. But it's probably not asking too much for people with these 
use cases to use a distributed hfile loader that depends on mapreduce, so I'm 
willing to concede the point.

@Todd, would you be in favor of adding another JIRA ticket for a distributed 
bulk loader, and having this ticket be blocked until it's done? I think it 
should be blocked so we don't remove the current bulkload from remote fs 
capability without offering an alternative, though the user does have the 
option of running distcp themselves.

 Bulkloading from remote filesystem is problematic
 -------------------------------------------------

 Key: HBASE-6358
 URL: https://issues.apache.org/jira/browse/HBASE-6358
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.0
Reporter: Dave Revell
Assignee: Dave Revell
 Attachments: 6358-suggestion.txt, HBASE-6358-trunk-v1.diff, 
 HBASE-6358-trunk-v2.diff, HBASE-6358-trunk-v3.diff


 Bulk loading hfiles that don't live on the same filesystem as HBase can cause 
 problems for subtle reasons.
 In Store.bulkLoadHFile(), the regionserver will copy the source hfile to its 
 own filesystem if it's not already there. Since this can take a long time for 
 large hfiles, the client is likely to time out and retry. When the client 
 retries repeatedly, there may be several bulkload operations in flight for 
 the same hfile, causing lots of unnecessary I/O and tying up handler 
 threads. This can seriously impact performance. In my case, the cluster 
 became unusable and the regionservers had to be kill -9'ed.
 Possible solutions:
  # Require that hfiles already be on the same filesystem as HBase in order 
 for bulkloading to succeed. The copy could be handled by 
 LoadIncrementalHFiles before the regionserver is called.
  # Others? I'm not familiar with Hadoop IPC so there may be tricks to extend 
 the timeout or something else.
 I'm willing to write a patch but I'd appreciate recommendations on how to 
 proceed.
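Option 1's same-filesystem check could be sketched as follows. This is a hand-rolled illustration using plain java.net.URI, not the actual HBase code; the class and method names are made up, and a real implementation would compare the FileSystem instances Hadoop hands back:

```java
import java.net.URI;

// Hypothetical sketch: decide client-side whether a copy is needed before
// asking the regionserver to bulk load. Two paths are treated as being on
// the same filesystem when their URI scheme and authority match.
public class SameFsCheck {
    static boolean sameFilesystem(URI src, URI dest) {
        String srcScheme = src.getScheme() == null ? "" : src.getScheme();
        String destScheme = dest.getScheme() == null ? "" : dest.getScheme();
        String srcAuth = src.getAuthority() == null ? "" : src.getAuthority();
        String destAuth = dest.getAuthority() == null ? "" : dest.getAuthority();
        return srcScheme.equalsIgnoreCase(destScheme)
            && srcAuth.equalsIgnoreCase(destAuth);
    }

    public static void main(String[] args) {
        URI src = URI.create("hdfs://cluster-a:8020/staging/hfile1");
        URI hbaseRoot = URI.create("hdfs://cluster-b:8020/hbase");
        if (!sameFilesystem(src, hbaseRoot)) {
            // The expensive copy would happen here, in the client,
            // rather than inside the regionserver RPC.
            System.out.println("copy required before bulk load");
        }
    }
}
```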

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic

2012-08-06 Thread Todd Lipcon (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429359#comment-13429359 ]

Todd Lipcon commented on HBASE-6358:


bq. @Todd, would you be in favor of adding another JIRA ticket for a 
distributed bulk loader, and having this ticket be blocked until it's done? I 
think it should be blocked so we don't remove the current bulkload from remote 
fs capability without offering an alternative, though the user does have the 
option of running distcp themselves.

I could go either way on this. Up to folks who are more actively contributing 
code than I :)





[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic

2012-08-03 Thread Dave Revell (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428217#comment-13428217 ]

Dave Revell commented on HBASE-6358:


@Todd, can you explain your original rationale for using srcFs.equals(destFs) 
to check whether the src and dest filesystems are the same? Also, as an HDFS 
expert, can you suggest the best way to check whether the underlying 
filesystems are actually the same?





[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic

2012-08-03 Thread Todd Lipcon (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428291#comment-13428291 ]

Todd Lipcon commented on HBASE-6358:


hmm... I don't know if I thought about it in a huge amount of detail. The 
original idea was to let you run an MR job on one cluster and then run 
LoadIncrementalHFiles on your HBase cluster, which uses a different HDFS. I 
was thinking there would be an advantage here over the distcp-then-load 
approach, because the regionserver doing the copy would end up with a local 
replica after the load.

That said, I didn't think through the timeout implications, which seems to be 
the issue discussed in this JIRA.

As for how to determine if they're the same, the .equals() call is supposed to 
do that, but perhaps it's not working right?
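As an illustration of one plausible failure mode: two URIs can name the same NameNode yet differ textually when one relies on the default RPC port, so a URI- or identity-based equality check would treat them as different filesystems. A sketch using plain java.net.URI (not the Hadoop FileSystem API; the hostname and port here are made up):

```java
import java.net.URI;

public class FsEqualsPitfall {
    public static void main(String[] args) {
        // Both URIs would refer to the same NameNode if 8020 is the
        // configured default port, but a naive authority comparison
        // treats them as two different filesystems.
        URI explicit = URI.create("hdfs://namenode:8020/hbase");
        URI implicit = URI.create("hdfs://namenode/hbase");
        System.out.println(
            explicit.getAuthority().equals(implicit.getAuthority())); // prints false
    }
}
```

If Store's check hits a case like this, it would conclude the hfile is remote and trigger the copy even when it isn't.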





[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic

2012-08-03 Thread Dave Revell (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428295#comment-13428295 ]

Dave Revell commented on HBASE-6358:


Thanks Todd. My new plan is to keep the copy, but do it in 
LoadIncrementalHFiles instead of in Store. This keeps the existing use cases 
and test cases intact, but works around the timeout/retry problem.
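The proposed client-side copy could be sketched roughly as follows, modeled here with java.nio.file instead of Hadoop's FileSystem API; the method and staging-directory names are illustrative, not the actual HBase code:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

// Sketch of the proposed client-side copy. Real code would use Hadoop's
// FileSystem/FileUtil APIs; plain local files stand in for HDFS here.
public class ClientSideCopy {
    static Path ensureLocalToDest(Path srcHfile, Path destStagingDir)
            throws IOException {
        Files.createDirectories(destStagingDir);
        Path target = destStagingDir.resolve(srcHfile.getFileName());
        // The slow copy now happens in the client (LoadIncrementalHFiles),
        // so the regionserver RPC stays short and cannot time out mid-copy.
        Files.copy(srcHfile, target, StandardCopyOption.REPLACE_EXISTING);
        return target;
    }

    public static void main(String[] args) throws IOException {
        Path src = Files.createTempFile("hfile", ".tmp");
        Files.writeString(src, "kv-data");
        Path staging = Files.createTempDirectory("staging");
        Path copied = ensureLocalToDest(src, staging);
        System.out.println(Files.readString(copied)); // prints "kv-data"
    }
}
```

The key property is that retries by the RPC layer never re-trigger the copy, because the copy completes before the regionserver is ever called.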





[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic

2012-08-03 Thread Todd Lipcon (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428337#comment-13428337 ]

Todd Lipcon commented on HBASE-6358:


The problem with doing the copy automatically in LoadIncrementalHFiles (i.e., 
the client) is that funneling all the data through a single node will be very 
slow for any non-trivial amount of data.

Here's an alternate idea:
1. In this JIRA, change the RS side to fail if the filesystem doesn't match
2. Separately, add a new DistributedLoadIncrementalHFiles program which acts 
as a combination of distcp and LoadIncrementalHFiles. For each RS (or perhaps 
for each region), it would create one map task, with a locality hint to that 
server. Then the task would copy the relevant file (achieving a local replica) 
and make the necessary call to load the file.

Between step 1 and 2, users would have to use distcp and sacrifice locality. 
But, with the current scheme, they already don't get locality for the common 
case where the MR job runs on the same cluster as HBase.

Thoughts?
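The task-planning step of the proposed DistributedLoadIncrementalHFiles could be sketched like this, with the region-to-server lookup faked by a plain map. All names are hypothetical; a real implementation would query region locations from HBase and submit locality-hinted MR tasks:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: group hfiles by the server hosting their target region, so one
// map task per server (with a locality hint to that server) copies and
// then bulk-loads its own files, achieving a local replica of each copy.
public class LoadTaskPlanner {
    static Map<String, List<String>> planTasks(Map<String, String> hfileToServer) {
        Map<String, List<String>> tasks = new TreeMap<>();
        for (Map.Entry<String, String> e : hfileToServer.entrySet()) {
            tasks.computeIfAbsent(e.getValue(), k -> new ArrayList<>())
                 .add(e.getKey());
        }
        return tasks;
    }

    public static void main(String[] args) {
        // Faked placement lookup; real code would ask the HBase client
        // which regionserver hosts each hfile's target region.
        Map<String, String> placement = Map.of(
            "/staging/hfile-a", "rs1.example.com",
            "/staging/hfile-b", "rs2.example.com",
            "/staging/hfile-c", "rs1.example.com");
        System.out.println(planTasks(placement));
    }
}
```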





[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic

2012-08-03 Thread Dave Revell (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428368#comment-13428368 ]

Dave Revell commented on HBASE-6358:


@Todd, that idea seems fine to me overall.

If we just did the slow copy in LoadIncrementalHFiles as I suggested earlier, 
users would still have the option of doing distcp before calling 
LoadIncrementalHFiles if they need performance. This has the benefits of 

 # not breaking the current use case of non-local bulk loading when size or 
speed requirements are modest
 # not requiring a new DistributedLoadIncrementalHFiles utility

This scheme would not give locality though, so users with serious performance 
requirements might not be satisfied.






[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic

2012-08-03 Thread Todd Lipcon (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428508#comment-13428508 ]

Todd Lipcon commented on HBASE-6358:


bq. not breaking the current use case of non-local bulk loading when size or 
speed requirements are modest

If the size and speed don't matter, then wouldn't you have just used a normal 
(non-bulk-load) MR job to load the data?

I think funneling the load through one host basically defeats the purpose of 
bulk load. Perhaps it could be available as an option for people just trying 
things out, but I would prefer the default to be a failure, with the copy 
enabled explicitly via a {{-copyToCluster}} flag or something.





[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic

2012-08-02 Thread Zhihong Ted Yu (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427664#comment-13427664 ]

Zhihong Ted Yu commented on HBASE-6358:
---

{code}
+  throw new IOException(errMsg);
{code}
Should DoNotRetryIOException be thrown instead?
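For illustration, a do-not-retry-style exception (modeled after HBase's DoNotRetryIOException, but defined locally here so the sketch has no HBase dependency) would make a rejected cross-filesystem bulk load fail fast instead of feeding the client retry loop:

```java
import java.io.IOException;

// Sketch of the review suggestion: throw an exception type the client
// recognizes as non-retriable. In HBase this is DoNotRetryIOException;
// here a stand-in subclass keeps the example dependency-free, and the
// authority strings are made-up placeholders.
public class NonRetriableDemo {
    static class DoNotRetryIOException extends IOException {
        DoNotRetryIOException(String msg) { super(msg); }
    }

    static void bulkLoadHFile(String srcFsAuthority, String destFsAuthority)
            throws IOException {
        if (!srcFsAuthority.equals(destFsAuthority)) {
            // A plain IOException would be retried by the client,
            // re-triggering the expensive copy; a do-not-retry type
            // surfaces the error exactly once.
            throw new DoNotRetryIOException(
                "hfile must be on the HBase filesystem");
        }
    }

    public static void main(String[] args) {
        try {
            bulkLoadHFile("nn-a:8020", "nn-b:8020");
        } catch (IOException e) {
            System.out.println("failed fast: " + e.getMessage());
        }
    }
}
```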





[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic

2012-08-02 Thread Dave Revell (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427711#comment-13427711 ]

Dave Revell commented on HBASE-6358:


[~zhi...@ebaysf.com]: yes, I'll change it.





[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic

2012-08-02 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427761#comment-13427761 ]

Hadoop QA commented on HBASE-6358:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12538970/HBASE-6358-trunk-v2.diff
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 10 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks
   org.apache.hadoop.hbase.regionserver.TestHRegionServerBulkLoad
   org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery
   org.apache.hadoop.hbase.master.TestMasterNoCluster
   org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles
   org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface
   org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2489//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2489//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2489//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2489//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2489//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2489//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2489//console

This message is automatically generated.





[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic

2012-08-02 Thread Dave Revell (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427772#comment-13427772 ]

Dave Revell commented on HBASE-6358:


Patch v3 uploaded





[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic

2012-08-02 Thread Hadoop QA (JIRA)

[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427786#comment-13427786 ]

Hadoop QA commented on HBASE-6358:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12538970/HBASE-6358-trunk-v2.diff
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified tests.
Please justify why no new tests are needed for this patch.
Also please list what manual steps were performed to verify this patch.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 10 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.regionserver.TestHRegionServerBulkLoad
   org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery
   org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles
   org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface
   org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2492//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2492//console

This message is automatically generated.





[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic

2012-08-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427805#comment-13427805
 ] 

Hadoop QA commented on HBASE-6358:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12538985/HBASE-6358-trunk-v3.diff
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 10 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestAdmin
   org.apache.hadoop.hbase.regionserver.TestHRegionServerBulkLoad
   org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
   org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery
   org.apache.hadoop.hbase.master.TestMasterNoCluster
   org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles
   org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface
   org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2493//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2493//console

This message is automatically generated.





[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic

2012-08-02 Thread Zhihong Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427810#comment-13427810
 ] 

Zhihong Ted Yu commented on HBASE-6358:
---

With a little more information added to the log, you would see that the 
following check is inaccurate:
{code}
 if (!srcFs.equals(fs)) {
{code}
srcFs is: 
DFS[DFSClient[clientName=DFSClient_hb_rs_192.168.0.13,55152,1343960033351, 
ugi=zhihyu.hfs.0]]
fs is: org.apache.hadoop.hbase.fs.HFileSystem@580a00fd

I suggest using fs.getHomeDirectory() for comparison: it includes the hostname, 
port number, and home path.
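A minimal sketch of that suggestion (the class, method names, and URIs below are illustrative stand-ins, not the actual patch): most FileSystem implementations do not override equals(), so srcFs.equals(fs) compares wrapper objects rather than clusters. Comparing URIs derived from the filesystems instead captures the scheme, hostname, and port:

```java
import java.net.URI;
import java.util.Objects;

public class HomeDirCompare {
    // Hypothetical helper: decide whether two filesystems are "the same"
    // for bulk-load purposes by comparing the scheme and authority
    // (host:port) of their URIs, rather than relying on Object equality.
    static boolean sameFilesystem(URI srcHome, URI destHome) {
        return Objects.equals(srcHome.getScheme(), destHome.getScheme())
            && Objects.equals(srcHome.getAuthority(), destHome.getAuthority());
    }

    public static void main(String[] args) {
        URI src = URI.create("hdfs://nn1:8020/user/hbase");
        // Same namenode and port: no client-side copy needed before loading.
        System.out.println(sameFilesystem(src, URI.create("hdfs://nn1:8020/hbase")));  // true
        // Different namenode: the hfile lives on a remote filesystem.
        System.out.println(sameFilesystem(src, URI.create("hdfs://nn2:8020/hbase")));  // false
    }
}
```

In real code the two URIs would come from Hadoop's FileSystem API (e.g. the home-directory paths, as suggested above), which is what makes the hostname and port visible to the comparison.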





[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic

2012-08-02 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427829#comment-13427829
 ] 

Hadoop QA commented on HBASE-6358:
--

-1 overall.  Here are the results of testing the latest attachment 
  http://issues.apache.org/jira/secure/attachment/12538994/6358-suggestion.txt
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

-1 tests included.  The patch doesn't appear to include any new or modified 
tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

+1 hadoop2.0.  The patch compiles against the hadoop 2.0 profile.

+1 javadoc.  The javadoc tool did not generate any warning messages.

-1 javac.  The applied patch generated 5 javac compiler warnings (more than 
the trunk's current 4 warnings).

-1 findbugs.  The patch appears to introduce 10 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.master.TestAssignmentManager
Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2495//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/2495//console

This message is automatically generated.





[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic

2012-08-01 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426680#comment-13426680
 ] 

Harsh J commented on HBASE-6358:


Hey Dave,

Will you be doing the patch? We can probably mark-deprecate this feature in 
0.96, and remove it in the release after that?





[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic

2012-08-01 Thread Dave Revell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426747#comment-13426747
 ] 

Dave Revell commented on HBASE-6358:


@Harsh,

Yes I will, sorry for the delay. I can have it within a week.





[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic

2012-07-10 Thread Andrew Purtell (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13410658#comment-13410658
 ] 

Andrew Purtell commented on HBASE-6358:
---

bq. Require that hfiles already be on the same filesystem as HBase in order for 
bulkloading to succeed.

+1. This will avoid surprises like the one reported on this issue.





[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic

2012-07-09 Thread Harsh J (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13409896#comment-13409896
 ] 

Harsh J commented on HBASE-6358:


I sort of agree, except it is also more of a best-practice thing. If you bulk 
load remotely with only a single or very few requests per source at a time, and 
with a high RPC timeout at the client (such that it does not retry too often), 
then it should be more tolerable.

But in any case, having the RS do FS copies will indeed make it slow.

I ran into a very similar issue and the tweak I had to suggest was to indeed 
distcp/cp the data first and bulk load next. HBASE-6350 (Logging improvements 
for ops) and HBASE-6339 (Possible optimization, negative in the end) came out 
of it.
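The workaround above (copy first, bulk load second) can be sketched as shell commands; the hostnames, paths, and table name are placeholders:

```shell
# 1. Copy the generated hfiles onto the filesystem HBase itself uses,
#    so the regionserver never has to do a cross-filesystem copy.
hadoop distcp hdfs://etl-nn:8020/output/hfiles hdfs://hbase-nn:8020/tmp/hfiles

# 2. Bulk load from HBase's own filesystem; the srcFs/fs check in
#    Store.bulkLoadHFile() then finds the hfiles already local.
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles /tmp/hfiles mytable
```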





[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic

2012-07-09 Thread Todd Lipcon (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13409898#comment-13409898
 ] 

Todd Lipcon commented on HBASE-6358:


Yea, I originally wrote the code that did the copy, but in hindsight I think it 
was a mistake. I think we should remove that capability and have the code fail 
if the filesystem doesn't match.
