[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429262#comment-13429262 ]

Dave Revell commented on HBASE-6358:

{quote}If the size and speed don't matter, then wouldn't you have just used a normal (non-bulk-load) MR job to load the data?{quote}

There are other reasons to atomically load hfiles even for non-huge datasets, such as ETL and restoring backups. And atomicity could have some benefits for certain use cases. But it's probably not asking too much for people with these use cases to use a distributed hfile loader that depends on mapreduce, so I'm willing to concede the point.

@Todd, would you be in favor of adding another JIRA ticket for a distributed bulk loader, and having this ticket be blocked until it's done? I think it should be blocked so we don't remove the current bulkload-from-remote-fs capability without offering an alternative, though the user does have the option of running distcp themselves.

Bulkloading from remote filesystem is problematic
-------------------------------------------------

                 Key: HBASE-6358
                 URL: https://issues.apache.org/jira/browse/HBASE-6358
             Project: HBase
          Issue Type: Bug
          Components: regionserver
    Affects Versions: 0.94.0
            Reporter: Dave Revell
            Assignee: Dave Revell
         Attachments: 6358-suggestion.txt, HBASE-6358-trunk-v1.diff, HBASE-6358-trunk-v2.diff, HBASE-6358-trunk-v3.diff

Bulk loading hfiles that don't live on the same filesystem as HBase can cause problems for subtle reasons. In Store.bulkLoadHFile(), the regionserver will copy the source hfile to its own filesystem if it's not already there. Since this can take a long time for large hfiles, it's likely that the client will time out and retry. When the client retries repeatedly, there may be several bulkload operations in flight for the same hfile, causing lots of unnecessary IO and tying up handler threads. This can seriously impact performance. In my case, the cluster became unusable and the regionservers had to be kill -9'ed.

Possible solutions:
# Require that hfiles already be on the same filesystem as HBase in order for bulkloading to succeed. The copy could be handled by LoadIncrementalHFiles before the regionserver is called.
# Others? I'm not familiar with Hadoop IPC so there may be tricks to extend the timeout or something else.

I'm willing to write a patch but I'd appreciate recommendations on how to proceed.

--
This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators: https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13429359#comment-13429359 ]

Todd Lipcon commented on HBASE-6358:

bq. @Todd, would you be in favor of adding another JIRA ticket for a distributed bulk loader, and having this ticket be blocked until it's done? I think it should be blocked so we don't remove the current bulkload from remote fs capability without offering an alternative, though the user does have the option of running distcp themselves.

I could go either way on this. Up to folks who are more actively contributing code than I :)
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428217#comment-13428217 ]

Dave Revell commented on HBASE-6358:

@Todd, can you explain your original rationale for using srcFs.equals(destFs) as a way of checking whether the src and dest filesystems are the same? Also, as an HDFS expert, can you suggest the best possible way for checking whether the underlying filesystems are actually the same FS?
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428291#comment-13428291 ]

Todd Lipcon commented on HBASE-6358:

Hmm... I don't know if I thought about it in a huge amount of detail. The original idea was to allow you to run an MR job on one cluster, and then LoadIncrementalHFiles on your HBase cluster, which uses a different HDFS. I was thinking there would be an advantage here over the distcp-then-load approach, because the region server doing the copy would end up with a local replica after the load. That said, I didn't think through the timeout implications, which seem to be the issue discussed in this JIRA.

As for how to determine if they're the same, the .equals() call is supposed to do that, but perhaps it's not working right?
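In Hadoop, whether two paths are "on the same filesystem" effectively comes down to the filesystem URI: the scheme (e.g. hdfs) and the authority (the namenode host:port). As a rough, self-contained illustration of that comparison using plain java.net.URI (the class and method names here are made up for the sketch; this is not the HBase or Hadoop implementation):

```java
import java.net.URI;

public class FsIdentity {
    // Two paths live on the "same filesystem" when their URIs share a
    // scheme (e.g. hdfs) and an authority (the namenode host:port).
    // A path under a different namenode belongs to a remote filesystem.
    static boolean sameFileSystem(URI a, URI b) {
        return eq(a.getScheme(), b.getScheme()) && eq(a.getAuthority(), b.getAuthority());
    }

    private static boolean eq(String x, String y) {
        return x == null ? y == null : x.equalsIgnoreCase(y);
    }

    public static void main(String[] args) {
        URI hbaseRoot = URI.create("hdfs://nn1:8020/hbase");
        URI sameNn    = URI.create("hdfs://nn1:8020/user/me/hfiles");
        URI otherNn   = URI.create("hdfs://nn2:8020/staging/hfiles");
        System.out.println(sameFileSystem(hbaseRoot, sameNn));  // true
        System.out.println(sameFileSystem(hbaseRoot, otherNn)); // false
    }
}
```

If FileSystem.equals() were comparing object identity or cached instances rather than the underlying URI, two handles to the same HDFS could compare unequal, which may be the failure mode being asked about here.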
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428295#comment-13428295 ]

Dave Revell commented on HBASE-6358:

Thanks Todd. My new plan is to keep the copy, but do it in LoadIncrementalHFiles instead of in Store. This keeps the existing use cases and test cases intact, but works around the timeout/retry problem.
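A minimal sketch of the shape of that plan, using java.nio purely for illustration (a real patch would use Hadoop's FileSystem API; ClientSideCopy and stageForBulkLoad are hypothetical names): the expensive copy happens in the client before the regionserver RPC, so the RPC itself stays short and cannot time out mid-copy.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardCopyOption;

public class ClientSideCopy {
    // Copy the hfile into a staging directory on the destination side
    // before asking the regionserver to bulk-load it. The slow copy now
    // happens client-side; the server call only ever sees a local path.
    static Path stageForBulkLoad(Path srcHFile, Path stagingDir) throws IOException {
        Files.createDirectories(stagingDir);
        Path staged = stagingDir.resolve(srcHFile.getFileName());
        Files.copy(srcHFile, staged, StandardCopyOption.REPLACE_EXISTING);
        return staged; // pass this path, not srcHFile, to the regionserver
    }
}
```

The trade-off discussed below is that this funnels all the data through the single client node running LoadIncrementalHFiles.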
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428337#comment-13428337 ]

Todd Lipcon commented on HBASE-6358:

The problem of doing it automatically in LoadIncrementalHFiles (i.e. the client) is that it is going to be very slow for any non-trivial amount of data to funnel it through this single node. Here's an alternate idea:

1. In this JIRA, change the RS side to fail if the filesystem doesn't match.
2. Separately, add a new DistributedLoadIncrementalHFiles program which acts as a combination of distcp and LoadIncrementalHFiles. For each RS (or perhaps for each region), it would create one map task, with a locality hint to that server. Then the task would copy the relevant file (achieving a local replica) and make the necessary call to load the file.

Between step 1 and step 2, users would have to use distcp and sacrifice locality. But, with the current scheme, they already don't get locality for the common case where the MR job runs on the same cluster as HBase. Thoughts?
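The task-planning half of the proposed DistributedLoadIncrementalHFiles could look roughly like the following stand-in sketch (everything here is hypothetical, including the class and field names; a real implementation would build MapReduce InputSplits whose getLocations() return the hinted server):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class DistributedLoadSketch {
    // One copy+load task per region, hinted to run on that region's
    // server, so the copy leaves a local block replica behind.
    static final class LoadTask {
        final String hfile;
        final String preferredHost;
        LoadTask(String hfile, String preferredHost) {
            this.hfile = hfile;
            this.preferredHost = preferredHost;
        }
    }

    static List<LoadTask> planTasks(Map<String, String> regionToServer,
                                    Map<String, String> regionToHFile) {
        List<LoadTask> tasks = new ArrayList<>();
        for (Map.Entry<String, String> e : regionToServer.entrySet()) {
            String hfile = regionToHFile.get(e.getKey());
            if (hfile != null) {
                // Locality hint: schedule the map task on the region's server.
                tasks.add(new LoadTask(hfile, e.getValue()));
            }
        }
        return tasks;
    }
}
```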
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428368#comment-13428368 ]

Dave Revell commented on HBASE-6358:

@Todd, that idea seems fine to me overall. If we just did the slow copy in LoadIncrementalHFiles as I suggested earlier, users would still have the option of doing distcp before calling LoadIncrementalHFiles if they need performance. This has the benefits of:
# not breaking the current use case of non-local bulk loading when size or speed requirements are modest
# not requiring a new DistributedLoadIncrementalHFiles utility

This scheme would not give locality, though, so users with serious performance requirements might not be satisfied.
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13428508#comment-13428508 ]

Todd Lipcon commented on HBASE-6358:

bq. not breaking the current use case of non-local bulk loading when size or speed requirements are modest

If the size and speed don't matter, then wouldn't you have just used a normal (non-bulk-load) MR job to load the data? I think funneling the load through one host basically defeats the purpose of bulk load. Perhaps it could be available as an option for people just testing out, but I would prefer the default to be a failure, and you have to enable the copy with a {{-copyToCluster}} flag or something.
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427664#comment-13427664 ]

Zhihong Ted Yu commented on HBASE-6358:
---

{code}
+      throw new IOException(errMsg);
{code}

Should DoNotRetryIOException be thrown instead?
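For context: HBase clients treat DoNotRetryIOException as a signal to fail immediately rather than retry, which matters here because retries are exactly what amplified the problem. A toy sketch of that retry-loop distinction (FailFastException and callWithRetries are stand-ins, not the actual HBase client code):

```java
import java.util.concurrent.Callable;

public class RetrySketch {
    // Stand-in for DoNotRetryIOException: a marker type the retry loop
    // rethrows immediately instead of retrying the call.
    static class FailFastException extends Exception {
        FailFastException(String msg) { super(msg); }
    }

    static <T> T callWithRetries(Callable<T> call, int maxAttempts) throws Exception {
        Exception last = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return call.call();
            } catch (FailFastException e) {
                throw e; // permanent condition: do not retry
            } catch (Exception e) {
                last = e; // transient failure: try again
            }
        }
        throw last;
    }
}
```

Throwing a plain IOException from the server would put the client on the retry path; a DoNotRetryIOException would surface the error to the caller on the first attempt.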
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427711#comment-13427711 ]

Dave Revell commented on HBASE-6358:

[~zhi...@ebaysf.com]: yes, I'll change it.
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427761#comment-13427761 ]

Hadoop QA commented on HBASE-6358:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12538970/HBASE-6358-trunk-v2.diff
against trunk revision .

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).
-1 findbugs. The patch appears to introduce 10 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
    org.apache.hadoop.hbase.io.hfile.TestForceCacheImportantBlocks
    org.apache.hadoop.hbase.regionserver.TestHRegionServerBulkLoad
    org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery
    org.apache.hadoop.hbase.master.TestMasterNoCluster
    org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles
    org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface
    org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2489//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2489//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2489//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2489//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2489//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2489//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2489//console

This message is automatically generated.
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427772#comment-13427772 ]

Dave Revell commented on HBASE-6358:

Patch v3 uploaded
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13427786#comment-13427786 ]

Hadoop QA commented on HBASE-6358:
--

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12538970/HBASE-6358-trunk-v2.diff
against trunk revision .

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).
-1 findbugs. The patch appears to introduce 10 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
    org.apache.hadoop.hbase.regionserver.TestHRegionServerBulkLoad
    org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery
    org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles
    org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface
    org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2492//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2492//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2492//console

This message is automatically generated.
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427805#comment-13427805 ]

Hadoop QA commented on HBASE-6358:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12538985/HBASE-6358-trunk-v3.diff
against trunk revision.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).
-1 findbugs. The patch appears to introduce 10 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.client.TestAdmin
org.apache.hadoop.hbase.regionserver.TestHRegionServerBulkLoad
org.apache.hadoop.hbase.regionserver.TestSplitTransactionOnCluster
org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFilesSplitRecovery
org.apache.hadoop.hbase.master.TestMasterNoCluster
org.apache.hadoop.hbase.mapreduce.TestLoadIncrementalHFiles
org.apache.hadoop.hbase.coprocessor.TestRegionObserverInterface
org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2493//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2493//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2493//console

This message is automatically generated.
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427810#comment-13427810 ]

Zhihong Ted Yu commented on HBASE-6358:
---------------------------------------

With a little more information added to the log, you would see that the following check is inaccurate:
{code}
if (!srcFs.equals(fs)) {
{code}
srcFs is: DFS[DFSClient[clientName=DFSClient_hb_rs_192.168.0.13,55152,1343960033351, ugi=zhihyu.hfs.0]]
fs is: org.apache.hadoop.hbase.fs.HFileSystem@580a00fd

I suggest using fs.getHomeDirectory() for the comparison: it includes hostname, port number, and home path.
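To make the comparison Ted describes concrete: two FileSystem objects can point at the same cluster yet fail object equality, while comparing hostname, port, and scheme (the information carried by a home-directory path) gives the intended answer. Below is a minimal, Hadoop-free sketch of that URI-based check; the `sameFileSystem` helper and the example URIs are hypothetical illustrations, not code from the patch.

```java
import java.net.URI;
import java.util.Objects;

public class FsCompare {
    // Hypothetical helper: treat two paths as living on the same filesystem
    // when their URIs agree on scheme and authority (hostname:port). Plain
    // object equality on FileSystem instances (srcFs.equals(fs)) compares
    // wrapper objects and can report "different" for the same cluster.
    static boolean sameFileSystem(URI a, URI b) {
        return Objects.equals(a.getScheme(), b.getScheme())
            && Objects.equals(a.getAuthority(), b.getAuthority());
    }

    public static void main(String[] args) {
        URI src    = URI.create("hdfs://namenode.example.com:8020/bulk/hfiles/f1");
        URI dst    = URI.create("hdfs://namenode.example.com:8020/hbase/data");
        URI remote = URI.create("hdfs://other-nn.example.com:8020/bulk/hfiles/f1");

        System.out.println(sameFileSystem(src, dst));    // same scheme and authority: true
        System.out.println(sameFileSystem(src, remote)); // different host: false
    }
}
```

In real Hadoop code the equivalent information comes from `FileSystem.getUri()` (or, as suggested above, from comparing home-directory paths), rather than from raw `java.net.URI` values.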
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13427829#comment-13427829 ]

Hadoop QA commented on HBASE-6358:
----------------------------------

-1 overall. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12538994/6358-suggestion.txt
against trunk revision.

+1 @author. The patch does not contain any @author tags.
-1 tests included. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
+1 hadoop2.0. The patch compiles against the hadoop 2.0 profile.
+1 javadoc. The javadoc tool did not generate any warning messages.
-1 javac. The applied patch generated 5 javac compiler warnings (more than the trunk's current 4 warnings).
-1 findbugs. The patch appears to introduce 10 new Findbugs (version 1.3.9) warnings.
+1 release audit. The applied patch does not increase the total number of release audit warnings.
-1 core tests. The patch failed these unit tests:
org.apache.hadoop.hbase.master.TestAssignmentManager

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/2495//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/2495//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/2495//console

This message is automatically generated.
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426680#comment-13426680 ]

Harsh J commented on HBASE-6358:
--------------------------------

Hey Dave, will you be doing the patch? We can probably deprecate this feature in 0.96 and remove it in the release after that?
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13426747#comment-13426747 ]

Dave Revell commented on HBASE-6358:
------------------------------------

@Harsh, yes I will, sorry for the delay. I can have it within a week.
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13410658#comment-13410658 ]

Andrew Purtell commented on HBASE-6358:
---------------------------------------

bq. Require that hfiles already be on the same filesystem as HBase in order for bulkloading to succeed.

+1. This will avoid surprises like the one reported on this issue.
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13409896#comment-13409896 ]

Harsh J commented on HBASE-6358:
--------------------------------

I sort of agree, though it is also more of a best-practice thing. If you bulk load remotely with only one or very few requests per source at a time, and with a high RPC timeout at the client (so that it does not retry too often), then it should be more tolerable. But in any case, having the RS do FS copies will indeed make it slow. I ran into a very similar issue, and the tweak I had to suggest was indeed to distcp/cp the data first and bulk load next. HBASE-6350 (logging improvements for ops) and HBASE-6339 (a possible optimization that turned out negative in the end) came out of it.
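The copy-first workflow Harsh describes looks roughly like this from the client side. This is a sketch only: the hostnames, ports, paths, and table name are placeholders, and the commands assume a Hadoop/HBase installation where `hadoop distcp` and the `LoadIncrementalHFiles` tool are available.

```shell
# 1. Copy the generated hfiles onto the filesystem HBase itself uses
#    (distcp for large datasets; plain `hadoop fs -cp` also works for small ones).
hadoop distcp hdfs://etl-nn.example.com:8020/output/hfiles \
              hdfs://hbase-nn.example.com:8020/tmp/hfiles

# 2. Bulk load from the filesystem local to HBase, so the regionserver
#    only has to move files into place rather than copy them across clusters.
hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles \
      hdfs://hbase-nn.example.com:8020/tmp/hfiles mytable
```

With this split, the long-running network transfer happens outside the regionserver's handler threads, so the RPC timeout-and-retry loop described in this issue never starts.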
[jira] [Commented] (HBASE-6358) Bulkloading from remote filesystem is problematic
[ https://issues.apache.org/jira/browse/HBASE-6358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13409898#comment-13409898 ]

Todd Lipcon commented on HBASE-6358:
------------------------------------

Yea, I originally wrote the code that did the copy, but in hindsight I think it was a mistake. I think we should remove that capability and have the code fail if the filesystem doesn't match.