Hmmm, we didn't change anything with the config (at least that we know
of) and we certainly didn't change any of the ordering of performing the
load and the distcp off the cluster.
One interesting thing we were noticing after the upgrade is that distcp
would copy HFiles to the backup cluster, but then the files there
would be deleted. That was actually how we first noticed the change as
we were tracking the total number of HFiles there and the count would
increase as normal after the distcp but then mysteriously decrease. I
presume it was due to the HFile loader marking the files for deletion
while distcp was running, and then the remote HDFS completing the deletion.
Between distcp and LoadIncrementalHFiles some bit of behavior definitely
changed; I just don't know where. Regardless, we now have a
working solution/work-around. If this is the expected behavior rather
than a bug, then all is fine.
- Adam
On 4/30/11 10:50 PM, Todd Lipcon wrote:
Hi Adam,
It's always been this way.
The only time you'll see them copied is if you run the load from a
remote filesystem - i.e. if you specify a URL that doesn't match the URL
used in hbase.rootdir.
See the bulkLoadHFile() method in Store.java:
  // Move the file if it's on another filesystem
  FileSystem srcFs = srcPath.getFileSystem(conf);
  if (!srcFs.equals(fs)) {
    LOG.info("File " + srcPath + " on different filesystem than " +
        "destination store - moving to this filesystem.");
    Path tmpPath = getTmpPath();
    FileUtil.copy(srcFs, srcPath, fs, tmpPath, false, conf);
    LOG.info("Copied to temporary path on dst filesystem: " + tmpPath);
    srcPath = tmpPath;
  }
Perhaps your config changed slightly during the upgrade?
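One quick way to check that might be to compare the scheme and authority of the hbase.rootdir URL against the URL passed to the bulk load. The sketch below is only an approximation of the srcFs.equals(fs) test above: it uses plain java.net.URI rather than Hadoop's FileSystem, it assumes both URLs are fully qualified, and the class name, method name and hostnames are all hypothetical.

```java
import java.net.URI;

// Rough stand-in for the filesystem comparison in bulkLoadHFile().
// Assumption: two URLs refer to the same filesystem when their scheme
// and authority (host:port) match, ignoring case.
class BulkLoadFsCheck {

    static boolean sameFileSystem(String rootDir, String hfileDir) {
        URI root = URI.create(rootDir);
        URI src = URI.create(hfileDir);
        return equalsIgnoreCase(root.getScheme(), src.getScheme())
            && equalsIgnoreCase(root.getAuthority(), src.getAuthority());
    }

    // Null-safe, case-insensitive comparison; two nulls count as equal.
    private static boolean equalsIgnoreCase(String a, String b) {
        return a == null ? b == null : a.equalsIgnoreCase(b);
    }

    public static void main(String[] args) {
        String rootDir = "hdfs://nn1:8020/hbase";  // hypothetical hbase.rootdir

        // Same scheme and authority: expect the HFiles to be moved, not copied.
        System.out.println(
            sameFileSystem(rootDir, "hdfs://nn1:8020/staging/hfiles"));

        // Different namenode: expect the copy branch to be taken instead.
        System.out.println(
            sameFileSystem(rootDir, "hdfs://backup-nn:8020/staging/hfiles"));
    }
}
```

If the first case is what your job is doing, then the source directory emptying out during the load would be the expected result.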
-Todd
On Fri, Apr 29, 2011 at 1:11 PM, Adam Phelps wrote:
I could believe that, although I was under the impression that these
files are actually incorporated into the existing region files.
Still, it's definitely a different behavior than what we were
seeing before our recent upgrade.
- Adam
On 4/29/11 10:41 AM, Patrick Angeles wrote:
Adam,
They are probably not deleted, but moved to the appropriate region
subdirectory under /hbase.
On Fri, Apr 29, 2011 at 1:15 PM, Adam Phelps wrote:
I just verified this, and the hfiles seem to be deleted one at a time as
the bulk load runs.
- Adam
On 4/28/11 4:28 PM, Stack wrote:
I took a look through the code and don't see any explicit removes and
looking through history of changes to the file, I don't see any change
of substance.
Can you figure what is doing the delete? At what stage? Is it as
completebulkload runs?
St.Ack
On Thu, Apr 28, 2011 at 10:59 AM, Adam Phelps wrote:
We were using a backup scheme for our system where we have map-reduce jobs
generating HFiles, which we then loaded using LoadIncrementalHFiles before
making a remote copy of them using distcp.
However we just upgraded hbase (we're using cloudera's package, so we went
from CDH3B4 to CDH3U0, both of which are versions of 0.90.1), and discovered
that the HFiles now get deleted by the load operation. Is this a recent
change? Is there a configuration variable to revert this behavior?
We can work around it by doing the copy before the load, but that is less
than optimal in our scenario as we'd prefer to have quicker access to the
data in HBase.
- Adam
--
Todd Lipcon
Software Engineer, Cloudera