In the first step, the files are read correctly and regionGroups is created as it should be. When debugging, in LoadIncrementalHFiles.tryAtomicRegionLoad() I notice that the ServerCallable's regionName returned from the server is the wrong region (the last region from before the pre-split). The previous last region is not supposed to be deleted; I'm just adding new regions (always following lexicographically), so the region that was last before the pre-split is no longer the last. It seems that wherever the ServerCallable is running, it is not updated with the new regions... I tried major compacting (the new regions) after the pre-split and before the bulk load, but that didn't help.
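To illustrate the suspected behavior, here is a minimal, self-contained sketch. It is not HBase code: the class name, region names and start keys are all hypothetical, and it only assumes that a region is located by finding the greatest start key that is less than or equal to the row key. Under that assumption, a region map that has not yet seen the new day's split points sends every new-day row key to the previous last region, which matches what I see.

```java
import java.util.Map;
import java.util.TreeMap;

// Illustration only: a toy model of start-key based region lookup.
// All names and keys are hypothetical; this is not the HBase client.
public class StaleRegionCacheSketch {

    // Region map as it looked BEFORE the pre-split for the new day.
    static TreeMap<String, String> staleView() {
        TreeMap<String, String> regions = new TreeMap<>();
        regions.put("", "first");                 // first region has an empty start key
        regions.put("20131215_m", "day15-last");  // last region of the previous day
        return regions;
    }

    // Region map AFTER the pre-split added the new day's regions.
    static TreeMap<String, String> freshView() {
        TreeMap<String, String> regions = staleView();
        regions.put("20131216", "day16-a");
        regions.put("20131216_m", "day16-b");
        return regions;
    }

    // A row belongs to the region with the greatest start key <= rowKey.
    static String locate(TreeMap<String, String> startKeyToRegion, String rowKey) {
        Map.Entry<String, String> entry = startKeyToRegion.floorEntry(rowKey);
        return entry == null ? null : entry.getValue();
    }

    public static void main(String[] args) {
        String rowKey = "20131216_example.com/page";
        System.out.println("stale view -> " + locate(staleView(), rowKey));
        System.out.println("fresh view -> " + locate(freshView(), rowKey));
    }
}
```

With the stale map, every key starting with 20131216 resolves to the previous day's last region, while the fresh map spreads the same keys over the new regions.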

On Mon, Dec 16, 2013 at 3:07 PM, Bijieshan <[email protected]> wrote:

> As we know, bulk load has two steps:
> 1. Create HFiles by MapReduce.
> 2. Load HFiles into HBase.
>
> I wonder whether it read the right partitions information during the first
> step. Have you run hbck tool to check the cluster healthy?
> You mentioned you see the new regions in the webapp. The files were moved
> to the previous old region indicated the old region directory was still
> there. So you started bulk load just after region split? (Old region
> directory will be deleted soon by CatalogJanitor after region-split once
> compaction finished)
>
> I suggest to check the regionserver logs.
>
> Jieshan.
> -----Original Message-----
> From: Amit Sela [mailto:[email protected]]
> Sent: Monday, December 16, 2013 2:29 PM
> To: [email protected]
> Subject: RE: Bulk load moving HFiles to the wrong region
>
> Every split executed is a new day. The row key design is yyyyMMdd_URL. And
> the split points are yyyyMMdd_x, yyyyMMdd_y etc. In a way that the entire
> load is (almost) evenly spread.
> The problem I described causes the bulk load to load all files to to the
> last region of the previous day.
> Thanks.
> On Dec 16, 2013 3:43 AM, "Bijieshan" <[email protected]> wrote:
>
> > Hi Amit:
> > Can you provide the split-keys of the new regions and your row-key
> > design?
> >
> > Thank you.
> > Jieshan.
> > -----Original Message-----
> > From: Amit Sela [mailto:[email protected]]
> > Sent: Monday, December 16, 2013 7:09 AM
> > To: [email protected]
> > Subject: Bulk load moving HFiles to the wrong region
> >
> > Hi all,
> > I'm using Hadoop 1.0.4 and HBase 0.94.12.
> > When trying to bulk load using the Java API I sometimes get the HFiles
> > moved to the wrong directory.
> > I'm pre-splitting regions and the new regions are always the last
> > (lexicographically), so when this happens all files move to the last
> > region pre-split. But the split does work. I see the new regions in
> > the webapp before bulk load executes. Once a table has this problem
> > (not all the time) it keeps on until I restart HBase.
> >
> > Anyone seen something similar ?
> >
> > Thanks,
> > Amit.
