BTW, I noticed another problem. I bulk load data into HBase every five
minutes, but I found that whenever the following command is executed

hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles HFiles-Dir MyTable

a new process called "LoadIncrementalHFiles" appears. Using "jps" in the
terminal I can see many "LoadIncrementalHFiles" processes. Why are these
processes still there even after the bulk load command has finished
executing? I have to kill them myself.
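A possible workaround might be to drive the load programmatically and exit
explicitly once it returns. The following is only a minimal, untested sketch
assuming the 0.94-era client API; "BulkLoadRunner" is a placeholder name, and
the table name and HFile directory are the ones from the command above:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.HTable;
import org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles;

public class BulkLoadRunner {
    public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        HTable table = new HTable(conf, "MyTable");
        try {
            // does the same work as the shell command, but in-process
            new LoadIncrementalHFiles(conf).doBulkLoad(new Path("HFiles-Dir"), table);
        } finally {
            table.close();
        }
        // exit explicitly in case non-daemon threads keep the JVM alive
        System.exit(0);
    }
}
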
2013/12/17 Bijieshan <[email protected]>
> Yes, it should be cleaned up, but as far as I understand that is not done
> in the current code.
>
> Jieshan.
> -----Original Message-----
> From: Ted Yu [mailto:[email protected]]
> Sent: Tuesday, December 17, 2013 10:55 AM
> To: [email protected]
> Subject: Re: Why so many unexpected files like partitions_xxxx are created?
>
> Should the bulk load task clean up partitions_xxxx upon completion?
>
> Cheers
>
>
> On Mon, Dec 16, 2013 at 6:53 PM, Bijieshan <[email protected]> wrote:
>
> > > I think I should delete these files immediately after I have finished
> > > bulk loading data into HBase since they are useless at that time, right?
> >
> > Yes, I think so. They are useless once the bulk load task has finished.
> >
> > Jieshan.
> > -----Original Message-----
> > From: Tao Xiao [mailto:[email protected]]
> > Sent: Tuesday, December 17, 2013 9:34 AM
> > To: [email protected]
> > Subject: Re: Why so many unexpected files like partitions_xxxx are created?
> >
> > Indeed, these files are produced by HFileOutputFormat#configureIncrementalLoad
> > in the directory returned by job.getWorkingDirectory(), and I think I
> > should delete them immediately after I have finished bulk loading data
> > into HBase, since they are useless at that point, right?
> >
> > 2013/12/16 Bijieshan <[email protected]>
> >
> > > The reduce partition information is stored in this partitions_xxxx file.
> > > See the code below:
> > >
> > > HFileOutputFormat#configureIncrementalLoad:
> > > .....................
> > > Path partitionsPath = new Path(job.getWorkingDirectory(),
> > >     "partitions_" + UUID.randomUUID());
> > > LOG.info("Writing partition information to " + partitionsPath);
> > >
> > > FileSystem fs = partitionsPath.getFileSystem(conf);
> > > writePartitions(conf, partitionsPath, startKeys);
> > > .....................
> > >
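> > > If you want to clean them up after the load, a small sketch (assuming
> > > the files land under /user/root/ as in your listing below, and that a
> > > Configuration conf is already in scope) could look like this:
> > >
> > > FileSystem fs = FileSystem.get(conf);
> > > // match the leftover partition files by name pattern
> > > FileStatus[] stale = fs.globStatus(new Path("/user/root/partitions_*"));
> > > if (stale != null) {
> > >     for (FileStatus stat : stale) {
> > >         fs.delete(stat.getPath(), false); // non-recursive; each is a single file
> > >     }
> > > }
> > >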
> > > Hoping it helps.
> > >
> > > Jieshan
> > > -----Original Message-----
> > > From: Tao Xiao [mailto:[email protected]]
> > > Sent: Monday, December 16, 2013 6:48 PM
> > > To: [email protected]
> > > Subject: Why so many unexpected files like partitions_xxxx are created?
> > >
> > > I imported data into HBase via bulk load, but afterwards I found that
> > > many unexpected files had been created in the HDFS directory
> > > /user/root/, looking like these:
> > >
> > > /user/root/partitions_fd74866b-6588-468d-8463-474e202db070
> > > /user/root/partitions_fd867cd2-d9c9-48f5-9eec-185b2e57788d
> > > /user/root/partitions_fda37b8a-a882-4787-babc-8310a969f85c
> > > /user/root/partitions_fdaca2f4-2792-41f6-b7e8-61a8a5677dea
> > > /user/root/partitions_fdd55baa-3a12-493e-8844-a23ae83209c5
> > > /user/root/partitions_fdd85a3c-9abe-45d4-a0c6-76d2bed88ea5
> > > /user/root/partitions_fe133460-5f3f-4c6a-9fff-ff6c62410cc1
> > > /user/root/partitions_fe29a2b0-b281-465f-8d4a-6044822d960a
> > > /user/root/partitions_fe2fa6fa-9066-484c-bc91-ec412e48d008
> > > /user/root/partitions_fe31667b-2d5a-452e-baf7-a81982fe954a
> > > /user/root/partitions_fe3a5542-bc4d-4137-9d5e-1a0c59f72ac3
> > > /user/root/partitions_fe6a9407-c27b-4a67-bb50-e6b9fd172bc9
> > > /user/root/partitions_fe6f9294-f970-473c-8659-c08292c27ddd
> > > ... ...
> > > ... ...
> > >
> > >
> > > It seems that they are HFiles, but I don't know why they were created
> > > here.
> > >
> > > I bulk load data into HBase in the following way:
> > >
> > > First, I wrote a MapReduce program which has only map tasks. The map
> > > tasks read some text data and emit it in the form of a RowKey and a
> > > KeyValue. The following is my map method:
> > >
> > > @Override
> > > protected void map(NullWritable NULL, GtpcV1SignalWritable signal,
> > >         Context ctx) throws InterruptedException, IOException {
> > >     String strRowkey = xxx;
> > >     byte[] rowkeyBytes = Bytes.toBytes(strRowkey);
> > >
> > >     rowkey.set(rowkeyBytes);
> > >
> > >     part1.init(signal);
> > >     part2.init(signal);
> > >
> > >     KeyValue kv = new KeyValue(rowkeyBytes, Family_A, Qualifier_Q,
> > >             part1.serialize());
> > >     ctx.write(rowkey, kv);
> > >
> > >     kv = new KeyValue(rowkeyBytes, Family_B, Qualifier_Q,
> > >             part2.serialize());
> > >     ctx.write(rowkey, kv);
> > > }
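> > >
> > > For completeness, my driver sets up the job roughly like this (class,
> > > variable and path names are simplified placeholders):
> > >
> > > Configuration conf = HBaseConfiguration.create();
> > > HTable table = new HTable(conf, "MyTable");
> > >
> > > Job job = new Job(conf, "prepare-hfiles");
> > > job.setJarByClass(MyDriver.class);
> > > job.setMapperClass(MyMapper.class);
> > > job.setMapOutputKeyClass(ImmutableBytesWritable.class);
> > > job.setMapOutputValueClass(KeyValue.class);
> > > FileInputFormat.addInputPath(job, new Path("input-dir"));
> > > FileOutputFormat.setOutputPath(job, new Path("HFiles-Dir"));
> > >
> > > // sets up the partitioner, reducer and HFileOutputFormat for the table
> > > HFileOutputFormat.configureIncrementalLoad(job, table);
> > >
> > > job.waitForCompletion(true);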
> > >
> > >
> > > After the MR program finished, there were several HFiles in the output
> > > directory I had specified.
> > >
> > > Then I began to load these HFiles into HBase using the following command:
> > >
> > > hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles
> > > HFiles-Dir MyTable
> > >
> > > Finally, I could see that the data was indeed loaded into the table in
> > > HBase.
> > >
> > >
> > > But I could also see that there were many unexpected files generated
> > > in the HDFS directory /user/root/, as I mentioned at the beginning of
> > > this mail, and I did not specify any files to be produced in that
> > > directory.
> > >
> > > What happened? Can anyone tell me what these files are and who
> > > produced them?
> > >
> > > Thanks
> > >
> >
>