Yes, it should be cleaned up, but as far as I know that cleanup is not part of the current code.
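Until that happens, the caller can remove the files itself once the bulk load is done. Below is a rough sketch (not code from HBase; the class name and helper are just placeholders, and it assumes the partition files sit directly under the job's working directory, e.g. /user/root, and that no other job is still using them):

    import java.io.IOException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class PartitionFileCleanup {

      // Deletes leftover partitions_* files from the given directory
      // (typically the job's working directory, e.g. /user/root).
      // Run this only after the bulk load has completed; a job that is
      // still running needs its own partitions_ file.
      public static void deletePartitionFiles(Configuration conf, Path dir)
          throws IOException {
        FileSystem fs = dir.getFileSystem(conf);
        // Match the files written by HFileOutputFormat#configureIncrementalLoad
        FileStatus[] leftovers = fs.globStatus(new Path(dir, "partitions_*"));
        if (leftovers == null) {
          return; // directory does not exist or nothing matched
        }
        for (FileStatus status : leftovers) {
          if (!fs.delete(status.getPath(), false)) { // non-recursive delete
            System.err.println("Could not delete " + status.getPath());
          }
        }
      }
    }

A plain "hadoop fs -rm /user/root/partitions_*" after the load finishes would do the same thing from the shell.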
Jieshan.

-----Original Message-----
From: Ted Yu [mailto:[email protected]]
Sent: Tuesday, December 17, 2013 10:55 AM
To: [email protected]
Subject: Re: Why so many unexpected files like partitions_xxxx are created?

Should the bulk load task clean up partitions_xxxx upon completion?

Cheers

On Mon, Dec 16, 2013 at 6:53 PM, Bijieshan <[email protected]> wrote:

> > I think I should delete these files immediately after I have finished
> > bulk loading data into HBase since they are useless at that time, right?
>
> Ya. I think so. They are useless once the bulk load task has finished.
>
> Jieshan.
>
> -----Original Message-----
> From: Tao Xiao [mailto:[email protected]]
> Sent: Tuesday, December 17, 2013 9:34 AM
> To: [email protected]
> Subject: Re: Why so many unexpected files like partitions_xxxx are created?
>
> Indeed these files are produced by
> org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles in the directory
> returned by job.getWorkingDirectory(), and I think I should delete these
> files immediately after I have finished bulk loading data into HBase,
> since they are useless at that time, right?
>
>
> 2013/12/16 Bijieshan <[email protected]>
>
> > The reduce partition information is stored in this partitions_XXXX file.
> > See the code below:
> >
> > HFileOutputFormat#configureIncrementalLoad:
> >   .....................
> >   Path partitionsPath = new Path(job.getWorkingDirectory(),
> >       "partitions_" + UUID.randomUUID());
> >   LOG.info("Writing partition information to " + partitionsPath);
> >
> >   FileSystem fs = partitionsPath.getFileSystem(conf);
> >   writePartitions(conf, partitionsPath, startKeys);
> >   .....................
> >
> > Hope it helps.
> >
> > Jieshan
> >
> > -----Original Message-----
> > From: Tao Xiao [mailto:[email protected]]
> > Sent: Monday, December 16, 2013 6:48 PM
> > To: [email protected]
> > Subject: Why so many unexpected files like partitions_xxxx are created?
> >
> > I imported data into HBase via bulk load, but after that I found that
> > many unexpected files had been created in the HDFS directory
> > /user/root/, like these:
> >
> > /user/root/partitions_fd74866b-6588-468d-8463-474e202db070
> > /user/root/partitions_fd867cd2-d9c9-48f5-9eec-185b2e57788d
> > /user/root/partitions_fda37b8a-a882-4787-babc-8310a969f85c
> > /user/root/partitions_fdaca2f4-2792-41f6-b7e8-61a8a5677dea
> > /user/root/partitions_fdd55baa-3a12-493e-8844-a23ae83209c5
> > /user/root/partitions_fdd85a3c-9abe-45d4-a0c6-76d2bed88ea5
> > /user/root/partitions_fe133460-5f3f-4c6a-9fff-ff6c62410cc1
> > /user/root/partitions_fe29a2b0-b281-465f-8d4a-6044822d960a
> > /user/root/partitions_fe2fa6fa-9066-484c-bc91-ec412e48d008
> > /user/root/partitions_fe31667b-2d5a-452e-baf7-a81982fe954a
> > /user/root/partitions_fe3a5542-bc4d-4137-9d5e-1a0c59f72ac3
> > /user/root/partitions_fe6a9407-c27b-4a67-bb50-e6b9fd172bc9
> > /user/root/partitions_fe6f9294-f970-473c-8659-c08292c27ddd
> > ... ...
> > ... ...
> >
> > It seems that they are HFiles, but I don't know why they were created
> > here.
> >
> > I bulk loaded the data into HBase in the following way:
> >
> > Firstly, I wrote a MapReduce program which only has map tasks. The map
> > tasks read some text data and emit them in the form of RowKey and
> > KeyValue. The following is my program:
> >
> >   @Override
> >   protected void map(NullWritable NULL, GtpcV1SignalWritable signal,
> >       Context ctx) throws InterruptedException, IOException {
> >     String strRowkey = xxx;
> >     byte[] rowkeyBytes = Bytes.toBytes(strRowkey);
> >
> >     rowkey.set(rowkeyBytes);
> >
> >     part1.init(signal);
> >     part2.init(signal);
> >
> >     // one KeyValue per column family for this row key
> >     KeyValue kv = new KeyValue(rowkeyBytes, Family_A, Qualifier_Q,
> >         part1.serialize());
> >     ctx.write(rowkey, kv);
> >
> >     kv = new KeyValue(rowkeyBytes, Family_B, Qualifier_Q,
> >         part2.serialize());
> >     ctx.write(rowkey, kv);
> >   }
> >
> > After the MR program finished, there were several HFiles generated in
> > the output directory I specified.
> >
> > Then I began to load these HFiles into HBase using the following command:
> >
> >   hbase org.apache.hadoop.hbase.mapreduce.LoadIncrementalHFiles HFiles-Dir MyTable
> >
> > Finally, I could see that the data had indeed been loaded into the
> > table in HBase.
> >
> > But I could also see that many unexpected files had been generated in
> > the HDFS directory /user/root/, as mentioned at the beginning of this
> > mail, and I did not specify any files to be produced in this directory.
> >
> > What happened? Can anyone tell me what these files are and who produced
> > them?
> >
> > Thanks
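For completeness: the partitions_<uuid> file is written when HFileOutputFormat#configureIncrementalLoad is called while setting up the HFile-generating job (the snippet quoted above), so every run of that job leaves one behind in the submitting user's home directory, which matches the /user/root listing in the question. A rough sketch of that kind of driver follows; the job name, table name, and paths are placeholders, and the exact configureIncrementalLoad signature depends on the Hadoop/HBase versions in use:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.KeyValue;
    import org.apache.hadoop.hbase.client.HTable;
    import org.apache.hadoop.hbase.io.ImmutableBytesWritable;
    import org.apache.hadoop.hbase.mapreduce.HFileOutputFormat;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class BulkLoadDriver {
      public static void main(String[] args) throws Exception {
        Configuration conf = HBaseConfiguration.create();
        Job job = Job.getInstance(conf, "hfile-generation");
        job.setJarByClass(BulkLoadDriver.class);

        // ... set the map-only mapper and its input path here, as in the
        // map() method shown in the question ...
        job.setMapOutputKeyClass(ImmutableBytesWritable.class);
        job.setMapOutputValueClass(KeyValue.class);

        // Directory where the generated HFiles will land
        FileOutputFormat.setOutputPath(job, new Path(args[0]));

        // This call sets up TotalOrderPartitioner against the table's
        // region start keys and writes partitions_<uuid> into
        // job.getWorkingDirectory() (typically /user/<user>/), which is
        // where the leftover files come from.
        HTable table = new HTable(conf, "MyTable");
        HFileOutputFormat.configureIncrementalLoad(job, table);

        System.exit(job.waitForCompletion(true) ? 0 : 1);
      }
    }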
