Re: Compactions after bulk load
Thanks for the feedback. I've been slammed with other tasks but will get
to this as soon as we get other things stable.

-Austin

On 07/20/2018 02:59 PM, Ted Yu wrote:
> Have you checked the output from the bulk load to see whether there were
> lines in the following form (from LoadIncrementalHFiles#splitStoreFile)?
>
>     LOG.info("HFile at " + hfilePath + " no longer fits inside a single "
>         + "region. Splitting...");
>
> In the server log, you should see entries in the following form:
>
>     if (LOG.isDebugEnabled()) {
>       LOG.debug("Compacting " + file +
>           ", keycount=" + keyCount +
>           ", bloomtype=" + r.getBloomFilterType().toString() +
>           ", size=" + TraditionalBinaryPrefix.long2String(r.length(), "", 1) +
>           ", encoding=" + r.getHFileReader().getDataBlockEncoding() +
>           ", seqNum=" + seqNum +
>           (allFiles ? ", earliestPutTs=" + earliestPutTs : ""));
>     }
>
> where allFiles being true indicates a major compaction. The above should
> give you some idea of the cause of the compaction activity.
Re: Compactions after bulk load
Have you checked the output from the bulk load to see whether there were
lines in the following form (from LoadIncrementalHFiles#splitStoreFile)?

    LOG.info("HFile at " + hfilePath + " no longer fits inside a single "
        + "region. Splitting...");

In the server log, you should see entries in the following form:

    if (LOG.isDebugEnabled()) {
      LOG.debug("Compacting " + file +
          ", keycount=" + keyCount +
          ", bloomtype=" + r.getBloomFilterType().toString() +
          ", size=" + TraditionalBinaryPrefix.long2String(r.length(), "", 1) +
          ", encoding=" + r.getHFileReader().getDataBlockEncoding() +
          ", seqNum=" + seqNum +
          (allFiles ? ", earliestPutTs=" + earliestPutTs : ""));
    }

where allFiles being true indicates a major compaction. The above should
give you some idea of the cause of the compaction activity.

Thanks

On Tue, Jul 17, 2018 at 11:12 AM Austin Heyne wrote:
> Hi all,
>
> I'm trying to bulk load a large amount of data into HBase. The bulk load
> succeeds, but then HBase starts running compactions. My input files are
> typically ~5-6 GB each and there are over 3,000 of them. I've used the
> same table splits for the bulk ingest and the bulk load, so there should
> be no reason for HBase to run any compactions. However, I'm seeing it
> first compact the HFiles into 25+ GB files and then into 200+ GB files;
> I didn't let it run any longer than that. Additionally, I've talked with
> another coworker who tried this process in the past, and he experienced
> the same thing, eventually giving up on the feature. My attempts have
> been on HBase 1.4.2. Does anyone have information on why HBase is
> insisting on running these compactions, or how I can stop them? They are
> essentially breaking the feature for us.
>
> Thanks,
>
> --
> Austin L. Heyne
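Ted's snippet suggests a quick way to tell minor from major compactions when
scanning a RegionServer log: the earliestPutTs field is printed only when
allFiles is true. A rough sketch of that check (the log line below is
fabricated for illustration; real paths and values will differ):

    # Fabricated sample of a RegionServer DEBUG log line (illustration only):
    line='Compacting hdfs://ns/hbase/data/default/t1/r1/cf/f1, keycount=1000, bloomtype=ROW, size=25.3G, encoding=NONE, seqNum=42, earliestPutTs=1531800000000'

    # earliestPutTs is only emitted when allFiles is true, i.e. a major compaction:
    case "$line" in
      *earliestPutTs*) kind="major" ;;
      *)               kind="minor" ;;
    esac
    echo "$kind compaction"

In practice one would grep the live log, e.g. `grep 'Compacting '
regionserver.log`, and treat the lines containing earliestPutTs as major
compactions.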
Re: Compactions after bulk load
Hi Austin,

Can you share your table description? Also, was the table empty? Last, what
does your bulk data look like? I mean, how many files? One per region? Are
you 100% sure? Have you used the HFile tool to validate the splits and keys
of your files?

JMS

2018-07-17 14:12 GMT-04:00 Austin Heyne:
> Hi all,
>
> I'm trying to bulk load a large amount of data into HBase. The bulk load
> succeeds, but then HBase starts running compactions. My input files are
> typically ~5-6 GB each and there are over 3,000 of them. I've used the
> same table splits for the bulk ingest and the bulk load, so there should
> be no reason for HBase to run any compactions. However, I'm seeing it
> first compact the HFiles into 25+ GB files and then into 200+ GB files;
> I didn't let it run any longer than that. Additionally, I've talked with
> another coworker who tried this process in the past, and he experienced
> the same thing, eventually giving up on the feature. My attempts have
> been on HBase 1.4.2. Does anyone have information on why HBase is
> insisting on running these compactions, or how I can stop them? They are
> essentially breaking the feature for us.
>
> Thanks,
>
> --
> Austin L. Heyne
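On JMS's last point, the HFile tool (`hbase hfile -m -f <path>`) prints each
file's metadata, including its first and last keys. Given those keys and the
table's split points, a crude plausibility check might look like the sketch
below (the split points and boundary keys are made-up sample values, not
taken from the thread):

    # Made-up sample values: the table's split points and one HFile's
    # boundary keys as reported by the HFile tool's metadata output.
    splits="d m t"               # regions: (-inf,d) [d,m) [m,t) [t,+inf)
    firstkey="e"
    lastkey="k"

    fits="yes"
    for s in $splits; do
      # If a split point falls after the first key but not after the last
      # key, the file straddles a region boundary and will be split (and the
      # pieces later compacted) during the bulk load.
      if [[ "$firstkey" < "$s" && ! "$lastkey" < "$s" ]]; then
        fits="no"
      fi
    done
    echo "$fits"

With firstkey=e and lastkey=k the file sits entirely inside the [d,m) region,
so this prints "yes"; a file whose keys span a split point would print "no",
which would explain splitting and compaction after the load.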
Compactions after bulk load
Hi all,

I'm trying to bulk load a large amount of data into HBase. The bulk load
succeeds, but then HBase starts running compactions. My input files are
typically ~5-6 GB each and there are over 3,000 of them. I've used the same
table splits for the bulk ingest and the bulk load, so there should be no
reason for HBase to run any compactions. However, I'm seeing it first
compact the HFiles into 25+ GB files and then into 200+ GB files; I didn't
let it run any longer than that. Additionally, I've talked with another
coworker who tried this process in the past, and he experienced the same
thing, eventually giving up on the feature. My attempts have been on HBase
1.4.2. Does anyone have information on why HBase is insisting on running
these compactions, or how I can stop them? They are essentially breaking
the feature for us.

Thanks,

--
Austin L. Heyne
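For scale, the figures in this message work out as follows (a back-of-envelope
sketch using the low-end numbers quoted above; all values are approximate):

    files=3000        # "over 3k files"
    gb_per_file=5     # low end of "~5-6 GB"

    total_gb=$(( files * gb_per_file ))
    echo "${total_gb} GB loaded in total"     # roughly 15 TB at minimum

    # A ~25 GB compacted output from ~5 GB inputs implies on the order of
    # five HFiles being merged per compaction, i.e. several HFiles ending up
    # in the same region despite the matching splits.
    merged=$(( 25 / gb_per_file ))
    echo "${merged} files merged per 25 GB output"

Several files landing in one region would be enough to cross the default
minor-compaction file-count threshold, which is one plausible reading of the
25+ GB outputs described above.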