[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15120874#comment-15120874 ] Hadoop QA commented on HBASE-11368: ---
| (x) *{color:red}-1 overall{color}* |
\\
\\
|| Vote || Subsystem || Runtime || Comment ||
| {color:red}-1{color} | {color:red} patch {color} | {color:red} 0m 5s {color} | {color:red} HBASE-11368 does not apply to master. Rebase required? Wrong Branch? See https://yetus.apache.org/documentation/latest/precommit-patchnames for help. {color} |
\\
\\
|| Subsystem || Report/Notes ||
| JIRA Patch URL | https://issues.apache.org/jira/secure/attachment/12677543/hbase11368-master.patch |
| JIRA Issue | HBASE-11368 |
| Powered by | Apache Yetus 0.1.0 http://yetus.apache.org |
| Console output | https://builds.apache.org/job/PreCommit-HBASE-Build/331/console |

This message was automatically generated.

> Multi-column family BulkLoad fails if compactions go on too long
>
> Key: HBASE-11368
> URL: https://issues.apache.org/jira/browse/HBASE-11368
> Project: HBase
> Issue Type: Bug
> Reporter: stack
> Assignee: Qiang Tian
> Attachments: hbase-11368-0.98.5.patch, hbase11368-master.patch,
> key_stacktrace_hbase10882.TXT, performance_improvement_verification_98.5.patch
>
> Compactions take a read lock. If a multi-column family region, before bulk
> loading, we want to take a write lock on the region. If the compaction takes
> too long, the bulk load fails.
> Various recipes include:
> + Making smaller regions (lame)
> + [~victorunique] suggests major compacting just before bulk loading over in
> HBASE-10882 as a work around.
> Does the compaction need a read lock for that long? Does the bulk load need
> a full write lock when multiple column families? Can we fail more gracefully
> at least?

--
This message was sent by Atlassian JIRA
(v6.3.4#6332)
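A minimal Java sketch of the contention described above, assuming the region lock behaves like a plain `java.util.concurrent.locks.ReentrantReadWriteLock` and that the bulk load waits a bounded time for the write lock. `RegionLockDemo` and its timings are hypothetical, not HBase code; the point is only that a timed write-lock request fails while a long-running holder keeps the read lock.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Toy model of the reported failure; RegionLockDemo is hypothetical, not HBase code.
class RegionLockDemo {
    /**
     * Starts a "compaction" that holds the read lock for compactionMillis, then lets a
     * "bulk load" wait up to bulkLoadTimeoutMillis for the write lock.
     * Returns true if the bulk load got the write lock in time.
     */
    static boolean bulkLoadGetsWriteLock(long compactionMillis, long bulkLoadTimeoutMillis) {
        ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();
        CountDownLatch compactionStarted = new CountDownLatch(1);
        Thread compaction = new Thread(() -> {
            regionLock.readLock().lock();          // held for the whole compaction
            try {
                compactionStarted.countDown();
                Thread.sleep(compactionMillis);    // stands in for rewriting store files
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                regionLock.readLock().unlock();
            }
        });
        compaction.start();
        try {
            compactionStarted.await();             // the read lock is now definitely held
            boolean acquired = regionLock.writeLock()
                    .tryLock(bulkLoadTimeoutMillis, TimeUnit.MILLISECONDS);
            if (acquired) {
                regionLock.writeLock().unlock();
            }
            compaction.join();
            return acquired;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("long compaction:  " + bulkLoadGetsWriteLock(500, 50));
        System.out.println("short compaction: " + bulkLoadGetsWriteLock(10, 2000));
    }
}
```

Run as-is, it should report `false` for the long compaction and `true` for the short one; the timings are arbitrary and only need the compaction to outlast (or not) the write-lock timeout.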
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15101219#comment-15101219 ] Jerry He commented on HBASE-11368: -- This ticket can be closed because the only sub-task would fix the problem. Right?
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14983879#comment-14983879 ] ramkrishna.s.vasudevan commented on HBASE-11368: This comment https://issues.apache.org/jira/browse/HBASE-11368?focusedCommentId=14693166=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel#comment-14693166 is being addressed as part of HBASE-13082. Doing that JIRA would mean that any ongoing scan will not be able to see bulk-loaded hfiles that are loaded after the scan has started. I think that behaviour should be acceptable, right?
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14984102#comment-14984102 ] Nick Dimiduk commented on HBASE-11368: --
bq. any current on going scan will not be able to see the bulk loaded hfiles which is loaded just after the current scan has started. I think that behaviour should be acceptable, right?
I believe this is the correct behavior, yes.
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14947820#comment-14947820 ] Nick Dimiduk commented on HBASE-11368: -- FYI, opened subtask HBASE-14575 to give [~devaraj]'s idea a spin. Mind having a look?
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730105#comment-14730105 ] Nick Dimiduk commented on HBASE-11368: -- Looks like HBASE-6028 wants to implement the meat of what I've proposed above. It also happens to be 2/3 of the work for HBASE-12446. Seems like good bang for the buck on this approach.

I've been chatting with [~enis] and [~devaraj] about this offline. Another idea is that we can reduce the scope of when the read lock is held during compaction. In theory, the compactor only needs a region read lock while deciding which files to compact and at the time of committing the compaction. We're protected from region close events because compactions check (between every Cell!) whether the store has been closed, and abort in that case. Is there another reason why we would want to hold the read lock for the entire duration of the compaction? [~stack] [~lhofhansl]?
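The narrower lock scope proposed above can be sketched as three phases: select files under the read lock, rewrite with no region lock held, and commit under the read lock again. This is a hypothetical model (`NarrowLockCompaction`, `compact` and `midRewrite` are illustrative names), and it assumes, as the comment does, that a per-cell closed-store check is enough to protect against region close. The file-list swap uses a separate store-level monitor because the region read lock is shared.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: the region read lock is held only for selection and commit.
// Not the HBase compactor; all names here are illustrative.
class NarrowLockCompaction {
    final ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();
    private final List<String> storeFiles = new ArrayList<>();
    private volatile boolean storeClosed = false;

    NarrowLockCompaction(List<String> initialFiles) { storeFiles.addAll(initialFiles); }

    synchronized List<String> files() { return new ArrayList<>(storeFiles); }
    synchronized void addBulkLoadedFile(String name) { storeFiles.add(name); }
    void closeStore() { storeClosed = true; }

    /**
     * Rewrites `cells`; `midRewrite` runs while no region lock is held (a stand-in for a
     * bulk load arriving mid-compaction). Returns the new file, or null if the store closed.
     */
    String compact(List<String> cells, Runnable midRewrite) {
        List<String> selected;
        regionLock.readLock().lock();                 // short section 1: choose input files
        try {
            selected = files();
        } finally {
            regionLock.readLock().unlock();
        }

        StringBuilder rewritten = new StringBuilder();
        for (int i = 0; i < cells.size(); i++) {      // long phase: no region lock held
            if (storeClosed) {
                return null;                          // per-cell check guards against close
            }
            if (i == 0) {
                midRewrite.run();
            }
            rewritten.append(cells.get(i));
        }

        regionLock.readLock().lock();                 // short section 2: commit
        try {
            if (storeClosed) {
                return null;
            }
            String output = "compacted[" + rewritten + "]";
            synchronized (this) {                     // store-level guard for the list swap
                storeFiles.removeAll(selected);       // only the inputs; later bulk loads stay
                storeFiles.add(output);
            }
            return output;
        } finally {
            regionLock.readLock().unlock();
        }
    }

    public static void main(String[] args) {
        NarrowLockCompaction store = new NarrowLockCompaction(List.of("f1", "f2"));
        boolean[] bulkLoadRan = {false};
        String out = store.compact(List.of("a", "b", "c"), () -> {
            if (store.regionLock.writeLock().tryLock()) {   // free: no read lock is held
                try {
                    store.addBulkLoadedFile("bulk1");
                    bulkLoadRan[0] = true;
                } finally {
                    store.regionLock.writeLock().unlock();
                }
            }
        });
        System.out.println(out + " " + bulkLoadRan[0] + " " + store.files());
    }
}
```

The `main` method lets a simulated bulk load take the write lock in the middle of the rewrite and prints `compacted[abc] true [bulk1, compacted[abc]]`: the bulk-loaded file survives the commit because only the selected inputs are removed.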
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730283#comment-14730283 ] stack commented on HBASE-11368: --- [~ndimiduk] I like the idea of narrowing the lock scope, but I started to look and it's a bit of a rat's nest where locks are held (compactions checking on each row seems well dodgy...). Yeah, a review of the attempt at undoing scanner locks so that only a region-level lock remains sounds like it would help.
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14730140#comment-14730140 ] Lars Hofhansl commented on HBASE-11368: --- See also discussion in HBASE-13082.
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14728155#comment-14728155 ] Nick Dimiduk commented on HBASE-11368: -- Check my facts:
# the only uses of the write lock are region close and bulkload, two rare events.
# the only long-running read lock holders are compactions, frequent events.

What if we allow write lock access requests to interrupt running compactions?
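One way to read the proposal to let write-lock requests interrupt compactions: a writer first announces itself, and compactions poll that signal between units of work and give up the read lock early. This is a hypothetical sketch (`InterruptibleCompactionLock`, `runExclusive` and `pendingWriters` are made-up names); it does not model discarding or retrying the aborted compaction's partial output, which a real implementation would need.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: a pending write-lock request (bulk load or region close) makes
// running compactions release their read lock early. Names are illustrative only.
class InterruptibleCompactionLock {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final AtomicInteger pendingWriters = new AtomicInteger();

    /** Simulated compaction; returns how many of `steps` units completed before aborting. */
    int compact(int steps, CountDownLatch started) {
        lock.readLock().lock();
        try {
            started.countDown();
            for (int i = 0; i < steps; i++) {
                if (pendingWriters.get() > 0) {
                    return i;            // abort; a real compactor would discard partial output
                }
                try {
                    Thread.sleep(1);     // stands in for rewriting a batch of cells
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return i;
                }
            }
            return steps;
        } finally {
            lock.readLock().unlock();
        }
    }

    /** Simulated rare writer: announce intent first, then wait for readers to drain. */
    void runExclusive(Runnable action) {
        pendingWriters.incrementAndGet();
        lock.writeLock().lock();
        try {
            pendingWriters.decrementAndGet();
            action.run();
        } finally {
            lock.writeLock().unlock();
        }
    }

    /** Runs one long compaction, cuts it short with a writer, returns steps completed. */
    static int demo(int steps) {
        InterruptibleCompactionLock region = new InterruptibleCompactionLock();
        CountDownLatch started = new CountDownLatch(1);
        int[] completed = new int[1];
        Thread compaction = new Thread(() -> completed[0] = region.compact(steps, started));
        compaction.start();
        try {
            started.await();
            region.runExclusive(() -> { /* bulk load work under the write lock */ });
            compaction.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return completed[0];
    }

    public static void main(String[] args) {
        System.out.println("compaction stopped after " + demo(10_000) + " of 10000 steps");
    }
}
```

Without the `pendingWriters` check, the writer in `demo` would wait for all 10000 one-millisecond steps; with it, the compaction stops after a handful. Whether aborting is cheaper than waiting depends on how much compaction work is thrown away, which this sketch does not measure.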
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14705298#comment-14705298 ] Esteban Gutierrez commented on HBASE-11368: --- Hey [~tianq], are you still working on this? Also, I agree with [~enis] that ref-counting might be an alternative, but I couldn't find the JIRA for that. Any pointer, [~lhofhansl]?
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696724#comment-14696724 ] Enis Soztutar commented on HBASE-11368: ---
bq. How? Would the following case be true without the bulk load getting the region write lock?
We do not do this now, but in theory it can be done similarly to regular writes:
- obtain a new seqId as a write transaction.
- bulk load all files across CFs with the seqId.
- advance the mvcc read point only when all bulk loads are complete.
This way the scanners are guaranteed to observe the bulk loaded data atomically without the region write lock.

bq. In the 0.98 code line, we don't have seqid, and the atomicity is still guaranteed there.
Yes. Not worth changing the 0.98 line.

bq. I think it is being propagated properly to the scanner. Think about the same notifyChangedReadersObservers is being used at the end of compaction and flushes as well. The reset of the readers should work.
I am not sure about that. Agreed that the cells at the store level will actually get re-ordered, but the heap at the region level is never re-ordered. So, after a bulk load, the ordering of store scanners at the region level might change, but the scanner will miss it, if I understand this correctly.

bq. Atomicity may be a false blanket considering HBASE-4652 is still unresolved.
Very good point. We need a transactional commit for the BL files.
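The three-step proposal can be modelled in a few lines: every family's file carries the same seqId, and a scanner only sees files at or below the read point it captured when it opened. This is a hypothetical, single-writer sketch; `MvccBulkLoad` is not HBase's MVCC class, and it ignores the region-heap and transactional-commit (HBASE-4652) caveats raised in the same comment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model: one seqId per multi-CF bulk load, and the read point moves only
// after every family's file is in place. Not HBase's MVCC classes.
class MvccBulkLoad {
    static final class LoadedFile {
        final String family;
        final String name;
        final long seqId;
        LoadedFile(String family, String name, long seqId) {
            this.family = family;
            this.name = name;
            this.seqId = seqId;
        }
    }

    private long nextSeqId = 1;
    private long readPoint = 0;                 // highest seqId visible to newly opened scanners
    private final List<LoadedFile> files = new ArrayList<>();

    /** Step 1: obtain a new seqId as a write transaction. */
    synchronized long begin() { return nextSeqId++; }

    /** Step 2: add one family's file, tagged with the transaction's seqId (not yet visible). */
    synchronized void addFile(String family, String name, long seqId) {
        files.add(new LoadedFile(family, name, seqId));
    }

    /** Step 3: advance the read point once all families are loaded. A real MVCC would also
     *  wait for earlier in-flight transactions; this single-writer model skips that. */
    synchronized void complete(long seqId) { readPoint = Math.max(readPoint, seqId); }

    /** A scanner fixes its read point when it opens. */
    synchronized long openScanner() { return readPoint; }

    /** Files a scanner with the given read point may see. */
    synchronized List<String> visibleFiles(long scannerReadPoint) {
        List<String> out = new ArrayList<>();
        for (LoadedFile f : files) {
            if (f.seqId <= scannerReadPoint) {
                out.add(f.family + ":" + f.name);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        MvccBulkLoad region = new MvccBulkLoad();
        long seq = region.begin();
        region.addFile("cf1", "hfileA", seq);
        long midLoad = region.openScanner();     // a reader arrives while the load is in progress
        region.addFile("cf2", "hfileB", seq);
        region.complete(seq);
        long afterLoad = region.openScanner();
        System.out.println(region.visibleFiles(midLoad));    // []
        System.out.println(region.visibleFiles(afterLoad));  // [cf1:hfileA, cf2:hfileB]
    }
}
```

A scanner opened between the two `addFile` calls sees nothing, which is the reader-arrives-mid-load case; one opened after `complete` sees both families.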
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695115#comment-14695115 ] Enis Soztutar commented on HBASE-11368: --- I was reading HBASE-4552 and the RegionScannerImpl code again to try to understand why we need the write lock for multi-CF bulk loads in the first place. It seems that it was put there to ensure atomicity, but that should be guaranteed by the seqId / mvcc combination, not by the region write lock. However, the bulk load files obtain a seqId, and acquiring the region write lock blocks all flushes, which may be the reason.

On bulk load, we call HStore.notifyChangedReadersObservers(), which resets the KVHeap, but from my reading of the code we never reset the RegionScanner. Is this a bug? The current scanners should not see the new bulk loaded data (via mvcc), so maybe it is ok?
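Whether resetting the store-level KVHeap is enough comes down to whether the region-level heap is re-ordered afterwards. The sketch below uses `java.util.PriorityQueue` as a stand-in for a region-level heap of store scanners (the `StoreScannerStub` class is hypothetical). It shows only the general heap property that mutating an element's key in place does not re-order the queue; it says nothing definitive about how HBase's `KeyValueHeap` behaves.

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Hypothetical model of a region-level heap of per-store scanners ordered by current key.
// It demonstrates java.util.PriorityQueue semantics only, not HBase's KeyValueHeap.
class RegionHeapDemo {
    static final class StoreScannerStub {
        final String family;
        String peekKey;                      // mutable, like a store heap reset in place
        StoreScannerStub(String family, String peekKey) {
            this.family = family;
            this.peekKey = peekKey;
        }
    }

    /** Returns the family the region heap would read next after a bulk load into cf1. */
    static String nextFamilyAfterBulkLoad(boolean reinsert) {
        PriorityQueue<StoreScannerStub> regionHeap =
                new PriorityQueue<>(Comparator.comparing((StoreScannerStub s) -> s.peekKey));
        StoreScannerStub cf1 = new StoreScannerStub("cf1", "row5");
        StoreScannerStub cf2 = new StoreScannerStub("cf2", "row3");
        regionHeap.add(cf1);
        regionHeap.add(cf2);                 // heap top: cf2 at row3

        // The bulk-loaded file gives cf1 a smaller key, row1.
        if (reinsert) {
            regionHeap.remove(cf1);          // remove before mutating so the heap stays valid
            cf1.peekKey = "row1";
            regionHeap.add(cf1);
        } else {
            cf1.peekKey = "row1";            // heap is not told; its order is now stale
        }
        return regionHeap.peek().family;
    }

    public static void main(String[] args) {
        System.out.println("without re-heap: " + nextFamilyAfterBulkLoad(false)); // cf2 (stale)
        System.out.println("with re-heap:    " + nextFamilyAfterBulkLoad(true));  // cf1
    }
}
```

Without the remove/re-add, the queue still reports `cf2` even though `cf1` now has the smaller key, which is the kind of missed re-ordering the question is about.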
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696223#comment-14696223 ] Nick Dimiduk commented on HBASE-11368: -- Atomicity may be a false blanket considering HBASE-4652 is still unresolved.
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14695884#comment-14695884 ] Jerry He commented on HBASE-11368: --
bq. but that should be guaranteed with the seqId / mvcc combination and not via region write lock.
How? Would the following case be true without the bulk load getting the region write lock?
a. the bulk load obtains a seqId.
b. a read request comes in and uses that seqId as its mvcc read point.
c. the read will be able to see the partially loaded data while the bulk load is still in progress.
In the 0.98 code line, we don't have seqId, and the atomicity is still guaranteed there.

bq. On bulk load, we call HStore.notifyChangedReadersObservers(), which resets the KVHeap, but we never reset the RegionScanner from my reading of code. Is this a bug?
I think it is being propagated properly to the scanner. Consider that the same notifyChangedReadersObservers is used at the end of compactions and flushes as well. The reset of the readers should work.

I think the region write lock is still the only guarantee of bulk load atomicity. At a high level, the region scan and next calls run under the region read lock, which is mutually exclusive with the bulk load process, which needs the region write lock. This is heavy.
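Steps a to c, and the claim that the region write lock is what prevents them, can be illustrated with a reader thread that arrives after the first family's file is in place. This is a hypothetical model that assumes each file becomes visible as soon as it is added (no seqId/mvcc gating); `PartialVisibilityDemo` and the file names are made up.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of steps a-c: what a reader that arrives mid-load observes,
// with and without the bulk load holding the region write lock.
class PartialVisibilityDemo {
    static List<String> readDuringLoad(boolean holdWriteLock) {
        ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();
        List<String> storeFiles = new CopyOnWriteArrayList<>();
        List<String> readerSaw = new CopyOnWriteArrayList<>();
        Thread reader = new Thread(() -> {
            if (holdWriteLock) regionLock.readLock().lock();   // scan/next under the read lock
            try {
                readerSaw.addAll(storeFiles);
            } finally {
                if (holdWriteLock) regionLock.readLock().unlock();
            }
        });
        try {
            if (holdWriteLock) regionLock.writeLock().lock();
            try {
                storeFiles.add("cf1:hfileA");                  // first family loaded
                reader.start();                                // a read arrives mid-load
                if (holdWriteLock) {
                    while (!regionLock.hasQueuedThread(reader)) {
                        Thread.onSpinWait();                   // wait until the reader blocks
                    }
                } else {
                    reader.join();                             // nothing stops it reading now
                }
                storeFiles.add("cf2:hfileB");                  // second family loaded
            } finally {
                if (holdWriteLock) regionLock.writeLock().unlock();
            }
            reader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return readerSaw;
    }

    public static void main(String[] args) {
        System.out.println("no region lock:    " + readDuringLoad(false)); // [cf1:hfileA]
        System.out.println("region write lock: " + readDuringLoad(true));  // [cf1:hfileA, cf2:hfileB]
    }
}
```

Without the lock the reader sees only `[cf1:hfileA]`; with it, the reader blocks until the load finishes and then sees both files. The cost described as heavy is visible too: every read waits for the whole bulk load.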
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14696335#comment-14696335 ] Jerry He commented on HBASE-11368: -- You are right, Nick. That is the unsolved issue.
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14693166#comment-14693166 ] Enis Soztutar commented on HBASE-11368: --- My concern with the patch is that it acquires yet another lock per get/scan on top of the already existing ones. Agreed that the region close lock is abused here for multi-CF bulk loads and has to be fixed. I believe the actual long-term solution is to ref-count the store files in the store and make the store file list per scan immutable. Then we do not need the costly mechanism for keeping the store files updated between the KVHeap, the scanner and the store file list ({{notifyChangedReadersObservers}}). leveldb does ref counting for files, I believe. [~lhofhansl], you had a JIRA for this?
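A minimal sketch of the ref-counting idea: each scanner pins an immutable snapshot of the store file list, loads and compactions publish a new list instead of notifying readers, and a compacted-away file is deleted only when its last reader closes. `RefCountedStore` and its methods are hypothetical, not HBase's `HStore` API; the leveldb comparison is the commenter's and is not modelled here.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of ref-counted store files with an immutable file list per scanner.
// Not HBase's HStore API; synchronized methods keep the example short.
class RefCountedStore {
    static final class StoreFile {
        final String name;
        int refCount = 0;
        boolean compactedAway = false;
        boolean deleted = false;
        StoreFile(String name) { this.name = name; }
    }

    private List<StoreFile> current = Collections.emptyList();   // replaced, never mutated

    /** Scanner open: take the current immutable list and pin each file in it. */
    synchronized List<StoreFile> openScanner() {
        for (StoreFile f : current) f.refCount++;
        return current;
    }

    /** Scanner close: unpin; a compacted-away file is deleted when its last reader leaves. */
    synchronized void closeScanner(List<StoreFile> snapshot) {
        for (StoreFile f : snapshot) {
            if (--f.refCount == 0 && f.compactedAway) f.deleted = true;
        }
    }

    /** Flush or bulk load: publish a new list; open scanners keep their old one. */
    synchronized void addFile(String name) {
        List<StoreFile> next = new ArrayList<>(current);
        next.add(new StoreFile(name));
        current = Collections.unmodifiableList(next);
    }

    /** Compaction commit: swap inputs for the output; unpinned inputs are deleted at once. */
    synchronized void commitCompaction(List<String> inputs, String output) {
        List<StoreFile> next = new ArrayList<>();
        for (StoreFile f : current) {
            if (inputs.contains(f.name)) {
                f.compactedAway = true;
                if (f.refCount == 0) f.deleted = true;
            } else {
                next.add(f);
            }
        }
        next.add(new StoreFile(output));
        current = Collections.unmodifiableList(next);
    }

    static List<String> names(List<StoreFile> files) {
        List<String> out = new ArrayList<>();
        for (StoreFile f : files) out.add(f.name);
        return out;
    }

    public static void main(String[] args) {
        RefCountedStore store = new RefCountedStore();
        store.addFile("f1");
        store.addFile("f2");
        List<StoreFile> scan = store.openScanner();
        store.addFile("bulk1");                              // not visible to the open scan
        store.commitCompaction(List.of("f1", "f2"), "c1");
        System.out.println(names(scan) + " deleted=" + scan.get(0).deleted);  // [f1, f2] deleted=false
        store.closeScanner(scan);
        System.out.println("after close deleted=" + scan.get(0).deleted);     // true
        List<StoreFile> later = store.openScanner();
        System.out.println(names(later));                                     // [bulk1, c1]
        store.closeScanner(later);
    }
}
```

It also shows the behaviour discussed for HBASE-13082: a scanner opened before a bulk load does not see the newly loaded file, and no reader-notification step is needed.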
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14682493#comment-14682493 ] Stephen Yuan Jiang commented on HBASE-11368: [~tianq] and [~stack], any update on or concerns about this patch? We have a customer who hit this issue recently.
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14186581#comment-14186581 ] Hadoop QA commented on HBASE-11368: ---
{color:red}-1 overall{color}. Here are the results of testing the latest attachment
http://issues.apache.org/jira/secure/attachment/12677543/hbase11368-master.patch
against trunk revision .
ATTACHMENT ID: 12677543

{color:green}+1 @author{color}. The patch does not contain any @author tags.
{color:red}-1 tests included{color}. The patch doesn't appear to include any new or modified tests. Please justify why no new tests are needed for this patch. Also please list what manual steps were performed to verify this patch.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javac{color}. The applied patch does not increase the total number of javac compiler warnings.
{color:green}+1 javadoc{color}. The javadoc tool did not generate any warning messages.
{color:red}-1 checkstyle{color}. The applied patch generated 3781 checkstyle errors (more than the trunk's current 3780 errors).
{color:green}+1 findbugs{color}. The patch does not introduce any new Findbugs (version 2.0.3) warnings.
{color:green}+1 release audit{color}. The applied patch does not increase the total number of release audit warnings.
{color:green}+1 lineLengths{color}. The patch does not introduce lines longer than 100
{color:green}+1 site{color}. The mvn site goal succeeds with this patch.
{color:red}-1 core tests{color}. The patch failed these unit tests:
org.apache.hadoop.hbase.client.TestHCM
org.apache.hadoop.hbase.master.TestSplitLogManager

Test results: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//testReport/
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//artifact/patchprocess/newPatchFindbugsWarningshbase-client.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//artifact/patchprocess/newPatchFindbugsWarningshbase-annotations.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//artifact/patchprocess/newPatchFindbugsWarningshbase-thrift.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//artifact/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//artifact/patchprocess/newPatchFindbugsWarningshbase-protocol.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//artifact/patchprocess/newPatchFindbugsWarningshbase-examples.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//artifact/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//artifact/patchprocess/newPatchFindbugsWarningshbase-prefix-tree.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//artifact/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//artifact/patchprocess/newPatchFindbugsWarningshbase-rest.html
Checkstyle Errors: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//artifact/patchprocess/checkstyle-aggregate.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/11489//console

This message is automatically generated.
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14184066#comment-14184066 ] Qiang Tian commented on HBASE-11368:
the attachments:
{{key_stacktrace_hbase10882.TXT}}: the problem stacktrace
{{hbase-11368-0.98.5.patch}}: the fix
{{performance_improvement_verification_98.5.patch}}: the testcase to verify the performance improvement
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14182647#comment-14182647 ] Qiang Tian commented on HBASE-11368:
Hi [~stack], [~apurtell], any comments? thanks!
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14183019#comment-14183019 ] stack commented on HBASE-11368:
Is this meant to be in the patch?
{code}
LOG.info("###compaction get the closelock, sleep 20s to simulate slow compaction");
try {
  Thread.sleep(20000);
} catch (InterruptedException e) {
  LOG.info("###sleep interrupted");
}
{code}
What change did you make, [~tianq]?
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14181168#comment-14181168 ] Qiang Tian commented on HBASE-11368:
initial YCSB test:
Env:
---
hadoop 2.2.0
YCSB 1.0.4 (Andrew's branch)
3 nodes: 1 master, 2 RS
(cluster details omitted, since this is just to evaluate the new lock)
Steps:
---
Followed Andrew's steps (see http://search-hadoop.com/m/DHED4hl7pC/):
the seed table has 3 CFs, pre-split to 20 regions
load 1 million rows to CF 'f1', using workloada
run 3 iterations each for workloadc and workloada. The parameters in each run:
bq. -p columnfamily=f1 -p operationcount=100 -s -threads 10
Results:
---
0.98.5:
workload c:
[READ], AverageLatency(us), 496.225811
[READ], AverageLatency(us), 510.206831
[READ], AverageLatency(us), 501.256123
workload a:
[READ], AverageLatency(us), 676.4527555821747
[READ], AverageLatency(us), 622.5544771452717
[READ], AverageLatency(us), 628.1365657163067
0.98.5+patch:
workload c:
[READ], AverageLatency(us), 536.334437
[READ], AverageLatency(us), 508.40
[READ], AverageLatency(us), 491.416182
workload a:
[READ], AverageLatency(us), 640.3625218319231
[READ], AverageLatency(us), 642.9719823488798
[READ], AverageLatency(us), 631.7491770928287
There looks to be little performance penalty.
I also ran PE (PerformanceEvaluation) on the cluster. Since its test table has only 1 CF, the new lock is not actually used; interestingly, with the patch the performance is even a bit better...
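To put the YCSB numbers side by side, the small Java program below averages the three [READ] AverageLatency(us) figures for each configuration and reports the relative change of the patched build. It only does arithmetic on the values quoted in the comment (copied verbatim, including 508.40); the class name `YcsbReadLatency` is illustrative.

```java
import java.util.Arrays;

// Averages the three [READ] AverageLatency(us) values per configuration,
// copied verbatim from the YCSB comment, and compares patched vs. baseline.
class YcsbReadLatency {

    // Arithmetic mean of the per-run average latencies.
    static double mean(double... runs) {
        return Arrays.stream(runs).average().getAsDouble();
    }

    // Relative change of the patched mean against the baseline mean, in percent.
    static double pctChange(double baseline, double patched) {
        return (patched - baseline) / baseline * 100.0;
    }

    public static void main(String[] args) {
        double cBase = mean(496.225811, 510.206831, 501.256123);
        double cPatch = mean(536.334437, 508.40, 491.416182);
        double aBase = mean(676.4527555821747, 622.5544771452717, 628.1365657163067);
        double aPatch = mean(640.3625218319231, 642.9719823488798, 631.7491770928287);
        System.out.printf("workload c: %.1f -> %.1f us (%+.1f%%)%n", cBase, cPatch, pctChange(cBase, cPatch));
        System.out.printf("workload a: %.1f -> %.1f us (%+.1f%%)%n", aBase, aPatch, pctChange(aBase, aPatch));
    }
}
```

Running `main` reports about +1.9% for workload c and -0.6% for workload a. With only three runs per configuration, both differences are smaller than the spread between individual runs of the same configuration, which fits the "little performance penalty" reading rather than showing a measurable cost.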
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14170649#comment-14170649 ] Qiang Tian commented on HBASE-11368:
it looks to me like the patch would show its value only when there are long compactions plus concurrent gets/scans. Not sure if [~victorunique] wants to try it in some test env? thanks.
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14166466#comment-14166466 ] Qiang Tian commented on HBASE-11368:
update: the idea will cause a deadlock, since bulkload and the scanner acquire the bulkload lock and StoreScanner.lock in different orders. I will look at whether we could lower the granularity of the StoreScanner lock.
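The deadlock in this update is the classic lock-ordering cycle: the bulk load path holds the bulk load lock and then needs the StoreScanner lock, while the scanner path holds the StoreScanner lock and then needs the bulk load lock. The Java sketch below reproduces that shape with plain JDK locks. `LockOrderingDemo`, `bulkLoadLock` and `scannerLock` are hypothetical stand-ins, not HBase classes, and which HBase code path takes which lock is inferred from the comments on this issue rather than checked against the 0.98 source. Timed `tryLock` calls let the demo report the cycle instead of hanging.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantLock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical stand-ins, not HBase classes:
//   bulkLoadLock ~ the proposed region-level bulk load read/write lock
//   scannerLock  ~ StoreScanner's internal lock
class LockOrderingDemo {
    final ReentrantReadWriteLock bulkLoadLock = new ReentrantReadWriteLock();
    final ReentrantLock scannerLock = new ReentrantLock();

    // Opposite orders: bulk load takes bulkLoadLock (write) then scannerLock;
    // the scanner takes scannerLock then bulkLoadLock (read). Each thread keeps
    // its first lock while trying the second for 200 ms. Returns how many threads
    // got their second lock; 0 means untimed lock() calls would have deadlocked.
    int runOpposingOrders() {
        CountDownLatch bothHoldFirst = new CountDownLatch(2);
        CountDownLatch bothTried = new CountDownLatch(2);
        AtomicInteger successes = new AtomicInteger();
        Thread bulkLoad = new Thread(() -> attempt(bulkLoadLock.writeLock(), scannerLock,
                bothHoldFirst, bothTried, successes));
        Thread scanner = new Thread(() -> attempt(scannerLock, bulkLoadLock.readLock(),
                bothHoldFirst, bothTried, successes));
        startAndJoin(bulkLoad, scanner);
        return successes.get();
    }

    // Same order on both paths: bulkLoadLock first, then scannerLock.
    // Returns true if both threads finished all iterations.
    boolean runConsistentOrder(int iterations) {
        Thread bulkLoad = new Thread(() -> {
            for (int i = 0; i < iterations; i++) lockBoth(bulkLoadLock.writeLock(), scannerLock);
        });
        Thread scanner = new Thread(() -> {
            for (int i = 0; i < iterations; i++) lockBoth(bulkLoadLock.readLock(), scannerLock);
        });
        startAndJoin(bulkLoad, scanner);
        return !bulkLoad.isAlive() && !scanner.isAlive();
    }

    private static void attempt(Lock first, Lock second, CountDownLatch bothHoldFirst,
                                CountDownLatch bothTried, AtomicInteger successes) {
        first.lock();
        try {
            bothHoldFirst.countDown();
            bothHoldFirst.await();                 // both threads now hold their first lock
            boolean got = second.tryLock(200, TimeUnit.MILLISECONDS);
            try {
                bothTried.countDown();
                bothTried.await();                 // keep 'first' held until both have tried
                if (got) successes.incrementAndGet();
            } finally {
                if (got) second.unlock();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            first.unlock();
        }
    }

    private static void lockBoth(Lock first, Lock second) {
        first.lock();
        try {
            second.lock();
            second.unlock();
        } finally {
            first.unlock();
        }
    }

    // Daemon threads plus a bounded join, so a real deadlock could not hang the JVM.
    private static void startAndJoin(Thread a, Thread b) {
        a.setDaemon(true);
        b.setDaemon(true);
        a.start();
        b.start();
        try {
            a.join(10_000);
            b.join(10_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }
}
```

Acquiring both locks in one fixed order removes the cycle, as `runConsistentOrder` shows. The update itself proposes a different remedy, lowering the granularity of the StoreScanner lock, which this sketch does not attempt.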
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14164957#comment-14164957 ] Qiang Tian commented on HBASE-11368:
Thanks [~jinghe], is this the right way to run the bulkload test?
{{mvn test -Dtest=TestHRegionServerBulkLoad}}
The test is supposed to run for 5 minutes, but it exits after only about 1 minute. Is that expected?
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14165411#comment-14165411 ] Jerry He commented on HBASE-11368:
{code}
  /**
   * Atomic bulk load.
   */
  @Test
  public void testAtomicBulkLoad() throws Exception {
    String TABLE_NAME = "atomicBulkLoad";
    int millisToRun = 30000;
{code}
This test case is 30 sec.
{code}
  /**
   * Run test on an HBase instance for 5 minutes. This assumes that the table
   * under test only has a single region.
   */
  public static void main(String args[]) throws Exception {
{code}
main is not invoked during a JUnit run.
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163162#comment-14163162 ] Qiang Tian commented on HBASE-11368:
ideas for lowering the lock granularity (based on the 0.98.5 code base):
1) read/scan
is this the primary goal of the atomic multi-CF bulkload in HBASE-4552?
After DefaultStoreFileManager#storefiles is updated in HStore#bulkLoadHFile, notifyChangedReadersObservers is called to reset StoreScanner#heap, so checkReseek -> resetScannerStack will be triggered on the next scan/read to recreate the store scanners based on the new storefiles.
So we could introduce a new region-level rwlock, multiCFLock: HRegion#bulkLoadHFiles acquires the writelock before the multi-CF HStore.bulkLoadHFile calls, and StoreScanner#resetScannerStack acquires the readlock. This way the scanners are recreated only after all CFs' store files are populated.
2) split region
the region will be closed in SplitTransaction#stepsBeforePONR, which falls inside the HRegion#lock protection area. Bulk load still needs to acquire its readlock at the start.
3) memstore flush
we flush to a new file, which is not related to the loaded files.
4) compaction
the compaction is performed store by store. If bulkload inserts new files into {{storefiles}} during the selectCompaction process, the list of files to be compacted might be affected: e.g., the compaction for some CFs may not include the newly loaded files, while for others it might. But this does not impact data integrity or read behavior?
At the end of compaction, {{storefiles}} access is still protected by HStore#lock if there is a bulk load change to the same CF.
comments? thanks
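Idea 1) can be modeled in a few lines of Java: a region-wide read/write lock whose write side spans the per-CF loads and whose read side guards the point where a scanner rebuilds its view of the store files. This is a hypothetical sketch, not HBase code. `MultiCfRegion` and its methods only borrow names from the comment, store files are reduced to lists of names, and StoreScanner's own lock is deliberately left out, so the lock-ordering problem reported in the deadlock update on this issue does not appear here.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the multiCFLock proposal, not HBase code. Each column
// family's store files are reduced to a list of names; the per-store HStore#lock
// is omitted because every access below already goes through multiCFLock.
class MultiCfRegion {
    private final ReentrantReadWriteLock multiCFLock = new ReentrantReadWriteLock();
    private final Map<String, List<String>> storefiles = new LinkedHashMap<>();

    MultiCfRegion(String... families) {
        for (String family : families) storefiles.put(family, new ArrayList<>());
    }

    // Models HRegion#bulkLoadHFiles: the write lock spans the load into every CF.
    void bulkLoadHFiles(String hfile) {
        multiCFLock.writeLock().lock();
        try {
            for (Map.Entry<String, List<String>> e : storefiles.entrySet()) {
                e.getValue().add(e.getKey() + "/" + hfile);   // stands in for HStore#bulkLoadHFile
            }
        } finally {
            multiCFLock.writeLock().unlock();
        }
    }

    // Models StoreScanner#resetScannerStack: rebuild from a snapshot taken under
    // the read lock, so a bulk load is seen in all CFs or in none.
    Map<String, Integer> resetScannerStack() {
        multiCFLock.readLock().lock();
        try {
            Map<String, Integer> fileCounts = new LinkedHashMap<>();
            storefiles.forEach((family, files) -> fileCounts.put(family, files.size()));
            return fileCounts;
        } finally {
            multiCFLock.readLock().unlock();
        }
    }

    // Runs loads on one thread while snapshotting on the caller's thread; returns
    // true if every snapshot had the same file count in every CF.
    static boolean snapshotsAreAtomic(int loads) {
        MultiCfRegion region = new MultiCfRegion("f1", "f2", "f3");
        Thread loader = new Thread(() -> {
            for (int i = 0; i < loads; i++) region.bulkLoadHFiles("hfile-" + i);
        });
        loader.start();
        boolean atomic = true;
        while (loader.isAlive()) {
            Map<String, Integer> snapshot = region.resetScannerStack();
            if (new HashSet<>(snapshot.values()).size() != 1) atomic = false;
        }
        try {
            loader.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        Map<String, Integer> last = region.resetScannerStack();
        return atomic && last.get("f1") == loads && new HashSet<>(last.values()).size() == 1;
    }
}
```

`snapshotsAreAtomic` checks the property the proposal is after: a scanner that resets while loads are in flight always sees the same number of loaded files in every CF, never a partially applied multi-CF bulk load.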
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163697#comment-14163697 ] Jerry He commented on HBASE-11368:
Hi, [~tianq]
The idea may be feasible:
bulk load begins: acquire region read lock + new bulk load write lock.
Scan/next begins: acquire region read lock + new bulk load read lock.
Other region operations: only acquire region read lock
This will save the compaction and bulk load from blocking each other.
Do you mind drafting a patch and running it through the test suite?
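The locking rules in this proposal can be written down as a small Java model to check its two claims: a long compaction that holds only the region read lock no longer delays a bulk load, while scans still wait for an in-progress bulk load. `TwoLockRegion`, `regionLock` and `bulkLoadLock` are hypothetical names, region close (region write lock) is not modeled, and the millisecond timeouts are illustrative stand-ins for HBase's own wait handling.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.Lock;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical model of the proposed two-lock scheme, not HBase code.
class TwoLockRegion {
    final ReentrantReadWriteLock regionLock = new ReentrantReadWriteLock();   // ~ existing HRegion#lock
    final ReentrantReadWriteLock bulkLoadLock = new ReentrantReadWriteLock(); // ~ proposed new lock

    // Takes 'outer' then 'inner' (each with a timeout), runs the body, releases in
    // reverse order. Returns false if either lock could not be acquired in time.
    private static boolean withLocks(Lock outer, Lock inner, long timeoutMs, Runnable body)
            throws InterruptedException {
        if (!outer.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) return false;
        try {
            if (!inner.tryLock(timeoutMs, TimeUnit.MILLISECONDS)) return false;
            try {
                body.run();
                return true;
            } finally {
                inner.unlock();
            }
        } finally {
            outer.unlock();
        }
    }

    // Bulk load: region read lock + bulk load write lock.
    boolean bulkLoad(long timeoutMs, Runnable load) throws InterruptedException {
        return withLocks(regionLock.readLock(), bulkLoadLock.writeLock(), timeoutMs, load);
    }

    // Scan/next: region read lock + bulk load read lock.
    boolean scan(long timeoutMs) throws InterruptedException {
        return withLocks(regionLock.readLock(), bulkLoadLock.readLock(), timeoutMs, () -> { });
    }

    // A long compaction holds only the region read lock; a bulk load should still get through.
    static boolean compactionDoesNotBlockBulkLoad() throws InterruptedException {
        TwoLockRegion region = new TwoLockRegion();
        CountDownLatch compacting = new CountDownLatch(1);
        CountDownLatch finishCompaction = new CountDownLatch(1);
        Thread compaction = new Thread(() -> {
            region.regionLock.readLock().lock();
            try {
                compacting.countDown();
                finishCompaction.await();        // simulate a long-running compaction
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            } finally {
                region.regionLock.readLock().unlock();
            }
        });
        compaction.start();
        compacting.await();
        boolean loaded = region.bulkLoad(200, () -> { });
        finishCompaction.countDown();
        compaction.join();
        return loaded;
    }

    // While a bulk load holds the bulk load write lock, a scan waits (and here times out).
    static boolean bulkLoadBlocksScan() throws InterruptedException {
        TwoLockRegion region = new TwoLockRegion();
        CountDownLatch loading = new CountDownLatch(1);
        CountDownLatch finishLoad = new CountDownLatch(1);
        Thread loader = new Thread(() -> {
            try {
                region.bulkLoad(1_000, () -> {
                    loading.countDown();
                    try {
                        finishLoad.await();      // keep the bulk load write lock for a while
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    }
                });
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        loader.start();
        loading.await();
        boolean scanned = region.scan(100);
        finishLoad.countDown();
        loader.join();
        return !scanned;
    }
}
```

The trade-off in this design is that every scan/next takes one extra read lock, which is the cost the YCSB comparison on this issue tries to measure.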
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163705#comment-14163705 ] Jerry He commented on HBASE-11368:
bq. Other region operations: only acquire region read lock
-- Other region operations: only acquire region read or write lock, no change from existing behavior.
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14163024#comment-14163024 ] Qiang Tian commented on HBASE-11368:
As [~stack] mentioned in http://search-hadoop.com/m/DHED4NR0wT, HRegion#lock is there to protect region close. The comments in HRegion.java, and the fact that only HRegion#doClose takes the writelock (if we do not consider HRegion#startBulkRegionOperation), also show that. So using HRegion#lock to protect the multi-CF bulkload from HBASE-4552 looks too heavyweight?
From the stacktrace in HBASE-10882, all the reads/scans are blocked because bulkload is waiting for lock.writelock, while compaction has already acquired lock.readlock and is reading data, a time-consuming operation. A related topic was discussed again in http://search-hadoop.com/m/DHED4I11p31.
Perhaps we need another region-level lock.
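The blocking chain described in this comment (compaction holds the read lock, bulk load queues for the write lock, later gets/scans stall) can be reproduced with a bare JDK lock. This sketch assumes HRegion#lock behaves like a default, non-fair `ReentrantReadWriteLock`. In current OpenJDK, a non-fair read lock makes a new reader wait when the first queued thread is a writer, a heuristic against writer starvation; the Javadoc leaves non-fair ordering unspecified, so treat this as implementation behavior rather than a guarantee. `QueuedWriterDemo` is a hypothetical name.

```java
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Assumes HRegion#lock behaves like a default (non-fair) ReentrantReadWriteLock.
class QueuedWriterDemo {

    // Returns true if a new reader timed out while a writer was queued behind an
    // earlier reader: "compaction" (reader) -> "bulk load" (writer) -> "get/scan" (reader).
    static boolean newReaderStallsBehindQueuedWriter() throws InterruptedException {
        ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
        AtomicBoolean scanGotLock = new AtomicBoolean();

        lock.readLock().lock();                       // a long compaction holds the read lock
        Thread bulkLoad = new Thread(() -> {
            try {
                // the bulk load waits, with a timeout, for the write lock
                if (lock.writeLock().tryLock(5, TimeUnit.SECONDS)) lock.writeLock().unlock();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        Thread scan = new Thread(() -> {
            try {
                if (lock.readLock().tryLock(200, TimeUnit.MILLISECONDS)) {
                    scanGotLock.set(true);
                    lock.readLock().unlock();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        try {
            bulkLoad.start();
            while (!lock.hasQueuedThread(bulkLoad)) Thread.sleep(10);  // writer is now waiting
            Thread.sleep(100);                                          // let the wait queue settle
            scan.start();                             // a new get/scan arrives meanwhile
            scan.join();
        } finally {
            lock.readLock().unlock();                 // compaction finishes
        }
        bulkLoad.join();
        return !scanGotLock.get();
    }
}
```

Because it is the waiting writer, not the reader holding the lock, that stalls the new reads, either shortening the compaction's hold on HRegion#lock or giving bulk load its own region-level lock, as suggested, would shorten or remove that stall.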
[jira] [Commented] (HBASE-11368) Multi-column family BulkLoad fails if compactions go on too long
[ https://issues.apache.org/jira/browse/HBASE-11368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14033409#comment-14033409 ] stack commented on HBASE-11368:
This is an old issue, http://search-hadoop.com/m/0AGoj1C1AXY/org.apache.hadoop.hbase.RegionTooBusyException%253A+failed+to+get+a+lock+in+6mssubj=Bulk+loading+HFiles+via+LoadIncrementalHFiles+fails+at+a+region+that+is+being+compacted+a+bug+