[jira] [Commented] (HBASE-21874) Bucket cache on Persistent memory
[ https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16769508#comment-16769508 ]

Jean-Daniel Cryans commented on HBASE-21874:

Thank you Anoop and Ram for addressing my questions. Some follow-ups:

bq. We can remove that part if it is distracting but it may be needed if you need to create bigger sized cache. It seems like a java restriction from mmapping buffers.

We should document this limitation and add a check in the code for the moment.

bq. So it becomes a direct replacement of the DRAM based IOEngine. (you don't copy the buffers onheap)

That's the part I don't get. ByteBufferIOEngine basically offers the same functionality if we don't care about persistence, and the Intel DCPMM comes in a DIMM form factor, so shouldn't that _just work_?

> Bucket cache on Persistent memory
> ---------------------------------
>
> Key: HBASE-21874
> URL: https://issues.apache.org/jira/browse/HBASE-21874
> Project: HBase
> Issue Type: New Feature
> Components: BucketCache
> Affects Versions: 3.0.0
> Reporter: ramkrishna.s.vasudevan
> Assignee: ramkrishna.s.vasudevan
> Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21874.patch, HBASE-21874.patch, HBASE-21874_V2.patch, Pmem_BC.png
>
> Non volatile persistent memory devices are byte addressable like DRAM (e.g. Intel DCPMM). The bucket cache implementation can take advantage of this new memory type and can make use of the existing offheap data structures to serve data directly from this memory area without having to bring the data onheap.
> The patch is a new IOEngine implementation that works with persistent memory.
> Note: here we don't make use of the persistence nature of the device and just make use of the big memory it provides.
> Performance numbers to follow.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
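The "java restriction" mentioned above is most likely that `FileChannel.map` returns a `MappedByteBuffer`, which is int-indexed, so a single mapping cannot exceed `Integer.MAX_VALUE` bytes. A minimal sketch (not the patch's actual code; class and method names here are made up for illustration) of how a cache bigger than 2 GB could be backed by multiple mapped chunks:

```java
import java.io.IOException;
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;

public class ChunkedMmap {
    // Each MappedByteBuffer is capped at Integer.MAX_VALUE bytes,
    // so a larger cache must be split into multiple mapped chunks.
    static final long MAX_CHUNK = Integer.MAX_VALUE;

    // Number of mappings needed to cover `capacity` bytes.
    static int chunkCount(long capacity) {
        return (int) ((capacity + MAX_CHUNK - 1) / MAX_CHUNK);
    }

    // Map one file as a sequence of READ_WRITE chunks.
    static MappedByteBuffer[] mapFile(String path, long capacity) throws IOException {
        int n = chunkCount(capacity);
        MappedByteBuffer[] buffers = new MappedByteBuffer[n];
        try (RandomAccessFile raf = new RandomAccessFile(path, "rw")) {
            raf.setLength(capacity);
            FileChannel ch = raf.getChannel();
            long offset = 0;
            for (int i = 0; i < n; i++) {
                long len = Math.min(MAX_CHUNK, capacity - offset);
                buffers[i] = ch.map(FileChannel.MapMode.READ_WRITE, offset, len);
                offset += len;
            }
        }
        return buffers;
    }
}
```

A 5 GiB cache, for instance, would need three mappings under this scheme, which is presumably why the patch grew a configurable buffer size in the first place.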
[jira] [Commented] (HBASE-21874) Bucket cache on Persistent memory
[ https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16769510#comment-16769510 ]

Jean-Daniel Cryans commented on HBASE-21874:

Oh, additionally, maybe we shouldn't call this PmemIOEngine but something more generic like SharedMemoryFileMmapEngine if it's not going to be persistent.
[jira] [Commented] (HBASE-21874) Bucket cache on Persistent memory
[ https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16768748#comment-16768748 ]

Jean-Daniel Cryans commented on HBASE-21874:

Hi Ram. Two questions:

- Why add the configurable buffer size? That seems to be what most of the patch is about, which is distracting.
- You say that the patch doesn't make use of the persistent nature of the device, but PmemIOEngine extends FileMmapEngine, which advertises itself as persistent, so it does a few other things on top. I might be confused about what it implies but this is also confusing :)
[jira] [Commented] (HBASE-12169) Document IPC binding options
[ https://issues.apache.org/jira/browse/HBASE-12169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069042#comment-16069042 ]

Jean-Daniel Cryans commented on HBASE-12169:

It's tools _and_ practices on how to run a cluster, I think it fits.

> Document IPC binding options
>
> Key: HBASE-12169
> URL: https://issues.apache.org/jira/browse/HBASE-12169
> Project: HBase
> Issue Type: Task
> Components: documentation
> Affects Versions: 0.98.0, 0.94.7, 0.99.0
> Reporter: Sean Busbey
> Assignee: Mike Drob
> Priority: Minor
> Attachments: HBASE-12169.patch
>
> HBASE-8148 added options to change binding component services, but there aren't any docs for it.

-- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Commented] (HBASE-12169) Document IPC binding options
[ https://issues.apache.org/jira/browse/HBASE-12169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068722#comment-16068722 ]

Jean-Daniel Cryans commented on HBASE-12169:

Instead of adding this to hbase-default, one option could be to document "How to have HBase processes bind to specific addresses", maybe on this page: http://hbase.apache.org/book.html#ops_mgt ?
[jira] [Commented] (HBASE-12169) Document IPC binding options
[ https://issues.apache.org/jira/browse/HBASE-12169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068578#comment-16068578 ]

Jean-Daniel Cryans commented on HBASE-12169:

Hey Mike, I could see a case for documenting why someone would want to change those configs, have you considered it?

Also, and this might show how long it's been since I've looked at the HBase docs, but isn't the hbase-default stuff just pulled from hbase-default.xml? The configs you are documenting aren't in that file AFAIK, so they should either be documented elsewhere or the configs have to be pulled into hbase-default.xml.
[jira] [Commented] (HBASE-13135) Move replication ops mgmt stuff from Javadoc to Ref Guide
[ https://issues.apache.org/jira/browse/HBASE-13135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349168#comment-14349168 ]

Jean-Daniel Cryans commented on HBASE-13135:

Good stuff Misty, a few more things while we're here.

I think we can remove this line since it's on by default; maybe say instead that the operator should check it wasn't disabled:

bq. On the source cluster, enable replication by setting `hbase.replication` to `true` in _hbase-site.xml_.

The following can be removed, we don't print this out anymore, at least since the endpoint changes:

{quote}
Considering 1 rs, with ratio 0.1
Getting 1 rs from peer cluster # 0
Choosing peer 10.10.1.49:62020
{quote}

This is a more reliable thing to point out instead, coming from ReplicationSource:

bq. LOG.info("Replicating " + clusterId + " -> " + peerClusterId);

> Move replication ops mgmt stuff from Javadoc to Ref Guide
> ---------------------------------------------------------
>
> Key: HBASE-13135
> URL: https://issues.apache.org/jira/browse/HBASE-13135
> Project: HBase
> Issue Type: Bug
> Components: documentation, Replication
> Reporter: Misty Stanley-Jones
> Assignee: Misty Stanley-Jones
> Attachments: HBASE-13135-v1.patch, HBASE-13135-v2.patch, HBASE-13135.patch
>
> As per discussion with [~jmhsieh] and [~saint@gmail.com]

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-13135) Move replication ops mgmt stuff from Javadoc to Ref Guide
[ https://issues.apache.org/jira/browse/HBASE-13135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343406#comment-14343406 ]

Jean-Daniel Cryans commented on HBASE-13135:

Looking at the current doc, this line should be changed, since at the end it includes a link that points back to the doc we're moving:

bq. The following is a simplified procedure for configuring cluster replication. It may not cover every edge case. For more information, see the API documentation for replication.

This whole section is very similar to the tail of the "Configuring Cluster Replication" section (merge or skip?):

bq. + Verifying Replicated Data
[jira] [Commented] (HBASE-7631) Region w/ few, fat reads was hard to find on a box carrying hundreds of regions
[ https://issues.apache.org/jira/browse/HBASE-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273783#comment-14273783 ]

Jean-Daniel Cryans commented on HBASE-7631:

[~clehene] Things have changed a lot since then, so I'm not sure what kind of work is required now. I'd close this unless someone is planning on working on it now that there's noise.

> Region w/ few, fat reads was hard to find on a box carrying hundreds of regions
> -------------------------------------------------------------------------------
>
> Key: HBASE-7631
> URL: https://issues.apache.org/jira/browse/HBASE-7631
> Project: HBase
> Issue Type: Improvement
> Components: metrics
> Reporter: stack
>
> Of a sudden on a prod cluster, a table's rows gained girth... hundreds of thousands of rows... and the application was pulling them all back every time, but only once a second or so. The regionserver was carrying hundreds of regions. It was plain that there was lots of network-out traffic. It was tough figuring out which region was the culprit (JD's trick was moving the regions off one at a time while watching network-out traffic on the cluster to see whose spiked next -- it worked but took some time).
> If we had per-region read/write sizes in metrics, that would have saved a bunch of diagnostic time.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-8318) TableOutputFormat.TableRecordWriter should accept Increments
[ https://issues.apache.org/jira/browse/HBASE-8318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273760#comment-14273760 ]

Jean-Daniel Cryans commented on HBASE-8318:

[~clehene] Hey, so re-reading what I wrote, my opinion remains the same. The fact that nobody else picked this up seems to imply that it isn't a common use case either. I'm fine closing this.

> TableOutputFormat.TableRecordWriter should accept Increments
>
> Key: HBASE-8318
> URL: https://issues.apache.org/jira/browse/HBASE-8318
> Project: HBase
> Issue Type: Bug
> Reporter: Jean-Marc Spaggiari
> Assignee: Jean-Marc Spaggiari
> Attachments: HBASE-8318-v0-trunk.patch, HBASE-8318-v1-trunk.patch, HBASE-8318-v2-trunk.patch
>
> TableOutputFormat.TableRecordWriter can take Puts and Deletes, but it should also accept Increments.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12774) Fix the inconsistent permission checks for bulkloading.
[ https://issues.apache.org/jira/browse/HBASE-12774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14262361#comment-14262361 ]

Jean-Daniel Cryans commented on HBASE-12774:

Can you fix up the javadoc that I missed in HBASE-11008 while you're there?

https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java#L1961

Also, I'd like [~apurtell]'s benediction on this.

> Fix the inconsistent permission checks for bulkloading.
> -------------------------------------------------------
>
> Key: HBASE-12774
> URL: https://issues.apache.org/jira/browse/HBASE-12774
> Project: HBase
> Issue Type: Bug
> Reporter: Srikanth Srungarapu
> Assignee: Srikanth Srungarapu
> Priority: Minor
> Attachments: HBASE-12774.patch, HBASE-12774_v2.patch
>
> Three checks (prePrepareBulkLoad, preCleanupBulkLoad, preBulkLoadHFile) are being done while performing secure bulk load, and it looks like the former two check for 'W', while the latter checks for 'C'. After an offline chat with [~jdcryans], it looks like the inconsistency among checks was unintended. So we can address this in multiple ways:
> * All checks should be for 'W'
> * All checks should be for 'C'
> * All checks should be for both 'W' and 'C'
> Posting the initial patch going by the first option. Open to discussion.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12774) Fix the inconsistent permission checks for bulkloading.
[ https://issues.apache.org/jira/browse/HBASE-12774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260570#comment-14260570 ]

Jean-Daniel Cryans commented on HBASE-12774:

Like we discussed, and because of HBASE-11008, it needs to be C, which is your second option. Now, adding a C+W check isn't something we're doing anywhere else (I think?) and I'm not sure if it makes sense. The way things currently work, I'm not sure it makes sense for a user to have C without having W.
[jira] [Commented] (HBASE-12770) Don't transfer all the queued hlogs of a dead server to the same alive server
[ https://issues.apache.org/jira/browse/HBASE-12770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260345#comment-14260345 ]

Jean-Daniel Cryans commented on HBASE-12770:

bq. is it reasonable that the alive server only transfer one peer's hlogs from the dead server

Just to make sure I understand you correctly: you're saying that in the case where we have many peers, instead of one server grabbing all the queues, the servers should each grab only one so that the load is spread out. If so, this is reasonable, but I would change the jira's title to reflect that (it's kind of vague right now, unless that was the goal, to spur discussion).

bq. Regionservers could publish their queue depths and if there is a substantial difference detected, one RS could transfer a queue to the less loaded peer.

Yeah, but the details might be hard to get right. There might be a reason why a particular queue is growing (blocked on whatever), so it would start bouncing around. We could at least have admin tools to move queues though; right now we'd have to move all of them at the same time, but with the above work it might be possible to do it more granularly.

> Don't transfer all the queued hlogs of a dead server to the same alive server
> -----------------------------------------------------------------------------
>
> Key: HBASE-12770
> URL: https://issues.apache.org/jira/browse/HBASE-12770
> Project: HBase
> Issue Type: Improvement
> Components: Replication
> Reporter: cuijianwei
> Priority: Minor
>
> When a region server is down (or the cluster restarts), all of its hlog queues will be transferred to the same alive region server. In a shared cluster, we might create several peers replicating data to different peer clusters. There might be lots of hlogs queued for these peers for several reasons: some peers might be disabled, errors from a peer cluster might prevent the replication, or the replication sources may fail to read some hlog because of an hdfs problem. Then, if the server is down or restarted, another alive server will take all the replication jobs of the dead server. This might put a big pressure on the resources (network/disk read) of the alive server, and is also not fast enough to replicate the queued hlogs. And if that alive server goes down, all the replication jobs, including those taken from other dead servers, will once again be transferred wholesale to another alive server; this might leave one server with a large number of queued hlogs (in our shared cluster, we found one server could have thousands of hlogs queued for replication). As an option, is it reasonable for an alive server to only transfer one peer's hlogs from the dead server at a time? Then other alive region servers would have the opportunity to transfer the hlogs of the remaining peers. This may also help the queued hlogs be processed faster. Any discussion is welcome.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
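The proposal discussed above (each survivor claims only one peer's queue, so the dead server's load spreads out) can be sketched as a simple round-robin assignment. This is purely illustrative; the class and method names are made up and this is not the ReplicationQueues API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class QueueFailoverSketch {
    // Hypothetical sketch: spread the dead server's per-peer hlog queues
    // round-robin across the surviving region servers instead of letting
    // one survivor claim all of them.
    static Map<String, List<String>> distribute(List<String> peerQueues,
                                                List<String> aliveServers) {
        Map<String, List<String>> assignment = new LinkedHashMap<>();
        for (String rs : aliveServers) {
            assignment.put(rs, new ArrayList<>());
        }
        for (int i = 0; i < peerQueues.size(); i++) {
            // i-th queue goes to the (i mod n)-th survivor
            String rs = aliveServers.get(i % aliveServers.size());
            assignment.get(rs).add(peerQueues.get(i));
        }
        return assignment;
    }
}
```

With three peer queues and two survivors, one server ends up with two queues and the other with one, instead of one server taking all three.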
[jira] [Commented] (HBASE-12665) When aborting, dump metrics
[ https://issues.apache.org/jira/browse/HBASE-12665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240456#comment-14240456 ]

Jean-Daniel Cryans commented on HBASE-12665:

+1, still seems like a lot, but better than showing everything. Also, I like it better when logged like that. We could also consider printing it out in a more compact way (it's basically how the metrics dump was before), but we can do that later if needed.

It seems that 0.98 will need a different patch.

> When aborting, dump metrics
> ---------------------------
>
> Key: HBASE-12665
> URL: https://issues.apache.org/jira/browse/HBASE-12665
> Project: HBase
> Issue Type: Bug
> Components: Operability
> Reporter: stack
> Assignee: stack
> Fix For: 1.0.0, 2.0.0, 0.98.9
> Attachments: 0001-First-cut.patch, 0001-HBASE-12665-When-aborting-dump-metrics.patch, 12665v3.txt, 12665v4.txt, 12665v5.txt, dump.txt
>
> We used to dump out all metrics when we were exiting on abort. It was of use debugging why the abort happened. [~jdcryans] noticed it was dropped by his brother [~eclark] over in HBASE-6410 "Move RegionServer Metrics to metrics2". To stop the two brothers fighting, I intervened with this patch.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12665) When aborting, dump metrics
[ https://issues.apache.org/jira/browse/HBASE-12665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240277#comment-14240277 ]

Jean-Daniel Cryans commented on HBASE-12665:

[~stack] From a usability standpoint, how many lines do you get when dumping the metrics? Looks like it would go into the hundreds (also, you are showing the master's metrics, standalone?). What I liked about what we had before was that it was concise and contained mostly stuff I was interested in, whereas now we seem to be getting all the histograms.

Also, it goes to stdout instead of being logged, was that intentional? I'm not sure how that's gonna come out if there's loads of stuff being logged (like there usually is when a RS is aborting), it might get mixed a bit.

Finally, it would be nice to get some description of what's being dumped. I was actually confused when I saw this part:

{noformat}
2014-12-09 13:44:21,244 FATAL [main] regionserver.HRegionServer(1927): RegionServer abort: loaded coprocessors are: [org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
{
  "beans" : [ {
{noformat}

It felt like the metrics were a continuation of the coprocessors printout. I'm easy to confuse so that might just be me.

Thanks for picking this up. In this time of the year, family values are more important than ever.
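For reference, the JSON dump being reviewed here is built from the JVM's platform MBeanServer, the standard JMX registry that metrics2 publishes into. A minimal, hedged sketch of walking the registered beans (this is plain JMX API usage, not the JSONBean code from the patch):

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class BeanDump {
    // Return the ObjectNames registered with the platform MBeanServer,
    // the same source an abort-time metrics dump would read from.
    static Set<ObjectName> beanNames() throws Exception {
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // A null name pattern and null query match every registered bean.
        return server.queryNames(null, null);
    }

    public static void main(String[] args) throws Exception {
        for (ObjectName name : beanNames()) {
            System.out.println(name);
        }
    }
}
```

Even in a bare JVM this prints dozens of beans (memory, threading, GC, ...), which hints at why a full dump from a loaded region server runs to hundreds of lines.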
[jira] [Commented] (HBASE-12665) When aborting, dump metrics
[ https://issues.apache.org/jira/browse/HBASE-12665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240046#comment-14240046 ]

Jean-Daniel Cryans commented on HBASE-12665:

I think your first cut is missing something: I see you lifting a bunch of stuff from JMXJsonServlet into JSONBean, but the former still has most of the latter's code.
[jira] [Commented] (HBASE-12419) "Partial cell read caused by EOF" ERRORs on replication source during replication
[ https://issues.apache.org/jira/browse/HBASE-12419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199480#comment-14199480 ]

Jean-Daniel Cryans commented on HBASE-12419:

Yeah, getting partial reads has always been a reality in ReplicationSource, even before PBing everything. I'm +1 on TRACEing this, but what about all the other users of BaseDecoder? Are there cases where we rely on this LOG.error to show that something went wrong?

> "Partial cell read caused by EOF" ERRORs on replication source during replication
> ---------------------------------------------------------------------------------
>
> Key: HBASE-12419
> URL: https://issues.apache.org/jira/browse/HBASE-12419
> Project: HBase
> Issue Type: Bug
> Affects Versions: 0.98.7
> Reporter: Andrew Purtell
> Fix For: 2.0.0, 0.98.8, 0.99.2
> Attachments: TestReplicationIngest.patch
>
> We are seeing exceptions like these on the replication sources when replication is active:
> {noformat}
> 2014-11-04 01:20:19,738 ERROR [regionserver8120-EventThread.replicationSource,1] codec.BaseDecoder: Partial cell read caused by EOF: java.io.IOException: Premature EOF from inputStream
> {noformat}
> HBase 0.98.8-SNAPSHOT, Hadoop 2.4.1.
> Happens both with and without short circuit reads on the source cluster.
> I'm able to reproduce this reliably:
> # Set up two clusters. Can be single slave.
> # Enable replication in configuration
> # Use LoadTestTool -init_only on both clusters
> # On source cluster via shell: alter 'cluster_test',{NAME=>'test_cf',REPLICATION_SCOPE=>1}
> # On source cluster via shell: add_peer 'remote:port:/hbase'
> # On source cluster, LoadTestTool -skip_init -write 1:1024:10 -num_keys 100
> # Wait for LoadTestTool to complete
> # Use the shell to verify 1M rows are in 'cluster_test' on the target cluster.
> All 1M rows will replicate without data loss, but I'll see 5-15 instances of "Partial cell read caused by EOF" messages logged from codec.BaseDecoder at ERROR level on the replication source.

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-11316) Expand info about compactions beyond HBASE-11120
[ https://issues.apache.org/jira/browse/HBASE-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078104#comment-14078104 ]

Jean-Daniel Cryans commented on HBASE-11316:

+1. Thanks for looking at this, Stack, and sorry I overlooked those failing tests.

> Expand info about compactions beyond HBASE-11120
>
> Key: HBASE-11316
> URL: https://issues.apache.org/jira/browse/HBASE-11316
> Project: HBase
> Issue Type: Bug
> Components: Compaction, documentation
> Reporter: Misty Stanley-Jones
> Assignee: Misty Stanley-Jones
> Fix For: 2.0.0
> Attachments: HBASE-11316-1.patch, HBASE-11316-2.patch, HBASE-11316-3.patch, HBASE-11316-4.patch, HBASE-11316-5.patch, HBASE-11316-6-rebased-v2.patch, HBASE-11316-6-rebased.patch, HBASE-11316-6.patch, HBASE-11316-7.patch, HBASE-11316-8.patch, HBASE-11316-9.patch, HBASE-11316.patch, addendum.txt, ch9_compactions.pdf
>
> Round 2 - expand info about the algorithms, talk about stripe compaction, and talk more about configuration. Hopefully find some rules of thumb.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11316) Expand info about compactions beyond HBASE-11120
[ https://issues.apache.org/jira/browse/HBASE-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-11316:

Resolution: Fixed
Fix Version/s: 2.0.0
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

Pushed to master. Thanks for all the hard work documenting our most complicated stuff, Misty.
[jira] [Commented] (HBASE-11316) Expand info about compactions beyond HBASE-11120
[ https://issues.apache.org/jira/browse/HBASE-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077225#comment-14077225 ]

Jean-Daniel Cryans commented on HBASE-11316:

Comments:

{noformat}
-Name of the failsafe snapshot taken by the restore operation.
+Name of the failsafe snapshot taken by the restore opecompn.
{noformat}

A typo slipped in there.

{noformat}
+ Being Stuck
{noformat}

Looking at the output, this line (and the other titles at that level) is comically small. There's probably little we can do short of changing the style sheet.

And I'm good with the rest.
[jira] [Updated] (HBASE-11522) Move Replication information into the Ref Guide
[ https://issues.apache.org/jira/browse/HBASE-11522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans updated HBASE-11522:

Resolution: Fixed
Fix Version/s: 2.0.0
Hadoop Flags: Reviewed
Status: Resolved (was: Patch Available)

I compiled the site and looked at the end result, +1. Pushed to master. Thanks Misty!

> Move Replication information into the Ref Guide
> -----------------------------------------------
>
> Key: HBASE-11522
> URL: https://issues.apache.org/jira/browse/HBASE-11522
> Project: HBase
> Issue Type: Bug
> Components: documentation
> Reporter: Misty Stanley-Jones
> Assignee: Misty Stanley-Jones
> Fix For: 2.0.0
> Attachments: HBASE-11522-1.patch, HBASE-11522.patch, replication_overview.png
>
> Currently, information about replication lives at http://hbase.apache.org/replication.html and http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/replication/package-summary.html#requirements. IMHO, it makes sense at least for the first link to be in the Ref Guide instead of where it is now. This came up because of HBASE-10398.

-- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HBASE-11388) The order parameter is wrong when invoking the constructor of the ReplicationPeer In the method "getPeer" of the class ReplicationPeersZKImpl
[ https://issues.apache.org/jira/browse/HBASE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Jean-Daniel Cryans resolved HBASE-11388.

Resolution: Fixed
Fix Version/s: (was: 0.99.0)
Hadoop Flags: Reviewed

You're right, so I just pushed the patch to 0.98. Thanks!

> The order parameter is wrong when invoking the constructor of the ReplicationPeer In the method "getPeer" of the class ReplicationPeersZKImpl
> ---------------------------------------------------------------------------------------------------------------------------------------------
>
> Key: HBASE-11388
> URL: https://issues.apache.org/jira/browse/HBASE-11388
> Project: HBase
> Issue Type: Bug
> Components: Replication
> Affects Versions: 0.99.0, 0.98.3
> Reporter: Qianxi Zhang
> Assignee: Qianxi Zhang
> Priority: Minor
> Fix For: 0.98.5
> Attachments: HBASE_11388.patch, HBASE_11388_trunk_V1.patch
>
> The parameters are "Configuration", "clusterKey" and "id" in the constructor of the class ReplicationPeer, but the argument order is "Configuration", "id" and "clusterKey" when invoking the constructor in the method "getPeer" of the class ReplicationPeersZKImpl.
> ReplicationPeer#76
> {code}
> public ReplicationPeer(Configuration conf, String key, String id) throws ReplicationException {
>   this.conf = conf;
>   this.clusterKey = key;
>   this.id = id;
>   try {
>     this.reloadZkWatcher();
>   } catch (IOException e) {
>     throw new ReplicationException("Error connecting to peer cluster with peerId=" + id, e);
>   }
> }
> {code}
> ReplicationPeersZKImpl#498
> {code}
> ReplicationPeer peer =
>     new ReplicationPeer(peerConf, peerId, ZKUtil.getZooKeeperClusterKey(peerConf));
> {code}

-- This message was sent by Atlassian JIRA (v6.2#6252)
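The bug fixed above is the classic same-typed-argument swap: clusterKey and id are both Strings, so passing them in the wrong order compiles cleanly and only misbehaves at runtime. A toy reproduction (hypothetical names, not the real ReplicationPeer class):

```java
public class PeerArgsDemo {
    // Both constructor parameters are Strings, so a swapped call site
    // compiles without any warning -- the shape of the bug in
    // ReplicationPeersZKImpl#getPeer.
    static final class Peer {
        final String clusterKey;
        final String id;
        Peer(String clusterKey, String id) {
            this.clusterKey = clusterKey;
            this.id = id;
        }
    }

    static Peer buggy(String clusterKey, String id) {
        return new Peer(id, clusterKey);   // swapped: compiles, stores wrong data
    }

    static Peer fixed(String clusterKey, String id) {
        return new Peer(clusterKey, id);   // arguments in declared order
    }
}
```

One common defense against this class of bug is wrapping such values in distinct types (or a builder), so the compiler can catch a swap instead of the ZooKeeper connection failing later.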
[jira] [Updated] (HBASE-11388) The parameter order is wrong when invoking the constructor of ReplicationPeer in the method "getPeer" of the class ReplicationPeersZKImpl
[ https://issues.apache.org/jira/browse/HBASE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-11388: --- Status: Open (was: Patch Available) Well, this needs a rebase. I'm back from vacation now, [~qianxiZhang], so if you resubmit a patch this week we could get this in quickly. > The parameter order is wrong when invoking the constructor of > ReplicationPeer in the method "getPeer" of the class ReplicationPeersZKImpl > - > > Key: HBASE-11388 > URL: https://issues.apache.org/jira/browse/HBASE-11388 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 0.98.3, 0.99.0 >Reporter: Qianxi Zhang >Assignee: Qianxi Zhang >Priority: Minor > Fix For: 0.99.0, 0.98.5 > > Attachments: HBASE_11388.patch, HBASE_11388_trunk_V1.patch > > > The parameters are "Configuration", "ClusterKey" and "id" in the constructor > of the class ReplicationPeer, but the parameter order is "Configuration", > "id" and "ClusterKey" when invoking the constructor of ReplicationPeer in > the method "getPeer" of the class ReplicationPeersZKImpl > ReplicationPeer#76 > {code} > public ReplicationPeer(Configuration conf, String key, String id) throws > ReplicationException { > this.conf = conf; > this.clusterKey = key; > this.id = id; > try { > this.reloadZkWatcher(); > } catch (IOException e) { > throw new ReplicationException("Error connecting to peer cluster with > peerId=" + id, e); > } > } > {code} > ReplicationPeersZKImpl#498 > {code} > ReplicationPeer peer = > new ReplicationPeer(peerConf, peerId, > ZKUtil.getZooKeeperClusterKey(peerConf)); > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11388) The parameter order is wrong when invoking the constructor of ReplicationPeer in the method "getPeer" of the class ReplicationPeersZKImpl
[ https://issues.apache.org/jira/browse/HBASE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-11388: --- Status: Patch Available (was: Open) +1 on the second patch, I'll get a Hadoop QA run for good measure. > The parameter order is wrong when invoking the constructor of > ReplicationPeer in the method "getPeer" of the class ReplicationPeersZKImpl > - > > Key: HBASE-11388 > URL: https://issues.apache.org/jira/browse/HBASE-11388 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 0.98.3, 0.99.0 >Reporter: Qianxi Zhang >Assignee: Qianxi Zhang >Priority: Minor > Fix For: 0.99.0, 0.98.5 > > Attachments: HBASE_11388.patch, HBASE_11388_trunk_V1.patch > > > The parameters are "Configuration", "ClusterKey" and "id" in the constructor > of the class ReplicationPeer, but the parameter order is "Configuration", > "id" and "ClusterKey" when invoking the constructor of ReplicationPeer in > the method "getPeer" of the class ReplicationPeersZKImpl > ReplicationPeer#76 > {code} > public ReplicationPeer(Configuration conf, String key, String id) throws > ReplicationException { > this.conf = conf; > this.clusterKey = key; > this.id = id; > try { > this.reloadZkWatcher(); > } catch (IOException e) { > throw new ReplicationException("Error connecting to peer cluster with > peerId=" + id, e); > } > } > {code} > ReplicationPeersZKImpl#498 > {code} > ReplicationPeer peer = > new ReplicationPeer(peerConf, peerId, > ZKUtil.getZooKeeperClusterKey(peerConf)); > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11522) Move Replication information into the Ref Guide
[ https://issues.apache.org/jira/browse/HBASE-11522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072071#comment-14072071 ] Jean-Daniel Cryans commented on HBASE-11522: A few little things: {noformat} +Apache HBase (TM) provides a replication mechanism to copy data between HBase {noformat} We needed to put the whole Apache HBase (TM) here because we were mentioning HBase for the first time, but now that it's in the middle of the documentation we can replace it with just "HBase provides a replication...". {noformat} + {noformat} It's not a procedure, it's more a description of what happens when you replicate. {noformat} + Replication FAQ {noformat} We can remove the whole section, it's stale and nobody's been asking those questions. Finally, we could use a few more "xml:id" tags, at least for the first level below replication. > Move Replication information into the Ref Guide > --- > > Key: HBASE-11522 > URL: https://issues.apache.org/jira/browse/HBASE-11522 > Project: HBase > Issue Type: Bug > Components: documentation >Reporter: Misty Stanley-Jones >Assignee: Misty Stanley-Jones > Attachments: HBASE-11522.patch, replication_overview.png > > > Currently, information about replication lives at > http://hbase.apache.org/replication.html and > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/replication/package-summary.html#requirements. > IMHO, it makes sense at least for the first link to be in the Ref Guide > instead of where it is now. This came up because of HBASE-10398. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11316) Expand info about compactions beyond HBASE-11120
[ https://issues.apache.org/jira/browse/HBASE-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049476#comment-14049476 ] Jean-Daniel Cryans commented on HBASE-11316: IMO, unless we specifically refer to something related to the HFile implementation, we should always use StoreFile. That's also how we present them in the web UI when listing the number of Stores and StoreFiles in a region. > Expand info about compactions beyond HBASE-11120 > > > Key: HBASE-11316 > URL: https://issues.apache.org/jira/browse/HBASE-11316 > Project: HBase > Issue Type: Bug > Components: Compaction, documentation >Reporter: Misty Stanley-Jones >Assignee: Misty Stanley-Jones > Attachments: HBASE-11316-1.patch, HBASE-11316-2.patch, > HBASE-11316-3.patch, HBASE-11316-4.patch, HBASE-11316-5.patch, > HBASE-11316.patch, ch9_compactions.pdf > > > Round 2 - expand info about the algorithms, talk about stripe compaction, and > talk more about configuration. Hopefully find some rules of thumb. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11388) The parameter order is wrong when invoking the constructor of ReplicationPeer in the method "getPeer" of the class ReplicationPeersZKImpl
[ https://issues.apache.org/jira/browse/HBASE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042287#comment-14042287 ] Jean-Daniel Cryans commented on HBASE-11388: Ah, sorry, I wasn't clear. I was thinking that we should only remove it from the constructor, but then populate that field within the constructor using getZooKeeperClusterKey. > The parameter order is wrong when invoking the constructor of > ReplicationPeer in the method "getPeer" of the class ReplicationPeersZKImpl > - > > Key: HBASE-11388 > URL: https://issues.apache.org/jira/browse/HBASE-11388 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 0.99.0, 0.98.3 >Reporter: Qianxi Zhang >Assignee: Qianxi Zhang >Priority: Minor > Fix For: 0.99.0 > > Attachments: HBASE_11388.patch > > > The parameters are "Configuration", "ClusterKey" and "id" in the constructor > of the class ReplicationPeer, but the parameter order is "Configuration", > "id" and "ClusterKey" when invoking the constructor of ReplicationPeer in > the method "getPeer" of the class ReplicationPeersZKImpl > ReplicationPeer#76 > {code} > public ReplicationPeer(Configuration conf, String key, String id) throws > ReplicationException { > this.conf = conf; > this.clusterKey = key; > this.id = id; > try { > this.reloadZkWatcher(); > } catch (IOException e) { > throw new ReplicationException("Error connecting to peer cluster with > peerId=" + id, e); > } > } > {code} > ReplicationPeersZKImpl#498 > {code} > ReplicationPeer peer = > new ReplicationPeer(peerConf, peerId, > ZKUtil.getZooKeeperClusterKey(peerConf)); > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
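The suggestion above — drop the clusterKey parameter and derive it inside the constructor so callers can no longer swap arguments — could look roughly like the following. This is a minimal self-contained sketch, not the real HBase code: the Configuration and ZKUtil classes below are simplified stand-ins for the actual HBase classes, and the default quorum/port/znode values are illustrative.

```java
import java.util.HashMap;
import java.util.Map;

// Simplified stand-in for org.apache.hadoop.conf.Configuration.
class Configuration {
    private final Map<String, String> props = new HashMap<>();
    String get(String key, String dflt) { return props.getOrDefault(key, dflt); }
    void set(String key, String value) { props.put(key, value); }
}

// Simplified stand-in mirroring ZKUtil#getZooKeeperClusterKey: quorum:port:parent.
class ZKUtil {
    static String getZooKeeperClusterKey(Configuration conf) {
        return conf.get("hbase.zookeeper.quorum", "localhost") + ":"
             + conf.get("hbase.zookeeper.property.clientPort", "2181") + ":"
             + conf.get("zookeeper.znode.parent", "/hbase");
    }
}

class ReplicationPeer {
    final Configuration conf;
    final String id;
    final String clusterKey;

    // Only (conf, id) remain, so id and clusterKey can no longer be passed
    // in the wrong order; clusterKey is inferred from the peer's Configuration.
    ReplicationPeer(Configuration conf, String id) {
        this.conf = conf;
        this.id = id;
        this.clusterKey = ZKUtil.getZooKeeperClusterKey(conf);
    }
}
```

With this shape, the call site in ReplicationPeersZKImpl shrinks to `new ReplicationPeer(peerConf, peerId)` and the original bug becomes impossible to write.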
[jira] [Commented] (HBASE-11388) The parameter order is wrong when invoking the constructor of ReplicationPeer in the method "getPeer" of the class ReplicationPeersZKImpl
[ https://issues.apache.org/jira/browse/HBASE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040977#comment-14040977 ] Jean-Daniel Cryans commented on HBASE-11388: Wow nice one [~qianxiZhang], I'm surprised that things even work, but it looks like we don't do anything useful with clusterKey in ReplicationPeer, we just use the configuration. I think a better interface would be to just remove the passing of the clusterKey in ReplicationPeer's constructor since we can infer it using ZKUtil#getZooKeeperClusterKey(). > The parameter order is wrong when invoking the constructor of > ReplicationPeer in the method "getPeer" of the class ReplicationPeersZKImpl > - > > Key: HBASE-11388 > URL: https://issues.apache.org/jira/browse/HBASE-11388 > Project: HBase > Issue Type: Bug > Components: Replication >Affects Versions: 0.99.0, 0.98.3 >Reporter: Qianxi Zhang >Assignee: Qianxi Zhang >Priority: Minor > Fix For: 0.99.0 > > Attachments: HBASE_11388.patch > > > The parameters are "Configuration", "ClusterKey" and "id" in the constructor > of the class ReplicationPeer, but the parameter order is "Configuration", > "id" and "ClusterKey" when invoking the constructor of ReplicationPeer in > the method "getPeer" of the class ReplicationPeersZKImpl > ReplicationPeer#76 > {code} > public ReplicationPeer(Configuration conf, String key, String id) throws > ReplicationException { > this.conf = conf; > this.clusterKey = key; > this.id = id; > try { > this.reloadZkWatcher(); > } catch (IOException e) { > throw new ReplicationException("Error connecting to peer cluster with > peerId=" + id, e); > } > } > {code} > ReplicationPeersZKImpl#498 > {code} > ReplicationPeer peer = > new ReplicationPeer(peerConf, peerId, > ZKUtil.getZooKeeperClusterKey(peerConf)); > {code} -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10504) Define Replication Interface
[ https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033216#comment-14033216 ] Jean-Daniel Cryans commented on HBASE-10504: bq. Do you want to do this in a subtask? Yes, please. > Define Replication Interface > > > Key: HBASE-10504 > URL: https://issues.apache.org/jira/browse/HBASE-10504 > Project: HBase > Issue Type: Task >Reporter: stack >Assignee: Enis Soztutar >Priority: Blocker > Fix For: 0.99.0 > > Attachments: hbase-10504_v1.patch, hbase-10504_wip1.patch > > > HBase has replication. Fellas have been hijacking the replication apis to do > all kinds of perverse stuff like indexing hbase content (hbase-indexer > https://github.com/NGDATA/hbase-indexer) and our [~toffer] just showed up w/ > overrides that replicate via an alternate channel (over a secure thrift > channel between dcs over on HBASE-9360). This issue is about surfacing these > APIs as public with guarantees to downstreamers similar to those we have on > our public client-facing APIs (and so we don't break them for downstreamers). > Any input [~phunt] or [~gabriel.reid] or [~toffer]? > Thanks. > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-6192) Document ACL matrix in the book
[ https://issues.apache.org/jira/browse/HBASE-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032601#comment-14032601 ] Jean-Daniel Cryans commented on HBASE-6192: --- bq. The code seems to indicate that BulkLoadHFile requires CREATE but the PDF says it requires WRITE. Which is true? I changed it in HBASE-11008, the explanation is in the release note. > Document ACL matrix in the book > --- > > Key: HBASE-6192 > URL: https://issues.apache.org/jira/browse/HBASE-6192 > Project: HBase > Issue Type: Task > Components: documentation, security >Affects Versions: 0.94.1, 0.95.2 >Reporter: Enis Soztutar >Assignee: Misty Stanley-Jones > Labels: documentaion, security > Fix For: 0.99.0 > > Attachments: HBASE-6192-rebased.patch, HBASE-6192.patch, HBase > Security-ACL Matrix.pdf, HBase Security-ACL Matrix.pdf, HBase Security-ACL > Matrix.pdf, HBase Security-ACL Matrix.xls, HBase Security-ACL Matrix.xls, > HBase Security-ACL Matrix.xls > > > We have an excellent matrix at > https://issues.apache.org/jira/secure/attachment/12531252/Security-ACL%20Matrix.pdf > for ACL. Once the changes are done, we can adapt that and put it in the > book, also add some more documentation about the new authorization features. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-8844) Document the removal of replication state AKA start/stop_replication
[ https://issues.apache.org/jira/browse/HBASE-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-8844: -- Resolution: Fixed Fix Version/s: 0.99.0 Status: Resolved (was: Patch Available) Pushed to trunk, thanks Misty! > Document the removal of replication state AKA start/stop_replication > > > Key: HBASE-8844 > URL: https://issues.apache.org/jira/browse/HBASE-8844 > Project: HBase > Issue Type: Task > Components: documentation >Reporter: Patrick >Assignee: Misty Stanley-Jones >Priority: Minor > Fix For: 0.99.0 > > Attachments: HBASE-8844-v2.patch, HBASE-8844-v3-rebased.patch, > HBASE-8844-v3.patch, HBASE-8844.patch > > > The first two tutorials for enabling replication that google gives me [1], > [2] take very different tones with regard to stop_replication. The HBase docs > [1] make it sound fine to start and stop replication as desired. The Cloudera > docs [2] say it may cause data loss. > Which is true? If data loss is possible, are we talking about data loss in > the primary cluster, or data loss in the standby cluster (presumably would > require reinitializing the sync with a new CopyTable). > [1] > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/replication/package-summary.html#requirements > [2] > http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_20_11.html -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-8844) Document the removal of replication state AKA start/stop_replication
[ https://issues.apache.org/jira/browse/HBASE-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-8844: -- Summary: Document the removal of replication state AKA start/stop_replication (was: Document stop_replication danger) Updating the jira's summary to better reflect what we ended up doing. > Document the removal of replication state AKA start/stop_replication > > > Key: HBASE-8844 > URL: https://issues.apache.org/jira/browse/HBASE-8844 > Project: HBase > Issue Type: Task > Components: documentation >Reporter: Patrick >Assignee: Misty Stanley-Jones >Priority: Minor > Attachments: HBASE-8844-v2.patch, HBASE-8844-v3-rebased.patch, > HBASE-8844-v3.patch, HBASE-8844.patch > > > The first two tutorials for enabling replication that google gives me [1], > [2] take very different tones with regard to stop_replication. The HBase docs > [1] make it sound fine to start and stop replication as desired. The Cloudera > docs [2] say it may cause data loss. > Which is true? If data loss is possible, are we talking about data loss in > the primary cluster, or data loss in the standby cluster (presumably would > require reinitializing the sync with a new CopyTable). > [1] > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/replication/package-summary.html#requirements > [2] > http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_20_11.html -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10504) Define Replication Interface
[ https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029806#comment-14029806 ] Jean-Daniel Cryans commented on HBASE-10504: As long as the RE daemons are disjoint from the region servers that would work, since you need to be able to have as many/few of them as you want, plus to be able to have them on different machines. I could even see this as being useful for certain use cases that want inter-HBase replication to be done through specific gateways. We then have to architect a way to discover those endpoints, etc... kind of what the current replication code already does with sinks. > Define Replication Interface > > > Key: HBASE-10504 > URL: https://issues.apache.org/jira/browse/HBASE-10504 > Project: HBase > Issue Type: Task >Reporter: stack >Assignee: Enis Soztutar >Priority: Blocker > Fix For: 0.99.0 > > Attachments: hbase-10504_v1.patch, hbase-10504_wip1.patch > > > HBase has replication. Fellas have been hijacking the replication apis to do > all kinds of perverse stuff like indexing hbase content (hbase-indexer > https://github.com/NGDATA/hbase-indexer) and our [~toffer] just showed up w/ > overrides that replicate via an alternate channel (over a secure thrift > channel between dcs over on HBASE-9360). This issue is about surfacing these > APIs as public with guarantees to downstreamers similar to those we have on > our public client-facing APIs (and so we don't break them for downstreamers). > Any input [~phunt] or [~gabriel.reid] or [~toffer]? > Thanks. > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10504) Define Replication Interface
[ https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029275#comment-14029275 ] Jean-Daniel Cryans commented on HBASE-10504: [~enis], as per the discussion above, I think what you are proposing is good but it doesn't solve the problem that this jira is about. How about we open a new jira, titled something like "Make ReplicationSource extendable", and do the review/commit over there? > Define Replication Interface > > > Key: HBASE-10504 > URL: https://issues.apache.org/jira/browse/HBASE-10504 > Project: HBase > Issue Type: Task >Reporter: stack >Assignee: Enis Soztutar >Priority: Blocker > Fix For: 0.99.0 > > Attachments: hbase-10504_v1.patch, hbase-10504_wip1.patch > > > HBase has replication. Fellas have been hijacking the replication apis to do > all kinds of perverse stuff like indexing hbase content (hbase-indexer > https://github.com/NGDATA/hbase-indexer) and our [~toffer] just showed up w/ > overrides that replicate via an alternate channel (over a secure thrift > channel between dcs over on HBASE-9360). This issue is about surfacing these > APIs as public with guarantees to downstreamers similar to those we have on > our public client-facing APIs (and so we don't break them for downstreamers). > Any input [~phunt] or [~gabriel.reid] or [~toffer]? > Thanks. > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-8844) Document stop_replication danger
[ https://issues.apache.org/jira/browse/HBASE-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025308#comment-14025308 ] Jean-Daniel Cryans commented on HBASE-8844: --- [~misty] I was about to commit and was double checking the documentation and I figured that we should remove the whole "state znode" section since that was also connected to start/stop_replication. I can do it on commit if that's ok with you. > Document stop_replication danger > > > Key: HBASE-8844 > URL: https://issues.apache.org/jira/browse/HBASE-8844 > Project: HBase > Issue Type: Task > Components: documentation >Reporter: Patrick >Assignee: Misty Stanley-Jones >Priority: Minor > Attachments: HBASE-8844-v2.patch, HBASE-8844-v3-rebased.patch, > HBASE-8844-v3.patch, HBASE-8844.patch > > > The first two tutorials for enabling replication that google gives me [1], > [2] take very different tones with regard to stop_replication. The HBase docs > [1] make it sound fine to start and stop replication as desired. The Cloudera > docs [2] say it may cause data loss. > Which is true? If data loss is possible, are we talking about data loss in > the primary cluster, or data loss in the standby cluster (presumably would > require reinitializing the sync with a new CopyTable). > [1] > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/replication/package-summary.html#requirements > [2] > http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_20_11.html -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11120) Update documentation about major compaction algorithm
[ https://issues.apache.org/jira/browse/HBASE-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025292#comment-14025292 ] Jean-Daniel Cryans commented on HBASE-11120: bq. I don't see hbase.hstore.compaction.min.size in code base any more Yeah I was confused by this too, [~stack], until I found this thread that you started for this reason a while back: http://osdir.com/ml/general/2013-06/msg41068.html Basically, {{CompactionConfiguration}} was written with a "configuration prefix" so doing a grep on that config (or hbase.hstore.compaction.max.size) won't return anything. I think we should undo the "refactoring" in that class since it obviously breaks a pattern we're used to, which is to completely spell out the configuration names so we can easily search for them. > Update documentation about major compaction algorithm > - > > Key: HBASE-11120 > URL: https://issues.apache.org/jira/browse/HBASE-11120 > Project: HBase > Issue Type: Bug > Components: Compaction, documentation >Affects Versions: 0.98.2 >Reporter: Misty Stanley-Jones >Assignee: Misty Stanley-Jones > Attachments: HBASE-11120-2.patch, HBASE-11120-3-rebased.patch, > HBASE-11120-3.patch, HBASE-11120-4-rebased.patch, HBASE-11120-4.patch, > HBASE-11120.patch > > > [14:20:38] seems that there's > http://hbase.apache.org/book.html#compaction and > http://hbase.apache.org/book.html#managed.compactions > [14:20:56] the latter doesn't say much, except that you > should manage them > [14:21:44] the former gives a good description of the > _old_ selection algo > [14:45:25] this is the new selection algo since C5 / > 0.96.0: https://issues.apache.org/jira/browse/HBASE-7842 -- This message was sent by Atlassian JIRA (v6.2#6252)
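The "configuration prefix" problem described above can be illustrated with a tiny sketch. This is not the real CompactionConfiguration class, just a hypothetical reconstruction of the pattern being criticized: the key name is assembled from a prefix at runtime, so the full string never appears in the source and a grep for it finds nothing.

```java
// Hypothetical reconstruction of the prefix pattern in CompactionConfiguration.
class CompactionConfigKeys {
    static final String CONFIG_PREFIX = "hbase.hstore.compaction.";

    // The literal "hbase.hstore.compaction.min.size" never appears in the
    // source file, so `grep -r "hbase.hstore.compaction.min.size"` comes up
    // empty even though the key is read and honored at runtime.
    static String minSizeKey() { return CONFIG_PREFIX + "min.size"; }
    static String maxSizeKey() { return CONFIG_PREFIX + "max.size"; }
}
```

Undoing the refactoring would mean declaring each key as a fully spelled-out string constant, restoring grep-ability at the cost of a little repetition.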
[jira] [Commented] (HBASE-11120) Update documentation about major compaction algorithm
[ https://issues.apache.org/jira/browse/HBASE-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012875#comment-14012875 ] Jean-Daniel Cryans commented on HBASE-11120: Spent some more time reading through and I saw that I had missed a few things. BTW, thanks again Misty for working on this, it is much needed. I'm not sure about this part: {quote} However, if the normal compaction +algorithm do not find any normally-eligible StoreFiles, a major compaction is +the only way to get out of this situation, and is forced. This is also called a +size-based or size-triggered major compaction. {quote} The exploration compaction will select the smallest set of files to compact and it won't trigger a major compaction. Also, I looked for "size-based" and "size-triggered" in the code but couldn't find anything so I'm not sure what this is referring to. This is wrong: {quote} +If this compaction was user-requested, do a major compaction. + + Compactions can run on a schedule or can be initiated manually. If a +compaction is requested manually, it is always a major compaction. If the +compaction is user-requested, the major compaction still happens even if the are +more than hbase.hstore.compaction.max files that need +compaction. + {quote} It will do a minor compaction if the user asks for that, or a major compaction if it's specifically asked for. It might be good to mention that hbase.hstore.compactionThreshold was the old name for hbase.hstore.compaction.min in this section: {quote} + +hbase.hstore.compaction.min +The minimum number of files which must be eligible for compaction before + compaction can run. +3 + {quote} Those two are wrong (if 1000 bytes was the max we'd probably never compact :P): {quote} + +hbase.hstore.compaction.min.size +A StoreFile smaller than this size (in bytes) will always be eligible for + minor compaction. 
+10 + + +hbase.hstore.compaction.max.size +A StoreFile larger than this size (in bytes) will be excluded from minor + compaction. +1000 + {quote} It should be 128MB and Long.MAX_VALUE as per: https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactionConfiguration.java#L71 > Update documentation about major compaction algorithm > - > > Key: HBASE-11120 > URL: https://issues.apache.org/jira/browse/HBASE-11120 > Project: HBase > Issue Type: Bug > Components: Compaction, documentation >Affects Versions: 0.98.2 >Reporter: Misty Stanley-Jones >Assignee: Misty Stanley-Jones > Attachments: HBASE-11120-2.patch, HBASE-11120-3-rebased.patch, > HBASE-11120-3.patch, HBASE-11120.patch > > > [14:20:38] seems that there's > http://hbase.apache.org/book.html#compaction and > http://hbase.apache.org/book.html#managed.compactions > [14:20:56] the latter doesn't say much, except that you > should manage them > [14:21:44] the former gives a good description of the > _old_ selection algo > [14:45:25] this is the new selection algo since C5 / > 0.96.0: https://issues.apache.org/jira/browse/HBASE-7842 -- This message was sent by Atlassian JIRA (v6.2#6252)
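The corrected defaults referenced by that GitHub link can be summarized in a small sketch. This is a simplified stand-in, not the real CompactionConfiguration: min.size falls back to the memstore flush size (128 MB by default) and max.size to Long.MAX_VALUE, i.e. effectively no upper bound.

```java
// Simplified stand-in for the default values discussed above.
class CompactionDefaultsSketch {
    // hbase.hregion.memstore.flush.size default: 128 MB.
    static final long DEFAULT_MEMSTORE_FLUSH_SIZE = 128L * 1024 * 1024;

    // Value used when hbase.hstore.compaction.min.size is unset: StoreFiles
    // below this size are always eligible for minor compaction.
    static long defaultMinCompactSize() { return DEFAULT_MEMSTORE_FLUSH_SIZE; }

    // Value used when hbase.hstore.compaction.max.size is unset: with
    // Long.MAX_VALUE, no StoreFile is excluded from minor compaction by size.
    static long defaultMaxCompactSize() { return Long.MAX_VALUE; }
}
```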
[jira] [Commented] (HBASE-8844) Document stop_replication danger
[ https://issues.apache.org/jira/browse/HBASE-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012492#comment-14012492 ] Jean-Daniel Cryans commented on HBASE-8844: --- This part isn't right: {quote} It is also cached locally using an AtomicBoolean in the ReplicationZookeeper -class. This value can be changed on a live cluster using the stop_replication +class. This value can be changed on a live cluster using the disable_peer {quote} The AtomicBoolean was completely removed, so there's now no way to toggle hbase.replication while the cluster is running. So that part can be completely removed, and hbase.replication is now only read when HBase starts. Regarding disable_peer, it might be good to add that what it does is it stops sending the edits to that peer cluster, but it still keeps track of all the new WALs that it will have to replicate when it's re-enabled. > Document stop_replication danger > > > Key: HBASE-8844 > URL: https://issues.apache.org/jira/browse/HBASE-8844 > Project: HBase > Issue Type: Task > Components: documentation >Reporter: Patrick >Assignee: Misty Stanley-Jones >Priority: Minor > Attachments: HBASE-8844-v2.patch, HBASE-8844.patch > > > The first two tutorials for enabling replication that google gives me [1], > [2] take very different tones with regard to stop_replication. The HBase docs > [1] make it sound fine to start and stop replication as desired. The Cloudera > docs [2] say it may cause data loss. > Which is true? If data loss is possible, are we talking about data loss in > the primary cluster, or data loss in the standby cluster (presumably would > require reinitializing the sync with a new CopyTable). 
> [1] > http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/replication/package-summary.html#requirements > [2] > http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_20_11.html -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11196) Update description of -ROOT- in ref guide
[ https://issues.apache.org/jira/browse/HBASE-11196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011827#comment-14011827 ] Jean-Daniel Cryans commented on HBASE-11196: Nit: bq. tracked by Zookeeper Should be "stored in ZooKeeper". Regarding the section about {{HTablePool}}, it's now a deprecated class, so we should eventually update that too (but let's stick to this jira's scope). The rest looks good! > Update description of -ROOT- in ref guide > - > > Key: HBASE-11196 > URL: https://issues.apache.org/jira/browse/HBASE-11196 > Project: HBase > Issue Type: Bug > Components: documentation >Reporter: Dima Spivak >Assignee: Misty Stanley-Jones > Attachments: HBASE-11196.patch > > > Since the resolution of HBASE-3171, \-ROOT- is no longer used to store the > location(s) of .META. . Unfortunately, not all of [our > documentation|http://hbase.apache.org/book/arch.catalog.html] has been > updated to reflect this change in architecture. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11120) Update documentation about major compaction algorithm
[ https://issues.apache.org/jira/browse/HBASE-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006480#comment-14006480 ] Jean-Daniel Cryans commented on HBASE-11120: On top of Sergey's comments: This could also use a sentence regarding TTLs, they are simply filtered out and tombstones aren't created: {noformat} bq. Compaction and Deletions {noformat} Same kind of comment for this para, the old versions are filtered more than deleted: {noformat} +Compaction and Versions {noformat} One thing bothers me about this para: {noformat} +The compaction algorithms used by HBase have evolved over time. HBase 0.96 + introduced a new algorithm for compaction file selection. To find out about the old + algorithm, see . The rest of this section describes the new algorithm, which + was implemented in https://issues.apache.org/jira/browse/HBASE-7842";>HBASE-7842. {noformat} What's written next is much more about how to pick the files that will then be considered for compaction than the actual new policy which is in ExploringCompactionPolicy.java. In other words, the text that follows what I just quoted is ok but not really new in 0.96 via HBASE-7842, and it seems to me that we should explain what HBASE-7842 does. 
> Update documentation about major compaction algorithm > - > > Key: HBASE-11120 > URL: https://issues.apache.org/jira/browse/HBASE-11120 > Project: HBase > Issue Type: Bug > Components: Compaction, documentation >Affects Versions: 0.98.2 >Reporter: Misty Stanley-Jones >Assignee: Misty Stanley-Jones > Attachments: HBASE-11120.patch > > > [14:20:38] seems that there's > http://hbase.apache.org/book.html#compaction and > http://hbase.apache.org/book.html#managed.compactions > [14:20:56] the latter doesn't say much, except that you > should manage them > [14:21:44] the former gives a good description of the > _old_ selection algo > [14:45:25] this is the new selection algo since C5 / > 0.96.0: https://issues.apache.org/jira/browse/HBASE-7842 -- This message was sent by Atlassian JIRA (v6.2#6252)
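The distinction drawn above — picking candidate files versus the actual HBASE-7842 policy — can be made concrete with a greatly simplified sketch of the ExploringCompactionPolicy idea: scan contiguous windows of store-file sizes within the min/max file-count bounds, discard windows whose total size is too large, and prefer the window that compacts the most files (breaking ties by smallest total size). This is an illustration under those assumptions, not the real selection code.

```java
import java.util.ArrayList;
import java.util.List;

class ExploringSketch {
    // sizes: store-file sizes in age order; returns the chosen window.
    static List<Long> select(List<Long> sizes, int minFiles, int maxFiles, long maxTotal) {
        List<Long> best = new ArrayList<>();
        long bestSize = Long.MAX_VALUE;
        for (int start = 0; start < sizes.size(); start++) {
            for (int end = start + minFiles; end <= Math.min(sizes.size(), start + maxFiles); end++) {
                List<Long> window = sizes.subList(start, end);
                long total = window.stream().mapToLong(Long::longValue).sum();
                if (total > maxTotal) continue;  // window too expensive to compact
                // More files wins; on a tie, the smaller total size wins.
                if (window.size() > best.size()
                        || (window.size() == best.size() && total < bestSize)) {
                    best = new ArrayList<>(window);
                    bestSize = total;
                }
            }
        }
        return best;
    }
}
```

For example, with sizes [100, 12, 12, 12, 500], minFiles=3, maxFiles=4 and maxTotal=200, the window [100, 12, 12, 12] is chosen: it has the most files among windows under the size cap.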
[jira] [Resolved] (HBASE-10946) hbase.metrics.showTableName causes “hbase xxx.Hfile” runs to report “Inconsistent configuration”
[ https://issues.apache.org/jira/browse/HBASE-10946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans resolved HBASE-10946. Resolution: Duplicate I'm fixing this in HBASE-11188. > hbase.metrics.showTableName causes “hbase xxx.Hfile” runs to report > “Inconsistent configuration” > --- > > Key: HBASE-10946 > URL: https://issues.apache.org/jira/browse/HBASE-10946 > Project: HBase > Issue Type: Bug > Components: HFile >Affects Versions: 0.94.17 > Environment: hadoop 1.2.1 hbase0.94.17 >Reporter: Zhang Jingpeng >Priority: Minor > > When I run hbase org.apache.hadoop.hbase.io.hfile.HFile, it prints the error > "ERROR metrics.SchemaMetrics: Inconsistent configuration. Previous > configuration for using table name in metrics: true, new configuration: false". > I traced the message to the setUseTableName method reached from HFile, > called via > final boolean useTableNameNew = > conf.getBoolean(SHOW_TABLE_NAME_CONF_KEY, false); > setUseTableName(useTableNameNew); > but hbase-default.xml sets hbase.metrics.showTableName > to true -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11133) Add an option to skip snapshot verification after Export
[ https://issues.apache.org/jira/browse/HBASE-11133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993007#comment-13993007 ] Jean-Daniel Cryans commented on HBASE-11133: Something like "-skip-copy-sanity-check" would be better, also maybe describe more what the sanity check is doing? > Add an option to skip snapshot verification after Export > > > Key: HBASE-11133 > URL: https://issues.apache.org/jira/browse/HBASE-11133 > Project: HBase > Issue Type: Bug > Components: snapshots >Affects Versions: 0.99.0 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Trivial > Fix For: 0.99.0 > > Attachments: HBASE-11133-v0.patch, HBASE-11133-v1.patch > > > Add a "-skip-dst-verify" option to skip snapshot verification after Export -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11188) "Inconsistent configuration" for SchemaMetrics is always shown
Jean-Daniel Cryans created HBASE-11188: -- Summary: "Inconsistent configuration" for SchemaMetrics is always shown Key: HBASE-11188 URL: https://issues.apache.org/jira/browse/HBASE-11188 Project: HBase Issue Type: Bug Components: metrics Affects Versions: 0.94.19 Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.94.20 Attachments: HBASE-11188-0.94-v2.patch, HBASE-11188-0.94.patch Some users have been complaining about this message: {noformat} ERROR org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics: Inconsistent configuration. Previous configuration for using table name in metrics: true, new configuration: false {noformat} The interesting thing is that we see it with default configurations, which made me think that some code path must have been passing the wrong thing. I found that if SchemaConfigured is passed a null Configuration in its constructor that it will then pass null to SchemaMetrics#configureGlobally which will interpret useTableName as being false: {code} public static void configureGlobally(Configuration conf) { if (conf != null) { final boolean useTableNameNew = conf.getBoolean(SHOW_TABLE_NAME_CONF_KEY, false); setUseTableName(useTableNameNew); } else { setUseTableName(false); } } {code} It should be set to true since that's the new default, meaning we missed it in HBASE-5671. I found one code path that passes a null configuration, StoreFile.Reader extends SchemaConfigured and uses the constructor that only passes a Path, so the Configuration is set to null. I'm planning on just passing true instead of false, fixing the problem for almost everyone (those that disable this feature will get the error message). IMO it's not worth more efforts since it's a 0.94-only problem and it's not actually doing anything bad. I'm closing both HBASE-10990 and HBASE-10946 as duplicates. -- This message was sent by Atlassian JIRA (v6.2#6252)
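The fix described above amounts to flipping the fallback default when no Configuration is available. As a standalone sketch (simplified: a plain public boolean flag and a stand-in Conf interface replace the real SchemaMetrics state and Hadoop Configuration class), assuming the patch only changes the two defaults:

```java
// Simplified sketch of the HBASE-11188 fix: when configureGlobally gets a
// null configuration, fall back to TRUE, matching the new default of
// hbase.metrics.showTableName introduced by HBASE-5671.
public class SchemaMetricsSketch {
  public static final String SHOW_TABLE_NAME_CONF_KEY = "hbase.metrics.showTableName";
  public static boolean useTableName;

  // Stand-in for the relevant part of Hadoop's Configuration API.
  public interface Conf { boolean getBoolean(String key, boolean dflt); }

  public static void configureGlobally(Conf conf) {
    if (conf != null) {
      // The default passed here also changes from false to true.
      useTableName = conf.getBoolean(SHOW_TABLE_NAME_CONF_KEY, true);
    } else {
      // Before the patch this branch set the flag to false, so any code path
      // passing a null conf (e.g. StoreFile.Reader via the Path-only
      // SchemaConfigured constructor) disagreed with the default and
      // triggered the "Inconsistent configuration" error.
      useTableName = true;
    }
  }
}
```

With this change, only deployments that explicitly set the key to false see the (harmless) message, as the release note below describes.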
[jira] [Commented] (HBASE-6990) Pretty print TTL
[ https://issues.apache.org/jira/browse/HBASE-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000440#comment-14000440 ] Jean-Daniel Cryans commented on HBASE-6990: --- That representation is pretty much my original ask, +1 > Pretty print TTL > > > Key: HBASE-6990 > URL: https://issues.apache.org/jira/browse/HBASE-6990 > Project: HBase > Issue Type: Improvement > Components: Usability >Reporter: Jean-Daniel Cryans >Assignee: Esteban Gutierrez >Priority: Minor > Attachments: HBASE-6990.v0.patch, HBASE-6990.v1.patch, > HBASE-6990.v2.patch, HBASE-6990.v3.patch, HBASE-6990.v4.patch > > > I've seen a lot of users getting confused by the TTL configuration and I > think that if we just pretty printed it it would solve most of the issues. > For example, let's say a user wanted to set a TTL of 90 days. That would be > 7776000. But let's say that it was typo'd to 77760000 instead, it gives you > 900 days! > So when we print the TTL we could do something like "x days, x hours, x > minutes, x seconds (real_ttl_value)". This would also help people when they > use ms instead of seconds as they would see really big values in there. -- This message was sent by Atlassian JIRA (v6.2#6252)
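The representation being +1'd in the description could look like the sketch below; the class and method names are illustrative, not the committed patch:

```java
// Sketch of pretty-printing a TTL given in seconds as
// "x days x hours x minutes x seconds (real_ttl_value)", so a typo'd
// 77760000 is immediately visible as 900 days rather than the intended 90.
public class PrettyTtl {
  public static String humanReadableTtl(long ttlSeconds) {
    long days = ttlSeconds / 86400;
    long hours = (ttlSeconds % 86400) / 3600;
    long minutes = (ttlSeconds % 3600) / 60;
    long seconds = ttlSeconds % 60;
    // Keep the raw value in parentheses so scripts can still parse it.
    return days + " days " + hours + " hours " + minutes
        + " minutes " + seconds + " seconds (" + ttlSeconds + ")";
  }
}
```

For example, `humanReadableTtl(7776000)` yields "90 days 0 hours 0 minutes 0 seconds (7776000)", while the typo'd 77760000 starts with "900 days", and a value accidentally given in milliseconds stands out as an absurdly large day count.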
[jira] [Updated] (HBASE-11188) "Inconsistent configuration" for SchemaMetrics is always shown
[ https://issues.apache.org/jira/browse/HBASE-11188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-11188: --- Attachment: HBASE-11188-0.94-v2.patch To be extra correct there's another line that should default to true, attaching the patch. > "Inconsistent configuration" for SchemaMetrics is always shown > -- > > Key: HBASE-11188 > URL: https://issues.apache.org/jira/browse/HBASE-11188 > Project: HBase > Issue Type: Bug > Components: metrics >Affects Versions: 0.94.19 >Reporter: Jean-Daniel Cryans >Assignee: Jean-Daniel Cryans > Fix For: 0.94.20 > > Attachments: HBASE-11188-0.94-v2.patch, HBASE-11188-0.94.patch > > > Some users have been complaining about this message: > {noformat} > ERROR org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics: > Inconsistent configuration. Previous configuration for using table name in > metrics: true, new configuration: false > {noformat} > The interesting thing is that we see it with default configurations, which > made me think that some code path must have been passing the wrong thing. I > found that if SchemaConfigured is passed a null Configuration in its > constructor that it will then pass null to SchemaMetrics#configureGlobally > which will interpret useTableName as being false: > {code} > public static void configureGlobally(Configuration conf) { > if (conf != null) { > final boolean useTableNameNew = > conf.getBoolean(SHOW_TABLE_NAME_CONF_KEY, false); > setUseTableName(useTableNameNew); > } else { > setUseTableName(false); > } > } > {code} > It should be set to true since that's the new default, meaning we missed it > in HBASE-5671. > I found one code path that passes a null configuration, StoreFile.Reader > extends SchemaConfigured and uses the constructor that only passes a Path, so > the Configuration is set to null. 
> I'm planning on just passing true instead of false, fixing the problem for > almost everyone (those that disable this feature will get the error message). > IMO it's not worth more efforts since it's a 0.94-only problem and it's not > actually doing anything bad. > I'm closing both HBASE-10990 and HBASE-10946 as duplicates. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11188) "Inconsistent configuration" for SchemaMetrics is always shown
[ https://issues.apache.org/jira/browse/HBASE-11188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-11188: --- Attachment: HBASE-11188-0.94.patch Proposing this one-liner patch, passes both TestSchemaConfigured and TestSchemaMetrics. > "Inconsistent configuration" for SchemaMetrics is always shown > -- > > Key: HBASE-11188 > URL: https://issues.apache.org/jira/browse/HBASE-11188 > Project: HBase > Issue Type: Bug > Components: metrics >Affects Versions: 0.94.19 >Reporter: Jean-Daniel Cryans >Assignee: Jean-Daniel Cryans > Fix For: 0.94.20 > > Attachments: HBASE-11188-0.94-v2.patch, HBASE-11188-0.94.patch > > > Some users have been complaining about this message: > {noformat} > ERROR org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics: > Inconsistent configuration. Previous configuration for using table name in > metrics: true, new configuration: false > {noformat} > The interesting thing is that we see it with default configurations, which > made me think that some code path must have been passing the wrong thing. I > found that if SchemaConfigured is passed a null Configuration in its > constructor that it will then pass null to SchemaMetrics#configureGlobally > which will interpret useTableName as being false: > {code} > public static void configureGlobally(Configuration conf) { > if (conf != null) { > final boolean useTableNameNew = > conf.getBoolean(SHOW_TABLE_NAME_CONF_KEY, false); > setUseTableName(useTableNameNew); > } else { > setUseTableName(false); > } > } > {code} > It should be set to true since that's the new default, meaning we missed it > in HBASE-5671. > I found one code path that passes a null configuration, StoreFile.Reader > extends SchemaConfigured and uses the constructor that only passes a Path, so > the Configuration is set to null. 
> I'm planning on just passing true instead of false, fixing the problem for > almost everyone (those that disable this feature will get the error message). > IMO it's not worth more efforts since it's a 0.94-only problem and it's not > actually doing anything bad. > I'm closing both HBASE-10990 and HBASE-10946 as duplicates. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HBASE-11188) "Inconsistent configuration" for SchemaMetrics is always shown
[ https://issues.apache.org/jira/browse/HBASE-11188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans resolved HBASE-11188. Resolution: Fixed Release Note: Region servers with the default value for hbase.metrics.showTableName will stop showing the error message "ERROR org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics: Inconsistent configuration. Previous configuration for using table name in metrics: true, new configuration: false". Region servers configured with hbase.metrics.showTableName=false should now get a message like this one: "ERROR org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics: Inconsistent configuration. Previous configuration for using table name in metrics: false, new configuration: true", and it's nothing to be concerned about. Hadoop Flags: Reviewed Grazie Matteo, I committed the patch to 0.94 > "Inconsistent configuration" for SchemaMetrics is always shown > -- > > Key: HBASE-11188 > URL: https://issues.apache.org/jira/browse/HBASE-11188 > Project: HBase > Issue Type: Bug > Components: metrics >Affects Versions: 0.94.19 >Reporter: Jean-Daniel Cryans >Assignee: Jean-Daniel Cryans > Fix For: 0.94.20 > > Attachments: HBASE-11188-0.94-v2.patch, HBASE-11188-0.94.patch > > > Some users have been complaining about this message: > {noformat} > ERROR org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics: > Inconsistent configuration. Previous configuration for using table name in > metrics: true, new configuration: false > {noformat} > The interesting thing is that we see it with default configurations, which > made me think that some code path must have been passing the wrong thing. 
I > found that if SchemaConfigured is passed a null Configuration in its > constructor that it will then pass null to SchemaMetrics#configureGlobally > which will interpret useTableName as being false: > {code} > public static void configureGlobally(Configuration conf) { > if (conf != null) { > final boolean useTableNameNew = > conf.getBoolean(SHOW_TABLE_NAME_CONF_KEY, false); > setUseTableName(useTableNameNew); > } else { > setUseTableName(false); > } > } > {code} > It should be set to true since that's the new default, meaning we missed it > in HBASE-5671. > I found one code path that passes a null configuration, StoreFile.Reader > extends SchemaConfigured and uses the constructor that only passes a Path, so > the Configuration is set to null. > I'm planning on just passing true instead of false, fixing the problem for > almost everyone (those that disable this feature will get the error message). > IMO it's not worth more efforts since it's a 0.94-only problem and it's not > actually doing anything bad. > I'm closing both HBASE-10990 and HBASE-10946 as duplicates. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HBASE-10990) Running CompressionTest yields unrelated schema metrics configuration errors.
[ https://issues.apache.org/jira/browse/HBASE-10990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans resolved HBASE-10990. Resolution: Duplicate I'm fixing this in HBASE-11188. > Running CompressionTest yields unrelated schema metrics configuration errors. > - > > Key: HBASE-10990 > URL: https://issues.apache.org/jira/browse/HBASE-10990 > Project: HBase > Issue Type: Bug >Affects Versions: 0.94.18 >Reporter: Srikanth Srungarapu >Assignee: Srikanth Srungarapu >Priority: Minor > Attachments: HBASE-10990.patch > > > In a vanilla configuration, running CompressionTest yields the following > error: > sudo -u hdfs hbase org.apache.hadoop.hbase.util.CompressionTest > /path/to/hfile gz > Output: > 13/03/07 14:49:40 ERROR metrics.SchemaMetrics: Inconsistent configuration. > Previous configuration for using table name in metrics: true, new > configuration: false -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-5671) hbase.metrics.showTableName should be true by default
[ https://issues.apache.org/jira/browse/HBASE-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999172#comment-13999172 ] Jean-Daniel Cryans commented on HBASE-5671: --- For anyone stumbling here because of this message "Inconsistent configuration. Previous configuration for using table name in metrics: true, new configuration: false", please see HBASE-11188. > hbase.metrics.showTableName should be true by default > - > > Key: HBASE-5671 > URL: https://issues.apache.org/jira/browse/HBASE-5671 > Project: HBase > Issue Type: Improvement > Components: metrics >Reporter: Enis Soztutar >Assignee: Enis Soztutar >Priority: Critical > Fix For: 0.94.0, 0.95.0 > > Attachments: HBASE-5671_v1.patch > > > HBASE-4768 added per-cf metrics and a new configuration option > "hbase.metrics.showTableName". We should switch the conf option to true by > default, since it is not intuitive (at least to me) to aggregate per-cf > across tables by default, and it seems confusing to report on cf's without > table names. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10504) Define Replication Interface
[ https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997727#comment-13997727 ] Jean-Daniel Cryans commented on HBASE-10504: [~enis], sounds like a plan. Oh and something we should not forget is setting the right InterfaceAudience. [~gabriel.reid] you down with this? It means that this class (1) will need a few changes just to continue working like it currently does, and eventually a lot more will need to change to use the replication consumer but it should simplify the code a lot. It will also have an impact on the deployment since you won't need a fake slave cluster anymore. 1. https://github.com/NGDATA/hbase-indexer/blob/master/hbase-sep/hbase-sep-impl-0.98/src/main/java/com/ngdata/sep/impl/SepReplicationSource.java > Define Replication Interface > > > Key: HBASE-10504 > URL: https://issues.apache.org/jira/browse/HBASE-10504 > Project: HBase > Issue Type: Task >Reporter: stack >Assignee: stack >Priority: Blocker > Fix For: 0.99.0 > > Attachments: hbase-10504_wip1.patch > > > HBase has replication. Fellas have been hijacking the replication apis to do > all kinds of perverse stuff like indexing hbase content (hbase-indexer > https://github.com/NGDATA/hbase-indexer) and our [~toffer] just showed up w/ > overrides that replicate via an alternate channel (over a secure thrift > channel between dcs over on HBASE-9360). This issue is about surfacing these > APIs as public with guarantees to downstreamers similar to those we have on > our public client-facing APIs (and so we don't break them for downstreamers). > Any input [~phunt] or [~gabriel.reid] or [~toffer]? > Thanks. > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-10251) Restore API Compat for PerformanceEvaluation.generateValue()
[ https://issues.apache.org/jira/browse/HBASE-10251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-10251: --- Resolution: Fixed Fix Version/s: (was: 0.98.2) (was: 0.98.1) (was: 0.98.0) 0.98.3 Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to 0.98 and trunk, thanks Dima! > Restore API Compat for PerformanceEvaluation.generateValue() > > > Key: HBASE-10251 > URL: https://issues.apache.org/jira/browse/HBASE-10251 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 0.98.0, 0.98.1, 0.99.0, 0.98.2 >Reporter: Aleksandr Shulman >Assignee: Dima Spivak > Labels: api_compatibility > Fix For: 0.99.0, 0.98.3 > > Attachments: HBASE-10251-v2.patch, HBASE_10251.patch, > HBASE_10251_v3.patch > > > Observed: > A couple of my client tests fail to compile against trunk because the method > PerformanceEvaluation.generateValue was removed as part of HBASE-8496. > This is an issue because it was used in a number of places, including unit > tests. Since we did not explicitly label this API as private, it's ambiguous > as to whether this could/should have been used by people writing apps against > 0.96. If they used it, then they would be broken upon upgrade to 0.98 and > trunk. > Potential Solution: > The method was renamed to generateData, but the logic is still the same. We > can reintroduce it as deprecated in 0.98, as compat shim over generateData. > The patch should be a few lines. We may also consider doing so in trunk, but > I'd be just as fine with leaving it out. > More generally, this raises the question about what other code is in this > "grey-area", where it is public, is used outside of the package, but is not > explicitly labeled with an AudienceInterface. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HBASE-10251) Restore API Compat for PerformanceEvaluation.generateValue()
[ https://issues.apache.org/jira/browse/HBASE-10251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans reassigned HBASE-10251: -- Assignee: Dima Spivak (was: Jean-Daniel Cryans) God I hate jira, so easy to misclick. Reassigning Dima. > Restore API Compat for PerformanceEvaluation.generateValue() > > > Key: HBASE-10251 > URL: https://issues.apache.org/jira/browse/HBASE-10251 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 0.98.0, 0.98.1, 0.99.0, 0.98.2 >Reporter: Aleksandr Shulman >Assignee: Dima Spivak > Labels: api_compatibility > Fix For: 0.98.0, 0.98.1, 0.99.0, 0.98.2 > > Attachments: HBASE-10251-v2.patch, HBASE_10251.patch > > > Observed: > A couple of my client tests fail to compile against trunk because the method > PerformanceEvaluation.generateValue was removed as part of HBASE-8496. > This is an issue because it was used in a number of places, including unit > tests. Since we did not explicitly label this API as private, it's ambiguous > as to whether this could/should have been used by people writing apps against > 0.96. If they used it, then they would be broken upon upgrade to 0.98 and > trunk. > Potential Solution: > The method was renamed to generateData, but the logic is still the same. We > can reintroduce it as deprecated in 0.98, as compat shim over generateData. > The patch should be a few lines. We may also consider doing so in trunk, but > I'd be just as fine with leaving it out. > More generally, this raises the question about what other code is in this > "grey-area", where it is public, is used outside of the package, but is not > explicitly labeled with an AudienceInterface. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Assigned] (HBASE-10251) Restore API Compat for PerformanceEvaluation.generateValue()
[ https://issues.apache.org/jira/browse/HBASE-10251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans reassigned HBASE-10251: -- Assignee: Jean-Daniel Cryans (was: Dima Spivak) > Restore API Compat for PerformanceEvaluation.generateValue() > > > Key: HBASE-10251 > URL: https://issues.apache.org/jira/browse/HBASE-10251 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 0.98.0, 0.98.1, 0.99.0, 0.98.2 >Reporter: Aleksandr Shulman >Assignee: Jean-Daniel Cryans > Labels: api_compatibility > Fix For: 0.98.0, 0.98.1, 0.99.0, 0.98.2 > > Attachments: HBASE-10251-v2.patch, HBASE_10251.patch > > > Observed: > A couple of my client tests fail to compile against trunk because the method > PerformanceEvaluation.generateValue was removed as part of HBASE-8496. > This is an issue because it was used in a number of places, including unit > tests. Since we did not explicitly label this API as private, it's ambiguous > as to whether this could/should have been used by people writing apps against > 0.96. If they used it, then they would be broken upon upgrade to 0.98 and > trunk. > Potential Solution: > The method was renamed to generateData, but the logic is still the same. We > can reintroduce it as deprecated in 0.98, as compat shim over generateData. > The patch should be a few lines. We may also consider doing so in trunk, but > I'd be just as fine with leaving it out. > More generally, this raises the question about what other code is in this > "grey-area", where it is public, is used outside of the package, but is not > explicitly labeled with an AudienceInterface. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10251) Restore API Compat for PerformanceEvaluation.generateValue()
[ https://issues.apache.org/jira/browse/HBASE-10251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996651#comment-13996651 ] Jean-Daniel Cryans commented on HBASE-10251: Pass VALUE_LENGTH directly instead of re-declaring it. > Restore API Compat for PerformanceEvaluation.generateValue() > > > Key: HBASE-10251 > URL: https://issues.apache.org/jira/browse/HBASE-10251 > Project: HBase > Issue Type: Bug > Components: Client >Affects Versions: 0.98.0, 0.98.1, 0.99.0, 0.98.2 >Reporter: Aleksandr Shulman >Assignee: Dima Spivak > Labels: api_compatibility > Fix For: 0.98.0, 0.98.1, 0.99.0, 0.98.2 > > Attachments: HBASE_10251.patch > > > Observed: > A couple of my client tests fail to compile against trunk because the method > PerformanceEvaluation.generateValue was removed as part of HBASE-8496. > This is an issue because it was used in a number of places, including unit > tests. Since we did not explicitly label this API as private, it's ambiguous > as to whether this could/should have been used by people writing apps against > 0.96. If they used it, then they would be broken upon upgrade to 0.98 and > trunk. > Potential Solution: > The method was renamed to generateData, but the logic is still the same. We > can reintroduce it as deprecated in 0.98, as compat shim over generateData. > The patch should be a few lines. We may also consider doing so in trunk, but > I'd be just as fine with leaving it out. > More generally, this raises the question about what other code is in this > "grey-area", where it is public, is used outside of the package, but is not > explicitly labeled with an AudienceInterface. -- This message was sent by Atlassian JIRA (v6.2#6252)
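The compat shim discussed in this issue, reintroducing the removed method as deprecated and delegating to its renamed successor while passing VALUE_LENGTH directly per the review comment, could look roughly like this. Signatures, the constant's value, and the data-generation logic are illustrative, not the actual PerformanceEvaluation code:

```java
// Sketch of an API-compatibility shim: the removed public method comes back,
// marked @Deprecated, and simply forwards to the renamed implementation.
public class CompatShimSketch {
  static final int VALUE_LENGTH = 1000; // hypothetical constant

  // The renamed method that holds the real logic (body is a placeholder).
  public static byte[] generateData(java.util.Random r, int length) {
    byte[] b = new byte[length];
    r.nextBytes(b);
    return b;
  }

  /** @deprecated use {@link #generateData(java.util.Random, int)} instead. */
  @Deprecated
  public static byte[] generateValue(java.util.Random r) {
    // Pass VALUE_LENGTH directly instead of re-declaring it.
    return generateData(r, VALUE_LENGTH);
  }
}
```

Callers compiled against the old API keep working and merely see a deprecation warning nudging them toward generateData.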
[jira] [Commented] (HBASE-10504) Define Replication Interface
[ https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995685#comment-13995685 ] Jean-Daniel Cryans commented on HBASE-10504: [~enis] So IIUC you are going a completely different way than what was discussed before, right? I don't mind it but I want to make sure that we're on the same track. So instead of having people define their sinks, they instead put that code in the ReplicationConsumer. Seems like a good idea, you don't have to mess around getting a fake cluster up and running, among other issues. Might still be nice to offer some tools like removeNonReplicableEdits(), the basic metrics, the handling of a disabled peer, etc., for the other consumers. > Define Replication Interface > > > Key: HBASE-10504 > URL: https://issues.apache.org/jira/browse/HBASE-10504 > Project: HBase > Issue Type: Task >Reporter: stack >Assignee: stack >Priority: Blocker > Fix For: 0.99.0 > > Attachments: hbase-10504_wip1.patch > > > HBase has replication. Fellas have been hijacking the replication apis to do > all kinds of perverse stuff like indexing hbase content (hbase-indexer > https://github.com/NGDATA/hbase-indexer) and our [~toffer] just showed up w/ > overrides that replicate via an alternate channel (over a secure thrift > channel between dcs over on HBASE-9360). This issue is about surfacing these > APIs as public with guarantees to downstreamers similar to those we have on > our public client-facing APIs (and so we don't break them for downstreamers). > Any input [~phunt] or [~gabriel.reid] or [~toffer]? > Thanks. > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11143) ageOfLastShippedOp metric is confusing
[ https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995525#comment-13995525 ] Jean-Daniel Cryans commented on HBASE-11143: I'm +1 for the trunk patch too. > ageOfLastShippedOp metric is confusing > -- > > Key: HBASE-11143 > URL: https://issues.apache.org/jira/browse/HBASE-11143 > Project: HBase > Issue Type: Bug > Components: Replication >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3 > > Attachments: 11143-0.94-v2.txt, 11143-0.94.txt, 11143-trunk.txt > > > We are trying to report on replication lag and find that there is no good > single metric to do that. > ageOfLastShippedOp is close, but unfortunately it is increased even when > there is nothing to ship on a particular RegionServer. > I would like discuss a few options here: > Add a new metric: replicationQueueTime (or something) with the above meaning. > I.e. if we have something to ship we set the age of that last shipped edit, > if we fail we increment that last time (just like we do now). But if there is > nothing to replicate we set it to current time (and hence that metric is > reported to close to 0). > Alternatively we could change the meaning of ageOfLastShippedOp to mean to do > that. That might lead to surprises, but the current behavior is clearly weird > when there is nothing to replicate. > Comments? [~jdcryans], [~stack]. > If approach sounds good, I'll make a patch for all branches. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11143) ageOfLastShippedOp metric is confusing
[ https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995242#comment-13995242 ] Jean-Daniel Cryans commented on HBASE-11143: +1, and can we get the new metric in 0.96+? > ageOfLastShippedOp metric is confusing > -- > > Key: HBASE-11143 > URL: https://issues.apache.org/jira/browse/HBASE-11143 > Project: HBase > Issue Type: Bug > Components: Replication >Reporter: Lars Hofhansl >Assignee: Lars Hofhansl > Fix For: 0.94.20 > > Attachments: 11143-0.94-v2.txt, 11143-0.94.txt > > > We are trying to report on replication lag and find that there is no good > single metric to do that. > ageOfLastShippedOp is close, but unfortunately it is increased even when > there is nothing to ship on a particular RegionServer. > I would like to discuss a few options here: > Add a new metric: replicationQueueTime (or something) with the above meaning. > I.e. if we have something to ship we set the age of that last shipped edit, > if we fail we increment that last time (just like we do now). But if there is > nothing to replicate we set it to current time (and hence that metric is > reported as close to 0). > Alternatively we could change the meaning of ageOfLastShippedOp to mean to do > that. That might lead to surprises, but the current behavior is clearly weird > when there is nothing to replicate. > Comments? [~jdcryans], [~stack]. > If this approach sounds good, I'll make a patch for all branches. -- This message was sent by Atlassian JIRA (v6.2#6252)
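The behavior proposed in the description, where the metric reports the age of the last shipped edit while shipping but drops to about zero when there is nothing left to replicate, can be sketched as follows. Class and method names are illustrative, not the actual replication metrics code:

```java
// Sketch of the proposed lag metric: while shipping, report how old the
// newest shipped edit was; when the queue is empty, report ~0 instead of
// letting the age keep growing, which is what made the old metric confusing.
public class ReplicationLagSketch {
  private long ageOfLastShippedOp;

  // Called after successfully shipping a batch whose newest edit has this
  // timestamp (both values in milliseconds).
  public void shippedBatch(long lastEditTimestamp, long now) {
    ageOfLastShippedOp = now - lastEditTimestamp;
  }

  // Called when there is nothing to replicate on this RegionServer:
  // lag is effectively zero, not "age of whatever we shipped last".
  public void nothingToReplicate() {
    ageOfLastShippedOp = 0;
  }

  public long getAgeOfLastShippedOp() {
    return ageOfLastShippedOp;
  }
}
```

On shipping failure the sketch simply leaves the last value in place to keep growing on the next update, mirroring the "if we fail we increment that last time" behavior described above.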
[jira] [Commented] (HBASE-10993) Deprioritize long-running scanners
[ https://issues.apache.org/jira/browse/HBASE-10993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13991179#comment-13991179 ] Jean-Daniel Cryans commented on HBASE-10993: I'm +1, there are a few things to fix in the comments but you can do that on commit (like "mantain"). BTW have you run this on a cluster where clients > handlers? > Deprioritize long-running scanners > -- > > Key: HBASE-10993 > URL: https://issues.apache.org/jira/browse/HBASE-10993 > Project: HBase > Issue Type: Sub-task >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Minor > Fix For: 0.99.0 > > Attachments: HBASE-10993-v0.patch, HBASE-10993-v1.patch, > HBASE-10993-v2.patch, HBASE-10993-v3.patch, HBASE-10993-v4.patch, > HBASE-10993-v4.patch > > > Currently we have a single call queue that serves all the "normal user" > requests, and the requests are executed in FIFO. > When running map-reduce jobs and user-queries on the same machine, we want to > prioritize the user-queries. > Without changing too much code, and not having the user giving hints, we can > add a “vtime” field to the scanner, to keep track from how long is running. > And we can replace the callQueue with a priorityQueue. In this way we can > deprioritize long-running scans, the longer a scan request lives the less > priority it gets. -- This message was sent by Atlassian JIRA (v6.2#6252)
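The idea in the description, adding a "vtime" field tracking how long a scanner has been running and replacing the FIFO callQueue with a priority queue, can be sketched like this (a simplified standalone version, not the actual patch):

```java
import java.util.PriorityQueue;

// Sketch of deprioritizing long-running scanners: each call carries the
// accumulated vtime of its scanner, and a PriorityQueue orders calls so
// that the longer a scan has been running, the later it is served.
public class VtimeQueueSketch {
  public static class Call {
    public final String name;
    public final long scannerVtime; // how long this call's scanner has run

    public Call(String name, long scannerVtime) {
      this.name = name;
      this.scannerVtime = scannerVtime;
    }
  }

  // Lower vtime == higher priority; this replaces the FIFO callQueue.
  private final PriorityQueue<Call> callQueue =
      new PriorityQueue<>((a, b) -> Long.compare(a.scannerVtime, b.scannerVtime));

  public void add(Call c) { callQueue.add(c); }
  public Call poll() { return callQueue.poll(); }
}
```

A fresh user query (vtime near zero) thus jumps ahead of a map-reduce scan that has been running for minutes, without the user having to give any hints.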
[jira] [Commented] (HBASE-11116) Allow ReplicationSink to write to tables when READONLY is set
[ https://issues.apache.org/jira/browse/HBASE-11116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990246#comment-13990246 ] Jean-Daniel Cryans commented on HBASE-11116: I wonder if you could sort of enable that right now with ACLs. Basically, only the user that runs hbase can write to the tables on the slave cluster, everyone else is disallowed. > Allow ReplicationSink to write to tables when READONLY is set > - > > Key: HBASE-11116 > URL: https://issues.apache.org/jira/browse/HBASE-11116 > Project: HBase > Issue Type: New Feature > Components: regionserver, Replication >Reporter: Geoffrey Anderson > > A common practice with MySQL replication is to enable the read_only flag > (http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html#sysvar_read_only) > on slaves in order to disallow any user changes to a slave, while allowing > replication from the master to flow in. In HBase, enabling the READONLY > field on a table will cause ReplicationSink to fail to apply changes. It'd > be ideal to have the default behavior allow replication to continue going > through, a configurable option to influence this type of behavior, or perhaps > even a flag that manipulates a peer's state (e.g. via add_peer) that dictates > if replication can write to READONLY tables. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11008) Align bulk load, flush, and compact to require Action.CREATE
[ https://issues.apache.org/jira/browse/HBASE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-11008: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) And committed to 0.94, thanks everyone. > Align bulk load, flush, and compact to require Action.CREATE > > > Key: HBASE-11008 > URL: https://issues.apache.org/jira/browse/HBASE-11008 > Project: HBase > Issue Type: Improvement > Components: security >Reporter: Jean-Daniel Cryans >Assignee: Jean-Daniel Cryans > Fix For: 0.99.0, 0.98.2, 0.96.3, 0.94.20 > > Attachments: HBASE-11008-0.94.patch, HBASE-11008-v2.patch, > HBASE-11008-v3.patch, HBASE-11008.patch > > > Over in HBASE-10958 we noticed that it might make sense to require > Action.CREATE for bulk load, flush, and compact since it is also required for > things like enable and disable. > This means the following changes: > - preBulkLoadHFile goes from WRITE to CREATE > - compact/flush go from ADMIN to ADMIN or CREATE -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed
[ https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-10958: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Now committed to 0.94 (took some time because I wanted to double check the tests I modified were all green). Thanks everyone. > [dataloss] Bulk loading with seqids can prevent some log entries from being > replayed > > > Key: HBASE-10958 > URL: https://issues.apache.org/jira/browse/HBASE-10958 > Project: HBase > Issue Type: Bug >Affects Versions: 0.96.2, 0.98.1, 0.94.18 >Reporter: Jean-Daniel Cryans >Assignee: Jean-Daniel Cryans >Priority: Blocker > Fix For: 0.99.0, 0.98.2, 0.96.3, 0.94.20 > > Attachments: HBASE-10958-0.94.patch, > HBASE-10958-less-intrusive-hack-0.96.patch, > HBASE-10958-quick-hack-0.96.patch, HBASE-10958-v2.patch, > HBASE-10958-v3.patch, HBASE-10958.patch > > > We found an issue with bulk loads causing data loss when assigning sequence > ids (HBASE-6630) that is triggered when replaying recovered edits. We're > nicknaming this issue *Blindspot*. > The problem is that the sequence id given to a bulk loaded file is higher > than those of the edits in the region's memstore. When replaying recovered > edits, the rule to skip some of them is that they have to be _lower than the > highest sequence id_. In other words, the edits that have a sequence id lower > than the highest one in the store files *should* have also been flushed. This > is not the case with bulk loaded files since we now have an HFile with a > sequence id higher than unflushed edits. > The log recovery code takes this into account by simply skipping the bulk > loaded files, but this "bulk loaded status" is *lost* on compaction. The > edits in the logs that have a sequence id lower than the bulk loaded file > that got compacted are put in a blind spot and are skipped during replay. 
> Here's the easiest way to recreate this issue: > - Create an empty table > - Put one row in it (let's say it gets seqid 1) > - Bulk load one file (it gets seqid 2). I used ImportTsv and set > hbase.mapreduce.bulkload.assign.sequenceNumbers. > - Bulk load a second file the same way (it gets seqid 3). > - Major compact the table (the new file has seqid 3 and isn't considered > bulk loaded). > - Kill the region server that holds the table's region. > - Scan the table once the region is made available again. The first row, at > seqid 1, will be missing since the HFile with seqid 3 makes us believe that > everything that came before it was flushed. -- This message was sent by Atlassian JIRA (v6.2#6252)
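The replay rule quoted above can be reduced to a toy simulation, using the seqid values from the repro steps (illustrative only, not HBase code):

```java
import java.util.ArrayList;
import java.util.List;

public class BlindspotSketch {
    public static void main(String[] args) {
        // The WAL still holds the unflushed put with seqid 1.
        List<Integer> walEdits = List.of(1);
        // After major compaction, the bulk-loaded file's seqid 3 is an
        // ordinary store-file seqid; its "bulk loaded" flag is gone.
        int maxStoreFileSeqId = 3;
        // Replay rule: an edit is skipped when its seqid is <= the
        // highest store-file seqid, on the assumption it was flushed.
        List<Integer> replayed = new ArrayList<>();
        for (int seqId : walEdits) {
            if (seqId > maxStoreFileSeqId) {
                replayed.add(seqId);
            }
        }
        // The put at seqid 1 falls in the blind spot and is lost.
        System.out.println("replayed=" + replayed);
    }
}
```

Nothing gets replayed: the high bulk-load seqid makes the recovery code believe everything before it was flushed.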
[jira] [Commented] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed
[ https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986053#comment-13986053 ] Jean-Daniel Cryans commented on HBASE-10958: Sorry, went to do something else, lemme get that in (with HBASE-11008 first). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed
[ https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-10958: --- Release Note: Bulk loading with sequence IDs, an option in late 0.94 releases and the default since 0.96.0, will now trigger a flush per region that loads an HFile (if there's data that needs to be flushed). Committed to 0.96 and up. Like for HBASE-11008, I'm waiting to commit to 0.94, or I can open a backport jira. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11008) Align bulk load, flush, and compact to require Action.CREATE
[ https://issues.apache.org/jira/browse/HBASE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-11008: --- Release Note: preBulkLoadHFile now requires CREATE, which it effectively already needed, since LoadIncrementalHFiles calls getTableDescriptor (itself requiring CREATE) before bulk loading. compact and flush can now be issued by users with CREATE permission. I committed to 0.96 and up. I'm waiting on [~lhofhansl]'s +1 for 0.94 (or I can create a backport jira). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Resolved] (HBASE-10932) Improve RowCounter to allow mapper number set/control
[ https://issues.apache.org/jira/browse/HBASE-10932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans resolved HBASE-10932. Resolution: Won't Fix Resolving as won't fix. If you want to work on a more general solution, like adding this option to the TIF, please open a new jira. Thanks. > Improve RowCounter to allow mapper number set/control > - > > Key: HBASE-10932 > URL: https://issues.apache.org/jira/browse/HBASE-10932 > Project: HBase > Issue Type: Improvement > Components: mapreduce >Reporter: Yu Li >Assignee: Yu Li >Priority: Minor > Attachments: HBASE-10932_v1.patch, HBASE-10932_v2.patch > > > The typical use case of RowCounter is to do some kind of data integrity > checking, like after exporting some data from RDBMS to HBase, or from one > HBase cluster to another, making sure the row(record) number matches. Such > check commonly won't require much on response time. > Meanwhile, based on current impl, RowCounter will launch one mapper per > region, and each mapper will send one scan request. Assuming the table is > kind of big like having tens of regions, and the cpu core number of the whole > MR cluster is also enough, the parallel scan requests sent by mapper would be > a real burden for the HBase cluster. > So in this JIRA, we're proposing to make rowcounter support an additional > option "--maps" to specify mapper number, and make each mapper able to scan > more than one region of the target table. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10932) Improve RowCounter to allow mapper number set/control
[ https://issues.apache.org/jira/browse/HBASE-10932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981271#comment-13981271 ] Jean-Daniel Cryans commented on HBASE-10932: bq. But I didn't quite catch the point of job scheduler, in my understanding job scheduler is cluster-level and cannot be configured per-job, right? Well, by using a scheduler, you can constrain certain types of jobs so that they don't run as fast as they can. For example, with the fair scheduler you can configure a pool (let's call it the "slow pool") to have only {{maxMaps}} running concurrently on the cluster. Then, when you run your {{RowCounter}} jobs and whatnot, you can tie them automatically to the slow pool. Hadoop cluster operators usually know how to use a scheduler, whereas having to rely on the person who runs the jobs to configure them correctly can lead to human errors like "oops I forgot to pass the maps configuration to my row counter and now the website is down". It also works well if you have two users who want to concurrently run a row counter; they'll both get in the slow pool and only two mappers will run (alternating between the two jobs, unless you set different weights because one user is more important than the other, etc etc). If you were to rely on individual users specifying the correct number of maps, and they both set their job to use two, then you'd have four mappers running. Back to square one. Anyways, all of this to say that there's a more generic way of doing this, and it already exists. Can we close this jira, [~carp84]? 
-- This message was sent by Atlassian JIRA (v6.2#6252)
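The "slow pool" described in the comment above could be expressed with a fair scheduler allocation file along these lines (pool name and caps are illustrative; the element names follow the MR1 Fair Scheduler's allocation format):

```xml
<!-- fair-scheduler allocation file (illustrative sketch). Jobs
     submitted with mapred.fairscheduler.pool=slow share at most
     two concurrent map slots cluster-wide, regardless of how many
     RowCounter/VerifyReplication/Export jobs are running. -->
<allocations>
  <pool name="slow">
    <maxMaps>2</maxMaps>
    <maxReduces>1</maxReduces>
  </pool>
</allocations>
```

This is the operator-side equivalent of the proposed `--maps` option: the cap is enforced centrally, so two users running row counters concurrently still only get two mappers between them.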
[jira] [Updated] (HBASE-10981) taskmonitor use subList may cause recursion and get a java.lang.StackOverflowError
[ https://issues.apache.org/jira/browse/HBASE-10981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-10981: --- Resolution: Duplicate Status: Resolved (was: Patch Available) Yup, resolving as dup of HBASE-10312. > taskmonitor use subList may cause recursion and get a > java.lang.StackOverflowError > -- > > Key: HBASE-10981 > URL: https://issues.apache.org/jira/browse/HBASE-10981 > Project: HBase > Issue Type: Bug >Affects Versions: 0.98.0, 0.96.0, 0.96.2, 0.98.1, 0.96.1.1, 0.98.2 > Environment: java version "1.7.0_03" 64bit >Reporter: sinfox >Priority: Critical > Attachments: fixsubListException.patch > > Original Estimate: 5m > Remaining Estimate: 5m > > 2014-04-14 11:52:45,905 ERROR [RS_CLOSE_REGION-in16-062:60020-1] > executor.EventHandler: Caught throwable while processing event > M_RS_CLOSE_REGION > java.lang.RuntimeException: java.lang.StackOverflowError > at > org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:161) > at > org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128) > at > java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110) > at > java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603) > at java.lang.Thread.run(Thread.java:722) > Caused by: java.lang.StackOverflowError > at java.util.ArrayList$SubList.add(ArrayList.java:965) > at java.util.ArrayList$SubList.add(ArrayList.java:965) -- This message was sent by Atlassian JIRA (v6.2#6252)
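The overflow comes from re-assigning a subList view back onto the same variable: each call wraps another SubList, and on the JDK 7 builds in the report, SubList.add() delegates recursively through every wrapper. A minimal sketch of the anti-pattern and the copy-based fix (class and loop count are illustrative):

```java
import java.util.ArrayList;
import java.util.List;

public class SubListNesting {
    public static void main(String[] args) {
        List<Integer> tasks = new ArrayList<>();
        tasks.add(0);
        // Anti-pattern: re-assigning the subList view onto the same
        // variable wraps a fresh SubList around the old one each time,
        // so the view chain grows on every call.
        for (int i = 0; i < 200_000; i++) {
            tasks = tasks.subList(0, tasks.size());
        }
        // On JDK 7/8, SubList.add() delegates to its parent view, so a
        // mutation at this point would recurse through all 200k
        // wrappers and die with the StackOverflowError in the report.
        // Fix: copy the range into a fresh ArrayList so no view chain
        // is retained (the issue was resolved as a dup of HBASE-10312).
        tasks = new ArrayList<>(tasks);
        tasks.add(1);
        System.out.println(tasks);
    }
}
```

Iterating a nested view is cheap (the iterator reads the backing array directly), so the copy is the safe way to "trim" a list in place.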
[jira] [Commented] (HBASE-3489) .oldlogs not being cleaned out
[ https://issues.apache.org/jira/browse/HBASE-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977835#comment-13977835 ] Jean-Daniel Cryans commented on HBASE-3489: --- We removed the kill switch, start/stop_replication don't exist anymore. > .oldlogs not being cleaned out > -- > > Key: HBASE-3489 > URL: https://issues.apache.org/jira/browse/HBASE-3489 > Project: HBase > Issue Type: Bug >Affects Versions: 0.90.0 > Environment: 10 Nodes Write Heavy Cluster >Reporter: Wayne > Attachments: oldlog.txt > > > The .oldlogs folder is never being cleaned up. The > hbase.master.logcleaner.ttl has been set to clean up the old logs but the > clean up is never kicking in. The limit of 10 files is not the problem. After > running for 5 days not a single log file has ever been deleted and the > logcleaner is set to 2 days (from the default of 7 days). It is assumed that > the replication changes that want to be sure to keep these logs around if > needed have caused the cleanup to be blocked. There is no replication defined > (knowingly). > -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11008) Align bulk load, flush, and compact to require Action.CREATE
[ https://issues.apache.org/jira/browse/HBASE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977612#comment-13977612 ] Jean-Daniel Cryans commented on HBASE-11008: Thanks [~apurtell]. [~stack], [~lhofhansl], you guys ok for 0.96 and 0.94 respectively? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11008) Align bulk load, flush, and compact to require Action.CREATE
[ https://issues.apache.org/jira/browse/HBASE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13977605#comment-13977605 ] Jean-Daniel Cryans commented on HBASE-11008: Gently prodding for reviews and +1s; if we commit this, we can get HBASE-10958 in. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11008) Align bulk load, flush, and compact to require Action.CREATE
[ https://issues.apache.org/jira/browse/HBASE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-11008: --- Attachment: HBASE-11008-0.94.patch Attaching the patch for 0.94. I have to force using sequence ids else it won't exercise the flush that's coming in HBASE-10958. It doesn't change the semantics of the test itself. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11043) Users with table's read/write permission can't get table's description
[ https://issues.apache.org/jira/browse/HBASE-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13976108#comment-13976108 ] Jean-Daniel Cryans commented on HBASE-11043: It's this way because of HBASE-8692. > Users with table's read/write permission can't get table's description > -- > > Key: HBASE-11043 > URL: https://issues.apache.org/jira/browse/HBASE-11043 > Project: HBase > Issue Type: Bug > Components: security >Affects Versions: 0.99.0 >Reporter: Liu Shaohui >Assignee: Liu Shaohui >Priority: Minor > Attachments: HBASE-11043-trunk-v1.diff > > > AccessController#preGetTableDescriptors only allow users with admin or create > permission to get table's description. > {quote} > requirePermission("getTableDescriptors", nameAsBytes, null, null, > Permission.Action.ADMIN, Permission.Action.CREATE); > {quote} > I think Users with table's read/write permission should also be able to get > table's description. > Eg: when create a hive table on HBase, hive will get the table description > to check if the mapping is right. Usually the hive users only have the read > permission of table. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10932) Improve RowCounter to allow mapper number set/control
[ https://issues.apache.org/jira/browse/HBASE-10932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13974306#comment-13974306 ] Jean-Daniel Cryans commented on HBASE-10932: Hey [~carp84], I forgot about this issue, let me address your latest replies. bq. I thought it's designed for purpose to make each mapper just scan one single region That's more an implementation detail than a design, and we can further improve the implementation by giving more control to the power users. bq. This is useful especially in multi-tenant env, when we need to check data integrity for one user after data importing meanwhile don't want the scan burden to slow down RT of other users' request. Right, but again, resource management is a broader issue. I doubt that RowCounter is the only job that needs to be throttled, what about VerifyReplication? Or Export? Those jobs usually aren't latency sensitive and can run in the background. This can be simply handled by a correctly configured job scheduler, that's what they do. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11008) Align bulk load, flush, and compact to require Action.CREATE
[ https://issues.apache.org/jira/browse/HBASE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-11008: --- Attachment: HBASE-11008-v3.patch New patch that fixes the documentation for flush and compact. This is good to go for 0.96 and up I think. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed
[ https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973789#comment-13973789 ] Jean-Daniel Cryans commented on HBASE-10958: bq. method = name.getMethodName(); Will fix (looks like I copied that from one of the few methods in that class that doesn't do it). -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11008) Align bulk load, flush, and compact to require Action.CREATE
[ https://issues.apache.org/jira/browse/HBASE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973653#comment-13973653 ] Jean-Daniel Cryans commented on HBASE-11008: I can, but I'm not sure where to put them. That table doesn't have the concept of operations having multiple perms. Flush and compact are now both ADMIN and CREATE. I can put them in ADMIN until we find a better representation. Sounds good? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed
[ https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-10958: --- Attachment: HBASE-10958-0.94.patch HBASE-10958-v3.patch Attaching 2 patches. One is a backport for 0.94. While doing the backport I saw that TestSnapshotFromMaster was failing, and Matteo was able to see that it was an error in my patch: flushing always returned that it needed compaction. I need to rerun all the tests now, but it was the only one that failed (haven't tried with security either). I also added a test in TestHRegion for that. The second patch is for trunk, in which I ported the same test to TestHRegion. Interestingly, it didn't work. I found that in 0.94 we compact if num_files > compactionThreshold, but in trunk it's >=, so it seems that we compact more often now. This patch also has the fixes from [~stack]'s comments. -- This message was sent by Atlassian JIRA (v6.2#6252)
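The off-by-one difference noted in the comment above (0.94 compacts on `>`, trunk on `>=`) as a two-line sketch, with a hypothetical threshold value:

```java
public class CompactionTrigger {
    public static void main(String[] args) {
        // Hypothetical value of hbase.hstore.compactionThreshold.
        int compactionThreshold = 3;
        int numFiles = 3; // store has exactly threshold-many files
        boolean compacts094 = numFiles > compactionThreshold;    // 0.94 rule
        boolean compactsTrunk = numFiles >= compactionThreshold; // trunk rule
        System.out.println(compacts094 + " " + compactsTrunk);
    }
}
```

At exactly threshold-many files, 0.94 waits while trunk compacts, which is why the ported test behaved differently.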
[jira] [Updated] (HBASE-11008) Align bulk load, flush, and compact to require Action.CREATE
[ https://issues.apache.org/jira/browse/HBASE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-11008: --- Attachment: HBASE-11008-v2.patch Patch with the modified documentation. FWIW I think that chapter needs a lot more love but it's out of scope. > Align bulk load, flush, and compact to require Action.CREATE > > > Key: HBASE-11008 > URL: https://issues.apache.org/jira/browse/HBASE-11008 > Project: HBase > Issue Type: Improvement > Components: security >Reporter: Jean-Daniel Cryans >Assignee: Jean-Daniel Cryans > Fix For: 0.99.0, 0.98.2, 0.96.3, 0.94.20 > > Attachments: HBASE-11008-v2.patch, HBASE-11008.patch > > > Over in HBASE-10958 we noticed that it might make sense to require > Action.CREATE for bulk load, flush, and compact since it is also required for > things like enable and disable. > This means the following changes: > - preBulkLoadHFile goes from WRITE to CREATE > - compact/flush go from ADMIN to ADMIN or CREATE -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11011) Avoid extra getFileStatus() calls on Region startup
[ https://issues.apache.org/jira/browse/HBASE-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973470#comment-13973470 ] Jean-Daniel Cryans commented on HBASE-11011: Alright, +1 > Avoid extra getFileStatus() calls on Region startup > --- > > Key: HBASE-11011 > URL: https://issues.apache.org/jira/browse/HBASE-11011 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.96.2, 0.98.1, 1.0.0 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Minor > Fix For: 1.0.0, 0.98.2, 0.96.3 > > Attachments: HBASE-11011-v0.patch, HBASE-11011-v1.patch, > HBASE-11011-v2.patch > > > On load we already have a StoreFileInfo and we create it from the path, > this will result in an extra fs.getFileStatus() call. > In completeCompactionMarker() we do a fs.exists() and later a > fs.getFileStatus() > to create the StoreFileInfo, we can avoid the exists. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11011) Avoid extra getFileStatus() calls on Region startup
[ https://issues.apache.org/jira/browse/HBASE-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973419#comment-13973419 ] Jean-Daniel Cryans commented on HBASE-11011: Does the changed code in completeCompactionMarker require a unit test? Or is there already one? Also fix those lines: {quote} +// If we scan the directory and the file is not present, may means: +// so, we can't do anything with the "compaction output list" since or is +// already loaded on startup, because in the store folder, or it may be not {quote} > Avoid extra getFileStatus() calls on Region startup > --- > > Key: HBASE-11011 > URL: https://issues.apache.org/jira/browse/HBASE-11011 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.96.2, 0.98.1, 1.0.0 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Minor > Fix For: 1.0.0, 0.98.2, 0.96.3 > > Attachments: HBASE-11011-v0.patch, HBASE-11011-v1.patch > > > On load we already have a StoreFileInfo and we create it from the path, > this will result in an extra fs.getFileStatus() call. > In completeCompactionMarker() we do a fs.exists() and later a > fs.getFileStatus() > to create the StoreFileInfo, we can avoid the exists. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11011) Avoid extra getFileStatus() calls on Region startup
[ https://issues.apache.org/jira/browse/HBASE-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973196#comment-13973196 ] Jean-Daniel Cryans commented on HBASE-11011: bq. In this case is not bad, is an "expected" situation, where the RS died before moving the compacted file. This might be, but the log message doesn't convey any of that. Taken out of context (so most users reading our logs), it just says a file is missing. bq. since is trying to lookup files in /table/region/family/ and if they are there are already loaded for sure. I'm trusting you on this one :) > Avoid extra getFileStatus() calls on Region startup > --- > > Key: HBASE-11011 > URL: https://issues.apache.org/jira/browse/HBASE-11011 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.96.2, 0.98.1, 1.0.0 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Minor > Fix For: 1.0.0, 0.98.2, 0.96.3 > > Attachments: HBASE-11011-v0.patch > > > On load we already have a StoreFileInfo and we create it from the path, > this will result in an extra fs.getFileStatus() call. > In completeCompactionMarker() we do a fs.exists() and later a > fs.getFileStatus() > to create the StoreFileInfo, we can avoid the exists. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11011) Avoid extra getFileStatus() calls on Region startup
[ https://issues.apache.org/jira/browse/HBASE-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13973168#comment-13973168 ] Jean-Daniel Cryans commented on HBASE-11011: What should one do when seeing this message? {code} + LOG.debug("compaction file missing: " + outputPath); {code} The fact that a file is missing seems pretty bad, yet it's at DEBUG. > Avoid extra getFileStatus() calls on Region startup > --- > > Key: HBASE-11011 > URL: https://issues.apache.org/jira/browse/HBASE-11011 > Project: HBase > Issue Type: Bug > Components: regionserver >Affects Versions: 0.96.2, 0.98.1, 1.0.0 >Reporter: Matteo Bertozzi >Assignee: Matteo Bertozzi >Priority: Minor > Fix For: 1.0.0, 0.98.2, 0.96.3 > > Attachments: HBASE-11011-v0.patch > > > On load we already have a StoreFileInfo and we create it from the path, > this will result in an extra fs.getFileStatus() call. > In completeCompactionMarker() we do a fs.exists() and later a > fs.getFileStatus() > to create the StoreFileInfo, we can avoid the exists. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-11008) Align bulk load, flush, and compact to require Action.CREATE
[ https://issues.apache.org/jira/browse/HBASE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972130#comment-13972130 ] Jean-Daniel Cryans commented on HBASE-11008: bq. If we choose to make it more restrictive, The problem, as stated in HBASE-10958, is that bulk loading requires WRITE but effectively by going through {{LoadIncrementalHFiles}} it already requires CREATE. > Align bulk load, flush, and compact to require Action.CREATE > > > Key: HBASE-11008 > URL: https://issues.apache.org/jira/browse/HBASE-11008 > Project: HBase > Issue Type: Improvement > Components: security >Reporter: Jean-Daniel Cryans >Assignee: Jean-Daniel Cryans > Fix For: 0.99.0, 0.98.2, 0.96.3, 0.94.20 > > Attachments: HBASE-11008.patch > > > Over in HBASE-10958 we noticed that it might make sense to require > Action.CREATE for bulk load, flush, and compact since it is also required for > things like enable and disable. > This means the following changes: > - preBulkLoadHFile goes from WRITE to CREATE > - compact/flush go from ADMIN to ADMIN or CREATE -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed
[ https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13972111#comment-13972111 ] Jean-Daniel Cryans commented on HBASE-10958: Forgot to reply to [~stack]'s comment: bq. flushSucceeded method (should it be isFlushSucceeded) and then you go get the result by accessing the data member directly. Minor inconsistency. flushSucceeded() isn't just looking up a field though, it's checking two things. > [dataloss] Bulk loading with seqids can prevent some log entries from being > replayed > > > Key: HBASE-10958 > URL: https://issues.apache.org/jira/browse/HBASE-10958 > Project: HBase > Issue Type: Bug >Affects Versions: 0.96.2, 0.98.1, 0.94.18 >Reporter: Jean-Daniel Cryans >Assignee: Jean-Daniel Cryans >Priority: Blocker > Fix For: 0.99.0, 0.94.19, 0.98.2, 0.96.3 > > Attachments: HBASE-10958-less-intrusive-hack-0.96.patch, > HBASE-10958-quick-hack-0.96.patch, HBASE-10958-v2.patch, HBASE-10958.patch > > > We found an issue with bulk loads causing data loss when assigning sequence > ids (HBASE-6630) that is triggered when replaying recovered edits. We're > nicknaming this issue *Blindspot*. > The problem is that the sequence id given to a bulk loaded file is higher > than those of the edits in the region's memstore. When replaying recovered > edits, the rule to skip some of them is that they have to be _lower than the > highest sequence id_. In other words, the edits that have a sequence id lower > than the highest one in the store files *should* have also been flushed. This > is not the case with bulk loaded files since we now have an HFile with a > sequence id higher than unflushed edits. > The log recovery code takes this into account by simply skipping the bulk > loaded files, but this "bulk loaded status" is *lost* on compaction. The > edits in the logs that have a sequence id lower than the bulk loaded file > that got compacted are put in a blind spot and are skipped during replay. 
> Here's the easiest way to recreate this issue: > - Create an empty table > - Put one row in it (let's say it gets seqid 1) > - Bulk load one file (it gets seqid 2). I used ImporTsv and set > hbase.mapreduce.bulkload.assign.sequenceNumbers. > - Bulk load a second file the same way (it gets seqid 3). > - Major compact the table (the new file has seqid 3 and isn't considered > bulk loaded). > - Kill the region server that holds the table's region. > - Scan the table once the region is made available again. The first row, at > seqid 1, will be missing since the HFile with seqid 3 makes us believe that > everything that came before it was flushed. -- This message was sent by Atlassian JIRA (v6.2#6252)
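The point about flushSucceeded() not being a plain accessor can be seen in a stripped-down sketch. The enum values are modeled loosely on the patch's flush-result type and should be treated as illustrative, not as the exact committed code:

```java
public class FlushResult {
    enum Result { FLUSHED_NO_COMPACTION_NEEDED, FLUSHED_COMPACTION_NEEDED, CANNOT_FLUSH }

    final Result result;

    FlushResult(Result result) { this.result = result; }

    // Not a simple field lookup: success covers two of the three outcomes,
    // which is why the is-prefix getter naming convention doesn't quite fit.
    boolean flushSucceeded() {
        return result == Result.FLUSHED_NO_COMPACTION_NEEDED
            || result == Result.FLUSHED_COMPACTION_NEEDED;
    }

    public static void main(String[] args) {
        System.out.println(new FlushResult(Result.FLUSHED_COMPACTION_NEEDED).flushSucceeded()); // prints true
        System.out.println(new FlushResult(Result.CANNOT_FLUSH).flushSucceeded());              // prints false
    }
}
```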
[jira] [Updated] (HBASE-11008) Align bulk load, flush, and compact to require Action.CREATE
[ https://issues.apache.org/jira/browse/HBASE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-11008: --- Attachment: HBASE-11008.patch Attaching a patch for trunk. Not planning on committing until we get more consensus on this issue. I do think that this is standalone from HBASE-10958 though, in the sense that these changes make sense to me on their own. > Align bulk load, flush, and compact to require Action.CREATE > > > Key: HBASE-11008 > URL: https://issues.apache.org/jira/browse/HBASE-11008 > Project: HBase > Issue Type: Improvement > Components: security >Reporter: Jean-Daniel Cryans >Assignee: Jean-Daniel Cryans > Fix For: 0.99.0, 0.98.2, 0.96.3, 0.94.20 > > Attachments: HBASE-11008.patch > > > Over in HBASE-10958 we noticed that it might make sense to require > Action.CREATE for bulk load, flush, and compact since it is also required for > things like enable and disable. > This means the following changes: > - preBulkLoadHFile goes from WRITE to CREATE > - compact/flush go from ADMIN to ADMIN or CREATE -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11008) Align bulk load, flush, and compact to require Action.CREATE
[ https://issues.apache.org/jira/browse/HBASE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-11008: --- Status: Patch Available (was: Open) > Align bulk load, flush, and compact to require Action.CREATE > > > Key: HBASE-11008 > URL: https://issues.apache.org/jira/browse/HBASE-11008 > Project: HBase > Issue Type: Improvement > Components: security >Reporter: Jean-Daniel Cryans >Assignee: Jean-Daniel Cryans > Fix For: 0.99.0, 0.98.2, 0.96.3, 0.94.20 > > Attachments: HBASE-11008.patch > > > Over in HBASE-10958 we noticed that it might make sense to require > Action.CREATE for bulk load, flush, and compact since it is also required for > things like enable and disable. > This means the following changes: > - preBulkLoadHFile goes from WRITE to CREATE > - compact/flush go from ADMIN to ADMIN or CREATE -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11008) Align bulk load, flush, and compact to require Action.CREATE
Jean-Daniel Cryans created HBASE-11008: -- Summary: Align bulk load, flush, and compact to require Action.CREATE Key: HBASE-11008 URL: https://issues.apache.org/jira/browse/HBASE-11008 Project: HBase Issue Type: Improvement Components: security Reporter: Jean-Daniel Cryans Assignee: Jean-Daniel Cryans Fix For: 0.99.0, 0.98.2, 0.96.3, 0.94.20 Over in HBASE-10958 we noticed that it might make sense to require Action.CREATE for bulk load, flush, and compact since it is also required for things like enable and disable. This means the following changes: - preBulkLoadHFile goes from WRITE to CREATE - compact/flush go from ADMIN to ADMIN or CREATE -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed
[ https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971998#comment-13971998 ] Jean-Daniel Cryans commented on HBASE-10958: bq. The requirement on 'CREATE' for bulk load seems to come from here. Is this even intended? Thanks for doing this. There's also another place where this is called, splitStoreFile(), in order to get an HCD. I don't think it's necessary to call it twice, but it seems necessary to call it at least once else you can't: - verify that the families exist - get the schema for each family so that we create the HFiles with the correct configurations when splitting > [dataloss] Bulk loading with seqids can prevent some log entries from being > replayed > > > Key: HBASE-10958 > URL: https://issues.apache.org/jira/browse/HBASE-10958 > Project: HBase > Issue Type: Bug >Affects Versions: 0.96.2, 0.98.1, 0.94.18 >Reporter: Jean-Daniel Cryans >Assignee: Jean-Daniel Cryans >Priority: Blocker > Fix For: 0.99.0, 0.94.19, 0.98.2, 0.96.3 > > Attachments: HBASE-10958-less-intrusive-hack-0.96.patch, > HBASE-10958-quick-hack-0.96.patch, HBASE-10958-v2.patch, HBASE-10958.patch > > > We found an issue with bulk loads causing data loss when assigning sequence > ids (HBASE-6630) that is triggered when replaying recovered edits. We're > nicknaming this issue *Blindspot*. > The problem is that the sequence id given to a bulk loaded file is higher > than those of the edits in the region's memstore. When replaying recovered > edits, the rule to skip some of them is that they have to be _lower than the > highest sequence id_. In other words, the edits that have a sequence id lower > than the highest one in the store files *should* have also been flushed. This > is not the case with bulk loaded files since we now have an HFile with a > sequence id higher than unflushed edits. 
> The log recovery code takes this into account by simply skipping the bulk > loaded files, but this "bulk loaded status" is *lost* on compaction. The > edits in the logs that have a sequence id lower than the bulk loaded file > that got compacted are put in a blind spot and are skipped during replay. > Here's the easiest way to recreate this issue: > - Create an empty table > - Put one row in it (let's say it gets seqid 1) > - Bulk load one file (it gets seqid 2). I used ImporTsv and set > hbase.mapreduce.bulkload.assign.sequenceNumbers. > - Bulk load a second file the same way (it gets seqid 3). > - Major compact the table (the new file has seqid 3 and isn't considered > bulk loaded). > - Kill the region server that holds the table's region. > - Scan the table once the region is made available again. The first row, at > seqid 1, will be missing since the HFile with seqid 3 makes us believe that > everything that came before it was flushed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10871) Indefinite OPEN/CLOSE wait on busy RegionServers
[ https://issues.apache.org/jira/browse/HBASE-10871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971945#comment-13971945 ] Jean-Daniel Cryans commented on HBASE-10871: bq. Perhaps we should add some delay here. Well we drop the region on the floor, and the delay right now is the RIT timeout (30 minutes in 0.94, 10 minutes after). We don't do tight retries. > Indefinite OPEN/CLOSE wait on busy RegionServers > > > Key: HBASE-10871 > URL: https://issues.apache.org/jira/browse/HBASE-10871 > Project: HBase > Issue Type: Improvement > Components: Balancer, master, Region Assignment >Affects Versions: 0.94.6 >Reporter: Harsh J > > We observed a case where, when a specific RS got bombarded by a large amount > of regular requests, spiking and filling up its RPC queue, the balancer's > invoked unassigns and assigns for regions that dealt with this server entered > into an indefinite retry loop. > The regions specifically began waiting in PENDING_CLOSE/PENDING_OPEN states > indefinitely cause of the HBase Client RPC from the ServerManager at the > master was running into SocketTimeouts. This caused a region unavailability > in the server for the affected regions. The timeout monitor retry default of > 30m in 0.94's AM compounded the waiting gap further a bit more (this is now > 10m in 0.95+'s new AM, and has further retries before we get there, which is > good). > Wonder if there's a way to improve this situation generally. PENDING_OPENs > may be easy to handle - we can switch them out and move them elsewhere. > PENDING_CLOSEs may be a bit more tricky, but there must perhaps at least be a > way to "give up" permanently on a movement plan, and letting things be for a > while hoping for the RS to recover itself on its own (such that clients also > have a chance of getting things to work in the meantime)? -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed
[ https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971753#comment-13971753 ] Jean-Daniel Cryans commented on HBASE-10958: bq. I can see that folks would not want to grant users CREATE (or ADMIN) so that they cannot create/drop/enable/disable tables, but still do allow them to load data via bulk load. That would now no longer possible. Bulk load already needs CREATE as shown above in TestAccessController. bq. Have we dismissed this option? I don't think so, but no one has been pushing for it. It creates a precedent, nowhere else in the code do we use User.runAs to override the current user that came in via a RPC (I'm not even sure if it works, but that's because I don't know that code very well). > [dataloss] Bulk loading with seqids can prevent some log entries from being > replayed > > > Key: HBASE-10958 > URL: https://issues.apache.org/jira/browse/HBASE-10958 > Project: HBase > Issue Type: Bug >Affects Versions: 0.96.2, 0.98.1, 0.94.18 >Reporter: Jean-Daniel Cryans >Assignee: Jean-Daniel Cryans >Priority: Blocker > Fix For: 0.99.0, 0.94.19, 0.98.2, 0.96.3 > > Attachments: HBASE-10958-less-intrusive-hack-0.96.patch, > HBASE-10958-quick-hack-0.96.patch, HBASE-10958-v2.patch, HBASE-10958.patch > > > We found an issue with bulk loads causing data loss when assigning sequence > ids (HBASE-6630) that is triggered when replaying recovered edits. We're > nicknaming this issue *Blindspot*. > The problem is that the sequence id given to a bulk loaded file is higher > than those of the edits in the region's memstore. When replaying recovered > edits, the rule to skip some of them is that they have to be _lower than the > highest sequence id_. In other words, the edits that have a sequence id lower > than the highest one in the store files *should* have also been flushed. 
This > is not the case with bulk loaded files since we now have an HFile with a > sequence id higher than unflushed edits. > The log recovery code takes this into account by simply skipping the bulk > loaded files, but this "bulk loaded status" is *lost* on compaction. The > edits in the logs that have a sequence id lower than the bulk loaded file > that got compacted are put in a blind spot and are skipped during replay. > Here's the easiest way to recreate this issue: > - Create an empty table > - Put one row in it (let's say it gets seqid 1) > - Bulk load one file (it gets seqid 2). I used ImporTsv and set > hbase.mapreduce.bulkload.assign.sequenceNumbers. > - Bulk load a second file the same way (it gets seqid 3). > - Major compact the table (the new file has seqid 3 and isn't considered > bulk loaded). > - Kill the region server that holds the table's region. > - Scan the table once the region is made available again. The first row, at > seqid 1, will be missing since the HFile with seqid 3 makes us believe that > everything that came before it was flushed. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed
[ https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13971584#comment-13971584 ] Jean-Daniel Cryans commented on HBASE-10958: bq. we'd like ADMIN to be granted sparingly to admins or delegates +1 bq. IIRC enable and disable are ADMIN actions also, since disabling or enabling a 1 region table has consequences. This is what I see in the code in trunk: {code} requirePermission("preBulkLoadHFile", getTableDesc().getTableName(), el.getFirst(), null, Permission.Action.WRITE); requirePermission("enableTable", tableName, null, null, Action.ADMIN, Action.CREATE); requirePermission("disableTable", tableName, null, null, Action.ADMIN, Action.CREATE); requirePermission("compact", getTableName(e.getEnvironment()), null, null, Action.ADMIN); requirePermission("flush", getTableName(e.getEnvironment()), null, null, Action.ADMIN); {code} IMO flush should have lower or same perms as disableTable. So here's a list of changes I believe are needed: - preBulkLoadHFile goes from WRITE to CREATE (seems more in line with what's really needed to bulk load given the code I posted yesterday) - compact/flush go from ADMIN to ADMIN or CREATE This should not have an impact on the current users. If we can agree on the changes, I'll open a new jira that's going to be blocking this one. 
> [dataloss] Bulk loading with seqids can prevent some log entries from being > replayed > > > Key: HBASE-10958 > URL: https://issues.apache.org/jira/browse/HBASE-10958 > Project: HBase > Issue Type: Bug >Affects Versions: 0.96.2, 0.98.1, 0.94.18 >Reporter: Jean-Daniel Cryans >Assignee: Jean-Daniel Cryans >Priority: Blocker > Fix For: 0.99.0, 0.94.19, 0.98.2, 0.96.3 > > Attachments: HBASE-10958-less-intrusive-hack-0.96.patch, > HBASE-10958-quick-hack-0.96.patch, HBASE-10958-v2.patch, HBASE-10958.patch > > > We found an issue with bulk loads causing data loss when assigning sequence > ids (HBASE-6630) that is triggered when replaying recovered edits. We're > nicknaming this issue *Blindspot*. > The problem is that the sequence id given to a bulk loaded file is higher > than those of the edits in the region's memstore. When replaying recovered > edits, the rule to skip some of them is that they have to be _lower than the > highest sequence id_. In other words, the edits that have a sequence id lower > than the highest one in the store files *should* have also been flushed. This > is not the case with bulk loaded files since we now have an HFile with a > sequence id higher than unflushed edits. > The log recovery code takes this into account by simply skipping the bulk > loaded files, but this "bulk loaded status" is *lost* on compaction. The > edits in the logs that have a sequence id lower than the bulk loaded file > that got compacted are put in a blind spot and are skipped during replay. > Here's the easiest way to recreate this issue: > - Create an empty table > - Put one row in it (let's say it gets seqid 1) > - Bulk load one file (it gets seqid 2). I used ImporTsv and set > hbase.mapreduce.bulkload.assign.sequenceNumbers. > - Bulk load a second file the same way (it gets seqid 3). > - Major compact the table (the new file has seqid 3 and isn't considered > bulk loaded). > - Kill the region server that holds the table's region. 
> - Scan the table once the region is made available again. The first row, at > seqid 1, will be missing since the HFile with seqid 3 makes us believe that > everything that came before it was flushed. -- This message was sent by Atlassian JIRA (v6.2#6252)
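The "ADMIN or CREATE" semantics proposed above (any one of the listed actions suffices) can be modeled in a few lines. The Action enum and permitted() below are illustrative stand-ins, not AccessController's actual API:

```java
import java.util.EnumSet;
import java.util.Set;

public class PermissionDemo {
    enum Action { READ, WRITE, CREATE, ADMIN }

    // Mirrors requirePermission's varargs semantics: the caller lists
    // acceptable actions, and the user needs at least one of them.
    static boolean permitted(Set<Action> granted, Action... required) {
        for (Action a : required) {
            if (granted.contains(a)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        Set<Action> userCreate = EnumSet.of(Action.CREATE);
        // Before the change: flush demands ADMIN, so a CREATE-only user is denied.
        System.out.println(permitted(userCreate, Action.ADMIN));                // prints false
        // After the change: ADMIN or CREATE, so the same user is allowed,
        // consistent with enableTable/disableTable.
        System.out.println(permitted(userCreate, Action.ADMIN, Action.CREATE)); // prints true
    }
}
```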
[jira] [Commented] (HBASE-10871) Indefinite OPEN/CLOSE wait on busy RegionServers
[ https://issues.apache.org/jira/browse/HBASE-10871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970255#comment-13970255 ] Jean-Daniel Cryans commented on HBASE-10871: I got access to the original issue's details, and I don't think that this part is accurate: bq. the balancer's invoked unassigns and assigns for regions that dealt with this server entered into an indefinite retry loop. What I saw from the logs is that the master got a {{SocketTimeoutException}} trying to send the openRegion() to the region server and fell into this part of the code in {{AssignmentManager}}: {code} } else if (t instanceof java.net.SocketTimeoutException && this.serverManager.isServerOnline(plan.getDestination())) { LOG.warn("Call openRegion() to " + plan.getDestination() + " has timed out when trying to assign " + region.getRegionNameAsString() + ", but the region might already be opened on " + plan.getDestination() + ".", t); return; {code} At that point we don't know if the region was opened or not, so we stop keeping track of it. Unfortunately, the 0.94 code has a default of 30 minutes for RIT timeout so it took that long to finally see this message in the log: "Region has been OFFLINE for too long, reassigning regionx to a random server". The looping isn't indefinite, it's just that a few of them were getting the socket timeout repeatedly. Some machines were in a bad state. There was also something strange with the call queues, I didn't see any mention of a connection reset so it doesn't seem like openRegion even made it to the queue. So it behaved the way we expect it to, although it's really not ideal, and stayed in limbo in either OFFLINE, PENDING_OPEN, or PENDING_CLOSE. Intuitively, I'd say we just ask the server directly if it has the region or not, but the fact that we got a socket timeout in the first place kinda tells us that that RS is currently unreachable. 
Maybe it's going to die, which is good since we'll know for sure the region is closed, but if it was opened then it might already be serving requests. I can think of a concept of having a list of regions in that special state that for every X seconds we try to contact the region server, but it seems brittle. FWIW the current situation of relying on the timeout isn't great either since the region could be open on the RS but we timeout anyways and double assign it. Pinging [~jxiang] since he loves that part of the code for more inputs :) > Indefinite OPEN/CLOSE wait on busy RegionServers > > > Key: HBASE-10871 > URL: https://issues.apache.org/jira/browse/HBASE-10871 > Project: HBase > Issue Type: Improvement > Components: Balancer, master, Region Assignment >Affects Versions: 0.94.6 >Reporter: Harsh J > > We observed a case where, when a specific RS got bombarded by a large amount > of regular requests, spiking and filling up its RPC queue, the balancer's > invoked unassigns and assigns for regions that dealt with this server entered > into an indefinite retry loop. > The regions specifically began waiting in PENDING_CLOSE/PENDING_OPEN states > indefinitely cause of the HBase Client RPC from the ServerManager at the > master was running into SocketTimeouts. This caused a region unavailability > in the server for the affected regions. The timeout monitor retry default of > 30m in 0.94's AM compounded the waiting gap further a bit more (this is now > 10m in 0.95+'s new AM, and has further retries before we get there, which is > good). > Wonder if there's a way to improve this situation generally. PENDING_OPENs > may be easy to handle - we can switch them out and move them elsewhere. 
> PENDING_CLOSEs may be a bit more tricky, but there must perhaps at least be a > way to "give up" permanently on a movement plan, and letting things be for a > while hoping for the RS to recover itself on its own (such that clients also > have a chance of getting things to work in the meantime)? -- This message was sent by Atlassian JIRA (v6.2#6252)
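The "keep probing the RS instead of waiting out the full RIT timeout" idea floated above could be sketched as bounded probes, where an unanswered probe models the socket timeout. Everything here is hypothetical, not HBase code:

```java
import java.util.function.Supplier;

public class RegionProbe {
    // Probes the server up to maxProbes times. A null answer models an
    // unreachable server (e.g. socket timeout); a non-null answer is the
    // server's definitive word on whether it holds the region.
    static boolean confirmOpen(Supplier<Boolean> isRegionOnline, int maxProbes) {
        for (int i = 0; i < maxProbes; i++) {
            Boolean answer = isRegionOnline.get();
            if (answer != null) {
                return answer;
            }
            // Real code would back off between probes rather than spin.
        }
        return false; // give up and fall back to timeout-driven reassignment
    }

    public static void main(String[] args) {
        // Server unreachable twice, then confirms the region is open.
        int[] calls = {0};
        Supplier<Boolean> flaky = () -> ++calls[0] < 3 ? null : Boolean.TRUE;
        System.out.println(confirmOpen(flaky, 5)); // prints true
    }
}
```

As the comment notes, this stays brittle: if every probe times out we still cannot distinguish "region open but RS swamped" from "region never opened", which is exactly the double-assignment risk the timeout-based approach already has.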
[jira] [Commented] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed
[ https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970228#comment-13970228 ] Jean-Daniel Cryans commented on HBASE-10958: bq. Wait. Are you saying the region server itself cannot issue a flush as part of the bulkLoadHFiles RPC? Exactly. That's why the User.runAs (and then run as the region server itself) solution seems to make sense to me. I read some more code, and it seems bulk load actually requires CREATE even though the call itself requires WRITE. From TestAccessController: {code} // User performing bulk loads must have privilege to read table metadata // (ADMIN or CREATE) verifyAllowed(bulkLoadAction, SUPERUSER, USER_ADMIN, USER_OWNER, USER_CREATE); verifyDenied(bulkLoadAction, USER_RW, USER_NONE, USER_RO); {code} So another option is to set flush and compact as CREATE actions (in line with create table, alter, disable, enable, delete). > [dataloss] Bulk loading with seqids can prevent some log entries from being > replayed > > > Key: HBASE-10958 > URL: https://issues.apache.org/jira/browse/HBASE-10958 > Project: HBase > Issue Type: Bug >Affects Versions: 0.96.2, 0.98.1, 0.94.18 >Reporter: Jean-Daniel Cryans >Assignee: Jean-Daniel Cryans >Priority: Blocker > Fix For: 0.99.0, 0.94.19, 0.98.2, 0.96.3 > > Attachments: HBASE-10958-less-intrusive-hack-0.96.patch, > HBASE-10958-quick-hack-0.96.patch, HBASE-10958-v2.patch, HBASE-10958.patch > > > We found an issue with bulk loads causing data loss when assigning sequence > ids (HBASE-6630) that is triggered when replaying recovered edits. We're > nicknaming this issue *Blindspot*. > The problem is that the sequence id given to a bulk loaded file is higher > than those of the edits in the region's memstore. When replaying recovered > edits, the rule to skip some of them is that they have to be _lower than the > highest sequence id_. 
In other words, the edits that have a sequence id lower > than the highest one in the store files *should* have also been flushed. This > is not the case with bulk loaded files since we now have an HFile with a > sequence id higher than unflushed edits. > The log recovery code takes this into account by simply skipping the bulk > loaded files, but this "bulk loaded status" is *lost* on compaction. The > edits in the logs that have a sequence id lower than the bulk loaded file > that got compacted are put in a blind spot and are skipped during replay. > Here's the easiest way to recreate this issue: > - Create an empty table > - Put one row in it (let's say it gets seqid 1) > - Bulk load one file (it gets seqid 2). I used ImporTsv and set > hbase.mapreduce.bulkload.assign.sequenceNumbers. > - Bulk load a second file the same way (it gets seqid 3). > - Major compact the table (the new file has seqid 3 and isn't considered > bulk loaded). > - Kill the region server that holds the table's region. > - Scan the table once the region is made available again. The first row, at > seqid 1, will be missing since the HFile with seqid 3 makes us believe that > everything that came before it was flushed. -- This message was sent by Atlassian JIRA (v6.2#6252)
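The runAs/doAs idea discussed above, where the internal flush runs under the region server's own identity rather than the RPC caller's, can be shown in a stripped-down form. UserIdentity and bulkLoadHFiles here are illustrative stand-ins, not org.apache.hadoop.hbase.security.User or HBase's RPC layer:

```java
import java.security.PrivilegedAction;

public class DoAsSketch {
    // Stand-in for a user principal that can execute an action as itself.
    interface UserIdentity {
        <T> T runAs(PrivilegedAction<T> action);
    }

    // The caller only needs bulk-load permission; the flush inside the call
    // is executed as the region server, so the caller's identity never has
    // to carry the flush permission (ADMIN/CREATE).
    static String bulkLoadHFiles(UserIdentity regionServerUser) {
        return regionServerUser.runAs(() -> "flushed");
    }

    public static void main(String[] args) {
        UserIdentity rsUser = new UserIdentity() {
            public <T> T runAs(PrivilegedAction<T> action) {
                return action.run(); // a real impl would switch security context
            }
        };
        System.out.println(bulkLoadHFiles(rsUser)); // prints flushed
    }
}
```

This also illustrates the precedent concern raised later in the thread: nothing else in the write path deliberately swaps out the RPC caller's identity mid-call.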
[jira] [Commented] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed
[ https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970173#comment-13970173 ] Jean-Daniel Cryans commented on HBASE-10958: Solutions off the top of my head: - Do doAs inside the method. The rationale is that it's the region server that's trying to do the flush here, not the user. I'm not sure if this is doable, and [~mbertozzi] is laughing at me. - Matteo thinks that we should just make bulk load an ADMIN method. I agree, but I'm not fond of breaking secure setups that use bulk loads. [~apurtell]?
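The doAs idea above amounts to switching the effective principal around the internal flush. A hypothetical sketch in plain Java of what that principal switch looks like; this is not the actual HBase User/AccessController API, and `runAs`, `Action`, and `currentPrincipal` are illustrative names:

```java
public class RunAsSketch {
    interface Action<T> { T run(); }

    // Simulated "current caller" identity; a real implementation would use
    // the security framework's thread-bound user, not a static field.
    static String currentPrincipal = "end-user";

    // Run the action under another principal, restoring the caller after.
    // The bulkLoadHFiles RPC arrives as the end user, but the flush it
    // triggers would execute as the region server itself, so the end user
    // needs no ADMIN grant for the flush.
    static <T> T runAs(String principal, Action<T> action) {
        String caller = currentPrincipal;
        currentPrincipal = principal;
        try {
            return action.run();
        } finally {
            currentPrincipal = caller;
        }
    }

    public static void main(String[] args) {
        String flushedBy = runAs("regionserver", () -> currentPrincipal);
        System.out.println(flushedBy);        // regionserver
        System.out.println(currentPrincipal); // end-user (restored)
    }
}
```

The design point is that the permission check then sees the region server's identity only for the nested flush, while the bulk load call itself is still authorized against the end user.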
[jira] [Commented] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed
[ https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970166#comment-13970166 ] Jean-Daniel Cryans commented on HBASE-10958: Right now bulk loading is WRITE and flush is ADMIN. Problematic!
[jira] [Updated] (HBASE-10312) Flooding the cluster with administrative actions leads to collapse
[ https://issues.apache.org/jira/browse/HBASE-10312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Jean-Daniel Cryans updated HBASE-10312: --- Resolution: Fixed Hadoop Flags: Reviewed Status: Resolved (was: Patch Available) Committed to 0.99/8/6/4, thanks for the input, guys. > Flooding the cluster with administrative actions leads to collapse > -- > > Key: HBASE-10312 > URL: https://issues.apache.org/jira/browse/HBASE-10312 > Project: HBase > Issue Type: Bug >Reporter: Andrew Purtell >Assignee: Jean-Daniel Cryans >Priority: Critical > Fix For: 0.99.0, 0.94.19, 0.98.2, 0.96.3 > > Attachments: HBASE-10312-0.94.patch, HBASE-10312.patch > > > Steps to reproduce: > 1. Start a cluster. > 2. Start an ingest process. > 3. In the HBase shell, do this: > {noformat} > while true do >flush 'table' > end > {noformat} > We should reject abuse via administrative requests like this. > What happens on the cluster is that the requests back up, leading to lots of these: > {noformat} > 2014-01-10 18:55:55,293 WARN [Priority.RpcServer.handler=2,port=8120] > monitoring.TaskMonitor: Too many actions in action monitor! Purging some. > {noformat} > At this point we could lower a gate on further requests for actions until the > backlog clears. > Continuing, all of the regionservers will eventually die with a > StackOverflowError of unknown origin: > {noformat} > 2014-01-10 19:02:02,783 ERROR [Priority.RpcServer.handler=3,port=8120] > ipc.RpcServer: Unexpected throwable object java.lang.StackOverflowError > at java.util.ArrayList$SubList.add(ArrayList.java:965) > [...] > {noformat}
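The "lower a gate" suggestion in the description can be read as: collapse duplicate flush requests per region instead of queueing them. A hypothetical sketch of such a gate; `FlushGate`, `requestFlush`, and `flushDone` are illustrative names, not the actual HBase flush queue or TaskMonitor code:

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

public class FlushGate {
    // Regions with a flush currently queued or running.
    private final Set<String> pending = ConcurrentHashMap.newKeySet();

    // Returns true if the flush was accepted; false if a flush for the same
    // region is already pending, in which case the request is dropped rather
    // than stacked on the backlog.
    public boolean requestFlush(String region) {
        return pending.add(region);
    }

    // Called when the flush for the region completes, reopening the gate.
    public void flushDone(String region) {
        pending.remove(region);
    }

    public static void main(String[] args) {
        FlushGate gate = new FlushGate();
        System.out.println(gate.requestFlush("region-1")); // true
        System.out.println(gate.requestFlush("region-1")); // false: dropped
        gate.flushDone("region-1");
        System.out.println(gate.requestFlush("region-1")); // true again
    }
}
```

Under the shell loop from the description, a gate like this keeps the action backlog bounded at one entry per region instead of growing without limit.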
[jira] [Commented] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed
[ https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970158#comment-13970158 ] Jean-Daniel Cryans commented on HBASE-10958: Yeah, and bulk loading itself is kind of like a flush since you end up with an HFile.
[jira] [Commented] (HBASE-10312) Flooding the cluster with administrative actions leads to collapse
[ https://issues.apache.org/jira/browse/HBASE-10312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970152#comment-13970152 ] Jean-Daniel Cryans commented on HBASE-10312: Doing now.
[jira] [Commented] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed
[ https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970118#comment-13970118 ] Jean-Daniel Cryans commented on HBASE-10958: That's a real test failure: the user doing the bulk load doesn't have permission to flush. Looking.
[jira] [Commented] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed
[ https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970015#comment-13970015 ] Jean-Daniel Cryans commented on HBASE-10958: Oh, and I found that testRegionMadeOfBulkLoadedFilesOnly wasn't really testing WAL replay with a bulk loaded file: the KV was being added at LATEST_TIMESTAMP, so far in the future that it wasn't visible. I changed the test to check that we get all the rows back.