[jira] [Commented] (HBASE-21874) Bucket cache on Persistent memory

2019-02-15 Thread Jean-Daniel Cryans (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16769508#comment-16769508
 ] 

Jean-Daniel Cryans commented on HBASE-21874:


Thank you Anoop and Ram for addressing my questions. Some follow-ups:

bq. We can remove that part if it is distracting but it may be needed if you 
need to create bigger sized cache. It seems like a java restriction from 
mmapping buffers.

We should document this limitation and add a check in the code for the moment.
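
(For reference, a minimal sketch of the Java restriction in question: a single 
MappedByteBuffer is capped at Integer.MAX_VALUE bytes, so an engine backed by 
one mmap call tops out around 2GB and a bigger cache has to map the file in 
chunks. This is a hypothetical illustration, not the patch's code.)

{code}
import java.io.RandomAccessFile;
import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.util.ArrayList;
import java.util.List;

class MmapChunks {
  // FileChannel.map() throws IllegalArgumentException ("Size exceeds
  // Integer.MAX_VALUE") for regions larger than 2^31 - 1 bytes, so a large
  // cache file must be mapped as a series of smaller buffers.
  static List<MappedByteBuffer> mapInChunks(String path, long totalSize) throws Exception {
    List<MappedByteBuffer> chunks = new ArrayList<>();
    try (RandomAccessFile raf = new RandomAccessFile(path, "rw")) {
      raf.setLength(totalSize);
      FileChannel ch = raf.getChannel();
      for (long off = 0; off < totalSize; off += Integer.MAX_VALUE) {
        long len = Math.min(Integer.MAX_VALUE, totalSize - off);
        // A mapping, once established, stays valid after the channel is closed.
        chunks.add(ch.map(FileChannel.MapMode.READ_WRITE, off, len));
      }
    }
    return chunks;
  }
}
{code}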

bq. So it becomes a direct replacement of the DRAM based IOEngine. ( you don't 
copy the buffers onheap)

That's the part I don't get. ByteBufferIOEngine basically offers the same 
functionality if we don't care about persistence, and the Intel DCPMM comes in 
a DIMM form factor, so shouldn't that _just work_?
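
(For context, pointing the existing mmap engine at a DAX-mounted pmem file is 
already expressible in hbase-site.xml; a hedged sketch, where the path is 
hypothetical and the cache size is in MB:)

{code}
<property>
  <name>hbase.bucketcache.ioengine</name>
  <value>mmap:/mnt/pmem/bucketcache</value>
</property>
<property>
  <name>hbase.bucketcache.size</name>
  <value>204800</value>
</property>
{code}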

> Bucket cache on Persistent memory
> -
>
> Key: HBASE-21874
> URL: https://issues.apache.org/jira/browse/HBASE-21874
> Project: HBase
>  Issue Type: New Feature
>  Components: BucketCache
>Affects Versions: 3.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21874.patch, HBASE-21874.patch, 
> HBASE-21874_V2.patch, Pmem_BC.png
>
>
> Non-volatile persistent memory devices are byte-addressable like DRAM (e.g. 
> Intel DCPMM). The bucket cache implementation can take advantage of this new 
> memory type and make use of the existing offheap data structures to serve 
> data directly from this memory area without having to bring the data 
> onheap.
> The patch is a new IOEngine implementation that works with persistent 
> memory.
> Note: here we don't make use of the persistence nature of the device; we 
> just make use of the big memory it provides.
> Performance numbers to follow. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21874) Bucket cache on Persistent memory

2019-02-15 Thread Jean-Daniel Cryans (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16769510#comment-16769510
 ] 

Jean-Daniel Cryans commented on HBASE-21874:


Oh, additionally, maybe we shouldn't call this PmemIOEngine but something more 
generic like SharedMemoryFileMmapEngine if it's not going to be persistent.

> Bucket cache on Persistent memory
> -
>
> Key: HBASE-21874
> URL: https://issues.apache.org/jira/browse/HBASE-21874
> Project: HBase
>  Issue Type: New Feature
>  Components: BucketCache
>Affects Versions: 3.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21874.patch, HBASE-21874.patch, 
> HBASE-21874_V2.patch, Pmem_BC.png
>
>
> Non-volatile persistent memory devices are byte-addressable like DRAM (e.g. 
> Intel DCPMM). The bucket cache implementation can take advantage of this new 
> memory type and make use of the existing offheap data structures to serve 
> data directly from this memory area without having to bring the data 
> onheap.
> The patch is a new IOEngine implementation that works with persistent 
> memory.
> Note: here we don't make use of the persistence nature of the device; we 
> just make use of the big memory it provides.
> Performance numbers to follow. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-21874) Bucket cache on Persistent memory

2019-02-14 Thread Jean-Daniel Cryans (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-21874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16768748#comment-16768748
 ] 

Jean-Daniel Cryans commented on HBASE-21874:


Hi Ram. Two questions:
- Why add the configurable buffer size? That seems to be what most of the 
patch is about, which is distracting.
- You say that the patch doesn't make use of the persistent nature of the 
device, but PmemIOEngine extends FileMmapEngine, which advertises itself as 
persistent and so does a few other things on top. I might be confused about 
what it implies, but this is also confusing :)

> Bucket cache on Persistent memory
> -
>
> Key: HBASE-21874
> URL: https://issues.apache.org/jira/browse/HBASE-21874
> Project: HBase
>  Issue Type: New Feature
>  Components: BucketCache
>Affects Versions: 3.0.0
>Reporter: ramkrishna.s.vasudevan
>Assignee: ramkrishna.s.vasudevan
>Priority: Major
> Fix For: 3.0.0
>
> Attachments: HBASE-21874.patch, HBASE-21874.patch, Pmem_BC.png
>
>
> Non-volatile persistent memory devices are byte-addressable like DRAM (e.g. 
> Intel DCPMM). The bucket cache implementation can take advantage of this new 
> memory type and make use of the existing offheap data structures to serve 
> data directly from this memory area without having to bring the data 
> onheap.
> The patch is a new IOEngine implementation that works with persistent 
> memory.
> Note: here we don't make use of the persistence nature of the device; we 
> just make use of the big memory it provides.
> Performance numbers to follow. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (HBASE-12169) Document IPC binding options

2017-06-29 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16069042#comment-16069042
 ] 

Jean-Daniel Cryans commented on HBASE-12169:


It's tools _and_ practices on how to run a cluster, I think it fits.

> Document IPC binding options
> 
>
> Key: HBASE-12169
> URL: https://issues.apache.org/jira/browse/HBASE-12169
> Project: HBase
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.98.0, 0.94.7, 0.99.0
>Reporter: Sean Busbey
>Assignee: Mike Drob
>Priority: Minor
> Attachments: HBASE-12169.patch
>
>
> HBASE-8148 added options to change binding component services, but there 
> aren't any docs for it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-12169) Document IPC binding options

2017-06-29 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068722#comment-16068722
 ] 

Jean-Daniel Cryans commented on HBASE-12169:


Instead of adding this to hbase-default, one option could be to document "How 
to have HBase processes bind to specific addresses", maybe on this page 
http://hbase.apache.org/book.html#ops_mgt ?
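
(For illustration, the docs could show a snippet along these lines; the 
property names are my recollection of what HBASE-8148 added and the address is 
hypothetical, so treat this as a sketch to verify:)

{code}
<property>
  <name>hbase.master.ipc.address</name>
  <value>10.0.0.5</value>
</property>
<property>
  <name>hbase.regionserver.ipc.address</name>
  <value>10.0.0.5</value>
</property>
{code}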

> Document IPC binding options
> 
>
> Key: HBASE-12169
> URL: https://issues.apache.org/jira/browse/HBASE-12169
> Project: HBase
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.98.0, 0.94.7, 0.99.0
>Reporter: Sean Busbey
>Assignee: Mike Drob
>Priority: Minor
> Attachments: HBASE-12169.patch
>
>
> HBASE-8148 added options to change binding component services, but there 
> aren't any docs for it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-12169) Document IPC binding options

2017-06-29 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12169?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16068578#comment-16068578
 ] 

Jean-Daniel Cryans commented on HBASE-12169:


Hey Mike,

I could see a case for documenting why someone would want to change those 
configs, have you considered it?

Also, and that might show how long it's been since I've looked at the HBase 
docs, but isn't the hbase-default stuff just pulled from hbase-default.xml? The 
configs you are documenting aren't in that file AFAIK, so it should either be 
documented elsewhere or the configs have to be pulled into hbase-default.xml.

> Document IPC binding options
> 
>
> Key: HBASE-12169
> URL: https://issues.apache.org/jira/browse/HBASE-12169
> Project: HBase
>  Issue Type: Task
>  Components: documentation
>Affects Versions: 0.98.0, 0.94.7, 0.99.0
>Reporter: Sean Busbey
>Assignee: Mike Drob
>Priority: Minor
> Attachments: HBASE-12169.patch
>
>
> HBASE-8148 added options to change binding component services, but there 
> aren't any docs for it.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Commented] (HBASE-13135) Move replication ops mgmt stuff from Javadoc to Ref Guide

2015-03-05 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14349168#comment-14349168
 ] 

Jean-Daniel Cryans commented on HBASE-13135:


Good stuff Misty, a few more things while we're here.

I think we can remove this line since it's on by default; maybe say that the 
operator should check it wasn't disabled:

bq. On the source cluster, enable replication by setting `hbase.replication` to 
`true` in _hbase-site.xml_.

The following can be removed; we don't print this out anymore, at least since 
the endpoint changes:

{quote}
 Considering 1 rs, with ratio 0.1
 Getting 1 rs from peer cluster # 0
 Choosing peer 10.10.1.49:62020
{quote}

This is a more reliable thing to point out instead, coming from 
ReplicationSource:

bq. LOG.info("Replicating " + clusterId + " -> " + peerClusterId);



 Move replication ops mgmt stuff from Javadoc to Ref Guide
 -

 Key: HBASE-13135
 URL: https://issues.apache.org/jira/browse/HBASE-13135
 Project: HBase
  Issue Type: Bug
  Components: documentation, Replication
Reporter: Misty Stanley-Jones
Assignee: Misty Stanley-Jones
 Attachments: HBASE-13135-v1.patch, HBASE-13135-v2.patch, 
 HBASE-13135.patch


 As per discussion with [~jmhsieh] and [~saint@gmail.com]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-13135) Move replication ops mgmt stuff from Javadoc to Ref Guide

2015-03-02 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-13135?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14343406#comment-14343406
 ] 

Jean-Daniel Cryans commented on HBASE-13135:


Looking at the current doc, this line should be changed since at the end it 
includes a link that points back to the doc we're moving:

bq. The following is a simplified procedure for configuring cluster 
replication. It may not cover every edge case. For more information, see the 
API documentation for replication.

This whole section is very similar to the tail of the Configuring Cluster 
Replication section (merge or skip?):

bq. + Verifying Replicated Data

 Move replication ops mgmt stuff from Javadoc to Ref Guide
 -

 Key: HBASE-13135
 URL: https://issues.apache.org/jira/browse/HBASE-13135
 Project: HBase
  Issue Type: Bug
  Components: documentation, Replication
Reporter: Misty Stanley-Jones
Assignee: Misty Stanley-Jones
 Attachments: HBASE-13135.patch


 As per discussion with [~jmhsieh] and [~saint@gmail.com]



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-8318) TableOutputFormat.TableRecordWriter should accept Increments

2015-01-12 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8318?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273760#comment-14273760
 ] 

Jean-Daniel Cryans commented on HBASE-8318:
---

[~clehene] hey, so re-reading what I wrote, my opinion remains the same. The 
fact that nobody else picked this up seems to imply that it isn't a common use 
case either. I'm fine with closing this.

 TableOutputFormat.TableRecordWriter should accept Increments
 

 Key: HBASE-8318
 URL: https://issues.apache.org/jira/browse/HBASE-8318
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Marc Spaggiari
Assignee: Jean-Marc Spaggiari
 Attachments: HBASE-8318-v0-trunk.patch, HBASE-8318-v1-trunk.patch, 
 HBASE-8318-v2-trunk.patch


 TableOutputFormat.TableRecordWriter can take Puts and Deletes but it should 
 also accept Increments.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-7631) Region w/ few, fat reads was hard to find on a box carrying hundreds of regions

2015-01-12 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7631?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273783#comment-14273783
 ] 

Jean-Daniel Cryans commented on HBASE-7631:
---

[~clehene] Things have changed a lot since then so I'm not sure what kind of 
work is required now. I'd close this unless someone is planning on working on 
it now that there's noise.

 Region w/ few, fat reads was hard to find on a box carrying hundreds of 
 regions
 ---

 Key: HBASE-7631
 URL: https://issues.apache.org/jira/browse/HBASE-7631
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: stack

 Of a sudden on a prod cluster, a table's rows gained girth... hundreds of 
 thousands of rows... and the application was pulling them all back every time 
 but only once a second or so.  Regionserver was carrying hundreds of regions. 
  Was plain that there was lots of network out traffic.  It was tough figuring 
 which region was the culprit (JD's trick was moving the regions off one at a 
 time while watching network out traffic on the cluster to see whose spiked 
 next -- it worked but took some time).
 If we had per-region read/write sizes in metrics, that would have saved a 
 bunch of diagnostic time.
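 (For reference, the trick described above amounts to something like this in 
 the HBase shell; the encoded region name and server name are hypothetical:)
 {code}
 # Move suspect regions off one at a time while watching per-host network-out:
 move 'deadbeef0123456789abcdef01234567', 'host2.example.com,60020,1389618001234'
 {code}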



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12774) Fix the inconsistent permission checks for bulkloading.

2014-12-31 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14262361#comment-14262361
 ] 

Jean-Daniel Cryans commented on HBASE-12774:


Can you fix up the javadoc that I missed in HBASE-11008 while you're there?

https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/security/access/AccessController.java#L1961

Also, I'd like [~apurtell]'s benediction on this.

 Fix the inconsistent permission checks for bulkloading.
 ---

 Key: HBASE-12774
 URL: https://issues.apache.org/jira/browse/HBASE-12774
 Project: HBase
  Issue Type: Bug
Reporter: Srikanth Srungarapu
Assignee: Srikanth Srungarapu
Priority: Minor
 Attachments: HBASE-12774.patch, HBASE-12774_v2.patch


 Three checks (prePrepareBulkLoad, preCleanupBulkLoad, preBulkLoadHFile) are 
 done while performing a secure bulk load, and it looks like the former two 
 check for 'W' while the latter checks for 'C'. After an offline chat with 
 [~jdcryans], it looks like the inconsistency among the checks was unintended. 
 We can address this in multiple ways:
 * All checks should be for 'W'
 * All checks should be for 'C'
 * All checks should be for both 'W' and 'C'
 Posting the initial patch going by the first option. Open to discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12770) Don't transfer all the queued hlogs of a dead server to the same alive server

2014-12-29 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260345#comment-14260345
 ] 

Jean-Daniel Cryans commented on HBASE-12770:


bq. is it reasonable that the alive server only transfer one peer's hlogs from 
the dead server

Just to make sure I understand you correctly: you're saying that in the case 
where we have many peers, instead of one server grabbing all the queues, each 
server should grab only one so that the load is spread out. If so, this is 
reasonable, but I would change the jira's title to reflect that (it's kind of 
vague right now, unless the goal was to spur discussion).

bq. Regionservers could publish their queue depths and if there is a 
substantial difference detected, one RS could transfer a queue to the less 
loaded peer.

Yeah, but the details might be hard to get right. There might be a reason why a 
particular queue is growing (blocked on whatever), so it would start bouncing 
around. We could at least have admin tools to move queues, though; right now 
we'd have to move all of them at the same time, but with the above work it 
might be possible to do it more granularly.

 Don't transfer all the queued hlogs of a dead server to the same alive server
 -

 Key: HBASE-12770
 URL: https://issues.apache.org/jira/browse/HBASE-12770
 Project: HBase
  Issue Type: Improvement
  Components: Replication
Reporter: cuijianwei
Priority: Minor

 When a region server is down (or the cluster restarts), all the hlog queues 
 will be transferred to the same alive region server. In a shared cluster, we 
 might create several peers replicating data to different peer clusters. There 
 might be lots of hlogs queued for these peers for several reasons: some peers 
 might be disabled, errors from a peer cluster might prevent the replication, 
 or the replication sources may fail to read some hlog because of an hdfs 
 problem. Then, if the server is down or restarted, another alive server will 
 take all the replication jobs of the dead server; this might put a lot of 
 pressure on the resources (network/disk read) of the alive server and also is 
 not fast enough to replicate the queued hlogs. And if that alive server goes 
 down, all the replication jobs, including those taken from other dead 
 servers, will once again be transferred wholesale to another alive server; 
 this might leave one server with a large number of queued hlogs (in our 
 shared cluster, we found one server might have thousands of hlogs queued for 
 replication). As an alternative, is it reasonable that an alive server only 
 transfer one peer's hlogs from the dead server at a time? Then other alive 
 region servers might have the opportunity to transfer the hlogs of the 
 remaining peers. This may also help the queued hlogs be processed faster. Any 
 discussion is welcome.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12774) Fix the inconsistent permission checks for bulkloading.

2014-12-29 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14260570#comment-14260570
 ] 

Jean-Daniel Cryans commented on HBASE-12774:


Like we discussed, and because of HBASE-11008, it needs to be 'C', which is 
your second option. Now, adding a C+W check isn't something we're doing 
anywhere else (I think?) and I'm not sure it makes sense. The way things 
currently work, I'm not sure it makes sense for a user to have C without 
having W.
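
(For readers following along, these are the single-letter HBase ACLs granted 
via the shell; a hedged example with a hypothetical user and table:)

{code}
# 'C' = CREATE, 'W' = WRITE; granting both would cover all three bulk load hooks:
grant 'bulkload_user', 'CW', 'my_table'
{code}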

 Fix the inconsistent permission checks for bulkloading.
 ---

 Key: HBASE-12774
 URL: https://issues.apache.org/jira/browse/HBASE-12774
 Project: HBase
  Issue Type: Improvement
Reporter: Srikanth Srungarapu
Assignee: Srikanth Srungarapu
Priority: Minor
 Attachments: HBASE-12774.patch


 Three checks (prePrepareBulkLoad, preCleanupBulkLoad, preBulkLoadHFile) are 
 done while performing a secure bulk load, and it looks like the former two 
 check for 'W' while the latter checks for 'C'. After an offline chat with 
 [~jdcryans], it looks like the inconsistency among the checks was unintended. 
 We can address this in multiple ways:
 * All checks should be for 'W'
 * All checks should be for 'C'
 * All checks should be for both 'W' and 'C'
 Posting the initial patch going by the first option. Open to discussion.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12665) When aborting, dump metrics

2014-12-09 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240046#comment-14240046
 ] 

Jean-Daniel Cryans commented on HBASE-12665:


I think your first cut is missing something: I see you lifting a bunch of 
stuff from JMXJsonServlet into JSONBean, but the former still has most of the 
latter's code.

 When aborting, dump metrics
 ---

 Key: HBASE-12665
 URL: https://issues.apache.org/jira/browse/HBASE-12665
 Project: HBase
  Issue Type: Bug
  Components: Operability
Reporter: stack
Assignee: stack
 Fix For: 1.0.0, 2.0.0, 0.98.9

 Attachments: 0001-First-cut.patch


 We used to dump out all metrics when we were exiting on abort. Was of use 
 debugging why the abort.  We used to have this.  [~jdcryans] noticed it was 
 dropped by his brother [~eclark] over in HBASE-6410 "Move RegionServer 
 Metrics to metrics2". To stop the two brothers fighting, I intervened with 
 this patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12665) When aborting, dump metrics

2014-12-09 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240277#comment-14240277
 ] 

Jean-Daniel Cryans commented on HBASE-12665:


[~stack] From a usability standpoint, how many lines do you get when dumping 
the metrics? Looks like it would go into the hundreds (also, you are showing 
the master's metrics, standalone?). What I liked about what we had before was 
that it was concise and contained mostly stuff I was interested in, whereas 
now we seem to be getting all the histograms. Also, it goes to stdout instead 
of being logged; was that intentional? I'm not sure how that's gonna come out 
if there's loads of stuff being logged (like there usually is when a RS is 
aborting); it might get mixed a bit. Finally, it would be nice to get some 
description of what's being dumped. I was actually confused when I saw this 
part:

{noformat}
2014-12-09 13:44:21,244 FATAL [main] regionserver.HRegionServer(1927): 
RegionServer abort: loaded coprocessors are: 
[org.apache.hadoop.hbase.coprocessor.MultiRowMutationEndpoint]
{
  "beans" : [ {
{noformat}

It felt like the metrics were a continuation of the coprocessors printout. I'm 
easy to confuse so that might just be me.

Thanks for picking this up. In this time of the year, family values are more 
important than ever.

 When aborting, dump metrics
 ---

 Key: HBASE-12665
 URL: https://issues.apache.org/jira/browse/HBASE-12665
 Project: HBase
  Issue Type: Bug
  Components: Operability
Reporter: stack
Assignee: stack
 Fix For: 1.0.0, 2.0.0, 0.98.9

 Attachments: 0001-First-cut.patch, 
 0001-HBASE-12665-When-aborting-dump-metrics.patch


 We used to dump out all metrics when we were exiting on abort. Was of use 
 debugging why the abort.  We used to have this.  [~jdcryans] noticed it was 
 dropped by his brother [~eclark] over in HBASE-6410 "Move RegionServer 
 Metrics to metrics2". To stop the two brothers fighting, I intervened with 
 this patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12665) When aborting, dump metrics

2014-12-09 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12665?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14240456#comment-14240456
 ] 

Jean-Daniel Cryans commented on HBASE-12665:


+1, still seems like a lot but better than showing everything. Also, I like it 
better when logged like that. We could also consider printing it out in a more 
compact way (it's basically how the metrics dump was before), but we can do 
that later if needed.

It seems that 0.98 will need a different patch.

 When aborting, dump metrics
 ---

 Key: HBASE-12665
 URL: https://issues.apache.org/jira/browse/HBASE-12665
 Project: HBase
  Issue Type: Bug
  Components: Operability
Reporter: stack
Assignee: stack
 Fix For: 1.0.0, 2.0.0, 0.98.9

 Attachments: 0001-First-cut.patch, 
 0001-HBASE-12665-When-aborting-dump-metrics.patch, 12665v3.txt, 12665v4.txt, 
 12665v5.txt, dump.txt


 We used to dump out all metrics when we were exiting on abort. Was of use 
 debugging why the abort.  We used to have this.  [~jdcryans] noticed it was 
 dropped by his brother [~eclark] over in HBASE-6410 "Move RegionServer 
 Metrics to metrics2". To stop the two brothers fighting, I intervened with 
 this patch.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12419) Partial cell read caused by EOF ERRORs on replication source during replication

2014-11-05 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12419?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14199480#comment-14199480
 ] 

Jean-Daniel Cryans commented on HBASE-12419:


Yeah, getting partial reads has always been a reality in ReplicationSource, 
even before PBing everything. I'm +1 on TRACEing this but what about all the 
other users of BaseDecoder? Are there cases where we rely on this LOG.error to 
show that something went wrong?

 Partial cell read caused by EOF ERRORs on replication source during 
 replication
 -

 Key: HBASE-12419
 URL: https://issues.apache.org/jira/browse/HBASE-12419
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.7
Reporter: Andrew Purtell
 Fix For: 2.0.0, 0.98.8, 0.99.2

 Attachments: TestReplicationIngest.patch


 We are seeing exceptions like these on the replication sources when 
 replication is active:
 {noformat}
 2014-11-04 01:20:19,738 ERROR 
 [regionserver8120-EventThread.replicationSource,1] codec.BaseDecoder:
 Partial cell read caused by EOF: java.io.IOException: Premature EOF from 
 inputStream
 {noformat}
 HBase 0.98.8-SNAPSHOT, Hadoop 2.4.1.
 Happens both with and without short circuit reads on the source cluster.
 I'm able to reproduce this reliably:
 # Set up two clusters. Can be single slave.
 # Enable replication in the configuration
 # Use LoadTestTool -init_only on both clusters
 # On the source cluster via shell: alter 
 'cluster_test', {NAME => 'test_cf', REPLICATION_SCOPE => 1}
 # On the source cluster via shell: add_peer 'remote:port:/hbase'
 # On the source cluster, LoadTestTool -skip_init -write 1:1024:10 -num_keys 
 1000000
 # Wait for LoadTestTool to complete
 # Use the shell to verify 1M rows are in 'cluster_test' on the target cluster.
 All 1M rows will replicate without data loss, but I'll see 5-15 instances of 
 "Partial cell read caused by EOF" messages logged from codec.BaseDecoder at 
 ERROR level on the replication source. 



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-11316) Expand info about compactions beyond HBASE-11120

2014-07-29 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14078104#comment-14078104
 ] 

Jean-Daniel Cryans commented on HBASE-11316:


+1, thanks for looking at this Stack and sorry I overlooked those failing tests.

 Expand info about compactions beyond HBASE-11120
 

 Key: HBASE-11316
 URL: https://issues.apache.org/jira/browse/HBASE-11316
 Project: HBase
  Issue Type: Bug
  Components: Compaction, documentation
Reporter: Misty Stanley-Jones
Assignee: Misty Stanley-Jones
 Fix For: 2.0.0

 Attachments: HBASE-11316-1.patch, HBASE-11316-2.patch, 
 HBASE-11316-3.patch, HBASE-11316-4.patch, HBASE-11316-5.patch, 
 HBASE-11316-6-rebased-v2.patch, HBASE-11316-6-rebased.patch, 
 HBASE-11316-6.patch, HBASE-11316-7.patch, HBASE-11316-8.patch, 
 HBASE-11316-9.patch, HBASE-11316.patch, addendum.txt, ch9_compactions.pdf


 Round 2 - expand info about the algorithms, talk about stripe compaction, and 
 talk more about configuration. Hopefully find some rules of thumb.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11522) Move Replication information into the Ref Guide

2014-07-28 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11522?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-11522:
---

   Resolution: Fixed
Fix Version/s: 2.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

I compiled the site and looked at the end result, +1. Pushed to master. Thanks 
Misty!

 Move Replication information into the Ref Guide
 ---

 Key: HBASE-11522
 URL: https://issues.apache.org/jira/browse/HBASE-11522
 Project: HBase
  Issue Type: Bug
  Components: documentation
Reporter: Misty Stanley-Jones
Assignee: Misty Stanley-Jones
 Fix For: 2.0.0

 Attachments: HBASE-11522-1.patch, HBASE-11522.patch, 
 replication_overview.png


 Currently, information about replication lives at 
 http://hbase.apache.org/replication.html and 
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/replication/package-summary.html#requirements.
  IMHO, it makes sense at least for the first link to be in the Ref Guide 
 instead of where it is now. This came up because of HBASE-10398.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11316) Expand info about compactions beyond HBASE-11120

2014-07-28 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14077225#comment-14077225
 ] 

Jean-Daniel Cryans commented on HBASE-11316:


Comments:

{noformat}
-<description>Name of the failsafe snapshot taken by the restore operation.
+<description>Name of the failsafe snapshot taken by the restore opecompn.
{noformat}

Typo slipped in there.

{noformat}
+  <title>Being Stuck</title>
{noformat}

Looking at the output, this line (and the other titles at that level) is 
comically small. There's probably nothing we can do short of changing the 
style sheet.

And I'm good with the rest.

 Expand info about compactions beyond HBASE-11120
 

 Key: HBASE-11316
 URL: https://issues.apache.org/jira/browse/HBASE-11316
 Project: HBase
  Issue Type: Bug
  Components: Compaction, documentation
Reporter: Misty Stanley-Jones
Assignee: Misty Stanley-Jones
 Attachments: HBASE-11316-1.patch, HBASE-11316-2.patch, 
 HBASE-11316-3.patch, HBASE-11316-4.patch, HBASE-11316-5.patch, 
 HBASE-11316-6-rebased-v2.patch, HBASE-11316-6-rebased.patch, 
 HBASE-11316-6.patch, HBASE-11316-7.patch, HBASE-11316-8.patch, 
 HBASE-11316.patch, ch9_compactions.pdf


 Round 2 - expand info about the algorithms, talk about stripe compaction, and 
 talk more about configuration. Hopefully find some rules of thumb.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11316) Expand info about compactions beyond HBASE-11120

2014-07-28 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-11316:
---

   Resolution: Fixed
Fix Version/s: 2.0.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Pushed to master, thanks for all the hard work documenting our most complicated 
stuff Misty.

 Expand info about compactions beyond HBASE-11120
 

 Key: HBASE-11316
 URL: https://issues.apache.org/jira/browse/HBASE-11316
 Project: HBase
  Issue Type: Bug
  Components: Compaction, documentation
Reporter: Misty Stanley-Jones
Assignee: Misty Stanley-Jones
 Fix For: 2.0.0

 Attachments: HBASE-11316-1.patch, HBASE-11316-2.patch, 
 HBASE-11316-3.patch, HBASE-11316-4.patch, HBASE-11316-5.patch, 
 HBASE-11316-6-rebased-v2.patch, HBASE-11316-6-rebased.patch, 
 HBASE-11316-6.patch, HBASE-11316-7.patch, HBASE-11316-8.patch, 
 HBASE-11316-9.patch, HBASE-11316.patch, ch9_compactions.pdf


 Round 2 - expand info about the algorithms, talk about stripe compaction, and 
 talk more about configuration. Hopefully find some rules of thumb.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HBASE-11388) The order parameter is wrong when invoking the constructor of the ReplicationPeer In the method getPeer of the class ReplicationPeersZKImpl

2014-07-25 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans resolved HBASE-11388.


   Resolution: Fixed
Fix Version/s: (was: 0.99.0)
 Hadoop Flags: Reviewed

You're right, so I just pushed the patch to 0.98. Thanks!

 The order parameter is wrong when invoking the constructor of the 
 ReplicationPeer In the method getPeer of the class ReplicationPeersZKImpl
 -

 Key: HBASE-11388
 URL: https://issues.apache.org/jira/browse/HBASE-11388
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.99.0, 0.98.3
Reporter: Qianxi Zhang
Assignee: Qianxi Zhang
Priority: Minor
 Fix For: 0.98.5

 Attachments: HBASE_11388.patch, HBASE_11388_trunk_V1.patch


 The parameters are "Configuration", "ClusterKey" and "id" in the constructor 
 of the class ReplicationPeer, but the parameter order is "Configuration", 
 "id" and "ClusterKey" when invoking the constructor of ReplicationPeer in 
 the method getPeer of the class ReplicationPeersZKImpl.
 ReplicationPeer#76
 {code}
 public ReplicationPeer(Configuration conf, String key, String id) throws ReplicationException {
   this.conf = conf;
   this.clusterKey = key;
   this.id = id;
   try {
     this.reloadZkWatcher();
   } catch (IOException e) {
     throw new ReplicationException("Error connecting to peer cluster with peerId=" + id, e);
   }
 }
 {code}
 ReplicationPeersZKImpl#498
 {code}
 ReplicationPeer peer =
     new ReplicationPeer(peerConf, peerId, ZKUtil.getZooKeeperClusterKey(peerConf));
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11388) The order parameter is wrong when invoking the constructor of the ReplicationPeer In the method getPeer of the class ReplicationPeersZKImpl

2014-07-24 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-11388:
---

Status: Patch Available  (was: Open)

+1 on the second patch, I'll get a Hadoop QA run for good measure.

 The order parameter is wrong when invoking the constructor of the 
 ReplicationPeer In the method getPeer of the class ReplicationPeersZKImpl
 -

 Key: HBASE-11388
 URL: https://issues.apache.org/jira/browse/HBASE-11388
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.98.3, 0.99.0
Reporter: Qianxi Zhang
Assignee: Qianxi Zhang
Priority: Minor
 Fix For: 0.99.0, 0.98.5

 Attachments: HBASE_11388.patch, HBASE_11388_trunk_V1.patch


 The parameters are "Configuration", "ClusterKey" and "id" in the constructor 
 of the class ReplicationPeer, but the parameter order is "Configuration", 
 "id" and "ClusterKey" when invoking the constructor of ReplicationPeer in 
 the method getPeer of the class ReplicationPeersZKImpl.
 ReplicationPeer#76
 {code}
 public ReplicationPeer(Configuration conf, String key, String id) throws ReplicationException {
   this.conf = conf;
   this.clusterKey = key;
   this.id = id;
   try {
     this.reloadZkWatcher();
   } catch (IOException e) {
     throw new ReplicationException("Error connecting to peer cluster with peerId=" + id, e);
   }
 }
 {code}
 ReplicationPeersZKImpl#498
 {code}
 ReplicationPeer peer =
     new ReplicationPeer(peerConf, peerId, ZKUtil.getZooKeeperClusterKey(peerConf));
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11388) The order parameter is wrong when invoking the constructor of the ReplicationPeer In the method getPeer of the class ReplicationPeersZKImpl

2014-07-24 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-11388:
---

Status: Open  (was: Patch Available)

Well, this needs a rebase. I'm back from vacation now, [~qianxiZhang], so if 
you resubmit a patch this week we could get this in quickly.

 The order parameter is wrong when invoking the constructor of the 
 ReplicationPeer In the method getPeer of the class ReplicationPeersZKImpl
 -

 Key: HBASE-11388
 URL: https://issues.apache.org/jira/browse/HBASE-11388
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.98.3, 0.99.0
Reporter: Qianxi Zhang
Assignee: Qianxi Zhang
Priority: Minor
 Fix For: 0.99.0, 0.98.5

 Attachments: HBASE_11388.patch, HBASE_11388_trunk_V1.patch


 The parameters are "Configuration", "ClusterKey" and "id" in the constructor 
 of the class ReplicationPeer, but the parameter order is "Configuration", 
 "id" and "ClusterKey" when invoking the constructor of ReplicationPeer in 
 the method getPeer of the class ReplicationPeersZKImpl.
 ReplicationPeer#76
 {code}
 public ReplicationPeer(Configuration conf, String key, String id) throws ReplicationException {
   this.conf = conf;
   this.clusterKey = key;
   this.id = id;
   try {
     this.reloadZkWatcher();
   } catch (IOException e) {
     throw new ReplicationException("Error connecting to peer cluster with peerId=" + id, e);
   }
 }
 {code}
 ReplicationPeersZKImpl#498
 {code}
 ReplicationPeer peer =
     new ReplicationPeer(peerConf, peerId, ZKUtil.getZooKeeperClusterKey(peerConf));
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11522) Move Replication information into the Ref Guide

2014-07-23 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11522?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14072071#comment-14072071
 ] 

Jean-Daniel Cryans commented on HBASE-11522:


A few little things:
{noformat}
+<para>Apache HBase (TM) provides a replication mechanism to copy data 
between HBase
{noformat}

We needed to put the whole "Apache HBase (TM)" here because we were mentioning 
HBase for the first time, but now that it's in the middle of the documentation 
we can replace this with just "HBase provides a replication..."
{noformat}
+<procedure>
{noformat}

It's not a procedure; it's more a description of what happens when you 
replicate.
{noformat}
+  <title>Replication FAQ</title>
{noformat}
We can remove the whole section; it's stale and nobody's been asking those 
questions.

Finally, we could use a few more xml:id tags, at least for the first level 
below replication.

 Move Replication information into the Ref Guide
 ---

 Key: HBASE-11522
 URL: https://issues.apache.org/jira/browse/HBASE-11522
 Project: HBase
  Issue Type: Bug
  Components: documentation
Reporter: Misty Stanley-Jones
Assignee: Misty Stanley-Jones
 Attachments: HBASE-11522.patch, replication_overview.png


 Currently, information about replication lives at 
 http://hbase.apache.org/replication.html and 
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/replication/package-summary.html#requirements.
  IMHO, it makes sense at least for the first link to be in the Ref Guide 
 instead of where it is now. This came up because of HBASE-10398.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11316) Expand info about compactions beyond HBASE-11120

2014-07-01 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11316?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14049476#comment-14049476
 ] 

Jean-Daniel Cryans commented on HBASE-11316:


IMO, unless we specifically refer to something related to the HFile 
implementation, we should always use StoreFile. That's also how we present it 
in the web UI when listing the number of Stores and StoreFiles in a region.

 Expand info about compactions beyond HBASE-11120
 

 Key: HBASE-11316
 URL: https://issues.apache.org/jira/browse/HBASE-11316
 Project: HBase
  Issue Type: Bug
  Components: Compaction, documentation
Reporter: Misty Stanley-Jones
Assignee: Misty Stanley-Jones
 Attachments: HBASE-11316-1.patch, HBASE-11316-2.patch, 
 HBASE-11316-3.patch, HBASE-11316-4.patch, HBASE-11316-5.patch, 
 HBASE-11316.patch, ch9_compactions.pdf


 Round 2 - expand info about the algorithms, talk about stripe compaction, and 
 talk more about configuration. Hopefully find some rules of thumb.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11388) The order parameter is wrong when invoking the constructor of the ReplicationPeer In the method getPeer of the class ReplicationPeersZKImpl

2014-06-24 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14042287#comment-14042287
 ] 

Jean-Daniel Cryans commented on HBASE-11388:


Ah, sorry, I wasn't clear. I was thinking that we should only remove it from 
the constructor, but then populate that field within the constructor using 
getZooKeeperClusterKey.
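
(A minimal sketch of that suggestion, based on the constructor quoted in the 
description below; hedged, not a reviewed patch:)

{code}
// Drop the clusterKey parameter and derive it inside the constructor, so
// callers can no longer swap the argument order by mistake.
public ReplicationPeer(Configuration conf, String id) throws ReplicationException {
  this.conf = conf;
  this.clusterKey = ZKUtil.getZooKeeperClusterKey(conf);
  this.id = id;
  try {
    this.reloadZkWatcher();
  } catch (IOException e) {
    throw new ReplicationException("Error connecting to peer cluster with peerId=" + id, e);
  }
}
{code}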

 The order parameter is wrong when invoking the constructor of the 
 ReplicationPeer In the method getPeer of the class ReplicationPeersZKImpl
 -

 Key: HBASE-11388
 URL: https://issues.apache.org/jira/browse/HBASE-11388
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.99.0, 0.98.3
Reporter: Qianxi Zhang
Assignee: Qianxi Zhang
Priority: Minor
 Fix For: 0.99.0

 Attachments: HBASE_11388.patch


 The parameters are "Configuration", "ClusterKey" and "id" in the constructor 
 of the class ReplicationPeer, but the parameter order is "Configuration", 
 "id" and "ClusterKey" when invoking the constructor of ReplicationPeer in 
 the method getPeer of the class ReplicationPeersZKImpl.
 ReplicationPeer#76
 {code}
 public ReplicationPeer(Configuration conf, String key, String id) throws ReplicationException {
   this.conf = conf;
   this.clusterKey = key;
   this.id = id;
   try {
     this.reloadZkWatcher();
   } catch (IOException e) {
     throw new ReplicationException("Error connecting to peer cluster with peerId=" + id, e);
   }
 }
 {code}
 ReplicationPeersZKImpl#498
 {code}
 ReplicationPeer peer =
     new ReplicationPeer(peerConf, peerId, ZKUtil.getZooKeeperClusterKey(peerConf));
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11388) The order parameter is wrong when invoking the constructor of the ReplicationPeer In the method getPeer of the class ReplicationPeersZKImpl

2014-06-23 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11388?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14040977#comment-14040977
 ] 

Jean-Daniel Cryans commented on HBASE-11388:


Wow, nice one [~qianxiZhang]. I'm surprised that things even work, but it 
looks like we don't do anything useful with clusterKey in ReplicationPeer; we 
just use the configuration. I think a better interface would be to just remove 
the passing of the clusterKey in ReplicationPeer's constructor, since we can 
infer it using ZKUtil#getZooKeeperClusterKey().

 The order parameter is wrong when invoking the constructor of the 
 ReplicationPeer In the method getPeer of the class ReplicationPeersZKImpl
 -

 Key: HBASE-11388
 URL: https://issues.apache.org/jira/browse/HBASE-11388
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.99.0, 0.98.3
Reporter: Qianxi Zhang
Assignee: Qianxi Zhang
Priority: Minor
 Fix For: 0.99.0

 Attachments: HBASE_11388.patch


 The parameters are "Configuration", "ClusterKey" and "id" in the constructor 
 of the class ReplicationPeer, but the parameter order is "Configuration", 
 "id" and "ClusterKey" when invoking the constructor of ReplicationPeer in 
 the method getPeer of the class ReplicationPeersZKImpl.
 ReplicationPeer#76
 {code}
 public ReplicationPeer(Configuration conf, String key, String id) throws ReplicationException {
   this.conf = conf;
   this.clusterKey = key;
   this.id = id;
   try {
     this.reloadZkWatcher();
   } catch (IOException e) {
     throw new ReplicationException("Error connecting to peer cluster with peerId=" + id, e);
   }
 }
 {code}
 ReplicationPeersZKImpl#498
 {code}
 ReplicationPeer peer =
     new ReplicationPeer(peerConf, peerId, ZKUtil.getZooKeeperClusterKey(peerConf));
 {code}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-6192) Document ACL matrix in the book

2014-06-16 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6192?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14032601#comment-14032601
 ] 

Jean-Daniel Cryans commented on HBASE-6192:
---

bq. The code seems to indicate that BulkLoadHFile requires CREATE but the PDF 
says it requires WRITE. Which is true?

I changed it in HBASE-11008, the explanation is in the release note.

 Document ACL matrix in the book
 ---

 Key: HBASE-6192
 URL: https://issues.apache.org/jira/browse/HBASE-6192
 Project: HBase
  Issue Type: Task
  Components: documentation, security
Affects Versions: 0.94.1, 0.95.2
Reporter: Enis Soztutar
Assignee: Misty Stanley-Jones
  Labels: documentaion, security
 Fix For: 0.99.0

 Attachments: HBASE-6192-rebased.patch, HBASE-6192.patch, HBase 
 Security-ACL Matrix.pdf, HBase Security-ACL Matrix.pdf, HBase Security-ACL 
 Matrix.pdf, HBase Security-ACL Matrix.xls, HBase Security-ACL Matrix.xls, 
 HBase Security-ACL Matrix.xls


 We have an excellent matrix at 
 https://issues.apache.org/jira/secure/attachment/12531252/Security-ACL%20Matrix.pdf
  for ACL. Once the changes are done, we can adapt that and put it in the 
 book, also add some more documentation about the new authorization features. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-06-16 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14033216#comment-14033216
 ] 

Jean-Daniel Cryans commented on HBASE-10504:


bq.  Do you want to do this in a subtask?

Yes, please.

 Define Replication Interface
 

 Key: HBASE-10504
 URL: https://issues.apache.org/jira/browse/HBASE-10504
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: Enis Soztutar
Priority: Blocker
 Fix For: 0.99.0

 Attachments: hbase-10504_v1.patch, hbase-10504_wip1.patch


 HBase has replication.  Fellas have been hijacking the replication apis to do 
 all kinds of perverse stuff like indexing hbase content (hbase-indexer 
 https://github.com/NGDATA/hbase-indexer) and our [~toffer] just showed up w/ 
 overrides that replicate via an alternate channel (over a secure thrift 
 channel between dcs over on HBASE-9360).  This issue is about surfacing these 
 APIs as public with guarantees to downstreamers similar to those we have on 
 our public client-facing APIs (and so we don't break them for downstreamers).
 Any input [~phunt] or [~gabriel.reid] or [~toffer]?
 Thanks.
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-8844) Document the removal of replication state AKA start/stop_replication

2014-06-13 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-8844:
--

Summary: Document the removal of replication state AKA 
start/stop_replication  (was: Document stop_replication danger)

Updating the jira's summary to better reflect what we ended up doing.

 Document the removal of replication state AKA start/stop_replication
 

 Key: HBASE-8844
 URL: https://issues.apache.org/jira/browse/HBASE-8844
 Project: HBase
  Issue Type: Task
  Components: documentation
Reporter: Patrick
Assignee: Misty Stanley-Jones
Priority: Minor
 Attachments: HBASE-8844-v2.patch, HBASE-8844-v3-rebased.patch, 
 HBASE-8844-v3.patch, HBASE-8844.patch


 The first two tutorials for enabling replication that google gives me [1], 
 [2] take very different tones with regard to stop_replication. The HBase docs 
 [1] make it sound fine to start and stop replication as desired. The Cloudera 
 docs [2] say it may cause data loss.
 Which is true? If data loss is possible, are we talking about data loss in 
 the primary cluster, or data loss in the standby cluster (presumably would 
 require reinitializing the sync with a new CopyTable).
 [1] 
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/replication/package-summary.html#requirements
 [2] 
 http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_20_11.html



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-8844) Document the removal of replication state AKA start/stop_replication

2014-06-13 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-8844:
--

   Resolution: Fixed
Fix Version/s: 0.99.0
   Status: Resolved  (was: Patch Available)

Pushed to trunk, thanks Misty!

 Document the removal of replication state AKA start/stop_replication
 

 Key: HBASE-8844
 URL: https://issues.apache.org/jira/browse/HBASE-8844
 Project: HBase
  Issue Type: Task
  Components: documentation
Reporter: Patrick
Assignee: Misty Stanley-Jones
Priority: Minor
 Fix For: 0.99.0

 Attachments: HBASE-8844-v2.patch, HBASE-8844-v3-rebased.patch, 
 HBASE-8844-v3.patch, HBASE-8844.patch


 The first two tutorials for enabling replication that google gives me [1], 
 [2] take very different tones with regard to stop_replication. The HBase docs 
 [1] make it sound fine to start and stop replication as desired. The Cloudera 
 docs [2] say it may cause data loss.
 Which is true? If data loss is possible, are we talking about data loss in 
 the primary cluster, or data loss in the standby cluster (presumably would 
 require reinitializing the sync with a new CopyTable).
 [1] 
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/replication/package-summary.html#requirements
 [2] 
 http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_20_11.html



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-06-12 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029275#comment-14029275
 ] 

Jean-Daniel Cryans commented on HBASE-10504:


[~enis], as per the discussion above, I think what you are proposing is good 
but it doesn't solve the problem that this jira is about. How about we open a 
new jira, titled something like "Make ReplicationSource extendable", and do 
the review/commit over there?

 Define Replication Interface
 

 Key: HBASE-10504
 URL: https://issues.apache.org/jira/browse/HBASE-10504
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: Enis Soztutar
Priority: Blocker
 Fix For: 0.99.0

 Attachments: hbase-10504_v1.patch, hbase-10504_wip1.patch


 HBase has replication.  Fellas have been hijacking the replication apis to do 
 all kinds of perverse stuff like indexing hbase content (hbase-indexer 
 https://github.com/NGDATA/hbase-indexer) and our [~toffer] just showed up w/ 
 overrides that replicate via an alternate channel (over a secure thrift 
 channel between dcs over on HBASE-9360).  This issue is about surfacing these 
 APIs as public with guarantees to downstreamers similar to those we have on 
 our public client-facing APIs (and so we don't break them for downstreamers).
 Any input [~phunt] or [~gabriel.reid] or [~toffer]?
 Thanks.
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-06-12 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14029806#comment-14029806
 ] 

Jean-Daniel Cryans commented on HBASE-10504:


As long as the RE daemons are disjoint from the region servers, that would 
work, since you need to be able to have as many/few of them as you want, plus 
be able to have them on different machines.

I could even see this being useful for certain use cases that want 
inter-HBase replication to be done through specific gateways.

We then have to architect a way to discover those endpoints, etc., kind of 
like what the current replication code already does with sinks.

 Define Replication Interface
 

 Key: HBASE-10504
 URL: https://issues.apache.org/jira/browse/HBASE-10504
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: Enis Soztutar
Priority: Blocker
 Fix For: 0.99.0

 Attachments: hbase-10504_v1.patch, hbase-10504_wip1.patch


 HBase has replication.  Fellas have been hijacking the replication apis to do 
 all kinds of perverse stuff like indexing hbase content (hbase-indexer 
 https://github.com/NGDATA/hbase-indexer) and our [~toffer] just showed up w/ 
 overrides that replicate via an alternate channel (over a secure thrift 
 channel between dcs over on HBASE-9360).  This issue is about surfacing these 
 APIs as public with guarantees to downstreamers similar to those we have on 
 our public client-facing APIs (and so we don't break them for downstreamers).
 Any input [~phunt] or [~gabriel.reid] or [~toffer]?
 Thanks.
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11120) Update documentation about major compaction algorithm

2014-06-09 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025292#comment-14025292
 ] 

Jean-Daniel Cryans commented on HBASE-11120:


bq.  I don't see hbase.hstore.compaction.min.size in code base any more

Yeah I was confused by this too, [~stack], until I found this thread that you 
started for this reason a while back: 
http://osdir.com/ml/general/2013-06/msg41068.html

Basically, {{CompactionConfiguration}} was written with a configuration 
prefix, so doing a grep on that config (or hbase.hstore.compaction.max.size) 
won't return anything. I think we should undo the refactoring in that class, 
since it obviously breaks a pattern we're used to, which is to completely 
spell out configuration names so we can easily search for them.
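
(For anyone hitting the same grep dead end, the pattern in question looks 
roughly like this; a sketch reconstructed from memory, so treat the exact 
names as assumptions:)

{code}
public class CompactionConfiguration {
  private static final String CONFIG_PREFIX = "hbase.hstore.compaction.";
  long minCompactSize;
  long maxCompactSize;

  CompactionConfiguration(org.apache.hadoop.conf.Configuration conf, long memstoreFlushSize) {
    // The full key never appears verbatim in the source; it is assembled at
    // runtime, which is why grepping for "hbase.hstore.compaction.min.size"
    // returns nothing.
    minCompactSize = conf.getLong(CONFIG_PREFIX + "min.size", memstoreFlushSize);
    maxCompactSize = conf.getLong(CONFIG_PREFIX + "max.size", Long.MAX_VALUE);
  }
}
{code}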

 Update documentation about major compaction algorithm
 -

 Key: HBASE-11120
 URL: https://issues.apache.org/jira/browse/HBASE-11120
 Project: HBase
  Issue Type: Bug
  Components: Compaction, documentation
Affects Versions: 0.98.2
Reporter: Misty Stanley-Jones
Assignee: Misty Stanley-Jones
 Attachments: HBASE-11120-2.patch, HBASE-11120-3-rebased.patch, 
 HBASE-11120-3.patch, HBASE-11120-4-rebased.patch, HBASE-11120-4.patch, 
 HBASE-11120.patch


 [14:20:38] <jdcryans> seems that there's 
 http://hbase.apache.org/book.html#compaction and 
 http://hbase.apache.org/book.html#managed.compactions
 [14:20:56] <jdcryans> the latter doesn't say much, except that you 
 should manage them
 [14:21:44] <jdcryans> the former gives a good description of the 
 _old_ selection algo
 [14:45:25] <jdcryans> this is the new selection algo since C5 / 
 0.96.0: https://issues.apache.org/jira/browse/HBASE-7842



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-8844) Document stop_replication danger

2014-06-09 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14025308#comment-14025308
 ] 

Jean-Daniel Cryans commented on HBASE-8844:
---

[~misty] I was about to commit and was double-checking the documentation, and 
I figured that we should remove the whole "state znode" section since that was 
also connected to start/stop_replication. I can do it on commit if that's ok 
with you.

 Document stop_replication danger
 

 Key: HBASE-8844
 URL: https://issues.apache.org/jira/browse/HBASE-8844
 Project: HBase
  Issue Type: Task
  Components: documentation
Reporter: Patrick
Assignee: Misty Stanley-Jones
Priority: Minor
 Attachments: HBASE-8844-v2.patch, HBASE-8844-v3-rebased.patch, 
 HBASE-8844-v3.patch, HBASE-8844.patch


 The first two tutorials for enabling replication that google gives me [1], 
 [2] take very different tones with regard to stop_replication. The HBase docs 
 [1] make it sound fine to start and stop replication as desired. The Cloudera 
 docs [2] say it may cause data loss.
 Which is true? If data loss is possible, are we talking about data loss in 
 the primary cluster, or data loss in the standby cluster (presumably would 
 require reinitializing the sync with a new CopyTable).
 [1] 
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/replication/package-summary.html#requirements
 [2] 
 http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_20_11.html



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-8844) Document stop_replication danger

2014-05-29 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8844?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012492#comment-14012492
 ] 

Jean-Daniel Cryans commented on HBASE-8844:
---

This part isn't right:

{quote}
It is also cached locally using an AtomicBoolean in the 
<i>ReplicationZookeeper</i> 
-class. This value can be changed on a live cluster using the 
<i>stop_replication</i> 
+class. This value can be changed on a live cluster using the 
<i>disable_peer</i> 
{quote}

The AtomicBoolean was completely removed, so there's now no way to toggle 
hbase.replication while the cluster is running. So that part can be completely 
removed, and hbase.replication is now only read when HBase starts.

Regarding disable_peer, it might be good to add that what it does is it stops 
sending the edits to that peer cluster, but it still keeps track of all the new 
WALs that it will have to replicate when it's re-enabled.
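For illustration, a minimal sketch of toggling a peer programmatically, assuming 
the ReplicationAdmin client API of this era (disablePeer/enablePeer, mirroring 
the disable_peer/enable_peer shell commands):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.client.replication.ReplicationAdmin;

public class PeerToggleSketch {
  public static void main(String[] args) throws Exception {
    Configuration conf = HBaseConfiguration.create();
    ReplicationAdmin admin = new ReplicationAdmin(conf);
    // Stop shipping edits to peer "1"; new WALs are still tracked while disabled.
    admin.disablePeer("1");
    // Once re-enabled, the WALs accumulated in the meantime get replicated.
    admin.enablePeer("1");
  }
}
{code}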

 Document stop_replication danger
 

 Key: HBASE-8844
 URL: https://issues.apache.org/jira/browse/HBASE-8844
 Project: HBase
  Issue Type: Task
  Components: documentation
Reporter: Patrick
Assignee: Misty Stanley-Jones
Priority: Minor
 Attachments: HBASE-8844-v2.patch, HBASE-8844.patch


 The first two tutorials for enabling replication that google gives me [1], 
 [2] take very different tones with regard to stop_replication. The HBase docs 
 [1] make it sound fine to start and stop replication as desired. The Cloudera 
 docs [2] say it may cause data loss.
 Which is true? If data loss is possible, are we talking about data loss in 
 the primary cluster, or data loss in the standby cluster (presumably would 
 require reinitializing the sync with a new CopyTable).
 [1] 
 http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/replication/package-summary.html#requirements
 [2] 
 http://www.cloudera.com/content/cloudera-content/cloudera-docs/CDH4/4.2.0/CDH4-Installation-Guide/cdh4ig_topic_20_11.html



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11120) Update documentation about major compaction algorithm

2014-05-29 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012875#comment-14012875
 ] 

Jean-Daniel Cryans commented on HBASE-11120:


Spent some more time reading through and I saw that I had missed a few things. 
BTW, thanks again Misty for working on this, it is much needed.

I'm not sure about this part:

{quote}
However, if the normal compaction
+algorithm do not find any normally-eligible StoreFiles, a major compaction is
+the only way to get out of this situation, and is forced. This is also called a
+size-based or size-triggered major compaction.
{quote}

The exploration compaction will select the smallest set of files to compact and 
it won't trigger a major compaction. Also, I looked for size-based and 
size-triggered in the code but couldn't find anything so I'm not sure what 
this is referring to.

This is wrong:

{quote}
+<term>If this compaction was user-requested, do a major compaction.</term>
+<listitem>
+  <para>Compactions can run on a schedule or can be initiated manually. If a
+    compaction is requested manually, it is always a major compaction. If the
+    compaction is user-requested, the major compaction still happens even if there are
+    more than <varname>hbase.hstore.compaction.max</varname> files that need
+    compaction.</para>
+</listitem>
{quote}

It will do a minor compaction if the user asks for that, or a major compaction 
if it's specifically asked for.

It might be good to mention that hbase.hstore.compactionThreshold was the old 
name for hbase.hstore.compaction.min in this section:

{quote}
+  <row>
+    <entry>hbase.hstore.compaction.min</entry>
+    <entry>The minimum number of files which must be eligible for compaction before
+      compaction can run.</entry>
+    <entry>3</entry>
+  </row>
{quote}

Those two are wrong (if 1000 bytes was the max we'd probably never compact :P):

{quote}
+  <row>
+    <entry>hbase.hstore.compaction.min.size</entry>
+    <entry>A StoreFile smaller than this size (in bytes) will always be eligible for
+      minor compaction.</entry>
+    <entry>10</entry>
+  </row>
+  <row>
+    <entry>hbase.hstore.compaction.max.size</entry>
+    <entry>A StoreFile larger than this size (in bytes) will be excluded from minor
+      compaction.</entry>
+    <entry>1000</entry>
+  </row>
{quote}

It should be 128MB and Long.MAX_VALUE as per: 
https://github.com/apache/hbase/blob/master/hbase-server/src/main/java/org/apache/hadoop/hbase/regionserver/compactions/CompactionConfiguration.java#L71
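For reference, a sketch of what reading those bounds with the correct defaults 
looks like (illustrative only; the real defaults live in CompactionConfiguration, 
where min.size defaults to the memstore flush size):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;

public class CompactionSizeDefaultsSketch {
  public static void main(String[] args) {
    Configuration conf = HBaseConfiguration.create();
    // Files below this are always eligible for minor compaction: 128 MB, not 10 bytes.
    long minSize = conf.getLong("hbase.hstore.compaction.min.size", 128L * 1024 * 1024);
    // Files above this are excluded from minor compaction: unbounded, not 1000 bytes.
    long maxSize = conf.getLong("hbase.hstore.compaction.max.size", Long.MAX_VALUE);
    System.out.println(minSize + " / " + maxSize);
  }
}
{code}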

 Update documentation about major compaction algorithm
 -

 Key: HBASE-11120
 URL: https://issues.apache.org/jira/browse/HBASE-11120
 Project: HBase
  Issue Type: Bug
  Components: Compaction, documentation
Affects Versions: 0.98.2
Reporter: Misty Stanley-Jones
Assignee: Misty Stanley-Jones
 Attachments: HBASE-11120-2.patch, HBASE-11120-3-rebased.patch, 
 HBASE-11120-3.patch, HBASE-11120.patch


 [14:20:38] <jdcryans> seems that there's 
 http://hbase.apache.org/book.html#compaction and 
 http://hbase.apache.org/book.html#managed.compactions
 [14:20:56] <jdcryans> the latter doesn't say much, except that you 
 should manage them
 [14:21:44] <jdcryans> the former gives a good description of the 
 _old_ selection algo
 [14:45:25] <jdcryans> this is the new selection algo since C5 / 
 0.96.0: https://issues.apache.org/jira/browse/HBASE-7842



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11196) Update description of -ROOT- in ref guide

2014-05-28 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11196?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14011827#comment-14011827
 ] 

Jean-Daniel Cryans commented on HBASE-11196:


Nit:

bq. tracked by Zookeeper

Should be "stored in ZooKeeper".

Regarding the section about {{HTablePool}}, it's now a deprecated class, so we 
should eventually update that too (but let's stick to this jira's scope).

The rest looks good!


 Update description of -ROOT- in ref guide
 -

 Key: HBASE-11196
 URL: https://issues.apache.org/jira/browse/HBASE-11196
 Project: HBase
  Issue Type: Bug
  Components: documentation
Reporter: Dima Spivak
Assignee: Misty Stanley-Jones
 Attachments: HBASE-11196.patch


 Since the resolution of HBASE-3171, \-ROOT- is no longer used to store the 
 location(s) of .META. . Unfortunately, not all of [our 
 documentation|http://hbase.apache.org/book/arch.catalog.html] has been 
 updated to reflect this change in architecture.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11120) Update documentation about major compaction algorithm

2014-05-22 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11120?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14006480#comment-14006480
 ] 

Jean-Daniel Cryans commented on HBASE-11120:


On top of Sergey's comments:

This could also use a sentence regarding TTLs: they are simply filtered out and 
tombstones aren't created:

{noformat}
bq. <title>Compaction and Deletions</title>
{noformat}

Same kind of comment for this para: the old versions are filtered more than 
deleted:

{noformat}
+<title>Compaction and Versions</title>
{noformat}

One thing bothers me about this para:

{noformat}
+<para>The compaction algorithms used by HBase have evolved over time. HBase 0.96
+  introduced a new algorithm for compaction file selection. To find out about the old
+  algorithm, see <xref linkend="compaction" />. The rest of this section describes the
+  new algorithm, which was implemented in
+  <link xlink:href="https://issues.apache.org/jira/browse/HBASE-7842">HBASE-7842</link>.
{noformat}

What's written next is much more about how to pick the files that will then be 
considered for compaction than about the actual new policy, which is in 
ExploringCompactionPolicy.java. In other words, the text that follows what I 
just quoted is ok but not really new in 0.96 via HBASE-7842, and it seems to me 
that we should explain what HBASE-7842 does.

 Update documentation about major compaction algorithm
 -

 Key: HBASE-11120
 URL: https://issues.apache.org/jira/browse/HBASE-11120
 Project: HBase
  Issue Type: Bug
  Components: Compaction, documentation
Affects Versions: 0.98.2
Reporter: Misty Stanley-Jones
Assignee: Misty Stanley-Jones
 Attachments: HBASE-11120.patch


 [14:20:38] <jdcryans> seems that there's 
 http://hbase.apache.org/book.html#compaction and 
 http://hbase.apache.org/book.html#managed.compactions
 [14:20:56] <jdcryans> the latter doesn't say much, except that you 
 should manage them
 [14:21:44] <jdcryans> the former gives a good description of the 
 _old_ selection algo
 [14:45:25] <jdcryans> this is the new selection algo since C5 / 
 0.96.0: https://issues.apache.org/jira/browse/HBASE-7842



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-5671) hbase.metrics.showTableName should be true by default

2014-05-16 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5671?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13999172#comment-13999172
 ] 

Jean-Daniel Cryans commented on HBASE-5671:
---

For anyone stumbling here because of this message "Inconsistent configuration. 
Previous configuration for using table name in metrics: true, new 
configuration: false", please see HBASE-11188.

 hbase.metrics.showTableName should be true by default
 -

 Key: HBASE-5671
 URL: https://issues.apache.org/jira/browse/HBASE-5671
 Project: HBase
  Issue Type: Improvement
  Components: metrics
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Critical
 Fix For: 0.94.0, 0.95.0

 Attachments: HBASE-5671_v1.patch


 HBASE-4768 added per-cf metrics and a new configuration option 
 hbase.metrics.showTableName. We should switch the conf option to true by 
 default, since it is not intuitive (at least to me) to aggregate per-cf 
 across tables by default, and it seems confusing to report on cf's without 
 table names. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HBASE-10990) Running CompressionTest yields unrelated schema metrics configuration errors.

2014-05-16 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10990?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans resolved HBASE-10990.


Resolution: Duplicate

I'm fixing this in HBASE-11188.

 Running CompressionTest yields unrelated schema metrics configuration errors.
 -

 Key: HBASE-10990
 URL: https://issues.apache.org/jira/browse/HBASE-10990
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.18
Reporter: Srikanth Srungarapu
Assignee: Srikanth Srungarapu
Priority: Minor
 Attachments: HBASE-10990.patch


 In a vanilla configuration, running CompressionTest yields the following 
 error:
 sudo -u hdfs hbase org.apache.hadoop.hbase.util.CompressionTest 
 /path/to/hfile gz
 Output:
 13/03/07 14:49:40 ERROR metrics.SchemaMetrics: Inconsistent configuration. 
 Previous configuration for using table name in metrics: true, new 
 configuration: false



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HBASE-11188) Inconsistent configuration for SchemaMetrics is always shown

2014-05-16 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans resolved HBASE-11188.


  Resolution: Fixed
Release Note: 
Region servers with the default value for hbase.metrics.showTableName will stop 
showing the error message "ERROR 
org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics: Inconsistent 
configuration. Previous configuration for using table name in metrics: true, 
new configuration: false".
Region servers configured with hbase.metrics.showTableName=false should now get 
a message like this one: "ERROR 
org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics: Inconsistent 
configuration. Previous configuration for using table name in metrics: false, 
new configuration: true", and it's nothing to be concerned about.
Hadoop Flags: Reviewed

Thanks, Matteo. I committed the patch to 0.94.

 Inconsistent configuration for SchemaMetrics is always shown
 --

 Key: HBASE-11188
 URL: https://issues.apache.org/jira/browse/HBASE-11188
 Project: HBase
  Issue Type: Bug
  Components: metrics
Affects Versions: 0.94.19
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.94.20

 Attachments: HBASE-11188-0.94-v2.patch, HBASE-11188-0.94.patch


 Some users have been complaining about this message:
 {noformat}
 ERROR org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics: 
 Inconsistent configuration. Previous configuration for using table name in 
 metrics: true, new configuration: false
 {noformat}
 The interesting thing is that we see it with default configurations, which 
 made me think that some code path must have been passing the wrong thing. I 
 found that if SchemaConfigured is passed a null Configuration in its 
 constructor that it will then pass null to SchemaMetrics#configureGlobally 
 which will interpret useTableName as being false:
 {code}
   public static void configureGlobally(Configuration conf) {
 if (conf != null) {
   final boolean useTableNameNew =
   conf.getBoolean(SHOW_TABLE_NAME_CONF_KEY, false);
   setUseTableName(useTableNameNew);
 } else {
   setUseTableName(false);
 }
   }
 {code}
 It should be set to true since that's the new default, meaning we missed it 
 in HBASE-5671.
 I found one code path that passes a null configuration, StoreFile.Reader 
 extends SchemaConfigured and uses the constructor that only passes a Path, so 
 the Configuration is set to null.
 I'm planning on just passing true instead of false, fixing the problem for 
 almost everyone (those that disable this feature will get the error message). 
 IMO it's not worth more efforts since it's a 0.94-only problem and it's not 
 actually doing anything bad.
 I'm closing both HBASE-10990 and HBASE-10946 as duplicates.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11188) Inconsistent configuration for SchemaMetrics is always shown

2014-05-16 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-11188:
---

Attachment: HBASE-11188-0.94.patch

Proposing this one-liner patch; it passes both TestSchemaConfigured and 
TestSchemaMetrics.

 Inconsistent configuration for SchemaMetrics is always shown
 --

 Key: HBASE-11188
 URL: https://issues.apache.org/jira/browse/HBASE-11188
 Project: HBase
  Issue Type: Bug
  Components: metrics
Affects Versions: 0.94.19
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.94.20

 Attachments: HBASE-11188-0.94-v2.patch, HBASE-11188-0.94.patch


 Some users have been complaining about this message:
 {noformat}
 ERROR org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics: 
 Inconsistent configuration. Previous configuration for using table name in 
 metrics: true, new configuration: false
 {noformat}
 The interesting thing is that we see it with default configurations, which 
 made me think that some code path must have been passing the wrong thing. I 
 found that if SchemaConfigured is passed a null Configuration in its 
 constructor that it will then pass null to SchemaMetrics#configureGlobally 
 which will interpret useTableName as being false:
 {code}
   public static void configureGlobally(Configuration conf) {
 if (conf != null) {
   final boolean useTableNameNew =
   conf.getBoolean(SHOW_TABLE_NAME_CONF_KEY, false);
   setUseTableName(useTableNameNew);
 } else {
   setUseTableName(false);
 }
   }
 {code}
 It should be set to true since that's the new default, meaning we missed it 
 in HBASE-5671.
 I found one code path that passes a null configuration, StoreFile.Reader 
 extends SchemaConfigured and uses the constructor that only passes a Path, so 
 the Configuration is set to null.
 I'm planning on just passing true instead of false, fixing the problem for 
 almost everyone (those that disable this feature will get the error message). 
 IMO it's not worth more efforts since it's a 0.94-only problem and it's not 
 actually doing anything bad.
 I'm closing both HBASE-10990 and HBASE-10946 as duplicates.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11188) Inconsistent configuration for SchemaMetrics is always shown

2014-05-16 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11188?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-11188:
---

Attachment: HBASE-11188-0.94-v2.patch

To be extra correct, there's another line that should default to true; attaching 
the patch.

 Inconsistent configuration for SchemaMetrics is always shown
 --

 Key: HBASE-11188
 URL: https://issues.apache.org/jira/browse/HBASE-11188
 Project: HBase
  Issue Type: Bug
  Components: metrics
Affects Versions: 0.94.19
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.94.20

 Attachments: HBASE-11188-0.94-v2.patch, HBASE-11188-0.94.patch


 Some users have been complaining about this message:
 {noformat}
 ERROR org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics: 
 Inconsistent configuration. Previous configuration for using table name in 
 metrics: true, new configuration: false
 {noformat}
 The interesting thing is that we see it with default configurations, which 
 made me think that some code path must have been passing the wrong thing. I 
 found that if SchemaConfigured is passed a null Configuration in its 
 constructor that it will then pass null to SchemaMetrics#configureGlobally 
 which will interpret useTableName as being false:
 {code}
   public static void configureGlobally(Configuration conf) {
 if (conf != null) {
   final boolean useTableNameNew =
   conf.getBoolean(SHOW_TABLE_NAME_CONF_KEY, false);
   setUseTableName(useTableNameNew);
 } else {
   setUseTableName(false);
 }
   }
 {code}
 It should be set to true since that's the new default, meaning we missed it 
 in HBASE-5671.
 I found one code path that passes a null configuration, StoreFile.Reader 
 extends SchemaConfigured and uses the constructor that only passes a Path, so 
 the Configuration is set to null.
 I'm planning on just passing true instead of false, fixing the problem for 
 almost everyone (those that disable this feature will get the error message). 
 IMO it's not worth more efforts since it's a 0.94-only problem and it's not 
 actually doing anything bad.
 I'm closing both HBASE-10990 and HBASE-10946 as duplicates.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-6990) Pretty print TTL

2014-05-16 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6990?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14000440#comment-14000440
 ] 

Jean-Daniel Cryans commented on HBASE-6990:
---

That representation is pretty much my original ask, +1

 Pretty print TTL
 

 Key: HBASE-6990
 URL: https://issues.apache.org/jira/browse/HBASE-6990
 Project: HBase
  Issue Type: Improvement
  Components: Usability
Reporter: Jean-Daniel Cryans
Assignee: Esteban Gutierrez
Priority: Minor
 Attachments: HBASE-6990.v0.patch, HBASE-6990.v1.patch, 
 HBASE-6990.v2.patch, HBASE-6990.v3.patch, HBASE-6990.v4.patch


 I've seen a lot of users getting confused by the TTL configuration and I 
 think that if we just pretty printed it it would solve most of the issues. 
 For example, let's say a user wanted to set a TTL of 90 days. That would be 
 7776000. But let's say that it was typo'd to 77760000 instead: it gives you 
 900 days!
 So when we print the TTL we could do something like "x days, x hours, x 
 minutes, x seconds (real_ttl_value)". This would also help people when they 
 use ms instead of seconds as they would see really big values in there.
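 Something along these lines would do it (a minimal sketch; the class and 
 method names are made up):

{code}
public class PrettyTtlSketch {
  /** Renders a TTL in seconds as "x days, x hours, x minutes, x seconds (ttl)". */
  public static String format(long ttlSeconds) {
    long days = ttlSeconds / 86400;
    long hours = (ttlSeconds % 86400) / 3600;
    long minutes = (ttlSeconds % 3600) / 60;
    long seconds = ttlSeconds % 60;
    return days + " days, " + hours + " hours, " + minutes + " minutes, "
        + seconds + " seconds (" + ttlSeconds + ")";
  }

  public static void main(String[] args) {
    System.out.println(format(7776000L));  // 90 days, 0 hours, 0 minutes, 0 seconds
    System.out.println(format(77760000L)); // 900 days, the typo jumps right out
  }
}
{code}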



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-11188) Inconsistent configuration for SchemaMetrics is always shown

2014-05-16 Thread Jean-Daniel Cryans (JIRA)
Jean-Daniel Cryans created HBASE-11188:
--

 Summary: Inconsistent configuration for SchemaMetrics is always 
shown
 Key: HBASE-11188
 URL: https://issues.apache.org/jira/browse/HBASE-11188
 Project: HBase
  Issue Type: Bug
  Components: metrics
Affects Versions: 0.94.19
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.94.20
 Attachments: HBASE-11188-0.94-v2.patch, HBASE-11188-0.94.patch

Some users have been complaining about this message:

{noformat}
ERROR org.apache.hadoop.hbase.regionserver.metrics.SchemaMetrics: Inconsistent 
configuration. Previous configuration for using table name in metrics: true, 
new configuration: false
{noformat}

The interesting thing is that we see it with default configurations, which made 
me think that some code path must have been passing the wrong thing. I found 
that if SchemaConfigured is passed a null Configuration in its constructor that 
it will then pass null to SchemaMetrics#configureGlobally which will interpret 
useTableName as being false:

{code}
  public static void configureGlobally(Configuration conf) {
if (conf != null) {
  final boolean useTableNameNew =
  conf.getBoolean(SHOW_TABLE_NAME_CONF_KEY, false);
  setUseTableName(useTableNameNew);
} else {
  setUseTableName(false);
}
  }
{code}

It should be set to true since that's the new default, meaning we missed it in 
HBASE-5671.

I found one code path that passes a null configuration, StoreFile.Reader 
extends SchemaConfigured and uses the constructor that only passes a Path, so 
the Configuration is set to null.

I'm planning on just passing true instead of false, fixing the problem for 
almost everyone (those that disable this feature will get the error message). 
IMO it's not worth more efforts since it's a 0.94-only problem and it's not 
actually doing anything bad.

I'm closing both HBASE-10990 and HBASE-10946 as duplicates.
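A sketch of the proposed one-liner, mirroring the method quoted above 
(consistent with the description and the release note; not necessarily the 
committed patch verbatim):

{code}
public static void configureGlobally(Configuration conf) {
  if (conf != null) {
    final boolean useTableNameNew =
        conf.getBoolean(SHOW_TABLE_NAME_CONF_KEY, true);  // default flipped to true
    setUseTableName(useTableNameNew);
  } else {
    // No Configuration available: assume the 0.94 default (true) instead of false.
    setUseTableName(true);
  }
}
{code}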



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11133) Add an option to skip snapshot verification after Export

2014-05-16 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11133?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13993007#comment-13993007
 ] 

Jean-Daniel Cryans commented on HBASE-11133:


Something like -skip-copy-sanity-check would be better; also, maybe describe 
more what the sanity check is doing?

 Add an option to skip snapshot verification after Export
 

 Key: HBASE-11133
 URL: https://issues.apache.org/jira/browse/HBASE-11133
 Project: HBase
  Issue Type: Bug
  Components: snapshots
Affects Versions: 0.99.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Trivial
 Fix For: 0.99.0

 Attachments: HBASE-11133-v0.patch, HBASE-11133-v1.patch


 Add a -skip-dst-verify option to skip snapshot verification after Export



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HBASE-10946) hbase.metrics.showTableName cause run “hbase xxx.Hfile” report Inconsistent configuration

2014-05-16 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10946?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans resolved HBASE-10946.


Resolution: Duplicate

I'm fixing this in HBASE-11188.

 hbase.metrics.showTableName  cause run “hbase xxx.Hfile”  report Inconsistent 
 configuration
 ---

 Key: HBASE-10946
 URL: https://issues.apache.org/jira/browse/HBASE-10946
 Project: HBase
  Issue Type: Bug
  Components: HFile
Affects Versions: 0.94.17
 Environment:  hadoop 1.2.1  hbase0.94.17
Reporter: Zhang Jingpeng
Priority: Minor

 when I run hbase org.apache.hadoop.hbase.io.hfile.Hfile, it prints the error 
 info:
 ERROR metrics.SchemaMetrics: Inconsistent configuration. Previous 
 configuration for using table name in metrics: true, new configuration: false
 I traced the report to the setUseTableName method in Hfile, which is called 
 like this:
  final boolean useTableNameNew =
   conf.getBoolean(SHOW_TABLE_NAME_CONF_KEY, false);
   setUseTableName(useTableNameNew);
 but hbase-default.xml has <name>hbase.metrics.showTableName</name>
 <value>true</value>



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10251) Restore API Compat for PerformanceEvaluation.generateValue()

2014-05-15 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-10251:
---

   Resolution: Fixed
Fix Version/s: (was: 0.98.2)
   (was: 0.98.1)
   (was: 0.98.0)
   0.98.3
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to 0.98 and trunk, thanks Dima!

 Restore API Compat for PerformanceEvaluation.generateValue()
 

 Key: HBASE-10251
 URL: https://issues.apache.org/jira/browse/HBASE-10251
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.98.0, 0.98.1, 0.99.0, 0.98.2
Reporter: Aleksandr Shulman
Assignee: Dima Spivak
  Labels: api_compatibility
 Fix For: 0.99.0, 0.98.3

 Attachments: HBASE-10251-v2.patch, HBASE_10251.patch, 
 HBASE_10251_v3.patch


 Observed:
 A couple of my client tests fail to compile against trunk because the method 
 PerformanceEvaluation.generateValue was removed as part of HBASE-8496.
 This is an issue because it was used in a number of places, including unit 
 tests. Since we did not explicitly label this API as private, it's ambiguous 
 as to whether this could/should have been used by people writing apps against 
 0.96. If they used it, then they would be broken upon upgrade to 0.98 and 
 trunk.
 Potential Solution:
 The method was renamed to generateData, but the logic is still the same. We 
 can reintroduce it as deprecated in 0.98, as compat shim over generateData. 
 The patch should be a few lines. We may also consider doing so in trunk, but 
 I'd be just as fine with leaving it out.
 More generally, this raises the question about what other code is in this 
 grey-area, where it is public, is used outside of the package, but is not 
 explicitly labeled with an AudienceInterface.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-05-15 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997727#comment-13997727
 ] 

Jean-Daniel Cryans commented on HBASE-10504:


[~enis], sounds like a plan. Oh and something we should not forget is setting 
the right InterfaceAudience. 
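For example, a hypothetical sketch of what an explicit annotation could look 
like on the new consumer surface (the interface name follows the 
ReplicationConsumer idea discussed on this issue; the annotation value is an 
assumption):

{code}
import org.apache.hadoop.classification.InterfaceAudience;

// Hypothetical: mark the pluggable replication surface as limited-private so
// downstreamers get compat guarantees similar to the client-facing APIs.
@InterfaceAudience.LimitedPrivate("Replication")
public interface ReplicationConsumer {
  // consume batches of WAL edits destined for this consumer
}
{code}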

[~gabriel.reid] you down with this? It means that this class (1) will need a 
few changes just to continue working like it currently does, and eventually a 
lot more will need to change to use the replication consumer, but it should 
simplify the code a lot. It will also have an impact on the deployment since 
you won't need a fake slave cluster anymore.

1. 
https://github.com/NGDATA/hbase-indexer/blob/master/hbase-sep/hbase-sep-impl-0.98/src/main/java/com/ngdata/sep/impl/SepReplicationSource.java

 Define Replication Interface
 

 Key: HBASE-10504
 URL: https://issues.apache.org/jira/browse/HBASE-10504
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.99.0

 Attachments: hbase-10504_wip1.patch


 HBase has replication.  Fellas have been hijacking the replication apis to do 
 all kinds of perverse stuff like indexing hbase content (hbase-indexer 
 https://github.com/NGDATA/hbase-indexer) and our [~toffer] just showed up w/ 
 overrides that replicate via an alternate channel (over a secure thrift 
 channel between dcs over on HBASE-9360).  This issue is about surfacing these 
 APIs as public with guarantees to downstreamers similar to those we have on 
 our public client-facing APIs (and so we don't break them for downstreamers).
 Any input [~phunt] or [~gabriel.reid] or [~toffer]?
 Thanks.
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10251) Restore API Compat for PerformanceEvaluation.generateValue()

2014-05-13 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13996651#comment-13996651
 ] 

Jean-Daniel Cryans commented on HBASE-10251:


Pass VALUE_LENGTH directly instead of re-declaring it.
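I.e., something like this sketch of the compat shim inside PerformanceEvaluation 
(signatures assumed from the issue description, not copied from the patch):

{code}
/** @deprecated Use generateData(Random, int) instead; kept for source compat. */
@Deprecated
public static byte[] generateValue(final java.util.Random r) {
  return generateData(r, VALUE_LENGTH);  // pass the existing constant directly
}
{code}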

 Restore API Compat for PerformanceEvaluation.generateValue()
 

 Key: HBASE-10251
 URL: https://issues.apache.org/jira/browse/HBASE-10251
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.98.0, 0.98.1, 0.99.0, 0.98.2
Reporter: Aleksandr Shulman
Assignee: Dima Spivak
  Labels: api_compatibility
 Fix For: 0.98.0, 0.98.1, 0.99.0, 0.98.2

 Attachments: HBASE_10251.patch


 Observed:
 A couple of my client tests fail to compile against trunk because the method 
 PerformanceEvaluation.generateValue was removed as part of HBASE-8496.
 This is an issue because it was used in a number of places, including unit 
 tests. Since we did not explicitly label this API as private, it's ambiguous 
 as to whether this could/should have been used by people writing apps against 
 0.96. If they used it, then they would be broken upon upgrade to 0.98 and 
 trunk.
 Potential Solution:
 The method was renamed to generateData, but the logic is still the same. We 
 can reintroduce it as deprecated in 0.98, as compat shim over generateData. 
 The patch should be a few lines. We may also consider doing so in trunk, but 
 I'd be just as fine with leaving it out.
 More generally, this raises the question about what other code is in this 
 grey-area, where it is public, is used outside of the package, but is not 
 explicitly labeled with an AudienceInterface.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HBASE-10251) Restore API Compat for PerformanceEvaluation.generateValue()

2014-05-13 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans reassigned HBASE-10251:
--

Assignee: Dima Spivak  (was: Jean-Daniel Cryans)

God I hate jira, so easy to misclick. Reassigning Dima.

 Restore API Compat for PerformanceEvaluation.generateValue()
 

 Key: HBASE-10251
 URL: https://issues.apache.org/jira/browse/HBASE-10251
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.98.0, 0.98.1, 0.99.0, 0.98.2
Reporter: Aleksandr Shulman
Assignee: Dima Spivak
  Labels: api_compatibility
 Fix For: 0.98.0, 0.98.1, 0.99.0, 0.98.2

 Attachments: HBASE-10251-v2.patch, HBASE_10251.patch


 Observed:
 A couple of my client tests fail to compile against trunk because the method 
 PerformanceEvaluation.generateValue was removed as part of HBASE-8496.
 This is an issue because it was used in a number of places, including unit 
 tests. Since we did not explicitly label this API as private, it's ambiguous 
 as to whether this could/should have been used by people writing apps against 
 0.96. If they used it, then they would be broken upon upgrade to 0.98 and 
 trunk.
 Potential Solution:
 The method was renamed to generateData, but the logic is still the same. We 
 can reintroduce it as deprecated in 0.98, as compat shim over generateData. 
 The patch should be a few lines. We may also consider doing so in trunk, but 
 I'd be just as fine with leaving it out.
 More generally, this raises the question about what other code is in this 
 grey-area, where it is public, is used outside of the package, but is not 
 explicitly labeled with an AudienceInterface.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HBASE-10251) Restore API Compat for PerformanceEvaluation.generateValue()

2014-05-13 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans reassigned HBASE-10251:
--

Assignee: Jean-Daniel Cryans  (was: Dima Spivak)

 Restore API Compat for PerformanceEvaluation.generateValue()
 

 Key: HBASE-10251
 URL: https://issues.apache.org/jira/browse/HBASE-10251
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.98.0, 0.98.1, 0.99.0, 0.98.2
Reporter: Aleksandr Shulman
Assignee: Jean-Daniel Cryans
  Labels: api_compatibility
 Fix For: 0.98.0, 0.98.1, 0.99.0, 0.98.2

 Attachments: HBASE-10251-v2.patch, HBASE_10251.patch


 Observed:
 A couple of my client tests fail to compile against trunk because the method 
 PerformanceEvaluation.generateValue was removed as part of HBASE-8496.
 This is an issue because it was used in a number of places, including unit 
 tests. Since we did not explicitly label this API as private, it's ambiguous 
 as to whether this could/should have been used by people writing apps against 
 0.96. If they used it, then they would be broken upon upgrade to 0.98 and 
 trunk.
 Potential Solution:
 The method was renamed to generateData, but the logic is still the same. We 
 can reintroduce it as deprecated in 0.98, as compat shim over generateData. 
 The patch should be a few lines. We may also consider doing so in trunk, but 
 I'd be just as fine with leaving it out.
 More generally, this raises the question about what other code is in this 
 grey-area, where it is public, is used outside of the package, but is not 
 explicitly labeled with an AudienceInterface.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11143) ageOfLastShippedOp metric is confusing

2014-05-12 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995242#comment-13995242
 ] 

Jean-Daniel Cryans commented on HBASE-11143:


+1, and can we get the new metric in 0.96+?

 ageOfLastShippedOp metric is confusing
 --

 Key: HBASE-11143
 URL: https://issues.apache.org/jira/browse/HBASE-11143
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.20

 Attachments: 11143-0.94-v2.txt, 11143-0.94.txt


 We are trying to report on replication lag and find that there is no good 
 single metric to do that.
 ageOfLastShippedOp is close, but unfortunately it is increased even when 
 there is nothing to ship on a particular RegionServer.
 I would like discuss a few options here:
 Add a new metric: replicationQueueTime (or something) with the above meaning. 
 I.e. if we have something to ship we set the age of that last shipped edit, 
 if we fail we increment that last time (just like we do now). But if there is 
 nothing to replicate we set it to current time (and hence that metric is 
 reported to close to 0).
 Alternatively we could change the meaning of ageOfLastShippedOp to mean to do 
 that. That might lead to surprises, but the current behavior is clearly weird 
 when there is nothing to replicate.
 Comments? [~jdcryans], [~stack].
 If approach sounds good, I'll make a patch for all branches.
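 A self-contained sketch of the first option's semantics (all names 
 hypothetical):

{code}
// Hypothetical gauge: report ~0 lag when the replication queue is empty,
// otherwise the age of the last shipped (or last attempted) edit.
class ReplicationLagGauge {
  private volatile long ageOfLastShippedOp;

  void update(long lastShippedEditTimestamp, boolean queueEmpty) {
    long now = System.currentTimeMillis();
    // Nothing pending: report ~0 instead of letting the age keep growing.
    ageOfLastShippedOp = queueEmpty ? 0 : now - lastShippedEditTimestamp;
  }

  long get() {
    return ageOfLastShippedOp;
  }
}
{code}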



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11143) ageOfLastShippedOp metric is confusing

2014-05-12 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11143?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995525#comment-13995525
 ] 

Jean-Daniel Cryans commented on HBASE-11143:


I'm +1 for the trunk patch too.

 ageOfLastShippedOp metric is confusing
 --

 Key: HBASE-11143
 URL: https://issues.apache.org/jira/browse/HBASE-11143
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.99.0, 0.96.3, 0.94.20, 0.98.3

 Attachments: 11143-0.94-v2.txt, 11143-0.94.txt, 11143-trunk.txt


 We are trying to report on replication lag and find that there is no good 
 single metric to do that.
 ageOfLastShippedOp is close, but unfortunately it is increased even when 
 there is nothing to ship on a particular RegionServer.
 I would like discuss a few options here:
 Add a new metric: replicationQueueTime (or something) with the above meaning. 
 I.e. if we have something to ship we set the age of that last shipped edit, 
 if we fail we increment that last time (just like we do now). But if there is 
 nothing to replicate we set it to current time (and hence that metric is 
 reported to close to 0).
 Alternatively we could change the meaning of ageOfLastShippedOp to mean to do 
 that. That might lead to surprises, but the current behavior is clearly weird 
 when there is nothing to replicate.
 Comments? [~jdcryans], [~stack].
 If approach sounds good, I'll make a patch for all branches.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-05-12 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995685#comment-13995685
 ] 

Jean-Daniel Cryans commented on HBASE-10504:


[~enis] So IIUC you are going a completely different way than what was 
discussed before, right? I don't mind it but I want to make sure that we're on 
the same track.

So instead of having people define their sinks, they instead put that code in 
the ReplicationConsumer. Seems like a good idea, you don't have to mess around 
getting a fake cluster up and running, among other issues. Might still be nice 
to offer some tools like removeNonReplicableEdits(), the basic metrics, the 
handling of a disabled peer, etc., for the other consumers.

 Define Replication Interface
 

 Key: HBASE-10504
 URL: https://issues.apache.org/jira/browse/HBASE-10504
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.99.0

 Attachments: hbase-10504_wip1.patch


 HBase has replication.  Fellas have been hijacking the replication apis to do 
 all kinds of perverse stuff like indexing hbase content (hbase-indexer 
 https://github.com/NGDATA/hbase-indexer) and our [~toffer] just showed up w/ 
 overrides that replicate via an alternate channel (over a secure thrift 
 channel between dcs over on HBASE-9360).  This issue is about surfacing these 
 APIs as public with guarantees to downstreamers similar to those we have on 
 our public client-facing APIs (and so we don't break them for downstreamers).
 Any input [~phunt] or [~gabriel.reid] or [~toffer]?
 Thanks.
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10993) Deprioritize long-running scanners

2014-05-06 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10993?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13991179#comment-13991179
 ] 

Jean-Daniel Cryans commented on HBASE-10993:


I'm +1; there are a few things to fix in the comments, but you can do that on 
commit (like "mantain").

BTW have you run this on a cluster where clients > handlers? 

 Deprioritize long-running scanners
 --

 Key: HBASE-10993
 URL: https://issues.apache.org/jira/browse/HBASE-10993
 Project: HBase
  Issue Type: Sub-task
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
 Fix For: 0.99.0

 Attachments: HBASE-10993-v0.patch, HBASE-10993-v1.patch, 
 HBASE-10993-v2.patch, HBASE-10993-v3.patch, HBASE-10993-v4.patch, 
 HBASE-10993-v4.patch


 Currently we have a single call queue that serves all the normal user  
 requests, and the requests are executed in FIFO.
 When running map-reduce jobs and user-queries on the same machine, we want to 
 prioritize the user-queries.
 Without changing too much code, and not having the user giving hints, we can 
 add a “vtime” field to the scanner, to keep track from how long is running. 
 And we can replace the callQueue with a priorityQueue. In this way we can 
 deprioritize long-running scans, the longer a scan request lives the less 
 priority it gets.
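 A minimal sketch of the idea (names invented): tag each queued call with its 
 scanner's vtime and order the queue so younger requests run first.

{code}
import java.util.Comparator;
import java.util.concurrent.PriorityBlockingQueue;

// Hypothetical sketch: long-lived scanners accumulate vtime and sink in the
// queue, so fresh user queries get served ahead of long-running scans.
class VtimeCall {
  final Runnable call;
  final long vtime;  // how long the owning scanner has been running

  VtimeCall(Runnable call, long vtime) {
    this.call = call;
    this.vtime = vtime;
  }
}

class DeprioritizingCallQueue {
  final PriorityBlockingQueue<VtimeCall> queue = new PriorityBlockingQueue<>(
      64, Comparator.comparingLong(c -> c.vtime));  // smallest vtime first
}
{code}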



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11116) Allow ReplicationSink to write to tables when READONLY is set

2014-05-05 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11116?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13990246#comment-13990246
 ] 

Jean-Daniel Cryans commented on HBASE-11116:


I wonder if you could sort of enable that right now with ACLs. Basically, only 
the user that runs hbase can write to the tables on the slave cluster; everyone 
else is disallowed.

 Allow ReplicationSink to write to tables when READONLY is set
 -

 Key: HBASE-11116
 URL: https://issues.apache.org/jira/browse/HBASE-11116
 Project: HBase
  Issue Type: New Feature
  Components: regionserver, Replication
Reporter: Geoffrey Anderson

 A common practice with MySQL replication is to enable the read_only flag 
 (http://dev.mysql.com/doc/refman/5.5/en/server-system-variables.html#sysvar_read_only)
  on slaves in order to disallow any user changes to a slave, while allowing 
 replication from the master to flow in.  In HBase, enabling the READONLY 
 field on a table will cause ReplicationSink to fail to apply changes.  It'd 
 be ideal to have the default behavior allow replication to continue going 
 through, a configurable option to influence this type of behavior, or perhaps 
 even a flag that manipulates a peer's state (e.g. via add_peer) that dictates 
 if replication can write to READONLY tables.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed

2014-04-30 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986053#comment-13986053
 ] 

Jean-Daniel Cryans commented on HBASE-10958:


Sorry, went to do something else, lemme get that in (with HBASE-11008 first).

 [dataloss] Bulk loading with seqids can prevent some log entries from being 
 replayed
 

 Key: HBASE-10958
 URL: https://issues.apache.org/jira/browse/HBASE-10958
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.2, 0.98.1, 0.94.18
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.99.0, 0.98.2, 0.96.3, 0.94.20

 Attachments: HBASE-10958-0.94.patch, 
 HBASE-10958-less-intrusive-hack-0.96.patch, 
 HBASE-10958-quick-hack-0.96.patch, HBASE-10958-v2.patch, 
 HBASE-10958-v3.patch, HBASE-10958.patch


 We found an issue with bulk loads causing data loss when assigning sequence 
 ids (HBASE-6630) that is triggered when replaying recovered edits. We're 
 nicknaming this issue *Blindspot*.
 The problem is that the sequence id given to a bulk loaded file is higher 
 than those of the edits in the region's memstore. When replaying recovered 
 edits, the rule to skip some of them is that they have to be _lower than the 
 highest sequence id_. In other words, the edits that have a sequence id lower 
 than the highest one in the store files *should* have also been flushed. This 
 is not the case with bulk loaded files since we now have an HFile with a 
 sequence id higher than unflushed edits.
 The log recovery code takes this into account by simply skipping the bulk 
 loaded files, but this bulk loaded status is *lost* on compaction. The 
 edits in the logs that have a sequence id lower than the bulk loaded file 
 that got compacted are put in a blind spot and are skipped during replay.
 Here's the easiest way to recreate this issue:
  - Create an empty table
  - Put one row in it (let's say it gets seqid 1)
  - Bulk load one file (it gets seqid 2). I used ImportTsv and set 
 hbase.mapreduce.bulkload.assign.sequenceNumbers.
  - Bulk load a second file the same way (it gets seqid 3).
  - Major compact the table (the new file has seqid 3 and isn't considered 
 bulk loaded).
  - Kill the region server that holds the table's region.
  - Scan the table once the region is made available again. The first row, at 
 seqid 1, will be missing since the HFile with seqid 3 makes us believe that 
 everything that came before it was flushed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed

2014-04-30 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-10958:
---

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Now committed to 0.94 (took some time because I wanted to double check the 
tests I modified were all green). Thanks everyone.

 [dataloss] Bulk loading with seqids can prevent some log entries from being 
 replayed
 

 Key: HBASE-10958
 URL: https://issues.apache.org/jira/browse/HBASE-10958
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.2, 0.98.1, 0.94.18
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.99.0, 0.98.2, 0.96.3, 0.94.20

 Attachments: HBASE-10958-0.94.patch, 
 HBASE-10958-less-intrusive-hack-0.96.patch, 
 HBASE-10958-quick-hack-0.96.patch, HBASE-10958-v2.patch, 
 HBASE-10958-v3.patch, HBASE-10958.patch


 We found an issue with bulk loads causing data loss when assigning sequence 
 ids (HBASE-6630) that is triggered when replaying recovered edits. We're 
 nicknaming this issue *Blindspot*.
 The problem is that the sequence id given to a bulk loaded file is higher 
 than those of the edits in the region's memstore. When replaying recovered 
 edits, the rule to skip some of them is that they have to be _lower than the 
 highest sequence id_. In other words, the edits that have a sequence id lower 
 than the highest one in the store files *should* have also been flushed. This 
 is not the case with bulk loaded files since we now have an HFile with a 
 sequence id higher than unflushed edits.
 The log recovery code takes this into account by simply skipping the bulk 
 loaded files, but this bulk loaded status is *lost* on compaction. The 
 edits in the logs that have a sequence id lower than the bulk loaded file 
 that got compacted are put in a blind spot and are skipped during replay.
 Here's the easiest way to recreate this issue:
  - Create an empty table
  - Put one row in it (let's say it gets seqid 1)
  - Bulk load one file (it gets seqid 2). I used ImportTsv and set 
 hbase.mapreduce.bulkload.assign.sequenceNumbers.
  - Bulk load a second file the same way (it gets seqid 3).
  - Major compact the table (the new file has seqid 3 and isn't considered 
 bulk loaded).
  - Kill the region server that holds the table's region.
  - Scan the table once the region is made available again. The first row, at 
 seqid 1, will be missing since the HFile with seqid 3 makes us believe that 
 everything that came before it was flushed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11008) Align bulk load, flush, and compact to require Action.CREATE

2014-04-30 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-11008:
---

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

And committed to 0.94, thanks everyone.

 Align bulk load, flush, and compact to require Action.CREATE
 

 Key: HBASE-11008
 URL: https://issues.apache.org/jira/browse/HBASE-11008
 Project: HBase
  Issue Type: Improvement
  Components: security
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.99.0, 0.98.2, 0.96.3, 0.94.20

 Attachments: HBASE-11008-0.94.patch, HBASE-11008-v2.patch, 
 HBASE-11008-v3.patch, HBASE-11008.patch


 Over in HBASE-10958 we noticed that it might make sense to require 
 Action.CREATE for bulk load, flush, and compact since it is also required for 
 things like enable and disable.
 This means the following changes:
  - preBulkLoadHFile goes from WRITE to CREATE
  - compact/flush go from ADMIN to ADMIN or CREATE



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10932) Improve RowCounter to allow mapper number set/control

2014-04-25 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13981271#comment-13981271
 ] 

Jean-Daniel Cryans commented on HBASE-10932:


bq. But I didn't quite catch the point of job scheduler, in my understanding 
job scheduler is cluster-level and cannot be configured per-job, right? 

Well, by using a scheduler, you can constrain certain types of jobs so that 
they don't run as fast as they can. For example, with the fair scheduler you 
can configure a pool (let's call it the slow pool) to have only {{maxMaps}} 
running concurrently on the cluster. Then, when you run your {{RowCounter}} 
jobs and whatnot, you can tie them automatically to the slow pool. Hadoop 
cluster operators usually know how to use a scheduler, whereas having to rely 
on the person who runs the jobs to configure them correctly can lead to human 
errors like "oops, I forgot to pass the maps configuration to my row counter 
and now the website is down".
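As an illustration, a sketch of an MR1 fair-scheduler allocations file capping 
such a pool (the pool name and cap are invented; jobs would be steered into it 
with something like -Dmapred.fairscheduler.pool=slow, assuming that property is 
available in your Hadoop version):

{noformat}
<?xml version="1.0"?>
<allocations>
  <!-- At most 2 map slots cluster-wide for jobs submitted to the "slow" pool -->
  <pool name="slow">
    <maxMaps>2</maxMaps>
  </pool>
</allocations>
{noformat}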

It also works well if you have two users who want to concurrently run a row 
counter; they'll both get in the slow pool and only two mappers will run 
(alternating between the two jobs, unless you set different weights because one 
user is more important than the other, etc etc). If you were to rely on 
individual users specifying the correct number of maps, and they both set their 
job to use two, then you'd have four mappers running. Back to square one.

Anyways, all of this to say that there's a more generic way of doing this, and 
it already exists. Can we close this jira, [~carp84]?

 Improve RowCounter to allow mapper number set/control
 -

 Key: HBASE-10932
 URL: https://issues.apache.org/jira/browse/HBASE-10932
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Yu Li
Assignee: Yu Li
Priority: Minor
 Attachments: HBASE-10932_v1.patch, HBASE-10932_v2.patch


 The typical use case of RowCounter is to do some kind of data integrity 
 checking, like after exporting some data from RDBMS to HBase, or from one 
 HBase cluster to another, making sure the row(record) number matches. Such 
 check commonly won't require much on response time.
 Meanwhile, based on current impl, RowCounter will launch one mapper per 
 region, and each mapper will send one scan request. Assuming the table is 
 kind of big like having tens of regions, and the cpu core number of the whole 
 MR cluster is also enough, the parallel scan requests sent by mapper would be 
 a real burden for the HBase cluster.
 So in this JIRA, we're proposing to make rowcounter support an additional 
 option --maps to specify mapper number, and make each mapper able to scan 
 more than one region of the target table.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Resolved] (HBASE-10932) Improve RowCounter to allow mapper number set/control

2014-04-25 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans resolved HBASE-10932.


Resolution: Won't Fix

Resolving as won't fix. If you want to work on a more general solution, like 
adding this option to the TIF, please open a new jira. Thanks.

 Improve RowCounter to allow mapper number set/control
 -

 Key: HBASE-10932
 URL: https://issues.apache.org/jira/browse/HBASE-10932
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Yu Li
Assignee: Yu Li
Priority: Minor
 Attachments: HBASE-10932_v1.patch, HBASE-10932_v2.patch


 The typical use case of RowCounter is to do some kind of data integrity 
 checking, like after exporting some data from RDBMS to HBase, or from one 
 HBase cluster to another, making sure the row(record) number matches. Such 
 check commonly won't require much on response time.
 Meanwhile, based on current impl, RowCounter will launch one mapper per 
 region, and each mapper will send one scan request. Assuming the table is 
 kind of big like having tens of regions, and the cpu core number of the whole 
 MR cluster is also enough, the parallel scan requests sent by mapper would be 
 a real burden for the HBase cluster.
 So in this JIRA, we're proposing to make rowcounter support an additional 
 option --maps to specify mapper number, and make each mapper able to scan 
 more than one region of the target table.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11008) Align bulk load, flush, and compact to require Action.CREATE

2014-04-25 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-11008:
---

Release Note: 
preBulkLoadHFile now requires CREATE, which it effectively already needed since 
getTableDescriptor also requires it, and that is what LoadIncrementalHFiles 
calls before bulk loading.
compact and flush can now be issued by users with CREATE permission.

I committed to 0.96 and up. I'm waiting on [~lhofhansl]'s +1 for 0.94 (or I can 
create a backport jira).

 Align bulk load, flush, and compact to require Action.CREATE
 

 Key: HBASE-11008
 URL: https://issues.apache.org/jira/browse/HBASE-11008
 Project: HBase
  Issue Type: Improvement
  Components: security
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.99.0, 0.98.2, 0.96.3, 0.94.20

 Attachments: HBASE-11008-0.94.patch, HBASE-11008-v2.patch, 
 HBASE-11008-v3.patch, HBASE-11008.patch


 Over in HBASE-10958 we noticed that it might make sense to require 
 Action.CREATE for bulk load, flush, and compact since it is also required for 
 things like enable and disable.
 This means the following changes:
  - preBulkLoadHFile goes from WRITE to CREATE
  - compact/flush go from ADMIN to ADMIN or CREATE



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed

2014-04-25 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-10958:
---

Release Note: Bulk loading with sequence IDs, an option in late 0.94 
releases and the default since 0.96.0, will now trigger a flush per region that 
loads an HFile (if there's data that needs to be flushed).

Committed to 0.96 and up. Like for HBASE-11008, I'm waiting to commit to 0.94 
or I can open a backport jira.

 [dataloss] Bulk loading with seqids can prevent some log entries from being 
 replayed
 

 Key: HBASE-10958
 URL: https://issues.apache.org/jira/browse/HBASE-10958
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.2, 0.98.1, 0.94.18
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.99.0, 0.98.2, 0.96.3, 0.94.20

 Attachments: HBASE-10958-0.94.patch, 
 HBASE-10958-less-intrusive-hack-0.96.patch, 
 HBASE-10958-quick-hack-0.96.patch, HBASE-10958-v2.patch, 
 HBASE-10958-v3.patch, HBASE-10958.patch


 We found an issue with bulk loads causing data loss when assigning sequence 
 ids (HBASE-6630) that is triggered when replaying recovered edits. We're 
 nicknaming this issue *Blindspot*.
 The problem is that the sequence id given to a bulk loaded file is higher 
 than those of the edits in the region's memstore. When replaying recovered 
 edits, the rule to skip some of them is that they have to be _lower than the 
 highest sequence id_. In other words, the edits that have a sequence id lower 
 than the highest one in the store files *should* have also been flushed. This 
 is not the case with bulk loaded files since we now have an HFile with a 
 sequence id higher than unflushed edits.
 The log recovery code takes this into account by simply skipping the bulk 
 loaded files, but this bulk loaded status is *lost* on compaction. The 
 edits in the logs that have a sequence id lower than the bulk loaded file 
 that got compacted are put in a blind spot and are skipped during replay.
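 In rough pseudocode (names illustrative, not the actual replay method), the 
 rule is:
 {code}
 // Replay of recovered edits: anything below the store files' highest
 // sequence id is assumed to have been flushed already.
 if (edit.getSequenceId() < maxSeqIdInStoreFiles) {
   continue;  // skipped -- the blind spot once a compacted bulk-loaded file raised the max
 }
 {code}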
 Here's the easiest way to recreate this issue:
  - Create an empty table
  - Put one row in it (let's say it gets seqid 1)
  - Bulk load one file (it gets seqid 2). I used ImportTsv and set 
 hbase.mapreduce.bulkload.assign.sequenceNumbers.
  - Bulk load a second file the same way (it gets seqid 3).
  - Major compact the table (the new file has seqid 3 and isn't considered 
 bulk loaded).
  - Kill the region server that holds the table's region.
  - Scan the table once the region is made available again. The first row, at 
 seqid 1, will be missing since the HFile with seqid 3 makes us believe that 
 everything that came before it was flushed.
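 Sketched as shell steps (table and family names made up; the ImportTsv 
 invocation is abbreviated):
 {noformat}
 hbase> create 't1', 'f1'
 hbase> put 't1', 'row1', 'f1:q', 'v'   # edit gets seqid 1
 # bulk load two files via ImportTsv with
 # -Dhbase.mapreduce.bulkload.assign.sequenceNumbers=true (they get seqids 2 and 3)
 hbase> major_compact 't1'   # compacted file keeps seqid 3, loses the bulk-loaded flag
 # kill the region server hosting t1's region; after reassignment and replay:
 hbase> scan 't1'            # row1 (seqid 1) is gone
 {noformat}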



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10981) taskmonitor use subList may cause recursion and get a java.lang.StackOverflowError

2014-04-24 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10981?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-10981:
---

Resolution: Duplicate
Status: Resolved  (was: Patch Available)

Yup, resolving as dup of HBASE-10312. 

 taskmonitor use subList may cause recursion and get a 
 java.lang.StackOverflowError
 --

 Key: HBASE-10981
 URL: https://issues.apache.org/jira/browse/HBASE-10981
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.96.0, 0.96.2, 0.98.1, 0.96.1.1, 0.98.2
 Environment: java version 1.7.0_03 64bit
Reporter: sinfox
Priority: Critical
 Attachments: fixsubListException.patch

   Original Estimate: 5m
  Remaining Estimate: 5m

 2014-04-14 11:52:45,905 ERROR [RS_CLOSE_REGION-in16-062:60020-1] 
 executor.EventHandler: Caught throwable while processing event
 M_RS_CLOSE_REGION
 java.lang.RuntimeException: java.lang.StackOverflowError
 at 
 org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:161)
 at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:722)
 Caused by: java.lang.StackOverflowError
 at java.util.ArrayList$SubList.add(ArrayList.java:965)
 at java.util.ArrayList$SubList.add(ArrayList.java:965)
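 The recursion is easy to reproduce outside HBase (a minimal sketch): each 
 subList() call wraps the previous view, and SubList.add() delegates to its 
 parent, so a long chain of nested views eventually overflows the stack.
 {code}
 import java.util.ArrayList;
 import java.util.List;

 public class SubListOverflow {
   public static void main(String[] args) {
     List<Integer> tasks = new ArrayList<Integer>();
     for (int i = 0; i < 1000000; i++) {  // overflows long before finishing
       tasks.add(i);                      // add() delegates through every nested view
       if (tasks.size() > 10) {
         tasks = tasks.subList(1, tasks.size());  // wraps another view instead of copying
       }
     }
     // Fix: copy instead of keeping the view, e.g.
     // tasks = new ArrayList<Integer>(tasks.subList(1, tasks.size()));
   }
 }
 {code}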



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11008) Align bulk load, flush, and compact to require Action.CREATE

2014-04-22 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13977605#comment-13977605
 ] 

Jean-Daniel Cryans commented on HBASE-11008:


Gently prodding for reviews and +1s, if we commit this then we can get 
HBASE-10958 in.

 Align bulk load, flush, and compact to require Action.CREATE
 

 Key: HBASE-11008
 URL: https://issues.apache.org/jira/browse/HBASE-11008
 Project: HBase
  Issue Type: Improvement
  Components: security
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.99.0, 0.98.2, 0.96.3, 0.94.20

 Attachments: HBASE-11008-0.94.patch, HBASE-11008-v2.patch, 
 HBASE-11008-v3.patch, HBASE-11008.patch


 Over in HBASE-10958 we noticed that it might make sense to require 
 Action.CREATE for bulk load, flush, and compact since it is also required for 
 things like enable and disable.
 This means the following changes:
  - preBulkLoadHFile goes from WRITE to CREATE
  - compact/flush go from ADMIN to ADMIN or CREATE



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11008) Align bulk load, flush, and compact to require Action.CREATE

2014-04-22 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13977612#comment-13977612
 ] 

Jean-Daniel Cryans commented on HBASE-11008:


Thanks [~apurtell].

[~stack], [~lhofhansl], you guys ok for 0.96 and 0.94 respectively?

 Align bulk load, flush, and compact to require Action.CREATE
 

 Key: HBASE-11008
 URL: https://issues.apache.org/jira/browse/HBASE-11008
 Project: HBase
  Issue Type: Improvement
  Components: security
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.99.0, 0.98.2, 0.96.3, 0.94.20

 Attachments: HBASE-11008-0.94.patch, HBASE-11008-v2.patch, 
 HBASE-11008-v3.patch, HBASE-11008.patch


 Over in HBASE-10958 we noticed that it might make sense to require 
 Action.CREATE for bulk load, flush, and compact since it is also required for 
 things like enable and disable.
 This means the following changes:
  - preBulkLoadHFile goes from WRITE to CREATE
  - compact/flush go from ADMIN to ADMIN or CREATE



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-3489) .oldlogs not being cleaned out

2014-04-22 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13977835#comment-13977835
 ] 

Jean-Daniel Cryans commented on HBASE-3489:
---

We removed the kill switch, start/stop_replication don't exist anymore.

 .oldlogs not being cleaned out
 --

 Key: HBASE-3489
 URL: https://issues.apache.org/jira/browse/HBASE-3489
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
 Environment: 10 Nodes Write Heavy Cluster
Reporter: Wayne
 Attachments: oldlog.txt


 The .oldlogs folder is never cleaned up. hbase.master.logcleaner.ttl has been 
 set to clean up the old logs, but the cleanup never kicks in. The limit of 10 
 files is not the problem. After running for 5 days not a single log file has 
 been deleted, even though the log cleaner is set to 2 days (down from the 
 default of 7 days). Our assumption is that the replication code, which keeps 
 these logs around in case they are needed, is blocking the cleanup. There is 
 no replication defined (knowingly).
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11043) Users with table's read/write permission can't get table's description

2014-04-21 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11043?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13976108#comment-13976108
 ] 

Jean-Daniel Cryans commented on HBASE-11043:


It's this way because of HBASE-8692.

 Users with table's read/write permission can't get table's description
 --

 Key: HBASE-11043
 URL: https://issues.apache.org/jira/browse/HBASE-11043
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.99.0
Reporter: Liu Shaohui
Assignee: Liu Shaohui
Priority: Minor
 Attachments: HBASE-11043-trunk-v1.diff


 AccessController#preGetTableDescriptors only allow users with admin or create 
 permission to get table's description.
 {quote}
 requirePermission("getTableDescriptors", nameAsBytes, null, null,
   Permission.Action.ADMIN, Permission.Action.CREATE);
 {quote}
 I think users with read/write permission on a table should also be able to 
 get its description.
 E.g., when creating a Hive table on HBase, Hive fetches the table description 
 to check that the mapping is right, and usually the Hive users only have read 
 permission on the table.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11008) Align bulk load, flush, and compact to require Action.CREATE

2014-04-21 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-11008:
---

Attachment: HBASE-11008-0.94.patch

Attaching the patch for 0.94. I have to force the use of sequence ids, else it 
won't exercise the flush that's coming in HBASE-10958. It doesn't change the 
semantics of the test itself.

 Align bulk load, flush, and compact to require Action.CREATE
 

 Key: HBASE-11008
 URL: https://issues.apache.org/jira/browse/HBASE-11008
 Project: HBase
  Issue Type: Improvement
  Components: security
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.99.0, 0.98.2, 0.96.3, 0.94.20

 Attachments: HBASE-11008-0.94.patch, HBASE-11008-v2.patch, 
 HBASE-11008-v3.patch, HBASE-11008.patch


 Over in HBASE-10958 we noticed that it might make sense to require 
 Action.CREATE for bulk load, flush, and compact since it is also required for 
 things like enable and disable.
 This means the following changes:
  - preBulkLoadHFile goes from WRITE to CREATE
  - compact/flush go from ADMIN to ADMIN or CREATE



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11008) Align bulk load, flush, and compact to require Action.CREATE

2014-04-18 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-11008:
---

Attachment: HBASE-11008-v3.patch

New patch that fixes the documentation for flush and compact. This is good to 
go for 0.96 and up I think.

 Align bulk load, flush, and compact to require Action.CREATE
 

 Key: HBASE-11008
 URL: https://issues.apache.org/jira/browse/HBASE-11008
 Project: HBase
  Issue Type: Improvement
  Components: security
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.99.0, 0.98.2, 0.96.3, 0.94.20

 Attachments: HBASE-11008-v2.patch, HBASE-11008-v3.patch, 
 HBASE-11008.patch


 Over in HBASE-10958 we noticed that it might make sense to require 
 Action.CREATE for bulk load, flush, and compact since it is also required for 
 things like enable and disable.
 This means the following changes:
  - preBulkLoadHFile goes from WRITE to CREATE
  - compact/flush go from ADMIN to ADMIN or CREATE



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10932) Improve RowCounter to allow mapper number set/control

2014-04-18 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13974306#comment-13974306
 ] 

Jean-Daniel Cryans commented on HBASE-10932:


Hey [~carp84], I forgot about this issue, let me address your latest replies.

bq. I thought it's designed for purpose to make each mapper just scan one 
single region

That's more an implementation detail than a design decision, and we can further 
improve the implementation by giving more control to power users.

bq. This is useful especially in multi-tenant env, when we need to check data 
integrity for one user after data importing meanwhile don't want the scan 
burden to slow down RT of other users' request.

Right, but again, resource management is a broader issue. I doubt that 
RowCounter is the only job that needs to be throttled; what about 
VerifyReplication, or Export? Those jobs usually aren't latency sensitive and 
can run in the background. This can simply be handled by a correctly configured 
job scheduler; that's what they're for.

 Improve RowCounter to allow mapper number set/control
 -

 Key: HBASE-10932
 URL: https://issues.apache.org/jira/browse/HBASE-10932
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Yu Li
Assignee: Yu Li
Priority: Minor
 Attachments: HBASE-10932_v1.patch, HBASE-10932_v2.patch


 The typical use case of RowCounter is to do some kind of data integrity 
 checking, like after exporting some data from an RDBMS to HBase, or from one 
 HBase cluster to another, making sure the row (record) counts match. Such 
 checks usually don't put strict requirements on response time.
 Meanwhile, with the current implementation, RowCounter launches one mapper per 
 region, and each mapper sends one scan request. If the table is fairly big, 
 say tens of regions, and the MR cluster has enough CPU cores, the parallel 
 scan requests sent by the mappers can be a real burden on the HBase cluster.
 So in this JIRA, we're proposing to make RowCounter support an additional 
 option --maps to specify the mapper count, and to let each mapper scan 
 more than one region of the target table.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11011) Avoid extra getFileStatus() calls on Region startup

2014-04-17 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973168#comment-13973168
 ] 

Jean-Daniel Cryans commented on HBASE-11011:


What should one do when seeing this message?

{code}
+  LOG.debug("compaction file missing: " + outputPath);
{code}

The fact that a file is missing seems pretty bad, yet it's at DEBUG.

 Avoid extra getFileStatus() calls on Region startup
 ---

 Key: HBASE-11011
 URL: https://issues.apache.org/jira/browse/HBASE-11011
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.96.2, 0.98.1, 1.0.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
 Fix For: 1.0.0, 0.98.2, 0.96.3

 Attachments: HBASE-11011-v0.patch


 On load we already have a StoreFileInfo and we create it from the path,
 this will result in an extra fs.getFileStatus() call.
 In completeCompactionMarker() we do a fs.exists() and later a 
 fs.getFileStatus()
 to create the StoreFileInfo, we can avoid the exists.
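 A sketch of the idea (constructor shape approximate): a single 
 getFileStatus() with FileNotFoundException handling subsumes the exists() 
 check, so region open pays one NameNode round-trip per file instead of two.
 {code}
 FileStatus status;
 try {
   status = fs.getFileStatus(outputPath);  // also covers the exists() check
 } catch (FileNotFoundException e) {
   status = null;  // expected if the RS died before moving the compacted file
 }
 if (status != null) {
   // Build the StoreFileInfo from the status we already have -- no extra stat.
   StoreFileInfo info = new StoreFileInfo(conf, fs, status);
 }
 {code}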



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11011) Avoid extra getFileStatus() calls on Region startup

2014-04-17 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973196#comment-13973196
 ] 

Jean-Daniel Cryans commented on HBASE-11011:


bq. In this case is not bad, is an expected situation, where the RS died 
before moving the compacted file.

This might be, but the log message doesn't convey any of that. Taken out of 
context (which is how most users read our logs), it just says a file is missing.

bq. since is trying to lookup files in /table/region/family/ and if they are 
there are already loaded for sure.

I'm trusting you on this one :)

 Avoid extra getFileStatus() calls on Region startup
 ---

 Key: HBASE-11011
 URL: https://issues.apache.org/jira/browse/HBASE-11011
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.96.2, 0.98.1, 1.0.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
 Fix For: 1.0.0, 0.98.2, 0.96.3

 Attachments: HBASE-11011-v0.patch


 On load we already have a StoreFileInfo and we create it from the path,
 this will result in an extra fs.getFileStatus() call.
 In completeCompactionMarker() we do a fs.exists() and later a 
 fs.getFileStatus()
 to create the StoreFileInfo, we can avoid the exists.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11011) Avoid extra getFileStatus() calls on Region startup

2014-04-17 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973419#comment-13973419
 ] 

Jean-Daniel Cryans commented on HBASE-11011:


Does the changed code in completeCompactionMarker require a unit test? Or is 
there already one?

Also fix those lines:

{quote}
+// If we scan the directory and the file is not present, may means:
+// so, we can't do anything with the compaction output list since or is
+// already loaded on startup, because in the store folder, or it may be not
{quote}

 Avoid extra getFileStatus() calls on Region startup
 ---

 Key: HBASE-11011
 URL: https://issues.apache.org/jira/browse/HBASE-11011
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.96.2, 0.98.1, 1.0.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
 Fix For: 1.0.0, 0.98.2, 0.96.3

 Attachments: HBASE-11011-v0.patch, HBASE-11011-v1.patch


 On load we already have a StoreFileInfo and we create it from the path,
 this will result in an extra fs.getFileStatus() call.
 In completeCompactionMarker() we do a fs.exists() and later a 
 fs.getFileStatus()
 to create the StoreFileInfo, we can avoid the exists.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11011) Avoid extra getFileStatus() calls on Region startup

2014-04-17 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973470#comment-13973470
 ] 

Jean-Daniel Cryans commented on HBASE-11011:


Alright, +1

 Avoid extra getFileStatus() calls on Region startup
 ---

 Key: HBASE-11011
 URL: https://issues.apache.org/jira/browse/HBASE-11011
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.96.2, 0.98.1, 1.0.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
Priority: Minor
 Fix For: 1.0.0, 0.98.2, 0.96.3

 Attachments: HBASE-11011-v0.patch, HBASE-11011-v1.patch, 
 HBASE-11011-v2.patch


 On load we already have a StoreFileInfo and we create it from the path,
 this will result in an extra fs.getFileStatus() call.
 In completeCompactionMarker() we do a fs.exists() and later a 
 fs.getFileStatus()
 to create the StoreFileInfo, we can avoid the exists.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11008) Align bulk load, flush, and compact to require Action.CREATE

2014-04-17 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-11008:
---

Attachment: HBASE-11008-v2.patch

Patch with the modified documentation. FWIW I think that chapter needs a lot 
more love but it's out of scope.

 Align bulk load, flush, and compact to require Action.CREATE
 

 Key: HBASE-11008
 URL: https://issues.apache.org/jira/browse/HBASE-11008
 Project: HBase
  Issue Type: Improvement
  Components: security
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.99.0, 0.98.2, 0.96.3, 0.94.20

 Attachments: HBASE-11008-v2.patch, HBASE-11008.patch


 Over in HBASE-10958 we noticed that it might make sense to require 
 Action.CREATE for bulk load, flush, and compact since it is also required for 
 things like enable and disable.
 This means the following changes:
  - preBulkLoadHFile goes from WRITE to CREATE
  - compact/flush go from ADMIN to ADMIN or CREATE



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed

2014-04-17 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-10958:
---

Attachment: HBASE-10958-0.94.patch
HBASE-10958-v3.patch

Attaching 2 patches.

One is a backport for 0.94. While doing the backport I saw that 
TestSnapshotFromMaster was failing, and Matteo was able to see that it was an 
error in my patch: flushing always returned that it needed compaction. I need 
to rerun all the tests now, but it was the only one that failed (I haven't 
tried with security either). I also added a test in TestHRegion for that.

The second patch is for trunk, in which I ported the same test to TestHRegion. 
Interestingly, it didn't work. I found that in 0.94 we compact if num_files > 
compactionThreshold, but in trunk it's >=, so it seems that we compact more 
often now. This patch also has the fixes from [~stack]'s comments.

 [dataloss] Bulk loading with seqids can prevent some log entries from being 
 replayed
 

 Key: HBASE-10958
 URL: https://issues.apache.org/jira/browse/HBASE-10958
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.2, 0.98.1, 0.94.18
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.99.0, 0.94.19, 0.98.2, 0.96.3

 Attachments: HBASE-10958-0.94.patch, 
 HBASE-10958-less-intrusive-hack-0.96.patch, 
 HBASE-10958-quick-hack-0.96.patch, HBASE-10958-v2.patch, 
 HBASE-10958-v3.patch, HBASE-10958.patch


 We found an issue with bulk loads causing data loss when assigning sequence 
 ids (HBASE-6630) that is triggered when replaying recovered edits. We're 
 nicknaming this issue *Blindspot*.
 The problem is that the sequence id given to a bulk loaded file is higher 
 than those of the edits in the region's memstore. When replaying recovered 
 edits, the rule to skip some of them is that they have to be _lower than the 
 highest sequence id_. In other words, the edits that have a sequence id lower 
 than the highest one in the store files *should* have also been flushed. This 
 is not the case with bulk loaded files since we now have an HFile with a 
 sequence id higher than unflushed edits.
 The log recovery code takes this into account by simply skipping the bulk 
 loaded files, but this bulk loaded status is *lost* on compaction. The 
 edits in the logs that have a sequence id lower than the bulk loaded file 
 that got compacted are put in a blind spot and are skipped during replay.
 Here's the easiest way to recreate this issue:
  - Create an empty table
  - Put one row in it (let's say it gets seqid 1)
  - Bulk load one file (it gets seqid 2). I used ImportTsv and set 
 hbase.mapreduce.bulkload.assign.sequenceNumbers.
  - Bulk load a second file the same way (it gets seqid 3).
  - Major compact the table (the new file has seqid 3 and isn't considered 
 bulk loaded).
  - Kill the region server that holds the table's region.
  - Scan the table once the region is made available again. The first row, at 
 seqid 1, will be missing since the HFile with seqid 3 makes us believe that 
 everything that came before it was flushed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11008) Align bulk load, flush, and compact to require Action.CREATE

2014-04-17 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973653#comment-13973653
 ] 

Jean-Daniel Cryans commented on HBASE-11008:


I can, but I'm not sure where to put them. That table doesn't have the concept 
of operations having multiple perms. Flush and compact are now both ADMIN and 
CREATE. I can put them in ADMIN until we find a better representation. Sounds 
good?

 Align bulk load, flush, and compact to require Action.CREATE
 

 Key: HBASE-11008
 URL: https://issues.apache.org/jira/browse/HBASE-11008
 Project: HBase
  Issue Type: Improvement
  Components: security
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.99.0, 0.98.2, 0.96.3, 0.94.20

 Attachments: HBASE-11008-v2.patch, HBASE-11008.patch


 Over in HBASE-10958 we noticed that it might make sense to require 
 Action.CREATE for bulk load, flush, and compact since it is also required for 
 things like enable and disable.
 This means the following changes:
  - preBulkLoadHFile goes from WRITE to CREATE
  - compact/flush go from ADMIN to ADMIN or CREATE



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed

2014-04-17 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13973789#comment-13973789
 ] 

Jean-Daniel Cryans commented on HBASE-10958:


bq. method = name.getMethodName();

Will fix (looks like I copied that from one of the few methods in that class 
that doesn't do it).

 [dataloss] Bulk loading with seqids can prevent some log entries from being 
 replayed
 

 Key: HBASE-10958
 URL: https://issues.apache.org/jira/browse/HBASE-10958
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.2, 0.98.1, 0.94.18
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.99.0, 0.94.19, 0.98.2, 0.96.3

 Attachments: HBASE-10958-0.94.patch, 
 HBASE-10958-less-intrusive-hack-0.96.patch, 
 HBASE-10958-quick-hack-0.96.patch, HBASE-10958-v2.patch, 
 HBASE-10958-v3.patch, HBASE-10958.patch


 We found an issue with bulk loads causing data loss when assigning sequence 
 ids (HBASE-6630) that is triggered when replaying recovered edits. We're 
 nicknaming this issue *Blindspot*.
 The problem is that the sequence id given to a bulk loaded file is higher 
 than those of the edits in the region's memstore. When replaying recovered 
 edits, the rule to skip some of them is that they have to be _lower than the 
 highest sequence id_. In other words, the edits that have a sequence id lower 
 than the highest one in the store files *should* have also been flushed. This 
 is not the case with bulk loaded files since we now have an HFile with a 
 sequence id higher than unflushed edits.
 The log recovery code takes this into account by simply skipping the bulk 
 loaded files, but this bulk loaded status is *lost* on compaction. The 
 edits in the logs that have a sequence id lower than the bulk loaded file 
 that got compacted are put in a blind spot and are skipped during replay.
 Here's the easiest way to recreate this issue:
  - Create an empty table
  - Put one row in it (let's say it gets seqid 1)
  - Bulk load one file (it gets seqid 2). I used ImportTsv and set 
 hbase.mapreduce.bulkload.assign.sequenceNumbers.
  - Bulk load a second file the same way (it gets seqid 3).
  - Major compact the table (the new file has seqid 3 and isn't considered 
 bulk loaded).
  - Kill the region server that holds the table's region.
  - Scan the table once the region is made available again. The first row, at 
 seqid 1, will be missing since the HFile with seqid 3 makes us believe that 
 everything that came before it was flushed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed

2014-04-16 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13971584#comment-13971584
 ] 

Jean-Daniel Cryans commented on HBASE-10958:


bq. we'd like ADMIN to be granted sparingly to admins or delegates

+1

bq.  IIRC enable and disable are ADMIN actions also, since disabling or 
enabling a 1 region table has consequences.

This is what I see in the code in trunk:

{code}
requirePermission("preBulkLoadHFile", getTableDesc().getTableName(),
    el.getFirst(), null, Permission.Action.WRITE);
requirePermission("enableTable", tableName, null, null, Action.ADMIN,
    Action.CREATE);
requirePermission("disableTable", tableName, null, null, Action.ADMIN,
    Action.CREATE);
requirePermission("compact", getTableName(e.getEnvironment()), null, null,
    Action.ADMIN);
requirePermission("flush", getTableName(e.getEnvironment()), null, null,
    Action.ADMIN);
{code}

IMO flush should have lower or same perms as disableTable.

So here's a list of changes I believe are needed:

 - preBulkLoadHFile goes from WRITE to CREATE (seems more in line with what's 
really needed to bulk load given the code I posted yesterday)
 - compact/flush go from ADMIN to ADMIN or CREATE
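
Concretely, against the trunk code quoted above, the checks would end up 
looking something like this (a sketch, not the committed patch):

{code}
requirePermission("preBulkLoadHFile", getTableDesc().getTableName(),
    el.getFirst(), null, Permission.Action.CREATE);
requirePermission("compact", getTableName(e.getEnvironment()), null, null,
    Action.ADMIN, Action.CREATE);
requirePermission("flush", getTableName(e.getEnvironment()), null, null,
    Action.ADMIN, Action.CREATE);
{code}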

This should not have an impact on the current users. If we can agree on the 
changes, I'll open a new jira that's going to be blocking this one.

 [dataloss] Bulk loading with seqids can prevent some log entries from being 
 replayed
 

 Key: HBASE-10958
 URL: https://issues.apache.org/jira/browse/HBASE-10958
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.2, 0.98.1, 0.94.18
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.99.0, 0.94.19, 0.98.2, 0.96.3

 Attachments: HBASE-10958-less-intrusive-hack-0.96.patch, 
 HBASE-10958-quick-hack-0.96.patch, HBASE-10958-v2.patch, HBASE-10958.patch


 We found an issue with bulk loads causing data loss when assigning sequence 
 ids (HBASE-6630) that is triggered when replaying recovered edits. We're 
 nicknaming this issue *Blindspot*.
 The problem is that the sequence id given to a bulk loaded file is higher 
 than those of the edits in the region's memstore. When replaying recovered 
 edits, the rule to skip some of them is that they have to be _lower than the 
 highest sequence id_. In other words, the edits that have a sequence id lower 
 than the highest one in the store files *should* have also been flushed. This 
 is not the case with bulk loaded files since we now have an HFile with a 
 sequence id higher than unflushed edits.
 The log recovery code takes this into account by simply skipping the bulk 
 loaded files, but this bulk loaded status is *lost* on compaction. The 
 edits in the logs that have a sequence id lower than the bulk loaded file 
 that got compacted are put in a blind spot and are skipped during replay.
 Here's the easiest way to recreate this issue:
  - Create an empty table
  - Put one row in it (let's say it gets seqid 1)
  - Bulk load one file (it gets seqid 2). I used ImportTsv and set 
 hbase.mapreduce.bulkload.assign.sequenceNumbers.
  - Bulk load a second file the same way (it gets seqid 3).
  - Major compact the table (the new file has seqid 3 and isn't considered 
 bulk loaded).
  - Kill the region server that holds the table's region.
  - Scan the table once the region is made available again. The first row, at 
 seqid 1, will be missing since the HFile with seqid 3 makes us believe that 
 everything that came before it was flushed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed

2014-04-16 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13971753#comment-13971753
 ] 

Jean-Daniel Cryans commented on HBASE-10958:


bq. I can see that folks would not want to grant users CREATE (or ADMIN) so 
that they cannot create/drop/enable/disable tables, but still do allow them to 
load data via bulk load. That would now no longer be possible.

Bulk load already needs CREATE as shown above in TestAccessController.

bq. Have we dismissed this option?

I don't think so, but no one has been pushing for it. It creates a precedent: 
nowhere else in the code do we use User.runAs to override the current user that 
came in via an RPC (I'm not even sure it works, but that's because I don't 
know that code very well).

 [dataloss] Bulk loading with seqids can prevent some log entries from being 
 replayed
 

 Key: HBASE-10958
 URL: https://issues.apache.org/jira/browse/HBASE-10958
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.2, 0.98.1, 0.94.18
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.99.0, 0.94.19, 0.98.2, 0.96.3

 Attachments: HBASE-10958-less-intrusive-hack-0.96.patch, 
 HBASE-10958-quick-hack-0.96.patch, HBASE-10958-v2.patch, HBASE-10958.patch


 We found an issue with bulk loads causing data loss when assigning sequence 
 ids (HBASE-6630) that is triggered when replaying recovered edits. We're 
 nicknaming this issue *Blindspot*.
 The problem is that the sequence id given to a bulk loaded file is higher 
 than those of the edits in the region's memstore. When replaying recovered 
 edits, the rule to skip some of them is that they have to be _lower than the 
 highest sequence id_. In other words, the edits that have a sequence id lower 
 than the highest one in the store files *should* have also been flushed. This 
 is not the case with bulk loaded files since we now have an HFile with a 
 sequence id higher than unflushed edits.
 The log recovery code takes this into account by simply skipping the bulk 
 loaded files, but this bulk loaded status is *lost* on compaction. The 
 edits in the logs that have a sequence id lower than the bulk loaded file 
 that got compacted are put in a blind spot and are skipped during replay.
 Here's the easiest way to recreate this issue:
  - Create an empty table
  - Put one row in it (let's say it gets seqid 1)
  - Bulk load one file (it gets seqid 2). I used ImportTsv and set 
 hbase.mapreduce.bulkload.assign.sequenceNumbers.
  - Bulk load a second file the same way (it gets seqid 3).
  - Major compact the table (the new file has seqid 3 and isn't considered 
 bulk loaded).
  - Kill the region server that holds the table's region.
  - Scan the table once the region is made available again. The first row, at 
 seqid 1, will be missing since the HFile with seqid 3 makes us believe that 
 everything that came before it was flushed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10871) Indefinite OPEN/CLOSE wait on busy RegionServers

2014-04-16 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10871?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13971945#comment-13971945
 ] 

Jean-Daniel Cryans commented on HBASE-10871:


bq. Perhaps we should add some delay here.

Well we drop the region on the floor, and the delay right now is the RIT 
timeout (30 minutes in 0.94, 10 minutes after). We don't do tight retries.

 Indefinite OPEN/CLOSE wait on busy RegionServers
 

 Key: HBASE-10871
 URL: https://issues.apache.org/jira/browse/HBASE-10871
 Project: HBase
  Issue Type: Improvement
  Components: Balancer, master, Region Assignment
Affects Versions: 0.94.6
Reporter: Harsh J

 We observed a case where, when a specific RS got bombarded by a large number 
 of regular requests, spiking and filling up its RPC queue, the unassigns and 
 assigns the balancer issued for regions on this server entered an indefinite 
 retry loop.
 The regions began waiting in PENDING_CLOSE/PENDING_OPEN states indefinitely 
 because the HBase client RPCs from the ServerManager at the master were 
 running into SocketTimeouts. This made the affected regions unavailable on 
 that server. The timeout monitor's retry default of 30m in 0.94's AM widened 
 the gap further (this is now 10m in 0.95+'s new AM, with additional retries 
 before we get there, which is good).
 Wonder if there's a way to improve this situation generally. PENDING_OPENs 
 may be easy to handle - we can switch them out and move them elsewhere. 
 PENDING_CLOSEs may be a bit trickier, but there should at least be a way to 
 give up permanently on a movement plan and let things be for a while, hoping 
 the RS recovers on its own (so that clients also have a chance of getting 
 things to work in the meantime).



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed

2014-04-16 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13971998#comment-13971998
 ] 

Jean-Daniel Cryans commented on HBASE-10958:


bq. The requirement on 'CREATE' for bulk load seems to come from here. Is this 
even intended?

Thanks for doing this. There's also another place where this is called, 
splitStoreFile(), in order to get an HCD. I don't think it's necessary to call 
it twice, but it seems necessary to call it at least once else you can't:

- verify that the families exist
- get the schema for each family so that we create the HFiles with the correct 
configurations when splitting
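
For the first point, a minimal sketch ({{familiesInHFiles}} is a made-up name 
for the families parsed from the HFile paths):

{code}
HTableDescriptor htd = table.getTableDescriptor();  // the call that needs CREATE
for (byte[] family : familiesInHFiles) {
  if (htd.getFamily(family) == null) {  // fail fast before moving any HFiles
    throw new IOException("Unknown column family " + Bytes.toString(family));
  }
}
{code}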

 [dataloss] Bulk loading with seqids can prevent some log entries from being 
 replayed
 

 Key: HBASE-10958
 URL: https://issues.apache.org/jira/browse/HBASE-10958
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.2, 0.98.1, 0.94.18
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.99.0, 0.94.19, 0.98.2, 0.96.3

 Attachments: HBASE-10958-less-intrusive-hack-0.96.patch, 
 HBASE-10958-quick-hack-0.96.patch, HBASE-10958-v2.patch, HBASE-10958.patch


 We found an issue with bulk loads causing data loss when assigning sequence 
 ids (HBASE-6630) that is triggered when replaying recovered edits. We're 
 nicknaming this issue *Blindspot*.
 The problem is that the sequence id given to a bulk loaded file is higher 
 than those of the edits in the region's memstore. When replaying recovered 
 edits, the rule to skip some of them is that they have to be _lower than the 
 highest sequence id_. In other words, the edits that have a sequence id lower 
 than the highest one in the store files *should* have also been flushed. This 
 is not the case with bulk loaded files since we now have an HFile with a 
 sequence id higher than unflushed edits.
 The log recovery code takes this into account by simply skipping the bulk 
 loaded files, but this bulk loaded status is *lost* on compaction. The 
 edits in the logs that have a sequence id lower than the bulk loaded file 
 that got compacted are put in a blind spot and are skipped during replay.
 Here's the easiest way to recreate this issue:
  - Create an empty table
  - Put one row in it (let's say it gets seqid 1)
  - Bulk load one file (it gets seqid 2). I used ImportTsv and set 
 hbase.mapreduce.bulkload.assign.sequenceNumbers.
  - Bulk load a second file the same way (it gets seqid 3).
  - Major compact the table (the new file has seqid 3 and isn't considered 
 bulk loaded).
  - Kill the region server that holds the table's region.
  - Scan the table once the region is made available again. The first row, at 
 seqid 1, will be missing since the HFile with seqid 3 makes us believe that 
 everything that came before it was flushed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-11008) Align bulk load, flush, and compact to require Action.CREATE

2014-04-16 Thread Jean-Daniel Cryans (JIRA)
Jean-Daniel Cryans created HBASE-11008:
--

 Summary: Align bulk load, flush, and compact to require 
Action.CREATE
 Key: HBASE-11008
 URL: https://issues.apache.org/jira/browse/HBASE-11008
 Project: HBase
  Issue Type: Improvement
  Components: security
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.99.0, 0.98.2, 0.96.3, 0.94.20


Over in HBASE-10958 we noticed that it might make sense to require 
Action.CREATE for bulk load, flush, and compact since it is also required for 
things like enable and disable.

This means the following changes:

 - preBulkLoadHFile goes from WRITE to CREATE
 - compact/flush go from ADMIN to ADMIN or CREATE



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11008) Align bulk load, flush, and compact to require Action.CREATE

2014-04-16 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-11008:
---

Attachment: HBASE-11008.patch

Attaching a patch for trunk. Not planning on committing until we get more 
consensus on this issue.

I do think that this is standalone from HBASE-10958 though, in the sense that 
these changes make sense to me on their own.

 Align bulk load, flush, and compact to require Action.CREATE
 

 Key: HBASE-11008
 URL: https://issues.apache.org/jira/browse/HBASE-11008
 Project: HBase
  Issue Type: Improvement
  Components: security
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.99.0, 0.98.2, 0.96.3, 0.94.20

 Attachments: HBASE-11008.patch


 Over in HBASE-10958 we noticed that it might make sense to require 
 Action.CREATE for bulk load, flush, and compact since it is also required for 
 things like enable and disable.
 This means the following changes:
  - preBulkLoadHFile goes from WRITE to CREATE
  - compact/flush go from ADMIN to ADMIN or CREATE



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11008) Align bulk load, flush, and compact to require Action.CREATE

2014-04-16 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-11008:
---

Status: Patch Available  (was: Open)

 Align bulk load, flush, and compact to require Action.CREATE
 

 Key: HBASE-11008
 URL: https://issues.apache.org/jira/browse/HBASE-11008
 Project: HBase
  Issue Type: Improvement
  Components: security
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.99.0, 0.98.2, 0.96.3, 0.94.20

 Attachments: HBASE-11008.patch


 Over in HBASE-10958 we noticed that it might make sense to require 
 Action.CREATE for bulk load, flush, and compact since it is also required for 
 things like enable and disable.
 This means the following changes:
  - preBulkLoadHFile goes from WRITE to CREATE
  - compact/flush go from ADMIN to ADMIN or CREATE



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed

2014-04-16 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972111#comment-13972111
 ] 

Jean-Daniel Cryans commented on HBASE-10958:


Forgot to reply to [~stack]'s comment:

bq. flushSucceeded method (should it be isFlushSucceeded) and then you go get 
the result by accessing the data member directly. Minor inconsistency.

flushSucceeded() isn't just looking up a field, though; it's checking two 
things. 

 [dataloss] Bulk loading with seqids can prevent some log entries from being 
 replayed
 

 Key: HBASE-10958
 URL: https://issues.apache.org/jira/browse/HBASE-10958
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.2, 0.98.1, 0.94.18
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.99.0, 0.94.19, 0.98.2, 0.96.3

 Attachments: HBASE-10958-less-intrusive-hack-0.96.patch, 
 HBASE-10958-quick-hack-0.96.patch, HBASE-10958-v2.patch, HBASE-10958.patch


 We found an issue with bulk loads causing data loss when assigning sequence 
 ids (HBASE-6630) that is triggered when replaying recovered edits. We're 
 nicknaming this issue *Blindspot*.
 The problem is that the sequence id given to a bulk loaded file is higher 
 than those of the edits in the region's memstore. When replaying recovered 
 edits, the rule to skip some of them is that they have to be _lower than the 
 highest sequence id_. In other words, the edits that have a sequence id lower 
 than the highest one in the store files *should* have also been flushed. This 
 is not the case with bulk loaded files since we now have an HFile with a 
 sequence id higher than unflushed edits.
 The log recovery code takes this into account by simply skipping the bulk 
 loaded files, but this bulk loaded status is *lost* on compaction. The 
 edits in the logs that have a sequence id lower than the bulk loaded file 
 that got compacted are put in a blind spot and are skipped during replay.
 Here's the easiest way to recreate this issue:
  - Create an empty table
  - Put one row in it (let's say it gets seqid 1)
  - Bulk load one file (it gets seqid 2). I used ImportTsv and set 
 hbase.mapreduce.bulkload.assign.sequenceNumbers.
  - Bulk load a second file the same way (it gets seqid 3).
  - Major compact the table (the new file has seqid 3 and isn't considered 
 bulk loaded).
  - Kill the region server that holds the table's region.
  - Scan the table once the region is made available again. The first row, at 
 seqid 1, will be missing since the HFile with seqid 3 makes us believe that 
 everything that came before it was flushed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11008) Align bulk load, flush, and compact to require Action.CREATE

2014-04-16 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13972130#comment-13972130
 ] 

Jean-Daniel Cryans commented on HBASE-11008:


bq.  If we choose to make it more restrictive,

The problem, as stated in HBASE-10958, is that bulk loading nominally requires 
WRITE, but by going through {{LoadIncrementalHFiles}} it effectively already 
requires CREATE. 

 Align bulk load, flush, and compact to require Action.CREATE
 

 Key: HBASE-11008
 URL: https://issues.apache.org/jira/browse/HBASE-11008
 Project: HBase
  Issue Type: Improvement
  Components: security
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
 Fix For: 0.99.0, 0.98.2, 0.96.3, 0.94.20

 Attachments: HBASE-11008.patch


 Over in HBASE-10958 we noticed that it might make sense to require 
 Action.CREATE for bulk load, flush, and compact since it is also required for 
 things like enable and disable.
 This means the following changes:
  - preBulkLoadHFile goes from WRITE to CREATE
  - compact/flush go from ADMIN to ADMIN or CREATE



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10981) taskmonitor use subList may cause recursion and get a java.lang.StackOverflowError

2014-04-15 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10981?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13969737#comment-13969737
 ] 

Jean-Daniel Cryans commented on HBASE-10981:


I'd rather we use my patch from HBASE-10312.

 taskmonitor use subList may cause recursion and get a 
 java.lang.StackOverflowError
 --

 Key: HBASE-10981
 URL: https://issues.apache.org/jira/browse/HBASE-10981
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.98.0, 0.96.0, 0.96.2, 0.98.1, 0.96.1.1, 0.98.2
 Environment: java version 1.7.0_03 64bit
Reporter: sinfox
Priority: Critical
 Attachments: fixsubListException.patch

   Original Estimate: 5m
  Remaining Estimate: 5m

 2014-04-14 11:52:45,905 ERROR [RS_CLOSE_REGION-in16-062:60020-1] 
 executor.EventHandler: Caught throwable while processing event
 M_RS_CLOSE_REGION
 java.lang.RuntimeException: java.lang.StackOverflowError
 at 
 org.apache.hadoop.hbase.regionserver.handler.CloseRegionHandler.process(CloseRegionHandler.java:161)
 at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:128)
 at 
 java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
 at java.lang.Thread.run(Thread.java:722)
 Caused by: java.lang.StackOverflowError
 at java.util.ArrayList$SubList.add(ArrayList.java:965)
 at java.util.ArrayList$SubList.add(ArrayList.java:965)



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed

2014-04-15 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13970015#comment-13970015
 ] 

Jean-Daniel Cryans commented on HBASE-10958:


Oh, and I found that testRegionMadeOfBulkLoadedFilesOnly wasn't really testing 
WAL replay with a bulk loaded file: the KV was being added at LATEST_TIMESTAMP, 
so far in the future that it wasn't visible. I made it so we check that we 
got all the rows back.

 [dataloss] Bulk loading with seqids can prevent some log entries from being 
 replayed
 

 Key: HBASE-10958
 URL: https://issues.apache.org/jira/browse/HBASE-10958
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.2, 0.98.1, 0.94.18
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.99.0, 0.94.19, 0.98.2, 0.96.3

 Attachments: HBASE-10958-less-intrusive-hack-0.96.patch, 
 HBASE-10958-quick-hack-0.96.patch, HBASE-10958-v2.patch, HBASE-10958.patch


 We found an issue with bulk loads causing data loss when assigning sequence 
 ids (HBASE-6630) that is triggered when replaying recovered edits. We're 
 nicknaming this issue *Blindspot*.
 The problem is that the sequence id given to a bulk loaded file is higher 
 than those of the edits in the region's memstore. When replaying recovered 
 edits, the rule to skip some of them is that they have to be _lower than the 
 highest sequence id_. In other words, the edits that have a sequence id lower 
 than the highest one in the store files *should* have also been flushed. This 
 is not the case with bulk loaded files since we now have an HFile with a 
 sequence id higher than unflushed edits.
 The log recovery code takes this into account by simply skipping the bulk 
 loaded files, but this bulk loaded status is *lost* on compaction. The 
 edits in the logs that have a sequence id lower than the bulk loaded file 
 that got compacted are put in a blind spot and are skipped during replay.
 Here's the easiest way to recreate this issue:
  - Create an empty table
  - Put one row in it (let's say it gets seqid 1)
  - Bulk load one file (it gets seqid 2). I used ImportTsv and set 
 hbase.mapreduce.bulkload.assign.sequenceNumbers.
  - Bulk load a second file the same way (it gets seqid 3).
  - Major compact the table (the new file has seqid 3 and isn't considered 
 bulk loaded).
  - Kill the region server that holds the table's region.
  - Scan the table once the region is made available again. The first row, at 
 seqid 1, will be missing since the HFile with seqid 3 makes us believe that 
 everything that came before it was flushed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed

2014-04-15 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-10958:
---

Attachment: HBASE-10958-v2.patch

New patch with a unit test inside {{TestWALReplay}}. I've done a bit of 
refactoring and I'm tempted to go a step further and somehow extract the common 
bits from testRegionMadeOfBulkLoadedFilesOnly and testCompactedBulkLoadedFiles 
but it seems a bit messy.

 [dataloss] Bulk loading with seqids can prevent some log entries from being 
 replayed
 

 Key: HBASE-10958
 URL: https://issues.apache.org/jira/browse/HBASE-10958
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.2, 0.98.1, 0.94.18
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.99.0, 0.94.19, 0.98.2, 0.96.3

 Attachments: HBASE-10958-less-intrusive-hack-0.96.patch, 
 HBASE-10958-quick-hack-0.96.patch, HBASE-10958-v2.patch, HBASE-10958.patch


 We found an issue with bulk loads causing data loss when assigning sequence 
 ids (HBASE-6630) that is triggered when replaying recovered edits. We're 
 nicknaming this issue *Blindspot*.
 The problem is that the sequence id given to a bulk loaded file is higher 
 than those of the edits in the region's memstore. When replaying recovered 
 edits, the rule to skip some of them is that they have to be _lower than the 
 highest sequence id_. In other words, the edits that have a sequence id lower 
 than the highest one in the store files *should* have also been flushed. This 
 is not the case with bulk loaded files since we now have an HFile with a 
 sequence id higher than unflushed edits.
 The log recovery code takes this into account by simply skipping the bulk 
 loaded files, but this bulk loaded status is *lost* on compaction. The 
 edits in the logs that have a sequence id lower than the bulk loaded file 
 that got compacted are put in a blind spot and are skipped during replay.
 Here's the easiest way to recreate this issue:
  - Create an empty table
  - Put one row in it (let's say it gets seqid 1)
  - Bulk load one file (it gets seqid 2). I used ImportTsv and set 
 hbase.mapreduce.bulkload.assign.sequenceNumbers.
  - Bulk load a second file the same way (it gets seqid 3).
  - Major compact the table (the new file has seqid 3 and isn't considered 
 bulk loaded).
  - Kill the region server that holds the table's region.
  - Scan the table once the region is made available again. The first row, at 
 seqid 1, will be missing since the HFile with seqid 3 makes us believe that 
 everything that came before it was flushed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed

2014-04-15 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13970118#comment-13970118
 ] 

Jean-Daniel Cryans commented on HBASE-10958:


That's a real test failure: the user that bulk loads doesn't have permission 
to flush. Looking.

 [dataloss] Bulk loading with seqids can prevent some log entries from being 
 replayed
 

 Key: HBASE-10958
 URL: https://issues.apache.org/jira/browse/HBASE-10958
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.2, 0.98.1, 0.94.18
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.99.0, 0.94.19, 0.98.2, 0.96.3

 Attachments: HBASE-10958-less-intrusive-hack-0.96.patch, 
 HBASE-10958-quick-hack-0.96.patch, HBASE-10958-v2.patch, HBASE-10958.patch


 We found an issue with bulk loads causing data loss when assigning sequence 
 ids (HBASE-6630) that is triggered when replaying recovered edits. We're 
 nicknaming this issue *Blindspot*.
 The problem is that the sequence id given to a bulk loaded file is higher 
 than those of the edits in the region's memstore. When replaying recovered 
 edits, the rule to skip some of them is that they have to be _lower than the 
 highest sequence id_. In other words, the edits that have a sequence id lower 
 than the highest one in the store files *should* have also been flushed. This 
 is not the case with bulk loaded files since we now have an HFile with a 
 sequence id higher than unflushed edits.
 The log recovery code takes this into account by simply skipping the bulk 
 loaded files, but this bulk loaded status is *lost* on compaction. The 
 edits in the logs that have a sequence id lower than the bulk loaded file 
 that got compacted are put in a blind spot and are skipped during replay.
 Here's the easiest way to recreate this issue:
  - Create an empty table
  - Put one row in it (let's say it gets seqid 1)
  - Bulk load one file (it gets seqid 2). I used ImportTsv and set 
 hbase.mapreduce.bulkload.assign.sequenceNumbers.
  - Bulk load a second file the same way (it gets seqid 3).
  - Major compact the table (the new file has seqid 3 and isn't considered 
 bulk loaded).
  - Kill the region server that holds the table's region.
  - Scan the table once the region is made available again. The first row, at 
 seqid 1, will be missing since the HFile with seqid 3 makes us believe that 
 everything that came before it was flushed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10312) Flooding the cluster with administrative actions leads to collapse

2014-04-15 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13970152#comment-13970152
 ] 

Jean-Daniel Cryans commented on HBASE-10312:


Doing now.

 Flooding the cluster with administrative actions leads to collapse
 --

 Key: HBASE-10312
 URL: https://issues.apache.org/jira/browse/HBASE-10312
 Project: HBase
  Issue Type: Bug
Reporter: Andrew Purtell
Assignee: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.99.0, 0.94.19, 0.98.2, 0.96.3

 Attachments: HBASE-10312-0.94.patch, HBASE-10312.patch


 Steps to reproduce:
 1. Start a cluster.
 2. Start an ingest process.
 3. In the HBase shell, do this:
 {noformat}
 while true do
flush 'table'
 end
 {noformat}
 We should reject abuse via administrative requests like this.
 What happens on the cluster is the requests back up, leading to lots of these:
 {noformat}
 2014-01-10 18:55:55,293 WARN  [Priority.RpcServer.handler=2,port=8120] 
 monitoring.TaskMonitor: Too many actions in action monitor! Purging some.
 {noformat}
 At this point we could lower a gate on further requests for actions until the 
 backlog clears (a sketch of that idea follows below).
 Continuing, all of the regionservers will eventually die with a 
 StackOverflowError of unknown origin because, well, stack overflow:
 {noformat}
 2014-01-10 19:02:02,783 ERROR [Priority.RpcServer.handler=3,port=8120] 
 ipc.RpcServer: Unexpected throwable object java.lang.StackOverflowError
 at java.util.ArrayList$SubList.add(ArrayList.java:965)
 [...]
 {noformat}
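
 A hedged sketch of the gate mentioned above (hypothetical threshold and 
 names, not a proposed patch): once too many admin actions are in flight, 
 reject new ones instead of queueing until the server collapses.
 {code:java}
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch only. Fail fast when the admin-action backlog is
// full so RPC handlers stay free for regular requests.
public class AdminActionGate {
    private static final int MAX_PENDING = 100; // assumed threshold
    private final AtomicInteger pending = new AtomicInteger();

    public void submit(Runnable adminAction) {
        // Reserve a slot atomically; back out if the gate is lowered.
        if (pending.incrementAndGet() > MAX_PENDING) {
            pending.decrementAndGet();
            throw new IllegalStateException("admin action backlog full, retry later");
        }
        try {
            adminAction.run();
        } finally {
            pending.decrementAndGet();
        }
    }
}
 {code}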



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed

2014-04-15 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970158#comment-13970158
 ] 

Jean-Daniel Cryans commented on HBASE-10958:


Yeah, and bulk loading itself is kind of like a flush since you end up with an 
HFile.

 [dataloss] Bulk loading with seqids can prevent some log entries from being 
 replayed
 

 Key: HBASE-10958
 URL: https://issues.apache.org/jira/browse/HBASE-10958
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.2, 0.98.1, 0.94.18
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.99.0, 0.94.19, 0.98.2, 0.96.3

 Attachments: HBASE-10958-less-intrusive-hack-0.96.patch, 
 HBASE-10958-quick-hack-0.96.patch, HBASE-10958-v2.patch, HBASE-10958.patch


 We found an issue with bulk loads causing data loss when assigning sequence 
 ids (HBASE-6630) that is triggered when replaying recovered edits. We're 
 nicknaming this issue *Blindspot*.
 The problem is that the sequence id given to a bulk loaded file is higher 
 than those of the edits in the region's memstore. When replaying recovered 
 edits, the rule to skip some of them is that they have to be _lower than the 
 highest sequence id_. In other words, the edits that have a sequence id lower 
 than the highest one in the store files *should* have also been flushed. This 
 is not the case with bulk loaded files since we now have an HFile with a 
 sequence id higher than unflushed edits.
 The log recovery code takes this into account by simply skipping the bulk 
 loaded files, but this bulk loaded status is *lost* on compaction. The 
 edits in the logs that have a sequence id lower than the bulk loaded file 
 that got compacted are put in a blind spot and are skipped during replay.
 Here's the easiest way to recreate this issue:
  - Create an empty table
  - Put one row in it (let's say it gets seqid 1)
  - Bulk load one file (it gets seqid 2). I used ImportTsv and set 
 hbase.mapreduce.bulkload.assign.sequenceNumbers.
  - Bulk load a second file the same way (it gets seqid 3).
  - Major compact the table (the new file has seqid 3 and isn't considered 
 bulk loaded).
  - Kill the region server that holds the table's region.
  - Scan the table once the region is made available again. The first row, at 
 seqid 1, will be missing since the HFile with seqid 3 makes us believe that 
 everything that came before it was flushed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-10312) Flooding the cluster with administrative actions leads to collapse

2014-04-15 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-10312?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-10312:
---

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

Committed to 0.99/0.98/0.96/0.94, thanks for the input, guys.

 Flooding the cluster with administrative actions leads to collapse
 --

 Key: HBASE-10312
 URL: https://issues.apache.org/jira/browse/HBASE-10312
 Project: HBase
  Issue Type: Bug
Reporter: Andrew Purtell
Assignee: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.99.0, 0.94.19, 0.98.2, 0.96.3

 Attachments: HBASE-10312-0.94.patch, HBASE-10312.patch


 Steps to reproduce:
 1. Start a cluster.
 2. Start an ingest process.
 3. In the HBase shell, do this:
 {noformat}
 while true do
flush 'table'
 end
 {noformat}
 We should reject abuse via administrative requests like this.
 What happens on the cluster is the requests back up, leading to lots of these:
 {noformat}
 2014-01-10 18:55:55,293 WARN  [Priority.RpcServer.handler=2,port=8120] 
 monitoring.TaskMonitor: Too many actions in action monitor! Purging some.
 {noformat}
 At this point we could lower a gate on further requests for actions until the 
 backlog clears.
 Continuing, all of the regionservers will eventually die with a 
 StackOverflowError of unknown origin because, well, stack overflow:
 {noformat}
 2014-01-10 19:02:02,783 ERROR [Priority.RpcServer.handler=3,port=8120] 
 ipc.RpcServer: Unexpected throwable object java.lang.StackOverflowError
 at java.util.ArrayList$SubList.add(ArrayList.java:965)
 [...]
 {noformat}



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed

2014-04-15 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970166#comment-13970166
 ] 

Jean-Daniel Cryans commented on HBASE-10958:


Right now bulk loading is WRITE and flush is ADMIN. Problematic!
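
To spell out why that combination bites, a toy model (illustrative only; 
none of this is the AccessController API, the names are made up):

{code:java}
import java.util.EnumSet;
import java.util.Set;

// Toy model of the permission mismatch; not HBase's AccessController.
public class AclMismatchDemo {
    enum Action { READ, WRITE, ADMIN }

    static void require(Set<Action> granted, Action needed, String op) {
        if (!granted.contains(needed)) {
            throw new SecurityException(op + " denied, needs " + needed);
        }
        System.out.println(op + " allowed");
    }

    public static void main(String[] args) {
        // A typical bulk-load user is granted WRITE but not ADMIN.
        Set<Action> granted = EnumSet.of(Action.WRITE);
        require(granted, Action.WRITE, "bulkLoadHFile"); // passes
        require(granted, Action.ADMIN, "flush");         // throws: the test failure
    }
}
{code}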

 [dataloss] Bulk loading with seqids can prevent some log entries from being 
 replayed
 

 Key: HBASE-10958
 URL: https://issues.apache.org/jira/browse/HBASE-10958
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.2, 0.98.1, 0.94.18
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.99.0, 0.94.19, 0.98.2, 0.96.3

 Attachments: HBASE-10958-less-intrusive-hack-0.96.patch, 
 HBASE-10958-quick-hack-0.96.patch, HBASE-10958-v2.patch, HBASE-10958.patch


 We found an issue with bulk loads causing data loss when assigning sequence 
 ids (HBASE-6630) that is triggered when replaying recovered edits. We're 
 nicknaming this issue *Blindspot*.
 The problem is that the sequence id given to a bulk loaded file is higher 
 than those of the edits in the region's memstore. When replaying recovered 
 edits, the rule to skip some of them is that they have to be _lower than the 
 highest sequence id_. In other words, the edits that have a sequence id lower 
 than the highest one in the store files *should* have also been flushed. This 
 is not the case with bulk loaded files since we now have an HFile with a 
 sequence id higher than unflushed edits.
 The log recovery code takes this into account by simply skipping the bulk 
 loaded files, but this bulk loaded status is *lost* on compaction. The 
 edits in the logs that have a sequence id lower than the bulk loaded file 
 that got compacted are put in a blind spot and are skipped during replay.
 Here's the easiest way to recreate this issue:
  - Create an empty table
  - Put one row in it (let's say it gets seqid 1)
  - Bulk load one file (it gets seqid 2). I used ImportTsv and set 
 hbase.mapreduce.bulkload.assign.sequenceNumbers.
  - Bulk load a second file the same way (it gets seqid 3).
  - Major compact the table (the new file has seqid 3 and isn't considered 
 bulk loaded).
  - Kill the region server that holds the table's region.
  - Scan the table once the region is made available again. The first row, at 
 seqid 1, will be missing since the HFile with seqid 3 makes us believe that 
 everything that came before it was flushed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10958) [dataloss] Bulk loading with seqids can prevent some log entries from being replayed

2014-04-15 Thread Jean-Daniel Cryans (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10958?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13970173#comment-13970173
 ] 

Jean-Daniel Cryans commented on HBASE-10958:


Solutions off the top of my head:

- Do a doAs inside the method (rough sketch below). The rationale being that 
it's the region server that's trying to do this here, not the user. I'm not 
sure if this is doable, and [~mbertozzi] is laughing at me.

- Matteo thinks that we should just make bulk load an ADMIN method. I agree, 
but I'm not fond of breaking secure setups that use bulk loads. [~apurtell]?
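
For the first option, the shape I have in mind is roughly this (a sketch 
only: it leans on Hadoop's UserGroupInformation, and whether the flush entry 
point can be reached like this is exactly what I'm unsure about):

{code:java}
import java.security.PrivilegedExceptionAction;
import org.apache.hadoop.hbase.regionserver.HRegion;
import org.apache.hadoop.security.UserGroupInformation;

// Sketch of option 1: perform the flush as the region server's own login
// principal rather than the calling user, so the ACL check sees the server.
// region.flushcache() stands in for whatever internal flush entry point
// applies here; treat the wiring as hypothetical.
public class ServerSideFlush {
    static void flushAsRegionServer(final HRegion region) throws Exception {
        UserGroupInformation.getLoginUser().doAs(
            new PrivilegedExceptionAction<Void>() {
                @Override
                public Void run() throws Exception {
                    region.flushcache(); // runs with the server's credentials
                    return null;
                }
            });
    }
}
{code}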

 [dataloss] Bulk loading with seqids can prevent some log entries from being 
 replayed
 

 Key: HBASE-10958
 URL: https://issues.apache.org/jira/browse/HBASE-10958
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.2, 0.98.1, 0.94.18
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.99.0, 0.94.19, 0.98.2, 0.96.3

 Attachments: HBASE-10958-less-intrusive-hack-0.96.patch, 
 HBASE-10958-quick-hack-0.96.patch, HBASE-10958-v2.patch, HBASE-10958.patch


 We found an issue with bulk loads causing data loss when assigning sequence 
 ids (HBASE-6630) that is triggered when replaying recovered edits. We're 
 nicknaming this issue *Blindspot*.
 The problem is that the sequence id given to a bulk loaded file is higher 
 than those of the edits in the region's memstore. When replaying recovered 
 edits, the rule to skip some of them is that they have to be _lower than the 
 highest sequence id_. In other words, the edits that have a sequence id lower 
 than the highest one in the store files *should* have also been flushed. This 
 is not the case with bulk loaded files since we now have an HFile with a 
 sequence id higher than unflushed edits.
 The log recovery code takes this into account by simply skipping the bulk 
 loaded files, but this bulk loaded status is *lost* on compaction. The 
 edits in the logs that have a sequence id lower than the bulk loaded file 
 that got compacted are put in a blind spot and are skipped during replay.
 Here's the easiest way to recreate this issue:
  - Create an empty table
  - Put one row in it (let's say it gets seqid 1)
  - Bulk load one file (it gets seqid 2). I used ImportTsv and set 
 hbase.mapreduce.bulkload.assign.sequenceNumbers.
  - Bulk load a second file the same way (it gets seqid 3).
  - Major compact the table (the new file has seqid 3 and isn't considered 
 bulk loaded).
  - Kill the region server that holds the table's region.
  - Scan the table once the region is made available again. The first row, at 
 seqid 1, will be missing since the HFile with seqid 3 makes us believe that 
 everything that came before it was flushed.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

