[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-25 Thread Mikhail Bautin (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4218:
--

Attachment: Delta-encoding-2012-01-25_00_45_29.patch

Submitting for Jenkins testing. This corresponds to the latest patch on 
Phabricator: https://reviews.facebook.net/D447?vs=&id=4407&whitespace=ignore-all


 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, 4218-2012-01-14.txt, 4218-v16.txt, 4218.txt, 
 D447.1.patch, D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, 
 D447.14.patch, D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, 
 D447.19.patch, D447.2.patch, D447.20.patch, D447.21.patch, D447.22.patch, 
 D447.23.patch, D447.24.patch, D447.25.patch, D447.3.patch, D447.4.patch, 
 D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, 
 Data-block-encoding-2011-12-23.patch, 
 Delta-encoding-2012-01-17_11_09_09.patch, 
 Delta-encoding-2012-01-25_00_45_29.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta-encoding.patch-2012-01-05_15_16_43.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44_copy.patch, 
 Delta-encoding.patch-2012-01-05_18_50_47.patch, 
 Delta-encoding.patch-2012-01-07_14_12_48.patch, 
 Delta-encoding.patch-2012-01-13_12_20_07.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression for keys. Keys are sorted in the HFile and they are usually very 
 similar, so it is possible to design a better compression than general-purpose 
 algorithms.
 It is an additional step designed to be used in memory. It aims to save 
 memory in the cache as well as to speed up seeks within HFileBlocks. It should 
 improve performance a lot if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when the value is a counter.
 Initial tests on real data (key length ~ 90 bytes, value length = 8 bytes) 
 show that a decent level of compression can be achieved:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 while offering much better performance (20-80% faster decompression than 
 LZO). Moreover, it should allow far more efficient seeking, which should 
 improve performance a bit.
 It seems that simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs, and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase, two important changes in design will be 
 needed:
 - solidify the interface to the HFileBlock / HFileReader scanner to provide 
 seeking and iterating; access to the uncompressed buffer in HFileBlock will 
 have bad performance
 - extend comparators to support comparison assuming that the first N bytes 
 are equal (or that some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windows&subj=Re+prefix+compression
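 The core prefix-compression idea can be sketched in a few lines of plain Java. 
 This is only a toy illustration (class and method names are made up for the 
 example), not the actual HBase encoder, whose block format also handles 
 timestamps, types, and memstore timestamps:

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration of prefix (delta) encoding over sorted keys: each key is
// stored as (length of prefix shared with the previous key, remaining suffix).
public class PrefixEncodingSketch {

    static List<String[]> encode(List<String> sortedKeys) {
        List<String[]> out = new ArrayList<>();
        String prev = "";
        for (String key : sortedKeys) {
            // Find how many leading characters this key shares with the previous one.
            int shared = 0;
            int max = Math.min(prev.length(), key.length());
            while (shared < max && prev.charAt(shared) == key.charAt(shared)) {
                shared++;
            }
            out.add(new String[] { Integer.toString(shared), key.substring(shared) });
            prev = key;
        }
        return out;
    }

    static List<String> decode(List<String[]> encoded) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (String[] entry : encoded) {
            // Rebuild the key from the previous decoded key plus the stored suffix.
            String key = prev.substring(0, Integer.parseInt(entry[0])) + entry[1];
            out.add(key);
            prev = key;
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> keys = List.of("row0001/cf:a", "row0001/cf:b", "row0002/cf:a");
        // "row0001/cf:b" shares 11 characters with its predecessor, so only "b" is stored.
        if (!decode(encode(keys)).equals(keys)) throw new AssertionError("round trip");
    }
}
```

 Because sorted neighbors share long prefixes, most of each key collapses to a 
 small length field plus a short suffix, which is where the ~92% key 
 compression quoted above plausibly comes from.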

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-25 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13192959#comment-13192959
 ] 

Hadoop QA commented on HBASE-4218:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12511817/Delta-encoding-2012-01-25_00_45_29.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 189 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -140 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 161 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
  org.apache.hadoop.hbase.client.TestAdmin
  org.apache.hadoop.hbase.mapred.TestTableMapReduce
  org.apache.hadoop.hbase.client.TestFromClientSide
  org.apache.hadoop.hbase.io.hfile.TestHFileBlock
  org.apache.hadoop.hbase.mapreduce.TestImportTsv

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/851//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/851//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/851//console

This message is automatically generated.

[jira] [Commented] (HBASE-5276) PerformanceEvaluation does not set the correct classpath for MR because it lives in the test jar

2012-01-25 Thread Tim Robertson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193021#comment-13193021
 ] 

Tim Robertson commented on HBASE-5276:
--

Lars G. points out that Stack fixed this before in 
https://github.com/apache/hbase/commit/e3f165f8f7327af53427a35f74a450b4df179ccc, 
but seemingly it didn't make it into CDH3u2.

 PerformanceEvaluation does not set the correct classpath for MR because it 
 lives in the test jar
 

 Key: HBASE-5276
 URL: https://issues.apache.org/jira/browse/HBASE-5276
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.90.4
Reporter: Tim Robertson
Priority: Minor

 Note: This was discovered running the CDH version hbase-0.90.4-cdh3u2
 Running the PerformanceEvaluation as follows:
   $HADOOP_HOME/bin/hadoop org.apache.hadoop.hbase.PerformanceEvaluation scan 5
 fails because the MR tasks do not get the HBase jar on the CP, and thus hit 
 ClassNotFoundExceptions.
 The job gets the following only:
   file:/Users/tim/dev/hadoop/hbase-0.90.4-cdh3u2/hbase-0.90.4-cdh3u2-tests.jar
   
 file:/Users/tim/dev/hadoop/hadoop-0.20.2-cdh3u2/hadoop-core-0.20.2-cdh3u2.jar
   
 file:/Users/tim/dev/hadoop/hbase-0.90.4-cdh3u2/lib/zookeeper-3.3.3-cdh3u2.jar
 The RowCounter etc all work because they live in the HBase jar, not the test 
 jar, and they get the following 
   file:/Users/tim/dev/hadoop/hbase-0.90.4-cdh3u2/lib/guava-r06.jar
   
 file:/Users/tim/dev/hadoop/hadoop-0.20.2-cdh3u2/hadoop-core-0.20.2-cdh3u2.jar
   file:/Users/tim/dev/hadoop/hbase-0.90.4-cdh3u2/hbase-0.90.4-cdh3u2.jar
   
 file:/Users/tim/dev/hadoop/hbase-0.90.4-cdh3u2/lib/zookeeper-3.3.3-cdh3u2.jar
 Presumably this relates to 
   job.setJarByClass(PerformanceEvaluation.class);
   ...
   TableMapReduceUtil.addDependencyJars(job);
 A (cowboy) workaround to run PE is to unpack the jars and copy the 
 PerformanceEvaluation* classes over, building a patched jar.
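 The classpath symptom follows from how job.setJarByClass(...) works: Hadoop 
 ships whichever jar physically contains the given class, so a driver class 
 living in the tests jar drags only the tests jar (not the main HBase jar) 
 onto the task classpath. A dependency-free sketch of that lookup (the class 
 name here is illustrative, not Hadoop API):

```java
// Illustrates the jar-resolution mechanism behind job.setJarByClass(...):
// ask the classloader which code source (jar or directory) a class came from.
public class JarLocationSketch {
    static String locationOf(Class<?> clazz) {
        java.security.CodeSource src = clazz.getProtectionDomain().getCodeSource();
        // JDK core classes are loaded by the bootstrap loader and have no code source.
        return src == null ? "(bootstrap)" : src.getLocation().toString();
    }

    public static void main(String[] args) {
        // Prints the jar or directory this class was loaded from; for
        // PerformanceEvaluation this would be the *-tests.jar.
        System.out.println(locationOf(JarLocationSketch.class));
    }
}
```

 This is why RowCounter and friends work: resolving their location yields the 
 main HBase jar, which then reaches the MR tasks.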





[jira] [Commented] (HBASE-2965) Implement MultipleTableInputs which is analogous to MultipleInputs in Hadoop

2012-01-25 Thread Alexey Romanenko (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2965?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193027#comment-13193027
 ] 

Alexey Romanenko commented on HBASE-2965:
-

It seems this is not implemented yet, isn't it?

 Implement MultipleTableInputs which is analogous to MultipleInputs in Hadoop
 

 Key: HBASE-2965
 URL: https://issues.apache.org/jira/browse/HBASE-2965
 Project: HBase
  Issue Type: New Feature
  Components: mapred, mapreduce
Reporter: Adam Warrington
Assignee: Ophir Cohen
Priority: Minor

 This feature would be helpful for doing reduce side joins, or even passing 
 similarly structured data from multiple tables through map reduce. The API I 
 envision would be very similar to the already existent MultipleInputs, parts 
 of which could be reused.
 MultipleTableInputs would have a public api like:
 class MultipleTableInputs {
   public static void addInputTable(Job job, Table table, Scan scan,
       Class<? extends TableInputFormatBase> inputFormatClass,
       Class<? extends Mapper> mapperClass);
 };
 MultipleTableInputs would build a mapping of Tables to configured 
 TableInputFormats the same way MultipleInputs builds a mapping between Paths 
 and InputFormats. Since most people will probably use TableInputFormat.class 
 as the input format class, the MultipleTableInput implementation will have to 
 replace the TableInputFormatBase's private scan and table members that are 
 configured when an instance of TableInputFormat is created (from within its 
 setConf() method) by calling setScan and setHTable with the table and scan 
 that are passed into addInputTable above. MultipleTableInputFormat's 
 addInputTable() member function would also set the input format for the job 
 to DelegatingTableInputFormat, described below.
 A new class called DelegatingTableInputFormat would be analogous to 
 DelegatingInputFormat, where getSplits() would return TaggedInputSplits (same 
 TaggedInputSplit object that the Hadoop DelegatingInputFormat uses), which 
 tag the split with its InputFormat and Mapper. These are created by looping 
 through the HTable to InputFormat mappings, and calling getSplits on each 
 input format, and using the split, the input format, and mapper as 
 constructor args to TaggedInputSplits.
 The createRecordReader() function in DelegatingTableInputFormat could have 
 the same implementation as the Hadoop DelegatingInputFormat.
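 The delegation scheme proposed above (a registry of inputs, with every split 
 tagged by the format and mapper that should process it) can be sketched 
 without any Hadoop dependencies. All names below are illustrative stand-ins, 
 not actual HBase or Hadoop API:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Toy sketch of the delegating pattern: register (source -> format, mapper)
// pairs, then tag each split produced by a source with its handler, the way
// DelegatingInputFormat wraps splits in TaggedInputSplit.
public class DelegationSketch {
    static class TaggedSplit {
        final String split, inputFormat, mapper;
        TaggedSplit(String split, String inputFormat, String mapper) {
            this.split = split; this.inputFormat = inputFormat; this.mapper = mapper;
        }
    }

    private final Map<String, String[]> registry = new LinkedHashMap<>();

    void addInput(String table, String inputFormat, String mapper) {
        registry.put(table, new String[] { inputFormat, mapper });
    }

    // Analogous to getSplits(): loop over the registered sources and tag
    // each source's splits with the format and mapper that own them.
    List<TaggedSplit> getSplits() {
        List<TaggedSplit> splits = new ArrayList<>();
        for (Map.Entry<String, String[]> e : registry.entrySet()) {
            splits.add(new TaggedSplit(e.getKey() + "-split-0",
                    e.getValue()[0], e.getValue()[1]));
        }
        return splits;
    }

    public static void main(String[] args) {
        DelegationSketch d = new DelegationSketch();
        d.addInput("users", "TableInputFormat", "UserMapper");
        d.addInput("events", "TableInputFormat", "EventMapper");
        System.out.println(d.getSplits().size());
    }
}
```

 At task time, the record reader would inspect the tag on its split and 
 instantiate the matching format and mapper, exactly as the Hadoop 
 DelegatingInputFormat/DelegatingMapper pair does for file inputs.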





[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

2012-01-25 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193119#comment-13193119
 ] 

jirapos...@reviews.apache.org commented on HBASE-5128:
--



bq.  On 2012-01-14 00:15:01, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 91
bq.   https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line91
bq.  
bq.   I think '.META.' should be used.

ok


bq.  On 2012-01-14 00:15:01, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 118
bq.   https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line118
bq.  
bq.   Should read 'that it was assigned to'

ok


bq.  On 2012-01-14 00:15:01, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 154
bq.   https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line154
bq.  
bq.   This is about fixing region assignment, right ?
bq.   Better include that in javadoc.

done


bq.  On 2012-01-14 00:15:01, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 121
bq.   https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line121
bq.  
bq.   Should read 'repairs require hbase ...'
bq.   
bq.   'to' at the end is not needed.

done


bq.  On 2012-01-14 00:15:01, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 172
bq.   https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line172
bq.  
bq.   Should read ' and correct '

done


bq.  On 2012-01-14 00:15:01, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 174
bq.   https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line174
bq.  
bq.   Would regionInfoMap be a better name ?

done


bq.  On 2012-01-14 00:15:01, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 270
bq.   https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line270
bq.  
bq.   Please correct this sentence's syntax.

sure


bq.  On 2012-01-14 00:15:01, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 280
bq.   https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line280
bq.  
bq.   We should impose maximum number of iterations for the loop, right ?

good point.


bq.  On 2012-01-14 00:15:01, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 287
bq.   https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line287
bq.  
bq.   Should read 'method requires cluster to be online ...'

done.


bq.  On 2012-01-14 00:15:01, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 289
bq.   https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line289
bq.  
bq.   Should read ' to be consistent'

reworded


bq.  On 2012-01-14 00:15:01, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 337
bq.   https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line337
bq.  
bq.   Should be called checkAndFixIntegrity()

ok.


bq.  On 2012-01-14 00:15:01, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 334
bq.   https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line334
bq.  
bq.   Should be called checkAndFixConsistency()

ok


bq.  On 2012-01-14 00:15:01, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 343
bq.   https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line343
bq.  
bq.   This sentence can be omitted.
bq.   If you keep it, please move it after the @return line.

removed


- jmhsieh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/#review4379
---


On 2012-01-13 22:49:33, jmhsieh wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3435/
bq.  ---
bq.  
bq.  (Updated 2012-01-13 22:49:33)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and 
Jean-Daniel Cryans.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  I'm posting a preliminary version that I'm currently testing on real 
clusters. The tests are flakey on the 0.90 branch (so there is something async 
that I didn't synchronize properly), and there are a few more TODO's I want to 
knock out before this is ready for full review to be considered for committing. 
It's got some problems I need some advice figuring out.
bq.  
bq.  Problem 1:
bq.  
bq.  In the unit tests, I have a few cases where I fabricate new regions and 
try to 

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

2012-01-25 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193121#comment-13193121
 ] 

jirapos...@reviews.apache.org commented on HBASE-5128:
--



bq.  On 2012-01-11 21:15:13, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1586
bq.   https://reviews.apache.org/r/3435/diff/1/?file=67172#file67172line1586
bq.  
bq.   Should be 'to end key'.

update this and handful of other comments.


bq.  On 2012-01-11 21:15:13, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1594
bq.   https://reviews.apache.org/r/3435/diff/1/?file=67172#file67172line1594
bq.  
bq.   Should insert some text between newRegion and region.

updated


bq.  On 2012-01-11 21:15:13, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1600
bq.   https://reviews.apache.org/r/3435/diff/1/?file=67172#file67172line1600
bq.  
bq.   This should be outside the for loop.

done


bq.  On 2012-01-11 21:15:13, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1602
bq.   https://reviews.apache.org/r/3435/diff/1/?file=67172#file67172line1602
bq.  
bq.   Space between  and 0.

done


- jmhsieh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/#review4317
---


On 2012-01-13 22:49:33, jmhsieh wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3435/
bq.  ---
bq.  
bq.  (Updated 2012-01-13 22:49:33)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and 
Jean-Daniel Cryans.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  I'm posting a preliminary version that I'm currently testing on real 
clusters. The tests are flakey on the 0.90 branch (so there is something async 
that I didn't synchronize properly), and there are a few more TODO's I want to 
knock out before this is ready for full review to be considered for committing. 
It's got some problems I need some advice figuring out.
bq.  
bq.  Problem 1:
bq.  
bq.  In the unit tests, I have a few cases where I fabricate new regions and 
try to force the overlapping regions to be closed. For some of these, I cannot 
delete a table after it is repaired without causing subsequent tests to fail. I 
think this is due to a few things:
bq.  
bq.  1) The disable table handler uses in-memory assignment manager state while 
delete uses in META assignment information.
bq.  2) Currently I'm using the sneaky closeRegion that purposely doesn't go 
through the master and in turn doesn't modify in-memory state -- disable uses 
out-of-date in-memory region assignments. If I use the unassign method, it 
sends RIT transitions to the master, but that ends up attempting to assign the 
region again, causing timing/transient states.
bq.  
bq.  What is a good way to clear the HMaster's assignment manager's assignment 
data for particular regions or to force it to re-read from META? (without 
modifying the 0.90 HBase's it is meant to repair).
bq.  
bq.  Problem 2:
bq.  
bq.  Sometimes tests fail reporting HOLE_IN_REGION_CHAIN and 
SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused 
with each other and basically something is still happening asynchronously. I 
think this is because the new region is being assigned and is still 
transitioning. Sound about right? To make the unit test deterministic, should 
hbck wait for these to settle or should just the unit test wait?
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.  https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java c56b3a6 
bq.src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 
330a7cc 
bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java 3c7b68d 
bq.src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d 
bq.src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b 
bq.src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 
29e8bb2 
bq.
src/main/java/org/apache/hadoop/hbase/util/hbck/TableIntegrityErrorHandler.java 
PRE-CREATION 
bq.src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java a640d57 
bq.src/test/java/org/apache/hadoop/hbase/util/hbck/HbckTestingUtil.java 
dbb97f8 
bq.
src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildBase.java 
3e8729d 
bq.
src/test/java/org/apache/hadoop/hbase/util/hbck/TestOfflineMetaRebuildHole.java 
11a1151 
bq.

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

2012-01-25 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193120#comment-13193120
 ] 

jirapos...@reviews.apache.org commented on HBASE-5128:
--



bq.  On 2012-01-14 05:43:38, Lars Hofhansl wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java, 
line 586
bq.   https://reviews.apache.org/r/3435/diff/2/?file=68919#file68919line586
bq.  
bq.   I liked this better before :)

I probably broke this out to be easier to step debug.   I can restore.


bq.  On 2012-01-14 05:43:38, Lars Hofhansl wrote:
bq.   src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java, line 154
bq.   https://reviews.apache.org/r/3435/diff/2/?file=68922#file68922line154
bq.  
bq.   No wait in case of exception. Is that by design?

nice catch. 


bq.  On 2012-01-14 05:43:38, Lars Hofhansl wrote:
bq.   src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1083
bq.   https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line1083
bq.  
bq.   I think you said in the intro, that you need to check the 
availability of this rpc.

done in next version.


bq.  On 2012-01-14 05:43:38, Lars Hofhansl wrote:
bq.   src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 1072
bq.   https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line1072
bq.  
bq.   0.90.6?

updated to 0.90.6, with the assumption that this feature will not make it 
there (but hopefully into 0.90.7)


bq.  On 2012-01-14 05:43:38, Lars Hofhansl wrote:
bq.   src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 2275
bq.   https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line2275
bq.  
bq.   I know this is not new, but this ErrorReporter is used for status 
messages as well as error reporting. Should maybe have a different name.
bq.   
bq.   Also should messages go to STDOUT (out) and error go to STDERR (err)?

TODO -- I'll follow up on this after the next round.


bq.  On 2012-01-14 05:43:38, Lars Hofhansl wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/HMaster.java, line 1053
bq.   https://reviews.apache.org/r/3435/diff/2/?file=68920#file68920line1053
bq.  
bq.   Should we add a double check here that the region is in fact offline 
(by checking .META.) or is that too expensive/not-needed?
bq.   
bq.   I'm thinking that once this method exists, folks will eventually call 
it for other reasons.

Currently, we needed this method to explicitly remove information from the 
Master's memory.  In the cases where this is used, I've directly removed data 
from meta (Delete into .META.) and closed the regions on region servers 
directly (HRegionInterface#closeRegion).

I haven't worked it out completely yet, but it probably makes sense to fix 
closeRegion to properly add a param that will remove this in-memory master 
state as well. I was under the gun to get something working, and now, having 
accomplished that, I'm definitely open to refactoring this to make it saner 
and to clean it up more.


bq.  On 2012-01-14 05:43:38, Lars Hofhansl wrote:
bq.   src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java, line 90
bq.   https://reviews.apache.org/r/3435/diff/2/?file=68921#file68921line90
bq.  
bq.   Nice documentation. This tool is awesome.

thanks!


- jmhsieh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/#review4384
---


On 2012-01-13 22:49:33, jmhsieh wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3435/
bq.  ---
bq.  
bq.  (Updated 2012-01-13 22:49:33)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and 
Jean-Daniel Cryans.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  I'm posting a preliminary version that I'm currently testing on real 
clusters. The tests are flakey on the 0.90 branch (so there is something async 
that I didn't synchronize properly), and there are a few more TODO's I want to 
knock out before this is ready for full review to be considered for committing. 
It's got some problems I need some advice figuring out.
bq.  
bq.  Problem 1:
bq.  
bq.  In the unit tests, I have a few cases where I fabricate new regions and 
try to force the overlapping regions to be closed. For some of these, I cannot 
delete a table after it is repaired without causing subsequent tests to fail. I 
think this is due to a few things:
bq.  
bq.  1) The disable table handler uses in-memory assignment manager state while 
delete uses in META assignment information.
bq.  2) Currently I'm using the sneaky closeRegion that purposely doesn't go 

[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-25 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193130#comment-13193130
 ] 

Phabricator commented on HBASE-4218:


gqchen has commented on the revision [jira] [HBASE-4218] HFile data block 
encoding framework and delta encoding implementation.

  Looks really good to me!

  I haven't finished reviewing DiffKeyDeltaEncoding (another day or so) and 
might have a few minor comments about cosmetic things. But definitely 
no need to wait for that.

INLINE COMMENTS
  
src/main/java/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.java:198 
I think the logic is the following:
  1. if the value is not the same, copy the whole value.
  2. however, if the type is also not the same, take advantage of the fact that 
the type field is right ahead of the value, and copy both type and value in one 
shot.

  So the code would be like:

  if ((flag & FLAG_SAME_VALUE) == 0) {
    if ((flag & FLAG_SAME_TYPE) == 0) {
      valueOffset -= ...
      valueLength += ...
    }
    ByteBufferUtils.copy...
  }

  The headache is that if we decide to add one more field between type and 
value in the future, this code will be silently broken.

REVISION DETAIL
  https://reviews.facebook.net/D447


 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, 4218-2012-01-14.txt, 4218-v16.txt, 4218.txt, 
 D447.1.patch, D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, 
 D447.14.patch, D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, 
 D447.19.patch, D447.2.patch, D447.20.patch, D447.21.patch, D447.22.patch, 
 D447.23.patch, D447.24.patch, D447.25.patch, D447.3.patch, D447.4.patch, 
 D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, D447.9.patch, 
 Data-block-encoding-2011-12-23.patch, 
 Delta-encoding-2012-01-17_11_09_09.patch, 
 Delta-encoding-2012-01-25_00_45_29.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta-encoding.patch-2012-01-05_15_16_43.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44_copy.patch, 
 Delta-encoding.patch-2012-01-05_18_50_47.patch, 
 Delta-encoding.patch-2012-01-07_14_12_48.patch, 
 Delta-encoding.patch-2012-01-13_12_20_07.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general purpose algorithms,
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as speeding seeks within HFileBlocks. It should 
 improve performance a lot, if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when value is a counter.
 Initial tests on real data (key length ~ 90 bytes, value length = 8 bytes) 
 show that I could achieve a decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 while having much better performance (20-80% faster decompression than 
 LZO). Moreover, it should allow far more efficient seeking, which should 
 improve performance a bit.
 It seems that simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
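The prefix-compression part of the scheme can be sketched as follows. This is an illustrative toy with invented names (`PrefixCompressionSketch`, `Entry`), not the actual HBase encoder code: each key is stored as the length of the prefix it shares with the previous key plus the remaining suffix bytes, so sorted, similar keys collapse to short suffixes.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative sketch of prefix compression over sorted keys -- hypothetical
// helper names, not the actual HBase data block encoder API. Each key becomes
// (length of prefix shared with the previous key, remaining suffix bytes).
public final class PrefixCompressionSketch {

  private static int commonPrefix(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    int i = 0;
    while (i < n && a[i] == b[i]) {
      i++;
    }
    return i;
  }

  /** One encoded entry: how many leading bytes to reuse, plus the new tail. */
  public static final class Entry {
    final int shared;
    final byte[] suffix;
    Entry(int shared, byte[] suffix) { this.shared = shared; this.suffix = suffix; }
  }

  /** Encodes sorted keys; similar neighbours collapse to short suffixes. */
  public static List<Entry> encode(List<byte[]> sortedKeys) {
    List<Entry> out = new ArrayList<>();
    byte[] prev = new byte[0];
    for (byte[] key : sortedKeys) {
      int shared = commonPrefix(prev, key);
      byte[] suffix = java.util.Arrays.copyOfRange(key, shared, key.length);
      out.add(new Entry(shared, suffix));
      prev = key;
    }
    return out;
  }

  /** Rebuilds the original keys by replaying shared prefixes. */
  public static List<byte[]> decode(List<Entry> encoded) {
    List<byte[]> out = new ArrayList<>();
    byte[] prev = new byte[0];
    for (Entry e : encoded) {
      byte[] key = new byte[e.shared + e.suffix.length];
      System.arraycopy(prev, 0, key, 0, e.shared);
      System.arraycopy(e.suffix, 0, key, e.shared, e.suffix.length);
      out.add(key);
      prev = key;
    }
    return out;
  }
}
```

With ~90-byte keys that differ only in a short tail, each entry stores a few suffix bytes instead of the whole key, which is where the quoted compression ratios would come from.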
 In order to implement it in HBase, two important changes in design will be 
 needed:
 - solidify the interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to the uncompressed buffer in HFileBlock will have bad 
 performance
 - extend comparators to support comparison assuming that the first N bytes are 
 equal (or some fields are equal)
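The second change, a comparator that can skip a known-equal prefix, might look roughly like this. The class and method names are invented for illustration; the real change would extend HBase's own KeyValue comparators:

```java
// Hypothetical sketch: compare two keys as unsigned bytes, assuming the first
// commonLength bytes are already known to be equal (so the loop skips them).
public final class PrefixAwareComparator {
  public static int compareSkippingPrefix(byte[] a, byte[] b, int commonLength) {
    int n = Math.min(a.length, b.length);
    for (int i = commonLength; i < n; i++) {
      // Mask to 0xff so bytes compare as unsigned, matching lexicographic order.
      int d = (a[i] & 0xff) - (b[i] & 0xff);
      if (d != 0) {
        return d;
      }
    }
    // All compared bytes equal: the shorter key sorts first.
    return a.length - b.length;
  }
}
```

During a seek within a prefix-compressed block, the shared-prefix length stored with each entry tells the caller how many leading bytes are provably equal, so comparisons start directly at the first possibly-different byte.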
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windows&subj=Re+prefix+compression

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5277) Put AssignmentManager on a diet

2012-01-25 Thread stack (Created) (JIRA)
Put AssignmentManager on a diet
---

 Key: HBASE-5277
 URL: https://issues.apache.org/jira/browse/HBASE-5277
 Project: HBase
  Issue Type: Task
Reporter: stack


AM has five or six inner classes, it takes zk callbacks, it has distinct Maps 
that are supposed to be edited together (regions and servers) and that then 
have some interaction w/ notion of regionsInTransition, etc.  It could do w/ a 
tune-up so it comes back into the realm of the penetrable (currently I'd not be 
surprised if its workings were beyond the ken of all who here code). 





[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

2012-01-25 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193139#comment-13193139
 ] 

jirapos...@reviews.apache.org commented on HBASE-5128:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/
---

(Updated 2012-01-25 17:24:41.277326)


Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and Jean-Daniel 
Cryans.


Changes
---

This version includes updates after testing against real online but idle 
clusters with real induced corruptions.  This hbck was tested successfully 
against region servers running apache/0.90 plus this patch, and against 
regionservers on cdh3u2 (an 0.90.4-based hbase without the new offline method).

I'm going to post usage description and images I've created to explain this 
better on the JIRA.

High-level changes in this rev:
- hbck now wraps calls to the offline method and will use unassign if the 
target region server does not support offline.
- restructured hdfs integrity repairs into more phases -- when compound 
problems were present we'd get into a loop where orphan repair would cause new 
overlaps on a subsequent integrity repair iteration.  This new approach should 
be deterministic. The new phases are 1) find hdfs holes and patch them (post 
condition: no more holes), 2) adopt orphan hdfs regions (post condition: no 
orphan data in hdfs), 3) reload and fix overlaps (precondition: no holes but 
overlaps possible; post condition: no overlaps).  Previously, integrity repairs 
would iterate doing all three until they converged (but this didn't always 
happen in practice!). 
- Added more command line options that allow this hbck to only attempt certain 
repairs (which is necessary to get overlap repairs to work more 
deterministically, and needed to get hbases that don't support offline to 
converge).
- Added a few more test cases for new corruptions.
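The phase ordering described above could be sketched like this. Class and method names are invented for illustration, not the actual HBaseFsck code; the point is that each phase runs once, in a fixed order, with its post-condition established before the next phase starts, instead of looping all three repairs until convergence:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of the three-phase hdfs integrity repair ordering.
// Names are invented; this is not the actual HBaseFsck code.
public final class RepairPhasesSketch {
  final List<String> completed = new ArrayList<>();

  void patchHdfsHoles()       { completed.add("holes"); }    // post: no holes
  void adoptHdfsOrphans()     { completed.add("orphans"); }  // post: no orphan data
  void reloadAndFixOverlaps() { completed.add("overlaps"); } // post: no overlaps

  // Runs each phase exactly once, in order; deterministic by construction,
  // unlike iterating all repairs and hoping the state converges.
  List<String> repairHdfsIntegrity() {
    patchHdfsHoles();
    adoptHdfsOrphans();
    reloadAndFixOverlaps();
    return completed;
  }
}
```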

One big caveat with this rev is that the hbase was online but idle (no writes 
happening).   It was also suggested that I need to worry about compactions when 
I close regions during overlap merging (JD -- I didn't see anything in 
OnlineMerge -- why wasn't this a concern there?).  If so, I'd like advice on 
how to add guards to protect the user (is a glaring warning message or 
requiring confirmation sufficient?).  I'm going to do some initial testing on 
online and active cases -- but ideally would like this to come in follow on 
jiras.  


Summary
---

I'm posting a preliminary version that I'm currently testing on real clusters. 
The tests are flaky on the 0.90 branch (so there is something async that I 
didn't synchronize properly), and there are a few more TODOs I want to knock 
out before this is ready for full review to be considered for committing. It's 
got some problems I need some advice figuring out.

Problem 1:

In the unit tests, I have a few cases where I fabricate new regions and try to 
force the overlapping regions to be closed. For some of these, I cannot delete 
a table after it is repaired without causing subsequent tests to fail. I think 
this is due to a few things:

1) The disable table handler uses in-memory assignment manager state while 
delete uses in META assignment information.
2) Currently I'm using the sneaky closeRegion that purposely doesn't go through 
the master and in turn doesn't modify in-memory state, so disable uses 
out-of-date in-memory region assignments. If I use the unassign method, it 
sends RIT transitions to the master, which ends up attempting to assign the 
region again, causing timing/transient states.

What is a good way to clear the HMaster's assignment manager's assignment data 
for particular regions or to force it to re-read from META? (without modifying 
the 0.90 HBase's it is meant to repair).

Problem 2:

Sometimes tests fail reporting HOLE_IN_REGION_CHAIN and 
SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confused 
with each other and basically something is still happening asynchronously. I 
think this is because the new region is being assigned and is still 
transitioning. Sound about right? To make the unit test deterministic, should 
hbck wait for these to settle, or should just the unit test wait?


This addresses bug HBASE-5128.
https://issues.apache.org/jira/browse/HBASE-5128


Diffs (updated)
-

  src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java c56b3a6 
  src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 9520b95 
  src/main/java/org/apache/hadoop/hbase/master/HMaster.java f7ad064 
  src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 6d3401d 
  src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java a3d8b8b 
  src/main/java/org/apache/hadoop/hbase/util/hbck/OfflineMetaRepair.java 
29e8bb2 
  

[jira] [Updated] (HBASE-5277) Put AssignmentManager on a diet

2012-01-25 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5277?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5277:
-

Attachment: 5277v1.txt

Move RegionState out into its own package-private class.  Make a new 
datastructure, ServersAndRegions, that manages the servers-to-regions and 
regions-to-servers Maps.

Not done yet.  Cuts AM by 1k lines or about 8%.  More to do.






[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-25 Thread Mikhail Bautin (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193148#comment-13193148
 ] 

Mikhail Bautin commented on HBASE-4218:
---

Re-running unit tests that failed on Jenkins:

Running org.apache.hadoop.hbase.client.TestFromClientSide
Tests run: 52, Failures: 0, Errors: 0, Skipped: 3, Time elapsed: 181.919 sec
Running org.apache.hadoop.hbase.client.TestAdmin
Tests run: 35, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 195.194 sec
Running org.apache.hadoop.hbase.mapreduce.TestHFileOutputFormat
Tests run: 9, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 223.405 sec
Running org.apache.hadoop.hbase.mapreduce.TestImportTsv
Tests run: 8, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 78.48 sec
Running org.apache.hadoop.hbase.mapreduce.TestTableMapReduce
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 97.561 sec
Running org.apache.hadoop.hbase.mapred.TestTableMapReduce
Tests run: 1, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 67.289 sec
Running org.apache.hadoop.hbase.io.hfile.TestHFileBlock
Tests run: 16, Failures: 0, Errors: 0, Skipped: 0, Time elapsed: 49.362 sec

Results :

Tests run: 122, Failures: 0, Errors: 0, Skipped: 3







[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

2012-01-25 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193154#comment-13193154
 ] 

jirapos...@reviews.apache.org commented on HBASE-5128:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/3435/#review4591
---


We should deprecate clearRegionFromTransition().


src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
https://reviews.apache.org/r/3435/#comment10238

I think a boolean return value would help determine the outcome of the 
action.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
https://reviews.apache.org/r/3435/#comment10237

This sentence should be moved before ' from ...'



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
https://reviews.apache.org/r/3435/#comment10234

We should handle potential exception from this method.

Maybe we should check the availability of this rpc outside the loop and set 
a flag indicating whether Master supports this RPC.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
https://reviews.apache.org/r/3435/#comment10240

I would expect a boolean return value since we may return without throwing an 
exception (line 1125)



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
https://reviews.apache.org/r/3435/#comment10239

How about naming this method hasHdfsOnlyEdits() ?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
https://reviews.apache.org/r/3435/#comment10233

This TODO has been implemented, so we can remove it.



src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java
https://reviews.apache.org/r/3435/#comment10232

More action is needed beyond a WARN message, right ?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
https://reviews.apache.org/r/3435/#comment10235

success is a local variable.
Why don't we change the return type to boolean and return its value ?



src/main/java/org/apache/hadoop/hbase/util/HBaseFsckRepair.java
https://reviews.apache.org/r/3435/#comment10236

We should set interrupt flag.


- Ted


On 2012-01-25 17:24:41, jmhsieh wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/3435/
bq.  ---
bq.  
bq.  (Updated 2012-01-25 17:24:41)
bq.  
bq.  
bq.  Review request for hbase, Todd Lipcon, Ted Yu, Michael Stack, and 
Jean-Daniel Cryans.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  I'm posting a preliminary version that I'm currently testing on real 
clusters. The tests are flakey on the 0.90 branch (so there is something async 
that I didn't synchronize properly), and there are a few more TODO's I want to 
knock out before this is ready for full review to be considered for committing. 
It's got some problems I need some advice figuring out.
bq.  
bq.  Problem 1:
bq.  
bq.  In the unit tests, I have a few cases where I fabricate new regions and 
try to force the overlapping regions to be closed. For some of these, I cannot 
delete a table after it is repaired without causing subsequent tests to fail. I 
think this is due to a few things:
bq.  
bq.  1) The disable table handler uses in-memory assignment manager state while 
delete uses in META assignment information.
bq.  2) Currently I'm using the sneaky closeRegion that purposely doesn't go 
through the master and in turn doesn't modify in-memory state – disable uses 
out of date in-memory region assignments. If I use the unassign method sends 
RIT transitions to the master, but which ends up attempting to assign it again, 
causing timing/transient states.
bq.  
bq.  What is a good way to clear the HMaster's assignment manager's assignment 
data for particular regions or to force it to re-read from META? (without 
modifying the 0.90 HBase's it is meant to repair).
bq.  
bq.  Problem 2:
bq.  
bq.  Sometimes test fail reporting HOLE_IN_REGION_CHAIN and 
SERVER_DOES_NOT_MATCH_META. This means the old and new regions are confiused 
with each other and basically something is still happening asynchronously. I 
think this is the new region is being assigned and is still transitioning. 
Sound about right? To make the unit test deterministic, should hbck wait for 
these to settle or should just the unit test wait?
bq.  
bq.  
bq.  This addresses bug HBASE-5128.
bq.  https://issues.apache.org/jira/browse/HBASE-5128
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.src/main/java/org/apache/hadoop/hbase/ipc/HMasterInterface.java c56b3a6 
bq.src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 
9520b95 
bq.src/main/java/org/apache/hadoop/hbase/master/HMaster.java f7ad064 
bq.

[jira] [Updated] (HBASE-5258) Move coprocessors set out of RegionLoad, region server should calculate disparity of loaded coprocessors among regions and send report through HServerLoad

2012-01-25 Thread Zhihong Yu (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zhihong Yu updated HBASE-5258:
--

Priority: Critical  (was: Major)

Raising priority as this task is about making the correct design choices.

 Move coprocessors set out of RegionLoad, region server should calculate 
 disparity of loaded coprocessors among regions and send report through 
 HServerLoad
 --

 Key: HBASE-5258
 URL: https://issues.apache.org/jira/browse/HBASE-5258
 Project: HBase
  Issue Type: Task
Reporter: Zhihong Yu
Priority: Critical

 When I worked on HBASE-5256, I revisited the code related to Ser/De of 
 coprocessors set in RegionLoad.
 I think the rationale for embedding coprocessors set is for maximum 
 flexibility where each region can load different coprocessors.
 This flexibility is causing extra cost in the region server to Master 
 communication and increasing the footprint of Master heap.
 Would HServerLoad be a better place for this set ?





[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-01-25 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193187#comment-13193187
 ] 

stack commented on HBASE-5270:
--

bq. So, the param 'definitiveRootServer' is used in this case to ensure the 
dead root server is carryingRoot when it is being expired.

What's 'definitive' about it?  Is it that we know for sure the server was 
carrying root or meta?  How?

bq. Is there any possible to expire a server if its carrying root and meta now? 
I don't think so.

You are saying that this patch does nothing new here?  We COULD expire the 
server that was carrying root, wait on its log split, then expire the server 
carrying meta (though it may have been the same server)... it might be ok but 
we might kill a server that has just started. I'm ok if fixing this is outside 
scope of this patch.

bq. I don't find this operation earlier in master setup, and this operation is 
not introduced by this issue. And I only introduce this logic for 90 from trunk.

So, you copied this to 0.90 from TRUNK (so my notion that we already had this 
is my remembering how things work on TRUNK.. that would make sense).

bq. I think we need explain it, But whether we shouldn't use distributed split 
log, I'm not very sure.

If we are not sure, we shouldn't do it.

bq. When master is initializing, if one RS is killed and restarted, then dead 
server is in progress while master startup

This seems like a small window.  Or do you think it could happen frequently?  
Could we hold up shutdownserverhandler until master is up?





 Handle potential data loss due to concurrent processing of processFaileOver 
 and ServerShutdownHandler
 -

 Key: HBASE-5270
 URL: https://issues.apache.org/jira/browse/HBASE-5270
 Project: HBase
  Issue Type: Sub-task
  Components: master
Reporter: Zhihong Yu
 Fix For: 0.94.0, 0.92.1


 This JIRA continues the effort from HBASE-5179. Starting with Stack's 
 comments about patches for 0.92 and TRUNK:
 Reviewing 0.92v17
 isDeadServerInProgress is a new public method in ServerManager but it does 
 not seem to be used anywhere.
 Does isDeadRootServerInProgress need to be public? Ditto for meta version.
 This method param names are not right 'definitiveRootServer'; what is meant 
 by definitive? Do they need this qualifier?
 Is there anything in place to stop us expiring a server twice if its carrying 
 root and meta?
 What is difference between asking assignment manager isCarryingRoot and this 
 variable that is passed in? Should be doc'd at least. Ditto for meta.
 I think I've asked for this a few times - onlineServers needs to be 
 explained... either in javadoc or in comment. This is the param passed into 
 joinCluster. How does it arise? I think I know but am unsure. God love the 
 poor noob that comes awandering this code trying to make sense of it all.
 It looks like we get the list by trawling zk for regionserver znodes that 
 have not checked in. Don't we do this operation earlier in master setup? Are 
 we doing it again here?
 Though distributed split log is configured, we will do in master single 
 process splitting under some conditions with this patch. Its not explained in 
 code why we would do this. Why do we think master log splitting 'high 
 priority' when it could very well be slower. Should we only go this route if 
 distributed splitting is not going on. Do we know if concurrent distributed 
 log splitting and master splitting works?
 Why would we have dead servers in progress here in master startup? Because a 
 servershutdownhandler fired?
 This patch is different to the patch for 0.90. Should go into trunk first 
 with tests, then 0.92. Should it be in this issue? This issue is really hard 
 to follow now. Maybe this issue is for 0.90.x and new issue for more work on 
 this trunk patch?
 This patch needs to have the v18 differences applied.





[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-01-25 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193197#comment-13193197
 ] 

Zhihong Yu commented on HBASE-5270:
---

bq. though it may have been the same server
This has been handled in patch on reviewboard, line 628:
{code}
  !currentMetaServer.equals(currentRootServer) 
{code}
bq. Could we hold up shutdownserverhandler until master is up?
What if the region server hosting .META. went down ?






[jira] [Commented] (HBASE-5270) Handle potential data loss due to concurrent processing of processFaileOver and ServerShutdownHandler

2012-01-25 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193209#comment-13193209
 ] 

stack commented on HBASE-5270:
--

bq. What if the region server hosting .META. went down ?

Yes... was just thinking about that.  In this case we'd run the splitter 
in-line, in SSH, not via an executor... let me look at the code.  I'm trying to 
write tests and catch up on all the stuff that was done over on the previous 
issue.






[jira] [Updated] (HBASE-5278) HBase shell script refers to removed migrate functionality

2012-01-25 Thread Shaneal Manek (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaneal Manek updated HBASE-5278:
-

Attachment: hbase-5278.patch

 HBase shell script refers to removed migrate functionality
 

 Key: HBASE-5278
 URL: https://issues.apache.org/jira/browse/HBASE-5278
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.94.0, 0.90.5, 0.92.0
Reporter: Shaneal Manek
Assignee: Shaneal Manek
Priority: Trivial
 Attachments: hbase-5278.patch


 $ hbase migrate
 Exception in thread main java.lang.NoClassDefFoundError: 
 org/apache/hadoop/hbase/util/Migrate
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.hadoop.hbase.util.Migrate
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
 Could not find the main class: org.apache.hadoop.hbase.util.Migrate. Program 
 will exit.
 The 'hbase' shell script has docs referring to a 'migrate' command which no 
 longer exists.





[jira] [Created] (HBASE-5278) HBase shell script refers to removed migrate functionality

2012-01-25 Thread Shaneal Manek (Created) (JIRA)
HBase shell script refers to removed migrate functionality


 Key: HBASE-5278
 URL: https://issues.apache.org/jira/browse/HBASE-5278
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.92.0, 0.90.5, 0.94.0
Reporter: Shaneal Manek
Assignee: Shaneal Manek
Priority: Trivial
 Attachments: hbase-5278.patch

$ hbase migrate
Exception in thread "main" java.lang.NoClassDefFoundError: 
org/apache/hadoop/hbase/util/Migrate
Caused by: java.lang.ClassNotFoundException: 
org.apache.hadoop.hbase.util.Migrate
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
Could not find the main class: org.apache.hadoop.hbase.util.Migrate. Program 
will exit.


The 'hbase' shell script has docs referring to a 'migrate' command which no 
longer exists.





[jira] [Commented] (HBASE-5278) HBase shell script refers to removed migrate functionality

2012-01-25 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193248#comment-13193248
 ] 

Jonathan Hsieh commented on HBASE-5278:
---

+1. lgtm. 

 HBase shell script refers to removed migrate functionality
 

 Key: HBASE-5278
 URL: https://issues.apache.org/jira/browse/HBASE-5278
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.94.0, 0.90.5, 0.92.0
Reporter: Shaneal Manek
Assignee: Shaneal Manek
Priority: Trivial
 Attachments: hbase-5278.patch


 $ hbase migrate
 Exception in thread "main" java.lang.NoClassDefFoundError: 
 org/apache/hadoop/hbase/util/Migrate
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.hadoop.hbase.util.Migrate
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
 Could not find the main class: org.apache.hadoop.hbase.util.Migrate. Program 
 will exit.
 The 'hbase' shell script has docs referring to a 'migrate' command which no 
 longer exists.





[jira] [Updated] (HBASE-5278) HBase shell script refers to removed migrate functionality

2012-01-25 Thread Shaneal Manek (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Shaneal Manek updated HBASE-5278:
-

Status: Patch Available  (was: Open)

 HBase shell script refers to removed migrate functionality
 

 Key: HBASE-5278
 URL: https://issues.apache.org/jira/browse/HBASE-5278
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.92.0, 0.90.5, 0.94.0
Reporter: Shaneal Manek
Assignee: Shaneal Manek
Priority: Trivial
 Attachments: hbase-5278.patch


 $ hbase migrate
 Exception in thread "main" java.lang.NoClassDefFoundError: 
 org/apache/hadoop/hbase/util/Migrate
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.hadoop.hbase.util.Migrate
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
 Could not find the main class: org.apache.hadoop.hbase.util.Migrate. Program 
 will exit.
 The 'hbase' shell script has docs referring to a 'migrate' command which no 
 longer exists.





[jira] [Updated] (HBASE-5278) HBase shell script refers to removed migrate functionality

2012-01-25 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-5278:
-

   Resolution: Fixed
Fix Version/s: 0.92.1
   0.94.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to trunk and branch.  Thanks for the patch, Shaneal.

 HBase shell script refers to removed migrate functionality
 

 Key: HBASE-5278
 URL: https://issues.apache.org/jira/browse/HBASE-5278
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.94.0, 0.90.5, 0.92.0
Reporter: Shaneal Manek
Assignee: Shaneal Manek
Priority: Trivial
 Fix For: 0.94.0, 0.92.1

 Attachments: hbase-5278.patch


 $ hbase migrate
 Exception in thread "main" java.lang.NoClassDefFoundError: 
 org/apache/hadoop/hbase/util/Migrate
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.hadoop.hbase.util.Migrate
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
 Could not find the main class: org.apache.hadoop.hbase.util.Migrate. Program 
 will exit.
 The 'hbase' shell script has docs referring to a 'migrate' command which no 
 longer exists.





[jira] [Commented] (HBASE-5276) PerformanceEvaluation does not set the correct classpath for MR because it lives in the test jar

2012-01-25 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193255#comment-13193255
 ] 

stack commented on HBASE-5276:
--

@Tim Maybe open an issue against CDH and close this one?

 PerformanceEvaluation does not set the correct classpath for MR because it 
 lives in the test jar
 

 Key: HBASE-5276
 URL: https://issues.apache.org/jira/browse/HBASE-5276
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.90.4
Reporter: Tim Robertson
Priority: Minor

 Note: This was discovered running the CDH version hbase-0.90.4-cdh3u2
 Running the PerformanceEvaluation as follows:
   $HADOOP_HOME/bin/hadoop org.apache.hadoop.hbase.PerformanceEvaluation scan 5
 fails because the MR tasks do not get the HBase jar on the CP, and thus hit 
 ClassNotFoundExceptions.
 The job gets the following only:
   file:/Users/tim/dev/hadoop/hbase-0.90.4-cdh3u2/hbase-0.90.4-cdh3u2-tests.jar
   
 file:/Users/tim/dev/hadoop/hadoop-0.20.2-cdh3u2/hadoop-core-0.20.2-cdh3u2.jar
   
 file:/Users/tim/dev/hadoop/hbase-0.90.4-cdh3u2/lib/zookeeper-3.3.3-cdh3u2.jar
 The RowCounter etc all work because they live in the HBase jar, not the test 
 jar, and they get the following 
   file:/Users/tim/dev/hadoop/hbase-0.90.4-cdh3u2/lib/guava-r06.jar
   
 file:/Users/tim/dev/hadoop/hadoop-0.20.2-cdh3u2/hadoop-core-0.20.2-cdh3u2.jar
   file:/Users/tim/dev/hadoop/hbase-0.90.4-cdh3u2/hbase-0.90.4-cdh3u2.jar
   
 file:/Users/tim/dev/hadoop/hbase-0.90.4-cdh3u2/lib/zookeeper-3.3.3-cdh3u2.jar
 Presumably this relates to 
   job.setJarByClass(PerformanceEvaluation.class);
   ...
   TableMapReduceUtil.addDependencyJars(job);
 A (cowboy) workaround to run PE is to unpack the jars and copy the 
 PerformanceEvaluation* classes into a patched jar.
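The classpath gap described above comes from how Hadoop ships code to tasks: it locates the jar (or class directory) a given job class was loaded from and distributes that artifact, so a job class living in the tests jar drags along only the tests jar. The sketch below is illustrative, not HBase code: `JarLocator` is a hypothetical helper showing the JDK mechanism (`ProtectionDomain`/`CodeSource`) that `job.setJarByClass()` relies on.

```java
import java.security.CodeSource;

public class JarLocator {
    /**
     * Returns the jar or directory a class was loaded from, or null for
     * classes without a code source (e.g. bootstrap/JDK classes).
     */
    static String locationOf(Class<?> clazz) {
        CodeSource src = clazz.getProtectionDomain().getCodeSource();
        return src == null ? null : src.getLocation().toString();
    }

    public static void main(String[] args) {
        // This class resolves to wherever it was compiled or packaged, just
        // as PerformanceEvaluation resolves to the *tests* jar -- which is
        // why the main HBase jar never reaches the MR task classpath.
        System.out.println(locationOf(JarLocator.class));
    }
}
```

This is why RowCounter and friends work: they live in the main HBase jar, so that jar is the one shipped.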





[jira] [Commented] (HBASE-4917) CRUD Verify Utility

2012-01-25 Thread Mubarak Seyed (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193259#comment-13193259
 ] 

Mubarak Seyed commented on HBASE-4917:
--

@Nicolas,

How is this LoadTest tool different from PerformanceEvaluation?

{code}
hbase org.apache.hadoop.hbase.PerformanceEvaluation
Usage: java org.apache.hadoop.hbase.PerformanceEvaluation \
  [--miniCluster] [--nomapred] [--rows=ROWS] command nclients
{code}

I believe LoadTester.java generates load for multiple column families (provided 
there is an external properties file defining the CFs and their definitions, 
read/write threads, and regions/server), whereas PerformanceEvaluation uses only 
one CF (TestTable:info).

How does LoadTester differ from YCSB? I believe YCSB supports only one CF as 
well.

I think LoadTester can be used for burn-in test (when we provision a new 
cluster and sniff the cluster). 

If no one is working on this issue, I can help port the load test to 
src/test/java/org/apache/hadoop/hbase/loadtest.

Thanks.



 CRUD Verify Utility
 ---

 Key: HBASE-4917
 URL: https://issues.apache.org/jira/browse/HBASE-4917
 Project: HBase
  Issue Type: Sub-task
  Components: client, regionserver
Reporter: Nicolas Spiegelberg
 Fix For: 0.94.0


 Add a verify utility to run basic CRUD tests against hbase in various common 
 use cases.  This is great for sanity checking a cluster setup because it can 
 be run as a one line shell command with no required params.  Multiple column 
 families for different use-cases can be tested together.  Currently provided 
 use-cases are 'action log', 'snapshot' and 'search'. The interface is 
 developed such that it can be easily extended to cover more use-cases.





[jira] [Commented] (HBASE-5278) HBase shell script refers to removed migrate functionality

2012-01-25 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193260#comment-13193260
 ] 

Jonathan Hsieh commented on HBASE-5278:
---

Wow, you are fast, Stack.  I was trying to commit. :)

 HBase shell script refers to removed migrate functionality
 

 Key: HBASE-5278
 URL: https://issues.apache.org/jira/browse/HBASE-5278
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.94.0, 0.90.5, 0.92.0
Reporter: Shaneal Manek
Assignee: Shaneal Manek
Priority: Trivial
 Fix For: 0.94.0, 0.92.1

 Attachments: hbase-5278.patch


 $ hbase migrate
 Exception in thread "main" java.lang.NoClassDefFoundError: 
 org/apache/hadoop/hbase/util/Migrate
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.hadoop.hbase.util.Migrate
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
 Could not find the main class: org.apache.hadoop.hbase.util.Migrate. Program 
 will exit.
 The 'hbase' shell script has docs referring to a 'migrate' command which no 
 longer exists.





[jira] [Updated] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

2012-01-25 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-5259:
---

Attachment: D1413.2.patch

Liyin updated the revision [jira][HBASE-5259] Normalize the RegionLocation in 
TableInputFormat by the reverse DNS lookup..
Reviewers: Kannan, Karthik, mbautin

  Address Ted's comments.

REVISION DETAIL
  https://reviews.facebook.net/D1413

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java


 Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.
 ---

 Key: HBASE-5259
 URL: https://issues.apache.org/jira/browse/HBASE-5259
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D1413.1.patch, D1413.1.patch, D1413.1.patch, 
 D1413.1.patch, D1413.2.patch, D1413.2.patch, D1413.2.patch, D1413.2.patch


 Assuming HBase and MapReduce run in the same cluster, TableInputFormat 
 overrides the split function, which divides all the regions from one 
 particular table into a series of mapper tasks, so each mapper task can 
 process a region or one part of a region. Ideally, the mapper task should 
 run on the same machine on which the region server hosts the corresponding 
 region. That is the motivation for TableInputFormat setting the 
 RegionLocation: so that the MapReduce framework can respect node locality. 
 The code simply sets the host name of the region server as the 
 HRegionLocation. However, the host name of the region server may have a 
 different format from the host name of the task tracker (mapper task). The 
 task tracker always gets its hostname by a reverse DNS lookup, and the DNS 
 service may return a different host name format. For example, the host name 
 of the region server is correctly set as a.b.c.d while the reverse DNS 
 lookup may return "a.b.c.d." (with an additional dot at the end). 
 So the solution is to set the RegionLocation by the same reverse DNS lookup 
 as well. No matter what host name format the DNS system uses, 
 TableInputFormat is responsible for keeping the host name format consistent 
 with the MapReduce framework.
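The trailing-dot mismatch described above can be made concrete with a small normalization helper. This is a minimal sketch of the idea, not the actual HBASE-5259 patch: `HostNameNormalizer` is a hypothetical class that strips the trailing dot a reverse DNS lookup may append to a fully-qualified name and lower-cases the result, so the split location compares equal to the task tracker's hostname.

```java
public class HostNameNormalizer {
    /**
     * Normalize a hostname for locality comparison: trim, lower-case, and
     * strip the trailing dot that a reverse DNS lookup may append
     * (e.g. "a.b.c.d." vs the region server's configured "a.b.c.d").
     */
    static String normalize(String host) {
        String h = host.trim().toLowerCase();
        if (h.endsWith(".")) {
            h = h.substring(0, h.length() - 1);
        }
        return h;
    }

    public static void main(String[] args) {
        System.out.println(normalize("A.b.c.d."));  // prints "a.b.c.d"
    }
}
```

Running both the region server name and the reverse-DNS result through the same normalization is what keeps the two sides consistent.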





[jira] [Commented] (HBASE-5274) Filter out the expired store file scanner during the compaction

2012-01-25 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193274#comment-13193274
 ] 

Phabricator commented on HBASE-5274:


Liyin has abandoned the revision [jira][HBASE-5274] Filter out the expired 
store file scanner during the compaction.

  This could be part of [HBASE-5010] Filter HFiles based on TTL.  Mikhail will 
help follow up on fixing this issue.

REVISION DETAIL
  https://reviews.facebook.net/D1407


 Filter out the expired store file scanner during the compaction
 ---

 Key: HBASE-5274
 URL: https://issues.apache.org/jira/browse/HBASE-5274
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D1407.1.patch, D1407.1.patch, D1407.1.patch, 
 D1407.1.patch, D1407.1.patch


 During compaction, HBase generates a store scanner that scans a list of 
 store files. It would be more efficient to filter out the expired store 
 files, since there is no need to read any key values from them. 
 This optimization has already been implemented on 89-fb and is the building 
 block for HBASE-5199 as well. Compacting expired store files should be a 
 no-op.
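The core observation can be sketched in a few lines. This is a hedged illustration of the idea, not the 89-fb implementation: `StoreFileInfo` is a hypothetical stand-in for the real store file metadata, and the filter drops any file whose newest cell is already older than `now - ttl`, since such a file cannot contribute live cells to the compaction output.

```java
import java.util.ArrayList;
import java.util.List;

public class ExpiredFileFilter {
    /** Hypothetical stand-in for real store file metadata. */
    static class StoreFileInfo {
        final String name;
        final long maxTimestampMs; // newest cell timestamp in the file
        StoreFileInfo(String name, long maxTimestampMs) {
            this.name = name;
            this.maxTimestampMs = maxTimestampMs;
        }
    }

    /** Keep only files that may still contain live cells under the given TTL. */
    static List<StoreFileInfo> liveFiles(List<StoreFileInfo> files, long ttlMs, long nowMs) {
        long oldestLive = nowMs - ttlMs;
        List<StoreFileInfo> live = new ArrayList<>();
        for (StoreFileInfo f : files) {
            // A file whose newest cell predates the TTL horizon is fully
            // expired: no scanner needs to be opened for it at all.
            if (f.maxTimestampMs >= oldestLive) {
                live.add(f);
            }
        }
        return live;
    }
}
```

Skipping the scanner entirely (rather than reading and discarding every expired cell) is what makes compacting expired files effectively a no-op.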





[jira] [Commented] (HBASE-5259) Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.

2012-01-25 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5259?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193287#comment-13193287
 ] 

Phabricator commented on HBASE-5259:


tedyu has commented on the revision [jira][HBASE-5259] Normalize the 
RegionLocation in TableInputFormat by the reverse DNS lookup..

INLINE COMMENTS
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:89 
I think reverseDNSCache is a good enough name.
  src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java:202 
Should NamingException be handled here ?
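The two inline comments above (the `reverseDNSCache` name and NamingException handling) suggest a shape like the following. This is a sketch built from the review discussion, not the committed code: the cache name comes from the comment, while the lookup-function parameter and the fall-back-to-input behavior on failure are assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

public class ReverseDnsCache {
    // Name taken from the review comment; the rest is illustrative.
    private final Map<String, String> reverseDNSCache = new ConcurrentHashMap<>();

    /**
     * Resolve via the supplied lookup, memoizing results. On failure
     * (e.g. a wrapped NamingException from a JNDI reverse lookup), fall
     * back to the input host: locality is a best-effort hint, so a DNS
     * hiccup should not fail split calculation.
     */
    String resolve(String host, Function<String, String> lookup) {
        return reverseDNSCache.computeIfAbsent(host, h -> {
            try {
                return lookup.apply(h);
            } catch (RuntimeException e) {
                return h; // degrade gracefully instead of propagating
            }
        });
    }
}
```

Note that this choice also caches the fallback value, so a transient DNS failure pins the unresolved name for the life of the cache; whether that is acceptable is exactly the kind of question the review comment raises.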

REVISION DETAIL
  https://reviews.facebook.net/D1413


 Normalize the RegionLocation in TableInputFormat by the reverse DNS lookup.
 ---

 Key: HBASE-5259
 URL: https://issues.apache.org/jira/browse/HBASE-5259
 Project: HBase
  Issue Type: Improvement
Reporter: Liyin Tang
Assignee: Liyin Tang
 Attachments: D1413.1.patch, D1413.1.patch, D1413.1.patch, 
 D1413.1.patch, D1413.2.patch, D1413.2.patch, D1413.2.patch, D1413.2.patch


 Assuming HBase and MapReduce run in the same cluster, TableInputFormat 
 overrides the split function, which divides all the regions from one 
 particular table into a series of mapper tasks, so each mapper task can 
 process a region or part of a region. Ideally, a mapper task should run on 
 the same machine on which the region server hosts the corresponding region. 
 That is the motivation for TableInputFormat setting the RegionLocation: so 
 that the MapReduce framework can respect node locality. 
 The code simply sets the host name of the region server as the 
 HRegionLocation. However, the host name of the region server may have a 
 different format from the host name of the task tracker (mapper task). The 
 task tracker always gets its host name by reverse DNS lookup, and the DNS 
 service may return a different host name format. For example, the host name 
 of the region server is correctly set as a.b.c.d while the reverse DNS lookup 
 may return a.b.c.d. (with an additional dot at the end).
 So the solution is to set the RegionLocation by reverse DNS lookup as well. 
 No matter what host name format the DNS system uses, TableInputFormat is 
 responsible for keeping the host name format consistent with the MapReduce 
 framework.
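The mismatch described above comes down to normalization: a reverse DNS resolver may return a fully qualified name with a trailing dot, so both sides must be normalized the same way before comparison. A sketch of that normalization step (the helper name is hypothetical; the actual patch resolves host names via reverse DNS with a cache):

```java
// Hypothetical host name normalization: reverse DNS may return
// "a.b.c.d." (with a trailing dot); strip it and lowercase so the
// region server's name and the task tracker's name compare equal.
public class HostNameNormalizer {
    public static String normalize(String host) {
        String h = host.toLowerCase();
        if (h.endsWith(".")) {
            h = h.substring(0, h.length() - 1);
        }
        return h;
    }
}
```

With both the RegionLocation and the task tracker name passed through the same function, the MapReduce scheduler can match them for node locality.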

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Created] (HBASE-5279) NPE in Master after upgrading to 0.92.0

2012-01-25 Thread Tobias Herbert (Created) (JIRA)
NPE in Master after upgrading to 0.92.0
---

 Key: HBASE-5279
 URL: https://issues.apache.org/jira/browse/HBASE-5279
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.0
Reporter: Tobias Herbert
Priority: Critical


I have upgraded my environment from 0.90.4 to 0.92.0.

After the table migration I get the following error in the master (permanently):

{noformat}
2012-01-25 18:23:48,648 FATAL master-namenode,6,1327512209588 
org.apache.hadoop.hbase.master.HMaster - Unhandled exception. Starting shutdown.
java.lang.NullPointerException
at 
org.apache.hadoop.hbase.master.AssignmentManager.rebuildUserRegions(AssignmentManager.java:2190)
at 
org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:323)
at 
org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:501)
at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326)
at java.lang.Thread.run(Thread.java:662)
2012-01-25 18:23:48,650 INFO namenode,6,1327512209588 
org.apache.hadoop.hbase.master.HMaster - Aborting
{noformat}

I think that's because I had a hard crash in the cluster a while ago, and I 
have seen the following WARN ever since:

{noformat}
2012-01-25 21:20:47,121 WARN namenode,6,1327513078123-CatalogJanitor 
org.apache.hadoop.hbase.master.CatalogJanitor - REGIONINFO_QUALIFIER is empty 
in keyvalues={emails,,xxx./info:server/1314336400471/Put/vlen=38, 
emails,,1314189353300.xxx./info:serverstartcode/1314336400471/Put/vlen=8}
{noformat}

my patch simply works around the NPE (like the other code around those 
lines), but I don't know whether that's correct
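The workaround in the attached patch follows the pattern already used nearby: skip catalog rows whose region info cell is missing instead of dereferencing a null HRegionInfo. A minimal sketch of that guard (the names are illustrative, not the actual AssignmentManager code):

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: rebuild the region list while skipping catalog
// rows with no REGIONINFO_QUALIFIER, which would otherwise cause the NPE.
public class RegionRebuildSketch {
    public static List<String> rebuild(List<String> regionInfos) {
        List<String> out = new ArrayList<>();
        for (String info : regionInfos) {
            if (info == null) {
                // Damaged row left over from a hard crash: skip it, matching
                // the empty-REGIONINFO_QUALIFIER case CatalogJanitor warns about.
                continue;
            }
            out.add(info);
        }
        return out;
    }
}
```

Whether such rows should additionally be repaired or deleted (rather than just skipped) is the open question in this issue.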






[jira] [Updated] (HBASE-5279) NPE in Master after upgrading to 0.92.0

2012-01-25 Thread Tobias Herbert (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tobias Herbert updated HBASE-5279:
--

Attachment: HBASE-5279.patch

 NPE in Master after upgrading to 0.92.0
 ---

 Key: HBASE-5279
 URL: https://issues.apache.org/jira/browse/HBASE-5279
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.0
Reporter: Tobias Herbert
Priority: Critical
 Attachments: HBASE-5279.patch


 I have upgraded my environment from 0.90.4 to 0.92.0.
 After the table migration I get the following error in the master (permanently):
 {noformat}
 2012-01-25 18:23:48,648 FATAL master-namenode,6,1327512209588 
 org.apache.hadoop.hbase.master.HMaster - Unhandled exception. Starting 
 shutdown.
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.rebuildUserRegions(AssignmentManager.java:2190)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:323)
 at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:501)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326)
 at java.lang.Thread.run(Thread.java:662)
 2012-01-25 18:23:48,650 INFO namenode,6,1327512209588 
 org.apache.hadoop.hbase.master.HMaster - Aborting
 {noformat}
 I think that's because I had a hard crash in the cluster a while ago, and I 
 have seen the following WARN ever since:
 {noformat}
 2012-01-25 21:20:47,121 WARN namenode,6,1327513078123-CatalogJanitor 
 org.apache.hadoop.hbase.master.CatalogJanitor - REGIONINFO_QUALIFIER is empty 
 in keyvalues={emails,,xxx./info:server/1314336400471/Put/vlen=38, 
 emails,,1314189353300.xxx./info:serverstartcode/1314336400471/Put/vlen=8}
 {noformat}
 my patch simply works around the NPE (like the other code around those 
 lines), but I don't know whether that's correct





[jira] [Commented] (HBASE-5230) Unit test to ensure compactions don't cache data on write

2012-01-25 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193307#comment-13193307
 ] 

Phabricator commented on HBASE-5230:


mbautin has abandoned the revision [jira] [HBASE-5230] Extend TestCacheOnWrite 
to ensure we don't cache data blocks on compaction.

  This has been committed to HBase trunk. Abandoning the diff since I forgot to 
include the differential revision in the commit message.


REVISION DETAIL
  https://reviews.facebook.net/D1353


 Unit test to ensure compactions don't cache data on write
 -

 Key: HBASE-5230
 URL: https://issues.apache.org/jira/browse/HBASE-5230
 Project: HBase
  Issue Type: Test
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D1353.1.patch, D1353.2.patch, D1353.3.patch, 
 D1353.4.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-21_00_53_54.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-23_10_23_45.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-23_15_27_23.patch


 Create a unit test for HBASE-3976 (making sure we don't cache data blocks on 
 write during compactions even if cache-on-write is generally 
 enabled). This is because we have very different implementations of 
 HBASE-3976 without HBASE-4422 CacheConfig (on top of 89-fb, created by Liyin) 
 and with CacheConfig (presumably it's there but not sure if it even works, 
 since the patch in HBASE-3976 may not have been committed). We need to create 
 a unit test to verify that we don't cache data blocks on write during 
 compactions, and resolve HBASE-3976 so that this new unit test does not fail.
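The behavior under test reduces to a single predicate: a data block is cached on write only when cache-on-write is enabled and the writer is not a compaction. A hedged sketch of that predicate (CacheConfig's real API differs; these names are illustrative):

```java
// Hypothetical predicate for cache-on-write during flushes vs. compactions.
public class CacheOnWritePolicy {
    public static boolean shouldCacheDataOnWrite(boolean cacheOnWriteEnabled,
                                                 boolean isCompaction) {
        // Compactions rewrite large amounts of mostly-cold data; caching
        // their output would evict hot blocks, so they are excluded even
        // when cache-on-write is enabled in general.
        return cacheOnWriteEnabled && !isCompaction;
    }
}
```

The unit test then only needs to run a compaction with cache-on-write enabled and assert that no new data blocks appear in the block cache.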





[jira] [Commented] (HBASE-5278) HBase shell script refers to removed migrate functionality

2012-01-25 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193365#comment-13193365
 ] 

stack commented on HBASE-5278:
--

@Jon I've a bit of practise

 HBase shell script refers to removed migrate functionality
 

 Key: HBASE-5278
 URL: https://issues.apache.org/jira/browse/HBASE-5278
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.94.0, 0.90.5, 0.92.0
Reporter: Shaneal Manek
Assignee: Shaneal Manek
Priority: Trivial
 Fix For: 0.94.0, 0.92.1

 Attachments: hbase-5278.patch


 $ hbase migrate
 Exception in thread main java.lang.NoClassDefFoundError: 
 org/apache/hadoop/hbase/util/Migrate
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.hadoop.hbase.util.Migrate
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
 Could not find the main class: org.apache.hadoop.hbase.util.Migrate. Program 
 will exit.
 The 'hbase' shell script has docs referring to a 'migrate' command which no 
 longer exists.





[jira] [Commented] (HBASE-5279) NPE in Master after upgrading to 0.92.0

2012-01-25 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193363#comment-13193363
 ] 

stack commented on HBASE-5279:
--

Skipping should be fine.  Do you have a scan of .META. from before the upgrade?

Are you up now?

 NPE in Master after upgrading to 0.92.0
 ---

 Key: HBASE-5279
 URL: https://issues.apache.org/jira/browse/HBASE-5279
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.0
Reporter: Tobias Herbert
Priority: Critical
 Attachments: HBASE-5279.patch


 I have upgraded my environment from 0.90.4 to 0.92.0.
 After the table migration I get the following error in the master (permanently):
 {noformat}
 2012-01-25 18:23:48,648 FATAL master-namenode,6,1327512209588 
 org.apache.hadoop.hbase.master.HMaster - Unhandled exception. Starting 
 shutdown.
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.rebuildUserRegions(AssignmentManager.java:2190)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:323)
 at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:501)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326)
 at java.lang.Thread.run(Thread.java:662)
 2012-01-25 18:23:48,650 INFO namenode,6,1327512209588 
 org.apache.hadoop.hbase.master.HMaster - Aborting
 {noformat}
 I think that's because I had a hard crash in the cluster a while ago, and I 
 have seen the following WARN ever since:
 {noformat}
 2012-01-25 21:20:47,121 WARN namenode,6,1327513078123-CatalogJanitor 
 org.apache.hadoop.hbase.master.CatalogJanitor - REGIONINFO_QUALIFIER is empty 
 in keyvalues={emails,,xxx./info:server/1314336400471/Put/vlen=38, 
 emails,,1314189353300.xxx./info:serverstartcode/1314336400471/Put/vlen=8}
 {noformat}
 my patch simply works around the NPE (like the other code around those 
 lines), but I don't know whether that's correct





[jira] [Commented] (HBASE-5279) NPE in Master after upgrading to 0.92.0

2012-01-25 Thread Tobias Herbert (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5279?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193393#comment-13193393
 ] 

Tobias Herbert commented on HBASE-5279:
---

Unfortunately I have no scan of .META. from before the upgrade,
but with this patch I am up now :-)

 NPE in Master after upgrading to 0.92.0
 ---

 Key: HBASE-5279
 URL: https://issues.apache.org/jira/browse/HBASE-5279
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.92.0
Reporter: Tobias Herbert
Priority: Critical
 Attachments: HBASE-5279.patch


 I have upgraded my environment from 0.90.4 to 0.92.0.
 After the table migration I get the following error in the master (permanently):
 {noformat}
 2012-01-25 18:23:48,648 FATAL master-namenode,6,1327512209588 
 org.apache.hadoop.hbase.master.HMaster - Unhandled exception. Starting 
 shutdown.
 java.lang.NullPointerException
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.rebuildUserRegions(AssignmentManager.java:2190)
 at 
 org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:323)
 at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:501)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:326)
 at java.lang.Thread.run(Thread.java:662)
 2012-01-25 18:23:48,650 INFO namenode,6,1327512209588 
 org.apache.hadoop.hbase.master.HMaster - Aborting
 {noformat}
 I think that's because I had a hard crash in the cluster a while ago, and I 
 have seen the following WARN ever since:
 {noformat}
 2012-01-25 21:20:47,121 WARN namenode,6,1327513078123-CatalogJanitor 
 org.apache.hadoop.hbase.master.CatalogJanitor - REGIONINFO_QUALIFIER is empty 
 in keyvalues={emails,,xxx./info:server/1314336400471/Put/vlen=38, 
 emails,,1314189353300.xxx./info:serverstartcode/1314336400471/Put/vlen=8}
 {noformat}
 my patch simply works around the NPE (like the other code around those 
 lines), but I don't know whether that's correct





[jira] [Commented] (HBASE-5278) HBase shell script refers to removed migrate functionality

2012-01-25 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193396#comment-13193396
 ] 

Hudson commented on HBASE-5278:
---

Integrated in HBase-0.92 #262 (See 
[https://builds.apache.org/job/HBase-0.92/262/])
HBASE-5278 HBase shell script refers to removed 'migrate' functionality

stack : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* /hbase/branches/0.92/bin/hbase


 HBase shell script refers to removed migrate functionality
 

 Key: HBASE-5278
 URL: https://issues.apache.org/jira/browse/HBASE-5278
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.94.0, 0.90.5, 0.92.0
Reporter: Shaneal Manek
Assignee: Shaneal Manek
Priority: Trivial
 Fix For: 0.94.0, 0.92.1

 Attachments: hbase-5278.patch


 $ hbase migrate
 Exception in thread main java.lang.NoClassDefFoundError: 
 org/apache/hadoop/hbase/util/Migrate
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.hadoop.hbase.util.Migrate
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
 Could not find the main class: org.apache.hadoop.hbase.util.Migrate. Program 
 will exit.
 The 'hbase' shell script has docs referring to a 'migrate' command which no 
 longer exists.





[jira] [Commented] (HBASE-3025) Coprocessor based simple access control

2012-01-25 Thread Andrew Purtell (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3025?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193415#comment-13193415
 ] 

Andrew Purtell commented on HBASE-3025:
---

See HBASE-4990. Destined for the site manual. The piece I have left to do is a 
capture of an example shell session. I have such a capture but it's led to 
follow on jiras that need to be resolved for 0.92.1

 Coprocessor based simple access control
 ---

 Key: HBASE-3025
 URL: https://issues.apache.org/jira/browse/HBASE-3025
 Project: HBase
  Issue Type: Sub-task
  Components: coprocessors
Reporter: Andrew Purtell
Priority: Critical
 Fix For: 0.92.0

 Attachments: HBASE-3025.1.patch, HBASE-3025_5.patch, 
 HBASE-3025_6.patch


 Thanks for the clarification, Jeff, which reminds me to edit this issue.
 Goals of this issue:
 # Client access to HBase is authenticated
 # User data is private unless access has been granted
 # Access to data can be granted on a table or per-column-family basis
 Non-Goals of this issue:
 The following items will be left out of the initial implementation for 
 simplicity:
 # Row-level or per-value (cell) ACLs: this would require broader changes for 
 storing the ACLs inline with rows. It's still a future goal, but would slow 
 down the initial implementation considerably.
 # Push-down of file ownership to HDFS: while table ownership seems like a 
 useful construct to start with (at least to lay the groundwork for future 
 changes), making HBase act as the table owner when interacting with HDFS 
 would require more changes. In addition, while HDFS file ownership would make 
 applying quotas easy, and possibly make bulk imports more straightforward, 
 it's not clear it would offer a more secure setup. We'll leave this to 
 evaluate in a later phase.
 # HBase-managed roles as collections of permissions: we will not model roles 
 internally in HBase to begin with. We will instead allow group names to be 
 granted permissions, which allows some external modeling of roles via group 
 memberships. Groups will be created and manipulated externally to HBase. 
 While the assignment of permissions to roles and roles to users (or other 
 roles) allows a great deal of flexibility in security policy, it would add 
 complexity to the initial implementation. 
 After the initial implementation, which will appear on this issue, we will 
 evaluate the addition of role definitions internal to HBase in a new JIRA. In 
 this scheme, administrators could assign permissions specifying HDFS groups, 
 and additionally HBase roles. HBase roles would be created and manipulated 
 internally to HBase, and would appear distinct from HDFS groups via some 
 syntactic sugar. HBase role definitions will be allowed to reference other 
 HBase role definitions. 
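The table-or-column-family grant model in the goals above can be sketched as a two-level lookup: a table-level grant covers every column family, otherwise the specific per-family grant is consulted. This is purely illustrative; the real coprocessor-based AccessController stores and checks ACLs differently:

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical two-level ACL check: a table-wide grant covers all
// column families; otherwise a per-family grant is required.
public class SimpleAclSketch {
    private final Set<String> tableGrants = new HashSet<>();   // "user:table"
    private final Set<String> familyGrants = new HashSet<>();  // "user:table:cf"

    public void grantTable(String user, String table) {
        tableGrants.add(user + ":" + table);
    }

    public void grantFamily(String user, String table, String cf) {
        familyGrants.add(user + ":" + table + ":" + cf);
    }

    public boolean permitted(String user, String table, String cf) {
        return tableGrants.contains(user + ":" + table)
            || familyGrants.contains(user + ":" + table + ":" + cf);
    }
}
```

Group names could be checked the same way as user names here, which is how the external role modeling described above would plug in.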





[jira] [Created] (HBASE-5280) Remove AssignmentManager#clearRegionFromTransition and replace with assignmentManager#regionOffline

2012-01-25 Thread Jonathan Hsieh (Created) (JIRA)
Remove AssignmentManager#clearRegionFromTransition and replace with 
assignmentManager#regionOffline
---

 Key: HBASE-5280
 URL: https://issues.apache.org/jira/browse/HBASE-5280
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.92.0, 0.90.5, 0.94.0
Reporter: Jonathan Hsieh


These two methods are essentially the same, and both are present in the code 
base. It was suggested in the review for HBASE-5128 to remove 
#clearRegionFromTransition in favor of #regionOffline (HBASE-5128 deprecates 
the former, but it is internal to the HMaster, so it should be safely 
removable from 0.92 and 0.94).





[jira] [Updated] (HBASE-5230) Ensure compactions do not cache-on-write data blocks

2012-01-25 Thread Mikhail Bautin (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-5230:
--

Issue Type: Improvement  (was: Test)
   Summary: Ensure compactions do not cache-on-write data blocks  (was: 
Unit test to ensure compactions don't cache data on write)

 Ensure compactions do not cache-on-write data blocks
 

 Key: HBASE-5230
 URL: https://issues.apache.org/jira/browse/HBASE-5230
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D1353.1.patch, D1353.2.patch, D1353.3.patch, 
 D1353.4.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-21_00_53_54.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-23_10_23_45.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-23_15_27_23.patch


 Create a unit test for HBASE-3976 (making sure we don't cache data blocks on 
 write during compactions even if cache-on-write is generally 
 enabled). This is because we have very different implementations of 
 HBASE-3976 without HBASE-4422 CacheConfig (on top of 89-fb, created by Liyin) 
 and with CacheConfig (presumably it's there but not sure if it even works, 
 since the patch in HBASE-3976 may not have been committed). We need to create 
 a unit test to verify that we don't cache data blocks on write during 
 compactions, and resolve HBASE-3976 so that this new unit test does not fail.





[jira] [Commented] (HBASE-3466) runtime exception -- cached an already cached block -- during compaction

2012-01-25 Thread Simon Dircks (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13193444#comment-13193444
 ] 

Simon Dircks commented on HBASE-3466:
-

I just reproduced this with hadoop-1.0 and hbase-0.92 using YCSB. 


2012-01-25 23:23:51,556 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x134f70a343101a0 Successfully transitioned node 162702503c650e551130e5fb588b3ec2 from RS_ZK_REGION_SPLIT to RS_ZK_REGION_SPLIT
2012-01-25 23:23:51,616 ERROR org.apache.hadoop.hbase.regionserver.HRegionServer: java.lang.RuntimeException: Cached an already cached block
    at org.apache.hadoop.hbase.io.hfile.LruBlockCache.cacheBlock(LruBlockCache.java:268)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2.readBlock(HFileReaderV2.java:276)
    at org.apache.hadoop.hbase.io.hfile.HFileReaderV2$ScannerV2.seekTo(HFileReaderV2.java:487)
    at org.apache.hadoop.hbase.io.HalfStoreFileReader$1.seekTo(HalfStoreFileReader.java:168)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:181)
    at org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:111)
    at org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:83)
    at org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:1721)
    at org.apache.hadoop.hbase.regionserver.HRegion$RegionScannerImpl.<init>(HRegion.java:2861)
    at org.apache.hadoop.hbase.regionserver.HRegion.instantiateRegionScanner(HRegion.java:1432)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1424)
    at org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1400)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:3688)
    at org.apache.hadoop.hbase.regionserver.HRegion.get(HRegion.java:3581)
    at org.apache.hadoop.hbase.regionserver.HRegionServer.get(HRegionServer.java:1771)
    at sun.reflect.GeneratedMethodAccessor38.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.apache.hadoop.hbase.ipc.WritableRpcEngine$Server.call(WritableRpcEngine.java:364)
    at org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1325)
2012-01-25 23:23:51,656 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: regionserver:60020-0x134f70a343101a0 Attempting to transition node 162702503c650e551130e5fb588b3ec2 from RS_ZK_REGION_SPLIT to RS_ZK_REGION_SPLIT






18-node cluster, with a dedicated namenode, zookeeper, hbasemaster, and YCSB 
client machine. 


/usr/local/bin/java -cp build/ycsb.jar:db/hbase/lib/*:db/hbase/conf/ 
com.yahoo.ycsb.Client -load -db com.yahoo.ycsb.db.HBaseClient -P 
workloads/workloada -p columnfamily=family1 -p recordcount=500 -s > load.dat

Loaded 5 million records, which created 8 regions (all balanced onto the same RS).


/usr/local/bin/java -cp build/ycsb.jar:db/hbase/lib/*:db/hbase/conf/ 
com.yahoo.ycsb.Client -t -db com.yahoo.ycsb.db.HBaseClient -P 
workloads/workloada -p columnfamily=family1 -p operationcount=500 -threads 
10 -s > transaction.dat


I was also able to reproduce the error found in 
https://issues.apache.org/jira/browse/HBASE-4890:

2/01/25 15:19:24 WARN client.HConnectionManager$HConnectionImplementation: 
Failed all from 
region=usertable,user3076346045817661344,1327530607222.bab55fba6adb17bc8757eb6cdee99a91.,
 hostname=datatask6.hadoop.telescope.tv, port=60020
java.util.concurrent.ExecutionException: java.lang.RuntimeException: 
java.lang.NullPointerException




 runtime exception -- cached an already cached block -- during compaction
 

 Key: HBASE-3466
 URL: https://issues.apache.org/jira/browse/HBASE-3466
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
 Environment: ubuntu 9.10, kernel 2.6.31-14-generic SMP 8-core with 
 hyperthreading
Reporter: M. C. Srivas
Priority: Critical

 Happened while running ycsb against a single RS.  BlockSize was set to 64M to 
 tickle more splits. No compression, and replication factor set to 1.
  
 I noticed that  https://issues.apache.org/jira/browse/HBASE-2455 applied to 
 0.20.4, so I opened this new one (I didn't check whether the code is the same 
 in 0.20.4 and 0.90.0)
 YCSB was run as follows:
 java -mx3000m -cp conf/:build/ycsb.jar:db/hbase/lib/* com.yahoo.ycsb.Client 
 -t -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada -p 
 columnfamily=family -p operationcount=1000 -s -threads 30 -target 3
 workloada was modified to do 1 billion records:
 --
 recordcount=10
 operationcount=1000
 

[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-25 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-4218:
---

Attachment: D447.26.patch

mbautin updated the revision [jira] [HBASE-4218] HFile data block encoding 
framework and delta encoding implementation.
Reviewers: JIRA, tedyu, stack, nspiegelberg, Kannan

  Addressing Jerry's comments and rebasing on HBASE-5230 (ensuring that 
compactions do not cache data blocks on write). All unit tests pass.

  If there are no objections, I will commit this after final cluster testing.

REVISION DETAIL
  https://reviews.facebook.net/D447

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
  src/main/java/org/apache/hadoop/hbase/HConstants.java
  src/main/java/org/apache/hadoop/hbase/KeyValue.java
  src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/CompressionState.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoding.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/EncodedDataBlock.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/EncoderBufferTooSmallException.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/encoding/PrefixKeyDeltaEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockCacheKey.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java
  src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
  src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
  src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
  src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
  src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
  src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaConfigured.java
  src/main/java/org/apache/hadoop/hbase/util/ByteBufferUtils.java
  src/main/java/org/apache/hadoop/hbase/util/CompoundBloomFilter.java
  src/main/ruby/hbase/admin.rb
  src/test/java/org/apache/hadoop/hbase/BROKE_TODO_FIX_TestAcidGuarantees.java
  src/test/java/org/apache/hadoop/hbase/HBaseTestCase.java
  src/test/java/org/apache/hadoop/hbase/HBaseTestingUtility.java
  src/test/java/org/apache/hadoop/hbase/HFilePerformanceEvaluation.java
  src/test/java/org/apache/hadoop/hbase/TestAcidGuarantees.java
  src/test/java/org/apache/hadoop/hbase/TestKeyValue.java
  src/test/java/org/apache/hadoop/hbase/client/TestAdmin.java
  src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
  src/test/java/org/apache/hadoop/hbase/io/TestHalfStoreFileReader.java
  src/test/java/org/apache/hadoop/hbase/io/TestHeapSize.java
  src/test/java/org/apache/hadoop/hbase/io/encoding/RedundantKVGenerator.java
  src/test/java/org/apache/hadoop/hbase/io/encoding/TestBufferedDataBlockEncoder.java
  src/test/java/org/apache/hadoop/hbase/io/encoding/TestChangingEncoding.java
  src/test/java/org/apache/hadoop/hbase/io/encoding/TestDataBlockEncoders.java
  src/test/java/org/apache/hadoop/hbase/io/encoding/TestEncodedSeekers.java
  src/test/java/org/apache/hadoop/hbase/io/encoding/TestLoadAndSwitchEncodeOnDisk.java
  src/test/java/org/apache/hadoop/hbase/io/encoding/TestUpgradeFromHFileV1ToEncoding.java
  src/test/java/org/apache/hadoop/hbase/io/hfile/CacheTestUtils.java
  

[jira] [Updated] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-25 Thread Mikhail Bautin (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Mikhail Bautin updated HBASE-4218:
--

Attachment: Delta-encoding-2012-01-25_16_32_14.patch

Attaching a patch rebased on HBASE-5230 and addressing Jerry's new comment.

 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, 4218-2012-01-14.txt, 4218-v16.txt, 4218.txt, 
 D447.1.patch, D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, 
 D447.14.patch, D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, 
 D447.19.patch, D447.2.patch, D447.20.patch, D447.21.patch, D447.22.patch, 
 D447.23.patch, D447.24.patch, D447.25.patch, D447.26.patch, D447.3.patch, 
 D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, 
 D447.9.patch, Data-block-encoding-2011-12-23.patch, 
 Delta-encoding-2012-01-17_11_09_09.patch, 
 Delta-encoding-2012-01-25_00_45_29.patch, 
 Delta-encoding-2012-01-25_16_32_14.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta-encoding.patch-2012-01-05_15_16_43.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44_copy.patch, 
 Delta-encoding.patch-2012-01-05_18_50_47.patch, 
 Delta-encoding.patch-2012-01-07_14_12_48.patch, 
 Delta-encoding.patch-2012-01-13_12_20_07.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general-purpose algorithms.
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as to speed up seeks within HFileBlocks. It should 
 improve performance a lot if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when the value is a counter.
 Initial tests on real data (key length ~ 90 bytes, value length = 8 bytes) 
 show that I could achieve a decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 while having much better performance (20-80% faster decompression than LZO). 
 Moreover, it should allow far more efficient seeking, which should improve 
 performance a bit.
 It seems that simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase, two important changes in design will be 
 needed:
 - solidify the interface to HFileBlock / HFileReader Scanner to provide 
 seeking and iterating; access to the uncompressed buffer in HFileBlock will 
 have bad performance
 - extend comparators to support comparison assuming that the first N bytes 
 are equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windows&subj=Re+prefix+compression
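The prefix-compression idea described above can be sketched as follows. This is an illustrative toy, not the actual DataBlockEncoder API from the patch: because keys in an HFile block are sorted, each key can be stored as the length of the prefix it shares with the previous key plus only the differing suffix.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/**
 * Illustrative sketch of prefix (delta) key compression for a block of
 * sorted keys. NOT the real HBase encoder; names are hypothetical.
 */
public class PrefixKeySketch {

  /** Length of the common prefix of a and b. */
  static int commonPrefix(byte[] a, byte[] b) {
    int n = Math.min(a.length, b.length);
    int i = 0;
    while (i < n && a[i] == b[i]) {
      i++;
    }
    return i;
  }

  /** Encode sorted keys as (sharedPrefixLen, suffixLen, suffixBytes) records. */
  static byte[] encode(List<byte[]> sortedKeys) throws IOException {
    ByteArrayOutputStream buf = new ByteArrayOutputStream();
    DataOutputStream out = new DataOutputStream(buf);
    byte[] prev = new byte[0];
    for (byte[] key : sortedKeys) {
      int shared = commonPrefix(prev, key);
      out.writeShort(shared);                      // bytes shared with previous key
      out.writeShort(key.length - shared);         // length of the new suffix
      out.write(key, shared, key.length - shared); // only the suffix is stored
      prev = key;
    }
    return buf.toByteArray();
  }

  /** Decode back into full keys by re-materializing each shared prefix. */
  static List<byte[]> decode(byte[] encoded) throws IOException {
    DataInputStream in = new DataInputStream(new ByteArrayInputStream(encoded));
    List<byte[]> keys = new ArrayList<>();
    byte[] prev = new byte[0];
    while (in.available() > 0) {
      int shared = in.readUnsignedShort();
      int suffixLen = in.readUnsignedShort();
      byte[] key = new byte[shared + suffixLen];
      System.arraycopy(prev, 0, key, 0, shared); // copy shared prefix from prev
      in.readFully(key, shared, suffixLen);      // then append the new suffix
      keys.add(key);
      prev = key;
    }
    return keys;
  }
}
```

With ~90-byte keys that differ only near the end, most of each key collapses into the two length shorts, which is where the reported 92% key compression comes from.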





[jira] [Commented] (HBASE-3466) runtime exception -- cached an already cached block -- during compaction

2012-01-25 Thread Zhihong Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-3466?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193491#comment-13193491
 ] 

Zhihong Yu commented on HBASE-3466:
---

Since the issue can be reproduced, can you include cacheKey (and cb) in the 
exception message?
{code}
CachedBlock cb = map.get(cacheKey);
if(cb != null) {
  throw new RuntimeException("Cached an already cached block");
}
{code}
Thanks
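A minimal sketch of the suggested change, using a simplified stand-in for LruBlockCache (the names cacheKey and cb come from the snippet above; everything else here is hypothetical): naming the offending key and the existing entry in the exception message lets duplicates be diagnosed from the server log.

```java
import java.util.HashMap;
import java.util.Map;

/**
 * Simplified stand-in for a block cache (not the real LruBlockCache),
 * illustrating the proposed richer "already cached" exception message.
 */
public class BlockCacheSketch {
  private final Map<String, Object> map = new HashMap<>();

  public void cacheBlock(String cacheKey, Object block) {
    Object cb = map.get(cacheKey);
    if (cb != null) {
      // Proposed: include the key and the existing entry in the message.
      throw new RuntimeException("Cached an already cached block: key="
          + cacheKey + ", existing=" + cb);
    }
    map.put(cacheKey, block);
  }
}
```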

 runtime exception -- cached an already cached block -- during compaction
 

 Key: HBASE-3466
 URL: https://issues.apache.org/jira/browse/HBASE-3466
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.0
 Environment: ubuntu 9.10, kernel 2.6.31-14-generic SMP 8-core with 
 hyperthreading
Reporter: M. C. Srivas
Priority: Critical

 Happened while running ycsb against a single RS.  BlockSize was set to 64M to 
 tickle more splits. No compression, and replication factor set to 1.
  
 I noticed that  https://issues.apache.org/jira/browse/HBASE-2455 applied to 
 0.20.4, so I opened this new one (I didn't check whether the code is the same 
 in 0.20.4 and 0.90.0)
 YCSB was run as follows:
 java -mx3000m -cp conf/:build/ycsb.jar:db/hbase/lib/* com.yahoo.ycsb.Client 
 -t -db com.yahoo.ycsb.db.HBaseClient -P workloads/workloada -p 
 columnfamily=family -p operationcount=1000 -s -threads 30 -target 3
 workloada was modified to do 1 billion records:
 --
 recordcount=10
 operationcount=1000
 workload=com.yahoo.ycsb.workloads.CoreWorkload
 readallfields=true
 readproportion=0.5
 updateproportion=0.4
 scanproportion=0
 insertproportion=0.1
 requestdistribution=zipfian
 ---
 Relevant portions from the RS's log:
 2011-01-23 10:48:20,719 INFO  
 org.apache.hadoop.hbase.regionserver.SplitTransaction 
 [regionserver60020.compactor]: Starting split of region 
 usertable,,1295808232386.44386ab6079bd5b497a6de3ab95e850c.
 2011-01-23 10:48:20,788 INFO  org.apache.hadoop.hbase.regionserver.Store 
 [regionserver60020.compactor]: Renaming flushed file at 
 maprfs:/hbase/usertable/44386ab6079bd5b497a6de3ab95e850c/.tmp/3202441284831392385
  to 
 maprfs:/hbase/usertable/44386ab6079bd5b497a6de3ab95e850c/family/1800354539520698957
 2011-01-23 10:48:20,791 INFO  org.apache.hadoop.hbase.regionserver.Store 
 [regionserver60020.compactor]: Added 
 maprfs:/hbase/usertable/44386ab6079bd5b497a6de3ab95e850c/family/1800354539520698957,
  entries=10943, sequenceid=128924, memsize=3.4m, filesize=1.5m
 2011-01-23 10:48:20,792 INFO  org.apache.hadoop.hbase.regionserver.HRegion 
 [regionserver60020.compactor]: Closed 
 usertable,,1295808232386.44386ab6079bd5b497a6de3ab95e850c.
 2011-01-23 10:48:20,828 INFO  org.apache.hadoop.hbase.catalog.MetaEditor 
 [regionserver60020.compactor]: Offlined parent region 
 usertable,,1295808232386.44386ab6079bd5b497a6de3ab95e850c. in META
 2011-01-23 10:48:20,856 INFO  org.apache.hadoop.hbase.regionserver.HRegion 
 [perfnode15.perf.lab,60020,1295807975391-daughterOpener=89e0f70da1e5ce2d5c4024ca6cc1addb]:
  Onlined usertable,,1295808500713.89e0f70da1e5ce2d5c4024ca6cc1addb.; next 
 sequenceid=128925
 2011-01-23 10:48:20,863 INFO  org.apache.hadoop.hbase.catalog.MetaEditor 
 [perfnode15.perf.lab,60020,1295807975391-daughterOpener=89e0f70da1e5ce2d5c4024ca6cc1addb]:
  Added daughter usertable,,1295808500713.89e0f70da1e5ce2d5c4024ca6cc1addb. in 
 region .META.,,1, serverInfo=perfnode15.perf.lab,60020,1295807975391
 2011-01-23 10:48:20,868 INFO  org.apache.hadoop.hbase.regionserver.HRegion 
 [perfnode15.perf.lab,60020,1295807975391-daughterOpener=fd1d4e71c9a7e262a6e26adc0742414e]:
  Onlined 
 usertable,user1907848630,1295808500713.fd1d4e71c9a7e262a6e26adc0742414e.; 
 next sequenceid=128926
 2011-01-23 10:48:20,869 INFO  org.apache.hadoop.hbase.catalog.MetaEditor 
 

[jira] [Commented] (HBASE-5128) [uber hbck] Enable hbck to automatically repair table integrity problems as well as region consistency problems while online.

2012-01-25 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5128?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193499#comment-13193499
 ] 

Jonathan Hsieh commented on HBASE-5128:
---

It was also suggested that I need to worry about compactions due to an HRegion 
flush when I close regions during overlap merging. At least in 0.90, this is 
not actually necessary -- the HMaster-side closeRegion actually flushes but 
ignores the internalFlushcache return flag that specifies whether a region 
needs to be compacted.


 [uber hbck] Enable hbck to automatically repair table integrity problems as 
 well as region consistency problems while online.
 -

 Key: HBASE-5128
 URL: https://issues.apache.org/jira/browse/HBASE-5128
 Project: HBase
  Issue Type: New Feature
  Components: hbck
Affects Versions: 0.90.5, 0.92.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh

 The current (0.90.5, 0.92.0rc2) versions of hbck detect most region 
 consistency and table integrity invariant violations.  However, with '-fix' it 
 can only automatically repair region consistency cases having to do with 
 deployment problems.  This updated version should be able to handle all cases 
 (including a new orphan regiondir case).  When complete, it will likely 
 deprecate the OfflineMetaRepair tool and subsume several open META-hole 
 related issues.
 Here's the approach (from the comment at the top of the new version of the 
 file).
 {code}
 /**
 * HBaseFsck (hbck) is a tool for checking and repairing region consistency
 * and table integrity.
  * 
  * Region consistency checks verify that META, region deployment on
  * region servers and the state of data in HDFS (.regioninfo files) all are in
  * accordance. 
  * 
 * Table integrity checks verify that all possible row keys can resolve to
 * exactly one region of a table.  This means there are no individual
 * degenerate or backwards regions; no holes between regions; and that there
 * are no overlapping regions.
  * 
  * The general repair strategy works in these steps.
  * 1) Repair Table Integrity on HDFS. (merge or fabricate regions)
  * 2) Repair Region Consistency with META and assignments
  * 
 * For table integrity repairs, the tables' region directories are scanned
 * for .regioninfo files.  Each table's integrity is then verified.  If there
  * are any orphan regions (regions with no .regioninfo files), or holes, new 
  * regions are fabricated.  Backwards regions are sidelined as well as empty
 * degenerate (endkey==startkey) regions.  If there are any overlapping
 * regions, a new region is created and all data is merged into the new region.
  * 
 * Table integrity repairs deal solely with HDFS and can be done offline --
 * the hbase region servers or master do not need to be running.  This phase
 * can be used to completely reconstruct the META table in an offline fashion.
  * 
 * Region consistency requires three conditions -- 1) a valid .regioninfo file
 * present in an hdfs region dir, 2) a valid row with .regioninfo data in META,
 * and 3) a region deployed only at the regionserver that it was assigned to.
  * 
  * Region consistency requires hbck to contact the HBase master and region
  * servers, so the connect() must first be called successfully.  Much of the
  * region consistency information is transient and less risky to repair.
  */
 {code}
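The "every row key resolves to exactly one region" invariant above can be illustrated with a toy check. This is a hypothetical sketch, not hbck code: a region is modeled as a {startKey, endKey} pair sorted by startKey, with "" standing for the open table start/end boundary, and the check reports holes, overlaps, degenerate and backwards regions.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Toy illustration of the table-integrity invariant: every possible row key
 * must resolve to exactly one region. Not hbck code; names are hypothetical.
 */
public class TableIntegritySketch {

  /** Report holes, overlaps, and degenerate/backwards regions. */
  public static List<String> findProblems(List<String[]> regionsSortedByStart) {
    List<String> problems = new ArrayList<>();
    String cursor = ""; // startKey we expect next; "" = start of table
    for (String[] region : regionsSortedByStart) {
      String start = region[0];
      String end = region[1]; // "" means this region runs to the end of table
      if (!end.isEmpty() && end.compareTo(start) <= 0) {
        problems.add(end.equals(start)
            ? "degenerate region at " + start
            : "backwards region " + start + ".." + end);
        continue; // sideline the bad region instead of advancing the cursor
      }
      int cmp = start.compareTo(cursor);
      if (cmp > 0) {
        problems.add("hole " + cursor + ".." + start); // keys with no region
      } else if (cmp < 0) {
        problems.add("overlap at " + start); // keys covered by two regions
      }
      cursor = end;
    }
    if (!cursor.isEmpty()) {
      problems.add("hole " + cursor + "..<end of table>");
    }
    return problems;
  }
}
```

In hbck's terms, a reported hole would be repaired by fabricating a region and an overlap by merging the overlapping regions into a new one.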





[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-25 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193514#comment-13193514
 ] 

Phabricator commented on HBASE-4218:


Kannan has accepted the revision [jira] [HBASE-4218] HFile data block encoding 
framework and delta encoding implementation.

  excellent!!

REVISION DETAIL
  https://reviews.facebook.net/D447


 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, 4218-2012-01-14.txt, 4218-v16.txt, 4218.txt, 
 D447.1.patch, D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, 
 D447.14.patch, D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, 
 D447.19.patch, D447.2.patch, D447.20.patch, D447.21.patch, D447.22.patch, 
 D447.23.patch, D447.24.patch, D447.25.patch, D447.26.patch, D447.3.patch, 
 D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, 
 D447.9.patch, Data-block-encoding-2011-12-23.patch, 
 Delta-encoding-2012-01-17_11_09_09.patch, 
 Delta-encoding-2012-01-25_00_45_29.patch, 
 Delta-encoding-2012-01-25_16_32_14.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta-encoding.patch-2012-01-05_15_16_43.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44_copy.patch, 
 Delta-encoding.patch-2012-01-05_18_50_47.patch, 
 Delta-encoding.patch-2012-01-07_14_12_48.patch, 
 Delta-encoding.patch-2012-01-13_12_20_07.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general-purpose algorithms.
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as to speed up seeks within HFileBlocks. It should 
 improve performance a lot if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when the value is a counter.
 Initial tests on real data (key length ~ 90 bytes, value length = 8 bytes) 
 show that I could achieve a decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 while having much better performance (20-80% faster decompression than LZO). 
 Moreover, it should allow far more efficient seeking, which should improve 
 performance a bit.
 It seems that simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase, two important changes in design will be 
 needed:
 - solidify the interface to HFileBlock / HFileReader Scanner to provide 
 seeking and iterating; access to the uncompressed buffer in HFileBlock will 
 have bad performance
 - extend comparators to support comparison assuming that the first N bytes 
 are equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windows&subj=Re+prefix+compression





[jira] [Commented] (HBASE-4917) CRUD Verify Utility

2012-01-25 Thread Mubarak Seyed (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193532#comment-13193532
 ] 

Mubarak Seyed commented on HBASE-4917:
--

Working on this port; I will attach the patch once I get corporate approval. 
Thanks.

 CRUD Verify Utility
 ---

 Key: HBASE-4917
 URL: https://issues.apache.org/jira/browse/HBASE-4917
 Project: HBase
  Issue Type: Sub-task
  Components: client, regionserver
Reporter: Nicolas Spiegelberg
 Fix For: 0.94.0


 Add a verify utility to run basic CRUD tests against hbase in various common 
 use cases.  This is great for sanity checking a cluster setup because it can 
 be run as a one line shell command with no required params.  Multiple column 
 families for different use-cases can be tested together.  Currently provided 
 use-cases are 'action log', 'snapshot' and 'search'. The interface is 
 developed such that it can be easily extended to cover more use-cases.





[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-25 Thread Hadoop QA (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193534#comment-13193534
 ] 

Hadoop QA commented on HBASE-4218:
--

-1 overall.  Here are the results of testing the latest attachment 
  
http://issues.apache.org/jira/secure/attachment/12511917/Delta-encoding-2012-01-25_16_32_14.patch
  against trunk revision .

+1 @author.  The patch does not contain any @author tags.

+1 tests included.  The patch appears to include 189 new or modified tests.

-1 javadoc.  The javadoc tool appears to have generated -140 warning 
messages.

+1 javac.  The applied patch does not increase the total number of javac 
compiler warnings.

-1 findbugs.  The patch appears to introduce 88 new Findbugs (version 
1.3.9) warnings.

+1 release audit.  The applied patch does not increase the total number of 
release audit warnings.

 -1 core tests.  The patch failed these unit tests:
   org.apache.hadoop.hbase.io.hfile.TestHFileBlock

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/852//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/852//artifact/trunk/patchprocess/newPatchFindbugsWarnings.html
Console output: https://builds.apache.org/job/PreCommit-HBASE-Build/852//console

This message is automatically generated.

 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, 4218-2012-01-14.txt, 4218-v16.txt, 4218.txt, 
 D447.1.patch, D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, 
 D447.14.patch, D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, 
 D447.19.patch, D447.2.patch, D447.20.patch, D447.21.patch, D447.22.patch, 
 D447.23.patch, D447.24.patch, D447.25.patch, D447.26.patch, D447.3.patch, 
 D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, 
 D447.9.patch, Data-block-encoding-2011-12-23.patch, 
 Delta-encoding-2012-01-17_11_09_09.patch, 
 Delta-encoding-2012-01-25_00_45_29.patch, 
 Delta-encoding-2012-01-25_16_32_14.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta-encoding.patch-2012-01-05_15_16_43.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44_copy.patch, 
 Delta-encoding.patch-2012-01-05_18_50_47.patch, 
 Delta-encoding.patch-2012-01-07_14_12_48.patch, 
 Delta-encoding.patch-2012-01-13_12_20_07.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general-purpose algorithms.
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as to speed up seeks within HFileBlocks. It should 
 improve performance a lot if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when the value is a counter.
 Initial tests on real data (key length ~ 90 bytes, value length = 8 bytes) 
 show that I could achieve a decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 while having much better performance (20-80% faster decompression than LZO). 
 Moreover, it should allow far more efficient seeking, which should improve 
 performance a bit.
 It seems that simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase, two important changes in design will be 
 needed:
 - solidify the interface to HFileBlock / HFileReader Scanner to provide 
 seeking and iterating; access to the uncompressed buffer in HFileBlock will 
 have bad performance
 - extend comparators to support comparison assuming that the first N bytes 
 are equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windows&subj=Re+prefix+compression


[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-25 Thread Phabricator (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193535#comment-13193535
 ] 

Phabricator commented on HBASE-4218:


mbautin has committed the revision [jira] [HBASE-4218] HFile data block 
encoding framework and delta encoding implementation.

REVISION DETAIL
  https://reviews.facebook.net/D447

COMMIT
  https://reviews.facebook.net/rHBASE1236031


 Data Block Encoding of KeyValues  (aka delta encoding / prefix compression)
 ---

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
Assignee: Mikhail Bautin
  Labels: compression
 Fix For: 0.94.0

 Attachments: 0001-Delta-encoding-fixed-encoded-scanners.patch, 
 0001-Delta-encoding.patch, 4218-2012-01-14.txt, 4218-v16.txt, 4218.txt, 
 D447.1.patch, D447.10.patch, D447.11.patch, D447.12.patch, D447.13.patch, 
 D447.14.patch, D447.15.patch, D447.16.patch, D447.17.patch, D447.18.patch, 
 D447.19.patch, D447.2.patch, D447.20.patch, D447.21.patch, D447.22.patch, 
 D447.23.patch, D447.24.patch, D447.25.patch, D447.26.patch, D447.3.patch, 
 D447.4.patch, D447.5.patch, D447.6.patch, D447.7.patch, D447.8.patch, 
 D447.9.patch, Data-block-encoding-2011-12-23.patch, 
 Delta-encoding-2012-01-17_11_09_09.patch, 
 Delta-encoding-2012-01-25_00_45_29.patch, 
 Delta-encoding-2012-01-25_16_32_14.patch, 
 Delta-encoding.patch-2011-12-22_11_52_07.patch, 
 Delta-encoding.patch-2012-01-05_15_16_43.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44.patch, 
 Delta-encoding.patch-2012-01-05_16_31_44_copy.patch, 
 Delta-encoding.patch-2012-01-05_18_50_47.patch, 
 Delta-encoding.patch-2012-01-07_14_12_48.patch, 
 Delta-encoding.patch-2012-01-13_12_20_07.patch, 
 Delta_encoding_with_memstore_TS.patch, open-source.diff


 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general-purpose algorithms.
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as to speed up seeks within HFileBlocks. It should 
 improve performance a lot if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when the value is a counter.
 Initial tests on real data (key length ≈ 90 bytes, value length = 8 bytes) 
 show that I could achieve a decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 while offering much better performance (20-80% faster decompression than 
 LZO). Moreover, it should allow far more efficient seeking, which should 
 improve performance a bit.
 It seems that simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs, and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase, two important changes in design will be 
 needed:
 - solidify the interface to HFileBlock / HFileReader Scanner to provide 
 seeking and iterating; access to the uncompressed buffer in HFileBlock will 
 have bad performance
 - extend comparators to support comparison assuming that the first N bytes 
 are equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windows&subj=Re+prefix+compression





[jira] [Commented] (HBASE-5276) PerformanceEvaluation does not set the correct classpath for MR because it lives in the test jar

2012-01-25 Thread Jonathan Hsieh (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5276?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193536#comment-13193536
 ] 

Jonathan Hsieh commented on HBASE-5276:
---

Hi Tim, 

From the HBASE-4688 issue, it looks like this didn't make it into Apache HBase 
until 0.92.0.  If you would like this in a future CDH3 release, please file an 
issue here:

https://issues.cloudera.org/browse/DISTRO

Since CDH4 is based on Apache HBase 0.92, it will be in the CDH4 HBase.  

Thanks,
Jon.

 PerformanceEvaluation does not set the correct classpath for MR because it 
 lives in the test jar
 

 Key: HBASE-5276
 URL: https://issues.apache.org/jira/browse/HBASE-5276
 Project: HBase
  Issue Type: Bug
  Components: test
Affects Versions: 0.90.4
Reporter: Tim Robertson
Priority: Minor

 Note: This was discovered running the CDH version hbase-0.90.4-cdh3u2
 Running the PerformanceEvaluation as follows:
   $HADOOP_HOME/bin/hadoop org.apache.hadoop.hbase.PerformanceEvaluation scan 5
 fails because the MR tasks do not get the HBase jar on the CP, and thus hit 
 ClassNotFoundExceptions.
 The job gets the following only:
   file:/Users/tim/dev/hadoop/hbase-0.90.4-cdh3u2/hbase-0.90.4-cdh3u2-tests.jar
   
 file:/Users/tim/dev/hadoop/hadoop-0.20.2-cdh3u2/hadoop-core-0.20.2-cdh3u2.jar
   
 file:/Users/tim/dev/hadoop/hbase-0.90.4-cdh3u2/lib/zookeeper-3.3.3-cdh3u2.jar
 The RowCounter etc. all work because they live in the HBase jar, not the test 
 jar, and they get the following:
   file:/Users/tim/dev/hadoop/hbase-0.90.4-cdh3u2/lib/guava-r06.jar
   
 file:/Users/tim/dev/hadoop/hadoop-0.20.2-cdh3u2/hadoop-core-0.20.2-cdh3u2.jar
   file:/Users/tim/dev/hadoop/hbase-0.90.4-cdh3u2/hbase-0.90.4-cdh3u2.jar
   
 file:/Users/tim/dev/hadoop/hbase-0.90.4-cdh3u2/lib/zookeeper-3.3.3-cdh3u2.jar
 Presumably this relates to 
   job.setJarByClass(PerformanceEvaluation.class);
   ...
   TableMapReduceUtil.addDependencyJars(job);
 A (cowboy) workaround to run PE is to unpack the jars and copy the 
 PerformanceEvaluation* classes, building a patched jar.
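The root cause is that `job.setJarByClass` ships whichever jar contains the named class — here the tests jar, which does not pull in the main HBase jar. The jar is chosen by resolving the class's `.class` resource through its class loader; a minimal sketch of that lookup (the `JarLocator` class is hypothetical, for illustration only):

```java
// Demonstrates how a class's originating jar/classpath entry is found:
// resolve the .class resource via the class loader, which is effectively
// what job.setJarByClass(...) does to decide which jar to ship to MR tasks.
public class JarLocator {
    public static String locate(Class<?> clazz) {
        String resource = clazz.getName().replace('.', '/') + ".class";
        java.net.URL url = clazz.getClassLoader().getResource(resource);
        // For a class packaged in a jar this is a jar:file:...!/... URL;
        // only that one jar is shipped unless dependency jars are added too.
        return url == null ? null : url.toString();
    }
}
```

So because PerformanceEvaluation resolves to the tests jar, the tasks never see the main HBase jar on their classpath.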





[jira] [Updated] (HBASE-5186) Add metrics to ThriftServer

2012-01-25 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-5186:
---

Attachment: HBASE-5186.D1461.1.patch

sc requested code review of HBASE-5186 [jira] Add metrics to ThriftServer.
Reviewers: dhruba, tedyu, JIRA

  Add metrics to ThriftServer

  It will be useful to have some metrics (queue length, waiting time, 
  processing time, ...) similar to the Hadoop RPC server. This allows us to 
  monitor system health and also provides a tool to diagnose problems where 
  thrift calls are slow.

TEST PLAN
  EMPTY

REVISION DETAIL
  https://reviews.facebook.net/D1461

AFFECTED FILES
  pom.xml
  src/main/java/org/apache/hadoop/hbase/thrift/CallQueue.java
  src/main/java/org/apache/hadoop/hbase/thrift/HbaseHandlerMetricsProxy.java
  src/main/java/org/apache/hadoop/hbase/thrift/TBoundedThreadPoolServer.java
  src/main/java/org/apache/hadoop/hbase/thrift/ThriftMetrics.java
  src/main/java/org/apache/hadoop/hbase/thrift/ThriftServerRunner.java
  src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServer.java

MANAGE HERALD DIFFERENTIAL RULES
  https://reviews.facebook.net/herald/view/differential/

WHY DID I GET THIS EMAIL?
  https://reviews.facebook.net/herald/transcript/3021/

Tip: use the X-Herald-Rules header to filter Herald messages in your client.


 Add metrics to ThriftServer
 ---

 Key: HBASE-5186
 URL: https://issues.apache.org/jira/browse/HBASE-5186
 Project: HBase
  Issue Type: Improvement
Reporter: Scott Chen
Assignee: Scott Chen
 Attachments: HBASE-5186.D1461.1.patch


 It will be useful to have some metrics (queue length, waiting time, 
 processing time, ...) similar to the Hadoop RPC server. This allows us to 
 monitor system health and also provides a tool to diagnose problems where 
 thrift calls are slow.





[jira] [Updated] (HBASE-5186) Add metrics to ThriftServer

2012-01-25 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-5186:
---

Attachment: HBASE-5186.D1461.2.patch

sc updated the revision HBASE-5186 [jira] Add metrics to ThriftServer.
Reviewers: dhruba, tedyu, JIRA

  Remove the debug change

REVISION DETAIL
  https://reviews.facebook.net/D1461

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/thrift/CallQueue.java
  src/main/java/org/apache/hadoop/hbase/thrift/HbaseHandlerMetricsProxy.java
  src/main/java/org/apache/hadoop/hbase/thrift/TBoundedThreadPoolServer.java
  src/main/java/org/apache/hadoop/hbase/thrift/ThriftMetrics.java
  src/main/java/org/apache/hadoop/hbase/thrift/ThriftServerRunner.java
  src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServer.java


 Add metrics to ThriftServer
 ---

 Key: HBASE-5186
 URL: https://issues.apache.org/jira/browse/HBASE-5186
 Project: HBase
  Issue Type: Improvement
Reporter: Scott Chen
Assignee: Scott Chen
 Attachments: HBASE-5186.D1461.1.patch, HBASE-5186.D1461.2.patch


 It will be useful to have some metrics (queue length, waiting time, 
 processing time, ...) similar to the Hadoop RPC server. This allows us to 
 monitor system health and also provides a tool to diagnose problems where 
 thrift calls are slow.





[jira] [Commented] (HBASE-5230) Ensure compactions do not cache-on-write data blocks

2012-01-25 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193545#comment-13193545
 ] 

Hudson commented on HBASE-5230:
---

Integrated in HBase-TRUNK #2646 (See 
[https://builds.apache.org/job/HBase-TRUNK/2646/])
HBASE-5230 : ensure that compactions do not cache-on-write data blocks

mbautin : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaMetrics.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java


 Ensure compactions do not cache-on-write data blocks
 

 Key: HBASE-5230
 URL: https://issues.apache.org/jira/browse/HBASE-5230
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D1353.1.patch, D1353.2.patch, D1353.3.patch, 
 D1353.4.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-21_00_53_54.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-23_10_23_45.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-23_15_27_23.patch


 Create a unit test for HBASE-3976 (making sure we don't cache data blocks on 
 write during compactions even if cache-on-write is generally enabled). This 
 is because we have very different implementations of HBASE-3976 without 
 HBASE-4422 CacheConfig (on top of 89-fb, created by Liyin) and with 
 CacheConfig (presumably it's there, but not sure if it even works, since the 
 patch in HBASE-3976 may not have been committed). We need to create a unit 
 test to verify that we don't cache data blocks on write during compactions, 
 and resolve HBASE-3976 so that this new unit test does not fail.





[jira] [Commented] (HBASE-4720) Implement atomic update operations (checkAndPut, checkAndDelete) for REST client/server

2012-01-25 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4720?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193546#comment-13193546
 ] 

Hudson commented on HBASE-4720:
---

Integrated in HBase-TRUNK #2646 (See 
[https://builds.apache.org/job/HBase-TRUNK/2646/])
HBASE-4720 revert until agreement is reached on solution
HBASE-4720 Implement atomic update operations (checkAndPut, checkAndDelete) for 
REST client/server (Mubarak)

tedyu : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/CheckAndDeleteRowResource.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/CheckAndDeleteTableResource.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/CheckAndPutRowResource.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/CheckAndPutTableResource.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/RootResource.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/rest/TestRowResource.java

tedyu : 
Files : 
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/CheckAndDeleteRowResource.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/CheckAndDeleteTableResource.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/CheckAndPutRowResource.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/CheckAndPutTableResource.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/RootResource.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/rest/client/RemoteHTable.java
* /hbase/trunk/src/test/java/org/apache/hadoop/hbase/rest/TestRowResource.java


 Implement atomic update operations (checkAndPut, checkAndDelete) for REST 
 client/server 
 

 Key: HBASE-4720
 URL: https://issues.apache.org/jira/browse/HBASE-4720
 Project: HBase
  Issue Type: Improvement
Reporter: Daniel Lord
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-4720.trunk.v1.patch, HBASE-4720.trunk.v2.patch, 
 HBASE-4720.trunk.v3.patch, HBASE-4720.trunk.v4.patch, 
 HBASE-4720.trunk.v5.patch, HBASE-4720.trunk.v6.patch, HBASE-4720.v1.patch, 
 HBASE-4720.v3.patch


 I have several large application/HBase clusters where an application node 
 will occasionally need to talk to HBase from a different cluster.  In order 
 to help ensure some of my consistency guarantees I have a sentinel table that 
 is updated atomically as users interact with the system.  This works quite 
 well for the regular HBase client, but the REST client does not implement 
 the checkAndPut and checkAndDelete operations.  This exposes the application 
 to some race conditions that have to be worked around.  It would be ideal if 
 the same checkAndPut/checkAndDelete operations could be supported by the REST 
 client.





[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-25 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193543#comment-13193543
 ] 

Hudson commented on HBASE-4218:
---

Integrated in HBase-TRUNK #2646 (See 
[https://builds.apache.org/job/HBase-TRUNK/2646/])
[jira] [HBASE-4218] HFile data block encoding framework and delta encoding
implementation (Jacek Migdal, Mikhail Bautin)

Summary:

Adding a framework that allows encoding keys in an HFile data block. We
support two modes of encoding: (1) both on disk and in cache, and (2) in cache
only. This is distinct from the compression that is already done in HBase,
e.g. GZ or LZO. When data block encoding is enabled, we store blocks in the
cache in an uncompressed but encoded form. This allows fitting more blocks in
the cache and reduces the number of disk reads.

The most common example of data block encoding is delta encoding, where we take
advantage of the fact that HFile keys are sorted and share a lot of common
prefixes, and only store the delta between each pair of consecutive keys.
Initial encoding algorithms implemented are DIFF, FAST_DIFF, and PREFIX.
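Seeking inside such an encoded block benefits from comparators that can skip bytes already known to be equal (the shared prefix recorded for each entry). A toy sketch of that idea, with hypothetical names — this is not the actual HBase comparator API:

```java
// Toy comparator for delta-encoded scans: when the scanner already knows the
// two keys share `commonPrefix` leading bytes, the comparison can start at
// that offset instead of byte 0, as unsigned bytes (like a KeyValue compare).
public class PrefixAwareComparator {
    public static int compareFrom(byte[] left, byte[] right, int commonPrefix) {
        int min = Math.min(left.length, right.length);
        for (int i = commonPrefix; i < min; i++) {
            int diff = (left[i] & 0xff) - (right[i] & 0xff);
            if (diff != 0) {
                return diff;
            }
        }
        return left.length - right.length;
    }
}
```

With long shared prefixes, most of each comparison is skipped, which is where the seek speedup comes from.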

This is based on the delta encoding patch developed by Jacek Migdal during his
2011 summer internship at Facebook. The original patch is available here:
https://reviews.apache.org/r/2308/diff/.

Test Plan: Unit tests. Distributed load test on a five-node cluster.

Reviewers: JIRA, tedyu, stack, nspiegelberg, Kannan

Reviewed By: Kannan

CC: tedyu, todd, mbautin, stack, Kannan, mcorgan, gqchen

Differential Revision: https://reviews.facebook.net/D447

mbautin : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/CompressionState.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoder.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoding.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/EncodedDataBlock.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/EncoderBufferTooSmallException.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/PrefixKeyDeltaEncoder.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockCacheKey.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaConfigured.java

[jira] [Commented] (HBASE-5278) HBase shell script refers to removed migrate functionality

2012-01-25 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193544#comment-13193544
 ] 

Hudson commented on HBASE-5278:
---

Integrated in HBase-TRUNK #2646 (See 
[https://builds.apache.org/job/HBase-TRUNK/2646/])
HBASE-5278 HBase shell script refers to removed 'migrate' functionality

stack : 
Files : 
* /hbase/trunk/bin/hbase


 HBase shell script refers to removed migrate functionality
 

 Key: HBASE-5278
 URL: https://issues.apache.org/jira/browse/HBASE-5278
 Project: HBase
  Issue Type: Bug
  Components: scripts
Affects Versions: 0.94.0, 0.90.5, 0.92.0
Reporter: Shaneal Manek
Assignee: Shaneal Manek
Priority: Trivial
 Fix For: 0.94.0, 0.92.1

 Attachments: hbase-5278.patch


 $ hbase migrate
 Exception in thread "main" java.lang.NoClassDefFoundError: 
 org/apache/hadoop/hbase/util/Migrate
 Caused by: java.lang.ClassNotFoundException: 
 org.apache.hadoop.hbase.util.Migrate
 at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
 at java.security.AccessController.doPrivileged(Native Method)
 at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
 at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
 at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
 Could not find the main class: org.apache.hadoop.hbase.util.Migrate. Program 
 will exit.
 The 'hbase' shell script has docs referring to a 'migrate' command which no 
 longer exists.





[jira] [Commented] (HBASE-5229) Explore building blocks for multi-row local transactions.

2012-01-25 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193551#comment-13193551
 ] 

Lars Hofhansl commented on HBASE-5229:
--

Is anybody interested in me exploring the split prefix idea described above?

Basically, a table would declare a prefix of N bytes, and during splitting we 
would make sure we don't split rows with the same prefix (which essentially 
just means that we calculate the midKey as we do now, and just take its first 
N bytes to perform the actual split; hence the actual split point would always 
be aligned with the prefixes).
That way we have defined a grouping of rows that could participate in local 
transactions.
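The proposal above can be sketched in a few lines (hypothetical names; the real change would live in the region split logic): compute the midKey as today, then truncate it to the declared N-byte prefix so rows sharing a prefix always stay in one region.

```java
import java.util.Arrays;

// Toy version of the proposed prefix-aligned split: truncate the computed
// midKey to the table's declared grouping-prefix length so the split point
// never falls inside a prefix group.
public class PrefixSplit {
    public static byte[] alignSplitPoint(byte[] midKey, int prefixLength) {
        if (midKey.length <= prefixLength) {
            return midKey;  // midKey already fits within the prefix
        }
        return Arrays.copyOf(midKey, prefixLength);
    }
}
```

All rows whose keys start with the truncated prefix then land on the same side of the split, giving the grouping needed for local transactions.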


 Explore building blocks for multi-row local transactions.
 ---

 Key: HBASE-5229
 URL: https://issues.apache.org/jira/browse/HBASE-5229
 Project: HBase
  Issue Type: New Feature
  Components: client, regionserver
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.94.0

 Attachments: 5229-seekto-v2.txt, 5229-seekto.txt, 5229.txt


 HBase should provide basic building blocks for multi-row local transactions. 
 Local means that we do this by co-locating the data. Global (cross region) 
 transactions are not discussed here.
 After a bit of discussion two solutions have emerged:
 1. Keep the row-key for determining grouping and location and allow efficient 
 intra-row scanning. A client application would then model tables as 
 HBase-rows.
 2. Define a prefix-length in HTableDescriptor that defines a grouping of 
 rows. Regions will then never be split inside a grouping prefix.
 #1 is true to the current storage paradigm of HBase.
 #2 is true to the current client side API.
 I will explore these two with sample patches here.
 
 Was:
 As discussed (at length) on the dev mailing list with the HBASE-3584 and 
 HBASE-5203 committed, supporting atomic cross row transactions within a 
 region becomes simple.
 I am aware of the hesitation about the usefulness of this feature, but we 
 have to start somewhere.
 Let's use this jira for discussion, I'll attach a patch (with tests) 
 momentarily to make this concrete.





[jira] [Updated] (HBASE-5186) Add metrics to ThriftServer

2012-01-25 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-5186:
---

Attachment: HBASE-5186.D1461.3.patch

sc updated the revision HBASE-5186 [jira] Add metrics to ThriftServer.
Reviewers: dhruba, tedyu, JIRA

  Add TestCallQueue

REVISION DETAIL
  https://reviews.facebook.net/D1461

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/thrift/CallQueue.java
  src/main/java/org/apache/hadoop/hbase/thrift/HbaseHandlerMetricsProxy.java
  src/main/java/org/apache/hadoop/hbase/thrift/TBoundedThreadPoolServer.java
  src/main/java/org/apache/hadoop/hbase/thrift/ThriftMetrics.java
  src/main/java/org/apache/hadoop/hbase/thrift/ThriftServerRunner.java
  src/test/java/org/apache/hadoop/hbase/thrift/TestCallQueue.java
  src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServer.java


 Add metrics to ThriftServer
 ---

 Key: HBASE-5186
 URL: https://issues.apache.org/jira/browse/HBASE-5186
 Project: HBase
  Issue Type: Improvement
Reporter: Scott Chen
Assignee: Scott Chen
 Attachments: HBASE-5186.D1461.1.patch, HBASE-5186.D1461.2.patch, 
 HBASE-5186.D1461.3.patch


 It will be useful to have some metrics (queue length, waiting time, 
 processing time, ...) similar to the Hadoop RPC server. This allows us to 
 monitor system health and also provides a tool to diagnose problems where 
 thrift calls are slow.





[jira] [Commented] (HBASE-5230) Ensure compactions do not cache-on-write data blocks

2012-01-25 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193566#comment-13193566
 ] 

Hudson commented on HBASE-5230:
---

Integrated in HBase-TRUNK-security #90 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/90/])
HBASE-5230 : ensure that compactions do not cache-on-write data blocks

mbautin : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaMetrics.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
* 
/hbase/trunk/src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java


 Ensure compactions do not cache-on-write data blocks
 

 Key: HBASE-5230
 URL: https://issues.apache.org/jira/browse/HBASE-5230
 Project: HBase
  Issue Type: Improvement
Reporter: Mikhail Bautin
Assignee: Mikhail Bautin
Priority: Minor
 Attachments: D1353.1.patch, D1353.2.patch, D1353.3.patch, 
 D1353.4.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-21_00_53_54.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-23_10_23_45.patch, 
 Don-t-cache-data-blocks-on-compaction-2012-01-23_15_27_23.patch


 Create a unit test for HBASE-3976 (making sure we don't cache data blocks on 
 write during compactions even if cache-on-write is generally enabled). This 
 is because we have very different implementations of HBASE-3976 without 
 HBASE-4422 CacheConfig (on top of 89-fb, created by Liyin) and with 
 CacheConfig (presumably it's there, but not sure if it even works, since the 
 patch in HBASE-3976 may not have been committed). We need to create a unit 
 test to verify that we don't cache data blocks on write during compactions, 
 and resolve HBASE-3976 so that this new unit test does not fail.





[jira] [Commented] (HBASE-4218) Data Block Encoding of KeyValues (aka delta encoding / prefix compression)

2012-01-25 Thread Hudson (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13193564#comment-13193564
 ] 

Hudson commented on HBASE-4218:
---

Integrated in HBase-TRUNK-security #90 (See 
[https://builds.apache.org/job/HBase-TRUNK-security/90/])
[jira] [HBASE-4218] HFile data block encoding framework and delta encoding
implementation (Jacek Migdal, Mikhail Bautin)

Summary:

Adding a framework that allows encoding keys in an HFile data block. We
support two modes of encoding: (1) both on disk and in cache, and (2) in cache
only. This is distinct from the compression that is already done in HBase,
e.g. GZ or LZO. When data block encoding is enabled, we store blocks in the
cache in an uncompressed but encoded form. This allows fitting more blocks in
the cache and reduces the number of disk reads.

The most common example of data block encoding is delta encoding, where we take
advantage of the fact that HFile keys are sorted and share a lot of common
prefixes, and only store the delta between each pair of consecutive keys.
Initial encoding algorithms implemented are DIFF, FAST_DIFF, and PREFIX.

This is based on the delta encoding patch developed by Jacek Migdal during his
2011 summer internship at Facebook. The original patch is available here:
https://reviews.apache.org/r/2308/diff/.

Test Plan: Unit tests. Distributed load test on a five-node cluster.

Reviewers: JIRA, tedyu, stack, nspiegelberg, Kannan

Reviewed By: Kannan

CC: tedyu, todd, mbautin, stack, Kannan, mcorgan, gqchen

Differential Revision: https://reviews.facebook.net/D447

mbautin : 
Files : 
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HConstants.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/BufferedDataBlockEncoder.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/CompressionState.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/CopyKeyDataBlockEncoder.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoder.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/DataBlockEncoding.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/DiffKeyDeltaEncoder.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/EncodedDataBlock.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/EncoderBufferTooSmallException.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/FastDiffDeltaEncoder.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/encoding/PrefixKeyDeltaEncoder.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockCacheKey.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoder.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileDataBlockEncoderImpl.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/LruBlockCache.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/NoOpDataBlockEncoder.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/CompactSplitThread.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/metrics/SchemaConfigured.java
* 

[jira] [Updated] (HBASE-5266) Add documentation for ColumnRangeFilter

2012-01-25 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5266?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-5266:
-

Attachment: 5266-v2.txt

How's this?

 Add documentation for ColumnRangeFilter
 ---

 Key: HBASE-5266
 URL: https://issues.apache.org/jira/browse/HBASE-5266
 Project: HBase
  Issue Type: Sub-task
  Components: documentation
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.94.0

 Attachments: 5266-v2.txt, 5266.txt


 There are only a few lines of documentation for ColumnRangeFilter.
 Given the usefulness of this filter for efficient intra-row scanning (see 
 HBASE-5229 and HBASE-4256), we should make this filter more prominent in the 
 documentation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5186) Add metrics to ThriftServer

2012-01-25 Thread Phabricator (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5186?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Phabricator updated HBASE-5186:
---

Attachment: HBASE-5186.D1461.4.patch

sc updated the revision HBASE-5186 [jira] Add metrics to ThriftServer.
Reviewers: dhruba, tedyu, JIRA, heyongqiang

  Remove some dead variables

REVISION DETAIL
  https://reviews.facebook.net/D1461

AFFECTED FILES
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegionThriftServer.java
  src/main/java/org/apache/hadoop/hbase/thrift/CallQueue.java
  src/main/java/org/apache/hadoop/hbase/thrift/HbaseHandlerMetricsProxy.java
  src/main/java/org/apache/hadoop/hbase/thrift/TBoundedThreadPoolServer.java
  src/main/java/org/apache/hadoop/hbase/thrift/ThriftMetrics.java
  src/main/java/org/apache/hadoop/hbase/thrift/ThriftServerRunner.java
  src/test/java/org/apache/hadoop/hbase/thrift/TestCallQueue.java
  src/test/java/org/apache/hadoop/hbase/thrift/TestThriftServer.java


 Add metrics to ThriftServer
 ---

 Key: HBASE-5186
 URL: https://issues.apache.org/jira/browse/HBASE-5186
 Project: HBase
  Issue Type: Improvement
Reporter: Scott Chen
Assignee: Scott Chen
 Attachments: HBASE-5186.D1461.1.patch, HBASE-5186.D1461.2.patch, 
 HBASE-5186.D1461.3.patch, HBASE-5186.D1461.4.patch


 It will be useful to have some metrics (queue length, waiting time, 
 processing time ...) similar to the Hadoop RPC server. This allows us to monitor 
 system health and also provides a tool to diagnose problems where thrift calls 
 are slow.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5179) Concurrent processing of processFaileOver and ServerShutdownHandler may cause region to be assigned before log splitting is completed, causing data loss

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5179:
--

Fix Version/s: (was: 0.90.6)
   0.90.7

Updating the fix version to 0.90.7.  This issue will go into 0.90.7 after 
sufficient testing.

 Concurrent processing of processFaileOver and ServerShutdownHandler may cause 
 region to be assigned before log splitting is completed, causing data loss
 

 Key: HBASE-5179
 URL: https://issues.apache.org/jira/browse/HBASE-5179
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.2
Reporter: chunhui shen
Assignee: chunhui shen
Priority: Critical
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5179-90.txt, 5179-90v10.patch, 5179-90v11.patch, 
 5179-90v12.patch, 5179-90v13.txt, 5179-90v14.patch, 5179-90v15.patch, 
 5179-90v16.patch, 5179-90v17.txt, 5179-90v18.txt, 5179-90v2.patch, 
 5179-90v3.patch, 5179-90v4.patch, 5179-90v5.patch, 5179-90v6.patch, 
 5179-90v7.patch, 5179-90v8.patch, 5179-90v9.patch, 5179-92v17.patch, 
 5179-v11-92.txt, 5179-v11.txt, 5179-v2.txt, 5179-v3.txt, 5179-v4.txt, 
 Errorlog, hbase-5179.patch, hbase-5179v10.patch, hbase-5179v12.patch, 
 hbase-5179v17.patch, hbase-5179v5.patch, hbase-5179v6.patch, 
 hbase-5179v7.patch, hbase-5179v8.patch, hbase-5179v9.patch


 If the master's failover processing and ServerShutdownHandler's processing 
 happen concurrently, the following case may occur:
 1. The master completes splitLogAfterStartup().
 2. RegionserverA restarts, and ServerShutdownHandler starts processing it.
 3. The master starts rebuildUserRegions, and RegionserverA is considered a 
 dead server.
 4. The master starts to assign the regions of RegionserverA because it is a 
 dead server by step 3.
 However, while step 4 (assigning regions) runs, ServerShutdownHandler may still 
 be splitting the log. Therefore, it may cause data loss.
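The violated invariant can be sketched as a simple guard. This is a hypothetical illustration of the fix direction only; the class and method names are made up and are not the actual HBase master internals.

```java
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch: a dead server's regions must not be assigned until its
// WAL has been split, otherwise edits still sitting in the log are lost.
class SplitBeforeAssignSketch {
    private final Set<String> logsBeingSplit = ConcurrentHashMap.newKeySet();

    void startLogSplit(String deadServer)  { logsBeingSplit.add(deadServer); }
    void finishLogSplit(String deadServer) { logsBeingSplit.remove(deadServer); }

    // Step 4 above is only safe once the split started in step 2 has finished.
    boolean safeToAssign(String deadServer) {
        return !logsBeingSplit.contains(deadServer);
    }
}
```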

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5271) Result.getValue and Result.getColumnLatest return the wrong column.

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5271?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5271:
--

Fix Version/s: (was: 0.90.6)
   0.90.7

Updated fix version to 0.90.7

 Result.getValue and Result.getColumnLatest return the wrong column.
 ---

 Key: HBASE-5271
 URL: https://issues.apache.org/jira/browse/HBASE-5271
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.5
Reporter: Ghais Issa
Assignee: Ghais Issa
 Fix For: 0.94.0, 0.90.7, 0.92.1

 Attachments: 5271-90.txt, 5271-v2.txt, 
 fixKeyValueMatchingColumn.diff, testGetValue.diff


 In the following example result.getValue returns the wrong column:
 KeyValue kv = new KeyValue(Bytes.toBytes("r"), Bytes.toBytes("24"), 
 Bytes.toBytes("2"), Bytes.toBytes(7L));
 Result result = new Result(new KeyValue[] { kv });
 System.out.println(Bytes.toLong(result.getValue(Bytes.toBytes("2"), 
 Bytes.toBytes("2")))); // prints 7.
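A safe column comparison treats the family and the qualifier as separately delimited byte ranges. The sketch below shows that direction with illustrative names; it is not the actual KeyValue.matchingColumn implementation.

```java
import java.util.Arrays;

// Sketch: compare family and qualifier as two independent byte ranges instead
// of one concatenated sequence, so a (family "24", qualifier "2") cell can
// never satisfy a lookup for (family "2", qualifier "2").
class ColumnMatchSketch {
    static boolean matchingColumn(byte[] kvFamily, byte[] kvQualifier,
                                  byte[] family, byte[] qualifier) {
        return Arrays.equals(kvFamily, family) && Arrays.equals(kvQualifier, qualifier);
    }
}
```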

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-4893) HConnectionImplementation is closed but not deleted

2012-01-25 Thread ramkrishna.s.vasudevan (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4893?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan resolved HBASE-4893.
---

Resolution: Fixed
  Assignee: Mubarak Seyed

Resolving the issue

 HConnectionImplementation is closed but not deleted
 ---

 Key: HBASE-4893
 URL: https://issues.apache.org/jira/browse/HBASE-4893
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.1
 Environment: Linux 2.6, HBase-0.90.1
Reporter: Mubarak Seyed
Assignee: Mubarak Seyed
  Labels: noob
 Fix For: 0.90.6

 Attachments: HBASE-4893.v1.patch, HBASE-4893.v2.patch


 In abort() of HConnectionManager$HConnectionImplementation, instance of 
 HConnectionImplementation is marked as this.closed=true.
 There is no way for client application to check the hbase client connection 
 whether it is still opened/good (this.closed=false) or not. We need a method 
 to validate the state of a connection like isClosed().
 {code}
 public boolean isClosed() {
   return this.closed;
 }
 {code}
 Once the connection is closed, it should get deleted. The client application 
 still gets a connection from HConnectionManager.getConnection(Configuration) 
 and tries to make an RPC call to the RS; since the connection is already closed, 
 HConnectionImplementation.getRegionServerWithRetries throws 
 RetriesExhaustedException with the error message
 {code}
 Caused by: org.apache.hadoop.hbase.client.RetriesExhaustedException: Trying 
 to contact region server null for region , row 
 '----xxx', but failed after 10 attempts.
 Exceptions:
 java.io.IOException: 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7
  closed
 java.io.IOException: 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7
  closed
 java.io.IOException: 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7
  closed
 java.io.IOException: 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7
  closed
 java.io.IOException: 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7
  closed
 java.io.IOException: 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7
  closed
 java.io.IOException: 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7
  closed
 java.io.IOException: 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7
  closed
 java.io.IOException: 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7
  closed
 java.io.IOException: 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation@7eab48a7
  closed
   at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1008)
   at org.apache.hadoop.hbase.client.HTable.get(HTable.java:546)
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Resolved] (HBASE-5235) HLogSplitter writer thread's streams not getting closed when any of the writer threads has exceptions.

2012-01-25 Thread ramkrishna.s.vasudevan (Resolved) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan resolved HBASE-5235.
---

Resolution: Fixed

Committed to 0.92, trunk and 0.90

 HLogSplitter writer thread's streams not getting closed when any of the 
 writer threads has exceptions.
 --

 Key: HBASE-5235
 URL: https://issues.apache.org/jira/browse/HBASE-5235
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5, 0.92.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.6, 0.92.1

 Attachments: HBASE-5235_0.90.patch, HBASE-5235_0.90_1.patch, 
 HBASE-5235_0.90_2.patch, HBASE-5235_trunk.patch


 Please find the analysis below. Correct me if I am wrong.
 {code}
 2012-01-15 05:14:02,374 FATAL 
 org.apache.hadoop.hbase.regionserver.wal.HLogSplitter: WriterThread-9 Got 
 while writing log entry to log
 java.io.IOException: All datanodes 10.18.40.200:50010 are bad. Aborting...
   at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.processDatanodeError(DFSClient.java:3373)
   at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream.access$2000(DFSClient.java:2811)
   at 
 org.apache.hadoop.hdfs.DFSClient$DFSOutputStream$DataStreamer.run(DFSClient.java:3026)
 {code}
 Here we have an exception in one of the writer threads. If any exception 
 occurs, we try to hold it in an atomic variable 
 {code}
 private void writerThreadError(Throwable t) {
   thrown.compareAndSet(null, t);
 }
 {code}
 In the finally block of splitLog we try to close the streams.
 {code}
 for (WriterThread t : writerThreads) {
   try {
     t.join();
   } catch (InterruptedException ie) {
     throw new IOException(ie);
   }
   checkForErrors();
 }
 LOG.info("Split writers finished");

 return closeStreams();
 {code}
 Inside checkForErrors:
 {code}
 private void checkForErrors() throws IOException {
   Throwable thrown = this.thrown.get();
   if (thrown == null) return;
   if (thrown instanceof IOException) {
     throw (IOException) thrown;
   } else {
     throw new RuntimeException(thrown);
   }
 }
 {code}
 So once we throw the exception, the DFSStreamer threads are not getting closed.
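One fix direction, sketched with illustrative names (not the committed patch): close every stream from a finally path so an exception surfaced by checkForErrors() cannot skip the close calls, and rethrow the first close failure afterwards.

```java
import java.io.Closeable;
import java.io.IOException;
import java.util.List;

// Sketch: close all writer streams even if some close() calls fail, then
// rethrow the first failure. Running this from a finally block guarantees the
// DFS output streams are released even when a writer thread had an error.
class CloseStreamsSketch {
    static void closeAll(List<? extends Closeable> streams) throws IOException {
        IOException first = null;
        for (Closeable c : streams) {
            try {
                c.close();
            } catch (IOException e) {
                if (first == null) first = e; // remember, but keep closing the rest
            }
        }
        if (first != null) throw first;
    }
}
```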

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5243) LogSyncerThread not getting shutdown waiting for the interrupted flag

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5243:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to 0.92, trunk and 0.90

 LogSyncerThread not getting shutdown waiting for the interrupted flag
 -

 Key: HBASE-5243
 URL: https://issues.apache.org/jira/browse/HBASE-5243
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.6, 0.92.1

 Attachments: 5243-92.addendum, HBASE-5243_0.90.patch, 
 HBASE-5243_0.90_1.patch, HBASE-5243_trunk.patch


 In the LogSyncer run() we keep looping till the this.isInterrupted flag is set.
 But in some cases the DFSClient is consuming the InterruptedException, so we
 run into an infinite loop in some shutdown cases.
 Since we are the ones who try to close down the LogSyncerThread, I would
 suggest introducing a close or shutdown flag; based on the state of this flag,
 along with isInterrupted(), we can make the thread stop.
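The suggested flag can be sketched as follows (illustrative names, not the actual HLog.LogSyncer code): a volatile shutdown flag paired with the interrupt, so the loop terminates even if a lower layer swallows the InterruptedException.

```java
// Sketch: the close request sets a volatile flag AND interrupts the thread.
// Even if DFSClient (or any callee) consumes the InterruptedException, the
// 'closing' flag still terminates the loop on the next iteration.
class SyncerLoopSketch extends Thread {
    private volatile boolean closing = false;

    void requestClose() {
        closing = true;
        interrupt();
    }

    @Override
    public void run() {
        while (!closing && !isInterrupted()) {
            try {
                Thread.sleep(10); // stand-in for the periodic sync work
            } catch (InterruptedException e) {
                // Swallowed on purpose to mimic the problem; 'closing'
                // still stops the loop.
            }
        }
    }
}
```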

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5237) Addendum for HBASE-5160 and HBASE-4397

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5237?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5237:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Committed to 0.90, trunk and 0.92

 Addendum for HBASE-5160 and HBASE-4397
 --

 Key: HBASE-5237
 URL: https://issues.apache.org/jira/browse/HBASE-5237
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.90.6, 0.92.1

 Attachments: HBASE-5237_0.90.patch, HBASE-5237_trunk.patch


 As part of HBASE-4397 there is one more scenario where the patch has to be 
 applied.
 {code}
 RegionPlan plan = getRegionPlan(state, forceNewPlan);
   if (plan == null) {
 debugLog(state.getRegion(),
     "Unable to determine a plan to assign " + state);
 return; // Should get reassigned later when RIT times out.
   }
 {code}
 I think in this scenario the following should also be done:
 {code}
 this.timeoutMonitor.setAllRegionServersOffline(true);
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4951) master process can not be stopped when it is initializing

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4951?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4951:
--

Fix Version/s: (was: 0.90.6)
   0.90.7

Updated the fix version to 0.90.7.

 master process can not be stopped when it is initializing
 -

 Key: HBASE-4951
 URL: https://issues.apache.org/jira/browse/HBASE-4951
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.3
Reporter: xufeng
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.90.7

 Attachments: HBASE-4951.patch, HBASE-4951_branch.patch


 It is easy to reproduce with the following steps:
 step 1: Start the master process (do not start a regionserver process in the 
 cluster). The master will wait for regionservers to check in:
 org.apache.hadoop.hbase.master.ServerManager: Waiting on regionserver(s) to 
 checkin
 step 2: Stop the master with the shell command bin/hbase master stop.
 result: the master process will never die because the catalogTracker.waitForRoot() 
 method blocks until the root region is assigned.
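A shutdown-aware wait illustrates the fix direction; the names below are hypothetical, not the actual CatalogTracker API.

```java
import java.util.function.BooleanSupplier;

// Sketch: instead of blocking indefinitely on the root region assignment, the
// startup wait polls and re-checks a stop flag, so "bin/hbase master stop"
// can take effect while the master is still initializing.
class StoppableWaitSketch {
    interface Stoppable { boolean isStopped(); }

    /** Returns true if the condition became true, false if stopped first. */
    static boolean waitFor(BooleanSupplier condition, Stoppable stopper,
                           long pollMillis) throws InterruptedException {
        while (!condition.getAsBoolean()) {
            if (stopper.isStopped()) return false;
            Thread.sleep(pollMillis);
        }
        return true;
    }
}
```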

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-3855) Performance degradation of memstore because reseek is linear

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3855?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-3855:
--

Fix Version/s: (was: 0.90.6)
   0.90.7

Not fixed in 0.90.6.  Hence moving it to 0.90.7.

 Performance degradation of memstore because reseek is linear
 

 Key: HBASE-3855
 URL: https://issues.apache.org/jira/browse/HBASE-3855
 Project: HBase
  Issue Type: Improvement
Reporter: dhruba borthakur
Priority: Blocker
 Fix For: 0.90.7

 Attachments: memstoreReseek.txt, memstoreReseek2.txt


 The scanner uses reseek to find the next row (or next column) as part of a 
 scan. The reseek code iterates over a Set to position itself at the right 
 place. If there are many thousands of kvs that need to be skipped over, the 
 time cost is very high. In this case, a seek would be far less costly than a 
 reseek.
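The cost difference can be illustrated with a skip list, which is what the memstore uses internally; the code below is a sketch with made-up names, not the memstore scanner implementation.

```java
import java.util.Iterator;
import java.util.NavigableSet;

// Sketch: a "reseek" that walks the existing iterator forward costs O(k) in
// the number of skipped entries, while a fresh "seek" via tailSet() is an
// O(log n) skip-list lookup, independent of how many entries are skipped.
class ReseekSketch {
    /** Linear reseek: advance until we reach target; returns entries skipped. */
    static int linearReseek(Iterator<String> it, String target) {
        int skipped = 0;
        while (it.hasNext()) {
            if (it.next().compareTo(target) >= 0) break;
            skipped++;
        }
        return skipped;
    }

    /** Seek: jump straight to the first entry >= target. */
    static String seek(NavigableSet<String> memstore, String target) {
        NavigableSet<String> tail = memstore.tailSet(target, true);
        return tail.isEmpty() ? null : tail.first();
    }
}
```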

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-5003) If the master is started with a wrong root dir, it gets stuck and can't be killed

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-5003?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-5003:
--

Fix Version/s: (was: 0.90.6)
   0.90.7

Updated fix version to 0.90.7

 If the master is started with a wrong root dir, it gets stuck and can't be 
 killed
 -

 Key: HBASE-5003
 URL: https://issues.apache.org/jira/browse/HBASE-5003
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Priority: Critical
  Labels: noob
 Fix For: 0.94.0, 0.90.7, 0.92.1


 Reported by a new user on IRC who tried to set hbase.rootdir to 
 file:///~/hbase; the master gets stuck and cannot be killed. I tried 
 something similar on my machine and it spins while logging:
 {quote}
 2011-12-09 16:11:17,002 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to 
 create version file at file:/bin/hbase, retrying: Mkdirs failed to create 
 file:/bin/hbase
 2011-12-09 16:11:27,002 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to 
 create version file at file:/bin/hbase, retrying: Mkdirs failed to create 
 file:/bin/hbase
 2011-12-09 16:11:37,003 WARN org.apache.hadoop.hbase.util.FSUtils: Unable to 
 create version file at file:/bin/hbase, retrying: Mkdirs failed to create 
 file:/bin/hbase
 {quote}
 The reason it cannot be stopped is that the master's main thread is stuck in 
 there and will never be notified:
 {quote}
 Master:0;su-jdcryans-01.local,51116,1323475535684 prio=5 tid=7f92b7a3c000 
 nid=0x1137ba000 waiting on condition [1137b9000]
java.lang.Thread.State: TIMED_WAITING (sleeping)
   at java.lang.Thread.sleep(Native Method)
   at org.apache.hadoop.hbase.util.FSUtils.setVersion(FSUtils.java:297)
   at org.apache.hadoop.hbase.util.FSUtils.setVersion(FSUtils.java:268)
   at 
 org.apache.hadoop.hbase.master.MasterFileSystem.checkRootDir(MasterFileSystem.java:339)
   at 
 org.apache.hadoop.hbase.master.MasterFileSystem.createInitialFileSystemLayout(MasterFileSystem.java:128)
   at 
 org.apache.hadoop.hbase.master.MasterFileSystem.init(MasterFileSystem.java:113)
   at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:435)
   at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:314)
   at 
 org.apache.hadoop.hbase.master.HMasterCommandLine$LocalHMaster.run(HMasterCommandLine.java:218)
   at java.lang.Thread.run(Thread.java:680)
 {quote}
 It seems we should handle the exceptions we get in there better, and die if 
 we need to. It would make for a better user experience.
 Maybe also check hbase.rootdir before even starting the master.
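The "die if we need to" suggestion amounts to bounding the retries; a hypothetical sketch (not the actual FSUtils.setVersion code):

```java
// Sketch: bound the retry loop and abort startup with a clear error instead
// of retrying Mkdirs forever. The Attempt interface is illustrative.
class BoundedRetrySketch {
    interface Attempt { boolean tryOnce(); }

    /** Returns true on success; throws after maxRetries failed attempts. */
    static boolean retryOrDie(Attempt attempt, int maxRetries) {
        for (int i = 0; i < maxRetries; i++) {
            if (attempt.tryOnce()) return true;
        }
        throw new IllegalStateException(
            "Unable to create version file after " + maxRetries
            + " attempts; aborting master startup");
    }
}
```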

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-3834) Store ignores checksum errors when opening files

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-3834:
--


Moving to 0.90.7

 Store ignores checksum errors when opening files
 

 Key: HBASE-3834
 URL: https://issues.apache.org/jira/browse/HBASE-3834
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.2
Reporter: Todd Lipcon
Priority: Critical
 Fix For: 0.90.6


 If you corrupt one of the storefiles in a region (eg using vim to muck up 
 some bytes), the region will still open, but that storefile will just be 
 ignored with a log message. We should probably not do this in general - 
 better to keep that region unassigned and force an admin to make a decision 
 to remove the bad storefile.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4470) ServerNotRunningException coming out of assignRootAndMeta kills the Master

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4470?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4470:
--

Fix Version/s: (was: 0.90.6)
   0.90.7

Moving into 0.90.7

 ServerNotRunningException coming out of assignRootAndMeta kills the Master
 --

 Key: HBASE-4470
 URL: https://issues.apache.org/jira/browse/HBASE-4470
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
Priority: Critical
 Fix For: 0.90.7


 I'm surprised we still have issues like that and I didn't get a hit while 
 googling so forgive me if there's already a jira about it.
 When the master starts it verifies the locations of root and meta before 
 assigning them, if the server is started but not running you'll get this:
 {quote}
 2011-09-23 04:47:44,859 WARN 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation: 
 RemoteException connecting to RS
 org.apache.hadoop.ipc.RemoteException: 
 org.apache.hadoop.hbase.ipc.ServerNotRunningException: Server is not running 
 yet
 at 
 org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1038)
 at org.apache.hadoop.hbase.ipc.HBaseClient.call(HBaseClient.java:771)
 at 
 org.apache.hadoop.hbase.ipc.HBaseRPC$Invoker.invoke(HBaseRPC.java:257)
 at $Proxy6.getProtocolVersion(Unknown Source)
 at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:419)
 at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:393)
 at org.apache.hadoop.hbase.ipc.HBaseRPC.getProxy(HBaseRPC.java:444)
 at 
 org.apache.hadoop.hbase.ipc.HBaseRPC.waitForProxy(HBaseRPC.java:349)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getHRegionConnection(HConnectionManager.java:969)
 at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.getCachedConnection(CatalogTracker.java:388)
 at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.getMetaServerConnection(CatalogTracker.java:287)
 at 
 org.apache.hadoop.hbase.catalog.CatalogTracker.verifyMetaRegionLocation(CatalogTracker.java:484)
 at 
 org.apache.hadoop.hbase.master.HMaster.assignRootAndMeta(HMaster.java:441)
 at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:388)
 at org.apache.hadoop.hbase.master.HMaster.run(HMaster.java:282)
 {quote}
 I hit that 3-4 times this week while debugging something else. The worst is 
 that when you restart the master it sees that as a failover, but none of the 
 regions are assigned so it takes an eternity to get back fully online.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4298) Support to drain RS nodes through ZK

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4298:
--

Fix Version/s: (was: 0.90.6)
   0.90.7

Moving into 0.90.7

 Support to drain RS nodes through ZK
 

 Key: HBASE-4298
 URL: https://issues.apache.org/jira/browse/HBASE-4298
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.90.4
 Environment: all
Reporter: Aravind Gottipati
Priority: Critical
  Labels: patch
 Fix For: 0.90.7

 Attachments: 4298-trunk-v2.txt, 4298-trunk-v3.txt, 90_hbase.patch, 
 drainingservertest-v2.txt, drainingservertest.txt, trunk_hbase.patch, 
 trunk_with_test.txt


 HDFS currently has a way to exclude certain datanodes and prevent them from 
 getting new blocks.  HDFS goes one step further and even drains these nodes 
 for you.  This enhancement is a step in that direction.
 The idea is that we mark nodes in zookeeper as draining nodes.  This means 
 that they don't get any more new regions.  These draining nodes look exactly 
 the same as the corresponding nodes in /rs, except they live under /draining.
 Eventually, support for draining them can be added.  I am submitting two 
 patches for review - one for the 0.90 branch and one for trunk (in git).
 Here are the two patches
 0.90 - 
 https://github.com/aravind/hbase/commit/181041e72e7ffe6a4da6d82b431ef7f8c99e62d2
 trunk - 
 https://github.com/aravind/hbase/commit/e127b25ae3b4034103b185d8380f3b7267bc67d5
 I have tested both these patches and they work as advertised.
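The effect of a /draining znode on assignment can be sketched as a filter over the live-server list; the names below are hypothetical, not the actual patch.

```java
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

// Sketch: servers listed under the /draining znode are excluded from the
// candidates for new region assignments, while they keep serving the regions
// they already host.
class DrainingFilterSketch {
    static List<String> assignableServers(List<String> liveServers,
                                          Set<String> draining) {
        return liveServers.stream()
                .filter(s -> !draining.contains(s))
                .collect(Collectors.toList());
    }
}
```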

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4288) Server not running exception during meta verification causes RS abort

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4288:
--

Fix Version/s: (was: 0.90.6)
   0.90.7

Moving into 0.90.7

 Server not running exception during meta verification causes RS abort
 ---

 Key: HBASE-4288
 URL: https://issues.apache.org/jira/browse/HBASE-4288
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: Todd Lipcon
Priority: Critical
 Fix For: 0.90.7

 Attachments: 4288-v2.txt, 4288.txt


 The master tried to verify the META location just as that server was shutting 
 down due to an abort. This caused the Server not running exception to get 
 thrown, which wasn't handled properly in the master, causing it to abort.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Updated] (HBASE-4298) Support to drain RS nodes through ZK

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4298?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4298:
--

Fix Version/s: 0.92.0

@Stack
This issue has gone into 0.92 and trunk. Since it is a new feature, do you want 
it to go into future 0.90 releases? If not, we can remove 0.90 from the fix versions.

 Support to drain RS nodes through ZK
 

 Key: HBASE-4298
 URL: https://issues.apache.org/jira/browse/HBASE-4298
 Project: HBase
  Issue Type: Improvement
  Components: master
Affects Versions: 0.90.4
 Environment: all
Reporter: Aravind Gottipati
Priority: Critical
  Labels: patch
 Fix For: 0.90.7, 0.92.0

 Attachments: 4298-trunk-v2.txt, 4298-trunk-v3.txt, 90_hbase.patch, 
 drainingservertest-v2.txt, drainingservertest.txt, trunk_hbase.patch, 
 trunk_with_test.txt


 HDFS currently has a way to exclude certain datanodes and prevent them from 
 getting new blocks.  HDFS goes one step further and even drains these nodes 
 for you.  This enhancement is a step in that direction.
 The idea is that we mark nodes in zookeeper as draining nodes.  This means 
 that they don't get any more new regions.  These draining nodes look exactly 
 the same as the corresponding nodes in /rs, except they live under /draining.
 Eventually, support for draining them can be added.  I am submitting two 
 patches for review - one for the 0.90 branch and one for trunk (in git).
 Here are the two patches:
 0.90 - 
 https://github.com/aravind/hbase/commit/181041e72e7ffe6a4da6d82b431ef7f8c99e62d2
 trunk - 
 https://github.com/aravind/hbase/commit/e127b25ae3b4034103b185d8380f3b7267bc67d5
 I have tested both these patches and they work as advertised.
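The /draining mechanism described above can be sketched as a toy model (plain Python, not HBase's actual Java code; the function name `choose_assignment_targets` is hypothetical, invented here for illustration):

```python
# Toy model of the /draining znode idea: servers listed under /draining
# mirror the entries under /rs but are excluded from new region assignment.

def choose_assignment_targets(rs_nodes, draining_nodes):
    """Return servers eligible for new regions: online but not draining."""
    draining = set(draining_nodes)
    return [s for s in rs_nodes if s not in draining]

# /rs lists all live regionservers; /draining lists the ones being drained.
rs = ["host1,60020,1", "host2,60020,1", "host3,60020,1"]
draining = ["host2,60020,1"]

eligible = choose_assignment_targets(rs, draining)
# host2 keeps its existing regions but receives no new ones.
```

In the real patch the two lists come from ZooKeeper children of /rs and /draining; the filtering step is the essence of the feature.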





[jira] [Updated] (HBASE-4288) Server not running exception during meta verification causes RS abort

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4288:
--

Fix Version/s: 0.92.0

Already committed to 0.92 and trunk, but not to 0.90. Hence not resolving, 
just updating the fix version.

 Server not running exception during meta verification causes RS abort
 ---

 Key: HBASE-4288
 URL: https://issues.apache.org/jira/browse/HBASE-4288
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.4
Reporter: Todd Lipcon
Priority: Critical
 Fix For: 0.90.7, 0.92.0

 Attachments: 4288-v2.txt, 4288.txt


 The master tried to verify the META location just as that server was shutting 
 down due to an abort. This caused the Server not running exception to get 
 thrown, which wasn't handled properly in the master, causing it to abort.





[jira] [Updated] (HBASE-4550) When the master passes the regionserver a different address, the regionserver doesn't create a new zookeeper znode, and as a result stop-hbase.sh hangs

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4550?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4550:
--

Fix Version/s: (was: 0.90.6)
   0.90.7

@Wanbin
If you provide a new patch, it can be integrated into 0.90.7.
Moving to 0.90.7.

 When the master passes the regionserver a different address, the regionserver 
 doesn't create a new zookeeper znode, and as a result stop-hbase.sh hangs
 ---

 Key: HBASE-4550
 URL: https://issues.apache.org/jira/browse/HBASE-4550
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.3
Reporter: wanbin
Assignee: wanbin
 Fix For: 0.90.7

 Attachments: hbase-0.90.3.patch, patch, patch.txt

   Original Estimate: 2h
  Remaining Estimate: 2h

 When the master passes the regionserver a different address, the regionserver 
 does not create a new zookeeper znode; the master stores the new address in 
 ServerManager. When stop-hbase.sh is called, RegionServerTracker.nodeDeleted 
 receives the path for the old address, so serverManager.expireServer is never 
 called and stop-hbase.sh hangs.
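The mismatch can be sketched as a toy model (plain Python, not HBase's actual Java code; `handle_node_deleted` is a hypothetical stand-in for RegionServerTracker.nodeDeleted):

```python
# Toy model of the HBASE-4550 hang: the master tracks the server under a
# rewritten address, but the regionserver's znode still carries the old
# one, so the delete notification never matches a tracked server.

def handle_node_deleted(online_servers, deleted_znode_path):
    """Expire the server only when the znode path matches a tracked name."""
    if deleted_znode_path in online_servers:
        online_servers.remove(deleted_znode_path)   # expireServer equivalent
    return online_servers

# Master recorded the server under the NEW address...
online = {"host1,60020,NEW"}
# ...but the znode was created with (and deleted under) the OLD address.
online = handle_node_deleted(online, "host1,60020,OLD")
# The server is still "online", so shutdown waits on it forever.
```

The fix direction implied by the report is to make the regionserver register its znode under the address the master actually hands back, so the two names agree.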





[jira] [Updated] (HBASE-3134) [replication] Add the ability to enable/disable streams

2012-01-25 Thread Teruyoshi Zenmyo (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-3134?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Teruyoshi Zenmyo updated HBASE-3134:


Attachment: HBASE-3134.patch

The patch introduces a znode which indicates that replication to a peer is 
disabled. ReplicationSource skips sending entries if the znode exists.
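That check can be sketched as a toy model (plain Python, not the patch's Java code; `entries_to_ship` and `peer_disabled` are hypothetical names, with the boolean standing in for a ZooKeeper exists() check on the disable znode):

```python
# Toy model of the patch's idea: ReplicationSource consults a per-peer
# "disabled" znode before shipping WAL entries to that peer.

def entries_to_ship(entries, peer_disabled):
    """Ship nothing while the peer's disable znode exists."""
    return [] if peer_disabled else list(entries)

# Stream disabled: entries are skipped, not shipped.
assert entries_to_ship(["edit1", "edit2"], peer_disabled=True) == []
```

Deleting the znode re-enables the stream on the next pass, which is what makes the enable/disable toggle deterministic.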

 [replication] Add the ability to enable/disable streams
 ---

 Key: HBASE-3134
 URL: https://issues.apache.org/jira/browse/HBASE-3134
 Project: HBase
  Issue Type: New Feature
  Components: replication
Reporter: Jean-Daniel Cryans
Priority: Minor
 Attachments: HBASE-3134.patch


 This jira was initially in the scope of HBASE-2201, but was pushed out since 
 it has low value compared to the required effort (and we wanted to ship 
 0.90.0 rather soonish).
 We need to design a way to enable/disable replication streams in a 
 determinate fashion.





[jira] [Updated] (HBASE-4848) TestScanner failing because hostname can't be null

2012-01-25 Thread ramkrishna.s.vasudevan (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4848?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4848:
--

Fix Version/s: (was: 0.90.6)
   0.90.5
   0.92.0

Already committed; hence closing the issue as resolved.

 TestScanner failing because hostname can't be null
 --

 Key: HBASE-4848
 URL: https://issues.apache.org/jira/browse/HBASE-4848
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.90.5
Reporter: stack
Assignee: stack
 Fix For: 0.90.5, 0.92.0

 Attachments: 4848-092.txt, 4848.txt



