[jira] [Commented] (HBASE-6992) Coprocessors semantic issues: post async operations, helper methods, ...

2012-10-14 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475788#comment-13475788
 ] 

ramkrishna.s.vasudevan commented on HBASE-6992:
---

[~mbertozzi]
As you noticed, the reason for introducing the pre/postOperationHandler hooks as 
part of HBASE-5584 was first of all to have control over the async operations.
In our use case the create/delete coprocessors were used to create/delete 
another table with a modified HTD.
Generally for any async operation we need to wait for it to complete 
successfully, and when we have coprocessors we also need to wait for the 
coprocessor hook to be done.
Looking at the existing impl, the reason the pre/post operation hooks for async 
operations do not wait for the operation to complete is to avoid blocking the 
RPC threads that issue the operation.
So we thought of having additional hooks that are in sync with the handlers, so 
that the RPC threads need not wait. 
The same applies to enable/disable table as well.  The new hooks give us the 
advantage of being sure that the postOperationHandler hook will be executed 
only on success of the main operation.
bq. but in case of failure of async operations like deleteTable() we've removed 
rights that we still need.
But wouldn't the above problem exist even without the new hooks?
bq. for example: modifyTable() is just a helper to avoid multiple 
addColumn()/deleteColumn() calls, but the problem here is that modifyTable() 
has its own pre/post operation()
This is again a general problem, right?


 Coprocessors semantic issues: post async operations, helper methods, ...
 

 Key: HBASE-6992
 URL: https://issues.apache.org/jira/browse/HBASE-6992
 Project: HBase
  Issue Type: Brainstorming
  Components: Coprocessors
Affects Versions: 0.92.2, 0.94.2, 0.96.0
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi

 Discussion ticket around coprocessor pre/post semantic.
 For each rpc in HMaster we have a pre/post operation that allows a coprocessor
 to execute some code before and after the operation
 * preOperation()
 * my operation
 * postOperation()
 This is used for example by the AccessController to verify if the user can 
 execute or not the operation.
 Everything is fine, unless the master operation is asynchronous (like 
 create/delete table)
 * preOperation()
 * executor.submit(new OperationHandler())
 * postOperation()
 The pre operation is still fine, since it is executed before the operation and 
 can throw exceptions to the client in case of failures...
 The post operation, instead, is no longer post... it is just post submit. And 
 if someone subscribes to postCreateTable() the notification can arrive before 
 the table is created.
 To solve this problem, HBASE-5584 added pre/post handlers and now the 
 situation looks like this:
 {code}
 client request                                        client response
       |                                                     |
       +------+------- submit op --------+-------------------+---  (HMaster)
            pre op                     post op
                            |
                 (executor) +------+------- handler -------+
                                pre handler            post handler
 {code}
 Now we've got two types of pre/post operation, and the semantically correct 
 ones are preOperation() and postOperationHandler(), since the preOperation() 
 needs to reply to the client (e.g. AccessController NotAllowException) and the 
 postOperationHandler() is really post operation.
 postOperation() is not post... and preOperationHandler() can't communicate 
 with the client.
 The AccessController coprocessor uses the postOperation(), which is fine for 
 sync operations like addColumn(), deleteColumn()... but in case of failure 
 of async operations like deleteTable() we've removed rights that we still 
 need.
 I think that we should get back just to the single pre/post operation but 
 with the right semantic...
 Other than the "when is it executed" problem, we also have functions that can 
 be described in terms of other, simpler functions.
 for example: modifyTable() is just a helper to avoid multiple 
 addColumn()/deleteColumn() calls
 but the problem here is that modifyTable() has its own pre/post operation() 
 and if I've implemented the pre/post addColumn I don't get notified when I 
 call modifyTable(). This is another problem in the access controller 
 coprocessor
 In this case I'm not sure what the best solution can be... but as it stands, 
 adding new helper methods means breaking the coprocessors, because they don't 
 get notified even though something has changed...
 Any idea, thoughts, ...?
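 To make the semantics concrete, here is a minimal sketch of a MasterObserver 
 using these hooks; the method names follow the pre/post and handler hooks 
 discussed above (the handler hooks come from HBASE-5584), but the exact 
 signatures should be checked against the MasterObserver interface of the 
 version in use.
 {code}
// Illustrative only: shows where each hook fires relative to the async create.
// Signatures follow the 0.94-era MasterObserver API; verify against your build.
import java.io.IOException;

import org.apache.hadoop.hbase.HRegionInfo;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.coprocessor.BaseMasterObserver;
import org.apache.hadoop.hbase.coprocessor.MasterCoprocessorEnvironment;
import org.apache.hadoop.hbase.coprocessor.ObserverContext;

public class CreateTableObserver extends BaseMasterObserver {

  @Override
  public void preCreateTable(ObserverContext<MasterCoprocessorEnvironment> ctx,
      HTableDescriptor desc, HRegionInfo[] regions) throws IOException {
    // Runs on the RPC thread before the operation is submitted.
    // Throwing here (e.g. an access-denied exception) is reported to the client.
  }

  @Override
  public void postCreateTable(ObserverContext<MasterCoprocessorEnvironment> ctx,
      HTableDescriptor desc, HRegionInfo[] regions) throws IOException {
    // Runs right after the CreateTableHandler is submitted to the executor;
    // the table may not exist yet ("post submit", not "post create").
  }

  @Override
  public void postCreateTableHandler(ObserverContext<MasterCoprocessorEnvironment> ctx,
      HTableDescriptor desc, HRegionInfo[] regions) throws IOException {
    // Runs in the handler thread after the table has actually been created,
    // so it only fires if the operation succeeded -- but it cannot reply
    // to the client.
  }
}
 {code}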

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

2012-10-14 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475839#comment-13475839
 ] 

Anoop Sam John commented on HBASE-6942:
---

Regarding passing the rowBatchSize in the attributes of the Scan, I am in two 
minds, Ted, as this attribute is related to the delete op and not to the Scan... 
Requesting your opinion also, Lars.
I was thinking of enhancing this to do all kinds of deletes. We have CF delete, 
column delete, version delete, time-based KV delete etc... Ideally it would be 
better if we can support all these kinds of deletes.  What is in my mind is to 
accept a Delete object as a template in this endpoint.  Well, for a Delete we 
need some byte[] as the row key; any dummy (empty byte[]) is okay.  What we need 
is to follow that template Delete and create Delete objects in the endpoint. 
Thoughts?

Yes Ted, in that case we can pass back the number of KVs deleted.
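
To illustrate the template idea, here is a rough sketch (not code from the 
attached patches) of how the endpoint could expand a client-supplied template 
Delete into per-row Delete objects; the fromTemplate() helper is made up for 
illustration and uses the 0.94-era client API:
{code}
// Rough sketch of the "template Delete" idea; not code from the attached patch.
// For each row key returned by the region scan, build a Delete that copies the
// family/qualifier spec of the client-supplied template (whose row key is a dummy).
import java.util.List;
import java.util.Map;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Delete;

public class DeleteTemplates {
  /** Hypothetical helper: expand the template Delete for one scanned row key. */
  public static Delete fromTemplate(byte[] rowKey, Delete template) {
    Delete d = new Delete(rowKey);
    for (Map.Entry<byte[], List<KeyValue>> e : template.getFamilyMap().entrySet()) {
      byte[] family = e.getKey();
      for (KeyValue kv : e.getValue()) {
        if (kv.getType() == KeyValue.Type.DeleteFamily.getCode()) {
          d.deleteFamily(family);                       // whole column family
        } else {
          d.deleteColumns(family, kv.getQualifier());   // specific column
        }
      }
    }
    return d;   // an empty template family map means a whole-row delete
  }
}
{code}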

 Endpoint implementation for bulk delete rows
 

 Key: HBASE-6942
 URL: https://issues.apache.org/jira/browse/HBASE-6942
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors, Performance
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 0.94.3, 0.96.0

 Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, 
 HBASE-6942_V3.patch, HBASE-6942_V4.patch


 We can provide an endpoint implementation for doing a bulk deletion of 
 rows (based on a scan) at the server side. This can reduce the time taken for 
 such an operation, as right now it needs to scan the rows back to the client 
 and issue delete(s) using the row keys.
 Query like  delete from table1 where...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

2012-10-14 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475846#comment-13475846
 ] 

Ted Yu commented on HBASE-6942:
---

I like the suggestion of passing a Delete object to the endpoint.
If the Delete object has an empty byte[] as its row key, we make use of the Scan 
object as you have done. Otherwise, the step of scanning the region can be 
skipped.

 Endpoint implementation for bulk delete rows
 

 Key: HBASE-6942
 URL: https://issues.apache.org/jira/browse/HBASE-6942
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors, Performance
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 0.94.3, 0.96.0

 Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, 
 HBASE-6942_V3.patch, HBASE-6942_V4.patch


 We can provide an endpoint implementation for doing a bulk deletion of 
 rows (based on a scan) at the server side. This can reduce the time taken for 
 such an operation, as right now it needs to scan the rows back to the client 
 and issue delete(s) using the row keys.
 Query like  delete from table1 where...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Comment Edited] (HBASE-6942) Endpoint implementation for bulk delete rows

2012-10-14 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475846#comment-13475846
 ] 

Ted Yu edited comment on HBASE-6942 at 10/14/12 3:15 PM:
-

I like the suggestion of passing a Delete object to the endpoint.

  was (Author: yuzhih...@gmail.com):
I like the suggestion of passing Delete object to the endpoint.
If the Delete object has empty byte[] as row key, we make use of the Scan 
object as you have done. Otherwise, step of scanning the region can be skipped.
  
 Endpoint implementation for bulk delete rows
 

 Key: HBASE-6942
 URL: https://issues.apache.org/jira/browse/HBASE-6942
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors, Performance
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 0.94.3, 0.96.0

 Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, 
 HBASE-6942_V3.patch, HBASE-6942_V4.patch


 We can provide an endpoint implementation for doing a bulk deletion of 
 rows (based on a scan) at the server side. This can reduce the time taken for 
 such an operation, as right now it needs to scan the rows back to the client 
 and issue delete(s) using the row keys.
 Query like  delete from table1 where...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-5251) Some commands return 0 rows when > 0 rows were processed successfully

2012-10-14 Thread Sameer Vaishampayan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475904#comment-13475904
 ] 

Sameer Vaishampayan commented on HBASE-5251:


That was the thinking behind the cleanup: some of the code with the different 
kinds of formatters was not even used. 

 Some commands return 0 rows when > 0 rows were processed successfully
 ---

 Key: HBASE-5251
 URL: https://issues.apache.org/jira/browse/HBASE-5251
 Project: HBase
  Issue Type: Bug
  Components: shell
Affects Versions: 0.90.5
Reporter: David S. Wang
Assignee: Sameer Vaishampayan
Priority: Minor
  Labels: noob
 Attachments: patch7.diff, patch8.diff, patch9.diff


 From the hbase shell, I see this:
 hbase(main):049:0> scan 't1'
 ROW                   COLUMN+CELL
  r1                   column=f1:c1, timestamp=1327104295560, value=value
  r1                   column=f1:c2, timestamp=1327104330625, value=value
 1 row(s) in 0.0300 seconds
 hbase(main):050:0> deleteall 't1', 'r1'
 0 row(s) in 0.0080 seconds   <== I expected this to read 2 row(s)
 hbase(main):051:0> scan 't1'
 ROW                   COLUMN+CELL
 0 row(s) in 0.0090 seconds
 I expected the deleteall command to return 1 row(s) instead of 0, because 1 
 row was deleted.  Similar behavior for delete and some other commands.  Some 
 commands such as put work fine.
 Looking at the ruby shell code, it seems that formatter.footer() is called 
 even for commands that will not actually increment the number of rows 
 reported, such as deletes.  Perhaps there should be another similar function 
 to formatter.footer(), but that will not print out @row_count.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-6993) Make filters a first class part of the REST representations

2012-10-14 Thread Andrew Purtell (JIRA)
Andrew Purtell created HBASE-6993:
-

 Summary: Make filters a first class part of the REST 
representations
 Key: HBASE-6993
 URL: https://issues.apache.org/jira/browse/HBASE-6993
 Project: HBase
  Issue Type: Sub-task
Reporter: Andrew Purtell




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HBASE-6993) Make filters a first class part of the REST representations

2012-10-14 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6993?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell reassigned HBASE-6993:
-

Assignee: Andrew Purtell

 Make filters a first class part of the REST representations
 ---

 Key: HBASE-6993
 URL: https://issues.apache.org/jira/browse/HBASE-6993
 Project: HBase
  Issue Type: Sub-task
Reporter: Andrew Purtell
Assignee: Andrew Purtell



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

2012-10-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475911#comment-13475911
 ] 

Lars Hofhansl commented on HBASE-6942:
--

Yeah, passing the delete option as scan attribute is a bit weird. But I can see 
this both ways.
It would be nice if all of this could be strictly controlled by the scan we 
pass in. The scan would (through the attribute) indicate the delete type to use 
and also describe the KVs that are to be deleted.

Also not sure about the template Delete... We'd have to make up fake column 
qualifiers and qualifiers in the future... Since this is an advanced feature we 
could pass the delete type (from KeyValue), or maybe a new enum to indicate 
what we want to do.

So I can see this going both ways. In either case this should probably return 
the number of KVs deleted.
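
For the scan-driven alternative, here is a hedged client-side sketch; the 
attribute name used here is invented for illustration, and the endpoint would 
read it back via Scan.getAttribute():
{code}
// Sketch of driving the bulk delete purely from the Scan, as suggested above.
// The attribute name "bulkdelete.type" is made up; the endpoint would look it up
// and fall back to a row delete when the attribute is absent.
import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.client.Scan;

public class BulkDeleteScanExample {
  public static Scan buildScan(byte[] family, byte[] qualifier) {
    Scan scan = new Scan();
    scan.addColumn(family, qualifier);     // describes the KVs to delete
    // carry the delete type in an attribute, e.g. a KeyValue.Type code
    scan.setAttribute("bulkdelete.type",
        new byte[] { KeyValue.Type.DeleteColumn.getCode() });
    return scan;
  }
}
{code}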

 Endpoint implementation for bulk delete rows
 

 Key: HBASE-6942
 URL: https://issues.apache.org/jira/browse/HBASE-6942
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors, Performance
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 0.94.3, 0.96.0

 Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, 
 HBASE-6942_V3.patch, HBASE-6942_V4.patch


 We can provide an endpoint implementation for doing a bulk deletion of 
 rows (based on a scan) at the server side. This can reduce the time taken for 
 such an operation, as right now it needs to scan the rows back to the client 
 and issue delete(s) using the row keys.
 Query like  delete from table1 where...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6755) HRegion.internalObtainRowLock uses unecessary AtomicInteger

2012-10-14 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6755?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6755:
-

Fix Version/s: (was: 0.94.3)

Removing from 0.94.3. Will probably just mark as Won't fix

 HRegion.internalObtainRowLock uses unecessary AtomicInteger
 ---

 Key: HBASE-6755
 URL: https://issues.apache.org/jira/browse/HBASE-6755
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Attachments: 6755-0.96.txt


 I was looking at HBase's implementation of locks and saw that it 
 unnecessarily uses an AtomicInteger to obtain a unique lock id.
 The observation is that we only need a unique one and don't care if we happen 
 to skip one.
 In a very unscientific test I saw the %system CPU reduced when the 
 AtomicInteger is avoided.
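 A minimal sketch of the idea (not the attached patch): the lock ids only need 
 to be unique keys into the lock map, so a plain, non-atomic counter is enough 
 as long as a collision simply retries with the next id:
 {code}
// Illustrative sketch only, not the attached patch. Lock ids just need to be
// unique keys in the map; skipping values or racing on the increment is fine
// because putIfAbsent() rejects duplicates and we simply try the next id.
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.concurrent.CountDownLatch;

public class RowLockIds {
  private final ConcurrentMap<Integer, CountDownLatch> lockedRows =
      new ConcurrentHashMap<Integer, CountDownLatch>();
  private int lockIdGenerator = 1;   // deliberately not an AtomicInteger

  int allocateLockId(CountDownLatch latch) {
    while (true) {
      int candidate = lockIdGenerator++;          // may race; that is acceptable
      if (lockedRows.putIfAbsent(candidate, latch) == null) {
        return candidate;                         // unique id reserved in the map
      }
      // id already taken (or raced); just try the next one
    }
  }
}
 {code}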

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6562) Fake KVs are sometimes passed to filters

2012-10-14 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6562:
-

Fix Version/s: (was: 0.94.3)

 Fake KVs are sometimes passed to filters
 

 Key: HBASE-6562
 URL: https://issues.apache.org/jira/browse/HBASE-6562
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Minor
 Fix For: 0.96.0

 Attachments: 6562.txt, 6562-v2.txt, 6562-v3.txt, minimalTest.java


 In internal tests at Salesforce we found that fake row keys sometimes are 
 passed to filters (Filter.filterRowKey(...) specifically).
 The KVs are eventually filtered by the StoreScanner/ScanQueryMatcher, but the 
 row key is passed to filterRowKey in RegionScannerImpl *before* that happens.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6843) loading lzo error when using coprocessor

2012-10-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475925#comment-13475925
 ] 

Lars Hofhansl commented on HBASE-6843:
--

Trying to understand the issue here.

@Zhou, are you saying the coprocessors cannot be used at all if native lzo is 
used for compression? Is this the typical lzo compression library folks would 
use? That would indeed be pretty bad.


 loading lzo error when using coprocessor
 

 Key: HBASE-6843
 URL: https://issues.apache.org/jira/browse/HBASE-6843
 Project: HBase
  Issue Type: Bug
  Components: Coprocessors
Affects Versions: 0.94.1
Reporter: Zhou wenjian
Assignee: Zhou wenjian
Priority: Critical
 Fix For: 0.94.3, 0.96.0

 Attachments: HBASE-6843-trunk.patch


 After applying HBASE-6308,we found error followed
 2012-09-06 00:44:38,341 DEBUG 
 org.apache.hadoop.hbase.coprocessor.CoprocessorClassLoader: Finding class: 
 com.hadoop.compression.lzo.LzoCodec
 2012-09-06 00:44:38,351 ERROR com.hadoop.compression.lzo.GPLNativeCodeLoader: 
 Could not load native gpl library
 java.lang.UnsatisfiedLinkError: Native Library 
 /home/zhuzhuang/hbase/0.94.0-ali-1.0/lib/native/Linux-amd64-64/libgplcompression.so
  already loaded in another classloader
 at java.lang.ClassLoader.loadLibrary0(ClassLoader.java:1772)
 at java.lang.ClassLoader.loadLibrary(ClassLoader.java:1732)
 at java.lang.Runtime.loadLibrary0(Runtime.java:823)
 at java.lang.System.loadLibrary(System.java:1028)
 at 
 com.hadoop.compression.lzo.GPLNativeCodeLoader.<clinit>(GPLNativeCodeLoader.java:32)
 at com.hadoop.compression.lzo.LzoCodec.<clinit>(LzoCodec.java:67)
 at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
 at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
 at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
 at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
 at 
 org.apache.hadoop.util.ReflectionUtils.newInstance(ReflectionUtils.java:113)
 at 
 org.apache.hadoop.hbase.io.hfile.Compression$Algorithm$1.getCodec(Compression.java:107)
 at 
 org.apache.hadoop.hbase.io.hfile.Compression$Algorithm.getCompressor(Compression.java:243)
 at 
 org.apache.hadoop.hbase.util.CompressionTest.testCompression(CompressionTest.java:85)
 at 
 org.apache.hadoop.hbase.regionserver.HRegion.checkCompressionCodecs(HRegion.java:3793)
 at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3782)
 at org.apache.hadoop.hbase.regionserver.HRegion.openHRegion(HRegion.java:3732)
 at 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.openRegion(OpenRegionHandler.java:332)
 at 
 org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler.process(OpenRegionHandler.java:108)
 at org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:169)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)
 2012-09-06 00:44:38,355 DEBUG 
 org.apache.hadoop.hbase.coprocessor.CoprocessorClassLoader: Skipping exempt 
 class java.io.PrintWriter - delegating directly to parent
 2012-09-06 00:44:38,355 ERROR com.hadoop.compression.lzo.LzoCodec: Cannot 
 load native-lzo without native-hadoop
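 The root cause is that the CoprocessorClassLoader defines the 
 com.hadoop.compression.lzo classes itself, so the JNI library gets bound to a 
 second classloader. A minimal sketch of the delegation idea (illustrative only, 
 not the attached patch) is to hand such classes straight to the parent 
 classloader:
 {code}
// Minimal sketch of the delegation idea: classes that trigger native library
// loading must always come from the parent classloader so that
// libgplcompression.so is loaded exactly once per JVM.
public class DelegatingCoprocessorClassLoader extends ClassLoader {
  private static final String[] PARENT_ONLY_PREFIXES = {
      "com.hadoop.compression.lzo.",   // native lzo codec classes
  };

  public DelegatingCoprocessorClassLoader(ClassLoader parent) {
    super(parent);
  }

  @Override
  public Class<?> loadClass(String name) throws ClassNotFoundException {
    for (String prefix : PARENT_ONLY_PREFIXES) {
      if (name.startsWith(prefix)) {
        return getParent().loadClass(name);   // never define these ourselves
      }
    }
    return super.loadClass(name);             // usual lookup for everything else
  }
}
 {code}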

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6974) Metric for blocked updates

2012-10-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475927#comment-13475927
 ] 

Lars Hofhansl commented on HBASE-6974:
--

It might be good to track the total amount of time (seconds is probably the 
most useful resolution here) that writes are blocked, i.e. not just a counter.
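
A rough sketch of what that could look like around HRegion.checkResources; the 
class and metric names here are hypothetical, not the ones any patch will use:
{code}
// Hypothetical sketch only: times how long an update was blocked on memstore
// pressure and feeds both a counter and a total-blocked-seconds metric.
import java.util.concurrent.atomic.AtomicLong;

public class BlockedUpdatesMetricSketch {
  private final AtomicLong blockedRequests = new AtomicLong();   // made-up name
  private final AtomicLong blockedSeconds = new AtomicLong();    // made-up name

  void checkResources(long memstoreSize, long blockingMemstoreSize)
      throws InterruptedException {
    if (memstoreSize <= blockingMemstoreSize) {
      return;
    }
    long start = System.currentTimeMillis();
    blockedRequests.incrementAndGet();
    while (memstoreSize > blockingMemstoreSize) {
      Thread.sleep(100);                        // wait for flushes to catch up
      memstoreSize = currentMemstoreSize();
    }
    // seconds is probably the most useful resolution, per the comment above
    blockedSeconds.addAndGet((System.currentTimeMillis() - start) / 1000);
  }

  private long currentMemstoreSize() {
    return 0L;   // placeholder standing in for the region's memstore size
  }
}
{code}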


 Metric for blocked updates
 --

 Key: HBASE-6974
 URL: https://issues.apache.org/jira/browse/HBASE-6974
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Michael Drzal
 Fix For: 0.94.3, 0.96.0


 When the disc subsystem cannot keep up with a sustained high write load, a 
 region will eventually block updates to throttle clients.
 (HRegion.checkResources).
 It would be nice to have a metric for this, so that these occurrences can be 
 tracked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6974) Metric for blocked updates

2012-10-14 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6974?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6974:
-

Priority: Critical  (was: Major)

Upping to critical, so it goes into 0.94.3

 Metric for blocked updates
 --

 Key: HBASE-6974
 URL: https://issues.apache.org/jira/browse/HBASE-6974
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Michael Drzal
Priority: Critical
 Fix For: 0.94.3, 0.96.0


 When the disc subsystem cannot keep up with a sustained high write load, a 
 region will eventually block updates to throttle clients.
 (HRegion.checkResources).
 It would be nice to have a metric for this, so that these occurrences can be 
 tracked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6330) TestImportExport has been failing against hadoop 0.23/2.0 profile [Part2]

2012-10-14 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6330?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6330:
-

Priority: Major  (was: Critical)

Since these are just tests and there is no movement, lowering to Major

 TestImportExport has been failing against hadoop 0.23/2.0 profile [Part2]
 -

 Key: HBASE-6330
 URL: https://issues.apache.org/jira/browse/HBASE-6330
 Project: HBase
  Issue Type: Sub-task
  Components: test
Affects Versions: 0.94.1, 0.96.0
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
  Labels: hadoop-2.0
 Fix For: 0.94.3, 0.96.0

 Attachments: hbase-6330-94.patch, hbase-6330-trunk.patch, 
 hbase-6330-v2.patch


 See HBASE-5876.  I'm going to commit the v3 patches under this name since 
 it has been two months (my bad) since the first half was committed and 
 found to be incomplete.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6307) Fix hbase unit tests running on hadoop 2.0

2012-10-14 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6307?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6307:
-

Priority: Major  (was: Critical)

Since these are just tests and there is no movement, lowering to Major

 Fix hbase unit tests running on hadoop 2.0
 --

 Key: HBASE-6307
 URL: https://issues.apache.org/jira/browse/HBASE-6307
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Hsieh
 Fix For: 0.94.3, 0.96.0


 This is an umbrella issue for fixing unit tests and hbase builds from 0.92+ 
 on top of hadoop 0.23 (currently 0.92/0.94) and hadoop 2.0.x (trunk/0.96).  
 Once these are up and passing properly, we'll close out the umbrella issue by 
 adding hbase-trunk-on-hadoop-2 build to the hadoopqa bot.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6305) TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.

2012-10-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475931#comment-13475931
 ] 

Lars Hofhansl commented on HBASE-6305:
--

Since these are just tests and there is no movement, lowering to Major

 TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.
 

 Key: HBASE-6305
 URL: https://issues.apache.org/jira/browse/HBASE-6305
 Project: HBase
  Issue Type: Sub-task
  Components: test
Affects Versions: 0.92.2, 0.94.1
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.92.3, 0.94.3

 Attachments: hbase-6305-94.patch


 trunk: mvn clean test -Dhadoop.profile=2.0 -Dtest=TestLocalHBaseCluster
 0.94: mvn clean test -Dhadoop.profile=23 -Dtest=TestLocalHBaseCluster
 {code}
 testLocalHBaseCluster(org.apache.hadoop.hbase.TestLocalHBaseCluster)  Time 
 elapsed: 0.022 sec  <<< ERROR!
 java.lang.RuntimeException: Master not initialized after 200 seconds
 at 
 org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:208)
 at 
 org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:424)
 at 
 org.apache.hadoop.hbase.TestLocalHBaseCluster.testLocalHBaseCluster(TestLocalHBaseCluster.java:66)
 ...
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6305) TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.

2012-10-14 Thread Lars Hofhansl (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-6305:
-

Priority: Major  (was: Critical)

 TestLocalHBaseCluster hangs with hadoop 2.0/0.23 builds.
 

 Key: HBASE-6305
 URL: https://issues.apache.org/jira/browse/HBASE-6305
 Project: HBase
  Issue Type: Sub-task
  Components: test
Affects Versions: 0.92.2, 0.94.1
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Fix For: 0.92.3, 0.94.3

 Attachments: hbase-6305-94.patch


 trunk: mvn clean test -Dhadoop.profile=2.0 -Dtest=TestLocalHBaseCluster
 0.94: mvn clean test -Dhadoop.profile=23 -Dtest=TestLocalHBaseCluster
 {code}
 testLocalHBaseCluster(org.apache.hadoop.hbase.TestLocalHBaseCluster)  Time 
 elapsed: 0.022 sec  <<< ERROR!
 java.lang.RuntimeException: Master not initialized after 200 seconds
 at 
 org.apache.hadoop.hbase.util.JVMClusterUtil.startup(JVMClusterUtil.java:208)
 at 
 org.apache.hadoop.hbase.LocalHBaseCluster.startup(LocalHBaseCluster.java:424)
 at 
 org.apache.hadoop.hbase.TestLocalHBaseCluster.testLocalHBaseCluster(TestLocalHBaseCluster.java:66)
 ...
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6919) Remove unnecessary cast from Bytes.readVLong

2012-10-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6919?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475932#comment-13475932
 ] 

Lars Hofhansl commented on HBASE-6919:
--

Hey Mr. Taylor... Wanna make a patch? We can sit and do it together. :)

 Remove unnecessary cast from Bytes.readVLong
 

 Key: HBASE-6919
 URL: https://issues.apache.org/jira/browse/HBASE-6919
 Project: HBase
  Issue Type: Bug
Reporter: James Taylor
Priority: Minor
 Fix For: 0.94.3, 0.96.0


 Remove the throws IOException so that the caller doesn't have to catch and 
 ignore it:
   public static long readVLong(final byte [] buffer, final int offset)
   throws IOException
 Also, add
   public static int readVInt(final byte [] buffer, final int offset)
   throws IOException {
     return (int) readVLong(buffer, offset);
   }
 and these are useful too:
 /**
  * Put long as variable length encoded number at the offset in
  * the result byte array.
  * @param vint Integer to make a vint of.
  * @param result buffer to put vint into
  * @return Vint length in bytes of vint
  */
 public static int vintToBytes(byte[] result, int offset, final long vint) {
   long i = vint;
   if (i >= -112 && i <= 127) {
     result[offset] = (byte) i;
     return 1;
   }
   int len = -112;
   if (i < 0) {
     i ^= -1L; // take one's complement
     len = -120;
   }
   long tmp = i;
   while (tmp != 0) {
     tmp = tmp >> 8;
     len--;
   }
   result[offset++] = (byte) len;
   len = (len < -120) ? -(len + 120) : -(len + 112);
   for (int idx = len; idx != 0; idx--) {
     int shiftbits = (idx - 1) * 8;
     long mask = 0xFFL << shiftbits;
     result[offset++] = (byte) ((i & mask) >> shiftbits);
   }
   return len + 1;
 }
 /**
  * Decode a vint from the buffer pointed at to by ptr and
  * increment the offset of the ptr by the length of the
  * vint.
  * @param ptr a pointer to a byte array buffer
  * @return the decoded vint value as an int
  */
 public static int vintFromBytes(ImmutableBytesWritable ptr) {
   return (int) vlongFromBytes(ptr);
 }
 /**
  * Decode a vint from the buffer pointed at to by ptr and
  * increment the offset of the ptr by the length of the
  * vint.
  * @param ptr a pointer to a byte array buffer
  * @return the decoded vint value as a long
  */
 public static long vlongFromBytes(ImmutableBytesWritable ptr) {
   final byte [] buffer = ptr.get();
   final int offset = ptr.getOffset();
   byte firstByte = buffer[offset];
   int len = WritableUtils.decodeVIntSize(firstByte);
   if (len == 1) {
     ptr.set(buffer, offset + 1, ptr.getLength());
     return firstByte;
   }
   long i = 0;
   for (int idx = 0; idx < len - 1; idx++) {
     byte b = buffer[offset + 1 + idx];
     i = i << 8;
     i = i | (b & 0xFF);
   }
   ptr.set(buffer, offset + len, ptr.getLength());
   return (WritableUtils.isNegativeVInt(firstByte) ? ~i : i);
 }
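 For context, a round trip with the proposed helpers would look roughly like 
 this (a fragment, assuming the methods above land in Bytes or a similar 
 utility class):
   // Hypothetical usage of the helpers above; shows the round trip with no
   // checked exception to catch and ignore, which is what the request asks for.
   byte[] buf = new byte[9];                       // worst case size for a vlong
   int written = vintToBytes(buf, 0, 300L);        // encode at offset 0
   ImmutableBytesWritable ptr = new ImmutableBytesWritable(buf, 0, written);
   long decoded = vlongFromBytes(ptr);             // no IOException involved
   assert decoded == 300L;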

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-4676) Prefix Compression - Trie data block encoding

2012-10-14 Thread Matt Corgan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475934#comment-13475934
 ] 

Matt Corgan commented on HBASE-4676:


I have all known bugs worked out.  All tests are passing including those 
internal to the hbase-prefix-tree module, and existing DataBlockEncoding tests 
in hbase-server, such as TestEncodedSeekers.  

Split into 3 patches for reviewboard:
https://reviews.apache.org/r/7589/
https://reviews.apache.org/r/7591/
https://reviews.apache.org/r/7592/

The prefix-tree branch on my github repo contains the latest from trunk as of 
this comment.
http://github.com/hotpads/hbase/tree/prefix-tree
{code}
git clone -b prefix-tree https://hotp...@github.com/hotpads/hbase.git 
hbase-prefix-tree
{code}

 Prefix Compression - Trie data block encoding
 -

 Key: HBASE-4676
 URL: https://issues.apache.org/jira/browse/HBASE-4676
 Project: HBase
  Issue Type: New Feature
  Components: io, Performance, regionserver
Affects Versions: 0.90.6
Reporter: Matt Corgan
Assignee: Matt Corgan
 Attachments: HBASE-4676-0.94-v1.patch, hbase-prefix-trie-0.1.jar, 
 PrefixTrie_Format_v1.pdf, PrefixTrie_Performance_v1.pdf, SeeksPerSec by 
 blockSize.png


 The HBase data block format has room for 2 significant improvements for 
 applications that have high block cache hit ratios.  
 First, there is no prefix compression, and the current KeyValue format is 
 somewhat metadata heavy, so there can be tremendous memory bloat for many 
 common data layouts, specifically those with long keys and short values.
 Second, there is no random access to KeyValues inside data blocks.  This 
 means that every time you double the datablock size, average seek time (or 
 average cpu consumption) goes up by a factor of 2.  The standard 64KB block 
 size is ~10x slower for random seeks than a 4KB block size, but block sizes 
 as small as 4KB cause problems elsewhere.  Using block sizes of 256KB or 1MB 
 or more may be more efficient from a disk access and block-cache perspective 
 in many big-data applications, but doing so is infeasible from a random seek 
 perspective.
 The PrefixTrie block encoding format attempts to solve both of these 
 problems.  Some features:
 * trie format for row key encoding completely eliminates duplicate row keys 
 and encodes similar row keys into a standard trie structure which also saves 
 a lot of space
 * the column family is currently stored once at the beginning of each block.  
 this could easily be modified to allow multiple family names per block
 * all qualifiers in the block are stored in their own trie format which 
 caters nicely to wide rows.  duplicate qualifers between rows are eliminated. 
  the size of this trie determines the width of the block's qualifier 
 fixed-width-int
 * the minimum timestamp is stored at the beginning of the block, and deltas 
 are calculated from that.  the maximum delta determines the width of the 
 block's timestamp fixed-width-int
 The block is structured with metadata at the beginning, then a section for 
 the row trie, then the column trie, then the timestamp deltas, and then 
 all the values.  Most work is done in the row trie, where every leaf node 
 (corresponding to a row) contains a list of offsets/references corresponding 
 to the cells in that row.  Each cell is fixed-width to enable binary 
 searching and is represented by [1 byte operationType, X bytes qualifier 
 offset, X bytes timestamp delta offset].
 If all operation types are the same for a block, there will be zero per-cell 
 overhead.  Same for timestamps.  Same for qualifiers when i get a chance.  
 So, the compression aspect is very strong, but makes a few small sacrifices 
 on VarInt size to enable faster binary searches in trie fan-out nodes.
 A more compressed but slower version might build on this by also applying 
 further (suffix, etc) compression on the trie nodes at the cost of slower 
 write speed.  Even further compression could be obtained by using all VInts 
 instead of FInts with a sacrifice on random seek speed (though not huge).
 One current drawback is the current write speed.  While programmed with good 
 constructs like TreeMaps, ByteBuffers, binary searches, etc, it's not 
 programmed with the same level of optimization as the read path.  Work will 
 need to be done to optimize the data structures used for encoding and could 
 probably show a 10x increase.  It will still be slower than delta encoding, 
 but with a much higher decode speed.  I have not yet created a thorough 
 benchmark for write speed nor sequential read speed.
 Though the trie is reaching a point where it is internally very efficient 
 (probably within half or a quarter of its max read speed) the way that hbase 
 currently uses it 
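 For reference, assuming the patch series adds a PREFIX_TREE value to 
 DataBlockEncoding (an assumption here, not confirmed by this description), 
 enabling the encoding for a column family would look something like this:
 {code}
// Assumes a PREFIX_TREE entry exists in DataBlockEncoding once the patches land;
// shown only to illustrate how the encoding would be switched on per family.
import org.apache.hadoop.hbase.HColumnDescriptor;
import org.apache.hadoop.hbase.HTableDescriptor;
import org.apache.hadoop.hbase.io.encoding.DataBlockEncoding;

public class PrefixTreeExample {
  public static void main(String[] args) {
    HTableDescriptor table = new HTableDescriptor("t1");
    HColumnDescriptor cf = new HColumnDescriptor("f1");
    cf.setDataBlockEncoding(DataBlockEncoding.PREFIX_TREE);  // trie-encode blocks
    table.addFamily(cf);
  }
}
 {code}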

[jira] [Commented] (HBASE-6785) Convert AggregateProtocol to protobuf defined coprocessor service

2012-10-14 Thread Gary Helmling (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475935#comment-13475935
 ] 

Gary Helmling commented on HBASE-6785:
--

Latest patch looks good to me.  Thanks for the doc updates, Devaraj.

[~zhi...@ebaysf.com], do you have any other comments?

 Convert AggregateProtocol to protobuf defined coprocessor service
 -

 Key: HBASE-6785
 URL: https://issues.apache.org/jira/browse/HBASE-6785
 Project: HBase
  Issue Type: Sub-task
  Components: Coprocessors
Reporter: Gary Helmling
Assignee: Devaraj Das
 Fix For: 0.96.0

 Attachments: 6785-2.patch, 6785-simplified-pb1.patch, 
 Aggregate.proto, Aggregate.proto


 With coprocessor endpoints now exposed as protobuf defined services, we 
 should convert over all of our built-in endpoints to PB services.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6785) Convert AggregateProtocol to protobuf defined coprocessor service

2012-10-14 Thread Ted Yu (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475940#comment-13475940
 ] 

Ted Yu commented on HBASE-6785:
---

I don't have other comments.

Integrated to trunk.

Thanks for the review, Gary.

 Convert AggregateProtocol to protobuf defined coprocessor service
 -

 Key: HBASE-6785
 URL: https://issues.apache.org/jira/browse/HBASE-6785
 Project: HBase
  Issue Type: Sub-task
  Components: Coprocessors
Reporter: Gary Helmling
Assignee: Devaraj Das
 Fix For: 0.96.0

 Attachments: 6785-2.patch, 6785-simplified-pb1.patch, 
 Aggregate.proto, Aggregate.proto


 With coprocessor endpoints now exposed as protobuf defined services, we 
 should convert over all of our built-in endpoints to PB services.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6785) Convert AggregateProtocol to protobuf defined coprocessor service

2012-10-14 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475946#comment-13475946
 ] 

Hudson commented on HBASE-6785:
---

Integrated in HBase-TRUNK #3447 (See 
[https://builds.apache.org/job/HBase-TRUNK/3447/])
HBASE-6785 Convert AggregateProtocol to protobuf defined coprocessor 
service (Devaraj Das) (Revision 1398175)

 Result = FAILURE
tedyu : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/coprocessor/AggregationClient.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/client/coprocessor/LongColumnInterpreter.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateImplementation.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/AggregateProtocol.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/coprocessor/ColumnInterpreter.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/protobuf/generated/AggregateProtos.java
* /hbase/trunk/hbase-server/src/main/protobuf/Aggregate.proto


 Convert AggregateProtocol to protobuf defined coprocessor service
 -

 Key: HBASE-6785
 URL: https://issues.apache.org/jira/browse/HBASE-6785
 Project: HBase
  Issue Type: Sub-task
  Components: Coprocessors
Reporter: Gary Helmling
Assignee: Devaraj Das
 Fix For: 0.96.0

 Attachments: 6785-2.patch, 6785-simplified-pb1.patch, 
 Aggregate.proto, Aggregate.proto


 With coprocessor endpoints now exposed as protobuf defined services, we 
 should convert over all of our built-in endpoints to PB services.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6994) minor doc update about DEFAULT_ACCEPTABLE_FACTOR

2012-10-14 Thread liang xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liang xie updated HBASE-6994:
-

Description: 
Per ttk code, in LruBlockCache.java:
static final float DEFAULT_ACCEPTABLE_FACTOR = 0.99f;

but the site doc still :
number of region servers * heap size * hfile.block.cache.size * 0.85

seems the HBASE-6312 forgot to update this doc:)

  was:
Per turnk code, in LruBlockCache.java:
static final float DEFAULT_ACCEPTABLE_FACTOR = 0.99f;

but the site doc still :
number of region servers * heap size * hfile.block.cache.size * 0.85

seems the HBASE-6312 forgot to update this doc:)


 minor doc update about DEFAULT_ACCEPTABLE_FACTOR
 

 Key: HBASE-6994
 URL: https://issues.apache.org/jira/browse/HBASE-6994
 Project: HBase
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.96.0
Reporter: liang xie
Assignee: liang xie
Priority: Minor

 Per ttk code, in LruBlockCache.java:
 static final float DEFAULT_ACCEPTABLE_FACTOR = 0.99f;
 but the site doc still :
 number of region servers * heap size * hfile.block.cache.size * 0.85
 seems the HBASE-6312 forgot to update this doc:)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-6994) minor doc update about DEFAULT_ACCEPTABLE_FACTOR

2012-10-14 Thread liang xie (JIRA)
liang xie created HBASE-6994:


 Summary: minor doc update about DEFAULT_ACCEPTABLE_FACTOR
 Key: HBASE-6994
 URL: https://issues.apache.org/jira/browse/HBASE-6994
 Project: HBase
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.96.0
Reporter: liang xie
Assignee: liang xie
Priority: Minor


Per turnk code, in LruBlockCache.java:
static final float DEFAULT_ACCEPTABLE_FACTOR = 0.99f;

but the site doc still :
number of region servers * heap size * hfile.block.cache.size * 0.85

seems the HBASE-6312 forgot to update this doc:)
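
As a quick worked example of the corrected formula (all numbers made up):
{code}
// Made-up numbers, just to show how the 0.99 acceptable factor feeds the
// "number of region servers * heap size * hfile.block.cache.size * factor" formula.
public class BlockCacheSizing {
  public static void main(String[] args) {
    int regionServers = 20;
    double heapGb = 8.0;                  // -Xmx per region server
    double blockCacheFraction = 0.25;     // hfile.block.cache.size
    double acceptableFactor = 0.99;       // DEFAULT_ACCEPTABLE_FACTOR in trunk
    double aggregateGb =
        regionServers * heapGb * blockCacheFraction * acceptableFactor;
    System.out.println(aggregateGb + " GB aggregate block cache");  // 39.6
  }
}
{code}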

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6994) minor doc update about DEFAULT_ACCEPTABLE_FACTOR

2012-10-14 Thread liang xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liang xie updated HBASE-6994:
-

Attachment: HBASE-6994.patch

 minor doc update about DEFAULT_ACCEPTABLE_FACTOR
 

 Key: HBASE-6994
 URL: https://issues.apache.org/jira/browse/HBASE-6994
 Project: HBase
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.96.0
Reporter: liang xie
Assignee: liang xie
Priority: Minor
 Attachments: HBASE-6994.patch


 Per trunk code, in LruBlockCache.java:
 static final float DEFAULT_ACCEPTABLE_FACTOR = 0.99f;
 but the site doc still :
 number of region servers * heap size * hfile.block.cache.size * 0.85
 seems the HBASE-6312 forgot to update this doc:)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6994) minor doc update about DEFAULT_ACCEPTABLE_FACTOR

2012-10-14 Thread liang xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liang xie updated HBASE-6994:
-

Description: 
Per trunk code, in LruBlockCache.java:
static final float DEFAULT_ACCEPTABLE_FACTOR = 0.99f;

but the site doc still :
number of region servers * heap size * hfile.block.cache.size * 0.85

seems the HBASE-6312 forgot to update this doc:)

  was:
Per ttk code, in LruBlockCache.java:
static final float DEFAULT_ACCEPTABLE_FACTOR = 0.99f;

but the site doc still :
number of region servers * heap size * hfile.block.cache.size * 0.85

seems the HBASE-6312 forgot to update this doc:)


 minor doc update about DEFAULT_ACCEPTABLE_FACTOR
 

 Key: HBASE-6994
 URL: https://issues.apache.org/jira/browse/HBASE-6994
 Project: HBase
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.96.0
Reporter: liang xie
Assignee: liang xie
Priority: Minor
 Attachments: HBASE-6994.patch


 Per trunk code, in LruBlockCache.java:
 static final float DEFAULT_ACCEPTABLE_FACTOR = 0.99f;
 but the site doc still :
 number of region servers * heap size * hfile.block.cache.size * 0.85
 seems the HBASE-6312 forgot to update this doc:)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6994) minor doc update about DEFAULT_ACCEPTABLE_FACTOR

2012-10-14 Thread liang xie (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

liang xie updated HBASE-6994:
-

Status: Patch Available  (was: Open)

 minor doc update about DEFAULT_ACCEPTABLE_FACTOR
 

 Key: HBASE-6994
 URL: https://issues.apache.org/jira/browse/HBASE-6994
 Project: HBase
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.96.0
Reporter: liang xie
Assignee: liang xie
Priority: Minor
 Attachments: HBASE-6994.patch


 Per trunk code, in LruBlockCache.java:
 static final float DEFAULT_ACCEPTABLE_FACTOR = 0.99f;
 but the site doc still :
 number of region servers * heap size * hfile.block.cache.size * 0.85
 seems the HBASE-6312 forgot to update this doc:)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6994) minor doc update about DEFAULT_ACCEPTABLE_FACTOR

2012-10-14 Thread liang xie (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475949#comment-13475949
 ] 

liang xie commented on HBASE-6994:
--

Thanks to my colleague honghua for spotting the doc inconsistency first.

 minor doc update about DEFAULT_ACCEPTABLE_FACTOR
 

 Key: HBASE-6994
 URL: https://issues.apache.org/jira/browse/HBASE-6994
 Project: HBase
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.96.0
Reporter: liang xie
Assignee: liang xie
Priority: Minor
 Attachments: HBASE-6994.patch


 Per trunk code, in LruBlockCache.java:
 static final float DEFAULT_ACCEPTABLE_FACTOR = 0.99f;
 but the site doc still :
 number of region servers * heap size * hfile.block.cache.size * 0.85
 seems the HBASE-6312 forgot to update this doc:)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6785) Convert AggregateProtocol to protobuf defined coprocessor service

2012-10-14 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6785?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-6785:
--

  Resolution: Fixed
Hadoop Flags: Reviewed
  Status: Resolved  (was: Patch Available)

I ran the following tests and they passed:
{code}
  637  mt -Dtest=TestFromClientSideWithCoprocessor
  638  mt 
-Dtest=TestRegionServerCoprocessorExceptionWithAbort#testExceptionFromCoprocessorDuringPut
{code}

 Convert AggregateProtocol to protobuf defined coprocessor service
 -

 Key: HBASE-6785
 URL: https://issues.apache.org/jira/browse/HBASE-6785
 Project: HBase
  Issue Type: Sub-task
  Components: Coprocessors
Reporter: Gary Helmling
Assignee: Devaraj Das
 Fix For: 0.96.0

 Attachments: 6785-2.patch, 6785-simplified-pb1.patch, 
 Aggregate.proto, Aggregate.proto


 With coprocessor endpoints now exposed as protobuf defined services, we 
 should convert over all of our built-in endpoints to PB services.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HBASE-6371) [89-fb] Level based compaction

2012-10-14 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang reassigned HBASE-6371:
-

Assignee: Liyin Tang  (was: Akashnil)

 [89-fb] Level based compaction
 --

 Key: HBASE-6371
 URL: https://issues.apache.org/jira/browse/HBASE-6371
 Project: HBase
  Issue Type: Improvement
Reporter: Akashnil
Assignee: Liyin Tang
  Labels: noob

 Currently, the compaction selection is not very flexible and is not sensitive 
 to the hotness of the data. Very old data is likely to be accessed less, and 
 very recent data is likely to be in the block cache. Both of these 
 considerations make it inefficient to compact these files as aggressively as 
 other files. In some use-cases, the access-pattern is particularly obvious 
 even though there is no way to control the compaction algorithm in those 
 cases.
 In the new compaction selection algorithm, we plan to divide the candidate 
 files into different levels according to oldness of the data that is present 
 in those files. For each level, parameters like compaction ratio, minimum 
 number of store-files in each compaction may be different. Number of levels, 
 time-ranges, and parameters for each level will be configurable online on a 
 per-column family basis.
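 A purely hypothetical sketch of what per-column-family, per-tier settings could 
 look like; the property names below are invented for illustration and are not 
 the keys used by the 89-fb implementation:
 {code}
// Hypothetical only: the configuration keys below are made up to illustrate
// per-column-family, per-tier compaction settings; they are not real HBase keys.
import org.apache.hadoop.hbase.HColumnDescriptor;

public class TierCompactionConfigSketch {
  public static HColumnDescriptor configure(HColumnDescriptor cf) {
    cf.setValue("hbase.hstore.compaction.tiers", "3");            // invented key
    cf.setValue("hbase.hstore.compaction.tier.0.ratio", "1.2");   // hottest data
    cf.setValue("hbase.hstore.compaction.tier.0.min.files", "4"); // invented key
    cf.setValue("hbase.hstore.compaction.tier.2.ratio", "0.2");   // coldest data
    return cf;
  }
}
 {code}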

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6371) [89-fb] Tier based compaction

2012-10-14 Thread Liyin Tang (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Liyin Tang updated HBASE-6371:
--

Summary: [89-fb] Tier based compaction  (was: [89-fb] Level based 
compaction)

 [89-fb] Tier based compaction
 -

 Key: HBASE-6371
 URL: https://issues.apache.org/jira/browse/HBASE-6371
 Project: HBase
  Issue Type: Improvement
Reporter: Akashnil
Assignee: Liyin Tang
  Labels: noob

 Currently, the compaction selection is not very flexible and is not sensitive 
 to the hotness of the data. Very old data is likely to be accessed less, and 
 very recent data is likely to be in the block cache. Both of these 
 considerations make it inefficient to compact these files as aggressively as 
 other files. In some use-cases, the access-pattern is particularly obvious 
 even though there is no way to control the compaction algorithm in those 
 cases.
 In the new compaction selection algorithm, we plan to divide the candidate 
 files into different levels according to oldness of the data that is present 
 in those files. For each level, parameters like compaction ratio, minimum 
 number of store-files in each compaction may be different. Number of levels, 
 time-ranges, and parameters for each level will be configurable online on a 
 per-column family basis.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6371) [89-fb] Tier based compaction

2012-10-14 Thread Liyin Tang (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6371?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475952#comment-13475952
 ] 

Liyin Tang commented on HBASE-6371:
---

As Nicolas suggested, renaming the jira to tier based compaction.

 [89-fb] Tier based compaction
 -

 Key: HBASE-6371
 URL: https://issues.apache.org/jira/browse/HBASE-6371
 Project: HBase
  Issue Type: Improvement
Reporter: Akashnil
Assignee: Liyin Tang
  Labels: noob

 Currently, the compaction selection is not very flexible and is not sensitive 
 to the hotness of the data. Very old data is likely to be accessed less, and 
 very recent data is likely to be in the block cache. Both of these 
 considerations make it inefficient to compact these files as aggressively as 
 other files. In some use-cases, the access-pattern is particularly obvious 
 even though there is no way to control the compaction algorithm in those 
 cases.
 In the new compaction selection algorithm, we plan to divide the candidate 
 files into different levels according to oldness of the data that is present 
 in those files. For each level, parameters like compaction ratio, minimum 
 number of store-files in each compaction may be different. Number of levels, 
 time-ranges, and parameters for each level will be configurable online on a 
 per-column family basis.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

2012-10-14 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475955#comment-13475955
 ] 

Anoop Sam John commented on HBASE-6942:
---

In order to support all types of deletes we need to have these params:
- CF names which need to be deleted
- qualifier names which need to be deleted
- a timestamp, in case we need a time-based delete
- a type - this is needed when it is a delete-version request

The scan ideally need not scan all the columns which need to be deleted. If I 
have a condition-based delete, what needs to be scanned is only the columns 
involved in the condition.

So rather than putting all of these in another POJO, I thought it would be 
better to accept them as a Delete object. In that we have APIs like 
deleteColumn(s), deleteFamily etc., and users are likely to know these already. 
What do you say, Lars? The slightly weird part I am seeing in this is the row 
key, which needs to be a fake one.
bq. We'd have to make up fake column qualifiers and qualifiers in the future
Sorry, I didn't get your meaning there.

Thanks for the reviews and comments.

 Endpoint implementation for bulk delete rows
 

 Key: HBASE-6942
 URL: https://issues.apache.org/jira/browse/HBASE-6942
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors, Performance
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 0.94.3, 0.96.0

 Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, 
 HBASE-6942_V3.patch, HBASE-6942_V4.patch


 We can provide an endpoint implementation for doing a bulk deletion of 
 rows (based on a scan) at the server side. This can reduce the time taken for 
 such an operation, as right now it needs to scan the rows back to the client 
 and issue delete(s) using the row keys.
 Query like  delete from table1 where...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6994) minor doc update about DEFAULT_ACCEPTABLE_FACTOR

2012-10-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6994?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475959#comment-13475959
 ] 

Hadoop QA commented on HBASE-6994:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12549104/HBASE-6994.patch
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:green}+0 tests included{color}.  The patch appears to be a 
documentation patch that doesn't require tests.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
82 warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

{color:green}+1 core tests{color}.  The patch passed unit tests in .

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3050//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3050//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3050//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3050//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3050//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3050//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3050//console

This message is automatically generated.

 minor doc update about DEFAULT_ACCEPTABLE_FACTOR
 

 Key: HBASE-6994
 URL: https://issues.apache.org/jira/browse/HBASE-6994
 Project: HBase
  Issue Type: Bug
  Components: documentation
Affects Versions: 0.96.0
Reporter: liang xie
Assignee: liang xie
Priority: Minor
 Attachments: HBASE-6994.patch


 Per trunk code, in LruBlockCache.java:
 static final float DEFAULT_ACCEPTABLE_FACTOR = 0.99f;
 but the site doc still says:
 number of region servers * heap size * hfile.block.cache.size * 0.85
 It seems HBASE-6312 forgot to update this doc :)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

2012-10-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475961#comment-13475961
 ] 

Lars Hofhansl commented on HBASE-6942:
--

Hmm... Most of these parameters can be controlled with a scan.
For example, to delete only some CFs, just configure the scan that way. I don't 
think we should make it more complicated/flexible than this.

What you are describing is another use case. What if I only want to delete the 
column that I am describing with the scan? (Now I would have to include the 
superset of all possible columns in the passed Delete object.)

If folks need more complicated logic they should write their own endpoint (now 
they have an example)... That is the whole point of coprocessors, so that we do 
not have to anticipate every possible use case :)

The beauty of this approach is that we can just pass a Scan object (along with 
just a delete type maybe) and have the endpoint do its work.

Anyway, I do not feel strongly about this. If you think that we need more 
flexibility and passing a Delete is the best way to pursue this, then let's do 
that... As long as the simple case is still simple.
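
As a sketch of the simple case (illustrative only; the endpoint call in the comment at the end is hypothetical, not an existing interface), the Scan itself carries the selection:

{code}
import java.io.IOException;
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class BulkDeleteScanSketch {
  /** Build a Scan that selects exactly the cells the endpoint should delete. */
  static Scan buildDeleteScan() throws IOException {
    Scan scan = new Scan(Bytes.toBytes("row-0000"), Bytes.toBytes("row-9999"));
    scan.addFamily(Bytes.toBytes("cf1"));                        // drop a whole family...
    scan.addColumn(Bytes.toBytes("cf2"), Bytes.toBytes("q1"));   // ...or specific columns
    scan.setTimeRange(0L, 1350000000000L);                       // only data older than a cutoff
    // A hypothetical invocation would then be something like:
    //   bulkDeleteEndpoint.delete(scan, DeleteType.COLUMN);
    return scan;
  }
}
{code}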


 Endpoint implementation for bulk delete rows
 

 Key: HBASE-6942
 URL: https://issues.apache.org/jira/browse/HBASE-6942
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors, Performance
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 0.94.3, 0.96.0

 Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, 
 HBASE-6942_V3.patch, HBASE-6942_V4.patch


 We can provide an end point implementation for doing a bulk deletion of 
 rows(based on a scan) at the server side. This can reduce the time taken for 
 such an operation as right now it need to do a scan to client and issue 
 delete(s) using rowkeys.
 Query like  delete from table1 where...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

2012-10-14 Thread ramkrishna.s.vasudevan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475962#comment-13475962
 ] 

ramkrishna.s.vasudevan commented on HBASE-6942:
---

If we are also supporting delete this way, then it is better to provide a sample 
POJO class.  Anyway it is going to be listed in the examples section, so clear 
documentation and a proper example will at least help users in using it.  
Generally, as Ted suggested, users will at first just copy-paste the examples 
and see how they behave.

 Endpoint implementation for bulk delete rows
 

 Key: HBASE-6942
 URL: https://issues.apache.org/jira/browse/HBASE-6942
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors, Performance
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 0.94.3, 0.96.0

 Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, 
 HBASE-6942_V3.patch, HBASE-6942_V4.patch


 We can provide an end point implementation for doing a bulk deletion of 
 rows(based on a scan) at the server side. This can reduce the time taken for 
 such an operation as right now it need to do a scan to client and issue 
 delete(s) using rowkeys.
 Query like  delete from table1 where...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

2012-10-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475963#comment-13475963
 ] 

Lars Hofhansl commented on HBASE-6942:
--

For the V4 patch. You do need to check all the operationStatus', you can break 
out of the loop as soon as you find the first status that is not success.

Can do that on commit. +1 on the rest of the patch.
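
A minimal sketch of that early exit (assuming the server-side OperationStatus / OperationStatusCode types; the helper name is made up):

{code}
import org.apache.hadoop.hbase.HConstants.OperationStatusCode;
import org.apache.hadoop.hbase.regionserver.OperationStatus;

public class StatusCheckSketch {
  /** Returns false as soon as the first non-success status is seen. */
  static boolean allSucceeded(OperationStatus[] statuses) {
    for (OperationStatus status : statuses) {
      if (status.getOperationStatusCode() != OperationStatusCode.SUCCESS) {
        return false;   // no need to look at the remaining statuses
      }
    }
    return true;
  }
}
{code}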

 Endpoint implementation for bulk delete rows
 

 Key: HBASE-6942
 URL: https://issues.apache.org/jira/browse/HBASE-6942
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors, Performance
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 0.94.3, 0.96.0

 Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, 
 HBASE-6942_V3.patch, HBASE-6942_V4.patch


 We can provide an end point implementation for doing a bulk deletion of 
 rows(based on a scan) at the server side. This can reduce the time taken for 
 such an operation as right now it need to do a scan to client and issue 
 delete(s) using rowkeys.
 Query like  delete from table1 where...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Comment Edited] (HBASE-6942) Endpoint implementation for bulk delete rows

2012-10-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475963#comment-13475963
 ] 

Lars Hofhansl edited comment on HBASE-6942 at 10/15/12 4:54 AM:


For the V4 patch. You do not need to check all the operationStatus', you can 
break out of the loop as soon as you find the first status that is not success.

Can do that on commit. +1 on the rest of the patch.

Edit: Forgot a not in the 2nd sentence.

  was (Author: lhofhansl):
For the V4 patch. You do need to check all the operationStatus', you can 
break out of the loop as soon as you find the first status that is not success.

Can do that on commit. +1 on the rest of the patch.
  
 Endpoint implementation for bulk delete rows
 

 Key: HBASE-6942
 URL: https://issues.apache.org/jira/browse/HBASE-6942
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors, Performance
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 0.94.3, 0.96.0

 Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, 
 HBASE-6942_V3.patch, HBASE-6942_V4.patch


 We can provide an end point implementation for doing a bulk deletion of 
 rows(based on a scan) at the server side. This can reduce the time taken for 
 such an operation as right now it need to do a scan to client and issue 
 delete(s) using rowkeys.
 Query like  delete from table1 where...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6962) Upgrade hadoop 1 dependency to hadoop 1.1

2012-10-14 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-6962:
--

Attachment: 6962.txt

 Upgrade hadoop 1 dependency to hadoop 1.1
 -

 Key: HBASE-6962
 URL: https://issues.apache.org/jira/browse/HBASE-6962
 Project: HBase
  Issue Type: Bug
 Environment: hadoop 1.1 contains multiple important fixes, including 
 HDFS-3703
Reporter: Ted Yu
 Attachments: 6962.txt




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6962) Upgrade hadoop 1 dependency to hadoop 1.1

2012-10-14 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-6962:
--

Status: Patch Available  (was: Open)

 Upgrade hadoop 1 dependency to hadoop 1.1
 -

 Key: HBASE-6962
 URL: https://issues.apache.org/jira/browse/HBASE-6962
 Project: HBase
  Issue Type: Bug
 Environment: hadoop 1.1 contains multiple important fixes, including 
 HDFS-3703
Reporter: Ted Yu
 Attachments: 6962.txt




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HBASE-6962) Upgrade hadoop 1 dependency to hadoop 1.1

2012-10-14 Thread Ted Yu (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned HBASE-6962:
-

Assignee: Ted Yu

 Upgrade hadoop 1 dependency to hadoop 1.1
 -

 Key: HBASE-6962
 URL: https://issues.apache.org/jira/browse/HBASE-6962
 Project: HBase
  Issue Type: Bug
 Environment: hadoop 1.1 contains multiple important fixes, including 
 HDFS-3703
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 6962.txt




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6974) Metric for blocked updates

2012-10-14 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475964#comment-13475964
 ] 

Anoop Sam John commented on HBASE-6974:
---

Lars
Updates can be blocked for more than one reason, right?
1. As in checkResources, when the memstore size grows too large (as per the 
hbase.hregion.memstore.block.multiplier value and the memstore size)
2. Because of the global heap space usage of all the memstores in the RS
3. Because of the store file count being too high 
(hbase.hstore.blockingStoreFiles). Well, this one does not block directly; it 
makes the flush wait, which in turn can block the updates as per point 1, so 
maybe it need not be considered.

So your idea is that in every case where the block happens, the wait time will 
be captured and recorded in the metric, right? Maybe we can also capture the 
reason, so that after seeing this metric the user can think about changing some 
configs if needed.

I asked just to confirm. It will be very useful, I guess.
Nice work Lars... We would like to use this...
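
Just to make the intent concrete, a rough sketch (purely illustrative; the counter names and the helper are hypothetical, not the actual patch) of capturing the blocked wait time, ideally tagged with the reason:

{code}
import java.util.concurrent.atomic.AtomicLong;

public class BlockedUpdateMetricSketch {
  enum BlockReason { MEMSTORE_ABOVE_MULTIPLIER, GLOBAL_HEAP_PRESSURE, TOO_MANY_STORE_FILES }

  private final AtomicLong blockedUpdateCount = new AtomicLong();
  private final AtomicLong blockedUpdateTimeMs = new AtomicLong();

  /** Call once an update that was blocked (e.g. in a checkResources-style wait) resumes. */
  void recordBlockedUpdate(BlockReason reason, long waitStartMs) {
    blockedUpdateCount.incrementAndGet();
    blockedUpdateTimeMs.addAndGet(System.currentTimeMillis() - waitStartMs);
    // Exposing the reason (or at least logging it) lets the operator tell which
    // config to tune, e.g. hbase.hregion.memstore.block.multiplier vs. the global
    // memstore limits vs. hbase.hstore.blockingStoreFiles.
  }
}
{code}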

 Metric for blocked updates
 --

 Key: HBASE-6974
 URL: https://issues.apache.org/jira/browse/HBASE-6974
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Michael Drzal
Priority: Critical
 Fix For: 0.94.3, 0.96.0


 When the disc subsystem cannot keep up with a sustained high write load, a 
 region will eventually block updates to throttle clients.
 (HRegion.checkResources).
 It would be nice to have a metric for this, so that these occurrences can be 
 tracked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6974) Metric for blocked updates

2012-10-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475968#comment-13475968
 ] 

Lars Hofhansl commented on HBASE-6974:
--

@Anoop: Yes that's the idea.
I think you are right and #2 is probably checked somewhere else.
#3 should lead to #1.

Thanks for checking Anoop!


 Metric for blocked updates
 --

 Key: HBASE-6974
 URL: https://issues.apache.org/jira/browse/HBASE-6974
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Michael Drzal
Priority: Critical
 Fix For: 0.94.3, 0.96.0


 When the disc subsystem cannot keep up with a sustained high write load, a 
 region will eventually block updates to throttle clients.
 (HRegion.checkResources).
 It would be nice to have a metric for this, so that these occurrences can be 
 tracked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-4676) Prefix Compression - Trie data block encoding

2012-10-14 Thread Matt Corgan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Corgan updated HBASE-4676:
---

Attachment: HBASE-4676-prefix-tree-trunk-v1.patch

Attaching everything as a single patch: HBASE-4676-prefix-tree-trunk-v1.patch

 Prefix Compression - Trie data block encoding
 -

 Key: HBASE-4676
 URL: https://issues.apache.org/jira/browse/HBASE-4676
 Project: HBase
  Issue Type: New Feature
  Components: io, Performance, regionserver
Affects Versions: 0.96.0
Reporter: Matt Corgan
Assignee: Matt Corgan
 Attachments: HBASE-4676-0.94-v1.patch, 
 HBASE-4676-prefix-tree-trunk-v1.patch, hbase-prefix-trie-0.1.jar, 
 PrefixTrie_Format_v1.pdf, PrefixTrie_Performance_v1.pdf, SeeksPerSec by 
 blockSize.png


 The HBase data block format has room for 2 significant improvements for 
 applications that have high block cache hit ratios.  
 First, there is no prefix compression, and the current KeyValue format is 
 somewhat metadata heavy, so there can be tremendous memory bloat for many 
 common data layouts, specifically those with long keys and short values.
 Second, there is no random access to KeyValues inside data blocks.  This 
 means that every time you double the datablock size, average seek time (or 
 average cpu consumption) goes up by a factor of 2.  The standard 64KB block 
 size is ~10x slower for random seeks than a 4KB block size, but block sizes 
 as small as 4KB cause problems elsewhere.  Using block sizes of 256KB or 1MB 
 or more may be more efficient from a disk access and block-cache perspective 
 in many big-data applications, but doing so is infeasible from a random seek 
 perspective.
 The PrefixTrie block encoding format attempts to solve both of these 
 problems.  Some features:
 * trie format for row key encoding completely eliminates duplicate row keys 
 and encodes similar row keys into a standard trie structure which also saves 
 a lot of space
 * the column family is currently stored once at the beginning of each block.  
 this could easily be modified to allow multiple family names per block
 * all qualifiers in the block are stored in their own trie format which 
 caters nicely to wide rows.  duplicate qualifiers between rows are eliminated. 
  the size of this trie determines the width of the block's qualifier 
 fixed-width-int
 * the minimum timestamp is stored at the beginning of the block, and deltas 
 are calculated from that.  the maximum delta determines the width of the 
 block's timestamp fixed-width-int
 The block is structured with metadata at the beginning, then a section for 
 the row trie, then the column trie, then the timestamp deltas, and then 
 all the values.  Most work is done in the row trie, where every leaf node 
 (corresponding to a row) contains a list of offsets/references corresponding 
 to the cells in that row.  Each cell is fixed-width to enable binary 
 searching and is represented by [1 byte operationType, X bytes qualifier 
 offset, X bytes timestamp delta offset].
 If all operation types are the same for a block, there will be zero per-cell 
 overhead.  Same for timestamps.  Same for qualifiers when I get a chance.  
 So, the compression aspect is very strong, but makes a few small sacrifices 
 on VarInt size to enable faster binary searches in trie fan-out nodes.
 A more compressed but slower version might build on this by also applying 
 further (suffix, etc) compression on the trie nodes at the cost of slower 
 write speed.  Even further compression could be obtained by using all VInts 
 instead of FInts with a sacrifice on random seek speed (though not huge).
 One current drawback is the current write speed.  While programmed with good 
 constructs like TreeMaps, ByteBuffers, binary searches, etc, it's not 
 programmed with the same level of optimization as the read path.  Work will 
 need to be done to optimize the data structures used for encoding and could 
 probably show a 10x increase.  It will still be slower than delta encoding, 
 but with a much higher decode speed.  I have not yet created a thorough 
 benchmark for write speed nor sequential read speed.
 Though the trie is reaching a point where it is internally very efficient 
 (probably within half or a quarter of its max read speed) the way that hbase 
 currently uses it is far from optimal.  The KeyValueScanner and related 
 classes that iterate through the trie will eventually need to be smarter and 
 have methods to do things like skipping to the next row of results without 
 scanning every cell in between.  When that is accomplished it will also allow 
 much faster compactions because the full row key will not have to be compared 
 as often as it is now.
 Current code is on github.  The trie code is in a separate project than the 
 

[jira] [Commented] (HBASE-6974) Metric for blocked updates

2012-10-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475971#comment-13475971
 ] 

Lars Hofhansl commented on HBASE-6974:
--

In any case, #2 should be covered by a different metric, I think.
#1 (and #3) are caused by the IO system not being able to keep up. #2 is caused 
by insufficient memory (which may also in part stem from the IO system being 
too slow, but it also may have other reasons).


 Metric for blocked updates
 --

 Key: HBASE-6974
 URL: https://issues.apache.org/jira/browse/HBASE-6974
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Michael Drzal
Priority: Critical
 Fix For: 0.94.3, 0.96.0


 When the disc subsystem cannot keep up with a sustained high write load, a 
 region will eventually block updates to throttle clients.
 (HRegion.checkResources).
 It would be nice to have a metric for this, so that these occurrences can be 
 tracked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-4676) Prefix Compression - Trie data block encoding

2012-10-14 Thread Matt Corgan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Corgan updated HBASE-4676:
---

Affects Version/s: (was: 0.90.6)
   0.96.0

 Prefix Compression - Trie data block encoding
 -

 Key: HBASE-4676
 URL: https://issues.apache.org/jira/browse/HBASE-4676
 Project: HBase
  Issue Type: New Feature
  Components: io, Performance, regionserver
Affects Versions: 0.96.0
Reporter: Matt Corgan
Assignee: Matt Corgan
 Attachments: HBASE-4676-0.94-v1.patch, 
 HBASE-4676-prefix-tree-trunk-v1.patch, hbase-prefix-trie-0.1.jar, 
 PrefixTrie_Format_v1.pdf, PrefixTrie_Performance_v1.pdf, SeeksPerSec by 
 blockSize.png


 The HBase data block format has room for 2 significant improvements for 
 applications that have high block cache hit ratios.  
 First, there is no prefix compression, and the current KeyValue format is 
 somewhat metadata heavy, so there can be tremendous memory bloat for many 
 common data layouts, specifically those with long keys and short values.
 Second, there is no random access to KeyValues inside data blocks.  This 
 means that every time you double the datablock size, average seek time (or 
 average cpu consumption) goes up by a factor of 2.  The standard 64KB block 
 size is ~10x slower for random seeks than a 4KB block size, but block sizes 
 as small as 4KB cause problems elsewhere.  Using block sizes of 256KB or 1MB 
 or more may be more efficient from a disk access and block-cache perspective 
 in many big-data applications, but doing so is infeasible from a random seek 
 perspective.
 The PrefixTrie block encoding format attempts to solve both of these 
 problems.  Some features:
 * trie format for row key encoding completely eliminates duplicate row keys 
 and encodes similar row keys into a standard trie structure which also saves 
 a lot of space
 * the column family is currently stored once at the beginning of each block.  
 this could easily be modified to allow multiple family names per block
 * all qualifiers in the block are stored in their own trie format which 
 caters nicely to wide rows.  duplicate qualifiers between rows are eliminated. 
  the size of this trie determines the width of the block's qualifier 
 fixed-width-int
 * the minimum timestamp is stored at the beginning of the block, and deltas 
 are calculated from that.  the maximum delta determines the width of the 
 block's timestamp fixed-width-int
 The block is structured with metadata at the beginning, then a section for 
 the row trie, then the column trie, then the timestamp deltas, and then 
 all the values.  Most work is done in the row trie, where every leaf node 
 (corresponding to a row) contains a list of offsets/references corresponding 
 to the cells in that row.  Each cell is fixed-width to enable binary 
 searching and is represented by [1 byte operationType, X bytes qualifier 
 offset, X bytes timestamp delta offset].
 If all operation types are the same for a block, there will be zero per-cell 
 overhead.  Same for timestamps.  Same for qualifiers when I get a chance.  
 So, the compression aspect is very strong, but makes a few small sacrifices 
 on VarInt size to enable faster binary searches in trie fan-out nodes.
 A more compressed but slower version might build on this by also applying 
 further (suffix, etc) compression on the trie nodes at the cost of slower 
 write speed.  Even further compression could be obtained by using all VInts 
 instead of FInts with a sacrifice on random seek speed (though not huge).
 One current drawback is the current write speed.  While programmed with good 
 constructs like TreeMaps, ByteBuffers, binary searches, etc, it's not 
 programmed with the same level of optimization as the read path.  Work will 
 need to be done to optimize the data structures used for encoding and could 
 probably show a 10x increase.  It will still be slower than delta encoding, 
 but with a much higher decode speed.  I have not yet created a thorough 
 benchmark for write speed nor sequential read speed.
 Though the trie is reaching a point where it is internally very efficient 
 (probably within half or a quarter of its max read speed) the way that hbase 
 currently uses it is far from optimal.  The KeyValueScanner and related 
 classes that iterate through the trie will eventually need to be smarter and 
 have methods to do things like skipping to the next row of results without 
 scanning every cell in between.  When that is accomplished it will also allow 
 much faster compactions because the full row key will not have to be compared 
 as often as it is now.
 Current code is on github.  The trie code is in a separate project than the 
 slightly modified hbase.  There is an hbase project there as well with 

[jira] [Commented] (HBASE-6974) Metric for blocked updates

2012-10-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475975#comment-13475975
 ] 

Lars Hofhansl commented on HBASE-6974:
--

#2 is checked in MemStoreFlusher.reclaimMemStoreMemory. Currently there is not 
even a log message when that happens :(


 Metric for blocked updates
 --

 Key: HBASE-6974
 URL: https://issues.apache.org/jira/browse/HBASE-6974
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Michael Drzal
Priority: Critical
 Fix For: 0.94.3, 0.96.0


 When the disc subsystem cannot keep up with a sustained high write load, a 
 region will eventually block updates to throttle clients.
 (HRegion.checkResources).
 It would be nice to have a metric for this, so that these occurrences can be 
 tracked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6974) Metric for blocked updates

2012-10-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475976#comment-13475976
 ] 

Lars Hofhansl commented on HBASE-6974:
--

And while we're at it, let's remove the synchronized from 
reclaimMemStoreMemory... Looks like this is a leftover from the past.

 Metric for blocked updates
 --

 Key: HBASE-6974
 URL: https://issues.apache.org/jira/browse/HBASE-6974
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Michael Drzal
Priority: Critical
 Fix For: 0.94.3, 0.96.0


 When the disc subsystem cannot keep up with a sustained high write load, a 
 region will eventually block updates to throttle clients.
 (HRegion.checkResources).
 It would be nice to have a metric for this, so that these occurrences can be 
 tracked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6929) Publish Hbase 0.94 artifacts build against hadoop-2.0

2012-10-14 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475978#comment-13475978
 ] 

stack commented on HBASE-6929:
--

I can add hadoop2 to the name of the target artifact but not to the test jar.  
When maven-jar-plugin makes the jar, it uses a hard-coded 'tests' classifier.  
I cannot override it w/o changing plugin code (adding a classifier to 
maven-jar-plugin is just ignored).

I tried various other things.  I tried getting build-helper-maven-plugin to 
attach the test jar at package time but only seemed to end up doubling the test 
jars installed (a variant of this suggestion 
http://stackoverflow.com/questions/8499266/maven-deploy-source-classifiers).  
They still didn't have the right name on install.  I think the maven-jar-plugin 
attaches the test jar before I can intercede later w/ the 
build-helper-maven-plugin.

I tried setting the finalName to hadoop2 when we are using the hadoop2 profile. 
 This gets pretty far.  See below:

{code}
[INFO] --- maven-install-plugin:2.3.1:install (default-install) @ hbase ---
[INFO] Installing 
/Users/stack/checkouts/0.94/target/hbase-0.94.3-SNAPSHOT-hadoop2.jar to 
/Users/stack/.m2/repository/org/apache/hbase/hbase/0.94.3-SNAPSHOT/hbase-0.94.3-SNAPSHOT.jar
[INFO] Installing /Users/stack/checkouts/0.94/pom.xml to 
/Users/stack/.m2/repository/org/apache/hbase/hbase/0.94.3-SNAPSHOT/hbase-0.94.3-SNAPSHOT.pom
[INFO] Installing 
/Users/stack/checkouts/0.94/target/hbase-0.94.3-SNAPSHOT-hadoop2-tests.jar to 
/Users/stack/.m2/repository/org/apache/hbase/hbase/0.94.3-SNAPSHOT/hbase-0.94.3-SNAPSHOT-tests.jar
[INFO] Installing 
/Users/stack/checkouts/0.94/target/hbase-0.94.3-SNAPSHOT-hadoop2-sources.jar to 
/Users/stack/.m2/repository/org/apache/hbase/hbase/0.94.3-SNAPSHOT/hbase-0.94.3-SNAPSHOT-sources.jar
{code}

I cannot get install, though, to write the local repo w/ the names I told it 
to use.  Looking at the install plugin, it uses maven itself to copy artifacts 
(the above logging is the output of a repository 'event' as artifacts are 
copied in).  I cannot put config on the install plugin to have it use my 
finalName rather than the one it composes from base pom attributes.

Anyone else have a suggestion?

Is the test jar really needed?  The hadoop1 built one doesn't work w/ hadoop2?

 Publish Hbase 0.94 artifacts build against hadoop-2.0
 -

 Key: HBASE-6929
 URL: https://issues.apache.org/jira/browse/HBASE-6929
 Project: HBase
  Issue Type: Task
  Components: build
Affects Versions: 0.94.2
Reporter: Enis Soztutar
 Attachments: 6929.txt


 Downstream projects (flume, hive, pig, etc) depends on hbase, but since the 
 hbase binaries build with hadoop-2.0 are not pushed to maven, they cannot 
 depend on them. AFAIK, hadoop 1 and 2 are not binary compatible, so we should 
 also push hbase jars build with hadoop2.0 profile into maven, possibly with 
 version string like 0.94.2-hadoop2.0. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6929) Publish Hbase 0.94 artifacts build against hadoop-2.0

2012-10-14 Thread Lars Hofhansl (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475979#comment-13475979
 ] 

Lars Hofhansl commented on HBASE-6929:
--

[~jesse_yates] Any ideas, Mr. build expert? :)

 Publish Hbase 0.94 artifacts build against hadoop-2.0
 -

 Key: HBASE-6929
 URL: https://issues.apache.org/jira/browse/HBASE-6929
 Project: HBase
  Issue Type: Task
  Components: build
Affects Versions: 0.94.2
Reporter: Enis Soztutar
 Attachments: 6929.txt


 Downstream projects (flume, hive, pig, etc) depends on hbase, but since the 
 hbase binaries build with hadoop-2.0 are not pushed to maven, they cannot 
 depend on them. AFAIK, hadoop 1 and 2 are not binary compatible, so we should 
 also push hbase jars build with hadoop2.0 profile into maven, possibly with 
 version string like 0.94.2-hadoop2.0. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6974) Metric for blocked updates

2012-10-14 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6974?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475981#comment-13475981
 ] 

Anoop Sam John commented on HBASE-6974:
---

Yes Lars, you are correct; the reasons for #1 (and #3) and #2 will normally be 
different... And a separate metric would be better. In some way the user should 
be able to know why the updates are blocked. :)

 Metric for blocked updates
 --

 Key: HBASE-6974
 URL: https://issues.apache.org/jira/browse/HBASE-6974
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
Assignee: Michael Drzal
Priority: Critical
 Fix For: 0.94.3, 0.96.0


 When the disc subsystem cannot keep up with a sustained high write load, a 
 region will eventually block updates to throttle clients.
 (HRegion.checkResources).
 It would be nice to have a metric for this, so that these occurrences can be 
 tracked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6942) Endpoint implementation for bulk delete rows

2012-10-14 Thread Anoop Sam John (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6942?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475985#comment-13475985
 ] 

Anoop Sam John commented on HBASE-6942:
---

Yes Lars, for simplicity it is better to control everything via the Scan. For 
the user, specifying it this way would also be easy.  Let me try it out with 
code. I will also try passing everything via Scan#setAttribute().
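
A rough sketch of what driving everything from the Scan could look like (the attribute keys and their encoding are placeholders, not a finalized contract; the endpoint would read them back with getAttribute()):

{code}
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanAttributeSketch {
  /** Pack the extra delete parameters into Scan attributes so only the Scan travels. */
  static Scan buildScanWithDeleteSpec() {
    Scan scan = new Scan();
    scan.addFamily(Bytes.toBytes("cf1"));
    scan.setAttribute("bulkdelete.type", Bytes.toBytes("FAMILY"));   // hypothetical key
    scan.setAttribute("bulkdelete.timestamp", Bytes.toBytes(42L));   // hypothetical key
    return scan;
  }
}
{code}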

 Endpoint implementation for bulk delete rows
 

 Key: HBASE-6942
 URL: https://issues.apache.org/jira/browse/HBASE-6942
 Project: HBase
  Issue Type: Improvement
  Components: Coprocessors, Performance
Reporter: Anoop Sam John
Assignee: Anoop Sam John
 Fix For: 0.94.3, 0.96.0

 Attachments: HBASE-6942.patch, HBASE-6942_V2.patch, 
 HBASE-6942_V3.patch, HBASE-6942_V4.patch


 We can provide an end point implementation for doing a bulk deletion of 
 rows(based on a scan) at the server side. This can reduce the time taken for 
 such an operation as right now it need to do a scan to client and issue 
 delete(s) using rowkeys.
 Query like  delete from table1 where...

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-4676) Prefix Compression - Trie data block encoding

2012-10-14 Thread Matt Corgan (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matt Corgan updated HBASE-4676:
---

Status: Patch Available  (was: Open)

 Prefix Compression - Trie data block encoding
 -

 Key: HBASE-4676
 URL: https://issues.apache.org/jira/browse/HBASE-4676
 Project: HBase
  Issue Type: New Feature
  Components: io, Performance, regionserver
Affects Versions: 0.96.0
Reporter: Matt Corgan
Assignee: Matt Corgan
 Attachments: HBASE-4676-0.94-v1.patch, 
 HBASE-4676-prefix-tree-trunk-v1.patch, hbase-prefix-trie-0.1.jar, 
 PrefixTrie_Format_v1.pdf, PrefixTrie_Performance_v1.pdf, SeeksPerSec by 
 blockSize.png


 The HBase data block format has room for 2 significant improvements for 
 applications that have high block cache hit ratios.  
 First, there is no prefix compression, and the current KeyValue format is 
 somewhat metadata heavy, so there can be tremendous memory bloat for many 
 common data layouts, specifically those with long keys and short values.
 Second, there is no random access to KeyValues inside data blocks.  This 
 means that every time you double the datablock size, average seek time (or 
 average cpu consumption) goes up by a factor of 2.  The standard 64KB block 
 size is ~10x slower for random seeks than a 4KB block size, but block sizes 
 as small as 4KB cause problems elsewhere.  Using block sizes of 256KB or 1MB 
 or more may be more efficient from a disk access and block-cache perspective 
 in many big-data applications, but doing so is infeasible from a random seek 
 perspective.
 The PrefixTrie block encoding format attempts to solve both of these 
 problems.  Some features:
 * trie format for row key encoding completely eliminates duplicate row keys 
 and encodes similar row keys into a standard trie structure which also saves 
 a lot of space
 * the column family is currently stored once at the beginning of each block.  
 this could easily be modified to allow multiple family names per block
 * all qualifiers in the block are stored in their own trie format which 
 caters nicely to wide rows.  duplicate qualifiers between rows are eliminated. 
  the size of this trie determines the width of the block's qualifier 
 fixed-width-int
 * the minimum timestamp is stored at the beginning of the block, and deltas 
 are calculated from that.  the maximum delta determines the width of the 
 block's timestamp fixed-width-int
 The block is structured with metadata at the beginning, then a section for 
 the row trie, then the column trie, then the timestamp deltas, and then 
 all the values.  Most work is done in the row trie, where every leaf node 
 (corresponding to a row) contains a list of offsets/references corresponding 
 to the cells in that row.  Each cell is fixed-width to enable binary 
 searching and is represented by [1 byte operationType, X bytes qualifier 
 offset, X bytes timestamp delta offset].
 If all operation types are the same for a block, there will be zero per-cell 
 overhead.  Same for timestamps.  Same for qualifiers when I get a chance.  
 So, the compression aspect is very strong, but makes a few small sacrifices 
 on VarInt size to enable faster binary searches in trie fan-out nodes.
 A more compressed but slower version might build on this by also applying 
 further (suffix, etc) compression on the trie nodes at the cost of slower 
 write speed.  Even further compression could be obtained by using all VInts 
 instead of FInts with a sacrifice on random seek speed (though not huge).
 One current drawback is the current write speed.  While programmed with good 
 constructs like TreeMaps, ByteBuffers, binary searches, etc, it's not 
 programmed with the same level of optimization as the read path.  Work will 
 need to be done to optimize the data structures used for encoding and could 
 probably show a 10x increase.  It will still be slower than delta encoding, 
 but with a much higher decode speed.  I have not yet created a thorough 
 benchmark for write speed nor sequential read speed.
 Though the trie is reaching a point where it is internally very efficient 
 (probably within half or a quarter of its max read speed) the way that hbase 
 currently uses it is far from optimal.  The KeyValueScanner and related 
 classes that iterate through the trie will eventually need to be smarter and 
 have methods to do things like skipping to the next row of results without 
 scanning every cell in between.  When that is accomplished it will also allow 
 much faster compactions because the full row key will not have to be compared 
 as often as it is now.
 Current code is on github.  The trie code is in a separate project than the 
 slightly modified hbase.  There is an hbase project there as well with the 
 DeltaEncoding patch 

[jira] [Commented] (HBASE-6929) Publish Hbase 0.94 artifacts build against hadoop-2.0

2012-10-14 Thread Jonathan Hsieh (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6929?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475986#comment-13475986
 ] 

Jonathan Hsieh commented on HBASE-6929:
---

Since I believe the minicluster code is in the test package, I'm fairly sure 
any project that depends on this hbase+hadoop-2.0 build (flume and sqoop, to 
name a few) will want it.

Maybe look at how bigtop deals with the different versions? 

 Publish Hbase 0.94 artifacts build against hadoop-2.0
 -

 Key: HBASE-6929
 URL: https://issues.apache.org/jira/browse/HBASE-6929
 Project: HBase
  Issue Type: Task
  Components: build
Affects Versions: 0.94.2
Reporter: Enis Soztutar
 Attachments: 6929.txt


 Downstream projects (flume, hive, pig, etc) depends on hbase, but since the 
 hbase binaries build with hadoop-2.0 are not pushed to maven, they cannot 
 depend on them. AFAIK, hadoop 1 and 2 are not binary compatible, so we should 
 also push hbase jars build with hadoop2.0 profile into maven, possibly with 
 version string like 0.94.2-hadoop2.0. 

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6962) Upgrade hadoop 1 dependency to hadoop 1.1

2012-10-14 Thread Hadoop QA (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6962?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13475989#comment-13475989
 ] 

Hadoop QA commented on HBASE-6962:
--

{color:red}-1 overall{color}.  Here are the results of testing the latest 
attachment 
  http://issues.apache.org/jira/secure/attachment/12549105/6962.txt
  against trunk revision .

{color:green}+1 @author{color}.  The patch does not contain any @author 
tags.

{color:red}-1 tests included{color}.  The patch doesn't appear to include 
any new or modified tests.
Please justify why no new tests are needed for this 
patch.
Also please list what manual steps were performed to 
verify this patch.

{color:green}+1 hadoop2.0{color}.  The patch compiles against the hadoop 
2.0 profile.

{color:red}-1 javadoc{color}.  The javadoc tool appears to have generated 
82 warning messages.

{color:green}+1 javac{color}.  The applied patch does not increase the 
total number of javac compiler warnings.

{color:red}-1 findbugs{color}.  The patch appears to introduce 5 new 
Findbugs (version 1.3.9) warnings.

{color:green}+1 release audit{color}.  The applied patch does not increase 
the total number of release audit warnings.

 {color:red}-1 core tests{color}.  The patch failed these unit tests:
   org.apache.hadoop.hbase.client.TestMultiParallel
  
org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient

Test results: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3051//testReport/
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3051//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop2-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3051//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-server.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3051//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop1-compat.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3051//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-common.html
Findbugs warnings: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3051//artifact/trunk/patchprocess/newPatchFindbugsWarningshbase-hadoop-compat.html
Console output: 
https://builds.apache.org/job/PreCommit-HBASE-Build/3051//console

This message is automatically generated.

 Upgrade hadoop 1 dependency to hadoop 1.1
 -

 Key: HBASE-6962
 URL: https://issues.apache.org/jira/browse/HBASE-6962
 Project: HBase
  Issue Type: Bug
 Environment: hadoop 1.1 contains multiple important fixes, including 
 HDFS-3703
Reporter: Ted Yu
Assignee: Ted Yu
 Attachments: 6962.txt




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira