[jira] [Commented] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog

2011-10-08 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123403#comment-13123403
 ] 

jirapos...@reviews.apache.org commented on HBASE-4528:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2141/
---

(Updated 2011-10-08 07:50:25.069515)


Review request for hbase.


Changes
---

HLog.sync() does not throw an exception if it encounters an HDFS error; 
instead it triggers a log roll as usual. If the put code encounters an error 
while syncing to HDFS, it rolls back the change to the memstore and throws 
an exception to the client.


Summary
---

This changes the multiPut operation so that the sync to the WAL occurs outside 
the rowlock.
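
As a rough illustration of that ordering (not the patch itself), the sketch below takes a row lock, appends to a log and writes to an in-memory store, releases the lock, and only then syncs. If the sync fails it removes its own write and rethrows. The names Wal, ToyRegion and memstoreSize are hypothetical stand-ins, not HBase classes, and the rwcc bookkeeping of the real code is left out.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

/** Hypothetical stand-in for the write-ahead log; not the real HLog API. */
interface Wal {
    void append(String edit);        // buffer the edit without syncing
    void sync() throws IOException;  // make buffered edits durable
}

/** Toy region with a single hot row, used only to show the lock/sync ordering. */
class ToyRegion {
    private final ReentrantLock rowLock = new ReentrantLock();
    private final List<String> memstore = new ArrayList<>();
    private final Wal wal;

    ToyRegion(Wal wal) {
        this.wal = wal;
    }

    void put(String edit) throws IOException {
        rowLock.lock();
        try {
            wal.append(edit);                  // append only, no sync yet
            synchronized (memstore) {
                memstore.add(edit);
            }
        } finally {
            rowLock.unlock();                  // released before the slow sync
        }
        try {
            wal.sync();                        // other writers may hold the row lock meanwhile
        } catch (IOException e) {
            synchronized (memstore) {
                memstore.remove(edit);         // roll back this writer's edit
            }
            throw e;                           // surface the failure to the caller
        }
    }

    int memstoreSize() {
        synchronized (memstore) {
            return memstore.size();
        }
    }
}
```

Because the lock is no longer held across sync(), several writers to the same row can have their edits covered by one sync. Note that this toy rollback removes by value only, so it cannot tell its own edit from an identical one written by another thread.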

This enhancement is done only to HRegion.mut(Put[]) because this is the only 
method that gets invoked from an application. The HRegion.put(Put) is used only 
by unit tests and should possibly be deprecated.

I have attached a unit test. I have not yet run all unit tests, but early 
feedback on this patch will be very helpful.


This addresses bug HBASE-4528.
https://issues.apache.org/jira/browse/HBASE-4528


Diffs (updated)
-

  /src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1180314 
  /src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 1180314 
  /src/main/java/org/apache/hadoop/hbase/regionserver/ReadWriteConsistencyControl.java 1180314 
  /src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1180314 
  /src/main/java/org/apache/hadoop/hbase/regionserver/StoreFlusher.java 1180314 
  /src/test/java/org/apache/hadoop/hbase/regionserver/TestParallelPut.java PRE-CREATION 
  /src/test/java/org/apache/hadoop/hbase/regionserver/TestStore.java 1180314 
  /src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestLogRolling.java 1180314 

Diff: https://reviews.apache.org/r/2141/diff


Testing
---

I have not yet run the full suite of unit tests.


Thanks,

Dhruba



 The put operation can release the rowlock before sync-ing the Hlog
 --

 Key: HBASE-4528
 URL: https://issues.apache.org/jira/browse/HBASE-4528
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Attachments: appendNoSyncPut1.txt, appendNoSyncPut2.txt, 
 appendNoSyncPut3.txt, appendNoSyncPut4.txt


 This allows for better throughput when there are hot rows. A single row 
 update improves from 100 puts/sec/server to 5000 puts/sec/server.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog

2011-10-08 Thread dhruba borthakur (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123404#comment-13123404
 ] 

dhruba borthakur commented on HBASE-4528:
-

It appears to me that deleting kvs from the memstore (as part of the 
transaction rollback) should be OK. It is done without the rowlock because the 
kvs are not yet visible to scanners: the rwcc has not yet been advanced. 

One doubtful case is when thread A is trying to insert a kv via a Put call. 
Thread A has inserted into the memstore but has failed to sync to the WAL. Now 
suppose another thread B has concurrently inserted exactly the same kv into the 
memstore and committed its transaction by syncing to the WAL successfully. 
Thread A then tries to roll back its failed transaction and removes the kv from 
the memstore. Is this scenario possible? If so, should the rollback code in 
thread A delete a kv from the memstore only if kv.memstoreTS matches thread A's 
own?
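
To make the question concrete, here is a minimal sketch of a rollback that removes an entry only when its memstoreTS equals the write number of the failed transaction. ToyKv and ToyMemstore are hypothetical simplifications, not the real KeyValue or MemStore. Whether two otherwise identical kvs can coexist in the real MemStore depends on how its comparator treats memstoreTS, which this sketch does not model.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical, simplified kv: content plus the write number that inserted it. */
final class ToyKv {
    final String key;
    final String value;
    final long memstoreTS;

    ToyKv(String key, String value, long memstoreTS) {
        this.key = key;
        this.value = value;
        this.memstoreTS = memstoreTS;
    }
}

/** Toy memstore that keeps every inserted kv in a list. */
class ToyMemstore {
    private final List<ToyKv> kvs = new ArrayList<>();

    synchronized void add(ToyKv kv) {
        kvs.add(kv);
    }

    /**
     * Removes entries with the given content that were inserted under
     * writeNumber, leaving identical entries from other writers untouched.
     * Returns the number of entries removed.
     */
    synchronized int rollback(String key, String value, long writeNumber) {
        int removed = 0;
        for (Iterator<ToyKv> it = kvs.iterator(); it.hasNext();) {
            ToyKv kv = it.next();
            if (kv.memstoreTS == writeNumber
                    && kv.key.equals(key)
                    && kv.value.equals(value)) {
                it.remove();
                removed++;
            }
        }
        return removed;
    }

    synchronized int size() {
        return kvs.size();
    }
}
```

With this check, thread A rolling back write number 7 leaves thread B's identical kv, inserted under write number 8, in place.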





[jira] [Commented] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog

2011-10-08 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123500#comment-13123500
 ] 

jirapos...@reviews.apache.org commented on HBASE-4528:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2141/#review2463
---


Nice work.
See comments below which aim to solve the case Dhruba raised @ 08/Oct/11 08:01


/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
https://reviews.apache.org/r/2141/#comment5573

We should pass w.getWriteNumber() to this method and use it to verify that the 
memstoreTS of each kv matches the writeNumber.
w wouldn't be null in this case since step 8 shouldn't have been executed 
yet.



/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
https://reviews.apache.org/r/2141/#comment5569

This line should be moved into an else block for the if block above.



/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
https://reviews.apache.org/r/2141/#comment5572

We should verify that kv.getMemstoreTS() matches w.getWriteNumber() before 
removing from store.



/src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java
https://reviews.apache.org/r/2141/#comment5570

Should be 'remove a KeyValue'


- Ted






[jira] [Commented] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog

2011-10-08 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123503#comment-13123503
 ] 

jirapos...@reviews.apache.org commented on HBASE-4528:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2141/#review2464
---


+1

- ramkrishna






[jira] [Commented] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog

2011-10-08 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123505#comment-13123505
 ] 

Ted Yu commented on HBASE-4528:
---

Minor suggestion for the new MemStore.remove() method:
Since it is called only in case of error recovery, I feel a better name may be 
rollback().





[jira] [Commented] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)

2011-10-08 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123510#comment-13123510
 ] 

jirapos...@reviews.apache.org commented on HBASE-4218:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2308/#review2466
---


I ran unit tests with Jacek's patch. 1199 unit tests passed. The only one that 
failed was ServerCustomProtocol, which also seems to fail sporadically without 
the patch. Without the patch, there are only 1028 tests, so the patch is 
apparently very well unit-tested.

- Mikhail


On 2011-10-08 00:51:01, Jacek Migdal wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2308/
bq.  ---
bq.  
bq.  (Updated 2011-10-08 00:51:01)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Delta encoding for key values.
bq.  
bq.  
bq.  This addresses bug HBASE-4218.
bq.  https://issues.apache.org/jira/browse/HBASE-4218
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/HColumnDescriptor.java
 1180113 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/KeyValue.java
 1180113 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java
 1180113 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/BitsetKeyDeltaEncoder.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/BufferedDeltaEncoder.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/CompressionState.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/CopyKeyDeltaEncoder.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DeltaEncodedBlock.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DeltaEncoder.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DeltaEncoderAlgorithms.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DeltaEncoderToSmallBufferException.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/DiffKeyDeltaEncoder.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/FastDiffDeltaEncoder.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/deltaencoder/PrefixKeyDeltaEncoder.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java
 1180113 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java
 1180113 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockDeltaEncoder.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/BlockType.java
 1180113 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/EmptyBlockDeltaEncoder.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java
 1180113 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlock.java
 1180113 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockDeltaEncoder.java
 PRE-CREATION 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java
 1180113 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java
 1180113 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java
 1180113 
bq.
http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java
 1180113 
bq.

[jira] [Commented] (HBASE-4528) The put operation can release the rowlock before sync-ing the Hlog

2011-10-08 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4528?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123513#comment-13123513
 ] 

jirapos...@reviews.apache.org commented on HBASE-4528:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2141/#review2467
---



/src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java
https://reviews.apache.org/r/2141/#comment5575

If walSyncSuccessful is true, rwcc.completeMemstoreInsert(w) on line 1852 
would have been called.
If walSyncSuccessful is false, removeKeysFromMemstore() on line 1878 rolls 
back the changes to memstore.
Looks like this line is no longer needed.


- Ted






[jira] [Updated] (HBASE-1744) Thrift server to match the new java api.

2011-10-08 Thread Tim Sell (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Sell updated HBASE-1744:


Status: Patch Available  (was: Open)

 Thrift server to match the new java api.
 

 Key: HBASE-1744
 URL: https://issues.apache.org/jira/browse/HBASE-1744
 Project: HBase
  Issue Type: Improvement
  Components: thrift
Reporter: Tim Sell
Assignee: Tim Sell
Priority: Critical
 Fix For: 0.94.0

 Attachments: HBASE-1744.2.patch, HBASE-1744.3.patch, 
 HBASE-1744.4.patch, HBASE-1744.5.patch, HBASE-1744.6.patch, 
 HBASE-1744.7.patch, HBASE-1744.preview.1.patch, thriftexperiment.patch


 The mutateRows, etc. API is a little confusing compared to the new, cleaner Java 
 client.
 Thinking of ways to make a thrift client that is just as elegant. Something 
 like:
 void put(1:Bytes table, 2:TPut put) throws (1:IOError io)
 with:
 struct TColumn {
   1:Bytes family,
   2:Bytes qualifier,
   3:i64 timestamp
 }
 struct TPut {
   1:Bytes row,
   2:map<TColumn, Bytes> values
 }
 This creates more verbose rpc than if the columns in TPut were just 
 map<Bytes, map<Bytes, Bytes>>, but that is harder to fit timestamps into and 
 still be intuitive from, say, Python.
 Presumably the goal of a thrift gateway is to be easy first.





[jira] [Updated] (HBASE-1744) Thrift server to match the new java api.

2011-10-08 Thread Tim Sell (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Sell updated HBASE-1744:


Attachment: HBASE-1744.7.patch

new patch, added licenses and trivial python demo.





[jira] [Commented] (HBASE-1744) Thrift server to match the new java api.

2011-10-08 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1744?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123531#comment-13123531
 ] 

Ted Yu commented on HBASE-1744:
---

+1 on patch v7.





[jira] [Created] (HBASE-4557) Do something better than UnknownScannerException

2011-10-08 Thread Jean-Daniel Cryans (Created) (JIRA)
Do something better than UnknownScannerException


 Key: HBASE-4557
 URL: https://issues.apache.org/jira/browse/HBASE-4557
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Jean-Daniel Cryans
 Fix For: 0.94.0


UnknownScannerException is a plague; there's no reason we should not at least 
try to create a new scanner. If that fails again, maybe try automatically 
setting a lower scanner caching (if possible and with proper logging) and 
retry.
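
One way to picture that policy is the sketch below: the first failure simply reopens with the same caching, and each later failure halves the caching (with a log line) before retrying. ScannerLostException and ScanRetry are hypothetical names used for illustration only. The attempt callback stands in for "open a new scanner with this caching and run the scan"; nothing here is the HBase client API.

```java
import java.util.function.IntFunction;

/** Hypothetical stand-in for UnknownScannerException. */
class ScannerLostException extends RuntimeException {
    ScannerLostException(String message) {
        super(message);
    }
}

final class ScanRetry {
    private ScanRetry() {}

    /**
     * Runs attempt(caching). On the first failure, retries with the same
     * caching; on every later failure, halves the caching (never below 1)
     * before retrying. Gives up after maxAttempts and rethrows the last error.
     */
    static <T> T withRetry(IntFunction<T> attempt, int caching, int maxAttempts) {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        ScannerLostException last = null;
        int current = caching;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return attempt.apply(current);
            } catch (ScannerLostException e) {
                last = e;
                if (i >= 1 && current > 1) {
                    int lowered = current / 2;
                    System.err.println("scanner lost again; lowering caching from "
                            + current + " to " + lowered);
                    current = lowered;
                }
            }
        }
        throw last;
    }
}
```

A caller would pass its own scan routine as the callback. In a real client the retry would also need to resume from the last row returned, which this sketch does not cover.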





[jira] [Updated] (HBASE-4102) atomicAppend: A put that appends to the latest version of a cell; i.e. reads current value then adds the bytes offered by the client to the tail and writes out a new entr

2011-10-08 Thread Lars Hofhansl (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Lars Hofhansl updated HBASE-4102:
-

Assignee: Lars Hofhansl

This seems like a simple useful addition.
Should this be in 0.92 or 0.94?

 atomicAppend: A put that appends to the latest version of a cell; i.e. reads 
 current value then adds the bytes offered by the client to the tail and 
 writes out a new entry
 ---

 Key: HBASE-4102
 URL: https://issues.apache.org/jira/browse/HBASE-4102
 Project: HBase
  Issue Type: New Feature
Reporter: stack
Assignee: Lars Hofhansl

 It's come up a few times that clients want to add to an existing cell rather 
 than make a new cell each time.  At our place, the frontend keeps a list of 
 urls a user has visited -- their md5s -- and updates it as the user progresses.  
 Rather than read, modify client-side, then write the new value back to hbase, 
 it would be sweet if we could do it all in one operation in the hbase server.  
 TSDB aims to be space efficient.  Rather than pay the cost of the KV wrapper 
 per metric, it would rather have a KV for an interval and in this KV have a 
 value that is all the metrics for the period.
 It could be done as a coprocessor but this feels more like a fundamental 
 feature.
 Benoît suggests that atomicAppend take a flag to indicate whether or not the 
 client wants to see the resulting cell; often a client won't want to see the 
 result and in this case, why pay the price of formulating and delivering a 
 response that the client just drops?
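
The semantics being asked for can be sketched in a few lines. The example below uses an in-memory map in place of a region, with ToyCellStore and its method names invented for illustration; it is not a proposal for the actual HBase API. ConcurrentHashMap.compute performs the read-append-write as one atomic step per key, and the returnResult flag mirrors Benoît's suggestion.

```java
import java.util.Arrays;
import java.util.concurrent.ConcurrentHashMap;

/** Toy cell store: one byte[] value per cell name. */
class ToyCellStore {
    private final ConcurrentHashMap<String, byte[]> cells = new ConcurrentHashMap<>();

    /**
     * Atomically appends suffix to the current value of the cell, creating the
     * cell if it does not exist. Returns a copy of the new value only when
     * returnResult is true; otherwise returns null so no response is built.
     */
    byte[] atomicAppend(String cell, byte[] suffix, boolean returnResult) {
        byte[] updated = cells.compute(cell, (name, old) -> {
            if (old == null) {
                return suffix.clone();
            }
            byte[] merged = Arrays.copyOf(old, old.length + suffix.length);
            System.arraycopy(suffix, 0, merged, old.length, suffix.length);
            return merged;
        });
        return returnResult ? updated.clone() : null;
    }

    /** Returns a copy of the cell's value, or null if the cell is absent. */
    byte[] get(String cell) {
        byte[] value = cells.get(cell);
        return value == null ? null : value.clone();
    }
}
```

The caller never reads the old value itself, so there is no client-side read-modify-write race to guard against.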





[jira] [Commented] (HBASE-4553) The update of .tableinfo is not atomic; we remove then rename

2011-10-08 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123567#comment-13123567
 ] 

stack commented on HBASE-4553:
--

So, I see this in a failure up on jenkins in 
https://builds.apache.org/job/HBase-0.92/53/artifact/trunk/target/surefire-reports/org.apache.hadoop.hbase.avro.TestAvroServer-output.txt

I see that we updated .tableinfo and a check on the file's modtime at about the 
same time fails with filenotfound.

Todd, should we spin here till the file shows up?
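
For what "spinning" might look like, here is a small bounded-retry sketch. It uses java.nio.file on a local filesystem purely as an assumption for illustration (the real code would go through the Hadoop FileSystem API), and ModTimeReader is a hypothetical name. It retries the modtime read a fixed number of times while the file is missing, then gives up and rethrows.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.NoSuchFileException;
import java.nio.file.Path;
import java.nio.file.attribute.FileTime;

final class ModTimeReader {
    private ModTimeReader() {}

    /**
     * Reads the file's last-modified time, retrying while the file is absent
     * (for example, in the window between a remove and a rename). Sleeps
     * sleepMillis between attempts and rethrows after maxAttempts.
     */
    static FileTime lastModifiedWithRetry(Path file, int maxAttempts, long sleepMillis)
            throws IOException, InterruptedException {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        NoSuchFileException last = null;
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return Files.getLastModifiedTime(file);
            } catch (NoSuchFileException e) {
                last = e;
                if (i < maxAttempts - 1) {
                    Thread.sleep(sleepMillis);  // wait for the rename to land
                }
            }
        }
        throw last;
    }
}
```

Bounding the number of attempts keeps a genuinely missing file from turning into an endless loop.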

 The update of .tableinfo is not atomic; we remove then rename
 -

 Key: HBASE-4553
 URL: https://issues.apache.org/jira/browse/HBASE-4553
 Project: HBase
  Issue Type: Task
Reporter: stack

 This comes of HBASE-4547.  The rename in 0.20 hdfs fails if the file exists 
 already.  In 0.20+ it's better but there are still 'some' issues if there is an 
 existing reader when the file is renamed.  This issue is about fixing this 
 (though we depend on the fix first being in hdfs).





[jira] [Commented] (HBASE-4540) OpenedRegionHandler is not enforcing atomicity of the operation it is performing

2011-10-08 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4540?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123569#comment-13123569
 ] 

jirapos...@reviews.apache.org commented on HBASE-4540:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2251/#review2469
---

Ship it!


I'm good on commit.

Have some suggestions for future handler tests below.  I'm ok if we commit w/o 
addressing them here.

Nice fix Ram


http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java
https://reviews.apache.org/r/2251/#comment5578

We don't have this method already in our ZK* classes?



http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestOpenedRegionHandler.java
https://reviews.apache.org/r/2251/#comment5579

Do you have to spin up the cluster twice?   Could you do it once only in 
@BeforeClass and then shut it down in @AfterClass, so it's run only once?



http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestOpenedRegionHandler.java
https://reviews.apache.org/r/2251/#comment5580

Good test.

Would it be possible to test the handler without spinning up the cluster?  
See TestOpenRegionHandler over under regionserver.handler in tests -- they 
don't spin up a cluster, just zk.  Test can run faster if no dfs+hbase.  Not 
important.  For the future.


- Michael


On 2011-10-08 05:13:32, ramkrishna vasudevan wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2251/
bq.  ---
bq.  
bq.  (Updated 2011-10-08 05:13:32)
bq.  
bq.  
bq.  Review request for hbase, Ted Yu, Michael Stack, and Jonathan Gray.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Fix for handling HBASE-4539 and HBASE-4540.
bq.  Ran all the testcases.  Added one new testcase to verify OpenedRegionHandler scenarios.
bq.  Also addresses Ted's comments.
bq.  
bq.  
bq.  This addresses bug HBASE-4540.
bq.  https://issues.apache.org/jira/browse/HBASE-4540
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/AssignmentManager.java 1179945 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/master/handler/OpenedRegionHandler.java 1179945 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKAssign.java 1179945 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/zookeeper/ZKUtil.java 1179945 
bq.    http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/master/TestOpenedRegionHandler.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2251/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Yes
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  ramkrishna
bq.  
bq.



 OpenedRegionHandler is not enforcing atomicity of the operation it is 
 performing
 

 Key: HBASE-4540
 URL: https://issues.apache.org/jira/browse/HBASE-4540
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
 Attachments: HBASE-4540_1.patch


 - OpenedRegionHandler has not yet deleted the znode of the region R1 opened 
 by RS1.
 - RS1 goes down.
 - ServerShutdownHandler assigns the region R1 to RS2.
 - The znode of R1 is moved to OFFLINE state by the master, or to OPENING state by 
 RS2 if RS2 has started opening the region.
 - Now the first OpenedRegionHandler tries to delete the znode, thinking it's 
 in OPENED state, but fails.
 - Though it fails, it removes the node from RIT and adds RS1 as the owner of 
 R1 in the master's memory.
 - Now when RS2 completes opening the region, the master is not able to open 
 the region because the region has already been removed from RIT.
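
One way to read the sequence above is that the handler should touch RIT and the region's owner only when its delete of the znode succeeded against the state it expected. The following is a minimal sketch of that idea in plain Java, assuming a toy in-memory model: OpenedRegionModel, handleOpened, and the three maps are hypothetical names used for illustration, not HBase or ZooKeeper APIs, and this is not necessarily how the attached patch addresses the problem.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Toy model only: these maps stand in for ZooKeeper znodes and the master's
// in-memory state. None of these names are HBase or ZooKeeper APIs.
class OpenedRegionModel {
    // region name -> "STATE@server", standing in for the region's znode
    final Map<String, String> znodes = new ConcurrentHashMap<>();
    // regions in transition (RIT): region name -> transition state
    final Map<String, String> regionsInTransition = new ConcurrentHashMap<>();
    // region name -> owning server, as recorded in the master's memory
    final Map<String, String> owners = new ConcurrentHashMap<>();

    /** Returns true only if the znode was still OPENED by the given server. */
    boolean handleOpened(String region, String server) {
        // Compare-and-delete: removes the entry only if it still holds the
        // expected value, so a znode that moved to OFFLINE or OPENING survives.
        boolean deleted = znodes.remove(region, "OPENED@" + server);
        if (!deleted) {
            // The state moved on underneath us: leave RIT and owners untouched.
            return false;
        }
        regionsInTransition.remove(region);
        owners.put(region, server);
        return true;
    }
}
```

In this model, a stale handler for RS1 that finds the znode in OPENING state for RS2 returns false and leaves R1 in RIT, so the later open by RS2 can still be processed.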
 {code}
 Master
 ==
 2011-10-05 20:49:45,301 INFO 
 org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: Finished 
 processing of shutdown of linux146,60020,1317827727647
 2011-10-05 20:49:54,177 DEBUG org.apache.hadoop.hbase.master.HMaster: Not 
 running balancer because 1 region(s) in transition: 
 {3e69d628a8bd8e9b7c5e7a2a6e03aad9=t1,,1317827883842.3e69d628a8bd8e9b7c5e7a2a6e03aad9.
  state=PENDING_OPEN, ts=1317827985272, server=linux76,60020,1317827746847}
 2011-10-05 20:49:57,720 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, 

[jira] [Commented] (HBASE-4480) Testing script to simplfy local testing

2011-10-08 Thread Jesse Yates (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123570#comment-13123570
 ] 

Jesse Yates commented on HBASE-4480:


nitpick: I would prefer the options be labeled in the help as something like 
'-f FILE' rather than '-f=FILE' - it reads more clearly.

If you could put together an actual patch and update the docs that we can 
actually push into the code base, that would be great.

 Testing script to simplfy local testing
 ---

 Key: HBASE-4480
 URL: https://issues.apache.org/jira/browse/HBASE-4480
 Project: HBase
  Issue Type: Improvement
Reporter: Jesse Yates
Priority: Minor
  Labels: test
 Attachments: runtest.sh, runtest2.sh


 As mentioned by http://search-hadoop.com/m/r2Ab624ES3e and 
 http://search-hadoop.com/m/cZjDH1ykGIA it would be nice if we could have a 
 script that would handle more of the finer points of running/checking our 
 test suite.
 This script should:
 (1) Allow people to determine which tests are hanging/taking a long time to 
 run
 (2) Allow rerunning of particular tests to make sure it wasn't an artifact of 
 running the whole suite that caused the failure
 (3) Allow people to specify to run just unit tests or also integration tests 
 (essentially wrapping calls to 'maven test' and 'maven verify').
 This script should just be a convenience script - running tests directly from 
 maven should not be impacted.





[jira] [Commented] (HBASE-4102) atomicAppend: A put that appends to the latest version of a cell; i.e. reads current value then adds the bytes offered by the client to the tail and writes out a new en

2011-10-08 Thread Lars Hofhansl (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123572#comment-13123572
 ] 

Lars Hofhansl commented on HBASE-4102:
--

I have this working now.

But now I realized two things:
1. I modeled it after the old ICV. I assume we want something like the new 
Increment API.
2. Is this something that we even want to build into HBase? Or should a user 
implement this with a coprocessor endpoint? (It would be possible to do with a 
coprocessor, albeit not quite as efficiently, as an endpoint would have no access 
to the Stores.)


 atomicAppend: A put that appends to the latest version of a cell; i.e. reads 
 current value then adds the bytes offered by the client to the tail and 
 writes out a new entry
 ---

 Key: HBASE-4102
 URL: https://issues.apache.org/jira/browse/HBASE-4102
 Project: HBase
  Issue Type: New Feature
Reporter: stack
Assignee: Lars Hofhansl

 It's come up a few times that clients want to add to an existing cell rather 
 than make a new cell each time.  At our place, the frontend keeps a list of 
 URLs a user has visited -- their md5s -- and updates it as the user progresses.  
 Rather than read, modify client-side, then write the new value back to HBase, it 
 would be sweet if we could do it all in one operation in the HBase server.  TSDB 
 aims to be space efficient.  Rather than pay the cost of the KV wrapper per 
 metric, it would rather have one KV per interval, and in this KV have a value 
 that holds all the metrics for the period.
 It could be done as a coprocessor, but this feels more like a fundamental 
 feature.
 Benoît suggests that atomicAppend take a flag to indicate whether or not the 
 client wants to see the resulting cell; often a client won't want to see the 
 result, and in this case, why pay the price of formulating and delivering a 
 response that the client just drops?
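
The read-append-write operation described above, including the flag suggested by Benoît, can be sketched roughly as follows. This is a toy in-memory model under stated assumptions: AppendModel, atomicAppend, and the returnResult parameter are hypothetical names for illustration and are not the HBase client or region server API. A single lock stands in for whatever row-level locking the server would use.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Toy in-memory model; not the HBase client or region server API.
class AppendModel {
    private final Map<String, byte[]> cells = new HashMap<>();

    /**
     * Appends suffix to the tail of the cell's current value while holding one
     * lock, so the read-modify-write cannot interleave with another append.
     * Returns a copy of the new value when returnResult is true, otherwise
     * null, so no response payload is built for clients that would drop it.
     */
    synchronized byte[] atomicAppend(String cell, byte[] suffix, boolean returnResult) {
        byte[] current = cells.getOrDefault(cell, new byte[0]);
        // Copy the old value into a larger array, then add the new bytes at the tail.
        byte[] updated = Arrays.copyOf(current, current.length + suffix.length);
        System.arraycopy(suffix, 0, updated, current.length, suffix.length);
        cells.put(cell, updated);
        return returnResult ? updated.clone() : null;
    }
}
```

A caller that only wants the side effect passes false for returnResult and ignores the null return; a caller that needs the resulting cell passes true.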





[jira] [Work started] (HBASE-4102) atomicAppend: A put that appends to the latest version of a cell; i.e. reads current value then adds the bytes offered by the client to the tail and writes out a new

2011-10-08 Thread Lars Hofhansl (Work started) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-4102 started by Lars Hofhansl.

 atomicAppend: A put that appends to the latest version of a cell; i.e. reads 
 current value then adds the bytes offered by the client to the tail and 
 writes out a new entry
 ---

 Key: HBASE-4102
 URL: https://issues.apache.org/jira/browse/HBASE-4102
 Project: HBase
  Issue Type: New Feature
Reporter: stack
Assignee: Lars Hofhansl

 It's come up a few times that clients want to add to an existing cell rather 
 than make a new cell each time.  At our place, the frontend keeps a list of 
 URLs a user has visited -- their md5s -- and updates it as the user progresses.  
 Rather than read, modify client-side, then write the new value back to HBase, it 
 would be sweet if we could do it all in one operation in the HBase server.  TSDB 
 aims to be space efficient.  Rather than pay the cost of the KV wrapper per 
 metric, it would rather have one KV per interval, and in this KV have a value 
 that holds all the metrics for the period.
 It could be done as a coprocessor, but this feels more like a fundamental 
 feature.
 Benoît suggests that atomicAppend take a flag to indicate whether or not the 
 client wants to see the resulting cell; often a client won't want to see the 
 result, and in this case, why pay the price of formulating and delivering a 
 response that the client just drops?





[jira] [Commented] (HBASE-4480) Testing script to simplfy local testing

2011-10-08 Thread Jesse Yates (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123573#comment-13123573
 ] 

Jesse Yates commented on HBASE-4480:


Oh, and also adding an option for running clean before the tests would be nice. 
It's odd when I go to run a single test (with the top 20 slowest) and it prints 
out more than 1 result.

 Testing script to simplfy local testing
 ---

 Key: HBASE-4480
 URL: https://issues.apache.org/jira/browse/HBASE-4480
 Project: HBase
  Issue Type: Improvement
Reporter: Jesse Yates
Priority: Minor
  Labels: test
 Attachments: runtest.sh, runtest2.sh


 As mentioned by http://search-hadoop.com/m/r2Ab624ES3e and 
 http://search-hadoop.com/m/cZjDH1ykGIA it would be nice if we could have a 
 script that would handle more of the finer points of running/checking our 
 test suite.
 This script should:
 (1) Allow people to determine which tests are hanging/taking a long time to 
 run
 (2) Allow rerunning of particular tests to make sure it wasn't an artifact of 
 running the whole suite that caused the failure
 (3) Allow people to specify to run just unit tests or also integration tests 
 (essentially wrapping calls to 'maven test' and 'maven verify').
 This script should just be a convenience script - running tests directly from 
 maven should not be impacted.





[jira] [Commented] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)

2011-10-08 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123576#comment-13123576
 ] 

Ted Yu commented on HBASE-4218:
---

For BlockDeltaEncoder.afterBlockCache(), I am not sure if the following matches 
the logic:
{code}
  // Postcondition: if (isCompaction is set and onDisk is not NONR) or
  //inMemory is not set - don;t encode.
{code}

 Delta Encoding of KeyValues  (aka prefix compression)
 -

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
  Labels: compression
 Attachments: open-source.diff


 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general purpose algorithms.
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as to speed up seeks within HFileBlocks. It should 
 improve performance a lot if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when the value is a counter.
 Initial tests on real data (key length = ~ 90 bytes, value length = 8 bytes) 
 show that I could achieve a decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 It also has much better performance (20-80% faster decompression than 
 LZO). Moreover, it should allow far more efficient seeking, which should 
 improve performance a bit.
 It seems that simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase two important changes in design will be 
 needed:
 -solidify interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to uncompressed buffer in HFileBlock will have bad 
 performance
 -extend comparators to support comparison assuming that N first bytes are 
 equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression
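
The prefix-compression part of the description above can be illustrated with a small sketch: because keys are sorted, each key is stored as the number of leading characters it shares with the previous key plus the remaining suffix. This is an assumption-laden illustration, not the format in the attached diff: PrefixCodecSketch and Entry are hypothetical names, and keys are modeled as Java strings rather than the byte arrays HFile actually uses.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only; not the encoder from the attached patch.
class PrefixCodecSketch {
    /** One encoded key: shared-prefix length with the previous key, plus the rest. */
    static final class Entry {
        final int shared;
        final String suffix;

        Entry(int shared, String suffix) {
            this.shared = shared;
            this.suffix = suffix;
        }
    }

    /** Encodes keys in order; works best when the input is sorted. */
    static List<Entry> encode(List<String> sortedKeys) {
        List<Entry> out = new ArrayList<>();
        String prev = "";
        for (String key : sortedKeys) {
            int max = Math.min(prev.length(), key.length());
            int n = 0;
            // Count how many leading characters match the previous key.
            while (n < max && prev.charAt(n) == key.charAt(n)) {
                n++;
            }
            out.add(new Entry(n, key.substring(n)));
            prev = key;
        }
        return out;
    }

    /** Rebuilds the original keys by reusing the prefix of the previous key. */
    static List<String> decode(List<Entry> entries) {
        List<String> out = new ArrayList<>();
        String prev = "";
        for (Entry e : entries) {
            String key = prev.substring(0, e.shared) + e.suffix;
            out.add(key);
            prev = key;
        }
        return out;
    }
}
```

The shared count also hints at why comparisons can be cheaper: when two neighbouring entries are known to agree on their first N characters, a comparator can skip those N positions.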





[jira] [Commented] (HBASE-4218) Delta Encoding of KeyValues (aka prefix compression)

2011-10-08 Thread Ted Yu (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4218?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123597#comment-13123597
 ] 

Ted Yu commented on HBASE-4218:
---

EmptyBlockDeltaEncoder, CompressionState, BlockDeltaEncoder need license.

 Delta Encoding of KeyValues  (aka prefix compression)
 -

 Key: HBASE-4218
 URL: https://issues.apache.org/jira/browse/HBASE-4218
 Project: HBase
  Issue Type: Improvement
  Components: io
Affects Versions: 0.94.0
Reporter: Jacek Migdal
  Labels: compression
 Attachments: open-source.diff


 A compression for keys. Keys are sorted in HFile and they are usually very 
 similar. Because of that, it is possible to design better compression than 
 general purpose algorithms.
 It is an additional step designed to be used in memory. It aims to save 
 memory in cache as well as to speed up seeks within HFileBlocks. It should 
 improve performance a lot if key lengths are larger than value lengths. For 
 example, it makes a lot of sense to use it when the value is a counter.
 Initial tests on real data (key length = ~ 90 bytes, value length = 8 bytes) 
 show that I could achieve a decent level of compression:
  key compression ratio: 92%
  total compression ratio: 85%
  LZO on the same data: 85%
  LZO after delta encoding: 91%
 It also has much better performance (20-80% faster decompression than 
 LZO). Moreover, it should allow far more efficient seeking, which should 
 improve performance a bit.
 It seems that simple compression algorithms are good enough. Most of the 
 savings are due to prefix compression, int128 encoding, timestamp diffs and 
 bitfields to avoid duplication. That way, comparisons of compressed data can 
 be much faster than a byte comparator (thanks to prefix compression and 
 bitfields).
 In order to implement it in HBase two important changes in design will be 
 needed:
 -solidify interface to HFileBlock / HFileReader Scanner to provide seeking 
 and iterating; access to uncompressed buffer in HFileBlock will have bad 
 performance
 -extend comparators to support comparison assuming that N first bytes are 
 equal (or some fields are equal)
 Link to a discussion about something similar:
 http://search-hadoop.com/m/5aqGXJEnaD1/hbase+windowssubj=Re+prefix+compression





[jira] [Commented] (HBASE-4553) The update of .tableinfo is not atomic; we remove then rename

2011-10-08 Thread Todd Lipcon (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123603#comment-13123603
 ] 

Todd Lipcon commented on HBASE-4553:


atomic operations on filesystems are tricky... to do this correctly in the 
face of crashes, we need to have some process either do a rollback or 
roll-forward to recover from failures. Something like:

writer:
- create tableinfo.tmp
- delete tableinfo
- rename tableinfo.tmp to tableinfo

reader:
- try to read tableinfo
- on IOE (block missing, etc.), the file was deleted underneath the reader, 
so spin until the file open succeeds.


if the writer crashes between the delete and rename, we need someone else to 
come in and finish the operation.

IMO we need some general purpose way of allowing the master to keep an intent 
log in ZK for this kind of thing - and then if the master fails over, it can 
complete the operation.
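
The writer, reader, and roll-forward steps above can be sketched on a local filesystem as follows. This is only a stand-in under stated assumptions: it uses java.nio.file rather than the HDFS FileSystem API, the names TableInfoSketch, write, read, and recover are hypothetical, and the bounded retry count is an addition for the sketch. The ZK intent log is not modeled; recover simply represents "someone else" finishing the rename.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Local-filesystem stand-in for the HDFS steps; names are illustrative only.
class TableInfoSketch {
    private static Path tmpFor(Path tableInfo) {
        return tableInfo.resolveSibling(tableInfo.getFileName() + ".tmp");
    }

    /** Writer: create tableinfo.tmp, delete tableinfo, rename tmp into place. */
    static void write(Path tableInfo, String content) throws IOException {
        Path tmp = tmpFor(tableInfo);
        Files.write(tmp, content.getBytes(StandardCharsets.UTF_8));
        Files.deleteIfExists(tableInfo);
        // A crash at this point leaves only the .tmp file behind.
        Files.move(tmp, tableInfo);
    }

    /** Reader: retry while the file cannot be read, up to maxAttempts. */
    static String read(Path tableInfo, int maxAttempts, long sleepMillis)
            throws IOException, InterruptedException {
        IOException last = new IOException("maxAttempts must be positive");
        for (int i = 0; i < maxAttempts; i++) {
            try {
                return new String(Files.readAllBytes(tableInfo), StandardCharsets.UTF_8);
            } catch (IOException e) {
                // The file may be between delete and rename: wait and try again.
                last = e;
                Thread.sleep(sleepMillis);
            }
        }
        throw last;
    }

    /** Roll forward: finish the rename if a writer died after the delete. */
    static void recover(Path tableInfo) throws IOException {
        Path tmp = tmpFor(tableInfo);
        if (Files.exists(tmp) && !Files.exists(tableInfo)) {
            Files.move(tmp, tableInfo);
        }
    }
}
```

Rolling forward only when tableinfo is missing is deliberate in this sketch: the delete happens after the temporary file has been written, so a missing tableinfo implies the .tmp write had already returned.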

 The update of .tableinfo is not atomic; we remove then rename
 -

 Key: HBASE-4553
 URL: https://issues.apache.org/jira/browse/HBASE-4553
 Project: HBase
  Issue Type: Task
Reporter: stack

 This comes out of HBASE-4547.  The rename in 0.20 HDFS fails if the file already 
 exists.  In 0.20+ it's better, but there are still 'some' issues if there is an 
 existing reader when the file is renamed.  This issue is about fixing this 
 (though we depend on a fix first being in HDFS).





[jira] [Commented] (HBASE-4553) The update of .tableinfo is not atomic; we remove then rename

2011-10-08 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4553?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123616#comment-13123616
 ] 

stack commented on HBASE-4553:
--

@Todd Thanks boss.

 The update of .tableinfo is not atomic; we remove then rename
 -

 Key: HBASE-4553
 URL: https://issues.apache.org/jira/browse/HBASE-4553
 Project: HBase
  Issue Type: Task
Reporter: stack

 This comes out of HBASE-4547.  The rename in 0.20 HDFS fails if the file already 
 exists.  In 0.20+ it's better, but there are still 'some' issues if there is an 
 existing reader when the file is renamed.  This issue is about fixing this 
 (though we depend on a fix first being in HDFS).





[jira] [Commented] (HBASE-4102) atomicAppend: A put that appends to the latest version of a cell; i.e. reads current value then adds the bytes offered by the client to the tail and writes out a new en

2011-10-08 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4102?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123618#comment-13123618
 ] 

stack commented on HBASE-4102:
--

@Lars On whether it should be 0.92 or 0.94, I'm thinking 0.94 (because people 
are watching me -- J-D and Todd will kill me if I commit a new feature to 
0.92).  That said, I think at least SU will patch their 0.92 with this patch to 
get this feature; we need it.

I think we want it like the new Increment API.

On doing it as a CP, the argument is that this is a fundamental feature rather 
than something to do as a CP.

 atomicAppend: A put that appends to the latest version of a cell; i.e. reads 
 current value then adds the bytes offered by the client to the tail and 
 writes out a new entry
 ---

 Key: HBASE-4102
 URL: https://issues.apache.org/jira/browse/HBASE-4102
 Project: HBase
  Issue Type: New Feature
Reporter: stack
Assignee: Lars Hofhansl

 It's come up a few times that clients want to add to an existing cell rather 
 than make a new cell each time.  At our place, the frontend keeps a list of 
 URLs a user has visited -- their md5s -- and updates it as the user progresses.  
 Rather than read, modify client-side, then write the new value back to HBase, it 
 would be sweet if we could do it all in one operation in the HBase server.  TSDB 
 aims to be space efficient.  Rather than pay the cost of the KV wrapper per 
 metric, it would rather have one KV per interval, and in this KV have a value 
 that holds all the metrics for the period.
 It could be done as a coprocessor, but this feels more like a fundamental 
 feature.
 Benoît suggests that atomicAppend take a flag to indicate whether or not the 
 client wants to see the resulting cell; often a client won't want to see the 
 result, and in this case, why pay the price of formulating and delivering a 
 response that the client just drops?





[jira] [Reopened] (HBASE-4430) Disable TestSlabCache and TestSingleSizedCache temporarily to see if these are cause of build box failure though all tests pass

2011-10-08 Thread stack (Reopened) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reopened HBASE-4430:
--


Reopening because TestSlabCache failed in most recent build -- hung.  Going to 
disable this test again.

 Disable TestSlabCache and TestSingleSizedCache temporarily to see if these 
 are cause of build box failure though all tests pass
 ---

 Key: HBASE-4430
 URL: https://issues.apache.org/jira/browse/HBASE-4430
 Project: HBase
  Issue Type: Task
  Components: test
Reporter: stack
Assignee: Li Pi
Priority: Blocker
 Fix For: 0.92.0

 Attachments: TestSlabCache.trace








[jira] [Updated] (HBASE-4430) Disable TestSlabCache and TestSingleSizedCache temporarily to see if these are cause of build box failure though all tests pass

2011-10-08 Thread stack (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-4430:
-

Priority: Major  (was: Blocker)

Marking 'major' rather than blocker since this is an experimental feature -- 
shouldn't hold up 0.92.

 Disable TestSlabCache and TestSingleSizedCache temporarily to see if these 
 are cause of build box failure though all tests pass
 ---

 Key: HBASE-4430
 URL: https://issues.apache.org/jira/browse/HBASE-4430
 Project: HBase
  Issue Type: Task
  Components: test
Reporter: stack
Assignee: Li Pi
 Fix For: 0.92.0

 Attachments: TestSlabCache.trace








[jira] [Commented] (HBASE-4430) Disable TestSlabCache and TestSingleSizedCache temporarily to see if these are cause of build box failure though all tests pass

2011-10-08 Thread stack (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4430?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123621#comment-13123621
 ] 

stack commented on HBASE-4430:
--

I disabled all the tests in TestSlabCache for now.  I looked at the build output 
and there is nothing in the .txt and no -output.txt, which unfortunately makes it 
harder to debug.

 Disable TestSlabCache and TestSingleSizedCache temporarily to see if these 
 are cause of build box failure though all tests pass
 ---

 Key: HBASE-4430
 URL: https://issues.apache.org/jira/browse/HBASE-4430
 Project: HBase
  Issue Type: Task
  Components: test
Reporter: stack
Assignee: Li Pi
Priority: Blocker
 Fix For: 0.92.0

 Attachments: TestSlabCache.trace








[jira] [Updated] (HBASE-4480) Testing script to simplfy local testing

2011-10-08 Thread Scott Kuehn (Updated) (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Scott Kuehn updated HBASE-4480:
---

Attachment: HBASE-4480.patch

 Testing script to simplfy local testing
 ---

 Key: HBASE-4480
 URL: https://issues.apache.org/jira/browse/HBASE-4480
 Project: HBase
  Issue Type: Improvement
Reporter: Jesse Yates
Priority: Minor
  Labels: test
 Attachments: HBASE-4480.patch, runtest.sh, runtest2.sh


 As mentioned by http://search-hadoop.com/m/r2Ab624ES3e and 
 http://search-hadoop.com/m/cZjDH1ykGIA it would be nice if we could have a 
 script that would handle more of the finer points of running/checking our 
 test suite.
 This script should:
 (1) Allow people to determine which tests are hanging/taking a long time to 
 run
 (2) Allow rerunning of particular tests to make sure it wasn't an artifact of 
 running the whole suite that caused the failure
 (3) Allow people to specify to run just unit tests or also integration tests 
 (essentially wrapping calls to 'maven test' and 'maven verify').
 This script should just be a convenience script - running tests directly from 
 maven should not be impacted.





[jira] [Commented] (HBASE-4480) Testing script to simplfy local testing

2011-10-08 Thread Scott Kuehn (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123626#comment-13123626
 ] 

Scott Kuehn commented on HBASE-4480:


@Jesse - done.  Three fixes/improvements to the latest script:
- the -s option only reports duration of tests that were run (ignores stale 
surefire files)
- added -c option for cleaning prior to test running. this will just force a 
'mvn clean'
- more robust path management

 Testing script to simplfy local testing
 ---

 Key: HBASE-4480
 URL: https://issues.apache.org/jira/browse/HBASE-4480
 Project: HBase
  Issue Type: Improvement
Reporter: Jesse Yates
Priority: Minor
  Labels: test
 Attachments: HBASE-4480.patch, runtest.sh, runtest2.sh


 As mentioned by http://search-hadoop.com/m/r2Ab624ES3e and 
 http://search-hadoop.com/m/cZjDH1ykGIA it would be nice if we could have a 
 script that would handle more of the finer points of running/checking our 
 test suite.
 This script should:
 (1) Allow people to determine which tests are hanging/taking a long time to 
 run
 (2) Allow rerunning of particular tests to make sure it wasn't an artifact of 
 running the whole suite that caused the failure
 (3) Allow people to specify to run just unit tests or also integration tests 
 (essentially wrapping calls to 'maven test' and 'maven verify').
 This script should just be a convenience script - running tests directly from 
 maven should not be impacted.





[jira] [Commented] (HBASE-4480) Testing script to simplfy local testing

2011-10-08 Thread Jesse Yates (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4480?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13123629#comment-13123629
 ] 

Jesse Yates commented on HBASE-4480:


@Scott good stuff.

nitpick: if I'm doing a clean, it should run 'mvn clean test' rather than 
running 'mvn clean' and then 'mvn test'

 Testing script to simplfy local testing
 ---

 Key: HBASE-4480
 URL: https://issues.apache.org/jira/browse/HBASE-4480
 Project: HBase
  Issue Type: Improvement
Reporter: Jesse Yates
Priority: Minor
  Labels: test
 Attachments: HBASE-4480.patch, runtest.sh, runtest2.sh


 As mentioned by http://search-hadoop.com/m/r2Ab624ES3e and 
 http://search-hadoop.com/m/cZjDH1ykGIA it would be nice if we could have a 
 script that would handle more of the finer points of running/checking our 
 test suite.
 This script should:
 (1) Allow people to determine which tests are hanging/taking a long time to 
 run
 (2) Allow rerunning of particular tests to make sure it wasn't an artifact of 
 running the whole suite that caused the failure
 (3) Allow people to specify to run just unit tests or also integration tests 
 (essentially wrapping calls to 'maven test' and 'maven verify').
 This script should just be a convenience script - running tests directly from 
 maven should not be impacted.
