[jira] Commented: (HBASE-50) Snapshot of table

2010-08-13 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12898127#action_12898127
 ] 

HBase Review Board commented on HBASE-50:
-

Message from: Chongxin Li lichong...@zju.edu.cn


bq.  On 2010-08-12 10:33:25, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/DeleteSnapshot.java, line 98
bq.   http://review.cloudera.org/r/467/diff/4/?file=6589#file6589line98
bq.  
bq.   Is there more to be done here ?

Deleting the region dir?


bq.  On 2010-08-12 10:33:25, Ted Yu wrote:
bq.   src/main/java/org/apache/hadoop/hbase/master/DeleteSnapshot.java, line 94
bq.   http://review.cloudera.org/r/467/diff/4/?file=6589#file6589line94
bq.  
bq.   Should return value be checked ?

Deleting the snapshot directory at the end would delete all snapshot files 
anyway. Do we still have to check the return value? If the return value is 
false, should we just log it?
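
For illustration only (this is not the code under review; fs, snapshotDir and 
LOG are assumed fields of the surrounding class), logging a failed delete could 
look roughly like this:

{noformat}
// Hedged sketch: check FileSystem.delete()'s boolean result and warn on failure.
if (!fs.delete(snapshotDir, true)) {  // recursive delete of the snapshot directory
  LOG.warn("Failed to delete snapshot directory " + snapshotDir);
}
{noformat}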


- Chongxin


---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review874
---





 Snapshot of table
 -

 Key: HBASE-50
 URL: https://issues.apache.org/jira/browse/HBASE-50
 Project: HBase
  Issue Type: New Feature
Reporter: Billy Pearson
Assignee: Li Chongxin
Priority: Minor
 Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot 
 Design Report V3.pdf, HBase Snapshot Implementation Plan.pdf, Snapshot Class 
 Diagram.png


 Having an option to take a snapshot of a table would be very useful in 
 production.
 What I would like this option to do is merge all the data into one or more 
 files stored in the same folder on the DFS. This way we could save data in 
 case of a software bug in Hadoop or user code. 
 The other advantage would be the ability to export a table to multiple 
 locations. Say I had a read_only table that must be online. I could take a 
 snapshot of it when needed, export it to a separate data center, and have it 
 loaded there; then I would have it online at multiple data centers for load 
 balancing and failover.
 I understand that Hadoop removes the need for backups to protect against 
 failed servers, but this does not protect us from software bugs that might 
 delete or alter data in ways we did not plan. We should have a way to roll 
 back a dataset.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-50) Snapshot of table

2010-08-13 Thread HBase Review Board (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-50?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12898203#action_12898203
 ] 

HBase Review Board commented on HBASE-50:
-

Message from: Ted Yu ted...@yahoo.com

---
This is an automatically generated e-mail. To reply, visit:
http://review.cloudera.org/r/467/#review897
---



src/main/java/org/apache/hadoop/hbase/master/DeleteSnapshot.java
http://review.cloudera.org/r/467/#comment2925

We should log if we fail to delete.



src/main/java/org/apache/hadoop/hbase/master/DeleteSnapshot.java
http://review.cloudera.org/r/467/#comment2924

Yes.


- Ted





 Snapshot of table
 -

 Key: HBASE-50
 URL: https://issues.apache.org/jira/browse/HBASE-50
 Project: HBase
  Issue Type: New Feature
Reporter: Billy Pearson
Assignee: Li Chongxin
Priority: Minor
 Attachments: HBase Snapshot Design Report V2.pdf, HBase Snapshot 
 Design Report V3.pdf, HBase Snapshot Implementation Plan.pdf, Snapshot Class 
 Diagram.png


 Having an option to take a snapshot of a table would be very useful in 
 production.
 What I would like this option to do is merge all the data into one or more 
 files stored in the same folder on the DFS. This way we could save data in 
 case of a software bug in Hadoop or user code. 
 The other advantage would be the ability to export a table to multiple 
 locations. Say I had a read_only table that must be online. I could take a 
 snapshot of it when needed, export it to a separate data center, and have it 
 loaded there; then I would have it online at multiple data centers for load 
 balancing and failover.
 I understand that Hadoop removes the need for backups to protect against 
 failed servers, but this does not protect us from software bugs that might 
 delete or alter data in ways we did not plan. We should have a way to roll 
 back a dataset.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HBASE-2907) [rest/stargate] Improve error response when trying to create a scanner on a nonexistent table

2010-08-13 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2907?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell reassigned HBASE-2907:
-

Assignee: Andrew Purtell

 [rest/stargate] Improve error response when trying to create a scanner on a 
 nonexistent table
 -

 Key: HBASE-2907
 URL: https://issues.apache.org/jira/browse/HBASE-2907
 Project: HBase
  Issue Type: Improvement
  Components: rest
Reporter: Kieron Briggs
Assignee: Andrew Purtell
Priority: Minor

 Since 0.20.4, an attempt to create a scanner for a nonexistent table receives 
 a 400 Bad Request response with no further information. Prior to 0.20.4 it 
 would receive a 500 org.apache.hadoop.hbase.TableNotFoundException: table 
 response with a stack trace in the body.
 Neither of these is ideal - the 400 fails to identify what aspect of the 
 request was bad, and the 500 incorrectly suggests that the error was 
 internal. Ideally the error should be a 400 error with information in the 
 body identifying the nature of the problem.
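
For illustration, a rough sketch of returning a 400 with a diagnostic body (not 
the committed fix; the helper name is made up, and it assumes the JAX-RS 
Response builder the rest gateway already uses):

{noformat}
// Hypothetical helper: a 400 response that names the missing table.
Response tableNotFound(String tableName) {
  return Response.status(Response.Status.BAD_REQUEST)
      .type("text/plain")
      .entity("Table not found: " + tableName + "\n")
      .build();
}
{noformat}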

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HBASE-2911) [stargate] Fix JSON handling of META and ROOT

2010-08-13 Thread Andrew Purtell (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2911?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Andrew Purtell reassigned HBASE-2911:
-

Assignee: Andrew Purtell

 [stargate] Fix JSON handling of META and ROOT
 -

 Key: HBASE-2911
 URL: https://issues.apache.org/jira/browse/HBASE-2911
 Project: HBase
  Issue Type: Bug
  Components: rest
Reporter: Lars George
Assignee: Andrew Purtell

 While working on the HBase Explorer front end in Hue I found a few 
 inconsistencies between the plain text version of values and the JSON 
 representation. From an email conversation:
 Plain Text
 ---
 $ curl -H "Accept: text/plain" localhost:/status/cluster
 1 live servers, 0 dead servers, 5. average load
 1 live servers
de1-app-mbp-2.fritz.box:62884 1280924907616
requests=0, regions=5
heapSizeMB=27
maxHeapSizeMB=995
t2,,1280917558997
stores=3
storefiless=0
storefileSizeMB=0
memstoreSizeMB=0
storefileIndexSizeMB=0
usertable,,1280917566604
stores=3
storefiless=2
storefileSizeMB=224
memstoreSizeMB=0
storefileIndexSizeMB=0
.META.,,1
stores=2
storefiless=1
storefileSizeMB=0
memstoreSizeMB=0
storefileIndexSizeMB=0
t1,,1280917554475
stores=3
storefiless=0
storefileSizeMB=0
memstoreSizeMB=0
storefileIndexSizeMB=0
\-ROOT\-,,0
stores=1
storefiless=1
storefileSizeMB=0
memstoreSizeMB=0
storefileIndexSizeMB=0
 JSON
 -
 And curling the JSON yields:
 $ curl -H "Accept: application/json" localhost:/status/cluster
 {"requests":0,"regions":5,"averageLoad":5.0,"DeadNodes":[null],"LiveNodes":[{"Node":{"startCode":1280924907616,"requests":0,"name":"de1-app-mbp-2.fritz.box:62884","maxHeapSizeMB":995,"heapSizeMB":27,"Region":[{"stores":3,"storefiles":0,"storefileSizeMB":0,"storefileIndexSizeMB":0,"name":"dDIsLDEyODA5MTc1NTg5OTc=","memstoreSizeMB":0},{"stores":3,"storefiles":2,"storefileSizeMB":224,"storefileIndexSizeMB":0,"name":"dXNlcnRhYmxlLCwxMjgwOTE3NTY2NjA0","memstoreSizeMB":0},{"stores":2,"storefiles":1,"storefileSizeMB":0,"storefileIndexSizeMB":0,"name":"Lk1FVEEuLCwx","memstoreSizeMB":0},{"stores":3,"storefiles":0,"storefileSizeMB":0,"storefileIndexSizeMB":0,"name":"dDEsLDEyODA5MTc1NTQ0NzU=","memstoreSizeMB":0},{"stores":1,"storefiles":1,"storefileSizeMB":0,"storefileIndexSizeMB":0,"name":"LVJPT1QtLCww","memstoreSizeMB":0}]}}]}
 And another one:
 I have another one with .META. and \-ROOT\-, in my small sample setup (all 
 local, /tmp etc.) I see this in the master UI:
 Name   Region Server   Encoded Name   Start Key   End Key
 .META.,,1 10.0.0.43:60030  -  
 But running the same against Stargate I get:
 $ curl -H "Accept: application/json" http://localhost:/.META./regions
 {"name":".META."}
 while a normal user table with a single row has
 Name   Region Server   Encoded Name   Start Key   End Key
 t1,,128615489 10.0.0.43:60030  1127696125 
 and through Stargate:
 $ curl -H "Accept: application/json" http://localhost:/t1/regions
 {"name":"t1","Region":[{"location":"10.0.0.43:54988","endKey":"","startKey":"","id":128615489,"name":"t1,,128615489"}]}
 So the internal tables are not reported right.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-2914) Profiling indicates that ThriftUtilities.rowResultFromHBase is quite inefficient

2010-08-13 Thread ryan rawson (JIRA)
Profiling indicates that ThriftUtilities.rowResultFromHBase is quite inefficient


 Key: HBASE-2914
 URL: https://issues.apache.org/jira/browse/HBASE-2914
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.89.20100621, 0.20.6
Reporter: ryan rawson
 Fix For: 0.90.0
 Attachments: HBASE-2914.patch

Profiling of ThriftServer here at SU has indicated that the call 
ThriftUtilities.rowResultFromHBase() is quite inefficient.  It first calls 
Result.getRowResult(), which is inefficient and slow. By reimplementing it to 
create the TRowResult (the Thrift return type) straight from the KeyValue[] 
array, the performance boost is substantial, reducing the time spent 
serializing the results.  In my profiling the time spent in scannerGetList() 
went from 1100ms to 108ms on similar test runs.
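
As a rough sketch of the idea (this is not the attached HBASE-2914.patch; error 
handling and imports are trimmed), the conversion can walk the KeyValue[] behind 
each Result and populate the Thrift structures directly:

{noformat}
// Sketch only: build TRowResult straight from the KeyValues backing each Result,
// skipping the intermediate Result.getRowResult() conversion.
static List<TRowResult> rowResultFromHBase(Result[] in) {
  List<TRowResult> out = new ArrayList<TRowResult>(in.length);
  for (Result result : in) {
    if (result == null || result.isEmpty()) continue;
    TRowResult row = new TRowResult();
    row.row = result.getRow();
    row.columns = new TreeMap<byte[], TCell>(Bytes.BYTES_COMPARATOR);
    for (KeyValue kv : result.raw()) {
      // getColumn() returns family:qualifier, which is the Thrift column name.
      row.columns.put(kv.getColumn(), new TCell(kv.getValue(), kv.getTimestamp()));
    }
    out.add(row);
  }
  return out;
}
{noformat}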

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HBASE-2914) Profiling indicates that ThriftUtilities.rowResultFromHBase is quite inefficient

2010-08-13 Thread ryan rawson (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2914?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ryan rawson reassigned HBASE-2914:
--

Assignee: ryan rawson

 Profiling indicates that ThriftUtilities.rowResultFromHBase is quite 
 inefficient
 

 Key: HBASE-2914
 URL: https://issues.apache.org/jira/browse/HBASE-2914
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.20.6, 0.89.20100621
Reporter: ryan rawson
Assignee: ryan rawson
 Fix For: 0.90.0

 Attachments: HBASE-2914.patch


 Profiling of ThriftServer here at SU has indicated that the call 
 ThriftUtilities.rowResultFromHBase() is quite inefficient.  It first calls 
 Result.getRowResult(), which is inefficient and slow. By reimplementing it 
 to create the TRowResult (the Thrift return type) straight from the 
 KeyValue[] array, the performance boost is substantial, reducing the time 
 spent serializing the results.  In my profiling the time spent in 
 scannerGetList() went from 1100ms to 108ms on similar test runs.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-2915) Deadlock between HRegion.ICV and HRegion.close

2010-08-13 Thread Jean-Daniel Cryans (JIRA)
Deadlock between HRegion.ICV and HRegion.close
--

 Key: HBASE-2915
 URL: https://issues.apache.org/jira/browse/HBASE-2915
 Project: HBase
  Issue Type: Bug
Reporter: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.90.0


HRegion.ICV gets a row lock, then gets the newScanner lock.

HRegion.close gets the newScanner lock, the splitCloseLock, and finally waits 
for all row locks to finish.

If the ICV got the row lock and then close got the newScannerLock, both end up 
waiting on the other. This was introduced when Get became a Scan.

Stack thinks we can get rid of the newScannerLock in close since we setClosing 
to true.
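
Schematically, the inversion looks like the sketch below (purely illustrative; 
the lock names are stand-ins, not the actual HRegion fields):

{noformat}
// Toy model using java.util.concurrent.locks.ReentrantLock; running both paths
// concurrently can hang, which is the point.
final ReentrantLock rowLock = new ReentrantLock();
final ReentrantLock newScannerLock = new ReentrantLock();

void icvPath() {                  // incrementColumnValue path
  rowLock.lock();
  try {
    newScannerLock.lock();        // blocks if the close path already holds it
    try { /* apply the increment */ } finally { newScannerLock.unlock(); }
  } finally { rowLock.unlock(); }
}

void closePath() {                // region close path
  newScannerLock.lock();          // stop new scanners from being created
  try {
    rowLock.lock();               // stands in for "wait for all row locks to drain"
    try { /* flush and close the region */ } finally { rowLock.unlock(); }
  } finally { newScannerLock.unlock(); }
}
// If icvPath() holds rowLock while closePath() holds newScannerLock,
// each thread waits forever for the lock the other owns.
{noformat}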

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-2916) Reseek directly to next column

2010-08-13 Thread Pranav Khaitan (JIRA)
Reseek directly to next column
--

 Key: HBASE-2916
 URL: https://issues.apache.org/jira/browse/HBASE-2916
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Pranav Khaitan


When done with the current column, reseek directly to the next column rather 
than spending time reading more keys of the current row-column that are not 
required.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Created: (HBASE-2917) Reseek directly to next row

2010-08-13 Thread Pranav Khaitan (JIRA)
Reseek directly to next row
---

 Key: HBASE-2917
 URL: https://issues.apache.org/jira/browse/HBASE-2917
 Project: HBase
  Issue Type: Improvement
Reporter: Pranav Khaitan


When done with the current row, reseek directly to the next row rather than 
spending time reading more keys of the current row that are not required.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-2910) [stargate] Add /config/cluster endpoint to retrieve the current configuration

2010-08-13 Thread Lars George (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2910?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12898395#action_12898395
 ] 

Lars George commented on HBASE-2910:


Mainly HBase. But the more the better. The idea is to be able to tell the UI 
user what the current cluster configuration looks like. 
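
As a rough illustration of what such an endpoint could serve (a hypothetical 
handler, not an existing Stargate resource; Hadoop's Configuration is iterable 
as key/value pairs):

{noformat}
// Hypothetical sketch of a /config/cluster handler body: dump the server-side
// Configuration entries as plain-text key=value lines.
StringBuilder body = new StringBuilder();
for (Map.Entry<String, String> e : conf) {  // conf: the server's Configuration instance
  body.append(e.getKey()).append('=').append(e.getValue()).append('\n');
}
return Response.ok(body.toString(), MediaType.TEXT_PLAIN_TYPE).build();
{noformat}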

 [stargate] Add /config/cluster endpoint to retrieve the current configuration
 -

 Key: HBASE-2910
 URL: https://issues.apache.org/jira/browse/HBASE-2910
 Project: HBase
  Issue Type: Improvement
  Components: rest
Reporter: Lars George
 Attachments: Hue HBase Explorer.jpg


 I am working on the Hue-based front end called the HBase Explorer. It would 
 be good to be able to also display the current cluster configuration.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-2917) Reseek directly to next row

2010-08-13 Thread Pranav Khaitan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2917?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12898396#action_12898396
 ] 

Pranav Khaitan commented on HBASE-2917:
---

Ryan says: we should be doing an optimized reseek here, by using 
KeyValue.createLastOnRow() which will take us to the next row.
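
Roughly, the idea is the sketch below (illustrative only, not a patch; 
currentRow and the reseek call on the scanner heap are assumed to be available 
at that point):

{noformat}
// Sketch: once the current row is exhausted, build a key that sorts after every
// KeyValue of that row and reseek to it, instead of nexting through the leftovers.
KeyValue lastOnRow = KeyValue.createLastOnRow(currentRow);  // currentRow: the row just finished
heap.reseek(lastOnRow);  // assumed reseek support on the scanner heap
{noformat}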

 Reseek directly to next row
 ---

 Key: HBASE-2917
 URL: https://issues.apache.org/jira/browse/HBASE-2917
 Project: HBase
  Issue Type: Improvement
Reporter: Pranav Khaitan

 When done with the current row, reseek directly to the next row rather than 
 spending time reading more keys of the current row that are not required.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-2916) Reseek directly to next column

2010-08-13 Thread Pranav Khaitan (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12898397#action_12898397
 ] 

Pranav Khaitan commented on HBASE-2916:
---

Ryan says: I think it is possible to create a 'last on column' value as well 
for the reseek optimization here.
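
One way to picture such a 'last on column' key (illustrative only; whether this 
exact construction sorts as intended should be checked against the KeyValue 
comparator in the version at hand): same row, family and qualifier, oldest 
possible timestamp, minimum type, so it sorts after every real version of the 
column and before the next qualifier.

{noformat}
// Illustrative sketch of a "last on column" marker key for an optimized reseek.
KeyValue lastOnColumn =
    new KeyValue(row, family, qualifier, Long.MIN_VALUE, KeyValue.Type.Minimum);
{noformat}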

 Reseek directly to next column
 --

 Key: HBASE-2916
 URL: https://issues.apache.org/jira/browse/HBASE-2916
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: Pranav Khaitan

 When done with the current column, reseek directly to the next column rather 
 than spending time reading more keys of the current row-column that are not 
 required.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-2315) BookKeeper for write-ahead logging

2010-08-13 Thread Benjamin Reed (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12898424#action_12898424
 ] 

Benjamin Reed commented on HBASE-2315:
--

We looked into the problem of figuring out the path to use for the WAL and 
found the following: it appears that the assumption that the WAL is stored in 
HDFS is embedded in HBase. When looking up a WAL, for example, the FileSystem 
object is used to check existence. Deletion of logs also happens outside of the 
WAL interfaces. To be truly pluggable, a WAL interface should be used to 
enumerate and delete logs. Have you guys thought about doing this?
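
For discussion's sake, the kind of interface being suggested might look like the 
sketch below (the names are illustrative, not an existing HBase API; WALEdit 
stands for the existing edit type):

{noformat}
// Hypothetical WAL abstraction so that enumerating and deleting logs goes through
// the interface rather than through FileSystem/HDFS path conventions.
public interface WriteAheadLog {
  void append(WALEdit edit) throws IOException;    // add an edit to the log
  void sync() throws IOException;                  // make appended edits durable
  List<String> listLogs() throws IOException;      // enumerate existing logs
  void deleteLog(String name) throws IOException;  // delete a log via the WAL, not the FS
  void close() throws IOException;
}
{noformat}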

 BookKeeper for write-ahead logging
 --

 Key: HBASE-2315
 URL: https://issues.apache.org/jira/browse/HBASE-2315
 Project: HBase
  Issue Type: New Feature
  Components: regionserver
Reporter: Flavio Junqueira
 Attachments: bookkeeperOverview.pdf, HBASE-2315.patch, 
 zookeeper-dev-bookkeeper.jar


 BookKeeper, a contrib of the ZooKeeper project, is a fault-tolerant, 
 high-throughput write-ahead logging service. This issue provides an 
 implementation of write-ahead logging for HBase using BookKeeper. Apart from 
 the expected throughput improvements, BookKeeper also has stronger durability 
 guarantees compared to the implementation currently used by HBase.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-2315) BookKeeper for write-ahead logging

2010-08-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2315?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12898435#action_12898435
 ] 

stack commented on HBASE-2315:
--

No.

If you want us to switch to an interface, just say (will happen faster if you 
put up a patch).

 BookKeeper for write-ahead logging
 --

 Key: HBASE-2315
 URL: https://issues.apache.org/jira/browse/HBASE-2315
 Project: HBase
  Issue Type: New Feature
  Components: regionserver
Reporter: Flavio Junqueira
 Attachments: bookkeeperOverview.pdf, HBASE-2315.patch, 
 zookeeper-dev-bookkeeper.jar


 BookKeeper, a contrib of the ZooKeeper project, is a fault-tolerant, 
 high-throughput write-ahead logging service. This issue provides an 
 implementation of write-ahead logging for HBase using BookKeeper. Apart from 
 the expected throughput improvements, BookKeeper also has stronger durability 
 guarantees compared to the implementation currently used by HBase.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Resolved: (HBASE-2909) SoftValueSortedMap is broken, can generate NPEs

2010-08-13 Thread Jean-Daniel Cryans (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2909?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans resolved HBASE-2909.
---

Hadoop Flags: [Reviewed]
Assignee: Jean-Daniel Cryans
  Resolution: Fixed

Committed to branch and trunk, thanks for checking it out guys.

 SoftValueSortedMap is broken, can generate NPEs
 ---

 Key: HBASE-2909
 URL: https://issues.apache.org/jira/browse/HBASE-2909
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.20.6, 0.89.20100621
Reporter: Jean-Daniel Cryans
Assignee: Jean-Daniel Cryans
Priority: Blocker
 Fix For: 0.20.7, 0.90.0

 Attachments: hbase-2909.patch


 The way SoftValueSortedMap uses SoftValues, it looks like its keys can get 
 garbage collected along with the values themselves. We hit this issue in 
 production, but I was also able to reproduce it randomly using YCSB with 300 
 threads. Here's an example on 0.20 with JDK 1.6u14:
 {noformat}
 java.lang.NullPointerException
 at org.apache.hadoop.hbase.util.Bytes.compareTo(Bytes.java:1036)
 at 
 org.apache.hadoop.hbase.util.Bytes$ByteArrayComparator.compare(Bytes.java:104)
 at 
 org.apache.hadoop.hbase.util.Bytes$ByteArrayComparator.compare(Bytes.java:96)
 at java.util.TreeMap.cmp(TreeMap.java:1911)
 at java.util.TreeMap.get(TreeMap.java:1835)
 at 
 org.apache.hadoop.hbase.util.SoftValueSortedMap.get(SoftValueSortedMap.java:91)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.getCachedLocation(HConnectionManager.java:788)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegionInMeta(HConnectionManager.java:651)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:634)
 at 
 org.apache.hadoop.hbase.client.HConnectionManager$TableServers.locateRegion(HConnectionManager.java:601)
 at org.apache.hadoop.hbase.client.HTable.init(HTable.java:128)
 at 
 org.apache.hadoop.hbase.thrift.ThriftServer$HBaseHandler.getTable(ThriftServer.java:262)
 at 
 org.apache.hadoop.hbase.thrift.ThriftServer$HBaseHandler.mutateRowTs(ThriftServer.java:585)
 at 
 org.apache.hadoop.hbase.thrift.ThriftServer$HBaseHandler.mutateRow(ThriftServer.java:578)
 at 
 org.apache.hadoop.hbase.thrift.generated.Hbase$Processor$mutateRow.process(Hbase.java:2345)
 at 
 org.apache.hadoop.hbase.thrift.generated.Hbase$Processor.process(Hbase.java:1988)
 at 
 org.apache.thrift.server.TThreadPoolServer$WorkerProcess.run(TThreadPoolServer.java:259)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:619)
 {noformat}
 In this specific case, the null cannot be the passed key because it's coming 
 from HTable which uses HConstants.EMPTY_START_ROW. It cannot be a null key 
 that was inserted previously because we would have got the NPE at insert 
 time. This can only mean that some key *became* null.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-2898) MultiPut makes proper error handling impossible and leads to corrupted data

2010-08-13 Thread ryan rawson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12898486#action_12898486
 ] 

ryan rawson commented on HBASE-2898:


I am interested in a new multi-put for 0.90.  There are also cases for 
multi-get and multi just about everything. See HBASE-1845.

Improving serialization would be nice, but we'd have to wreck our KeyValue 
serialization mechanism and have something custom on the wire.  There is a 
similar situation in the Result serialization as well, it's just KeyValues all 
the way on down.

 MultiPut makes proper error handling impossible and leads to corrupted data
 ---

 Key: HBASE-2898
 URL: https://issues.apache.org/jira/browse/HBASE-2898
 Project: HBase
  Issue Type: Bug
  Components: client, regionserver
Affects Versions: 0.89.20100621
Reporter: Benoit Sigoure
Priority: Blocker
 Fix For: 0.90.0


 The {{MultiPut}} RPC needs to be completely rewritten.  Let's see why step by 
 step.
 # An HBase user calls any of the {{put}} methods on an {{HTable}} instance.
 # Eventually, {{HTable#flushCommits}} is invoked to actually send the edits 
 to the RegionServer(s).
 # This takes us to {{HConnectionManager#processBatchOfPuts}} where all edits 
 are sorted into one or more {{MultiPut}}.  Each {{MultiPut}} is aggregating 
 all the edits that are going to a particular RegionServer.
 # A thread pool is used to send all the {{MultiPut}} in parallel to their 
 respective RegionServer.  Let's follow what happens for a single {{MultiPut}}.
 # The {{MultiPut}} travels through the IPC code on the client and then 
 through the network and then through the IPC code on the RegionServer.
 # We're now in {{HRegionServer#multiPut}} where a new {{MultiPutResponse}} is 
 created.
 # Still in {{HRegionServer#multiPut}}.  Since a {{MultiPut}} is essentially a 
 map from region name to a list of {{Put}} for that region, there's a {{for}} 
 loop that executes each list of {{Put}} for each region sequentially.  Let's 
 follow what happens for a single list of {{Put}} for a particular region.
 # We're now in {{HRegionServer#put(byte[], List<Put>)}}.  Each {{Put}} is 
 associated with the row lock that was specified by the client (if any).  Then 
 the pairs of {{(Put, lock id)}} are handed to the right {{HRegion}}.
 # Now we're in {{HRegion#put(Pair<Put, Integer>[])}}, which immediately takes 
 us to {{HRegion#doMiniBatchPut}}.
 # At this point, let's assume that we're doing just 2 edits.  So the 
 {{BatchOperationInProgress}} handled by {{doMiniBatchPut}} contains just 2 {{Put}}.
 # The {{while}} loop in {{doMiniBatchPut}} that's going to execute each 
 {{Put}} starts.
 # The first {{Put}} fails because an exception is thrown when appending the 
 edit to the {{WAL}}.  Its {{batchOp.retCodes}} is marked as 
 {{OperationStatusCode.FAILURE}}.
 # Because there was an exception, we're back to {{HRegion#put(Pair<Put, 
 Integer>[])}} where the {{while}} loop will test that {{batchOp.isDone}} is 
 {{false}} and do another iteration.
 # {{doMiniBatchPut}} is called again and handles the remaining {{Put}}.
 # The second {{Put}} succeeds normally, so its {{batchOp.retCodes}} is marked 
 as {{OperationStatusCode.SUCCESS}}.
 # {{doMiniBatchPut}} is done and returns to {{HRegion#put(Pair<Put, 
 Integer>[])}}, which returns to {{HRegionServer#put(byte[], List<Put>)}}.
 # At this point, {{HRegionServer#put(byte[], List<Put>)}} does a {{for}} loop 
 and extracts the index of the *first* {{Put}} that failed out of the 
 {{OperationStatusCode[]}}.  In our case, it'll return 0 since the first 
 {{Put}} failed.
 # This index of the first {{Put}} that failed (0 in this case) is returned to 
 {{HRegionServer#multiPut}}, which records it in the {{MultiPutResponse}} - the 
 client knows that the first {{Put}} failed but has no idea about the other one.
 So the client has no reliable way of knowing which {{Put}} failed (if any) 
 past the first failure.  All it knows is that for a particular region, they 
 succeeded up to a particular {{Put}}, at which point there was a failure, and 
 then the remaining may or may not have succeeded.  Its best bet is to retry 
 all the {{Put}} past the index of the first failure for this region.  But 
 this has an unintended consequence.  The {{Put}} that were successful during 
 the first run will be *re-applied*.  This will unexpectedly create extra 
 versions.  Now I realize most people don't really care about versions, so 
 they won't notice.  But whoever relies on the versions for whatever reason 
 will rightfully consider this to be data corruption.
 As it is now, {{MultiPut}} makes proper error handling impossible.  Since 
 this RPC cannot guarantee any atomicity other than at the individual {{Put}} 
 level, it should 

[jira] Updated: (HBASE-2898) MultiPut makes proper error handling impossible and leads to corrupted data

2010-08-13 Thread Benoit Sigoure (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoit Sigoure updated HBASE-2898:
--

Description: 
tl;dr version: I think the {{MultiPut}} RPC needs to be completely rewritten.  
The current code makes it totally impossible for an HBase client to do proper 
error handling.  When an edit fails, the client has no clue as to what the 
problem was (certain error cases can be retried, others cannot e.g. when using 
a non-existent family) and the client doesn't even know which of the edits have 
been applied successfully.  So the client often has to retry edits without 
knowing whether they've been applied or not, which leads to extra unwanted 
versions for the {{KeyValue}} that were successfully applied (for those who 
care about versions, this is essentially equivalent to data corruption).  In 
addition, there's no way for a client to properly handle 
{{NotServingRegionException}}; the client has to unnecessarily invalidate 
cached locations of some regions and retry *all* edits.

h2. Life of a failed multi-put

Let's walk through, step by step, what happens when a single edit in a multi-put fails.

# An HBase user calls any of the {{put}} methods on an {{HTable}} instance.
# Eventually, {{HTable#flushCommits}} is invoked to actually send the edits to 
the RegionServer(s).
# This takes us to {{HConnectionManager#processBatchOfPuts}} where all edits 
are sorted into one or more {{MultiPut}}.  Each {{MultiPut}} is aggregating all 
the edits that are going to a particular RegionServer.
# A thread pool is used to send all the {{MultiPut}} in parallel to their 
respective RegionServer.  Let's follow what happens for a single {{MultiPut}}.
# The {{MultiPut}} travels through the IPC code on the client and then through 
the network and then through the IPC code on the RegionServer.
# We're now in {{HRegionServer#multiPut}} where a new {{MultiPutResponse}} is 
created.
# Still in {{HRegionServer#multiPut}}.  Since a {{MultiPut}} is essentially a 
map from region name to a list of {{Put}} for that region, there's a {{for}} 
loop that executes each list of {{Put}} for each region sequentially.  Let's 
follow what happens for a single list of {{Put}} for a particular region.
# We're now in {{HRegionServer#put(byte[], List<Put>)}}.  Each {{Put}} is 
associated with the row lock that was specified by the client (if any).  Then 
the pairs of {{(Put, lock id)}} are handed to the right {{HRegion}}.
# Now we're in {{HRegion#put(Pair<Put, Integer>[])}}, which immediately takes 
us to {{HRegion#doMiniBatchPut}}.
# At this point, let's assume that we're doing just 2 edits.  So the 
{{BatchOperationInProgress}} handled by {{doMiniBatchPut}} contains just 2 {{Put}}.
# The {{while}} loop in {{doMiniBatchPut}} that's going to execute each {{Put}} 
starts.
# The first {{Put}} fails because an exception is thrown when appending the 
edit to the {{WAL}}.  Its {{batchOp.retCodes}} is marked as 
{{OperationStatusCode.FAILURE}}.
# Because there was an exception, we're back to {{HRegion#put(Pair<Put, 
Integer>[])}} where the {{while}} loop will test that {{batchOp.isDone}} is 
{{false}} and do another iteration.
# {{doMiniBatchPut}} is called again and handles the remaining {{Put}}.
# The second {{Put}} succeeds normally, so its {{batchOp.retCodes}} is marked 
as {{OperationStatusCode.SUCCESS}}.
# {{doMiniBatchPut}} is done and returns to {{HRegion#put(Pair<Put, 
Integer>[])}}, which returns to {{HRegionServer#put(byte[], List<Put>)}}.
# At this point, {{HRegionServer#put(byte[], List<Put>)}} does a {{for}} loop 
and extracts the index of the *first* {{Put}} that failed out of the 
{{OperationStatusCode[]}}.  In our case, it'll return 0 since the first {{Put}} 
failed.
# This index of the first {{Put}} that failed (0 in this case) is returned to 
{{HRegionServer#multiPut}}, which records it in the {{MultiPutResponse}} - the 
client knows that the first {{Put}} failed but has no idea about the other one.

So the client has no reliable way of knowing which {{Put}} failed (if any) past 
the first failure.  All it knows is that for a particular region, they 
succeeded up to a particular {{Put}}, at which point there was a failure, and 
then the remaining may or may not have succeeded.  Its best bet is to retry all 
the {{Put}} past the index of the first failure for this region.  But this has 
an unintended consequence.  The {{Put}} that were successful during the first 
run will be *re-applied*.  This will unexpectedly create extra versions.  Now I 
realize most people don't really care about versions, so they won't notice.  
But whoever relies on the versions for whatever reason will rightfully consider 
this to be data corruption.
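
Purely as an illustration of what the walkthrough above is asking for (this is 
not an existing HBase class), a response that carries one status per {{Put}} 
would let the client retry only the edits that actually failed:

{noformat}
// Hypothetical response shape: one OperationStatusCode per Put, per region, in the
// same order the Puts were sent, instead of only the index of the first failure.
public class PerPutStatusResponse {
  public Map<byte[], OperationStatusCode[]> statusByRegion =
      new TreeMap<byte[], OperationStatusCode[]>(Bytes.BYTES_COMPARATOR);
}
{noformat}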

As it is now, {{MultiPut}} makes proper error handling impossible.  Since this 
RPC cannot guarantee any atomicity other than at the individual {{Put}} level, 
it should return to the client specific 

[jira] Commented: (HBASE-1845) MultiGet, MultiDelete, and MultiPut - batched to the appropriate region servers

2010-08-13 Thread Benoit Sigoure (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1845?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12898493#action_12898493
 ] 

Benoit Sigoure commented on HBASE-1845:
---

Hello,
I just became aware of this issue.  I haven't read all the comments and haven't 
looked at the patches yet, but I'd like to draw your attention to HBASE-2898 
and so you can make sure that whatever you do, you don't reproduce this issue.  
It'd be nice if this issue solved HBASE-2898 as a side-effect of rewriting 
multiPut as part of the multi-everything code.  I'll take a look at the code 
proposed here when time permits.

 MultiGet, MultiDelete, and MultiPut - batched to the appropriate region 
 servers
 ---

 Key: HBASE-1845
 URL: https://issues.apache.org/jira/browse/HBASE-1845
 Project: HBase
  Issue Type: New Feature
Reporter: Erik Holstad
 Fix For: 0.90.0

 Attachments: batch.patch, hbase-1845_0.20.3.patch, 
 hbase-1845_0.20.5.patch, multi-v1.patch


 I've started to create a general interface for doing these batch/multi calls 
 and would like to get some input and thoughts about how we should handle this 
 and what the protocol should
 look like. 
 First naive patch, coming soon.

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Assigned: (HBASE-1849) HTable doesn't work well at the core of a multi-threaded server; e.g. webserver

2010-08-13 Thread Benoit Sigoure (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Benoit Sigoure reassigned HBASE-1849:
-

Assignee: Benoit Sigoure

I've been working on this for the past 2 weeks, although I'm guessing that my 
solution won't be really satisfactory for this issue.  I wrote another HBase 
client from scratch, and it's been written from the ground up to work well in a 
multi-threaded environment.   I'll open-source it in a few days, stay tuned.

 HTable doesn't work well at the core of a multi-threaded server; e.g. 
 webserver
 ---

 Key: HBASE-1849
 URL: https://issues.apache.org/jira/browse/HBASE-1849
 Project: HBase
  Issue Type: Improvement
Reporter: stack
Assignee: Benoit Sigoure

 HTable must do the following:
 + Sit in a shell or simple client -- e.g. a Map or Reduce task -- and feed and 
 read from HBase single-threadedly.  It does this job OK.
 + Sit at the core of a multithreaded server (100s of threads) -- a webserver or 
 thrift gateway -- and keep the throughput high. It's currently not good at 
 this job.
 In the way of our achieving the second in the list above are the following:
 + HTable must seek out and cache region locations.  It keeps the cache down in 
 HConnectionManager.  One cache is shared by all HTable instances made with the 
 same HBaseConfiguration instance.  Lookups of regions happen inside a 
 synchronized block; if the wanted region is in the cache, the lock is held for 
 a short time.  Otherwise, we must wait until the trip to the server completes 
 (which may require retries).  Meanwhile all other work is blocked, even if 
 we're using HTablePool.
 + Regardless of the identity of the HBaseConfiguration, Hadoop RPC has ONE 
 Connection open to a server at a time; requests and responses are multiplexed 
 over this single connection.
 Broken stuff:
 + Puts are synchronized to protect the write buffer, so only one thread at a 
 time appends, but flushCommits is open for any thread to call.  Once the 
 write buffer is full, all Puts block until it is freed again. This looks like 
 a hang if there are hundreds of threads, each write is to a random region in 
 a big table, and each write has to have its region looked up (there may be 
 some other brokenness in here because this bottleneck seems to last longer 
 than it should, even with hundreds of threads).
 Ideas:
 + A query of the cache does not block all access to the cache.  We only block 
 access if the wanted region is being looked up, so other reads and writes to 
 regions whose location we know can go ahead.
 + nio'd client and server

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-1849) HTable doesn't work well at the core of a multi-threaded server; e.g. webserver

2010-08-13 Thread ryan rawson (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12898496#action_12898496
 ] 

ryan rawson commented on HBASE-1849:


Some of the original complaints have been fixed.  HTablePool helps with some of 
this.  The advice has generally been: don't share an HTable between threads.

The granularity of the locks in HCM was improved, and while it's not all better, 
there are substantial improvements since this issue was filed.
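
One common pattern from that era, sketched with made-up table and column names 
(check the HTablePool javadoc of the version in use for the exact signatures): 
keep one pool per process and borrow a table per request instead of sharing a 
single HTable across threads.

{noformat}
// Sketch: borrow and return a table handle per request via HTablePool.
HTablePool pool = new HTablePool(conf, 100);         // conf: the shared configuration
HTableInterface table = pool.getTable("usertable");  // table name is illustrative
try {
  Put put = new Put(Bytes.toBytes("row1"));
  put.add(Bytes.toBytes("f"), Bytes.toBytes("q"), Bytes.toBytes("value"));
  table.put(put);
} finally {
  pool.putTable(table);                              // hand the handle back to the pool
}
{noformat}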

 HTable doesn't work well at the core of a multi-threaded server; e.g. 
 webserver
 ---

 Key: HBASE-1849
 URL: https://issues.apache.org/jira/browse/HBASE-1849
 Project: HBase
  Issue Type: Improvement
Reporter: stack
Assignee: Benoit Sigoure

 HTable must do the following:
 + Sit in a shell or simple client -- e.g. a Map or Reduce task -- and feed and 
 read from HBase single-threadedly.  It does this job OK.
 + Sit at the core of a multithreaded server (100s of threads) -- a webserver or 
 thrift gateway -- and keep the throughput high. It's currently not good at 
 this job.
 In the way of our achieving the second in the list above are the following:
 + HTable must seek out and cache region locations.  It keeps the cache down in 
 HConnectionManager.  One cache is shared by all HTable instances made with the 
 same HBaseConfiguration instance.  Lookups of regions happen inside a 
 synchronized block; if the wanted region is in the cache, the lock is held for 
 a short time.  Otherwise, we must wait until the trip to the server completes 
 (which may require retries).  Meanwhile all other work is blocked, even if 
 we're using HTablePool.
 + Regardless of the identity of the HBaseConfiguration, Hadoop RPC has ONE 
 Connection open to a server at a time; requests and responses are multiplexed 
 over this single connection.
 Broken stuff:
 + Puts are synchronized to protect the write buffer, so only one thread at a 
 time appends, but flushCommits is open for any thread to call.  Once the 
 write buffer is full, all Puts block until it is freed again. This looks like 
 a hang if there are hundreds of threads, each write is to a random region in 
 a big table, and each write has to have its region looked up (there may be 
 some other brokenness in here because this bottleneck seems to last longer 
 than it should, even with hundreds of threads).
 Ideas:
 + A query of the cache does not block all access to the cache.  We only block 
 access if the wanted region is being looked up, so other reads and writes to 
 regions whose location we know can go ahead.
 + nio'd client and server

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.



[jira] Commented: (HBASE-1849) HTable doesn't work well at the core of a multi-threaded server; e.g. webserver

2010-08-13 Thread stack (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1849?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=12898508#action_12898508
 ] 

stack commented on HBASE-1849:
--

@Benoît: Bring it on!

 HTable doesn't work well at the core of a multi-threaded server; e.g. 
 webserver
 ---

 Key: HBASE-1849
 URL: https://issues.apache.org/jira/browse/HBASE-1849
 Project: HBase
  Issue Type: Improvement
Reporter: stack
Assignee: Benoit Sigoure

 HTable must do the following:
 + Sit in a shell or simple client -- e.g. a Map or Reduce task -- and feed and 
 read from HBase single-threadedly.  It does this job OK.
 + Sit at the core of a multithreaded server (100s of threads) -- a webserver or 
 thrift gateway -- and keep the throughput high. It's currently not good at 
 this job.
 In the way of our achieving the second in the list above are the following:
 + HTable must seek out and cache region locations.  It keeps the cache down in 
 HConnectionManager.  One cache is shared by all HTable instances made with the 
 same HBaseConfiguration instance.  Lookups of regions happen inside a 
 synchronized block; if the wanted region is in the cache, the lock is held for 
 a short time.  Otherwise, we must wait until the trip to the server completes 
 (which may require retries).  Meanwhile all other work is blocked, even if 
 we're using HTablePool.
 + Regardless of the identity of the HBaseConfiguration, Hadoop RPC has ONE 
 Connection open to a server at a time; requests and responses are multiplexed 
 over this single connection.
 Broken stuff:
 + Puts are synchronized to protect the write buffer, so only one thread at a 
 time appends, but flushCommits is open for any thread to call.  Once the 
 write buffer is full, all Puts block until it is freed again. This looks like 
 a hang if there are hundreds of threads, each write is to a random region in 
 a big table, and each write has to have its region looked up (there may be 
 some other brokenness in here because this bottleneck seems to last longer 
 than it should, even with hundreds of threads).
 Ideas:
 + A query of the cache does not block all access to the cache.  We only block 
 access if the wanted region is being looked up, so other reads and writes to 
 regions whose location we know can go ahead.
 + nio'd client and server

-- 
This message is automatically generated by JIRA.
-
You can reply to this email to add a comment to the issue online.