date:20110928

[
https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116498#comment-13116498
]

Ted Yu commented on HBASE-4497:
---

Ming's idea @ 28/Sep/11 04:56, especially point 3, is interesting.
I like it as a long-term solution.
We need to be careful when writing migration code to accommodate the new operation
Id.

If region opening fails after updating META HBCK reports it as inconsistent
and scanning the region throws NSRE
---

Key: HBASE-4497
URL: https://issues.apache.org/jira/browse/HBASE-4497
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Priority: Critical

[jira] [Updated] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-09-28 Thread Bright Fulton (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Bright Fulton updated HBASE-3777:
-

Attachment: HBASE-3777-V8.0.90.4.backport.patch

Attached backport of fix to 0.90.4.


 Redefine Identity Of HBase Configuration
 

 Key: HBASE-3777
 URL: https://issues.apache.org/jira/browse/HBASE-3777
 Project: HBase
  Issue Type: Improvement
  Components: client, ipc
Affects Versions: 0.90.2
Reporter: Karthick Sankarachary
Assignee: Karthick Sankarachary
Priority: Minor
 Fix For: 0.92.0

 Attachments: 3777-TOF.patch, HBASE-3777-V2.patch, 
 HBASE-3777-V3.patch, HBASE-3777-V4.patch, HBASE-3777-V6.patch, 
 HBASE-3777-V8.0.90.4.backport.patch, HBASE-3777.patch


 Judging from the javadoc in {{HConnectionManager}}, sharing connections 
 across multiple clients going to the same cluster is supposedly a good thing. 
 However, the fact that there is a one-to-one mapping between a configuration 
 and connection instance, kind of works against that goal. Specifically, when 
 you create {{HTable}} instances using a given {{Configuration}} instance and 
 a copy thereof, we end up with two distinct {{HConnection}} instances under 
 the covers. Is this really expected behavior, especially given that the 
 configuration instance gets cloned a lot?
 Here, I'd like to play devil's advocate and propose that we deep-compare 
 {{HBaseConfiguration}} instances, so that multiple {{HBaseConfiguration}} 
 instances that have the same properties map to the same {{HConnection}} 
 instance. In case one is concerned that a single {{HConnection}} is 
 insufficient for sharing amongst clients,  to quote the javadoc, then one 
 should be able to mark a given {{HBaseConfiguration}} instance as being 
 uniquely identifiable.
 Note that sharing connections makes clean up of {{HConnection}} instances a 
 little awkward, unless of course, you apply the change described in 
 HBASE-3766.
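
The deep-compare proposal above can be sketched as follows. This is a minimal illustration, not the attached patch: ConnectionKey and ConnectionCache are hypothetical names, a plain Map<String, String> stands in for HBaseConfiguration, and the list of properties that make up the key is an assumption.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical cache key: two keys are equal when the selected properties are equal. */
final class ConnectionKey {
    // Assumed list of connection-relevant properties; the real list may differ.
    static final String[] KEY_PROPERTIES = {
        "hbase.zookeeper.quorum",
        "zookeeper.znode.parent",
        "hbase.client.retries.number"
    };

    private final Map<String, String> values = new TreeMap<>();

    ConnectionKey(Map<String, String> conf) {
        // Copy only the properties that are assumed to affect the connection.
        for (String name : KEY_PROPERTIES) {
            String value = conf.get(name);
            if (value != null) {
                values.put(name, value);
            }
        }
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof ConnectionKey && values.equals(((ConnectionKey) o).values);
    }

    @Override
    public int hashCode() {
        return values.hashCode();
    }
}

/** Hypothetical cache; a plain Object stands in for an HConnection. */
final class ConnectionCache {
    private final Map<ConnectionKey, Object> connections = new HashMap<>();

    /** Returns the shared connection stand-in for configurations with equal key properties. */
    synchronized Object getConnection(Map<String, String> conf) {
        return connections.computeIfAbsent(new ConnectionKey(conf), k -> new Object());
    }

    synchronized int size() {
        return connections.size();
    }
}
```

With this shape, a configuration and a clone of it resolve to the same cached entry, while a configuration pointing at a different quorum gets its own.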

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-09-28 Thread ramkrishna.s.vasudevan (Updated) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116518#comment-13116518
 ] 

Ted Yu commented on HBASE-3777:
---

@Bright:
Do all tests in 0.90 pass?

I got the following when applying your patch:
{code}
Hunk #11 succeeded at 1416 (offset 8 lines).
1 out of 11 hunks FAILED -- saving rejects to file 
src/main/java/org/apache/hadoop/hbase/client/HTable.java.rej
{code}
This is minor.

Running test suite.


[jira] [Updated] (HBASE-4492) TestRollingRestart fails intermittently


 [ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

ramkrishna.s.vasudevan updated HBASE-4492:
--

Attachment: HBASE-4492.patch

I apologise for the silly and careless mistake committed there in HBASE-4153.
Will ensure such things are not repeated.
@Ted
Ran the testcases using the script many times and it did not fail after this.

 TestRollingRestart fails intermittently
 ---

 Key: HBASE-4492
 URL: https://issues.apache.org/jira/browse/HBASE-4492
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Jonathan Gray
 Attachments: 4492.txt, HBASE-4492.patch


 I got the following when running test suite on TRUNK:
 {code}
 testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart)  
 Time elapsed: 300.28 sec   ERROR!
 java.lang.Exception: test timed out after 30 milliseconds
 at java.lang.Thread.sleep(Native Method)
 at 
 org.apache.hadoop.hbase.master.TestRollingRestart.waitForRSShutdownToStartAndFinish(TestRollingRestart.java:313)
 at 
 org.apache.hadoop.hbase.master.TestRollingRestart.testBasicRollingRestart(TestRollingRestart.java:210)
 {code}
 I ran TestRollingRestart#testBasicRollingRestart manually afterwards, which 
 wiped out the test output file for the failed test.
 A similar failure can be found on Jenkins:
 https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/19/testReport/junit/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/


[jira] [Commented] (HBASE-4492) TestRollingRestart fails intermittently

2011-09-28 Thread ramkrishna.s.vasudevan (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116523#comment-13116523
 ] 

ramkrishna.s.vasudevan commented on HBASE-4492:
---

So my analysis in the comments 27/Sep/11 14:40 is not right. 


[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE

2011-09-28 Thread ramkrishna.s.vasudevan (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116524#comment-13116524
]

ramkrishna.s.vasudevan commented on HBASE-4497:
---

As Ming suggested,
we can generate an incrementing integer per region on the master side
and pass that value over RPC, so that it can be checked
before updating META.

This value can be maintained in the master side in a map with region as the key.
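
A rough sketch of the per-region counter described above, under stated assumptions: RegionOpenIds is a hypothetical class, region names are plain strings, and the RPC and the META update themselves are left out.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical master-side tracker of per-region open attempts. */
final class RegionOpenIds {
    // Region name -> latest id issued for that region.
    private final ConcurrentHashMap<String, AtomicLong> idsByRegion = new ConcurrentHashMap<>();

    /** The master would call this before sending an open request; the id travels with the RPC. */
    long nextId(String regionName) {
        return idsByRegion.computeIfAbsent(regionName, r -> new AtomicLong()).incrementAndGet();
    }

    /** True only if the id is the latest one issued for the region. */
    boolean isCurrent(String regionName, long id) {
        AtomicLong latest = idsByRegion.get(regionName);
        return latest != null && latest.get() == id;
    }
}
```

A region server holding a stale id would fail the isCurrent check and skip the META update.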


[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE

[
https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116525#comment-13116525
]

Jonathan Gray commented on HBASE-4497:
--

I don't think we can use the same ID as the ZK node. But we could just use some
incrementing number.

An alternative would be to instead allow the roll-back of the META edit using a
checkAndDelete which might be simpler but less optimal.
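
The roll-back alternative relies on a conditional delete. The sketch below shows only that semantics, using an in-memory map as a hypothetical stand-in for the region-to-server entry in META; it does not use the HBase client API, and MetaRollback is an invented name.

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

/** Hypothetical in-memory stand-in for the region-to-server column in META. */
final class MetaRollback {
    private final ConcurrentMap<String, String> serverByRegion = new ConcurrentHashMap<>();

    void recordLocation(String region, String server) {
        serverByRegion.put(region, server);
    }

    /**
     * Removes the location only if it still names the server whose open failed,
     * so a newer assignment written for another server is left untouched.
     */
    boolean rollBack(String region, String failedServer) {
        return serverByRegion.remove(region, failedServer);
    }

    String location(String region) {
        return serverByRegion.get(region);
    }
}
```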


[jira] [Commented] (HBASE-4488) Store could miss rows during flush

2011-09-28 Thread Lars Hofhansl (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116547#comment-13116547
 ] 

Lars Hofhansl commented on HBASE-4488:
--

Looks like HBASE-4433 can potentially trigger this issue.

 Store could miss rows during flush
 --

 Key: HBASE-4488
 URL: https://issues.apache.org/jira/browse/HBASE-4488
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0, 0.94.0

 Attachments: 4488.txt


 While looking at HBASE-4344 I found that my change HBASE-4241 contains a 
 critical mistake:
 The while(scanner.next(kvs)) loop is incorrect and might miss the last edits.
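
To illustrate the loop problem, here is a small sketch assuming the scanner contract implied above: next() may fill the list on the same call that returns false. BatchScanner and FlushLoops are hypothetical stand-ins, not the HBase classes, and strings stand in for KeyValues.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical scanner: next() fills 'out' and returns false once nothing remains after this batch. */
final class BatchScanner {
    private final Iterator<String> rows;

    BatchScanner(List<String> rows) {
        this.rows = rows.iterator();
    }

    boolean next(List<String> out) {
        if (rows.hasNext()) {
            out.add(rows.next());
        }
        return rows.hasNext();
    }
}

final class FlushLoops {
    /** Problematic shape: the batch delivered together with 'false' is never processed. */
    static List<String> drainWithWhile(BatchScanner scanner) {
        List<String> flushed = new ArrayList<>();
        List<String> kvs = new ArrayList<>();
        while (scanner.next(kvs)) {
            flushed.addAll(kvs);
            kvs.clear();
        }
        return flushed;
    }

    /** Safer shape: process the batch first, then test the return value. */
    static List<String> drainWithDoWhile(BatchScanner scanner) {
        List<String> flushed = new ArrayList<>();
        List<String> kvs = new ArrayList<>();
        boolean hasMore;
        do {
            hasMore = scanner.next(kvs);
            flushed.addAll(kvs);
            kvs.clear();
        } while (hasMore);
        return flushed;
    }
}
```

In this sketch the while form drops the final batch, and the do-while form keeps it.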


[jira] [Commented] (HBASE-4488) Store could miss rows during flush

2011-09-28 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116552#comment-13116552
 ] 

Jonathan Gray commented on HBASE-4488:
--

Can you explain what you mean Lars?  Something is wrong with HBASE-4433 or 
there's nothing to worry about once I commit this :)


[jira] [Updated] (HBASE-4488) Store could miss rows during flush

2011-09-28 Thread Jonathan Gray (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray updated HBASE-4488:
-

   Resolution: Fixed
Fix Version/s: (was: 0.94.0)
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Committed to 92 branch and trunk, marking against 92.

Thanks Lars!

 Store could miss rows during flush
 --

 Key: HBASE-4488
 URL: https://issues.apache.org/jira/browse/HBASE-4488
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: 4488.txt



[jira] [Resolved] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-28 Thread Jonathan Gray (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Gray resolved HBASE-4131.
--

Resolution: Fixed

Committed to trunk.  Good work Dhruba and thanks for review Ted!

 Make the Replication Service pluggable via a standard interface definition
 --

 Key: HBASE-4131
 URL: https://issues.apache.org/jira/browse/HBASE-4131
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.94.0

 Attachments: 4131-backedout.txt, replicationInterface1.txt, 
 replicationInterface2.txt, replicationInterface3.txt, 
 replicationInterface4.txt


 The current HBase code supports a replication service that can be used to 
 sync data from one HBase cluster to another. It would be nice to make it 
 a pluggable interface so that other cross-data-center replication services 
 can be used in conjunction with HBase.


[jira] [Commented] (HBASE-2794) ROWCOL bloom filter not used if multiple columns within same family are requested in a Get


[ 
https://issues.apache.org/jira/browse/HBASE-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116569#comment-13116569
 ] 

jirapos...@reviews.apache.org commented on HBASE-2794:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2084/
---

Review request for hbase.


Summary
---

Previously we only used row-column Bloom filters for scans that only requested 
one column. We have seen production queries that request up to 200 columns, and 
with say ~6 store files per store (region / column family combination) this 
might have resulted in 1200 block read operations in the worst case. With this 
diff we will be avoiding seeks on store files that we know don't contain the 
row/column of interest when using an ExplicitColumnTracker. The performance 
should remain the same for column range queries.


This addresses bug HBASE-2794.
https://issues.apache.org/jira/browse/HBASE-2794


Diffs
-

  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 
08d3ba4 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java ac2348e 
  src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 4aa72de 
  src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 
68cdac5 
  src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java fd9e7ef 
  src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 9d9895c 
  src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java 
6cdada7 
  src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 7cbdb98 
  
src/main/java/org/apache/hadoop/hbase/regionserver/AbstractKeyValueScanner.java 
PRE-CREATION 
  src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8 
  src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java 
f5173c4 
  src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java a3d778e 
  src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java 
32f88fb 
  src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java 
a5d13f7 
  
src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiColumnScanner.java 
baee696 
  
src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/2084/diff


Testing
---

Existing unit tests. A new unit test (TestScanWithBloomError). Load testing 
using HBaseTest.


Thanks,

Mikhail



 ROWCOL bloom filter not used if multiple columns within same family are 
 requested in a Get
 --

 Key: HBASE-2794
 URL: https://issues.apache.org/jira/browse/HBASE-2794
 Project: HBase
  Issue Type: Improvement
  Components: performance
Reporter: Kannan Muthukkaruppan

 Noticed the following snippet in StoreFile.java:Scanner:shouldSeek():
 {code}
 switch(bloomFilterType) {
   case ROW:
     key = row;
     break;
   case ROWCOL:
     if (columns.size() == 1) {
       byte[] col = columns.first();
       key = Bytes.add(row, col);
       break;
     }
     //$FALL-THROUGH$
   default:
     return true;
 }
 {code}
 If columns.size > 1, then we currently don't take advantage of the bloom 
 filter.  We should optimize this to check bloom for each of columns and if 
 none of the columns are present in the bloom avoid opening the file.
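
The suggested optimization could look roughly like the sketch below. It is only an illustration: RowColFilter is a hypothetical stand-in backed by an exact set (a real Bloom filter can also return false positives), strings stand in for byte arrays, and SeekDecision.shouldSeek is not the method in StoreFile.java.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for a ROWCOL Bloom filter, backed by an exact set. */
final class RowColFilter {
    private final Set<String> keys = new HashSet<>();

    void add(String row, String column) {
        keys.add(row + "/" + column);
    }

    boolean mightContain(String row, String column) {
        return keys.contains(row + "/" + column);
    }
}

final class SeekDecision {
    /** The file needs a seek if at least one requested column may be present. */
    static boolean shouldSeek(RowColFilter filter, String row, List<String> columns) {
        if (columns.isEmpty()) {
            return true; // no explicit columns requested: the filter cannot rule the file out
        }
        for (String column : columns) {
            if (filter.mightContain(row, column)) {
                return true;
            }
        }
        return false; // none of the requested columns can be in this file
    }
}
```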


[jira] [Commented] (HBASE-4488) Store could miss rows during flush

2011-09-28 Thread Lars Hofhansl (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116572#comment-13116572
 ] 

Lars Hofhansl commented on HBASE-4488:
--

Nothing to worry about once this is committed.
Just saw that HBASE-4433 changes the conditions under which the StoreScanner 
still has rows while returning false.



[jira] [Commented] (HBASE-3777) Redefine Identity Of HBase Configuration

2011-09-28 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3777?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116575#comment-13116575
 ] 

Ted Yu commented on HBASE-3777:
---

Test suite didn't go very far - TestLogRolling hangs
{code}
main prio=10 tid=0x57197000 nid=0x3f12 waiting on condition 
[0x406c8000]
   java.lang.Thread.State: TIMED_WAITING (sleeping)
at java.lang.Thread.sleep(Native Method)
at 
org.apache.hadoop.hbase.client.HBaseAdmin.deleteTable(HBaseAdmin.java:446)
at 
org.apache.hadoop.hbase.client.HBaseAdmin.deleteTable(HBaseAdmin.java:393)
at 
org.apache.hadoop.hbase.regionserver.wal.TestLogRolling.testLogRollOnPipelineRestart(TestLogRolling.java:415)
{code}


[jira] [Commented] (HBASE-2794) ROWCOL bloom filter not used if multiple columns within same family are requested in a Get


[ 
https://issues.apache.org/jira/browse/HBASE-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116578#comment-13116578
 ] 

jirapos...@reviews.apache.org commented on HBASE-2794:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2084/#review2130
---


nice work mikhail!  i will let someone else give the +1 though


src/main/java/org/apache/hadoop/hbase/KeyValue.java
https://reviews.apache.org/r/2084/#comment4946

method doesn't actually take a KeyValue... this is to create the last KV 
on the row and column for the KeyValue this is called on?



src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java
https://reviews.apache.org/r/2084/#comment4947

got it.  maybe add a comment on this method to explain this usage



src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java
https://reviews.apache.org/r/2084/#comment4948

license


- Jonathan






[jira] [Created] (HBASE-4504) book.xml - adding section on filters

2011-09-28 Thread Doug Meil (Created) (JIRA)

book.xml - adding section on filters


 Key: HBASE-4504
 URL: https://issues.apache.org/jira/browse/HBASE-4504
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor


Under Architecture
* new sub-section for Client Filters, with sub-sections by filter-type, etc.


[jira] [Updated] (HBASE-4504) book.xml - adding section on filters

2011-09-28 Thread Doug Meil (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-4504:
-

Status: Patch Available  (was: Open)

 book.xml - adding section on filters
 

 Key: HBASE-4504
 URL: https://issues.apache.org/jira/browse/HBASE-4504
 Project: HBase
  Issue Type: Improvement
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Minor
 Attachments: book_HBASE_4504.xml.patch


 Under Architecture
 * new sub-section for Client Filters, with sub-sections by filter-type, etc.


[jira] [Updated] (HBASE-4504) book.xml - adding section on filters

2011-09-28 Thread Doug Meil (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-4504:
-

Attachment: book_HBASE_4504.xml.patch


[jira] [Updated] (HBASE-4504) book.xml - adding section on filters

2011-09-28 Thread Doug Meil (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4504?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Doug Meil updated HBASE-4504:
-

Resolution: Fixed
Status: Resolved  (was: Patch Available)


[jira] [Updated] (HBASE-4492) TestRollingRestart fails intermittently


 [ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4492:
--

Attachment: 4492-v2.txt

Patch v2 combines my patch with Ramkrishna's fix.

 TestRollingRestart fails intermittently
 ---

 Key: HBASE-4492
 URL: https://issues.apache.org/jira/browse/HBASE-4492
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: Jonathan Gray
 Attachments: 4492-v2.txt, 4492.txt, HBASE-4492.patch


 I got the following when running the test suite on TRUNK:
 {code}
 testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart)  
 Time elapsed: 300.28 sec   ERROR!
 java.lang.Exception: test timed out after 30 milliseconds
 at java.lang.Thread.sleep(Native Method)
 at 
 org.apache.hadoop.hbase.master.TestRollingRestart.waitForRSShutdownToStartAndFinish(TestRollingRestart.java:313)
 at 
 org.apache.hadoop.hbase.master.TestRollingRestart.testBasicRollingRestart(TestRollingRestart.java:210)
 {code}
 I then ran TestRollingRestart#testBasicRollingRestart manually, which 
 wiped out the test output file for the failed test.
 A similar failure can be found on Jenkins:
 https://builds.apache.org/view/G-L/view/HBase/job/HBase-0.92/19/testReport/junit/org.apache.hadoop.hbase.master/TestRollingRestart/testBasicRollingRestart/


[jira] [Assigned] (HBASE-4492) TestRollingRestart fails intermittently

2011-09-28 Thread Ted Yu (Assigned) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu reassigned HBASE-4492:
-

Assignee: ramkrishna.s.vasudevan  (was: Jonathan Gray)

 TestRollingRestart fails intermittently
 ---

 Key: HBASE-4492
 URL: https://issues.apache.org/jira/browse/HBASE-4492
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: ramkrishna.s.vasudevan
 Attachments: 4492-v2.txt, 4492.txt, HBASE-4492.patch



[jira] [Commented] (HBASE-4492) TestRollingRestart fails intermittently


[ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116585#comment-13116585
 ] 

Jonathan Gray commented on HBASE-4492:
--

+1 for commit

 TestRollingRestart fails intermittently
 ---

 Key: HBASE-4492
 URL: https://issues.apache.org/jira/browse/HBASE-4492
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: ramkrishna.s.vasudevan
 Attachments: 4492-v2.txt, 4492.txt, HBASE-4492.patch



[jira] [Commented] (HBASE-4492) TestRollingRestart fails intermittently

2011-09-28 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116586#comment-13116586
 ] 

Jonathan Gray commented on HBASE-4492:
--

(92 branch)

 TestRollingRestart fails intermittently
 ---

 Key: HBASE-4492
 URL: https://issues.apache.org/jira/browse/HBASE-4492
 Project: HBase
  Issue Type: Test
Reporter: Ted Yu
Assignee: ramkrishna.s.vasudevan
 Attachments: 4492-v2.txt, 4492.txt, HBASE-4492.patch



[jira] [Updated] (HBASE-4145) Provide metrics for hbase client

2011-09-28 Thread Ming Ma (Updated) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Ming Ma updated HBASE-4145:
---

Attachment: HBaseClientSideMetrics.jpg

Here is a screenshot of what it looks like on the JobTracker UI.

Provide metrics for hbase client

Key: HBASE-4145
URL: https://issues.apache.org/jira/browse/HBASE-4145
Project: HBase
Issue Type: Improvement
Reporter: Ming Ma
Assignee: Ming Ma
Attachments: HBaseClientSideMetrics.jpg

Sometimes it is useful to get metrics from the HBase client's point of view.
This would help in understanding the metrics for the scan/TableInputFormat
map job scenario.
What to capture, for example, for each ResultScanner object:
1. The number of RPC calls to RSs.
2. The delta time between consecutive RPC calls in the current serialized
scan implementation.
3. The number of RPC retries to RSs.
4. The number of NotServingRegionExceptions received.
5. The number of remote RPC calls. This excludes calls that the HBase client
makes to an RS on the same machine.
6. The number of regions accessed.
How to capture:
1. The metrics framework works for a fixed number of metrics, so it doesn't
fit this scenario.
2. Use some TBD solution in HBase to capture such dynamic metrics. If we
assume there is a solution in HBase that the HBase client can use to log this
kind of metric, TableInputFormat can pass in the mapreduce task ID as an
application scan ID to the HBase client, as a small addition to the existing
scan API, and the HBase client can log metrics accordingly with that ID. That
would allow later querying and analysis of the metrics data for a specific
MapReduce job.
3. Expose via MapReduce counters. This lacks certain features: for example,
there is no good way to access the metrics per map instance, and the MapReduce
framework only sums the counter values, so it is tricky to find the
max of a given metric across all mapper instances. However, it might be good
enough for now. With this approach, the metric values will be available via
MapReduce counters.
a) Have ResultScanner return a new ResultScannerMetrics interface.
b) TableInputFormat will access data from ResultScannerMetrics and populate
MapReduce counters accordingly.
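
A minimal sketch of the counter-based approach in option 3, assuming a plain per-scanner holder of counters whose snapshots are later summed the way a sum-only counter framework would combine them. The class name, counter names, and methods below are hypothetical and are not taken from the patch; the sketch uses only the Java standard library rather than the HBase or MapReduce APIs.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-scanner metrics holder; all names are illustrative only. */
final class ScanMetricsSketch {
    private final AtomicLong rpcCalls = new AtomicLong();
    private final AtomicLong remoteRpcCalls = new AtomicLong();
    private final AtomicLong rpcRetries = new AtomicLong();

    /** Record one RPC; 'remote' is false when the RS runs on the client's machine. */
    void recordRpc(boolean remote) {
        rpcCalls.incrementAndGet();
        if (remote) {
            remoteRpcCalls.incrementAndGet();
        }
    }

    /** Record one retried RPC. */
    void recordRetry() {
        rpcRetries.incrementAndGet();
    }

    /** Snapshot the counters as name -> value, ready to copy into job counters. */
    Map<String, Long> snapshot() {
        Map<String, Long> out = new LinkedHashMap<>();
        out.put("RPC_CALLS", rpcCalls.get());
        out.put("REMOTE_RPC_CALLS", remoteRpcCalls.get());
        out.put("RPC_RETRIES", rpcRetries.get());
        return out;
    }

    /** Combine per-mapper snapshots by summation, as a sum-only framework would. */
    static Map<String, Long> sum(List<Map<String, Long>> snapshots) {
        Map<String, Long> total = new LinkedHashMap<>();
        for (Map<String, Long> snapshot : snapshots) {
            for (Map.Entry<String, Long> e : snapshot.entrySet()) {
                total.merge(e.getKey(), e.getValue(), Long::sum);
            }
        }
        return total;
    }
}
```

Because the combination step only sums, a per-mapper maximum cannot be recovered from the totals, which is the limitation noted in option 3.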

[jira] [Commented] (HBASE-4145) Provide metrics for hbase client

[
https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116588#comment-13116588
]

jirapos...@reviews.apache.org commented on HBASE-4145:
--

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/1674/
---

(Updated 2011-09-28 16:35:57.691899)

Review request for hbase.

Changes
---

Merge with latest trunk.
Ran unit tests a couple more times.

Summary
---

1. Collect client-side scan-related metrics during scan operations. Collection
is turned off by default.
2. TableInputFormat enables metrics collection on the scan and passes the data
to the mapreduce framework. It only works with the new mapreduce APIs that
allow TableInputFormat to get access to mapreduce Counters.
3. Clean up some minor issues in TableInputFormat as well as test code.

This addresses bug hbase-4145.
https://issues.apache.org/jira/browse/hbase-4145

Diffs (updated)
-

http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapred/TestTableInputFormat.java
1176942

http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/mapreduce/TestTableInputFormatScan.java
1176942

http://svn.apache.org/repos/asf/hbase/trunk/src/test/java/org/apache/hadoop/hbase/client/TestFromClientSide.java
1176942

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableInputFormatBase.java
1176942

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReader.java
1176942

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/mapreduce/TableRecordReaderImpl.java
1176942

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/metrics/ScanMetrics.java
PRE-CREATION

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/MetaScanner.java
1176942

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/Scan.java
1176942

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/ScannerCallable.java
1176942

http://svn.apache.org/repos/asf/hbase/trunk/src/main/java/org/apache/hadoop/hbase/client/HTable.java
1176942

Diff: https://reviews.apache.org/r/1674/diff

Testing
---

1. Verified on a small cluster.
2. Existing unit tests.
3. Added new tests.

Thanks,

Ming

Provide metrics for hbase client

Key: HBASE-4145
URL: https://issues.apache.org/jira/browse/HBASE-4145
Project: HBase
Issue Type: Improvement
Reporter: Ming Ma
Assignee: Ming Ma
Attachments: HBaseClientSideMetrics.jpg

[jira] [Updated] (HBASE-4492) TestRollingRestart fails intermittently


 [ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4492:
--

Affects Version/s: 0.92.0
Fix Version/s: 0.92.0

 TestRollingRestart fails intermittently
 ---

 Key: HBASE-4492
 URL: https://issues.apache.org/jira/browse/HBASE-4492
 Project: HBase
  Issue Type: Test
Affects Versions: 0.92.0
Reporter: Ted Yu
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0

 Attachments: 4492-v2.txt, 4492.txt, HBASE-4492.patch



[jira] [Commented] (HBASE-4492) TestRollingRestart fails intermittently


[ 
https://issues.apache.org/jira/browse/HBASE-4492?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116602#comment-13116602
 ] 

Ted Yu commented on HBASE-4492:
---

TestMasterObserver#testRegionTransitionOperations had a little hiccup during 
test suite run.
It passed when run individually.

 TestRollingRestart fails intermittently
 ---

 Key: HBASE-4492
 URL: https://issues.apache.org/jira/browse/HBASE-4492
 Project: HBase
  Issue Type: Test
Affects Versions: 0.92.0
Reporter: Ted Yu
Assignee: ramkrishna.s.vasudevan
 Fix For: 0.92.0

 Attachments: 4492-v2.txt, 4492.txt, HBASE-4492.patch



[jira] [Created] (HBASE-4505) SubstringComparator - Javadoc refers to a class that doesn't exist

2011-09-28 Thread Doug Meil (Created) (JIRA)

SubstringComparator - Javadoc refers to a class that doesn't exist
--

 Key: HBASE-4505
 URL: https://issues.apache.org/jira/browse/HBASE-4505
 Project: HBase
  Issue Type: Bug
Reporter: Doug Meil
Assignee: Doug Meil
Priority: Trivial


For example, see 
http://hbase.apache.org/apidocs/org/apache/hadoop/hbase/filter/SubstringComparator.html

The Javadoc says "This comparator is for use with ColumnValueFilter", but no 
such class exists.  It should be SingleColumnValueFilter.  The code example in 
the Javadoc is wrong too.


[jira] [Created] (HBASE-4506) [hbck] Allow HBaseFsck to be instantiated without connecting

2011-09-28 Thread Jonathan Hsieh (Created) (JIRA)

[hbck] Allow HBaseFsck to be instantiated without connecting


 Key: HBASE-4506
 URL: https://issues.apache.org/jira/browse/HBASE-4506
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh


This is a semantics-preserving patch that allows the offline meta rebuild 
(HBASE-4377) to reuse the existing hbck code when HBase is down.


[jira] [Work started] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.

2011-09-28 Thread Jonathan Hsieh (Work started) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on HBASE-4377 started by Jonathan Hsieh.

 [hbck] Offline rebuild .META. from fs data only.
 

 Key: HBASE-4377
 URL: https://issues.apache.org/jira/browse/HBASE-4377
 Project: HBase
  Issue Type: New Feature
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh

 In a worst-case situation, it may be helpful to have an offline .META. 
 rebuilder that just looks at the file system's .regioninfos and rebuilds meta 
 from scratch.  Users could move bad regions out until there is a clean 
 rebuild.  
 It would likely fill in region split holes.  Follow-on work could give 
 options to merge or select regions that overlap, or do online rebuilds.


[jira] [Updated] (HBASE-4506) [hbck] Allow HBaseFsck to be instantiated without connecting


 [ 
https://issues.apache.org/jira/browse/HBASE-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-4506:
--

Attachment: (was: 
0001-FLUME-580-Flume-needs-to-be-consistent-with-autodisc.patch)

 [hbck] Allow HBaseFsck to be instantiated without connecting
 

 Key: HBASE-4506
 URL: https://issues.apache.org/jira/browse/HBASE-4506
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: 
 0001-HBASE-4506-hbck-Allow-HBaseFsck-to-be-instantiated-w.patch



[jira] [Updated] (HBASE-4506) [hbck] Allow HBaseFsck to be instantiated without connecting


 [ 
https://issues.apache.org/jira/browse/HBASE-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-4506:
--

Attachment: 0001-FLUME-580-Flume-needs-to-be-consistent-with-autodisc.patch

 [hbck] Allow HBaseFsck to be instantiated without connecting
 

 Key: HBASE-4506
 URL: https://issues.apache.org/jira/browse/HBASE-4506
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: 
 0001-HBASE-4506-hbck-Allow-HBaseFsck-to-be-instantiated-w.patch



[jira] [Updated] (HBASE-4506) [hbck] Allow HBaseFsck to be instantiated without connecting

2011-09-28 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-4506:
--

Attachment: 0001-HBASE-4506-hbck-Allow-HBaseFsck-to-be-instantiated-w.patch

 [hbck] Allow HBaseFsck to be instantiated without connecting
 

 Key: HBASE-4506
 URL: https://issues.apache.org/jira/browse/HBASE-4506
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: 
 0001-HBASE-4506-hbck-Allow-HBaseFsck-to-be-instantiated-w.patch



[jira] [Commented] (HBASE-4506) [hbck] Allow HBaseFsck to be instantiated without connecting


[ 
https://issues.apache.org/jira/browse/HBASE-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116622#comment-13116622
 ] 

jirapos...@reviews.apache.org commented on HBASE-4506:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2085/
---

Review request for hbase and Michael Stack.


Summary
---

commit d51a9fa5f3419114deca8ecd71f4f1ec4d2a6bc5
Author: Jonathan Hsieh j...@cloudera.com
Date:   Wed Sep 28 10:18:00 2011 -0700

HBASE-4506 [hbck] Allow HBaseFsck to be instantiated without connecting

This is a semantics preserving patch that allows for offline meta rebuild 
(HBASE-4377) to reuse code in the existing hbck code when hbase is down.


This addresses bug HBASE-4506.
https://issues.apache.org/jira/browse/HBASE-4506


Diffs
-

  src/main/java/org/apache/hadoop/hbase/util/HBaseFsck.java 8465724 
  src/test/java/org/apache/hadoop/hbase/util/TestHBaseFsck.java fae0881 

Diff: https://reviews.apache.org/r/2085/diff


Testing
---

TestHBaseFsck passes.


Thanks,

jmhsieh



 [hbck] Allow HBaseFsck to be instantiated without connecting
 

 Key: HBASE-4506
 URL: https://issues.apache.org/jira/browse/HBASE-4506
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: 
 0001-HBASE-4506-hbck-Allow-HBaseFsck-to-be-instantiated-w.patch



[jira] [Commented] (HBASE-2794) ROWCOL bloom filter not used if multiple columns within same family are requested in a Get

2011-09-28 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116628#comment-13116628
 ] 

jirapos...@reviews.apache.org commented on HBASE-2794:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2084/#review2137
---


This is an important feature.

Since the boolean parameter, forward, correlates so closely with reseek, can we 
give it a better name?
I was thinking of either reseek or forwardOnly.

- Ted


On 2011-09-28 16:03:52, Mikhail Bautin wrote:
bq.  
bq.  ---
bq.  This is an automatically generated e-mail. To reply, visit:
bq.  https://reviews.apache.org/r/2084/
bq.  ---
bq.  
bq.  (Updated 2011-09-28 16:03:52)
bq.  
bq.  
bq.  Review request for hbase.
bq.  
bq.  
bq.  Summary
bq.  ---
bq.  
bq.  Previously we only used row-column Bloom filters for scans that only 
bq.  requested one column. We have seen production queries that request up to 200 
bq.  columns, and with say ~6 store files per store (region / column family 
bq.  combination) this might have resulted in 1200 block read operations in the 
bq.  worst case. With this diff we will be avoiding seeks on store files that we 
bq.  know don't contain the row/column of interest when using an 
bq.  ExplicitColumnTracker. The performance should remain the same for column range 
bq.  queries.
bq.  
bq.  
bq.  This addresses bug HBASE-2794.
bq.  https://issues.apache.org/jira/browse/HBASE-2794
bq.  
bq.  
bq.  Diffs
bq.  -
bq.  
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/StoreFileScanner.java 08d3ba4 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/StoreScanner.java ac2348e 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/MemStore.java 4aa72de 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/ScanQueryMatcher.java 68cdac5 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java fd9e7ef 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueHeap.java 9d9895c 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/KeyValueScanner.java 6cdada7 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 7cbdb98 
bq.    src/main/java/org/apache/hadoop/hbase/regionserver/AbstractKeyValueScanner.java PRE-CREATION 
bq.    src/main/java/org/apache/hadoop/hbase/KeyValue.java 585c4a8 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java f5173c4 
bq.    src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java a3d778e 
bq.    src/main/java/org/apache/hadoop/hbase/util/CollectionBackedScanner.java 32f88fb 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/TestKeyValueHeap.java a5d13f7 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/TestMultiColumnScanner.java baee696 
bq.    src/test/java/org/apache/hadoop/hbase/regionserver/TestScanWithBloomError.java PRE-CREATION 
bq.  
bq.  Diff: https://reviews.apache.org/r/2084/diff
bq.  
bq.  
bq.  Testing
bq.  ---
bq.  
bq.  Existing unit tests. A new unit test (TestScanWithBloomError). Load 
bq.  testing using HBaseTest.
bq.  
bq.  
bq.  Thanks,
bq.  
bq.  Mikhail
bq.  
bq.



 ROWCOL bloom filter not used if multiple columns within same family are 
 requested in a Get
 --

 Key: HBASE-2794
 URL: https://issues.apache.org/jira/browse/HBASE-2794
 Project: HBase
  Issue Type: Improvement
  Components: performance
Reporter: Kannan Muthukkaruppan

 Noticed the following snippet in StoreFile.java:Scanner:shouldSeek():
 {code}
 switch(bloomFilterType) {
   case ROW:
     key = row;
     break;
   case ROWCOL:
     if (columns.size() == 1) {
       byte[] col = columns.first();
       key = Bytes.add(row, col);
       break;
     }
     //$FALL-THROUGH$
   default:
     return true;
 }
 {code}
 If columns.size() > 1, then we currently don't take advantage of the bloom 
 filter.  We should optimize this to check the bloom for each of the columns 
 and, if none of the columns are present in the bloom, avoid opening the file.
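
 The proposed multi-column check can be sketched as follows. This is a minimal illustration that assumes a toy two-hash Bloom filter over row+column keys, not the HBase implementation; the class and method names are hypothetical. The point is only the decision rule: seek the file if any requested column may be present, and skip it when the Bloom filter rules out every one of them.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.BitSet;
import java.util.List;

/** Toy row+column Bloom filter; a stand-in for illustration, not HBase code. */
final class RowColBloomSketch {
    private final BitSet bits;
    private final int size;

    RowColBloomSketch(int size) {
        this.size = size;
        this.bits = new BitSet(size);
    }

    static byte[] bytes(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    /** Build the composite key by appending the column to the row. */
    private static byte[] concat(byte[] row, byte[] col) {
        byte[] out = Arrays.copyOf(row, row.length + col.length);
        System.arraycopy(col, 0, out, row.length, col.length);
        return out;
    }

    /** Two simple hash positions per key (an arbitrary choice for this sketch). */
    private int[] positions(byte[] key) {
        int h1 = Arrays.hashCode(key);
        int h2 = h1 * 31 + key.length;
        return new int[] { Math.floorMod(h1, size), Math.floorMod(h2, size) };
    }

    void add(byte[] row, byte[] col) {
        for (int p : positions(concat(row, col))) {
            bits.set(p);
        }
    }

    boolean mightContain(byte[] row, byte[] col) {
        for (int p : positions(concat(row, col))) {
            if (!bits.get(p)) {
                return false; // definitely absent
            }
        }
        return true; // possibly present (false positives are allowed)
    }

    /** Seek unless the Bloom filter rules out every requested column. */
    boolean shouldSeek(byte[] row, List<byte[]> columns) {
        if (columns.isEmpty()) {
            return true; // no explicit columns: the file cannot be ruled out
        }
        for (byte[] col : columns) {
            if (mightContain(row, col)) {
                return true;
            }
        }
        return false;
    }
}
```

 Since a Bloom filter has no false negatives, skipping a file on a negative answer for every column is safe; the cost is one Bloom lookup per requested column per store file.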


[jira] [Updated] (HBASE-4506) [hbck] Allow HBaseFsck to be instantiated without connecting


 [ 
https://issues.apache.org/jira/browse/HBASE-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-4506:
--

Affects Version/s: 0.90.5
   Status: Patch Available  (was: Open)

This should apply on 0.92 and trunk.  There are conflicts on 0.90.

 [hbck] Allow HBaseFsck to be instantiated without connecting
 

 Key: HBASE-4506
 URL: https://issues.apache.org/jira/browse/HBASE-4506
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Affects Versions: 0.90.5
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: 
 0001-HBASE-4506-hbck-Allow-HBaseFsck-to-be-instantiated-w.patch



[jira] [Updated] (HBASE-4506) [hbck] Allow HBaseFsck to be instantiated without connecting


 [ 
https://issues.apache.org/jira/browse/HBASE-4506?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jonathan Hsieh updated HBASE-4506:
--

Attachment: hbase-4506-0.90.patch

 [hbck] Allow HBaseFsck to be instantiated without connecting
 

 Key: HBASE-4506
 URL: https://issues.apache.org/jira/browse/HBASE-4506
 Project: HBase
  Issue Type: Improvement
  Components: hbck
Affects Versions: 0.90.5
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh
 Attachments: 
 0001-HBASE-4506-hbck-Allow-HBaseFsck-to-be-instantiated-w.patch, 
 hbase-4506-0.90.patch



[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.

2011-09-28 Thread Jonathan Hsieh (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116634#comment-13116634
 ] 

Jonathan Hsieh commented on HBASE-4377:
---

I think my plan is to postpone the large refactor until after this gets 
through. 

 [hbck] Offline rebuild .META. from fs data only.
 

 Key: HBASE-4377
 URL: https://issues.apache.org/jira/browse/HBASE-4377
 Project: HBase
  Issue Type: New Feature
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh


[jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter

2011-09-28 Thread Dave Revell (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116639#comment-13116639
]

Dave Revell commented on HBASE-4489:

@Jonathan Hsieh, thanks for your thoughts.

When you say you agree with jgray: he actually wants to do two things. (1) stop
using ASCII and (2) remove the 0x7F range bug. It sounds like you only agree
with removing the 0x7F range bug but not with avoiding ASCII, for the default
split algorithm?

I agree in principle with your comment about preserving behavior between minor
releases. If there were a valid use case for the existing code, I would agree
that we should leave it. But given its current brokenness, we should fix it all
the way instead of creating an intermediate slightly-broken state that falls
short of a real fix. We're already breaking any existing use cases by virtue of
fixing the range bug. We should not create another generation of broken use
cases before making the real fix, IMO.

I agree that tests would be a good idea. I'll hopefully find some time for that
soon.

Better key splitting in RegionSplitter
--

Key: HBASE-4489
URL: https://issues.apache.org/jira/browse/HBASE-4489
Project: HBase
Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Dave Revell
Assignee: Dave Revell
Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-trunk-v1.patch

The RegionSplitter utility allows users to create a pre-split table from the
command line or do a rolling split on an existing table. It supports
pluggable split algorithms that implement the SplitAlgorithm interface. The
only/default SplitAlgorithm is one that assumes keys fall in the range from
ASCII string to ASCII string 7FFF. This is not a sane
default, and seems useless to most users. Users are likely to be surprised by
the fact that all the region splits occur in the byte range of ASCII
characters.
A better default split algorithm would be one that evenly divides the space
of all bytes, which is what this patch does. Making a table with five regions
would split at \x33\x33..., \x66\x66, \x99\x99..., \xCC\xCC..., and
\xFF\xFF.
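
A minimal sketch of the even-division idea follows. It is not the code from the attached patches: `UniformSplitSketch`, `split`, and the fixed key width are assumptions made for illustration. It treats keys as unsigned big-endian integers of `keyWidth` bytes and places boundary i at i/numRegions of the maximum value, so the all-\xFF key is the upper bound of the last region rather than a split point.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: evenly divide the space of all keyWidth-byte keys.
class UniformSplitSketch {
    // Returns numRegions - 1 boundary keys, each keyWidth bytes long.
    static List<byte[]> split(int numRegions, int keyWidth) {
        BigInteger max = BigInteger.ONE.shiftLeft(8 * keyWidth).subtract(BigInteger.ONE);
        List<byte[]> splits = new ArrayList<>();
        for (int i = 1; i < numRegions; i++) {
            BigInteger point = max.multiply(BigInteger.valueOf(i))
                                  .divide(BigInteger.valueOf(numRegions));
            splits.add(toFixedWidth(point, keyWidth));
        }
        return splits;
    }

    // BigInteger.toByteArray() may add a leading sign byte or return fewer
    // bytes than needed, so copy the low-order bytes into a fixed-width array.
    static byte[] toFixedWidth(BigInteger value, int width) {
        byte[] raw = value.toByteArray();
        byte[] out = new byte[width];
        int n = Math.min(raw.length, width);
        System.arraycopy(raw, raw.length - n, out, width - n, n);
        return out;
    }

    static String toHex(byte[] key) {
        StringBuilder sb = new StringBuilder();
        for (byte b : key) {
            sb.append(String.format("\\x%02X", b & 0xFF));
        }
        return sb.toString();
    }
}
```

With five regions and two-byte keys, `split(5, 2)` gives the boundaries \x33\x33, \x66\x66, \x99\x99 and \xCC\xCC.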

[jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter

[
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116642#comment-13116642
]

Ted Yu commented on HBASE-4489:
---

bq. There were no tests on the previous code
I think lack of unit test is not a gating item for this JIRA.

Better key splitting in RegionSplitter
--

[jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter

[
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116648#comment-13116648
]

Ted Yu commented on HBASE-4489:
---

+1 on Dave's patches.

Better key splitting in RegionSplitter
--

[jira] [Commented] (HBASE-4485) Eliminate window of missing Data


[ 
https://issues.apache.org/jira/browse/HBASE-4485?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116660#comment-13116660
 ] 

stack commented on HBASE-4485:
--

@Amit Great stuff.  I like the reasoning above especially the bit where the fix 
I'd have done, the swapping order, likely has issues.

Looks like a little pollution in this patch from hbase-4344 but no matter since 
you've merged this into hbase-4344 over in hbase-4344 (getMaxMemstoreTS?).

Why move the notify outside of the lock?  Is it possible that when done outside 
of the lock, that observers could ever see different lists of readers?




 Eliminate window of missing Data
 

 Key: HBASE-4485
 URL: https://issues.apache.org/jira/browse/HBASE-4485
 Project: HBase
  Issue Type: Sub-task
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
 Fix For: 0.94.0

 Attachments: 4485-v1.diff, 4485-v2.diff, 4485-v3.diff, 4485-v4.diff, 
 repro_bug-4485.diff


 After incorporating v11 of the 2856 fix, we discovered that we are still 
 having some ACID violations.
 This time, however, the problem is not about including newer updates; but, 
 about missing older updates
 that should be included. 
 Here is what seems to be happening.
 There is a race condition in the StoreScanner.getScanners()
   private List<KeyValueScanner> getScanners(Scan scan,
       final NavigableSet<byte[]> columns) throws IOException {
     // First the store file scanners
     List<StoreFileScanner> sfScanners = StoreFileScanner
       .getScannersForStoreFiles(store.getStorefiles(), cacheBlocks,
         isGet, false);
     List<KeyValueScanner> scanners =
       new ArrayList<KeyValueScanner>(sfScanners.size()+1);
     // include only those scan files which pass all filters
     for (StoreFileScanner sfs : sfScanners) {
       if (sfs.shouldSeek(scan, columns)) {
         scanners.add(sfs);
       }
     }
     // Then the memstore scanners
     if (this.store.memstore.shouldSeek(scan)) {
       scanners.addAll(this.store.memstore.getScanners());
     }
     return scanners;
   }
 If for example there is a call to Store.updateStorefiles() that happens 
 between
 the store.getStorefiles() and this.store.memstore.getScanners(); then
 it is possible that there was a new HFile created, that is not seen by the
 StoreScanner, and the data is not present in the Memstore.snapshot either.
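
The window described above can be illustrated with a toy model. This is not the HBase Store class and not the fix in the attached diffs (the thread notes that the real fix also has to avoid deadlock): `ConsistentViewSketch` and its methods are hypothetical. The sketch only shows the general idea of taking the file list and the memstore view under one lock, so that a flush cannot land between the two reads.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical toy model, not the HBase Store class.
class ConsistentViewSketch {
    private final ReentrantReadWriteLock lock = new ReentrantReadWriteLock();
    private final List<String> storeFiles = new ArrayList<>();
    private final List<String> memstore = new ArrayList<>();

    void put(String kv) {
        lock.writeLock().lock();
        try {
            memstore.add(kv);
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Models a flush: memstore contents become file contents in one step.
    void flush() {
        lock.writeLock().lock();
        try {
            storeFiles.addAll(memstore);
            memstore.clear();
        } finally {
            lock.writeLock().unlock();
        }
    }

    // Reads both sources under the same read lock, so no flush can run
    // between looking at the files and looking at the memstore.
    List<String> snapshot() {
        lock.readLock().lock();
        try {
            List<String> view = new ArrayList<>(storeFiles);
            view.addAll(memstore);
            return view;
        } finally {
            lock.readLock().unlock();
        }
    }
}
```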


[jira] [Commented] (HBASE-4488) Store could miss rows during flush

2011-09-28 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116663#comment-13116663
 ] 

Hudson commented on HBASE-4488:
---

Integrated in HBase-0.92 #25 (See 
[https://builds.apache.org/job/HBase-0.92/25/])
HBASE-4488  Store could miss rows during flush (Lars H via jgray)

jgray : 
Files : 
* /hbase/branches/0.92/CHANGES.txt
* 
/hbase/branches/0.92/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java


 Store could miss rows during flush
 --

 Key: HBASE-4488
 URL: https://issues.apache.org/jira/browse/HBASE-4488
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: 4488.txt


 While looking at HBASE-4344 I found that my change HBASE-4241 contains a 
 critical mistake:
 The while(scanner.next(kvs)) loop is incorrect and might miss the last edits.
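
The failure mode can be shown with a toy scanner. The sketch assumes, as the description implies, that next() returns false on the same call that delivers the final batch; `ScannerLoopSketch` and `ToyScanner` are hypothetical stand-ins, not HBase classes.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

// Hypothetical sketch of the loop bug; not HBase code.
class ScannerLoopSketch {
    // next() adds one row to 'out' and returns true only if more rows remain.
    static final class ToyScanner {
        private final Iterator<String> rows;
        ToyScanner(List<String> rows) { this.rows = rows.iterator(); }
        boolean next(List<String> out) {
            if (rows.hasNext()) {
                out.add(rows.next());
            }
            return rows.hasNext();
        }
    }

    // Buggy shape: the batch returned together with 'false' is never processed.
    static List<String> drainBuggy(ToyScanner s) {
        List<String> seen = new ArrayList<>();
        List<String> kvs = new ArrayList<>();
        while (s.next(kvs)) {
            seen.addAll(kvs);
            kvs.clear();
        }
        return seen;
    }

    // One possible fix: process the batch before testing the return value.
    static List<String> drainFixed(ToyScanner s) {
        List<String> seen = new ArrayList<>();
        List<String> kvs = new ArrayList<>();
        boolean more;
        do {
            more = s.next(kvs);
            seen.addAll(kvs);
            kvs.clear();
        } while (more);
        return seen;
    }
}
```

For three rows, `drainBuggy` returns only the first two while `drainFixed` returns all three.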


[jira] [Updated] (HBASE-2794) ROWCOL bloom filter not used if multiple columns within same family are requested in a Get


 [ 
https://issues.apache.org/jira/browse/HBASE-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-2794:
--

Fix Version/s: 0.92.0

 ROWCOL bloom filter not used if multiple columns within same family are 
 requested in a Get
 --

 Key: HBASE-2794
 URL: https://issues.apache.org/jira/browse/HBASE-2794
 Project: HBase
  Issue Type: Improvement
  Components: performance
Reporter: Kannan Muthukkaruppan
 Fix For: 0.92.0


 Noticed the following snippet in StoreFile.java:Scanner:shouldSeek():
 {code}
 switch(bloomFilterType) {
   case ROW:
     key = row;
     break;
   case ROWCOL:
     if (columns.size() == 1) {
       byte[] col = columns.first();
       key = Bytes.add(row, col);
       break;
     }
     //$FALL-THROUGH$
   default:
     return true;
 }
 {code}
 If columns.size > 1, then we currently don't take advantage of the bloom 
 filter.  We should optimize this to check bloom for each of columns and if 
 none of the columns are present in the bloom avoid opening the file.
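
The proposed optimization could look roughly like the sketch below. It is an illustration only: `RowColBloomSketch` is hypothetical and uses an exact set as a stand-in for the bloom filter, so it has no false positives, whereas a real bloom filter can answer "maybe present" for keys that are absent.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical sketch, not StoreFile code: consult a row+column "bloom" for
// every requested column and skip the file only if none of them may match.
class RowColBloomSketch {
    private final Set<String> keys = new HashSet<>(); // stand-in for the bloom

    void add(String row, String col) {
        keys.add(row + "/" + col);
    }

    boolean mightContain(String row, String col) {
        return keys.contains(row + "/" + col);
    }

    boolean shouldSeek(String row, List<String> columns) {
        if (columns.isEmpty()) {
            return true; // no columns named, so the file cannot be ruled out
        }
        for (String col : columns) {
            if (mightContain(row, col)) {
                return true; // at least one column may be in this file
            }
        }
        return false; // every column was ruled out: avoid opening the file
    }
}
```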


[jira] [Commented] (HBASE-4477) Ability for an application to store metadata into the transaction log

2011-09-28 Thread Andrew Purtell (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4477?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116675#comment-13116675
]

Andrew Purtell commented on HBASE-4477:
---

@dhruba

bq. can I change the signature of RegionObserver.prePut() to take in two
additional arguments: a WALEdit and Put object

That sounds good. Since there is no release yet, no deprecation is necessary.
Can be done simply by a patch on this issue I'd say.

@Jon

bq. If things are built only on Coprocessor interfaces, do people see us
including these in some kind of coprocessor contrib or should they be out on
github or something

I think it depends. Some stuff like security we'd clearly want to bundle. And
by that not a resurrection of contrib, instead as another package in main/.
Random additions that people build for themselves should go up on GitHub. My
opinion is anything that is core to a group of use cases is a candidate for
bundling, if a contributor or committer wants to maintain it, and/or if people
generally feel it is a good idea to bring the candidate in.

Ability for an application to store metadata into the transaction log
-

Key: HBASE-4477
URL: https://issues.apache.org/jira/browse/HBASE-4477
Project: HBase
Issue Type: Improvement
Reporter: dhruba borthakur
Assignee: dhruba borthakur
Attachments: hlogMetadata1.txt

mySQL allows an application to store an arbitrary blob along with each
transaction in its transaction logs. This JIRA is to have a similar feature
request for HBASE.
The use case is as follows: An application on one data center A stores a blob
of data along with each transaction. A replication software picks up these
blobs from the transaction logs in A and hands it to another instance of the
same application running on a remote data center B. The application in B is
responsible for applying this to the remote Hbase cluster (and also handle
conflict resolution if any).

[jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter

2011-09-28 Thread Todd Lipcon (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116683#comment-13116683
]

Todd Lipcon commented on HBASE-4489:

bq. I think lack of unit test is not a gating item for this JIRA.

Why not? Lack of unit tests is what caused the bug in the first place. This is
trivially testable.

Better key splitting in RegionSplitter
--

[jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter

2011-09-28 Thread Dave Revell (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116685#comment-13116685
]

Dave Revell commented on HBASE-4489:

I don't object to adding tests. I can have them by next Monday if someone else
doesn't write them first.

Better key splitting in RegionSplitter
--

[jira] [Commented] (HBASE-4344) Persist memstoreTS to disk


[ 
https://issues.apache.org/jira/browse/HBASE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116684#comment-13116684
 ] 

stack commented on HBASE-4344:
--

Looking at v12.  Minor nits.  Can fix on commit.  Small question below.

On commit we should just remove the below rather than comment them out:

{code}
-  @Ignore("Currently not passing - see HBASE-2856")
+  //@Ignore("Currently not passing - see HBASE-2856")
{code}

Looks like we usually just delete the '@Ignore'.

In StoreScanner we do:

{code}
+matcher.ignoreNewerKVs();
{code}

Does this mean that we will always ignore kvs with newer readpoints?  If so, 
should we just strip this method altogether and the setting of boolean 
attribute?  Same for similar method out in hfile v2.

Why should someone be able to do this?

{code}
+  public void setMaxMemstoreTS(long maxMemstoreTS) {
+    this.maxMemstoreTS = maxMemstoreTS;
+  }
{code}

Shouldn't we be getting this from looking at the kvs that come in when writing 
and then when reading, it comes up out of the hfile metadata?  A client should 
never be able to set it?

This looks like it could be too much logging (I'd think that around here we are 
already dumping out the file path):

{code}
 
+LOG.info("HFileReaderV2 trying to read from " + path);
+
{code}



 Persist memstoreTS to disk
 --

 Key: HBASE-4344
 URL: https://issues.apache.org/jira/browse/HBASE-4344
 Project: HBase
  Issue Type: Sub-task
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
 Fix For: 0.89.20100924

 Attachments: 4344-v10.txt, 4344-v11.txt, 4344-v12.txt, 4344-v2.txt, 
 4344-v4.txt, 4344-v5.txt, 4344-v6.txt, 4344-v7.txt, 4344-v8.txt, 4344-v9.txt, 
 patch-2


 Atomicity can be achieved in two ways -- (i) by using  a multiversion 
 concurrency system (MVCC), or (ii) by ensuring that new writes do not 
 complete, until the old reads complete.
 Currently, Memstore uses something along the lines of MVCC (called RWCC for 
 read-write-consistency-control). But, this mechanism is not incorporated for 
 the key-values written to the disk, as they do not include the memstore TS.
 Let us make the two approaches be similar, by persisting the memstoreTS along 
 with the key-value when it is written to the disk.
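
To make the idea concrete, here is a small sketch of read-point filtering. The names `ReadPointSketch`, `Kv`, and `visible` are hypothetical and this is not the patch: it only shows that once each key-value carries its memstoreTS, the same visibility rule can be applied whether the key-value came from the memstore or from disk.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: a reader sees only key-values whose memstoreTS is at
// or below its read point, regardless of where the key-value is stored.
class ReadPointSketch {
    static final class Kv {
        final String key;
        final long memstoreTS;
        Kv(String key, long memstoreTS) {
            this.key = key;
            this.memstoreTS = memstoreTS;
        }
    }

    static List<String> visible(List<Kv> kvs, long readPoint) {
        List<String> out = new ArrayList<>();
        for (Kv kv : kvs) {
            if (kv.memstoreTS <= readPoint) {
                out.add(kv.key);
            }
        }
        return out;
    }
}
```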


[jira] [Commented] (HBASE-2794) ROWCOL bloom filter not used if multiple columns within same family are requested in a Get


[ 
https://issues.apache.org/jira/browse/HBASE-2794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116689#comment-13116689
 ] 

Ted Yu commented on HBASE-2794:
---

I got the following errors from test suite:
{code}
Failed tests:   
testWorkerAbort(org.apache.hadoop.hbase.master.TestDistributedLogSplitting): 
expected:<1> but was:<0>

Tests in error:
  testMergeTool(org.apache.hadoop.hbase.util.TestMergeTool): String index out 
of range: -1
  testBasicRollingRestart(org.apache.hadoop.hbase.master.TestRollingRestart): 
test timed out after 30 milliseconds
{code}
They passed individually.

 ROWCOL bloom filter not used if multiple columns within same family are 
 requested in a Get
 --

 Key: HBASE-2794
 URL: https://issues.apache.org/jira/browse/HBASE-2794
 Project: HBase
  Issue Type: Improvement
  Components: performance
Reporter: Kannan Muthukkaruppan
 Fix For: 0.92.0


 Noticed the following snippet in StoreFile.java:Scanner:shouldSeek():
 {code}
 switch(bloomFilterType) {
   case ROW:
     key = row;
     break;
   case ROWCOL:
     if (columns.size() == 1) {
       byte[] col = columns.first();
       key = Bytes.add(row, col);
       break;
     }
     //$FALL-THROUGH$
   default:
     return true;
 }
 {code}
 If columns.size > 1, then we currently don't take advantage of the bloom 
 filter.  We should optimize this to check bloom for each of columns and if 
 none of the columns are present in the bloom avoid opening the file.


[jira] [Commented] (HBASE-4145) Provide metrics for hbase client

2011-09-28 Thread Todd Lipcon (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4145?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116687#comment-13116687
]

Todd Lipcon commented on HBASE-4145:

This is nice stuff. I haven't looked at the code yet, but the feature seems
very useful. One small nit from the screenshot - I think we can rename the
counters from COUNT_OF_FOO to just FOOS -- the fact that it's a COUNT_OF
or SUM_OF is implicit in it being a counter. eg we had HDFS_BYTES_READ, not
COUNT_OF_HDFS_BYTES_READ

Provide metrics for hbase client

Key: HBASE-4145
URL: https://issues.apache.org/jira/browse/HBASE-4145
Project: HBase
Issue Type: Improvement
Reporter: Ming Ma
Assignee: Ming Ma
Attachments: HBaseClientSideMetrics.jpg

[jira] [Commented] (HBASE-4344) Persist memstoreTS to disk


[ 
https://issues.apache.org/jira/browse/HBASE-4344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116698#comment-13116698
 ] 

Ted Yu commented on HBASE-4344:
---

Note: v12 isn't ready to be committed yet.
We're still trying to solve HBASE-4485 without introducing deadlock.

 Persist memstoreTS to disk
 --

 Key: HBASE-4344
 URL: https://issues.apache.org/jira/browse/HBASE-4344
 Project: HBase
  Issue Type: Sub-task
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
 Fix For: 0.89.20100924

 Attachments: 4344-v10.txt, 4344-v11.txt, 4344-v12.txt, 4344-v2.txt, 
 4344-v4.txt, 4344-v5.txt, 4344-v6.txt, 4344-v7.txt, 4344-v8.txt, 4344-v9.txt, 
 patch-2


 Atomicity can be achieved in two ways -- (i) by using  a multiversion 
 concurrency system (MVCC), or (ii) by ensuring that new writes do not 
 complete, until the old reads complete.
 Currently, Memstore uses something along the lines of MVCC (called RWCC for 
 read-write-consistency-control). But, this mechanism is not incorporated for 
 the key-values written to the disk, as they do not include the memstore TS.
 Let us make the two approaches be similar, by persisting the memstoreTS along 
 with the key-value when it is written to the disk.


[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE

[
https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116695#comment-13116695
]

Jonathan Gray commented on HBASE-4497:
--

startcode and timestamp is what i initially thought of. seems like there could
be some weird situations. for example, what is to say that the server already
in META didn't somehow become the new assignment destination?

or what if... M tells RS1 to OPEN R1 and to expect RS3:StartCode3 in META. RS1
locks up right before doing the META edit, M tells RS2 to OPEN R1 and to also
expect RS3:StartCode3 in META. I guess this is the atomicity we need, so that
should be okay.

one neat idea would be to introduce this region assignment incrementing ID into
META. it would provide a nice way to debug the movement of a region across the
cluster over time and could also provide the necessary info to use CheckAndPut.

If region opening fails after updating META HBCK reports it as inconsistent
and scanning the region throws NSRE
---

Key: HBASE-4497
URL: https://issues.apache.org/jira/browse/HBASE-4497
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Priority: Critical

[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.

2011-09-28 Thread jirapos...@reviews.apache.org (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116708#comment-13116708
 ] 

stack commented on HBASE-4377:
--

@Jon So you want me to review what's over in github and commit that?

 [hbck] Offline rebuild .META. from fs data only.
 

 Key: HBASE-4377
 URL: https://issues.apache.org/jira/browse/HBASE-4377
 Project: HBase
  Issue Type: New Feature
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh

 In a worst case situation, it may be helpful to have an offline .META. 
 rebuilder that just looks at the file system's .regioninfos and rebuilds meta 
 from scratch.  Users could move bad regions out until there is a clean 
 rebuild.  
 It would likely fill in region split holes.  Follow-on work could give 
 options to merge or select regions that overlap, or do online rebuilds.


[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.

2011-09-28 Thread Jonathan Hsieh (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116712#comment-13116712
 ] 

Jonathan Hsieh commented on HBASE-4377:
---

@stack: Not yet, I'm still cleaning this up and adding tests right now.

 [hbck] Offline rebuild .META. from fs data only.
 

 Key: HBASE-4377
 URL: https://issues.apache.org/jira/browse/HBASE-4377
 Project: HBase
  Issue Type: New Feature
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh

 In a worst case situation, it may be helpful to have an offline .META. 
 rebuilder that just looks at the file system's .regioninfos and rebuilds meta 
 from scratch.  Users could move bad regions out until there is a clean 
 rebuild.  
 It would likely fill in region split holes.  Follow-on work could give 
 options to merge or select regions that overlap, or do online rebuilds.


[jira] [Commented] (HBASE-4377) [hbck] Offline rebuild .META. from fs data only.

2011-09-28 Thread Jonathan Hsieh (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4377?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116713#comment-13116713
 ] 

Jonathan Hsieh commented on HBASE-4377:
---

More detail -- I've done a large refactor of hbck but found that doing the 
changes afterwards would make the offline rebuild code more difficult to 
understand or review.  So, my plan is to add the offline rebuild code first, 
and then potentially do a refactor afterwards.

Regardless of whether the refactor happens, I feel that I need to add tests and 
docs for this before it is ready for review. 


 [hbck] Offline rebuild .META. from fs data only.
 

 Key: HBASE-4377
 URL: https://issues.apache.org/jira/browse/HBASE-4377
 Project: HBase
  Issue Type: New Feature
Reporter: Jonathan Hsieh
Assignee: Jonathan Hsieh

 In a worst case situation, it may be helpful to have an offline .META. 
 rebuilder that just looks at the file system's .regioninfos and rebuilds meta 
 from scratch.  Users could move bad regions out until there is a clean 
 rebuild.  
 It would likely fill in region split holes.  Follow-on work could give 
 options to merge or select regions that overlap, or do online rebuilds.


[jira] [Commented] (HBASE-4422) Move block cache parameters and references into single CacheConf class

[
https://issues.apache.org/jira/browse/HBASE-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116720#comment-13116720
]

jirapos...@reviews.apache.org commented on HBASE-4422:
--

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2089/
---

Review request for hbase, Dhruba Borthakur, Michael Stack, and Li Pi.

Summary
---

Creates a new CacheConfig class and moves almost everything block cache related
into this single class. Adding new configuration params and booleans and such
should be much better.

All tests are NOT passing yet, still working on it, but wanted to have
something up today. Basically code complete but broken :)

This addresses bug HBASE-4422.
https://issues.apache.org/jira/browse/HBASE-4422

Diffs
-

Diff: https://reviews.apache.org/r/2089/diff

Testing
---

Still working through some tests that aren't passing.

Thanks,

Jonathan

Move block cache parameters and references into single CacheConf class
--

Key: HBASE-4422
URL: https://issues.apache.org/jira/browse/HBASE-4422
Project: HBase
Issue Type: Improvement
Components: io
Reporter: Jonathan Gray
Assignee: Jonathan Gray
Fix For: 0.92.0

From StoreFile down to HFile, we currently use a boolean argument for each of
the various block cache configuration parameters that exist. The number of
parameters is going to continue to increase as we look at compressed cache,
delta encoding, and more specific L1/L2 configuration. Every new config
currently requires changing many constructors because it introduces a new
boolean.
We should move everything into a single class so that modifications are much
less disruptive.
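
The shape of the change can be sketched as follows. Class and field names here are illustrative only and should not be read as the contents of the posted diff: the point is that a reader takes one configuration object instead of one boolean per cache setting, so adding a setting touches the holder class rather than every constructor.

```java
// Hypothetical sketch: names are illustrative, not taken from the patch.
class CacheConfigSketch {
    final boolean cacheDataOnRead;
    final boolean cacheDataOnWrite;
    final boolean inMemory;

    CacheConfigSketch(boolean cacheDataOnRead, boolean cacheDataOnWrite,
                      boolean inMemory) {
        this.cacheDataOnRead = cacheDataOnRead;
        this.cacheDataOnWrite = cacheDataOnWrite;
        this.inMemory = inMemory;
    }

    // Stand-in for a reader that previously took one boolean per setting.
    static final class ReaderSketch {
        final String path;
        final CacheConfigSketch cacheConf;

        ReaderSketch(String path, CacheConfigSketch cacheConf) {
            this.path = path;
            this.cacheConf = cacheConf;
        }

        boolean shouldCacheDataOnRead() {
            return cacheConf.cacheDataOnRead;
        }
    }
}
```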

[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE

[
https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116721#comment-13116721
]

stack commented on HBASE-4497:
--

bq. startcode and timestamp is what i initially thought of. seems like there
could be some weird situations. for example, what is to say that the server
already in META didn't somehow become the new assignment destination?

The timestamp will be different in this case? (It'll have been updated by the
new open).

bq. or what if... M tells RS1 to OPEN R1 and to expect RS3:StartCode3

I'm not suggesting the master tell the RS anything new. I'm suggesting that on
receiving the open, the RS itself read .META. at start of the open transaction
before it does anything else and use this read as input for the later
checkAndSet write.

bq. one neat idea would be to introduce this region assignment incrementing ID
into META. it would provide a nice way to debug the movement of a region across
the cluster over time and could also provide the necessary info to use
CheckAndPut.

This could work. Downsides are M has to write meta first before doing assign
which will be a bit of new burden on meta (double'd write load?) and this new
write is now inline with an assign; we'd have to do some hackery in here around
bulk assign.

If region opening fails after updating META HBCK reports it as inconsistent
and scanning the region throws NSRE
---

Key: HBASE-4497
URL: https://issues.apache.org/jira/browse/HBASE-4497
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Priority: Critical

[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE

[
https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116727#comment-13116727
]

stack commented on HBASE-4497:
--

I just checked the checkAndPut. It doesn't expose timestamp. So. Fix
checkAndPut so it exposes timestamp or write timestamp or uuid to meta into a
new column info:editid whenever we do the metadata open update (I'd prefer
adding a checkAndPut override -- seems like a hole in checkAndPut that we don't
allow version checking).
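
A toy model of the version-checked write discussed here is sketched below. It is not the HBase client API: `MetaCasSketch`, `Location`, and this `checkAndPut` signature are hypothetical, and an in-memory map stands in for .META. The writer passes the edit id it read at the start of the open; the write is rejected if the row has changed since then.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: an in-memory stand-in for .META. with a per-row edit id.
class MetaCasSketch {
    static final class Location {
        final String server;
        final long editId;
        Location(String server, long editId) {
            this.server = server;
            this.editId = editId;
        }
    }

    private final Map<String, Location> meta = new HashMap<>();

    synchronized Location read(String region) {
        return meta.get(region);
    }

    // Succeeds only if the row's edit id still equals the one the caller
    // read earlier (0 when the row did not exist yet).
    synchronized boolean checkAndPut(String region, long expectedEditId,
                                     String newServer) {
        Location current = meta.get(region);
        long currentId = (current == null) ? 0L : current.editId;
        if (currentId != expectedEditId) {
            return false;
        }
        meta.put(region, new Location(newServer, currentId + 1));
        return true;
    }
}
```

In the scenario raised earlier in the thread, RS1 and RS2 both read the same edit id, RS2 writes first, and the stalled RS1's later write fails because the edit id has moved on.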

If region opening fails after updating META HBCK reports it as inconsistent
and scanning the region throws NSRE
---

Key: HBASE-4497
URL: https://issues.apache.org/jira/browse/HBASE-4497
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Priority: Critical

[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE

[
https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116729#comment-13116729
]

Jonathan Gray commented on HBASE-4497:
--

Sounds like it could work. I'm +1 on exposing version to checkAndPut and using
it for META edits. Good point, we can just do the read on the RS first.

If region opening fails after updating META HBCK reports it as inconsistent
and scanning the region throws NSRE
---

Key: HBASE-4497
URL: https://issues.apache.org/jira/browse/HBASE-4497
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Priority: Critical

[jira] [Created] (HBASE-4507) Create checkAndPut variant that exposes timestamp / UUID

2011-09-28 Thread Ted Yu (Created) (JIRA)

Create checkAndPut variant that exposes timestamp / UUID


 Key: HBASE-4507
 URL: https://issues.apache.org/jira/browse/HBASE-4507
 Project: HBase
  Issue Type: Sub-task
Reporter: Ted Yu


Michael checked checkAndPut, which doesn't expose the timestamp. So a variant 
of checkAndPut should expose the timestamp, by writing a timestamp or uuid to 
.META. into a new column info:editid whenever we do the metadata open update.


[jira] [Commented] (HBASE-4497) If region opening fails after updating META HBCK reports it as inconsistent and scanning the region throws NSRE

2011-09-28 Thread jirapos...@reviews.apache.org (Commented) (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-4497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116740#comment-13116740
]

Ted Yu commented on HBASE-4497:
---

HBASE-4507 has been opened.

If region opening fails after updating META HBCK reports it as inconsistent
and scanning the region throws NSRE
---

Key: HBASE-4497
URL: https://issues.apache.org/jira/browse/HBASE-4497
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Priority: Critical

[jira] [Commented] (HBASE-4422) Move block cache parameters and references into single CacheConf class


[ 
https://issues.apache.org/jira/browse/HBASE-4422?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116742#comment-13116742
 ] 

jirapos...@reviews.apache.org commented on HBASE-4422:
--


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/2089/
---

(Updated 2011-09-28 19:56:14.698336)


Review request for hbase, Dhruba Borthakur, Michael Stack, and Li Pi.


Changes
---

Diff attached now.


Summary
---

Creates a new CacheConfig class and moves almost everything block cache related 
into this single class.  Adding new configuration params and booleans and such 
should be much better.

All tests are NOT passing yet, still working on it, but wanted to have 
something up today.  Basically code complete but broken :)


This addresses bug HBASE-4422.
https://issues.apache.org/jira/browse/HBASE-4422


Diffs (updated)
-

  /src/main/java/org/apache/hadoop/hbase/io/HalfStoreFileReader.java 1177030 
  /src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileReader.java 
1177030 
  /src/main/java/org/apache/hadoop/hbase/io/hfile/AbstractHFileWriter.java 
1177030 
  /src/main/java/org/apache/hadoop/hbase/io/hfile/CacheConfig.java PRE-CREATION 
  /src/main/java/org/apache/hadoop/hbase/io/hfile/HFile.java 1177030 
  /src/main/java/org/apache/hadoop/hbase/io/hfile/HFileBlockIndex.java 1177030 
  /src/main/java/org/apache/hadoop/hbase/io/hfile/HFilePrettyPrinter.java 
1177030 
  /src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV1.java 1177030 
  /src/main/java/org/apache/hadoop/hbase/io/hfile/HFileReaderV2.java 1177030 
  /src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV1.java 1177030 
  /src/main/java/org/apache/hadoop/hbase/io/hfile/HFileWriterV2.java 1177030 
  /src/main/java/org/apache/hadoop/hbase/mapreduce/HFileOutputFormat.java 
1177030 
  /src/main/java/org/apache/hadoop/hbase/mapreduce/LoadIncrementalHFiles.java 
1177030 
  /src/main/java/org/apache/hadoop/hbase/regionserver/HRegion.java 1177030 
  /src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java 
1177030 
  /src/main/java/org/apache/hadoop/hbase/regionserver/Store.java 1177030 
  /src/main/java/org/apache/hadoop/hbase/regionserver/StoreFile.java 1177030 
  /src/main/java/org/apache/hadoop/hbase/util/BloomFilterFactory.java 1177030 
  /src/main/java/org/apache/hadoop/hbase/util/CompressionTest.java 1177030 
  /src/test/java/org/apache/hadoop/hbase/HFilePerformanceEvaluation.java 
1177030 
  /src/test/java/org/apache/hadoop/hbase/io/TestHalfStoreFileReader.java 
1177030 
  /src/test/java/org/apache/hadoop/hbase/io/hfile/RandomSeek.java 1177030 
  /src/test/java/org/apache/hadoop/hbase/io/hfile/TestCacheOnWrite.java 1177030 
  /src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFile.java 1177030 
  /src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileBlockIndex.java 
1177030 
  /src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFilePerformance.java 
1177030 
  /src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileReaderV1.java 
1177030 
  /src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileSeek.java 1177030 
  /src/test/java/org/apache/hadoop/hbase/io/hfile/TestHFileWriterV2.java 
1177030 
  /src/test/java/org/apache/hadoop/hbase/io/hfile/TestReseekTo.java 1177030 
  /src/test/java/org/apache/hadoop/hbase/io/hfile/TestSeekTo.java 1177030 
  /src/test/java/org/apache/hadoop/hbase/mapreduce/TestHFileOutputFormat.java 
1177030 
  /src/test/java/org/apache/hadoop/hbase/mapreduce/TestLoadIncrementalHFiles.java 
1177030 
  /src/test/java/org/apache/hadoop/hbase/regionserver/TestBlocksRead.java 
1177030 
  /src/test/java/org/apache/hadoop/hbase/regionserver/TestCompactSelection.java 
1177030 
  /src/test/java/org/apache/hadoop/hbase/regionserver/TestCompoundBloomFilter.java 
1177030 
  /src/test/java/org/apache/hadoop/hbase/regionserver/TestFSErrorsExposed.java 
1177030 
  /src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFile.java 
1177030 
  /src/test/java/org/apache/hadoop/hbase/regionserver/TestStoreFileBlockCacheSummary.java 
1177030 
  /src/test/java/org/apache/hadoop/hbase/regionserver/wal/TestWALReplay.java 
1177030 

Diff: https://reviews.apache.org/r/2089/diff


Testing
---

Still working through some tests that aren't passing.


Thanks,

Jonathan



 Move block cache parameters and references into single CacheConf class
 --

 Key: HBASE-4422
 URL: https://issues.apache.org/jira/browse/HBASE-4422
 Project: HBase
  Issue Type: Improvement
  Components: io
Reporter: Jonathan Gray
Assignee: Jonathan Gray
 Fix For: 0.92.0


 From StoreFile down

[jira] [Commented] (HBASE-4131) Make the Replication Service pluggable via a standard interface definition

2011-09-28 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4131?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116762#comment-13116762
 ] 

Hudson commented on HBASE-4131:
---

Integrated in HBase-TRUNK #2264 (See 
[https://builds.apache.org/job/HBase-TRUNK/2264/])
HBASE-4131  Make the Replication Service pluggable via a standard interface 
definition (dhruba via jgray)

jgray : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/HConstants.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/HRegionServer.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ReplicationService.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ReplicationSinkService.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/ReplicationSourceService.java
* 
/hbase/trunk/src/main/java/org/apache/hadoop/hbase/replication/regionserver/Replication.java


 Make the Replication Service pluggable via a standard interface definition
 --

 Key: HBASE-4131
 URL: https://issues.apache.org/jira/browse/HBASE-4131
 Project: HBase
  Issue Type: Improvement
  Components: regionserver
Reporter: dhruba borthakur
Assignee: dhruba borthakur
 Fix For: 0.94.0

 Attachments: 4131-backedout.txt, replicationInterface1.txt, 
 replicationInterface2.txt, replicationInterface3.txt, 
 replicationInterface4.txt


 The current HBase code supports a replication service that can be used to 
 sync data from one hbase cluster to another. It would be nice to make it 
 a pluggable interface so that other cross-data-center replication services 
 can be used in conjunction with HBase.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-4488) Store could miss rows during flush

2011-09-28 Thread Hudson (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4488?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116761#comment-13116761
 ] 

Hudson commented on HBASE-4488:
---

Integrated in HBase-TRUNK #2264 (See 
[https://builds.apache.org/job/HBase-TRUNK/2264/])
HBASE-4488  Store could miss rows during flush (Lars H via jgray)

jgray : 
Files : 
* /hbase/trunk/CHANGES.txt
* /hbase/trunk/src/main/java/org/apache/hadoop/hbase/regionserver/Store.java


 Store could miss rows during flush
 --

 Key: HBASE-4488
 URL: https://issues.apache.org/jira/browse/HBASE-4488
 Project: HBase
  Issue Type: Sub-task
  Components: regionserver
Affects Versions: 0.92.0, 0.94.0
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
 Fix For: 0.92.0

 Attachments: 4488.txt


 While looking at HBASE-4344 I found that my change HBASE-4241 contains a 
 critical mistake:
 The while(scanner.next(kvs)) loop is incorrect and might miss the last edits.
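
The failure mode described above can be sketched in isolation. This is a minimal illustration, not the code from the HBASE-4488 patch: BatchScanner is a hypothetical stand-in for the scanner, and it assumes that next() fills the list and returns false once no rows remain after the batch it just delivered. Under that assumption, a plain while loop never processes the final batch, and a do/while loop does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class FlushLoopSketch {
    /**
     * Hypothetical stand-in for the scanner (an assumption, not HBase's class):
     * next() appends the next row to 'out' and returns true only if more rows
     * remain AFTER the one just delivered.
     */
    public interface BatchScanner {
        boolean next(List<String> out);
    }

    public static BatchScanner scannerOver(List<String> rows) {
        Iterator<String> it = rows.iterator();
        return out -> {
            if (it.hasNext()) {
                out.add(it.next());
            }
            return it.hasNext();
        };
    }

    /** Buggy shape: the batch delivered together with 'false' is dropped. */
    public static List<String> drainWithWhile(BatchScanner scanner) {
        List<String> written = new ArrayList<>();
        List<String> kvs = new ArrayList<>();
        while (scanner.next(kvs)) {
            written.addAll(kvs);
            kvs.clear();
        }
        return written;
    }

    /** Fixed shape: process whatever was returned, then test the flag. */
    public static List<String> drainWithDoWhile(BatchScanner scanner) {
        List<String> written = new ArrayList<>();
        List<String> kvs = new ArrayList<>();
        boolean hasMore;
        do {
            hasMore = scanner.next(kvs);
            written.addAll(kvs);
            kvs.clear();
        } while (hasMore);
        return written;
    }

    public static void main(String[] args) {
        List<String> rows = Arrays.asList("row1", "row2", "row3");
        System.out.println(drainWithWhile(scannerOver(rows)));   // [row1, row2]
        System.out.println(drainWithDoWhile(scannerOver(rows))); // [row1, row2, row3]
    }
}
```

The do/while form costs one extra pass over an empty list when the scanner has no rows at all, which is harmless here.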


[jira] [Commented] (HBASE-4207) Run test suite in parallel, multiple concurrent test instances.

[
https://issues.apache.org/jira/browse/HBASE-4207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116770#comment-13116770
]

stack commented on HBASE-4207:
--

I learned that this parallel mechanism only works if you do NOT fork to run
tests:
http://stackoverflow.com/questions/3600090/maven-surefire-unable-to-fork-parallel-test-execution/7426894#7426894
My sense is we have to fork to protect against tests that run wild (or if we
move the crazies into integrated test category, maybe we could run non-forked).

Trying to build w/o forking, most tests pass. A few fail such as
TestTableMapReduce and TestCompaction. Would have to look at these.

When I try w/ parallel on, we are running 1/10th of the tests (but they run
fast).

Run test suite in parallel, multiple concurrent test instances.
---

Key: HBASE-4207
URL: https://issues.apache.org/jira/browse/HBASE-4207
Project: HBase
Issue Type: Task
Components: test
Reporter: stack
Attachments: parallel.build.txt

From a suggestion by Lohit up on the list, surefire allows running unit tests
in parallel. I'm trying it. I'll attach the patch to do classes in
parallel (as opposed to methods) with four threads per core max.

[jira] [Commented] (HBASE-4507) Create checkAndPut variant that exposes timestamp / UUID


[ 
https://issues.apache.org/jira/browse/HBASE-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116781#comment-13116781
 ] 

Jonathan Gray commented on HBASE-4507:
--

Is this jira for modifying CheckAndPut or using it with META?  The name doesn't 
match the description.  And I'm not sure this is really a sub-task so much as 
a related task.

 Create checkAndPut variant that exposes timestamp / UUID
 

 Key: HBASE-4507
 URL: https://issues.apache.org/jira/browse/HBASE-4507
 Project: HBase
  Issue Type: Sub-task
Reporter: Ted Yu

 Michael checked the checkAndPut which doesn't expose timestamp. So variant of 
 checkAndPut should expose timestamp by writing timestamp or uuid to .META. 
 into a new column info:editid whenever we do the metadata open update.


[jira] [Updated] (HBASE-4454) Add failsafe plugin to build and rename integration tests

2011-09-28 Thread Jesse Yates (Updated) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4454?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jesse Yates updated HBASE-4454:
---

Attachment: mvn_HBASE-4454.patch

Patch uploaded to separate out running integration tests.

IntegrationTests must be named as **/IntegrationTest*.java.

They can be run with the command: 'mvn verify'.

Since verify is part of the standard build phases, under assembly, package, 
etc, integration tests will be run automatically when doing a full build.

@Stack: should I open up a separate ticket, new patch version, or just add 
another patch for updating documentation? Do we even need to update the docs 
for this?


 Add failsafe plugin to build and rename integration tests
 -

 Key: HBASE-4454
 URL: https://issues.apache.org/jira/browse/HBASE-4454
 Project: HBase
  Issue Type: Improvement
Reporter: Jesse Yates
 Attachments: mvn_HBASE-4454.patch


 Add the maven-failsafe-plugin to the build process so we can run integration 
 tests with mvn verify. This will also involve a renaming of integration 
 tests to conform to a new integration test regex.
 This is a stopgap measure until we break them out into their own module.


[jira] [Updated] (HBASE-4507) Create checkAndPut variant that exposes timestamp / UUID


 [ 
https://issues.apache.org/jira/browse/HBASE-4507?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-4507:
--

Description: Michael checked the checkAndPut which doesn't expose 
timestamp. A variant of checkAndPut should be created to expose timestamp which 
is written into a column specified by additional parameters.  (was: Michael 
checked the checkAndPut which doesn't expose timestamp. So variant of 
checkAndPut should expose timestamp by writing timestamp or uuid to .META. into 
a new column info:editid whenever we do the metadata open update.)

 Create checkAndPut variant that exposes timestamp / UUID
 

 Key: HBASE-4507
 URL: https://issues.apache.org/jira/browse/HBASE-4507
 Project: HBase
  Issue Type: Sub-task
Reporter: Ted Yu

 Michael checked the checkAndPut which doesn't expose timestamp. A variant of 
 checkAndPut should be created to expose timestamp which is written into a 
 column specified by additional parameters.
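
One way to read the proposal is sketched below against an in-memory map rather than the HBase client. Everything here is hypothetical: the class, the method name checkAndPutExposingTimestamp, its parameters, and the logical clock standing in for a server timestamp are illustrative assumptions, not an existing or planned API. The column names info:server and info:editid are used only as example values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Objects;
import java.util.OptionalLong;

public class CheckAndPutSketch {
    // row -> (column -> value); an in-memory stand-in for a table
    private final Map<String, Map<String, String>> rows = new HashMap<>();
    // logical clock standing in for a server-assigned timestamp (assumption)
    private long clock = 0;

    /**
     * Hypothetical variant: if checkColumn currently holds 'expected'
     * (null meaning "absent"), write newValue to putColumn, record the
     * timestamp of the edit in timestampColumn, and return that timestamp.
     * Returns empty when the check fails and nothing is written.
     */
    public synchronized OptionalLong checkAndPutExposingTimestamp(
            String row, String checkColumn, String expected,
            String putColumn, String newValue, String timestampColumn) {
        Map<String, String> cells = rows.computeIfAbsent(row, r -> new HashMap<>());
        if (!Objects.equals(cells.get(checkColumn), expected)) {
            return OptionalLong.empty();
        }
        long ts = ++clock;
        cells.put(putColumn, newValue);
        cells.put(timestampColumn, Long.toString(ts));
        return OptionalLong.of(ts);
    }

    public synchronized String get(String row, String column) {
        Map<String, String> cells = rows.get(row);
        return cells == null ? null : cells.get(column);
    }

    public static void main(String[] args) {
        CheckAndPutSketch table = new CheckAndPutSketch();
        OptionalLong first = table.checkAndPutExposingTimestamp(
                "region1", "info:server", null, "info:server", "rs1", "info:editid");
        OptionalLong second = table.checkAndPutExposingTimestamp(
                "region1", "info:server", null, "info:server", "rs2", "info:editid");
        System.out.println(first.isPresent());   // true: the cell was absent
        System.out.println(second.isPresent());  // false: the cell now holds rs1
        System.out.println(table.get("region1", "info:editid")); // 1
    }
}
```

Returning the timestamp and also persisting it lets a later caller compare the stored value against the one it was handed, which seems to be the point of exposing it.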


[jira] [Commented] (HBASE-4336) Convert source tree into maven modules

2011-09-28 Thread Jesse Yates (Commented) (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-4336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116808#comment-13116808
 ] 

Jesse Yates commented on HBASE-4336:


I was thinking about doing some hardcore forking action on github for hbase and 
maintaining a 'modulized' version. 

Do you guys think it is worth the effort to do this separately until we are 
ready to move over? And is this a good time in terms of patches coming in?

 Convert source tree into maven modules
 --

 Key: HBASE-4336
 URL: https://issues.apache.org/jira/browse/HBASE-4336
 Project: HBase
  Issue Type: Task
  Components: build
Reporter: Gary Helmling

 When we originally converted the build to maven we had a single core module 
 defined, but later reverted this to a module-less build for the sake of 
 simplicity.
 It now looks like it's time to re-address this, as we have an actual need for 
 modules to:
 * provide a trimmed down client library that applications can make use of
 * more cleanly support building against different versions of Hadoop, in 
 place of some of the reflection machinations currently required
 * incorporate the secure RPC engine that depends on some secure Hadoop classes
 I propose we start simply by refactoring into two initial modules:
 * core - common classes and utilities, and client-side code and interfaces
 * server - master and region server implementations and supporting code
 This would also lay the groundwork for incorporating the HBase security 
 features that have been developed.  Once the module structure is in place, 
 security-related features could then be incorporated into a third module -- 
 security -- after normal review and approval.  The security module could 
 then depend on secure Hadoop, without modifying the dependencies of the rest 
 of the HBase code.


[jira] [Commented] (HBASE-4489) Better key splitting in RegionSplitter


[ 
https://issues.apache.org/jira/browse/HBASE-4489?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13116830#comment-13116830
 ] 

Ted Yu commented on HBASE-4489:
---

@Dave, @Jonathan:
Shall we do the following to move this forward ?
* fix the range bug in MD5StringSplit
* provide unit tests
* keep MD5StringSplit as the default

Thanks

 Better key splitting in RegionSplitter
 --

 Key: HBASE-4489
 URL: https://issues.apache.org/jira/browse/HBASE-4489
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.90.4
Reporter: Dave Revell
Assignee: Dave Revell
 Attachments: HBASE-4489-branch0.90-v1.patch, HBASE-4489-trunk-v1.patch


 The RegionSplitter utility allows users to create a pre-split table from the 
 command line or do a rolling split on an existing table. It supports 
 pluggable split algorithms that implement the SplitAlgorithm interface. The 
 only/default SplitAlgorithm is one that assumes keys fall in the range from 
 ASCII string  to ASCII string 7FFF. This is not a sane 
 default, and seems useless to most users. Users are likely to be surprised by 
 the fact that all the region splits occur in the byte range of ASCII 
 characters.
 A better default split algorithm would be one that evenly divides the space 
 of all bytes, which is what this patch does. Making a table with five regions 
 would split at \x33\x33..., \x66\x66, \x99\x99..., \xCC\xCC..., and 
 \xFF\xFF.
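
The arithmetic behind those boundaries can be sketched as follows. This is an illustration and not the code in the attached patches: the class and method names are hypothetical, and it assumes fixed-width keys treated as unsigned big-endian integers, with boundary i computed as max * i / numRegions. With two-byte keys and five regions it reproduces the values listed above; the last value is the upper end of the key space rather than an interior split point.

```java
import java.math.BigInteger;
import java.util.ArrayList;
import java.util.List;

public class UniformSplitSketch {
    /**
     * Returns the upper boundary of each of numRegions equal slices of the
     * space of keyBytes-wide keys, rendered as \xNN escapes.
     */
    public static List<String> boundaries(int numRegions, int keyBytes) {
        // Largest key of this width, e.g. 0xFFFF for two bytes.
        BigInteger max = BigInteger.ONE.shiftLeft(8 * keyBytes).subtract(BigInteger.ONE);
        List<String> result = new ArrayList<>();
        for (int i = 1; i <= numRegions; i++) {
            BigInteger point = max.multiply(BigInteger.valueOf(i))
                                  .divide(BigInteger.valueOf(numRegions));
            result.add(toEscapedHex(point, keyBytes));
        }
        return result;
    }

    /** Renders the low keyBytes bytes of value, most significant first. */
    static String toEscapedHex(BigInteger value, int keyBytes) {
        StringBuilder sb = new StringBuilder();
        for (int shift = 8 * (keyBytes - 1); shift >= 0; shift -= 8) {
            int b = value.shiftRight(shift).intValue() & 0xFF;
            sb.append(String.format("\\x%02X", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // prints [\x33\x33, \x66\x66, \x99\x99, \xCC\xCC, \xFF\xFF]
        System.out.println(boundaries(5, 2));
    }
}
```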


[jira] [Resolved] (HBASE-4455) Rolling restart RSs scenario, -ROOT-, .META. regions are lost in AssignmentManager

2011-09-28 Thread Ming Ma (Resolved) (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-4455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ming Ma resolved HBASE-4455.


Resolution: Fixed

 Rolling restart RSs scenario, -ROOT-, .META. regions are lost in 
 AssignmentManager
 --

 Key: HBASE-4455
 URL: https://issues.apache.org/jira/browse/HBASE-4455
 Project: HBase
  Issue Type: Bug
Reporter: Ming Ma
Assignee: Ming Ma
 Fix For: 0.92.0


 Keep Master up all the time, do rolling restart of RSs like this - stop RS1, 
 wait for 2 seconds, stop RS2, start RS1, wait for 2 seconds, stop RS3, start 
 RS2, wait for 2 seconds, etc. After a while, you will find the -ROOT-, .META. 
 regions aren't in regions in transition from AssignmentManager's point of 
 view, but they aren't assigned to any region servers. Here are the issues.
 1. -ROOT- or .META. location is stale when MetaServerShutdownHandler is 
 invoked to check if it contains -ROOT- region. That is due to long delay from 
 ZK notification and async nature of the system. Here is an example: even 
 though new root region server sea-lab-1,60020,1316380133656 is set at T2, at 
 T3, in the shutdown process for sea-lab-1,60020,1316380133656, the root 
 location still points to old server sea-lab-3,60020,1316380037898.
 T1: 2011-09-18 14:08:52,470 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
 master:6
 -0x1327e43175e Retrieved 29 byte(s) of data from znode 
 /hbase/root-region-server and set watcher; sea-lab-3,60020,1316380037898
 T2: 2011-09-18 14:08:57,173 INFO 
 org.apache.hadoop.hbase.catalog.RootLocationEditor: Setting ROOT region 
 location in ZooKeeper as sea-lab-1,60020,1316380133656
 T3: 2011-09-18 14:10:26,393 DEBUG 
 org.apache.hadoop.hbase.master.ServerManager: 
 Added=sea-lab-1,60020,1316380133656 to dead servers, submitted shutdown handler 
 to be executed, root=false, meta=true, current Root Location: 
 sea-lab-3,60020,1316380037898
 T4: 2011-09-18 14:12:37,314 DEBUG org.apache.hadoop.hbase.zookeeper.ZKUtil: 
 master:6
 -0x1327e43175e Retrieved 29 byte(s) of data from znode 
 /hbase/root-region-server and set watcher; sea-lab-1,60020,1316380133656
 2. The MetaServerShutdownHandler worker thread that waits for -ROOT- or 
 .META. availability could be blocked. If, meanwhile, the new server that 
 -ROOT- or .META. is being assigned to restarts, another instance of 
 MetaServerShutdownHandler is queued. Eventually, all 
 MetaServerShutdownHandler worker threads are filled up. It looks like 
 HBASE-4245.


[jira] [Commented] (HBASE-4336) Convert source tree into maven modules