date:20120917


[ 
https://issues.apache.org/jira/browse/HBASE-6657?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456784#comment-13456784
 ] 

Gregory Chanan commented on HBASE-6657:
---

@Lars:
It's because areSerializedFieldsEqual is package-private: it's a test-only 
method, but it needs to be in Filter.  Every method declared in an interface is 
implicitly public, so Filter could not stay an interface and keep that method 
package-private.

It seemed to me slightly better to do it that way than making the test method 
public (not a big deal because it's a pure function) and keeping Filter as an 
interface.
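
A minimal sketch of the visibility difference discussed in this comment. The
types SketchFilter, PrefixSketchFilter, SketchFilterInterface and AccessSketch
are hypothetical stand-ins for illustration, not the HBase classes:

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;

// Hypothetical stand-in for Filter as an abstract class (not the HBase class).
abstract class SketchFilter {
    // No modifier: package-private, so only same-package code such as tests can call it.
    abstract boolean areSerializedFieldsEqual(SketchFilter other);
}

// Hypothetical concrete filter living in the same package.
final class PrefixSketchFilter extends SketchFilter {
    private final String prefix;

    PrefixSketchFilter(String prefix) {
        this.prefix = prefix;
    }

    @Override
    boolean areSerializedFieldsEqual(SketchFilter other) {
        return other instanceof PrefixSketchFilter
            && ((PrefixSketchFilter) other).prefix.equals(prefix);
    }
}

// The same method declared in an interface is implicitly public and abstract.
interface SketchFilterInterface {
    boolean areSerializedFieldsEqual(SketchFilterInterface other);
}

// Small reflection helper to inspect the declared visibility of a method.
final class AccessSketch {
    static boolean isPublic(Class<?> type, String methodName) {
        for (Method m : type.getDeclaredMethods()) {
            if (m.getName().equals(methodName)) {
                return Modifier.isPublic(m.getModifiers());
            }
        }
        throw new IllegalArgumentException("no such method: " + methodName);
    }
}
```

Under these assumptions, AccessSketch.isPublic reports true for the interface
method and false for the abstract-class method.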

 Merge Filter and FilterBase
 ---

 Key: HBASE-6657
 URL: https://issues.apache.org/jira/browse/HBASE-6657
 Project: HBase
  Issue Type: Bug
  Components: filters
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Priority: Minor
 Fix For: 0.96.0


 After HBASE-6477, Filter is an abstract class, as is FilterBase.  It probably 
 doesn't make much sense to keep both.
 See Review Request for more info:
 https://reviews.apache.org/r/6670/

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6798) HDFS always read checksum form meta file

2012-09-17 Thread LiuLei (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456786#comment-13456786
 ] 

LiuLei commented on HBASE-6798:
---

Yes, Hadoop 1.0.3 also has the problem.

 HDFS always read checksum form meta file
 

 Key: HBASE-6798
 URL: https://issues.apache.org/jira/browse/HBASE-6798
 Project: HBase
  Issue Type: Bug
  Components: performance
Affects Versions: 0.94.0, 0.94.1
Reporter: LiuLei

 I use HBase 0.94.1 and hadoop-0.20.2-cdh3u5.
 HBase supports checksums in the HBase block cache as of HBASE-5074.
 HBase supports checksums in order to decrease the IOPS of HDFS, so that HDFS
 doesn't need to read the checksum from the meta file of the block file.
 But in hadoop-0.20.2-cdh3u5, BlockSender still reads the metadata file 
 even if the hbase.regionserver.checksum.verify property is true.


[jira] [Comment Edited] (HBASE-6798) HDFS always read checksum form meta file


[ 
https://issues.apache.org/jira/browse/HBASE-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456790#comment-13456790
 ] 

Lars Hofhansl edited comment on HBASE-6798 at 9/17/12 5:27 PM:
---

Thanks LiuLei.

  was (Author: lhofhansl):
Thanks LuiLei.
  

[jira] [Commented] (HBASE-6798) HDFS always read checksum form meta file


[ 
https://issues.apache.org/jira/browse/HBASE-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456790#comment-13456790
 ] 

Lars Hofhansl commented on HBASE-6798:
--

Thanks LuiLei.


[jira] [Updated] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

[
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Devaraj Das updated HBASE-6758:
---

Attachment: 6758-1-0.92.patch

Attaching a patch for 0.92. The main idea is that, at the beginning of the main
loop in the replication executor's run method, we check whether the file
pointed to by getCurrentPath is presently in use by the WAL for writing. If so,
all the methods invoked later in the present iteration of the loop skip those
operations that would remove the file from the ZK queue or consider the file
completely replicated.

With this patch, I haven't observed failures in TestReplication.queueFailover
(for the reason mentioned in the jira Description) over hundreds of runs.
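
The guard described above can be sketched roughly as follows. This is not the
attached patch: ReplicationLoopSketch and drainClosedFiles are hypothetical
names, an in-memory Deque stands in for the ZK queue, and a Set stands in for
the WAL's knowledge of which file is open for writing.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.Set;

final class ReplicationLoopSketch {
    /**
     * Walks the queue in order and returns the files declared fully replicated.
     * Stops at the first file still open for writing and leaves it queued.
     */
    static List<String> drainClosedFiles(Deque<String> queue, Set<String> openForWrite) {
        List<String> finished = new ArrayList<>();
        while (!queue.isEmpty()) {
            String current = queue.peekFirst();
            // Checked once at the start of the iteration, as the comment describes.
            boolean inUse = openForWrite.contains(current);
            // ... entries readable so far would be shipped to the peer here ...
            if (inUse) {
                // Still being written: more data may appear, so do not dequeue it
                // and do not treat it as completely replicated.
                break;
            }
            finished.add(queue.pollFirst());
        }
        return finished;
    }
}
```

In a real loop the executor would presumably sleep and retry rather than
return; the sketch only shows which operations are skipped while the file is
in use.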

[replication] The replication-executor should make sure the file that it is
replicating is closed before declaring success on that file
---

Key: HBASE-6758
URL: https://issues.apache.org/jira/browse/HBASE-6758
Project: HBase
Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
Attachments: 6758-1-0.92.patch

I have seen cases where the replication-executor would lose data to replicate
since the file hasn't been closed yet. Upon closing, the new data becomes
visible. Before that happens the ZK node shouldn't be deleted in
ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made
in ReplicationSource.processEndOfFile as well (currentPath related).


[jira] [Commented] (HBASE-6707) TEST org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient.testMultipleTables flaps


[ 
https://issues.apache.org/jira/browse/HBASE-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456867#comment-13456867
 ] 

Hudson commented on HBASE-6707:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #177 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/177/])
HBASE-6707 TEST 
org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient.testMultipleTables
 flaps (Revision 1385388)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/backup/example/LongTermArchivingHFileCleaner.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/CleanerChore.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/backup/example/TestZooKeeperTableArchiveClient.java


 TEST 
 org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient.testMultipleTables
  flaps
 

 Key: HBASE-6707
 URL: https://issues.apache.org/jira/browse/HBASE-6707
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: Sameer Vaishampayan
Assignee: Jesse Yates
Priority: Critical
 Fix For: 0.96.0

 Attachments: hbase-6707-v0.patch, hbase-6707-v1.patch


 https://builds.apache.org/job/HBase-TRUNK/3293/
 Error Message
 Archived HFiles (hdfs://localhost:59986/user/jenkins/hbase/.archive/otherTable/01ced3b55d7220a9c460273a4a57b198/fam) should have gotten deleted, but didn't, remaining files:[hdfs://localhost:59986/user/jenkins/hbase/.archive/otherTable/01ced3b55d7220a9c460273a4a57b198/fam/fc872572a1f5443eb55b6e2567cfeb1c]
 Stacktrace
 java.lang.AssertionError: Archived HFiles (hdfs://localhost:59986/user/jenkins/hbase/.archive/otherTable/01ced3b55d7220a9c460273a4a57b198/fam) should have gotten deleted, but didn't, remaining files:[hdfs://localhost:59986/user/jenkins/hbase/.archive/otherTable/01ced3b55d7220a9c460273a4a57b198/fam/fc872572a1f5443eb55b6e2567cfeb1c]
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertNull(Assert.java:551)
   at org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient.testMultipleTables(TestZooKeeperTableArchiveClient.java:291)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)


[jira] [Commented] (HBASE-6707) TEST org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient.testMultipleTables flaps


[ 
https://issues.apache.org/jira/browse/HBASE-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456872#comment-13456872
 ] 

Hudson commented on HBASE-6707:
---

Integrated in HBase-TRUNK #3340 (See 
[https://builds.apache.org/job/HBase-TRUNK/3340/])
HBASE-6707 TEST 
org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient.testMultipleTables
 flaps (Revision 1385388)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/backup/example/LongTermArchivingHFileCleaner.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/CleanerChore.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/backup/example/TestZooKeeperTableArchiveClient.java



[jira] [Updated] (HBASE-6800) Build a Document Store on HBase for Better Query Processing


 [ 
https://issues.apache.org/jira/browse/HBASE-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dai updated HBASE-6800:
-

Attachment: dot-deisgn.pdf

 Build a Document Store on HBase for Better Query Processing
 ---

 Key: HBASE-6800
 URL: https://issues.apache.org/jira/browse/HBASE-6800
 Project: HBase
  Issue Type: New Feature
  Components: coprocessors, performance
Affects Versions: 0.96.0
Reporter: Jason Dai
 Attachments: dot-deisgn.pdf


 In the last couple of years, more and more people have begun to stream data 
 into HBase in near real time, and to use high-level queries (e.g., Hive) to 
 analyze the data in HBase directly. While HBase already has very effective 
 MapReduce integration with its good scanning performance, query processing 
 using MapReduce on HBase still has significant gaps compared to HDFS: ~3x 
 space overhead and 3~5x performance overhead according to our measurements.
 We propose to implement a document store on HBase, which can greatly improve 
 query processing on HBase (by leveraging the relational model and read-mostly 
 access patterns). According to our prototype, it can reduce space usage by 
 up to ~3x and speed up query processing by up to ~2x.


[jira] [Created] (HBASE-6800) Build a Document Store on HBase for Better Query Processing

Jason Dai created HBASE-6800:


 Summary: Build a Document Store on HBase for Better Query 
Processing
 Key: HBASE-6800
 URL: https://issues.apache.org/jira/browse/HBASE-6800
 Project: HBase
  Issue Type: New Feature
  Components: coprocessors, performance
Affects Versions: 0.96.0
Reporter: Jason Dai
 Attachments: dot-deisgn.pdf

In the last couple of years, more and more people have begun to stream data 
into HBase in near real time, and to use high-level queries (e.g., Hive) to 
analyze the data in HBase directly. While HBase already has very effective 
MapReduce integration with its good scanning performance, query processing 
using MapReduce on HBase still has significant gaps compared to HDFS: ~3x 
space overhead and 3~5x performance overhead according to our measurements.

We propose to implement a document store on HBase, which can greatly improve 
query processing on HBase (by leveraging the relational model and read-mostly 
access patterns). According to our prototype, it can reduce space usage by 
up to ~3x and speed up query processing by up to ~2x.


[jira] [Commented] (HBASE-6710) 0.92/0.94 compatibility issues due to HBASE-5206

2012-09-17 Thread ramkrishna.s.vasudevan (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456893#comment-13456893
]

ramkrishna.s.vasudevan commented on HBASE-6710:
---

So, as per the change, the 0.92 clients need a restart with configuration
changes and the 0.94 clients do not. Thanks Gregory.

0.92/0.94 compatibility issues due to HBASE-5206

Key: HBASE-6710
URL: https://issues.apache.org/jira/browse/HBASE-6710
Project: HBase
Issue Type: Bug
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Priority: Critical
Fix For: 0.94.2

Attachments: HBASE-6710-v3.patch

HBASE-5206 introduces some compatibility issues between {0.94,0.94.1} and
{0.92.0,0.92.1}. The release notes of HBASE-5155 describes the issue
(HBASE-5206 is a backport of HBASE-5155).
I think we can make 0.94.2 compatible with both {0.94.0,0.94.1} and
{0.92.0,0.92.1}, although one of those sets will require configuration
changes.
The basic problem is that there is a znode for each table
zookeeper.znode.tableEnableDisable that is handled differently.
On 0.92.0 and 0.92.1 the states for this table are:
[ disabled, disabling, enabling ] or deleted if the table is enabled
On 0.94.1 and 0.94.2 the states for this table are:
[ disabled, disabling, enabling, enabled ]
What saves us is that the location of this znode is configurable. So the
basic idea is to have the 0.94.2 master write two different znodes,
zookeeper.znode.tableEnableDisabled92 and
zookeeper.znode.tableEnableDisabled94 where the 92 node is in 92 format,
the 94 node is in 94 format. And internally, the master would only use the
94 format in order to solve the original bug HBASE-5155 solves.
We can of course make one of these the same default as exists now, so we
don't need to make config changes for one of 0.92 or 0.94 clients. I argue
that 0.92 clients shouldn't have to make config changes for the same reason I
argued above. But that is debatable.
Then, I think the only question left is the question of how to bring along
the {0.94.0, 0.94.1} crew. A {0.94.0, 0.94.1} client would work against a
0.94.2 cluster by just configuring zookeeper.znode.tableEnableDisable in
the client to be whatever zookeeper.znode.tableEnableDisabled94 is in the
cluster. A 0.94.2 client would work against both a {0.94.0, 0.94.1} and
{0.92.0, 0.92.1} cluster if it had HBASE-6268 applied. About rolling upgrade
from {0.94.0, 0.94.1} to 0.94.2 -- I'd have to think about that. Do the
regionservers ever read the tableEnableDisabled znode?
On the mailing list, Lars H suggested the following:
The only input I'd have is that format we'll use going forward will not have
a version attached to it.
So maybe the 92 version would still be called
zookeeper.znode.tableEnableDisable and the new node could have a different
name zookeeper.znode.tableEnableDisableNew (or something).
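
To make the two-znode idea in the description concrete, here is a rough
sketch. It is not the HBASE-6710 patch: DualZnodeSketch and its members are
hypothetical, and two in-memory maps stand in for the znodes configured by
zookeeper.znode.tableEnableDisabled92 and zookeeper.znode.tableEnableDisabled94.

```java
import java.util.HashMap;
import java.util.Map;

final class DualZnodeSketch {
    enum State { DISABLED, DISABLING, ENABLING, ENABLED }

    // Stand-in for the 92-format znode: an enabled table has no entry at all.
    final Map<String, State> znodes92 = new HashMap<>();
    // Stand-in for the 94-format znode: every state, including ENABLED, is stored.
    final Map<String, State> znodes94 = new HashMap<>();

    /** Writes the table state in both formats, as the master would in this proposal. */
    void setTableState(String table, State state) {
        znodes94.put(table, state);
        if (state == State.ENABLED) {
            znodes92.remove(table); // 92 format: "deleted if the table is enabled"
        } else {
            znodes92.put(table, state);
        }
    }

    /** Internally only the 94 format is consulted. */
    State getTableState(String table) {
        return znodes94.get(table);
    }
}
```

A 0.92-style reader would look only at the first map and a 0.94-style reader
only at the second, which is why each client set needs to be pointed at the
matching znode location.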

[jira] [Updated] (HBASE-6800) Build a Document Store on HBase for Better Query Processing


 [ 
https://issues.apache.org/jira/browse/HBASE-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jason Dai updated HBASE-6800:
-

Description: 
In the last couple of years, more and more people have begun to stream data 
into HBase in near real time, and to use high-level queries (e.g., Hive) to 
analyze the data in HBase directly. While HBase already has very effective 
MapReduce integration with its good scanning performance, query processing 
using MapReduce on HBase still has significant gaps compared to HDFS: ~3x 
space overhead and 3~5x performance overhead according to our measurements.

We propose to implement a document store on HBase, which can greatly improve 
query processing on HBase (by leveraging the relational model and read-mostly 
access patterns). According to our prototype, it can reduce space usage by 
up to ~3x and speed up query processing by up to ~1.8x.

  was:
In the last couple of years, increasingly more people begin to stream data into 
HBase in near time, and 
use high level queries (e.g., Hive) to analyze the data in HBase directly. 
While HBase already has very effective MapReduce integration with its good 
scanning performance, query processing using MapReduce on HBase still has 
significant gaps compared to HDFS: ~3x space overheads and 3~5x performance 
overheads according to our measurement.

We propose to implement a document store on HBase, which can greatly improve 
query processing on HBase (by leveraging the relational model and read-mostly 
access patterns). According to our prototype, it can reduce space usage by 
up-to ~3x and speedup query processing by up-to ~2x.



[jira] [Commented] (HBASE-6634) REST API ScannerModel's protobuf converter code duplicates the setBatch call

2012-09-17 Thread Michael Drzal (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13456990#comment-13456990
 ] 

Michael Drzal commented on HBASE-6634:
--

+1 on the patch.  That's really strange.

 REST API ScannerModel's protobuf converter code duplicates the setBatch call
 

 Key: HBASE-6634
 URL: https://issues.apache.org/jira/browse/HBASE-6634
 Project: HBase
  Issue Type: Bug
  Components: rest
Affects Versions: 0.94.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Trivial
 Attachments: HBASE-6634.patch


 There's a dupe call to setBatch when a scanner model object is created for 
 protobuf outputs.


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457068#comment-13457068
 ] 

Ted Yu commented on HBASE-6758:
---

@Devaraj:
Thanks for your effort.
I got the following at compilation time:
{code}
[ERROR] 
/home/hduser/92/src/main/java/org/apache/hadoop/hbase/replication/regionserver/ReplicationSource.java:[317,11]
 readAllEntriesToReplicateOrNextFile(boolean) in 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource cannot be 
applied to ()
{code}
Do you see a similar error?


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

[
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457097#comment-13457097
]

stack commented on HBASE-6758:
--

Rather than change all new Replication invocations to take a null, why not
override the Replication constructor? Your patch would be smaller.

Could there be issues with isFileInUse in a multithreaded context? Should
currentFilePath be an atomic reference so all threads see the changes when they
happen? Do you think this is an issue?

Do we have to pass an HRegionServer instance into ReplicationSourceManager?
Can it be one of the Interfaces Server or RegionServerServices? Or, looking at
why you need it: you want it because you want to get at the HLog instance. Can
we not pass that instead? Or better, an Interface that has isFileInUse on it?

Currently, you are passing an HRegionServer Instance to
ReplicationSourceManager to which is added a public method that exposes the
HRegionServer instance on which we invoke the getWAL method to call
isFileInUse. We're adding a bit of tangle.

Otherwise, I love the fact that you are figuring out bugs and fixes in
replication just using the test. Painful, I'd imagine. Great work.
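
One possible shape for the narrower interface and the atomic reference raised
in the questions above. All names here (FileInUseChecker, WalSketch,
SourceManagerSketch, rollTo, mayFinish) are hypothetical and only illustrate
the suggestion; they are not existing HBase types.

```java
import java.util.concurrent.atomic.AtomicReference;

// Narrow interface: the only thing the source manager needs from the WAL.
interface FileInUseChecker {
    boolean isFileInUse(String path);
}

// Stand-in for the log writer; the current path is published atomically so
// that replication threads always see the latest roll.
final class WalSketch implements FileInUseChecker {
    private final AtomicReference<String> currentFilePath = new AtomicReference<>();

    void rollTo(String newPath) {
        currentFilePath.set(newPath);
    }

    @Override
    public boolean isFileInUse(String path) {
        return path != null && path.equals(currentFilePath.get());
    }
}

// Stand-in for the source manager: it depends on the interface only, not on
// a whole region server instance.
final class SourceManagerSketch {
    private final FileInUseChecker checker;

    SourceManagerSketch(FileInUseChecker checker) {
        this.checker = checker;
    }

    /** A file may be declared finished only once the WAL no longer writes to it. */
    boolean mayFinish(String path) {
        return !checker.isFileInUse(path);
    }
}
```

Passing the interface keeps the dependency small and makes the manager easy to
test with a stub.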


[jira] [Created] (HBASE-6801) TestZooKeeperTableArchiveClient#testArchivingOnSingleTable sometimes fails in trunk

Ted Yu created HBASE-6801:
-

 Summary: 
TestZooKeeperTableArchiveClient#testArchivingOnSingleTable sometimes fails in 
trunk
 Key: HBASE-6801
 URL: https://issues.apache.org/jira/browse/HBASE-6801
 Project: HBase
  Issue Type: Bug
Reporter: Ted Yu


From build #3342:
{code}
Failed tests:   
testArchivingOnSingleTable(org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient):
 (..)
{code}


[jira] [Commented] (HBASE-6801) TestZooKeeperTableArchiveClient#testArchivingOnSingleTable sometimes fails in trunk


[ 
https://issues.apache.org/jira/browse/HBASE-6801?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457108#comment-13457108
 ] 

Ted Yu commented on HBASE-6801:
---

Here is the assertion error:
{code}
java.lang.AssertionError: 
Expected: Store files in archive doesn't match expected
 got: 100

at org.junit.Assert.assertThat(Assert.java:780)
at org.junit.Assert.assertThat(Assert.java:738)
at org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient.ensureAllTableFilesinArchive(TestZooKeeperTableArchiveClient.java:344)
at org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient.testArchivingOnSingleTable(TestZooKeeperTableArchiveClient.java:166)
{code}


[jira] [Commented] (HBASE-4676) Prefix Compression - Trie data block encoding

[
https://issues.apache.org/jira/browse/HBASE-4676?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457131#comment-13457131
]

Ted Yu commented on HBASE-4676:
---

@Matt:
Do you want to attach your patch(es) here for Hadoop QA to run the test suite?

Thanks

Prefix Compression - Trie data block encoding
-

Key: HBASE-4676
URL: https://issues.apache.org/jira/browse/HBASE-4676
Project: HBase
Issue Type: New Feature
Components: io, performance, regionserver
Affects Versions: 0.90.6
Reporter: Matt Corgan
Assignee: Matt Corgan
Attachments: HBASE-4676-0.94-v1.patch, hbase-prefix-trie-0.1.jar,
PrefixTrie_Format_v1.pdf, PrefixTrie_Performance_v1.pdf, SeeksPerSec by
blockSize.png

The HBase data block format has room for 2 significant improvements for
applications that have high block cache hit ratios.
First, there is no prefix compression, and the current KeyValue format is
somewhat metadata heavy, so there can be tremendous memory bloat for many
common data layouts, specifically those with long keys and short values.
Second, there is no random access to KeyValues inside data blocks. This
means that every time you double the datablock size, average seek time (or
average cpu consumption) goes up by a factor of 2. The standard 64KB block
size is ~10x slower for random seeks than a 4KB block size, but block sizes
as small as 4KB cause problems elsewhere. Using block sizes of 256KB or 1MB
or more may be more efficient from a disk access and block-cache perspective
in many big-data applications, but doing so is infeasible from a random seek
perspective.
The PrefixTrie block encoding format attempts to solve both of these
problems. Some features:
* trie format for row key encoding completely eliminates duplicate row keys
and encodes similar row keys into a standard trie structure which also saves
a lot of space
* the column family is currently stored once at the beginning of each block.
this could easily be modified to allow multiple family names per block
* all qualifiers in the block are stored in their own trie format which
caters nicely to wide rows. duplicate qualifiers between rows are eliminated.
the size of this trie determines the width of the block's qualifier
fixed-width-int
* the minimum timestamp is stored at the beginning of the block, and deltas
are calculated from that. the maximum delta determines the width of the
block's timestamp fixed-width-int
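
As a rough illustration of the timestamp bullet above, the delta and width
computation might look like the following. This is a sketch, not the
PrefixTrie code: TimestampDeltaSketch and its methods are hypothetical, and it
assumes non-negative timestamps and whole-byte widths.

```java
final class TimestampDeltaSketch {
    /** Deltas of each timestamp from the block minimum (assumes non-negative timestamps). */
    static long[] deltasFromMin(long[] timestamps) {
        long min = minOf(timestamps);
        long[] deltas = new long[timestamps.length];
        for (int i = 0; i < timestamps.length; i++) {
            deltas[i] = timestamps[i] - min;
        }
        return deltas;
    }

    /** Whole bytes needed to hold the largest delta; 0 when all timestamps are equal. */
    static int deltaWidthBytes(long[] timestamps) {
        long maxDelta = 0;
        for (long d : deltasFromMin(timestamps)) {
            maxDelta = Math.max(maxDelta, d);
        }
        int width = 0;
        while (maxDelta != 0) {
            width++;
            maxDelta >>>= 8; // drop one byte per step
        }
        return width;
    }

    private static long minOf(long[] values) {
        long min = Long.MAX_VALUE;
        for (long v : values) {
            min = Math.min(min, v);
        }
        return min;
    }
}
```

When every cell in the block shares one timestamp, the width comes out as
zero bytes per cell.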
The block is structured with metadata at the beginning, then a section for
the row trie, then the column trie, then the timestamp deltas, and then
all the values. Most work is done in the row trie, where every leaf node
(corresponding to a row) contains a list of offsets/references corresponding
to the cells in that row. Each cell is fixed-width to enable binary
searching and is represented by [1 byte operationType, X bytes qualifier
offset, X bytes timestamp delta offset].
If all operation types are the same for a block, there will be zero per-cell
overhead. Same for timestamps. Same for qualifiers when I get a chance.
So, the compression aspect is very strong, but makes a few small sacrifices
on VarInt size to enable faster binary searches in trie fan-out nodes.
A more compressed but slower version might build on this by also applying
further (suffix, etc) compression on the trie nodes at the cost of slower
write speed. Even further compression could be obtained by using all VInts
instead of FInts with a sacrifice on random seek speed (though not huge).
One current drawback is the current write speed. While programmed with good
constructs like TreeMaps, ByteBuffers, binary searches, etc, it's not
programmed with the same level of optimization as the read path. Work will
need to be done to optimize the data structures used for encoding and could
probably show a 10x increase. It will still be slower than delta encoding,
but with a much higher decode speed. I have not yet created a thorough
benchmark for write speed nor sequential read speed.
Though the trie is reaching a point where it is internally very efficient
(probably within half or a quarter of its max read speed) the way that hbase
currently uses it is far from optimal. The KeyValueScanner and related
classes that iterate through the trie will eventually need to be smarter and
have methods to do things like skipping to the next row of results without
scanning every cell in between. When that is accomplished it will also allow
much faster compactions because the full row key will not have to be compared
as often as it is now.
Current code is on github. The trie code is in a separate project from the
slightly modified hbase. There is

[jira] [Commented] (HBASE-6776) Opened region of disabled table is not added to online region list


[ 
https://issues.apache.org/jira/browse/HBASE-6776?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457136#comment-13457136
 ] 

Jimmy Xiang commented on HBASE-6776:


I know Rajesh's idea for how to fix HBASE-6317.  I think it is great. I prefer to 
isolate the change in EnableTableHandler and not touch AssignmentManager.
I will comment there as well.

 Opened region of disabled table is not added to online region list
 --

 Key: HBASE-6776
 URL: https://issues.apache.org/jira/browse/HBASE-6776
 Project: HBase
  Issue Type: Bug
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
Priority: Minor
 Attachments: trunk-6776.patch, trunk-6776_v2.patch


 An opened region of a disabled table should be added to the online region 
 list and then closed.  We should not just ignore it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6317) Master clean start up and Partially enabled tables make region assignment inconsistent.

[
https://issues.apache.org/jira/browse/HBASE-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457143#comment-13457143
]

Jimmy Xiang commented on HBASE-6317:

I think the idea of the patch is great. However, I prefer to isolate the change
in EnableTableHandler and not touch AssignmentManager.
Even if roundrobinassignment is set, if there is a server entry in meta for a
region of an enabling table, I think we can just ignore that setting and assign
the region to that server if the server is online. So we don't need to pass
masterrestart around. We can update the region plan only in EnableTableHandler.
The assumption is that if the server entry is already in meta, we should obey
it, and it is there because of a previous roundrobinassignment. What do you think?
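
A rough sketch of the decision being proposed, for discussion only. The class and method names are made up, servers are plain strings, and none of this is the real EnableTableHandler or AssignmentManager API; it only shows "obey the meta entry if that server is online, otherwise fall back to round robin".

```java
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of the proposed choice; not actual HBase master code. */
final class EnablePlanSketch {

    /**
     * Picks a destination for one region of an enabling table.
     *
     * @param serverInMeta     server recorded in meta for the region, or null if there is none
     * @param onlineServers    servers currently online
     * @param roundRobinOrder  non-empty list of servers used for the round-robin fallback
     * @param roundRobinIndex  position of this region in the round-robin sequence
     */
    static String chooseServer(String serverInMeta, Set<String> onlineServers,
                               List<String> roundRobinOrder, int roundRobinIndex) {
        if (serverInMeta != null && onlineServers.contains(serverInMeta)) {
            // Obey the existing meta entry, even when round-robin assignment is configured.
            return serverInMeta;
        }
        // No usable entry in meta: fall back to round robin over the given order.
        return roundRobinOrder.get(Math.floorMod(roundRobinIndex, roundRobinOrder.size()));
    }
}
```

With something along these lines the plan for each region could be computed in one place, without threading a master-restart flag through the assignment code.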

Master clean start up and Partially enabled tables make region assignment
inconsistent.
---

Key: HBASE-6317
URL: https://issues.apache.org/jira/browse/HBASE-6317
Project: HBase
Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: rajeshbabu
Fix For: 0.96.0, 0.92.3, 0.94.3

Attachments: HBASE-6317_94_3.patch, HBASE-6317_94.patch,
HBASE-6317_trunk_2.patch

If we have a table in a partially enabled state (ENABLING), then on HMaster
restart we treat it as a clean cluster start up and do a bulk assign.
Currently in 0.94, bulk assign does not handle ALREADY_OPENED scenarios, which
leads to region assignment problems. Analysing this further, we found that we
have a better way to handle these scenarios.
{code}
if (false == checkIfRegionBelongsToDisabled(regionInfo)
    && false == checkIfRegionsBelongsToEnabling(regionInfo)) {
  synchronized (this.regions) {
    regions.put(regionInfo, regionLocation);
    addToServers(regionLocation, regionInfo);
  }
}
{code}
We don't add to the regions map so that the enable table handler can handle it.
But as nothing is added to the regions map, we treat it as a clean cluster start up.
Will come up with a patch tomorrow.

[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

[
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457156#comment-13457156
]

Devaraj Das commented on HBASE-6758:

[~zhi...@ebaysf.com] Not sure why you got a compilation error. Will look..

[~stack] Thanks for the detailed comments. Here are the responses.

bq. Rather than change all new Replication invocations to take a null, why not
override the Replication constructor? Your patch would be smaller.

I had considered that, but adding a new constructor didn't seem justified
in the long run. There probably are no consumers of the constructor outside
HBase, and adding another constructor means new code to take care of.
So although it makes the patch bigger, I think it's okay.

bq. Could there be issues with isFileInUse in multithreaded context? Should
currentFilePath be an atomic reference so all threads see the changes when they
happen? Do you think this an issue?

There shouldn't be any multithreading issues here. Each ReplicationExecutor
thread has its own copy of everything (including currentFilePath), and the
getters/setters are in the same thread context.

bq. Do we have to pass in an HRegionServer instance into
ReplicationSourceManager? Can it be one of the Interfaces Server or
RegionServerServices? Or looking at why you need it, you want it because you
want to get at HLog instance. Can we not pass this? Or better, an Interface
that has isFileInUse on it?

Yes, I tried to pass the HLog instance to Replication's constructor call within
HRegionServer. But the code is kind of tangled up. HRegionServer instantiates a
Replication object (in setupWALAndReplication). HLog is instantiated in
instantiateHLog, and the constructor of HLog invokes rollWriter. If the
Replication object was not registered prior to rollWriter call, things don't
work (which means the Replication object needs to be constructed first but the
HLog instance is not available yet). I tried fixing it but then I ran into
other issues...

But yeah, I like the interface idea. Will try to refactor the code in that
respect.

[replication] The replication-executor should make sure the file that it is
replicating is closed before declaring success on that file
---

Key: HBASE-6758
URL: https://issues.apache.org/jira/browse/HBASE-6758
Project: HBase
Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
Attachments: 6758-1-0.92.patch

[jira] [Commented] (HBASE-3834) Store ignores checksum errors when opening files

2012-09-17 Thread Todd Lipcon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457176#comment-13457176
 ] 

Todd Lipcon commented on HBASE-3834:


Hi Liang. Thanks so much for testing this out. We really appreciate it!

So, let's leave this open to be fixed for an 0.90.x release if we can. Maybe 
Jon or Ram might be interested in taking it up?

 Store ignores checksum errors when opening files
 

 Key: HBASE-3834
 URL: https://issues.apache.org/jira/browse/HBASE-3834
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.90.2
Reporter: Todd Lipcon
Assignee: liang xie
Priority: Critical
 Fix For: 0.90.8

 Attachments: hbase-3834.tar.gz2


 If you corrupt one of the storefiles in a region (eg using vim to muck up 
 some bytes), the region will still open, but that storefile will just be 
 ignored with a log message. We should probably not do this in general - 
 better to keep that region unassigned and force an admin to make a decision 
 to remove the bad storefile.


[jira] [Commented] (HBASE-6770) Allow scanner setCaching to specify size instead of number of rows

2012-09-17 Thread Karthik Ranganathan (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457179#comment-13457179
 ] 

Karthik Ranganathan commented on HBASE-6770:


Agreed. If that's the only issue, then passing a hint makes it easier to use - 
do something like setPartialRowScanning(true) if we want to respect that. But 
in any case, I am not suggesting removing the existing API, just adding the new 
ones.

 Allow scanner setCaching to specify size instead of number of rows
 --

 Key: HBASE-6770
 URL: https://issues.apache.org/jira/browse/HBASE-6770
 Project: HBase
  Issue Type: Bug
  Components: client, regionserver
Reporter: Karthik Ranganathan

 Currently, we have the following APIs to customize the behavior of scans:
 setCaching() - how many rows to cache on the client to speed up scans
 setBatch() - max columns to return per row, to prevent a very large 
 response.
 Ideally, we should be able to specify a memory buffer size because:
 1. that would take care of both of these use cases.
 2. it does not need any knowledge of the size of the rows or cells, as the 
 final thing we are worried about is the available memory.
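
 To illustrate what a size-based limit could look like, here is a minimal sketch that fills a batch by bytes instead of by row count. It is not an existing HBase API: the class, the method, and the use of a plain byte[] as a stand-in for a serialized row are all assumptions, and partial rows are not handled.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch of a byte-budgeted batch; a byte[] stands in for one serialized row. */
final class SizeBoundedBatchSketch {

    /**
     * Pulls rows until the accumulated size reaches maxBytes. The last row may
     * overshoot the budget, so at least one row is returned whenever any remain.
     */
    static List<byte[]> nextBatch(Iterator<byte[]> rows, long maxBytes) {
        if (maxBytes <= 0) {
            throw new IllegalArgumentException("maxBytes must be positive");
        }
        List<byte[]> batch = new ArrayList<>();
        long used = 0;
        while (rows.hasNext() && used < maxBytes) {
            byte[] row = rows.next();
            batch.add(row);
            used += row.length;
        }
        return batch;
    }
}
```

 The caller only states how much memory it is willing to spend per fetch; it needs no knowledge of row or cell sizes, which is the point made in item 2 above.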


[jira] [Commented] (HBASE-6799) Store more metadata in HFiles


[ 
https://issues.apache.org/jira/browse/HBASE-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457181#comment-13457181
 ] 

stack commented on HBASE-6799:
--

Here is dump of hfile metadata from production:

{code}
Block index size as per heapsize: 110632
reader=/hbase/ad_campaign_monthly_stumbles/2081100778/default/77955d7c8845435dbcfe7b91a55fd1c4,
compression=lzo,
cacheConf=CacheConfig:enabled [cacheDataOnRead=true] 
[cacheDataOnWrite=false] [cacheIndexesOnWrite=false] [cacheBloomsOnWrite=false] 
[cacheEvictOnClose=false] [cacheCompressed=f
firstKey=10:2009:06/default:organic/1264629982792/Put,
lastKey=9:2006:03/default:paid/1260930865681/Put,
avgKeyLen=38,
avgValueLen=8,
entries=1561501,
length=19277379
Trailer:
fileinfoOffset=19276842,
loadOnOpenDataOffset=19248105,
dataIndexCount=1313,
metaIndexCount=0,
totalUncomressedBytes=86064912,
entryCount=1561501,
compressionCodec=LZO,
uncompressedDataIndexSize=67170,
numDataIndexLevels=1,
firstDataBlockOffset=0,
lastDataBlockOffset=19247463,
comparatorClassName=org.apache.hadoop.hbase.KeyValue$KeyComparator,
majorVersion=2,
minorVersion=1
Fileinfo:
DATA_BLOCK_ENCODING = NONE
DELETE_FAMILY_COUNT = \x00\x00\x00\x00\x00\x00\x00\x00
EARLIEST_PUT_TS = \x00\x00\x01%\x95\x02\xD9\xDA
KEY_VALUE_VERSION = \x00\x00\x00\x01
MAJOR_COMPACTION_KEY = \xFF
MAX_MEMSTORE_TS_KEY = \x00\x00\x00\x00\x00\x00\x00\x00
MAX_SEQ_ID_KEY = 26057054872
TIMERANGE = 1260925409754....1266607612712
hfile.AVG_KEY_LEN = 38
hfile.AVG_VALUE_LEN = 8
hfile.LASTKEY = 
\x00\x099:2006:03\x07defaultpaid\x00\x00\x01%\x95V\x1A\x11\x04
Mid-key: \x00\x0D43195:2008:04\x07defaultpaid\x00\x00\x01%\x95W\x9C\x86\x04
Bloom filter:
Not present
Delete Family Bloom filter:
Not present
{code}
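
For reading the Fileinfo values in a dump like the one above, here is a small sketch that turns an escaped value back into bytes and then into a long. It assumes the dump uses the usual convention where printable bytes are shown as-is and everything else as \xHH; the class and method names are made up for illustration and are not HBase utilities.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/** Hypothetical helper for decoding escaped Fileinfo values; not part of HBase. */
final class FileInfoDecodeSketch {

    /** Turns text such as \x00\x01% back into bytes: \xHH is one byte, any other char is its ASCII code. */
    static byte[] parseEscaped(String text) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < text.length()) {
            char c = text.charAt(i);
            if (c == '\\' && i + 3 < text.length() && text.charAt(i + 1) == 'x') {
                // Two hex digits follow the \x marker.
                out.write(Integer.parseInt(text.substring(i + 2, i + 4), 16));
                i += 4;
            } else {
                out.write(c);
                i++;
            }
        }
        return out.toByteArray();
    }

    /** Interprets exactly eight bytes as a big-endian long. */
    static long toLong(byte[] eightBytes) {
        if (eightBytes.length != 8) {
            throw new IllegalArgumentException("expected 8 bytes, got " + eightBytes.length);
        }
        return ByteBuffer.wrap(eightBytes).getLong();
    }
}
```

Applied to the EARLIEST_PUT_TS value above, `toLong(parseEscaped("\\x00\\x00\\x01%\\x95\\x02\\xD9\\xDA"))` gives 1260925409754, which matches the first 13 digits of the TIMERANGE line.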

I'd have to look at the code but the above might be made of metadata and a 
toString on the Reader (Reader might seek the first key on open... and get last 
key from the hfile meta... which would not be the same as having all this data 
in the hfile meta).

Whether it's major compacted is already in there... a bunch more could be added.

 Store more metadata in HFiles
 -

 Key: HBASE-6799
 URL: https://issues.apache.org/jira/browse/HBASE-6799
 Project: HBase
  Issue Type: Brainstorming
Reporter: Lars Hofhansl

 Currently we store the following metadata in an HFile:
 * the timerange of KVs
 * the earliest PUT ts
 * max sequence id
 * whether or not this file was created from a major compaction.
 I would like to brainstorm what extra data we need to store to make an HFile 
 self-describing, i.e. it could be backed up somewhere and external tools 
 (without invoking an HBase server) could glean enough information from it to 
 make use of the data inside. Ideally it would also be nice to be able to 
 recreate .META. from a bunch of HFiles to stand up a temporary HBase instance 
 to process a bunch of HFiles.
 What I can think of:
 * min/max key
 * table
 * column family (or families to be future proof)
 * custom tags (set by a backup tool, for example)


[jira] [Commented] (HBASE-3834) Store ignores checksum errors when opening files


[ 
https://issues.apache.org/jira/browse/HBASE-3834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457184#comment-13457184
 ] 

stack commented on HBASE-3834:
--

Thank you Liang for doing the research.


[jira] [Reopened] (HBASE-6707) TEST org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient.testMultipleTables flaps


 [ 
https://issues.apache.org/jira/browse/HBASE-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack reopened HBASE-6707:
--


Reopening again.  I reverted the patch because I got this fail on trunk build: 
https://builds.apache.org/view/G-L/view/HBase/job/HBase-TRUNK/3342/  If you get 
a chance, take a looksee mighty Jesse.  Thanks.

 TEST 
 org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient.testMultipleTables
  flaps
 

 Key: HBASE-6707
 URL: https://issues.apache.org/jira/browse/HBASE-6707
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: Sameer Vaishampayan
Assignee: Jesse Yates
Priority: Critical
 Fix For: 0.96.0

 Attachments: hbase-6707-v0.patch, hbase-6707-v1.patch


 https://builds.apache.org/job/HBase-TRUNK/3293/
 Error Message
 Archived HFiles 
 (hdfs://localhost:59986/user/jenkins/hbase/.archive/otherTable/01ced3b55d7220a9c460273a4a57b198/fam)
  should have gotten deleted, but didn't, remaining 
 files:[hdfs://localhost:59986/user/jenkins/hbase/.archive/otherTable/01ced3b55d7220a9c460273a4a57b198/fam/fc872572a1f5443eb55b6e2567cfeb1c]
 Stacktrace
 java.lang.AssertionError: Archived HFiles 
 (hdfs://localhost:59986/user/jenkins/hbase/.archive/otherTable/01ced3b55d7220a9c460273a4a57b198/fam)
  should have gotten deleted, but didn't, remaining 
 files:[hdfs://localhost:59986/user/jenkins/hbase/.archive/otherTable/01ced3b55d7220a9c460273a4a57b198/fam/fc872572a1f5443eb55b6e2567cfeb1c]
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertNull(Assert.java:551)
   at 
 org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient.testMultipleTables(TestZooKeeperTableArchiveClient.java:291)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)


[jira] [Updated] (HBASE-6710) 0.92/0.94 compatibility issues due to HBASE-5206

[
https://issues.apache.org/jira/browse/HBASE-6710?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Gregory Chanan updated HBASE-6710:
--

Release Note:
This issue introduces a compatibility mode on the HMaster for 0.92.0 and 0.92.1
clients. Without the compatibility mode, 0.92.0 and 0.92.1 clients will hang
on calls to enableTable and is_enabled will always return false, even for
enabled tables. To use the compatibility mode, 0.92.0 and 0.92.1 clients
require a restart with the following configuration change:
<name>zookeeper.znode.tableEnableDisable</name>
<value>table92</value>
In rare failure cases, even with the compatibility mode on, the client may
report incorrect results for is_enabled and is_disabled. For example,
is_enabled may return true even though the table is disabled (the correct
value can be checked via the HMaster UI). This issue can be corrected by
calling enable or disable to return the table to the desired state.

Added Release Note

0.92/0.94 compatibility issues due to HBASE-5206

Key: HBASE-6710
URL: https://issues.apache.org/jira/browse/HBASE-6710
Project: HBase
Issue Type: Bug
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Priority: Critical
Fix For: 0.94.2

Attachments: HBASE-6710-v3.patch

[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

[
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457194#comment-13457194
]

Jean-Daniel Cryans commented on HBASE-6758:
---

My understanding of this patch is that it reduces the race condition but it
still leaves a small window eg you can take the fileNotInUse snapshot, get
false, and the moment after that the log could roll. If this is correct, I'm
not sure it's worth the added complexity.

It seems to me this is a case where we'd need to lock HLog.cacheFlushLock for
the time we read the log to be 100% sure log rolling doesn't happen. This has
multiple side effects like delaying flushes and log rolls for a few ms while
replication is reading the log. It would also require having a way to get to
the WAL from ReplicationSource.

<blue skying>While I'm thinking about this, it just occurred to me that when we
read a log that's not being written to then we don't need the open/close file
dance since the new data is already available. Possible optimization
here!</blue skying>

Anyways, one solution I can think of that doesn't involve leaking HRS into
replication would be giving the log a second chance. Basically if you get an
EOF, flip the secondChance bit. If it's on then you don't get rid of that log
yet. Reset the bit when you loop back to read, now if there was new data added
you should get it else go to the next log.
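
A minimal sketch of the second-chance idea, to make the state transitions explicit. The class and method names are hypothetical, "0 entries read" stands in for hitting EOF, and the sketch ignores whether another log is actually queued; it is not ReplicationSource code.

```java
/** Hypothetical sketch of the secondChance bit; not actual replication code. */
final class SecondChanceSketch {

    private boolean secondChance = false;

    /**
     * Call after each read attempt on the current log.
     *
     * @param entriesRead number of entries the read returned; 0 stands for EOF
     * @return true if the reader should give up on this log and move to the next one
     */
    boolean shouldMoveToNextLog(int entriesRead) {
        if (entriesRead > 0) {
            // New data showed up, so the log may still be written to: reset the bit.
            secondChance = false;
            return false;
        }
        if (!secondChance) {
            // First EOF: keep the log around and loop back to read once more.
            secondChance = true;
            return false;
        }
        // Second EOF in a row: nothing new arrived, so move on (and reset for the next log).
        secondChance = false;
        return true;
    }
}
```

The cost of this approach is one extra read attempt per log before it is dropped; in exchange the reader never needs a handle on the region server or the WAL to ask whether the file is still in use.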

[replication] The replication-executor should make sure the file that it is
replicating is closed before declaring success on that file
---

Key: HBASE-6758
URL: https://issues.apache.org/jira/browse/HBASE-6758
Project: HBase
Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
Attachments: 6758-1-0.92.patch

[jira] [Commented] (HBASE-6707) TEST org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient.testMultipleTables flaps

2012-09-17 Thread Jesse Yates (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457202#comment-13457202
 ] 

Jesse Yates commented on HBASE-6707:


@stack yeah, I'll take a look. This is getting really frustrating :-/


[jira] [Resolved] (HBASE-6571) Generic multi-thread/cross-process error handling framework

2012-09-17 Thread Jesse Yates (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-6571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]

Jesse Yates resolved HBASE-6571.

Resolution: Fixed

Closing - committed to https://github.com/jyates/hbase/tree/snapshots

Generic multi-thread/cross-process error handling framework
---

Key: HBASE-6571
URL: https://issues.apache.org/jira/browse/HBASE-6571
Project: HBase
Issue Type: Sub-task
Reporter: Jesse Yates
Assignee: Jesse Yates
Fix For: hbase-6055

Attachments: Distributed Error Monitoring.docx,
java_HBASE-6571-v0.patch

The idea for a generic inter-process error-handling framework came from
working on HBASE-6055 (snapshots). Distributed snapshots require tight time
constraints in taking a snapshot to minimize offline time in the face of errors.
However, we often need to coordinate errors between processes and the current
Abortable framework is not sufficiently flexible to handle the multitude of
situations that can occur when coordinating between all region servers, the
master and zookeeper. Using this framework error handling for snapshots was a
simple matter, amounting to maybe 200 LOC.
This seems to be a generally useful framework and can be used to easily add
inter-process error handling in HBase. The most obvious immediate usage is as
part of HBASE-5487 when coordinating multiple sub-tasks.

[jira] [Commented] (HBASE-6707) TEST org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient.testMultipleTables flaps


[ 
https://issues.apache.org/jira/browse/HBASE-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457219#comment-13457219
 ] 

stack commented on HBASE-6707:
--

[~jesse_yates] Deep breaths.


[jira] [Commented] (HBASE-6800) Build a Document Store on HBase for Better Query Processing

2012-09-17 Thread Andrew Purtell (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457242#comment-13457242
]

Andrew Purtell commented on HBASE-6800:
---

Thank you for your interest in contributing to the HBase project. I have two
initial comments/suggestions:

1) From the attached document, it appears that the existing coprocessor
framework was sufficient for the implementation of the DOT system on top, which
is great to see. There has been some discussion in the HBase PMC, documented in
the archives of the d...@hbase.apache.org mailing list, that coprocessor based
applications should begin as independent code contributions, perhaps hosted in
a GitHub repository. In your announcement on general@ I see you have sort-of
done this already at: https://github.com/intel-hadoop/hbase-0.94-panthera ,
except this is a full fork of the HBase source tree with all history of
individual changes lost (a single commit of a source drop). It would be helpful
if only the changes on top of stock HBase code appear here. Otherwise, what you
have done is in effect forked the HBase project, which is not conducive to
contribution.

2) From the design document: "The co-processor framework needs to be extended
to provide observers for the filter operations, similar to the observers of the
data access operations." We would be delighted to work with you on the
necessary coprocessor framework extensions. I'd recommend a separate JIRA
specifically for this. Let's discuss what Coprocessor API extensions or
additions are necessary. Do you have a proposal?

Build a Document Store on HBase for Better Query Processing
---

Key: HBASE-6800
URL: https://issues.apache.org/jira/browse/HBASE-6800
Project: HBase
Issue Type: New Feature
Components: coprocessors, performance
Affects Versions: 0.96.0
Reporter: Jason Dai
Attachments: dot-deisgn.pdf

In the last couple of years, more and more people have begun to stream data
into HBase in near real time, and to use high-level queries (e.g., Hive) to
analyze the data in HBase directly.
While HBase already has very effective MapReduce integration with its good
scanning performance, query processing using MapReduce on HBase still has
significant gaps compared to HDFS: ~3x space overheads and 3~5x performance
overheads according to our measurements.
We propose to implement a document store on HBase, which can greatly improve
query processing on HBase (by leveraging the relational model and read-mostly
access patterns). According to our prototype, it can reduce space usage by
up to ~3x and speed up query processing by up to ~1.8x.

[jira] [Updated] (HBASE-6770) Allow scanner setCaching to specify size instead of number of rows

2012-09-17 Thread Karthik Ranganathan (JIRA)


 [ 
https://issues.apache.org/jira/browse/HBASE-6770?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Karthik Ranganathan updated HBASE-6770:
---

Assignee: Michal Gregorczyk

 Allow scanner setCaching to specify size instead of number of rows
 --

 Key: HBASE-6770
 URL: https://issues.apache.org/jira/browse/HBASE-6770
 Project: HBase
  Issue Type: Bug
  Components: client, regionserver
Reporter: Karthik Ranganathan
Assignee: Michal Gregorczyk

 Currently, we have the following APIs to customize the behavior of scans:
 setCaching() - how many rows to cache on the client to speed up scans
 setBatch() - max columns to return per row, to prevent a very large 
 response.
 Ideally, we should be able to specify a memory buffer size because:
 1. that would take care of both of these use cases.
 2. it does not need any knowledge of the size of the rows or cells, as the 
 final thing we are worried about is the available memory.


[jira] [Commented] (HBASE-6707) TEST org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient.testMultipleTables flaps


[ 
https://issues.apache.org/jira/browse/HBASE-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457259#comment-13457259
 ] 

Hudson commented on HBASE-6707:
---

Integrated in HBase-TRUNK #3343 (See 
[https://builds.apache.org/job/HBase-TRUNK/3343/])
HBASE-6707 TEST 
org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient.testMultipleTables
 flaps; REVERT AGAIN (Revision 1386748)

 Result = SUCCESS
stack : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/backup/example/LongTermArchivingHFileCleaner.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/CleanerChore.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/backup/example/TestZooKeeperTableArchiveClient.java



[jira] [Updated] (HBASE-6634) REST API ScannerModel's protobuf converter code duplicates the setBatch call


 [ 
https://issues.apache.org/jira/browse/HBASE-6634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6634:
-

   Resolution: Fixed
Fix Version/s: 0.96.0
 Hadoop Flags: Reviewed
   Status: Resolved  (was: Patch Available)

Applied to trunk.  Thanks for reviews Michael and Ted.  Thanks for patch Mighty 
Harsh

 REST API ScannerModel's protobuf converter code duplicates the setBatch call
 

 Key: HBASE-6634
 URL: https://issues.apache.org/jira/browse/HBASE-6634
 Project: HBase
  Issue Type: Bug
  Components: rest
Affects Versions: 0.94.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Trivial
 Fix For: 0.96.0

 Attachments: HBASE-6634.patch


 There's a dupe call to setBatch when a scanner model object is created for 
 protobuf outputs.


[jira] [Updated] (HBASE-6798) HDFS always read checksum form meta file


 [ 
https://issues.apache.org/jira/browse/HBASE-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

stack updated HBASE-6798:
-

Priority: Blocker  (was: Major)

Marking as a blocker on 0.96 (can backport to 0.94 once we find what the issue is).

 HDFS always read checksum form meta file
 

 Key: HBASE-6798
 URL: https://issues.apache.org/jira/browse/HBASE-6798
 Project: HBase
  Issue Type: Bug
  Components: performance
Affects Versions: 0.94.0, 0.94.1
Reporter: LiuLei
Priority: Blocker

 I use HBase 0.94.1 and hadoop-0.20.2-cdh3u5.
 HBase gained support for checksums in the HBase block cache in HBASE-5074.
 HBase supports checksums to decrease the IOPS of HDFS, so that HDFS
 doesn't need to read the checksum from the meta file of the block file.
 But in hadoop-0.20.2-cdh3u5, BlockSender still reads the metadata file 
 even if the hbase.regionserver.checksum.verify property is true.


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

[
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457296#comment-13457296
]

Devaraj Das commented on HBASE-6758:

[~jdcryans] Thanks for looking. Responses below.

bq. My understanding of this patch is that it reduces the race condition but it
still leaves a small window eg you can take the fileNotInUse snapshot, get
false, and the moment after that the log could roll. If this is correct, I'm
not sure it's worth the added complexity.

I don't think there is ever that window. The replication executor thread picks
up a path that the LogRoller puts in the replicator's queue BEFORE the log roll
happens (and the HLog constructor puts the first path before the replication
executor starts). The replication executor is always trailing, and so when the
HLog guy says that a path is not in use (being written to), it seems to me a
fact that it indeed is not being written to and any writes that ever happened
were in the past. Also note that the currentPath is reset AFTER a log roll,
which is kind of delayed.

bq. It seems to me this is a case where we'd need to lock HLog.cacheFlushLock
for the time we read the log to be 100% sure log rolling doesn't happen. This
has multiple side effects like delaying flushes and log rolls for a few ms
while replication is reading the log. It would also require having a way to get
to the WAL from ReplicationSource.

Yeah, I tried my best to avoid taking that crucial lock!

bq. Anyways, one solution I can think of that doesn't involve leaking HRS into
replication would be giving the log a second chance. Basically if you get an
EOF, flip the secondChance bit. If it's on then you don't get rid of that log
yet. Reset the bit when you loop back to read, now if there was new data added
you should get it else go to the next log.

I considered some variant of this. However, I gave it up and took a more
conservative approach - make sure that the replication-executor thread gets at
least one pass at a CLOSED file. All other solutions seemed incomplete to me
and prone to races...
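
For readers following along, the "second chance" idea quoted above could look roughly like the sketch below. This is only an illustration of the quoted proposal, not code from the patch or from ReplicationSource; the names SecondChanceSketch, LogSource and drainHead are hypothetical.

```java
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

/** Hypothetical sketch of the "second chance" idea; not HBase code. */
final class SecondChanceSketch {

    /** Hypothetical stand-in for a log file that may still receive appends. */
    interface LogSource {
        /** Returns entries added since the last call; empty means EOF for now. */
        List<String> readNewEntries();
    }

    /**
     * Reads from the head log of the queue. The head log is dropped only after
     * two consecutive reads return nothing, and only if a newer log is queued.
     */
    static List<String> drainHead(Deque<LogSource> queue) {
        List<String> shipped = new ArrayList<>();
        boolean secondChance = false;
        while (!queue.isEmpty()) {
            List<String> batch = queue.peekFirst().readNewEntries();
            if (!batch.isEmpty()) {
                shipped.addAll(batch);
                secondChance = false; // new data arrived, reset the bit
                continue;
            }
            if (!secondChance) {
                secondChance = true;  // first EOF: keep the log, read once more
                continue;
            }
            if (queue.size() > 1) {
                queue.pollFirst();    // second EOF in a row and a newer log exists
            }
            return shipped;
        }
        return shipped;
    }
}
```

Note that this only narrows the window (a write can still land after the second empty read), which is the incompleteness the comment above objects to.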

[~stack] forgot to answer one of your previous questions.
bq. Should currentFilePath be an atomic reference so all threads see the
changes when they happen?

I think volatile suffices for the use case here.
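
To illustrate the distinction (a generic Java sketch, not the HLog code; the class and method names are hypothetical): a volatile field is enough when one thread publishes a new value and other threads only read it, while an AtomicReference is needed only for read-modify-write operations such as compare-and-set.

```java
import java.util.concurrent.atomic.AtomicReference;

/** Hypothetical sketch contrasting volatile and AtomicReference. */
final class CurrentPathSketch {
    // Single writer publishes, any number of readers observe the latest value.
    private volatile String currentFilePath;

    void onLogRoll(String newPath) {
        currentFilePath = newPath;
    }

    boolean isInUse(String path) {
        return path.equals(currentFilePath);
    }

    // Only needed if several writers must race safely on the same field.
    // Note: compareAndSet compares references (==), not equals().
    private final AtomicReference<String> casPath = new AtomicReference<>();

    boolean rollIfStill(String expected, String newPath) {
        return casPath.compareAndSet(expected, newPath);
    }
}
```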

[replication] The replication-executor should make sure the file that it is
replicating is closed before declaring success on that file
---

Key: HBASE-6758
URL: https://issues.apache.org/jira/browse/HBASE-6758
Project: HBase
Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
Attachments: 6758-1-0.92.patch

[jira] [Commented] (HBASE-6524) Hooks for hbase tracing


[ 
https://issues.apache.org/jira/browse/HBASE-6524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457309#comment-13457309
 ] 

stack commented on HBASE-6524:
--

Doc is great.  On "(it would not be very difficult to remove this requirement)": 
what would this take?  Should I add the doc to the manual?  Can we do this for 
hdfs too yet?  What about an example that enables trace over hdfs too while the 
Get is going on?  Good on you Jonathan.

 Hooks for hbase tracing
 ---

 Key: HBASE-6524
 URL: https://issues.apache.org/jira/browse/HBASE-6524
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Leavitt
Assignee: Jonathan Leavitt
 Fix For: 0.96.0

 Attachments: 6524.addendum, 6524-v2.txt, 6524v3.txt, 
 createTableTrace.png, hbase-6524.diff


 Includes the hooks that use [htrace|http://www.github.com/cloudera/htrace] 
 library to add dapper-like tracing to hbase.


[jira] [Commented] (HBASE-6798) HDFS always read checksum form meta file

2012-09-17 Thread Todd Lipcon (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457327#comment-13457327
 ] 

Todd Lipcon commented on HBASE-6798:


Fixing this for remote reads (i.e. not short-circuit ones) is going to be 
somewhat tricky for Hadoop 1.0, because we need to keep protocol compatibility. 
Doing it for Hadoop 2 shouldn't be bad, because we have protobufs, but will 
still take a bit of careful HDFS surgery. I vaguely remember an existing HDFS 
JIRA about this, but am now not sure where it went. Anyone remember the number, 
or should we re-file?

 HDFS always read checksum form meta file
 

 Key: HBASE-6798
 URL: https://issues.apache.org/jira/browse/HBASE-6798
 Project: HBase
  Issue Type: Bug
  Components: performance
Affects Versions: 0.94.0, 0.94.1
Reporter: LiuLei
Priority: Blocker

 I use HBase 0.94.1 and hadoop-0.20.2-cdh3u5.
 HBase gained support for checksums in the HBase block cache in HBASE-5074.
 HBase supports checksums to decrease the IOPS of HDFS, so that HDFS
 doesn't need to read the checksum from the meta file of the block file.
 But in hadoop-0.20.2-cdh3u5, BlockSender still reads the metadata file 
 even if the hbase.regionserver.checksum.verify property is true.


[jira] [Updated] (HBASE-6794) FilterBase should provide a default implementation of toByteArray


 [ 
https://issues.apache.org/jira/browse/HBASE-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gregory Chanan updated HBASE-6794:
--

Resolution: Fixed
Status: Resolved  (was: Patch Available)

Thanks for the review, stack.

Committed to trunk.

 FilterBase should provide a default implementation of toByteArray
 -

 Key: HBASE-6794
 URL: https://issues.apache.org/jira/browse/HBASE-6794
 Project: HBase
  Issue Type: Bug
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Priority: Minor
 Fix For: 0.96.0

 Attachments: HBASE-6794.patch


 See HBASE-6657, FilterBase provides stub implementations for other Filter 
 methods, it seems reasonable for it to provide a default implementation for 
 toByteArray, for Filters that don't need special serialization (e.g. ones 
 with no state).
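
 A rough sketch of the idea, using simplified stand-in classes rather than the 
 real HBase Filter/FilterBase (all names below are hypothetical, and returning 
 an empty array as the default is an assumption made for illustration):

```java
/** Hypothetical, simplified stand-in for an abstract filter contract. */
abstract class SketchFilter {
    /** Returns true if the row should be excluded. */
    abstract boolean filterRow(byte[] row);

    /** Serialized form of the filter's state. */
    abstract byte[] toByteArray();
}

/** Base class with stub defaults, including a default toByteArray. */
abstract class SketchFilterBase extends SketchFilter {
    @Override
    boolean filterRow(byte[] row) {
        return false; // default: keep every row
    }

    @Override
    byte[] toByteArray() {
        return new byte[0]; // default: no state to serialize
    }
}

/** Stateless filter: inherits every default, nothing to override. */
final class AcceptAllSketch extends SketchFilterBase { }

/** Stateful filter: must override toByteArray to carry its prefix. */
final class PrefixSketch extends SketchFilterBase {
    private final byte[] prefix;

    PrefixSketch(byte[] prefix) {
        this.prefix = prefix.clone();
    }

    @Override
    boolean filterRow(byte[] row) {
        if (row.length < prefix.length) {
            return true;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (row[i] != prefix[i]) {
                return true;
            }
        }
        return false;
    }

    @Override
    byte[] toByteArray() {
        return prefix.clone();
    }
}
```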


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

[
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457342#comment-13457342
]

Jean-Daniel Cryans commented on HBASE-6758:
---

I see, all that double-negation (eg !fileNotInUse) confused me :)

So in layman's terms, your patch short circuits all the checks to change the
current path if we know for sure that the file we are replicating from is being
written to. The side effect is that we won't quit the current file unless it
has aged right?

bq. The replication executor is always trailing, and so when the HLog guy says
that a path is not in use (being written to), it seems to me a fact that it
indeed is not being written to and any writes that ever happened was in the
past.

FWIW that might not be totally true, at least in 0.94 HLog.postLogRoll is
called before HLog.cleanupCurrentWriter which does issue a sync().

[replication] The replication-executor should make sure the file that it is
replicating is closed before declaring success on that file
---

Key: HBASE-6758
URL: https://issues.apache.org/jira/browse/HBASE-6758
Project: HBase
Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
Attachments: 6758-1-0.92.patch

[jira] [Commented] (HBASE-6798) HDFS always read checksum form meta file


[ 
https://issues.apache.org/jira/browse/HBASE-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457347#comment-13457347
 ] 

Ted Yu commented on HBASE-6798:
---

I found the following hdfs JIRAs:

HDFS-1034: Enhance datanode to read data and checksum file in parallel
HDFS-2699: Store data and checksums together in block file

HDFS-2699 garnered some attention early this year but there seems to be no 
active development.

 HDFS always read checksum form meta file
 

 Key: HBASE-6798
 URL: https://issues.apache.org/jira/browse/HBASE-6798
 Project: HBase
  Issue Type: Bug
  Components: performance
Affects Versions: 0.94.0, 0.94.1
Reporter: LiuLei
Priority: Blocker

 I use HBase 0.94.1 and hadoop-0.20.2-cdh3u5.
 HBase gained support for checksums in the HBase block cache in HBASE-5074.
 HBase supports checksums to decrease the IOPS of HDFS, so that HDFS
 doesn't need to read the checksum from the meta file of the block file.
 But in hadoop-0.20.2-cdh3u5, BlockSender still reads the metadata file 
 even if the hbase.regionserver.checksum.verify property is true.


[jira] [Updated] (HBASE-6591) checkAndPut executed/not metrics


 [ 
https://issues.apache.org/jira/browse/HBASE-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gregory Chanan updated HBASE-6591:
--

Attachment: HBASE-6591-v2.patch

* Attached HBASE-6591-v2.patch *

Thanks, stack.  Added two references to FIXED_OVERHEAD.  Do I need to add to 
DEEP_OVERHEAD as well?

Doesn't look like we did last time we added a counter:
https://github.com/apache/hbase/commit/bb027e97e1eced21c5ee2f6ce7cbe30634aaf144#L2L3386

 checkAndPut executed/not metrics
 

 Key: HBASE-6591
 URL: https://issues.apache.org/jira/browse/HBASE-6591
 Project: HBase
  Issue Type: Task
  Components: metrics, regionserver
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Priority: Minor
 Fix For: 0.96.0

 Attachments: HBASE-6591.patch, HBASE-6591-v2.patch


 checkAndPut/checkAndDelete return true if the new put was executed, false 
 otherwise.
 So clients can figure out this metric for themselves, but it would be useful 
 to get a look at what is happening on the cluster as a whole, across all 
 clients.
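
 As a rough illustration of what such server-side counters might look like 
 (a sketch over an in-memory map, not the region server code; the class and 
 method names are hypothetical):

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical sketch: count check-and-put calls that executed vs. not. */
final class CheckAndPutCountersSketch {
    private final ConcurrentHashMap<String, String> store = new ConcurrentHashMap<>();
    private final AtomicLong passed = new AtomicLong();
    private final AtomicLong failed = new AtomicLong();

    /**
     * Puts value under key only if the current value equals expected
     * (null means "key must be absent"). Returns true if the put executed.
     */
    boolean checkAndPut(String key, String expected, String value) {
        boolean executed;
        if (expected == null) {
            executed = store.putIfAbsent(key, value) == null;
        } else {
            executed = store.replace(key, expected, value);
        }
        // The server sees every client's outcome, so it can aggregate them.
        (executed ? passed : failed).incrementAndGet();
        return executed;
    }

    long passedCount() {
        return passed.get();
    }

    long failedCount() {
        return failed.get();
    }
}
```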


[jira] [Commented] (HBASE-6794) FilterBase should provide a default implementation of toByteArray


[ 
https://issues.apache.org/jira/browse/HBASE-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457383#comment-13457383
 ] 

Hudson commented on HBASE-6794:
---

Integrated in HBase-TRUNK #3344 (See 
[https://builds.apache.org/job/HBase-TRUNK/3344/])
HBASE-6794 FilterBase should provide a default implementation of 
toByteArray (Revision 1386842)

 Result = FAILURE
gchanan : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/FilterBase.java


 FilterBase should provide a default implementation of toByteArray
 -

 Key: HBASE-6794
 URL: https://issues.apache.org/jira/browse/HBASE-6794
 Project: HBase
  Issue Type: Bug
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Priority: Minor
 Fix For: 0.96.0

 Attachments: HBASE-6794.patch


 See HBASE-6657, FilterBase provides stub implementations for other Filter 
 methods, it seems reasonable for it to provide a default implementation for 
 toByteArray, for Filters that don't need special serialization (e.g. ones 
 with no state).


[jira] [Commented] (HBASE-6634) REST API ScannerModel's protobuf converter code duplicates the setBatch call


[ 
https://issues.apache.org/jira/browse/HBASE-6634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457384#comment-13457384
 ] 

Hudson commented on HBASE-6634:
---

Integrated in HBase-TRUNK #3344 (See 
[https://builds.apache.org/job/HBase-TRUNK/3344/])
HBASE-6634 REST API ScannerModel's protobuf converter code duplicates the 
setBatch call (Revision 1386816)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/rest/model/ScannerModel.java


 REST API ScannerModel's protobuf converter code duplicates the setBatch call
 

 Key: HBASE-6634
 URL: https://issues.apache.org/jira/browse/HBASE-6634
 Project: HBase
  Issue Type: Bug
  Components: rest
Affects Versions: 0.94.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Trivial
 Fix For: 0.96.0

 Attachments: HBASE-6634.patch


 There's a dupe call to setBatch when a scanner model object is created for 
 protobuf outputs.


[jira] [Created] (HBASE-6802) Export Snapshot

2012-09-17 Thread Matteo Bertozzi (JIRA)

Matteo Bertozzi created HBASE-6802:
--

 Summary: Export Snapshot
 Key: HBASE-6802
 URL: https://issues.apache.org/jira/browse/HBASE-6802
 Project: HBase
  Issue Type: Sub-task
  Components: snapshots
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Fix For: 0.96.0, hbase-6055


Export a snapshot to another cluster.
 - Copy the .snapshot/name folder with all the references
 - Copy the hfiles/hlogs needed by the snapshot

Once the other cluster has the files and the snapshot information it can 
restore the snapshot.


[jira] [Commented] (HBASE-6802) Export Snapshot

2012-09-17 Thread Matteo Bertozzi (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6802?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457404#comment-13457404
 ] 

Matteo Bertozzi commented on HBASE-6802:


Draft on review board https://reviews.apache.org/r/7137/

 Export Snapshot
 ---

 Key: HBASE-6802
 URL: https://issues.apache.org/jira/browse/HBASE-6802
 Project: HBase
  Issue Type: Sub-task
  Components: snapshots
Reporter: Matteo Bertozzi
Assignee: Matteo Bertozzi
 Fix For: 0.96.0, hbase-6055


 Export a snapshot to another cluster.
  - Copy the .snapshot/name folder with all the references
  - Copy the hfiles/hlogs needed by the snapshot
 Once the other cluster has the files and the snapshot information it can 
 restore the snapshot.


[jira] [Updated] (HBASE-6733) [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]


 [ 
https://issues.apache.org/jira/browse/HBASE-6733?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HBASE-6733:
---

Attachment: 6733-2.patch

Ok this patch keeps the change to sleepMultiplier very localized (and relevant 
to the place where the problem happened, as described in comment#1)

 [0.92 UNIT TESTS] TestReplication.queueFailover occasionally fails [Part-2]
 ---

 Key: HBASE-6733
 URL: https://issues.apache.org/jira/browse/HBASE-6733
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Fix For: 0.92.3

 Attachments: 6733-1.patch, 6733-2.patch


 The failure is in TestReplication.queueFailover (fails due to unreplicated 
 rows). I have come across two problems:
 1. The sleepMultiplier is not properly reset when the currentPath is changed 
 (in ReplicationSource.java).
 2. ReplicationExecutor sometimes removes files to replicate from the queue too 
 early, resulting in corresponding edits missing. Here the problem is due to 
 the fact that the log-file length that the replication executor finds is not the 
 most updated one, and hence it doesn't read anything from there, and 
 ultimately, when there is a log roll, the replication-queue gets a new entry, 
 and the executor drops the old entry out of the queue.
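
 Problem 1 can be pictured with the sketch below. It is a simplified 
 illustration of a backoff multiplier that is reset whenever the current path 
 changes, not the ReplicationSource code; the class and method names are 
 hypothetical.

```java
/** Hypothetical sketch: backoff multiplier that resets on a path change. */
final class ReplicationBackoffSketch {
    private final int maxMultiplier;
    private String currentPath;
    private int sleepMultiplier = 1;

    ReplicationBackoffSketch(int maxMultiplier) {
        this.maxMultiplier = maxMultiplier;
    }

    /** Called when a read of the current log returned nothing. */
    void onEmptyRead() {
        if (sleepMultiplier < maxMultiplier) {
            sleepMultiplier++;
        }
    }

    /** Switching to a different log starts the backoff over. */
    void switchTo(String newPath) {
        if (!newPath.equals(currentPath)) {
            currentPath = newPath;
            sleepMultiplier = 1;
        }
    }

    long sleepMillis(long baseMillis) {
        return baseMillis * sleepMultiplier;
    }
}
```

 Without the reset in switchTo, the first reads of a fresh log would inherit 
 the long sleeps accumulated on the previous one.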


[jira] [Commented] (HBASE-6795) mvn compile fails on a fresh checkout with empty ~/.m2/repo

2012-09-17 Thread Enis Soztutar (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-6795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457415#comment-13457415
]

Enis Soztutar commented on HBASE-6795:
--

I did not realize that it is a known issue; obviously I expected mvn compile
to work out of the box.
Since it is a maven bug, I'm +1 on updating the manual, and maybe a short
comment on parent pom.xml. Shall we resolve this as won't fix, once HBASE-6412
is in?

mvn compile fails on a fresh checkout with empty ~/.m2/repo
---

Key: HBASE-6795
URL: https://issues.apache.org/jira/browse/HBASE-6795
Project: HBase
Issue Type: Bug
Components: build
Affects Versions: 0.96.0
Reporter: Enis Soztutar
Assignee: Enis Soztutar
Priority: Critical

I have noticed that mvn compile fails if your ~/.m2/repository/ does not
contain hbase test jars, however mvn test-compile, mvn install, etc. work as
expected.
The patch for HBASE-6706 introduced a test-jar dependency from hbase-server and
hbase-hadoop1-compat to the hbase-hadoop-compat test jar in the test scope. But
stupid maven still tries to resolve the test jar when you do maven compile
(notice that we are not even in the test scope).
mvn test-compile, etc. works b/c the test-jar for hbase-hadoop-compat is built
before hbase-hadoop1-compat.
One way to solve this is to push SNAPSHOT test-jars for hbase-hadoop-compat
to the snapshot repository, so next time, they are referenced from there.
Other alternative is to move classes under hbase-hadoop{|1|2}-compat/src/test
to src/main, and remove the test-jar intra-module dependency. Still, it seems
we might need intra-module test-jar dependency in the future.
Any other suggestions are welcome.

[jira] [Commented] (HBASE-6795) mvn compile fails on a fresh checkout with empty ~/.m2/repo

2012-09-17 Thread Jesse Yates (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-6795?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457427#comment-13457427
]

Jesse Yates commented on HBASE-6795:

+1 on enis's comment.

mvn compile fails on a fresh checkout with empty ~/.m2/repo
---

[jira] [Created] (HBASE-6803) script hbase should add JAVA_LIBRARY_PATH to LD_LIBRARY_PATH

Jimmy Xiang created HBASE-6803:
--

 Summary: script hbase should add JAVA_LIBRARY_PATH to 
LD_LIBRARY_PATH
 Key: HBASE-6803
 URL: https://issues.apache.org/jira/browse/HBASE-6803
 Project: HBase
  Issue Type: Bug
  Components: shell
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang


Snappy SO fails to load properly if LD_LIBRARY_PATH does not include the path 
where snappy SO is.


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

[
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457437#comment-13457437
]

Devaraj Das commented on HBASE-6758:

bq. I see, all that double-negation (eg !fileNotInUse) confused me

Sorry about that. I'll see if I can change it to single negation :-)

bq. So in layman's terms, your patch short circuits all the checks to change
the current path if we know for sure that the file we are replicating from is
being written to. The side effect is that we won't quit the current file unless
it has aged right?

Yes ..

bq. FWIW that might not be totally true, at least in 0.94 HLog.postLogRoll is
called before HLog.cleanupCurrentWriter which does issue a sync().

I don't get this, JD. Could you please clarify a bit more? Given the fact that
the currentPath would be updated only after the call to cleanupCurrentWriter, I
don't see a difference in the behavior between 0.92 and 0.94... (maybe I am
missing something though).

[replication] The replication-executor should make sure the file that it is
replicating is closed before declaring success on that file
---

Key: HBASE-6758
URL: https://issues.apache.org/jira/browse/HBASE-6758
Project: HBase
Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
Attachments: 6758-1-0.92.patch

[jira] [Updated] (HBASE-6803) script hbase should add JAVA_LIBRARY_PATH to LD_LIBRARY_PATH


 [ 
https://issues.apache.org/jira/browse/HBASE-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6803:
---

Attachment: trunk-6803.patch

 script hbase should add JAVA_LIBRARY_PATH to LD_LIBRARY_PATH
 

 Key: HBASE-6803
 URL: https://issues.apache.org/jira/browse/HBASE-6803
 Project: HBase
  Issue Type: Bug
  Components: shell
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: trunk-6803.patch


 Snappy SO fails to load properly if LD_LIBRARY_PATH does not include the path 
 where snappy SO is.


[jira] [Updated] (HBASE-6803) script hbase should add JAVA_LIBRARY_PATH to LD_LIBRARY_PATH


 [ 
https://issues.apache.org/jira/browse/HBASE-6803?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jimmy Xiang updated HBASE-6803:
---

Status: Patch Available  (was: Open)

 script hbase should add JAVA_LIBRARY_PATH to LD_LIBRARY_PATH
 

 Key: HBASE-6803
 URL: https://issues.apache.org/jira/browse/HBASE-6803
 Project: HBase
  Issue Type: Bug
  Components: shell
Reporter: Jimmy Xiang
Assignee: Jimmy Xiang
 Attachments: trunk-6803.patch


 Snappy SO fails to load properly if LD_LIBRARY_PATH does not include the path 
 where snappy SO is.


[jira] [Commented] (HBASE-6770) Allow scanner setCaching to specify size instead of number of rows


[ 
https://issues.apache.org/jira/browse/HBASE-6770?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457461#comment-13457461
 ] 

Lars Hofhansl commented on HBASE-6770:
--

Agreed here :)  Having an API that allows limiting the return size is a good 
idea. I assume you do not need it to be perfect, just a good estimate, right?


 Allow scanner setCaching to specify size instead of number of rows
 --

 Key: HBASE-6770
 URL: https://issues.apache.org/jira/browse/HBASE-6770
 Project: HBase
  Issue Type: Bug
  Components: client, regionserver
Reporter: Karthik Ranganathan
Assignee: Michal Gregorczyk

 Currently, we have the following APIs to customize the behavior of scans:
 setCaching() - how many rows to cache on client to speed up scans
 setBatch() - max columns to return per row to prevent a very large 
 response.
 Ideally, we should be able to specify a memory buffer size because:
 1. that would take care of both of these use cases.
 2. it does not need any knowledge of the size of the rows or cells, as the 
 final thing we are worried about is the available memory.
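
 A minimal sketch of batching by size rather than by row count, under the 
 assumption that a row's size can be approximated by its byte length (this is 
 not the scanner implementation; SizeLimitedBatcherSketch and nextBatch are 
 hypothetical names):

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Hypothetical sketch: cut scan batches by accumulated bytes, not row count. */
final class SizeLimitedBatcherSketch {

    /**
     * Pulls rows until the accumulated size reaches maxBytes. The limit is an
     * estimate: a batch may overshoot by at most one row, and a non-empty
     * input always yields at least one row so the scan makes progress.
     */
    static List<byte[]> nextBatch(Iterator<byte[]> rows, long maxBytes) {
        List<byte[]> batch = new ArrayList<>();
        long used = 0;
        while (rows.hasNext()) {
            byte[] row = rows.next();
            batch.add(row);
            used += row.length;
            if (used >= maxBytes) {
                break;
            }
        }
        return batch;
    }
}
```

 With four 4-byte rows and a 10-byte limit, the first batch holds three rows 
 (12 bytes, one row over the limit) and the second batch holds the last row.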

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

[jira] [Commented] (HBASE-6799) Store more metadata in HFiles

[
https://issues.apache.org/jira/browse/HBASE-6799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457465#comment-13457465
]

Lars Hofhansl commented on HBASE-6799:
--

Hmm... That'd be good enough, methinks. I had just looked at the FileInfo code
that writes the metadata at the end of a write.
As long as all this info is available quickly without standing up an HBase
instance that's perfect.

Looks like I need to do a bit more research :)

Store more metadata in HFiles
-

Key: HBASE-6799
URL: https://issues.apache.org/jira/browse/HBASE-6799
Project: HBase
Issue Type: Brainstorming
Reporter: Lars Hofhansl

Currently we store the following metadata in an HFile:
* the timerange of KVs
* the earliest PUT ts
* max sequence id
* whether or not this file was created from a major compaction.
I would like to brainstorm what extra data we need to store to make an HFile
self-describing, i.e. it could be backed up somewhere and external tools
(without invoking an HBase server) could glean enough information from it to
make use of the data inside. Ideally it would also be nice to be able to
recreate .META. from a bunch of HFiles to stand up a temporary HBase instance
to process a bunch of HFiles.
What I can think of:
* min/max key
* table
* column family (or families to be future proof)
* custom tags (set by a backup tool, for example)
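
One way to picture the list above is as extra key/value entries written next to
the existing file metadata. The sketch below is purely illustrative: the key
names (TABLE, FAMILY, MIN_KEY, MAX_KEY, TAG.*) and the Base64 encoding of row
keys are assumptions, not an existing HFile format.

```java
import java.util.Base64;
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical sketch: extra self-describing entries for a store file. */
final class SelfDescribingInfoSketch {

    static Map<String, String> describe(String table, String family,
                                        byte[] minKey, byte[] maxKey,
                                        Map<String, String> customTags) {
        Map<String, String> info = new LinkedHashMap<>();
        info.put("TABLE", table);
        info.put("FAMILY", family);
        // Row keys are arbitrary bytes, so encode them as text.
        info.put("MIN_KEY", Base64.getEncoder().encodeToString(minKey));
        info.put("MAX_KEY", Base64.getEncoder().encodeToString(maxKey));
        // Custom tags get a prefix so they cannot clash with built-in keys.
        for (Map.Entry<String, String> tag : customTags.entrySet()) {
            info.put("TAG." + tag.getKey(), tag.getValue());
        }
        return info;
    }
}
```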

[jira] [Commented] (HBASE-6707) TEST org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient.testMultipleTables flaps


[ 
https://issues.apache.org/jira/browse/HBASE-6707?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457475#comment-13457475
 ] 

Hudson commented on HBASE-6707:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #178 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/178/])
HBASE-6707 TEST 
org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient.testMultipleTables
 flaps; REVERT AGAIN (Revision 1386748)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/backup/example/LongTermArchivingHFileCleaner.java
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/master/cleaner/CleanerChore.java
* 
/hbase/trunk/hbase-server/src/test/java/org/apache/hadoop/hbase/backup/example/TestZooKeeperTableArchiveClient.java


 TEST 
 org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient.testMultipleTables
  flaps
 

 Key: HBASE-6707
 URL: https://issues.apache.org/jira/browse/HBASE-6707
 Project: HBase
  Issue Type: Bug
  Components: test
Reporter: Sameer Vaishampayan
Assignee: Jesse Yates
Priority: Critical
 Fix For: 0.96.0

 Attachments: hbase-6707-v0.patch, hbase-6707-v1.patch


 https://builds.apache.org/job/HBase-TRUNK/3293/
 Error Message
 Archived HFiles 
 (hdfs://localhost:59986/user/jenkins/hbase/.archive/otherTable/01ced3b55d7220a9c460273a4a57b198/fam)
  should have gotten deleted, but didn't, remaining 
 files:[hdfs://localhost:59986/user/jenkins/hbase/.archive/otherTable/01ced3b55d7220a9c460273a4a57b198/fam/fc872572a1f5443eb55b6e2567cfeb1c]
 Stacktrace
 java.lang.AssertionError: Archived HFiles 
 (hdfs://localhost:59986/user/jenkins/hbase/.archive/otherTable/01ced3b55d7220a9c460273a4a57b198/fam)
  should have gotten deleted, but didn't, remaining 
 files:[hdfs://localhost:59986/user/jenkins/hbase/.archive/otherTable/01ced3b55d7220a9c460273a4a57b198/fam/fc872572a1f5443eb55b6e2567cfeb1c]
   at org.junit.Assert.fail(Assert.java:93)
   at org.junit.Assert.assertTrue(Assert.java:43)
   at org.junit.Assert.assertNull(Assert.java:551)
   at 
 org.apache.hadoop.hbase.backup.example.TestZooKeeperTableArchiveClient.testMultipleTables(TestZooKeeperTableArchiveClient.java:291)
   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
   at 
 sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)


[jira] [Commented] (HBASE-6794) FilterBase should provide a default implementation of toByteArray


[ 
https://issues.apache.org/jira/browse/HBASE-6794?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457476#comment-13457476
 ] 

Hudson commented on HBASE-6794:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #178 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/178/])
HBASE-6794 FilterBase should provide a default implementation of 
toByteArray (Revision 1386842)

 Result = FAILURE
gchanan : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/filter/FilterBase.java


 FilterBase should provide a default implementation of toByteArray
 -

 Key: HBASE-6794
 URL: https://issues.apache.org/jira/browse/HBASE-6794
 Project: HBase
  Issue Type: Bug
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Priority: Minor
 Fix For: 0.96.0

 Attachments: HBASE-6794.patch


 See HBASE-6657, FilterBase provides stub implementations for other Filter 
 methods, it seems reasonable for it to provide a default implementation for 
 toByteArray, for Filters that don't need special serialization (e.g. ones 
 with no state).
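
A minimal sketch of the idea, for illustration only. The classes below (`SketchFilter`, `SketchFilterBase`, `StatelessFilter`, `PrefixSketchFilter`) are hypothetical stand-ins, not the real HBase types, and using an empty byte array as the default serialized form is an assumption rather than a statement of what the attached patch does:

```java
// Hypothetical stand-ins for Filter/FilterBase; not the real HBase classes.
abstract class SketchFilter {
    /** Serializes the filter's state so it can be shipped to the server. */
    abstract byte[] toByteArray();
}

abstract class SketchFilterBase extends SketchFilter {
    /** Default for filters with no state: nothing to serialize (assumed: empty array). */
    @Override
    byte[] toByteArray() {
        return new byte[0];
    }
}

/** A stateless filter inherits the default and needs no serialization code. */
class StatelessFilter extends SketchFilterBase {
}

/** A filter that carries state still overrides toByteArray. */
class PrefixSketchFilter extends SketchFilterBase {
    private final byte[] prefix;

    PrefixSketchFilter(byte[] prefix) {
        this.prefix = prefix.clone();
    }

    @Override
    byte[] toByteArray() {
        return prefix.clone();
    }
}
```

With this shape, only filters that actually hold state have to write serialization code; the stub in the base class covers the rest.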


[jira] [Commented] (HBASE-6634) REST API ScannerModel's protobuf converter code duplicates the setBatch call


[ 
https://issues.apache.org/jira/browse/HBASE-6634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457477#comment-13457477
 ] 

Hudson commented on HBASE-6634:
---

Integrated in HBase-TRUNK-on-Hadoop-2.0.0 #178 (See 
[https://builds.apache.org/job/HBase-TRUNK-on-Hadoop-2.0.0/178/])
HBASE-6634 REST API ScannerModel's protobuf converter code duplicates the 
setBatch call (Revision 1386816)

 Result = FAILURE
stack : 
Files : 
* 
/hbase/trunk/hbase-server/src/main/java/org/apache/hadoop/hbase/rest/model/ScannerModel.java


 REST API ScannerModel's protobuf converter code duplicates the setBatch call
 

 Key: HBASE-6634
 URL: https://issues.apache.org/jira/browse/HBASE-6634
 Project: HBase
  Issue Type: Bug
  Components: rest
Affects Versions: 0.94.0
Reporter: Harsh J
Assignee: Harsh J
Priority: Trivial
 Fix For: 0.96.0

 Attachments: HBASE-6634.patch


 There's a dupe call to setBatch when a scanner model object is created for 
 protobuf outputs.


[jira] [Updated] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file


 [ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Devaraj Das updated HBASE-6758:
---

Attachment: 6758-2-0.92.patch

Updated patch. Uses RegionServerServices instead of HRegionServer. Also, 
renames the variable fileNotInUse to fileInUse to make the code more readable.

 [replication] The replication-executor should make sure the file that it is 
 replicating is closed before declaring success on that file
 ---

 Key: HBASE-6758
 URL: https://issues.apache.org/jira/browse/HBASE-6758
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Attachments: 6758-1-0.92.patch, 6758-2-0.92.patch


 I have seen cases where the replication-executor would lose data to replicate 
 since the file hasn't been closed yet. Upon closing, the new data becomes 
 visible. Before that happens the ZK node shouldn't be deleted in 
 ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made 
 in ReplicationSource.processEndOfFile as well (currentPath related).
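
A rough sketch of the rule described above, under stated assumptions: `ReplicationQueueSketch` and its methods are hypothetical and only model the bookkeeping (the set stands in for the per-log ZK nodes); none of these names come from the HBase code base.

```java
import java.util.HashSet;
import java.util.Set;

// Hypothetical model; the set stands in for the ZK nodes that track logs still to be replicated.
class ReplicationQueueSketch {
    private final Set<String> trackedLogs = new HashSet<>();

    void track(String logName) {
        trackedLogs.add(logName);
    }

    boolean isTracked(String logName) {
        return trackedLogs.contains(logName);
    }

    /**
     * Called when the reader reaches the end of a log. The bookkeeping entry is
     * removed only if the file is closed; while it is still in use, more edits
     * may become visible on close, so the entry must stay.
     */
    boolean finishLog(String logName, boolean fileInUse) {
        if (fileInUse) {
            return false; // keep the entry and retry later
        }
        return trackedLogs.remove(logName);
    }
}
```

The point of the guard is ordering: success on a file is declared only after the writer has closed it.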


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file


[ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457502#comment-13457502
 ] 

Devaraj Das commented on HBASE-6758:


bq. Otherwise, I love the fact that you are figuring bugs and fixes in 
replication just using the test. Painful I'd imagine. Great work.

Thanks, Stack. Yes, I have burnt some midnight oil on these issues. Fun though.

 [replication] The replication-executor should make sure the file that it is 
 replicating is closed before declaring success on that file
 ---

 Key: HBASE-6758
 URL: https://issues.apache.org/jira/browse/HBASE-6758
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Attachments: 6758-1-0.92.patch, 6758-2-0.92.patch


 I have seen cases where the replication-executor would lose data to replicate 
 since the file hasn't been closed yet. Upon closing, the new data becomes 
 visible. Before that happens the ZK node shouldn't be deleted in 
 ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made 
 in ReplicationSource.processEndOfFile as well (currentPath related).


[jira] [Commented] (HBASE-6800) Build a Document Store on HBase for Better Query Processing

[
https://issues.apache.org/jira/browse/HBASE-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457508#comment-13457508
]

Jason Dai commented on HBASE-6800:
--

bq. coprocessor based applications should begin as independent code
contributions, perhaps hosted in a GitHub repository
bq. It would be helpful if only the changes on top of stock HBase code appear
here.
This could work, though I think we need to figure out how to address several
implications brought by the proposal, such as:
(1) How do the users figure out what co-processor applications are stable, so
that they can use in their production deployment?
(2) How do we ensure the co-processor applications continue to be compatible
with the changes in the HBase project, and compatible with each other?
(3) How do the users get the co-processor applications? They can no longer get
these from the Apache HBase release, and may need to perform manual
integrations - not something average business users will do, and the main
reason that we put the full HBase source tree out (several of our users and
customers want to get a prototype of DOT to try it out).

bq. We would be delighted to work with you on the necessary coprocessor
framework extensions. I'd recommend a separate JIRA specifically for this.
Yes, we do plan to submit the proposal for observers for the filter operations
as a separate JIRA (the original plan was to make it a sub task of this JIRA).

Build a Document Store on HBase for Better Query Processing
---

[jira] [Created] (HBASE-6804) [replication] lower the amount of logging to a more human-readable level

Jean-Daniel Cryans created HBASE-6804:
-

 Summary: [replication] lower the amount of logging to a more 
human-readable level
 Key: HBASE-6804
 URL: https://issues.apache.org/jira/browse/HBASE-6804
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
 Fix For: 0.96.0


We need to stop logging every time replication decides to do something. It used to 
be extremely useful when the code base was younger, but now it should be 
possible to bring it down while keeping it relevant.


[jira] [Updated] (HBASE-6804) [replication] lower the amount of logging to a more human-readable level


 [ 
https://issues.apache.org/jira/browse/HBASE-6804?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jean-Daniel Cryans updated HBASE-6804:
--

Attachment: HBASE-6804-0.94.patch

Attaching a first pass for 0.94 but I'm planning to just commit to 0.96.

I added a scheduled executor in {{Replication}} that prints out all the 
statistics for both the sources and the sink every 5 minutes by default. I had 
to add glue methods to get the stats all the way from {{ReplicationSource}} to 
{{Replication}} which is not super clean.

This is what the output looks like for the sink:

bq. 2012-09-17 17:43:08,418 INFO  [Replication Statistics #0] 
regionserver.Replication$ReplicationStatisticsThread(281): Sink: age in ms of 
last applied edit: 15644, total replicated edits: 7350

And for a source:

bq. 2012-09-17 17:42:19,887 INFO  [Replication Statistics #0] 
regionserver.Replication$ReplicationStatisticsThread(281): Normal source for 
cluster 2: Total replicated edits: 1473, currently replicating from: 
hdfs://localhost:53235/user/jdcryans/hbase/.logs/h-25-185.sfo.stumble.net,53250,1347928807372/h-25-185.sfo.stumble.net%2C53250%2C1347928807372.1347928936043
 at position: 3347

I think I need to do some more work on the normal EOFs that occur when 
tail'ing a log file.
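
For readers who want the general shape of a periodic statistics printer, here is a minimal sketch. It is not the code in the attached patch: `ReplicationStatsSketch`, its methods, and the exact line format are assumptions made for illustration, and the period is a parameter rather than a hard-coded 5 minutes.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical sketch of a scheduled statistics reporter; not the real Replication class.
class ReplicationStatsSketch {
    private final AtomicLong totalReplicatedEdits = new AtomicLong();

    // Single daemon thread so the reporter never keeps the JVM alive on its own.
    private final ScheduledExecutorService scheduler =
        Executors.newSingleThreadScheduledExecutor(r -> {
            Thread t = new Thread(r, "Replication Statistics");
            t.setDaemon(true);
            return t;
        });

    /** Called by the sink whenever a batch of edits has been applied. */
    void editsApplied(long count) {
        totalReplicatedEdits.addAndGet(count);
    }

    /** Builds one summary line (the format is an assumption). */
    String statsLine() {
        return "Sink: total replicated edits: " + totalReplicatedEdits.get();
    }

    /** Emits statsLine() to the given logger at a fixed period. */
    void start(long period, TimeUnit unit, Consumer<String> log) {
        scheduler.scheduleAtFixedRate(() -> log.accept(statsLine()), period, period, unit);
    }

    void stop() {
        scheduler.shutdownNow();
    }
}
```

Usage would be along the lines of `stats.start(5, TimeUnit.MINUTES, System.out::println)`, which replaces per-event log lines with one summary per period.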

 [replication] lower the amount of logging to a more human-readable level
 

 Key: HBASE-6804
 URL: https://issues.apache.org/jira/browse/HBASE-6804
 Project: HBase
  Issue Type: Improvement
Reporter: Jean-Daniel Cryans
 Fix For: 0.96.0

 Attachments: HBASE-6804-0.94.patch


 We need to stop logging every time replication decides to do something. It used 
 to be extremely useful when the code base was younger, but now it should be 
 possible to bring it down while keeping it relevant.


[jira] [Commented] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file

[
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457526#comment-13457526
]

Ted Yu commented on HBASE-6758:
---

@Devaraj:
I tried your patch v2 and I still got:
{code}
queueFailover(org.apache.hadoop.hbase.replication.TestReplication) Time
elapsed: 86.817 sec FAILURE!
java.lang.AssertionError: Waited too much time for queueFailover replication.
Waited 41973ms.
at org.junit.Assert.fail(Assert.java:93)
at
org.apache.hadoop.hbase.replication.TestReplication.queueFailover(TestReplication.java:666)
{code}
I will attach some test output momentarily.

[replication] The replication-executor should make sure the file that it is
replicating is closed before declaring success on that file
---

Key: HBASE-6758
URL: https://issues.apache.org/jira/browse/HBASE-6758
Project: HBase
Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
Attachments: 6758-1-0.92.patch, 6758-2-0.92.patch

[jira] [Updated] (HBASE-6758) [replication] The replication-executor should make sure the file that it is replicating is closed before declaring success on that file


 [ 
https://issues.apache.org/jira/browse/HBASE-6758?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ted Yu updated HBASE-6758:
--

Attachment: TEST-org.apache.hadoop.hbase.replication.TestReplication.xml

 [replication] The replication-executor should make sure the file that it is 
 replicating is closed before declaring success on that file
 ---

 Key: HBASE-6758
 URL: https://issues.apache.org/jira/browse/HBASE-6758
 Project: HBase
  Issue Type: Bug
Reporter: Devaraj Das
Assignee: Devaraj Das
 Attachments: 6758-1-0.92.patch, 6758-2-0.92.patch, 
 TEST-org.apache.hadoop.hbase.replication.TestReplication.xml


 I have seen cases where the replication-executor would lose data to replicate 
 since the file hasn't been closed yet. Upon closing, the new data becomes 
 visible. Before that happens the ZK node shouldn't be deleted in 
 ReplicationSourceManager.logPositionAndCleanOldLogs. Changes need to be made 
 in ReplicationSource.processEndOfFile as well (currentPath related).


[jira] [Commented] (HBASE-6800) Build a Document Store on HBase for Better Query Processing

[
https://issues.apache.org/jira/browse/HBASE-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457527#comment-13457527
]

Ted Yu commented on HBASE-6800:
---

@Jason:
You raised some interesting questions.

I think you may be aware of the modularization effort in trunk. Matt Corgan is
submitting his contribution as a separate module.
This model may be the answer to some of your questions.

Build a Document Store on HBase for Better Query Processing
---

[jira] [Commented] (HBASE-6317) Master clean start up and Partially enabled tables make region assignment inconsistent.

2012-09-17 Thread rajeshbabu (JIRA)


[ 
https://issues.apache.org/jira/browse/HBASE-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457543#comment-13457543
 ] 

rajeshbabu commented on HBASE-6317:
---

@Jimmy,
bq.  I prefer to isolate the change in EnableTableHandler and don't touch 
AssignmentManager.
Ok.
One scenario needs to be handled in AssignmentManager; anyway, it's done in HBASE-6381.
If all the tables are in ENABLING or ENABLING+DISABLED, we consider it 
a clean cluster startup and assign all ENABLING table regions again (which 
may cause inconsistency). This has been handled in HBASE-6381 by delegating ENABLING 
table regions assignment to EnableTableHandler.


bq. So we don't need to pass masterrestart around. We can update the region 
plan only in EnableTableHandler.
Select the regions that are not assigned and not in transition; we can update their 
plans in EnableTableHandler only. But in BulkEnabler we don't know whether it's 
round-robin assignment or normal assignment. Instead of passing masterrestart 
around, can we do something like the below in ETH to ignore round-robin assignment? (The master 
restart flag is only in EnableTableHandler.)
{code}
boolean roundRobinAssignment = false;
if (this.masterRestart) {
  // Remember the configured value, then disable round-robin for this bulk assign.
  roundRobinAssignment = this.server.getConfiguration().getBoolean(
      "hbase.master.enabletable.roundrobin", false);
  this.server.getConfiguration().setBoolean(
      "hbase.master.enabletable.roundrobin", false);
}
BulkEnabler bd = new BulkEnabler(this.server, regions,
    countOfRegionsInTable);
try {
  if (bd.bulkAssign()) {
    done = true;
  }
} catch (InterruptedException e) {
  LOG.warn("Enable operation was interrupted when enabling table '"
      + this.tableNameStr + "'");
  // Preserve the interrupt.
  Thread.currentThread().interrupt();
} finally {
  // Restore the original setting only if it was overridden above.
  if (this.masterRestart) {
    this.server.getConfiguration().setBoolean(
        "hbase.master.enabletable.roundrobin", roundRobinAssignment);
  }
}
{code}



 Master clean start up and Partially enabled tables make region assignment 
 inconsistent.
 ---

 Key: HBASE-6317
 URL: https://issues.apache.org/jira/browse/HBASE-6317
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: rajeshbabu
 Fix For: 0.96.0, 0.92.3, 0.94.3

 Attachments: HBASE-6317_94_3.patch, HBASE-6317_94.patch, 
 HBASE-6317_trunk_2.patch


 If we have a table in a partially enabled state (ENABLING), then on HMaster 
 restart we treat it as a clean cluster start up and do a bulk assign.  
 Currently in 0.94 bulk assign will not handle ALREADY_OPENED scenarios and it 
 leads to region assignment problems.  Analysing this further, we found that we 
 have a better way to handle these scenarios.
 {code}
 if (false == checkIfRegionBelongsToDisabled(regionInfo)
     && false == checkIfRegionsBelongsToEnabling(regionInfo)) {
   synchronized (this.regions) {
     regions.put(regionInfo, regionLocation);
     addToServers(regionLocation, regionInfo);
   }
 {code}
 We don't add to the regions map so that the enable table handler can handle it.  But 
 as nothing is added to the regions map, we treat it as a clean cluster start up.
 Will come up with a patch tomorrow.


[jira] [Commented] (HBASE-6800) Build a Document Store on HBase for Better Query Processing

2012-09-17 Thread Andrew Purtell (JIRA)

[
https://issues.apache.org/jira/browse/HBASE-6800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457564#comment-13457564
]

Andrew Purtell commented on HBASE-6800:
---

bq. (1) How do the users figure out what co-processor applications are stable,
so that they can use in their production deployment?

This is exactly the motivation for starting all coprocessor based
applications/contributions as external projects. We will have no registry of
approved or stable coprocessor applications. I'd imagine users would expect
all such apps in the HBase distribution proper to be in such a state. Beyond
that, I don't think the project can have the bandwidth to track a number of
ideas in development. We can't know in advance what support, interest, or
stability any given contribution would have, so starting as an external project
establishes this on its own merit. A popular and well cared for contribution
would eventually be a candidate for inclusion into the HBase source distribution
proper. This is my characterization of what has been discussed and the
consensus reached by the PMC. If others feel this is in error, or if we should do
something differently here, please speak up.

bq. (2) How do we ensure the co-processor applications continue to be
compatible with the changes in the HBase project, and compatible with each
other?

We don't. The onus is on the contributor. If at some point the consensus of the
project is to bring a particular contribution into the ASF HBase source
distribution, then at that point we must ensure these things... But only with
what is in the source distribution.

bq. (3) How do the users get the co-processor applications? They can no longer
get these from the Apache HBase release, and may need to perform manual
integrations - not something average business users will do, and the main
reason that we put the full HBase source tree out

HBase is a mavenized project and your DOT system is a coprocessor application.
There is no technical reason, barring issues with the CP framework itself, I
can see why you have to include and maintain a full fork of HBase. Simply
depend on HBase project artifacts and the complete DOT application can be
compiled as a jar to drop on the classpath of a HBase installation. Where the
CP framework may be insufficient, we can address that. Or, if there is some
other technical reason (like a patch to core HBase), please list those so we
can look at addressing it.

Like Ted says also, the modularization of HBase means we could accept a
mavenized project that depends on HBase core artifacts pretty easily.

Build a Document Store on HBase for Better Query Processing
---

[jira] [Commented] (HBASE-6317) Master clean start up and Partially enabled tables make region assignment inconsistent.


[ 
https://issues.apache.org/jira/browse/HBASE-6317?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457576#comment-13457576
 ] 

Jimmy Xiang commented on HBASE-6317:


Can we pass masterRestart to BulkEnabler, either function or constructor?
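
A hedged sketch of what the constructor variant could look like. `BulkEnablerSketch` and `useRoundRobin` are hypothetical names, not the real BulkEnabler API; the assumption is that round-robin assignment should be skipped whenever the enable is running as part of a master restart:

```java
// Hypothetical sketch; not the real BulkEnabler.
class BulkEnablerSketch {
    private final boolean roundRobinConfigured;
    private final boolean masterRestart;

    BulkEnablerSketch(boolean roundRobinConfigured, boolean masterRestart) {
        this.roundRobinConfigured = roundRobinConfigured;
        this.masterRestart = masterRestart;
    }

    /** Round-robin is used only when configured and not during a master restart. */
    boolean useRoundRobin() {
        return roundRobinConfigured && !masterRestart;
    }
}
```

Passing the flag in keeps the decision local to the enabler and avoids temporarily changing a shared configuration value.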

 Master clean start up and Partially enabled tables make region assignment 
 inconsistent.
 ---

 Key: HBASE-6317
 URL: https://issues.apache.org/jira/browse/HBASE-6317
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
Assignee: rajeshbabu
 Fix For: 0.96.0, 0.92.3, 0.94.3

 Attachments: HBASE-6317_94_3.patch, HBASE-6317_94.patch, 
 HBASE-6317_trunk_2.patch


 If we have a table in a partially enabled state (ENABLING), then on HMaster 
 restart we treat it as a clean cluster start up and do a bulk assign.  
 Currently in 0.94 bulk assign will not handle ALREADY_OPENED scenarios and it 
 leads to region assignment problems.  Analysing this further, we found that we 
 have a better way to handle these scenarios.
 {code}
 if (false == checkIfRegionBelongsToDisabled(regionInfo)
     && false == checkIfRegionsBelongsToEnabling(regionInfo)) {
   synchronized (this.regions) {
     regions.put(regionInfo, regionLocation);
     addToServers(regionLocation, regionInfo);
   }
 {code}
 We don't add to the regions map so that the enable table handler can handle it.  But 
 as nothing is added to the regions map, we treat it as a clean cluster start up.
 Will come up with a patch tomorrow.


[jira] [Commented] (HBASE-6591) checkAndPut executed/not metrics


[ 
https://issues.apache.org/jira/browse/HBASE-6591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457578#comment-13457578
 ] 

stack commented on HBASE-6591:
--

Ain't sure Gregory.  Try your test.  See if it passes.  Then add the -d32 flag 
so mvn runs 32bit (maybe put it in the MAVEN_OPTS environment variable) to 
ensure your change works in both 32bit and 64bit JVMs (I think jenkins machines 
run 32bit in some places; it's my only explanation for why a TestHeapSize works 
locally but fails when it's run up on jenkins).

 checkAndPut executed/not metrics
 

 Key: HBASE-6591
 URL: https://issues.apache.org/jira/browse/HBASE-6591
 Project: HBase
  Issue Type: Task
  Components: metrics, regionserver
Reporter: Gregory Chanan
Assignee: Gregory Chanan
Priority: Minor
 Fix For: 0.96.0

 Attachments: HBASE-6591.patch, HBASE-6591-v2.patch


 checkAndPut/checkAndDelete return true if the new put was executed, false 
 otherwise.
 So clients can figure out this metric for themselves, but it would be useful 
 to get a look at what is happening on the cluster as a whole, across all 
 clients.
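
A minimal sketch of server-side counters for this, assuming nothing about the real metrics classes: `CheckAndMutateMetricsSketch` and its method names are hypothetical and are not taken from the attached patches.

```java
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical counters for checkAndPut/checkAndDelete outcomes; not the HBase metrics API.
class CheckAndMutateMetricsSketch {
    private final AtomicLong executed = new AtomicLong();
    private final AtomicLong notExecuted = new AtomicLong();

    /** Records one outcome and returns it unchanged so it can wrap the existing return value. */
    boolean record(boolean wasExecuted) {
        (wasExecuted ? executed : notExecuted).incrementAndGet();
        return wasExecuted;
    }

    long executedCount() {
        return executed.get();
    }

    long notExecutedCount() {
        return notExecuted.get();
    }
}
```

Because `record` passes the boolean through, a region server could wrap the value it already returns to the client and aggregate across all clients without changing the client-visible behaviour.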


[jira] [Commented] (HBASE-6524) Hooks for hbase tracing


[ 
https://issues.apache.org/jira/browse/HBASE-6524?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13457579#comment-13457579
 ] 

stack commented on HBASE-6524:
--

On doc in general, I came across this recent quote: "Documentation is like sex: 
when it is good, it is very very good; and when it is bad, it is better than 
nothing."

 Hooks for hbase tracing
 ---

 Key: HBASE-6524
 URL: https://issues.apache.org/jira/browse/HBASE-6524
 Project: HBase
  Issue Type: Sub-task
Reporter: Jonathan Leavitt
Assignee: Jonathan Leavitt
 Fix For: 0.96.0

 Attachments: 6524.addendum, 6524-v2.txt, 6524v3.txt, 
 createTableTrace.png, hbase-6524.diff


 Includes the hooks that use [htrace|http://www.github.com/cloudera/htrace] 
 library to add dapper-like tracing to hbase.


[jira] [Commented] (HBASE-6798) HDFS always read checksum form meta file