[jira] [Updated] (HBASE-12413) Mismatch in the equals and hashcode methods of KeyValue
[ https://issues.apache.org/jira/browse/HBASE-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-12413: - Attachment: HBASE-12413.v4.patch Thanks for taking a look [~tedyu] and [~jingcheng...@intel.com]. Here's an updated patch that fixes the javadoc verb issue (compares replaced by uses), and also ignores MVCC when calculating the hash code (including a test to verify this). Mismatch in the equals and hashcode methods of KeyValue --- Key: HBASE-12413 URL: https://issues.apache.org/jira/browse/HBASE-12413 Project: HBase Issue Type: Bug Reporter: Jingcheng Du Assignee: Jingcheng Du Priority: Minor Attachments: HBASE-12413-V2.diff, HBASE-12413.diff, HBASE-12413.v3.patch, HBASE-12413.v4.patch In the equals method of KeyValue only the row key is compared, while in the hashCode method all backing bytes are used. This breaks the Java equals/hashCode contract. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (HBASE-12413) Mismatch in the equals and hashcode methods of KeyValue
[ https://issues.apache.org/jira/browse/HBASE-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381806#comment-14381806 ] Gabriel Reid commented on HBASE-12413: -- [~tedyu] if this change is incompatible (i.e. if it breaks anything), I would assume that it is just making it more clear that that thing was broken anyhow. The case I'm imagining is that KeyValues are being put into a HashSet or used as keys in a HashMap -- in this case, the current (unpatched) behavior would be non-deterministic, but may appear to work in many cases (i.e. the fact that the client code is broken might not be immediately visible). I think that committing this can only be a good thing -- I think that the worst case is that it would make it more clear that already-broken code is actually broken. Mismatch in the equals and hashcode methods of KeyValue --- Key: HBASE-12413 URL: https://issues.apache.org/jira/browse/HBASE-12413 Project: HBase Issue Type: Bug Reporter: Jingcheng Du Assignee: Jingcheng Du Priority: Minor Attachments: HBASE-12413-V2.diff, HBASE-12413.diff In the equals method of KeyValue only the row key is compared, while in the hashCode method all backing bytes are used. This breaks the Java equals/hashCode contract. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
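The failure mode described in this issue can be sketched with a minimal, self-contained class (this is an illustration of the contract violation, not the actual HBase KeyValue code): equals() compares only the row key while hashCode() covers all backing bytes, so two equal objects can land in different HashSet buckets.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

// Illustrative stand-in for the pre-patch KeyValue behavior described above.
class BrokenKeyValue {
    final byte[] row;
    final byte[] value; // stands in for the rest of the backing bytes

    BrokenKeyValue(byte[] row, byte[] value) {
        this.row = row;
        this.value = value;
    }

    @Override
    public boolean equals(Object o) {
        // equals() looks only at the row key...
        return o instanceof BrokenKeyValue
                && Arrays.equals(row, ((BrokenKeyValue) o).row);
    }

    @Override
    public int hashCode() {
        // ...but hashCode() covers all bytes, so objects that are equal()
        // can hash differently, violating the Object.hashCode() contract.
        return 31 * Arrays.hashCode(row) + Arrays.hashCode(value);
    }

    public static void main(String[] args) {
        BrokenKeyValue a = new BrokenKeyValue(new byte[]{1}, new byte[]{2});
        BrokenKeyValue b = new BrokenKeyValue(new byte[]{1}, new byte[]{3});
        System.out.println(a.equals(b));                  // true
        System.out.println(a.hashCode() == b.hashCode()); // false
        Set<BrokenKeyValue> set = new HashSet<>();
        set.add(a);
        // Lookup goes to b's hash bucket, so the "equal" element is
        // usually not found: exactly the non-deterministic HashSet/HashMap
        // behavior the comment warns about.
        System.out.println(set.contains(b));
    }
}
```

This is why the breakage could "appear to work in many cases": whether the lookup accidentally succeeds depends on how the two hash codes map to buckets.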
[jira] [Updated] (HBASE-12413) Mismatch in the equals and hashcode methods of KeyValue
[ https://issues.apache.org/jira/browse/HBASE-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-12413: - Attachment: HBASE-12413.v3.patch Here's a slightly altered patch that removes the need to change the visibility of CellComparator.calculateHashForKeyValue, and adds some basic testing of the equals and hashCode methods on KeyValue. I'll try to get a new QA run on this and hopefully it'll be clean. Mismatch in the equals and hashcode methods of KeyValue --- Key: HBASE-12413 URL: https://issues.apache.org/jira/browse/HBASE-12413 Project: HBase Issue Type: Bug Reporter: Jingcheng Du Assignee: Jingcheng Du Priority: Minor Attachments: HBASE-12413-V2.diff, HBASE-12413.diff, HBASE-12413.v3.patch In the equals method of KeyValue only the row key is compared, while in the hashCode method all backing bytes are used. This breaks the Java equals/hashCode contract. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12413) Mismatch in the equals and hashcode methods of KeyValue
[ https://issues.apache.org/jira/browse/HBASE-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-12413: - Assignee: (was: Jingcheng Du) Status: Open (was: Patch Available) Mismatch in the equals and hashcode methods of KeyValue --- Key: HBASE-12413 URL: https://issues.apache.org/jira/browse/HBASE-12413 Project: HBase Issue Type: Bug Reporter: Jingcheng Du Priority: Minor Attachments: HBASE-12413-V2.diff, HBASE-12413.diff, HBASE-12413.v3.patch In the equals method of KeyValue only the row key is compared, while in the hashCode method all backing bytes are used. This breaks the Java equals/hashCode contract. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Updated] (HBASE-12413) Mismatch in the equals and hashcode methods of KeyValue
[ https://issues.apache.org/jira/browse/HBASE-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-12413: - Assignee: Jingcheng Du Status: Patch Available (was: Open) Mismatch in the equals and hashcode methods of KeyValue --- Key: HBASE-12413 URL: https://issues.apache.org/jira/browse/HBASE-12413 Project: HBase Issue Type: Bug Reporter: Jingcheng Du Assignee: Jingcheng Du Priority: Minor Attachments: HBASE-12413-V2.diff, HBASE-12413.diff, HBASE-12413.v3.patch In the equals method of KeyValue only the row key is compared, while in the hashCode method all backing bytes are used. This breaks the Java equals/hashCode contract. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (HBASE-6749) Compact one expired HFile all the time
[ https://issues.apache.org/jira/browse/HBASE-6749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid resolved HBASE-6749. - Resolution: Duplicate This is a duplicate of HBASE-10371 Compact one expired HFile all the time -- Key: HBASE-6749 URL: https://issues.apache.org/jira/browse/HBASE-6749 Project: HBase Issue Type: Bug Components: regionserver Affects Versions: 0.94.1 Reporter: Jieshan Bean Assignee: Jieshan Bean It's an interesting issue. We found one HFile that kept changing its name all the time. After digging in more, we found one strange behavior in the compaction flow. Here's the problem (we set the TTL property on our table): there were 10 HFiles and only one expired HFile when this problem occurred:
{noformat}
2012-09-07 02:21:05,298 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/221f56905cbd4bf09bd4d5d9dceb113a.3ed2e43476ca2c614e33d0e1255c79a9, isReference=true, isBulkLoadResult=false, seqid=118730, majorCompaction=false
2012-09-07 02:21:05,309 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/297b45a6c5f541dca05105ab098dab8d, isReference=false, isBulkLoadResult=false, seqid=122018, majorCompaction=false
2012-09-07 02:21:05,326 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/4a4a4598bc0443c9be087812052d6796.3ed2e43476ca2c614e33d0e1255c79a9, isReference=true, isBulkLoadResult=false, seqid=119850, majorCompaction=false
2012-09-07 02:21:05,348 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/8c6d56c9bafb4b0eb0dd6e04e41ca5b7, isReference=false, isBulkLoadResult=false, seqid=123135, majorCompaction=false
2012-09-07 02:21:05,357 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/a6c2873a646c425f87a6a8a271a9904e, isReference=false, isBulkLoadResult=false, seqid=76561, majorCompaction=false
2012-09-07 02:21:05,370 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/a712a2f79bd247f48405d2c6a91757ab.3ed2e43476ca2c614e33d0e1255c79a9, isReference=true, isBulkLoadResult=false, seqid=120951, majorCompaction=false
2012-09-07 02:21:05,381 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/a717c63214534cb0aaf8e695147fde46, isReference=false, isBulkLoadResult=false, seqid=122763, majorCompaction=false
2012-09-07 02:21:05,431 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/c456f3831b094ac3a0590678acbf27a5.3ed2e43476ca2c614e33d0e1255c79a9, isReference=true, isBulkLoadResult=false, seqid=120579, majorCompaction=false
2012-09-07 02:21:05,518 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/dde4e56b131a4ffdaec8f9574bffa5ab.3ed2e43476ca2c614e33d0e1255c79a9, isReference=true, isBulkLoadResult=false, seqid=121651, majorCompaction=false
2012-09-07 02:21:05,593 DEBUG org.apache.hadoop.hbase.regionserver.Store: loaded hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/ee62844594f2474e88186dbde673c802.3ed2e43476ca2c614e33d0e1255c79a9, isReference=true, isBulkLoadResult=false, seqid=119478, majorCompaction=false
{noformat}
Compaction was triggered during region opening. As we know, compaction should choose the expired HFiles to compact first, so the one expired HFile was chosen. Since no KeyValues were present, compaction deleted the old HFile and created a new one with minimumTimestamp = -1 and maximumTimestamp = -1. So after the first compaction, there were still 10 HFiles. It triggered compaction again and again.
{noformat}
2012-09-07 02:21:06,079 INFO org.apache.hadoop.hbase.regionserver.compactions.CompactSelection: Deleting the expired store file by compaction: hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/72468dff1cd94c4fb9cf9196cc3183b7 whose maxTimeStamp is -1 while the max expired timestamp is 1344824466079
2012-09-07 02:21:06,080 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compacting hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/72468dff1cd94c4fb9cf9196cc3183b7, keycount=0, bloomtype=NONE, size=558, encoding=NONE 2012-09-07 02:21:06,082 DEBUG org.apache.hadoop.hbase.util.FSUtils: Creating
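The loop described above comes down to one predicate. A hedged sketch (illustrative names, not the actual HBase compaction-selection code): a freshly written empty HFile carries maxTimestamp = -1, and -1 is always below the expiry cutoff, so the compaction output immediately qualifies as "expired" again.

```java
// Illustrative sketch of the expiry check that causes the endless loop.
class ExpiredSelectionSketch {
    // A file is considered expired when everything in it is older than the cutoff.
    static boolean isExpired(long maxTimestamp, long expiryCutoff) {
        return maxTimestamp < expiryCutoff;
    }

    public static void main(String[] args) {
        long cutoff = 1344824466079L; // the "max expired timestamp" from the log above
        long emptyFileMaxTs = -1L;    // maxTimestamp of the freshly compacted empty HFile
        // -1 is always < cutoff, so the compaction output is selected for
        // expiry-based compaction again, producing another empty file, and so on.
        System.out.println(isExpired(emptyFileMaxTs, cutoff)); // true
    }
}
```

A fix along the lines of the duplicate (HBASE-10371) would have to treat an empty file, or a sentinel timestamp of -1, as not expirable.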
[jira] [Commented] (HBASE-10504) Define Replication Interface
[ https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998525#comment-13998525 ] Gabriel Reid commented on HBASE-10504: -- {quote}Even in that case, the server-client implementation will be much simpler than what you have in SEP I think, which does hbase RPC + zookeeper + zk mockup. Only a server discovery + simple one method thrift / PB RPC is what is needed from my understanding. It would be great to have a skeleton for this living in HBase proper as well.{quote} The SEP does indeed make use of HBase RPC and ZK, and does masquerade in ZK as a region server, but I don't see much of an advantage in doing a separate thrift- or PB-based RPC. I think that this would just be exchanging one existing RPC mechanism (hbase) for another (thrift/PB). The service discovery would still be required, which would also involve ZK. Not having to do the setup of masquerading as a regionserver would indeed be a plus. Taking this road for the SEP would then require:
* defining/setting up RPC (which is currently already provided by hbase)
* implementing a ReplicationConsumer (and deploying and configuring it in the hbase RS) that does service discovery, RPC endpoint selection, and properly deals with failing/flaky RPC endpoints (also already provided in hbase replication)
* reworking the SEP RPC server to use the new RPC mechanism
This seems to me like work that is difficult to justify for the SEP, because the SEP makes use of existing HBase RPC and replication failover handling, and currently works. I may very well be missing something, but I don't see how reimplementing this stuff in the SEP would simplify things very much. The one big potential advantage that I do see is that it would protect the SEP from changes in the internal HBase replication interfaces. Obviously these comments are purely from the standpoint of SEP maintenance.
As I mentioned in a previous comment, I think that this proposal would be a great addition to HBase and very useful -- I just don't see its usefulness to the SEP yet. Define Replication Interface Key: HBASE-10504 URL: https://issues.apache.org/jira/browse/HBASE-10504 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Priority: Blocker Fix For: 0.99.0 Attachments: hbase-10504_wip1.patch HBase has replication. Fellas have been hijacking the replication apis to do all kinds of perverse stuff like indexing hbase content (hbase-indexer https://github.com/NGDATA/hbase-indexer) and our [~toffer] just showed up w/ overrides that replicate via an alternate channel (over a secure thrift channel between dcs over on HBASE-9360). This issue is about surfacing these APIs as public with guarantees to downstreamers similar to those we have on our public client-facing APIs (and so we don't break them for downstreamers). Any input [~phunt] or [~gabriel.reid] or [~toffer]? Thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10504) Define Replication Interface
[ https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13999804#comment-13999804 ] Gabriel Reid commented on HBASE-10504: -- {quote}I think I don't fully understand why would you want to have a proxy layer as a separate service (other than isolation issues).{quote} Process isolation and deployment isolation are indeed the two main issues. The process isolation side of things probably speaks for itself. Deployment isolation is another issue however. Keeping dependencies of hbase-indexer in line with all deps in hbase is an issue (as all hbase-indexer jars would need to be deployed into the HBase lib for this approach to work), and all configuration, config changes, and restarting would also become directly linked to HBase. Define Replication Interface Key: HBASE-10504 URL: https://issues.apache.org/jira/browse/HBASE-10504 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Priority: Blocker Fix For: 0.99.0 Attachments: hbase-10504_wip1.patch HBase has replication. Fellas have been hijacking the replication apis to do all kinds of perverse stuff like indexing hbase content (hbase-indexer https://github.com/NGDATA/hbase-indexer) and our [~toffer] just showed up w/ overrides that replicate via an alternate channel (over a secure thrift channel between dcs over on HBASE-9360). This issue is about surfacing these APIs as public with guarantees to downstreamers similar to those we have on our public client-facing APIs (and so we don't break them for downstreamers). Any input [~phunt] or [~gabriel.reid] or [~toffer]? Thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10504) Define Replication Interface
[ https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997936#comment-13997936 ] Gabriel Reid commented on HBASE-10504: -- [~enis] [~jdcryans] I think that this would make a great addition to HBase in general -- if I'm reading it correctly, it basically works out as a kind of RegionObserver.postPut that runs in process, but off of the write path. However, I'm less sure of how well this general approach would work for hbase-indexer. Although this approach would simplify a lot of things (e.g. removing the need to act as a slave cluster), it also introduces a number of important changes in the execution model of something that runs via the SepConsumer (i.e. reacting to HBase updates):
* the number of replication endpoints becomes directly linked to the number of regionservers -- as far as I see, it would no longer be possible to have more or fewer replication endpoints than regionservers
* the replication (ReplicationConsumer) runs within the same JVM as the regionserver, which brings some important changes: all of the dependencies then need to be present in the classpath of the regionserver (and compatible with the dependencies of the regionserver), it has an effect on the heap, and it is no longer possible to tune the JVM settings for the replication process on its own
* from what I see of the current patch (although I'm sure this could be changed), it looks like there's only a single ReplicationConsumer, meaning that it wouldn't be possible to have real replication running next to a custom ReplicationSource
The tighter-coupling issues that I outlined above could be worked around by having a lightweight ReplicationConsumer that just passes data to an external process via RPC, but that's basically exactly what the replication code in HBase is doing right now, so it would be a shame to re-implement something like that.
Define Replication Interface Key: HBASE-10504 URL: https://issues.apache.org/jira/browse/HBASE-10504 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Priority: Blocker Fix For: 0.99.0 Attachments: hbase-10504_wip1.patch HBase has replication. Fellas have been hijacking the replication apis to do all kinds of perverse stuff like indexing hbase content (hbase-indexer https://github.com/NGDATA/hbase-indexer) and our [~toffer] just showed up w/ overrides that replicate via an alternate channel (over a secure thrift channel between dcs over on HBASE-9360). This issue is about surfacing these APIs as public with guarantees to downstreamers similar to those we have on our public client-facing APIs (and so we don't break them for downstreamers). Any input [~phunt] or [~gabriel.reid] or [~toffer]? Thanks. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Created] (HBASE-11110) Ability to load FilterList class is dependent on context classloader
Gabriel Reid created HBASE-11110: Summary: Ability to load FilterList class is dependent on context classloader Key: HBASE-11110 URL: https://issues.apache.org/jira/browse/HBASE-11110 Project: HBase Issue Type: Bug Affects Versions: 0.94.19 Reporter: Gabriel Reid In the 0.94 branch, the FilterList class contains a static call to HBaseConfiguration.create(). This create call in turn adds the needed hbase resources to the Configuration object, and sets the classloader of the Configuration object to be the context classloader of the current thread (if it isn't null). This approach causes issues if the FilterList class is loaded from a thread that has a custom context classloader that doesn't run back up to the main application classloader. In this case, HBaseConfiguration.checkDefaultsVersion fails because the hbase.defaults.for.version configuration value can't be found (because hbase-default.xml can't be found by the custom context classloader). This is a concrete issue that was discovered via Apache Phoenix within a commercial tool, when a (JDBC) connection is opened via a pool, and then passed off to a UI thread that has a custom context classloader. The UI thread is then the first thing to load FilterList, leading to this issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Updated] (HBASE-11110) Ability to load FilterList class is dependent on context classloader
[ https://issues.apache.org/jira/browse/HBASE-11110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-11110: - Attachment: HBASE-11110.patch Patch to set the current thread's context classloader to the classloader used to load the Filter class while setting the static config member of FilterList. Ability to load FilterList class is dependent on context classloader Key: HBASE-11110 URL: https://issues.apache.org/jira/browse/HBASE-11110 Project: HBase Issue Type: Bug Affects Versions: 0.94.19 Reporter: Gabriel Reid Attachments: HBASE-11110.patch In the 0.94 branch, the FilterList class contains a static call to HBaseConfiguration.create(). This create call in turn adds the needed hbase resources to the Configuration object, and sets the classloader of the Configuration object to be the context classloader of the current thread (if it isn't null). This approach causes issues if the FilterList class is loaded from a thread that has a custom context classloader that doesn't run back up to the main application classloader. In this case, HBaseConfiguration.checkDefaultsVersion fails because the hbase.defaults.for.version configuration value can't be found (because hbase-default.xml can't be found by the custom context classloader). This is a concrete issue that was discovered via Apache Phoenix within a commercial tool, when a (JDBC) connection is opened via a pool, and then passed off to a UI thread that has a custom context classloader. The UI thread is then the first thing to load FilterList, leading to this issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
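The pattern the patch description outlines can be sketched as follows. This is a hedged, self-contained illustration of the general technique (the class and method names are hypothetical, not the actual FilterList code): temporarily point the thread's context classloader at the classloader that loaded the class itself, do the classloader-sensitive work, then restore the original.

```java
// Hypothetical sketch of the "swap the context classloader" fix pattern.
class ClassLoaderFix {
    // Runs 'work' with this class's own classloader as the thread context
    // classloader, so resource lookups (e.g. hbase-default.xml in the real
    // scenario) resolve against a loader that can actually see them.
    static ClassLoader runWithOwnClassLoader(Runnable work) {
        Thread t = Thread.currentThread();
        ClassLoader original = t.getContextClassLoader();
        try {
            t.setContextClassLoader(ClassLoaderFix.class.getClassLoader());
            work.run(); // e.g. the static HBaseConfiguration.create() call
            return t.getContextClassLoader();
        } finally {
            // Always restore the caller's context classloader.
            t.setContextClassLoader(original);
        }
    }

    public static void main(String[] args) {
        ClassLoader used = runWithOwnClassLoader(() -> { /* classloader-sensitive work */ });
        System.out.println(used == ClassLoaderFix.class.getClassLoader()); // true
    }
}
```

The try/finally restore matters: without it, the swap would leak into whatever the UI thread does next.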
[jira] [Commented] (HBASE-11110) Ability to load FilterList class is dependent on context classloader
[ https://issues.apache.org/jira/browse/HBASE-0?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13988143#comment-13988143 ] Gabriel Reid commented on HBASE-0: -- {quote}This exact issue won't be a problem in 0.96+ since the relevant code has changed, but I wonder if it's possible to try Phoenix 4 + HBase 0.98 with that tool?{quote} Yes, will do. I'm working on going through some general tool compat tests, starting with Phoenix 3/HBase 0.94 but will also go on to do the same with Phoenix 4/HBase 0.98. Ability to load FilterList class is dependent on context classloader Key: HBASE-0 URL: https://issues.apache.org/jira/browse/HBASE-0 Project: HBase Issue Type: Bug Affects Versions: 0.94.19 Reporter: Gabriel Reid Assignee: Gabriel Reid Fix For: 0.94.20 Attachments: HBASE-0.patch In the 0.94 branch, the FilterList class contains a static call to HBaseConfiguration.create(). This create call in turn adds the needed hbase resources to the Configuration object, and sets the classloader of the Configuration object to be the context classloader of the current thread (if it isn't null). This approach causes issues if the FilterList class is loaded from a thread that has a custom context classloader that doesn't run back up to the main application classloader. In this case, HBaseConfiguration.checkDefaultsVersion fails because the hbase.defaults.for.version configuration value can't be found (because hbase-default.xml can't be found by the custom context classloader). This is a concrete issue that was discovered via Apache Phoenix within a commercial tool, when a (JDBC) connection is opened via a pool, and then passed off to a UI thread that has a custom context classloader. The UI thread is then the first thing to load FilterList, leading to this issue. -- This message was sent by Atlassian JIRA (v6.2#6252)
[jira] [Commented] (HBASE-10504) Define Replication Interface
[ https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902005#comment-13902005 ] Gabriel Reid commented on HBASE-10504: -- Making the NRT listening of changes on HBase into a first-class API as [~toffer] suggested sounds like an interesting idea, if there's interest in going that far in actively supporting this use case. If there is interest in doing that, the current hbase-sep library (which is the current library used for latching in via replication and responding to changes for the hbase-indexer project) could be a possible starting point. The current API interfaces[1] as well as the impl are within sub-modules[2] of the hbase-indexer project on GitHub. [1] https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep/hbase-sep-api/src/main/java/com/ngdata/sep [2] https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep Define Replication Interface Key: HBASE-10504 URL: https://issues.apache.org/jira/browse/HBASE-10504 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Fix For: 0.99.0 HBase has replication. Fellas have been hijacking the replication apis to do all kinds of perverse stuff like indexing hbase content (hbase-indexer https://github.com/NGDATA/hbase-indexer) and our [~toffer] just showed up w/ overrides that replicate via an alternate channel (over a secure thrift channel between dcs over on HBASE-9360). This issue is about surfacing these APIs as public with guarantees to downstreamers similar to those we have on our public client-facing APIs (and so we don't break them for downstreamers). Any input [~phunt] or [~gabriel.reid] or [~toffer]? Thanks. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
[jira] [Commented] (HBASE-10504) Define Replication Interface
[ https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898904#comment-13898904 ] Gabriel Reid commented on HBASE-10504: -- I would echo [~phunt]'s points -- having these interfaces public and stable would be great. I think that from the point of view of the hbase-indexer project, the main contact points that are involved are o.a.h.h.replication.regionserver.ReplicationSource and o.a.h.h.Server, as well as the generated AdminProtos, although I assume that the scope of this ticket will be a fair bit wider than that. If we're going to move to having a published stable API, now is also probably the time to make any changes that may be needed. The one thing I can think of right now that would be nice is a better hook into o.a.h.h.replication.regionserver.ReplicationSource#removeNonReplicableEdits. Right now we override that method to do additional filtering on HLog entries that we know we don't want to send to the indexer, but it would be good to make that into a public method with some kind of filter interface. Define Replication Interface Key: HBASE-10504 URL: https://issues.apache.org/jira/browse/HBASE-10504 Project: HBase Issue Type: Task Reporter: stack Assignee: stack Fix For: 0.99.0 HBase has replication. Fellas have been hijacking the replication apis to do all kinds of perverse stuff like indexing hbase content (hbase-indexer https://github.com/NGDATA/hbase-indexer) and our [~toffer] just showed up w/ overrides that replicate via an alternate channel (over a secure thrift channel between dcs over on HBASE-9360). This issue is about surfacing these APIs as public with guarantees to downstreamers similar to those we have on our public client-facing APIs (and so we don't break them for downstreamers). Any input [~phunt] or [~gabriel.reid] or [~toffer]? Thanks. -- This message was sent by Atlassian JIRA (v6.1.5#6160)
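The "filter interface" idea in the comment above can be sketched generically. All names here are hypothetical illustrations of the proposal, not actual HBase APIs: a small functional interface that a downstream consumer plugs in, replacing the need to override removeNonReplicableEdits.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical filter hook, in the spirit of the comment's proposal.
interface EditFilter<E> {
    boolean accept(E entry);
}

class EditFilteringSketch {
    // Keeps only the entries the pluggable filter accepts, instead of
    // requiring subclasses to override an internal filtering method.
    static <E> List<E> filterEdits(List<E> entries, EditFilter<E> filter) {
        List<E> kept = new ArrayList<>();
        for (E e : entries) {
            if (filter.accept(e)) {
                kept.add(e);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        // Stand-ins for log entries; a real hook would operate on WAL/HLog entries.
        List<String> edits = List.of("user-table-edit", "index-table-edit", "meta-edit");
        // Example policy: only forward edits for user tables to the indexer.
        List<String> forwarded = filterEdits(edits, e -> e.startsWith("user"));
        System.out.println(forwarded); // [user-table-edit]
    }
}
```

The benefit over subclass overrides is that the filtering policy becomes part of a published contract rather than an internal method signature that can change between releases.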
[jira] [Commented] (HBASE-9763) Scan javadoc doesn't fully capture semantics of start and stop row
[ https://issues.apache.org/jira/browse/HBASE-9763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797015#comment-13797015 ] Gabriel Reid commented on HBASE-9763: - Thanks for taking a look. The difference between your test and my example is that my example is based on a row key prefix, and your test is based on full matching. Like the discussion and previous bug in the docs in HBASE-9035, there's a lot of potential for confusion when appending null bytes to get different semantics. I think this is a potentially confusing topic for users no matter what. Another option instead of removing the null-byte appending advice from the javadoc might be to qualify it with more information around the semantics of what the start row and end row of a scan really are. Or maybe I'm just worrying too much about it and it's not really that big of an issue. :-) Scan javadoc doesn't fully capture semantics of start and stop row -- Key: HBASE-9763 URL: https://issues.apache.org/jira/browse/HBASE-9763 Project: HBase Issue Type: Bug Components: documentation Reporter: Gabriel Reid Priority: Minor Attachments: HBASE-9763.patch The current javadoc for the Scan#setStartRow and Scan#setStopRow methods doesn't accurately capture the semantics of using row prefix values. Both methods describe the use of a trailing null byte to change the respective inclusive/exclusive semantics of setStartRow and setStopRow. The use of a trailing null byte for start row exclusion only works in the case that exact full matching is done on row keys. The use of a trailing null byte for stop row inclusion has even more limitations (see HBASE-9035). The basic example is having the following rows:
{code}
AAB
ABB
BBC
BCC
{code}
Setting the start row to A and the stop row to B will include AAB and ABB.
Setting the start row to A\x0 and the stop row to B\x0 will result in the same two rows coming out of the scan, instead of having an effect on the inclusion/exclusion semantics. -- This message was sent by Atlassian JIRA (v6.1#6144)
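The example can be verified with plain unsigned lexicographic byte comparison, which is how HBase orders row keys. This is a self-contained sketch (not the HBase client) of a half-open [start, stop) scan over the four rows above, showing that ("A", "B") and ("A\x0", "B\x0") select exactly the same rows.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative model of scan range selection over row keys.
class ScanRangeSketch {
    // Unsigned lexicographic comparison, matching HBase row key ordering.
    static int compare(byte[] a, byte[] b) {
        int n = Math.min(a.length, b.length);
        for (int i = 0; i < n; i++) {
            int d = (a[i] & 0xFF) - (b[i] & 0xFF);
            if (d != 0) return d;
        }
        return a.length - b.length;
    }

    // A scan returns rows r with start <= r < stop.
    static List<String> scan(List<String> rows, byte[] start, byte[] stop) {
        List<String> out = new ArrayList<>();
        for (String r : rows) {
            byte[] k = r.getBytes();
            if (compare(k, start) >= 0 && compare(k, stop) < 0) {
                out.add(r);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> rows = List.of("AAB", "ABB", "BBC", "BCC");
        // Plain prefixes: includes AAB and ABB.
        System.out.println(scan(rows, new byte[]{'A'}, new byte[]{'B'}));       // [AAB, ABB]
        // Trailing null bytes appended: the result is identical, so the
        // null byte has no inclusion/exclusion effect on prefix scans.
        System.out.println(scan(rows, new byte[]{'A', 0}, new byte[]{'B', 0})); // [AAB, ABB]
    }
}
```

The trailing null byte only changes the result when a boundary row exactly equals the start or stop key, which is the full-matching case the comment distinguishes from prefix matching.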
[jira] [Created] (HBASE-9763) Scan javadoc doesn't fully capture semantics of start and stop row
Gabriel Reid created HBASE-9763: --- Summary: Scan javadoc doesn't fully capture semantics of start and stop row Key: HBASE-9763 URL: https://issues.apache.org/jira/browse/HBASE-9763 Project: HBase Issue Type: Bug Components: documentation Reporter: Gabriel Reid Priority: Minor The current javadoc for the Scan#setStartRow and Scan#setStopRow methods doesn't accurately capture the semantics of using row prefix values. Both methods describe the use of a trailing null byte to change the respective inclusive/exclusive semantics of setStartRow and setStopRow. The use of a trailing null byte for start row exclusion only works in the case that exact full matching is done on row keys. The use of a trailing null byte for stop row inclusion has even more limitations (see HBASE-9035). The basic example is having the following rows:
{code}
AAB
ABB
BBC
BCC
{code}
Setting the start row to A and the stop row to B will include AAB and ABB. Setting the start row to A\x0 and the stop row to B\x0 will result in the same two rows coming out of the scan, instead of having an effect on the inclusion/exclusion semantics. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9763) Scan javadoc doesn't fully capture semantics of start and stop row
[ https://issues.apache.org/jira/browse/HBASE-9763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-9763: Attachment: HBASE-9763.patch Trivial patch to remove the trailing null byte information, and be a bit more clear about the use of scan start and stop as matching the row prefix. Scan javadoc doesn't fully capture semantics of start and stop row -- Key: HBASE-9763 URL: https://issues.apache.org/jira/browse/HBASE-9763 Project: HBase Issue Type: Bug Components: documentation Reporter: Gabriel Reid Priority: Minor Attachments: HBASE-9763.patch The current javadoc for the Scan#setStartRow and Scan#setStopRow methods doesn't accurately capture the semantics of using row prefix values. Both methods describe the use of a trailing null byte to change the respective inclusive/exclusive semantics of setStartRow and setStopRow. The use of a trailing null byte for start row exclusion only works in the case that exact full matching is done on row keys. The use of a trailing null byte for stop row inclusion has even more limitations (see HBASE-9035). The basic example is having the following rows:
{code}
AAB
ABB
BBC
BCC
{code}
Setting the start row to A and the stop row to B will include AAB and ABB. Setting the start row to A\x0 and the stop row to B\x0 will result in the same two rows coming out of the scan, instead of having an effect on the inclusion/exclusion semantics. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Updated] (HBASE-9763) Scan javadoc doesn't fully capture semantics of start and stop row
[ https://issues.apache.org/jira/browse/HBASE-9763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-9763: Status: Patch Available (was: Open) Scan javadoc doesn't fully capture semantics of start and stop row -- Key: HBASE-9763 URL: https://issues.apache.org/jira/browse/HBASE-9763 Project: HBase Issue Type: Bug Components: documentation Reporter: Gabriel Reid Priority: Minor Attachments: HBASE-9763.patch The current javadoc for the Scan#setStartRow and Scan#setStopRow methods doesn't accurately capture the semantics of using row prefix values. Both methods describe the use of a trailing null byte to change the respective inclusive/exclusive semantics of setStartRow and setStopRow. The use of a trailing null byte for start row exclusion only works in the case that exact full matching is done on row keys. The use of a trailing null byte for stop row inclusion has even more limitations (see HBASE-9035). The basic example is having the following rows:
{code}
AAB
ABB
BBC
BCC
{code}
Setting the start row to A and the stop row to B will include AAB and ABB. Setting the start row to A\x0 and the stop row to B\x0 will result in the same two rows coming out of the scan, instead of having an effect on the inclusion/exclusion semantics. -- This message was sent by Atlassian JIRA (v6.1#6144)
[jira] [Commented] (HBASE-9591) [replication] getting Current list of sinks is out of date all the time when a source is recovered
[ https://issues.apache.org/jira/browse/HBASE-9591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13774597#comment-13774597 ] Gabriel Reid commented on HBASE-9591: - Sorry for being so slow at looking at this. I just finally took a closer look, and now I'm clear on what's going on. I think I'm leaning towards managing each cluster fully separately (i.e. having a separate ReplicationPeers instance per peer cluster), but I'm wondering what kind of impact that would have on resource usage. At first glance, it looks like it should be fine. I think that taking this approach will be better for avoiding other variations of this bug in the future, which could still happen if we go with the "noop if chooseSinks returns the same thing" approach. On the other hand, the "noop if chooseSinks returns the same thing" approach will probably be quite a bit easier. Do you have a personal preference for the approach, or ideas on what would be best? [replication] getting Current list of sinks is out of date all the time when a source is recovered Key: HBASE-9591 URL: https://issues.apache.org/jira/browse/HBASE-9591 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Jean-Daniel Cryans Priority: Minor Fix For: 0.96.1 I tried killing a region server when the slave cluster was down, from that point on my log was filled with: {noformat} 2013-09-20 00:31:03,942 INFO [regionserver60020.replicationSource,1] org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager: Current list of sinks is out of date, updating 2013-09-20 00:31:04,226 INFO [ReplicationExecutor-0.replicationSource,1-jdec2hbase0403-4,60020,1379636329634] org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager: Current list of sinks is out of date, updating {noformat} The first log line is from the normal source, the second is the recovered one. 
When we try to replicate, we call replicationSinkMgr.getReplicationSink() and if the list of machines was refreshed since the last time then we call chooseSinks() which in turn refreshes the list of sinks and resets our lastUpdateToPeers. The next source will notice the change, and will call chooseSinks() too. The first source is coming for another round, sees the list was refreshed, calls chooseSinks() again. It happens forever until the recovered queue is gone. We could have all the sources going to the same cluster share a thread-safe ReplicationSinkManager. We could also manage the same cluster separately for each source. Or even easier, if the list we get in chooseSinks() is the same we had before, consider it a noop. What do you think [~gabriel.reid]? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
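The last option above ("if the list we get in chooseSinks() is the same we had before, consider it a noop") can be sketched as follows. This is an illustrative model, not the actual ReplicationSinkManager code; the class and field names are assumptions, with lastUpdateToPeers named after the timestamp mentioned in the report.

```java
import java.util.List;

// Sketch of the "noop if chooseSinks returns the same thing" option:
// only advance the update timestamp when the fetched list actually changed,
// so other sources sharing the manager do not see a spurious update.
public class SinkListCache {
    private List<String> currentSinks = List.of();
    private long lastUpdateToPeers = 0L;

    /** @return true if the sink list changed (and the timestamp advanced) */
    public synchronized boolean refresh(List<String> fetched, long now) {
        if (fetched.equals(currentSinks)) {
            return false; // identical list: treat as a noop, timestamp untouched
        }
        currentSinks = List.copyOf(fetched);
        lastUpdateToPeers = now;
        return true;
    }

    public synchronized long getLastUpdateToPeers() { return lastUpdateToPeers; }
}
```

With this in place, the recovered source fetching an unchanged list would no longer force every other source into another chooseSinks() round.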
[jira] [Commented] (HBASE-9591) [replication] getting Current list of sinks is out of date all the time when a source is recovered
[ https://issues.apache.org/jira/browse/HBASE-9591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772838#comment-13772838 ] Gabriel Reid commented on HBASE-9591: - I don't think I totally understand the situation. {quote}I tried killing a region server when the slave cluster was down{quote} So just to be totally sure, a region server on the master cluster is being killed, while the whole slave cluster is down, is that correct? If that's the case, I'm assuming that the list of region servers in the peer cluster would always remain empty, and wouldn't have any change events coming through ZK, and so the timestamp returned by ReplicationPeers#getTimestampOfLastChangeToPeer would stay the same. Of course, if that's all the case then it wouldn't lead to this situation, so I'm definitely not understanding something. In any case, I think it does make sense to consider things a noop (and not update timestamps) if the list of sinks fetched in ReplicationSinkManager#chooseSinks is the same as the last time. [~lhofhansl] This shouldn't apply to 0.94, the ReplicationSinkManager is only 0.95+. 
[replication] getting Current list of sinks is out of date all the time when a source is recovered Key: HBASE-9591 URL: https://issues.apache.org/jira/browse/HBASE-9591 Project: HBase Issue Type: Bug Affects Versions: 0.96.0 Reporter: Jean-Daniel Cryans Priority: Minor Fix For: 0.96.1 I tried killing a region server when the slave cluster was down, from that point on my log was filled with: {noformat} 2013-09-20 00:31:03,942 INFO [regionserver60020.replicationSource,1] org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager: Current list of sinks is out of date, updating 2013-09-20 00:31:04,226 INFO [ReplicationExecutor-0.replicationSource,1-jdec2hbase0403-4,60020,1379636329634] org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager: Current list of sinks is out of date, updating {noformat} The first log line is from the normal source, the second is the recovered one. When we try to replicate, we call replicationSinkMgr.getReplicationSink() and if the list of machines was refreshed since the last time then we call chooseSinks() which in turn refreshes the list of sinks and resets our lastUpdateToPeers. The next source will notice the change, and will call chooseSinks() too. The first source is coming for another round, sees the list was refreshed, calls chooseSinks() again. It happens forever until the recovered queue is gone. We could have all the sources going to the same cluster share a thread-safe ReplicationSinkManager. We could also manage the same cluster separately for each source. Or even easier, if the list we get in chooseSinks() is the same we had before, consider it a noop. What do you think [~gabriel.reid]? -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient
[ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7634: Release Note: This change has an impact on the number of watches set on the ${zookeeper.znode.parent}/rs node in ZK in a replication slave cluster (i.e. a cluster that is being replicated to). Every region server in each master cluster will place a watch on the rs node of each slave cluster. No additional configuration is necessary for this, but this could potentially have an impact on the performance and/or hardware requirements of ZK on very large clusters. Replication handling of changes to peer clusters is inefficient --- Key: HBASE-7634 URL: https://issues.apache.org/jira/browse/HBASE-7634 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.95.2 Reporter: Gabriel Reid Assignee: Gabriel Reid Fix For: 0.98.0, 0.95.2 Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, HBASE-7634.v3.patch, HBASE-7634.v4.patch, HBASE-7634.v5.patch, HBASE-7634.v6.patch The current handling of changes to the region servers in a replication peer cluster is quite inefficient. The list of region servers that are being replicated to is only updated if there are a large number of issues encountered while replicating. This can cause it to take quite a while to recognize that a number of the regionservers in a peer cluster are no longer available. A potentially bigger problem is that if a replication peer cluster is started with a small number of regionservers, and then more region servers are added after replication has started, the additional region servers will never be used for replication (unless there are failures on the in-use regionservers). 
Part of the current issue is that the retry code in ReplicationSource#shipEdits checks a randomly-chosen replication peer regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a replication write has failed on a different randomly-chosen replication peer. If the peer is seen as not down, another randomly-chosen peer is used for writing. A second part of the issue is that changes to the list of region servers in a peer cluster are not detected at all, and are only picked up if a certain number of failures have occurred when trying to ship edits. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
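One way to address the second part of the issue (changes to the peer's region server list going undetected) is to make the refresh event-driven. The stand-in below models that idea with a plain listener interface; per the release note above, the real notification comes from a ZooKeeper watch on the slave cluster's rs znode. All class and method names here are illustrative, not HBase code.

```java
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Minimal stand-in for watch-driven refresh: the peer region server list is
// updated by a change notification instead of only after repeated shipEdits
// failures, so added/removed peer regionservers are picked up promptly.
public class PeerRegionServerTracker {
    public interface Listener { void onPeerListChanged(List<String> servers); }

    private final List<Listener> listeners = new CopyOnWriteArrayList<>();
    private volatile List<String> servers = List.of();

    public void addListener(Listener l) { listeners.add(l); }

    // In a real implementation this would be invoked when the ZK watch fires.
    public void peerListChanged(List<String> newServers) {
        servers = List.copyOf(newServers);
        for (Listener l : listeners) l.onPeerListChanged(servers);
    }

    public List<String> getServers() { return servers; }
}
```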
[jira] [Updated] (HBASE-9594) Add reference documentation on changes made by HBASE-7634 (Replication handling of peer cluster changes)
[ https://issues.apache.org/jira/browse/HBASE-9594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-9594: Attachment: HBASE-9594.patch Documentation patch to include information on HBASE-7634 changes to replication. Add reference documentation on changes made by HBASE-7634 (Replication handling of peer cluster changes) Key: HBASE-9594 URL: https://issues.apache.org/jira/browse/HBASE-9594 Project: HBase Issue Type: Improvement Components: documentation Affects Versions: 0.96.0 Reporter: Gabriel Reid Priority: Minor Attachments: HBASE-9594.patch HBASE-7634 introduced changes to how the HBase replication handles changes to the composition of a slave cluster. Information on this mechanism should be added to the reference documentation for replication. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7634) Replication handling of changes to peer clusters is inefficient
[ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772957#comment-13772957 ] Gabriel Reid commented on HBASE-7634: - Added HBASE-9594 with documentation patch. Also added small release note to this issue. Replication handling of changes to peer clusters is inefficient --- Key: HBASE-7634 URL: https://issues.apache.org/jira/browse/HBASE-7634 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.95.2 Reporter: Gabriel Reid Assignee: Gabriel Reid Fix For: 0.98.0, 0.95.2 Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, HBASE-7634.v3.patch, HBASE-7634.v4.patch, HBASE-7634.v5.patch, HBASE-7634.v6.patch The current handling of changes to the region servers in a replication peer cluster is quite inefficient. The list of region servers that are being replicated to is only updated if there are a large number of issues encountered while replicating. This can cause it to take quite a while to recognize that a number of the regionservers in a peer cluster are no longer available. A potentially bigger problem is that if a replication peer cluster is started with a small number of regionservers, and then more region servers are added after replication has started, the additional region servers will never be used for replication (unless there are failures on the in-use regionservers). Part of the current issue is that the retry code in ReplicationSource#shipEdits checks a randomly-chosen replication peer regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a replication write has failed on a different randomly-chosen replication peer. If the peer is seen as not down, another randomly-chosen peer is used for writing. A second part of the issue is that changes to the list of region servers in a peer cluster are not detected at all, and are only picked up if a certain number of failures have occurred when trying to ship edits. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Created] (HBASE-9594) Add reference documentation on changes made by HBASE-7634 (Replication handling of peer cluster changes)
Gabriel Reid created HBASE-9594: --- Summary: Add reference documentation on changes made by HBASE-7634 (Replication handling of peer cluster changes) Key: HBASE-9594 URL: https://issues.apache.org/jira/browse/HBASE-9594 Project: HBase Issue Type: Improvement Components: documentation Affects Versions: 0.96.0 Reporter: Gabriel Reid Priority: Minor HBASE-7634 introduced changes to how the HBase replication handles changes to the composition of a slave cluster. Information on this mechanism should be added to the reference documentation for replication. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-8561) [replication] Don't instantiate a ReplicationSource if the passed implementation isn't found
[ https://issues.apache.org/jira/browse/HBASE-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13728465#comment-13728465 ] Gabriel Reid commented on HBASE-8561: - Wouldn't it be easier to just throw an exception up the stack and then (I assume) abort the regionserver? That way it'll be immediately clear if there is an issue, as well as removing the need to do null checking everywhere in the replication code. [replication] Don't instantiate a ReplicationSource if the passed implementation isn't found Key: HBASE-8561 URL: https://issues.apache.org/jira/browse/HBASE-8561 Project: HBase Issue Type: Improvement Affects Versions: 0.94.6.1 Reporter: Jean-Daniel Cryans Fix For: 0.98.0, 0.95.2 I was debugging a case where the region servers were dying with: {noformat} ABORTING region server someserver.com,60020,1368123702806: Writing replication status org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = NoNode for /hbase/replication/rs/someserver.com,60020,1368123702806/etcetcetc/somserver.com%2C60020%2C1368123702740.1368123705091 at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1266) at org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:354) at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:846) at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:898) at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:892) at org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:558) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:154) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:638) at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:387) {noformat} Turns out the problem really was: {noformat} 2013-05-09 11:21:45,625 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: Passed replication source implementation throws errors, defaulting to ReplicationSource java.lang.ClassNotFoundException: Some.Other.ReplicationSource.Implementation at java.net.URLClassLoader$1.run(URLClassLoader.java:366) at java.net.URLClassLoader$1.run(URLClassLoader.java:355) at java.security.AccessController.doPrivileged(Native Method) at java.net.URLClassLoader.findClass(URLClassLoader.java:354) at java.lang.ClassLoader.loadClass(ClassLoader.java:423) at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308) at java.lang.ClassLoader.loadClass(ClassLoader.java:356) at java.lang.Class.forName0(Native Method) at java.lang.Class.forName(Class.java:186) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.getReplicationSource(ReplicationSourceManager.java:324) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.addSource(ReplicationSourceManager.java:202) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.init(ReplicationSourceManager.java:174) at org.apache.hadoop.hbase.replication.regionserver.Replication.startReplicationService(Replication.java:171) at org.apache.hadoop.hbase.regionserver.HRegionServer.startServiceThreads(HRegionServer.java:1583) at org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1042) at org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:698) at java.lang.Thread.run(Thread.java:722) {noformat} So I think instantiating a ReplicationSource here is wrong and makes it harder to debug. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
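The fail-fast alternative proposed in the comment (throw up the stack instead of silently defaulting to ReplicationSource) might look roughly like this. The class name, method name, and exception type below are illustrative assumptions, not the actual HBase code.

```java
// Sketch of fail-fast replication source instantiation: surface the
// ClassNotFoundException instead of logging a warning and falling back to
// the default implementation.
public class ReplicationSourceFactory {
    public static Object instantiate(String className) {
        try {
            return Class.forName(className).getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            // Propagate instead of defaulting: the regionserver aborts loudly
            // and the misconfiguration is immediately visible, with no need
            // for null checks downstream.
            throw new IllegalStateException(
                "Could not instantiate replication source " + className, e);
        }
    }
}
```

Compared to the current behavior in the stack trace above, the operator would see the ClassNotFoundException at startup rather than a later, unrelated-looking ZK abort.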
[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient
[ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7634: Attachment: HBASE-7634.v5.patch Updated patch based on reviewboard comments from J-D. No real functional changes (other than increased configurability). Most changes are formatting and whitespace. Replication handling of changes to peer clusters is inefficient --- Key: HBASE-7634 URL: https://issues.apache.org/jira/browse/HBASE-7634 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.95.2 Reporter: Gabriel Reid Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, HBASE-7634.v3.patch, HBASE-7634.v4.patch, HBASE-7634.v5.patch The current handling of changes to the region servers in a replication peer cluster is quite inefficient. The list of region servers that are being replicated to is only updated if there are a large number of issues encountered while replicating. This can cause it to take quite a while to recognize that a number of the regionservers in a peer cluster are no longer available. A potentially bigger problem is that if a replication peer cluster is started with a small number of regionservers, and then more region servers are added after replication has started, the additional region servers will never be used for replication (unless there are failures on the in-use regionservers). Part of the current issue is that the retry code in ReplicationSource#shipEdits checks a randomly-chosen replication peer regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a replication write has failed on a different randomly-chosen replication peer. If the peer is seen as not down, another randomly-chosen peer is used for writing. A second part of the issue is that changes to the list of region servers in a peer cluster are not detected at all, and are only picked up if a certain number of failures have occurred when trying to ship edits. 
-- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient
[ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7634: Attachment: HBASE-7634.v6.patch Now without javadoc warnings. Replication handling of changes to peer clusters is inefficient --- Key: HBASE-7634 URL: https://issues.apache.org/jira/browse/HBASE-7634 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.95.2 Reporter: Gabriel Reid Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, HBASE-7634.v3.patch, HBASE-7634.v4.patch, HBASE-7634.v5.patch, HBASE-7634.v6.patch The current handling of changes to the region servers in a replication peer cluster is quite inefficient. The list of region servers that are being replicated to is only updated if there are a large number of issues encountered while replicating. This can cause it to take quite a while to recognize that a number of the regionservers in a peer cluster are no longer available. A potentially bigger problem is that if a replication peer cluster is started with a small number of regionservers, and then more region servers are added after replication has started, the additional region servers will never be used for replication (unless there are failures on the in-use regionservers). Part of the current issue is that the retry code in ReplicationSource#shipEdits checks a randomly-chosen replication peer regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a replication write has failed on a different randomly-chosen replication peer. If the peer is seen as not down, another randomly-chosen peer is used for writing. A second part of the issue is that changes to the list of region servers in a peer cluster are not detected at all, and are only picked up if a certain number of failures have occurred when trying to ship edits. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7325) Replication reacts slowly on a lightly-loaded cluster
[ https://issues.apache.org/jira/browse/HBASE-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7325: Attachment: HBASE-7325.v2.patch Regenerated patch for trunk. I've also verified that it can be applied to the 0.94 branch (using patch -p1). Sorry for taking so long with the patch regeneration. Replication reacts slowly on a lightly-loaded cluster - Key: HBASE-7325 URL: https://issues.apache.org/jira/browse/HBASE-7325 Project: HBase Issue Type: Bug Components: Replication Reporter: Gabriel Reid Priority: Minor Attachments: HBASE-7325.patch, HBASE-7325.v2.patch ReplicationSource uses a backing-off algorithm to sleep for an increasing duration when an error is encountered in the replication run loop. However, this backing-off is also performed when there is nothing found to replicate in the HLog. Assuming default settings (1 second base retry sleep time, and maximum multiplier of 10), this means that replication takes up to 10 seconds to occur when there is a break of about 55 seconds without anything being written. As there is no error condition, and there is apparently no substantial load on the regionserver in this situation, it would probably make more sense to not back off in non-error situations. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
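The "up to 10 seconds ... break of about 55 seconds" figures in the description follow from the backoff arithmetic: with a 1-second base and the multiplier capped at 10, consecutive idle iterations sleep 1s, 2s, ..., 10s, summing to 55 seconds before the delay plateaus at the 10-second maximum. A small model of that arithmetic (illustrative, not the actual ReplicationSource code):

```java
// Cumulative sleep of a linear backing-off loop: base * 1 + base * 2 + ...
// up to base * maxMultiplier, after which the delay stays at the maximum.
public class BackoffModel {
    public static long cumulativeSleepMillis(long baseMillis, int maxMultiplier) {
        long total = 0;
        for (int multiplier = 1; multiplier <= maxMultiplier; multiplier++) {
            total += baseMillis * multiplier; // one idle iteration's sleep
        }
        return total;
    }

    public static void main(String[] args) {
        // Default settings from the report: 1-second base, max multiplier 10.
        System.out.println(cumulativeSleepMillis(1000, 10) / 1000 + "s"); // 55s
    }
}
```

So after roughly a minute of idle time, any new write waits the full 10-second maximum before being replicated, which is what not backing off in non-error situations would avoid.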
[jira] [Updated] (HBASE-7325) Replication reacts slowly on a lightly-loaded cluster
[ https://issues.apache.org/jira/browse/HBASE-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7325: Status: Patch Available (was: Open) Replication reacts slowly on a lightly-loaded cluster - Key: HBASE-7325 URL: https://issues.apache.org/jira/browse/HBASE-7325 Project: HBase Issue Type: Bug Components: Replication Reporter: Gabriel Reid Priority: Minor Attachments: HBASE-7325.patch, HBASE-7325.v2.patch ReplicationSource uses a backing-off algorithm to sleep for an increasing duration when an error is encountered in the replication run loop. However, this backing-off is also performed when there is nothing found to replicate in the HLog. Assuming default settings (1 second base retry sleep time, and maximum multiplier of 10), this means that replication takes up to 10 seconds to occur when there is a break of about 55 seconds without anything being written. As there is no error condition, and there is apparently no substantial load on the regionserver in this situation, it would probably make more sense to not back off in non-error situations. -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient
[ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7634: Attachment: HBASE-7634.v4.patch Rebased patch on trunk, sorry for the delay on this one. Replication handling of changes to peer clusters is inefficient --- Key: HBASE-7634 URL: https://issues.apache.org/jira/browse/HBASE-7634 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.95.2 Reporter: Gabriel Reid Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, HBASE-7634.v3.patch, HBASE-7634.v4.patch The current handling of changes to the region servers in a replication peer cluster is quite inefficient. The list of region servers that are being replicated to is only updated if there are a large number of issues encountered while replicating. This can cause it to take quite a while to recognize that a number of the regionservers in a peer cluster are no longer available. A potentially bigger problem is that if a replication peer cluster is started with a small number of regionservers, and then more region servers are added after replication has started, the additional region servers will never be used for replication (unless there are failures on the in-use regionservers). Part of the current issue is that the retry code in ReplicationSource#shipEdits checks a randomly-chosen replication peer regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a replication write has failed on a different randomly-chosen replication peer. If the peer is seen as not down, another randomly-chosen peer is used for writing. A second part of the issue is that changes to the list of region servers in a peer cluster are not detected at all, and are only picked up if a certain number of failures have occurred when trying to ship edits. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7634) Replication handling of changes to peer clusters is inefficient
[ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724226#comment-13724226 ] Gabriel Reid commented on HBASE-7634: - [~ctrezzo] great idea -- I've put it on reviewboard at https://reviews.apache.org/r/13075/ Replication handling of changes to peer clusters is inefficient --- Key: HBASE-7634 URL: https://issues.apache.org/jira/browse/HBASE-7634 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.95.2 Reporter: Gabriel Reid Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, HBASE-7634.v3.patch, HBASE-7634.v4.patch The current handling of changes to the region servers in a replication peer cluster is quite inefficient. The list of region servers that are being replicated to is only updated if there are a large number of issues encountered while replicating. This can cause it to take quite a while to recognize that a number of the regionservers in a peer cluster are no longer available. A potentially bigger problem is that if a replication peer cluster is started with a small number of regionservers, and then more region servers are added after replication has started, the additional region servers will never be used for replication (unless there are failures on the in-use regionservers). Part of the current issue is that the retry code in ReplicationSource#shipEdits checks a randomly-chosen replication peer regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a replication write has failed on a different randomly-chosen replication peer. If the peer is seen as not down, another randomly-chosen peer is used for writing. A second part of the issue is that changes to the list of region servers in a peer cluster are not detected at all, and are only picked up if a certain number of failures have occurred when trying to ship edits. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Commented] (HBASE-7634) Replication handling of changes to peer clusters is inefficient
[ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722190#comment-13722190 ] Gabriel Reid commented on HBASE-7634: - Yes, will rebase this in the course of the next couple of days -- sorry for the delay on this. Replication handling of changes to peer clusters is inefficient --- Key: HBASE-7634 URL: https://issues.apache.org/jira/browse/HBASE-7634 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.95.2 Reporter: Gabriel Reid Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, HBASE-7634.v3.patch The current handling of changes to the region servers in a replication peer cluster is quite inefficient. The list of region servers that are being replicated to is only updated if there are a large number of issues encountered while replicating. This can cause it to take quite a while to recognize that a number of the regionservers in a peer cluster are no longer available. A potentially bigger problem is that if a replication peer cluster is started with a small number of regionservers, and then more region servers are added after replication has started, the additional region servers will never be used for replication (unless there are failures on the in-use regionservers). Part of the current issue is that the retry code in ReplicationSource#shipEdits checks a randomly-chosen replication peer regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a replication write has failed on a different randomly-chosen replication peer. If the peer is seen as not down, another randomly-chosen peer is used for writing. A second part of the issue is that changes to the list of region servers in a peer cluster are not detected at all, and are only picked up if a certain number of failures have occurred when trying to ship edits. -- This message is automatically generated by JIRA. 
If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
[jira] [Updated] (HBASE-9035) Incorrect example for using a scan stopRow in HBase book
[ https://issues.apache.org/jira/browse/HBASE-9035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-9035: Attachment: HBASE-9035.patch Patch with working example, as well as a pointer to InclusiveStopFilter for easier scanning over a specific range. Incorrect example for using a scan stopRow in HBase book Key: HBASE-9035 URL: https://issues.apache.org/jira/browse/HBASE-9035 Project: HBase Issue Type: Bug Components: documentation Reporter: Gabriel Reid Attachments: HBASE-9035.patch The example of how to use a stop row in a scan in Section 5.7.3 of the HBase book [1] is incorrect. It demonstrates using a start and stop row to only retrieve records with a given prefix, creating the stop row by appending a null byte to the start row. This creates a scan that does not include any of the target rows, because the stop row is less than the target rows via lexicographical sorting. [1] http://hbase.apache.org/book/data_model_operations.html#scan -- This message is automatically generated by JIRA. If you think it was sent incorrectly, please contact your JIRA administrators For more information on JIRA, see: http://www.atlassian.com/software/jira
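A working stop row for a prefix scan is typically built by incrementing the last byte of the prefix, so the stop row sorts immediately after every row sharing that prefix. The standalone helper below is an illustration of that technique (not the attached patch's code); note the handling of trailing 0xFF bytes, which cannot be incremented.

```java
import java.util.Arrays;

// Build an exclusive stop row that covers exactly the rows starting with a
// given prefix: copy the prefix and increment its last incrementable byte.
public class PrefixStopRow {
    public static byte[] stopRowForPrefix(byte[] prefix) {
        byte[] stop = Arrays.copyOf(prefix, prefix.length);
        for (int i = stop.length - 1; i >= 0; i--) {
            if (stop[i] != (byte) 0xFF) {
                stop[i]++;
                return Arrays.copyOf(stop, i + 1); // drop trailing 0xFF bytes
            }
        }
        // Prefix was all 0xFF: no finite upper bound, scan to end of table
        // (an empty stop row means "unbounded" in HBase scan semantics).
        return new byte[0];
    }
}
```

For example, prefix "A" (0x41) yields stop row "B" (0x42), which excludes nothing that starts with "A" and includes nothing that doesn't, unlike the appended-null-byte stop row the old book example used.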
[jira] [Created] (HBASE-9035) Incorrect example for using a scan stopRow in HBase book
Gabriel Reid created HBASE-9035: --- Summary: Incorrect example for using a scan stopRow in HBase book Key: HBASE-9035 URL: https://issues.apache.org/jira/browse/HBASE-9035 Project: HBase Issue Type: Bug Components: documentation Reporter: Gabriel Reid Attachments: HBASE-9035.patch
[jira] [Updated] (HBASE-9035) Incorrect example for using a scan stopRow in HBase book
[ https://issues.apache.org/jira/browse/HBASE-9035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-9035: Status: Patch Available (was: Open)
[jira] [Commented] (HBASE-9035) Incorrect example for using a scan stopRow in HBase book
[ https://issues.apache.org/jira/browse/HBASE-9035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13718262#comment-13718262 ] Gabriel Reid commented on HBASE-9035: - If I'm not mistaken, adding a trailing \xFF still isn't sufficient to do what's given as the example in the book. The example specifies that it will return all rows whose row key begins with "row". Setting the stop row to row\xFF will still exclude all rows that begin with row\xFF.
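As a sketch of the technique at issue here (an illustration only, not the actual HBASE-9035 patch or an HBase API): the correct exclusive stop row for a prefix scan is obtained by incrementing the last non-0xFF byte of the prefix, rather than by appending a byte to it.

```java
import java.util.Arrays;

// Sketch of the prefix-scan stop-row computation discussed above.
class PrefixStopRow {
    // Smallest byte array that sorts after every key starting with `prefix`,
    // for use as an exclusive scan stop row. Returns null when every byte is
    // 0xFF, meaning the scan must simply run to the end of the table.
    static byte[] stopRowForPrefix(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if ((prefix[i] & 0xFF) != 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++;
                return stop;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // For the prefix "row" the stop row is "rox": appending \x00 would stop
        // before the target rows, and "row\xFF" would still exclude any keys
        // that begin with "row\xFF".
        System.out.println(new String(stopRowForPrefix("row".getBytes()))); // prints rox
    }
}
```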
[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient
[ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7634: Attachment: HBASE-7634.v3.patch Updated patch with formatting issues fixed, and made TestReplicationSinkManager#testReportBadSink_DownToZeroSinks deterministic
[jira] [Created] (HBASE-7673) Incorrect error logging when a replication peer is removed
Gabriel Reid created HBASE-7673: --- Summary: Incorrect error logging when a replication peer is removed Key: HBASE-7673 URL: https://issues.apache.org/jira/browse/HBASE-7673 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.96.0 Reporter: Gabriel Reid Priority: Minor Attachments: HBASE-7673.patch When a replication peer is removed (and all goes well), the following error is still logged: {noformat}[ERROR][14:14:21,504][EventThread] org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager - The queue we wanted to close is missing peer-state{noformat} This is due to a watch being set on the peer-state node under the replication peer node in ZooKeeper, and the ReplicationSource#PeersWatcher doesn't correctly discern between nodes when it gets nodeDeleted messages.
[jira] [Updated] (HBASE-7673) Incorrect error logging when a replication peer is removed
[ https://issues.apache.org/jira/browse/HBASE-7673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7673: Attachment: HBASE-7673.patch Patch to make the PeersWatcher aware of what a valid path is to a peer node in ZK, and ignore nodeDeleted messages for other nodes.
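The shape of that check can be sketched as follows. This is a hypothetical illustration, not the actual patch; the znode layout under a `PEERS_ROOT` path is assumed for the example.

```java
// Hypothetical sketch of the path check described in the patch note: only a
// direct child of the peers root is a peer node; grandchildren such as
// "peer-state" are ignored when a nodeDeleted event arrives.
class PeerPathCheck {
    static final String PEERS_ROOT = "/hbase/replication/peers"; // assumed layout

    static boolean isPeerNode(String deletedPath) {
        if (!deletedPath.startsWith(PEERS_ROOT + "/")) {
            return false;
        }
        String tail = deletedPath.substring(PEERS_ROOT.length() + 1);
        // A peer node has no further slashes; "1/peer-state" is not a peer.
        return !tail.isEmpty() && !tail.contains("/");
    }

    public static void main(String[] args) {
        System.out.println(isPeerNode("/hbase/replication/peers/1"));            // true
        System.out.println(isPeerNode("/hbase/replication/peers/1/peer-state")); // false
    }
}
```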
[jira] [Updated] (HBASE-7673) Incorrect error logging when a replication peer is removed
[ https://issues.apache.org/jira/browse/HBASE-7673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7673: Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-7122) Proper warning message when opening a log file with no entries (idle cluster)
[ https://issues.apache.org/jira/browse/HBASE-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7122: Attachment: HBASE-7122.v2.patch We've just started having some issues with this bug as well -- trunk patch is attached. It's been tested against the replication unit tests, as well as a small-scale cluster test to ensure it does what it's supposed to. Proper warning message when opening a log file with no entries (idle cluster) - Key: HBASE-7122 URL: https://issues.apache.org/jira/browse/HBASE-7122 Project: HBase Issue Type: Sub-task Components: Replication Affects Versions: 0.94.2 Reporter: Himanshu Vashishtha Assignee: Himanshu Vashishtha Fix For: 0.96.0 Attachments: HBase-7122.patch, HBASE-7122.v2.patch In case the cluster is idle and the log has rolled (offset to 0), replicationSource tries to open the log and gets an EOF exception. This gets printed after every 10 sec until an entry is inserted in it. {code} 2012-11-07 15:47:40,924 DEBUG regionserver.ReplicationSource (ReplicationSource.java:openReader(487)) - Opening log for replication c0315.hal.cloudera.com%2C40020%2C1352324202860.1352327804874 at 0 2012-11-07 15:47:40,926 WARN regionserver.ReplicationSource (ReplicationSource.java:openReader(543)) - 1 Got: java.io.EOFException at java.io.DataInputStream.readFully(DataInputStream.java:180) at java.io.DataInputStream.readFully(DataInputStream.java:152) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1486) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1475) at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1470) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.init(SequenceFileLogReader.java:55) at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:175) at 
org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:716) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:491) at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:290) 2012-11-07 15:47:40,927 WARN regionserver.ReplicationSource (ReplicationSource.java:openReader(547)) - Waited too long for this file, considering dumping 2012-11-07 15:47:40,927 DEBUG regionserver.ReplicationSource (ReplicationSource.java:sleepForRetries(562)) - Unable to open a reader, sleeping 1000 times 10 {code} We should reduce the log spewing in this case (or some informative message, based on the offset).
[jira] [Commented] (HBASE-7673) Incorrect error logging when a replication peer is removed
[ https://issues.apache.org/jira/browse/HBASE-7673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562744#comment-13562744 ] Gabriel Reid commented on HBASE-7673: - Hmm, looks like the failed test (in org.apache.hadoop.hbase.coprocessor.example.TestBulkDeleteProtocol) is an unrelated random test failure.
[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient
[ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7634: Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient
[ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7634: Status: Open (was: Patch Available) Current patch cancelled because it doesn't apply cleanly.
[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient
[ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7634: Status: Patch Available (was: Open)
[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient
[ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7634: Attachment: HBASE-7634.v2.patch Updated patch that applies cleanly on current trunk
[jira] [Created] (HBASE-7634) Replication handling of changes to peer clusters is inefficient
Gabriel Reid created HBASE-7634: --- Summary: Replication handling of changes to peer clusters is inefficient Key: HBASE-7634 URL: https://issues.apache.org/jira/browse/HBASE-7634 Project: HBase Issue Type: Bug Components: Replication Affects Versions: 0.96.0 Reporter: Gabriel Reid Attachments: HBASE-7634.patch
[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient
[ https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7634: Attachment: HBASE-7634.patch Initial patch to resolve this issue. Adds a watcher on the replication peer's list of region servers to respond to changes in the list. Also replaces the checking of randomly-chosen peer regionservers with the ability to report a bad peer regionserver. When a bad peer regionserver has been reported three times, it is no longer used for replication until the list of replication peer regionservers is refreshed.
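The reporting mechanism described in that patch note can be sketched as follows. The class and method names are hypothetical, and the threshold of three is taken from the description above, not from the patch itself.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of the bad-sink reporting described above.
class SinkManagerSketch {
    static final int BAD_SINK_THRESHOLD = 3; // per the description above

    private final Map<String, Integer> badReports = new HashMap<>();

    // Called when a replication write to this peer regionserver fails.
    void reportBadSink(String sink) {
        badReports.merge(sink, 1, Integer::sum);
    }

    // A sink stays usable until it has been reported bad three times.
    boolean isUsable(String sink) {
        return badReports.getOrDefault(sink, 0) < BAD_SINK_THRESHOLD;
    }

    // Refreshing the peer's regionserver list clears the failure history.
    void refreshSinkList() {
        badReports.clear();
    }

    public static void main(String[] args) {
        SinkManagerSketch mgr = new SinkManagerSketch();
        mgr.reportBadSink("rs1");
        mgr.reportBadSink("rs1");
        System.out.println(mgr.isUsable("rs1")); // still usable after two reports
        mgr.reportBadSink("rs1");
        System.out.println(mgr.isUsable("rs1")); // dropped after the third
    }
}
```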
[jira] [Commented] (HBASE-7325) Replication reacts slowly on a lightly-loaded cluster
[ https://issues.apache.org/jira/browse/HBASE-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13552852#comment-13552852 ] Gabriel Reid commented on HBASE-7325: - [~jdcryans] [~lhofhansl] Gentle nudge on this so it doesn't get forgotten -- is it still the plan to commit this to trunk (and possibly 0.94)? Or is there more of a feeling that this wouldn't be a good idea? Replication reacts slowly on a lightly-loaded cluster - Key: HBASE-7325 URL: https://issues.apache.org/jira/browse/HBASE-7325 Project: HBase Issue Type: Bug Components: Replication Reporter: Gabriel Reid Priority: Minor Attachments: HBASE-7325.patch ReplicationSource uses a backing-off algorithm to sleep for an increasing duration when an error is encountered in the replication run loop. However, this backing-off is also performed when there is nothing found to replicate in the HLog. Assuming default settings (1 second base retry sleep time, and maximum multiplier of 10), this means that replication takes up to 10 seconds to occur when there is a break of about 55 seconds without anything being written. As there is no error condition, and there is apparently no substantial load on the regionserver in this situation, it would probably make more sense to not back off in non-error situations.
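The arithmetic behind those figures can be checked with a small sketch. The sleep shape (base time multiplied by a factor that grows 1, 2, ..., up to the cap of 10) is assumed from the issue text.

```java
// Sketch of the backoff arithmetic in the description above: sleep time is
// base * multiplier, with the multiplier growing by one per empty poll.
class BackoffMath {
    // Total seconds slept while ramping from multiplier 1 up to the cap.
    static int secondsUntilCap(int baseSeconds, int maxMultiplier) {
        int total = 0;
        for (int m = 1; m <= maxMultiplier; m++) {
            total += baseSeconds * m;
        }
        return total;
    }

    public static void main(String[] args) {
        // With the defaults (1s base, cap of 10): 1+2+...+10 = 55 seconds of
        // quiet time before the cap is reached, after which every further poll
        // waits the full 10 seconds -- matching the "about 55 seconds" and
        // "up to 10 seconds" figures in the description.
        System.out.println(secondsUntilCap(1, 10)); // prints 55
    }
}
```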
[jira] [Updated] (HBASE-7476) HBase shell count command doesn't escape binary output
[ https://issues.apache.org/jira/browse/HBASE-7476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7476: Attachment: HBASE-7476_1.patch [~stack] Here's an updated patch that uses Bytes.toBinaryString as suggested HBase shell count command doesn't escape binary output -- Key: HBASE-7476 URL: https://issues.apache.org/jira/browse/HBASE-7476 Project: HBase Issue Type: Bug Components: shell Reporter: Gabriel Reid Priority: Minor Attachments: HBASE-7476_1.patch, HBASE-7476.patch When running the count command in the HBase shell, the row key is printed each time a count interval is reached. However, the key is printed verbatim, meaning that non-printable characters are directly printed to the terminal. This can cause confusing results, or even leave the terminal in an unusable state.
[jira] [Created] (HBASE-7476) HBase shell count command doesn't escape binary output
Gabriel Reid created HBASE-7476: --- Summary: HBase shell count command doesn't escape binary output Key: HBASE-7476 URL: https://issues.apache.org/jira/browse/HBASE-7476 Project: HBase Issue Type: Bug Components: shell Reporter: Gabriel Reid Priority: Minor
[jira] [Updated] (HBASE-7476) HBase shell count command doesn't escape binary output
[ https://issues.apache.org/jira/browse/HBASE-7476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7476: Attachment: HBASE-7476.patch Simple patch to escape the row key when it is printed. I'm pretty illiterate in Ruby, so there may well be a better way of doing this.
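The idea behind the fix can be sketched as follows (in Java rather than the shell's Ruby; this is an illustration of the escaping concept, not the actual patch or an HBase utility).

```java
// Sketch of escaping non-printable row-key bytes before printing, so control
// characters cannot garble or lock up the terminal.
class RowKeyEscape {
    static String escape(byte[] key) {
        StringBuilder sb = new StringBuilder();
        for (byte b : key) {
            int v = b & 0xFF;
            if (v >= 0x20 && v < 0x7F && v != '\\') {
                sb.append((char) v); // printable ASCII passes through
            } else {
                sb.append(String.format("\\x%02X", v)); // everything else is hex-escaped
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // A row key containing a control byte is now printed safely.
        System.out.println(escape(new byte[]{'r', 'o', 'w', 0x01})); // prints row\x01
    }
}
```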
[jira] [Commented] (HBASE-7476) HBase shell count command doesn't escape binary output
[ https://issues.apache.org/jira/browse/HBASE-7476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13542367#comment-13542367 ] Gabriel Reid commented on HBASE-7476: - Absolutely, anything that gets the job done sounds good to me, thanks!
[jira] [Commented] (HBASE-7325) Replication reacts slowly on a lightly-loaded cluster
[ https://issues.apache.org/jira/browse/HBASE-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529742#comment-13529742 ] Gabriel Reid commented on HBASE-7325: - I've tested it against the TestReplication unit tests, as well as doing some additional testing with HBaseTestingUtility to verify the expected performance improvement. Indeed, I think the once-per-second load being put on the namenode should be a non-issue, and worth it for the gain that you get with faster replication on a quiet cluster. Replication reacts slowly on a lightly-loaded cluster - Key: HBASE-7325 URL: https://issues.apache.org/jira/browse/HBASE-7325 Project: HBase Issue Type: Bug Components: Replication Reporter: Gabriel Reid Priority: Minor Attachments: HBASE-7325.patch ReplicationSource uses a backing-off algorithm to sleep for an increasing duration when an error is encountered in the replication run loop. However, this backing-off is also performed when there is nothing found to replicate in the HLog. Assuming default settings (1 second base retry sleep time, and maximum multiplier of 10), this means that replication takes up to 10 seconds to occur when there is a break of about 55 seconds without anything being written. As there is no error condition, and there is apparently no substantial load on the regionserver in this situation, it would probably make more sense to not back off in non-error situations.
[jira] [Commented] (HBASE-7325) Replication reacts slowly on a lightly-loaded cluster
[ https://issues.apache.org/jira/browse/HBASE-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530769#comment-13530769 ] Gabriel Reid commented on HBASE-7325: - [~lhofhansl] I follow your point -- however, this is a situation where the cluster is almost totally idle. If each region server is getting at least one mutation event per second (which I would assume is still a very light load) then the polling is going to be happening once per second anyhow. If the cluster is more heavily loaded, then the polling is going to be occurring at the rate at which edits can be shipped to peers. This makes me think that if the 1 second interval polling on an idle cluster is a problem, then replication on a loaded cluster will be a much bigger problem. Replication reacts slowly on a lightly-loaded cluster - Key: HBASE-7325 URL: https://issues.apache.org/jira/browse/HBASE-7325 Project: HBase Issue Type: Bug Components: Replication Reporter: Gabriel Reid Priority: Minor Attachments: HBASE-7325.patch ReplicationSource uses a backing-off algorithm to sleep for an increasing duration when an error is encountered in the replication run loop. However, this backing-off is also performed when there is nothing found to replicate in the HLog. Assuming default settings (1 second base retry sleep time, and maximum multiplier of 10), this means that replication takes up to 10 seconds to occur when there is a break of about 55 seconds without anything being written. As there is no error condition, and there is apparently no substantial load on the regionserver in this situation, it would probably make more sense to not back off in non-error situations.
[jira] [Created] (HBASE-7325) Replication reacts slowly on a lightly-loaded cluster
Gabriel Reid created HBASE-7325: --- Summary: Replication reacts slowly on a lightly-loaded cluster Key: HBASE-7325 URL: https://issues.apache.org/jira/browse/HBASE-7325 Project: HBase Issue Type: Bug Components: Replication Reporter: Gabriel Reid Priority: Minor ReplicationSource uses a backing-off algorithm to sleep for an increasing duration when an error is encountered in the replication run loop. However, this backing-off is also performed when there is nothing found to replicate in the HLog. Assuming default settings (1 second base retry sleep time, and maximum multiplier of 10), this means that replication takes up to 10 seconds to occur when there is a break of about 55 seconds without anything being written. As there is no error condition, and there is apparently no substantial load on the regionserver in this situation, it would probably make more sense to not back off in non-error situations.
[jira] [Updated] (HBASE-7325) Replication reacts slowly on a lightly-loaded cluster
[ https://issues.apache.org/jira/browse/HBASE-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7325: Attachment: HBASE-7325.patch Patch to reset the sleep multiplier if there is no data available to replicate (and it is not an error situation) Replication reacts slowly on a lightly-loaded cluster - Key: HBASE-7325 URL: https://issues.apache.org/jira/browse/HBASE-7325 Project: HBase Issue Type: Bug Components: Replication Reporter: Gabriel Reid Priority: Minor Attachments: HBASE-7325.patch ReplicationSource uses a backing-off algorithm to sleep for an increasing duration when an error is encountered in the replication run loop. However, this backing-off is also performed when there is nothing found to replicate in the HLog. Assuming default settings (1 second base retry sleep time, and maximum multiplier of 10), this means that replication takes up to 10 seconds to occur when there is a break of about 55 seconds without anything being written. As there is no error condition, and there is apparently no substantial load on the regionserver in this situation, it would probably make more sense to not back off in non-error situations.
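The back-off behavior the issue and patch describe can be sketched as follows. This is an illustrative model, not the actual ReplicationSource code: class and method names are made up, and only the defaults stated in the issue (1 second base sleep, multiplier capped at 10) are taken from the source. The key point of the patch is the reset branch: a quiet pass with no error drops the multiplier back to 1 instead of letting it keep growing.

```java
public class ReplicationBackoff {
    static final long BASE_SLEEP_MS = 1000; // 1 second base retry sleep (default)
    static final int MAX_MULTIPLIER = 10;   // default maximum multiplier

    private int multiplier = 1;

    /**
     * Called after each pass over the log; returns how long to sleep
     * before the next pass.
     */
    public long nextSleepMs(boolean error, boolean foundEdits) {
        if (error) {
            // Genuine failure: keep backing off, up to the cap.
            multiplier = Math.min(multiplier + 1, MAX_MULTIPLIER);
        } else if (!foundEdits) {
            // Nothing to replicate and no error: the patch resets the
            // multiplier, so an idle cluster polls at the base rate
            // instead of drifting toward a 10-second sleep.
            multiplier = 1;
        }
        return BASE_SLEEP_MS * multiplier;
    }

    public static void main(String[] args) {
        ReplicationBackoff b = new ReplicationBackoff();
        b.nextSleepMs(true, false);  // error: multiplier grows to 2
        b.nextSleepMs(true, false);  // error: multiplier grows to 3
        // quiet pass, no error: multiplier resets, back to base sleep
        System.out.println(b.nextSleepMs(false, false)); // prints 1000
    }
}
```

Without the reset branch, ten consecutive empty passes (about 1+2+...+10 = 55 seconds of sleeps) leave the source polling only every 10 seconds, which matches the latency described in the issue.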
[jira] [Created] (HBASE-7292) HTablePool class description javadoc is unclear
Gabriel Reid created HBASE-7292: --- Summary: HTablePool class description javadoc is unclear Key: HBASE-7292 URL: https://issues.apache.org/jira/browse/HBASE-7292 Project: HBase Issue Type: Improvement Reporter: Gabriel Reid Priority: Minor The class description javadoc for HTablePool contains a sentence that makes no sense in the context (it appears to be part of an incorrectly-applied patch from the past). The sentence references the correct way of returning HTables to the pool, but it actually makes it more difficult to understand what the correct way of returning tables to the pool actually is.
[jira] [Updated] (HBASE-7292) HTablePool class description javadoc is unclear
[ https://issues.apache.org/jira/browse/HBASE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-7292: Attachment: HBASE-7292.patch HTablePool class description javadoc is unclear --- Key: HBASE-7292 URL: https://issues.apache.org/jira/browse/HBASE-7292 Project: HBase Issue Type: Improvement Reporter: Gabriel Reid Priority: Minor Attachments: HBASE-7292.patch The class description javadoc for HTablePool contains a sentence that makes no sense in the context (it appears to be part of an incorrectly-applied patch from the past). The sentence references the correct way of returning HTables to the pool, but it actually makes it more difficult to understand what the correct way of returning tables to the pool actually is.
[jira] [Commented] (HBASE-7292) HTablePool class description javadoc is unclear
[ https://issues.apache.org/jira/browse/HBASE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13511470#comment-13511470 ] Gabriel Reid commented on HBASE-7292: - Added small patch to fix up the HTablePool javadoc. HTablePool class description javadoc is unclear --- Key: HBASE-7292 URL: https://issues.apache.org/jira/browse/HBASE-7292 Project: HBase Issue Type: Improvement Reporter: Gabriel Reid Priority: Minor Attachments: HBASE-7292.patch The class description javadoc for HTablePool contains a sentence that makes no sense in the context (it appears to be part of an incorrectly-applied patch from the past). The sentence references the correct way of returning HTables to the pool, but it actually makes it more difficult to understand what the correct way of returning tables to the pool actually is.
[jira] [Created] (HBASE-6953) Incorrect javadoc description of HFileOutputFormat regarding multiple column families
Gabriel Reid created HBASE-6953: --- Summary: Incorrect javadoc description of HFileOutputFormat regarding multiple column families Key: HBASE-6953 URL: https://issues.apache.org/jira/browse/HBASE-6953 Project: HBase Issue Type: Bug Components: mapreduce Reporter: Gabriel Reid Priority: Minor The javadoc for HFileOutputFormat states that the class does not support writing multiple column families; however, this hasn't been the case since HBASE-1861 was resolved.
[jira] [Updated] (HBASE-6953) Incorrect javadoc description of HFileOutputFormat regarding multiple column families
[ https://issues.apache.org/jira/browse/HBASE-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabriel Reid updated HBASE-6953: Attachment: HBASE-6953.patch Attached patch removes the incorrect statement about multiple column families, as well as providing a pointer to the configureIncrementalLoad convenience method (as this is what most users of the class will be interested in). Incorrect javadoc description of HFileOutputFormat regarding multiple column families - Key: HBASE-6953 URL: https://issues.apache.org/jira/browse/HBASE-6953 Project: HBase Issue Type: Bug Components: mapreduce Reporter: Gabriel Reid Priority: Minor Attachments: HBASE-6953.patch The javadoc for HFileOutputFormat states that the class does not support writing multiple column families; however, this hasn't been the case since HBASE-1861 was resolved.