[jira] [Updated] (HBASE-12413) Mismatch in the equals and hashcode methods of KeyValue

2015-03-27 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-12413:
-
Attachment: HBASE-12413.v4.patch

Thanks for taking a look [~tedyu] and [~jingcheng...@intel.com]. Here's an 
updated patch that fixes the javadoc verb issue ("compares" replaced by "uses"), 
and also ignores MVCC when calculating the hash code (including a test to 
verify this).
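
For reference, a minimal sketch of the kind of test described above (not the 
exact test in the patch; it assumes the standard KeyValue constructor, the 
setSequenceId setter, and JUnit):

{code}
import static org.junit.Assert.assertEquals;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Bytes;
import org.junit.Test;

public class TestKeyValueHashCodeSketch {
  @Test
  public void testHashCodeIgnoresMvcc() throws Exception {
    // Two KeyValues with identical backing bytes but different sequence
    // ids (MVCC) should be equal and produce the same hash code.
    KeyValue kvA = new KeyValue(Bytes.toBytes("row"), Bytes.toBytes("cf"),
        Bytes.toBytes("q"), 1L, Bytes.toBytes("value"));
    KeyValue kvB = new KeyValue(Bytes.toBytes("row"), Bytes.toBytes("cf"),
        Bytes.toBytes("q"), 1L, Bytes.toBytes("value"));
    kvA.setSequenceId(100L);
    kvB.setSequenceId(200L);
    assertEquals(kvA, kvB);
    assertEquals(kvA.hashCode(), kvB.hashCode());
  }
}
{code}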

 Mismatch in the equals and hashcode methods of KeyValue
 ---

 Key: HBASE-12413
 URL: https://issues.apache.org/jira/browse/HBASE-12413
 Project: HBase
  Issue Type: Bug
Reporter: Jingcheng Du
Assignee: Jingcheng Du
Priority: Minor
 Attachments: HBASE-12413-V2.diff, HBASE-12413.diff, 
 HBASE-12413.v3.patch, HBASE-12413.v4.patch


 In the equals method of KeyValue only the row key is compared, while the 
 hashCode method is calculated over all backing bytes. This breaks the Java 
 equals/hashCode contract.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (HBASE-12413) Mismatch in the equals and hashcode methods of KeyValue

2015-03-26 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14381806#comment-14381806
 ] 

Gabriel Reid commented on HBASE-12413:
--

[~tedyu] if this change is incompatible (i.e. if it breaks anything), I would 
assume that it is just making it clearer that that thing was broken anyhow. 
The case I'm imagining is that KeyValues are being put into a HashSet or used 
as keys in a HashMap -- in this case, the current (unpatched) behavior would be 
non-deterministic, but may appear to work in many cases (i.e. the fact that 
the client code is broken might not be immediately visible).

I think that committing this can only be a good thing -- I think that the worst 
case is that it would make it more clear that already-broken code is actually 
broken.
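
To make the HashSet case concrete, a minimal sketch (assuming the unpatched 
behavior, where equals() compares only the row while hashCode() covers all 
backing bytes):

{code}
import java.util.HashSet;
import java.util.Set;

import org.apache.hadoop.hbase.KeyValue;
import org.apache.hadoop.hbase.util.Bytes;

public class KeyValueHashSetSketch {
  public static void main(String[] args) {
    // Same row, different values: the unpatched equals() considers these
    // equal, but their hash codes differ because all backing bytes are hashed.
    KeyValue kv1 = new KeyValue(Bytes.toBytes("row"), Bytes.toBytes("cf"),
        Bytes.toBytes("q"), Bytes.toBytes("value-1"));
    KeyValue kv2 = new KeyValue(Bytes.toBytes("row"), Bytes.toBytes("cf"),
        Bytes.toBytes("q"), Bytes.toBytes("value-2"));

    Set<KeyValue> set = new HashSet<KeyValue>();
    set.add(kv1);
    // May print true or false depending on which bucket each KeyValue lands
    // in -- contains() is unreliable when hashCode() disagrees with equals().
    System.out.println(set.contains(kv2));
  }
}
{code}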

 Mismatch in the equals and hashcode methods of KeyValue
 ---

 Key: HBASE-12413
 URL: https://issues.apache.org/jira/browse/HBASE-12413
 Project: HBase
  Issue Type: Bug
Reporter: Jingcheng Du
Assignee: Jingcheng Du
Priority: Minor
 Attachments: HBASE-12413-V2.diff, HBASE-12413.diff


 In the equals method of KeyValue only the row key is compared, while the 
 hashCode method is calculated over all backing bytes. This breaks the Java 
 equals/hashCode contract.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12413) Mismatch in the equals and hashcode methods of KeyValue

2015-03-26 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-12413:
-
Attachment: HBASE-12413.v3.patch

Here's a slightly altered patch that removes the need to change the visibility 
of CellComparator.calculateHashForKeyValue, and adds some basic testing of the 
equals and hashCode methods on KeyValue. I'll try to get a new QA run on this 
and hopefully it'll be clean.

 Mismatch in the equals and hashcode methods of KeyValue
 ---

 Key: HBASE-12413
 URL: https://issues.apache.org/jira/browse/HBASE-12413
 Project: HBase
  Issue Type: Bug
Reporter: Jingcheng Du
Assignee: Jingcheng Du
Priority: Minor
 Attachments: HBASE-12413-V2.diff, HBASE-12413.diff, 
 HBASE-12413.v3.patch


 In the equals method of KeyValue only the row key is compared, while the 
 hashCode method is calculated over all backing bytes. This breaks the Java 
 equals/hashCode contract.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12413) Mismatch in the equals and hashcode methods of KeyValue

2015-03-26 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-12413:
-
Assignee: (was: Jingcheng Du)
  Status: Open  (was: Patch Available)

 Mismatch in the equals and hashcode methods of KeyValue
 ---

 Key: HBASE-12413
 URL: https://issues.apache.org/jira/browse/HBASE-12413
 Project: HBase
  Issue Type: Bug
Reporter: Jingcheng Du
Priority: Minor
 Attachments: HBASE-12413-V2.diff, HBASE-12413.diff, 
 HBASE-12413.v3.patch


 In the equals method of KeyValue only the row key is compared, while the 
 hashCode method is calculated over all backing bytes. This breaks the Java 
 equals/hashCode contract.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (HBASE-12413) Mismatch in the equals and hashcode methods of KeyValue

2015-03-26 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-12413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-12413:
-
Assignee: Jingcheng Du
  Status: Patch Available  (was: Open)

 Mismatch in the equals and hashcode methods of KeyValue
 ---

 Key: HBASE-12413
 URL: https://issues.apache.org/jira/browse/HBASE-12413
 Project: HBase
  Issue Type: Bug
Reporter: Jingcheng Du
Assignee: Jingcheng Du
Priority: Minor
 Attachments: HBASE-12413-V2.diff, HBASE-12413.diff, 
 HBASE-12413.v3.patch


 In the equals method of KeyValue only the row key is compared, while the 
 hashCode method is calculated over all backing bytes. This breaks the Java 
 equals/hashCode contract.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Resolved] (HBASE-6749) Compact one expired HFile all the time

2015-02-23 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6749?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid resolved HBASE-6749.
-
Resolution: Duplicate

This is a duplicate of HBASE-10371

 Compact one expired HFile all the time
 --

 Key: HBASE-6749
 URL: https://issues.apache.org/jira/browse/HBASE-6749
 Project: HBase
  Issue Type: Bug
  Components: regionserver
Affects Versions: 0.94.1
Reporter: Jieshan Bean
Assignee: Jieshan Bean

 It's an interesting issue. We found there was 1 HFile that kept changing its 
 name all the time. After digging in more, we found one strange behavior in the 
 compaction flow.
 Here's the problem (we set the TTL property on our table):
 There were 10 HFiles and only 1 expired HFile when this problem occurred:
 {noformat}
 2012-09-07 02:21:05,298 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 loaded 
 hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/221f56905cbd4bf09bd4d5d9dceb113a.3ed2e43476ca2c614e33d0e1255c79a9,
  isReference=true, isBulkLoadResult=false, seqid=118730, majorCompaction=false
 2012-09-07 02:21:05,309 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 loaded 
 hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/297b45a6c5f541dca05105ab098dab8d,
  isReference=false, isBulkLoadResult=false, seqid=122018, 
 majorCompaction=false
 2012-09-07 02:21:05,326 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 loaded 
 hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/4a4a4598bc0443c9be087812052d6796.3ed2e43476ca2c614e33d0e1255c79a9,
  isReference=true, isBulkLoadResult=false, seqid=119850, majorCompaction=false
 2012-09-07 02:21:05,348 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 loaded 
 hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/8c6d56c9bafb4b0eb0dd6e04e41ca5b7,
  isReference=false, isBulkLoadResult=false, seqid=123135, 
 majorCompaction=false
 2012-09-07 02:21:05,357 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 loaded 
 hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/a6c2873a646c425f87a6a8a271a9904e,
  isReference=false, isBulkLoadResult=false, seqid=76561, majorCompaction=false
 2012-09-07 02:21:05,370 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 loaded 
 hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/a712a2f79bd247f48405d2c6a91757ab.3ed2e43476ca2c614e33d0e1255c79a9,
  isReference=true, isBulkLoadResult=false, seqid=120951, majorCompaction=false
 2012-09-07 02:21:05,381 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 loaded 
 hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/a717c63214534cb0aaf8e695147fde46,
  isReference=false, isBulkLoadResult=false, seqid=122763, 
 majorCompaction=false
 2012-09-07 02:21:05,431 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 loaded 
 hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/c456f3831b094ac3a0590678acbf27a5.3ed2e43476ca2c614e33d0e1255c79a9,
  isReference=true, isBulkLoadResult=false, seqid=120579, majorCompaction=false
 2012-09-07 02:21:05,518 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 loaded 
 hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/dde4e56b131a4ffdaec8f9574bffa5ab.3ed2e43476ca2c614e33d0e1255c79a9,
  isReference=true, isBulkLoadResult=false, seqid=121651, majorCompaction=false
 2012-09-07 02:21:05,593 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 loaded 
 hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/ee62844594f2474e88186dbde673c802.3ed2e43476ca2c614e33d0e1255c79a9,
  isReference=true, isBulkLoadResult=false, seqid=119478, majorCompaction=false
 {noformat} 
 Compaction was triggered during region opening. As we know, compaction 
 should choose the expired HFiles to compact first, so the 1 expired HFile 
 was chosen. 
 Since no KeyValue was in it, compaction deleted the old HFile and created a 
 new one with minimumTimestamp = -1 and maximumTimestamp = -1.
 So after the first compaction, there were still 10 HFiles. This triggered 
 compaction again and again.
 {noformat}
 2012-09-07 02:21:06,079 INFO 
 org.apache.hadoop.hbase.regionserver.compactions.CompactSelection: Deleting 
 the expired store file by compaction: 
 hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/72468dff1cd94c4fb9cf9196cc3183b7
  whose maxTimeStamp is -1 while the max expired timestamp is 1344824466079
 
 2012-09-07 02:21:06,080 DEBUG org.apache.hadoop.hbase.regionserver.Store: 
 Compacting 
 hdfs://hacluster/hbase/fhtest/97ac7ea6f732b7c10fe504a54ab02441/value/72468dff1cd94c4fb9cf9196cc3183b7,
  keycount=0, bloomtype=NONE, size=558, encoding=NONE
 2012-09-07 02:21:06,082 DEBUG org.apache.hadoop.hbase.util.FSUtils: Creating 
 

[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-05-16 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13998525#comment-13998525
 ] 

Gabriel Reid commented on HBASE-10504:
--

{quote}Even in that case, the server-client implementation will be much simpler 
than what you have in SEP I think, which does hbase RPC + zookeeper + zk 
mockup. Only a server discovery + simple one method thrift / PB RPC is what is 
needed from my understanding. It would be great to have a skeleton for this 
living in HBase proper as well.{quote}

The SEP does indeed make use of HBase RPC and ZK, and does masquerade in ZK as 
a region server, but I don't see much of an advantage in doing a separate 
thrift- or PB-based RPC. I think that this would just be exchanging one existing 
RPC mechanism (hbase) for another (thrift/PB). The service discovery would 
still be required, which would also involve ZK. Not having to do the setup of 
masquerading as a regionserver would indeed be a plus.

Taking this road for the SEP would then require:
* defining/setting up RPC (which is currently already provided by hbase)
* implementing a ReplicationConsumer (and deploying and configuring it in hbase 
RS) that does service discovery, RPC endpoint selection, and properly deals 
with failing/flaky RPC endpoints (also already provided in hbase replication)
* reworking the SEP RPC server to use the new RPC mechanism

This seems to me like work that is difficult to justify for the SEP because it 
makes use of existing HBase RPC and replication failover handling, and 
currently works. I may very well be missing something, but I don't see how 
reimplementing this stuff in the SEP would simplify things very much.

The one big potential advantage that I do see is that it would protect the SEP 
from changes in the internal HBase replication interfaces.

Obviously these comments are purely from the standpoint of SEP maintenance. As 
I mentioned in a previous comment, I think that this proposal would be a great 
addition to HBase and very useful -- I just don't see its usefulness to the 
SEP yet.

 Define Replication Interface
 

 Key: HBASE-10504
 URL: https://issues.apache.org/jira/browse/HBASE-10504
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.99.0

 Attachments: hbase-10504_wip1.patch


 HBase has replication.  Fellas have been hijacking the replication apis to do 
 all kinds of perverse stuff like indexing hbase content (hbase-indexer 
 https://github.com/NGDATA/hbase-indexer) and our [~toffer] just showed up w/ 
 overrides that replicate via an alternate channel (over a secure thrift 
 channel between dcs over on HBASE-9360).  This issue is about surfacing these 
 APIs as public with guarantees to downstreamers similar to those we have on 
 our public client-facing APIs (and so we don't break them for downstreamers).
 Any input [~phunt] or [~gabriel.reid] or [~toffer]?
 Thanks.
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-05-16 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13999804#comment-13999804
 ] 

Gabriel Reid commented on HBASE-10504:
--

{quote}I think I don't fully understand why would you want to have a proxy 
layer as a separate service (other than isolation issues).{quote}

Process isolation and deployment isolation are indeed the two main issues. The 
process isolation side of things probably speaks for itself. Deployment 
isolation is another issue however. Keeping dependencies of hbase-indexer in 
line with all deps in hbase is an issue (as all hbase-indexer jars would need 
to be deployed into the HBase lib for this approach to work), and all 
configuration, config changes, and restarting would also become directly linked 
to HBase.

 Define Replication Interface
 

 Key: HBASE-10504
 URL: https://issues.apache.org/jira/browse/HBASE-10504
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.99.0

 Attachments: hbase-10504_wip1.patch


 HBase has replication.  Fellas have been hijacking the replication apis to do 
 all kinds of perverse stuff like indexing hbase content (hbase-indexer 
 https://github.com/NGDATA/hbase-indexer) and our [~toffer] just showed up w/ 
 overrides that replicate via an alternate channel (over a secure thrift 
 channel between dcs over on HBASE-9360).  This issue is about surfacing these 
 APIs as public with guarantees to downstreamers similar to those we have on 
 our public client-facing APIs (and so we don't break them for downstreamers).
 Any input [~phunt] or [~gabriel.reid] or [~toffer]?
 Thanks.
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-05-15 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13997936#comment-13997936
 ] 

Gabriel Reid commented on HBASE-10504:
--

[~enis] [~jdcryans] I think that this would make a great addition to HBase in 
general -- if I'm reading it correctly, it basically works out as a kind of 
RegionObserver.postPut that runs in process, but off of the write path. 
However, I'm less sure of how well this general approach would work for 
hbase-indexer.

Although this approach would simplify a lot of things (e.g. removing the need 
to act as a slave cluster), it also introduces a number of important changes in 
the execution model of something that runs via the SepConsumer (i.e. reacting 
to HBase updates):
* the number of replication endpoints becomes directly linked to the number of 
regionservers -- as far as I see, it would no longer be possible to have more 
or fewer replication endpoints than regionservers
* the replication (ReplicationConsumer) runs within the same JVM as the 
regionserver, which brings some important changes: all of the dependencies then 
need to be present in the classpath of the regionserver (and be compatible with 
the dependencies of the regionserver), it has an effect on the heap, and it is 
no longer possible to tune the JVM settings for the replication process on its 
own
* from what I see of the current patch (although I'm sure this could be 
changed), it looks like there's only a single ReplicationConsumer, 
meaning that it wouldn't be possible to have real replication running next to 
a custom ReplicationSource

The tighter-coupling issues that I outlined above could be worked around by 
having a lightweight ReplicationConsumer that just passes data to an external 
process via RPC, but that's basically exactly what the replication code in 
HBase is doing right now, so it would be a shame to re-implement something like 
that.

 Define Replication Interface
 

 Key: HBASE-10504
 URL: https://issues.apache.org/jira/browse/HBASE-10504
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
Priority: Blocker
 Fix For: 0.99.0

 Attachments: hbase-10504_wip1.patch


 HBase has replication.  Fellas have been hijacking the replication apis to do 
 all kinds of perverse stuff like indexing hbase content (hbase-indexer 
 https://github.com/NGDATA/hbase-indexer) and our [~toffer] just showed up w/ 
 overrides that replicate via an alternate channel (over a secure thrift 
 channel between dcs over on HBASE-9360).  This issue is about surfacing these 
 APIs as public with guarantees to downstreamers similar to those we have on 
 our public client-facing APIs (and so we don't break them for downstreamers).
 Any input [~phunt] or [~gabriel.reid] or [~toffer]?
 Thanks.
  



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Created] (HBASE-11110) Ability to load FilterList class is dependent on context classloader

2014-05-02 Thread Gabriel Reid (JIRA)
Gabriel Reid created HBASE-11110:


 Summary: Ability to load FilterList class is dependent on context 
classloader
 Key: HBASE-11110
 URL: https://issues.apache.org/jira/browse/HBASE-11110
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.19
Reporter: Gabriel Reid


In the 0.94 branch, the FilterList class contains a static call to 
HBaseConfiguration.create(). This create call in turn adds the needed hbase 
resources to the Configuration object, and sets the classloader of the 
Configuration object to be the context classloader of the current thread (if it 
isn't null).

This approach causes issues if the FilterList class is loaded from a thread 
that has a custom context classloader that doesn't run back up to the main 
application classloader. In this case, HBaseConfiguration.checkDefaultsVersion 
fails because the hbase.defaults.for.version configuration value can't be found 
(because hbase-default.xml can't be found by the custom context classloader).

This is a concrete issue that was discovered via Apache Phoenix within a 
commercial tool, when a (JDBC) connection is opened via a pool, and then passed 
off to a UI thread that has a custom context classloader. The UI thread is then 
the first thing to load FilterList, leading to this issue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Updated] (HBASE-11110) Ability to load FilterList class is dependent on context classloader

2014-05-02 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-11110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-11110:
-

Attachment: HBASE-11110.patch

Patch to set the current thread's context classloader to the classloader used 
to load the Filter class while setting the static config member of FilterList.
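
Roughly, the pattern looks like the following sketch (the helper and class 
names are illustrative; the patch applies this inside FilterList's static 
initialization, using the classloader that loaded the Filter class):

{code}
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.filter.FilterList;

public class ContextClassLoaderSwapSketch {
  static Configuration createWithFilterListClassLoader() {
    ClassLoader original = Thread.currentThread().getContextClassLoader();
    try {
      // Resolve hbase-default.xml via the classloader that loaded FilterList,
      // not via whatever custom context classloader the calling thread has.
      Thread.currentThread().setContextClassLoader(FilterList.class.getClassLoader());
      return HBaseConfiguration.create();
    } finally {
      // Always restore the original context classloader.
      Thread.currentThread().setContextClassLoader(original);
    }
  }
}
{code}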

 Ability to load FilterList class is dependent on context classloader
 

 Key: HBASE-11110
 URL: https://issues.apache.org/jira/browse/HBASE-11110
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.19
Reporter: Gabriel Reid
 Attachments: HBASE-11110.patch


 In the 0.94 branch, the FilterList class contains a static call to 
 HBaseConfiguration.create(). This create call in turn adds the needed hbase 
 resources to the Configuration object, and sets the classloader of the 
 Configuration object to be the context classloader of the current thread (if 
 it isn't null).
 This approach causes issues if the FilterList class is loaded from a thread 
 that has a custom context classloader that doesn't run back up to the main 
 application classloader. In this case, 
 HBaseConfiguration.checkDefaultsVersion fails because the 
 hbase.defaults.for.version configuration value can't be found (because 
 hbase-default.xml can't be found by the custom context classloader).
 This is a concrete issue that was discovered via Apache Phoenix within a 
 commercial tool, when a (JDBC) connection is opened via a pool, and then 
 passed off to a UI thread that has a custom context classloader. The UI 
 thread is then the first thing to load FilterList, leading to this issue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11110) Ability to load FilterList class is dependent on context classloader

2014-05-02 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11110?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13988143#comment-13988143
 ] 

Gabriel Reid commented on HBASE-11110:
--

{quote}This exact issue won't be a problem in 0.96+ since the relevant code has 
changed, but I wonder if it's possible to try Phoenix 4 + HBase 0.98 with that 
tool?{quote}

Yes, will do. I'm working on going through some general tool compat tests, 
starting with Phoenix 3/HBase 0.94 but will also go on to do the same with 
Phoenix 4/HBase 0.98.

 Ability to load FilterList class is dependent on context classloader
 

 Key: HBASE-11110
 URL: https://issues.apache.org/jira/browse/HBASE-11110
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.19
Reporter: Gabriel Reid
Assignee: Gabriel Reid
 Fix For: 0.94.20

 Attachments: HBASE-11110.patch


 In the 0.94 branch, the FilterList class contains a static call to 
 HBaseConfiguration.create(). This create call in turn adds the needed hbase 
 resources to the Configuration object, and sets the classloader of the 
 Configuration object to be the context classloader of the current thread (if 
 it isn't null).
 This approach causes issues if the FilterList class is loaded from a thread 
 that has a custom context classloader that doesn't run back up to the main 
 application classloader. In this case, 
 HBaseConfiguration.checkDefaultsVersion fails because the 
 hbase.defaults.for.version configuration value can't be found (because 
 hbase-default.xml can't be found by the custom context classloader).
 This is a concrete issue that was discovered via Apache Phoenix within a 
 commercial tool, when a (JDBC) connection is opened via a pool, and then 
 passed off to a UI thread that has a custom context classloader. The UI 
 thread is then the first thing to load FilterList, leading to this issue.



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-02-14 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13902005#comment-13902005
 ] 

Gabriel Reid commented on HBASE-10504:
--

Making the NRT listening of changes on HBase into a first-class API as 
[~toffer] suggested sounds like an interesting idea, if there's interest in 
going that far in actively supporting this use case. 

If there is interest in doing that, the current hbase-sep library (which is the 
current library used for latching in via replication and responding to changes 
for the hbase-indexer project) could be a possible starting point. The current 
API interfaces[1] as well as the impl are within sub-modules[2] of the 
hbase-indexer project on GitHub.

[1] 
https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep/hbase-sep-api/src/main/java/com/ngdata/sep
[2] https://github.com/NGDATA/hbase-indexer/tree/master/hbase-sep

 Define Replication Interface
 

 Key: HBASE-10504
 URL: https://issues.apache.org/jira/browse/HBASE-10504
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Fix For: 0.99.0


 HBase has replication.  Fellas have been hijacking the replication apis to do 
 all kinds of perverse stuff like indexing hbase content (hbase-indexer 
 https://github.com/NGDATA/hbase-indexer) and our [~toffer] just showed up w/ 
 overrides that replicate via an alternate channel (over a secure thrift 
 channel between dcs over on HBASE-9360).  This issue is about surfacing these 
 APIs as public with guarantees to downstreamers similar to those we have on 
 our public client-facing APIs (and so we don't break them for downstreamers).
 Any input [~phunt] or [~gabriel.reid] or [~toffer]?
 Thanks.
  



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-10504) Define Replication Interface

2014-02-12 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-10504?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13898904#comment-13898904
 ] 

Gabriel Reid commented on HBASE-10504:
--

I would echo [~phunt]'s points -- having these interfaces public and stable 
would be great.

I think that from the point of view of the hbase-indexer project, the main 
contact points that are involved are 
o.a.h.h.replication.regionserver.ReplicationSource and o.a.h.h.Server, as well 
as the generated AdminProtos, although I assume that the scope of this ticket 
will be a fair bit wider than that.

If we're going to move to having a published stable API, now is also probably 
the time to make any changes that may be needed. The one thing I can think of 
right now that would be nice is a better hook into 
o.a.h.h.replication.regionserver.ReplicationSource#removeNonReplicableEdits. 
Right now we override that method to do additional filtering on HLog entries 
that we know we don't want to send to the indexer, but it would be good to make 
that into a public method with some kind of filter interface.
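
Purely as an illustration of the kind of hook meant here (these names are 
hypothetical and don't exist in HBase at the moment):

{code}
// Hypothetical filter hook that ReplicationSource could consult instead of
// subclasses overriding removeNonReplicableEdits directly.
public interface WALEntryFilter {
  /**
   * @param entry a WAL entry about to be shipped to the peer
   * @return the (possibly modified) entry to replicate, or null to drop it
   */
  HLog.Entry filter(HLog.Entry entry);
}
{code}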

 Define Replication Interface
 

 Key: HBASE-10504
 URL: https://issues.apache.org/jira/browse/HBASE-10504
 Project: HBase
  Issue Type: Task
Reporter: stack
Assignee: stack
 Fix For: 0.99.0


 HBase has replication.  Fellas have been hijacking the replication apis to do 
 all kinds of perverse stuff like indexing hbase content (hbase-indexer 
 https://github.com/NGDATA/hbase-indexer) and our [~toffer] just showed up w/ 
 overrides that replicate via an alternate channel (over a secure thrift 
 channel between dcs over on HBASE-9360).  This issue is about surfacing these 
 APIs as public with guarantees to downstreamers similar to those we have on 
 our public client-facing APIs (and so we don't break them for downstreamers).
 Any input [~phunt] or [~gabriel.reid] or [~toffer]?
 Thanks.
  



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)


[jira] [Commented] (HBASE-9763) Scan javadoc doesn't fully capture semantics of start and stop row

2013-10-16 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9763?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13797015#comment-13797015
 ] 

Gabriel Reid commented on HBASE-9763:
-

Thanks for taking a look. The difference between your test and my example is 
that my example is based on a row key prefix, and your test is based on full 
matching. Like the discussion and previous bug in the docs in HBASE-9035, 
there's a lot of potential for confusion when appending null bytes to get 
different semantics.

I think this is a potentially confusing topic for users no matter what. 
Another option instead of removing the null-byte appending advice from the 
javadoc might be to qualify it with more information around the semantics of 
what the start row and end row of a scan really are. Or maybe I'm just worrying 
too much about it and it's not really that big of an issue. :-)

 Scan javadoc doesn't fully capture semantics of start and stop row
 --

 Key: HBASE-9763
 URL: https://issues.apache.org/jira/browse/HBASE-9763
 Project: HBase
  Issue Type: Bug
  Components: documentation
Reporter: Gabriel Reid
Priority: Minor
 Attachments: HBASE-9763.patch


 The current javadoc for the Scan#setStartRow and Scan#setStopRow methods 
 doesn't accurately capture the semantics of using row prefix values. Both 
 methods describe the use of a trailing null byte to change the respective 
 inclusive/exclusive semantics of setStartRow and setStopRow.
 The use of a trailing null byte for start row exclusion only works in the 
 case that exact full matching is done on row keys. The use of a trailing null 
 byte for stop row inclusion has even more limitations (see HBASE-9035).
 The basic example is having the following rows:
 {code}
 AAB
 ABB
 BBC
 BCC
 {code}
 Setting the start row to A and the stop row to B will include AAB and ABB. 
 Setting the start row to A\x0 and the stop row to B\x0 will result in the 
 same two rows coming out of the scan, instead of having an effect on the 
 inclusion/exclusion semantics.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Created] (HBASE-9763) Scan javadoc doesn't fully capture semantics of start and stop row

2013-10-15 Thread Gabriel Reid (JIRA)
Gabriel Reid created HBASE-9763:
---

 Summary: Scan javadoc doesn't fully capture semantics of start and 
stop row
 Key: HBASE-9763
 URL: https://issues.apache.org/jira/browse/HBASE-9763
 Project: HBase
  Issue Type: Bug
  Components: documentation
Reporter: Gabriel Reid
Priority: Minor


The current javadoc for the Scan#setStartRow and Scan#setStopRow methods doesn't 
accurately capture the semantics of using row prefix values. Both methods 
describe the use of a trailing null byte to change the respective 
inclusive/exclusive semantics of setStartRow and setStopRow.

The use of a trailing null byte for start row exclusion only works in the case 
that exact full matching is done on row keys. The use of a trailing null byte 
for stop row inclusion has even more limitations (see HBASE-9035).

The basic example is having the following rows:

{code}
AAB
ABB
BBC
BCC
{code}

Setting the start row to A and the stop row to B will include AAB and ABB. 

Setting the start row to A\x0 and the stop row to B\x0 will result in the same 
two rows coming out of the scan, instead of having an effect on the 
inclusion/exclusion semantics.
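
For clarity, the scans in the example boil down to this sketch (0.94-era 
client API):

{code}
import org.apache.hadoop.hbase.client.Scan;
import org.apache.hadoop.hbase.util.Bytes;

public class ScanPrefixSketch {
  // With rows AAB, ABB, BBC, BCC in the table:
  static Scan plainScan() {
    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes("A"));  // inclusive; matches AAB, ABB as a prefix
    scan.setStopRow(Bytes.toBytes("B"));   // exclusive; stops before BBC
    return scan;                           // -> returns AAB and ABB
  }

  static Scan nullByteScan() {
    // Appending a null byte changes nothing here: "A\x00" still sorts before
    // "AAB", and "B\x00" still sorts after "ABB" but before "BBC".
    Scan scan = new Scan();
    scan.setStartRow(Bytes.toBytes("A\u0000"));
    scan.setStopRow(Bytes.toBytes("B\u0000"));
    return scan;                           // -> still returns AAB and ABB
  }
}
{code}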






--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9763) Scan javadoc doesn't fully capture semantics of start and stop row

2013-10-15 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-9763:


Attachment: HBASE-9763.patch

Trivial patch to remove the trailing null byte information, and be a bit more 
clear about the use of scan start and stop as matching the row prefix.

 Scan javadoc doesn't fully capture semantics of start and stop row
 --

 Key: HBASE-9763
 URL: https://issues.apache.org/jira/browse/HBASE-9763
 Project: HBase
  Issue Type: Bug
  Components: documentation
Reporter: Gabriel Reid
Priority: Minor
 Attachments: HBASE-9763.patch


 The current javadoc for the Scan#setStartRow and Scan#setStopRow methods 
 doesn't accurately capture the semantics of using row prefix values. Both 
 methods describe the use of a trailing null byte to change the respective 
 inclusive/exclusive semantics of setStartRow and setStopRow.
 The use of a trailing null byte for start row exclusion only works in the 
 case that exact full matching is done on row keys. The use of a trailing null 
 byte for stop row inclusion has even more limitations (see HBASE-9035).
 The basic example is having the following rows:
 {code}
 AAB
 ABB
 BBC
 BCC
 {code}
 Setting the start row to A and the stop row to B will include AAB and ABB. 
 Setting the start row to A\x0 and the stop row to B\x0 will result in the 
 same two rows coming out of the scan, instead of having an effect on the 
 inclusion/exclusion semantics.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Updated] (HBASE-9763) Scan javadoc doesn't fully capture semantics of start and stop row

2013-10-15 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-9763:


Status: Patch Available  (was: Open)

 Scan javadoc doesn't fully capture semantics of start and stop row
 --

 Key: HBASE-9763
 URL: https://issues.apache.org/jira/browse/HBASE-9763
 Project: HBase
  Issue Type: Bug
  Components: documentation
Reporter: Gabriel Reid
Priority: Minor
 Attachments: HBASE-9763.patch


 The current javadoc for the Scan#setStartRow and Scan#setStopRow methods 
 doesn't accurately capture the semantics of using row prefix values. Both 
 methods describe the use of a trailing null byte to change the respective 
 inclusive/exclusive semantics of setStartRow and setStopRow.
 The use of a trailing null byte for start row exclusion only works in the 
 case that exact full matching is done on row keys. The use of a trailing null 
 byte for stop row inclusion has even more limitations (see HBASE-9035).
 The basic example is having the following rows:
 {code}
 AAB
 ABB
 BBC
 BCC
 {code}
 Setting the start row to A and the stop row to B will include AAB and ABB. 
 Setting the start row to A\x0 and the stop row to B\x0 will result in the 
 same two rows coming out of the scan, instead of having an effect on the 
 inclusion/exclusion semantics.



--
This message was sent by Atlassian JIRA
(v6.1#6144)


[jira] [Commented] (HBASE-9591) [replication] getting Current list of sinks is out of date all the time when a source is recovered

2013-09-23 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13774597#comment-13774597
 ] 

Gabriel Reid commented on HBASE-9591:
-

Sorry for being so slow at looking at this.

I just finally took a closer look, and now I'm clear on what's going on. I 
think I'm leaning towards managing each cluster fully separately (i.e. having a 
separate ReplicationPeers instance per peer cluster), but I'm wondering what 
kind of impact that would have on resource usage. At first glance, it looks 
like it should be fine. I think that taking this approach will be better in 
terms of avoiding other variations of this bug in the future -- something that 
could happen if we go with the noop-if-chooseSinks-returns-the-same-thing 
approach.

On the other hand, the noop-if-chooseSinks-returns-the-same-thing approach 
will probably be quite a bit easier.

Do you have a personal preference for the approach, or ideas on what would be 
best?

 [replication] getting Current list of sinks is out of date all the time 
 when a source is recovered
 

 Key: HBASE-9591
 URL: https://issues.apache.org/jira/browse/HBASE-9591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Jean-Daniel Cryans
Priority: Minor
 Fix For: 0.96.1


 I tried killing a region server when the slave cluster was down, from that 
 point on my log was filled with:
 {noformat}
 2013-09-20 00:31:03,942 INFO  [regionserver60020.replicationSource,1] 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager: 
 Current list of sinks is out of date, updating
 2013-09-20 00:31:04,226 INFO  
 [ReplicationExecutor-0.replicationSource,1-jdec2hbase0403-4,60020,1379636329634]
  org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager: 
 Current list of sinks is out of date, updating
 {noformat}
 The first log line is from the normal source, the second is the recovered 
 one. When we try to replicate, we call 
 replicationSinkMgr.getReplicationSink() and if the list of machines was 
 refreshed since the last time then we call chooseSinks() which in turn 
 refreshes the list of sinks and resets our lastUpdateToPeers. The next source 
 will notice the change, and will call chooseSinks() too. The first source is 
 coming for another round, sees the list was refreshed, calls chooseSinks() 
 again. It happens forever until the recovered queue is gone.
 We could have all the sources going to the same cluster share a thread-safe 
 ReplicationSinkManager. We could also manage the same cluster separately for 
 each source. Or even easier, if the list we get in chooseSinks() is the same 
 we had before, consider it a noop.
 What do you think [~gabriel.reid]?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-9591) [replication] getting Current list of sinks is out of date all the time when a source is recovered

2013-09-20 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9591?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772838#comment-13772838
 ] 

Gabriel Reid commented on HBASE-9591:
-

I don't think I totally understand the situation. 
{quote}I tried killing a region server when the slave cluster was down{quote}

So just to be totally sure, a region server on the master cluster is being 
killed, while the whole slave cluster is down, is that correct? If that's the 
case, I'm assuming that the list of region servers in the peer cluster would 
always remain empty, and wouldn't have any change events coming through ZK, and 
so the timestamp returned by ReplicationPeers#getTimestampOfLastChangeToPeer 
would stay the same. Of course, if that's all the case then it wouldn't lead to 
this situation, so I'm definitely not understanding something.

In any case, I think it does make sense to consider things a noop (and not 
update timestamps) if the list of sinks fetched in 
ReplicationSinkManager#chooseSinks is the same as the last time.
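
In rough terms, the noop variant would look something like this sketch (the 
fetchSlaveServers() helper and the currentSlaves field are illustrative, not 
the actual ReplicationSinkManager code):

{code}
// Only treat the refreshed server list as a change when it actually differs
// from what we already had, so a recovered source doesn't keep invalidating
// the sink list used by the other sources.
void chooseSinks() {
  List<ServerName> refreshed = fetchSlaveServers(); // however the peer RS list is obtained
  if (refreshed.equals(this.currentSlaves)) {
    return; // noop: same list as last time, don't reset lastUpdateToPeers
  }
  this.currentSlaves = refreshed;
  this.lastUpdateToPeers = System.currentTimeMillis();
  // ... re-pick the subset of sinks to replicate to from the refreshed list ...
}
{code}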

[~lhofhansl] This shouldn't apply to 0.94, the ReplicationSinkManager is only 
0.95+.

 [replication] getting Current list of sinks is out of date all the time 
 when a source is recovered
 

 Key: HBASE-9591
 URL: https://issues.apache.org/jira/browse/HBASE-9591
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.96.0
Reporter: Jean-Daniel Cryans
Priority: Minor
 Fix For: 0.96.1


 I tried killing a region server when the slave cluster was down, from that 
 point on my log was filled with:
 {noformat}
 2013-09-20 00:31:03,942 INFO  [regionserver60020.replicationSource,1] 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager: 
 Current list of sinks is out of date, updating
 2013-09-20 00:31:04,226 INFO  
 [ReplicationExecutor-0.replicationSource,1-jdec2hbase0403-4,60020,1379636329634]
  org.apache.hadoop.hbase.replication.regionserver.ReplicationSinkManager: 
 Current list of sinks is out of date, updating
 {noformat}
 The first log line is from the normal source, the second is the recovered 
 one. When we try to replicate, we call 
 replicationSinkMgr.getReplicationSink() and if the list of machines was 
 refreshed since the last time then we call chooseSinks() which in turn 
 refreshes the list of sinks and resets our lastUpdateToPeers. The next source 
 will notice the change, and will call chooseSinks() too. The first source is 
 coming for another round, sees the list was refreshed, calls chooseSinks() 
 again. It happens forever until the recovered queue is gone.
 We could have all the sources going to the same cluster share a thread-safe 
 ReplicationSinkManager. We could also manage the same cluster separately for 
 each source. Or even easier, if the list we get in chooseSinks() is the same 
 we had before, consider it a noop.
 What do you think [~gabriel.reid]?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient

2013-09-20 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-7634:


Release Note: This change has an impact on the number of watches set on the 
${zookeeper.znode.parent}/rs node in ZK in a replication slave cluster (i.e. a 
cluster that is being replicated to). Every region server in each master 
cluster will place a watch on the rs node of each slave cluster. No additional 
configuration is necessary for this, but this could potentially have an impact 
on the performance and/or hardware requirements of ZK on very large clusters.

 Replication handling of changes to peer clusters is inefficient
 ---

 Key: HBASE-7634
 URL: https://issues.apache.org/jira/browse/HBASE-7634
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.95.2
Reporter: Gabriel Reid
Assignee: Gabriel Reid
 Fix For: 0.98.0, 0.95.2

 Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, 
 HBASE-7634.v3.patch, HBASE-7634.v4.patch, HBASE-7634.v5.patch, 
 HBASE-7634.v6.patch


 The current handling of changes to the region servers in a replication peer 
 cluster is quite inefficient. The list of region servers that are 
 being replicated to is only updated if there are a large number of issues 
 encountered while replicating.
 This can cause it to take quite a while to recognize that a number of the 
 regionservers in a peer cluster are no longer available. A potentially bigger 
 problem is that if a replication peer cluster is started with a small number 
 of regionservers, and then more region servers are added after replication 
 has started, the additional region servers will never be used for replication 
 (unless there are failures on the in-use regionservers).
 Part of the current issue is that the retry code in 
 ReplicationSource#shipEdits checks a randomly-chosen replication peer 
 regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a 
 replication write has failed on a different randomly-chosen replication peer. 
 If the peer is seen as not down, another randomly-chosen peer is used for 
 writing.
 A second part of the issue is that changes to the list of region servers in a 
 peer cluster are not detected at all, and are only picked up if a certain 
 number of failures have occurred when trying to ship edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-9594) Add reference documentation on changes made by HBASE-7634 (Replication handling of peer cluster changes)

2013-09-20 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9594?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-9594:


Attachment: HBASE-9594.patch

Documentation patch to include information on HBASE-7634 changes to replication.

 Add reference documentation on changes made by HBASE-7634 (Replication 
 handling of peer cluster changes)
 

 Key: HBASE-9594
 URL: https://issues.apache.org/jira/browse/HBASE-9594
 Project: HBase
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.96.0
Reporter: Gabriel Reid
Priority: Minor
 Attachments: HBASE-9594.patch


 HBASE-7634 introduced changes to how the HBase replication handles changes to 
 the composition of a slave cluster. Information on this mechanism should be 
 added to the reference documentation for replication.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7634) Replication handling of changes to peer clusters is inefficient

2013-09-20 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13772957#comment-13772957
 ] 

Gabriel Reid commented on HBASE-7634:
-

Added HBASE-9594 with documentation patch. Also added small release note to 
this issue.

 Replication handling of changes to peer clusters is inefficient
 ---

 Key: HBASE-7634
 URL: https://issues.apache.org/jira/browse/HBASE-7634
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.95.2
Reporter: Gabriel Reid
Assignee: Gabriel Reid
 Fix For: 0.98.0, 0.95.2

 Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, 
 HBASE-7634.v3.patch, HBASE-7634.v4.patch, HBASE-7634.v5.patch, 
 HBASE-7634.v6.patch


 The current handling of changes to the region servers in a replication peer 
 cluster is quite inefficient. The list of region servers that are 
 being replicated to is only updated if there are a large number of issues 
 encountered while replicating.
 This can cause it to take quite a while to recognize that a number of the 
 regionservers in a peer cluster are no longer available. A potentially bigger 
 problem is that if a replication peer cluster is started with a small number 
 of regionservers, and then more region servers are added after replication 
 has started, the additional region servers will never be used for replication 
 (unless there are failures on the in-use regionservers).
 Part of the current issue is that the retry code in 
 ReplicationSource#shipEdits checks a randomly-chosen replication peer 
 regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a 
 replication write has failed on a different randomly-chosen replication peer. 
 If the peer is seen as not down, another randomly-chosen peer is used for 
 writing.
 A second part of the issue is that changes to the list of region servers in a 
 peer cluster are not detected at all, and are only picked up if a certain 
 number of failures have occurred when trying to ship edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-9594) Add reference documentation on changes made by HBASE-7634 (Replication handling of peer cluster changes)

2013-09-20 Thread Gabriel Reid (JIRA)
Gabriel Reid created HBASE-9594:
---

 Summary: Add reference documentation on changes made by HBASE-7634 
(Replication handling of peer cluster changes)
 Key: HBASE-9594
 URL: https://issues.apache.org/jira/browse/HBASE-9594
 Project: HBase
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.96.0
Reporter: Gabriel Reid
Priority: Minor


HBASE-7634 introduced changes to how the HBase replication handles changes to 
the composition of a slave cluster. Information on this mechanism should be 
added to the reference documentation for replication.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8561) [replication] Don't instantiate a ReplicationSource if the passed implementation isn't found

2013-08-03 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13728465#comment-13728465
 ] 

Gabriel Reid commented on HBASE-8561:
-

Wouldn't it be easier to just throw an exception up the stack and then (I 
assume) abort the regionserver? That way it'll be immediately clear if there is 
an issue, as well as removing the need to do null checking everywhere in the 
replication code.

 [replication] Don't instantiate a ReplicationSource if the passed 
 implementation isn't found
 

 Key: HBASE-8561
 URL: https://issues.apache.org/jira/browse/HBASE-8561
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.94.6.1
Reporter: Jean-Daniel Cryans
 Fix For: 0.98.0, 0.95.2


 I was debugging a case where the region servers were dying with:
 {noformat}
 ABORTING region server someserver.com,60020,1368123702806: Writing 
 replication status 
 org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
 NoNode for 
 /hbase/replication/rs/someserver.com,60020,1368123702806/etcetcetc/somserver.com%2C60020%2C1368123702740.1368123705091
  
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:111) 
 at org.apache.zookeeper.KeeperException.create(KeeperException.java:51) 
 at org.apache.zookeeper.ZooKeeper.setData(ZooKeeper.java:1266) 
 at 
 org.apache.hadoop.hbase.zookeeper.RecoverableZooKeeper.setData(RecoverableZooKeeper.java:354)
  
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:846) 
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:898) 
 at org.apache.hadoop.hbase.zookeeper.ZKUtil.setData(ZKUtil.java:892) 
 at 
 org.apache.hadoop.hbase.replication.ReplicationZookeeper.writeReplicationStatus(ReplicationZookeeper.java:558)
  
 at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.logPositionAndCleanOldLogs(ReplicationSourceManager.java:154)
  
 at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.shipEdits(ReplicationSource.java:638)
  
 at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:387)
 {noformat}
 Turns out the problem really was:
 {noformat}
 2013-05-09 11:21:45,625 WARN 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager: 
 Passed replication source implementation throws errors, defaulting to 
 ReplicationSource
 java.lang.ClassNotFoundException: Some.Other.ReplicationSource.Implementation
   at java.net.URLClassLoader$1.run(URLClassLoader.java:366)
   at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
   at java.security.AccessController.doPrivileged(Native Method)
   at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:423)
   at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
   at java.lang.ClassLoader.loadClass(ClassLoader.java:356)
   at java.lang.Class.forName0(Native Method)
   at java.lang.Class.forName(Class.java:186)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.getReplicationSource(ReplicationSourceManager.java:324)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.addSource(ReplicationSourceManager.java:202)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager.init(ReplicationSourceManager.java:174)
   at 
 org.apache.hadoop.hbase.replication.regionserver.Replication.startReplicationService(Replication.java:171)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.startServiceThreads(HRegionServer.java:1583)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1042)
   at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:698)
   at java.lang.Thread.run(Thread.java:722)
 {noformat}
 So I think instantiating a ReplicationSource here is wrong and makes it 
 harder to debug.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient

2013-07-31 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-7634:


Attachment: HBASE-7634.v5.patch

Updated patch based on reviewboard comments from J-D

No real functional changes (other than increased configurability). Most changes 
are formatting and whitespace.

 Replication handling of changes to peer clusters is inefficient
 ---

 Key: HBASE-7634
 URL: https://issues.apache.org/jira/browse/HBASE-7634
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.95.2
Reporter: Gabriel Reid
 Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, 
 HBASE-7634.v3.patch, HBASE-7634.v4.patch, HBASE-7634.v5.patch


 The current handling of changes to the region servers in a replication peer 
 cluster is quite inefficient. The list of region servers that are 
 being replicated to is only updated if there are a large number of issues 
 encountered while replicating.
 This can cause it to take quite a while to recognize that a number of the 
 regionservers in a peer cluster are no longer available. A potentially bigger 
 problem is that if a replication peer cluster is started with a small number 
 of regionservers, and then more region servers are added after replication 
 has started, the additional region servers will never be used for replication 
 (unless there are failures on the in-use regionservers).
 Part of the current issue is that the retry code in 
 ReplicationSource#shipEdits checks a randomly-chosen replication peer 
 regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a 
 replication write has failed on a different randomly-chosen replication peer. 
 If the peer is seen as not down, another randomly-chosen peer is used for 
 writing.
 A second part of the issue is that changes to the list of region servers in a 
 peer cluster are not detected at all, and are only picked up if a certain 
 number of failures have occurred when trying to ship edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient

2013-07-31 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-7634:


Attachment: HBASE-7634.v6.patch

Now without javadoc warnings

 Replication handling of changes to peer clusters is inefficient
 ---

 Key: HBASE-7634
 URL: https://issues.apache.org/jira/browse/HBASE-7634
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.95.2
Reporter: Gabriel Reid
 Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, 
 HBASE-7634.v3.patch, HBASE-7634.v4.patch, HBASE-7634.v5.patch, 
 HBASE-7634.v6.patch


 The current handling of changes to the region servers in a replication peer 
 cluster is quite inefficient. The list of region servers that are 
 being replicated to is only updated if there are a large number of issues 
 encountered while replicating.
 This can cause it to take quite a while to recognize that a number of the 
 regionservers in a peer cluster are no longer available. A potentially bigger 
 problem is that if a replication peer cluster is started with a small number 
 of regionservers, and then more region servers are added after replication 
 has started, the additional region servers will never be used for replication 
 (unless there are failures on the in-use regionservers).
 Part of the current issue is that the retry code in 
 ReplicationSource#shipEdits checks a randomly-chosen replication peer 
 regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a 
 replication write has failed on a different randomly-chosen replication peer. 
 If the peer is seen as not down, another randomly-chosen peer is used for 
 writing.
 A second part of the issue is that changes to the list of region servers in a 
 peer cluster are not detected at all, and are only picked up if a certain 
 number of failures have occurred when trying to ship edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7325) Replication reacts slowly on a lightly-loaded cluster

2013-07-30 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-7325:


Attachment: HBASE-7325.v2.patch

Regenerated patch for trunk. I've also verified that it can be applied to the 
0.94 branch (using patch -p1).

Sorry for taking so long with the patch regeneration.

 Replication reacts slowly on a lightly-loaded cluster
 -

 Key: HBASE-7325
 URL: https://issues.apache.org/jira/browse/HBASE-7325
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Gabriel Reid
Priority: Minor
 Attachments: HBASE-7325.patch, HBASE-7325.v2.patch


 ReplicationSource uses a backing-off algorithm to sleep for an increasing 
 duration when an error is encountered in the replication run loop. However, 
 this backing-off is also performed when there is nothing found to replicate 
 in the HLog.
 Assuming default settings (1 second base retry sleep time, and maximum 
 multiplier of 10), this means that replication takes up to 10 seconds to 
 occur when there is a break of about 55 seconds without anything being 
 written. As there is no error condition, and there is apparently no 
 substantial load on the regionserver in this situation, it would probably 
 make more sense to not back off in non-error situations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7325) Replication reacts slowly on a lightly-loaded cluster

2013-07-30 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-7325:


Status: Patch Available  (was: Open)

 Replication reacts slowly on a lightly-loaded cluster
 -

 Key: HBASE-7325
 URL: https://issues.apache.org/jira/browse/HBASE-7325
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Gabriel Reid
Priority: Minor
 Attachments: HBASE-7325.patch, HBASE-7325.v2.patch


 ReplicationSource uses a backing-off algorithm to sleep for an increasing 
 duration when an error is encountered in the replication run loop. However, 
 this backing-off is also performed when there is nothing found to replicate 
 in the HLog.
 Assuming default settings (1 second base retry sleep time, and maximum 
 multiplier of 10), this means that replication takes up to 10 seconds to 
 occur when there is a break of about 55 seconds without anything being 
 written. As there is no error condition, and there is apparently no 
 substantial load on the regionserver in this situation, it would probably 
 make more sense to not back off in non-error situations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient

2013-07-30 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-7634:


Attachment: HBASE-7634.v4.patch

Rebased patch on trunk, sorry for the delay on this one.

 Replication handling of changes to peer clusters is inefficient
 ---

 Key: HBASE-7634
 URL: https://issues.apache.org/jira/browse/HBASE-7634
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.95.2
Reporter: Gabriel Reid
 Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, 
 HBASE-7634.v3.patch, HBASE-7634.v4.patch


 The current handling of changes to the region servers in a replication peer 
 cluster is quite inefficient. The list of region servers that are 
 being replicated to is only updated if there are a large number of issues 
 encountered while replicating.
 This can cause it to take quite a while to recognize that a number of the 
 regionservers in a peer cluster are no longer available. A potentially bigger 
 problem is that if a replication peer cluster is started with a small number 
 of regionservers, and then more region servers are added after replication 
 has started, the additional region servers will never be used for replication 
 (unless there are failures on the in-use regionservers).
 Part of the current issue is that the retry code in 
 ReplicationSource#shipEdits checks a randomly-chosen replication peer 
 regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a 
 replication write has failed on a different randomly-chosen replication peer. 
 If the peer is seen as not down, another randomly-chosen peer is used for 
 writing.
 A second part of the issue is that changes to the list of region servers in a 
 peer cluster are not detected at all, and are only picked up if a certain 
 number of failures have occurred when trying to ship edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7634) Replication handling of changes to peer clusters is inefficient

2013-07-30 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13724226#comment-13724226
 ] 

Gabriel Reid commented on HBASE-7634:
-

[~ctrezzo] great idea -- I've put it on reviewboard at 
https://reviews.apache.org/r/13075/

 Replication handling of changes to peer clusters is inefficient
 ---

 Key: HBASE-7634
 URL: https://issues.apache.org/jira/browse/HBASE-7634
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.95.2
Reporter: Gabriel Reid
 Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, 
 HBASE-7634.v3.patch, HBASE-7634.v4.patch


 The current handling of changes to the region servers in a replication peer 
 cluster is quite inefficient. The list of region servers that are 
 being replicated to is only updated if there are a large number of issues 
 encountered while replicating.
 This can cause it to take quite a while to recognize that a number of the 
 regionservers in a peer cluster are no longer available. A potentially bigger 
 problem is that if a replication peer cluster is started with a small number 
 of regionservers, and then more region servers are added after replication 
 has started, the additional region servers will never be used for replication 
 (unless there are failures on the in-use regionservers).
 Part of the current issue is that the retry code in 
 ReplicationSource#shipEdits checks a randomly-chosen replication peer 
 regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a 
 replication write has failed on a different randomly-chosen replication peer. 
 If the peer is seen as not down, another randomly-chosen peer is used for 
 writing.
 A second part of the issue is that changes to the list of region servers in a 
 peer cluster are not detected at all, and are only picked up if a certain 
 number of failures have occurred when trying to ship edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7634) Replication handling of changes to peer clusters is inefficient

2013-07-29 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13722190#comment-13722190
 ] 

Gabriel Reid commented on HBASE-7634:
-

Yes, will rebase this in the course of the next couple of days -- sorry for the 
delay on this.

 Replication handling of changes to peer clusters is inefficient
 ---

 Key: HBASE-7634
 URL: https://issues.apache.org/jira/browse/HBASE-7634
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.95.2
Reporter: Gabriel Reid
 Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, 
 HBASE-7634.v3.patch


 The current handling of changes to the region servers in a replication peer 
 cluster is quite inefficient. The list of region servers that are 
 being replicated to is only updated if there are a large number of issues 
 encountered while replicating.
 This can cause it to take quite a while to recognize that a number of the 
 regionservers in a peer cluster are no longer available. A potentially bigger 
 problem is that if a replication peer cluster is started with a small number 
 of regionservers, and then more region servers are added after replication 
 has started, the additional region servers will never be used for replication 
 (unless there are failures on the in-use regionservers).
 Part of the current issue is that the retry code in 
 ReplicationSource#shipEdits checks a randomly-chosen replication peer 
 regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a 
 replication write has failed on a different randomly-chosen replication peer. 
 If the peer is seen as not down, another randomly-chosen peer is used for 
 writing.
 A second part of the issue is that changes to the list of region servers in a 
 peer cluster are not detected at all, and are only picked up if a certain 
 number of failures have occurred when trying to ship edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-9035) Incorrect example for using a scan stopRow in HBase book

2013-07-24 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-9035:


Attachment: HBASE-9035.patch

Patch with working example, as well as a pointer to InclusiveStopFilter for 
easier scanning over a specific range.
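
For context, a minimal sketch of the corrected idea (illustrative only, not the text of 
the attached patch): to scan all rows starting with a given prefix, the stop row can be 
the prefix with its last byte incremented, giving an exclusive upper bound that still 
covers every key with that prefix. Scan, ResultScanner, Result and Bytes are the usual 
org.apache.hadoop.hbase client and util classes; the table instance and prefix value 
are made up.

{code}
// Illustrative sketch only -- scans all rows whose key starts with "abc" on an
// already-opened HTable named table. Requires java.util.Arrays.
byte[] startRow = Bytes.toBytes("abc");
byte[] stopRow = Arrays.copyOf(startRow, startRow.length);
stopRow[stopRow.length - 1]++;   // "abd"; assumes the last prefix byte is not 0xFF

Scan scan = new Scan(startRow, stopRow);   // half-open range [ "abc", "abd" )
ResultScanner scanner = table.getScanner(scan);
try {
  for (Result result : scanner) {
    // every result here has a row key beginning with "abc"
  }
} finally {
  scanner.close();
}
{code}

Alternatively, an InclusiveStopFilter can be set on the scan when the last row of the 
range should itself be returned.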

 Incorrect example for using a scan stopRow in HBase book
 

 Key: HBASE-9035
 URL: https://issues.apache.org/jira/browse/HBASE-9035
 Project: HBase
  Issue Type: Bug
  Components: documentation
Reporter: Gabriel Reid
 Attachments: HBASE-9035.patch


 The example of how to use a stop row in a scan in Section 5.7.3 of the 
 HBase book [1] is incorrect. It demonstrates using a start and stop row to 
 only retrieve records with a given prefix, creating the stop row by appending 
 a null byte to the start row.
 This creates a scan that does not include any of the target rows, because the 
 stop row is less than the target rows via lexicographical sorting.
 [1] http://hbase.apache.org/book/data_model_operations.html#scan

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-9035) Incorrect example for using a scan stopRow in HBase book

2013-07-24 Thread Gabriel Reid (JIRA)
Gabriel Reid created HBASE-9035:
---

 Summary: Incorrect example for using a scan stopRow in HBase book
 Key: HBASE-9035
 URL: https://issues.apache.org/jira/browse/HBASE-9035
 Project: HBase
  Issue Type: Bug
  Components: documentation
Reporter: Gabriel Reid
 Attachments: HBASE-9035.patch

The example of how to use a stop row in a scan in Section 5.7.3 of the 
HBase book [1] is incorrect. It demonstrates using a start and stop row to only 
retrieve records with a given prefix, creating the stop row by appending a null 
byte to the start row.

This creates a scan that does not include any of the target rows, because the 
stop row is less than the target rows via lexicographical sorting.

[1] http://hbase.apache.org/book/data_model_operations.html#scan

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-9035) Incorrect example for using a scan stopRow in HBase book

2013-07-24 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9035?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-9035:


Status: Patch Available  (was: Open)

 Incorrect example for using a scan stopRow in HBase book
 

 Key: HBASE-9035
 URL: https://issues.apache.org/jira/browse/HBASE-9035
 Project: HBase
  Issue Type: Bug
  Components: documentation
Reporter: Gabriel Reid
 Attachments: HBASE-9035.patch


 The example of how to use a stop row in a scan in Section 5.7.3 of the 
 HBase book [1] is incorrect. It demonstrates using a start and stop row to 
 only retrieve records with a given prefix, creating the stop row by appending 
 a null byte to the start row.
 This creates a scan that does not include any of the target rows, because the 
 stop row is less than the target rows via lexicographical sorting.
 [1] http://hbase.apache.org/book/data_model_operations.html#scan

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-9035) Incorrect example for using a scan stopRow in HBase book

2013-07-24 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-9035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13718262#comment-13718262
 ] 

Gabriel Reid commented on HBASE-9035:
-

If I'm not mistaken, adding a #FF still isn't sufficient to do what's given as 
the example in the book. 

The example specifies that it will return all rows whose row key begins with 
row. Setting the stop row to row\xFF will still exclude all rows that begin 
with row\xFF.
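
A tiny illustration of that point (hypothetical keys, using the Bytes utility class):

{code}
// stopRow = "row" + 0xFF; excludedKey = "row" + 0xFF + "x"
byte[] stopRow = Bytes.add(Bytes.toBytes("row"), new byte[] { (byte) 0xFF });
byte[] excludedKey = Bytes.add(stopRow, Bytes.toBytes("x"));
// excludedKey starts with the prefix "row", but it sorts at or after the
// (exclusive) stop row, so the scan will never return it:
assert Bytes.compareTo(excludedKey, stopRow) >= 0;
{code}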

 Incorrect example for using a scan stopRow in HBase book
 

 Key: HBASE-9035
 URL: https://issues.apache.org/jira/browse/HBASE-9035
 Project: HBase
  Issue Type: Bug
  Components: documentation
Reporter: Gabriel Reid
 Attachments: HBASE-9035.patch


 The example of how to use a stop row in a scan in Section 5.7.3 of the 
 HBase book [1] is incorrect. It demonstrates using a start and stop row to 
 only retrieve records with a given prefix, creating the stop row by appending 
 a null byte to the start row.
 This creates a scan that does not include any of the target rows, because the 
 stop row is less than the target rows via lexicographical sorting.
 [1] http://hbase.apache.org/book/data_model_operations.html#scan

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient

2013-01-25 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-7634:


Attachment: HBASE-7634.v3.patch

Updated patch with the formatting issues fixed, and made 
TestReplicationSinkManager#testReportBadSink_DownToZeroSinks deterministic

 Replication handling of changes to peer clusters is inefficient
 ---

 Key: HBASE-7634
 URL: https://issues.apache.org/jira/browse/HBASE-7634
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.96.0
Reporter: Gabriel Reid
 Attachments: HBASE-7634.patch, HBASE-7634.v2.patch, 
 HBASE-7634.v3.patch


 The current handling of changes to the region servers in a replication peer 
 cluster is quite inefficient. The list of region servers that are 
 being replicated to is only updated if there are a large number of issues 
 encountered while replicating.
 This can cause it to take quite a while to recognize that a number of the 
 regionservers in a peer cluster are no longer available. A potentially bigger 
 problem is that if a replication peer cluster is started with a small number 
 of regionservers, and then more region servers are added after replication 
 has started, the additional region servers will never be used for replication 
 (unless there are failures on the in-use regionservers).
 Part of the current issue is that the retry code in 
 ReplicationSource#shipEdits checks a randomly-chosen replication peer 
 regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a 
 replication write has failed on a different randomly-chosen replication peer. 
 If the peer is seen as not down, another randomly-chosen peer is used for 
 writing.
 A second part of the issue is that changes to the list of region servers in a 
 peer cluster are not detected at all, and are only picked up if a certain 
 number of failures have occurred when trying to ship edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-7673) Incorrect error logging when a replication peer is removed

2013-01-25 Thread Gabriel Reid (JIRA)
Gabriel Reid created HBASE-7673:
---

 Summary: Incorrect error logging when a replication peer is removed
 Key: HBASE-7673
 URL: https://issues.apache.org/jira/browse/HBASE-7673
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.96.0
Reporter: Gabriel Reid
Priority: Minor
 Attachments: HBASE-7673.patch

When a replication peer is removed (and all goes well), the following error is 
still logged:
{noformat}[ERROR][14:14:21,504][ventThread] 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager - The 
queue we wanted to close is missing peer-state{noformat}

This is due to a watch being set on the peer-state node under the replication 
peer node in ZooKeeper, and the ReplicationSource#PeersWatcher doesn't 
correctly discern between nodes when it gets nodeDeleted messages.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7673) Incorrect error logging when a replication peer is removed

2013-01-25 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-7673:


Attachment: HBASE-7673.patch

Patch to make the PeersWatcher aware of what a valid path is to a peer node in 
ZK, and ignore nodeDeleted messages for other nodes.
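
Roughly, the idea is a check along these lines (a hedged sketch with an assumed base 
znode path, not the actual patch):

{code}
// Hypothetical sketch, not the actual PeersWatcher code.
final class PeerPathCheck {
  // Assumed base znode: each peer is a direct child, e.g.
  // /hbase/replication/peers/1, with /hbase/replication/peers/1/peer-state below it.
  static final String PEERS_ZNODE = "/hbase/replication/peers";

  // True only for a direct child of the peers znode (a peer node itself),
  // false for deeper children such as the peer-state node.
  static boolean isPeerNode(String path) {
    if (path == null || !path.startsWith(PEERS_ZNODE + "/")) {
      return false;
    }
    return path.indexOf('/', PEERS_ZNODE.length() + 1) == -1;
  }
}
{code}

The nodeDeleted handler can then return early when the path is not a peer node, so a 
deleted peer-state node no longer triggers the misleading error about a missing queue.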

 Incorrect error logging when a replication peer is removed
 --

 Key: HBASE-7673
 URL: https://issues.apache.org/jira/browse/HBASE-7673
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.96.0
Reporter: Gabriel Reid
Priority: Minor
 Attachments: HBASE-7673.patch


 When a replication peer is removed (and all goes well), the following error 
 is still logged:
 {noformat}[ERROR][14:14:21,504][ventThread] 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager - 
 The queue we wanted to close is missing peer-state{noformat}
 This is due to a watch being set on the peer-state node under the replication 
 peer node in ZooKeeper, and the ReplicationSource#PeersWatcher doesn't 
 correctly discern between nodes when it gets nodeDeleted messages.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7673) Incorrect error logging when a replication peer is removed

2013-01-25 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7673?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-7673:


Status: Patch Available  (was: Open)

 Incorrect error logging when a replication peer is removed
 --

 Key: HBASE-7673
 URL: https://issues.apache.org/jira/browse/HBASE-7673
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.96.0
Reporter: Gabriel Reid
Priority: Minor
 Attachments: HBASE-7673.patch


 When a replication peer is removed (and all goes well), the following error 
 is still logged:
 {noformat}[ERROR][14:14:21,504][ventThread] 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager - 
 The queue we wanted to close is missing peer-state{noformat}
 This is due to a watch being set on the peer-state node under the replication 
 peer node in ZooKeeper, and the ReplicationSource#PeersWatcher doesn't 
 correctly discern between nodes when it gets nodeDeleted messages.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7122) Proper warning message when opening a log file with no entries (idle cluster)

2013-01-25 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-7122:


Attachment: HBASE-7122.v2.patch

We've just started having some issues with this bug as well -- trunk patch is 
attached. 

It's been tested against the replication unit tests, as well as a small-scale 
cluster test to ensure it does what it's supposed to.

 Proper warning message when opening a log file with no entries (idle cluster)
 -

 Key: HBASE-7122
 URL: https://issues.apache.org/jira/browse/HBASE-7122
 Project: HBase
  Issue Type: Sub-task
  Components: Replication
Affects Versions: 0.94.2
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.96.0

 Attachments: HBase-7122.patch, HBASE-7122.v2.patch


 In case the cluster is idle and the log has rolled (offset to 0), 
 replicationSource tries to open the log and gets an EOF exception. This gets 
 printed after every 10 sec until an entry is inserted in it.
 {code}
 2012-11-07 15:47:40,924 DEBUG regionserver.ReplicationSource 
 (ReplicationSource.java:openReader(487)) - Opening log for replication 
 c0315.hal.cloudera.com%2C40020%2C1352324202860.1352327804874 at 0
 2012-11-07 15:47:40,926 WARN  regionserver.ReplicationSource 
 (ReplicationSource.java:openReader(543)) - 1 Got: 
 java.io.EOFException
   at java.io.DataInputStream.readFully(DataInputStream.java:180)
   at java.io.DataInputStream.readFully(DataInputStream.java:152)
   at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508)
   at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1486)
   at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1475)
   at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1470)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.init(SequenceFileLogReader.java:55)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:175)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:716)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:491)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:290)
 2012-11-07 15:47:40,927 WARN  regionserver.ReplicationSource 
 (ReplicationSource.java:openReader(547)) - Waited too long for this file, 
 considering dumping
 2012-11-07 15:47:40,927 DEBUG regionserver.ReplicationSource 
 (ReplicationSource.java:sleepForRetries(562)) - Unable to open a reader, 
 sleeping 1000 times 10
 {code}
 We should reduce the log spewing in this case (or some informative message, 
 based on the offset).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7673) Incorrect error logging when a replication peer is removed

2013-01-25 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13562744#comment-13562744
 ] 

Gabriel Reid commented on HBASE-7673:
-

Hmm, the failed test (in 
org.apache.hadoop.hbase.coprocessor.example.TestBulkDeleteProtocol) looks like an 
unrelated random test failure.

 Incorrect error logging when a replication peer is removed
 --

 Key: HBASE-7673
 URL: https://issues.apache.org/jira/browse/HBASE-7673
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.96.0
Reporter: Gabriel Reid
Priority: Minor
 Attachments: HBASE-7673.patch


 When a replication peer is removed (and all goes well), the following error 
 is still logged:
 {noformat}[ERROR][14:14:21,504][ventThread] 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSourceManager - 
 The queue we wanted to close is missing peer-state{noformat}
 This is due to a watch being set on the peer-state node under the replication 
 peer node in ZooKeeper, and the ReplicationSource#PeersWatcher doesn't 
 correctly discern between nodes when it gets nodeDeleted messages.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient

2013-01-24 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-7634:


Status: Patch Available  (was: Open)

 Replication handling of changes to peer clusters is inefficient
 ---

 Key: HBASE-7634
 URL: https://issues.apache.org/jira/browse/HBASE-7634
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.96.0
Reporter: Gabriel Reid
 Attachments: HBASE-7634.patch


 The current handling of changes to the region servers in a replication peer 
 cluster is quite inefficient. The list of region servers that are 
 being replicated to is only updated if there are a large number of issues 
 encountered while replicating.
 This can cause it to take quite a while to recognize that a number of the 
 regionservers in a peer cluster are no longer available. A potentially bigger 
 problem is that if a replication peer cluster is started with a small number 
 of regionservers, and then more region servers are added after replication 
 has started, the additional region servers will never be used for replication 
 (unless there are failures on the in-use regionservers).
 Part of the current issue is that the retry code in 
 ReplicationSource#shipEdits checks a randomly-chosen replication peer 
 regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a 
 replication write has failed on a different randomly-chosen replication peer. 
 If the peer is seen as not down, another randomly-chosen peer is used for 
 writing.
 A second part of the issue is that changes to the list of region servers in a 
 peer cluster are not detected at all, and are only picked up if a certain 
 number of failures have occurred when trying to ship edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient

2013-01-24 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-7634:


Status: Open  (was: Patch Available)

Current patch cancelled because it doesn't apply cleanly.

 Replication handling of changes to peer clusters is inefficient
 ---

 Key: HBASE-7634
 URL: https://issues.apache.org/jira/browse/HBASE-7634
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.96.0
Reporter: Gabriel Reid
 Attachments: HBASE-7634.patch


 The current handling of changes to the region servers in a replication peer 
 cluster is quite inefficient. The list of region servers that are 
 being replicated to is only updated if there are a large number of issues 
 encountered while replicating.
 This can cause it to take quite a while to recognize that a number of the 
 regionservers in a peer cluster are no longer available. A potentially bigger 
 problem is that if a replication peer cluster is started with a small number 
 of regionservers, and then more region servers are added after replication 
 has started, the additional region servers will never be used for replication 
 (unless there are failures on the in-use regionservers).
 Part of the current issue is that the retry code in 
 ReplicationSource#shipEdits checks a randomly-chosen replication peer 
 regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a 
 replication write has failed on a different randomly-chosen replication peer. 
 If the peer is seen as not down, another randomly-chosen peer is used for 
 writing.
 A second part of the issue is that changes to the list of region servers in a 
 peer cluster are not detected at all, and are only picked up if a certain 
 number of failures have occurred when trying to ship edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient

2013-01-24 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-7634:


Status: Patch Available  (was: Open)

 Replication handling of changes to peer clusters is inefficient
 ---

 Key: HBASE-7634
 URL: https://issues.apache.org/jira/browse/HBASE-7634
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.96.0
Reporter: Gabriel Reid
 Attachments: HBASE-7634.patch, HBASE-7634.v2.patch


 The current handling of changes to the region servers in a replication peer 
 cluster is quite inefficient. The list of region servers that are 
 being replicated to is only updated if there are a large number of issues 
 encountered while replicating.
 This can cause it to take quite a while to recognize that a number of the 
 regionservers in a peer cluster are no longer available. A potentially bigger 
 problem is that if a replication peer cluster is started with a small number 
 of regionservers, and then more region servers are added after replication 
 has started, the additional region servers will never be used for replication 
 (unless there are failures on the in-use regionservers).
 Part of the current issue is that the retry code in 
 ReplicationSource#shipEdits checks a randomly-chosen replication peer 
 regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a 
 replication write has failed on a different randomly-chosen replication peer. 
 If the peer is seen as not down, another randomly-chosen peer is used for 
 writing.
 A second part of the issue is that changes to the list of region servers in a 
 peer cluster are not detected at all, and are only picked up if a certain 
 number of failures have occurred when trying to ship edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient

2013-01-24 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-7634:


Attachment: HBASE-7634.v2.patch

Updated patch that applies cleanly on current trunk

 Replication handling of changes to peer clusters is inefficient
 ---

 Key: HBASE-7634
 URL: https://issues.apache.org/jira/browse/HBASE-7634
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.96.0
Reporter: Gabriel Reid
 Attachments: HBASE-7634.patch, HBASE-7634.v2.patch


 The current handling of changes to the region servers in a replication peer 
 cluster is quite inefficient. The list of region servers that are 
 being replicated to is only updated if there are a large number of issues 
 encountered while replicating.
 This can cause it to take quite a while to recognize that a number of the 
 regionservers in a peer cluster are no longer available. A potentially bigger 
 problem is that if a replication peer cluster is started with a small number 
 of regionservers, and then more region servers are added after replication 
 has started, the additional region servers will never be used for replication 
 (unless there are failures on the in-use regionservers).
 Part of the current issue is that the retry code in 
 ReplicationSource#shipEdits checks a randomly-chosen replication peer 
 regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a 
 replication write has failed on a different randomly-chosen replication peer. 
 If the peer is seen as not down, another randomly-chosen peer is used for 
 writing.
 A second part of the issue is that changes to the list of region servers in a 
 peer cluster are not detected at all, and are only picked up if a certain 
 number of failures have occurred when trying to ship edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-7634) Replication handling of changes to peer clusters is inefficient

2013-01-21 Thread Gabriel Reid (JIRA)
Gabriel Reid created HBASE-7634:
---

 Summary: Replication handling of changes to peer clusters is 
inefficient
 Key: HBASE-7634
 URL: https://issues.apache.org/jira/browse/HBASE-7634
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.96.0
Reporter: Gabriel Reid
 Attachments: HBASE-7634.patch

The current handling of changes to the region servers in a replication peer 
cluster is quite inefficient. The list of region servers that are 
being replicated to is only updated if there are a large number of issues 
encountered while replicating.

This can cause it to take quite a while to recognize that a number of the 
regionservers in a peer cluster are no longer available. A potentially bigger 
problem is that if a replication peer cluster is started with a small number of 
regionservers, and then more region servers are added after replication has 
started, the additional region servers will never be used for replication 
(unless there are failures on the in-use regionservers).



Part of the current issue is that the retry code in ReplicationSource#shipEdits 
checks a randomly-chosen replication peer regionserver (in 
ReplicationSource#isSlaveDown) to see if it is up after a replication write has 
failed on a different randomly-chosen replication peer. If the peer is seen as 
not down, another randomly-chosen peer is used for writing.



A second part of the issue is that changes to the list of region servers in a 
peer cluster are not detected at all, and are only picked up if a certain 
number of failures have occurred when trying to ship edits.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7634) Replication handling of changes to peer clusters is inefficient

2013-01-21 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7634?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-7634:


Attachment: HBASE-7634.patch

Initial patch to resolve this issue. Adds a watcher on the replication peer's 
list of region servers, so that changes to that list are acted on. 

Also replaces the checking of randomly-chosen peer regionservers with the 
ability to report a bad peer regionserver. When a peer regionserver has been 
reported bad three times, it is no longer used for replication until the list 
of replication peer regionservers is refreshed.
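
The reporting logic is, in spirit, something like the following (a simplified sketch of 
the idea; the class, field and constant names here are assumptions, not the patch's 
actual ReplicationSinkManager code):

{code}
// Simplified sketch of bad-sink reporting; requires java.util.{ArrayList, HashMap, List, Map}.
class SinkTracker {
  static final int BAD_SINK_THRESHOLD = 3;

  private final Map<String, Integer> badReportCounts = new HashMap<String, Integer>();
  private List<String> sinks;   // peer regionservers currently used for replication

  SinkTracker(List<String> initialSinks) {
    this.sinks = new ArrayList<String>(initialSinks);
  }

  // Called when a replication write to the given peer regionserver fails.
  synchronized void reportBadSink(String sink) {
    Integer previous = badReportCounts.get(sink);
    int count = (previous == null ? 0 : previous) + 1;
    badReportCounts.put(sink, count);
    if (count >= BAD_SINK_THRESHOLD) {
      sinks.remove(sink);   // stop using it until the sink list is refreshed
    }
  }

  // Called when the watcher sees the peer cluster's regionserver list change.
  synchronized void refreshSinks(List<String> currentSinks) {
    this.sinks = new ArrayList<String>(currentSinks);
    badReportCounts.clear();
  }
}
{code}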


 Replication handling of changes to peer clusters is inefficient
 ---

 Key: HBASE-7634
 URL: https://issues.apache.org/jira/browse/HBASE-7634
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.96.0
Reporter: Gabriel Reid
 Attachments: HBASE-7634.patch


 The current handling of changes to the region servers in a replication peer 
 cluster is quite inefficient. The list of region servers that are 
 being replicated to is only updated if there are a large number of issues 
 encountered while replicating.
 This can cause it to take quite a while to recognize that a number of the 
 regionservers in a peer cluster are no longer available. A potentially bigger 
 problem is that if a replication peer cluster is started with a small number 
 of regionservers, and then more region servers are added after replication 
 has started, the additional region servers will never be used for replication 
 (unless there are failures on the in-use regionservers).
 Part of the current issue is that the retry code in 
 ReplicationSource#shipEdits checks a randomly-chosen replication peer 
 regionserver (in ReplicationSource#isSlaveDown) to see if it is up after a 
 replication write has failed on a different randomly-chosen replication peer. 
 If the peer is seen as not down, another randomly-chosen peer is used for 
 writing.
 A second part of the issue is that changes to the list of region servers in a 
 peer cluster are not detected at all, and are only picked up if a certain 
 number of failures have occurred when trying to ship edits.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7325) Replication reacts slowly on a lightly-loaded cluster

2013-01-14 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13552852#comment-13552852
 ] 

Gabriel Reid commented on HBASE-7325:
-

[~jdcryans] [~lhofhansl] Gentle nudge on this so it doesn't get forgotten -- is 
it still the plan to commit this to trunk (and possibly 0.94)? Or is there more 
of a feeling that this wouldn't be a good idea?


 Replication reacts slowly on a lightly-loaded cluster
 -

 Key: HBASE-7325
 URL: https://issues.apache.org/jira/browse/HBASE-7325
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Gabriel Reid
Priority: Minor
 Attachments: HBASE-7325.patch


 ReplicationSource uses a backing-off algorithm to sleep for an increasing 
 duration when an error is encountered in the replication run loop. However, 
 this backing-off is also performed when there is nothing found to replicate 
 in the HLog.
 Assuming default settings (1 second base retry sleep time, and maximum 
 multiplier of 10), this means that replication takes up to 10 seconds to 
 occur when there is a break of about 55 seconds without anything being 
 written. As there is no error condition, and there is apparently no 
 substantial load on the regionserver in this situation, it would probably 
 make more sense to not back off in non-error situations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7476) HBase shell count command doesn't escape binary output

2013-01-06 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-7476:


Attachment: HBASE-7476_1.patch

[~stack] Here's an updated patch that uses Bytes.toBinaryString as suggested

 HBase shell count command doesn't escape binary output
 --

 Key: HBASE-7476
 URL: https://issues.apache.org/jira/browse/HBASE-7476
 Project: HBase
  Issue Type: Bug
  Components: shell
Reporter: Gabriel Reid
Priority: Minor
 Attachments: HBASE-7476_1.patch, HBASE-7476.patch


 When running the count command in the HBase shell, the row key is printed 
 each time a count interval is reached. However, the key is printed verbatim, 
 meaning that non-printable characters are directly printed to the terminal. 
 This can cause confusing results, or even leave the terminal in an unusable 
 state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-7476) HBase shell count command doesn't escape binary output

2013-01-02 Thread Gabriel Reid (JIRA)
Gabriel Reid created HBASE-7476:
---

 Summary: HBase shell count command doesn't escape binary output
 Key: HBASE-7476
 URL: https://issues.apache.org/jira/browse/HBASE-7476
 Project: HBase
  Issue Type: Bug
  Components: shell
Reporter: Gabriel Reid
Priority: Minor


When running the count command in the HBase shell, the row key is printed 
each time a count interval is reached. However, the key is printed verbatim, 
meaning that non-printable characters are directly printed to the terminal. 
This can cause confusing results, or even leave the terminal in an unusable 
state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7476) HBase shell count command doesn't escape binary output

2013-01-02 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7476?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-7476:


Attachment: HBASE-7476.patch

Simple patch to escape the row key when it is printed. I'm pretty illiterate in 
Ruby, so there may well be a better way of doing this.
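
The underlying idea, expressed in Java rather than the shell's Ruby (a hedged 
illustration; the result variable is assumed to come from a scan): 
org.apache.hadoop.hbase.util.Bytes.toStringBinary keeps printable ASCII and renders 
every other byte as a \xNN escape, so nothing unprintable reaches the terminal.

{code}
byte[] rowKey = result.getRow();   // result assumed to come from a scan
// Raw output may contain control bytes that garble or lock up the terminal:
System.out.println(new String(rowKey));
// Escaped output is always safe to print:
System.out.println(Bytes.toStringBinary(rowKey));
{code}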

 HBase shell count command doesn't escape binary output
 --

 Key: HBASE-7476
 URL: https://issues.apache.org/jira/browse/HBASE-7476
 Project: HBase
  Issue Type: Bug
  Components: shell
Reporter: Gabriel Reid
Priority: Minor
 Attachments: HBASE-7476.patch


 When running the count command in the HBase shell, the row key is printed 
 each time a count interval is reached. However, the key is printed verbatim, 
 meaning that non-printable characters are directly printed to the terminal. 
 This can cause confusing results, or even leave the terminal in an unusable 
 state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7476) HBase shell count command doesn't escape binary output

2013-01-02 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7476?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13542367#comment-13542367
 ] 

Gabriel Reid commented on HBASE-7476:
-

Absolutely, anything that gets the job done sounds good to me, thanks!

 HBase shell count command doesn't escape binary output
 --

 Key: HBASE-7476
 URL: https://issues.apache.org/jira/browse/HBASE-7476
 Project: HBase
  Issue Type: Bug
  Components: shell
Reporter: Gabriel Reid
Priority: Minor
 Attachments: HBASE-7476.patch


 When running the count command in the HBase shell, the row key is printed 
 each time a count interval is reached. However, the key is printed verbatim, 
 meaning that non-printable characters are directly printed to the terminal. 
 This can cause confusing results, or even leave the terminal in an unusable 
 state.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7325) Replication reacts slowly on a lightly-loaded cluster

2012-12-12 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13529742#comment-13529742
 ] 

Gabriel Reid commented on HBASE-7325:
-

I've tested it against the TestReplication unit tests, as well as doing some 
additional testing with HBaseTestingUtility to verify the expected performance 
improvement.

Indeed, I think the once-per-second load being put on the namenode should be a 
non-issue, and worth it for the gain that you get with faster replication on a 
quiet cluster.

 Replication reacts slowly on a lightly-loaded cluster
 -

 Key: HBASE-7325
 URL: https://issues.apache.org/jira/browse/HBASE-7325
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Gabriel Reid
Priority: Minor
 Attachments: HBASE-7325.patch


 ReplicationSource uses a backing-off algorithm to sleep for an increasing 
 duration when an error is encountered in the replication run loop. However, 
 this backing-off is also performed when there is nothing found to replicate 
 in the HLog.
 Assuming default settings (1 second base retry sleep time, and maximum 
 multiplier of 10), this means that replication takes up to 10 seconds to 
 occur when there is a break of about 55 seconds without anything being 
 written. As there is no error condition, and there is apparently no 
 substantial load on the regionserver in this situation, it would probably 
 make more sense to not back off in non-error situations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7325) Replication reacts slowly on a lightly-loaded cluster

2012-12-12 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13530769#comment-13530769
 ] 

Gabriel Reid commented on HBASE-7325:
-

[~lhofhansl] I follow your point -- however, this is a situation where the 
cluster is almost totally idle. 

If each region server is getting at least one mutation event per second (which 
I would assume is still a very light load) then the polling is going to be 
happening once per second anyhow. If the cluster is more heavily loaded, then 
the polling is going to be occurring at the rate at which edits can be shipped 
to peers. 

This makes me think that if the 1 second interval polling on an idle cluster is 
a problem, then replication on a loaded cluster will be a much bigger problem.

 Replication reacts slowly on a lightly-loaded cluster
 -

 Key: HBASE-7325
 URL: https://issues.apache.org/jira/browse/HBASE-7325
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Gabriel Reid
Priority: Minor
 Attachments: HBASE-7325.patch


 ReplicationSource uses a backing-off algorithm to sleep for an increasing 
 duration when an error is encountered in the replication run loop. However, 
 this backing-off is also performed when there is nothing found to replicate 
 in the HLog.
 Assuming default settings (1 second base retry sleep time, and maximum 
 multiplier of 10), this means that replication takes up to 10 seconds to 
 occur when there is a break of about 55 seconds without anything being 
 written. As there is no error condition, and there is apparently no 
 substantial load on the regionserver in this situation, it would probably 
 make more sense to not back off in non-error situations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-7325) Replication reacts slowly on a lightly-loaded cluster

2012-12-11 Thread Gabriel Reid (JIRA)
Gabriel Reid created HBASE-7325:
---

 Summary: Replication reacts slowly on a lightly-loaded cluster
 Key: HBASE-7325
 URL: https://issues.apache.org/jira/browse/HBASE-7325
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Gabriel Reid
Priority: Minor


ReplicationSource uses a backing-off algorithm to sleep for an increasing 
duration when an error is encountered in the replication run loop. However, 
this backing-off is also performed when there is nothing found to replicate in 
the HLog.

Assuming default settings (1 second base retry sleep time, and maximum 
multiplier of 10), this means that replication takes up to 10 seconds to occur 
when there is a break of about 55 seconds without anything being written. As 
there is no error condition, and there is apparently no substantial load on the 
regionserver in this situation, it would probably make more sense to not back 
off in non-error situations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7325) Replication reacts slowly on a lightly-loaded cluster

2012-12-11 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7325?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-7325:


Attachment: HBASE-7325.patch

Patch to reset the sleep multiplier if there is no data available to replicate 
(and it is not an error situation)
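
In rough terms the change amounts to the following decision after each pass over the 
HLog (a sketch under assumed names; ReplicationSource#sleepForRetries is the existing 
sleep method, the helper below is purely illustrative):

{code}
// Illustrative sketch, with assumed names, of when the back-off multiplier
// should grow versus be reset after a pass over the HLog.
static int nextSleepMultiplier(boolean foundEntries, boolean hitError,
                               int sleepMultiplier, int maxRetriesMultiplier) {
  if (foundEntries || !hitError) {
    // Edits were shipped, or the log was simply empty on an idle cluster:
    // reset so the next poll happens after the base interval (1s by default).
    return 1;
  }
  // A real error occurred: keep backing off, capped at the configured maximum.
  return Math.min(sleepMultiplier + 1, maxRetriesMultiplier);
}
{code}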

 Replication reacts slowly on a lightly-loaded cluster
 -

 Key: HBASE-7325
 URL: https://issues.apache.org/jira/browse/HBASE-7325
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Gabriel Reid
Priority: Minor
 Attachments: HBASE-7325.patch


 ReplicationSource uses a backing-off algorithm to sleep for an increasing 
 duration when an error is encountered in the replication run loop. However, 
 this backing-off is also performed when there is nothing found to replicate 
 in the HLog.
 Assuming default settings (1 second base retry sleep time, and maximum 
 multiplier of 10), this means that replication takes up to 10 seconds to 
 occur when there is a break of about 55 seconds without anything being 
 written. As there is no error condition, and there is apparently no 
 substantial load on the regionserver in this situation, it would probably 
 make more sense to not back off in non-error situations.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-7292) HTablePool class description javadoc is unclear

2012-12-06 Thread Gabriel Reid (JIRA)
Gabriel Reid created HBASE-7292:
---

 Summary: HTablePool class description javadoc is unclear
 Key: HBASE-7292
 URL: https://issues.apache.org/jira/browse/HBASE-7292
 Project: HBase
  Issue Type: Improvement
Reporter: Gabriel Reid
Priority: Minor


The class description javadoc for HTablePool contains a sentence that makes no 
sense in the context (it appears to be part of an incorrectly-applied patch 
from the past). The sentence references the correct way of returning HTables to 
the pool, but it actually makes it more difficult to understand what the 
correct way of returning tables to the pool actually is.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7292) HTablePool class description javadoc is unclear

2012-12-06 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-7292:


Attachment: HBASE-7292.patch

 HTablePool class description javadoc is unclear
 ---

 Key: HBASE-7292
 URL: https://issues.apache.org/jira/browse/HBASE-7292
 Project: HBase
  Issue Type: Improvement
Reporter: Gabriel Reid
Priority: Minor
 Attachments: HBASE-7292.patch


 The class description javadoc for HTablePool contains a sentence that makes 
 no sense in the context (it appears to be part of an incorrectly-applied 
 patch from the past). The sentence references the correct way of returning 
 HTables to the pool, but it actually makes it more difficult to understand 
 what the correct way of returning tables to the pool actually is.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7292) HTablePool class description javadoc is unclear

2012-12-06 Thread Gabriel Reid (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7292?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13511470#comment-13511470
 ] 

Gabriel Reid commented on HBASE-7292:
-

Added small patch to fix up the HTablePool javadoc.

 HTablePool class description javadoc is unclear
 ---

 Key: HBASE-7292
 URL: https://issues.apache.org/jira/browse/HBASE-7292
 Project: HBase
  Issue Type: Improvement
Reporter: Gabriel Reid
Priority: Minor
 Attachments: HBASE-7292.patch


 The class description javadoc for HTablePool contains a sentence that makes 
 no sense in the context (it appears to be part of an incorrectly-applied 
 patch from the past). The sentence references the correct way of returning 
 HTables to the pool, but it actually makes it more difficult to understand 
 what the correct way of returning tables to the pool actually is.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-6953) Incorrect javadoc description of HFileOutputFormat regarding multiple column families

2012-10-04 Thread Gabriel Reid (JIRA)
Gabriel Reid created HBASE-6953:
---

 Summary: Incorrect javadoc description of HFileOutputFormat 
regarding multiple column families
 Key: HBASE-6953
 URL: https://issues.apache.org/jira/browse/HBASE-6953
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Reporter: Gabriel Reid
Priority: Minor


The javadoc for HFileOutputFormat states that the class does not support 
writing multiple column families; however, this hasn't been the case since 
HBASE-1861 was resolved.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-6953) Incorrect javadoc description of HFileOutputFormat regarding multiple column families

2012-10-04 Thread Gabriel Reid (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-6953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Gabriel Reid updated HBASE-6953:


Attachment: HBASE-6953.patch

Attached patch removes the incorrect statement about multiple column families 
and adds a pointer to the configureIncrementalLoad convenience method (as this 
is what most users of the class will be interested in).
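
For reference, the convenience method is used roughly like this (a hedged sketch 
against the 0.9x-era mapreduce API; the mapper class, table name and output path are 
made-up examples):

{code}
// Hedged sketch of preparing an HFile bulk load; MyMapper, "my_table" and the
// output path are hypothetical. Works for tables with one or more column families.
static void prepareBulkLoad(Configuration conf) throws Exception {
  Job job = new Job(conf, "bulk-load-prepare");
  job.setMapperClass(MyMapper.class);   // mapper emitting row keys and KeyValues/Puts
  HTable table = new HTable(conf, "my_table");

  // Wires in the reducer, total-order partitioner and HFileOutputFormat so the
  // job writes HFiles aligned with the table's current region boundaries.
  HFileOutputFormat.configureIncrementalLoad(job, table);

  FileOutputFormat.setOutputPath(job, new Path("/tmp/hfile-output"));
  job.waitForCompletion(true);
}
{code}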

 Incorrect javadoc description of HFileOutputFormat regarding multiple column 
 families
 -

 Key: HBASE-6953
 URL: https://issues.apache.org/jira/browse/HBASE-6953
 Project: HBase
  Issue Type: Bug
  Components: mapreduce
Reporter: Gabriel Reid
Priority: Minor
 Attachments: HBASE-6953.patch


 The javadoc for HFileOutputFormat states that the class does not support 
 writing multiple column families; however, this hasn't been the case since 
 HBASE-1861 was resolved.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira