[jira] [Commented] (HBASE-11394) Replication can have data loss if peer id contains hyphen -

2014-06-23 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11394?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14041580#comment-14041580
 ] 

Jieshan Bean commented on HBASE-11394:
--

 I have the same concern. We have added a restriction on the peer-id format in our 
 private version.
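
For reference, a minimal sketch of the kind of check we added (the method name and the exact rule are assumptions, not the actual private patch):
{code}
// Hypothetical validation: recovered replication queues are stored under znodes
// named peerId-serverName-..., so a '-' inside the peer id makes that name
// ambiguous. Reject it up front when the peer is added.
private static void checkPeerId(String peerId) {
  if (peerId == null || peerId.isEmpty()) {
    throw new IllegalArgumentException("peer id must not be empty");
  }
  if (peerId.contains("-")) {
    throw new IllegalArgumentException("peer id must not contain '-': " + peerId);
  }
}
{code}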



 Replication can have data loss if peer id contains hyphen -
 -

 Key: HBASE-11394
 URL: https://issues.apache.org/jira/browse/HBASE-11394
 Project: HBase
  Issue Type: Bug
Reporter: Enis Soztutar
 Fix For: 0.99.0, 0.98.4


 This is an extension to HBASE-8207. It seems that there is no check on the 
 format of the peer id string (the short name for the replication peer). So if 
 a peer id contains a hyphen, it can silently cause data loss on server 
 failure. 
 I did not verify the claim via testing, though; this is purely from 
 reading the code. 



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Commented] (HBASE-11344) Hide row keys and such from the web UIs

2014-06-19 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-11344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14038214#comment-14038214
 ] 

Jieshan Bean commented on HBASE-11344:
--

+1 on this idea. We suffer from the same security problem.

 Hide row keys and such from the web UIs
 ---

 Key: HBASE-11344
 URL: https://issues.apache.org/jira/browse/HBASE-11344
 Project: HBase
  Issue Type: Improvement
Reporter: Devaraj Das
 Fix For: 0.99.0


 The table details page on the master UI lists the start row keys of the regions. 
 The row keys might contain sensitive data. We should hide them based on whether 
 or not the accessing user has the required authorization to view the table. 
 To start with, we could make the display of row keys and such depend on a 
 configuration flag. If it is false, such potentially sensitive 
 data is never displayed on the web UI.
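
A minimal sketch of what such a configuration gate might look like; the property name and the surrounding variables (conf, regionInfo) are illustrative assumptions, not an existing setting:
{code}
// Hide potentially sensitive start keys on the master UI unless explicitly allowed.
// "hbase.master.ui.show.rowkeys" is an illustrative name, not a real property.
boolean showRowKeys = conf.getBoolean("hbase.master.ui.show.rowkeys", false);
String displayedStartKey = showRowKeys
    ? Bytes.toStringBinary(regionInfo.getStartKey())
    : "<hidden>";
{code}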



--
This message was sent by Atlassian JIRA
(v6.2#6252)


[jira] [Assigned] (HBASE-9081) Online split for an reserved empty region

2013-07-30 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean reassigned HBASE-9081:
---

Assignee: Jieshan Bean

 Online split for an reserved empty region
 -

 Key: HBASE-9081
 URL: https://issues.apache.org/jira/browse/HBASE-9081
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver
Reporter: Jieshan Bean
Assignee: Jieshan Bean

 We already have a region splitter tool, but it only provides limited 
 functions:
 1. Create a table with a specified number of regions without giving any splits.
 2. Roll-split an existing region.
 We have the following user scenario: 
 The table was created with splits like below: 
 a, b, c, d, e, f, g, o
 g~o is a reserved empty region. We will use it only after some days, so we don't 
 know its rowkey distribution yet. We will split it only when it gets used.
 Say we want to split g~o into 10 new regions, like g, g1, g2, g3, g4, 
 g5, ..., g9, o.
 I didn't find that a similar function already exists. Please tell me if I am 
 wrong.
 Hope to hear your ideas on this :)
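
 For illustration only, one way to drive such a split today is to issue explicit split points against the reserved region; the table name and the HBaseAdmin-based loop below are assumptions (a sketch, not an existing tool):
 {code}
 // Split the reserved g~o region into ten regions by issuing split points g1..g9.
 // Uses the 0.94-era HBaseAdmin API; each call splits the region that currently
 // contains the given key, so in practice one must wait for a split to finish
 // (daughters online) before submitting the next point.
 HBaseAdmin admin = new HBaseAdmin(conf);
 try {
   for (int i = 1; i <= 9; i++) {
     admin.split(Bytes.toBytes("myTable"), Bytes.toBytes("g" + i));
     // wait here for the split to complete before issuing the next point
   }
 } finally {
   admin.close();
 }
 {code}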

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-9081) Online split for an reserved empty region

2013-07-29 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean updated HBASE-9081:


Summary: Online split for an reserved empty region  (was: Online split for 
an reserved empty region with splits = 1)

 Online split for an reserved empty region
 -

 Key: HBASE-9081
 URL: https://issues.apache.org/jira/browse/HBASE-9081
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver
Reporter: Jieshan Bean

 We already have a region splitter tool, but it only provides limited 
 functions:
 1. Create a table with a specified number of regions without giving any splits.
 2. Roll-split an existing region.
 We have the following user scenario: 
 The table was created with splits like below: 
 a, b, c, d, e, f, g, o
 g~o is a reserved empty region. We will use it only after some days, so we don't 
 know its rowkey distribution yet. We will split it only when it gets used.
 Say we want to split g~o into 10 new regions, like g, g1, g2, g3, g4, 
 g5, ..., g9, o.
 I didn't find that a similar function already exists. Please tell me if I am 
 wrong.
 Hope to hear your ideas on this :)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-9081) Online split for an reserved empty region with splits = 1

2013-07-29 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-9081?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean updated HBASE-9081:


Summary: Online split for an reserved empty region with splits = 1  (was: 
Online split for an reserved empty region with splits > 1)

 Online split for an reserved empty region with splits = 1
 --

 Key: HBASE-9081
 URL: https://issues.apache.org/jira/browse/HBASE-9081
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver
Reporter: Jieshan Bean

 We already have a region splitter tool, but it only provides limited 
 functions:
 1. Create a table with a specified number of regions without giving any splits.
 2. Roll-split an existing region.
 We have the following user scenario: 
 The table was created with splits like below: 
 a, b, c, d, e, f, g, o
 g~o is a reserved empty region. We will use it only after some days, so we don't 
 know its rowkey distribution yet. We will split it only when it gets used.
 Say we want to split g~o into 10 new regions, like g, g1, g2, g3, g4, 
 g5, ..., g9, o.
 I didn't find that a similar function already exists. Please tell me if I am 
 wrong.
 Hope to hear your ideas on this :)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-9081) Online split for an reserved empty region with splits > 1

2013-07-29 Thread Jieshan Bean (JIRA)
Jieshan Bean created HBASE-9081:
---

 Summary: Online split for an reserved empty region with splits > 1
 Key: HBASE-9081
 URL: https://issues.apache.org/jira/browse/HBASE-9081
 Project: HBase
  Issue Type: New Feature
  Components: master, regionserver
Reporter: Jieshan Bean


We already have a region splitter tool, but it only provides limited 
functions:
1. Create a table with a specified number of regions without giving any splits.
2. Roll-split an existing region.

We have the following user scenario: 
The table was created with splits like below: 
a, b, c, d, e, f, g, o
g~o is a reserved empty region. We will use it only after some days, so we don't 
know its rowkey distribution yet. We will split it only when it gets used.

Say we want to split g~o into 10 new regions, like g, g1, g2, g3, g4, 
g5, ..., g9, o.

I didn't find that a similar function already exists. Please tell me if I am 
wrong.
Hope to hear your ideas on this :)

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8927) Use nano time instead of mili time everywhere

2013-07-11 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8927?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13705568#comment-13705568
 ] 

Jieshan Bean commented on HBASE-8927:
-

I think System.nanoTime() cannot be used as a timestamp. Its origin is 
arbitrary, so it is only meaningful for measuring elapsed time, not as an 
absolute wall-clock value.
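
A quick illustration (not from the patch):
{code}
long wallClockMs = System.currentTimeMillis(); // epoch-based, comparable across machines
long monotonicNs = System.nanoTime();          // arbitrary origin, only differences matter
// nanoTime() is fine for measuring elapsed time within one JVM,
// but it cannot be interpreted as a KeyValue timestamp or a date.
long elapsedNs = System.nanoTime() - monotonicNs;
{code}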

 Use nano time instead of mili time everywhere
 -

 Key: HBASE-8927
 URL: https://issues.apache.org/jira/browse/HBASE-8927
 Project: HBase
  Issue Type: Bug
Reporter: stack
 Attachments: 8927.txt


 Fewer collisions, and we are paying the price of a long anyway, so we might as 
 well fill it.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8892) should pick the files as older as possible also while hasReferences

2013-07-08 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13701845#comment-13701845
 ] 

Jieshan Bean commented on HBASE-8892:
-

Generally speaking, an older file is bigger than a newer one, and the files at 
the beginning may include reference files. So I don't think this change is 
reasonable, if I understand the code correctly :).
[~xieliang007], any other reasons?

 should pick the files as older as possible also while hasReferences
 ---

 Key: HBASE-8892
 URL: https://issues.apache.org/jira/browse/HBASE-8892
 Project: HBase
  Issue Type: Bug
  Components: Compaction
Affects Versions: 0.94.9
Reporter: Liang Xie
Assignee: Liang Xie
Priority: Minor
 Attachments: HBase-8892-0.94.txt


 Currently, when hasReferences is true for compactSelection and 
 compactSelection.getFilesToCompact() has more than maxFilesToCompact files, 
 we clear the files from the beginning. This differs from the normal 
 ratio-based minor compaction policy, which tries to select files from oldest 
 to newest as far as possible.
 {code}
 } else if (compactSelection.getFilesToCompact().size() > this.maxFilesToCompact) {
   // all files included in this compaction, up to max
   int pastMax = compactSelection.getFilesToCompact().size() - this.maxFilesToCompact;
   compactSelection.getFilesToCompact().subList(0, pastMax).clear();
 {code}
 This makes the files at the beginning harder to pick up in future minor 
 compaction stages.
 IMHO, it should be like this:
 {code}
 compactSelection.getFilesToCompact()
     .subList(this.maxFilesToCompact, compactSelection.getFilesToCompact().size())
     .clear();
 {code}
 It's not a big issue, since it occurs only while hasReferences returns true.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8772) Separate Replication from HBase RegionServer process

2013-06-25 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8772?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13692886#comment-13692886
 ] 

Jieshan Bean commented on HBASE-8772:
-

Agreed with J-D. Below are some key points to consider, in my own 
understanding:
1. A separate process only for ReplicationSource? ReplicationSink could also be 
impacted by GC triggered by the RegionServer, although ReplicationSink is not a 
separate thread currently.
2. Do we need to introduce a new RPC interface? RegionInterface cannot be used 
any more.
3. The process needs to track the logs itself.
4. Queue failover is more complicated, since the RegionServer may have aborted 
while the replication process is still alive, and vice versa. So each replication 
process should be registered in ZooKeeper and tracked by each RegionServer (a 
rough sketch follows below).
5. Support for security.
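
A rough sketch of the registration mentioned in point 4, purely illustrative; the znode path and the zkw/processName/startCode variables are assumptions:
{code}
// Each standalone replication process registers an ephemeral znode so that
// RegionServers (and the master) can watch it and trigger queue failover
// when the process dies.
String znode = ZKUtil.joinZNode("/hbase/replication/processes", processName);
ZKUtil.createEphemeralNodeAndWatch(zkw, znode, Bytes.toBytes(startCode));
{code}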

 Separate Replication from HBase RegionServer process
 

 Key: HBASE-8772
 URL: https://issues.apache.org/jira/browse/HBASE-8772
 Project: HBase
  Issue Type: New Feature
  Components: regionserver, Replication
Reporter: Sameer Vaishampayan
  Labels: performance

 Replication is a separate piece of functionality from managing regions and should 
 be manageable separately, as a service, rather than rolled into the 
 RegionServer. Load on the RegionServer, GC, etc. shouldn't affect the replication 
 service.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8774) Add BatchSize and Filter to Thrift2

2013-06-20 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13689107#comment-13689107
 ] 

Jieshan Bean commented on HBASE-8774:
-

See HBASE-6073; it is also about adding filter support to Thrift2.

One minor problem in the patch:
{code}
+boolean this_present_filterString = true && this.isSetFilterString();
+boolean that_present_filterString = true && that.isSetFilterString();
{code}
The 'true &&' part is redundant.

In addition, I suggest adding a unit test. Anyway, it's a nice patch.
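
For clarity, the simplified form of the snippet above would read:
{code}
boolean this_present_filterString = this.isSetFilterString();
boolean that_present_filterString = that.isSetFilterString();
{code}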

 Add BatchSize and Filter to Thrift2
 ---

 Key: HBASE-8774
 URL: https://issues.apache.org/jira/browse/HBASE-8774
 Project: HBase
  Issue Type: New Feature
  Components: Thrift
Affects Versions: 0.95.1
Reporter: Hamed Madani
 Attachments: HBASE_8774.patch


 Attached Patch will add BatchSize and Filter support to Thrift2

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8562) Close readers after compaction

2013-05-17 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13661258#comment-13661258
 ] 

Jieshan Bean commented on HBASE-8562:
-

StoreFileScanners use a shared reader at the store-file level, so they do not 
need to close the reader, right?

 Close readers after compaction
 --

 Key: HBASE-8562
 URL: https://issues.apache.org/jira/browse/HBASE-8562
 Project: HBase
  Issue Type: Bug
Reporter: Amitanand Aiyer
Assignee: Amitanand Aiyer
Priority: Trivial

 StoreFileScanners open readers to read the store file. However, these readers 
 are not closed by StoreFileScanner.close().
 They should be closed at the end of the compaction.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8563) Double count of read requests for Gets

2013-05-16 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13660218#comment-13660218
 ] 

Jieshan Bean commented on HBASE-8563:
-

Makes sense to me. +1.

 Double count of read requests for Gets 
 ---

 Key: HBASE-8563
 URL: https://issues.apache.org/jira/browse/HBASE-8563
 Project: HBase
  Issue Type: Bug
Reporter: Francis Liu
Assignee: Francis Liu
 Fix For: 0.94.7

 Attachments: HBASE-8563_94.patch


 Whenever a RegionScanner is created via HRegion.getScanner(), the read 
 request count is incremented. Since Get is implemented internally as a scan, 
 each Get request is counted twice. Scans get an extra count as well.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8253) A corrupted log blocked ReplicationSource

2013-05-09 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13652954#comment-13652954
 ] 

Jieshan Bean commented on HBASE-8253:
-

bq.the edit was then written into the next log and durability was ensured?
Yes.

bq.So we just need to skip over this one?
Yes. Just skip this one:)
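
To make the "just skip this one" idea concrete, a rough sketch (variable names are illustrative; this is not the attached patch):
{code}
// When the truncated entry is the trailing edit of a log that is not the last
// one in the queue, treat the EOF as end-of-file and advance to the next log
// instead of retrying the same position forever.
HLog.Entry entry = null;
try {
  entry = reader.next();
} catch (EOFException eof) {
  LOG.warn("Truncated trailing edit in " + currentPath + ", skipping to the next log", eof);
}
// a null entry is then handled like a normal end of file (processEndOfFile()).
{code}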



 A corrupted log blocked ReplicationSource
 -

 Key: HBASE-8253
 URL: https://issues.apache.org/jira/browse/HBASE-8253
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: HBASE-8253-94.patch


 A log that was being written got corrupted when we forcibly powered down one node. 
 Only part of the last WALEdit was written into that log, and that log was not the 
 last one in the replication queue. 
 ReplicationSource was blocked under this scenario. A lot of log messages like the 
 ones below were printed:
 {noformat}
 2013-03-30 06:53:48,628 WARN  
 [regionserver26003-EventThread.replicationSource,1] 1 Got:  
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:334)
 java.io.EOFException: 
 hdfs://hacluster/hbase/.logs/master11,26003,1364530862620/master11%2C26003%2C1364530862620.1364553936510,
  entryStart=40434738, pos=40450048, end=40450048, edit=0
   at sun.reflect.GeneratedConstructorAccessor42.newInstance(Unknown 
 Source)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:295)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:240)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.readNextAndSetPosition(ReplicationHLogReaderManager.java:84)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:412)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:330)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readFully(DataInputStream.java:180)
   at 
 org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:68)
   at 
 org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:106)
   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2282)
   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2181)
   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2227)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:238)
   ... 3 more
 ..
 2013-03-30 06:54:38,899 WARN  
 [regionserver26003-EventThread.replicationSource,1] 1 Got:  
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:334)
 java.io.EOFException: 
 hdfs://hacluster/hbase/.logs/master11,26003,1364530862620/master11%2C26003%2C1364530862620.1364553936510,
  entryStart=40434738, pos=40450048, end=40450048, edit=0
   at sun.reflect.GeneratedConstructorAccessor42.newInstance(Unknown 
 Source)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:295)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:240)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.readNextAndSetPosition(ReplicationHLogReaderManager.java:84)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:412)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:330)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readFully(DataInputStream.java:180)
   at 
 org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:68)
   at 
 org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:106)
   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2282)
   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2181)
   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2227)
   at 
 

[jira] [Commented] (HBASE-8253) A corrupted log blocked ReplicationSource

2013-05-02 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13648093#comment-13648093
 ] 

Jieshan Bean commented on HBASE-8253:
-

bq.how come you got this error in a normal source?
Yes, it happened in a normal source, not a recovered one. The primary data node was 
forcibly powered down, so a log roll was requested, and during that time only 
part of the last edit was written. Then we saw this problem.

 A corrupted log blocked ReplicationSource
 -

 Key: HBASE-8253
 URL: https://issues.apache.org/jira/browse/HBASE-8253
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: HBASE-8253-94.patch


 A log that was being written got corrupted when we forcibly powered down one node. 
 Only part of the last WALEdit was written into that log, and that log was not the 
 last one in the replication queue. 
 ReplicationSource was blocked under this scenario. A lot of log messages like the 
 ones below were printed:
 {noformat}
 2013-03-30 06:53:48,628 WARN  
 [regionserver26003-EventThread.replicationSource,1] 1 Got:  
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:334)
 java.io.EOFException: 
 hdfs://hacluster/hbase/.logs/master11,26003,1364530862620/master11%2C26003%2C1364530862620.1364553936510,
  entryStart=40434738, pos=40450048, end=40450048, edit=0
   at sun.reflect.GeneratedConstructorAccessor42.newInstance(Unknown 
 Source)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:295)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:240)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.readNextAndSetPosition(ReplicationHLogReaderManager.java:84)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:412)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:330)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readFully(DataInputStream.java:180)
   at 
 org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:68)
   at 
 org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:106)
   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2282)
   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2181)
   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2227)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:238)
   ... 3 more
 ..
 2013-03-30 06:54:38,899 WARN  
 [regionserver26003-EventThread.replicationSource,1] 1 Got:  
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:334)
 java.io.EOFException: 
 hdfs://hacluster/hbase/.logs/master11,26003,1364530862620/master11%2C26003%2C1364530862620.1364553936510,
  entryStart=40434738, pos=40450048, end=40450048, edit=0
   at sun.reflect.GeneratedConstructorAccessor42.newInstance(Unknown 
 Source)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:295)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:240)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.readNextAndSetPosition(ReplicationHLogReaderManager.java:84)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:412)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:330)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readFully(DataInputStream.java:180)
   at 
 org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:68)
   at 
 org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:106)
   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2282)
   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2181)
   at 

[jira] [Commented] (HBASE-6428) Pluggable Compaction policies

2013-04-24 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13641303#comment-13641303
 ] 

Jieshan Bean commented on HBASE-6428:
-

Thanks for your reply, this is really a good feature:)

 Pluggable Compaction policies
 -

 Key: HBASE-6428
 URL: https://issues.apache.org/jira/browse/HBASE-6428
 Project: HBase
  Issue Type: New Feature
Reporter: Lars Hofhansl

 For some use cases it is useful to allow more control over how KVs get compacted.
 For example, one could envision storing old versions of a KV in separate HFiles, 
 which then rarely have to be touched/cached by queries for new data.
 In addition, these date-ranged HFiles can easily be used for backups while 
 maintaining historical data.
 This would be a major change, allowing compactions to provide multiple 
 targets (not just a filter).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6428) Pluggable Compaction policies

2013-04-23 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6428?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13638881#comment-13638881
 ] 

Jieshan Bean commented on HBASE-6428:
-

[~lhofhansl] Any updates on this?

 Pluggable Compaction policies
 -

 Key: HBASE-6428
 URL: https://issues.apache.org/jira/browse/HBASE-6428
 Project: HBase
  Issue Type: New Feature
Reporter: Lars Hofhansl

 For some use cases it is useful to allow more control over how KVs get compacted.
 For example, one could envision storing old versions of a KV in separate HFiles, 
 which then rarely have to be touched/cached by queries for new data.
 In addition, these date-ranged HFiles can easily be used for backups while 
 maintaining historical data.
 This would be a major change, allowing compactions to provide multiple 
 targets (not just a filter).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8361) Bulk load and other utilities should not create tables for user

2013-04-17 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8361?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13634851#comment-13634851
 ] 

Jieshan Bean commented on HBASE-8361:
-

bq.The tools should error when the destination table does not exist.

+1. Such tools should not silently create a table that may not be the one the 
user expected.

 Bulk load and other utilities should not create tables for user
 ---

 Key: HBASE-8361
 URL: https://issues.apache.org/jira/browse/HBASE-8361
 Project: HBase
  Issue Type: Improvement
  Components: mapreduce
Reporter: Nick Dimiduk

 {{LoadIncrementalHFiles}} and {{ImportTsv}} will create a table with the 
 default settings when the target table does not exist. I think this is an 
 anti-feature. Neither tool provides a mechanism for the user to configure the 
 creation parameters of that table, resulting in a new table with the default 
 settings. I think it is unlikely that the default settings are what the user 
 actually wants. In the event of a table-name typo, that means data is 
 silently loaded into the wrong place. The tools should error when the 
 destination table does not exist.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8336) PooledHTable may be returned multiple times to the same pool

2013-04-14 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631451#comment-13631451
 ] 

Jieshan Bean commented on HBASE-8336:
-

[~ngrigor...@gmail.com] Nice find.
We encountered the same problem before. +1 on the idea of adding a flag to 
represent its state.
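
For illustration, the guard could look roughly like the following inside PooledHTable (a sketch of the idea only, not the committed fix):
{code}
// Remember whether this handle was already returned and fail fast instead of
// putting a second reference to the same HTable back into the pool.
private boolean open = true;

@Override
public void close() throws IOException {
  if (!open) {
    throw new IllegalStateException("PooledHTable was already returned to the pool");
  }
  open = false;
  returnTable(table);
}
{code}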

 PooledHTable may be returned multiple times to the same pool
 

 Key: HBASE-8336
 URL: https://issues.apache.org/jira/browse/HBASE-8336
 Project: HBase
  Issue Type: Bug
  Components: Client
Affects Versions: 0.95.0
Reporter: Nikolai Grigoriev
Priority: Minor

 I have recently observed a very strange issue with an application using HBase 
 and HTablePool. After an investigation I have found that the root cause was 
 the piece of code that was calling close() twice on the same HTableInterface 
 instance retrieved from HTablePool (created with default policy).
 A closer look at the code revealed that PooledHTable.close() calls 
 returnTable(), which, in turn, places the table back into the QUEUE of the 
 pooled tables. No checking of any kind is done so it is possible to call it 
 multiple times and place multiple references to the same HTable into the same 
 pool.
 This creates a number of negative effects:
 - the pool grows on each close() call and eventually gets filled up with 
 references to the same HTable. From this moment the pool stops working as a 
 pool.
 - multiple callers will get the same instance of HTable while expecting to 
 have unique instances.
 - once the pool is full, the next call to close() will result in a call to the 
 real close() method of HTable. This makes the HTable unusable, as the close() 
 call may shut down the internal thread pool. From this moment other attempts to 
 use this HTable will fail with RejectedExecutionException. And since the 
 HTablePool will have additional references to that HTable, other users of the 
 pool will just start failing on any call that leads to flushCommits().
 The problem was, obviously, triggered by bad code on our side. But I think 
 the pool has to be protected. Probably the best way to fix it would be to 
 implement a flag in PooledHTable that represents its state (leased/returned); 
 once close() is called, it would be marked as returned. From this moment any 
 operations on this PooledHTable would result in something like 
 IllegalStateException.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8251) enable SSH before assign META on Master startup

2013-04-14 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631453#comment-13631453
 ] 

Jieshan Bean commented on HBASE-8251:
-

Yes, we can.

 enable SSH before assign META on Master startup
 ---

 Key: HBASE-8251
 URL: https://issues.apache.org/jira/browse/HBASE-8251
 Project: HBase
  Issue Type: Bug
  Components: master, Region Assignment
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: HBASE-8251-94.patch, HBASE-8251-94-v2.patch


 I think HBASE-5918 could not fix this issue. In HMaster#assignRootAndMeta:
 1. Assign ROOT.
 2. Block until ROOT is opened.
 3. Assign META.
 4. Block until META is opened.
 SSH is enabled only after step 4. So if the RS that hosts ROOT dies before step 4, 
 the master will be blocked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8251) enable SSH before assign META on Master startup

2013-04-14 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean updated HBASE-8251:


Resolution: Fixed
Status: Resolved  (was: Patch Available)

 enable SSH before assign META on Master startup
 ---

 Key: HBASE-8251
 URL: https://issues.apache.org/jira/browse/HBASE-8251
 Project: HBase
  Issue Type: Bug
  Components: master, Region Assignment
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: HBASE-8251-94.patch, HBASE-8251-94-v2.patch


 I think HBASE-5918 could not fix this issue. In HMaster#assignRootAndMeta:
 1. Assign ROOT.
 2. Block until ROOT is opened.
 3. Assign META.
 4. Block until META is opened.
 SSH is enabled only after step 4. So if the RS that hosts ROOT dies before step 4, 
 the master will be blocked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8251) enable SSH before assign META on Master startup

2013-04-14 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13631455#comment-13631455
 ] 

Jieshan Bean commented on HBASE-8251:
-

Sorry, it should be resolved as a duplicate :(. I linked this issue to HBASE-7824.

 enable SSH before assign META on Master startup
 ---

 Key: HBASE-8251
 URL: https://issues.apache.org/jira/browse/HBASE-8251
 Project: HBase
  Issue Type: Bug
  Components: master, Region Assignment
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: HBASE-8251-94.patch, HBASE-8251-94-v2.patch


 I think HBASE-5918 could not fix this issue. In HMaster#assignRootAndMeta:
 1. Assign ROOT.
 2. Block until ROOT is opened.
 3. Assign META.
 4. Block until META is opened.
 SSH is enabled only after step 4. So if the RS that hosts ROOT dies before step 4, 
 the master will be blocked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8325) ReplicationSource read a empty HLog throws EOFException

2013-04-11 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8325?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13629652#comment-13629652
 ] 

Jieshan Bean commented on HBASE-8325:
-

Agreed, the latest patch in HBASE-7122 covers this issue. I suggest resolving 
this issue as a duplicate.
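
For reference, the empty-file detection asked about in the description below could look roughly like this (a sketch only, assuming the ReplicationSource's FileSystem and currentPath are in scope; HBASE-7122 takes a similar route):
{code}
// A freshly rolled WAL that was never written to has length 0; treat it as
// already finished instead of retrying the EOF forever.
FileStatus stat = fs.getFileStatus(currentPath);
if (stat.getLen() == 0) {
  LOG.info(currentPath + " is empty (idle cluster?), treating it as end of file");
  return !processEndOfFile();  // advance past this log
}
{code}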

 ReplicationSource read a empty HLog throws EOFException
 ---

 Key: HBASE-8325
 URL: https://issues.apache.org/jira/browse/HBASE-8325
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.94.5
 Environment: replication enabled
Reporter: zavakid
Priority: Critical

 I'm using HBase replication in my test environment.
 When a ReplicationSource opens an empty HLog, an EOFException is thrown. 
 This is because the Reader can't read the SequenceFile's metadata; there is 
 no data at all, so it throws the EOFException.
 Should we detect the empty file and process it, like we process 
 FileNotFoundException?
 Here's the code:
 {code:java}
 /**
  * Open a reader on the current path
  *
  * @param sleepMultiplier by how many times the default sleeping time is augmented
  * @return true if we should continue with that file, false if we are over with it
  */
 protected boolean openReader(int sleepMultiplier) {
   try {
     LOG.debug("Opening log for replication " + this.currentPath.getName() +
         " at " + this.repLogReader.getPosition());
     try {
       this.reader = repLogReader.openReader(this.currentPath);
     } catch (FileNotFoundException fnfe) {
       if (this.queueRecovered) {
         // We didn't find the log in the archive directory, look if it still
         // exists in the dead RS folder (there could be a chain of failures
         // to look at)
         LOG.info("NB dead servers : " + deadRegionServers.length);
         for (int i = this.deadRegionServers.length - 1; i >= 0; i--) {
           Path deadRsDirectory =
               new Path(manager.getLogDir().getParent(), this.deadRegionServers[i]);
           Path[] locs = new Path[] {
               new Path(deadRsDirectory, currentPath.getName()),
               new Path(deadRsDirectory.suffix(HLog.SPLITTING_EXT),
                   currentPath.getName()),
           };
           for (Path possibleLogLocation : locs) {
             LOG.info("Possible location " + possibleLogLocation.toUri().toString());
             if (this.manager.getFs().exists(possibleLogLocation)) {
               // We found the right new location
               LOG.info("Log " + this.currentPath + " still exists at " +
                   possibleLogLocation);
               // Breaking here will make us sleep since reader is null
               return true;
             }
           }
         }
         // TODO What happens if the log was missing from every single location?
         // Although we need to check a couple of times as the log could have
         // been moved by the master between the checks
         // It can also happen if a recovered queue wasn't properly cleaned,
         // such that the znode pointing to a log exists but the log was
         // deleted a long time ago.
         // For the moment, we'll throw the IO and processEndOfFile
         throw new IOException("File from recovered queue is " +
             "nowhere to be found", fnfe);
       } else {
         // If the log was archived, continue reading from there
         Path archivedLogLocation =
             new Path(manager.getOldLogDir(), currentPath.getName());
         if (this.manager.getFs().exists(archivedLogLocation)) {
           currentPath = archivedLogLocation;
           LOG.info("Log " + this.currentPath + " was moved to " +
               archivedLogLocation);
           // Open the log at the new location
           this.openReader(sleepMultiplier);
         }
         // TODO What happens the log is missing in both places?
       }
     }
   } catch (IOException ioe) {
     LOG.warn(peerClusterZnode + " Got: ", ioe);
     this.reader = null;
     // TODO Need a better way to determinate if a file is really gone but
     // TODO without scanning all logs dir
     if (sleepMultiplier == this.maxRetriesMultiplier) {
       LOG.warn("Waited too long for this file, considering dumping");
       return !processEndOfFile();
     }
   }
   return true;
 }
 {code}
 There's a method called {code:java}processEndOfFile(){code}
 Should we add this case to it?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7122) Proper warning message when opening a log file with no entries (idle cluster)

2013-04-09 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626323#comment-13626323
 ] 

Jieshan Bean commented on HBASE-7122:
-

Yes, the HLog is not empty if it was closed successfully, so we may not get an EOF.
But if we get an IOE during close, what will happen?


 Proper warning message when opening a log file with no entries (idle cluster)
 -

 Key: HBASE-7122
 URL: https://issues.apache.org/jira/browse/HBASE-7122
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.94.2
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.95.1

 Attachments: HBase-7122-94.patch, HBase-7122-95.patch, 
 HBase-7122.patch, HBASE-7122.v2.patch


 When the cluster is idle and the log has rolled (offset back to 0), 
 ReplicationSource tries to open the log and gets an EOF exception. This gets 
 printed every 10 seconds until an entry is inserted into the log.
 {code}
 2012-11-07 15:47:40,924 DEBUG regionserver.ReplicationSource 
 (ReplicationSource.java:openReader(487)) - Opening log for replication 
 c0315.hal.cloudera.com%2C40020%2C1352324202860.1352327804874 at 0
 2012-11-07 15:47:40,926 WARN  regionserver.ReplicationSource 
 (ReplicationSource.java:openReader(543)) - 1 Got: 
 java.io.EOFException
   at java.io.DataInputStream.readFully(DataInputStream.java:180)
   at java.io.DataInputStream.readFully(DataInputStream.java:152)
   at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508)
   at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1486)
   at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1475)
   at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1470)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.init(SequenceFileLogReader.java:55)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:175)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:716)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:491)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:290)
 2012-11-07 15:47:40,927 WARN  regionserver.ReplicationSource 
 (ReplicationSource.java:openReader(547)) - Waited too long for this file, 
 considering dumping
 2012-11-07 15:47:40,927 DEBUG regionserver.ReplicationSource 
 (ReplicationSource.java:sleepForRetries(562)) - Unable to open a reader, 
 sleeping 1000 times 10
 {code}
 We should reduce the log spewing in this case (or some informative message, 
 based on the offset).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7122) Proper warning message when opening a log file with no entries (idle cluster)

2013-04-09 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627326#comment-13627326
 ] 

Jieshan Bean commented on HBASE-7122:
-

+1 on patch v2.

 Proper warning message when opening a log file with no entries (idle cluster)
 -

 Key: HBASE-7122
 URL: https://issues.apache.org/jira/browse/HBASE-7122
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.94.2
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.95.1

 Attachments: HBase-7122-94.patch, HBase-7122-95.patch, 
 HBase-7122-95-v2.patch, HBase-7122.patch, HBASE-7122.v2.patch


 When the cluster is idle and the log has rolled (offset back to 0), 
 ReplicationSource tries to open the log and gets an EOF exception. This gets 
 printed every 10 seconds until an entry is inserted into the log.
 {code}
 2012-11-07 15:47:40,924 DEBUG regionserver.ReplicationSource 
 (ReplicationSource.java:openReader(487)) - Opening log for replication 
 c0315.hal.cloudera.com%2C40020%2C1352324202860.1352327804874 at 0
 2012-11-07 15:47:40,926 WARN  regionserver.ReplicationSource 
 (ReplicationSource.java:openReader(543)) - 1 Got: 
 java.io.EOFException
   at java.io.DataInputStream.readFully(DataInputStream.java:180)
   at java.io.DataInputStream.readFully(DataInputStream.java:152)
   at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508)
   at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1486)
   at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1475)
   at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1470)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.init(SequenceFileLogReader.java:55)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:175)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:716)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:491)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:290)
 2012-11-07 15:47:40,927 WARN  regionserver.ReplicationSource 
 (ReplicationSource.java:openReader(547)) - Waited too long for this file, 
 considering dumping
 2012-11-07 15:47:40,927 DEBUG regionserver.ReplicationSource 
 (ReplicationSource.java:sleepForRetries(562)) - Unable to open a reader, 
 sleeping 1000 times 10
 {code}
 We should reduce the log spewing in this case (or some informative message, 
 based on the offset).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7122) Proper warning message when opening a log file with no entries (idle cluster)

2013-04-09 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627359#comment-13627359
 ] 

Jieshan Bean commented on HBASE-7122:
-

BTW, one minor comment :). Please add curly brackets to the code below:
{code}
+if (this.repLogReader.getPosition() == 0 && !queueRecovered && queue.size() == 0)
+  return true;
{code}
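
i.e., with the braces (formatting only):
{code}
if (this.repLogReader.getPosition() == 0 && !queueRecovered && queue.size() == 0) {
  return true;
}
{code}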



 Proper warning message when opening a log file with no entries (idle cluster)
 -

 Key: HBASE-7122
 URL: https://issues.apache.org/jira/browse/HBASE-7122
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.94.2
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.95.1

 Attachments: HBase-7122-94.patch, HBase-7122-95.patch, 
 HBase-7122-95-v2.patch, HBase-7122.patch, HBASE-7122.v2.patch


 When the cluster is idle and the log has rolled (offset back to 0), 
 ReplicationSource tries to open the log and gets an EOF exception. This gets 
 printed every 10 seconds until an entry is inserted into the log.
 {code}
 2012-11-07 15:47:40,924 DEBUG regionserver.ReplicationSource 
 (ReplicationSource.java:openReader(487)) - Opening log for replication 
 c0315.hal.cloudera.com%2C40020%2C1352324202860.1352327804874 at 0
 2012-11-07 15:47:40,926 WARN  regionserver.ReplicationSource 
 (ReplicationSource.java:openReader(543)) - 1 Got: 
 java.io.EOFException
   at java.io.DataInputStream.readFully(DataInputStream.java:180)
   at java.io.DataInputStream.readFully(DataInputStream.java:152)
   at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508)
   at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1486)
   at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1475)
   at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1470)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.init(SequenceFileLogReader.java:55)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:175)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:716)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:491)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:290)
 2012-11-07 15:47:40,927 WARN  regionserver.ReplicationSource 
 (ReplicationSource.java:openReader(547)) - Waited too long for this file, 
 considering dumping
 2012-11-07 15:47:40,927 DEBUG regionserver.ReplicationSource 
 (ReplicationSource.java:sleepForRetries(562)) - Unable to open a reader, 
 sleeping 1000 times 10
 {code}
 We should reduce the log spewing in this case (or some informative message, 
 based on the offset).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8251) enable SSH before assign META on Master startup

2013-04-09 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627367#comment-13627367
 ] 

Jieshan Bean commented on HBASE-8251:
-

[~jeffreyz] [~zjushch]
Do you have further comments? Thank you.

 enable SSH before assign META on Master startup
 ---

 Key: HBASE-8251
 URL: https://issues.apache.org/jira/browse/HBASE-8251
 Project: HBase
  Issue Type: Bug
  Components: master, Region Assignment
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: HBASE-8251-94.patch, HBASE-8251-94-v2.patch


 I think HBASE-5918 could not fix this issue. In HMaster#assignRootAndMeta:
 1. Assign ROOT.
 2. Block until ROOT is opened.
 3. Assign META.
 4. Block until META is opened.
 SSH is enabled only after step 4. So if the RS that hosts ROOT dies before step 4, 
 the master will be blocked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8251) enable SSH before assign META on Master startup

2013-04-09 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean updated HBASE-8251:


Status: Patch Available  (was: Open)

 enable SSH before assign META on Master startup
 ---

 Key: HBASE-8251
 URL: https://issues.apache.org/jira/browse/HBASE-8251
 Project: HBase
  Issue Type: Bug
  Components: master, Region Assignment
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: HBASE-8251-94.patch, HBASE-8251-94-v2.patch


 I think HBASE-5918 could not fix this issue. In HMaster#assignRootAndMeta:
 1. Assign ROOT.
 2. Block until ROOT is opened.
 3. Assign META.
 4. Block until META is opened.
 SSH is enabled only after step 4. So if the RS that hosts ROOT dies before step 4, 
 the master will be blocked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8251) enable SSH before assign META on Master startup

2013-04-09 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627380#comment-13627380
 ] 

Jieshan Bean commented on HBASE-8251:
-

It seems this approach conflicts with the patch in HBASE-7824.

 enable SSH before assign META on Master startup
 ---

 Key: HBASE-8251
 URL: https://issues.apache.org/jira/browse/HBASE-8251
 Project: HBase
  Issue Type: Bug
  Components: master, Region Assignment
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: HBASE-8251-94.patch, HBASE-8251-94-v2.patch


 I think HBASE-5918 could not fix this issue. In HMaster#assignRootAndMeta:
 1. Assign ROOT.
 2. Block until ROOT is opened.
 3. Assign META.
 4. Block until META is opened.
 SSH is enabled only after step 4. So if the RS that hosts ROOT dies before step 4, 
 the master will be blocked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8251) enable SSH before assign META on Master startup

2013-04-09 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13627448#comment-13627448
 ] 

Jieshan Bean commented on HBASE-8251:
-

Yes. I'm reviewing that patch. Seems I missed a wonderful discussion:)

 enable SSH before assign META on Master startup
 ---

 Key: HBASE-8251
 URL: https://issues.apache.org/jira/browse/HBASE-8251
 Project: HBase
  Issue Type: Bug
  Components: master, Region Assignment
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: HBASE-8251-94.patch, HBASE-8251-94-v2.patch


 I think HBASE-5918 could not fix this issue. In HMaster#assignRootAndMeta:
 1. Assign ROOT.
 2. Block until ROOT is opened.
 3. Assign META.
 4. Block until META is opened.
 SSH is enabled only after step 4. So if the RS that hosts ROOT dies before step 4, 
 the master will be blocked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7750) We should throw IOE when calling HRegionServer#replicateLogEntries if ReplicationSink is null

2013-04-08 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean updated HBASE-7750:


Attachment: HBASE-7750-trunk.patch
HBASE-7750-94.patch

 We should throw IOE when calling HRegionServer#replicateLogEntries if 
 ReplicationSink is null
 -

 Key: HBASE-7750
 URL: https://issues.apache.org/jira/browse/HBASE-7750
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.94.4, 0.95.2
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: HBASE-7750-94.patch, HBASE-7750-trunk.patch


 It may be expected behavior, but I think it's better to do something. 
 We configured hbase.replication as true in the master cluster and added a peer, 
 but forgot to configure hbase.replication on the slave cluster side.
 ReplicationSource read the HLog, shipped log edits, and logged the position. 
 Everything seemed alright, but the data was not present in the slave cluster.
 So I think the slave cluster should throw an exception back to the master cluster 
 instead of returning silently:
 {code}
   public void replicateLogEntries(final HLog.Entry[] entries)
   throws IOException {
 checkOpen();
 if (this.replicationSinkHandler == null) return;
 this.replicationSinkHandler.replicateLogEntries(entries);
   }
 {code}
 I would like to hear your comments on this.
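
 For illustration, the behaviour proposed above could look roughly like this (a sketch based on the snippet in the description, not the attached patch):
 {code}
   public void replicateLogEntries(final HLog.Entry[] entries)
   throws IOException {
     checkOpen();
     if (this.replicationSinkHandler == null) {
       // Fail loudly instead of silently dropping the shipped edits; the exact
       // message and exception type are illustrative only.
       throw new IOException("Replication sink is not enabled on this cluster "
           + "(check hbase.replication); rejecting " + entries.length + " entries");
     }
     this.replicationSinkHandler.replicateLogEntries(entries);
   }
 {code}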

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7750) We should throw IOE when calling HRegionServer#replicateLogEntries if ReplicationSink is null

2013-04-08 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean updated HBASE-7750:


Status: Patch Available  (was: Open)

 We should throw IOE when calling HRegionServer#replicateLogEntries if 
 ReplicationSink is null
 -

 Key: HBASE-7750
 URL: https://issues.apache.org/jira/browse/HBASE-7750
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.94.4, 0.95.2
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: HBASE-7750-94.patch, HBASE-7750-trunk.patch


 It may be expected behavior, but I think it's better to do something. 
 We configured hbase.replication as true in the master cluster and added a peer, 
 but forgot to configure hbase.replication on the slave cluster side.
 ReplicationSource read the HLog, shipped log edits, and logged the position. 
 Everything seemed alright, but the data was not present in the slave cluster.
 So I think the slave cluster should throw an exception back to the master cluster 
 instead of returning silently:
 {code}
   public void replicateLogEntries(final HLog.Entry[] entries)
   throws IOException {
 checkOpen();
 if (this.replicationSinkHandler == null) return;
 this.replicationSinkHandler.replicateLogEntries(entries);
   }
 {code}
 I would like to hear your comments on this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8251) enable SSH before assign META on Master startup

2013-04-08 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13625234#comment-13625234
 ] 

Jieshan Bean commented on HBASE-8251:
-

bq.I'm not sure if it's a good modification for scenario that a Meta location 
is pointing to an offline server or a wrong server.

The key point is whether this offline or wrong server is a server 
registered on this master. If it is already registered, SSH will be triggered, 
and this patch can avoid the race. Otherwise, SSH would not happen.

 enable SSH before assign META on Master startup
 ---

 Key: HBASE-8251
 URL: https://issues.apache.org/jira/browse/HBASE-8251
 Project: HBase
  Issue Type: Bug
  Components: master, Region Assignment
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: HBASE-8251-94.patch, HBASE-8251-94-v2.patch


 I think HBASE-5918 could not fix this issue. In HMaster#assignRootAndMeta:
 1. Assign ROOT.
 2. Block until ROOT is opened.
 3. Assign META.
 4. Block until META is opened.
 SSH is enabled only after step 4. So if the RS hosting ROOT dies before step 4, 
 the master will be blocked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7122) Proper warning message when opening a log file with no entries (idle cluster)

2013-04-08 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626043#comment-13626043
 ] 

Jieshan Bean commented on HBASE-7122:
-

I think adding this check is not enough. There is still a chance of an empty 
log (not the one currently being written) appearing in the normal log list.
So we also need to check whether this log is the one in use. 
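
To make the intent concrete, a rough sketch of that extra condition (gotEOF, logLength, isLogCurrentlyBeingWritten and currentPath are illustrative names, not real ReplicationSource members):
{code}
// Sketch only: an empty log is harmless when it is the one currently being
// written on an idle cluster; an empty log elsewhere in the queue is not.
if (gotEOF && logLength == 0) {
  if (isLogCurrentlyBeingWritten) {
    // Idle cluster with a freshly rolled log: nothing to replicate yet.
    LOG.debug("Current log is still empty, waiting for new entries: " + currentPath);
  } else {
    // An empty, already-rolled log in the queue is unexpected; keep the warning.
    LOG.warn("Found an empty log in the replication queue: " + currentPath);
  }
}
{code}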

 Proper warning message when opening a log file with no entries (idle cluster)
 -

 Key: HBASE-7122
 URL: https://issues.apache.org/jira/browse/HBASE-7122
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.94.2
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.95.1

 Attachments: HBase-7122-94.patch, HBase-7122-95.patch, 
 HBase-7122.patch, HBASE-7122.v2.patch


 In case the cluster is idle and the log has rolled (offset to 0), 
 replicationSource tries to open the log and gets an EOF exception. This gets 
 printed after every 10 sec until an entry is inserted in it.
 {code}
 2012-11-07 15:47:40,924 DEBUG regionserver.ReplicationSource 
 (ReplicationSource.java:openReader(487)) - Opening log for replication 
 c0315.hal.cloudera.com%2C40020%2C1352324202860.1352327804874 at 0
 2012-11-07 15:47:40,926 WARN  regionserver.ReplicationSource 
 (ReplicationSource.java:openReader(543)) - 1 Got: 
 java.io.EOFException
   at java.io.DataInputStream.readFully(DataInputStream.java:180)
   at java.io.DataInputStream.readFully(DataInputStream.java:152)
   at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508)
   at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1486)
   at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1475)
   at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1470)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.init(SequenceFileLogReader.java:55)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:175)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:716)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:491)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:290)
 2012-11-07 15:47:40,927 WARN  regionserver.ReplicationSource 
 (ReplicationSource.java:openReader(547)) - Waited too long for this file, 
 considering dumping
 2012-11-07 15:47:40,927 DEBUG regionserver.ReplicationSource 
 (ReplicationSource.java:sleepForRetries(562)) - Unable to open a reader, 
 sleeping 1000 times 10
 {code}
 We should reduce the log spewing in this case (or print a more informative 
 message, based on the offset).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8266) Master cannot start if TableNotFoundException is thrown while partial table recovery

2013-04-08 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8266?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626149#comment-13626149
 ] 

Jieshan Bean commented on HBASE-8266:
-

[~ram_krish] Can we just skip calling handleEnableTable in this scenario?  
Patch looks good otherwise :)

 Master cannot start if TableNotFoundException is thrown while partial table 
 recovery
 

 Key: HBASE-8266
 URL: https://issues.apache.org/jira/browse/HBASE-8266
 Project: HBase
  Issue Type: Bug
Affects Versions: 0.94.6, 0.95.0
Reporter: ramkrishna.s.vasudevan
Assignee: ramkrishna.s.vasudevan
Priority: Critical
 Fix For: 0.98.0, 0.94.7, 0.95.1

 Attachments: HBASE-8266_0.94.patch, HBASE-8266_1.patch, 
 HBASE-8266.patch


 I was trying to create a table. The table creation failed
 {code}
 java.io.IOException: java.util.concurrent.ExecutionException: 
 java.lang.IllegalStateException: Could not instantiate a region instance.
   at 
 org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:133)
   at 
 org.apache.hadoop.hbase.master.handler.CreateTableHandler.handleCreateHdfsRegions(CreateTableHandler.java:256)
   at 
 org.apache.hadoop.hbase.master.handler.CreateTableHandler.handleCreateTable(CreateTableHandler.java:204)
   at 
 org.apache.hadoop.hbase.master.handler.CreateTableHandler.process(CreateTableHandler.java:153)
   at 
 org.apache.hadoop.hbase.executor.EventHandler.run(EventHandler.java:130)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
   at 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
   at java.lang.Thread.run(Thread.java:662)
 Caused by: java.util.concurrent.ExecutionException: 
 java.lang.IllegalStateException: Could not instantiate a region instance.
   at java.util.concurrent.FutureTask$Sync.innerGet(FutureTask.java:222)
   at java.util.concurrent.FutureTask.get(FutureTask.java:83)
   at 
 org.apache.hadoop.hbase.util.ModifyRegionUtils.createRegions(ModifyRegionUtils.java:126)
   ... 7 more
 Caused by: java.lang.IllegalStateException: Could not instantiate a region 
 instance.
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:3765)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.createHRegion(HRegion.java:3870)
   at 
 org.apache.hadoop.hbase.util.ModifyRegionUtils$1.call(ModifyRegionUtils.java:106)
   at 
 org.apache.hadoop.hbase.util.ModifyRegionUtils$1.call(ModifyRegionUtils.java:103)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   at 
 java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:441)
   at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:303)
   at java.util.concurrent.FutureTask.run(FutureTask.java:138)
   ... 3 more
 Caused by: java.lang.reflect.InvocationTargetException
   at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
   at 
 sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:39)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at 
 org.apache.hadoop.hbase.regionserver.HRegion.newHRegion(HRegion.java:3762)
   ... 11 more
 Caused by: java.lang.NoClassDefFoundError: 
 org/apache/hadoop/hbase/CompoundConfiguration$1
   at 
 org.apache.hadoop.hbase.CompoundConfiguration.add(CompoundConfiguration.java:82)
   at org.apache.hadoop.hbase.regionserver.HRegion.init(HRegion.java:438)
   at org.apache.hadoop.hbase.regionserver.HRegion.init(HRegion.java:401)
   ... 16 more
 {code}
 I am not sure of the above failure. The same setup is able to create new 
 tables.
 Now the table is already in ENABLING state, and the master was restarted.
 Since the table was found in ENABLING state but not added to META, the 
 EnableTableHandler 
 {code}
 2013-04-03 18:33:03,850 FATAL org.apache.hadoop.hbase.master.HMaster: 
 Unhandled exception. Starting shutdown.
 org.apache.hadoop.hbase.exceptions.TableNotFoundException: TestTable
   at 
 org.apache.hadoop.hbase.master.handler.EnableTableHandler.prepare(EnableTableHandler.java:89)
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.recoverTableInEnablingState(AssignmentManager.java:2586)
   at 
 org.apache.hadoop.hbase.master.AssignmentManager.joinCluster(AssignmentManager.java:390)
   at 
 org.apache.hadoop.hbase.master.HMaster.finishInitialization(HMaster.java:777)
   at 

[jira] [Commented] (HBASE-8302) HBase dynamic configuration

2013-04-08 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8302?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13626157#comment-13626157
 ] 

Jieshan Bean commented on HBASE-8302:
-

[~brianhbase] 
Is this the same as HBASE-3909 and HBASE-8292?

 HBase dynamic configuration
 ---

 Key: HBASE-8302
 URL: https://issues.apache.org/jira/browse/HBASE-8302
 Project: HBase
  Issue Type: Improvement
Affects Versions: 0.94.3
Reporter: Brian Fu

 If we change the HBase configuration, we need to restart the cluster for the 
 change to take effect; we want to be able to configure such parameters dynamically.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8251) enable SSH before assign META on Master startup

2013-04-07 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13624848#comment-13624848
 ] 

Jieshan Bean commented on HBASE-8251:
-

Yes, one corner case is that the META RS is killed just before a new assignment 
happens (the kill happens either while calling 
processRegionInTransitionAndBlockUntilAssigned or verifyMetaRegionLocation), so 
rit is false and metaRegionLocation is false (only this scenario can trigger a 
new assignment).
It may cause data loss and double assignment.
Thanks, Chunhui, Rama and Jeffrey.
I'm thinking about adding a check (whether currentMetaServer is an 
online server or a dead server being processed; this needs some changes in 
ServerManager) before calling assignMeta.


 enable SSH before assign META on Master startup
 ---

 Key: HBASE-8251
 URL: https://issues.apache.org/jira/browse/HBASE-8251
 Project: HBase
  Issue Type: Bug
  Components: master, Region Assignment
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: HBASE-8251-94.patch


 I think HBASE-5918 could not fix this issue. In HMaster#assignRootAndMeta:
 1. Assign ROOT.
 2. Block until ROOT is opened.
 3. Assign META.
 4. Block until META is opened.
 SSH is enabled only after step 4. So if the RS hosting ROOT dies before step 4, 
 the master will be blocked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8251) enable SSH before assign META on Master startup

2013-04-07 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean updated HBASE-8251:


Attachment: HBASE-8251-94-v2.patch

New version of patch for 94, addressed all the comments.

 enable SSH before assign META on Master startup
 ---

 Key: HBASE-8251
 URL: https://issues.apache.org/jira/browse/HBASE-8251
 Project: HBase
  Issue Type: Bug
  Components: master, Region Assignment
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: HBASE-8251-94.patch, HBASE-8251-94-v2.patch


 I think HBASE-5918 could not fix this issue. In HMaster#assignRootAndMeta:
 1. Assign ROOT.
 2. Block until ROOT is opened.
 3. Assign META.
 4. Block until META is opened.
 SSH is enabled only after step 4. So if the RS hosting ROOT dies before step 4, 
 the master will be blocked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8251) enable SSH before assign META on Master startup

2013-04-07 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13625064#comment-13625064
 ] 

Jieshan Bean commented on HBASE-8251:
-

bq.*, currentMetaServer is marked dead by master. Then we still have two 
possible places to assign meta. 
This RS should belong to the ServerManager#onlineServers set just before it is 
marked as dead (see the related code around ServerManager#onlineServers and 
DeadServer#processingDeadServers). So the check below returns true:
{code}
this.serverManager.isOnlineOrProcessingDeadServer(currentMetaServer)
{code}
So needToAssign is false, and no new assignment would happen under this scenario.
Correct me if I misunderstood you :). Thank you.
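
For reference, a rough sketch of the guard being discussed (an illustration built around isOnlineOrProcessingDeadServer, not the attached patch): META is only re-assigned when its recorded location is neither an online server nor a dead server that SSH is already processing.
{code}
// Sketch only, not the attached HBASE-8251 patch.
ServerName currentMetaServer = this.catalogTracker.getMetaLocation();
boolean needToAssign = !rit && !metaRegionLocation
    && (currentMetaServer == null
        || !this.serverManager.isOnlineOrProcessingDeadServer(currentMetaServer));
if (needToAssign) {
  // Neither an online RS nor SSH is responsible for META any more: assign it.
  this.assignmentManager.assignMeta();
}
{code}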

 enable SSH before assign META on Master startup
 ---

 Key: HBASE-8251
 URL: https://issues.apache.org/jira/browse/HBASE-8251
 Project: HBase
  Issue Type: Bug
  Components: master, Region Assignment
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: HBASE-8251-94.patch, HBASE-8251-94-v2.patch


 I think HBASE-5918 could not fix this issue. In HMaster#assignRootAndMeta:
 1. Assign ROOT.
 2. Block until ROOT is opened.
 3. Assign META.
 4. Block until META is opened.
 SSH is enabled only after step 4. So if the RS hosting ROOT dies before step 4, 
 the master will be blocked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HBASE-8251) enable SSH before assign META on Master startup

2013-04-03 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean reassigned HBASE-8251:
---

Assignee: Jieshan Bean

 enable SSH before assign META on Master startup
 ---

 Key: HBASE-8251
 URL: https://issues.apache.org/jira/browse/HBASE-8251
 Project: HBase
  Issue Type: Bug
  Components: master, Region Assignment
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean

 I think HBASE-5918 could not fix this issue. In HMaster#assignRootAndMeta:
 1. Assign ROOT.
 2. Block until ROOT is opened.
 3. Assign META.
 4. Block until META is opened.
 SSH is enabled only after step 4. So if the RS hosting ROOT dies before step 4, 
 the master will be blocked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-8251) enable SSH before assign META on Master startup

2013-04-03 Thread Jieshan Bean (JIRA)
Jieshan Bean created HBASE-8251:
---

 Summary: enable SSH before assign META on Master startup
 Key: HBASE-8251
 URL: https://issues.apache.org/jira/browse/HBASE-8251
 Project: HBase
  Issue Type: Bug
  Components: master, Region Assignment
Affects Versions: 0.94.6
Reporter: Jieshan Bean


I think HBASE-5918 could not fix this issue. In HMaster#assignRootAndMeta:
1. Assign ROOT.
2. Block until ROOT is opened.
3. Assign META.
4. Block until META is opened.

SSH is enabled only after step 4. So if the RS hosting ROOT dies before step 4, 
the master will be blocked.


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-04-03 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620667#comment-13620667
 ] 

Jieshan Bean commented on HBASE-8229:
-

+1 on the first version of the patch.

 Replication code logs like crazy if a target table cannot be found.
 ---

 Key: HBASE-8229
 URL: https://issues.apache.org/jira/browse/HBASE-8229
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
 Fix For: 0.95.1, 0.98.0, 0.94.7

 Attachments: 8229-0.94.txt, 8229-0.94-V2.txt


 One of our RS/DN machines ran out of diskspace on the partition to which we 
 write the log files.
 It turns out we still had a table in our source cluster with 
 REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster.
 It then logged a long stack trace every 50ms or so, over a few days, which 
 filled up our log partition.
 Since ReplicationSource cannot make any progress in this case anyway, it 
 should probably sleep a bit before retrying (or at least limit the rate at 
 which it spews out these exceptions to the log).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-8253) A corrupted log blocked ReplicationSource

2013-04-03 Thread Jieshan Bean (JIRA)
Jieshan Bean created HBASE-8253:
---

 Summary: A corrupted log blocked ReplicationSource
 Key: HBASE-8253
 URL: https://issues.apache.org/jira/browse/HBASE-8253
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean


A log that was being written got corrupted when we forcibly powered down one node. 
Only part of the last WALEdit was written into that log, and that log was not the 
last one in the replication queue. 
ReplicationSource was blocked under this scenario. A lot of logs like the ones 
below were printed:
{noformat}
2013-03-30 06:53:48,628 WARN  
[regionserver26003-EventThread.replicationSource,1] 1 Got:  
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:334)
java.io.EOFException: 
hdfs://hacluster/hbase/.logs/master11,26003,1364530862620/master11%2C26003%2C1364530862620.1364553936510,
 entryStart=40434738, pos=40450048, end=40450048, edit=0
at sun.reflect.GeneratedConstructorAccessor42.newInstance(Unknown 
Source)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:295)
at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:240)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.readNextAndSetPosition(ReplicationHLogReaderManager.java:84)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:412)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:330)
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at 
org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:68)
at 
org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:106)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2282)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2181)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2227)
at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:238)
... 3 more
..  
2013-03-30 06:54:38,899 WARN  
[regionserver26003-EventThread.replicationSource,1] 1 Got:  
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:334)
java.io.EOFException: 
hdfs://hacluster/hbase/.logs/master11,26003,1364530862620/master11%2C26003%2C1364530862620.1364553936510,
 entryStart=40434738, pos=40450048, end=40450048, edit=0
at sun.reflect.GeneratedConstructorAccessor42.newInstance(Unknown 
Source)
at 
sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:295)
at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:240)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.readNextAndSetPosition(ReplicationHLogReaderManager.java:84)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:412)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:330)
Caused by: java.io.EOFException
at java.io.DataInputStream.readFully(DataInputStream.java:180)
at 
org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:68)
at 
org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:106)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2282)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2181)
at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2227)
at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:238)
... 3 more
... 
{noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8253) A corrupted log blocked ReplicationSource

2013-04-03 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8253?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean updated HBASE-8253:


Attachment: HBASE-8253-94.patch

Patch for discussion.

In ReplicationSource#readAllEntriesToReplicateOrNextFile, only the read of the 
first edit may throw an EOF. So when we get an EOF, currentNbEntries should be 0; 
there is no other case.
Please correct me if I am wrong.
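
To make that concrete, a rough sketch of the recovery path under the assumption above (processEndOfFile(), queue and currentPath follow the 0.94-era ReplicationSource, but this is a sketch, not the attached patch):
{code}
// Sketch only: EOF handling in readAllEntriesToReplicateOrNextFile.
try {
  // ... read entries from the reader into the current batch ...
} catch (EOFException eof) {
  if (currentNbEntries == 0 && this.queue.size() > 0) {
    // The corrupted log is not the one currently being written; its truncated
    // tail cannot be recovered, so give it up and move to the next queued log.
    LOG.warn("EOF on corrupted log " + this.currentPath + ", skipping to the next log", eof);
    processEndOfFile();
  } else {
    throw eof;
  }
}
{code}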

 A corrupted log blocked ReplicationSource
 -

 Key: HBASE-8253
 URL: https://issues.apache.org/jira/browse/HBASE-8253
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: HBASE-8253-94.patch


 A log that was being written got corrupted when we forcibly powered down one 
 node. Only part of the last WALEdit was written into that log, and that log was 
 not the last one in the replication queue. 
 ReplicationSource was blocked under this scenario. A lot of logs like the ones 
 below were printed:
 {noformat}
 2013-03-30 06:53:48,628 WARN  
 [regionserver26003-EventThread.replicationSource,1] 1 Got:  
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:334)
 java.io.EOFException: 
 hdfs://hacluster/hbase/.logs/master11,26003,1364530862620/master11%2C26003%2C1364530862620.1364553936510,
  entryStart=40434738, pos=40450048, end=40450048, edit=0
   at sun.reflect.GeneratedConstructorAccessor42.newInstance(Unknown 
 Source)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:295)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:240)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.readNextAndSetPosition(ReplicationHLogReaderManager.java:84)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:412)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:330)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readFully(DataInputStream.java:180)
   at 
 org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:68)
   at 
 org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:106)
   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2282)
   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2181)
   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2227)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:238)
   ... 3 more
 ..
 2013-03-30 06:54:38,899 WARN  
 [regionserver26003-EventThread.replicationSource,1] 1 Got:  
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:334)
 java.io.EOFException: 
 hdfs://hacluster/hbase/.logs/master11,26003,1364530862620/master11%2C26003%2C1364530862620.1364553936510,
  entryStart=40434738, pos=40450048, end=40450048, edit=0
   at sun.reflect.GeneratedConstructorAccessor42.newInstance(Unknown 
 Source)
   at 
 sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:27)
   at java.lang.reflect.Constructor.newInstance(Constructor.java:513)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.addFileInfoToException(SequenceFileLogReader.java:295)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.next(SequenceFileLogReader.java:240)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.readNextAndSetPosition(ReplicationHLogReaderManager.java:84)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.readAllEntriesToReplicateOrNextFile(ReplicationSource.java:412)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:330)
 Caused by: java.io.EOFException
   at java.io.DataInputStream.readFully(DataInputStream.java:180)
   at 
 org.apache.hadoop.io.DataOutputBuffer$Buffer.write(DataOutputBuffer.java:68)
   at 
 org.apache.hadoop.io.DataOutputBuffer.write(DataOutputBuffer.java:106)
   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2282)
   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2181)
   at org.apache.hadoop.io.SequenceFile$Reader.next(SequenceFile.java:2227)
   at 
 

[jira] [Commented] (HBASE-8251) enable SSH before assign META on Master startup

2013-04-03 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620703#comment-13620703
 ] 

Jieshan Bean commented on HBASE-8251:
-

[~rajesh23] I don't think so; the Master was blocked at 
AM#processRegionInTransitionAndBlockUntilAssigned:
{code}
// Work on meta region
status.setStatus("Assigning META region");
rit = this.assignmentManager
    .processRegionInTransitionAndBlockUntilAssigned(HRegionInfo.FIRST_META_REGIONINFO);
boolean metaRegionLocation =
    this.catalogTracker.verifyMetaRegionLocation(timeout);
{code}


 enable SSH before assign META on Master startup
 ---

 Key: HBASE-8251
 URL: https://issues.apache.org/jira/browse/HBASE-8251
 Project: HBase
  Issue Type: Bug
  Components: master, Region Assignment
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean

 I think HBASE-5918 could not fix this issue. In HMaster#assignRootAndMeta:
 1. Assign ROOT.
 2. Block until ROOT is opened.
 3. Assign META.
 4. Block until META is opened.
 SSH is enabled only after step 4. So if the RS hosting ROOT dies before step 4, 
 the master will be blocked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8251) enable SSH before assign META on Master startup

2013-04-03 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean updated HBASE-8251:


Attachment: HBASE-8251-94.patch

Patch for 94.

 enable SSH before assign META on Master startup
 ---

 Key: HBASE-8251
 URL: https://issues.apache.org/jira/browse/HBASE-8251
 Project: HBase
  Issue Type: Bug
  Components: master, Region Assignment
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: HBASE-8251-94.patch


 I think HBASE-5918 could not fix this issue. In HMaster#assignRootAndMeta:
 1. Assign ROOT.
 2. Block until ROOT is opened.
 3. Assign META.
 4. Block until META is opened.
 SSH is enabled only after step 4. So if the RS hosting ROOT dies before step 4, 
 the master will be blocked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8251) enable SSH before assign META on Master startup

2013-04-03 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13621535#comment-13621535
 ] 

Jieshan Bean commented on HBASE-8251:
-

[~ted_yu] Sorry, we will be on vacation until April 7th, so I can only submit 
the new patch on the 7th :).

 enable SSH before assign META on Master startup
 ---

 Key: HBASE-8251
 URL: https://issues.apache.org/jira/browse/HBASE-8251
 Project: HBase
  Issue Type: Bug
  Components: master, Region Assignment
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: HBASE-8251-94.patch


 I think HBASE-5918 could not fix this issue. In HMaster#assignRootAndMeta:
 1. Assign ROOT.
 2. Block until ROOT is opened.
 3. Assign META.
 4. Block until META is opened.
 SSH is enabled only after step 4. So if the RS hosting ROOT dies before step 4, 
 the master will be blocked.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8230) Possible NPE on regionserver abort if replication service has not been started

2013-04-03 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13621541#comment-13621541
 ] 

Jieshan Bean commented on HBASE-8230:
-

[~ted_yu] 
Do you have any other comments on this issue? 
Thank you.

 Possible NPE on regionserver abort if replication service has not been started
 --

 Key: HBASE-8230
 URL: https://issues.apache.org/jira/browse/HBASE-8230
 Project: HBase
  Issue Type: Bug
  Components: regionserver, Replication
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: HBASE-8230-94.patch, HBASE-8230-trunk.patch


 RegionServer got an exception while calling setupWALAndReplication, so it 
 entered the abort flow. Since replicationSink had not been initialized yet, we 
 got the exception below:
 {noformat}
 Exception in thread regionserver26003 java.lang.NullPointerException
  at 
 org.apache.hadoop.hbase.replication.regionserver.Replication.join(Replication.java:129)
  at 
 org.apache.hadoop.hbase.replication.regionserver.Replication.stopReplicationService(Replication.java:120)
  at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.join(HRegionServer.java:1803)
  at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:834)
  at java.lang.Thread.run(Thread.java:662)
 {noformat}
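
 A minimal sketch of the kind of guard that avoids this NPE (the shutdown calls 
 below stand in for whatever Replication.join() actually invokes in 0.94; this 
 is a sketch, not necessarily the attached patch):
 {code}
 // Sketch only: skip shutdown steps for components that were never created, so
 // an abort before setupWALAndReplication finishes does not NPE here.
 public void join() {
   if (this.replicationManager != null) {
     this.replicationManager.join();
   }
   if (this.replicationSink != null) {
     this.replicationSink.stopReplicationSinkServices();
   }
 }
 {code}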

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-04-02 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13619722#comment-13619722
 ] 

Jieshan Bean commented on HBASE-8229:
-

bq.The idea is to return back into the run() loop of ReplicationSource, so that 
the edits are rechecked (and not shipped to the peer if the local table's 
status has changed).
I didn't see this re-check done anywhere; I hope I misread the code :). Even if 
the local table's replication status has been changed, ReplicationSource still has 
the responsibility to replicate all the edits from before the table was 
changed, right? So I prefer not to return directly. Just let it retry and 
sleep until that table is created. 

 Replication code logs like crazy if a target table cannot be found.
 ---

 Key: HBASE-8229
 URL: https://issues.apache.org/jira/browse/HBASE-8229
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
 Fix For: 0.95.0, 0.98.0, 0.94.7

 Attachments: 8229-0.94.txt, 8229-0.94-V2.txt


 One of our RS/DN machines ran out of diskspace on the partition to which we 
 write the log files.
 It turns out we still had a table in our source cluster with 
 REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster.
 It then logged a long stack trace every 50ms or so, over a few days, which 
 filled up our log partition.
 Since ReplicationSource cannot make any progress in this case anyway, it 
 should probably sleep a bit before retrying (or at least limit the rate at 
 which it spews out these exceptions to the log).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-04-02 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13620499#comment-13620499
 ] 

Jieshan Bean commented on HBASE-8229:
-

bq.Wouldn't it recreate the set of edits to ship in 
readAllEntriesToReplicateOrNextFile(...) called from run().
Yes, it will read and recreate the set again, but it's the same set as the 
previous one. The current logic in removeNonReplicableEdits only checks the 
scope property owned by the edit itself, not the table's scope.


 Replication code logs like crazy if a target table cannot be found.
 ---

 Key: HBASE-8229
 URL: https://issues.apache.org/jira/browse/HBASE-8229
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
 Fix For: 0.95.0, 0.98.0, 0.94.7

 Attachments: 8229-0.94.txt, 8229-0.94-V2.txt


 One of our RS/DN machines ran out of diskspace on the partition to which we 
 write the log files.
 It turns out we still had a table in our source cluster with 
 REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster.
 It then logged a long stack trace every 50ms or so, over a few days, which 
 filled up our log partition.
 Since ReplicationSource cannot make any progress in this case anyway, it 
 should probably sleep a bit before retrying (or at least limit the rate at 
 which it spews out these exceptions to the log).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8213) global authorization may lose efficacy

2013-04-01 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618693#comment-13618693
 ] 

Jieshan Bean commented on HBASE-8213:
-

[~apurtell] Thank you for the trunk patch. I had planned to do that after the review :)

 global authorization may lose efficacy 
 ---

 Key: HBASE-8213
 URL: https://issues.apache.org/jira/browse/HBASE-8213
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.95.0, 0.96.0, 0.94.7
Reporter: Jieshan Bean
Assignee: Jieshan Bean
Priority: Critical
 Attachments: HBASE-8213-94.patch, HBASE-8213-trunk.patch


 It depends on the order in which the regions are opened.  
 Suppose we have 1 regionserver with only 1 user region REGION-A on this 
 server, while the _acl_ region is on another regionserver, and _acl_ was opened a 
 few seconds before REGION-A.
 The global authorization data read from ZooKeeper was overwritten by the data 
 read from the configuration.
 {code}
   private TableAuthManager(ZooKeeperWatcher watcher, Configuration conf)
       throws IOException {
     this.conf = conf;
     this.zkperms = new ZKPermissionWatcher(watcher, this, conf);
     try {
       // Read global authorization data from zookeeper.
       this.zkperms.start();
     } catch (KeeperException ke) {
       LOG.error("ZooKeeper initialization failed", ke);
     }
     // It will overwrite globalCache.
     // initialize global permissions based on configuration
     globalCache = initGlobal(conf);
   }
 {code}
 This issue can be easily reproduced with the steps below:
 1. Start a cluster with 3 regionservers.
 2. Create a new table T1.
 3. Grant a new user USER-A global authorization.
 4. Kill regionserver RS3 and switch the balancer off.
 5. Start regionserver RS3.
 6. Assign region T1 to RS3.
 7. Put data with user USER-A.
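
 One way to avoid the overwrite, sketched here only (not the attached patch): 
 seed globalCache from the configuration first, then start the ZooKeeper 
 watcher so the permissions already stored in ZK are applied on top of the 
 config-derived cache rather than being clobbered by it.
 {code}
 // Sketch only, not the attached HBASE-8213 patch: initialize from the
 // configuration first, then let the ZK watcher refresh on top of it.
 private TableAuthManager(ZooKeeperWatcher watcher, Configuration conf)
     throws IOException {
   this.conf = conf;
   // initialize global permissions based on configuration (superusers etc.)
   globalCache = initGlobal(conf);
   this.zkperms = new ZKPermissionWatcher(watcher, this, conf);
   try {
     // Read global authorization data from zookeeper; updates from ZK are
     // applied over the config-derived cache instead of being overwritten.
     this.zkperms.start();
   } catch (KeeperException ke) {
     LOG.error("ZooKeeper initialization failed", ke);
   }
 }
 {code}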

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8213) global authorization may lose efficacy

2013-04-01 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13619386#comment-13619386
 ] 

Jieshan Bean commented on HBASE-8213:
-

Thank you for the review, [~apurtell][~yuzhih...@gmail.com]:)


 global authorization may lose efficacy 
 ---

 Key: HBASE-8213
 URL: https://issues.apache.org/jira/browse/HBASE-8213
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.95.0, 0.96.0, 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.95.0, 0.98.0, 0.94.7

 Attachments: HBASE-8213-94.patch, HBASE-8213-trunk.patch


 It depends on the order in which the regions are opened.  
 Suppose we have 1 regionserver with only 1 user region REGION-A on this 
 server, while the _acl_ region is on another regionserver, and _acl_ was opened a 
 few seconds before REGION-A.
 The global authorization data read from ZooKeeper was overwritten by the data 
 read from the configuration.
 {code}
   private TableAuthManager(ZooKeeperWatcher watcher, Configuration conf)
       throws IOException {
     this.conf = conf;
     this.zkperms = new ZKPermissionWatcher(watcher, this, conf);
     try {
       // Read global authorization data from zookeeper.
       this.zkperms.start();
     } catch (KeeperException ke) {
       LOG.error("ZooKeeper initialization failed", ke);
     }
     // It will overwrite globalCache.
     // initialize global permissions based on configuration
     globalCache = initGlobal(conf);
   }
 {code}
 This issue can be easily reproduced with the steps below:
 1. Start a cluster with 3 regionservers.
 2. Create a new table T1.
 3. Grant a new user USER-A global authorization.
 4. Kill regionserver RS3 and switch the balancer off.
 5. Start regionserver RS3.
 6. Assign region T1 to RS3.
 7. Put data with user USER-A.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8230) Possible NPE on regionserver abort if replication service has not been started

2013-04-01 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean updated HBASE-8230:


Attachment: HBASE-8230-trunk.patch

Patch for trunk.

 Possible NPE on regionserver abort if replication service has not been started
 --

 Key: HBASE-8230
 URL: https://issues.apache.org/jira/browse/HBASE-8230
 Project: HBase
  Issue Type: Bug
  Components: regionserver, Replication
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: HBASE-8230-94.patch, HBASE-8230-trunk.patch


 RegionServer got an exception while calling setupWALAndReplication, so it 
 entered the abort flow. Since replicationSink had not been initialized yet, we 
 got the exception below:
 {noformat}
 Exception in thread regionserver26003 java.lang.NullPointerException
  at 
 org.apache.hadoop.hbase.replication.regionserver.Replication.join(Replication.java:129)
  at 
 org.apache.hadoop.hbase.replication.regionserver.Replication.stopReplicationService(Replication.java:120)
  at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.join(HRegionServer.java:1803)
  at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:834)
  at java.lang.Thread.run(Thread.java:662)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8230) Possible NPE on regionserver abort if replication service has not been started

2013-04-01 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean updated HBASE-8230:


Status: Patch Available  (was: Open)

 Possible NPE on regionserver abort if replication service has not been started
 --

 Key: HBASE-8230
 URL: https://issues.apache.org/jira/browse/HBASE-8230
 Project: HBase
  Issue Type: Bug
  Components: regionserver, Replication
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: HBASE-8230-94.patch, HBASE-8230-trunk.patch


 RegionServer got an exception while calling setupWALAndReplication, so it 
 entered the abort flow. Since replicationSink had not been initialized yet, we 
 got the exception below:
 {noformat}
 Exception in thread regionserver26003 java.lang.NullPointerException
  at 
 org.apache.hadoop.hbase.replication.regionserver.Replication.join(Replication.java:129)
  at 
 org.apache.hadoop.hbase.replication.regionserver.Replication.stopReplicationService(Replication.java:120)
  at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.join(HRegionServer.java:1803)
  at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:834)
  at java.lang.Thread.run(Thread.java:662)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-04-01 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13619487#comment-13619487
 ] 

Jieshan Bean commented on HBASE-8229:
-

Yes, it's really a good idea. 

 Replication code logs like crazy if a target table cannot be found.
 ---

 Key: HBASE-8229
 URL: https://issues.apache.org/jira/browse/HBASE-8229
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
 Fix For: 0.95.0, 0.98.0, 0.94.7


 One of our RS/DN machines ran out of diskspace on the partition to which we 
 write the log files.
 It turns out we still had a table in our source cluster with 
 REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster.
 It then logged a long stack trace every 50ms or so, over a few days, which 
 filled up our log partition.
 Since ReplicationSource cannot make any progress in this case anyway, it 
 should probably sleep a bit before retrying (or at least limit the rate at 
 which it spews out these exceptions to the log).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8230) Possible NPE on regionserver abort if replication service has not been started

2013-03-31 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618549#comment-13618549
 ] 

Jieshan Bean commented on HBASE-8230:
-

bq.Did the failure happen when region server restarted ?
Yes.

bq.If this was repeatable, I would suggest finding the root cause.
The root cause in our env was that the NameNode was in safe mode:
{noformat}
2013-03-29 10:32:42,260 FATAL [regionserver26003] ABORTING region server 
om-host2,26003,1364524173470: Unhandled exception: cannot get log writer 
org.apache.hadoop.hbase.regionserver.HRegionServer.abort(HRegionServer.java:1737)
java.io.IOException: cannot get log writer
at 
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:757)
at 
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:701)
at 
org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:637)
at 
org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:582)
at org.apache.hadoop.hbase.regionserver.wal.HLog.init(HLog.java:436)
at org.apache.hadoop.hbase.regionserver.wal.HLog.init(HLog.java:362)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateHLog(HRegionServer.java:1327)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1316)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1030)
at 
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:706)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.IOException: 
org.apache.hadoop.hdfs.server.namenode.SafeModeException: Cannot create 
file/hbase/.logs/om-host2,26003,1364524173470/om-host2%2C26003%2C1364524173470.1364524361366.
 Name node is in safe mode.
The reported blocks 14 has reached the threshold 0.9990 of total blocks 14. 
Safe mode will be turned off automatically in 21 seconds.
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFileInternal(FSNamesystem.java:1601)
at 
org.apache.hadoop.hdfs.server.namenode.FSNamesystem.startFile(FSNamesystem.java:1547)
at 
org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.create(NameNodeRpcServer.java:412)
at 
org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.create(ClientNamenodeProtocolServerSideTranslatorPB.java:204)
at 
org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java:43664)
at 
org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:427)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:924)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1710)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:1706)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at 
org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1232)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:1704)

at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter.init(SequenceFileLogWriter.java:209)
at 
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:754)
... 10 more
{noformat}


 Possible NPE on regionserver abort if replication service has not been started
 --

 Key: HBASE-8230
 URL: https://issues.apache.org/jira/browse/HBASE-8230
 Project: HBase
  Issue Type: Bug
  Components: regionserver, Replication
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: HBASE-8230-94.patch


 RegionServer got an exception while calling setupWALAndReplication, so it 
 entered the abort flow. Since replicationSink had not been initialized yet, we 
 got the exception below:
 {noformat}
 Exception in thread regionserver26003 java.lang.NullPointerException
  at 
 org.apache.hadoop.hbase.replication.regionserver.Replication.join(Replication.java:129)
  at 
 org.apache.hadoop.hbase.replication.regionserver.Replication.stopReplicationService(Replication.java:120)
  at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.join(HRegionServer.java:1803)
  at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:834)
  at java.lang.Thread.run(Thread.java:662)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-03-31 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean updated HBASE-8229:


Component/s: Replication

 Replication code logs like crazy if a target table cannot be found.
 ---

 Key: HBASE-8229
 URL: https://issues.apache.org/jira/browse/HBASE-8229
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
 Fix For: 0.95.0, 0.98.0, 0.94.7


 One of our RS/DN machines ran out of diskspace on the partition to which we 
 write the log files.
 It turns out we still had a table in our source cluster with 
 REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster.
 It then logged a long stack trace every 50ms or so, over a few days, which 
 filled up our log partition.
 Since ReplicationSource cannot make any progress in this case anyway, it 
 should probably sleep a bit before retrying (or at least limit the rate at 
 which it spews out these exceptions to the log).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-03-31 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618559#comment-13618559
 ] 

Jieshan Bean commented on HBASE-8229:
-

bq.For this issue, I'll just add the same waiting we do when the peer is down 
(which is the same logical behavior we currently have, but without the insane 
busy retrying).
+1





 Replication code logs like crazy if a target table cannot be found.
 ---

 Key: HBASE-8229
 URL: https://issues.apache.org/jira/browse/HBASE-8229
 Project: HBase
  Issue Type: Bug
  Components: Replication
Reporter: Lars Hofhansl
 Fix For: 0.95.0, 0.98.0, 0.94.7


 One of our RS/DN machines ran out of diskspace on the partition to which we 
 write the log files.
 It turns out we still had a table in our source cluster with 
 REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster.
 It then logged a long stack trace every 50ms or so, over a few days, which 
 filled up our log partition.
 Since ReplicationSource cannot make any progress in this case anyway, it 
 should probably sleep a bit before retrying (or at least limit the rate at 
 which it spews out these exceptions to the log).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-03-30 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618003#comment-13618003
 ] 

Jieshan Bean commented on HBASE-8229:
-

I suggest letting ReplicationSource wait if a table being replicated is not 
present, like the scenario where the peer cluster is unavailable.
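
A rough sketch of that waiting behavior (shipEdits() stands in for the actual shipping call, and sleepForRetries/sleepMultiplier follow the existing ReplicationSource backoff helpers; this is a sketch, not a committed patch):
{code}
// Sketch only: back off when the sink rejects edits because the target table
// is missing, mirroring the existing "peer cluster is down" wait.
try {
  shipEdits();   // stands in for the replicateLogEntries call to the sink
} catch (IOException ioe) {
  if (ioe.getMessage() != null
      && ioe.getMessage().contains("TableNotFoundException")) {
    LOG.warn("Peer cluster does not have the target table yet, sleeping before retry", ioe);
    sleepForRetries("Table missing on peer cluster", sleepMultiplier);
    if (sleepMultiplier < maxRetriesMultiplier) {
      sleepMultiplier++;
    }
  } else {
    throw ioe;
  }
}
{code}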

 Replication code logs like crazy if a target table cannot be found.
 ---

 Key: HBASE-8229
 URL: https://issues.apache.org/jira/browse/HBASE-8229
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
 Fix For: 0.95.0, 0.98.0, 0.94.7


 One of our RS/DN machines ran out of diskspace on the partition to which we 
 write the log files.
 It turns out we still had a table in our source cluster with 
 REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster.
 It then logged a long stack trace every 50ms or so, over a few days, which 
 filled up our log partition.
 Since ReplicationSource cannot make any progress in this case anyway, it 
 should probably sleep a bit before retrying (or at least limit the rate at 
 which it spews out these exceptions to the log).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8213) global authorization may lose efficacy

2013-03-30 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean updated HBASE-8213:


Attachment: HBASE-8213-94.patch

Patch for 94.

 global authorization may lose efficacy 
 ---

 Key: HBASE-8213
 URL: https://issues.apache.org/jira/browse/HBASE-8213
 Project: HBase
  Issue Type: Bug
  Components: security
Affects Versions: 0.95.0, 0.96.0, 0.94.7
Reporter: Jieshan Bean
Assignee: Jieshan Bean
Priority: Critical
 Attachments: HBASE-8213-94.patch


 It depends on the order in which the regions are opened.  
 Suppose we have 1 regionserver with only 1 user region REGION-A on this 
 server, while the _acl_ region is on another regionserver, and _acl_ was opened a 
 few seconds before REGION-A.
 The global authorization data read from ZooKeeper was overwritten by the data 
 read from the configuration.
 {code}
   private TableAuthManager(ZooKeeperWatcher watcher, Configuration conf)
       throws IOException {
     this.conf = conf;
     this.zkperms = new ZKPermissionWatcher(watcher, this, conf);
     try {
       // Read global authorization data from zookeeper.
       this.zkperms.start();
     } catch (KeeperException ke) {
       LOG.error("ZooKeeper initialization failed", ke);
     }
     // It will overwrite globalCache.
     // initialize global permissions based on configuration
     globalCache = initGlobal(conf);
   }
 {code}
 This issue can be easily reproduced with the steps below:
 1. Start a cluster with 3 regionservers.
 2. Create a new table T1.
 3. Grant a new user USER-A global authorization.
 4. Kill regionserver RS3 and switch the balancer off.
 5. Start regionserver RS3.
 6. Assign region T1 to RS3.
 7. Put data with user USER-A.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-8230) Possible NPE on regionserver abort if replication service has not been started

2013-03-30 Thread Jieshan Bean (JIRA)
Jieshan Bean created HBASE-8230:
---

 Summary: Possible NPE on regionserver abort if replication service 
has not been started
 Key: HBASE-8230
 URL: https://issues.apache.org/jira/browse/HBASE-8230
 Project: HBase
  Issue Type: Bug
  Components: regionserver, Replication
Affects Versions: 0.94.6
Reporter: Jieshan Bean


RegionServer got an exception while calling setupWALAndReplication, so it entered the abort 
flow. Since replicationSink had not been initialized yet, we got the exception below:
{noformat}
Exception in thread regionserver26003 java.lang.NullPointerException
 at 
org.apache.hadoop.hbase.replication.regionserver.Replication.join(Replication.java:129)
 at 
org.apache.hadoop.hbase.replication.regionserver.Replication.stopReplicationService(Replication.java:120)
 at 
org.apache.hadoop.hbase.regionserver.HRegionServer.join(HRegionServer.java:1803)
 at 
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:834)
 at java.lang.Thread.run(Thread.java:662)
{noformat}


--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8230) Possible NPE on regionserver abort if replication service has not been started

2013-03-30 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean updated HBASE-8230:


Attachment: HBASE-8230-94.patch

patch for 94.

 Possible NPE on regionserver abort if replication service has not been started
 --

 Key: HBASE-8230
 URL: https://issues.apache.org/jira/browse/HBASE-8230
 Project: HBase
  Issue Type: Bug
  Components: regionserver, Replication
Affects Versions: 0.94.6
Reporter: Jieshan Bean
 Attachments: HBASE-8230-94.patch


 RegionServer got an exception while calling setupWALAndReplication, so it entered 
 the abort flow. Since replicationSink had not been initialized yet, we got the 
 exception below:
 {noformat}
 Exception in thread regionserver26003 java.lang.NullPointerException
  at 
 org.apache.hadoop.hbase.replication.regionserver.Replication.join(Replication.java:129)
  at 
 org.apache.hadoop.hbase.replication.regionserver.Replication.stopReplicationService(Replication.java:120)
  at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.join(HRegionServer.java:1803)
  at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:834)
  at java.lang.Thread.run(Thread.java:662)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HBASE-8230) Possible NPE on regionserver abort if replication service has not been started

2013-03-30 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8230?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean reassigned HBASE-8230:
---

Assignee: Jieshan Bean

 Possible NPE on regionserver abort if replication service has not been started
 --

 Key: HBASE-8230
 URL: https://issues.apache.org/jira/browse/HBASE-8230
 Project: HBase
  Issue Type: Bug
  Components: regionserver, Replication
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: HBASE-8230-94.patch


 RegionServer got an exception while calling setupWALAndReplication, so it entered 
 the abort flow. Since replicationSink had not been initialized yet, we got the 
 exception below:
 {noformat}
 Exception in thread regionserver26003 java.lang.NullPointerException
  at 
 org.apache.hadoop.hbase.replication.regionserver.Replication.join(Replication.java:129)
  at 
 org.apache.hadoop.hbase.replication.regionserver.Replication.stopReplicationService(Replication.java:120)
  at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.join(HRegionServer.java:1803)
  at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:834)
  at java.lang.Thread.run(Thread.java:662)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-1936) ClassLoader that loads from hdfs; useful adding filters to classpath without having to restart services

2013-03-30 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618019#comment-13618019
 ] 

Jieshan Bean commented on HBASE-1936:
-

Sorry, I don’t have enough time to finish it currently. Please feel free to 
take it over if you are interested in it :)

 ClassLoader that loads from hdfs; useful adding filters to classpath without 
 having to restart services
 ---

 Key: HBASE-1936
 URL: https://issues.apache.org/jira/browse/HBASE-1936
 Project: HBase
  Issue Type: New Feature
Reporter: stack
Assignee: Jieshan Bean
  Labels: noob
 Attachments: cp_from_hdfs.patch, HBASE-1936-trunk(forReview).patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HBASE-1936) ClassLoader that loads from hdfs; useful adding filters to classpath without having to restart services

2013-03-30 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean reassigned HBASE-1936:
---

Assignee: (was: Jieshan Bean)

 ClassLoader that loads from hdfs; useful adding filters to classpath without 
 having to restart services
 ---

 Key: HBASE-1936
 URL: https://issues.apache.org/jira/browse/HBASE-1936
 Project: HBase
  Issue Type: New Feature
Reporter: stack
  Labels: noob
 Attachments: cp_from_hdfs.patch, HBASE-1936-trunk(forReview).patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8230) Possible NPE on regionserver abort if replication service has not been started

2013-03-30 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618054#comment-13618054
 ] 

Jieshan Bean commented on HBASE-8230:
-

Here's the exception:
{noformat}
2013-03-29 10:32:42,251 INFO  [regionserver26003] STOPPED: Failed 
initialization 
org.apache.hadoop.hbase.regionserver.HRegionServer.stop(HRegionServer.java:1665)
2013-03-29 10:32:42,253 ERROR [regionserver26003] Failed init 
org.apache.hadoop.hbase.regionserver.HRegionServer.cleanup(HRegionServer.java:1161)
java.io.IOException: cannot get log writer
 at org.apache.hadoop.hbase.regionserver.wal.HLog.createWriter(HLog.java:757)
 at 
org.apache.hadoop.hbase.regionserver.wal.HLog.createWriterInstance(HLog.java:701)
 at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:637)
 at org.apache.hadoop.hbase.regionserver.wal.HLog.rollWriter(HLog.java:582)
 at org.apache.hadoop.hbase.regionserver.wal.HLog.init(HLog.java:436)
 at org.apache.hadoop.hbase.regionserver.wal.HLog.init(HLog.java:362)
 at 
org.apache.hadoop.hbase.regionserver.HRegionServer.instantiateHLog(HRegionServer.java:1327)
 at 
org.apache.hadoop.hbase.regionserver.HRegionServer.setupWALAndReplication(HRegionServer.java:1316)
 at 
org.apache.hadoop.hbase.regionserver.HRegionServer.handleReportForDutyResponse(HRegionServer.java:1030)
 at 
org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:706)
 at java.lang.Thread.run(Thread.java:662)
{noformat}

 Possible NPE on regionserver abort if replication service has not been started
 --

 Key: HBASE-8230
 URL: https://issues.apache.org/jira/browse/HBASE-8230
 Project: HBase
  Issue Type: Bug
  Components: regionserver, Replication
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: HBASE-8230-94.patch


 RegionServer got an exception while calling setupWALAndReplication, so it entered 
 the abort flow. Since replicationSink had not been initialized yet, we got the 
 exception below:
 {noformat}
 Exception in thread regionserver26003 java.lang.NullPointerException
  at 
 org.apache.hadoop.hbase.replication.regionserver.Replication.join(Replication.java:129)
  at 
 org.apache.hadoop.hbase.replication.regionserver.Replication.stopReplicationService(Replication.java:120)
  at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.join(HRegionServer.java:1803)
  at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:834)
  at java.lang.Thread.run(Thread.java:662)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8230) Possible NPE on regionserver abort if replication service has not been started

2013-03-30 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8230?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618204#comment-13618204
 ] 

Jieshan Bean commented on HBASE-8230:
-

Any exception that occurs before startServiceThreads may cause this NPE, right? So 
what caused the log writer creation failure is not the key point, I think.
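For illustration, a null guard along these lines (just a sketch of the idea, not the attached patch; the field and method names here are assumed) would let the abort path tolerate a replication service that never started:
{code}
// Sketch: skip replication components that were never created, so an abort
// during setupWALAndReplication does not NPE while joining them.
public void join() {
  if (this.replicationManager != null) {
    this.replicationManager.join();
  }
  if (this.replicationSink != null) {
    this.replicationSink.stopReplicationSinkServices();
  }
}
{code}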

 Possible NPE on regionserver abort if replication service has not been started
 --

 Key: HBASE-8230
 URL: https://issues.apache.org/jira/browse/HBASE-8230
 Project: HBase
  Issue Type: Bug
  Components: regionserver, Replication
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: HBASE-8230-94.patch


 RegionServer got an exception while calling setupWALAndReplication, so it entered 
 the abort flow. Since replicationSink had not been initialized yet, we got the 
 exception below:
 {noformat}
 Exception in thread regionserver26003 java.lang.NullPointerException
  at 
 org.apache.hadoop.hbase.replication.regionserver.Replication.join(Replication.java:129)
  at 
 org.apache.hadoop.hbase.replication.regionserver.Replication.stopReplicationService(Replication.java:120)
  at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.join(HRegionServer.java:1803)
  at 
 org.apache.hadoop.hbase.regionserver.HRegionServer.run(HRegionServer.java:834)
  at java.lang.Thread.run(Thread.java:662)
 {noformat}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8229) Replication code logs like crazy if a target table cannot be found.

2013-03-30 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13618207#comment-13618207
 ] 

Jieshan Bean commented on HBASE-8229:
-

bq.But can we simply wait again and again until the table is created on the 
other side?
I'm afraid we have to do that, unless we add a mechanism to check whether a 
table has already been deleted. But I think ReplicationSource is still 
responsible for shipping all the remaining edits; skipping any of them may cause data loss.
I think the most likely scenario for this problem is that the table was never created 
on the sink side.

bq. At some point, if there is any failure, we will still miss the edits.
[~jmspaggi] Can you show me one scenario? :)


 Replication code logs like crazy if a target table cannot be found.
 ---

 Key: HBASE-8229
 URL: https://issues.apache.org/jira/browse/HBASE-8229
 Project: HBase
  Issue Type: Bug
Reporter: Lars Hofhansl
 Fix For: 0.95.0, 0.98.0, 0.94.7


 One of our RS/DN machines ran out of diskspace on the partition to which we 
 write the log files.
 It turns out we still had a table in our source cluster with 
 REPLICATION_SCOPE=1 that did not have a matching table in the remote cluster.
 It then logged a long stack trace every 50 ms or so, which over a few days 
 filled up our log partition.
 Since ReplicationSource cannot make any progress in this case anyway, it 
 should probably sleep a bit before retrying (or at least limit the rate at 
 which it spews out these exceptions to the log).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7122) Proper warning message when opening a log file with no entries (idle cluster)

2013-03-28 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7122?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616115#comment-13616115
 ] 

Jieshan Bean commented on HBASE-7122:
-

[~himan...@cloudera.com] There's one possible problem in this patch. Suppose we 
have several logs in a recovered queue and one of them is empty; this change will 
hang the ReplicationSource thread, which will keep reopening the empty log.
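One possible way to avoid that hang, as a rough sketch only (queueRecovered, repLogReader.getPosition(), processEndOfFile and isEmptyFile are assumed names, not the committed change), is to advance past an empty log when it belongs to a recovered queue:
{code}
// Sketch: an empty log from a recovered queue can never receive new entries,
// so move on to the next log instead of retrying the same file forever.
if (this.queueRecovered && this.repLogReader.getPosition() == 0
    && isEmptyFile(this.currentPath)) {
  LOG.info("Skipping empty log from recovered queue: " + this.currentPath);
  processEndOfFile();   // switch to the next log in the recovered queue
  continue;
}
{code}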

 Proper warning message when opening a log file with no entries (idle cluster)
 -

 Key: HBASE-7122
 URL: https://issues.apache.org/jira/browse/HBASE-7122
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.94.2
Reporter: Himanshu Vashishtha
Assignee: Himanshu Vashishtha
 Fix For: 0.95.0

 Attachments: HBase-7122-94.patch, HBase-7122.patch, 
 HBASE-7122.v2.patch


 In case the cluster is idle and the log has rolled (offset back to 0), 
 replicationSource tries to open the log and gets an EOF exception. This gets 
 printed every 10 seconds until an entry is inserted into it.
 {code}
 2012-11-07 15:47:40,924 DEBUG regionserver.ReplicationSource 
 (ReplicationSource.java:openReader(487)) - Opening log for replication 
 c0315.hal.cloudera.com%2C40020%2C1352324202860.1352327804874 at 0
 2012-11-07 15:47:40,926 WARN  regionserver.ReplicationSource 
 (ReplicationSource.java:openReader(543)) - 1 Got: 
 java.io.EOFException
   at java.io.DataInputStream.readFully(DataInputStream.java:180)
   at java.io.DataInputStream.readFully(DataInputStream.java:152)
   at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1508)
   at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1486)
   at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1475)
   at 
 org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1470)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.init(SequenceFileLogReader.java:55)
   at 
 org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:175)
   at 
 org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:716)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:491)
   at 
 org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:290)
 2012-11-07 15:47:40,927 WARN  regionserver.ReplicationSource 
 (ReplicationSource.java:openReader(547)) - Waited too long for this file, 
 considering dumping
 2012-11-07 15:47:40,927 DEBUG regionserver.ReplicationSource 
 (ReplicationSource.java:sleepForRetries(562)) - Unable to open a reader, 
 sleeping 1000 times 10
 {code}
 We should reduce the log spewing in this case (or some informative message, 
 based on the offset).

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-8212) Introduce a new separator instead of hyphen('-') for renaming recovered queues' znodes

2013-03-28 Thread Jieshan Bean (JIRA)
Jieshan Bean created HBASE-8212:
---

 Summary: Introduce a new separator instead of hyphen('-') for 
renaming recovered queues' znodes
 Key: HBASE-8212
 URL: https://issues.apache.org/jira/browse/HBASE-8212
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.96.0, 0.94.7


Hyphens are frequently used in hostnames. Say we have one regionserver named 
160-172-0-1; in this scenario, 160-172-0-1 will be split into 4 strings 
and treated as 4 possible dead servers.
Replication then can no longer find all the logs for 160-172-0-1, which causes data loss.
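To make the failure concrete, here is a small self-contained illustration (the recovered-queue znode name below is simplified, not the exact format) of why splitting on - misparses a hyphenated server name, while a separator that cannot occur in hostnames keeps it intact:
{code}
// Illustrative only: recovered-queue znode names embed dead server names.
String recoveredQueue = "1-160-172-0-1,60020,1364199883591";

// Splitting on "-" scatters the hostname into separate "dead servers":
// [1, 160, 172, 0, 1,60020,1364199883591]
System.out.println(java.util.Arrays.toString(recoveredQueue.split("-")));

// With a separator that is illegal in hostnames, e.g. "#", the server
// name survives intact: [1, 160-172-0-1,60020,1364199883591]
String withNewSeparator = "1#160-172-0-1,60020,1364199883591";
System.out.println(java.util.Arrays.toString(withNewSeparator.split("#")));
{code}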

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8212) Introduce a new separator instead of hyphen('-') for renaming recovered queues' znodes

2013-03-28 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8212?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616131#comment-13616131
 ] 

Jieshan Bean commented on HBASE-8212:
-

Ya...Sorry, I didn't see that:(. It's the same issue.

 Introduce a new separator instead of hyphen('-') for renaming recovered 
 queues' znodes
 --

 Key: HBASE-8212
 URL: https://issues.apache.org/jira/browse/HBASE-8212
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.96.0, 0.94.7


 Hyphens are frequently used in hostnames. Say we have one regionserver 
 named 160-172-0-1; in this scenario, 160-172-0-1 will be split into 4 
 strings and treated as 4 possible dead servers.
 Replication then can no longer find all the logs for 160-172-0-1, which causes data loss.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-8207) Replication could have data loss when machine name contains hyphen -

2013-03-28 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean updated HBASE-8207:


Attachment: HBASE-8212-94.patch

I have finished one patch which uses # instead of -. Sorry, I raised the 
same issue but didn't notice this one was already there :(

 Replication could have data loss when machine name contains hyphen -
 --

 Key: HBASE-8207
 URL: https://issues.apache.org/jira/browse/HBASE-8207
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.95.0, 0.94.6
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
Priority: Critical
 Fix For: 0.95.0, 0.98.0, 0.94.7

 Attachments: failed.txt, HBASE-8212-94.patch


 In the recent TestReplication* test case failures, I'm finally able to find 
 the cause (or one of the causes) of their intermittent failures.
 When a machine name contains -, it breaks the function 
 ReplicationSource.checkIfQueueRecovered. It causes the following issue:
 the deadRegionServers list is way off, so replication doesn't wait for log 
 splitting to finish for a WAL file and moves on to the next one (data loss).
 You can see that replication uses those weird paths constructed from 
 deadRegionServers to check file existence:
 {code}
 2013-03-26 21:26:51,385 INFO  
 [ReplicationExecutor-0.replicationSource,2-ip-10-197-0-156.us-west-1.compute.internal,52170,1364333181125]
  regionserver.ReplicationSource(524): Possible location 
 hdfs://localhost:52882/user/ec2-user/hbase/.logs/1.compute.internal,52170,1364333181125/ip-10-197-0-156.us-west-1.compute.internal%252C52170%252C1364333181125.1364333199540
 2013-03-26 21:26:51,386 INFO  
 [ReplicationExecutor-0.replicationSource,2-ip-10-197-0-156.us-west-1.compute.internal,52170,1364333181125]
  regionserver.ReplicationSource(524): Possible location 
 hdfs://localhost:52882/user/ec2-user/hbase/.logs/1.compute.internal,52170,1364333181125-splitting/ip-10-197-0-156.us-west-1.compute.internal%252C52170%252C1364333181125.1364333199540
 2013-03-26 21:26:51,387 INFO  
 [ReplicationExecutor-0.replicationSource,2-ip-10-197-0-156.us-west-1.compute.internal,52170,1364333181125]
  regionserver.ReplicationSource(524): Possible location 
 hdfs://localhost:52882/user/ec2-user/hbase/.logs/west/ip-10-197-0-156.us-west-1.compute.internal%252C52170%252C1364333181125.1364333199540
 2013-03-26 21:26:51,389 INFO  
 [ReplicationExecutor-0.replicationSource,2-ip-10-197-0-156.us-west-1.compute.internal,52170,1364333181125]
  regionserver.ReplicationSource(524): Possible location 
 hdfs://localhost:52882/user/ec2-user/hbase/.logs/west-splitting/ip-10-197-0-156.us-west-1.compute.internal%252C52170%252C1364333181125.1364333199540
 2013-03-26 21:26:51,391 INFO  
 [ReplicationExecutor-0.replicationSource,2-ip-10-197-0-156.us-west-1.compute.internal,52170,1364333181125]
  regionserver.ReplicationSource(524): Possible location 
 hdfs://localhost:52882/user/ec2-user/hbase/.logs/156.us/ip-10-197-0-156.us-west-1.compute.internal%252C52170%252C1364333181125.1364333199540
 2013-03-26 21:26:51,394 INFO  
 [ReplicationExecutor-0.replicationSource,2-ip-10-197-0-156.us-west-1.compute.internal,52170,1364333181125]
  regionserver.ReplicationSource(524): Possible location 
 hdfs://localhost:52882/user/ec2-user/hbase/.logs/156.us-splitting/ip-10-197-0-156.us-west-1.compute.internal%252C52170%252C1364333181125.1364333199540
 2013-03-26 21:26:51,396 INFO  
 [ReplicationExecutor-0.replicationSource,2-ip-10-197-0-156.us-west-1.compute.internal,52170,1364333181125]
  regionserver.ReplicationSource(524): Possible location 
 hdfs://localhost:52882/user/ec2-user/hbase/.logs/0/ip-10-197-0-156.us-west-1.compute.internal%252C52170%252C1364333181125.1364333199540
 2013-03-26 21:26:51,398 INFO  
 [ReplicationExecutor-0.replicationSource,2-ip-10-197-0-156.us-west-1.compute.internal,52170,1364333181125]
  regionserver.ReplicationSource(524): Possible location 
 hdfs://localhost:52882/user/ec2-user/hbase/.logs/0-splitting/ip-10-197-0-156.us-west-1.compute.internal%252C52170%252C1364333181125.1364333199540
 {code}
 This happened in the recent test failure in 
 http://54.241.6.143/job/HBase-0.94/org.apache.hbase$hbase/21/testReport/junit/org.apache.hadoop.hbase.replication/TestReplicationQueueFailover/queueFailover/?auto_refresh=false
 Search for 
 {code}
 File does not exist: 
 hdfs://localhost:52882/user/ec2-user/hbase/.oldlogs/ip-10-197-0-156.us-west-1.compute.internal%2C52170%2C1364333181125.1364333199540
 {code}
 After 10 retries, the replication source gave up and moved on to the next 
 file. Data loss happens. 
 Since lots of EC2 machine names contain -, including our Jenkins servers, 
 this is a high-impact issue.

--
This message is automatically generated by JIRA.
If you 

[jira] [Resolved] (HBASE-8212) Introduce a new separator instead of hyphen('-') for renaming recovered queues' znodes

2013-03-28 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8212?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean resolved HBASE-8212.
-

Resolution: Duplicate

 Introduce a new separator instead of hyphen('-') for renaming recovered 
 queues' znodes
 --

 Key: HBASE-8212
 URL: https://issues.apache.org/jira/browse/HBASE-8212
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.94.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.96.0, 0.94.7


 Hyphens are frequently used in hostnames. Say we have one regionserver 
 named 160-172-0-1; in this scenario, 160-172-0-1 will be split into 4 
 strings and treated as 4 possible dead servers.
 Replication then can no longer find all the logs for 160-172-0-1, which causes data loss.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8207) Replication could have data loss when machine name contains hyphen -

2013-03-28 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13616141#comment-13616141
 ] 

Jieshan Bean commented on HBASE-8207:
-

We found the same problem in our test environment, attaching the logs for your 
reference:
{noformat}
2013-03-25 04:51:20,929 INFO  
[ReplicationExecutor-0.replicationSource,1-160-172-0-130,60020,1364199883591] 
NB dead servers : 4 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:517)
2013-03-25 04:51:20,929 INFO  
[ReplicationExecutor-0.replicationSource,1-160-172-0-130,60020,1364199883591] 
Possible location 
hdfs://hacluster/hbase/.logs/130,60020,1364199883591/160-172-0-130%252C60020%252C1364199883591.1364200564291
 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:528)
2013-03-25 04:51:20,930 INFO  
[ReplicationExecutor-0.replicationSource,1-160-172-0-130,60020,1364199883591] 
Possible location 
hdfs://hacluster/hbase/.logs/130,60020,1364199883591-splitting/160-172-0-130%252C60020%252C1364199883591.1364200564291
 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:528)
2013-03-25 04:51:20,932 INFO  
[ReplicationExecutor-0.replicationSource,1-160-172-0-130,60020,1364199883591] 
Possible location 
hdfs://hacluster/hbase/.logs/0/160-172-0-130%252C60020%252C1364199883591.1364200564291
 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:528)
2013-03-25 04:51:20,934 INFO  
[ReplicationExecutor-0.replicationSource,1-160-172-0-130,60020,1364199883591] 
Possible location 
hdfs://hacluster/hbase/.logs/0-splitting/160-172-0-130%252C60020%252C1364199883591.1364200564291
 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:528)
2013-03-25 04:51:20,935 INFO  
[ReplicationExecutor-0.replicationSource,1-160-172-0-130,60020,1364199883591] 
Possible location 
hdfs://hacluster/hbase/.logs/172/160-172-0-130%252C60020%252C1364199883591.1364200564291
 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:528)
2013-03-25 04:51:20,937 INFO  
[ReplicationExecutor-0.replicationSource,1-160-172-0-130,60020,1364199883591] 
Possible location 
hdfs://hacluster/hbase/.logs/172-splitting/160-172-0-130%252C60020%252C1364199883591.1364200564291
 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:528)
2013-03-25 04:51:20,938 INFO  
[ReplicationExecutor-0.replicationSource,1-160-172-0-130,60020,1364199883591] 
Possible location 
hdfs://hacluster/hbase/.logs/160/160-172-0-130%252C60020%252C1364199883591.1364200564291
 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:528)
2013-03-25 04:51:20,939 INFO  
[ReplicationExecutor-0.replicationSource,1-160-172-0-130,60020,1364199883591] 
Possible location 
hdfs://hacluster/hbase/.logs/160-splitting/160-172-0-130%252C60020%252C1364199883591.1364200564291
 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:528)
2013-03-25 04:51:20,941 WARN  
[ReplicationExecutor-0.replicationSource,1-160-172-0-130,60020,1364199883591] 
1-160-172-0-130,60020,1364199883591 Got:  
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:563)
java.io.IOException: File from recovered queue is nowhere to be found
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:545)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:311)
Caused by: java.io.FileNotFoundException: File does not exist: 
hdfs://hacluster/hbase/.oldlogs/160-172-0-130%2C60020%2C1364199883591.1364200564291
at 
org.apache.hadoop.hdfs.DistributedFileSystem.getFileStatus(DistributedFileSystem.java:752)
at 
org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1692)
at 
org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1716)
at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.init(SequenceFileLogReader.java:55)
at 
org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:177)
at 
org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:728)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationHLogReaderManager.openReader(ReplicationHLogReaderManager.java:67)
at 
org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:511)
... 1 more
{noformat}

 Replication could have data loss when machine name contains hyphen -
 

[jira] [Created] (HBASE-8213) global authorization may lose efficacy

2013-03-28 Thread Jieshan Bean (JIRA)
Jieshan Bean created HBASE-8213:
---

 Summary: global authorization may lose efficacy 
 Key: HBASE-8213
 URL: https://issues.apache.org/jira/browse/HBASE-8213
 Project: HBase
  Issue Type: Bug
Reporter: Jieshan Bean
Priority: Critical


It depends on which region gets opened first.
Suppose we have only 1 regionserver with only 1 user region REGION-A on this 
server, while the _acl_ region is on another regionserver. _acl_ was opened a few 
seconds before REGION-A.
The global authorization data read from ZooKeeper was then overwritten by the data 
read from configuration.
{code}
  private TableAuthManager(ZooKeeperWatcher watcher, Configuration conf)
      throws IOException {
    this.conf = conf;
    this.zkperms = new ZKPermissionWatcher(watcher, this, conf);
    try {
      // Read global authorization data from zookeeper.
      this.zkperms.start();
    } catch (KeeperException ke) {
      LOG.error("ZooKeeper initialization failed", ke);
    }
    // It will overwrite globalCache.
    // initialize global permissions based on configuration
    globalCache = initGlobal(conf);
  }
{code}

This issue can be easily reproduced by below steps:
1. Start a cluster with 3 regionservers.
2. Create a new table T1.
3. grant a new user USER-A with global authorization.
4. Kill 1 regionserver RS3 and switch balance off.
5. Start regionserver RS3.
6. Assign region T1 to RS3.
7. Put data with user USER-A.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HBASE-8213) global authorization may lose efficacy

2013-03-28 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-8213?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean reassigned HBASE-8213:
---

Assignee: Jieshan Bean

 global authorization may lose efficacy 
 ---

 Key: HBASE-8213
 URL: https://issues.apache.org/jira/browse/HBASE-8213
 Project: HBase
  Issue Type: Bug
Reporter: Jieshan Bean
Assignee: Jieshan Bean
Priority: Critical

 It depends on which region gets opened first.
 Suppose we have only 1 regionserver with only 1 user region REGION-A on this 
 server, while the _acl_ region is on another regionserver. _acl_ was opened a few 
 seconds before REGION-A.
 The global authorization data read from ZooKeeper was then overwritten by the data 
 read from configuration.
 {code}
   private TableAuthManager(ZooKeeperWatcher watcher, Configuration conf)
       throws IOException {
     this.conf = conf;
     this.zkperms = new ZKPermissionWatcher(watcher, this, conf);
     try {
       // Read global authorization data from zookeeper.
       this.zkperms.start();
     } catch (KeeperException ke) {
       LOG.error("ZooKeeper initialization failed", ke);
     }
     // It will overwrite globalCache.
     // initialize global permissions based on configuration
     globalCache = initGlobal(conf);
   }
 {code}
 This issue can be easily reproduced by below steps:
 1. Start a cluster with 3 regionservers.
 2. Create a new table T1.
 3. grant a new user USER-A with global authorization.
 4. Kill 1 regionserver RS3 and switch balance off.
 5. Start regionserver RS3.
 6. Assign region T1 to RS3.
 7. Put data with user USER-A.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-8207) Replication could have data loss when machine name contains hyphen -

2013-03-28 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13617056#comment-13617056
 ] 

Jieshan Bean commented on HBASE-8207:
-

New patch also looks good to me. Is it necessary to add restrictions on the 
peer-id when calling add_peer?
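Something along these lines (purely a sketch; the method name and where it would hook into add_peer are assumptions) is what I have in mind:
{code}
// Hypothetical validation sketch for peer ids passed to add_peer.
private static void checkPeerId(String peerId) {
  if (peerId == null || peerId.isEmpty()) {
    throw new IllegalArgumentException("Peer id must not be empty");
  }
  if (peerId.contains("-")) {
    throw new IllegalArgumentException(
        "Peer id must not contain '-', which is used as the znode separator: " + peerId);
  }
}
{code}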

 Replication could have data loss when machine name contains hyphen -
 --

 Key: HBASE-8207
 URL: https://issues.apache.org/jira/browse/HBASE-8207
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.95.0, 0.94.6
Reporter: Jeffrey Zhong
Assignee: Jeffrey Zhong
Priority: Critical
 Fix For: 0.95.0, 0.98.0, 0.94.7

 Attachments: failed.txt, hbase-8207-0.94-v1.patch, hbase-8207.patch, 
 hbase-8207_v1.patch, hbase-8207_v2.patch, hbase-8207_v2.patch, 
 HBASE-8212-94.patch


 In the recent TestReplication* test case failures, I'm finally able to find 
 the cause (or one of the causes) of their intermittent failures.
 When a machine name contains -, it breaks the function 
 ReplicationSource.checkIfQueueRecovered. It causes the following issue:
 the deadRegionServers list is way off, so replication doesn't wait for log 
 splitting to finish for a WAL file and moves on to the next one (data loss).
 You can see that replication uses those weird paths constructed from 
 deadRegionServers to check file existence:
 {code}
 2013-03-26 21:26:51,385 INFO  
 [ReplicationExecutor-0.replicationSource,2-ip-10-197-0-156.us-west-1.compute.internal,52170,1364333181125]
  regionserver.ReplicationSource(524): Possible location 
 hdfs://localhost:52882/user/ec2-user/hbase/.logs/1.compute.internal,52170,1364333181125/ip-10-197-0-156.us-west-1.compute.internal%252C52170%252C1364333181125.1364333199540
 2013-03-26 21:26:51,386 INFO  
 [ReplicationExecutor-0.replicationSource,2-ip-10-197-0-156.us-west-1.compute.internal,52170,1364333181125]
  regionserver.ReplicationSource(524): Possible location 
 hdfs://localhost:52882/user/ec2-user/hbase/.logs/1.compute.internal,52170,1364333181125-splitting/ip-10-197-0-156.us-west-1.compute.internal%252C52170%252C1364333181125.1364333199540
 2013-03-26 21:26:51,387 INFO  
 [ReplicationExecutor-0.replicationSource,2-ip-10-197-0-156.us-west-1.compute.internal,52170,1364333181125]
  regionserver.ReplicationSource(524): Possible location 
 hdfs://localhost:52882/user/ec2-user/hbase/.logs/west/ip-10-197-0-156.us-west-1.compute.internal%252C52170%252C1364333181125.1364333199540
 2013-03-26 21:26:51,389 INFO  
 [ReplicationExecutor-0.replicationSource,2-ip-10-197-0-156.us-west-1.compute.internal,52170,1364333181125]
  regionserver.ReplicationSource(524): Possible location 
 hdfs://localhost:52882/user/ec2-user/hbase/.logs/west-splitting/ip-10-197-0-156.us-west-1.compute.internal%252C52170%252C1364333181125.1364333199540
 2013-03-26 21:26:51,391 INFO  
 [ReplicationExecutor-0.replicationSource,2-ip-10-197-0-156.us-west-1.compute.internal,52170,1364333181125]
  regionserver.ReplicationSource(524): Possible location 
 hdfs://localhost:52882/user/ec2-user/hbase/.logs/156.us/ip-10-197-0-156.us-west-1.compute.internal%252C52170%252C1364333181125.1364333199540
 2013-03-26 21:26:51,394 INFO  
 [ReplicationExecutor-0.replicationSource,2-ip-10-197-0-156.us-west-1.compute.internal,52170,1364333181125]
  regionserver.ReplicationSource(524): Possible location 
 hdfs://localhost:52882/user/ec2-user/hbase/.logs/156.us-splitting/ip-10-197-0-156.us-west-1.compute.internal%252C52170%252C1364333181125.1364333199540
 2013-03-26 21:26:51,396 INFO  
 [ReplicationExecutor-0.replicationSource,2-ip-10-197-0-156.us-west-1.compute.internal,52170,1364333181125]
  regionserver.ReplicationSource(524): Possible location 
 hdfs://localhost:52882/user/ec2-user/hbase/.logs/0/ip-10-197-0-156.us-west-1.compute.internal%252C52170%252C1364333181125.1364333199540
 2013-03-26 21:26:51,398 INFO  
 [ReplicationExecutor-0.replicationSource,2-ip-10-197-0-156.us-west-1.compute.internal,52170,1364333181125]
  regionserver.ReplicationSource(524): Possible location 
 hdfs://localhost:52882/user/ec2-user/hbase/.logs/0-splitting/ip-10-197-0-156.us-west-1.compute.internal%252C52170%252C1364333181125.1364333199540
 {code}
 This happened in the recent test failure in 
 http://54.241.6.143/job/HBase-0.94/org.apache.hbase$hbase/21/testReport/junit/org.apache.hadoop.hbase.replication/TestReplicationQueueFailover/queueFailover/?auto_refresh=false
 Search for 
 {code}
 File does not exist: 
 hdfs://localhost:52882/user/ec2-user/hbase/.oldlogs/ip-10-197-0-156.us-west-1.compute.internal%2C52170%2C1364333181125.1364333199540
 {code}
 After 10 retries, the replication source gave up and moved on to the next 
 file. Data loss happens. 
 Since lots of EC2 machine names contain - including our Jenkin 

[jira] [Commented] (HBASE-8104) HBase consistency and availability after replication

2013-03-20 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-8104?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13608509#comment-13608509
 ] 

Jieshan Bean commented on HBASE-8104:
-

I think so, this is the only way we can do that. But I don't think we really 
need that.

 HBase consistency and availability after replication
 

 Key: HBASE-8104
 URL: https://issues.apache.org/jira/browse/HBASE-8104
 Project: HBase
  Issue Type: New Feature
Affects Versions: 0.94.3
Reporter: Brian Fu
Priority: Critical
   Original Estimate: 336h
  Remaining Estimate: 336h

 HBase consistency and availability after replication
 The scenario is as follows:
 1. There are two HBase clusters, a master cluster and a slave cluster, with 
 replication enabled between them.
 2. If the master cluster has problems, all write and read requests switch 
 to the slave cluster.
 3. After a period of time, we need to switch back to the master cluster; part 
 of the data will be inconsistent, which makes that part of the data unavailable.
 This feature is particularly important for HBase clusters providing online 
 services.
 So, I want to keep the data consistent through a write-back program and thereby 
 improve HBase availability. 
 We will provide a patch for this function.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7750) We should throw IOE when calling HRegionServer#replicateLogEntries if ReplicationSink is null

2013-02-05 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7750?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13572239#comment-13572239
 ] 

Jieshan Bean commented on HBASE-7750:
-

bq.Also on the sink side we could just print a message saying that someone 
tried to replicate to us and weren't able to accept the edits.

I agree. The sink side should print this warning log. 
The source side needs to handle this exception; otherwise it keeps calling 
shipEdits without sleeping.

I will submit a patch after verification.
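As a rough sketch of the sink-side change (the exception message is illustrative, not the final patch), replicateLogEntries would fail fast instead of silently dropping the edits:
{code}
// Sketch: surface the failure to the source instead of returning silently,
// so the source backs off and retries rather than logging a false position.
public void replicateLogEntries(final HLog.Entry[] entries) throws IOException {
  checkOpen();
  if (this.replicationSinkHandler == null) {
    throw new IOException("Replication sink is not enabled on this regionserver"
        + " (is hbase.replication set to true on the slave cluster?)");
  }
  this.replicationSinkHandler.replicateLogEntries(entries);
}
{code}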

 We should throw IOE when calling HRegionServer#replicateLogEntries if 
 ReplicationSink is null
 -

 Key: HBASE-7750
 URL: https://issues.apache.org/jira/browse/HBASE-7750
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.96.0, 0.94.4
Reporter: Jieshan Bean

 It may be an expected behavior, but I think it's better to do something. 
 We configured hbase.replication as true in master cluster, and added peer. 
 But forgot to configure hbase.replication on slave cluster side.
 ReplicationSource read HLog, shipped log edits, and logged position. 
 Everything seemed alright. But data was not present in slave cluster.
 So I think, the slave cluster should throw an exception to the master cluster instead of 
 returning directly:
 {code}
   public void replicateLogEntries(final HLog.Entry[] entries)
   throws IOException {
 checkOpen();
 if (this.replicationSinkHandler == null) return;
 this.replicationSinkHandler.replicateLogEntries(entries);
   }
 {code}
 I would like to hear your comments on this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HBASE-7750) We should throw IOE when calling HRegionServer#replicateLogEntries if ReplicationSink is null

2013-02-05 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7750?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean reassigned HBASE-7750:
---

Assignee: Jieshan Bean

 We should throw IOE when calling HRegionServer#replicateLogEntries if 
 ReplicationSink is null
 -

 Key: HBASE-7750
 URL: https://issues.apache.org/jira/browse/HBASE-7750
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.96.0, 0.94.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean

 It may be an expected behavior, but I think it's better to do something. 
 We configured hbase.replication as true in master cluster, and added peer. 
 But forgot to configure hbase.replication on slave cluster side.
 ReplicationSource read HLog, shipped log edits, and logged position. 
 Everything seemed alright. But data was not present in slave cluster.
 So I think, the slave cluster should throw an exception to the master cluster instead of 
 returning directly:
 {code}
   public void replicateLogEntries(final HLog.Entry[] entries)
   throws IOException {
 checkOpen();
 if (this.replicationSinkHandler == null) return;
 this.replicationSinkHandler.replicateLogEntries(entries);
   }
 {code}
 I would like to hear your comments on this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-7750) We should throw IOE when calling HRegionServer#replicateLogEntries if ReplicationSink is null

2013-02-02 Thread Jieshan Bean (JIRA)
Jieshan Bean created HBASE-7750:
---

 Summary: We should throw IOE when calling 
HRegionServer#replicateLogEntries if ReplicationSink is null
 Key: HBASE-7750
 URL: https://issues.apache.org/jira/browse/HBASE-7750
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.94.4, 0.96.0
Reporter: Jieshan Bean


It may be an expected behavior, but I think it's better to do something. 
We configured hbase.replication as true in master cluster, and added peer. 
But forgot to configure hbase.replication on slave cluster side.
ReplicationSource read HLog, shipped log edits, and logged position. Everything 
seemed alright. But data was not present in slave cluster.

So I think, the slave cluster should throw an exception to the master cluster instead of 
returning directly:

{code}
  public void replicateLogEntries(final HLog.Entry[] entries)
  throws IOException {
checkOpen();
if (this.replicationSinkHandler == null) return;
this.replicationSinkHandler.replicateLogEntries(entries);
  }
{code}

I would like to hear your comments on this.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7705) Can we make the method getCurrentPoolSize of HTablePool public?

2013-01-29 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13565320#comment-13565320
 ] 

Jieshan Bean commented on HBASE-7705:
-

I think we can make it public. 
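If it were public, application-side usage would look roughly like this (the pool size and table name are made up):
{code}
// Hypothetical usage once getCurrentPoolSize is public.
HTablePool pool = new HTablePool(conf, 100);
int pooledForOrders = pool.getCurrentPoolSize("orders");
LOG.info("Pooled HTable instances for 'orders': " + pooledForOrders);
{code}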

 Can we make the method getCurrentPoolSize of HTablePool public?
 ---

 Key: HBASE-7705
 URL: https://issues.apache.org/jira/browse/HBASE-7705
 Project: HBase
  Issue Type: Wish
  Components: Client
Affects Versions: 0.94.3
Reporter: cuijianwei
Priority: Minor

 We use HTablePool to manager opened HTable in our applications. We want to 
 track the usage of HTablePool for different table names. Then we discover 
 that HTablePool#getCurrentPoolSize could help us:
 {code}
   int getCurrentPoolSize(String tableName) {
 return tables.size(tableName);
   }
 {code}
 However, this method could only be called in the hbase client package. Can we 
 make this method public?

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-7694) HBASE-6165 has been out of action after enabling security

2013-01-28 Thread Jieshan Bean (JIRA)
Jieshan Bean created HBASE-7694:
---

 Summary: HBASE-6165 has been out of action after enabling security
 Key: HBASE-7694
 URL: https://issues.apache.org/jira/browse/HBASE-7694
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.94.4, 0.96.0
 Environment: Replication should use an independent msg queue, 
otherwise replication msg may occupy all the handlers. This feature added in 
HBase-6165, but becomes invalid after enabling security.
Reporter: Jieshan Bean




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Assigned] (HBASE-7694) HBASE-6165 has been out of action after enabling security

2013-01-28 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean reassigned HBASE-7694:
---

Assignee: Jieshan Bean

 HBASE-6165 has been out of action after enabling security
 -

 Key: HBASE-7694
 URL: https://issues.apache.org/jira/browse/HBASE-7694
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.96.0, 0.94.4
 Environment: Replication should use an independent msg queue, 
 otherwise replication msg may occupy all the handlers. This feature added in 
 HBase-6165, but becomes invalid after enabling security.
Reporter: Jieshan Bean
Assignee: Jieshan Bean



--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7694) HBASE-6165 has been out of action after enabling security

2013-01-28 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean updated HBASE-7694:


Description: Replication should use an independent msg queue, otherwise 
replication msg may occupy all the handlers. This feature added in HBase-6165, 
but becomes invalid after enabling security.
Environment: (was: Replication should use an independent msg queue, 
otherwise replication msg may occupy all the handlers. This feature added in 
HBase-6165, but becomes invalid after enabling security.)

 HBASE-6165 has been out of action after enabling security
 -

 Key: HBASE-7694
 URL: https://issues.apache.org/jira/browse/HBASE-7694
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.96.0, 0.94.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean

 Replication should use an independent msg queue, otherwise replication msg 
 may occupy all the handlers. This feature added in HBase-6165, but becomes 
 invalid after enabling security.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7694) HBASE-6165 has been out of action after enabling security

2013-01-28 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13564216#comment-13564216
 ] 

Jieshan Bean commented on HBASE-7694:
-

I will submit a patch after testing.

 HBASE-6165 has been out of action after enabling security
 -

 Key: HBASE-7694
 URL: https://issues.apache.org/jira/browse/HBASE-7694
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.96.0, 0.94.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean

 Replication should use an independent msg queue, otherwise replication msg 
 may occupy all the handlers. This feature added in HBase-6165, but becomes 
 invalid after enabling security.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7694) HBASE-6165 has been out of action after enabling security

2013-01-28 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13564917#comment-13564917
 ] 

Jieshan Bean commented on HBASE-7694:
-

bq.Do you see your master cluster regionservers successfully connecting to the 
zk quorum and then region servers of the slave cluster?

Yes. Master cluster regionservers could connect to the slave cluster successfully. 
We use the same KDC configuration and the same principals.

 HBASE-6165 has been out of action after enabling security
 -

 Key: HBASE-7694
 URL: https://issues.apache.org/jira/browse/HBASE-7694
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.96.0, 0.94.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.96.0, 0.94.5


 Replication should use an independent msg queue, otherwise replication msg 
 may occupy all the handlers. This feature added in HBASE-6165, but becomes 
 invalid after enabling security.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Updated] (HBASE-7694) HBASE-6165 has been out of action after enabling security

2013-01-28 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean updated HBASE-7694:


Attachment: HBASE-7694-94.patch

Patch for 94.

 HBASE-6165 has been out of action after enabling security
 -

 Key: HBASE-7694
 URL: https://issues.apache.org/jira/browse/HBASE-7694
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.96.0, 0.94.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.96.0, 0.94.5

 Attachments: HBASE-7694-94.patch


 Replication should use an independent msg queue, otherwise replication msg 
 may occupy all the handlers. This feature added in HBASE-6165, but becomes 
 invalid after enabling security.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7694) HBASE-6165 has been out of action after enabling security

2013-01-28 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7694?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13564977#comment-13564977
 ] 

Jieshan Bean commented on HBASE-7694:
-

Yes, I think so.

 HBASE-6165 has been out of action after enabling security
 -

 Key: HBASE-7694
 URL: https://issues.apache.org/jira/browse/HBASE-7694
 Project: HBase
  Issue Type: Bug
  Components: Replication
Affects Versions: 0.96.0, 0.94.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.96.0, 0.94.5

 Attachments: HBASE-7694-94.patch


 Replication should use an independent msg queue, otherwise replication msg 
 may occupy all the handlers. This feature added in HBASE-6165, but becomes 
 invalid after enabling security.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7324) Archive the logs instead of deletion after distributed splitting

2012-12-28 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13540697#comment-13540697
 ] 

Jieshan Bean commented on HBASE-7324:
-

Sorry, I misread the code. It's not a problem.

 Archive the logs instead of deletion after distributed splitting
 

 Key: HBASE-7324
 URL: https://issues.apache.org/jira/browse/HBASE-7324
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.94.3, 0.96.0
Reporter: Jieshan Bean

 We should always move the logs to .oldlogs instead of deleting them directly. 
 The negative effect of this bug is possible data loss if replication is enabled.
 The below code is extracted from SplitLogManager#splitLogDistributed:
 {code}
 for (Path logDir : logDirs) {
   status.setStatus("Cleaning up log directory...");
   try {
     if (fs.exists(logDir) && !fs.delete(logDir, false)) {
       LOG.warn("Unable to delete log src dir. Ignoring. " + logDir);
     }
   } catch (IOException ioe) {
     FileStatus[] files = fs.listStatus(logDir);
     if (files != null && files.length > 0) {
       LOG.warn("returning success without actually splitting and " +
         "deleting all the log files in path " + logDir);
     } else {
       LOG.warn("Unable to delete log src dir. Ignoring. " + logDir, ioe);
     }
   }
   tot_mgr_log_split_batch_success.incrementAndGet();
 }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Resolved] (HBASE-7324) Archive the logs instead of deletion after distributed splitting

2012-12-28 Thread Jieshan Bean (JIRA)

 [ 
https://issues.apache.org/jira/browse/HBASE-7324?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Jieshan Bean resolved HBASE-7324.
-

Resolution: Invalid

 Archive the logs instead of deletion after distributed splitting
 

 Key: HBASE-7324
 URL: https://issues.apache.org/jira/browse/HBASE-7324
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.94.3, 0.96.0
Reporter: Jieshan Bean

 We should always move the logs to .oldlogs instead of deleting them directly. 
 The negative effect of this bug is possible data loss if replication is enabled.
 The below code is extracted from SplitLogManager#splitLogDistributed:
 {code}
 for (Path logDir : logDirs) {
   status.setStatus("Cleaning up log directory...");
   try {
     if (fs.exists(logDir) && !fs.delete(logDir, false)) {
       LOG.warn("Unable to delete log src dir. Ignoring. " + logDir);
     }
   } catch (IOException ioe) {
     FileStatus[] files = fs.listStatus(logDir);
     if (files != null && files.length > 0) {
       LOG.warn("returning success without actually splitting and " +
         "deleting all the log files in path " + logDir);
     } else {
       LOG.warn("Unable to delete log src dir. Ignoring. " + logDir, ioe);
     }
   }
   tot_mgr_log_split_batch_success.incrementAndGet();
 }
 {code}

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Created] (HBASE-7324) Archive the logs instead of deletion after distributed splitting

2012-12-11 Thread Jieshan Bean (JIRA)
Jieshan Bean created HBASE-7324:
---

 Summary: Archive the logs instead of deletion after distributed 
splitting
 Key: HBASE-7324
 URL: https://issues.apache.org/jira/browse/HBASE-7324
 Project: HBase
  Issue Type: Bug
  Components: wal
Affects Versions: 0.94.3, 0.96.0
Reporter: Jieshan Bean


We should always move the logs to .oldlogs instead of deleting them directly. 
The negative effect of this bug is possible data loss if replication is enabled.
The below code is extracted from SplitLogManager#splitLogDistributed:
{code}
for(Path logDir: logDirs){
  status.setStatus(Cleaning up log directory...);
  try {
if (fs.exists(logDir)  !fs.delete(logDir, false)) {
  LOG.warn(Unable to delete log src dir. Ignoring.  + logDir);
}
  } catch (IOException ioe) {
FileStatus[] files = fs.listStatus(logDir);
if (files != null  files.length  0) {
  LOG.warn(returning success without actually splitting and  + 
  deleting all the log files in path  + logDir);
} else {
  LOG.warn(Unable to delete log src dir. Ignoring.  + logDir, ioe);
}
  }
  tot_mgr_log_split_batch_success.incrementAndGet();
}
{code}
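
For illustration, here is a hedged sketch of the behaviour this report asks for: rename the 
split-log directory into the .oldlogs archive instead of deleting it. The oldLogDir path and 
error handling below are assumptions, not an actual SplitLogManager change (the issue was 
later resolved as Invalid after the reporter re-read the code).
{code}
// Hedged sketch only: archive the log dir instead of deleting it.
// rootDir, fs, status, logDirs and LOG are assumed to be in scope.
Path oldLogDir = new Path(rootDir, ".oldlogs");
for (Path logDir : logDirs) {
  status.setStatus("Archiving log directory...");
  try {
    if (fs.exists(logDir) && !fs.rename(logDir, new Path(oldLogDir, logDir.getName()))) {
      LOG.warn("Unable to archive log src dir. Ignoring. " + logDir);
    }
  } catch (IOException ioe) {
    LOG.warn("Error archiving log src dir " + logDir, ioe);
  }
  tot_mgr_log_split_batch_success.incrementAndGet();
}
{code}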

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7313) ColumnPaginationFilter should reset count when moving to NEXT_ROW

2012-12-10 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7313?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13528581#comment-13528581
 ] 

Jieshan Bean commented on HBASE-7313:
-

This reset has been there in ColumnPaginationFilter#reset(), right? 

 ColumnPaginationFilter should reset count when moving to NEXT_ROW
 -

 Key: HBASE-7313
 URL: https://issues.apache.org/jira/browse/HBASE-7313
 Project: HBase
  Issue Type: Bug
  Components: Filters
Affects Versions: 0.94.3, 0.96.0
Reporter: Varun Sharma
Assignee: Varun Sharma
 Fix For: 0.96.0, 0.94.4

 Attachments: 7313-0.94.txt, 7313-trunk.txt


 ColumnPaginationFilter does not reset its count to zero when moving to the next row. 
 Hence, once the limit number of columns has been returned for one row, all 
 subsequent rows will return 0 columns.
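
For context, a self-contained toy model (plain Java, not the actual filter source) of the 
counter being discussed; the class and method names are made up. The point is that the 
per-row count has to be cleared on every row boundary, which is what 
ColumnPaginationFilter#reset() is for.
{code}
// Toy model of the pagination state; illustrative only.
final class PaginationCounter {
  private final int limit;
  private final int offset;
  private int count = 0;

  PaginationCounter(int limit, int offset) {
    this.limit = limit;
    this.offset = offset;
  }

  // true if the current column should be returned to the client
  boolean includeColumn() {
    boolean include = count >= offset && count < offset + limit;
    count++;
    return include;
  }

  // must be called on every row boundary; forgetting this is the reported bug
  void resetForNextRow() {
    count = 0;
  }
}
{code}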

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7008) Set scanner caching to a better default

2012-10-18 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479588#comment-13479588
 ] 

Jieshan Bean commented on HBASE-7008:
-

Thanks for the patch, xieliang.
I suggest introducing a new constant in HConstants to define this default value. 
What do you think? 
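For example, something along these lines (the constant name is only a suggestion, and 
conf/scan are assumed to be in scope):
{code}
// Illustrative sketch; the constant name is hypothetical.
public static final int DEFAULT_HBASE_CLIENT_SCANNER_CACHING = 100;

// Call site: read the default from HConstants instead of a bare literal.
int caching = conf.getInt("hbase.client.scanner.caching",
    HConstants.DEFAULT_HBASE_CLIENT_SCANNER_CACHING);
scan.setCaching(caching);
{code}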

 Set scanner caching to a better default
 ---

 Key: HBASE-7008
 URL: https://issues.apache.org/jira/browse/HBASE-7008
 Project: HBase
  Issue Type: Bug
  Components: Client
Reporter: liang xie
Assignee: liang xie
 Attachments: HBASE-7008.patch


 per 
 http://search-hadoop.com/m/qaRu9iM2f02/Set+scanner+caching+to+a+better+default%253Fsubj=Set+scanner+caching+to+a+better+default+
 let's set to 100 by default

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-7008) Set scanner caching to a better default

2012-10-18 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-7008?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13479603#comment-13479603
 ] 

Jieshan Bean commented on HBASE-7008:
-

Ok, fine. A magic number is never good, but anyway it's not a problem. 


 Set scanner caching to a better default
 ---

 Key: HBASE-7008
 URL: https://issues.apache.org/jira/browse/HBASE-7008
 Project: HBase
  Issue Type: Bug
  Components: Client
Reporter: liang xie
Assignee: liang xie
 Attachments: HBASE-7008.patch


 per 
 http://search-hadoop.com/m/qaRu9iM2f02/Set+scanner+caching+to+a+better+default%253Fsubj=Set+scanner+caching+to+a+better+default+
 let's set to 100 by default

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6854) Deletion of SPLITTING node on split rollback should clear the region from RIT

2012-09-24 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461613#comment-13461613
 ] 

Jieshan Bean commented on HBASE-6854:
-

I think it's ok:)

 Deletion of SPLITTING node on split rollback should clear the region from RIT
 -

 Key: HBASE-6854
 URL: https://issues.apache.org/jira/browse/HBASE-6854
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
 Fix For: 0.94.3

 Attachments: HBASE-6854.patch


 If a failure happens in the split before OFFLINING_PARENT, we roll back 
 the split, including deleting the znodes that were created.
 On deletion of the RS_ZK_SPLITTING node we get a callback but do not remove 
 the region from RIT. We need to remove it from RIT; the SSH logic is well 
 guarded in case the delete event comes from an RS-down scenario.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-5950) Add a decimal comparator for Filter

2012-09-24 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5950?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461734#comment-13461734
 ] 

Jieshan Bean commented on HBASE-5950:
-

This comparator is not needed if we store the Integer/Double/Float as bytes 
directly. Right?

 Add a decimal comparator for Filter
 ---

 Key: HBASE-5950
 URL: https://issues.apache.org/jira/browse/HBASE-5950
 Project: HBase
  Issue Type: New Feature
  Components: Filters
Affects Versions: 0.94.0, 0.96.0
Reporter: Jieshan Bean
Assignee: Jieshan Bean

 Suppose we have a requirement like below:
 we want to get the rows where one specified column value is larger than A and 
 less than B.
 (They are all decimals or integers)
 namely: 
    A < Integer.valueOf(column) < B
 Using BinaryComparator will not help us achieve that goal:
 e.g. suppose A = 100, B = 200, and one column value is 11.
 Byte-wise it satisfies that condition, but it's not a row we want.
 So I suggest adding a comparator to support this.
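
For illustration, a self-contained toy version of the proposed check (plain Java rather than 
the Filter comparator API; the class and method names are made up). The value is compared 
numerically, so 11 no longer falls between 100 and 200 the way it does under a byte-wise 
comparison.
{code}
import java.nio.charset.StandardCharsets;

// Toy numeric-range check; illustrative only, not an HBase comparator.
final class DecimalRange {
  private final double low;   // exclusive lower bound A
  private final double high;  // exclusive upper bound B

  DecimalRange(double low, double high) {
    this.low = low;
    this.high = high;
  }

  // true if the cell value (stored as a decimal string) falls inside (low, high)
  boolean contains(byte[] cellValue) {
    double v = Double.parseDouble(new String(cellValue, StandardCharsets.UTF_8));
    return v > low && v < high;
  }
}

// new DecimalRange(100, 200).contains("11".getBytes(StandardCharsets.UTF_8)) returns false,
// whereas a byte-wise (BinaryComparator-style) comparison would accept "11".
{code}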

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6854) Deletion of SPLITTING node on split rollback should clear the region from RIT

2012-09-23 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6854?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13461565#comment-13461565
 ] 

Jieshan Bean commented on HBASE-6854:
-

We also found the same problem. Only 2 minor comments:
1.
{code}
LOG.debug("Ephemeral node deleted.  Found in SPLIITING state. "
    + "Removing from RIT "
{code}
SPLIITING should be SPLITTING.
2. HBaseAdmin in testShouldClearRITWhenNodeFoundInSplittingState should be 
closed in a finally block (a minimal sketch follows below).

Otherwise, I'm +1 on this patch.
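
For point 2, a minimal sketch of the finally-close (TEST_UTIL and the test body are 
assumptions about the test, not part of the patch):
{code}
HBaseAdmin admin = new HBaseAdmin(TEST_UTIL.getConfiguration());
try {
  // ... split the region and assert it is removed from RIT ...
} finally {
  admin.close();   // release the connection even if an assertion fails
}
{code}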

 Deletion of SPLITTING node on split rollback should clear the region from RIT
 -

 Key: HBASE-6854
 URL: https://issues.apache.org/jira/browse/HBASE-6854
 Project: HBase
  Issue Type: Bug
Reporter: ramkrishna.s.vasudevan
 Fix For: 0.94.3

 Attachments: HBASE-6854.patch


 If a failure happens in the split before OFFLINING_PARENT, we roll back 
 the split, including deleting the znodes that were created.
 On deletion of the RS_ZK_SPLITTING node we get a callback but do not remove 
 the region from RIT. We need to remove it from RIT; the SSH logic is well 
 guarded in case the delete event comes from an RS-down scenario.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6491) add limit function at ClientScanner

2012-09-20 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13460126#comment-13460126
 ] 

Jieshan Bean commented on HBASE-6491:
-

@ronghai: Why not use PageFilter instead of adding this new method?
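
For reference, a hedged sketch of the PageFilter alternative (table and limit are assumed to 
exist; note that PageFilter is applied per region server, so the client should still stop 
once it has seen enough rows):
{code}
Scan scan = new Scan();
scan.setFilter(new PageFilter(limit));          // cap rows returned per region server
ResultScanner scanner = table.getScanner(scan);
try {
  int seen = 0;
  for (Result r : scanner) {
    if (++seen > limit) {
      break;                                    // client-side guard across regions
    }
    // process r
  }
} finally {
  scanner.close();
}
{code}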


 add limit function at ClientScanner
 ---

 Key: HBASE-6491
 URL: https://issues.apache.org/jira/browse/HBASE-6491
 Project: HBase
  Issue Type: New Feature
  Components: Client
Affects Versions: 0.96.0
Reporter: ronghai.ma
Assignee: ronghai.ma
  Labels: patch
 Fix For: 0.96.0

 Attachments: ClientScanner.java, HBASE-6491.patch


 Add a new method in ClientScanner to implement a function like LIMIT in MySQL.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira


[jira] [Commented] (HBASE-6748) Endless recursive of deleteNode happened in SplitLogManager#DeleteAsyncCallback

2012-09-11 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453587#comment-13453587
 ] 

Jieshan Bean commented on HBASE-6748:
-

Yes. Long.MAX_VALUE is the problem. 


 Endless recursive of deleteNode happened in 
 SplitLogManager#DeleteAsyncCallback
 ---

 Key: HBASE-6748
 URL: https://issues.apache.org/jira/browse/HBASE-6748
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.96.0, 0.94.1
Reporter: Jieshan Bean
Priority: Critical
 Fix For: 0.96.0, 0.94.3


 You can easily understand the problem from the logs below:
 {code}
 [2012-09-01 11:41:02,062] [WARN ] 
 [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] 
 [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] 
 create rc =SESSIONEXPIRED for 
 /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
  remaining retries=3
 [2012-09-01 11:41:02,062] [WARN ] 
 [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] 
 [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] 
 create rc =SESSIONEXPIRED for 
 /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
  remaining retries=2
 [2012-09-01 11:41:02,063] [WARN ] 
 [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] 
 [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] 
 create rc =SESSIONEXPIRED for 
 /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
  remaining retries=1
 [2012-09-01 11:41:02,063] [WARN ] 
 [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] 
 [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] 
 create rc =SESSIONEXPIRED for 
 /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
  remaining retries=0
 [2012-09-01 11:41:02,063] [WARN ] 
 [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] 
 [org.apache.hadoop.hbase.master.SplitLogManager 393] failed to create task 
 node/hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
 [2012-09-01 11:41:02,063] [WARN ] 
 [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] 
 [org.apache.hadoop.hbase.master.SplitLogManager 353] Error splitting 
 /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
 [2012-09-01 11:41:02,063] [WARN ] 
 [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] 
 [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] 
 delete rc=SESSIONEXPIRED for 
 /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
  remaining retries=9223372036854775807
 [2012-09-01 11:41:02,064] [WARN ] 
 [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] 
 [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] 
 delete rc=SESSIONEXPIRED for 
 /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
  remaining retries=9223372036854775806
 [2012-09-01 11:41:02,064] [WARN ] 
 [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] 
 [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] 
 delete rc=SESSIONEXPIRED for 
 /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
  remaining retries=9223372036854775805
 [2012-09-01 11:41:02,064] [WARN ] 
 [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] 
 [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] 
 delete rc=SESSIONEXPIRED for 
 /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
  remaining retries=9223372036854775804
 [2012-09-01 11:41:02,065] [WARN ] 
 [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] 
 [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] 
 delete rc=SESSIONEXPIRED for 
 /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
  remaining retries=9223372036854775803
 ...
 [2012-09-01 11:41:03,307] [ERROR] 
 

[jira] [Commented] (HBASE-6748) Endless recursive of deleteNode happened in SplitLogManager#DeleteAsyncCallback

2012-09-11 Thread Jieshan Bean (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-6748?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13453666#comment-13453666
 ] 

Jieshan Bean commented on HBASE-6748:
-

Either. Both master startup and region server failure handling may trigger 
HLog splitting.

Yes, I think HMaster should abort when a session timeout happens.
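
To make that concrete, a generic, hypothetical sketch (not the SplitLogManager patch) of the 
kind of guard being discussed: treat SESSIONEXPIRED as fatal instead of retrying, and use a 
finite retry budget rather than Long.MAX_VALUE. The master reference and the deleteNode 
helper are assumptions.
{code}
// Hypothetical ZooKeeper AsyncCallback.VoidCallback; illustrative only.
public void processResult(int rc, String path, Object ctx) {
  if (rc == KeeperException.Code.SESSIONEXPIRED.intValue()) {
    // A new session cannot recover this handle; retrying just loops forever.
    master.abort("ZK session expired while deleting " + path,
        KeeperException.create(KeeperException.Code.SESSIONEXPIRED));
    return;
  }
  if (rc != KeeperException.Code.OK.intValue()) {
    long retriesLeft = (Long) ctx;              // finite budget, e.g. 3, not Long.MAX_VALUE
    if (retriesLeft > 0) {
      deleteNode(path, retriesLeft - 1);        // hypothetical retry helper
    } else {
      LOG.warn("Gave up deleting " + path);
    }
  }
}
{code}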

 Endless recursive of deleteNode happened in 
 SplitLogManager#DeleteAsyncCallback
 ---

 Key: HBASE-6748
 URL: https://issues.apache.org/jira/browse/HBASE-6748
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.96.0, 0.94.1
Reporter: Jieshan Bean
Priority: Critical
 Fix For: 0.96.0, 0.94.3


 You can easily understand the problem from the logs below:
 {code}
 [2012-09-01 11:41:02,062] [WARN ] 
 [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] 
 [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] 
 create rc =SESSIONEXPIRED for 
 /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
  remaining retries=3
 [2012-09-01 11:41:02,062] [WARN ] 
 [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] 
 [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] 
 create rc =SESSIONEXPIRED for 
 /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
  remaining retries=2
 [2012-09-01 11:41:02,063] [WARN ] 
 [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] 
 [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] 
 create rc =SESSIONEXPIRED for 
 /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
  remaining retries=1
 [2012-09-01 11:41:02,063] [WARN ] 
 [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] 
 [org.apache.hadoop.hbase.master.SplitLogManager$CreateAsyncCallback 978] 
 create rc =SESSIONEXPIRED for 
 /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
  remaining retries=0
 [2012-09-01 11:41:02,063] [WARN ] 
 [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] 
 [org.apache.hadoop.hbase.master.SplitLogManager 393] failed to create task 
 node/hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
 [2012-09-01 11:41:02,063] [WARN ] 
 [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] 
 [org.apache.hadoop.hbase.master.SplitLogManager 353] Error splitting 
 /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
 [2012-09-01 11:41:02,063] [WARN ] 
 [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] 
 [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] 
 delete rc=SESSIONEXPIRED for 
 /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
  remaining retries=9223372036854775807
 [2012-09-01 11:41:02,064] [WARN ] 
 [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] 
 [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] 
 delete rc=SESSIONEXPIRED for 
 /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
  remaining retries=9223372036854775806
 [2012-09-01 11:41:02,064] [WARN ] 
 [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] 
 [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] 
 delete rc=SESSIONEXPIRED for 
 /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
  remaining retries=9223372036854775805
 [2012-09-01 11:41:02,064] [WARN ] 
 [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] 
 [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] 
 delete rc=SESSIONEXPIRED for 
 /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
  remaining retries=9223372036854775804
 [2012-09-01 11:41:02,065] [WARN ] 
 [MASTER_SERVER_OPERATIONS-xh03,2,1339549619270-1] 
 [org.apache.hadoop.hbase.master.SplitLogManager$DeleteAsyncCallback 1052] 
 delete rc=SESSIONEXPIRED for 
 /hbase/splitlog/hdfs%3A%2F%2Fxh01%3A9000%2Fhbase%2F.logs%2Fxh01%2C20020%2C1339552105088-splitting%2Fxh01%252C20020%252C1339552105088.1339557014846
  
