[jira] [Commented] (HBASE-1936) ClassLoader that loads from hdfs; useful adding filters to classpath without having to restart services

2012-04-14 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13254268#comment-13254268
 ] 

Jieshan Bean commented on HBASE-1936:
-

I will upload the patch to Review Board after more tests.
In stack's original design, ClassPathLocalizer is a configurable class, 
which is why we use reflection to call the method. I think I can change this. 
@stack, what do you think?
I'm also thinking about how to add a test case. Thank you, Ted.

@Andrew, thank you. I will try to reuse the code. I agree that we should use 
one classloader. The classloader in CP is not a general one.
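
To make the reflection question concrete, here is a minimal sketch of the two 
options. It is not the code from the patch: Localizer, DemoLocalizer and 
localize are hypothetical names, and the real ClassPathLocalizer API is not 
shown in this thread. The first method calls a configurable class by method 
name; the second still loads the class by name but calls it through a shared 
interface, so the compiler checks the call.

```java
import java.lang.reflect.Method;

// Hypothetical interface; stands in for whatever contract a localizer would expose.
interface Localizer {
    String localize(String remotePath);
}

// Hypothetical implementation, selected at run time by its class name.
class DemoLocalizer implements Localizer {
    @Override
    public String localize(String remotePath) {
        // Pretend to copy the remote file and return the local path.
        return "/tmp/local/" + remotePath.substring(remotePath.lastIndexOf('/') + 1);
    }
}

class LocalizerDemo {
    // Reflection: the method name is a string, so a typo only fails at run time.
    static String viaReflection(String className, String remotePath) {
        try {
            Object instance = Class.forName(className).getDeclaredConstructor().newInstance();
            Method m = instance.getClass().getMethod("localize", String.class);
            return (String) m.invoke(instance, remotePath);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot call localize on " + className, e);
        }
    }

    // Interface: the class is still configurable by name, but the call is type-checked.
    static String viaInterface(String className, String remotePath) {
        try {
            Localizer localizer = Class.forName(className)
                    .asSubclass(Localizer.class)
                    .getDeclaredConstructor()
                    .newInstance();
            return localizer.localize(remotePath);
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot load " + className, e);
        }
    }
}
```

Both variants keep the class configurable; the interface variant only removes 
the reflective method lookup.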


 ClassLoader that loads from hdfs; useful adding filters to classpath without 
 having to restart services
 ---

 Key: HBASE-1936
 URL: https://issues.apache.org/jira/browse/HBASE-1936
 Project: HBase
  Issue Type: New Feature
Reporter: stack
Assignee: Jieshan Bean
  Labels: noob
 Attachments: HBASE-1936-trunk(forReview).patch, cp_from_hdfs.patch




--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators: 
https://issues.apache.org/jira/secure/ContactAdministrators!default.jspa
For more information on JIRA, see: http://www.atlassian.com/software/jira




[jira] [Commented] (HBASE-1936) ClassLoader that loads from hdfs; useful adding filters to classpath without having to restart services

2012-04-08 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249655#comment-13249655
 ] 

Jieshan Bean commented on HBASE-1936:
-

Thank you, stack.
I will make a patch based on this one.

 ClassLoader that loads from hdfs; useful adding filters to classpath without 
 having to restart services
 ---

 Key: HBASE-1936
 URL: https://issues.apache.org/jira/browse/HBASE-1936
 Project: HBase
  Issue Type: New Feature
Reporter: stack
Assignee: Daniel Ploeg
  Labels: noob
 Attachments: cp_from_hdfs.patch








[jira] [Commented] (HBASE-1936) ClassLoader that loads from hdfs; useful adding filters to classpath without having to restart services

2012-04-08 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13249674#comment-13249674
 ] 

Jieshan Bean commented on HBASE-1936:
-

Thank you Lars. I will check:)

 ClassLoader that loads from hdfs; useful adding filters to classpath without 
 having to restart services
 ---

 Key: HBASE-1936
 URL: https://issues.apache.org/jira/browse/HBASE-1936
 Project: HBase
  Issue Type: New Feature
Reporter: stack
Assignee: Daniel Ploeg
  Labels: noob
 Attachments: cp_from_hdfs.patch








[jira] [Commented] (HBASE-1936) ClassLoader that loads from hdfs; useful adding filters to classpath without having to restart services

2012-04-06 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13248335#comment-13248335
 ] 

Jieshan Bean commented on HBASE-1936:
-

Yes.
How to add the new class to HDFS does not seem to be covered in this 
patch. Should we also take care of this?


 ClassLoader that loads from hdfs; useful adding filters to classpath without 
 having to restart services
 ---

 Key: HBASE-1936
 URL: https://issues.apache.org/jira/browse/HBASE-1936
 Project: HBase
  Issue Type: New Feature
Reporter: stack
Assignee: Daniel Ploeg
  Labels: noob
 Attachments: cp_from_hdfs.patch








[jira] [Commented] (HBASE-1936) ClassLoader that loads from hdfs; useful adding filters to classpath without having to restart services

2012-04-05 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-1936?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13247180#comment-13247180
 ] 

Jieshan Bean commented on HBASE-1936:
-

@stack,
could you take another look at this new feature? I think it would be very 
useful both for hot deployment of Filters and for dynamic coprocessors. Is 
there any reason this task was put on hold?

Thank you.

 ClassLoader that loads from hdfs; useful adding filters to classpath without 
 having to restart services
 ---

 Key: HBASE-1936
 URL: https://issues.apache.org/jira/browse/HBASE-1936
 Project: HBase
  Issue Type: New Feature
Reporter: stack
Assignee: Daniel Ploeg
  Labels: noob
 Attachments: cp_from_hdfs.patch








[jira] [Commented] (HBASE-5682) Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 only)

2012-04-02 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13244093#comment-13244093
 ] 

Jieshan Bean commented on HBASE-5682:
-

Everything looks good to me. One minor question: is it necessary to close 
zooKeeper before setting it to null?
If HConnectionImplementation#managed is true, HConnectionImplementation#abort 
doesn't set closed to true; it just calls the close method. That makes sense to 
me :). So the retry logic introduced in HBASE-5153 seems redundant.
If users manage the connection themselves and the connection is aborted, we 
should suggest that they recreate the HConnection and HTable, right?
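
The close-before-null question can be sketched as follows. This is only an 
illustration under the assumption that the handle holds resources that are 
released by close() and not by dropping the reference; FakeZkHandle and 
ConnectionSketch are hypothetical names, not HBase or ZooKeeper classes.

```java
import java.io.Closeable;

// Hypothetical stand-in for a ZooKeeper handle; not the HBase or ZooKeeper API.
class FakeZkHandle implements Closeable {
    boolean closed = false;

    @Override
    public void close() {
        closed = true; // a real handle would end its session and stop its threads here
    }
}

class ConnectionSketch {
    private FakeZkHandle zooKeeper = new FakeZkHandle();

    FakeZkHandle current() {
        return zooKeeper;
    }

    // Close the old handle first, then drop the reference. Setting the field to
    // null alone would leave the old handle open until something else closes it.
    void resetZooKeeper() {
        if (zooKeeper != null) {
            zooKeeper.close();
            zooKeeper = null;
        }
    }

    // Lazily create a fresh handle the next time one is needed.
    FakeZkHandle getZooKeeper() {
        if (zooKeeper == null) {
            zooKeeper = new FakeZkHandle();
        }
        return zooKeeper;
    }
}
```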

 Allow HConnectionImplementation to recover from ZK connection loss (for 0.94 
 only)
 --

 Key: HBASE-5682
 URL: https://issues.apache.org/jira/browse/HBASE-5682
 Project: HBase
  Issue Type: Improvement
  Components: client
Reporter: Lars Hofhansl
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.94.0

 Attachments: 5682-all-v2.txt, 5682-all-v3.txt, 5682-all.txt, 
 5682-v2.txt, 5682.txt


 Just realized that without this HBASE-4805 is broken.
 I.e. there's no point keeping a persistent HConnection around if it can be 
 rendered permanently unusable if the ZK connection is lost temporarily.
 Note that this is fixed in 0.96 with HBASE-5399 (but that seems too big to 
 backport)





[jira] [Commented] (HBASE-5312) Closed parent region present in Hlog.lastSeqWritten

2012-03-28 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5312?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13240340#comment-13240340
 ] 

Jieshan Bean commented on HBASE-5312:
-

I'm afraid this is not the same issue as HBASE-5568. No concurrent flush was 
happening when the edit with sequenceId 20312224 was added to region 
2acaf8e3acfd2e8a5825a1f6f0aca4a8, though that problem may also have occurred 
in this issue.
{noformat}
00:28:25,460 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished 
memstore flush of ~153.9m for region 
Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. in 
8444ms, sequenceid=20294110, compaction requested=true
00:28:25,460 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: 
Compaction requested for 
Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. because 
regionserver20020.cacheFlusher; priority=2, compaction queue size=5835
00:30:35,328 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush 
requested on 
Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8.
00:30:35,943 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started 
memstore flush for 
Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8., current 
region memstore size 129.5m
00:30:37,963 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: NOT flushing 
memstore for region 
Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8., 
flushing=true, writesEnabled=true
00:30:37,971 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Flush 
requested on 
Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8.
00:30:37,971 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: Started 
memstore flush for 
Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8., current 
region memstore size 129.7m
00:30:47,907 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed 
file at 
hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/.tmp/5960455867013769207
 to 
hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/value/3682074154882687307
00:30:47,907 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed 
file at 
hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/.tmp/5960455867013769207
 to 
hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/value/3682074154882687307
00:30:49,241 INFO org.apache.hadoop.hbase.regionserver.Store: Added 
hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/value/3682074154882687307,
 entries=233841, sequenceid=20311822, memsize=129.5m, filesize=89.5m
00:30:49,242 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished 
memstore flush of ~129.5m for region 
Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. in 
13299ms, sequenceid=20311822, compaction requested=true
00:30:49,242 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: 
Compaction requested for 
Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. because 
User-triggered split; priority=1, compaction queue size=5840
00:30:55,214 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed 
file at 
hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/.tmp/1755862026714756815
 to 
hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/value/973789709483406123
00:30:55,214 INFO org.apache.hadoop.hbase.regionserver.Store: Renaming flushed 
file at 
hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/.tmp/1755862026714756815
 to 
hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/value/973789709483406123
00:30:59,614 INFO org.apache.hadoop.hbase.regionserver.Store: Added 
hdfs://192.168.1.103:9000/hbase/Htable_UFDR_031/2acaf8e3acfd2e8a5825a1f6f0aca4a8/value/973789709483406123,
 entries=7537, sequenceid=20312223, memsize=4.2m, filesize=2.9m
00:30:59,787 INFO org.apache.hadoop.hbase.regionserver.HRegion: Finished 
memstore flush of ~133.5m for region 
Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. in 
21816ms, sequenceid=20312223, compaction requested=true
00:30:59,787 DEBUG org.apache.hadoop.hbase.regionserver.CompactSplitThread: 
Compaction requested for 
Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. because 
regionserver20020.cacheFlusher; priority=0, compaction queue size=5840
00:31:12,605 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting 
compaction on region 
Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8.
00:31:12,607 INFO org.apache.hadoop.hbase.regionserver.HRegion: completed 
compaction on region 
Htable_UFDR_031,00332,1325808823997.2acaf8e3acfd2e8a5825a1f6f0aca4a8. after 0sec
00:31:12,607 INFO 

[jira] [Commented] (HBASE-4991) Provide capability to delete named region

2012-03-01 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219951#comment-13219951
 ] 

Jieshan Bean commented on HBASE-4991:
-

@Jieshan Yes that'll work. How you do it? You have a patch?

@Stack:
I have written a client tool that does this for 0.90; it deletes only the 
specified regions. I will modify it and submit a patch if this tool is needed.


 Provide capability to delete named region
 -

 Key: HBASE-4991
 URL: https://issues.apache.org/jira/browse/HBASE-4991
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch


 See discussion titled 'Able to control routing to Solr shards or not' on 
 lily-discuss
 User may want to quickly dispose of out of date records by deleting specific 
 regions. 





[jira] [Commented] (HBASE-4991) Provide capability to delete named region

2012-02-29 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-4991?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13219785#comment-13219785
 ] 

Jieshan Bean commented on HBASE-4991:
-

We also have this requirement: delete some specified regions, especially their 
data in HDFS, and merge those regions. We do it after disabling the table, so 
it seems very simple:
1. Ensure the table has been disabled.
2. Scan .META. and find all the regions that should be deleted.
3. Delete their information from .META.
4. Delete the region directories in HDFS.
5. Add a new empty region to avoid a region hole in .META.
6. Enable the table.
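
Step 5 is the subtle part of the list above, so here is a minimal in-memory 
sketch of it. It does not use the HBase client API: RegionRange and 
RegionHoleSketch are hypothetical names, and the sketch assumes the deleted 
regions are adjacent in key order. It replaces the deleted regions with one 
empty region spanning the same keys, so every region still starts where the 
previous one ends.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical model of one region's key range; not an HBase class.
class RegionRange {
    final String startKey; // inclusive; "" means the start of the table
    final String endKey;   // exclusive; "" means the end of the table

    RegionRange(String startKey, String endKey) {
        this.startKey = startKey;
        this.endKey = endKey;
    }
}

class RegionHoleSketch {
    // Removes regions[from..to] (inclusive) from a key-ordered list and inserts
    // one empty region covering the same key span.
    static List<RegionRange> deleteAndFill(List<RegionRange> regions, int from, int to) {
        List<RegionRange> result = new ArrayList<>(regions.subList(0, from));
        result.add(new RegionRange(regions.get(from).startKey, regions.get(to).endKey));
        result.addAll(regions.subList(to + 1, regions.size()));
        return result;
    }

    // True when every region starts where the previous one ends (no hole, no overlap).
    static boolean isContiguous(List<RegionRange> regions) {
        for (int i = 1; i < regions.size(); i++) {
            if (!regions.get(i - 1).endKey.equals(regions.get(i).startKey)) {
                return false;
            }
        }
        return true;
    }
}
```

Skipping the replacement region would leave a gap between the surviving 
neighbours, which isContiguous would report as false.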

 Provide capability to delete named region
 -

 Key: HBASE-4991
 URL: https://issues.apache.org/jira/browse/HBASE-4991
 Project: HBase
  Issue Type: Improvement
Reporter: Ted Yu
Assignee: Mubarak Seyed
 Fix For: 0.94.0

 Attachments: HBASE-4991.trunk.v1.patch, HBASE-4991.trunk.v2.patch


 See discussion titled 'Able to control routing to Solr shards or not' on 
 lily-discuss
 User may want to quickly dispose of out of date records by deleting specific 
 regions. 





[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler

2012-02-21 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213237#comment-13213237
 ] 

Jieshan Bean commented on HBASE-5396:
-

All tests passed for TRUNK.

 Handle the regions in regionPlans while processing ServerShutdownHandler
 

 Key: HBASE-5396
 URL: https://issues.apache.org/jira/browse/HBASE-5396
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.94.0, 0.90.6, 0.92.1

 Attachments: HBASE-5396-90-V2.patch, HBASE-5396-90-final.patch, 
 HBASE-5396-90-forReview.patch, HBASE-5396-90.patch, HBASE-5396-92.patch, 
 HBASE-5396-trunk.patch, Logs-TestFor92.rar


 Regions that are planned to open on a server while ServerShutdownHandler is 
 handling that server are just removed from AM.regionPlans and left for 
 TimeoutMonitor to handle. This needs to be optimized.





[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler

2012-02-21 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213349#comment-13213349
 ] 

Jieshan Bean commented on HBASE-5396:
-

@stack,
The RegionServer process was killed manually; it did not abort by itself. These 
are the steps of my test:
Suppose there are 3 nodes in the cluster: A, B, C.
1. Create a table with 1000 regions.
2. Assign all regions to C.
3. Kill C. While ServerShutdownHandler is processing, kill A (at this time, 
some regions may be opening on this server).

Then check whether all the regions can be opened in time (and also check the 
regions that are in regionPlans).

 Handle the regions in regionPlans while processing ServerShutdownHandler
 

 Key: HBASE-5396
 URL: https://issues.apache.org/jira/browse/HBASE-5396
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.94.0, 0.90.6, 0.92.1

 Attachments: HBASE-5396-90-V2.patch, HBASE-5396-90-final.patch, 
 HBASE-5396-90-forReview.patch, HBASE-5396-90.patch, HBASE-5396-92.patch, 
 HBASE-5396-trunk.patch, Logs-TestFor92.rar


 Regions that are planned to open on a server while ServerShutdownHandler is 
 handling that server are just removed from AM.regionPlans and left for 
 TimeoutMonitor to handle. This needs to be optimized.





[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler

2012-02-21 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13213381#comment-13213381
 ] 

Jieshan Bean commented on HBASE-5396:
-

I only see the log below and related entries. It means the regions in 
regionPlans can be reassigned with this patch.
{noformat}
2012-02-20 23:25:36,472 INFO 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: 1 regions which 
planned to open on C3S31,20020,1329798177361 be re-assigned.
{noformat}

Thanks, Stack.

 Handle the regions in regionPlans while processing ServerShutdownHandler
 

 Key: HBASE-5396
 URL: https://issues.apache.org/jira/browse/HBASE-5396
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.94.0, 0.90.6, 0.92.1

 Attachments: HBASE-5396-90-V2.patch, HBASE-5396-90-final.patch, 
 HBASE-5396-90-forReview.patch, HBASE-5396-90.patch, HBASE-5396-92.patch, 
 HBASE-5396-trunk.patch, Logs-TestFor92.rar


 Regions that are planned to open on a server while ServerShutdownHandler is 
 handling that server are just removed from AM.regionPlans and left for 
 TimeoutMonitor to handle. This needs to be optimized.





[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler

2012-02-20 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13211917#comment-13211917
 ] 

Jieshan Bean commented on HBASE-5396:
-

I tested the patch for 0.92 with unit tests and also in a real cluster. This 
problem does not seem to be present in 0.92 and trunk. I will run more tests 
and go through the code to check whether the patch is necessary for 0.92 and 
trunk.

 Handle the regions in regionPlans while processing ServerShutdownHandler
 

 Key: HBASE-5396
 URL: https://issues.apache.org/jira/browse/HBASE-5396
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.94.0, 0.90.6, 0.92.1

 Attachments: HBASE-5396-90-V2.patch, HBASE-5396-90-final.patch, 
 HBASE-5396-90-forReview.patch, HBASE-5396-90.patch


 Regions that are planned to open on a server while ServerShutdownHandler is 
 handling that server are just removed from AM.regionPlans and left for 
 TimeoutMonitor to handle. This needs to be optimized.





[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler

2012-02-20 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212352#comment-13212352
 ] 

Jieshan Bean commented on HBASE-5396:
-

RegionServer Log:
{noformat}
2012-02-20 23:24:49,432 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
regionserver:20020-0x3359b077bc90018 Attempting to transition node 
897ad476f426e58a36cae77c7302be1d from M_ZK_REGION_OFFLINE to 
RS_ZK_REGION_OPENING
2012-02-20 23:24:49,447 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
regionserver:20020-0x3359b077bc90018 Successfully transitioned node 
897ad476f426e58a36cae77c7302be1d from M_ZK_REGION_OFFLINE to 
RS_ZK_REGION_OPENING
2012-02-20 23:24:49,448 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Opening region: {NAME = 
'jeason,0014,1329747764680.897ad476f426e58a36cae77c7302be1d.', STARTKEY 
= '0014', ENDKEY = '0014', ENCODED = 
897ad476f426e58a36cae77c7302be1d,}
2012-02-20 23:24:49,448 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Setting up tabledescriptor config now ...
2012-02-20 23:24:49,448 DEBUG org.apache.hadoop.hbase.regionserver.HRegion: 
Instantiated jeason,0014,1329747764680.897ad476f426e58a36cae77c7302be1d.
2012-02-20 23:24:49,455 INFO org.apache.hadoop.hbase.regionserver.HRegion: 
Onlined jeason,0014,1329747764680.897ad476f426e58a36cae77c7302be1d.; 
next sequenceid=1
//This regionserver was killed just after the below log.
2012-02-20 23:24:49,455 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
regionserver:20020-0x3359b077bc90018 Attempting to transition node 
897ad476f426e58a36cae77c7302be1d from RS_ZK_REGION_OPENING to 
RS_ZK_REGION_OPENING
{noformat}

HMaster Log:
{noformat}
2012-02-20 23:25:36,442 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Assigning region 
jeason,0014,1329747764680.897ad476f426e58a36cae77c7302be1d. to 
C3S32,20020,1329798176762
2012-02-20 23:25:36,442 DEBUG org.apache.hadoop.hbase.master.ServerManager: New 
connection to C3S32,20020,1329798176762
2012-02-20 23:25:36,462 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENED, server=C3S32,20020,1329798176762, 
region=561e68708199320894bfcfc533cae772
2012-02-20 23:25:36,462 DEBUG 
org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Handling OPENED 
event for jeason,0013,1329747764680.561e68708199320894bfcfc533cae772. 
from C3S32,20020,1329798176762; deleting unassigned node
2012-02-20 23:25:36,462 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:2-0x2359b0778ec000d Deleting existing unassigned node for 
561e68708199320894bfcfc533cae772 that is in expected state RS_ZK_REGION_OPENED
2012-02-20 23:25:36,463 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
Handling transition=RS_ZK_REGION_OPENING, server=C3S32,20020,1329798176762, 
region=3455f468b76dcf7cfe3881f9e156d0c8
2012-02-20 23:25:36,468 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
master:2-0x2359b0778ec000d Successfully deleted unassigned node for region 
561e68708199320894bfcfc533cae772 in expected state RS_ZK_REGION_OPENED
2012-02-20 23:25:36,468 DEBUG org.apache.hadoop.hbase.master.AssignmentManager: 
The znode of region 
jeason,0013,1329747764680.561e68708199320894bfcfc533cae772. has been 
deleted.
2012-02-20 23:25:36,468 INFO org.apache.hadoop.hbase.master.AssignmentManager: 
The master has opened the region 
jeason,0013,1329747764680.561e68708199320894bfcfc533cae772. that was 
online on C3S32,20020,1329798176762
2012-02-20 23:25:36,472 INFO 
org.apache.hadoop.hbase.master.handler.ServerShutdownHandler: 1 regions which 
planned to open on C3S31,20020,1329798177361 be re-assigned.
{noformat}

 Handle the regions in regionPlans while processing ServerShutdownHandler
 

 Key: HBASE-5396
 URL: https://issues.apache.org/jira/browse/HBASE-5396
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.94.0, 0.90.6, 0.92.1

 Attachments: HBASE-5396-90-V2.patch, HBASE-5396-90-final.patch, 
 HBASE-5396-90-forReview.patch, HBASE-5396-90.patch, HBASE-5396-92.patch, 
 HBASE-5396-trunk.patch


 Regions that are planned to open on a server while ServerShutdownHandler is 
 handling that server are just removed from AM.regionPlans and left for 
 TimeoutMonitor to handle. This needs to be optimized.





[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler

2012-02-20 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13212357#comment-13212357
 ] 

Jieshan Bean commented on HBASE-5396:
-

I'm running the unit tests for TRUNK.

 Handle the regions in regionPlans while processing ServerShutdownHandler
 

 Key: HBASE-5396
 URL: https://issues.apache.org/jira/browse/HBASE-5396
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.94.0, 0.90.6, 0.92.1

 Attachments: HBASE-5396-90-V2.patch, HBASE-5396-90-final.patch, 
 HBASE-5396-90-forReview.patch, HBASE-5396-90.patch, HBASE-5396-92.patch, 
 HBASE-5396-trunk.patch


 Regions that are planned to open on a server while ServerShutdownHandler is 
 handling that server are just removed from AM.regionPlans and left for 
 TimeoutMonitor to handle. This needs to be optimized.





[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler

2012-02-15 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208378#comment-13208378
 ] 

Jieshan Bean commented on HBASE-5396:
-

All tests passed, and I also tested it in a real cluster.

 Handle the regions in regionPlans while processing ServerShutdownHandler
 

 Key: HBASE-5396
 URL: https://issues.apache.org/jira/browse/HBASE-5396
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.90.7

 Attachments: HBASE-5396-90-V2.patch, HBASE-5396-90-forReview.patch, 
 HBASE-5396-90.patch


 Regions that are planned to open on a server while ServerShutdownHandler is 
 handling that server are just removed from AM.regionPlans and left for 
 TimeoutMonitor to handle. This needs to be optimized.





[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler

2012-02-14 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208222#comment-13208222
 ] 

Jieshan Bean commented on HBASE-5396:
-

Thanks, Ted. Just one question about your comments:
bq. Since the member of the Set isn't RegionPlan, I suggest renaming the above 
field to regionsOnServer.
Since the Set comes from regionPlans, can I keep the name 
regionPlanOnThisServer? I think regionsOnServer could cause some 
misunderstanding, right?

I have tested this patch on a 0.90 cluster many times, and it works fine. I 
will upload the new patch today.


 Handle the regions in regionPlans while processing ServerShutdownHandler
 

 Key: HBASE-5396
 URL: https://issues.apache.org/jira/browse/HBASE-5396
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.90.7

 Attachments: HBASE-5396-90.patch


 Regions that are planned to open on a server while ServerShutdownHandler is 
 handling that server are just removed from AM.regionPlans and left for 
 TimeoutMonitor to handle. This needs to be optimized.





[jira] [Commented] (HBASE-5396) Handle the regions in regionPlans while processing ServerShutdownHandler

2012-02-14 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5396?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13208227#comment-13208227
 ] 

Jieshan Bean commented on HBASE-5396:
-

OK, I'll rename it to regionsOnServer. I will upload the new patch after 
testing. Thanks, Ted.

 Handle the regions in regionPlans while processing ServerShutdownHandler
 

 Key: HBASE-5396
 URL: https://issues.apache.org/jira/browse/HBASE-5396
 Project: HBase
  Issue Type: Bug
  Components: master
Affects Versions: 0.90.6
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.90.7

 Attachments: HBASE-5396-90.patch


 Regions that are planned to open on a server while ServerShutdownHandler is 
 handling that server are just removed from AM.regionPlans and left for 
 TimeoutMonitor to handle. This needs to be optimized.





[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

2012-01-27 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194541#comment-13194541
 ] 

Jieshan Bean commented on HBASE-5153:
-

@Lars:
One more question based on the previous comment, regarding the code below:
{noformat}
+  try {
+    setupZookeeperTrackers();
+    break;
+  } catch (ZooKeeperConnectionException zkce) {
+    if (tries >= this.numRetries) {
+      throw zkce;
+    }
+  }
{noformat}
If ZookeeperNodeTracker#start gets an exception, it catches it and calls 
abort (it does not rethrow it). In that situation the loop above still reaches 
break, so the retries have no effect. Correct me if I am wrong. 
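
The concern can be reproduced in isolation. The sketch below uses hypothetical stand-ins (setupTrackers, resetWithRetries, and an IllegalStateException in place of the ZooKeeper exceptions); it is not HBase code. Because the callee swallows its own failure, the loop sees a normal return and breaks after the first attempt.

```java
// Hypothetical stand-ins; none of these names come from the HBase code base.
class SwallowedFailureSketch {
  static int attempts = 0;
  static boolean aborted = false;

  // Mimics a tracker start that catches its own failure and calls abort
  // instead of rethrowing.
  static void setupTrackers() {
    attempts++;
    try {
      throw new IllegalStateException("connection loss");
    } catch (IllegalStateException e) {
      aborted = true; // the failure is reported via abort, not thrown
    }
  }

  // Retry loop in the style of the quoted patch; returns the tries used.
  static int resetWithRetries(int numRetries) {
    int tries = 0;
    while (tries < numRetries) {
      tries++;
      try {
        setupTrackers();
        break; // reached even though setup failed internally
      } catch (IllegalStateException e) {
        if (tries >= numRetries) {
          throw e;
        }
      }
    }
    return tries;
  }
}
```

With three retries allowed, only one attempt is ever made, and the failure is visible only through the abort flag.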

 Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
 ---

 Key: HBASE-5153
 URL: https://issues.apache.org/jira/browse/HBASE-5153
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.94.0, 0.90.6, 0.92.1

 Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 
 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, 
 HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, 
 HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, 
 HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, 
 TestResults-hbase5153.out


 HBASE-4893 is related to this issue. From that issue we know that if multiple 
 threads share the same connection, once the connection is aborted in one 
 thread, the other threads will get a 
 "HConnectionManager$HConnectionImplementation@18fb1f7 closed" exception.
 HBASE-4893 solves the problem of stale connections not being removed, but the 
 original HTable instance can no longer be used. The connection in HTable 
 should be recreated.
 There are two approaches to solve this:
 1. In user code, on catching an IOE, close the connection and re-create the 
 HTable instance. We can use this as a workaround.
 2. On the HBase client side, catch this exception and re-create the connection.
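
Approach 1 can be sketched as follows. This is only an illustration under stated assumptions: Table and TableFactory are hypothetical minimal interfaces, not the HBase HTable API, and the wrapper retries exactly once after re-creating the handle.

```java
import java.io.IOException;

class RecreateOnFailureSketch {
  // Hypothetical minimal view of a table handle; not the HBase HTable API.
  interface Table {
    String get(String row) throws IOException;
    void close();
  }

  // Hypothetical factory that builds a table on top of a fresh connection.
  interface TableFactory {
    Table create() throws IOException;
  }

  private final TableFactory factory;
  private Table table;

  RecreateOnFailureSketch(TableFactory factory) throws IOException {
    this.factory = factory;
    this.table = factory.create();
  }

  // On IOException, drops the old handle, builds a fresh one, and retries once.
  String get(String row) throws IOException {
    try {
      return table.get(row);
    } catch (IOException e) {
      table.close();
      table = factory.create();
      return table.get(row);
    }
  }
}
```

A second failure propagates to the caller, so user code still learns when the re-created connection is unusable.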





[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

2012-01-26 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194406#comment-13194406
 ] 

Jieshan Bean commented on HBASE-5153:
-

This should not lead to an endless loop unless every retry gets a 
ZookeeperLossException. If that exception keeps happening for a long time, 
ZooKeeper itself must have a problem, so creating a new ZooKeeper instance will 
already throw an exception. So it won't be an endless loop:
{noformat}
if ((t instanceof KeeperException.SessionExpiredException)
    || (t instanceof KeeperException.ConnectionLossException)) {
  try {
    LOG.info("This client just lost it's session with ZooKeeper, trying" +
        " to reconnect.");
    resetZooKeeperTrackersWithRetries();
    LOG.info("Reconnected successfully. This disconnect could have been" +
        " caused by a network partition or a long-running GC pause," +
        " either way it's recommended that you verify your environment.");
    return;
  } catch (ZooKeeperConnectionException e) {
    LOG.error("Could not reconnect to ZooKeeper after session" +
        " expiration, aborting");
    t = e;
  }
}
if (t != null) LOG.fatal(msg, t);
else LOG.fatal(msg);
HConnectionManager.deleteStaleConnection(this);
{noformat}
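
The termination argument can be checked with a small sketch. It assumes a retry helper that rethrows the last failure once the limit is reached; the names (BoundedRetrySketch, Setup, ConnectException) are hypothetical and only mirror the shape of resetZooKeeperTrackersWithRetries, not its actual body.

```java
class BoundedRetrySketch {
  // Hypothetical stand-in for ZooKeeperConnectionException.
  static class ConnectException extends Exception {
    ConnectException(String message) {
      super(message);
    }
  }

  // Hypothetical stand-in for the tracker setup step.
  interface Setup {
    void run() throws ConnectException;
  }

  // Tries setup at most maxTries times and returns the number of tries used.
  // If every try fails, the last failure is rethrown, so the caller's catch
  // block runs and the loop cannot spin forever.
  static int resetWithRetries(Setup setup, int maxTries) throws ConnectException {
    ConnectException last = null;
    for (int tries = 1; tries <= maxTries; tries++) {
      try {
        setup.run();
        return tries;
      } catch (ConnectException e) {
        last = e;
      }
    }
    throw last != null ? last : new ConnectException("maxTries must be positive");
  }
}
```

When the setup step fails every time, the helper gives up after maxTries calls and the exception reaches the caller, which matches the catch block in the excerpt above.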

 Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
 ---

 Key: HBASE-5153
 URL: https://issues.apache.org/jira/browse/HBASE-5153
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.94.0, 0.90.6, 0.92.1

 Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, 
 HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, 
 HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, 
 HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, 
 HBASE-5153.patch, TestResults-hbase5153.out






[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

2012-01-26 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194412#comment-13194412
 ] 

Jieshan Bean commented on HBASE-5153:
-

As discussed with Ted, trunk and 0.92 already include retry logic in 
RecoverableZooKeeper, which makes the retry logic in 
resetZooKeeperTrackersWithRetries less important.

 Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
 ---

 Key: HBASE-5153
 URL: https://issues.apache.org/jira/browse/HBASE-5153
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.94.0, 0.90.6, 0.92.1

 Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, 
 HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, 
 HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, 
 HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, 
 HBASE-5153.patch, TestResults-hbase5153.out






[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

2012-01-26 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194426#comment-13194426
 ] 

Jieshan Bean commented on HBASE-5153:
-

Thanks, Ted. I will take a look at that stack trace.

If a ZooKeeperConnectionException is thrown by the code below:
{noformat}
try {
  if (setupZookeeperTrackers(isLastTime)) {
    break;
  }
} catch (ZooKeeperConnectionException zkce) {
  if (isLastTime) {
    throw zkce;
  }
}
{noformat}

it will be caught in the abort method, which then calls LOG.fatal(msg, t);

No problem here. I am not sure whether I understood you correctly :(.

 Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
 ---

 Key: HBASE-5153
 URL: https://issues.apache.org/jira/browse/HBASE-5153
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.94.0, 0.90.6, 0.92.1

 Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, 
 HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, 
 HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, 
 HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, 
 HBASE-5153.patch, TestResults-hbase5153.out






[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

2012-01-26 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=1319#comment-1319
 ] 

Jieshan Bean commented on HBASE-5153:
-

Thanks Lars. It's nice of you:)

 Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
 ---

 Key: HBASE-5153
 URL: https://issues.apache.org/jira/browse/HBASE-5153
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.94.0, 0.90.6, 0.92.1

 Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 
 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, 
 HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, 
 HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, 
 HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, 
 TestResults-hbase5153.out






[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

2012-01-26 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194445#comment-13194445
 ] 

Jieshan Bean commented on HBASE-5153:
-

The endless loop happens when ZK is actually down.
If ZK is actually down, the code below will throw an exception:
 this.zooKeeper = getZooKeeperWatcher();
It is then caught by the code below:
{noformat}
try {
  LOG.info("This client just lost it's session with ZooKeeper, trying" +
      " to reconnect.");
  resetZooKeeperTrackersWithRetries();
  LOG.info("Reconnected successfully. This disconnect could have been" +
      " caused by a network partition or a long-running GC pause," +
      " either way it's recommended that you verify your environment.");
  return;
} catch (ZooKeeperConnectionException e) {
  LOG.error("Could not reconnect to ZooKeeper after session" +
      " expiration, aborting");
  t = e;
}
if (t != null) LOG.fatal(msg, t);
else LOG.fatal(msg);
HConnectionManager.deleteStaleConnection(this);
{noformat}

So it should not be an endless loop. Does that make sense?


 Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
 ---

 Key: HBASE-5153
 URL: https://issues.apache.org/jira/browse/HBASE-5153
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.94.0, 0.90.6, 0.92.1

 Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 
 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, 
 HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, 
 HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, 
 HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, 
 TestResults-hbase5153.out






[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

2012-01-26 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194447#comment-13194447
 ] 

Jieshan Bean commented on HBASE-5153:
-

bq. Are you guys saying we do not need this in 0.92+?

We need this, but the patch for 0.92+ should take this into account :)

 Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
 ---

 Key: HBASE-5153
 URL: https://issues.apache.org/jira/browse/HBASE-5153
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.94.0, 0.90.6, 0.92.1

 Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 
 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, 
 HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, 
 HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, 
 HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, 
 TestResults-hbase5153.out






[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

2012-01-26 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194451#comment-13194451
 ] 

Jieshan Bean commented on HBASE-5153:
-

@Lars:
Adding an isResettingZKTrackers flag sounds good to me. One question: is it 
necessary to declare it volatile?
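
For context on the question: in Java, a plain boolean written by one thread is not guaranteed to become visible to other threads without some form of synchronization, and volatile provides that visibility. It does not make a check-then-set sequence atomic, though. The sketch below is one possible shape for such a guard, assuming the flag is used to skip overlapping resets; using AtomicBoolean is this sketch's choice and not something the patch is known to do.

```java
import java.util.concurrent.atomic.AtomicBoolean;

class ResetGuardSketch {
  // Flag name taken from the comment above; the AtomicBoolean type is an
  // assumption of this sketch.
  private final AtomicBoolean isResettingZKTrackers = new AtomicBoolean(false);
  private int resets = 0;

  // Runs the reset only if no other reset is in progress.
  // Returns true if the reset ran, false if it was skipped.
  boolean resetIfIdle(Runnable reset) {
    if (!isResettingZKTrackers.compareAndSet(false, true)) {
      return false; // another thread, or a nested call, is already resetting
    }
    try {
      reset.run();
      resets++;
      return true;
    } finally {
      isResettingZKTrackers.set(false);
    }
  }

  int resets() {
    return resets;
  }
}
```

A nested call made from inside the reset is skipped, which is the behaviour such a flag would need in order to stop an abort from re-entering the reset path.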

 Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
 ---

 Key: HBASE-5153
 URL: https://issues.apache.org/jira/browse/HBASE-5153
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.94.0, 0.90.6, 0.92.1

 Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 
 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, 
 HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, 
 HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, 
 HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, 
 TestResults-hbase5153.out






[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

2012-01-26 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13194519#comment-13194519
 ] 

Jieshan Bean commented on HBASE-5153:
-

@Lars:
During the retry, if we get any exception, the ZooKeeper instance and the 
trackers also need to be closed.
{noformat}
+  try {
+    setupZookeeperTrackers();
+    break;
+  } catch (ZooKeeperConnectionException zkce) {
+    if (tries >= this.numRetries) {
+      throw zkce;
+    }
+  }
{noformat}
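
A sketch of what closing on failure could look like, under the assumption that the ZooKeeper instance and the trackers can be released through a single close step. Resources, ConnectException and resetWithRetries are hypothetical names, not part of the patch.

```java
class CleanupOnRetrySketch {
  // Hypothetical stand-in for ZooKeeperConnectionException.
  static class ConnectException extends Exception {
    ConnectException(String message) {
      super(message);
    }
  }

  // Hypothetical bundle of the ZooKeeper instance and its trackers.
  interface Resources {
    void setup() throws ConnectException;
    void close();
  }

  // Retries setup up to numRetries times. After every failed attempt the
  // partially initialized resources are closed before retrying or rethrowing.
  static void resetWithRetries(Resources r, int numRetries) throws ConnectException {
    for (int tries = 1; ; tries++) {
      try {
        r.setup();
        return;
      } catch (ConnectException e) {
        r.close();
        if (tries >= numRetries) {
          throw e;
        }
      }
    }
  }
}
```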

 Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
 ---

 Key: HBASE-5153
 URL: https://issues.apache.org/jira/browse/HBASE-5153
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.94.0, 0.90.6, 0.92.1

 Attachments: 5153-92.txt, 5153-trunk-v2.txt, 5153-trunk.txt, 
 5153-trunk.txt, HBASE-5153-V2.patch, HBASE-5153-V3.patch, 
 HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, 
 HBASE-5153-V6-90-minorchange.patch, HBASE-5153-V6-90.txt, 
 HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, HBASE-5153.patch, 
 TestResults-hbase5153.out






[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

2012-01-16 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13187442#comment-13187442
 ] 

Jieshan Bean commented on HBASE-5153:
-

Thank you, Ted.

 Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
 ---

 Key: HBASE-5153
 URL: https://issues.apache.org/jira/browse/HBASE-5153
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.94.0, 0.92.1, 0.90.6

 Attachments: 5153-92.txt, 5153-trunk.txt, 5153-trunk.txt, 
 HBASE-5153-V2.patch, HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, 
 HBASE-5153-V5-90.patch, HBASE-5153-V6-90-minorchange.patch, 
 HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, 
 HBASE-5153.patch, TestResults-hbase5153.out






[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

2012-01-15 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186721#comment-13186721
 ] 

Jieshan Bean commented on HBASE-5153:
-

The tests are still running. While I wait for the results, I would welcome your 
comments :). Thank you.

 Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
 ---

 Key: HBASE-5153
 URL: https://issues.apache.org/jira/browse/HBASE-5153
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.90.6

 Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, 
 HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, 
 HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, 
 HBASE-5153.patch






[jira] [Commented] (HBASE-5153) Add retry logic in HConnectionImplementation#resetZooKeeperTrackers

2012-01-15 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13186730#comment-13186730
 ] 

Jieshan Bean commented on HBASE-5153:
-

I will upload the new patch, including this minor change, after the tests 
finish. Thanks, Ted.

 Add retry logic in HConnectionImplementation#resetZooKeeperTrackers
 ---

 Key: HBASE-5153
 URL: https://issues.apache.org/jira/browse/HBASE-5153
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.90.6

 Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, 
 HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-V5-90.patch, 
 HBASE-5153-V6-90.txt, HBASE-5153-trunk-v2.patch, HBASE-5153-trunk.patch, 
 HBASE-5153.patch






[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

2012-01-13 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185487#comment-13185487
 ] 

Jieshan Bean commented on HBASE-5153:
-

Maybe retrying in resetZooKeeperTrackers is better. We try our best to re-use the 
original connection. In the worst case we retry the maximum number of times, 
still fail, and then abort, but I think we are responsible for letting the 
user level know about this. 

 HConnection re-creation in HTable after HConnection abort
 -

 Key: HBASE-5153
 URL: https://issues.apache.org/jira/browse/HBASE-5153
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.90.6

 Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, 
 HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-trunk-v2.patch, 
 HBASE-5153-trunk.patch, HBASE-5153.patch






[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

2012-01-13 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13185601#comment-13185601
 ] 

Jieshan Bean commented on HBASE-5153:
-

Thanks, Ted.

Without HBASE-3065, 0.90 doesn't handle ConnectionLossException correctly. 
Consider the case below:
1. Something triggers HConnection#abort. 
2. Suppose the check if (t instanceof 
KeeperException.SessionExpiredException) is true, so 
resetZooKeeperTrackers() is called.
3. A ConnectionLossException occurs during ZookeeperNodeTracker#start, which 
triggers a new HConnection#abort. In this scenario, the previous abort may 
still log 
"Reconnected successfully. This disconnect could have been caused by a network 
partition or a long-running GC pause."
4. The new abort carries a Throwable whose type is not 
KeeperException.SessionExpiredException, so this time it aborts directly.

This looks like a recursion.
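
The four steps above can be replayed with a toy model. Everything here is a hypothetical stand-in: abort and resetTrackers only imitate the control flow described in the list, and a boolean replaces the exception type check.

```java
import java.util.ArrayList;
import java.util.List;

class NestedAbortSketch {
  final List<String> log = new ArrayList<>();
  boolean closed = false;
  private boolean failNextStart = true;

  // Hypothetical stand-in for HConnection#abort.
  void abort(String reason, boolean sessionExpired) {
    if (sessionExpired) {
      resetTrackers();                      // step 2
      log.add("Reconnected successfully");  // logged even if the reset aborted
      return;
    }
    log.add("fatal: " + reason);            // step 4: abort directly
    closed = true;
  }

  // Stand-in for resetZooKeeperTrackers: the tracker start fails once and
  // reports the failure through a nested abort instead of throwing (step 3).
  private void resetTrackers() {
    if (failNextStart) {
      failNextStart = false;
      abort("connection loss during tracker start", false);
    }
  }
}
```

Calling abort once with sessionExpired set to true leaves the connection closed while the last log line still claims a successful reconnect, which is the misleading outcome described in step 3.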
 
Whether we re-use the old connection via resetZooKeeperTrackers or re-create the 
connection, the ZookeeperWatcher will be a new one. So I still think the patch 
for 0.90 is reasonable.

The trunk patch will need big changes.

Any other good suggestions? Thanks.


 HConnection re-creation in HTable after HConnection abort
 -

 Key: HBASE-5153
 URL: https://issues.apache.org/jira/browse/HBASE-5153
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.90.6

 Attachments: 5153-trunk.txt, HBASE-5153-V2.patch, 
 HBASE-5153-V3.patch, HBASE-5153-V4-90.patch, HBASE-5153-trunk-v2.patch, 
 HBASE-5153-trunk.patch, HBASE-5153.patch






[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

2012-01-11 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13184705#comment-13184705
 ] 

Jieshan Bean commented on HBASE-5153:
-

@Ted, the patch for TRUNK looks very different, and I still need some time to 
check it. I hope I can provide it today :)

@Stack, I think ConnectionUtils is reasonable. I can add it:). I will update 
the patch.

Thank you all.

 HConnection re-creation in HTable after HConnection abort
 -

 Key: HBASE-5153
 URL: https://issues.apache.org/jira/browse/HBASE-5153
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.90.6

 Attachments: HBASE-5153-V2.patch, HBASE-5153-V3.patch, 
 HBASE-5153.patch







[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

2012-01-10 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13183290#comment-13183290
 ] 

Jieshan Bean commented on HBASE-5153:
-

Doing that only in flushCommits may not be enough. I'll go through the code and 
propose a more carefully considered approach, and I will also provide a patch 
for TRUNK.

 HConnection re-creation in HTable after HConnection abort
 -

 Key: HBASE-5153
 URL: https://issues.apache.org/jira/browse/HBASE-5153
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.90.6

 Attachments: HBASE-5153-V2.patch, HBASE-5153.patch







[jira] [Commented] (HBASE-5153) HConnection re-creation in HTable after HConnection abort

2012-01-09 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13182472#comment-13182472
 ] 

Jieshan Bean commented on HBASE-5153:
-

Thanks, Ted. I'll add the unit test code immediately.

 HConnection re-creation in HTable after HConnection abort
 -

 Key: HBASE-5153
 URL: https://issues.apache.org/jira/browse/HBASE-5153
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Fix For: 0.90.6

 Attachments: HBASE-5153.patch







[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-07 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13181900#comment-13181900
 ] 

Jieshan Bean commented on HBASE-5088:
-

It seems my reply is too late. Thank you all :)
I suggest backporting this patch to 0.90.6.

 A concurrency issue on SoftValueSortedMap
 -

 Key: HBASE-5088
 URL: https://issues.apache.org/jira/browse/HBASE-5088
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4, 0.94.0
Reporter: Jieshan Bean
Assignee: Lars Hofhansl
Priority: Critical
 Fix For: 0.92.0, 0.94.0

 Attachments: 5088-final.txt, 5088-final2.txt, 5088-final3.txt, 
 5088-syncObj.txt, 5088-useMapInterfaces.txt, 5088.generics.txt, 
 HBase-5088-90.patch, HBase-5088-trunk.patch, 
 HBase5088-90-replaceSoftValueSortedMap.patch, 
 HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
 HBase5088Reproduce.java, PerformanceTestResults.png


 SoftValueSortedMap is backed by a TreeMap. All the methods in this class are 
 synchronized, so adding or deleting elements through these methods is safe.
 But HConnectionManager#getCachedLocation uses headMap to get a view 
 of SoftValueSortedMap#internalMap. Once we operate 
 on this view map (e.g. add/delete) in other threads, a concurrency issue may 
 occur.
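
To illustrate the point with plain JDK classes: TreeMap#headMap returns a live view backed by the original map, so operations on that view do not pass through the wrapper's synchronized methods. The sketch below is a simplification; GuardedMap and HeadMapViewDemo are hypothetical names, not the HBase classes.

```java
import java.util.SortedMap;
import java.util.TreeMap;

// Hypothetical, simplified stand-in for a wrapper whose methods are all synchronized.
final class GuardedMap {
    private final SortedMap<String, String> internalMap = new TreeMap<>();

    synchronized String put(String key, String value) {
        return internalMap.put(key, value);
    }

    synchronized int size() {
        return internalMap.size();
    }

    // Leaks a live view: callers holding it touch internalMap
    // without ever taking this object's lock.
    synchronized SortedMap<String, String> headMap(String toKey) {
        return internalMap.headMap(toKey);
    }
}

final class HeadMapViewDemo {
    // Removes a key through the view and returns the wrapper's size afterwards.
    static int removeThroughView() {
        GuardedMap map = new GuardedMap();
        map.put("a", "1");
        map.put("b", "2");
        map.put("c", "3");
        SortedMap<String, String> view = map.headMap("c"); // keys below "c"
        view.remove("a"); // writes through to internalMap, outside the lock
        return map.size();
    }
}
```

The removal through the view changes the wrapper's size, which shows that both objects share one unsynchronized TreeMap. With two threads, that shared tree can be restructured by both at once.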





[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-05 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13180256#comment-13180256
 ] 

Jieshan Bean commented on HBASE-5088:
-

We can finish the tests tomorrow, then I'll give the results. Thank you, Lars :)

 A concurrency issue on SoftValueSortedMap
 -

 Key: HBASE-5088
 URL: https://issues.apache.org/jira/browse/HBASE-5088
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4, 0.94.0
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: 5088-useMapInterfaces.txt, 5088.generics.txt, 
 HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088Reproduce.java







[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-05 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13181156#comment-13181156
 ] 

Jieshan Bean commented on HBASE-5088:
-

With either patch (replacing TreeMap with ConcurrentSkipListMap, or replacing 
SoftValueSortedMap with ConcurrentSkipListMap), performance degrades slightly. 
The latter seems better. Please find the test results in the 
attachment PerformanceTestResults.png.

 A concurrency issue on SoftValueSortedMap
 -

 Key: HBASE-5088
 URL: https://issues.apache.org/jira/browse/HBASE-5088
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4, 0.94.0
Reporter: Jieshan Bean
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5088-useMapInterfaces.txt, 5088.generics.txt, 
 HBase-5088-90.patch, HBase-5088-trunk.patch, 
 HBase5088-trunk-replaceTreeMap.patch, HBase5088Reproduce.java, 
 PerformanceTestResults.png







[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-05 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13181161#comment-13181161
 ] 

Jieshan Bean commented on HBASE-5088:
-

@Lars:
The patch that replaces TreeMap with ConcurrentSkipListMap is the same as the 
patch you attached named 5088.generics.txt.

 A concurrency issue on SoftValueSortedMap
 -

 Key: HBASE-5088
 URL: https://issues.apache.org/jira/browse/HBASE-5088
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4, 0.94.0
Reporter: Jieshan Bean
Assignee: Jieshan Bean
Priority: Critical
 Fix For: 0.92.0

 Attachments: 5088-useMapInterfaces.txt, 5088.generics.txt, 
 HBase-5088-90.patch, HBase-5088-trunk.patch, 
 HBase5088-90-replaceSoftValueSortedMap.patch, 
 HBase5088-90-replaceTreeMap.patch, HBase5088-trunk-replaceTreeMap.patch, 
 HBase5088Reproduce.java, PerformanceTestResults.png







[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2012-01-02 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13178362#comment-13178362
 ] 

Jieshan Bean commented on HBASE-5088:
-

@Lars:
It makes sense to me. I have also planned to test this patch, but the tests 
haven't finished yet (we encountered some problems during the tests). 
Thank you for your attention to this issue, Lars :) 

 A concurrency issue on SoftValueSortedMap
 -

 Key: HBASE-5088
 URL: https://issues.apache.org/jira/browse/HBASE-5088
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4, 0.94.0
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: 5088-useMapInterfaces.txt, 5088.generics.txt, 
 HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088Reproduce.java







[jira] [Commented] (HBASE-5100) Rollback of split would cause closed region to opened

2011-12-28 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5100?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176993#comment-13176993
 ] 

Jieshan Bean commented on HBASE-5100:
-

The patch is good. If the region has been closed by another thread, just abandon 
the split. That region should not be brought online again while rolling back. 
In that case, we just need to clean up the splitDir.
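
The idea can be sketched as follows. This is only an illustration, not the attached patch: SplitSketch, its Step enum and the boolean flags are hypothetical names. The point is that the "closed parent" step is journaled only when this thread really closed the region, so a rollback cleans the split directory but never re-initializes a region that another thread closed.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of a journaled split; not HBase's SplitTransaction.
final class SplitSketch {
    enum Step { CREATE_SPLIT_DIR, CLOSED_PARENT_REGION }

    private final List<Step> journal = new ArrayList<>();
    boolean parentReopened = false;
    boolean splitDirCleaned = false;

    // closedByUs is false when a concurrent thread already closed the region.
    void execute(boolean closedByUs) {
        journal.add(Step.CREATE_SPLIT_DIR);
        if (!closedByUs) {
            // Abandon the split without journaling the close.
            throw new IllegalStateException("region already closed by another thread");
        }
        journal.add(Step.CLOSED_PARENT_REGION);
    }

    // Undo the journaled steps in reverse order.
    void rollback() {
        for (int i = journal.size() - 1; i >= 0; i--) {
            switch (journal.get(i)) {
                case CLOSED_PARENT_REGION:
                    parentReopened = true;  // stands in for parent.initialize()
                    break;
                case CREATE_SPLIT_DIR:
                    splitDirCleaned = true; // stands in for cleaning the splitDir
                    break;
            }
        }
        journal.clear();
    }
}
```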

 Rollback of split would cause closed region to opened 
 --

 Key: HBASE-5100
 URL: https://issues.apache.org/jira/browse/HBASE-5100
 Project: HBase
  Issue Type: Bug
Reporter: chunhui shen
Assignee: chunhui shen
 Attachments: hbase-5100.patch


 If the master sends a close-region request to the RS while that region's split 
 transaction is running concurrently,
 the closed region may end up being opened again. 
 See the detailed code in SplitTransaction#createDaughters
 {code}
 List<StoreFile> hstoreFilesToSplit = null;
 try {
   hstoreFilesToSplit = this.parent.close(false);
   if (hstoreFilesToSplit == null) {
     // The region was closed by a concurrent thread.  We can't continue
     // with the split, instead we must just abandon the split.  If we
     // reopen or split this could cause problems because the region has
     // probably already been moved to a different server, or is in the
     // process of moving to a different server.
     throw new IOException("Failed to close region: already closed by " +
       "another thread");
   }
 } finally {
   this.journal.add(JournalEntry.CLOSED_PARENT_REGION);
 }
 {code}
 When rolling back, the JournalEntry.CLOSED_PARENT_REGION entry causes 
 this.parent.initialize() to be called.
 Although this region is not brought online in the regionserver, it may cause 
 some potential problems.
 For example, in our environment, the closed parent region was rolled back 
 successfully, and then compaction and split started again.
 The parent region is f892dd6107b6b4130199582abc78e9c1
 master log
 {code}
 2011-12-26 00:24:42,693 INFO org.apache.hadoop.hbase.master.HMaster: balance 
 hri=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  src=dw87.kgb.sqa.cm4,60020,1324827866085, 
 dest=dw80.kgb.sqa.cm4,60020,1324827865780
 2011-12-26 00:24:42,693 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Starting unassignment of 
 region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  (offlining)
 2011-12-26 00:24:42,694 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Sent CLOSE to 
 serverName=dw87.kgb.sqa.cm4,60020,1324827866085, load=(requests=0, regions=0, 
 usedHeap=0, maxHeap=0) for region 
 writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling new unassigned 
 node: /hbase-tbfs/unassigned/f892dd6107b6b4130199582abc78e9c1 
 (region=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.,
  server=dw87.kgb.sqa.cm4,60020,1324827866085, state=RS_ZK_REGION_CLOSING)
 2011-12-26 00:24:42,699 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSING, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,348 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=RS_ZK_REGION_CLOSED, server=dw87.kgb.sqa.cm4,60020,1324827866085, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: Handling CLOSED 
 event for f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,349 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Forcing OFFLINE; 
 was=writetest,8ZW417DZP93OU6SZ0QQMKTALTDP4883KW5AXSAFMQ952Y6J6VPPXEXRRPCWBR2PK7DQV3RKK28222JMOJSW3JJ8AB05MIREM1CL6,1324829936318.f892dd6107b6b4130199582abc78e9c1.
  state=CLOSED, ts=1324830285347
 2011-12-26 00:24:45,349 DEBUG org.apache.hadoop.hbase.zookeeper.ZKAssign: 
 master:6-0x13447f283f40e73 Creating (or updating) unassigned node for 
 f892dd6107b6b4130199582abc78e9c1 with OFFLINE state
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Handling 
 transition=M_ZK_REGION_OFFLINE, server=dw75.kgb.sqa.cm4:6, 
 region=f892dd6107b6b4130199582abc78e9c1
 2011-12-26 00:24:45,354 DEBUG 
 org.apache.hadoop.hbase.master.AssignmentManager: Found an existing plan for 
 

[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2011-12-26 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176074#comment-13176074
 ] 

Jieshan Bean commented on HBASE-5088:
-

@Ted:
Upgrading the JDK version is very difficult for us, because all our 
applications are based on that version. I remember that 1.6.0_22 was previously 
the version recommended by the community. Can you tell me why I should use 
1.6.0_29? Thanks.

 A concurrency issue on SoftValueSortedMap
 -

 Key: HBASE-5088
 URL: https://issues.apache.org/jira/browse/HBASE-5088
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4, 0.94.0
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: 5088-useMapInterfaces.txt, 5088.generics.txt, 
 HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088Reproduce.java







[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2011-12-26 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13176078#comment-13176078
 ] 

Jieshan Bean commented on HBASE-5088:
-

OK, thanks. We're running the performance tests across reads vs. writes with 
the two different patches, as you suggested :)

 A concurrency issue on SoftValueSortedMap
 -

 Key: HBASE-5088
 URL: https://issues.apache.org/jira/browse/HBASE-5088
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4, 0.94.0
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: 5088-useMapInterfaces.txt, 5088.generics.txt, 
 HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088Reproduce.java







[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2011-12-25 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175887#comment-13175887
 ] 

Jieshan Bean commented on HBASE-5088:
-

@Ted,
At first, I also thought we would get higher performance with this patch, 
because all the synchronized keywords were removed. But it slowed down.
I agree with Lars's explanation.
Our JDK version is 1.6.0_22, and below is our OS information:
{noformat}
C3S3:~ # cat /proc/version
Linux version 2.6.32.12-0.7-default (geeko@buildhost) (gcc version 4.3.4 [gcc-4_3-branch revision 152973] (SUSE Linux) ) #1 SMP 2010-05-20 11:14:20 +0200
C3S3:/proc # lsb_release -a
LSB Version:    core-2.0-noarch:core-3.2-noarch:core-4.0-noarch:core-2.0-x86_64:core-3.2-x86_64:core-4.0-x86_64:desktop-4.0-amd64:desktop-4.0-noarch:graphics-2.0-amd64:graphics-2.0-noarch:graphics-3.2-amd64:graphics-3.2-noarch:graphics-4.0-amd64:graphics-4.0-noarch
Distributor ID: SUSE LINUX
Description:    SUSE Linux Enterprise Server 11 (x86_64)
Release:        11
Codename:       n/a
{noformat}

We'll run more tests across reads vs. writes and share the results.

@Lars,
Sorry, I haven't yet done the other comparison, with SoftValueSortedMap replaced 
by ConcurrentSkipListMap. I am planning to do it, including both the functional 
test and the performance test. Then we can choose the better one.

 A concurrency issue on SoftValueSortedMap
 -

 Key: HBASE-5088
 URL: https://issues.apache.org/jira/browse/HBASE-5088
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4, 0.94.0
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: 5088-useMapInterfaces.txt, 5088.generics.txt, 
 HBase-5088-90.patch, HBase-5088-trunk.patch, HBase5088Reproduce.java







[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2011-12-22 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175253#comment-13175253
 ] 

Jieshan Bean commented on HBASE-5088:
-

Looking into the method TreeMap#fixAfterDeletion(Entry<K,V> x): once x is 
null (which may be caused by a concurrency issue), it can't get out of that 
loop. At that point, CPU usage is high. 
That's what we saw: the thread was stuck in that method the whole time.
Currently, the patch just replaces TreeMap with ConcurrentSkipListMap, as 
Anoop and Lars suggested. We're verifying the patch.
@Ted, actually, the headMap method of ConcurrentSkipListMap is different from 
TreeMap's: the view it returns is backed by the original thread-safe map. What 
do you think?
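
The difference can be seen in a small JDK-only sketch (the class and method names are made up for illustration). A TreeMap view has fail-fast iterators, so a structural change to the backing map during iteration raises ConcurrentModificationException. A ConcurrentSkipListMap view is weakly consistent and tolerates the same change.

```java
import java.util.ConcurrentModificationException;
import java.util.NavigableMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentSkipListMap;

final class HeadMapComparison {
    // Iterates a headMap view while the backing map is modified;
    // returns true if the iteration completes without an exception.
    static boolean survivesModification(NavigableMap<Integer, String> map) {
        for (int i = 0; i < 10; i++) {
            map.put(i, "v" + i);
        }
        NavigableMap<Integer, String> view = map.headMap(8, false);
        try {
            for (Integer key : view.keySet()) {
                map.remove(9);           // structural change outside the view
                map.put(100 + key, "x"); // new keys, also outside the view
            }
            return true;
        } catch (ConcurrentModificationException e) {
            return false;
        }
    }

    // Runs the same scenario against both implementations.
    static String describe() {
        return "TreeMap=" + survivesModification(new TreeMap<Integer, String>())
            + ", ConcurrentSkipListMap="
            + survivesModification(new ConcurrentSkipListMap<Integer, String>());
    }
}
```

The sketch runs in a single thread to stay deterministic. It does not show that compound operations on a ConcurrentSkipListMap view are atomic; they are not, only the individual calls are thread-safe.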

 A concurrency issue on SoftValueSortedMap
 -

 Key: HBASE-5088
 URL: https://issues.apache.org/jira/browse/HBASE-5088
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4, 0.94.0
Reporter: Jieshan Bean
Assignee: Jieshan Bean






[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2011-12-22 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175283#comment-13175283
 ] 

Jieshan Bean commented on HBASE-5088:
-

+1
Using the concrete Map type is never recommended. 


 A concurrency issue on SoftValueSortedMap
 -

 Key: HBASE-5088
 URL: https://issues.apache.org/jira/browse/HBASE-5088
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4, 0.94.0
Reporter: Jieshan Bean
Assignee: Jieshan Bean






[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2011-12-22 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175306#comment-13175306
 ] 

Jieshan Bean commented on HBASE-5088:
-

I have made a patch that replaces the TreeMap with ConcurrentSkipListMap inside 
SoftValueSortedMap, but does not replace SoftValueSortedMap itself. I'm running 
the tests on that patch. Once they finish, I'll submit the patch.
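
A rough sketch of that shape, assuming a much simplified wrapper: SoftValueMapSketch is a hypothetical class, not the patched SoftValueSortedMap, and it purges cleared references lazily on get rather than through a ReferenceQueue.

```java
import java.lang.ref.SoftReference;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

// Hypothetical, simplified sketch; not the HBase SoftValueSortedMap.
final class SoftValueMapSketch<K, V> {
    private final ConcurrentNavigableMap<K, SoftReference<V>> internalMap;

    SoftValueMapSketch() {
        this(new ConcurrentSkipListMap<K, SoftReference<V>>());
    }

    private SoftValueMapSketch(ConcurrentNavigableMap<K, SoftReference<V>> map) {
        this.internalMap = map;
    }

    V put(K key, V value) {
        SoftReference<V> old = internalMap.put(key, new SoftReference<V>(value));
        return old == null ? null : old.get();
    }

    V get(K key) {
        SoftReference<V> ref = internalMap.get(key);
        if (ref == null) {
            return null;
        }
        V value = ref.get();
        if (value == null) {
            // The value was garbage collected; drop the stale entry,
            // but only if the key still maps to the same reference.
            internalMap.remove(key, ref);
        }
        return value;
    }

    int size() {
        return internalMap.size();
    }

    // The view is backed by the same thread-safe map, so it can be
    // used from other threads without extra locking.
    SoftValueMapSketch<K, V> headMap(K toKey) {
        return new SoftValueMapSketch<K, V>(internalMap.headMap(toKey));
    }
}
```

Keeping the wrapper means callers are unchanged; only the backing map differs.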

 A concurrency issue on SoftValueSortedMap
 -

 Key: HBASE-5088
 URL: https://issues.apache.org/jira/browse/HBASE-5088
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4, 0.94.0
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: 5088-useMapInterfaces.txt







[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2011-12-22 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13175309#comment-13175309
 ] 

Jieshan Bean commented on HBASE-5088:
-

Thanks, Lars. I will modify the patch as you suggested, to use the interface 
instead of the concrete type :) I will submit it later.

 A concurrency issue on SoftValueSortedMap
 -

 Key: HBASE-5088
 URL: https://issues.apache.org/jira/browse/HBASE-5088
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4, 0.94.0
Reporter: Jieshan Bean
Assignee: Jieshan Bean
 Attachments: 5088-useMapInterfaces.txt







[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2011-12-21 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174606#comment-13174606
 ] 

Jieshan Bean commented on HBASE-5088:
-

We found this issue when one thread could not get out of the loop in 
TreeMap#fixAfterDeletion:
{noformat}
Thread-923 prio=10 tid=0x7f3d40553000 nid=0x3ed6 runnable [0x7f3d05c1a000]
   java.lang.Thread.State: RUNNABLE
    at java.util.TreeMap.fixAfterDeletion(TreeMap.java:2193)
    at java.util.TreeMap.deleteEntry(TreeMap.java:2151)
    at java.util.TreeMap.remove(TreeMap.java:585)
    at java.util.TreeMap$NavigableSubMap.remove(TreeMap.java:1395)
    at org.apache.hadoop.hbase.util.SoftValueSortedMap.get(SoftValueSortedMap.java:101)
    - locked 0x7f3d94f24f70 (a org.apache.hadoop.hbase.util.SoftValueSortedMap)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getCachedLocation(HConnectionManager.java:846)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegionInMeta(HConnectionManager.java:668)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:594)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.locateRegion(HConnectionManager.java:559)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionLocation(HConnectionManager.java:416)
    at org.apache.hadoop.hbase.client.ServerCallable.instantiateServer(ServerCallable.java:57)
    at org.apache.hadoop.hbase.client.ScannerCallable.instantiateServer(ScannerCallable.java:63)
    at org.apache.hadoop.hbase.client.HConnectionManager$HConnectionImplementation.getRegionServerWithRetries(HConnectionManager.java:1018)
    at org.apache.hadoop.hbase.client.HTable$ClientScanner.nextScanner(HTable.java:1104)
    at org.apache.hadoop.hbase.client.HTable$ClientScanner.initialize(HTable.java:1027)
    at org.apache.hadoop.hbase.client.HTable.getScanner(HTable.java:535)
    at com.huawei.icbc.query.SingleTabQuery.querybatch(SingleTabQuery.java:197)
    at com.huawei.icbc.benchmark.SingleTabQueryAction.query(SingleTabQueryAction.java:181)
    at framework.QueryThread.run(QueryThread.java:47)
{noformat}

 A concurrency issue on SoftValueSortedMap
 -

 Key: HBASE-5088
 URL: https://issues.apache.org/jira/browse/HBASE-5088
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4, 0.94.0
Reporter: Jieshan Bean

 SoftValueSortedMap is backed by a TreeMap. All the methods in this class are 
 synchronized, so adding or deleting elements through these methods is safe.
 But HConnectionManager#getCachedLocation uses headMap to get a view 
 of SoftValueSortedMap#internalMap. Operating on this view (e.g. add/delete) 
 while other threads modify the map bypasses that synchronization, so a 
 concurrency issue may occur.
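
 A minimal sketch of the hazard described above, not the actual HBase code. 
 GuardedMap is a hypothetical stand-in for SoftValueSortedMap (it omits the 
 SoftReference handling), and removeThroughView is an illustrative helper. It 
 shows that a headMap view writes straight through to the backing TreeMap, and 
 that the caller does so without holding the wrapper's lock:

```java
import java.util.SortedMap;
import java.util.TreeMap;

public class ViewEscapeSketch {
    /** Hypothetical stand-in for SoftValueSortedMap: every method is synchronized. */
    static class GuardedMap {
        private final TreeMap<String, String> internalMap = new TreeMap<>();

        synchronized String put(String key, String value) {
            return internalMap.put(key, value);
        }

        synchronized int size() {
            return internalMap.size();
        }

        // The lock is held only while the view is created, not while it is used.
        synchronized SortedMap<String, String> headMap(String toKey) {
            return internalMap.headMap(toKey);
        }
    }

    /** Removes a key through the view and reports whether the wrapper's lock was held. */
    static boolean removeThroughView(GuardedMap map, String key) {
        SortedMap<String, String> view = map.headMap("zzz");
        view.remove(key);             // mutates internalMap directly
        return Thread.holdsLock(map); // the caller never acquired the lock
    }

    public static void main(String[] args) {
        GuardedMap map = new GuardedMap();
        map.put("a", "1");
        map.put("b", "2");
        boolean lockHeld = removeThroughView(map, "a");
        System.out.println("size=" + map.size() + " lockHeld=" + lockHeld);
    }
}
```

 With two threads doing this at once, the TreeMap's red-black tree can be left 
 in an inconsistent state, which is consistent with a thread looping in 
 fixAfterDeletion.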





[jira] [Commented] (HBASE-5088) A concurrency issue on SoftValueSortedMap

2011-12-21 Thread Jieshan Bean (Commented) (JIRA)

[ 
https://issues.apache.org/jira/browse/HBASE-5088?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=13174615#comment-13174615
 ] 

Jieshan Bean commented on HBASE-5088:
-

Good suggestion, Anoop.

Actually, there are two possible solutions now:
1. Synchronize whenever we operate on the view of the original map.
2. Use ConcurrentSkipListMap instead of TreeMap.

I don't know which one is better.
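
The two options can be sketched roughly as follows. This is only an illustration under assumptions: LockedCache, SkipListCache and removeLastBefore are hypothetical names, and the SoftReference handling of the real SoftValueSortedMap is left out.

```java
import java.util.SortedMap;
import java.util.TreeMap;
import java.util.concurrent.ConcurrentNavigableMap;
import java.util.concurrent.ConcurrentSkipListMap;

public class ViewFixSketch {
    /** Option 1: keep the TreeMap and use the view only while holding the map's lock. */
    static class LockedCache {
        private final TreeMap<String, String> internalMap = new TreeMap<>();

        synchronized void put(String key, String value) {
            internalMap.put(key, value);
        }

        synchronized int size() {
            return internalMap.size();
        }

        // The view is created and used inside one synchronized method, so it never escapes.
        synchronized String removeLastBefore(String toKey) {
            SortedMap<String, String> view = internalMap.headMap(toKey);
            return view.isEmpty() ? null : view.remove(view.lastKey());
        }
    }

    /** Option 2: back the cache with a ConcurrentSkipListMap, whose views are thread-safe. */
    static class SkipListCache {
        private final ConcurrentSkipListMap<String, String> internalMap =
                new ConcurrentSkipListMap<>();

        void put(String key, String value) {
            internalMap.put(key, value);
        }

        int size() {
            return internalMap.size();
        }

        // Safe to hand out: the view supports concurrent access without external locking.
        ConcurrentNavigableMap<String, String> headMap(String toKey) {
            return internalMap.headMap(toKey);
        }
    }

    public static void main(String[] args) {
        LockedCache locked = new LockedCache();
        SkipListCache skipList = new SkipListCache();
        for (String key : new String[] {"a", "b", "c"}) {
            locked.put(key, key.toUpperCase());
            skipList.put(key, key.toUpperCase());
        }
        System.out.println(locked.removeLastBefore("c"));
        System.out.println(skipList.headMap("c").remove("a"));
    }
}
```

Option 1 keeps the existing data structure but requires every use of a view to go through the lock. Option 2 removes the locking burden from callers, at the cost of a different map implementation; compound check-then-act sequences on the view would still not be atomic.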

 A concurrency issue on SoftValueSortedMap
 -

 Key: HBASE-5088
 URL: https://issues.apache.org/jira/browse/HBASE-5088
 Project: HBase
  Issue Type: Bug
  Components: client
Affects Versions: 0.90.4, 0.94.0
Reporter: Jieshan Bean
Assignee: Jieshan Bean

 SoftValueSortedMap is backed by a TreeMap. All the methods in this class are 
 synchronized, so adding or deleting elements through these methods is safe.
 But HConnectionManager#getCachedLocation uses headMap to get a view 
 of SoftValueSortedMap#internalMap. Operating on this view (e.g. add/delete) 
 while other threads modify the map bypasses that synchronization, so a 
 concurrency issue may occur.
