Thanks for the explanation Ted, I will try to apply HBASE-3789 and hope for the best but my understanding is that it doesn't really solve the problem, it only reduces the probability of it happening, at least in one particular scenario. I would hope for a more robust solution. My concern is that the region allocation process seems to rely too much on timing considerations and doesn't seem to take enough measures to guarantee conflicts do not occur. I understand that in a distributed environment, when you don't get a timely response from a remote machine you can't know for sure if it did or did not receive the request, however there are things that can be done to mitigate this and reduce the conflict time significantly. For example, when I run dbck it knows that some regions are multiply assigned, the master could do the same and try to resolve the conflict. Another approach would be to handle late responses, even if the response from the remote machine arrives after it was assumed to be dead the master should have enough information to know it had created a conflict by assigning the region to another server. An even better solution, I think, is for the RS to periodically test that it is indeed the rightful owner of every region it holds and relinquish control over the region if it's not. Obviously a state where two RSs hold the same region is pathological and can lead to data loss, as demonstrated in my case. The system should be able to actively protect itself against such a scenario. It probably doesn't need saying but there is really nothing worse for a data storage system than data loss.
In my case the problem didn't happen in the initial phase but after disabling and enabling a table with about 12K regions. -eran On Sun, Jul 3, 2011 at 23:49, Ted Yu <[email protected]> wrote: > Let me try to answer some of your questions. > The two paragraphs below were written along my reasoning which is in > reverse > order of the actual call sequence. > > For #4 below, the log indicates that the following was executed: > private void assign(final RegionState state, final boolean setOfflineInZK, > final boolean forceNewPlan) { > for (int i = 0; i < this.maximumAssignmentAttempts; i++) { > if (setOfflineInZK && !*setOfflineInZooKeeper*(state)) return; > > The above was due to the timeout which you noted in #2 which would have > caused > TimeoutMonitor.chore() to run this code (line 1787) > > for (Map.Entry<HRegionInfo, Boolean> e: assigns.entrySet()){ > assign(e.getKey(), false, e.getValue()); > } > > This means there is lack of coordination between > assignmentManager.TimeoutMonitor and OpenedRegionHandler > > The reason I mention HBASE-3789 is that it is marked as Incompatible change > and is in TRUNK already. > The application of HBASE-3789 to 0.90 branch would change the behavior > (timing) of region assignment. > > I think it makes sense to evaluate the effect of HBASE-3789 in 0.90.4 > > BTW were the incorrect region assignments observed for a table with > multiple > initial regions ? > If so, I have HBASE-4010 in TRUNK which speeds up initial region assignment > by about 50%. > > Cheers > > On Sun, Jul 3, 2011 at 12:02 PM, Eran Kutner <[email protected]> wrote: > > > Ted, > > So if I understand correctly the the theory is that because of the issue > > fixed in HBASE-3789 the master took too long to detect that the region > was > > successfully opened by the first server so it forced closed it and > > transitioned to a second server, but there are a few things about this > > scenario I don't understand, probably because I don't know enough about > the > > inner workings of the region transition process and would appreciate it > if > > you can help me understand: > > 1. The RS opened the region at 16:37:49. > > 2. The master started handling the opened event at 16:39:54 - this delay > > can > > probably be explained by HBASE-3789 > > 3. At 16:39:54 the master log says: Opened region gs_raw_events,..... on > > hadoop1-s05.farm-ny.gigya.com > > 4. Then at 16:40:00 the master log says: master:60000-0x13004a31d7804c4 > > Creating (or updating) unassigned node for 584dac5cc70d8682f71c4675a843c3 > > 09 with OFFLINE state - why did it decide to take the region offline > after > > learning it was successfully opened? > > 5. Then it tries to reopen the region on hadoop1-s05, which indicates in > > its > > log that the open request failed because the region was already open - > why > > didn't the master use that information to learn that the region was > already > > open? > > 6. At 16:43:57 the master decides the region transition timed out and > > starts > > forcing the transition - HBASE-3789 again? > > 7. Now the master forces the transition of the region to hadoop1-s02 but > > there is no sign of that on hadoop1-s05 - why doesn't the old RS > > (hadoop1-s05) detect that it is no longer the master and relinquishes > > control of the region? > > > > Thanks. > > > > -eran > > > > > > > > On Sun, Jul 3, 2011 at 20:09, Ted Yu <[email protected]> wrote: > > > > > HBASE-3789 should have sped up region assignment. > > > The patch for 0.90 is attached to that JIRA. > > > > > > You may prudently apply that patch. > > > > > > Regards > > > > > > On Sun, Jul 3, 2011 at 10:01 AM, Eran Kutner <[email protected]> wrote: > > > > > > > Thanks Ted, but, as stated before, I'm already using 0.90.3, so > either > > > it's > > > > not fixed or it's not the same thing. > > > > > > > > -eran > > > > > > > > > > > > > > > > On Sun, Jul 3, 2011 at 17:27, Ted Yu <[email protected]> wrote: > > > > > > > > > Eran: > > > > > I was thinking of this: > > > > > HBASE-3789 Cleanup the locking contention in the master > > > > > > > > > > though it doesn't directly handle 'PENDING_OPEN for too long' case. > > > > > > > > > > https://issues.apache.org/jira/browse/HBASE-3741 is in 0.90.3 and > > > > actually > > > > > close to the symptom you described. > > > > > > > > > > On Sun, Jul 3, 2011 at 12:00 AM, Eran Kutner <[email protected]> > wrote: > > > > > > > > > > > It does seem that both servers opened the same region around the > > same > > > > > time. > > > > > > The region was offline because I disabled the table so I can > change > > > its > > > > > > TTL. > > > > > > > > > > > > Here is the log from haddop1-s05 : > > > > > > 2011-06-29 16:37:12,576 INFO > > > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received > > request > > > to > > > > > > open > > > > > > region: > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > 2011-06-29 16:37:12,680 DEBUG > > > > > > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: > > > > > Processing > > > > > > open of > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > 2011-06-29 16:37:12,680 DEBUG > > > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > > > > > regionserver:60020-0x33004a38816050b Attempting to transition > node > > > > > > 584dac5cc70d8682f71c4675a843c309 from M_ZK_REGION_OFFLINE to > > > > > > RS_ZK_REGION_OPENING > > > > > > 2011-06-29 16:37:12,711 DEBUG > > > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > > > > > regionserver:60020-0x33004a38816050b Successfully transitioned > node > > > > > > 584dac5cc70d8682f71c4675a843c309 from M_ZK_REGION_OFFLINE to > > > > > > RS_ZK_REGION_OPENING > > > > > > 2011-06-29 16:37:12,711 DEBUG > > > > > org.apache.hadoop.hbase.regionserver.HRegion: > > > > > > Opening region: REGION => {NAME => > > > > > > > > > > > > > > > > > > > > > > > > > > > 'gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.', > > > > > > STARTKEY => 'GSLoad_1308518553_168_WEB204', ENDKEY => > > > > > > 'GSLoad_1308518810_1249_WEB204', ENCODED => > > > > > > 584dac5cc70d8682f71c4675a843c309, TABLE => {{NAME => > > 'gs_raw_events', > > > > > > FAMILIES => [{NAME => 'events', BLOOMFILTER => 'NONE', > > > > REPLICATION_SCOPE > > > > > => > > > > > > '1', VERSIONS => '3', COMPRESSION => 'LZO', TTL => '604800', > > > BLOCKSIZE > > > > => > > > > > > '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}} > > > > > > 2011-06-29 16:37:12,711 DEBUG > > > > > org.apache.hadoop.hbase.regionserver.HRegion: > > > > > > Instantiated > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > 2011-06-29 16:37:12,847 DEBUG > > > > org.apache.hadoop.hbase.regionserver.Store: > > > > > > loaded > > > > > > > > > > > > > > > > > > > > > > > > > > > hdfs://hadoop1-m1:8020/hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/events/1971818821800304360, > > > > > > isReference=false, isBulkLoadResult=false, seqid=1162228062, > > > > > > majorCompaction=false > > > > > > 2011-06-29 16:37:12,848 INFO > > > > > org.apache.hadoop.hbase.regionserver.HRegion: > > > > > > Onlined > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.; > > > > > > next sequenceid=1162228063 > > > > > > 2011-06-29 16:37:12,849 DEBUG > > > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > > > > > regionserver:60020-0x33004a38816050b Attempting to transition > node > > > > > > 584dac5cc70d8682f71c4675a843c309 from RS_ZK_REGION_OPENING to > > > > > > RS_ZK_REGION_OPENING > > > > > > 2011-06-29 16:37:12,875 DEBUG > > > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > > > > > regionserver:60020-0x33004a38816050b Successfully transitioned > node > > > > > > 584dac5cc70d8682f71c4675a843c309 from RS_ZK_REGION_OPENING to > > > > > > RS_ZK_REGION_OPENING > > > > > > 2011-06-29 16:37:12,951 INFO > > > > org.apache.hadoop.hbase.catalog.MetaEditor: > > > > > > Updated row > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > in region .META.,,1 with server= > > hadoop1-s05.farm-ny.gigya.com:60020, > > > > > > startcode=1307349217076 > > > > > > 2011-06-29 16:37:12,951 DEBUG > > > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > > > > > regionserver:60020-0x33004a38816050b Attempting to transition > node > > > > > > 584dac5cc70d8682f71c4675a843c309 from RS_ZK_REGION_OPENING to > > > > > > RS_ZK_REGION_OPENED > > > > > > 2011-06-29 16:37:12,964 DEBUG > > > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > > > > > regionserver:60020-0x33004a38816050b Successfully transitioned > node > > > > > > 584dac5cc70d8682f71c4675a843c309 from RS_ZK_REGION_OPENING to > > > > > > RS_ZK_REGION_OPENED > > > > > > 2011-06-29 16:37:12,964 DEBUG > > > > > > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: > > > Opened > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > 2011-06-29 16:40:00,878 INFO > > > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received > > request > > > to > > > > > > open > > > > > > region: > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > 2011-06-29 16:40:00,878 DEBUG > > > > > > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: > > > > > Processing > > > > > > open of > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > 2011-06-29 16:40:01,079 WARN > > > > > > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: > > > > Attempted > > > > > > open of > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > but already online on this server > > > > > > 2011-06-29 16:43:50,395 DEBUG > > > > > > org.apache.hadoop.hbase.regionserver.CompactSplitThread: > Compaction > > > > > (major) > > > > > > requested for > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > because User-triggered major compaction; priority=1, compaction > > queue > > > > > > size=1248 > > > > > > 2011-06-29 20:19:49,906 INFO > > > > > org.apache.hadoop.hbase.regionserver.HRegion: > > > > > > Starting major compaction on region > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > 2011-06-29 20:19:49,906 INFO > > > > org.apache.hadoop.hbase.regionserver.Store: > > > > > > Started compaction of 1 file(s) in cf=events into > > > > > > > > > > > > > > > > > > > > > > > > > > > hdfs://hadoop1-m1:8020/hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/.tmp, > > > > > > seqid=1162228062, totalSize=98.3m > > > > > > 2011-06-29 20:19:49,906 DEBUG > > > > org.apache.hadoop.hbase.regionserver.Store: > > > > > > Compacting > > > > > > > > > > > > > > > > > > > > > > > > > > > hdfs://hadoop1-m1:8020/hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/events/1971818821800304360, > > > > > > keycount=6882816, bloomtype=NONE, size=98.3m > > > > > > 2011-06-29 20:19:59,920 INFO > > > > > org.apache.hadoop.hbase.regionserver.HRegion: > > > > > > completed compaction on region > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > after 10sec > > > > > > > > > > > > And here is the one from hadoop1-s02: > > > > > > 2011-06-29 16:43:57,935 INFO > > > > > > org.apache.hadoop.hbase.regionserver.HRegionServer: Received > > request > > > to > > > > > > open > > > > > > region: > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > 2011-06-29 16:43:58,990 DEBUG > > > > > > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: > > > > > Processing > > > > > > open of > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > 2011-06-29 16:43:58,990 DEBUG > > > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > > > > > regionserver:60020-0x23004a31d8904de Attempting to transition > node > > > > > > 584dac5cc70d8682f71c4675a843c309 from M_ZK_REGION_OFFLINE to > > > > > > RS_ZK_REGION_OPENING > > > > > > 2011-06-29 16:43:59,002 DEBUG > > > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > > > > > regionserver:60020-0x23004a31d8904de Successfully transitioned > node > > > > > > 584dac5cc70d8682f71c4675a843c309 from M_ZK_REGION_OFFLINE to > > > > > > RS_ZK_REGION_OPENING > > > > > > 2011-06-29 16:43:59,002 DEBUG > > > > > org.apache.hadoop.hbase.regionserver.HRegion: > > > > > > Opening region: REGION => {NAME => > > > > > > > > > > > > > > > > > > > > > > > > > > > 'gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.', > > > > > > STARTKEY => 'GSLoad_1308518553_168_WEB204', ENDKEY => > > > > > > 'GSLoad_1308518810_1249_WEB204', ENCODED => > > > > > > 584dac5cc70d8682f71c4675a843c309, TABLE => {{NAME => > > 'gs_raw_events', > > > > > > FAMILIES => [{NAME => 'events', BLOOMFILTER => 'NONE', > > > > REPLICATION_SCOPE > > > > > => > > > > > > '1', VERSIONS => '3', COMPRESSION => 'LZO', TTL => '604800', > > > BLOCKSIZE > > > > => > > > > > > '65536', IN_MEMORY => 'false', BLOCKCACHE => 'true'}]}} > > > > > > 2011-06-29 16:43:59,003 DEBUG > > > > > org.apache.hadoop.hbase.regionserver.HRegion: > > > > > > Instantiated > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > 2011-06-29 16:43:59,204 DEBUG > > > > org.apache.hadoop.hbase.regionserver.Store: > > > > > > loaded > > > > > > > > > > > > > > > > > > > > > > > > > > > hdfs://hadoop1-m1:8020/hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/events/1971818821800304360, > > > > > > isReference=false, isBulkLoadResult=false, seqid=1162228062, > > > > > > majorCompaction=false > > > > > > 2011-06-29 16:43:59,205 INFO > > > > > org.apache.hadoop.hbase.regionserver.HRegion: > > > > > > Onlined > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.; > > > > > > next sequenceid=1162228063 > > > > > > 2011-06-29 16:43:59,205 DEBUG > > > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > > > > > regionserver:60020-0x23004a31d8904de Attempting to transition > node > > > > > > 584dac5cc70d8682f71c4675a843c309 from RS_ZK_REGION_OPENING to > > > > > > RS_ZK_REGION_OPENING > > > > > > 2011-06-29 16:43:59,212 DEBUG > > > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > > > > > regionserver:60020-0x23004a31d8904de Successfully transitioned > node > > > > > > 584dac5cc70d8682f71c4675a843c309 from RS_ZK_REGION_OPENING to > > > > > > RS_ZK_REGION_OPENING > > > > > > 2011-06-29 16:43:59,214 INFO > > > > org.apache.hadoop.hbase.catalog.MetaEditor: > > > > > > Updated row > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > in region .META.,,1 with server= > > hadoop1-s02.farm-ny.gigya.com:60020, > > > > > > startcode=1306919627544 > > > > > > 2011-06-29 16:43:59,214 DEBUG > > > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > > > > > regionserver:60020-0x23004a31d8904de Attempting to transition > node > > > > > > 584dac5cc70d8682f71c4675a843c309 from RS_ZK_REGION_OPENING to > > > > > > RS_ZK_REGION_OPENED > > > > > > 2011-06-29 16:43:59,224 DEBUG > > > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > > > > > regionserver:60020-0x23004a31d8904de Successfully transitioned > node > > > > > > 584dac5cc70d8682f71c4675a843c309 from RS_ZK_REGION_OPENING to > > > > > > RS_ZK_REGION_OPENED > > > > > > 2011-06-29 16:43:59,224 DEBUG > > > > > > org.apache.hadoop.hbase.regionserver.handler.OpenRegionHandler: > > > Opened > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > java.io.IOException: Got error in response to OP_READ_BLOCK > self=/ > > > > > > 10.1.104.2:33356, remote=/10.1.104.2:50010 for file > > > > > > > > > > > > > > > > > > > > > > > > > > > /hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/events/1971818821800304360 > > > > > > for block 3674866614142268536_674205 > > > > > > > > > > > > > > > > > > Ted, can you please point me to J-D's bug fix you mentioned? Are > > you > > > > > > positive it's the same scenario - data loss is a very serious > > problem > > > > for > > > > > a > > > > > > DB. > > > > > > I'd really like to apply that patch ASAP, because when I run hbck > I > > > get > > > > > > over > > > > > > 400 regions which are multiply assigned. > > > > > > Last question, I understand the region's data is lost but is > there > > a > > > > way > > > > > to > > > > > > at least make the table consistent again by some how removing the > > > lost > > > > > > region? > > > > > > > > > > > > Thanks. > > > > > > > > > > > > -eran > > > > > > > > > > > > > > > > > > > > > > > > On Sat, Jul 2, 2011 at 01:46, Ted Yu <[email protected]> > wrote: > > > > > > > > > > > > > >> 2011-06-29 16:43:57,880 INFO > > > > > > > org.apache.hadoop.hbase. > > > > > > > master.AssignmentManager: Region has been > > > > > > > PENDING_OPEN for too long, reassigning > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > region=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > > > > > > > > > The double assignment should have been fixed by J-D's recent > > > checkin. > > > > > > > > > > > > > > On Fri, Jul 1, 2011 at 3:14 PM, Stack <[email protected]> > wrote: > > > > > > > > > > > > > > > Is > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > > > the region that was having the issue? If so, if you looked > in > > > > > > > > hadoop1-s05's logs, was this region opened around 2011-06-29 > > > > > 16:43:57? > > > > > > > > Was it also opened hadoop1-s02 not long after? Did you say > > what > > > > > > > > version of hbase you are on? > > > > > > > > > > > > > > > > St.Ack > > > > > > > > > > > > > > > > On Fri, Jul 1, 2011 at 5:08 AM, Eran Kutner <[email protected]> > > > > wrote: > > > > > > > > > Hi Stack, > > > > > > > > > I'm not sure what the log means. I do see references to two > > > > > different > > > > > > > > > servers, but that would probably happen if there was normal > > > > > > transition > > > > > > > I > > > > > > > > > assume. I'm using version 0.90.3 > > > > > > > > > Here are the relevant lines from the master logs: > > > > > > > > > > > > > > > > > > 2011-06-19 21:39:37,164 INFO > > > > > > > > org.apache.hadoop.hbase.master.ServerManager: > > > > > > > > > Received REGION_SPLIT: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533691659.9000a5d8df9502efc90d2c23567e4658.: > > > > > > > > > Daughters; > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309., > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518810_1249_WEB204,1308533970928.46f876a4e97be04edb35eb8f8959d482. > > > > > > > > > from hadoop1-s05.farm-ny.gigya.com,60020,1307349217076 > > > > > > > > > 2011-06-19 21:43:12,983 INFO > > > > > > > org.apache.hadoop.hbase.catalog.MetaEditor: > > > > > > > > > Deleted daughter reference > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309., > > > > > > > > > qualifier=splitA, from parent > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533691659.9000a5d8df9502efc90d2c23567e4658. > > > > > > > > > 2011-06-29 16:29:36,143 DEBUG > > > > > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Starting > > > > > > unassignment > > > > > > > > of > > > > > > > > > region > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > > > > (offlining) > > > > > > > > > 2011-06-29 16:29:36,146 DEBUG > > > > > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Sent > CLOSE > > to > > > > > > > > serverName= > > > > > > > > > hadoop1-s05.farm-ny.gigya.com,60020,1307349217076, > > > > > load=(requests=0, > > > > > > > > > regions=1654, usedHeap=1870, maxHeap=12483) for region > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > > > > 2011-06-29 16:29:38,327 DEBUG > > > > > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > new > > > > > > > unassigned > > > > > > > > > node: /hbase/unassigned/584dac5cc70d8682f71c4675a843c309 > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > (region=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309., > > > > > > > > > server=hadoop1-s05.farm-ny.gigya.com,60020,1307349217076, > > > > > > > > > state=RS_ZK_REGION_CLOSED) > > > > > > > > > 2011-06-29 16:29:38,327 DEBUG > > > > > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > > > > > > > > transition=RS_ZK_REGION_CLOSED, > > > > > > > > > server=hadoop1-s05.farm-ny.gigya.com,60020,1307349217076, > > > > > > > > > region=584dac5cc70d8682f71c4675a843c309 > > > > > > > > > 2011-06-29 16:30:53,742 DEBUG > > > > > > > > > org.apache.hadoop.hbase.master.handler.ClosedRegionHandler: > > > > > Handling > > > > > > > > CLOSED > > > > > > > > > event for 584dac5cc70d8682f71c4675a843c309 > > > > > > > > > 2011-06-29 16:30:53,742 DEBUG > > > > > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Table > being > > > > > > disabled > > > > > > > so > > > > > > > > > deleting ZK node and removing from regions in transition, > > > > skipping > > > > > > > > > assignment of region > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > > > > 2011-06-29 16:30:53,742 DEBUG > > > > > > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > > > > > > > > master:60000-0x13004a31d7804c4 Deleting existing unassigned > > > node > > > > > for > > > > > > > > > 584dac5cc70d8682f71c4675a843c309 that is in expected state > > > > > > > > > RS_ZK_REGION_CLOSED > > > > > > > > > 2011-06-29 16:30:53,801 DEBUG > > > > > > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > > > > > > > > master:60000-0x13004a31d7804c4 Successfully deleted > > unassigned > > > > node > > > > > > for > > > > > > > > > region 584dac5cc70d8682f71c4675a843c309 in expected state > > > > > > > > > RS_ZK_REGION_CLOSED > > > > > > > > > 2011-06-29 16:34:01,453 INFO > > > > > > > org.apache.hadoop.hbase.catalog.MetaEditor: > > > > > > > > > Updated region > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > > > > in META > > > > > > > > > 2011-06-29 16:37:12,247 DEBUG > > > > > > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > > > > > > > > master:60000-0x13004a31d7804c4 Creating (or updating) > > > unassigned > > > > > node > > > > > > > for > > > > > > > > > 584dac5cc70d8682f71c4675a843c309 with OFFLINE state > > > > > > > > > 2011-06-29 16:37:12,576 DEBUG > > > > > > > > > org.apache.hadoop.hbase.master.AssignmentManager: No > previous > > > > > > > transition > > > > > > > > > plan was found (or we are ignoring an existing plan) for > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > > > > so generated a random one; > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > hri=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309., > > > > > > > > > src=, dest=hadoop1-s05.farm-ny.gigya.com > > ,60020,1307349217076; > > > 5 > > > > > > > > (online=5, > > > > > > > > > exclude=null) available servers > > > > > > > > > 2011-06-29 16:37:12,576 DEBUG > > > > > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Assigning > > > > region > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > > > > to hadoop1-s05.farm-ny.gigya.com,60020,1307349217076 > > > > > > > > > 2011-06-29 16:37:13,102 DEBUG > > > > > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > > > > > > > > transition=RS_ZK_REGION_OPENED, > > > > > > > > > server=hadoop1-s05.farm-ny.gigya.com,60020,1307349217076, > > > > > > > > > region=584dac5cc70d8682f71c4675a843c309 > > > > > > > > > 2011-06-29 16:39:54,075 DEBUG > > > > > > > > > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: > > > > > Handling > > > > > > > > OPENED > > > > > > > > > event for 584dac5cc70d8682f71c4675a843c309; deleting > > unassigned > > > > > node > > > > > > > > > 2011-06-29 16:39:54,075 DEBUG > > > > > > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > > > > > > > > master:60000-0x13004a31d7804c4 Deleting existing unassigned > > > node > > > > > for > > > > > > > > > 584dac5cc70d8682f71c4675a843c309 that is in expected state > > > > > > > > > RS_ZK_REGION_OPENED > > > > > > > > > 2011-06-29 16:39:54,192 DEBUG > > > > > > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > > > > > > > > master:60000-0x13004a31d7804c4 Successfully deleted > > unassigned > > > > node > > > > > > for > > > > > > > > > region 584dac5cc70d8682f71c4675a843c309 in expected state > > > > > > > > > RS_ZK_REGION_OPENED > > > > > > > > > 2011-06-29 16:39:54,326 DEBUG > > > > > > > > > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: > > > > Opened > > > > > > > region > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > > > > on hadoop1-s05.farm-ny.gigya.com,60020,1307349217076 > > > > > > > > > 2011-06-29 16:40:00,598 DEBUG > > > > > > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > > > > > > > > master:60000-0x13004a31d7804c4 Creating (or updating) > > > unassigned > > > > > node > > > > > > > for > > > > > > > > > 584dac5cc70d8682f71c4675a843c309 with OFFLINE state > > > > > > > > > 2011-06-29 16:40:00,877 DEBUG > > > > > > > > > org.apache.hadoop.hbase.master.AssignmentManager: No > previous > > > > > > > transition > > > > > > > > > plan was found (or we are ignoring an existing plan) for > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > > > > so generated a random one; > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > hri=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309., > > > > > > > > > src=, dest=hadoop1-s05.farm-ny.gigya.com > > ,60020,1307349217076; > > > 5 > > > > > > > > (online=5, > > > > > > > > > exclude=null) available servers > > > > > > > > > 2011-06-29 16:40:00,877 DEBUG > > > > > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Assigning > > > > region > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > > > > to hadoop1-s05.farm-ny.gigya.com,60020,1307349217076 > > > > > > > > > 2011-06-29 16:43:57,879 INFO > > > > > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Regions > in > > > > > > transition > > > > > > > > > timed out: > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > > > > state=PENDING_OPEN, ts=1309380052723 > > > > > > > > > 2011-06-29 16:43:57,880 INFO > > > > > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Region > has > > > been > > > > > > > > > PENDING_OPEN for too long, reassigning > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > region=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > > > > 2011-06-29 16:43:57,936 DEBUG > > > > > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Forcing > > > > OFFLINE; > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > was=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > > > > state=PENDING_OPEN, ts=1309380052723 > > > > > > > > > 2011-06-29 16:43:57,936 DEBUG > > > > > > > > > org.apache.hadoop.hbase.master.AssignmentManager: No > previous > > > > > > > transition > > > > > > > > > plan was found (or we are ignoring an existing plan) for > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > > > > so generated a random one; > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > hri=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309., > > > > > > > > > src=, dest=hadoop1-s02.farm-ny.gigya.com > > ,60020,1306919627544; > > > 5 > > > > > > > > (online=5, > > > > > > > > > exclude=null) available servers > > > > > > > > > 2011-06-29 16:43:57,936 DEBUG > > > > > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Assigning > > > > region > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > > > > to hadoop1-s02.farm-ny.gigya.com,60020,1306919627544 > > > > > > > > > 2011-06-29 16:43:59,022 DEBUG > > > > > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > > > > > > > > transition=RS_ZK_REGION_OPENING, > > > > > > > > > server=hadoop1-s02.farm-ny.gigya.com,60020,1306919627544, > > > > > > > > > region=584dac5cc70d8682f71c4675a843c309 > > > > > > > > > 2011-06-29 16:43:59,221 DEBUG > > > > > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > > > > > > > > transition=RS_ZK_REGION_OPENING, > > > > > > > > > server=hadoop1-s02.farm-ny.gigya.com,60020,1306919627544, > > > > > > > > > region=584dac5cc70d8682f71c4675a843c309 > > > > > > > > > 2011-06-29 16:43:59,226 DEBUG > > > > > > > > > org.apache.hadoop.hbase.master.AssignmentManager: Handling > > > > > > > > > transition=RS_ZK_REGION_OPENED, > > > > > > > > > server=hadoop1-s02.farm-ny.gigya.com,60020,1306919627544, > > > > > > > > > region=584dac5cc70d8682f71c4675a843c309 > > > > > > > > > 2011-06-29 16:43:59,274 DEBUG > > > > > > > > > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: > > > > > Handling > > > > > > > > OPENED > > > > > > > > > event for 584dac5cc70d8682f71c4675a843c309; deleting > > unassigned > > > > > node > > > > > > > > > 2011-06-29 16:43:59,274 DEBUG > > > > > > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > > > > > > > > master:60000-0x13004a31d7804c4 Deleting existing unassigned > > > node > > > > > for > > > > > > > > > 584dac5cc70d8682f71c4675a843c309 that is in expected state > > > > > > > > > RS_ZK_REGION_OPENED > > > > > > > > > 2011-06-29 16:43:59,296 DEBUG > > > > > > > org.apache.hadoop.hbase.zookeeper.ZKAssign: > > > > > > > > > master:60000-0x13004a31d7804c4 Successfully deleted > > unassigned > > > > node > > > > > > for > > > > > > > > > region 584dac5cc70d8682f71c4675a843c309 in expected state > > > > > > > > > RS_ZK_REGION_OPENED > > > > > > > > > 2011-06-29 16:43:59,375 WARN > > > > > > > > > org.apache.hadoop.hbase.master.AssignmentManager: > Overwriting > > > > > > > > > 584dac5cc70d8682f71c4675a843c309 on > > > > > > > > > serverName=hadoop1-s05.farm-ny.gigya.com > > ,60020,1307349217076, > > > > > > > > > load=(requests=0, regions=1273, usedHeap=2676, > maxHeap=12483) > > > > > > > > > 2011-06-29 16:43:59,375 DEBUG > > > > > > > > > org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: > > > > Opened > > > > > > > region > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. > > > > > > > > > on hadoop1-s02.farm-ny.gigya.com,60020,1306919627544 > > > > > > > > > > > > > > > > > > Thanks. > > > > > > > > > > > > > > > > > > -eran > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > On Fri, Jul 1, 2011 at 09:05, Stack <[email protected]> > > wrote: > > > > > > > > > > > > > > > > > >> So, Eran, it seems as though two RegionServers were > carrying > > > the > > > > > > > > >> region? One deleted a file (compaction on its side)? Can > > you > > > > > > figure > > > > > > > > >> if indeed two servers had same region? (Check master logs > > for > > > > > this > > > > > > > > >> regions assignments). > > > > > > > > >> > > > > > > > > >> What version of hbase? > > > > > > > > >> > > > > > > > > >> St.Ack > > > > > > > > >> > > > > > > > > >> > > > > > > > > >> On Thu, Jun 30, 2011 at 3:58 AM, Eran Kutner < > > [email protected]> > > > > > > wrote: > > > > > > > > >> > Hi, > > > > > > > > >> > I have a cluster of 5 nodes with one large table that > > > > currently > > > > > > has > > > > > > > > >> around > > > > > > > > >> > 12000 regions. Everything was working fine for > relatively > > > long > > > > > > time, > > > > > > > > >> until > > > > > > > > >> > now. > > > > > > > > >> > Yesterday I significantly reduced the TTL on the table > and > > > > > > initiated > > > > > > > > >> major > > > > > > > > >> > compaction. This should have reduced the table size to > > about > > > > 20% > > > > > > of > > > > > > > > its > > > > > > > > >> > original size. > > > > > > > > >> > Today, I'm getting errors of inaccessible files on HDFS, > > for > > > > > > > example: > > > > > > > > >> > java.io.IOException: Got error in response to > > OP_READ_BLOCK > > > > > self=/ > > > > > > > > >> > 10.1.104.2:58047, remote=/10.1.104.2:50010 for file > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > /hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/events/1971818821800304360 > > > > > > > > >> > for block 3674866614142268536_674205 > > > > > > > > >> > at > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.hadoop.hdfs.DFSClient$BlockReader.newBlockReader(DFSClient.java:1487) > > > > > > > > >> > at > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.hadoop.hdfs.DFSClient$DFSInputStream.blockSeekTo(DFSClient.java:1811) > > > > > > > > >> > at > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > org.apache.hadoop.hdfs.DFSClient$DFSInputStream.read(DFSClient.java:1948) > > > > > > > > >> > at > > > > java.io.DataInputStream.read(DataInputStream.java:132) > > > > > > > > >> > at > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.hadoop.hbase.io.hfile.BoundedRangeFileInputStream.read(BoundedRangeFileInputStream.java:105) > > > > > > > > >> > at > > > > > > > > > java.io.BufferedInputStream.fill(BufferedInputStream.java:218) > > > > > > > > >> > at > > > > > > > > > java.io.BufferedInputStream.read(BufferedInputStream.java:237) > > > > > > > > >> > at > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.hadoop.io.compress.BlockDecompressorStream.rawReadInt(BlockDecompressorStream.java:128) > > > > > > > > >> > at > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.hadoop.io.compress.BlockDecompressorStream.decompress(BlockDecompressorStream.java:68) > > > > > > > > >> > at > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.hadoop.io.compress.DecompressorStream.read(DecompressorStream.java:75) > > > > > > > > >> > at > > > > > > > > > java.io.BufferedInputStream.read1(BufferedInputStream.java:256) > > > > > > > > >> > at > > > > > > > > > java.io.BufferedInputStream.read(BufferedInputStream.java:317) > > > > > > > > >> > at > > > > > org.apache.hadoop.io.IOUtils.readFully(IOUtils.java:102) > > > > > > > > >> > at > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > org.apache.hadoop.hbase.io.hfile.HFile$Reader.decompress(HFile.java:1094) > > > > > > > > >> > at > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > org.apache.hadoop.hbase.io.hfile.HFile$Reader.readBlock(HFile.java:1036) > > > > > > > > >> > at > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.hadoop.hbase.io.hfile.HFile$Reader$Scanner.seekTo(HFile.java:1433) > > > > > > > > >> > at > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.hadoop.hbase.regionserver.StoreFileScanner.seekAtOrAfter(StoreFileScanner.java:139) > > > > > > > > >> > at > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.hadoop.hbase.regionserver.StoreFileScanner.seek(StoreFileScanner.java:96) > > > > > > > > >> > at > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.hadoop.hbase.regionserver.StoreScanner.<init>(StoreScanner.java:77) > > > > > > > > >> > at > > > > > > > > >> > > > > > > > > > > > > > org.apache.hadoop.hbase.regionserver.Store.getScanner(Store.java:1341) > > > > > > > > >> > at > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.hadoop.hbase.regionserver.HRegion$RegionScanner.<init>(HRegion.java:2269) > > > > > > > > >> > at > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.hadoop.hbase.regionserver.HRegion.instantiateInternalScanner(HRegion.java:1126) > > > > > > > > >> > at > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1118) > > > > > > > > >> > at > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.hadoop.hbase.regionserver.HRegion.getScanner(HRegion.java:1102) > > > > > > > > >> > at > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.hadoop.hbase.regionserver.HRegionServer.openScanner(HRegionServer.java:1781) > > > > > > > > >> > at > > > sun.reflect.GeneratedMethodAccessor46.invoke(Unknown > > > > > > > Source) > > > > > > > > >> > at > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) > > > > > > > > >> > at > java.lang.reflect.Method.invoke(Method.java:597) > > > > > > > > >> > at > > > > > > > > >> > > > > > > > > org.apache.hadoop.hbase.ipc.HBaseRPC$Server.call(HBaseRPC.java:570) > > > > > > > > >> > at > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > org.apache.hadoop.hbase.ipc.HBaseServer$Handler.run(HBaseServer.java:1039) > > > > > > > > >> > > > > > > > > > >> > I checked and the file, indeed doesn't exist on HDFS, > here > > > is > > > > > the > > > > > > > name > > > > > > > > >> node > > > > > > > > >> > logs for this block, apparently because it was deleted: > > > > > > > > >> > 2011-06-19 21:39:36,651 INFO > > > > org.apache.hadoop.hdfs.StateChange: > > > > > > > > BLOCK* > > > > > > > > >> > NameSystem.allocateBlock: > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > /hbase/gs_raw_events/584dac5cc70d8682f71c4675a843c309/.tmp/2096863423111131624. > > > > > > > > >> > blk_3674866614142268536_674205 > > > > > > > > >> > 2011-06-19 21:40:11,954 INFO > > > > org.apache.hadoop.hdfs.StateChange: > > > > > > > > BLOCK* > > > > > > > > >> > NameSystem.addStoredBlock: blockMap updated: > > > 10.1.104.2:50010 > > > > is > > > > > > > > added > > > > > > > > >> to > > > > > > > > >> > blk_3674866614142268536_674205 size 67108864 > > > > > > > > >> > 2011-06-19 21:40:11,954 INFO > > > > org.apache.hadoop.hdfs.StateChange: > > > > > > > > BLOCK* > > > > > > > > >> > NameSystem.addStoredBlock: blockMap updated: > > > 10.1.104.3:50010 > > > > is > > > > > > > > added > > > > > > > > >> to > > > > > > > > >> > blk_3674866614142268536_674205 size 67108864 > > > > > > > > >> > 2011-06-19 21:40:11,955 INFO > > > > org.apache.hadoop.hdfs.StateChange: > > > > > > > > BLOCK* > > > > > > > > >> > NameSystem.addStoredBlock: blockMap updated: > > > 10.1.104.5:50010 > > > > is > > > > > > > > added > > > > > > > > >> to > > > > > > > > >> > blk_3674866614142268536_674205 size 67108864 > > > > > > > > >> > 2011-06-29 20:20:01,662 INFO > > > > org.apache.hadoop.hdfs.StateChange: > > > > > > > > BLOCK* > > > > > > > > >> ask > > > > > > > > >> > 10.1.104.2:50010 to delete > > blk_3674866614142268536_674205 > > > > > > > > >> > 2011-06-29 20:20:13,671 INFO > > > > org.apache.hadoop.hdfs.StateChange: > > > > > > > > BLOCK* > > > > > > > > >> ask > > > > > > > > >> > 10.1.104.5:50010 to delete > > blk_-4056387895369608597_675174 > > > > > > > > >> > blk_-5017882805850873821_672281 > > > blk_702373987100607684_672288 > > > > > > > > >> > blk_-5357157478043290010_668506 > > > blk_7118175133735412789_674903 > > > > > > > > >> > blk_-3569812563715986384_675231 > > > blk_8296855057240604851_669285 > > > > > > > > >> > blk_-6483679172530609101_674268 > > > blk_8738539715363739108_673682 > > > > > > > > >> > blk_1744841904626813502_675238 > > > blk_-6035315106100051103_674266 > > > > > > > > >> > blk_-1789501623010070237_674908 > > > blk_1944054629336265129_673689 > > > > > > > > >> > blk_3674866614142268536_674205 > > > blk_7930425446738143892_647410 > > > > > > > > >> > blk_-3007186753042268449_669296 > > > > blk_-5482302621772778061_647416 > > > > > > > > >> > blk_-3765735404924932181_672004 > > > blk_7476090998956811081_675169 > > > > > > > > >> > blk_7862291659285127712_646890 > > > blk_-2666244746343584727_672013 > > > > > > > > >> > blk_6039172613960915602_674206 > > > blk_-8470884397893086564_646899 > > > > > > > > >> > blk_4558230221166712802_668510 > > > > > > > > >> > 2011-06-29 20:20:46,698 INFO > > > > org.apache.hadoop.hdfs.StateChange: > > > > > > > > BLOCK* > > > > > > > > >> ask > > > > > > > > >> > 10.1.104.3:50010 to delete > > blk_-7851606440036350812_671552 > > > > > > > > >> > blk_9214649160203453845_647566 > > blk_702373987100607684_672288 > > > > > > > > >> > blk_5958099369749234073_668143 > > > blk_-5172218034084903173_673109 > > > > > > > > >> > blk_-2934555181472719276_646476 > > > > blk_-1409986679370073931_672552 > > > > > > > > >> > blk_-2786034325506235869_669086 > > > blk_3674866614142268536_674205 > > > > > > > > >> > blk_510158930393283118_673225 > > blk_916244738216205237_677068 > > > > > > > > >> > blk_-4317027806407316617_670379 > > > blk_8555705688850972639_673485 > > > > > > > > >> > blk_-3765735404924932181_672004 > > > > blk_-5482302621772778061_647416 > > > > > > > > >> > blk_-2461801145731752623_674605 > > > > blk_-8737702908048998927_672549 > > > > > > > > >> > blk_-8470884397893086564_646899 > > > blk_4558230221166712802_668510 > > > > > > > > >> > blk_-4056387895369608597_675174 > > > > blk_-8675430610673886073_647695 > > > > > > > > >> > blk_-6642870230256028318_668211 > > > > blk_-3890408516362176771_677483 > > > > > > > > >> > blk_-3569812563715986384_675231 > > > > blk_-5007142629771321873_674548 > > > > > > > > >> > blk_-3345355191863431669_667066 > > > blk_8296855057240604851_669285 > > > > > > > > >> > blk_-6595462308187757470_672420 > > > > blk_-2583945228783203947_674607 > > > > > > > > >> > blk_-346988625120916345_677063 > > > blk_4449525876338684218_674496 > > > > > > > > >> > blk_2617172363857549730_668201 > > > blk_8738539715363739108_673682 > > > > > > > > >> > blk_-208904675456598428_679286 > > > blk_-497549341281882641_646477 > > > > > > > > >> > blk_-6035315106100051103_674266 > > > > blk_-2356539038067297411_672388 > > > > > > > > >> > blk_-3881703084497103249_668137 > > > blk_2214397881104950315_646643 > > > > > > > > >> > blk_-5907671443455357710_673223 > > > > blk_-2431880309956605679_669204 > > > > > > > > >> > blk_6039172613960915602_674206 > > > blk_5053643911633142711_669194 > > > > > > > > >> > blk_-2636977729205236686_674664 > > > > > > > > >> > > > > > > > > > >> > I assume the file loss is somehow related to this change > > and > > > > the > > > > > > > major > > > > > > > > >> > compaction that followed because the same scan that is > > > failing > > > > > now > > > > > > > was > > > > > > > > >> > working fine yesterday and that is the only changed that > > > > > happened > > > > > > on > > > > > > > > the > > > > > > > > >> > cluster. Any suggestions what to do now? > > > > > > > > >> > > > > > > > > > >> > Thanks. > > > > > > > > >> > > > > > > > > > >> > -eran > > > > > > > > >> > > > > > > > > > >> > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > > >
