Well, the master doesn't know that s05 has the region open -- that's
why it gives it to s02 -- and then there is no channel available for
s05 to figure out who has what.
The way I see it, that's the root of the problem. It would probably
make sense if the RS could figure this out independently from
Mind pastebin'ing this part of the master log?
2011-06-29 16:39:54,326 DEBUG
org.apache.hadoop.hbase.
master.handler.OpenedRegionHandler: Opened region
gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
on hadoop1-s05.farm-ny.gigya.com,60020,1307349217076
On Thu, Jul 7, 2011 at 2:56 AM, Eran Kutner e...@gigya.com wrote:
Well backing
no. but I did run major compaction.
As I explained initially, I disabled the table so I could change its TTL,
then re-enabled it and ran major compaction so it would clean up the
expired data resulting from the TTL change.
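For reference, that disable / alter TTL / enable / major-compact sequence looks like this in the HBase shell. A sketch only: the column family name 'cf' and the new TTL value are illustrative assumptions, not taken from this thread.

```text
hbase> disable 'gs_raw_events'
hbase> alter 'gs_raw_events', {NAME => 'cf', TTL => 604800}   # 'cf' and 604800s are hypothetical
hbase> enable 'gs_raw_events'
hbase> major_compact 'gs_raw_events'   # rewrites store files, dropping cells expired by the new TTL
```

In 0.90 the table must be disabled before `alter` is accepted, which is what forces the region offline/online cycle discussed in this thread.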
-eran
On Wed, Jul 6, 2011 at 02:43, Ted Yu yuzhih...@gmail.com wrote:
Eran:
On Sun, Jul 3, 2011 at 12:00 AM, Eran Kutner e...@gigya.com wrote:
It does seem that both servers opened the same region around the same time.
The region was offline because I disabled the table so I could change its TTL.
2011-06-29 16:37:12,964 DEBUG
On Sun, Jul 3, 2011 at 12:02 PM, Eran Kutner e...@gigya.com wrote:
4. Then at 16:40:00 the master log says: master:6-0x13004a31d7804c4
Creating (or updating) unassigned node for 584dac5cc70d8682f71c4675a843c309
with OFFLINE state - why did it decide to take the region offline after
Eran:
I logged https://issues.apache.org/jira/browse/HBASE-4060 for you.
On Mon, Jul 4, 2011 at 2:30 AM, Ted Yu yuzhih...@gmail.com wrote:
Thanks for the understanding.
Can you log a JIRA and put your ideas below in it?
On Jul 4, 2011, at 12:42 AM, Eran Kutner e...@gigya.com wrote:
Appreciate it, sorry I didn't get to it sooner. Had some crazy days :)
-eran
On Tue, Jul 5, 2011 at 17:19, Ted Yu yuzhih...@gmail.com wrote:
Thanks
Eran:
You didn't run hbck while the gs_raw_events table was being enabled, right?
I saw:
2011-06-29 16:43:50,395 DEBUG
org.apache.hadoop.hbase.regionserver.CompactSplitThread: Compaction (major)
requested for
gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
Thanks for the explanation Ted,
I will try to apply HBASE-3789 and hope for the best, but my understanding is
that it doesn't really solve the
Sure, I'll do that.
-eran
On Mon, Jul 4, 2011 at 12:30, Ted Yu yuzhih...@gmail.com wrote:
Here is the log from hadoop1-s05:
2011-06-29 16:37:12,576 INFO
org.apache.hadoop.hbase.regionserver.HRegionServer: Received request to open
Eran:
I was thinking of this:
HBASE-3789 Cleanup the locking contention in the master
though it doesn't directly handle the 'PENDING_OPEN for too long' case.
https://issues.apache.org/jira/browse/HBASE-3741 is in 0.90.3 and is actually
close to the symptom you described.
On Sun, Jul 3, 2011 at 12:00
Thanks Ted, but, as stated before, I'm already using 0.90.3, so either it's
not fixed or it's not the same thing.
-eran
On Sun, Jul 3, 2011 at 17:27, Ted Yu yuzhih...@gmail.com wrote:
HBASE-3789 should have sped up region assignment.
The patch for 0.90 is attached to that JIRA.
You may prudently apply that patch.
Regards
On Sun, Jul 3, 2011 at 10:01 AM, Eran Kutner e...@gigya.com wrote:
Ted,
So if I understand correctly, the theory is that because of the issue
fixed in HBASE-3789 the master took too long to detect that the region was
successfully opened by the first server, so it force-closed it and
transitioned it to a second server, but there are a few things about this
Let me try to answer some of your questions.
The two paragraphs below follow my reasoning, which is in reverse
order of the actual call sequence.
For #4 below, the log indicates that the following was executed:
private void assign(final RegionState state, final boolean
So, Eran, it seems as though two RegionServers were carrying the
region? One deleted a file (compaction on its side)? Can you figure
out whether two servers indeed had the same region? (Check the master
logs for this region's assignments.)
What version of HBase?
St.Ack
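The check St.Ack suggests -- scanning the master log for every assignment event touching one region -- can be sketched as a small filter. The two sample log lines below are reconstructed from excerpts quoted elsewhere in this thread; the helper function is illustrative, not part of HBase.

```python
# Filter master-log lines for one region's encoded name.
REGION_HASH = "584dac5cc70d8682f71c4675a843c309"

# Sample lines reconstructed from the excerpts quoted in this thread.
MASTER_LOG = """\
2011-06-29 16:39:54,326 DEBUG org.apache.hadoop.hbase.master.handler.OpenedRegionHandler: Opened region gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309. on hadoop1-s05.farm-ny.gigya.com,60020,1307349217076
2011-06-29 16:43:57,880 INFO org.apache.hadoop.hbase.master.AssignmentManager: Region has been PENDING_OPEN for too long, reassigning region=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
"""

def assignment_events(log: str, region_hash: str) -> list[str]:
    """Return every log line that mentions the given region."""
    return [line for line in log.splitlines() if region_hash in line]

# More than one "Opened region ... on <server>" line naming different
# servers, with no close in between, indicates a double assignment.
for line in assignment_events(MASTER_LOG, REGION_HASH):
    print(line)
```

Running the same filter (e.g. with grep) over the real master log and comparing the server names at the end of each "Opened region" line is enough to confirm or rule out the double assignment.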
On Thu, Jun 30, 2011 at 3:58 AM, Eran
Hi Stack,
I'm not sure what the log means. I do see references to two different
servers, but I assume that would also happen during a normal transition.
I'm using version 0.90.3.
Here are the relevant lines from the master logs:
2011-06-19 21:39:37,164 INFO
Is
gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
the region that was having the issue? If so, if you looked in
hadoop1-s05's logs, was this region opened around 2011-06-29 16:43:57?
Was it also opened on hadoop1-s02 not long after? Did you say what
2011-06-29 16:43:57,880 INFO
org.apache.hadoop.hbase.
master.AssignmentManager: Region has been
PENDING_OPEN for too long, reassigning
region=gs_raw_events,GSLoad_1308518553_168_WEB204,1308533970928.584dac5cc70d8682f71c4675a843c309.
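A minimal, hypothetical model of the race this log line implies: the master reassigns a region stuck in PENDING_OPEN past a timeout, even though the first regionserver may still finish its open, producing the double assignment. All names and the timeout value are illustrative, not HBase's actual implementation.

```python
# Illustrative model only -- not HBase code.
PENDING_OPEN_TIMEOUT = 180  # seconds before the master gives up (assumed value)

def tick(region: dict, now: float, spare_server: str) -> list:
    """Master timeout check: reassign if PENDING_OPEN has lasted too long."""
    if region["state"] == "PENDING_OPEN" and now - region["since"] > PENDING_OPEN_TIMEOUT:
        # The master has no channel to ask the first server whether its open
        # is still in progress, so it simply hands the region out again.
        region["servers"].append(spare_server)
    return region["servers"]

region = {"state": "PENDING_OPEN", "since": 0, "servers": ["hadoop1-s05"]}
print(tick(region, 200, "hadoop1-s02"))  # both servers now believe they should open it
```

The fix directions discussed in this thread (HBASE-3789, HBASE-4060) amount to either detecting the completed open sooner or checking with the original server before reassigning.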
The double assignment should have been fixed by J-D's recent
Hi,
I have a cluster of 5 nodes with one large table that currently has around
12000 regions. Everything was working fine for a relatively long time, until
now.
Yesterday I significantly reduced the TTL on the table and initiated major
compaction. This should have reduced the table size to about 20%