Re: Replication

Jean-Daniel Cryans Wed, 14 Jul 2010 08:30:30 -0700

This is not related to replication; it's about a new feature added by
https://issues.apache.org/jira/browse/HBASE-2306


https://issues.apache.org/jira/browse/HBASE-2382 added the required
documentation. Basically, if you are running in pseudo-distributed mode
you need to tell HBase not to expect the default number of replicas:
http://hbase.apache.org/docs/r0.89.20100621/apidocs/overview-summary.html#pseudo-distrib
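
To make the warning easier to read, here is a minimal sketch of the kind
of check that produces it. This is not the actual HLog code: the class
ReplicaCheck and its methods are hypothetical names used only for
illustration, and the message text is copied from the log lines quoted
below. The point is that the warning fires whenever the number of
replicas in the write pipeline is lower than the number the log writer
expects, so on a single-datanode setup the expected value has to be
lowered to 1 (presumably via the replication setting described in the
linked docs).

```java
// Illustrative sketch only; NOT the real HBase HLog implementation.
final class ReplicaCheck {

    // Hypothetical helper: true when the pipeline reports some replicas,
    // but fewer than the log writer was configured to expect.
    static boolean shouldRollLog(int foundReplicas, int expectedReplicas) {
        return foundReplicas > 0 && foundReplicas < expectedReplicas;
    }

    // Builds a message in the shape of the warning seen in the logs,
    // or a short "ok" marker when no log roll would be requested.
    static String describe(int foundReplicas, int expectedReplicas) {
        if (shouldRollLog(foundReplicas, expectedReplicas)) {
            return "HDFS pipeline error detected. Found " + foundReplicas
                + " replicas but expecting " + expectedReplicas
                + " replicas. Requesting close of hlog.";
        }
        return "pipeline ok";
    }

    public static void main(String[] args) {
        // One datanode, default expectation of 3: warning on every sync.
        System.out.println(describe(1, 3));
        // One datanode, expectation lowered to 1: no warning.
        System.out.println(describe(1, 1));
    }
}
```

With the expectation left at 3 and only one datanode, the condition is
true on every check, which would explain the constant log rolling.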

See the discussion in HBASE-2382. Do you think it could be more
user-friendly?

J-D

On Wed, Jul 14, 2010 at 3:33 AM, Sebastian Bauer <[email protected]> wrote:
>  So replication is working, but after the Hadoop update I see many of
> these on the slave:
>
> 2010-07-14 12:30:51,941 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
> HDFS pipeline error detected. Found 1 replicas but expecting 3 replicas.
>  Requesting close of hlog.
> 2010-07-14 12:30:51,955 INFO
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs
> -- HDFS-200
> 2010-07-14 12:30:51,955 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
> Roll
> /hbase/.logs/db2b.goldenline.pl,60020,1279102137010/85.232.237.235%3A60020.1279103451914,
> entries=1, filesize=555. New hlog
> /hbase/.logs/db2b.goldenline.pl,60020,1279102137010/85.232.237.235%3A60020.1279103451944
> 2010-07-14 12:30:51,957 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
> HDFS pipeline error detected. Found 1 replicas but expecting 3 replicas.
>  Requesting close of hlog.
> 2010-07-14 12:30:51,966 INFO
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs
> -- HDFS-200
> 2010-07-14 12:30:51,967 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
> Roll
> /hbase/.logs/db2b.goldenline.pl,60020,1279102137010/85.232.237.235%3A60020.1279103451944,
> entries=1, filesize=1195. New hlog
> /hbase/.logs/db2b.goldenline.pl,60020,1279102137010/85.232.237.235%3A60020.1279103451959
>
>
> And something like this on the master:
>
> 2010-07-14 12:25:10,939 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
> HDFS pipeline error detected. Found 1 replicas but expecting 3 replicas.
>  Requesting close of hlog.
> 2010-07-14 12:25:10,940 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
> HDFS pipeline error detected. Found 1 replicas but expecting 3 replicas.
>  Requesting close of hlog.
> 2010-07-14 12:25:10,940 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
> HDFS pipeline error detected. Found 1 replicas but expecting 3 replicas.
>  Requesting close of hlog.
> 2010-07-14 12:25:11,399 INFO
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs
> -- HDFS-200
> 2010-07-14 12:25:11,400 INFO org.apache.hadoop.hbase.regionserver.wal.HLog:
> Roll
> /hbase/.logs/db2a.goldenline.pl,60020,1279102568601/85.232.237.234%3A60020.1279103110860,
> entries=81, filesize=22075. New hlog
> /hbase/.logs/db2a.goldenline.pl,60020,1279102568601/85.232.237.234%3A60020.1279103111379
> 2010-07-14 12:25:11,451 DEBUG
> org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper:
> <db2a:/hbase,db2a.goldenline.pl,60020,1279102568601>Created
> /hbase/replication/rs/db2a.goldenline.pl,60020,1279102568601/test/85.232.237.234%3A60020.1279103111379 with data
> 2010-07-14 12:25:11,454 WARN org.apache.hadoop.hbase.regionserver.wal.HLog:
> HDFS pipeline error detected. Found 1 replicas but expecting 3 replicas.
>  Requesting close of hlog.
>
>
> On 14.07.2010 00:08, Jean-Daniel Cryans wrote:
>>
>> Just looked at the head of 0.20-append and I see it contains the
>> missing patch (was committed as part of HDFS-1057).
>>
>> So that would mean that the file is just empty :) If you insert a few
>> rows in the shell on the master cluster, do you see them some seconds
>> later on the slave?
>>
>> J-D
>>
>> On Tue, Jul 13, 2010 at 3:01 PM, Sebastian Bauer<[email protected]>
>>  wrote:
>>>
>>>  On 13.07.2010 23:50, Jean-Daniel Cryans wrote:
>>>>
>>>> Yeah, an experimental feature can be "odd" to use :D
>>>
>>> I love bleeding edge technologies :D
>>>>
>>>> So one of the following is happening:
>>>>
>>>>  1) You aren't using a version of hadoop patched enough to get
>>>> replication working fully. Trunk uses a special jar that I patched
>>>> myself. CDH3b2 also has everything needed. What this means is that
>>>> it's trying to open the log file but the first block isn't available
>>>> (it's actually a very small patch for the Namenode).
>>>
>>> I'm using Hadoop from the 0.20-append branch released with hbase-0.89.xxxx
>>>
>>>>  2) The file is empty, because nothing was written to the log file.
>>>> What this means is that it's trying to open the log file but there's
>>>> not even a single block in it, so it fails on EOF.
>>>
>>> This may well be the cause, because when everything is running I see
>>> fewer of these traces.
>>>
>>>> J-D
>>>>
>>> Thanks for your help :)
>>>
>>>> On Tue, Jul 13, 2010 at 2:37 PM, Sebastian Bauer<[email protected]>
>>>>  wrote:
>>>>>
>>>>>  After trying to set up replication I got many of these errors:
>>>>>
>>>>> 2010-07-13 23:35:26,498 WARN
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>>>>> Waited
>>>>> too long for this file, considering dumping
>>>>> 2010-07-13 23:35:26,498 DEBUG
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>>>>> Unable
>>>>> to open a reader, sleeping 100 times 10
>>>>> 2010-07-13 23:35:27,111 INFO
>>>>> org.apache.hadoop.hbase.regionserver.Store:
>>>>> Completed compaction of 3 file(s) in c of
>>>>>
>>>>>
>>>>> CampaignToUsers,43-m_2010_5_750D70A83162FF54389D2CA67ADA0B86,1278610126054.6504d518fb224efe1530e79c198994cd.;
>>>>> new
>>>>>  storefile is
>>>>>
>>>>>
>>>>> hdfs://db2a:50001/hbase/CampaignToUsers/6504d518fb224efe1530e79c198994cd/c/226233377281334567;
>>>>> store size is 19.6m
>>>>> 2010-07-13 23:35:27,111 INFO
>>>>> org.apache.hadoop.hbase.regionserver.HRegion:
>>>>> compaction completed on region
>>>>>
>>>>>
>>>>> CampaignToUsers,43-m_2010_5_750D70A83162FF54389D2CA67ADA0B86,1278610126054.6504d518fb224efe1530e79c198994cd.
>>>>> in 1sec
>>>>> 2010-07-13 23:35:27,111 INFO
>>>>> org.apache.hadoop.hbase.regionserver.HRegion:
>>>>> Starting compaction on region
>>>>> UsersToCampaign,,1278609821058.ecb7605434967e247ce14d525849495d.
>>>>> 2010-07-13 23:35:27,112 DEBUG
>>>>> org.apache.hadoop.hbase.regionserver.Store:
>>>>> Compaction size of c: 31.4m; Skipped 0 file(s), size: 0
>>>>> 2010-07-13 23:35:27,112 INFO
>>>>> org.apache.hadoop.hbase.regionserver.Store:
>>>>> Started compaction of 3 file(s) in c of
>>>>> UsersToCampaign,,1278609821058.ecb7605434967e247ce14d525849495d.  into
>>>>> hdfs://db2a:50001/hbase/UsersToCampaign/ecb7605434967e247ce14d525849495d/.tmp, seqid=65302505
>>>>> 2010-07-13 23:35:27,498 INFO
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>>>>> Opening
>>>>> log for replication 85.232.237.234%3A60020.1279056880911 at 0
>>>>> 2010-07-13 23:35:27,499 WARN
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>>>>> test
>>>>> Got:
>>>>> java.io.EOFException
>>>>>        at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>>>>        at java.io.DataInputStream.readFully(DataInputStream.java:152)
>>>>>        at
>>>>> org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1457)
>>>>>        at
>>>>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1435)
>>>>>        at
>>>>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>>>>>        at
>>>>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
>>>>>        at
>>>>>
>>>>>
>>>>> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:51)
>>>>>        at
>>>>>
>>>>>
>>>>> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:103)
>>>>>        at
>>>>> org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:511)
>>>>>        at
>>>>>
>>>>>
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:422)
>>>>>        at
>>>>>
>>>>>
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:262)
>>>>> 2010-07-13 23:35:27,499 WARN
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>>>>> Waited
>>>>> too long for this file, considering dumping
>>>>> 2010-07-13 23:35:27,499 DEBUG
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>>>>> Unable
>>>>> to open a reader, sleeping 100 times 10
>>>>> 2010-07-13 23:35:28,499 INFO
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>>>>> Opening
>>>>> log for replication 85.232.237.234%3A60020.1279056880911 at 0
>>>>> 2010-07-13 23:35:28,500 WARN
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource:
>>>>> test
>>>>> Got:
>>>>> java.io.EOFException
>>>>>        at java.io.DataInputStream.readFully(DataInputStream.java:180)
>>>>>        at java.io.DataInputStream.readFully(DataInputStream.java:152)
>>>>>        at
>>>>> org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1457)
>>>>>        at
>>>>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1435)
>>>>>        at
>>>>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>>>>>        at
>>>>> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
>>>>>        at
>>>>>
>>>>>
>>>>> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:51)
>>>>>        at
>>>>>
>>>>>
>>>>> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:103)
>>>>>        at
>>>>> org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:511)
>>>>>        at
>>>>>
>>>>>
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:422)
>>>>>        at
>>>>>
>>>>>
>>>>> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:262)
>>>>>
>>>>> On 13.07.2010 20:18, Jean-Daniel Cryans wrote:
>>>>>>
>>>>>> No, but you can use the new mapreduce utility
>>>>>> org.apache.hadoop.hbase.mapreduce.CopyTable to copy whole tables
>>>>>> between clusters. It's like distcp for HBase.
>>>>>>
>>>>>> Oh, and looking at the documentation I just realized that I changed
>>>>>> the name of the configuration that enables replication just before
>>>>>> committing and forgot to update the package.html file. It's now
>>>>>> simply hbase.replication (and it should stay like that). I'll fix
>>>>>> that in the scope of HBASE-2808.
>>>>>>
>>>>>> J-D
>>>>>>
>>>>>> On Tue, Jul 13, 2010 at 11:12 AM, Sebastian Bauer<[email protected]>
>>>>>>  wrote:
>>>>>>>
>>>>>>>  I have one more question: can I first create the master and, after
>>>>>>> loading data, connect the slave, or turn on replication on existing
>>>>>>> tables that already hold data?
>>>>>>>
>>>>>>> On 13.07.2010 19:56, Jean-Daniel Cryans wrote:
>>>>>>>>>
>>>>>>>>> Thanks for the info on where to find the documentation. It says
>>>>>>>>> that ZooKeeper needs to be running in standalone mode. Is that
>>>>>>>>> true?
>>>>>>>>>
>>>>>>>> Well, you can run add_peer.rb while the clusters are running, but
>>>>>>>> they won't pick up the change live (that part isn't done yet). So
>>>>>>>> if you run the script while the cluster is running, restart it.
>>>>>>>> Also take a look at the region server log; it should output
>>>>>>>> something like this when starting up:
>>>>>>>>
>>>>>>>>     LOG.info("This cluster (" + thisCluster + ") is a "
>>>>>>>>           + (this.replicationMaster ? "master" : "slave")
>>>>>>>>           + " for replication" + ", compared with (" + address + ")");
>>>>>>>>
>>>>>>>> This will tell you if you used the right address for ZooKeeper. If
>>>>>>>> your region server on the master cluster thinks it's a slave, then
>>>>>>>> the addresses are wrong. Also, there's currently no reporting for
>>>>>>>> replication, since it's not done yet!
>>>>>>>>
>>>>>>>> For more in-depth documentation, check out
>>>>>>>> https://issues.apache.org/jira/browse/HBASE-2808
>>>>>>>>
>>>>>>>> Thanks for trying this out. As the author of most of that part of
>>>>>>>> the code, I'm thrilled!
>>>>>>>>
>>>>>>>> J-D
>>>>>>>>
>>>
>
>
