Yeah, using an experimental feature can be "odd" :D

So one of the following is happening:

 1) You aren't using a version of Hadoop patched enough to get
replication working fully. Trunk uses a special jar that I patched
myself, and CDH3b2 also has everything needed. In this case it's
trying to open the log file but the first block isn't available
(the fix is actually a very small patch to the Namenode).

 2) The file is empty, because nothing was written to the log file
yet. In this case it's trying to open the log file but there's not
even a single block in it, so it fails on EOF.
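The second case matches the stack trace below exactly: SequenceFile.Reader.init() calls DataInputStream.readFully() on the file's 4-byte version header ("SEQ" plus a version byte), and readFully() throws EOFException when the stream has nothing in it. A minimal stdlib-only sketch of that failure mode (no Hadoop involved; the HLog/SequenceFile context is just for illustration):

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;

public class EmptyLogDemo {
    public static void main(String[] args) throws Exception {
        // An HLog that never received a single edit looks like a
        // zero-byte stream to the reader.
        DataInputStream in =
            new DataInputStream(new ByteArrayInputStream(new byte[0]));
        try {
            // readFully() throws EOFException if it can't fill the buffer,
            // which is what SequenceFile.Reader.init() hits on an empty file.
            in.readFully(new byte[4]);
            System.out.println("read a header (won't happen here)");
        } catch (EOFException e) {
            System.out.println("EOFException: empty log, not even one block");
        }
    }
}
```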

J-D

On Tue, Jul 13, 2010 at 2:37 PM, Sebastian Bauer <ad...@ugame.net.pl> wrote:
>  After trying to set up replication I got many of these errors:
>
> 2010-07-13 23:35:26,498 WARN
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Waited
> too long for this file, considering dumping
> 2010-07-13 23:35:26,498 DEBUG
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Unable
> to open a reader, sleeping 100 times 10
> 2010-07-13 23:35:27,111 INFO org.apache.hadoop.hbase.regionserver.Store:
> Completed compaction of 3 file(s) in c of
> CampaignToUsers,43-m_2010_5_750D70A83162FF54389D2CA67ADA0B86,1278610126054.6504d518fb224efe1530e79c198994cd.;
> new
>  storefile is
> hdfs://db2a:50001/hbase/CampaignToUsers/6504d518fb224efe1530e79c198994cd/c/226233377281334567;
> store size is 19.6m
> 2010-07-13 23:35:27,111 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> compaction completed on region
> CampaignToUsers,43-m_2010_5_750D70A83162FF54389D2CA67ADA0B86,1278610126054.6504d518fb224efe1530e79c198994cd.
> in 1sec
> 2010-07-13 23:35:27,111 INFO org.apache.hadoop.hbase.regionserver.HRegion:
> Starting compaction on region
> UsersToCampaign,,1278609821058.ecb7605434967e247ce14d525849495d.
> 2010-07-13 23:35:27,112 DEBUG org.apache.hadoop.hbase.regionserver.Store:
> Compaction size of c: 31.4m; Skipped 0 file(s), size: 0
> 2010-07-13 23:35:27,112 INFO org.apache.hadoop.hbase.regionserver.Store:
> Started compaction of 3 file(s) in c of
> UsersToCampaign,,1278609821058.ecb7605434967e247ce14d525849495d.  into
> hdfs://db2a:50001/hbase/UsersToCampaign/ecb7
> 605434967e247ce14d525849495d/.tmp, seqid=65302505
> 2010-07-13 23:35:27,498 INFO
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening
> log for replication 85.232.237.234%3A60020.1279056880911 at 0
> 2010-07-13 23:35:27,499 WARN
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: test
> Got:
> java.io.EOFException
>        at java.io.DataInputStream.readFully(DataInputStream.java:180)
>        at java.io.DataInputStream.readFully(DataInputStream.java:152)
>        at
> org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1457)
>        at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1435)
>        at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>        at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
>        at
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:51)
>        at
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:103)
>        at
> org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:511)
>        at
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:422)
>        at
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:262)
> 2010-07-13 23:35:27,499 WARN
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Waited
> too long for this file, considering dumping
> 2010-07-13 23:35:27,499 DEBUG
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Unable
> to open a reader, sleeping 100 times 10
> 2010-07-13 23:35:28,499 INFO
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening
> log for replication 85.232.237.234%3A60020.1279056880911 at 0
> 2010-07-13 23:35:28,500 WARN
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: test
> Got:
> java.io.EOFException
>        at java.io.DataInputStream.readFully(DataInputStream.java:180)
>        at java.io.DataInputStream.readFully(DataInputStream.java:152)
>        at
> org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1457)
>        at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1435)
>        at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
>        at
> org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
>        at
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:51)
>        at
> org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:103)
>        at
> org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:511)
>        at
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:422)
>        at
> org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:262)
>
> W dniu 13.07.2010 20:18, Jean-Daniel Cryans pisze:
>>
>> No, but you can use the new mapreduce utility
>> org.apache.hadoop.hbase.mapreduce.CopyTable to copy whole tables
>> between clusters. It's like distcp for HBase.
>>
>> Oh, and looking at the documentation I just realized that I changed
>> the name of the configuration property that enables replication just
>> before committing and forgot to update the package.html file; it's now
>> simply hbase.replication (and it should stay that way). I'll fix that
>> in the scope of HBASE-2808.
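For reference, enabling it in hbase-site.xml would then look something like this (a sketch: the property name comes from the note above, while the boolean value and placement are assumptions for illustration):

```xml
<!-- hbase-site.xml: enable replication (property name per the note above;
     value and placement are assumptions) -->
<property>
  <name>hbase.replication</name>
  <value>true</value>
</property>
```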
>>
>> J-D
>>
>> On Tue, Jul 13, 2010 at 11:12 AM, Sebastian Bauer<ad...@ugame.net.pl>
>>  wrote:
>>>
>>>  I have one more question: can I first create the master, load the
>>> data, and only then connect the slave, i.e. turn on replication on
>>> existing tables that already contain data?
>>>
>>> W dniu 13.07.2010 19:56, Jean-Daniel Cryans pisze:
>>>>>
>>>>> Thanks for the info. Where can I find some documentation? There is
>>>>> info saying that ZooKeeper needs to run in standalone mode; is that
>>>>> true?
>>>>>
>>>> Well, you can run add_peer.rb while the clusters are running, but
>>>> they won't pick up the change live (that part isn't done yet). So if
>>>> you ran the script while a cluster was running, restart it. Also take
>>>> a look at the region server log; it should output something like this
>>>> when starting up:
>>>>
>>>>     LOG.info("This cluster (" + thisCluster + ") is a "
>>>>           + (this.replicationMaster ? "master" : "slave")
>>>>           + " for replication, compared with (" + address + ")");
>>>>
>>>> This will tell you whether you used the right address for ZooKeeper.
>>>> If a region server on the master cluster thinks it's a slave, then
>>>> the addresses are wrong. Also, there's currently no reporting for
>>>> replication, since that part isn't done yet!
>>>>
>>>> For a more in-depth documentation, check out
>>>> https://issues.apache.org/jira/browse/HBASE-2808
>>>>
>>>> Thanks for trying this out, as the author of most of that part of the
>>>> code I'm thrilled!
>>>>
>>>> J-D
>>>>
>>>
>
>