Re: Replication

Sebastian Bauer Wed, 14 Jul 2010 03:35:38 -0700

So replication is working, but after the Hadoop update I see many of these on the slave:

2010-07-14 12:30:51,941 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: HDFS pipeline error detected. Found 1 replicas but expecting 3 replicas. Requesting close of hlog.
2010-07-14 12:30:51,955 INFO org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs -- HDFS-200
2010-07-14 12:30:51,955 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/db2b.goldenline.pl,60020,1279102137010/85.232.237.235%3A60020.1279103451914, entries=1, filesize=555. New hlog /hbase/.logs/db2b.goldenline.pl,60020,1279102137010/85.232.237.235%3A60020.1279103451944
2010-07-14 12:30:51,957 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: HDFS pipeline error detected. Found 1 replicas but expecting 3 replicas. Requesting close of hlog.
2010-07-14 12:30:51,966 INFO org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs -- HDFS-200
2010-07-14 12:30:51,967 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/db2b.goldenline.pl,60020,1279102137010/85.232.237.235%3A60020.1279103451944, entries=1, filesize=1195. New hlog /hbase/.logs/db2b.goldenline.pl,60020,1279102137010/85.232.237.235%3A60020.1279103451959



and something like this on the master:

2010-07-14 12:25:10,939 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: HDFS pipeline error detected. Found 1 replicas but expecting 3 replicas. Requesting close of hlog.
2010-07-14 12:25:10,940 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: HDFS pipeline error detected. Found 1 replicas but expecting 3 replicas. Requesting close of hlog.
2010-07-14 12:25:10,940 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: HDFS pipeline error detected. Found 1 replicas but expecting 3 replicas. Requesting close of hlog.
2010-07-14 12:25:11,399 INFO org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogWriter: Using syncFs -- HDFS-200
2010-07-14 12:25:11,400 INFO org.apache.hadoop.hbase.regionserver.wal.HLog: Roll /hbase/.logs/db2a.goldenline.pl,60020,1279102568601/85.232.237.234%3A60020.1279103110860, entries=81, filesize=22075. New hlog /hbase/.logs/db2a.goldenline.pl,60020,1279102568601/85.232.237.234%3A60020.1279103111379
2010-07-14 12:25:11,451 DEBUG org.apache.hadoop.hbase.zookeeper.ZooKeeperWrapper: <db2a:/hbase,db2a.goldenline.pl,60020,1279102568601>Created /hbase/replication/rs/db2a.goldenline.pl,60020,1279102568601/test/85.232.237.234%3A60020.1279103111379 with data
2010-07-14 12:25:11,454 WARN org.apache.hadoop.hbase.regionserver.wal.HLog: HDFS pipeline error detected. Found 1 replicas but expecting 3 replicas. Requesting close of hlog.



On 14.07.2010 00:08, Jean-Daniel Cryans wrote:

Just looked at the head of 0.20-append and I see it contains the
missing patch (was committed as part of HDFS-1057).

So that would mean that the file is just empty :) If you insert a few
rows in the shell on the master cluster, do you see them some seconds
later on the slave?

J-D

On Tue, Jul 13, 2010 at 3:01 PM, Sebastian Bauer<[email protected]>  wrote:

  On 13.07.2010 23:50, Jean-Daniel Cryans wrote:

Yeah, using an experimental feature can be "odd" :D

I love bleeding edge technologies :D

So one of the following is happening:

  1) You aren't using a version of hadoop patched enough to get
replication working fully. Trunk uses a special jar that I patched
myself. CDH3b2 also has everything needed. What this means is that
it's trying to open the log file but the first block isn't available
(it's actually a very small patch for the Namenode).

I'm using the Hadoop build from the 0.20-append branch that was released with hbase-0.89.xxxx

  2) The file is empty, because nothing was written to the log file.
What this means is that it's trying to open the log file but there's
not even a single block in it, so it fails on EOF.

This could well be the cause, because when everything is running I see fewer
of these traces.
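To illustrate case 2 with plain JDK classes only: the stack trace further down ends in DataInputStream.readFully, which throws EOFException when the stream holds fewer bytes than requested. The sketch below is not HBase or Hadoop code; the class name, the helper and the 4-byte header length are assumptions made for the example.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;

// Hypothetical demo class; it only mimics "read a fixed-size header first".
class EmptyLogHeaderDemo {
    // Assumed header size for illustration (3 magic bytes plus 1 version byte).
    static final int HEADER_LENGTH = 4;

    /** Returns true if HEADER_LENGTH bytes can be read from the given contents. */
    static boolean canReadHeader(byte[] fileContents) {
        try (DataInputStream in =
                 new DataInputStream(new ByteArrayInputStream(fileContents))) {
            // readFully throws EOFException if the stream ends before the buffer is full.
            in.readFully(new byte[HEADER_LENGTH]);
            return true;
        } catch (EOFException e) {
            // Same exception type as in the ReplicationSource trace: nothing to read yet.
            return false;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("empty file readable: " + canReadHeader(new byte[0]));
        System.out.println("file with header readable: "
            + canReadHeader(new byte[] {'S', 'E', 'Q', 6}));
    }
}
```

An empty byte array stands in for a log that has been created but not written to; once at least a header's worth of bytes exists, the read succeeds.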

J-D

Thanks for your help :)

On Tue, Jul 13, 2010 at 2:37 PM, Sebastian Bauer<[email protected]>
  wrote:

  After trying to set up replication I got many of these errors:

2010-07-13 23:35:26,498 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Waited too long for this file, considering dumping
2010-07-13 23:35:26,498 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Unable to open a reader, sleeping 100 times 10
2010-07-13 23:35:27,111 INFO org.apache.hadoop.hbase.regionserver.Store: Completed compaction of 3 file(s) in c of CampaignToUsers,43-m_2010_5_750D70A83162FF54389D2CA67ADA0B86,1278610126054.6504d518fb224efe1530e79c198994cd.; new storefile is hdfs://db2a:50001/hbase/CampaignToUsers/6504d518fb224efe1530e79c198994cd/c/226233377281334567; store size is 19.6m
2010-07-13 23:35:27,111 INFO org.apache.hadoop.hbase.regionserver.HRegion: compaction completed on region CampaignToUsers,43-m_2010_5_750D70A83162FF54389D2CA67ADA0B86,1278610126054.6504d518fb224efe1530e79c198994cd. in 1sec
2010-07-13 23:35:27,111 INFO org.apache.hadoop.hbase.regionserver.HRegion: Starting compaction on region UsersToCampaign,,1278609821058.ecb7605434967e247ce14d525849495d.
2010-07-13 23:35:27,112 DEBUG org.apache.hadoop.hbase.regionserver.Store: Compaction size of c: 31.4m; Skipped 0 file(s), size: 0
2010-07-13 23:35:27,112 INFO org.apache.hadoop.hbase.regionserver.Store: Started compaction of 3 file(s) in c of UsersToCampaign,,1278609821058.ecb7605434967e247ce14d525849495d.  into hdfs://db2a:50001/hbase/UsersToCampaign/ecb7605434967e247ce14d525849495d/.tmp, seqid=65302505
2010-07-13 23:35:27,498 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication 85.232.237.234%3A60020.1279056880911 at 0
2010-07-13 23:35:27,499 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: test Got:
java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at java.io.DataInputStream.readFully(DataInputStream.java:152)
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1457)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1435)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:51)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:103)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:511)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:422)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:262)
2010-07-13 23:35:27,499 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Waited too long for this file, considering dumping
2010-07-13 23:35:27,499 DEBUG org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Unable to open a reader, sleeping 100 times 10
2010-07-13 23:35:28,499 INFO org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: Opening log for replication 85.232.237.234%3A60020.1279056880911 at 0
2010-07-13 23:35:28,500 WARN org.apache.hadoop.hbase.replication.regionserver.ReplicationSource: test Got:
java.io.EOFException
        at java.io.DataInputStream.readFully(DataInputStream.java:180)
        at java.io.DataInputStream.readFully(DataInputStream.java:152)
        at org.apache.hadoop.io.SequenceFile$Reader.init(SequenceFile.java:1457)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1435)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1424)
        at org.apache.hadoop.io.SequenceFile$Reader.<init>(SequenceFile.java:1419)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader$WALReader.<init>(SequenceFileLogReader.java:51)
        at org.apache.hadoop.hbase.regionserver.wal.SequenceFileLogReader.init(SequenceFileLogReader.java:103)
        at org.apache.hadoop.hbase.regionserver.wal.HLog.getReader(HLog.java:511)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.openReader(ReplicationSource.java:422)
        at org.apache.hadoop.hbase.replication.regionserver.ReplicationSource.run(ReplicationSource.java:262)

On 13.07.2010 20:18, Jean-Daniel Cryans wrote:

No, but you can use the new mapreduce utility
org.apache.hadoop.hbase.mapreduce.CopyTable to copy whole tables
between clusters. It's like distcp for HBase.

Oh, and looking at the documentation I just realized that I changed the
name of the configuration property that enables replication just before
committing and forgot to update the package.html file. It's now simply
hbase.replication (and it should stay like that). I'll fix that in the
scope of HBASE-2808.

J-D

On Tue, Jul 13, 2010 at 11:12 AM, Sebastian Bauer<[email protected]>
  wrote:

  I have one more question: can I first create the master and, after loading
data, connect the slave, or turn on replication on existing tables that
already contain data?

On 13.07.2010 19:56, Jean-Daniel Cryans wrote:

Thanks for the info on where I can find some documentation. It says that
ZooKeeper needs to run in standalone mode; is that true?

Well, you can run add_peer.rb while the clusters are running, but they
won't pick up the change live (that part isn't done yet). So if you run
the script while the cluster is running, restart it. Also take a look
at the region server log, it should output something like this when
starting up:

     LOG.info("This cluster (" + thisCluster + ") is a "
           + (this.replicationMaster ? "master" : "slave") + " for replication"
           + ", compared with (" + address + ")");

This will tell you if you used the right address for ZooKeeper. If
your region server on the master cluster thinks it's a slave, then the
addresses are wrong. Also, there's currently no reporting for
replication, since it's not done yet!
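To make it easier to spot that line in a region server log, here is a small standalone sketch that assembles the same message from the LOG.info call quoted above. The class name, the method and the cluster names passed in main are hypothetical placeholders; how HBase actually decides the value of replicationMaster is not shown in this thread and is not modeled here.

```java
// Hypothetical helper; it only reproduces the message format quoted above.
class ReplicationRoleMessage {
    /** Builds the startup message for the given (placeholder) addresses and role. */
    static String describe(String thisCluster, boolean replicationMaster, String address) {
        return "This cluster (" + thisCluster + ") is a "
            + (replicationMaster ? "master" : "slave") + " for replication"
            + ", compared with (" + address + ")";
    }

    public static void main(String[] args) {
        // "cluster-a" and "cluster-b" are made-up names, not real address formats.
        System.out.println(describe("cluster-a", true, "cluster-b"));
        System.out.println(describe("cluster-a", false, "cluster-b"));
    }
}
```

Searching the log for "for replication, compared with" should find the line; on the master cluster the word before it should be "master".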

For more in-depth documentation, check out
https://issues.apache.org/jira/browse/HBASE-2808

Thanks for trying this out, as the author of most of that part of the
code I'm thrilled!

J-D
