Re: Problems with accumulo replication

2018-01-02 Thread Josh Elser

That sounds like a problem :)

The normal case is that you have two ZooKeeper instances, one for the 
primary and one for the peer. As such, it stands to reason that the ZK quorum 
for your primary would not contain instance information for the peer 
Accumulo instance. What is the Accumulo instance name of the cluster 
that is the "peer"? Remember that the "logical" name you give the 
cluster to uniquely identify it for replication is different from the 
Accumulo instance name.
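For illustration, a minimal Python sketch of how the two names relate in a peer definition. It assumes the value layout shown in the replication section of the user manual, `<ReplicaSystem class>,<peer instance name>,<peer ZooKeeper quorum>`, with everything after the second comma treated as the quorum; the key suffix (`peer1` here) is only the logical peer name. `parse_peer`, `PeerDefinition`, and the instance name and address in the example are hypothetical.

```python
from typing import NamedTuple

PREFIX = "replication.peer."


class PeerDefinition(NamedTuple):
    logical_name: str    # suffix of the property key; only meaningful to replication
    replica_system: str  # ReplicaSystem implementation class
    instance_name: str   # the peer cluster's real Accumulo instance name
    zookeepers: str      # the peer's ZooKeeper quorum (may contain commas)


def parse_peer(key: str, value: str) -> PeerDefinition:
    """Split a replication.peer.<name> property (layout assumed from the manual)."""
    if not key.startswith(PREFIX):
        raise ValueError(f"not a replication peer property: {key}")
    logical_name = key[len(PREFIX):]
    # Split at most twice: the quorum itself is a comma-separated host list.
    parts = [p.strip() for p in value.split(",", 2)]
    if len(parts) != 3 or not all(parts):
        raise ValueError(f"expected 'class,instance,zookeepers', got: {value!r}")
    replica_system, instance_name, zookeepers = parts
    return PeerDefinition(logical_name, replica_system, instance_name, zookeepers)


if __name__ == "__main__":
    peer = parse_peer(
        "replication.peer.peer1",
        "org.apache.accumulo.tserver.replication.AccumuloReplicaSystem,"
        "accumulo_peer,10.0.0.2:2181",
    )
    # The logical name and the instance name are separate fields.
    print(peer.logical_name, peer.instance_name, peer.zookeepers)
```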


I would double-check your Accumulo configuration, notably the ZooKeeper 
quorum configured for the replication peer. You can do this from the 
Accumulo shell with something like `config -f replication`.
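A rough sketch of that cross-check in Python: take the replication properties (for example, copied by hand out of `config -f replication`) and confirm that each peer's instance name appears among the instances that ListInstances reports when run against that peer's ZooKeeper quorum. The input dictionaries and `check_peers` are hypothetical; the `class,instance,quorum` value layout is assumed from the user manual, and keys such as `replication.peer.user.<name>` are skipped on the assumption that logical peer names contain no dots.

```python
from __future__ import annotations

PEER_PREFIX = "replication.peer."


def check_peers(properties: dict[str, str],
                instances_by_quorum: dict[str, set[str]]) -> list[str]:
    """Return human-readable problems with replication peer definitions.

    properties: property name -> value, e.g. copied from `config -f replication`.
    instances_by_quorum: ZooKeeper quorum -> instance names that ListInstances
    reported when run against that quorum (gathered by hand; hypothetical input).
    """
    problems = []
    for key, value in sorted(properties.items()):
        if not key.startswith(PEER_PREFIX):
            continue  # e.g. replication.name
        peer = key[len(PEER_PREFIX):]
        if "." in peer:
            continue  # replication.peer.user.<name>, replication.peer.password.<name>
        parts = [p.strip() for p in value.split(",", 2)]
        if len(parts) != 3:
            problems.append(f"{peer}: malformed value {value!r}")
            continue
        _, instance, quorum = parts
        known = instances_by_quorum.get(quorum)
        if known is None:
            problems.append(f"{peer}: no instance list collected for quorum {quorum}")
        elif instance not in known:
            problems.append(f"{peer}: instance {instance!r} not found in ZooKeeper at {quorum}")
    return problems


if __name__ == "__main__":
    props = {
        "replication.name": "primary",
        "replication.peer.peer1":
            "org.apache.accumulo.tserver.replication.AccumuloReplicaSystem,peer1,10.0.0.2",
        "replication.peer.user.peer1": "peer",
    }
    for problem in check_peers(props, {"10.0.0.2": {"accumulo_peer"}}):
        print(problem)
```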


You can check the docs for a refresher on this: 
https://accumulo.apache.org/1.8/accumulo_user_manual.html#_instance_configuration


Re: Problems with accumulo replication

2018-01-02 Thread vLex Systems
Could it be the second problem?

I'm seeing exceptions like this one in the tablet server logs:
2018-01-02 09:09:23,481 [zookeeper.DistributedWorkQueue] WARN : Failed to process work b1bba0c2-dde2-42d4-8c10-ef51d13448ca|peer1|4l|6
java.lang.RuntimeException: Instance name peer1 does not exist in zookeeper.  Run "accumulo org.apache.accumulo.server.util.ListInstances" to see a list.

When I run "accumulo org.apache.accumulo.server.util.ListInstances", it
only lists the primary Accumulo instance.
Could the problem be the ZooKeeper quorum I used when I registered
the peer instance? I used the IP of the peer as the only host in the
ZooKeeper quorum value.
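One way to read that warning, sketched in Python under an assumption: the work entry appears to be `<WAL file>|<peer name>|<remote table id>|<source table id>`, a layout inferred from the log line above rather than confirmed against the Accumulo source. The sketch pulls out the peer name and the instance name from the exception so they can be compared; `parse_work_key`, `WorkItem`, and `missing_instance` are hypothetical helpers.

```python
import re
from typing import NamedTuple, Optional


class WorkItem(NamedTuple):
    wal_file: str         # assumed: name of the WAL being replicated
    peer_name: str        # assumed: logical peer name (replication.peer.<name>)
    remote_table_id: str  # assumed: table id on the peer
    source_table_id: str  # assumed: table id on the primary


def parse_work_key(key: str) -> WorkItem:
    """Split a work entry such as 'uuid|peer1|4l|6' (layout is an assumption)."""
    parts = key.strip().split("|")
    if len(parts) != 4:
        raise ValueError(f"unexpected work key: {key!r}")
    return WorkItem(*parts)


# \s+ tolerates the line wrap that mail clients insert inside the message.
_MISSING = re.compile(r"Instance name (\S+) does not exist in\s+zookeeper")


def missing_instance(log_text: str) -> Optional[str]:
    """Return the instance name from an 'Instance name X does not exist' error."""
    match = _MISSING.search(log_text)
    return match.group(1) if match else None


if __name__ == "__main__":
    work = parse_work_key("b1bba0c2-dde2-42d4-8c10-ef51d13448ca|peer1|4l|6")
    error = ('java.lang.RuntimeException: Instance name peer1 does not exist in\n'
             'zookeeper.  Run "accumulo')
    print(work.peer_name, missing_instance(error))
```

Here both come out as `peer1`, which suggests checking whether `peer1` is really the peer's Accumulo instance name rather than only its logical replication name, and whether the configured quorum points at the peer's ZooKeeper.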




RE: Problems with accumulo replication

2017-12-29 Thread dlmarion
You can also use the tserver.walog.max.age property to ensure that WALs 
roll even when there is no activity. The property defaults to 24h and was 
backported to 1.7.2. See ACCUMULO-4004 for more info.
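A small Python sketch of what the property implies for a WAL that sees no writes. It assumes a duration string with an explicit `ms`, `s`, `m`, `h`, or `d` suffix and simply compares the WAL's age with the limit; `parse_duration` and `wal_would_roll` are hypothetical helpers and do not read any real Accumulo state.

```python
import re
from datetime import datetime, timedelta

# Assumed suffixes for a time-typed property value; a bare number is rejected
# here rather than guessing a default unit.
_MILLIS_PER_UNIT = {"ms": 1, "s": 1_000, "m": 60_000, "h": 3_600_000, "d": 86_400_000}
_DURATION = re.compile(r"^\s*(\d+)(ms|s|m|h|d)\s*$")


def parse_duration(text: str) -> timedelta:
    match = _DURATION.match(text)
    if not match:
        raise ValueError(f"unsupported duration: {text!r}")
    amount, unit = int(match.group(1)), match.group(2)
    return timedelta(milliseconds=amount * _MILLIS_PER_UNIT[unit])


def wal_would_roll(opened_at: datetime, now: datetime, max_age: str = "24h") -> bool:
    """True if a WAL opened at `opened_at` has reached the configured max age."""
    return now - opened_at >= parse_duration(max_age)


if __name__ == "__main__":
    opened = datetime(2017, 12, 29, 8, 24)
    for hours in (1, 23, 24, 30):
        print(hours, wal_would_roll(opened, opened + timedelta(hours=hours)))
```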

Re: Problems with accumulo replication

2017-12-29 Thread Josh Elser
If the system is reporting files that need to be replicated, it's 
probably one of two problems:

* The WALs are still in use by the TabletServers. In the current 
implementation, a WAL is not replicated until no TabletServer references 
it any longer. That happens either once enough data has been written for 
the WAL to roll over, or when the TabletServer is restarted. You can 
investigate either of these.
* The replication is trying to happen but failing. You can look at the 
TabletServer logs on the primary instance to see if there are any 
reported exceptions around sending the data to the peer.
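A toy Python model of the first point, to show why a small test insert can leave a file counted as "needing replication": the open WAL is only released once it fills up or the TabletServer restarts. This is a simplification under stated assumptions (a single open WAL per server, a size-based roll, a restart always releasing the current WAL), not Accumulo's implementation; `TabletServer` and its fields are hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TabletServer:
    """Toy model: one open WAL per server, rolled by size or by restart."""
    host: str
    max_wal_bytes: int                # hypothetical size limit that triggers a roll
    wal_seq: int = 0
    open_bytes: int = 0
    released: list[str] = field(default_factory=list)  # WALs no longer referenced

    @property
    def current_wal(self) -> str:
        return f"{self.host}-wal-{self.wal_seq}"

    def _roll(self) -> None:
        # The old WAL stops being referenced; only now is it a replication candidate.
        self.released.append(self.current_wal)
        self.wal_seq += 1
        self.open_bytes = 0

    def write(self, nbytes: int) -> None:
        self.open_bytes += nbytes
        if self.open_bytes >= self.max_wal_bytes:
            self._roll()

    def restart(self) -> None:
        # Simplification: a restart always releases the WAL that was open.
        self._roll()


if __name__ == "__main__":
    ts = TabletServer("ts1", max_wal_bytes=1_000)
    ts.write(10)         # a small test insert
    print(ts.released)   # [] : the file is still in use, so it is not replicated
    ts.restart()
    print(ts.released)   # ['ts1-wal-0'] : now eligible for replication
```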


Problems with accumulo replication

2017-12-29 Thread vLex Systems
Hi,

I've configured replication between two Accumulo instances: one is
the primary and the other is a peer created by restoring a backup of
the primary.

I've followed the instructions in the manual
(https://accumulo.apache.org/1.7/accumulo_user_manual#_replication)
and I can see the 4 tables I've configured to replicate in the
Accumulo Monitor but they do not replicate. They have 1 or 2 "Files
needing replication" and this number never decreases.

I've also tried inserting data in one of the tables and the data does
not replicate to the Accumulo peer instance.

In the master log I see many entries like this one:
2017-12-29 13:22:25,490 [replication.RemoveCompleteReplicationRecords] INFO : Removed 0 complete replication entries from the table accumulo.replication

Does anyone know what could be happening?

Thanks.
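A hedged diagnostic sketch in Python: scan the master log for the `RemoveCompleteReplicationRecords` message and check whether every pass removed zero entries. A run of zeros only says that nothing has finished replicating yet, which matches the symptom described; it does not identify the cause. `removed_counts` and `replication_stalled` are hypothetical helpers, and the regular expression assumes the message wording shown in the log excerpt.

```python
import re
from typing import List

# Tolerates wrapped lines by allowing any whitespace between words.
_REMOVED = re.compile(r"Removed\s+(\d+)\s+complete\s+replication\s+entries")


def removed_counts(log_text: str) -> List[int]:
    """All 'Removed N complete replication entries' counts, in log order."""
    return [int(n) for n in _REMOVED.findall(log_text)]


def replication_stalled(log_text: str) -> bool:
    """True if at least one such message exists and every one reports 0."""
    counts = removed_counts(log_text)
    return bool(counts) and all(c == 0 for c in counts)


if __name__ == "__main__":
    sample = (
        "2017-12-29 13:22:25,490 [replication.RemoveCompleteReplicationRecords]\n"
        "INFO : Removed 0 complete replication entries from the table\n"
        "accumulo.replication\n"
    )
    print(removed_counts(sample * 3), replication_stalled(sample * 3))
```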