Dharam,
Thank you for testing this out as well. Using Anton's guidance, I've
managed to reproduce the issue by restarting the 2 locators within 1s of
each other (try for sub-second if possible).
Anton described that he did not see this behavior when the interval
between restarting the two locators was more than 2s.
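(A rough sketch of scripting that near-simultaneous restart, assuming both
locators run on one host; the names, directories, and ports are illustrative
and not taken from the actual setup:)
# after a full cluster shutdown, bring both locators back within ~1 second
gfsh -e "start locator --name=locator1 --dir=locator1 --port=10334 --locators=localhost[10334],localhost[10335]" &
gfsh -e "start locator --name=locator2 --dir=locator2 --port=10335 --locators=localhost[10334],localhost[10335]" &
wait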
--Udo
On 6/8/17 09:29, Dharam Thacker wrote:
Hi Anton,
I also tried to reproduce your scenario on my local Ubuntu machine
with Geode 1.1.1, but I was able to restart the cluster safely as
explained below.
host1> start locator1
host2> start locator2
host1> start server1
host2> start server2
host1> stop server1
host2> stop server2
host1> stop locator1
host2> stop locator2
verify all members shutdown well...
host1> start locator2 [Even though I terminated this last in the sequence
above, I am starting it as the first member]
host2> start locator1
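(For reference, a minimal gfsh sketch of the sequence above run on a single
machine; the ports 10334/10335 match the log excerpt below, while the
directory names and other options are illustrative assumptions:)
# initial start-up
gfsh -e "start locator --name=locator1 --dir=locator1 --port=10334 --enable-cluster-configuration=true"
gfsh -e "start locator --name=locator2 --dir=locator2 --port=10335 --locators=localhost[10334]"
gfsh -e "start server --name=server1 --dir=server1 --locators=localhost[10334],localhost[10335]"
gfsh -e "start server --name=server2 --dir=server2 --locators=localhost[10334],localhost[10335]"
# full shutdown: servers first, then locator1, then locator2
gfsh -e "stop server --dir=server1"
gfsh -e "stop server --dir=server2"
gfsh -e "stop locator --dir=locator1"
gfsh -e "stop locator --dir=locator2"
# restart: locator2 (stopped last) first; it may warn about potentially
# stale data while it waits for the other locator
gfsh -e "start locator --name=locator2 --dir=locator2 --port=10335 --locators=localhost[10334],localhost[10335]"
# (more than 10s elapsed between these two starts in the test described here)
gfsh -e "start locator --name=locator1 --dir=locator1 --port=10334 --locators=localhost[10334],localhost[10335]"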
Of course, starting locator2 gave me the same warning as I have
highlighted in red below. Then I waited for more than 10s before starting
the second locator (locator1 had been stopped earlier than locator2).
But as soon as locator1 started, locator2 detected that and started up
the cluster configuration service. The cluster was reformed after that.
*_Logs below for verification:_*
*
[info 2017/06/08 21:47:10.911 IST locator2 <Pooled Message Processor
1> tid=0x2d] Region /_ConfigurationRegion has potentially stale data.
It is waiting for another member to recover the latest data.*
My persistent id:
DiskStore ID: a267d876-40c8-4c85-848a-5a397adb5e5b
Name: locator2
Location:
/192.168.1.12:/home/dharam/Downloads/apache-geode/locator2/ConfigDiskDir_locator2
Members with potentially new data:
[
DiskStore ID: 39d28da8-6b2c-414c-9608-3550219b624d
Name: locator1
Location:
/192.168.1.12:/home/dharam/Downloads/apache-geode/locator1/ConfigDiskDir_locator1
]
Use the "gfsh show missing-disk-stores" command to see all disk
stores that are being waited on by other members.
*[warning 2017/06/08 21:47:45.606 IST locator2 <WAN Locator Discovery
Thread> tid=0x2f] Locator discovery task could not exchange locator
information 192.168.1.12[10335] with localhost[10334] after 6 retry
attempts. Retrying in 10,000 ms.
*
[info 2017/06/08 21:48:02.886 IST locator2 <unicast
receiver,dharam-ThinkPad-Edge-E431-1183> tid=0x1c] received join
request from 192.168.1.12(locator1:10969:locator)<ec>:1025
[info 2017/06/08 21:48:03.187 IST locator2 <Geode Membership View
Creator> tid=0x22] View Creator is processing 1 requests for the next
membership view
[info 2017/06/08 21:48:03.188 IST locator2 <Geode Membership View
Creator> tid=0x22] preparing new view
View[192.168.1.12(locator2:10853:locator)<ec><v0>:1024|1] members:
[192.168.1.12(locator2:10853:locator)<ec><v0>:1024,
192.168.1.12(locator1:10969:locator)<ec><v1>:1025]
failure detection ports: 14001 42428
[info 2017/06/08 21:48:03.221 IST locator2 <Geode Membership View
Creator> tid=0x22] finished waiting for responses to view preparation
[info 2017/06/08 21:48:03.221 IST locator2 <Geode Membership View
Creator> tid=0x22] received new view:
View[192.168.1.12(locator2:10853:locator)<ec><v0>:1024|1] members:
[192.168.1.12(locator2:10853:locator)<ec><v0>:1024,
192.168.1.12(locator1:10969:locator)<ec><v1>:1025]
old view is:
View[192.168.1.12(locator2:10853:locator)<ec><v0>:1024|0] members:
[192.168.1.12(locator2:10853:locator)<ec><v0>:1024]
[info 2017/06/08 21:48:03.222 IST locator2 <Geode Membership View
Creator> tid=0x22] Peer locator received new membership view:
View[192.168.1.12(locator2:10853:locator)<ec><v0>:1024|1] members:
[192.168.1.12(locator2:10853:locator)<ec><v0>:1024,
192.168.1.12(locator1:10969:locator)<ec><v1>:1025]
[info 2017/06/08 21:48:03.228 IST locator2 <Geode Membership View
Creator> tid=0x22] sending new view
View[192.168.1.12(locator2:10853:locator)<ec><v0>:1024|1] members:
[192.168.1.12(locator2:10853:locator)<ec><v0>:1024,
192.168.1.12(locator1:10969:locator)<ec><v1>:1025]
failure detection ports: 14001 42428
[info 2017/06/08 21:48:03.232 IST locator2 <View Message Processor>
tid=0x48] Membership: Processing addition <
192.168.1.12(locator1:10969:locator)<ec><v1>:1025 >
[info 2017/06/08 21:48:03.233 IST locator2 <View Message Processor>
tid=0x48] Admitting member
<192.168.1.12(locator1:10969:locator)<ec><v1>:1025>. Now there are 2
non-admin member(s).
[info 2017/06/08 21:48:03.242 IST locator2 <pool-3-thread-1> tid=0x4a]
Initializing region _monitoringRegion_192.168.1.12<v1>1025
[info 2017/06/08 21:48:03.275 IST locator2 <Pooled High Priority
Message Processor 1> tid=0x4e] Member
192.168.1.12(locator1:10969:locator)<ec><v1>:1025 is equivalent or in
the same redundancy zone.
[info 2017/06/08 21:48:03.326 IST locator2 <pool-3-thread-1> tid=0x4a]
Initialization of region _monitoringRegion_192.168.1.12<v1>1025 completed
[info 2017/06/08 21:48:03.336 IST locator2 <pool-3-thread-1> tid=0x4a]
Initializing region _notificationRegion_192.168.1.12<v1>1025
[info 2017/06/08 21:48:03.338 IST locator2 <pool-3-thread-1> tid=0x4a]
Initialization of region _notificationRegion_192.168.1.12<v1>1025
completed
[info 2017/06/08 21:48:04.611 IST locator2 <Pooled Message Processor
1> tid=0x2d] Region _ConfigurationRegion requesting initial image from
192.168.1.12(locator1:10969:locator)<ec><v1>:1025
[info 2017/06/08 21:48:04.615 IST locator2 <Pooled Message Processor
1> tid=0x2d] _ConfigurationRegion is done getting image from
192.168.1.12(locator1:10969:locator)<ec><v1>:1025. isDeltaGII is true
[info 2017/06/08 21:48:04.616 IST locator2 <Pooled Message Processor
1> tid=0x2d] Region _ConfigurationRegion initialized persistent id:
/192.168.1.12:/home/dharam/Downloads/apache-geode/locator2/ConfigDiskDir_locator2
created at timestamp 1496938615755 version 0 diskStoreId
a267d87640c84c85-848a5a397adb5e5b name locator2 with data from
192.168.1.12(locator1:10969:locator)<ec><v1>:1025.
[info 2017/06/08 21:48:04.617 IST locator2 <Pooled Message Processor
1> tid=0x2d] Initialization of region _ConfigurationRegion completed
[info 2017/06/08 21:48:04.637 IST locator2 <Pooled Message Processor
1> tid=0x2d] ConfigRequestHandler installed
*[info 2017/06/08 21:48:04.637 IST locator2 <Pooled Message Processor
1> tid=0x2d] Cluster configuration service start up completed
successfully and is now running ....*
[info 2017/06/08 21:48:05.692 IST locator2 <WAN Locator Discovery
Thread> tid=0x2f] Locator discovery task exchanged locator information
192.168.1.12[10335] with localhost[10334]: {-1=[192.168.1.12[10335],
192.168.1.12[10334]]}.
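(To double-check that the cluster re-formed and the cluster configuration
service is running, a gfsh session along these lines can be used; the locator
address is the one from the log above:)
gfsh> connect --locator=192.168.1.12[10334]
gfsh> list members
gfsh> status cluster-config-service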
Thanks,
Dharam
- Dharam Thacker
On Thu, Jun 8, 2017 at 9:25 PM, Bruce Schuchardt
<[email protected]> wrote:
The locator view file exists to allow locators to be bounced
without shutting down the rest of the cluster. On startup a
locator will try to find the current membership coordinator of the
cluster from an existing locator and join the system using that
information. If there is no existing locator that knows who the
coordinator might be then the new locator will try to find the
coordinator using the membership "view" that is stored in the view
file. If there is no view file the locator will not be able to
join the existing cluster.
If you've done a full shutdown of the cluster it is safe to delete
the locator*view.dat files.
When there is no .dat file the locators will use a
concurrent-startup algorithm to form a unified system.
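(A minimal sketch of the clean-up Bruce describes, assuming the locators'
working directories are named locator1 and locator2; the paths are
illustrative:)
# safe ONLY after a full shutdown of the whole cluster
rm locator1/locator*view.dat
rm locator2/locator*view.dat
# on the next start the locators fall back to the concurrent-startup
# algorithm and form a fresh membership view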
On 6/8/17 7:48 AM, Anton Mironenko wrote:
Hello,
We found out that if we delete “locator*view.dat” before starting
a locator, it fixes the first part of the issue
https://issues.apache.org/jira/browse/GEODE-3003
“Geode doesn't start after cluster restart when using
cluster-configuration”
“The second start goes wrong: the locator on the first host
always doesn't join the rest of the cluster with the error in the
locator log:
"Region /_ConfigurationRegion has potentially stale data. It is
waiting for another member to recover the latest data."”
What is a side effect of deleting the file
"locator0/locator*view.dat"? What functionality do we lose?
A use case with some example would be great.
Anton Mironenko