Hi Anton,

I also tried to reproduce your scenario on my local Ubuntu machine with
Geode 1.1.1, but I was able to restart the cluster safely, as explained below.

host1> start locator1

host2> start locator2

host1> start server1

host2> start server2

host1> stop server1
host2> stop server2

host1> stop locator1
host2> stop locator2

verify that all members shut down cleanly...


host1> start locator2 [even though I stopped this member last in the
sequence above, I am starting it as the first member]
host2> start locator1
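
For reference, the shorthand above maps roughly to the following gfsh
commands (member names, ports, and the cluster-configuration flag here are
illustrative assumptions, not my exact invocations):

  host1$ gfsh start locator --name=locator1 --port=10334 --enable-cluster-configuration=true
  host2$ gfsh start locator --name=locator2 --port=10335 --locators=host1[10334]
  host1$ gfsh start server --name=server1 --locators=host1[10334],host2[10335]
  host2$ gfsh start server --name=server2 --locators=host1[10334],host2[10335]

Stopping is the reverse, e.g. "gfsh stop server --dir=server1" and
"gfsh stop locator --dir=locator1" from each member's working directory.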

As expected, starting locator2 gave me the same warning I have highlighted
in red below. I then waited more than 10 seconds before starting the second
locator (locator1, which had been stopped earlier than locator2).

But as soon as locator1 started, locator2 detected it and started the
cluster configuration service. The cluster was re-formed after that.
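
If the member holding the newer data can never be restarted (for example,
its disk files are lost), the wait can be broken manually from gfsh; a
sketch (the disk-store ID below is the one from my log, and yours will
differ — note that revoking discards that member's potentially newer copy):

  gfsh> connect --locator=192.168.1.12[10334]
  gfsh> show missing-disk-stores
  gfsh> revoke missing-disk-store --id=39d28da8-6b2c-414c-9608-3550219b624d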

*Logs below for verification:*


*[info 2017/06/08 21:47:10.911 IST locator2 <Pooled Message Processor 1>
tid=0x2d] Region /_ConfigurationRegion has potentially stale data. It is
waiting for another member to recover the latest data.*
  My persistent id:

    DiskStore ID: a267d876-40c8-4c85-848a-5a397adb5e5b
    Name: locator2
    Location: /192.168.1.12:
/home/dharam/Downloads/apache-geode/locator2/ConfigDiskDir_locator2

  Members with potentially new data:
  [
    DiskStore ID: 39d28da8-6b2c-414c-9608-3550219b624d
    Name: locator1
    Location: /192.168.1.12:
/home/dharam/Downloads/apache-geode/locator1/ConfigDiskDir_locator1
  ]
  Use the "gfsh show missing-disk-stores" command to see all disk stores
that are being waited on by other members.


*[warning 2017/06/08 21:47:45.606 IST locator2 <WAN Locator Discovery
Thread> tid=0x2f] Locator discovery task could not exchange locator
information 192.168.1.12[10335] with localhost[10334] after 6 retry
attempts. Retrying in 10,000 ms.*
[info 2017/06/08 21:48:02.886 IST locator2 <unicast
receiver,dharam-ThinkPad-Edge-E431-1183> tid=0x1c] received join request
from 192.168.1.12(locator1:10969:locator)<ec>:1025

[info 2017/06/08 21:48:03.187 IST locator2 <Geode Membership View Creator>
tid=0x22] View Creator is processing 1 requests for the next membership view

[info 2017/06/08 21:48:03.188 IST locator2 <Geode Membership View Creator>
tid=0x22] preparing new view
View[192.168.1.12(locator2:10853:locator)<ec><v0>:1024|1] members:
[192.168.1.12(locator2:10853:locator)<ec><v0>:1024,
192.168.1.12(locator1:10969:locator)<ec><v1>:1025]
  failure detection ports: 14001 42428

[info 2017/06/08 21:48:03.221 IST locator2 <Geode Membership View Creator>
tid=0x22] finished waiting for responses to view preparation

[info 2017/06/08 21:48:03.221 IST locator2 <Geode Membership View Creator>
tid=0x22] received new view:
View[192.168.1.12(locator2:10853:locator)<ec><v0>:1024|1] members:
[192.168.1.12(locator2:10853:locator)<ec><v0>:1024,
192.168.1.12(locator1:10969:locator)<ec><v1>:1025]
  old view is: View[192.168.1.12(locator2:10853:locator)<ec><v0>:1024|0]
members: [192.168.1.12(locator2:10853:locator)<ec><v0>:1024]

[info 2017/06/08 21:48:03.222 IST locator2 <Geode Membership View Creator>
tid=0x22] Peer locator received new membership view:
View[192.168.1.12(locator2:10853:locator)<ec><v0>:1024|1] members:
[192.168.1.12(locator2:10853:locator)<ec><v0>:1024,
192.168.1.12(locator1:10969:locator)<ec><v1>:1025]

[info 2017/06/08 21:48:03.228 IST locator2 <Geode Membership View Creator>
tid=0x22] sending new view
View[192.168.1.12(locator2:10853:locator)<ec><v0>:1024|1] members:
[192.168.1.12(locator2:10853:locator)<ec><v0>:1024,
192.168.1.12(locator1:10969:locator)<ec><v1>:1025]
  failure detection ports: 14001 42428

[info 2017/06/08 21:48:03.232 IST locator2 <View Message Processor>
tid=0x48] Membership: Processing addition <
192.168.1.12(locator1:10969:locator)<ec><v1>:1025 >

[info 2017/06/08 21:48:03.233 IST locator2 <View Message Processor>
tid=0x48] Admitting member
<192.168.1.12(locator1:10969:locator)<ec><v1>:1025>. Now there are 2
non-admin member(s).

[info 2017/06/08 21:48:03.242 IST locator2 <pool-3-thread-1> tid=0x4a]
Initializing region _monitoringRegion_192.168.1.12<v1>1025

[info 2017/06/08 21:48:03.275 IST locator2 <Pooled High Priority Message
Processor 1> tid=0x4e] Member
192.168.1.12(locator1:10969:locator)<ec><v1>:1025 is equivalent or in the
same redundancy zone.

[info 2017/06/08 21:48:03.326 IST locator2 <pool-3-thread-1> tid=0x4a]
Initialization of region _monitoringRegion_192.168.1.12<v1>1025 completed

[info 2017/06/08 21:48:03.336 IST locator2 <pool-3-thread-1> tid=0x4a]
Initializing region _notificationRegion_192.168.1.12<v1>1025

[info 2017/06/08 21:48:03.338 IST locator2 <pool-3-thread-1> tid=0x4a]
Initialization of region _notificationRegion_192.168.1.12<v1>1025 completed

[info 2017/06/08 21:48:04.611 IST locator2 <Pooled Message Processor 1>
tid=0x2d] Region _ConfigurationRegion requesting initial image from
192.168.1.12(locator1:10969:locator)<ec><v1>:1025

[info 2017/06/08 21:48:04.615 IST locator2 <Pooled Message Processor 1>
tid=0x2d] _ConfigurationRegion is done getting image from
192.168.1.12(locator1:10969:locator)<ec><v1>:1025. isDeltaGII is true

[info 2017/06/08 21:48:04.616 IST locator2 <Pooled Message Processor 1>
tid=0x2d] Region _ConfigurationRegion initialized persistent id:
/192.168.1.12:/home/dharam/Downloads/apache-geode/locator2/ConfigDiskDir_locator2
created at timestamp 1496938615755 version 0 diskStoreId
a267d87640c84c85-848a5a397adb5e5b name locator2 with data from
192.168.1.12(locator1:10969:locator)<ec><v1>:1025.

[info 2017/06/08 21:48:04.617 IST locator2 <Pooled Message Processor 1>
tid=0x2d] Initialization of region _ConfigurationRegion completed

[info 2017/06/08 21:48:04.637 IST locator2 <Pooled Message Processor 1>
tid=0x2d] ConfigRequestHandler installed

*[info 2017/06/08 21:48:04.637 IST locator2 <Pooled Message Processor 1>
tid=0x2d] Cluster configuration service start up completed successfully and
is now running ....*

[info 2017/06/08 21:48:05.692 IST locator2 <WAN Locator Discovery Thread>
tid=0x2f] Locator discovery task exchanged locator information
192.168.1.12[10335] with localhost[10334]: {-1=[192.168.1.12[10335],
192.168.1.12[10334]]}.

Thanks,
Dharam

On Thu, Jun 8, 2017 at 9:25 PM, Bruce Schuchardt <[email protected]>
wrote:

> The locator view file exists to allow locators to be bounced without
> shutting down the rest of the cluster.  On startup a locator will try to
> find the current membership coordinator of the cluster from an existing
> locator and join the system using that information.  If there is no
> existing locator that knows who the coordinator might be then the new
> locator will try to find the coordinator using the membership "view" that
> is stored in the view file.  If there is no view file the locator will not
> be able to join the existing cluster.
>
> If you've done a full shutdown of the cluster it is safe to delete the
> locator*view.dat files.
>
> When there is no .dat file the locators will use a concurrent-startup
> algorithm to form a unified system.
>
> On 6/8/17 7:48 AM, Anton Mironenko wrote:
>
> Hello,
>
> We found out that if we delete “locator*view.dat” before starting a
> locator,
>
> It fixes the first part of the issue
>
> https://issues.apache.org/jira/browse/GEODE-3003
>
> “Geode doesn't start after cluster restart when using
> cluster-configuration”
>
>
>
> “The second start goes wrong: the locator on the first host always
> doesn't join the rest of the cluster with the error in the locator log:
> "Region /_ConfigurationRegion has potentially stale data. It is waiting
> for another member to recover the latest data."”
>
>
>
> What is a side effect of deleting the file "locator0/locator*view.dat"?
> What functionality do we lose?
>
> A use case with some example would be great.
>
>
>
> Anton Mironenko
>
>