Sudhir,

>> we tried to "revoke" them, after few disk stores were revoked

Can you try executing missing disk store? And see if there are any missing
disk stores for the server that had issues. If there are any, can you try
revoking them all...

How did you conclude there was a network partition?

-Anil.

On Wed, Jan 31, 2018 at 9:18 AM, Udo Kohlmeyer <[email protected]> wrote:

> Hi there Sudhir,
>
> At this stage I think it would be best if you could share some log files of
> all the servers (at the time of failure). This might help us better
> understand what the system was doing at that time.
>
> --Udo
>
> On Tue, Jan 30, 2018 at 9:13 PM, Sudhir Babu Pothineni <
> [email protected]> wrote:
>
>> Thanks for the replies Dan, Udo and Anil!
>>
>> The issue happened when the home directory disk partition was full
>> because another co-located application filled up the disk (not the
>> diskstore directory partition). We have 8 servers, out of which only one
>> had this issue, and it created a network partition. *Why should one server
>> having a problem create a network partition?*
>>
>> It might have come back from the network partition after we cleared the
>> disk issue. In this case the servers were restarted, and they went into
>> ConflictingPersistentDataException.
>>
>> We tried to restart the cluster without that one member, and Geode showed
>> missing-disk-stores from that member. We tried to "revoke" them; after a
>> few disk stores were revoked, *all the regions went offline.*
>>
>> In our case regions are partitioned, persistent with redundancy 1.
>>
>> The message says:
>>
>> Partitioned Region /<region_name> is offline due to unrecovered
>> persistent data <list of all the disk stores from other 7 servers>. *Now
>> how to get them back online?*
>>
>> On Tue, Jan 30, 2018 at 12:31 PM, Dan Smith <[email protected]> wrote:
>>
>>> Hi Sudhir,
>>>
>>> Are you getting a ConflictingPersistentDataException, like the one shown
>>> in the link you sent?
>>>
>>> Generally that is caused by creating two completely different copies of a
>>> region, using the method Barry described. The other cause might be
>>> revoking members and then trying to bring them back into the system after
>>> they were revoked. Just running out of disk space on one of your members
>>> should not cause the issue.
>>>
>>> In all cases, when you get a ConflictingPersistentDataException, you
>>> will need to throw away the data from one of the members involved in
>>> the conflict. You should look at which two members are involved and see
>>> which one you want to throw away. Is one of the members involved a member
>>> that you just added to expand the system, for example? You can also use
>>> the describe offline disk store command to look at the two conflicting
>>> disk stores and see if one of them doesn't actually have data:
>>>
>>> https://geode.apache.org/docs/guide/latest/tools_modules/gfsh/command-pages/describe.html#topic_kys_yvk_2l
>>>
>>> -Dan
>>>
>>> On Tue, Jan 30, 2018 at 10:00 AM, Udo Kohlmeyer <[email protected]>
>>> wrote:
>>>
>>>> Hi there Sudhir,
>>>>
>>>> When you say disk full issue, is this the critical disk usage exception,
>>>> or is this where you physically have no more space on the disk? If you
>>>> have the critical disk usage percentage exception thrown, you could
>>>> temporarily disable this by setting the percentage to 0.
>>>> https://geode.apache.org/docs/guide/11/tools_modules/gfsh/command-pages/create.html#topic_bkn_zty_ck
>>>>
>>>> This of course is there to protect the system before it runs out of
>>>> space. BUT if you have some usable space available (let's say 10-20GB)
>>>> then you could potentially start without the check and then start
>>>> clearing some data.
>>>>
>>>> If you have physically run out of space (`du -hs` tells you that you
>>>> have no space left), you are unfortunately left with the option to
>>>> attach larger disks or reduce data stored.
>>>>
>>>> You could potentially look into compacting the diskstores,
>>>> https://geode.apache.org/docs/guide/11/tools_modules/gfsh/command-pages/compact.html,
>>>> which might free up some space, BUT this would require 2x the existing
>>>> used space to complete.
>>>>
>>>> --Udo
>>>>
>>>> On Tue, Jan 30, 2018 at 6:28 AM, Sudhir Babu Pothineni <
>>>> [email protected]> wrote:
>>>>
>>>>> Because of a disk full issue, the servers are not starting. This is
>>>>> similar to the problem described here:
>>>>> https://discuss.pivotal.io/hc/en-us/community/posts/205143537--ConflictingPersistentDataException-Region-PdxTypes?mobile_site=true
>>>>>
>>>>> Any solution to get out of this state of the cluster?
>>>>>
>>>>> Thanks
>>>>> Sudhir
