[Bug 62771] WMFLabs: Auto-creation of home directories broken (new members and instances unable to login)

2014-03-26 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=62771

Marc A. Pelletier  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|ASSIGNED                    |RESOLVED
         Resolution|---                         |FIXED

--- Comment #5 from Marc A. Pelletier  ---
The race condition has been prevented for new images (that is, attempts to
mount a filesystem before it has been made available rw will now fail rather
than mount readonly); subsequent puppet runs will try again.

This prevents the underlying issue (and the annoying caching that makes it so
hard to clear) on new instances, but existing instances will still require
some manual intervention.
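
As a rough illustration of the "fail rather than mount readonly, then try
again" behaviour described above, here is a minimal Python sketch. It is not
the actual change made to the images or to puppet: the function names, the
use of /proc/mounts-style text, and the retry helper are all assumptions made
for the example.

```python
def mount_options(mounts_text, mount_point):
    """Return the option list for mount_point, or None if it is not mounted.

    mounts_text is text in /proc/mounts format (an assumption of this sketch):
    device, mount point, filesystem type, options, dump, pass.
    """
    found = None
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[1] == mount_point:
            # A later entry for the same mount point overrides an earlier one.
            found = fields[3].split(",")
    return found


def require_rw(mounts_text, mount_point):
    """Raise instead of silently accepting a missing or read-only mount."""
    options = mount_options(mounts_text, mount_point)
    if options is None:
        raise RuntimeError(f"{mount_point} is not mounted")
    if "rw" not in options:
        raise RuntimeError(f"{mount_point} is mounted read-only")


def retry(check, attempts):
    """Call check() up to `attempts` times.

    Returns the 1-based number of the attempt that succeeded, or None if
    every attempt raised RuntimeError. This stands in for "subsequent
    puppet runs will try again"; it does not sleep between attempts.
    """
    for attempt in range(1, attempts + 1):
        try:
            check()
            return attempt
        except RuntimeError:
            continue
    return None
```

The point of raising in require_rw is that a hard failure leaves nothing
half-working behind, so a later attempt starts from a clean state instead of
inheriting a read-only mount.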

-- 
You are receiving this mail because:
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l


[Bug 62771] WMFLabs: Auto-creation of home directories broken (new members and instances unable to login)

2014-03-21 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=62771

--- Comment #4 from Krinkle  ---
(In reply to Marc A. Pelletier from comment #3)
> The nature of the problem is known (the instance attempts to mount /home and
> /data/project before the NFS server has updated its ACLs for it, then caches
> the negative result for some time), but a proper fix hasn't been found yet.
> 
> I have some ideas on how to prevent this from happening that I will be
> trying today.
> 
> In the meantime, rebooting at least 10 minutes after the issue occurs and
> then waiting at least another 20 minutes seems to be sufficient to let the
> ACL time out.

I rebooted cvn-app3 shortly after I created it, and it still wasn't working,
so I reported this bug.

I rebooted it again yesterday, and again just now. Still getting:

krinkle at KrinkleMac in ~ $ ssh cvn-app3.eqiad.wmflabs 
channel 0: open failed: connect failed: Connection timed out
ssh_exchange_identification: Connection closed by remote host


This could be unrelated, but it has also shown no signs of life in Ganglia at
any point since its creation, including during creation itself:
http://ganglia.wmflabs.org/latest/?c=cvn&h=cvn-app3



[Bug 62771] WMFLabs: Auto-creation of home directories broken (new members and instances unable to login)

2014-03-19 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=62771

Marc A. Pelletier  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Priority|Unprioritized               |High
           Severity|normal                      |major



[Bug 62771] WMFLabs: Auto-creation of home directories broken (new members and instances unable to login)

2014-03-19 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=62771

Marc A. Pelletier  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |ASSIGNED

--- Comment #3 from Marc A. Pelletier  ---
The nature of the problem is known (the instance attempts to mount /home and
/data/project before the NFS server has updated its ACLs for it, then caches
the negative result for some time), but a proper fix hasn't been found yet.

I have some ideas on how to prevent this from happening that I will be trying
today.

In the meantime, rebooting at least 10 minutes after the issue occurs and then
waiting at least another 20 minutes seems to be sufficient to let the ACL time
out.



[Bug 62771] WMFLabs: Auto-creation of home directories broken (new members and instances unable to login)

2014-03-19 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=62771

Daniel Zahn  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |dz...@wikimedia.org

--- Comment #2 from Daniel Zahn  ---
Confirmed: I had the exact same issue today with 2 newly created eqiad
instances. The problem disappeared after rebooting the second one a second
time (or so).



[Bug 62771] WMFLabs: Auto-creation of home directories broken (new members and instances unable to login)

2014-03-19 Thread bugzilla-daemon
https://bugzilla.wikimedia.org/show_bug.cgi?id=62771

Andrew Bogott  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
           Assignee|wikibugs-l@lists.wikimedia. |m...@uberbox.org
                   |org                         |

--- Comment #1 from Andrew Bogott  ---
The pmtpa issue is an unrelated and soon-to-be-moot gluster failure.

The eqiad issue I've seen before, but I don't know how to fix it (other than
by waiting and rebooting). Perhaps Coren will have time to debug this sometime
soon...
