[Bug 36993] dumps project overload GlusterFS and cause cluster failure
https://bugzilla.wikimedia.org/show_bug.cgi?id=36993

--- Comment #14 from Nemo federicol...@tiscali.it 2012-11-10 14:44:21 UTC ---
*** Bug 36997 has been marked as a duplicate of this bug. ***

-- 
Configure bugmail: https://bugzilla.wikimedia.org/userprefs.cgi?tab=email
--- You are receiving this mail because: ---
You are the assignee for the bug.
You are on the CC list for the bug.
___
Wikibugs-l mailing list
Wikibugs-l@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikibugs-l
[Bug 36993] dumps project overload GlusterFS and cause cluster failure
https://bugzilla.wikimedia.org/show_bug.cgi?id=36993

Nemo federicol...@tiscali.it changed:

What   |Removed |Added
Blocks |        |41967
[Bug 36993] dumps project overload GlusterFS and cause cluster failure
https://bugzilla.wikimedia.org/show_bug.cgi?id=36993

Antoine hashar Musso has...@free.fr changed:

What     |Removed                                     |Added
Summary  |Labs cluster dies daily at roughly 6:30 UTC |dumps project overload GlusterFS and cause cluster failure
Severity |normal                                      |major

--- Comment #9 from Antoine hashar Musso has...@free.fr 2012-05-22 14:05:34 UTC ---
We just had some kind of outage affecting the whole cluster. The virtualization cluster showed load gradually increasing from 13:20 UTC: http://ganglia.wikimedia.org/latest/?r=hourcs=05%2F22%2F2012+13%3A00+ce=05%2F22%2F2012+14%3A00+m=load_reports=by+namec=Virtualization+cluster+pmtpah=host_regex=max_graphs=0tab=mvn=sh=1z=smallhc=4

At the same time, the dumps project on labs started showing network activity, which corresponds to I/O activity over NFS: http://ganglia.wmflabs.org/latest/graph.php?c=dumpsm=network_reportr=customs=by%20namehc=4mc=2cs=05%2F22%2F2012%2011%3A00%20ce=05%2F22%2F2012%2014%3A00%20st=1337694997g=network_reportz=mediumc=dumps

I saw exactly the same behavior earlier this morning: 30 MBytes/s were sent from a datadump host in eqiad and 30 MBytes/s were received by the dumps project. Meanwhile, instances were unresponsive.

We need to find a workaround. Some possible solutions:
- get the `dump` project to use an NFS share on real storage, thus bypassing GlusterFS
- rate limit network bandwidth between dataset1001 in eqiad and labs
- find a GlusterFS parameter that throttles the connection

Other ideas?

Changing summary from: Labs cluster dies daily at roughly 6:30 UTC
To: dumps project overload GlusterFS and cause cluster failure

Raising severity since this makes the cluster unusable from time to time.
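The second workaround in comment 9 (rate limiting the transfer from dataset1001) amounts to pacing the copy so it cannot saturate the link or the storage behind it. Below is a minimal sketch in Python, assuming the copy is performed by a process we control; `throttled_copy` and the rates in the demo are hypothetical and are not anything deployed on labs. With an off-the-shelf tool, rsync's `--bwlimit` option serves the same purpose.

```python
import io
import time
from typing import BinaryIO, Callable


def throttled_copy(
    src: BinaryIO,
    dst: BinaryIO,
    max_bytes_per_sec: float,
    chunk_size: int = 64 * 1024,
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Hypothetical helper: copy src to dst in chunks, sleeping after each
    chunk so the average rate since the start stays at or below
    max_bytes_per_sec (a single chunk may still be written as a burst).
    Returns the number of bytes copied."""
    if max_bytes_per_sec <= 0:
        raise ValueError("max_bytes_per_sec must be positive")
    start = clock()
    copied = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:
            break  # end of input
        dst.write(chunk)
        copied += len(chunk)
        # Earliest moment at which `copied` bytes are allowed under the cap.
        allowed_at = start + copied / max_bytes_per_sec
        delay = allowed_at - clock()
        if delay > 0:
            sleep(delay)
    return copied


if __name__ == "__main__":
    # Hypothetical demo: 256 KiB capped at 1 MiB/s takes roughly 0.25 s.
    data = io.BytesIO(b"\0" * 256 * 1024)
    out = io.BytesIO()
    t0 = time.monotonic()
    n = throttled_copy(data, out, max_bytes_per_sec=1024 * 1024)
    print(n, "bytes in", round(time.monotonic() - t0, 2), "s")
```

Pacing against the cumulative target (`start + copied / rate`) rather than sleeping a fixed amount per chunk means time spent reading and writing counts toward the budget, so the long-run average converges on the cap. The injectable `clock` and `sleep` exist only to make the function testable without real waiting.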
[Bug 36993] dumps project overload GlusterFS and cause cluster failure
https://bugzilla.wikimedia.org/show_bug.cgi?id=36993

Ariel T. Glenn ar...@wikimedia.org changed:

What |Removed |Added
CC   |        |ar...@wikimedia.org

--- Comment #10 from Ariel T. Glenn ar...@wikimedia.org 2012-05-22 14:12:38 UTC ---
There is a gluster share, which is supposed to be available across all labs instances, that holds the last 5 good dumps. I don't know whether it has been made accessible to the instances yet. It updates every day at around 4 am UTC. The point is that no one has to download their own copies of the dumps to work on them in a labs project (wasting space and bandwidth).
[Bug 36993] dumps project overload GlusterFS and cause cluster failure
https://bugzilla.wikimedia.org/show_bug.cgi?id=36993

--- Comment #11 from Antoine hashar Musso has...@free.fr 2012-05-22 14:16:08 UTC ---
Following a discussion with Hydriz, here is what he does:
- rsync the dumps to his instance in /data/project/dumps (which hits GlusterFS)
- upload the dumps to the Internet Archive using curl and their S3 interface

So we are copying the data into GlusterFS only to move it out afterwards. I guess Ariel's comment above could be a good solution.
[Bug 36993] dumps project overload GlusterFS and cause cluster failure
https://bugzilla.wikimedia.org/show_bug.cgi?id=36993

--- Comment #12 from Antoine hashar Musso has...@free.fr 2012-05-22 14:41:11 UTC ---
Hydriz is going to upload to S3 from the copy Ariel is referring to in comment 10.
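The change agreed in comments 11 and 12 removes the staging step: instead of rsyncing each dump into /data/project/dumps on GlusterFS and then reading it back for upload, the file is read once from the shared copy and handed straight to the uploader. A minimal sketch of that streaming pattern in Python follows, assuming the uploader can accept data chunk by chunk; `stream_to_uploader` and `put_chunk` are hypothetical names, and the actual curl upload to the Internet Archive's S3 interface is not modeled here.

```python
import io
from typing import BinaryIO, Callable


def stream_to_uploader(
    src: BinaryIO,
    put_chunk: Callable[[bytes], None],
    chunk_size: int = 8 * 1024 * 1024,
) -> int:
    """Hypothetical helper: read src sequentially and pass each chunk to
    put_chunk without writing an intermediate copy anywhere.
    Returns the number of bytes streamed."""
    total = 0
    # iter() with a sentinel stops when read() returns b"" at end of file.
    for chunk in iter(lambda: src.read(chunk_size), b""):
        put_chunk(chunk)
        total += len(chunk)
    return total


if __name__ == "__main__":
    # Hypothetical usage: in practice src would be opened read-only from the
    # shared dumps mount and put_chunk would send data to the upload target.
    received = []
    n = stream_to_uploader(io.BytesIO(b"dump" * 5), received.append, chunk_size=6)
    print(n, len(received))
```

Memory use is bounded by `chunk_size`, and GlusterFS sees only the reads, not the extra write and second read that the staged copy required.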
[Bug 36993] dumps project overload GlusterFS and cause cluster failure
https://bugzilla.wikimedia.org/show_bug.cgi?id=36993

Antoine hashar Musso has...@free.fr changed:

What       |Removed |Added
Status     |NEW     |RESOLVED
Resolution |        |FIXED

--- Comment #13 from Antoine hashar Musso has...@free.fr 2012-05-22 14:43:09 UTC ---
Since we have found a workaround for the recent problems, I am closing this bug. The root cause is that GlusterFS can be taken down by just one instance doing heavy I/O. That should be tracked in a separate bug.