Re: [galaxy-dev] Adding data libraries from filesystem path creating duplicates
If you can consistently reproduce the problem, I wonder if it is worth
trying the advice here:

http://serverfault.com/questions/528653/how-can-i-stop-nginx-from-retrying-put-or-post-requests-on-upstream-server-timeo

It would be good to know if it helps. There is a gist here:

https://gist.github.com/wojons/6154645

I don't think any POST in Galaxy should be retried rather than returned
as an error, so I don't see any downside to adding this to the nginx
configuration.

-John

On Mon, Sep 21, 2015 at 10:27 AM, Martin Vickers wrote:
> I'm using nginx and the configuration is as described in the wiki. I've
> not loaded any special extensions.
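For readers following the thread, here is a minimal sketch of the kind of change being discussed. It is not the contents of the linked gist or the Server Fault answer; it only uses the standard `proxy_next_upstream` and `proxy_read_timeout` directives, and the 600s timeout is an illustrative assumption rather than a value from the thread. One observation that seems consistent with the retry theory: the original report describes 6 references per file (288 x 6 = 1728), and the `galaxy_app` upstream block lists six servers, which is what one would expect if nginx tried each upstream once after a timeout.

```nginx
server {
    location / {
        proxy_pass http://galaxy_app;

        # Do not pass a failed or timed-out request on to another server
        # in the upstream group (the default is "error timeout").
        proxy_next_upstream off;

        # Assumption: allow slow requests more time than the 60s default.
        # The value is a placeholder; tune it for the site.
        proxy_read_timeout 600s;

        proxy_set_header X-Forwarded-Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-URL-SCHEME https;
    }
}
```

With `proxy_next_upstream off`, a request that times out is answered with an error instead of being replayed against the next handler. nginx 1.9.13 and later already decline to pass non-idempotent requests such as POST to the next server unless `non_idempotent` is specified, so this mainly matters on older releases like those current when this thread was written.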
Re: [galaxy-dev] Adding data libraries from filesystem path creating duplicates
Hi John,

Thanks for taking the time to reply. I never thought to look at the
proxy settings, but I think you're right: the behaviour seems to match
what you've described.

Like you, I'm not really an expert on proxies and have no idea what
would be misconfigured to cause this.

I'm using nginx and the configuration is as described in the wiki. I've
not loaded any special extensions.

nginx is configured like this:

upstream galaxy_app {
    server localhost:8090;
    server localhost:8091;
    server localhost:8092;
    server localhost:8093;
    server localhost:8094;
    server localhost:8095;
}

server {
    # pass to uWSGI by default
    location / {
        proxy_pass http://galaxy_app;
        proxy_set_header X-Forwarded-Host $host;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_set_header X-URL-SCHEME https;
    }

    # static content

}

and in galaxy.ini I have a bunch of handlers, e.g.

[server:handler0]
use = egg:Paste#http
port = 8090
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 5

I thought that maybe the issue was to do with the 'job admin
complication' very briefly mentioned here:

https://production-galaxy-instances-with-cloudman-and-cloudbiolinux.readthedocs.org/en/latest/

so I added this to my nginx conf

location /admin/jobs {
    proxy_pass http://localhost:8090;
}

so this complication is not the one I'm having here.

Are any of the people John mentioned having this issue here on the dev
board?

Cheers,

Martin

On 09/14/2015 03:11 PM, John Chilton wrote:
> If I had to guess, I would guess this is caused by a misconfigured
> proxy (nginx or Apache) that is resubmitting a POST request that is
> taking Galaxy too long to respond to.

--
Dr. Martin Vickers
Data Manager/HPC Systems Administrator
Institute of Biological, Enviro
Re: [galaxy-dev] Adding data libraries from filesystem path creating duplicates
If I had to guess, I would guess this is caused by a misconfigured
proxy (nginx or Apache) that is resubmitting a POST request that is
taking Galaxy too long to respond to. The order of events would be
something like:

- User clicks to upload library items.
- Proxy gets the request and passes it to Galaxy.
- Galaxy takes a long time to process the request and doesn't respond
  within a timeout.
- Proxy resends the POST request to Galaxy.
- Galaxy takes a long time to process the request and doesn't respond
  within a timeout.
...

Proxies should never resend POST requests to Galaxy as far as I can
imagine, but we have seen this, for instance, when submitting workflows.
Some people have had their proxy retry that request repeatedly.

I don't really know if this is a problem with the default proxy
configurations we list on the wiki or if it comes down to
customizations or specially loaded extensions at the various sites that
have encountered this.

Is this enough to help debug the problem? I'm not really an expert on
specific proxies, etc., and you have it there and seem to be able to
reproduce the problem. If you do want further help, I would post the
proxy you are using, the extensions, the configuration, and the Galaxy
logs corresponding to this incident, so we can see the repeated POSTs
and the route that is being posted to.

If you are not using a proxy, then I am stumped :(.

-John

On Fri, Sep 4, 2015 at 12:04 PM, Martin Vickers wrote:
> Hi All,
>
> I've noticed an issue a couple of times now where I've added a directory
> of fastq's from an NFS mounted filesystem (reference only rather than
> copying into galaxy) and then galaxy times out. Load average begins to
> get really high and then consumes all the RAM and sometimes crashes.
> These are the same symptoms as I had before with this issue that was
> never resolved:
>
> http://dev.list.galaxyproject.org/run-sh-segfault-td4667549.html#a4667553
>
> What I've noticed is that in the dataset I'm uploading to galaxy, there
> are suddenly many duplicates. In this example that's just happened,
> there are 288 fastq.gz files in the physical folder, but galaxy has
> created 6 references to each file, resulting in 1728 datasets in the
> folder (see attached images).
>
> When this happened before and crashed the galaxy application, whenever
> it restarted it'd try to resume what it was doing, which created an
> endless loop of retrying and crashing until the job was removed.
>
> Does anyone know what may be causing this?
>
> Cheers,
>
> Martin
>
> --
> Dr. Martin Vickers
>
> Data Manager/HPC Systems Administrator
> Institute of Biological, Environmental and Rural Sciences
> IBERS New Building
> Aberystwyth University
>
> w: http://www.martin-vickers.co.uk/
> e: mj...@aber.ac.uk
> t: 01970 62 2807

___
Please keep all replies on the list by using "reply all"
in your mail client. To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
http://galaxyproject.org/search/mailinglists/
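One way to look for the repeated POSTs described above, on the nginx side rather than in the Galaxy logs, might be to record which upstream servers each request was sent to. The fragment below is only a sketch: the format name `upstream_debug` and the log path are placeholders, not anything from the thread, while the variables are standard nginx ones.

```nginx
# log_format belongs in the http {} context.
log_format upstream_debug '$remote_addr [$time_local] "$request" $status '
                          'upstream=$upstream_addr '
                          'upstream_status=$upstream_status '
                          'upstream_time=$upstream_response_time '
                          'request_time=$request_time';

server {
    location / {
        proxy_pass http://galaxy_app;

        # Hypothetical path; point it wherever the site keeps nginx logs.
        access_log /var/log/nginx/galaxy_upstream.log upstream_debug;
    }
}
```

When nginx contacts more than one server while handling a single request, `$upstream_addr`, `$upstream_status` and `$upstream_response_time` hold comma-separated lists, so a POST line showing several addresses would suggest that the proxy replayed that request against multiple Galaxy handlers.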