Re: [galaxy-dev] Adding data libraries from filesystem path creating duplicates

2015-09-25 Thread John Chilton
If you can reproduce the problem consistently, it may be worth trying
the advice in this answer
(http://serverfault.com/questions/528653/how-can-i-stop-nginx-from-retrying-put-or-post-requests-on-upstream-server-timeo);
it would be good to know whether it helps. There is also a gist
here: https://gist.github.com/wojons/6154645.

I don't think any POST in Galaxy should be retried rather than
failing with an error, so I don't see any downside to adding this to
the nginx configuration.
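
A minimal sketch of what such a change could look like, applied to the
`location /` block from the configuration in this thread. It assumes the
retries come from nginx's `proxy_next_upstream` handling (whose default is
`error timeout`); it is not necessarily what the linked answer or gist
recommends, and the 600s timeout is an arbitrary illustrative value. The
6 references per file reported in this thread would be consistent with the
request being passed in turn to each of the 6 servers in `galaxy_app`.

```nginx
location / {
    proxy_pass http://galaxy_app;
    proxy_set_header X-Forwarded-Host $host;
    proxy_set_header X-Forwarded-For  $proxy_add_x_forwarded_for;
    proxy_set_header X-URL-SCHEME https;

    # Assumption: never hand a failed or timed-out request to the next
    # server in the upstream block (the default is "error timeout").
    proxy_next_upstream off;

    # Assumption: illustrative value only. Gives slow requests, such as
    # large library imports, longer than the 60s default to respond.
    proxy_read_timeout 600s;
}
```

Note that `proxy_next_upstream off` also stops GET requests from being
retried on another server, so a single unresponsive handler would surface
as an error to the user instead of being masked.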

-John


On Mon, Sep 21, 2015 at 10:27 AM, Martin Vickers  wrote:
>
> Hi John,
>
> Thanks for taking the time to reply. I never thought to look at the
> proxy settings but I think you're right, the behaviour seems to match
> what you've described.
>
> Like you I'm not really an expert on proxies and have no idea what would
> be mis-configured that would cause this.
>
> I'm using nginx and the configuration is as described in the wiki. I've
> not loaded any special extensions.
>
> nginx is configured like this:
>
> upstream galaxy_app {
>     server localhost:8090;
>     server localhost:8091;
>     server localhost:8092;
>     server localhost:8093;
>     server localhost:8094;
>     server localhost:8095;
> }
>
> server {
>     # pass to uWSGI by default
>     location / {
>         proxy_pass http://galaxy_app;
>         proxy_set_header X-Forwarded-Host $host;
>         proxy_set_header X-Forwarded-For  $proxy_add_x_forwarded_for;
>         proxy_set_header X-URL-SCHEME https;
>     }
>
>     # static content
>
> }
>
> and in galaxy.ini I have a bunch of handlers, e.g.
>
>
> [server:handler0]
> use = egg:Paste#http
> port = 8090
> host = 127.0.0.1
> use_threadpool = true
> threadpool_workers = 5
>
> I thought that maybe the issue was to do with the 'job admin
> complication' very briefly mentioned here;
>
> https://production-galaxy-instances-with-cloudman-and-cloudbiolinux.readthedocs.org/en/latest/
>
> so I added this to my nginx conf
>
> location /admin/jobs {
>     proxy_pass  http://localhost:8090;
> }
>
>
> That made no difference, so this complication is not the one I'm
> having here.
>
> Are any of the people John mentioned having this issue here on the dev
> board?
>
> Cheers,
>
> Martin
>
> On 09/14/2015 03:11 PM, John Chilton wrote:
>> If I had to guess, I would guess this is caused by a mis-configured
>> proxy (nginx or Apache) that is resubmitting a POST request that
>> Galaxy is taking too long to respond to. The order of events would be
>> something like:
>>
>> - User clicks to upload library items.
>> - Proxy receives the request and passes it to Galaxy.
>> - Galaxy takes a long time to process the request and doesn't respond
>>   within the timeout.
>> - Proxy resends the POST request to Galaxy.
>> - Galaxy takes a long time to process the request and doesn't respond
>>   within the timeout.
>> ...
>>
>> Proxies should never resend POST requests to Galaxy as far as I can
>> imagine, but we have seen this, for instance, when submitting workflows.
>> Some people have had their proxy retry that request repeatedly.
>>
>> I don't really know if this is a problem with the default proxy
>> configurations we list on the wiki or if it comes down to
>> customizations or special loaded extensions at various sites that have
>> encountered this.
>>
>> Is this enough to help debug the problem? I'm not really an expert on
>> specific proxies, and you have the setup there and seem to be able to
>> reproduce the problem. If you do want further help, I would post the
>> proxy you are using, its extensions, its configuration, and the Galaxy
>> logs corresponding to this incident, so we can look for the repeated
>> POSTs and the route they are sent to.
>>
>> If you are not using a proxy, then I am stumped :(.
>>
>> -John
>>
>>
>> On Fri, Sep 4, 2015 at 12:04 PM, Martin Vickers  wrote:
>>> Hi All,
>>>
>>> I've noticed an issue a couple of times now where I've added a directory
>>> of fastq files from an NFS-mounted filesystem (by reference rather than
>>> copying into Galaxy) and then Galaxy times out. The load average gets
>>> really high, then all the RAM is consumed and Galaxy sometimes crashes.
>>> These are the same symptoms I had before with this issue, which was
>>> never resolved:
>>>
>>> http://dev.list.galaxyproject.org/run-sh-segfault-td4667549.html#a4667553
>>>
>>> What I've noticed is that in the dataset I'm uploading to Galaxy, there
>>> are suddenly many duplicates. In the example that has just happened,
>>> there are 288 fastq.gz files in the physical folder, but Galaxy has
>>> created 6 references to each file, resulting in 1728 datasets in the
>>> folder (see attached images).
>>>
>>> When this happened before and crashed the Galaxy application, whenever
>>> it restarted it would try to resume what it was doing, which created an
>>> endless loop of retrying and crashing until the job was removed.
>>>
>>> Does anyone know what may be causing this?
>>>
>>> Cheers,
>>>
>>> Martin
>>>
>>> --
>>>
>>> --
>>> Dr. Martin Vickers
>>>
>>> Data Manager/HPC Sy

Re: [galaxy-dev] Adding data libraries from filesystem path creating duplicates

2015-09-21 Thread Martin Vickers

Hi John,

Thanks for taking the time to reply. I never thought to look at the
proxy settings but I think you're right, the behaviour seems to match
what you've described.

Like you I'm not really an expert on proxies and have no idea what would
be mis-configured that would cause this.

I'm using nginx and the configuration is as described in the wiki. I've
not loaded any special extensions.

nginx is configured like this:

upstream galaxy_app {
    server localhost:8090;
    server localhost:8091;
    server localhost:8092;
    server localhost:8093;
    server localhost:8094;
    server localhost:8095;
}

server {
    # pass to uWSGI by default
    location / {
        proxy_pass http://galaxy_app;
        proxy_set_header X-Forwarded-Host $host;
        proxy_set_header X-Forwarded-For  $proxy_add_x_forwarded_for;
        proxy_set_header X-URL-SCHEME https;
    }

    # static content

}

and in galaxy.ini I have a bunch of handlers, e.g.


[server:handler0]
use = egg:Paste#http
port = 8090
host = 127.0.0.1
use_threadpool = true
threadpool_workers = 5

I thought that maybe the issue was to do with the 'job admin
complication' very briefly mentioned here;

https://production-galaxy-instances-with-cloudman-and-cloudbiolinux.readthedocs.org/en/latest/

so I added this to my nginx conf

location /admin/jobs {
    proxy_pass  http://localhost:8090;
}


That made no difference, so this complication is not the one I'm
having here.

Are any of the people John mentioned having this issue here on the dev
board?

Cheers,

Martin

On 09/14/2015 03:11 PM, John Chilton wrote:
> If I had to guess, I would guess this is caused by a mis-configured
> proxy (nginx or Apache) that is resubmitting a POST request that
> Galaxy is taking too long to respond to. The order of events would be
> something like:
>
> - User clicks to upload library items.
> - Proxy receives the request and passes it to Galaxy.
> - Galaxy takes a long time to process the request and doesn't respond
>   within the timeout.
> - Proxy resends the POST request to Galaxy.
> - Galaxy takes a long time to process the request and doesn't respond
>   within the timeout.
> ...
>
> Proxies should never resend POST requests to Galaxy as far as I can
> imagine, but we have seen this, for instance, when submitting workflows.
> Some people have had their proxy retry that request repeatedly.
>
> I don't really know if this is a problem with the default proxy
> configurations we list on the wiki or if it comes down to
> customizations or special loaded extensions at various sites that have
> encountered this.
>
> Is this enough to help debug the problem? I'm not really an expert on
> specific proxies, and you have the setup there and seem to be able to
> reproduce the problem. If you do want further help, I would post the
> proxy you are using, its extensions, its configuration, and the Galaxy
> logs corresponding to this incident, so we can look for the repeated
> POSTs and the route they are sent to.
>
> If you are not using a proxy, then I am stumped :(.
>
> -John
>
>
> On Fri, Sep 4, 2015 at 12:04 PM, Martin Vickers  wrote:
>> Hi All,
>>
>> I've noticed an issue a couple of times now where I've added a directory
>> of fastq files from an NFS-mounted filesystem (by reference rather than
>> copying into Galaxy) and then Galaxy times out. The load average gets
>> really high, then all the RAM is consumed and Galaxy sometimes crashes.
>> These are the same symptoms I had before with this issue, which was
>> never resolved:
>>
>> http://dev.list.galaxyproject.org/run-sh-segfault-td4667549.html#a4667553
>>
>> What I've noticed is that in the dataset I'm uploading to Galaxy, there
>> are suddenly many duplicates. In the example that has just happened,
>> there are 288 fastq.gz files in the physical folder, but Galaxy has
>> created 6 references to each file, resulting in 1728 datasets in the
>> folder (see attached images).
>>
>> When this happened before and crashed the Galaxy application, whenever
>> it restarted it would try to resume what it was doing, which created an
>> endless loop of retrying and crashing until the job was removed.
>>
>> Does anyone know what may be causing this?
>>
>> Cheers,
>>
>> Martin
>>
>> --
>>
>> --
>> Dr. Martin Vickers
>>
>> Data Manager/HPC Systems Administrator
>> Institute of Biological, Environmental and Rural Sciences
>> IBERS New Building
>> Aberystwyth University
>>
>> w: http://www.martin-vickers.co.uk/
>> e: mj...@aber.ac.uk
>> t: 01970 62 2807
>>
>>
>> ___
>> Please keep all replies on the list by using "reply all"
>> in your mail client.  To manage your subscriptions to this
>> and other Galaxy lists, please use the interface at:
>>   https://lists.galaxyproject.org/
>>
>> To search Galaxy mailing lists use the unified search at:
>>   http://galaxyproject.org/search/mailinglists/

--

--
Dr. Martin Vickers

Data Manager/HPC Systems Administrator
Institute of Biological, Enviro

Re: [galaxy-dev] Adding data libraries from filesystem path creating duplicates

2015-09-14 Thread John Chilton
If I had to guess, I would guess this is caused by a mis-configured
proxy (nginx or Apache) that is resubmitting a POST request that
Galaxy is taking too long to respond to. The order of events would be
something like:

- User clicks to upload library items.
- Proxy receives the request and passes it to Galaxy.
- Galaxy takes a long time to process the request and doesn't respond
  within the timeout.
- Proxy resends the POST request to Galaxy.
- Galaxy takes a long time to process the request and doesn't respond
  within the timeout.
...

Proxies should never resend POST requests to Galaxy as far as I can
imagine, but we have seen this, for instance, when submitting workflows.
Some people have had their proxy retry that request repeatedly.

I don't really know if this is a problem with the default proxy
configurations we list on the wiki or if it comes down to
customizations or special loaded extensions at various sites that have
encountered this.

Is this enough to help debug the problem? I'm not really an expert on
specific proxies, and you have the setup there and seem to be able to
reproduce the problem. If you do want further help, I would post the
proxy you are using, its extensions, its configuration, and the Galaxy
logs corresponding to this incident, so we can look for the repeated
POSTs and the route they are sent to.
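
If the proxy turns out to be nginx, one way to make resent requests
visible on the proxy side is to log which upstream servers handled each
request. This is only a sketch: the format name `upstream_debug` and the
log path are hypothetical, while the variables are standard nginx ones.
When nginx contacts more than one server for a single request,
`$upstream_addr` and `$upstream_status` list every attempt, separated by
commas.

```nginx
# In the http {} block. "upstream_debug" is a made-up format name.
log_format upstream_debug '$remote_addr [$time_local] "$request" $status '
                          'upstream=$upstream_addr '
                          'upstream_status=$upstream_status '
                          'request_time=$request_time';

# In the existing server {} block. Hypothetical path; use whatever
# location the nginx logs normally go to.
access_log /var/log/nginx/galaxy_upstream.log upstream_debug;
```

A POST line whose `upstream=` field shows several comma-separated
addresses would indicate that nginx passed the same request to more than
one Galaxy server.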

If you are not using a proxy, then I am stumped :(.

-John


On Fri, Sep 4, 2015 at 12:04 PM, Martin Vickers  wrote:
> Hi All,
>
> I've noticed an issue a couple of times now where I've added a directory
> of fastq files from an NFS-mounted filesystem (by reference rather than
> copying into Galaxy) and then Galaxy times out. The load average gets
> really high, then all the RAM is consumed and Galaxy sometimes crashes.
> These are the same symptoms I had before with this issue, which was
> never resolved:
>
> http://dev.list.galaxyproject.org/run-sh-segfault-td4667549.html#a4667553
>
> What I've noticed is that in the dataset I'm uploading to Galaxy, there
> are suddenly many duplicates. In the example that has just happened,
> there are 288 fastq.gz files in the physical folder, but Galaxy has
> created 6 references to each file, resulting in 1728 datasets in the
> folder (see attached images).
>
> When this happened before and crashed the Galaxy application, whenever
> it restarted it would try to resume what it was doing, which created an
> endless loop of retrying and crashing until the job was removed.
>
> Does anyone know what may be causing this?
>
> Cheers,
>
> Martin
>
> --
>
> --
> Dr. Martin Vickers
>
> Data Manager/HPC Systems Administrator
> Institute of Biological, Environmental and Rural Sciences
> IBERS New Building
> Aberystwyth University
>
> w: http://www.martin-vickers.co.uk/
> e: mj...@aber.ac.uk
> t: 01970 62 2807
>
>