Re: [galaxy-dev] Fwd: Galaxy process

2011-03-15 Thread Brad Chapman
Nate;

[folder level security to speed up data library loading]

> My suggestion would be a solution somewhere in the middle.  The ability
> to have per-dataset permissions is something that I think we should
> retain, but we could change our current policy of checking the
> permissions of the entire library at every load.  Instead, it could work
> like this:
> 
>   1. Check permissions on the library.
>   2. Check permissions on the first level contents of the library.
>   3. When a folder is expanded to show its contents, check the
>  permissions of that folder's contents via AJAX.
> 
> The reason we didn't do this originally was to prevent folders from
> showing up if a user didn't have permission to access any of the
> datasets in that folder.  But this can be worked around by setting
> access permission on the folder itself.
> 
> This is probably a fair amount of work, though, since it means not
> loading subfolder contents at page load since we are not checking their
> security until later.

Agreed, the AJAX loading would be ideal and allow you to maintain
full permissions. This would also allow scaling up to arbitrarily
large data libraries, as long as they are nested within folders.
It did look like a pretty big project upon digging into the code,
but any modifications that allow larger libraries would definitely
be appreciated. Thanks,
Brad
___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


Re: [galaxy-dev] Fwd: Galaxy process

2011-03-15 Thread Nate Coraor
SHAUN WEBB wrote:
> 
> Further to my email below.
> 
> I have a data library that contains ~117 GB of NGS data,
> uploaded via file system path. This library was always slow to open
> (about 10s) but now takes several minutes or does not open at all.
> 
> Thanks for any help on this. I am still experiencing a memory leak
> that I can't pinpoint and it is only dissipated by restarting the
> server. At the moment my debugging level is set to INFO. Is there
> anything I can change in the universe file to try to trace this?

Two things that would help here:

Are there other users doing things in Galaxy at this time?  It'd help to
be able to determine exactly what is triggering this.

Also, if you set 'use_heartbeat = True' in universe_wsgi.ini, this will
dump the call stack every 30 seconds to the file 'heartbeat.log' (and
'heartbeat.log.nonsleeping').  This should reveal where the thread(s)
are hung.
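
For readers unfamiliar with the idea, a heartbeat-style dump can be
sketched in plain Python as below. This is a minimal standalone
illustration, not Galaxy's actual heartbeat code; the names
dump_all_stacks and start_heartbeat are made up for this example, and
the 30 second default simply mirrors the interval mentioned above.

```python
import sys
import threading
import time
import traceback


def dump_all_stacks(out):
    """Write the current call stack of every live thread to `out`."""
    names = dict((t.ident, t.name) for t in threading.enumerate())
    out.write("Heartbeat at %s\n" % time.asctime())
    for thread_id, frame in sys._current_frames().items():
        out.write("Thread %s (%s):\n" % (thread_id, names.get(thread_id, "unknown")))
        out.write("".join(traceback.format_stack(frame)))
        out.write("\n")
    out.flush()


def start_heartbeat(path, interval=30.0):
    """Start a daemon thread that appends a stack dump to `path` every `interval` seconds."""
    stop = threading.Event()

    def run():
        with open(path, "a") as out:
            while not stop.is_set():
                dump_all_stacks(out)
                stop.wait(interval)

    worker = threading.Thread(target=run, name="heartbeat")
    worker.daemon = True
    worker.start()
    return stop  # call stop.set() to end the heartbeat
```

A thread that shows the same stack in several consecutive dumps is a
likely candidate for the hang.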

--nate

> 
> Thanks!
> Shaun
> 
> 
> - Forwarded message from swe...@staffmail.ed.ac.uk -
> Date: Wed, 09 Mar 2011 10:15:13 +
> From: SHAUN WEBB 
>  Subject: Galaxy process
>   To: galaxy dev 
> 
> 
> 
> Hi,
> 
> Since making the last update I have found some new warnings in my
> paster.log. It also seems as though the Galaxy process starts to
> gather memory and eventually hangs (35% of 64 GB memory).
> 
> I've posted the entries below.
> 
> If anyone could help me understand what is going on that would be great.
> 
> Thanks.
> Shaun Webb
> 
> 
> 
> paste.httpserver.ThreadPool INFO 2011-03-09 09:49:49,962 No idle
> tasks, and only 0 busy tasks; adding 5 more workers
> paste.httpserver.ThreadPool INFO 2011-03-09 09:49:58,754 No idle
> tasks, and only 4 busy tasks; adding 1 more workers
> paste.httpserver.ThreadPool INFO 2011-03-09 09:51:47,301 Culling 6
> extra workers (5 idle workers present)
> paste.httpserver.ThreadPool INFO 2011-03-09 09:55:17,163 No idle
> tasks, and only 0 busy tasks; adding 5 more workers
> 129.215.14.72 - - [09/Mar/2011:09:48:40 +0100] "GET /history
> HTTP/1.1" 500 - "http://bifx-core.bio.ed.ac.uk:8080/" "Mozilla/5.0
> (Windows; U; Windows NT 5.1; en-GB; rv:1.9.2.13) Gecko/20101203
> Firefox/3.6.13 (.NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR
> 3.0.4506.2152; .NET CLR 3.5.30729)"
> paste.httpserver.ThreadPool INFO 2011-03-09 10:13:16,956 Culling 5
> extra workers (7 idle workers present)
> 212.183.140.59 - - [09/Mar/2011:10:13:17 +0100] "GET / HTTP/1.1" 200
> - "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_6; en-US)
> AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.102
> Safari/534.13"
> paste.httpserver.ThreadPool INFO 2011-03-09 10:14:35,715 No idle
> tasks, and only 2 busy tasks; adding 3 more workers
> paste.httpserver.ThreadPool WARNING 2011-03-09 10:15:15,104 Thread
> 140283224094464 hung (working on task for 3096 seconds)
> 
> Exception happened during processing of request from ('212.183.140.59', 10871)
> Traceback (most recent call last):
>   File "/usr/lib/python2.6/SocketServer.py", line 281, in _handle_request_noblock
>     self.process_request(request, client_address)
>   File "/storage/home/galaxy/galaxy_dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 1037, in process_request
>     lambda: self.process_request_in_thread(request, client_address))
>   File "/storage/home/galaxy/galaxy_dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 617, in add_task
>     self.kill_hung_threads()
>   File "/storage/home/galaxy/galaxy_dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 778, in kill_hung_threads
>     self.kill_worker(worker.thread_id)
>   File "/storage/home/galaxy/galaxy_dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 705, in kill_worker
>     killthread.async_raise(thread_id, SystemExit)
>   File "/storage/home/galaxy/galaxy_dist/eggs/Paste-1.6-py2.6.egg/paste/util/killthread.py", line 22, in async_raise
>     raise ValueError("invalid thread id")
> ValueError: invalid thread id
> 
> 
> 
> -- 
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
> 
> 
> 
> - End forwarded message -
> 
> 


Re: [galaxy-dev] Fwd: Galaxy process

2011-03-15 Thread Nate Coraor
Brad Chapman wrote:
> 
> > We made a decision when we implemented libraries that we would provide
> > fine grained security at the dataset level. The trade-off, obviously,
> > is that it takes time to check every dataset. Another approach to
> > solve this would be to not provide as fine-grained security, and have
> > security at the folder level rather than the dataset level. If this is
> > done, all datasets within a folder would be required to have the same
> > security. What are your thoughts on this approach?
> 
> This sounds like a very reasonable trade off. We go one further and 
> use Data Library level security, so everything within a library has 
> the same permissions, but I can definitely see how having the
> ability to control it at the folder level would be useful.

My suggestion would be a solution somewhere in the middle.  The ability
to have per-dataset permissions is something that I think we should
retain, but we could change our current policy of checking the
permissions of the entire library at every load.  Instead, it could work
like this:

  1. Check permissions on the library.
  2. Check permissions on the first level contents of the library.
  3. When a folder is expanded to show its contents, check the
 permissions of that folder's contents via AJAX.

The reason we didn't do this originally was to prevent folders from
showing up if a user didn't have permission to access any of the
datasets in that folder.  But this can be worked around by setting
access permission on the folder itself.

This is probably a fair amount of work, though, since it means not
loading subfolder contents at page load since we are not checking their
security until later.
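
The three steps above could be sketched roughly as follows. This is a
toy model under stated assumptions, not Galaxy's security code: the
Folder and Dataset classes, the can_access and expand helpers, and the
role names are all hypothetical, and an empty role set is taken to mean
"unrestricted".

```python
class Dataset(object):
    def __init__(self, name, access_roles=None):
        self.name = name
        # An empty set means "no restriction" in this sketch.
        self.access_roles = set(access_roles or [])


class Folder(object):
    def __init__(self, name, access_roles=None, children=None):
        self.name = name
        self.access_roles = set(access_roles or [])
        self.children = list(children or [])


def can_access(item, user_roles):
    """An item is accessible if it is unrestricted or shares a role with the user."""
    return not item.access_roles or bool(item.access_roles & set(user_roles))


def expand(folder, user_roles):
    """Return names of the direct children the user may see.

    Only one level is checked per call; subfolder contents are checked
    later, when the user expands that subfolder (e.g. via an AJAX request).
    """
    return [child.name for child in folder.children if can_access(child, user_roles)]


library = Folder("library", children=[
    Folder("public_runs", children=[Dataset("run1.fastq"), Dataset("run2.fastq")]),
    Folder("restricted_runs", access_roles=["lab_a"], children=[Dataset("secret.fastq")]),
    Dataset("readme.txt"),
])

# Steps 1 and 2: check the library itself, then its first-level contents.
if can_access(library, ["lab_b"]):
    print(expand(library, ["lab_b"]))
# Step 3: check a folder's contents only when it is expanded.
print(expand(library.children[0], ["lab_b"]))
```

Note that "restricted_runs" is hidden from the "lab_b" user only because
access is set on the folder itself, which is the workaround described
above.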

--nate

> 
> Brad


Re: [galaxy-dev] Fwd: Galaxy process

2011-03-10 Thread Brad Chapman
Greg;

> I agree that implementing an ajax approach to rendering libraries may
> be a good solution, but also may be difficult. We'll analyze this a
> bit more and see if it will be reasonable.

Great -- thanks for taking a look.

> We made a decision when we implemented libraries that we would provide
> fine grained security at the dataset level. The trade-off, obviously,
> is that it takes time to check every dataset. Another approach to
> solve this would be to not provide as fine-grained security, and have
> security at the folder level rather than the dataset level. If this is
> done, all datasets within a folder would be required to have the same
> security. What are your thoughts on this approach?

This sounds like a very reasonable trade off. We go one further and 
use Data Library level security, so everything within a library has 
the same permissions, but I can definitely see how having the
ability to control it at the folder level would be useful.

Brad


Re: [galaxy-dev] Fwd: Galaxy process

2011-03-10 Thread Greg Von Kuster
Hi Brad,

I agree that implementing an ajax approach to rendering libraries may be a good 
solution, but also may be difficult.  We'll analyze this a bit more and see if 
it will be reasonable.  

We made a decision when we implemented libraries that we would provide fine 
grained security at the dataset level.  The trade-off, obviously, is that it 
takes time to check every dataset.  Another approach to solve this would be to 
not provide as fine-grained security, and have security at the folder level 
rather than the dataset level.  If this is done, all datasets within a folder 
would be required to have the same security.  What are your thoughts on this 
approach?

Thanks Brad,

Greg


On Mar 10, 2011, at 10:07 AM, Brad Chapman wrote:

> Greg;
> 
> [large data library performance]
>> Security checks are performed on every active ( undeleted ) dataset
>> within a data library as its contents are rendered upon opening. For
>> the current user, every dataset is checked to determine if the user
>> can access it, and if so, then checks are made to see if the user has
>> permission to perform operations on the dataset that fall into the add
>> / modify / manage permissions areas. These checks incur db hits for
>> each dataset, so if your data library includes many datasets ( several
>> hundred or more ), then it will take a bit of time to render it upon
>> opening. The size of the dataset files is not an issue here, but the
>> number of datasets within the data library.
> 
> We are running into this limit quite a bit in practice as our data libraries
> grow. Splitting them does provide a quick workaround. What would you
> think about loading folder data on-demand via Ajax? Our data is
> stored in folders and sub-folders within the library, so this would
> let us scale up library items without having to arbitrarily split
> the data libraries.
> 
> I took a quick look at the code with this in mind and it looked,
> well, hard. But that could be totally due to my ignorance of the
> implementation, what do you think?
> 
> Brad

Greg Von Kuster
Galaxy Development Team
g...@bx.psu.edu




Re: [galaxy-dev] Fwd: Galaxy process

2011-03-10 Thread Greg Von Kuster
Hi Shaun,

On Mar 10, 2011, at 10:05 AM, SHAUN WEBB wrote:

> Thanks Greg, there would be about 100 datasets in the library.

If there are only about 100 datasets, then this is likely not the cause of the 
time taken to render the library.  Whatever is going on with the environment 
that is slowing Galaxy may also be causing this. 

> 
> Can you tell me how I would pull a single changeset? I have only done batch 
> updates using the distribution repository before.

You really can't pull a single change set; you just have to pull from the 
development repo in a batch update like you've done with the stable 
distribution.

> 
> I'm just wondering why it has become so much slower since the latest update 
> (I previously updated last December). Also wondering why the Galaxy process 
> is now taking up 25% of the memory on a 64GB machine.

Yeah, we really need to find the cause of this, although I have no idea what 
it could be...


> 
> Thanks
> Shaun
> 
> Quoting Greg Von Kuster :
> 
>> Hi Shaun,
>> 
>> Security checks are performed on every active ( undeleted ) dataset within a 
>> data library as its contents are rendered upon opening.  For the current 
>> user, every dataset is checked to determine if the user can access it, and 
>> if so, then checks are made to see if the user has permission to perform 
>> operations on the dataset that fall into the add / modify / manage 
>> permissions areas.  These checks incur db hits for each dataset, so if your 
>> data library includes many datasets ( several hundred  or more ), then it 
>> will take a bit of time to render it upon opening.  The size of the dataset 
>> files is not an issue here, but the number of datasets within the data 
>> library.
>> 
>> A solution to this is to split up the data library.  Change set 
>> 5200:ed7b6180b925 added the ability to move data library items within a 
>> library or between libraries, providing a way to split up a large library.  
>> This change set should make it to the distribution within the next few 
>> weeks, or you can pull it from our development repo.
>> 
>> Thanks Shaun,
>> 
>> Greg Von Kuster
>> 
>> 
>> On Mar 10, 2011, at 5:21 AM, SHAUN WEBB wrote:
>> 
>>> 
>>> Further to my email below.
>>> 
>>> I have a data library that contains ~117 GB of NGS data, uploaded via 
>>> file system path. This library was always slow to open (about 10s) but now 
>>> takes several minutes or does not open at all.
>>> 
>>> Thanks for any help on this. I am still experiencing a memory leak that I 
>>> can't pinpoint and it is only dissipated by restarting the server. At the 
>>> moment my debugging level is set to INFO. Is there anything I can change in 
>>> the universe file to try to trace this?
>>> 
>>> Thanks!
>>> Shaun
>>> 
>>> 
>> 
>> Greg Von Kuster
>> Galaxy Development Team
>> g...@bx.psu.edu
>> 
>> 
>> 
>> 
>> 
> 
> 
> 
> 
> 

Greg Von Kuster
Galaxy Development Team
g...@bx.psu.edu






Re: [galaxy-dev] Fwd: Galaxy process

2011-03-10 Thread Brad Chapman
Greg;

[large data library performance]
> Security checks are performed on every active ( undeleted ) dataset
> within a data library as its contents are rendered upon opening. For
> the current user, every dataset is checked to determine if the user
> can access it, and if so, then checks are made to see if the user has
> permission to perform operations on the dataset that fall into the add
> / modify / manage permissions areas. These checks incur db hits for
> each dataset, so if your data library includes many datasets ( several
> hundred or more ), then it will take a bit of time to render it upon
> opening. The size of the dataset files is not an issue here, but the
> number of datasets within the data library.

We are running into this limit quite a bit in practice as our data libraries
grow. Splitting them does provide a quick workaround. What would you
think about loading folder data on-demand via Ajax? Our data is
stored in folders and sub-folders within the library, so this would
let us scale up library items without having to arbitrarily split
the data libraries.

I took a quick look at the code with this in mind and it looked,
well, hard. But that could be totally due to my ignorance of the
implementation. What do you think?

Brad


Re: [galaxy-dev] Fwd: Galaxy process

2011-03-10 Thread SHAUN WEBB

Thanks Greg, there would be about 100 datasets in the library.

Can you tell me how I would pull a single changeset? I have only done
batch updates using the distribution repository before.


I'm just wondering why it has become so much slower since the latest  
update (I previously updated last December). Also wondering why the  
Galaxy process is now taking up 25% of the memory on a 64GB machine.


Thanks
Shaun

Quoting Greg Von Kuster :


> Hi Shaun,
>
> Security checks are performed on every active ( undeleted ) dataset
> within a data library as its contents are rendered upon opening.
> For the current user, every dataset is checked to determine if the
> user can access it, and if so, then checks are made to see if the
> user has permission to perform operations on the dataset that fall
> into the add / modify / manage permissions areas.  These checks
> incur db hits for each dataset, so if your data library includes many
> datasets ( several hundred or more ), then it will take a bit of
> time to render it upon opening.  The size of the dataset files is
> not an issue here, but the number of datasets within the data library.
>
> A solution to this is to split up the data library.  Change set
> 5200:ed7b6180b925 added the ability to move data library items
> within a library or between libraries, providing a way to split up a
> large library.  This change set should make it to the distribution
> within the next few weeks, or you can pull it from our development
> repo.
>
> Thanks Shaun,
>
> Greg Von Kuster
>
>
> On Mar 10, 2011, at 5:21 AM, SHAUN WEBB wrote:
>
>> Further to my email below.
>>
>> I have a data library that contains ~117 GB of NGS data,
>> uploaded via file system path. This library was always slow to open
>> (about 10s) but now takes several minutes or does not open at all.
>>
>> Thanks for any help on this. I am still experiencing a memory leak
>> that I can't pinpoint and it is only dissipated by restarting the
>> server. At the moment my debugging level is set to INFO. Is there
>> anything I can change in the universe file to try to trace this?
>>
>> Thanks!
>> Shaun
>
> Greg Von Kuster
> Galaxy Development Team
> g...@bx.psu.edu











Re: [galaxy-dev] Fwd: Galaxy process

2011-03-10 Thread Greg Von Kuster
Hi Shaun,

Security checks are performed on every active ( undeleted ) dataset within a 
data library as its contents are rendered upon opening.  For the current user, 
every dataset is checked to determine if the user can access it, and if so, 
then checks are made to see if the user has permission to perform operations on 
the dataset that fall into the add / modify / manage permissions areas.  These 
checks incur db hits for each dataset, so if your data library includes many 
datasets ( several hundred  or more ), then it will take a bit of time to 
render it upon opening.  What matters here is the number of datasets within 
the data library, not the size of the dataset files.
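
To make the scaling concrete, here is a small sketch that counts the
queries issued when access is checked one dataset at a time. It is only
an illustration: the dataset_access table, the role names, and the
build_library and render_library helpers are hypothetical and do not
reflect Galaxy's actual schema or code.

```python
import sqlite3


def build_library(num_datasets):
    """Create an in-memory table with one access row per dataset (hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dataset_access (dataset_id INTEGER, role TEXT)")
    conn.executemany(
        "INSERT INTO dataset_access VALUES (?, ?)",
        [(i, "lab_a" if i % 2 == 0 else "lab_b") for i in range(num_datasets)],
    )
    conn.commit()
    return conn


def render_library(conn, num_datasets, role):
    """Check access dataset by dataset, returning (visible ids, number of queries issued)."""
    queries = []
    conn.set_trace_callback(queries.append)  # record each SQL statement as it runs
    visible = []
    for dataset_id in range(num_datasets):
        row = conn.execute(
            "SELECT 1 FROM dataset_access WHERE dataset_id = ? AND role = ?",
            (dataset_id, role),
        ).fetchone()
        if row is not None:
            visible.append(dataset_id)
    conn.set_trace_callback(None)
    return visible, len(queries)


for size in (100, 600):
    visible, query_count = render_library(build_library(size), size, "lab_a")
    print("%d datasets: %d visible, %d queries" % (size, len(visible), query_count))
```

The query count grows with the number of datasets regardless of how
large the underlying files are.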

A solution to this is to split up the data library.  Change set 
5200:ed7b6180b925 added the ability to move data library items within a library 
or between libraries, providing a way to split up a large library.  This change 
set should make it to the distribution within the next few weeks, or you can 
pull it from our development repo.

Thanks Shaun,

Greg Von Kuster


On Mar 10, 2011, at 5:21 AM, SHAUN WEBB wrote:

> 
> Further to my email below.
> 
> I have a data library that contains ~117 GB of NGS data, uploaded via 
> file system path. This library was always slow to open (about 10s) but now 
> takes several minutes or does not open at all.
> 
> Thanks for any help on this. I am still experiencing a memory leak that I 
> can't pinpoint and it is only dissipated by restarting the server. At the 
> moment my debugging level is set to INFO. Is there anything I can change in 
> the universe file to try to trace this?
> 
> Thanks!
> Shaun
> 
> 

Greg Von Kuster
Galaxy Development Team
g...@bx.psu.edu






[galaxy-dev] Fwd: Galaxy process

2011-03-10 Thread SHAUN WEBB


Further to my email below.

I have a data library that contains ~117 GB of NGS data, uploaded
via file system path. This library was always slow to open (about 10s)
but now takes several minutes or does not open at all.

Thanks for any help on this. I am still experiencing a memory leak
that I can't pinpoint and it is only dissipated by restarting the
server. At the moment my debugging level is set to INFO. Is there
anything I can change in the universe file to try to trace this?


Thanks!
Shaun


- Forwarded message from swe...@staffmail.ed.ac.uk -
Date: Wed, 09 Mar 2011 10:15:13 +
From: SHAUN WEBB 
 Subject: Galaxy process
  To: galaxy dev 



Hi,

Since making the last update I have found some new warnings in my
paster.log. It also seems as though the Galaxy process starts to
gather memory and eventually hangs (35% of 64 GB memory).


I've posted the entries below.

If anyone could help me understand what is going on that would be great.

Thanks.
Shaun Webb



paste.httpserver.ThreadPool INFO 2011-03-09 09:49:49,962 No idle tasks, and only 0 busy tasks; adding 5 more workers
paste.httpserver.ThreadPool INFO 2011-03-09 09:49:58,754 No idle tasks, and only 4 busy tasks; adding 1 more workers
paste.httpserver.ThreadPool INFO 2011-03-09 09:51:47,301 Culling 6 extra workers (5 idle workers present)
paste.httpserver.ThreadPool INFO 2011-03-09 09:55:17,163 No idle tasks, and only 0 busy tasks; adding 5 more workers
129.215.14.72 - - [09/Mar/2011:09:48:40 +0100] "GET /history HTTP/1.1" 500 - "http://bifx-core.bio.ed.ac.uk:8080/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 (.NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
paste.httpserver.ThreadPool INFO 2011-03-09 10:13:16,956 Culling 5 extra workers (7 idle workers present)
212.183.140.59 - - [09/Mar/2011:10:13:17 +0100] "GET / HTTP/1.1" 200 - "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_6; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.102 Safari/534.13"
paste.httpserver.ThreadPool INFO 2011-03-09 10:14:35,715 No idle tasks, and only 2 busy tasks; adding 3 more workers
paste.httpserver.ThreadPool WARNING 2011-03-09 10:15:15,104 Thread 140283224094464 hung (working on task for 3096 seconds)


Exception happened during processing of request from ('212.183.140.59', 10871)
Traceback (most recent call last):
  File "/usr/lib/python2.6/SocketServer.py", line 281, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/storage/home/galaxy/galaxy_dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 1037, in process_request
    lambda: self.process_request_in_thread(request, client_address))
  File "/storage/home/galaxy/galaxy_dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 617, in add_task
    self.kill_hung_threads()
  File "/storage/home/galaxy/galaxy_dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 778, in kill_hung_threads
    self.kill_worker(worker.thread_id)
  File "/storage/home/galaxy/galaxy_dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 705, in kill_worker
    killthread.async_raise(thread_id, SystemExit)
  File "/storage/home/galaxy/galaxy_dist/eggs/Paste-1.6-py2.6.egg/paste/util/killthread.py", line 22, in async_raise
    raise ValueError("invalid thread id")
ValueError: invalid thread id



--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.



- End forwarded message -

