Re: [galaxy-dev] Fwd: Galaxy process
Nate;

[folder level security to speed up data library loading]
> My suggestion would be a solution somewhere in the middle. The ability
> to have per-dataset permissions is something that I think we should
> retain, but we could change our current policy of checking the
> permissions of the entire library at every load. Instead, it could work
> like this:
>
> 1. Check permissions on the library.
> 2. Check permissions on the first level contents of the library.
> 3. When a folder is expanded to show its contents, check the
>    permissions of that folder's contents via AJAX.
>
> The reason we didn't do this originally was to prevent folders from
> showing up if a user didn't have permission to access any of the
> datasets in that folder. But this can be worked around by setting
> access permission on the folder itself.
>
> This is probably a fair amount of work, though, since it means not
> loading subfolder contents at page load since we are not checking their
> security until later.

Agreed, the AJAX loading would be ideal and allow you to maintain full permissions. This would also allow scaling up to arbitrarily large data libraries, as long as they are nested within folders. It did look like a pretty big project upon digging into the code, but any modifications that allow larger libraries would definitely be appreciated.

Thanks,
Brad
___
Please keep all replies on the list by using "reply all" in your mail client. To manage your subscriptions to this and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/
Re: [galaxy-dev] Fwd: Galaxy process
SHAUN WEBB wrote:
>
> Further to my email below.
>
> I have a data library that contains ~117Gb of NGS data,
> uploaded via file system path. This library was always slow to open
> (about 10s) but now takes several minutes or not at all.
>
> Thanks for any help on this. I am still experiencing a memory leak
> that I can't pinpoint and it is only dissipated by restarting the
> server. At the moment my debugging level is set to INFO. Is there
> anything I can change in the universe file to try to trace this?

Two things that would help here:

Are there other users doing things in Galaxy at this time? It'd help to be able to determine exactly what is triggering this.

Also, if you set 'use_heartbeat = True' in universe_wsgi.ini, this will dump the call stack every 30 seconds to the file 'heartbeat.log' (and 'heartbeat.log.nonsleeping'). This should reveal where the thread(s) are hung.

--nate

>
> Thanks!
> Shaun
>
>
> - Forwarded message from swe...@staffmail.ed.ac.uk -
> Date: Wed, 09 Mar 2011 10:15:13 +
> From: SHAUN WEBB
> Subject: Galaxy process
> To: galaxy dev
>
> Hi,
>
> since making the last update I have found some new warnings in my
> paster.log, it also seems as though the galaxy process starts to
> gather memory and eventually hang (35% of 64G memory).
>
> I've posted the entries below.
>
> If anyone could help me understand what is going on that would be great.
>
> Thanks.
> Shaun Webb
>
> paste.httpserver.ThreadPool INFO 2011-03-09 09:49:49,962 No idle tasks, and only 0 busy tasks; adding 5 more workers
> paste.httpserver.ThreadPool INFO 2011-03-09 09:49:58,754 No idle tasks, and only 4 busy tasks; adding 1 more workers
> paste.httpserver.ThreadPool INFO 2011-03-09 09:51:47,301 Culling 6 extra workers (5 idle workers present)
> paste.httpserver.ThreadPool INFO 2011-03-09 09:55:17,163 No idle tasks, and only 0 busy tasks; adding 5 more workers
> 129.215.14.72 - - [09/Mar/2011:09:48:40 +0100] "GET /history HTTP/1.1" 500 - "http://bifx-core.bio.ed.ac.uk:8080/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 (.NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
> paste.httpserver.ThreadPool INFO 2011-03-09 10:13:16,956 Culling 5 extra workers (7 idle workers present)
> 212.183.140.59 - - [09/Mar/2011:10:13:17 +0100] "GET / HTTP/1.1" 200 - "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_6; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.102 Safari/534.13"
> paste.httpserver.ThreadPool INFO 2011-03-09 10:14:35,715 No idle tasks, and only 2 busy tasks; adding 3 more workers
> paste.httpserver.ThreadPool WARNING 2011-03-09 10:15:15,104 Thread 140283224094464 hung (working on task for 3096 seconds)
>
> Exception happened during processing of request from ('212.183.140.59', 10871)
> Traceback (most recent call last):
>   File "/usr/lib/python2.6/SocketServer.py", line 281, in _handle_request_noblock
>     self.process_request(request, client_address)
>   File "/storage/home/galaxy/galaxy_dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 1037, in process_request
>     lambda: self.process_request_in_thread(request, client_address))
>   File "/storage/home/galaxy/galaxy_dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 617, in add_task
>     self.kill_hung_threads()
>   File "/storage/home/galaxy/galaxy_dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 778, in kill_hung_threads
>     self.kill_worker(worker.thread_id)
>   File "/storage/home/galaxy/galaxy_dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 705, in kill_worker
>     killthread.async_raise(thread_id, SystemExit)
>   File "/storage/home/galaxy/galaxy_dist/eggs/Paste-1.6-py2.6.egg/paste/util/killthread.py", line 22, in async_raise
>     raise ValueError("invalid thread id")
> ValueError: invalid thread id
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
>
> - End forwarded message -
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.
Re: [galaxy-dev] Fwd: Galaxy process
Brad Chapman wrote:
>
> > We made a decision when we implemented libraries that we would provide
> > fine grained security at the dataset level. The trade-off, obviously,
> > is that it takes time to check every dataset. Another approach to
> > solve this would be to not provide as fine-grained security, and have
> > security at the folder level rather than the dataset level. If this is
> > done, all datasets within a folder would be required to have the same
> > security. What are your thoughts on this approach?
>
> This sounds like a very reasonable trade off. We go one further and
> use Data Library level security, so everything within a library has
> the same permissions, but I can definitely see how having the
> ability to control it at the folder level would be useful.

My suggestion would be a solution somewhere in the middle. The ability to have per-dataset permissions is something that I think we should retain, but we could change our current policy of checking the permissions of the entire library at every load. Instead, it could work like this:

1. Check permissions on the library.
2. Check permissions on the first level contents of the library.
3. When a folder is expanded to show its contents, check the permissions of that folder's contents via AJAX.

The reason we didn't do this originally was to prevent folders from showing up if a user didn't have permission to access any of the datasets in that folder. But this can be worked around by setting access permission on the folder itself.

This is probably a fair amount of work, though, since it means not loading subfolder contents at page load since we are not checking their security until later.

--nate

>
> Brad
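The three steps proposed in this message can be sketched in a few lines of Python. This is an illustration only, not Galaxy code: `Item`, `PermissionChecker`, `open_library` and `expand_folder` are hypothetical names, and permissions are reduced to a set of role names per item. The point it tries to show is that opening a library costs one check per first-level item, and a folder's contents are checked only when that folder is expanded (the work an AJAX request would trigger).

```python
class Item(object):
    """A library item (folder or dataset) with the roles allowed to access it.

    Hypothetical model: an empty role set means the item is public.
    """

    def __init__(self, name, access_roles=(), children=None):
        self.name = name
        self.access_roles = set(access_roles)
        self.children = children  # None for a dataset, a list for a folder


class PermissionChecker(object):
    """Counts permission checks so the cost of each step is visible."""

    def __init__(self, user_roles):
        self.user_roles = set(user_roles)
        self.checks = 0

    def can_access(self, item):
        self.checks += 1
        return not item.access_roles or bool(item.access_roles & self.user_roles)

    def visible_children(self, folder):
        """Check only the direct contents of one folder."""
        return [child for child in folder.children if self.can_access(child)]


def open_library(checker, library):
    """Steps 1 and 2: check the library, then its first-level contents only."""
    if not checker.can_access(library):
        return []
    return checker.visible_children(library)


def expand_folder(checker, folder):
    """Step 3: what a request for one folder's contents would run."""
    return checker.visible_children(folder)


# Toy library: one open folder, one folder restricted at the folder level.
library = Item("NGS", children=[
    Item("run1", children=[Item("a.fastq"), Item("b.fastq", access_roles={"lab"})]),
    Item("private", access_roles={"admin"}, children=[Item("c.bam")]),
    Item("readme.txt"),
])
checker = PermissionChecker(user_roles={"lab"})
top_level = open_library(checker, library)
print([item.name for item in top_level], checker.checks)
contents = expand_folder(checker, top_level[0])
print([item.name for item in contents], checker.checks)
```

In this toy library, opening costs 4 checks and expanding `run1` adds 2 more. `c.bam` is never checked, because its parent folder `private` carries the access restriction itself, which is the workaround described in the message.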
Re: [galaxy-dev] Fwd: Galaxy process
Greg;

> I agree that implementing an ajax approach to rendering libraries may
> be a good solution, but also may be difficult. We'll analyze this a
> bit more and see if it will be reasonable.

Great -- thanks for taking a look.

> We made a decision when we implemented libraries that we would provide
> fine grained security at the dataset level. The trade-off, obviously,
> is that it takes time to check every dataset. Another approach to
> solve this would be to not provide as fine-grained security, and have
> security at the folder level rather than the dataset level. If this is
> done, all datasets within a folder would be required to have the same
> security. What are your thoughts on this approach?

This sounds like a very reasonable trade off. We go one further and use Data Library level security, so everything within a library has the same permissions, but I can definitely see how having the ability to control it at the folder level would be useful.

Brad
Re: [galaxy-dev] Fwd: Galaxy process
Hi Brad,

I agree that implementing an ajax approach to rendering libraries may be a good solution, but also may be difficult. We'll analyze this a bit more and see if it will be reasonable.

We made a decision when we implemented libraries that we would provide fine grained security at the dataset level. The trade-off, obviously, is that it takes time to check every dataset. Another approach to solve this would be to not provide as fine-grained security, and have security at the folder level rather than the dataset level. If this is done, all datasets within a folder would be required to have the same security. What are your thoughts on this approach?

Thanks Brad,

Greg

On Mar 10, 2011, at 10:07 AM, Brad Chapman wrote:

> Greg;
>
> [large data library performance]
>> Security checks are performed on every active ( undeleted ) dataset
>> within a data library as its contents are rendered upon opening. For
>> the current user, every dataset is checked to determine if the user
>> can access it, and if so, then checks are made to see if the user has
>> permission to perform operations on the dataset that fall into the add
>> / modify / manage permissions areas. These checks incur db hits for
>> each dataset, so if your data library includes many datasets ( several
>> hundred or more ), then it will take a bit of time to render it upon
>> opening. The size of the dataset files is not an issue here, but the
>> number of datasets within the data library.
>
> We are running into this limit quite a bit in practice as our data libraries
> grow. Splitting it does provide a quick workaround. What would you
> think about loading folder data on-demand via Ajax? Our data is
> stored in folders and sub-folders within the library, so this would
> let us scale up library items without having to arbitrarily split
> the data libraries.
>
> I took a quick look at the code with this in mind and it looked,
> well, hard. But that could be totally due to my ignorance of the
> implementation, what do you think?
>
> Brad

Greg Von Kuster
Galaxy Development Team
g...@bx.psu.edu
Re: [galaxy-dev] Fwd: Galaxy process
Hi Shaun,

On Mar 10, 2011, at 10:05 AM, SHAUN WEBB wrote:

> Thanks Greg, there would be about 100 datasets in the library.

If there are only about 100 datasets, then this is likely not the cause of the time taken to render the library. Whatever is going on with the environment that is slowing Galaxy may also be causing this.

> Can you tell me how I would pull a single changeset, I have only done batch
> updates using the distribution repository before.

You really can't pull a single change set, you just have to pull from the development repo in a batch update like you've done with the stable distribution.

> I'm just wondering why it has become so much slower since the latest update
> (I previously updated last December). Also wondering why the Galaxy process
> is now taking up 25% of the memory on a 64GB machine.

Yeah, we really need to find the cause of this, although I have no idea what it could be...

> Thanks
> Shaun
>
> Quoting Greg Von Kuster:
>
>> Hi Shaun,
>>
>> Security checks are performed on every active ( undeleted ) dataset within a
>> data library as its contents are rendered upon opening. For the current
>> user, every dataset is checked to determine if the user can access it, and
>> if so, then checks are made to see if the user has permission to perform
>> operations on the dataset that fall into the add / modify / manage
>> permissions areas. These checks incur db hits for each dataset, so if your
>> data library includes many datasets ( several hundred or more ), then it
>> will take a bit of time to render it upon opening. The size of the dataset
>> files is not an issue here, but the number of datasets within the data
>> library.
>>
>> A solution to this is to split up the data library. Change set
>> 5200:ed7b6180b925 added the ability to move data library items within a
>> library or between libraries, providing a way to split up a large library.
>> This change set should make it to the distribution within the next few
>> weeks, or you can pull it from our development repo.
>>
>> Thanks Shaun,
>>
>> Greg Von Kuster
>>
>> On Mar 10, 2011, at 5:21 AM, SHAUN WEBB wrote:
>>
>>> Further to my email below.
>>>
>>> I have a data library that contains ~117Gb of NGS data, uploaded via
>>> file system path. This library was always slow to open (about 10s) but now
>>> takes several minutes or not at all.
>>>
>>> Thanks for any help on this. I am still experiencing a memory leak that I
>>> can't pinpoint and it is only dissipated by restarting the server. At the
>>> moment my debugging level is set to INFO. Is there anything I can change in
>>> the universe file to try to trace this?
>>>
>>> Thanks!
>>> Shaun
>>
>> Greg Von Kuster
>> Galaxy Development Team
>> g...@bx.psu.edu
>
> --
> The University of Edinburgh is a charitable body, registered in
> Scotland, with registration number SC005336.

Greg Von Kuster
Galaxy Development Team
g...@bx.psu.edu
Re: [galaxy-dev] Fwd: Galaxy process
Greg;

[large data library performance]
> Security checks are performed on every active ( undeleted ) dataset
> within a data library as its contents are rendered upon opening. For
> the current user, every dataset is checked to determine if the user
> can access it, and if so, then checks are made to see if the user has
> permission to perform operations on the dataset that fall into the add
> / modify / manage permissions areas. These checks incur db hits for
> each dataset, so if your data library includes many datasets ( several
> hundred or more ), then it will take a bit of time to render it upon
> opening. The size of the dataset files is not an issue here, but the
> number of datasets within the data library.

We are running into this limit quite a bit in practice as our data libraries grow. Splitting it does provide a quick workaround. What would you think about loading folder data on-demand via Ajax? Our data is stored in folders and sub-folders within the library, so this would let us scale up library items without having to arbitrarily split the data libraries.

I took a quick look at the code with this in mind and it looked, well, hard. But that could be totally due to my ignorance of the implementation, what do you think?

Brad
Re: [galaxy-dev] Fwd: Galaxy process
Thanks Greg, there would be about 100 datasets in the library.

Can you tell me how I would pull a single changeset? I have only done batch updates using the distribution repository before.

I'm just wondering why it has become so much slower since the latest update (I previously updated last December). Also wondering why the Galaxy process is now taking up 25% of the memory on a 64GB machine.

Thanks
Shaun

Quoting Greg Von Kuster:

> Hi Shaun,
>
> Security checks are performed on every active ( undeleted ) dataset within a
> data library as its contents are rendered upon opening. For the current
> user, every dataset is checked to determine if the user can access it, and
> if so, then checks are made to see if the user has permission to perform
> operations on the dataset that fall into the add / modify / manage
> permissions areas. These checks incur db hits for each dataset, so if your
> data library includes many datasets ( several hundred or more ), then it
> will take a bit of time to render it upon opening. The size of the dataset
> files is not an issue here, but the number of datasets within the data
> library.
>
> A solution to this is to split up the data library. Change set
> 5200:ed7b6180b925 added the ability to move data library items within a
> library or between libraries, providing a way to split up a large library.
> This change set should make it to the distribution within the next few
> weeks, or you can pull it from our development repo.
>
> Thanks Shaun,
>
> Greg Von Kuster
>
> On Mar 10, 2011, at 5:21 AM, SHAUN WEBB wrote:
>
>> Further to my email below.
>>
>> I have a data library that contains ~117Gb of NGS data, uploaded via
>> file system path. This library was always slow to open (about 10s) but now
>> takes several minutes or not at all.
>>
>> Thanks for any help on this. I am still experiencing a memory leak that I
>> can't pinpoint and it is only dissipated by restarting the server. At the
>> moment my debugging level is set to INFO. Is there anything I can change in
>> the universe file to try to trace this?
>>
>> Thanks!
>> Shaun
>
> Greg Von Kuster
> Galaxy Development Team
> g...@bx.psu.edu

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
Re: [galaxy-dev] Fwd: Galaxy process
Hi Shaun,

Security checks are performed on every active ( undeleted ) dataset within a data library as its contents are rendered upon opening. For the current user, every dataset is checked to determine if the user can access it, and if so, then checks are made to see if the user has permission to perform operations on the dataset that fall into the add / modify / manage permissions areas. These checks incur db hits for each dataset, so if your data library includes many datasets ( several hundred or more ), then it will take a bit of time to render it upon opening. The size of the dataset files is not an issue here, but the number of datasets within the data library.

A solution to this is to split up the data library. Change set 5200:ed7b6180b925 added the ability to move data library items within a library or between libraries, providing a way to split up a large library. This change set should make it to the distribution within the next few weeks, or you can pull it from our development repo.

Thanks Shaun,

Greg Von Kuster

On Mar 10, 2011, at 5:21 AM, SHAUN WEBB wrote:

> Further to my email below.
>
> I have a data library that contains ~117Gb of NGS data, uploaded via
> file system path. This library was always slow to open (about 10s) but now
> takes several minutes or not at all.
>
> Thanks for any help on this. I am still experiencing a memory leak that I
> can't pinpoint and it is only dissipated by restarting the server. At the
> moment my debugging level is set to INFO. Is there anything I can change in
> the universe file to try to trace this?
>
> Thanks!
> Shaun

Greg Von Kuster
Galaxy Development Team
g...@bx.psu.edu
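To make the cost described in this message concrete, here is a rough sketch of why render time grows with the number of datasets rather than with their size. It is not Galaxy's schema or query logic: the two tables, the `'lab'` role and the function names are invented for illustration, and the real number of queries per dataset will differ. It only counts one access query per active dataset plus one query per add / modify / manage action for each accessible dataset.

```python
import sqlite3

ACTIONS = ("add", "modify", "manage")


def build_db(num_datasets):
    """Create an in-memory database where role 'lab' holds every permission."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE dataset (id INTEGER PRIMARY KEY, deleted INTEGER)")
    conn.execute("CREATE TABLE permission (dataset_id INTEGER, action TEXT, role TEXT)")
    for dataset_id in range(1, num_datasets + 1):
        conn.execute("INSERT INTO dataset VALUES (?, 0)", (dataset_id,))
        for action in ("access",) + ACTIONS:
            conn.execute(
                "INSERT INTO permission VALUES (?, ?, 'lab')", (dataset_id, action)
            )
    return conn


def has_permission(conn, dataset_id, action, role):
    """One database hit per (dataset, action) pair."""
    row = conn.execute(
        "SELECT 1 FROM permission WHERE dataset_id = ? AND action = ? AND role = ?",
        (dataset_id, action, role),
    ).fetchone()
    return row is not None


def render_library(conn, role):
    """Check every active dataset in turn and count the queries issued."""
    queries = 1  # the query that lists active (undeleted) datasets
    rows = conn.execute("SELECT id FROM dataset WHERE deleted = 0 ORDER BY id")
    dataset_ids = [row[0] for row in rows]
    rendered = []
    for dataset_id in dataset_ids:
        queries += 1
        if not has_permission(conn, dataset_id, "access", role):
            continue  # inaccessible datasets are skipped after one query
        allowed = []
        for action in ACTIONS:
            queries += 1
            if has_permission(conn, dataset_id, action, role):
                allowed.append(action)
        rendered.append((dataset_id, allowed))
    return rendered, queries


rendered, queries = render_library(build_db(100), "lab")
print(len(rendered), queries)
```

Under these assumptions a 100-dataset library issues 401 queries for a user who can access everything, and the count scales linearly with the number of datasets regardless of file size.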
[galaxy-dev] Fwd: Galaxy process
Further to my email below.

I have a data library that contains ~117Gb of NGS data, uploaded via file system path. This library was always slow to open (about 10s) but now takes several minutes or not at all.

Thanks for any help on this. I am still experiencing a memory leak that I can't pinpoint and it is only dissipated by restarting the server. At the moment my debugging level is set to INFO. Is there anything I can change in the universe file to try to trace this?

Thanks!
Shaun

- Forwarded message from swe...@staffmail.ed.ac.uk -
Date: Wed, 09 Mar 2011 10:15:13 +
From: SHAUN WEBB
Subject: Galaxy process
To: galaxy dev

Hi,

since making the last update I have found some new warnings in my paster.log, it also seems as though the galaxy process starts to gather memory and eventually hang (35% of 64G memory).

I've posted the entries below.

If anyone could help me understand what is going on that would be great.

Thanks.
Shaun Webb

paste.httpserver.ThreadPool INFO 2011-03-09 09:49:49,962 No idle tasks, and only 0 busy tasks; adding 5 more workers
paste.httpserver.ThreadPool INFO 2011-03-09 09:49:58,754 No idle tasks, and only 4 busy tasks; adding 1 more workers
paste.httpserver.ThreadPool INFO 2011-03-09 09:51:47,301 Culling 6 extra workers (5 idle workers present)
paste.httpserver.ThreadPool INFO 2011-03-09 09:55:17,163 No idle tasks, and only 0 busy tasks; adding 5 more workers
129.215.14.72 - - [09/Mar/2011:09:48:40 +0100] "GET /history HTTP/1.1" 500 - "http://bifx-core.bio.ed.ac.uk:8080/" "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.9.2.13) Gecko/20101203 Firefox/3.6.13 (.NET CLR 2.0.50727; .NET CLR 1.1.4322; .NET CLR 3.0.4506.2152; .NET CLR 3.5.30729)"
paste.httpserver.ThreadPool INFO 2011-03-09 10:13:16,956 Culling 5 extra workers (7 idle workers present)
212.183.140.59 - - [09/Mar/2011:10:13:17 +0100] "GET / HTTP/1.1" 200 - "-" "Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10_6_6; en-US) AppleWebKit/534.13 (KHTML, like Gecko) Chrome/9.0.597.102 Safari/534.13"
paste.httpserver.ThreadPool INFO 2011-03-09 10:14:35,715 No idle tasks, and only 2 busy tasks; adding 3 more workers
paste.httpserver.ThreadPool WARNING 2011-03-09 10:15:15,104 Thread 140283224094464 hung (working on task for 3096 seconds)

Exception happened during processing of request from ('212.183.140.59', 10871)
Traceback (most recent call last):
  File "/usr/lib/python2.6/SocketServer.py", line 281, in _handle_request_noblock
    self.process_request(request, client_address)
  File "/storage/home/galaxy/galaxy_dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 1037, in process_request
    lambda: self.process_request_in_thread(request, client_address))
  File "/storage/home/galaxy/galaxy_dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 617, in add_task
    self.kill_hung_threads()
  File "/storage/home/galaxy/galaxy_dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 778, in kill_hung_threads
    self.kill_worker(worker.thread_id)
  File "/storage/home/galaxy/galaxy_dist/eggs/Paste-1.6-py2.6.egg/paste/httpserver.py", line 705, in kill_worker
    killthread.async_raise(thread_id, SystemExit)
  File "/storage/home/galaxy/galaxy_dist/eggs/Paste-1.6-py2.6.egg/paste/util/killthread.py", line 22, in async_raise
    raise ValueError("invalid thread id")
ValueError: invalid thread id

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.

- End forwarded message -

--
The University of Edinburgh is a charitable body, registered in
Scotland, with registration number SC005336.
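The "Thread ... hung" warning in this log says a worker was stuck for 3096 seconds but not where. The `use_heartbeat` option suggested in the reply addresses that by dumping call stacks periodically. The general idea can be sketched with the Python standard library alone. This is not Galaxy's heartbeat implementation: `dump_stacks` and `start_heartbeat` are hypothetical names, and the 30-second default simply mirrors the interval mentioned in the reply.

```python
import sys
import threading
import time
import traceback


def dump_stacks():
    """Return a text dump of the current call stack of every live thread."""
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    for thread_id, frame in sys._current_frames().items():
        lines.append("Thread %s (%s):" % (thread_id, names.get(thread_id, "unknown")))
        lines.extend(entry.rstrip("\n") for entry in traceback.format_stack(frame))
        lines.append("")
    return "\n".join(lines)


def start_heartbeat(path, period=30.0):
    """Append a stack dump to `path` every `period` seconds from a daemon thread."""

    def loop():
        while True:
            with open(path, "a") as handle:
                handle.write(time.strftime("%Y-%m-%d %H:%M:%S") + "\n")
                handle.write(dump_stacks() + "\n")
            time.sleep(period)

    worker = threading.Thread(target=loop, name="heartbeat-sketch")
    worker.daemon = True
    worker.start()
    return worker
```

A thread that shows the same stack in many consecutive dumps is the likely place where a request is hung.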