Re: [galaxy-dev] Fresh galaxy installation tips/requests

2011-05-05 Thread Assaf Gordon
Yes, everything is working fine.
Those notes are mostly for possible future improvement ideas.

Thanks!


Re: [galaxy-dev] Fresh galaxy installation tips/requests

2011-05-05 Thread Jennifer Jackson

Hi Assaf,

Sorry for the delay in reply. Did you get help/resolve the problems you 
were having? Or do you still need help?


Thanks!

Jen
Galaxy team

On 3/30/11 3:12 PM, Assaf Gordon wrote:

Hi all,

It's been a long while since I had to install a fresh production galaxy server, 
and I can offer some tips or requests for minor improvements (but to give 
credit where it's due - installation is much smoother now, and DRMAA works great 
with SGE out of the box).

These are just minor annoyances, but they can help make new installations 
easier.

---

1. Debugging REMOTE_USER issues.
The mix of apache + authentication + non-root URL + load balancing is a killer.
Googling "mod_rewrite + mod_auth" shows that I'm not the only one having 
problems with it...

Adding these two lines at line 79 of 
"./lib/galaxy/web/framework/middleware/remoteuser.py" gives you a wonderful 
debugging tool that shows what Apache is passing on to Galaxy:

   # assumes "sys" is already imported in remoteuser.py; if not, add "import sys"
   for k, v in environ.items():
       sys.stderr.write( "%s:\t%s\n" % ( k, v ) )

I needed to print all environment variables (not just HTTP_REMOTE_USER) so I 
could debug mod_rewrite's variables setup.

If there were an easy way to enable this debug output from the INI file, it 
would be a great help.
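
One way such a switch could look is a small WSGI middleware that dumps the 
whole environ only when a flag is set. This is a rough sketch, not Galaxy 
code: the class name "EnvironDumper" and the "enabled" flag (standing in for 
a value read from the INI file) are made up for illustration.

```python
import io
import sys


class EnvironDumper:
    """WSGI middleware that writes every environ key/value to a stream.

    Hypothetical helper: "enabled" stands in for an INI setting.
    """

    def __init__(self, app, enabled=False, stream=None):
        self.app = app
        self.enabled = enabled
        self.stream = stream if stream is not None else sys.stderr

    def __call__(self, environ, start_response):
        if self.enabled:
            # Sorted so that the output is stable between requests.
            for key in sorted(environ):
                self.stream.write("%s:\t%s\n" % (key, environ[key]))
        return self.app(environ, start_response)


def demo_app(environ, start_response):
    """Tiny WSGI app used only to exercise the middleware."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"ok"]


def dump_for(environ, enabled=True):
    """Run one fake request and return (body, logged text)."""
    log = io.StringIO()
    wrapped = EnvironDumper(demo_app, enabled=enabled, stream=log)
    body = wrapped(environ, lambda status, headers: None)
    return body, log.getvalue()
```

With the flag off the wrapper only passes the request through, so it could 
stay in the middleware stack permanently.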

---

2. The file name "universe_wsgi.ini" is hard-coded in several places.
Most of them are Python scripts (like cleanup, amqp, etc.), but one is 
"./galaxy/eggs/init.py".

Not a real problem, except that the "WebApplicationScaling" wiki page kind of 
implies that "universe_wsgi.ini" is not used any more after creating the two 
separate INI files (universe_wsgi.runner.ini and universe_wsgi.webapp.ini).
If you rename "universe_wsgi.ini" instead of copying it, Galaxy will simply not 
start, even if given a different INI file name.

So the best recommendation (in the documentation) would probably be to maintain 
three identical INI files, except for the "[server:main]" part of each (and the 
"enable_job_running = False" in the runner).
Most importantly, make sure that the database connection string is identical in 
all of them.
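
A quick sanity check for that last point could look like the sketch below. 
It assumes the connection string lives under "[app:main]" as 
"database_connection"; adjust both names if your files differ. The function 
names are made up for illustration.

```python
import configparser


def connection_values(ini_texts, section="app:main",
                      option="database_connection"):
    """Map each INI name to its connection string (None if missing).

    ini_texts is a dict of {file name: file contents}. The section and
    option names are assumptions about the INI layout.
    """
    values = {}
    for name, text in ini_texts.items():
        # interpolation=None so that "%(here)s" style values are left alone.
        parser = configparser.ConfigParser(interpolation=None, strict=False)
        parser.read_string(text)
        values[name] = parser.get(section, option, fallback=None)
    return values


def all_identical(values):
    """True when every file reports the same connection string."""
    return len(set(values.values())) == 1
```

To use it, read the three files with open(path).read(), pass them in as a 
dict, and complain loudly if all_identical() returns False.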

As a side note, the "cp" command in the "WebApplicationScaling" wiki page is 
incorrect (you can't copy one file into two files).

It'll also be good to mention that the cleanup scripts will always use 
"universe_wsgi.ini" regardless of what the Galaxy server is using.

---

3. Saving SGE/PBS/DRMAA shell scripts if "debug=TRUE" in the jobrunner's INI 
file - this one is a gem, too bad I discovered it too late.
If only it was documented :)

---

4. INI keys that accept multiple comma-separated values (e.g. "admin_users") 
are sensitive to whitespace.
Example:
   admin_users = gor...@cshl.edu,foo...@cshl.edu
works fine, but:
   admin_users = gor...@cshl.edu , foo...@cshl.edu
will not work (spaces around the comma).

Same thing with "ucsc_display_sites":
   ucsc_display_sites = cshl,main
will work (and display both "cshl" and "main"),
but:
   ucsc_display_sites = cshl, main
will display only "cshl" (because "main" is " main" with a space).

It's probably easier to document it than to fix it, but either will do.
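
For what it's worth, the fix could be as small as trimming each item after 
the split. A sketch (the function name "split_csv_option" is made up; I have 
not checked how Galaxy actually parses these keys):

```python
def split_csv_option(value):
    """Split a comma-separated INI value, ignoring whitespace around items.

    Empty items (e.g. from a trailing comma) are dropped.
    """
    return [item.strip() for item in value.split(",") if item.strip()]
```

With this, "cshl,main" and "cshl, main" both give the same two site names.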

---

5. There's no "ensembl_display_sites" in the INI, so disabling the "display at 
ensembl" link (for BAM files, etc.) requires commenting out the corresponding 
"" tag in "datatypes_conf.xml", whereas for other servers (gbrowse, ucsc) it's 
enough to comment out the "ucsc_display_sites" in the INI.

---

6. For Galaxy servers that run inside an intranet without access to the 
main/test UCSC Genome Browser servers and want to use a local mirror (this 
might be an obscure setup, but that's what I have):

disabling both "main" and "test" in "ucsc_display_sites" will disable the 
authenticated user bypass, because "remoteuser.py" has them hard-coded before 
setting "self.allow_ucsc_main = True".

I needed to patch and add "cshl" as a third option - this is kind of ugly.
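
A less ugly variant might take the list of bypass sites from configuration 
instead of hard-coding it. The sketch below only illustrates the idea and 
does not mirror the real code in "remoteuser.py"; the function name and the 
"bypass_sites" parameter are hypothetical.

```python
def allow_ucsc_bypass(display_sites, bypass_sites=("main", "test")):
    """Return True if any configured display site is allowed to bypass auth.

    display_sites is the raw comma-separated "ucsc_display_sites" value;
    bypass_sites is a hypothetical, configurable replacement for the
    hard-coded "main" and "test".
    """
    configured = {s.strip() for s in display_sites.split(",") if s.strip()}
    return bool(configured & set(bypass_sites))
```

A local mirror would then only need its site name added to the configured 
list, e.g. bypass_sites=("main", "test", "cshl").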

---

Two development issues not related to production server installation:

1. Debugging complex Cheetah templates is very hard. If you have good tricks 
(beyond #silent and #breakpoint) - examples would be appreciated.
If there's a way to see the compiled cheetah code which produced the error - it 
would be great.
One problem is that the error is always reported without a line number (I guess 
because it's first converted into a string) and without the offending tag.

2. I only half-understand the new "tool_data_table_conf.xml" mechanism - is it 
explained somewhere?
It looks quite complicated, especially adding an inlined lambda for reading the 
"__app__" global variable inside the Cheetah code of each tool :(

---

And user interface quirks:

1. When managing workflows, the "configure your workflow menu" page is a bit 
uninformative: there's no title, header, instruction or anything else, and the 
button says "Submit Query", which doesn't mean much ("Update Workflow Menu" 
would be better).
If you click on "configure your workflow menu" and you don't have any 
workflows, the pag