[galaxy-dev] (Composite) Dataset Upload not Setting Metadata

2011-09-12 Thread Paniagua, Eric
Hi everyone,

I've been getting my feet wet with Galaxy development working to get some of 
the rexpression tools online, and I've run into a snag that I've traced back to 
a set_meta datatype method not being able to find a file from which it wants to 
extract metadata.  After reading the code, I believe this would also be a 
problem for non-composite datatypes.

The specific test case I've been looking at is uploading an affybatch file (and 
associated pheno file) using Galaxy's built-in upload tool and selecting the 
File Format manually (ie choosing affybatch in the dropdown).  I am using 
unmodified datatype definitions provided in lib/galaxy/datatypes/genetics.py 
and unmodified core Galaxy upload code as of 5955:949e4f5fa03a.  (I am also 
testing with modified versions, but I am able to reproduce and track this bug 
in the specified clean version).

The crux of the cause of error is that in JobWrapper.finish(), 
dataset.set_meta() is called (lib/galaxy/jobs/__init__.py:607) before the 
composite dataset uploaded files are moved (in a call to a Tool method 
self.tool.collect_associated_files(out_data, self.working_directory) on line 
670) from the job working directory to the final destination under 
config.file_path (which defaults to database/files).

In my test case, dataset.set_meta( overwrite = False ) eventually calls 
lib/galaxy/datatypes/genetics.py:Rexp.set_meta(dataset, **kwd).  As far as I 
can tell, the only ways to construct a path to a file (or the file) in a 
dataset without using hard-coded paths from external knowledge is to use the 
Dataset.get_file_name or Dataset.extra_files_path properties.  Unless 
explicitly told otherwise, both of these methods construct a path based on the 
Dataset.file_path class data member, whose value is set during Galaxy startup 
to config.file_path (default database/files).  However, at the time set_meta 
is called in this case, the files are not under config.file_path, but rather 
under the job working directory.  Attempting to open files from the dataset 
therefore fails when using these paths.  However, unless the job working 
directory is passed to set_meta or supplied when the underlying Dataset object 
is constructed, there doesn't appear to be a way for a Dataset method to access 
the currently running job (for instance, to get its job ID or working 
directory).  (The second option is in fact impossible: since the standard 
upload is asynchronous, the Dataset object is created (and persisted) before 
the Job that will process it is created.)
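
To make the mismatch concrete, here is a stripped-down illustration (a 
simplification for the sake of the argument, not Galaxy's actual Dataset code, 
and it ignores the hashed .../002/948/... subdirectories):

import os

class Dataset(object):
    # Class-level value, set once at startup from config.file_path.
    file_path = "database/files"

    def __init__(self, dataset_id):
        self.id = dataset_id

    @property
    def extra_files_path(self):
        return os.path.join(self.file_path, "dataset_%d_files" % self.id)

d = Dataset(2948818)
print(d.extra_files_path)   # database/files/dataset_2948818_files
# ...but while set_meta() runs, the uploaded components still live under
# <job_working_directory>/dataset_2948818_files, so opening anything at the
# path above fails.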

Thoughts?  This issue affects Rexp.set_peek also, as well as any other 
functions that may want to read data from the uploaded files before they are 
moved to their permanent location.  This is why, if you have an affybatch file and its 
associated pheno file and you test this on, say, the public Galaxy server at 
http://main.g2.bx.psu.edu/ you'll see that the peek info says (for example): 
##failed to find 
/galaxy/main_database/files/002/948/dataset_2948818_files/affybatch_test.pheno

It seems that if the current behavior of Dataset.file_path, Dataset.file_name, 
and Dataset.extra_files_path is part of the intended design of Galaxy, then 
methods like set_meta should be run after the files have been moved to 
config.file_path, so that they can set metadata based on the file contents.  It 
looks like this is intended to happen in at least some cases, judging from 
lib/galaxy/jobs/__init__.py:568-586.  However, in my tests that code is not 
kicking in because hda_tool_output is None.

Any clarification on what's happening here, what's supposed to be happening for 
setting metadata on (potentially composite) uploads, why dataset.set_meta() 
isn't already being called after the files are moved to config.file_path, or 
any insights on related Galaxy design decisions I may not know about or design 
constraints I may have missed would be very greatly appreciated.

I'd also be glad to provide further detail or test files upon request.

Thank you,
Eric Paniagua

PS: Further notes on passing the job working directory to set_meta or set_peek: 
I have successfully modified the code to do this for set_meta, since the call 
chain from dataset.set_meta() in JobWrapper.finish() down to Rexp.set_meta() 
accepts and forwards keyword argument dictionaries along the way.  However, 
set_peek does not accept arbitrary keyword arguments, which makes it harder to 
pass along the job working directory when needed without stepping on the toes 
of any other code.
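
For reference, the asymmetry looks roughly like this (signatures abbreviated 
from the datatype base class as I understand it; the job_working_directory 
keyword is the extra argument I'm threading through, not an existing parameter):

def set_meta(self, dataset, overwrite=True, **kwd):
    # **kwd is forwarded down the call chain, so an extra keyword such as
    # job_working_directory=... can ride along without touching every caller.
    pass

def set_peek(self, dataset, is_multi_byte=False):
    # Fixed signature: passing anything extra means editing each caller in turn.
    pass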

___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


Re: [galaxy-dev] (Composite) Dataset Upload not Setting Metadata

2011-09-13 Thread Paniagua, Eric
Hi all,

Can anyone tell me why JobWrapper.finish() moves the primary dataset file 
dataset_path.false_path to dataset_path.real_path (contingent on 
config.outputs_to_working_directory == True) but does not move the extra 
files?  (lib/galaxy/jobs/__init__.py:540-553)  It seems to me that if you want 
to move a dataset, you want to move the whole dataset, and that this logic 
should perhaps be factored out, for example into the galaxy.util module?

Why does class DatasetPath only account for the path to the primary file and 
not the path to the extra files?  It could be used to account for the extra 
files by path splitting as in my previously suggested bug fix, but only if that 
fix is correct.  It doesn't seem to be used for that purpose in the Galaxy code.
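
For example, the factored-out helper could be as small as this (hypothetical 
code; nothing like it exists in galaxy.util today, and the names are mine):

import os
import shutil

def move_dataset_files(false_path, real_path, extra_files_name):
    # Move the primary file and, if present, its extra-files directory
    # together so the two can never get out of sync.
    shutil.move(false_path, real_path)
    src_extra = os.path.join(os.path.dirname(false_path), extra_files_name)
    if os.path.isdir(src_extra):
        shutil.move(src_extra, os.path.join(os.path.dirname(real_path), extra_files_name))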

I look forward to an informative response.

Thanks,
Eric Paniagua


From: galaxy-dev-boun...@lists.bx.psu.edu [galaxy-dev-boun...@lists.bx.psu.edu] 
on behalf of Paniagua, Eric [epani...@cshl.edu]
Sent: Monday, September 12, 2011 7:37 PM
To: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] (Composite) Dataset Upload not Setting Metadata

Hello again,

It looks like the config.outputs_to_working_directory variable is intended to 
do something closely related, but setting it to either True or False does not 
in fact fix the problem.

The output path for files in a composite dataset upload (dataset.files_path) 
that is used in the tools/data_source/upload.xml tool is set to a path under 
the job working directory by lib/galaxy/tools/__init__.py:1519.  The preceding 
code (lines 1507-1516) selects the path for the primary file contingent on 
config.outputs_to_working_directory.

Why is the path set in line 1519 not contingent on 
config.outputs_to_working_directory?  Indeed, the following small change fixes 
the bug I'm observing:

diff -r 949e4f5fa03a lib/galaxy/tools/__init__.py
--- a/lib/galaxy/tools/__init__.py  Mon Aug 29 14:42:04 2011 -0400
+++ b/lib/galaxy/tools/__init__.py  Mon Sep 12 19:32:26 2011 -0400
@@ -1516,7 +1516,9 @@
             param_dict[name] = DatasetFilenameWrapper( hda )
             # Provide access to a path to store additional files
             # TODO: path munging for cluster/dataset server relocatability
-            param_dict[name].files_path = os.path.abspath(os.path.join( job_working_directory, "dataset_%s_files" % (hda.dataset.id) ))
+            #param_dict[name].files_path = os.path.abspath(os.path.join( job_working_directory, "dataset_%s_files" % (hda.dataset.id) ))
+            # This version should make it always follow the primary file
+            param_dict[name].files_path = os.path.abspath( os.path.join( os.path.split( param_dict[name].file_name )[0], "dataset_%s_files" % (hda.dataset.id) ))
             for child in hda.children:
                 param_dict[ "_CHILD___%s___%s" % ( name, child.designation ) ] = DatasetFilenameWrapper( child )
         for out_name, output in self.outputs.iteritems():

Would this break anything?

If that cannot be changed, would the best solution be to modify the upload tool 
so that it took care of this on its own?  That seems readily doable, but starts 
to decentralize control of data flow policy.

Please advise.

Thanks,
Eric Paniagua

From: galaxy-dev-boun...@lists.bx.psu.edu [galaxy-dev-boun...@lists.bx.psu.edu] 
on behalf of Paniagua, Eric [epani...@cshl.edu]
Sent: Monday, September 12, 2011 1:45 PM
To: galaxy-dev@lists.bx.psu.edu
Subject: [galaxy-dev] (Composite) Dataset Upload not Setting Metadata

Hi everyone,

I've been getting my feet wet with Galaxy development working to get some of 
the rexpression tools online, and I've run into a snag that I've traced back to 
a set_meta datatype method not being able to find a file from which it wants to 
extract metadata.  After reading the code, I believe this would also be a 
problem for non-composite datatypes.

The specific test case I've been looking at is uploading an affybatch file (and 
associated pheno file) using Galaxy's built-in upload tool and selecting the 
File Format manually (ie choosing affybatch in the dropdown).  I am using 
unmodified datatype definitions provided in lib/galaxy/datatypes/genetics.py 
and unmodified core Galaxy upload code as of 5955:949e4f5fa03a.  (I am also 
testing with modified versions, but I am able to reproduce and track this bug 
in the specified clean version).

The crux of the cause of error is that in JobWrapper.finish(), 
dataset.set_meta() is called (lib/galaxy/jobs/__init__.py:607) before the 
composite dataset uploaded files are moved (in a call to a Tool method 
self.tool.collect_associated_files(out_data, self.working_directory) on line 
670) from the job working directory to the final destination under 
config.file_path (which defaults to database/files).

In my test case, dataset.set_meta( overwrite = False ) eventually calls 
lib/galaxy/datatypes/genetics.py:Rexp.set_meta(dataset

Re: [galaxy-dev] uploading binary files checksum changes, Galaxy doing something to file?

2011-09-16 Thread Paniagua, Eric
Hi Leandro,

Is there an entry in your history for the upload?  What file format does it 
show?  Is there any chance your original file was zipped?  If Galaxy detected 
it as a zip file on upload, it may have unzipped it and taken the first file in 
it as the dataset.

That's at least the version of your problem that I've run into before.  
Specifying the file format manually (rather than choosing Auto-detect) may help 
if it's a similar problem.  I suspect the correct solution is to write a 
sniffer for your datatype to help ensure it is identified correctly by Galaxy, 
but I haven't tried this yet.
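
For what it's worth, a minimal sniffer along those lines might look like the 
following (the class name, extension, and magic bytes are made up for 
illustration; only the Binary base class is Galaxy's):

from galaxy.datatypes.binary import Binary

class MyBinaryFormat(Binary):
    file_ext = "mybin"

    def sniff(self, filename):
        # Accept the file only if it starts with our (made-up) magic number,
        # so auto-detect can't mistake it for a zip/text file and unpack it.
        try:
            with open(filename, "rb") as handle:
                return handle.read(4) == b"MYFM"
        except IOError:
            return False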

Best of luck,
Eric

From: galaxy-dev-boun...@lists.bx.psu.edu [galaxy-dev-boun...@lists.bx.psu.edu] 
on behalf of Leandro Hermida [soft...@leandrohermida.com]
Sent: Friday, September 16, 2011 9:42 AM
To: Galaxy Dev
Subject: [galaxy-dev] uploading binary files checksum changes,  Galaxy doing 
something to file?

Hi all,

We tried to find something in the docs and mailing list, but had no luck.  We
created a new datatype that is a straight subclass of Binary, and when we
upload such a file in the Galaxy UI and compare the checksums of the original
file and the file located in the Galaxy database/files/... directory, the
checksums are different!
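
(For reference, this is roughly how we compare them; the two paths below are
placeholders for the original file and the stored dataset file.)

import hashlib

def md5sum(path, chunk_size=1 << 20):
    digest = hashlib.md5()
    with open(path, "rb") as handle:
        for block in iter(lambda: handle.read(chunk_size), b""):
            digest.update(block)
    return digest.hexdigest()

print(md5sum("original.bin") == md5sum("database/files/000/dataset_1.dat"))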

What are we doing wrong?  We simply want Galaxy to upload the file and not
touch it at all.

regards,
Leandro
___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/



[galaxy-dev] 2 questions about Galaxy's functional testing support

2011-09-29 Thread Paniagua, Eric
Hi all,

I've read the Wiki page on Writing Functional Tests 
(http://wiki.g2.bx.psu.edu/Admin/Tools/Writing%20Tests) and I've been looking 
through test/base and test/functional and I am left with two questions:


  *   Is it possible to write a test to validate metadata directly on an 
(optionally composite) output dataset?  Everything described on the above page 
is file oriented.  I see that there is TwillTestCase.check_metadata_for_string, 
but as far as I can tell this is a bit nonspecific since it appears to just do 
a text search on the Edit page.  I don't yet fully understand the context in 
which tests run, but is there some way to access a live dataset's metadata 
directly, either as a dictionary or just as attributes?  Or even to get the 
actual dataset object?

  *   Does the test harness support retaining output files only for failed 
tests?  Ideally with a cap on how much output data to save.  If not, would this 
be difficult to configure?

Thanks,
Eric

___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


Re: [galaxy-dev] (Composite) Dataset Upload not Setting Metadata

2011-10-03 Thread Paniagua, Eric
Hi Dan,

Sure, here's the example where I discovered the bug (data files are not 
attached because of a 5MB limit on my email client; see 
http://main.g2.bx.psu.edu/u/paniag/h/metadata-bug-example for a history with 
the example dataset).  The datatype is AffyBatch (or probably anything derived 
from RexpBase) in lib/galaxy/datatypes/genetics.py.

  1.  On Galaxy main (http://main.g2.bx.psu.edu/) go to Get Data -> Upload File
  2.  Select affybatch under file format.
  3.  Choose the attached files for upload in their corresponding fields.
  4.  Select genome mm9.
  5.  Hit Execute.
  6.  Wait for job completion, which occurs successfully (i.e. the item ends up green).
  7.  Click to expand the new history item.
  8.  Note that the peek box displays an error message.

I further tested by trying to use the newly uploaded dataset with the reQC tool 
from the Rexpression library (I don't see it on the main site).  By looking into 
that code I realized metadata wasn't getting set, and (as detailed below) I 
tracked the problem through the upload tool, the (local) tool runner, the job 
wrapper, and so on, until I found where the metadata handling goes wrong.

Best,
Eric


From: Daniel Blankenberg [d...@bx.psu.edu]
Sent: Monday, October 03, 2011 2:00 PM
To: Paniagua, Eric
Cc: Nate Coraor; galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] (Composite) Dataset Upload not Setting Metadata

Hi Eric,

Can you provide an example of the composite datatype that is experiencing the 
metadata problems?

The <code> tag and tool hooks are deprecated, but there could still be some 
instances where they may need to be used.  However, dynamic options (see e.g. 
<options from_dataset="input1"> from tools/filters/extract_GFF_Features.xml) can 
handle many of the functions where code files are used for populating 
parameters.  Additionally, tool output actions (see e.g. 
tools/filters/cutWrapper.xml) can be used to set metadata (e.g. column 
assignments) on tool outputs without using hooks.  Please let us know if we can 
provide additional information.


Thanks for using Galaxy,

Dan


On Sep 30, 2011, at 1:15 PM, Paniagua, Eric wrote:

Hi Nate,

Thanks for your answers.  I will look into setting set_metadata_externally=True.

I've observed no impact (suggesting someone did error handling properly), but 
upload is the only tool I've tested it with so far.  I'll be doing more 
shortly, but going with the exec_after_process solution since I don't then need 
to modify core code here.
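
A hedged sketch of what that hook could look like (assuming, as the workaround 
implies, that exec_after_process fires after collect_associated_files() has 
moved the composite files; the signature follows the general shape of Galaxy 
tool hooks of that era, so treat it as illustrative rather than exact):

def exec_after_process(app, inp_data, out_data, param_dict, tool=None, stdout=None, stderr=None):
    # At this point the composite files should be under config.file_path,
    # so extra_files_path resolves correctly and metadata/peek can be set
    # from the actual file contents.
    for name, hda in out_data.items():
        hda.set_meta(overwrite=False)
        hda.set_peek()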

I'm glad to hear that abstraction layer is in the works.

Regarding the upload and general handling of compressed files, please refer to 
my response to Brent a short time ago, subject line "[galaxy-user] upload zip 
file to custom tool".  In particular, I pointed out that it doesn't work even 
with a manually set file type and why, at least in terms of code.  I was hoping 
you might know if it was that way on purpose and, if so, why.

Regarding deprecation of the <code> tag and of tool hooks (am I correct to 
understand that these are deprecated too?), there's an example at the top of my 
mind (since I'm going to start working on this today).  In the abstract, the 
scenario is dynamically populating the tool UI based on a computation over the 
contents of one or more input datasets.  To an extent, the <conditional> and 
<page> tags are helpful with this, but without the <code> tag I'm not clear on 
how to call out to code that inspects the datasets and returns, for the sake of 
simplicity, options for a single parameter.

My specific example has to do with constructing a microarray expression 
analysis pipeline, with one of my starting points being Ross Lazarus's 
Rexpression code.  I may be able to get around the issue by storing the 
relevant information in metadata (in a custom datatype's set_meta).  Not sure 
yet.  Of course, that relies on metadata being handled correctly.  If I run 
into something more specific, I'll start a new thread.

Thanks,
Eric


From: Nate Coraor [n...@bx.psu.edu]
Sent: Friday, September 30, 2011 11:03 AM
To: Paniagua, Eric
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] (Composite) Dataset Upload not Setting Metadata

Paniagua, Eric wrote:
Hi Nate,

Thank you for your response!  I am glad that it was you in particular who did 
respond, because I also have some questions about the way the upload tool 
handles compressed files and saw that you have opened several Issues related to 
this on the Galaxy bitbucket site.  First though, I'll fill you in on my 
further progress on the composite file issue.

As I mentioned in my original email, the trouble is that JobWrapper.finish() 
calls dataset.set_meta() before it calls collect_associated_files(), resulting in 
dataset.extra_files_path being inaccurate because the files haven't been moved 
yet from the job working directory.  This is all with 
set_metadata_externally=False.  (I haven't worked with setting metadata 
externally yet

Re: [galaxy-dev] (Composite) Dataset Upload not Setting Metadata

2011-10-03 Thread Paniagua, Eric
Hi Dan,

Regarding the tool output actions, could you point me to any documentation or 
additional good examples for handling tool output post-processing?

Thanks,
Eric


From: Daniel Blankenberg [d...@bx.psu.edu]
Sent: Monday, October 03, 2011 2:00 PM
To: Paniagua, Eric
Cc: Nate Coraor; galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] (Composite) Dataset Upload not Setting Metadata

Hi Eric,

Can you provide an example of the composite datatype that is experiencing the 
metadata problems?

The <code> tag and tool hooks are deprecated, but there could still be some 
instances where they may need to be used.  However, dynamic options (see e.g. 
<options from_dataset="input1"> from tools/filters/extract_GFF_Features.xml) can 
handle many of the functions where code files are used for populating 
parameters.  Additionally, tool output actions (see e.g. 
tools/filters/cutWrapper.xml) can be used to set metadata (e.g. column 
assignments) on tool outputs without using hooks.  Please let us know if we can 
provide additional information.


Thanks for using Galaxy,

Dan


On Sep 30, 2011, at 1:15 PM, Paniagua, Eric wrote:

Hi Nate,

Thanks for your answers.  I will look into setting set_metadata_externally=True.

I've observed no impact (suggesting someone did error handling properly), but 
upload is the only tool I've tested it with so far.  I'll be doing more 
shortly, but going with the exec_after_process solution since I don't then need 
to modify core code here.

I'm glad to hear that abstraction layer is in the works.

Regarding the upload and general handling of compressed files, please refer to 
my response to Brent a short time ago, subject line "[galaxy-user] upload zip 
file to custom tool".  In particular, I pointed out that it doesn't work even 
with a manually set file type and why, at least in terms of code.  I was hoping 
you might know if it was that way on purpose and, if so, why.

Regarding deprecation of the <code> tag and of tool hooks (am I correct to 
understand that these are deprecated too?), there's an example at the top of my 
mind (since I'm going to start working on this today).  In the abstract, the 
scenario is dynamically populating the tool UI based on a computation over the 
contents of one or more input datasets.  To an extent, the <conditional> and 
<page> tags are helpful with this, but without the <code> tag I'm not clear on 
how to call out to code that inspects the datasets and returns, for the sake of 
simplicity, options for a single parameter.

My specific example has to do with constructing a microarray expression 
analysis pipeline, with one of my starting points being Ross Lazarus's 
Rexpression code.  I may be able to get around the issue by storing the 
relevant information in metadata (in a custom datatype's set_meta).  Not sure 
yet.  Of course, that relies on metadata being handled correctly.  If I run 
into something more specific, I'll start a new thread.

Thanks,
Eric


From: Nate Coraor [n...@bx.psu.edu]
Sent: Friday, September 30, 2011 11:03 AM
To: Paniagua, Eric
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] (Composite) Dataset Upload not Setting Metadata

Paniagua, Eric wrote:
Hi Nate,

Thank you for your response!  I am glad that it was you in particular who did 
respond, because I also have some questions about the way the upload tool 
handles compressed files and saw that you have opened several Issues related to 
this on the Galaxy bitbucket site.  First though, I'll fill you in on my 
further progress on the composite file issue.

As I mentioned in my original email, the trouble is that JobWrapper.finish() 
calls dataset.set_meta() before it calls collect_associated_files(), resulting in 
dataset.extra_files_path being inaccurate because the files haven't been moved 
yet from the job working directory.  This is all with 
set_metadata_externally=False.  (I haven't worked with setting metadata 
externally yet, but I think it is worth verifying whether everything works 
correctly for the case I pointed out when set_metadata_externally=True.)

Since my last email, I poked around a bit more and found that my suggested 
short patch was incorrect, or at least incomplete.  The core problem is that 
component files are not moved with the primary file, so I changed that (patch 
attached, relative to https://bitbucket.org/galaxy/galaxy-dist 
5955:949e4f5fa03a).  Early in JobWrapper.finish() the primary file is moved 
from the working directory to the appropriate directory under config.file_path.  
This patch uses the structure of the path naming convention to build the 
accurate path to the component files, and then moves them along with the 
primary file.  It's the least invasive (in terms of modifying Galaxy core code) 
potential fix I came up with, but since it relies explicitly on path structure 
and naming conventions I still think it's a bit of a hack

Re: [galaxy-dev] [Internal - Galaxy-dev #2159] (New) 2 questions about Galaxy's functional testing support

2011-11-30 Thread Paniagua, Eric
Hi Greg,

I appreciate your response!  Thanks for clarifying the capabilities and 
limitations of the current functional testing framework.  (BTW, did you mean 
"current" as in current on galaxy-dist, or also current on galaxy-central?) 
 If I can find the time, these are areas I would be interested in helping 
enhance.

Sorry my diction was unclear; by "output files" I meant the transient files 
that a running job is free to create and manipulate in its 
job_working_directory.  (I'm writing "transient" rather than "temporary" to 
avoid confusion with, e.g., files created with tempfile.mkstemp - although 
conceivably one might want to be able to inspect their contents on failure as 
well; actually saving them might be a very different problem.)

Normally, upon successful or failed job completion the job working directory 
and anything in it are erased.  For certain tools, it could be helpful in 
debugging if the developer (or Galaxy system administrator) were able to 
inspect their contents to see, for instance, if they are properly formed.

I can disable removal of job working directories globally, but then of course 
the job working directories also persist for successful jobs, which can add up 
to a lot of unnecessary storage (the reason they're deleted by default in the 
first place).

I'm not sure how work is divided on your team, but can you tell me (a) if the 
preceding paragraphs actually clarify anything for you, and (b) whether that 
issue is on the radar of your team and specifically on the radar of the primary 
developer(s) / maintainer(s) of the testing framework?

Thanks again for responding to my email.

Best,
Eric


From: Greg Von Kuster [g...@bx.psu.edu]
Sent: Wednesday, November 30, 2011 9:56 AM
To: Paniagua, Eric
Cc: galaxy-dev@lists.bx.psu.edu Dev
Subject: Re: [Internal - Galaxy-dev #2159] (New) [galaxy-dev] 2 questions about 
Galaxy's functional testing support

Hello Eric,


Submitted by epani...@cshl.edu

Hi all,

I've read the Wiki page on Writing Functional Tests 
(http://wiki.g2.bx.psu.edu/Admin/Tools/Writing%20Tests) and I've been looking 
through test/base and test/functional and I am left with two questions:

  *   Is it possible to write a test to validate metadata directly on an 
(optionally composite) output dataset?

I'm sure this is possible, but it would require enhancements to the current 
functional test framework.


Everything described on the above page is file oriented.  I see that there is 
TwillTestCase.check_metadata_for_string, but as far as I can tell this is a bit 
nonspecific since it appears to just do a text search on the Edit page.

This is correct.



I don't yet fully understand the context in which tests run, but is there some 
way to access a live dataset's metadata directly, either as a dictionary or 
just as attributes?  Or even to get the actual dataset object?


Not with the current functional test framework.  Doing this would require 
enhancements to the framework.


  *   Does the test harness support retaining output files only for failed 
tests?  Ideally with a cap on how much output data to save.  If not, would this 
be difficult to configure?


I'm not sure what you mean by "output files" in your question.  If you mean 
output datasets that result from running a functional test for a tool, then I 
believe there is no difference if the test passed or failed.


Thanks,
Eric


Greg Von Kuster
Galaxy Development Team
g...@bx.psu.edu




___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


Re: [galaxy-dev] [Internal - Galaxy-dev #2159] (New) 2 questions about Galaxy's functional testing support

2011-11-30 Thread Paniagua, Eric
Thanks for the quick replies again!  Yeah, from a technical standpoint such 
support is certainly doable.  My employer strongly discourages modifying the 
Galaxy code base too invasively (if at all), which is pretty fair since I'm not 
in a position to take responsibility for performing future Galaxy upgrades 
which may have messy merges as a consequence of my tinkering downstream of you. 
 That's primarily the reason I was curious about whether such features were in 
the works or at least on the horizon at the Galaxy Development Team proper.

Anyway, thanks for communicating.  Have a great day :)

Best,
Eric


From: Greg Von Kuster [g...@bx.psu.edu]
Sent: Wednesday, November 30, 2011 1:22 PM
To: Paniagua, Eric
Cc: galaxy-dev@lists.bx.psu.edu Dev
Subject: Re: [galaxy-dev] [Internal - Galaxy-dev #2159] (New) 2 questions about 
Galaxy's functional testing support

On Nov 30, 2011, at 11:11 AM, Paniagua, Eric wrote:

 Hi Greg,

 I appreciate your response!  Thanks for clarifying the capabilities and 
 limitations of the current functional testing framework.  (BTW, did you mean 
 current as in current on galaxy-dist or also current on galaxy-central 
 ?)  If I can find the time, these are areas I would be interested in helping 
 enhance.

Current on Galaxy central, very likely the same as Galaxy dist.


 Sorry my diction was unclear; by output files I meant the transient files 
 that a running job is free to create and manipulate in its 
 job_working_directory.  (I'm writing transient rather than temporary to 
 avoid confusion with, e.g. files created with tempfile.mkstemp - although 
 conceivably one might want to be able to inspect their contents on failure as 
 well, actually saving them might be a very different problem.)

 Normally, upon successful or failed job completion the job working directory 
 and anything in it are erased.  For certain tools, it could be helpful in 
 debugging if the developer (or Galaxy system administrator) were able to 
 inspect their contents to see, for instance, if they are properly formed.

 I can disable removal of job working directories globally, but then of course 
 the job working directories also persist for successful jobs, which can add 
 up to a lot of unnecessary storage (the reason they're deleted by default in 
 the first place).

 I'm not sure how work is divided on your team, but can you tell me (a) if the 
 preceding paragraphs actually clarify anything for you,

Yes, the functional test framework does not currently deal with anything in 
job_working_directory as far as I know.  I am not a primary tool developer on 
the development team, however, so there may be some peripheral test components 
working in this realm of which I am not aware.  My understanding of the use of 
the job_working_directory is that some files are moved out of it into permanent 
locations, while others are deleted upon job completion.  You should be able to 
enhance the job code to enable you to inspect certain elements of this 
directory during job processing, but I'm not sure how difficult this may be.


 and (b) whether that issue is on the radar of your team and specifically on 
 the radar of the primary developer(s) / maintainer(s) of the testing 
 framework?

This issue is not currently anywhere on the development team's radar.  Sorry if 
this is an inconvenience.


 Thanks again for responding to my email.

 Best,
 Eric

 
 From: Greg Von Kuster [g...@bx.psu.edu]
 Sent: Wednesday, November 30, 2011 9:56 AM
 To: Paniagua, Eric
 Cc: galaxy-dev@lists.bx.psu.edu Dev
 Subject: Re: [Internal - Galaxy-dev #2159] (New) [galaxy-dev] 2 questions 
 about Galaxy's functional testing support

 Hello Eric,


 Submitted by epani...@cshl.edu

 Hi all,

 I've read the Wiki page on Writing Functional Tests 
 (http://wiki.g2.bx.psu.edu/Admin/Tools/Writing%20Tests) and I've been looking 
 through test/base and test/functional and I am left with two questions:

  *   Is it possible to write a test to validate metadata directly on an 
 (optionally composite) output dataset?

 I'm sure this is possible, but it would require enhancements to the current 
 functional test framework.


 Everything described on the above page is file oriented.  I see that there is 
 TwillTestCase.check_metadata_for_string, but as far as I can tell this is a 
 bit nonspecific since it appears to just do a text search on the Edit page.

 This is correct.



 I don't yet fully understand the context in which tests run, but is there 
 some way to access a live dataset's metadata directly, either as a 
 dictionary or just as attributes?  Or even to get the actual dataset object?


 Not with the current functional test framework.  Doing this would require 
 enhancements to the framework.


  *   Does the test harness support retaining output files only for failed 
 tests?  Ideally with a cap on how much output data to save

Re: [galaxy-dev] [Internal - Galaxy-dev #2159] (New) 2 questions about Galaxy's functional testing support

2011-12-06 Thread Paniagua, Eric
Hi Nate,

That's awesome; thanks a million!

I just took a look at the auto generated emails on galaxy-commits; nicely done. 
 Even if it wasn't much work, I think the general benefit to the quality of 
Galaxy as a platform and to the developer community is palpable.

And definitely thanks for expressly letting me know; that is much appreciated.

Happy hacking,
Eric


From: Nate Coraor [n...@bx.psu.edu]
Sent: Tuesday, December 06, 2011 1:04 PM
To: Greg Von Kuster
Cc: Paniagua, Eric; galaxy-dev@lists.bx.psu.edu Dev
Subject: Re: [galaxy-dev] [Internal - Galaxy-dev #2159] (New) 2 questions about 
Galaxy's functional testing support

On Nov 30, 2011, at 6:35 PM, Greg Von Kuster wrote:

 Eric,

 We always welcome help from the Galaxy community, so if you are interested in 
 enhancing the Galaxy code, the best way to go about it is to create your own 
 fork off the Galaxy central repo in bitbucket, and when you have something to 
 contribute initiate a pull request for us to review and merge into the 
 central repo.  This way you don't have to deal with ongoing merges.

A bunch of people have asked recently and it wasn't much work, so I just added 
a new config param 'cleanup_job' that allows control over when job-related 
files are cleaned up (always, never, or only on job success).

--nate


 Thanks!

 On Nov 30, 2011, at 2:24 PM, Paniagua, Eric wrote:

 Thanks for the quick replies again!  Yeah, from a technical standpoint such 
 support is certainly doable.  My employer strongly discourages modifying the 
 Galaxy code base too invasively (if at all), which is pretty fair since I'm 
 not in a position to take responsibility for performing future Galaxy 
 upgrades which may have messy merges as a consequence of my tinkering 
 downstream of you.  That's primarily the reason I was curious about whether 
 such features were in the works or at least on the horizon at the Galaxy 
 Development Team proper.

 Anyway, thanks for communicating.  Have a great day :)

 Best,
 Eric

 
 From: Greg Von Kuster [g...@bx.psu.edu]
 Sent: Wednesday, November 30, 2011 1:22 PM
 To: Paniagua, Eric
 Cc: galaxy-dev@lists.bx.psu.edu Dev
 Subject: Re: [galaxy-dev] [Internal - Galaxy-dev #2159] (New) 2 questions 
 about Galaxy's functional testing support

 On Nov 30, 2011, at 11:11 AM, Paniagua, Eric wrote:

 Hi Greg,

 I appreciate your response!  Thanks for clarifying the capabilities and 
 limitations of the current functional testing framework.  (BTW, did you 
 mean current as in current on galaxy-dist or also current on 
 galaxy-central ?)  If I can find the time, these are areas I would be 
 interested in helping enhance.

 Current on Galaxy central, very likely the same as Galaxy dist.


 Sorry my diction was unclear; by output files I meant the transient files 
 that a running job is free to create and manipulate in its 
 job_working_directory.  (I'm writing transient rather than temporary to 
 avoid confusion with, e.g. files created with tempfile.mkstemp - although 
 conceivably one might want to be able to inspect their contents on failure 
 as well, actually saving them might be a very different problem.)

 Normally, upon successful or failed job completion the job working 
 directory and anything in it are erased.  For certain tools, it could be 
 helpful in debugging if the developer (or Galaxy system administrator) were 
 able to inspect their contents to see, for instance, if they are properly 
 formed.

 I can disable removal of job working directories globally, but then of 
 course the job working directories also persist for successful jobs, which 
 can add up to a lot of unnecessary storage (the reason they're deleted by 
 default in the first place).

 I'm not sure how work is divided on your team, but can you tell me (a) if 
 the preceding paragraphs actually clarify anything for you,

 Yes, the functional test framework does not currently deal with anything in 
 job_working_directory as far as I know.  I am not a primary tool developer 
 on the development team, however, so there may be some peripheral test 
 components working in this realm of which I am not aware.  My understanding 
 of the use of the job_working_directory is that some files are moved out of 
 it into permanent locations, while others are deleted upon job completion.  
 You should be able to enhance the job code to enable you to inspect certain 
 elements of this directory during job processing, but I'm not sure how 
 difficult this may be.


 and (b) whether that issue is on the radar of your team and specifically on 
 the radar of the primary developer(s) / maintainer(s) of the testing 
 framework?

 This issue is not currently anywhere on the development team's radar.  Sorry 
 if this is an inconvenience.


 Thanks again for responding to my email.

 Best,
 Eric

 
 From: Greg Von Kuster [g...@bx.psu.edu]
 Sent: Wednesday

Re: [galaxy-dev] Managing Data Locality

2013-11-08 Thread Paniagua, Eric
Hi John,

I was just wondering, did you have an object-store-based suggestion as well?  
Logically, this seems to be where this operation should be done, but I don't 
see much infrastructure to support it, such as logic for moving a data object 
between object stores.  (Incidentally, the release of Galaxy I'm running is 
from last April or May.  Would an upgrade to the latest and greatest version 
pull in more support infrastructure for this?)
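
(To sketch the kind of missing piece I have in mind, purely hypothetically, and 
assuming the object store interface's get_filename()/update_from_file()/delete() 
methods:)

def move_between_object_stores(dataset, src_store, dst_store):
    # Hypothetical helper, not existing Galaxy code: copy a dataset's file
    # from one object store into another, then drop it from the source.
    path = src_store.get_filename(dataset)
    dst_store.update_from_file(dataset, file_name=path, create=True)
    src_store.delete(dataset)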

Regarding your LWR suggestion, admittedly I have not yet read the docs you 
referred me to, but I thought a second email was warranted anyway.  We would in 
fact be using DRMAA to talk to the HPCC (this is being configured as I write), 
and Galaxy's long-term storage lives on our independent Galaxy server.  As 
I may have commented before, we can't simply mount our Galaxy file systems to 
the HPCC for security reasons.  To make the scenario even more concrete, we are 
currently using the DistributedObjectStore to balance Galaxy's storage 
requirements across three mounted volumes.  I don't expect this to complicate 
the task at hand, but please do let me know if you think it will.  We also 
currently have UGE set up on our Galaxy server, so we've already been using 
DRMAA to submit jobs.  The details for submission to another host are more 
complicated, though.

Does your LWR suggestion involve the use of scripts/drmaa_external_killer.py, 
scripts/drmaa_external_runner.py, and scripts/external_chown_script.py?  
(Particularly if so,) would you be so kind as to point me toward documentation 
for those scripts?  It's not clear to me from their source how they are 
intended to be used or at what stage of the job creation process they would be 
called by Galaxy.  The same applies also to the file_actions.json file you 
referred to previously.  Is that a Galaxy file or an LWR file?  Where may I 
find some documentation on the available configuration attributes, options, 
values, and semantics?  Does your LWR suggestion require that the same absolute 
path structure exist on both file systems (not much information is conveyed by 
the action name "copy")?  Does it instead require a certain relative path 
structure to match on both?  And how does setting that option lead to Galaxy 
setting the correct (HPCC-local) paths when building the command line?

Our goal is to submit all heavy jobs (e.g. mappers) to the HPCC as the user who 
launches the Galaxy job.  Both the HPCC and our Galaxy instance use LDAP 
logins, so fortunately that's one wrinkle we don't have to worry about.  This 
will help all involved maintain fair quota policies on a per-user basis.  I 
plan to handle the support files (genome indices) by transferring them to the 
HPCC and rewriting the appropriate *.loc files on our Galaxy host with HPCC 
paths.
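
(For the *.loc rewriting I'm picturing something as simple as the sketch below; 
the prefix paths are placeholders.)

def rewrite_loc(src_loc, dst_loc,
                local_prefix="/galaxy/indexes",        # placeholder local path
                remote_prefix="/hpcc/shared/indexes"):  # placeholder HPCC path
    # Copy a .loc file, swapping the local index prefix for the HPCC one.
    with open(src_loc) as fin, open(dst_loc, "w") as fout:
        for line in fin:
            fout.write(line.replace(local_prefix, remote_prefix))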

I appreciate your generous response to my first email, and hope to continue the 
conversation with this email.  Now, I will go RTFM for LWR. :)

Many thanks,
Eric


From: jmchil...@gmail.com [jmchil...@gmail.com] on behalf of John Chilton 
[chil...@msi.umn.edu]
Sent: Tuesday, November 05, 2013 11:58 AM
To: Paniagua, Eric
Cc: Galaxy Dev [galaxy-...@bx.psu.edu]
Subject: Re: [galaxy-dev] Managing Data Locality

Hey Eric,

I think what you are proposing would be a major development effort and
mirrors major development efforts ongoing. There are sort of ways to
do this already, with various trade-offs, and none particularly well
documented. So before undertaking this effort I would dig into some
alternatives.

If you are using PBS, the PBS runner contains some logic for
delegating to PBS for doing this kind of thing - I have never tried
it.

https://bitbucket.org/galaxy/galaxy-central/src/default/lib/galaxy/jobs/runners/pbs.py#cl-245

It may be possible to use a specially configured handler and the
Galaxy object store to stage files to a particular mount before
running jobs - not sure it makes sense in this case. It might be worth
looking into this (having the object store stage your files, instead
of solving it at the job runner level).

My recommendation however would be to investigate the LWR job runner.
There are a bunch of fairly recent developments to enable something
like what you are describing. For specificity lets say you are using
DRMAA to talk to some HPC cluster and Galaxy's file data is stored in
/galaxy/data on the galaxy web server but not on the HPC and there is
some scratch space (/scratch) that is mounted on both the Galaxy web
server and your HPC cluster.



I would stand up an LWR (http://lwr.readthedocs.org/en/latest/) server
right beside Galaxy on your web server. The LWR has a concept of
managers that sort of mirrors the concept of runners in Galaxy - see
the sample config for guidance on how to get it to talk with your
cluster. It could use DRMAA, torque command-line tools, or condor at
this time (I could add new methods e.g. PBS library if that would
help). 
https://bitbucket.org/jmchilton/lwr/src/default

Re: [galaxy-dev] Managing Data Locality

2013-11-08 Thread Paniagua, Eric
Hi John,

I have now read the top-level documentation for LWR, and gone through the 
sample configurations.  I would appreciate if you would answer a few technical 
questions for me.

1) How exactly is the staging_directory in server.ini.sample used?  Is that 
intended to be the (final) location at which to put files on the remote server? 
 How is the relative path structure under $GALAXY_ROOT/database/files handled?

2) What exactly does persistence_directory in server.ini.sample mean?  
Where should it be located, how will it be used?

3) What exactly does file_cache_dir in server.ini.sample mean?

4) Does LWR preserve some relative path (e.g. to GALAXY_ROOT) under the above 
directories?

5) Are files renamed when cached?  If so, are they eventually restored to their 
original names?

6) Is it possible to customize the DRMAA and/or qsub requests made by LWR, for 
example to include additional settings such as Project or a memory limit?  Is 
it possible to customize this on a case by case basis, rather than globally?

7) Are there any options for the queued_drmaa manager in 
job_managers.ini.sample which are not listed in that file?

8) What exactly are the differences between the queued_drmaa manager and the 
queued_cli manager?  Are there any options for the latter which are not in 
the job_managers.ini.sample file?

9) When I attempt to run LWR (not having completed all the mentioned 
preparation steps, namely without setting DRMAA_LIBRARY_PATH), I get a Seg 
fault.  Is this because it can't find DRMAA or is it potentially unrelated?  In 
the latter case, here's the error being output to the console:

./run.sh: line 65: 26277 Segmentation fault  paster serve server.ini $@

Lastly, a simple comment, hopefully helpful.  It would be nice if the LWR 
install docs at least mentioned the dependency of PyOpenSSL 0.13 (or later) on 
OpenSSL 0.9.8f (or later), maybe even with a comment that pip will listen to 
the environment variables CFLAGS and LDFLAGS in the event one is creating a 
local installation of the OpenSSL library for LWR to use.

Thank you for your time and assistance.

Best,
Eric

From: jmchil...@gmail.com [jmchil...@gmail.com] on behalf of John Chilton 
[chil...@msi.umn.edu]
Sent: Tuesday, November 05, 2013 11:58 AM
To: Paniagua, Eric
Cc: Galaxy Dev [galaxy-...@bx.psu.edu]
Subject: Re: [galaxy-dev] Managing Data Locality

Hey Eric,

I think what you are proposing would be a major development effort and
mirrors major development efforts ongoing. There are sort of ways to
do this already, with various trade-offs, and none particularly well
documented. So before undertaking this effort I would dig into some
alternatives.

If you are using PBS, the PBS runner contains some logic for
delegating to PBS for doing this kind of thing - I have never tried
it.

https://bitbucket.org/galaxy/galaxy-central/src/default/lib/galaxy/jobs/runners/pbs.py#cl-245

It may be possible to use a specially configured handler and the
Galaxy object store to stage files to a particular mount before
running jobs - not sure it makes sense in this case. It might be worth
looking into this (having the object store stage your files, instead
of solving it at the job runner level).

My recommendation however would be to investigate the LWR job runner.
There are a bunch of fairly recent developments to enable something
like what you are describing. For specificity lets say you are using
DRMAA to talk to some HPC cluster and Galaxy's file data is stored in
/galaxy/data on the galaxy web server but not on the HPC and there is
some scratch space (/scratch) that is mounted on both the Galaxy web
server and your HPC cluster.

I would stand up an LWR (http://lwr.readthedocs.org/en/latest/) server
right beside Galaxy on your web server. The LWR has a concept of
managers that sort of mirrors the concept of runners in Galaxy - see
the sample config for guidance on how to get it to talk with your
cluster. It could use DRMAA, torque command-line tools, or condor at
this time (I could add new methods e.g. PBS library if that would
help). 
https://bitbucket.org/jmchilton/lwr/src/default/job_managers.ini.sample?at=default

On the Galaxy side, I would then create a job_conf.xml file telling
certain HPC tools to be sent to the LWR. Be sure to enable the LWR
runner at the top (see advanced example config) and then add at least
one LWR destination.

<destinations>

  <destination id="lwr" runner="lwr">
    <param id="url">http://localhost:8913/</param>
    <!-- Leave Galaxy directory and data indices alone, assumes they
         are mounted in both places. -->
    <param id="default_file_action">none</param>
    <!-- Do stage everything in /galaxy/data though -->
    <param id="file_action_config">file_actions.json</param>
  </destination>

Then create a file_actions.json file in the Galaxy root directory
(structure of this file is subject to change, current json layout
doesn't feel very Galaxy-ish).

{"paths": [
{"path

[galaxy-dev] Problem using Galaxy with a PostgreSQL database on a remote host

2014-05-23 Thread Paniagua, Eric
Dear Galaxy Developers,

I've been banging my head against this one for a few days now.

I have two Galaxy instances.  One resides on a server called genomics, which 
also hosts the corresponding PostgreSQL installation.  The second also resides 
on genomics, but its database is hosted on wigserv5.

Based on the tests I just ran and code I just read, sqlalchemy (not Galaxy) is 
ignoring the hostname/port part of the database_connection string.  For 
reference, the connection strings I've tried are:

postgresql://glxeric:X@/glxeric?host=/tmp
postgresql://glxeric:xx...@wigserv5.cshl.edu/glxeric?host=/tmp
postgresql://glxeric:xx...@wigserv5.cshl.edu:5432/glxeric?host=/tmp
postgresql://glxeric:X@adgdgdfdflkhjfdhfkl/glxeric?host=/tmp

All of these appear to result in Galaxy connecting to the PostgreSQL 
installation on genomics, as determined by Galaxy schema version discrepancies 
and other constraints.  With each connection string, Galaxy starts up normally. 
 I force database activity by browsing saved histories.  It works every time.  
By all appearances, the second Galaxy instance is using the PostgreSQL database 
hosted on genomics, not on wigserv5.

All databases and roles exist, and the databases are populated.

When I comment out the database_connection line in universe_wsgi.ini, I get 
errors arising from the later configuration of PostgreSQL-specific Galaxy 
options, as expected.

I can connect to the database server on wigserv5 using "psql -h 
wigserv5.cshl.edu -d glxeric -U glxeric" from the server genomics.

Have you ever observed this behavior from Galaxy or sqlalchemy?

Thanks,
Eric

___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/


Re: [galaxy-dev] Problem using Galaxy with a PostgreSQL database on a remote host

2014-05-27 Thread Paniagua, Eric
Hey Dannon,

Thanks for pointing that out!  I missed it.  I am now connecting to the remote 
database.  I ran sh manage_db.sh upgrade and it upgraded from schema 114 to 
118 without error messages.  I then ran sh 
./scripts/migrate_tools/0010_tools.sh install_dependencies and received the 
following error:

Traceback (most recent call last):
  File "./scripts/migrate_tools/migrate_tools.py", line 21, in <module>
    app = MigrateToolsApplication( sys.argv[ 1 ] )
  File "/localdata1/galaxy/glxmaint/src/lib/tool_shed/galaxy_install/migrate/common.py", line 59, in __init__
    install_dependencies=install_dependencies )
  File "/localdata1/galaxy/glxmaint/src/lib/tool_shed/galaxy_install/install_manager.py", line 122, in __init__
    is_repository_dependency=is_repository_dependency )
  File "/localdata1/galaxy/glxmaint/src/lib/tool_shed/galaxy_install/install_manager.py", line 506, in install_repository
    is_repository_dependency=is_repository_dependency )
  File "/localdata1/galaxy/glxmaint/src/lib/tool_shed/galaxy_install/install_manager.py", line 345, in handle_repository_contents
    guid = self.get_guid( repository_clone_url, relative_install_dir, tool_config )
  File "/localdata1/galaxy/glxmaint/src/lib/tool_shed/galaxy_install/install_manager.py", line 253, in get_guid
    tool = self.toolbox.load_tool( full_path )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/__init__.py", line 671, in load_tool
    return ToolClass( config_file, root, self.app, guid=guid, repository_id=repository_id, **kwds )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/__init__.py", line 1045, in __init__
    self.parse( root, guid=guid )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/__init__.py", line 1260, in parse
    self.parse_inputs( root )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/__init__.py", line 1351, in parse_inputs
    display, inputs = self.parse_input_page( page, enctypes )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/__init__.py", line 1655, in parse_input_page
    inputs = self.parse_input_elem( input_elem, enctypes )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/__init__.py", line 1723, in parse_input_elem
    case.inputs = self.parse_input_elem( case_elem, enctypes, context )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/__init__.py", line 1679, in parse_input_elem
    group.inputs = self.parse_input_elem( elem, enctypes, context )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/__init__.py", line 1751, in parse_input_elem
    param = self.parse_param_elem( elem, enctypes, context )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/__init__.py", line 1764, in parse_param_elem
    param = ToolParameter.build( self, input_elem )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/parameters/basic.py", line 215, in build
    return parameter_types[param_type]( tool, param )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/parameters/basic.py", line 1566, in __init__
    ToolParameter.__init__( self, tool, elem )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/parameters/basic.py", line 54, in __init__
    self.validators.append( validation.Validator.from_element( self, elem ) )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/parameters/validation.py", line 23, in from_element
    return validator_types[type].from_element( param, elem )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/parameters/validation.py", line 283, in from_element
    tool_data_table = param.tool.app.tool_data_tables[ table_name ]
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/data/__init__.py", line 35, in __getitem__
    return self.data_tables.__getitem__( key )
KeyError: 'gatk_picard_indexes'

I fixed this by adding the appropriate entries to tool_data_table_conf.xml.  I 
then reran the migrate_tools command successfully.  However, my 
history_dataset_association table in the database was blown away at some point 
and is now completely empty.  Have you ever seen this before?

Thanks,
Eric

From: Dannon Baker [dannon.ba...@gmail.com]
Sent: Tuesday, May 27, 2014 7:40 AM
To: Paniagua, Eric
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Problem using Galaxy with a PostgreSQL database on a 
remote host

Hey Eric,

It looks like you have connection info for both tcp/ip connections and unix 
sockets in the connection strings.  If you're logging in using psql -h 
wigserv5.cshl.edu <snip>, then you only want the tcp/ip connection info.  Drop 
the ?host=/tmp off the third option you listed and I think you'll be up and 
running, so:

postgresql://glxeric:xx...@wigserv5.cshl.edu:5432/glxeric

-Dannon


On Sat, May 24, 2014 at 1:49 AM, Paniagua, Eric 
epani...@cshl.edu wrote:
Dear Galaxy Developers,

I've been banging my head against this one for a few days now.

I have two Galaxy instances.  One resides on a server called genomics, which 
also hosts the corresponding

Re: [galaxy-dev] Problem using Galaxy with a PostgreSQL database on a remote host

2014-05-27 Thread Paniagua, Eric
The dataset table is populated.  I looked at the SQL dump file I used to copy 
the database, and it has create table and copy into statements for 
history_dataset_association, but it looks like there may have been an error 
while executing them.  Trying to figure out how to get my data in...

From: Dannon Baker [dannon.ba...@gmail.com]
Sent: Tuesday, May 27, 2014 11:43 AM
To: Paniagua, Eric
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Problem using Galaxy with a PostgreSQL database on a 
remote host

On Tue, May 27, 2014 at 11:26 AM, Paniagua, Eric 
epani...@cshl.edu wrote:
Thanks for pointing that out!  I missed it.  I am now connecting to the remote 
database.  I ran sh manage_db.sh upgrade and it upgraded from schema 114 to 
118 without error messages.  I then ran sh 
./scripts/migrate_tools/0010_tools.sh install_dependencies and received the 
following error:

line 35, in __getitem__ return self.data_tables.__getitem__( key ) KeyError: 
'gatk_picard_indexes'

I fixed this by adding the appropriate entries to tool_data_table_conf.xml.  I 
then reran the migrate_tools command successfully.  However, now my 
history_dataset_association table in the database was blown away at some 
point.  The table is now completely empty.  Have you ever seen this before?

I have not seen the tool migration issue before, but it seems harmless.  The 
fact that your history_dataset_association table is empty is concerning if 
there was ever anything in it.  Can you verify that there are datasets in the 
same database that *should* be associated to a history?  It sounds like this 
galaxy instance has been used with different databases, and my hope is that the 
wires are simply crossed here and that there actually should not be any.

___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/


Re: [galaxy-dev] Problem using Galaxy with a PostgreSQL database on a remote host

2014-05-27 Thread Paniagua, Eric
I have created a fresh dump with

$ pg_dump -U galaxyprod galaxyprod

This time the import proceeded cleanly.

Further, using PostgreSQL 9.1, I no longer get the error regarding a read only 
database cursor and getting the next history item number.  I am currently 
running a test job to confirm that things are working as expected.  However, 
just the fact that this job is running is a very good sign.

From: Dannon Baker [dannon.ba...@gmail.com]
Sent: Tuesday, May 27, 2014 11:58 AM
To: Paniagua, Eric
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Problem using Galaxy with a PostgreSQL database on a 
remote host

Since the database has lost consistency, I'd really try a fresh pg_dump / 
import if that's possible.  If there's an error this time around, note it and 
send it on over and we can figure out where to go from there.


On Tue, May 27, 2014 at 11:48 AM, Paniagua, Eric 
epani...@cshl.edu wrote:
The dataset table is populated.  I looked at the SQL dump file I used to copy 
the database, and it has create table and copy into statements for 
history_dataset_association, but it looks like there may have been an error 
while executing them.  Trying to figure out how to get  my data in...

From: Dannon Baker [dannon.ba...@gmail.com]
Sent: Tuesday, May 27, 2014 11:43 AM
To: Paniagua, Eric
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Problem using Galaxy with a PostgreSQL database on a 
remote host

On Tue, May 27, 2014 at 11:26 AM, Paniagua, Eric 
epani...@cshl.edu
 wrote:
Thanks for pointing that out!  I missed it.  I am now connecting to the remote 
database.  I ran sh manage_db.sh upgrade and it upgraded from schema 114 to 
118 without error messages.  I then ran sh 
./scripts/migrate_tools/0010_tools.sh install_dependencies and received the 
following error:

line 35, in __getitem__ return self.data_tables.__getitem__( key ) KeyError: 
'gatk_picard_indexes'

I fixed this by adding the appropriate entries to tool_data_table_conf.xml.  I 
then reran the migrate_tools command successfully.  However, now my 
history_dataset_association table in the database was blown away at some 
point.  The table is now completely empty.  Have you ever seen this before?

I have not seen the tool migration issue before, but it seems harmless.  The 
fact that your history_dataset_association table is empty is concerning if 
there was ever anything in it.  Can you verify that there are datasets in the 
same database that *should* be associated to a history?  It sounds like this 
galaxy instance has been used with different databases, and my hope is that the 
wires are crossed up here and there actually should not be any.


___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/


Re: [galaxy-dev] Problem using Galaxy with a PostgreSQL database on a remote host

2014-05-27 Thread Paniagua, Eric
Restarting the Galaxy server in multiple process mode appears to have helped.  
The test job is now running.

From: Paniagua, Eric
Sent: Tuesday, May 27, 2014 12:43 PM
To: Dannon Baker
Cc: galaxy-dev@lists.bx.psu.edu
Subject: RE: [galaxy-dev] Problem using Galaxy with a PostgreSQL database on a 
remote host

Correction.  The job has entered the "waiting to run" phase, and doesn't appear 
to be leaving it.  There is nothing of note in the server log.

From: Paniagua, Eric
Sent: Tuesday, May 27, 2014 12:41 PM
To: Dannon Baker
Cc: galaxy-dev@lists.bx.psu.edu
Subject: RE: [galaxy-dev] Problem using Galaxy with a PostgreSQL database on a 
remote host

I have created a fresh dump with

$ pg_dump -U galaxyprod galaxyprod

This time the import proceeded cleanly.

Further, using PostgreSQL 9.1, I no longer get the error regarding a read only 
database cursor and getting the next history item number.  I am currently 
running a test job to confirm that things are working as expected.  However, 
just the fact that this job is running is a very good sign.

From: Dannon Baker [dannon.ba...@gmail.com]
Sent: Tuesday, May 27, 2014 11:58 AM
To: Paniagua, Eric
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Problem using Galaxy with a PostgreSQL database on a 
remote host

Since the database has lost consistency, I'd really try a fresh pg_dump / 
import if that's possible.  If there's an error this time around, note it and 
send it on over and we can figure out where to go from there.


On Tue, May 27, 2014 at 11:48 AM, Paniagua, Eric 
epani...@cshl.edu wrote:
The dataset table is populated.  I looked at the SQL dump file I used to copy 
the database, and it has create table and copy into statements for 
history_dataset_association, but it looks like there may have been an error 
while executing them.  Trying to figure out how to get  my data in...

From: Dannon Baker [dannon.ba...@gmail.com]
Sent: Tuesday, May 27, 2014 11:43 AM
To: Paniagua, Eric
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Problem using Galaxy with a PostgreSQL database on a 
remote host

On Tue, May 27, 2014 at 11:26 AM, Paniagua, Eric 
epani...@cshl.edu
 wrote:
Thanks for pointing that out!  I missed it.  I am now connecting to the remote 
database.  I ran sh manage_db.sh upgrade and it upgraded from schema 114 to 
118 without error messages.  I then ran sh 
./scripts/migrate_tools/0010_tools.sh install_dependencies and received the 
following error:

line 35, in __getitem__ return self.data_tables.__getitem__( key ) KeyError: 
'gatk_picard_indexes'

I fixed this by adding the appropriate entries to tool_data_table_conf.xml.  I 
then reran the migrate_tools command successfully.  However, now my 
history_dataset_association table in the database was blown away at some 
point.  The table is now completely empty.  Have you ever seen this before?

I have not seen the tool migration issue before, but it seems harmless.  The 
fact that your history_dataset_association table is empty is concerning if 
there was ever anything in it.  Can you verify that there are datasets in the 
same database that *should* be associated to a history?  It sounds like this 
galaxy instance has been used with different databases, and my hope is that the 
wires are crossed up here and there actually should not be any.


___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/