[galaxy-dev] (Composite) Dataset Upload not Setting Metadata
Hi everyone,

I've been getting my feet wet with Galaxy development, working to get some of the Rexpression tools online, and I've run into a snag that I've traced back to a set_meta datatype method that cannot find the file from which it wants to extract metadata. After reading the code, I believe this would also be a problem for non-composite datatypes.

The specific test case I've been looking at is uploading an affybatch file (and associated pheno file) using Galaxy's built-in upload tool and selecting the File Format manually (i.e., choosing affybatch in the dropdown). I am using the unmodified datatype definitions provided in lib/galaxy/datatypes/genetics.py and unmodified core Galaxy upload code as of 5955:949e4f5fa03a. (I am also testing with modified versions, but I can reproduce and track this bug in the specified clean revision.)

The crux of the error is that in JobWrapper.finish(), dataset.set_meta() is called (lib/galaxy/jobs/__init__.py:607) before the uploaded composite dataset files are moved from the job working directory to their final destination under config.file_path (which defaults to database/files); the move happens later, in the call to the Tool method self.tool.collect_associated_files(out_data, self.working_directory) on line 670. In my test case, dataset.set_meta( overwrite = False ) eventually calls Rexp.set_meta(dataset, **kwd) in lib/galaxy/datatypes/genetics.py.

As far as I can tell, the only way to construct a path to a file in a dataset without hard-coding paths from external knowledge is to use the Dataset.get_file_name or Dataset.extra_files_path properties. Unless explicitly told otherwise, both construct a path based on the Dataset.file_path class data member, whose value is set during Galaxy startup to config.file_path (default database/files). However, at the time set_meta is called in this case, the files are not under config.file_path but under the job working directory, so attempting to open files from the dataset fails when using these paths. And unless the job working directory is passed to set_meta, or during construction of the underlying Dataset object, there doesn't appear to be a way for a Dataset method to access the currently running job (for instance, to get its job ID or working directory). (The second suggestion is actually not possible: since the standard upload is asynchronous, the Dataset object is created (and persisted) before the Job that will process it is created.) Thoughts?

This issue also affects Rexp.set_peek, as well as any other function that may want to read data from the uploaded files before they are moved to their permanent location. This is why, if you take an affybatch file and its associated pheno file and test this on, say, the public Galaxy server at http://main.g2.bx.psu.edu/, you'll see that the peek info says (for example):

##failed to find /galaxy/main_database/files/002/948/dataset_2948818_files/affybatch_test.pheno

If the current behavior of Dataset.file_path, Dataset.file_name, and Dataset.extra_files_path is part of the desired design of Galaxy, it seems that methods like set_meta should be run after the files have been moved to config.file_path, so they can set metadata based on file contents. From looking at lib/galaxy/jobs/__init__.py:568-586, this appears to be intended to happen at least in some cases; in my tests, however, that code does not kick in because hda_tool_output is None.
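[To make the path construction concrete, here is a simplified sketch of the behavior described above — not the actual Galaxy model code, which also hashes datasets into numbered subdirectories — showing how both path properties are rooted at the class-level file_path regardless of where the files currently sit:]

import os

class DatasetSketch(object):
    # Simplified stand-in for galaxy.model.Dataset. file_path is a *class*
    # attribute, set once at startup from config.file_path (default:
    # database/files); instances never consult the job working directory.
    file_path = "database/files"

    def __init__(self, dataset_id):
        self.id = dataset_id

    def get_file_name(self):
        # Primary file path, always rooted at the permanent storage area.
        return os.path.join(self.file_path, "dataset_%d.dat" % self.id)

    @property
    def extra_files_path(self):
        # The composite "extra files" directory is likewise rooted at
        # file_path, even while uploaded files still sit in the job
        # working directory.
        return os.path.join(self.file_path, "dataset_%d_files" % self.id)

[During upload the files actually live under the job working directory, so any set_meta that opens these paths fails until collect_associated_files() has run.]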
Any clarification on what's happening here, what's supposed to happen when setting metadata on (potentially composite) uploads, why dataset.set_meta() isn't already being called after the files are moved to config.file_path, or any insight into related Galaxy design decisions or constraints I may have missed would be very greatly appreciated. I'd also be glad to provide further detail or test files upon request.

Thank you,
Eric Paniagua

PS: Further notes on passing the job working directory to set_meta or set_peek: I have successfully modified the code to do this for set_meta, since the call chain from dataset.set_meta() in JobWrapper.finish() down to Rexp.set_meta() accepts and forwards keyword argument dictionaries along the way. However, set_peek does not accept arbitrary keyword arguments, making it harder to pass along the job working directory when needed without stepping on the toes of any other code.
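[To illustrate the workaround in the PS, a minimal sketch of the fallback pattern; the helper name and keyword argument are hypothetical, and the real fix threads the value through the existing **kwd chain from JobWrapper.finish():]

import os

def resolve_extra_files_path(dataset, job_working_directory=None):
    # Prefer the job working directory when a caller forwards it through
    # the set_meta keyword chain; otherwise fall back to the permanent
    # location, which is only valid after the files have been moved.
    if job_working_directory is not None:
        return os.path.join(job_working_directory,
                            "dataset_%s_files" % dataset.id)
    return dataset.extra_files_path

[A datatype's set_meta(self, dataset, **kwd) could then call resolve_extra_files_path(dataset, kwd.get('job_working_directory')) before opening the pheno file.]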
Re: [galaxy-dev] (Composite) Dataset Upload not Setting Metadata
Hi all,

Can anyone tell me why JobWrapper.finish() moves the primary dataset file dataset_path.false_path to dataset_path.real_path (contingent on config.outputs_to_working_directory == True) but does not move the extra files? (lib/galaxy/jobs/__init__.py:540-553) It seems to me that if you want to move a dataset, you want to move the whole dataset, and that perhaps this should be factored out, maybe into the galaxy.util module. Why does class DatasetPath only account for the path to the primary file and not the path to the extra files? It could be made to account for the extra files by path splitting, as in my previously suggested bug fix, but only if that fix is correct. It doesn't seem to be used for that purpose in the Galaxy code.

I look forward to an informative response.

Thanks,
Eric Paniagua

From: galaxy-dev-boun...@lists.bx.psu.edu [galaxy-dev-boun...@lists.bx.psu.edu] on behalf of Paniagua, Eric [epani...@cshl.edu]
Sent: Monday, September 12, 2011 7:37 PM
To: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] (Composite) Dataset Upload not Setting Metadata

Hello again,

It looks like the config.outputs_to_working_directory variable is intended to do something closely related, but setting it to either True or False does not in fact fix the problem. The output path for files in a composite dataset upload (dataset.files_path), used by the tools/data_source/upload.xml tool, is set to a path under the job working directory by lib/galaxy/tools/__init__.py:1519. The preceding code (lines 1507-1516) selects the path for the primary file contingent on config.outputs_to_working_directory. Why is the path set on line 1519 not also contingent on config.outputs_to_working_directory? Indeed, the following small change fixes the bug I'm observing:

diff -r 949e4f5fa03a lib/galaxy/tools/__init__.py
--- a/lib/galaxy/tools/__init__.py	Mon Aug 29 14:42:04 2011 -0400
+++ b/lib/galaxy/tools/__init__.py	Mon Sep 12 19:32:26 2011 -0400
@@ -1516,7 +1516,9 @@
 param_dict[name] = DatasetFilenameWrapper( hda )
 # Provide access to a path to store additional files
 # TODO: path munging for cluster/dataset server relocatability
-param_dict[name].files_path = os.path.abspath(os.path.join( job_working_directory, "dataset_%s_files" % (hda.dataset.id) ))
+#param_dict[name].files_path = os.path.abspath(os.path.join( job_working_directory, "dataset_%s_files" % (hda.dataset.id) ))
+# This version should make it always follow the primary file
+param_dict[name].files_path = os.path.abspath( os.path.join( os.path.split( param_dict[name].file_name )[0], "dataset_%s_files" % (hda.dataset.id) ))
 for child in hda.children:
     param_dict[ "_CHILD___%s___%s" % ( name, child.designation ) ] = DatasetFilenameWrapper( child )
 for out_name, output in self.outputs.iteritems():

Would this break anything? If that cannot be changed, would the best solution be to modify the upload tool so that it takes care of this on its own? That seems readily doable, but starts to decentralize control of data flow policy. Please advise.
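[On the DatasetPath question above, a hypothetical sketch of what tracking the extra files alongside the primary file might look like; this class does not exist in Galaxy, it just illustrates the suggestion, assuming the sibling dataset_<id>_files naming convention:]

import os

class DatasetPathWithExtras(object):
    # Hypothetical variant of Galaxy's DatasetPath that also models the
    # extra-files directory next to each primary file location.
    def __init__(self, dataset_id, real_path, false_path=None):
        self.dataset_id = dataset_id
        self.real_path = real_path
        self.false_path = false_path

    def _extras_near(self, primary_path):
        # Extra files live in a dataset_<id>_files directory beside the
        # primary file, per the upload tool's convention.
        return os.path.join(os.path.split(primary_path)[0],
                            "dataset_%s_files" % self.dataset_id)

    @property
    def real_extra_files_path(self):
        return self._extras_near(self.real_path)

    @property
    def false_extra_files_path(self):
        return self._extras_near(self.false_path or self.real_path)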
Thanks,
Eric Paniagua

From: galaxy-dev-boun...@lists.bx.psu.edu [galaxy-dev-boun...@lists.bx.psu.edu] on behalf of Paniagua, Eric [epani...@cshl.edu]
Sent: Monday, September 12, 2011 1:45 PM
To: galaxy-dev@lists.bx.psu.edu
Subject: [galaxy-dev] (Composite) Dataset Upload not Setting Metadata

[The original message, reproduced in full above, is trimmed.]
Re: [galaxy-dev] uploading binary files checksum changes, Galaxy doing something to file?
Hi Leandro,

Is there an entry in your history for the upload? What file format does it show? Is there any chance your original file was zipped? If Galaxy detected it as a zip file on upload, it may have unzipped it and taken the first file in it as the dataset. That's at least the version of your problem that I've run into before. Specifying the file format manually (rather than choosing Auto-detect) may help if it's a similar problem. I suspect the correct solution is to write a sniffer for your datatype to help ensure Galaxy identifies it correctly, but I haven't tried this yet.

Best of luck,
Eric

From: galaxy-dev-boun...@lists.bx.psu.edu [galaxy-dev-boun...@lists.bx.psu.edu] on behalf of Leandro Hermida [soft...@leandrohermida.com]
Sent: Friday, September 16, 2011 9:42 AM
To: Galaxy Dev
Subject: [galaxy-dev] uploading binary files checksum changes, Galaxy doing something to file?

Hi all,

We tried to find something in the docs and mailing list, with no luck. We created a new datatype that is a straight subclass of Binary, and when we upload such a file in the Galaxy UI and compare checksums between the original file and the file located in the Galaxy database/files/... directory, the checksums are different! What are we doing wrong? We simply want Galaxy to upload the file and not touch it at all.

regards,
Leandro
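[For what it's worth, a sniffer for a Binary subclass can be quite short. A hedged sketch, assuming the format starts with a fixed magic number — MyBinary and MAGIC are made-up placeholders, and the type still needs to be registered in datatypes_conf.xml (including a <sniffer> entry) for auto-detect to try it:]

from galaxy.datatypes.binary import Binary

class MyBinary(Binary):
    # Placeholder custom binary datatype with a magic-number sniffer.
    file_ext = "mybin"
    MAGIC = "\x4d\x59\x42\x31"  # hypothetical 4-byte signature

    def sniff(self, filename):
        # Return True only if the file starts with our signature, so
        # auto-detect can identify the type without touching the file.
        try:
            with open(filename, "rb") as handle:
                return handle.read(4) == self.MAGIC
        except IOError:
            return False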
[galaxy-dev] 2 questions about Galaxy's functional testing support
Hi all,

I've read the Wiki page on Writing Functional Tests (http://wiki.g2.bx.psu.edu/Admin/Tools/Writing%20Tests) and I've been looking through test/base and test/functional, and I am left with two questions:

* Is it possible to write a test to validate metadata directly on an (optionally composite) output dataset? Everything described on the above page is file oriented. I see that there is TwillTestCase.check_metadata_for_string, but as far as I can tell this is a bit nonspecific, since it appears to just do a text search on the Edit page. I don't yet fully understand the context in which tests run, but is there some way to access a live dataset's metadata directly, either as a dictionary or just as attributes? Or even to get the actual dataset object?

* Does the test harness support retaining output files only for failed tests, ideally with a cap on how much output data to save? If not, would this be difficult to configure?

Thanks,
Eric
Re: [galaxy-dev] (Composite) Dataset Upload not Setting Metadata
Hi Dan,

Sure, here's the example where I discovered the bug (data files are not attached because of a 5MB limit on my email client; see http://main.g2.bx.psu.edu/u/paniag/h/metadata-bug-example for a history with the example dataset). The datatype is AffyBatch (or probably anything derived from RexpBase) in lib/galaxy/datatypes/genetics.py.

1. On Galaxy main (http://main.g2.bx.psu.edu/) go to Get Data -> Upload File.
2. Select affybatch under file format.
3. Choose the attached files for upload in their corresponding fields.
4. Select genome mm9.
5. Hit Execute.
6. Wait for job completion, which occurs successfully (i.e., the item ends up green).
7. Click to expand the new history item.
8. Note that the peek box displays an error message.

I further tested by trying to use the newly uploaded dataset with the reQC tool from the Rexpression library (I don't see it on the main site). By looking into that code I realized metadata wasn't getting set, and (as detailed below) I tracked the problem through the upload tool, (local) tool runner, job wrapper, etc., and discovered the metadata problem.

Best,
Eric

From: Daniel Blankenberg [d...@bx.psu.edu]
Sent: Monday, October 03, 2011 2:00 PM
To: Paniagua, Eric
Cc: Nate Coraor; galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] (Composite) Dataset Upload not Setting Metadata

Hi Eric,

Can you provide an example of the composite datatype that is experiencing the metadata problems? The code and tool hooks are deprecated, but there could still be some instances where they may need to be used. However, dynamic options (see e.g. <options from_dataset="input1"> in tools/filters/extract_GFF_Features.xml) can handle much of what code files are used for in populating parameters. Additionally, tool output actions (see e.g. tools/filters/cutWrapper.xml) can be used to set metadata (e.g. column assignments) on tool outputs without using hooks. Please let us know if we can provide additional information.

Thanks for using Galaxy,
Dan

On Sep 30, 2011, at 1:15 PM, Paniagua, Eric wrote:

Hi Nate,

Thanks for your answers. I will look into setting set_metadata_externally=True. I've observed no impact (suggesting someone did error handling properly), but upload is the only tool I've tested it with so far. I'll be doing more shortly, but going with the exec_after_process solution, since I don't then need to modify core code here. I'm glad to hear that abstraction layer is in the works.

Regarding the upload and general handling of compressed files, please refer to my response to Brent a short time ago, subject line [galaxy-user] upload zip file to custom tool. In particular, I pointed out that it doesn't work even with a manually set file type, and why, at least in terms of code. I was hoping you might know whether it is that way on purpose and, if so, why.

Regarding deprecation of the code tag and of tool hooks (am I correct to understand that these are deprecated too?), there's an example at the top of my mind (since I'm going to start working on this today). In the abstract, the scenario is dynamically populating the tool UI based on a computation on the contents of one or more input datasets. To an extent, the conditional and page tags are helpful with this, but without the code tag I'm not clear on how to call out to code that inspects the datasets and returns, for the sake of simplicity, options for a single parameter. My specific example has to do with constructing a microarray expression analysis pipeline, with one of my starting points being Ross Lazarus's Rexpression code.
I may be able to get around the issue by storing the relevant information in metadata (in a custom datatype's set_meta). Not sure yet. Of course, that relies on metadata being handled correctly. If I run into something more specific, I'll start a new thread.

Thanks,
Eric

From: Nate Coraor [n...@bx.psu.edu]
Sent: Friday, September 30, 2011 11:03 AM
To: Paniagua, Eric
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] (Composite) Dataset Upload not Setting Metadata

Paniagua, Eric wrote:

Hi Nate,

Thank you for your response! I am glad that it was you in particular who responded, because I also have some questions about the way the upload tool handles compressed files, and I saw that you have opened several Issues related to this on the Galaxy bitbucket site. First though, I'll fill you in on my further progress on the composite file issue. As I mentioned in my original email, the trouble is that JobWrapper.finish() calls dataset.set_meta() before it calls collect_associated_files(), resulting in dataset.extra_files_path being inaccurate because the files haven't been moved yet from the job working directory. This is all with set_metadata_externally=False. (I haven't worked with setting metadata externally yet
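[For reference, the exec_after_process hook mentioned above lives in a tool's <code> file. A minimal sketch, assuming the historical hook signature — check a working code-file example before relying on the exact parameter list:]

def exec_after_process(app, inp_data, out_data, param_dict, tool=None, stdout=None, stderr=None):
    # Runs after the job finishes, once outputs are in their final
    # location, so metadata detection can see the real files.
    for name, data in out_data.items():
        data.set_meta(overwrite=False)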
Re: [galaxy-dev] (Composite) Dataset Upload not Setting Metadata
Hi Dan,

Regarding the tool output actions, could you point me to any documentation or additional good examples for handling tool output post-processing?

Thanks,
Eric

From: Daniel Blankenberg [d...@bx.psu.edu]
Sent: Monday, October 03, 2011 2:00 PM
To: Paniagua, Eric
Cc: Nate Coraor; galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] (Composite) Dataset Upload not Setting Metadata

[Dan's reply and the beginning of the quoted thread are reproduced in full above and trimmed here; Eric's quoted progress report to Nate continues:]
As I mentioned in my original email, the trouble is that JobWrapper.finish() calls dataset.set_meta() before it calls collect_associated_files(), resulting in dataset.extra_files_path being inaccurate because the files haven't been moved yet from the job working directory. This is all with set_metadata_externally=False. (I haven't worked with setting metadata externally yet, but I think it is worth verifying whether everything works correctly in the case I pointed out when set_metadata_externally=True.)

Since my last email, I poked around a bit more and found that my suggested short patch was not so much incorrect as incomplete. The core problem is that component files are not moved with the primary file, so I changed that (patch attached, relative to https://bitbucket.org/galaxy/galaxy-dist 5955:949e4f5fa03a). Early in JobWrapper.finish(), the primary file is moved from the working directory to the appropriate directory under config.file_path. This patch uses the structure of the path naming convention to build the accurate path to the component files, and then moves them along with the primary file. It's the least invasive (in terms of modifying Galaxy core code) potential fix I came up with, but since it relies explicitly on path structure and naming conventions, I still think it's a bit of a hack
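[A sketch of the idea behind that patch, under the same naming-convention assumption; the helper is illustrative, not the attached patch itself:]

import os
import shutil

def move_dataset_with_extras(false_path, real_path):
    # Move the primary file, then the companion dataset_<id>_files
    # directory, derived from the primary file name via Galaxy's
    # dataset_<id>.dat / dataset_<id>_files convention.
    shutil.move(false_path, real_path)
    src_extras = false_path[:-len(".dat")] + "_files"
    dst_extras = real_path[:-len(".dat")] + "_files"
    if os.path.isdir(src_extras):
        shutil.move(src_extras, dst_extras)

[And on the output-actions question asked above, the pattern Dan points at in tools/filters/cutWrapper.xml attaches an <actions> block to an output. A hedged sketch of the general shape; the attribute names should be checked against that file:]

<outputs>
  <data format="tabular" name="out_file1">
    <actions>
      <action type="metadata" name="chromCol" default="1" />
    </actions>
  </data>
</outputs>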
Re: [galaxy-dev] [Internal - Galaxy-dev #2159] (New) 2 questions about Galaxy's functional testing support
Hi Greg,

I appreciate your response! Thanks for clarifying the capabilities and limitations of the current functional testing framework. (BTW, did you mean current as in current on galaxy-dist, or also current on galaxy-central?) If I can find the time, these are areas I would be interested in helping enhance.

Sorry my diction was unclear; by output files I meant the transient files that a running job is free to create and manipulate in its job_working_directory. (I'm writing transient rather than temporary to avoid confusion with, e.g., files created with tempfile.mkstemp - although conceivably one might want to be able to inspect their contents on failure as well; actually saving them might be a very different problem.) Normally, upon successful or failed job completion, the job working directory and anything in it are erased. For certain tools, it could be helpful in debugging if the developer (or Galaxy system administrator) were able to inspect their contents to see, for instance, whether they are properly formed. I can disable removal of job working directories globally, but then of course the job working directories also persist for successful jobs, which can add up to a lot of unnecessary storage (the reason they're deleted by default in the first place).

I'm not sure how work is divided on your team, but can you tell me (a) whether the preceding paragraphs actually clarify anything for you, and (b) whether that issue is on the radar of your team, and specifically on the radar of the primary developer(s)/maintainer(s) of the testing framework?

Thanks again for responding to my email.

Best,
Eric

From: Greg Von Kuster [g...@bx.psu.edu]
Sent: Wednesday, November 30, 2011 9:56 AM
To: Paniagua, Eric
Cc: galaxy-dev@lists.bx.psu.edu Dev
Subject: Re: [Internal - Galaxy-dev #2159] (New) [galaxy-dev] 2 questions about Galaxy's functional testing support

Hello Eric,

Submitted by epani...@cshl.edu:

> Hi all, I've read the Wiki page on Writing Functional Tests (http://wiki.g2.bx.psu.edu/Admin/Tools/Writing%20Tests) and I've been looking through test/base and test/functional and I am left with two questions:
> * Is it possible to write a test to validate metadata directly on an (optionally composite) output dataset?

I'm sure this is possible, but it would require enhancements to the current functional test framework.

> Everything described on the above page is file oriented. I see that there is TwillTestCase.check_metadata_for_string, but as far as I can tell this is a bit nonspecific since it appears to just do a text search on the Edit page.

This is correct.

> I don't yet fully understand the context in which tests run, but is there some way to access a live dataset's metadata directly, either as a dictionary or just as attributes? Or even to get the actual dataset object?

Not with the current functional test framework. Doing this would require enhancements to the framework.

> * Does the test harness support retaining output files only for failed tests? Ideally with a cap on how much output data to save. If not, would this be difficult to configure?

I'm not sure what you mean by output files in your question. If you mean output datasets that result from running a functional test for a tool, then I believe there is no difference if the test passed or failed.

> Thanks, Eric

Greg Von Kuster
Galaxy Development Team
g...@bx.psu.edu
Re: [galaxy-dev] [Internal - Galaxy-dev #2159] (New) 2 questions about Galaxy's functional testing support
Thanks for the quick replies again! Yeah, from a technical standpoint such support is certainly doable. My employer strongly discourages modifying the Galaxy code base too invasively (if at all), which is pretty fair, since I'm not in a position to take responsibility for performing future Galaxy upgrades, which may have messy merges as a consequence of my tinkering downstream of you. That's primarily the reason I was curious about whether such features were in the works, or at least on the horizon, at the Galaxy Development Team proper. Anyway, thanks for communicating. Have a great day :)

Best,
Eric

From: Greg Von Kuster [g...@bx.psu.edu]
Sent: Wednesday, November 30, 2011 1:22 PM
To: Paniagua, Eric
Cc: galaxy-dev@lists.bx.psu.edu Dev
Subject: Re: [galaxy-dev] [Internal - Galaxy-dev #2159] (New) 2 questions about Galaxy's functional testing support

On Nov 30, 2011, at 11:11 AM, Paniagua, Eric wrote:

> Hi Greg, I appreciate your response! [...] Sorry my diction was unclear; by output files I meant the transient files that a running job is free to create and manipulate in its job_working_directory. [...] can you tell me (a) if the preceding paragraphs actually clarify anything for you,

Yes, the functional test framework does not currently deal with anything in job_working_directory as far as I know. I am not a primary tool developer on the development team, however, so there may be some peripheral test components working in this realm of which I am not aware. My understanding of the use of the job_working_directory is that some files are moved out of it into permanent locations, while others are deleted upon job completion. You should be able to enhance the job code to enable you to inspect certain elements of this directory during job processing, but I'm not sure how difficult this may be.

> and (b) whether that issue is on the radar of your team and specifically on the radar of the primary developer(s) / maintainer(s) of the testing framework?

This issue is not currently anywhere on the development team's radar. Sorry if this is an inconvenience.

> Thanks again for responding to my email.
> Best, Eric

[The remainder of the quoted thread, reproduced in full above, is trimmed.]
Re: [galaxy-dev] [Internal - Galaxy-dev #2159] (New) 2 questions about Galaxy's functional testing support
Hi Nate,

That's awesome; thanks a million! I just took a look at the auto-generated emails on galaxy-commits; nicely done. Even if it wasn't much work, I think the general benefit to the quality of Galaxy as a platform and to the developer community is palpable. And definitely thanks for expressly letting me know; that is much appreciated.

Happy hacking,
Eric

From: Nate Coraor [n...@bx.psu.edu]
Sent: Tuesday, December 06, 2011 1:04 PM
To: Greg Von Kuster
Cc: Paniagua, Eric; galaxy-dev@lists.bx.psu.edu Dev
Subject: Re: [galaxy-dev] [Internal - Galaxy-dev #2159] (New) 2 questions about Galaxy's functional testing support

On Nov 30, 2011, at 6:35 PM, Greg Von Kuster wrote:

> Eric, We always welcome help from the Galaxy community, so if you are interested in enhancing the Galaxy code, the best way to go about it is to create your own fork of the Galaxy central repo in bitbucket, and when you have something to contribute, initiate a pull request for us to review and merge into the central repo. This way you don't have to deal with ongoing merges.

A bunch of people have asked recently and it wasn't much work, so I just added a new config param 'cleanup_job' that allows control over when job-related files are cleaned up (always, never, or only on job success).

--nate

> Thanks!

[The rest of the quoted thread, reproduced in full above, is trimmed.]
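[For anyone finding this thread later, the option Nate describes would be set in universe_wsgi.ini. A sketch; check the sample config for the exact accepted values:]

# When to clean up job-related files (e.g. job working directories):
# always, onsuccess, or never.
cleanup_job = onsuccess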
Re: [galaxy-dev] Managing Data Locality
Hi John,

I was just wondering, did you have an object store based suggestion as well? Logically, this seems to be where this operation should be done, but I don't see much infrastructure to support it, such as logic for moving a data object between object stores. (Incidentally, the release of Galaxy I'm running is from last April or May. Would an upgrade to the latest and greatest version pull in more support infrastructure for this?)

Regarding your LWR suggestion, admittedly I have not yet read the docs you referred me to, but I thought a second email was warranted anyway. We would in fact be using DRMAA to talk to the HPCC (this is being configured as I write), and Galaxy's long-term storage lives on our independent Galaxy server. As I may have commented before, we can't simply mount our Galaxy file systems on the HPCC for security reasons. To make the scenario even more concrete, we are currently using the DistributedObjectStore to balance Galaxy's storage requirements across three mounted volumes. I don't expect this to complicate the task at hand, but please do let me know if you think it will. We also currently have UGE set up on our Galaxy server, so we've already been using DRMAA to submit jobs. The details for submission to another host are more complicated, though.

Does your LWR suggestion involve the use of scripts/drmaa_external_killer.py, scripts/drmaa_external_runner.py, and scripts/external_chown_script.py? (Particularly if so,) would you be so kind as to point me toward documentation for those scripts? It's not clear to me from their source how they are intended to be used, or at what stage of the job creation process they would be called by Galaxy. The same applies to the file_actions.json file you referred to previously. Is that a Galaxy file or an LWR file? Where may I find some documentation on the available configuration attributes, options, values, and semantics? Does your LWR suggestion require that the same absolute path structure exist on both file systems (not much information is conveyed by the action name copy), or does it require a certain relative path structure to match on both file systems? How does setting that option lead to Galaxy setting the correct paths (local to the HPCC) when building the command line?

Our goal is to submit all heavy jobs (e.g. mappers) to the HPCC as the user who launches the Galaxy job. Both the HPCC and our Galaxy instance use LDAP logins, so fortunately that's one wrinkle we don't have to worry about. This will help all involved maintain fair quota policies on a per-user basis. I plan to handle the support files (genome indices) by transferring them to the HPCC and rewriting the appropriate *.loc files on our Galaxy host with HPCC paths.

I appreciate your generous response to my first email, and hope to continue the conversation with this email. Now, I will go RTFM for LWR. :)

Many thanks,
Eric

From: jmchil...@gmail.com [jmchil...@gmail.com] on behalf of John Chilton [chil...@msi.umn.edu]
Sent: Tuesday, November 05, 2013 11:58 AM
To: Paniagua, Eric
Cc: Galaxy Dev [galaxy-...@bx.psu.edu]
Subject: Re: [galaxy-dev] Managing Data Locality

Hey Eric,

I think what you are proposing would be a major development effort and mirrors major ongoing development efforts. There are sort of ways to do this already, with various trade-offs, and none particularly well documented. So before undertaking this effort I would dig into some alternatives.
If you are using PBS, the PBS runner contains some logic for delegating to PBS for doing this kind of thing - I have never tried it. https://bitbucket.org/galaxy/galaxy-central/src/default/lib/galaxy/jobs/runners/pbs.py#cl-245

It may be possible to use a specially configured handler and the Galaxy object store to stage files to a particular mount before running jobs - not sure it makes sense in this case. It might be worth looking into this (having the object store stage your files, instead of solving it at the job runner level).

My recommendation, however, would be to investigate the LWR job runner. There are a bunch of fairly recent developments to enable something like what you are describing. For specificity, let's say you are using DRMAA to talk to some HPC cluster, Galaxy's file data is stored in /galaxy/data on the Galaxy web server but not on the HPC, and there is some scratch space (/scratch) that is mounted on both the Galaxy web server and your HPC cluster. I would stand up an LWR (http://lwr.readthedocs.org/en/latest/) server right beside Galaxy on your web server. The LWR has a concept of managers that sort of mirrors the concept of runners in Galaxy - see the sample config for guidance on how to get it to talk with your cluster. It could use DRMAA, torque command-line tools, or condor at this time (I could add new methods, e.g. a PBS library, if that would help). https://bitbucket.org/jmchilton/lwr/src/default
Re: [galaxy-dev] Managing Data Locality
Hi John,

I have now read the top-level documentation for LWR and gone through the sample configurations. I would appreciate it if you would answer a few technical questions for me.

1) How exactly is the staging_directory in server.ini.sample used? Is it intended to be the (final) location at which to put files on the remote server? How is the relative path structure under $GALAXY_ROOT/database/files handled?

2) What exactly does persistence_directory in server.ini.sample mean? Where should it be located, and how will it be used?

3) What exactly does file_cache_dir in server.ini.sample mean?

4) Does LWR preserve some relative path (e.g. to GALAXY_ROOT) under the above directories?

5) Are files renamed when cached? If so, are they eventually restored to their original names?

6) Is it possible to customize the DRMAA and/or qsub requests made by LWR, for example to include additional settings such as a project or a memory limit? Is it possible to customize this on a case-by-case basis, rather than globally?

7) Are there any options for the queued_drmaa manager in job_managers.ini.sample which are not listed in that file?

8) What exactly are the differences between the queued_drmaa manager and the queued_cli manager? Are there any options for the latter which are not in the job_managers.ini.sample file?

9) When I attempt to run LWR (not having completed all the mentioned preparation steps, namely without setting DRMAA_LIBRARY_PATH), I get a seg fault. Is this because it can't find DRMAA, or is it potentially unrelated? In the latter case, here's the error being output to the console:

./run.sh: line 65: 26277 Segmentation fault      paster serve server.ini $@

Lastly, a simple comment, hopefully helpful. It would be nice if the LWR install docs at least mentioned the dependency of PyOpenSSL 0.13 (or later) on OpenSSL 0.9.8f (or later), maybe even with a comment that pip will listen to the environment variables CFLAGS and LDFLAGS in the event one is creating a local installation of the OpenSSL library for LWR to use.

Thank you for your time and assistance.

Best,
Eric

From: jmchil...@gmail.com [jmchil...@gmail.com] on behalf of John Chilton [chil...@msi.umn.edu]
Sent: Tuesday, November 05, 2013 11:58 AM
To: Paniagua, Eric
Cc: Galaxy Dev [galaxy-...@bx.psu.edu]
Subject: Re: [galaxy-dev] Managing Data Locality

[The first part of John's reply is quoted in full earlier in this thread and trimmed here; it continues:]
It could use DRMAA, torque command-line tools, or condor at this time (I could add new methods, e.g. a PBS library, if that would help): https://bitbucket.org/jmchilton/lwr/src/default/job_managers.ini.sample?at=default

On the Galaxy side, I would then create a job_conf.xml file telling certain HPC tools to be sent to the LWR. Be sure to enable the LWR runner at the top (see the advanced example config) and then add at least one LWR destination:

<destinations>
  <destination id="lwr" runner="lwr">
    <param id="url">http://localhost:8913/</param>
    <!-- Leave Galaxy directory and data indices alone, assumes they are mounted in both places. -->
    <param id="default_file_action">none</param>
    <!-- Do stage everything in /galaxy/data though -->
    <param id="file_action_config">file_actions.json</param>
  </destination>
</destinations>

Then create a file_actions.json file in the Galaxy root directory (the structure of this file is subject to change; the current json layout doesn't feel very Galaxy-ish):

{"paths": [ {"path
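[The "enable the LWR runner at the top" step would look roughly like this in job_conf.xml; the load path follows the advanced sample config of that era, but treat it as a sketch to verify against your Galaxy version:]

<plugins>
  <plugin id="lwr" type="runner" load="galaxy.jobs.runners.lwr:LwrJobRunner" />
</plugins>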
[galaxy-dev] Problem using Galaxy with a PostgreSQL database on a remote host
Dear Galaxy Developers,

I've been banging my head against this one for a few days now. I have two Galaxy instances. One resides on a server called genomics, which also hosts the corresponding PostgreSQL installation. The second also resides on genomics, but its database is hosted on wigserv5. Based on the tests I just ran and the code I just read, sqlalchemy (not Galaxy) is ignoring the hostname/port part of the database_connection string. For reference, the connection strings I've tried are:

postgresql://glxeric:X@/glxeric?host=/tmp
postgresql://glxeric:xx...@wigserv5.cshl.edu/glxeric?host=/tmp
postgresql://glxeric:xx...@wigserv5.cshl.edu:5432/glxeric?host=/tmp
postgresql://glxeric:X@adgdgdfdflkhjfdhfkl/glxeric?host=/tmp

All of these appear to result in Galaxy connecting to the PostgreSQL installation on genomics, as determined by Galaxy schema version discrepancies and other constraints. With each connection string, Galaxy starts up normally. I force database activity by browsing saved histories. It works every time. By all appearances, the second Galaxy instance is using the PostgreSQL database hosted on genomics, not on wigserv5. All databases and roles exist, and the databases are populated. When I comment out the database_connection line in universe_wsgi.ini, I get errors arising from the later configuration of PostgreSQL-specific Galaxy options, as expected. I can connect to the database server on wigserv5 using psql -h wigserv5.cshl.edu -d glxeric -U glxeric from the server genomics.

Have you ever observed this behavior from Galaxy or sqlalchemy?

Thanks,
Eric
Re: [galaxy-dev] Problem using Galaxy with a PostgreSQL database on a remote host
Hey Dannon,

Thanks for pointing that out! I missed it. I am now connecting to the remote database. I ran sh manage_db.sh upgrade and it upgraded from schema 114 to 118 without error messages. I then ran sh ./scripts/migrate_tools/0010_tools.sh install_dependencies and received the following error:

Traceback (most recent call last):
  File "./scripts/migrate_tools/migrate_tools.py", line 21, in <module>
    app = MigrateToolsApplication( sys.argv[ 1 ] )
  File "/localdata1/galaxy/glxmaint/src/lib/tool_shed/galaxy_install/migrate/common.py", line 59, in __init__
    install_dependencies=install_dependencies )
  File "/localdata1/galaxy/glxmaint/src/lib/tool_shed/galaxy_install/install_manager.py", line 122, in __init__
    is_repository_dependency=is_repository_dependency )
  File "/localdata1/galaxy/glxmaint/src/lib/tool_shed/galaxy_install/install_manager.py", line 506, in install_repository
    is_repository_dependency=is_repository_dependency )
  File "/localdata1/galaxy/glxmaint/src/lib/tool_shed/galaxy_install/install_manager.py", line 345, in handle_repository_contents
    guid = self.get_guid( repository_clone_url, relative_install_dir, tool_config )
  File "/localdata1/galaxy/glxmaint/src/lib/tool_shed/galaxy_install/install_manager.py", line 253, in get_guid
    tool = self.toolbox.load_tool( full_path )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/__init__.py", line 671, in load_tool
    return ToolClass( config_file, root, self.app, guid=guid, repository_id=repository_id, **kwds )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/__init__.py", line 1045, in __init__
    self.parse( root, guid=guid )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/__init__.py", line 1260, in parse
    self.parse_inputs( root )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/__init__.py", line 1351, in parse_inputs
    display, inputs = self.parse_input_page( page, enctypes )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/__init__.py", line 1655, in parse_input_page
    inputs = self.parse_input_elem( input_elem, enctypes )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/__init__.py", line 1723, in parse_input_elem
    case.inputs = self.parse_input_elem( case_elem, enctypes, context )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/__init__.py", line 1679, in parse_input_elem
    group.inputs = self.parse_input_elem( elem, enctypes, context )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/__init__.py", line 1751, in parse_input_elem
    param = self.parse_param_elem( elem, enctypes, context )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/__init__.py", line 1764, in parse_param_elem
    param = ToolParameter.build( self, input_elem )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/parameters/basic.py", line 215, in build
    return parameter_types[param_type]( tool, param )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/parameters/basic.py", line 1566, in __init__
    ToolParameter.__init__( self, tool, elem )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/parameters/basic.py", line 54, in __init__
    self.validators.append( validation.Validator.from_element( self, elem ) )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/parameters/validation.py", line 23, in from_element
    return validator_types[type].from_element( param, elem )
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/parameters/validation.py", line 283, in from_element
    tool_data_table = param.tool.app.tool_data_tables[ table_name ]
  File "/localdata1/galaxy/glxmaint/src/lib/galaxy/tools/data/__init__.py", line 35, in __getitem__
    return self.data_tables.__getitem__( key )
KeyError: 'gatk_picard_indexes'

I fixed this by adding the appropriate entries to tool_data_table_conf.xml. I then reran the migrate_tools command successfully. However, my history_dataset_association table in the database was blown away at some point. The table is now completely empty. Have you ever seen this before?

Thanks,
Eric

From: Dannon Baker [dannon.ba...@gmail.com]
Sent: Tuesday, May 27, 2014 7:40 AM
To: Paniagua, Eric
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Problem using Galaxy with a PostgreSQL database on a remote host

Hey Eric,

It looks like you have connection info for both tcp/ip connections and unix sockets in the connection strings. If you're logging in using psql -h wigserv5.cshl.edu <snip>, then you only want the tcp/ip connection info. Drop the ?host=/tmp off the third option you listed and I think you'll be up and running, so:

postgresql://glxeric:xx...@wigserv5.cshl.edu:5432/glxeric

-Dannon

On Sat, May 24, 2014 at 1:49 AM, Paniagua, Eric <epani...@cshl.edu> wrote:

[Eric's original message, quoted in full above, is trimmed.]
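[To spell out Dannon's point: with psycopg2, a ?host=/tmp query parameter makes the driver connect over a Unix socket in /tmp, which in this case won out over the TCP host in the URL. A sketch of the two mutually exclusive forms in universe_wsgi.ini, with placeholder credentials:]

# TCP/IP connection to a remote PostgreSQL server:
database_connection = postgresql://USER:PASS@wigserv5.cshl.edu:5432/glxeric
# Local connection over a Unix socket in /tmp (no hostname at all):
database_connection = postgresql://USER:PASS@/glxeric?host=/tmp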
Re: [galaxy-dev] Problem using Galaxy with a PostgreSQL database on a remote host
The dataset table is populated. I looked at the SQL dump file I used to copy the database, and it has CREATE TABLE and COPY statements for history_dataset_association, but it looks like there may have been an error while executing them. Trying to figure out how to get my data in...

From: Dannon Baker [dannon.ba...@gmail.com]
Sent: Tuesday, May 27, 2014 11:43 AM
To: Paniagua, Eric
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Problem using Galaxy with a PostgreSQL database on a remote host

On Tue, May 27, 2014 at 11:26 AM, Paniagua, Eric <epani...@cshl.edu> wrote:

> [Eric's message of 11:26, quoted in full above, is trimmed.]

I have not seen the tool migration issue before, but it seems harmless. The fact that your history_dataset_association table is empty is concerning if there was ever anything in it. Can you verify that there are datasets in the same database that *should* be associated to a history? It sounds like this Galaxy instance has been used with different databases, and my hope is that the wires are crossed up here and there actually should not be any.
Re: [galaxy-dev] Problem using Galaxy with a PostgreSQL database on a remote host
I have created a fresh dump with:

$ pg_dump -U galaxyprod galaxyprod

This time the import proceeded cleanly. Further, using PostgreSQL 9.1, I no longer get the error regarding a read-only database cursor when getting the next history item number. I am currently running a test job to confirm that things are working as expected. However, just the fact that this job is running is a very good sign.

From: Dannon Baker [dannon.ba...@gmail.com]
Sent: Tuesday, May 27, 2014 11:58 AM
To: Paniagua, Eric
Cc: galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] Problem using Galaxy with a PostgreSQL database on a remote host

Since the database has lost consistency, I'd really try a fresh pg_dump / import if that's possible. If there's an error this time around, note it and send it on over and we can figure out where to go from there.

[The rest of the quoted thread, reproduced in full above, is trimmed.]
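[For concreteness, the dump-and-reload round trip being described; NEWHOST is a placeholder for the remote database server, and the role/database names follow the thread:]

$ pg_dump -U galaxyprod galaxyprod > galaxyprod.sql
$ psql -h NEWHOST -U galaxyprod -d galaxyprod -f galaxyprod.sql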
Re: [galaxy-dev] Problem using Galaxy with a PostgreSQL database on a remote host
Restarting the Galaxy server in multiple-process mode appears to have helped. The test job is now running.

From: Paniagua, Eric
Sent: Tuesday, May 27, 2014 12:43 PM
To: Dannon Baker
Cc: galaxy-dev@lists.bx.psu.edu
Subject: RE: [galaxy-dev] Problem using Galaxy with a PostgreSQL database on a remote host

Correction. The job has entered the waiting-to-run phase and doesn't appear to be leaving it. There is nothing of note in the server log.

From: Paniagua, Eric
Sent: Tuesday, May 27, 2014 12:41 PM
To: Dannon Baker
Cc: galaxy-dev@lists.bx.psu.edu
Subject: RE: [galaxy-dev] Problem using Galaxy with a PostgreSQL database on a remote host

[The rest of the quoted thread, reproduced in full above, is trimmed.]