Re: [galaxy-dev] Retain the dbkey specified for an input dataset through a Galaxy workflow?

2016-01-27 Thread Greg Von Kuster
I’ve submitted an issue about this here: 
https://github.com/galaxyproject/galaxy/issues/1598


> On Jan 26, 2016, at 8:15 AM, Greg Von Kuster  wrote:
> 
> I’ve tracked down how the dbkey is getting lost on tool output datasets that 
> are part of a collection, but now I’m wondering if the tool’s tag is lacking 
> information about the dbkey and this is why it is getting lost.
> 
> At least the code implies this.  John, can you help here?
> 
> The populate_collection_elements() function in 
> ~/lib/galaxy/tools/parameters/output_collect.py looks for a match on dbkey 
> from the tag set, and if there is no match the default dbkey value “?” is 
> associated with the output dataset in the collection.  An example tool that 
> results in this behavior has this tag set:
> 
>
> directory="data_MP" ext="gff" visible="false" />
>
> 
> I’ve not found an example anywhere in the Galaxy code or in tools that have 
> been written to produce output collections that includes a dbkey designation 
> in the tag set, so I’m wondering if I am correctly understanding the intent 
> of this code.
> 
> I have a work-around fix that works without adding a dbkey designation to the 
> tag set.  The caller of the populate_collection_elements() function is a 
> function named collect_dynamic_collections(), whose signature includes the 
> input datasets from which the dbkey can be retained.
> 
> I can submit a PR that includes this approach to a fix, but if the fix is as 
> simple as adding some kind of dbkey designation to the tag set, an example of 
> what that should look like would be much appreciated.
> 
> Thanks very much!
> 
> Greg Von Kuster
> 
> 
>> On Jan 25, 2016, at 10:11 AM, Greg Von Kuster  wrote:
>> 
>> I’ve discovered that this issue is related to tools rather than workflows, 
>> and specifically to tools that produce dataset collections on output.  In 
>> the "job.finish()" method, metadata that includes the input dataset’s dbkey 
>> setting is generated correctly for output datasets that are not part of a 
>> collection, but the dbkey (and possibly other metadata attributes) are lost 
>> if the output dataset is part of a collection.  I’m still digging to find 
>> how setting metadata for output dataset collections is handled differently 
>> than regular output datasets.
>> 
>> 
>>> On Jan 22, 2016, at 2:34 PM, Greg Von Kuster  wrote:
>>> 
>>> Hello Galaxians,
>>> 
>>> I’m running Galaxy 15.10 and running workflows that include tools that 
>>> require reference genomes (e.g. Extract Genomic DNA).  I set the dbkey for 
>>> the input dataset and it is retained for some tools, but not others.  
>>> Running the workflow multiple times, it looks like the dbkey is lost at 
>>> different tool points in the workflow.  Is this a known issue or is there 
>>> some setting I’ve missed?  I’ve seen where the output datatype can be set 
>>> for each tool, but not the dbkey.  This is a problem because any tools that 
>>> require a dbkey downstream result in errors.
>>> 
>>> I was running the dev branch for a while, but workflow bugs in that branch 
>>> forced me to revert back to 15.10.
>>> 
>>> I’ve searched biostar and the mail lists, but haven’t seen an answer for 
>>> this specific issue, although there are several related threads from the 
>>> past.  Sorry if it’s been answered and I missed it.
>>> 
>>> Thanks very much for any help you can provide,
>>> 
>>> Greg Von Kuster
>>> 

___
Please keep all replies on the list by using "reply all"
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  https://lists.galaxyproject.org/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/

Re: [galaxy-dev] Retain the dbkey specified for an input dataset through a Galaxy workflow?

2016-01-26 Thread Greg Von Kuster
I’ve tracked down how the dbkey is getting lost on tool output datasets that 
are part of a collection, but now I’m wondering if the tool’s tag is lacking 
information about the dbkey and this is why it is getting lost.

At least the code implies this.  John, can you help here?

The populate_collection_elements() function in 
~/lib/galaxy/tools/parameters/output_collect.py looks for a match on dbkey from 
the tag set, and if there is no match the default dbkey value “?” is associated 
with the output dataset in the collection.  An example tool that results in 
this behavior has this tag set:





I’ve not found an example anywhere in the Galaxy code or in tools that have 
been written to produce output collections that includes a dbkey designation in 
the tag set, so I’m wondering if I am correctly understanding the intent of 
this code.
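
To make the question concrete, here is a minimal sketch of the lookup as I 
understand it.  It assumes the discovery pattern is a regular expression whose 
named groups supply the per-file fields.  The pattern, the helper name 
fields_for() and the file names are all hypothetical; this is not Galaxy's 
actual code and not taken from any real tool.

```python
import re

DEFAULT_DBKEY = "?"

# Hypothetical pattern: "<designation>_<dbkey>.<ext>", with the dbkey part
# optional.  Real tools would define their own pattern.
PATTERN = re.compile(
    r"(?P<designation>[^_.]+)(?:_(?P<dbkey>[^_.]+))?\.(?P<ext>\w+)$"
)


def fields_for(filename):
    """Parse designation, ext and dbkey from a discovered file name.

    Returns None when the file name does not match the pattern at all.
    """
    match = PATTERN.match(filename)
    if match is None:
        return None
    groups = match.groupdict()
    return {
        "designation": groups["designation"],
        "ext": groups["ext"],
        # No dbkey captured from the file name: fall back to the default "?".
        "dbkey": groups["dbkey"] or DEFAULT_DBKEY,
    }


if __name__ == "__main__":
    print(fields_for("sample1.gff"))
    print(fields_for("sample1_hg19.gff"))
```

With a pattern that never captures a dbkey, every element ends up with "?", 
which matches the behavior described above.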

I have a work-around fix that works without adding a dbkey designation to the 
tag set.  The caller of the populate_collection_elements() function is a 
function named collect_dynamic_collections(), whose signature includes the 
input datasets from which the dbkey can be retained.
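
Roughly, the work-around looks like the sketch below.  The function names 
populate_elements() and collect_collection() are simplified stand-ins for the 
two Galaxy functions named above, not their real signatures, and the rule of 
only inheriting a dbkey when all inputs agree is an assumption of this sketch.

```python
DEFAULT_DBKEY = "?"


def input_dbkey(input_dbkeys):
    """Return the one dbkey shared by the inputs, or "?" if none or several."""
    known = {key for key in input_dbkeys if key and key != DEFAULT_DBKEY}
    return known.pop() if len(known) == 1 else DEFAULT_DBKEY


def populate_elements(discovered, fallback_dbkey=DEFAULT_DBKEY):
    """Toy stand-in: assign a dbkey to each (name, parsed_dbkey) pair."""
    elements = []
    for name, parsed_dbkey in discovered:
        if parsed_dbkey in (None, DEFAULT_DBKEY):
            # Nothing matched from the tag set, so use the inputs' dbkey.
            dbkey = fallback_dbkey
        else:
            dbkey = parsed_dbkey
        elements.append({"name": name, "dbkey": dbkey})
    return elements


def collect_collection(discovered, input_dbkeys):
    """Toy stand-in for the caller, which can see the input datasets."""
    return populate_elements(
        discovered, fallback_dbkey=input_dbkey(input_dbkeys)
    )


if __name__ == "__main__":
    print(collect_collection([("a.gff", None), ("b.gff", "mm10")], ["hg19"]))
```

An explicit dbkey matched for an element still wins; the input dbkey is only 
used where the default "?" would otherwise be assigned.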

I can submit a PR that includes this approach to a fix, but if the fix is as 
simple as adding some kind of dbkey designation to the tag set, an example of 
what that should look like would be much appreciated.

Thanks very much!

Greg Von Kuster


> On Jan 25, 2016, at 10:11 AM, Greg Von Kuster  wrote:
> 
> I’ve discovered that this issue is related to tools rather than workflows, and 
> specifically to tools that produce dataset collections on output.  In the 
> "job.finish()" method, metadata that includes the input dataset’s dbkey 
> setting is generated correctly for output datasets that are not part of a 
> collection, but the dbkey (and possibly other metadata attributes) are lost 
> if the output dataset is part of a collection.  I’m still digging to find how 
> setting metadata for output dataset collections is handled differently than 
> regular output datasets.
> 
> 
>> On Jan 22, 2016, at 2:34 PM, Greg Von Kuster  wrote:
>> 
>> Hello Galaxians,
>> 
>> I’m running Galaxy 15.10 and running workflows that include tools that 
>> require reference genomes (e.g. Extract Genomic DNA).  I set the dbkey for 
>> the input dataset and it is retained for some tools, but not others.  
>> Running the workflow multiple times, it looks like the dbkey is lost at 
>> different tool points in the workflow.  Is this a known issue or is there 
>> some setting I’ve missed?  I’ve seen where the output datatype can be set for 
>> each tool, but not the dbkey.  This is a problem because any tools that 
>> require a dbkey downstream result in errors.
>> 
>> I was running the dev branch for a while, but workflow bugs in that branch 
>> forced me to revert back to 15.10.
>> 
>> I’ve searched biostar and the mail lists, but haven’t seen an answer for 
>> this specific issue, although there are several related threads from the 
>> past.  Sorry if it’s been answered and I missed it.
>> 
>> Thanks very much for any help you can provide,
>> 
>> Greg Von Kuster
>> 

Re: [galaxy-dev] Retain the dbkey specified for an input dataset through a Galaxy workflow?

2016-01-25 Thread Greg Von Kuster
I’ve discovered that this issue is related to tools rather than workflows, and 
specifically to tools that produce dataset collections on output.  In the 
"job.finish()" method, metadata that includes the input dataset’s dbkey setting 
is generated correctly for output datasets that are not part of a collection, 
but the dbkey (and possibly other metadata attributes) are lost if the output 
dataset is part of a collection.  I’m still digging to find how setting 
metadata for output dataset collections is handled differently than regular 
output datasets.


> On Jan 22, 2016, at 2:34 PM, Greg Von Kuster  wrote:
> 
> Hello Galaxians,
> 
> I’m running Galaxy 15.10 and running workflows that include tools that 
> require reference genomes (e.g. Extract Genomic DNA).  I set the dbkey for 
> the input dataset and it is retained for some tools, but not others.  Running 
> the workflow multiple times, it looks like the dbkey is lost at different 
> tool points in the workflow.  Is this a known issue or is there some setting 
> I’ve missed?  I’ve seen where the output datatype can be set for each tool, 
> but not the dbkey.  This is a problem because any tools that require a dbkey 
> downstream result in errors.
> 
> I was running the dev branch for a while, but workflow bugs in that branch 
> forced me to revert back to 15.10.
> 
> I’ve searched biostar and the mail lists, but haven’t seen an answer for this 
> specific issue, although there are several related threads from the past.  
> Sorry if it’s been answered and I missed it.
> 
> Thanks very much for any help you can provide,
> 
> Greg Von Kuster
> 
