Re: [galaxy-dev] referring to tool_data_tables[] structure (John Chilton)

2014-01-15 Thread John Chilton
First, I am not the best person to respond to this - I hope someone
like Dan or JJ can follow up. In my past tool development I have tried
to limit my use of .loc files because they can be impediments to
reproducibility (they are getting better though).

I have a pet peeve - it is when I ask a general question with specific
examples and then people just provide alternative ideas to the
specific examples. I am about to do that to you - sorry :(. This is
because I don't really think tools should be breaking these
abstractions. Instantiating app as in the previous example would require,
for instance, the compute nodes to have access to the database - which
they should not have. You could pass $__root__ into your tool and read
the XML file $__root__/tool_data_table_conf.xml directly - I would
still avoid this but it is better than trying to instantiate Galaxy
internals from your tool.
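
A minimal sketch of that fallback, assuming the tool receives the value
of $__root__ as an argument and that tool_data_table_conf.xml uses the
usual <tables>/<table name="...">/<columns>/<file path="..."/> layout.
The helper name find_data_table is made up for illustration and is not
part of Galaxy:

```python
import os
import xml.etree.ElementTree as ET


def find_data_table(galaxy_root, table_name, conf_name="tool_data_table_conf.xml"):
    """Return (columns, loc_path) for table_name, or None if it is not defined.

    Assumes each <table> has a <columns> element holding a comma-separated
    list and a <file path="..."/> element pointing at the .loc file.
    """
    conf_path = os.path.join(galaxy_root, conf_name)
    root = ET.parse(conf_path).getroot()
    for table in root.iter("table"):
        if table.get("name") != table_name:
            continue
        raw_columns = table.findtext("columns", default="")
        columns = [c.strip() for c in raw_columns.split(",") if c.strip()]
        file_elem = table.find("file")
        loc_path = file_elem.get("path") if file_elem is not None else None
        # Relative paths are interpreted against the Galaxy root here (an assumption).
        if loc_path is not None and not os.path.isabs(loc_path):
            loc_path = os.path.join(galaxy_root, loc_path)
        return columns, loc_path
    return None
```

This only reads the configuration file; it never touches app or the
database, so it would also work on a compute node that can see the
Galaxy directory.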

As mentioned though - I think in your two use cases there are some
better approaches:

In your first case - that spec is not really something that is going
to vary from site to site right? It is fixed data - so I would just
place a copy with your tool and resolve references to it relative to
your tool wrapper. Most scripting languages have a way to get the path
to the current file - for instance in Python you can do something like
os.path.join( os.path.dirname( __file__ ), 'ncbi_columns_spec.txt' ).
This should work with manual installs as well as with the tool shed.
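
As a rough sketch of that approach, assuming the spec is a tab-separated
file whose first '#' line names the columns (as in your example) and
that it sits next to the wrapper script. The names default_spec_path and
load_spec are invented for illustration:

```python
import os


def default_spec_path(filename="ncbi_columns_spec.txt"):
    """Path to the spec file stored alongside this script."""
    return os.path.join(os.path.dirname(os.path.abspath(__file__)), filename)


def load_spec(path=None):
    """Parse a tab-separated spec file into a list of dicts keyed by column name.

    The first line starting with '#' is taken as the header; any later
    '#' lines are treated as comments and skipped.
    """
    if path is None:
        path = default_spec_path()
    header = None
    rows = []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if not line.strip():
                continue
            fields = line.split("\t")
            if line.startswith("#"):
                if header is None:
                    header = [fields[0].lstrip("#")] + fields[1:]
                continue
            if header is None:
                raise ValueError("spec file has no '#' header line: %s" % path)
            rows.append(dict(zip(header, fields)))
    return rows
```

Calling load_spec() with no argument resolves the file relative to the
script itself, so the wrapper never needs to know where Galaxy or the
tool shed placed it.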

In the second case - I think that data managers are going to be the
best way to handle this going forward. I believe they can dynamically
update these loc files without restarting Galaxy (... though there are
probably some caveats to that). I don't know if they can be driven by
the API - but I think I remember Dan mentioning they are just normal
tools so the tools API should work???

https://wiki.galaxyproject.org/Admin/Tools/DataManagers/HowTo/Define
https://github.com/peterjc/galaxy_blast/issues/22

If there is something they do not currently do but need to, Trello
cards can be created and community contributions considered, because
data managers are going to be the best path forward for dynamically
updating .loc files.

Hope this helps!

-John

Re: [galaxy-dev] referring to tool_data_tables[] structure (John Chilton)

2014-01-13 Thread Dooley, Damion
Hi John,

Thanks for the feedback.  Righto, I saw abundant use of trans in core Galaxy 
code; the problem is that I want access from a tool wrapper's Python code.  
Basically I'm trying to get more out of .loc files, for example this field 
specification for blast report data:

   #value  type     subtype  sort  filter  default  min  max  choose  name
   # Remember to edit tool_data_table_conf.xml for column spec!
   length  numeric  int      1     1                          1       Alignment length
   qstart  numeric  int      1     1                          1       Alignment start in query
   qend    numeric  int      1     1                          1       Alignment end in query
   sstart  numeric  int      1     1                          1       Alignment start in subject
   send    numeric  int      1     1                          1       Alignment end in subject
   qseq    text     atgc     0     1                          1       Aligned part of query sequence
   sseq    text     atgc     0     1                          1       Aligned part of subject sequence
   mseq    text     atgc     0     1                          1       Alignment, matched part
   pident  numeric  float    1     1       97       90   100  1       Percentage of identical matches
   ...

This data is being accessed in our tool xml code via a tool_data_table entry and 
is handily providing field lists for sorting, filtering, and searching input 
data.

In our python code I'd like to just say

blastfieldspec = app.tool_data_tables[ 'blast_report_fields' ]

And then go to town on sorting, filtering, validation etc. as desired in 
python, using this spec.  I don't want the python code to be specifying the 
.loc path directly (which I have to do now), I'd much rather take advantage of 
what tool_data_tables could provide.
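
A minimal sketch of the kind of validation meant here, assuming each
spec row has already been parsed into a dict keyed by column name with
string values. check_value is a hypothetical helper, not something
Galaxy provides:

```python
def check_value(field, raw_value):
    """Convert raw_value according to the field spec and enforce min/max when set."""
    if field.get("type") != "numeric":
        return raw_value  # text fields are passed through unchanged
    value = int(raw_value) if field.get("subtype") == "int" else float(raw_value)
    low, high = field.get("min"), field.get("max")
    if low not in (None, "") and value < float(low):
        raise ValueError("%s=%s is below the minimum %s" % (field["value"], value, low))
    if high not in (None, "") and value > float(high):
        raise ValueError("%s=%s is above the maximum %s" % (field["value"], value, high))
    return value


# One row of the spec as it might look once parsed (all values are strings).
pident = {"value": "pident", "type": "numeric", "subtype": "float",
          "min": "90", "max": "100"}
```

With this row, check_value(pident, "97.5") returns a float, while a
value outside the 90 to 100 range raises ValueError.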

Our second desired use of tool_data_tables info is a case where some tools 
would be making use of 3rd party datasets that another tool manages.  We want 
to set up a tool/system that manages 3rd party reference databases (e.g. for 
particular specialized gene universal target databases like Chaperonin cpn60 
or Legionella mip).  This system would periodically get and process fasta data 
online from sources listed in a .loc file.  We'd process and use these 
databases in pulldown menus, but each of these reference databases will need 
management through a Galaxy interface via a plugin tool I guess.  Someone has 
done a prototype for this for databases specific to NCBI.  We'd target other 
niche data sources.  I realise another hurdle is getting modified .loc file 
info refreshed back into galaxy without having to stop/restart the server.  
Hoping an extension to the tool_data_tables class could do this too.

Regards, 

Damion



--

Message: 1
Date: Fri, 10 Jan 2014 16:56:56 -0800
From: Dooley, Damion damion.doo...@bccdc.ca
To: galaxy-dev@lists.bx.psu.edu galaxy-dev@lists.bx.psu.edu
Subject: [galaxy-dev] referring to tool_data_tables[] structure

I've seen

   $__app__.tool_data_tables[ 'all_fasta' ].get_fields() )[0][-1]

in tool xml templates.  Is there a way I can access the tool_data_tables 
structure from python code too?

I see all the initialization stuff happening in 
https://bitbucket.org/abrenner/galaxy-central/src/f3e736fe03df3a6dd5438c12ba35ea791a1eaca9/lib/galaxy/app.py?at=default
But not seeing where one can access any of these app variables?

Regards,

Damion


--

Message: 2
Date: Fri, 10 Jan 2014 20:42:50 -0600
From: John Chilton chil...@msi.umn.edu
To: Dooley, Damion damion.doo...@bccdc.ca
Cc: galaxy-dev@lists.bx.psu.edu galaxy-dev@lists.bx.psu.edu
Subject: Re: [galaxy-dev] referring to tool_data_tables[] structure

Just to clarify do you want to access them from Galaxy web server code
or from a Galaxy tool wrapper written in Python?

Nearly every part of the Galaxy source code has access to app and
everything inside of it, for instance all controller methods take in a
trans variable that contains a reference to app (trans.app).

I suspect you want to access it from a tool wrapper though? This is
not possible. If this is the case - what are you hoping to accomplish?
Do you want to access the all_fasta data table information in the
example?

-John