Re: [galaxy-dev] datatype dependencies

2014-07-18 Thread John Chilton
My understanding of the code is that tool shed dependencies (or local
dependencies) will not be available to tool shed datatypes (for
sniffing for instance). Sorry.

If you want to hack up your local instance to resolve dependencies
during the sniffing process that may be possible - my guess is you
could add requirement tags to tools/data_source/upload.xml and the
__SET_METADATA__ tool definition embedded in
lib/galaxy/datatypes/registry.py - though I have not tried this.

-John

___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:
  http://lists.bx.psu.edu/

To search Galaxy mailing lists use the unified search at:
  http://galaxyproject.org/search/mailinglists/


Re: [galaxy-dev] datatype dependencies

2014-07-18 Thread Eric Rasche

On 07/18/2014 09:49 AM, John Chilton wrote:
 My understanding of the code is that tool shed dependencies (or local
 dependencies) will not be available to tool shed datatypes (for
 sniffing for instance). Sorry.

I figured as much, not very surprising at all. Dependencies
notwithstanding, the idea has some modicum of merit. There are plenty of
people who have already written great parsers that throw errors on bad
input; why should datatypes rewrite them?

 If you want to hack up your local instance to resolve dependencies
 during the sniffing process that may be possible - my guess is you
 could add requirement tags to tools/data_source/upload.xml and the
 __SET_METADATA__ tool definition embedded in
 lib/galaxy/datatypes/registry.py - though I have not tried this.

Well heck, at that point I'd just use the fact that I know I'm in
lib/galaxy/datatypes to locate the installed Biopython dependency with
greps, globs, and finds. Though I'll hold off on that in favor of a
better solution.
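A minimal sketch of that glob-based workaround, assuming the ToolShed installed Biopython somewhere under a single dependency root directory. The deps_root argument, the directory layout beneath it, and the helper name add_package_dir_to_path are all hypothetical; nothing here is a Galaxy API. It searches for a Bio/__init__.py and prepends the directory containing it to sys.path so that "from Bio import SeqIO" could succeed inside a datatype.

```python
import glob
import os
import sys
from typing import Optional


def add_package_dir_to_path(deps_root: str, package: str = "Bio") -> Optional[str]:
    """Find <package>/__init__.py anywhere under deps_root and put the
    directory that contains <package> at the front of sys.path.

    ASSUMPTION: deps_root and the layout beneath it are hypothetical;
    Galaxy does not promise any particular layout for installed deps.
    Returns the directory added, or None if the package was not found.
    """
    pattern = os.path.join(deps_root, "**", package, "__init__.py")
    # sorted() makes the choice deterministic if several copies exist.
    matches = sorted(glob.glob(pattern, recursive=True))
    if not matches:
        return None
    # .../lib/python/Bio/__init__.py -> .../lib/python
    parent = os.path.dirname(os.path.dirname(matches[0]))
    if parent not in sys.path:
        sys.path.insert(0, parent)
    return parent
```

This is fragile (it takes the first match in sorted order and ignores versions), which is one reason installing the dependency into the Python that runs Galaxy is probably the saner route.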

Cheers,
Eric

 
--
Eric Rasche
Programmer II
Center for Phage Technology
Texas A&M University
College Station, TX 77843
404-692-2048
e...@tamu.edu
rasche.e...@yandex.ru


Re: [galaxy-dev] datatype dependencies

2014-07-18 Thread Peter Cock
On Fri, Jul 18, 2014 at 4:21 PM, Eric Rasche rasche.e...@yandex.ru wrote:
 On 07/18/2014 09:49 AM, John Chilton wrote:
 My understanding of the code is that tool shed dependencies (or local
 dependencies) will not be available to tool shed datatypes (for
 sniffing for instance). Sorry.

 I figured as much, not very surprising at all. Dependencies
 notwithstanding, the idea has some modicum of merit. There are plenty of
 people who have already written great parsers that throw up errors, why
 should datatypes re-write them?

Exactly - Trello request for the toolshed to handle both Python and
binary dependencies for datatypes?

(e.g. samtools is a binary dependency of the SAM/BAM datatypes,
used for conversion and indexing)
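For a binary dependency like samtools, one hedge on the datatype side is to look the executable up explicitly and fail with a readable message when it is missing. A sketch using only the standard library's shutil.which; the name require_binary is made up for illustration, and it checks that the program exists, not which version it is.

```python
import shutil
from typing import Optional


def require_binary(name: str, search_path: Optional[str] = None) -> str:
    """Return the full path to executable `name`, or raise a clear error.

    search_path=None makes shutil.which search $PATH as usual.
    """
    found = shutil.which(name, path=search_path)
    if found is None:
        raise RuntimeError(
            f"{name!r} not found on PATH; this datatype's conversion or "
            "indexing step needs it installed alongside Galaxy"
        )
    return found
```

Usage would be something like require_binary("samtools") before shelling out for conversion or indexing.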

 If you want to hack up your local instance to resolve dependencies
 during the sniffing process that may be possible - my guess is you
 could add requirement tags to tools/data_source/upload.xml and the
 __SET_METADATA__ tool definition embedded in
 lib/galaxy/datatypes/registry.py - though I have not tried this.

 Well heck, at that point I'd just use the fact that I know I'm in
 lib/galaxy/datatypes to locate the BioPython dependency that was
 installed through greps, globs, and finds. Though I'll hold off on that
 for a better solution.

I'd manually install the Python dependencies into the Python
used to run Galaxy itself.
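If the dependency goes into Galaxy's own Python, a datatype can still protect itself against installs where it is missing. A sketch, with optional_import and sniff_genbank as hypothetical names: the import is attempted lazily, and a missing Biopython makes the sniffer decline the file instead of raising. Bio.SeqIO.parse(handle, "genbank") is the real Biopython call; the rest is illustrative.

```python
import importlib
from types import ModuleType
from typing import Optional


def optional_import(name: str) -> Optional[ModuleType]:
    """Import module `name` if available, else return None instead of raising."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


def sniff_genbank(filename: str) -> bool:
    """Hypothetical sniffer: claim the file only if Biopython is importable
    and its GenBank parser yields at least one record."""
    seqio = optional_import("Bio.SeqIO")
    if seqio is None:
        # Dependency missing: decline rather than crash the upload.
        return False
    try:
        with open(filename) as handle:
            # next(..., None) pulls just the first record, if any.
            return next(seqio.parse(handle, "genbank"), None) is not None
    except Exception:
        return False
```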

Peter


[galaxy-dev] datatype dependencies

2014-07-17 Thread Eric Rasche
Let's pretend for a second that I'm rather lazy (oh...wait), and I have
ZERO interest in writing datatype parsers to sniff and validate whether
or not a specific file is a specific datatype. I'm a sysadmin and
bioinformatician, and I've worked with dozens of libraries that exist to
parse file formats, and they all die in flames when I feed them bad data.

Would it be possible to somehow define requirements for datatypes?

I don't want to take on the burden of code I write saying "yes, I've
sniffed and validated this, and it is absolutely a GenBank file". That's a
lot of responsibility, especially if people have malformed GenBank files
and their tools fail as a result.

I would like to do this with Biopython and turf the validation to
another library that exists to parse GenBank files and will raise an
exception if they're invalid.

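    # Methods on a GenBank datatype class; Biopython does the parsing.
    def sniff(self, filename):
        from Bio import SeqIO
        try:
            self.records = list(SeqIO.parse(filename, "genbank"))
            return True
        except Exception:
            # Any parse error means "this is not GenBank".
            self.records = None
            return False

    def validate(self, dataset):
        from Bio import SeqIO
        errors = list()
        try:
            self.records = list(SeqIO.parse(dataset.file_name, "genbank"))
        except Exception as e:
            # Report the parser's own exception as the validation error.
            self.records = None
            errors.append(e)
        return errors

    def set_meta(self, dataset, **kwd):
        # Reuses records cached by sniff()/validate(), if any.
        records = getattr(self, "records", None)
        if records is not None:
            dataset.metadata.number_of_sequences = len(records)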

So much easier! And I can shift the burden of validation and sniffing
upstream, rather than any failures being my fault and requiring
maintenance of a complex sniffer.
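The list(...) calls above keep every parsed record in memory just to count them. A sketch of a streaming alternative, assuming a parse callable that takes an open handle and yields records (for example lambda h: SeqIO.parse(h, "genbank")). count_and_validate is a hypothetical helper, not part of Galaxy or Biopython; it also reports a file that parses cleanly but yields zero records as an error rather than a pass.

```python
from typing import Any, Callable, Iterable, List, Tuple


def count_and_validate(
    path: str, parse: Callable[[Any], Iterable[Any]]
) -> Tuple[int, List[Exception]]:
    """Stream records from parse(handle); return (record_count, errors)."""
    errors: List[Exception] = []
    count = 0
    try:
        with open(path) as handle:
            # Count while iterating instead of building a list of records.
            for _record in parse(handle):
                count += 1
    except Exception as exc:
        # The parser's own exception becomes the validation message.
        errors.append(exc)
    if count == 0 and not errors:
        # A clean parse with no records is treated as a failure.
        errors.append(ValueError("no records found"))
    return count, errors
```

The count could then feed number_of_sequences in set_meta without caching records on the datatype.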

Cheers,
Eric



Re: [galaxy-dev] datatype dependencies

2014-07-17 Thread Peter Cock
You could do something like that, and we already have
Biopython packages in the ToolShed which can be listed
as dependencies :)

However, some things like GenBank are tricky - in order
to tolerate NCBI dumps the Biopython parser will ignore
any free text before the first LOCUS line. A confusing
side effect is most text files are then treated as a
GenBank file with zero records. But if it came back
with some records it is probably OK :)
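One way to apply this caveat in a sniffer is to require at least one record rather than just "no exception raised". A minimal sketch, where parse is any callable returning an iterator of records, e.g. lambda h: SeqIO.parse(h, "genbank"); the helper name sniff_has_records and the min_records parameter are illustrative assumptions.

```python
import itertools
from typing import Any, Callable, Iterable


def sniff_has_records(
    path: str, parse: Callable[[Any], Iterable[Any]], min_records: int = 1
) -> bool:
    """Claim the file only if parse() yields at least min_records records."""
    try:
        with open(path) as handle:
            # Stop after min_records; sniffing need not parse the whole file.
            head = list(itertools.islice(parse(handle), min_records))
    except Exception:
        return False
    return len(head) >= min_records
```

Stopping after the first record keeps sniffing cheap, but a later malformed record goes unnoticed; catching that is left to validation.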

Basically Biopython also does not care to offer file
format detection simply because it is a can of worms.

Zen of Python - explicit is better than implicit.

We want you to tell us which format you want to try
parsing it as.

Sorry,

Peter
(Speaking as the Bio.SeqIO maintainer for Biopython)




Re: [galaxy-dev] datatype dependencies

2014-07-17 Thread Eric Rasche

On 07/17/2014 02:11 PM, Peter Cock wrote:
 You could do something like that, and we already have
 Biopython packages in the ToolShed which can be listed
 as dependencies :)
 

If my module depends on the Biopython package from the ToolShed, will
that be accessible within a datatype? Would it be as simple as
"from Bio import X"? Most of what I've seen of dependencies (and please
forgive my lack of knowledge about them) consists of an env.sh file with
paths to binaries being sourced before the tool runs.

 However, some things like GenBank are tricky - in order
 to tolerate NCBI dumps the Biopython parser will ignore
 any free text before the first LOCUS line. A confusing
 side effect is most text files are then treated as a
 GenBank file with zero records. But if it came back
 with some records it is probably OK :)

Interesting, very good to know.
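Since GenBank records begin with a LOCUS line and the parser skips free text before the first one, a cheap standard-library pre-check can rule out most non-GenBank text before the full parser runs. A sketch; has_locus_line and the 200-line cutoff are arbitrary illustrative choices, and passing this check says nothing about whether the rest of the file is valid.

```python
def has_locus_line(path: str, max_lines: int = 200) -> bool:
    """Return True if a line starting with 'LOCUS' appears in the first
    max_lines lines of the file (an arbitrary, illustrative cutoff)."""
    with open(path, errors="replace") as handle:
        for lineno, line in enumerate(handle):
            if lineno >= max_lines:
                break
            if line.startswith("LOCUS"):
                return True
    return False
```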

 
 Basically Biopython also does not care to offer file
 format detection simply because it is a can of worms.
 
 Zen of Python - explicit is better than implicit.
 
 We want you to tell us which format you want to try
 parsing it as.

Yes! Exactly! Which is why it's perfectly fine here:

SeqIO.parse(dataset.file_name, "genbank")

All I want to know is whether or not this parses as a GenBank file (and
has one or more records). Biopython may not do automatic format detection
(yuck, agreed), but since I already know I'm looking for a GenBank file,
simply being able to parse it or not is good enough.

 


Re: [galaxy-dev] datatype dependencies

2014-07-17 Thread Peter Cock
On Thu, Jul 17, 2014 at 8:20 PM, Eric Rasche rasche.e...@yandex.ru wrote:
 On 07/17/2014 02:11 PM, Peter Cock wrote:
 You could do something like that, and we already have
 Biopython packages in the ToolShed which can be listed
 as dependencies :)


 If my module depends on the biopython from the toolshed, will that be
 accessible within a datatype? Would it be as simple as from Bio import
 X? Most of what I've seen of dependencies (and please forgive my lack
 of knowledge about them) consists of env.sh being sourced with paths to
 binaries, prior to tool run.

I don't know - this may well be a gap in the ToolShed
framework, since thus far most of the datatypes defined
have been self-contained.

I have asked something similar before (in the context
of defining automatic file format conversion like the way
Galaxy can turn FASTA into tabular in input parameters
expecting tabular), where there could be a binary
dependency.

 However, some things like GenBank are tricky - in order
 to tolerate NCBI dumps the Biopython parser will ignore
 any free text before the first LOCUS line. A confusing
 side effect is most text files are then treated as a
 GenBank file with zero records. But if it came back
 with some records it is probably OK :)

 Interesting, very good to know.


 Basically Biopython also does not care to offer file
 format detection simply because it is a can of worms.

 Zen of Python - explicit is better than implicit.

 We want you to tell us which format you want to try
 parsing it as.

 Yes! Exactly! Which is why it's perfectly fine here:

 SeqIO.parse(dataset.file_name, "genbank")

 All I want to know is whether or not this parses as a genbank file (and
 has 1 or more records). BioPython may not do automatic format detection
 (yuck, agreed), but since I already know I'm looking for a genbank file,
 simply being able to parse it or not is good enough.

With those provisos, you should be OK :)

Peter