Re: [galaxy-dev] DustMasker tool for ncbi_blast_plus

2013-02-15 Thread Nicola Soranzo
On Mon, 11/02/2013 at 13.19 +, Peter Cock wrote:
 On Fri, Feb 8, 2013 at 4:30 PM, Nicola Soranzo sora...@crs4.it wrote:
  On Wed, 06/02/2013 at 20.01 +0100, Nicola Soranzo wrote:
  Hi Peter,
  I added these file formats mostly as placeholders for a future
  implementation. I have now changed the tool a bit by removing the acclist
  and seqloc_xml formats, since they are not recognized by the latest
  versions of dustmasker (I also sent an email to
  blast-h...@ncbi.nlm.nih.gov to inform them of this bug).
  As before, you can find the new version at:
 
  https://bitbucket.org/nsoranzo/ncbi_blast_plus
 
  I stripped the old commit and did a new one, not a very good practice,
  sorry about that.
 
 It seems to have confused the bitbucket page a little, but I have
 checked in your initial wrapper to my development repository (I
 use the tools branch):
 https://bitbucket.org/peterjc/galaxy-central/commits/2284d485e36f74f19b0dbe78709b098d9eba4ef6
 
 Note I'm not going to include this in the Tool Shed release yet,
 we need to sort out the file format definitions first.


Hi Peter,
I implemented minimal datatypes for maskinfo ASN.1 binary and text, plus
some other improvements to ncbi_blast_plus, and I sent you a pull
request through Bitbucket for your development repository. I think that
would be easier for you, let me know if it is not.

Nicola

___
Please keep all replies on the list by using reply all
in your mail client.  To manage your subscriptions to this
and other Galaxy lists, please use the interface at:

  http://lists.bx.psu.edu/


Re: [galaxy-dev] DustMasker tool for ncbi_blast_plus

2013-02-15 Thread Peter Cock
On Fri, Feb 15, 2013 at 6:10 PM, Nicola Soranzo sora...@crs4.it wrote:
Peter wrote:

 It seems to have confused the bitbucket page a little, but I have
 checked in your initial wrapper to my development repository (I
 use the tools branch):
 https://bitbucket.org/peterjc/galaxy-central/commits/2284d485e36f74f19b0dbe78709b098d9eba4ef6

 Note I'm not going to include this in the Tool Shed release yet,
 we need to sort out the file format definitions first.


 Hi Peter,
 I implemented minimal datatypes for maskinfo ASN.1 binary and text, plus
 some other improvements to ncbi_blast_plus, and I sent you a pull
 request through Bitbucket for your development repository. I think that
 would be easier for you, let me know if it is not.

 Nicola


That looks very useful Nicola - I hope to have time to test that
next week :)

Thank you,

Peter

Re: [galaxy-dev] DustMasker tool for ncbi_blast_plus

2013-02-11 Thread Peter Cock
On Fri, Feb 8, 2013 at 4:30 PM, Nicola Soranzo sora...@crs4.it wrote:
 On Wed, 06/02/2013 at 20.01 +0100, Nicola Soranzo wrote:
 Hi Peter,
 I added these file formats mostly as placeholders for a future
 implementation. I have now changed the tool a bit by removing the acclist
 and seqloc_xml formats, since they are not recognized by the latest
 versions of dustmasker (I also sent an email to
 blast-h...@ncbi.nlm.nih.gov to inform them of this bug).
 As before, you can find the new version at:

 https://bitbucket.org/nsoranzo/ncbi_blast_plus

 I stripped the old commit and did a new one, not a very good practice,
 sorry about that.

It seems to have confused the bitbucket page a little, but I have
checked in your initial wrapper to my development repository (I
use the tools branch):
https://bitbucket.org/peterjc/galaxy-central/commits/2284d485e36f74f19b0dbe78709b098d9eba4ef6

Note I'm not going to include this in the Tool Shed release yet,
we need to sort out the file format definitions first.

 Hi Peter,
 I've added a new commit to this repo which updates the test output files
 to (recommended) BLAST 2.2.26+, since functional tests were returning
 errors.

 Hope you find it useful.

Also applied to my branch, thank you - I'd forgotten to update that
(but intend at some point to refresh the test files and dependency
install to use BLAST 2.2.27+ instead):
https://bitbucket.org/peterjc/galaxy-central/commits/f1f912f63bb4174f434e3f47eac58f2cfa3753e6

Sadly I've not actually got the unit tests to run at all yet, see:
http://lists.bx.psu.edu/pipermail/galaxy-dev/2013-February/013245.html

Regards,

Peter

Re: [galaxy-dev] DustMasker tool for ncbi_blast_plus

2013-02-08 Thread Nicola Soranzo
On Wed, 06/02/2013 at 20.01 +0100, Nicola Soranzo wrote:
 Hi Peter,
 I added these file formats mostly as placeholders for a future
 implementation. I have now changed the tool a bit by removing the acclist
 and seqloc_xml formats, since they are not recognized by the latest
 versions of dustmasker (I also sent an email to
 blast-h...@ncbi.nlm.nih.gov to inform them of this bug).
 As before, you can find the new version at:
 
 https://bitbucket.org/nsoranzo/ncbi_blast_plus
 
 I stripped the old commit and did a new one, not a very good practice,
 sorry about that.

Hi Peter,
I've added a new commit to this repo which updates the test output files
to (recommended) BLAST 2.2.26+, since functional tests were returning
errors.

Hope you find it useful.

Nicola
-- 
Nicola Soranzo, Ph.D.
CRS4
Bioinformatics Program
Loc. Piscina Manna
09010 Pula (CA), Italy
http://www.bioinformatica.crs4.it/


Re: [galaxy-dev] DustMasker tool for ncbi_blast_plus

2013-02-06 Thread Nicola Soranzo
Adding galaxy-dev list in CC as suggested by Peter.

On Wed, 06/02/2013 at 16.57 +, Peter Cock wrote:
 On Tue, Feb 5, 2013 at 11:45 AM, Nicola Soranzo sora...@crs4.it wrote:
  Dear Peter,
  I have created a simple Galaxy tool for DustMasker of the NCBI BLAST+
  suite, which I think would be a useful addition to the ncbi_blast_plus
  repository you're maintaining in the Galaxy Tool Shed.
 
  You can find it and hopefully pull it from:
 
  https://bitbucket.org/nsoranzo/ncbi_blast_plus
 
  Kind regards,
  Nicola
 
 Hi Nicola,
 
 Thanks for getting involved - we can discuss this on the galaxy-dev
 mailing list if you prefer? For now I have CC'd Edward Kirton as he
 is/was working on masking in BLAST databases for Galaxy.
 
 I can see the new file
 tools/ncbi_blast_plus/ncbi_dustmasker_wrapper.xml however it refers to
 multiple new file formats - where are they defined?
 
 * acclist
 * maskinfo_asn1_bin
 * maskinfo_asn1_text
 * seqloc_asn1_bin
 * seqloc_asn1_text

Hi Peter,
I added these file formats mostly as placeholders for a future
implementation. I have now changed the tool a bit by removing the acclist
and seqloc_xml formats, since they are not recognized by the latest
versions of dustmasker (I also sent an email to
blast-h...@ncbi.nlm.nih.gov to inform them of this bug).
As before, you can find the new version at:

https://bitbucket.org/nsoranzo/ncbi_blast_plus

I stripped the old commit and did a new one, not a very good practice,
sorry about that.

 Have you looked at the (commented out) bits in the makeblastdb wrapper
 which would perhaps be relevant? This is something Edward Kirton wrote
 which I haven't integrated yet:
 
 <!-- SEQUENCE MASKING OPTIONS -->
 <!-- TODO
 <repeat name="mask_data" title="Provide one or more files containing masking data">
     <param name="file" type="data" format="asnb" label="File containing masking data" help="As produced by NCBI masking applications (e.g. dustmasker, segmasker, windowmasker)" />
 </repeat>
 <repeat name="gi_mask" title="Create GI indexed masking data">
     <param name="file" type="data" format="asnb" label="Masking data output file" />
 </repeat>
 -->
 
 Perhaps all you need to offer in ncbi_dustmasker_wrapper.xml is
 'fasta' and 'asnb' (binary ASN) formats? Edward - did you have an
 'asnb' definition?

'fasta' and 'interval' are the ones I'm interested in for my use case.
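
For anyone unfamiliar with the 'interval' format: as far as I know, dustmasker writes a '>' header line per sequence followed by one "start - end" line per masked region. Below is a minimal Python sketch for reading such output. The function name parse_intervals and the sample data are made up for illustration, and coordinates are kept exactly as reported, with no conversion.

```python
def parse_intervals(lines):
    """Parse dustmasker 'interval' output into {sequence_id: [(start, end), ...]}.

    Assumed layout (not checked against every BLAST+ release):
        >seq_id optional description
        start - end
    """
    masks = {}
    current = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            # Use the first whitespace-delimited token of the header as the ID.
            fields = line[1:].split()
            current = fields[0] if fields else ""
            masks.setdefault(current, [])
        else:
            if current is None:
                raise ValueError("interval line before any '>' header: %r" % line)
            start, _, end = line.partition("-")
            masks[current].append((int(start), int(end)))
    return masks


if __name__ == "__main__":
    # Hypothetical sample, only to show the call.
    sample = [">seq1 example sequence", "10 - 25", "40 - 61", ">seq2"]
    for seq_id, regions in parse_intervals(sample).items():
        print(seq_id, regions)
```

Sequences with no masked regions still get an entry with an empty list, so downstream code can tell "not masked" apart from "not seen".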

'maskinfo_asn1_bin' is probably the one referred to as 'asnb' in the
quoted code (ASN.1 is a general data serialization format, like XML). A
file in this format can be given as input to makeblastdb -mask_data.
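
A rough sketch of that two-step workflow, written as Python helpers that only build the command lines (nothing is executed). The file names and the helper names dustmasker_cmd and makeblastdb_cmd are placeholders I made up. The options are the ones I believe the BLAST+ manual uses for this purpose, so please check them against your installed version.

```python
import shlex


def dustmasker_cmd(fasta, mask_out, outfmt="maskinfo_asn1_bin"):
    """Command line to mask low-complexity regions of a FASTA file."""
    return [
        "dustmasker",
        "-in", fasta,
        "-infmt", "fasta",
        "-parse_seqids",
        "-outfmt", outfmt,
        "-out", mask_out,
    ]


def makeblastdb_cmd(fasta, mask_file, db_name):
    """Command line to build a nucleotide database that carries the masking data."""
    return [
        "makeblastdb",
        "-in", fasta,
        "-dbtype", "nucl",
        "-parse_seqids",
        "-mask_data", mask_file,
        "-out", db_name,
    ]


if __name__ == "__main__":
    # Placeholder file names, for illustration only.
    steps = [
        dustmasker_cmd("genome.fasta", "genome_dust.asnb"),
        makeblastdb_cmd("genome.fasta", "genome_dust.asnb", "genome_db"),
    ]
    for cmd in steps:
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

To actually run a step, each list can be passed to subprocess.run(cmd, check=True), assuming the BLAST+ binaries are on the PATH.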

Nicola