Re: [EMBOSS] Escaping query terms in a USA

2013-08-23 Thread David Bauer
Hi Hamish,

It seems the index is OK; it is just the database query code that cannot handle 
the ":", which has a special meaning in USAs.
As a workaround, you can replace the ":" with a "*".

entret -stdout -auto 'imgthla-key:A*02*364'

will return the entry HLA08011.

But be aware that this actually generates a wildcard query, so the * 
matches any characters at that position, not only the ":".

Kind regards,
David.

-----Original Message-----
From: emboss-boun...@lists.open-bio.org 
[mailto:emboss-boun...@lists.open-bio.org] On Behalf Of Hamish McWilliam
Sent: 23 August 2013 11:25
To: emboss@lists.open-bio.org
Subject: [EMBOSS] Escaping query terms in a USA

Hi folks,

In the IMGT/HLA database (http://www.ebi.ac.uk/ipd/imgt/hla/) the 
keywords field in the EMBL-Bank format flat-file contains allele names like:

A*02:364

While I can build an index containing the keywords, it does not appear 
to be possible to search the index with the allele names. For example:

$ entret -stdout -auto 'imgthla-key:Allele'

works as expected, but:

$ entret -stdout -auto 'imgthla-key:A*02:364'

just gives errors:

Error: Failed to open filename 'imgthla-key'
Error: Unable to read sequence 'imgthla-key:A*02:364'
Died: entret terminated: Bad value for '-sequence' with -auto defined

I am guessing that the problem is the '*' and ':' characters in the 
term... so is there some way to escape these, or are the terms in the 
index mangled in some way?

All the best,

Hamish
-- 

Mr Hamish McWilliam,
Web Production,
European Bioinformatics Institute (EMBL-EBI),
European Molecular Biology Laboratory,
Wellcome Trust Genome Campus,
Hinxton, Cambridge, CB10 1SD
United Kingdom

URL: http://www.ebi.ac.uk/


___
EMBOSS mailing list
EMBOSS@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/emboss



Re: [EMBOSS] Escaping query terms in a USA

2013-08-23 Thread Hamish McWilliam

Hi David,


> It seems the index is OK; it is just the database query code that
> cannot handle the ":", which has a special meaning in USAs. As a
> workaround, you can replace the ":" with a "*".
>
> entret -stdout -auto 'imgthla-key:A*02*364'
>
> will return the entry HLA08011.
>
> But be aware that this actually generates a wildcard query, so the *
> matches any characters at that position, not only the ":".


Unfortunately that is not going to work for this case since the HLA 
alleles use a somewhat nested nomenclature, for example:


a*01:01:02
a*01:02
a*02:01:02
a*02:101:02

However, a little experimentation indicates that EMBOSS supports the 
single-character wildcard '?', so something like:


$ entret -stdout -auto 'imgthla-key:A?01?02'

appears to do what I want in most cases.
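
To see why '?' is narrower than '*' here, the following Python sketch 
mimics the matching with shell-style globs from the standard fnmatch 
module. It is only an illustration, under the assumption that EMBOSS 
wildcards behave like such globs ('*' for any run of characters, '?' for 
exactly one). It does not call EMBOSS, and the helper name 
matching_terms is made up for the example.

```python
from fnmatch import fnmatchcase

# Allele names as listed above (lower case, as in the example).
TERMS = ["a*01:01:02", "a*01:02", "a*02:01:02", "a*02:101:02"]


def matching_terms(pattern, terms=TERMS):
    """Return the terms matched by a glob pattern, ignoring case.

    Assumption: '*' matches any run of characters and '?' exactly one,
    as in shell-style globs. This is not EMBOSS code.
    """
    lowered = pattern.lower()
    return [term for term in terms if fnmatchcase(term.lower(), lowered)]


if __name__ == "__main__":
    for query in ("A*01*02", "A?01?02"):
        print(query, "->", matching_terms(query))
```

Under that assumption, 'A*01*02' matches all four names, while 
'A?01?02' matches only a*01:02.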

That said, it would be better to have a way to escape the special 
characters (i.e. '*', ':' and '?') in the search term when an exact 
match is required (as in this case).
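
Until such an escape mechanism exists, the '?' workaround can be applied 
mechanically. The sketch below is a hypothetical helper (the names 
to_wildcard_term and entret_args are invented for this example) that 
replaces each special character with '?' and builds the argument list 
for the entret call shown earlier. The imgthla-key database and field 
name is taken from the commands in this thread. The code does not run 
entret, and the result is still a wildcard query, not an exact match.

```python
SPECIAL_CHARS = "*:?"  # characters named above as special in a search term


def to_wildcard_term(term):
    """Replace each special character with the single-character wildcard '?'."""
    return "".join("?" if ch in SPECIAL_CHARS else ch for ch in term)


def entret_args(term, db_field="imgthla-key"):
    """Build the argument list for the entret command used in this thread."""
    return ["entret", "-stdout", "-auto", f"{db_field}:{to_wildcard_term(term)}"]


if __name__ == "__main__":
    print(to_wildcard_term("A*01:02"))
    print(" ".join(entret_args("A*02:364")))
```

Passing the list to a process launcher rather than a shell string avoids 
a second layer of quoting problems, since '*' and '?' are also special 
to most shells.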


Thanks,

Hamish



--

Mr Hamish McWilliam,
Web Production,
European Bioinformatics Institute (EMBL-EBI),
European Molecular Biology Laboratory,
Wellcome Trust Genome Campus,
Hinxton, Cambridge, CB10 1SD
United Kingdom

URL: http://www.ebi.ac.uk/

