Re: [EMBOSS] Space in USA "db:seqname" in list file causes unintended behavior

2012-10-13 Thread Peter Rice

On 12/10/2012 22:27, Rozenbaum, Daniel (Biocceleration Inc) wrote:

> Hello everyone,
>
> We have encountered the following issue: if there's an erroneous (most likely
> unintentional) entry in a list file that looks like "db: seqname", EMBOSS
> doesn't issue an error/warning message, but treats this entry as "db:*".
>
> Might it be possible, though, to add some protection against potentially
> problematic consequences if such an error in the USA is made? In one such
> instance the resultant clustalw process ended up attempting to build a multiple
> alignment across the entire UniProt, which the server didn't handle well :-)

An interesting problem. List files have a long history, going back
before EMBOSS. They were also used in the GCG (Wisconsin) package, which
in turn adopted them from the VMS operating system, where they could be
used for mailing lists (sending to @list with a list of usernames, for
example).

In a list file, only the first token (word) is significant. The
remainder of the line is treated as a comment.

As you discovered, a database name followed by a space before the id (or
indeed a database name on its own) is valid input representing all entries
in the database.
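To make the failure concrete, here is a toy Python model of the rule just described: only the first whitespace-separated token of each line is kept, and a "db:" token with no id means the whole database. The function names first_tokens and describe_usa are hypothetical, and the USA handling is deliberately simplified (no format:: prefixes, files or ranges); this is an illustration, not EMBOSS's actual C code.

```python
def first_tokens(lines):
    """Keep only the first whitespace-separated token of each non-blank line.

    Models the list-file rule: everything after the first token is
    silently treated as a comment.
    """
    tokens = []
    for line in lines:
        parts = line.split()
        if parts:  # skip blank lines
            tokens.append(parts[0])
    return tokens


def describe_usa(token):
    """Roughly classify a 'db:id' token (simplified, hypothetical)."""
    db, sep, entry = token.partition(":")
    if not sep:
        return f"{token!r}: no 'db:' prefix (database name or file)"
    if entry in ("", "*"):
        return f"all entries in database {db}"
    return f"entries in database {db} matching {entry}"


for text in ["tsw:hba*", "tsw: hba*"]:
    (token,) = first_tokens([text])
    print(repr(text), "->", repr(token), "->", describe_usa(token))

# Prints:
# 'tsw:hba*' -> 'tsw:hba*' -> entries in database tsw matching hba*
# 'tsw: hba*' -> 'tsw:' -> all entries in database tsw
```

The stray space turns "hba*" into an ignored comment, leaving "tsw:" alone, which is exactly the 3-versus-100 difference seen with list2 and list1.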

I think it is safe to assume that list files in practice have no
comments, so we can make a simple change for the next release:

list:: indicates a list file with only one token per line. Any
extraneous text will result in an error or warning message.

The same restriction will be applied to the VMS syntax @listfile.
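A minimal sketch of the proposed strict list:: rule, assuming it is enforced as "exactly one token per non-blank line" and that the chosen response is an error rather than a warning (the proposal allows either). ListFileError and read_strict_list are hypothetical names; the same check would apply to entries read through @listfile.

```python
class ListFileError(ValueError):
    """Raised when a strict list-file line holds more than one token."""


def read_strict_list(lines, source="list file"):
    """Hypothetical strict reader: exactly one token per non-blank line."""
    tokens = []
    for lineno, line in enumerate(lines, start=1):
        parts = line.split()
        if not parts:
            continue  # blank lines are assumed to be harmless
        if len(parts) > 1:
            # Extraneous text is no longer silently dropped as a comment.
            raise ListFileError(
                f"{source} line {lineno}: extraneous text after {parts[0]!r}: "
                f"{' '.join(parts[1:])!r}"
            )
        tokens.append(parts[0])
    return tokens


try:
    read_strict_list(["tsw: hba*"], source="list1")
except ListFileError as err:
    print(err)  # list1 line 1: extraneous text after 'tsw:': 'hba*'
```

With this check, Daniel's list1 fails loudly instead of quietly expanding to the whole database.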

A new list style can be added to allow comments, so that users whose
list files contain comments can still use them.

Possibly a stricter comment style could be allowed in standard list::
files. We can check what other packages may have introduced, but
something like a perl-style #comment could be simple to add. The #
character has no special meaning in the EMBOSS query language.
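One way the perl-style #comment idea could look, as a sketch only: strip everything from the first "#" (safe because "#" has no special meaning in the query language), then still require a single token. The function name read_list_with_comments is hypothetical, and this is not an announced EMBOSS behaviour.

```python
def read_list_with_comments(lines):
    """Hypothetical list reader: '#' starts a comment; otherwise one token per line."""
    tokens = []
    for lineno, line in enumerate(lines, start=1):
        # '#' has no special meaning in the EMBOSS query language,
        # so everything from the first '#' onward can be discarded.
        code = line.split("#", 1)[0]
        parts = code.split()
        if not parts:
            continue  # blank or comment-only line
        if len(parts) > 1:
            raise ValueError(f"line {lineno}: extraneous text after {parts[0]!r}")
        tokens.append(parts[0])
    return tokens


good = [
    "# entries for the alignment",
    "tsw:hba*    # explicit comment, accepted",
]
print(read_list_with_comments(good))  # ['tsw:hba*']

try:
    read_list_with_comments(["tsw: hba*   # stray space before the id"])
except ValueError as err:
    print(err)  # line 1: extraneous text after 'tsw:'
```

Comments stay possible, but only when marked explicitly, so an accidental space is still caught.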

With those changes in place, your users would be protected from extra
spaces, but of course they could still be caught by a newline creeping in
to start a new record after the database name (reading the entire database,
then reading the id as a possible filename). Users will get an error
message from that as long as the second part is not a valid filename or
database name.
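A small illustration of that remaining gap, using a hypothetical minimal one-token-per-line reader and a rough, simplified classifier (both names invented for this sketch): when a newline splits "tsw:hba*" into two lines, each line passes the one-token check, so the first line still selects the whole database and the second is treated as a file or database name.

```python
def read_strict_list(lines):
    """Hypothetical minimal reader enforcing one token per line."""
    tokens = []
    for lineno, line in enumerate(lines, start=1):
        parts = line.split()
        if len(parts) > 1:
            raise ValueError(f"line {lineno}: extraneous text")
        tokens.extend(parts)  # adds nothing for blank lines
    return tokens


def classify(token):
    """Very rough reading of a token: 'db:' alone means the whole database."""
    db, sep, entry = token.partition(":")
    if sep and entry in ("", "*"):
        return f"ALL of {db}"
    if sep:
        return f"{db} entries matching {entry}"
    return f"file or database named {token!r}"


# A newline typed after the database name splits one USA into two lines.
for token in read_strict_list(["tsw:", "hba*"]):
    print(token, "->", classify(token))

# Prints:
# tsw: -> ALL of tsw
# hba* -> file or database named 'hba*'
```

Only the second line can produce an error, and only if no file or database called "hba*" exists.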

regards,

Peter Rice
EMBOSS Team

___
EMBOSS mailing list
EMBOSS@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/emboss

[EMBOSS] Space in USA "db:seqname" in list file causes unintended behavior

2012-10-12 Thread Rozenbaum, Daniel (Biocceleration Inc)

Hello everyone,

We have encountered the following issue: if there's an erroneous (most likely
unintentional) entry in a list file that looks like "db: seqname", EMBOSS doesn't issue an error/warning message, but treats
this entry as "db:*". Here's an example using the test database tsw in the EMBOSS
distribution; it contains 100 sequences, three of which match the pattern
"hba*":

% seqret tsw -auto -stdout  | egrep "^>" | wc -l
100
% cat list2
tsw:hba*
% seqret list::list2 -auto -stdout  | egrep "^>" | wc -l
3
% cat list1
tsw: hba*
% seqret list::list1 -auto -stdout  | egrep "^>" | wc -l
100

Of course, the immediate answer is to instruct users to be careful not to
allow unintended spaces in the USAs. Might it be possible, though, to add some
protection against potentially problematic consequences if such an error in the 
USA is made? In one such instance the resultant clustalw process ended up 
attempting to build a multiple alignment across the entire UniProt, which the 
server didn't handle well :-)

With best regards,
Daniel

--
Daniel Rozenbaum
Biocceleration, Inc.
OCIO/ Office of Application Engineering & Development/ Patent System Division
600 Dulany St.
Alexandria, VA 22314

