Re: [EMBOSS] Space in USA "db:seqname" in list file causes unintended behavior

2012-10-13 Thread Peter Rice

On 12/10/2012 22:27, Rozenbaum, Daniel (Biocceleration Inc) wrote:

Hello everyone,

We have encountered the following issue: if there's an erroneous (most likely unintentional) entry in a
list file that looks like "db: seqname", EMBOSS doesn't issue an
error/warning message, but treats this entry as "db:*".

Might it be possible though to add some protection against potentially 
problematic consequences if such an error in the USA is made? In one such 
instance the resultant clustalw process ended up attempting to build a multiple 
alignment across the entire UniProt, which the server didn't handle well :-)


An interesting problem. List files have a long history, going back 
before EMBOSS. They were also used in the GCG (Wisconsin) package, which 
in turn adopted them from the VMS operating system, where they could be
used for mailing lists (sending to @list with a list of usernames, for 
example).


In a list file, only the first token (word) is significant. The 
remainder of the line is treated as a comment.


As you discovered, a space before the id (or indeed just a database 
name) is a valid input representing all entries in the database.
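To make the first-token rule concrete, here is a minimal Python sketch of how such a line would be read. It is an illustration only, not the EMBOSS implementation; the helper name first_token is hypothetical.

```python
def first_token(line: str) -> str:
    """Return the first whitespace-delimited token of a list-file line.

    Everything after the first token is ignored, mimicking the
    "rest of the line is a comment" behaviour described above.
    Returns an empty string for a blank line.
    """
    parts = line.split()
    return parts[0] if parts else ""


if __name__ == "__main__":
    # The intended entry survives intact; the entry with a stray space
    # is cut down to the bare database name, i.e. the whole database.
    for entry in ("tsw:hba*", "tsw: hba*"):
        print(repr(entry), "->", repr(first_token(entry)))
```

With the stray space, the id becomes part of the ignored comment, so only "tsw:" is left as the query.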


I think it is safe to assume that list files in practice have no 
comments, so we can make a simple change for the next release:


list:: indicates a list file with only one token per line. Any 
extraneous text will result in an error or warning message.


The same restriction will be applied to the VMS syntax @listfile.

A new list style can be added to allow comments so that any user with 
them can still use their list files.


Possibly a stricter comment style could be allowed in standard list:: 
files. We can check what other packages may have introduced, but 
something like a perl-style #comment could be simple to add. The # 
character has no special meaning in the EMBOSS query language.
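As a rough sketch of what the stricter reading could look like (one token per line, with an optional #comment), the following Python models the proposed behaviour. It is not EMBOSS code; the names ListFileError and parse_strict_list are hypothetical, and it assumes that # never appears inside a legitimate entry.

```python
class ListFileError(ValueError):
    """Raised when a list-file line carries more than one token."""


def parse_strict_list(lines):
    """Parse list-file lines under the proposed strict rule.

    Assumptions (not confirmed EMBOSS behaviour):
      * text from '#' to the end of the line is a comment;
      * blank and comment-only lines are skipped;
      * any other line must contain exactly one token.
    """
    entries = []
    for lineno, raw in enumerate(lines, start=1):
        text = raw.split("#", 1)[0].strip()  # drop the comment, if any
        if not text:
            continue
        tokens = text.split()
        if len(tokens) > 1:
            extra = " ".join(tokens[1:])
            raise ListFileError(
                f"line {lineno}: unexpected text {extra!r} after {tokens[0]!r}"
            )
        entries.append(tokens[0])
    return entries


if __name__ == "__main__":
    print(parse_strict_list(["tsw:hba*  # globins", "", "# a comment line"]))
    try:
        parse_strict_list(["tsw: hba*"])
    except ListFileError as err:
        print("rejected:", err)
```

Under this rule the entry with the stray space is rejected instead of silently expanding to the whole database.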


With those changes in place your users would be saved from extra spaces 
... but of course would still be caught by a newline creeping in to 
start a new record after the database name (reading the entire database, 
then reading the id as a possible filename). Users will get an error 
message from that so long as the second part is not a valid filename or 
database name.
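Until such a change is released, a site could run its own pre-flight check on list files before handing them to EMBOSS. The Python below is one possible sketch, with a hypothetical helper name: it flags any line whose first token ends in ":" or ":*", which covers both the stray-space and the stray-newline case. It does not try to recognise a bare database name without a colon, since that cannot be told apart from a filename without knowing the local database definitions.

```python
def whole_database_entries(lines):
    """Return (line number, token) pairs that would select a whole database.

    Only the first token of each line is inspected, matching how list
    files are read. A token ending in ':' or ':*' is treated as
    "all entries in this database".
    """
    flagged = []
    for lineno, raw in enumerate(lines, start=1):
        parts = raw.split()
        if not parts:
            continue  # blank line
        token = parts[0]
        if token.endswith(":") or token.endswith(":*"):
            flagged.append((lineno, token))
    return flagged


if __name__ == "__main__":
    sample = ["tsw:hba*", "tsw: hba*", "tsw:", "hba*"]
    for lineno, token in whole_database_entries(sample):
        print(f"line {lineno}: {token!r} selects the entire database")
```

A deliberate "db:*" entry would also be flagged, so the output is best treated as a prompt for the user to confirm rather than as an error.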


regards,

Peter Rice
EMBOSS Team


___
EMBOSS mailing list
EMBOSS@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/emboss


[EMBOSS] Space in USA "db:seqname" in list file causes unintended behavior

2012-10-12 Thread Rozenbaum, Daniel (Biocceleration Inc)
Hello everyone,

We have encountered the following issue: if there's an erroneous (most likely
unintentional) entry in a list file that looks like "db: seqname", EMBOSS doesn't issue an error/warning message, but treats
this entry as "db:*". Here's an example using the test database tsw in EMBOSS 
distribution; it contains 100 sequences, three of which match the pattern 
"hba*" :

% seqret tsw -auto -stdout  | egrep "^>" | wc -l
100
% cat list2
tsw:hba*
% seqret list::list2 -auto -stdout  | egrep "^>" | wc -l
3
% cat list1
tsw: hba*
% seqret list::list1 -auto -stdout  | egrep "^>" | wc -l
100
Of course, the immediate answer is to instruct the users to be careful not to 
allow unintended spaces in the USAs. Might it be possible though to add some
protection against potentially problematic consequences if such an error in the 
USA is made? In one such instance the resultant clustalw process ended up 
attempting to build a multiple alignment across the entire UniProt, which the 
server didn't handle well :-)

With best regards,
Daniel

--
Daniel Rozenbaum
Biocceleration, Inc.
OCIO/ Office of Application Engineering & Development/ Patent System Division
600 Dulany St.
Alexandria, VA 22314
