Re: [EMBOSS] Support for multi-line annotation in ig format

2012-09-19 Thread Peter Rice

On 19/09/2012 17:12, Rozenbaum, Daniel (Biocceleration Inc) wrote:

> Right - unfortunately all the other fields, while appearing well structured and
> nicely formatted in the example I sent, may or may not be present (or present
> but poorly formatted due to legacy issues) in the general case. And the patent
> number may not be present in the data representing patent applications that are
> still pending review.


Thanks. That at least keeps things simple!

regards,

Peter Rice
EMBOSS Team

___
EMBOSS mailing list
EMBOSS@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/emboss


Re: [EMBOSS] Support for multi-line annotation in ig format

2012-09-19 Thread Rozenbaum, Daniel (Biocceleration Inc)
Right - unfortunately all the other fields, while appearing well structured and 
nicely formatted in the example I sent, may or may not be present (or present 
but poorly formatted due to legacy issues) in the general case. And the patent 
number may not be present in the data representing patent applications that are 
still pending review.

Many thanks,
Daniel

-----Original Message-----
From: Peter Rice [mailto:ricepet...@yahoo.co.uk] 
Sent: Wednesday, September 19, 2012 11:35 AM
To: Rozenbaum, Daniel (Biocceleration Inc)
Cc: emboss@lists.open-bio.org
Subject: Re: [EMBOSS] Support for multi-line annotation in ig format

On 19/09/2012 16:23, Rozenbaum, Daniel (Biocceleration Inc) wrote:
> Dear Peter,
>
> At least within the context of USPTO the sequence identifier is the only 
> consistently present piece of information that uniquely identifies the 
> sequence. Does the absence of an accession number field make the task of 
> adding support for this in EMBOSS more complex?

No, it is not a problem. You only need to tell the database definition that it
has no accession (but perhaps the patent number could be used as an accession).

regards,

Peter Rice
EMBOSS Team





Re: [EMBOSS] Support for multi-line annotation in ig format

2012-09-19 Thread Peter Rice

On 19/09/2012 16:23, Rozenbaum, Daniel (Biocceleration Inc) wrote:

> Dear Peter,
>
> At least within the context of USPTO the sequence identifier is the only
> consistently present piece of information that uniquely identifies the
> sequence. Does the absence of an accession number field make the task of adding
> support for this in EMBOSS more complex?


No, it is not a problem. You only need to tell the database definition
that it has no accession (but perhaps the patent number could be used as
an accession).


regards,

Peter Rice
EMBOSS Team



Re: [EMBOSS] Support for multi-line annotation in ig format

2012-09-19 Thread Rozenbaum, Daniel (Biocceleration Inc)
Dear Peter,

At least within the context of USPTO the sequence identifier is the only 
consistently present piece of information that uniquely identifies the 
sequence. Does the absence of an accession number field make the task of adding 
support for this in EMBOSS more complex?

Thank you,
Daniel

On Sep 19, 2012, at 11:14 AM, "Peter Rice"  wrote:

> Dear Daniel,
> 
> On 19/09/2012 14:49, Rozenbaum, Daniel (Biocceleration Inc) wrote:
> 
>> I am attaching a short anonymized sample file (would a larger data set be 
>> helpful?) that illustrates the type of IG format in use at USPTO. I believe 
>> that the only reasonably indexable field is the sequence name 
>> ("US-123456789-1", "US-123456789-2", etc). While the annotation fields 
>> appear structured, that part of the information is not reliable.
> 
> Thanks, I'll take a look.
> 
> We usually index an "accession number" in addition to the identifier. Is 
> there some significance in the parts of the id naming that could be used 
> as an accession or a sequence version?
> 
>> As for the name, how about something like "iguspto"?
> 
> Thanks. I may just use USPTO but it's not important.
> 
>> Lastly, do you think the patch with this change would be made available for 
>> EMBOSS 6.4?
> 
> Yes ... it is a fairly straightforward extension to dbxflat, so I could 
> send you a copy, but for general release I would prefer to distribute it 
> only from 6.5 onwards.
> 
> regards,
> 
> Peter Rice
> EMBOSS Team
> 



Re: [EMBOSS] Support for multi-line annotation in ig format

2012-09-19 Thread Peter Rice

Dear Daniel,

On 19/09/2012 14:49, Rozenbaum, Daniel (Biocceleration Inc) wrote:


> I am attaching a short anonymized sample file (would a larger data set be
> helpful?) that illustrates the type of IG format in use at USPTO. I believe
> that the only reasonably indexable field is the sequence name
> ("US-123456789-1", "US-123456789-2", etc). While the annotation fields
> appear structured, that part of the information is not reliable.


Thanks, I'll take a look.

We usually index an "accession number" in addition to the identifier. Is
there some significance in the parts of the id naming that could be used 
as an accession or a sequence version?



> As for the name, how about something like "iguspto"?


Thanks. I may just use USPTO but it's not important.


> Lastly, do you think the patch with this change would be made available for
> EMBOSS 6.4?


Yes ... it is a fairly straightforward extension to dbxflat, so I could
send you a copy, but for general release I would prefer to distribute it
only from 6.5 onwards.


regards,

Peter Rice
EMBOSS Team
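
On the question of whether parts of the name could serve as an accession
or a version: a minimal Python sketch, assuming names of the form
"US-123456789-1" where the part before the last hyphen identifies the
application and the trailing number is the sequence number within it. That
layout is only inferred from the names quoted above; the thread vouches for
the whole name being reliable, not for its parts. split_name is a
hypothetical helper, not an EMBOSS function.

```python
from typing import Optional, Tuple


def split_name(name: str) -> Optional[Tuple[str, int]]:
    """Split 'US-123456789-1' into ('US-123456789', 1).

    Assumption: the text after the LAST hyphen is a sequence number.
    Returns None when the name does not fit that shape, so callers can
    fall back to indexing the whole name only.
    """
    prefix, sep, seq_no = name.strip().rpartition("-")
    if not sep or not prefix or not seq_no.isdigit():
        return None
    return prefix, int(seq_no)
```

Splitting at the last hyphen rather than matching a fixed pattern keeps the
sketch usable if the application part itself contains hyphens.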


Re: [EMBOSS] Support for multi-line annotation in ig format

2012-09-19 Thread Rozenbaum, Daniel (Biocceleration Inc)
A quick addition to the information on this format: while the example I sent 
has the records separated by a couple of newlines and a form feed (^L, 0x0c),
in the most general case the first line of the next record (a line that starts 
with a semicolon) could appear immediately after the last sequence data line of 
the previous record. Empty lines between records are ignored.
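
The separation rules described above can be sketched in Python as follows.
This is only an illustration under stated assumptions, not EMBOSS code:
parse_ig and IgRecord are hypothetical names, every record is assumed to
start with at least one ';' line, and the trailing "1" (linear) or "2"
(circular) is treated as the end-of-sequence marker.

```python
from typing import Iterator, List, NamedTuple, Optional


class IgRecord(NamedTuple):
    annotations: List[str]  # one entry per ';' line, in file order
    name: str               # e.g. "US-123456789-1"
    sequence: str           # residues without the trailing terminator


def _finish(annotations: List[str], name: str, parts: List[str]) -> IgRecord:
    sequence = "".join(parts)
    # A trailing '1' or '2' terminates the sequence; drop it.
    if sequence[-1:] in ("1", "2"):
        sequence = sequence[:-1]
    return IgRecord(annotations, name, sequence)


def parse_ig(text: str) -> Iterator[IgRecord]:
    """Yield records, keeping each annotation line separate."""
    annotations: List[str] = []
    name: Optional[str] = None
    parts: List[str] = []
    # str.splitlines() also breaks on form feed (0x0c), and strip()
    # removes it, so form feeds end up as ignorable empty lines.
    for raw in text.splitlines():
        line = raw.strip()
        if not line:
            continue  # empty lines between records are ignored
        if line.startswith(";"):
            if name is not None:
                # A ';' line directly after sequence data starts a new record.
                yield _finish(annotations, name, parts)
                annotations, name, parts = [], None, []
            annotations.append(line[1:].strip())
        elif name is None:
            name = line  # first non-';' line is the sequence name
        else:
            parts.append(line)
    if name is not None:
        yield _finish(annotations, name, parts)
```

Because the record boundary is detected from the ';' line itself, the same
loop handles records separated by blank lines, by a form feed, or by nothing
at all.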

On Sep 19, 2012, at 10:09 AM, "Rozenbaum, Daniel (Biocceleration Inc)" 
 wrote:

> Dear Peter,
> 
> This is most wonderful news that's going to make a bunch of users really 
> happy!
> 
> I am attaching a short anonymized sample file (would a larger data set be 
> helpful?) that illustrates the type of IG format in use at USPTO. I believe 
> that the only reasonably indexable field is the sequence name 
> ("US-123456789-1", "US-123456789-2", etc). While the annotation fields appear 
> structured, that part of the information is not reliable. 
> 
> As for the name, how about something like "iguspto"?
> 
> Lastly, do you think the patch with this change would be made available for 
> EMBOSS 6.4? 
> 
> With gratitude,
> Daniel
> 
> --
> Daniel Rozenbaum
> Biocceleration, Inc.
> OCIO/ Office of Application Engineering & Development/ Patent System Division
> 600 Dulany St.
> Alexandria, VA 22314
> 
> -----Original Message-----
> From: Peter Rice [mailto:ricepet...@yahoo.co.uk] 
> Sent: Wednesday, September 19, 2012 6:48 AM
> To: Rozenbaum, Daniel (Biocceleration Inc)
> Cc: emboss@lists.open-bio.org
> Subject: Re: [EMBOSS] Support for multi-line annotation in ig format
> 
> Dear Daniel,
> 
> On 18/09/2012 03:00, Rozenbaum, Daniel (Biocceleration Inc) wrote:
>> Greetings again,
>> 
>> If I may, another question on the issue of IG format: how difficult would it 
>> be to support database indexing for this format?
> 
> Very easy, a 1-day job including testing and documentation.
> 
> Could you please make some example data available, and indicate which fields 
> could be indexed (including any information in formatted descriptions or in 
> naming conventions), and suggest a format name (e.g. 
> USPTO or Biocceleration)
> 
> regards,
> 
> Peter Rice
> EMBOSS Team
> 
> 
> 


Re: [EMBOSS] Support for multi-line annotation in ig format

2012-09-19 Thread Rozenbaum, Daniel (Biocceleration Inc)
Dear Peter,

This is most wonderful news that's going to make a bunch of users really happy!

I am attaching a short anonymized sample file (would a larger data set be 
helpful?) that illustrates the type of IG format in use at USPTO. I believe 
that the only reasonably indexable field is the sequence name 
("US-123456789-1", "US-123456789-2", etc). While the annotation fields appear 
structured, that part of the information is not reliable. 

As for the name, how about something like "iguspto"?

Lastly, do you think the patch with this change would be made available for 
EMBOSS 6.4? 

With gratitude,
Daniel

--
Daniel Rozenbaum
Biocceleration, Inc.
OCIO/ Office of Application Engineering & Development/ Patent System Division
600 Dulany St.
Alexandria, VA 22314

-----Original Message-----
From: Peter Rice [mailto:ricepet...@yahoo.co.uk] 
Sent: Wednesday, September 19, 2012 6:48 AM
To: Rozenbaum, Daniel (Biocceleration Inc)
Cc: emboss@lists.open-bio.org
Subject: Re: [EMBOSS] Support for multi-line annotation in ig format

Dear Daniel,

On 18/09/2012 03:00, Rozenbaum, Daniel (Biocceleration Inc) wrote:
> Greetings again,
>
> If I may, another question on the issue of IG format: how difficult would it 
> be to support database indexing for this format?

Very easy, a 1-day job including testing and documentation.

Could you please make some example data available, and indicate which fields 
could be indexed (including any information in formatted descriptions or in 
naming conventions), and suggest a format name (e.g. 
USPTO or Biocceleration)

regards,

Peter Rice
EMBOSS Team


; Sequence 1, Application US/123456789
; Patent No. 98765432
; GENERAL INFORMATION
;  APPLICANT: Doe, Jane
;  APPLICANT: Doe, John
;  TITLE OF INVENTION: title of invention text here
;  FILE REFERENCE: file reference text here
;  CURRENT APPLICATION NUMBER: 123456789
;  CURRENT FILING DATE: 2010-01-01
;  NUMBER OF SEQ ID NOS: 4
;  SOFTWARE: PatentIn version 3.5
; SEQ ID NO 1
;  LENGTH: 178
;  TYPE: PRT
;  ORGANISM: organism description text here
US-123456789-1
HGQGMHKIEAPCGQMFRCTMVKFSDDYNEPIALKIRYARPGTCWYAMVVCEQMVPWISWT
LALTRVAGQVRDSPPFWAWYCEKMQANKPMPWRQTWVAHYAWPENWMNPYNVFGKCHKTD
LGRCWQWWKDITEQLTVCHWMDWGIACQDCLEKTKHGLCHSRAQIMHCGHGGVTGKHA1


; Sequence 2, Application US/123456789
; Patent No. 98765432
; GENERAL INFORMATION
;  APPLICANT: Doe, Jane
;  APPLICANT: Doe, John
;  TITLE OF INVENTION: title of invention text here
;  FILE REFERENCE: file reference text here
;  CURRENT APPLICATION NUMBER: 123456789
;  CURRENT FILING DATE: 2010-01-01
;  NUMBER OF SEQ ID NOS: 4
;  SOFTWARE: PatentIn version 3.5
; SEQ ID NO 2
;  LENGTH: 500
;  TYPE: PRT
;  ORGANISM: organism description text here
US-123456789-2
KTLNSGAQIALVMTNASRGLPQTSRVLDYREVNRTDSGNYHGDSYRYHEHRVKYESMNKM
CNTLLAFCRPKKMQNTARWHRVDLCMQEYCACPRMFCTVQTHMPWFRSDVGPPWFAARTN
PECSIVDGAVGRKCHEPTTNEVAGCRFECGPVSHEDPIMKWHAVTGHQRSMILILLGPRQ
CGKTTSEIWCHYVHDWAHMQHVTYYTVVDEERMNAFANKNHTNVCKYHPSMLHCVHRLSP
HPPVEYNLKNLKITYMPPNSISNPGITLDNTCLQTACLGSHYSWVMVEMYTRNCYRAPAY
NKAQNSDTWGIQTAVHTANGHEANQEVCIAIIFIGFWAYKHDVWHMTVDEVDGYMPDESV
NGDGGPKKYIEFKCQYWTGFDYDAIGIHVLTRFFRWYEFCLRWQHGKAHIHAPCRDTGHG
ANTLAKAESNPFGAAQSALGWLMDNLCKYLMCNRCAQLNASHWTFWTNPMDQWMCGMLDI
CRPPMLRKGPISDESHTFTD1


; Sequence 3, Application US/123456789
; Patent No. 98765432
; GENERAL INFORMATION
;  APPLICANT: Doe, Jane
;  APPLICANT: Doe, John
;  TITLE OF INVENTION: title of invention text here
;  FILE REFERENCE: file reference text here
;  CURRENT APPLICATION NUMBER: 123456789
;  CURRENT FILING DATE: 2010-01-01
;  NUMBER OF SEQ ID NOS: 4
;  SOFTWARE: PatentIn version 3.5
; SEQ ID NO 3
;  LENGTH: 61
;  TYPE: PRT
;  ORGANISM: organism description text here
US-123456789-3
GVSGANWCNNEWFNARSGWPAPICTGRFPKVSAYCRLVVMWYAKTFFRYEFAFVHKRTGP
M1


; Sequence 4, Application US/123456789
; Patent No. 98765432
; GENERAL INFORMATION
;  APPLICANT: Doe, Jane
;  APPLICANT: Doe, John
;  TITLE OF INVENTION: title of invention text here
;  FILE REFERENCE: file reference text here
;  CURRENT APPLICATION NUMBER: 123456789
;  CURRENT FILING DATE: 2010-01-01
;  NUMBER OF SEQ ID NOS: 4
;  SOFTWARE: PatentIn version 3.5
; SEQ ID NO 4
;  LENGTH: 10
;  TYPE: PRT
;  ORGANISM: organism description text here
US-123456789-4
YDAIGIHVLT1
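
Since the sequence name is the only field described as reliably indexable,
an index over a file like the sample above only needs to map each name to
the place where its record starts. The following Python sketch shows that
idea. It is not how dbxflat works internally; index_ig and fetch are
hypothetical helpers, and each record is assumed to begin with a ';' line.

```python
import io
from typing import BinaryIO, Dict


def index_ig(handle: BinaryIO) -> Dict[str, int]:
    """Map each sequence name to the byte offset where its record starts."""
    index: Dict[str, int] = {}
    record_start = None  # offset of the first ';' line of the current record
    have_name = False    # True once the current record's name line was seen
    while True:
        offset = handle.tell()
        raw = handle.readline()
        if not raw:
            break  # end of file
        line = raw.strip()  # also drops form feeds at either end
        if not line:
            continue  # blank lines between records are ignored
        if line.startswith(b";"):
            if record_start is None or have_name:
                # First ';' line of a new record.
                record_start, have_name = offset, False
        elif not have_name:
            # First non-';' line after the annotation block is the name.
            if record_start is None:
                record_start = offset
            index[line.decode("ascii")] = record_start
            have_name = True
    return index


def fetch(handle: BinaryIO, index: Dict[str, int], name: str) -> bytes:
    """Return the raw bytes of one record, stopping before the next one."""
    begin = index[name]
    later = sorted(start for start in index.values() if start > begin)
    handle.seek(begin)
    return handle.read(later[0] - begin) if later else handle.read()


if __name__ == "__main__":
    data = (b"; Sequence 1\nUS-123456789-1\nHGQ1\n\n"
            b"; Sequence 2\nUS-123456789-2\nKTL1\n")
    demo = io.BytesIO(data)
    print(fetch(demo, index_ig(demo), "US-123456789-1").decode("ascii"))
```

Reading up to the start of the next record, rather than up to the next
';' line seen while scanning, keeps a fetched entry from picking up the
first annotation line of its neighbour.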


Re: [EMBOSS] Support for multi-line annotation in ig format

2012-09-19 Thread Peter Rice

On 19/09/2012 11:58, Peter Cock wrote:

> Does it need a new format name? EMBOSS already defines "ig" and
> "igstrict" - do the USPTO files diverge from these?


The format name is needed as an option to dbxflat -idformat so we can 
select a specific parser for any additional fields.


For example, the -idformat option of dbxfasta has 7 names for 'fasta' format.

regards,

Peter Rice
EMBOSS Team


Re: [EMBOSS] Support for multi-line annotation in ig format

2012-09-19 Thread Peter Rice

Dear Daniel,

On 14/09/2012 13:56, Rozenbaum, Daniel (Biocceleration Inc) wrote:

> Hello Peter and everyone,
>
> Here's an example of an additional issue I've run into when trying to work with
> IG format in EMBOSS:
>
> In the entret result above the first annotation line of the subsequent record
> is returned as part of the requested record.


Well spotted. The input buffer is not reset in Ig formats so the next 
line was included in the entret output.


I will fix it in the next patch for the latest release (6.5). Let me 
know if you also need a patch for 6.4.


regards,

Peter Rice
EMBOSS Team
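
The kind of problem described above comes up in any reader that has to
look one line past the end of a record to know the record is over. The
Python sketch below is not the EMBOSS source and does not claim to mirror
its buffering; IgReader is a hypothetical class that only illustrates the
general pattern of holding the look-ahead line back for the next record
instead of returning it with the current one.

```python
from typing import Iterable, List, Optional


class IgReader:
    """Reads raw record lines one record at a time (illustrative only)."""

    def __init__(self, lines: Iterable[str]) -> None:
        self._lines = iter(lines)
        self._pending: Optional[str] = None  # look-ahead line not yet used

    def next_record(self) -> Optional[List[str]]:
        record: List[str] = []
        seen_data = False  # True once a name or sequence line was read
        while True:
            if self._pending is not None:
                # Start with the line held back by the previous call.
                line, self._pending = self._pending, None
            else:
                nxt = next(self._lines, None)
                if nxt is None:
                    break  # input exhausted
                line = nxt
            if not line.strip():
                continue  # skip blank lines
            if line.startswith(";") and seen_data:
                # This line belongs to the NEXT record: keep it for the
                # next call instead of appending it to this record.
                self._pending = line
                break
            if not line.startswith(";"):
                seen_data = True
            record.append(line.rstrip("\n"))
        return record or None
```

Fed the three-record ig1.ig file from the report, the first call returns
only the three lines of EMBOSS_001, and the second call starts with the
";, 10 bases" line that had been read ahead.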



Re: [EMBOSS] Support for multi-line annotation in ig format

2012-09-19 Thread Peter Cock
On Wed, Sep 19, 2012 at 11:48 AM, Peter Rice  wrote:
> Dear Daniel,
>
>
> On 18/09/2012 03:00, Rozenbaum, Daniel (Biocceleration Inc) wrote:
>>
>> Greetings again,
>>
>> If I may, another question on the issue of IG format: how difficult would
>> it be to support database indexing for this format?
>
>
> Very easy, a 1-day job including testing and documentation.
>
> Could you please make some example data available, and indicate which fields
> could be indexed (including any information in formatted descriptions or in
> naming conventions), and suggest a format name (e.g. USPTO or
> Biocceleration)

Does it need a new format name? EMBOSS already defines "ig" and
"igstrict" - do the USPTO files diverge from these?

Peter C.

P.S. Biopython also uses the format name "ig", based on the current
EMBOSS terminology.


Re: [EMBOSS] Support for multi-line annotation in ig format

2012-09-19 Thread Peter Rice

Dear Daniel,

On 18/09/2012 03:00, Rozenbaum, Daniel (Biocceleration Inc) wrote:

> Greetings again,
>
> If I may, another question on the issue of IG format: how difficult would it be
> to support database indexing for this format?


Very easy, a 1-day job including testing and documentation.

Could you please make some example data available, and indicate which 
fields could be indexed (including any information in formatted 
descriptions or in naming conventions), and suggest a format name (e.g. 
USPTO or Biocceleration)


regards,

Peter Rice
EMBOSS Team



Re: [EMBOSS] Support for multi-line annotation in ig format

2012-09-18 Thread Peter Cock
On Tue, Sep 18, 2012 at 12:42 PM, Rozenbaum, Daniel (Biocceleration
Inc)  wrote:
> Hi Peter,
>
> I don't believe the USPTO provides datasets to the public in the IG format.
>
> With best regards,
> Daniel

OK, thanks.

Peter


Re: [EMBOSS] Support for multi-line annotation in ig format

2012-09-18 Thread Rozenbaum, Daniel (Biocceleration Inc)
Hi Peter,

I don't believe the USPTO provides datasets to the public in the IG format.

With best regards,
Daniel


From: Peter Cock [p.j.a.c...@googlemail.com]
Sent: Tuesday, September 18, 2012 4:25 AM
To: Rozenbaum, Daniel (Biocceleration Inc)
Cc: emboss@lists.open-bio.org
Subject: Re: [EMBOSS] Support for multi-line annotation in ig format

On Fri, Sep 14, 2012 at 1:56 PM, Rozenbaum, Daniel (Biocceleration
Inc)  wrote:
> Hello Peter and everyone,
>
> I was wondering if I could  revive the discussion about the support of IG
> format if possible. I'm helping deploy EMBOSS at the US Patent and
> Trademark Office, where this format, in its multi-line sequence annotation
> form, is used extensively.

Hi Daniel,

That is interesting to know - I work on Biopython, which has support for
reading and indexing the Intelligenetics "ig" format. I'd been under the
impression that this was a defunct/unused file format (and therefore
never bothered to implement support for writing it in Biopython).

Does the US Patent and Trademark Office provide datasets to the
public in this format?

Thanks,

Peter C.


Re: [EMBOSS] Support for multi-line annotation in ig format

2012-09-18 Thread Peter Cock
On Fri, Sep 14, 2012 at 1:56 PM, Rozenbaum, Daniel (Biocceleration
Inc)  wrote:
> Hello Peter and everyone,
>
> I was wondering if I could  revive the discussion about the support of IG
> format if possible. I'm helping deploy EMBOSS at the US Patent and
> Trademark Office, where this format, in its multi-line sequence annotation
> form, is used extensively.

Hi Daniel,

That is interesting to know - I work on Biopython, which has support for
reading and indexing the Intelligenetics "ig" format. I'd been under the
impression that this was a defunct/unused file format (and therefore
never bothered to implement support for writing it in Biopython).

Does the US Patent and Trademark Office provide datasets to the
public in this format?

Thanks,

Peter C.


Re: [EMBOSS] Support for multi-line annotation in ig format

2012-09-17 Thread Rozenbaum, Daniel (Biocceleration Inc)
Greetings again,

If I may, another question on the issue of IG format: how difficult would it be 
to support database indexing for this format?

With best regards,
Daniel

--
Daniel Rozenbaum
Biocceleration, Inc.
OCIO/ Office of Application Engineering & Development/ Patent System Division
600 Dulany St.
Alexandria, VA 22314

On Sep 14, 2012, at 9:36 AM, "Rozenbaum, Daniel (Biocceleration Inc)" 
 wrote:

> Hello Peter and everyone,
> 
> I was wondering if I could  revive the discussion about the support of IG 
> format if possible. I'm helping deploy EMBOSS at the US Patent and Trademark 
> Office, where this format, in its multi-line sequence annotation form, is 
> used extensively.
> 
> Here's an example of an additional issue I've run into when trying to work 
> with IG format in EMBOSS:
> 
> % makeprotseq -amount 10 -length 10 -nouseinsert -osformat ig -auto -osname 
> ig1
> 
> % cat ig1.ig
> ;, 10 bases
> EMBOSS_001
> hcsptpstas1
> ;, 10 bases
> EMBOSS_002
> rdgwcvmtrm1
> ;, 10 bases
> EMBOSS_003
> fgtifgdgid1
> 
> 
> % entret  -sequence ig1.ig:EMBOSS_001 -nofirstonly -auto -stdout
> ;, 10 bases
> EMBOSS_001
> hcsptpstas1
> ;, 10 bases
> 
> In the entret result above the first annotation line of the subsequent record 
> is returned as part of the requested record.
> 
> Many thanks,
> Daniel
> --
> Daniel Rozenbaum
> Biocceleration, Inc.
> OCIO/ Office of Application Engineering & Development/ Patent System Division 
> 600 Dulany St.
> Alexandria VA 22314
> 
> -
> On 15/08/2012 17:57, Daniel Rozenbaum wrote:
>> Dear list,
>> 
>> (Peter, many thanks for your prompt reply to my previous inquiry!)
>> 
>> We need to deal with extensive databases in Intelligenetics format with 
>> multiple lines in annotation of each record. It appears however that EMBOSS 
>> concatenates all annotation lines into a single line when building its 
>> internal representation of the sequence description:
>> 
>> % cat /tmp/IGSEQ.ig
>> ; Annotation line 1
>> ; Annotation line 2
>> ; Annotation line 3
>> IGSEQ
>> ACGCATCGCATCAGACTACGC1
>> 
>> 
>> % seqret /tmp/IGSEQ.ig -osformat2 ig -auto -osname IGSEQ.emboss_ig2ig 
>> -osdirectory /tmp
>> 
>> 
>> % cat /tmp/IGSEQ.emboss_ig2ig.ig
>> ;Annotation line 1 Annotation line 2 Annotation line 3, 21 bases
>> IGSEQ
>> ACGCATCGCATCAGACTACGC1
>> 
>> Are there any plans to support multi-line annotation in this format?
> 
> Interesting thought. We will take a look. It will need some care to 
> maintain compatibility with other formats that have single (FASTA) or 
> multiple (swissprot) descriptions.
> 
> Which package is using this IG format?
> 
> regards,
> 
> Peter Rice
> EMBOSS Team
> 
> 
> 


[EMBOSS] Support for multi-line annotation in ig format

2012-09-14 Thread Rozenbaum, Daniel (Biocceleration Inc)
Hello Peter and everyone,

I was wondering if I could revive the discussion about the support of IG
format if possible. I'm helping deploy EMBOSS at the US Patent and Trademark 
Office, where this format, in its multi-line sequence annotation form, is used 
extensively.

Here's an example of an additional issue I've run into when trying to work with 
IG format in EMBOSS:

% makeprotseq -amount 10 -length 10 -nouseinsert -osformat ig -auto -osname ig1

% cat ig1.ig
;, 10 bases
EMBOSS_001
hcsptpstas1
;, 10 bases
EMBOSS_002
rdgwcvmtrm1
;, 10 bases
EMBOSS_003
fgtifgdgid1


% entret  -sequence ig1.ig:EMBOSS_001 -nofirstonly -auto -stdout
;, 10 bases
EMBOSS_001
hcsptpstas1
;, 10 bases

In the entret result above the first annotation line of the subsequent record 
is returned as part of the requested record.

Many thanks,
Daniel
--
Daniel Rozenbaum
Biocceleration, Inc.
OCIO/ Office of Application Engineering & Development/ Patent System Division 
600 Dulany St.
Alexandria VA 22314

-
On 15/08/2012 17:57, Daniel Rozenbaum wrote:
> Dear list,
>
> (Peter, many thanks for your prompt reply to my previous inquiry!)
>
> We need to deal with extensive databases in Intelligenetics format with 
> multiple lines in annotation of each record. It appears however that EMBOSS 
> concatenates all annotation lines into a single line when building its 
> internal representation of the sequence description:
>
> % cat /tmp/IGSEQ.ig
> ; Annotation line 1
> ; Annotation line 2
> ; Annotation line 3
> IGSEQ
> ACGCATCGCATCAGACTACGC1
>
>
> % seqret /tmp/IGSEQ.ig -osformat2 ig -auto -osname IGSEQ.emboss_ig2ig 
> -osdirectory /tmp
>
>
> % cat /tmp/IGSEQ.emboss_ig2ig.ig
> ;Annotation line 1 Annotation line 2 Annotation line 3, 21 bases
> IGSEQ
> ACGCATCGCATCAGACTACGC1
>
> Are there any plans to support multi-line annotation in this format?

Interesting thought. We will take a look. It will need some care to 
maintain compatibility with other formats that have single (FASTA) or 
multiple (swissprot) descriptions.

Which package is using this IG format?

regards,

Peter Rice
EMBOSS Team





Re: [EMBOSS] Support for multi-line annotation in ig format

2012-08-15 Thread Peter Rice

On 15/08/2012 17:57, Daniel Rozenbaum wrote:

> Dear list,
>
> (Peter, many thanks for your prompt reply to my previous inquiry!)
>
> We need to deal with extensive databases in Intelligenetics format with
> multiple lines in annotation of each record. It appears however that EMBOSS
> concatenates all annotation lines into a single line when building its
> internal representation of the sequence description:
>
> % cat /tmp/IGSEQ.ig
> ; Annotation line 1
> ; Annotation line 2
> ; Annotation line 3
> IGSEQ
> ACGCATCGCATCAGACTACGC1
>
>
> % seqret /tmp/IGSEQ.ig -osformat2 ig -auto -osname IGSEQ.emboss_ig2ig
> -osdirectory /tmp
>
>
> % cat /tmp/IGSEQ.emboss_ig2ig.ig
> ;Annotation line 1 Annotation line 2 Annotation line 3, 21 bases
> IGSEQ
> ACGCATCGCATCAGACTACGC1
>
> Are there any plans to support multi-line annotation in this format?


Interesting thought. We will take a look. It will need some care to 
maintain compatibility with other formats that have single (FASTA) or 
multiple (swissprot) descriptions.


Which package is using this IG format?

regards,

Peter Rice
EMBOSS Team



[EMBOSS] Support for multi-line annotation in ig format

2012-08-15 Thread Daniel Rozenbaum
Dear list,

(Peter, many thanks for your prompt reply to my previous inquiry!)

We need to deal with extensive databases in Intelligenetics format with 
multiple lines in annotation of each record. It appears however that EMBOSS 
concatenates all annotation lines into a single line when building its internal 
representation of the sequence description:

% cat /tmp/IGSEQ.ig
; Annotation line 1
; Annotation line 2
; Annotation line 3
IGSEQ
ACGCATCGCATCAGACTACGC1


% seqret /tmp/IGSEQ.ig -osformat2 ig -auto -osname IGSEQ.emboss_ig2ig 
-osdirectory /tmp


% cat /tmp/IGSEQ.emboss_ig2ig.ig
;Annotation line 1 Annotation line 2 Annotation line 3, 21 bases
IGSEQ
ACGCATCGCATCAGACTACGC1

Are there any plans to support multi-line annotation in this format?

Many thanks,
Daniel
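
For comparison with the seqret output shown above, here is a minimal
Python sketch of a writer that keeps one ';' line per annotation instead
of joining them. It is not EMBOSS code: format_ig is a hypothetical
helper, the 60-column wrap is an assumption, and the trailing "1" (or "2"
for circular sequences) is the IG end-of-sequence marker.

```python
from typing import Iterable, List


def format_ig(annotations: Iterable[str], name: str, sequence: str,
              circular: bool = False, width: int = 60) -> str:
    """Render one IG record with each annotation on its own ';' line."""
    lines: List[str] = [f"; {text}" for text in annotations]
    if not lines:
        # Assumption: keep at least one ';' line so that a reader can
        # still recognise where the record starts.
        lines.append(";")
    lines.append(name)
    # The terminator is appended before wrapping, so it shares the last
    # sequence line with the final residues.
    body = sequence + ("2" if circular else "1")
    lines.extend(body[i:i + width] for i in range(0, len(body), width))
    return "\n".join(lines) + "\n"
```

Called with the three annotation lines, the name IGSEQ and the 21-base
sequence from the example, it reproduces the original /tmp/IGSEQ.ig layout
rather than the single concatenated comment line.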