Re: [EMBOSS] Support for multi-line annotation in ig format
On 19/09/2012 17:12, Rozenbaum, Daniel (Biocceleration Inc) wrote:
> Right - unfortunately all the other fields, while appearing well
> structured and nicely formatted in the example I sent, may or may not be
> present (or present but poorly formatted due to legacy issues) in the
> general case. And the patent number may not be present in the data
> representing patent applications that are still pending review.

Thanks. That at least keeps things simple!

regards,

Peter Rice
EMBOSS Team
___
EMBOSS mailing list
EMBOSS@lists.open-bio.org
http://lists.open-bio.org/mailman/listinfo/emboss
Re: [EMBOSS] Support for multi-line annotation in ig format
Right - unfortunately all the other fields, while appearing well structured
and nicely formatted in the example I sent, may or may not be present (or
present but poorly formatted due to legacy issues) in the general case. And
the patent number may not be present in the data representing patent
applications that are still pending review.

Many thanks,
Daniel

-----Original Message-----
From: Peter Rice [mailto:ricepet...@yahoo.co.uk]
Sent: Wednesday, September 19, 2012 11:35 AM
To: Rozenbaum, Daniel (Biocceleration Inc)
Cc: emboss@lists.open-bio.org
Subject: Re: [EMBOSS] Support for multi-line annotation in ig format

On 19/09/2012 16:23, Rozenbaum, Daniel (Biocceleration Inc) wrote:
> Dear Peter,
>
> At least within the context of USPTO the sequence identifier is the only
> consistently present piece of information that uniquely identifies the
> sequence. Does the absence of an accession number field make the task of
> adding support for this in EMBOSS more complex?

No, it is not a problem. You only need to tell the database definition it
has no accession (but perhaps the patent number could be used as an
accession)

regards,

Peter Rice
EMBOSS Team
Re: [EMBOSS] Support for multi-line annotation in ig format
On 19/09/2012 16:23, Rozenbaum, Daniel (Biocceleration Inc) wrote:
> Dear Peter,
>
> At least within the context of USPTO the sequence identifier is the only
> consistently present piece of information that uniquely identifies the
> sequence. Does the absence of an accession number field make the task of
> adding support for this in EMBOSS more complex?

No, it is not a problem. You only need to tell the database definition it
has no accession (but perhaps the patent number could be used as an
accession)

regards,

Peter Rice
EMBOSS Team
Re: [EMBOSS] Support for multi-line annotation in ig format
Dear Peter,

At least within the context of USPTO the sequence identifier is the only
consistently present piece of information that uniquely identifies the
sequence. Does the absence of an accession number field make the task of
adding support for this in EMBOSS more complex?

Thank you,
Daniel

On Sep 19, 2012, at 11:14 AM, "Peter Rice" wrote:

> Dear Daniel,
>
> On 19/09/2012 14:49, Rozenbaum, Daniel (Biocceleration Inc) wrote:
>
>> I am attaching a short anonymized sample file (would a larger data set be
>> helpful?) that illustrates the type of IG format in use at USPTO. I believe
>> that the only reasonably indexable field is the sequence name
>> ("US-123456789-1", "US-123456789-2", etc). While the annotation fields
>> appear structured, that part of the information is not reliable.
>
> Thanks I'll take a look.
>
> We usually index an "access number" in addition to the identifier. Is
> there some significance in the parts of the id naming that could be used
> as an accession or a sequence version?
>
>> As for the name, how about something like "iguspto"?
>
> Thanks. I may just use USPTO but it's not important.
>
>> Lastly, do you think the patch with this change would be made available for
>> EMBOSS 6.4?
>
> Yes ... it is a fairly straightforward extension to dbxflat so I could
> send you a copy but for general release I would prefer to distribute it
> only from 6.5 onwards.
>
> regards,
>
> Peter Rice
> EMBOSS Team
Re: [EMBOSS] Support for multi-line annotation in ig format
Dear Daniel,

On 19/09/2012 14:49, Rozenbaum, Daniel (Biocceleration Inc) wrote:
> I am attaching a short anonymized sample file (would a larger data set be
> helpful?) that illustrates the type of IG format in use at USPTO. I believe
> that the only reasonably indexable field is the sequence name
> ("US-123456789-1", "US-123456789-2", etc). While the annotation fields
> appear structured, that part of the information is not reliable.

Thanks I'll take a look.

We usually index an "access number" in addition to the identifier. Is
there some significance in the parts of the id naming that could be used
as an accession or a sequence version?

> As for the name, how about something like "iguspto"?

Thanks. I may just use USPTO but it's not important.

> Lastly, do you think the patch with this change would be made available for
> EMBOSS 6.4?

Yes ... it is a fairly straightforward extension to dbxflat so I could
send you a copy but for general release I would prefer to distribute it
only from 6.5 onwards.

regards,

Peter Rice
EMBOSS Team
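To make the question about the parts of the id concrete, here is a minimal Python sketch that splits an identifier of the shape quoted in this thread ("US-123456789-1") into an application number and a sequence number. The pattern is inferred only from the anonymized sample, so real USPTO identifiers may well have other shapes; `UsptoId` and `split_uspto_id` are hypothetical names for illustration and are not part of EMBOSS.

```python
import re
from typing import NamedTuple, Optional


class UsptoId(NamedTuple):
    application: str  # candidate "accession": the application number
    seq_no: int       # candidate "version": the SEQ ID NO within the application


# Assumption: identifiers look like US-<digits>-<digits>, as in the sample.
_ID_RE = re.compile(r"^US-(\d+)-(\d+)$")


def split_uspto_id(name: str) -> Optional[UsptoId]:
    """Return the two numeric parts of the id, or None if it does not match."""
    match = _ID_RE.match(name.strip())
    if match is None:
        return None
    return UsptoId(match.group(1), int(match.group(2)))
```

Under that assumption, `split_uspto_id("US-123456789-2")` yields the application number `"123456789"` and sequence number `2`, while anything that does not fit the pattern returns `None` rather than a guess.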
Re: [EMBOSS] Support for multi-line annotation in ig format
A quick addition to the information on this format: while the example I sent
has the records separated by a couple of new lines and a form feed (^L, 0x0c),
in the most general case the first line of the next record (a line that starts
with a semicolon) could appear immediately after the last sequence data line
of the previous record. Empty lines between records are ignored.

On Sep 19, 2012, at 10:09 AM, "Rozenbaum, Daniel (Biocceleration Inc)" wrote:

> Dear Peter,
>
> This is most wonderful news that's going to make a bunch of users really
> happy!
>
> I am attaching a short anonymized sample file (would a larger data set be
> helpful?) that illustrates the type of IG format in use at USPTO. I believe
> that the only reasonably indexable field is the sequence name
> ("US-123456789-1", "US-123456789-2", etc). While the annotation fields appear
> structured, that part of the information is not reliable.
>
> As for the name, how about something like "iguspto"?
>
> Lastly, do you think the patch with this change would be made available for
> EMBOSS 6.4?
>
> With gratitude,
> Daniel
>
> --
> Daniel Rozenbaum
> Biocceleration, Inc.
> OCIO/ Office of Application Engineering & Development/ Patent System Division
> 600 Dulany St.
> Alexandria, VA 22314
>
> -----Original Message-----
> From: Peter Rice [mailto:ricepet...@yahoo.co.uk]
> Sent: Wednesday, September 19, 2012 6:48 AM
> To: Rozenbaum, Daniel (Biocceleration Inc)
> Cc: emboss@lists.open-bio.org
> Subject: Re: [EMBOSS] Support for multi-line annotation in ig format
>
> Dear Daniel,
>
> On 18/09/2012 03:00, Rozenbaum, Daniel (Biocceleration Inc) wrote:
>> Greetings again,
>>
>> If I may, another question on the issue of IG format: how difficult would it
>> be to support database indexing for this format?
>
> Very easy, a 1-day job including testing and documentation.
>
> Could you please make some example data available, and indicate which fields
> could be indexed (including any information in formatted descriptions or in
> naming conventions), and suggest a format name (e.g.
> USPTO or Biocceleration)
>
> regards,
>
> Peter Rice
> EMBOSS Team
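The record-boundary rules described at the top of this message can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the EMBOSS parser: it assumes every record starts with at least one ';' line, that the first non-comment line is the identifier, and that a trailing '1' or '2' on the sequence is the IG end-of-sequence marker. The names `IgRecord` and `parse_ig` are made up for the example.

```python
from typing import Iterable, Iterator, List, NamedTuple


class IgRecord(NamedTuple):
    comments: List[str]  # annotation lines, leading ';' removed
    name: str            # identifier line, e.g. "US-123456789-1"
    sequence: str        # residues without the trailing terminator


def _finish(comments: List[str], name: str, seq: List[str]) -> IgRecord:
    joined = "".join(seq)
    # Assumption: a final '1' or '2' is the terminator, not a residue.
    if joined and joined[-1] in "12":
        joined = joined[:-1]
    return IgRecord(comments, name, joined)


def parse_ig(lines: Iterable[str]) -> Iterator[IgRecord]:
    comments: List[str] = []
    name = None
    seq: List[str] = []
    for raw in lines:
        # Form feeds and blank lines between records carry no information.
        line = raw.replace("\f", "").strip()
        if not line:
            continue
        if line.startswith(";"):
            # A ';' line after an identifier starts the next record, even
            # when it directly follows the last sequence line.
            if name is not None:
                yield _finish(comments, name, seq)
                comments, name, seq = [], None, []
            comments.append(line[1:].strip())
        elif name is None:
            name = line
        else:
            seq.append(line)
    if name is not None:
        yield _finish(comments, name, seq)
```

Because a new record is triggered by the semicolon line itself rather than by a separator, the same loop handles records divided by blank lines, by a form feed, or by nothing at all.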
Re: [EMBOSS] Support for multi-line annotation in ig format
Dear Peter,

This is most wonderful news that's going to make a bunch of users really
happy!

I am attaching a short anonymized sample file (would a larger data set be
helpful?) that illustrates the type of IG format in use at USPTO. I believe
that the only reasonably indexable field is the sequence name
("US-123456789-1", "US-123456789-2", etc). While the annotation fields appear
structured, that part of the information is not reliable.

As for the name, how about something like "iguspto"?

Lastly, do you think the patch with this change would be made available for
EMBOSS 6.4?

With gratitude,
Daniel

--
Daniel Rozenbaum
Biocceleration, Inc.
OCIO/ Office of Application Engineering & Development/ Patent System Division
600 Dulany St.
Alexandria, VA 22314

-----Original Message-----
From: Peter Rice [mailto:ricepet...@yahoo.co.uk]
Sent: Wednesday, September 19, 2012 6:48 AM
To: Rozenbaum, Daniel (Biocceleration Inc)
Cc: emboss@lists.open-bio.org
Subject: Re: [EMBOSS] Support for multi-line annotation in ig format

Dear Daniel,

On 18/09/2012 03:00, Rozenbaum, Daniel (Biocceleration Inc) wrote:
> Greetings again,
>
> If I may, another question on the issue of IG format: how difficult would it
> be to support database indexing for this format?

Very easy, a 1-day job including testing and documentation.

Could you please make some example data available, and indicate which fields
could be indexed (including any information in formatted descriptions or in
naming conventions), and suggest a format name (e.g. USPTO or Biocceleration)

regards,

Peter Rice
EMBOSS Team

; Sequence 1, Application US/123456789
; Patent No. 98765432
; GENERAL INFORMATION
; APPLICANT: Doe, Jane
; APPLICANT: Doe, John
; TITLE OF INVENTION: title of invention text here
; FILE REFERENCE: file reference text here
; CURRENT APPLICATION NUMBER: 123456789
; CURRENT FILING DATE: 2010-01-01
; NUMBER OF SEQ ID NOS: 4
; SOFTWARE: PatentIn version 3.5
; SEQ ID NO 1
; LENGTH: 178
; TYPE: PRT
; ORGANISM: organism description text here
US-123456789-1
HGQGMHKIEAPCGQMFRCTMVKFSDDYNEPIALKIRYARPGTCWYAMVVCEQMVPWISWT
LALTRVAGQVRDSPPFWAWYCEKMQANKPMPWRQTWVAHYAWPENWMNPYNVFGKCHKTD
LGRCWQWWKDITEQLTVCHWMDWGIACQDCLEKTKHGLCHSRAQIMHCGHGGVTGKHA1

; Sequence 2, Application US/123456789
; Patent No. 98765432
; GENERAL INFORMATION
; APPLICANT: Doe, Jane
; APPLICANT: Doe, John
; TITLE OF INVENTION: title of invention text here
; FILE REFERENCE: file reference text here
; CURRENT APPLICATION NUMBER: 123456789
; CURRENT FILING DATE: 2010-01-01
; NUMBER OF SEQ ID NOS: 4
; SOFTWARE: PatentIn version 3.5
; SEQ ID NO 2
; LENGTH: 500
; TYPE: PRT
; ORGANISM: organism description text here
US-123456789-2
KTLNSGAQIALVMTNASRGLPQTSRVLDYREVNRTDSGNYHGDSYRYHEHRVKYESMNKM
CNTLLAFCRPKKMQNTARWHRVDLCMQEYCACPRMFCTVQTHMPWFRSDVGPPWFAARTN
PECSIVDGAVGRKCHEPTTNEVAGCRFECGPVSHEDPIMKWHAVTGHQRSMILILLGPRQ
CGKTTSEIWCHYVHDWAHMQHVTYYTVVDEERMNAFANKNHTNVCKYHPSMLHCVHRLSP
HPPVEYNLKNLKITYMPPNSISNPGITLDNTCLQTACLGSHYSWVMVEMYTRNCYRAPAY
NKAQNSDTWGIQTAVHTANGHEANQEVCIAIIFIGFWAYKHDVWHMTVDEVDGYMPDESV
NGDGGPKKYIEFKCQYWTGFDYDAIGIHVLTRFFRWYEFCLRWQHGKAHIHAPCRDTGHG
ANTLAKAESNPFGAAQSALGWLMDNLCKYLMCNRCAQLNASHWTFWTNPMDQWMCGMLDI
CRPPMLRKGPISDESHTFTD1

; Sequence 3, Application US/123456789
; Patent No. 98765432
; GENERAL INFORMATION
; APPLICANT: Doe, Jane
; APPLICANT: Doe, John
; TITLE OF INVENTION: title of invention text here
; FILE REFERENCE: file reference text here
; CURRENT APPLICATION NUMBER: 123456789
; CURRENT FILING DATE: 2010-01-01
; NUMBER OF SEQ ID NOS: 4
; SOFTWARE: PatentIn version 3.5
; SEQ ID NO 3
; LENGTH: 61
; TYPE: PRT
; ORGANISM: organism description text here
US-123456789-3
GVSGANWCNNEWFNARSGWPAPICTGRFPKVSAYCRLVVMWYAKTFFRYEFAFVHKRTGP
M1

; Sequence 4, Application US/123456789
; Patent No. 98765432
; GENERAL INFORMATION
; APPLICANT: Doe, Jane
; APPLICANT: Doe, John
; TITLE OF INVENTION: title of invention text here
; FILE REFERENCE: file reference text here
; CURRENT APPLICATION NUMBER: 123456789
; CURRENT FILING DATE: 2010-01-01
; NUMBER OF SEQ ID NOS: 4
; SOFTWARE: PatentIn version 3.5
; SEQ ID NO 4
; LENGTH: 10
; TYPE: PRT
; ORGANISM: organism description text here
US-123456789-4
YDAIGIHVLT1
Re: [EMBOSS] Support for multi-line annotation in ig format
On 19/09/2012 11:58, Peter Cock wrote:
> Does it need a new format name? EMBOSS already defines "ig" and
> "igstrict" - do the USPTO files diverge from these?

The format name is needed as an option to dbxflat -idformat so we can
select a specific parser for any additional fields.

For example, in dbxfasta, -idformat has 7 names for 'fasta' format.

regards,

Peter Rice
EMBOSS Team
Re: [EMBOSS] Support for multi-line annotation in ig format
Dear Daniel,

On 14/09/2012 13:56, Rozenbaum, Daniel (Biocceleration Inc) wrote:
> Hello Peter and everyone,
>
> Here's an example of an additional issue I've run into when trying to work
> with IG format in EMBOSS:
>
> In the entret result above the first annotation line of the subsequent
> record is returned as part of the requested record.

Well spotted. The input buffer is not reset in Ig formats so the next line
was included in the entret output.

I will fix it in the next patch for the latest release (6.5). Let me know
if you also need a patch for 6.4.

regards,

Peter Rice
EMBOSS Team
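The behaviour being fixed can be illustrated with a small Python sketch. This is not the EMBOSS C code and says nothing about how the actual patch is written; it only shows the idea that the ';' line which ends one record belongs to the next record and must not be left in the buffer that is returned. `fetch_entry` is a hypothetical helper, and the sketch assumes every record starts with a ';' line.

```python
from typing import Iterable, List, Optional


def fetch_entry(lines: Iterable[str], wanted: str) -> Optional[List[str]]:
    """Return the lines of the record named `wanted`, or None if absent."""
    buffer: List[str] = []
    seen_name = False  # True once the identifier line of the record was read
    found = False      # True if that identifier equals `wanted`
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(";") and seen_name:
            # This ';' line opens the NEXT record: stop before storing it.
            if found:
                return buffer
            # Not the record we want: reset the buffer and start over.
            buffer, seen_name = [], False
        if not line.startswith(";") and not seen_name:
            seen_name = True
            found = line == wanted
        buffer.append(line)
    return buffer if found else None
```

Run against the three-record `ig1.ig` file from the report, a request for `EMBOSS_001` returns exactly three lines (comment, identifier, sequence) and no trailing ";, 10 bases".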
Re: [EMBOSS] Support for multi-line annotation in ig format
On Wed, Sep 19, 2012 at 11:48 AM, Peter Rice wrote:
> Dear Daniel,
>
> On 18/09/2012 03:00, Rozenbaum, Daniel (Biocceleration Inc) wrote:
>>
>> Greetings again,
>>
>> If I may, another question on the issue of IG format: how difficult would
>> it be to support database indexing for this format?
>
> Very easy, a 1-day job including testing and documentation.
>
> Could you please make some example data available, and indicate which fields
> could be indexed (including any information in formatted descriptions or in
> naming conventions), and suggest a format name (e.g. USPTO or
> Biocceleration)

Does it need a new format name? EMBOSS already defines "ig" and
"igstrict" - do the USPTO files diverge from these?

Peter C.

P.S. Biopython also uses the format name "ig", based on the current
EMBOSS terminology.
Re: [EMBOSS] Support for multi-line annotation in ig format
Dear Daniel,

On 18/09/2012 03:00, Rozenbaum, Daniel (Biocceleration Inc) wrote:
> Greetings again,
>
> If I may, another question on the issue of IG format: how difficult would it
> be to support database indexing for this format?

Very easy, a 1-day job including testing and documentation.

Could you please make some example data available, and indicate which fields
could be indexed (including any information in formatted descriptions or in
naming conventions), and suggest a format name (e.g. USPTO or Biocceleration)

regards,

Peter Rice
EMBOSS Team
Re: [EMBOSS] Support for multi-line annotation in ig format
On Tue, Sep 18, 2012 at 12:42 PM, Rozenbaum, Daniel (Biocceleration Inc) wrote:
> Hi Peter,
>
> I don't believe the USPTO provides datasets to the public in the IG format.
>
> With best regards,
> Daniel

OK, thanks.

Peter
Re: [EMBOSS] Support for multi-line annotation in ig format
Hi Peter,

I don't believe the USPTO provides datasets to the public in the IG format.

With best regards,
Daniel

From: Peter Cock [p.j.a.c...@googlemail.com]
Sent: Tuesday, September 18, 2012 4:25 AM
To: Rozenbaum, Daniel (Biocceleration Inc)
Cc: emboss@lists.open-bio.org
Subject: Re: [EMBOSS] Support for multi-line annotation in ig format

On Fri, Sep 14, 2012 at 1:56 PM, Rozenbaum, Daniel (Biocceleration Inc) wrote:
> Hello Peter and everyone,
>
> I was wondering if I could revive the discussion about the support of IG
> format if possible. I'm helping deploy EMBOSS at the US Patent and
> Trademark Office, where this format, in its multi-line sequence annotation
> form, is used extensively.

Hi Daniel,

That is interesting to know - I work on Biopython, which has support for
reading and indexing the Intelligenetics "ig" format. I'd been under the
impression that this was a defunct/unused file format (and therefore never
bothered to implement support for writing it in Biopython).

Does the US Patent and Trademark Office provide datasets to the public in
this format?

Thanks,

Peter C.
Re: [EMBOSS] Support for multi-line annotation in ig format
On Fri, Sep 14, 2012 at 1:56 PM, Rozenbaum, Daniel (Biocceleration Inc) wrote:
> Hello Peter and everyone,
>
> I was wondering if I could revive the discussion about the support of IG
> format if possible. I'm helping deploy EMBOSS at the US Patent and
> Trademark Office, where this format, in its multi-line sequence annotation
> form, is used extensively.

Hi Daniel,

That is interesting to know - I work on Biopython, which has support for
reading and indexing the Intelligenetics "ig" format. I'd been under the
impression that this was a defunct/unused file format (and therefore never
bothered to implement support for writing it in Biopython).

Does the US Patent and Trademark Office provide datasets to the public in
this format?

Thanks,

Peter C.
Re: [EMBOSS] Support for multi-line annotation in ig format
Greetings again,

If I may, another question on the issue of IG format: how difficult would it
be to support database indexing for this format?

With best regards,
Daniel

--
Daniel Rozenbaum
Biocceleration, Inc.
OCIO/ Office of Application Engineering & Development/ Patent System Division
600 Dulany St.
Alexandria, VA 22314

On Sep 14, 2012, at 9:36 AM, "Rozenbaum, Daniel (Biocceleration Inc)" wrote:

> Hello Peter and everyone,
>
> I was wondering if I could revive the discussion about the support of IG
> format if possible. I'm helping deploy EMBOSS at the US Patent and Trademark
> Office, where this format, in its multi-line sequence annotation form, is
> used extensively.
>
> Here's an example of an additional issue I've run into when trying to work
> with IG format in EMBOSS:
>
> % makeprotseq -amount 10 -length 10 -nouseinsert -osformat ig -auto -osname ig1
>
> % cat ig1.ig
> ;, 10 bases
> EMBOSS_001
> hcsptpstas1
> ;, 10 bases
> EMBOSS_002
> rdgwcvmtrm1
> ;, 10 bases
> EMBOSS_003
> fgtifgdgid1
>
> % entret -sequence ig1.ig:EMBOSS_001 -nofirstonly -auto -stdout
> ;, 10 bases
> EMBOSS_001
> hcsptpstas1
> ;, 10 bases
>
> In the entret result above the first annotation line of the subsequent record
> is returned as part of the requested record.
>
> Many thanks,
> Daniel
> --
> Daniel Rozenbaum
> Biocceleration, Inc.
> OCIO/ Office of Application Engineering & Development/ Patent System Division
> 600 Dulany St.
> Alexandria VA 22314
>
> -
> On 15/08/2012 17:57, Daniel Rozenbaum wrote:
>> Dear list,
>>
>> (Peter, many thanks for your prompt reply to my previous inquiry!)
>>
>> We need to deal with extensive databases in Intelligenetics format with
>> multiple lines in annotation of each record. It appears however that EMBOSS
>> concatenates all annotation lines into a single line when building its
>> internal representation of the sequence description:
>>
>> % cat /tmp/IGSEQ.ig
>> ; Annotation line 1
>> ; Annotation line 2
>> ; Annotation line 3
>> IGSEQ
>> ACGCATCGCATCAGACTACGC1
>>
>> % seqret /tmp/IGSEQ.ig -osformat2 ig -auto -osname IGSEQ.emboss_ig2ig -osdirectory /tmp
>>
>> % cat /tmp/IGSEQ.emboss_ig2ig.ig
>> ;Annotation line 1 Annotation line 2 Annotation line 3, 21 bases
>> IGSEQ
>> ACGCATCGCATCAGACTACGC1
>>
>> Are there any plans to support multi-line annotation in this format?
>
> Interesting thought. We will take a look. It will need some care to
> maintain compatibility with other formats that have single (FASTA) or
> multiple (swissprot) descriptions.
>
> Which package is using this IG format?
>
> regards,
>
> Peter Rice
> EMBOSS Team
[EMBOSS] Support for multi-line annotation in ig format
Hello Peter and everyone,

I was wondering if I could revive the discussion about the support of IG
format if possible. I'm helping deploy EMBOSS at the US Patent and Trademark
Office, where this format, in its multi-line sequence annotation form, is
used extensively.

Here's an example of an additional issue I've run into when trying to work
with IG format in EMBOSS:

% makeprotseq -amount 10 -length 10 -nouseinsert -osformat ig -auto -osname ig1

% cat ig1.ig
;, 10 bases
EMBOSS_001
hcsptpstas1
;, 10 bases
EMBOSS_002
rdgwcvmtrm1
;, 10 bases
EMBOSS_003
fgtifgdgid1

% entret -sequence ig1.ig:EMBOSS_001 -nofirstonly -auto -stdout
;, 10 bases
EMBOSS_001
hcsptpstas1
;, 10 bases

In the entret result above the first annotation line of the subsequent record
is returned as part of the requested record.

Many thanks,
Daniel
--
Daniel Rozenbaum
Biocceleration, Inc.
OCIO/ Office of Application Engineering & Development/ Patent System Division
600 Dulany St.
Alexandria VA 22314

-
On 15/08/2012 17:57, Daniel Rozenbaum wrote:
> Dear list,
>
> (Peter, many thanks for your prompt reply to my previous inquiry!)
>
> We need to deal with extensive databases in Intelligenetics format with
> multiple lines in annotation of each record. It appears however that EMBOSS
> concatenates all annotation lines into a single line when building its
> internal representation of the sequence description:
>
> % cat /tmp/IGSEQ.ig
> ; Annotation line 1
> ; Annotation line 2
> ; Annotation line 3
> IGSEQ
> ACGCATCGCATCAGACTACGC1
>
> % seqret /tmp/IGSEQ.ig -osformat2 ig -auto -osname IGSEQ.emboss_ig2ig -osdirectory /tmp
>
> % cat /tmp/IGSEQ.emboss_ig2ig.ig
> ;Annotation line 1 Annotation line 2 Annotation line 3, 21 bases
> IGSEQ
> ACGCATCGCATCAGACTACGC1
>
> Are there any plans to support multi-line annotation in this format?

Interesting thought. We will take a look. It will need some care to
maintain compatibility with other formats that have single (FASTA) or
multiple (swissprot) descriptions.

Which package is using this IG format?

regards,

Peter Rice
EMBOSS Team
Re: [EMBOSS] Support for multi-line annotation in ig format
On 15/08/2012 17:57, Daniel Rozenbaum wrote:
> Dear list,
>
> (Peter, many thanks for your prompt reply to my previous inquiry!)
>
> We need to deal with extensive databases in Intelligenetics format with
> multiple lines in annotation of each record. It appears however that EMBOSS
> concatenates all annotation lines into a single line when building its
> internal representation of the sequence description:
>
> % cat /tmp/IGSEQ.ig
> ; Annotation line 1
> ; Annotation line 2
> ; Annotation line 3
> IGSEQ
> ACGCATCGCATCAGACTACGC1
>
> % seqret /tmp/IGSEQ.ig -osformat2 ig -auto -osname IGSEQ.emboss_ig2ig -osdirectory /tmp
>
> % cat /tmp/IGSEQ.emboss_ig2ig.ig
> ;Annotation line 1 Annotation line 2 Annotation line 3, 21 bases
> IGSEQ
> ACGCATCGCATCAGACTACGC1
>
> Are there any plans to support multi-line annotation in this format?

Interesting thought. We will take a look. It will need some care to
maintain compatibility with other formats that have single (FASTA) or
multiple (swissprot) descriptions.

Which package is using this IG format?

regards,

Peter Rice
EMBOSS Team
[EMBOSS] Support for multi-line annotation in ig format
Dear list,

(Peter, many thanks for your prompt reply to my previous inquiry!)

We need to deal with extensive databases in Intelligenetics format with
multiple lines in annotation of each record. It appears however that EMBOSS
concatenates all annotation lines into a single line when building its
internal representation of the sequence description:

% cat /tmp/IGSEQ.ig
; Annotation line 1
; Annotation line 2
; Annotation line 3
IGSEQ
ACGCATCGCATCAGACTACGC1

% seqret /tmp/IGSEQ.ig -osformat2 ig -auto -osname IGSEQ.emboss_ig2ig -osdirectory /tmp

% cat /tmp/IGSEQ.emboss_ig2ig.ig
;Annotation line 1 Annotation line 2 Annotation line 3, 21 bases
IGSEQ
ACGCATCGCATCAGACTACGC1

Are there any plans to support multi-line annotation in this format?

Many thanks,
Daniel
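For comparison, a round trip that keeps the annotation lines separate could look roughly like the Python sketch below. It is only an illustration of the requested behaviour, not EMBOSS code: `format_ig` is a hypothetical helper, the 60-column wrap and the "1" terminator are assumptions taken from the examples in this thread, and it does not append the ", 21 bases" suffix that seqret writes.

```python
from typing import List


def format_ig(comments: List[str], name: str, sequence: str,
              width: int = 60, terminator: str = "1") -> str:
    """Write one IG record, one ';' line per annotation line."""
    # Assumption: a record with no annotation still gets a bare ';' line.
    lines = [f"; {c}" for c in comments] or [";"]
    lines.append(name)
    body = sequence + terminator
    # Wrap the sequence (terminator included) at `width` characters.
    lines.extend(body[i:i + width] for i in range(0, len(body), width))
    return "\n".join(lines) + "\n"
```

Called with the three annotation lines, the name `IGSEQ` and the 21-base sequence from the example above, this reproduces the input file rather than the single concatenated description line.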