Re: [Artemis-users] Export annotated Artemis data to CLC Bio

2010-11-02 Thread J . vandeVossenberg
Dear Tim,

Converting with seqret was no faster than adapting the Artemis output
file by hand. This is how I did it (no guarantee...):


1) Replace fasta_record with source, keeping the right number of spaces
(locations and qualifiers start at column 22), and add the qualifiers
/organism=text and /mol_type=genomic DNA (both are compulsory) under each
FT   source line (use a perl script, see example =a=).

2) At the top, add ID, DT, DE and OS lines (see example =b=; XX is an
empty line), followed by FH   Key Location/Qualifiers.
Then make an FT   source line that spans the whole sequence and add the
qualifiers /organism=text, /mol_type=genomic DNA (both are compulsory)
and /focus.

/focus indicates that the sequence consists of several sources. I
would rather use the word contig, because /focus implies that the
annotated genome is a contiguous sequence, which is not the case for me.
Unfortunately /contig is not an official qualifier.

As far as I can judge, this is sufficient to import all features into
CLC. I would only like to tell CLC that it is not a contiguous sequence.


Examples

=a= example perl oneliner
To insert after each line with FT   source:
FT                   /organism="Organism name"
FT                   /mol_type="genomic DNA"

perl -p -i -e 's/FT   source\s{10}\d+\.\.\d+/$&\nFT                   \/organism="Organism name"\nFT                   \/mol_type="genomic DNA"/g' SourceFileArtemis.art
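
For anyone who prefers not to edit the file in place with a regular expression, step 1 can also be sketched in Python. This is only an illustration, not what was run in this thread: the function name fasta_records_to_source is made up, and it assumes each fasta_record feature sits on a single FT line with its location on that same line.

```python
import re

# EMBL feature table layout: the feature key starts at column 6,
# locations and qualifiers start at column 22.
KEY_PREFIX = "FT   "
QUALIFIER_PREFIX = "FT" + " " * 19


def fasta_records_to_source(text, organism, mol_type="genomic DNA"):
    """Rename fasta_record features to source and add the mandatory qualifiers.

    Hypothetical helper: assumes one FT line per fasta_record feature.
    """
    out = []
    for line in text.splitlines():
        match = re.match(r"^FT   fasta_record\s+(\S+)\s*$", line)
        if match:
            # Pad the key to 16 characters so the location lands at column 22.
            out.append(KEY_PREFIX + "source".ljust(16) + match.group(1))
            out.append(f'{QUALIFIER_PREFIX}/organism="{organism}"')
            out.append(f'{QUALIFIER_PREFIX}/mol_type="{mol_type}"')
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

Reading SourceFileArtemis.art into a string, passing it through this function and writing the result to a new file leaves the original export untouched, which makes it easier to compare both versions.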

=b= Header
ID   linear; genomic DNA;
XX
XX
DT   02-NOV-2010
XX
DE   Organism name genome
XX
OS   Organism name
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..500
FT                   /organism="Organism name"
FT                   /mol_type="genomic DNA"
FT                   /note="text"
FT                   /focus
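
The header of example =b= can also be generated rather than typed, which avoids column mistakes. The sketch below is only an assumption about how one might do that: embl_header is a made-up name, the ID line is copied from the example as it stands (a complete EMBL ID line carries more fields), and the date is passed in explicitly.

```python
from datetime import date

MONTHS = ("JAN", "FEB", "MAR", "APR", "MAY", "JUN",
          "JUL", "AUG", "SEP", "OCT", "NOV", "DEC")

# Qualifiers start at column 22: "FT" followed by 19 spaces.
QUALIFIER_PREFIX = "FT" + " " * 19


def embl_header(organism, seq_length, when, note=None):
    """Build the header lines of example =b= for one concatenated sequence."""
    lines = [
        "ID   linear; genomic DNA;",  # copied from the example, not a full ID line
        "XX",
        "XX",
        f"DT   {when.day:02d}-{MONTHS[when.month - 1]}-{when.year}",
        "XX",
        f"DE   {organism} genome",
        "XX",
        f"OS   {organism}",
        "XX",
        "FH   " + "Key".ljust(16) + "Location/Qualifiers",
        "FH",
        # One source feature spanning the whole sequence.
        "FT   " + "source".ljust(16) + f"1..{seq_length}",
        f'{QUALIFIER_PREFIX}/organism="{organism}"',
        f'{QUALIFIER_PREFIX}/mol_type="genomic DNA"',
    ]
    if note is not None:
        lines.append(f'{QUALIFIER_PREFIX}/note="{note}"')
    lines.append(f"{QUALIFIER_PREFIX}/focus")
    return "\n".join(lines) + "\n"
```

Calling embl_header("Organism name", 500, date(2010, 11, 2), note="text") gives the same lines as the example, with the location and qualifiers aligned at column 22.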


___
Artemis-users mailing list
Artemis-users@sanger.ac.uk
http://lists.sanger.ac.uk/mailman/listinfo/artemis-users


Re: [Artemis-users] Export annotated Artemis data to CLC Bio

2010-10-21 Thread Jack van de Vossenberg

Addition:
For Artemis export to GFF, I changed fasta_record to source, which 
is a Feature Key in standard nomenclature*, and added the mandatory 
fields /organism= and /mol_type=, but every time I get a message that 
the source field cannot be exported.


Is that normal behaviour? Can anyone tell what goes wrong?

Cheers, Jack

* http://www.ebi.ac.uk/embl/Documentation/FT_definitions/feature_table.html

On 10/21/2010 10:27 AM, j.vandevossenb...@science.ru.nl wrote:

Dear all,

I have annotated genome data in Artemis, and I would like to import the
result of that annotation into CLC Bio Genomics Workbench
(http://www.clcbio.com/).

I tried direct import: I selected all entries and exported from Artemis to
EMBL, GenBank and Sequin. None of these was recognized by CLC, even
though it should be able to import many file formats
(http://www.clcbio.com/index.php?id=426).
I then tried GFF, which does not include sequence data, so I used a separate
sequence file with the contigs concatenated into one large fasta sequence. CLC
has a GFF import filter, which is very picky about the sequence names
(read the CLC GFF import manual). I managed to get it to import the GFF, but I
did not see any annotation at all, I think because all ORFs are named artemis
(gff_seqname artemis). Contig names are lost in GFF, so this option may
import all annotated genes but lose the contig info. GFF does not recognize
fasta_record, so I should rename it to something else (but what? I tried
contig and source, but the GFF file keeps on using ORFs only, all named
gff_seqname artemis).

Does anyone have experience with this? I thought of using another program
as an intermediate to convert Artemis data into CLC-readable data.

Thanks for your help, Jack
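
If the import problem really is that every feature carries the sequence name artemis, one thing to try is rewriting the first column of the GFF file so that it matches the name of the fasta sequence. The sketch below only illustrates that idea and is untested against CLC: rename_gff_seqname is a made-up helper, and it assumes a tab-separated GFF file whose first column holds the value shown as gff_seqname in Artemis.

```python
def rename_gff_seqname(gff_text, new_name, old_name="artemis"):
    """Replace the sequence name (column 1) on every feature line of a GFF file.

    Comment and directive lines starting with '#' are passed through unchanged.
    """
    out = []
    for line in gff_text.splitlines():
        if line and not line.startswith("#"):
            cols = line.split("\t")
            # Only touch lines that carry the old name, so features that
            # already have a different name are left alone.
            if cols[0] == old_name:
                cols[0] = new_name
                line = "\t".join(cols)
        out.append(line)
    return "\n".join(out) + "\n"
```

The source column (column 2) is deliberately left alone here, so a value of artemis there survives the rewrite.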




Re: [Artemis-users] Export annotated Artemis data to CLC Bio

2010-10-21 Thread Tim Carver
Hi Jack

Artemis is not really meant as a conversion tool between formats, in
particular from EMBL/GenBank to GFF, although it will have a go. You could try
EMBOSS (seqret) to convert. However, it sounds like you have multiple fasta
records in your file, which may cause problems if you are writing out EMBL
files. So you may want to try writing the sequence out (File-Write-All
bases), opening this single sequence file and then reading your annotation
into the sequence entry. Then write out the file as EMBL.

Regards
Tim
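
The single-sequence route described above is done inside Artemis, but the same concatenation can be sketched outside it when the contig boundaries need to be kept. The following is an illustration under assumptions, not part of Artemis or EMBOSS: concatenate_contigs is a made-up name, and the input is taken to be plain multi-fasta text.

```python
def concatenate_contigs(fasta_text):
    """Join all records of a multi-fasta text into one sequence.

    Returns the concatenated sequence and a list of (name, start, end)
    tuples with 1-based inclusive coordinates for each contig.
    """
    names, chunks = [], []
    for line in fasta_text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line[1:].split()
            # Fall back to a numbered name when the header line is empty.
            names.append(header[0] if header else f"contig{len(names) + 1}")
            chunks.append([])
        elif chunks:
            chunks[-1].append(line)

    sequence, ranges, offset = [], [], 0
    for name, parts in zip(names, chunks):
        seq = "".join(parts)
        ranges.append((name, offset + 1, offset + len(seq)))
        sequence.append(seq)
        offset += len(seq)
    return "".join(sequence), ranges
```

The ranges can then be written out as one feature per contig, so the contig names are not lost once everything sits in a single sequence entry.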

