Re: [ccp4bb] sequence format conversion
Hello,

The tool is called awk. There is also another tool called Perl, but I won't recommend it.

Regards,
F.

On 05/08/2012 04:02 PM, K Singh wrote:
> Dear All
>
> I was looking for a script or an informatics tool enabling me to change the sequence from FASTA format to something like the following:
>
> FASTA FORMAT
> abcdefghijklmnopqrstuvwxyz
>
> to
>
> 1 abcde fghij
> 11 klmno pqrst
> 21 uvwxy z
>
> Many thanks in advance
>
> Regards
> Kris
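The conversion requested above (residues in blocks of five, two blocks per line, each line prefixed with the number of its first residue) is small enough to sketch directly. The following is a minimal Python sketch rather than the awk approach suggested in the reply; the function name `format_sequence` and its parameters `group` and `groups_per_line` are hypothetical, and it assumes that any line starting with `>` is a FASTA header to be skipped.

```python
def format_sequence(fasta_text, group=5, groups_per_line=2):
    """Return the sequence as numbered lines of space-separated blocks."""
    # Drop FASTA header lines (those starting with '>') and join the rest.
    seq = "".join(
        line.strip()
        for line in fasta_text.splitlines()
        if not line.startswith(">")
    )
    per_line = group * groups_per_line
    lines = []
    for start in range(0, len(seq), per_line):
        chunk = seq[start:start + per_line]
        # Split the residues for this line into blocks of `group` letters.
        blocks = [chunk[i:i + group] for i in range(0, len(chunk), group)]
        # Residue numbering is 1-based, so the label is start + 1.
        lines.append(f"{start + 1} {' '.join(blocks)}")
    return "\n".join(lines)


print(format_sequence(">example\nabcdefghijklmnopqrstuvwxyz"))
# 1 abcde fghij
# 11 klmno pqrst
# 21 uvwxy z
```

The block size and blocks per line are parameters only because the requested layout was shown by example and not named; other layouts would need different values.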
Re: [ccp4bb] sequence format conversion
Surely a sequence analysis tool is the easiest way to do it. I'd recommend EMBOSS (open source and runs nicely on most platforms; the ccp4 of sequence analysis, for me at least):

http://emboss.sourceforge.net/

Seqret (SEQuence RETurn) program:

    seqret -out test.seq -osformat gcg test.fasta

Marko

PS. FASTA format needs a first line with an (optional) description in the input file. And not sure what amino acids b and j would get converted to :-)

On Tue, 8 May 2012, Francois Berenger wrote:
> More seriously, there is the babel command from Open Babel in case the second format you show has a known name.
>
> On 05/08/2012 04:46 PM, Francois Berenger wrote:
>> Hello,
>>
>> The tool is called awk. There is also another tool called Perl, but I won't recommend it.
>>
>> Regards,
>> F.
>>
>> On 05/08/2012 04:02 PM, K Singh wrote:
>>> Dear All
>>>
>>> I was looking for a script or an informatics tool enabling me to change the sequence from FASTA format to something like the following:
>>>
>>> FASTA FORMAT
>>> abcdefghijklmnopqrstuvwxyz
>>>
>>> to
>>>
>>> 1 abcde fghij
>>> 11 klmno pqrst
>>> 21 uvwxy z
>>>
>>> Many thanks in advance
>>>
>>> Regards
>>> Kris

_
Marko Hyvonen
Department of Biochemistry, University of Cambridge
ma...@cryst.bioc.cam.ac.uk
http://www-cryst.bioc.cam.ac.uk/groups/hyvonen
tel:+44-(0)1223-766 044
mobile: +44-(0)7796-174 877
fax:+44-(0)1223-766 002
--
Re: [ccp4bb] sequence format conversion
On Tue, 2012-05-08 at 09:22 +0100, Marko Hyvonen wrote:
> PS. FASTA format needs a first line with an (optional) description in the input file. And not sure what amino acids b and j would get converted to :-)

A good tool should leave b as is: it is ASX (the standard ambiguity code for ASP or ASN). j, o and u are a different matter :-)

Regards,
Peter.

--
Peter Keller                Tel.: +44 (0)1223 353033
Global Phasing Ltd.,        Fax.: +44 (0)1223 366889
Sheraton House,
Castle Park,
Cambridge CB3 0AX
United Kingdom
Re: [ccp4bb] sequence format conversion
> A good tool should leave b as is: it is ASX (the standard ambiguity code for ASP or ASN). j, o and u are a different matter :-)

http://www.uniprot.org/manual/non_std

Selenocyteine [sic!] and pyrrolysine are represented in the sequence using the one-letter codes U for selenocysteine and O for pyrrolysine

--Gerard

**
Gerard J. Kleywegt
http://xray.bmc.uu.se/gerard
mailto:ger...@xray.bmc.uu.se
**
The opinions in this message are fictional. Any similarity to actual opinions, living or dead, is purely coincidental.
**
Little known gastromathematical curiosity: let z be the radius and a the thickness of a pizza. Then the volume of that pizza is equal to pi*z*z*a !
**
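The letters discussed in this thread can be checked mechanically before a sequence is converted. The sketch below is illustrative only: the function name `flag_letters` and the three lookup tables are hypothetical. B (Asx), U (selenocysteine) and O (pyrrolysine) come from the messages above; Z (Glx), X (any residue) and J (Leu or Ile) are added on the assumption that the extended IUPAC one-letter alphabet is wanted, which a given tool may or may not accept.

```python
# The 20 standard amino-acid one-letter codes.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")

# Ambiguity codes; Z, X and J are assumptions beyond the thread (see lead-in).
AMBIGUITY = {
    "B": "Asx (Asp or Asn)",
    "Z": "Glx (Glu or Gln)",
    "J": "Xle (Leu or Ile)",
    "X": "any residue",
}

# Non-standard residues with their own one-letter codes (per the UniProt page).
NON_STANDARD = {"U": "selenocysteine", "O": "pyrrolysine"}


def flag_letters(seq):
    """Map each letter outside the 20 standard codes to a short description."""
    flagged = {}
    for letter in sorted(set(seq.upper())):
        if letter in STANDARD:
            continue
        if letter in AMBIGUITY:
            flagged[letter] = "ambiguity code: " + AMBIGUITY[letter]
        elif letter in NON_STANDARD:
            flagged[letter] = "non-standard residue: " + NON_STANDARD[letter]
        else:
            flagged[letter] = "not a recognised one-letter code"
    return flagged


for letter, note in flag_letters("abcdefghijklmnopqrstuvwxyz").items():
    print(letter, note)
```

For the alphabet test sequence from the original question, this flags B, J, O, U, X and Z and leaves the other twenty letters alone.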