Re: [ccp4bb] sequence format conversion

2012-05-08 Thread Francois Berenger

Hello,

The tool is called awk.
There is also another tool called Perl, but I won't recommend it.

Regards,
F.

On 05/08/2012 04:02 PM, K Singh wrote:

Dear All
I am looking for a script or an informatics tool that lets me
change a sequence from FASTA format to something like the following:


FASTA FORMAT

abcdefghijklmnopqrstuvwxyz

to

 1  abcde fghij
11  klmno pqrst
21  uvwxy z


Many thanks in advance

Regards
Kris
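
For what it is worth, the layout asked for above (numbered lines of ten residues, split into two groups of five) can also be sketched in a few lines of Python. This is only a minimal sketch: the function name format_sequence and its parameters are made up for illustration, and the number column is right-aligned on the assumption that this is what the example intends.

```python
def format_sequence(seq, group=5, groups_per_line=2):
    """Return seq as numbered lines of space-separated residue groups.

    Each line starts with the 1-based position of its first residue,
    right-aligned to the width of the largest line number.
    """
    per_line = group * groups_per_line
    # Width of the number column: digits in the last line's start position.
    width = len(str(max(len(seq) - 1, 0) // per_line * per_line + 1))
    lines = []
    for start in range(0, len(seq), per_line):
        chunk = seq[start:start + per_line]
        groups = [chunk[i:i + group] for i in range(0, len(chunk), group)]
        lines.append(f"{start + 1:>{width}}  {' '.join(groups)}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_sequence("abcdefghijklmnopqrstuvwxyz"))
```

Changing group or groups_per_line gives other layouts, e.g. six groups of ten per line.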


Re: [ccp4bb] sequence format conversion

2012-05-08 Thread Marko Hyvonen

Surely sequence analysis tools are the easiest way to do it.

I'd recommend EMBOSS (open source and runs nicely on most platforms - the 
ccp4 of sequence analysis for me at least) 
http://emboss.sourceforge.net/


Seqret (SEQuence RETurn) program:

seqret -out test.seq -osformat gcg test.fasta

Marko

PS. fasta format needs ">" as a first line with (optional) description in
the input file. And not sure what amino acids b and j would get
converted to :-)
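
To illustrate the point about the header line: a FASTA record starts with a ">" line (optionally followed by a description), and the sequence follows on the next lines. A minimal reader might look like the sketch below. The name read_fasta is hypothetical, and real tools such as seqret handle many more cases than this does.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (description, sequence) pairs."""
    records = []
    header, parts = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(parts)))
            header, parts = line[1:].strip(), []
        elif header is None:
            raise ValueError("FASTA input must start with a '>' line")
        else:
            parts.append(line)
    if header is not None:
        records.append((header, "".join(parts)))
    return records
```

A bare sequence with no ">" line is rejected here rather than guessed at.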


On Tue, 8 May 2012, Francois Berenger wrote:


More seriously, there is the babel command from Open Babel
in case the second format you show has a known name.




 _

 Marko Hyvonen
 Department of Biochemistry, University of Cambridge
 ma...@cryst.bioc.cam.ac.uk
 http://www-cryst.bioc.cam.ac.uk/groups/hyvonen
 tel:+44-(0)1223-766 044
 mobile: +44-(0)7796-174 877
 fax:+44-(0)1223-766 002
 --


Re: [ccp4bb] sequence format conversion

2012-05-08 Thread Peter Keller
On Tue, 2012-05-08 at 09:22 +0100, Marko Hyvonen wrote:

 PS. fasta format needs ">" as a first line with (optional) description in
 the input file. And not sure what amino acids b and j would get
 converted to :-)

A good tool should leave b as is: it is ASX (the standard ambiguity
code for ASP or ASN). j, o and u are a different matter :-)

Regards,
Peter.

-- 
Peter Keller Tel.: +44 (0)1223 353033
Global Phasing Ltd., Fax.: +44 (0)1223 366889
Sheraton House,
Castle Park,
Cambridge CB3 0AX
United Kingdom


Re: [ccp4bb] sequence format conversion

2012-05-08 Thread Gerard DVD Kleywegt

A good tool should leave b as is: it is ASX (the standard ambiguity
code for ASP or ASN). j, o and u are a different matter :-)


http://www.uniprot.org/manual/non_std

Selenocyteine [sic!] and pyrrolysine are represented in the sequence using 
the one-letter codes U for selenocysteine and O for pyrrolysine
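
As a rough sketch of how a script could flag such letters before conversion: the table below holds only the codes mentioned in this thread (B, U, O) on top of the 20 standard amino acids. The names STANDARD, EXTRA and classify_residues are invented for this example, and anything else (j, for instance) is simply reported as not covered.

```python
# The 20 standard amino acid one-letter codes.
STANDARD = set("ACDEFGHIKLMNPQRSTVWY")

# Extra codes discussed in this thread only; not a complete table.
EXTRA = {
    "B": "Asx (Asp or Asn)",
    "U": "selenocysteine",
    "O": "pyrrolysine",
}


def classify_residues(seq):
    """Map each non-standard letter found in seq to a short note."""
    notes = {}
    for letter in seq.upper():
        if letter in STANDARD:
            continue
        notes[letter] = EXTRA.get(letter, "not covered here")
    return notes


if __name__ == "__main__":
    for letter, note in sorted(classify_residues("abcdefghijklmnopqrstuvwxyz").items()):
        print(letter, note)
```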


--Gerard

**
   Gerard J. Kleywegt

  http://xray.bmc.uu.se/gerard   mailto:ger...@xray.bmc.uu.se
**
   The opinions in this message are fictional.  Any similarity
   to actual opinions, living or dead, is purely coincidental.
**
   Little known gastromathematical curiosity: let z be the
   radius and a the thickness of a pizza. Then the volume
   of that pizza is equal to pi*z*z*a !
**