Hi Hajihashemi,

You can do this with a short Python (easy to learn) or Perl (fast and
powerful for string processing) script. Pig may not be the right tool for
this kind of record reshaping.
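
For illustration, here is a minimal Python sketch of that approach. It assumes
every header line has the form ">id key=value ..." as in your sample, and that
the abundance and length values are copied from the header rather than
recomputed. The function name reformat_fasta is just a placeholder of mine.

```python
def reformat_fasta(lines):
    """Yield one 'abundance length sequence' string per FASTA record."""
    fields = None   # [abundance, length] of the record being read
    chunks = []     # sequence lines collected for that record
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if fields is not None:
                yield " ".join(fields + ["".join(chunks)])
            # Drop the leading '>' and the id, then read key=value pairs.
            tokens = line[1:].split()[1:]
            pairs = dict(tok.split("=", 1) for tok in tokens if "=" in tok)
            fields = [pairs["abundance"], pairs["length"]]
            chunks = []
        elif fields is not None:
            chunks.append(line)  # sequence may span several lines
    # Emit the last record once the input is exhausted.
    if fields is not None:
        yield " ".join(fields + ["".join(chunks)])
```

Passing an open file handle to reformat_fasta and printing each yielded string
should give one line per record. A header without abundance or length would
raise a KeyError, so real data may need extra checks.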

Thanks.
TianYi

On Mon, Sep 24, 2012 at 4:41 AM, HAJIHASHEMI, ZAHRA (AG/1000) <
[email protected]> wrote:

> Hi all,
>
> I'm new to Pig and need to reformat my file. I have a FASTA file in this
> format:
>
> >1 abundance=7626 length=72 cross=0
> CGACACGACTCTCGGCAACGGATA
> CGACACGACTCTCGGCAACGGATAC
> GACACGACTCTCGGCAACGGATA
> >3 abundance=4639 length=22 cross=1
> CGACACGACTCTCGGCAACGGA
> CGACACGACTCTCGGCAACGGATA
> CGACACGACTCTCGGCAACGGATA
> >4 abundance=4302 length=24 cross=0
> ACTTGTGCTGATTGGATGACTTGA
> >5 abundance=3785 length=23 cross=0
> GACACGACTCTCGGCAACGGATA
>
> Each line which starts with '>' corresponds to one sequence, but the
> actual sequence might be stored in multiple lines like record 1. In each
> line, the first number is id.
> In the formatted file, I do not need id and cross.
> I need to format this file such that all records will be in just one line
> and without the keywords "abundance", "length", and "cross". So the ideal
> formatted file should be like that:
> 7626 72 CGACACGACTCTCGGCAACGGATACGACACGACTCTCGGCAACGGATACGACACGACTCTCGGCAACGGATA
> 4639 22 CGACACGACTCTCGGCAACGGACGACACGACTCTCGGCAACGGATACGACACGACTCTCGGCAACGGATA
> 4302 24 ACTTGTGCTGATTGGATGACTTGA
> 3785 23 GACACGACTCTCGGCAACGGATA
>
> Can I do this formatting in pig?
> Any help is highly appreciated.
>
>
> Zara
