Hi Hajihashemi,

You can do this with a short Python (easy to learn) or Perl (fast and powerful for string processing) script. Pig may not be a good fit for this kind of reformatting.
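For what it's worth, here is a minimal sketch of what such a Python script might look like. It assumes every header line has the form `>id abundance=N length=M ...` as in your sample, and that all lines up to the next `>` belong to the same sequence. The function name `reformat_fasta` and the regular expression are just my illustration, not part of any library.

```python
import re

# Assumed header layout, taken from the sample: ">id abundance=N length=M cross=K"
HEADER = re.compile(r"^>\S+\s+abundance=(\d+)\s+length=(\d+)")


def reformat_fasta(lines):
    """Yield one 'abundance length sequence' string per FASTA record."""
    fields = None   # [abundance, length] of the record being read
    chunks = []     # sequence pieces collected for that record
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if fields is not None:
                yield " ".join(fields + ["".join(chunks)])
            match = HEADER.match(line)
            if match is None:
                raise ValueError(f"unexpected header line: {line!r}")
            fields, chunks = list(match.groups()), []
        else:
            if fields is None:
                raise ValueError("sequence data found before any header")
            chunks.append(line)
    # Emit the last record once the input is exhausted.
    if fields is not None:
        yield " ".join(fields + ["".join(chunks)])


sample = [
    ">4 abundance=4302 length=24 cross=0",
    "ACTTGTGCTGATTGGATGACTTGA",
]
for record in reformat_fasta(sample):
    print(record)  # prints: 4302 24 ACTTGTGCTGATTGGATGACTTGA
```

Since the function accepts any iterable of lines, you can pass it an open file object directly and write each yielded string to the output file, so the whole input never has to sit in memory.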
Thanks.
TianYi

On Mon, Sep 24, 2012 at 4:41 AM, HAJIHASHEMI, ZAHRA (AG/1000) <[email protected]> wrote:

> Hi all,
>
> I'm new to Pig and need to format my file. I have a FASTA file with this
> format:
>
> >1 abundance=7626 length=72 cross=0
> CGACACGACTCTCGGCAACGGATA
> CGACACGACTCTCGGCAACGGATAC
> GACACGACTCTCGGCAACGGATA
> >3 abundance=4639 length=22 cross=1
> CGACACGACTCTCGGCAACGGA
> CGACACGACTCTCGGCAACGGATA
> CGACACGACTCTCGGCAACGGATA
> >4 abundance=4302 length=24 cross=0
> ACTTGTGCTGATTGGATGACTTGA
> >5 abundance=3785 length=23 cross=0
> GACACGACTCTCGGCAACGGATA
>
> Each line that starts with '>' corresponds to one sequence, but the
> actual sequence might be stored in multiple lines, like record 1. In each
> header line, the first number is the id.
> In the formatted file, I do not need id and cross.
> I need to format this file such that each record is on just one line
> and without the keywords "abundance", "length", and "cross". So the ideal
> formatted file should look like this:
>
> 7626 72
> CGACACGACTCTCGGCAACGGATACGACACGACTCTCGGCAACGGATACGACACGACTCTCGGCAACGGATA
> 4639 22
> CGACACGACTCTCGGCAACGGACGACACGACTCTCGGCAACGGATACGACACGACTCTCGGCAACGGATA
> 4302 24 ACTTGTGCTGATTGGATGACTTGA
> 3785 23 GACACGACTCTCGGCAACGGATA
>
> Can I do this formatting in Pig?
> Any help is highly appreciated.
>
> Zara
