FASTA is a file format that biologists use to store biological sequences.
The format is as follows:

    > sequence information (sequence name, sequence length, etc.)
    genomic sequence
    > sequence information (sequence name, sequence length, etc.)
    genomic sequence

I want to match the name of each sequence against another list of sequence
names, and extract the subsequence using the start and end sites that the
list provides for each sequence. The pseudocode could be:

    if line starts with '>':
        match the header name with the sequence name
        if sequence name found:
            extract the subsequence between the given start and end
            positions of that sequence

The code I have devised so far is:

    import os
    with open('E:/scaftig.sample - Copy.scaftig','r') as f:
        header = f.readline()
        header = header.rstrip(os.linesep)
        sequence = ''
        for line in f:
            line = line.rstrip('\n')
            if line[0] == '>':
                header = header[:]
                print header
            if line[0] != '>':
                sequence += line
        print sequence, len(sequence)

I would appreciate it if you could help.

Thanks
Best Regards
Ali
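
The steps described above can be sketched roughly as follows. This is only a
minimal sketch under stated assumptions, not a definitive implementation: the
function names (read_fasta, read_coords, extract) are hypothetical, the
sequence name is assumed to be the first whitespace-separated token after
'>', the coordinate file is assumed to be whitespace-delimited with one
header row, and the start/end positions are assumed to be 1-based and
inclusive. It uses the Python 3 print() function and small in-memory demo
data rather than the real files.

```python
import io

def read_fasta(handle):
    """Yield (name, sequence) pairs from an open FASTA file."""
    name, chunks = None, []
    for line in handle:
        line = line.strip()
        if not line:
            continue  # skip blank lines instead of indexing line[0]
        if line.startswith('>'):
            if name is not None:
                yield name, ''.join(chunks)
            # Assumption: the name is the first token after '>'.
            fields = line[1:].split()
            name = fields[0] if fields else ''
            chunks = []
        elif name is not None:
            chunks.append(line)  # a sequence may span several lines
    if name is not None:
        yield name, ''.join(chunks)

def read_coords(handle):
    """Return {scaf_name: [(gene_name, start, end), ...]}."""
    coords = {}
    for line in handle:
        fields = line.split()
        # Skip the header row and any malformed line.
        if len(fields) < 4 or not (fields[2].isdigit() and fields[3].isdigit()):
            continue
        scaf, gene = fields[0], fields[1]
        coords.setdefault(scaf, []).append((gene, int(fields[2]), int(fields[3])))
    return coords

def extract(fasta_handle, coords):
    """Yield (scaf, gene, start, end, subsequence) for every matching header."""
    for name, seq in read_fasta(fasta_handle):
        for gene, start, end in coords.get(name, []):
            # Assumption: positions are 1-based and inclusive.
            yield name, gene, start, end, seq[start - 1:end]

# Made-up demo data, standing in for the FASTA and gene coordinate files.
demo_fasta = io.StringIO(
    ">scafA length=10\nACGTACGTAC\n>scafB length=8\nTTTT\nGGGG\n")
demo_coords = io.StringIO(
    "Scaf_name Gene_name DS_St DS_En\nscafA gene1 2 5\nscafB gene2 3 6\n")

results = list(extract(demo_fasta, read_coords(demo_coords)))
for row in results:
    print(*row)
```

With real data, the two StringIO objects would be replaced by files opened
with open(...). Keeping the coordinates in a dictionary keyed by scaffold
name means the FASTA file only has to be read once, however many genes are
listed per scaffold.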
> Date: Tue, 8 Mar 2016 03:11:42 -0500
> Subject: Re: [Tutor] FASTA FILE SUB-SEQUENCE EXTRACTION
> From: wolfrage8...@gmail.com
> To: syedzaid...@hotmail.co.uk
>
> What is FASTA? This seems very specific. Do you have any code thus far
> that is failing?
>
> On Tue, Mar 8, 2016 at 2:33 AM, syed zaidi <syedzaid...@hotmail.co.uk> wrote:
> > Hello all,
> > I am stuck on a problem and hope someone can help me out. I have a FASTA
> > file with multiple sequences and another file with the gene coordinates.
> > SAMPLE FASTA FILE:
> > >EBM_revised_C2034_1 length=611
> > GCAGCAGTGGGGAATATTGCACAATGGGCGCAAGCCTGATGCAGCCATGCCGCGTGTATGAAGAAGGCCTTCGGGTTGTAAAGTACTTTCAGCGGGGAGGAAGGGAGTAAAGTTAATACCTTTGCTCATTGACGTTACCCGCAGAAGAAGCACCGGCTAACTCCGTGCCAGCAGCCGCGGTAATACGGAGGGTGCAAGCGTTAATCGGAATTACTGGGCGTAAAGCGCACGCAGGCGGTTTGTTAAGTCAGATGTGAAATCCCCGGGCTCAACCTGGGAACTGCATCTGATACTGGCAAGCTTGAGTCTCGTAGAGGGGGGTAGAATTCCAGGTGTAGCGGTGAAATGCGTAGAGATCTGGAGGAATACCGGTGGCGAAGGCGGCCCCCTGGACGAAGACTGACGCTCAGGTGCGAAAGCGTGGGGAGCAAACAGGATTAGATACCCTGGTAGTCCACGCCGTAAACGATGTCGACTTGGAGGTTGTGCCCTTGAGGCGTGGCTTCCGGAGCTAACGCGTTAAGTCGACCGCCTGGGGAGTACGGCCGCAAGGTTAAAACTCAAATGAATTGACGGGGGCCCGCACAAGCGGTGGAGCATGTGGTTTAATT
> > >EBM_revised_C2104_1 length=923
> > TCCGAGGGCGGTGGGATGTTGGTGCTGCAGCGGCTTTCGGATGCGCGGCGGTTGGGTCATCCGGTGTTGGCGGTGGTGGTCGGGTCGGCGGTTAATCAGGATGGGGCGTCGAATGGGTTGACCGCGCCTAATGGTCCTTCGCAGCAGCGGGTGGTGCGGGCGGCGTTGGCCAATGCCGGGTTGAGCGCGGCCGAGGTGGATGTGGTGGAGGGGCATGGGACCGGGACCACGTTGGGGGATCCGATTGAGGCTCAGGCGTTGTTGGCCACTTATGGGCAAGATCGGGGGGAGCCGGGAGAACCTTTGTGGTTGGGGTCGGT
> > GAA
> > GTCGAATATGGGTCATACGCAGGCCGCGGCGGGGGTGGCCGGGGTGATCAAGATGGTGTTGGCGATGCGCCATGAGCTGTTGCCGGCGACGTTGCACGTGGATGTGCCTAGCCCGCATGTGGATTGGTCGGCGGGGGCGGTGGAGTTGTTGACCGCGCCGCGGGTGTGGCCTGCTGGTGCTCGGACGCGTCGTGCGGGGGTGTCGTCGTTTGGGATTAGTGGCACTAATGCGCATGTGATTATCGAGGCGGTGCCGGTGGTGCCGCGGCGGGAGGCTGGTTGGGCGGGGCCGGTGGTGCCGTGGGTGGTGTCGGCGAAGTCGGAGTCGGCGTTGCGGGGGCAGGCGGCTCGGTTGGCCGCGTACGTGCGTGGCGATGATGGCCTCGATGTTGCCGATGTGGGGTGGTCGTTGGCGGGTCGTTCGGTTTTTGAGCATCGGGCGGTGGTGGTTGGCGGGGACCGTGATCGGTTGTTGGCCGGGCTCGATGAGCTGGCGGGTGACCAGTTGGGCGGCTCGGTTGTTCGGGGCACGGCGACTGCGGCGGGTAAGACGGTGTTCGTCTTCCCCGGCCAAGGCTCCCAATGGCTGGGCATGGGAAT
> > GENE COORD FILE:
> > Scaf_name            Gene_name  DS_St  DS_En
> > EBM_revised_C2034_1  gene1_1    33     99
> > EBM_revised_C2034_1  gene1_1    55     100
> > EBM_revised_C2034_1  gene1_1    111    150
> > EBM_revised_C2104_1  gene1_1    44     70
> > I want to perform the following steps:
> > 1. Compare the Scaf_name with the header of each FASTA sequence.
> > 2. If the header matches, process the sequence and extract the
> >    subsequence using the provided start and end positions.
> >
> > I would appreciate it if someone could help.
> > Thanks
> > Best Regards
> >
> > Ali
> >
> >> _______________________________________________
> >> Tutor maillist - Tutor@python.org
> >> To unsubscribe or change subscription options:
> >> https://mail.python.org/mailman/listinfo/tutor
> >