Hi,

I have the following GFF file from SNAP:

    X1 SNAP Einit 2579 2712 -3.221 + . X1-snap.1
    X1 SNAP Exon 2813 2945 4.836 + . X1-snap.1
    X1 SNAP Eterm 3013 3033 10.467 + . X1-snap.1
    X1 SNAP Esngl 3457 3702 -17.856 + . X1-snap.2
    X1 SNAP Einit 4901 4974 -4.954 + . X1-snap.3
    X1 SNAP Eterm 5021 5150 14.231 + . X1-snap.3
    X1 SNAP Einit 6245 7325 -1.525 - . X1-snap.4
    X1 SNAP Eterm 5974 6008 5.398 - . X1-snap.4

I tried to parse this GFF file with the code below:

    from BCBio import GFF
    from pprint import pprint
    from BCBio.GFF import GFFExaminer

    def retrieve_pred_genes_data():
        with open("test/X1_small.snap.gff") as sf:
            #examiner = GFFExaminer()
            #pprint(examiner.available_limits(sf))
            for rec in GFF.parse(sf):
                pprint(rec.id)
                pprint(rec.description)
                pprint(rec.name)
                pprint(rec.features)
                #pprint(rec.type) #'SeqRecord' object has no attribute
                #pprint(rec.ref) #'SeqRecord' object has no attribute
                #pprint(rec.ref_db) #'SeqRecord' object has no attribute
                #pprint(rec.location) #'SeqRecord' object has no attribute
                #pprint(rec.location_operator) #'SeqRecord' object has no attribute
                #pprint(rec.strand) #'SeqRecord' object has no attribute
                #pprint(rec.sub_features) #'SeqRecord' object has no attribute

    retrieve_pred_genes_data()

and got the following output:

    'X1'
    '<unknown description>'
    '<unknown name>'
    [SeqFeature(FeatureLocation(ExactPosition(2578), ExactPosition(2712), strand=1), type='Einit'),
     SeqFeature(FeatureLocation(ExactPosition(2812), ExactPosition(2945), strand=1), type='Exon'),
     SeqFeature(FeatureLocation(ExactPosition(3012), ExactPosition(3033), strand=1), type='Eterm'),
     SeqFeature(FeatureLocation(ExactPosition(3456), ExactPosition(3702), strand=1), type='Esngl'),
     SeqFeature(FeatureLocation(ExactPosition(4900), ExactPosition(4974), strand=1), type='Einit'),
     SeqFeature(FeatureLocation(ExactPosition(5020), ExactPosition(5150), strand=1), type='Eterm'),
     SeqFeature(FeatureLocation(ExactPosition(6160), ExactPosition(7325), strand=-1), type='Einit'),
     SeqFeature(FeatureLocation(ExactPosition(5973), ExactPosition(6008), strand=-1), type='Eterm')]

With GFFExaminer I got this:

    {'gff_id': {('X1',): 8},
     'gff_source': {('SNAP',): 8},
     'gff_source_type': {('SNAP', 'Einit'): 3,
                         ('SNAP', 'Esngl'): 1,
                         ('SNAP', 'Eterm'): 3,
                         ('SNAP', 'Exon'): 1},
     'gff_type': {('Einit',): 3, ('Esngl',): 1, ('Eterm',): 3, ('Exon',): 1}}

I found these examples
(https://github.com/patena/jonikaslab-mutant-pools/blob/master/notes_on_GFF_parsing.txt),
but when I tried them I got errors like these:

    #pprint(rec.type) #'SeqRecord' object has no attribute
    #pprint(rec.ref) #'SeqRecord' object has no attribute
    #pprint(rec.ref_db) #'SeqRecord' object has no attribute
    #pprint(rec.location) #'SeqRecord' object has no attribute
    #pprint(rec.location_operator) #'SeqRecord' object has no attribute
    #pprint(rec.strand) #'SeqRecord' object has no attribute
    #pprint(rec.sub_features) #'SeqRecord' object has no attribute

What did I do wrong, and how can I access all the fields in the above GFF file?

Thank you in advance.

Mic
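
A note on the errors above: in Biopython, attributes such as type and location belong to the SeqFeature objects inside rec.features, not to the SeqRecord itself, so a loop like "for feature in rec.features" is the place to read feature.type, feature.location and feature.qualifiers. To show which fields the file carries, here is a minimal standard-library sketch that does not use BCBio at all. It assumes each line has exactly nine whitespace-separated columns, as in the sample; the names GffRow, parse_gff_line and group_by_gene are made up for this illustration.

```python
from collections import defaultdict, namedtuple

# Field names are this sketch's own labels for the nine GFF columns.
GffRow = namedtuple(
    "GffRow", "seqid source type start end score strand phase group"
)

# The sample rows from the post, inlined so the sketch is self-contained.
SAMPLE = """\
X1 SNAP Einit 2579 2712 -3.221 + . X1-snap.1
X1 SNAP Exon 2813 2945 4.836 + . X1-snap.1
X1 SNAP Eterm 3013 3033 10.467 + . X1-snap.1
X1 SNAP Esngl 3457 3702 -17.856 + . X1-snap.2
X1 SNAP Einit 4901 4974 -4.954 + . X1-snap.3
X1 SNAP Eterm 5021 5150 14.231 + . X1-snap.3
X1 SNAP Einit 6245 7325 -1.525 - . X1-snap.4
X1 SNAP Eterm 5974 6008 5.398 - . X1-snap.4
"""


def parse_gff_line(line):
    """Split one GFF line into a GffRow with typed coordinates and score."""
    fields = line.split()  # assumption: no spaces inside any column
    if len(fields) != 9:
        raise ValueError("expected 9 columns, got %d: %r" % (len(fields), line))
    seqid, source, ftype, start, end, score, strand, phase, group = fields
    return GffRow(
        seqid,
        source,
        ftype,
        int(start),  # coordinates kept 1-based, exactly as written in the file
        int(end),
        None if score == "." else float(score),
        strand,
        phase,
        group,
    )


def group_by_gene(text):
    """Return {gene id (column 9): [GffRow, ...]}, skipping blanks and comments."""
    genes = defaultdict(list)
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        row = parse_gff_line(line)
        genes[row.group].append(row)
    return dict(genes)


genes = group_by_gene(SAMPLE)
for gene_id, rows in genes.items():
    for row in rows:
        print(gene_id, row.type, row.start, row.end, row.score, row.strand)
```

The coordinates here stay 1-based as in the file, whereas the SeqFeature output above shows 0-based start positions (2578 for 2579), so the two are not directly interchangeable.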