Hello, I have this code (http://pastie.org/4575790) which pulls data from a list and then modifies some of the values such as the 'yield' entry, which has entries like:
21
15
≤ 1000
≤ 20
2.2 - 30

so that they are cleaned up.

# -*- coding: UTF-8 -*-
# Norman Khine <nor...@zmgc.net>
import operator, json
from BeautifulSoup import BeautifulSoup

combos={0: 'id', 2: 'country', 3: 'type', 5: 'lat', 6: 'lon', 12: 'name' }

TABLE_CONTENT = [
['958','<a id="958F" href="javascript:c_row(\'958\')" title="go to map"><img src="/images/c_map.png" border="0"></a>','USA','Atmospheric','<a href="javascript:c_ol(\'958\')" title="click date time to show origin_list (evid=958)">1945/07/16 11:29:45</a>','33.6753','-106.4747','','-.03','21','','','TRINITY',' ',' ','<a href="javascript:c_md(\'958\')" title="click here to show source data">SourceData</a>',' '],
['959','<a id="959F" href="javascript:c_row(\'959\')" title="go to map"><img src="/images/c_map.png" border="0"></a>','USA','Atmospheric','<a href="javascript:c_ol(\'959\')" title="click date time to show origin_list (evid=959)">1945/08/05 23:16:02</a>','34.395','132.4538','','-.58','15','','','LITTLEBOY',' ',' ','<a href="javascript:c_md(\'959\')" title="click here to show source data">SourceData</a>',' '],
['1906','<a id="1906F" href="javascript:c_row(\'1906\')" title="go to map"><img src="/images/c_map.png" border="0"></a>','GBR','Atmospheric','<a href="javascript:c_ol(\'1906\')" title="click date time to show origin_list (evid=1906)">1958/08/22 17:24:00</a>','1.67','-157.25','',' ',' ≤ 1000','','','Pennant 2',' ',' ','<a href="javascript:c_md(\'1906\')" title="click here to show source data">SourceData</a>',' '],
['28','<a id="28F" href="javascript:c_row(\'28\')" title="go to map"><img src="/images/c_map.png" border="0"></a>','USA','Underground','<a href="javascript:c_ol(\'28\')" title="click date time to show origin_list (evid=28)">1961/09/16 19:45:00</a>','37.048','-116.034','0','.098',' ≤ 20','','','SHREW',' ',' ','<a href="javascript:c_md(\'28\')" title="click here to show source data">SourceData</a>','<a href="javascript:c_es(\'NEDBMetadataYucca2.htm\');">US Yucca Flat</a>'],
['5393637','<a id="5393637F" href="javascript:c_row(\'5393637\')" title="go to map"><img src="/images/c_map.png" border="0"></a>','PRK','Underground','<a href="javascript:c_ol(\'5393637\')" title="click date time to show origin_list (evid=5393637)">2009/05/25 00:54:45</a>','41.2925','129.0657','','0','2.2 - 30','4.7','<a href="javascript:c_stalist(\'5393637\')" title="click here to show stations with waveform">45</a>','2009 North Korean Nuclear Test','<a href="javascript:c_bull(\'5393637\')" title="click here to show bulletin">Bulletin</a>','<a href="javascript:c_tres(\'5393637\')" title="click here to show IASP91 time residuals with respect to preferred solution">TimeRes</a>','<a href="javascript:c_md(\'5393637\')" title="click here to show source data">SourceData</a>','<a href="javascript:c_es(\'NEDBMetadataNKorea2009.htm\');">NK2009</a>']]

event_list = []
for event in TABLE_CONTENT:
    event_dict = {}
    for index, item in enumerate(event):
        if index == 8:
            if item == ' ':
                event_dict['depth'] = '0'
            else:
                event_dict['depth'] = item
        if index == 9:
            try:
                items = item.split()
                if len(items) >= 2:
                    event_dict['yield'] = items[-1]
                else:
                    if item == ' ':
                        event_dict['yield'] = '10'
                    else:
                        event_dict['yield'] = item
            except:
                pass
        if index == 4:
            soup = BeautifulSoup(item)
            for a in soup.findAll('a'):
                event_dict['date'] = ''.join(a.findAll(text=True))
        if index == 3:
            if 'Atmospheric' in item:
                event_dict['fill'] = 'red'
            if 'Underground' in item:
                event_dict['fill'] = 'green'
        elif index in combos:
            event_dict[combos[index]]=item
    event_list.append(event_dict)
    print event_dict
event_list = sorted(event_list, key = operator.itemgetter('id'))

f = open('detonations.json', 'w')
f.write(json.dumps(event_list))
f.close()
print 'detonations.json, written!'

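The 'yield' clean-up inside the loop above can also be looked at in isolation. The following is only a minimal sketch, not the script's actual code: the helper name `clean_yield` and its `default` parameter are hypothetical, and it assumes that the last whitespace-separated token is always the value to keep, with '10' as the fallback for blank entries as in the loop. Unlike the loop, it also strips surrounding whitespace from single values and treats an empty string as blank.

```python
# Hypothetical helper (not part of the original script): isolates the
# 'yield' clean-up so that it can be tried on its own.
def clean_yield(raw, default='10'):
    """Return the last whitespace-separated token of a yield entry."""
    parts = raw.split()
    if not parts:
        # Blank entry (empty or whitespace only): use the assumed default.
        return default
    # For a range or an upper bound, the last token is the value to keep.
    return parts[-1]


for raw in ['21', '15', ' ≤ 1000', ' ≤ 20', '2.2 - 30', ' ']:
    print(repr(raw), '->', clean_yield(raw))
# '21' -> 21
# '15' -> 15
# ' ≤ 1000' -> 1000
# ' ≤ 20' -> 20
# '2.2 - 30' -> 30
# ' ' -> 10
```

Written this way the function body needs no try/except, since `split()` on a string does not raise.
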
This then produces the .json file, such as:

[{"name": "Pennant 2", "country": "GBR", "lon": "-157.25", "yield": "1000", "lat": "1.67", "depth": "0", "date": "1958/08/22 17:24:00", "id": "1906", "fill": "red"},
 {"name": "SHREW", "country": "USA", "lon": "-116.034", "yield": "20", "lat": "37.048", "depth": ".098", "date": "1961/09/16 19:45:00", "id": "28", "fill": "green"},
 {"name": "2009 North Korean Nuclear Test", "country": "PRK", "lon": "129.0657", "yield": "30", "lat": "41.2925", "depth": "0", "date": "2009/05/25 00:54:45", "id": "5393637", "fill": "green"},
 {"name": "TRINITY", "country": "USA", "lon": "-106.4747", "yield": "21", "lat": "33.6753", "depth": "-.03", "date": "1945/07/16 11:29:45", "id": "958", "fill": "red"},
 {"name": "LITTLEBOY", "country": "USA", "lon": "132.4538", "yield": "15", "lat": "34.395", "depth": "-.58", "date": "1945/08/05 23:16:02", "id": "959", "fill": "red"}]

Can the code be improved further?

Also, the content has 2,153 items. What would be the correct way to keep it in a separate file and import it into this file to work on it?

Any advice much appreciated.

norman

--
%>>> "".join( [ {'*':'@','^':'.'}.get(c,None) or chr(97+(ord(c)-83)%26) for c in ",adym,*)&uzq^zqf" ] )
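On the question of keeping the 2,153 rows in a separate file, here is a minimal sketch of one possible approach, assuming the rows are dumped once to a JSON file and read back at the start of the script. The file name `table_content.json` and the helper names `save_table` and `load_table` are hypothetical, and the temporary directory is only there to keep the example self-contained.

```python
import json
import os
import tempfile


def save_table(rows, path):
    """Write the list of rows to a JSON file (hypothetical helper)."""
    with open(path, 'w') as f:
        json.dump(rows, f)


def load_table(path):
    """Read the list of rows back from a JSON file (hypothetical helper)."""
    with open(path) as f:
        return json.load(f)


# Two shortened rows standing in for the real table.
rows = [['958', 'USA', 'Atmospheric', '21'],
        ['1906', 'GBR', 'Atmospheric', ' ≤ 1000']]

# Assumed file name, placed in a temporary directory for this example.
path = os.path.join(tempfile.mkdtemp(), 'table_content.json')
save_table(rows, path)
loaded = load_table(path)
```

Another option would be to leave the `TABLE_CONTENT = [...]` assignment in a module of its own, for example a hypothetical `table_content.py` next to the script, and to use `from table_content import TABLE_CONTENT`.
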
_______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor