hello, thank you all for the advice, here is the updated version with the changes:

import re

file = open('producers_google_map_code.txt', 'r')
data = repr( file.read().decode('utf-8') )

# each name is the bound findall method of a compiled pattern
get_records = re.compile(r"""openInfoWindowHtml\(.*?\\ticon: myIcon\\n""").findall
get_titles = re.compile(r"""<strong>(.*)<\/strong>""").findall
get_urls = re.compile(r"""a href=\"\/(.*)\">En savoir plus""").findall
get_latlngs = re.compile(r"""GLatLng\((\-?\d+\.\d*)\,\\n\s*(\-?\d+\.\d*)\)""").findall

records = get_records(data)
block_record = []
for record in records:
    # keep the last match of each field, or None when nothing matched
    titles = get_titles(record)
    title = titles[-1] if titles else None
    urls = get_urls(record)
    url = urls[-1] if urls else None
    latlngs = get_latlngs(record)
    latlng = latlngs[-1] if latlngs else None
    block_record.append( {'title':title, 'url':url, 'latlng':latlng} )

print block_record

On Tue, Feb 2, 2010 at 1:27 PM, Kent Johnson <ken...@tds.net> wrote:
> On Tue, Feb 2, 2010 at 4:16 AM, Norman Khine <nor...@khine.net> wrote:
>
>> here are the changes:
>>
>> import re
>> file=open('producers_google_map_code.txt', 'r')
>> data = repr( file.read().decode('utf-8') )
>
> Why do you use repr() here?

i have latin-1 chars in the 'producers_google_map_code.txt' file and this is the only way i could get it to read the data.

is this incorrect?

>
>> get_record = re.compile(r"""openInfoWindowHtml\(.*?\\ticon: myIcon\\n""")
>> get_title = re.compile(r"""<strong>(.*)<\/strong>""")
>> get_url = re.compile(r"""a href=\"\/(.*)\">En savoir plus""")
>> get_latlng = re.compile(r"""GLatLng\((\-?\d+\.\d*)\,\\n\s*(\-?\d+\.\d*)\)""")
>>
>> records = get_record.findall(data)
>> block_record = []
>> for record in records:
>>     namespace = {}
>>     titles = get_title.findall(record)
>>     for title in titles:
>>         namespace['title'] = title
>
>
> This is odd, you don't need a loop to get the last title, just use
>   namespace['title'] = get_title.findall(html)[-1]
>
> and similarly for url and latlngs.
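
A minimal sketch of the no-loop approach suggested above, with one caveat: findall(...)[-1] raises IndexError when nothing matches, which is why the updated version falls back to None. The helper name last_match and the sample record text are made up for illustration; they are not from the real file.

```python
import re

# Same title pattern as in the message above.
get_titles = re.compile(r"""<strong>(.*)<\/strong>""").findall


def last_match(matches):
    """Return the last item of a findall() result, or None if it is empty."""
    return matches[-1] if matches else None


# Hypothetical record text, one <strong> tag per line.
record = "<strong>First</strong>\n<strong>Last</strong>"

title = last_match(get_titles(record))          # last of the two titles
missing = last_match(get_titles("no tags"))     # no match, so None
```

The same helper would apply unchanged to the url and latlng results.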
>
> Kent
>
>
>>     urls = get_url.findall(record)
>>     for url in urls:
>>         namespace['url'] = url
>>     latlngs = get_latlng.findall(record)
>>     for latlng in latlngs:
>>         namespace['latlng'] = latlng
>>     block_record.append(namespace)
>>
>> print block_record
>>>
>>> The def of "namespace" would be clearer imo in a single line:
>>>    namespace = {title:t, url:url, lat:g}
>>
>> i am not sure how this will fit into the code!
>>
>>> This also reveals a kind of name confusion, doesn't it?
>>>
>>>
>>> Denis
>>>
>>> ________________________________
>>>
>>> la vita e estrany
>>>
>>> http://spir.wikidot.com/
>>> _______________________________________________
>>> Tutor maillist  -  tu...@python.org
>>> To unsubscribe or change subscription options:
>>> http://mail.python.org/mailman/listinfo/tutor
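
On the repr() question earlier in this message: if the file really is Latin-1 (an assumption based on the reply above, not something checked against the data), decoding it with that codec yields text directly, so repr() would not be needed just to read it. A minimal sketch follows; the sample file name and its contents are made up.

```python
import io
import os
import tempfile

# Hypothetical sample standing in for producers_google_map_code.txt.
sample = u"<strong>Caf\u00e9 du March\u00e9</strong>\n"

# Write the sample as Latin-1 bytes into a scratch directory.
path = os.path.join(tempfile.mkdtemp(), "sample_latin1.txt")
with io.open(path, "w", encoding="latin-1") as handle:
    handle.write(sample)

# io.open decodes while reading, so no repr() or manual .decode() is needed.
with io.open(path, "r", encoding="latin-1") as handle:
    data = handle.read()
```

Without repr() the text holds real tab and newline characters, so patterns written against the escaped form (\\t, \\n) would need \t and \n instead.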