thanks denis,

On Tue, Feb 2, 2010 at 9:30 AM, spir <denis.s...@free.fr> wrote:
> On Mon, 1 Feb 2010 16:30:02 +0100
> Norman Khine <nor...@khine.net> wrote:
>
>> On Mon, Feb 1, 2010 at 1:19 PM, Kent Johnson <ken...@tds.net> wrote:
>> > On Mon, Feb 1, 2010 at 6:29 AM, Norman Khine <nor...@khine.net> wrote:
>> >
>> >> thanks, what about the whitespace problem?
>> >
>> > \s* will match any amount of whitespace, including newlines.
>>
>> thank you, this worked well.
>>
>> here is the code:
>>
>> ###
>> import re
>> file=open('producers_google_map_code.txt', 'r')
>> data = repr( file.read().decode('utf-8') )
>>
>> block = re.compile(r"""openInfoWindowHtml\(.*?\\ticon: myIcon\\n""")
>> b = block.findall(data)
>> block_list = []
>> for html in b:
>>     namespace = {}
>>     t = re.compile(r"""<strong>(.*)<\/strong>""")
>>     title = t.findall(html)
>>     for item in title:
>>         namespace['title'] = item
>>     u = re.compile(r"""a href=\"\/(.*)\">En savoir plus""")
>>     url = u.findall(html)
>>     for item in url:
>>         namespace['url'] = item
>>     g = re.compile(r"""GLatLng\((\-?\d+\.\d*)\,\\n\s*(\-?\d+\.\d*)\)""")
>>     lat = g.findall(html)
>>     for item in lat:
>>         namespace['LatLng'] = item
>>     block_list.append(namespace)
>>
>> ###
>>
>> can this be made better?
>
> The 3 regex patterns are constants: they can be put out of the loop.
>
> You may also rename b to blocks, and find a more accurate name for
> block_list; eg block_records, where record = set of (named) fields.
>
> A short desc and/or example of the overall and partial data formats can
> greatly help later review, since regex patterns alone are hard to decode.

here are the changes:

import re
file=open('producers_google_map_code.txt', 'r')
data = repr( file.read().decode('utf-8') )

get_record = re.compile(r"""openInfoWindowHtml\(.*?\\ticon: myIcon\\n""")
get_title = re.compile(r"""<strong>(.*)<\/strong>""")
get_url = re.compile(r"""a href=\"\/(.*)\">En savoir plus""")
get_latlng = re.compile(r"""GLatLng\((\-?\d+\.\d*)\,\\n\s*(\-?\d+\.\d*)\)""")

records = get_record.findall(data)
block_record = []
for record in records:
    namespace = {}
    titles = get_title.findall(record)
    for title in titles:
        namespace['title'] = title
    urls = get_url.findall(record)
    for url in urls:
        namespace['url'] = url
    latlngs = get_latlng.findall(record)
    for latlng in latlngs:
        namespace['latlng'] = latlng
    block_record.append(namespace)

print block_record

>
> The def of "namespace" would be clearer imo in a single line:
>     namespace = {title:t, url:url, lat:g}

i am not sure how this will fit into the code!

> This also reveals a kind of name confusion, doesn't it?
>
>
> Denis
>
> ________________________________
>
> la vita e estrany
>
> http://spir.wikidot.com/
> _______________________________________________
> Tutor maillist  -  tu...@python.org
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>
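
On the single-line "namespace" suggestion quoted above, here is a minimal sketch of one way it might fit into the loop. It is not taken from the thread: the helper names first_match and make_record and the sample string are hypothetical, the patterns are simplified to run on raw text rather than on the repr() of the file, and each field takes the first match where the loops above keep the last one (the same result when a record contains a single match).

```python
import re

# Patterns compiled once, outside any loop. They are simplified versions of
# the ones in the script above and assume raw text, not repr() output.
get_title = re.compile(r"<strong>(.*?)</strong>")
get_url = re.compile(r'a href="/(.*?)">En savoir plus')
get_latlng = re.compile(r"GLatLng\((-?\d+\.\d*),\s*(-?\d+\.\d*)\)")


def first_match(pattern, text):
    """Return the captured group(s) of the first match, or None if no match."""
    match = pattern.search(text)
    if match is None:
        return None
    groups = match.groups()
    # One group: return the string itself; several groups: return the tuple.
    return groups[0] if len(groups) == 1 else groups


def make_record(text):
    # The whole record is defined in a single expression, so the field names
    # and the patterns that fill them can be read side by side.
    return {
        'title': first_match(get_title, text),
        'url': first_match(get_url, text),
        'latlng': first_match(get_latlng, text),
    }


# Hypothetical sample block; the real file contents are not shown in the thread.
sample = (
    'new GLatLng(-1.5, 36.25) openInfoWindowHtml("<strong>Some Producer</strong>'
    '<a href="/producers/some-producer">En savoir plus</a>")'
)
record = make_record(sample)
```

With something like this, the loop body could shrink to block_record.append(make_record(record)). A field whose pattern does not match is stored as None instead of being left out of the dict, which may or may not be what is wanted.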