Hello,

I am still unable to get this to work correctly!

In [1]: file=open('producers_google_map_code.txt', 'r')
In [2]: data = repr( file.read().decode('utf-8') )
In [3]: from BeautifulSoup import BeautifulStoneSoup
In [4]: soup = BeautifulStoneSoup(data)
In [6]: soup
http://paste.lisp.org/display/94195
In [7]: import re
In [8]: p = re.compile(r"""GLatLng\((\d+\.\d*)\, \n (\d+\.\d*)\)""")
In [9]: r = p.findall(data)
In [10]: r
Out[10]: []

See http://paste.lisp.org/+20BO/1

I can't seem to get the regex right:

(r"""GLatLng\((\d+\.\d*)\, \n (\d+\.\d*)\)""")

The problem is that each entry looks like this, for example:

GLatLng(27.729912,\\n 85.31559)
GLatLng(-18.889851,\\n -66.770897)

There is a big run of whitespace, and either group can have a negative value. If I do this:

In [31]: p = re.compile(r"""GLatLng\((\d+\.\d*)\,\\n (\d+\.\d*)\)""")
In [32]: r = p.findall(data)
In [33]: r
Out[33]: [('27.729912', '85.31559'), ('9.696333', '122.985992'), ('17.964625', '102.60040'), ('21.046439', '105.853043'),

I get matches, but this does not account for data that has negative values. I am also unsure how to pull it all together, i.e. how to return a CSV file such as:

"ACP", "acp.html", "9.696333", "122.985992"
"ALTER TRADE CORPORATION", "alter-trade-corporation.html", "-18.889851", "-66.770897"

Thanks

On Sat, Jan 23, 2010 at 12:50 AM, spir <denis.s...@free.fr> wrote:
> On Sat, 23 Jan 2010 00:22:41 +0100
> Norman Khine <nor...@khine.net> wrote:
>
>> Hi
>>
>> On Fri, Jan 22, 2010 at 7:44 PM, spir <denis.s...@free.fr> wrote:
>> > On Fri, 22 Jan 2010 14:11:42 +0100
>> > Norman Khine <nor...@khine.net> wrote:
>> >
>> >> but my problem comes when i try to list the GLatLng:
>> >>
>> >> GLatLng(9.696333, 122.985992);
>> >>
>> >> >>> StartingWithGLatLng = soup.findAll(re.compile('GLatLng'))
>> >> >>> StartingWithGLatLng
>> >> []
>> >
>> > Don't know about soup's findAll. But the regex pattern string should rather be
>> > something like (untested):
>> > r"""GLatLng\(\(d+\.\d*)\, (d+\.\d*)\) """
>> > capturing both numbers.
>> >
>> > Denis
>> >
>> > PS: finally tested:
>> >
>> > import re
>> > s = "GLatLng(9.696333, 122.985992)"
>> > p = re.compile(r"""GLatLng\((\d+\.\d*)\, (\d+\.\d*)\)""")
>> > r = p.match(s)
>> > print r.group() # --> GLatLng(9.696333, 122.985992)
>> > print r.groups() # --> ('9.696333', '122.985992')
>> >
>> > s = "xGLatLng(1.1, 11.22)xxxGLatLng(111.111, 1111.2222)x"
>> > r = p.findall(s)
>> > print r # --> [('1.1', '11.22'), ('111.111', '1111.2222')]
>>
>> Thanks for the help, but I can't seem to get the RegEx to work correctly.
>>
>> Here is my input and output:
>>
>> http://paste.lisp.org/+20BO/1
>
> See my previous examples...
> If you use match:
>
> In [6]: r = p.match(data)
>
> Then the result is a regex match object (unlike when using findall). To get
> the string(s) matched, you need to use the group() and/or groups() methods.
>
> >>> import re
> >>> p = re.compile('x')
> >>> print p.match("xabcx")
> <_sre.SRE_Match object at 0xb74de6e8>
> >>> print p.findall("xabcx")
> ['x', 'x']
>
> Denis
> ________________________________
>
> la vita e estrany
>
> http://spir.wikidot.com/
> _______________________________________________
Tutor maillist - Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
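
For the two open points in the message at the top (negative values, and the "\n" plus whitespace between the numbers), here is a minimal sketch of one pattern that might cover both. It assumes the text came from repr(), so the line break shows up as a literal backslash followed by "n". The names GLATLNG, extract_coords and to_csv are made up for illustration, and the (name, page) pairs are assumed to have been extracted elsewhere, in the same order as the coordinates. It is written for Python 3 rather than the Python 2 used in the sessions above.

```python
import csv
import io
import re

# Optional minus sign on each number; an optional literal "\n"
# (backslash + n, as left behind by repr()) and any amount of
# whitespace between the comma and the second number.
GLATLNG = re.compile(
    r"GLatLng\(\s*(-?\d+\.\d*)\s*,\s*(?:\\n)?\s*(-?\d+\.\d*)\s*\)"
)


def extract_coords(text):
    """Return a list of (lat, lng) string pairs found in text."""
    return GLATLNG.findall(text)


def to_csv(entries, coords):
    """Pair (name, page) tuples with (lat, lng) tuples, in order,
    and return them as CSV text with every field quoted."""
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for (name, page), (lat, lng) in zip(entries, coords):
        writer.writerow([name, page, lat, lng])
    return out.getvalue()


# Sample text modelled on the examples in the message; the amount of
# whitespace after "\n" is an assumption.
sample = (
    "GLatLng(9.696333,\\n          122.985992) "
    "GLatLng(-18.889851,\\n          -66.770897)"
)

# Hypothetical: how the names and pages are pulled out of the page is
# not shown in the thread, so they are hard-coded here.
entries = [
    ("ACP", "acp.html"),
    ("ALTER TRADE CORPORATION", "alter-trade-corporation.html"),
]

coords = extract_coords(sample)
print(coords)
print(to_csv(entries, coords))
```

Note that csv.writer puts no space after the commas, so the rows differ slightly from the layout shown in the message. Pairing by position with zip() only works if every entry has exactly one GLatLng call and both lists are in the same order.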