grouchy wrote:
> Hi,
> 
> I'm having bang-my-head-against-a-wall moments trying to figure all of this 
> out.
> 
>>>>import re, urllib
>>>>from BeautifulSoup import BeautifulSoup
>>>
>>>>file = urllib.urlopen("http://www.google.com/search?q=beautifulsoup")
>>>>file = file.read().decode("utf-8")
>>>>soup = BeautifulSoup(file)
>>>>results = soup('p','g') 
>>>>x = results[1].a.renderContents()
>>>>type(x)
> 
> <type 'unicode'>
> 
>>>>print x
> 
> Matt Croydon::Postneo 2.0 » Blog Archive » Mobile Screen Scraping <b>...</b>
> 
> So far so good.  But what I really want is just the text, so I try
> something like:
> 
> 
>>>>y = results[1].a.fetchText(re.compile('.+'))
> 
> Traceback (most recent call last):
>   File "<interactive input>", line 1, in ?
>   File "BeautifulSoup.py", line 466, in fetchText
>     return self.fetch(recursive=recursive, text=text, limit=limit)
>   File "BeautifulSoup.py", line 492, in fetch
>     return self._fetch(name, attrs, text, limit, generator)
>   File "BeautifulSoup.py", line 194, in _fetch
>     if self._matches(i, text):
>   File "BeautifulSoup.py", line 252, in _matches
>     chunk = str(chunk)
> UnicodeEncodeError: 'ascii' codec can't encode character u'\xbb' in
> position 26: ordinal not in range(128)
> 
> Is this a bug?  Come to think of it, I'm not even sure how printing x
> worked, since it printed non-ascii characters.

This is the first question in the BeautifulSoup FAQ at 
http://www.crummy.com/software/BeautifulSoup/FAQ.html

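Unfortunately the author of BeautifulSoup considers this a problem with your
Python installation! The error actually comes from BeautifulSoup itself:
_matches() calls str() on a unicode string, and str() encodes with the default
encoding, which is normally ASCII, so it fails at the first non-ASCII
character. So it seems he doesn't have a good understanding of Python and
Unicode. (OK, I can forgive him that; I think there are only a handful of
people who really understand it completely.)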
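To see why the traceback happens, here is a minimal sketch of roughly the conversion that str(chunk) performs in Python 2 when the default encoding is ASCII. The function names encode_ascii and encode_utf8 are made up for illustration; they are not BeautifulSoup code. Encoding the anchor text with the ASCII codec fails at the first u'\xbb', which sits at index 26, the same position the traceback reports. Naming a codec explicitly, here UTF-8, does not depend on the default encoding at all. It is written to run under Python 3 and should also run under Python 2.6+.

```python
# Anchor text from the search result above; u'\xbb' is the non-ASCII
# double angle quotation mark that appears twice in it.
title = u'Matt Croydon::Postneo 2.0 \xbb Blog Archive \xbb Mobile Screen Scraping '


def encode_ascii(text):
    """Roughly what str() does to a unicode string when the default encoding is ASCII."""
    return text.encode('ascii')


def encode_utf8(text):
    """Names the codec explicitly, so the default encoding never matters."""
    return text.encode('utf-8')


try:
    encode_ascii(title)
except UnicodeEncodeError as err:
    # err.start is the index of the first character ASCII cannot represent
    print('ascii failed at position %d' % err.start)

data = encode_utf8(title)
# Each u'\xbb' encodes to two bytes in UTF-8, so data is longer than title
print('%d characters, %d bytes' % (len(title), len(data)))
```

Running it reports the ASCII failure at position 26, then 66 characters and 68 bytes for the UTF-8 version.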

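The first fix given doesn't work. The second fix works, but it is not a good
idea to change the default encoding for your whole Python install. There is a
hack you can use to change the default encoding for just one program; at the
top of your program put

  import sys
  reload(sys); sys.setdefaultencoding('utf-8')

(The reload is needed because site.py deletes sys.setdefaultencoding() at
startup; reloading sys makes it available again.)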

This seems to fix the problem you are having.
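If what you really want is just the text of the link, another route is to skip the regex search entirely. The sketch below swaps in the standard library's HTMLParser rather than BeautifulSoup and collects only the character data from the anchor's HTML; TextCollector and text_only are hypothetical names, and the fragment is the renderContents() output shown earlier. The sketch itself only joins unicode chunks and never calls str(), so the ASCII default encoding does not come into play. It targets Python 3, with an import fallback intended for Python 2.

```python
try:
    from html.parser import HTMLParser  # Python 3
except ImportError:
    from HTMLParser import HTMLParser  # Python 2


class TextCollector(HTMLParser):
    """Collects only the character data between tags, dropping the markup."""

    def __init__(self):
        HTMLParser.__init__(self)
        self.chunks = []

    def handle_data(self, data):
        self.chunks.append(data)


def text_only(fragment):
    """Return the text content of an HTML fragment as one unicode string."""
    parser = TextCollector()
    parser.feed(fragment)
    parser.close()  # flush any text still buffered at the end of the input
    return u''.join(parser.chunks)


# The anchor contents printed in the question, as a unicode string.
fragment = u'Matt Croydon::Postneo 2.0 \xbb Blog Archive \xbb Mobile Screen Scraping <b>...</b>'
print(repr(text_only(fragment)))
```

The <b> and </b> tags are dropped while the "..." between them is kept as ordinary text.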

Kent

_______________________________________________
Tutor maillist  -  Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor
