Some kind person replied:

> You have the same URL as both your good and bad example.

Oops, dang emacs cut buffer (yeah, that's what did it). A working example URL would be (again, mind the wrap):

http://www.naco.faa.gov/digital_tpp_search.asp?fldIdent=ksfo&fld_ident_type=ICAO&ver=0711&bnSubmit=Complete+Search

Marc Christiansen <[EMAIL PROTECTED]> wrote:

> The problem is this line:
> <META http-equiv="Content-Type" content="text/html; charset=UTF-16">
>
> Which is wrong. The content is not utf-16 encoded. The line after that
> declares the charset as utf-8, which is correct, although ascii would be
> ok too.

Ah, er, hmmm. Take a look at the 'good' URL I mentioned above. You will notice that it has the same utf-16, utf-8 encoding declarations that the 'bad' one has, and BeautifulSoup works great on it. I'm still scratchin' ma head...

> If I save the search result and remove this line, everything works. So,
> you could:
> - ignore problematic pages

Not an option for my application.

> - save and edit them, then reparse them (not always practical)

That's what I'm doing at the moment during my development. Sure seems inelegant.

> - use the fromEncoding argument:
>     soup=BeautifulSoup.BeautifulSoup(ifile, fromEncoding="utf-8")
>   (or 'ascii'). Of course this only works if you guess/predict the
>   encoding correctly ;) Which can be difficult. Since BeautifulSoup uses
>   "an encoding discovered in the document itself" (quote from
>   <http://www.crummy.com/software/BeautifulSoup/documentation.html#Beautiful Soup Gives You Unicode, Dammit>)

I'll try that. For what I'm doing it ought to be safe enough.

Much appreciate all the comments so far.

--
Frank Stutzman
Bonanza N494B "Hula Girl"
Boise, ID
--
http://mail.python.org/mailman/listinfo/python-list
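P.S. Following up on the "save and edit them" workaround: rather than editing by hand, the bogus UTF-16 <meta> tag can be stripped in code before the text ever reaches BeautifulSoup, so its own encoding sniffing finds only the correct utf-8 declaration. A minimal stdlib-only sketch (the markup below is a stand-in I made up to resemble the problem page, not the real FAA output):

```python
import re

# Stand-in for the problematic page header: the first <meta> tag
# falsely claims UTF-16, the second one correctly says utf-8.
html = (
    '<html><head>'
    '<META http-equiv="Content-Type" content="text/html; charset=UTF-16">'
    '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'
    '</head><body>Terminal procedures search results</body></html>'
)

# Remove any <meta> tag whose charset claims utf-16; the remaining
# utf-8 declaration is the one the parser should trust.
cleaned = re.sub(
    r'<meta[^>]*charset=utf-16[^>]*>',
    '',
    html,
    flags=re.IGNORECASE,
)

# 'cleaned' can now be handed to BeautifulSoup as usual, e.g.
#     soup = BeautifulSoup.BeautifulSoup(cleaned)
# with no need to guess an encoding up front.
```

This trades the fromEncoding guess for a targeted deletion, which may be safer if the pages mix declarations inconsistently.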