Re: [Tutor] \x00T\x00r\x00i\x00a\x00 ie I get \x00 breaking up every character ?

Sarma Tangirala Sun, 20 Nov 2011 12:06:29 -0800

Would Python's HTML parser library be a better idea than using split?
That way you have greater control over what is in the HTML.
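A minimal sketch of the parser approach, using the standard library's `html.parser.HTMLParser` (Python 3; in Python 2 the module was named `HTMLParser`). The class `TextExtractor`, the helper `extract_text` and the sample markup are hypothetical. The parser works on text, not raw bytes, so a file read as bytes has to be decoded before it is fed in.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Hypothetical helper: collect the text between tags, dropping the tags."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called with each run of text between tags; skip whitespace-only runs.
        if data.strip():
            self.chunks.append(data.strip())


def extract_text(html_text):
    parser = TextExtractor()
    parser.feed(html_text)
    parser.close()  # flush any buffered text
    return parser.chunks


chunks = extract_text("<html><body><p>Trial run</p><b>done</b></body></html>")
print(chunks)  # ['Trial run', 'done']
```

Note that `list.index()` only matches a whole element, so to find a word inside the extracted text something like `any("Trial" in chunk for chunk in chunks)` is needed. Text inside `<script>` and `<style>` elements is also passed to `handle_data`, so it may need filtering.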
On 20 Nov 2011 23:58, "dave selby" <[email protected]> wrote:


> Hi All,
>
> I have a long string which is an HTML file. I strip the HTML tags away
> and make a list with
>
> text = re.split('<.*?>', HTML)
>
> I then tried to search for a string with text.index(...), but it was
> not found. Printing HTML to a terminal gives what I expect, a block of
> tags and text, but when I split the HTML and print text I get loads of
>
> \x00T\x00r\x00i\x00a\x00, i.e. I get \x00 breaking up every character.
>
> Any idea what is happening, and how to get back to a list of ASCII strings?
>
> Cheers
>
> Dave
>
> --
>
> Please avoid sending me Word or PowerPoint attachments.
> See http://www.gnu.org/philosophy/no-word-attachments.html
> _______________________________________________
> Tutor maillist  -  [email protected]
> To unsubscribe or change subscription options:
> http://mail.python.org/mailman/listinfo/tutor
>
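A `\x00` before every character is the usual sign of UTF-16 encoded data, where each ASCII character takes two bytes, one of them zero. A sketch, assuming that is what is happening here: decode the bytes to text first, and only then split on tags. The sample string below is made up; for a real file, `open(path, encoding="utf-16")` lets the codec pick the byte order from the BOM if the file has one. This is Python 3; in Python 2 the equivalent step is calling `.decode(...)` on the byte string before `re.split`.

```python
import re

# Made-up sample: "<p>Trial</p>" encoded as UTF-16 big-endian, which puts a
# NUL byte (\x00) in front of every ASCII character.
raw = "<p>Trial</p>".encode("utf-16-be")
print(raw[:8])  # b'\x00<\x00p\x00>\x00T'

# Decode to text *before* splitting on tags.
decoded = raw.decode("utf-16-be")
text = [part for part in re.split(r"<.*?>", decoded) if part]
print(text)  # ['Trial']
```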
