Ezio Melotti added the comment:
Thanks for the report.
This is a duplicate of #34480.
--
nosy: +ezio.melotti
resolution: -> duplicate
stage: -> resolved
status: open -> closed
type: compile error -> behavior
___
Python tracker
if not match:
    return -1
if report:
    j = match.start(0)
    self.unknown_decl(rawdata[i+3: j])
return match.end(0)
`match` should be set to None in the fall-through else statement right before
`if not match`.
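The suggested fix can be sketched in isolation. The function below is a simplified, hypothetical model of the scanner quoted above (the regexes and section names are illustrative, not the real html.parser source); the point is the `match = None` on the fall-through branch:

```python
import re

_close = re.compile(r'\]\s*\]\s*>')
_comment_close = re.compile(r'--\s*>')

def parse_marked_section(rawdata, i, report=True):
    """Simplified model of the marked-section scanner: return the index
    past the closing delimiter, or -1 if the section is incomplete."""
    sect = rawdata[i+3:i+8].lower()
    if sect.startswith(('cdata', 'if', 'else')):
        match = _close.search(rawdata, i+3)
    elif sect.startswith('--'):
        match = _comment_close.search(rawdata, i+3)
    else:
        match = None  # the missing fall-through assignment proposed above
    if not match:
        return -1
    if report:
        print(rawdata[i+3:match.start(0)])  # stand-in for self.unknown_decl(...)
    return match.end(0)

idx = parse_marked_section('<![bogus stuff', 0)
print(idx)  # -1: unknown section kind, match fell through to None
```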
--
components: Library (Lib)
messa
OK, I get the basics of this and I have been doing some successful parsing,
using regular expressions to find HTML tags. I have tried to find an img tag
and write that image to a file. I have had no success. It says it has
successfully written the image to the file within a try... except
Don't worry, it has been solved
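For the archives, here is one regex-free way to pull img tags out with the standard-library HTMLParser (shown in its Python 3 spelling, html.parser); the markup fed in below is made up for illustration:

```python
from html.parser import HTMLParser

class ImgCollector(HTMLParser):
    """Collect the src attribute of every <img> tag seen."""
    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the tag
        if tag == 'img':
            for name, value in attrs:
                if name == 'src':
                    self.sources.append(value)

parser = ImgCollector()
parser.feed('<p><img src="a.png"><img src="b.jpg" alt="x"></p>')
print(parser.sources)  # ['a.png', 'b.jpg']
```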
--
https://mail.python.org/mailman/listinfo/python-list
Hi all,
I am trying to parse some HTML string with BeautifulSoup.
The string is,
<table colWidths='530.0' style='Table_Main_Table'>
<tr>
<td>
<blockTable colWidths='54.0,80.0,67.0' style='Table_Tax_Header'>
<tr>
<th>
<p>
On Wed, Jan 5, 2011 at 2:58 PM, Selvam s.selvams...@gmail.com wrote:
Hi all,
I am trying to parse some HTML string with BeautifulSoup.
The string is,
<table colWidths='530.0' style='Table_Main_Table'>
<tr>
<td>
<blockTable colWidths='54.0,80.0,67.0'
[EMAIL PROTECTED] wrote:
Hi everyone
I am trying to build my own web crawler for an experiment and I don't
know how to access the HTTP protocol with Python.
Also, are there any open-source parsing engines for HTML documents
available in Python too? That would be great.
Check out Mechanize. It
Stefan Behnel [EMAIL PROTECTED]:
[EMAIL PROTECTED] wrote:
I am trying to build my own web crawler for an experiment and I don't
know how to access the HTTP protocol with Python.
Also, are there any open-source parsing engines for HTML documents
available in Python too? That would be great.
Hi everyone
I am trying to build my own web crawler for an experiment and I don't
know how to access the HTTP protocol with Python.
Also, are there any open-source parsing engines for HTML documents
available in Python too? That would be great.
On Sat, 28 Jun 2008 19:03:39 -0700, disappearedng wrote:
Hi everyone
I am trying to build my own web crawler for an experiment and I don't
know how to access the HTTP protocol with Python.
Also, are there any open-source parsing engines for HTML documents
available in Python too? That would be
On Jun 28, 9:03 pm, [EMAIL PROTECTED] wrote:
Hi everyone
I am trying to build my own web crawler for an experiment and I don't
know how to access the HTTP protocol with Python.
Look at the httplib module.
Also, are there any open-source parsing engines for HTML documents
available in Python
Hi everyone
Hello
I am trying to build my own web crawler for an experiment and I don't
know how to access the HTTP protocol with Python.
urllib2: http://docs.python.org/lib/module-urllib2.html
Also, are there any open-source parsing engines for HTML documents
available in Python too? That would
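In Python 3 urllib2 became urllib.request; a minimal sketch of building a request without actually fetching anything over the network (the URL is the one from the post above, and the User-Agent string is an invented example):

```python
import urllib.request  # urllib2 was merged into urllib.request in Python 3

# Build a Request object; nothing is sent until urlopen() is called.
req = urllib.request.Request(
    'http://docs.python.org/lib/module-urllib2.html',
    headers={'User-Agent': 'my-crawler/0.1'},  # hypothetical crawler name
)
print(req.get_method(), req.host)  # GET docs.python.org
```

Fetching the page is then `urllib.request.urlopen(req).read()`.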
[EMAIL PROTECTED] wrote:
I am trying to build my own web crawler for an experiment and I don't
know how to access the HTTP protocol with Python.
Also, are there any open-source parsing engines for HTML documents
available in Python too? That would be great.
Try lxml.html. It parses broken HTML,
a
specific problem which I attempted to lay out clearly from the
outset.
I was asking this community if there was a simple way to use only the
tools included with Python to parse a bit of html.
There are lots of ways of doing HTML parsing in Python. A common
one is e.g. using mxTidy to convert
The pages I'm trying to write this code to run against aren't in the
wild, though. They are static html files on my company's lan, are very
consistent in format, and are (I believe) valid html.
The obvious way to check this is to go to http://validator.w3.org/ and see
what it tells you about your
On Jan 23, 3:54 am, M.-A. Lemburg [EMAIL PROTECTED] wrote:
I was asking this community if there was a simple way to use only the
tools included with Python to parse a bit of html.
There are lots of ways of doing HTML parsing in Python. A common
one is e.g. using mxTidy to convert the HTML
On Jan 23, 2008 7:40 AM, Alnilam [EMAIL PROTECTED] wrote:
Skipping past html validation, and html to xhtml 'cleaning', and
instead starting with the assumption that I have files that are valid
XHTML, can anyone give me a good example of how I would use _htmllib,
HTMLParser, or ElementTree_
En Wed, 23 Jan 2008 10:40:14 -0200, Alnilam [EMAIL PROTECTED] escribió:
Skipping past html validation, and html to xhtml 'cleaning', and
instead starting with the assumption that I have files that are valid
XHTML, can anyone give me a good example of how I would use _htmllib,
HTMLParser, or
On Jan 22, 4:31 pm, Alnilam [EMAIL PROTECTED] wrote:
Sorry for the noob question, but I've gone through the documentation
on python.org, tried some of the diveintopython and boddie's examples,
and looked through some of the numerous posts in this group on the
subject and I'm still rather
On 22 Jan, 06:31, Alnilam [EMAIL PROTECTED] wrote:
Sorry for the noob question, but I've gone through the documentation
on python.org, tried some of the diveintopython and boddie's examples,
and looked through some of the numerous posts in this group on the
subject and I'm still rather
Pardon me, but the standard issue Python 2.n (for n in range(5, 2,
-1)) doesn't have an xml.dom.ext ... you must have the mega-monstrous
200-modules PyXML package installed. And you don't want the 75Kb
BeautifulSoup?
I wasn't aware that I had PyXML installed, and can't find a reference
to
On Jan 22, 7:44 am, Alnilam [EMAIL PROTECTED] wrote:
...I move from computer to
computer regularly, and while all have a recent copy of Python, each
has different (or no) extra modules, and I don't always have the
luxury of downloading extras. That being said, if there's a simple way
of doing
On Jan 22, 8:44 am, Alnilam [EMAIL PROTECTED] wrote:
Pardon me, but the standard issue Python 2.n (for n in range(5, 2,
-1)) doesn't have an xml.dom.ext ... you must have the mega-monstrous
200-modules PyXML package installed. And you don't want the 75Kb
BeautifulSoup?
I wasn't aware
Alnilam wrote:
On Jan 22, 8:44 am, Alnilam [EMAIL PROTECTED] wrote:
Pardon me, but the standard issue Python 2.n (for n in range(5, 2,
-1)) doesn't have an xml.dom.ext ... you must have the mega-monstrous
200-modules PyXML package installed. And you don't want the 75Kb
BeautifulSoup?
I
On Jan 22, 11:39 am, Diez B. Roggisch [EMAIL PROTECTED] wrote:
Alnilam wrote:
On Jan 22, 8:44 am, Alnilam [EMAIL PROTECTED] wrote:
Pardon me, but the standard issue Python 2.n (for n in range(5, 2,
-1)) doesn't have an xml.dom.ext ... you must have the mega-monstrous
200-modules PyXML
En Tue, 22 Jan 2008 19:20:32 -0200, Alnilam [EMAIL PROTECTED] escribió:
On Jan 22, 11:39 am, Diez B. Roggisch [EMAIL PROTECTED] wrote:
Alnilam wrote:
On Jan 22, 8:44 am, Alnilam [EMAIL PROTECTED] wrote:
Pardon me, but the standard issue Python 2.n (for n in range(5, 2,
-1)) doesn't have
On Jan 22, 7:29 pm, Gabriel Genellina [EMAIL PROTECTED]
wrote:
I was asking this community if there was a simple way to use only the
tools included with Python to parse a bit of html.
If you *know* that your document is valid HTML, you can use the HTMLParser
module in the standard Python
On Jan 22, 7:29 pm, Gabriel Genellina [EMAIL PROTECTED]
wrote:
I was asking this community if there was a simple way to use only the
tools included with Python to parse a bit of html.
If you *know* that your document is valid HTML, you can use the HTMLParser
module in the standard Python
Sorry for the noob question, but I've gone through the documentation
on python.org, tried some of the diveintopython and boddie's examples,
and looked through some of the numerous posts in this group on the
subject and I'm still rather confused. I know that there are some
great tools out there for
On Jun 21, 9:45 pm, Gabriel Genellina [EMAIL PROTECTED]
wrote:
En Thu, 21 Jun 2007 23:37:07 -0300, [EMAIL PROTECTED] escribió:
So for example if I wanted to navigate to an encoded url
http://online.investools.com/landing.iedu?signedin=true rather than
En Thu, 21 Jun 2007 23:37:07 -0300, [EMAIL PROTECTED] escribió:
So for example if I wanted to navigate to an encoded url
http://online.investools.com/landing.iedu?signedin=true rather than
just http://online.investools.com/landing.iedu How would I do this?
How can I modify the script to
I've written a script that navigates various URLs on a website and
fetches the contents.
The URLs are being fed from a list, urlList. Everything seems to
work splendidly, until I introduce the concept of encoding parameters
for a certain url.
So for example if I wanted to navigate to an encoded
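One way to build such encoded parameter strings is urllib's urlencode, shown here in its Python 3 home, urllib.parse (the parameter names are invented for the example):

```python
from urllib.parse import urlencode  # was urllib.urlencode in Python 2

# Turn a dict of parameters into a correctly escaped query string.
params = urlencode({'signedin': 'true', 'page': 1})
print(params)  # signedin=true&page=1

# Append it to the base URL before fetching.
url = 'http://online.investools.com/landing.iedu?' + params
```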
On Jun 15, 2:01, Stefan Behnel [EMAIL PROTECTED] wrote:
Jackie wrote:
import lxml.etree as et
url = "http://www.economics.utoronto.ca/index.php/index/person/faculty/"
tree = et.parse(url)
Stefan
Thank you. But when I tried to run the above part, the following
Jackie schrieb:
On Jun 15, 2:01, Stefan Behnel [EMAIL PROTECTED] wrote:
Jackie wrote:
import lxml.etree as et
url = "http://www.economics.utoronto.ca/index.php/index/person/faculty/"
tree = et.parse(url)
Stefan
Thank you. But when I tried to run the above
Hi, all,
I want to get the information of the professors (name,title) from the
following link:
http://www.economics.utoronto.ca/index.php/index/person/faculty/
Ideally, I'd like to have an output file where each line is one Prof,
including his name and title. In practice, I use
Hi, all,
I want to get the information of the professors (name,title) from the
following link:
http://www.economics.utoronto.ca/index.php/index/person/faculty/
Ideally, I'd like to have an output file where each line is one Prof,
including his name and title. In practice, I use the CSV module.
[ Jackie [EMAIL PROTECTED] ]
1. The code above assumes that each Prof has a title. If any one of them
does not, the name and title will be mismatched. How do I program it to
allow the title to be empty?
2. Is there any easier way to get the data I want other than using a
list?
Use BeautifulSoup.
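Whatever parser does the scraping, question 1 is easy to handle at the output stage: carry the title alongside the name as an optional value and write an empty field when it is missing. A sketch with the standard csv module (the rows are made up for illustration):

```python
import csv
import io

# Hypothetical scraped (name, title) pairs; the second Prof has no title.
rows = [('Ada Lovelace', 'Professor'), ('Alan Turing', None)]

buf = io.StringIO()
writer = csv.writer(buf)
for name, title in rows:
    writer.writerow([name, title or ''])  # empty field when title is missing

print(buf.getvalue())
```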
Jackie wrote:
I want to get the information of the professors (name,title) from the
following link:
http://www.economics.utoronto.ca/index.php/index/person/faculty/
That's even XHTML, no need to go through BeautifulSoup. Use lxml instead.
http://codespeak.net/lxml
Ideally, I'd like to
John Machin wrote:
One can even use ElementTree, if the HTML is well-formed. See below.
However if it is as ill-formed as the sample (4th td element not
closed; I've omitted it below), then the OP would be better off
sticking with Beautiful Soup :-)
Or (as we were talking about the best of
On Feb 11, 6:05 pm, Ayaz Ahmed Khan [EMAIL PROTECTED] wrote:
mtuller typed:
I have also tried Beautiful Soup, but had trouble understanding the
documentation
As Gabriel has suggested, spend a little more time going through the
documentation of BeautifulSoup. It is pretty easy to grasp.
John Machin wrote:
One can even use ElementTree, if the HTML is well-formed. See below.
However if it is as ill-formed as the sample (4th td element not
closed; I've omitted it below), then the OP would be better off
sticking with Beautiful Soup :-)
or get the best of both worlds:
Alright. I have tried everything I can find, but am not getting
anywhere. I have a web page that has data like this:
<tr>
<td headers="col1_1" style="width:21%">
<span class="hpPageText">LETTER</span></td>
<td headers="col2_1" style="width:13%; text-align:right">
<span class="hpPageText">33,699</span></td>
<td
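Once a fragment like the one above has been repaired into well-formed XML (quotes around attribute values, every tag closed), the standard ElementTree module can pull out the span texts directly; this sketch uses a hand-repaired copy of the snippet:

```python
import xml.etree.ElementTree as ET

# Hand-repaired, well-formed version of the fragment quoted above.
snippet = """<tr>
<td headers="col1_1"><span class="hpPageText">LETTER</span></td>
<td headers="col2_1"><span class="hpPageText">33,699</span></td>
</tr>"""

row = ET.fromstring(snippet)
values = [span.text for span in row.iter('span')]
print(values)  # ['LETTER', '33,699']
```

If the page cannot be repaired this cleanly, a forgiving parser (Beautiful Soup, lxml.html) remains the better fit, as the thread says.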
En Sat, 10 Feb 2007 20:07:43 -0300, mtuller [EMAIL PROTECTED] escribió:
<tr>
<td headers="col1_1" style="width:21%">
<span class="hpPageText">LETTER</span></td>
<td headers="col2_1" style="width:13%; text-align:right">
<span class="hpPageText">33,699</span></td>
<td headers="col3_1" style="width:13%;
mtuller typed:
I have also tried Beautiful Soup, but had trouble understanding the
documentation
As Gabriel has suggested, spend a little more time going through the
documentation of BeautifulSoup. It is pretty easy to grasp.
I'll give you an example: I want to extract the text between the
On Nov 13, 1:12 pm, [EMAIL PROTECTED] wrote:
I need a help on HTML parser.
snip
I saw a couple of python parsers like pyparsing, yappy, yapps, etc. but
they haven't given any example for HTML parsing.
Geez, how hard did you look? pyparsing's wiki menu includes an
'Examples' link, which take
example for HTML parsing. Someone recommended
using lynx to convert the page into text and parse the data. That
also looks good, but still I end up writing a huge chunk of code for
each web page.
What we need is,
One nice parser which should work on HTML/text file (lynx output) and
work based
[EMAIL PROTECTED] wrote:
I need a help on HTML parser.
http://www.effbot.org/pyfaq/tutor-how-do-i-get-data-out-of-html.htm
/F
. But Crawler, Parser and Indexer need to run unattended. I
don't know how to proceed next..
I saw a couple of python parsers like pyparsing, yappy, yapps, etc. but
they haven't given any example for HTML parsing. Someone recommended
using lynx to convert the page into text and parse the data
[EMAIL PROTECTED] wrote:
I am involved in one project which tends to collect news
information published on selected, known web sites in the format of
HTML, RSS, etc
I just can't imagine why anyone would still want to do this.
With RSS, it's an easy (if not trivial) problem.
With HTML
[EMAIL PROTECTED] wrote:
I am involved in one project which tends to collect news
information published on selected, known web sites in the format of
HTML, RSS, etc., and shortlist them and create a bookmark on our website
for the news content(we will use django for web development). Currently
this is a comment in JavaScript, which is itself inside an HTML comment
Did you read the post?
misread it rather ...
Istvan Albert [EMAIL PROTECTED] wrote:
this is a comment in JavaScript, which is itself inside an HTML comment
Don't nest HTML comments. Occasionally it may break the browsers as
well.
Did you read the post? He didn't nest HTML comments. He put a Javascript
comment inside an HTML comment,
[EMAIL PROTECTED] wrote:
Python 2.3.5 seems to choke when trying to parse html files, because it
doesn't realize that what's inside <!-- --> is a comment in HTML,
even if this comment is inside <script> </script>, especially if it's a
comment inside that script code too.
nope. what's inside
Python 2.3.5 seems to choke when trying to parse html files, because it
doesn't realize that what's inside <!-- --> is a comment in HTML,
even if this comment is inside <script> </script>, especially if it's a
comment inside that script code too.
The html file:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD
// </html> - this is a comment in JavaScript, which is itself inside
an HTML comment
This is supposed to be one line. Got wrapped during posting.
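For what it's worth, the modern html.parser (the Python 3 descendant of this module) sidesteps exactly this trap: everything between <script> and </script> is delivered as raw character data, so an HTML comment inside script code never reaches handle_comment. A small demonstration (the script content is invented):

```python
from html.parser import HTMLParser

class Collector(HTMLParser):
    """Record HTML comments and character data separately."""
    def __init__(self):
        super().__init__()
        self.comments = []
        self.data = []

    def handle_comment(self, data):
        self.comments.append(data)

    def handle_data(self, data):
        self.data.append(data)

p = Collector()
p.feed('<script><!-- var x = 1; // a JS comment --></script>')
print(p.comments)       # [] -- nothing was parsed as an HTML comment
print(''.join(p.data))  # the whole script body arrives as raw data
```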
[EMAIL PROTECTED] wrote in message
news:[EMAIL PROTECTED]
Python 2.3.5 seems to choke when trying to parse html files, because it
doesn't realize that what's inside <!-- --> is a comment in HTML,
even if this comment is inside <script> </script>, especially if it's a
comment inside that
this is a comment in JavaScript, which is itself inside an HTML comment
Don't nest HTML comments. Occasionally it may break the browsers as
well.
(I remember this from one of the weirdest of bughunts: whenever the
number of characters between nested HTML comments was divisible by four
the page
Take a look at SW Explorer Automation
(http://home.comcast.net/~furmana/SWIEAutomation.htm) (SWEA). SWEA
creates an object model (automation interface) for any Web application
running in Internet Explorer. It supports all IE functionality: frames,
JavaScript, dialogs, downloads.
The runtime can
Sanjay Arora [EMAIL PROTECTED] writes:
We are looking to select the language toolset more suitable for a
project that requires getting data from several web-sites in
real-time html parsing/scraping. It would require full emulation of the
browser, including handling cookies, automated
John J. Lee wrote:
Sanjay Arora [EMAIL PROTECTED] writes:
We are looking to select the language toolset more suitable for a
project that requires getting data from several web-sites in
real-time html parsing/scraping. It would require full emulation of the
browser, including
The standard library module for fetching HTML is urllib2.
The best module for scraping the HTML is BeautifulSoup.
There is a project called mechanize, built by John Lee on top of
urllib2 and other standard modules.
It will emulate a browser's behaviour - including history, cookies,
basic
Fuzzyman [EMAIL PROTECTED] writes:
The standard library module for fetching HTML is urllib2.
Does urllib2 replace everything in urllib? I thought there was some
urllib functionality that urllib2 didn't do.
There is a project called mechanize, built by John Lee on top of
urllib2 and other
We are looking to select the language toolset more suitable for a
project that requires getting data from several web-sites in
real-time html parsing/scraping. It would require full emulation of the
browser, including handling cookies, automated logins following
multiple web-link paths
Sanjay Arora [EMAIL PROTECTED] writes:
We are looking to select the language toolset more suitable for a
project that requires getting data from several web-sites in
real-time html parsing/scraping. It would require full emulation of the
browser, including handling cookies, automated
Hi all,
Please help me in parsing the HTML document
and extracting the http links.
Thanks in advance!
Suchitra
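A self-contained sketch with the standard-library HTMLParser (Python 3 spelling; the sample markup is invented) that keeps only http/https links:

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag that points at http(s)."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value.startswith(('http://', 'https://')):
                    self.links.append(value)

extractor = LinkExtractor()
extractor.feed('<a href="http://python.org">Python</a> <a href="#top">top</a>')
print(extractor.links)  # ['http://python.org']
```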