Attached is the essence of my crawler. It collects the tags in a given URL.
HTML parsing is not a big deal, as "tidy" does it all for you. It converts
broken HTML
to valid XHTML. From that point there's a wealth of XML libraries. Just write
whatever you want, such as an element handler.
I've extended it
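For readers without the attachment, here is a rough sketch of the element-handler idea described above. It is not the attached crawler. It assumes tidy has already turned the page into well-formed XHTML, so the tidy step and the download are left out. The names collect_tags and SAMPLE_XHTML are made up for illustration, and the sample markup is a stand-in for real tidy output.

```python
import xml.parsers.expat
from collections import Counter

# Hypothetical stand-in for what tidy might emit for a small page.
SAMPLE_XHTML = (
    "<html><head><title>t</title></head>"
    '<body><p>one</p><p>two <a href="http://example.com/">link</a></p>'
    "</body></html>"
)


def collect_tags(xhtml):
    """Count element names in a well-formed XHTML string."""
    counts = Counter()

    def start_element(name, attrs):
        # expat calls this once for every opening tag it encounters.
        counts[name] += 1

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = start_element
    parser.Parse(xhtml, True)  # True marks this as the final chunk
    return counts


if __name__ == "__main__":
    for tag, n in sorted(collect_tags(SAMPLE_XHTML).items()):
        print(tag, n)
```

The handler only counts tag names; anything else (collecting href attributes, say) would go in the same start_element callback. Input that is not well-formed makes expat raise an error, which is why the cleanup step has to come first.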
In article <[EMAIL PROTECTED]>,
John Nagle <[EMAIL PROTECTED]> wrote:
> abeen wrote:
> > Hello,
> >
> > I would want to know which could be the best programming language for
> > developing web spider.
> > More information about the spider, much better,,
>
> As someone who actually runs a Py
abeen wrote:
> Hello,
>
> I would want to know which could be the best programming language for
> developing web spider.
> More information about the spider, much better,,
As someone who actually runs a Python based web spider in production, I
should comment.
You need a very robust parse
The O'Reilly Spidering Hacks book is also really good, albeit a little
too focussed on Perl.
On Apr 2, 9:54 am, [EMAIL PROTECTED] wrote:
> On Apr 2, 6:37 am, abeen <[EMAIL PROTECTED]> wrote:
>
> > Hello,
>
> > I would want to know which could be the best programming language for
> > developing we
On Apr 2, 2:54 pm, [EMAIL PROTECTED] wrote:
> On Apr 2, 6:37 am, abeen <[EMAIL PROTECTED]> wrote:
>
> > Hello,
>
> > I would want to know which could be the best programming language for
> > developing web spider.
> > More information about the spider, much better,,
>
> > thanks
>
> >http://www.ima
On Apr 2, 6:37 am, abeen <[EMAIL PROTECTED]> wrote:
> Hello,
>
> I would want to know which could be the best programming language for
> developing web spider.
> More information about the spider, much better,,
>
> thanks
>
> http://www.imavista.com
Just saw this while passing by... There's a nice
abeen <[EMAIL PROTECTED]> wrote:
> I would want to know which could be the best programming language for
> developing web spider.
Since you ask in comp.lang.python: I'd suggest APL
--
Web (en): http://www.no-spoon.de/ -*- Web (de): http://www.frell.de/
> I would want to know which could be the best programming language for
> developing web spider.
> More information about the spider, much better,,
I hear Larry and Sergey were not exactly unsuccessful with a Python
implementation, although you might of course try something even better
:)
If you a