On Wed, Dec 20, 2006 at 10:59:06AM -0800, yichun wei wrote:
> I am trying to grab some html pages via KHTMLPart.openURL and scrape
> the content I get. However I am not able to read out the HTML document
> sources I have in KHTMLPart.

    just call:

domDocu = part.document()
html = domDocu.toString().string()

    that's a QString.

> kdelibs has KHTML::documentSource in khtml that can return the source of the
> pages since 2005, however I only found .document() in pyKDE. 

    yes; either it disappeared from the sources or sip didn't pick it up
or something.

> toHTML() seemed to return nothing (None or ""), while toString() gave
> me an exception and my script crashed:

    yes, under certain circumstances that happens. I think it's because
the KHTMLPart has no parentWidget, no parent, or both are missing. if you
set up the whole apparatus for showing the part, everything works just fine.

> I find
> some discussion which point me to use KIO.get, but it returns a
> TransferJob and I have no idea how to get a QString from a
> TransferJob...

    the KIO jobs[1] emit a data() signal whenever a chunk of data arrives.
just create a KIO::get job and connect that signal to a slot that
accumulates the data. another signal, result(), is emitted when the job
finishes. you could also use NetAccess[2].
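
    a rough sketch of that accumulate-on-data() idea, in case it helps. the
names PageAccumulator and start_fetch are made up for this example, and I'm
assuming the PyKDE 3 / PyQt 3 bindings here: that QByteArray gives you its
bytes via data(), and that the signal signatures are spelled as below. the
job only delivers anything once a KApplication event loop is running, so
treat this as a starting point, not tested code for your setup.

```python
class PageAccumulator(object):
    """Collects the chunks delivered by a job's data() signal."""

    def __init__(self):
        self.chunks = []
        self.finished = False
        self.error = 0

    def slot_data(self, job, chunk):
        # Assumption: under PyQt 3 a QByteArray exposes its bytes via data().
        # Plain byte strings are accepted too, so this class runs without KDE.
        raw = chunk.data() if hasattr(chunk, "data") else chunk
        if raw:  # KIO sends an empty chunk to mark the end of the data
            self.chunks.append(raw)

    def slot_result(self, job):
        # job.error() is non-zero when the transfer failed.
        if job is not None:
            self.error = job.error()
        self.finished = True

    def content(self):
        # Everything received so far, joined into one byte string.
        return b"".join(self.chunks)


def start_fetch(url, acc):
    """Start a KIO get job for url and feed its data into acc (hypothetical helper)."""
    # Imported lazily so the accumulator above can be used without PyKDE.
    from qt import QObject, SIGNAL
    from kdecore import KURL
    from kio import KIO

    # Arguments after the URL: reload=False, showProgressInfo=False.
    job = KIO.get(KURL(url), False, False)
    QObject.connect(job, SIGNAL("data(KIO::Job*, const QByteArray&)"),
                    acc.slot_data)
    QObject.connect(job, SIGNAL("result(KIO::Job*)"), acc.slot_result)
    return job  # keep a reference so the job is not garbage collected
```

    once slot_result has run, acc.content() holds the raw page bytes; check
acc.error first, and decode the bytes yourself if you need a unicode string
or a QString.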

--
[1] http://developer.kde.org/documentation/library/3.5-api/kdelibs-apidocs/kio/kio/html/index.html

[2] http://developer.kde.org/documentation/library/3.5-api/kdelibs-apidocs/kio/kio/html/classKIO_1_1NetAccess.html
-- 
(Not so) Random fortune:
[11:50] <xanthus> m4rgin4l: yes, but it's a civilized country, chaotic as it
may be
            -- xanthus, talking about Argentina.

