On Wed, Dec 20, 2006 at 10:59:06AM -0800, yichun wei wrote:
> I am trying to grab some html pages via KHTMLPart.openURL and scrape
> the content I get. However I am not able to read out the HTML document
> sources I have in KHTMLPart.

just call:

    domDocu = part.document()
    html = domDocu.toString().string()

that's a QString.

> kdelibs has KHTML::documentSource in khtml that can return the source of the
> pages since 2005, however I only found .document() in pyKDE.

yes; either it disappeared from the sources, or sip didn't pick it up, or
something.

> toHTML() seemed to return nothing (None or ""), while toString() gave
> me an exception and my script crashed:

yes, under certain circumstances that happens. I think it's because the
KHTMLPart has no parentWidget or no parent or both. if you set up the whole
apparatus for showing the part, everything works just fine.

> I find
> some discussion which point me to use KIO.get, but it returns a
> TransferJob and I have no idea how to get a QString from a
> TransferJob...

the KIO jobs[1] emit a data() signal each time a chunk of data arrives. just
use a KIO::get job and connect that signal to a slot that accumulates the
data. there's another signal, result(), when the job finishes. you could also
use NetAccess[2].

--
[1] http://developer.kde.org/documentation/library/3.5-api/kdelibs-apidocs/kio/kio/html/index.html
[2] http://developer.kde.org/documentation/library/3.5-api/kdelibs-apidocs/kio/kio/html/classKIO_1_1NetAccess.html

_______________________________________________
PyKDE mailing list    PyKDE@mats.imk.fraunhofer.de
http://mats.imk.fraunhofer.de/mailman/listinfo/pykde
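
For the KIO route described in the reply, the accumulate-then-finish pattern could look roughly like the sketch below. The class `ChunkAccumulator`, its method names, and the `to_bytes` converter are hypothetical names made up for illustration. The accumulator itself is plain Python, so it can be tried without KDE. The wiring to a real job appears only in comments because it needs a running KApplication; the module names, the `KIO.get` arguments, and the signal signatures there are assumptions based on the KDE 3.5 API and are untested.

```python
class ChunkAccumulator(object):
    """Collects chunks from a data()-style signal until a result()-style signal fires."""

    def __init__(self, on_done, to_bytes=bytes):
        self._chunks = []
        self._on_done = on_done      # called once with the complete payload
        self._to_bytes = to_bytes    # assumption: converts a QByteArray to a byte string
        self.finished = False

    def slot_data(self, job, chunk):
        # May be called many times for a single page; empty chunks are skipped.
        data = self._to_bytes(chunk)
        if data:
            self._chunks.append(data)

    def slot_result(self, job):
        # Called once when the job ends; hand over everything collected so far.
        self.finished = True
        self._on_done(b"".join(self._chunks))


# Assumed wiring in a PyKDE 3 program (untested, needs a running KApplication):
#
#   from qt import QObject, SIGNAL
#   from kdecore import KURL
#   from kio import KIO
#
#   acc = ChunkAccumulator(handle_page)
#   job = KIO.get(KURL("http://www.kde.org/"), False, False)
#   QObject.connect(job, SIGNAL("data(KIO::Job*, const QByteArray&)"), acc.slot_data)
#   QObject.connect(job, SIGNAL("result(KIO::Job*)"), acc.slot_result)

if __name__ == "__main__":
    # Simulated delivery, standing in for the signals a real job would emit.
    pages = []
    acc = ChunkAccumulator(pages.append)
    for piece in (b"<html>", b"hi", b"</html>"):
        acc.slot_data(None, piece)
    acc.slot_result(None)
    print(pages[0])
```

A real slot_result would also want to check the job for an error before using the data, and the collected bytes still have to be decoded with the page's charset before they can be treated as text.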