We also use WebKit for what essentially boils down to crawling, because of the scenarios liuyang06 described. Ajax-driven sites make life very interesting, as you can't be sure when the content you are interested in is going to show up. With normal curl/wget-like tools or calls this becomes very hard to manage, especially if you are working with sites that need authentication and use JavaScript heavily. Simple actions such as submitting a form get very complicated, because we have to reverse engineer the site and see what actually gets submitted after the JavaScript processing is done. It's much easier to build robotic actions: set text on this input, click there, let JavaScript massage the data and submit, and then read this content when it appears.
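As a rough illustration of the "read this content when it appears" step, here is a minimal polling sketch in Python. It is not tied to any WebKit binding: `wait_for` and `FakePage` are hypothetical names made up for this example, and `FakePage` only simulates a page whose result shows up some time after a click.

```python
import time


def wait_for(condition, timeout=10.0, interval=0.1):
    """Poll condition() until it returns something truthy or the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = condition()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError("content did not appear within %.1f s" % timeout)
        time.sleep(interval)


class FakePage:
    """Hypothetical stand-in for a browser page; a real one would wrap WebKit."""

    def __init__(self, delay):
        self._delay = delay        # seconds the simulated JavaScript needs after a click
        self._ready_at = None      # set once the form has been "submitted"
        self.fields = {}

    def set_text(self, selector, value):
        self.fields[selector] = value

    def click(self, selector):
        # Simulate JavaScript that submits the form and fills in the result later.
        self._ready_at = time.monotonic() + self._delay

    def find(self, selector):
        # Selectors are ignored here; the result exists only after the delay has passed.
        if self._ready_at is not None and time.monotonic() >= self._ready_at:
            return "<div id='result'>hello %s</div>" % self.fields.get("#user", "")
        return None


if __name__ == "__main__":
    page = FakePage(delay=0.2)
    page.set_text("#user", "alice")
    page.click("#submit")
    print(wait_for(lambda: page.find("#result"), timeout=2.0, interval=0.05))
```

The point of the timeout is that an Ajax-driven page may never deliver the content, so the robot should fail loudly rather than hang.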

Agreed that WebKit is heavy for these operations, but after experimenting with a lot of the sites we want to process and with the tools that were and are available, we concluded it was the best technology. With Xvfb it works perfectly. My next goal is to experiment with the network process model and see if we can reduce resource consumption a little more.
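For anyone who has not run a GUI toolkit headless before, one common way to get an Xvfb display is the `xvfb-run` wrapper. The sketch below only builds the command line; the helper name `xvfb_command`, the script name `crawler.py`, and the screen size are assumptions for illustration, not what we actually run.

```python
import shlex


def xvfb_command(program_args, screen="1280x1024x24"):
    """Return an argv list that runs program_args under xvfb-run.

    -a picks a free X display number automatically;
    -s passes the given arguments through to the Xvfb server.
    """
    return ["xvfb-run", "-a", "-s", "-screen 0 " + screen] + list(program_args)


if __name__ == "__main__":
    # crawler.py is a hypothetical script name.
    argv = xvfb_command(["python3", "crawler.py"])
    print(shlex.join(argv))
    # To actually launch it (requires xvfb-run to be installed):
    #   import subprocess
    #   subprocess.run(argv, check=True)
```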

On 11/18/2014 09:01 PM, Robert Schroll wrote:
On Tue, Nov 18, 2014 at 8:56 PM, 刘阳 <[email protected]> wrote:
But, as you know, more and more websites load content dynamically with JavaScript, which may add DOM nodes to the HTML depending on what the user does or types. Therefore, I want to write a program that acts like a real user with WebKitGtk, without a GUI.

I admit I've never used it myself, but it sounds like you're looking for Ghost.py: https://github.com/jeanphix/Ghost.py

Robert

_______________________________________________
webkit-gtk mailing list
[email protected]
https://lists.webkit.org/mailman/listinfo/webkit-gtk

