When to use mechanize and Windmill library during WebScraping ?
> Be sure to look at Scrapy too: http://scrapy.org

Thank you.
Raji. S
--
http://mail.python.org/mailman/listinfo/python-list
When to use mechanize and Windmill library during WebScraping ?
> -- Forwarded message --
> From: Javier Collado
> To: Raji Seetharaman
> Date: Sat, 12 Dec 2009 12:52:27 +0100
> Subject: Re: When to use mechanize and Windmill library during WebScraping ?
>
> Hello,
>
> If a script that uses mechanize fails to find an HTML node that has
> been identified with Firebug, this is probably because that node has
> been autogenerated (provided that the expression to get the node is
> correct).
>
> As an alternative way to verify this, you can try to download the HTML
> page and open it in your favourite editor. If some of the nodes that
> you can see in your browser are missing or empty, then one of the
> JavaScript scripts in the page must have created/populated them.
>
> If you're in doubt, you can try to use mechanize and, if you have
> problems such as the one described above, then you can move to windmill or
> some other tool that executes JavaScript code before trying to get the
> desired data.
>
> Best regards,
> Javier

Thanks for your help.
Raji. S
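A rough sketch of the check Javier describes, written for Python 3 and the standard library rather than the Python 2 tools used in this thread. It looks at the raw, un-rendered HTML and reports whether a node with a given id is missing, empty, or already populated. The names `NodeTextFinder`, `classify_node` and the sample page are made up for illustration, and the depth counting assumes reasonably well-formed markup.

```python
from html.parser import HTMLParser


class NodeTextFinder(HTMLParser):
    """Collect the text inside the first element that has a given id."""

    # Void elements never get a closing tag, so they must not change depth.
    VOID = {"area", "base", "br", "col", "embed", "hr", "img", "input",
            "link", "meta", "source", "track", "wbr"}

    def __init__(self, target_id):
        super().__init__()
        self.target_id = target_id
        self.found = False  # True once the target element has been seen
        self.depth = 0      # > 0 while the parser is inside the target
        self.text = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag not in self.VOID:
                self.depth += 1
        elif not self.found and dict(attrs).get("id") == self.target_id:
            self.found = True
            if tag not in self.VOID:
                self.depth = 1

    def handle_endtag(self, tag):
        if self.depth and tag not in self.VOID:
            self.depth -= 1

    def handle_data(self, data):
        if self.depth:
            self.text.append(data)


def classify_node(raw_html, target_id):
    """Return 'missing', 'empty' or 'present' for the node in the raw HTML."""
    finder = NodeTextFinder(target_id)
    finder.feed(raw_html)
    if not finder.found:
        return "missing"
    return "present" if "".join(finder.text).strip() else "empty"


# Hypothetical page: the "jobs" node is only a placeholder that a script fills.
SAMPLE_PAGE = (
    '<html><body><div id="jobs"></div>'
    '<p id="contact">mail: someone@example.org</p>'
    '<script>fill("jobs")</script></body></html>'
)
```

If a node that Firebug shows with content comes back as "missing" or "empty" here, that suggests JavaScript builds it in the browser, which is the case where a tool such as windmill is needed. The raw HTML for a real page could be fetched with `urllib.request.urlopen(url).read().decode()` before being passed to `classify_node`.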
A try with WebScraping using Python
Hi,

From a tutorial found on the net I came to know about web scraping using Python, and I thought I would give it a try.

I want to extract the contact mail IDs from all the posts published so far at the link below:
http://fossjobs.wordpress.com/

With the Firebug add-on it's easy to find the location of the mail IDs inside the HTML DOM tree. I don't know how to download all the web pages, i.e., the coding part.

Which library can I use to download them (mechanize or windmill)?

Thanks
Raji. S
http://sraji.wordpress.com/
When to use mechanize and Windmill library during WebScraping ?
Hi,

For 'Web scraping with Python', the mechanize or urllib2 libraries, or the windmill or selenium libraries, are used to download the web pages.

http://www.packtpub.com/article/web-scraping-with-python
The above link uses the mechanize library to download the web pages.

http://www.packtpub.com/article/web-scraping-with-python-part-2
The above link uses the windmill library to download the web pages.

I don't know when to use mechanize and when to use windmill. It has been said that the windmill library is used when the HTML is generated by JavaScript code. I also don't know how to tell whether the HTML is generated by JavaScript code or not.

Any suggestions?

Thanks
Raji. S
http://sraji.wordpress.com/
Re: Error in Windmill
On Thu, Nov 12, 2009 at 5:58 PM, Himanshu wrote:
> 2009/11/12 Raji Seetharaman:
> > Hi
> >
> > I'm learning web scraping with Python from here:
> > http://www.packtpub.com/article/web-scraping-with-python-part-2
> >
> > The complete code from the above link is here:
> > http://pastebin.com/m10046dc6
> >
> > When I run the program in the terminal I receive the following errors:
> >
> >   File "nasa.py", line 41, in <module>
> >     test_scrape_iotd_gallery()
> >   File "nasa.py", line 24, in test_scrape_iotd_gallery
> >     client = WindmillTestClient(__name__)
> >   File "/usr/local/lib/python2.6/dist-packages/windmill-1.3-py2.6.egg/windmill/authoring/__init__.py", line 142, in __init__
> >     method_proxy = windmill.tools.make_jsonrpc_client()
> >   File "/usr/local/lib/python2.6/dist-packages/windmill-1.3-py2.6.egg/windmill/tools/__init__.py", line 35, in make_jsonrpc_client
> >     url = urlparse(windmill.settings['TEST_URL'])
> > AttributeError: 'module' object has no attribute 'settings'
> >
> > Any suggestions?
> >
> > Thanks
> > Raji. S
>
> Google or see
> http://groups.google.com/group/windmill-dev/browse_thread/thread/c921f7a25c0200c9

Thanks for your help.
Error in Windmill
Hi,

I'm learning web scraping with Python from here:
http://www.packtpub.com/article/web-scraping-with-python-part-2

The complete code from the above link is here:
http://pastebin.com/m10046dc6

When I run the program in the terminal I receive the following errors:

    File "nasa.py", line 41, in <module>
      test_scrape_iotd_gallery()
    File "nasa.py", line 24, in test_scrape_iotd_gallery
      client = WindmillTestClient(__name__)
    File "/usr/local/lib/python2.6/dist-packages/windmill-1.3-py2.6.egg/windmill/authoring/__init__.py", line 142, in __init__
      method_proxy = windmill.tools.make_jsonrpc_client()
    File "/usr/local/lib/python2.6/dist-packages/windmill-1.3-py2.6.egg/windmill/tools/__init__.py", line 35, in make_jsonrpc_client
      url = urlparse(windmill.settings['TEST_URL'])
    AttributeError: 'module' object has no attribute 'settings'

Any suggestions?

Thanks
Raji. S
Error received from _mechanize.py
Hi all,

I'm learning web scraping with Python from the following link:
http://www.packtpub.com/article/web-scraping-with-python

To work with it, mechanize has to be installed. I installed mechanize using

    sudo apt-get install python-mechanize

As given in the tutorial, I tried the code below:

    import mechanize
    BASE_URL = "http://www.packtpub.com/article-network"
    br = mechanize.Browser()
    data = br.open(BASE_URL).get_data()

and received the following error:

    File "webscrap.py", line 4, in <module>
      data = br.open(BASE_URL).get_data()
    File "/usr/lib/python2.6/dist-packages/mechanize/_mechanize.py", line 209, in open
      return self._mech_open(url, data, timeout=timeout)
    File "/usr/lib/python2.6/dist-packages/mechanize/_mechanize.py", line 261, in _mech_open
      raise response
    mechanize._response.httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt

Any ideas are welcome.
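The 403 above is raised by mechanize itself: by default it reads the site's robots.txt and refuses to open URLs that the file disallows. The sketch below reproduces that kind of check with Python 3's standard `urllib.robotparser`, so it can be run without mechanize. The helper name `allowed_by_robots` and the robots.txt content are assumptions for illustration; the site's real robots.txt is not shown in this thread.

```python
from urllib.robotparser import RobotFileParser


def allowed_by_robots(robots_txt, url, user_agent="*"):
    """Return True if the given robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    # parse() takes the file's lines, so no network access is needed here.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical robots.txt that disallows the path used in the post.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /article-network
"""

BLOCKED_URL = "http://www.packtpub.com/article-network"
OPEN_URL = "http://www.packtpub.com/books"
```

With this sample file, `allowed_by_robots(SAMPLE_ROBOTS, BLOCKED_URL)` is false and the same call with `OPEN_URL` is true. In mechanize, `br.set_handle_robots(False)` before `br.open(...)` switches the check off; whether it is appropriate to ignore a site's robots.txt is a separate question for whoever runs the scraper.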
AttributeError: 'NoneType' object has no attribute 'get_text'
-- Forwarded message --
> From: MRAB
> To: python-list@python.org
> Date: Sun, 13 Sep 2009 19:44:30 +0100
> Subject: Re: AttributeError: 'NoneType' object has no attribute 'get_text'
>
> Raji Seetharaman wrote:
>
>> Hi all,
>> I did a small GUI address book application using PyGTK, Python and a MySQL db.
>> It was successful.
>> I tried the same application with Glade, but I ended up with errors.
>> I designed the code as follows.
>> 1. It has one main window and four child dialogs.
>> 2. In the main window, I have to fill in the text entry widget, and if I press
>> 'add', the data gets inserted into the database. This works fine.
>> 3. If the showdialog button is clicked, a child dialog appears, where I have
>> to enter the old entry to update.
>> 4. To update, I again use the main window, where I get the following error:
>>
>> Traceback (most recent call last):
>>   File "addressbookglade.py", line 63, in update
>>     self.ssn = self.wTree.get_widget("ssn").get_text()
>> AttributeError: 'NoneType' object has no attribute 'get_text'
>>
>> Also, I already set the name in the properties window. It works fine for the add
>> option, but not for the update option.
>> I'm using the same window for both add and update.
>>
>> The code is available here: http://pastebin.com/m28a4747e
>> The glade XML file is here: http://pastebin.com/m1af61a29
>> The screenshots of my glade windows are here:
>> http://www.flickr.com/photos/raji_me/?saved=1
>>
> You're using instance attributes a lot where I think local variables
> would be better, e.g. "self.ssn" where just "ssn" would do.
>
> In the '__init__' method you have:
>
>     self.wTree = gtk.glade.XML(self.gladefile,"mainWindow")
>
> and then in the 'view' method you have:
>
>     self.wTree = gtk.glade.XML(self.gladefile,"viewdialog")
>
> In both the 'add' and 'update' methods you have:
>
>     self.ssn = self.wTree.get_widget("ssn").get_text()
>
> so I suspect that the following is happening:
>
> 1. __init__ makes self.wTree refer to 'mainWindow';
>
> 2. You click on the Add button, the 'add' method is called, and the
> "self.ssn = " line looks for the "ssn" widget in 'mainWindow';
>
> 3. You click on the OK(?) button and view what's just been added;
>
> 4. The 'view' method makes self.wTree refer to 'viewdialog';
>
> 5. You click on the Update button, the 'update' method is called, and the
> "self.ssn = " line looks for the "ssn" widget in 'viewdialog'.

I'll use local variables. Yes, the "self.ssn = " line looks for the 'ssn' widget in 'viewdialog'.

How do I make the "self.ssn = " line in the update method look in the mainWindow?
Re: AttributeError: 'NoneType' object has no attribute 'get_text'
> -- Forwarded message --
> From: MRAB
> To: python-list@python.org
> Date: Sun, 13 Sep 2009 19:44:30 +0100
> Subject: Re: AttributeError: 'NoneType' object has no attribute 'get_text'
>
> In the '__init__' method you have:
>
>     self.wTree = gtk.glade.XML(self.gladefile,"mainWindow")
>
> and then in the 'view' method you have:
>
>     self.wTree = gtk.glade.XML(self.gladefile,"viewdialog")
>
> In both the 'add' and 'update' methods you have:
>
>     self.ssn = self.wTree.get_widget("ssn").get_text()
>
> so I suspect that the following is happening:
>
> 1. __init__ makes self.wTree refer to 'mainWindow';
>
> 2. You click on the Add button, the 'add' method is called, and the
> "self.ssn = " line looks for the "ssn" widget in 'mainWindow';
>
> 3. You click on the OK(?) button and view what's just been added;
>
> 4. The 'view' method makes self.wTree refer to 'viewdialog';
>
> 5. You click on the Update button, the 'update' method is called, and the
> "self.ssn = " line looks for the "ssn" widget in 'viewdialog'.

Yes, you are right: the "self.ssn = " line looks for the widget in the child dialog, not in the mainWindow. But how do I set it to 'mainWindow' again?
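One possible answer to the question above, following MRAB's diagnosis, is to stop reusing a single `self.wTree` and keep one tree per top-level window. The sketch below is runnable without PyGTK: `FakeGladeXML` and `FakeEntry` are stand-ins invented here for `gtk.glade.XML` and a text entry, and the widget layout and the file name "addressbook.glade" are assumptions, not taken from the pastebin code.

```python
class FakeEntry:
    """Stand-in for a text entry widget; only get_text() is modelled."""

    def __init__(self, text):
        self._text = text

    def get_text(self):
        return self._text


class FakeGladeXML:
    """Stand-in for gtk.glade.XML(gladefile, root).

    As in the traceback from the thread, get_widget() gives None for a
    name that does not exist under the chosen root.
    """

    # Hypothetical layout: which widgets live under which root.
    WIDGETS = {
        "mainWindow": {"ssn": FakeEntry("123")},
        "viewdialog": {"old_ssn": FakeEntry("456")},
    }

    def __init__(self, gladefile, root):
        self._widgets = self.WIDGETS[root]

    def get_widget(self, name):
        return self._widgets.get(name)


class AddressBook:
    def __init__(self, gladefile):
        self.gladefile = gladefile
        # The main window's tree gets its own attribute and is never rebound.
        self.main_tree = FakeGladeXML(gladefile, "mainWindow")
        self.view_tree = None

    def view(self):
        # The dialog's tree is stored separately instead of replacing main_tree.
        self.view_tree = FakeGladeXML(self.gladefile, "viewdialog")

    def update(self):
        # Local variable, always looked up in the main window's tree.
        ssn = self.main_tree.get_widget("ssn").get_text()
        return ssn
```

With this arrangement, opening the view dialog no longer changes where `update` looks, so the lookup for "ssn" keeps finding the entry in the main window.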
AttributeError: 'NoneType' object has no attribute 'get_text'
Hi all,

I did a small GUI address book application using PyGTK, Python and a MySQL db. It was successful.
I tried the same application with Glade, but I ended up with errors.

I designed the code as follows.
1. It has one main window and four child dialogs.
2. In the main window, I have to fill in the text entry widget, and if I press 'add', the data gets inserted into the database. This works fine.
3. If the showdialog button is clicked, a child dialog appears, where I have to enter the old entry to update.
4. To update, I again use the main window, where I get the following error:

    Traceback (most recent call last):
      File "addressbookglade.py", line 63, in update
        self.ssn = self.wTree.get_widget("ssn").get_text()
    AttributeError: 'NoneType' object has no attribute 'get_text'

Also, I already set the name in the properties window. It works fine for the add option, but not for the update option. I'm using the same window for both add and update.

The code is available here: http://pastebin.com/m28a4747e
The glade XML file is here: http://pastebin.com/m1af61a29
The screenshots of my glade windows are here: http://www.flickr.com/photos/raji_me/?saved=1

Raji. S
AttributeError: 'NoneType' object has no attribute 'get_text'
On Wed, Sep 2, 2009 at 10:11 AM, Raji Seetharaman wrote:
>
> Thanks MRAB. Now it works.
> Raji.S
Re: Python-list Digest, Vol 72, Issue 10
Thanks MRAB. Now it works.
AttributeError: 'NoneType' object has no attribute 'get_text'
Hi all,

I worked through the Python and Glade example program that adds two numbers and displays the output, from the following link:
http://www.dreamincode.net/forums/showtopic63885.htm

When I run the script, I receive the following error:

    python add.py
    Traceback (most recent call last):
      File "add.py", line 34, in add
        thistime = adder( self.wTree.get_widget("entryNumber1").get_text(), self.wTree.get_widget("entryNumber2").get_text())
    AttributeError: 'NoneType' object has no attribute 'get_text'

What has to be done to overcome this error?

Regards
Raji. S
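`get_widget` gives back None when no widget with that name exists in the loaded tree, so one thing worth checking for an error like the one above is whether the names used in the script match the ids in the glade file. The Python 3 sketch below lists the ids declared in a libglade-format file using only the standard library. The sample XML and the helper names are invented for illustration; the actual glade file from the tutorial is not shown in this thread, so this is a way to check, not a confirmed diagnosis.

```python
import xml.etree.ElementTree as ET


def widget_ids(glade_xml):
    """Return the set of widget ids declared in a libglade XML string."""
    root = ET.fromstring(glade_xml)
    return {widget.get("id") for widget in root.iter("widget")}


def missing_widgets(glade_xml, names):
    """Return the names that get_widget() would not be able to find."""
    ids = widget_ids(glade_xml)
    return [name for name in names if name not in ids]


# Hypothetical glade file in which the second entry kept a different name.
SAMPLE_GLADE = """<glade-interface>
  <widget class="GtkWindow" id="windowMain">
    <child>
      <widget class="GtkEntry" id="entryNumber1"/>
    </child>
    <child>
      <widget class="GtkEntry" id="entry2"/>
    </child>
  </widget>
</glade-interface>"""
```

Calling `missing_widgets(SAMPLE_GLADE, ["entryNumber1", "entryNumber2"])` on this sample reports "entryNumber2" as the name that has no matching id. For a file on disk, `ET.parse(path).getroot()` can replace `ET.fromstring`.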
Re: Python and glade program - without errors but didn't display anything
Thanks Anusha. Now my calculator GUI window is displayed.
Python and glade program - without errors but didn't display anything
Hi all,

I tried to develop a calculator using Glade and Python. When I run the script (python calculatorglade.py), I don't get any errors. A calculator GUI is supposed to be displayed, but no window appears.

I killed it with Ctrl + C and got the following response in the terminal:

    ^CTraceback (most recent call last):
      File "calculatorglade.py", line 106, in <module>
        gtk.main()
    KeyboardInterrupt

The Python program is available here: http://pastebin.com/m44d2c1cb
The corresponding Glade XML file is available here: http://pastebin.com/m46beaac5

What should I do to display the GUI window?

Regards
Raji. S