When to use mechanize and Windmill library during WebScraping ?

2009-12-17 Thread Raji Seetharaman
> Be sure to look at Scrapy too: http://scrapy.org
>
>
Thank you.

Raji. S
-- 
http://mail.python.org/mailman/listinfo/python-list


When to use mechanize and Windmill library during WebScraping ?

2009-12-17 Thread Raji Seetharaman
>
> -- Forwarded message --
> From: Javier Collado 
> To: Raji Seetharaman 
> Date: Sat, 12 Dec 2009 12:52:27 +0100
> Subject: Re: When to use mechanize and Windmill library during WebScraping ?
> Hello,
>
> If a script that uses mechanize fails to find an HTML node that has
> been identified with Firebug, this is probably because that node was
> generated by JavaScript (provided that the expression to get the node is
> correct).
>
> To verify this, you can download the HTML
> page and open it in your favourite editor. If some of the nodes that
> you can see in your browser are missing or empty, then one of the
> JavaScript scripts in the page probably created or populated them.
>
> If you're in doubt, you can try mechanize first and, if you run into
> problems such as those described above, move to windmill or
> some other tool that executes JavaScript code before trying to get the
> desired data.
>
> Best regards,
> Javier
>
>
Thanks for your help

Raji. S
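
The check Javier describes can also be scripted. Below is a minimal sketch, assuming Python 3 and only the standard library (the thread itself uses Python 2.6). The sample page and the ids "jobs" and "contact" are made up for illustration. The idea is to list the element ids present in the HTML as served, before any JavaScript runs; an id that Firebug shows but that is absent from this list was probably created by a script.

```python
from html.parser import HTMLParser


class IdCollector(HTMLParser):
    """Collect the id attribute of every element in a static HTML document."""

    def __init__(self):
        super().__init__()
        self.ids = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name == "id" and value:
                self.ids.add(value)


def ids_in_static_html(html_text):
    """Return the set of element ids present before any JavaScript runs."""
    collector = IdCollector()
    collector.feed(html_text)
    collector.close()
    return collector.ids


# Hypothetical page: the "jobs" container exists in the markup, while
# "contact" would only be created by the script when a browser runs it.
SAMPLE_PAGE = """
<html><body>
<div id="jobs"></div>
<script>
  var d = document.createElement("div");
  d.id = "contact";
  document.body.appendChild(d);
</script>
</body></html>
"""

static_ids = ids_in_static_html(SAMPLE_PAGE)
print("jobs" in static_ids)
print("contact" in static_ids)
```

For a real page, the text passed to ids_in_static_html would be the downloaded HTML (for example from urllib.request.urlopen(url).read().decode(...)) rather than the sample string.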


A try with WebScraping using Python

2009-12-11 Thread Raji Seetharaman
Hi

From a tutorial I found on the net, I came to know about web scraping using
Python.

I thought I would give it a try.

I want to extract the contact mail IDs from all the posts published
so far at the link below:

http://fossjobs.wordpress.com/

With the Firebug add-on it is easy to find the location of the mail IDs inside
the HTML DOM tree.

I don't know how to download all the web pages, i.e., the coding part.

Which library can I use to download them? (mechanize or windmill)

Please help.

Thanks

Raji. S
http://sraji.wordpress.com/
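
Once a page has been downloaded, pulling the mail IDs out of it can be sketched as below. This is a rough illustration, assuming Python 3 and the standard library only. The regular expression is a deliberately simple approximation of an e-mail address, and the sample post and addresses are invented, not taken from the blog.

```python
import re
from html import unescape

# Simple approximation of an e-mail address; it will miss unusual forms.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def extract_emails(html_text):
    """Return the unique addresses in html_text, in order of first appearance."""
    found = []
    for address in EMAIL_RE.findall(unescape(html_text)):
        if address not in found:
            found.append(address)
    return found


# Hypothetical post body with one address repeated in a mailto link.
SAMPLE_POST = (
    '<p>Contact: <a href="mailto:hr@example.com">hr@example.com</a> '
    "or jobs@example.org.</p>"
)

emails = extract_emails(SAMPLE_POST)
print(emails)
```

Searching the whole page text is cruder than walking to the exact node found with Firebug, but it does not depend on the page layout staying the same across posts.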


When to use mechanize and Windmill library during WebScraping ?

2009-12-11 Thread Raji Seetharaman
Hi

For 'Webscraping with Python', libraries such as mechanize or urllib2, and
windmill or selenium, are used to download the web pages.

http://www.packtpub.com/article/web-scraping-with-python

The above link makes use of the mechanize library to download the web pages.

The link below uses the windmill library to download the web pages.

http://www.packtpub.com/article/web-scraping-with-python-part-2

I don't know when to use mechanize and when to use windmill.

It has been said that the Windmill library is used when the HTML is
generated by JavaScript code.

Also, I don't know how to identify whether the HTML is generated by
JavaScript code or not.

Please advise.

Thanks

Raji. S
http://sraji.wordpress.com/


Re: Error in Windmill

2009-11-12 Thread Raji Seetharaman
On Thu, Nov 12, 2009 at 5:58 PM, Himanshu  wrote:

> 2009/11/12 Raji Seetharaman :
> >
> > Hi
> >
> > I'm learning web scraping with Python from here
> > http://www.packtpub.com/article/web-scraping-with-python-part-2
> >
> > The complete code from the above link is here
> > http://pastebin.com/m10046dc6
> >
> > When I run the program in the terminal I receive the following errors
> >
> >   File "nasa.py", line 41, in <module>
> >     test_scrape_iotd_gallery()
> >   File "nasa.py", line 24, in test_scrape_iotd_gallery
> >     client = WindmillTestClient(__name__)
> >   File "/usr/local/lib/python2.6/dist-packages/windmill-1.3-py2.6.egg/windmill/authoring/__init__.py", line 142, in __init__
> >     method_proxy = windmill.tools.make_jsonrpc_client()
> >   File "/usr/local/lib/python2.6/dist-packages/windmill-1.3-py2.6.egg/windmill/tools/__init__.py", line 35, in make_jsonrpc_client
> >     url = urlparse(windmill.settings['TEST_URL'])
> > AttributeError: 'module' object has no attribute 'settings'
> >
> > Please advise.
> >
> > Thanks
> > Raji. S
> >
> > --
> > http://mail.python.org/mailman/listinfo/python-list
> >
> >
>
> Google or See
> http://groups.google.com/group/windmill-dev/browse_thread/thread/c921f7a25c0200c9
>

Thanks  for your help.


Error in Windmill

2009-11-12 Thread Raji Seetharaman
Hi

I'm learning web scraping with Python from here
http://www.packtpub.com/article/web-scraping-with-python-part-2

The complete code from the above link is here http://pastebin.com/m10046dc6

When I run the program in the terminal I receive the following errors

  File "nasa.py", line 41, in <module>
    test_scrape_iotd_gallery()
  File "nasa.py", line 24, in test_scrape_iotd_gallery
    client = WindmillTestClient(__name__)
  File "/usr/local/lib/python2.6/dist-packages/windmill-1.3-py2.6.egg/windmill/authoring/__init__.py", line 142, in __init__
    method_proxy = windmill.tools.make_jsonrpc_client()
  File "/usr/local/lib/python2.6/dist-packages/windmill-1.3-py2.6.egg/windmill/tools/__init__.py", line 35, in make_jsonrpc_client
    url = urlparse(windmill.settings['TEST_URL'])
AttributeError: 'module' object has no attribute 'settings'

Please advise.

Thanks
Raji. S


Error received from _mechanize.py

2009-10-15 Thread Raji Seetharaman
Hi all,

I'm learning web scraping with Python from the following link
http://www.packtpub.com/article/web-scraping-with-python

To work with it, mechanize has to be installed.
I installed mechanize using

sudo apt-get install python-mechanize

As given in the tutorial, I tried the code below

import mechanize
BASE_URL = "http://www.packtpub.com/article-network"
br = mechanize.Browser()
data = br.open(BASE_URL).get_data()

I received the following error

  File "webscrap.py", line 4, in <module>
    data = br.open(BASE_URL).get_data()
  File "/usr/lib/python2.6/dist-packages/mechanize/_mechanize.py", line 209, in open
    return self._mech_open(url, data, timeout=timeout)
  File "/usr/lib/python2.6/dist-packages/mechanize/_mechanize.py", line 261, in _mech_open
    raise response
mechanize._response.httperror_seek_wrapper: HTTP Error 403: request disallowed by robots.txt


Any ideas are welcome.
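
The 403 in this traceback comes from mechanize itself: by default it reads the site's robots.txt and refuses to fetch URLs that the file disallows. mechanize.Browser has a set_handle_robots(False) method that turns this check off, though whether doing so is appropriate depends on the site's terms. To see what a robots.txt actually permits, here is a minimal sketch assuming Python 3 and the standard library. The rules in SAMPLE_ROBOTS and the user agent name "my-scraper" are invented for illustration and are not the real packtpub.com file.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt, user_agent, url):
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical rules, not copied from any real site.
SAMPLE_ROBOTS = """\
User-agent: *
Disallow: /article-network
"""

blocked = is_allowed(
    SAMPLE_ROBOTS, "my-scraper", "http://www.packtpub.com/article-network"
)
open_page = is_allowed(SAMPLE_ROBOTS, "my-scraper", "http://www.packtpub.com/books")
print(blocked, open_page)
```

Against a live site, RobotFileParser can load the file directly with set_url(...) followed by read() instead of parse().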


AttributeError: 'NoneType' object has no attribute 'get_text'

2009-09-14 Thread Raji Seetharaman
-- Forwarded message --
> From: MRAB 
> To: python-list@python.org
> Date: Sun, 13 Sep 2009 19:44:30 +0100
> Subject: Re: AttributeError: 'NoneType' object has no attribute 'get_text'
> Raji Seetharaman wrote:
>
>> Hi all,
>> I built a small GUI address book application using PyGTK, Python and a
>> MySQL database. It worked.
>> I tried the same application with Glade, but I ended up with errors.
>> I designed the code as follows.
>> 1. It has one main window and four child dialogs.
>> 2. In the main window, I fill in the text entry widgets, and if I press
>> 'add', the data is inserted into the database. This works fine.
>> 3. If the showdialog button is clicked, a child dialog appears, where I
>> have to enter the old entry to update.
>> 4. To update, I again use the main window, where I get the following error:
>>   Traceback (most recent call last):
>>     File "addressbookglade.py", line 63, in update
>>       self.ssn = self.wTree.get_widget("ssn").get_text()
>>   AttributeError: 'NoneType' object has no attribute 'get_text'
>>
>> I have already set the name in the properties window. It works fine for
>> the add option, but not for the update option.
>> I'm using the same window for both add and update.
>>
>> The code is available here http://pastebin.com/m28a4747e
>> The Glade XML file is here http://pastebin.com/m1af61a29
>> Screenshots of my Glade windows are here
>> http://www.flickr.com/photos/raji_me/?saved=1
>>
> You're using instance attributes a lot where I think local variables
> would be better, e.g. you write "self.ssn" where just "ssn" would do.
>
> In the '__init__' method you have:
>
>     self.wTree = gtk.glade.XML(self.gladefile,"mainWindow")
>
> and then in the 'view' method you have:
>
>     self.wTree = gtk.glade.XML(self.gladefile,"viewdialog")
>
> In both the 'add' and 'update' methods you have:
>
>     self.ssn = self.wTree.get_widget("ssn").get_text()
>
> so I suspect that the following is happening:
>
> 1. __init__ makes self.wTree refer to 'mainWindow';
>
> 2. You click on the Add button, the 'add' method is called, and the
> "self.ssn = " line looks for the "ssn" widget in 'mainWindow';
>
> 3. You click on the OK(?) button and view what's just been added;
>
> 4. The 'view' method makes self.wTree refer to 'viewdialog';
>
> 5. You click on the Update button, the 'update' method is called, and the
> "self.ssn = " line looks for the "ssn" widget in 'viewdialog'.
>
>
> I'll use local variables. Yes the "self.ssn = " line looks for the 'ssn'
> widget in 'viewdialog'
>
How to set the "self.ssn = " in update method to the mainWindow ?
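
One way to avoid the problem MRAB describes is to keep each loaded tree in its own attribute instead of reusing self.wTree. The sketch below is only an illustration, written for Python 3 so that it runs without PyGTK. FakeGladeXML and FakeEntry are hypothetical stand-ins for gtk.glade.XML and a text entry, and the widget names "name" and "oldssn" are invented. In the real program the two FakeGladeXML(...) calls would be the gtk.glade.XML(...) calls quoted above.

```python
class FakeEntry:
    """Hypothetical stand-in for a gtk.Entry."""

    def __init__(self, name):
        self.name = name

    def get_text(self):
        return "text of " + self.name


class FakeGladeXML:
    """Hypothetical stand-in for gtk.glade.XML loaded with one root widget."""

    _WIDGETS = {
        "mainWindow": {"ssn", "name"},
        "viewdialog": {"oldssn"},
    }

    def __init__(self, gladefile, root):
        self._names = self._WIDGETS[root]

    def get_widget(self, name):
        # Mirrors the behaviour seen in the traceback: an unknown name gives None.
        return FakeEntry(name) if name in self._names else None


class AddressBook:
    def __init__(self, gladefile):
        self.gladefile = gladefile
        # One attribute per tree, so opening a dialog never replaces
        # the tree that holds the main window's widgets.
        self.main_tree = FakeGladeXML(gladefile, "mainWindow")
        self.view_tree = None

    def view(self):
        self.view_tree = FakeGladeXML(self.gladefile, "viewdialog")

    def update(self):
        # Always read "ssn" from the main window's tree.
        ssn = self.main_tree.get_widget("ssn").get_text()
        return ssn


book = AddressBook("addressbook.glade")
book.view()
result = book.update()
print(result)
```

With this layout, update() still finds "ssn" after view() has run, because view() no longer overwrites the tree that update() reads from.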


Re: AttributeError: 'NoneType' object has no attribute 'get_text'

2009-09-14 Thread Raji Seetharaman
Yes, you are right, the "self.ssn = " line looks for the widget in the child
dialog, not in mainWindow. But how do I set it to 'mainWindow' again?


AttributeError: 'NoneType' object has no attribute 'get_text'

2009-09-13 Thread Raji Seetharaman
Hi all,
I built a small GUI address book application using PyGTK, Python and a MySQL
database. It worked.
I tried the same application with Glade, but I ended up with errors.
I designed the code as follows.
1. It has one main window and four child dialogs.
2. In the main window, I fill in the text entry widgets, and if I press
'add', the data is inserted into the database. This works fine.
3. If the showdialog button is clicked, a child dialog appears, where I have to
enter the old entry to update.
4. To update, I again use the main window, where I get the following error:
  Traceback (most recent call last):
    File "addressbookglade.py", line 63, in update
      self.ssn = self.wTree.get_widget("ssn").get_text()
  AttributeError: 'NoneType' object has no attribute 'get_text'

I have already set the name in the properties window. It works fine for the
add option, but not for the update option.
I'm using the same window for both add and update.

The code is available here http://pastebin.com/m28a4747e
The Glade XML file is here http://pastebin.com/m1af61a29
Screenshots of my Glade windows are here
http://www.flickr.com/photos/raji_me/?saved=1

Raji. S


AttributeError: 'NoneType' object has no attribute 'get_text'

2009-09-02 Thread Raji Seetharaman
On Wed, Sep 2, 2009 at 10:11 AM, Raji Seetharaman wrote:

>
> Thanks MRAB. Now it works.
>

Raji.S


Re: Python-list Digest, Vol 72, Issue 10

2009-09-01 Thread Raji Seetharaman
Thanks MRAB. Now it works.


AttributeError: 'NoneType' object has no attribute 'get_text'

2009-09-01 Thread Raji Seetharaman
Hi all, I worked through a Python and Glade example program that adds two
numbers and displays the result, from the following link
http://www.dreamincode.net/forums/showtopic63885.htm
When I run the script, I receive the following error

python add.py
Traceback (most recent call last):
  File "add.py", line 34, in add
    thistime = adder( self.wTree.get_widget("entryNumber1").get_text(), self.wTree.get_widget("entryNumber2").get_text())
AttributeError: 'NoneType' object has no attribute 'get_text'

What has to be done to overcome this error?

Regards

Raji. S
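
This traceback means that get_widget returned None for one of the two names, which is what happens when the loaded Glade tree has no widget with that name. A small guard makes it obvious which name is at fault. The sketch below assumes Python 3 and uses StubTree, a hypothetical stand-in for the gtk.glade.XML object so that it runs without PyGTK. require_widget is an invented helper name, not part of PyGTK.

```python
def require_widget(tree, name):
    """Return the named widget, or raise a clear error if the tree lacks it."""
    widget = tree.get_widget(name)
    if widget is None:
        raise LookupError("no widget named %r in the loaded Glade tree" % name)
    return widget


class StubTree:
    """Hypothetical stand-in for gtk.glade.XML: unknown names give None."""

    def __init__(self, widgets):
        self._widgets = dict(widgets)

    def get_widget(self, name):
        return self._widgets.get(name)


# Only one of the two entry names exists in this made-up tree.
tree = StubTree({"entryNumber1": object()})

try:
    require_widget(tree, "entryNumber2")
except LookupError as exc:
    message = str(exc)

print(message)
```

If the guard reports a name, the usual things to compare are the widget's name in the .glade file and the string passed to get_widget.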


Re: Python and glade program - without errors but didn't display anything

2009-09-01 Thread Raji Seetharaman
Thanks Anusha. Now my calculator gui window is displayed.


Python and glade program - without errors but didn't display anything

2009-09-01 Thread Raji Seetharaman
Hi all,
I tried to develop a calculator using Glade and Python.
When I run the script (python calculatorglade.py), I don't get any errors.
A calculator GUI is supposed to be displayed, but no window appears.
I killed it with Ctrl + C and got the following response from the terminal

^CTraceback (most recent call last):
  File "calculatorglade.py", line 106, in <module>
    gtk.main()
KeyboardInterrupt

The Python program is available here, http://pastebin.com/m44d2c1cb

The corresponding Glade XML file is available here,
http://pastebin.com/m46beaac5

What should I do to display the GUI window?

Regards
Raji. S