There was a problem in trunk that may have caused this. The problem
was with the recent change in URL behaviour. I think I fixed it. However,
I cannot run your code. I am getting a strange error:
    fh = urlopen ( URL )
    return fh.getcode()
    AttributeError: addinfourl instance has no attribute 'getcode'
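
One possible explanation, offered as an assumption rather than a diagnosis: getcode() was only added to urllib response objects in Python 2.6, so an older interpreter would raise exactly this AttributeError. A minimal sketch of a helper that tolerates both cases follows. The name response_status is hypothetical (it is not part of web2py or the standard library), and it returns None when the response exposes neither getcode() nor a code attribute.

```python
def response_status(response):
    """Return the HTTP status of a urlopen() response, or None if unknown.

    Hypothetical helper: prefers getcode() when the object has it and
    otherwise falls back to a plain `code` attribute, if one is present.
    """
    getcode = getattr(response, "getcode", None)
    if callable(getcode):
        return getcode()
    # Older response objects may lack getcode(); try the attribute instead.
    return getattr(response, "code", None)
```

With this, the check in the quoted code could read `response_status(fh) == 200` without depending on which interpreter runs it.
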
On Aug 21, 8:07 am, Stef Mientki <[email protected]> wrote:
> On 21-08-2010 14:46, mdipierro wrote:
>
> > what do you find that is strange?
>
> This is the result with the last letter of each link removed, so all links
> should give an error. But the results differ between the 2 methods,
> and some links produce 200, although they are definitely wrong:
> 404 500 http://127.0.0.1:8000/welcome/default/user/logi
> 404 500 http://127.0.0.1:8000/welcome/default/user/registe
> 404 500 http://127.0.0.1:8000/welcome/default/user/request_reset_passwor
> 200 500 http://127.0.0.1:8000/welcome/default
> 400 500 http://127.0.0.1:8000/welcome/default/inde
> 200 500 http://127.0.0.1:8000/admin/default/design/welcom
> 200 500 http://127.0.0.1:8000/admin/default/edit/welcome/controllers/default.p
> 200 500 http://127.0.0.1:8000/admin/default/edit/welcome/views/default/index.htm
> 200 500 http://127.0.0.1:8000/admin/default/edit/welcome/views/layout.htm
> 200 500 http://127.0.0.1:8000/admin/default/edit/welcome/static/base.cs
> 200 500 http://127.0.0.1:8000/admin/default/edit/welcome/models/db.p
> 200 500 http://127.0.0.1:8000/admin/default/edit/welcome/models/menu.p
> 400 500 http://127.0.0.1:8000/welcome/appadmin/inde
> 200 500 http://127.0.0.1:8000/admin/default/inde
> 400 400 http://127.0.0.1:8000/examples/default/inde
> 200 -1 http://web2py.co
> 400 400 http://web2py.com/boo
> 400 500 http://127.0.0.1:8000/welcome/default/inde
> 200 500 http://127.0.0.1:8000/welcome/default
> 200 500 http://127.0.0.1:8000/admin/default/peek/welcome/controllers/default.p
> 200 500 http://127.0.0.1:8000/admin/default/peek/welcome/views/default/index.htm
> 200 -1 http://www.web2py.co
>
> This is the normal result
> 200 500 http://127.0.0.1:8000/welcome/default/user/login
> 200 500 http://127.0.0.1:8000/welcome/default/user/register
> 200 500 http://127.0.0.1:8000/welcome/default/user/request_reset_password
> 200 500 http://127.0.0.1:8000/welcome/default
> 200 500 http://127.0.0.1:8000/welcome/default/index
> 200 500 http://127.0.0.1:8000/admin/default/design/welcome
> 200 500 http://127.0.0.1:8000/admin/default/edit/welcome/controllers/default.py
> 200 500 http://127.0.0.1:8000/admin/default/edit/welcome/views/default/index....
> 200 500 http://127.0.0.1:8000/admin/default/edit/welcome/views/layout.html
> 200 500 http://127.0.0.1:8000/admin/default/edit/welcome/static/base.css
> 200 500 http://127.0.0.1:8000/admin/default/edit/welcome/models/db.py
> 200 500 http://127.0.0.1:8000/admin/default/edit/welcome/models/menu.py
> 200 500 http://127.0.0.1:8000/welcome/appadmin/index
> 200 500 http://127.0.0.1:8000/admin/default/index
> 200 200 http://127.0.0.1:8000/examples/default/index
> 200 200 http://web2py.com
> 200 500 http://web2py.com/book
> 200 500 http://127.0.0.1:8000/welcome/default/index
> 400 500 http://127.0.0.1:8000/welcome/default/index#
> 200 500 http://127.0.0.1:8000/admin/default/peek/welcome/controllers/default.py
> 200 500 http://127.0.0.1:8000/admin/default/peek/welcome/views/default/index....
> 200 200 http://www.web2py.com
>
> So when is a URL valid ?
>
> thanks,
> Stef
>
> > On Aug 21, 7:32 am, Stef Mientki <[email protected]> wrote:
> >>> Graphical representation of links or pages that don't get linked to.
> >> I tried to test the links (with 2 algorithms, code below) in a generated
> >> webpage, but the results I
> >> get are very weird.
> >> Probably one of you knows a better way?
>
> >> cheers,
> >> Stef
>
> >> from BeautifulSoup import BeautifulSoup
> >> from urllib import urlopen
> >> from httplib import HTTP
> >> from urlparse import urlparse
>
> >> def Check_URL_1 ( URL ) :
> >>     try:
> >>         fh = urlopen ( URL )
> >>         return fh.code == 200
> >>     except :
> >>         return False
>
> >> def Check_URL_2 ( URL ) :
> >>     p = urlparse ( URL )
> >>     h = HTTP ( p[1] )
> >>     h.putrequest ( 'HEAD', p[2] )
> >>     h.endheaders()
> >>     if h.getreply()[0] == 200:
> >>         return True
> >>     else:
> >>         return False
>
> >> def Verify_Links ( URL ) :
> >>     Parts = URL.split('/')
> >>     Site = '/'.join ( Parts [:3] )
> >>     Current = '/'.join ( Parts [:-1] )
>
> >>     fh = urlopen ( URL )
> >>     lines = fh.read ()
> >>     fh.close()
>
> >>     Soup = BeautifulSoup ( lines )
> >>     hrefs = Soup.findAll ( 'a' )
>
> >>     for href in hrefs :
> >>         href = href [ 'href' ] #[:-1] ## <== remove "#" to generate all errors
>
> >>         if href.startswith ( '/' ) :
> >>             href = Site + href
> >>         elif href.startswith ('#' ) :
> >>             href = URL + href
> >>         elif href.startswith ( 'http' ) :
> >>             pass
> >>         else :
> >>             # relative link: Current has no trailing slash, so add one
> >>             href = Current + '/' + href
>
> >>         try:
> >>             # urlopen is imported by name; urllib.urlopen was a NameError here
> >>             fh = urlopen ( href )
> >>         except :
> >>             pass
> >>         print Check_URL_1 ( href ), Check_URL_2 ( href ), href
>
> >> URL = 'http://127.0.0.1:8000/welcome/default/index'
> >> fh = Verify_Links ( URL )
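
On the link-resolution step in Verify_Links: the quoted code builds absolute URLs by hand from the page URL. A minimal sketch of the same step using the standard library's urljoin is below. It assumes Python 3 module names (the code in this thread is Python 2, where urljoin lives in the urlparse module), and resolve_links is a hypothetical helper name, not something from web2py.

```python
from urllib.parse import urljoin


def resolve_links(page_url, hrefs):
    """Turn the href values found on page_url into absolute URLs.

    urljoin handles site-relative ('/x'), page-relative ('x'),
    fragment-only ('#x') and already absolute links in one call.
    """
    return [urljoin(page_url, href) for href in hrefs]


if __name__ == "__main__":
    page = "http://127.0.0.1:8000/welcome/default/index"
    samples = ["/admin/default/index", "user/login", "#top",
               "http://web2py.com/book"]
    for link in resolve_links(page, samples):
        print(link)
```

Resolving links this way avoids the missing-slash case for page-relative links and keeps the checking functions free of string surgery.
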