Why are the URLs in the first set truncated?

On Aug 21, 8:07 am, Stef Mientki <[email protected]> wrote:
>  On 21-08-2010 14:46, mdipierro wrote:
>
> > what do you find that is strange?
>
> This is the result with the last letter of each link removed, so every
> link should give an error. Instead the 2 methods disagree, and some
> links produce 200 even though they are definitely wrong:
> 404 500 http://127.0.0.1:8000/welcome/default/user/logi
> 404 500 http://127.0.0.1:8000/welcome/default/user/registe
> 404 500 http://127.0.0.1:8000/welcome/default/user/request_reset_passwor
> 200 500 http://127.0.0.1:8000/welcome/default
> 400 500 http://127.0.0.1:8000/welcome/default/inde
> 200 500 http://127.0.0.1:8000/admin/default/design/welcom
> 200 500 http://127.0.0.1:8000/admin/default/edit/welcome/controllers/default.p
> 200 500 http://127.0.0.1:8000/admin/default/edit/welcome/views/default/index.htm
> 200 500 http://127.0.0.1:8000/admin/default/edit/welcome/views/layout.htm
> 200 500 http://127.0.0.1:8000/admin/default/edit/welcome/static/base.cs
> 200 500 http://127.0.0.1:8000/admin/default/edit/welcome/models/db.p
> 200 500 http://127.0.0.1:8000/admin/default/edit/welcome/models/menu.p
> 400 500 http://127.0.0.1:8000/welcome/appadmin/inde
> 200 500 http://127.0.0.1:8000/admin/default/inde
> 400 400 http://127.0.0.1:8000/examples/default/inde
> 200 -1 http://web2py.co
> 400 400 http://web2py.com/boo
> 400 500 http://127.0.0.1:8000/welcome/default/inde
> 200 500 http://127.0.0.1:8000/welcome/default
> 200 500 http://127.0.0.1:8000/admin/default/peek/welcome/controllers/default.p
> 200 500 http://127.0.0.1:8000/admin/default/peek/welcome/views/default/index.htm
> 200 -1 http://www.web2py.co
>
> This is the normal result
> 200 500 http://127.0.0.1:8000/welcome/default/user/login
> 200 500 http://127.0.0.1:8000/welcome/default/user/register
> 200 500 http://127.0.0.1:8000/welcome/default/user/request_reset_password
> 200 500 http://127.0.0.1:8000/welcome/default
> 200 500 http://127.0.0.1:8000/welcome/default/index
> 200 500 http://127.0.0.1:8000/admin/default/design/welcome
> 200 500 http://127.0.0.1:8000/admin/default/edit/welcome/controllers/default.py
> 200 500 http://127.0.0.1:8000/admin/default/edit/welcome/views/default/index....
> 200 500 http://127.0.0.1:8000/admin/default/edit/welcome/views/layout.html
> 200 500 http://127.0.0.1:8000/admin/default/edit/welcome/static/base.css
> 200 500 http://127.0.0.1:8000/admin/default/edit/welcome/models/db.py
> 200 500 http://127.0.0.1:8000/admin/default/edit/welcome/models/menu.py
> 200 500 http://127.0.0.1:8000/welcome/appadmin/index
> 200 500 http://127.0.0.1:8000/admin/default/index
> 200 200 http://127.0.0.1:8000/examples/default/index
> 200 200 http://web2py.com
> 200 500 http://web2py.com/book
> 200 500 http://127.0.0.1:8000/welcome/default/index
> 400 500 http://127.0.0.1:8000/welcome/default/index#
> 200 500 http://127.0.0.1:8000/admin/default/peek/welcome/controllers/default.py
> 200 500 http://127.0.0.1:8000/admin/default/peek/welcome/views/default/index....
> 200 200 http://www.web2py.com
>
> So when is a URL valid?
>
> thanks,
> Stef
>
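On when a URL counts as valid: one option is to look at the status code of the first response the server sends. Below is a minimal sketch in Python 3 (where httplib became http.client and urlparse moved to urllib.parse). The names request_target and head_status are hypothetical helpers, not part of web2py or of the quoted code. The sketch falls back to "/" when the path is empty, keeps the query string, and leaves out the #fragment, which is meant for the client and is not sent to the server.

```python
import http.client
from urllib.parse import urlsplit


def request_target(url):
    """Return the path (plus query string) to put on the HTTP request line."""
    parts = urlsplit(url)
    target = parts.path or "/"      # a bare host such as http://web2py.com has an empty path
    if parts.query:
        target += "?" + parts.query
    return target                   # the #fragment is deliberately left out


def head_status(url, timeout=5):
    """Send one HEAD request; return the status code, or None if the request fails."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn_cls = http.client.HTTPSConnection
    else:
        conn_cls = http.client.HTTPConnection
    conn = None
    try:
        conn = conn_cls(parts.netloc, timeout=timeout)
        conn.request("HEAD", request_target(url))
        return conn.getresponse().status
    except (OSError, http.client.HTTPException):
        return None                 # DNS failure, refused connection, bad reply, ...
    finally:
        if conn is not None:
            conn.close()
```

Note that http.client does not follow redirects, while urlopen does. A 200 from urlopen may therefore belong to the page you were redirected to (possibly a login page), so the two methods can legitimately report different codes for the same link.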
> > On Aug 21, 7:32 am, Stef Mientki <[email protected]> wrote:
> >>> Graphical representation of links or pages that don't get linked to.
> >> I tried to test the links in a generated webpage (with 2 algorithms,
> >> code below), but the results I get are very weird.
> >> Does anyone know a better way?
>
> >> cheers,
> >> Stef
>
> >> from BeautifulSoup import BeautifulSoup
> >> from urllib        import urlopen
> >> from httplib       import HTTP
> >> from urlparse      import urlparse
>
> >> def Check_URL_1 ( URL ) :
> >>   try:
> >>     fh = urlopen ( URL )
> >>     return fh.code == 200
> >>   except :
> >>     return False
>
> >> def Check_URL_2 ( URL ) :
> >>   p = urlparse ( URL )
> >>   h = HTTP ( p[1] )
> >>   h.putrequest ( 'HEAD', p[2] or '/' )   # an empty path is not a valid request target
> >>   h.endheaders()
> >>   return h.getreply()[0] == 200
>
> >> def Verify_Links ( URL ) :
> >>   Parts   = URL.split('/')
> >>   Site    = '/'.join ( Parts [:3] )
> >>   Current = '/'.join ( Parts [:-1] )
>
> >>   fh = urlopen ( URL )
> >>   lines = fh.read ()
> >>   fh.close()
>
> >>   Soup = BeautifulSoup ( lines )
> >>   hrefs = Soup.findAll ( 'a', href=True )   # skip anchors without an href
>
> >>   for href in hrefs :
> >>     href = href [ 'href' ] #[:-1]     ## <== remove "#" to generate all errors
>
> >>     if href.startswith ( '/' ) :
> >>       href = Site + href
> >>     elif href.startswith ('#' ) :
> >>       href = URL + href
> >>     elif href.startswith ( 'http' ) :
> >>       pass
> >>     else :
> >>       href = Current + '/' + href      # Current has no trailing slash
>
> >>     try:
> >>       fh = urlopen ( href )            # only urlopen is imported, not urllib
> >>     except :
> >>       pass
> >>     print Check_URL_1 ( href ), Check_URL_2 ( href ), href
>
> >> URL = 'http://127.0.0.1:8000/welcome/default/index'
> >> fh = Verify_Links ( URL )
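For building the absolute links, the standard library can do the joining instead of the startswith chain in Verify_Links. A minimal sketch in Python 3 (where urlparse moved to urllib.parse); resolve_link is a hypothetical name used only for illustration. urljoin resolves hrefs of the form '/absolute', 'relative', '#fragment' and full URLs against the page URL, and urldefrag then drops the fragment so it is not part of the request.

```python
from urllib.parse import urljoin, urldefrag


def resolve_link(page_url, href):
    """Turn an href found on page_url into an absolute URL without its #fragment."""
    absolute = urljoin(page_url, href)   # resolves href relative to the page it came from
    return urldefrag(absolute).url       # strip '#...' before the URL is requested


page = "http://127.0.0.1:8000/welcome/default/index"
for href in ["/admin/default/index", "user/login", "#", "http://web2py.com/book"]:
    print(resolve_link(page, href))
```

With this, a link such as '#' resolves back to the page itself rather than to 'index#', and a relative link like 'user/login' is joined with a '/' in the right place.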
