Re: [Tutor] An idea for a script

Ian Witham Wed, 10 Oct 2007 17:18:33 -0700

On 10/11/07, Dick Moores <[EMAIL PROTECTED]> wrote:
>
> At 04:20 PM 10/10/2007, Dick Moores wrote:
> >How about a hint of how to get those ">jcooley<" things from the
> >source? (I'm able to have the script get the source, using urllib2.)
> >
> >BTW I thought I wouldn't try to use BeautifulSoup right now, but
> >take the hard way.
> >
> >Dick
>
> I asked for a hint too soon. A light went on, and I think I'm on the way
> with
>
> from urllib2 import *
> u = 'http://starship.python.net/crew/index.html'
> f = urlopen(u)
> a =  f.read()
> b = a.split('"')
> print b
> for x in b:
>      if '<' not in x:
>          print x
>
> This gets all, but not only, those ">jcooley<" things, I believe.



That looks like it will work...
Try starting with a couple of splits so that you are only working with the
data between "The Crew" and "Looking for the official":

a = f.read()
a = a.split("The Crew")[1].split("Looking for")[0]

Now you are only examining the relevant block of HTML.
You can then split it on double quotes and filter the resulting list with a
list comprehension:

b = a.split('"')
# use x rather than u here, so the URL you stored in u is not overwritten
b = [x for x in b if '<' not in x]
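
To see what the two steps do without fetching the page, here is a rough sketch that runs on an inline string. The SAMPLE markup, the hrefs in it, and the helper names crew_block and quoted_pieces are all made up for illustration; the real page's HTML may well look different.

```python
# Hypothetical sample page, invented for this sketch. The real markup
# on starship.python.net may differ.
SAMPLE = (
    '<html><body><a href="/about.html">About</a>'
    '<h2>The Crew</h2>'
    '<a href="~jcooley/">jcooley</a> '
    '<a href="~mhammond/">mhammond</a>'
    '<p>Looking for the official Python site?</p>'
    '<a href="http://www.python.org/">python.org</a></body></html>'
)


def crew_block(html, start="The Crew", end="Looking for"):
    # Keep only the text after the first start marker and before the
    # next end marker. Raises IndexError if the start marker is missing.
    return html.split(start)[1].split(end)[0]


def quoted_pieces(block):
    # Splitting on '"' alternates between markup and quoted attribute
    # values; the pieces with no '<' in them are the quoted values.
    return [x for x in block.split('"') if '<' not in x]


names = quoted_pieces(crew_block(SAMPLE))
print(names)
```

With this particular sample, the links before "The Crew" and after "Looking for" are dropped, and what survives the filter are the quoted href values inside the block. If you want the text between ">" and "<" instead, you would need to split on those characters rather than on the double quote.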

Ian.

_______________________________________________
Tutor maillist  -  [email protected]
http://mail.python.org/mailman/listinfo/tutor
