On 10/11/07, Dick Moores <[EMAIL PROTECTED]> wrote:
>
> At 04:20 PM 10/10/2007, Dick Moores wrote:
> >How about a hint of how to get those ">jcooley<" things from the
> >source? (I'm able to have the script get the source, using urllib2.)
> >
> >BTW I thought I wouldn't try to use BeautifulSoup right now, but
> >take the hard way.
> >
> >Dick
>
> I asked for a hint too soon. A light went on, and I think I'm on the
> way with
>
> from urllib2 import *
> u = 'http://starship.python.net/crew/index.html'
> f = urlopen(u)
> a = f.read()
> b = a.split('"')
> print b
> for x in b:
>     if '<' not in x:
>         print x
>
> This gets all, but not only, those ">jcooley<" things, I believe.

That looks like it will work...

Try starting with a couple of 'splits' so that you are only working with
the data between "The Crew" and "Looking for the official"

a = f.read()
a = a.split("The Crew")[1].split("Looking for")[0]

Now you are only examining the relevant block of HTML.
You can now filter the list with a list comprehension:

b = a.split('"')
b = [u for u in b if '<' not in u]

Ian.
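
To make the two steps above easier to try without fetching the page, here is a minimal sketch run against a small made-up HTML string. The sample markup, the hrefs, and the helper names crew_block and unquoted_pieces are assumptions for illustration only; the real page's source may look different. It is written so that it also runs on Python 3, so it uses no print statement and no urllib2.

```python
# Hypothetical stand-in for the downloaded page source. The real markup
# at starship.python.net is not shown in this thread and may differ.
SAMPLE_HTML = (
    '<html><body><h1>The Crew</h1>'
    '<a href="/~jcooley/">jcooley</a> '
    '<a href="/~example/">example</a>'
    '<p>Looking for the official site?</p>'
    '<a href="/elsewhere/">elsewhere</a>'
    '</body></html>'
)


def crew_block(html, start="The Crew", end="Looking for"):
    """Return the text after the first `start` marker and before `end`.

    Note: [1] is the text between the first and second occurrence of
    `start`, so a marker that also appears earlier (say, in a <title>)
    would shift the result.
    """
    return html.split(start)[1].split(end)[0]


def unquoted_pieces(block):
    """Split on double quotes and keep the pieces that contain no '<'."""
    return [piece for piece in block.split('"') if '<' not in piece]


block = crew_block(SAMPLE_HTML)
pieces = unquoted_pieces(block)
print(pieces)  # ['/~jcooley/', '/~example/']
```

On this sample the filter keeps the strings that sat between double quotes (here the href values) and drops the link outside the "The Crew" block. Whether that matches the ">jcooley<" text on the real page depends on how the page is actually marked up.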

_______________________________________________
Tutor maillist - Tutor@python.org
http://mail.python.org/mailman/listinfo/tutor