shahs...@gmail.com wrote:

> I am trying to scrape a webpage just for learning. In that webpage there
> are multiple "a" tags. consider the below code
>
> <a href='\abc\def\jkl'> Something </a>
>
> <a href ='http:\\www.google.com'> Something</a>

These are probably all forward slashes.

> Now i want to read only those href in which there is http. My Current code
> is
>
> for link in soup.find_all("a"):
>     print link.get("href")
>
> i would like to change it to read only http links.

You mean href values that start with "http://"? While you can do that with a
callback

def check_scheme(href):
    # href is None when the tag has no href attribute
    return href is not None and href.startswith("http://")

for a in soup.find_all("a", href=check_scheme):
    print(a["href"])

or a regular expression

import re

for a in soup.find_all("a", href=re.compile("^http://")):
    print(a["href"])

why not keep things simple and check before printing? Like

for a in soup.find_all("a"):
    href = a.get("href", "")  # empty string if href is missing
    if href.startswith("http://"):
        print(href)

-- 
https://mail.python.org/mailman/listinfo/python-list
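
If "https://" links should count as well, one possible variation on the callback idea is to let urllib.parse inspect the scheme instead of comparing string prefixes. This is only a sketch under a few assumptions: it targets Python 3, and the name is_http_link and the sample href values are made up for illustration (the samples are modelled on the question, written with forward slashes).

```python
from urllib.parse import urlsplit


def is_http_link(href):
    """Return True if href has an http or https scheme and a host part."""
    if not href:  # covers None (attribute missing) and the empty string
        return False
    parts = urlsplit(href)
    # urlsplit() lowercases the scheme, so "HTTPS://..." is accepted too.
    return parts.scheme in ("http", "https") and bool(parts.netloc)


# Hypothetical sample values standing in for a.get("href") results.
hrefs = ["/abc/def/jkl", "http://www.google.com", "HTTPS://example.com/", None]
http_links = [h for h in hrefs if is_http_link(h)]
print(http_links)
```

Because the function accepts None, it should also work as a filter in the same way as check_scheme, i.e. soup.find_all("a", href=is_http_link). Note that urlsplit() can raise ValueError for some malformed URLs (for example an unclosed "[" in the host), which this sketch does not handle.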