vinodh kumar wrote:
> hai all,
> i am a student of computer science dept. i have planned to
> design a search engine in python..i am seeking info about how to
> proceed further.
> i need some example source code

That is an ambitious project. I wonder whether this is "homework". (It sounds too ambitious to be homework, but one never knows.) We don't provide code for homework but are glad to assist you when you get stuck.

Before coding I suggest you create a design or plan for the program. Do you want to emulate Google? (Do you understand what Google does?) Or something simpler? (I suggest simpler.) What are you searching for? How much information do you want to store? How do you want to present the results to a user?

Python provides a urllib2 module for getting the contents of a web page. This example gets the python.org main page and displays the first 100 bytes of it:

>>> import urllib2
>>> f = urllib2.urlopen('http://www.python.org/')
>>> print f.read(100)
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<?xml-stylesheet href="./css/ht2html

That is the basic tool you'd use to get page contents.

BeautifulSoup (http://www.crummy.com/software/BeautifulSoup/) is a really good tool for parsing the page contents, looking for text and links.

I think those are the main ingredients of a search engine. The rest is various strategies for finding web sites from which to read pages.

I suggest you expand the above to a program that will read a given page, find the links to other pages and read them recursively.

Then you need a way to look for the keywords of interest in the page text and store them with references to the links to the pages containing them. Python dictionaries are the way to collect this data, and the shelve module provides a way to save Python objects such as dictionaries for later retrieval.

Hope this helps get you started. Someday your work may excel beyond Google.

-- 
Bob Gailer
510-978-4454
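
P.S. To make the outline above a bit more concrete, here is a rough sketch of the "read a page, follow its links, collect keywords in a dictionary, save with shelve" idea. It is only one possible shape, not a finished design, and it rests on several assumptions of mine: it is written for Python 3, where urllib2.urlopen lives in urllib.request; it uses the standard library's html.parser instead of BeautifulSoup so that it runs without installing anything; it walks the links with a queue rather than literal recursion; and the names PageParser, fetch, crawl, save_index and max_pages are made up for illustration.

```python
import re
import shelve
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag
from urllib.request import urlopen  # Python 3 home of urllib2.urlopen


class PageParser(HTMLParser):
    """Collects link targets and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []
        self.text = []
        self._skip = 0  # nesting depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.text.append(data)


def fetch(url):
    """Download one page and decode it leniently (illustrative helper)."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch_page=fetch, max_pages=20):
    """Breadth-first crawl; returns {word: set of URLs containing it}."""
    index = {}
    seen = {start_url}
    queue = deque([start_url])
    pages_read = 0
    while queue and pages_read < max_pages:
        url = queue.popleft()
        try:
            html = fetch_page(url)
        except Exception:
            continue  # unreachable or broken page: skip it
        pages_read += 1
        parser = PageParser()
        parser.feed(html)
        # Every lowercase word on the page points back to this URL.
        for word in re.findall(r"[a-z0-9]+", " ".join(parser.text).lower()):
            index.setdefault(word, set()).add(url)
        # Resolve relative links and queue the ones not seen yet.
        for href in parser.links:
            link = urldefrag(urljoin(url, href))[0]
            if link.startswith("http") and link not in seen:
                seen.add(link)
                queue.append(link)
    return index


def save_index(index, path):
    """Store each word's URL set in a shelve file for later lookup."""
    with shelve.open(path) as db:
        for word, urls in index.items():
            db[word] = urls
```

You would call something like index = crawl('http://www.python.org/', max_pages=5), then save_index(index, 'index.db'), and later look words up with shelve.open('index.db'). The max_pages limit is there because a crawler with no limit will happily wander the whole web. A real one should also pause between requests and respect each site's robots.txt, which this sketch does not do.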