On 11/6/2009 4:24 PM surjit khakh said...
Write a python program to read a text file named “text.txt” and show the number of times each article is found in the file. Articles in the English language are the words “a”, “an”, and “the”.


Sounds like you're taking a python class. Great! It's probably the best programming language to start with.

First, it helps when asking questions if you mention what version of the language you're using. Some features and options are newer. In particular, there's a string method 'count' that isn't available in older pythons, while the replace method has been around at least ten years.

If you haven't already, the tutorial at http://docs.python.org/tutorial/index.html is a great place to start. Pay particular attention to section 3's string introduction at http://docs.python.org/tutorial/introduction.html#strings and to section 7 on files, starting with http://docs.python.org/tutorial/inputoutput.html#reading-and-writing-files

Implicit in this problem is identifying words in the text file. This is tough because you need to take punctuation into account. There's a neat tool in newer pythons: assuming you've read the file contents into a variable txt, set(txt) gives you every distinct character in the content, that is, all the letters, numbers, punctuation marks, and whitespace characters. You'll need to know these so that you can recognize a word regardless of adjacent punctuation. In this specific case, as articles in English always precede nouns, you'll always find whitespace following an article. It will be a space except, of course, when the article ends a line, where it's followed by whatever line-break characters the text file uses.
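To make the set(txt) idea concrete, here is a minimal sketch. It assumes Python 3, and the string assigned to txt is a made-up stand-in for whatever you read from the file:

```python
# Hypothetical stand-in for the contents of text.txt.
txt = "The cat sat on a mat.\nAn owl, the wise one, watched.\n"

# set(txt) collapses the string to its distinct characters.
chars = set(txt)

# Keep only the characters that are neither letters nor digits. These are
# the ones that can sit right next to a word and hide it from a simple search.
separators = sorted(c for c in chars if not c.isalnum())
print(separators)  # ['\n', ' ', ',', '.']
```

Running this on your real file tells you which characters you have to deal with before counting anything.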

For example, consider the following text:

"""
SECTION 1.4. COUNTY PLANNING COMMISSION.

a. The County Planning Commission shall consist of five members. Each member of the Board of Supervisors shall recommend that a resident of his district be appointed to the Commission; provided, however, the appointments to the Commission shall require the affirmative vote of not less than a majority of the entire membership of the Board.
"""

Any a's, an's or the's in the paragraph body can be easily counted with the string count method once you've properly prepared the text.
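One way "preparing the text" might look, as a rough sketch assuming Python 3. The helper name prepare and the sample sentence are my own inventions for illustration:

```python
import string

def prepare(text):
    """Lowercase text and turn punctuation and line breaks into spaces."""
    text = text.lower()
    for ch in string.punctuation + "\n\r\t":
        text = text.replace(ch, " ")
    # Double every space so neighbouring words never share a separator:
    # str.count only counts non-overlapping matches, so without this
    # "the the" would be counted once.
    text = text.replace(" ", "  ")
    # Pad both ends so the first and last words are surrounded by spaces too.
    return " " + text + " "

sample = "The cat saw a dog. An owl saw the cat, and the the dog ran."
prepared = prepare(sample)
for article in ("a", "an", "the"):
    # Searching for " a " rather than "a" avoids matching inside other words.
    print(article, prepared.count(" " + article + " "))
```

For this sample it reports 1 for a, 1 for an and 4 for the. Note that it treats every piece of punctuation as a word break, so "don't" becomes "don" and "t"; that doesn't matter for counting articles.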

I expect the an's and the's are the easy ones to count. Consider however the paragraph identifier -- "a." -- this is not an article but would likely be counted as one in most solutions. There may also be a subsequent reference to this section (e.g., see a above) or to a range of sections (e.g., see a-n above), which makes this a harder problem still. One possible approach may involve confirming that a noun follows the article. There are dictionaries you can access, or word lists that can help. The WordNet database from Princeton appears fairly complete with 117k entries, but even there it's easy to find exceptions: "A 20's style approach"; "a late bus"; or "a fallen hero".
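To see the "a." problem in action, here is a small sketch (Python 3 assumed) run against the sample text above. It uses the re module, which is my choice and not something the assignment requires, and the helper name tally is made up. The second pattern only accepts an article that is immediately followed by whitespace:

```python
import re

sample = """SECTION 1.4. COUNTY PLANNING COMMISSION.

a. The County Planning Commission shall consist of five members. Each
member of the Board of Supervisors shall recommend that a resident of his
district be appointed to the Commission; provided, however, the
appointments to the Commission shall require the affirmative vote of not
less than a majority of the entire membership of the Board.
"""

def tally(pattern, text):
    """Count case-insensitive matches of pattern for each article."""
    found = [m.lower() for m in re.findall(pattern, text, re.IGNORECASE)]
    return dict((art, found.count(art)) for art in ("a", "an", "the"))

# Naive: any standalone a/an/the, whatever comes after it.
naive = tally(r"\b(a|an|the)\b", sample)
# Stricter: the article must be followed directly by whitespace.
strict = tally(r"\b(a|an|the)(?=\s)", sample)

print(naive)   # a: 3, an: 0, the: 8 (the "a." label is counted)
print(strict)  # a: 2, an: 0, the: 8 (the "a." label is skipped)
```

The whitespace test gets rid of the paragraph label, but a reference like "see a above" would still slip through, so it's a heuristic and not a real fix.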

So, frankly, I expect that solutions to this problem will range from the naive through the reasonably complete to the impossible without human confirmation of complex structure and context.

For your homework, showing you can read in the file, strip out any punctuation, count the resulting occurrences, and report the results should do it.
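Putting those four steps together might look something like this. It's only a sketch, assuming Python 3; the function names are mine, and it is the naive kind of solution, so it will happily count a paragraph label like "a." as an article:

```python
import string

ARTICLES = ("a", "an", "the")

def count_articles(text):
    """Return a dict mapping each article to how often it occurs in text."""
    counts = dict.fromkeys(ARTICLES, 0)
    for word in text.lower().split():
        # Strip punctuation stuck to either end of the word, e.g. '"the' or 'a,'.
        word = word.strip(string.punctuation)
        if word in counts:
            counts[word] += 1
    return counts

def report(filename):
    """Read filename and print one line per article with its count."""
    with open(filename) as f:
        counts = count_articles(f.read())
    for article in ARTICLES:
        print(article, counts[article])

if __name__ == "__main__":
    try:
        report("text.txt")
    except OSError as err:
        print("Could not read text.txt:", err)
```

Splitting on whitespace and then stripping punctuation from each word is a different route from the count method, but it covers the same steps: read, clean up, count, report.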

Emile

_______________________________________________
Tutor maillist  -  Tutor@python.org
To unsubscribe or change subscription options:
http://mail.python.org/mailman/listinfo/tutor
