Re: Finding keywords

Terry Reedy Tue, 08 Mar 2011 13:06:14 -0800

On 3/8/2011 2:00 PM, Matt Chaput wrote:

On 08/03/2011 8:58 AM, Cross wrote:

I know meta tags contain keywords but they are not always reliable. I
can parse XHTML to obtain keywords from meta tags, but how do I verify
them? To obtain reliable keywords, I have to parse the plain text
obtained from the URL.

This, of course, is a problem for all search engines, especially given
'search optimization' games.

I think maybe what the OP is asking about is extracting key words from a
text, i.e. a short list of words that characterize the text. This is an
information retrieval problem, not really a Python problem.

One simple way to do this is to calculate word frequency histograms for
each document in your corpus, and then for a given document, select
words that are frequent in that document but infrequent in the corpus as
a whole. Whoosh does this.
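
A minimal sketch of the frequency comparison described above, using only
the standard library. This is not Whoosh's implementation; the function
names (tokenize, keywords), the crude regex tokenizer, and the smoothed
log weighting are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Deliberately crude: lowercase, then keep runs of letters and apostrophes.
    return re.findall(r"[a-z']+", text.lower())


def keywords(doc, corpus, n=5):
    """Return up to n words that are frequent in doc but rare in corpus."""
    # Word frequency histogram for the document of interest.
    doc_counts = Counter(tokenize(doc))

    # Document frequency: the number of corpus documents containing each word.
    doc_freq = Counter()
    for text in corpus:
        doc_freq.update(set(tokenize(text)))

    total = len(corpus)
    scores = {}
    for word, count in doc_counts.items():
        # Smoothed inverse document frequency (an assumed weighting).
        # A word that appears in every corpus document gets weight 0.
        idf = math.log((1 + total) / (1 + doc_freq[word]))
        scores[word] = count * idf

    # Highest score first; ties are broken alphabetically.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [word for word, _ in ranked[:n]]


corpus = [
    "the cat sat on the mat",
    "the dog sat on the log",
    "the python ate the python egg",
]
print(keywords(corpus[2], corpus, n=3))
```

With a corpus this small the ranking is only illustrative; on real text
you would probably also want a stop-word list and some stemming.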

I believe Google does something like this also. I have seen a claim that
Google only looks at the first x words, hence the advice 'Make sure your
target keywords are in the first x words.' You, of course, can and
should process entire docs.



--
Terry Jan Reedy

--
http://mail.python.org/mailman/listinfo/python-list
