We find that proper nouns
constitute 40% of query terms, and proper nouns and nouns together
constitute over 70% of query terms. We also show that the majority of
queries are noun phrases, not unstructured collections of terms.

Way back when, we had a web-authoring environment that proposed likely hyperlinks by picking noun phrases out of a draft webpage and matching them against a full-text index of other content on the same server. At the time NCSA's "what's new" page was state-of-the-art in finding third-party web content; it might be interesting were someone else to take a run at that fence using modern search engine queries...
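
For anyone wanting to try the idea, here is a minimal sketch of the pick-phrases-then-match-the-index approach described above. It is not the original authoring environment's code. Every name in it (`candidate_phrases`, `build_index`, `propose_links`) is hypothetical, the "noun phrase" step is a crude stand-in that only catches runs of capitalized words, and the index is a toy in-memory word-to-page map rather than a real full-text engine.

```python
import re
from collections import defaultdict

# Hypothetical stand-in for a real noun-phrase chunker: treat runs of
# capitalized words as candidate (proper) noun phrases.
PHRASE_RE = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def candidate_phrases(text):
    """Return candidate phrases, lowercased, in order of first appearance."""
    seen = []
    for match in PHRASE_RE.finditer(text):
        phrase = match.group(0).lower()
        if phrase not in seen:
            seen.append(phrase)
    return seen


def build_index(pages):
    """Map each lowercased word to the set of page URLs containing it."""
    index = defaultdict(set)
    for url, body in pages.items():
        for word in re.findall(r"[a-z]+", body.lower()):
            index[word].add(url)
    return index


def propose_links(draft, index):
    """For each candidate phrase, list pages containing every word of it.

    This is an AND match on individual words, not an exact phrase match.
    """
    proposals = {}
    for phrase in candidate_phrases(draft):
        hits = None
        for word in phrase.split():
            pages = index.get(word, set())
            hits = pages if hits is None else hits & pages
        if hits:
            proposals[phrase] = sorted(hits)
    return proposals
```

Given a dict of `{url: page text}` for the server's existing content, `propose_links(draft_text, build_index(pages))` returns a map from each candidate phrase to the pages it could link to. Sentence-initial words will show up as false candidates; they only survive if some indexed page happens to contain them, which is one reason a real part-of-speech tagger would do better here.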

-Dave

