Hi,

Interesting read when you have a few hours to spare. Guess this is where
Yahoo! chat started labelling possible search queries, but this seems to go
further.

http://www.aclweb.org/anthology-new/D/D08/D08-1107.pdf

Kiran

Abstract
Web-search queries are known to be short, but little else is known about
their structure. In this paper we investigate the applicability
of part-of-speech tagging to typical English-language web search-engine
queries and the potential value of these tags for improving search results.
We begin by identifying a set of part-of-speech tags suitable for search
queries and quantifying their occurrence. We find that proper nouns
constitute 40% of query terms, and proper nouns and nouns together
constitute over 70% of query terms. We also show that the majority of
queries are noun phrases, not unstructured collections of terms. We then use
a set of queries manually labeled with these tags to train a Brill tagger
and evaluate its performance. In addition, we investigate classification of
search queries into grammatical classes based on the syntax of
part-of-speech tag sequences. We also conduct preliminary investigative
experiments into the practical applicability of leveraging query-trained
part-of-speech taggers for information-retrieval tasks. In particular, we
show that part-of-speech information can be a significant feature in
machine-learned search-result relevance. These experiments also include the
potential use of the tagger in selecting words for omission or substitution
in query reformulation, actions which can improve recall. We conclude that
training a part-of-speech tagger on labeled corpora of queries significantly
outperforms taggers based on traditional corpora, and leveraging the unique
linguistic structure of web-search queries can improve search experience.
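
To make two of the abstract's steps concrete (quantifying how often each
tag occurs, and classifying a query by the syntax of its tag sequence),
here is a minimal Python sketch. Everything in it is an assumption for
illustration: the tag names, the three hand-labeled queries, and the
noun-phrase pattern are made up and are not the paper's tag set, data, or
grammar. It also does not train a Brill tagger; it starts from queries
that are already labeled.

```python
import re
from collections import Counter

# Hypothetical hand-labeled queries: each is a list of (term, tag) pairs.
# The tag names are illustrative only, not the tag set from the paper.
TAGGED_QUERIES = [
    [("cheap", "ADJ"), ("flights", "NOUN"), ("boston", "PROPN")],
    [("yahoo", "PROPN"), ("mail", "NOUN")],
    [("how", "ADV"), ("to", "PART"), ("tie", "VERB"), ("a", "DET"), ("tie", "NOUN")],
]

# Assumed pattern for a "noun phrase" query: an optional determiner, any
# number of adjective/noun modifiers, and a noun or proper noun as the head.
NOUN_PHRASE = re.compile(r"(DET )?((ADJ|NOUN|PROPN) )*(NOUN|PROPN)")


def tag_shares(queries):
    """Return the fraction of all query terms carrying each tag."""
    counts = Counter(tag for query in queries for _, tag in query)
    total = sum(counts.values())
    return {tag: n / total for tag, n in counts.items()}


def is_noun_phrase(query):
    """Classify a tagged query by matching its tag sequence against the pattern."""
    sequence = " ".join(tag for _, tag in query)
    return NOUN_PHRASE.fullmatch(sequence) is not None


if __name__ == "__main__":
    for tag, share in sorted(tag_shares(TAGGED_QUERIES).items()):
        print(f"{tag}: {share:.0%}")
    for query in TAGGED_QUERIES:
        text = " ".join(term for term, _ in query)
        print(text, "->", "noun phrase" if is_noun_phrase(query) else "other")
```

On real data the labeled queries would come from the manual annotation the
abstract mentions, and the pattern would need to follow whatever
grammatical classes the paper actually defines.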
