Our next IR Meetup is at Cengage Learning on May 19, 2011. Please RSVP here:
http://www.meetup.com/Michigan-Information-Retrieval-Enthusiasts-Group/events/17567795/
Presentations:
1. Bayesian Language Model
This talk presents a Bayesian language model, originally described by Teh
(2006), which uses a hierarchical Pitman-Yor process to model the
distribution of n-grams in an n-gram language model and which yields a
Bayesian back-off and smoothing strategy. The language model, which places a
power-law prior over the n-gram space, compares favorably with language models
based on state-of-the-art empirical n-gram smoothing techniques. Because the
background needed to understand the model is somewhat involved, and much of it
does not appear in Teh (2006), that material is also presented in some detail.
In particular, the talk covers the Dirichlet distribution and the Dirichlet
process, relates the Dirichlet process to the Pitman-Yor process, and then
presents the hierarchical Pitman-Yor process.
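As a rough illustration of the power-law behavior mentioned above, here is a minimal sketch (not from Teh 2006, and covering only a single, non-hierarchical Pitman-Yor process) of sampling via the Chinese-restaurant construction. The function name and parameters are illustrative assumptions:

```python
import random

def pitman_yor_sample(n, discount=0.5, concentration=1.0, seed=0):
    """Draw n samples from one Pitman-Yor process via the Chinese
    restaurant construction. Illustrative sketch: each new table is
    assigned a fresh integer label standing in for a draw from the
    base distribution. Requires 0 <= discount < 1."""
    rng = random.Random(seed)
    tables = []    # customer count at each table
    samples = []   # table label chosen for each customer
    for i in range(n):
        total = i  # customers seated so far
        # Existing table k is chosen with prob (count_k - discount) / (total + concentration);
        # a new table with prob (concentration + discount * num_tables) / (total + concentration).
        r = rng.random() * (total + concentration)
        acc = 0.0
        for k, count in enumerate(tables):
            acc += count - discount
            if r < acc:
                tables[k] += 1
                samples.append(k)
                break
        else:
            samples.append(len(tables))
            tables.append(1)
    return samples
```

With discount d > 0, the number of distinct labels tends to grow roughly like n^d, which is the heavy-tailed, power-law behavior that makes the process a good fit for word-frequency distributions.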
2. Using GATE for Word Polarity in Context Classification
GATE (General Architecture for Text Engineering) is open-source software for
creating text processing workflows. Core GATE includes tools for solving
many text engineering problems: modeling and persistence of specialized data
structures; measurement, evaluation, and benchmarking; visualization and
editing of annotations, ontologies, parse trees, etc.; extraction of training
instances for machine learning; and pluggable machine learning
implementations. This tutorial will show how to use GATE for advanced machine
learning applications.
Detecting word polarity in context will be used as an example to demonstrate
some of GATE's features. The tutorial project is based on recent sentiment
analysis research, specifically Theresa Wilson, Janyce Wiebe, and Paul
Hoffmann, "Recognizing Contextual Polarity: An Exploration of Features for
Phrase-Level Sentiment Analysis" (2009). Using different features (words,
parts of speech, negations, etc.), an SVM classifier is trained and evaluated.
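To give a feel for the feature-based classification idea (this is a toy sketch in Python with scikit-learn, not the GATE pipeline the tutorial will use; the feature names and phrases are invented):

```python
# Toy sketch of contextual polarity classification: word identity,
# a negation flag, and their conjunction (since negation can flip a
# word's prior polarity) are fed to a linear SVM.
from sklearn.feature_extraction import DictVectorizer
from sklearn.svm import LinearSVC

def feats(word, negated):
    """Illustrative feature extractor: word, negation, and conjunction."""
    return {"word": word,
            "negated": str(negated),
            "word|negated": f"{word}|{negated}"}

train = [
    (feats("good", False), "positive"),
    (feats("good", True), "negative"),   # negation flips prior polarity
    (feats("terrible", False), "negative"),
    (feats("terrible", True), "positive"),
    (feats("great", False), "positive"),
    (feats("awful", False), "negative"),
]
X = [f for f, _ in train]
y = [label for _, label in train]

vec = DictVectorizer()
clf = LinearSVC()
clf.fit(vec.fit_transform(X), y)
```

The conjunction feature is what lets a linear classifier handle the polarity flip: "good" alone looks positive, but "good|True" (negated) carries its own learned weight.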
Thank you,
Ivan Provalov