Our next IR Meetup is at Cengage Learning on May 19, 2011. Please RSVP here: http://www.meetup.com/Michigan-Information-Retrieval-Enthusiasts-Group/events/17567795/
Presentations:

1. Bayesian Language Model

This talk presents a Bayesian language model, originally described by (Teh 2006), which uses a hierarchical Pitman-Yor process to describe the distribution of n-grams in an n-gram language model and which allows for a Bayesian back-off and smoothing strategy. The language model, which assumes a power-law prior over the n-gram space, compares favorably with language models based upon state-of-the-art empirical n-gram smoothing techniques.

Because the background needed to understand the model is somewhat difficult, and most of it does not appear in (Teh 2006), the talk also covers that material in some detail. In particular, background information related to the Dirichlet distribution and the Dirichlet process is given. The Dirichlet process is then related to the Pitman-Yor process, and the hierarchical Pitman-Yor process is also presented.

2. Using GATE for Word Polarity in Context Classification

GATE (General Architecture for Text Engineering) is open source software for creating text processing workflows. Core GATE includes tools for solving many text engineering problems: modeling and persistence of specialized data structures; measurement, evaluation, and benchmarking; visualization and editing of annotations, ontologies, parse trees, etc.; extraction of training instances for machine learning; and pluggable machine learning implementations.

This tutorial will show how to use GATE for advanced machine learning applications. Detecting word polarity in context will be used as an example to show some of the GATE features. The tutorial project is based on recent sentiment analysis research, specifically the work by Theresa Wilson, Janyce Wiebe, and Paul Hoffmann, "Recognizing Contextual Polarity: An Exploration of Features for Phrase-Level Sentiment Analysis", 2009. An SVM classifier is trained and evaluated using different features (words, part of speech, negations, etc.).
Thank you, Ivan Provalov