Announcing the Release of WordSegment Version 0.6.1

What is WordSegment?
--------------------

WordSegment is an Apache2-licensed module for English word segmentation, written in pure Python and based on a trillion-word corpus.

The implementation builds on code from the chapter “Natural Language Corpus Data” by Peter Norvig in the book “Beautiful Data” (Segaran and Hammerbacher, 2009). Data files are derived from the Google Web Trillion Word Corpus.

The module has 100% code coverage and complete documentation.

What's new in 0.6.1?
--------------------

- Exposed the TOTAL constant, which represents the count of all unigrams in the corpus. Defaults to 1,024,908,267,229.

- Added documentation on how to use a different corpus: http://www.grantjenks.com/docs/wordsegment/using-a-different-corpus.html

Links
-----

- Documentation: http://www.grantjenks.com/docs/wordsegment/

- Download: https://pypi.python.org/pypi/wordsegment

- Source: https://github.com/grantjenks/wordsegment

- Issues: https://github.com/grantjenks/wordsegment/issues

This release is backwards-compatible. Please upgrade.
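
For readers wondering why a total such as TOTAL matters when swapping in a different corpus, here is a minimal, self-contained sketch of unigram-based segmentation in the style of Norvig's chapter. It does not import WordSegment and is not the library's code: the UNIGRAMS table, its counts, and the score and segment helpers below are hypothetical and exist only for illustration. The point is that word probabilities are counts divided by the total, so the total has to match whatever corpus supplies the counts.

```python
import math
from functools import lru_cache

# Toy unigram counts (hypothetical values, not taken from the real corpus).
UNIGRAMS = {'this': 500, 'is': 800, 'a': 1000, 'test': 200}

# Count of all unigrams in this toy corpus. With a different corpus,
# this total must be recomputed so the probabilities stay consistent.
TOTAL = float(sum(UNIGRAMS.values()))

MAX_WORD_LENGTH = 24  # assumed cap on candidate word length


def score(word):
    """Return the log10 probability of a single word."""
    if word in UNIGRAMS:
        return math.log10(UNIGRAMS[word] / TOTAL)
    # Unknown words are penalized more heavily the longer they are.
    return math.log10(10.0 / (TOTAL * 10 ** len(word)))


@lru_cache(maxsize=None)
def _search(text):
    """Return (best log score, best tuple of words) for text."""
    if not text:
        return 0.0, ()
    candidates = []
    for i in range(1, min(len(text), MAX_WORD_LENGTH) + 1):
        first, rest = text[:i], text[i:]
        rest_score, rest_words = _search(rest)
        candidates.append((score(first) + rest_score, (first,) + rest_words))
    return max(candidates)


def segment(text):
    """Split text into the most probable sequence of words."""
    return list(_search(text)[1])
```

With these toy counts, segment('thisisatest') splits the input into 'this', 'is', 'a', 'test'. If the counts came from another corpus while TOTAL kept its old value, known and unknown words would be weighed against each other on the wrong scale.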