Package: wnpp
Severity: wishlist
Owner: Emmanuel Bourg <ebo...@apache.org>

* Package name    : boilerpipe
  Version         : 1.2.0
  Upstream Author : Christian Kohlschütter <christ...@kohlschutter.com>
* URL             : http://code.google.com/p/boilerpipe
* License         : Apache-2.0
  Programming Lang: Java
  Description     : Boilerplate removal and fulltext extraction from HTML pages

The boilerpipe library provides algorithms to detect and remove the surplus
"clutter" (boilerplate, templates) around the main textual content of a web
page.

The library ships with ready-made strategies for common tasks (for example,
news article extraction) and can easily be extended for individual problem
settings.

Extraction is very fast (milliseconds), needs only the input document (no
global or site-level information required), and is usually quite accurate.


