https://bugzilla.redhat.com/show_bug.cgi?id=1002704

            Bug ID: 1002704
           Summary: Review Request: boilerpipe - Boilerplate Removal and
                    Fulltext Extraction from HTML pages
           Product: Fedora
           Version: rawhide
         Component: Package Review
          Severity: medium
          Priority: medium
          Assignee: [email protected]
          Reporter: [email protected]
        QA Contact: [email protected]
                CC: [email protected],
                    [email protected]



Spec URL: http://gil.fedorapeople.org/boilerpipe.spec
SRPM URL: http://gil.fedorapeople.org/boilerpipe-1.2.0-1.fc19.src.rpm
Description:
The boilerpipe library provides algorithms to detect and
remove the surplus "clutter" (boilerplate, templates)
around the main textual content of a web page.

The library already provides specific strategies 
for common tasks (for example: news article extraction) and
may also be easily extended for individual problem settings.

Extracting content is very fast (milliseconds), needs only the
input document (no global or site-level information required), and
is usually quite accurate.
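
For reviewers unfamiliar with the problem, here is a rough sketch of the general idea behind the description above. It splits a page into text blocks and keeps only the blocks that look like main content, judged by shallow features (word count and link density). This is a hypothetical illustration only. It is not boilerpipe's API, classifier, or thresholds. The class BlockFilter, its TextBlock type, and the MIN_WORDS / MAX_LINK_DENSITY values are all invented for this example.

```java
import java.util.ArrayList;
import java.util.List;

/**
 * Hypothetical sketch of boilerplate removal using shallow text features.
 * NOT boilerpipe's API: names and thresholds are illustrative assumptions.
 */
public class BlockFilter {

    /** A text block: its text and how many of its words appear inside links. */
    public static final class TextBlock {
        final String text;
        final int linkedWords;

        public TextBlock(String text, int linkedWords) {
            this.text = text;
            this.linkedWords = linkedWords;
        }

        /** Number of whitespace-separated words in the block. */
        int wordCount() {
            String trimmed = text.trim();
            return trimmed.isEmpty() ? 0 : trimmed.split("\\s+").length;
        }

        /** Fraction of the block's words that sit inside links (0.0 for empty blocks). */
        double linkDensity() {
            int words = wordCount();
            return words == 0 ? 0.0 : (double) linkedWords / words;
        }
    }

    // Hypothetical thresholds, chosen only for this illustration.
    static final int MIN_WORDS = 10;
    static final double MAX_LINK_DENSITY = 0.33;

    /** Keeps blocks that are long enough and not dominated by links. */
    public static List<String> keepContent(List<TextBlock> blocks) {
        List<String> kept = new ArrayList<>();
        for (TextBlock b : blocks) {
            if (b.wordCount() >= MIN_WORDS && b.linkDensity() <= MAX_LINK_DENSITY) {
                kept.add(b.text);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<TextBlock> page = List.of(
            new TextBlock("Home News Sports Contact", 4),  // navigation bar: all links
            new TextBlock("The city council voted on Tuesday to expand the bus network across three districts.", 0),
            new TextBlock("Copyright 2013", 0));           // short footer
        for (String s : keepContent(page)) {
            System.out.println(s);  // prints only the article sentence
        }
    }
}
```

The real library ships tuned extractors for specific tasks, such as the news article extraction mentioned above. The fixed thresholds in this sketch only show why a link-heavy navigation bar and a short footer get dropped while a longer prose block is kept.
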
Fedora Account System Username: gil

_______________________________________________
package-review mailing list
[email protected]
https://admin.fedoraproject.org/mailman/listinfo/package-review
