[jira] [Commented] (TIKA-1699) Integrate the GROBID PDF extractor in Tika

2015-07-29 Thread Sujen Shah (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646316#comment-14646316 ] Sujen Shah commented on TIKA-1699: -- Working towards publishing GROBID to Maven Central

[jira] [Commented] (TIKA-1699) Integrate the GROBID PDF extractor in Tika

2015-07-29 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1699?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646321#comment-14646321 ] ASF GitHub Bot commented on TIKA-1699: -- GitHub user sujen1412 opened a pull request:

[GitHub] tika pull request: Fix for TIKA-1699 contributed by Sujen Shah

2015-07-29 Thread sujen1412
GitHub user sujen1412 opened a pull request: https://github.com/apache/tika/pull/55 Fix for TIKA-1699 contributed by Sujen Shah. Waiting for GROBID to be published to Maven Central. Sonatype issue: https://issues.sonatype.org/browse/OSSRH-16837 You can merge this pull request

[jira] [Commented] (TIKA-1696) Language Identification with Text Processing Toolkit from MITLL

2015-07-29 Thread Paul Ramirez (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1696?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646530#comment-14646530 ] Paul Ramirez commented on TIKA-1696: The algorithm that is used is described here:

Re: Bayesian N-Gram Language Detection

2015-07-29 Thread Oleg Tikhonov
+1 !!! My two cents. Please also add the ability to change/retrain/tote language profiles. Thanks !!! BR, Oleg On Wed, Jul 29, 2015 at 3:59 AM, Mattmann, Chris A (3980) chris.a.mattm...@jpl.nasa.gov wrote: Cool. Well with this one I found, along with language-detector, along with Ramirez and the

[jira] [Updated] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2015-07-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1607: -- Attachment: TIKA-1607v1_rough_rough.patch I'm attaching a strawman approach to this...slightly different

[jira] [Comment Edited] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2015-07-29 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14646124#comment-14646124 ] Tim Allison edited comment on TIKA-1607 at 7/29/15 2:45 PM: I'm

[jira] [Commented] (TIKA-1691) Apache Tika for enabling metadata interoperability

2015-07-29 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1691?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14645973#comment-14645973 ] Nick Burch commented on TIKA-1691: -- Reading the PDF, it looks to me like what our parsers