Re: Using pf4j for tika pipes

2024-08-26 Thread Eric Pugh
of other intensions. > > After this is merged, I'd like to build another RC so I can see if the > issues reported by users are fixed. > > -Nicholas ___ Eric Pugh | Founder | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.c

Re: Bump dependabot to weekly?

2024-04-29 Thread Eric Pugh
ting them reflexively out of your email! >>> Not Tilman!!! >>> >>> Let's move to weekly and see how that works? >>> >>> On Wed, Apr 10, 2024 at 3:57 PM Eric Pugh >>> mailto:ep...@opensourceconnections.com>> >>> wrote: >>

Re: Bump dependabot to weekly?

2024-04-10 Thread Eric Pugh
ning regression tests, we'd run the update plugin to make > sure that we're up to date. > What do you think? > >Best, > > Tim > ___ Eric Pugh | Founder | OpenSource Connections, LLC | 434.466.1467 | http://www.opensource

Re: Bump dependabot to weekly?

2024-04-10 Thread Eric Pugh
're up to date. > What do you think? > >Best, > > Tim ___ Eric Pugh | Founder | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | My Free/Busy <http://t

Re: Document chunking

2024-04-09 Thread Eric Pugh
able headers at the >>start of every chunk >> * When you're done chunking + adding bits back at the top, convert >>to markdown on output >> >> Happy to explain more! But sadly lacking time right now to do much on that >> >> Nick >

Re: Replace baseline language detection in tika-server and tika-app in 3.x?

2024-04-08 Thread Eric Pugh
>> Hello Tika community, >> >> >> >> Our team is migrating away from usage of tika-app.jar (2.6 currently) >> to something with more minimal third party dependencies which we can >> control. >> >> >> >

Re: Support page?

2023-05-12 Thread Eric Pugh
e Infra thing :) > > Though maybe a mark-down page in the Git repo could also work - haven’t spent > much time thinking about this... > > — Ken > > >> On May 12, 2023, at 5:50 AM, Tim Allison wrote: >> >> All, >> I was chatting with Eric Pu

Re: docker versions?

2022-10-27 Thread Eric Pugh
version of Tika comes out, say 2.6.0, should > we start with a four digit docker version, e.g. 2.6.0.0 for our docker > releases or should we go back to three digits? > > Thank you, all! > > Cheers, > > Tim ___ Eric Pugh | Founder & CEO |

Re: [DISCUSS] support for Java 8?

2022-03-25 Thread Eric Pugh
for how long? >> >> Cheers, >> >> Tim > > ___ Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | My Free/Busy <http://tinyu

Re: [DRAFT] Dedicated ANNOUNCE for Tika 1.x EoL?

2022-02-10 Thread Eric Pugh
more than six months until the official EoL date for Tika 1.x. >>>> Tim mentioned that some narrative was provided in the recent release >>>> announcement but I think we could help ourselves by explicitly sending a >>>> dedicated 1.x EoL ANNOUNCEMENT. >>

Re: [DISCUSS] upgrading log4j to to log4j2 in Tika's 1.x branch

2021-12-13 Thread Eric Pugh
g4j2? >>> >>> Best, >>> >>> Tim >>> >>> >>> [1] >>> https://issues.apache.org/jira/browse/TIKA-3616?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=174575

Re: Proposed topics for next Tika meetups?

2021-11-09 Thread Eric Pugh
pipes hands-on workshop > b) get to know the users -- 5 minute go-around the room "this is how > we use it; these are our pain points" > c) ??? > > Again, thank you! > > Best, > > Tim _

[jira] [Created] (TIKA-3497) Update README for installing Tika Server as a service for 2.0 release

2021-07-24 Thread David Eric Pugh (Jira)
David Eric Pugh created TIKA-3497: - Summary: Update README for installing Tika Server as a service for 2.0 release Key: TIKA-3497 URL: https://issues.apache.org/jira/browse/TIKA-3497 Project: Tika

[jira] [Commented] (TIKA-3495) parent-child in solr emitter doesn't seem to include parent id (_nest_parent_)

2021-07-23 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386334#comment-17386334 ] David Eric Pugh commented on TIKA-3495: --- Looking at that json file you linke

[jira] [Comment Edited] (TIKA-3495) parent-child in solr emitter doesn't seem to include parent id (_nest_parent_)

2021-07-23 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386315#comment-17386315 ] David Eric Pugh edited comment on TIKA-3495 at 7/23/21, 3:4

[jira] [Commented] (TIKA-3495) parent-child in solr emitter doesn't seem to include parent id (_nest_parent_)

2021-07-23 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17386315#comment-17386315 ] David Eric Pugh commented on TIKA-3495: --- This area of Solr has been changing a

[jira] [Commented] (TIKA-1570) Seeking a stop method for better use with Apache Commons Daemon

2021-05-13 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343965#comment-17343965 ] David Eric Pugh commented on TIKA-1570: --- The associated pr seems reasonable, w

[jira] [Commented] (TIKA-1570) Seeking a stop method for better use with Apache Commons Daemon

2021-05-13 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343963#comment-17343963 ] David Eric Pugh commented on TIKA-1570: --- I might suggest trying to go down

[jira] [Commented] (TIKA-1570) Seeking a stop method for better use with Apache Commons Daemon

2021-05-13 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-1570?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17343962#comment-17343962 ] David Eric Pugh commented on TIKA-1570: --- Unfortunately they are Linux

Re: high level parser module names in 2.x

2021-05-11 Thread Eric Pugh
Sounds good to me. On Tue, May 11, 2021 at 9:33 AM Tim Allison wrote: > If there aren't objections, I'll make this change today or tomorrow. > > Cheers, > >Tim > > On Tue, Apr 20, 2021 at 10:57 AM Tim Allison wrote: > > > > How about: > > > > standard > > extended > > ml (for machin

Re: high level parser module names in 2.x

2021-03-09 Thread Eric Pugh
uire native libs and/or have > heavier dependencies, including network calls. > > tika-parsers-advanced -- anything goes. dl4j as a dependency, etc. > > Some options for classic-> basic, base, ...what else? > > Any other recommendations for these names? Thank you! > >

Re: Config Tika Server

2021-01-18 Thread Eric Pugh
ormations. >> I understand that I need to config tika server to obtain that. Could you >> please help me with that? >> >> Thanks, >> Nilton >> ___ Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.o

[jira] [Commented] (TIKA-3258) Run OCR on PDFs with 'auto' mode as default in Tika 2.0.0

2021-01-06 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17259809#comment-17259809 ] David Eric Pugh commented on TIKA-3258: --- I'm thinking that this is

Re: Expected private/secret keys in the source (TIKA-3205)

2020-09-29 Thread Eric Pugh
> > So before anyone else gets a notification and worries, I felt it best to give > everyone a heads-up that yes, there are private key files in the Tika source > tree, and yes, they are supposed to be there! > > Cheers > Nick ___ Eric Pugh |

[jira] [Commented] (TIKA-3166) Actually maven-modularize the packages for 2.0

2020-08-20 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17181262#comment-17181262 ] David Eric Pugh commented on TIKA-3166: --- I did a diff, and while I can't s

[jira] [Commented] (TIKA-3093) Enable tika-server to forward parse results to another endpoint

2020-04-24 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17091703#comment-17091703 ] David Eric Pugh commented on TIKA-3093: --- Out of curiosity, is this type of beha

Re: Issue with > 200% CPU after bulk usage

2020-04-16 Thread Eric Pugh
to show what the JVM is doing? > https://access.redhat.com/solutions/18178 > > Nick ___ Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | My Free/Busy <htt

Re: Tika master branch not building

2020-04-08 Thread Eric Pugh
here I did the release and was trying to build it for >>> updating the site, and this had already kicked in. :( >>> >>> Y, we can turn this to warn, as long as we run it with fail as part of >> the >>> release process. >>> >>> On Mon,

[jira] [Commented] (TIKA-2368) Clean up SentimentParser dependencies

2020-04-06 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076673#comment-17076673 ] David Eric Pugh commented on TIKA-2368: --- I'm actually not sure

[jira] [Commented] (TIKA-2368) Clean up SentimentParser dependencies

2020-04-06 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17076501#comment-17076501 ] David Eric Pugh commented on TIKA-2368: --- In [https://github.com/apache/tika/

Re: Tika master branch not building

2020-04-06 Thread Eric Pugh
are on the latest version of the plugin? https://github.com/apache/tika/blob/master/tika-parent/pom.xml#L382 suggests we are on 3.1.0. Eric > On Apr 6, 2020, at 9:59 AM, Nick Burch wrote: > > On Mon, 6 Apr 2020, Eric Pugh wrote: >> Maybe this needs better documentation, ho

Re: Tika master branch not building

2020-04-06 Thread Eric Pugh
>> [ERROR] Re-run Maven using the -X switch to enable full debug logging. >> [ERROR] >> [ERROR] For more information about the errors and possible solutions, >> please read the following articles: >> [ERROR] [Help 1] >> http://cwiki.apache.org/confluence/display/M

[jira] [Commented] (TIKA-3075) Add an HTTP parser

2020-03-19 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3075?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17062619#comment-17062619 ] David Eric Pugh commented on TIKA-3075: --- Not sure I understand what this issu

[jira] [Commented] (TIKA-3035) Tika-app --extract mode outputs to stderr instead of stdout

2020-02-25 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044533#comment-17044533 ] David Eric Pugh commented on TIKA-3035: --- Tried it with tika-app-1.23.jar and wo

[jira] [Commented] (TIKA-3035) Tika-app --extract mode outputs to stderr instead of stdout

2020-02-25 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3035?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17044406#comment-17044406 ] David Eric Pugh commented on TIKA-3035: --- Here is my command: java -cp tika

[jira] [Commented] (TIKA-3037) Tika Docs should highlight Tika-Server

2020-02-24 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17043796#comment-17043796 ] David Eric Pugh commented on TIKA-3037: --- [~tallison]did you see

Re: [jira] [Commented] (TIKA-3040) PDF inline OCR: Exception while processing certain image (others in same PDF work)

2020-02-12 Thread Eric Pugh
>> at org.eclipse.jetty.server.Server.handle(Server.java:500) at >> org.eclipse.jetty.server.HttpChannel.lambda$handle$1(HttpChannel.java:386) >> at org.eclipse.jetty.server.HttpChannel.dispatch(HttpChannel.java:560) at >> org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:378) at >> org.eclipse.jetty

Re: [EXTERNAL] Do we have a community supported approach for deploying Tika Server in production?

2020-02-06 Thread Eric Pugh
Dave, I pushed up TIKA-3039 with this change for your review and commit! > On Feb 6, 2020, at 7:20 AM, Eric Pugh wrote: > > Great! > > >> On Feb 5, 2020, at 10:55 PM, David Meikle > <mailto:da...@meikle.io>> wrote: >> >> Hi Eric, >>

[jira] [Commented] (TIKA-3039) Remove mvn dockerfile:build goal from tika-server

2020-02-06 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3039?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17032034#comment-17032034 ] David Eric Pugh commented on TIKA-3039: --- Okay, PR created [~davemeikle] for

[jira] [Created] (TIKA-3039) Remove mvn dockerfile:build goal from tika-server

2020-02-06 Thread David Eric Pugh (Jira)
David Eric Pugh created TIKA-3039: - Summary: Remove mvn dockerfile:build goal from tika-server Key: TIKA-3039 URL: https://issues.apache.org/jira/browse/TIKA-3039 Project: Tika Issue Type

Re: [EXTERNAL] Do we have a community supported approach for deploying Tika Server in production?

2020-02-06 Thread Eric Pugh
nt snapshots too. > > Cheers, > Dave > > On Wed, 5 Feb 2020 at 15:34, Eric Pugh <mailto:ep...@opensourceconnections.com>> > wrote: > >> Following this thread, should we deprecate/remove the Tika Docker support >> that is in Tika-server project? >

[jira] [Commented] (TIKA-3038) Miredot license key expired

2020-02-05 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030925#comment-17030925 ] David Eric Pugh commented on TIKA-3038: --- Also, the url for the plugin has cha

[jira] [Created] (TIKA-3038) Miredot license key expired

2020-02-05 Thread David Eric Pugh (Jira)
David Eric Pugh created TIKA-3038: - Summary: Miredot license key expired Key: TIKA-3038 URL: https://issues.apache.org/jira/browse/TIKA-3038 Project: Tika Issue Type: Task

[jira] [Commented] (TIKA-2253) Obtain new Miredot license key and upgrade plugin version in tika-server

2020-02-05 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030904#comment-17030904 ] David Eric Pugh commented on TIKA-2253: --- Hi all...The license has exp

[jira] [Commented] (TIKA-3037) Tika Docs should highlight Tika-Server

2020-02-05 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030900#comment-17030900 ] David Eric Pugh commented on TIKA-3037: --- Okay, I've attached a SVN DIFF p

[jira] [Updated] (TIKA-3037) Tika Docs should highlight Tika-Server

2020-02-05 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Eric Pugh updated TIKA-3037: -- Attachment: gettingstarted.apt.patch > Tika Docs should highlight Tika-Ser

[jira] [Commented] (TIKA-3037) Tika Docs should highlight Tika-Server

2020-02-05 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030862#comment-17030862 ] David Eric Pugh commented on TIKA-3037: --- Okay, in https://svn.apache.org/repos

[jira] [Commented] (TIKA-3037) Tika Docs should highlight Tika-Server

2020-02-05 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030806#comment-17030806 ] David Eric Pugh commented on TIKA-3037: --- I put some edits into the wiki at h

[jira] [Commented] (TIKA-3037) Tika Docs should highlight Tika-Server

2020-02-05 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030766#comment-17030766 ] David Eric Pugh commented on TIKA-3037: --- Thanks [~nick] > Tika Docs

[jira] [Commented] (TIKA-3037) Tika Docs should highlight Tika-Server

2020-02-05 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030760#comment-17030760 ] David Eric Pugh commented on TIKA-3037: --- Another comment, so the page h

[jira] [Commented] (TIKA-3037) Tika Docs should highlight Tika-Server

2020-02-05 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3037?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17030748#comment-17030748 ] David Eric Pugh commented on TIKA-3037: --- So... Where does the HTML for

Re: [EXTERNAL] Do we have a community supported approach for deploying Tika Server in production?

2020-02-05 Thread Eric Pugh
note, I've created a new repos: > > https://github.com/apache/tika-docker > > > > Thinking based on looking at the PRs and Issues on LogicalSpark > > docker-tikaserver, I'll create an updated docker file using what you've > > added here and look to pub

[jira] [Created] (TIKA-3037) Tika Docs should highlight Tika-Server

2020-02-05 Thread David Eric Pugh (Jira)
David Eric Pugh created TIKA-3037: - Summary: Tika Docs should highlight Tika-Server Key: TIKA-3037 URL: https://issues.apache.org/jira/browse/TIKA-3037 Project: Tika Issue Type: Improvement

Re: [EXTERNAL] Do we have a community supported approach for deploying Tika Server in production?

2020-01-07 Thread Eric Pugh
Hi all, I’ve gone ahead and added the -spawnChild property as a default when running Tika Server as a service. I’d love some eyes on the PR, and if this looks good, get it committed. Feedback welcome! Eric > On Dec 17, 2019, at 12:53 PM, Eric Pugh > wrote: > > Cool.

Re: [EXTERNAL] Do we have a community supported approach for deploying Tika Server in production?

2019-12-17 Thread Eric Pugh
elease Tika, previously we ship >> tika-app.jar and Tika-eval.jar, and Tika-server.jar, and now, I think, we >> want to add the tika-server-bin.tgz and tika-server-bin.zip binary >> distributions. >> >> I’m happy to start writing accompanying “how to deploy

Re: [EXTERNAL] Do we have a community supported approach for deploying Tika Server in production?

2019-12-16 Thread Eric Pugh
deploy Tika Server” docs if this PR looks good! Or, please give input and I’ll make the updates. Eric > On Dec 12, 2019, at 2:39 PM, Eric Pugh > wrote: > > I’ve created this JIRA to track this work: > https://issues.apache.org/jira/browse/TIKA-3010 > <https://issues.

[jira] [Commented] (TIKA-3010) Tika needs service installation script

2019-12-12 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16995174#comment-16995174 ] David Eric Pugh commented on TIKA-3010: --- Made more progress. Now, when you

[jira] [Updated] (TIKA-3010) Tika needs service installation script

2019-12-12 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Eric Pugh updated TIKA-3010: -- Flags: Patch,Important (was: Important) > Tika needs service installation scr

Re: [EXTERNAL] Do we have a community supported approach for deploying Tika Server in production?

2019-12-12 Thread Eric Pugh
n to the Solr folks (having created 7633) > from the 2014 DARPA MEMEX days was to > move towards Tika Server based SolrCell dep and that’s the right way to go > IMO. > > > > Chris > > > > > > > > > > > > From: Eric Pugh <mai

[jira] [Created] (TIKA-3010) Tika needs service installation script

2019-12-12 Thread David Eric Pugh (Jira)
David Eric Pugh created TIKA-3010: - Summary: Tika needs service installation script Key: TIKA-3010 URL: https://issues.apache.org/jira/browse/TIKA-3010 Project: Tika Issue Type: Improvement

Miredot documentation is missing for 1.23...

2019-12-12 Thread Eric Pugh
https://tika.apache.org/1.23/miredot/ <https://tika.apache.org/1.23/miredot/> url has a 404. Looks like https://tika.apache.org/1.22/miredot/ <https://tika.apache.org/1.22/miredot/> works. ___ Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.

Do we have a community supported approach for deploying Tika Server in production?

2019-12-04 Thread Eric Pugh
? Do we need to create the equivalent of the Service Installation scripts for Tika-Server? Wanted to stoke the discussion! Eric ___ Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com

Re: regression tests for 1.23-rc1

2019-11-22 Thread Eric Pugh
extract images for PDFs, which adds to the processing time and > leads to more OOMs. > I restarted the regression tests this morning with that feature turned > off. > > Best, > > Tim ___ Eric Pugh | Founder & CEO | OpenSou

Re: [EXTERNAL] Docker image along with 1.23?

2019-11-21 Thread Eric Pugh
leading the debate on ASF binary releases > that bundle the JVM, I'd suggest we wait for that to resolve before we think > about trying to publish pre-built images ourselves. Linking to images from > external organisations we trust should be fine though, eg similar to > h

Re: [EXTERNAL] Docker image along with 1.23?

2019-11-20 Thread Eric Pugh
> To: "Allison, Timothy B (US 1760-Affiliate)" > Cc: "" > Subject: [EXTERNAL] Re: Docker image along with 1.23? > > > > On Wed, 20 Nov 2019, Tim Allison wrote: > > Eric Pugh recently asked on another channel if we had any plans to > > releas

Re: [EXTERNAL] Tika 1.23?

2019-11-20 Thread Eric Pugh
+1 from contributor On Wed, Nov 20, 2019 at 12:09 PM Chris Mattmann wrote: > +1 ship it > > > > > > > > From: Tim Allison > Reply-To: "dev@tika.apache.org" , "Allison, Timothy > B (US 1760-Affiliate)" > Date: Wednesday, November 20, 2019 at 9:07 AM > To: "" > Subject: [EXTERNAL] Tika 1.23? >

Re: Grant write access to our wiki to Eric Pugh

2019-10-31 Thread Eric Pugh
Thanks Nick, I’ll dig a bit more on those two links. If nothing else, I’d like to get the examples all up to 1.23. > On Oct 31, 2019, at 9:16 AM, Nick Burch <mailto:apa...@gagravarr.org>> wrote: > > On Wed, 30 Oct 2019, Eric Pugh wrote: >> I’ve been going through the

Re: Grant write access to our wiki to Eric Pugh

2019-10-30 Thread Eric Pugh
son wrote: >>> Anyone object if I grant write access to our wiki to Eric Pugh. He slacked >>> me a request. >> >> I'd almost be tempted to say that we should grant access to all ASF >> Committers to our wiki. > > +1, CTR FTW :) > > — Ken >

Re: reconfiguring ossindex-maven-plugin for releases?

2019-10-29 Thread Eric Pugh
"fail the build" for our working branches. >> Any better ideas? >> >> Cheers, >> >> Tim >> ___ Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com <http://www.openso

[jira] [Commented] (TIKA-2968) Display specific command for Tesseract if you are running in Verbose mode

2019-10-23 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957801#comment-16957801 ] David Eric Pugh commented on TIKA-2968: --- And on a related aspect, maybe, if we

[jira] [Commented] (TIKA-2968) Display specific command for Tesseract if you are running in Verbose mode

2019-10-23 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2968?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957799#comment-16957799 ] David Eric Pugh commented on TIKA-2968: --- Hey community, any chance of this b

[jira] [Created] (TIKA-2971) Link to download OpenNLP models needs to be http not https

2019-10-22 Thread David Eric Pugh (Jira)
David Eric Pugh created TIKA-2971: - Summary: Link to download OpenNLP models needs to be http not https Key: TIKA-2971 URL: https://issues.apache.org/jira/browse/TIKA-2971 Project: Tika

[jira] [Commented] (TIKA-2624) Rendering PDFs for OCR with Tesseract uses different DPI than claimed

2019-10-22 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2624?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16957204#comment-16957204 ] David Eric Pugh commented on TIKA-2624: --- I am rereading this thread via JIRA ve

Need some guidance on how to proceed with TIKA-2970

2019-10-21 Thread Eric Pugh
___ Eric Pugh | Founder & CEO | OpenSource Connections, LLC | 434.466.1467 | http://www.opensourceconnections.com <http://www.opensourceconnections.com/> | My Free/Busy <http://tinyurl.com/eric-cal> Co-Author: Apache Solr Enterprise Search Server, 3rd Ed <https://www.packtp

[jira] [Commented] (TIKA-2970) Configuring Tesseract for OCR of PDF via Tika Config is not working

2019-10-20 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955615#comment-16955615 ] David Eric Pugh commented on TIKA-2970: --- Interestingly, I think this might all

[jira] [Commented] (TIKA-2970) Configuring Tesseract for OCR of PDF via Tika Config is not working

2019-10-20 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955612#comment-16955612 ] David Eric Pugh commented on TIKA-2970: --- It's a work in progress, however

[jira] [Created] (TIKA-2970) Configuring Tesseract for OCR of PDF via Tika Config is not working

2019-10-20 Thread David Eric Pugh (Jira)
David Eric Pugh created TIKA-2970: - Summary: Configuring Tesseract for OCR of PDF via Tika Config is not working Key: TIKA-2970 URL: https://issues.apache.org/jira/browse/TIKA-2970 Project: Tika

[jira] [Commented] (TIKA-2705) Allow configuration of TesseractOCRParser as we do for other parsers

2019-10-20 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2705?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955515#comment-16955515 ] David Eric Pugh commented on TIKA-2705: --- I know this is marked as resolved, but

[jira] [Commented] (TIKA-2969) Unit test for TesseractOCRParserTest.java has confusing behavior when Tesseract not on path

2019-10-20 Thread David Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2969?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16955498#comment-16955498 ] David Eric Pugh commented on TIKA-2969: --- I noticed that when I run `mvn test`

[jira] [Created] (TIKA-2969) Unit test for TesseractOCRParserTest.java has confusing behavior when Tesseract not on path

2019-10-20 Thread David Eric Pugh (Jira)
David Eric Pugh created TIKA-2969: - Summary: Unit test for TesseractOCRParserTest.java has confusing behavior when Tesseract not on path Key: TIKA-2969 URL: https://issues.apache.org/jira/browse/TIKA-2969

[jira] [Created] (TIKA-2968) Display specific command for Tesseract if you are running in Verbose mode

2019-10-18 Thread David Eric Pugh (Jira)
David Eric Pugh created TIKA-2968: - Summary: Display specific command for Tesseract if you are running in Verbose mode Key: TIKA-2968 URL: https://issues.apache.org/jira/browse/TIKA-2968 Project

Re: Questions

2019-09-10 Thread Eric Pugh
a mirror, or has it become the authoritative > source? (Given that I saw mentions of pull requests, I suspect the latter.) > If the latter, I suggest changing that text to something like "Tika > Authoritative Repository", as it is currently misleading. > > Thanks, >

[jira] [Commented] (TIKA-2931) Tika CLI shouldn't log with System.out.println

2019-08-29 Thread Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918723#comment-16918723 ] Eric Pugh commented on TIKA-2931: - Okay, I've made a PR that fixes this proble

[jira] [Commented] (TIKA-2931) Tika CLI shouldn't log with System.out.println

2019-08-28 Thread Eric Pugh (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2931?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16918137#comment-16918137 ] Eric Pugh commented on TIKA-2931: - Looks like the TikaCLI test does rely on this beha

[jira] [Created] (TIKA-2931) Tika CLI shouldn't log with System.out.println

2019-08-28 Thread Eric Pugh (Jira)
Eric Pugh created TIKA-2931: --- Summary: Tika CLI shouldn't log with System.out.println Key: TIKA-2931 URL: https://issues.apache.org/jira/browse/TIKA-2931 Project: Tika Issue Type: Improv

Re: TesseractOCRParserTest needed extra parameters to run...

2019-08-20 Thread Eric Pugh
I poked around at other parsers for Tika that require additional installation steps to see how they warn the user, like the GrobidNERecogniser class... It turns out the way that is handled is by NOT having a unit test at all ;-( > On Aug 20, 2019, at 10:46 AM, Eric Pugh > wrote:

TesseractOCRParserTest needed extra parameters to run...

2019-08-20 Thread Eric Pugh
tOCRConfig(); +config.setTesseractPath("/usr/local/bin"); + config.setTessdataPath("/usr/local/Cellar/tesseract/4.1.0/share/tessdata"); config.setOutputType(outputType); Parser parser = new RecursiveParserWrapper(new AutoDetectParser(), ___

Re: Tika Tikka Masala Project

2019-03-11 Thread Eric Pugh
ython's tika > package. I've attached my Google Slides presentation that I share with > Chris Mattmann at NASA JPL. > he > Enjoy your day! > > https://docs.google.com/presentation/d/1bmAInwzNxMWUQVL-YrYpgFUI6XXRifDmXdoOCgYCbTI/edit?usp=sharing > > Best, >

Re: experiences with Tika in Docker

2017-06-01 Thread Eric Pugh
s, timeouts). What do you all think? >> >>Cheers, >> >>Tim >> >> Timothy B. Allison, Ph.D. >> Principal Artificial Intelligence Engineer Group Lead K83E/Human >> Language Technology The MITRE Corporation >> 7515 Colshire

Re: Tika talk next week - help needed!

2017-05-16 Thread Eric Pugh
> >>> Beyond updating the list of releases and parsers, and the slide >>> background, what should I change? >>> >>> Maybe some more on Tika eval? More details on some of the NLP / >> Entity >>> Recognition / Image Recognition stuff? Some screenshots of that >> stuff? >

[jira] [Created] (TIKA-2106) "hocr" case on Linux fails, but works on OSX. Related to TIKA-2093

2016-09-30 Thread Eric Pugh (JIRA)
Eric Pugh created TIKA-2106: --- Summary: "hocr" case on Linux fails, but works on OSX. Related to TIKA-2093 Key: TIKA-2106 URL: https://issues.apache.org/jira/browse/TIKA-2106 Project: Tika

[jira] [Comment Edited] (TIKA-2093) Add hOCR output type to the TesseractOCRParser

2016-09-29 Thread Eric Pugh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15534613#comment-15534613 ] Eric Pugh edited comment on TIKA-2093 at 9/30/16 12:52 AM: ---

[jira] [Commented] (TIKA-2093) Add hOCR output type to the TesseractOCRParser

2016-09-29 Thread Eric Pugh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15534613#comment-15534613 ] Eric Pugh commented on TIKA-2093: - BTW, just got to updating my project with the la

[jira] [Commented] (TIKA-2093) Add hOCR output type to the TesseractOCRParser

2016-09-23 Thread Eric Pugh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2093?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15516177#comment-15516177 ] Eric Pugh commented on TIKA-2093: - Thanks for this, and the addition of

[jira] [Created] (TIKA-2093) Add hOCR output type to the TesseractOCRParser

2016-09-22 Thread Eric Pugh (JIRA)
Eric Pugh created TIKA-2093: --- Summary: Add hOCR output type to the TesseractOCRParser Key: TIKA-2093 URL: https://issues.apache.org/jira/browse/TIKA-2093 Project: Tika Issue Type: Improvement