[GitHub] [tika] peterkronenberg commented on pull request #403: Allow tesseract/tessdata path to be specified by environment variables

2021-02-09 Thread GitBox
peterkronenberg commented on pull request #403: URL: https://github.com/apache/tika/pull/403#issuecomment-776414822 Overtaken by events This is an automated message from the Apache Git Service. To respond to the message,

[GitHub] [tika] peterkronenberg closed pull request #403: Allow tesseract/tessdata path to be specified by environment variables

2021-02-09 Thread GitBox
peterkronenberg closed pull request #403: URL: https://github.com/apache/tika/pull/403

Re: {EXTERNAL}[jira] [Commented] (TIKA-3297) Simplify parser configuration in 2.x

2021-02-09 Thread Tim Allison
Y
On Tue, Feb 9, 2021 at 7:16 PM Peter Kronenberg wrote:
> How are the default values defined? Previously, it was whatever was in
> the default .properties file, right? Are they just hard-coded now?
>
> -----Original Message-----
> From: Tim Allison
> Sent: Tuesday, February 9, 2021 5:59 PM

RE: {EXTERNAL}[jira] [Commented] (TIKA-3297) Simplify parser configuration in 2.x

2021-02-09 Thread Peter Kronenberg
How are the default values defined? Previously, it was whatever was in the default .properties file, right? Are they just hard-coded now?
-----Original Message-----
From: Tim Allison
Sent: Tuesday, February 9, 2021 5:59 PM
To:
Subject: Re: {EXTERNAL}[jira] [Commented] (TIKA-3297) Simplify

RE: {EXTERNAL}[jira] [Commented] (TIKA-3297) Simplify parser configuration in 2.x

2021-02-09 Thread Peter Kronenberg
I certainly agree that it was confusing and non-intuitive, but I didn't expect things to change so drastically, so quickly! I'll take a look at the unit tests for examples, but it sounds like you're saying I will be able to use tika-config.xml for all my default settings and then still change

[jira] [Commented] (TIKA-3297) Simplify parser configuration in 2.x

2021-02-09 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282120#comment-17282120 ] Hudson commented on TIKA-3297: -- UNSTABLE: Integrated in Jenkins build Tika » tika-main-jdk8 #155 (See

[jira] [Commented] (TIKA-3296) Allow tesseract/tessdata path to be specified by environment variables

2021-02-09 Thread Peter Kronenberg (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282115#comment-17282115 ] Peter Kronenberg commented on TIKA-3296: Because I want to be able to package the jar to run in

[jira] [Commented] (TIKA-3296) Allow tesseract/tessdata path to be specified by environment variables

2021-02-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282103#comment-17282103 ] Tim Allison commented on TIKA-3296: --- >  environment variables are a pretty standard way to do this type

[jira] [Resolved] (TIKA-3297) Simplify parser configuration in 2.x

2021-02-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Tim Allison resolved TIKA-3297.
Fix Version/s: 2.0.0
Resolution: Fixed
> Simplify parser configuration in 2.x
>

[jira] [Commented] (TIKA-3297) Simplify parser configuration in 2.x

2021-02-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282102#comment-17282102 ] Tim Allison commented on TIKA-3297: --- Sorry about that!  I just updated the PDF part and fixed the

Re: {EXTERNAL}[jira] [Commented] (TIKA-3297) Simplify parser configuration in 2.x

2021-02-09 Thread Tim Allison
> How does the TesseractOCRConfig and PDFParser objects get initialized if not from the corresponding .properties file?
Configuration is initialized by the default values. If there's a tika-config.xml, that will overwrite those fields shortly after initialization.
On Tue, Feb 9, 2021 at 4:21 PM
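The initialization order described in this reply (hard-coded defaults first, then any values from the config file written over them) can be sketched as below. This is a minimal illustration only, not Tika's implementation: the class `OcrSettings`, its fields, the default values, and the `load` helper are all hypothetical, and the config file is stood in for by an already-parsed map of parameter names to values.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ConfigDefaultsSketch {

    /** Hypothetical settings holder; field names and values are illustrative, not Tika's. */
    static class OcrSettings {
        // Defaults live in field initializers rather than in a .properties file.
        String language = "eng";
        int timeoutSeconds = 120;

        /** Overwrite only the fields that the parsed config actually mentions. */
        void applyOverrides(Map<String, String> params) {
            for (Map.Entry<String, String> e : params.entrySet()) {
                switch (e.getKey()) {
                    case "language":
                        language = e.getValue();
                        break;
                    case "timeoutSeconds":
                        timeoutSeconds = Integer.parseInt(e.getValue());
                        break;
                    default:
                        // Unknown keys are rejected rather than silently ignored.
                        throw new IllegalArgumentException("Unknown parameter: " + e.getKey());
                }
            }
        }
    }

    /** Build settings: defaults first, then whatever the config supplied. */
    static OcrSettings load(Map<String, String> paramsFromConfigFile) {
        OcrSettings settings = new OcrSettings();      // step 1: defaults
        settings.applyOverrides(paramsFromConfigFile); // step 2: config overwrites
        return settings;
    }

    public static void main(String[] args) {
        Map<String, String> fromFile = new LinkedHashMap<>();
        fromFile.put("language", "fra"); // the config sets one field only
        OcrSettings s = load(fromFile);
        System.out.println(s.language + " " + s.timeoutSeconds);
    }
}
```

Any field the config does not mention keeps its initializer value, which is the behavior the reply describes.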

Re: {EXTERNAL}[jira] [Commented] (TIKA-3297) Simplify parser configuration in 2.x

2021-02-09 Thread Tim Allison
Peter, I did not intend to cause pain. It felt like I spent numerous hours trying to help you debug what you were seeing and explaining the current configuration methods. I was unsuccessful in communicating to you that what you were seeing was "expected." Rather than spend more time trying to

[jira] [Commented] (TIKA-3297) Simplify parser configuration in 2.x

2021-02-09 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282091#comment-17282091 ] Hudson commented on TIKA-3297: -- FAILURE: Integrated in Jenkins build Tika » tika-main-jdk8 #154 (See

RE: {EXTERNAL}[jira] [Commented] (TIKA-3297) Simplify parser configuration in 2.x

2021-02-09 Thread Peter Kronenberg
You're killing me here! I just finished an implementation that relies on this. I never figured out how to set properties at runtime if I use tika-config. Can you please provide an example of setting properties with tika-config and then optionally changing them at runtime? How does the

[jira] [Commented] (TIKA-3297) Simplify parser configuration in 2.x

2021-02-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3297?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17282043#comment-17282043 ] Tim Allison commented on TIKA-3297: --- I got rid of the .properties for tesseract.  Users can no longer

[jira] [Created] (TIKA-3297) Simplify parser configuration in 2.x

2021-02-09 Thread Tim Allison (Jira)
Tim Allison created TIKA-3297:
Summary: Simplify parser configuration in 2.x
Key: TIKA-3297
URL: https://issues.apache.org/jira/browse/TIKA-3297
Project: Tika
Issue Type: Task

Re: load error handler in TikaConfig for 2.x?

2021-02-09 Thread Nick Burch
On Tue, 9 Feb 2021, Tim Allison wrote:
>> Would we just swap to throwing an Exception if a parser can't be found / loaded?
> Y, that'd be my inclination.
Seems ok to me
>> what do we do if someone gives us a Tika Config that references a Parser that doesn't exist?
> My preference would be to throw

Re: load error handler in TikaConfig for 2.x?

2021-02-09 Thread Tim Allison
> Would we just swap to throwing an Exception if a parser can't be found / loaded?
Y, that'd be my inclination.
> what do we do if someone gives us a Tika Config that references a Parser that doesn't exist?
My preference would be to throw early and often. I don't want problems to be hidden.
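The "throw early" preference stated here can be illustrated with a small loader that aborts on the first configured class it cannot find or instantiate, instead of logging and continuing. This is a sketch under stated assumptions, not the TikaConfig code: `FailFastLoaderSketch`, `ConfigLoadException`, and `loadAll` are hypothetical names, and the class names passed in stand in for parser entries read from a config file.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class FailFastLoaderSketch {

    /** Hypothetical exception type for this sketch. */
    static class ConfigLoadException extends Exception {
        ConfigLoadException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Instantiate every configured class by name. The first class that is
     * missing, or whose dependencies fail to link, aborts the whole load.
     */
    static List<Object> loadAll(List<String> classNames) throws ConfigLoadException {
        List<Object> loaded = new ArrayList<>();
        for (String name : classNames) {
            try {
                loaded.add(Class.forName(name).getDeclaredConstructor().newInstance());
            } catch (ReflectiveOperationException | LinkageError e) {
                // No "ignore" mode: surface the problem instead of hiding it.
                throw new ConfigLoadException("Could not load configured class: " + name, e);
            }
        }
        return loaded;
    }

    public static void main(String[] args) {
        try {
            // The second name is deliberately a class that does not exist.
            loadAll(Arrays.asList("java.util.ArrayList", "com.example.NoSuchParser"));
        } catch (ConfigLoadException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

Catching `LinkageError` alongside the reflective exceptions covers the case where the class itself is present but one of its dependencies is not on the classpath.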

Re: load error handler in TikaConfig for 2.x?

2021-02-09 Thread Nick Burch
On Mon, 8 Feb 2021, Tim Allison wrote:
> Do we still need the LoadErrorHandler for TikaConfig 2.x? IIRC, we added that so that folks who didn't want a dependency could prevent the loading of the dependency and then silence complaints -- if set to ignore.
Would we just swap to throwing an