[jira] [Updated] (TIKA-4310) Add CloseShield to JSoupParser

2024-09-16 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4310: -- Fix Version/s: 3.0.0 > Add CloseShield to JSoupParser > -- > >

[jira] [Resolved] (TIKA-4310) Add CloseShield to JSoupParser

2024-09-16 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4310. --- Resolution: Fixed > Add CloseShield to JSoupParser > -- > >

[jira] [Created] (TIKA-4310) Add CloseShield to JSoupParser

2024-09-16 Thread Tim Allison (Jira)
Tim Allison created TIKA-4310: - Summary: Add CloseShield to JSoupParser Key: TIKA-4310 URL: https://issues.apache.org/jira/browse/TIKA-4310 Project: Tika Issue Type: Task Reporter: Ti

[jira] [Commented] (TIKA-4305) Tika producing empty output for UCS encoded txt files; parses UTF-7 files as UTF-8

2024-09-10 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880641#comment-17880641 ] Tim Allison commented on TIKA-4305: --- K. So there are two different issues. 1) Above, I

[jira] [Comment Edited] (TIKA-4305) Tika producing empty output for UCS encoded txt files; parses UTF-7 files as UTF-8

2024-09-10 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880641#comment-17880641 ] Tim Allison edited comment on TIKA-4305 at 9/10/24 1:20 PM: K.

[jira] [Comment Edited] (TIKA-4305) Tika producing empty output for UCS encoded txt files; parses UTF-7 files as UTF-8

2024-09-10 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880641#comment-17880641 ] Tim Allison edited comment on TIKA-4305 at 9/10/24 1:18 PM: K.

[jira] [Commented] (TIKA-4307) Text in header not extracted for Microsoft Word doc file

2024-09-10 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4307?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880628#comment-17880628 ] Tim Allison commented on TIKA-4307: --- I asked for help from fellow POI devs: https://bz.

[jira] [Commented] (TIKA-4305) Tika producing empty output for UCS encoded txt files; parses UTF-7 files as UTF-8

2024-09-10 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880594#comment-17880594 ] Tim Allison commented on TIKA-4305: --- Thank you. Y, as I mentioned above, I effectively c

[jira] [Commented] (TIKA-4305) Tika producing empty output for UCS encoded txt files; parses UTF-7 files as UTF-8

2024-09-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880344#comment-17880344 ] Tim Allison commented on TIKA-4305: --- For those raising an eyebrow over anyone still usin

[jira] [Commented] (TIKA-4305) Tika producing empty output for UCS encoded txt files; parses UTF-7 files as UTF-8

2024-09-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880342#comment-17880342 ] Tim Allison commented on TIKA-4305: --- I get text that I think is correct with tika app 2.

[jira] [Commented] (TIKA-4305) Tika producing empty output for UCS encoded txt files; parses UTF-7 files as UTF-8

2024-09-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880341#comment-17880341 ] Tim Allison commented on TIKA-4305: --- Thank you for raising this issue. For the followin

[jira] [Commented] (TIKA-4306) ffmpeg all the images

2024-09-09 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4306?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17880335#comment-17880335 ] Tim Allison commented on TIKA-4306: --- The way to accomplish this would be to add more mim

[jira] [Commented] (TIKA-4239) Update to 2.9.3

2024-08-28 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4239?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17877432#comment-17877432 ] Tim Allison commented on TIKA-4239: --- Thank you! > Update to 2.9.3 > --- > >

[jira] [Created] (TIKA-4301) Factor tika pipes base classes out of tika-core into a tika-pipes-core module

2024-08-27 Thread Tim Allison (Jira)
Tim Allison created TIKA-4301: - Summary: Factor tika pipes base classes out of tika-core into a tika-pipes-core module Key: TIKA-4301 URL: https://issues.apache.org/jira/browse/TIKA-4301 Project: Tika

[jira] [Commented] (TIKA-4280) Tasks for the 3.0.0 release

2024-08-21 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875614#comment-17875614 ] Tim Allison commented on TIKA-4280: --- Got it. Now I see. Thank you. I don't think we have

[jira] [Commented] (TIKA-4280) Tasks for the 3.0.0 release

2024-08-21 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875577#comment-17875577 ] Tim Allison commented on TIKA-4280: --- bq. Decide about the ffmpeg issue and the hdf5 issu

[jira] [Commented] (TIKA-4280) Tasks for the 3.0.0 release

2024-08-21 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875579#comment-17875579 ] Tim Allison commented on TIKA-4280: --- bq. TIKA-4290 Tilman question Does anything remain

[jira] [Commented] (TIKA-4280) Tasks for the 3.0.0 release

2024-08-21 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17875574#comment-17875574 ] Tim Allison commented on TIKA-4280: --- bq. Before releasing the real 3.0.0 we need to remo

[jira] [Resolved] (TIKA-4299) Clean up pagination in AbstractPDF2XHTML

2024-08-21 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4299?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4299. --- Fix Version/s: 3.0.0 Resolution: Fixed > Clean up pagination in AbstractPDF2XHTML > ---

[jira] [Created] (TIKA-4299) Clean up pagination in AbstractPDF2XHTML

2024-08-19 Thread Tim Allison (Jira)
Tim Allison created TIKA-4299: - Summary: Clean up pagination in AbstractPDF2XHTML Key: TIKA-4299 URL: https://issues.apache.org/jira/browse/TIKA-4299 Project: Tika Issue Type: Task Re

[jira] [Commented] (TIKA-4290) Fix code inspection anomalies

2024-08-19 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4290?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17874900#comment-17874900 ] Tim Allison commented on TIKA-4290: --- Thank you [~tilman] and [~dkryukov]! > Fix code in

[jira] [Resolved] (TIKA-4295) Allow bypass of emitKey in AbstractEmbeddedDocumentBytesHandler

2024-08-07 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4295. --- Fix Version/s: 3.0.0 Resolution: Fixed > Allow bypass of emitKey in AbstractEmbeddedDocumentByt

[jira] [Created] (TIKA-4295) Allow bypass of emitKey in AbstractEmbeddedDocumentBytesHandler

2024-08-07 Thread Tim Allison (Jira)
Tim Allison created TIKA-4295: - Summary: Allow bypass of emitKey in AbstractEmbeddedDocumentBytesHandler Key: TIKA-4295 URL: https://issues.apache.org/jira/browse/TIKA-4295 Project: Tika Issue T

[jira] [Resolved] (TIKA-4235) Add pipeline parameter to OpenSearch emitter

2024-08-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4235. --- Resolution: Won't Fix Reopen if needed > Add pipeline parameter to OpenSearch emitter > -

[jira] [Resolved] (TIKA-4294) Simplify serialization/deserialization of ParseContext

2024-08-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4294. --- Resolution: Fixed Sorry for all the noise on this one. > Simplify serialization/deserialization of Pa

[jira] [Reopened] (TIKA-4294) Simplify serialization/deserialization of ParseContext

2024-08-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reopened TIKA-4294: --- Should include some earlier simplification proposals from https://github.com/apache/tika/pull/1805 > Sim

[jira] [Resolved] (TIKA-4294) Simplify serialization/deserialization of ParseContext

2024-08-05 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4294. --- Resolution: Fixed > Simplify serialization/deserialization of ParseContext > -

[jira] [Commented] (TIKA-4251) [DISCUSS] move to cosium's git-code-format-maven-plugin with google-java-format

2024-08-05 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871163#comment-17871163 ] Tim Allison commented on TIKA-4251: --- My sense is that at some point, we have to throw up

[jira] [Commented] (TIKA-4251) [DISCUSS] move to cosium's git-code-format-maven-plugin with google-java-format

2024-08-05 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871162#comment-17871162 ] Tim Allison commented on TIKA-4251: --- Use intellij using the checkstyle profile? Checksty

[jira] [Commented] (TIKA-4294) Simplify serialization/deserialization of ParseContext

2024-08-05 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871159#comment-17871159 ] Tim Allison commented on TIKA-4294: --- While adding a unit test, I found that we should al

[jira] [Reopened] (TIKA-4294) Simplify serialization/deserialization of ParseContext

2024-08-05 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reopened TIKA-4294: --- Assignee: Tim Allison > Simplify serialization/deserialization of ParseContext > ---

[jira] [Commented] (TIKA-4294) Simplify serialization/deserialization of ParseContext

2024-08-05 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871137#comment-17871137 ] Tim Allison commented on TIKA-4294: --- K. Got it. This fixes the key to be the super class

[jira] [Commented] (TIKA-4294) Simplify serialization/deserialization of ParseContext

2024-08-05 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871131#comment-17871131 ] Tim Allison commented on TIKA-4294: --- This is an example of what the json might look like

[jira] [Commented] (TIKA-4294) Simplify serialization/deserialization of ParseContext

2024-08-05 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871130#comment-17871130 ] Tim Allison commented on TIKA-4294: --- The key in ParseContext should be {{superClazz}}, a

[jira] [Commented] (TIKA-4294) Simplify serialization/deserialization of ParseContext

2024-08-05 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871128#comment-17871128 ] Tim Allison commented on TIKA-4294: --- Thank you, @tilman. Apologies... will {{className}}

[jira] [Resolved] (TIKA-4294) Simplify serialization/deserialization of ParseContext

2024-08-05 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4294. --- Fix Version/s: 3.0.0 Resolution: Fixed Thank you [~dimirsen] and [~tilman]! > Simplify seriali

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-08-05 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871095#comment-17871095 ] Tim Allison commented on TIKA-4252: --- Thank you [~tilman]! I'll work cleaning this up her

[jira] [Created] (TIKA-4294) Simplify serialization/deserialization of ParseContext

2024-08-05 Thread Tim Allison (Jira)
Tim Allison created TIKA-4294: - Summary: Simplify serialization/deserialization of ParseContext Key: TIKA-4294 URL: https://issues.apache.org/jira/browse/TIKA-4294 Project: Tika Issue Type: Task

[jira] [Commented] (TIKA-4291) In JDBCEmitter local var dateFormats shadows class filed with the same name

2024-08-05 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4291?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17871086#comment-17871086 ] Tim Allison commented on TIKA-4291: --- LGTM. Thank you! > In JDBCEmitter local var dateFo

[jira] [Resolved] (TIKA-4289) Further improvements to the metadata filter and serialization

2024-07-31 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4289?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4289. --- Fix Version/s: 3.0.0 Resolution: Fixed > Further improvements to the metadata filter and serial

[jira] [Resolved] (TIKA-4288) Allow user configuration of MetadataFilters in PipesServer

2024-07-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4288?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4288. --- Fix Version/s: 3.0.0 Resolution: Fixed > Allow user configuration of MetadataFilters in PipesSe

[jira] [Resolved] (TIKA-4287) Improve PDFParserConfig serialization

2024-07-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4287?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4287. --- Fix Version/s: 3.0.0 Resolution: Fixed > Improve PDFParserConfig serialization > --

[jira] [Created] (TIKA-4289) Further improvements to the metadata filter and serialization

2024-07-30 Thread Tim Allison (Jira)
Tim Allison created TIKA-4289: - Summary: Further improvements to the metadata filter and serialization Key: TIKA-4289 URL: https://issues.apache.org/jira/browse/TIKA-4289 Project: Tika Issue Typ

[jira] [Created] (TIKA-4288) Allow user configuration of MetadataFilters in PipesServer

2024-07-30 Thread Tim Allison (Jira)
Tim Allison created TIKA-4288: - Summary: Allow user configuration of MetadataFilters in PipesServer Key: TIKA-4288 URL: https://issues.apache.org/jira/browse/TIKA-4288 Project: Tika Issue Type: T

[jira] [Created] (TIKA-4287) Improve PDFParserConfig serialization

2024-07-30 Thread Tim Allison (Jira)
Tim Allison created TIKA-4287: - Summary: Improve PDFParserConfig serialization Key: TIKA-4287 URL: https://issues.apache.org/jira/browse/TIKA-4287 Project: Tika Issue Type: Task Repor

[jira] [Commented] (TIKA-4280) Tasks for the 3.0.0 release

2024-07-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868724#comment-17868724 ] Tim Allison commented on TIKA-4280: --- Y, probably? I wasn't thinking of changing tika-ser

[jira] [Comment Edited] (TIKA-4280) Tasks for the 3.0.0 release

2024-07-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17868724#comment-17868724 ] Tim Allison edited comment on TIKA-4280 at 7/25/24 3:49 PM: Y,

[jira] [Resolved] (TIKA-4285) Invalid Link for changelog CHANGES.txt files

2024-07-22 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4285. --- Resolution: Fixed Thank you [~tom_1st] and [~tilman]! Should be fixed now. > Invalid Link for changel

[jira] [Assigned] (TIKA-4285) Invalid Link for changelog CHANGES.txt files

2024-07-22 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4285?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-4285: - Assignee: Tim Allison > Invalid Link for changelog CHANGES.txt files > --

[jira] [Commented] (TIKA-4281) Fix javadoc plugin configuration

2024-07-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866843#comment-17866843 ] Tim Allison commented on TIKA-4281: --- For some reason, now, it looks like {{javadocs}} wo

[jira] [Commented] (TIKA-4281) Fix javadoc plugin configuration

2024-07-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4281?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866742#comment-17866742 ] Tim Allison commented on TIKA-4281: --- Well, that didn't work: {{javadoc: error - No sourc

[jira] [Created] (TIKA-4281) Fix javadoc plugin configuration

2024-07-16 Thread Tim Allison (Jira)
Tim Allison created TIKA-4281: - Summary: Fix javadoc plugin configuration Key: TIKA-4281 URL: https://issues.apache.org/jira/browse/TIKA-4281 Project: Tika Issue Type: Task Reporter:

[jira] [Updated] (TIKA-4280) Tasks for the 3.0.0 release

2024-07-16 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4280?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-4280: -- Description: I'm too lazy to open separate tickets. Please do so if desired. Some items: * Before relea

[jira] [Created] (TIKA-4280) Tasks for the 3.0.0 release

2024-07-15 Thread Tim Allison (Jira)
Tim Allison created TIKA-4280: - Summary: Tasks for the 3.0.0 release Key: TIKA-4280 URL: https://issues.apache.org/jira/browse/TIKA-4280 Project: Tika Issue Type: Task Reporter: Tim A

[jira] [Commented] (TIKA-4278) TextAndCSVParser doesn't detect semicolon separated file

2024-07-15 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17866075#comment-17866075 ] Tim Allison commented on TIKA-4278: --- Thank you, [~tilman], y, that's probably an oversig

[jira] [Created] (TIKA-4275) Make tika-grpc a top-level module

2024-07-09 Thread Tim Allison (Jira)
Tim Allison created TIKA-4275: - Summary: Make tika-grpc a top-level module Key: TIKA-4275 URL: https://issues.apache.org/jira/browse/TIKA-4275 Project: Tika Issue Type: Task Reporter:

[jira] [Commented] (TIKA-4272) create tika docker image for tika-grpc

2024-06-26 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860241#comment-17860241 ] Tim Allison commented on TIKA-4272: --- Y, I concur, we should have a completely separate i

[jira] [Commented] (TIKA-4251) [DISCUSS] move to cosium's git-code-format-maven-plugin with google-java-format

2024-06-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860035#comment-17860035 ] Tim Allison commented on TIKA-4251: --- W00t! > [DISCUSS] move to cosium's git-code-format

[jira] [Commented] (TIKA-4251) [DISCUSS] move to cosium's git-code-format-maven-plugin with google-java-format

2024-06-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860020#comment-17860020 ] Tim Allison commented on TIKA-4251: --- Sounds great. My personal preference would be to mo

[jira] [Commented] (TIKA-4251) [DISCUSS] move to cosium's git-code-format-maven-plugin with google-java-format

2024-06-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17860007#comment-17860007 ] Tim Allison commented on TIKA-4251: --- > we eat the 1-time-format cost That's where the v

[jira] [Commented] (TIKA-4251) [DISCUSS] move to cosium's git-code-format-maven-plugin with google-java-format

2024-06-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1785#comment-1785 ] Tim Allison commented on TIKA-4251: --- Makes sense. Tilman's observation is legit, and I d

[jira] [Comment Edited] (TIKA-4251) [DISCUSS] move to cosium's git-code-format-maven-plugin with google-java-format

2024-06-25 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859739#comment-17859739 ] Tim Allison edited comment on TIKA-4251 at 6/25/24 6:19 PM: Y.

[jira] [Commented] (TIKA-4251) [DISCUSS] move to cosium's git-code-format-maven-plugin with google-java-format

2024-06-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17859739#comment-17859739 ] Tim Allison commented on TIKA-4251: --- Y. I agree. When I started with checkstyle, it modi

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-06-07 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17853241#comment-17853241 ] Tim Allison commented on TIKA-4243: --- This is what the json currently looks like. {code:

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-06-07 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17853240#comment-17853240 ] Tim Allison commented on TIKA-4243: --- I opened a PR with some cleanup, fixes and a new un

[jira] [Resolved] (TIKA-4268) Use title for embedded resource path in embedded msg files

2024-06-07 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4268?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4268. --- Fix Version/s: 3.0.0 Resolution: Fixed > Use title for embedded resource path in embedded msg f

[jira] [Commented] (TIKA-4251) [DISCUSS] move to cosium's git-code-format-maven-plugin with google-java-format

2024-06-07 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17853157#comment-17853157 ] Tim Allison commented on TIKA-4251: --- Unless there are any objections, I'll likely move f

[jira] [Created] (TIKA-4268) Use title for embedded resource path in embedded msg files

2024-06-07 Thread Tim Allison (Jira)
Tim Allison created TIKA-4268: - Summary: Use title for embedded resource path in embedded msg files Key: TIKA-4268 URL: https://issues.apache.org/jira/browse/TIKA-4268 Project: Tika Issue Type: T

[jira] [Comment Edited] (TIKA-4243) tika configuration overhaul

2024-06-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17852876#comment-17852876 ] Tim Allison edited comment on TIKA-4243 at 6/6/24 5:39 PM: --- I th

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-06-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17852876#comment-17852876 ] Tim Allison commented on TIKA-4243: --- I think our joint recent PR on TIKA-4252 accomplish

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-06-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17852874#comment-17852874 ] Tim Allison commented on TIKA-4252: --- K. I think we're at "good enough" here. [~ndipiazza

[jira] [Resolved] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-06-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4252. --- Resolution: Fixed > PipesClient#process - seems to lose the Fetch input metadata? > --

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-06-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17852808#comment-17852808 ] Tim Allison commented on TIKA-4243: --- Oh, and documentation, lots of documentation. :LOL:

[jira] [Comment Edited] (TIKA-4243) tika configuration overhaul

2024-06-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17852804#comment-17852804 ] Tim Allison edited comment on TIKA-4243 at 6/6/24 2:11 PM: --- Curr

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-06-06 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17852804#comment-17852804 ] Tim Allison commented on TIKA-4243: --- Current status on TIKA-4243 -- works up through and

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-06-04 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17852098#comment-17852098 ] Tim Allison commented on TIKA-4243: --- Let me know if there are any objections to heading

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-06-04 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17852097#comment-17852097 ] Tim Allison commented on TIKA-4243: --- K, I chatted briefly with [~ndipiazza] this morning

[jira] [Comment Edited] (TIKA-4243) tika configuration overhaul

2024-06-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17851727#comment-17851727 ] Tim Allison edited comment on TIKA-4243 at 6/3/24 5:10 PM: --- I sp

[jira] [Comment Edited] (TIKA-4243) tika configuration overhaul

2024-06-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17851727#comment-17851727 ] Tim Allison edited comment on TIKA-4243 at 6/3/24 5:02 PM: --- I sp

[jira] [Comment Edited] (TIKA-4243) tika configuration overhaul

2024-06-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17851727#comment-17851727 ] Tim Allison edited comment on TIKA-4243 at 6/3/24 4:45 PM: --- I sp

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-06-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17851727#comment-17851727 ] Tim Allison commented on TIKA-4243: --- I spent a bit of time trying to serialize ParseCont

[jira] [Resolved] (TIKA-4260) Add parse context to the fetcher interface in 3.x

2024-06-03 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4260. --- Resolution: Duplicate Turns out this is a duplicate. Onwards to TIKA-4243! > Add parse context to the

[jira] [Created] (TIKA-4266) Improve multithreading and the xml parser pools in XMLUtils

2024-05-30 Thread Tim Allison (Jira)
Tim Allison created TIKA-4266: - Summary: Improve multithreading and the xml parser pools in XMLUtils Key: TIKA-4266 URL: https://issues.apache.org/jira/browse/TIKA-4266 Project: Tika Issue Type:

[jira] [Resolved] (TIKA-4221) Regression in pack200 parsing in commons-compress

2024-05-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4221?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4221. --- Fix Version/s: 3.0.0 2.9.3 Resolution: Fixed Many thanks to [~ggregory] and

[jira] [Resolved] (TIKA-4220) Commons-compress too lenient on headless tar detection

2024-05-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4220?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4220. --- Fix Version/s: 3.0.0 2.9.3 Resolution: Fixed Many thanks to [~ggregory] and

[jira] [Commented] (TIKA-4265) Consider adding maven build cache extension

2024-05-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850776#comment-17850776 ] Tim Allison commented on TIKA-4265: --- It doesn't help at all if there's a modification in

[jira] [Commented] (TIKA-4265) Consider adding maven build cache extension

2024-05-30 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4265?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17850773#comment-17850773 ] Tim Allison commented on TIKA-4265: --- I just pushed a demo to {{build-cache}}. This inclu

[jira] [Created] (TIKA-4265) Consider adding maven build cache extension

2024-05-30 Thread Tim Allison (Jira)
Tim Allison created TIKA-4265: - Summary: Consider adding maven build cache extension Key: TIKA-4265 URL: https://issues.apache.org/jira/browse/TIKA-4265 Project: Tika Issue Type: Task

[jira] [Created] (TIKA-4261) Add attachment type metadata filter

2024-05-24 Thread Tim Allison (Jira)
Tim Allison created TIKA-4261: - Summary: Add attachment type metadata filter Key: TIKA-4261 URL: https://issues.apache.org/jira/browse/TIKA-4261 Project: Tika Issue Type: Task Reporte

[jira] [Resolved] (TIKA-4259) Decouple xml parser stuff from ParseContext

2024-05-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4259?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-4259. --- Fix Version/s: 3.0.0 Resolution: Fixed > Decouple xml parser stuff from ParseContext >

[jira] [Commented] (TIKA-4260) Add parse context to the fetcher interface in 3.x

2024-05-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849298#comment-17849298 ] Tim Allison commented on TIKA-4260: --- That PR currently only works on tika-core. More nee

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849288#comment-17849288 ] Tim Allison commented on TIKA-4243: --- [~ndipiazza], I added parseContext to fetchers and

[jira] [Comment Edited] (TIKA-4243) tika configuration overhaul

2024-05-24 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849103#comment-17849103 ] Tim Allison edited comment on TIKA-4243 at 5/24/24 1:00 PM: Pr

[jira] [Created] (TIKA-4260) Add parse context to the fetcher interface in 3.x

2024-05-23 Thread Tim Allison (Jira)
Tim Allison created TIKA-4260: - Summary: Add parse context to the fetcher interface in 3.x Key: TIKA-4260 URL: https://issues.apache.org/jira/browse/TIKA-4260 Project: Tika Issue Type: Task

[jira] [Created] (TIKA-4259) Decouple xml parser stuff from ParseContext

2024-05-23 Thread Tim Allison (Jira)
Tim Allison created TIKA-4259: - Summary: Decouple xml parser stuff from ParseContext Key: TIKA-4259 URL: https://issues.apache.org/jira/browse/TIKA-4259 Project: Tika Issue Type: Task

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-23 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849114#comment-17849114 ] Tim Allison commented on TIKA-4243: --- I'm going to start working on PRs that will be gene

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-23 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17849108#comment-17849108 ] Tim Allison commented on TIKA-4243: --- The downsides we see: a) if we there's agreement to
