[jira] [Closed] (TIKA-4262) In pipes XML config, List serializes incorrect causing the parameters to be empty when read

2024-05-26 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas DiPiazza closed TIKA-4262. --- Assignee: Nicholas DiPiazza Resolution: Invalid never mind - this was an issue in my

[jira] [Updated] (TIKA-4262) In pipes XML config, List serializes incorrect causing the parameters to be empty when read

2024-05-26 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4262?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas DiPiazza updated TIKA-4262: Description: tika configuration when saving a fetcher with a list of strings will look like

[jira] [Created] (TIKA-4262) In pipes XML config, List serializes incorrect causing the parameters to be empty when read

2024-05-26 Thread Nicholas DiPiazza (Jira)
Nicholas DiPiazza created TIKA-4262: --- Summary: In pipes XML config, List serializes incorrect causing the parameters to be empty when read Key: TIKA-4262 URL: https://issues.apache.org/jira/browse/TIKA-4262

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-23 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17848960#comment-17848960 ] Nicholas DiPiazza commented on TIKA-4243: - Sure that sounds good. When we chat later

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845083#comment-17845083 ] Nicholas DiPiazza commented on TIKA-4252: - even better > PipesClient#process - seems to lose the

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845080#comment-17845080 ] Nicholas DiPiazza commented on TIKA-4252: - Maybe   fetchInputMetadata outputMetadata >

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845071#comment-17845071 ] Nicholas DiPiazza commented on TIKA-4252: - sure I can do that. > PipesClient#process - seems to

[jira] [Comment Edited] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845071#comment-17845071 ] Nicholas DiPiazza edited comment on TIKA-4252 at 5/9/24 5:08 PM: - sure I

[jira] [Comment Edited] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845061#comment-17845061 ] Nicholas DiPiazza edited comment on TIKA-4252 at 5/9/24 4:50 PM: - What I

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845061#comment-17845061 ] Nicholas DiPiazza commented on TIKA-4252: - What I need is to be able to send "Fetch Metadata" such

[jira] [Closed] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas DiPiazza closed TIKA-4252. --- Fix Version/s: 3.0.0 Resolution: Fixed > PipesClient#process - seems to lose the Fetch

[jira] [Commented] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845010#comment-17845010 ] Nicholas DiPiazza commented on TIKA-4252: - done > PipesClient#process - seems to lose the Fetch

[jira] [Updated] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas DiPiazza updated TIKA-4252: Description: when calling: PipesResult pipesResult = pipesClient.process(new

[jira] [Created] (TIKA-4252) PipesClient#process - seems to lose the Fetch input metadata?

2024-05-09 Thread Nicholas DiPiazza (Jira)
Nicholas DiPiazza created TIKA-4252: --- Summary: PipesClient#process - seems to lose the Fetch input metadata? Key: TIKA-4252 URL: https://issues.apache.org/jira/browse/TIKA-4252 Project: Tika

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-05-01 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842622#comment-17842622 ] Nicholas DiPiazza commented on TIKA-4243: - Kinda seems like it might belong in tika-config module 

[jira] [Comment Edited] (TIKA-4243) tika configuration overhaul

2024-05-01 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842622#comment-17842622 ] Nicholas DiPiazza edited comment on TIKA-4243 at 5/1/24 12:34 PM: -- Kinda

[jira] [Comment Edited] (TIKA-4243) tika configuration overhaul

2024-04-29 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842158#comment-17842158 ] Nicholas DiPiazza edited comment on TIKA-4243 at 4/29/24 8:56 PM: -- this

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-04-29 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842158#comment-17842158 ] Nicholas DiPiazza commented on TIKA-4243: - this seems like a major feature thing so i would

[jira] [Commented] (TIKA-4243) tika configuration overhaul

2024-04-29 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842157#comment-17842157 ] Nicholas DiPiazza commented on TIKA-4243: - [https://github.com/joelittlejohn/jsonschema2pojo 

[jira] [Created] (TIKA-4247) HttpFetcher - add ability to send request headers

2024-04-29 Thread Nicholas DiPiazza (Jira)
Nicholas DiPiazza created TIKA-4247: --- Summary: HttpFetcher - add ability to send request headers Key: TIKA-4247 URL: https://issues.apache.org/jira/browse/TIKA-4247 Project: Tika Issue

[jira] [Created] (TIKA-4243) tika configuration overhaul

2024-04-24 Thread Nicholas DiPiazza (Jira)
Nicholas DiPiazza created TIKA-4243: --- Summary: tika configuration overhaul Key: TIKA-4243 URL: https://issues.apache.org/jira/browse/TIKA-4243 Project: Tika Issue Type: New Feature

[jira] [Updated] (TIKA-4243) tika configuration overhaul

2024-04-24 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas DiPiazza updated TIKA-4243: Description: In 3.0.0 when dealing with Tika, it would greatly help to have a Typed

[jira] [Created] (TIKA-4237) Add JWT authentication ability to the http fetcher

2024-04-05 Thread Nicholas DiPiazza (Jira)
Nicholas DiPiazza created TIKA-4237: --- Summary: Add JWT authentication ability to the http fetcher Key: TIKA-4237 URL: https://issues.apache.org/jira/browse/TIKA-4237 Project: Tika Issue

[jira] [Created] (TIKA-4229) add microsoft graph fetcher

2024-03-28 Thread Nicholas DiPiazza (Jira)
Nicholas DiPiazza created TIKA-4229: --- Summary: add microsoft graph fetcher Key: TIKA-4229 URL: https://issues.apache.org/jira/browse/TIKA-4229 Project: Tika Issue Type: New Feature

[jira] [Updated] (TIKA-4181) Grpc + Tika Pipes - pipe iterator and emitter

2024-02-06 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas DiPiazza updated TIKA-4181: Attachment: image-2024-02-06-07-54-50-116.png > Grpc + Tika Pipes - pipe iterator and

[jira] [Updated] (TIKA-4181) Grpc + Tika Pipes - pipe iterator and emitter

2024-02-06 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas DiPiazza updated TIKA-4181: Description: Add full tika-pipes support of grpc * pipe iterator * fetcher * emitter

[jira] [Comment Edited] (TIKA-4181) Grpc + Tika Pipes - pipe iterator and emitter

2024-01-11 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17805762#comment-17805762 ] Nicholas DiPiazza edited comment on TIKA-4181 at 1/11/24 6:25 PM: -- Tika

[jira] [Commented] (TIKA-4181) Grpc + Tika Pipes - pipe iterator and emitter

2024-01-11 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17805762#comment-17805762 ] Nicholas DiPiazza commented on TIKA-4181: - Tika pipes could get a full fledged service that could

[jira] [Updated] (TIKA-4181) Grpc + Tika Pipes - pipe iterator and emitter

2024-01-11 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-4181?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas DiPiazza updated TIKA-4181: Description: Add full tika-pipes support of grpc * pipe iterator * fetcher * emitter

[jira] [Created] (TIKA-4181) Grpc + Tika Pipes - pipe iterator and emitter

2024-01-11 Thread Nicholas DiPiazza (Jira)
Nicholas DiPiazza created TIKA-4181: --- Summary: Grpc + Tika Pipes - pipe iterator and emitter Key: TIKA-4181 URL: https://issues.apache.org/jira/browse/TIKA-4181 Project: Tika Issue Type:

[jira] [Updated] (TIKA-3979) OneNoteParser - Improve performance for deserialization

2023-02-25 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas DiPiazza updated TIKA-3979: Attachment: image-2023-02-25-12-01-40-311.png > OneNoteParser - Improve performance for

[jira] [Commented] (TIKA-3979) OneNoteParser - Improve performance for deserialization

2023-02-25 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3979?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17693512#comment-17693512 ] Nicholas DiPiazza commented on TIKA-3979: - old and new appear to be the same binary equivalent 

[jira] [Commented] (TIKA-3970) Certain OneNote documents produce duplicate text

2023-02-23 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692989#comment-17692989 ] Nicholas DiPiazza commented on TIKA-3970: - So on Windows PC I log into 

[jira] [Commented] (TIKA-3970) Certain OneNote documents produce duplicate text

2023-02-23 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3970?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17692984#comment-17692984 ] Nicholas DiPiazza commented on TIKA-3970: - > Should we reverse the iteration order of the pages? I

[jira] [Created] (TIKA-3881) fix testAttachingADebuggerOnTheForkedParserShouldWork test - do not use hard coded port

2022-10-15 Thread Nicholas DiPiazza (Jira)
Nicholas DiPiazza created TIKA-3881: --- Summary: fix testAttachingADebuggerOnTheForkedParserShouldWork test - do not use hard coded port Key: TIKA-3881 URL: https://issues.apache.org/jira/browse/TIKA-3881

[jira] [Resolved] (TIKA-3879) add test containers test for s3 fetcher, emitter and pipe iterators

2022-10-14 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3879?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas DiPiazza resolved TIKA-3879. - Resolution: Implemented > add test containers test for s3 fetcher, emitter and pipe

[jira] [Created] (TIKA-3879) add test containers test for s3 fetcher, emitter and pipe iterators

2022-10-13 Thread Nicholas DiPiazza (Jira)
Nicholas DiPiazza created TIKA-3879: --- Summary: add test containers test for s3 fetcher, emitter and pipe iterators Key: TIKA-3879 URL: https://issues.apache.org/jira/browse/TIKA-3879 Project: Tika

[jira] [Commented] (TIKA-3835) tika pipes parse cache - avoid re-parsing content that has not changed

2022-09-07 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17601463#comment-17601463 ] Nicholas DiPiazza commented on TIKA-3835: - Yeah quickly realizing in my case, because i have solr

[jira] [Comment Edited] (TIKA-3835) tika pipes parse cache - avoid re-parsing content that has not changed

2022-08-11 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578666#comment-17578666 ] Nicholas DiPiazza edited comment on TIKA-3835 at 8/11/22 8:53 PM: --

[jira] [Comment Edited] (TIKA-3835) tika pipes parse cache - avoid re-parsing content that has not changed

2022-08-11 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578666#comment-17578666 ] Nicholas DiPiazza edited comment on TIKA-3835 at 8/11/22 8:52 PM: --

[jira] [Commented] (TIKA-3835) tika pipes parse cache - avoid re-parsing content that has not changed

2022-08-11 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578666#comment-17578666 ] Nicholas DiPiazza commented on TIKA-3835: - [~tallison] i was wondering same thing. For now just

[jira] [Updated] (TIKA-3835) tika pipes parse cache - avoid re-parsing content that has not changed

2022-08-11 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas DiPiazza updated TIKA-3835: Description: Tika pipes should have an optional configuration to archive parsed results.

[jira] [Commented] (TIKA-3835) tika pipes parse cache - avoid re-parsing content that has not changed

2022-08-11 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578591#comment-17578591 ] Nicholas DiPiazza commented on TIKA-3835: - i added a bunch more edits. done. ha sorry if that

[jira] [Comment Edited] (TIKA-3835) tika pipes parse cache - avoid re-parsing content that has not changed

2022-08-11 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578583#comment-17578583 ] Nicholas DiPiazza edited comment on TIKA-3835 at 8/11/22 5:37 PM: -- Yes

[jira] [Commented] (TIKA-3835) tika pipes parse cache - avoid re-parsing content that has not changed

2022-08-11 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17578583#comment-17578583 ] Nicholas DiPiazza commented on TIKA-3835: - Yes good point. I didn't point out some important

[jira] [Updated] (TIKA-3835) tika pipes parse cache - avoid re-parsing content that has not changed

2022-08-11 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3835?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas DiPiazza updated TIKA-3835: Summary: tika pipes parse cache - avoid re-parsing content that has not changed (was:

[jira] [Created] (TIKA-3835) parse cache - avoid re-parsing content that has not changed

2022-08-11 Thread Nicholas DiPiazza (Jira)
Nicholas DiPiazza created TIKA-3835: --- Summary: parse cache - avoid re-parsing content that has not changed Key: TIKA-3835 URL: https://issues.apache.org/jira/browse/TIKA-3835 Project: Tika

[jira] [Created] (TIKA-3821) Pulsar Tika Pipes Support

2022-07-19 Thread Nicholas DiPiazza (Jira)
Nicholas DiPiazza created TIKA-3821: --- Summary: Pulsar Tika Pipes Support Key: TIKA-3821 URL: https://issues.apache.org/jira/browse/TIKA-3821 Project: Tika Issue Type: New Feature

[jira] [Updated] (TIKA-3821) Pulsar Tika Pipes Support

2022-07-19 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3821?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas DiPiazza updated TIKA-3821: Description: add pulsar support to tika pipes: * pulsar pipe iterator * pulsar emitter

[jira] [Created] (TIKA-3820) Kafka Tika Pipes Support

2022-07-18 Thread Nicholas DiPiazza (Jira)
Nicholas DiPiazza created TIKA-3820: --- Summary: Kafka Tika Pipes Support Key: TIKA-3820 URL: https://issues.apache.org/jira/browse/TIKA-3820 Project: Tika Issue Type: New Feature

[jira] [Comment Edited] (TIKA-3725) Add Authorization to Tika Server (Suggest Basic to start off with)

2022-04-22 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17526632#comment-17526632 ] Nicholas DiPiazza edited comment on TIKA-3725 at 4/22/22 7:03 PM: --

[jira] [Commented] (TIKA-3725) Add Authorization to Tika Server (Suggest Basic to start off with)

2022-04-22 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17526632#comment-17526632 ] Nicholas DiPiazza commented on TIKA-3725: - [~tallison] in my case I have a bunch of other

[jira] [Commented] (TIKA-3725) Add Authorization to Tika Server (Suggest Basic to start off with)

2022-04-22 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3725?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17526557#comment-17526557 ] Nicholas DiPiazza commented on TIKA-3725: - I am a couple weeks out of needing this too, and I'll

[jira] [Commented] (TIKA-3659) SMB/NFS support

2022-01-22 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3659?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17480514#comment-17480514 ] Nicholas DiPiazza commented on TIKA-3659: - I will need to add a `smbj` client for SMB2/3 and

[jira] [Commented] (TIKA-3446) OneNote - look into adding support for OneNote 365 documents

2021-12-08 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17455997#comment-17455997 ] Nicholas DiPiazza commented on TIKA-3446: - [~tallison] Do I need to do anything to make sure this

[jira] [Comment Edited] (TIKA-3561) Tika throwing java.lang.OutOfMemoryError

2021-09-28 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421848#comment-17421848 ] Nicholas DiPiazza edited comment on TIKA-3561 at 9/29/21, 1:06 AM: --- Tika

[jira] [Comment Edited] (TIKA-3561) Tika throwing java.lang.OutOfMemoryError

2021-09-28 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421848#comment-17421848 ] Nicholas DiPiazza edited comment on TIKA-3561 at 9/29/21, 1:01 AM: --- Tika

[jira] [Commented] (TIKA-3561) Tika throwing java.lang.OutOfMemoryError

2021-09-28 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17421848#comment-17421848 ] Nicholas DiPiazza commented on TIKA-3561: - Tika needs a lot of memory to parse a nested file like

[jira] [Updated] (TIKA-3561) Tika throwing java.lang.OutOfMemoryError

2021-09-28 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas DiPiazza updated TIKA-3561: Attachment: out.tar.gz > Tika throwing java.lang.OutOfMemoryError >

[jira] [Comment Edited] (TIKA-3495) parent-child in solr emitter doesn't seem to include parent id (_nest_parent_)

2021-07-24 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17386791#comment-17386791 ] Nicholas DiPiazza edited comment on TIKA-3495 at 7/24/21, 11:56 PM:

[jira] [Comment Edited] (TIKA-3495) parent-child in solr emitter doesn't seem to include parent id (_nest_parent_)

2021-07-24 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17386791#comment-17386791 ] Nicholas DiPiazza edited comment on TIKA-3495 at 7/24/21, 11:55 PM:

[jira] [Commented] (TIKA-3495) parent-child in solr emitter doesn't seem to include parent id (_nest_parent_)

2021-07-24 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3495?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17386791#comment-17386791 ] Nicholas DiPiazza commented on TIKA-3495: - [~tallison] i created a PR adding nested document use

[jira] [Created] (TIKA-3455) Create new tika pipes integration test that uses the rest api

2021-06-26 Thread Nicholas DiPiazza (Jira)
Nicholas DiPiazza created TIKA-3455: --- Summary: Create new tika pipes integration test that uses the rest api Key: TIKA-3455 URL: https://issues.apache.org/jira/browse/TIKA-3455 Project: Tika

[jira] [Assigned] (TIKA-3455) Create new tika pipes integration test that uses the rest api

2021-06-26 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3455?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas DiPiazza reassigned TIKA-3455: --- Assignee: Nicholas DiPiazza > Create new tika pipes integration test that uses the

[jira] [Commented] (TIKA-3446) OneNote - look into adding support for OneNote 365 documents

2021-06-26 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17370146#comment-17370146 ] Nicholas DiPiazza commented on TIKA-3446: - Talked to Microsoft open docs people and they informed

[jira] [Updated] (TIKA-3446) OneNote - look into adding support for OneNote 365 documents

2021-06-15 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3446?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas DiPiazza updated TIKA-3446: Description: While doing some parsing of OneNote documents, I was investigating a slew of

[jira] [Created] (TIKA-3446) OneNote - look into adding support for OneNote 365 documents

2021-06-15 Thread Nicholas DiPiazza (Jira)
Nicholas DiPiazza created TIKA-3446: --- Summary: OneNote - look into adding support for OneNote 365 documents Key: TIKA-3446 URL: https://issues.apache.org/jira/browse/TIKA-3446 Project: Tika

[jira] [Commented] (TIKA-3441) tika server stuck in loop trying to bind

2021-06-09 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3441?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17360174#comment-17360174 ] Nicholas DiPiazza commented on TIKA-3441: - No we have not seen this. We do not have a huge amount

[jira] [Commented] (TIKA-3324) Add checkstyle checker

2021-03-16 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302739#comment-17302739 ] Nicholas DiPiazza commented on TIKA-3324: - Wow you move fast. that's awesome. This will be super

[jira] [Comment Edited] (TIKA-3324) Add checkstyle checker

2021-03-16 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302650#comment-17302650 ] Nicholas DiPiazza edited comment on TIKA-3324 at 3/16/21, 4:05 PM: ---

[jira] [Commented] (TIKA-3324) Add checkstyle checker

2021-03-16 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3324?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17302650#comment-17302650 ] Nicholas DiPiazza commented on TIKA-3324: - [~tallison] can you attach your intellij project config

[jira] [Created] (TIKA-3317) Tika Pipes - add a solr fetch iterator

2021-03-13 Thread Nicholas DiPiazza (Jira)
Nicholas DiPiazza created TIKA-3317: --- Summary: Tika Pipes - add a solr fetch iterator Key: TIKA-3317 URL: https://issues.apache.org/jira/browse/TIKA-3317 Project: Tika Issue Type:

[jira] [Commented] (TIKA-3305) How do you handle PDFs with custom encoding?

2021-02-24 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17289960#comment-17289960 ] Nicholas DiPiazza commented on TIKA-3305: - ok thanks! just making sure. > How do you handle PDFs

[jira] [Closed] (TIKA-3305) How do you handle PDFs with custom encoding?

2021-02-24 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas DiPiazza closed TIKA-3305. --- Resolution: Won't Fix > How do you handle PDFs with custom encoding? >

[jira] [Created] (TIKA-3305) How do you handle PDFs with custom encoding?

2021-02-23 Thread Nicholas DiPiazza (Jira)
Nicholas DiPiazza created TIKA-3305: --- Summary: How do you handle PDFs with custom encoding? Key: TIKA-3305 URL: https://issues.apache.org/jira/browse/TIKA-3305 Project: Tika Issue Type:

[jira] [Updated] (TIKA-3305) How do you handle PDFs with custom encoding?

2021-02-23 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas DiPiazza updated TIKA-3305: Attachment: custom-encoding.pdf > How do you handle PDFs with custom encoding? >

[jira] [Updated] (TIKA-3305) How do you handle PDFs with custom encoding?

2021-02-23 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3305?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Nicholas DiPiazza updated TIKA-3305: Description: how do you parse a pdf with custom encoding? when i parse it i get garbage

[jira] [Commented] (TIKA-3294) Usage of "ECB" mode for "AES" is insecure

2021-02-06 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17280295#comment-17280295 ] Nicholas DiPiazza commented on TIKA-3294: - [~tallison] definitely! > Usage of "ECB" mode for

[jira] [Commented] (TIKA-3282) OneNote Parser breaks non-ASCII Characters

2021-02-03 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3282?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17278307#comment-17278307 ] Nicholas DiPiazza commented on TIKA-3282: - That is correct. Sorry I'm late to the party. I emailed

[jira] [Commented] (TIKA-3226) Add custom connector endpoint

2021-01-14 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17265253#comment-17265253 ] Nicholas DiPiazza commented on TIKA-3226: - Want me to add the http one? > Add custom connector

[jira] [Commented] (TIKA-3226) Add custom connector endpoint

2021-01-14 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3226?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17265252#comment-17265252 ] Nicholas DiPiazza commented on TIKA-3226: - [~tallison] so far so good! > Add custom connector

[jira] [Commented] (TIKA-1735) Unsupported AutoCAD drawing version: AC1027

2021-01-08 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-1735?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17261557#comment-17261557 ] Nicholas DiPiazza commented on TIKA-1735: - Here is the spec:

[jira] [Comment Edited] (TIKA-3258) Run OCR on PDFs with 'auto' mode as default in Tika 2.0.0

2021-01-06 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259808#comment-17259808 ] Nicholas DiPiazza edited comment on TIKA-3258 at 1/6/21, 3:33 PM: --

[jira] [Comment Edited] (TIKA-3258) Run OCR on PDFs with 'auto' mode as default in Tika 2.0.0

2021-01-06 Thread Nicholas DiPiazza (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17259808#comment-17259808 ] Nicholas DiPiazza edited comment on TIKA-3258 at 1/6/21, 3:32 PM: --
