[
https://issues.apache.org/jira/browse/TIKA-4260?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849298#comment-17849298
]
Tim Allison commented on TIKA-4260:
---
That PR currently only works on tika-core. More needs to be done
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849288#comment-17849288
]
Tim Allison commented on TIKA-4243:
---
[~ndipiazza], I added parseContext to fetchers and emitters
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849103#comment-17849103
]
Tim Allison edited comment on TIKA-4243 at 5/24/24 1:00 PM:
Proposed basic
Tim Allison created TIKA-4260:
-
Summary: Add parse context to the fetcher interface in 3.x
Key: TIKA-4260
URL: https://issues.apache.org/jira/browse/TIKA-4260
Project: Tika
Issue Type: Task
Tim Allison created TIKA-4259:
-
Summary: Decouple xml parser stuff from ParseContext
Key: TIKA-4259
URL: https://issues.apache.org/jira/browse/TIKA-4259
Project: Tika
Issue Type: Task
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849114#comment-17849114
]
Tim Allison commented on TIKA-4243:
---
I'm going to start working on PRs that will be generally helpful
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849108#comment-17849108
]
Tim Allison commented on TIKA-4243:
---
The downsides we see:
a) if there's agreement to add jackson
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849103#comment-17849103
]
Tim Allison commented on TIKA-4243:
---
Proposed basic roadmap:
Serialize ParseContext as is...
Allow
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17849101#comment-17849101
]
Tim Allison commented on TIKA-4243:
---
Fellow devs, in chatting with Nicholas, we're thinking
[
https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-4258.
---
Resolution: Fixed
Just pushed 2.9.2.1/*-latest
Thank you, all!
> Multi-arch support for doc
All,
Many thanks to the many community members who helped figure this out and
get it out the door! As of tika-docker 2.9.2.1, we now have multi-arch
support (and on noble!).
Let us know if there are any surprises. Thank you, again!
Cheers,
Tim
Ref:
[
https://issues.apache.org/jira/browse/TIKA-4255?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847980#comment-17847980
]
Tim Allison commented on TIKA-4255:
---
Thank you for opening this PR. Are you able to add a small unit
[
https://issues.apache.org/jira/browse/TIKA-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-4256.
---
Fix Version/s: 3.0.0
Resolution: Fixed
> Allow inlining of ocr'd text in container docum
[
https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847950#comment-17847950
]
Tim Allison commented on TIKA-4258:
---
I'm sure I'll need to modify the PR when I actually go to run
[
https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847949#comment-17847949
]
Tim Allison commented on TIKA-4258:
---
Let's give it a day for fellow devs to weigh
[
https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847943#comment-17847943
]
Tim Allison commented on TIKA-4258:
---
And here's the full version:
https://hub.docker.com/layers/apache
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847931#comment-17847931
]
Tim Allison commented on TIKA-4243:
---
Separately, but related to this and also to TIKA-4252 -- should we
[
https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847883#comment-17847883
]
Tim Allison commented on TIKA-4258:
---
Helpful links from #infra:
https://infra.apache.org/docker-hub
[
https://issues.apache.org/jira/browse/TIKA-4258?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17847882#comment-17847882
]
Tim Allison commented on TIKA-4258:
---
If fellow devs with better knowledge of github actions and docker
Tim Allison created TIKA-4258:
-
Summary: Multi-arch support for docker images
Key: TIKA-4258
URL: https://issues.apache.org/jira/browse/TIKA-4258
Project: Tika
Issue Type: Task
[
https://issues.apache.org/jira/browse/TIKA-4256?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-4256:
--
Description:
For legacy tika, we're inlining all content from embedded files including ocr
content
Tim Allison created TIKA-4256:
-
Summary: Allow inlining of ocr'd text in container document
Key: TIKA-4256
URL: https://issues.apache.org/jira/browse/TIKA-4256
Project: Tika
Issue Type: Task
[
https://issues.apache.org/jira/browse/TIKA-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17846697#comment-17846697
]
Tim Allison commented on TIKA-4137:
---
Y, done just now.
> Building current Tika main branch fails un
[
https://issues.apache.org/jira/browse/TIKA-4137?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-4137:
--
Fix Version/s: 2.9.3
> Building current Tika main branch fails under Java 20
[
https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845081#comment-17845081
]
Tim Allison commented on TIKA-4252:
---
fetchRequestMetadata, fetchResponseMetadata?
> PipesClient#proc
[
https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845072#comment-17845072
]
Tim Allison edited comment on TIKA-4252 at 5/9/24 5:14 PM:
---
fetcher.fetch(String
[
https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845072#comment-17845072
]
Tim Allison commented on TIKA-4252:
---
fetcher.fetch(String key, Metadata writeMetadata, Metadata
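The signature in the snippet above is cut off, so what follows is only a sketch of the general idea discussed in TIKA-4252: passing separate metadata objects into a fetch call, one supplied by the caller and one filled in by the fetcher. Everything in it is an assumption made for illustration: the names SketchFetcher, InMemoryFetcher and the "sketch:length" key are hypothetical, plain Map objects stand in for Tika's Metadata class, and none of this is Tika's actual Fetcher API.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class FetcherMetadataSketch {

    /** Hypothetical interface for illustration only; this is NOT Tika's Fetcher API. */
    public interface SketchFetcher {
        InputStream fetch(String key,
                          Map<String, String> requestMetadata,
                          Map<String, String> responseMetadata) throws IOException;
    }

    /** Toy fetcher backed by an in-memory map, standing in for a real store. */
    public static class InMemoryFetcher implements SketchFetcher {
        private final Map<String, String> store = new HashMap<>();

        public InMemoryFetcher put(String key, String content) {
            store.put(key, content);
            return this;
        }

        @Override
        public InputStream fetch(String key,
                                 Map<String, String> requestMetadata,
                                 Map<String, String> responseMetadata) throws IOException {
            String content = store.get(key);
            if (content == null) {
                throw new IOException("No content for key: " + key);
            }
            byte[] bytes = content.getBytes(StandardCharsets.UTF_8);
            // Only the response map is written to; the request map is left untouched,
            // so caller-supplied metadata cannot be overwritten by the fetcher.
            responseMetadata.put("sketch:length", Integer.toString(bytes.length));
            return new ByteArrayInputStream(bytes);
        }
    }

    public static void main(String[] args) throws IOException {
        SketchFetcher fetcher = new InMemoryFetcher().put("a.txt", "hello");
        Map<String, String> request = new HashMap<>();
        request.put("user:tag", "demo");
        Map<String, String> response = new HashMap<>();
        try (InputStream is = fetcher.fetch("a.txt", request, response)) {
            System.out.println(is.readAllBytes().length + " bytes, response=" + response);
        }
    }
}
```

Keeping the two maps apart is one way to avoid fetcher-gathered values coming back and mixing with what the user injected; whether the project settled on this shape is not shown in these snippets.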
[
https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845068#comment-17845068
]
Tim Allison commented on TIKA-4252:
---
Should we add an optional Metadata object to the FetchKey. We could
[
https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845062#comment-17845062
]
Tim Allison commented on TIKA-4252:
---
K, but you don't want that coming back and being populated
[
https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845051#comment-17845051
]
Tim Allison commented on TIKA-4252:
---
Or, if you mean that metadata gathered from the fetcher isn't
[
https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845048#comment-17845048
]
Tim Allison commented on TIKA-4252:
---
My initial thought for injecting user metadata was to pass through
[
https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845047#comment-17845047
]
Tim Allison commented on TIKA-4252:
---
I opened this branch: https://github.com/apache/tika/tree/TIKA-4252
[
https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison reopened TIKA-4252:
---
I pointed you to the wrong part of the code ... sorry. The design goal was to
overwrite the extracted
[
https://issues.apache.org/jira/browse/TIKA-4253?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17845022#comment-17845022
]
Tim Allison commented on TIKA-4253:
---
This is happening in the unit tests because there are multiple
Tim Allison created TIKA-4253:
-
Summary: Duplicate parsers loaded in AutoDetectParser in 3.x at
least in some unit tests
Key: TIKA-4253
URL: https://issues.apache.org/jira/browse/TIKA-4253
Project: Tika
[
https://issues.apache.org/jira/browse/TIKA-4252?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844998#comment-17844998
]
Tim Allison commented on TIKA-4252:
---
Good catch:
https://github.com/apache/tika/blob/main/tika-core/src
[
https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844976#comment-17844976
]
Tim Allison edited comment on TIKA-4250 at 5/9/24 12:59 PM:
libpst issue
[
https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17844976#comment-17844976
]
Tim Allison commented on TIKA-4250:
---
libpff issue opened: https://github.com/libyal/libpff/issues/128
All,
I'd like to go for another 3.x beta release and then move fairly quickly
to a 3.0.0 release. I was hoping that
https://issues.apache.org/jira/browse/TIKA-4221 would be wrapped up soon.
It hasn't been, but I can add the workaround we did in 2.x.
What do you think?
Any blockers?
[
https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-4251:
--
Description:
I was recently working a bit on incubator-stormcrawler, and I noticed that they
are using
[
https://issues.apache.org/jira/browse/TIKA-4251?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-4251:
--
Summary: [DISCUSS] move to cosium's git-code-format-maven-plugin with
google-java-format
Tim Allison created TIKA-4251:
-
Summary: [DISCUSS] move to cosium's git-code-format-maven-plugin
Key: TIKA-4251
URL: https://issues.apache.org/jira/browse/TIKA-4251
Project: Tika
Issue Type
[
https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843746#comment-17843746
]
Tim Allison edited comment on TIKA-4250 at 5/6/24 5:03 PM:
---
Wait, so
[
https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843798#comment-17843798
]
Tim Allison edited comment on TIKA-4250 at 5/6/24 5:02 PM:
---
So, I caught
[
https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843798#comment-17843798
]
Tim Allison commented on TIKA-4250:
---
So, I caught an example of libpst not reading an attachment in our
[
https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-4250:
--
Attachment: 8.eml
> Add a libpst-based parser
> -
>
>
[
https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-4250:
--
Attachment: 8.msg
> Add a libpst-based parser
> -
>
>
[
https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843740#comment-17843740
]
Tim Allison edited comment on TIKA-4250 at 5/6/24 1:02 PM:
---
Wow. This is super
[
https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843428#comment-17843428
]
Tim Allison commented on TIKA-4250:
---
Given your experience, I think it would be valuable to add libpff
[
https://issues.apache.org/jira/browse/TIKA-4250?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843361#comment-17843361
]
Tim Allison commented on TIKA-4250:
---
Hahahahaha. I figured you'd have input on this [~lfcnassif]!
Y
[
https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17843217#comment-17843217
]
Tim Allison commented on TIKA-4249:
---
> Crystal ball is murky on the timing of the next 2.x and
Tim Allison created TIKA-4250:
-
Summary: Add a libpst-based parser
Key: TIKA-4250
URL: https://issues.apache.org/jira/browse/TIKA-4250
Project: Tika
Issue Type: Task
Reporter: Tim
[
https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842745#comment-17842745
]
Tim Allison commented on TIKA-4249:
---
Version numbers for the fix are noted above: 2.9.3 and 3.0.0
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842605#comment-17842605
]
Tim Allison commented on TIKA-4243:
---
Do we put it in tika-serialization or a new module?
> t
[
https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842604#comment-17842604
]
Tim Allison commented on TIKA-4249:
---
The example file shared was actually kind of weird. It looked like
[
https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-4249:
--
Summary: EML file is treating it as text file in 2.9.2 version (was: EML
file is treating it as text
[
https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-4249.
---
Fix Version/s: 3.0.0
2.9.3
Resolution: Fixed
> EML file is treat
[
https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842405#comment-17842405
]
Tim Allison commented on TIKA-4249:
---
Files never cease to amaze!
Thank you. Onwards!
> EML f
[
https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842402#comment-17842402
]
Tim Allison commented on TIKA-4249:
---
Modifying the first hit from {{offset="0"}} to {{o
[
https://issues.apache.org/jira/browse/TIKA-4249?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17842401#comment-17842401
]
Tim Allison commented on TIKA-4249:
---
I'm guessing you mean 2.9.0->2.9.2.
The challenge with this f
Tim Allison created TIKA-4248:
-
Summary: Improve PST handling of attachments
Key: TIKA-4248
URL: https://issues.apache.org/jira/browse/TIKA-4248
Project: Tika
Issue Type: Task
https://github.com/apache/tika/commit/63b7e91477d1dcdb0a5535dd4a008a3562a0609b
W00t. Thank you, Tilman!
On Mon, Apr 29, 2024 at 10:58 AM Tilman Hausherr
wrote:
> Yes!
>
> Tilman
>
> On 29.04.2024 16:55, Tim Allison wrote:
> > Oh, interesting. Should we bump t
:
> The positive side is that it's less interruptions.
> One negative side is that there seems to be a maximum. Today it didn't
> report the AWS update, which was detected in the past.
> Tilman
>
> On 29.04.2024 16:34, Tim Allison wrote:
> > The move to weekly dependabot has been
The move to weekly dependabot has been a bit of a relief for me personally.
Our mail list isn't clogged with daily dependabot updates (and yes, I know I
can apply a filter :/).
How is it working for everyone else?
On Wed, Apr 10, 2024 at 4:09 PM Tim Allison wrote:
> >you start deletin
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841252#comment-17841252
]
Tim Allison commented on TIKA-4243:
---
https://json-schema.org/learn/getting-started-step-by-step
Yes
Worst case scenario, or if you're building older releases:
mvn clean install -Dossindex.skip
On Mon, Apr 22, 2024 at 10:35 AM Nicholas DiPiazza <
nicholas.dipia...@gmail.com> wrote:
> thanks I'll pull latest
> appreciate your help.
>
> On Mon, Apr 22, 2024 at 9:30 AM Tilman Hausherr
> wrote:
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841242#comment-17841242
]
Tim Allison edited comment on TIKA-4243 at 4/26/24 1:32 PM:
I really, really
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841243#comment-17841243
]
Tim Allison commented on TIKA-4243:
---
Oh, sorry. Does this break anything? Can we add this as a new
[
https://issues.apache.org/jira/browse/TIKA-4243?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841242#comment-17841242
]
Tim Allison commented on TIKA-4243:
---
I really, really want to clean up our configuration, and moving
[
https://issues.apache.org/jira/browse/TIKA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841221#comment-17841221
]
Tim Allison edited comment on TIKA-4245 at 4/26/24 1:23 PM:
Oops, sorry. I
[
https://issues.apache.org/jira/browse/TIKA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841221#comment-17841221
]
Tim Allison commented on TIKA-4245:
---
Oops, sorry. I didn't realize you sent your tika-config.xml. Y, one
[
https://issues.apache.org/jira/browse/TIKA-4245?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17841220#comment-17841220
]
Tim Allison commented on TIKA-4245:
---
This is an ongoing area for improvement in Tika.
The algorithm
That's not possible yet. Please open an issue on our JIRA...you may need to
request an account(?).
On Fri, Apr 26, 2024 at 6:01 AM Emil Zegers
wrote:
> Hi,
>
> I'm looking for information if it is possible to configure
> FileSystemFetcher for tika-pipes to only process certain files, e.g. based
[
https://issues.apache.org/jira/browse/TIKA-4244?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-4244.
---
Fix Version/s: 3.0.0
2.9.3
Resolution: Fixed
Thank you [~boomxlucifer
[
https://issues.apache.org/jira/browse/TIKA-4244?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17840852#comment-17840852
]
Tim Allison commented on TIKA-4244:
---
Thank you [~boomxlucifer] for finding this and reporting
[
https://issues.apache.org/jira/browse/TIKA-4166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17839780#comment-17839780
]
Tim Allison commented on TIKA-4166:
---
Thank you!
> dependency updates for Tika
[
https://issues.apache.org/jira/browse/TIKA-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-4242.
---
Resolution: Fixed
> Tika depends on non-existing plexus-utils vers
[
https://issues.apache.org/jira/browse/TIKA-4242?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17838260#comment-17838260
]
Tim Allison commented on TIKA-4242:
---
Looks like the reason we haven't found this problem is that we
[
https://issues.apache.org/jira/browse/TIKA-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17837806#comment-17837806
]
Tim Allison commented on TIKA-4241:
---
They add a custom key in the trailer {{/AdditionalStreams}} whose
[
https://issues.apache.org/jira/browse/TIKA-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-4241:
--
Attachment: testPDF_additionalStreams.pdf
> Consider handling LibreOffice's /AdditionalStreams "
Tim Allison created TIKA-4241:
-
Summary: Consider handling LibreOffice's /AdditionalStreams
"hybrid PDF" attachment embedding in PDFs
Key: TIKA-4241
URL: https://issues.apache.org/jira/browse
[
https://issues.apache.org/jira/browse/TIKA-4241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-4241:
--
Description:
Some info here:
https://stackoverflow.com/questions/67358370/what-the-standard-used
.
And please, oh, please don't tell me that the llms are responsible for this!
I'm hoping this is a post report echo artifact and not the cause of this
report.
https://gist.github.com/LLM4IG/6614bfa658295d7af07a6d37e06db27f
-- Forwarded message -
From: Tim Allison
Date: Thu, Apr 11, 2024
[
https://issues.apache.org/jira/browse/TIKA-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17836228#comment-17836228
]
Tim Allison commented on TIKA-4240:
---
Thank you, [~tilman]! Should I revert to daily?
> Cha
[
https://issues.apache.org/jira/browse/TIKA-4240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-4240.
---
Resolution: Fixed
Let's see how this goes. Thank you!
> Change dependabot to wee
Tim Allison created TIKA-4240:
-
Summary: Change dependabot to weekly
Key: TIKA-4240
URL: https://issues.apache.org/jira/browse/TIKA-4240
Project: Tika
Issue Type: Task
Reporter: Tim
aily because this way we can learn ASAP if there are
> > troubles with new dependency versions, although I'm now too busy.
> >
> > Tilman
> >
> >
> >
> > -- Original Message --
> > From: Tim Allison
> > Subject: Bump dependabot to weekly?
> > Da
All,
Tilman has been doing heroic work keeping us up to date with
dependabot's PRs. Given our pace of releases, would it make sense to
back off to weekly updates?
Before running regression tests, we'd run the update plugin to make
sure that we're up to date.
What do you think?
Best,
; quality of life much better
>
> On Wed, Apr 10, 2024, 10:03 AM Tim Allison wrote:
>
> > I bumped line length to 180 from 120. Let's see if that's enough.
> >
> > I'm not sure what the best option is for chained method calls?
> > "Chained method calls" ->
ormatter new line settings
> to allow multi-line streaming expressions
>
> builder()
> .name("nick")
> .someOtherStuff("doIt")
> .build()
>
> right now the formatter turns that into 1 line
>
> On Wed, Apr 10, 2024 at 5:06 AM Tim Allison
Sounds good. What length?
On Wed, Apr 10, 2024 at 1:18 AM Nicholas DiPiazza
wrote:
>
> Can we bump up the line break to a more reasonable number?
> Some of the stream expressions start to wrap and wrap and wrap, forcing me
> to use smaller variable names or break down into methods when I'd
/c330b12h1fvmq8x1099mgw3tfs0gcp6q
On Mon, Apr 8, 2024 at 12:09 PM Tim Allison wrote:
>
> From October 2023:
> https://www.brilworks.com/blog/java-11-countdown-to-end-of-support/
>
> Getting 3.x out has taken longer than I had anticipated. Should we
> reopen the 17 vs 11 discussion given Eric
tps://github.com/infiniflow/ragflow which might also
> > have some interesting chunking approaches.
> >
> > Thanks
> >
> > Michael
> >
> > On 09.04.24 at 01:25, Nick Burch wrote:
> >> On Mon, 8 Apr 2024, Tim Allison wrote:
> >>> Not sure
Not sure we should jump on the bandwagon, but anything we can do to support
smart chunking would benefit us.
Could just be more integrations with parsers that turn out to be useful. I
haven’t had much joy with some. Here’s one that I haven’t evaluated yet:
https://github.com/Filimoa/open-parse
es that are in recent Java versions that we know about?
>
> > On Apr 8, 2024, at 7:02 AM, Tim Allison wrote:
> >
> > Sorry, more correctly:
> >
> > OpenNLP is effectively EOL'd for our 3.x because OpenNLP >= 2.3.0
> > requires Java 17 and our 3.x is still o
Sorry, more correctly:
OpenNLP is effectively EOL'd for our 3.x because OpenNLP >= 2.3.0
requires Java 17 and our 3.x is still on 11.
On Mon, Apr 8, 2024 at 6:30 AM Tim Allison wrote:
>
> All,
> As Brian pointed out, optimaize is no longer maintained, and it has
> some depende
ate the tika
process in its own heap space as a separate Java process rather than
adding it to our app, but I suppose we could work around that
Thank you
Brian Laskey
From: Tim Allison
Reply-To: "u...@tika.apache.org"
Date: Friday, March 8, 2024 at 9:44 AM
To: "u...@tika.ap
All,
I'm now thinking it would make sense to have one more 3.x beta
release before the final 3.0.0. Are there any breaking changes that we
want to get into 3.x?
I'd like to wait for COMPRESS-675 to be fixed and for COMPRESS-674
to be released before we release 3.0.0-BETA2. Any other items that
[
https://issues.apache.org/jira/browse/TIKA-4233?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-4233:
--
Fix Version/s: (was: 3.0.0)
> Check tika-helm for deprecated k8s A