[jira] [Commented] (NUTCH-2959) Upgrade to Apache Tika 2.4.1

2022-08-10 Thread Sebastian Nagel (Jira)


[ https://issues.apache.org/jira/browse/NUTCH-2959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17577845#comment-17577845 ]

Sebastian Nagel commented on NUTCH-2959:


Hi [~markus17],

regarding the error with the javax.ws.rs dependency: this issue had a long 
history before it was fixed (cf. NUTCH-2669, NUTCH-2697, IVY-1586). I remember it 
was painful to get a clean system: delete ~/.ivy2/ and make sure that no ivy 
jar older than 2.5.0 is used or writes to ~/.ivy2/. This prevents building 
older versions of Nutch, as well as other projects built with ant/ivy. An older 
version of ivy could also be requested and downloaded by a Nutch plugin: check 
for the properties ivy.version or ivy.installversion, and also whether ivy jars 
happen to be installed somewhere on the system (e.g. ~/.ivy2/lib/).
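As a rough aid for those checks, here is a small Python sketch (not part of Nutch; all helper names are hypothetical) that scans a directory such as ~/.ivy2/ for ivy jars older than 2.5.0 and finds build files mentioning ivy.version or ivy.installversion. It assumes jar names follow the usual ivy-<major>.<minor>.<patch>.jar pattern and ignores pre-release suffixes such as -rc1.

```python
import re
from pathlib import Path

# Assumption: jars are named like "ivy-2.4.0.jar" or "ivy-2.5.0-rc1.jar";
# any suffix after major.minor.patch (e.g. "-rc1") is ignored.
_IVY_JAR = re.compile(r"^ivy-(\d+)\.(\d+)\.(\d+)\S*\.jar$")
MIN_IVY = (2, 5, 0)  # no ivy older than 2.5.0 should write to ~/.ivy2/


def ivy_jar_version(filename):
    """Return (major, minor, patch) parsed from an ivy jar name, or None."""
    match = _IVY_JAR.match(filename)
    return tuple(int(part) for part in match.groups()) if match else None


def is_outdated(filename, minimum=MIN_IVY):
    """True if filename is an ivy jar with a version below `minimum`."""
    version = ivy_jar_version(filename)
    return version is not None and version < minimum


def find_outdated_ivy_jars(root="~/.ivy2"):
    """Recursively list outdated ivy jars below `root` (missing dir -> [])."""
    base = Path(root).expanduser()
    if not base.is_dir():
        return []
    return sorted(p for p in base.rglob("ivy-*.jar") if is_outdated(p.name))


def mentions_ivy_version_property(line):
    """Heuristic check for lines setting ivy.version or ivy.installversion."""
    return "ivy.version" in line or "ivy.installversion" in line


def find_ivy_version_properties(root):
    """List (path, line number, line) for .xml/.properties files below root."""
    base = Path(root).expanduser()
    if not base.is_dir():
        return []
    hits = []
    for path in sorted(base.rglob("*")):
        if path.suffix in (".xml", ".properties") and path.is_file():
            text = path.read_text(errors="replace")
            for lineno, line in enumerate(text.splitlines(), start=1):
                if mentions_ivy_version_property(line):
                    hits.append((path, lineno, line.strip()))
    return hits


if __name__ == "__main__":
    for jar in find_outdated_ivy_jars():
        print("outdated ivy jar:", jar)
```

Pointing find_ivy_version_properties() at a Nutch source tree (e.g. the plugin directory) lists candidate overrides; the property check is a plain substring match, so a grep over the same files would do just as well.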

While trying to upgrade to 2.4.0 (NUTCH-2948), I also ran into a test 
failure, probably due to conflicting dependencies:
- tika-core 2.4.0 required by Nutch core (ivy/ivy.xml)
- any23 requiring tika-parser 2.3.0
- parse-tika requiring tika-parser 2.4.0

In the past there were no issues, as any23 includes tika-core. But we may 
now need to exclude or override some of any23's dependencies.
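To make the conflict above concrete, here is a small, hypothetical Python sketch that groups the Tika requirements listed in this comment by version. It assumes that all artifacts whose names start with "tika-" are released together and must share one version; the artifact names are copied from the list above, not read from the actual ivy.xml files.

```python
from collections import defaultdict

# Requirements as listed in the comment: (requirer, artifact, version).
REQUIREMENTS = [
    ("Nutch core (ivy/ivy.xml)", "tika-core", "2.4.0"),
    ("any23", "tika-parser", "2.3.0"),
    ("parse-tika", "tika-parser", "2.4.0"),
]


def version_conflicts(requirements, prefix="tika-"):
    """Group artifacts whose name starts with `prefix` by required version.

    Assumption: all such artifacts are released together and must share one
    version. Returns {} if they agree, otherwise {version: [who needs it]}.
    """
    by_version = defaultdict(list)
    for requirer, artifact, version in requirements:
        if artifact.startswith(prefix):
            by_version[version].append(f"{requirer} -> {artifact}")
    return dict(by_version) if len(by_version) > 1 else {}


if __name__ == "__main__":
    for version, needed_by in sorted(version_conflicts(REQUIREMENTS).items()):
        print(version, needed_by)
```

Once the Tika dependencies pulled in by any23 are excluded or overridden to match the version used by Nutch core and parse-tika, version_conflicts() returns an empty dict.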

> Upgrade to Apache Tika 2.4.1
>
>                 Key: NUTCH-2959
>                 URL: https://issues.apache.org/jira/browse/NUTCH-2959
>             Project: Nutch
>          Issue Type: Task
>            Reporter: Markus Jelsma
>            Priority: Major
>             Fix For: 1.19
>
>         Attachments: NUTCH-2959.patch
>

--
This message was sent by Atlassian Jira
(v8.20.10#820010)


Re: [DISCUSS] Release 1.19 ?

2022-08-10 Thread Sebastian Nagel
Hi Markus,

> I'll submit a patch to upgrade to the current 2.4.1.

Great!

I'll work through the open issues and try to get patches
or open PRs merged. I've also decided to update the license
and notice files (there are a couple of open issues).

Best,
Sebastian



On 8/9/22 15:25, Markus Jelsma wrote:
> Sounds good!
> 
> I see we're still at Tika 2.3.0; I'll submit a patch to upgrade to the
> current 2.4.1.
> 
> Thanks!
> Markus
> 
> On Tue, 9 Aug 2022 at 09:11, Sebastian Nagel wrote:
> 
>> Hi all,
>>
>> More than 60 issues have been resolved for Nutch 1.19
>>
>>   https://issues.apache.org/jira/projects/NUTCH/versions/12349580
>>
>> including:
>>  - important dependency upgrades
>>    - Hadoop 3.3.3
>>    - Any23 2.7
>>    - Tika 2.3.0
>>  - plugin-specific URL stream handlers (NUTCH-2429)
>>  - migration
>>    - from Java/JDK 8 to 11
>>    - from Log4j 1 to Log4j 2
>>
>> ... and various other fixes and improvements.
>>
>> The last release (1.18) happened in January 2021, so it's definitely high
>> time to release 1.19. As usual, we'll check all remaining issues to see
>> whether they should be fixed now or can wait for a later release.
>>
>> I would be ready to push a release candidate within the next two weeks,
>> and will start working through the remaining issues and checking for
>> dependency upgrades required to address potential vulnerabilities. Please
>> comment on issues you want to see fixed in 1.19! Reviews of open pull
>> requests and patches are also welcome!
>>
>> Thanks,
>> Sebastian
>>
>