Hi Seb,
I'm sorry for taking so long to reply. What you're seeing is a bug, and it is now fixed:
https://issues.apache.org/jira/browse/TIKA-3593
As a temporary workaround, if you list the DcXMLParser in your
tika-config after the DefaultParser, it _should_ be selected instead
of the XMLParser. Let me know if I can help with this.
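
For reference, here is a minimal sketch of what such a tika-config
might look like. The file name tika-config.xml is only an assumption,
and I'd treat this as a starting point rather than a verified setup:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<properties>
  <parsers>
    <!-- Default parser first: it loads all the usual parsers, XMLParser included -->
    <parser class="org.apache.tika.parser.DefaultParser"/>
    <!-- DcXMLParser listed afterwards, so it should be preferred over XMLParser -->
    <parser class="org.apache.tika.parser.xml.DcXMLParser"/>
  </parsers>
</properties>
```

You would then point tika-app at it with --config=tika-config.xml
placed before the -J option, and check "X-TIKA:Parsed-By" as in your
original commands to see which parser was actually picked.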
Thank you for identifying this problem!
Cheers,
Tim
On Thu, Nov 11, 2021 at 7:21 AM Sebastian Nagel
<[email protected]> wrote:
>
> Hi,
>
> when is the Dublin Core XML parser used to parse XML files?
> Is there a configuration required to enable the DcXMLParser?
>
> There is a difference between 1.27 and 2.1.0:
>
> $> java -jar tika-app-1.27.jar -J \
> https://news.haltonhills.halinet.on.ca/dc.xml \
> | jq '.[0]."dc:title"'
> "Deaths"
> $> java -jar tika-app-2.1.0.jar ...
> null
>
> $> java -jar tika-app-1.27.jar -J \
> https://news.haltonhills.halinet.on.ca/dc.xml \
> | jq '.[0]."X-Parsed-By"'
> [
> "org.apache.tika.parser.DefaultParser",
> "org.apache.tika.parser.xml.DcXMLParser"
> ]
> $> java -jar tika-app-2.1.0.jar -J \
> https://news.haltonhills.halinet.on.ca/dc.xml \
> | jq '.[0]."X-TIKA:Parsed-By"'
> [
> "org.apache.tika.parser.DefaultParser",
> "org.apache.tika.parser.xml.XMLParser"
> ]
>
>
> Thanks,
> Sebastian