Hi Seb,

I'm sorry for taking forever to reply.  What you're seeing is a bug, now fixed:
https://issues.apache.org/jira/browse/TIKA-3593

If you specify the DcXMLParser in your tika-config after the default
parser, it _should_ be selected instead of the XMLParser.  Let me know
if I can help with this temporary workaround.
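
As a rough sketch, the workaround might look something like the
tika-config below.  The file name tika-config.xml is only a placeholder,
and this has not been checked against 2.1.0, so treat it as a starting
point rather than a verified config:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- tika-config.xml (placeholder name) -->
<properties>
  <parsers>
    <!-- Load the default parsers first... -->
    <parser class="org.apache.tika.parser.DefaultParser"/>
    <!-- ...then list DcXMLParser afterwards, so that it should be
         picked for XML instead of the plain XMLParser. -->
    <parser class="org.apache.tika.parser.xml.DcXMLParser"/>
  </parsers>
</properties>
```

With tika-app, the config can be passed via --config=tika-config.xml
ahead of the -J option; X-TIKA:Parsed-By in the output should then show
which parser was actually used.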

Thank you for identifying this problem!

Cheers,

      Tim

On Thu, Nov 11, 2021 at 7:21 AM Sebastian Nagel
<[email protected]> wrote:
>
> Hi,
>
> when is the Dublin Core XML parser used to parse XML files?
> Is there a configuration required to enable the DcXMLParser?
>
> There is a difference between 1.27 and 2.1.0:
>
> $> java -jar tika-app-1.27.jar -J \
>       https://news.haltonhills.halinet.on.ca/dc.xml \
>    | jq '.[0]."dc:title"'
> "Deaths"
> $> java -jar tika-app-2.1.0.jar ...
> null
>
> $> java -jar tika-app-1.27.jar -J \
>       https://news.haltonhills.halinet.on.ca/dc.xml \
>    | jq '.[0]."X-Parsed-By"'
> [
>   "org.apache.tika.parser.DefaultParser",
>   "org.apache.tika.parser.xml.DcXMLParser"
> ]
> $> java -jar tika-app-2.1.0.jar -J \
>       https://news.haltonhills.halinet.on.ca/dc.xml \
>    | jq '.[0]."X-TIKA:Parsed-By"'
> [
>   "org.apache.tika.parser.DefaultParser",
>   "org.apache.tika.parser.xml.XMLParser"
> ]
>
>
> Thanks,
> Sebastian
