[jira] [Commented] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES
    [ https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17571451#comment-17571451 ]

Adrien Nguyen commented on TIKA-1997:
-------------------------------------

[~NguyenKhacChinh] [~roberto.benedetti]

I see that the config file also specifies for pkcs7-signature and for application/x-dbf

If the filename had the correct extension, would it take a higher priority than the magic value?

Is there a way to change that priority, or to disable application/x-dbf entirely, in our own app?

> Problem in Tika().detect for xml file signed in CADES
> -----------------------------------------------------
>
>                 Key: TIKA-1997
>                 URL: https://issues.apache.org/jira/browse/TIKA-1997
>             Project: Tika
>          Issue Type: Sub-task
>          Components: detector
>    Affects Versions: 1.13
>         Environment: JDK 1.7
>            Reporter: Michele Andreano
>            Priority: Blocker
>         Attachments: test.xml.p7m
>
>
> When I submit an XML file signed in P7M format to Tika, I expect Tika to
> return the MIME type application/pkcs7-mime; instead it gives me
> application/pkcs7-signature.
> How is this possible?

--
This message was sent by Atlassian Jira
(v8.20.10#820010)
[jira] [Commented] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES
    [ https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821195#comment-16821195 ]

Roberto Benedetti commented on TIKA-1997:
-----------------------------------------

I think your problem is that {{application/x-dbf}} has a higher priority than {{application/pkcs7-signature}}, and the regular expression for that MIME type accidentally matches your file.
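
The effect described in this comment can be illustrated with a toy model. The sketch below is not Tika's implementation: the class {{ToyMagicDetector}}, the media type names {{example/loose}} and {{example/strict}}, the priorities and the regular expressions are all hypothetical. It only assumes that every rule has a numeric priority and that the highest-priority rule matching the start of the file wins.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/** Toy model of priority-ordered magic matching; not Tika's actual code. */
final class ToyMagicDetector {

    /** One hypothetical detection rule: a media type, a priority, a regex. */
    private static final class Rule {
        final String mediaType;
        final int priority;
        final Pattern pattern;

        Rule(String mediaType, int priority, String regex) {
            this.mediaType = mediaType;
            this.priority = priority;
            // DOTALL so that '.' also matches bytes that decode to line breaks.
            this.pattern = Pattern.compile(regex, Pattern.DOTALL);
        }
    }

    private final List<Rule> rules = new ArrayList<>();

    ToyMagicDetector add(String mediaType, int priority, String regex) {
        rules.add(new Rule(mediaType, priority, regex));
        return this;
    }

    /** Returns the media type of the highest-priority rule matching the header. */
    String detect(byte[] header) {
        // ISO-8859-1 maps every byte to exactly one char, so a regex can
        // address raw byte values with \xNN escapes.
        String text = new String(header, StandardCharsets.ISO_8859_1);
        Rule best = null;
        for (Rule rule : rules) {
            boolean matches = rule.pattern.matcher(text).lookingAt();
            if (matches && (best == null || rule.priority > best.priority)) {
                best = rule;
            }
        }
        return best == null ? "application/octet-stream" : best.mediaType;
    }
}
```

With a strict rule at priority 50 and a loose rule at priority 60 that both match the same header, the loose rule is returned. Lowering the loose rule's priority below 50 makes the strict rule win, which is the kind of change the comment points towards.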
[jira] [Commented] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES
    [ https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821015#comment-16821015 ]

Chinh Nguyen commented on TIKA-1997:
------------------------------------

Currently, with Tika 1.20, the attached PKCS7 file is detected as a DBF file:
{quote}
java -jar tika-app-1.20.jar -d test.xml.p7m
application/x-dbf
{quote}
[jira] [Commented] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES
    [ https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16752361#comment-16752361 ]

ASF GitHub Bot commented on TIKA-1997:
--------------------------------------

dedabob commented on pull request #267: TIKA-1997 Problem in Tika().detect for xml file signed in CADES
URL: https://github.com/apache/tika/pull/267

Proper detection of the application/pkcs7-mime and application/pkcs7-signature types.
No changes to application/x-pkcs7-certificates, because they break some tests.

----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org
[jira] [Commented] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES
    [ https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16749970#comment-16749970 ]

Tim Allison commented on TIKA-1997:
-----------------------------------

Please do share example files. Thank you for the ping, the references and the explanation.
[jira] [Commented] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES
    [ https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746737#comment-16746737 ]

Roberto Benedetti commented on TIKA-1997:
-----------------------------------------

Updated references are:
* [RFC-5652, Cryptographic Message Syntax (CMS)|https://tools.ietf.org/html/rfc5652]
* [RFC-5751, Secure/Multipurpose Internet Mail Extensions (S/MIME) Version 3.2 Message Specification|https://tools.ietf.org/html/rfc5751]
* [RFC-7468, Textual Encodings of PKIX, PKCS, and CMS Structures|https://tools.ietf.org/html/rfc7468]

Tika looks for the "pkcs7-signedData" OID at the beginning of the file and, if found, returns "application/pkcs7-signature". There are, however, three media types with that OID at the beginning, namely:
* "application/pkcs7-signature", extension ".p7s", when the signed content is not present (detached signature)
* "application/pkcs7-mime; smime-type=signed-data", extension ".p7m", when the signed content is present
* "application/pkcs7-mime; smime-type=certs-only", extension ".p7c" (".p7b" is not mentioned but can be found too), when there are only certificates and (optionally) CRLs

Extension ".p7m" is also used when the OID at the beginning is "pkcs7-envelopedData" and the media type is "application/pkcs7-mime; smime-type=enveloped-data".

Extension ".p7z" is used when the OID at the beginning is "id-smime-ct-compressedData" and the media type is "application/pkcs7-mime; smime-type=compressed-data".

Furthermore, the label in the textual encoding is always PKCS7 (i.e. the file begins with "-----BEGIN PKCS7-----").

I can provide examples, built using openssl, but to support those media types Tika would need to:
* return parameters in the media type when detecting streams
* return different extensions based on media type parameters
* further inspect streams when "-----BEGIN PKCS7-----" or pkcs7-signedData is found (as it does for XML streams)
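
The mapping listed in this comment can be written down as a small lookup. This is a minimal sketch, not Tika code: the class {{Pkcs7MediaTypes}}, its nested types and the {{classify}} method are hypothetical names. It also assumes that the caller has already parsed the structure far enough to know whether signed content and signer infos are present, and it treats a signedData with neither as the certs-only case, following the "degenerate signedData" wording of the S/MIME RFCs.

```java
/** Illustrative mapping from CMS content type to media type and extension. */
final class Pkcs7MediaTypes {

    /** Content types identified by the OID at the start of the file. */
    enum ContentType { SIGNED_DATA, ENVELOPED_DATA, COMPRESSED_DATA }

    /** A media type (with its smime-type parameter, if any) and an extension. */
    static final class Result {
        final String mediaType;
        final String extension;

        Result(String mediaType, String extension) {
            this.mediaType = mediaType;
            this.extension = extension;
        }
    }

    /**
     * @param hasContent whether the signed content is embedded (signedData only)
     * @param hasSigners whether at least one signer info exists (signedData only)
     */
    static Result classify(ContentType type, boolean hasContent, boolean hasSigners) {
        switch (type) {
            case ENVELOPED_DATA:
                return new Result("application/pkcs7-mime; smime-type=enveloped-data", ".p7m");
            case COMPRESSED_DATA:
                return new Result("application/pkcs7-mime; smime-type=compressed-data", ".p7z");
            case SIGNED_DATA:
                if (hasContent) {
                    return new Result("application/pkcs7-mime; smime-type=signed-data", ".p7m");
                }
                if (hasSigners) {
                    // Detached signature: signers but no embedded content.
                    return new Result("application/pkcs7-signature", ".p7s");
                }
                // Assumption: no content and no signers means certificates/CRLs only.
                return new Result("application/pkcs7-mime; smime-type=certs-only", ".p7c");
            default:
                throw new IllegalArgumentException("unknown content type: " + type);
        }
    }
}
```

For the file attached to this issue, a signed XML document with the content embedded, this lookup would yield "application/pkcs7-mime; smime-type=signed-data" and ".p7m", which is the result the reporter expected.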
[jira] [Commented] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES
    [ https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042465#comment-16042465 ]

Alessandro Scaldaferro commented on TIKA-1997:
----------------------------------------------

In https://www.ietf.org/rfc/rfc2633.txt I found the following information:

   MIME Type                                File Extension

   Application/pkcs7-mime (signedData,      .p7m
   envelopedData)

   Application/pkcs7-mime (degenerate       .p7c
   signedData "certs-only" message)

   Application/pkcs7-signature              .p7s

It looks like .p7m files are signed files (the original file plus the signature data), while .p7s files are "signature files" that contain only the signature data and not the original file.
[jira] [Commented] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES
    [ https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15504023#comment-15504023 ]

Nick Burch commented on TIKA-1997:
----------------------------------

Running your file through the openssl {{asn1parse}} tool shows that its first object is of type {{pkcs7-signedData}}. It also shows the signature from {{INFOCERT SPA}}.

So, it does look to be a signed PKCS7 file, and hence Tika appears to be doing the right thing.

Unless I've misunderstood something about PKCS7 files and/or the asn1 dump output?
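
The "first object" check mentioned in this comment can be approximated without openssl. The sketch below is an illustration only, under stated assumptions: the class {{ContentTypeOidReader}} and its methods are hypothetical, the input is assumed to be binary BER/DER (not the PEM textual form), and the OID is assumed to be shorter than 128 bytes. The OID that openssl prints as {{pkcs7-signedData}} is 1.2.840.113549.1.7.2.

```java
/** Reads the OID that opens a BER/DER-encoded ContentInfo; illustration only. */
final class ContentTypeOidReader {

    /** Returns the dotted form of the first OID inside the outer SEQUENCE. */
    static String firstOid(byte[] data) {
        int pos = 0;
        if (data.length < 4 || (data[pos++] & 0xFF) != 0x30) {
            throw new IllegalArgumentException("does not start with a SEQUENCE");
        }
        int lengthOctet = data[pos++] & 0xFF;
        if (lengthOctet > 0x80) {
            pos += lengthOctet & 0x7F; // long form: skip the extra length octets
        }
        // Short form and 0x80 (indefinite length) need no further skipping.
        if (pos + 2 > data.length || (data[pos++] & 0xFF) != 0x06) {
            throw new IllegalArgumentException("first element is not an OID");
        }
        int oidLength = data[pos++] & 0xFF;
        if (oidLength >= 0x80 || pos + oidLength > data.length) {
            throw new IllegalArgumentException("truncated or unsupported OID");
        }
        return decodeOid(data, pos, oidLength);
    }

    /** Decodes base-128 subidentifiers; the first one packs two arcs as 40*X + Y. */
    static String decodeOid(byte[] buf, int off, int len) {
        StringBuilder sb = new StringBuilder();
        long value = 0;
        boolean first = true;
        for (int i = off; i < off + len; i++) {
            int b = buf[i] & 0xFF;
            value = (value << 7) | (b & 0x7F);
            if ((b & 0x80) == 0) { // high bit clear: last byte of this subidentifier
                if (first) {
                    long x = Math.min(value / 40, 2);
                    sb.append(x).append('.').append(value - 40 * x);
                    first = false;
                } else {
                    sb.append('.').append(value);
                }
                value = 0;
            }
        }
        return sb.toString();
    }
}
```

As the later comments in this thread point out, this OID alone does not separate a detached signature from a file that embeds the signed content, so a detector would have to look further into the structure.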
[jira] [Commented] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES
    [ https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15493792#comment-15493792 ]

Tim Allison commented on TIKA-1997:
-----------------------------------

[~gagravarr], any recommendations on this one?
[jira] [Commented] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES
    [ https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15318100#comment-15318100 ]

Michele Andreano commented on TIKA-1997:
----------------------------------------

An example file is attached to this issue.

> Problem in Tika().detect for xml file signed in CADES
> -----------------------------------------------------
>
>                 Key: TIKA-1997
>                 URL: https://issues.apache.org/jira/browse/TIKA-1997
>             Project: Tika
>          Issue Type: Sub-task
>          Components: detector
>    Affects Versions: 1.13
>         Environment: JDK 1.7
>            Reporter: Michele Andreano
>             Fix For: 1.13
>
>         Attachments: test.xml.p7m
>
>
> When I submit an XML file signed in P7M format to Tika, I expect Tika to
> return the MIME type application/pkcs7-mime; instead it gives me
> application/pkcs7-signature.
> How is this possible?