[jira] [Comment Edited] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES

2019-04-18 Thread Roberto Benedetti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16821195#comment-16821195
 ] 

Roberto Benedetti edited comment on TIKA-1997 at 4/18/19 3:10 PM:
--

I think your problem is that {{application/x-dbf}} has a higher priority than 
{{application/pkcs7-signature}} and the regular expression for that MIME type 
accidentally matches your file.
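
To illustrate the kind of shadowing described above, here is a minimal sketch of priority-based magic matching. The rule class, the priorities and both patterns are invented for the example (they are not the ones shipped with Tika), and it assumes the loosely matching rule is the one with the higher priority.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class MagicRule:
    media_type: str
    priority: int   # higher value wins when several rules match
    pattern: bytes  # regular expression tried against the start of the file


# DER prefix (tag, length, first arcs) shared by the 1.2.840.113549.1.7.x OIDs.
OID_PREFIX = bytes.fromhex("06092a864886f70d0107")

# Hypothetical rules: patterns and priorities are made up for illustration.
RULES = [
    MagicRule("application/x-dbf", 60, rb"[\x00-\x7f]."),
    MagicRule("application/pkcs7-signature", 50,
              rb"\x30.{1,6}" + re.escape(OID_PREFIX)),
]


def detect(head: bytes, rules=RULES) -> str:
    """Return the media type of the highest-priority rule that matches."""
    hits = [r for r in rules if re.match(r.pattern, head, re.DOTALL)]
    if not hits:
        return "application/octet-stream"
    return max(hits, key=lambda r: r.priority).media_type


# A BER SEQUENCE header followed by the pkcs7-signedData OID.
SAMPLE = bytes.fromhex("3080" + "06092a864886f70d010702")
print(detect(SAMPLE))
```

Both rules match the sample, so the result depends only on the priorities; lowering the priority of the loose rule, or tightening its pattern, lets the more specific rule win.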

 


was (Author: roberto.benedetti):
I think your problem is that {{application/x-dbf }} has a higher priority than 
{{application/pkcs7-signature}} and the regular expression with that mime type 
accidentally matches your file.

 

> Problem in Tika().detect for xml file signed in CADES
> -
>
> Key: TIKA-1997
> URL: https://issues.apache.org/jira/browse/TIKA-1997
> Project: Tika
> Issue Type: Sub-task
> Components: detector
> Affects Versions: 1.13
> Environment: JDK 1.7
> Reporter: Michele Andreano
> Priority: Blocker
> Attachments: test.xml.p7m
>
>
> When I submit an XML file signed in P7M format to Tika, I expect Tika to 
> return the MIME type application/pkcs7-mime; instead it gives me 
> application/pkcs7-signature.
> How is that possible?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES

2019-01-18 Thread Roberto Benedetti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746737#comment-16746737
 ] 

Roberto Benedetti edited comment on TIKA-1997 at 1/18/19 11:23 PM:
---

Updated references are:
 * [RFC-5652, Cryptographic Message Syntax 
(CMS)|https://tools.ietf.org/html/rfc5652]
 * [RFC-5751, Secure/Multipurpose Internet Mail Extensions (S/MIME) Version 3.2 
Message Specification|https://tools.ietf.org/html/rfc5751]
 * [RFC-7468, Textual Encodings of PKIX, PKCS, and CMS 
Structures|https://tools.ietf.org/html/rfc7468]

Tika looks for any "pkcs7" OID at the beginning of the file and, if found, 
returns "application/pkcs7-signature".

The OIDs that should be looked for are "pkcs7-signedData", 
"pkcs7-envelopedData" and "id-smime-ct-compressedData".
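
As a rough sketch of what looking for those OIDs could mean at the byte level: the numeric values below are the ones I know from RFC 5652 and RFC 3274, while the function names and the 32-byte search window are my own assumptions and not how Tika does it.

```python
from typing import Optional

# Dotted values for the three content types named above.
OIDS = {
    "pkcs7-signedData": "1.2.840.113549.1.7.2",
    "pkcs7-envelopedData": "1.2.840.113549.1.7.3",
    "id-smime-ct-compressedData": "1.2.840.113549.1.9.16.1.9",
}


def encode_oid(dotted: str) -> bytes:
    """DER-encode an OBJECT IDENTIFIER (short-form length only)."""
    arcs = [int(a) for a in dotted.split(".")]
    body = bytearray([40 * arcs[0] + arcs[1]])  # first two arcs share one byte
    for arc in arcs[2:]:
        chunk = [arc & 0x7F]                    # last base-128 digit, no flag
        arc >>= 7
        while arc:
            chunk.append(0x80 | (arc & 0x7F))   # earlier digits carry the flag
            arc >>= 7
        body.extend(reversed(chunk))
    return bytes([0x06, len(body)]) + bytes(body)


def find_content_type(head: bytes, window: int = 32) -> Optional[str]:
    """Return the name of the first known OID found near the start, if any."""
    for name, dotted in OIDS.items():
        if encode_oid(dotted) in head[:window]:
            return name
    return None


print(encode_oid(OIDS["pkcs7-signedData"]).hex())
```

Matching the complete OID, rather than only the shared "pkcs7" prefix, is what lets the three content types be told apart.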

There are three media types with "pkcs7-signedData" at the beginning, namely:
 * "application/pkcs7-signature", extension ".p7s", when the signed content is 
not present (detached signature)
 * "application/pkcs7-mime; smime-type=signed-data", extension ".p7m", when the 
signed content is present
 * "application/pkcs7-mime; smime-type=certs-only", extension ".p7c", when 
there are only certificates and (optionally) CRLs

When the OID is "pkcs7-envelopedData" the media type is 
"application/pkcs7-mime; smime-type=enveloped-data" and the extension is ".p7m".

When the OID is "id-smime-ct-compressedData" the media type is 
"application/pkcs7-mime; smime-type=compressed-data" and the extension is 
".p7z".
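
The mapping described above can be written down as a small decision function. This is only a sketch: the two flags are hypothetical inputs that a real detector would have to extract by parsing the SignedData structure (not shown), and treating "no signers and no content" as the certs-only case is my reading of the description.

```python
from typing import Optional, Tuple


def classify(oid_name: str,
             has_content: bool = False,
             has_signers: bool = True) -> Optional[Tuple[str, str]]:
    """Map a content-type OID name to (media type, extension).

    has_content and has_signers only matter for pkcs7-signedData and are
    assumed to come from a parser that is not part of this sketch.
    """
    if oid_name == "pkcs7-envelopedData":
        return ("application/pkcs7-mime; smime-type=enveloped-data", ".p7m")
    if oid_name == "id-smime-ct-compressedData":
        return ("application/pkcs7-mime; smime-type=compressed-data", ".p7z")
    if oid_name == "pkcs7-signedData":
        if not has_signers and not has_content:
            # only certificates and (optionally) CRLs
            return ("application/pkcs7-mime; smime-type=certs-only", ".p7c")
        if has_content:
            # the signed content travels inside the structure
            return ("application/pkcs7-mime; smime-type=signed-data", ".p7m")
        # detached signature: the content lives elsewhere
        return ("application/pkcs7-signature", ".p7s")
    return None


print(classify("pkcs7-signedData", has_content=True))
```

For the attached test.xml.p7m the expected answer would be the signed-data variant, provided the parser reports that the content is present.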

Extension ".p7b" is registered in Tika with media type 
"application/x-pkcs7-certificates" but I think the content of such files is the 
same as ".p7c" ones.

Furthermore the label in the textual encoding is always PKCS7 (i.e. the file 
begins with "-----BEGIN PKCS7-----").
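
Because the textual label is the same for every variant, a detector has to remove the armor and look at the binary content underneath. A minimal sketch, assuming the standard RFC 7468 boundary lines with the PKCS7 label (the function name is made up):

```python
import base64
import re

# RFC 7468 style armor with the PKCS7 label.
PEM_RE = re.compile(
    rb"-----BEGIN PKCS7-----\s*(.*?)\s*-----END PKCS7-----", re.DOTALL)


def to_der(data: bytes) -> bytes:
    """Return the binary payload of a PEM-armored PKCS7 file.

    Input that is not armored is returned unchanged, so the result can be
    handed to the same OID check in both cases.
    """
    match = PEM_RE.search(data)
    if match is None:
        return data
    # Drop line breaks and other whitespace before decoding the base64 body.
    return base64.b64decode(b"".join(match.group(1).split()))


der = bytes.fromhex("308006092a864886f70d010702")
pem = b"-----BEGIN PKCS7-----\n" + base64.encodebytes(der) + b"-----END PKCS7-----\n"
print(to_der(pem) == der)
```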

I can provide examples, built using openssl, but to support those media types 
Tika shall:
 * return parameters in media type when detecting streams
 * return different extensions based on media type parameters
 * further inspect streams when "-----BEGIN PKCS7-----" or 
"pkcs7-signedData" are found (like it does for XML streams)
 * register "application/pkcs7-signature" as sub-class of 
"application/pkcs7-mime"
 * register "application/x-pkcs7-certificates" as an alias of 
"application/pkcs7-mime"

 



[jira] [Comment Edited] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES

2019-01-18 Thread Roberto Benedetti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746737#comment-16746737
 ] 

Roberto Benedetti edited comment on TIKA-1997 at 1/18/19 11:24 PM:
---

Updated references are:
 * [RFC-5652, Cryptographic Message Syntax 
(CMS)|https://tools.ietf.org/html/rfc5652]
 * [RFC-5751, Secure/Multipurpose Internet Mail Extensions (S/MIME) Version 3.2 
Message Specification|https://tools.ietf.org/html/rfc5751]
 * [RFC-7468, Textual Encodings of PKIX, PKCS, and CMS 
Structures|https://tools.ietf.org/html/rfc7468]

Tika looks for any "pkcs7" OID at the beginning of the file and, if found, 
returns "application/pkcs7-signature".

The OIDs that should be looked for are "pkcs7-signedData", 
"pkcs7-envelopedData" and "id-smime-ct-compressedData".

There are three media types with "pkcs7-signedData" at the beginning, namely:
 * "application/pkcs7-signature", extension ".p7s", when the signed content is 
not present (detached signature)
 * "application/pkcs7-mime; smime-type=signed-data", extension ".p7m", when the 
signed content is present
 * "application/pkcs7-mime; smime-type=certs-only", extension ".p7c", when 
there are only certificates and (optionally) CRLs

When the OID is "pkcs7-envelopedData" the media type is 
"application/pkcs7-mime; smime-type=enveloped-data" and the extension is ".p7m".

When the OID is "id-smime-ct-compressedData" the media type is 
"application/pkcs7-mime; smime-type=compressed-data" and the extension is 
".p7z".

Extension ".p7b" is registered in Tika with media type 
"application/x-pkcs7-certificates" but I think the content of such files is the 
same as ".p7c" ones.

Furthermore the label in the textual encoding is always PKCS7 (i.e. the file 
begins with "-----BEGIN PKCS7-----").

I can provide examples, built using openssl, but to support those media types 
Tika shall:
 * return parameters in media type when detecting streams
 * return different extensions based on media type parameters
 * further inspect streams when "-----BEGIN PKCS7-----" or 
"pkcs7-signedData" are found (like it does for XML streams)
 * register "application/pkcs7-signature" as sub-class of 
"application/pkcs7-mime"
 * register "application/x-pkcs7-certificates" as an alias of 
"application/pkcs7-mime; smime-type=certs-only"
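
The second and last points above could look roughly like this on the lookup side. It is a sketch under my own assumptions: the table, the alias map and both function names are hypothetical and do not reflect Tika's registry API.

```python
from typing import Dict, Optional, Tuple


def parse_media_type(value: str) -> Tuple[str, Dict[str, str]]:
    """Split 'type/subtype; key=value' into a base type and its parameters."""
    base, *params = [part.strip() for part in value.split(";")]
    parsed = {}
    for param in params:
        if "=" in param:
            key, val = param.split("=", 1)
            parsed[key.strip().lower()] = val.strip().strip('"').lower()
    return base.lower(), parsed


# Extension keyed by (base type, smime-type parameter).
EXTENSIONS = {
    ("application/pkcs7-signature", None): ".p7s",
    ("application/pkcs7-mime", "signed-data"): ".p7m",
    ("application/pkcs7-mime", "enveloped-data"): ".p7m",
    ("application/pkcs7-mime", "certs-only"): ".p7c",
    ("application/pkcs7-mime", "compressed-data"): ".p7z",
}

# Alias proposed in the last point of the list.
ALIASES = {
    "application/x-pkcs7-certificates":
        "application/pkcs7-mime; smime-type=certs-only",
}


def extension_for(value: str) -> Optional[str]:
    """Resolve aliases first, then pick the extension for the parameterized type."""
    base, params = parse_media_type(value)
    if base in ALIASES:
        base, params = parse_media_type(ALIASES[base])
    return EXTENSIONS.get((base, params.get("smime-type")))


print(extension_for("application/pkcs7-mime; smime-type=compressed-data"))
```

Note that with the alias resolved this way, "application/x-pkcs7-certificates" maps to the extension of its target rather than to ".p7b".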

 



[jira] [Comment Edited] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES

2019-01-18 Thread Roberto Benedetti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746737#comment-16746737
 ] 

Roberto Benedetti edited comment on TIKA-1997 at 1/18/19 11:14 PM:
---

Updated references are:
 * [RFC-5652, Cryptographic Message Syntax 
(CMS)|https://tools.ietf.org/html/rfc5652]
 * [RFC-5751, Secure/Multipurpose Internet Mail Extensions (S/MIME) Version 3.2 
Message Specification|https://tools.ietf.org/html/rfc5751]
 * [RFC-7468, Textual Encodings of PKIX, PKCS, and CMS 
Structures|https://tools.ietf.org/html/rfc7468]

Tika looks for any "pkcs7" OID at the beginning of the file and, if found, 
returns "application/pkcs7-signature".

The OIDs that should be looked for are "pkcs7-signedData", 
"pkcs7-envelopedData" and "id-smime-ct-compressedData".

There are three media types with "pkcs7-signedData" at the beginning, namely:
 * "application/pkcs7-signature", extension ".p7s", when the signed content is 
not present (detached signature)
 * "application/pkcs7-mime; smime-type=signed-data", extension ".p7m", when the 
signed content is present
 * "application/pkcs7-mime; smime-type=certs-only", extension ".p7c", when 
there are only certificates and (optionally) CRLs

When the OID is "pkcs7-envelopedData" the media type is 
"application/pkcs7-mime; smime-type=enveloped-data" and the extension is ".p7m".

When the OID is "id-smime-ct-compressedData" the media type is 
"application/pkcs7-mime; smime-type=compressed-data" and the extension is 
".p7z".

Extension ".p7b" is registered in Tika with media type 
"application/x-pkcs7-certificates" but I think the content of such files is the 
same as ".p7c" ones.

Furthermore the label in the textual encoding is always PKCS7 (i.e. the file 
begins with "-----BEGIN PKCS7-----").

I can provide examples, built using openssl, but to support those media types 
Tika shall:
 * return parameters in media type when detecting streams
 * return different extensions based on media type parameters
 * further inspect streams when "-----BEGIN PKCS7-----" or 
"pkcs7-signedData" are found (like it does for XML streams)
 * register "application/pkcs7-signature" as sub-class of 
"application/pkcs7-mime"

 



[jira] [Comment Edited] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES

2019-01-18 Thread Roberto Benedetti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746737#comment-16746737
 ] 

Roberto Benedetti edited comment on TIKA-1997 at 1/18/19 11:14 PM:
---

Updated references are:
 * [RFC-5652, Cryptographic Message Syntax 
(CMS)|https://tools.ietf.org/html/rfc5652]
 * [RFC-5751, Secure/Multipurpose Internet Mail Extensions (S/MIME) Version 3.2 
Message Specification|https://tools.ietf.org/html/rfc5751]
 * [RFC-7468, Textual Encodings of PKIX, PKCS, and CMS 
Structures|https://tools.ietf.org/html/rfc7468]

Tika looks for any "pkcs7" OID at the beginning of the file and, if found, 
returns "application/pkcs7-signature".

The OIDs that should be looked for are "pkcs7-signedData", 
"pkcs7-envelopedData" and "id-smime-ct-compressedData".

There are three media types with "pkcs7-signedData" at the beginning, namely:
 * "application/pkcs7-signature", extension ".p7s", when the signed content is 
not present (detached signature)
 * "application/pkcs7-mime; smime-type=signed-data", extension ".p7m", when the 
signed content is present
 * "application/pkcs7-mime; smime-type=certs-only", extension ".p7c", when 
there are only certificates and (optionally) CRLs

When the OID is "pkcs7-envelopedData" the media type is 
"application/pkcs7-mime; smime-type=enveloped-data" and the extension is ".p7m".

When the OID is "id-smime-ct-compressedData" the media type is 
"application/pkcs7-mime; smime-type=compressed-data" and the extension is 
".p7z".

Extension ".p7b" is registered in Tika with media type 
"application/x-pkcs7-certificates" but I think the content of such files is the 
same as ".p7c" ones.

Furthermore the label in the textual encoding is always PKCS7 (i.e. the file 
begins with "-----BEGIN PKCS7-----").

I can provide examples, built using openssl, but to support those media types 
Tika shall:
 * return parameters in media type when detecting streams
 * return different extensions based on media type parameters
 * further inspect streams when "-----BEGIN PKCS7-----" or 
"pkcs7-signedData" are found (like it does for XML streams)
 * register "application/pkcs7-signature" as sub-class of 
"application/pkcs7-mime"
 * remove "application/x-pkcs7-certificates"

 



[jira] [Comment Edited] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES

2019-01-18 Thread Roberto Benedetti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746737#comment-16746737
 ] 

Roberto Benedetti edited comment on TIKA-1997 at 1/18/19 11:06 PM:
---

Updated references are:
 * [RFC-5652, Cryptographic Message Syntax 
(CMS)|https://tools.ietf.org/html/rfc5652]
 * [RFC-5751, Secure/Multipurpose Internet Mail Extensions (S/MIME) Version 3.2 
Message Specification|https://tools.ietf.org/html/rfc5751]
 * [RFC-7468, Textual Encodings of PKIX, PKCS, and CMS 
Structures|https://tools.ietf.org/html/rfc7468]

Tika looks for any "pkcs7" OID at the beginning of the file and, if found, 
returns "application/pkcs7-signature".

The OIDs that should be looked for are "pkcs7-signedData", 
"pkcs7-envelopedData" and "id-smime-ct-compressedData".

There are three media types with "pkcs7-signedData" at the beginning, namely:
 * "application/pkcs7-signature", extension ".p7s", when the signed content is 
not present (detached signature)
 * "application/pkcs7-mime; smime-type=signed-data", extension ".p7m", when the 
signed content is present
 * "application/pkcs7-mime; smime-type=certs-only", extension ".p7c", when 
there are only certificates and (optionally) CRLs

When the OID is "pkcs7-envelopedData" the media type is 
"application/pkcs7-mime; smime-type=enveloped-data" and the extension is ".p7m".

When the OID is "id-smime-ct-compressedData" the media type is 
"application/pkcs7-mime; smime-type=compressed-data" and the extension is 
".p7z".

Extension ".p7b" is registered in Tika with media type 
"application/x-pkcs7-certificates" but I think the content of such files is the 
same as ".p7c" ones.

Furthermore the label in the textual encoding is always PKCS7 (i.e. the file 
begins with "-----BEGIN PKCS7-----").

I can provide examples, built using openssl, but to support those media types 
Tika shall:
 * return parameters in media type when detecting streams
 * return different extensions based on media type parameters
 * further inspect streams when "-----BEGIN PKCS7-----" or 
"pkcs7-signedData" are found (like it does for XML streams)
 * register "application/pkcs7-signature" as sub-class of 
"application/pkcs7-mime" (it is referred to as a "degenerate case")

 




[jira] [Comment Edited] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES

2019-01-18 Thread Roberto Benedetti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746737#comment-16746737
 ] 

Roberto Benedetti edited comment on TIKA-1997 at 1/18/19 11:00 PM:
---

Updated references are:
 * [RFC-5652, Cryptographic Message Syntax 
(CMS)|https://tools.ietf.org/html/rfc5652]
 * [RFC-5751, Secure/Multipurpose Internet Mail Extensions (S/MIME) Version 3.2 
Message Specification|https://tools.ietf.org/html/rfc5751]
 * [RFC-7468, Textual Encodings of PKIX, PKCS, and CMS 
Structures|https://tools.ietf.org/html/rfc7468]

Tika looks for any "pkcs7" OID at the beginning of the file and, if found, 
returns "application/pkcs7-signature".

The OIDs that should be looked for are "pkcs7-signedData", 
"pkcs7-envelopedData" and "id-smime-ct-compressedData".

There are three media types with "pkcs7-signedData" at the beginning, namely:
 * "application/pkcs7-signature", extension ".p7s", when the signed content is 
not present (detached signature)
 * "application/pkcs7-mime; smime-type=signed-data", extension ".p7m", when the 
signed content is present
 * "application/pkcs7-mime; smime-type=certs-only", extension ".p7c", when 
there are only certificates and (optionally) CRLs

When the OID is "pkcs7-envelopedData" the media type is 
"application/pkcs7-mime; smime-type=enveloped-data" and the extension is ".p7m".

When the OID is "id-smime-ct-compressedData" the media type is 
"application/pkcs7-mime; smime-type=compressed-data" and the extension is 
".p7z".

Extension ".p7b" is registered in Tika with media type 
"application/x-pkcs7-certificates" but I think the content of such files is the 
same as ".p7c" ones.

Furthermore the label in the textual encoding is always PKCS7 (i.e. the file 
begins with "-----BEGIN PKCS7-----").

I can provide examples, built using openssl, but to support those media types 
Tika shall:
 * return parameters in media type when detecting streams
 * return different extensions based on media type parameters
 * further inspect streams when "-----BEGIN PKCS7-----" or 
"pkcs7-signedData" are found (like it does for XML streams)

 




[jira] [Comment Edited] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES

2019-01-18 Thread Roberto Benedetti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746737#comment-16746737
 ] 

Roberto Benedetti edited comment on TIKA-1997 at 1/18/19 10:47 PM:
---

Updated references are:
 * [RFC-5652, Cryptographic Message Syntax 
(CMS)|https://tools.ietf.org/html/rfc5652]
 * [RFC-5751, Secure/Multipurpose Internet Mail Extensions (S/MIME) Version 3.2 
Message Specification|https://tools.ietf.org/html/rfc5751]
 * [RFC-7468, Textual Encodings of PKIX, PKCS, and CMS 
Structures|https://tools.ietf.org/html/rfc7468]

Tika looks for "pkcs7-signedData" OID at the beginning of the file and, if 
found, returns "application/pkcs7-signature".

There are, however, three media types with that OID at the beginning, namely:
 * "application/pkcs7-signature", extension ".p7s", when the signed content is 
not present (detached signature)
 * "application/pkcs7-mime; smime-type=signed-data", extension ".p7m", when the 
signed content is present
 * "application/pkcs7-mime; smime-type=certs-only", extension ".p7c", when 
there are only certificates and (optionally) CRLs

Extension ".p7b" is registered in Tika with media type 
"application/x-pkcs7-certificates" but I think the content of such files is the 
same as ".p7c" ones.

Extension ".p7m" is also used when the OID at the beginning is 
"pkcs7-envelopedData" and the media type is "application/pkcs7-mime; 
smime-type=enveloped-data".

Extension ".p7z" is used when the OID at the beginning is 
"id-smime-ct-compressedData" and the media type is "application/pkcs7-mime; 
smime-type=compressed-data".
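
As a rough illustration of the mapping described above, the following Java sketch picks a media type and an extension from the content-type OID plus two facts about the parsed structure. The class name, the method names and the two boolean inputs are hypothetical and are not part of Tika; the OID values are the ones defined in RFC 5652 and RFC 3274, and the "application/octet-stream" fallback is an assumption.

```java
/** Hypothetical helper, not part of Tika: maps a parsed CMS structure to an S/MIME media type. */
final class Pkcs7MediaTypes {
    static final String SIGNED_DATA = "1.2.840.113549.1.7.2";          // pkcs7-signedData
    static final String ENVELOPED_DATA = "1.2.840.113549.1.7.3";       // pkcs7-envelopedData
    static final String COMPRESSED_DATA = "1.2.840.113549.1.9.16.1.9"; // compressedData (RFC 3274)

    /**
     * hasContent: the signed content is embedded in the structure.
     * hasSigners: at least one signer info is present.
     */
    static String mediaType(String contentTypeOid, boolean hasContent, boolean hasSigners) {
        switch (contentTypeOid) {
            case SIGNED_DATA:
                if (hasContent) {
                    return "application/pkcs7-mime; smime-type=signed-data";
                }
                if (hasSigners) {
                    return "application/pkcs7-signature"; // detached signature
                }
                return "application/pkcs7-mime; smime-type=certs-only";
            case ENVELOPED_DATA:
                return "application/pkcs7-mime; smime-type=enveloped-data";
            case COMPRESSED_DATA:
                return "application/pkcs7-mime; smime-type=compressed-data";
            default:
                return "application/octet-stream"; // assumed fallback for unknown OIDs
        }
    }

    /** Chooses the extension from the media type and its smime-type parameter. */
    static String extension(String mediaType) {
        if (mediaType.equals("application/pkcs7-signature")) return ".p7s";
        if (mediaType.endsWith("smime-type=certs-only")) return ".p7c";
        if (mediaType.endsWith("smime-type=compressed-data")) return ".p7z";
        if (mediaType.startsWith("application/pkcs7-mime")) return ".p7m";
        return "";
    }

    public static void main(String[] args) {
        String type = mediaType(SIGNED_DATA, true, true);
        System.out.println(type + " -> " + extension(type));
    }
}
```

The point of the sketch is that the OID alone cannot separate the three signedData cases; the detector has to look further into the structure.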

Furthermore, the label in the textual encoding is always PKCS7 (i.e. the file 
begins with "-----BEGIN PKCS7-----").

I can provide examples, built using openssl, but to support those media types 
Tika shall:
 * return parameters in media type when detecting streams
 * return different extensions based on media type parameters
 * further inspect streams when "-----BEGIN PKCS7-----" or "pkcs7-signedData" are 
found (like it does for XML streams)
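
The trigger for that further inspection could be sketched as follows. This is only an assumption about how such a check might look, not Tika code: the class and method names are hypothetical. It tests a stream prefix for the RFC 7468 boundary "-----BEGIN PKCS7-----" or for the DER-encoded pkcs7-signedData OID (1.2.840.113549.1.7.2) directly after the outer SEQUENCE header.

```java
import java.nio.charset.StandardCharsets;

/** Hypothetical sniffer, not Tika code: decides whether a stream prefix deserves deeper CMS inspection. */
final class Pkcs7Sniffer {
    // RFC 7468 encapsulation boundary for CMS / PKCS #7 structures.
    private static final byte[] PEM_LABEL =
            "-----BEGIN PKCS7-----".getBytes(StandardCharsets.US_ASCII);

    // DER encoding of OID 1.2.840.113549.1.7.2: tag 0x06, length 9, then the value bytes.
    private static final byte[] SIGNED_DATA_OID = {
        0x06, 0x09, 0x2A, (byte) 0x86, 0x48, (byte) 0x86, (byte) 0xF7, 0x0D, 0x01, 0x07, 0x02
    };

    /** True if data contains prefix starting exactly at offset. */
    static boolean startsWith(byte[] data, int offset, byte[] prefix) {
        if (offset < 0 || data.length - offset < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if (data[offset + i] != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    static boolean needsCmsInspection(byte[] head) {
        if (startsWith(head, 0, PEM_LABEL)) {
            return true;
        }
        // Binary form: outer SEQUENCE tag (0x30), a length field, then the content-type OID.
        if (head.length < 2 || head[0] != 0x30) {
            return false;
        }
        int first = head[1] & 0xFF;
        // Short form uses one length byte; long form uses 1 + n bytes; 0x80 (indefinite) uses one.
        int lengthBytes = first < 0x80 ? 1 : 1 + (first & 0x7F);
        return startsWith(head, 1 + lengthBytes, SIGNED_DATA_OID);
    }

    public static void main(String[] args) {
        byte[] pem = "-----BEGIN PKCS7-----\nMIIB".getBytes(StandardCharsets.US_ASCII);
        System.out.println(needsCmsInspection(pem));
    }
}
```

A positive result only says that the stream is worth parsing further; which media type applies still depends on what the structure contains.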

 


was (Author: roberto.benedetti):
Updated references are:
 * [RFC-5652, Cryptographic Message Syntax 
(CMS)|https://tools.ietf.org/html/rfc5652]
 * [RFC-5751, Secure/Multipurpose Internet Mail Extensions (S/MIME) Version 3.2 
Message Specification|https://tools.ietf.org/html/rfc5751]
 * [RFC-7468, Textual Encodings of PKIX, PKCS, and CMS 
Structures|https://tools.ietf.org/html/rfc7468]

Tika looks for "pkcs7-signedData" OID at the beginning of the file and, if 
found, returns "application/pkcs7-signature".

There are, however, three media types with that OID at the beginning, namely:
 * "application/pkcs7-signature", extension ".p7s", when the signed content is 
not present (detached signature)
 * "application/pkcs7-mime; smime-type=signed-data", extension ".p7m", when the 
signed content is present
 * "application/pkcs7-mime; smime-type=certs-only", extension ".p7c", when 
there are only certificates and (optionally) CRLs

Extension ".p7b" is registered in Tika with media type 
"application/x-pkcs7-certificates" but I think the content of such files is the 
same as ".p7c" ones.

Extension ".p7m" is also used when the OID at the beginning is 
"pkcs7-envelopedData" and the media type is "application/pkcs7-mime; 
smime-type=enveloped-data".

Extension ".p7z" is used when the OID at the beginning is 
"id-smime-ct-compressedData" and the media type is "application/pkcs7-mime; 
smime-type=compressed-data".

Furthermore, the label in the textual encoding is always PKCS7 (i.e. the file 
begins with "-----BEGIN PKCS7-----").

I can provide examples, built using openssl, but to support those media types 
Tika shall:
 * return parameters in media type when detecting streams
 * return different extensions based on media type parameters
 * further inspect streams when "-----BEGIN PKCS7-----" or pkcs7-signedData are 
found (like it does for XML streams)

 



[jira] [Comment Edited] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES

2019-01-18 Thread Roberto Benedetti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746737#comment-16746737
 ] 

Roberto Benedetti edited comment on TIKA-1997 at 1/18/19 10:46 PM:
---

Updated references are:
 * [RFC-5652, Cryptographic Message Syntax 
(CMS)|https://tools.ietf.org/html/rfc5652]
 * [RFC-5751, Secure/Multipurpose Internet Mail Extensions (S/MIME) Version 3.2 
Message Specification|https://tools.ietf.org/html/rfc5751]
 * [RFC-7468, Textual Encodings of PKIX, PKCS, and CMS 
Structures|https://tools.ietf.org/html/rfc7468]

Tika looks for "pkcs7-signedData" OID at the beginning of the file and, if 
found, returns "application/pkcs7-signature".

There are, however, three media types with that OID at the beginning, namely:
 * "application/pkcs7-signature", extension ".p7s", when the signed content is 
not present (detached signature)
 * "application/pkcs7-mime; smime-type=signed-data", extension ".p7m", when the 
signed content is present
 * "application/pkcs7-mime; smime-type=certs-only", extension ".p7c", when 
there are only certificates and (optionally) CRLs

Extension ".p7b" is registered in Tika with media type 
"application/x-pkcs7-certificates" but I think the content of such files is the 
same as ".p7c" ones.

Extension ".p7m" is also used when the OID at the beginning is 
"pkcs7-envelopedData" and the media type is "application/pkcs7-mime; 
smime-type=enveloped-data".

Extension ".p7z" is used when the OID at the beginning is 
"id-smime-ct-compressedData" and the media type is "application/pkcs7-mime; 
smime-type=compressed-data".

Furthermore, the label in the textual encoding is always PKCS7 (i.e. the file 
begins with "-----BEGIN PKCS7-----").

I can provide examples, built using openssl, but to support those media types 
Tika shall:
 * return parameters in media type when detecting streams
 * return different extensions based on media type parameters
 * further inspect streams when "-----BEGIN PKCS7-----" or pkcs7-signedData are 
found (like it does for XML streams)

 


was (Author: roberto.benedetti):
Updated references are:
 * [RFC-5652, Cryptographic Message Syntax 
(CMS)|https://tools.ietf.org/html/rfc5652]
 * [RFC-5751, Secure/Multipurpose Internet Mail Extensions (S/MIME) Version 3.2 
Message Specification|https://tools.ietf.org/html/rfc5751]
 * [RFC-7468, Textual Encodings of PKIX, PKCS, and CMS 
Structures|https://tools.ietf.org/html/rfc7468]

Tika looks for "pkcs7-signedData" OID at the beginning of the file and, if 
found, returns "application/pkcs7-signature".

There are, however, three media types with that OID at the beginning, namely:
 * "application/pkcs7-signature", extension ".p7s", when the signed content is 
not present (detached signature)
 * "application/pkcs7-mime; smime-type=signed-data", extension ".p7m", when the 
signed content is present
 * "application/pkcs7-mime; smime-type=certs-only", extension ".p7c" (".p7b" 
not mentioned but can be found too), when there are only certificates and 
(optionally) CRLs

Extension ".p7b" is registered in Tika with media type 
"application/x-pkcs7-certificates" but I think the content of such files is the 
same as ".p7c" ones.

Extension ".p7m" is also used when the OID at the beginning is 
"pkcs7-envelopedData" and the media type is "application/pkcs7-mime; 
smime-type=enveloped-data".

Extension ".p7z" is used when the OID at the beginning is 
"id-smime-ct-compressedData" and the media type is "application/pkcs7-mime; 
smime-type=compressed-data".

Furthermore, the label in the textual encoding is always PKCS7 (i.e. the file 
begins with "-----BEGIN PKCS7-----").

I can provide examples, built using openssl, but to support those media types 
Tika shall:
 * return parameters in media type when detecting streams
 * return different extensions based on media type parameters
 * further inspect streams when "-----BEGIN PKCS7-----" or pkcs7-signedData are 
found (like it does for XML streams)

 



[jira] [Comment Edited] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES

2019-01-18 Thread Roberto Benedetti (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16746737#comment-16746737
 ] 

Roberto Benedetti edited comment on TIKA-1997 at 1/18/19 10:45 PM:
---

Updated references are:
 * [RFC-5652, Cryptographic Message Syntax 
(CMS)|https://tools.ietf.org/html/rfc5652]
 * [RFC-5751, Secure/Multipurpose Internet Mail Extensions (S/MIME) Version 3.2 
Message Specification|https://tools.ietf.org/html/rfc5751]
 * [RFC-7468, Textual Encodings of PKIX, PKCS, and CMS 
Structures|https://tools.ietf.org/html/rfc7468]

Tika looks for "pkcs7-signedData" OID at the beginning of the file and, if 
found, returns "application/pkcs7-signature".

There are, however, three media types with that OID at the beginning, namely:
 * "application/pkcs7-signature", extension ".p7s", when the signed content is 
not present (detached signature)
 * "application/pkcs7-mime; smime-type=signed-data", extension ".p7m", when the 
signed content is present
 * "application/pkcs7-mime; smime-type=certs-only", extension ".p7c" (".p7b" 
not mentioned but can be found too), when there are only certificates and 
(optionally) CRLs

Extension ".p7b" is registered in Tika with media type 
"application/x-pkcs7-certificates" but I think the content of such files is the 
same as ".p7c" ones.

Extension ".p7m" is also used when the OID at the beginning is 
"pkcs7-envelopedData" and the media type is "application/pkcs7-mime; 
smime-type=enveloped-data".

Extension ".p7z" is used when the OID at the beginning is 
"id-smime-ct-compressedData" and the media type is "application/pkcs7-mime; 
smime-type=compressed-data".

Furthermore, the label in the textual encoding is always PKCS7 (i.e. the file 
begins with "-----BEGIN PKCS7-----").

I can provide examples, built using openssl, but to support those media types 
Tika shall:
 * return parameters in media type when detecting streams
 * return different extensions based on media type parameters
 * further inspect streams when "-----BEGIN PKCS7-----" or pkcs7-signedData are 
found (like it does for XML streams)

 


was (Author: roberto.benedetti):
Updated references are:
 * [RFC-5652, Cryptographic Message Syntax 
(CMS)|https://tools.ietf.org/html/rfc5652]
 * [RFC-5751, Secure/Multipurpose Internet Mail Extensions (S/MIME) Version 3.2 
Message Specification|https://tools.ietf.org/html/rfc5751]
 * [RFC-7468, Textual Encodings of PKIX, PKCS, and CMS 
Structures|https://tools.ietf.org/html/rfc7468]

Tika looks for "pkcs7-signedData" OID at the beginning of the file and, if 
found, returns "application/pkcs7-signature".

There are, however, three media types with that OID at the beginning, namely:
 * "application/pkcs7-signature", extension ".p7s", when the signed content is 
not present (detached signature)
 * "application/pkcs7-mime; smime-type=signed-data", extension ".p7m", when the 
signed content is present
 * "application/pkcs7-mime; smime-type=certs-only", extension ".p7c" (".p7b" 
not mentioned but can be found too), when there are only certificates and 
(optionally) CRLs

Extension ".p7m" is also used when the OID at the beginning is 
"pkcs7-envelopedData" and the media type is "application/pkcs7-mime; 
smime-type=enveloped-data".

Extension ".p7z" is used when the OID at the beginning is 
"id-smime-ct-compressedData" and the media type is "application/pkcs7-mime; 
smime-type=compressed-data".

Furthermore, the label in the textual encoding is always PKCS7 (i.e. the file 
begins with "-----BEGIN PKCS7-----").

I can provide examples, built using openssl, but to support those media types 
Tika shall:
 * return parameters in media type when detecting streams
 * return different extensions based on media type parameters
 * further inspect streams when "-----BEGIN PKCS7-----" or pkcs7-signedData are 
found (like it does for XML streams)

 



[jira] [Comment Edited] (TIKA-1997) Problem in Tika().detect for xml file signed in CADES

2017-06-08 Thread Alessandro Scaldaferro (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1997?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16042465#comment-16042465
 ] 

Alessandro Scaldaferro edited comment on TIKA-1997 at 6/8/17 11:10 AM:
---

In https://www.ietf.org/rfc/rfc2633.txt I've found the following information:

   MIME Type                                File Extension
   Application/pkcs7-mime (signedData,      .p7m
   envelopedData)

   Application/pkcs7-mime (degenerate       .p7c
   signedData "certs-only" message)

   Application/pkcs7-signature              .p7s

It looks like .p7m files are signed files (the original file plus the signature 
data), and .p7s files are "signature files" containing only the signature data 
but not the original file.

Hence, for "signedData" files with the .p7m extension the correct mimetype 
seems to be "application/pkcs7-mime".
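
To make the table concrete, here is a minimal Java sketch of the extension lookup it implies. The class and method names are hypothetical (this is not a Tika API), and the "application/octet-stream" fallback is an assumption; only the three extension rows come from RFC 2633.

```java
import java.util.Locale;

/** Hypothetical helper (not part of Tika): expected media type for an S/MIME file extension. */
final class SmimeExtensions {
    static String expectedMediaType(String fileName) {
        String name = fileName.toLowerCase(Locale.ROOT);
        // .p7m (signedData, envelopedData) and .p7c (certs-only) share one media type.
        if (name.endsWith(".p7m") || name.endsWith(".p7c")) {
            return "application/pkcs7-mime";
        }
        // .p7s carries only the detached signature.
        if (name.endsWith(".p7s")) {
            return "application/pkcs7-signature";
        }
        return "application/octet-stream"; // assumed generic fallback for anything else
    }

    public static void main(String[] args) {
        System.out.println(expectedMediaType("test.xml.p7m"));
    }
}
```

For the attached test.xml.p7m this lookup gives "application/pkcs7-mime", which is the type the reporter expected.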


was (Author: ascaldaf):
In https://www.ietf.org/rfc/rfc2633.txt I've found the following information:

   MIME Type                                File Extension
   Application/pkcs7-mime (signedData,      .p7m
   envelopedData)

   Application/pkcs7-mime (degenerate       .p7c
   signedData "certs-only" message)

   Application/pkcs7-signature              .p7s

It looks like .p7m files are signed files (the original file plus the signature 
data), and .p7s files are "signature files" containing only the signature data 
but not the original file.





--
This message was sent by Atlassian JIRA
(v6.3.15#6346)