[jira] [Commented] (TIKA-1379) error in Tika().detect for xml files with xades signature
[ https://issues.apache.org/jira/browse/TIKA-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14965174#comment-14965174 ]

Alessandro De Angelis commented on TIKA-1379:
---------------------------------------------

...

> error in Tika().detect for xml files with xades signature
> ---------------------------------------------------------
>
>                 Key: TIKA-1379
>                 URL: https://issues.apache.org/jira/browse/TIKA-1379
>             Project: Tika
>          Issue Type: Bug
>          Components: detector
>    Affects Versions: 1.4
>            Reporter: Alessandro De Angelis
>              Labels: new-parser
>             Fix For: 1.12
>
>
> we tried to get the mime type of an xml file with xades signature embedded.
> the result is "text/html" and not the expected "text/xml" or
> "application/xml".
> here is an example of the xml file:
> {code}
>
> 00094853 0003 2
> 2013-09-23
> 2013-09-23
> D69017
> FILOSOFIA DELLA SCIENZA
> D69
> TEATRO E ARTI VISIVE
>
> 1233456
> PAOLINO
> PAPERINO
> 23.0
> 23
>
> 2012
> 6.0
>
> 9
> جامعة البندقية - TEST
> Verbale_3
> QUI QUO QUA
> D69017
> FILOSOFIA DELLA SCIENZA
> D69
> TEATRO E ARTI VISIVE
> QUI QUO QUA
> 26-09-2013 09:55:53 CEST(+0200)
>
> 3
> 11.09.03
>
> http://www.w3.org/2000/09/xmldsig#"
> Id="sig08744308748201048377">
>
> Algorithm="http://www.w3.org/2006/12/xml-c14n11">
> Algorithm="http://www.w3.org/2001/04/xmldsig-more#rsa-sha256">
>
> http://www.w3.org/2002/06/xmldsig-filter2">
> xmlns:dsig-xpath="http://www.w3.org/2002/06/xmldsig-filter2"
> Filter="subtract">/descendant::ds:Signature
>
> http://www.w3.org/TR/1999/REC-xslt-19991116">
> http://www.kion.it/webesse3/multilingua"
> xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
> exclude-result-prefixes="kion" version="1.0">
>
>select="/VERBALI/VERBALE">
>select="/VERBALI/VERBALE/SOSTITUZIONE_DOCUMENTO">
>select="/VERBALI/VERBALE/RAGGRUPPAMENTO">
>select="/VERBALI/VERBALE/COMMISSIONE">
>
>http-equiv="Content-Type">
>
>test="$sostituzione_root">
> Dichiarazione
> conformità Verbale Esame
>
> Verbalizzazione
> esame
>
>td {font-family: Arial; font-size:10pt;}
>div {font-family: Arial; font-size:10pt;}
>pre {font-family: Arial; font-size:10pt;}
>
>test="$sostituzione_root">
>colspan="2"> select="$verbale_root/ATENEO_DES">
>colspan="2">DICHIARAZIONE DI
> CONFORMITÀ
>colspan="2">Il sottoscritto select="$verbale_root/TITOLARE_PROCEDIMENTO">, docente di
>
>test="$sostituzione_root/MOTIVAZIONE">
>
> PREMESSO CHE
>
> select="$sostituzione_root/MOTIVAZIONE">
>
> DICHIARA
>
[jira] [Commented] (TIKA-1379) error in Tika().detect for xml files with xades signature
[ https://issues.apache.org/jira/browse/TIKA-1379?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14372133#comment-14372133 ]

Tyler Palsulich commented on TIKA-1379:
---------------------------------------

The file is still detected as text/html. Should we update the magic to detect it as xml?

> error in Tika().detect for xml files with xades signature
> ---------------------------------------------------------
>
>                 Key: TIKA-1379
>                 URL: https://issues.apache.org/jira/browse/TIKA-1379
>             Project: Tika
>          Issue Type: Bug
>          Components: detector
>    Affects Versions: 1.4
>            Reporter: Alessandro De Angelis
>              Labels: new-parser
>             Fix For: 1.8
>
>
> we tried to get the mime type of an xml file with xades signature embedded.
> the result is "text/html" and not the expected "text/xml" or
> "application/xml".
> here is an example of the xml file:
> {code}
> <VERBALI ad_cod="D69017" batch_id="0" cds_cod="D69" data_app="2013-09-23">
>   <VERBALE Id="1" tipologia="Verbale esame">
>     <VERB_NUM>00094853 0003 2</VERB_NUM>
>     <DATA_APP>2013-09-23</DATA_APP>
>     <DATA_ESA>2013-09-23</DATA_ESA>
>     <AD_COD>D69017</AD_COD>
>     <AD>FILOSOFIA DELLA SCIENZA</AD>
>     <CDS_COD>D69</CDS_COD>
>     <CDS>TEATRO E ARTI VISIVE</CDS>
>     <TIPO_ESA></TIPO_ESA>
>     <MAT>1233456</MAT>
>     <NOME>PAOLINO</NOME>
>     <COGNOME>PAPERINO</COGNOME>
>     <VOTO>23.0</VOTO>
>     <VOTODECOD>23</VOTODECOD>
>     <CAUSALE></CAUSALE>
>     <TIPO_MODULO></TIPO_MODULO>
>     <IMG_PATH></IMG_PATH>
>     <AA_SES_ID>2012</AA_SES_ID>
>     <AD_CFU>6.0</AD_CFU>
>     <NOTA></NOTA>
>     <ATENEO>9</ATENEO>
>     <ATENEO_DES>جامعة البندقية - TEST</ATENEO_DES>
>     <TIPO_DOCUMENTO>Verbale_3</TIPO_DOCUMENTO>
>     <TITOLARE_PROCEDIMENTO>QUI QUO QUA</TITOLARE_PROCEDIMENTO>
>     <AD_STU_COD>D69017</AD_STU_COD>
>     <AD_STU>FILOSOFIA DELLA SCIENZA</AD_STU>
>     <CDS_STU_COD>D69</CDS_STU_COD>
>     <CDS_STU>TEATRO E ARTI VISIVE</CDS_STU>
>     <DOCENTE>QUI QUO QUA</DOCENTE>
>     <DATA_DOCUMENTO>26-09-2013 09:55:53 CEST(+0200)</DATA_DOCUMENTO>
>     <SOFTWARE_DI_CREAZIONE>
>       <NOME>3</NOME>
>       <VERSIONE>11.09.03</VERSIONE>
>     </SOFTWARE_DI_CREAZIONE>
>   </VERBALE><ds:Signature xmlns:ds="http://www.w3.org/2000/09/xmldsig#" Id="sig08744308748201048377">
>     <ds:SignedInfo>
>       <ds:CanonicalizationMethod Algorithm="http://www.w3.org/2006/12/xml-c14n11"></ds:CanonicalizationMethod>
>       <ds:SignatureMethod Algorithm="http://www.w3.org/2001/04/xmldsig-more#rsa-sha256"></ds:SignatureMethod>
>       <ds:Reference URI="">
>         <ds:Transforms>
>           <ds:Transform Algorithm="http://www.w3.org/2002/06/xmldsig-filter2">
>             <dsig-xpath:XPath xmlns:dsig-xpath="http://www.w3.org/2002/06/xmldsig-filter2" Filter="subtract">/descendant::ds:Signature</dsig-xpath:XPath>
>           </ds:Transform>
>           <ds:Transform Algorithm="http://www.w3.org/TR/1999/REC-xslt-19991116">
>             <xsl:stylesheet xmlns:kion="http://www.kion.it/webesse3/multilingua" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" exclude-result-prefixes="kion" version="1.0">
>               <kion:ml module="FirmaDigitale" target="kion"></kion:ml>
>               <xsl:output method="xml"></xsl:output>
>               <xsl:variable name="mostra_ad_figlie" select="1"></xsl:variable>
>               <xsl:variable name="verbale_root" select="/VERBALI/VERBALE"></xsl:variable>
>               <xsl:variable name="sostituzione_root" select="/VERBALI/VERBALE/SOSTITUZIONE_DOCUMENTO"></xsl:variable>
>               <xsl:variable name="RAGG_ROOT" select="/VERBALI/VERBALE/RAGGRUPPAMENTO"></xsl:variable>
>               <xsl:variable name="COMM_ROOT" select="/VERBALI/VERBALE/COMMISSIONE"></xsl:variable>
>               <xsl:template match="/">
>                 <html>
>                   <head>
>                     <meta content="text/html;charset=UTF-8" http-equiv="Content-Type"></meta>
>                     <xsl:choose>
>                       <xsl:when test="$sostituzione_root">
>                         <title>Dichiarazione conformità Verbale Esame</title>
>                       </xsl:when>
>                       <xsl:otherwise>
>                         <title>Verbalizzazione esame</title>
>                       </xsl:otherwise>
>                     </xsl:choose>
>                     <style type="text/css">
>                       td {font-family: Arial; font-size:10pt;}
>                       div {font-family: Arial; font-size:10pt;}
>                       pre {font-family: Arial; font-size:10pt;}
>                     </style>
>                   </head>
>                   <body>
>                     <table>
>                       <xsl:choose>
>                         <xsl:when test="$sostituzione_root">
>                           <tr><td align="center" colspan="2"><big><strong><xsl:value-of select="$verbale_root/ATENEO_DES"></xsl:value-of></strong></big><br></br></td></tr>