I _think_ it is included. See below for the two options for parsing
testZipEncrypted.zip.
Are you not seeing this behavior? Were you expecting different behavior?
1) RecursiveParserWrapper:
List<Metadata> metadataList = getRecursiveMetadata("testZipEncrypted.zip");
debug(metadataList);
Yields:
0: X-Parsed-By : org.apache.tika.parser.DefaultParser
0: X-Parsed-By : org.apache.tika.parser.pkg.PackageParser
0: X-TIKA:EXCEPTION:embedded_stream_exception : org.apache.tika.exception.EncryptedDocumentException: stream (encrypted.txt) is encrypted
at org.apache.tika.parser.pkg.PackageParser.parseEntry(PackageParser.java:306)
at org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:230)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
at org.apache.tika.parser.RecursiveParserWrapper.parse(RecursiveParserWrapper.java:158)
at org.apache.tika.TikaTest.getRecursiveMetadata(TikaTest.java:221)
at org.apache.tika.TikaTest.getRecursiveMetadata(TikaTest.java:213)
at org.apache.tika.parser.pkg.ZipParserTest.testZipEncrypted(ZipParserTest.java:213)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
0: X-TIKA:parse_time_millis : 34
0: X-TIKA:content : <html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser" />
<meta name="X-Parsed-By" content="org.apache.tika.parser.pkg.PackageParser" />
<meta name="Content-Type" content="application/zip" />
<title></title>
</head>
<body><div class="embedded" id="unencrypted.txt" />
<div class="package-entry"><h1>unencrypted.txt</h1>
</div>
<p>encrypted.txt</p>
</body></html>
0: Content-Type : application/zip
1: date : 2017-03-21T13:07:48Z
1: X-Parsed-By : org.apache.tika.parser.DefaultParser
1: X-Parsed-By : org.apache.tika.parser.txt.TXTParser
1: resourceName : unencrypted.txt
1: dcterms:modified : 2017-03-21T13:07:48Z
1: Last-Modified : 2017-03-21T13:07:48Z
1: Last-Save-Date : 2017-03-21T13:07:48Z
1: embeddedRelationshipId : unencrypted.txt
1: meta:save-date : 2017-03-21T13:07:48Z
1: Content-Encoding : windows-1252
1: X-TIKA:parse_time_millis : 3
1: modified : 2017-03-21T13:07:48Z
1: X-TIKA:content : <html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="date" content="2017-03-21T13:07:48Z" />
<meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser" />
<meta name="X-Parsed-By" content="org.apache.tika.parser.txt.TXTParser" />
<meta name="resourceName" content="unencrypted.txt" />
<meta name="dcterms:modified" content="2017-03-21T13:07:48Z" />
<meta name="Last-Modified" content="2017-03-21T13:07:48Z" />
<meta name="Last-Save-Date" content="2017-03-21T13:07:48Z" />
<meta name="embeddedRelationshipId" content="unencrypted.txt" />
<meta name="meta:save-date" content="2017-03-21T13:07:48Z" />
<meta name="Content-Encoding" content="windows-1252" />
<meta name="modified" content="2017-03-21T13:07:48Z" />
<meta name="Content-Length" content="13" />
<meta name="X-TIKA:embedded_resource_path" content="/unencrypted.txt" />
<meta name="Content-Type" content="text/plain; charset=windows-1252" />
<title></title>
</head>
<body><p>hello world
</p>
</body></html>
1: Content-Length : 13
1: X-TIKA:embedded_resource_path : /unencrypted.txt
1: Content-Type : text/plain; charset=windows-1252
2) Classic XML:
XMLResult r = getXML("testZipEncrypted.zip");
for (String n : r.metadata.names()) {
    for (String v : r.metadata.getValues(n)) {
        System.out.println("meta: " + n + " : " + v);
    }
}
System.out.println(r.xml);
Yields:
meta: X-Parsed-By : org.apache.tika.parser.DefaultParser
meta: X-Parsed-By : org.apache.tika.parser.pkg.PackageParser
meta: X-TIKA:EXCEPTION:embedded_stream_exception : org.apache.tika.exception.EncryptedDocumentException: stream (encrypted.txt) is encrypted
at org.apache.tika.parser.pkg.PackageParser.parseEntry(PackageParser.java:306)
at org.apache.tika.parser.pkg.PackageParser.parse(PackageParser.java:230)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:280)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:135)
at org.apache.tika.TikaTest.getXML(TikaTest.java:205)
at org.apache.tika.TikaTest.getXML(TikaTest.java:191)
at org.apache.tika.parser.pkg.ZipParserTest.testZipEncrypted(ZipParserTest.java:206)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
at org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
at org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
at org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
at org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
at org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
at org.junit.runner.JUnitCore.run(JUnitCore.java:137)
at com.intellij.junit4.JUnit4IdeaTestRunner.startRunnerWithArgs(JUnit4IdeaTestRunner.java:68)
at com.intellij.rt.execution.junit.IdeaTestRunner$Repeater.startRunnerWithArgs(IdeaTestRunner.java:51)
at com.intellij.rt.execution.junit.JUnitStarter.prepareStreamsAndStart(JUnitStarter.java:242)
at com.intellij.rt.execution.junit.JUnitStarter.main(JUnitStarter.java:70)
meta: Content-Type : application/zip
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="X-Parsed-By" content="org.apache.tika.parser.DefaultParser" />
<meta name="X-Parsed-By" content="org.apache.tika.parser.pkg.PackageParser" />
<meta name="Content-Type" content="application/zip" />
<title></title>
</head>
<body><div class="embedded" id="unencrypted.txt" />
<div class="package-entry"><h1>unencrypted.txt</h1>
<p>hello world
</p>
</div>
<p>encrypted.txt</p>
</body></html>
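If it helps to check for this programmatically rather than by eye, here is a minimal sketch of scanning the per-document metadata for the embedded-stream exception key that appears in both outputs above. It assumes the metadata has already been copied into plain Java maps (one map per document, container first); the class name EmbeddedExceptionScan and the method summarize are made up for illustration and are not part of Tika. With real Metadata objects you would read the values via names()/getValues() as in option 2.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper, not a Tika class: metadata is modelled as plain maps.
final class EmbeddedExceptionScan {

    // Key copied verbatim from the output in this mail.
    static final String EMBEDDED_EXCEPTION_KEY =
            "X-TIKA:EXCEPTION:embedded_stream_exception";

    // Returns one "index: first line of stack trace" entry for every
    // exception value stored under the key, in document order.
    static List<String> summarize(List<Map<String, List<String>>> metadataList) {
        List<String> hits = new ArrayList<>();
        for (int i = 0; i < metadataList.size(); i++) {
            List<String> traces = metadataList.get(i).get(EMBEDDED_EXCEPTION_KEY);
            if (traces == null) {
                continue; // this document recorded no embedded-stream exception
            }
            for (String trace : traces) {
                // Keep only the exception class and message, drop the frames.
                String firstLine = trace.split("\\R", 2)[0].trim();
                hits.add(i + ": " + firstLine);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        // Sample data shaped like the testZipEncrypted.zip output above.
        Map<String, List<String>> container = new HashMap<>();
        container.put(EMBEDDED_EXCEPTION_KEY, Collections.singletonList(
                "org.apache.tika.exception.EncryptedDocumentException: "
                        + "stream (encrypted.txt) is encrypted\n"
                        + "\tat org.apache.tika.parser.pkg.PackageParser"
                        + ".parseEntry(PackageParser.java:306)"));
        List<Map<String, List<String>>> docs = new ArrayList<>();
        docs.add(container);
        for (String hit : summarize(docs)) {
            System.out.println(hit);
        }
    }
}
```

A non-empty result for index 0 would mean the container reported at least one entry it could not read, even though the unencrypted entries were still parsed.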
-----Original Message-----
From: Aeham Abushwashi [mailto:[email protected]]
Sent: Tuesday, May 23, 2017 3:47 AM
To: [email protected]; Tim Allison <[email protected]>
Cc: [email protected]
Subject: Re: [VOTE] Release Apache Tika 1.15 Candidate #1
Thanks Tim and apologies if this isn't the right thread to ask this question...
any reason TIKA-2300 is not included despite FixVersions=1.15 on the ticket?
On 22 May 2017 at 20:25, Tim Allison <[email protected]> wrote:
> A candidate for the Tika 1.15 release is available at:
> https://dist.apache.org/repos/dist/dev/tika/
>
> The release candidate is a zip archive of the sources in:
> https://github.com/apache/tika/tree/1.15-rc1
>
> The SHA1 checksum of the archive is
> e82697a6804373367fbba98d47426ab74e036eb1.
>
> In addition, a staged maven repository is available here:
> https://repository.apache.org/content/repositories/orgapachetika-1022
>
> Please vote on releasing this package as Apache Tika 1.15.
> The vote is open for the next 72 hours and passes if a majority of at
> least three +1 Tika PMC votes are cast.
>
> [ ] +1 Release this package as Apache Tika 1.15 [ ] -1 Do not release
> this package because...
>
> ***This is my first time as release manager. Please kick the tires
> thoroughly.***
>
> This is my +1.
>
> Cheers,
>
> Tim
>
--
Aeham Abushwashi
Head of Engineering
Exonar
v: video.exonar.com | w: exonar.com <http://www.exonar.com/> | twitter:
@exonar <https://twitter.com/exonar>