[ANNOUNCE] Apache Tika 3.0.0-BETA2 released

2024-07-17 Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 3.0.0-BETA2. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 3.0.0-BETA2 includes numerous bug fixes and dependency upgrades.
The biggest change in the 3.x branch is that it requires Java 11 or later.
Details can be found in the changes file:
https://www.apache.org/dist/tika/3.0.0-BETA2/CHANGES-3.0.0-BETA2.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika will be available shortly in binary form or for use with
Maven 2 from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

When downloading, please remember to verify the downloads using
the signing keys found at: https://www.apache.org/dist/tika/KEYS
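The KEYS file holds the release managers' public PGP keys; the signature check itself is normally done with a tool such as gpg. As a complementary integrity check, the sketch below compares a download with a published SHA-512 checksum using only the JDK. It assumes a `.sha512` file is published next to the artifact and that its first whitespace-separated token is the hex digest (checksum file formats vary between projects); the class and method names are hypothetical. A matching checksum detects a corrupted download but, unlike a valid signature, does not prove who published the file.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Locale;

// Hypothetical helper: compares a downloaded file with a published SHA-512 digest.
// Integrity check only; it does not replace verifying the PGP signature against KEYS.
final class Sha512Check {

    // Streams the file through SHA-512 and returns the lowercase hex digest.
    static String sha512Hex(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md = MessageDigest.getInstance("SHA-512");
        try (InputStream in = Files.newInputStream(file)) {
            byte[] buf = new byte[8192];
            int n;
            while ((n = in.read(buf)) != -1) {
                md.update(buf, 0, n);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md.digest()) {
            // Character.forDigit yields lowercase a-f for radix 16.
            hex.append(Character.forDigit((b >> 4) & 0xF, 16))
               .append(Character.forDigit(b & 0xF, 16));
        }
        return hex.toString();
    }

    // Assumption: the checksum file starts with the hex digest, optionally followed by a file name.
    static boolean matches(Path artifact, Path sha512File) throws IOException, NoSuchAlgorithmException {
        String expected = Files.readString(sha512File).trim().split("\\s+")[0].toLowerCase(Locale.ROOT);
        return expected.equals(sha512Hex(artifact));
    }
}
```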

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

NOTE: This release requires Java 11 or later. We plan to support the
2.x branch (which requires Java 8) for six months after the
release of 3.0.0.
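Before upgrading, it can help to confirm which Java version a deployment actually runs on. The sketch below assumes only the major version matters: it reads the standard `java.specification.version` system property, which is "1.8" on Java 8 and "11", "17", and so on from Java 9 onward. The class name and the printed messages are hypothetical.

```java
// Hypothetical startup check: can this JVM run Tika 3.x (Java 11 or later)?
final class JavaVersionCheck {
    static final int MIN_JAVA_FOR_TIKA_3 = 11;

    // Maps "1.8" (Java 8 and earlier) to 8, and "11", "17", "21" to themselves.
    static int majorVersion(String specVersion) {
        String s = specVersion.startsWith("1.") ? specVersion.substring(2) : specVersion;
        int dot = s.indexOf('.');
        return Integer.parseInt(dot >= 0 ? s.substring(0, dot) : s);
    }

    static boolean canRunTika3(String specVersion) {
        return majorVersion(specVersion) >= MIN_JAVA_FOR_TIKA_3;
    }

    public static void main(String[] args) {
        String spec = System.getProperty("java.specification.version");
        if (canRunTika3(spec)) {
            System.out.println("Java " + spec + ": meets the Tika 3.x requirement");
        } else {
            System.out.println("Java " + spec + ": Tika 3.x needs Java 11 or later; the 2.x branch runs on Java 8");
        }
    }
}
```

Parsing the property string rather than calling `Runtime.version()` keeps the check compilable and runnable on Java 8, which is exactly the runtime it needs to detect.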

-- Tim Allison, on behalf of the Apache Tika community


[ANNOUNCE] Apache Tika 2.9.2 released

2024-04-02 Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 2.9.2. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 2.9.2 includes numerous bug fixes and dependency upgrades.
Details can be found in the changes file:
https://www.apache.org/dist/tika/2.9.2/CHANGES-2.9.2.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

When downloading, please remember to verify the downloads using
the signing keys found at: https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[ANNOUNCE] Apache Tika 3.0.0-BETA released

2023-12-13 Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 3.0.0-BETA. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 3.0.0-BETA includes numerous bug fixes and dependency upgrades.
The biggest change in the 3.x branch is that it requires Java 11 or later.
Details can be found in the changes file:
https://www.apache.org/dist/tika/3.0.0-BETA/CHANGES-3.0.0-BETA.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika will be available shortly in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

When downloading, please remember to verify the downloads using
the signing keys found at: https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

NOTE: Users of the tika-pipes Solr jars (tika-emitter-solr and
tika-pipes-iterator-solr) should take steps to mitigate
the risks of the logback-related CVEs CVE-2023-6481 and CVE-2023-6378.

NOTE: This release requires Java 11 or later. We plan to support the
2.x branch (which requires Java 8) for six months after the
release of 3.0.0.

-- Tim Allison, on behalf of the Apache Tika community


[ANNOUNCE] Apache Tika 2.9.1 released

2023-10-22 Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 2.9.1. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 2.9.1 includes numerous bug fixes and dependency upgrades.
Details can be found in the changes file:
https://www.apache.org/dist/tika/2.9.1/CHANGES-2.9.1.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

When downloading, please remember to verify the downloads using
the signing keys found at: https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[ANNOUNCE] Apache Tika 2.8.0 released

2023-05-15 Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 2.8.0. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 2.8.0 includes optional handling of incremental updates in PDFs,
a fix for a bug that had prevented exiftool and ffmpeg from running by
default, and the move of the GeoTopic parser back to its 1.x namespace,
org.apache.tika.parser.geo.topic. There are several other improvements,
bug fixes and dependency upgrades.
Details can be found in the changes file:
https://www.apache.org/dist/tika/2.8.0/CHANGES-2.8.0.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

When downloading, please remember to verify the downloads using
the signing keys found at: https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[ANNOUNCE] Apache Tika 2.7.0 released

2023-02-06 Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 2.7.0. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 2.7.0 includes a new optional integration with Siegfried,
a bug fix for the OpenSearch emitter and several dependency upgrades.
Details can be found in the changes file:
https://www.apache.org/dist/tika/2.7.0/CHANGES-2.7.0.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

When downloading, please remember to verify the downloads using
the signing keys found at: https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[ANNOUNCE] Apache Tika 2.6.0 released

2022-11-07 Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 2.6.0. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 2.6.0 includes a new optional integration with Siegfried,
a bug fix for the OpenSearch emitter and several dependency upgrades.
Details can be found in the changes file:
https://www.apache.org/dist/tika/2.6.0/CHANGES-2.6.0.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

When downloading, please remember to verify the downloads using
the signing keys found at: https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[ANNOUNCE] Apache Tika 2.5.0 released

2022-10-03 Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 2.5.0. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 2.5.0 includes several security upgrades in dependencies.
Details can be found in the changes file:
https://www.apache.org/dist/tika/2.5.0/CHANGES-2.5.0.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

When downloading, please remember to verify the downloads using
the signing keys found at: https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[ANNOUNCE] Apache Tika 1.28.5 released

2022-09-14 Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 1.28.5. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 1.28.5 contains a security-related fix and
dependency upgrades. Details can be found in the changes file:
https://www.apache.org/dist/tika/1.28.5/CHANGES-1.28.5.txt

NOTE: The 1.x branch is now in security-fixes-only mode. The formal
EoL for the 1.x branch is 30 September 2022:
https://lists.apache.org/thread/yq6n7o01kw544dvj1jsoqk29g6yqjkp3

If there are no security issues identified by 30 September 2022,
this will be the last 1.x version released.

Please upgrade to 2.4.x at your earliest convenience. For guidance on this
upgrade:
https://cwiki.apache.org/confluence/display/TIKA/Migrating+to+Tika+2.0.0

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

When downloading, please remember to verify the downloads using
the signing keys found at: https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


CVE-2022-33879: Apache Tika: Incomplete fix and new regex DoS in StandardsExtractingContentHandler

2022-06-28 Tim Allison
Severity: low

Description:

The initial fixes to the regexes in the StandardsExtractingContentHandler
(CVE-2022-30126 and CVE-2022-30973) were insufficient, and we also found a
separate, new regex DoS in a different regex in the same handler. Both
issues are now fixed in 1.28.4 and 2.4.1
(download: https://tika.apache.org/download.html). See
https://tika.apache.org/security.html for a full list of known security
issues.
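For teams that do use the StandardsExtractingContentHandler, one way to triage deployments is to compare each Tika version string with the releases named above: 1.28.4 on the 1.x branch and 2.4.1 on the 2.x branch. The sketch below is a hypothetical helper, not part of Tika. It ignores pre-release suffixes such as -BETA2 and assumes every 3.x release already includes these fixes, since 3.0.0-BETA shipped in December 2023.

```java
import java.util.Arrays;

// Hypothetical triage helper, not part of Tika: does a Tika version include the
// complete StandardsExtractingContentHandler ReDoS fixes (CVE-2022-30126,
// CVE-2022-30973 and CVE-2022-33879)?
final class StandardsHandlerFixCheck {

    // Turns "1.28", "2.4.1" or "3.0.0-BETA2" into {major, minor, patch};
    // missing parts become 0 and pre-release suffixes such as -BETA2 are dropped.
    static int[] parse(String version) {
        String core = version.split("-", 2)[0];
        String[] parts = core.split("\\.");
        int[] out = new int[3];
        for (int i = 0; i < Math.min(3, parts.length); i++) {
            out[i] = Integer.parseInt(parts[i]);
        }
        return out;
    }

    static boolean hasCompleteFix(String version) {
        int[] v = parse(version);
        if (v[0] >= 3) {
            return true; // assumption: all 3.x releases postdate 1.28.4 and 2.4.1
        }
        if (v[0] == 2) {
            return Arrays.compare(v, new int[] {2, 4, 1}) >= 0; // lexicographic, Java 9+
        }
        if (v[0] == 1) {
            return Arrays.compare(v, new int[] {1, 28, 4}) >= 0;
        }
        return false; // older lines: treated as not fixed
    }

    public static void main(String[] args) {
        for (String v : new String[] {"1.28.3", "1.28.4", "2.4.0", "2.4.1", "3.0.0-BETA2"}) {
            System.out.println(v + " -> " + (hasCompleteFix(v) ? "fixed" : "affected if the handler is used"));
        }
    }
}
```

Versions that fail the check should either stop using the handler or upgrade, as the advisories recommend.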

Credit:

This incomplete fix was discovered and reported by the CodeQL team members
[@atorralba (Tony Torralba)](https://github.com/atorralba) and [@jarlob
(Jaroslav Lobačevski)](https://github.com/jarlob) from GitHub Security
Lab. The new ReDoS was discovered by the Apache Tika team.


[ANNOUNCE] Apache Tika 1.28.4 released

2022-06-22 Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 1.28.4. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 1.28.4 contains security-related fixes and
dependency upgrades. Details can be found in the changes file:
https://www.apache.org/dist/tika/1.28.4/CHANGES-1.28.4.txt

NOTE: The 1.x branch is now in security-fixes-only mode. The PMC
has decided the formal EoL for the 1.x branch is 30 September 2022:
https://lists.apache.org/thread/yq6n7o01kw544dvj1jsoqk29g6yqjkp3

Please upgrade to 2.4.1 at your earliest convenience. For guidance on this
upgrade:
https://cwiki.apache.org/confluence/display/TIKA/Migrating+to+Tika+2.0.0

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

When downloading, please remember to verify the downloads using
the signing keys found at: https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[ANNOUNCE] Apache Tika 2.4.1 released

2022-06-22 Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 2.4.1. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 2.4.1 includes improved customization and configuration
and several upgrades in dependencies.
Details can be found in the changes file:
https://www.apache.org/dist/tika/2.4.1/CHANGES-2.4.1.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

When downloading, please remember to verify the downloads using
the signing keys found at: https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


CVE-2022-30973: Apache Tika: Missing fix for CVE-2022-30126 in 1.28.2

2022-06-01 Tim Allison
Description:

We failed to apply the fix for CVE-2022-30126 to the 1.x branch in the 1.28.2
release. In Apache Tika, a regular expression in the StandardsText class, used
by the StandardsExtractingContentHandler, could lead to a denial of service
caused by backtracking on a specially crafted file. This only affects users who
are running the StandardsExtractingContentHandler, which is a non-standard
handler. This is fixed in 1.28.3.

Mitigation:

Avoid using the StandardsExtractingContentHandler, or upgrade to Tika 1.28.3 or
2.4.0.

Credit:

This issue was reported by Cathy Hu, SUSE Software Solutions Germany GmbH.


[ANNOUNCE] Apache Tika 1.28.3 released

2022-05-27 Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 1.28.3. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 1.28.3 contains security-related fixes and
dependency upgrades. Details can be found in the changes file:
https://www.apache.org/dist/tika/1.28.3/CHANGES-1.28.3.txt

NOTE: The 1.x branch is now in security-fixes-only mode. The PMC
has decided the formal EoL for the 1.x branch is 30 September 2022:
https://lists.apache.org/thread/yq6n7o01kw544dvj1jsoqk29g6yqjkp3

Please upgrade to 2.4.0 at your earliest convenience. For guidance on this
upgrade:
https://cwiki.apache.org/confluence/display/TIKA/Migrating+to+Tika+2.0.0

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

When downloading, please remember to verify the downloads using
the signing keys found at: https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


CVE-2022-25169: Apache Tika BPGParser Memory Usage DoS

2022-05-17 Tim Allison
Description:

The BPG parser in Tika versions before 1.28.2 (1.x) and 2.4.0 (2.x) may
allocate an unreasonable amount of memory when parsing carefully crafted files.
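The usual defence against this class of bug is to treat size fields read from an untrusted file as hints and to refuse any allocation above a configured ceiling before the buffer is created. The sketch below illustrates that general pattern only; it is not Tika's actual fix, and the class name, method, and limit parameter are hypothetical.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper illustrating a bounded read of a length-prefixed block.
final class BoundedRead {

    // Reads exactly declaredLength bytes, but only if the declared size is within maxBytes.
    static byte[] readBlock(InputStream in, long declaredLength, int maxBytes) throws IOException {
        if (declaredLength < 0 || declaredLength > maxBytes) {
            // Reject before allocating, so a crafted header cannot force a huge array.
            throw new IOException("declared length " + declaredLength + " exceeds limit " + maxBytes);
        }
        byte[] buf = new byte[(int) declaredLength];
        // readFully throws EOFException if the stream ends before the declared length.
        new DataInputStream(in).readFully(buf);
        return buf;
    }
}
```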


CVE-2022-30126: Apache Tika Regular Expression Denial of Service in Standards Extractor

2022-05-16 Tim Allison
Severity: low

Description:

A regular expression in our StandardsText class, used by the
StandardsExtractingContentHandler, could lead to a denial of service caused by
backtracking on a specially crafted file. This only affects users who are
running the StandardsExtractingContentHandler, which is a non-standard handler.
This is fixed in 1.28.2 and 2.4.0.

Mitigation:

Upgrade to 1.28.2 or 2.4.0.

Credit:

This issue was discovered and reported by the CodeQL team members [@atorralba 
(Tony Torralba)](https://github.com/atorralba) and [@joefarebrother (Joseph 
Farebrother)](https://github.com/joefarebrother).


[ANNOUNCE] Apache Tika 1.28.2 released

2022-05-03 Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 1.28.2. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 1.28.2 contains security-related and general
dependency upgrades. This release also includes a non-trivial
upgrade to Apache POI 5.2.0 (TIKA-3164); users
will observe significantly more logging from the POI parsers.
Details can be found in the changes file:
https://www.apache.org/dist/tika/1.28.2/CHANGES-1.28.2.txt

NOTE: The 1.x branch is now in security-fixes-only mode. The PMC
has decided the formal EoL for the 1.x branch is 30 September 2022:
https://lists.apache.org/thread/yq6n7o01kw544dvj1jsoqk29g6yqjkp3

Please upgrade to 2.4.0 at your earliest convenience. For guidance on this
upgrade:
https://cwiki.apache.org/confluence/display/TIKA/Migrating+to+Tika+2.0.0

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

When downloading, please remember to verify the downloads using
the signing keys found at: https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[ANNOUNCE] Apache Tika 2.4.0 released

2022-05-03 Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 2.4.0. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 2.4.0 includes several security upgrades in dependencies.
Note that we no longer bundle the deeplearning4j dependencies
in our tika-dl jar; users must provide those on their own.
Details can be found in the changes file:
https://www.apache.org/dist/tika/2.4.0/CHANGES-2.4.0.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

When downloading, please remember to verify the downloads using
the signing keys found at: https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[ANNOUNCE] Apache Tika 1.28.1 released

2022-02-11 Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 1.28.1. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 1.28.1 contains security-related and general
dependency upgrades. Details can be found in the changes file:
https://www.apache.org/dist/tika/1.28.1/CHANGES-1.28.1.txt

NOTE: The 1.x branch is now in security-fixes-only mode. The PMC
has decided the formal EoL for the 1.x branch is 30 September 2022:
https://lists.apache.org/thread/yq6n7o01kw544dvj1jsoqk29g6yqjkp3

Please upgrade to 2.3.0 at your earliest convenience. For guidance on this
upgrade:
https://cwiki.apache.org/confluence/display/TIKA/Migrating+to+Tika+2.0.0

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

When downloading, please remember to verify the downloads using
the signing keys found at: https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[ANNOUNCE] Apache Tika 1.x End-Of-Life (EOL) announcement

2022-02-11 Tim Allison
The Apache Tika Project Team would like to inform you that the Apache Tika
1.x branch is now in security-only maintenance until September 30, 2022.
After that date, we will not make updates or releases from our 1.x branch.
We will continue to make security fixes and security-related
dependency upgrades in our 1.x branch as necessary until September 30,
2022.

We initially announced this on our website on December 16, 2021 with
the release of Tika 2.2.0: https://tika.apache.org/

Questions and Answers:

With the announcement of Tika 1.x EoL, what happens to
Tika 1.x resources?

All resources will stay where they are. Users will still
be able to download source code from our branch_1x branch via
GitHub [1], and published artifacts will remain available on
Maven Central and in the Apache archives [2].

[1] https://github.com/apache/tika/tree/branch_1x
[2] https://archive.apache.org/dist/tika/

Is there an immediate need to upgrade to Tika 2.x in my projects?

As of today, there are no known critical vulnerabilities affecting the
soon-to-be-released Tika 1.28.1. However, because there are
several breaking changes in the 2.x branch, we encourage you to
migrate soon to allow time to adjust your client code as
necessary. For up-to-date documentation on migrating to 2.x, see [3].

[3] https://cwiki.apache.org/confluence/display/TIKA/Migrating+to+Tika+2.0.0

My friends / colleagues and I would like to see Tika 1.x being
maintained after September 30, 2022. What can we do?

You may fork the existing source and support it on your own.

Kind regards
-
The Apache Tika Team


[ANNOUNCE] Apache Tika 2.3.0 released

2022-02-07 Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 2.3.0. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 2.3.0 includes several security upgrades in dependencies,
including an upgrade to log4j2 (version 2.17.1).  This release also
includes a non-trivial upgrade to Apache POI 5.2.0 (TIKA-3164); users
will observe significantly more logging from the POI parsers.
Details can be found in the changes file:
https://www.apache.org/dist/tika/2.3.0/CHANGES-2.3.0.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

When downloading, please remember to verify the downloads using
the signing keys found at: https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[ANNOUNCE] Apache Tika 2.2.1 released

2021-12-23 Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 2.2.1. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 2.2.1 contains an upgrade to log4j2 2.17.0, a
critical fix to an OOXML parser regression that was introduced
in 2.2.0, and upgrades to other dependencies.  Details can be found
in the changes file:
https://www.apache.org/dist/tika/2.2.1/CHANGES-2.2.1.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

When downloading, please remember to verify the downloads using
the signing keys found at: https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[ANNOUNCE] Apache Tika 1.28 released

2021-12-23 Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 1.28. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 1.28 contains a migration to log4j2 (2.17.0) from log4j
as well as several other dependency upgrades. Details can be found in
the changes file:
https://www.apache.org/dist/tika/1.28/CHANGES-1.28.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

When downloading, please remember to verify the downloads using
the signing keys found at: https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[ANNOUNCE] Apache Tika 2.2.0 released

2021-12-16 Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 2.2.0. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 2.2.0 mitigates log4j's CVE-2021-44228 (Log4Shell) by upgrading
to log4j 2.15.0, and it includes a number of improvements and bug fixes.
Details can be found in the changes file:
https://www.apache.org/dist/tika/2.2.0/CHANGES-2.2.0.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

When downloading, please remember to verify the downloads using
the signing keys found at: https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[ANNOUNCE] Apache Tika 2.1.0 released

2021-08-30 Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 2.1.0. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync, so the release
should be available as soon as the mirrors have synced.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 2.1.0 contains a number of improvements and bug fixes.
Details can be found in the changes file:
https://www.apache.org/dist/tika/2.1.0/CHANGES-2.1.0.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

In the initial 48 hours, the release may not be available on all mirrors.

When downloading from a mirror site, please remember to verify the
downloads using the signing keys found at:
https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[ANNOUNCE] Apache Tika 2.0.0 released

2021-07-21 Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 2.0.0. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync, so the release
should be available as soon as the mirrors have synced.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 2.0.0 contains a number of improvements and bug fixes.
Details can be found in the changes file:
https://www.apache.org/dist/tika/2.0.0/CHANGES-2.0.0.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

In the initial 48 hours, the release may not be available on all mirrors.

When downloading from a mirror site, please remember to verify the
downloads using the signing keys found at:
https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[ANNOUNCE] Apache Tika 1.27 released

2021-07-07 Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 1.27. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync, so the release
should be available as soon as the mirrors have synced.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 1.27 contains a number of improvements and bug fixes.
Details can be found in the changes file:
https://www.apache.org/dist/tika/1.27/CHANGES-1.27.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

In the initial 48 hours, the release may not be available on all mirrors.

When downloading from a mirror site, please remember to verify the
downloads using the signing keys found at:
https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[ANNOUNCE] Apache Tika 2.0.0-BETA released

2021-06-01 Thread Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 2.0.0-BETA. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync, so the releases
should be available as soon as the mirrors get the syncs.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 2.0.0-BETA contains a number of improvements and bug fixes.
Details can be found in the changes file:
https://www.apache.org/dist/tika/2.0.0-BETA/CHANGES-2.0.0-BETA.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

In the initial 48 hours, the release may not be available on all mirrors.

When downloading from a mirror site, please remember to verify the
downloads' signatures using the keys found at:
https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


CVE-2021-28657: Infinite loop in Apache Tika's MP3 parser

2021-03-30 Thread Tim Allison
Description:

A carefully crafted or corrupt file may trigger an infinite loop in
Tika's MP3Parser up to and including Tika 1.25. Apache Tika users
should upgrade to 1.26 or later.

Mitigation:

Users should upgrade to 1.26 or later.

Credit:

Apache Tika would like to thank Khaled Nassar for reporting this issue.
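
Infinite loops like this one typically occur when a parser derives its next read offset from a length field in the file, so a corrupt length of zero (or less) leaves the offset where it was. The Java sketch below is a minimal illustration, not Tika's MP3Parser code: the FrameScanner class and its frame layout (a 4-byte big-endian size that counts the header itself) are hypothetical. It refuses to continue whenever an iteration would fail to move forward.

```java
import java.nio.ByteBuffer;

/** Hypothetical frame layout: [4-byte big-endian size, counting the header itself][payload]. */
final class FrameScanner {
    private FrameScanner() {}

    /** Counts complete frames, failing fast on a size that would stall or rewind the cursor. */
    static int countFrames(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data); // ByteBuffer is big-endian by default
        int offset = 0;
        int frames = 0;
        while (offset + 4 <= data.length) {
            int size = buf.getInt(offset);    // absolute read at offset
            long next = (long) offset + size; // long arithmetic avoids int overflow
            // Forward-progress guard: a naive "offset += size" loop would spin forever
            // on size == 0 and could move backwards on a negative size.
            if (next <= offset) {
                throw new IllegalStateException("corrupt frame size " + size + " at offset " + offset);
            }
            if (next > data.length) {
                break; // truncated last frame: stop instead of reading past the end
            }
            offset = (int) next;
            frames++;
        }
        return frames;
    }
}
```

A well-formed buffer such as {0,0,0,6, 1,2, 0,0,0,4} yields two frames, while a buffer whose first size field is zero raises an exception instead of looping.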


[ANNOUNCE] Apache Tika 1.26 released

2021-03-29 Thread Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 1.26. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync, so the releases
should be available as soon as the mirrors get the syncs.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 1.26 contains a number of improvements and bug fixes.
Details can be found in the changes file:
https://www.apache.org/dist/tika/CHANGES-1.26.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

In the initial 48 hours, the release may not be available on all mirrors.

When downloading from a mirror site, please remember to verify the
downloads' signatures using the keys found at:
https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[ANNOUNCE] Apache Tika 2.0.0-ALPHA released

2021-01-19 Thread Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 2.0.0-ALPHA. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync, so the releases
should be available as soon as the mirrors get the syncs.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 2.0.0-ALPHA contains a number of improvements and bug
fixes. Details can be found in the changes file:
https://www.apache.org/dist/tika/CHANGES-2.0.0-ALPHA.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

In the initial 48 hours, the release may not be available on all mirrors.

When downloading from a mirror site, please remember to verify the
downloads' signatures using the keys found at:
https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[ANNOUNCE] Apache Tika 1.25 released

2020-12-02 Thread Tim Allison
The Apache Tika project is pleased to announce the release of Apache Tika
1.25. The release contents have been pushed out to the main Apache release
site and to the Maven Central sync, so the releases should be available as
soon as the mirrors get the syncs.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 1.25 contains a number of improvements and bug fixes. Details
can be found in the changes file:
https://www.apache.org/dist/tika/CHANGES-1.25.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2 from
the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

In the initial 48 hours, the release may not be available on all mirrors.

When downloading from a mirror site, please remember to verify the
downloads' signatures using the keys found at:
https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[CVE-2020-9489] Denial of Service (DoS) Vulnerabilities in Some of Apache Tika's Parsers

2020-04-24 Thread Tim Allison
Severity: Medium

Vendor: The Apache Software Foundation

Versions Affected: Apache Tika 1.24

Description:
A carefully crafted or corrupt file may trigger a System.exit in Tika's
OneNote Parser. Crafted or corrupted files can also cause out of memory
errors and/or infinite loops in Tika's ICNSParser, MP3Parser, MP4Parser,
SAS7BDATParser, OneNoteParser and ImageParser.


Mitigation:
Apache Tika users should upgrade to 1.24.1 or later. The vulnerabilities in
the MP4Parser were partially fixed by upgrading the
com.googlecode:isoparser:1.1.22 dependency to
org.tallison:isoparser:1.9.41.2.

For unrelated security reasons, we upgraded org.apache.cxf to 3.3.6 as part
of the 1.24.1 release.

We also upgraded openjson to 1.0.10, org.ow2.asm to 8.0.1, zstd-jni to
1.4.4-9, bouncycastle to 1.65, commons-lang3 to 3.10, lucene to 8.5.0 and
mockito to 3.3.3 as part of the 1.24.1 release.


Credit:
These vulnerabilities were discovered by Tim Allison on the Apache Tika
team.


[ANNOUNCE] Apache Tika 1.24.1 released

2020-04-22 Thread Tim Allison
The Apache Tika project is pleased to announce the release of Apache Tika
1.24.1. The release contents have been pushed out to the main Apache
release site and to the Maven Central sync, so the releases should be
available as soon as the mirrors get the syncs.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 1.24.1 contains a number of improvements and bug fixes.
Details can be found in the changes file:
https://www.apache.org/dist/tika/CHANGES-1.24.1.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

In the initial 48 hours, the release may not be available on all mirrors.
When downloading from a mirror site, please remember to verify the
downloads' signatures using the keys found at:
https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[CVE-2020-1951] Infinite Loop (DoS) vulnerability in Apache Tika's PSDParser

2020-03-18 Thread Tim Allison
Title: [CVE-2020-1951] Infinite Loop (DoS) vulnerability in Apache Tika's
PSDParser

Severity: Medium

Vendor: The Apache Software Foundation

Versions Affected: Apache Tika 1.0 to 1.23

Description:
A carefully crafted or corrupt PSD file can cause an infinite loop in Apache
Tika's PSDParser in versions 1.0-1.23.


Mitigation:
Apache Tika users should upgrade to 1.24 or later.

Credit:
This issue was discovered by Tim Allison on the Apache Tika team.


[CVE-2020-1950] Excessive memory usage (DoS) vulnerability in Apache Tika's PSDParser

2020-03-18 Thread Tim Allison
Title: [CVE-2020-1950] Excessive memory usage (DoS) vulnerability in Apache
Tika's PSDParser

Severity: Medium

Vendor: The Apache Software Foundation

Versions Affected: Apache Tika 1.0 to 1.23

Description:
A carefully crafted or corrupt PSD file can cause excessive memory usage in
Apache Tika's PSDParser in versions 1.0-1.23.


Mitigation:
Apache Tika users should upgrade to 1.24 or later.


Credit:
This issue was discovered by Pierre Ernst at Elastic.
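
Excessive memory usage in binary parsers often comes from allocating a buffer whose size is taken directly from the file. As a minimal sketch, not the actual PSDParser fix, the hypothetical BoundedReader below checks a declared record length against an assumed 1 MiB cap before allocating anything.

```java
import java.io.DataInputStream;
import java.io.IOException;
import java.io.InputStream;

final class BoundedReader {
    /** Hypothetical cap on any single record; a real limit depends on the format. */
    static final int MAX_RECORD_BYTES = 1 << 20; // 1 MiB

    private BoundedReader() {}

    /** Reads a 4-byte big-endian length followed by that many bytes. */
    static byte[] readRecord(InputStream in) throws IOException {
        DataInputStream data = new DataInputStream(in);
        int declared = data.readInt();
        // Validate before allocating: a crafted length such as 0x7FFFFFFF would otherwise
        // request a ~2 GiB array up front, even if the file is only a few bytes long.
        if (declared < 0 || declared > MAX_RECORD_BYTES) {
            throw new IOException("declared record length out of range: " + declared);
        }
        byte[] record = new byte[declared];
        data.readFully(record); // throws EOFException if the stream is shorter than declared
        return record;
    }
}
```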


[ANNOUNCE] Apache Tika 1.24 released

2020-03-18 Thread Tim Allison
The Apache Tika project is pleased to announce the release of Apache Tika
1.24. The release contents have been pushed out to the main Apache
release site and to the Maven Central sync, so the releases should be
available as soon as the mirrors get the syncs.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 1.24 contains a number of improvements and bug fixes.
Details can be found in the changes file:
https://www.apache.org/dist/tika/CHANGES-1.24.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

In the initial 48 hours, the release may not be available on all mirrors.
When downloading from a mirror site, please remember to verify the
downloads' signatures using the keys found at:
https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[ANNOUNCE] Apache Tika 1.23 released

2019-12-06 Thread Tim Allison
The Apache Tika project is pleased to announce the release of Apache Tika
1.23. The release contents have been pushed out to the main Apache
release site and to the Maven Central sync, so the releases should be
available as soon as the mirrors get the syncs.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 1.23 contains a number of improvements and bug fixes.
Details can be found in the changes file:
https://www.apache.org/dist/tika/CHANGES-1.23.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

In the initial 48 hours, the release may not be available on all mirrors.
When downloading from a mirror site, please remember to verify the
downloads' signatures using the keys found at:
https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[CVE-2019-10094] StackOverflow from Crafted Package/Compressed Files in Apache Tika's RecursiveParserWrapper

2019-08-02 Thread Tim Allison
Title: [CVE-2019-10094] StackOverflow from Crafted Package/Compressed
Files in Apache Tika's RecursiveParserWrapper

Severity: Medium

Vendor: The Apache Software Foundation

Versions Affected: Apache Tika 1.7 to 1.21

Description:
A carefully crafted package/compressed file that yields the same file
when unzipped/uncompressed (a quine) causes a StackOverflowError in
Apache Tika's RecursiveParserWrapper in versions 1.7-1.21.


Mitigation:
Apache Tika users should upgrade to 1.22 or later.


Credit:
This issue was discovered by Tim Allison on the Apache Tika team. Many
thanks to Matthew Barber and Erling Ellingson for crafting examples
and contributing these files to Tika's unit tests.
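
A quine defeats any extractor that recurses into every embedded document without a bound, since each level yields the same file again. The sketch below models this with a hypothetical Container type (not a Tika class) and shows how a maximum embedding depth turns unbounded recursion into a bounded walk; Tika's own safeguards may be implemented differently.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for a package/compressed file and the files embedded in it. */
final class Container {
    final String name;
    final List<Container> children = new ArrayList<>();

    Container(String name) {
        this.name = name;
    }
}

final class DepthLimitedWalker {
    private DepthLimitedWalker() {}

    /**
     * Visits containers recursively and returns how many were visited. Without the depth
     * check, a container that contains itself would recurse until StackOverflowError.
     */
    static int walk(Container c, int depth, int maxDepth) {
        if (depth > maxDepth) {
            return 0; // stop descending instead of recursing forever
        }
        int visited = 1;
        for (Container child : c.children) {
            visited += walk(child, depth + 1, maxDepth);
        }
        return visited;
    }
}
```

With a container that lists itself as its only child, walk(quine, 0, 5) visits depths 0 through 5 and returns 6 instead of overflowing the stack.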


[CVE-2019-10093] Denial of Service in Apache Tika's 2003ml and 2006ml Parsers

2019-08-02 Thread Tim Allison
Title: [CVE-2019-10093] Denial of Service in Apache Tika's 2003ml and
2006ml Parsers

Severity: Medium

Vendor: The Apache Software Foundation

Versions Affected: Apache Tika 1.19 to 1.21

Description:
A carefully crafted 2003ml or 2006ml file could consume all available
SAXParsers in the pool and lead to very long hangs.


Mitigation:
Apache Tika users should upgrade to 1.22 or later.


Credit:
This issue was discovered by Tim Allison on the Apache Tika team.
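
A pool of SAXParsers only stays healthy if every borrower returns its parser and waiters cannot block forever. A minimal sketch of both safeguards, using a hypothetical generic BoundedPool rather than Tika's actual pool: borrowing uses BlockingQueue.poll with a timeout, and the item is returned in a finally block.

```java
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

/** Hypothetical fixed-size pool; in the scenario above, T would be a SAXParser. */
final class BoundedPool<T> {
    private final BlockingQueue<T> idle;

    BoundedPool(List<T> items) {
        idle = new ArrayBlockingQueue<>(items.size()); // capacity must be at least 1
        idle.addAll(items);
    }

    /** Borrows an item, applies task to it, and always puts the item back. */
    <R> R withItem(long timeoutMillis, Function<T, R> task) throws InterruptedException {
        // Waiting with a timeout turns an exhausted pool into an error instead of a long hang.
        T item = idle.poll(timeoutMillis, TimeUnit.MILLISECONDS);
        if (item == null) {
            throw new IllegalStateException("no pooled item available after " + timeoutMillis + " ms");
        }
        try {
            return task.apply(item);
        } finally {
            idle.offer(item); // returned even when task throws, so the pool cannot leak items
        }
    }

    int available() {
        return idle.size();
    }
}
```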


[CVE-2019-10088] OOM from a crafted Zip File in Apache Tika's RecursiveParserWrapper

2019-08-02 Thread Tim Allison
Title: [CVE-2019-10088] OOM from a crafted Zip File in Apache Tika's
RecursiveParserWrapper

Severity: Medium

Vendor: The Apache Software Foundation

Versions Affected: Apache Tika 1.7 to 1.21

Description:
A carefully crafted or corrupt zip file can cause an OOM in Apache
Tika's RecursiveParserWrapper in versions 1.7-1.21.


Mitigation:
Apache Tika users should upgrade to 1.22 or later.


Credit:
This issue was discovered by RunningSnail.
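
One common form of this problem is a "zip bomb": a small archive whose entries inflate to an enormous size. The sketch below is an illustration under stated assumptions, not Tika's fix: the hypothetical ZipBudget class streams entries through java.util.zip.ZipInputStream with a fixed-size buffer and aborts once a caller-supplied byte budget is exceeded, rather than trusting the sizes declared in the zip headers.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

final class ZipBudget {
    private ZipBudget() {}

    /**
     * Returns the total number of uncompressed bytes in the archive, or throws once
     * maxTotalBytes is exceeded. Bytes are counted as they are actually inflated.
     */
    static long countUncompressed(InputStream raw, long maxTotalBytes) throws IOException {
        long total = 0;
        byte[] chunk = new byte[8192]; // fixed-size buffer keeps memory use constant
        try (ZipInputStream zip = new ZipInputStream(raw)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                int n;
                while ((n = zip.read(chunk)) != -1) {
                    total += n;
                    if (total > maxTotalBytes) {
                        throw new IOException("uncompressed size exceeds budget in entry " + entry.getName());
                    }
                }
            }
        }
        return total;
    }
}
```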


[ANNOUNCE] Apache Tika 1.22 released

2019-08-02 Thread Tim Allison
The Apache Tika project is pleased to announce the release of Apache Tika
1.22. The release contents have been pushed out to the main Apache
release site and to the Maven Central sync, so the releases should be
available as soon as the mirrors get the syncs.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 1.22 contains a number of improvements and bug fixes.
Details can be found in the changes file:
https://www.apache.org/dist/tika/CHANGES-1.22.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

In the initial 48 hours, the release may not be available on all mirrors.
When downloading from a mirror site, please remember to verify the
downloads' signatures using the keys found at:
https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[ANNOUNCE] Apache Tika 1.21 released

2019-05-20 Thread Tim Allison
The Apache Tika project is pleased to announce the release of Apache Tika
1.21. The release contents have been pushed out to the main Apache
release site and to the Maven Central sync, so the releases should be
available as soon as the mirrors get the syncs.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 1.21 contains a number of improvements and bug fixes.
Details can be found in the changes file:
https://www.apache.org/dist/tika/CHANGES-1.21.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

In the initial 48 hours, the release may not be available on all mirrors.
When downloading from a mirror site, please remember to verify the
downloads' signatures using the keys found at:
https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[CVE-2018-17197] Apache Tika Denial of Service -- Infinite Loop in Tika's SQLite3Parser

2018-12-22 Thread Tim Allison
[CVE-2018-17197] Apache Tika Denial of Service -- Infinite Loop in
Tika's SQLite3Parser

Severity: Medium

Vendor: The Apache Software Foundation

Versions Affected: Apache Tika 1.8 to 1.19.1

Description:
A carefully crafted or corrupt sqlite file can cause an infinite loop
in Apache Tika's SQLite3Parser in versions 1.8-1.19.1 of Apache Tika.


Mitigation:
Apache Tika users should upgrade to 1.20 or later.


Credit:
This issue was discovered by Tim Allison on the Apache Tika Team.


[ANNOUNCE] Apache Tika 1.20 released

2018-12-22 Thread Tim Allison
The Apache Tika project is pleased to announce the release of Apache Tika
1.20. The release contents have been pushed out to the main Apache
release site and to the Maven Central sync, so the releases should be
available as soon as the mirrors get the syncs.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 1.20 contains a number of improvements and bug fixes.
Details can be found in the changes file:
https://www.apache.org/dist/tika/CHANGES-1.20.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

In the initial 48 hours, the release may not be available on all mirrors.
When downloading from a mirror site, please remember to verify the
downloads' signatures using the keys found at:
https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[CVE-2018-11796] Apache Tika Denial of Service via XML Entity Expansion Vulnerability

2018-10-09 Thread Tim Allison
CVE-2018-11796: Apache Tika Denial of Service via XML Entity Expansion
Vulnerability

Severity: Medium

Vendor:
The Apache Software Foundation

Versions Affected:
Apache Tika 0.1 to 1.19

Description:
In Apache Tika 1.19 (CVE-2018-11761), we added an entity expansion
limit for XML parsing.  However, Tika reuses SAXParsers and calls
reset() after each parse, which, for Xerces2 parsers, as per the
documentation, removes the user-specified SecurityManager and
thus removes entity expansion limits after the first parse.
Apache Tika 1.19 is therefore still vulnerable to entity
expansions which can lead to a denial of service attack.

Mitigation:
Apache Tika users should upgrade to 1.19.1 or later.

Credit:
This issue was discovered by Slava Gorelik of CloudAlly.
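
One defensive pattern consistent with this description is to stop depending on parser state surviving reset(): configure limits once on the factory and obtain a fresh parser for every document. The sketch below uses the JDK's built-in JAXP parser rather than Xerces2, and the SafeSax class is hypothetical; it is not Tika's actual fix. Setting XMLConstants.FEATURE_SECURE_PROCESSING asks the JDK parser to enforce its processing limits, including its entity-expansion limit.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.XMLConstants;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class SafeSax {
    private static final SAXParserFactory FACTORY = newFactory();

    private SafeSax() {}

    private static SAXParserFactory newFactory() {
        SAXParserFactory f = SAXParserFactory.newInstance();
        try {
            // Ask the JAXP implementation to enforce its processing limits, including entity expansion.
            f.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        } catch (ParserConfigurationException | SAXException e) {
            throw new IllegalStateException("secure processing not supported", e);
        }
        return f;
    }

    /** Parses xml with a fresh parser, so no limit can have been lost to an earlier reset(). */
    static boolean parses(String xml) {
        try {
            SAXParser parser = FACTORY.newSAXParser();
            parser.parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // includes the parser refusing to exceed its entity-expansion limit
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

A "billion laughs" document (nested entities expanding to roughly 10^9 copies of a string) should make SafeSax.parses return false once the JDK's limit is hit, while ordinary XML still parses.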


[ANNOUNCE] Apache Tika 1.19.1 released

2018-10-09 Thread Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 1.19.1. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync, so the releases
should be available as soon as the mirrors get the syncs.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 1.19.1 contains two critical bug fixes: one to the
MP3Parser and one to the handling of SAX parsing. Details can be found in
the changes file:
https://www.apache.org/dist/tika/CHANGES-1.19.1.txt

Apache Tika is available on the download page:
https://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
https://repo1.maven.org/maven2/org/apache/tika/

In the initial 48 hours, the release may not be available on all mirrors.

When downloading from a mirror site, please remember to verify the
downloads' signatures using the keys found on the Apache site:
https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
https://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[ANNOUNCE] Apache Tika 1.19 released

2018-09-19 Thread Tim Allison
The Apache Tika project is pleased to announce the release of Apache
Tika 1.19. The release contents have been pushed out to the main
Apache release site and to the Maven Central sync, so the releases
should be available as soon as the mirrors get the syncs.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 1.19 contains a number of improvements and bug fixes.
Details can be found in the changes file:
http://www.apache.org/dist/tika/CHANGES-1.19.txt

Apache Tika is available on the download page:
http://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2
from the Central Repository:
http://repo1.maven.org/maven2/org/apache/tika/

In the initial 48 hours, the release may not be available on all mirrors.

When downloading from a mirror site, please remember to verify the
downloads' signatures using the keys found on the Apache site:
https://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
http://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[CVE-2018-8017] Apache Tika Denial of Service Vulnerability -- Potential Infinite Loop in IptcAnpaParser

2018-09-19 Thread Tim Allison
CVE-2018-8017: Apache Tika Denial of Service Vulnerability --
Potential Infinite Loop in IptcAnpaParser

Severity: Medium

Vendor:
The Apache Software Foundation

Versions Affected:
Apache Tika 1.2 to 1.18

Description:
A carefully crafted file can trigger an infinite loop in Apache Tika's
IptcAnpaParser.

Mitigation:
Apache Tika users should upgrade to 1.19 or later.

Credit:
This issue was discovered by Tobias Ospelt of modzero AG.


[CVE-2018-11762] Zip Slip Vulnerability in Apache Tika's tika-app

2018-09-19 Thread Tim Allison
CVE-2018-11762: Zip Slip Vulnerability in Apache Tika's tika-app

Severity: Low

Vendor:
The Apache Software Foundation

Versions Affected:
Apache Tika 0.9 to 1.18

Description:
In a rare edge case where a user does not specify an extract directory on
the commandline (--extract-dir=) and the input file has an embedded file
with an absolute path, such as "C:/evil.bat", tika-app would overwrite
that file.

Mitigation:
Apache Tika users should upgrade to 1.19 or later.

Credit:
This issue was discovered by Tim Allison on the Apache Tika team.
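
The usual defense against this class of bug (often called "Zip Slip") is to resolve each embedded file name against the extract directory, normalize the result, and refuse anything that lands outside it. A minimal sketch with java.nio.file.Path; the SafeExtract class is hypothetical and is not tika-app's code.

```java
import java.nio.file.Path;

final class SafeExtract {
    private SafeExtract() {}

    /**
     * Maps an embedded file name to a target path under extractDir, or throws if the name
     * is absolute or uses ".." to escape the directory.
     */
    static Path target(Path extractDir, String embeddedName) {
        Path base = extractDir.toAbsolutePath().normalize();
        // resolve() returns the argument itself when it is absolute (e.g. "/etc/x" on Unix,
        // "C:/evil.bat" on Windows), and normalize() collapses "..", so one check covers both.
        Path candidate = base.resolve(embeddedName).normalize();
        if (!candidate.startsWith(base)) {
            throw new SecurityException("refusing to write outside " + base + ": " + embeddedName);
        }
        return candidate;
    }
}
```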


[CVE-2018-11761] Apache Tika DoS XML Entity Expansion Vulnerability

2018-09-19 Thread Tim Allison
CVE-2018-11761: Apache Tika Denial of Service via XML Entity Expansion
Vulnerability

Severity: Medium

Vendor:
The Apache Software Foundation

Versions Affected:
Apache Tika 0.1 to 1.18

Description:
Apache Tika's XML parsers were not configured to limit entity expansion.
They were therefore vulnerable to an entity expansion vulnerability which
can lead to a denial of service attack.

Mitigation:
Apache Tika users should upgrade to 1.19 or later.

Credit:
This issue was discovered by Renfei (Brian) Wang of Amazon.


Fwd: [ANNOUNCE] Apache Tika 1.18 released

2018-04-25 Thread Tim Allison
The Apache Tika project is pleased to announce the release of Apache Tika
1.18.

The release contents have been pushed out to the main Apache release site
and to the Maven Central sync, so the releases should be available as soon
as the mirrors get the syncs.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser
libraries.

Apache Tika 1.18 contains a number of improvements as well as security and
bug fixes.

Details can be found in the changes file:

http://www.apache.org/dist/tika/CHANGES-1.18.txt

Apache Tika is available on the download page:
http://tika.apache.org/download.html

Apache Tika is also available in binary form or for use with Maven 2 from
the Central Repository: http://repo1.maven.org/maven2/org/apache/tika/

In the initial 48 hours, the release may not be available on all mirrors.

When downloading from a mirror site, please remember to verify the
downloads' signatures using the keys found on the Apache site:
http://www.apache.org/dist/tika/KEYS

For more information on Apache Tika, visit the project home page:
http://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[CVE-2018-1335] Command Injection Vulnerability in Apache Tika’s tika-server module

2018-04-25 Thread Tim Allison
CVE-2018-1335 – Command Injection Vulnerability in Apache Tika’s tika-server
module

Severity: High

Vendor: The Apache Software Foundation

Versions Affected: <1.18

Description: Before Tika 1.18, clients could send carefully crafted
headers to tika-server that could be used to inject commands into the
command line of the server running tika-server. This vulnerability
only affects those running tika-server on a server that is open to
untrusted clients.

Mitigation: Ensure that untrusted users don't have access to
tika-server and/or upgrade to Apache Tika >=1.18.

Credit: Tim Allison, a member of the Apache Tika team, discovered this.
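
Command injection of this kind happens when request data ends up on a command line. The sketch below shows two standard safeguards under stated assumptions: the header value is taken to select an OCR language, and "ocr-tool" is a hypothetical executable name, neither of which reflects tika-server's actual code or configuration. The value is checked against a strict allowlist pattern, and the arguments are passed to ProcessBuilder as a list, so no shell ever parses them.

```java
import java.util.List;
import java.util.regex.Pattern;

final class OcrCommand {
    // Allowlist: three-letter language codes such as "eng" or "eng+fra"; anything else is rejected.
    private static final Pattern LANGUAGE = Pattern.compile("[a-z]{3}(\\+[a-z]{3})*");

    private OcrCommand() {}

    /** Builds (but does not start) the process; each list element becomes one argv entry. */
    static ProcessBuilder build(String languageHeader, String inputPath) {
        if (languageHeader == null || !LANGUAGE.matcher(languageHeader).matches()) {
            throw new IllegalArgumentException("invalid language: " + languageHeader);
        }
        // No "sh -c" and no string concatenation: even if validation were looser, characters
        // such as ';' or '`' would reach the program as literal text, not shell syntax.
        return new ProcessBuilder(List.of("ocr-tool", "-l", languageHeader, inputPath));
    }
}
```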


[CVE-2018-1339] DoS (Infinite Loop) Vulnerability in Apache Tika’s ChmParser

2018-04-25 Thread Tim Allison
CVE-2018-1339 – DoS (Infinite Loop) Vulnerability in Apache Tika’s ChmParser


Severity: Important


Vendor: The Apache Software Foundation


Versions Affected: <1.18


Description: A carefully crafted (or fuzzed) file can trigger an infinite
loop in Apache Tika's ChmParser.

Mitigation: Turn off the ChmParser or upgrade to Apache Tika >=1.18.


Credit: Tobias Ospelt of modzero AG discovered this issue by fuzzing with
Kelinci (https://github.com/isstac/kelinci).


[CVE-2018-1338] DoS (Infinite Loop) Vulnerability in Apache Tika’s BPGParser

2018-04-25 Thread Tim Allison
CVE-2018-1338 – DoS (Infinite Loop) Vulnerability in Apache Tika’s BPGParser


Severity: Important

Vendor: The Apache Software Foundation

Versions Affected: <1.18

Description: A carefully crafted (or fuzzed) file can trigger an infinite
loop in Apache Tika's BPGParser.

Mitigation: Turn off the BPGParser or upgrade to Apache Tika >=1.18.

Credit: Tobias Ospelt of modzero AG discovered this issue by fuzzing with
Kelinci (https://github.com/isstac/kelinci).


CVE-2017-12626 – Denial of Service Vulnerabilities in Apache POI < 3.17

2018-01-26 Thread Tim Allison
Title: CVE-2017-12626 – Denial of Service Vulnerabilities in Apache POI < 3.17

Severity: Important

Vendor: The Apache Software Foundation

Versions affected: versions prior to version 3.17

Description:
Apache POI versions prior to release 3.17 are vulnerable to Denial of
Service Attacks:
    * Infinite Loops while parsing specially crafted WMF, EMF, MSG and macros
      (POI bugs 61338 [0] and 61294 [1])
    * Out of Memory Exceptions while parsing specially crafted DOC, PPT and XLS
      (POI bugs 52372 [2] and 61295 [3])


Mitigation:  Users with applications which accept content from external or 
untrusted sources are advised to upgrade to Apache POI 3.17 or newer.

-- Tim Allison, on behalf of the Apache POI PMC

[0] https://bz.apache.org/bugzilla/show_bug.cgi?id=61338
[1] https://bz.apache.org/bugzilla/show_bug.cgi?id=61294
[2] https://bz.apache.org/bugzilla/show_bug.cgi?id=52372
[3] https://bz.apache.org/bugzilla/show_bug.cgi?id=61295


[ANNOUNCE] Apache Tika 1.17 released

2017-12-13 Thread Tim Allison
The Apache Tika project is pleased to announce the release of Apache Tika
1.17.

The release contents have been pushed out to the main Apache release site and
to the Maven Central sync, so the releases should be available as
soon as the mirrors get the syncs.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser libraries.

Apache Tika 1.17 contains a number of improvements and bug fixes.

Details can be found in the changes file:
http://www.apache.org/dist/tika/CHANGES-1.17.txt

Apache Tika is available in source form from the following download page:
http://www.apache.org/dyn/closer.cgi/tika/apache-tika-1.17-src.zip

Apache Tika is also available in binary form or for use with Maven 2 from
the Central Repository: http://repo1.maven.org/maven2/org/apache/tika/

In the initial 48 hours, the release may not be available on all mirrors.
When downloading from a mirror site, please remember to verify the downloads'
signatures using the keys found on the Apache site:
https://people.apache.org/keys/group/tika.asc

For more information on Apache Tika, visit the project home page:
http://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[ANNOUNCE] Apache Tika 1.16 released

2017-07-12 Thread Tim Allison
The Apache Tika project is pleased to announce the release of Apache Tika 1.16.

The release contents have been pushed out to the main Apache release site and
to the Maven Central sync, so the releases should be available as
soon as the mirrors get the syncs.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser libraries.

Apache Tika 1.16 contains a number of improvements and bug fixes.

Details can be found in the changes file:
http://www.apache.org/dist/tika/CHANGES-1.16.txt

Apache Tika is available in source form from the following download page:
http://www.apache.org/dyn/closer.cgi/tika/apache-tika-1.16-src.zip

Apache Tika is also available in binary form or for use with Maven 2 from
the Central Repository: http://repo1.maven.org/maven2/org/apache/tika/

In the initial 48 hours, the release may not be available on all mirrors.
When downloading from a mirror site, please remember to verify the downloads'
signatures using the keys found on the Apache site:
https://people.apache.org/keys/group/tika.asc

For more information on Apache Tika, visit the project home page:
http://tika.apache.org/ 

-- Tim Allison, on behalf of the Apache Tika community


[ANNOUNCE] Apache Tika 1.15 released

2017-05-30 Thread Tim Allison
The Apache Tika project is pleased to announce the release of Apache Tika
1.15. The release contents have been pushed out to the main Apache release
site and to the Maven Central sync, so the releases should be available as
soon as the mirrors get the syncs.

Apache Tika is a toolkit for detecting and extracting metadata and
structured text content from various documents using existing parser libraries.

Apache Tika 1.15 contains a number of improvements and bug fixes. Details
can be found in the changes file:
http://www.apache.org/dist/tika/CHANGES-1.15.txt

Apache Tika is available in source form from the following download page:
http://www.apache.org/dyn/closer.cgi/tika/apache-tika-1.15-src.zip

Apache Tika is also available in binary form or for use with Maven 2 from
the Central Repository:
http://repo1.maven.org/maven2/org/apache/tika/

In the initial 48 hours, the release may not be available on all mirrors.
When downloading from a mirror site, please remember to verify the
downloads' signatures using the keys found on the Apache site:
https://people.apache.org/keys/group/tika.asc

For more information on Apache Tika, visit the project home page:
http://tika.apache.org/

-- Tim Allison, on behalf of the Apache Tika community


[CVE-2016-4434] Apache Tika XML External Entity vulnerability

2016-05-26 Thread Tim Allison
CVE-2016-4434: Apache Tika XML External Entity vulnerability

Severity: Important

Vendor: 
The Apache Software Foundation

Versions Affected: 
Apache Tika 0.10 to 1.12

Description: 
Apache Tika parses XML within numerous file formats.  In some instances [1], the
initialization of the XML parser or the choice of handlers did not protect
against XML External Entity (XXE)
vulnerabilities.  According to www.owasp.org [2]: "This attack may lead to the 
disclosure of confidential data, denial of service, server side request 
forgery, port scanning from the perspective of the machine where the parser is 
located, and other system impacts." 


Mitigation: 
Upgrade to Apache Tika 1.13.

Credit: 
This issue was discovered by Arthur Khashaev (https://khashaev.ru), Seulgi Kim, 
Mesut Timur, and Microsoft Vulnerability Research.

[1] Spreadsheets in OOXML files and XMP in PDF and other file formats.
[2] https://www.owasp.org/index.php/XML_External_Entity_(XXE)_Processing
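
For parsers that never need a DTD, a common XXE hardening step is to reject DOCTYPE declarations outright. The minimal sketch below assumes the JDK's built-in SAX parser, which recognizes the Xerces feature URI http://apache.org/xml/features/disallow-doctype-decl (other implementations may not). NoDoctypeSax is a hypothetical class, and this is not necessarily how Tika 1.13 fixed the issue.

```java
import java.io.IOException;
import java.io.StringReader;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

final class NoDoctypeSax {
    private static final String DISALLOW_DOCTYPE = "http://apache.org/xml/features/disallow-doctype-decl";

    private NoDoctypeSax() {}

    /** Returns true if xml parses; any DOCTYPE (and therefore any external entity) makes it fail. */
    static boolean parses(String xml) {
        try {
            SAXParserFactory factory = SAXParserFactory.newInstance();
            factory.setFeature(DISALLOW_DOCTYPE, true);
            factory.newSAXParser().parse(new InputSource(new StringReader(xml)), new DefaultHandler());
            return true;
        } catch (SAXException e) {
            return false; // DOCTYPE rejected (or the document was otherwise malformed)
        } catch (ParserConfigurationException | IOException e) {
            throw new IllegalStateException(e);
        }
    }
}
```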