[tika] branch TIKA-1599 updated (60479ddd4 -> 6ba636b57)

2023-09-22 Thread tallison
This is an automated email from the ASF dual-hosted git repository.

tallison pushed a change to branch TIKA-1599
in repository https://gitbox.apache.org/repos/asf/tika.git


from 60479ddd4 TIKA-1599 -- migrate to jsoup parser -- checkstyle
 add 6ba636b57 TIKA-1599 -- migrate to jsoup parser -- fix bad auto replace all

No new revisions were added by this update.

Summary of changes:
 .../apache/tika/config/TIKA-2273-exclude-encoding-detector-default.xml  | 2 +-
 .../org/apache/tika/config/TIKA-2485-encoding-detector-mark-limits.xml  | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)



[tika] branch TIKA-1599 updated (a47e37ced -> 60479ddd4)

2023-09-22 Thread tallison

tallison pushed a change to branch TIKA-1599
in repository https://gitbox.apache.org/repos/asf/tika.git


from a47e37ced TIKA-1599 -- migrate to jsoup parser -- fix EncodingDetector and fix or disable unit tests
 add 60479ddd4 TIKA-1599 -- migrate to jsoup parser -- checkstyle

No new revisions were added by this update.

Summary of changes:
 .../src/test/java/org/apache/tika/parser/html/HtmlParserTest.java| 1 -
 1 file changed, 1 deletion(-)



[tika] branch TIKA-1599 updated (f05e9b45e -> a47e37ced)

2023-09-22 Thread tallison

tallison pushed a change to branch TIKA-1599
in repository https://gitbox.apache.org/repos/asf/tika.git


from f05e9b45e TIKA-1599 -- migrate to jsoup parser -- mv tagsoup htmlparser to tika-parsrs-extended
 add a47e37ced TIKA-1599 -- migrate to jsoup parser -- fix EncodingDetector and fix or disable unit tests

No new revisions were added by this update.

Summary of changes:
 .../services/org.apache.tika.detect.EncodingDetector|  2 +-
 .../java/org/apache/tika/parser/html/HtmlParserTest.java| 13 +++--
 2 files changed, 8 insertions(+), 7 deletions(-)



[tika] branch TIKA-1599 updated (1d4e6ebb6 -> f05e9b45e)

2023-09-22 Thread tallison

tallison pushed a change to branch TIKA-1599
in repository https://gitbox.apache.org/repos/asf/tika.git


from 1d4e6ebb6 TIKA-1599 -- migrate to jsoup parser -- remove runtime exception
 add f05e9b45e TIKA-1599 -- migrate to jsoup parser -- mv tagsoup htmlparser to tika-parsrs-extended

No new revisions were added by this update.

Summary of changes:
 pom.xml|   2 +
 tika-bom/pom.xml   |  11 ++-
 tika-parent/pom.xml|   5 +
 .../tika-parser-tagsoup-module/pom.xml |  34 +++
 .../tika/parser/html/tagsoup}/DataURIScheme.java   |   2 +-
 .../html/tagsoup}/DataURISchemeParseException.java |   2 +-
 .../parser/html/tagsoup}/DataURISchemeUtil.java|   2 +-
 .../parser/html/tagsoup}/DefaultHtmlMapper.java|   2 +-
 .../parser/html/tagsoup}/HtmlEncodingDetector.java |   2 +-
 .../tika/parser/html/tagsoup}/HtmlHandler.java |   2 +-
 .../tika/parser/html/tagsoup}/HtmlMapper.java  |   2 +-
 .../tika/parser/html/tagsoup}/HtmlParser.java  |   2 +-
 .../parser/html/tagsoup}/IdentityHtmlMapper.java   |   2 +-
 .../html/tagsoup}/XHTMLDowngradeHandler.java   |   2 +-
 .../tagsoup}/charsetdetector/CharsetAliases.java   |   6 +-
 .../charsetdetector/CharsetDetectionResult.java|   2 +-
 .../tagsoup}/charsetdetector/MetaProcessor.java|   6 +-
 .../html/tagsoup}/charsetdetector/PreScanner.java  |   2 +-
 .../StandardHtmlEncodingDetector.java  |   6 +-
 .../charsets/ReplacementCharset.java   |   2 +-
 .../charsets/XUserDefinedCharset.java  |   2 +-
 .../org.apache.tika.detect.EncodingDetector|   2 +-
 .../services/org.apache.tika.parser.Parser |   2 +-
 .../StandardCharsets_unsupported_by_IANA.txt   |   0
 .../html/tagsoup}/DataURISchemeParserTest.java |   3 +-
 .../html/tagsoup}/HtmlEncodingDetectorTest.java|   3 +-
 .../tika/parser/html/tagsoup}/HtmlParserTest.java  |   5 +-
 .../tika/parser/html/tagsoup}/SrcDocTest.java  |   2 +-
 .../tagsoup}/StandardHtmlEncodingDetectorTest.java |   6 +-
 .../org/apache/tika/parser/html/tika-config.xml|   4 +-
 .../resources/test-documents/big-preamble.html |   0
 .../test-documents/boilerplate-whitespace.html |   0
 .../test/resources/test-documents/boilerplate.html |   0
 .../testBoilerplateMissingSpace.html   |   0
 .../test/resources/test-documents/testHTML.html|   0
 .../test-documents/testHTMLBadScript.html  |   0
 .../test-documents/testHTMLGoodScript.html |   0
 .../testHTMLNoisyMetaEncoding_1.html   |   0
 .../testHTMLNoisyMetaEncoding_2.html   |   0
 .../testHTMLNoisyMetaEncoding_3.html   |   0
 .../testHTMLNoisyMetaEncoding_4.html   |   0
 .../test-documents/testHTML_charset_utf16le.html   | Bin
 .../test-documents/testHTML_charset_utf8.html  |   0
 .../testHTML_embedded_data_uri_js.html |   0
 .../test-documents/testHTML_embedded_img.html  |   0
 .../testHTML_embedded_img_in_js.html   |   0
 .../resources/test-documents/testHTML_head.html|   0
 .../test-documents/testHTML_metadata.html  |   0
 .../testHTML_metadata_two_titles.html  |   0
 .../resources/test-documents/testHTML_utf8.html|   0
 .../test/resources/test-documents/testSrcDoc.html  |   0
 .../test-documents/testUserDefinedCharset.mhtml|   0
 .../test/resources/test-documents/testXHTML.html   |   0
 .../src/test/resources/test-documents/tika434.html |   0
 .../pom.xml|  46 ++---
 .../tika-parser-html-module/pom.xml|   6 --
 .../org.apache.tika.detect.EncodingDetector|   2 +-
 .../apache/tika/parser/html/HtmlParserTest.java| 107 +++--
 ...TIKA-2273-exclude-encoding-detector-default.xml |   2 +-
 .../TIKA-2485-encoding-detector-mark-limits.xml|   2 +-
 60 files changed, 138 insertions(+), 152 deletions(-)
 create mode 100644 
tika-parsers/tika-parsers-extended/tika-parser-tagsoup-module/pom.xml
 copy 
tika-parsers/{tika-parsers-standard/tika-parsers-standard-modules/tika-parser-html-module/src/main/java/org/apache/tika/parser/html
 => 
tika-parsers-extended/tika-parser-tagsoup-module/src/main/java/org/apache/tika/parser/html/tagsoup}/DataURIScheme.java
 (98%)
 copy 
tika-parsers/{tika-parsers-standard/tika-parsers-standard-modules/tika-parser-html-module/src/main/java/org/apache/tika/parser/html
 => 
tika-parsers-extended/tika-parser-tagsoup-module/src/main/java/org/apache/tika/parser/html/tagsoup}/DataURISchemeParseException.java
 (95%)
 copy 
tika-parsers/{tika-parsers-standard/tika-parsers-standard-modules/tika-parser-html-module/src/main/java/org/apache/tika/parser/html
 => 

[tika] branch TIKA-1599 updated (d1bc68eb8 -> 1d4e6ebb6)

2023-09-22 Thread tallison

tallison pushed a change to branch TIKA-1599
in repository https://gitbox.apache.org/repos/asf/tika.git


from d1bc68eb8 TIKA-1599 -- migrate to jsoup parser -- checkstyle fix
 add e04c47820 TIKA-4138 -- move BoilerpipeContentHandler (#1355)
 add d1a5fbc32 Merge remote-tracking branch 'origin/main' into TIKA-1599
 add 1d4e6ebb6 TIKA-1599 -- migrate to jsoup parser -- remove runtime exception

No new revisions were added by this update.

Summary of changes:
 CHANGES.txt|  5 ++
 pom.xml|  1 +
 tika-app/pom.xml   |  2 +-
 tika-bom/pom.xml   |  2 +-
 tika-bundles/tika-bundle-standard/pom.xml  |  2 +-
 tika-handlers/README.md|  2 +
 .../tika-emitter-jdbc => tika-handlers}/pom.xml| 24 ---
 .../tika-handler-boilerpipe}/pom.xml   | 21 +++---
 .../sax/boilerpipe/BoilerpipeContentHandler.java   |  0
 .../tika-parsers-standard-modules/pom.xml  |  1 -
 .../tika-parser-html-commons/README.md | 22 ---
 .../tika-parser-html-commons/pom.xml   | 74 --
 .../org/apache/tika/parser/html/JSoupParser.java   |  2 +-
 .../tika-parsers-standard-package/pom.xml  |  2 +-
 tika-server/tika-server-core/pom.xml   |  2 +-
 tika-server/tika-server-standard/pom.xml   |  6 +-
 16 files changed, 44 insertions(+), 124 deletions(-)
 create mode 100644 tika-handlers/README.md
 copy {tika-pipes/tika-emitters/tika-emitter-jdbc => tika-handlers}/pom.xml 
(70%)
 copy 
{tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-jdbc-commons
 => tika-handlers/tika-handler-boilerpipe}/pom.xml (66%)
 rename 
{tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-html-commons
 => 
tika-handlers/tika-handler-boilerpipe}/src/main/java/org/apache/tika/sax/boilerpipe/BoilerpipeContentHandler.java
 (100%)
 delete mode 100644 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-html-commons/README.md
 delete mode 100644 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-html-commons/pom.xml



[tika] branch TIKA-4138 deleted (was cbc46ee9b)

2023-09-22 Thread tallison

tallison pushed a change to branch TIKA-4138
in repository https://gitbox.apache.org/repos/asf/tika.git


 was cbc46ee9b TIKA-4138 -- move BoilerpipeContentHandler

The revisions that were on this branch are still contained in
other references; therefore, this change does not discard any commits
from the repository.



[tika] branch main updated (6871c9157 -> e04c47820)

2023-09-22 Thread tallison

tallison pushed a change to branch main
in repository https://gitbox.apache.org/repos/asf/tika.git


from 6871c9157 TIKA-4137 -- add a jdk21 build workflow
 add e04c47820 TIKA-4138 -- move BoilerpipeContentHandler (#1355)

No new revisions were added by this update.

Summary of changes:
 CHANGES.txt|  5 ++
 pom.xml|  1 +
 tika-app/pom.xml   |  2 +-
 tika-bom/pom.xml   |  2 +-
 tika-bundles/tika-bundle-standard/pom.xml  |  2 +-
 tika-handlers/README.md|  2 +
 .../tika-emitter-jdbc => tika-handlers}/pom.xml| 24 ---
 .../tika-handler-boilerpipe}/pom.xml   | 21 +++---
 .../sax/boilerpipe/BoilerpipeContentHandler.java   |  0
 .../tika-parsers-standard-modules/pom.xml  |  1 -
 .../tika-parser-html-commons/README.md | 22 ---
 .../tika-parser-html-commons/pom.xml   | 74 --
 .../tika-parsers-standard-package/pom.xml  |  2 +-
 tika-server/tika-server-core/pom.xml   |  2 +-
 tika-server/tika-server-standard/pom.xml   |  6 +-
 15 files changed, 43 insertions(+), 123 deletions(-)
 create mode 100644 tika-handlers/README.md
 copy {tika-pipes/tika-emitters/tika-emitter-jdbc => tika-handlers}/pom.xml 
(70%)
 copy 
{tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-jdbc-commons
 => tika-handlers/tika-handler-boilerpipe}/pom.xml (66%)
 rename 
{tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-html-commons
 => 
tika-handlers/tika-handler-boilerpipe}/src/main/java/org/apache/tika/sax/boilerpipe/BoilerpipeContentHandler.java
 (100%)
 delete mode 100644 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-html-commons/README.md
 delete mode 100644 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-html-commons/pom.xml



[tika] branch TIKA-1599 updated (b8d4e6d66 -> d1bc68eb8)

2023-09-22 Thread tallison

tallison pushed a change to branch TIKA-1599
in repository https://gitbox.apache.org/repos/asf/tika.git


from b8d4e6d66 TIKA-1599 -- migrate to jsoup parser
 add d1bc68eb8 TIKA-1599 -- migrate to jsoup parser -- checkstyle fix

No new revisions were added by this update.

Summary of changes:
 .../src/main/java/org/apache/tika/example/TIAParsingExample.java| 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)



[tika] 01/01: TIKA-1599 -- migrate to jsoup parser

2023-09-22 Thread tallison

tallison pushed a commit to branch TIKA-1599
in repository https://gitbox.apache.org/repos/asf/tika.git

commit b8d4e6d6670485bbb762c5b1e4fe9641cea94f25
Author: tallison 
AuthorDate: Fri Sep 22 12:23:24 2023 -0400

TIKA-1599 -- migrate to jsoup parser
---
 .../test/java/org/apache/tika/cli/TikaCLITest.java |   4 +-
 .../src/test/resources/test-data/tika-config1.xml  |   2 +-
 .../org/apache/tika/example/TIAParsingExample.java |   6 +-
 .../src/test/resources/2.4.0-no-tesseract.txt  |   8 +-
 .../src/test/resources/2.4.0-tesseract.txt |   8 +-
 .../src/test/resources/2.4.1-no-tesseract.txt  |   8 +-
 .../src/test/resources/2.4.1-tesseract.txt |   8 +-
 .../tika-parser-html-module/pom.xml|   5 +
 .../org/apache/tika/parser/html/JSoupParser.java   | 243 +
 .../services/org.apache.tika.parser.Parser |   2 +-
 .../org/apache/tika/parser/html/tika-config.xml|   4 +-
 .../tika/parser/mail/MailContentHandler.java   |   4 +-
 .../tika/parser/microsoft/JackcessExtractor.java   |   6 +-
 .../tika/parser/microsoft/OutlookExtractor.java|   6 +-
 .../tika/parser/microsoft/chm/ChmParser.java   |   6 +-
 .../tika/parser/microsoft/rtf/RTFParserTest.java   |   2 +-
 .../org/apache/tika/sax/BoilerpipeHandlerTest.java |  21 +-
 17 files changed, 300 insertions(+), 43 deletions(-)

diff --git a/tika-app/src/test/java/org/apache/tika/cli/TikaCLITest.java b/tika-app/src/test/java/org/apache/tika/cli/TikaCLITest.java
index e6c5c2296..b8795225b 100644
--- a/tika-app/src/test/java/org/apache/tika/cli/TikaCLITest.java
+++ b/tika-app/src/test/java/org/apache/tika/cli/TikaCLITest.java
@@ -272,7 +272,7 @@ public class TikaCLITest {
 
 assertTrue(json.contains(
 "\"X-TIKA:Parsed-By\" : [ \"org.apache.tika.parser.DefaultParser\", " +
-"\"org.apache.tika.parser.html.HtmlParser\" ],"));
+"\"org.apache.tika.parser.html.JSoupParser\" ],"));
 //test legacy alphabetic sort of keys
 int enc = json.indexOf("\"Content-Encoding\"");
 int fb = json.indexOf("fb:admins");
@@ -467,7 +467,7 @@ public class TikaCLITest {
 getParamOutContent("--config=" + TEST_DATA_FILE.toString() + "/tika-config1.xml",
 resourcePrefix + "bad_xml.xml");
 assertTrue(content.contains("apple"));
-assertTrue(content.contains("org.apache.tika.parser.html.HtmlParser"));
+assertTrue(content.contains("org.apache.tika.parser.html.JSoupParser"));
 }
 
 @Test
diff --git a/tika-app/src/test/resources/test-data/tika-config1.xml b/tika-app/src/test/resources/test-data/tika-config1.xml
index ff03407bc..52f4f0949 100644
--- a/tika-app/src/test/resources/test-data/tika-config1.xml
+++ b/tika-app/src/test/resources/test-data/tika-config1.xml
@@ -1,7 +1,7 @@
 
 
   
-
+
   application/vnd.wap.xhtml+xml
   application/x-asp
   application/xhtml+xml
diff --git a/tika-example/src/main/java/org/apache/tika/example/TIAParsingExample.java b/tika-example/src/main/java/org/apache/tika/example/TIAParsingExample.java
index 5a9ee5dc5..748f83fae 100755
--- a/tika-example/src/main/java/org/apache/tika/example/TIAParsingExample.java
+++ b/tika-example/src/main/java/org/apache/tika/example/TIAParsingExample.java
@@ -47,7 +47,7 @@ import org.apache.tika.parser.ParseContext;
 import org.apache.tika.parser.Parser;
 import org.apache.tika.parser.ParserDecorator;
 import org.apache.tika.parser.html.HtmlMapper;
-import org.apache.tika.parser.html.HtmlParser;
+import org.apache.tika.parser.html.JSoupParser;
 import org.apache.tika.parser.html.IdentityHtmlMapper;
 import org.apache.tika.parser.txt.TXTParser;
 import org.apache.tika.parser.xml.XMLParser;
@@ -117,7 +117,7 @@ public class TIAParsingExample {
 ContentHandler handler = new DefaultHandler();
 Metadata metadata = new Metadata();
 ParseContext context = new ParseContext();
-Parser parser = new HtmlParser();
+Parser parser = new JSoupParser();
 parser.parse(stream, handler, metadata, context);
 }
 
@@ -126,7 +126,7 @@ public class TIAParsingExample {
 ContentHandler handler = new DefaultHandler();
 ParseContext context = new ParseContext();
 Map parsersByType = new HashMap<>();
-parsersByType.put(MediaType.parse("text/html"), new HtmlParser());
+parsersByType.put(MediaType.parse("text/html"), new JSoupParser());
 parsersByType.put(MediaType.parse("application/xml"), new XMLParser());
 
 CompositeParser parser = new CompositeParser();
diff --git 
a/tika-parsers/tika-parsers-extended/tika-parser-scientific-package/src/test/resources/2.4.0-no-tesseract.txt
 
b/tika-parsers/tika-parsers-extended/tika-parser-scientific-package/src/test/resources/2.4.0-no-tesseract.txt
index a929ec74d..ca772e598 100644
--- 

[tika] branch TIKA-1599 created (now b8d4e6d66)

2023-09-22 Thread tallison

tallison pushed a change to branch TIKA-1599
in repository https://gitbox.apache.org/repos/asf/tika.git


  at b8d4e6d66 TIKA-1599 -- migrate to jsoup parser

This branch includes the following new commits:

 new b8d4e6d66 TIKA-1599 -- migrate to jsoup parser

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.




[tika] branch TIKA-4138 created (now cbc46ee9b)

2023-09-22 Thread tallison

tallison pushed a change to branch TIKA-4138
in repository https://gitbox.apache.org/repos/asf/tika.git


  at cbc46ee9b TIKA-4138 -- move BoilerpipeContentHandler

This branch includes the following new commits:

 new cbc46ee9b TIKA-4138 -- move BoilerpipeContentHandler

The 1 revisions listed above as "new" are entirely new to this
repository and will be described in separate emails.  The revisions
listed as "add" were already present in the repository and have only
been added to this reference.




[tika] 01/01: TIKA-4138 -- move BoilerpipeContentHandler

2023-09-22 Thread tallison

tallison pushed a commit to branch TIKA-4138
in repository https://gitbox.apache.org/repos/asf/tika.git

commit cbc46ee9b5295bf14541da8d1f016261c5e30196
Author: tallison 
AuthorDate: Fri Sep 22 10:31:47 2023 -0400

TIKA-4138 -- move BoilerpipeContentHandler
---
 CHANGES.txt|  5 ++
 pom.xml|  1 +
 tika-app/pom.xml   |  2 +-
 tika-bom/pom.xml   |  2 +-
 tika-bundles/tika-bundle-standard/pom.xml  |  2 +-
 tika-handlers/README.md|  2 +
 tika-handlers/pom.xml  | 48 ++
 .../tika-handler-boilerpipe/pom.xml| 26 ++--
 .../sax/boilerpipe/BoilerpipeContentHandler.java   |  0
 .../tika-parsers-standard-modules/pom.xml  |  1 -
 .../tika-parser-html-commons/pom.xml   | 74 --
 .../tika-parsers-standard-package/pom.xml  |  2 +-
 tika-server/tika-server-core/pom.xml   |  2 +-
 tika-server/tika-server-standard/pom.xml   |  6 +-
 14 files changed, 86 insertions(+), 87 deletions(-)

diff --git a/CHANGES.txt b/CHANGES.txt
index 30c137609..408e42676 100644
--- a/CHANGES.txt
+++ b/CHANGES.txt
@@ -1,7 +1,12 @@
 Release 3.0.0-BETA - ??
 
+   BREAKING CHANGES
+
* Require Java 11 (TIKA-4128).
 
+   * The boilerpipe handler has been moved to tika-handler-boiler-pipe
+
+   Other Changes/Updates
* Fix bug in DateUtils that stripped timezone information from
  incoming Calendar objects (TIKA-4126).
 
diff --git a/pom.xml b/pom.xml
index ab6b22afa..31f025576 100644
--- a/pom.xml
+++ b/pom.xml
@@ -54,6 +54,7 @@
     <module>tika-example</module>
     <module>tika-java7</module>
     <module>tika-detectors</module>
+    <module>tika-handlers</module>
   </modules>
 
   
diff --git a/tika-app/pom.xml b/tika-app/pom.xml
index 9a48d2ea9..68ac79477 100644
--- a/tika-app/pom.xml
+++ b/tika-app/pom.xml
@@ -45,7 +45,7 @@
 
 
       <groupId>${project.groupId}</groupId>
-      <artifactId>tika-parser-html-commons</artifactId>
+      <artifactId>tika-handler-boilerpipe</artifactId>
       <version>${project.version}</version>
 
 
diff --git a/tika-bom/pom.xml b/tika-bom/pom.xml
index ba2e19d73..5e1aca01e 100644
--- a/tika-bom/pom.xml
+++ b/tika-bom/pom.xml
@@ -222,7 +222,7 @@
   
   
         <groupId>org.apache.tika</groupId>
-        <artifactId>tika-parser-html-commons</artifactId>
+        <artifactId>tika-handler-boilerpipe</artifactId>
         <version>3.0.0-SNAPSHOT</version>
   
   
diff --git a/tika-bundles/tika-bundle-standard/pom.xml b/tika-bundles/tika-bundle-standard/pom.xml
index db605c044..1e18b1cb0 100644
--- a/tika-bundles/tika-bundle-standard/pom.xml
+++ b/tika-bundles/tika-bundle-standard/pom.xml
@@ -58,7 +58,7 @@
 
 
       <groupId>${project.groupId}</groupId>
-      <artifactId>tika-parser-html-commons</artifactId>
+      <artifactId>tika-handler-boilerpipe</artifactId>
       <version>${project.version}</version>
 
 
diff --git a/tika-handlers/README.md b/tika-handlers/README.md
new file mode 100644
index 0..bb45651b3
--- /dev/null
+++ b/tika-handlers/README.md
@@ -0,0 +1,2 @@
+This package is intended to hold non-standard handlers. These may have 
dependencies that some don't want, 
+or they may have a focus that isn't general enough to warrant adding them to 
tika-core
\ No newline at end of file
diff --git a/tika-handlers/pom.xml b/tika-handlers/pom.xml
new file mode 100644
index 0..fcab3eb20
--- /dev/null
+++ b/tika-handlers/pom.xml
@@ -0,0 +1,48 @@
+
+
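+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+  <parent>
+    <groupId>org.apache.tika</groupId>
+    <artifactId>tika-parent</artifactId>
+    <version>3.0.0-SNAPSHOT</version>
+    <relativePath>../tika-parent/pom.xml</relativePath>
+  </parent>
+
+  <artifactId>tika-handlers</artifactId>
+
+  <name>Apache Tika handlers</name>
+  <packaging>pom</packaging>
+
+  <modules>
+    <module>tika-handler-boilerpipe</module>
+  </modules>
+
+  <dependencies>
+    <dependency>
+      <groupId>${project.groupId}</groupId>
+      <artifactId>tika-core</artifactId>
+      <version>${project.version}</version>
+      <scope>provided</scope>
+    </dependency>
+  </dependencies>
+</project>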
\ No newline at end of file
diff --git 
a/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-html-commons/README.md
 b/tika-handlers/tika-handler-boilerpipe/pom.xml
similarity index 51%
rename from 
tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-html-commons/README.md
rename to tika-handlers/tika-handler-boilerpipe/pom.xml
index 82fb00a47..05d0b69b3 100644
--- 
a/tika-parsers/tika-parsers-standard/tika-parsers-standard-modules/tika-parser-html-commons/README.md
+++ b/tika-handlers/tika-handler-boilerpipe/pom.xml
@@ -1,4 +1,5 @@
-
-This module only contains the BoilerPipeContentHandler.  The boilerpipe 
dependency is no 
-longer maintained and contains clashes with NekoHTML.
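+<project xmlns="http://maven.apache.org/POM/4.0.0"
+         xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
+         xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
+  <modelVersion>4.0.0</modelVersion>
+  <parent>
+    <groupId>org.apache.tika</groupId>
+    <artifactId>tika-handlers</artifactId>
+    <version>3.0.0-SNAPSHOT</version>
+    <relativePath>../pom.xml</relativePath>
+  </parent>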
 
-In Tika 3.x, we 

[tika] branch main updated: TIKA-4137 -- add a jdk21 build workflow

2023-09-22 Thread tallison

tallison pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tika.git


The following commit(s) were added to refs/heads/main by this push:
 new 6871c9157 TIKA-4137 -- add a jdk21 build workflow
6871c9157 is described below

commit 6871c9157ed58fe1a0249bbdf44ef76116dba767
Author: tallison 
AuthorDate: Fri Sep 22 09:33:17 2023 -0400

TIKA-4137 -- add a jdk21 build workflow
---
 .github/workflows/main-jdk21-build.yml | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/.github/workflows/main-jdk21-build.yml b/.github/workflows/main-jdk21-build.yml
new file mode 100644
index 0..946cbf0f9
--- /dev/null
+++ b/.github/workflows/main-jdk21-build.yml
@@ -0,0 +1,38 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one or more
+# contributor license agreements.  See the NOTICE file distributed with
+# this work for additional information regarding copyright ownership.
+# The ASF licenses this file to You under the Apache License, Version 2.0
+# (the "License"); you may not use this file except in compliance with
+# the License.  You may obtain a copy of the License at
+#
+#  http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+#
+
+name: main jdk21 build
+
+on:
+  push:
+    branches: [ main ]
+
+jobs:
+  build:
+    runs-on: ubuntu-latest
+    strategy:
+      matrix:
+        java: [ '21' ]
+
+    steps:
+      - uses: actions/checkout@v2
+      - name: Set up JDK ${{ matrix.java }}
+        uses: actions/setup-java@v1
+        with:
+          java-version: ${{ matrix.java }}
+      - name: Build with Maven
+        run: mvn clean test install javadoc:aggregate



[tika] branch main updated: Tika 4137 (#1353)

2023-09-22 Thread tallison

tallison pushed a commit to branch main
in repository https://gitbox.apache.org/repos/asf/tika.git


The following commit(s) were added to refs/heads/main by this push:
 new 72a81a16e Tika 4137 (#1353)
72a81a16e is described below

commit 72a81a16e39848dc15202f7e6f8d23661264dc13
Author: Thorsten Heit 
AuthorDate: Fri Sep 22 15:29:05 2023 +0200

Tika 4137 (#1353)

* TIKA-4137 -- Building current Tika main branch fails under Java 20/21

Authored-by: thorsten 
---
 .../src/main/java/org/apache/tika/server/core/resource/TikaResource.java | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tika-server/tika-server-core/src/main/java/org/apache/tika/server/core/resource/TikaResource.java b/tika-server/tika-server-core/src/main/java/org/apache/tika/server/core/resource/TikaResource.java
index aadf86f30..2913e740b 100644
--- a/tika-server/tika-server-core/src/main/java/org/apache/tika/server/core/resource/TikaResource.java
+++ b/tika-server/tika-server-core/src/main/java/org/apache/tika/server/core/resource/TikaResource.java
@@ -676,6 +676,7 @@ public class TikaResource {
 handler.getTransformer().setOutputProperty(OutputKeys.METHOD, format);
 handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
 handler.getTransformer().setOutputProperty(OutputKeys.ENCODING, UTF_8.name());
+handler.getTransformer().setOutputProperty(OutputKeys.VERSION, "1.1");
 handler.setResult(new StreamResult(writer));
 content = new ExpandedTitleContentHandler(handler);
 } catch (TransformerConfigurationException e) {
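
Editor's note: the hunk above adds an OutputKeys.VERSION property next to the METHOD, INDENT and ENCODING properties already set on the transformer handler. The following is a minimal sketch of that pattern using only the JDK's JAXP classes. It is not the TikaResource code: the class name XmlVersionSketch, the serialize helper and the sample <p> element are hypothetical, and whether the requested version shows up in the XML declaration depends on the JAXP implementation in use.

```java
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.sax.SAXTransformerFactory;
import javax.xml.transform.sax.TransformerHandler;
import javax.xml.transform.stream.StreamResult;
import org.xml.sax.helpers.AttributesImpl;

// Hypothetical illustration class; not part of Tika.
public class XmlVersionSketch {

    // Serializes a tiny SAX event stream with the given XML version requested.
    static String serialize(String xmlVersion) throws Exception {
        SAXTransformerFactory factory =
                (SAXTransformerFactory) TransformerFactory.newInstance();
        TransformerHandler handler = factory.newTransformerHandler();

        // Same four output properties as in the hunk above.
        handler.getTransformer().setOutputProperty(OutputKeys.METHOD, "xml");
        handler.getTransformer().setOutputProperty(OutputKeys.INDENT, "yes");
        handler.getTransformer().setOutputProperty(OutputKeys.ENCODING, "UTF-8");
        handler.getTransformer().setOutputProperty(OutputKeys.VERSION, xmlVersion);

        StringWriter writer = new StringWriter();
        handler.setResult(new StreamResult(writer));

        // Feed a minimal document: <p>hello</p>
        handler.startDocument();
        handler.startElement("", "p", "p", new AttributesImpl());
        char[] text = "hello".toCharArray();
        handler.characters(text, 0, text.length);
        handler.endElement("", "p", "p");
        handler.endDocument();

        return writer.toString();
    }

    public static void main(String[] args) throws Exception {
        // Print the serialized document so the declaration can be inspected.
        System.out.println(serialize("1.1"));
    }
}
```

Running main prints the serialized document, which makes it easy to check what a given JDK writes into the XML declaration for the requested version.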



[tika] branch branch_2x updated: TIKA-4123: update netty, aws

2023-09-22 Thread tilman

tilman pushed a commit to branch branch_2x
in repository https://gitbox.apache.org/repos/asf/tika.git


The following commit(s) were added to refs/heads/branch_2x by this push:
 new 1996d73ab TIKA-4123: update netty, aws
1996d73ab is described below

commit 1996d73aba38232828e30031419c3389af79c592
Author: Tilman Hausherr 
AuthorDate: Fri Sep 22 08:13:42 2023 +0200

TIKA-4123: update netty, aws
---
 tika-parent/pom.xml | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tika-parent/pom.xml b/tika-parent/pom.xml
index 50bf4c243..81bbc9595 100644
--- a/tika-parent/pom.xml
+++ b/tika-parent/pom.xml
@@ -306,7 +306,7 @@
 
 
 2.27.0
-1.12.554
+1.12.555
 9.5
 1.1.0
 
@@ -402,7 +402,7 @@
 6.1.11
 1.5.5-5
 3.5.1
-4.1.97.Final
+4.1.98.Final