[jira] [Commented] (TIKA-2791) Add structure tags to tika-eval
[ https://issues.apache.org/jira/browse/TIKA-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721740#comment-16721740 ]

Hudson commented on TIKA-2791:
--

SUCCESS: Integrated in Jenkins build tika-branch-1x #142 (See [https://builds.apache.org/job/tika-branch-1x/142/])
TIKA-2791 -- add tags/structure to tika-eval (tallison: [https://github.com/apache/tika/commit/4c9e38e4eb2983cda9759c4dffa33daa52d659c8])
* (add) tika-eval/src/main/java/org/apache/tika/eval/util/ContentTags.java
* (add) tika-eval/src/test/resources/test-dirs/extractsA/file16_badTags.json
* (edit) tika-core/src/main/java/org/apache/tika/sax/AbstractRecursiveParserWrapperHandler.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/ExtractComparer.java
* (add) tika-eval/src/test/resources/test-dirs/extractsA/file15_tags.json
* (edit) tika-eval/src/main/java/org/apache/tika/eval/db/Cols.java
* (add) tika-eval/src/main/java/org/apache/tika/eval/util/ContentTagParser.java
* (edit) tika-eval/pom.xml
* (edit) tika-eval/src/main/java/org/apache/tika/eval/AbstractProfiler.java
* (edit) tika-core/src/main/java/org/apache/tika/sax/RecursiveParserWrapperHandler.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/ExtractProfiler.java
* (add) tika-eval/src/test/resources/test-dirs/extractsA/file17_tagsOutOfOrder.json
* (add) tika-eval/src/test/resources/test-dirs/extractsB/file15_tags.html
* (edit) tika-eval/src/main/java/org/apache/tika/eval/io/ExtractReader.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/batch/ExtractProfilerBuilder.java
* (edit) tika-eval/src/test/java/org/apache/tika/eval/SimpleComparerTest.java
* (add) tika-eval/src/test/resources/test-dirs/extractsB/file16_badTags.html
* (edit) tika-eval/src/main/java/org/apache/tika/eval/batch/ExtractComparerBuilder.java

> Add structure tags to tika-eval
> ---
>
> Key: TIKA-2791
> URL: https://issues.apache.org/jira/browse/TIKA-2791
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Major
>
> It would be useful to be able to compare counts of common structure tags in
> tika-eval. We could also detect and flag bad structure tags that we may be
> generating, e.g.:

--
This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (TIKA-2791) Add structure tags to tika-eval
[ https://issues.apache.org/jira/browse/TIKA-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721729#comment-16721729 ]

Hudson commented on TIKA-2791:
--

SUCCESS: Integrated in Jenkins build Tika-trunk #1609 (See [https://builds.apache.org/job/Tika-trunk/1609/])
TIKA-2791 -- add tags/structure to tika-eval (tallison: [https://github.com/apache/tika/commit/1ac6a3bd8601dc3376ce01786f115b877b9d338f])
* (edit) tika-eval/pom.xml
* (edit) tika-eval/src/main/java/org/apache/tika/eval/ExtractProfiler.java
* (edit) tika-core/src/main/java/org/apache/tika/sax/RecursiveParserWrapperHandler.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/AbstractProfiler.java
* (add) tika-eval/src/test/resources/test-dirs/extractsA/file17_tagsOutOfOrder.json
* (edit) tika-eval/src/main/java/org/apache/tika/eval/db/Cols.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/batch/ExtractComparerBuilder.java
* (edit) tika-core/src/main/java/org/apache/tika/sax/AbstractRecursiveParserWrapperHandler.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/ExtractComparer.java
* (add) tika-eval/src/test/resources/test-dirs/extractsA/file16_badTags.json
* (edit) tika-eval/src/main/java/org/apache/tika/eval/io/ExtractReader.java
* (add) tika-eval/src/test/resources/test-dirs/extractsB/file15_tags.html
* (edit) tika-eval/src/main/java/org/apache/tika/eval/batch/ExtractProfilerBuilder.java
* (edit) tika-eval/src/test/java/org/apache/tika/eval/SimpleComparerTest.java
* (add) tika-eval/src/test/resources/test-dirs/extractsA/file15_tags.json
* (add) tika-eval/src/test/resources/test-dirs/extractsB/file16_badTags.html
* (add) tika-eval/src/main/java/org/apache/tika/eval/util/ContentTagParser.java
* (add) tika-eval/src/main/java/org/apache/tika/eval/util/ContentTags.java
[jira] [Commented] (TIKA-2791) Add structure tags to tika-eval
[ https://issues.apache.org/jira/browse/TIKA-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16721634#comment-16721634 ]

Hudson commented on TIKA-2791:
--

UNSTABLE: Integrated in Jenkins build tika-2.x-windows #363 (See [https://builds.apache.org/job/tika-2.x-windows/363/])
TIKA-2791 -- add tags/structure to tika-eval (tallison: rev 1ac6a3bd8601dc3376ce01786f115b877b9d338f)
* (add) tika-eval/src/test/resources/test-dirs/extractsA/file16_badTags.json
* (edit) tika-eval/src/main/java/org/apache/tika/eval/batch/ExtractComparerBuilder.java
* (edit) tika-eval/src/test/java/org/apache/tika/eval/SimpleComparerTest.java
* (add) tika-eval/src/main/java/org/apache/tika/eval/util/ContentTagParser.java
* (add) tika-eval/src/main/java/org/apache/tika/eval/util/ContentTags.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/io/ExtractReader.java
* (edit) tika-core/src/main/java/org/apache/tika/sax/AbstractRecursiveParserWrapperHandler.java
* (add) tika-eval/src/test/resources/test-dirs/extractsB/file15_tags.html
* (edit) tika-eval/src/main/java/org/apache/tika/eval/ExtractProfiler.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/AbstractProfiler.java
* (add) tika-eval/src/test/resources/test-dirs/extractsB/file16_badTags.html
* (edit) tika-eval/pom.xml
* (edit) tika-eval/src/main/java/org/apache/tika/eval/batch/ExtractProfilerBuilder.java
* (edit) tika-core/src/main/java/org/apache/tika/sax/RecursiveParserWrapperHandler.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/ExtractComparer.java
* (add) tika-eval/src/test/resources/test-dirs/extractsA/file15_tags.json
* (edit) tika-eval/src/main/java/org/apache/tika/eval/db/Cols.java
* (add) tika-eval/src/test/resources/test-dirs/extractsA/file17_tagsOutOfOrder.json
[jira] [Commented] (TIKA-2791) Add structure tags to tika-eval
[ https://issues.apache.org/jira/browse/TIKA-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16707280#comment-16707280 ]

Tim Allison commented on TIKA-2791:
---

I'd want to focus on a handful of common tags: p, div, ul, ol, li, table, tr, td, u, i, b, a...any others?

> Add structure tags to tika-eval
> ---
>
> Key: TIKA-2791
> URL: https://issues.apache.org/jira/browse/TIKA-2791
> Project: Tika
> Issue Type: Improvement
> Reporter: Tim Allison
> Priority: Major
>
> It would be useful to be able to compare counts of common structure tags in
> tika-eval. We could also detect and flag bad structure tags, e.g.:
>
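
For readers following the thread, the idea in the comment above (count a handful of common tags and flag bad ones) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class TagCountSketch, its analyze method, the Result holder and the regex-plus-stack approach are all hypothetical and are not the ContentTags/ContentTagParser code added in the commit. It also assumes XHTML-style input in which every opened tag is explicitly closed or self-closed.

```java
import java.util.ArrayDeque;
import java.util.Arrays;
import java.util.Deque;
import java.util.HashSet;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch; not the ContentTagParser from the TIKA-2791 commit. */
class TagCountSketch {

    // The handful of tags proposed in the comment above.
    static final Set<String> COMMON_TAGS = new HashSet<>(Arrays.asList(
            "p", "div", "ul", "ol", "li", "table", "tr", "td", "u", "i", "b", "a"));

    // group 1 = "/" for a closing tag, group 2 = tag name,
    // group 3 = "/" for a self-closing tag.
    private static final Pattern TAG =
            Pattern.compile("<(/?)([A-Za-z][A-Za-z0-9]*)\\b[^>]*?(/?)>");

    /** Counts of opening tags plus a flag for broken nesting. */
    public static final class Result {
        public final Map<String, Integer> counts = new TreeMap<>();
        public boolean badTags = false;
    }

    public static Result analyze(String xhtml) {
        Result result = new Result();
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(xhtml);
        while (m.find()) {
            String name = m.group(2).toLowerCase(Locale.ROOT);
            boolean closing = !m.group(1).isEmpty();
            boolean selfClosing = !m.group(3).isEmpty();
            if (closing) {
                // A close with nothing open, or one that does not match the
                // most recently opened tag, counts as a bad/out-of-order tag.
                if (open.isEmpty() || !open.pop().equals(name)) {
                    result.badTags = true;
                }
            } else {
                if (COMMON_TAGS.contains(name)) {
                    result.counts.merge(name, 1, Integer::sum);
                }
                if (!selfClosing) {
                    open.push(name);
                }
            }
        }
        // Anything still open at the end was never closed.
        if (!open.isEmpty()) {
            result.badTags = true;
        }
        return result;
    }

    public static void main(String[] args) {
        Result ok = analyze("<div><p>a</p><p>b <b>c</b></p></div>");
        System.out.println(ok.counts + " badTags=" + ok.badTags);
        Result bad = analyze("<p><b>x</p></b>");
        System.out.println(bad.counts + " badTags=" + bad.badTags);
    }
}
```

Comparing two extracts would then amount to running analyze on each and diffing the two count maps. A real implementation would more likely rely on a proper SAX or HTML parser than on a regex; the regex is used here only to keep the sketch dependency-free.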