Good thread, Tim,
Regarding open issues and low-hanging fruit to make it into 1.14, I will
also work on finishing
https://github.com/apache/tika/pull/112.
I think Bob has an excellent point. The 2.X work is major and would be a
big step in the right direction. Having both branches longer and longer
[
https://issues.apache.org/jira/browse/TIKA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419274#comment-15419274
]
Tim Allison edited comment on TIKA-2013 at 8/12/16 6:59 PM:
I compared Tika
[
https://issues.apache.org/jira/browse/TIKA-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418855#comment-15418855
]
Tim Allison edited comment on TIKA-2038 at 8/12/16 6:40 PM:
bq. But since I
[
https://issues.apache.org/jira/browse/TIKA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419274#comment-15419274
]
Tim Allison edited comment on TIKA-2013 at 8/12/16 6:36 PM:
I compared Tika
[
https://issues.apache.org/jira/browse/TIKA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-2013:
--
Attachment: potential_regressions_poi_3_15-beta3.zip
I compared Tika with poi-3.15-beta1 vs the
[
https://issues.apache.org/jira/browse/TIKA-2013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-2013:
--
Summary: Upgrade to POI 3.15-beta3 when available (was: Upgrade to POI
3.15-beta2 when available)
>
[
https://issues.apache.org/jira/browse/TIKA-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419202#comment-15419202
]
Hudson commented on TIKA-1938:
--
SUCCESS: Integrated in tika-2.x #130 (See
[
https://issues.apache.org/jira/browse/TIKA-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419203#comment-15419203
]
Hudson commented on TIKA-1980:
--
SUCCESS: Integrated in tika-2.x #130 (See
[
https://issues.apache.org/jira/browse/TIKA-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419150#comment-15419150
]
Hudson commented on TIKA-1980:
--
FAILURE: Integrated in tika-2.x-windows #34 (See
[
https://issues.apache.org/jira/browse/TIKA-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419149#comment-15419149
]
Hudson commented on TIKA-1938:
--
FAILURE: Integrated in tika-2.x-windows #34 (See
The Apache Jenkins build system has built tika-2.x-windows (build #34)
Status: Still Failing
Check console output at https://builds.apache.org/job/tika-2.x-windows/34/ to
view the results.
[
https://issues.apache.org/jira/browse/TIKA-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419104#comment-15419104
]
Hudson commented on TIKA-1980:
--
SUCCESS: Integrated in Tika-trunk #1091 (See
[
https://issues.apache.org/jira/browse/TIKA-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419101#comment-15419101
]
Tim Allison commented on TIKA-1938:
---
I just applied this to 2.x.
> HtmlParser drops elements found
[
https://issues.apache.org/jira/browse/TIKA-1938?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-1938:
--
Fix Version/s: 2.0
> HtmlParser drops elements found inside
>
[
https://issues.apache.org/jira/browse/TIKA-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-1980.
---
Resolution: Fixed
Fix Version/s: 1.14
2.0
Thank you, [~naegelejd]!
> HTML
Github user asfgit closed the pull request at:
https://github.com/apache/tika/pull/121
[
https://issues.apache.org/jira/browse/TIKA-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15419059#comment-15419059
]
ASF GitHub Bot commented on TIKA-1980:
--
Github user asfgit closed the pull request at:
[
https://issues.apache.org/jira/browse/TIKA-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison reassigned TIKA-1980:
-
Assignee: Tim Allison
> HTML head tags found after first script not parsed by HtmlParser
[
https://issues.apache.org/jira/browse/TIKA-1980?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418959#comment-15418959
]
Joseph Naegele commented on TIKA-1980:
--
This should absolutely make it into 1.14.
> HTML head tags
1508, and 1680 are pending me/my review. I’ll get it done today.
On 8/12/16, 4:24 AM, "Allison, Timothy B." wrote:
>> I know it's been a little bit since we talked about 2.0. We had
discussed holding off while some API changes were under consideration.
Has any
[
https://issues.apache.org/jira/browse/TIKA-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418855#comment-15418855
]
Tim Allison edited comment on TIKA-2038 at 8/12/16 1:51 PM:
bq. But since I
[
https://issues.apache.org/jira/browse/TIKA-2038?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418855#comment-15418855
]
Tim Allison commented on TIKA-2038:
---
bq. But since I haven’t access to a broadband Internet connection
[
https://issues.apache.org/jira/browse/TIKA-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418760#comment-15418760
]
Tim Allison commented on TIKA-2054:
---
You might try subclassing the XHTMLHandler/SafeContentHandler and
[
https://issues.apache.org/jira/browse/TIKA-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418755#comment-15418755
]
Tim Allison commented on TIKA-2054:
---
I don't think we want to modify our SafeContentHandler to stop
I think waiting for pdfbox 2.0.3 would be great. There are some regressions
fixed.
Regards,
Luis
On Aug 12, 2016 08:24, "Allison, Timothy B."
wrote:
> >> I know it's been a little bit since we talked about 2.0. We had
> discussed holding off while some API changes
[
https://issues.apache.org/jira/browse/TIKA-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Angela Onslow updated TIKA-2054:
Attachment: 2482_2014_DAVIDE+CAMPARI-MILANO+SPA_SUSTY-AR.pdf
Here is a file which demonstrates this
[
https://issues.apache.org/jira/browse/TIKA-2054?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418723#comment-15418723
]
Angela Onslow edited comment on TIKA-2054 at 8/12/16 11:48 AM:
---
Here is a
Angela Onslow created TIKA-2054:
---
Summary: Problem with ligatures converting from PDF to HTML with
Tika
Key: TIKA-2054
URL: https://issues.apache.org/jira/browse/TIKA-2054
Project: Tika
Issue
>> I know it's been a little bit since we talked about 2.0. We had discussed
>> holding off while some API changes were under consideration. Has any
>> progress been made on this?
> I think we're still trying to come up with a plan for how to allow multiple
> parsers to report text for
I believe we've also still got the issue of structured metadata outstanding.
Regards,
Ray
> On Aug 12, 2016, at 6:27 AM, Nick Burch wrote:
>
> On Thu, 11 Aug 2016, Bob Paulin wrote:
>> I know it's been a little bit since we talked about 2.0. We had discussed
>> holding
On Thu, 11 Aug 2016, Bob Paulin wrote:
> I know it's been a little bit since we talked about 2.0. We had
> discussed holding off while some API changes were under
> consideration. Has any progress been made on this?
I think we're still trying to come up with a plan for how to allow
multiple
[
https://issues.apache.org/jira/browse/TIKA-2053?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15418450#comment-15418450
]
ASF GitHub Bot commented on TIKA-2053:
--
GitHub user AravindRam opened a pull request:
https://github.com/apache/tika/pull/130
fix for TIKA-2053 contributed by AravindRam
Adding TagRatio parser to Tika Parser.
You can merge this pull request into a Git repository by running:
$ git pull
I know it's been a little bit since we talked about 2.0. We had
discussed holding off while some API changes were under
consideration. Has any progress been made on this? The community has
been really good about dual-maintaining, but how much longer do we want
to have this expectation?
Aravind Ram Nathan created TIKA-2053:
Summary: Adding TagRatio to Tika Parser
Key: TIKA-2053
URL: https://issues.apache.org/jira/browse/TIKA-2053
Project: Tika
Issue Type: New Feature