[
https://issues.apache.org/jira/browse/TIKA-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350330#comment-16350330
]
Tim Allison commented on TIKA-2562:
---
This is a "feature" of tagsoup see, e.g.
[
https://issues.apache.org/jira/browse/TIKA-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350418#comment-16350418
]
Andrei Rebegea commented on TIKA-2490:
--
Hello,
I am using tika version 1.17 and still getting these
[
https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350335#comment-16350335
]
Tim Allison commented on TIKA-1599:
---
What say we do a fresh eval on our current corpus and then do a
[
https://issues.apache.org/jira/browse/TIKA-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350302#comment-16350302
]
Tim Allison commented on TIKA-2561:
---
This is helpful. It boggles my imagination that this could be a
[
https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350335#comment-16350335
]
Tim Allison edited comment on TIKA-1599 at 2/2/18 1:42 PM:
---
What say we do a
[
https://issues.apache.org/jira/browse/TIKA-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350406#comment-16350406
]
Hudson commented on TIKA-2561:
--
SUCCESS: Integrated in Jenkins build Tika-trunk #1429 (See
[
https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350504#comment-16350504
]
Luis Filipe Nassif commented on TIKA-1599:
--
Hi [~talli...@mitre.org],
Moving to DOM could lead to
[
https://issues.apache.org/jira/browse/TIKA-2561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison resolved TIKA-2561.
---
Resolution: Fixed
Fix Version/s: 1.18, 2.0
> Tika Parser includes
[
https://issues.apache.org/jira/browse/TIKA-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-2490:
--
Fix Version/s: 1.17
> Turn off stderr warnings in Tika-app
[
https://issues.apache.org/jira/browse/TIKA-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350420#comment-16350420
]
Tim Allison commented on TIKA-2490:
---
No, this isn't supposed to happen if you use the example
[
https://issues.apache.org/jira/browse/TIKA-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350528#comment-16350528
]
Andrei Rebegea commented on TIKA-2490:
--
The short answer is that I don't know the full details, sorry.
[
https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Markus Jelsma updated TIKA-1599:
Attachment: consumentenbond.html
> Switch from TagSoup to JSoup
[
https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350556#comment-16350556
]
Tim Allison commented on TIKA-1599:
---
>Tim, if attached file is what you are looking for, i've got about
[
https://issues.apache.org/jira/browse/TIKA-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350606#comment-16350606
]
Tim Allison commented on TIKA-2563:
---
Right. Sorry. I meant the {{testHTML_embedded_img.html}}, NOT the
[
https://issues.apache.org/jira/browse/TIKA-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350634#comment-16350634
]
NW Brad edited comment on TIKA-2562 at 2/2/18 4:51 PM:
---
Thanks. I checked it out and
[
https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350545#comment-16350545
]
Markus Jelsma commented on TIKA-1599:
-
On topic, our parser on top of Tika relies on a custom
[
https://issues.apache.org/jira/browse/TIKA-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350543#comment-16350543
]
Andrei Rebegea commented on TIKA-2490:
--
OK. Thanks for the answer.
So "Is this still suppose to
[
https://issues.apache.org/jira/browse/TIKA-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350634#comment-16350634
]
NW Brad edited comment on TIKA-2562 at 2/2/18 4:50 PM:
---
Thanks. I checked it out and
[
https://issues.apache.org/jira/browse/TIKA-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350538#comment-16350538
]
Tim Allison commented on TIKA-2490:
---
Whoa, welcome to modernity. :)
>I am assuming that just by
[
https://issues.apache.org/jira/browse/TIKA-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-2563:
--
Attachment: consumentenbond.html
> Extract embedded files in HTML
[
https://issues.apache.org/jira/browse/TIKA-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350634#comment-16350634
]
NW Brad commented on TIKA-2562:
---
Thanks. I checked it out, and tagsoup is definitely adding the shape. I
[
https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350529#comment-16350529
]
Tim Allison commented on TIKA-1599:
---
>DOM could lead to higher memory usage
Y, that's my concern, esp
[
https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350529#comment-16350529
]
Tim Allison edited comment on TIKA-1599 at 2/2/18 3:39 PM:
---
>DOM could lead to
[
https://issues.apache.org/jira/browse/TIKA-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350550#comment-16350550
]
Tim Allison commented on TIKA-2490:
---
Y, sorry. We could change this behavior back to ignore missing
[
https://issues.apache.org/jira/browse/TIKA-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350604#comment-16350604
]
Markus Jelsma commented on TIKA-2563:
-
I am not sure if ASL 2.0 friendly would apply. I took it some
[
https://issues.apache.org/jira/browse/TIKA-1599?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350541#comment-16350541
]
Markus Jelsma commented on TIKA-1599:
-
Tim, if attached file is what you are looking for, i've got
[
https://issues.apache.org/jira/browse/TIKA-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350593#comment-16350593
]
Tim Allison commented on TIKA-2563:
---
ASF 2.0 friendly example file based on example file kindly supplied
[
https://issues.apache.org/jira/browse/TIKA-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-2563:
--
Attachment: testHTML_embedded_img.html
> Extract embedded files in HTML
[
https://issues.apache.org/jira/browse/TIKA-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350614#comment-16350614
]
Markus Jelsma commented on TIKA-2563:
-
Ah, thanks :)
> Extract embedded files in HTML
[
https://issues.apache.org/jira/browse/TIKA-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-2563:
--
Description: Files (esp images) and other objects can be embedded in
html/css/javascript with the
[
https://issues.apache.org/jira/browse/TIKA-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Tim Allison updated TIKA-2563:
--
Description: Files (esp images) and other objects can be embedded in
html/css/javascript with the [data:
Tim Allison created TIKA-2563:
-
Summary: Extract embedded files in HTML
Key: TIKA-2563
URL: https://issues.apache.org/jira/browse/TIKA-2563
Project: Tika
Issue Type: Improvement
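TIKA-2563 is about files (especially images) embedded in HTML/CSS/JavaScript through `data:` URIs. The sketch below is a rough Java illustration of what such extraction involves, assuming the RFC 2397 syntax `data:[<mediatype>][;base64],<data>`. The class `DataUriSketch` and its methods are hypothetical. They are not Tika's API and not the change that was eventually committed. The regex is a crude heuristic. A real parser would more likely read URIs from attribute and CSS values than scan the raw text.

```java
import java.io.ByteArrayOutputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for illustration only; not part of the Tika API.
class DataUriSketch {

    /** Media type plus decoded payload of one data: URI. */
    public static final class Decoded {
        public final String mediaType;
        public final byte[] bytes;

        Decoded(String mediaType, byte[] bytes) {
            this.mediaType = mediaType;
            this.bytes = bytes;
        }
    }

    // Crude heuristic: "data:" at a word boundary, a header without commas, then the
    // payload up to a quote, whitespace, or ')' (the end of a CSS url(...)).
    private static final Pattern DATA_URI =
            Pattern.compile("(?i)\\bdata:[^\"'\\s),]*,[^\"'\\s)]*");

    public static List<String> findDataUris(String html) {
        List<String> found = new ArrayList<>();
        Matcher m = DATA_URI.matcher(html);
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    // Decodes data:[<mediatype>][;base64],<data> as described in RFC 2397.
    public static Decoded decode(String uri) {
        if (!uri.regionMatches(true, 0, "data:", 0, 5)) {
            throw new IllegalArgumentException("not a data: URI");
        }
        int comma = uri.indexOf(',');
        if (comma < 0) {
            throw new IllegalArgumentException("missing ',' in data: URI");
        }
        String meta = uri.substring(5, comma);
        String payload = uri.substring(comma + 1);

        boolean base64 = meta.toLowerCase(Locale.ROOT).endsWith(";base64");
        if (base64) {
            meta = meta.substring(0, meta.length() - ";base64".length());
        }
        // RFC 2397: an omitted media type defaults to text/plain;charset=US-ASCII,
        // and "data:;charset=..." keeps text/plain with the given parameters.
        String mediaType;
        if (meta.isEmpty()) {
            mediaType = "text/plain;charset=US-ASCII";
        } else if (meta.startsWith(";")) {
            mediaType = "text/plain" + meta;
        } else {
            mediaType = meta;
        }
        byte[] bytes = base64 ? Base64.getDecoder().decode(payload) : percentDecode(payload);
        return new Decoded(mediaType, bytes);
    }

    // Percent-decodes to raw bytes; unlike URLDecoder, '+' stays '+'.
    private static byte[] percentDecode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int i = 0;
        while (i < s.length()) {
            char c = s.charAt(i);
            if (c == '%' && i + 2 < s.length()) {
                int hi = Character.digit(s.charAt(i + 1), 16);
                int lo = Character.digit(s.charAt(i + 2), 16);
                if (hi >= 0 && lo >= 0) {
                    out.write((hi << 4) | lo);
                    i += 3;
                    continue;
                }
            }
            // Any other character is copied through as UTF-8.
            int cp = s.codePointAt(i);
            byte[] b = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
            out.write(b, 0, b.length);
            i += Character.charCount(cp);
        }
        return out.toByteArray();
    }
}
```

In a parser, each URI returned by `findDataUris` could be passed to `decode`, and the resulting bytes handed on as an embedded document. Tika's embedded-document handling is not modeled here.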
[
https://issues.apache.org/jira/browse/TIKA-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350555#comment-16350555
]
Tim Allison commented on TIKA-2563:
---
Attached example file supplied by [~markus17] on TIKA-1599. Thank
[
https://issues.apache.org/jira/browse/TIKA-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350822#comment-16350822
]
Tim Allison commented on TIKA-2562:
---
I'll take a look. This will require some digging.
> tika server
[
https://issues.apache.org/jira/browse/TIKA-2562?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16351224#comment-16351224
]
NW Brad commented on TIKA-2562:
---
I was doing some research on this today and this may not be a function of
Lewis John McGibbney created TIKA-2565:
--
Summary: Upgrade edu.ucar dependencies to 4.6.11
Key: TIKA-2565
URL: https://issues.apache.org/jira/browse/TIKA-2565
Project: Tika
Issue Type:
Marc Prud'hommeaux created TIKA-2564:
Summary: Tika client cannot extract files from embedded archive formats
Key: TIKA-2564
URL: https://issues.apache.org/jira/browse/TIKA-2564
Project: Tika