[jira] [Commented] (TIKA-2563) Extract embedded files in HTML

2018-02-02 Thread Markus Jelsma (JIRA)

[ https://issues.apache.org/jira/browse/TIKA-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350614#comment-16350614 ]

Markus Jelsma commented on TIKA-2563:
-

Ah, thanks :)

> Extract embedded files in HTML
> --
>
> Key: TIKA-2563
> URL: https://issues.apache.org/jira/browse/TIKA-2563
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Trivial
> Attachments: consumentenbond.html, testHTML_embedded_img.html
>
>
> Files (especially images) can be base64-encoded in HTML files. We should extract 
> those like any other embedded file.
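For illustration only, here is a minimal standalone sketch of what "base64 encoded in HTML" usually looks like in practice: RFC 2397 data: URIs such as src="data:image/png;base64,...". This is NOT Tika's implementation. It assumes that only base64 data: URIs matter, scans the raw HTML text with a regular expression instead of a real HTML parser, and does not hand results to Tika's embedded-document machinery. The class name DataUriExtractor and its Embedded holder are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Base64;
import java.util.List;
import java.util.Locale;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: finds base64 data: URIs in raw HTML text and decodes them.
public class DataUriExtractor {

    /** A decoded embedded resource: declared media type plus raw bytes. */
    public static final class Embedded {
        public final String mediaType;
        public final byte[] bytes;

        Embedded(String mediaType, byte[] bytes) {
            this.mediaType = mediaType;
            this.bytes = bytes;
        }
    }

    // data:[<mediatype>][;param...];base64,<payload>  (RFC 2397 syntax).
    // Group 1 = media type (may be empty), group 2 = base64 payload.
    // Assumption: percent-encoded (non-base64) data: URIs are ignored.
    private static final Pattern DATA_URI = Pattern.compile(
            "data:([^;,\"'\\s]*)(?:;[^;,\"'\\s]*)*?;base64,([A-Za-z0-9+/]+=*)",
            Pattern.CASE_INSENSITIVE);

    /** Returns every base64 data: URI payload found in the HTML, decoded. */
    public static List<Embedded> extract(String html) {
        List<Embedded> out = new ArrayList<>();
        Matcher m = DATA_URI.matcher(html);
        while (m.find()) {
            // RFC 2397: an omitted media type defaults to text/plain.
            String type = m.group(1).isEmpty()
                    ? "text/plain"
                    : m.group(1).toLowerCase(Locale.ROOT);
            try {
                out.add(new Embedded(type, Base64.getDecoder().decode(m.group(2))));
            } catch (IllegalArgumentException badPayload) {
                // Malformed base64 (e.g. bad padding): skip this URI, keep the rest.
            }
        }
        return out;
    }

    public static void main(String[] args) {
        String html = "<p>Logo:</p><img alt=\"x\" src=\"data:image/png;base64,aGVsbG8=\">";
        for (Embedded e : extract(html)) {
            System.out.println(e.mediaType + " " + e.bytes.length + " bytes: "
                    + new String(e.bytes, StandardCharsets.US_ASCII));
        }
    }
}
```

Running main prints "image/png 5 bytes: hello". A real parser would instead walk src/href attributes via the HTML parser's event stream and pass each decoded payload on as an embedded document; the regex shortcut only keeps the example self-contained.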



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (TIKA-2563) Extract embedded files in HTML

2018-02-02 Thread Tim Allison (JIRA)

[ https://issues.apache.org/jira/browse/TIKA-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350606#comment-16350606 ]

Tim Allison commented on TIKA-2563:
---

Right.  Sorry.  I meant the {{testHTML_embedded_img.html}}, NOT the file you 
shared.  Thank you, again!



[jira] [Commented] (TIKA-2563) Extract embedded files in HTML

2018-02-02 Thread Markus Jelsma (JIRA)

[ https://issues.apache.org/jira/browse/TIKA-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350604#comment-16350604 ]

Markus Jelsma commented on TIKA-2563:
-

I am not sure whether it would qualify as ASL 2.0-friendly. I took it some time ago from a 
live page of a Dutch non-profit association, for test purposes. 



[jira] [Commented] (TIKA-2563) Extract embedded files in HTML

2018-02-02 Thread Tim Allison (JIRA)

[ https://issues.apache.org/jira/browse/TIKA-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350593#comment-16350593 ]

Tim Allison commented on TIKA-2563:
---

ASL 2.0-friendly example file, based on the example file kindly supplied by 
[~markus17].



[jira] [Commented] (TIKA-2563) Extract embedded files in HTML

2018-02-02 Thread Tim Allison (JIRA)

[ https://issues.apache.org/jira/browse/TIKA-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16350555#comment-16350555 ]

Tim Allison commented on TIKA-2563:
---

Attached example file supplied by [~markus17] on TIKA-1599.  Thank you!



