[jira] [Commented] (TIKA-2543) No content extraction for application/x-webarchive format

2018-10-17 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653804#comment-16653804 ] Nick Burch commented on TIKA-2543: -- Great find, Tim! Looks like an excellent resource on this. Assuming

[jira] [Commented] (TIKA-2543) No content extraction for application/x-webarchive format

2018-10-17 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653764#comment-16653764 ] Tim Allison commented on TIKA-2543: ---

[jira] [Commented] (TIKA-2543) No content extraction for application/x-webarchive format

2018-10-17 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653725#comment-16653725 ] Tim Allison commented on TIKA-2543: --- Still on the lookout for a Java parser with an Apache-friendly license

[jira] [Commented] (TIKA-2543) No content extraction for application/x-webarchive format

2018-10-17 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653718#comment-16653718 ] Tim Allison commented on TIKA-2543: --- TIKA-1358 might be relevant. We don't currently parse modern Apple

[jira] [Commented] (TIKA-2543) No content extraction for application/x-webarchive format

2018-10-17 Thread Rafael Ferreira (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653715#comment-16653715 ] Rafael Ferreira commented on TIKA-2543: --- If someone can point in the general area of the problem,

[jira] [Commented] (TIKA-2543) No content extraction for application/x-webarchive format

2018-10-17 Thread Rafael Ferreira (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16653711#comment-16653711 ] Rafael Ferreira commented on TIKA-2543: --- This seems like a more widespread issue than I imagined,

[jira] [Commented] (TIKA-2543) No content extraction for application/x-webarchive format

2018-01-21 Thread Rafael Ferreira (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16333797#comment-16333797 ] Rafael Ferreira commented on TIKA-2543: --- [~gagravarr] is this what you had in mind? Attached. 

[jira] [Commented] (TIKA-2543) No content extraction for application/x-webarchive format

2018-01-08 Thread Nick Burch (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2543?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16316029#comment-16316029 ] Nick Burch commented on TIKA-2543: -- Based on https://en.wikipedia.org/wiki/Webarchive the underlying