Re: image recognition...how do the parts play together?

2018-07-06 Thread Chris Mattmann
Yes, there is a big reason. It’s because you don’t need an external server running to use it with tika-dl. And of course you can statically analyze the code (with the other solution you have to mix languages to do that), etc. So yes, we should keep them both… From: Tim

Re: image recognition...how do the parts play together?

2018-07-06 Thread Tim Allison
This is very helpful. Thank you! Is there any use in having the tika-dl module if our more modern approach is REST + Docker? The upkeep in tika-dl is nontrivial. On Fri, Jul 6, 2018 at 6:15 PM Chris Mattmann wrote: > Tim, > > > > Thanks. There are multiple modes of integrating deep learning

Re: image recognition...how do the parts play together?

2018-07-06 Thread Chris Mattmann
Tim, Thanks. There are multiple modes of integrating deep learning with Tika. The original mode uses Thamme’s work on exposing TensorFlow via REST and Docker to provide a REST service to Tika, which allows running TensorFlow DL models. We initially did Inception_v3, and a model by Madhav Sharan

[jira] [Updated] (TIKA-2680) Email attachments to an email are not extracted

2018-07-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2680: -- Attachment: main_email_in_outlook.jpg > Email attachments to an email are not extracted >

[jira] [Commented] (TIKA-2680) Email attachments to an email are not extracted

2018-07-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535412#comment-16535412 ] Tim Allison commented on TIKA-2680: --- Given that Outlook appears to treat this as an attachment, are you

[jira] [Updated] (TIKA-2680) Email attachments to an email are not extracted

2018-07-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2680: -- Attachment: (was: main_email_in_outlook.jpg) > Email attachments to an email are not extracted >

[jira] [Updated] (TIKA-2680) Email attachments to an email are not extracted

2018-07-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2680: -- Attachment: main_email_in_outlook.jpg > Email attachments to an email are not extracted >

image recognition...how do the parts play together?

2018-07-06 Thread Tim Allison
On Twitter, Chris, Thamme, Thejan, and I are working with some deeplearning4j devs to help us upgrade to deeplearning4j 1.0.0-BETA (TIKA-2672). I initially requested help from Thejan (and Thamme :D) for this because we were getting an initialization exception after the upgrade in tika-dl's

[jira] [Comment Edited] (TIKA-2680) Email attachments to an email are not extracted

2018-07-06 Thread Yury Kats (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535351#comment-16535351 ] Yury Kats edited comment on TIKA-2680 at 7/6/18 9:07 PM: - Indeed, the first

[jira] [Commented] (TIKA-2680) Email attachments to an email are not extracted

2018-07-06 Thread Yury Kats (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535351#comment-16535351 ] Yury Kats commented on TIKA-2680: - Indeed, the first embedded rfc822 is not an attachment. I believe this

[jira] [Commented] (TIKA-2680) Email attachments to an email are not extracted

2018-07-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2680?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535339#comment-16535339 ] Tim Allison commented on TIKA-2680: --- Something like this? {noformat} multipart/mixed (uses

[jira] [Commented] (TIKA-2685) Email attached to an undeliverable email report are not extracted

2018-07-06 Thread Yury Kats (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535296#comment-16535296 ] Yury Kats commented on TIKA-2685: - Yes, correct, this is governed by RFC 3462; sorry I didn't mention this

[jira] [Commented] (TIKA-2685) Email attached to an undeliverable email report are not extracted

2018-07-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535288#comment-16535288 ] Tim Allison commented on TIKA-2685: --- https://tools.ietf.org/html/rfc3462 page 2 describes exactly

[jira] [Comment Edited] (TIKA-2685) Email attached to an undeliverable email report are not extracted

2018-07-06 Thread Yury Kats (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535271#comment-16535271 ] Yury Kats edited comment on TIKA-2685 at 7/6/18 8:03 PM: - delivery-status and

[jira] [Commented] (TIKA-2685) Email attached to an undeliverable email report are not extracted

2018-07-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535275#comment-16535275 ] Tim Allison commented on TIKA-2685: --- I think I agree...the first rfc822 (multipart/report) has three

[jira] [Comment Edited] (TIKA-2685) Email attached to an undeliverable email report are not extracted

2018-07-06 Thread Yury Kats (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535271#comment-16535271 ] Yury Kats edited comment on TIKA-2685 at 7/6/18 8:02 PM: - delivery-status and

[jira] [Comment Edited] (TIKA-2685) Email attached to an undeliverable email report are not extracted

2018-07-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535267#comment-16535267 ] Tim Allison edited comment on TIKA-2685 at 7/6/18 8:00 PM: --- Is this your

[jira] [Commented] (TIKA-2685) Email attached to an undeliverable email report are not extracted

2018-07-06 Thread Yury Kats (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535271#comment-16535271 ] Yury Kats commented on TIKA-2685: - delivery-status and message/rfc822 are inside multipart/report > Email

[jira] [Commented] (TIKA-2685) Email attached to an undeliverable email report are not extracted

2018-07-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535267#comment-16535267 ] Tim Allison commented on TIKA-2685: --- Is this your understanding of the structure? {noformat}

[jira] [Commented] (TIKA-2673) HtmlEncodingDetector doesn't follow the specification

2018-07-06 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535147#comment-16535147 ] Hudson commented on TIKA-2673: -- SUCCESS: Integrated in Jenkins build tika-branch-1x #56 (See

[jira] [Commented] (TIKA-2673) HtmlEncodingDetector doesn't follow the specification

2018-07-06 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535145#comment-16535145 ] Hudson commented on TIKA-2673: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1517 (See

[jira] [Commented] (TIKA-2685) Email attached to an undeliverable email report are not extracted

2018-07-06 Thread Yury Kats (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535135#comment-16535135 ] Yury Kats commented on TIKA-2685: - For my own immediate needs, I modified MimeStreamParser to call

[jira] [Commented] (TIKA-2673) HtmlEncodingDetector doesn't follow the specification

2018-07-06 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535130#comment-16535130 ] Hudson commented on TIKA-2673: -- UNSTABLE: Integrated in Jenkins build tika-2.x-windows #282 (See

[jira] [Commented] (TIKA-2685) Email attached to an undeliverable email report are not extracted

2018-07-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2685?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535100#comment-16535100 ] Tim Allison commented on TIKA-2685: --- [~yurykats], thank you for identifying this problem and TIKA-2680

[jira] [Assigned] (TIKA-2685) Email attached to an undeliverable email report are not extracted

2018-07-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2685?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison reassigned TIKA-2685: - Assignee: Tim Allison > Email attached to an undeliverable email report are not extracted >

[jira] [Commented] (TIKA-2673) HtmlEncodingDetector doesn't follow the specification

2018-07-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16535041#comment-16535041 ] Tim Allison commented on TIKA-2673: --- I've added this to both 'master' and 'branch_1x'. Let me know if

Re: Tika 1.19?

2018-07-06 Thread Chris Mattmann
Once tika-dl works again with Inception v4, I’m good ☺ I’m working on adding some more models to tika-dl and other things but those can come after 1.19. Cheers, Chris From: Tim Allison Reply-To: "dev@tika.apache.org" Date: Friday, July 6, 2018 at 8:40 AM To:

Tika 1.19?

2018-07-06 Thread Tim Allison
All, We've made quite a few improvements. What would you think of starting the release process in a couple of weeks...say, July 23ish? I'd like to complete the dl4j upgrade and update some of our dependencies so that we can at least build with Java 11. Any blockers or other things people

[jira] [Commented] (TIKA-2672) Upgrade dl4j to 1.0.0-beta

2018-07-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534994#comment-16534994 ] Tim Allison commented on TIKA-2672: --- Fantastic!  Thank you [~ThejanWijesinghe]!  > Upgrade dl4j to

[jira] [Comment Edited] (TIKA-2672) Upgrade dl4j to 1.0.0-beta

2018-07-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534994#comment-16534994 ] Tim Allison edited comment on TIKA-2672 at 7/6/18 3:30 PM: --- Fantastic!  Thank

[jira] [Commented] (TIKA-2673) HtmlEncodingDetector doesn't follow the specification

2018-07-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2673?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534992#comment-16534992 ] Tim Allison commented on TIKA-2673: --- [~gbouchar], thank you for contributing this!  I won't have time to

[jira] [Commented] (TIKA-2675) OpenDocumentParser should fail on invalid zip files

2018-07-06 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534990#comment-16534990 ] Hudson commented on TIKA-2675: -- SUCCESS: Integrated in Jenkins build tika-branch-1x #55 (See

[jira] [Commented] (TIKA-2672) Upgrade dl4j to 1.0.0-beta

2018-07-06 Thread Chris A. Mattmann (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534984#comment-16534984 ] Chris A. Mattmann commented on TIKA-2672: - GREAT WORK [~ThejanWijesinghe] thanks my guy > Upgrade

[jira] [Commented] (TIKA-2672) Upgrade dl4j to 1.0.0-beta

2018-07-06 Thread Thejan Wijesinghe (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2672?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534966#comment-16534966 ] Thejan Wijesinghe commented on TIKA-2672: - [~talli...@apache.org] sorry for the delay, so the dl4j

[jira] [Commented] (TIKA-2675) OpenDocumentParser should fail on invalid zip files

2018-07-06 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534858#comment-16534858 ] Hudson commented on TIKA-2675: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1516 (See

[jira] [Resolved] (TIKA-2675) OpenDocumentParser should fail on invalid zip files

2018-07-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2675. --- Resolution: Fixed. Fix Version/s: 2.0.0, 1.19. Thank you [~wastl-nagel]! >

[jira] [Commented] (TIKA-2675) OpenDocumentParser should fail on invalid zip files

2018-07-06 Thread ASF GitHub Bot (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2675?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534789#comment-16534789 ] ASF GitHub Bot commented on TIKA-2675: -- tballison closed pull request #240: TIKA-2675

[jira] [Commented] (TIKA-874) Identify FITS (Flexible Image Transport System) files

2018-07-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-874?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534784#comment-16534784 ] Tim Allison commented on TIKA-874: -- See TIKA-2684 for how to configure GDAL to parse FITS...many thanks to

[jira] [Resolved] (TIKA-2684) Tika does not extract *.fits header text, just file level metadata

2018-07-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2684. --- Resolution: Not A Problem. Not a Tika problem technically, but definitely an area for us to improve

[jira] [Commented] (TIKA-2684) Tika does not extract *.fits header text, just file level metadata

2018-07-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534780#comment-16534780 ] Tim Allison commented on TIKA-2684: --- W00t! Thank you [~chrismattmann]. [~sborda], I updated our

[jira] [Comment Edited] (TIKA-2684) Tika does not extract *.fits header text, just file level metadata

2018-07-06 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16534780#comment-16534780 ] Tim Allison edited comment on TIKA-2684 at 7/6/18 12:46 PM: W00t! Thank you