[ https://issues.apache.org/jira/browse/TIKA-2092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Craig Pfeifer updated TIKA-2092:
--------------------------------
    Description:
"A general-purpose, deep learning-based system to decompile an image into presentational markup."

As described in:
What You Get Is What You See: A Visual Markup Decompiler
Yuntian Deng, Anssi Kanervisto, and Alexander M. Rush
http://arxiv.org/pdf/1609.04938v1.pdf

code here:
https://github.com/harvardnlp/im2markup

demo here:
http://lstm.seas.harvard.edu/latex/

Would be useful as part of the OCR efforts in tika.

  was:
"A general-purpose, deep learning-based system to decompile an image into presentational markup."

As described in:
What You Get Is What You See: A Visual Markup Decompiler
Yuntian Deng, Anssi Kanervisto, and Alexander M. Rush
http://arxiv.org/pdf/1609.04938v1.pdf

code here:
https://github.com/harvardnlp/im2markup

Would be useful as part of the OCR efforts in tika.

> Integrate Math equation image extraction
> ----------------------------------------
>
>                 Key: TIKA-2092
>                 URL: https://issues.apache.org/jira/browse/TIKA-2092
>             Project: Tika
>          Issue Type: Improvement
>          Components: detector, ocr
>   Affects Versions: 2.0
>            Reporter: Craig Pfeifer
>              Labels: deeplearning, image, parse