[jira] [Commented] (TIKA-1728) Detection is not working properly for detecting HWP 5.0 file
[ https://issues.apache.org/jira/browse/TIKA-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14732013#comment-14732013 ]

Tim Allison commented on TIKA-1728:
---

Should we ask whether the author of [java-hwp|https://github.com/ddoleye/java-hwp] would be willing to move to an Apache license and push to maven?

> Detection is not working properly for detecting HWP 5.0 file
>
> Key: TIKA-1728
> URL: https://issues.apache.org/jira/browse/TIKA-1728
> Project: Tika
> Issue Type: Bug
> Environment: OS: windows 7 and centos 6
> Java: 1.7
> Tika jar: tika-app-1.10.jar
> File: HWP 5.0
> Reporter: mungeol heo
> Attachments: HWP-document-file-formats-3.0-Korean.pdf, HWP-document-file-formats-5.0-Korean.pdf, error-message.png, test_3.0.hwp, test_5.0.hwp
>
> The HWP file format has two versions, HWP 3.0 and HWP 5.0.
> 'tika-app-1.10.jar' detects HWP 3.0 files correctly, but not HWP 5.0 files.
> The commands used and the results returned are listed below.
> java -jar tika-app-1.10.jar --detect test_3.0.hwp
> application/x-hwp
> java -jar tika-app-1.10.jar --detect test_5.0.hwp
> application/x-tika-msoffice

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
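One possible reading of the two results reported in TIKA-1728, offered as a sketch rather than a diagnosis: if HWP 5.0 files are stored in an OLE2 (Compound File Binary) container, a detector that only looks at the leading bytes can do no better than a generic OLE2 type, while an HWP 3.0 file can be recognised from its leading text. The class name `HwpMagicSketch`, the method names, and the assumed HWP 3.0 prefix string are illustrative only and are not taken from Tika's code.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical illustration only; this is not Tika's detector.
class HwpMagicSketch {
    // Signature at the start of every OLE2 / Compound File Binary file.
    private static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Assumption: an HWP 3.0 file begins with this ASCII text.
    private static final byte[] HWP3_MAGIC =
        "HWP Document File V3.00".getBytes(StandardCharsets.US_ASCII);

    // True if data begins with the given prefix.
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Classify a file from its leading bytes alone.
    static String classify(byte[] header) {
        if (startsWith(header, HWP3_MAGIC)) {
            return "application/x-hwp";
        }
        if (startsWith(header, OLE2_MAGIC)) {
            // Leading bytes cannot tell an OLE2-based HWP file from any
            // other OLE2 document, hence the generic type.
            return "application/x-tika-msoffice";
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] hwp3 = "HWP Document File V3.00 ...".getBytes(StandardCharsets.US_ASCII);
        System.out.println(classify(hwp3));
        System.out.println(classify(OLE2_MAGIC));
    }
}
```

Under these assumptions, telling HWP 5.0 apart from other OLE2 documents would require looking inside the container rather than at the first bytes.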
[jira] [Updated] (TIKA-1726) Augment public methods that use a java.io.File with methods that use a java.nio.file.Path
[ https://issues.apache.org/jira/browse/TIKA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yaniv Kunda updated TIKA-1726:
--
Description:
In light of Java 7 already EOL, it's high time we add support for the new java.nio.file.Path class introduced with it, which, together with support methods in java.nio.file.Files and others, provides a better file I/O framework than java.io.File.

In just two cases, we have public methods in tika that only return a File object, and cannot be overloaded, so a different name for the new method must be created:
- {{org.apache.tika.io.TemporaryResources#createTemporaryFile()}}
_Suggestions:_
-- addTemporaryFile
-- addTempFile
-- createTempFile
- {{org.apache.tika.io.TikaInputStream#getFile()}}
_Suggestions:_
-- asFile
-- toPath
-- getPath

In other cases, the methods accept a File as an argument, and should remain as tika users might be using them - so an overloaded method that accepts a Path instead should be added, referencing the new method from the old one (using the @see tag) and deprecating the old method until an unknown tika major release.
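A minimal sketch of the overload pattern described above: the File-based method stays in place, points to the Path-based overload with an @see tag, is marked deprecated, and simply delegates. The class `PathOverloadSketch` and the method `extensionOf` are hypothetical and stand in for any of the File-accepting methods; they are not part of tika.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration only; not an existing tika class.
class PathOverloadSketch {

    /**
     * Existing File-based entry point, kept so current callers still compile.
     *
     * @see #extensionOf(Path)
     * @deprecated use {@link #extensionOf(Path)} instead
     */
    @Deprecated
    static String extensionOf(File file) {
        // Delegate to the Path overload so there is one implementation.
        return extensionOf(file.toPath());
    }

    /** New Path-based overload holding the actual logic. */
    static String extensionOf(Path path) {
        Path fileName = path.getFileName();
        if (fileName == null) {
            return ""; // e.g. a root path has no file name
        }
        String name = fileName.toString();
        int dot = name.lastIndexOf('.');
        return dot < 0 ? "" : name.substring(dot + 1);
    }

    public static void main(String[] args) {
        System.out.println(extensionOf(Paths.get("test_5.0.hwp")));
        System.out.println(extensionOf(new File("test_3.0.hwp")));
    }
}
```

Keeping the logic in the Path overload means the File variant can later be removed without touching the implementation.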

Here is the full list of other methods:
_tika-app:_
- {{org.apache.tika.gui.TikaGUI#openFile(File)}}
_tika-batch:_
- {{org.apache.tika.batch.fs.FSUtil#getOutputFile(File, String, HANDLE_EXISTING, String)}}
- {{org.apache.tika.util.PropsUtil#getFile(String, File)}}
- {{org.apache.tika.batch.fs.FSDirectoryCrawler}} constructors
- {{org.apache.tika.batch.fs.FSDirectoryCrawler#handleFirstFileInDirectory(File)}}
- {{org.apache.tika.batch.fs.FSFileResource}} constructor
- {{org.apache.tika.batch.fs.FSListCrawler}} constructor
- {{org.apache.tika.batch.fs.FSOutputStreamFactory}} constructor
- {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfOrSameAsThat(File, File)}}
- {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfThat(File, File)}}
- {{org.apache.tika.batch.fs.strawman.StrawManTikaAppDriver}} constructor
_tika-core:_
- {{org.apache.tika.Tika#detect(File)}}
- {{org.apache.tika.Tika#parse(File)}}
- {{org.apache.tika.Tika#parseToString(File)}}
- {{org.apache.tika.config.TikaConfig}} constructors
- {{org.apache.tika.detect.NNExampleModelDetector}} constructor
- {{org.apache.tika.detect.TrainedModelDetector#loadDefaultModels(File)}}
- {{org.apache.tika.io.TemporaryResources#setTemporaryFileDirectory(File)}}
- {{org.apache.tika.io.TikaInputStream#get(File)}}
- {{org.apache.tika.io.TikaInputStream#get(File, Metadata)}}
_tika-parsers:_
- {{org.apache.tika.parser.ParsingReader}} constructor
- {{org.apache.tika.parser.image.ImageMetadataExtractor#parseJpeg(File)}}
- {{org.apache.tika.parser.image.ImageMetadataExtractor#parseWebP(File)}}
- {{org.apache.tika.parser.mp4.DirectFileReadDataSource}} constructor
_tika-translate:_
- {{org.apache.tika.language.translate.ExternalTranslator#runAndGetOutput(String, String[], File)}}

Due to lack of evidence, all public methods in public non-test classes (and not in tika-example) are deemed part of a public API - although there's no formal definition of such.
If anyone knows of a public method which isn't accessed publicly and can be defined as package-private, or for another reason, please comment.

was:
In light of Java 7 already EOL, it's high time we add support for the new java.nio.file.Path class introduced with it, which, together with support methods in java.nio.file.Files and others, provides a better file I/O framework than java.io.File.

In just two cases, we have public methods in tika that only return a File object, and cannot be overloaded, so a different name for the new method must be created:
- {{org.apache.tika.io.TemporaryResources#createTemporaryFile()}}
_Suggestions:_
-- addTemporaryFile
-- addTempFile
-- createTempFile
- {{org.apache.tika.io.TikaInputStream#getFile()}}
_Suggestions:_
-- asFile
-- toPath
-- getPath

In other cases, the methods accept a File as an argument, and should remain as tika users might be using them - so an overloaded method that accepts a Path instead should be added, deprecating the old method until an unknown tika major release.

Here is the full list of other methods:
_tika-app:_
- {{org.apache.tika.gui.TikaGUI#openFile(File)}}
_tika-batch:_
- {{org.apache.tika.batch.fs.FSUtil#getOutputFile(File, String, HANDLE_EXISTING, String)}}
- {{org.apache.tika.util.PropsUtil#getFile(String, File)}}
- {{org.apache.tika.batch.fs.FSDirectoryCrawler}} constructors
- {{org.apache.tika.batch.fs.FSDirectoryCrawler#handleFirstFileInDirectory(File)}}
- {{org.apache.tika.batch.fs.FSFileResource}} constructor
- {{org.apache.tika.batch.fs.FSListCrawler}} constructor
- {{org.apache.tika.batch.fs.FSOutputStreamFactory}} constructor
- {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfOrSameAsThat(File, File)}}
- {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfThat(File, File)}}
- {{org.apache.tika.batch.fs.strawman.StrawManTikaAppDriver}} constructor
[jira] [Updated] (TIKA-1726) Augment public methods that use a java.io.File with methods that use a java.nio.file.Path
[ https://issues.apache.org/jira/browse/TIKA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yaniv Kunda updated TIKA-1726:
--
Description:
In light of Java 7 already EOL, it's high time we add support for the new java.nio.file.Path class introduced with it, which, together with support methods in java.nio.file.Files and others, provides a better file I/O framework than java.io.File.

In just two cases, we have public methods in tika that only return a File object, and cannot be overloaded, so a different name for the new method must be created:
- {{org.apache.tika.io.TemporaryResources#createTemporaryFile()}}
_Suggestions:_
-- addTemporaryFile
-- addTempFile
-- createTempFile
-- createTemporaryPath
- {{org.apache.tika.io.TikaInputStream#getFile()}}
_Suggestions:_
-- asFile
-- toPath
-- getPath

In other cases, the methods accept a File as an argument, and should remain as tika users might be using them - so an overloaded method that accepts a Path instead should be added, referencing the new method from the old one (using the @see tag) until java.io.File itself is deprecated or otherwise becomes obsolete.
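For the two methods that only return a File, Java cannot overload on return type alone, which is why a differently named method is needed. The sketch below shows one shape this could take, using getPath from the suggestions above; the class `FileBackedSketch` is hypothetical and is not how TikaInputStream or TemporaryResources is implemented.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical illustration only; not an existing tika class.
class FileBackedSketch {
    private final Path path;

    FileBackedSketch(Path path) {
        this.path = path;
    }

    /**
     * Existing accessor. A second getFile() returning Path would not
     * compile, because the two signatures would differ only by return type.
     *
     * @see #getPath()
     */
    File getFile() {
        return path.toFile();
    }

    /** New accessor under a different name, returning the Path directly. */
    Path getPath() {
        return path;
    }

    public static void main(String[] args) {
        FileBackedSketch sketch = new FileBackedSketch(Paths.get("tmp", "data.bin"));
        System.out.println(sketch.getPath());
        System.out.println(sketch.getFile());
    }
}
```

Storing the Path internally and deriving the File on demand is only one option; the reverse (storing a File and calling toPath()) would serve existing callers equally well.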

Here is the full list of other methods:
_tika-app:_
- {{org.apache.tika.gui.TikaGUI#openFile(File)}}
_tika-batch:_
- {{org.apache.tika.batch.fs.FSUtil#getOutputFile(File, String, HANDLE_EXISTING, String)}}
- {{org.apache.tika.util.PropsUtil#getFile(String, File)}}
- {{org.apache.tika.batch.fs.FSDirectoryCrawler}} constructors
- {{org.apache.tika.batch.fs.FSDirectoryCrawler#handleFirstFileInDirectory(File)}}
- {{org.apache.tika.batch.fs.FSFileResource}} constructor
- {{org.apache.tika.batch.fs.FSListCrawler}} constructor
- {{org.apache.tika.batch.fs.FSOutputStreamFactory}} constructor
- {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfOrSameAsThat(File, File)}}
- {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfThat(File, File)}}
- {{org.apache.tika.batch.fs.strawman.StrawManTikaAppDriver}} constructor
_tika-core:_
- {{org.apache.tika.Tika#detect(File)}}
- {{org.apache.tika.Tika#parse(File)}}
- {{org.apache.tika.Tika#parseToString(File)}}
- {{org.apache.tika.config.TikaConfig}} constructors
- {{org.apache.tika.detect.NNExampleModelDetector}} constructor
- {{org.apache.tika.detect.TrainedModelDetector#loadDefaultModels(File)}}
- {{org.apache.tika.io.TemporaryResources#setTemporaryFileDirectory(File)}}
- {{org.apache.tika.io.TikaInputStream#get(File)}}
- {{org.apache.tika.io.TikaInputStream#get(File, Metadata)}}
_tika-parsers:_
- {{org.apache.tika.parser.ParsingReader}} constructor
- {{org.apache.tika.parser.image.ImageMetadataExtractor#parseJpeg(File)}}
- {{org.apache.tika.parser.image.ImageMetadataExtractor#parseWebP(File)}}
- {{org.apache.tika.parser.mp4.DirectFileReadDataSource}} constructor
_tika-translate:_
- {{org.apache.tika.language.translate.ExternalTranslator#runAndGetOutput(String, String[], File)}}

Due to lack of evidence, all public methods in public non-test classes (and not in tika-example) are deemed part of a public API - although there's no formal definition of such.
If anyone knows of a public method which isn't accessed publicly and can be defined as package-private, or for another reason, please comment.

was:
In light of Java 7 already EOL, it's high time we add support for the new java.nio.file.Path class introduced with it, which, together with support methods in java.nio.file.Files and others, provides a better file I/O framework than java.io.File.

In just two cases, we have public methods in tika that only return a File object, and cannot be overloaded, so a different name for the new method must be created:
- {{org.apache.tika.io.TemporaryResources#createTemporaryFile()}}
_Suggestions:_
-- addTemporaryFile
-- addTempFile
-- createTempFile
- {{org.apache.tika.io.TikaInputStream#getFile()}}
_Suggestions:_
-- asFile
-- toPath
-- getPath

In other cases, the methods accept a File as an argument, and should remain as tika users might be using them - so an overloaded method that accepts a Path instead should be added, referencing the new method from the old one (using the @see tag) and deprecating the old method until an unknown tika major release.

Here is the full list of other methods:
_tika-app:_
- {{org.apache.tika.gui.TikaGUI#openFile(File)}}
_tika-batch:_
- {{org.apache.tika.batch.fs.FSUtil#getOutputFile(File, String, HANDLE_EXISTING, String)}}
- {{org.apache.tika.util.PropsUtil#getFile(String, File)}}
- {{org.apache.tika.batch.fs.FSDirectoryCrawler}} constructors
- {{org.apache.tika.batch.fs.FSDirectoryCrawler#handleFirstFileInDirectory(File)}}
- {{org.apache.tika.batch.fs.FSFileResource}} constructor
- {{org.apache.tika.batch.fs.FSListCrawler}} constructor
- {{org.apache.tika.batch.fs.FSOutputStreamFactory}} constructor
- {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfOrSameAsThat(File, File)}}
-