[jira] [Commented] (TIKA-1728) Detection is not working properly for detecting HWP 5.0 file

2015-09-05 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1728?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14732013#comment-14732013
 ] 

Tim Allison commented on TIKA-1728:
---

Should we ask whether the author of 
[java-hwp|https://github.com/ddoleye/java-hwp] would be willing to move to an 
Apache license and publish it to Maven?

> Detection is not working properly for detecting HWP 5.0 file
> 
>
> Key: TIKA-1728
> URL: https://issues.apache.org/jira/browse/TIKA-1728
> Project: Tika
> Issue Type: Bug
> Environment: OS: Windows 7 and CentOS 6
> Java: 1.7
> Tika jar: tika-app-1.10.jar
> File: HWP 5.0
> Reporter: mungeol heo
> Attachments: HWP-document-file-formats-3.0-Korean.pdf, 
> HWP-document-file-formats-5.0-Korean.pdf, error-message.png, test_3.0.hwp, 
> test_5.0.hwp
>
>
> The HWP file format has two versions: HWP 3.0 and HWP 5.0.
> 'tika-app-1.10.jar' detects HWP 3.0 files correctly,
> but not HWP 5.0 files.
> The commands used and the results returned are listed below.
> > java -jar tika-app-1.10.jar --detect test_3.0.hwp
> > application/x-hwp
> > java -jar tika-app-1.10.jar --detect test_5.0.hwp
> > application/x-tika-msoffice
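
A likely explanation for the result above, offered as an assumption rather than a confirmed diagnosis: HWP 5.0 files are stored in an OLE2 compound document container, which begins with the same 8-byte signature as legacy MS Office files, whereas HWP 3.0 files begin with a plain-text signature. The sketch below uses only the JDK and is not Tika's detection code; the class name, method names and the HWP 3.0 signature prefix are illustrative assumptions. It shows why a check based on leading bytes alone cannot tell HWP 5.0 apart from other OLE2 documents.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

final class HwpMagicSketch {
    // OLE2 compound document signature, shared by legacy MS Office files.
    static final byte[] OLE2_MAGIC = {
        (byte) 0xD0, (byte) 0xCF, 0x11, (byte) 0xE0,
        (byte) 0xA1, (byte) 0xB1, 0x1A, (byte) 0xE1
    };

    // Assumed prefix of the HWP 3.0 text signature.
    static final byte[] HWP3_MAGIC =
        "HWP Document File".getBytes(StandardCharsets.US_ASCII);

    // True if data begins with the given prefix.
    static boolean startsWith(byte[] data, byte[] prefix) {
        if (data.length < prefix.length) {
            return false;
        }
        return Arrays.equals(Arrays.copyOf(data, prefix.length), prefix);
    }

    // Classifies a file header by its leading bytes only.
    static String classify(byte[] header) {
        if (startsWith(header, HWP3_MAGIC)) {
            return "application/x-hwp";
        }
        if (startsWith(header, OLE2_MAGIC)) {
            // Any OLE2 container lands here, HWP 5.0 included.
            return "application/x-tika-msoffice";
        }
        return "application/octet-stream";
    }

    public static void main(String[] args) {
        byte[] hwp3 = "HWP Document File V3.00".getBytes(StandardCharsets.US_ASCII);
        System.out.println(classify(hwp3));
        System.out.println(classify(OLE2_MAGIC));
    }
}
```

Distinguishing HWP 5.0 from other OLE2 files would require looking inside the container, for example at its stream names, which this sketch does not attempt.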



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Updated] (TIKA-1726) Augment public methods that use a java.io.File with methods that use a java.nio.file.Path

2015-09-05 Thread Yaniv Kunda (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yaniv Kunda updated TIKA-1726:
--
Description: 
Now that Java 7 itself has reached EOL, it's high time we added support for the 
java.nio.file.Path class introduced with it, which, together with the support 
methods in java.nio.file.Files and others, provides a better file I/O framework 
than java.io.File.

In just two cases, public methods in Tika only return a File object and so 
cannot be overloaded (Java does not overload on return type alone), so the new 
method needs a different name:
- {{org.apache.tika.io.TemporaryResources#createTemporaryFile()}}
_Suggestions:_
-- addTemporaryFile
-- addTempFile
-- createTempFile
- {{org.apache.tika.io.TikaInputStream#getFile()}}
_Suggestions:_
-- asFile
-- toPath
-- getPath

In other cases, the methods accept a File as an argument and should remain, 
since Tika users might be using them. An overloaded method that accepts a Path 
instead should be added alongside, with the old method referencing the new one 
(using the @see tag) and deprecated until an unknown future Tika major 
release.
Here is the full list of other methods:
_tika-app:_
- {{org.apache.tika.gui.TikaGUI#openFile(File)}}

_tika-batch:_
- {{org.apache.tika.batch.fs.FSUtil#getOutputFile(File, String, HANDLE_EXISTING, String)}}
- {{org.apache.tika.util.PropsUtil#getFile(String, File)}}
- {{org.apache.tika.batch.fs.FSDirectoryCrawler}} constructors
- {{org.apache.tika.batch.fs.FSDirectoryCrawler#handleFirstFileInDirectory(File)}}
- {{org.apache.tika.batch.fs.FSFileResource}} constructor
- {{org.apache.tika.batch.fs.FSListCrawler}} constructor
- {{org.apache.tika.batch.fs.FSOutputStreamFactory}} constructor
- {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfOrSameAsThat(File, File)}}
- {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfThat(File, File)}}
- {{org.apache.tika.batch.fs.strawman.StrawManTikaAppDriver}} constructor

_tika-core:_
- {{org.apache.tika.Tika#detect(File)}}
- {{org.apache.tika.Tika#parse(File)}}
- {{org.apache.tika.Tika#parseToString(File)}}
- {{org.apache.tika.config.TikaConfig}} constructors
- {{org.apache.tika.detect.NNExampleModelDetector}} constructor
- {{org.apache.tika.detect.TrainedModelDetector#loadDefaultModels(File)}}
- {{org.apache.tika.io.TemporaryResources#setTemporaryFileDirectory(File)}}
- {{org.apache.tika.io.TikaInputStream#get(File)}}
- {{org.apache.tika.io.TikaInputStream#get(File, Metadata)}}

_tika-parsers:_
- {{org.apache.tika.parser.ParsingReader}} constructor
- {{org.apache.tika.parser.image.ImageMetadataExtractor#parseJpeg(File)}}
- {{org.apache.tika.parser.image.ImageMetadataExtractor#parseWebP(File)}}
- {{org.apache.tika.parser.mp4.DirectFileReadDataSource}} constructor

_tika-translate:_
- {{org.apache.tika.language.translate.ExternalTranslator#runAndGetOutput(String, String[], File)}}

For lack of evidence to the contrary, all public methods in public non-test 
classes (outside tika-example) are deemed part of the public API, although 
there is no formal definition of one.
If anyone knows of a public method that isn't accessed publicly and can be 
made package-private, or should be excluded for another reason, please comment.
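
The overload pattern proposed above for methods that take a File argument can be sketched as follows. This is a minimal illustration using only the JDK, not Tika code: the class PathOverloadSketch and the method sizeOf are hypothetical names chosen for the example.

```java
import java.io.File;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class PathOverloadSketch {
    /**
     * Existing File-based entry point, kept so current callers still compile.
     *
     * @see #sizeOf(Path)
     */
    static long sizeOf(File file) throws IOException {
        // Delegate to the Path overload so there is one implementation.
        return sizeOf(file.toPath());
    }

    /** New overload that does the actual work through java.nio.file.Files. */
    static long sizeOf(Path path) throws IOException {
        return Files.size(path);
    }

    public static void main(String[] args) throws IOException {
        Path tmp = Files.createTempFile("overload-sketch", ".bin");
        try {
            Files.write(tmp, new byte[] {1, 2, 3});
            // Both overloads report the same size for the same file.
            System.out.println(sizeOf(tmp.toFile()) == sizeOf(tmp));
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

One consequence worth keeping in mind: once both overloads exist, a call such as sizeOf(null) no longer compiles because it is ambiguous, so callers that pass a literal null would need a cast.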


  was:
Now that Java 7 itself has reached EOL, it's high time we added support for the 
java.nio.file.Path class introduced with it, which, together with the support 
methods in java.nio.file.Files and others, provides a better file I/O framework 
than java.io.File.

In just two cases, public methods in Tika only return a File object and so 
cannot be overloaded (Java does not overload on return type alone), so the new 
method needs a different name:
- {{org.apache.tika.io.TemporaryResources#createTemporaryFile()}}
_Suggestions:_
-- addTemporaryFile
-- addTempFile
-- createTempFile
- {{org.apache.tika.io.TikaInputStream#getFile()}}
_Suggestions:_
-- asFile
-- toPath
-- getPath

In other cases, the methods accept a File as an argument and should remain, 
since Tika users might be using them. An overloaded method that accepts a Path 
instead should be added alongside, and the old method deprecated until an 
unknown future Tika major release.
Here is the full list of other methods:
_tika-app:_
- {{org.apache.tika.gui.TikaGUI#openFile(File)}}

_tika-batch:_
- {{org.apache.tika.batch.fs.FSUtil#getOutputFile(File, String, HANDLE_EXISTING, String)}}
- {{org.apache.tika.util.PropsUtil#getFile(String, File)}}
- {{org.apache.tika.batch.fs.FSDirectoryCrawler}} constructors
- {{org.apache.tika.batch.fs.FSDirectoryCrawler#handleFirstFileInDirectory(File)}}
- {{org.apache.tika.batch.fs.FSFileResource}} constructor
- {{org.apache.tika.batch.fs.FSListCrawler}} constructor
- {{org.apache.tika.batch.fs.FSOutputStreamFactory}} constructor
- {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfOrSameAsThat(File, File)}}
- {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfThat(File, File)}}
- {{org.apache.tika.batch.fs.strawman.StrawManTikaAppDriver}} constructor


[jira] [Updated] (TIKA-1726) Augment public methods that use a java.io.File with methods that use a java.nio.file.Path

2015-09-05 Thread Yaniv Kunda (JIRA)

 [ 
https://issues.apache.org/jira/browse/TIKA-1726?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yaniv Kunda updated TIKA-1726:
--
Description: 
Now that Java 7 itself has reached EOL, it's high time we added support for the 
java.nio.file.Path class introduced with it, which, together with the support 
methods in java.nio.file.Files and others, provides a better file I/O framework 
than java.io.File.

In just two cases, public methods in Tika only return a File object and so 
cannot be overloaded (Java does not overload on return type alone), so the new 
method needs a different name:
- {{org.apache.tika.io.TemporaryResources#createTemporaryFile()}}
_Suggestions:_
-- addTemporaryFile
-- addTempFile
-- createTempFile
-- createTemporaryPath
- {{org.apache.tika.io.TikaInputStream#getFile()}}
_Suggestions:_
-- asFile
-- toPath
-- getPath

In other cases, the methods accept a File as an argument and should remain, 
since Tika users might be using them. An overloaded method that accepts a Path 
instead should be added alongside, with the old method referencing the new one 
(using the @see tag) until java.io.File itself is deprecated or otherwise 
becomes obsolete.
Here is the full list of other methods:
_tika-app:_
- {{org.apache.tika.gui.TikaGUI#openFile(File)}}

_tika-batch:_
- {{org.apache.tika.batch.fs.FSUtil#getOutputFile(File, String, HANDLE_EXISTING, String)}}
- {{org.apache.tika.util.PropsUtil#getFile(String, File)}}
- {{org.apache.tika.batch.fs.FSDirectoryCrawler}} constructors
- {{org.apache.tika.batch.fs.FSDirectoryCrawler#handleFirstFileInDirectory(File)}}
- {{org.apache.tika.batch.fs.FSFileResource}} constructor
- {{org.apache.tika.batch.fs.FSListCrawler}} constructor
- {{org.apache.tika.batch.fs.FSOutputStreamFactory}} constructor
- {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfOrSameAsThat(File, File)}}
- {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfThat(File, File)}}
- {{org.apache.tika.batch.fs.strawman.StrawManTikaAppDriver}} constructor

_tika-core:_
- {{org.apache.tika.Tika#detect(File)}}
- {{org.apache.tika.Tika#parse(File)}}
- {{org.apache.tika.Tika#parseToString(File)}}
- {{org.apache.tika.config.TikaConfig}} constructors
- {{org.apache.tika.detect.NNExampleModelDetector}} constructor
- {{org.apache.tika.detect.TrainedModelDetector#loadDefaultModels(File)}}
- {{org.apache.tika.io.TemporaryResources#setTemporaryFileDirectory(File)}}
- {{org.apache.tika.io.TikaInputStream#get(File)}}
- {{org.apache.tika.io.TikaInputStream#get(File, Metadata)}}

_tika-parsers:_
- {{org.apache.tika.parser.ParsingReader}} constructor
- {{org.apache.tika.parser.image.ImageMetadataExtractor#parseJpeg(File)}}
- {{org.apache.tika.parser.image.ImageMetadataExtractor#parseWebP(File)}}
- {{org.apache.tika.parser.mp4.DirectFileReadDataSource}} constructor

_tika-translate:_
- {{org.apache.tika.language.translate.ExternalTranslator#runAndGetOutput(String, String[], File)}}

For lack of evidence to the contrary, all public methods in public non-test 
classes (outside tika-example) are deemed part of the public API, although 
there is no formal definition of one.
If anyone knows of a public method that isn't accessed publicly and can be 
made package-private, or should be excluded for another reason, please comment.
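
For the two methods that return a File, the reason a new name is needed can be shown with a small sketch. Java does not allow two methods that differ only in return type, so a Path-returning variant has to sit next to the File-returning one under a different name. The class ResourceSketch below is hypothetical and only mimics the shape of the problem; getPath is one of the names suggested in the description, not a decision.

```java
import java.io.File;
import java.nio.file.Path;
import java.nio.file.Paths;

final class ResourceSketch {
    private final Path path;

    ResourceSketch(Path path) {
        this.path = path;
    }

    /**
     * Existing accessor. Its File return type cannot change without
     * breaking callers, and "Path getFile()" could not coexist with it.
     *
     * @see #getPath()
     */
    File getFile() {
        return path.toFile();
    }

    /** New accessor under a different name, returning the Path directly. */
    Path getPath() {
        return path;
    }

    public static void main(String[] args) {
        ResourceSketch r = new ResourceSketch(Paths.get("docs", "sample.hwp"));
        // The two accessors describe the same location.
        System.out.println(r.getFile().toPath().equals(r.getPath()));
    }
}
```

Keeping the File accessor as a thin view over the stored Path means existing callers keep working while new code can use the Path directly.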


  was:
Now that Java 7 itself has reached EOL, it's high time we added support for the 
java.nio.file.Path class introduced with it, which, together with the support 
methods in java.nio.file.Files and others, provides a better file I/O framework 
than java.io.File.

In just two cases, public methods in Tika only return a File object and so 
cannot be overloaded (Java does not overload on return type alone), so the new 
method needs a different name:
- {{org.apache.tika.io.TemporaryResources#createTemporaryFile()}}
_Suggestions:_
-- addTemporaryFile
-- addTempFile
-- createTempFile
- {{org.apache.tika.io.TikaInputStream#getFile()}}
_Suggestions:_
-- asFile
-- toPath
-- getPath

In other cases, the methods accept a File as an argument and should remain, 
since Tika users might be using them. An overloaded method that accepts a Path 
instead should be added alongside, with the old method referencing the new one 
(using the @see tag) and deprecated until an unknown future Tika major 
release.
Here is the full list of other methods:
_tika-app:_
- {{org.apache.tika.gui.TikaGUI#openFile(File)}}

_tika-batch:_
- {{org.apache.tika.batch.fs.FSUtil#getOutputFile(File, String, HANDLE_EXISTING, String)}}
- {{org.apache.tika.util.PropsUtil#getFile(String, File)}}
- {{org.apache.tika.batch.fs.FSDirectoryCrawler}} constructors
- {{org.apache.tika.batch.fs.FSDirectoryCrawler#handleFirstFileInDirectory(File)}}
- {{org.apache.tika.batch.fs.FSFileResource}} constructor
- {{org.apache.tika.batch.fs.FSListCrawler}} constructor
- {{org.apache.tika.batch.fs.FSOutputStreamFactory}} constructor
- {{org.apache.tika.batch.fs.FSUtil#checkThisIsAncestorOfOrSameAsThat(File, File)}}
-