[jira] [Created] (TIKA-1539) GRB file magic bytes and extension matching

2015-02-03 Thread Luke sh (JIRA)
Luke sh created TIKA-1539: - Summary: GRB file magic bytes and extension matching Key: TIKA-1539 URL: https://issues.apache.org/jira/browse/TIKA-1539 Project: Tika Issue Type: Improvement

[jira] [Commented] (TIKA-1539) GRB file magic bytes and extension matching

2015-02-05 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14308212#comment-14308212 ] Luke sh commented on TIKA-1539: --- pull request #28, add grb files for unit tests. GRB file

[jira] [Commented] (TIKA-1517) MIME type detection with probability

2015-01-17 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14281254#comment-14281254 ] Luke sh commented on TIKA-1517: --- Basic feature design: Probability selection mechanism is

[jira] [Updated] (TIKA-1517) MIME type detection with probability

2015-01-15 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1517: -- Description: Improvement and intuition The original implementation for MIME type selection/detection is a bit

[jira] [Commented] (TIKA-1521) Handle password protected 7zip files

2015-01-27 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14293944#comment-14293944 ] Luke sh commented on TIKA-1521: --- Hi @Nick Burch, I just came across this problem, I had a

[jira] [Commented] (TIKA-1517) MIME type selection with probability

2015-01-28 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295928#comment-14295928 ] Luke sh commented on TIKA-1517: --- the probability selection will inherit the class MIMETypes,

[jira] [Commented] (TIKA-1535) Inheritance modification for the class MIMETypes

2015-01-28 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295922#comment-14295922 ] Luke sh commented on TIKA-1535: --- TIKA-1517, the mime type selection mechanism with

[jira] [Comment Edited] (TIKA-1517) MIME type selection with probability

2015-01-28 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14295928#comment-14295928 ] Luke sh edited comment on TIKA-1517 at 1/28/15 11:06 PM: - the

[jira] [Updated] (TIKA-1561) GCMD Directory Interchange Format (.dif) identification

2015-02-25 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1561: -- Description: cited from the http://gcmd.nasa.gov/add/difguide/WRITEADIF.pdf The Directory Interchange Format

[jira] [Updated] (TIKA-1561) GCMD Directory Interchange Format (.dif) identification

2015-02-25 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1561: -- Attachment: (was: carbon_isotopic_values_of_alkanes_extracted_from_paleosols.dif) GCMD Directory

[jira] [Issue Comment Deleted] (TIKA-1561) GCMD Directory Interchange Format (.dif) identification

2015-02-25 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1561: -- Comment: was deleted (was: sample dif file) GCMD Directory Interchange Format (.dif) identification

[jira] [Created] (TIKA-1561) GCMD Directory Interchange Format (.dif) identification

2015-02-25 Thread Luke sh (JIRA)
Luke sh created TIKA-1561: - Summary: GCMD Directory Interchange Format (.dif) identification Key: TIKA-1561 URL: https://issues.apache.org/jira/browse/TIKA-1561 Project: Tika Issue Type: Improvement

[jira] [Updated] (TIKA-1561) GCMD Directory Interchange Format (.dif) identification

2015-02-25 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1561: -- Attachment: carbon_isotopic_values_of_alkanes_extracted_from_paleosols.dif sample dif file GCMD Directory

[jira] [Updated] (TIKA-1561) GCMD Directory Interchange Format (.dif) identification

2015-02-25 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1561: -- Description: cited from the http://gcmd.nasa.gov/add/difguide/WRITEADIF.pdf The Directory Interchange Format

[jira] [Commented] (TIKA-1561) GCMD Directory Interchange Format (.dif) identification

2015-02-25 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14337774#comment-14337774 ] Luke sh commented on TIKA-1561: --- I am going to send a pull request with this dif type

[jira] [Updated] (TIKA-1561) GCMD Directory Interchange Format (.dif) identification

2015-02-25 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1561: -- Attachment: carbon_isotopic_values_of_alkanes_extracted_from_paleosols.dif sample dif file GCMD Directory

[jira] [Updated] (TIKA-1561) GCMD Directory Interchange Format (.dif) identification

2015-02-25 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1561: -- Description: cited from the http://gcmd.nasa.gov/add/difguide/WRITEADIF.pdf The Directory Interchange Format

[jira] [Updated] (TIKA-1561) GCMD Directory Interchange Format (.dif) identification

2015-02-25 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1561: -- Description: cited from the http://gcmd.nasa.gov/add/difguide/WRITEADIF.pdf The Directory Interchange Format

[jira] [Created] (TIKA-1582) Mime Detection based on neural networks with Byte-frequency-histogram

2015-03-26 Thread Luke sh (JIRA)
Luke sh created TIKA-1582: - Summary: Mime Detection based on neural networks with Byte-frequency-histogram Key: TIKA-1582 URL: https://issues.apache.org/jira/browse/TIKA-1582 Project: Tika Issue

[jira] [Commented] (TIKA-1582) Mime Detection based on neural networks with Byte-frequency-histogram

2015-03-26 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14382590#comment-14382590 ] Luke sh commented on TIKA-1582: --- a pull request with this feature for Tika will be created

[jira] [Updated] (TIKA-1582) Mime Detection based on neural networks with Byte-frequency-histogram

2015-03-26 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1582: -- Description: Content-based mime type detection is one of the popular approaches to detect mime type, there are

[jira] [Updated] (TIKA-1582) Mime Detection based on neural networks with Byte-frequency-histogram

2015-03-28 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1582: -- Attachment: nnmodel.docx Documentation Mime Detection based on neural networks with Byte-frequency-histogram

[jira] [Updated] (TIKA-1582) Mime Detection based on neural networks with Byte-frequency-histogram

2015-03-28 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1582: -- Attachment: week2-report-histogram comparison.docx histogram comparison Mime Detection based on neural

[jira] [Updated] (TIKA-1582) Mime Detection based on neural networks with Byte-frequency-histogram

2015-03-28 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1582: -- Attachment: week6 report.docx Test report Mime Detection based on neural networks with

[jira] [Updated] (TIKA-1535) Inheritance modification for the class MIMETypes

2015-01-29 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1535: -- Description: The Class MIMETypes does not currently allow for inheritance. There are a couple of methods in

[jira] [Comment Edited] (TIKA-1517) MIME type selection with probability

2015-04-01 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391476#comment-14391476 ] Luke sh edited comment on TIKA-1517 at 4/2/15 1:30 AM: --- After some

[jira] [Updated] (TIKA-1610) CBOR Parser and detection improvement

2015-04-21 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1610: -- Attachment: cbor_tika.mimetypes.xml.jpg rfc_cbor.jpg CBOR Parser and detection improvement

[jira] [Updated] (TIKA-1610) CBOR Parser and detection improvement

2015-04-21 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1610: -- Description: CBOR is a data format whose design goals include the possibility of extremely small code size,

[jira] [Created] (TIKA-1610) CBOR Parser and detection improvement

2015-04-21 Thread Luke sh (JIRA)
Luke sh created TIKA-1610: - Summary: CBOR Parser and detection improvement Key: TIKA-1610 URL: https://issues.apache.org/jira/browse/TIKA-1610 Project: Tika Issue Type: New Feature

[jira] [Updated] (TIKA-1610) CBOR Parser and detection improvement

2015-04-21 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1610: -- Description: CBOR is a data format whose design goals include the possibility of extremely small code size,

[jira] [Updated] (TIKA-1610) CBOR Parser and detection improvement

2015-04-21 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1610: -- Description: CBOR is a data format whose design goals include the possibility of extremely small code size,

[jira] [Updated] (TIKA-1610) CBOR Parser and detection improvement

2015-04-21 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1610: -- Description: CBOR is a data format whose design goals include the possibility of extremely small code size,

[jira] [Updated] (TIKA-1610) CBOR Parser and detection improvement

2015-04-21 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1610: -- Attachment: 142440269.html cbor file dumped by the nutch tool. CBOR Parser and detection improvement

[jira] [Updated] (TIKA-1610) CBOR Parser and detection [improvement]

2015-04-21 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1610: -- Summary: CBOR Parser and detection [improvement] (was: CBOR Parser and detection improvement) CBOR Parser and

[jira] [Commented] (TIKA-1517) MIME type selection with probability

2015-04-25 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512284#comment-14512284 ] Luke sh commented on TIKA-1517: --- Notes: A Pull request with adding the support to Tika()

[jira] [Commented] (TIKA-1517) MIME type selection with probability

2015-04-25 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14512286#comment-14512286 ] Luke sh commented on TIKA-1517: --- Notes: this feature is also tested with the data from

[jira] [Commented] (TIKA-1610) CBOR Parser and detection [improvement]

2015-04-23 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510402#comment-14510402 ] Luke sh commented on TIKA-1610: --- Thanks a lot [~gagravarr] for the prompt response. I thought

[jira] [Updated] (TIKA-1610) CBOR Parser and detection [improvement]

2015-04-23 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1610: -- Attachment: NUTCH-1997.cbor CBOR Parser and detection [improvement] ---

[jira] [Comment Edited] (TIKA-1610) CBOR Parser and detection [improvement]

2015-04-23 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510382#comment-14510382 ] Luke sh edited comment on TIKA-1610 at 4/24/15 2:43 AM: Notes: The

[jira] [Commented] (TIKA-1610) CBOR Parser and detection [improvement]

2015-04-23 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14510382#comment-14510382 ] Luke sh commented on TIKA-1610: --- Notes: The attached cbor file contains both magic bytes for

[jira] [Commented] (TIKA-1582) Mime Detection based on neural networks with Byte-frequency-histogram

2015-05-02 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14525379#comment-14525379 ] Luke sh commented on TIKA-1582: --- Sure [~chrismattmann], I will work on the wiki for this

[jira] [Commented] (TIKA-1582) Mime Detection based on neural networks with Byte-frequency-histogram

2015-05-04 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14526717#comment-14526717 ] Luke sh commented on TIKA-1582: --- Thanks a lot [~talli...@apache.org] for the comments, here

[jira] [Commented] (TIKA-1582) Content-based Mime Detection with Byte-frequency-histogram

2015-05-10 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14537405#comment-14537405 ] Luke sh commented on TIKA-1582: --- Thanks, Professor [~chrismattmann]. Hi all, I just created a

[jira] [Updated] (TIKA-1582) Content-based Mime Detection with Byte-frequency-histogram

2015-05-10 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1582: -- Summary: Content-based Mime Detection with Byte-frequency-histogram (was: Mime Detection based on neural

[jira] [Commented] (TIKA-1517) MIME type selection with probability

2015-04-01 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391476#comment-14391476 ] Luke sh commented on TIKA-1517: --- After some research, it looks like the algorithm design with

[jira] [Comment Edited] (TIKA-1517) MIME type selection with probability

2015-04-01 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391476#comment-14391476 ] Luke sh edited comment on TIKA-1517 at 4/1/15 10:38 PM: After some

[jira] [Comment Edited] (TIKA-1517) MIME type selection with probability

2015-04-01 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391476#comment-14391476 ] Luke sh edited comment on TIKA-1517 at 4/1/15 10:37 PM: After some

[jira] [Comment Edited] (TIKA-1517) MIME type selection with probability

2015-04-01 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391476#comment-14391476 ] Luke sh edited comment on TIKA-1517 at 4/1/15 10:39 PM: After some

[jira] [Comment Edited] (TIKA-1517) MIME type selection with probability

2015-04-01 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391476#comment-14391476 ] Luke sh edited comment on TIKA-1517 at 4/1/15 10:39 PM: After some

[jira] [Comment Edited] (TIKA-1517) MIME type selection with probability

2015-04-01 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391476#comment-14391476 ] Luke sh edited comment on TIKA-1517 at 4/1/15 10:41 PM: After some

[jira] [Comment Edited] (TIKA-1517) MIME type selection with probability

2015-04-01 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14391476#comment-14391476 ] Luke sh edited comment on TIKA-1517 at 4/1/15 10:39 PM: After some

[jira] [Updated] (TIKA-1517) MIME type selection with probability

2015-04-01 Thread Luke sh (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Luke sh updated TIKA-1517: -- Priority: Trivial (was: Major) MIME type selection with probability