Luke sh created TIKA-1539:
-
Summary: GRB file magic bytes and extension matching
Key: TIKA-1539
URL: https://issues.apache.org/jira/browse/TIKA-1539
Project: Tika
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/TIKA-1539?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14308212#comment-14308212
]
Luke sh commented on TIKA-1539:
---
pull request #28, add grb files for unit tests.
GRB file
[
https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14281254#comment-14281254
]
Luke sh commented on TIKA-1517:
---
Basic feature design:
Probability selection mechanism is
[
https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1517:
--
Description:
Improvement and intuition
The original implementation for MIME type selection/detection is a bit
[
https://issues.apache.org/jira/browse/TIKA-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293944#comment-14293944
]
Luke sh commented on TIKA-1521:
---
Hi @Nick Burch,
I just came across this problem; I had a
[
https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295928#comment-14295928
]
Luke sh commented on TIKA-1517:
---
The probability selection will inherit from the class MIMETypes,
[
https://issues.apache.org/jira/browse/TIKA-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295922#comment-14295922
]
Luke sh commented on TIKA-1535:
---
TIKA-1517, the mime type selection mechanism with
[
https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14295928#comment-14295928
]
Luke sh edited comment on TIKA-1517 at 1/28/15 11:06 PM:
-
the
[
https://issues.apache.org/jira/browse/TIKA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1561:
--
Description:
Cited from http://gcmd.nasa.gov/add/difguide/WRITEADIF.pdf
The Directory Interchange Format
[
https://issues.apache.org/jira/browse/TIKA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1561:
--
Attachment: (was:
carbon_isotopic_values_of_alkanes_extracted_from_paleosols.dif)
GCMD Directory
[
https://issues.apache.org/jira/browse/TIKA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1561:
--
Comment: was deleted
(was: sample dif file)
GCMD Directory Interchange Format (.dif) identification
Luke sh created TIKA-1561:
-
Summary: GCMD Directory Interchange Format (.dif) identification
Key: TIKA-1561
URL: https://issues.apache.org/jira/browse/TIKA-1561
Project: Tika
Issue Type: Improvement
[
https://issues.apache.org/jira/browse/TIKA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1561:
--
Attachment: carbon_isotopic_values_of_alkanes_extracted_from_paleosols.dif
sample dif file
GCMD Directory
[
https://issues.apache.org/jira/browse/TIKA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1561:
--
Description:
Cited from http://gcmd.nasa.gov/add/difguide/WRITEADIF.pdf
The Directory Interchange Format
[
https://issues.apache.org/jira/browse/TIKA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14337774#comment-14337774
]
Luke sh commented on TIKA-1561:
---
I am going to send a pull request with this dif type
[
https://issues.apache.org/jira/browse/TIKA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1561:
--
Attachment: carbon_isotopic_values_of_alkanes_extracted_from_paleosols.dif
sample dif file
GCMD Directory
[
https://issues.apache.org/jira/browse/TIKA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1561:
--
Description:
Cited from http://gcmd.nasa.gov/add/difguide/WRITEADIF.pdf
The Directory Interchange Format
[
https://issues.apache.org/jira/browse/TIKA-1561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1561:
--
Description:
Cited from http://gcmd.nasa.gov/add/difguide/WRITEADIF.pdf
The Directory Interchange Format
Luke sh created TIKA-1582:
-
Summary: Mime Detection based on neural networks with
Byte-frequency-histogram
Key: TIKA-1582
URL: https://issues.apache.org/jira/browse/TIKA-1582
Project: Tika
Issue
[
https://issues.apache.org/jira/browse/TIKA-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14382590#comment-14382590
]
Luke sh commented on TIKA-1582:
---
A pull request with this feature for Tika will be created
[
https://issues.apache.org/jira/browse/TIKA-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1582:
--
Description:
Content-based MIME type detection is one of the popular approaches to detecting
MIME types; there are
[
https://issues.apache.org/jira/browse/TIKA-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1582:
--
Attachment: nnmodel.docx
Documentation
Mime Detection based on neural networks with Byte-frequency-histogram
[
https://issues.apache.org/jira/browse/TIKA-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1582:
--
Attachment: week2-report-histogram comparison.docx
histogram comparison
Mime Detection based on neural
[
https://issues.apache.org/jira/browse/TIKA-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1582:
--
Attachment: week6 report.docx
Test report
Mime Detection based on neural networks with
[
https://issues.apache.org/jira/browse/TIKA-1535?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1535:
--
Description:
The class MIMETypes does not currently allow for inheritance.
There are a couple of methods in
[
https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391476#comment-14391476
]
Luke sh edited comment on TIKA-1517 at 4/2/15 1:30 AM:
---
After some
[
https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1610:
--
Attachment: cbor_tika.mimetypes.xml.jpg
rfc_cbor.jpg
CBOR Parser and detection improvement
[
https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1610:
--
Description:
CBOR is a data format whose design goals include the possibility of extremely
small code size,
Luke sh created TIKA-1610:
-
Summary: CBOR Parser and detection improvement
Key: TIKA-1610
URL: https://issues.apache.org/jira/browse/TIKA-1610
Project: Tika
Issue Type: New Feature
[
https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1610:
--
Description:
CBOR is a data format whose design goals include the possibility of extremely
small code size,
[
https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1610:
--
Description:
CBOR is a data format whose design goals include the possibility of extremely
small code size,
[
https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1610:
--
Description:
CBOR is a data format whose design goals include the possibility of extremely
small code size,
[
https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1610:
--
Attachment: 142440269.html
CBOR file dumped by the Nutch tool.
CBOR Parser and detection improvement
[
https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1610:
--
Summary: CBOR Parser and detection [improvement] (was: CBOR Parser and
detection improvement)
CBOR Parser and
[
https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14512284#comment-14512284
]
Luke sh commented on TIKA-1517:
---
Notes:
A pull request adding the support to Tika()
[
https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14512286#comment-14512286
]
Luke sh commented on TIKA-1517:
---
Notes:
This feature is also tested with the data from
[
https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510402#comment-14510402
]
Luke sh commented on TIKA-1610:
---
Thanks a lot [~gagravarr] for the prompt response.
I thought
[
https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1610:
--
Attachment: NUTCH-1997.cbor
CBOR Parser and detection [improvement]
---
[
https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510382#comment-14510382
]
Luke sh edited comment on TIKA-1610 at 4/24/15 2:43 AM:
Notes:
The
[
https://issues.apache.org/jira/browse/TIKA-1610?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14510382#comment-14510382
]
Luke sh commented on TIKA-1610:
---
Notes:
The attached CBOR file contains both magic bytes for
[
https://issues.apache.org/jira/browse/TIKA-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14525379#comment-14525379
]
Luke sh commented on TIKA-1582:
---
Sure [~chrismattmann], I will work on the wiki for this
[
https://issues.apache.org/jira/browse/TIKA-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14526717#comment-14526717
]
Luke sh commented on TIKA-1582:
---
Thanks a lot [~talli...@apache.org] for the comments, here
[
https://issues.apache.org/jira/browse/TIKA-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14537405#comment-14537405
]
Luke sh commented on TIKA-1582:
---
Thanks professor [~chrismattmann]
Hi all,
I just created a
[
https://issues.apache.org/jira/browse/TIKA-1582?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1582:
--
Summary: Content-based Mime Detection with Byte-frequency-histogram (was:
Mime Detection based on neural
[
https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391476#comment-14391476
]
Luke sh commented on TIKA-1517:
---
After some research, it looks like the algorithm design with
[
https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391476#comment-14391476
]
Luke sh edited comment on TIKA-1517 at 4/1/15 10:38 PM:
After some
[
https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391476#comment-14391476
]
Luke sh edited comment on TIKA-1517 at 4/1/15 10:37 PM:
After some
[
https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391476#comment-14391476
]
Luke sh edited comment on TIKA-1517 at 4/1/15 10:39 PM:
After some
[
https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391476#comment-14391476
]
Luke sh edited comment on TIKA-1517 at 4/1/15 10:39 PM:
After some
[
https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391476#comment-14391476
]
Luke sh edited comment on TIKA-1517 at 4/1/15 10:41 PM:
After some
[
https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14391476#comment-14391476
]
Luke sh edited comment on TIKA-1517 at 4/1/15 10:39 PM:
After some
[
https://issues.apache.org/jira/browse/TIKA-1517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Luke sh updated TIKA-1517:
--
Priority: Trivial (was: Major)
MIME type selection with probability