[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302653#comment-15302653
]
Hudson commented on TIKA-1513:
--
UNSTABLE: Integrated in tika-2.x #103 (See
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302219#comment-15302219
]
Hudson commented on TIKA-1513:
--
FAILURE: Integrated in tika-2.x-windows #7 (See
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302183#comment-15302183
]
Hudson commented on TIKA-1513:
--
SUCCESS: Integrated in tika-trunk-jdk1.7 #1001 (See
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15302076#comment-15302076
]
Tim Allison commented on TIKA-1513:
---
[~iryndin], would you mind if we added your test files (tir_im.dbf,
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300737#comment-15300737
]
Nick Burch commented on TIKA-1513:
--
I haven't read much on the format, but I'd be tempted to maybe have
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300460#comment-15300460
]
Hudson commented on TIKA-1513:
--
FAILURE: Integrated in tika-2.x-windows #6 (See
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15300425#comment-15300425
]
Hudson commented on TIKA-1513:
--
SUCCESS: Integrated in tika-trunk-jdk1.7 #999 (See
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299884#comment-15299884
]
Tim Allison commented on TIKA-1513:
---
[~nicholasc], do you, by chance, have any shareable examples of
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15299033#comment-15299033
]
Tim Allison commented on TIKA-1513:
---
Rolled our own parser. Will commit tomorrow.
> Add mime detection
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15280148#comment-15280148
]
Tim Allison commented on TIKA-1513:
---
[~iryndin], now that 1.13 is in the voting process, I'd like to
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15258147#comment-15258147
]
Tim Allison commented on TIKA-1513:
---
Great. Frankly, the initial regex looked quite good...small handful
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15256870#comment-15256870
]
Nick C commented on TIKA-1513:
--
Tested more files using the full regex and haven't had any false positives. :D
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15249129#comment-15249129
]
Nick C commented on TIKA-1513:
--
Sounds good. I'll be running this on more files this week and will report back
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248493#comment-15248493
]
Tim Allison commented on TIKA-1513:
---
I won't commit this until we get our corpus results back...perhaps
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15248463#comment-15248463
]
Nick C commented on TIKA-1513:
--
I was running this on more data and ran into a text file that matched. It
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15247579#comment-15247579
]
Tim Allison commented on TIKA-1513:
---
I'll add this before running the final (?) 1.13 regression tests and
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15245081#comment-15245081
]
Nick C commented on TIKA-1513:
--
Did some more testing and simplified the rules enough that it could be made in
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15240982#comment-15240982
]
Tim Allison commented on TIKA-1513:
---
Nope. Didn't remove them. There are roughly 3k files that ended
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239992#comment-15239992
]
Tim Allison commented on TIKA-1513:
---
bq. At least 200. I would like more to test with though.
I think I
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239960#comment-15239960
]
Nick C commented on TIKA-1513:
--
bq. Well, you know there's still plenty of time to get that into Tika 2.0
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239936#comment-15239936
]
Tim Allison commented on TIKA-1513:
---
bq. It'd be nice if Tika's mime definition allowed for more complex
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239836#comment-15239836
]
Nick C commented on TIKA-1513:
--
I added the license header. I think some of the checks could be removed. I'll
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239067#comment-15239067
]
Tim Allison commented on TIKA-1513:
---
Is there any interest in forking jdbf either into Tika or
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15239062#comment-15239062
]
Tim Allison commented on TIKA-1513:
---
[~gagravarr], would you mind taking a look at the detector? Is
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15236171#comment-15236171
]
Nick C commented on TIKA-1513:
--
Some of my checks may be a little strict because you can have extra bytes at
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234986#comment-15234986
]
Tim Allison commented on TIKA-1513:
---
[~iryndin], any interest in working on this?
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234984#comment-15234984
]
Tim Allison commented on TIKA-1513:
---
Great. Thank you!
> Add mime detection and parsing for dbf files
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234276#comment-15234276
]
Nick C commented on TIKA-1513:
--
I wrote the detector from scratch a couple months ago because 0x03 caused too
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15234007#comment-15234007
]
Nick Burch commented on TIKA-1513:
--
Is it based on JDBF, or did you write it from scratch?
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15233743#comment-15233743
]
Nick C commented on TIKA-1513:
--
I ended up building a detector that tries to validate the dbf header instead
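A minimal sketch of what validating the header (rather than trusting a leading 0x03) can look like, assuming the dBASE III-style layout: version byte at offset 0, a YY/MM/DD last-update date in bytes 1-3, little-endian 16-bit header and record lengths at offsets 8 and 10, then 32-byte field descriptors ending in a 0x0D byte. The class name `DbfHeaderCheck` and the set of accepted version bytes are illustrative assumptions; this is not Nick C's detector or Tika's code. It needs Java 9+ for `Set.of`.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Set;

/** Hypothetical sketch: does a byte prefix plausibly start a DBF file? */
final class DbfHeaderCheck {

    // Assumed, non-exhaustive list of version bytes seen in the wild.
    private static final Set<Integer> KNOWN_VERSIONS =
            Set.of(0x02, 0x03, 0x30, 0x31, 0x83, 0x8B, 0xF5);

    private DbfHeaderCheck() {}

    static boolean looksLikeDbf(byte[] header) {
        if (header == null || header.length < 32) {
            return false; // the fixed part of the header is 32 bytes
        }
        ByteBuffer buf = ByteBuffer.wrap(header).order(ByteOrder.LITTLE_ENDIAN);

        int version = buf.get(0) & 0xFF;
        if (!KNOWN_VERSIONS.contains(version)) {
            return false;
        }
        // Bytes 1-3: last-update date; month and day must be in range.
        int month = buf.get(2) & 0xFF;
        int day = buf.get(3) & 0xFF;
        if (month < 1 || month > 12 || day < 1 || day > 31) {
            return false;
        }
        int headerLength = buf.getShort(8) & 0xFFFF;  // bytes 8-9
        int recordLength = buf.getShort(10) & 0xFFFF; // bytes 10-11
        // 32-byte fixed part + at least one 32-byte field descriptor + 0x0D.
        // A record is a 1-byte deletion flag plus at least one field byte.
        if (headerLength < 65 || recordLength < 2) {
            return false;
        }
        // Field descriptors start at offset 32 and the list ends with 0x0D.
        // Some variants (e.g. Visual FoxPro) put extra bytes after the
        // terminator, so scan for it instead of assuming headerLength - 1.
        int limit = Math.min(headerLength, header.length);
        for (int pos = 32; pos < limit; pos += 32) {
            if (header[pos] == 0x0D) {
                return pos > 32; // require at least one field descriptor
            }
        }
        // No terminator seen: only acceptable if the prefix was too short.
        return header.length < headerLength;
    }
}
```

A plain text file that happens to start with 0x03 is usually rejected by the date check alone, since ASCII letters are far outside the 1-12 month range.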
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=14734927#comment-14734927
]
Tim Allison commented on TIKA-1513:
---
Hi [~iryndin], I wanted to check in to see if you've had a chance to
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14508932#comment-14508932
]
Tim Allison commented on TIKA-1513:
---
Oh, broken files, y, that would explain your
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14507995#comment-14507995
]
Luis Filipe Nassif commented on TIKA-1513:
--
Hi Tim,
I've processed a forensic
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505057#comment-14505057
]
Luis Filipe Nassif commented on TIKA-1513:
--
No, I did not try 0x03. How
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504996#comment-14504996
]
Luis Filipe Nassif commented on TIKA-1513:
--
Hi Tim,
I am ok with 1) and 2). But I
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505092#comment-14505092
]
Tim Allison commented on TIKA-1513:
---
Completely agree.
Only 2,386 files.
This is the
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14505006#comment-14505006
]
Tim Allison commented on TIKA-1513:
---
Y, I was concerned by that generally. Are you
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14504951#comment-14504951
]
Tim Allison commented on TIKA-1513:
---
From govdocs1, it looks like a first byte of 0x03 is a
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14506214#comment-14506214
]
Tim Allison commented on TIKA-1513:
---
In looking at
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14502716#comment-14502716
]
Tim Allison commented on TIKA-1513:
---
Hi [~iryndin], I wanted to check in to see how the
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293503#comment-14293503
]
Tim Allison commented on TIKA-1513:
---
Ah, ok. These are the links that I came across:
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14293842#comment-14293842
]
Ivan Ryndin commented on TIKA-1513:
---
Yeah, I saw these articles. Probably, this code page
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14292929#comment-14292929
]
Ivan Ryndin commented on TIKA-1513:
---
There are no reliable ways to detect the codepage of DBF
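As an illustration of why the header only gives a hint: dBASE IV and FoxPro-style files carry a "language driver" byte at offset 29, which can be mapped to a code page when it is set. The sketch below uses a partial table taken from the Visual FoxPro code page marks; the class name `DbfCodePageHint` and the choice of Java charset names are assumptions, not anything from jdbf or Tika. Many files leave this byte at 0 or set it wrongly, so an empty result (or a suspicious one) still needs a fallback such as statistical charset detection.

```java
import java.util.Map;
import java.util.Optional;

/** Hypothetical sketch: read the language driver byte as a charset hint. */
final class DbfCodePageHint {

    // Partial table of language driver IDs (Visual FoxPro code page marks).
    private static final Map<Integer, String> LANGUAGE_DRIVERS = Map.of(
            0x01, "IBM437",        // U.S. MS-DOS
            0x02, "IBM850",        // International MS-DOS
            0x03, "windows-1252",  // Windows ANSI
            0x64, "IBM852",        // Eastern European MS-DOS
            0x65, "IBM866",        // Russian MS-DOS
            0xC8, "windows-1250",  // Eastern European Windows
            0xC9, "windows-1251"   // Russian Windows
    );

    private DbfCodePageHint() {}

    /** Returns a charset name, or empty if the byte is unset or unknown. */
    static Optional<String> charsetHint(byte[] header) {
        if (header == null || header.length < 30) {
            return Optional.empty(); // offset 29 not available
        }
        int driver = header[29] & 0xFF;
        return Optional.ofNullable(LANGUAGE_DRIVERS.get(driver));
    }
}
```

Returning a name rather than a `Charset` keeps the sketch independent of which extended charsets the running JDK ships; a caller would pass the name to `Charset.forName` only after deciding to trust the hint.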
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14287980#comment-14287980
]
Tim Allison commented on TIKA-1513:
---
[~iryndin], on codepage detection in dbf...in one of
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14280211#comment-14280211
]
Ivan Ryndin commented on TIKA-1513:
---
Hi guys!
I started working on pushing jdbf to Maven
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14280367#comment-14280367
]
Tim Allison commented on TIKA-1513:
---
[~iryndin], No rush on our side (well, at least
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14280391#comment-14280391
]
Ivan Ryndin commented on TIKA-1513:
---
Well, I plan to keep supporting JDBF, though I left
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14280398#comment-14280398
]
Tim Allison commented on TIKA-1513:
---
Great! Well, yes, we're always looking to build the
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14280401#comment-14280401
]
Ivan Ryndin commented on TIKA-1513:
---
Well, okay, let my first job for the TIKA project
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14279008#comment-14279008
]
Tim Allison commented on TIKA-1513:
---
Thank you, [~lfcnassif] and [~gagravarr]!
I think
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278678#comment-14278678
]
Nick Burch commented on TIKA-1513:
--
If it's the project themselves pushing it to central,
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14278674#comment-14278674
]
Luis Filipe Nassif commented on TIKA-1513:
--
I talked to iryndin and he liked the
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14276916#comment-14276916
]
Tim Allison commented on TIKA-1513:
---
From a brochure-level evaluation :), I'd prefer
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275553#comment-14275553
]
Tim Allison commented on TIKA-1513:
---
Any interest in encouraging iryndin to push to
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275636#comment-14275636
]
Luis Filipe Nassif commented on TIKA-1513:
--
I can if the community thinks that
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275542#comment-14275542
]
Luis Filipe Nassif commented on TIKA-1513:
--
I have found
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275679#comment-14275679
]
Konstantin Gribov commented on TIKA-1513:
-
[~talli...@mitre.org], I think it's good
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275467#comment-14275467
]
Konstantin Gribov commented on TIKA-1513:
-
Is this lib alive? Last commits were in
[
https://issues.apache.org/jira/browse/TIKA-1513?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14275472#comment-14275472
]
Tim Allison commented on TIKA-1513:
---
I share your concern. There are ~2600 .dbase3 files