need URL openStream() to test Tika-327 in MimeDetectionTest?

2013-06-28 Thread Allison, Timothy B.
All, testUrlOnly in MimeDetectionTest makes a URL.openStream() call. I have to modify Tika's pom with my proxy info to get the test to pass in my environment. Would the test still test Tika-327 if we modified the test to read from a local copy of nheri's html? It looks to me like the

RE: need URL openStream() to test Tika-327 in MimeDetectionTest?

2013-06-28 Thread Allison, Timothy B.
Doh! Please ignore last email: https://issues.apache.org/jira/browse/TIKA-1129 Would anyone mind if I recreated the structure from the offending html so that we can return this test to test a local copy of the document? -Original Message- From: Allison, Timothy B. Sent: Friday, June

RE: MagicDetector don't work for all RFC882 message Types.

2013-07-11 Thread Allison, Timothy B.
I think I may be uniquely qualified to answer this from an Idiot's guide/newish to Tika perspective. :) Apologies if I'm missing out on more obvious answers! SVN info: http://tika.apache.org/source-repository.html Generally how to contribute (Lucene has a good description):

RE: MagicDetector don't work for all RFC882 message Types.

2013-07-11 Thread Allison, Timothy B.
-repository.html site? Thank you. Best, Tim -Original Message- From: Allison, Timothy B. [mailto:talli...@mitre.org] Sent: Thursday, July 11, 2013 10:53 AM To: dev@tika.apache.org Subject: RE: MagicDetector don't work for all RFC882 message Types. I think I may be uniquely

RE: [Announce] Welcome Tim Allison as Tika PM member and committer

2013-07-30 Thread Allison, Timothy B.
Wow. Thank you, all! I very much look forward to working with you in these new roles. Best, Tim From: Michael McCandless [luc...@mikemccandless.com] Sent: Tuesday, July 30, 2013 6:29 AM To: dev@tika.apache.org Subject: Re: [Announce]

RE: [Announce] Welcome Tim Allison as Tika PM member and committer

2013-08-02 Thread Allison, Timothy B.
The MITRE Corporation -Original Message- From: Allison, Timothy B. [mailto:talli...@mitre.org] Sent: Tuesday, July 30, 2013 8:55 AM To: dev@tika.apache.org Subject: RE: [Announce] Welcome Tim Allison as Tika PM member and committer Wow. Thank you, all! I very much look forward to working

permissions to close issue?

2013-08-15 Thread Allison, Timothy B.
All, I don't appear to have permissions to close out issues that I didn't open (TIKA-1001 and TIKA-1153). Is this standard jira policy or user error? Thank you. Best, Tim -Original Message- From: Tim Allison (JIRA) [mailto:j...@apache.org] Sent: Wednesday,

building tika from scratch without pulling 1.5-SNAPSHOT from the repository?

2013-09-19 Thread Allison, Timothy B.
All, Is there an easy way to build Tika from scratch without reliance on 1.5-SNAPSHOT in the mvn repository and without building the components in the correct order and then manually loading them into a local mvn repository? At the main level, I've been using a simple 'mvn package' Thank

RE: building tika from scratch without pulling 1.5-SNAPSHOT from the repository?

2013-09-19 Thread Allison, Timothy B.
: Allison, Timothy B. [mailto:talli...@mitre.org] Sent: Thursday, September 19, 2013 2:20 PM To: dev@tika.apache.org Subject: building tika from scratch without pulling 1.5-SNAPSHOT from the repository? All, Is there an easy way to build Tika from scratch without reliance on 1.5-SNAPSHOT

RE: NonSequentialPDFParser

2013-12-02 Thread Allison, Timothy B.
Does the speedup only help if you are trying to parse an individual page vs the entire document? If so, is partial parsing a use case for Tika? If this has the same performance on the full document as the regular parser, does it have lower memory overhead? -Original Message- From:

Tika on Jenkins?

2014-01-28 Thread Allison, Timothy B.
All, How do we fix the Tika build in Jenkins? The polling log shows an IOException when trying to install maven (https://builds.apache.org/job/Tika-trunk/scmPollLog/). Permissions or space issue? The last successful tika-app-1.5 SNAPSHOT

building Tika without bundle dependency on repositories

2014-02-05 Thread Allison, Timothy B.
Speaking of building...is there an easy way to build Tika locally without reference to the repositories and without building each component one by one (in the correct order) and then manually installing in a local repository. Thank you! Downloading:

RE: building Tika without bundle dependency on repositories

2014-02-05 Thread Allison, Timothy B.
Feb 2014, Allison, Timothy B. wrote: Speaking of building...is there an easy way to build Tika locally without reference to the repositories and without building each component one by one (in the correct order) and then manually installing in a local repository. I just do: cd root of tika

RE: Failing test - PDFParserTest.testSequentialParser

2014-02-19 Thread Allison, Timothy B.
I haven't seen the problem, but that's my test. Will take a look. -Original Message- From: Nick Burch [mailto:n...@apache.org] Sent: Wednesday, February 19, 2014 9:44 AM To: dev@tika.apache.org Subject: Failing test - PDFParserTest.testSequentialParser I've just tried to build Tika

RE: Failing test - PDFParserTest.testSequentialParser

2014-02-19 Thread Allison, Timothy B.
Y. Sorry about that. Changing to 15 now. -Original Message- From: Nick Burch [mailto:apa...@gagravarr.org] Sent: Wednesday, February 19, 2014 10:10 AM To: dev@tika.apache.org Subject: RE: Failing test - PDFParserTest.testSequentialParser On Wed, 19 Feb 2014, Allison, Timothy B. wrote

RE: Failing test - PDFParserTest.testSequentialParser

2014-02-19 Thread Allison, Timothy B.
handle this; NonSequentialParser is currently not reading the header version) Cheers, Tim -Original Message- From: Allison, Timothy B. [mailto:talli...@mitre.org] Sent: Wednesday, February 19, 2014 10:18 AM To: dev@tika.apache.org Subject: RE: Failing test

RE: Build failure at trunk in org.apache.tika.server.UnpackerResourceTest

2014-02-26 Thread Allison, Timothy B.
Failure here too. My last successful pull and build occurred Feb 19. -Original Message- From: Nick Burch [mailto:apa...@gagravarr.org] Sent: Tuesday, February 25, 2014 8:14 PM To: dev@tika.apache.org Subject: Re: Build failure at trunk in org.apache.tika.server.UnpackerResourceTest On

RE: Build failed in Jenkins: Tika-trunk #1062

2014-02-26 Thread Allison, Timothy B.
He's alive!!! My bet is: TIKA-1243 - Upgrade to Commons Compress 1.7, and add a disabled unit test for 7z support. 7z support is not enabled yet, pending a commons compress fix When I changed trunk back to 1.5, all tests pass. Change in MD5 implementation btwn Compress 1.5 and 1.7?

RE: Build failed in Jenkins: Tika-trunk #1062

2014-02-26 Thread Allison, Timothy B.
Sorry, should have been clearer: changed the pom in trunk to pull Compress 1.5. -Original Message- From: Allison, Timothy B. [mailto:talli...@mitre.org] Sent: Wednesday, February 26, 2014 11:35 AM To: dev@tika.apache.org Subject: RE: Build failed in Jenkins: Tika-trunk #1062 He's alive

RE: Tika 1.5 vs 1.4 testing

2014-03-06 Thread Allison, Timothy B.
Hong-Thai, Thank you for running these tests. I suspect (mea culpa) that the increase in PDF runtime exception failures was caused by PDFBOX-1803/TIKA-1233, which was not fixed before 1.5 was cut. I recently made major modifications to the metadata extraction components of the PDFParser

RE: Unable to commit SVN ?

2014-04-03 Thread Allison, Timothy B.
I've been having problems with co and updating over the last few days. -Original Message- From: Hong-Thai Nguyen [mailto:hong-thai.ngu...@polyspot.com] Sent: Thursday, April 03, 2014 11:37 AM To: dev@tika.apache.org Subject: Unable to commit SVN ? Hi Tika men, I have 500 error when

RE: [DISCUSS] Nightly Jenkins Builds for Trunk

2014-05-14 Thread Allison, Timothy B.
+1 Please, yes. Thank you! -Original Message- From: Lewis John Mcgibbney [mailto:lewis.mcgibb...@gmail.com] Sent: Wednesday, May 14, 2014 11:21 AM To: dev@tika.apache.org Subject: [DISCUSS] Nightly Jenkins Builds for Trunk Hi Folks, Right now in Jenkins (builds.apache.org) we don't

RE: Property type closed choice

2014-05-21 Thread Allison, Timothy B.
Thank you, Nick. Will open trivial issue and fix. -Original Message- From: Nick Burch [mailto:apa...@gagravarr.org] Sent: Tuesday, May 20, 2014 5:27 PM To: dev@tika.apache.org Subject: Re: Property type closed choice On Tue, 20 May 2014, Allison, Timothy B. wrote: When I run

RE: Hello

2014-05-28 Thread Allison, Timothy B.
Welcome, Tyler! I found Jukka's how-to dev Tika in Eclipse very useful (don't know if you are an Eclipser, though): http://lucene.472066.n3.nabble.com/Newb-IDE-Maven-tp3389963p3390012.html As with many projects, some of the most useful documentation is in the test cases, head to the test

[DISCUSS] Centralizing JSON handling of Metadata

2014-05-28 Thread Allison, Timothy B.
All, Nick recommended I put the question to the dev list for discussion. It might be useful to centralize our json handling of Metadata. We are now currently using different libraries and doing different things in CLI and in tika-server. 1) Do we want to centralize json handling of

RE: Timezone issue with TTF parser?

2014-06-09 Thread Allison, Timothy B.
Ok, should work as of r1601444. Thank you, Nick, for working through this issue with me. -Original Message- From: Nick Burch [mailto:apa...@gagravarr.org] Sent: Monday, June 09, 2014 10:45 AM To: dev@tika.apache.org Subject: Re: Timezone issue with TTF parser? On Mon, 9 Jun 2014, Ken

Is TikaExceptionMapper in tika-server actually used?

2014-06-18 Thread Allison, Timothy B.
All, In working on adding the stacktrace from a parse exception to the server response, I'm trying to find the most jax-rsly elegant way of handling exceptions. There seems to be a bit of duplicated code, some with good reason, for exception handling. Is TikaExceptionMapper actually used

tika-server exception handling

2014-06-19 Thread Allison, Timothy B.
, sorry for a delay, I can see it is expected to process a checked exception, so unless we have one of the root resources throwing it from one of the methods then it is not used Thanks, Sergey On 19/06/14 02:22, Allison, Timothy B. wrote: All, In working on adding the stacktrace from a parse

FW: Improving OCR plugin for PDFBox

2014-06-27 Thread Allison, Timothy B.
Thought this might be of interest. -Original Message- From: John Hewson [mailto:j...@jahewson.com] Sent: Friday, June 27, 2014 2:58 AM To: DImuthu Upeksha Cc: d...@pdfbox.apache.org Subject: Re: Improving OCR plugin for PDFBox Hi Dimuthu That's great. We should wait until closer to the

RE: Regression Testing

2014-07-07 Thread Allison, Timothy B.
John, My initial plan for TIKA-1302 is very similar to what Tilman outlined, and my understanding/concerns/thoughts were very much in line with what he articulated. The idea is that there should be a small Apache license-able gold truth set like both projects now have for specific unit

RE: Metadata at e.g. textfiles

2014-07-10 Thread Allison, Timothy B.
Ditto what Nick said on internal metadata. Are you referring to external metadata that we could get in Java 7 via BasicFileAttributes on OS's that support those? -Original Message- From: Nick Burch [mailto:apa...@gagravarr.org] Sent: Thursday, July 10, 2014 6:52 AM To:
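As a rough illustration of the external metadata mentioned above, the following sketch reads file-system attributes with Java 7's `BasicFileAttributes`. The class name `ExternalFileMetadata` and the map keys are assumptions made for this example; they are not Tika metadata keys or part of any Tika API.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.attribute.BasicFileAttributes;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper for illustration only; not part of Tika.
final class ExternalFileMetadata {

    // Collects file-system ("external") metadata for a path.
    // The keys are illustrative names chosen for this sketch.
    static Map<String, String> read(Path path) throws IOException {
        BasicFileAttributes attrs =
                Files.readAttributes(path, BasicFileAttributes.class);
        Map<String, String> metadata = new LinkedHashMap<>();
        metadata.put("size", Long.toString(attrs.size()));
        metadata.put("created", attrs.creationTime().toString());
        metadata.put("modified", attrs.lastModifiedTime().toString());
        metadata.put("accessed", attrs.lastAccessTime().toString());
        return metadata;
    }

    public static void main(String[] args) throws IOException {
        // Write a small temporary file and print its attributes.
        Path tmp = Files.createTempFile("external-metadata", ".txt");
        try {
            Files.write(tmp, "hello".getBytes(StandardCharsets.UTF_8));
            for (Map.Entry<String, String> e : read(tmp).entrySet()) {
                System.out.println(e.getKey() + " = " + e.getValue());
            }
        } finally {
            Files.deleteIfExists(tmp);
        }
    }
}
```

Which timestamps are meaningful depends on the operating system and file system; some platforms report the last-modified time in place of a creation time.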

RE: [VOTE] Apache Tika 1.6 release candidate #1

2014-07-28 Thread Allison, Timothy B.
+1 Linux version 2.6.32-431.5.1.el6.x86_64: Java 1.6 and 1.7 Windows 7, Java 1.7 I also ran Tika 1.5 and 1.6 rc1 against a random selection of 10,000 docs (all formats) plus all available msoffice-x files in govdocs1, yielding 10,413 docs. There were several improvements in text extraction

RE: [VOTE] Apache Tika 1.6 release candidate #1

2014-07-31 Thread Allison, Timothy B.
: +1 OSX 10.9.3, Java 1.7 Tyler On Mon, Jul 28, 2014 at 7:09 AM, Allison, Timothy B. talli...@mitre.org wrote: +1 Linux version 2.6.32-431.5.1.el6.x86_64: Java 1.6 and 1.7 Windows 7, Java 1.7 I also ran Tika 1.5 and 1.6 rc1 against a random selection of 10,000 docs (all formats

RE: [VOTE] Apache Tika 1.6 release candidate #1

2014-07-31 Thread Allison, Timothy B.
[mailto:apa...@gagravarr.org] Sent: Thursday, July 31, 2014 3:06 PM To: dev@tika.apache.org Subject: RE: [VOTE] Apache Tika 1.6 release candidate #1 On Thu, 31 Jul 2014, Allison, Timothy B. wrote: On a related note, I did some digging on the one regression I found in the pptx

RE: [VOTE] Release Apache POI 3.11 Beta 1

2014-08-01 Thread Allison, Timothy B.
Rat checked out, successful build on linux. +1... with one reservation I just ran a fresh update of trunk from Tika with RC for POI 3.11 Beta 1 against a random selection of ~10k files from govdocs1, covering many formats. There aren't many office-x files, but there are some, and I made sure

RE: Tika regression test on POI 3.11 Beta 1

2014-08-01 Thread Allison, Timothy B.
Great to hear! Maybe we just need to update something on the Tika side to grab the cell comments: http://svn.apache.org/repos/asf/tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/XSSFExcelExtractorDecorator.java ? The token check is my primordial TIKA-1302 code,

RE: [DISCUSS] Give examples of Parser, Detector, and Translator usage

2014-08-11 Thread Allison, Timothy B.
Recursion is one that causes confusion, we've got some example programs on the wiki that we can include: https://wiki.apache.org/tika/RecursiveMetadata Ray Gauss is probably our best bet for advanced metadata stuff to send in some examples on that! For development on TIKA-1302, I've been using

RE: TIKA - how to read chunks at a time from a very large file?

2014-08-28 Thread Allison, Timothy B.
Probably better question for the user list. Extending a ContentHandler and using that in ContentHandlerDecorator is pretty straightforward. Would it be easy enough to write to file by passing in an OutputStream to WriteOutContentHandler? -Original Message- From: ruby

RE: TIKA - how to read chunks at a time from a very large file?

2014-08-29 Thread Allison, Timothy B.
My belief in making that recommendation was that a given document wouldn't split a word across an element. I can, of course, think of exceptions (word break at the end of a PDF page, for example), but generally, my assumption is that this wouldn't happen very often. However, if this does

Please add me to authorized wiki editors

2014-09-02 Thread Allison, Timothy B.
TimothyAllison I’d like to start documenting tika-batch. Thank you! Best, Tim

RE: [VOTE] Release Apache Tika 1.6 RC #2

2014-09-02 Thread Allison, Timothy B.
+1 Built in both salt water and fresh (er, Windows 7 and RHEL 6.5). Thank you, Chris! And thank you, Uwe and Nick, for the quick work to get poi-3.11-beta2 included! -Original Message- From: Tyler Palsulich [mailto:tpalsul...@gmail.com] Sent: Tuesday, September 02, 2014 7:24 AM To:

RE: NPE on all *.odt, odp, .ods documents

2014-09-11 Thread Allison, Timothy B.
Probably want to add TIKA-1411. Nick and all, anything else? -Original Message- From: Hong-Thai Nguyen [mailto:thaicha...@gmail.com] Sent: Thursday, September 11, 2014 10:10 AM To: dev@tika.apache.org Subject: Re: NPE on all *.odt, odp, .ods documents Hi Chris, Sound perfect too me.

RE: How to exclude a mimetype in tika?

2014-09-18 Thread Allison, Timothy B.
Science Department University of Southern California, Los Angeles, CA 90089 USA ++ -Original Message- From: Allison, Timothy B. talli...@mitre.org Reply-To: u...@tika.apache.org u...@tika.apache.org Date: Thursday, September

import (re)ordering?

2014-10-21 Thread Allison, Timothy B.
All, I have Intellij set to order imports by javax, java, then other. I think this is the most common pattern in Tika. Is it ok if I make these (meaningless/formatting) changes when I commit other changes? Thank you. Best, Tim

RE: import (re)ordering?

2014-10-24 Thread Allison, Timothy B.
, Timothy B. talli...@mitre.org Reply-To: dev@tika.apache.org dev@tika.apache.org Date: Tuesday, October 21, 2014 at 1:59 PM To: dev@tika.apache.org dev@tika.apache.org Subject: import (re)ordering? All, I have Intellij set to order imports by javax, java, then other. I think this is the most

RE: 1.7 release?

2014-10-24 Thread Allison, Timothy B.
Sorry for coming late to the game on the implications of TIKA-1445. I don't want to hold up the release of 1.7. However, would it be possible to return to the legacy default behavior of extracting metadata from images? We can then document on the OCR parser page on the wiki that you need

RE: PDF test failing on trunk

2014-10-30 Thread Allison, Timothy B.
Hi Nick, The build is working for me on linux and Windows with Java 1.7. Can you tell which file is causing the problem? I wonder if the upgrade to PDFBox 1.8.7 caused the issue? -Original Message- From: Nick Burch [mailto:n...@apache.org] Sent: Wednesday, October 29, 2014 4:40 PM

RE: PDF test failing on trunk

2014-10-30 Thread Allison, Timothy B.
, 2014 9:00 AM To: dev@tika.apache.org Subject: RE: PDF test failing on trunk On Thu, 30 Oct 2014, Allison, Timothy B. wrote: The build is working for me on linux and Windows with Java 1.7. Can you tell which file is causing the problem? I wonder if the upgrade to PDFBox 1.8.7 caused

RE: PDF test failing on trunk

2014-10-30 Thread Allison, Timothy B.
I think so. Would you like the honors? -Original Message- From: Nick Burch [mailto:apa...@gagravarr.org] Sent: Thursday, October 30, 2014 9:23 AM To: dev@tika.apache.org Subject: RE: PDF test failing on trunk On Thu, 30 Oct 2014, Allison, Timothy B. wrote: Ha. Works with an older

RE: PDF test failing on trunk

2014-10-30 Thread Allison, Timothy B.
and give it a test on my version of 1.6. -Original Message- From: Nick Burch [mailto:apa...@gagravarr.org] Sent: Thursday, October 30, 2014 12:01 PM To: dev@tika.apache.org Subject: RE: PDF test failing on trunk On Thu, 30 Oct 2014, Allison, Timothy B. wrote: I think so. Would you

RE: TIKA-1445 and having multiple Parsers (as many as needed) work on the same MediaType

2014-11-18 Thread Allison, Timothy B.
Chris, Thank you for moving this to the dev list. This would be a fairly large change, and the discussion is valuable. -Original Message- From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov] Sent: Monday, November 17, 2014 5:25 PM To: dev@tika.apache.org Subject:

RE: Outputting JSON from tika-server/meta

2014-12-19 Thread Allison, Timothy B.
All, With many thanks to Sergey, I added JSON and XMP to “/meta” and I folded in MetadataEP into MetadataResource so that users can request a specific metadata value(s). (TIKA-1497, TIKA-1499) I also added a new endpoint “/rmeta” that is equivalent to tika-app’s -J (TIKA-1498) – JSONified

RE: Forbidden-APIS no longer ran because of carzy POM change

2015-01-23 Thread Allison, Timothy B.
Uwe, To confirm, we need to add this <pluginManagement>...</pluginManagement> fully as it is in the parent pom.xml; we should not put the plugin under our regular plugins (which no longer have pluginManagement)? -Original Message- From: Uwe Schindler [mailto:u...@thetaphi.de] Sent:

RE: Forbidden-APIS no longer ran because of carzy POM change

2015-01-23 Thread Allison, Timothy B.
Will do. -Original Message- From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov] Sent: Friday, January 23, 2015 2:11 PM To: dev@tika.apache.org Subject: Re: Forbidden-APIS no longer ran because of carzy POM change awesome. Thanks Uwe. Tim you want to put that in, or

RE: [VOTE] Apache Tika 1.7 Release

2015-01-14 Thread Allison, Timothy B.
+1 Built successfully on both Windows 7 and RHEL 6.5 for me...no Tesseract installed. Relying on post rc2 release eval for TIKA-1445 against trunk for no new regressions. Manually confirmed image metadata is being extracted. Thank you, Tyler! Best, Tim

RE: 1.7 release? | potential blocker?

2015-01-05 Thread Allison, Timothy B.
/ ++ Adjunct Associate Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++ -Original Message- From: Allison, Timothy B

RE: 1.7 release? | potential blocker?

2015-01-05 Thread Allison, Timothy B.
, though -- I'll wait for Tim's patch then send an RC#2. Sound good? Tyler On Mon, Jan 5, 2015 at 8:09 AM, Allison, Timothy B. talli...@mitre.org wrote: All, I think I may have found a problem with the interaction of OutlookPSTParser with AutoDetectParser that I'd want to fix before 1.7

level of interest in database file parsing?

2015-01-05 Thread Allison, Timothy B.
All, Thanks to Nick for adding mime info for db files, we can now identify several common db files. What is the community's level of interest in adding parsers for databases that store data in one file, such as .mdb, .dbf, .sqlite, .hsqldb ... (others?)? Most of the jdbc drivers are not

RE: TestMultiPart tests failing

2015-01-12 Thread Allison, Timothy B.
Chris, Is this on an updated and/or reverted trunk or on a modified rc-3? I haven't gotten around to installing tesseract yet so I can't actually kick the tires, but the last time there was a test for 5 items on line 91 of RFC822ParserTest was in r1552405...before the fixes for TIKA-1422.

RE: ExternalParser isn't called

2015-01-13 Thread Allison, Timothy B.
Chris, Should we interpret this as -1 on rc3 from you? Or should we go forth with testing and voting on rc3? Thank you! Best, Tim -Original Message- From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov] Sent: Monday, January

Tika 2.0 discussion

2015-01-14 Thread Allison, Timothy B.
All, I just started a wiki page for our discussion of Tika 2.0 (https://wiki.apache.org/tika/Tika2_0RoadMap). Please modify/edit/discuss as you see fit. On a related note, I also started a wiki page for our CompositeParser strategy discussion

RE: Parser that includes LGPL as provided?

2015-02-13 Thread Allison, Timothy B.
, February 13, 2015 7:51 AM To: dev@tika.apache.org Subject: Re: Parser that includes LGPL as provided? On Fri, 13 Feb 2015, Allison, Timothy B. wrote: After I dig myself out of several other issues that I'd like to tackle, I'd like to add a parser for MSAccess files. There's a pure java LGPL

RE: svn commit: r1658847 - /tika/trunk/tika-server/pom.xml

2015-02-11 Thread Allison, Timothy B.
I'm working behind a proxy and getting a new proxy error (proxy unacknowledged) with r1658847 on tika-server package. -Original Message- From: Nick Burch [mailto:apa...@gagravarr.org] Sent: Wednesday, February 11, 2015 6:18 AM To: dev@tika.apache.org Subject: Re: svn commit: r1658847 -

grib adds ~14 MB to tika-app?

2015-02-12 Thread Allison, Timothy B.
All, I just noticed that tika-app has gone from ~30MB to ~44MB, ~20k file to ~27k files. 3.5 of those new MB are for README.NLDAS1.pdf and README.NLDAS2.pdf. Can we exclude those in the app and server? Are there other items that we should exclude? Cheers,

Parser that includes LGPL as provided?

2015-02-13 Thread Allison, Timothy B.
All, After I dig myself out of several other issues that I'd like to tackle, I'd like to add a parser for MSAccess files. There's a pure java LGPL library, Jackcess, available on maven, and it appears to be quite active. I know we have a list of third party parsers, but I'm wondering if we

Re: grib adds ~14 MB to tika-app?

2015-02-13 Thread Allison, Timothy B.
/ ++ Adjunct Associate Professor, Computer Science Department University of Southern California, Los Angeles, CA 90089 USA ++ -Original Message- From: Allison, Timothy B. talli...@mitre.org Reply-To: dev

RE: svn commit: r1664641 - /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/pdf/PDFParserTest.java

2015-03-06 Thread Allison, Timothy B.
In the back of my memory, there's a ticket open for fixing the logged messages from PDFBox (or maybe just fixing the pdfs that triggered the messages), but I can't find it quickly. It may have been a smaller part of something that we've already closed out, or it might still be open. Tyler,

RE: Parser test resources

2015-03-10 Thread Allison, Timothy B.
Hi Tyler, This has started to irk me as well, a bit. I don't think there's much overlap, although there is some. I think navigating standard package resource paths might be cumbersome even with a good IDE... perhaps start with high-level subdirectories as chm is now doing? -Original

Re: [DISCUSS] Tika 1.8 or 1.7.1

2015-03-28 Thread Allison, Timothy B.
Once we fix TIKA-1584, I don't have a preference. I defer to Chris's experience (so I guess, +1 for 1.8) given the amount of work required. It'd be great if we could make sure we aren't bundling any pdfs in our tika-app jar, too. Many apologies if that's been fixed!

including refactored docs from govdocs1 in test suite

2015-03-30 Thread Allison, Timothy B.
- From: Allison, Timothy B. [mailto:talli...@mitre.org] Sent: Monday, March 30, 2015 7:03 AM To: dev@tika.apache.org Subject: RE: [DISCUSS] Tika 1.8 or 1.7.1 Unless there are objections, I'd like these to be resolved before 1.8: TIKA-1584 -- I'll fix TIKA-1575 -- Resolved by Konstantin Gribov (thank

RE: including refactored docs from govdocs1 in test suite

2015-03-30 Thread Allison, Timothy B.
the hyperlink into a new doc and change the URL? I have no idea about including the modified version. Tyler On Mar 30, 2015 9:18 AM, Allison, Timothy B. talli...@mitre.org wrote: All, As part of TIKA-1512, I found that I can delete all of the contents, including the metadata, except for one hyperlink

RE: [DISCUSS] Tika 1.8 or 1.7.1

2015-03-30 Thread Allison, Timothy B.
All, I've made the changes that I had hoped to. Grib pdf exclusion remains for any takers. Let me know when I should initiate the run against govdocs1 to see if there are any surprises on that corpus with Tika 1.8. Best, Tim -Original Message- From: Allison, Timothy B

RE: [jira] [Commented] (TIKA-1584) Tika 1.7 possible regression (nested attachment files not getting parsed)

2015-03-30 Thread Allison, Timothy B.
Backwards compatibility issue found by clirr on TIKA-1587 [INFO] --- clirr-maven-plugin:2.3:check (default) @ tika-core --- [ERROR] org.apache.tika.fork.ForkParser: Return type of method 'public java.lang.String getJavaCommand()' has been changed to java.util.List [ERROR]

RE: [DISCUSS] Tika 1.8 or 1.7.1

2015-03-30 Thread Allison, Timothy B.
Unless there are objections, I'd like these to be resolved before 1.8: TIKA-1584 -- I'll fix TIKA-1575 -- Resolved by Konstantin Gribov (thank you!) TIKA-1512 -- I'll put in a temporary fix so that we don't get IOOBEs, but I'll leave this open and do some more digging to see if we need to open a

RE: Broken build because of clirr plugin

2015-03-30 Thread Allison, Timothy B.
How much of an effort would it be to migrate somewhat slowly: Leave in but deprecate setCommandLine(String ) and String getCommandLine() Add something like: setCommandLineArr(String[] ) and String[] getCommandLineArr()? -Original Message- From: Konstantin Gribov
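The gradual migration proposed above might look roughly like the sketch below. `CommandLineHolder` is a hypothetical stand-in written for this example, not the actual Tika class; only the four method names come from the message. The whitespace split in the deprecated setter is an assumption about how a legacy string could be mapped onto the array form, and it does not handle quoted arguments.

```java
import java.util.Arrays;

// Hypothetical stand-in for the class under discussion; not Tika's code.
final class CommandLineHolder {

    // Default value is illustrative only.
    private String[] commandLine = {"java", "-Xmx32m"};

    /** @deprecated use {@link #setCommandLineArr(String[])} instead. */
    @Deprecated
    public void setCommandLine(String commandLine) {
        // Naive whitespace split; quoted arguments are not supported.
        setCommandLineArr(commandLine.trim().split("\\s+"));
    }

    /** @deprecated use {@link #getCommandLineArr()} instead. */
    @Deprecated
    public String getCommandLine() {
        StringBuilder sb = new StringBuilder();
        for (String part : commandLine) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(part);
        }
        return sb.toString();
    }

    // New array-based accessors; defensive copies keep callers from
    // mutating the internal state.
    public void setCommandLineArr(String[] commandLine) {
        this.commandLine = Arrays.copyOf(commandLine, commandLine.length);
    }

    public String[] getCommandLineArr() {
        return Arrays.copyOf(commandLine, commandLine.length);
    }
}
```

Keeping the old signatures intact means a binary-compatibility check has no changed return types to flag, while internal callers can move to the array form.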

RE: Broken build because of clirr plugin

2015-03-30 Thread Allison, Timothy B.
) to avoid build failure. And use new ones internally. I'll do `mvn verify` before commiting this time. Sorry for inconvenience. -- Best regards, Konstantin Gribov пн, 30 марта 2015 г. в 18:09, Allison, Timothy B. talli...@mitre.org: How much of an effort would it be to migrate somewhat slowly

RE: Licensing Question

2015-03-23 Thread Allison, Timothy B.
I wonder if it is time to do a re-copy. :) -Original Message- From: Tyler Palsulich [mailto:tpalsul...@gmail.com] Sent: Friday, March 20, 2015 5:17 PM To: dev@tika.apache.org Subject: Re: Licensing Question Perfect. I should have thought of the commit message. Thank you, Ken! Tyler On

RE: Access Control Allow Origin

2015-04-01 Thread Allison, Timothy B.
Might be thinking of TIKA-944? Mind if we switch the CORS short option to -C and use -c for the tika config file? -Original Message- From: Tyler Palsulich [mailto:tpalsul...@gmail.com] Sent: Wednesday, April 01, 2015 11:13 AM To: dev@tika.apache.org Subject: Re: Access Control Allow

RE: Rackspace VM and Standing up Tika Server

2015-01-29 Thread Allison, Timothy B.
, Timothy B. Cc: dev@tika.apache.org Subject: Rackspace VM and Standing up Tika Server Hi Tim, Can you please fill us in with the current status with the Tika + Rackspace effort. I have neglected this so apologies. I want to document what is available on the Tika wiki so we do not loose it again. I

RE: TIKA-1423 Build a parser to extract data from GRIB formats not good with Java 6

2015-01-29 Thread Allison, Timothy B.
+1 to dropping 1.6...let's move to 1.8 and beyond! :) -Original Message- From: Lewis John Mcgibbney [mailto:lewis.mcgibb...@gmail.com] Sent: Thursday, January 29, 2015 6:51 PM To: dev@tika.apache.org Subject: TIKA-1423 Build a parser to extract data from GRIB formats not good with Java

RE: [jira] [Commented] (TIKA-1330) Add robust tika-batch code

2015-04-01 Thread Allison, Timothy B.
at 2:55 PM, Allison, Timothy B. talli...@mitre.org wrote: This looks like a Hudson hiccup. Tyler is seeing excessive logging: Running org.apache.tika.cli.TikaCLIBatchIntegrationTest INFO - about to start driver INFO - about to start driver Anyone else having problems building from a fresh

RE: Any interest in running Apache Tika as part of CommonCrawl?

2015-04-03 Thread Allison, Timothy B.
Sorry, link wasn’t included: https://groups.google.com/forum/#!topic/common-crawl/Cv21VRQjGN0 From: tallison314...@gmail.com [mailto:tallison314...@gmail.com] Sent: Friday, April 03, 2015 8:35 AM To: d...@pdfbox.apache.org; dev@tika.apache.org; d...@poi.apache.org Subject: Fwd: Any interest in

RE: [ANNOUNCE] Apache Tika 1.8 Released

2015-04-21 Thread Allison, Timothy B.
Thank you, Tyler! -Original Message- From: Tyler Palsulich [mailto:tpalsul...@apache.org] Sent: Monday, April 20, 2015 5:09 PM To: dev@tika.apache.org; u...@tika.apache.org; annou...@apache.org Subject: [ANNOUNCE] Apache Tika 1.8 Released The Apache Tika project is pleased to announce

RE: comparing Tika's file detect with other tools?

2015-04-22 Thread Allison, Timothy B.
Oops, our emails passed in the ether. Thank you, Jukka! -Original Message- From: Jukka Zitting [mailto:jukka.zitt...@gmail.com] Sent: Wednesday, April 22, 2015 12:06 PM To: dev@tika.apache.org Subject: Re: comparing Tika's file detect with other tools? Hi, Copyright also covers

RE: comparing Tika's file detect with other tools?

2015-04-22 Thread Allison, Timothy B.
: Allison, Timothy B. Sent: April 22, 2015 5:47:17am PDT To: dev@tika.apache.org Subject: comparing Tika's file detect with other tools? Would it be frowned upon to compare Tika's file detection with other tools, like file? Any concerns about effectively reverse engineering (when we find

comparing Tika's file detect with other tools?

2015-04-22 Thread Allison, Timothy B.
Would it be frowned upon to compare Tika's file detection with other tools, like file? Any concerns about effectively reverse engineering (when we find that Tika is wrong) from a non-Apache project? Any other sensitivities I should be aware of? Best, Tim

RE: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-20 Thread Allison, Timothy B.
wrote: From: Allison, Timothy B. Sent: April 20, 2015 5:11:04am PDT To: dev@tika.apache.org Subject: RE: [VOTE] Apache Tika 1.8 Release Candidate #2 If I understand correctly, if we release rc2, Tika 1.8 will break in Hadoop clusters across the land?! Or, Hadoop folks will have

Java 1.6 support for Tika 1.9?

2015-04-27 Thread Allison, Timothy B.
Hi All, I can't remember where we are on this. Are we dropping support for Java 1.6 in Tika 1.9? If so, should we open an issue to integrate tika-java7 into core, add diamond operators, catching multiple exceptions... anything else...? Or, do we want to wait for Tika 2.0 or Tika 1.10?
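For readers unfamiliar with the two Java 7 features named above, here is a minimal sketch of the diamond operator and multi-catch. The class `Java7Idioms` and its methods are invented for this example and are not taken from the Tika code base.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Illustrative only; not Tika code.
final class Java7Idioms {

    // Diamond operator: the type arguments on the right-hand side
    // are inferred from the declared type on the left.
    static Map<String, List<String>> groupByExtension(List<String> names) {
        Map<String, List<String>> byExt = new HashMap<>();
        for (String name : names) {
            int dot = name.lastIndexOf('.');
            String ext = dot < 0 ? "" : name.substring(dot + 1);
            List<String> bucket = byExt.get(ext);
            if (bucket == null) {
                bucket = new ArrayList<>();
                byExt.put(ext, bucket);
            }
            bucket.add(name);
        }
        return byExt;
    }

    // Multi-catch: one handler covers several unrelated exception types.
    static int parseOrDefault(String value, int fallback) {
        try {
            return Integer.parseInt(value.trim());
        } catch (NumberFormatException | NullPointerException e) {
            return fallback;
        }
    }
}
```

Neither feature changes behaviour; both only remove repetition, which is why such a cleanup could be done mechanically once Java 1.6 support is dropped.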

topic change: common crawl slice on TIKA-1302 vm

2015-04-14 Thread Allison, Timothy B.
: [VOTE] Apache Tika 1.8 Release Candidate #2 Hi Tim Great to hear that you managed to use the dataset from CommonCrawl. Thanks! Julien On 14 April 2015 at 14:15, Allison, Timothy B. talli...@mitre.org wrote: +1 Thank you, Tyler! Apologies to Hong-Thai and community for not recognizing

RE: [VOTE] Release Apache Tika 1.8 Candidate #1

2015-04-12 Thread Allison, Timothy B.
to make sure the above issues are (believed to be) settled before the next cut. Thanks, Tyler On Apr 10, 2015 4:55 PM, David Meikle loo...@gmail.com wrote: On 10 Apr 2015, at 11:38, Allison, Timothy B. talli...@mitre.org wrote: I agree that the ODT issue might require a respin. What do

Re: [VOTE] Apache Tika 1.8 Release Candidate #2

2015-04-14 Thread Allison, Timothy B.
+1 Thank you, Tyler! Apologies to Hong-Thai and community for not recognizing the severity of TIKA-1600 when I voted in favor of rc1! Details... I reran against govdocs1, and there aren't any major surprises. On our Rackspace vm, I _finally_ unzipped the Common Crawl slice that Julien

[COMPRESS and others] FW: Any interest in running Apache Tika as part of CommonCrawl?

2015-04-07 Thread Allison, Timothy B.
here: https://groups.google.com/forum/#!topic/common-crawl/Cv21VRQjGN0 I’ve tried to follow Commons’ vernacular, and I’ve added [COMPRESS] to the Subject line. Please invite others who might have an interest in this work. Best, Tim From: Allison, Timothy B. Sent

RE: [jira] [Commented] (TIKA-1330) Add robust tika-batch code

2015-04-01 Thread Allison, Timothy B.
This looks like a Hudson hiccup. Tyler is seeing excessive logging: Running org.apache.tika.cli.TikaCLIBatchIntegrationTest INFO - about to start driver INFO - about to start driver Anyone else having problems building from a fresh trunk? -Original Message- From: Hudson (JIRA)

RE: [VOTE] Release Apache Tika 1.8 Candidate #1

2015-04-09 Thread Allison, Timothy B.
I just finished the run against govdocs1 with 1.7 vs. 1.8-rc1, and all looks good with one major change... on first glance. Because of my fix on TIKA-1519 and the law of unintended consequences, files that start like so: <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"

RE: [VOTE] Release Apache Tika 1.8 Candidate #1

2015-04-09 Thread Allison, Timothy B.
For those who want to take a look at the reports (much more work is needed on processing stack traces for SORT_STACK_TRACE): https://github.com/tballison/share/blob/master/tika_comparisons/tika_1_7_v_1_8-rc1.zip

FW: Any interest in running Apache Tika as part of CommonCrawl?

2015-04-03 Thread Allison, Timothy B.
All, What do you think? https://groups.google.com/forum/#!topic/common-crawl/Cv21VRQjGN0 On Friday, April 3, 2015 at 8:23:11 AM UTC-4, talliso...@gmail.com wrote: CommonCrawl currently has the WET format that extracts plain text from web pages. My guess is that

RE: [DISCUSS] 1.9 Tika release?

2015-06-04 Thread Allison, Timothy B.
-excel 6116 847/847762.ppt 847762.ppt/992 application/vnd.ms-excel 6119 Looks like the majority are embedded in ppt, but there are several embedded in xls as well. Cheers, Tim -Original Message- From: Allison, Timothy B. [mailto:talli...@mitre.org] Sent: Wednesday

RE: [DISCUSS] 1.9 Tika release?

2015-06-03 Thread Allison, Timothy B.
Fixed eval code, thanks to Nick. Now running against doc/x list fixes to confirm success. Will rerun tomorrow on full set, with results by noon EDT. -Original Message- From: Allison, Timothy B. [mailto:talli...@mitre.org] Sent: Wednesday, June 03, 2015 7:28 AM To: dev@tika.apache.org

RE: [DISCUSS] 1.9 Tika release?

2015-06-05 Thread Allison, Timothy B.
Thank you, Nick! -Original Message- From: Nick Burch [mailto:apa...@gagravarr.org] Sent: Friday, June 05, 2015 6:15 AM To: dev@tika.apache.org Subject: RE: [DISCUSS] 1.9 Tika release? text/dif+xml-application/dif+xml Expected and fine Agreed on the mime type, but is there a reason

RE: [VOTE] Release Apache Tika 1.9 Candidate #2

2015-06-08 Thread Allison, Timothy B.
+1 Built in Windows and Linux. Works on problems (that I caused!) in rc1. Let's make sure to include last Java 1.6 version in the release notes, if that's what we've decided. Thank you, Chris! Best, Tim -Original Message- From: Mattmann, Chris A (3980)
