svn commit: r957258 - /tika/trunk/tika-parent/pom.xml

2010-06-23 nick
Author: nick Date: Wed Jun 23 16:12:23 2010 New Revision: 957258 URL: http://svn.apache.org/viewvc?rev=957258&view=rev Log: Add myself to the committers list, and remove Ken Krugler's duplicate entry Modified: tika/trunk/tika-parent/pom.xml Modified: tika/trunk/tika-parent/pom.xml URL: http

svn commit: r957271 - in /tika/trunk/tika-parsers/src: main/java/org/apache/tika/parser/microsoft/ test/java/org/apache/tika/parser/microsoft/ooxml/ test/resources/test-documents/

2010-06-23 nick
Author: nick Date: Wed Jun 23 16:57:58 2010 New Revision: 957271 URL: http://svn.apache.org/viewvc?rev=957271&view=rev Log: Apply patch from Maxim Valyanskiy from TIKA-437 - support encrypted OOXML office files which use the default password. Added: tika/trunk/tika-parsers/src/test

svn commit: r958581 - in /tika/trunk: tika-core/src/main/java/org/apache/tika/metadata/ tika-parsers/src/main/java/org/apache/tika/parser/image/ tika-parsers/src/main/java/org/apache/tika/parser/jpeg/

2010-06-28 nick
Author: nick Date: Mon Jun 28 13:59:08 2010 New Revision: 958581 URL: http://svn.apache.org/viewvc?rev=958581&view=rev Log: Use the new TIFF Metadata entries for image width/length/sampling from the TIFF, JPEG and general Image (ImageIO) parsers. Gives a small number of consistent image related

svn commit: r958924 - in /tika/trunk/tika-parsers/src: main/java/org/apache/tika/parser/microsoft/ooxml/ test/java/org/apache/tika/parser/microsoft/ooxml/ test/resources/test-documents/

2010-06-29 nick
Author: nick Date: Tue Jun 29 11:12:15 2010 New Revision: 958924 URL: http://svn.apache.org/viewvc?rev=958924&view=rev Log: Unit test to show that we support pptx, pptm, ppsx and ppsm (TIKA-418) .thmx will need a POI upgrade, but the file format lacks any text! .xps is still unsupported by POI

svn commit: r958942 - in /tika/trunk/tika-parsers/src: main/java/org/apache/tika/parser/html/ main/java/org/apache/tika/parser/image/ main/java/org/apache/tika/parser/jpeg/ test/java/org/apache/tika/p

2010-06-29 nick
Author: nick Date: Tue Jun 29 12:06:19 2010 New Revision: 958942 URL: http://svn.apache.org/viewvc?rev=958942&view=rev Log: Enable extraction of longitude and latitude from JPEG/Tiff files (via the EXIF tags), and HTML (via the ICBM meta tag), to the new geographic metadata namespace Added
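
The ICBM meta tag named above carries a position as a comma-separated "latitude, longitude" pair in decimal degrees, e.g. <meta name="ICBM" content="50.167958, -97.133185">. The Java sketch below parses such a value under that assumption; the class and method names are hypothetical and are not taken from Tika's HTML parser.

```java
// Hypothetical helper for illustration; not Tika's HtmlParser code.
// Assumes the ICBM content attribute holds "latitude, longitude" in decimal degrees.
final class IcbmParser {
    private IcbmParser() {}

    /** Returns {latitude, longitude}, or null if the value is malformed or out of range. */
    static double[] parse(String content) {
        if (content == null) {
            return null;
        }
        String[] parts = content.split(",");
        if (parts.length != 2) {
            return null;
        }
        try {
            double lat = Double.parseDouble(parts[0].trim());
            double lon = Double.parseDouble(parts[1].trim());
            // Written as a negated range test so that NaN is rejected too.
            if (!(lat >= -90 && lat <= 90 && lon >= -180 && lon <= 180)) {
                return null;
            }
            return new double[] {lat, lon};
        } catch (NumberFormatException e) {
            return null;
        }
    }
}
```

For example, IcbmParser.parse("50.167958, -97.133185") yields {50.167958, -97.133185}; out-of-range coordinates are rejected rather than clamped.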

svn commit: r964235 - in /tika/trunk: tika-core/src/main/java/org/apache/tika/metadata/ tika-core/src/test/java/org/apache/tika/metadata/ tika-parsers/src/main/java/org/apache/tika/parser/iwork/

2010-07-14 nick
Author: nick Date: Wed Jul 14 22:46:50 2010 New Revision: 964235 URL: http://svn.apache.org/viewvc?rev=964235&view=rev Log: TIKA-451 - Inconsistent date format for Metadata.CREATION_DATE and Metadata.LAST_MODIFIED Make CREATION_DATE and LAST_MODIFIED Date property instances, and add support
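
TIKA-451 is about giving CREATION_DATE and LAST_MODIFIED one consistent date representation. As a rough illustration only, here is a Java helper that always renders a java.util.Date as ISO 8601 in UTC; the pattern and the class name are assumptions of this sketch, not a description of what the commit itself changed.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

// Hypothetical helper: one canonical text form for date-valued metadata.
// The ISO 8601 UTC pattern is an assumption made for this sketch.
final class MetadataDates {
    private MetadataDates() {}

    static String format(Date date) {
        // SimpleDateFormat is not thread-safe, so build a fresh one per call.
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        // Always format in UTC so the output does not depend on the JVM's time zone.
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f.format(date);
    }
}
```

Fixing both the pattern and the time zone is what makes two parsers that see the same instant produce identical strings.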

svn commit: r979256 - /tika/trunk/tika-bundle/pom.xml

2010-07-26 nick
Author: nick Date: Mon Jul 26 12:15:05 2010 New Revision: 979256 URL: http://svn.apache.org/viewvc?rev=979256&view=rev Log: Add the new rome dependency to the bundle (TIKA-466) Modified: tika/trunk/tika-bundle/pom.xml Modified: tika/trunk/tika-bundle/pom.xml URL: http://svn.apache.org

svn commit: r980508 - in /tika/trunk/tika-core/src: main/java/org/apache/tika/detect/ main/java/org/apache/tika/mime/ test/java/org/apache/tika/mime/ test/resources/org/apache/tika/mime/

2010-07-29 nick
Author: nick Date: Thu Jul 29 16:59:14 2010 New Revision: 980508 URL: http://svn.apache.org/viewvc?rev=980508&view=rev Log: Make mime type detection a little bit more stable (TIKA-391) Make the comparison operator work better on Magic types, and ensure that the type is present on the magic

svn commit: r985248 - in /tika/site/publish: ./ 0.5/ 0.6/ 0.7/

2010-08-13 nick
Author: nick Date: Fri Aug 13 15:35:23 2010 New Revision: 985248 URL: http://svn.apache.org/viewvc?rev=985248&view=rev Log: Update the site build to include the detection page Added: tika/site/publish/0.7/detection.html Modified: tika/site/publish/0.5/documentation.html tika/site

svn commit: r993108 - in /tika/trunk: tika-core/src/main/java/org/apache/tika/io/ tika-parsers/src/main/java/org/apache/tika/detect/ tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ tika-p

2010-09-06 nick
Author: nick Date: Mon Sep 6 17:42:52 2010 New Revision: 993108 URL: http://svn.apache.org/viewvc?rev=993108&view=rev Log: Add support to the ContainerAwareDetector for Corel OLE2 formats, and Microsoft Works (TIKA-486) Also slightly refactor the child container detectors, so we can do

svn commit: r993113 - in /tika/trunk/tika-parsers/src: main/java/org/apache/tika/detect/ContainerAwareDetector.java test/java/org/apache/tika/detect/TestContainerAwareDetector.java

2010-09-06 nick
Author: nick Date: Mon Sep 6 17:51:43 2010 New Revision: 993113 URL: http://svn.apache.org/viewvc?rev=993113&view=rev Log: Apply (with slight tweaks) Antoni Mylka's container aware detector patch for truncated OLE2 documents - TIKA-485 Modified: tika/trunk/tika-parsers/src/main/java/org

svn commit: r1024255 - in /tika/trunk/tika-parsers/src: main/java/org/apache/tika/detect/ main/java/org/apache/tika/parser/iwork/ test/java/org/apache/tika/detect/ test/resources/test-documents/

2010-10-19 nick
Author: nick Date: Tue Oct 19 14:56:54 2010 New Revision: 1024255 URL: http://svn.apache.org/viewvc?rev=1024255&view=rev Log: Add iWork support to the Container Aware Detector (TIKA-533) It's a bit icky for now, but it works and it's quick... Added: tika/trunk/tika-parsers/src/test/resources

svn commit: r1024291 - /tika/trunk/tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java

2010-10-19 nick
Author: nick Date: Tue Oct 19 15:54:41 2010 New Revision: 1024291 URL: http://svn.apache.org/viewvc?rev=1024291&view=rev Log: Add --container-aware-detector option to the Tika CLI, which will switch the detector used by the auto parser Modified: tika/trunk/tika-app/src/main/java/org/apache

svn commit: r1034463 - in /tika/trunk/tika-parsers/src: main/java/org/apache/tika/parser/microsoft/ main/java/org/apache/tika/parser/microsoft/ooxml/ test/java/org/apache/tika/parser/microsoft/ooxml/

2010-11-12 nick
Author: nick Date: Fri Nov 12 16:40:43 2010 New Revision: 1034463 URL: http://svn.apache.org/viewvc?rev=1034463&view=rev Log: TIKA-552 - Handle word styles like heading 4 just like Heading 4, and in .docx files insert bookmarks as anchor tags, along with relative hyperlinks for the text

svn commit: r1039496 - /tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml

2010-11-26 nick
Author: nick Date: Fri Nov 26 18:25:38 2010 New Revision: 1039496 URL: http://svn.apache.org/viewvc?rev=1039496&view=rev Log: Apply mimetype updates from TIKA-560 Modified: tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml Modified: tika/trunk/tika-core/src

svn commit: r1045006 - in /tika/trunk: tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java tika-parsers/src/test/res

2010-12-12 nick
Author: nick Date: Mon Dec 13 02:41:59 2010 New Revision: 1045006 URL: http://svn.apache.org/viewvc?rev=1045006&view=rev Log: Apply patch from TIKA-570 from Benson Margulies - stricter BMP detection and unit test Added: tika/trunk/tika-parsers/src/test/resources/test-documents/testBMPfp.txt

svn commit: r1060387 - /tika/trunk/tika-parsers/src/test/resources/test-documents/testACCESS.mdb

2011-01-18 nick
Author: nick Date: Tue Jan 18 14:15:47 2011 New Revision: 1060387 URL: http://svn.apache.org/viewvc?rev=1060387&view=rev Log: Add test access mdb file from TIKA-586 Added: tika/trunk/tika-parsers/src/test/resources/test-documents/testACCESS.mdb (with props) Added: tika/trunk/tika-parsers

svn commit: r1060389 - /tika/trunk/tika-parsers/src/test/resources/test-documents/testTrueType.ttf

2011-01-18 nick
Author: nick Date: Tue Jan 18 14:17:07 2011 New Revision: 1060389 URL: http://svn.apache.org/viewvc?rev=1060389&view=rev Log: Add test true type font file from TIKA-586 Added: tika/trunk/tika-parsers/src/test/resources/test-documents/testTrueType.ttf (with props) Added: tika/trunk/tika

svn commit: r1060393 - in /tika/trunk: tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java

2011-01-18 nick
Author: nick Date: Tue Jan 18 14:28:33 2011 New Revision: 1060393 URL: http://svn.apache.org/viewvc?rev=1060393&view=rev Log: Access mdb detection and test from Martijn in TIKA-586 Modified: tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml tika/trunk/tika

svn commit: r1064232 - in /tika/trunk: tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java

2011-01-27 nick
Author: nick Date: Thu Jan 27 17:54:37 2011 New Revision: 1064232 URL: http://svn.apache.org/viewvc?rev=1064232&view=rev Log: Fix up the iwork mime types with the patch from TIKA-588, and also add a unit test for the detection using the non-container detector (we already had container aware

svn commit: r1078031 - /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mp3/LyricsHandler.java

2011-03-04 nick
Author: nick Date: Fri Mar 4 16:06:28 2011 New Revision: 1078031 URL: http://svn.apache.org/viewvc?rev=1078031&view=rev Log: TIKA-606 - MP3 lyrics tags use a 6 digit length for the overall size, but only 5 digits for each tag Modified: tika/trunk/tika-parsers/src/main/java/org/apache/tika
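
The distinction in TIKA-606 matches the Lyrics3 v2 layout, where each field carries a 5-digit length but the trailer just before "LYRICS200" uses a 6-digit length for the whole tag. A minimal Java sketch of splitting such a tag into fields follows. It assumes the tag has already been read into a String decoded one byte per character (for example as ISO-8859-1), so character offsets equal byte offsets; it is not Tika's LyricsHandler.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical sketch of Lyrics3 v2 field splitting; not Tika's LyricsHandler.
// Assumed layout: "LYRICSBEGIN", then fields of (3-char id, 5-digit size, data),
// then a 6-digit size for the whole tag, then "LYRICS200".
final class Lyrics3v2 {
    private static final String BEGIN = "LYRICSBEGIN";
    private static final String END = "LYRICS200";

    private Lyrics3v2() {}

    static Map<String, String> parseFields(String tag) {
        if (tag.length() < BEGIN.length() + 6 + END.length()
                || !tag.startsWith(BEGIN) || !tag.endsWith(END)) {
            throw new IllegalArgumentException("not a Lyrics3 v2 tag");
        }
        // The overall size has SIX digits and counts everything before it,
        // including "LYRICSBEGIN".
        int sizePos = tag.length() - END.length() - 6;
        int total = Integer.parseInt(tag.substring(sizePos, sizePos + 6));
        if (total != sizePos) {
            throw new IllegalArgumentException("tag size mismatch");
        }
        Map<String, String> fields = new LinkedHashMap<String, String>();
        int pos = BEGIN.length();
        while (pos < sizePos) {
            if (pos + 8 > sizePos) {
                throw new IllegalArgumentException("truncated field header");
            }
            String id = tag.substring(pos, pos + 3);
            // Each field size has only FIVE digits.
            int len = Integer.parseInt(tag.substring(pos + 3, pos + 8));
            if (pos + 8 + len > sizePos) {
                throw new IllegalArgumentException("field overruns tag");
            }
            fields.put(id, tag.substring(pos + 8, pos + 8 + len));
            pos += 8 + len;
        }
        return fields;
    }
}
```

Reading the trailer with 5 digits instead of 6 (or a field size with 6 instead of 5) shifts every later offset by one, which is why mixing up the two widths breaks parsing.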

svn commit: r1081392 - in /tika/trunk/tika-parsers: ./ src/main/java/org/apache/tika/parser/microsoft/ooxml/ src/test/java/org/apache/tika/detect/

2011-03-14 nick
Author: nick Date: Mon Mar 14 14:27:05 2011 New Revision: 1081392 URL: http://svn.apache.org/viewvc?rev=1081392&view=rev Log: Update the OOXML Excel (.xlsx) extractor to be largely SAX based, to reduce the memory use (it now works in a similar-ish way to the .xls one). Bumps the POI dependency

svn commit: r1081547 - in /tika/trunk: tika-core/src/main/resources/org/apache/tika/mime/ tika-parsers/src/test/java/org/apache/tika/parser/microsoft/ tika-parsers/src/test/resources/test-documents/

2011-03-14 nick
Author: nick Date: Mon Mar 14 20:26:36 2011 New Revision: 1081547 URL: http://svn.apache.org/viewvc?rev=1081547&view=rev Log: Fix the mime magic detection of TNEF files, and add a unit test for it. (The rest of the TNEF support will be committed when POI 3.8 beta 2 is out). (TIKA-615) Added

svn commit: r1082973 - in /tika/trunk/tika-parsers/src: main/java/org/apache/tika/parser/image/ImageMetadataExtractor.java test/java/org/apache/tika/parser/jpeg/JpegParserTest.java test/resources/test

2011-03-18 nick
Author: nick Date: Fri Mar 18 17:00:42 2011 New Revision: 1082973 URL: http://svn.apache.org/viewvc?rev=1082973&view=rev Log: TIKA-534 - When parsing a jpeg file with unhandled tags in it, skip these Added: tika/trunk/tika-parsers/src/test/resources/test-documents

svn commit: r1083119 - /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/dwg/DWGParser.java

2011-03-18 nick
Author: nick Date: Sat Mar 19 01:14:28 2011 New Revision: 1083119 URL: http://svn.apache.org/viewvc?rev=1083119&view=rev Log: Turning an ASCII string into static final bytes without exceptions shouldn't be this hard Fix 1.6ism for TIKA-492 Modified: tika/trunk/tika-parsers/src/main/java
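
The log does not say which call was the problem. A likely candidate is String.getBytes(Charset), which only appeared in Java 6; the Java 5 alternative, getBytes(String), declares a checked UnsupportedEncodingException that a static final field initializer cannot throw. One common workaround, sketched below with a hypothetical class and an illustrative constant, is to route the conversion through a helper method that turns the impossible exception into an error.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical sketch of a Java 5 compatible byte[] constant built from ASCII text.
// String.getBytes(Charset) is Java 6+, and getBytes(String) declares a checked
// UnsupportedEncodingException, so the conversion is moved into a helper.
final class AsciiConstants {
    // Illustrative value only; not taken from Tika's DWGParser.
    static final byte[] MAGIC = toAscii("AC1015");

    private AsciiConstants() {}

    static byte[] toAscii(String s) {
        try {
            return s.getBytes("US-ASCII");
        } catch (UnsupportedEncodingException e) {
            // Every Java platform must support US-ASCII, so this should never happen.
            throw new AssertionError(e);
        }
    }
}
```

The same effect can be had with a static initializer block, but a helper keeps each constant on one line.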

svn commit: r1084658 - /tika/trunk/tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java

2011-03-23 nick
Author: nick Date: Wed Mar 23 18:11:32 2011 New Revision: 1084658 URL: http://svn.apache.org/viewvc?rev=1084658&view=rev Log: Add some more detection tests, which show that for container formats the addition of the filename lets us specialise from eg tika-msoffice to msword Modified: tika

svn commit: r1084796 - /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/image/ImageParserTest.java

2011-03-23 nick
Author: nick Date: Wed Mar 23 22:57:56 2011 New Revision: 1084796 URL: http://svn.apache.org/viewvc?rev=1084796&view=rev Log: Fix deprecated warnings Modified: tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/image/ImageParserTest.java Modified: tika/trunk/tika-parsers/src/test

svn commit: r1084798 - in /tika/trunk: tika-core/src/main/java/org/apache/tika/parser/CompositeParser.java tika-parsers/src/test/java/org/apache/tika/parser/AutoDetectParserTest.java

2011-03-23 nick
Author: nick Date: Wed Mar 23 23:00:13 2011 New Revision: 1084798 URL: http://svn.apache.org/viewvc?rev=1084798&view=rev Log: When trying to identify a parser for a media type in AutoDetect and similar, if the Parser claims to support an alias of the media type but not the canonical one (eg

svn commit: r1084801 - /tika/trunk/tika-core/src/main/java/org/apache/tika/config/TikaConfig.java

2011-03-23 nick
Author: nick Date: Wed Mar 23 23:18:51 2011 New Revision: 1084801 URL: http://svn.apache.org/viewvc?rev=1084801&view=rev Log: When creating a default TikaConfig instance with a DefaultParser, have the newly created parser wired up with the Mime Type Registry we create. This allows the parser

svn commit: r1084805 - /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/image/ImageParser.java

2011-03-23 nick
Author: nick Date: Wed Mar 23 23:25:59 2011 New Revision: 1084805 URL: http://svn.apache.org/viewvc?rev=1084805&view=rev Log: TIKA-555 fallout - While image/bmp isn't the official mimetype, it is what Java thinks it is. So, switch from the official to the unofficial one before asking Java to give

svn propchange: r1084798 - svn:log

2011-03-24 nick
Author: nick Revision: 1084798 Modified property: svn:log Modified: svn:log at Thu Mar 24 09:58:59 2011 -- --- svn:log (original) +++ svn:log Thu Mar 24 09:58:59 2011 @@ -1,2 +1 @@ -When trying to identify a parser

svn propchange: r1084801 - svn:log

2011-03-24 nick
Author: nick Revision: 1084801 Modified property: svn:log Modified: svn:log at Thu Mar 24 10:00:30 2011 -- --- svn:log (original) +++ svn:log Thu Mar 24 10:00:30 2011 @@ -1 +1 @@ -When creating a default TikaConfig

svn commit: r1085003 - in /tika/trunk: tika-core/src/main/java/org/apache/tika/parser/CompositeParser.java tika-parsers/src/test/java/org/apache/tika/parser/CompositeParserTest.java tika-parsers/src/t

2011-03-24 nick
Author: nick Date: Thu Mar 24 15:35:24 2011 New Revision: 1085003 URL: http://svn.apache.org/viewvc?rev=1085003&view=rev Log: TIKA-620 - Have CompositeParser always use the canonical mimetype internally, via suitable calls to registry.normalise, rather than trying to handle the aliases
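
The approach in TIKA-620 (normalise every media type to its canonical form, then look it up) can be sketched without any Tika classes. In the Java example below, the alias table, the media type strings and the parser names are all hypothetical; real code would take the aliases from the MIME type registry instead.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of "normalise first, then look up": every media type is
// mapped to its canonical form before choosing a parser, so each parser only
// needs to be stored under the canonical name, whatever name it declared.
final class CanonicalLookup {
    private final Map<String, String> aliasToCanonical = new HashMap<String, String>();
    private final Map<String, String> parserByType = new HashMap<String, String>();

    void addAlias(String alias, String canonical) {
        aliasToCanonical.put(alias, canonical);
    }

    String normalise(String type) {
        String canonical = aliasToCanonical.get(type);
        return canonical != null ? canonical : type;
    }

    void register(String type, String parserName) {
        // Stored under the canonical type, even if the parser declared an alias.
        parserByType.put(normalise(type), parserName);
    }

    String parserFor(String type) {
        return parserByType.get(normalise(type));
    }
}
```

Because both registration and lookup go through normalise, a parser declared under an alias is found whether the caller asks with the alias or with the canonical name.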

svn commit: r1086912 - in /tika/site/src/site/apt: 0.8/formats.apt 0.9/formats.apt

2011-03-30 nick
Author: nick Date: Wed Mar 30 11:52:57 2011 New Revision: 1086912 URL: http://svn.apache.org/viewvc?rev=1086912&view=rev Log: TIKA-624 - Update supported formats for 0.8 and 0.9 Modified: tika/site/src/site/apt/0.8/formats.apt tika/site/src/site/apt/0.9/formats.apt Modified: tika/site

svn commit: r1086919 [2/2] - in /tika/site/publish: 0.8/formats.html 0.8/parser.html 0.8/parser_guide.html 0.9/formats.html 0.9/parser.html 0.9/parser_guide.html

2011-03-30 nick
Modified: tika/site/publish/0.9/parser_guide.html URL: http://svn.apache.org/viewvc/tika/site/publish/0.9/parser_guide.html?rev=1086919&r1=1086918&r2=1086919&view=diff == --- tika/site/publish/0.9/parser_guide.html

svn commit: r1087762 - /tika/trunk/tika-parsers/src/test/resources/test-documents/testMSG_chinese.msg

2011-04-01 nick
Author: nick Date: Fri Apr 1 15:32:58 2011 New Revision: 1087762 URL: http://svn.apache.org/viewvc?rev=1087762&view=rev Log: TIKA-631 - Sample Chinese outlook file Added: tika/trunk/tika-parsers/src/test/resources/test-documents/testMSG_chinese.msg (with props) Added: tika/trunk/tika

svn commit: r1089516 - in /tika/trunk/tika-core/src/main/java/org/apache/tika: config/ parser/ parser/external/

2011-04-06 nick
Author: nick Date: Wed Apr 6 16:19:17 2011 New Revision: 1089516 URL: http://svn.apache.org/viewvc?rev=1089516&view=rev Log: TIKA-634 - Initial work on supporting more flexible ExternalParser loading (via XML, part done), and external parser metadata extraction Added: tika/trunk/tika-core

svn commit: r1089518 - in /tika/trunk/tika-parsers/src/main/resources: META-INF/services/ org/ org/apache/ org/apache/tika/ org/apache/tika/parser/ org/apache/tika/parser/external/

2011-04-06 nick
Author: nick Date: Wed Apr 6 16:20:00 2011 New Revision: 1089518 URL: http://svn.apache.org/viewvc?rev=1089518&view=rev Log: TIKA-634 - Example external parsers config file Added: tika/trunk/tika-parsers/src/main/resources/org/ tika/trunk/tika-parsers/src/main/resources/org/apache

svn commit: r1089543 - in /tika/trunk/tika-core/src/main/java/org/apache/tika/parser/external: CompositeExternalParser.java ExternalParser.java ExternalParsersConfigReader.java ExternalParsersConfigRe

2011-04-06 nick
Author: nick Date: Wed Apr 6 17:39:32 2011 New Revision: 1089543 URL: http://svn.apache.org/viewvc?rev=1089543&view=rev Log: TIKA-634 - Add support for checking if the external command is there, for collecting the output from a file, and a wrapper CompositeParser that loads all available

svn commit: r1091042 - in /tika/trunk/tika-parsers: pom.xml src/main/java/org/apache/tika/parser/microsoft/OutlookExtractor.java

2011-04-11 nick
Author: nick Date: Mon Apr 11 11:44:18 2011 New Revision: 1091042 URL: http://svn.apache.org/viewvc?rev=1091042&view=rev Log: TIKA-615 - Outlook parsing update for POI 3.8 beta 2 Modified: tika/trunk/tika-parsers/pom.xml tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser

svn commit: r1091044 - in /tika/trunk/tika-parsers/src: main/java/org/apache/tika/parser/microsoft/TNEFParser.java test/java/org/apache/tika/parser/microsoft/TNEFParserTest.java

2011-04-11 nick
Author: nick Date: Mon Apr 11 11:47:29 2011 New Revision: 1091044 URL: http://svn.apache.org/viewvc?rev=1091044&view=rev Log: TIKA-615 - POI powered TNEF parser Added: tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/TNEFParser.java Modified: tika/trunk/tika

svn commit: r1095429 - /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java

2011-04-20 nick
Author: nick Date: Wed Apr 20 14:59:24 2011 New Revision: 1095429 URL: http://svn.apache.org/viewvc?rev=1095429&view=rev Log: TIKA-644 - When generating html headings from word, h6 is the highest level XHTML allows, so don't try generating h7 (or higher) even if Word has a 'Heading 7' style
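
Together with TIKA-552 above, this amounts to: match Word heading styles case-insensitively, then cap the level at 6, since XHTML only defines h1 to h6. A small Java sketch of that mapping, with a hypothetical class name and an assumed "heading N" style-name pattern (Tika's WordExtractor may match styles differently):

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Tika's WordExtractor: map a Word paragraph style
// name to an XHTML heading element. Matching is case-insensitive, so
// "heading 4" and "Heading 4" behave the same (TIKA-552), and levels above 6
// are clamped, since XHTML only defines h1 to h6 (TIKA-644).
final class WordHeadings {
    private static final Pattern HEADING =
            Pattern.compile("heading\\s*(\\d{1,3})", Pattern.CASE_INSENSITIVE);

    private WordHeadings() {}

    /** Returns "h1".."h6", or null if the style is not a heading style. */
    static String tagFor(String styleName) {
        if (styleName == null) {
            return null;
        }
        Matcher m = HEADING.matcher(styleName.trim());
        if (!m.matches()) {
            return null;
        }
        int level = Integer.parseInt(m.group(1));
        if (level < 1) {
            return null;
        }
        return "h" + Math.min(level, 6);
    }
}
```

Clamping keeps the output valid XHTML at the cost of merging "Heading 7" and deeper levels into h6.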

svn commit: r1095759 - /tika/trunk/tika-parsers/src/main/java/org/apache/tika/detect/POIFSContainerDetector.java

2011-04-21 nick
Author: nick Date: Thu Apr 21 15:58:22 2011 New Revision: 1095759 URL: http://svn.apache.org/viewvc?rev=1095759&view=rev Log: TIKA-643 - Now that we're using NPOIFS which takes files, simplify the code as we don't need to use an InputStream Modified: tika/trunk/tika-parsers/src/main/java

svn commit: r1095760 - in /tika/trunk/tika-core/src/main/java/org/apache/tika: io/TaggedInputStream.java io/TikaInputStream.java parser/CompositeParser.java parser/NetworkParser.java

2011-04-21 nick
Author: nick Date: Thu Apr 21 15:59:42 2011 New Revision: 1095760 URL: http://svn.apache.org/viewvc?rev=1095760&view=rev Log: TIKA-643 - Change TaggedInputStream to work like TikaInputStream for creation, with a static get, to avoid double wrapping. Also adds toString methods on the two
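
The "static get" factory mentioned here is a general way to avoid wrapping a stream that is already wrapped. A minimal Java sketch of the pattern, using an invented class rather than Tika's TaggedInputStream:

```java
import java.io.FilterInputStream;
import java.io.InputStream;

// Hypothetical sketch of the "static get" factory pattern; the class name is
// invented and this is not Tika's TaggedInputStream.
final class TaggingStream extends FilterInputStream {
    private TaggingStream(InputStream in) {
        super(in);
    }

    /** Wraps the stream, unless it is already one of ours. */
    static TaggingStream get(InputStream in) {
        if (in instanceof TaggingStream) {
            return (TaggingStream) in;
        }
        return new TaggingStream(in);
    }

    @Override
    public String toString() {
        // "in" is the protected field holding the wrapped stream.
        return "TaggingStream[" + in + "]";
    }
}
```

Making the constructor private forces every caller through get(), so repeated calls on the same stream return the same wrapper instead of stacking layers.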

svn commit: r1098942 - in /tika/trunk/tika-app: pom.xml src/main/java/org/apache/tika/cli/TikaCLI.java

2011-05-03 nick
Author: nick Date: Tue May 3 07:03:08 2011 New Revision: 1098942 URL: http://svn.apache.org/viewvc?rev=1098942&view=rev Log: TIKA-213 JSON metadata output support, using the GSON library to do most of the work Modified: tika/trunk/tika-app/pom.xml tika/trunk/tika-app/src/main/java/org

svn commit: r1099309 - /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/image/ImageParser.java

2011-05-03 nick
Author: nick Date: Wed May 4 01:06:05 2011 New Revision: 1099309 URL: http://svn.apache.org/viewvc?rev=1099309&view=rev Log: TIKA-619 - Apply patch from Alexander Chow to ignore errors from a JRE GIF bug Modified: tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/image

svn commit: r1100014 - /tika/trunk/tika-parsers/src/main/java/org/apache/tika/detect/ZipContainerDetector.java

2011-05-05 nick
Author: nick Date: Fri May 6 01:21:36 2011 New Revision: 1100014 URL: http://svn.apache.org/viewvc?rev=1100014&view=rev Log: TIKA-654 - Open the OOXML OPCPackage as read only, and fix serial version warning Modified: tika/trunk/tika-parsers/src/main/java/org/apache/tika/detect

svn commit: r1100015 - /tika/trunk/tika-core/src/main/java/org/apache/tika/io/TikaInputStream.java

2011-05-05 nick
Author: nick Date: Fri May 6 01:22:12 2011 New Revision: 1100015 URL: http://svn.apache.org/viewvc?rev=1100015&view=rev Log: TIKA-654 - If we have an open container that can be closed, close it when closing the stream Modified: tika/trunk/tika-core/src/main/java/org/apache/tika/io

svn commit: r1100051 - in /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser: mail/MailContentHandler.java mbox/MboxParser.java

2011-05-05 nick
Author: nick Date: Fri May 6 04:45:49 2011 New Revision: 1100051 URL: http://svn.apache.org/viewvc?rev=1100051&view=rev Log: TIKA-656 RFC822 and MBox parsers should output the same date metadata keys Modified: tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mail

svn commit: r1100053 - in /tika/trunk: tika-core/src/main/java/org/apache/tika/metadata/MSOffice.java tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml/MetadataExtractor.java

2011-05-05 nick
Author: nick Date: Fri May 6 04:50:27 2011 New Revision: 1100053 URL: http://svn.apache.org/viewvc?rev=1100053&view=rev Log: TIKA-656 Switch two more Office metadata keys that hold dates to being typed date properties Modified: tika/trunk/tika-core/src/main/java/org/apache/tika/metadata

svn commit: r1100061 - in /tika/trunk/tika-parsers/src: main/java/org/apache/tika/parser/mbox/MboxParser.java main/java/org/apache/tika/parser/microsoft/OutlookExtractor.java test/java/org/apache/tika

2011-05-05 nick
Author: nick Date: Fri May 6 05:14:39 2011 New Revision: 1100061 URL: http://svn.apache.org/viewvc?rev=1100061&view=rev Log: TIKA-656 Update the Outlook parser to handle dates the same way as the other mail parsers Modified: tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mbox

svn commit: r1100100 - in /tika/trunk: tika-core/src/main/java/org/apache/tika/metadata/ tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ tika-parsers/src/main/java/org/apache/tika/parser/

2011-05-06 nick
Author: nick Date: Fri May 6 06:30:43 2011 New Revision: 1100100 URL: http://svn.apache.org/viewvc?rev=1100100&view=rev Log: TIKA-652 Update the POIFS parser to handle custom metadata entries in the same way that the Open Document one already does Modified: tika/trunk/tika-core/src/main

svn commit: r1101471 - in /tika/trunk: tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java

2011-05-10 nick
Author: nick Date: Tue May 10 14:26:17 2011 New Revision: 1101471 URL: http://svn.apache.org/viewvc?rev=1101471&view=rev Log: TIKA-658 TCPDump pcap mime matching Modified: tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml tika/trunk/tika-parsers/src/test
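
Byte-based pcap matching usually relies on the libpcap file header, whose first four bytes are the magic number 0xa1b2c3d4 written in the capturing machine's byte order. The Java sketch below accepts both orders; it illustrates the idea rather than reproducing the pattern actually added to tika-mimetypes.xml, and it ignores variants such as the nanosecond-resolution magic number.

```java
// Hypothetical sketch of byte-based pcap detection. A classic libpcap file
// starts with the 32-bit magic number 0xa1b2c3d4, stored in the byte order of
// the machine that wrote it, so both byte orders have to be accepted.
final class PcapMagic {
    private PcapMagic() {}

    static boolean looksLikePcap(byte[] head) {
        if (head == null || head.length < 4) {
            return false;
        }
        // Mask to get unsigned values, since Java bytes are signed.
        int b0 = head[0] & 0xff;
        int b1 = head[1] & 0xff;
        int b2 = head[2] & 0xff;
        int b3 = head[3] & 0xff;
        boolean bigEndian = b0 == 0xa1 && b1 == 0xb2 && b2 == 0xc3 && b3 == 0xd4;
        boolean littleEndian = b0 == 0xd4 && b1 == 0xc3 && b2 == 0xb2 && b3 == 0xa1;
        return bigEndian || littleEndian;
    }
}
```

In a mime-info magic rule the same check would typically be expressed as two alternative byte matches at offset 0.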

svn commit: r1103540 - in /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser: odf/ odf/ODFParserTest.java odf/OpenOfficeParserTest.java opendocument/

2011-05-15 nick
Author: nick Date: Sun May 15 20:57:55 2011 New Revision: 1103540 URL: http://svn.apache.org/viewvc?rev=1103540&view=rev Log: TIKA-659 Merge the ODF parser tests, and put them in the new package Added: tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/odf/ - copied from

svn commit: r1103545 - /tika/trunk/tika-core/src/main/java/org/apache/tika/sax/EndDocumentShieldingContentHandler.java

2011-05-15 nick
Author: nick Date: Sun May 15 21:15:21 2011 New Revision: 1103545 URL: http://svn.apache.org/viewvc?rev=1103545&view=rev Log: TIKA-646 Helper class to allow us to avoid calling endDocument until a later time Added: tika/trunk/tika-core/src/main/java/org/apache/tika/sax
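
The idea behind TIKA-646 is a content handler that passes every SAX event through except endDocument(), which it holds back until the caller asks for it. The Java sketch below expresses that with the JDK's XMLFilterImpl; the class and method names are hypothetical and are not taken from the Tika helper added in this commit.

```java
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

// Hypothetical sketch of the idea in TIKA-646, not Tika's class: forward all
// SAX events to the downstream handler except endDocument(), which is only
// recorded, so the caller can decide when the document really ends.
class EndDocumentDeferringFilter extends XMLFilterImpl {
    private boolean endDocumentSeen = false;

    @Override
    public void endDocument() {
        // Swallow the event, but remember that it happened.
        endDocumentSeen = true;
    }

    boolean wasEndDocumentCalled() {
        return endDocumentSeen;
    }

    /** Sends the deferred endDocument() on to the downstream handler. */
    void reallyEndDocument() throws SAXException {
        super.endDocument();
    }
}
```

This is useful when several embedded parsers write into one output handler: each inner parser may call endDocument(), but only the outer caller should actually finish the document.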

svn commit: r1124165 - /tika/trunk/tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java

2011-05-18 nick
Author: nick Date: Wed May 18 10:10:51 2011 New Revision: 1124165 URL: http://svn.apache.org/viewvc?rev=1124165&view=rev Log: TIKA-213 Remove leading zeros from integers when outputting JSON Modified: tika/trunk/tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java Modified: tika/trunk
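
JSON (RFC 8259) does not permit leading zeros in numbers, so a metadata value such as "007" cannot be emitted unchanged as a bare number. The log does not show how TikaCLI handles this; the Java sketch below is one possible approach, with a hypothetical helper that returns the JSON integer form of a plain integer string, or null when the value is not an integer.

```java
import java.util.regex.Pattern;

// Hypothetical sketch, not TikaCLI's code: turn an integer-looking string into
// a valid JSON integer token by stripping leading zeros.
final class JsonNumbers {
    private static final Pattern INTEGER = Pattern.compile("-?\\d+");

    private JsonNumbers() {}

    /** Returns a JSON integer token without leading zeros, or null if not an integer. */
    static String asJsonInteger(String value) {
        if (value == null || !INTEGER.matcher(value).matches()) {
            return null;
        }
        boolean negative = value.startsWith("-");
        String digits = negative ? value.substring(1) : value;
        // Strip leading zeros, but always keep at least one digit.
        int i = 0;
        while (i < digits.length() - 1 && digits.charAt(i) == '0') {
            i++;
        }
        digits = digits.substring(i);
        if (digits.equals("0")) {
            return "0"; // avoid emitting "-0"
        }
        return negative ? "-" + digits : digits;
    }
}
```

Working on the digit string, rather than parsing to int or long, avoids overflow for very long numeric values.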

svn commit: r1124772 - in /tika/trunk: tika-core/src/test/java/org/apache/tika/mime/ tika-core/src/test/java/org/apache/tika/parser/ tika-parsers/src/test/java/org/apache/tika/mime/ tika-parsers/src/t

2011-05-19 nick
Author: nick Date: Thu May 19 13:44:10 2011 New Revision: 1124772 URL: http://svn.apache.org/viewvc?rev=1124772&view=rev Log: TIKA-660 Merge the two CompositeParserTests and PatternsTests into one each in core Added: tika/trunk/tika-core/src/test/java/org/apache/tika/parser/DummyParser.java

svn commit: r1144314 - in /tika/trunk/tika-parsers/src: main/java/org/apache/tika/parser/prt/PRTParser.java test/java/org/apache/tika/parser/prt/PRTParserTest.java test/resources/test-documents/testCA

2011-07-08 nick
Author: nick Date: Fri Jul 8 13:51:49 2011 New Revision: 1144314 URL: http://svn.apache.org/viewvc?rev=1144314&view=rev Log: TIKA-679 Update the CADKEY PRT parser to get the description, and tweak the text encoding based on work by Troy Added: tika/trunk/tika-parsers/src/test/resources/test

svn commit: r1147172 - in /tika/trunk/tika-parsers/src: main/java/org/apache/tika/parser/rtf/RTFParser.java test/java/org/apache/tika/parser/rtf/ test/java/org/apache/tika/parser/rtf/RTFParserTest.jav

2011-07-15 nick
Author: nick Date: Fri Jul 15 14:53:20 2011 New Revision: 1147172 URL: http://svn.apache.org/viewvc?rev=1147172&view=rev Log: TIKA-683 Create a dedicated RTF parser test, based on the existing checks in TestParsers Added: tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/rtf

svn commit: r1147250 - in /tika/trunk: tika-core/src/main/resources/org/apache/tika/mime/ tika-core/src/test/java/org/apache/tika/ tika-parsers/src/test/java/org/apache/tika/mime/ tika-parsers/src/tes

2011-07-15 nick
Author: nick Date: Fri Jul 15 17:04:09 2011 New Revision: 1147250 URL: http://svn.apache.org/viewvc?rev=1147250&view=rev Log: TIKA-507 Split the mime type entries for AFM and PFM (font metrics) out from the fonts themselves, and add magic detection patterns for them Added: tika/trunk/tika

svn commit: r1147262 - in /tika/trunk: tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java

2011-07-15 nick
Author: nick Date: Fri Jul 15 17:47:26 2011 New Revision: 1147262 URL: http://svn.apache.org/viewvc?rev=1147262&view=rev Log: TIKA-507 Add byte based detection tests for .pfa/.pfb/.pfm (which we currently lack free sample files for) Modified: tika/trunk/tika-core/src/main/resources/org

svn commit: r1173761 - in /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft: HSLFExtractor.java ooxml/XSLFPowerPointExtractorDecorator.java

2011-09-21 nick
Author: nick Date: Wed Sep 21 17:03:38 2011 New Revision: 1173761 URL: http://svn.apache.org/viewvc?rev=1173761&view=rev Log: TIKA-712 Fetch Master Slide text for PPT and PPTX text extraction Modified: tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java

svn commit: r1174719 - in /tika/trunk/tika-parsers/src: main/java/org/apache/tika/parser/txt/CharsetMatch.java main/java/org/apache/tika/parser/txt/CharsetRecog_sbcs.java test/java/org/apache/tika/par

2011-09-23 nick
Author: nick Date: Fri Sep 23 12:57:47 2011 New Revision: 1174719 URL: http://svn.apache.org/viewvc?rev=1174719&view=rev Log: TIKA-720 Add documentation for some of CharsetRecog_sbcs, and tweak the EBCDIC bit to avoid false matches for short snippets of HTML Modified: tika/trunk/tika

svn commit: r1175014 - in /tika/trunk/tika-parsers/src: main/java/org/apache/tika/parser/microsoft/OutlookExtractor.java test/java/org/apache/tika/parser/microsoft/OutlookParserTest.java

2011-09-23 nick
Author: nick Date: Fri Sep 23 20:54:36 2011 New Revision: 1175014 URL: http://svn.apache.org/viewvc?rev=1175014&view=rev Log: Add a disabled Outlook RTF related test, pending a fix for TIKA-632. (We're nearly there with the recent RTF improvements, but not quite) Modified: tika/trunk

svn commit: r1177313 - /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java

2011-09-29 nick
Author: nick Date: Thu Sep 29 14:12:21 2011 New Revision: 1177313 URL: http://svn.apache.org/viewvc?rev=1177313&view=rev Log: HSLF Extractor improvements from Pablo from TIKA-727 Modified: tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java Modified

svn commit: r1179669 - in /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mp3: ID3v22Handler.java ID3v23Handler.java ID3v24Handler.java

2011-10-06 nick
Author: nick Date: Thu Oct 6 15:32:23 2011 New Revision: 1179669 URL: http://svn.apache.org/viewvc?rev=1179669&view=rev Log: TIKA-745 If we find an ID3v2 Genre that isn't one of the ones in v1, use it as-is Modified: tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mp3
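
In ID3v2, a genre frame may refer to the ID3v1 genre list by number, as in "(17)", or contain free text. A minimal Java sketch of the rule described above, under those assumptions: resolve known v1 references, and otherwise keep the value as-is. The class is hypothetical, and its table holds only the first 18 ID3v1 genres to keep it short.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch, not Tika's ID3v2 handlers: resolve an ID3v2 genre value.
// "(n)" refers to the ID3v1 genre list; anything that is not a known v1
// reference is used as-is.
final class Id3Genres {
    // Only the first 18 ID3v1 genres, to keep the sketch short; a full table
    // would resolve more "(n)" references.
    private static final String[] V1 = {
        "Blues", "Classic Rock", "Country", "Dance", "Disco", "Funk",
        "Grunge", "Hip-Hop", "Jazz", "Metal", "New Age", "Oldies",
        "Other", "Pop", "R&B", "Rap", "Reggae", "Rock"
    };
    private static final Pattern V1_REF = Pattern.compile("\\((\\d{1,3})\\)");

    private Id3Genres() {}

    static String resolve(String genre) {
        if (genre == null) {
            return null;
        }
        String value = genre.trim();
        Matcher m = V1_REF.matcher(value);
        if (m.matches()) {
            int index = Integer.parseInt(m.group(1));
            if (index < V1.length) {
                return V1[index];
            }
        }
        // Not a known v1 reference: keep the text unchanged.
        return value;
    }
}
```

Falling back to the raw text means free-form genres survive instead of being dropped or mapped to a generic value.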

svn commit: r1179829 - in /tika/trunk/tika-core/src: main/java/org/apache/tika/mime/ test/java/org/apache/tika/mime/ test/resources/org/apache/tika/mime/

2011-10-06 nick
Author: nick Date: Thu Oct 6 20:29:41 2011 New Revision: 1179829 URL: http://svn.apache.org/viewvc?rev=1179829&view=rev Log: TIKA-746 Allow MimeTypesFactory to take more than one resource to load, and update the default to load tika-mimetypes.xml followed by any custom-mimetypes.xml

svn commit: r1180224 - /tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml

2011-10-07 nick
Author: nick Date: Fri Oct 7 20:48:03 2011 New Revision: 1180224 URL: http://svn.apache.org/viewvc?rev=1180224&view=rev Log: TIKA-682 Add mime magic detection for PSD files Modified: tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml Modified: tika/trunk/tika

svn commit: r1180230 - in /tika/trunk/tika-parsers/src: main/java/org/apache/tika/parser/image/ main/resources/META-INF/services/ test/java/org/apache/tika/parser/image/ test/resources/test-documents/

2011-10-07 nick
Author: nick Date: Fri Oct 7 20:52:20 2011 New Revision: 1180230 URL: http://svn.apache.org/viewvc?rev=1180230&view=rev Log: TIKA-682 Add a basic PSD metadata extracting Parser Added: tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java tika/trunk/tika
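
A basic PSD metadata extractor can get the image dimensions from the fixed 26-byte file header, which starts with the signature "8BPS" (the same bytes a magic detection rule would typically key on). The Java sketch below assumes the documented big-endian header layout; it is an illustration, not Tika's PSDParser.

```java
// Hypothetical sketch, not Tika's PSDParser: read the basic fields of the
// 26-byte Photoshop file header. All values are big-endian. Assumed layout:
// signature "8BPS" (4 bytes), version (2), reserved (6), channels (2),
// height (4), width (4), bits per channel (2), colour mode (2).
final class PsdHeader {
    final int version;
    final int channels;
    final int height;
    final int width;
    final int depth;
    final int colorMode;

    private PsdHeader(int version, int channels, int height, int width,
                      int depth, int colorMode) {
        this.version = version;
        this.channels = channels;
        this.height = height;
        this.width = width;
        this.depth = depth;
        this.colorMode = colorMode;
    }

    static PsdHeader read(byte[] b) {
        if (b.length < 26 || b[0] != '8' || b[1] != 'B' || b[2] != 'P' || b[3] != 'S') {
            throw new IllegalArgumentException("not a PSD file header");
        }
        // Offsets 6..11 are reserved and skipped.
        return new PsdHeader(u16(b, 4), u16(b, 12), u32(b, 14), u32(b, 18),
                u16(b, 22), u16(b, 24));
    }

    // Unsigned 16-bit big-endian value.
    private static int u16(byte[] b, int off) {
        return ((b[off] & 0xff) << 8) | (b[off + 1] & 0xff);
    }

    // 32-bit big-endian value; fine for PSD dimensions, which are far below 2^31.
    private static int u32(byte[] b, int off) {
        return (u16(b, off) << 16) | u16(b, off + 2);
    }
}
```

Only the first 26 bytes are needed, so a parser can report width, height and bit depth without decoding any image data.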

svn commit: r1180243 - in /tika/trunk: tika-core/src/main/java/org/apache/tika/io/EndianUtils.java tika-parsers/src/main/java/org/apache/tika/parser/dwg/DWGParser.java tika-parsers/src/main/java/org/a

2011-10-07 nick
Author: nick Date: Fri Oct 7 21:05:22 2011 New Revision: 1180243 URL: http://svn.apache.org/viewvc?rev=1180243&view=rev Log: TIKA-749 Convert the DWG and PRT parsers to use the Tika endian util, rather than the POI one Modified: tika/trunk/tika-core/src/main/java/org/apache/tika/io
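
An endian utility of this kind mainly provides helpers for reading multi-byte integers in a fixed byte order from a buffer. A minimal Java sketch of little-endian reads follows; the method names are invented for illustration and are not claimed to match Tika's EndianUtils.

```java
// Hypothetical sketch of little-endian reads (least-significant byte first).
// Method names are invented here and are not claimed to match Tika's EndianUtils.
final class LittleEndian {
    private LittleEndian() {}

    /** Unsigned 16-bit value from b[off] (low byte) and b[off + 1] (high byte). */
    static int getUShortLE(byte[] b, int off) {
        return (b[off] & 0xff) | ((b[off + 1] & 0xff) << 8);
    }

    /** Unsigned 32-bit value, returned as a long so values above 2^31 stay positive. */
    static long getUIntLE(byte[] b, int off) {
        return (getUShortLE(b, off) & 0xffffL) | ((long) getUShortLE(b, off + 2) << 16);
    }
}
```

Masking each byte with 0xff is the key step: without it, Java's signed bytes would sign-extend and corrupt the high bits.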

svn commit: r1180244 - /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java

2011-10-07 Thread nick
Author: nick Date: Fri Oct 7 21:10:09 2011 New Revision: 1180244 URL: http://svn.apache.org/viewvc?rev=1180244&view=rev Log: TIKA-682 Fix 1.6ism Modified: tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java Modified: tika/trunk/tika-parsers/src/main/java/org

svn commit: r1182805 - /tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml

2011-10-13 Thread nick
Author: nick Date: Thu Oct 13 12:34:50 2011 New Revision: 1182805 URL: http://svn.apache.org/viewvc?rev=1182805&view=rev Log: Add a common alias for the WordPerfect mimetype Modified: tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml Modified: tika/trunk/tika

svn commit: r1185658 - in /tika/trunk/tika-core/src/main/java/org/apache/tika: Tika.java config/TikaConfig.java parser/AutoDetectParser.java

2011-10-18 Thread nick
Author: nick Date: Tue Oct 18 13:52:09 2011 New Revision: 1185658 URL: http://svn.apache.org/viewvc?rev=1185658&view=rev Log: TIKA-755 Have TikaConfig create a DefaultDetector instance based on the supplied MimeTypes and/or ClassLoader, and switch Tika+AutoDetectParser to get their detector from

svn commit: r1202109 - in /tika/trunk/tika-parsers/src: main/java/org/apache/tika/parser/microsoft/POIFSContainerDetector.java test/java/org/apache/tika/detect/TestContainerAwareDetector.java test/res

2011-11-15 Thread nick
Author: nick Date: Tue Nov 15 09:41:46 2011 New Revision: 1202109 URL: http://svn.apache.org/viewvc?rev=1202109&view=rev Log: TIKA-779 Works 2000 container aware detection, plus test Added: tika/trunk/tika-parsers/src/test/resources/test-documents/testWORKS2000.wps (with props) Modified

svn commit: r1203681 - in /tika/trunk/tika-parsers/src/test/resources/test-documents: testDITA.dita testDITA.ditamap testDITA2.dita

2011-11-18 Thread nick
Author: nick Date: Fri Nov 18 15:01:07 2011 New Revision: 1203681 URL: http://svn.apache.org/viewvc?rev=1203681&view=rev Log: TIKA-784 Sample DITA task, concept and map files. (Based on some Alfresco documentation, with content replaced with Tika info) Added: tika/trunk/tika-parsers/src/test

svn commit: r1203689 - in /tika/trunk: tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java

2011-11-18 Thread nick
Author: nick Date: Fri Nov 18 15:13:52 2011 New Revision: 1203689 URL: http://svn.apache.org/viewvc?rev=1203689&view=rev Log: TIKA-784 DITA mimetype entries for the 3 subtypes, plus tests Modified: tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml tika

svn commit: r1204311 - in /tika/trunk: tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java

2011-11-20 Thread nick
Author: nick Date: Mon Nov 21 01:24:58 2011 New Revision: 1204311 URL: http://svn.apache.org/viewvc?rev=1204311&view=rev Log: TIKA-784 Switch the DITA types to be format specialisations, rather than their own dedicated mimetypes, to match the OASIS recommendation Modified: tika/trunk/tika

svn commit: r1204435 - /tika/trunk/tika-parsers/src/test/java/org/apache/tika/detect/TestContainerAwareDetector.java

2011-11-21 Thread nick
Author: nick Date: Mon Nov 21 10:30:22 2011 New Revision: 1204435 URL: http://svn.apache.org/viewvc?rev=1204435&view=rev Log: Expand container detection tests, and add disabled (failing) tests for TIKA-786 Modified: tika/trunk/tika-parsers/src/test/java/org/apache/tika/detect

svn commit: r1204464 - /tika/trunk/tika-core/src/main/java/org/apache/tika/mime/MediaType.java

2011-11-21 Thread nick
Author: nick Date: Mon Nov 21 12:17:48 2011 New Revision: 1204464 URL: http://svn.apache.org/viewvc?rev=1204464&view=rev Log: Add basic JavaDoc for a few MediaType methods that lacked it Modified: tika/trunk/tika-core/src/main/java/org/apache/tika/mime/MediaType.java Modified: tika/trunk

svn commit: r1204476 - in /tika/trunk: tika-core/src/main/java/org/apache/tika/detect/DefaultDetector.java tika-parsers/src/test/java/org/apache/tika/detect/TestContainerAwareDetector.java

2011-11-21 Thread nick
Author: nick Date: Mon Nov 21 12:55:49 2011 New Revision: 1204476 URL: http://svn.apache.org/viewvc?rev=1204476&view=rev Log: TIKA-786 Control the ordering of detectors in DefaultDetector, so that user supplied detectors come first, then Tika ones, and finally MimeTypes. This ensures that more

svn commit: r1204479 - /tika/trunk/CHANGES.txt

2011-11-21 Thread nick
Author: nick Date: Mon Nov 21 13:15:29 2011 New Revision: 1204479 URL: http://svn.apache.org/viewvc?rev=1204479&view=rev Log: Add a note about TIKA-786 to Changes Modified: tika/trunk/CHANGES.txt Modified: tika/trunk/CHANGES.txt URL: http://svn.apache.org/viewvc/tika/trunk/CHANGES.txt?rev

svn commit: r1206185 - in /tika/trunk/tika-parsers/src/test/resources/test-documents: testMPP2003.mpp testMPP2007.mpp

2011-11-25 Thread nick
Author: nick Date: Fri Nov 25 14:19:23 2011 New Revision: 1206185 URL: http://svn.apache.org/viewvc?rev=1206185&view=rev Log: TIKA-789 Sample Microsoft Project (MPP) files Added: tika/trunk/tika-parsers/src/test/resources/test-documents/testMPP2003.mpp (with props) tika/trunk/tika

svn commit: r1206193 - in /tika/trunk/tika-parsers/src: main/java/org/apache/tika/parser/microsoft/POIFSContainerDetector.java test/java/org/apache/tika/detect/TestContainerAwareDetector.java

2011-11-25 Thread nick
Author: nick Date: Fri Nov 25 14:36:03 2011 New Revision: 1206193 URL: http://svn.apache.org/viewvc?rev=1206193&view=rev Log: TIKA-789 POIFS Container Detection support for MPP files Modified: tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/POIFSContainerDetector.java

svn commit: r1206225 - /tika/trunk/CHANGES.txt

2011-11-25 Thread nick
Author: nick Date: Fri Nov 25 15:38:32 2011 New Revision: 1206225 URL: http://svn.apache.org/viewvc?rev=1206225&view=rev Log: Add CHANGES entry for TIKA-789 Modified: tika/trunk/CHANGES.txt Modified: tika/trunk/CHANGES.txt URL: http://svn.apache.org/viewvc/tika/trunk/CHANGES.txt?rev

svn commit: r1206228 - /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OfficeParser.java

2011-11-25 Thread nick
Author: nick Date: Fri Nov 25 15:43:05 2011 New Revision: 1206228 URL: http://svn.apache.org/viewvc?rev=1206228&view=rev Log: TIKA-789 Add the project type to the OfficeParser mimetype list, and add a note on why Works is missing from the list Modified: tika/trunk/tika-parsers/src/main/java

svn commit: r1206791 - /tika/trunk/tika-parsers/src/test/resources/test-documents/test-documents.cpio

2011-11-27 Thread nick
Author: nick Date: Sun Nov 27 18:10:01 2011 New Revision: 1206791 URL: http://svn.apache.org/viewvc?rev=1206791&view=rev Log: TIKA-697 Test CPIO file Added: tika/trunk/tika-parsers/src/test/resources/test-documents/test-documents.cpio (with props) Added: tika/trunk/tika-parsers/src/test

svn commit: r1206869 - /tika/trunk/tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java

2011-11-27 Thread nick
Author: nick Date: Sun Nov 27 22:21:39 2011 New Revision: 1206869 URL: http://svn.apache.org/viewvc?rev=1206869&view=rev Log: TIKA-697 Archive formats mimetype tests (not all of which work yet) Modified: tika/trunk/tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java Modified

svn commit: r1206896 - in /tika/trunk: tika-core/src/main/java/org/apache/tika/detect/MagicDetector.java tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml tika-parsers/src/test/java

2011-11-27 Thread nick
Author: nick Date: Sun Nov 27 22:51:36 2011 New Revision: 1206896 URL: http://svn.apache.org/viewvc?rev=1206896&view=rev Log: TIKA-697 Correct mime match for .ar unix archives, add the suggested extra filetypes and aliases, and list .deb as being ar based Modified: tika/trunk/tika-core/src

svn commit: r1206898 - /tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml

2011-11-27 Thread nick
Author: nick Date: Sun Nov 27 22:57:18 2011 New Revision: 1206898 URL: http://svn.apache.org/viewvc?rev=1206898&view=rev Log: TIKA-697 Add mime magic for .deb files, which are based on .ar but have a specific first entry Modified: tika/trunk/tika-core/src/main/resources/org/apache/tika/mime

svn commit: r1206937 - in /tika/trunk: tika-core/src/main/java/org/apache/tika/detect/MagicDetector.java tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java

2011-11-27 Thread nick
Author: nick Date: Mon Nov 28 00:26:28 2011 New Revision: 1206937 URL: http://svn.apache.org/viewvc?rev=1206937&view=rev Log: TIKA-794 Correct Little16 mime magic logic, and enable the CPIO test now that the detection is correct Modified: tika/trunk/tika-core/src/main/java/org/apache/tika

svn commit: r1207124 - /tika/trunk/tika-parsers/src/test/resources/test-documents/

2011-11-28 Thread nick
Author: nick Date: Mon Nov 28 13:05:40 2011 New Revision: 1207124 URL: http://svn.apache.org/viewvc?rev=1207124&view=rev Log: TIKA-791 Sample protected Microsoft Office documents Added: tika/trunk/tika-parsers/src/test/resources/test-documents/testEXCEL_protected_passtika.xls (with props

svn commit: r1209438 - in /tika/trunk/tika-core/src: main/resources/org/apache/tika/mime/tika-mimetypes.xml test/java/org/apache/tika/mime/MimeTypesReaderTest.java

2011-12-02 Thread nick
Author: nick Date: Fri Dec 2 12:17:53 2011 New Revision: 1209438 URL: http://svn.apache.org/viewvc?rev=1209438&view=rev Log: Patch+Test from Antoni from TIKA-797 - Correct the default PPT extension Modified: tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml

svn commit: r1210322 - in /tika/trunk/tika-parsers/src: main/java/org/apache/tika/parser/microsoft/WordExtractor.java test/java/org/apache/tika/parser/microsoft/WordParserTest.java

2011-12-04 Thread nick
Author: nick Date: Mon Dec 5 03:44:42 2011 New Revision: 1210322 URL: http://svn.apache.org/viewvc?rev=1210322&view=rev Log: TIKA-410 Word Parser support for extracting textbox content (Patch from John Mastarone) Modified: tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser

svn commit: r1211760 - /tika/trunk/tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java

2011-12-07 Thread nick
Author: nick Date: Thu Dec 8 05:30:39 2011 New Revision: 1211760 URL: http://svn.apache.org/viewvc?rev=1211760&view=rev Log: Add TikaCLI help for the -f/--fork option previously added for TIKA-416 Modified: tika/trunk/tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java Modified: tika

svn commit: r1213131 - in /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/fork: ./ ForkParserIntegrationTest.java

2011-12-11 Thread nick
Author: nick Date: Mon Dec 12 02:25:29 2011 New Revision: 1213131 URL: http://svn.apache.org/viewvc?rev=1213131&view=rev Log: Add disabled tests for TIKA-808 (parser needs fixing so that tests can pass) Added: tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/fork/ tika/trunk

svn commit: r1213560 - /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OutlookExtractor.java

2011-12-12 Thread nick
Author: nick Date: Tue Dec 13 04:13:53 2011 New Revision: 1213560 URL: http://svn.apache.org/viewvc?rev=1213560&view=rev Log: TIKA-803 Wrap the outlook message body in a special div Modified: tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OutlookExtractor.java

svn commit: r1221109 - in /tika/trunk/tika-parsers: pom.xml src/main/java/org/apache/tika/parser/microsoft/ooxml/XSLFPowerPointExtractorDecorator.java src/test/java/org/apache/tika/parser/microsoft/PO

2011-12-19 Thread nick
Author: nick Date: Tue Dec 20 05:59:57 2011 New Revision: 1221109 URL: http://svn.apache.org/viewvc?rev=1221109&view=rev Log: TIKA-700 Upgrade to POI 3.8 beta 5 Modified: tika/trunk/tika-parsers/pom.xml tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml

svn commit: r1221581 - in /tika/trunk/tika-core/src: main/java/org/apache/tika/mime/MediaType.java test/java/org/apache/tika/mime/MediaTypeTest.java

2011-12-20 Thread nick
Author: nick Date: Wed Dec 21 03:03:17 2011 New Revision: 1221581 URL: http://svn.apache.org/viewvc?rev=1221581&view=rev Log: TIKA-822 - Handle quoted parameters on media types Modified: tika/trunk/tika-core/src/main/java/org/apache/tika/mime/MediaType.java tika/trunk/tika-core/src/test

svn commit: r1224675 - in /tika/trunk/tika-core/src/main/java/org/apache/tika: fork/ForkParser.java io/TikaInputStream.java

2011-12-25 Thread nick
Author: nick Date: Mon Dec 26 04:04:54 2011 New Revision: 1224675 URL: http://svn.apache.org/viewvc?rev=1224675&view=rev Log: TIKA-829 Validate inputs to the ForkParser constructor (must not be another ForkParser) and TikaInputStream get (must not be null) - patch from Jerome Lacoste Modified

svn commit: r1224728 - in /tika/trunk: tika-core/src/main/java/org/apache/tika/fork/ForkClient.java tika-parsers/src/test/java/org/apache/tika/parser/fork/ForkParserIntegrationTest.java

2011-12-26 Thread nick
Author: nick Date: Mon Dec 26 12:51:47 2011 New Revision: 1224728 URL: http://svn.apache.org/viewvc?rev=1224728&view=rev Log: TIKA-831 Fix the data type when comparing errors from the forked server, and add some more Forked unit tests (one disabled) - patch originally from Jerome Lacoste
