Author: nick
Date: Wed Jun 23 16:12:23 2010
New Revision: 957258
URL: http://svn.apache.org/viewvc?rev=957258&view=rev
Log:
Add myself to the committers list, and remove Ken Krugler's duplicate entry
Modified:
tika/trunk/tika-parent/pom.xml
Modified: tika/trunk/tika-parent/pom.xml
URL:
http
Author: nick
Date: Wed Jun 23 16:57:58 2010
New Revision: 957271
URL: http://svn.apache.org/viewvc?rev=957271&view=rev
Log:
Apply patch from Maxim Valyanskiy from TIKA-437 - support encrypted OOXML
office files which use the default password.
Added:
tika/trunk/tika-parsers/src/test
Author: nick
Date: Mon Jun 28 13:59:08 2010
New Revision: 958581
URL: http://svn.apache.org/viewvc?rev=958581&view=rev
Log:
Use the new TIFF Metadata entries for image width/length/sampling from the
TIFF, JPEG and general Image (ImageIO) parsers. Gives a small number of
consistent image related
Author: nick
Date: Tue Jun 29 11:12:15 2010
New Revision: 958924
URL: http://svn.apache.org/viewvc?rev=958924&view=rev
Log:
Unit test to show that we support pptx, pptm, ppsx and ppsm (TIKA-418)
.thmx will need a POI upgrade, but the file format lacks any text!
.xps is still unsupported by POI
Author: nick
Date: Tue Jun 29 12:06:19 2010
New Revision: 958942
URL: http://svn.apache.org/viewvc?rev=958942&view=rev
Log:
Enable extraction of longitude and latitude from JPEG/Tiff files (via the EXIF
tags), and HTML (via the ICBM meta tag), to the new geographic metadata
namespace
Added
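EXIF stores a GPS position as degrees, minutes and seconds plus a hemisphere reference, so turning it into a single signed longitude or latitude value needs a small conversion. The following is a minimal sketch of that arithmetic only; the class and method names are hypothetical and this is not the code from the commit.

```java
/** Sketch: convert EXIF-style degrees/minutes/seconds to signed decimal degrees. */
final class GpsCoordinates {
    private GpsCoordinates() {}

    /**
     * @param ref hemisphere reference as stored alongside the value: "N", "S", "E" or "W"
     */
    static double toDecimalDegrees(double degrees, double minutes, double seconds, String ref) {
        double value = degrees + minutes / 60.0 + seconds / 3600.0;
        // Southern latitudes and western longitudes are expressed as negative numbers.
        boolean negative = "S".equalsIgnoreCase(ref) || "W".equalsIgnoreCase(ref);
        return negative ? -value : value;
    }

    public static void main(String[] args) {
        System.out.println(toDecimalDegrees(51, 30, 0, "N"));
        System.out.println(toDecimalDegrees(0, 7, 30, "W"));
    }
}
```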
Author: nick
Date: Wed Jul 14 22:46:50 2010
New Revision: 964235
URL: http://svn.apache.org/viewvc?rev=964235&view=rev
Log:
TIKA-451 - Inconsistent date format for Metadata.CREATION_DATE and
Metadata.LAST_MODIFIED
Make CREATION_DATE and LAST_MODIFIED Date property instances, and add support
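One way to keep date metadata consistent is to route every date through a single formatter. The sketch below assumes an ISO 8601 style UTC pattern; the log message does not say which format was chosen, and the class name is hypothetical.

```java
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

/** Sketch: one shared way of turning a Date into a metadata string. */
final class MetadataDates {
    private MetadataDates() {}

    static String format(Date date) {
        // Assumed pattern: ISO 8601 in UTC. SimpleDateFormat is not thread-safe,
        // so a fresh instance is created per call.
        SimpleDateFormat f = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'", Locale.ROOT);
        f.setTimeZone(TimeZone.getTimeZone("UTC"));
        return f.format(date);
    }
}
```

With a single formatter like this, two parsers that see the same instant cannot disagree on how it is written out.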
Author: nick
Date: Mon Jul 26 12:15:05 2010
New Revision: 979256
URL: http://svn.apache.org/viewvc?rev=979256&view=rev
Log:
Add the new rome dependency to the bundle (TIKA-466)
Modified:
tika/trunk/tika-bundle/pom.xml
Modified: tika/trunk/tika-bundle/pom.xml
URL:
http://svn.apache.org
Author: nick
Date: Thu Jul 29 16:59:14 2010
New Revision: 980508
URL: http://svn.apache.org/viewvc?rev=980508&view=rev
Log:
Make mime type detection a little bit more stable (TIKA-391)
Make the comparison operator work better on Magic types, and ensure that the
type is present on the magic
Author: nick
Date: Fri Aug 13 15:35:23 2010
New Revision: 985248
URL: http://svn.apache.org/viewvc?rev=985248&view=rev
Log:
Update the site build to include the detection page
Added:
tika/site/publish/0.7/detection.html
Modified:
tika/site/publish/0.5/documentation.html
tika/site
Author: nick
Date: Mon Sep 6 17:42:52 2010
New Revision: 993108
URL: http://svn.apache.org/viewvc?rev=993108&view=rev
Log:
Add support to the ContainerAwareDetector for Corel OLE2 formats, and
Microsoft Works (TIKA-486)
Also slightly refactor the child container detectors, so we can do
Author: nick
Date: Mon Sep 6 17:51:43 2010
New Revision: 993113
URL: http://svn.apache.org/viewvc?rev=993113&view=rev
Log:
Apply (with slight tweaks) Antoni Mylka's container aware detector patch for
truncated OLE2 documents - TIKA-485
Modified:
tika/trunk/tika-parsers/src/main/java/org
Author: nick
Date: Tue Oct 19 14:56:54 2010
New Revision: 1024255
URL: http://svn.apache.org/viewvc?rev=1024255&view=rev
Log:
Add iWork support to the Container Aware Detector (TIKA-533)
It's a bit icky for now, but it works and it's quick...
Added:
tika/trunk/tika-parsers/src/test/resources
Author: nick
Date: Tue Oct 19 15:54:41 2010
New Revision: 1024291
URL: http://svn.apache.org/viewvc?rev=1024291&view=rev
Log:
Add --container-aware-detector option to the Tika CLI, which will switch the
detector used by the auto parser
Modified:
tika/trunk/tika-app/src/main/java/org/apache
Author: nick
Date: Fri Nov 12 16:40:43 2010
New Revision: 1034463
URL: http://svn.apache.org/viewvc?rev=1034463&view=rev
Log:
TIKA-552 - Handle word styles like heading 4 just like Heading 4, and in
.docx files insert bookmarks as anchor tags, along with relative hyperlinks for
the text
Author: nick
Date: Fri Nov 26 18:25:38 2010
New Revision: 1039496
URL: http://svn.apache.org/viewvc?rev=1039496&view=rev
Log:
Apply mimetype updates from TIKA-560
Modified:
tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
Modified:
tika/trunk/tika-core/src
Author: nick
Date: Mon Dec 13 02:41:59 2010
New Revision: 1045006
URL: http://svn.apache.org/viewvc?rev=1045006&view=rev
Log:
Apply patch from TIKA-570 from Benson Margulies - stricter BMP detection and
unit test
Added:
tika/trunk/tika-parsers/src/test/resources/test-documents/testBMPfp.txt
Author: nick
Date: Tue Jan 18 14:15:47 2011
New Revision: 1060387
URL: http://svn.apache.org/viewvc?rev=1060387&view=rev
Log:
Add test access mdb file from TIKA-586
Added:
tika/trunk/tika-parsers/src/test/resources/test-documents/testACCESS.mdb
(with props)
Added: tika/trunk/tika-parsers
Author: nick
Date: Tue Jan 18 14:17:07 2011
New Revision: 1060389
URL: http://svn.apache.org/viewvc?rev=1060389&view=rev
Log:
Add test true type font file from TIKA-586
Added:
tika/trunk/tika-parsers/src/test/resources/test-documents/testTrueType.ttf
(with props)
Added:
tika/trunk/tika
Author: nick
Date: Tue Jan 18 14:28:33 2011
New Revision: 1060393
URL: http://svn.apache.org/viewvc?rev=1060393&view=rev
Log:
Access mdb detection and test from Martijn in TIKA-586
Modified:
tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
tika/trunk/tika
Author: nick
Date: Thu Jan 27 17:54:37 2011
New Revision: 1064232
URL: http://svn.apache.org/viewvc?rev=1064232&view=rev
Log:
Fix up the iwork mime types with the patch from TIKA-588, and also add a unit
test for the detection using the non-container detector (we already had
container aware
Author: nick
Date: Fri Mar 4 16:06:28 2011
New Revision: 1078031
URL: http://svn.apache.org/viewvc?rev=1078031&view=rev
Log:
TIKA-606 - MP3 lyrics tags use a 6 digit length for the overall size, but only
5 digits for each tag
Modified:
tika/trunk/tika-parsers/src/main/java/org/apache/tika
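The difference in digit counts is easy to get wrong, so here is a minimal sketch of reading a Lyrics3 v2 block from a string. It assumes the commonly described layout ("LYRICSBEGIN", then fields made of a 3-character ID, a 5-digit length and the value, then a 6-digit overall size and "LYRICS200"). The class name is hypothetical and this is not the parser from the commit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: read the fields of a Lyrics3 v2 block that ends the given data. */
final class Lyrics3v2Sketch {
    private static final String BEGIN = "LYRICSBEGIN";
    private static final String END = "LYRICS200";

    private Lyrics3v2Sketch() {}

    static Map<String, String> parse(String data) {
        Map<String, String> fields = new LinkedHashMap<>();
        if (!data.endsWith(END)) return fields;
        int sizeStart = data.length() - END.length() - 6;
        if (sizeStart < 0) return fields;
        // Overall size: 6 digits, counting "LYRICSBEGIN" plus all fields.
        // (Malformed digits would throw NumberFormatException in this sketch.)
        int size = Integer.parseInt(data.substring(sizeStart, sizeStart + 6));
        int blockStart = sizeStart - size;
        if (blockStart < 0 || !data.startsWith(BEGIN, blockStart)) return fields;
        int pos = blockStart + BEGIN.length();
        while (pos + 8 <= sizeStart) {
            String id = data.substring(pos, pos + 3);
            // Per-field length: only 5 digits.
            int len = Integer.parseInt(data.substring(pos + 3, pos + 8));
            int valueStart = pos + 8;
            if (valueStart + len > sizeStart) break;
            fields.put(id, data.substring(valueStart, valueStart + len));
            pos = valueStart + len;
        }
        return fields;
    }
}
```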
Author: nick
Date: Mon Mar 14 14:27:05 2011
New Revision: 1081392
URL: http://svn.apache.org/viewvc?rev=1081392&view=rev
Log:
Update the OOXML Excel (.xlsx) extractor to be largely SAX based, to reduce the
memory use (it now works in a similar-ish way to the .xls one). Bumps the POI
dependency
Author: nick
Date: Mon Mar 14 20:26:36 2011
New Revision: 1081547
URL: http://svn.apache.org/viewvc?rev=1081547&view=rev
Log:
Fix the mime magic detection of TNEF files, and add a unit test for it. (The
rest of the TNEF support will be committed when POI 3.8 beta 2 is out).
(TIKA-615)
Added
Author: nick
Date: Fri Mar 18 17:00:42 2011
New Revision: 1082973
URL: http://svn.apache.org/viewvc?rev=1082973&view=rev
Log:
TIKA-534 - When parsing a jpeg file with unhandled tags in it, skip these
Added:
tika/trunk/tika-parsers/src/test/resources/test-documents
Author: nick
Date: Sat Mar 19 01:14:28 2011
New Revision: 1083119
URL: http://svn.apache.org/viewvc?rev=1083119&view=rev
Log:
Turning an ASCII string into static final bytes without exceptions shouldn't be
this hard. Fix 1.6ism for TIKA-492
Modified:
tika/trunk/tika-parsers/src/main/java
Author: nick
Date: Wed Mar 23 18:11:32 2011
New Revision: 1084658
URL: http://svn.apache.org/viewvc?rev=1084658&view=rev
Log:
Add some more detection tests, which show that for container formats the
addition of the filename lets us specialise from eg tika-msoffice to msword
Modified:
tika
Author: nick
Date: Wed Mar 23 22:57:56 2011
New Revision: 1084796
URL: http://svn.apache.org/viewvc?rev=1084796&view=rev
Log:
Fix deprecated warnings
Modified:
tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/image/ImageParserTest.java
Modified:
tika/trunk/tika-parsers/src/test
Author: nick
Date: Wed Mar 23 23:00:13 2011
New Revision: 1084798
URL: http://svn.apache.org/viewvc?rev=1084798&view=rev
Log:
When trying to identify a parser for a media type in AutoDetect and similar, if
the Parser claims to support an alias of the media type but not the canonical
one (eg
Author: nick
Date: Wed Mar 23 23:18:51 2011
New Revision: 1084801
URL: http://svn.apache.org/viewvc?rev=1084801&view=rev
Log:
When creating a default TikaConfig instance with a DefaultParser, have the
newly created parser wired up with the Mime Type Registry we create. This
allows the parser
Author: nick
Date: Wed Mar 23 23:25:59 2011
New Revision: 1084805
URL: http://svn.apache.org/viewvc?rev=1084805&view=rev
Log:
TIKA-555 fallout - While image/bmp isn't the official mimetype, it is what Java
thinks it is. So, switch from the official to the unofficial one before asking
Java to give
Author: nick
Revision: 1084798
Modified property: svn:log
Modified: svn:log at Thu Mar 24 09:58:59 2011
--
--- svn:log (original)
+++ svn:log Thu Mar 24 09:58:59 2011
@@ -1,2 +1 @@
-When trying to identify a parser
Author: nick
Revision: 1084801
Modified property: svn:log
Modified: svn:log at Thu Mar 24 10:00:30 2011
--
--- svn:log (original)
+++ svn:log Thu Mar 24 10:00:30 2011
@@ -1 +1 @@
-When creating a default TikaConfig
Author: nick
Date: Thu Mar 24 15:35:24 2011
New Revision: 1085003
URL: http://svn.apache.org/viewvc?rev=1085003&view=rev
Log:
TIKA-620 - Have CompositeParser always use the canonical mimetype internally,
via suitable calls to registry.normalise, rather than trying to handle the
aliases
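The idea of normalising to the canonical type before any lookup can be sketched as follows. This is an illustration only: the class, its methods and the media type names in the usage note are hypothetical, and it does not reproduce the CompositeParser or registry code.

```java
import java.util.HashMap;
import java.util.Map;

/** Sketch: resolve aliases to a canonical media type before every lookup. */
final class AliasRegistrySketch {
    private final Map<String, String> aliasToCanonical = new HashMap<>();
    private final Map<String, String> parsersByCanonicalType = new HashMap<>();

    void addAlias(String alias, String canonical) {
        aliasToCanonical.put(alias, canonical);
    }

    /** Returns the canonical type for an alias, or the type itself if it is not an alias. */
    String normalise(String type) {
        return aliasToCanonical.getOrDefault(type, type);
    }

    /** A parser may declare an alias; it is stored under the canonical type. */
    void registerParser(String supportedType, String parserName) {
        parsersByCanonicalType.put(normalise(supportedType), parserName);
    }

    /** Lookups normalise too, so aliases and canonical names find the same parser. */
    String findParser(String type) {
        return parsersByCanonicalType.get(normalise(type));
    }
}
```

For example, after `addAlias("application/x-example-alias", "application/x-example")`, a parser registered under either name is found by a lookup with either name, without any alias handling at the call site.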
Author: nick
Date: Wed Mar 30 11:52:57 2011
New Revision: 1086912
URL: http://svn.apache.org/viewvc?rev=1086912&view=rev
Log:
TIKA-624 - Update supported formats for 0.8 and 0.9
Modified:
tika/site/src/site/apt/0.8/formats.apt
tika/site/src/site/apt/0.9/formats.apt
Modified: tika/site
Modified: tika/site/publish/0.9/parser_guide.html
URL:
http://svn.apache.org/viewvc/tika/site/publish/0.9/parser_guide.html?rev=1086919&r1=1086918&r2=1086919&view=diff
==
--- tika/site/publish/0.9/parser_guide.html
Author: nick
Date: Fri Apr 1 15:32:58 2011
New Revision: 1087762
URL: http://svn.apache.org/viewvc?rev=1087762&view=rev
Log:
TIKA-631 - Sample Chinese outlook file
Added:
tika/trunk/tika-parsers/src/test/resources/test-documents/testMSG_chinese.msg
(with props)
Added:
tika/trunk/tika
Author: nick
Date: Wed Apr 6 16:19:17 2011
New Revision: 1089516
URL: http://svn.apache.org/viewvc?rev=1089516&view=rev
Log:
TIKA-634 - Initial work on supporting more flexible ExternalParser loading (via
XML, part done), and external parser metadata extraction
Added:
tika/trunk/tika-core
Author: nick
Date: Wed Apr 6 16:20:00 2011
New Revision: 1089518
URL: http://svn.apache.org/viewvc?rev=1089518&view=rev
Log:
TIKA-634 - Example external parsers config file
Added:
tika/trunk/tika-parsers/src/main/resources/org/
tika/trunk/tika-parsers/src/main/resources/org/apache
Author: nick
Date: Wed Apr 6 17:39:32 2011
New Revision: 1089543
URL: http://svn.apache.org/viewvc?rev=1089543&view=rev
Log:
TIKA-634 - Add support for checking if the external command is there, for
collecting the output from a file, and a wrapper CompositeParser that loads all
available
Author: nick
Date: Mon Apr 11 11:44:18 2011
New Revision: 1091042
URL: http://svn.apache.org/viewvc?rev=1091042&view=rev
Log:
TIKA-615 - Outlook parsing update for POI 3.8 beta 2
Modified:
tika/trunk/tika-parsers/pom.xml
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser
Author: nick
Date: Mon Apr 11 11:47:29 2011
New Revision: 1091044
URL: http://svn.apache.org/viewvc?rev=1091044&view=rev
Log:
TIKA-615 - POI powered TNEF parser
Added:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/TNEFParser.java
Modified:
tika/trunk/tika
Author: nick
Date: Wed Apr 20 14:59:24 2011
New Revision: 1095429
URL: http://svn.apache.org/viewvc?rev=1095429&view=rev
Log:
TIKA-644 - When generating html headings from word, h6 is the highest the xhtml
allows, so don't try generating h7 (or higher) even if Word has a 'Heading 7'
style
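A minimal sketch of the capping rule: map a Word style name to an XHTML heading tag, never going past h6. The class name, the case-insensitive matching and the fallback to a plain paragraph tag are assumptions made for illustration, not the code from the commit.

```java
import java.util.Locale;

/** Sketch: map a Word style name such as "Heading 7" to an XHTML heading tag. */
final class HeadingLevels {
    private static final String PREFIX = "heading ";

    private HeadingLevels() {}

    static String tagFor(String styleName) {
        String s = styleName.trim().toLowerCase(Locale.ROOT);
        if (!s.startsWith(PREFIX)) return "p"; // assumed fallback for non-heading styles
        try {
            int level = Integer.parseInt(s.substring(PREFIX.length()).trim());
            if (level < 1) return "p";
            // XHTML defines h1 to h6 only, so deeper Word headings are capped at h6.
            return "h" + Math.min(level, 6);
        } catch (NumberFormatException e) {
            return "p";
        }
    }
}
```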
Author: nick
Date: Thu Apr 21 15:58:22 2011
New Revision: 1095759
URL: http://svn.apache.org/viewvc?rev=1095759&view=rev
Log:
TIKA-643 - Now that we're using NPOIFS which takes files, simplify the code as
we don't need to use an InputStream
Modified:
tika/trunk/tika-parsers/src/main/java
Author: nick
Date: Thu Apr 21 15:59:42 2011
New Revision: 1095760
URL: http://svn.apache.org/viewvc?rev=1095760&view=rev
Log:
TIKA-643 - Change TaggedInputStream to work like TikaInputStream for
creation, with a static get, to avoid double wrapping. Also adds toString
methods on the two
Author: nick
Date: Tue May 3 07:03:08 2011
New Revision: 1098942
URL: http://svn.apache.org/viewvc?rev=1098942&view=rev
Log:
TIKA-213 JSON metadata output support, using the GSON library to do most of the
work
Modified:
tika/trunk/tika-app/pom.xml
tika/trunk/tika-app/src/main/java/org
Author: nick
Date: Wed May 4 01:06:05 2011
New Revision: 1099309
URL: http://svn.apache.org/viewvc?rev=1099309&view=rev
Log:
TIKA-619 - Apply patch from Alexander Chow to ignore errors from a JRE GIF bug
Modified:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/image
Author: nick
Date: Fri May 6 01:21:36 2011
New Revision: 1100014
URL: http://svn.apache.org/viewvc?rev=1100014&view=rev
Log:
TIKA-654 - Open the OOXML OPCPackage as read only, and fix serial version
warning
Modified:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/detect
Author: nick
Date: Fri May 6 01:22:12 2011
New Revision: 1100015
URL: http://svn.apache.org/viewvc?rev=1100015&view=rev
Log:
TIKA-654 - If we have an open container that can be closed, close it when
closing the stream
Modified:
tika/trunk/tika-core/src/main/java/org/apache/tika/io
Author: nick
Date: Fri May 6 04:45:49 2011
New Revision: 1100051
URL: http://svn.apache.org/viewvc?rev=1100051&view=rev
Log:
TIKA-656 RFC822 and MBox parsers should output the same date metadata keys
Modified:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mail
Author: nick
Date: Fri May 6 04:50:27 2011
New Revision: 1100053
URL: http://svn.apache.org/viewvc?rev=1100053&view=rev
Log:
TIKA-656 Switch two more Office metadata keys that hold dates to being typed
date properties
Modified:
tika/trunk/tika-core/src/main/java/org/apache/tika/metadata
Author: nick
Date: Fri May 6 05:14:39 2011
New Revision: 1100061
URL: http://svn.apache.org/viewvc?rev=1100061&view=rev
Log:
TIKA-656 Update the Outlook parser to handle dates the same way as the other
mail parsers
Modified:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mbox
Author: nick
Date: Fri May 6 06:30:43 2011
New Revision: 1100100
URL: http://svn.apache.org/viewvc?rev=1100100&view=rev
Log:
TIKA-652 Update the POIFS parser to handle custom metadata entries in the same
way that the Open Document one already does
Modified:
tika/trunk/tika-core/src/main
Author: nick
Date: Tue May 10 14:26:17 2011
New Revision: 1101471
URL: http://svn.apache.org/viewvc?rev=1101471&view=rev
Log:
TIKA-658 TCPDump pcap mime matching
Modified:
tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
tika/trunk/tika-parsers/src/test
Author: nick
Date: Sun May 15 20:57:55 2011
New Revision: 1103540
URL: http://svn.apache.org/viewvc?rev=1103540&view=rev
Log:
TIKA-659 Merge the ODF parser tests, and put them in the new package
Added:
tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/odf/
- copied from
Author: nick
Date: Sun May 15 21:15:21 2011
New Revision: 1103545
URL: http://svn.apache.org/viewvc?rev=1103545&view=rev
Log:
TIKA-646 Helper class to allow us to avoid calling endDocument until a later
time
Added:
tika/trunk/tika-core/src/main/java/org/apache/tika/sax
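Deferring endDocument can be sketched with the JDK's own SAX classes: a filter forwards every event except endDocument, which it only records until the caller is ready. The class and method names below are hypothetical, and the helper added in the commit may be shaped differently.

```java
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.XMLFilterImpl;

/** Sketch: pass all SAX events through, but hold back endDocument until asked. */
final class DeferredEndDocumentHandler extends XMLFilterImpl {
    private boolean endDocumentRequested = false;

    DeferredEndDocumentHandler(ContentHandler downstream) {
        // XMLFilterImpl forwards content events to this handler.
        setContentHandler(downstream);
    }

    @Override
    public void endDocument() {
        // Remember the request instead of forwarding it.
        endDocumentRequested = true;
    }

    boolean isEndDocumentRequested() {
        return endDocumentRequested;
    }

    /** Forwards the real endDocument to the downstream handler. */
    void reallyEndDocument() throws SAXException {
        super.endDocument();
    }
}
```

A caller would wrap its handler, let a nested parser run (including its endDocument call), emit any further content, and only then call `reallyEndDocument()`.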
Author: nick
Date: Wed May 18 10:10:51 2011
New Revision: 1124165
URL: http://svn.apache.org/viewvc?rev=1124165&view=rev
Log:
TIKA-213 Remove leading zeros from integers when outputting JSON
Modified:
tika/trunk/tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java
Modified: tika/trunk
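JSON does not allow leading zeros on numbers, so a metadata value such as "007" has to be cleaned up before it can be written unquoted. The following is a standalone sketch of that idea; it does not use GSON and is not the TikaCLI code, and the class name is hypothetical.

```java
import java.math.BigInteger;

/** Sketch: write a metadata value as a JSON number if it is an integer, else as a string. */
final class JsonValueSketch {
    private JsonValueSketch() {}

    static String toJsonValue(String value) {
        if (value.matches("-?\\d+")) {
            // BigInteger drops leading zeros and copes with values too large for long.
            return new BigInteger(value).toString();
        }
        // Minimal escaping only (backslash and double quote); control characters
        // are not handled in this sketch.
        return "\"" + value.replace("\\", "\\\\").replace("\"", "\\\"") + "\"";
    }
}
```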
Author: nick
Date: Thu May 19 13:44:10 2011
New Revision: 1124772
URL: http://svn.apache.org/viewvc?rev=1124772&view=rev
Log:
TIKA-660 Merge the two CompositeParserTests and PatternsTests into one each in
core
Added:
tika/trunk/tika-core/src/test/java/org/apache/tika/parser/DummyParser.java
Author: nick
Date: Fri Jul 8 13:51:49 2011
New Revision: 1144314
URL: http://svn.apache.org/viewvc?rev=1144314&view=rev
Log:
TIKA-679 Update the CADKEY PRT parser to get the description, and tweak the
text encoding based on work by Troy
Added:
tika/trunk/tika-parsers/src/test/resources/test
Author: nick
Date: Fri Jul 15 14:53:20 2011
New Revision: 1147172
URL: http://svn.apache.org/viewvc?rev=1147172&view=rev
Log:
TIKA-683 Create a dedicated RTF parser test, based on the existing checks in
TestParsers
Added:
tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/rtf
Author: nick
Date: Fri Jul 15 17:04:09 2011
New Revision: 1147250
URL: http://svn.apache.org/viewvc?rev=1147250&view=rev
Log:
TIKA-507 Split the mime type entries for AFM and PFM (font metrics) out from
the fonts themselves, and add magic detection patterns for them
Added:
tika/trunk/tika
Author: nick
Date: Fri Jul 15 17:47:26 2011
New Revision: 1147262
URL: http://svn.apache.org/viewvc?rev=1147262&view=rev
Log:
TIKA-507 Add byte based detection tests for .pfa/.pfb/.pfm (which we currently
lack free sample files for)
Modified:
tika/trunk/tika-core/src/main/resources/org
Author: nick
Date: Wed Sep 21 17:03:38 2011
New Revision: 1173761
URL: http://svn.apache.org/viewvc?rev=1173761&view=rev
Log:
TIKA-712 Fetch Master Slide text for PPT and PPTX text extraction
Modified:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java
Author: nick
Date: Fri Sep 23 12:57:47 2011
New Revision: 1174719
URL: http://svn.apache.org/viewvc?rev=1174719&view=rev
Log:
TIKA-720 Add documentation for some of CharsetRecog_sbcs, and tweak the EBCDIC
bit to avoid false matches for short snippets of HTML
Modified:
tika/trunk/tika
Author: nick
Date: Fri Sep 23 20:54:36 2011
New Revision: 1175014
URL: http://svn.apache.org/viewvc?rev=1175014&view=rev
Log:
Add a disabled Outlook RTF related test, pending a fix for TIKA-632. (We're
nearly there with the recent RTF improvements, but not quite)
Modified:
tika/trunk
Author: nick
Date: Thu Sep 29 14:12:21 2011
New Revision: 1177313
URL: http://svn.apache.org/viewvc?rev=1177313&view=rev
Log:
HSLF Extractor improvements from Pablo from TIKA-727
Modified:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/HSLFExtractor.java
Modified
Author: nick
Date: Thu Oct 6 15:32:23 2011
New Revision: 1179669
URL: http://svn.apache.org/viewvc?rev=1179669&view=rev
Log:
TIKA-745 If we find a ID3v2 Genre that isn't one of the ones in v1, use it as-is
Modified:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mp3
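The fallback can be sketched as a lookup that returns the input unchanged when it does not resolve to a v1 genre. The table below holds only the first few ID3v1 genre names, the accepted input forms ("2" and "(2)") are an assumption, and the class name is hypothetical; this is not the parser code from the commit.

```java
/** Sketch: resolve an ID3v2 genre value, keeping unknown values as they are. */
final class GenreSketch {
    // First few ID3v1 genre names by index; the real list is much longer.
    private static final String[] V1_GENRES = {
        "Blues", "Classic Rock", "Country", "Dance", "Disco"
    };

    private GenreSketch() {}

    static String resolve(String genre) {
        String g = genre.trim();
        String digits = g;
        if (g.startsWith("(") && g.endsWith(")")) {
            digits = g.substring(1, g.length() - 1);
        }
        if (digits.matches("\\d{1,3}")) {
            int index = Integer.parseInt(digits);
            if (index < V1_GENRES.length) return V1_GENRES[index];
        }
        // Not a known v1 genre reference: use the value as-is.
        return g;
    }
}
```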
Author: nick
Date: Thu Oct 6 20:29:41 2011
New Revision: 1179829
URL: http://svn.apache.org/viewvc?rev=1179829&view=rev
Log:
TIKA-746 Allow MimeTypesFactory to take more than one resource to load, and
update the default to be to load tika-mimetypes.xml followed by any
custom-mimetypes.xml
Author: nick
Date: Fri Oct 7 20:48:03 2011
New Revision: 1180224
URL: http://svn.apache.org/viewvc?rev=1180224&view=rev
Log:
TIKA-682 Add mime magic detection for PSD files
Modified:
tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
Modified:
tika/trunk/tika
Author: nick
Date: Fri Oct 7 20:52:20 2011
New Revision: 1180230
URL: http://svn.apache.org/viewvc?rev=1180230&view=rev
Log:
TIKA-682 Add a basic PSD metadata extracting Parser
Added:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java
tika/trunk/tika
Author: nick
Date: Fri Oct 7 21:05:22 2011
New Revision: 1180243
URL: http://svn.apache.org/viewvc?rev=1180243&view=rev
Log:
TIKA-749 Convert the DWG and PRT parsers to use the Tika endian util, rather
than the POI one
Modified:
tika/trunk/tika-core/src/main/java/org/apache/tika/io
Author: nick
Date: Fri Oct 7 21:10:09 2011
New Revision: 1180244
URL: http://svn.apache.org/viewvc?rev=1180244&view=rev
Log:
TIKA-682 Fix 1.6ism
Modified:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/image/PSDParser.java
Modified:
tika/trunk/tika-parsers/src/main/java/org
Author: nick
Date: Thu Oct 13 12:34:50 2011
New Revision: 1182805
URL: http://svn.apache.org/viewvc?rev=1182805&view=rev
Log:
Add a common alias for the WordPerfect mimetype
Modified:
tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
Modified:
tika/trunk/tika
Author: nick
Date: Tue Oct 18 13:52:09 2011
New Revision: 1185658
URL: http://svn.apache.org/viewvc?rev=1185658&view=rev
Log:
TIKA-755 Have TikaConfig create a DefaultDetector instance based on the
supplied MimeTypes and/or ClassLoader, and switch Tika+AutoDetectParser to get
their detector from
Author: nick
Date: Tue Nov 15 09:41:46 2011
New Revision: 1202109
URL: http://svn.apache.org/viewvc?rev=1202109&view=rev
Log:
TIKA-779 Works 2000 container aware detection, plus test
Added:
tika/trunk/tika-parsers/src/test/resources/test-documents/testWORKS2000.wps
(with props)
Modified
Author: nick
Date: Fri Nov 18 15:01:07 2011
New Revision: 1203681
URL: http://svn.apache.org/viewvc?rev=1203681&view=rev
Log:
TIKA-784 Sample DITA task, concept and map files. (Based on some Alfresco
documentation, with content replaced with Tika info)
Added:
tika/trunk/tika-parsers/src/test
Author: nick
Date: Fri Nov 18 15:13:52 2011
New Revision: 1203689
URL: http://svn.apache.org/viewvc?rev=1203689&view=rev
Log:
TIKA-784 DITA mimetype entries for the 3 subtypes, plus tests
Modified:
tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
tika
Author: nick
Date: Mon Nov 21 01:24:58 2011
New Revision: 1204311
URL: http://svn.apache.org/viewvc?rev=1204311&view=rev
Log:
TIKA-784 Switch the DITA types to be format specialisations, rather than their
own dedicated mimetypes, to match the OASIS recommendation
Modified:
tika/trunk/tika
Author: nick
Date: Mon Nov 21 10:30:22 2011
New Revision: 1204435
URL: http://svn.apache.org/viewvc?rev=1204435&view=rev
Log:
Expand container detection tests, and added disabled (failing) tests for
TIKA-786
Modified:
tika/trunk/tika-parsers/src/test/java/org/apache/tika/detect
Author: nick
Date: Mon Nov 21 12:17:48 2011
New Revision: 1204464
URL: http://svn.apache.org/viewvc?rev=1204464&view=rev
Log:
Add basic JavaDoc for a few MediaType methods that lacked it
Modified:
tika/trunk/tika-core/src/main/java/org/apache/tika/mime/MediaType.java
Modified: tika/trunk
Author: nick
Date: Mon Nov 21 12:55:49 2011
New Revision: 1204476
URL: http://svn.apache.org/viewvc?rev=1204476&view=rev
Log:
TIKA-786 Control the ordering of detectors in DefaultDetector, so that user
supplied detectors come first, then Tika ones, and finally MimeTypes. This
ensures that more
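The ordering described (user-supplied detectors, then Tika's own, then MimeTypes) can be sketched as a sort on class names. Ranking by package prefix is an assumption made for illustration, as are the class name and the `com.example` detector in the usage note; the actual DefaultDetector logic may differ.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch: order detector class names as user-supplied, then Tika, then MimeTypes. */
final class DetectorOrderSketch {
    private DetectorOrderSketch() {}

    static int rank(String className) {
        if (className.equals("org.apache.tika.mime.MimeTypes")) return 2; // always last
        if (className.startsWith("org.apache.tika.")) return 1;           // Tika's own
        return 0;                                                         // user supplied
    }

    static List<String> order(List<String> classNames) {
        List<String> sorted = new ArrayList<>(classNames);
        sorted.sort((a, b) -> {
            int byRank = Integer.compare(rank(a), rank(b));
            // Within a group, fall back to the name so the order is stable across runs.
            return byRank != 0 ? byRank : a.compareTo(b);
        });
        return sorted;
    }
}
```

Given `org.apache.tika.mime.MimeTypes`, `org.apache.tika.parser.microsoft.POIFSContainerDetector` and a hypothetical `com.example.CustomDetector`, the sketch puts the custom detector first and MimeTypes last.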
Author: nick
Date: Mon Nov 21 13:15:29 2011
New Revision: 1204479
URL: http://svn.apache.org/viewvc?rev=1204479&view=rev
Log:
Add a note about TIKA-786 to Changes
Modified:
tika/trunk/CHANGES.txt
Modified: tika/trunk/CHANGES.txt
URL:
http://svn.apache.org/viewvc/tika/trunk/CHANGES.txt?rev
Author: nick
Date: Fri Nov 25 14:19:23 2011
New Revision: 1206185
URL: http://svn.apache.org/viewvc?rev=1206185&view=rev
Log:
TIKA-789 Sample Microsoft Project (MPP) files
Added:
tika/trunk/tika-parsers/src/test/resources/test-documents/testMPP2003.mpp
(with props)
tika/trunk/tika
Author: nick
Date: Fri Nov 25 14:36:03 2011
New Revision: 1206193
URL: http://svn.apache.org/viewvc?rev=1206193&view=rev
Log:
TIKA-789 POIFS Container Detection support for MPP files
Modified:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/POIFSContainerDetector.java
Author: nick
Date: Fri Nov 25 15:38:32 2011
New Revision: 1206225
URL: http://svn.apache.org/viewvc?rev=1206225&view=rev
Log:
Add CHANGES entry for TIKA-789
Modified:
tika/trunk/CHANGES.txt
Modified: tika/trunk/CHANGES.txt
URL:
http://svn.apache.org/viewvc/tika/trunk/CHANGES.txt?rev
Author: nick
Date: Fri Nov 25 15:43:05 2011
New Revision: 1206228
URL: http://svn.apache.org/viewvc?rev=1206228&view=rev
Log:
TIKA-789 Add the project type to the OfficeParser mimetype list, and add a note
on why Works is missing from the list
Modified:
tika/trunk/tika-parsers/src/main/java
Author: nick
Date: Sun Nov 27 18:10:01 2011
New Revision: 1206791
URL: http://svn.apache.org/viewvc?rev=1206791&view=rev
Log:
TIKA-697 Test CPIO file
Added:
tika/trunk/tika-parsers/src/test/resources/test-documents/test-documents.cpio
(with props)
Added:
tika/trunk/tika-parsers/src/test
Author: nick
Date: Sun Nov 27 22:21:39 2011
New Revision: 1206869
URL: http://svn.apache.org/viewvc?rev=1206869&view=rev
Log:
TIKA-697 Archive formats mimetype tests (not all of which work yet)
Modified:
tika/trunk/tika-parsers/src/test/java/org/apache/tika/mime/TestMimeTypes.java
Modified
Author: nick
Date: Sun Nov 27 22:51:36 2011
New Revision: 1206896
URL: http://svn.apache.org/viewvc?rev=1206896&view=rev
Log:
TIKA-697 Correct mime match for .ar unix archives, add the suggested extra
filetypes and aliases, and list .deb as being ar based
Modified:
tika/trunk/tika-core/src
Author: nick
Date: Sun Nov 27 22:57:18 2011
New Revision: 1206898
URL: http://svn.apache.org/viewvc?rev=1206898&view=rev
Log:
TIKA-697 Add mime magic for .deb files, which are based on .ar but have a
specific first entry
Modified:
tika/trunk/tika-core/src/main/resources/org/apache/tika/mime
Author: nick
Date: Mon Nov 28 00:26:28 2011
New Revision: 1206937
URL: http://svn.apache.org/viewvc?rev=1206937&view=rev
Log:
TIKA-794 Correct Little16 mime magic logic, and enable the CPIO test now that
the detection is correct
Modified:
tika/trunk/tika-core/src/main/java/org/apache/tika
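A 16-bit little-endian magic value is stored low byte first, which is where a comparison can easily go wrong. The sketch below shows the decoding on its own; it is not the Tika magic-matching code, and the class and method names are hypothetical. As an example value, the classic binary cpio header starts with octal 070707, which appears on disk as the bytes C7 71.

```java
/** Sketch: decode and compare a 16-bit little-endian value from a byte buffer. */
final class Little16Sketch {
    private Little16Sketch() {}

    static int readLittle16(byte[] data, int offset) {
        // Low byte comes first; mask with 0xFF because Java bytes are signed.
        return (data[offset] & 0xFF) | ((data[offset + 1] & 0xFF) << 8);
    }

    static boolean matches(byte[] data, int offset, int expected) {
        return data.length >= offset + 2 && readLittle16(data, offset) == expected;
    }
}
```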
Author: nick
Date: Mon Nov 28 13:05:40 2011
New Revision: 1207124
URL: http://svn.apache.org/viewvc?rev=1207124&view=rev
Log:
TIKA-791 Sample protected Microsoft Office documents
Added:
tika/trunk/tika-parsers/src/test/resources/test-documents/testEXCEL_protected_passtika.xls
(with props
Author: nick
Date: Fri Dec 2 12:17:53 2011
New Revision: 1209438
URL: http://svn.apache.org/viewvc?rev=1209438&view=rev
Log:
Patch+Test from Antoni from TIKA-797 - Correct the default PPT extension
Modified:
tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
Author: nick
Date: Mon Dec 5 03:44:42 2011
New Revision: 1210322
URL: http://svn.apache.org/viewvc?rev=1210322&view=rev
Log:
TIKA-410 Word Parser support for extracting textbox content (Patch from John
Mastarone)
Modified:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser
Author: nick
Date: Thu Dec 8 05:30:39 2011
New Revision: 1211760
URL: http://svn.apache.org/viewvc?rev=1211760&view=rev
Log:
Add TikaCLI help for the -f/--fork option previously added for TIKA-416
Modified:
tika/trunk/tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java
Modified: tika
Author: nick
Date: Mon Dec 12 02:25:29 2011
New Revision: 1213131
URL: http://svn.apache.org/viewvc?rev=1213131&view=rev
Log:
Add disabled tests for TIKA-808 (parser needs fixing so that tests can pass)
Added:
tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/fork/
tika/trunk
Author: nick
Date: Tue Dec 13 04:13:53 2011
New Revision: 1213560
URL: http://svn.apache.org/viewvc?rev=1213560&view=rev
Log:
TIKA-803 Wrap the outlook message body in a special div
Modified:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/OutlookExtractor.java
Author: nick
Date: Tue Dec 20 05:59:57 2011
New Revision: 1221109
URL: http://svn.apache.org/viewvc?rev=1221109&view=rev
Log:
TIKA-700 Upgrade to POI 3.8 beta 5
Modified:
tika/trunk/tika-parsers/pom.xml
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml
Author: nick
Date: Wed Dec 21 03:03:17 2011
New Revision: 1221581
URL: http://svn.apache.org/viewvc?rev=1221581&view=rev
Log:
TIKA-822 - Handle quoted parameters on media types
Modified:
tika/trunk/tika-core/src/main/java/org/apache/tika/mime/MediaType.java
tika/trunk/tika-core/src/test
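Handling quoted parameters means that `charset="UTF-8"` and `charset=UTF-8` should yield the same value. The following is a simplified sketch, not the MediaType implementation: the class name is hypothetical, and it splits on every semicolon, so a semicolon inside a quoted value is not supported.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

/** Sketch: extract parameters from a media type string, removing surrounding quotes. */
final class MediaTypeParamsSketch {
    private MediaTypeParamsSketch() {}

    static Map<String, String> parseParameters(String mediaType) {
        Map<String, String> params = new LinkedHashMap<>();
        String[] parts = mediaType.split(";");
        for (int i = 1; i < parts.length; i++) { // parts[0] is the type/subtype
            int eq = parts[i].indexOf('=');
            if (eq < 0) continue;
            String key = parts[i].substring(0, eq).trim().toLowerCase(Locale.ROOT);
            String value = parts[i].substring(eq + 1).trim();
            // Strip one pair of surrounding double quotes, if present.
            if (value.length() >= 2 && value.startsWith("\"") && value.endsWith("\"")) {
                value = value.substring(1, value.length() - 1);
            }
            if (!key.isEmpty()) params.put(key, value);
        }
        return params;
    }
}
```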
Author: nick
Date: Mon Dec 26 04:04:54 2011
New Revision: 1224675
URL: http://svn.apache.org/viewvc?rev=1224675&view=rev
Log:
TIKA-829 Validate inputs to the ForkParser constructor (must not be another
ForkParser) and TikaInputStream get (must not be null) - patch from Jerome
Lacoste
Modified
Author: nick
Date: Mon Dec 26 12:51:47 2011
New Revision: 1224728
URL: http://svn.apache.org/viewvc?rev=1224728&view=rev
Log:
TIKA-831 Fix the data type when comparing errors from the forked server, and
add some more Forked unit tests (one disabled) - patch originally from Jerome
Lacoste