Author: nick
Date: Tue Jun 18 22:38:45 2013
New Revision: 1494352
URL: http://svn.apache.org/r1494352
Log:
Test file from Paul Brinich from TIKA-1136
Added:
tika/trunk/tika-parsers/src/test/resources/test-documents/testIPA.ipa
(with props)
Added: tika/trunk/tika-parsers/src/test
Author: nick
Date: Wed May 15 00:30:12 2013
New Revision: 1482648
URL: http://svn.apache.org/r1482648
Log:
New code to help for TIKA-1118, currently disabled pending a POI upgrade
Modified:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml
Author: nick
Date: Mon Apr 29 14:20:14 2013
New Revision: 1477097
URL: http://svn.apache.org/r1477097
Log:
Patch from Ryan McKinley from TIKA-1014 - Allow custom MimeTypesReader (with
tests)
Added:
tika/trunk/tika-core/src/test/java/org/apache/tika/mime/CustomReaderTest.java
tika
Author: nick
Date: Fri Mar 29 17:16:31 2013
New Revision: 1462546
URL: http://svn.apache.org/r1462546
Log:
Mimetype entries with magic for the arj and uc2 archive formats TIKA-1099
Modified:
tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
Modified:
tika
Author: nick
Date: Wed Feb 13 14:42:20 2013
New Revision: 1445629
URL: http://svn.apache.org/r1445629
Log:
TIKA-1084 Merge image/x-icon (old) with the newer standard
image/vnd.microsoft.icon
Modified:
tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
tika
Author: nick
Date: Wed Feb 13 14:47:20 2013
New Revision: 1445632
URL: http://svn.apache.org/r1445632
Log:
Patch from Ryan McKinley from TIKA-1083 - Add Link and UTI information for a
number of common mimetypes
Modified:
tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika
Author: nick
Date: Wed Feb 13 14:49:57 2013
New Revision: 1445637
URL: http://svn.apache.org/r1445637
Log:
ChangeLog entry for TIKA-1012 and TIKA-1083
Modified:
tika/trunk/CHANGES.txt
Modified: tika/trunk/CHANGES.txt
URL:
http://svn.apache.org/viewvc/tika/trunk/CHANGES.txt?rev=1445637&r1
Author: nick
Date: Tue Feb 5 10:03:45 2013
New Revision: 1442522
URL: http://svn.apache.org/viewvc?rev=1442522&view=rev
Log:
Add missing license header
Modified:
tika/trunk/tika-core/src/test/java/org/apache/tika/io/TailStreamTest.java
Modified:
tika/trunk/tika-core/src/test/java/org
Author: nick
Date: Mon Feb 4 16:05:26 2013
New Revision: 1442159
URL: http://svn.apache.org/viewvc?rev=1442159&view=rev
Log:
TIKA-1076 Upgrade to Apache POI 3.9. Commit disables some HSLF related unit
test checks, they need re-enabling along with a fix soon
Modified:
tika/trunk/tika-parsers
Author: nick
Date: Mon Feb 4 16:36:42 2013
New Revision: 1442168
URL: http://svn.apache.org/viewvc?rev=1442168&view=rev
Log:
Support tika:link and tika:uti mimetype extensions, along with unit tests.
Modified version of the patch from TIKA-1012
Modified:
tika/trunk/tika-core/src/main/java
Author: nick
Date: Mon Feb 4 17:06:12 2013
New Revision: 1442183
URL: http://svn.apache.org/viewvc?rev=1442183&view=rev
Log:
FileMaker Pro mime entry from Marco Quaranta from TIKA-1061
Modified:
tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
Modified:
tika
Author: nick
Date: Mon Feb 4 22:17:00 2013
New Revision: 1442399
URL: http://svn.apache.org/viewvc?rev=1442399&view=rev
Log:
TIKA-991 Enable the DURATION property
Modified:
tika/trunk/tika-core/src/main/java/org/apache/tika/metadata/XMPDM.java
Modified: tika/trunk/tika-core/src/main/java
Author: nick
Date: Mon Jan 28 17:35:24 2013
New Revision: 1439515
URL: http://svn.apache.org/viewvc?rev=1439515&view=rev
Log:
TIKA-1065 SAS subtype and mime magic
Modified:
tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
Modified:
tika/trunk/tika-core/src
Author: nick
Date: Fri Jan 18 16:37:28 2013
New Revision: 1435235
URL: http://svn.apache.org/viewvc?rev=1435235&view=rev
Log:
message/rfc822 pattern from Marco Quaranta from TIKA-1058
Modified:
tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
Modified:
tika
Author: nick
Date: Thu Jan 10 12:11:43 2013
New Revision: 1431313
URL: http://svn.apache.org/viewvc?rev=1431313&view=rev
Log:
TIKA-1055 patch from Bernhard Berger to add mimetypes for a number of
programming languages
Modified:
tika/trunk/tika-core/src/main/resources/org/apache/tika/mime
Author: nick
Date: Thu Jan 10 15:29:59 2013
New Revision: 1431426
URL: http://svn.apache.org/viewvc?rev=1431426&view=rev
Log:
Patch from Emmanuel Hugonnet from TIKA-1021 - PSD data lengths are even padded
Added:
tika/trunk/tika-parsers/src/test/resources/test-documents/testPSD2.psd
Author: nick
Date: Thu Jan 10 15:42:48 2013
New Revision: 1431440
URL: http://svn.apache.org/viewvc?rev=1431440&view=rev
Log:
Add a unit test for HDF4 files, which shows that TIKA-958 was already fixed
Added:
tika/trunk/tika-parsers/src/test/resources/test-documents/test.hdf (with
props
Author: nick
Date: Mon Dec 17 04:22:19 2012
New Revision: 1422750
URL: http://svn.apache.org/viewvc?rev=1422750&view=rev
Log:
TIKA-976 Excel95 files should be correctly detected, but as POI HSSF does not
support them they should not generate exceptions if you try to parse one
Added:
tika
Author: nick
Date: Fri Dec 14 03:08:58 2012
New Revision: 1421646
URL: http://svn.apache.org/viewvc?rev=1421646&view=rev
Log:
TIKA-1044 Fix issue for Word extractors on text that lacks any styling, plus
tests based on files from Jonas Wilhelmsson
Added:
tika/trunk/tika-parsers/src/test
Author: nick
Date: Mon Oct 22 08:24:18 2012
New Revision: 1400795
URL: http://svn.apache.org/viewvc?rev=1400795&view=rev
Log:
Add test CSS and JS files taken from the Tika website, and use these to add
additional detection unit tests for these two formats
Added:
tika/trunk/tika-parsers/src
Author: nick
Date: Wed Jul 18 22:39:12 2012
New Revision: 1363160
URL: http://svn.apache.org/viewvc?rev=1363160&view=rev
Log:
TIKA-957 NTIF mime entry and magic
Modified:
tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
Modified:
tika/trunk/tika-core/src/main
Author: nick
Date: Sat Jul 7 12:07:46 2012
New Revision: 1358550
URL: http://svn.apache.org/viewvc?rev=1358550&view=rev
Log:
TIKA-948 Add mime magic for ChemDraw .cdx files, then fix the Cli extraction
test so it has the correct extension
Modified:
tika/trunk/tika-app/src/test/java/org
Author: nick
Date: Fri Jul 6 16:44:40 2012
New Revision: 1358297
URL: http://svn.apache.org/viewvc?rev=1358297&view=rev
Log:
Test file from TIKA-948
Added:
tika/trunk/tika-parsers/src/test/resources/test-documents/testWORD_embedded_pdf.doc
(with props)
Added:
tika/trunk/tika-parsers
Author: nick
Date: Fri Jul 6 20:32:52 2012
New Revision: 1358403
URL: http://svn.apache.org/viewvc?rev=1358403&view=rev
Log:
TIKA-948 There is more than one way to embed things in OLE2, so add subtypes
for both
Modified:
tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika
Author: nick
Date: Fri Jul 6 20:36:02 2012
New Revision: 1358405
URL: http://svn.apache.org/viewvc?rev=1358405&view=rev
Log:
Fix the extraction test for the file type, and check for one additional file
Modified:
tika/trunk/tika-app/src/test/java/org/apache/tika/cli/TikaCLITest.java
Modified
Author: nick
Date: Fri Jul 6 23:12:10 2012
New Revision: 1358467
URL: http://svn.apache.org/viewvc?rev=1358467&view=rev
Log:
TIKA-948 Look up the file extension for the mimetype detected for embedded
resources, and fix unit tests for this
Modified:
tika/trunk/tika-app/src/test/java/org
Author: nick
Date: Wed Jul 4 15:08:04 2012
New Revision: 1357296
URL: http://svn.apache.org/viewvc?rev=1357296&view=rev
Log:
TIKA-949 Mimetype entries for some zip-based process/mapping formats
Modified:
tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
Author: nick
Date: Sat Jun 30 17:10:24 2012
New Revision: 1355771
URL: http://svn.apache.org/viewvc?rev=1355771&view=rev
Log:
TIKA-941 Sample KML and KMZ files, KML sample file from Google from the file
format documentation
Added:
tika/trunk/tika-parsers/src/test/resources/test-documents
Author: nick
Date: Sat Jun 30 17:15:53 2012
New Revision: 1355773
URL: http://svn.apache.org/viewvc?rev=1355773&view=rev
Log:
TIKA-941 Mark KMZ as being Zip based, so data only detection works properly
Modified:
tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
Author: nick
Date: Sat Jun 30 17:18:06 2012
New Revision: 1355776
URL: http://svn.apache.org/viewvc?rev=1355776&view=rev
Log:
TIKA-941 KML/KMZ detection unit tests
Modified:
tika/trunk/tika-parsers/src/test/java/org/apache/tika/detect/TestContainerAwareDetector.java
tika/trunk/tika
Author: nick
Date: Sat Jun 30 17:25:56 2012
New Revision: 1355780
URL: http://svn.apache.org/viewvc?rev=1355780&view=rev
Log:
TIKA-788 Some DWG files have an implausible header offset. Avoid problems and
just skip over them, pending a better understanding of the file format
Modified:
tika
Author: nick
Date: Sat Jun 30 17:40:05 2012
New Revision: 1355782
URL: http://svn.apache.org/viewvc?rev=1355782&view=rev
Log:
TIKA-863 Avoid creating a new AutoDetectParser (and implicit TikaConfig) for
each part in a RFC822 message. Instead, check for one on the ParseContext,
otherwise cache
Author: nick
Date: Thu Jun 21 17:14:18 2012
New Revision: 1352625
URL: http://svn.apache.org/viewvc?rev=1352625&view=rev
Log:
TIKA-940 Sample 7zip (7z) file, based on the zip example
Added:
tika/trunk/tika-parsers/src/test/resources/test-documents/test-documents.7z
(with props)
Added
Author: nick
Date: Thu Jun 21 17:19:40 2012
New Revision: 1352628
URL: http://svn.apache.org/viewvc?rev=1352628&view=rev
Log:
TIKA-940 Mime Magic and unit test for 7zip
Modified:
tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
tika/trunk/tika-parsers/src
Author: nick
Date: Wed Jun 13 15:40:02 2012
New Revision: 1349918
URL: http://svn.apache.org/viewvc?rev=1349918&view=rev
Log:
Fix the case of the .ar files in the unit tests (TIKA-935) - case must match
that stored in SVN or tests will fail on case-sensitive file systems
Modified:
tika
Author: nick
Date: Thu May 17 16:19:06 2012
New Revision: 1339682
URL: http://svn.apache.org/viewvc?rev=1339682&view=rev
Log:
TIKA-876 Another pkcs7 magic pattern
Modified:
tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
Modified:
tika/trunk/tika-core/src
Author: nick
Date: Thu May 17 16:20:10 2012
New Revision: 1339685
URL: http://svn.apache.org/viewvc?rev=1339685&view=rev
Log:
TIKA-929 Start to replace the old non-prefixed, largely non-property MSOffice
metadata definitions with new style ones
Added:
tika/trunk/tika-core/src/main/java/org
Author: nick
Date: Thu May 17 16:33:44 2012
New Revision: 1339695
URL: http://svn.apache.org/viewvc?rev=1339695&view=rev
Log:
TIKA-929 Bring some of the key parts of the Office metadata into
TikaCoreProperties, with composites to support the previous (now deprecated)
ones in MSOffice
Modified
Author: nick
Date: Thu May 17 17:30:32 2012
New Revision: 1339730
URL: http://svn.apache.org/viewvc?rev=1339730&view=rev
Log:
TIKA-929 Use the preferred constant rather than the IPTC imported one
Modified:
tika/trunk/tika-core/src/main/java/org/apache/tika/metadata/Office.java
Modified:
tika
Author: nick
Date: Thu May 17 19:09:48 2012
New Revision: 1339804
URL: http://svn.apache.org/viewvc?rev=1339804&view=rev
Log:
TIKA-928 Patch from Ray Gauss to improve metadata properties setting/getting
Modified:
tika/trunk/tika-core/src/main/java/org/apache/tika/metadata/Metadata.java
Author: nick
Date: Thu May 17 19:22:04 2012
New Revision: 1339811
URL: http://svn.apache.org/viewvc?rev=1339811&view=rev
Log:
Make the composite test more explicit in what it does, fix up some deprecated
warnings, and fix the typed getters for composites
Modified:
tika/trunk/tika-core/src
Author: nick
Date: Thu May 17 19:36:58 2012
New Revision: 1339818
URL: http://svn.apache.org/viewvc?rev=1339818&view=rev
Log:
Fix setter to be by property not name for add(Property)
Modified:
tika/trunk/tika-core/src/main/java/org/apache/tika/metadata/Metadata.java
Modified:
tika/trunk/tika
Author: nick
Date: Thu May 17 19:49:03 2012
New Revision: 1339822
URL: http://svn.apache.org/viewvc?rev=1339822&view=rev
Log:
TIKA-928 Fix up the DWG parser and tests to use the new style properties
Modified:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/dwg/DWGParser.java
Author: nick
Date: Thu May 17 20:31:58 2012
New Revision: 1339850
URL: http://svn.apache.org/viewvc?rev=1339850&view=rev
Log:
TIKA-842 Patch from Ray Gauss to split out the Photoshop and XMP Rights
namespaces, and updates IPTC to use the new DublinCore properties (plus fix
inconsistent indents
Modified:
tika/trunk/tika-core/src/main/java/org/apache/tika/metadata/Office.java
URL:
http://svn.apache.org/viewvc/tika/trunk/tika-core/src/main/java/org/apache/tika/metadata/Office.java?rev=1339850&r1=1339849&r2=1339850&view=diff
Author: nick
Date: Thu May 17 20:56:05 2012
New Revision: 1339860
URL: http://svn.apache.org/viewvc?rev=1339860&view=rev
Log:
TIKA-929 Bring across MSOffice.AUTHOR in the same way as initial and last
authors
Modified:
tika/trunk/tika-core/src/main/java/org/apache/tika/metadata/MSOffice.java
Author: nick
Date: Fri May 18 01:30:57 2012
New Revision: 1339946
URL: http://svn.apache.org/viewvc?rev=1339946&view=rev
Log:
TIKA-929 Ensure backwards compatibility on the Office document statistics
Modified:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft
Author: nick
Date: Fri May 18 01:55:18 2012
New Revision: 1339951
URL: http://svn.apache.org/viewvc?rev=1339951&view=rev
Log:
TIKA-842 Patch from Ray Gauss to tidy up a few property names
Modified:
tika/trunk/tika-core/src/main/java/org/apache/tika/metadata/IPTC.java
Modified: tika/trunk
Author: nick
Date: Wed May 16 16:32:41 2012
New Revision: 1339255
URL: http://svn.apache.org/viewvc?rev=1339255&view=rev
Log:
TIKA-917 Pull the property definitions out to their own class, add more machine
types, and define the platform
Added:
tika/trunk/tika-parsers/src/main/java/org
Author: nick
Date: Wed May 16 17:33:15 2012
New Revision: 1339276
URL: http://svn.apache.org/viewvc?rev=1339276&view=rev
Log:
TIKA-925 - Patch from Ray Gauss to start on improving how the common metadata
is stored/fetched
Modified:
tika/trunk/tika-core/pom.xml
tika/trunk/tika-core/src
Author: nick
Date: Wed May 16 20:03:27 2012
New Revision: 1339332
URL: http://svn.apache.org/viewvc?rev=1339332&view=rev
Log:
TIKA-917 Some more sample elf files
Added:
tika/trunk/tika-parsers/src/test/resources/test-documents/testFreeBSD-x86-64
(with props)
tika/trunk/tika-parsers
Author: nick
Date: Wed May 16 21:42:49 2012
New Revision: 1339380
URL: http://svn.apache.org/viewvc?rev=1339380&view=rev
Log:
TIKA-927 - Patch from Ray Gauss to support Composite Properties (useful for
backwards compatibility, and mapping between application and core properties)
Modified
Author: nick
Date: Wed May 16 22:04:20 2012
New Revision: 1339389
URL: http://svn.apache.org/viewvc?rev=1339389&view=rev
Log:
TIKA-917 Get the elf OS, if that bit of the header is set (but it often gets
left as null)
Modified:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser
Author: nick
Date: Wed May 16 22:37:19 2012
New Revision: 1339404
URL: http://svn.apache.org/viewvc?rev=1339404&view=rev
Log:
TIKA-928 Patch from Ray Gauss (plus extra JavaDocs) - start to define the set
of common consistent metadata that all parsers will try to provide, no matter
what
Author: nick
Date: Wed May 16 23:04:29 2012
New Revision: 1339416
URL: http://svn.apache.org/viewvc?rev=1339416&view=rev
Log:
Add some simple JavaDoc descriptions of the property types, to help people who
don't natively speak XMP! (TIKA-926 related)
Modified:
tika/trunk/tika-core/src/main
Author: nick
Date: Wed May 16 23:15:09 2012
New Revision: 1339418
URL: http://svn.apache.org/viewvc?rev=1339418&view=rev
Log:
TIKA-926 Patch from Ray Gauss to allow set(Property,String[]) and
add(Property,String), to mirror the string key based methods but with type
safety
Modified:
tika
Author: nick
Date: Sun May 13 13:47:04 2012
New Revision: 1337879
URL: http://svn.apache.org/viewvc?rev=1337879&view=rev
Log:
TIKA-917 A few sample files for Linux-ELF, and a PE32 one, plus the C file
Added:
tika/trunk/tika-parsers/src/test/resources/test-documents/testC.c
tika/trunk
Author: nick
Date: Sun May 13 18:47:19 2012
New Revision: 1337962
URL: http://svn.apache.org/viewvc?rev=1337962&view=rev
Log:
TIKA-917 Start on a parser for PE and ELF executables, to output metadata
Added:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/executable/
tika
Author: nick
Date: Thu May 10 11:37:13 2012
New Revision: 1336610
URL: http://svn.apache.org/viewvc?rev=1336610&view=rev
Log:
TIKA-913 Mime Magic for PE, PE32 and PE64 executables
Modified:
tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
Modified:
tika/trunk
Author: nick
Date: Thu May 10 11:52:05 2012
New Revision: 1336622
URL: http://svn.apache.org/viewvc?rev=1336622&view=rev
Log:
TIKA-915 related - add mime magic for the elf format too, based on the
mimetypes in the httpd magic file
Modified:
tika/trunk/tika-core/src/main/resources/org/apache
Author: nick
Date: Wed May 9 14:36:42 2012
New Revision: 1336227
URL: http://svn.apache.org/viewvc?rev=1336227&view=rev
Log:
Whoops, properly disable the test for TIKA-915 this time...
Modified:
tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/jpeg/JpegParserTest.java
Modified
Author: nick
Date: Fri May 4 14:38:34 2012
New Revision: 1333994
URL: http://svn.apache.org/viewvc?rev=1333994&view=rev
Log:
Add Adobe AfterEffects mimetypes, fix up the Adobe Premiere detection, and give
.AEP to AfterEffects as it seems much more common now than AudioGraph
Modified:
tika
Author: nick
Date: Sun Apr 29 16:42:10 2012
New Revision: 1331945
URL: http://svn.apache.org/viewvc?rev=1331945&view=rev
Log:
TIKA-858 Fix Java 1.6isms
Modified:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/iptc/IptcAnpaParser.java
Modified:
tika/trunk/tika-parsers/src/main
Author: nick
Date: Sun Apr 29 16:43:16 2012
New Revision: 1331946
URL: http://svn.apache.org/viewvc?rev=1331946&view=rev
Log:
TIKA-858 Fix Java 1.6isms
Modified:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/iptc/IptcAnpaParser.java
Modified:
tika/trunk/tika-parsers/src/main
Author: nick
Date: Sat Apr 28 16:08:21 2012
New Revision: 1331788
URL: http://svn.apache.org/viewvc?rev=1331788&view=rev
Log:
TIKA-852 Upgrade the MP4 parser to 1.0 RC1, which allows us to enable the MP4
unit test (patch from Sebastian Annies)
Modified:
tika/trunk/tika-bundle/pom.xml
Author: nick
Date: Sat Apr 28 16:53:35 2012
New Revision: 1331794
URL: http://svn.apache.org/viewvc?rev=1331794&view=rev
Log:
TIKA-858 Patch from Craig Stires to add support for parsing IPTC ANPA News Wire
Feeds
Added:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/iptc
Author: nick
Date: Sat Apr 28 18:17:58 2012
New Revision: 1331801
URL: http://svn.apache.org/viewvc?rev=1331801&view=rev
Log:
TIKA-858 Fix Java 1.6isms
Modified:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/iptc/IptcAnpaParser.java
Modified:
tika/trunk/tika-parsers/src/main
Author: nick
Date: Fri Apr 27 13:59:49 2012
New Revision: 1331434
URL: http://svn.apache.org/viewvc?rev=1331434&view=rev
Log:
TIKA-861 Patch from Ryan Quam to enable extracting PDF Links. (Links are
extracted for now at the end of the page, further work will be needed to match
them to the text
Author: nick
Date: Fri Apr 27 22:48:32 2012
New Revision: 1331618
URL: http://svn.apache.org/viewvc?rev=1331618&view=rev
Log:
TIKA-906 Support extracting Headers, Footers and Footnotes in iWorks Pages
files. As part of this, make the parser a little more aware of where in the
file
Author: nick
Date: Fri Apr 27 23:26:44 2012
New Revision: 1331634
URL: http://svn.apache.org/viewvc?rev=1331634&view=rev
Log:
Magic for PKCS7 in PEM format, and DER format (probably...)
Modified:
tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
Modified:
tika
Author: nick
Date: Fri Apr 27 23:55:09 2012
New Revision: 1331640
URL: http://svn.apache.org/viewvc?rev=1331640&view=rev
Log:
TIKA-907 Comments in iWorks Pages files
Added:
tika/trunk/tika-parsers/src/test/resources/test-documents/testPagesComments.pages
(with props)
Modified:
tika
Author: nick
Date: Tue Apr 24 09:52:03 2012
New Revision: 1329614
URL: http://svn.apache.org/viewvc?rev=1329614&view=rev
Log:
Update the Detector Documentation for DefaultDetector, which replaced the older
ContainerAwareDetection
Modified:
tika/site/publish/1.1/detection.html
tika/site
Author: nick
Date: Fri Apr 20 13:32:55 2012
New Revision: 1328370
URL: http://svn.apache.org/viewvc?rev=1328370&view=rev
Log:
TIKA-897 Detect XML files that start with the UTF-8 BOM, plus test
Added:
tika/trunk/tika-core/src/test/resources/org/apache/tika/mime/test-utf8-bom.xml
Author: nick
Date: Thu Apr 5 13:37:27 2012
New Revision: 1309852
URL: http://svn.apache.org/viewvc?rev=1309852&view=rev
Log:
TIKA-890 Sample APK file, along with sample EAR and WAR files (related)
Added:
tika/trunk/tika-parsers/src/test/resources/test-documents/testAPK.apk
(with props
Author: nick
Date: Thu Apr 5 13:39:25 2012
New Revision: 1309854
URL: http://svn.apache.org/viewvc?rev=1309854&view=rev
Log:
TIKA-890 Container Aware detection of JAR derived types such as WAR, EAR and
APK, with tests
Modified:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser
Author: nick
Date: Wed Feb 8 17:59:47 2012
New Revision: 1242018
URL: http://svn.apache.org/viewvc?rev=1242018&view=rev
Log:
TIKA-852 Support setting the channel type from a channel count in mp4, via a
couple of different possible routes (see dev@tika discussions)
Modified:
tika/trunk/tika
Author: nick
Date: Sat Jan 28 18:42:35 2012
New Revision: 1237139
URL: http://svn.apache.org/viewvc?rev=1237139&view=rev
Log:
TIKA-852 Sample MP4 Audio (M4A) file
Added:
tika/trunk/tika-parsers/src/test/resources/test-documents/testMP4.m4a
(with props)
Added: tika/trunk/tika-parsers/src
Author: nick
Date: Fri Jan 27 14:50:51 2012
New Revision: 1236700
URL: http://svn.apache.org/viewvc?rev=1236700&view=rev
Log:
TIKA-851 More specific quicktime/mp4 matches, for the common subtypes, based on
the ftyp atom
Modified:
tika/trunk/tika-core/src/main/resources/org/apache/tika/mime
Author: nick
Date: Fri Jan 27 17:03:03 2012
New Revision: 1236764
URL: http://svn.apache.org/viewvc?rev=1236764&view=rev
Log:
TIKA-842 Avoid property name clash with IPTC and the old-style values from
DublinCore
Modified:
tika/trunk/tika-core/src/main/java/org/apache/tika/metadata
Author: nick
Date: Tue Jan 24 14:48:58 2012
New Revision: 1235284
URL: http://svn.apache.org/viewvc?rev=1235284&view=rev
Log:
TIKA-760 Avoid NPE in XHTMLContentHandler if a null string is passed to the
characters method
Modified:
tika/trunk/tika-core/src/main/java/org/apache/tika/sax
Author: nick
Date: Tue Jan 24 16:10:34 2012
New Revision: 1235321
URL: http://svn.apache.org/viewvc?rev=1235321&view=rev
Log:
TIKA-770 Convert the remaining ODF document statistics to be defined
properties, and update all of the Office Count statistics to be integer typed
properties
Modified
Author: nick
Date: Mon Jan 23 15:45:32 2012
New Revision: 1234860
URL: http://svn.apache.org/viewvc?rev=1234860&view=rev
Log:
TIKA-846 Fix indent
Modified:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/xml/ElementMetadataHandler.java
Modified:
tika/trunk/tika-parsers/src
Author: nick
Date: Mon Jan 23 16:08:10 2012
New Revision: 1234873
URL: http://svn.apache.org/viewvc?rev=1234873&view=rev
Log:
TIKA-845 Correct the conversion of XML tags to multi-valued metadata values,
and avoid duplicating existing values
Modified:
tika/trunk/tika-parsers/src/main/java
Author: nick
Date: Mon Jan 23 16:29:55 2012
New Revision: 1234886
URL: http://svn.apache.org/viewvc?rev=1234886&view=rev
Log:
TIKA-849 Add a sample iBooks epub file from Andrew Jackson, and add a unit test
for the Zip Container Detector of epub zip formats
Added:
tika/trunk/tika-parsers/src
Author: nick
Date: Mon Jan 23 16:57:55 2012
New Revision: 1234901
URL: http://svn.apache.org/viewvc?rev=1234901&view=rev
Log:
TIKA-849 iBooks epub mimetype entry, and fix a few comments
Modified:
tika/trunk/tika-core/src/main/resources/org/apache/tika/mime/tika-mimetypes.xml
Modified:
tika
Author: nick
Date: Mon Jan 23 17:10:20 2012
New Revision: 1234904
URL: http://svn.apache.org/viewvc?rev=1234904&view=rev
Log:
TIKA-849 Initial ibooks epub support and test, from Andrew Jackson. Metadata
only for now though, text isn't coming through as it's within object tags
Added:
tika
Author: nick
Date: Fri Jan 20 15:56:05 2012
New Revision: 1233973
URL: http://svn.apache.org/viewvc?rev=1233973&view=rev
Log:
TIKA-507 FontBox powered .afm font metrics parser, patch from Fernando Arreola
Added:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/font
Author: nick
Date: Fri Jan 20 17:01:12 2012
New Revision: 1234003
URL: http://svn.apache.org/viewvc?rev=1234003&view=rev
Log:
TIKA-843 Metadata support for dates without times (treated as midnight UTC)
Modified:
tika/trunk/tika-core/src/main/java/org/apache/tika/metadata/Metadata.java
Author: nick
Date: Mon Jan 16 10:40:32 2012
New Revision: 1231905
URL: http://svn.apache.org/viewvc?rev=1231905&view=rev
Log:
TIKA-805 Improved .pptx XSLF extraction, patch from Yegor Kozlov
Modified:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/ooxml
Author: nick
Date: Fri Jan 13 15:01:54 2012
New Revision: 1231117
URL: http://svn.apache.org/viewvc?rev=1231117&view=rev
Log:
TIKA-840 Update the OOXML parsers, so that rather than hard coding the content
type, the file specific one is fetched and set
Modified:
tika/trunk/tika-parsers/src
Author: nick
Date: Tue Jan 3 05:12:47 2012
New Revision: 1226651
URL: http://svn.apache.org/viewvc?rev=1226651&view=rev
Log:
TIKA-826 We don't currently support .xps or .xlsb files (which are OOXML
based), so ensure we don't explicitly claim them, and have the OOXML parser
decline if it gets
Author: nick
Date: Thu Dec 29 09:14:11 2011
New Revision: 1225482
URL: http://svn.apache.org/viewvc?rev=1225482&view=rev
Log:
Add TIKA-793 to the changelog
Modified:
tika/trunk/CHANGES.txt
Modified: tika/trunk/CHANGES.txt
URL:
http://svn.apache.org/viewvc/tika/trunk/CHANGES.txt?rev
Author: nick
Date: Thu Dec 29 06:59:08 2011
New Revision: 1225451
URL: http://svn.apache.org/viewvc?rev=1225451&view=rev
Log:
TIKA-831 Start on a test for the ForkParser with a parser exception that isn't
serializable (currently not working so disabled)
Modified:
tika/trunk/tika-parsers/src
Author: nick
Date: Thu Dec 29 07:17:07 2011
New Revision: 1225454
URL: http://svn.apache.org/viewvc?rev=1225454&view=rev
Log:
TIKA-793 Unit test for i18n MP3 tags (excluding comments)
Modified:
tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mp3/ID3v2Frame.java
tika/trunk
Author: nick
Date: Wed Dec 28 05:31:10 2011
New Revision: 1225099
URL: http://svn.apache.org/viewvc?rev=1225099&view=rev
Log:
TIKA-833 Mark some more Excel formatting tests as passing (with tweaks to match
what actually gets stored)
Modified:
tika/trunk/tika-parsers/src/test/java/org/apache
Author: nick
Date: Mon Dec 26 12:51:47 2011
New Revision: 1224728
URL: http://svn.apache.org/viewvc?rev=1224728&view=rev
Log:
TIKA-831 Fix the data type when comparing errors from the forked server, and
add some more Forked unit tests (one disabled) - patch originally from Jerome
Lacoste
Author: nick
Date: Tue Dec 27 02:44:28 2011
New Revision: 1224863
URL: http://svn.apache.org/viewvc?rev=1224863&view=rev
Log:
TIKA-827 Handle sending non serializable exceptions back from the ForkServer
Modified:
tika/trunk/tika-core/src/main/java/org/apache/tika/fork/ForkServer.java
Author: nick
Date: Tue Dec 27 02:45:14 2011
New Revision: 1224864
URL: http://svn.apache.org/viewvc?rev=1224864&view=rev
Log:
TIKA-831 Fix test warnings, and enable the last test (needs to not use the Tika
facade)
Modified:
tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/fork
Author: nick
Date: Tue Dec 27 03:00:53 2011
New Revision: 1224865
URL: http://svn.apache.org/viewvc?rev=1224865&view=rev
Log:
TIKA-793 Correct the null termination stripping in the ID3 tag code, when
dealing with double byte encoded strings
Modified:
tika/trunk/tika-parsers/src/main/java
Author: nick
Date: Mon Dec 26 04:04:54 2011
New Revision: 1224675
URL: http://svn.apache.org/viewvc?rev=1224675&view=rev
Log:
TIKA-829 Validate inputs to the ForkParser constructor (must not be another
ForkParser) and TikaInputStream get (must not be null) - patch from Jerome
Lacoste
Modified