Re: renaming master?

2020-06-17 Thread Ray Gauss II
Hi all, Apologies for not being able to be very involved over the past few years, but still trying to follow along and hoping to get time to contribute in the future. Another option might be ‘stable’? - Ray > On Jun 16, 2020, at 1:31 PM, Tim Allison wrote: > > All, > > As you may have see

[jira] [Commented] (TIKA-2056) Installing exiftool causes ForkParserIntegration test errors

2016-08-25 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2056?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15436705#comment-15436705 ] Ray Gauss II commented on TIKA-2056: My guess is that when Exiftool is availabl

Re: Tika 1.14?

2016-08-12 Thread Ray Gauss
I believe we've also still got the issue of structured metadata outstanding. Regards, Ray > On Aug 12, 2016, at 6:27 AM, Nick Burch wrote: > > On Thu, 11 Aug 2016, Bob Paulin wrote: >> I know it's been a little bit since we talked about 2.0. We had discussed >> holding off while some API cha

Re: Getting Files Tags

2016-04-19 Thread Ray Gauss
Hi Rajkumar, We don't have a lot to go on to help you, but depending on the Windows app you're using there's a good chance the tags are being stored as IPTC in which case Tika should have no problem extracting them. You've not said how you'd like to leverage Tika (Java, CLI, GUI, etc.), but an

[jira] [Commented] (TIKA-774) ExifTool Parser

2016-03-23 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-774?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15209162#comment-15209162 ] Ray Gauss II commented on TIKA-774: --- bq. we should add a static check for whe

[jira] [Updated] (TIKA-1906) ExternalParser No Longer Supports Commands in Array Format

2016-03-23 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II updated TIKA-1906: --- Fix Version/s: 1.13 2.0 > ExternalParser No Longer Supports Commands in Ar

[jira] [Resolved] (TIKA-1906) ExternalParser No Longer Supports Commands in Array Format

2016-03-23 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II resolved TIKA-1906. Resolution: Fixed > ExternalParser No Longer Supports Commands in Array For

[jira] [Comment Edited] (TIKA-1906) ExternalParser No Longer Supports Commands in Array Format

2016-03-22 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206138#comment-15206138 ] Ray Gauss II edited comment on TIKA-1906 at 3/22/16 2:3

[jira] [Commented] (TIKA-1906) ExternalParser No Longer Supports Commands in Array Format

2016-03-22 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1906?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15206138#comment-15206138 ] Ray Gauss II commented on TIKA-1906: bq. agreed, sorry must have missed that

[jira] [Created] (TIKA-1906) ExternalParser No Longer Supports Commands in Array Format

2016-03-21 Thread Ray Gauss II (JIRA)
Ray Gauss II created TIKA-1906: -- Summary: ExternalParser No Longer Supports Commands in Array Format Key: TIKA-1906 URL: https://issues.apache.org/jira/browse/TIKA-1906 Project: Tika Issue Type

[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2016-03-15 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15196030#comment-15196030 ] Ray Gauss II commented on TIKA-1607: bq. It might be more easily configurable to

[jira] [Comment Edited] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2016-03-15 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193845#comment-15193845 ] Ray Gauss II edited comment on TIKA-1607 at 3/15/16 1:5

[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2016-03-15 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15195326#comment-15195326 ] Ray Gauss II commented on TIKA-1607: Sorry, I meant {{EmbeddedDocumentExtra

[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2016-03-14 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193845#comment-15193845 ] Ray Gauss II commented on TIKA-1607: Have we already considered treating the

Re: [DISCUSS] options for XMP parsing?

2016-03-14 Thread Ray Gauss
ave a PDF file with two packets containing conflicting authorship info IIRC! > :) It would be nice to expose both the canonical XMP info (with proper > processing of "later-xmp-overrides-earlier") as well as all of the info that > can be scraped from the XMP (packet1: authorX
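
The two views described in the preview above (one canonical view where a later packet overrides an earlier one, plus a view that keeps every scraped value) could be sketched roughly as follows. This is only an illustration under stated assumptions: the class `XmpPacketMerge`, its method names, and the use of plain string maps for packets are hypothetical and are not part of Tika, tika-xmp, or xmpcore, and real XMP reconciliation is likely more involved than a simple overwrite.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: each XMP packet is assumed to be already parsed into a
// flat map of property name -> value, listed in the order found in the file.
final class XmpPacketMerge {
    private XmpPacketMerge() {}

    /** Canonical view: values from later packets overwrite earlier ones. */
    static Map<String, String> canonical(List<Map<String, String>> packets) {
        Map<String, String> merged = new LinkedHashMap<>();
        for (Map<String, String> packet : packets) {
            merged.putAll(packet); // later packet wins on key collisions
        }
        return merged;
    }

    /** Exhaustive view: every value seen for a property, in packet order. */
    static Map<String, List<String>> allValues(List<Map<String, String>> packets) {
        Map<String, List<String>> all = new LinkedHashMap<>();
        for (Map<String, String> packet : packets) {
            for (Map.Entry<String, String> entry : packet.entrySet()) {
                all.computeIfAbsent(entry.getKey(), k -> new ArrayList<>())
                   .add(entry.getValue());
            }
        }
        return all;
    }

    public static void main(String[] args) {
        // Illustrative values only; not taken from any real file.
        Map<String, String> first = new LinkedHashMap<>();
        first.put("dc:creator", "authorX");
        Map<String, String> second = new LinkedHashMap<>();
        second.put("dc:creator", "authorY");

        List<Map<String, String>> packets = List.of(first, second);
        System.out.println(canonical(packets));
        System.out.println(allValues(packets));
    }
}
```

Keeping both views side by side would let a caller take the canonical value by default and still inspect conflicts when needed.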

[jira] [Commented] (TIKA-1894) Add XMPMM metadata extraction to JempboxExtractor

2016-03-14 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1894?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15193622#comment-15193622 ] Ray Gauss II commented on TIKA-1894: The {{tika-xmp}} project deals with converti

Re: [DISCUSS] options for XMP parsing?

2016-03-08 Thread Ray Gauss
k you. Will take a look. > > -Original Message- > From: Ray Gauss [mailto:ray.ga...@alfresco.com] > Sent: Tuesday, March 08, 2016 1:55 PM > To: dev@tika.apache.org > Subject: Re: [DISCUSS] options for XMP parsing? > > Hi Tim, > > We're already using Adobe's

Re: [DISCUSS] options for XMP parsing?

2016-03-08 Thread Ray Gauss
Hi Tim, We're already using Adobe's xmpcore in tika-xmp which works fine for parsing XMP (though it has not seen updates in a while), but getting the XMP packets out of the files is trickier. We have XMPPacketScanner which works for many cases, but not all. InDesign files for example do some st

[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2016-02-25 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15167135#comment-15167135 ] Ray Gauss II commented on TIKA-1607: I know there can be multiple XMP packets

[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2016-02-19 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15154205#comment-15154205 ] Ray Gauss II commented on TIKA-1607: In my experience people gravitate towards

[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2016-02-16 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15149231#comment-15149231 ] Ray Gauss II commented on TIKA-1607: Are we opening a can of worms by encouraging

[jira] [Commented] (TIKA-1824) Tika 2.0 - Create Initial Parser Modules

2016-02-03 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1824?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15130386#comment-15130386 ] Ray Gauss II commented on TIKA-1824: bq. Thank you, Bob Paulin! Again, thi

Re: Tika 2.0 Source in Modules or tika-parser

2015-12-14 Thread Ray Gauss
I'd vote for a tika-parser-common(s) artifact for common util classes and dependencies. > On Dec 14, 2015, at 10:54 AM, Ken Krugler wrote: > > >> From: Bob Paulin >> Sent: December 13, 2015 7:34:03pm PST >> To: dev@tika.apache.org >> Subject: Tika 2.0 Source in Modules or tika-parser >> >> H

Re: extracting contributor information?

2015-09-30 Thread Ray Gauss
For edits I'd say +1. For annotations and comments I'm undecided; the DC definition is somewhat vague: "An entity responsible for making contributions to the resource." If a user's comment is "this document is terrible", is he/she a contributor? Regards, Ray > On Sep 30, 2015, at 4:31 PM, Alli

[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2015-09-15 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14746719#comment-14746719 ] Ray Gauss II commented on TIKA-1607: Hi [~talli...@mitre.org], apologies for the d

Re: One for our XMP experts - Property with indexed closed choice?

2015-08-25 Thread Ray Gauss
Hi Nick, I realize this is ancient, but did you get any further with it? (didn't see a response when searching the list) My vote would be to always store the value as close to the spec as possible, so integers in this case, then work out a way for pulling out a localized 'display' string of th

[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2015-08-21 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14706706#comment-14706706 ] Ray Gauss II commented on TIKA-1607: Yes, by shoehorn I meant that the inde

[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2015-08-20 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704880#comment-14704880 ] Ray Gauss II commented on TIKA-1607: I did see that, but I was after full

[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2015-08-19 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14704108#comment-14704108 ] Ray Gauss II commented on TIKA-1607: [~chrismattmann], I did. It seemed more sim

[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2015-08-19 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14703924#comment-14703924 ] Ray Gauss II commented on TIKA-1607: I've put together the start of the DOM

[jira] [Commented] (TIKA-1607) Introduce new arbitrary object key/values data structure for persistence of Tika Metadata

2015-08-06 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14660441#comment-14660441 ] Ray Gauss II commented on TIKA-1607: To clarify, the work mentioned above that use

[jira] [Commented] (TIKA-1607) Introduce new HashMap data structure for persistence of Tika Metadata

2015-04-21 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1607?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14505054#comment-14505054 ] Ray Gauss II commented on TIKA-1607: We've had a few discussions on s

[jira] [Commented] (TIKA-1594) Webp parsing support

2015-04-07 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1594?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14484463#comment-14484463 ] Ray Gauss II commented on TIKA-1594: I'd recommend that for now we t

[jira] [Commented] (TIKA-634) Command Line Parser for Metadata Extraction

2015-03-01 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-634?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14342547#comment-14342547 ] Ray Gauss II commented on TIKA-634: --- Also see the [tika-ffmpeg project|https://github

[jira] [Commented] (TIKA-1510) FFMpeg installed but not parsing video files

2015-01-12 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273520#comment-14273520 ] Ray Gauss II commented on TIKA-1510: Yes. The only reason I haven't mysel

[jira] [Commented] (TIKA-1510) FFMpeg installed but not parsing video files

2015-01-11 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14273049#comment-14273049 ] Ray Gauss II commented on TIKA-1510: In that project there

[jira] [Commented] (TIKA-93) OCR support

2014-09-15 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14134822#comment-14134822 ] Ray Gauss II commented on TIKA-93: -- You could use [{{org.junit.Assume}}|

Re: How should video files with audio be handled by parsers?

2014-08-20 Thread Ray Gauss
You could do something like that, or:

  Property phone = Contact.PHONE(1,2,2);
  System.out.println(phone.getName());    // -> company[1]/contact[2]/phone[2]

Are these the droids I'm looking for?

  https://github.com/Gagravarr/VorbisJava/tree/master/tika/src/main/java/org/gagravarr/tika

Regar
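
The indexed naming shown in the preview above (three indexes producing `company[1]/contact[2]/phone[2]`) could be sketched in plain Java roughly like this. The class `IndexedName` and its methods are hypothetical stand-ins written only for illustration; they are not Tika's `Property` API and not the `Contact.PHONE` factory from the message.

```java
// Hypothetical sketch of building XPath-like indexed property names.
// Nothing here is Tika API; the segment names mirror the example in the thread.
final class IndexedName {
    private IndexedName() {}

    /** Joins (segment, index) pairs into a name such as a[1]/b[2]. */
    static String of(String[] segments, int[] indexes) {
        if (segments.length != indexes.length) {
            throw new IllegalArgumentException(
                    "segments and indexes must have the same length");
        }
        StringBuilder name = new StringBuilder();
        for (int i = 0; i < segments.length; i++) {
            if (i > 0) {
                name.append('/');
            }
            name.append(segments[i]).append('[').append(indexes[i]).append(']');
        }
        return name.toString();
    }

    /** Convenience factory for the company/contact/phone hierarchy. */
    static String phone(int company, int contact, int phone) {
        return of(new String[] {"company", "contact", "phone"},
                  new int[] {company, contact, phone});
    }

    public static void main(String[] args) {
        System.out.println(phone(1, 2, 2)); // company[1]/contact[2]/phone[2]
    }
}
```

Generating the name from indexes, rather than concatenating strings at each call site, keeps the bracket syntax in one place.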

Re: How should video files with audio be handled by parsers?

2014-08-19 Thread Ray Gauss
st 7, 2014 at 6:21:37 AM, Nick Burch (apa...@gagravarr.org) wrote: > On Wed, 6 Aug 2014, Ray Gauss wrote: > > I've updated tika-ffmpeg with a new file with 2 audio tracks and a > > subtitle track and added a test. The metadata looks as follows: > > > > pbcore:instanti

[jira] [Commented] (TIKA-93) OCR support

2014-08-19 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102193#comment-14102193 ] Ray Gauss II commented on TIKA-93: -- Apologies, jumped in late and only glanced at

[jira] [Commented] (TIKA-93) OCR support

2014-08-19 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-93?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14102175#comment-14102175 ] Ray Gauss II commented on TIKA-93: -- Can you create a config object and pass that in

Re: How should video files with audio be handled by parsers?

2014-08-06 Thread Ray Gauss
ountryName ... Iptc4xmpExt:LocationShown[0]/Iptc4xmpExt:City Iptc4xmpExt:LocationShown[0]/Iptc4xmpExt:CountryName ... IMHO, a generic 'streams' prefix would seem out of place next to those fields. Regards, Ray On July 24, 2014 at 9:52:47 AM, Nick Burch (apa...@gagravarr.org) wrote:

Re: How should video files with audio be handled by parsers?

2014-07-23 Thread Ray Gauss
They are a bit verbose, but: 1) I'd really like to stick to the specification as closely as possible. 2) There are several PBCore instantiation properties that apply to the entire file like duration and tracks that we'd want prefixed with pbcore so I think it would be odd to see:   pbcore:

Re: How should video files with audio be handled by parsers?

2014-07-22 Thread Ray Gauss
rinsic metadata.  The informational metadata will be extracted by things like XMP parsers. Regards, Ray On July 22, 2014 at 7:39:12 AM, Nick Burch (apa...@gagravarr.org) wrote: > On Tue, 22 Jul 2014, Ray Gauss wrote: > > This is a few months old but I've been looking at this recently

Re: How should video files with audio be handled by parsers?

2014-07-21 Thread Ray Gauss
Are you able to contribute to tika ? > > Sent from my iPhone > > > On Jul 21, 2014, at 6:43 PM, "Ray Gauss" wrote: > > > > Hi all, > > > > This is a few months old but I've been looking at this recently and since > > we're unlikely

Re: How should video files with audio be handled by parsers?

2014-07-21 Thread Ray Gauss
Hi all, This is a few months old but I've been looking at this recently and since we're unlikely to move to a structured metadata store in the short term I've come up with what I think is an interim solution [1] that essentially allows nesting through XPath-like syntax:     stream[0]/field1=so
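
As a rough illustration of the interim idea in the preview above (nesting expressed through XPath-like keys in a flat store), the sketch below groups keys of the form `stream[N]/field` back into per-stream maps. The class `NestedKeys`, the regular expression, and the sample values are assumptions made for this example; the actual proposal linked as [1] in the message may differ.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch: metadata stays a flat String -> String map, and
// nesting is recovered by parsing keys shaped like prefix[index]/field.
final class NestedKeys {
    private static final Pattern KEY = Pattern.compile("^(\\w+)\\[(\\d+)\\]/(.+)$");

    private NestedKeys() {}

    /** Groups entries for one prefix by index; non-matching keys are skipped. */
    static Map<Integer, Map<String, String>> group(Map<String, String> flat, String prefix) {
        Map<Integer, Map<String, String>> grouped = new TreeMap<>();
        for (Map.Entry<String, String> entry : flat.entrySet()) {
            Matcher m = KEY.matcher(entry.getKey());
            if (m.matches() && m.group(1).equals(prefix)) {
                int index = Integer.parseInt(m.group(2));
                grouped.computeIfAbsent(index, k -> new LinkedHashMap<>())
                       .put(m.group(3), entry.getValue());
            }
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Sample values are placeholders, not taken from the thread.
        Map<String, String> flat = new LinkedHashMap<>();
        flat.put("stream[0]/field1", "a");
        flat.put("stream[0]/field2", "b");
        flat.put("stream[1]/field1", "c");
        flat.put("title", "unrelated");
        System.out.println(group(flat, "stream"));
    }
}
```

The appeal of this shape is that existing consumers of a flat metadata map keep working, while callers that understand the key syntax can rebuild the structure.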

Re: Can some of tika-parsers module dependencies be made optional ?

2014-07-15 Thread Ray Gauss
> > documentation) is better. This is not that destabilizing to be honest - > > any practical application is expected to be aware of the actual file > > formats and parser libs supporting those formats. > > > > But I'd like to propose tika-parsers-optional as a

Re: Can some of tika-parsers module dependencies be made optional ?

2014-06-21 Thread Ray Gauss
I’d have to respectfully disagree with most of those points but if there’s that much resistance to the idea I’ll drop it. Cheers, Ray On June 19, 2014 at 3:22:14 PM, Nick Burch (apa...@gagravarr.org) wrote: > On Thu, 19 Jun 2014, Ray Gauss wrote: > > The point of a tika-parsers-all

Re: Can some of tika-parsers module dependencies be made optional ?

2014-06-18 Thread Ray Gauss
AM, Nick Burch (apa...@gagravarr.org) wrote: > On Wed, 18 Jun 2014, Ray Gauss wrote: > > I think for 2.0 we should consider splitting out parsers into their own > > projects for a streamlined dependency hierarchy then reassembling them > > with something like a tika-parsers-all a

Re: Can some of tika-parsers module dependencies be made optional ?

2014-06-18 Thread Ray Gauss
I think for 2.0 we should consider splitting out parsers into their own projects for a streamlined dependency hierarchy then reassembling them with something like a tika-parsers-all artifact. On June 17, 2014 at 5:08:38 PM, Nick Burch (apa...@gagravarr.org) wrote: > On Tue, 17 Jun 2014, Sergey

[jira] [Commented] (TIKA-1328) Translate Metadata and Content

2014-06-10 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1328?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026783#comment-14026783 ] Ray Gauss II commented on TIKA-1328: Leaning towards the whitelist approach, per

[jira] [Commented] (TIKA-1319) Translation

2014-06-10 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1319?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14026703#comment-14026703 ] Ray Gauss II commented on TIKA-1319: [~gagravarr], that comment seems to be

[jira] [Commented] (TIKA-1320) extract text from jpeg in solr tika

2014-06-04 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1320?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14017613#comment-14017613 ] Ray Gauss II commented on TIKA-1320: I'm not sure we have enough conte

[jira] [Commented] (TIKA-1294) Add ability to turn off extraction of PDXObjectImages (TIKA-1268) from PDFs

2014-05-29 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14012393#comment-14012393 ] Ray Gauss II commented on TIKA-1294: Hi [~talli...@apache.org], The changes look

RE: [DISCUSS] Centralizing JSON handling of Metadata

2014-05-28 Thread Ray Gauss II
hat would be blckbelt), so I'm happy to go with either. > > A new compilation unit makes sense. I'm wondering if we want to be that > specific? tika-serialization? > Or, maybe just tika-utils? > > Package name looks good to me. > > Thanks, again! >

Re: [DISCUSS] Centralizing JSON handling of Metadata

2014-05-28 Thread Ray Gauss II
Hi Tim, 1) Sounds good to me. 2) I do think we want core as lean as possible, so my vote would be for a separate project/module, similar to what was done with tika-xmp.  Perhaps something like tika-serialization-json to indicate other formats may follow in the same precedence? 3) Similar to a

[jira] [Commented] (TIKA-1295) Make some Dublin Core items multi-valued

2014-05-15 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995945#comment-13995945 ] Ray Gauss II commented on TIKA-1295: +1 for the data model more accurately reflec

[jira] [Commented] (TIKA-1278) Expose PDF Avg Char and Spacing Tolerance Config Params

2014-05-15 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995298#comment-13995298 ] Ray Gauss II commented on TIKA-1278: Hi [~tallison], I thought about addin

[jira] [Commented] (TIKA-1295) Make some Dublin Core items multi-valued

2014-05-14 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1295?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997478#comment-13997478 ] Ray Gauss II commented on TIKA-1295: bq. I see that there is an ALT PropertyType.

[jira] [Commented] (TIKA-1294) Add ability to turn off extraction of PDXObjectImages (TIKA-1268) from PDFs

2014-05-14 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13997500#comment-13997500 ] Ray Gauss II commented on TIKA-1294: I saw similar problematic resource consumptio

[jira] [Commented] (TIKA-1294) Add ability to turn off extraction of PDXObjectImages (TIKA-1268) from PDFs

2014-05-14 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995474#comment-13995474 ] Ray Gauss II commented on TIKA-1294: We ran into this exact issue recently and t

[jira] [Commented] (TIKA-1294) Add ability to turn off extraction of PDXObjectImages (TIKA-1268) from PDFs

2014-05-13 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1294?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995960#comment-13995960 ] Ray Gauss II commented on TIKA-1294: bq. Can your MediaTypeDisablingDocumentSele

[jira] [Comment Edited] (TIKA-1278) Expose PDF Avg Char and Spacing Tolerance Config Params

2014-05-12 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13995298#comment-13995298 ] Ray Gauss II edited comment on TIKA-1278 at 5/12/14 5:3

[jira] [Reopened] (TIKA-1279) Missing return lines at output of SourceCodeParser

2014-04-24 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1279?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II reopened TIKA-1279: Assignee: Hong-Thai Nguyen [~thaichat04], I believe we still have to support Java 6 and

[jira] [Comment Edited] (TIKA-1278) Expose PDF Avg Char and Spacing Tolerance Config Params

2014-04-24 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13979700#comment-13979700 ] Ray Gauss II edited comment on TIKA-1278 at 4/24/14 1:3

[jira] [Resolved] (TIKA-1278) Expose PDF Avg Char and Spacing Tolerance Config Params

2014-04-24 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II resolved TIKA-1278. Resolution: Fixed Resolved in r1589722. > Expose PDF Avg Char and Spacing Tolerance Config Par

[jira] [Updated] (TIKA-1278) Expose PDF Avg Char and Spacing Tolerance Config Params

2014-04-24 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1278?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II updated TIKA-1278: --- Description: {{PDFParserConfig}} should allow for override of PDFBox's {{averageCharTolerance}

[jira] [Created] (TIKA-1278) Expose PDF Avg Char and Spacing Tolerance Config Params

2014-04-24 Thread Ray Gauss II (JIRA)
Ray Gauss II created TIKA-1278: -- Summary: Expose PDF Avg Char and Spacing Tolerance Config Params Key: TIKA-1278 URL: https://issues.apache.org/jira/browse/TIKA-1278 Project: Tika Issue Type

[jira] [Updated] (TIKA-1151) Maven Build Should Automatically Produce test-jar Artifacts

2014-03-24 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II updated TIKA-1151: --- Fix Version/s: 1.6 > Maven Build Should Automatically Produce test-jar Artifa

[jira] [Resolved] (TIKA-1151) Maven Build Should Automatically Produce test-jar Artifacts

2014-03-24 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II resolved TIKA-1151. Resolution: Fixed Resolved in r1580887. > Maven Build Should Automatically Produce test-

[jira] [Updated] (TIKA-1151) Maven Build Should Automatically Produce test-jar Artifacts

2014-02-20 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II updated TIKA-1151: --- Description: The Maven build should be updated to produce test jar artifacts for appropriate sub

[jira] [Commented] (TIKA-1151) Maven Build Should Automatically Produce test-jar Artifacts

2014-02-20 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13907100#comment-13907100 ] Ray Gauss II commented on TIKA-1151: This will create a few artifacts on the la

[jira] [Updated] (TIKA-1151) Maven Build Should Automatically Produce test-jar Artifacts

2014-02-20 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1151?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II updated TIKA-1151: --- Description: The Maven build should be updated to produce test jar artifacts for appropriate sub

Re: Extract thumbnail from openxml office files

2014-01-08 Thread Ray Gauss II
Hi Hong-Thai, It’s certainly worth investigating.  Several other formats can have embedded thumbnails as well so we could implement a generic thumbnail property. We could probably store as something like a Base64 encoded string, but we’d likely want to place limits on the size and may need a th
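
The suggestion in the preview above (store the thumbnail as a Base64 string, with a limit on size) might look roughly like the sketch below. The class `ThumbnailEncoder`, the method name, and the idea of returning an empty `Optional` for oversized input are assumptions for illustration only; the thread does not settle on a limit or a property name.

```java
import java.util.Base64;
import java.util.Optional;

// Hypothetical sketch: encode thumbnail bytes as Base64 only when they fit
// within a caller-supplied limit, so oversized images never enter metadata.
final class ThumbnailEncoder {
    private ThumbnailEncoder() {}

    /** Returns the Base64 text, or empty if the input is missing or too large. */
    static Optional<String> encode(byte[] thumbnail, int maxBytes) {
        if (thumbnail == null || thumbnail.length == 0 || thumbnail.length > maxBytes) {
            return Optional.empty();
        }
        return Optional.of(Base64.getEncoder().encodeToString(thumbnail));
    }

    public static void main(String[] args) {
        byte[] tiny = {1, 2, 3};
        System.out.println(encode(tiny, 16).orElse("(skipped)"));
        System.out.println(encode(new byte[32], 16).orElse("(skipped)"));
    }
}
```

Checking the raw byte length before encoding avoids building a large string just to discard it.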

[jira] [Resolved] (TIKA-1179) A corrupt mp3 file can cause an infinite loop in Mp3Parser

2013-10-04 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1179?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II resolved TIKA-1179. Resolution: Cannot Reproduce Assignee: Ray Gauss II I've just confirmed the desc

[jira] [Resolved] (TIKA-1177) Add Matroska (mkv, mka) format detection

2013-10-04 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II resolved TIKA-1177. Resolution: Fixed Fix Version/s: 1.5 Unfortunately that magic doesn't seem to be requir

[jira] [Assigned] (TIKA-1177) Add Matroska (mkv, mka) format detection

2013-10-04 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II reassigned TIKA-1177: -- Assignee: Ray Gauss II > Add Matroska (mkv, mka) format detect

[jira] [Commented] (TIKA-1170) Insufficiently specific magic for binary image/cgm files

2013-09-03 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13757000#comment-13757000 ] Ray Gauss II commented on TIKA-1170: Yes, but in this particular case I though

[jira] [Resolved] (TIKA-1170) Insufficiently specific magic for binary image/cgm files

2013-09-03 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II resolved TIKA-1170. Resolution: Fixed Resolved in r1519792. SVN did not like the html extension on the problem file

[jira] [Reopened] (TIKA-1170) Insufficiently specific magic for binary image/cgm files

2013-09-03 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II reopened TIKA-1170: > Insufficiently specific magic for binary image/cgm fi

[jira] [Commented] (TIKA-1170) Insufficiently specific magic for binary image/cgm files

2013-09-03 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=1375#comment-1375 ] Ray Gauss II commented on TIKA-1170: My mistake, that's an artifact of me

[jira] [Resolved] (TIKA-1170) Insufficiently specific magic for binary image/cgm files

2013-09-03 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II resolved TIKA-1170. Resolution: Fixed Fix Version/s: 1.5 Added in r1519664. Thanks

[jira] [Assigned] (TIKA-1170) Insufficiently specific magic for binary image/cgm files

2013-09-03 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1170?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II reassigned TIKA-1170: -- Assignee: Ray Gauss II > Insufficiently specific magic for binary image/cgm fi

[jira] [Resolved] (TIKA-1166) FLVParser NullPointerException

2013-08-28 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II resolved TIKA-1166. Resolution: Fixed Fix Version/s: 1.5 I briefly tried a few methods of trimming the problem

[jira] [Assigned] (TIKA-1166) FLVParser NullPointerException

2013-08-28 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II reassigned TIKA-1166: -- Assignee: Ray Gauss II > FLVParser NullPointerExcept

[jira] [Commented] (TIKA-1166) FLVParser NullPointerException

2013-08-22 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1166?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13747529#comment-13747529 ] Ray Gauss II commented on TIKA-1166: Thanks. Is there any chance you could get

[jira] [Commented] (TIKA-1154) Tika hangs on format detection of malformed HTML file.

2013-07-26 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1154?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13720694#comment-13720694 ] Ray Gauss II commented on TIKA-1154: I've been pushing the metadata-extrac

[jira] [Created] (TIKA-1151) Maven Build Should Automatically Produce test-jar Artifacts

2013-07-22 Thread Ray Gauss II (JIRA)
Ray Gauss II created TIKA-1151: -- Summary: Maven Build Should Automatically Produce test-jar Artifacts Key: TIKA-1151 URL: https://issues.apache.org/jira/browse/TIKA-1151 Project: Tika Issue

Re: Tika Core and Parsers Test Artifacts

2013-07-22 Thread Ray Gauss II
neral I'll create a JIRA issue where we can discuss the details. Regards, Ray On Jul 21, 2013, at 3:25 PM, Ken Krugler wrote: > Hi Ray, > > On Jul 18, 2013, at 6:37am, Ray Gauss II wrote: > >> Hi Ken, >> >> They recommend test-jar instead of classifier no

Re: Tika Core and Parsers Test Artifacts

2013-07-18 Thread Ray Gauss II
3, at 9:19 AM, Ken Krugler wrote: > Hi Ray, > > On Jul 18, 2013, at 5:14am, Ray Gauss II wrote: > >> I don't recall if we've discussed this already (I did do a brief search and >> didn't see anything). >> >> Is there any opposition to ad

Tika Core and Parsers Test Artifacts

2013-07-18 Thread Ray Gauss II
I don't recall if we've discussed this already (I did do a brief search and didn't see anything). Is there any opposition to adding test-jar Maven artifacts for tika-core and tika-parsers? Seems like it would be good to allow others to extend from tests there if need be.

[jira] [Resolved] (TIKA-1147) File-Based TikaInputStreams are Deleted by ExternalEmbedder.embed

2013-07-17 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II resolved TIKA-1147. Resolution: Fixed Fix Version/s: 1.5 Resolved in r1504302. > File-Ba

[jira] [Updated] (TIKA-1147) File-Based TikaInputStreams are Deleted by ExternalEmbedder.embed

2013-07-17 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1147?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II updated TIKA-1147: --- Component/s: metadata Description: When an application using Tika passes

[jira] [Created] (TIKA-1147) Passing a File-Based TikaInputStream to ExternalEmbedder Delete

2013-07-17 Thread Ray Gauss II (JIRA)
Ray Gauss II created TIKA-1147: -- Summary: Passing a File-Based TikaInputStream to ExternalEmbedder Delete Key: TIKA-1147 URL: https://issues.apache.org/jira/browse/TIKA-1147 Project: Tika

Re: RFC822Parser build error on gump

2013-06-28 Thread Ray Gauss II
I know very little about gump, but looking at the log the build seems to have skipped the mime4j artifacts altogether. On Jun 25, 2013, at 6:25 PM, Nick Burch wrote: > Hi All > > Anyone have any idea about this compiler error on the tika parsers project as > hit by gump? > http://vmgump.apac

[jira] [Commented] (TIKA-1130) .docx text extract leaves out some portions of text

2013-06-13 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13682924#comment-13682924 ] Ray Gauss II commented on TIKA-1130: Test file and method committed in r1492909.

[jira] [Commented] (TIKA-1130) .docx text extract leaves out some portions of text

2013-06-13 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1130?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13682644#comment-13682644 ] Ray Gauss II commented on TIKA-1130: I've created a unit test that repro

[jira] [Created] (TIKA-1135) Incorrect Cardinality and Case in IPTC Metadata Definition

2013-06-11 Thread Ray Gauss II (JIRA)
Ray Gauss II created TIKA-1135: -- Summary: Incorrect Cardinality and Case in IPTC Metadata Definition Key: TIKA-1135 URL: https://issues.apache.org/jira/browse/TIKA-1135 Project: Tika Issue Type

[jira] [Resolved] (TIKA-1135) Incorrect Cardinality and Case in IPTC Metadata Definition

2013-06-11 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1135?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II resolved TIKA-1135. Resolution: Fixed Resolved in r1491935. > Incorrect Cardinality and Case in I

[jira] [Resolved] (TIKA-1133) Ability to Allow Empty and Duplicate Tika Values for XML Elements

2013-06-10 Thread Ray Gauss II (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1133?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ray Gauss II resolved TIKA-1133. Resolution: Fixed Fix Version/s: 1.4 Resolved in r1491680. > Ability
