Re: Parsing order issue

2019-12-17 Thread Tim Allison
Tilman, That isn’t correct. I’ll find the link that might help... On Tue, Dec 17, 2019 at 1:02 PM Tilman Hausherr wrote: > I already answered... we need the PDF. > > But... about the config: > > image/jpeg > application/pdf

Re: Parsing order issue

2019-12-17 Thread Tilman Hausherr
I already answered... we need the PDF. But... about the config: image/jpeg application/pdf class="org.apache.tika.parser.executable.ExecutableParser"/> application/pdf Is this a correct setting for PDFs in tika? I notice
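The XML tags of the config quoted in this message did not survive in the snippet. Below is a minimal sketch of how such a tika-config.xml could be laid out, assuming it followed the parser-configuration example in Tika's documentation. The element names and the second parser class (`org.apache.tika.parser.EmptyParser`) are assumptions; only the two mime types and the `ExecutableParser` class name appear in the message.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Sketch only: element names are assumed from Tika's documented
     config layout, not recovered from the original message. -->
<properties>
  <parsers>
    <!-- Default parser for everything except the two excluded mime types;
         the ExecutableParser is never used. -->
    <parser class="org.apache.tika.parser.DefaultParser">
      <mime-exclude>image/jpeg</mime-exclude>
      <mime-exclude>application/pdf</mime-exclude>
      <parser-exclude class="org.apache.tika.parser.executable.ExecutableParser"/>
    </parser>
    <!-- Assumed: a different parser is assigned to PDFs. EmptyParser is a
         placeholder choice here and would return no content. -->
    <parser class="org.apache.tika.parser.EmptyParser">
      <mime>application/pdf</mime>
    </parser>
  </parsers>
</properties>
```

If the real config had this shape, PDFs would bypass the default PDF parser entirely, so it is worth checking which parser class is actually mapped to application/pdf.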

Re: [EXTERNAL] Do we have a community supported approach for deploying Tika Server in production?

2019-12-17 Thread Eric Pugh
Cool. It’s the auto run that I really need, and the other part that I don’t think I’ve tackled properly is the managing of logs… I’m going to check with my project to see if they support Snap packages. Eric > On Dec 16, 2019, at 5:10 PM, Tom Barber wrote: > > Just saw this fly by and

[jira] [Commented] (TIKA-3010) Tika needs service installation script

2019-12-17 Thread ASF GitHub Bot (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3010?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998423#comment-16998423 ] ASF GitHub Bot commented on TIKA-3010: -- epugh commented on issue #305: TIKA-3010 Install and run

[jira] [Commented] (TIKA-3012) NPE caused by multiple calls to start/end document in RFC822Parser

2019-12-17 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998294#comment-16998294 ] Hudson commented on TIKA-3012: -- SUCCESS: Integrated in Jenkins build tika-branch-1x #296 (See

[jira] [Commented] (TIKA-3012) NPE caused by multiple calls to start/end document in RFC822Parser

2019-12-17 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998219#comment-16998219 ] Hudson commented on TIKA-3012: -- FAILURE: Integrated in Jenkins build tika-branch-1x #295 (See

[jira] [Commented] (TIKA-3013) TSDParser should pass wrapped handler into handle attachments

2019-12-17 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998220#comment-16998220 ] Hudson commented on TIKA-3013: -- FAILURE: Integrated in Jenkins build tika-branch-1x #295 (See

[jira] [Commented] (TIKA-3015) TNEFParser fails with ToXMLContentHandler

2019-12-17 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998221#comment-16998221 ] Hudson commented on TIKA-3015: -- FAILURE: Integrated in Jenkins build tika-branch-1x #295 (See

[jira] [Commented] (TIKA-3016) Old Excel Parser fails with ToXMLHandler

2019-12-17 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998222#comment-16998222 ] Hudson commented on TIKA-3016: -- FAILURE: Integrated in Jenkins build tika-branch-1x #295 (See

[jira] [Commented] (TIKA-3011) Need to add release version for maven-compiler-plugin

2019-12-17 Thread Hudson (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998218#comment-16998218 ] Hudson commented on TIKA-3011: -- FAILURE: Integrated in Jenkins build tika-branch-1x #295 (See

[jira] [Commented] (TIKA-3007) Heic images are detected as "application/mp4" when using tika as server

2019-12-17 Thread Nick Burch (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998213#comment-16998213 ] Nick Burch commented on TIKA-3007: -- See

[jira] [Commented] (TIKA-3014) XLIFF12Parser fails with ToXMLHandler

2019-12-17 Thread Timo Boehme (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998204#comment-16998204 ] Timo Boehme commented on TIKA-3014: --- Only looking (online) at the code I see that by default the

Re: Parsing order issue

2019-12-17 Thread Maruan Sahyoun
Hi Tim, unfortunately the image didn't make it to the mailing list. What is the issue here? Is the extracted text not in the right order? Order of PDF parsing and visual order of text are not related. BR Maruan > PDFBox Colleagues, > Any recommendations? > > On Mon, Dec 16, 2019 at 7:05

Re: Parsing order issue

2019-12-17 Thread Tim Allison
PDFBox Colleagues, Any recommendations? On Mon, Dec 16, 2019 at 7:05 AM Lu Sun wrote: > Dear Tika Dev Team, > > Hope this email finds you well. > > I have been actively using Tika for pdf file reading. One issue I found is the parsing order. As shown in attached image, the parsing

[jira] [Commented] (TIKA-3014) XLIFF12Parser fails with ToXMLHandler

2019-12-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998142#comment-16998142 ] Tim Allison commented on TIKA-3014: --- Thank you, [~tboehme]! Any recommendations on how we can

[jira] [Updated] (TIKA-3014) XLIFF12Parser fails with ToXMLHandler

2019-12-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-3014: -- Description: XLIFF12Parser fails with ToXMLHandler because xml namespace isn't set, but is needed for

[jira] [Resolved] (TIKA-2224) OneNote formats support - Mime Magic and Parser

2019-12-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-2224?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2224. --- Fix Version/s: 1.24 Assignee: Tim Allison Resolution: Fixed Many, many thanks to

[jira] [Resolved] (TIKA-3012) NPE caused by multiple calls to start/end document in RFC822Parser

2019-12-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3012?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3012. --- Fix Version/s: 1.24 Assignee: Tim Allison Resolution: Fixed > NPE caused by multiple

[jira] [Resolved] (TIKA-3011) Need to add release version for maven-compiler-plugin

2019-12-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3011?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3011. --- Fix Version/s: 1.24 Assignee: Tim Allison Resolution: Fixed > Need to add release

[jira] [Resolved] (TIKA-3015) TNEFParser fails with ToXMLContentHandler

2019-12-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3015. --- Fix Version/s: 1.24 Resolution: Fixed > TNEFParser fails with ToXMLContentHandler >

[jira] [Resolved] (TIKA-3013) TSDParser should pass wrapped handler into handle attachments

2019-12-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3013?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3013. --- Fix Version/s: 1.24 Assignee: Tim Allison Resolution: Fixed > TSDParser should pass

[jira] [Resolved] (TIKA-3016) Old Excel Parser fails with ToXMLHandler

2019-12-17 Thread Tim Allison (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-3016. --- Fix Version/s: 1.24 Resolution: Fixed > Old Excel Parser fails with ToXMLHandler >

[jira] [Commented] (TIKA-3007) Heic images are detected as "application/mp4" when using tika as server

2019-12-17 Thread Johan (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998119#comment-16998119 ] Johan commented on TIKA-3007: - Hi, Ok i understand it a bit better now but don't you still think that -j

[jira] [Commented] (TIKA-3007) Heic images are detected as "application/mp4" when using tika as server

2019-12-17 Thread Nick Burch (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998112#comment-16998112 ] Nick Burch commented on TIKA-3007: -- There is currently no Parser for HEIC files, only mime detection. A

[jira] [Commented] (TIKA-3014) XLIFF12Parser fails with ToXMLHandler

2019-12-17 Thread Timo Boehme (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3014?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998084#comment-16998084 ] Timo Boehme commented on TIKA-3014: --- I'm sorry if I misunderstand the problem: any XML namespace aware

[jira] [Commented] (TIKA-3007) Heic images are detected as "application/mp4" when using tika as server

2019-12-17 Thread Johan (Jira)
[ https://issues.apache.org/jira/browse/TIKA-3007?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16998072#comment-16998072 ] Johan commented on TIKA-3007: - Hi, Ok we see that indeed your call above is working but then we have some