FW: ApacheCon Miami is coming in May.

2016-11-30 Thread Allison, Timothy B.
> ApacheCon and Apache Big Data will be held at the Intercontinental in Miami, Florida, May 16-18, 2017 I plan to attend. Who's in? Any idea if there will be another "content" track like we had in Austin? Cheers, Tim -Original Message- From: Rich Bowen

Re: FW: ApacheCon Miami is coming in May.

2016-11-30 Thread Nick Burch
On Wed, 30 Nov 2016, Allison, Timothy B. wrote: ApacheCon and Apache Big Data will be held at the Intercontinental in Miami, Florida, May 16-18, 2017 I plan to attend. Who's in? Any idea if there will be another "content" track like we had in Austin? If we want a Content track, then we'd

[jira] [Commented] (TIKA-2036) Deleted Text from Word File Shows Up in Extract

2016-11-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15709719#comment-15709719 ] Tim Allison commented on TIKA-2036: --- On TIKA-1321, I added a new experimental SAXParser that processes

[jira] [Updated] (TIKA-2187) Align default behavior of experimental docx parser with that of doc parser in handling delText

2016-11-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2187: -- Summary: Align default behavior of experimental docx parser with that of doc parser in handling delText

[jira] [Updated] (TIKA-2187) Align default behavior of experimental docx parser with that of doc parser in handling delText

2016-11-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-2187: -- Description: Now that we can ignore delText via the experimental alternate SAXParser for .docx files,

[jira] [Created] (TIKA-2187) Change default behavior for handling deleted content in .docx files

2016-11-30 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2187: - Summary: Change default behavior for handling deleted content in .docx files Key: TIKA-2187 URL: https://issues.apache.org/jira/browse/TIKA-2187 Project: Tika

Re: FW: ApacheCon Miami is coming in May.

2016-11-30 Thread Tom Barber
Nick, who *wouldn't* want to see your talks?! On Wed, Nov 30, 2016 at 8:13 PM, Nick Burch wrote: > On Wed, 30 Nov 2016, Allison, Timothy B. wrote: >> ApacheCon and Apache Big Data will be held at the Intercontinental in Miami, Florida, May 16-18, 2017 >> I

[jira] [Resolved] (TIKA-1321) Add experimental SAX/Streaming XWPF/docx extractor

2016-11-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-1321. --- Resolution: Fixed Fix Version/s: 1.15 2.0 Initial parser is added. We may

[jira] [Resolved] (TIKA-2187) Align default behavior of experimental docx parser with that of doc parser in handling delText

2016-11-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2187. --- Resolution: Fixed Fix Version/s: 1.15 2.0 > Align default behavior of

[jira] [Commented] (TIKA-207) MS word doc containing tracked changes produces incorrect text

2016-11-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-207?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15709738#comment-15709738 ] Tim Allison commented on TIKA-207: -- Only took 5 years, but this is now accomplished in docx. :) > MS word

[jira] [Commented] (TIKA-2036) Deleted Text from Word File Shows Up in Extract

2016-11-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15709722#comment-15709722 ] Tim Allison commented on TIKA-2036: --- Is this still true? It looks like from the following: {noformat}

[jira] [Commented] (TIKA-2187) Align default behavior of experimental docx parser with that of doc parser in handling delText

2016-11-30 Thread Luis Filipe Nassif (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710370#comment-15710370 ] Luis Filipe Nassif commented on TIKA-2187: -- Thank you [~talli...@apache.org] for making it

[jira] [Commented] (TIKA-2187) Align default behavior of experimental docx parser with that of doc parser in handling delText

2016-11-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710372#comment-15710372 ] Hudson commented on TIKA-2187: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1148 (See

[jira] [Resolved] (TIKA-2090) Extract javascript from PDActions in PDFs

2016-11-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison resolved TIKA-2090. --- Resolution: Fixed Fix Version/s: 1.15 2.0 Fixed for extraction from common

Re: FW: ApacheCon Miami is coming in May.

2016-11-30 Thread Bob Paulin
I bet the Sling and Jackrabbit/Oak projects would be interested in such a track. Those projects have pretty strong corporate backing which could help with sponsorship. Would it make sense to cross post this to them to get the ball rolling? - Bob On 11/30/2016 2:15 PM, Tom Barber wrote: >

[jira] [Commented] (TIKA-2187) Align default behavior of experimental docx parser with that of doc parser in handling delText

2016-11-30 Thread Tim Allison (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710381#comment-15710381 ] Tim Allison commented on TIKA-2187: --- Note that I also added extraction of deleted text back to .doc, also

[jira] [Commented] (TIKA-2187) Align default behavior of experimental docx parser with that of doc parser in handling delText

2016-11-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710454#comment-15710454 ] Hudson commented on TIKA-2187: -- FAILURE: Integrated in Jenkins build tika-2.x-windows #81 (See

tika-2.x-windows - Build # 81 - Still Failing

2016-11-30 Thread Apache Jenkins Server
The Apache Jenkins build system has built tika-2.x-windows (build #81) Status: Still Failing Check console output at https://builds.apache.org/job/tika-2.x-windows/81/ to view the results.

[jira] [Commented] (TIKA-2090) Extract javascript from PDActions in PDFs

2016-11-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710455#comment-15710455 ] Hudson commented on TIKA-2090: -- FAILURE: Integrated in Jenkins build tika-2.x-windows #81 (See

[jira] [Commented] (TIKA-2187) Align default behavior of experimental docx parser with that of doc parser in handling delText

2016-11-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710097#comment-15710097 ] Hudson commented on TIKA-2187: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1147 (See

[jira] [Commented] (TIKA-1321) Add experimental SAX/Streaming XWPF/docx extractor

2016-11-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-1321?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710096#comment-15710096 ] Hudson commented on TIKA-1321: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1147 (See

[jira] [Commented] (TIKA-2090) Extract javascript from PDActions in PDFs

2016-11-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710525#comment-15710525 ] Hudson commented on TIKA-2090: -- SUCCESS: Integrated in Jenkins build Tika-trunk #1149 (See

[jira] [Commented] (TIKA-2187) Align default behavior of experimental docx parser with that of doc parser in handling delText

2016-11-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2187?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710536#comment-15710536 ] Hudson commented on TIKA-2187: -- SUCCESS: Integrated in Jenkins build tika-2.x #180 (See

[jira] [Commented] (TIKA-2090) Extract javascript from PDActions in PDFs

2016-11-30 Thread Hudson (JIRA)
[ https://issues.apache.org/jira/browse/TIKA-2090?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15710537#comment-15710537 ] Hudson commented on TIKA-2090: -- SUCCESS: Integrated in Jenkins build tika-2.x #180 (See