[jira] [Commented] (TIKA-2799) Consider reverting jackcess

2018-12-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721919#comment-16721919
 ] 

Hudson commented on TIKA-2799:
--

SUCCESS: Integrated in Jenkins build tika-branch-1x #145 (See 
[https://builds.apache.org/job/tika-branch-1x/145/])
TIKA-2799 - revert jackcess based on regression results (tallison: 
[https://github.com/apache/tika/commit/1a1f9809b2464c2814d251e2bf4c82c61e59d1e7])
* (edit) 
tika-parsers/src/test/java/org/apache/tika/parser/microsoft/JackcessParserTest.java
* (edit) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/JackcessOleUtil.java
* (edit) tika-parsers/pom.xml


> Consider reverting jackcess
> ---
>
> Key: TIKA-2799
> URL: https://issues.apache.org/jira/browse/TIKA-2799
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
> Fix For: 2.0.0, 1.20
>
>
> It looks like there were some very slight regressions in Jackcess 2.2.0 -- 
> we're able to get slightly less text out of files that threw exceptions, and 
> there's one new exception on a file that was parsed without problem earlier.
> https://sourceforge.net/p/jackcess/bugs/150/
> https://sourceforge.net/p/jackcess/bugs/149/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (TIKA-2799) Consider reverting jackcess

2018-12-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721903#comment-16721903
 ] 

Hudson commented on TIKA-2799:
--

SUCCESS: Integrated in Jenkins build Tika-trunk #1611 (See 
[https://builds.apache.org/job/Tika-trunk/1611/])
TIKA-2799 - revert jackcess based on regression results (tallison: 
[https://github.com/apache/tika/commit/27454e364525a67deee0673cf91dbb2f090b0b41])
* (edit) 
tika-parsers/src/test/java/org/apache/tika/parser/microsoft/JackcessParserTest.java
* (edit) tika-parsers/pom.xml
* (edit) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/JackcessOleUtil.java


> Consider reverting jackcess
> ---
>
> Key: TIKA-2799
> URL: https://issues.apache.org/jira/browse/TIKA-2799
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
> Fix For: 2.0.0, 1.20
>
>
> It looks like there were some very slight regressions in Jackcess 2.2.0 -- 
> we're able to get slightly less text out of files that threw exceptions, and 
> there's one new exception on a file that was parsed without problem earlier.
> https://sourceforge.net/p/jackcess/bugs/150/
> https://sourceforge.net/p/jackcess/bugs/149/





[jira] [Commented] (TIKA-2800) Include num of unique common/alphabetic tokens (types) in tika-eval

2018-12-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721902#comment-16721902
 ] 

Hudson commented on TIKA-2800:
--

SUCCESS: Integrated in Jenkins build Tika-trunk #1611 (See 
[https://builds.apache.org/job/Tika-trunk/1611/])
TIKA-2800 -- add num unique alphabetic tokens and num unique common (tallison: 
[https://github.com/apache/tika/commit/c7f292b5abb08096f6f4870326a16929cb326a33])
* (edit) tika-eval/src/main/resources/comparison-reports.xml
* (edit) tika-eval/src/test/java/org/apache/tika/eval/SimpleComparerTest.java
* (edit) 
tika-eval/src/main/java/org/apache/tika/eval/tokens/CommonTokenResult.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/ExtractProfiler.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/db/Cols.java
* (edit) 
tika-eval/src/main/java/org/apache/tika/eval/tokens/CommonTokenCountManager.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/AbstractProfiler.java


> Include num of unique common/alphabetic tokens (types) in tika-eval
> ---
>
> Key: TIKA-2800
> URL: https://issues.apache.org/jira/browse/TIKA-2800
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
> Fix For: 2.0.0, 1.20
>
>
> We include token and unique token (type) counts in tika-eval.  We should 
> include type counts for alphabetic and common words.  If one tool is 
> incorrectly duplicating/triplicating content dramatically, that would 
> incorrectly inflate the "common_tokens" sum for that tool. 





[jira] [Commented] (TIKA-2799) Consider reverting jackcess

2018-12-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721884#comment-16721884
 ] 

Hudson commented on TIKA-2799:
--

UNSTABLE: Integrated in Jenkins build tika-2.x-windows #366 (See 
[https://builds.apache.org/job/tika-2.x-windows/366/])
TIKA-2799 - revert jackcess based on regression results (tallison: rev 
27454e364525a67deee0673cf91dbb2f090b0b41)
* (edit) tika-parsers/pom.xml
* (edit) 
tika-parsers/src/main/java/org/apache/tika/parser/microsoft/JackcessOleUtil.java
* (edit) 
tika-parsers/src/test/java/org/apache/tika/parser/microsoft/JackcessParserTest.java


> Consider reverting jackcess
> ---
>
> Key: TIKA-2799
> URL: https://issues.apache.org/jira/browse/TIKA-2799
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
> Fix For: 2.0.0, 1.20
>
>
> It looks like there were some very slight regressions in Jackcess 2.2.0 -- 
> we're able to get slightly less text out of files that threw exceptions, and 
> there's one new exception on a file that was parsed without problem earlier.
> https://sourceforge.net/p/jackcess/bugs/150/
> https://sourceforge.net/p/jackcess/bugs/149/





[jira] [Commented] (TIKA-2800) Include num of unique common/alphabetic tokens (types) in tika-eval

2018-12-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721861#comment-16721861
 ] 

Hudson commented on TIKA-2800:
--

SUCCESS: Integrated in Jenkins build tika-branch-1x #144 (See 
[https://builds.apache.org/job/tika-branch-1x/144/])
TIKA-2800 -- add num unique alphabetic tokens and num unique common (tallison: 
[https://github.com/apache/tika/commit/b2680df17bda7d112d41dcab57d474767fd4212e])
* (edit) tika-eval/src/main/java/org/apache/tika/eval/ExtractProfiler.java
* (edit) 
tika-eval/src/main/java/org/apache/tika/eval/tokens/CommonTokenResult.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/db/Cols.java
* (edit) tika-eval/src/test/java/org/apache/tika/eval/SimpleComparerTest.java
* (edit) 
tika-eval/src/main/java/org/apache/tika/eval/tokens/CommonTokenCountManager.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/AbstractProfiler.java
* (edit) tika-eval/src/main/resources/comparison-reports.xml


> Include num of unique common/alphabetic tokens (types) in tika-eval
> ---
>
> Key: TIKA-2800
> URL: https://issues.apache.org/jira/browse/TIKA-2800
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
> Fix For: 2.0.0, 1.20
>
>
> We include token and unique token (type) counts in tika-eval.  We should 
> include type counts for alphabetic and common words.  If one tool is 
> incorrectly duplicating/triplicating content dramatically, that would 
> incorrectly inflate the "common_tokens" sum for that tool. 





[jira] [Commented] (TIKA-2800) Include num of unique common/alphabetic tokens (types) in tika-eval

2018-12-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721843#comment-16721843
 ] 

Hudson commented on TIKA-2800:
--

UNSTABLE: Integrated in Jenkins build tika-2.x-windows #365 (See 
[https://builds.apache.org/job/tika-2.x-windows/365/])
TIKA-2800 -- add num unique alphabetic tokens and num unique common (tallison: 
rev c7f292b5abb08096f6f4870326a16929cb326a33)
* (edit) 
tika-eval/src/main/java/org/apache/tika/eval/tokens/CommonTokenCountManager.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/AbstractProfiler.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/ExtractProfiler.java
* (edit) tika-eval/src/main/resources/comparison-reports.xml
* (edit) tika-eval/src/main/java/org/apache/tika/eval/db/Cols.java
* (edit) 
tika-eval/src/main/java/org/apache/tika/eval/tokens/CommonTokenResult.java
* (edit) tika-eval/src/test/java/org/apache/tika/eval/SimpleComparerTest.java


> Include num of unique common/alphabetic tokens (types) in tika-eval
> ---
>
> Key: TIKA-2800
> URL: https://issues.apache.org/jira/browse/TIKA-2800
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
> Fix For: 2.0.0, 1.20
>
>
> We include token and unique token (type) counts in tika-eval.  We should 
> include type counts for alphabetic and common words.  If one tool is 
> incorrectly duplicating/triplicating content dramatically, that would 
> incorrectly inflate the "common_tokens" sum for that tool. 





Re: 1.20?

2018-12-14 Thread Tim Allison
Thank you, again, Luís Filipe Nassif!  There's no point in having
reports unless we pay attention to them :P.  I reverted junrar to
where it was in 1.19.1. I also reverted jackcess based on the reports.

All,
  On the theory that it isn't a great idea to push to production on a
Friday, I'm going to let the recent changes rest over the weekend.
I'll rerun some tests on a subset of the regression corpus on Monday
and then roll rc1.  If anyone wants to kick the tires on the recent
version changes, including parsers that depend on the upgraded guava,
that'd be great!

Onward!

Cheers,

   Tim

On Thu, Dec 13, 2018 at 5:34 PM Tim Allison  wrote:
>
> Let me actually take a look before answering. Sorry!
>
> On Thu, Dec 13, 2018 at 5:30 PM Tim Allison  wrote:
>>
>>  Thank you for reading the reports!!!
>>
>> The files are very likely broken.  I can take a look.  The change was
>> probably because of an "upgrade" to junrar.  Should I revert to the
>> version we used in 1.19.1?
>> On Thu, Dec 13, 2018 at 1:34 PM Luís Filipe Nassif  
>> wrote:
>> >
>> > Hi Tim,
>> >
>> > Reading your great reports, I also saw some new exceptions with RAR files
>> > in the likely-broken folder, but it seems Tika was able to extract some
>> > text from them before. Do you know if those files are really broken and
>> > why Tika extracted text from them before?
>> >
>> > Thank you,
>> > Luis
>> >
>> > On Thu, Dec 13, 2018 at 13:02, Tim Allison wrote:
>> >
>> > > Reports are here:
>> > >
>> > > http://162.242.228.174/reports/tika_1_20-pre-rc1.zip
>> > >
>> > > I'm going to revert the mp4 parser, and commit the few dependency
>> > > upgrades I ran.
>> > >
>> > > The _major_ difference in content for ppt is explained by the
>> > > duplication of header/footer info.  To confirm this, note that the
>> > > values for "num_unique_tokens_a" and "num_unique_tokens_b" are
>> > > identical for nearly all ppt->ppt, but there are far more tokens in
>> > > "num_tokens_a" vs "num_tokens_b".
>> > >
>> > > I also see that we're losing content in x-java and x-groovy, etc., but
>> > > that's because we're now suppressing the style markup that our parser
>> > > was (incorrectly, IMHO, inserting) -- check the values in
>> > > "top_10_unique_token_diffs_a", e.g.: rgb: 15 | color: 14 | font: 9 |
>> > > 0,0,0: 4 | background: 4 | 147,147,147: 3 | 247,247,247: 3 | bold: 3 |
>> > > weight: 3 | family: 2
>> > >
>> > > In short, I think we're good to go.  Will roll rc1 later today or
>> > > (more likely) tomorrow unless there are objections.
>> > > On Mon, Dec 10, 2018 at 9:37 PM Tim Allison  wrote:
>> > > >
>> > > > Any blockers on 1.20?  I'm going to kick off the regression tests
>> > > > shortly.
>> > > > On Fri, Nov 30, 2018 at 7:39 PM  wrote:
>> > > > >
>> > > > > Hi,
>> > > > > On Wed, 21 Nov 2018 at 13:00, Tim Allison  
>> > > > > wrote:
>> > > > >
>> > > > > > Dave,
>> > > > > >   Should I try to get the Docker plugin working again?
>> > > > > >
>> > > > >
>> > > > > That would be great. I think I may have gone down the wrong path
>> > > > > building an image at package time, as there doesn't seem to be an
>> > > > > easy way to publish it under an Apache-labelled org on Docker Hub
>> > > > > unless it builds from source.
>> > > > >
>> > > > > I have some time over the weekend, so I could update it to where I
>> > > > > got to and see what you think.
>> > > > >
>> > > > > Cheers,
>> > > > > Dave
>> > >
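The report discussion above leans on two checks: identical `num_unique_tokens` with very different `num_tokens` as a sign of duplicated content (the ppt header/footer case), and `top_10_unique_token_diffs_a` to see which tokens disappeared (the x-java/x-groovy style markup). Below is a minimal Python sketch of both, under stated assumptions: the regex tokenizer, the hypothetical `looks_duplicated` helper and its threshold, and the reading of "unique token diffs" as tokens that occur in A but never in B are illustrative choices, not tika-eval's actual analyzer or report SQL.

```python
import re
from collections import Counter

# Assumed tokenizer: lowercase runs of letters/digits, keeping comma- or
# dot-joined numbers such as "0,0,0" together. tika-eval's real analyzer
# chain differs, so absolute counts will not match its reports.
TOKEN_RE = re.compile(r"[^\W_]+(?:[.,][^\W_]+)*")


def tokenize(text):
    return TOKEN_RE.findall(text.lower())


def profile(text):
    """Token count and type (unique token) count for one extract."""
    tokens = tokenize(text)
    return {"num_tokens": len(tokens), "num_unique_tokens": len(set(tokens))}


def looks_duplicated(stats_a, stats_b, min_ratio=1.5):
    """Hypothetical heuristic: same vocabulary, but one side repeats it.

    Flags the ppt pattern: identical num_unique_tokens while num_tokens
    differs by at least min_ratio.
    """
    if stats_a["num_unique_tokens"] != stats_b["num_unique_tokens"]:
        return False
    low, high = sorted((stats_a["num_tokens"], stats_b["num_tokens"]))
    return low > 0 and high / low >= min_ratio


def top_unique_token_diffs(text_a, text_b, n=10):
    """Tokens that occur in A but never in B, ranked by their count in A."""
    counts_a = Counter(tokenize(text_a))
    vocab_b = set(tokenize(text_b))
    only_in_a = Counter({t: c for t, c in counts_a.items() if t not in vocab_b})
    return only_in_a.most_common(n)


def format_diffs(pairs):
    """Render pairs in the 'token: count | ...' style quoted in the report."""
    return " | ".join(f"{token}: {count}" for token, count in pairs)


if __name__ == "__main__":
    slide = "Quarterly results revenue grew ACME footer"
    repeated = slide + " ACME footer" * 3  # footer emitted three extra times
    print(looks_duplicated(profile(repeated), profile(slide)))

    styled = "color: rgb(0,0,0); color: rgb(0,0,0); font-weight: bold; int x = 1;"
    plain = "int x = 1;"
    print(format_diffs(top_unique_token_diffs(styled, plain)))
```

Requiring exactly equal type counts is deliberately strict; a real check would probably allow a small tolerance, since header/footer duplication can still add or drop a few types.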


[jira] [Resolved] (TIKA-2791) Add structure tags to tika-eval

2018-12-14 Thread Tim Allison (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-2791.
---
   Resolution: Fixed
 Assignee: Tim Allison
Fix Version/s: 1.20
   2.0.0

I still have to update the reports SQL to include this info, but this is a good start.

> Add structure tags to tika-eval
> ---
>
> Key: TIKA-2791
> URL: https://issues.apache.org/jira/browse/TIKA-2791
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
> Fix For: 2.0.0, 1.20
>
>
> It would be useful to be able to compare counts of common structure tags in 
> tika-eval.  We could also detect and flag bad structure tags that we may be 
> generating, e.g.: 
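A rough sketch of what counting structure tags and flagging bad ones could look like, using Python's standard `html.parser` on an XHTML extract. The tag list, the hypothetical `profile_tags` function, and the three kinds of problems it reports (out-of-order close, stray close, never closed) are assumptions for illustration; this is not how tika-eval's `ContentTagParser` actually works.

```python
from collections import Counter
from html.parser import HTMLParser

# Hypothetical choice of "structure" tags worth counting.
STRUCTURE_TAGS = {"title", "div", "p", "h1", "h2", "h3", "h4", "h5", "h6",
                  "ul", "ol", "li", "table", "tr", "th", "td", "a", "img"}
# Elements that never take an end tag in (X)HTML.
VOID_TAGS = {"br", "hr", "img", "meta", "link", "input"}


class TagProfiler(HTMLParser):
    """Counts structure tags and records end tags that do not nest properly."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.counts = Counter()
        self.stack = []      # currently open, non-void elements
        self.problems = []

    def handle_starttag(self, tag, attrs):
        if tag in STRUCTURE_TAGS:
            self.counts[tag] += 1
        if tag not in VOID_TAGS:
            self.stack.append(tag)

    def handle_startendtag(self, tag, attrs):
        # Self-closing form such as <img/>: count it, but nothing to close.
        if tag in STRUCTURE_TAGS:
            self.counts[tag] += 1

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.stack and self.stack[-1] == tag:
            self.stack.pop()
        elif tag in self.stack:
            # An outer element closes while inner ones are still open.
            self.problems.append(f"</{tag}> closed out of order")
            while self.stack.pop() != tag:
                pass
        else:
            self.problems.append(f"</{tag}> without matching start tag")


def profile_tags(xhtml):
    """Return (tag counts, list of problems) for one XHTML extract."""
    parser = TagProfiler()
    parser.feed(xhtml)
    parser.close()
    for tag in parser.stack:
        parser.problems.append(f"<{tag}> never closed")
    return parser.counts, parser.problems


if __name__ == "__main__":
    print(profile_tags("<table><tr><td>x</tr></table><p>done</p>"))
```

Running this over the A and B extracts of the same file and subtracting the two `Counter`s would give per-tag count differences to compare across tools.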





[jira] [Resolved] (TIKA-2799) Consider reverting jackcess

2018-12-14 Thread Tim Allison (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2799?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-2799.
---
   Resolution: Fixed
 Assignee: Tim Allison
Fix Version/s: 1.20
   2.0.0

> Consider reverting jackcess
> ---
>
> Key: TIKA-2799
> URL: https://issues.apache.org/jira/browse/TIKA-2799
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
> Fix For: 2.0.0, 1.20
>
>
> It looks like there were some very slight regressions in Jackcess 2.2.0 -- 
> we're able to get slightly less text out of files that threw exceptions, and 
> there's one new exception on a file that was parsed without problem earlier.
> https://sourceforge.net/p/jackcess/bugs/150/
> https://sourceforge.net/p/jackcess/bugs/149/





[jira] [Resolved] (TIKA-2798) Consider reverting junrar

2018-12-14 Thread Tim Allison (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-2798.
---
   Resolution: Fixed
Fix Version/s: 1.20
   2.0.0

Thank you [~lfcnassif]!  You may be the 4th or 5th person in the world to make 
sense of tika-eval's reports. :D

> Consider reverting junrar
> -
>
> Key: TIKA-2798
> URL: https://issues.apache.org/jira/browse/TIKA-2798
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Major
> Fix For: 2.0.0, 1.20
>
> Attachments: AQRJRPYMH3PNNK2HLOOKKR4B3QOVWOUH_1_19_1.rar.json, 
> AQRJRPYMH3PNNK2HLOOKKR4B3QOVWOUH_1_20.rar.json, 
> attachment_diffs_no_exceptions.xlsx, attachment_diffs_with_exceptions.xlsx
>
>
> [~lfcnassif] noticed junrar regressions in the pre-1.20-rc1 regression 
> reports.  Let's figure out what's going on and consider reverting the 
> "upgrade".





[jira] [Resolved] (TIKA-2800) Include num of unique common/alphabetic tokens (types) in tika-eval

2018-12-14 Thread Tim Allison (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2800?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison resolved TIKA-2800.
---
   Resolution: Fixed
 Assignee: Tim Allison
Fix Version/s: 1.20
   2.0.0

> Include num of unique common/alphabetic tokens (types) in tika-eval
> ---
>
> Key: TIKA-2800
> URL: https://issues.apache.org/jira/browse/TIKA-2800
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
> Fix For: 2.0.0, 1.20
>
>
> We include token and unique token (type) counts in tika-eval.  We should 
> include type counts for alphabetic and common words.  If one tool is 
> incorrectly duplicating/triplicating content dramatically, that would 
> incorrectly inflate the "common_tokens" sum for that tool. 





[jira] [Created] (TIKA-2800) Include num of unique common/alphabetic tokens (types) in tika-eval

2018-12-14 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2800:
-

 Summary: Include num of unique common/alphabetic tokens (types) in 
tika-eval
 Key: TIKA-2800
 URL: https://issues.apache.org/jira/browse/TIKA-2800
 Project: Tika
  Issue Type: Improvement
Reporter: Tim Allison


We include token and unique token (type) counts in tika-eval.  We should 
include type counts for alphabetic and common words.  If one tool is 
incorrectly duplicating/triplicating content dramatically, that would 
incorrectly inflate the "common_tokens" sum for that tool. 
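To illustrate why type counts help here, a minimal Python sketch that computes token and type counts overall, for alphabetic tokens, and for common tokens. The small `COMMON_WORDS` set is a hypothetical stand-in for tika-eval's own common-words lists, `str.isalpha()` is an assumed definition of "alphabetic", and the dictionary keys only mirror the column names mentioned in this issue. Tripling the text triples every token count but leaves every type count unchanged.

```python
import re

# Hypothetical stand-in for tika-eval's common-words list for one language.
COMMON_WORDS = {"the", "of", "and", "to", "in", "is", "that", "it"}
TOKEN_RE = re.compile(r"[^\W_]+")  # assumed tokenizer: runs of letters/digits


def token_stats(text, common_words=COMMON_WORDS):
    """Token counts and type (unique token) counts, overall and by class."""
    tokens = TOKEN_RE.findall(text.lower())
    alphabetic = [t for t in tokens if t.isalpha()]  # assumed definition
    common = [t for t in tokens if t in common_words]
    return {
        "num_tokens": len(tokens),
        "num_unique_tokens": len(set(tokens)),
        "num_alphabetic_tokens": len(alphabetic),
        "num_unique_alphabetic_tokens": len(set(alphabetic)),
        "num_common_tokens": len(common),
        "num_unique_common_tokens": len(set(common)),
    }


if __name__ == "__main__":
    text = "The cat sat in the hat in 2018."
    print(token_stats(text))
    # A tool that emits the same content three times:
    print(token_stats(" ".join([text] * 3)))
```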





[jira] [Commented] (TIKA-2775) Bulk upgrade plugins and dependencies

2018-12-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721758#comment-16721758
 ] 

Hudson commented on TIKA-2775:
--

UNSTABLE: Integrated in Jenkins build Tika-trunk #1610 (See 
[https://builds.apache.org/job/Tika-trunk/1610/])
TIKA-2775 -- upgrade guava (tallison: 
[https://github.com/apache/tika/commit/7ffd19492909b4dbfddcc96e3921e078c537faef])
* (edit) tika-nlp/pom.xml
* (edit) tika-bundle/pom.xml
* (edit) tika-dl/pom.xml
* (edit) tika-langdetect/pom.xml
* (edit) tika-parsers/pom.xml
TIKA-2775 -- more updates (these were made locally before the first (tallison: 
[https://github.com/apache/tika/commit/dbd3ea1f8cba5528816cd7e72e4c07d4ec5ae8f4])
* (edit) tika-parsers/pom.xml
* (edit) tika-parent/pom.xml


> Bulk upgrade plugins and dependencies
> -
>
> Key: TIKA-2775
> URL: https://issues.apache.org/jira/browse/TIKA-2775
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
>
> Thanks to TIKA-2757...there are some areas for upgrading. :D





[jira] [Commented] (TIKA-2775) Bulk upgrade plugins and dependencies

2018-12-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721761#comment-16721761
 ] 

Hudson commented on TIKA-2775:
--

UNSTABLE: Integrated in Jenkins build tika-branch-1x #143 (See 
[https://builds.apache.org/job/tika-branch-1x/143/])
TIKA-2775 -- more updates (these were made locally before the first (tallison: 
[https://github.com/apache/tika/commit/6a6c82a4a8e723cfc2873ce10ed9ea8a7454632b])
* (edit) tika-parsers/pom.xml
* (edit) tika-parent/pom.xml


> Bulk upgrade plugins and dependencies
> -
>
> Key: TIKA-2775
> URL: https://issues.apache.org/jira/browse/TIKA-2775
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
>
> Thanks to TIKA-2757...there are some areas for upgrading. :D





[jira] [Commented] (TIKA-2798) Consider reverting junrar

2018-12-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721762#comment-16721762
 ] 

Hudson commented on TIKA-2798:
--

UNSTABLE: Integrated in Jenkins build tika-branch-1x #143 (See 
[https://builds.apache.org/job/tika-branch-1x/143/])
TIKA-2798 -- revert junrar (tallison: 
[https://github.com/apache/tika/commit/ad3961063f45a03cbec10d4b5f075111aaa5f56f])
* (edit) tika-parsers/pom.xml


> Consider reverting junrar
> -
>
> Key: TIKA-2798
> URL: https://issues.apache.org/jira/browse/TIKA-2798
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Major
> Attachments: AQRJRPYMH3PNNK2HLOOKKR4B3QOVWOUH_1_19_1.rar.json, 
> AQRJRPYMH3PNNK2HLOOKKR4B3QOVWOUH_1_20.rar.json, 
> attachment_diffs_no_exceptions.xlsx, attachment_diffs_with_exceptions.xlsx
>
>
> [~lfcnassif] noticed junrar regressions in the pre-1.20-rc1 regression 
> reports.  Let's figure out what's going on and consider reverting the 
> "upgrade".





[jira] [Commented] (TIKA-2798) Consider reverting junrar

2018-12-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721759#comment-16721759
 ] 

Hudson commented on TIKA-2798:
--

UNSTABLE: Integrated in Jenkins build Tika-trunk #1610 (See 
[https://builds.apache.org/job/Tika-trunk/1610/])
TIKA-2798 -- revert junrar (tallison: 
[https://github.com/apache/tika/commit/1c4c8fd26ab3a9a0b27d6bb1cbb777b5098a3ec2])
* (edit) tika-parsers/pom.xml


> Consider reverting junrar
> -
>
> Key: TIKA-2798
> URL: https://issues.apache.org/jira/browse/TIKA-2798
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Major
> Attachments: AQRJRPYMH3PNNK2HLOOKKR4B3QOVWOUH_1_19_1.rar.json, 
> AQRJRPYMH3PNNK2HLOOKKR4B3QOVWOUH_1_20.rar.json, 
> attachment_diffs_no_exceptions.xlsx, attachment_diffs_with_exceptions.xlsx
>
>
> [~lfcnassif] noticed junrar regressions in the pre-1.20-rc1 regression 
> reports.  Let's figure out what's going on and consider reverting the 
> "upgrade".





[jira] [Commented] (TIKA-2798) Consider reverting junrar

2018-12-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721739#comment-16721739
 ] 

Hudson commented on TIKA-2798:
--

SUCCESS: Integrated in Jenkins build tika-branch-1x #142 (See 
[https://builds.apache.org/job/tika-branch-1x/142/])
TIKA-2798 -- improve reporting for attachment diffs (tallison: 
[https://github.com/apache/tika/commit/8c88966d1ca8a7cbcb34a8bbd39459fd6b161ebf])
* (edit) tika-eval/src/main/resources/comparison-reports.xml


> Consider reverting junrar
> -
>
> Key: TIKA-2798
> URL: https://issues.apache.org/jira/browse/TIKA-2798
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Major
> Attachments: AQRJRPYMH3PNNK2HLOOKKR4B3QOVWOUH_1_19_1.rar.json, 
> AQRJRPYMH3PNNK2HLOOKKR4B3QOVWOUH_1_20.rar.json, 
> attachment_diffs_no_exceptions.xlsx, attachment_diffs_with_exceptions.xlsx
>
>
> [~lfcnassif] noticed junrar regressions in the pre-1.20-rc1 regression 
> reports.  Let's figure out what's going on and consider reverting the 
> "upgrade".





[jira] [Commented] (TIKA-2798) Consider reverting junrar

2018-12-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721728#comment-16721728
 ] 

Hudson commented on TIKA-2798:
--

SUCCESS: Integrated in Jenkins build Tika-trunk #1609 (See 
[https://builds.apache.org/job/Tika-trunk/1609/])
TIKA-2798 -- improve reporting for attachment diffs (tallison: 
[https://github.com/apache/tika/commit/398bcd8566d3028a9554a459f5c49a51fb45528f])
* (edit) tika-eval/src/main/resources/comparison-reports.xml


> Consider reverting junrar
> -
>
> Key: TIKA-2798
> URL: https://issues.apache.org/jira/browse/TIKA-2798
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Major
> Attachments: AQRJRPYMH3PNNK2HLOOKKR4B3QOVWOUH_1_19_1.rar.json, 
> AQRJRPYMH3PNNK2HLOOKKR4B3QOVWOUH_1_20.rar.json, 
> attachment_diffs_no_exceptions.xlsx, attachment_diffs_with_exceptions.xlsx
>
>
> [~lfcnassif] noticed junrar regressions in the pre-1.20-rc1 regression 
> reports.  Let's figure out what's going on and consider reverting the 
> "upgrade".





[jira] [Commented] (TIKA-2791) Add structure tags to tika-eval

2018-12-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721740#comment-16721740
 ] 

Hudson commented on TIKA-2791:
--

SUCCESS: Integrated in Jenkins build tika-branch-1x #142 (See 
[https://builds.apache.org/job/tika-branch-1x/142/])
TIKA-2791 -- add tags/structure to tika-eval (tallison: 
[https://github.com/apache/tika/commit/4c9e38e4eb2983cda9759c4dffa33daa52d659c8])
* (add) tika-eval/src/main/java/org/apache/tika/eval/util/ContentTags.java
* (add) tika-eval/src/test/resources/test-dirs/extractsA/file16_badTags.json
* (edit) 
tika-core/src/main/java/org/apache/tika/sax/AbstractRecursiveParserWrapperHandler.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/ExtractComparer.java
* (add) tika-eval/src/test/resources/test-dirs/extractsA/file15_tags.json
* (edit) tika-eval/src/main/java/org/apache/tika/eval/db/Cols.java
* (add) tika-eval/src/main/java/org/apache/tika/eval/util/ContentTagParser.java
* (edit) tika-eval/pom.xml
* (edit) tika-eval/src/main/java/org/apache/tika/eval/AbstractProfiler.java
* (edit) 
tika-core/src/main/java/org/apache/tika/sax/RecursiveParserWrapperHandler.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/ExtractProfiler.java
* (add) 
tika-eval/src/test/resources/test-dirs/extractsA/file17_tagsOutOfOrder.json
* (add) tika-eval/src/test/resources/test-dirs/extractsB/file15_tags.html
* (edit) tika-eval/src/main/java/org/apache/tika/eval/io/ExtractReader.java
* (edit) 
tika-eval/src/main/java/org/apache/tika/eval/batch/ExtractProfilerBuilder.java
* (edit) tika-eval/src/test/java/org/apache/tika/eval/SimpleComparerTest.java
* (add) tika-eval/src/test/resources/test-dirs/extractsB/file16_badTags.html
* (edit) 
tika-eval/src/main/java/org/apache/tika/eval/batch/ExtractComparerBuilder.java


> Add structure tags to tika-eval
> ---
>
> Key: TIKA-2791
> URL: https://issues.apache.org/jira/browse/TIKA-2791
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Major
>
> It would be useful to be able to compare counts of common structure tags in 
> tika-eval.  We could also detect and flag bad structure tags that we may be 
> generating, e.g.: 





[jira] [Commented] (TIKA-2791) Add structure tags to tika-eval

2018-12-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721729#comment-16721729
 ] 

Hudson commented on TIKA-2791:
--

SUCCESS: Integrated in Jenkins build Tika-trunk #1609 (See 
[https://builds.apache.org/job/Tika-trunk/1609/])
TIKA-2791 -- add tags/structure to tika-eval (tallison: 
[https://github.com/apache/tika/commit/1ac6a3bd8601dc3376ce01786f115b877b9d338f])
* (edit) tika-eval/pom.xml
* (edit) tika-eval/src/main/java/org/apache/tika/eval/ExtractProfiler.java
* (edit) 
tika-core/src/main/java/org/apache/tika/sax/RecursiveParserWrapperHandler.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/AbstractProfiler.java
* (add) 
tika-eval/src/test/resources/test-dirs/extractsA/file17_tagsOutOfOrder.json
* (edit) tika-eval/src/main/java/org/apache/tika/eval/db/Cols.java
* (edit) 
tika-eval/src/main/java/org/apache/tika/eval/batch/ExtractComparerBuilder.java
* (edit) 
tika-core/src/main/java/org/apache/tika/sax/AbstractRecursiveParserWrapperHandler.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/ExtractComparer.java
* (add) tika-eval/src/test/resources/test-dirs/extractsA/file16_badTags.json
* (edit) tika-eval/src/main/java/org/apache/tika/eval/io/ExtractReader.java
* (add) tika-eval/src/test/resources/test-dirs/extractsB/file15_tags.html
* (edit) 
tika-eval/src/main/java/org/apache/tika/eval/batch/ExtractProfilerBuilder.java
* (edit) tika-eval/src/test/java/org/apache/tika/eval/SimpleComparerTest.java
* (add) tika-eval/src/test/resources/test-dirs/extractsA/file15_tags.json
* (add) tika-eval/src/test/resources/test-dirs/extractsB/file16_badTags.html
* (add) tika-eval/src/main/java/org/apache/tika/eval/util/ContentTagParser.java
* (add) tika-eval/src/main/java/org/apache/tika/eval/util/ContentTags.java


> Add structure tags to tika-eval
> ---
>
> Key: TIKA-2791
> URL: https://issues.apache.org/jira/browse/TIKA-2791
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Major
>
> It would be useful to be able to compare counts of common structure tags in 
> tika-eval.  We could also detect and flag bad structure tags that we may be 
> generating, e.g.: 





[jira] [Commented] (TIKA-2775) Bulk upgrade plugins and dependencies

2018-12-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721680#comment-16721680
 ] 

Hudson commented on TIKA-2775:
--

UNSTABLE: Integrated in Jenkins build tika-2.x-windows #364 (See 
[https://builds.apache.org/job/tika-2.x-windows/364/])
TIKA-2775 -- upgrade guava (tallison: rev 
7ffd19492909b4dbfddcc96e3921e078c537faef)
* (edit) tika-langdetect/pom.xml
* (edit) tika-nlp/pom.xml
* (edit) tika-bundle/pom.xml
* (edit) tika-dl/pom.xml
* (edit) tika-parsers/pom.xml
TIKA-2775 -- more updates (these were made locally before the first (tallison: 
rev dbd3ea1f8cba5528816cd7e72e4c07d4ec5ae8f4)
* (edit) tika-parent/pom.xml
* (edit) tika-parsers/pom.xml


> Bulk upgrade plugins and dependencies
> -
>
> Key: TIKA-2775
> URL: https://issues.apache.org/jira/browse/TIKA-2775
> Project: Tika
>  Issue Type: Task
>Reporter: Tim Allison
>Assignee: Tim Allison
>Priority: Major
>
> Thanks to TIKA-2757...there are some areas for upgrading. :D





[jira] [Commented] (TIKA-2798) Consider reverting junrar

2018-12-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721681#comment-16721681
 ] 

Hudson commented on TIKA-2798:
--

UNSTABLE: Integrated in Jenkins build tika-2.x-windows #364 (See 
[https://builds.apache.org/job/tika-2.x-windows/364/])
TIKA-2798 -- revert junrar (tallison: rev 
1c4c8fd26ab3a9a0b27d6bb1cbb777b5098a3ec2)
* (edit) tika-parsers/pom.xml


> Consider reverting junrar
> -
>
> Key: TIKA-2798
> URL: https://issues.apache.org/jira/browse/TIKA-2798
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Major
> Attachments: AQRJRPYMH3PNNK2HLOOKKR4B3QOVWOUH_1_19_1.rar.json, 
> AQRJRPYMH3PNNK2HLOOKKR4B3QOVWOUH_1_20.rar.json, 
> attachment_diffs_no_exceptions.xlsx, attachment_diffs_with_exceptions.xlsx
>
>
> [~lfcnassif] noticed junrar regressions in the pre-1.20-rc1 regression 
> reports.  Let's figure out what's going on and consider reverting the 
> "upgrade".





[jira] [Created] (TIKA-2799) Consider reverting jackcess

2018-12-14 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2799:
-

 Summary: Consider reverting jackcess
 Key: TIKA-2799
 URL: https://issues.apache.org/jira/browse/TIKA-2799
 Project: Tika
  Issue Type: Improvement
Reporter: Tim Allison


It looks like there were some very slight regressions in Jackcess 2.2.0 -- 
we're able to get slightly less text out of files that threw exceptions, and 
there's one new exception on a file that was parsed without problem earlier.

https://sourceforge.net/p/jackcess/bugs/150/
https://sourceforge.net/p/jackcess/bugs/149/



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (TIKA-2791) Add structure tags to tika-eval

2018-12-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2791?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721634#comment-16721634
 ] 

Hudson commented on TIKA-2791:
--

UNSTABLE: Integrated in Jenkins build tika-2.x-windows #363 (See 
[https://builds.apache.org/job/tika-2.x-windows/363/])
TIKA-2791 -- add tags/structure to tika-eval (tallison: rev 
1ac6a3bd8601dc3376ce01786f115b877b9d338f)
* (add) tika-eval/src/test/resources/test-dirs/extractsA/file16_badTags.json
* (edit) 
tika-eval/src/main/java/org/apache/tika/eval/batch/ExtractComparerBuilder.java
* (edit) tika-eval/src/test/java/org/apache/tika/eval/SimpleComparerTest.java
* (add) tika-eval/src/main/java/org/apache/tika/eval/util/ContentTagParser.java
* (add) tika-eval/src/main/java/org/apache/tika/eval/util/ContentTags.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/io/ExtractReader.java
* (edit) 
tika-core/src/main/java/org/apache/tika/sax/AbstractRecursiveParserWrapperHandler.java
* (add) tika-eval/src/test/resources/test-dirs/extractsB/file15_tags.html
* (edit) tika-eval/src/main/java/org/apache/tika/eval/ExtractProfiler.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/AbstractProfiler.java
* (add) tika-eval/src/test/resources/test-dirs/extractsB/file16_badTags.html
* (edit) tika-eval/pom.xml
* (edit) 
tika-eval/src/main/java/org/apache/tika/eval/batch/ExtractProfilerBuilder.java
* (edit) 
tika-core/src/main/java/org/apache/tika/sax/RecursiveParserWrapperHandler.java
* (edit) tika-eval/src/main/java/org/apache/tika/eval/ExtractComparer.java
* (add) tika-eval/src/test/resources/test-dirs/extractsA/file15_tags.json
* (edit) tika-eval/src/main/java/org/apache/tika/eval/db/Cols.java
* (add) 
tika-eval/src/test/resources/test-dirs/extractsA/file17_tagsOutOfOrder.json
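
TIKA-2791 proposes two things: comparing counts of common structure tags between extracts, and flagging malformed tag structure. The sketch below shows a naive version of both over an XHTML string. It is a rough illustration, not the `ContentTagParser` added in the commit above. The class and method names are hypothetical. The regex scan does not handle comments, CDATA, or `>` inside attribute values. It also treats only `/>`-terminated tags as self-closing, which fits XHTML output but not arbitrary HTML.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch; not tika-eval's implementation.
public class TagCountSketch {
    // Group 1: "/" for a closing tag; group 2: tag name; group 3: "/" for self-closing.
    private static final Pattern TAG =
        Pattern.compile("<(/?)([a-zA-Z][a-zA-Z0-9]*)\\b[^>]*?(/?)>");

    // Counts start tags (including self-closing ones) by lower-cased name.
    static Map<String, Integer> countTags(String xhtml) {
        Map<String, Integer> counts = new TreeMap<>();
        Matcher m = TAG.matcher(xhtml);
        while (m.find()) {
            if (m.group(1).isEmpty()) {
                counts.merge(m.group(2).toLowerCase(), 1, Integer::sum);
            }
        }
        return counts;
    }

    // True if every closing tag matches the most recently opened tag
    // and nothing is left open at the end.
    static boolean wellNested(String xhtml) {
        Deque<String> open = new ArrayDeque<>();
        Matcher m = TAG.matcher(xhtml);
        while (m.find()) {
            String name = m.group(2).toLowerCase();
            if (!m.group(3).isEmpty()) {
                continue; // self-closing, e.g. <br/>, opens nothing
            }
            if (m.group(1).isEmpty()) {
                open.push(name);
            } else if (open.isEmpty() || !open.pop().equals(name)) {
                return false;
            }
        }
        return open.isEmpty();
    }

    public static void main(String[] args) {
        String good = "<html><body><p>a</p><p>b<br/></p></body></html>";
        String bad = "<body><p><b>x</p></b></body>";
        System.out.println(countTags(good) + " well-nested=" + wellNested(good));
        System.out.println(countTags(bad) + " well-nested=" + wellNested(bad));
    }
}
```

A per-tag count map from each extract can then be diffed key by key. The nesting check catches out-of-order closes like the `file17_tagsOutOfOrder` test resource name suggests.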


> Add structure tags to tika-eval
> ---
>
> Key: TIKA-2791
> URL: https://issues.apache.org/jira/browse/TIKA-2791
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Major
>
> It would be useful to be able to compare counts of common structure tags in 
> tika-eval.  We could also detect and flag bad structure tags that we may be 
> generating, e.g.: 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (TIKA-2798) Consider reverting junrar

2018-12-14 Thread Hudson (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721633#comment-16721633
 ] 

Hudson commented on TIKA-2798:
--

UNSTABLE: Integrated in Jenkins build tika-2.x-windows #363 (See 
[https://builds.apache.org/job/tika-2.x-windows/363/])
TIKA-2798 -- improve reporting for attachment diffs (tallison: rev 
398bcd8566d3028a9554a459f5c49a51fb45528f)
* (edit) tika-eval/src/main/resources/comparison-reports.xml


> Consider reverting junrar
> -
>
> Key: TIKA-2798
> URL: https://issues.apache.org/jira/browse/TIKA-2798
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Major
> Attachments: AQRJRPYMH3PNNK2HLOOKKR4B3QOVWOUH_1_19_1.rar.json, 
> AQRJRPYMH3PNNK2HLOOKKR4B3QOVWOUH_1_20.rar.json, 
> attachment_diffs_no_exceptions.xlsx, attachment_diffs_with_exceptions.xlsx
>
>
> [~lfcnassif] noticed junrar regressions in the pre-1.20-rc1 regression 
> reports.  Let's figure out what's going on and consider reverting the 
> "upgrade".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Comment Edited] (TIKA-2797) Using TIKA with javax api causes exception

2018-12-14 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721591#comment-16721591
 ] 

Tim Allison edited comment on TIKA-2797 at 12/14/18 4:48 PM:
-

Which version of Tika is this? Is this tika-app or tika-server?

What (and what version) are you using for REST communication?

In master, we're currently using: 
org.apache.cxf:cxf-rt-frontend-jaxrs:jar:3.2.7 which brings in: 
javax.servlet:javax.servlet-api:jar:3.1.0
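
Suppose an `AbstractMethodError` like the one in this report appears only after a dependency is added. A common first check is to find which jar actually supplies the JAX-RS API and implementation classes at runtime. The sketch below is a generic diagnostic, not part of Tika. The class name `WhichJar` is hypothetical, and the two class names it probes by default are taken from the stack trace. It relies only on the JDK methods `Class.getProtectionDomain().getCodeSource()` and `Package.getImplementationVersion()`.

```java
import java.security.CodeSource;

// Hypothetical diagnostic helper: reports where a class was loaded from.
public class WhichJar {
    // Jar or directory URL the class came from; bootstrap classes have no code source.
    static String locationOf(Class<?> cls) {
        CodeSource src = cls.getProtectionDomain().getCodeSource();
        if (src == null || src.getLocation() == null) {
            return "<bootstrap or unknown>";
        }
        return src.getLocation().toString();
    }

    // Implementation-Version from the jar manifest, when the jar declares one.
    static String versionOf(Class<?> cls) {
        Package pkg = cls.getPackage();
        String v = (pkg == null) ? null : pkg.getImplementationVersion();
        return (v == null) ? "<no Implementation-Version>" : v;
    }

    public static void main(String[] args) {
        String[] names = args.length > 0 ? args : new String[] {
            "javax.ws.rs.core.Response",
            "org.glassfish.jersey.server.ApplicationHandler"
        };
        for (String name : names) {
            try {
                // initialize=false: look the class up without running its static initializers
                Class<?> cls = Class.forName(name, false, WhichJar.class.getClassLoader());
                System.out.println(name + " -> " + locationOf(cls) + " (" + versionOf(cls) + ")");
            } catch (ClassNotFoundException e) {
                System.out.println(name + " -> not on classpath");
            }
        }
    }
}
```

In a servlet container, the answer depends on the web application's class loader. The same lookup is therefore more telling when called from inside the application (for example, logged at startup) than from a standalone `main`. Comparing the result with `mvn dependency:tree` output would show which dependency pulls in each jar.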


was (Author: talli...@mitre.org):
Which version of Tika is this? What (and what version) are you using for REST 
communication?

> Using TIKA with javax api causes exception
> --
>
> Key: TIKA-2797
> URL: https://issues.apache.org/jira/browse/TIKA-2797
> Project: Tika
>  Issue Type: Bug
>Reporter: Liia Nurullina
>Priority: Major
>
> After the Tika dependency was added to the project (no Tika classes are used in 
> the application logic), sending a REST request to the application fails with 
> the following error:
> *Type* Exception Report
> *Message* org.glassfish.jersey.server.ContainerException: 
> java.lang.AbstractMethodError: 
> javax.ws.rs.core.Response$ResponseBuilder.status(ILjava/lang/String;)Ljavax/ws/rs/core/Response$ResponseBuilder;
> *Description* The server encountered an unexpected condition that prevented 
> it from fulfilling the request.
> *Exception*
> javax.servlet.ServletException: 
> org.glassfish.jersey.server.ContainerException: 
> java.lang.AbstractMethodError: 
> javax.ws.rs.core.Response$ResponseBuilder.status(ILjava/lang/String;)Ljavax/ws/rs/core/Response$ResponseBuilder;
>  org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:432) 
> org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:370) 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:389)
>  
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:342)
>  
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:229)
>  org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
> *Root Cause*
> org.glassfish.jersey.server.ContainerException: 
> java.lang.AbstractMethodError: 
> javax.ws.rs.core.Response$ResponseBuilder.status(ILjava/lang/String;)Ljavax/ws/rs/core/Response$ResponseBuilder;
>  
> org.glassfish.jersey.servlet.internal.ResponseWriter.rethrow(ResponseWriter.java:278)
>  
> org.glassfish.jersey.servlet.internal.ResponseWriter.failure(ResponseWriter.java:260)
>  
> org.glassfish.jersey.server.ServerRuntime$Responder.process(ServerRuntime.java:460)
>  org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:285) 
> org.glassfish.jersey.internal.Errors$1.call(Errors.java:272) 
> org.glassfish.jersey.internal.Errors$1.call(Errors.java:268) 
> org.glassfish.jersey.internal.Errors.process(Errors.java:316) 
> org.glassfish.jersey.internal.Errors.process(Errors.java:298) 
> org.glassfish.jersey.internal.Errors.process(Errors.java:268) 
> org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:289)
>  org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:256) 
> org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:703)
>  org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:416) 
> org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:370) 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:389)
>  
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:342)
>  
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:229)
>  org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
> *Root Cause*
> java.lang.AbstractMethodError: 
> javax.ws.rs.core.Response$ResponseBuilder.status(ILjava/lang/String;)Ljavax/ws/rs/core/Response$ResponseBuilder;
>  javax.ws.rs.core.Response$ResponseBuilder.status(Response.java:921) 
> javax.ws.rs.core.Response.status(Response.java:592) 
> javax.ws.rs.core.Response.status(Response.java:603) 
> javax.ws.rs.core.Response.ok(Response.java:638) 
> javax.ws.rs.core.Response.ok(Response.java:650) 
> com.avaya.asr.deliverynode.resources.StatusResource.getSystemStatus(StatusResource.java:46)
>  sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  java.lang.reflect.Method.invoke(Method.java:498) 
> org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76)
>  
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatch

[jira] [Commented] (TIKA-2797) Using TIKA with javax api causes exception

2018-12-14 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2797?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721591#comment-16721591
 ] 

Tim Allison commented on TIKA-2797:
---

Which version of Tika is this? What (and what version) are you using for REST 
communication?

> Using TIKA with javax api causes exception
> --
>
> Key: TIKA-2797
> URL: https://issues.apache.org/jira/browse/TIKA-2797
> Project: Tika
>  Issue Type: Bug
>Reporter: Liia Nurullina
>Priority: Major
>
> After the Tika dependency was added to the project (no Tika classes are used in 
> the application logic), sending a REST request to the application fails with 
> the following error:
> *Type* Exception Report
> *Message* org.glassfish.jersey.server.ContainerException: 
> java.lang.AbstractMethodError: 
> javax.ws.rs.core.Response$ResponseBuilder.status(ILjava/lang/String;)Ljavax/ws/rs/core/Response$ResponseBuilder;
> *Description* The server encountered an unexpected condition that prevented 
> it from fulfilling the request.
> *Exception*
> javax.servlet.ServletException: 
> org.glassfish.jersey.server.ContainerException: 
> java.lang.AbstractMethodError: 
> javax.ws.rs.core.Response$ResponseBuilder.status(ILjava/lang/String;)Ljavax/ws/rs/core/Response$ResponseBuilder;
>  org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:432) 
> org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:370) 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:389)
>  
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:342)
>  
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:229)
>  org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
> *Root Cause*
> org.glassfish.jersey.server.ContainerException: 
> java.lang.AbstractMethodError: 
> javax.ws.rs.core.Response$ResponseBuilder.status(ILjava/lang/String;)Ljavax/ws/rs/core/Response$ResponseBuilder;
>  
> org.glassfish.jersey.servlet.internal.ResponseWriter.rethrow(ResponseWriter.java:278)
>  
> org.glassfish.jersey.servlet.internal.ResponseWriter.failure(ResponseWriter.java:260)
>  
> org.glassfish.jersey.server.ServerRuntime$Responder.process(ServerRuntime.java:460)
>  org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:285) 
> org.glassfish.jersey.internal.Errors$1.call(Errors.java:272) 
> org.glassfish.jersey.internal.Errors$1.call(Errors.java:268) 
> org.glassfish.jersey.internal.Errors.process(Errors.java:316) 
> org.glassfish.jersey.internal.Errors.process(Errors.java:298) 
> org.glassfish.jersey.internal.Errors.process(Errors.java:268) 
> org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:289)
>  org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:256) 
> org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:703)
>  org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:416) 
> org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:370) 
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:389)
>  
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:342)
>  
> org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:229)
>  org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)
> *Root Cause*
> java.lang.AbstractMethodError: 
> javax.ws.rs.core.Response$ResponseBuilder.status(ILjava/lang/String;)Ljavax/ws/rs/core/Response$ResponseBuilder;
>  javax.ws.rs.core.Response$ResponseBuilder.status(Response.java:921) 
> javax.ws.rs.core.Response.status(Response.java:592) 
> javax.ws.rs.core.Response.status(Response.java:603) 
> javax.ws.rs.core.Response.ok(Response.java:638) 
> javax.ws.rs.core.Response.ok(Response.java:650) 
> com.avaya.asr.deliverynode.resources.StatusResource.getSystemStatus(StatusResource.java:46)
>  sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>  java.lang.reflect.Method.invoke(Method.java:498) 
> org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76)
>  
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148)
>  
> org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191)
>  
> org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:200)
>  
> org.glassfish.jersey.server.model.i

[jira] [Commented] (TIKA-2798) Consider reverting junrar

2018-12-14 Thread Tim Allison (JIRA)


[ 
https://issues.apache.org/jira/browse/TIKA-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16721554#comment-16721554
 ] 

Tim Allison commented on TIKA-2798:
---

I couldn't figure out why I didn't see diffs in the attachment counts.  It was 
because the reporting SQL for attachment counts required that neither extract 
have an exception.  I'm going to improve the reporting for attachment diffs, as 
we do with content diffs: one report ({{attachment_diffs_no_exceptions.xlsx}}) 
will exclude files where one or both extracts had an exception, and a full 
report ({{attachment_diffs_with_exceptions.xlsx}}) will include them.
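
Here is why the diffs were invisible. Suppose the report keeps only rows where neither extract had an exception. Then a file whose attachment count drops *because* the new version throws is filtered out. The sketch below shows this with a hypothetical `Row` type and in-memory filtering. It is not tika-eval's schema or the SQL in `comparison-reports.xml`.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the two attachment-diff reports.
public class AttachmentDiffSketch {
    // One compared file: attachment counts and exception flags for
    // extract A (old version) and extract B (new version).
    static final class Row {
        final String file;
        final int attachmentsA, attachmentsB;
        final boolean exceptionA, exceptionB;

        Row(String file, int attachmentsA, int attachmentsB,
            boolean exceptionA, boolean exceptionB) {
            this.file = file;
            this.attachmentsA = attachmentsA;
            this.attachmentsB = attachmentsB;
            this.exceptionA = exceptionA;
            this.exceptionB = exceptionB;
        }
    }

    // Files whose attachment counts differ. With excludeExceptions=true, rows
    // where either extract had an exception are skipped first ("no exceptions" report).
    static List<String> diffs(List<Row> rows, boolean excludeExceptions) {
        List<String> out = new ArrayList<>();
        for (Row r : rows) {
            if (excludeExceptions && (r.exceptionA || r.exceptionB)) {
                continue;
            }
            if (r.attachmentsA != r.attachmentsB) {
                out.add(r.file);
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Row> rows = new ArrayList<>();
        rows.add(new Row("a.rar", 5, 5, false, false)); // unchanged
        rows.add(new Row("b.rar", 5, 2, false, true));  // fewer attachments plus a new exception
        rows.add(new Row("c.rar", 3, 4, false, false)); // count diff, no exceptions
        System.out.println("no exceptions:   " + diffs(rows, true));
        System.out.println("with exceptions: " + diffs(rows, false));
    }
}
```

Running `main` lists only `c.rar` in the exception-free report and both `b.rar` and `c.rar` in the full one. The regression in `b.rar` is visible only in the full report.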

> Consider reverting junrar
> -
>
> Key: TIKA-2798
> URL: https://issues.apache.org/jira/browse/TIKA-2798
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Major
> Attachments: AQRJRPYMH3PNNK2HLOOKKR4B3QOVWOUH_1_19_1.rar.json, 
> AQRJRPYMH3PNNK2HLOOKKR4B3QOVWOUH_1_20.rar.json, 
> attachment_diffs_no_exceptions.xlsx, attachment_diffs_with_exceptions.xlsx
>
>
> [~lfcnassif] noticed junrar regressions in the pre-1.20-rc1 regression 
> reports.  Let's figure out what's going on and consider reverting the 
> "upgrade".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (TIKA-2798) Consider reverting junrar

2018-12-14 Thread Tim Allison (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-2798:
--
Attachment: attachment_diffs_no_exceptions.xlsx
attachment_diffs_with_exceptions.xlsx

> Consider reverting junrar
> -
>
> Key: TIKA-2798
> URL: https://issues.apache.org/jira/browse/TIKA-2798
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Major
> Attachments: AQRJRPYMH3PNNK2HLOOKKR4B3QOVWOUH_1_19_1.rar.json, 
> AQRJRPYMH3PNNK2HLOOKKR4B3QOVWOUH_1_20.rar.json, 
> attachment_diffs_no_exceptions.xlsx, attachment_diffs_with_exceptions.xlsx
>
>
> [~lfcnassif] noticed junrar regressions in the pre-1.20-rc1 regression 
> reports.  Let's figure out what's going on and consider reverting the 
> "upgrade".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (TIKA-2798) Consider reverting junrar

2018-12-14 Thread Tim Allison (JIRA)


 [ 
https://issues.apache.org/jira/browse/TIKA-2798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Tim Allison updated TIKA-2798:
--
Attachment: AQRJRPYMH3PNNK2HLOOKKR4B3QOVWOUH_1_19_1.rar.json
AQRJRPYMH3PNNK2HLOOKKR4B3QOVWOUH_1_20.rar.json

> Consider reverting junrar
> -
>
> Key: TIKA-2798
> URL: https://issues.apache.org/jira/browse/TIKA-2798
> Project: Tika
>  Issue Type: Improvement
>Reporter: Tim Allison
>Priority: Major
> Attachments: AQRJRPYMH3PNNK2HLOOKKR4B3QOVWOUH_1_19_1.rar.json, 
> AQRJRPYMH3PNNK2HLOOKKR4B3QOVWOUH_1_20.rar.json
>
>
> [~lfcnassif] noticed junrar regressions in the pre-1.20-rc1 regression 
> reports.  Let's figure out what's going on and consider reverting the 
> "upgrade".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (TIKA-2798) Consider reverting junrar

2018-12-14 Thread Tim Allison (JIRA)
Tim Allison created TIKA-2798:
-

 Summary: Consider reverting junrar
 Key: TIKA-2798
 URL: https://issues.apache.org/jira/browse/TIKA-2798
 Project: Tika
  Issue Type: Improvement
Reporter: Tim Allison


[~lfcnassif] noticed junrar regressions in the pre-1.20-rc1 regression reports. 
 Let's figure out what's going on and consider reverting the "upgrade".



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


JDK 12 enters Rampdown Phase One

2018-12-14 Thread Rory O'Donnell

Hi Tim,

*JDK 12 Early Access build is now available at jdk.java.net/12/*

 * Per the JDK 12 schedule [1], we are now in Rampdown Phase One.
 o For more details, see Mark Reinhold's email to the jdk-dev mailing
   list [2].
 o The overall feature set is frozen, no further JEPs will be
   targeted to this release.
 o We’ve forked the main-line source repository, jdk/jdk, to the
   JDK 12 stabilization repository.

Changes since the last availability email

 * JEP 189: Shenandoah: A Low-Pause-Time Garbage Collector
   (Experimental) moved to *Targeted*.
 * JEP 334: JVM Constants API moved to *Targeted*.
 * JEP 344: Abortable Mixed Collections for G1 moved to *Targeted*.
 * JEP 346: Promptly Return Unused Committed Memory from G1 moved
   to *Targeted*.
 * JEP 326: Raw String Literals (Preview) *Proposed to drop from JDK 12*
 o Proposal posted on jdk-dev


Bug fixes reported by Open Source Projects:

 o JDK-8211051 - fixed in b22 - reported by JUnit5
 o JDK-8211422 - fixed in b23 - reported by Apache Batik

The Java Crypto Roadmap has been updated with the following target:


 * With the 2019-04-16 CPU,
 o Targeted Releases - JDK 12, JDK 11, JDK 8, and JDK 7
 o Distrust TLS server certificates anchored by Symantec Root CAs.

Oracle Java SE 8 Release Updates [3]

 * Public updates for Oracle Java SE 8 released after January 2019 will
   not be available for business, commercial or production use without
   a commercial license.

Rgds, Rory

[1] http://openjdk.java.net/projects/jdk/12/#Schedule
[2] http://mail.openjdk.java.net/pipermail/jdk-dev/2018-December/002405.html
[3] https://java.com/en/download/release_notice.jsp

--
Rgds, Rory O'Donnell
Quality Engineering Manager
Oracle EMEA, Dublin, Ireland



[jira] [Created] (TIKA-2797) Using TIKA with javax api causes exception

2018-12-14 Thread Liia Nurullina (JIRA)
Liia Nurullina created TIKA-2797:


 Summary: Using TIKA with javax api causes exception
 Key: TIKA-2797
 URL: https://issues.apache.org/jira/browse/TIKA-2797
 Project: Tika
  Issue Type: Bug
Reporter: Liia Nurullina


After the Tika dependency was added to the project (no Tika classes are used in 
the application logic), sending a REST request to the application fails with 
the following error:

*Type* Exception Report

*Message* org.glassfish.jersey.server.ContainerException: 
java.lang.AbstractMethodError: 
javax.ws.rs.core.Response$ResponseBuilder.status(ILjava/lang/String;)Ljavax/ws/rs/core/Response$ResponseBuilder;

*Description* The server encountered an unexpected condition that prevented it 
from fulfilling the request.

*Exception*

javax.servlet.ServletException: org.glassfish.jersey.server.ContainerException: java.lang.AbstractMethodError: javax.ws.rs.core.Response$ResponseBuilder.status(ILjava/lang/String;)Ljavax/ws/rs/core/Response$ResponseBuilder;
  org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:432)
  org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:370)
  org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:389)
  org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:342)
  org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:229)
  org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)

*Root Cause*

org.glassfish.jersey.server.ContainerException: java.lang.AbstractMethodError: javax.ws.rs.core.Response$ResponseBuilder.status(ILjava/lang/String;)Ljavax/ws/rs/core/Response$ResponseBuilder;
  org.glassfish.jersey.servlet.internal.ResponseWriter.rethrow(ResponseWriter.java:278)
  org.glassfish.jersey.servlet.internal.ResponseWriter.failure(ResponseWriter.java:260)
  org.glassfish.jersey.server.ServerRuntime$Responder.process(ServerRuntime.java:460)
  org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:285)
  org.glassfish.jersey.internal.Errors$1.call(Errors.java:272)
  org.glassfish.jersey.internal.Errors$1.call(Errors.java:268)
  org.glassfish.jersey.internal.Errors.process(Errors.java:316)
  org.glassfish.jersey.internal.Errors.process(Errors.java:298)
  org.glassfish.jersey.internal.Errors.process(Errors.java:268)
  org.glassfish.jersey.process.internal.RequestScope.runInScope(RequestScope.java:289)
  org.glassfish.jersey.server.ServerRuntime.process(ServerRuntime.java:256)
  org.glassfish.jersey.server.ApplicationHandler.handle(ApplicationHandler.java:703)
  org.glassfish.jersey.servlet.WebComponent.serviceImpl(WebComponent.java:416)
  org.glassfish.jersey.servlet.WebComponent.service(WebComponent.java:370)
  org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:389)
  org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:342)
  org.glassfish.jersey.servlet.ServletContainer.service(ServletContainer.java:229)
  org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)

*Root Cause*

java.lang.AbstractMethodError: javax.ws.rs.core.Response$ResponseBuilder.status(ILjava/lang/String;)Ljavax/ws/rs/core/Response$ResponseBuilder;
  javax.ws.rs.core.Response$ResponseBuilder.status(Response.java:921)
  javax.ws.rs.core.Response.status(Response.java:592)
  javax.ws.rs.core.Response.status(Response.java:603)
  javax.ws.rs.core.Response.ok(Response.java:638)
  javax.ws.rs.core.Response.ok(Response.java:650)
  com.avaya.asr.deliverynode.resources.StatusResource.getSystemStatus(StatusResource.java:46)
  sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
  sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
  sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
  java.lang.reflect.Method.invoke(Method.java:498)
  org.glassfish.jersey.server.model.internal.ResourceMethodInvocationHandlerFactory.lambda$static$0(ResourceMethodInvocationHandlerFactory.java:76)
  org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher$1.run(AbstractJavaResourceMethodDispatcher.java:148)
  org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.invoke(AbstractJavaResourceMethodDispatcher.java:191)
  org.glassfish.jersey.server.model.internal.JavaResourceMethodDispatcherProvider$ResponseOutInvoker.doDispatch(JavaResourceMethodDispatcherProvider.java:200)
  org.glassfish.jersey.server.model.internal.AbstractJavaResourceMethodDispatcher.dispatch(AbstractJavaResourceMethodDispatcher.java:103)
  org.glassfish.jersey.server.model.ResourceMethodInvoker.invoke(ResourceMethodInvoker.java:493)
  org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:415)
  org.glassfish.jersey.server.model.ResourceMethodInvoker.apply(ResourceMethodInvoker.java:104)
  org.glassfish.jersey.server.ServerRuntime$1.run(ServerRuntime.java:277)
  org