RE: Forbidden-APIS no longer ran because of carzy POM change

2015-01-23 Thread Uwe Schindler
Hi,

I did some further investigation. I had the plugin disabled in my Eclipse (you can do 
this with a quick fix for the whole workspace). In fact, if you remove that setting, it 
also fails in Eclipse Luna.

If we want the plugin to be hidden automatically from all Eclipse versions through our 
own POM file, we can add the following (this is what the quick fix also allows you to do 
for the current project):

<pluginManagement>
  <plugins>
    <!--This plugin's configuration is used to store Eclipse m2e
        settings only. It has no influence on the Maven build itself.-->
    <plugin>
      <groupId>org.eclipse.m2e</groupId>
      <artifactId>lifecycle-mapping</artifactId>
      <version>1.0.0</version>
      <configuration>
        <lifecycleMappingMetadata>
          <pluginExecutions>
            <pluginExecution>
              <pluginExecutionFilter>
                <groupId>de.thetaphi</groupId>
                <artifactId>forbiddenapis</artifactId>
                <versionRange>[1.0,)</versionRange>
                <goals>
                  <goal>check</goal>
                  <goal>testCheck</goal>
                </goals>
              </pluginExecutionFilter>
              <action>
                <ignore/>
              </action>
            </pluginExecution>
          </pluginExecutions>
        </lifecycleMappingMetadata>
      </configuration>
    </plugin>
  </plugins>
</pluginManagement>

This can be put into tika-parent's POM.

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Uwe Schindler [mailto:u...@thetaphi.de]
 Sent: Friday, January 23, 2015 5:18 PM
 To: dev@tika.apache.org
 Subject: RE: Forbidden-APIS no longer ran because of carzy POM change
 
 Hi, this may also help; it has the information needed:
 
 https://www.eclipse.org/m2e/documentation/m2e-execution-not-covered.html
 
 In fact, the problem is that Eclipse has no idea how this plugin should be
 executed internally in Eclipse. But as this is just a check plugin that does
 not affect the build output at all, you can leave it disabled.
 
 If you scroll down, you'll see that Eclipse 4.2+ fixes this problem: disable the
 plugin in Eclipse's Maven integration via Project properties -> Maven ->
 Lifecycle mappings -> ignore.
 
 
 
  -Original Message-
  From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
  Sent: Friday, January 23, 2015 4:13 PM
  To: dev@tika.apache.org
  Subject: Re: Forbidden-APIS no longer ran because of carzy POM change
 
  Hi Uwe,
 
  Thanks. I will check it out. Like I said, I’m not OK reverting
  anything if my Eclipse keeps complaining at me, so we’ll need a fix
  that handles both. Let me try with the latest version of Eclipse and
  m2e and see if (with your patch) the issue goes away.
 
  Cheers,
  Chris
 
 
  ++
  Chris Mattmann, Ph.D.
  Chief Architect
  Instrument Software and Science Data Systems Section (398)
  NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
  Office: 168-519, Mailstop: 168-527
  Email: chris.a.mattm...@nasa.gov
  WWW:  http://sunset.usc.edu/~mattmann/
  ++
  Adjunct Associate Professor, Computer Science Department
  University of Southern California, Los Angeles, CA 90089 USA
  ++
  
  -Original Message-
  From: Uwe Schindler u...@thetaphi.de
  Reply-To: dev@tika.apache.org dev@tika.apache.org
  Date: Friday, January 23, 2015 at 3:59 AM
  To: dev@tika.apache.org dev@tika.apache.org
  Subject: RE: Forbidden-APIS no longer ran because of carzy POM change
 
  Here is the explanation of why the plugin is no longer called because of
  this:
  
  - Works for me too, but can anyone explain why? – Andrew Swan May 15 '13 at 6:26
  - @Andrew I think this works because m2e is not looking for plugins
  in pluginManagement, but only in build/plugins. In the Maven world,
  there is a difference between 

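The quoted answer is cut off above, but the distinction it refers to can be sketched as follows. This is a minimal, hypothetical POM fragment, not Tika's actual tika-parent POM: a plugin declared only under <pluginManagement> is just a set of defaults (version, executions, configuration); since forbiddenapis is not part of Maven's default lifecycle, Maven runs it only once it is also referenced under <build><plugins>, where it inherits the managed settings. The ${forbiddenapis.version} property is a placeholder, and the signature configuration that forbidden-apis needs is left out.

```xml
<build>
  <pluginManagement>
    <plugins>
      <!-- Declaration only: forbiddenapis is not bound by Maven's default
           lifecycle, so listing it solely here does not make it run. -->
      <plugin>
        <groupId>de.thetaphi</groupId>
        <artifactId>forbiddenapis</artifactId>
        <!-- hypothetical property; define it or use a concrete version -->
        <version>${forbiddenapis.version}</version>
        <!-- signature configuration (e.g. bundledSignatures) omitted -->
        <executions>
          <execution>
            <goals>
              <goal>check</goal>
              <goal>testCheck</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </pluginManagement>

  <plugins>
    <!-- Activation: referencing the plugin here makes Maven run it,
         inheriting version and executions from pluginManagement above. -->
    <plugin>
      <groupId>de.thetaphi</groupId>
      <artifactId>forbiddenapis</artifactId>
    </plugin>
  </plugins>
</build>
```

Per the quoted comment, m2e only inspects build/plugins, so it is the second block that would make Eclipse complain; dropping that block would silence Eclipse but also stop the check from running, which would be consistent with the symptom described in this thread.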
[jira] [Commented] (TIKA-1521) Handle password protected 7zip files

2015-01-23 Thread Nick Burch (JIRA)

[ https://issues.apache.org/jira/browse/TIKA-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289605#comment-14289605 ]

Nick Burch commented on TIKA-1521:
--

All unit tests (including that one) pass just fine on my system, after a mvn 
clean, so I'm not sure why it isn't working for you or Jenkins?

 Handle password protected 7zip files
 

 Key: TIKA-1521
 URL: https://issues.apache.org/jira/browse/TIKA-1521
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.7
Reporter: Nick Burch
 Fix For: 1.8


 While working on TIKA-1028, I noticed that while Commons Compress doesn't 
 currently handle decrypting password-protected zip files, it does handle 
 password-protected 7zip files.
 We should therefore add logic to the package parser to spot password-protected 
 7zip files, and fetch the password for them from a PasswordProvider, if one is 
 given.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


RE: Forbidden-APIS no longer ran because of carzy POM change

2015-01-23 Thread Uwe Schindler
I will add this to the documentation page of forbidden-apis. This may also help 
Elasticsearch and other people :-) 

Uwe

 -Original Message-
 From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
 Sent: Friday, January 23, 2015 8:11 PM
 To: dev@tika.apache.org
 Subject: Re: Forbidden-APIS no longer ran because of carzy POM change
 
 Awesome. Thanks, Uwe.
 
 Tim, do you want to put that in, or do you want me to?
 
Re: Forbidden-APIS no longer ran because of carzy POM change

2015-01-23 Thread Mattmann, Chris A (3980)
Awesome. Thanks, Uwe.

Tim, do you want to put that in, or do you want me to?

RE: Forbidden-APIS no longer ran because of carzy POM change

2015-01-23 Thread Allison, Timothy B.
Uwe,
  To confirm: we need to add this <pluginManagement>...</pluginManagement> block 
in full, as it is, to the parent pom.xml, and we should not put the plugin under 
our regular <plugins> (which no longer have <pluginManagement>)?
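A minimal skeleton of where the two pieces could sit in a parent POM, sketched under the assumption of a standard <build> section (everything other than the element layout is abbreviated). The org.eclipse.m2e:lifecycle-mapping entry belongs under <pluginManagement>, where Maven never executes anything; that artifact is commonly reported as not being published to Maven Central, so listing it under <build><plugins> would likely make Maven fail while trying to resolve it. The real forbiddenapis plugin stays under <build><plugins> so the command-line build keeps running it.

```xml
<build>
  <pluginManagement>
    <plugins>
      <!-- m2e-only metadata: read by Eclipse, never executed by Maven -->
      <plugin>
        <groupId>org.eclipse.m2e</groupId>
        <artifactId>lifecycle-mapping</artifactId>
        <version>1.0.0</version>
        <configuration>
          <!-- lifecycleMappingMetadata block exactly as in Uwe's message -->
        </configuration>
      </plugin>
    </plugins>
  </pluginManagement>

  <plugins>
    <!-- the real check: Maven runs this on every build -->
    <plugin>
      <groupId>de.thetaphi</groupId>
      <artifactId>forbiddenapis</artifactId>
      <!-- version, executions and signatures as already configured (omitted) -->
    </plugin>
  </plugins>
</build>
```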

[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Hudson (JIRA)

[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289932#comment-14289932 ]

Hudson commented on TIKA-1529:
--

UNSTABLE: Integrated in tika-trunk-jdk1.7 #449 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/449/])
TIKA-1529: turn forbidden-apis back on and clean up all mentions of UTF-8 (tallison: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1654351)
* /tika/trunk/tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java
* /tika/trunk/tika-app/src/main/java/org/apache/tika/gui/TikaGUI.java
* /tika/trunk/tika-app/src/test/java/org/apache/tika/cli/TikaCLITest.java
* /tika/trunk/tika-bundle/src/test/java/org/apache/tika/bundle/BundleIT.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/config/ServiceLoader.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/detect/MagicDetector.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/detect/NameDetector.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/embedder/ExternalEmbedder.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/fork/ForkClient.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/io/IOUtils.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/language/LanguageIdentifier.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/language/LanguageProfilerBuilder.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/parser/external/ExternalParser.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/detect/TextDetectorTest.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/io/TailStreamTest.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/io/TikaInputStreamTest.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/language/LanguageIdentifierTest.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/language/LanguageProfilerBuilderTest.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/mime/MimeDetectionTest.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/sax/BasicContentHandlerFactoryTest.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/sax/BodyContentHandlerTest.java
* /tika/trunk/tika-example/src/main/java/org/apache/tika/example/DumpTikaConfigExample.java
* /tika/trunk/tika-parent/pom.xml
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/chm/accessor/ChmDirectoryListingSet.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/chm/accessor/ChmItsfHeader.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/chm/accessor/ChmItspHeader.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/chm/accessor/ChmLzxcControlData.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/chm/accessor/ChmPmgiHeader.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/chm/accessor/ChmPmglHeader.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/chm/core/ChmConstants.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/chm/core/ChmExtractor.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/epub/EpubParser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/gdal/GDALParser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/image/ImageMetadataExtractor.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/image/xmp/JempboxExtractor.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/iptc/IptcAnpaParser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mat/MatParser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mbox/OutlookPSTParser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mp3/LyricsHandler.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRConfig.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/odf/OpenDocumentParser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pkg/ZipContainerDetector.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/video/FLVParser.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/embedder/ExternalEmbedderTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/AutoDetectParserTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/ParsingReaderTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/chm/TestChmBlockInfo.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/chm/TestChmExtraction.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/chm/TestChmItspHeader.java
* 

RE: Forbidden-APIS no longer ran because of carzy POM change

2015-01-23 Thread Allison, Timothy B.
Will do.

-Original Message-
From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov] 
Sent: Friday, January 23, 2015 2:11 PM
To: dev@tika.apache.org
Subject: Re: Forbidden-APIS no longer ran because of carzy POM change

Awesome. Thanks, Uwe.

Tim, do you want to put that in, or do you want me to?

RE: Forbidden-APIS no longer ran because of carzy POM change

2015-01-23 Thread Uwe Schindler
Hi Timothy,

Your commit looks fine. Basically, this pluginManagement section just contains 
a fake plugin that is never actually executed, but is used by Eclipse to detect 
which plugins map to internal lifecycles of the Eclipse IDE. Eclipse uses this, 
for example, to map how the Maven compiler plugin is executed inside Eclipse 
(using the ECJ compiler), or to map the surefire plugin to the internal Eclipse 
test runner.

Our addition to the parent POM just tells Eclipse how to map the forbidden-apis 
plugin: to *nothing*, i.e. just ignore it inside the Eclipse IDE.

I will try it with Eclipse later to make sure all is fine, but it looks good to 
me. I have not yet tried this setup with parent POMs, but I assume it should be 
fine.

Uwe
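To illustrate the mapping described above: <ignore/> is only one of the actions m2e accepts in a pluginExecution. The m2e "execution not covered" page linked earlier in the thread also documents an <execute> action, which has Eclipse run the matched goals inside the IDE. The fragment below is a hedged sketch of that alternative for the same filter; it has not been tested with forbidden-apis, and the setting shown is assumed to keep the goals out of incremental IDE builds.

```xml
<pluginExecution>
  <pluginExecutionFilter>
    <groupId>de.thetaphi</groupId>
    <artifactId>forbiddenapis</artifactId>
    <versionRange>[1.0,)</versionRange>
    <goals>
      <goal>check</goal>
      <goal>testCheck</goal>
    </goals>
  </pluginExecutionFilter>
  <action>
    <!-- alternative to <ignore/>: let m2e run the goals inside Eclipse;
         runOnIncremental=false is meant to skip them on incremental builds -->
    <execute>
      <runOnIncremental>false</runOnIncremental>
    </execute>
  </action>
</pluginExecution>
```

For a pure check that does not change the build output, mapping it to <ignore/> keeps the IDE build simpler, while the command-line Maven build still runs the check.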

[jira] [Resolved] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Tim Allison (JIRA)

 [ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-1529.
---
Resolution: Fixed

Fixes made in r1654351.  Let me know if there are any surprises.

 Turn forbidden-apis back on
 ---

 Key: TIKA-1529
 URL: https://issues.apache.org/jira/browse/TIKA-1529
 Project: Tika
  Issue Type: Bug
Reporter: Tim Allison
Priority: Minor

 [~thetaphi] recently noticed that forbidden-apis was turned off in r1624185, 
 and he submitted a patch to the dev list.  Let's turn it back on.





RE: Forbidden-APIS no longer ran because of carzy POM change

2015-01-23 Thread Uwe Schindler
Works fine here!

After I removed the manual override of the plugin lifecycle settings, restarted 
Eclipse and ran a Maven update, Tika built successfully. So the setting in the 
parent POM is enough.
I will update the forbidden-apis documentation to help others, too.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Uwe Schindler [mailto:u...@thetaphi.de]
 Sent: Friday, January 23, 2015 9:08 PM
 To: dev@tika.apache.org
 Subject: RE: Forbidden-APIS no longer ran because of carzy POM change
 
 Hi Timothy,
 
 Your commit looks fine. Basically, this pluginManagement section is just
 containing some fake plugin that is never actively executed, but used by
 Eclipse to detect which plugins map to internal lifecycles of the Eclipse 
 IDE. It
 uses this to map for example how to execute the compile maven plugin
 inside Eclipse (use ECJ compiler) or let the surefire plugin map to the 
 internal
 Eclipse test runner.
 
 Our addition through the parent POM just tells eclipse how to map the
 forbidden-apis plugin: To *nothing*, just ignore it inside the Eclipse IDE.
 
 I will try it with Eclipse later, to make sure all is fine. But looks good to 
 me. I
 have not yet tried the setup with parent POMs, but I assume this should be
 fine.
 
 Uwe
 
 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de
 
 
  -Original Message-
  From: Allison, Timothy B. [mailto:talli...@mitre.org]
  Sent: Friday, January 23, 2015 8:35 PM
  To: dev@tika.apache.org
  Subject: RE: Forbidden-APIS no longer ran because of carzy POM change
 
  Uwe,
To confirm, we need to add this
  pluginManagement.../pluginManagement fully as it is in the parent
  pom.xml, we should not put the plugin under our regular plugins (which
  no longer have pluginManagement?
 
  -Original Message-
  From: Uwe Schindler [mailto:u...@thetaphi.de]
  Sent: Friday, January 23, 2015 11:47 AM
  To: dev@tika.apache.org
  Subject: RE: Forbidden-APIS no longer ran because of carzy POM change
 
  Hi,
 
  I investigated further. I had the plugin disabled in my Eclipse
  (you can do this with a quick fix for the whole workspace). In fact, if
  you remove that setting, it also fails in Eclipse Luna.
 
  If we want to hide the plugin automatically from all Eclipse
  versions through our own POM file, this is what the quick fix also
  allows you to do for the current project:
 
  <pluginManagement>
    <plugins>
      <!--This plugin's configuration is used to store Eclipse m2e settings
          only. It has no influence on the Maven build itself.-->
      <plugin>
        <groupId>org.eclipse.m2e</groupId>
        <artifactId>lifecycle-mapping</artifactId>
        <version>1.0.0</version>
        <configuration>
          <lifecycleMappingMetadata>
            <pluginExecutions>
              <pluginExecution>
                <pluginExecutionFilter>
                  <groupId>de.thetaphi</groupId>
                  <artifactId>forbiddenapis</artifactId>
                  <versionRange>[1.0,)</versionRange>
                  <goals>
                    <goal>check</goal>
                    <goal>testCheck</goal>
                  </goals>
                </pluginExecutionFilter>
                <action>
                  <ignore/>
                </action>
              </pluginExecution>
            </pluginExecutions>
          </lifecycleMappingMetadata>
        </configuration>
      </plugin>
    </plugins>
  </pluginManagement>
 
  This can be put into tika-parent's POM.
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
 
   -Original Message-
   From: Uwe Schindler [mailto:u...@thetaphi.de]
   Sent: Friday, January 23, 2015 5:18 PM
   To: dev@tika.apache.org
   Subject: RE: Forbidden-APIS no longer ran because of carzy POM
   change
  
   Hi, this may also help; it contains the needed information:
  
   https://www.eclipse.org/m2e/documentation/m2e-execution-not-covered.html
  
   In fact the problem is: Eclipse has no idea how this plugin should
   be executed internally in Eclipse. But as this is just a check plugin
   that does not affect the build output at all, you can leave it disabled.
  
   If you scroll down, you see that Eclipse 4.2+ fixes this problem:
   Disable the plugin for Maven using Project properties - Maven -
   Lifecycle mappings - 

Re: Forbidden-APIS no longer ran because of carzy POM change

2015-01-23 Thread Konstantin Gribov
Hi, Uwe.

There are several places in Tika where forbidden-apis will report errors.
I don't know if there is a better way to fall back, e.g. in one of the CHM
parser classes:

try {
  dle.setName(new String(bytes, "UTF-8"));
} catch (UnsupportedEncodingException e) {
  dle.setName(new String(bytes));
}

Could you add support for a special annotation (like
@SuppressWarnings("forbiddenapis") on an element) to avoid emitting a build
error in special cases like the one above?
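
For the CHM case above, a fallback may not be needed at all. A minimal sketch, assuming Java 7+ (for java.nio.charset.StandardCharsets): the String(byte[], Charset) constructor declares no checked exception, and every Java platform is required to support UTF-8, so the default-charset new String(bytes) branch, which is the part forbidden-apis flags, can simply go away. The class and method names are hypothetical; on Java 6, Charset.forName("UTF-8") could stand in for the constant.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper class; not part of Tika.
public class Utf8Names {
    // UTF-8 is guaranteed to exist on every Java platform, and the
    // String(byte[], Charset) constructor declares no checked exception,
    // so no try/catch and no default-charset fallback is required.
    static String decode(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] bytes = "index.hhc".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(bytes)); // prints index.hhc
    }
}
```

With this, the call site would shrink to a single dle.setName(...) line with nothing left for forbidden-apis to complain about.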

-- 
Best regards,
Konstantin Gribov

Fri Jan 23 2015 at 15:10:18, Uwe Schindler <u...@thetaphi.de>:

Here is the patch; the mailing list swallowed it:

 Index: tika-parent/pom.xml
 ===================================================================
 --- tika-parent/pom.xml (revision 1654171)
 +++ tika-parent/pom.xml (working copy)
 @@ -274,7 +274,6 @@
    </properties>
  
    <build>
 -    <pluginManagement>
      <plugins>
        <plugin>
          <artifactId>maven-compiler-plugin</artifactId>
 @@ -287,7 +286,7 @@
        <plugin>
          <groupId>de.thetaphi</groupId>
          <artifactId>forbiddenapis</artifactId>
 -        <version>1.6.1</version>
 +        <version>1.7</version>
          <configuration>
            <targetVersion>${maven.compiler.target}</targetVersion>
            <internalRuntimeForbidden>true</internalRuntimeForbidden>
 @@ -322,7 +321,6 @@
          <version>2.3</version>
        </plugin>
      </plugins>
 -    </pluginManagement>
    </build>
  
    <profiles>

 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de


  -Original Message-
  From: Uwe Schindler [mailto:u...@thetaphi.de]
  Sent: Friday, January 23, 2015 1:08 PM
  To: dev@tika.apache.org
  Subject: RE: Forbidden-APIS no longer ran because of carzy POM change
 
  The attached patch reverts the change and updates the forbidden plugin.
 
  Uwe
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
 
   -Original Message-
   From: Uwe Schindler [mailto:u...@thetaphi.de]
   Sent: Friday, January 23, 2015 1:00 PM
   To: dev@tika.apache.org
   Subject: RE: Forbidden-APIS no longer ran because of carzy POM change
  
   Here is the explanation of why the plugin is no longer called because of this:
  
   - Works for me too, but can anyone explain why? –  Andrew Swan May 15
   '13 at 6:26
   - @Andrew I think this works because m2e is not looking for plugins in
   pluginManagement, but only in build/plugins. In the Maven world, there
   is a difference between the two - the former defines "if you happen to
   use this plugin, here's the configuration to use", whereas the latter
   states "use this plugin". See this post and its top two answers. –
   GreenGiant Jul 5 '13 at 17:52
   - I agree with @GreenGiant. I tried this solution but it then breaks
   the compilation since the aspectj plugin is not called before
   compilation. –  Pierre Aug 30 '13 at 20:21
  
   This explains the change. In fact, placing the plugins in
   pluginManagement disables them unless they are explicitly configured in
   a sub-module. So this commit should be reverted.
  
  
   In fact, the bug described here no longer applies to later M2E
   installations. M2E still complains about plugins that Eclipse does not
   know about, but this does not prevent you from using Eclipse. So I
   would strongly ask that the commit be reverted, because it breaks the build.
  
   Uwe
  
   -
   Uwe Schindler
   H.-H.-Meier-Allee 63, D-28213 Bremen
   http://www.thetaphi.de
   eMail: u...@thetaphi.de
  
  
-Original Message-
From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Friday, January 23, 2015 12:11 PM
To: dev@tika.apache.org
Subject: Forbidden-APIS no longer ran because of carzy POM change
   
Hi,
   
While checking the problems around the ExternalParsers, I just noticed
that TIKA's build no longer runs the forbidden-apis Maven plugin, so we
got a few new violations, especially regarding toUpper/LowerCase(). In
fact, the following commit broke this:
   
Revision: 1624185
Author: mattmann
Date: Thursday, 11 September 2014 05:11:19
Message:
surround in plugin management to resolve
http://stackoverflow.com/questions/6352208/how-to-solve-plugin-execution-not-covered-by-lifecycle-configuration-for-sprin

Modified : /tika/trunk/tika-parent/pom.xml
   
Since that change, the plugin is no longer run by default. I have no
idea why it was done like this, but in fact it broke some of the
globally defined check tasks.
I have no idea how to re-enable it easily, so I cannot help, but
reverting that commit restores the behavior. What is the reason for
this commit? There is not even an issue about it. It seems to be a
workaround for some Eclipse issue, but in fact it disables the plugins
entirely. To re-enable forbidden-apis you now have to explicitly enable
it in every module (because pluginManagement just gives 

[jira] [Commented] (TIKA-1511) Create a parser for SQLite3

2015-01-23 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290303#comment-14290303
 ] 

Tim Allison commented on TIKA-1511:
---

I'm not sure I understand the need for that.  Won't you be able to send in 
whatever handler you want via the regular call to parse and by attaching a 
ParsingEmbeddedDocumentExtractor?  What, exactly, do you want to have when Tika 
has finished processing the Sqlite file?

 Create a parser for SQLite3
 ---

 Key: TIKA-1511
 URL: https://issues.apache.org/jira/browse/TIKA-1511
 Project: Tika
  Issue Type: New Feature
  Components: parser
Affects Versions: 1.6
Reporter: Luis Filipe Nassif
 Fix For: 1.8

 Attachments: TIKA-1511v1.patch, TIKA-1511v2.patch, TIKA-1511v3.patch, 
 testSQLLite3b.db, testSQLLite3b.db


 I think it would be very useful, as sqlite is used as data storage by a wide 
 range of applications. Opening the ticket to track it. 





[jira] [Commented] (TIKA-1521) Handle password protected 7zip files

2015-01-23 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290310#comment-14290310
 ] 

Tim Allison commented on TIKA-1521:
---

I'm getting the test failure on Windows with Java 1.8, but all is well with a 
fairly old update of 1.7 on RHEL.

 Handle password protected 7zip files
 

 Key: TIKA-1521
 URL: https://issues.apache.org/jira/browse/TIKA-1521
 Project: Tika
  Issue Type: Improvement
  Components: parser
Affects Versions: 1.7
Reporter: Nick Burch
 Fix For: 1.8


 While working on TIKA-1028, I noticed that while Commons Compress doesn't 
 currently handle decrypting password protected zip files, it does handle 
 password protected 7zip files.
 We should therefore add logic to the package parser to spot password 
 protected 7zip files, and fetch the password for them from a PasswordProvider 
 if one is given.





[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14290304#comment-14290304
 ] 

Tim Allison commented on TIKA-1529:
---

Thank you, [~thetaphi]!

 Turn forbidden-apis back on
 ---

 Key: TIKA-1529
 URL: https://issues.apache.org/jira/browse/TIKA-1529
 Project: Tika
  Issue Type: Bug
Reporter: Tim Allison
Priority: Minor

 [~thetaphi] recently noticed that forbidden-apis was turned off in r1624185, 
 and he submitted a patch to the dev list.  Let's turn it back on.





RE: Forbidden-APIS no longer ran because of carzy POM change

2015-01-23 Thread Uwe Schindler
The attached patch reverts the change and updates the forbidden plugin.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Uwe Schindler [mailto:u...@thetaphi.de]
 Sent: Friday, January 23, 2015 1:00 PM
 To: dev@tika.apache.org
 Subject: RE: Forbidden-APIS no longer ran because of carzy POM change
 
 Here is the explanation of why the plugin is no longer called because of this:
 
 - Works for me too, but can anyone explain why? –  Andrew Swan May 15 '13
 at 6:26
 - @Andrew I think this works because m2e is not looking for plugins in
 pluginManagement, but only in build/plugins. In the Maven world, there is a
 difference between the two - the former defines "if you happen to use this
 plugin, here's the configuration to use", whereas the latter states "use this
 plugin". See this post and its top two answers. –  GreenGiant Jul 5 '13 at
 17:52
 - I agree with @GreenGiant. I tried this solution but it then breaks the
 compilation since the aspectj plugin is not called before compilation. –  
 Pierre
 Aug 30 '13 at 20:21
 
 This explains the change. In fact, placing the plugins in pluginManagement
 disables them unless they are explicitly configured in a sub-module. So this
 commit should be reverted.
 
 
 In fact, the bug described here no longer applies to later M2E installations.
 M2E still complains about plugins that Eclipse does not know about, but this
 does not prevent you from using Eclipse. So I would strongly ask that the
 commit be reverted, because it breaks the build.
 
 Uwe
 
 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de
 
 
  -Original Message-
  From: Uwe Schindler [mailto:u...@thetaphi.de]
  Sent: Friday, January 23, 2015 12:11 PM
  To: dev@tika.apache.org
  Subject: Forbidden-APIS no longer ran because of carzy POM change
 
  Hi,
 
  While checking the problems around the ExternalParsers, I just noticed
  that TIKA's build no longer runs the forbidden-apis Maven plugin, so we
  got a few new violations, especially regarding toUpper/LowerCase(). In
  fact, the following commit broke this:
 
  Revision: 1624185
  Author: mattmann
  Date: Thursday, 11 September 2014 05:11:19
  Message:
  surround in plugin management to resolve
  http://stackoverflow.com/questions/6352208/how-to-solve-plugin-execution-not-covered-by-lifecycle-configuration-for-sprin
  
  Modified : /tika/trunk/tika-parent/pom.xml
 
  Since that change, the plugin is no longer run by default. I have no
  idea why it was done like this, but in fact it broke some of the
  globally defined check tasks.
  I have no idea how to re-enable it easily, so I cannot help, but
  reverting that commit restores the behavior. What is the reason for
  this commit? There is not even an issue about it. It seems to be a
  workaround for some Eclipse issue, but in fact it disables the plugins
  entirely. To re-enable forbidden-apis you now have to explicitly enable
  it in every module (because pluginManagement only supplies the
  configuration of a plugin, whereas without it the plugin's execution is
  also enabled).
 
  In addition, version 1.7 of forbidden-apis is already available, so you
  can replace version 1.6.1 with version 1.7 (which fixes a few bugs with
  Java 8 and Java 9).
 
  The following new violations were found, and in fact those broke
  code in the Turkish locale:
  [INFO] ------------------------------------------------------------------------
  [INFO] Building Apache Tika core 1.8-SNAPSHOT
  [INFO] ------------------------------------------------------------------------
  [INFO]
  [INFO] --- forbiddenapis:1.7:testCheck (default) @ tika-core ---
  [INFO] Scanning for classes to check...
  [INFO] Reading bundled API signatures: jdk-unsafe
  [INFO] Reading bundled API signatures: jdk-deprecated
  [INFO] Loading classes to check...
  [INFO] Scanning for API signatures and dependencies...
  [ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
  [ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:79)
  [ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
  [ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:80)
  [ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
  [ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:88)
  [ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
  [ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:133)
  [ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
  [ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest
  

[jira] [Commented] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 JDK-8055301 so Turkish Tika users can still use non-external parsers

2015-01-23 Thread Konstantin Gribov (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289172#comment-14289172
 ] 

Konstantin Gribov commented on TIKA-1526:
-

[~thetaphi], I understand that this is a JDK bug with the {{{tr}}} locale. Could 
they use some workaround with {{{Locale.setDefault}}} if the user's locale is {{{tr}}}?
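
To make the JDK side of this concrete, here is a minimal, self-contained sketch (Java 7+) of the failure pattern. It assumes the JDK upper-cases the launch-mechanism property value with the default locale before calling Enum.valueOf(); the issue text calls it a lowercasing problem, and either direction breaks on the Turkish dotted/dotless i. In the tr locale, "posix_spawn".toUpperCase() yields a dotted capital İ (U+0130), so the enum lookup fails. The enum and class names are hypothetical stand-ins, not the JDK's internal code. The last step shows the shape of the Locale.setDefault idea; note that it is a JVM-wide change, so it would presumably have to happen before the JDK first initializes its process launcher, and it affects every other locale-sensitive call in the application.

```java
import java.util.Locale;

public class TurkishLocaleDemo {
    // Hypothetical stand-in for a JDK-internal enum of launch mechanisms.
    enum LaunchMechanism { FORK, POSIX_SPAWN, VFORK }

    // Locale-sensitive: uses the JVM default locale for case mapping.
    static LaunchMechanism parseWithDefaultLocale(String value) {
        return LaunchMechanism.valueOf(value.toUpperCase());
    }

    // Locale-neutral: Locale.ROOT always maps 'i' to 'I'.
    static LaunchMechanism parseWithRootLocale(String value) {
        return LaunchMechanism.valueOf(value.toUpperCase(Locale.ROOT));
    }

    public static void main(String[] args) {
        Locale saved = Locale.getDefault();
        try {
            Locale.setDefault(new Locale("tr", "TR"));
            try {
                parseWithDefaultLocale("posix_spawn");
                System.out.println("default locale: parsed");
            } catch (IllegalArgumentException e) {
                // Here "posix_spawn".toUpperCase() contains U+0130, not 'I'.
                System.out.println("default locale: no such constant");
            }
            System.out.println("Locale.ROOT: " + parseWithRootLocale("posix_spawn"));

            // The Locale.setDefault idea: move the JVM default away from tr.
            Locale.setDefault(Locale.ROOT);
            System.out.println("after setDefault: " + parseWithDefaultLocale("posix_spawn"));
        } finally {
            Locale.setDefault(saved); // restore the caller's default locale
        }
    }
}
```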

 ExternalParser should trap/ignore/workarround JDK-8047340 & JDK-8055301 so 
 Turkish Tika users can still use non-external parsers
 

 Key: TIKA-1526
 URL: https://issues.apache.org/jira/browse/TIKA-1526
 Project: Tika
  Issue Type: Wish
Reporter: Hoss Man

 the JDK has numerous pain points regarding the Turkish locale, posix_spawn 
 lowercasing being one of them...
 https://bugs.openjdk.java.net/browse/JDK-8047340
 https://bugs.openjdk.java.net/browse/JDK-8055301
 As of Tika 1.7, the TesseractOCRParser (which is an ExternalParser) is 
 enabled & configured by default in Tika, and uses ExternalParser.check to see 
 if tesseract is available -- but because of the JDK bug, this means that Tika 
 fails fast for Turkish users on BSD/UNIX variants (including MacOSX) like 
 so...
 {noformat}
   [junit4] Throwable #1: java.lang.Error: posix_spawn is not a supported process launch mechanism on this platform.
   [junit4]   at java.lang.UNIXProcess$1.run(UNIXProcess.java:105)
   [junit4]   at java.lang.UNIXProcess$1.run(UNIXProcess.java:94)
   [junit4]   at java.security.AccessController.doPrivileged(Native Method)
   [junit4]   at java.lang.UNIXProcess.<clinit>(UNIXProcess.java:92)
   [junit4]   at java.lang.ProcessImpl.start(ProcessImpl.java:130)
   [junit4]   at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
   [junit4]   at java.lang.Runtime.exec(Runtime.java:620)
   [junit4]   at java.lang.Runtime.exec(Runtime.java:485)
   [junit4]   at org.apache.tika.parser.external.ExternalParser.check(ExternalParser.java:344)
   [junit4]   at org.apache.tika.parser.ocr.TesseractOCRParser.hasTesseract(TesseractOCRParser.java:117)
   [junit4]   at org.apache.tika.parser.ocr.TesseractOCRParser.getSupportedTypes(TesseractOCRParser.java:90)
   [junit4]   at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81)
   [junit4]   at org.apache.tika.parser.DefaultParser.getParsers(DefaultParser.java:95)
   [junit4]   at org.apache.tika.parser.CompositeParser.getSupportedTypes(CompositeParser.java:229)
   [junit4]   at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81)
   [junit4]   at org.apache.tika.parser.CompositeParser.getParser(CompositeParser.java:209)
   [junit4]   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
   [junit4]   at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
 {noformat}
 ...unless they go out of their way to white list only the parsers they 
 need/want so TesseractOCRParser (and any other ExternalParsers) will never 
 even be check()ed.
 It would be nice if Tika's ExternalParser class added a similar 
 hack/workaround to what was done in SOLR-6387 to trap these types of errors. 
 In Solr we just propagate a better error explaining why Java hates the 
 Turkish language...
 {code}
 } catch (Error err) {
   if (err.getMessage() != null && (err.getMessage().contains("posix_spawn")
       || err.getMessage().contains("UNIXProcess"))) {
     log.warn("Error forking command due to JVM locale bug (see https://issues.apache.org/jira/browse/SOLR-6387): " + err.getMessage());
     return "(error executing: " + cmd + ")";
   }
 }
 {code}
 ...but with Tika, it might be better for all ExternalParsers to just opt 
 out as if they don't recognize the filetype when they detect this type of 
 error from the check method (or perhaps it would be better if 
 AutoDetectParser handled this? ... i'm not really sure how it would best fit 
 into Tika's architecture)
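
A rough sketch of the opt-out idea from the issue, in plain Java 8. Everything here is hypothetical: ProcessProbe stands in for whatever ExternalParser.check(...) actually runs, and none of these names are Tika API. Following the Solr snippet above, an Error whose message mentions posix_spawn or UNIXProcess is treated as "tool not available", so a parser could report no supported types instead of failing the whole AutoDetectParser. The UNIXProcess match presumably covers later calls, where the failed static initializer resurfaces as a NoClassDefFoundError. Any other Error is rethrown rather than swallowed.

```java
import java.io.IOException;

public class SafeExternalCheck {
    // Hypothetical functional interface standing in for the process probe
    // that a check such as ExternalParser.check(...) performs; NOT Tika API.
    interface ProcessProbe {
        boolean run() throws IOException;
    }

    // Returns false (the parser opts out) instead of propagating the JDK's
    // java.lang.Error when processes cannot be launched because of the
    // Turkish-locale bug (JDK-8047340 / JDK-8055301).
    static boolean probe(ProcessProbe check) {
        try {
            return check.run();
        } catch (IOException e) {
            return false; // command missing or not runnable
        } catch (Error err) {
            if (isLocaleLaunchBug(err)) {
                return false;
            }
            throw err; // unrelated errors (e.g. OutOfMemoryError) still propagate
        }
    }

    // Same message test as the Solr workaround quoted in the issue.
    static boolean isLocaleLaunchBug(Error err) {
        String msg = err.getMessage();
        return msg != null && (msg.contains("posix_spawn") || msg.contains("UNIXProcess"));
    }

    public static void main(String[] args) {
        boolean available = probe(() -> {
            throw new Error("posix_spawn is not a supported process launch mechanism on this platform.");
        });
        System.out.println(available); // prints false
    }
}
```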





[jira] [Commented] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 JDK-8055301 so Turkish Tika users can still use non-external parsers

2015-01-23 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289125#comment-14289125
 ] 

Uwe Schindler commented on TIKA-1526:
-

[~grossws]: This bug is not in Maven itself; the problem here is an unsolved bug 
in the JDK itself. Maven is perfectly fine, but because of the JDK bug, Maven 
cannot spawn external processes.


RE: Forbidden-APIS no longer ran because of carzy POM change

2015-01-23 Thread Uwe Schindler
Here is the explanation of why the plugin is no longer called because of this:

- Works for me too, but can anyone explain why? –  Andrew Swan May 15 '13 at 
6:26   
- @Andrew I think this works because m2e is not looking for plugins in 
pluginManagement, but only in build/plugins. In the Maven world, there is a 
difference between the two - the former defines "if you happen to use this 
plugin, here's the configuration to use", whereas the latter states "use this 
plugin". See this post and its top two answers. –  GreenGiant Jul 5 '13 at 
17:52 
- I agree with @GreenGiant. I tried this solution but it then breaks the 
compilation since the aspectj plugin is not called before compilation. –  
Pierre Aug 30 '13 at 20:21

This explains the change. In fact, placing the plugins in pluginManagement 
disables them unless they are explicitly configured in a sub-module. So this 
commit should be reverted.


In fact, the bug described here no longer applies to later M2E installations. 
M2E still complains about plugins that Eclipse does not know about, but this 
does not prevent you from using Eclipse. So I would strongly ask that the 
commit be reverted, because it breaks the build.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Uwe Schindler [mailto:u...@thetaphi.de]
 Sent: Friday, January 23, 2015 12:11 PM
 To: dev@tika.apache.org
 Subject: Forbidden-APIS no longer ran because of carzy POM change
 
 Hi,
 
 While checking the problems around the ExternalParsers, I just noticed that
 TIKA's build no longer runs the forbidden-apis Maven plugin, so we got a few
 new violations, especially regarding toUpper/LowerCase(). In fact, the
 following commit broke this:
 
 Revision: 1624185
 Author: mattmann
 Date: Thursday, 11 September 2014 05:11:19
 Message:
 surround in plugin management to resolve
 http://stackoverflow.com/questions/6352208/how-to-solve-plugin-execution-not-covered-by-lifecycle-configuration-for-sprin
 
 Modified : /tika/trunk/tika-parent/pom.xml
 
 Since that change, the plugin is no longer run by default. I have no idea why
 it was done like this, but in fact it broke some of the globally defined check
 tasks.
 I have no idea how to re-enable it easily, so I cannot help, but reverting
 that commit restores the behavior. What is the reason for this commit? There
 is not even an issue about it. It seems to be a workaround for some Eclipse
 issue, but in fact it disables the plugins entirely. To re-enable
 forbidden-apis you now have to explicitly enable it in every module (because
 pluginManagement only supplies the configuration of a plugin, whereas without
 it the plugin's execution is also enabled).
 
 In addition, version 1.7 of forbidden-apis is already available, so you can
 replace version 1.6.1 with version 1.7 (which fixes a few bugs with Java 8
 and Java 9).
 
 The following new violations were found, and in fact those broke code in the
 Turkish locale:
 [INFO] ------------------------------------------------------------------------
 [INFO] Building Apache Tika core 1.8-SNAPSHOT
 [INFO] ------------------------------------------------------------------------
 [INFO]
 [INFO] --- forbiddenapis:1.7:testCheck (default) @ tika-core ---
 [INFO] Scanning for classes to check...
 [INFO] Reading bundled API signatures: jdk-unsafe
 [INFO] Reading bundled API signatures: jdk-deprecated
 [INFO] Loading classes to check...
 [INFO] Scanning for API signatures and dependencies...
 [ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
 [ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:79)
 [ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
 [ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:80)
 [ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
 [ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:88)
 [ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
 [ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:133)
 [ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
 [ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:176)
 [ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
 [ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:221)
 [ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
 [ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:273)
 [ERROR] Scanned 52 (and 331 related) class file(s) for forbidden 
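
The violations listed above come from String#toLowerCase() called without a locale argument, which forbidden-apis' jdk-unsafe signatures report as "[Uses default locale]". A minimal sketch of the usual fix, assuming the strings being lowercased are identifiers or keywords rather than user-facing text: pass Locale.ROOT explicitly. The class name is hypothetical. The demo shows why the Turkish locale breaks such code: there, a capital I lowercases to the dotless ı (U+0131).

```java
import java.util.Locale;

// Hypothetical helper; not Tika API.
public class LocaleSafeLowerCase {
    // Locale.ROOT gives locale-neutral case mapping, so the result does not
    // depend on the JVM default locale (e.g. tr_TR on a Turkish system).
    static String lower(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale turkish = new Locale("tr", "TR");
        // In Turkish, 'I' lowercases to dotless 'ı' (U+0131), giving "tıtle".
        System.out.println("TITLE".toLowerCase(turkish).equals("title")); // prints false
        System.out.println(lower("TITLE").equals("title"));               // prints true
    }
}
```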

RE: Forbidden-APIS no longer ran because of carzy POM change

2015-01-23 Thread Uwe Schindler
Here is the patch; the mailing list swallowed it:

Index: tika-parent/pom.xml
===================================================================
--- tika-parent/pom.xml (revision 1654171)
+++ tika-parent/pom.xml (working copy)
@@ -274,7 +274,6 @@
   </properties>
 
   <build>
-    <pluginManagement>
     <plugins>
       <plugin>
         <artifactId>maven-compiler-plugin</artifactId>
@@ -287,7 +286,7 @@
       <plugin>
         <groupId>de.thetaphi</groupId>
         <artifactId>forbiddenapis</artifactId>
-        <version>1.6.1</version>
+        <version>1.7</version>
         <configuration>
           <targetVersion>${maven.compiler.target}</targetVersion>
           <internalRuntimeForbidden>true</internalRuntimeForbidden>
@@ -322,7 +321,6 @@
         <version>2.3</version>
       </plugin>
     </plugins>
-    </pluginManagement>
   </build>
 
   <profiles>

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Uwe Schindler [mailto:u...@thetaphi.de]
 Sent: Friday, January 23, 2015 1:08 PM
 To: dev@tika.apache.org
 Subject: RE: Forbidden-APIS no longer ran because of carzy POM change
 
 The attached patch reverts the change and updates the forbidden plugin.
 
 Uwe
 
 -
 Uwe Schindler
 H.-H.-Meier-Allee 63, D-28213 Bremen
 http://www.thetaphi.de
 eMail: u...@thetaphi.de
 
 
  -Original Message-
  From: Uwe Schindler [mailto:u...@thetaphi.de]
  Sent: Friday, January 23, 2015 1:00 PM
  To: dev@tika.apache.org
  Subject: RE: Forbidden-APIS no longer ran because of carzy POM change
 
  Here is the explanation of why the plugin is no longer called because of this:
 
  - Works for me too, but can anyone explain why? –  Andrew Swan May 15
  '13 at 6:26
  - @Andrew I think this works because m2e is not looking for plugins in
  pluginManagement, but only in build/plugins. In the Maven world, there
  is a difference between the two - the former defines "if you happen to
  use this plugin, here's the configuration to use", whereas the latter
  states "use this plugin". See this post and its top two answers. –
  GreenGiant Jul 5 '13 at 17:52
  - I agree with @GreenGiant. I tried this solution but it then breaks
  the compilation since the aspectj plugin is not called before
  compilation. –  Pierre Aug 30 '13 at 20:21
 
  This explains the change. In fact, placing the plugins in
  pluginManagement disables them unless they are explicitly configured in
  a sub-module. So this commit should be reverted.
 
 
  In fact, the bug described here no longer applies to later M2E
  installations. M2E still complains about plugins that Eclipse does not
  know about, but this does not prevent you from using Eclipse. So I
  would strongly ask that the commit be reverted, because it breaks the build.
 
  Uwe
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
 
   -Original Message-
   From: Uwe Schindler [mailto:u...@thetaphi.de]
   Sent: Friday, January 23, 2015 12:11 PM
   To: dev@tika.apache.org
   Subject: Forbidden-APIS no longer ran because of carzy POM change
  
   Hi,
  
   While checking the problems around the ExternalParsers, I just noticed
   that TIKA's build no longer runs the forbidden-apis Maven plugin, so we
   got a few new violations, especially regarding toUpper/LowerCase(). In
   fact, the following commit broke this:
  
   Revision: 1624185
   Author: mattmann
   Date: Thursday, 11 September 2014 05:11:19
   Message:
   surround in plugin management to resolve
   http://stackoverflow.com/questions/6352208/how-to-solve-plugin-execution-not-covered-by-lifecycle-configuration-for-sprin
   
   Modified : /tika/trunk/tika-parent/pom.xml
  
   Since that change, the plugin is no longer run by default. I have no
   idea why it was done like this, but in fact it broke some of the
   globally defined check tasks.
   I have no idea how to re-enable it easily, so I cannot help, but
   reverting that commit restores the behavior. What is the reason for
   this commit? There is not even an issue about it. It seems to be a
   workaround for some Eclipse issue, but in fact it disables the plugins
   entirely. To re-enable forbidden-apis you now have to explicitly enable
   it in every module (because pluginManagement only supplies the
   configuration of a plugin, whereas without it the plugin's execution is
   also enabled).
  
   In addition, there is already version 1.7 of forbiddenapis, so you
   can replace
   1.6.1 of forbidden-apis with version 1.7 (which fixes a few bugs
   with Java 8 and Java 9).
  
   The following new violations were found - and in fact those broke
   code in turkish locale:
   [INFO]
   
   --
   -- [INFO] Building Apache Tika core 1.8-SNAPSHOT [INFO]
   --
   --
   [INFO]
   [INFO] --- forbiddenapis:1.7:testCheck (default) @ tika-core 

[jira] [Comment Edited] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 JDK-8055301 so Turkish Tika users can still use non-external parsers

2015-01-23 Thread Konstantin Gribov (JIRA)

[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289172#comment-14289172 ]

Konstantin Gribov edited comment on TIKA-1526 at 1/23/15 12:24 PM:
---

[~thetaphi], I understand that this is a JDK bug with the {{tr}} locale. Could they use 
a workaround with {{Locale.setDefault}} if the user's locale is {{tr}}?


was (Author: grossws):
[~thetaphi], I understand that this is jdk bug with {{{tr}}} locale. Can they 
use some workaround with {{{Locale.setDefault}}} if user's locale is {{{tr}}}?
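A minimal sketch of both the failure mode and the {{Locale.setDefault}} idea, in plain Java. It assumes (consistent with the error text "posix_spawn is not a supported process launch mechanism" and the UNIXProcess.<clinit> frames in the traces in this thread) that the JDK upper-cases the launch-mechanism name with the default locale and looks it up as an enum constant. The `LaunchMechanism` enum and the `withRootLocale` helper are hypothetical stand-ins, not JDK or Tika API. Whether switching the default locale before the first process launch is enough to avoid the real JDK bug is an assumption, not something verified here.

```java
import java.util.Locale;
import java.util.concurrent.Callable;

public final class TurkishLocaleWorkaround {

    // Hypothetical stand-in for a JDK-internal launch-mechanism enum.
    private enum LaunchMechanism { FORK, POSIX_SPAWN, VFORK }

    private TurkishLocaleWorkaround() {}

    // Mimics an enum lookup after upper-casing the name in the given locale.
    public static boolean lookupSucceeds(String name, Locale locale) {
        try {
            LaunchMechanism.valueOf(name.toUpperCase(locale));
            return true;
        } catch (IllegalArgumentException e) {
            // In Turkish, 'i' upper-cases to dotted capital I (U+0130), so the
            // resulting name matches no enum constant.
            return false;
        }
    }

    // Runs the action with Locale.ROOT as the JVM default when the current default
    // is Turkish, then restores the original default. Locale.setDefault is
    // JVM-global, so this is unsafe if other threads rely on the default meanwhile.
    public static <T> T withRootLocale(Callable<T> action) throws Exception {
        Locale saved = Locale.getDefault();
        if (!"tr".equals(saved.getLanguage())) {
            return action.call();
        }
        Locale.setDefault(Locale.ROOT);
        try {
            return action.call();
        } finally {
            Locale.setDefault(saved);
        }
    }

    public static void main(String[] args) throws Exception {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        System.out.println(lookupSucceeds("posix_spawn", turkish));     // false
        System.out.println(lookupSucceeds("posix_spawn", Locale.ROOT)); // true

        Locale.setDefault(turkish);
        String upper = withRootLocale(() -> "posix_spawn".toUpperCase());
        System.out.println(upper);                                      // POSIX_SPAWN
    }
}
```

The save/restore in `finally` keeps the change scoped to one call. A real application might instead set the default once at startup, since the JDK appears to resolve the launch mechanism only once per JVM.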

 ExternalParser should trap/ignore/workarround JDK-8047340 & JDK-8055301 so 
 Turkish Tika users can still use non-external parsers
 

 Key: TIKA-1526
 URL: https://issues.apache.org/jira/browse/TIKA-1526
 Project: Tika
  Issue Type: Wish
Reporter: Hoss Man

 The JDK has numerous pain points regarding the Turkish locale, posix_spawn 
 lowercasing being one of them...
 https://bugs.openjdk.java.net/browse/JDK-8047340
 https://bugs.openjdk.java.net/browse/JDK-8055301
 As of Tika 1.7, the TesseractOCRParser (which is an ExternalParser) is 
 enabled & configured by default in Tika, and uses ExternalParser.check to see 
 if tesseract is available -- but because of the JDK bug, this means that Tika 
 fails fast for Turkish users on BSD/UNIX variants (including Mac OS X) like 
 so...
 {noformat}
   [junit4] Throwable #1: java.lang.Error: posix_spawn is not a supported process launch mechanism on this platform.
   [junit4]   at java.lang.UNIXProcess$1.run(UNIXProcess.java:105)
   [junit4]   at java.lang.UNIXProcess$1.run(UNIXProcess.java:94)
   [junit4]   at java.security.AccessController.doPrivileged(Native Method)
   [junit4]   at java.lang.UNIXProcess.<clinit>(UNIXProcess.java:92)
   [junit4]   at java.lang.ProcessImpl.start(ProcessImpl.java:130)
   [junit4]   at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
   [junit4]   at java.lang.Runtime.exec(Runtime.java:620)
   [junit4]   at java.lang.Runtime.exec(Runtime.java:485)
   [junit4]   at org.apache.tika.parser.external.ExternalParser.check(ExternalParser.java:344)
   [junit4]   at org.apache.tika.parser.ocr.TesseractOCRParser.hasTesseract(TesseractOCRParser.java:117)
   [junit4]   at org.apache.tika.parser.ocr.TesseractOCRParser.getSupportedTypes(TesseractOCRParser.java:90)
   [junit4]   at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81)
   [junit4]   at org.apache.tika.parser.DefaultParser.getParsers(DefaultParser.java:95)
   [junit4]   at org.apache.tika.parser.CompositeParser.getSupportedTypes(CompositeParser.java:229)
   [junit4]   at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81)
   [junit4]   at org.apache.tika.parser.CompositeParser.getParser(CompositeParser.java:209)
   [junit4]   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
   [junit4]   at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
 {noformat}
 ...unless they go out of their way to whitelist only the parsers they 
 need/want, so that TesseractOCRParser (and any other ExternalParsers) will never 
 even be check()ed.
 It would be nice if Tika's ExternalParser class added a similar 
 hack/workaround to what was done in SOLR-6387 to trap these types of errors. 
 In Solr we just propagate a better error explaining why Java hates the 
 Turkish language...
 {code}
 } catch (Error err) {
   if (err.getMessage() != null && (err.getMessage().contains("posix_spawn")
       || err.getMessage().contains("UNIXProcess"))) {
     log.warn("Error forking command due to JVM locale bug (see "
         + "https://issues.apache.org/jira/browse/SOLR-6387): " + err.getMessage());
     return "(error executing: " + cmd + ")";
   }
   // not the locale bug: rethrow unchanged
   throw err;
 }
 {code}
 ...but with Tika, it might be better for all ExternalParsers to just opt 
 out, as if they don't recognize the file type, when they detect this type of 
 error from the check method (or perhaps it would be better if 
 AutoDetectParser handled this? ... I'm not really sure how it would best fit 
 into Tika's architecture)
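One way the "opt out" idea could look, as a standalone sketch rather than a patch to Tika: a probe that reports an external command as unavailable when the binary is missing (an IOException) or when launching fails with the locale-related Error, using the same message checks as the Solr snippet. `ExternalCommandProbe`, `isAvailable`, and `isLocaleLaunchBug` are hypothetical names, not part of Tika's ExternalParser API. The note that later launch attempts surface as NoClassDefFoundError mentioning UNIXProcess is an assumption based on how the JVM reports a failed static initializer.

```java
import java.io.IOException;

public final class ExternalCommandProbe {

    private ExternalCommandProbe() {}

    // True if the Error looks like the JDK Turkish-locale launch bug. The first
    // failure comes from UNIXProcess's static initializer ("posix_spawn is not a
    // supported ..."); later attempts typically fail with a NoClassDefFoundError
    // whose message mentions UNIXProcess.
    public static boolean isLocaleLaunchBug(Error err) {
        String msg = err.getMessage();
        return msg != null && (msg.contains("posix_spawn") || msg.contains("UNIXProcess"));
    }

    // Returns true if the command could be started (its exit status is not
    // inspected). A missing binary or the locale bug both count as "not
    // available", so a caller can skip the external tool instead of failing.
    public static boolean isAvailable(String... command) {
        try {
            Process p = new ProcessBuilder(command).start();
            p.waitFor(); // assumes the probe command prints little output
            return true;
        } catch (IOException e) {
            return false; // command not found or not executable
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (Error err) {
            if (isLocaleLaunchBug(err)) {
                return false;
            }
            throw err; // unrelated errors are not swallowed
        }
    }

    public static void main(String[] args) {
        // Result depends on whether tesseract is installed on this machine.
        System.out.println(isAvailable("tesseract", "-v"));
    }
}
```

A parser built on this would report no supported types when `isAvailable` returns false, which is the "as if they don't recognize the file type" behavior described above.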



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 JDK-8055301 so Turkish Tika users can still use non-external parsers

2015-01-23 Thread Uwe Schindler (JIRA)

[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288963#comment-14288963 ]

Uwe Schindler commented on TIKA-1526:
-

I tried it with Maven, but this is all too funny: this bug also affects Maven itself...

{noformat}
[uschindler@lucene ~]$ export MAVEN_OPTS=-Duser.language=tr
[uschindler@lucene ~]$ mvn
---
constituent[0]: file:/usr/local/share/java/maven3/lib/aether-connector-wagon-1.13.1.jar
constituent[1]: file:/usr/local/share/java/maven3/lib/maven-repository-metadata-3.0.4.jar
constituent[2]: file:/usr/local/share/java/maven3/lib/plexus-sec-dispatcher-1.3.jar
constituent[3]: file:/usr/local/share/java/maven3/lib/aether-spi-1.13.1.jar
constituent[4]: file:/usr/local/share/java/maven3/lib/maven-compat-3.0.4.jar
constituent[5]: file:/usr/local/share/java/maven3/lib/plexus-component-annotations-1.5.5.jar
constituent[6]: file:/usr/local/share/java/maven3/lib/plexus-cipher-1.7.jar
constituent[7]: file:/usr/local/share/java/maven3/lib/sisu-guava-0.9.9.jar
constituent[8]: file:/usr/local/share/java/maven3/lib/maven-core-3.0.4.jar
constituent[9]: file:/usr/local/share/java/maven3/lib/plexus-utils-2.0.6.jar
constituent[10]: file:/usr/local/share/java/maven3/lib/wagon-provider-api-2.2.jar
constituent[11]: file:/usr/local/share/java/maven3/lib/maven-plugin-api-3.0.4.jar
constituent[12]: file:/usr/local/share/java/maven3/lib/maven-model-builder-3.0.4.jar
constituent[13]: file:/usr/local/share/java/maven3/lib/maven-settings-3.0.4.jar
constituent[14]: file:/usr/local/share/java/maven3/lib/sisu-inject-bean-2.3.0.jar
constituent[15]: file:/usr/local/share/java/maven3/lib/wagon-http-2.2-shaded.jar
constituent[16]: file:/usr/local/share/java/maven3/lib/maven-aether-provider-3.0.4.jar
constituent[17]: file:/usr/local/share/java/maven3/lib/sisu-inject-plexus-2.3.0.jar
constituent[18]: file:/usr/local/share/java/maven3/lib/maven-artifact-3.0.4.jar
constituent[19]: file:/usr/local/share/java/maven3/lib/maven-model-3.0.4.jar
constituent[20]: file:/usr/local/share/java/maven3/lib/wagon-file-2.2.jar
constituent[21]: file:/usr/local/share/java/maven3/lib/maven-embedder-3.0.4.jar
constituent[22]: file:/usr/local/share/java/maven3/lib/sisu-guice-3.1.0-no_aop.jar
constituent[23]: file:/usr/local/share/java/maven3/lib/maven-settings-builder-3.0.4.jar
constituent[24]: file:/usr/local/share/java/maven3/lib/plexus-interpolation-1.14.jar
constituent[25]: file:/usr/local/share/java/maven3/lib/aether-impl-1.13.1.jar
constituent[26]: file:/usr/local/share/java/maven3/lib/aether-api-1.13.1.jar
constituent[27]: file:/usr/local/share/java/maven3/lib/aether-util-1.13.1.jar
constituent[28]: file:/usr/local/share/java/maven3/lib/commons-cli-1.2.jar
---
Exception in thread "main" java.lang.Error: posix_spawn is not a supported process launch mechanism on this platform.
at java.lang.UNIXProcess$1.run(UNIXProcess.java:111)
at java.lang.UNIXProcess$1.run(UNIXProcess.java:93)
at java.security.AccessController.doPrivileged(Native Method)
at java.lang.UNIXProcess.<clinit>(UNIXProcess.java:91)
at java.lang.ProcessImpl.start(ProcessImpl.java:130)
at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028)
at java.lang.Runtime.exec(Runtime.java:617)
at java.lang.Runtime.exec(Runtime.java:450)
at java.lang.Runtime.exec(Runtime.java:347)
at org.codehaus.plexus.interpolation.os.OperatingSystemUtils.getSystemEnvVars(OperatingSystemUtils.java:86)
at org.codehaus.plexus.interpolation.EnvarBasedValueSource.getEnvars(EnvarBasedValueSource.java:74)
at org.codehaus.plexus.interpolation.EnvarBasedValueSource.<init>(EnvarBasedValueSource.java:64)
at org.codehaus.plexus.interpolation.EnvarBasedValueSource.<init>(EnvarBasedValueSource.java:50)
at org.apache.maven.settings.building.DefaultSettingsBuilder.interpolate(DefaultSettingsBuilder.java:222)
at org.apache.maven.settings.building.DefaultSettingsBuilder.build(DefaultSettingsBuilder.java:101)
at org.apache.maven.cli.MavenCli.settings(MavenCli.java:725)
at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:193)
at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:606)
at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290)
at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230)
at 

Forbidden-APIS no longer ran because of carzy POM change

2015-01-23 Thread Uwe Schindler
Hi,

I just noticed, while checking the problems around the ExternalParsers, that 
Tika's build no longer runs the forbidden-apis Maven plugin, so we got a few 
new violations, especially regarding toUpper/LowerCase(). In fact, the 
following commit broke this:

Revision: 1624185
Author: mattmann
Date: Thursday, 11 September 2014 05:11:19
Message:
surround in plugin management to resolve 
http://stackoverflow.com/questions/6352208/how-to-solve-plugin-execution-not-covered-by-lifecycle-configuration-for-sprin

Modified : /tika/trunk/tika-parent/pom.xml

Since that change, the plugin is no longer run by default. I have no idea why 
it was done like this, but in fact it broke some of the globally defined check 
tasks. I have no idea how to re-enable it easily, so I cannot help other than 
to say that reverting that commit restores the old behavior. What is the 
reason for this commit? There is not even an issue about it. It seems to be a 
workaround for some Eclipse issue, but in fact it disables the plugins 
entirely. To re-enable forbidden-apis you now have to explicitly enable it in 
every module (because pluginManagement only supplies a plugin's configuration, 
whereas declaring the plugin outside pluginManagement also enables its 
execution).

In addition, version 1.7 of forbiddenapis is already available, so you can 
replace forbidden-apis 1.6.1 with version 1.7 (which fixes a few bugs with 
Java 8 and Java 9).

The following new violations were found, and in fact they break code in the 
Turkish locale:
[INFO] 
[INFO] Building Apache Tika core 1.8-SNAPSHOT
[INFO] 
[INFO] 
[INFO] --- forbiddenapis:1.7:testCheck (default) @ tika-core ---
[INFO] Scanning for classes to check...
[INFO] Reading bundled API signatures: jdk-unsafe
[INFO] Reading bundled API signatures: jdk-deprecated
[INFO] Loading classes to check...
[INFO] Scanning for API signatures and dependencies...
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:79)
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:80)
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:88)
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:133)
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:176)
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:221)
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:273)
[ERROR] Scanned 52 (and 331 related) class file(s) for forbidden API invocations (in 0.16s), 7 error(s).
[INFO] 
[...]
[INFO] 
[INFO] Building Apache Tika parsers 1.8-SNAPSHOT
[INFO] 
[INFO]
[INFO] --- forbiddenapis:1.7:check (default-cli) @ tika-parsers ---
[INFO] Scanning for classes to check...
[INFO] Reading bundled API signatures: jdk-unsafe
[INFO] Reading bundled API signatures: jdk-deprecated
[INFO] Loading classes to check...
[INFO] Scanning for API signatures and dependencies...
[ERROR] Forbidden method invocation: java.io.InputStreamReader#<init>(java.io.InputStream) [Uses default charset]
[ERROR]   in org.apache.tika.parser.ocr.TesseractOCRParser$2 (TesseractOCRParser.java:309)
[ERROR] Forbidden method invocation: java.lang.String#<init>(byte[],int,int) [Uses default charset]
[ERROR]   in org.apache.tika.parser.chm.accessor.ChmDirectoryListingSet (ChmDirectoryListingSet.java:240)
[ERROR] Forbidden method invocation: java.text.SimpleDateFormat#<init>(java.lang.String) [Uses default locale]
[ERROR]   in org.apache.tika.parser.image.ImageMetadataExtractor$ExifHandler$1 (ImageMetadataExtractor.java:304)
[ERROR] Forbidden method invocation: java.lang.String#format(java.lang.String,java.lang.Object[]) [Uses default locale]
[ERROR]   in org.apache.tika.parser.ocr.TesseractOCRConfig (TesseractOCRConfig.java:214)
[ERROR] Scanned 281 (and 813 related) class file(s) for forbidden 

[jira] [Commented] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 JDK-8055301 so Turkish Tika users can still use non-external parsers

2015-01-23 Thread Konstantin Gribov (JIRA)

[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14288977#comment-14288977 ]

Konstantin Gribov commented on TIKA-1526:
-

[~thetaphi], they fixed this in 2.0 RC some time ago: 
http://jira.codehaus.org/browse/MNG-597

As far as I can see, you have Maven 3. Can you create an issue there with a 
description of your environment?



[jira] [Created] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Tim Allison (JIRA)
Tim Allison created TIKA-1529:
-

 Summary: Turn forbidden-apis back on
 Key: TIKA-1529
 URL: https://issues.apache.org/jira/browse/TIKA-1529
 Project: Tika
  Issue Type: Bug
Reporter: Tim Allison
Priority: Minor


[~thetaphi] recently noticed on that forbidden-apis was turned off in r1624185, 
and he submitted a patch to the dev list.  Let's turn it back on.





[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Tim Allison (JIRA)

[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289311#comment-14289311 ]

Tim Allison commented on TIKA-1529:
---

For UnsupportedEncodingException, Lucene/Solr handles this in different ways:

{noformat}
try {
  out = new PrintStream(bos, false, IOUtils.UTF_8);
} catch (UnsupportedEncodingException bogus) {
  throw new RuntimeException(bogus);
}
{noformat}

or 
{noformat}
} catch (UnsupportedEncodingException e) {
}
{noformat}

or
{noformat}
} catch (UnsupportedEncodingException e) {
  throw new Error("JVM Does not seem to support UTF-8", e);
}
{noformat}

What's our preference?
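For comparison, a sketch of the first (RuntimeException) option wrapped in a small helper, plus the Charset-based overloads that avoid the checked exception altogether. UTF-8 is one of the charsets every Java platform implementation is required to support, so the catch block should be unreachable in practice. `Utf8Streams` and its method names are illustrative, not an existing Tika or Lucene utility.

```java
import java.io.ByteArrayOutputStream;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

public final class Utf8Streams {

    private Utf8Streams() {}

    // PrintStream has no Charset-based constructor before Java 10, so the
    // String-based overload (which declares UnsupportedEncodingException) is
    // used and the checked exception is wrapped.
    public static PrintStream utf8PrintStream(OutputStream out) {
        try {
            return new PrintStream(out, false, "UTF-8");
        } catch (UnsupportedEncodingException bogus) {
            // Every Java platform must support UTF-8, so this should never happen.
            throw new RuntimeException(bogus);
        }
    }

    // Overloads taking a java.nio.charset.Charset never throw
    // UnsupportedEncodingException, so no try/catch is needed.
    public static Writer utf8Writer(OutputStream out) {
        return new OutputStreamWriter(out, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        PrintStream ps = utf8PrintStream(bos);
        ps.print("\u00fc"); // LATIN SMALL LETTER U WITH DIAERESIS
        ps.flush();
        System.out.println(bos.size()); // 2 (UTF-8 encodes U+00FC as two bytes)
    }
}
```

Where a Charset overload exists, using it (with StandardCharsets) removes the question of how to handle the exception at all.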



[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Tim Allison (JIRA)

[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289304#comment-14289304 ]

Tim Allison commented on TIKA-1529:
---

I just fixed issues BasicContentHandlerFactoryTest in r1654225.

Not sure how to fix chm parser, without doing a semi-manual copying of bytes to 
a StringBuilder.
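Both "Uses default charset" violations in the tika-parsers run of the check (String#<init>(byte[],int,int) and InputStreamReader#<init>(InputStream)) have standard overloads that take an explicit Charset, so copying bytes by hand may not be necessary. A sketch under the assumption that the bytes are UTF-8; which charset is actually correct for CHM directory entry names, or for Tesseract's output, is not settled here. `ExplicitCharsets` and its methods are illustrative names, and the sample strings are made up.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public final class ExplicitCharsets {

    private ExplicitCharsets() {}

    // Replacement for new String(bytes, offset, length), which decodes with the
    // platform default charset.
    public static String decode(byte[] bytes, int offset, int length, Charset charset) {
        return new String(bytes, offset, length, charset);
    }

    // Replacement for new InputStreamReader(in): the charset is stated explicitly.
    public static String firstLine(InputStream in, Charset charset) throws IOException {
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            return reader.readLine();
        }
    }

    public static void main(String[] args) throws IOException {
        byte[] buf = "xx/README.txtyy".getBytes(StandardCharsets.UTF_8);
        System.out.println(decode(buf, 2, 11, StandardCharsets.UTF_8)); // /README.txt

        InputStream in = new ByteArrayInputStream("sample line\n".getBytes(StandardCharsets.UTF_8));
        System.out.println(firstLine(in, StandardCharsets.UTF_8));     // sample line
    }
}
```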

[~tpalsulich], is it ok to use Locale.ROOT in two places in Tesseract parser 
and config?
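For reference, this is roughly what the Locale.ROOT fixes look like for the three "Uses default locale" patterns that forbidden-apis flagged (String#toLowerCase(), String#format(String, Object...), and SimpleDateFormat(String)). This is a sketch: the format strings and sample values are made up. The "yyyy:MM:dd HH:mm:ss" pattern is the EXIF date layout, and treating parsed dates as UTC is an assumption. Whether Locale.ROOT is right at each Tika call site is the question being asked above.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;
import java.util.TimeZone;

public final class RootLocaleExamples {

    private RootLocaleExamples() {}

    // Locale-independent lower-casing: "TITLE" becomes "title" even under a Turkish default.
    public static String lower(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Locale-independent formatting: the decimal separator is always '.'.
    public static String formatScale(double value) {
        return String.format(Locale.ROOT, "%.2f", value);
    }

    // Locale-independent date parsing with the EXIF "yyyy:MM:dd HH:mm:ss" layout.
    public static Date parseExifDate(String s) throws ParseException {
        SimpleDateFormat fmt = new SimpleDateFormat("yyyy:MM:dd HH:mm:ss", Locale.ROOT);
        fmt.setTimeZone(TimeZone.getTimeZone("UTC")); // assumption: treat values as UTC
        return fmt.parse(s);
    }

    public static void main(String[] args) throws ParseException {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        System.out.println("TITLE".toLowerCase(turkish).equals("title"));    // false (dotless i)
        System.out.println(lower("TITLE"));                                   // title
        System.out.println(String.format(Locale.GERMANY, "%.2f", 1.5));      // 1,50
        System.out.println(formatScale(1.5));                                 // 1.50
        System.out.println(parseExifDate("2015:01:23 12:24:00").getTime());  // 1422015840000
    }
}
```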



[jira] [Updated] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Tim Allison (JIRA)

 [ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison updated TIKA-1529:
--
Description: [~thetaphi] recently noticed that forbidden-apis was turned 
off in r1624185, and he submitted a patch to the dev list.  Let's turn it back 
on.  (was: [~thetaphi] recently noticed on that forbidden-apis was turned off 
in r1624185, and he submitted a patch to the dev list.  Let's turn it back on.)



[jira] [Comment Edited] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Tim Allison (JIRA)

[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289304#comment-14289304 ]

Tim Allison edited comment on TIKA-1529 at 1/23/15 2:41 PM:


I fixed issues with BasicContentHandlerFactoryTest in r1654225.

Not sure how to fix chm parser, without doing a semi-manual copying of bytes to 
a StringBuilder.

[~tpalsulich], is it ok to use Locale.ROOT in two places in Tesseract parser 
and config?


was (Author: talli...@mitre.org):
I just fixed issues BasicContentHandlerFactoryTest in r1654225.

Not sure how to fix chm parser, without doing a semi-manual copying of bytes to 
a StringBuilder.

[~tpalsulich], is it ok to use Locale.ROOT in two places in Tesseract parser 
and config?



[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Tyler Palsulich (JIRA)

[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289405#comment-14289405 ]

Tyler Palsulich commented on TIKA-1529:
---

+1 to {{RuntimeException}}.



[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Tim Allison (JIRA)

[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289449#comment-14289449 ]

Tim Allison commented on TIKA-1529:
---

Agreed on US-ASCII, but aren't there illegal combinations in UTF-8?  
Exceedingly rare, I admit...
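On the UTF-8 question: yes, some byte sequences are not well-formed UTF-8 (the byte 0xFF, for example, never appears in valid UTF-8). What happens then depends on the API. The String constructor silently substitutes U+FFFD, while a CharsetDecoder configured with CodingErrorAction.REPORT throws. This small sketch shows both behaviors; `Utf8Validation` is an illustrative name.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

public final class Utf8Validation {

    private Utf8Validation() {}

    // Lenient: malformed input is replaced with U+FFFD and no exception is thrown.
    public static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    // Strict: malformed or unmappable input raises CharacterCodingException.
    public static String decodeStrict(byte[] bytes) throws CharacterCodingException {
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        return decoder.decode(ByteBuffer.wrap(bytes)).toString();
    }

    public static boolean isValidUtf8(byte[] bytes) {
        try {
            decodeStrict(bytes);
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        byte[] bad = { 'a', (byte) 0xFF, 'b' };
        System.out.println(decodeLenient(bad).equals("a\uFFFDb"));                // true
        System.out.println(isValidUtf8(bad));                                     // false
        System.out.println(isValidUtf8("abc".getBytes(StandardCharsets.UTF_8)));  // true
    }
}
```

So for UTF-8, the decision is less about a possible UnsupportedEncodingException and more about whether malformed input should be replaced or reported.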



Re: Forbidden-APIS no longer ran because of carzy POM change

2015-01-23 Thread Mattmann, Chris A (3980)
Thanks Uwe, no problem we’ll figure it out. We’ll get it re-enabled
and also figure out the Eclipse thing. Thanks for bringing this up!

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-----Original Message-----
From: Uwe Schindler <u...@thetaphi.de>
Reply-To: "dev@tika.apache.org" <dev@tika.apache.org>
Date: Friday, January 23, 2015 at 7:20 AM
To: "dev@tika.apache.org" <dev@tika.apache.org>
Subject: RE: Forbidden-APIS no longer ran because of carzy POM change

Hi,

 Hmm, weird, that’s a commit from September 2014, Uwe, so quite a while
 ago.
 
 I think I was having some issues in Eclipse complaining about that plugin,
 so I used the workaround presented on StackOverflow to deal with it.
 
 I’m not fine with reverting the commit unless the behavior it introduced is
 preserved; in other words, I wanted Eclipse to stop complaining about that
 plugin. So maybe we can figure out a way that both enables the plugin and
 makes Eclipse not complain about it.

For me it just says that it cannot handle that plugin, but it does not
prevent you from using Eclipse or running anything in Eclipse. I have the
plugin in various Eclipse projects with Maven running here locally...

Another option would be a Maven profile, like the one you have for RAT.
Unfortunately I have no idea how to do this correctly. In that case you
could just instruct Jenkins to run the profile...

 I’ll check.
 

[jira] [Commented] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 JDK-8055301 so Turkish Tika users can still use non-external parsers

2015-01-23 Thread Tyler Palsulich (JIRA)

[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289419#comment-14289419 ]

Tyler Palsulich commented on TIKA-1526:
---

This is exactly how I saw the bug. I was confused that no tests were running, 
tried switching a config, and never saw the error again (as discussed).

 ExternalParser should trap/ignore/workarround JDK-8047340  JDK-8055301 so 
 Turkish Tika users can still use non-external parsers
 

 Key: TIKA-1526
 URL: https://issues.apache.org/jira/browse/TIKA-1526
 Project: Tika
  Issue Type: Wish
Reporter: Hoss Man

 the JDK has numerous pain points regarding the Turkish locale, posix_spawn 
 lowercasing being one of them...
 https://bugs.openjdk.java.net/browse/JDK-8047340
 https://bugs.openjdk.java.net/browse/JDK-8055301
 As of Tika 1.7, the TesseractOCRParser (which is an ExternalParser) is 
 enabled  configured by default in Tika, and uses ExternalParser.check to see 
 if tesseract is available -- but because of the JDK bug, this means that Tika 
 fails fast for Turkish users on BSD/UNIX variants (including MacOSX) like 
 so...
 {noformat}
   [junit4] Throwable #1: java.lang.Error: posix_spawn is not a supported process launch mechanism on this platform.
   [junit4]   at java.lang.UNIXProcess$1.run(UNIXProcess.java:105)
   [junit4]   at java.lang.UNIXProcess$1.run(UNIXProcess.java:94)
   [junit4]   at java.security.AccessController.doPrivileged(Native Method)
   [junit4]   at java.lang.UNIXProcess.<clinit>(UNIXProcess.java:92)
   [junit4]   at java.lang.ProcessImpl.start(ProcessImpl.java:130)
   [junit4]   at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
   [junit4]   at java.lang.Runtime.exec(Runtime.java:620)
   [junit4]   at java.lang.Runtime.exec(Runtime.java:485)
   [junit4]   at org.apache.tika.parser.external.ExternalParser.check(ExternalParser.java:344)
   [junit4]   at org.apache.tika.parser.ocr.TesseractOCRParser.hasTesseract(TesseractOCRParser.java:117)
   [junit4]   at org.apache.tika.parser.ocr.TesseractOCRParser.getSupportedTypes(TesseractOCRParser.java:90)
   [junit4]   at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81)
   [junit4]   at org.apache.tika.parser.DefaultParser.getParsers(DefaultParser.java:95)
   [junit4]   at org.apache.tika.parser.CompositeParser.getSupportedTypes(CompositeParser.java:229)
   [junit4]   at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81)
   [junit4]   at org.apache.tika.parser.CompositeParser.getParser(CompositeParser.java:209)
   [junit4]   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
   [junit4]   at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
 {noformat}
 ...unless they go out of their way to whitelist only the parsers they 
 need/want so TesseractOCRParser (and any other ExternalParsers) will never 
 even be check()ed.
 It would be nice if Tika's ExternalParser class added a similar 
 hack/workaround to what was done in SOLR-6387 to trap these types of errors. 
 In Solr we just propagate a better error explaining why Java hates the 
 Turkish language...
 {code}
 } catch (Error err) {
   if (err.getMessage() != null && (err.getMessage().contains("posix_spawn")
       || err.getMessage().contains("UNIXProcess"))) {
     log.warn("Error forking command due to JVM locale bug (see "
         + "https://issues.apache.org/jira/browse/SOLR-6387): " + err.getMessage());
     return "(error executing: " + cmd + ")";
   }
   throw err; // not the locale bug: let any other Error propagate
 }
 {code}
 ...but with Tika, it might be better for all ExternalParsers to just opt 
 out as if they don't recognize the filetype when they detect this type of 
 error from the check method (or perhaps it would be better if 
 AutoDetectParser handled this? ... I'm not really sure how it would best fit 
 into Tika's architecture)
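A minimal sketch of the "opt out" idea, assuming a hypothetical helper class `SpawnCheck` (this is not Tika's actual `ExternalParser` API, and its exit-code handling is simplified to "exit status 0 means available"): the check runs the command, and if the JVM throws the locale-related `Error` from the trace above, it reports the tool as unavailable instead of failing. Any other `Error` is rethrown.

```java
import java.io.IOException;

// Hypothetical helper illustrating the TIKA-1526 idea; not part of Tika's API.
final class SpawnCheck {

    private SpawnCheck() {}

    /** True if the Error looks like the posix_spawn / Turkish-locale JDK bug. */
    static boolean isLocaleSpawnError(Error err) {
        String msg = err.getMessage();
        return msg != null && (msg.contains("posix_spawn") || msg.contains("UNIXProcess"));
    }

    /**
     * Returns true if the command could be launched and exited with status 0.
     * A launch failure caused by the JDK locale bug is treated like "tool not
     * installed", so a parser using this check would simply opt out.
     * Assumes the command prints little output; a production version should
     * drain or redirect the process streams to avoid blocking.
     */
    static boolean check(String... command) {
        try {
            Process process = new ProcessBuilder(command).start();
            return process.waitFor() == 0;
        } catch (IOException e) {
            return false;                       // command not found / not executable
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt flag
            return false;
        } catch (Error err) {
            if (isLocaleSpawnError(err)) {
                return false;                   // opt out instead of failing fast
            }
            throw err;                          // unrelated errors still propagate
        }
    }
}
```

Whether this belongs in each `ExternalParser` or in `AutoDetectParser` is exactly the open question above; the sketch only shows the per-check variant.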



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


Re: Forbidden-APIS no longer ran because of carzy POM change

2015-01-23 Thread Mattmann, Chris A (3980)
Hmm, weird, that’s a commit from September 2014, Uwe, so quite a
while ago.

I think I was having some issues in Eclipse complaining about that
plugin, so I used the workaround presented on StackOverflow to deal
with it. 

I’m not fine with reverting the commit unless the behavior it provided
is preserved; in other words, I wanted Eclipse to stop complaining
about that plugin. So maybe we can figure out a way that both enables
the plugin and makes Eclipse not complain about it.

I’ll check.

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++

-Original Message-
From: Uwe Schindler u...@thetaphi.de
Reply-To: dev@tika.apache.org dev@tika.apache.org
Date: Friday, January 23, 2015 at 3:11 AM
To: dev@tika.apache.org dev@tika.apache.org
Subject: Forbidden-APIS no longer ran because of carzy POM change

Hi,

I just noticed while checking the problems around the ExternalParsers
that Tika's build no longer runs the forbidden-apis Maven plugin, so
we got a few new violations, especially regarding toUpper/LowerCase().
In fact, the following commit broke this:

Revision: 1624185
Author: mattmann
Date: Thursday, 11 September 2014 05:11:19
Message:
surround in plugin management to resolve
http://stackoverflow.com/questions/6352208/how-to-solve-plugin-execution-not-covered-by-lifecycle-configuration-for-sprin

Modified : /tika/trunk/tika-parent/pom.xml

Since that change, the plugin is no longer run by default. I have no
idea why it was done this way, but in fact it broke some of the globally
defined check tasks, and I have no idea how to re-enable it easily.
So I cannot help beyond noting that reverting that commit restores the
behavior. What is the reason for this commit? There is not even an issue
about it. I think it is a workaround for some Eclipse issue, but in fact
it disables those plugins entirely. To re-enable forbidden-apis you now
have to explicitly enable it in every module, because pluginManagement
only provides the configuration of a plugin, whereas declaring it outside
pluginManagement also enables its execution.

In addition, version 1.7 of forbidden-apis is already available, so you can
replace 1.6.1 with 1.7 (which fixes a few bugs with Java 8 and Java 9).

The following new violations were found, and in fact these break code
in the Turkish locale:
[INFO] ------------------------------------------------------------------------
[INFO] Building Apache Tika core 1.8-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- forbiddenapis:1.7:testCheck (default) @ tika-core ---
[INFO] Scanning for classes to check...
[INFO] Reading bundled API signatures: jdk-unsafe
[INFO] Reading bundled API signatures: jdk-deprecated
[INFO] Loading classes to check...
[INFO] Scanning for API signatures and dependencies...
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:79)
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:80)
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:88)
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:133)
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:176)
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:221)
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:273)
[ERROR] Scanned 52 (and 331 related) class file(s) for forbidden API invocations (in 0.16s), 7 error(s).
[INFO] ------------------------------------------------------------------------
[...]
[INFO] ------------------------------------------------------------------------
[INFO] Building Apache Tika 

[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289361#comment-14289361
 ] 

Chris A. Mattmann commented on TIKA-1529:
-

Tim, I'm OK with figuring out how to turn it back on, but not at the expense of 
my Eclipse complaining at me, which is what I was trying to fix back in 
September 2014. Let me see if I can help find a workaround for both.

 Turn forbidden-apis back on
 ---

 Key: TIKA-1529
 URL: https://issues.apache.org/jira/browse/TIKA-1529
 Project: Tika
  Issue Type: Bug
Reporter: Tim Allison
Priority: Minor

 [~thetaphi] recently noticed that forbidden-apis was turned off in r1624185, 
 and he submitted a patch to the dev list.  Let's turn it back on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289369#comment-14289369
 ] 

Tim Allison commented on TIKA-1529:
---

[~grossws], for the following in {noformat}testExtractChmEntry{noformat}

{noformat}
//validate html
String html = new String(data);
{noformat}

Should this be ISO-8859-1?



[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Chris A. Mattmann (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289390#comment-14289390
 ] 

Chris A. Mattmann commented on TIKA-1529:
-

Hi Tim, I tried Uwe's patch on the latest version of Eclipse (Luna 
Service Release 1a, 4.4.1) with m2e. Eclipse doesn't seem to complain to me 
anymore. Yay! I'm +1 to apply the patch.



[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Tyler Palsulich (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289402#comment-14289402
 ] 

Tyler Palsulich commented on TIKA-1529:
---

Yes, Locale.ROOT is OK.
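To illustrate why forbidden-apis flags `String#toLowerCase()` without a `Locale` (the violations in `BasicContentHandlerFactoryTest` above), here is a small sketch; the class name `LocaleLowercase` is hypothetical. In the Turkish locale an uppercase `I` lowercases to the dotless `ı` (U+0131), so markup or keyword comparisons silently fail, while `Locale.ROOT` gives locale-independent results.

```java
import java.util.Locale;

// Hypothetical illustration of the default-locale pitfall flagged by forbidden-apis.
final class LocaleLowercase {

    private LocaleLowercase() {}

    /** Locale-sensitive: the result depends on the locale passed in. */
    static String lowerIn(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    /** Locale-independent lowercasing, suitable for tags, keys and identifiers. */
    static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }
}
```

For example, `lowerIn("TITLE", Locale.forLanguageTag("tr"))` yields `"tıtle"`, whereas `lowerRoot("TITLE")` yields `"title"`. Calling the no-argument `toLowerCase()` behaves like the first method with whatever the JVM's default locale happens to be.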



[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Konstantin Gribov (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289333#comment-14289333
 ] 

Konstantin Gribov commented on TIKA-1529:
-

I vote for throwing {{RuntimeException}} or {{TikaException}} with 
cause={{UnsupportedEncodingException}} and human-readable text.
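A minimal sketch of that suggestion, assuming a hypothetical helper `StrictDecode`: the checked `UnsupportedEncodingException` is wrapped in an unchecked exception that keeps it as the cause and carries a readable message. Inside Tika the wrapper could be a `TikaException` instead; `IllegalStateException` is used here only so the sketch has no Tika dependency.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical helper: decode with a named charset, failing loudly and clearly.
final class StrictDecode {

    private StrictDecode() {}

    static String decode(byte[] bytes, String charsetName) {
        try {
            return new String(bytes, charsetName);
        } catch (UnsupportedEncodingException e) {
            // Keep the original exception as the cause; explain the problem in the message.
            throw new IllegalStateException(
                "Charset '" + charsetName + "' is not supported by this JVM", e);
        }
    }
}
```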



[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Hudson (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289337#comment-14289337
 ] 

Hudson commented on TIKA-1529:
--

UNSTABLE: Integrated in tika-trunk-jdk1.6 #433 (See 
[https://builds.apache.org/job/tika-trunk-jdk1.6/433/])
TIKA-1529: step 1...get rid of toLowerCase in BasicContentHandlerFactoryTest 
(tallison: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1654225)
* /tika/trunk/tika-core/src/test/java/org/apache/tika/sax/BasicContentHandlerFactoryTest.java




RE: Forbidden-APIS no longer ran because of carzy POM change

2015-01-23 Thread Uwe Schindler
Hi,

 Hmm, weird, that’s a commit from September 2014, Uwe, so quite a while
 ago.
 
 I think I was having some issues in Eclipse complaining about that plugin, so 
 I used the workaround presented on StackOverflow to deal with it.
 
 I’m not fine reverting the commit unless the behavior that it did was
 preserved - in other words, I wanted Eclipse to stop complaining about that
 plugin. So maybe we can figure out a way that both enables the plugin, and
 makes Eclipse not complain about it.

For me it just says that it cannot handle that plugin, but it does not prevent 
you from using Eclipse or running anything in Eclipse. I have the plugin in 
various Eclipse projects with Maven running here locally...

Another option would be to make a Maven profile, like you do for RAT? 
Unfortunately I have no idea how to do this correctly. In that case you could 
just instruct Jenkins to run the profile...

 I’ll check.

[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289410#comment-14289410
 ] 

Tim Allison commented on TIKA-1529:
---

Great! I'm still making mods and creating a static Charset UTF_8 in IOUtils, 
following [~thetaphi]'s recommendation... until we move to Java 1.7 and can use 
StandardCharsets.



[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289438#comment-14289438
 ] 

Uwe Schindler commented on TIKA-1529:
-

If you just check for ASCII characters in a string of unknown encoding, the 
easiest option is to use US-ASCII as the charset; this always works, also with 
UTF-8 :-)
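A small sketch of that trick, assuming a hypothetical helper `AsciiMarkerCheck` and ASCII-compatible input (UTF-8 or a single-byte charset; UTF-16 would not work). Decoding with US-ASCII turns every byte >= 0x80 into a replacement character, but every ASCII byte maps to the same `char`, and UTF-8 never uses ASCII byte values inside multi-byte sequences, so ASCII markers such as HTML tags are still found.

```java
import java.nio.charset.Charset;

// Hypothetical helper: look for ASCII-only markers in bytes of unknown (ASCII-compatible) encoding.
final class AsciiMarkerCheck {

    private static final Charset US_ASCII = Charset.forName("US-ASCII");

    private AsciiMarkerCheck() {}

    static boolean containsAsciiMarker(byte[] data, String marker) {
        // Non-ASCII bytes become U+FFFD; ASCII bytes decode unchanged.
        return new String(data, US_ASCII).contains(marker);
    }
}
```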



RE: Forbidden-APIS no longer ran because of carzy POM change

2015-01-23 Thread Uwe Schindler
Hi,

 There are several places where forbidden-apis will report errors in Tika.
 I don't know if there is a better way to fall back. E.g. in one of the CHM
 parser classes:
 
 try {
   dle.setName(new String(bytes, "UTF-8"));
 } catch (UnsupportedEncodingException e) {
   dle.setName(new String(bytes));
 }
 
 Can you add special annotation parsing (like
 @SuppressWarnings("forbiddenapis") on an element) to avoid emitting a build
 error in special cases like the one above?

Not yet, see: https://code.google.com/p/forbidden-apis/issues/detail?id=34

But the above is not needed. UTF-8 is always defined (the Java platform 
specification requires it). In fact, in Java 7 you can use 
StandardCharsets.UTF_8. It is also a bad idea to pass charsets as strings. 
Just define a constant somewhere (if you still have to support Java 6), like: 
public static final Charset UTF_8 = Charset.forName("UTF-8");

And use that everywhere instead of a string. This avoids the synchronized 
lookup of the charset by name that the JVM otherwise performs.
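A minimal sketch of that recommendation for Java 6 code, assuming a hypothetical holder class `Charsets` (in Tika this role would be played by `IOUtils`, as Tim mentions; on Java 7+ `java.nio.charset.StandardCharsets.UTF_8` replaces it). Because the `Charset` object is passed directly, `new String(byte[], Charset)` declares no checked exception, so the try/catch fallback quoted above disappears.

```java
import java.nio.charset.Charset;

// Hypothetical constants holder; names are for illustration only.
final class Charsets {

    /** Looked up once; every Java platform implementation must support UTF-8. */
    static final Charset UTF_8 = Charset.forName("UTF-8");

    private Charsets() {}

    /** No UnsupportedEncodingException to handle and no per-call lookup by name. */
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, UTF_8);
    }
}
```

With this, the CHM snippet would shrink to something like `dle.setName(new String(bytes, Charsets.UTF_8));`.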

Uwe

 --
 Best regards,
 Konstantin Gribov
 
 Fri Jan 23 2015 at 15:10:18, Uwe Schindler u...@thetaphi.de:
 
 Here is the patch, mailing list swallowed it:
 
  Index: tika-parent/pom.xml
  ===================================================================
  --- tika-parent/pom.xml (revision 1654171)
  +++ tika-parent/pom.xml (working copy)
  @@ -274,7 +274,6 @@
     </properties>
  
     <build>
  -    <pluginManagement>
       <plugins>
         <plugin>
           <artifactId>maven-compiler-plugin</artifactId>
  @@ -287,7 +286,7 @@
         <plugin>
           <groupId>de.thetaphi</groupId>
           <artifactId>forbiddenapis</artifactId>
  -        <version>1.6.1</version>
  +        <version>1.7</version>
           <configuration>
             <targetVersion>${maven.compiler.target}</targetVersion>
             <internalRuntimeForbidden>true</internalRuntimeForbidden>
  @@ -322,7 +321,6 @@
           <version>2.3</version>
         </plugin>
       </plugins>
  -    </pluginManagement>
     </build>
  
     <profiles>
 
 profiles
 
  -
  Uwe Schindler
  H.-H.-Meier-Allee 63, D-28213 Bremen
  http://www.thetaphi.de
  eMail: u...@thetaphi.de
 
 
   -Original Message-
   From: Uwe Schindler [mailto:u...@thetaphi.de]
   Sent: Friday, January 23, 2015 1:08 PM
   To: dev@tika.apache.org
   Subject: RE: Forbidden-APIS no longer ran because of carzy POM
   change
  
   The attached patch reverts the change and updates the forbidden plugin.
  
   Uwe
  
   -
   Uwe Schindler
   H.-H.-Meier-Allee 63, D-28213 Bremen http://www.thetaphi.de
   eMail: u...@thetaphi.de
  
  
-Original Message-
From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Friday, January 23, 2015 1:00 PM
To: dev@tika.apache.org
Subject: RE: Forbidden-APIS no longer ran because of carzy POM
 change
   
Here is the explanation of why the plugin is no longer called because of
this:

- Works for me too, but can anyone explain why? – Andrew Swan May 15
'13 at 6:26
- @Andrew I think this works because m2e is not looking for plugins in
pluginManagement, but only in build/plugins. In the Maven world, there
is a difference between the two - the former defines "if you happen to
use this plugin, here's the configuration to use", whereas the latter
states "use this plugin". See this post and its top two answers. –
GreenGiant Jul 5 '13 at 17:52
- I agree with @GreenGiant. I tried this solution but it then breaks
the compilation since the aspectj plugin is not called before
compilation. – Pierre Aug 30 '13 at 20:21

This explains the change. In fact, placing the plugins in
pluginManagement disables them unless they are explicitly configured in a
sub-module. So this commit should be reverted.

In fact, the bug described here no longer applies to later m2e
installations. It still complains about plugins that Eclipse does not
know about, but this does not prevent you from using Eclipse. So I
would strongly ask to revert the commit, because it breaks the build.

Uwe

RE: Forbidden-APIS no longer ran because of carzy POM change

2015-01-23 Thread Uwe Schindler
Hi, this may also help, it also brings the needed information:

https://www.eclipse.org/m2e/documentation/m2e-execution-not-covered.html

In fact, the problem is that Eclipse has no idea how this plugin should be 
executed internally. But as this is just a check plugin that does not affect 
the build output at all, you can leave it disabled in Eclipse.

If you scroll down, you see that Eclipse 4.2+ fixes this problem: disable the 
plugin for Maven in Eclipse using Project properties -> Maven -> Lifecycle 
mappings -> ignore

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de



[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Konstantin Gribov (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289460#comment-14289460
 ] 

Konstantin Gribov commented on TIKA-1529:
-

{{new String(bytes, Charset)}} always replaces malformed-input and 
unmappable-character sequences with the charset's default replacement string 
(see [http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#String(byte\[\], java.nio.charset.Charset)]).

So we can use any standard encoding.
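A quick sketch of that behavior, using a hypothetical class `ReplacementDemo`: the invalid UTF-8 sequence `0xC3 0x28` does not raise an exception; the decoder substitutes U+FFFD for the bad byte and continues with the following ASCII characters.

```java
import java.nio.charset.Charset;

// Hypothetical demo: String(byte[], Charset) never throws on malformed input.
final class ReplacementDemo {

    private static final Charset UTF_8 = Charset.forName("UTF-8");

    private ReplacementDemo() {}

    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, UTF_8); // malformed sequences become U+FFFD
    }

    static boolean hasReplacementChar(String s) {
        return s.indexOf('\uFFFD') >= 0;
    }
}
```

If strict failure is wanted instead, a `CharsetDecoder` configured with `CodingErrorAction.REPORT` is the usual alternative; the constructor above always replaces.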



Re: Forbidden-APIS no longer ran because of carzy POM change

2015-01-23 Thread Mattmann, Chris A (3980)
Hi Uwe,

Thanks. I will check it out. Like I said, I’m not OK reverting anything
if my Eclipse keeps complaining at me so we’ll need a fix that handles
both. Let me try with the latest version of Eclipse and m2e and see if
(with your patch) the issue goes away.

Cheers,
Chris

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW:  http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++






-Original Message-
From: Uwe Schindler u...@thetaphi.de
Reply-To: dev@tika.apache.org dev@tika.apache.org
Date: Friday, January 23, 2015 at 3:59 AM
To: dev@tika.apache.org dev@tika.apache.org
Subject: RE: Forbidden-APIS no longer ran because of carzy POM change

Here ist he explanation why the plugin is no longer called because of
this:

- Works for me too, but can anyone explain why? –  Andrew Swan May 15 '13
at 6:26
- @Andrew I think this works because m2e is not looking for plugins in
pluginManagement, but only in build/plugins. In the Maven world, there is
a difference between the two - the former defines if you happen to use
this plugin, here's the configuration to use, whereas the latter states
use this plugin. See this post and its top two answers. –  GreenGiant
Jul 5 '13 at 17:52
- I agree with @GreenGiant. I tried this solution but it then breaks the
compilation since the aspectj plugin is not called before compilation. –
Pierre Aug 30 '13 at 20:21

This explains the change. In fact placing the plugins in
pluginManagements disables them unless explicitely configured in a
sub-module. So this commit should be reverted.


In fact the bug described here no longer applies to later M2E
installations. It still complains about plugins that Eclipse does not
know about, but this does not prevent you from using Eclipse. So I would
strongly ask to revert the commit because it breaks the build.

Uwe

-
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de


 -Original Message-
 From: Uwe Schindler [mailto:u...@thetaphi.de]
 Sent: Friday, January 23, 2015 12:11 PM
 To: dev@tika.apache.org
 Subject: Forbidden-APIS no longer ran because of carzy POM change
 
 Hi,
 
 I just noticed while checking the problems around the ExternalParsers
that
 the TIKA's build no longer runs the forbidden-apis Maven plugin, so we
got a
 few new violation especially regarding the toUpper/LowerCase(). In fact
the
 following commit broke this:
 
 Revision: 1624185
 Author: mattmann
 Date: Donnerstag, 11. September 2014 05:11:19
 Message:
 surround in plugin management to resolve
 http://stackoverflow.com/questions/6352208/how-to-solve-plugin-
 execution-not-covered-by-lifecycle-configuration-for-sprin
 
 Modified : /tika/trunk/tika-parent/pom.xml
 
 Since that change, the plugin is no longer run by default. I have no idea
 why it was done like this, but in fact it broke some of the globally
 defined check tasks, and I see no easy way to re-enable them. So I cannot
 help, except to say that reverting that commit restores the previous
 behavior. What is the reason for this commit? There is not even an issue
 about it. It seems to be a workaround for some Eclipse issue, but in fact
 it disables the plugins entirely. To re-enable forbidden-apis, you now
 have to enable it explicitly in every module (because pluginManagement
 only provides a plugin's configuration, whereas declaring the plugin
 outside pluginManagement also enables its execution).
 
 In addition, version 1.7 of forbidden-apis is already available, so you
 can replace forbidden-apis 1.6.1 with version 1.7 (which fixes a few bugs
 with Java 8 and Java 9).
 
 The following new violations were found, and in fact they break the code
 in the Turkish locale:
 [INFO] ------------------------------------------------------------------------
 [INFO] Building Apache Tika core 1.8-SNAPSHOT
 [INFO] ------------------------------------------------------------------------
 [INFO]
 [INFO] --- forbiddenapis:1.7:testCheck (default) @ tika-core ---
 [INFO] Scanning for classes to check...
 [INFO] Reading bundled API signatures: jdk-unsafe
 [INFO] Reading bundled API signatures: jdk-deprecated
 [INFO] Loading classes to check...
 [INFO] Scanning for API signatures and dependencies...
 [ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
 [ERROR]   in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:79)
 [ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
 [ERROR]   in 
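To illustrate why forbidden-apis flags the no-argument String#toLowerCase(): with a Turkish default locale, an uppercase I lowercases to a dotless ı (U+0131), so a tag name such as TITLE no longer matches "title". The following is a minimal sketch of the problem and the usual fix (passing Locale.ROOT explicitly); the class and method names are hypothetical and not taken from Tika's code.

```java
import java.util.Locale;

// Hypothetical demo class, not part of Tika.
final class LocaleSafeLowerCase {
    // Locale-independent lowercasing: the explicit-locale form that
    // forbidden-apis accepts instead of String#toLowerCase().
    static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    // Simulates what a no-argument toLowerCase() does when the JVM's default
    // locale is Turkish: uppercase 'I' becomes dotless 'ı' (U+0131).
    static String lowerTurkish(String s) {
        return s.toLowerCase(new Locale("tr", "TR"));
    }

    public static void main(String[] args) {
        String tag = "TITLE";
        System.out.println(lowerRoot(tag));                    // title
        System.out.println(lowerTurkish(tag).equals("title")); // false
    }
}
```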

[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Konstantin Gribov (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289429#comment-14289429
 ] 

Konstantin Gribov commented on TIKA-1529:
-

[~talli...@mitre.org], it works with {{ISO-8859-1}} since only the presence of 
{{html}} tags is checked. It should also work with UTF-8 and with any single-byte 
encoding, so I think it's safe to decode with this encoding.

In OpenJDK 8, {{new String(bytes)}} tries:
- to decode using the default charset ({{Charset.defaultCharset().name()}}),
- if that fails, to print a warning and decode using {{ISO-8859-1}}.

We could use a similar pattern in {{ChmDirectoryListingSet}}.
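A minimal sketch of the decoding idea described above, assuming the check only looks for an "<html" marker (the real ChmDirectoryListingSet logic is not shown here, and the class and method names are hypothetical). ISO-8859-1 maps every byte to exactly one char, so decoding never fails, and ASCII bytes keep their values in UTF-8 as well, so the tag check still works when the actual encoding is unknown.

```java
import java.nio.charset.StandardCharsets;
import java.util.Locale;

// Hypothetical helper, not Tika's ChmDirectoryListingSet code.
final class HtmlTagSniffer {
    // Decode with an explicit charset instead of relying on the platform
    // default; ISO-8859-1 decoding is lossless byte-to-char and never fails.
    static boolean containsHtmlTag(byte[] data) {
        String text = new String(data, StandardCharsets.ISO_8859_1);
        // Locale.ROOT keeps the case folding independent of the default locale.
        return text.toLowerCase(Locale.ROOT).contains("<html");
    }

    public static void main(String[] args) {
        byte[] utf8 = "<HTML><body>caf\u00e9</body></HTML>".getBytes(StandardCharsets.UTF_8);
        System.out.println(containsHtmlTag(utf8)); // true
    }
}
```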

 Turn forbidden-apis back on
 ---

 Key: TIKA-1529
 URL: https://issues.apache.org/jira/browse/TIKA-1529
 Project: Tika
  Issue Type: Bug
Reporter: Tim Allison
Priority: Minor

 [~thetaphi] recently noticed that forbidden-apis was turned off in r1624185, 
 and he submitted a patch to the dev list.  Let's turn it back on.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)


[jira] [Comment Edited] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 JDK-8055301 so Turkish Tika users can still use non-external parsers

2015-01-23 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289182#comment-14289182
 ] 

Uwe Schindler edited comment on TIKA-1526 at 1/23/15 12:32 PM:
---

You can in fact work around this bug this way. It is just bad to change the 
user's default locale, as that may break multi-threaded applications in particular.

One solution could be:
During startup of the JVM (in the Plexus launcher's main method), you can do the 
following:
- check for the locale; we do it like this: {{new 
Locale("tr").getLanguage().equals(Locale.getDefault().getLanguage())}} (it is 
important to do the check like this, because otherwise it is not guaranteed that 
it really works, especially in newer Java versions!)
- if it is such a locale, save the original and switch to Locale.ROOT while still 
in a single-threaded environment (this is why it should be done in the main launcher)
- execute a fake UNIX command, like /bin/true. You can also execute some 
non-existent nonsense command that simply fails. The call is only there to 
statically initialize the broken UNIXProcess class. Once it is initialized 
correctly, it works
- switch back to the saved locale
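The startup steps above could look roughly like this in Java. This is a minimal sketch, not Plexus or Tika code: the class and method names are hypothetical, it assumes it is called from main() before any other threads exist, and it uses /bin/true as the throwaway command, as suggested.

```java
import java.io.IOException;
import java.util.Locale;

// Hypothetical helper; call once from main() while still single-threaded.
final class TurkishProcessWorkaround {
    static void preinitializeProcessSupport() {
        Locale original = Locale.getDefault();
        // Compare against new Locale("tr").getLanguage(), as recommended,
        // rather than against the literal string "tr".
        boolean turkish = new Locale("tr").getLanguage().equals(original.getLanguage());
        if (!turkish) {
            return; // nothing to do for other locales
        }
        Locale.setDefault(Locale.ROOT);
        try {
            // Any command will do: it only forces the process-launching
            // class to initialize while the default locale is ROOT.
            Process p = Runtime.getRuntime().exec(new String[] {"/bin/true"});
            p.waitFor();
        } catch (IOException e) {
            // Acceptable: a missing or failing command is still launched
            // through the (now initialized) process class.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            Locale.setDefault(original); // always restore the user's locale
        }
    }
}
```

Restoring the locale in a finally block keeps the user's setting intact even if the throwaway command fails.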


was (Author: thetaphi):
You can in fact work around this bug this way. It is just bad to change the 
user's default locale, as that may break multi-threaded applications in particular.

One solution could be:
During startup of the JVM (in the Plexus launcher's main method), you can do the 
following:
- check for the locale; we do it like this: {{new 
Locale("tr").getLanguage().equals(Locale.getDefault().getLanguage())}} (it is 
important to do the check like this, because otherwise it is not guaranteed that 
it really works, especially in newer Java versions!)
- if it is such a locale, save the original and switch to Locale.ROOT while still 
in a single-threaded environment (this is why it should be done in the main launcher)
- execute a fake UNIX command, like /bin/true. You can also execute nothing 
meaningful; the call is just there to statically initialize the broken UNIXProcess 
class. Once it is initialized correctly, it works
- switch back to the saved locale


[jira] [Commented] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 JDK-8055301 so Turkish Tika users can still use non-external parsers

2015-01-23 Thread Uwe Schindler (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289182#comment-14289182
 ] 

Uwe Schindler commented on TIKA-1526:
-

You can in fact work around this bug this way. It is just bad to change the 
user's default locale, as that may break multi-threaded applications in particular.

One solution could be:
During startup of the JVM (in the Plexus launcher's main method), you can do the 
following:
- check for the locale; we do it like this: {{new 
Locale("tr").getLanguage().equals(Locale.getDefault().getLanguage())}} (it is 
important to do the check like this, because otherwise it is not guaranteed that 
it really works, especially in newer Java versions!)
- if it is such a locale, save the original and switch to Locale.ROOT while still 
in a single-threaded environment (this is why it should be done in the main launcher)
- execute a fake UNIX command, like /bin/true. You can also execute nothing 
meaningful; the call is just there to statically initialize the broken UNIXProcess 
class. Once it is initialized correctly, it works
- switch back to the saved locale

 ExternalParser should trap/ignore/workarround JDK-8047340  JDK-8055301 so 
 Turkish Tika users can still use non-external parsers
 

 Key: TIKA-1526
 URL: https://issues.apache.org/jira/browse/TIKA-1526
 Project: Tika
  Issue Type: Wish
Reporter: Hoss Man

 the JDK has numerous pain points regarding the Turkish locale, posix_spawn 
 lowercasing being one of them...
 https://bugs.openjdk.java.net/browse/JDK-8047340
 https://bugs.openjdk.java.net/browse/JDK-8055301
 As of Tika 1.7, the TesseractOCRParser (which is an ExternalParser) is 
 enabled and configured by default in Tika, and uses ExternalParser.check to see 
 if tesseract is available; but because of the JDK bug, this means that Tika 
 fails fast for Turkish users on BSD/UNIX variants (including Mac OS X), like 
 so...
 {noformat}
   [junit4] Throwable #1: java.lang.Error: posix_spawn is not a supported process launch mechanism on this platform.
   [junit4]   at java.lang.UNIXProcess$1.run(UNIXProcess.java:105)
   [junit4]   at java.lang.UNIXProcess$1.run(UNIXProcess.java:94)
   [junit4]   at java.security.AccessController.doPrivileged(Native Method)
   [junit4]   at java.lang.UNIXProcess.<clinit>(UNIXProcess.java:92)
   [junit4]   at java.lang.ProcessImpl.start(ProcessImpl.java:130)
   [junit4]   at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
   [junit4]   at java.lang.Runtime.exec(Runtime.java:620)
   [junit4]   at java.lang.Runtime.exec(Runtime.java:485)
   [junit4]   at org.apache.tika.parser.external.ExternalParser.check(ExternalParser.java:344)
   [junit4]   at org.apache.tika.parser.ocr.TesseractOCRParser.hasTesseract(TesseractOCRParser.java:117)
   [junit4]   at org.apache.tika.parser.ocr.TesseractOCRParser.getSupportedTypes(TesseractOCRParser.java:90)
   [junit4]   at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81)
   [junit4]   at org.apache.tika.parser.DefaultParser.getParsers(DefaultParser.java:95)
   [junit4]   at org.apache.tika.parser.CompositeParser.getSupportedTypes(CompositeParser.java:229)
   [junit4]   at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81)
   [junit4]   at org.apache.tika.parser.CompositeParser.getParser(CompositeParser.java:209)
   [junit4]   at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
   [junit4]   at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
 {noformat}
 ...unless they go out of their way to whitelist only the parsers they 
 need/want, so that TesseractOCRParser (and any other ExternalParsers) will never 
 even be check()ed.
 It would be nice if Tika's ExternalParser class added a hack/workaround similar 
 to what was done in SOLR-6387 to trap these types of errors. 
  In Solr we just propagate a better error explaining why Java hates the 
 Turkish language...
 {code}
 } catch (Error err) {
   // Trap only the JDK locale bug (posix_spawn / UNIXProcess initialization failure)
   if (err.getMessage() != null && (err.getMessage().contains("posix_spawn")
       || err.getMessage().contains("UNIXProcess"))) {
     log.warn("Error forking command due to JVM locale bug "
         + "(see https://issues.apache.org/jira/browse/SOLR-6387): " + err.getMessage());
     return "(error executing: " + cmd + ")";
   }
 }
 {code}
 ...but with Tika, it might be better for all ExternalParsers to just opt 
 out, as if they don't recognize the file type, when they detect this type of 
 error from the check method (or perhaps it would be better if 
 AutoDetectParser handled this? ... I'm not really sure how it would best fit 
 into Tika's 
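A minimal sketch of the opt-out idea discussed above, assuming a hypothetical probe method (this is not Tika's actual ExternalParser.check()). It traps the same Error messages the SOLR-6387 workaround looks for and returns false, so a caller could treat the external parser as unavailable instead of failing. It launches through ProcessBuilder rather than Runtime.exec, which is an illustrative choice. After the first failed static initialization, later launches typically fail with a NoClassDefFoundError whose message names UNIXProcess, which the same message check also matches.

```java
import java.io.IOException;
import java.io.InputStream;

// Hypothetical probe, not part of Tika's API.
final class ExternalCommandProbe {
    // Returns true if the command could be launched and ran to completion.
    static boolean isAvailable(String... command) {
        try {
            Process process = new ProcessBuilder(command)
                    .redirectErrorStream(true)
                    .start();
            try (InputStream out = process.getInputStream()) {
                byte[] buf = new byte[4096];
                while (out.read(buf) != -1) {
                    // discard output so the child cannot block on a full pipe
                }
            }
            process.waitFor();
            return true;
        } catch (IOException e) {
            return false; // command not found or not executable
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (Error err) {
            String msg = err.getMessage();
            if (msg != null && (msg.contains("posix_spawn") || msg.contains("UNIXProcess"))) {
                return false; // JDK-8047340 / JDK-8055301: opt out quietly
            }
            throw err; // any other Error is a real problem
        }
    }
}
```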

[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on

2015-01-23 Thread Tim Allison (JIRA)

[ 
https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289362#comment-14289362
 ] 

Tim Allison commented on TIKA-1529:
---

Makes sense.  I'll try to fix the causes for failure now so that when/if we can 
turn it back on, there won't be much work.

 Turn forbidden-apis back on
 ---

 Key: TIKA-1529
 URL: https://issues.apache.org/jira/browse/TIKA-1529
 Project: Tika
  Issue Type: Bug
Reporter: Tim Allison
Priority: Minor

 [~thetaphi] recently noticed that forbidden-apis was turned off in r1624185, 
 and he submitted a patch to the dev list.  Let's turn it back on.


