RE: Forbidden-APIS no longer ran because of carzy POM change
Hi,

I did further investigation. I had the plugin disabled in my Eclipse (you can do this in the quick fix for the whole workspace). In fact, if you remove the disable setting, it also fails in Eclipse Luna.

If we want to make the plugin automatically hidden to all Eclipse versions through our own POM file, this is what the quick fix also allows you to do for the current project:

    <pluginManagement>
      <plugins>
        <!--This plugin's configuration is used to store Eclipse m2e settings only. It has no influence on the Maven build itself.-->
        <plugin>
          <groupId>org.eclipse.m2e</groupId>
          <artifactId>lifecycle-mapping</artifactId>
          <version>1.0.0</version>
          <configuration>
            <lifecycleMappingMetadata>
              <pluginExecutions>
                <pluginExecution>
                  <pluginExecutionFilter>
                    <groupId>de.thetaphi</groupId>
                    <artifactId>forbiddenapis</artifactId>
                    <versionRange>[1.0,)</versionRange>
                    <goals>
                      <goal>check</goal>
                      <goal>testCheck</goal>
                    </goals>
                  </pluginExecutionFilter>
                  <action>
                    <ignore/>
                  </action>
                </pluginExecution>
              </pluginExecutions>
            </lifecycleMappingMetadata>
          </configuration>
        </plugin>
      </plugins>
    </pluginManagement>

This can be put into tika-parent's POM.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Friday, January 23, 2015 5:18 PM
To: dev@tika.apache.org
Subject: RE: Forbidden-APIS no longer ran because of carzy POM change

Hi,

this may also help, it also brings the needed information:
https://www.eclipse.org/m2e/documentation/m2e-execution-not-covered.html

In fact the problem is: Eclipse has no idea how this plugin should be executed internally in Eclipse. But as this is just a check plugin that does not affect the build output at all, you can leave it disabled.

If you scroll down, you see that Eclipse 4.2+ fixes this problem: Disable the plugin for Maven using Project properties - Maven - Lifecycle mappings - ignore

Uwe

-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
Sent: Friday, January 23, 2015 4:13 PM
To: dev@tika.apache.org
Subject: Re: Forbidden-APIS no longer ran because of carzy POM change

Hi Uwe,

Thanks. I will check it out. Like I said, I’m not OK reverting anything if my Eclipse keeps complaining at me so we’ll need a fix that handles both.

Let me try with the latest version of Eclipse and m2e and see if (with your patch) the issue goes away.

Cheers,
Chris

++++++++++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++++++++++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++++++++++

-----Original Message-----
From: Uwe Schindler <u...@thetaphi.de>
Reply-To: dev@tika.apache.org
Date: Friday, January 23, 2015 at 3:59 AM
To: dev@tika.apache.org
Subject: RE: Forbidden-APIS no longer ran because of carzy POM change

Here is the explanation why the plugin is no longer called because of this:

- Works for me too, but can anyone explain why? – Andrew Swan May 15 '13 at 6:26
- @Andrew I think this works because m2e is not looking for plugins in pluginManagement, but only in build/plugins. In the Maven world, there is a difference between
[jira] [Commented] (TIKA-1521) Handle password protected 7zip files
[ https://issues.apache.org/jira/browse/TIKA-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289605#comment-14289605 ]

Nick Burch commented on TIKA-1521:
----------------------------------

All unit tests (including that one) pass just fine on my system, after a mvn clean, so I'm not sure why it isn't working for you or Jenkins?

Handle password protected 7zip files
------------------------------------

Key: TIKA-1521
URL: https://issues.apache.org/jira/browse/TIKA-1521
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.7
Reporter: Nick Burch
Fix For: 1.8

While working on TIKA-1028, I notice that while Commons Compress doesn't currently handle decrypting password protected zip files, it does handle password protected 7zip files.

We should therefore add logic into the package parser to spot password protected 7zip files, and fetch the password for them from a PasswordProvider if given.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
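The password hand-off described in the issue could look roughly like the sketch below. This is not Tika's actual implementation: `PasswordSource` and `passwordBytes` are hypothetical names introduced here for illustration and are not Tika's `PasswordProvider` API. The one external fact assumed is that Commons Compress's `SevenZFile` expects the password as a byte array in UTF-16LE encoding, as its Javadoc states.

```java
import java.nio.charset.StandardCharsets;

public class SevenZPasswordSketch {

    /** Hypothetical stand-in for a password callback; not Tika's PasswordProvider. */
    public interface PasswordSource {
        String getPassword(String resourceName);
    }

    /**
     * Returns the password encoded as UTF-16LE bytes, or null when no source
     * or no password is available (the caller would then open the archive
     * as unencrypted).
     */
    public static byte[] passwordBytes(PasswordSource source, String resourceName) {
        if (source == null) {
            return null;
        }
        String password = source.getPassword(resourceName);
        return password == null ? null : password.getBytes(StandardCharsets.UTF_16LE);
    }

    public static void main(String[] args) {
        // Hypothetical fixed password, for demonstration only.
        PasswordSource fixed = name -> "secret";
        byte[] encoded = passwordBytes(fixed, "test.7z");
        System.out.println(encoded.length); // two bytes per character in UTF-16LE
    }
}
```

Returning null instead of throwing keeps the "if given" part of the issue cheap: a parser can always call the helper and only pass a password along when one actually exists.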
RE: Forbidden-APIS no longer ran because of carzy POM change
I will add this to the documentation page of forbidden-apis. This may also help Elasticsearch and other people :-)

Uwe

-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
Sent: Friday, January 23, 2015 8:11 PM
To: dev@tika.apache.org
Subject: Re: Forbidden-APIS no longer ran because of carzy POM change

awesome. Thanks Uwe. Tim you want to put that in, or you want me to?
Re: Forbidden-APIS no longer ran because of carzy POM change
awesome. Thanks Uwe. Tim you want to put that in, or you want me to?

Chris Mattmann, Ph.D.
RE: Forbidden-APIS no longer ran because of carzy POM change
Uwe,

To confirm: we need to add this <pluginManagement>...</pluginManagement> block in full, exactly as it is, to the parent pom.xml, and we should not put the plugin under our regular plugins (which no longer have pluginManagement)?

-----Original Message-----
From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Friday, January 23, 2015 11:47 AM
To: dev@tika.apache.org
Subject: RE: Forbidden-APIS no longer ran because of carzy POM change
[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289932#comment-14289932 ]

Hudson commented on TIKA-1529:
------------------------------

UNSTABLE: Integrated in tika-trunk-jdk1.7 #449 (See [https://builds.apache.org/job/tika-trunk-jdk1.7/449/])
TIKA-1529: turn forbidden-apis back on and clean up all mentions of UTF-8 (tallison: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1654351)
* /tika/trunk/tika-app/src/main/java/org/apache/tika/cli/TikaCLI.java
* /tika/trunk/tika-app/src/main/java/org/apache/tika/gui/TikaGUI.java
* /tika/trunk/tika-app/src/test/java/org/apache/tika/cli/TikaCLITest.java
* /tika/trunk/tika-bundle/src/test/java/org/apache/tika/bundle/BundleIT.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/config/ServiceLoader.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/detect/MagicDetector.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/detect/NameDetector.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/embedder/ExternalEmbedder.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/fork/ForkClient.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/io/IOUtils.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/language/LanguageIdentifier.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/language/LanguageProfilerBuilder.java
* /tika/trunk/tika-core/src/main/java/org/apache/tika/parser/external/ExternalParser.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/detect/TextDetectorTest.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/io/TailStreamTest.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/io/TikaInputStreamTest.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/language/LanguageIdentifierTest.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/language/LanguageProfilerBuilderTest.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/mime/MimeDetectionTest.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/sax/BasicContentHandlerFactoryTest.java
* /tika/trunk/tika-core/src/test/java/org/apache/tika/sax/BodyContentHandlerTest.java
* /tika/trunk/tika-example/src/main/java/org/apache/tika/example/DumpTikaConfigExample.java
* /tika/trunk/tika-parent/pom.xml
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/chm/accessor/ChmDirectoryListingSet.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/chm/accessor/ChmItsfHeader.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/chm/accessor/ChmItspHeader.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/chm/accessor/ChmLzxcControlData.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/chm/accessor/ChmPmgiHeader.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/chm/accessor/ChmPmglHeader.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/chm/core/ChmConstants.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/chm/core/ChmExtractor.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/epub/EpubParser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/gdal/GDALParser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/image/ImageMetadataExtractor.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/image/xmp/JempboxExtractor.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/iptc/IptcAnpaParser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mat/MatParser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mbox/OutlookPSTParser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/microsoft/WordExtractor.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/mp3/LyricsHandler.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRConfig.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/ocr/TesseractOCRParser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/odf/OpenDocumentParser.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/pkg/ZipContainerDetector.java
* /tika/trunk/tika-parsers/src/main/java/org/apache/tika/parser/video/FLVParser.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/embedder/ExternalEmbedderTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/AutoDetectParserTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/ParsingReaderTest.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/chm/TestChmBlockInfo.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/chm/TestChmExtraction.java
* /tika/trunk/tika-parsers/src/test/java/org/apache/tika/parser/chm/TestChmItspHeader.java
*
RE: Forbidden-APIS no longer ran because of carzy POM change
Will do.

-----Original Message-----
From: Mattmann, Chris A (3980) [mailto:chris.a.mattm...@jpl.nasa.gov]
Sent: Friday, January 23, 2015 2:11 PM
To: dev@tika.apache.org
Subject: Re: Forbidden-APIS no longer ran because of carzy POM change

awesome. Thanks Uwe. Tim you want to put that in, or you want me to?
RE: Forbidden-APIS no longer ran because of carzy POM change
Hi Timothy,

Your commit looks fine. Basically, this pluginManagement section just contains a fake plugin that is never actively executed, but is used by Eclipse to detect which plugins map to internal lifecycles of the Eclipse IDE. Eclipse uses this to decide, for example, how to execute the Maven compiler plugin inside Eclipse (use the ECJ compiler) or to let the surefire plugin map to the internal Eclipse test runner. Our addition through the parent POM just tells Eclipse how to map the forbidden-apis plugin: to *nothing*, just ignore it inside the Eclipse IDE.

I will try it with Eclipse later, to make sure all is fine. But it looks good to me. I have not yet tried the setup with parent POMs, but I assume this should be fine.

Uwe

-----Original Message-----
From: Allison, Timothy B. [mailto:talli...@mitre.org]
Sent: Friday, January 23, 2015 8:35 PM
To: dev@tika.apache.org
Subject: RE: Forbidden-APIS no longer ran because of carzy POM change
[jira] [Resolved] (TIKA-1529) Turn forbidden-apis back on
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Allison resolved TIKA-1529.
------------------------------
Resolution: Fixed

Fixes made in r1654351. Let me know if there are any surprises.

Turn forbidden-apis back on
---------------------------

Key: TIKA-1529
URL: https://issues.apache.org/jira/browse/TIKA-1529
Project: Tika
Issue Type: Bug
Reporter: Tim Allison
Priority: Minor

[~thetaphi] recently noticed that forbidden-apis was turned off in r1624185, and he submitted a patch to the dev list. Let's turn it back on.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
RE: Forbidden-APIS no longer ran because of carzy POM change
Works fine here! After I removed the manual override of the plugin lifecycle settings, restarted Eclipse, and ran a Maven update, Tika built successfully. So the setting in the parent POM is enough.

I will update the forbidden-apis documentation to help others, too.

Uwe

-----Original Message-----
From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Friday, January 23, 2015 9:08 PM
To: dev@tika.apache.org
Subject: RE: Forbidden-APIS no longer ran because of carzy POM change
Re: Forbidden-APIS no longer ran because of carzy POM change
Hi, Uwe.

There are several places where forbiddenapis will give errors in Tika. I don't know if there is a better way to fall back. E.g. in one of the chm parser classes:

    try {
        dle.setName(new String(bytes, "UTF-8"));
    } catch (UnsupportedEncodingException e) {
        dle.setName(new String(bytes));
    }

Can you add special annotation parsing (like @SuppressWarnings("forbiddenapis") on an element) to avoid emitting a build error in special cases like the one mentioned above?

--
Best regards,
Konstantin Gribov
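The charset fallback asked about in this message can often be avoided altogether. The following is only a minimal sketch, assuming Java 7 or later is available (an assumption, not something stated in the thread); the class and method names are hypothetical and are not Tika code. Passing a `Charset` object instead of a charset name means the constructor declares no checked exception, so there is no catch block that would need to fall back to the platform default charset.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not part of Tika.
class Utf8NameSketch {

    /** Decodes raw bytes as UTF-8 without consulting the platform default charset. */
    public static String decodeName(byte[] bytes) {
        // String(byte[], Charset) declares no checked exception,
        // so no try/catch and no default-charset fallback is needed.
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] raw = {(byte) 0xC3, (byte) 0xA9}; // UTF-8 encoding of U+00E9
        System.out.println(decodeName(raw).length());
    }
}
```

On a Java 6 baseline, `Charset.forName("UTF-8")` with the `String(byte[], Charset)` constructor should serve the same purpose, since it also avoids the checked exception.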
[jira] [Commented] (TIKA-1511) Create a parser for SQLite3
[ https://issues.apache.org/jira/browse/TIKA-1511?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290303#comment-14290303 ]

Tim Allison commented on TIKA-1511:
-----------------------------------

I'm not sure I understand the need for that. Won't you be able to send in whatever handler you want via the regular call to parse and by attaching a ParsingEmbeddedDocumentExtractor? What, exactly, do you want to have when Tika has finished processing the Sqlite file?

Create a parser for SQLite3
---------------------------

Key: TIKA-1511
URL: https://issues.apache.org/jira/browse/TIKA-1511
Project: Tika
Issue Type: New Feature
Components: parser
Affects Versions: 1.6
Reporter: Luis Filipe Nassif
Fix For: 1.8
Attachments: TIKA-1511v1.patch, TIKA-1511v2.patch, TIKA-1511v3.patch, testSQLLite3b.db, testSQLLite3b.db

I think it would be very useful, as sqlite is used as data storage by a wide range of applications. Opening the ticket to track it.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-1521) Handle password protected 7zip files
[ https://issues.apache.org/jira/browse/TIKA-1521?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290310#comment-14290310 ]

Tim Allison commented on TIKA-1521:
-----------------------------------

I'm getting the test failure on Windows with Java 1.8, but all is well with a fairly old update of 1.7 on RHEL.

Handle password protected 7zip files
------------------------------------

Key: TIKA-1521
URL: https://issues.apache.org/jira/browse/TIKA-1521
Project: Tika
Issue Type: Improvement
Components: parser
Affects Versions: 1.7
Reporter: Nick Burch
Fix For: 1.8

While working on TIKA-1028, I noticed that while Commons Compress doesn't currently handle decrypting password protected zip files, it does handle password protected 7zip files.

We should therefore add logic into the package parser to spot password protected 7zip files, and fetch the password for them from a PasswordProvider if given.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14290304#comment-14290304 ]

Tim Allison commented on TIKA-1529:
-----------------------------------

Thank you, [~thetaphi]!

Turn forbidden-apis back on
---------------------------

Key: TIKA-1529
URL: https://issues.apache.org/jira/browse/TIKA-1529
Project: Tika
Issue Type: Bug
Reporter: Tim Allison
Priority: Minor

[~thetaphi] recently noticed that forbidden-apis was turned off in r1624185, and he submitted a patch to the dev list. Let's turn it back on.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
RE: Forbidden-APIS no longer ran because of carzy POM change
The attached patch reverts the change and updates the forbidden plugin.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
[jira] [Commented] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 JDK-8055301 so Turkish Tika users can still use non-external parsers
[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289172#comment-14289172 ]

Konstantin Gribov commented on TIKA-1526:
-----------------------------------------

[~thetaphi], I understand that this is a JDK bug with the {{{tr}}} locale. Can they use some workaround with {{{Locale.setDefault}}} if the user's locale is {{{tr}}}?

ExternalParser should trap/ignore/workarround JDK-8047340 JDK-8055301 so Turkish Tika users can still use non-external parsers
-------------------------------------------------------------------------------------------------------------------------------

Key: TIKA-1526
URL: https://issues.apache.org/jira/browse/TIKA-1526
Project: Tika
Issue Type: Wish
Reporter: Hoss Man

the JDK has numerous pain points regarding the Turkish locale, posix_spawn lowercasing being one of them...

https://bugs.openjdk.java.net/browse/JDK-8047340
https://bugs.openjdk.java.net/browse/JDK-8055301

As of Tika 1.7, the TesseractOCRParser (which is an ExternalParser) is enabled & configured by default in Tika, and uses ExternalParser.check to see if tesseract is available -- but because of the JDK bug, this means that Tika fails fast for Turkish users on BSD/UNIX variants (including MacOSX) like so...

{noformat}
[junit4] Throwable #1: java.lang.Error: posix_spawn is not a supported process launch mechanism on this platform.
[junit4] at java.lang.UNIXProcess$1.run(UNIXProcess.java:105)
[junit4] at java.lang.UNIXProcess$1.run(UNIXProcess.java:94)
[junit4] at java.security.AccessController.doPrivileged(Native Method)
[junit4] at java.lang.UNIXProcess.<clinit>(UNIXProcess.java:92)
[junit4] at java.lang.ProcessImpl.start(ProcessImpl.java:130)
[junit4] at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
[junit4] at java.lang.Runtime.exec(Runtime.java:620)
[junit4] at java.lang.Runtime.exec(Runtime.java:485)
[junit4] at org.apache.tika.parser.external.ExternalParser.check(ExternalParser.java:344)
[junit4] at org.apache.tika.parser.ocr.TesseractOCRParser.hasTesseract(TesseractOCRParser.java:117)
[junit4] at org.apache.tika.parser.ocr.TesseractOCRParser.getSupportedTypes(TesseractOCRParser.java:90)
[junit4] at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81)
[junit4] at org.apache.tika.parser.DefaultParser.getParsers(DefaultParser.java:95)
[junit4] at org.apache.tika.parser.CompositeParser.getSupportedTypes(CompositeParser.java:229)
[junit4] at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81)
[junit4] at org.apache.tika.parser.CompositeParser.getParser(CompositeParser.java:209)
[junit4] at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
[junit4] at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
{noformat}

...unless they go out of their way to white list only the parsers they need/want so TesseractOCRParser (and any other ExternalParsers) will never even be check()ed.

It would be nice if Tika's ExternalParser class added a similar hack/workaround to what was done in SOLR-6387 to trap these types of errors. In Solr we just propagate a better error explaining why Java hates the Turkish language...

{code}
} catch (Error err) {
  if (err.getMessage() != null && (err.getMessage().contains("posix_spawn") || err.getMessage().contains("UNIXProcess"))) {
    log.warn("Error forking command due to JVM locale bug (see https://issues.apache.org/jira/browse/SOLR-6387): " + err.getMessage());
    return "(error executing: " + cmd + ")";
  }
}
{code}

...but with Tika, it might be better for all ExternalParsers to just opt out as if they don't recognize the filetype when they detect this type of error from the check method (or perhaps it would be better if AutoDetectParser handled this? ... i'm not really sure how it would best fit into Tika's architecture)

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
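The "opt out" idea from the description above could look roughly like the following. This is only a sketch under stated assumptions: `ExternalCheckSketch`, `Launcher`, and `checkAvailable` are hypothetical names invented for illustration and are not Tika's actual `ExternalParser` API; the message test simply mirrors the Solr snippet quoted in the issue.

```java
// Hypothetical sketch; not Tika's ExternalParser implementation.
class ExternalCheckSketch {

    /** Hypothetical stand-in for "run the external command and return its exit code". */
    interface Launcher {
        int launch() throws Exception;
    }

    /** Same message test as the Solr snippet quoted in the issue description. */
    public static boolean isProcessLaunchBug(Error err) {
        String msg = err.getMessage();
        return msg != null && (msg.contains("posix_spawn") || msg.contains("UNIXProcess"));
    }

    /** Reports the tool as unavailable instead of letting the JVM launch Error escape. */
    public static boolean checkAvailable(Launcher launcher) {
        try {
            return launcher.launch() == 0;
        } catch (Error err) {
            if (isProcessLaunchBug(err)) {
                return false; // opt out, as if the external tool were not installed
            }
            throw err; // unrelated Errors must still propagate
        } catch (Exception e) {
            return false; // the command could not be started at all
        }
    }
}
```

Rethrowing every Error that does not match keeps genuinely fatal conditions such as OutOfMemoryError visible; only the specific launch failure is converted into "not available".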
[jira] [Commented] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 JDK-8055301 so Turkish Tika users can still use non-external parsers
[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289125#comment-14289125 ]

Uwe Schindler commented on TIKA-1526:
-------------------------------------

[~grossws]: This bug is not in Maven itself; the problem here is an unsolved bug in the JDK itself. Maven is perfectly fine, but because of the JDK bug, Maven cannot spawn external processes.

ExternalParser should trap/ignore/workarround JDK-8047340 JDK-8055301 so Turkish Tika users can still use non-external parsers
-------------------------------------------------------------------------------------------------------------------------------

Key: TIKA-1526
URL: https://issues.apache.org/jira/browse/TIKA-1526
Project: Tika
Issue Type: Wish
Reporter: Hoss Man

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
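The locale sensitivity behind this issue, and behind the forbidden-apis findings on String#toLowerCase() reported elsewhere in this thread, can be reproduced without changing the JVM's default locale. The sketch below is illustrative only; the class and method names are hypothetical. It relies on the documented Turkish casing rules, under which "I" lowercases to a dotless i (U+0131) and "i" uppercases to a dotted capital I (U+0130).

```java
import java.util.Locale;

// Hypothetical demo class for illustration; not part of Tika or the JDK.
class TurkishCasingDemo {

    /** Locale-independent lowercasing, as the forbidden-apis message suggests. */
    public static String lowerRoot(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    /** What a no-argument toLowerCase() does when the default locale is Turkish. */
    public static String lowerTurkish(String s) {
        return s.toLowerCase(Locale.forLanguageTag("tr"));
    }

    /** What a no-argument toUpperCase() does when the default locale is Turkish. */
    public static String upperTurkish(String s) {
        return s.toUpperCase(Locale.forLanguageTag("tr"));
    }

    public static void main(String[] args) {
        // Comparisons are printed as booleans to avoid console encoding issues.
        System.out.println(lowerRoot("TITLE").equals("title"));
        System.out.println(lowerTurkish("TITLE").equals("title"));
        System.out.println(upperTurkish("posix_spawn").equals("POSIX_SPAWN"));
    }
}
```

Passing an explicit locale such as Locale.ROOT to every case conversion of identifiers, keys, and protocol strings makes the result independent of the user's settings.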
RE: Forbidden-APIS no longer ran because of carzy POM change
Here is the explanation why the plugin is no longer called because of this:

- Works for me too, but can anyone explain why? – Andrew Swan May 15 '13 at 6:26
- @Andrew I think this works because m2e is not looking for plugins in <pluginManagement>, but only in <build>/<plugins>. In the Maven world, there is a difference between the two - the former defines "if you happen to use this plugin, here's the configuration to use", whereas the latter states "use this plugin". See this post and its top two answers. – GreenGiant Jul 5 '13 at 17:52
- I agree with @GreenGiant. I tried this solution but it then breaks the compilation since the aspectj plugin is not called before compilation. – Pierre Aug 30 '13 at 20:21

This explains the change. In fact, placing the plugins in pluginManagement disables them unless they are explicitly configured in a sub-module. So this commit should be reverted. In fact the bug described here no longer applies to later M2E installations. It still complains about plugins that Eclipse does not know about, but this does not prevent you from using Eclipse.

So I would strongly ask to revert the commit because it breaks the build.

Uwe

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Friday, January 23, 2015 12:11 PM
To: dev@tika.apache.org
Subject: Forbidden-APIS no longer ran because of carzy POM change

Hi,

I just noticed, while checking the problems around the ExternalParsers, that TIKA's build no longer runs the forbidden-apis Maven plugin, so we got a few new violations, especially regarding toUpperCase()/toLowerCase(). In fact the following commit broke this:

Revision: 1624185
Author: mattmann
Date: Thursday, 11 September 2014 05:11:19
Message: surround in plugin management to resolve http://stackoverflow.com/questions/6352208/how-to-solve-plugin-execution-not-covered-by-lifecycle-configuration-for-sprin
Modified : /tika/trunk/tika-parent/pom.xml

Since that change, the plugin is no longer run by default. I have no idea why this is so, but in fact this broke some of the globally defined check tasks. I have no idea how to re-enable it easily. So I cannot help, but reverting that commit restores the behavior.

What is the reason for this commit? There is not even an issue about it. I think it is a workaround for some Eclipse issue, but in fact it disables the plugins entirely. To re-enable forbidden-apis you now have to explicitly enable it in every module (because pluginManagement just gives the config of a plugin, whereas declaring it outside pluginManagement also enables its execution).

In addition, there is already version 1.7 of forbiddenapis, so you can replace 1.6.1 of forbidden-apis with version 1.7 (which fixes a few bugs with Java 8 and Java 9).

The following new violations were found - and in fact those broke code in the Turkish locale:

[INFO] ------------------------------------------------------------------------
[INFO] Building Apache Tika core 1.8-SNAPSHOT
[INFO] ------------------------------------------------------------------------
[INFO]
[INFO] --- forbiddenapis:1.7:testCheck (default) @ tika-core ---
[INFO] Scanning for classes to check...
[INFO] Reading bundled API signatures: jdk-unsafe
[INFO] Reading bundled API signatures: jdk-deprecated
[INFO] Loading classes to check...
[INFO] Scanning for API signatures and dependencies...
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR] in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:79)
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR] in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:80)
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR] in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:88)
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR] in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:133)
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR] in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:176)
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR] in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:221)
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR] in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:273)
[ERROR] Scanned 52 (and 331 related) class file(s) for forbidden
RE: Forbidden-APIS no longer ran because of carzy POM change
Here is the patch, mailing list swallowed it:

Index: tika-parent/pom.xml
===================================================================
--- tika-parent/pom.xml (revision 1654171)
+++ tika-parent/pom.xml (working copy)
@@ -274,7 +274,6 @@
   </properties>
 
   <build>
-    <pluginManagement>
     <plugins>
       <plugin>
         <artifactId>maven-compiler-plugin</artifactId>
@@ -287,7 +286,7 @@
       <plugin>
         <groupId>de.thetaphi</groupId>
         <artifactId>forbiddenapis</artifactId>
-        <version>1.6.1</version>
+        <version>1.7</version>
         <configuration>
           <targetVersion>${maven.compiler.target}</targetVersion>
           <internalRuntimeForbidden>true</internalRuntimeForbidden>
@@ -322,7 +321,6 @@
         <version>2.3</version>
       </plugin>
     </plugins>
-    </pluginManagement>
   </build>
 
   <profiles>

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
[jira] [Comment Edited] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 JDK-8055301 so Turkish Tika users can still use non-external parsers
[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289172#comment-14289172 ]

Konstantin Gribov edited comment on TIKA-1526 at 1/23/15 12:24 PM:
---
[~thetaphi], I understand that this is a JDK bug with the {{tr}} locale. Can they use some workaround with {{Locale.setDefault}} if the user's locale is {{tr}}?

was (Author: grossws):
[~thetaphi], I understand that this is jdk bug with {{{tr}}} locale. Can they use some workaround with {{{Locale.setDefault}}} if user's locale is {{{tr}}}?

ExternalParser should trap/ignore/workarround JDK-8047340 JDK-8055301 so Turkish Tika users can still use non-external parsers

Key: TIKA-1526
URL: https://issues.apache.org/jira/browse/TIKA-1526
Project: Tika
Issue Type: Wish
Reporter: Hoss Man

The JDK has numerous pain points regarding the Turkish locale, posix_spawn lowercasing being one of them...
https://bugs.openjdk.java.net/browse/JDK-8047340
https://bugs.openjdk.java.net/browse/JDK-8055301

As of Tika 1.7, the TesseractOCRParser (which is an ExternalParser) is enabled and configured by default in Tika, and uses ExternalParser.check to see if tesseract is available -- but because of the JDK bug, this means that Tika fails fast for Turkish users on BSD/UNIX variants (including MacOSX) like so...

{noformat}
[junit4] Throwable #1: java.lang.Error: posix_spawn is not a supported process launch mechanism on this platform.
[junit4] at java.lang.UNIXProcess$1.run(UNIXProcess.java:105)
[junit4] at java.lang.UNIXProcess$1.run(UNIXProcess.java:94)
[junit4] at java.security.AccessController.doPrivileged(Native Method)
[junit4] at java.lang.UNIXProcess.<clinit>(UNIXProcess.java:92)
[junit4] at java.lang.ProcessImpl.start(ProcessImpl.java:130)
[junit4] at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
[junit4] at java.lang.Runtime.exec(Runtime.java:620)
[junit4] at java.lang.Runtime.exec(Runtime.java:485)
[junit4] at org.apache.tika.parser.external.ExternalParser.check(ExternalParser.java:344)
[junit4] at org.apache.tika.parser.ocr.TesseractOCRParser.hasTesseract(TesseractOCRParser.java:117)
[junit4] at org.apache.tika.parser.ocr.TesseractOCRParser.getSupportedTypes(TesseractOCRParser.java:90)
[junit4] at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81)
[junit4] at org.apache.tika.parser.DefaultParser.getParsers(DefaultParser.java:95)
[junit4] at org.apache.tika.parser.CompositeParser.getSupportedTypes(CompositeParser.java:229)
[junit4] at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81)
[junit4] at org.apache.tika.parser.CompositeParser.getParser(CompositeParser.java:209)
[junit4] at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
[junit4] at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
{noformat}

...unless they go out of their way to whitelist only the parsers they need/want so TesseractOCRParser (and any other ExternalParsers) will never even be check()ed.

It would be nice if Tika's ExternalParser class added a similar hack/workaround to what was done in SOLR-6387 to trap these types of errors. In Solr we just propagate a better error explaining why Java hates the Turkish language...

{code}
} catch (Error err) {
  if (err.getMessage() != null && (err.getMessage().contains("posix_spawn") || err.getMessage().contains("UNIXProcess"))) {
    log.warn("Error forking command due to JVM locale bug (see https://issues.apache.org/jira/browse/SOLR-6387): " + err.getMessage());
    return "(error executing: " + cmd + ")";
  }
}
{code}

...but with Tika, it might be better for all ExternalParsers to just opt out, as if they don't recognize the filetype, when they detect this type of error from the check method (or perhaps it would be better if AutoDetectParser handled this? ... I'm not really sure how it would best fit into Tika's architecture)

-- This message was sent by Atlassian JIRA (v6.3.4#6332)
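For illustration only, the "opt out" idea from the issue description could look roughly like the sketch below. This is not Tika's actual ExternalParser code: the class name ExternalCheckSketch and both method names are made up for this example, and the only thing taken from the thread is the message test used in SOLR-6387.

```java
import java.io.IOException;

/** Hypothetical helper; a sketch of the opt-out idea, not Tika's real ExternalParser. */
final class ExternalCheckSketch {

    /** True if the Error message matches the checks quoted from SOLR-6387. */
    static boolean isLocaleSpawnBug(Error err) {
        String msg = err.getMessage();
        return msg != null && (msg.contains("posix_spawn") || msg.contains("UNIXProcess"));
    }

    /**
     * Returns true only if the command could be started and exited with status 0.
     * Assumes the command writes little output; a real check would drain the streams.
     */
    static boolean check(String... command) {
        try {
            Process process = Runtime.getRuntime().exec(command);
            return process.waitFor() == 0;
        } catch (IOException e) {
            return false; // command missing or not executable
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } catch (Error err) {
            if (isLocaleSpawnBug(err)) {
                return false; // behave as if the external tool is unavailable
            }
            throw err; // any other Error is not ours to swallow
        }
    }

    public static void main(String[] args) {
        System.out.println(check("tika-sketch-no-such-command"));
    }
}
```

With something like this, a parser whose check fails for the locale reason would simply report no supported types, so the remaining non-external parsers stay usable. Whether that belongs in ExternalParser or in AutoDetectParser is the open question from the issue.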
[jira] [Commented] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 JDK-8055301 so Turkish Tika users can still use non-external parsers
[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288963#comment-14288963 ] Uwe Schindler commented on TIKA-1526: - I tried it with maven, but this is all too funny. This bug also affects Maven... {noformat} [uschindler@lucene ~]$ export MAVEN_OPTS=-Duser.language=tr [uschindler@lucene ~]$ mvn --- constituent[0]: file:/usr/local/share/java/maven3/lib/aether-connector-wagon-1.13.1.jar constituent[1]: file:/usr/local/share/java/maven3/lib/maven-repository-metadata-3.0.4.jar constituent[2]: file:/usr/local/share/java/maven3/lib/plexus-sec-dispatcher-1.3.jar constituent[3]: file:/usr/local/share/java/maven3/lib/aether-spi-1.13.1.jar constituent[4]: file:/usr/local/share/java/maven3/lib/maven-compat-3.0.4.jar constituent[5]: file:/usr/local/share/java/maven3/lib/plexus-component-annotations-1.5.5.jar constituent[6]: file:/usr/local/share/java/maven3/lib/plexus-cipher-1.7.jar constituent[7]: file:/usr/local/share/java/maven3/lib/sisu-guava-0.9.9.jar constituent[8]: file:/usr/local/share/java/maven3/lib/maven-core-3.0.4.jar constituent[9]: file:/usr/local/share/java/maven3/lib/plexus-utils-2.0.6.jar constituent[10]: file:/usr/local/share/java/maven3/lib/wagon-provider-api-2.2.jar constituent[11]: file:/usr/local/share/java/maven3/lib/maven-plugin-api-3.0.4.jar constituent[12]: file:/usr/local/share/java/maven3/lib/maven-model-builder-3.0.4.jar constituent[13]: file:/usr/local/share/java/maven3/lib/maven-settings-3.0.4.jar constituent[14]: file:/usr/local/share/java/maven3/lib/sisu-inject-bean-2.3.0.jar constituent[15]: file:/usr/local/share/java/maven3/lib/wagon-http-2.2-shaded.jar constituent[16]: file:/usr/local/share/java/maven3/lib/maven-aether-provider-3.0.4.jar constituent[17]: file:/usr/local/share/java/maven3/lib/sisu-inject-plexus-2.3.0.jar constituent[18]: file:/usr/local/share/java/maven3/lib/maven-artifact-3.0.4.jar constituent[19]: 
file:/usr/local/share/java/maven3/lib/maven-model-3.0.4.jar constituent[20]: file:/usr/local/share/java/maven3/lib/wagon-file-2.2.jar constituent[21]: file:/usr/local/share/java/maven3/lib/maven-embedder-3.0.4.jar constituent[22]: file:/usr/local/share/java/maven3/lib/sisu-guice-3.1.0-no_aop.jar constituent[23]: file:/usr/local/share/java/maven3/lib/maven-settings-builder-3.0.4.jar constituent[24]: file:/usr/local/share/java/maven3/lib/plexus-interpolation-1.14.jar constituent[25]: file:/usr/local/share/java/maven3/lib/aether-impl-1.13.1.jar constituent[26]: file:/usr/local/share/java/maven3/lib/aether-api-1.13.1.jar constituent[27]: file:/usr/local/share/java/maven3/lib/aether-util-1.13.1.jar constituent[28]: file:/usr/local/share/java/maven3/lib/commons-cli-1.2.jar --- Exception in thread main java.lang.Error: posix_spawn is not a supported process launch mechanism on this platform. at java.lang.UNIXProcess$1.run(UNIXProcess.java:111) at java.lang.UNIXProcess$1.run(UNIXProcess.java:93) at java.security.AccessController.doPrivileged(Native Method) at java.lang.UNIXProcess.clinit(UNIXProcess.java:91) at java.lang.ProcessImpl.start(ProcessImpl.java:130) at java.lang.ProcessBuilder.start(ProcessBuilder.java:1028) at java.lang.Runtime.exec(Runtime.java:617) at java.lang.Runtime.exec(Runtime.java:450) at java.lang.Runtime.exec(Runtime.java:347) at org.codehaus.plexus.interpolation.os.OperatingSystemUtils.getSystemEnvVars(OperatingSystemUtils.java:86) at org.codehaus.plexus.interpolation.EnvarBasedValueSource.getEnvars(EnvarBasedValueSource.java:74) at org.codehaus.plexus.interpolation.EnvarBasedValueSource.init(EnvarBasedValueSource.java:64) at org.codehaus.plexus.interpolation.EnvarBasedValueSource.init(EnvarBasedValueSource.java:50) at org.apache.maven.settings.building.DefaultSettingsBuilder.interpolate(DefaultSettingsBuilder.java:222) at org.apache.maven.settings.building.DefaultSettingsBuilder.build(DefaultSettingsBuilder.java:101) at 
org.apache.maven.cli.MavenCli.settings(MavenCli.java:725) at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:193) at org.apache.maven.cli.MavenCli.main(MavenCli.java:141) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) at java.lang.reflect.Method.invoke(Method.java:606) at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290) at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230) at
Forbidden-APIS no longer ran because of carzy POM change
Hi,

I just noticed while checking the problems around the ExternalParsers that Tika's build no longer runs the forbidden-apis Maven plugin, so we got a few new violations, especially regarding toUpperCase()/toLowerCase(). In fact the following commit broke this:

Revision: 1624185
Author: mattmann
Date: Thursday, September 11, 2014 05:11:19
Message: surround in plugin management to resolve http://stackoverflow.com/questions/6352208/how-to-solve-plugin-execution-not-covered-by-lifecycle-configuration-for-sprin
Modified : /tika/trunk/tika-parent/pom.xml

Since that change, the plugin is no longer run by default. I have no idea why this is the case, but in fact it broke some of the globally defined check tasks. I have no idea how to re-enable it easily, so I cannot help, but reverting that commit restores the behavior.

What is the reason for this commit? There is not even an issue about it. I think it is a workaround for some Eclipse issue, but in fact it disables the whole plugin. To re-enable forbidden-apis you now have to explicitly enable it in every module (because pluginManagement only supplies the configuration of a plugin, whereas declaring the plugin outside of it also enables its execution).

In addition, there is already version 1.7 of forbiddenapis, so you can replace 1.6.1 of forbidden-apis with version 1.7 (which fixes a few bugs with Java 8 and Java 9).

The following new violations were found - and in fact those broke code in the Turkish locale:

[INFO]
[INFO] Building Apache Tika core 1.8-SNAPSHOT
[INFO]
[INFO]
[INFO] --- forbiddenapis:1.7:testCheck (default) @ tika-core ---
[INFO] Scanning for classes to check...
[INFO] Reading bundled API signatures: jdk-unsafe
[INFO] Reading bundled API signatures: jdk-deprecated
[INFO] Loading classes to check...
[INFO] Scanning for API signatures and dependencies...
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR] in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:79)
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR] in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:80)
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR] in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:88)
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR] in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:133)
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR] in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:176)
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR] in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:221)
[ERROR] Forbidden method invocation: java.lang.String#toLowerCase() [Uses default locale]
[ERROR] in org.apache.tika.sax.BasicContentHandlerFactoryTest (BasicContentHandlerFactoryTest.java:273)
[ERROR] Scanned 52 (and 331 related) class file(s) for forbidden API invocations (in 0.16s), 7 error(s).
[INFO]
[...]
[INFO]
[INFO] Building Apache Tika parsers 1.8-SNAPSHOT
[INFO]
[INFO]
[INFO] --- forbiddenapis:1.7:check (default-cli) @ tika-parsers ---
[INFO] Scanning for classes to check...
[INFO] Reading bundled API signatures: jdk-unsafe
[INFO] Reading bundled API signatures: jdk-deprecated
[INFO] Loading classes to check...
[INFO] Scanning for API signatures and dependencies...
[ERROR] Forbidden method invocation: java.io.InputStreamReader#<init>(java.io.InputStream) [Uses default charset]
[ERROR] in org.apache.tika.parser.ocr.TesseractOCRParser$2 (TesseractOCRParser.java:309)
[ERROR] Forbidden method invocation: java.lang.String#<init>(byte[],int,int) [Uses default charset]
[ERROR] in org.apache.tika.parser.chm.accessor.ChmDirectoryListingSet (ChmDirectoryListingSet.java:240)
[ERROR] Forbidden method invocation: java.text.SimpleDateFormat#<init>(java.lang.String) [Uses default locale]
[ERROR] in org.apache.tika.parser.image.ImageMetadataExtractor$ExifHandler$1 (ImageMetadataExtractor.java:304)
[ERROR] Forbidden method invocation: java.lang.String#format(java.lang.String,java.lang.Object[]) [Uses default locale]
[ERROR] in org.apache.tika.parser.ocr.TesseractOCRConfig (TesseractOCRConfig.java:214)
[ERROR] Scanned 281 (and 813 related) class file(s) for forbidden
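To make the "Uses default locale" complaints above concrete, here is a small sketch of why String#toLowerCase() without a Locale is flagged. The class and method names (TurkishCaseSketch, lower, upper) are invented for this example and do not exist in Tika; Locale.forLanguageTag assumes Java 7 or later.

```java
import java.util.Locale;

/** Hypothetical demo class; shows locale-sensitive case mapping of the letter i. */
final class TurkishCaseSketch {

    static final Locale TURKISH = Locale.forLanguageTag("tr-TR");

    /** Lowercases with an explicit locale instead of the JVM default. */
    static String lower(String s, Locale locale) {
        return s.toLowerCase(locale);
    }

    /** Uppercases with an explicit locale instead of the JVM default. */
    static String upper(String s, Locale locale) {
        return s.toUpperCase(locale);
    }

    public static void main(String[] args) {
        // Under Turkish rules 'I' lowercases to dotless U+0131 and 'i' uppercases to dotted U+0130.
        System.out.println(lower("TITLE", TURKISH).equals("t\u0131tle"));
        System.out.println(upper("title", TURKISH).equals("T\u0130TLE"));
        // Locale.ROOT gives the locale-neutral mapping that identifiers and keys expect.
        System.out.println(lower("TITLE", Locale.ROOT).equals("title"));
    }
}
```

Code that calls toLowerCase() with no argument behaves like the first two calls whenever the JVM default locale is Turkish, which is why comparisons against ASCII keys silently stop matching there.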
[jira] [Commented] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 JDK-8055301 so Turkish Tika users can still use non-external parsers
[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14288977#comment-14288977 ]

Konstantin Gribov commented on TIKA-1526:
-
[~thetaphi], they fixed this in 2.0 RC some time ago: http://jira.codehaus.org/browse/MNG-597
As far as I can see, you have Maven 3. Can you create an issue there with a description of your environment?
[jira] [Created] (TIKA-1529) Turn forbidden-apis back on
Tim Allison created TIKA-1529: - Summary: Turn forbidden-apis back on Key: TIKA-1529 URL: https://issues.apache.org/jira/browse/TIKA-1529 Project: Tika Issue Type: Bug Reporter: Tim Allison Priority: Minor [~thetaphi] recently noticed on that forbidden-apis was turned off in r1624185, and he submitted a patch to the dev list. Let's turn it back on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289311#comment-14289311 ]

Tim Allison commented on TIKA-1529:
---
For UnsupportedEncodingException, Lucene/Solr handles this in different ways:

{noformat}
try {
  out = new PrintStream(bos, false, IOUtils.UTF_8);
} catch (UnsupportedEncodingException bogus) {
  throw new RuntimeException(bogus);
}
{noformat}

or

{noformat}
} catch (UnsupportedEncodingException e) {
}
{noformat}

or

{noformat}
} catch (UnsupportedEncodingException e) {
  throw new Error("JVM Does not seem to support UTF-8", e);
}
{noformat}

What's our preference?
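As a sketch of the first option above (wrapping in RuntimeException), something along these lines would keep the checked exception out of callers. The class and method names are hypothetical, and the literal "UTF-8" stands in for the IOUtils.UTF_8 constant used in the Lucene/Solr snippet.

```java
import java.io.ByteArrayOutputStream;
import java.io.PrintStream;
import java.io.UnsupportedEncodingException;

/** Hypothetical helper; illustrates the RuntimeException-wrapping option only. */
final class Utf8PrintStreamSketch {

    /** Opens a UTF-8 PrintStream; every JVM must support UTF-8, so failure is treated as fatal. */
    static PrintStream utf8PrintStream(ByteArrayOutputStream bos) {
        try {
            return new PrintStream(bos, false, "UTF-8");
        } catch (UnsupportedEncodingException bogus) {
            throw new RuntimeException("JVM does not seem to support UTF-8", bogus);
        }
    }

    /** Encodes text through the stream, so the default charset is never consulted. */
    static byte[] encode(String text) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        PrintStream out = utf8PrintStream(bos);
        out.print(text);
        out.flush();
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(encode("\u00e9").length);
    }
}
```

The empty catch block from the second option is the one variant that hides a broken JVM completely, so either of the throwing variants seems safer.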
[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289304#comment-14289304 ]

Tim Allison commented on TIKA-1529:
---
I just fixed issues BasicContentHandlerFactoryTest in r1654225. Not sure how to fix chm parser, without doing a semi-manual copying of bytes to a StringBuilder. [~tpalsulich], is it ok to use Locale.ROOT in two places in Tesseract parser and config?
[jira] [Updated] (TIKA-1529) Turn forbidden-apis back on
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Allison updated TIKA-1529: -- Description: [~thetaphi] recently noticed that forbidden-apis was turned off in r1624185, and he submitted a patch to the dev list. Let's turn it back on. (was: [~thetaphi] recently noticed on that forbidden-apis was turned off in r1624185, and he submitted a patch to the dev list. Let's turn it back on.) Turn forbidden-apis back on --- Key: TIKA-1529 URL: https://issues.apache.org/jira/browse/TIKA-1529 Project: Tika Issue Type: Bug Reporter: Tim Allison Priority: Minor [~thetaphi] recently noticed that forbidden-apis was turned off in r1624185, and he submitted a patch to the dev list. Let's turn it back on. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Comment Edited] (TIKA-1529) Turn forbidden-apis back on
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289304#comment-14289304 ]

Tim Allison edited comment on TIKA-1529 at 1/23/15 2:41 PM:
I fixed issues with BasicContentHandlerFactoryTest in r1654225. Not sure how to fix chm parser, without doing a semi-manual copying of bytes to a StringBuilder. [~tpalsulich], is it ok to use Locale.ROOT in two places in Tesseract parser and config?

was (Author: talli...@mitre.org):
I just fixed issues BasicContentHandlerFactoryTest in r1654225. Not sure how to fix chm parser, without doing a semi-manual copying of bytes to a StringBuilder. [~tpalsulich], is it ok to use Locale.ROOT in two places in Tesseract parser and config?
[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289405#comment-14289405 ]

Tyler Palsulich commented on TIKA-1529:
---
+1 to {{RuntimeException}}.
[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289449#comment-14289449 ]

Tim Allison commented on TIKA-1529:
---
Agreed on US-ASCII, but aren't there illegal combinations in UTF-8? Exceedingly rare, I admit...
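On the question of illegal byte combinations in UTF-8: they do exist, and the sketch below shows one way to see the difference between lenient and strict decoding. The class and method names are made up for this example, and java.nio.charset.StandardCharsets assumes Java 7 or later, which may not match what Tika targeted at the time.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;

/** Hypothetical demo; contrasts lenient and strict UTF-8 decoding. */
final class StrictUtf8Sketch {

    /** new String(...) never throws; malformed input becomes U+FFFD. */
    static String decodeLenient(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    /** A decoder configured with REPORT rejects malformed input instead of replacing it. */
    static boolean isValidUtf8(byte[] bytes) {
        try {
            StandardCharsets.UTF_8.newDecoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .decode(ByteBuffer.wrap(bytes));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // 0xC3 announces a two-byte sequence, but 0x28 is not a continuation byte.
        byte[] malformed = {(byte) 0xC3, 0x28};
        System.out.println(isValidUtf8(malformed));
        System.out.println(decodeLenient(malformed).indexOf('\uFFFD') >= 0);
    }
}
```

So the String constructor itself will not fail on bad UTF-8; it only loses information, which is a separate concern from the UnsupportedEncodingException discussed in this issue.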
Re: Forbidden-APIS no longer ran because of carzy POM change
Thanks Uwe, no problem, we’ll figure it out. We’ll get it re-enabled and also figure out the Eclipse thing.

Thanks for bringing this up!

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++

-Original Message-
From: Uwe Schindler u...@thetaphi.de
Reply-To: dev@tika.apache.org dev@tika.apache.org
Date: Friday, January 23, 2015 at 7:20 AM
To: dev@tika.apache.org dev@tika.apache.org
Subject: RE: Forbidden-APIS no longer ran because of carzy POM change

Hi,

> Hmm, weird, that’s a commit from September 2014, Uwe, so quite a while ago. I think I was having some issues in Eclipse complaining about that plugin, so I used the workaround presented on StackOverflow to deal with it. I’m not fine reverting the commit unless the behavior that it did was preserved - in other words, I wanted Eclipse to stop complaining about that plugin. So maybe we can figure out a way that both enables the plugin, and makes Eclipse not complain about it.

For me it just says that it cannot handle that plugin, but it does not prevent you from using Eclipse or running anything in Eclipse. I have the plugin in various Eclipse projects with Maven running here locally...

Another option would be to make a Maven profile like you do for RAT? Unfortunately I have no idea how to do this correctly. In that case you could just instruct Jenkins to run the profile...

> I’ll check.
[jira] [Commented] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 JDK-8055301 so Turkish Tika users can still use non-external parsers
[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanelfocusedCommentId=14289419#comment-14289419 ]

Tyler Palsulich commented on TIKA-1526:
---
This is exactly how I saw the bug. I was confused that no tests were running, tried switching a config, and never saw the error again (as discussed).
Re: Forbidden-APIS no longer ran because of carzy POM change
Hmm, weird, that’s a commit from September 2014, Uwe, so quite a while ago. I think I was having some issues in Eclipse complaining about that plugin, so I used the workaround presented on StackOverflow to deal with it.

I’m not fine reverting the commit unless the behavior that it did was preserved - in other words, I wanted Eclipse to stop complaining about that plugin. So maybe we can figure out a way that both enables the plugin, and makes Eclipse not complain about it.

I’ll check.

++
Chris Mattmann, Ph.D.
Chief Architect
Instrument Software and Science Data Systems Section (398)
NASA Jet Propulsion Laboratory Pasadena, CA 91109 USA
Office: 168-519, Mailstop: 168-527
Email: chris.a.mattm...@nasa.gov
WWW: http://sunset.usc.edu/~mattmann/
++
Adjunct Associate Professor, Computer Science Department
University of Southern California, Los Angeles, CA 90089 USA
++
[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289361#comment-14289361 ]

Chris A. Mattmann commented on TIKA-1529:
-----------------------------------------

Tim, I'm OK with figuring out how to turn it back on, but not at the expense of my Eclipse complaining at me, which is what I was trying to fix back in September 2014. Let me see if I can help find a workaround for both.

> Turn forbidden-apis back on
> ---------------------------
>
>                 Key: TIKA-1529
>                 URL: https://issues.apache.org/jira/browse/TIKA-1529
>             Project: Tika
>          Issue Type: Bug
>            Reporter: Tim Allison
>            Priority: Minor
>
> [~thetaphi] recently noticed that forbidden-apis was turned off in r1624185, and he submitted a patch to the dev list. Let's turn it back on.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289369#comment-14289369 ]

Tim Allison commented on TIKA-1529:
-----------------------------------

[~grossws], for the following in {noformat}testExtractChmEntry{noformat}
{noformat}
//validate html
String html = new String(data);
{noformat}

Should this be ISO-8859-1?
[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289390#comment-14289390 ]

Chris A. Mattmann commented on TIKA-1529:
-----------------------------------------

Hi Tim, I tried Uwe's patch on the latest version of Eclipse (Version: Luna Service Release 1a (4.4.1)) and m2e. Eclipse doesn't seem to complain to me anymore. Yay! I'm +1 to apply the patch.
[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289402#comment-14289402 ]

Tyler Palsulich commented on TIKA-1529:
---------------------------------------

Yes, Locale.ROOT is OK.
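For readers wondering why forbidden-apis flags String#toLowerCase() and why Locale.ROOT is an acceptable fix, here is a minimal sketch. The class and method names are made up for illustration and are not taken from the Tika code base; it only shows the general difference between locale-sensitive and locale-independent lowercasing.

```java
import java.util.Locale;

// Hypothetical demo class, not part of Tika.
final class LocaleSafeCase {

    // Locale-independent lowercasing: the result does not depend on the JVM default locale.
    static String lower(String s) {
        return s.toLowerCase(Locale.ROOT);
    }

    public static void main(String[] args) {
        Locale turkish = Locale.forLanguageTag("tr-TR");
        // Under Turkish casing rules 'I' maps to dotless 'ı' (U+0131),
        // so comparing against an ASCII literal fails.
        System.out.println("TITLE".toLowerCase(turkish).equals("title"));
        // With Locale.ROOT the comparison behaves the same on every machine.
        System.out.println(lower("TITLE").equals("title"));
    }
}
```

The no-argument toLowerCase() uses Locale.getDefault(), so code that passes on an English machine can behave like the first call on a Turkish one.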
[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289333#comment-14289333 ]

Konstantin Gribov commented on TIKA-1529:
-----------------------------------------

I vote for throwing {{RuntimeException}} or {{TikaException}} with cause={{UnsupportedEncodingException}} and human-readable text.
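A minimal sketch of what this suggestion could look like, assuming the charset is still passed by name. The class name and the message text are hypothetical; RuntimeException is used here only to keep the example free of Tika dependencies, and TikaException could be substituted where a checked exception is acceptable.

```java
import java.io.UnsupportedEncodingException;

// Hypothetical helper, not part of Tika.
final class DecodeOrFail {

    // Decodes bytes with the named charset; instead of silently falling back to
    // the platform default, it fails loudly and keeps the original cause.
    static String decode(byte[] bytes, String charsetName) {
        try {
            return new String(bytes, charsetName);
        } catch (UnsupportedEncodingException e) {
            throw new RuntimeException(
                    "Charset '" + charsetName + "' is not supported by this JVM", e);
        }
    }
}
```

This avoids the forbidden new String(bytes) fallback while still giving the caller a readable message and the underlying exception.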
[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289337#comment-14289337 ]

Hudson commented on TIKA-1529:
------------------------------

UNSTABLE: Integrated in tika-trunk-jdk1.6 #433 (See [https://builds.apache.org/job/tika-trunk-jdk1.6/433/])
TIKA-1529: step 1...get rid of toLowerCase in BasicContentHandlerFactoryTest (tallison: http://svn.apache.org/viewvc/tika/trunk/?view=rev&rev=1654225)
* /tika/trunk/tika-core/src/test/java/org/apache/tika/sax/BasicContentHandlerFactoryTest.java
RE: Forbidden-APIS no longer ran because of carzy POM change
Hi,

> Hmm, weird, that’s a commit from September 2014, Uwe, so quite a while ago. I think I was having some issues in Eclipse complaining about that plugin, so I used the workaround presented on StackOverflow to deal with it.
>
> I’m not fine reverting the commit unless the behavior that it did was preserved - in other words, I wanted Eclipse to stop complaining about that plugin. So maybe we can figure out a way that both enables the plugin, and makes Eclipse not complain about it.

For me, Eclipse just says that it cannot handle that plugin, but this does not prevent you from using Eclipse or running anything in it. I have the plugin in various Eclipse projects with Maven running here locally...

Another option would be to make a Maven profile, like you do for RAT. Unfortunately I have no idea how to do this correctly. In that case you could just instruct Jenkins to run the profile...

> I’ll check.
[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289410#comment-14289410 ]

Tim Allison commented on TIKA-1529:
-----------------------------------

Great! I'm still making mods and creating a static Charset UTF_8 in IOUtils following [~thetaphi]'s recommendation...until we move to 1.7 and can use StandardCharsets.
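As an illustration of the constant mentioned above, here is a small sketch of a shared Charset constant for a pre-Java 7 code base. The holder class name and the two helper methods are assumptions made for this example; the thread only says the constant lives in IOUtils.

```java
import java.nio.charset.Charset;

// Hypothetical holder class; the thread places the real constant in IOUtils.
final class CharsetConstants {

    // Looked up once at class initialization; UTF-8 is guaranteed to exist on every JVM.
    static final Charset UTF_8 = Charset.forName("UTF-8");

    private CharsetConstants() {
    }

    // String(byte[], Charset) and getBytes(Charset) exist since Java 6 and
    // declare no checked exception, unlike the charset-name variants.
    static String decode(byte[] bytes) {
        return new String(bytes, UTF_8);
    }

    static byte[] encode(String s) {
        return s.getBytes(UTF_8);
    }
}
```

Once the build targets Java 7 or later, the constant can presumably be replaced by java.nio.charset.StandardCharsets.UTF_8 without touching the call sites.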
[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289438#comment-14289438 ]

Uwe Schindler commented on TIKA-1529:
-------------------------------------

If you just check for ASCII chars in some string of unknown encoding, the easiest approach is to use US-ASCII as the charset. This will always work, also with UTF-8 :-)
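A sketch of the US-ASCII idea, under the assumption that the data is in an ASCII-compatible encoding such as UTF-8 or ISO-8859-1 (it would not work for UTF-16). The class name, method name and the "<html" marker are illustrative only.

```java
import java.nio.charset.Charset;

// Hypothetical helper, not part of Tika.
final class AsciiMarkerCheck {

    private static final Charset US_ASCII = Charset.forName("US-ASCII");

    // Decodes as US-ASCII purely to look for an ASCII marker. Bytes outside the
    // ASCII range turn into replacement characters, which cannot match the marker.
    static boolean containsHtmlTag(byte[] data) {
        String text = new String(data, US_ASCII);
        return text.contains("<html");
    }
}
```

The decoded string is only suitable for this kind of presence check; it should not be used as the document text.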
RE: Forbidden-APIS no longer ran because of carzy POM change
Hi,

> There're several places where forbiddenapis will give errors in Tika. I don't know if there is a better way to fall back. E.g. in one of the chm parser classes:
>
>     try {
>         dle.setName(new String(bytes, "UTF-8"));
>     } catch (UnsupportedEncodingException e) {
>         dle.setName(new String(bytes));
>     }
>
> Can you add special annotation parsing (like @SuppressWarnings("forbiddenapis") on an element) to avoid emitting a build error in special cases like the one mentioned above?

Not yet, see: https://code.google.com/p/forbidden-apis/issues/detail?id=34

But the above is not needed. UTF-8 is always defined (the JVM standard requires this). In fact, in Java 7 you can use StandardCharsets.UTF_8. It is also a bad idea to pass charsets as strings. Just define a constant somewhere (if you cannot use Java 7 yet), like:

    public static final Charset UTF_8 = Charset.forName("UTF-8");

and use that everywhere instead of a string. This spares the synchronized lookup of the charset name that the JVM otherwise performs.

Uwe

> --
> Best regards,
> Konstantin Gribov

Fri Jan 23 2015 at 15:10:18, Uwe Schindler <u...@thetaphi.de>:

Here is the patch, the mailing list swallowed it:

Index: tika-parent/pom.xml
===================================================================
--- tika-parent/pom.xml (revision 1654171)
+++ tika-parent/pom.xml (working copy)
@@ -274,7 +274,6 @@
   </properties>
 
   <build>
-    <pluginManagement>
     <plugins>
       <plugin>
         <artifactId>maven-compiler-plugin</artifactId>
@@ -287,7 +286,7 @@
       <plugin>
         <groupId>de.thetaphi</groupId>
         <artifactId>forbiddenapis</artifactId>
-        <version>1.6.1</version>
+        <version>1.7</version>
         <configuration>
           <targetVersion>${maven.compiler.target}</targetVersion>
           <internalRuntimeForbidden>true</internalRuntimeForbidden>
@@ -322,7 +321,6 @@
         <version>2.3</version>
       </plugin>
     </plugins>
-    </pluginManagement>
   </build>
 
   <profiles>

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de

-----Original Message-----
From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Friday, January 23, 2015 1:08 PM
To: dev@tika.apache.org
Subject: RE: Forbidden-APIS no longer ran because of carzy POM change

The attached patch reverts the change and updates the forbidden plugin.

Uwe

-----Original Message-----
From: Uwe Schindler [mailto:u...@thetaphi.de]
Sent: Friday, January 23, 2015 1:00 PM
To: dev@tika.apache.org
Subject: RE: Forbidden-APIS no longer ran because of carzy POM change

Here is the explanation why the plugin is no longer called because of this:

- Works for me too, but can anyone explain why? – Andrew Swan May 15 '13 at 6:26
- @Andrew I think this works because m2e is not looking for plugins in pluginManagement, but only in build/plugins. In the Maven world, there is a difference between the two - the former defines "if you happen to use this plugin, here's the configuration to use", whereas the latter states "use this plugin". See this post and its top two answers. – GreenGiant Jul 5 '13 at 17:52
- I agree with @GreenGiant. I tried this solution but it then breaks the compilation since the aspectj plugin is not called before compilation. – Pierre Aug 30 '13 at 20:21

This explains the change. In fact, placing the plugins in pluginManagement disables them unless they are explicitly configured in a sub-module. So this commit should be reverted. In fact, the bug described here no longer applies to later m2e installations. Eclipse still complains about plugins that it does not know about, but this does not prevent you from using Eclipse.

So I would strongly ask to revert the commit because it breaks the build.

Uwe
RE: Forbidden-APIS no longer ran because of carzy POM change
Hi,

this may also help; it also provides the needed information:
https://www.eclipse.org/m2e/documentation/m2e-execution-not-covered.html

In fact the problem is that Eclipse has no idea how this plugin should be executed internally in Eclipse. But as this is just a check plugin that does not affect the build output at all, you can leave it disabled.

If you scroll down, you see that Eclipse 4.2+ fixes this problem: disable the plugin for Maven using Project properties -> Maven -> Lifecycle mappings -> ignore.

-----
Uwe Schindler
H.-H.-Meier-Allee 63, D-28213 Bremen
http://www.thetaphi.de
eMail: u...@thetaphi.de
[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289460#comment-14289460 ]

Konstantin Gribov commented on TIKA-1529:
-----------------------------------------

{{new String(bytes, Charset)}} will always replace malformed and unmappable chars with some placeholder (see [http://docs.oracle.com/javase/6/docs/api/java/lang/String.html#String(byte\[\], java.nio.charset.Charset)]). So we can use any standard encoding.
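The replacement behaviour described in this comment can be seen with a few lines of Java. This is only a sketch with a made-up class name; it decodes a byte sequence that is not valid UTF-8 and shows that no exception is thrown.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, not part of Tika.
final class MalformedDecode {

    private static final Charset UTF_8 = Charset.forName("UTF-8");

    // The String(byte[], Charset) constructor never throws on bad input;
    // malformed sequences are replaced with the charset's replacement string.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, UTF_8);
    }

    public static void main(String[] args) {
        // 0xFF can never appear in well-formed UTF-8.
        byte[] bad = {'a', (byte) 0xFF, 'b'};
        String decoded = decodeUtf8(bad);
        // Prints 3 and true: the bad byte became U+FFFD, the ASCII bytes survived.
        System.out.println(decoded.length());
        System.out.println(decoded.indexOf('\uFFFD') == 1);
    }
}
```

That is why a presence check for ASCII tags still works even when the chosen charset does not match the real encoding of the data.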
Re: Forbidden-APIS no longer ran because of carzy POM change
Hi Uwe,

Thanks. I will check it out. Like I said, I’m not OK with reverting anything if my Eclipse keeps complaining at me, so we’ll need a fix that handles both.

Let me try with the latest version of Eclipse and m2e and see if (with your patch) the issue goes away.

Cheers,
Chris
[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289429#comment-14289429 ]

Konstantin Gribov commented on TIKA-1529:
-----------------------------------------

[~talli...@mitre.org], it works with {{ISO-8859-1}} since only the presence of {{html}} tags is checked. It should also work with UTF-8 and any single-byte encoding, so I think it's safe to decode with this encoding.

In openjdk8, {{new String(bytes)}} tries:
- to decode using the default charset ({{Charset.defaultCharset().name()}}),
- if that fails, to print a warning and decode using {{ISO-8859-1}}.

We may use such a pattern in {{ChmDirectoryListingSet}}.
[jira] [Comment Edited] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 JDK-8055301 so Turkish Tika users can still use non-external parsers
[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289182#comment-14289182 ]

Uwe Schindler edited comment on TIKA-1526 at 1/23/15 12:32 PM:
---------------------------------------------------------------

To work around this bug you can in fact do this. It is just bad to change the user's default locale, which may especially break multi-threaded applications. One solution could be the following. During startup of the JVM (in the Plexus launcher's main method) you can do this:

- Check for the locale. We do it like this: {{new Locale("tr").getLanguage().equals(Locale.getDefault().getLanguage())}} (it is important to do the check like this, because otherwise it is not guaranteed that it really works, especially in newer Java versions!)
- If it is such a locale, switch to Locale.ROOT (saving the original) in a single-threaded environment (this is why it should be in the main launcher).
- Execute a fake UNIX command, like /bin/true. You can also execute some non-existing bullshit that just fails. The call is just there to statically initialize the broken UNIXProcess class. Once it is initialized correctly, it works.
- Switch back to the saved locale.
[jira] [Commented] (TIKA-1526) ExternalParser should trap/ignore/workarround JDK-8047340 JDK-8055301 so Turkish Tika users can still use non-external parsers
[ https://issues.apache.org/jira/browse/TIKA-1526?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289182#comment-14289182 ]

Uwe Schindler commented on TIKA-1526:
-------------------------------------

To work around this bug you can in fact do this. It is just bad to change the user's default locale, which may especially break multi-threaded applications. One solution could be the following, done during startup of the JVM (in the Plexus launcher's main method):

- Check for the locale. We do it like this: {{new Locale("tr").getLanguage().equals(Locale.getDefault().getLanguage())}} (it is important to do the check like this, because otherwise it's not guaranteed that it really works, especially in newer Java versions!)
- If it is such a locale, switch to Locale.ROOT (saving the original) in a single-threaded environment (this is why it should be in the main launcher).
- Execute a fake UNIX command, like /bin/true. You can also execute nothing; it is just there to statically initialize the broken UNIXProcess class. Once it is initialized correctly, it works.
- Switch back to the saved locale.

ExternalParser should trap/ignore/workarround JDK-8047340 JDK-8055301 so Turkish Tika users can still use non-external parsers

Key: TIKA-1526
URL: https://issues.apache.org/jira/browse/TIKA-1526
Project: Tika
Issue Type: Wish
Reporter: Hoss Man

The JDK has numerous pain points regarding the Turkish locale, posix_spawn lowercasing being one of them...

https://bugs.openjdk.java.net/browse/JDK-8047340
https://bugs.openjdk.java.net/browse/JDK-8055301

As of Tika 1.7, the TesseractOCRParser (which is an ExternalParser) is enabled and configured by default in Tika, and uses ExternalParser.check to see if tesseract is available -- but because of the JDK bug, this means that Tika fails fast for Turkish users on BSD/UNIX variants (including MacOSX) like so...

{noformat}
[junit4] Throwable #1: java.lang.Error: posix_spawn is not a supported process launch mechanism on this platform.
[junit4] at java.lang.UNIXProcess$1.run(UNIXProcess.java:105)
[junit4] at java.lang.UNIXProcess$1.run(UNIXProcess.java:94)
[junit4] at java.security.AccessController.doPrivileged(Native Method)
[junit4] at java.lang.UNIXProcess.<clinit>(UNIXProcess.java:92)
[junit4] at java.lang.ProcessImpl.start(ProcessImpl.java:130)
[junit4] at java.lang.ProcessBuilder.start(ProcessBuilder.java:1029)
[junit4] at java.lang.Runtime.exec(Runtime.java:620)
[junit4] at java.lang.Runtime.exec(Runtime.java:485)
[junit4] at org.apache.tika.parser.external.ExternalParser.check(ExternalParser.java:344)
[junit4] at org.apache.tika.parser.ocr.TesseractOCRParser.hasTesseract(TesseractOCRParser.java:117)
[junit4] at org.apache.tika.parser.ocr.TesseractOCRParser.getSupportedTypes(TesseractOCRParser.java:90)
[junit4] at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81)
[junit4] at org.apache.tika.parser.DefaultParser.getParsers(DefaultParser.java:95)
[junit4] at org.apache.tika.parser.CompositeParser.getSupportedTypes(CompositeParser.java:229)
[junit4] at org.apache.tika.parser.CompositeParser.getParsers(CompositeParser.java:81)
[junit4] at org.apache.tika.parser.CompositeParser.getParser(CompositeParser.java:209)
[junit4] at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:244)
[junit4] at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:120)
{noformat}

...unless they go out of their way to white list only the parsers they need/want so TesseractOCRParser (and any other ExternalParsers) will never even be check()ed.

It would be nice if Tika's ExternalParser class added a similar hack/workaround to what was done in SOLR-6387 to trap these types of errors. In Solr we just propagate a better error explaining why Java hates the Turkish language...
{code}
} catch (Error err) {
  if (err.getMessage() != null && (err.getMessage().contains("posix_spawn") || err.getMessage().contains("UNIXProcess"))) {
    log.warn("Error forking command due to JVM locale bug (see https://issues.apache.org/jira/browse/SOLR-6387): " + err.getMessage());
    return "(error executing: " + cmd + ")";
  }
}
{code}

...but with Tika, it might be better for all ExternalParsers to just opt out as if they don't recognize the filetype when they detect this type of error from the check method (or perhaps it would be better if AutoDetectParser handled this? ... I'm not really sure how it would best fit into Tika's
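
For illustration only, the startup workaround from Uwe's comment above (check for the Turkish locale, switch to Locale.ROOT, launch a throwaway command, switch back) might be sketched as follows. This is a rough sketch and not code from Tika or Solr: the class name {{LocaleSafeLauncher}} and both method names are hypothetical, and it assumes it is called from main() before any other thread has been started.

```java
import java.io.IOException;
import java.util.Locale;

// Hypothetical helper, not part of Tika: names are for illustration only.
final class LocaleSafeLauncher {

    /** Locale check exactly as given in the comment: compare language codes. */
    @SuppressWarnings("deprecation") // Locale(String) is deprecated on recent JDKs; kept to match the comment
    static boolean isTurkishDefault() {
        return new Locale("tr").getLanguage().equals(Locale.getDefault().getLanguage());
    }

    /**
     * Forces the JDK's process-launching classes to initialize under Locale.ROOT.
     * Assumption: the JVM is still single-threaded when this runs.
     * Returns true if the workaround was applied, false if it was not needed.
     */
    static boolean preinitializeProcessSupport() {
        if (!isTurkishDefault()) {
            return false;
        }
        Locale saved = Locale.getDefault();
        Locale.setDefault(Locale.ROOT);
        try {
            // The command itself is irrelevant; starting any process triggers
            // the static initializer of the JDK's process implementation.
            new ProcessBuilder("/bin/true").start().waitFor();
        } catch (IOException e) {
            // Ignored on purpose: a missing /bin/true still initializes the class.
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        } finally {
            // Always restore the user's original default locale.
            Locale.setDefault(saved);
        }
        return true;
    }

    public static void main(String[] args) {
        boolean applied = preinitializeProcessSupport();
        System.out.println("locale workaround applied: " + applied);
        // ... the real application startup would continue here ...
    }
}
```

The try/finally is there so that the user's locale is restored even if launching the dummy command fails, which keeps the window with a changed default locale as short as possible.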
[jira] [Commented] (TIKA-1529) Turn forbidden-apis back on
[ https://issues.apache.org/jira/browse/TIKA-1529?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14289362#comment-14289362 ]

Tim Allison commented on TIKA-1529:
-----------------------------------

Makes sense. I'll try to fix the causes for failure now so that when/if we can turn it back on, there won't be much work.

Turn forbidden-apis back on
---------------------------

Key: TIKA-1529
URL: https://issues.apache.org/jira/browse/TIKA-1529
Project: Tika
Issue Type: Bug
Reporter: Tim Allison
Priority: Minor

[~thetaphi] recently noticed that forbidden-apis was turned off in r1624185, and he submitted a patch to the dev list. Let's turn it back on.

--
This message was sent by Atlassian JIRA (v6.3.4#6332)