Re: [VOTE] Propose graduation of ManifoldCF from the Incubator as a Top Level Project
+1

Erlend

On 24.04.12 12.08, Karl Wright wrote:
> Please vote +1 if you think we should propose to the Incubator that we graduate as a top level project at this time. If this vote passes, I will open a [DISCUSS] thread in gene...@incubator.apache.org, and turn it into a [VOTE] thread if the discussion looks positive.
>
> +1 (binding) from me.
>
> Karl

--
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
Re: Build fails with Maven while building MCF from trunk
On 12.04.12 23.46, Karl Wright wrote:
> The build process has changed. The incubator required we remove all binaries. You will need to do one of the following:
>
> (a) Download the -lib package from the release candidate and follow the instructions

I tried to place all the jars from the lib distribution into a lib folder in my working copy of trunk, following the instructions in the README file. Which lib distribution corresponds to trunk? The latest 0.5 release?

Here's what I did:

1. Downloaded apache-manifoldcf-0.5-incubating-lib.tar.gz and unpacked it
2. Created a lib directory in the project root and copied all jars from the lib distribution into that directory
3. Ran ant build (nothing really happens)
4. Tried ant make-deps followed by ant build

I guess there is one detail I have missed. Your suggestions work perfectly if I download the corresponding source package, but not with the latest version from trunk.

> (b) Make sure you have svn 1.7 installed and run ant make-core-deps

It's not straightforward to upgrade SVN to version 1.7 on OS X. Fink is the most common tool for installing/upgrading SVN, but the current/stable version is 1.6.17-4:
http://pdb.finkproject.org/pdb/package.php/svn

If I _should_ use SVN 1.7, I can try to install it from MacPorts.

Erlend
Re: Build fails with Maven while building MCF from trunk
On 24.04.12 21.16, Karl Wright wrote:
> ant clean-core-deps make-core-deps

OK, SVN 1.7 is required. I can try to install it from MacPorts.

patch-source-via-svn:
     [exec] Unknown command: 'patch'
     [exec] Type 'svn help' for usage.

BUILD FAILED

Erlend
Missing pom.xml files for some (proprietary?) connectors
Is the following the reason why there are no pom.xml files for some connectors, for instance Documentum?

"The biggest limitation of the current Maven build is that it does not support any of the proprietary connectors ..."

Erlend
Re: Missing pom.xml files for some (proprietary?) connectors
Thanks for clarifying. It's not a blocker for me; I just prefer to use Maven instead of Ant. I noticed it while working on a Jira issue assigned to me.

Erlend

On 15.04.12 20.32, Karl Wright wrote:
> The reason that Maven cannot build proprietary connectors is because the dependencies are unavailable via Maven. For instance, you cannot build the LiveLink connector without a lapi.jar, and you can only get that by purchasing a license from OpenText.
>
> Karl
Re: Build fails with Maven while building MCF from trunk
I knew there were some changes due to new requirements from the Apache Incubator, but unfortunately I have been abroad for the last month and haven't paid attention. Sorry about that.

Thanks for your help!

Erlend

On 12.04.12 23.46, Karl Wright wrote:
> The build process has changed. The incubator required we remove all binaries. You will need to do one of the following:
>
> (a) Download the -lib package from the release candidate and follow the instructions
> (b) Make sure you have svn 1.7 installed and run ant make-core-deps
>
> Only then will ant build or mvn-bootstrap work. This is also explained in the readme.
>
> Thanks,
> Karl
Build fails with Maven while building MCF from trunk
I did an svn co in order to get a fresh version of MCF from trunk, since I had many temporary code changes which shouldn't be committed, and then I discovered some problems. I'm mentioning this in case there are similar problems with the RC6 candidate.

When I run mvn-bootstrap.sh, the build fails after the dependencies have been downloaded:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-install-plugin:2.3.1:install-file (default-cli) on project mcf-parent: Error installing artifact 'xml-security:xmlsec:jar': Failed to install artifact xml-security:xmlsec:jar:1.4.1: /Users/erlendfg/tmp/mcf_2012/lib/xmlsec.jar (No such file or directory) -> [Help 1]

I'm using Apache Maven 3.0.4.

Erlend
Re: Build fails with Maven while building MCF from trunk
Yes, but it fails as I wrote. I think something is broken in trunk at the moment. ant test and ant build fail as well. I double-checked by doing another svn co.

Erlend

On 12.04.12 22.18, Karl Wright wrote:
> You need to run the mvn-bootstrap script, as per the instructions.
>
> Karl
Re: Build fails with Maven while building MCF from trunk
It fails on Linux as well as on OS X. I just tried to run ant build and ant test on our Linux development server as well as on my laptop. Well, I get BUILD SUCCESSFUL, but nothing really happens. No tests run at all.

Erlend
Translations into Japanese needed for an error message
I need to translate an error message into Japanese in order to resolve a ticket.

The English text I need to translate:

  Invalid URLs in seeds list:

Google translates this to:

  種子リスト内の無効なURL:

Is the translation correct?

Erlend
Re: [VOTE] Upgrade ManifoldCF to jdk 1.6 prior to next release
+1

Erlend

On 11.04.12 02.39, Karl Wright wrote:
> Folks,
>
> When doing the dependency rework, it became clear that many of our binary dependencies are stuck without fixes or upgrades because we are still using jdk 1.5. I'd like to get a sense from the community whether everyone thinks we should abandon support for jdk 1.5 in our next release. The pros of such a move include allowing us to upgrade to jetty 7 (which brings in a number of bug fixes), tomcat 7 components, and off-the-shelf hsqldb builds. I cannot at this time identify any obvious cons.
>
> Please let me know your opinion. +1 to upgrade, from me.
>
> Thanks,
> Karl
Re: Postpone localization tickets for 0.5 release?
On 19.03.12 20.33, Karl Wright wrote:
> Hi all,
>
> I've not heard anything from Hitoshi Ozawa about whether he will be able to complete some of the Japanese localization work that was scheduled for 0.5-incubating. Since the day is rapidly approaching when we have to spin the first RC, I'd like a show of hands as to whether we should postpone the two tickets currently on Hitoshi's plate for the next release. The tickets in question are CONNECTORS-394 and CONNECTORS-404.
>
> +1 from me for postponing.
>
> Karl

+1

Erlend
Re: Possible bug in seeds list (web connector)
I think it will be much easier to validate the seeds list by using JavaScript instead of parsing URLs with java.net.URL, simply because this is how we do validation elsewhere in the application. Checking for valid URLs, supported protocols and illegal characters shouldn't be very complicated in JavaScript.

What do you think?

Erlend

On 16.03.12 11.51, Karl Wright wrote:
>> Do you agree that a well-formed URL is what java.net.URL will accept in the constructor's argument? Then www.example.org will fail, but http://www.example.org (without a trailing slash) will pass.
>
> I might even go a bit further. See the following code in WebcrawlerConnector:
>
>   protected String makeDocumentIdentifier(String parentIdentifier, String rawURL, DocumentURLFilter filter)
>
> Thanks!
> Karl
Re: Possible bug in seeds list (web connector)
This sounds very OK. I have already written the necessary JavaScript code, and it works as it should. I didn't create a ticket because I needed time to figure out the best solution and to learn more about how the connector works by reading the Java code.

I will create a ticket right away and include the JavaScript, but I think I will create a patch as well before I commit my work.

Erlend

On 20.03.12 13.59, Karl Wright wrote:
> I think this is a reasonable approach. You may need to modify the python browser simulator, though, to keep the UI tests working. I can help you with that when the time comes.
>
> If you create a ticket and include your proposed Javascript, I can review it and let you know how challenging I think it will be to support it in the browser simulator. Also, since we are trying to get a release out the door, I think it makes sense to hold off on these changes until I can make the release branch. Sound OK?
>
> Thanks!
> Karl
Re: Possible bug in seeds list (web connector)
I have created a ticket for this and entered some unfinished JavaScript code. My head does not work very well with regular expressions today, so I will improve the code tomorrow.

Please write a few lines about what I need to do with the Python browser simulator in order to test the JavaScript.

Erlend
Re: Possible bug in seeds list (web connector)
On 15.03.12 19.30, Karl Wright wrote:
> A seed can be a specific html file so complaining about a trailing slash would make that not work. For example:
>
> http://hello.world.com/startpage.html

I think I was a little unclear in my recent email. By a trailing slash, I was thinking more about the domain name itself, e.g. www.example.org/.

I will create a Jira ticket now, but I will only focus on well-formed URLs in the seeds list. Do you agree that a well-formed URL is what java.net.URL will accept in the constructor's argument? Then www.example.org will fail, but http://www.example.org (without a trailing slash) will pass.

Erlend
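The question above, whether java.net.URL accepts a given seed in its constructor, can be tried out with a small sketch. The class and method names (SeedUrlCheck, isWellFormed) are hypothetical and not part of ManifoldCF; the only library behaviour relied on is that the URL(String) constructor throws MalformedURLException when no protocol is present.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical helper for illustration only; not ManifoldCF code.
final class SeedUrlCheck {

    // Returns true if java.net.URL accepts the seed in its constructor.
    // URL(String) is deprecated in recent JDKs (since Java 20) but still
    // rejects strings that lack a protocol.
    @SuppressWarnings("deprecation")
    static boolean isWellFormed(String seed) {
        try {
            new URL(seed);
            return true;
        } catch (MalformedURLException e) {
            // Raised, for example, when the string has no protocol part.
            return false;
        }
    }

    public static void main(String[] args) {
        String[] seeds = {
            "www.example.org",
            "http://www.example.org",
            "http://www.example.org/"
        };
        for (String seed : seeds) {
            System.out.println(seed + " -> " + isWellFormed(seed));
        }
    }
}
```

Note that this check only covers well-formedness. It accepts a URL both with and without the trailing slash, so it does not by itself address the trailing-slash behaviour discussed in this thread.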
Should the default value for include only host matching seeds be yes/checked?
I suggest that we change the default value to yes/checked for "Include only hosts matching seeds?" in the web connector. If you only want to crawl your own company's web pages but forget to check this option by mistake, you risk crawling a lot of external web pages as well.

So what do you think? Should I create a ticket and change the default setting to checked/yes?

Erlend
Possible bug in seeds list (web connector)
If I add the following URL to my seeds list:

  http://www.uio.no

and this to the "include in crawl" list:

  http://www.uio.no/.*

the job will just end shortly after it starts without fetching anything at all. If I add the missing trailing slash to my seed URL (http://www.uio.no/), it works as it should.

I also discovered another similar behaviour. If I add the following to my seeds list:

  www.uio.no

select the "include only hosts matching seeds?" option and do not add anything to the "include in crawl" list, the same thing happens. No URLs will be fetched.

I suggest that we do something like this:

- A URL in the Java code will always start with http(s)://www.myhost.com/
- If you fail to add the protocol or the trailing slash, it will be added automatically instead of returning an error message.

By "in the Java code", I mean that the URL should automatically be formatted like this before we do a regular expression match.

Erlend
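The suggestion above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the connector's actual code: SeedNormalizer and normalize are hypothetical names, http is assumed as the default protocol when none is given, and a slash is added only when the URL has no path at all, so a seed such as http://hello.world.com/startpage.html is left untouched.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper for illustration only; not ManifoldCF code.
final class SeedNormalizer {

    // Adds a default protocol and a trailing slash to a bare host seed.
    static String normalize(String seed) {
        String s = seed.trim();
        // Assumption: a seed without "scheme://" is meant to be plain http.
        if (!s.matches("^[A-Za-z][A-Za-z0-9+.-]*://.*")) {
            s = "http://" + s;
        }
        try {
            URI uri = new URI(s);
            String path = uri.getPath();
            // A bare host such as http://www.uio.no has an empty path.
            if (path == null || path.isEmpty()) {
                uri = new URI(uri.getScheme(), uri.getAuthority(), "/",
                        uri.getQuery(), uri.getFragment());
            }
            return uri.toString();
        } catch (URISyntaxException e) {
            // Seeds that still cannot be parsed are reported to the caller.
            throw new IllegalArgumentException("Invalid seed URL: " + seed, e);
        }
    }

    public static void main(String[] args) {
        String[] seeds = {
            "www.uio.no",
            "http://www.uio.no",
            "http://hello.world.com/startpage.html"
        };
        for (String seed : seeds) {
            System.out.println(seed + " -> " + normalize(seed));
        }
    }
}
```

With this approach both seeds from the report, www.uio.no and http://www.uio.no, end up in the form that the include pattern http://www.uio.no/.* can match.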
mvn eclipse:eclipse fails with Maven 3
When I try to run mvn eclipse:eclipse in order to prepare the MCF project for Eclipse, I get an error which seems to be related to the Alfresco connector, but this might be an issue with version 2.8 of the Maven Eclipse plugin.

BTW, it works by running the following:

  mvn org.apache.maven.plugins:maven-eclipse-plugin:2.6:eclipse

I upgraded to Maven 3 a week ago, so I haven't seen this error (yet) in my other Java projects.

This is part of the error message from Maven using the -e switch:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-eclipse-plugin:2.8:eclipse (default-cli) on project mcf-alfresco-war-test: Request to merge when 'filtering' is not identical. Original=resource src/main/resources: output=target/classes, include=[], exclude=[**/restore-context.xml|**/ldap-*.xml|**/*.java], test=false, filtering=false, merging with=resource src/main/resources: output=target/classes, include=[log4j.properties], exclude=[**/*.java], test=false, filtering=true -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-eclipse-plugin:2.8:eclipse (default-cli) on project mcf-alfresco-war-test: Request to merge when 'filtering' is not identical. Original=resource src/main/resources: output=target/classes, include=[], exclude=[**/restore-context.xml|**/ldap-*.xml|**/*.java], test=false, filtering=false, merging with=resource src/main/resources: output=target/classes, include=[log4j.properties], exclude=[**/*.java], test=false, filtering=true
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:217)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
    at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
    at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
    at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:320)
    at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156)
    at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537)
    at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
    at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290)
    at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230)
    at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409)
    at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352)
Caused by: org.apache.maven.plugin.MojoExecutionException: Request to merge when 'filtering' is not identical. Original=resource src/main/resources: output=target/classes, include=[], exclude=[**/restore-context.xml|**/ldap-*.xml|**/*.java], test=false, filtering=false, merging with=resource src/main/resources: output=target/classes, include=[log4j.properties], exclude=[**/*.java], test=false, filtering=true
    at org.apache.maven.plugin.eclipse.EclipseSourceDir.merge(EclipseSourceDir.java:302)
    at org.apache.maven.plugin.eclipse.EclipsePlugin.extractResourceDirs(EclipsePlugin.java:1652)
    at org.apache.maven.plugin.eclipse.EclipsePlugin.buildDirectoryList(EclipsePlugin.java:1534)
    at org.apache.maven.plugin.eclipse.EclipsePlugin.createEclipseWriterConfig(EclipsePlugin.java:1222)
    at org.apache.maven.plugin.eclipse.EclipsePlugin.writeConfiguration(EclipsePlugin.java:1085)
    at org.apache.maven.plugin.ide.AbstractIdeSupportMojo.execute(AbstractIdeSupportMojo.java:511)
    at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:101)
    at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:209)
    ... 19 more
[ERROR]
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :mcf-alfresco-war-test

Erlend
Re: mvn eclipse:eclipse fails with Maven 3
I forgot to mention that I tried to add the following to the parent pom.xml file without any luck:

  <plugin>
    <groupId>org.apache.maven.plugins</groupId>
    <artifactId>maven-eclipse-plugin</artifactId>
    <version>2.6</version>
  </plugin>

Erlend
Re: Please welcome Hitoshi Ozawa to the ManifoldCF community!
Congratulations and welcome! By the way, I hope I will meet some of you committers at Lucene Revolution in Boston this year. http://www.lucenerevolution.com/ Erlend On 07.02.12 03.39, Karl Wright wrote: Hitoshi is now officially a ManifoldCF Committer. Congratulations, Hitoshi! Karl -- Erlend Garåsen Center for Information Technology Services University of Oslo P.O. Box 1086 Blindern, N-0317 OSLO, Norway Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
Commands for registering agents etc.
Hello list, I haven't been an active developer for a while since I have been sick, so there have been a lot of changes since I last dug into the code and setup process. It seems that the documentation for the executecommand tool has changed, since I cannot find the commands for installing the web crawler, for instance. Maybe it would be a good idea to explain the available arguments for the agents and crawlers. I just need to install the Solr and web crawler connectors, but cannot remember how. The documentation says: "After you have created the necessary configuration files, you will need to initialize the database, register the pull-agent agent, and then register your individual connectors. ManifoldCF provides a set of commands for performing these actions, and others as well." Here's what I did: ./processes/script/executecommand.sh org.apache.manifoldcf.agents.Install But I no longer remember what I did before to complete the necessary steps. I managed to register the Solr output connector, but not the web crawler: ./processes/script/executecommand.sh org.apache.manifoldcf.crawler.Register org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector WebCrawler This gives an error message saying that the relation "connectors" does not exist. I think I need to register the pull agent, but I'm not sure how. Erlend -- Erlend Garåsen Center for Information Technology Services University of Oslo P.O. Box 1086 Blindern, N-0317 OSLO, Norway Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
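For reference, the sequence the quoted documentation describes (install, register the pull agent, then register connectors) might look roughly like the sketch below. Only the Install command and the WebcrawlerConnector registration come from the message above. The pull-agent class (org.apache.manifoldcf.crawler.system.CrawlerAgent), the RegisterOutput command and the Solr connector class are recalled from the ManifoldCF command documentation and are assumptions that should be checked against the documentation for the version in use. The script only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/sh
# Sketch only. Prints the commands by default; set DRY_RUN=0 to execute them.
MCF_CMD="./processes/script/executecommand.sh"   # path as used in the message above

run() {
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "$MCF_CMD $*"
  else
    "$MCF_CMD" "$@"
  fi
}

register_all() {
  # From the message: create the agents framework tables.
  run org.apache.manifoldcf.agents.Install
  # Assumption: class name of the pull agent, to be verified against the docs.
  run org.apache.manifoldcf.agents.Register org.apache.manifoldcf.crawler.system.CrawlerAgent
  # Assumption: command and class name for the Solr output connector.
  run org.apache.manifoldcf.agents.RegisterOutput org.apache.manifoldcf.agents.output.solr.SolrConnector Solr
  # From the message: register the web crawler connector.
  run org.apache.manifoldcf.crawler.Register org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector WebCrawler
}

register_all
```

If the "connectors" relation is created when the pull agent is registered, which the error message suggests but the thread does not confirm, then running the second command before the crawler registration would be the missing step.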
Re: Excluding html files and following links
Unfortunately, I have only created a local Jira ticket (our own Jira at the uni) where this problem is reported. Since we are in a hurry with our search project at the moment, the issue is still open. The host having large documents is now excluded from our crawl, mainly because it has been decided that we don't want to index it at all. The host includes private web pages, basically published by students. I will keep you informed when we have made a decision about what to do with large documents. I guess the new parameter will do the trick. Thanks for working on this issue! Erlend On 05.07.11 12.04, Karl Wright wrote: Have you had a look at the feature added, and does it work for you? I'd also still be interested in knowing where you are seeing out-of-memory situations. Karl On Thu, Jun 23, 2011 at 8:03 AM, Karl Wright <daddy...@gmail.com> wrote: Hi Erlend, I hope you are not seeing memory issues on large files with ManifoldCF itself. That should not happen, and if it does we need to figure out why. Solr memory issues, on the other hand, I can believe. If that is the problem, then I agree we should try to do something about it. Probably the right thing to do is (since it is a Solr limitation) adding a configuration parameter to the Solr connector that specifies the maximum size of a file the connection will accept. Files larger than that should return a 400 if indexing is attempted, etc. Perhaps we should also consider adding a new method to the IOutputConnector interface that returns a maximum file size value, and expose that in IVersionActivity and IProcessActivity. That would allow connectors to make output-based decisions as to whether they should fetch large files in the first place. Karl On Thu, Jun 23, 2011 at 7:32 AM, Erlend Garåsen <e.f.gara...@usit.uio.no> wrote: I will create a ticket today. Post filtering sounds like a good idea. Another thing. We are facing memory problems with huge documents. 
Maybe we should add another feature in order to cope with such documents, for instance skipping documents which exceed a preset size. We have discovered PDFs as large as 500 MB. What do you think? Do we need such a feature as well? Erlend On 23.06.11 12.08, Karl Wright wrote: Have there been any further developments on this thread? Karl On Tue, Jun 21, 2011 at 6:08 AM, Karl Wright <daddy...@gmail.com> wrote: Sure. But you've already convinced me we need a new feature. ;-) Karl On Tue, Jun 21, 2011 at 3:50 AM, Erlend Garåsen <e.f.gara...@usit.uio.no> wrote: Sure, I can create a ticket. But first I want to discuss this issue with the two search consultants we have hired. I decided to post to the dev list in order to get some feedback on this issue. Erlend On 20.06.11 18.00, Karl Wright wrote: Hi Erlend, The inclusions and exclusions are based solely on URL, and block the connector from fetching the file. Otherwise you would easily wind up fetching the entire web. However, this raises an interesting issue as to whether there's a way in the web connector to do what you are trying to do, which is to filter based on URL after links have been extracted. The current inclusions/exclusions work fine for any URLs without links but do not allow for the case you are looking for. Can you create a ticket? The suggestion would be to introduce post-extraction inclusions and exclusions into the connector. Karl On Mon, Jun 20, 2011 at 10:53 AM, Erlend Garåsen <e.f.gara...@usit.uio.no> wrote: I just realized that if I exclude html files for a job, links in these files will not be followed. Is this desirable behaviour? Should links be followed regardless of the exclude filter? I discovered this issue when I was going to crawl only pdfs and realized that the job ended without finding any documents at all. 
I think I had something like this in my include list: http://foreninger.uio.no/.*\.pdf$ http://folk.uio.no/.*\.pdf$ Erlend -- Erlend Garåsen Center for Information Technology Services University of Oslo P.O. Box 1086 Blindern, N-0317 OSLO, Norway Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
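The maximum-size idea discussed in this thread (skip documents above a preset size rather than handing them to Solr) can be illustrated with a small shell check. This is only a stand-in for whatever the connector does internally: the MAX_BYTES name and the 100 MB value are hypothetical and are not the actual Solr connector parameter.

```shell
#!/bin/sh
# Hypothetical limit for illustration: 100 MB.
MAX_BYTES=$((100 * 1024 * 1024))

# Succeeds if the file is small enough to be indexed.
within_size_limit() {
  # Arithmetic expansion strips the padding some wc implementations add.
  size=$(( $(wc -c < "$1") ))
  [ "$size" -le "$MAX_BYTES" ]
}

# Usage sketch:
#   if within_size_limit big.pdf; then echo "index"; else echo "skip"; fi
```

A 500 MB PDF like the ones mentioned above would fail this check and be skipped before any indexing attempt.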
Re: Excluding html files and following links
Sure, I can create a ticket. But first I want to discuss this issue with the two search consultants we have hired. I decided to post to the dev list in order to get some feedback on this issue. Erlend On 20.06.11 18.00, Karl Wright wrote: Hi Erlend, The inclusions and exclusions are based solely on URL, and block the connector from fetching the file. Otherwise you would easily wind up fetching the entire web. However, this raises an interesting issue as to whether there's a way in the web connector to do what you are trying to do, which is to filter based on URL after links have been extracted. The current inclusions/exclusions work fine for any URLs without links but do not allow for the case you are looking for. Can you create a ticket? The suggestion would be to introduce post-extraction inclusions and exclusions into the connector. Karl On Mon, Jun 20, 2011 at 10:53 AM, Erlend Garåsen <e.f.gara...@usit.uio.no> wrote: I just realized that if I exclude html files for a job, links in these files will not be followed. Is this a desirable behaviour? Should links be followed regardless of the exclude filter? I discovered this issue when I was going to crawl only pdfs and realized that the job ended without finding any documents at all. I think I had something like this in my include list: http://foreninger.uio.no/.*\.pdf$ http://folk.uio.no/.*\.pdf$ Erlend -- Erlend Garåsen Center for Information Technology Services University of Oslo P.O. Box 1086 Blindern, N-0317 OSLO, Norway Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
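The behaviour described in this thread can be reproduced outside ManifoldCF with the two include patterns quoted above. The sketch below uses grep -E as a stand-in for the connector's own regular-expression matching, which is an assumption, and the two example URLs are hypothetical.

```shell
#!/bin/sh
# Succeeds if the URL matches at least one of the include patterns from the thread.
is_included() {
  url=$1
  for pattern in 'http://foreninger.uio.no/.*\.pdf$' 'http://folk.uio.no/.*\.pdf$'; do
    if printf '%s\n' "$url" | grep -Eq "$pattern"; then
      return 0
    fi
  done
  return 1
}

# Hypothetical URLs: an HTML page that links to a PDF on the same host.
for url in http://folk.uio.no/someuser/index.html http://folk.uio.no/someuser/thesis.pdf; do
  if is_included "$url"; then
    echo "fetch $url"
  else
    echo "skip  $url"
  fi
done
```

Because the filter is applied before fetching, the HTML page is skipped and its links are never extracted, so the PDF is only found if it happens to be a seed URL. A post-extraction filter, as suggested in the thread, would fetch the page for link discovery and apply the patterns only when deciding what to index.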
Re: The ManifoldCF PPMC welcomes Shinichiro Abe as a ManifoldCF committer!
Welcome and congratulations! Erlend On 13.05.11 12.05, Karl Wright wrote: Please join us in welcoming Shinichiro to the ManifoldCF team! Karl -- Erlend Garåsen Center for Information Technology Services University of Oslo P.O. Box 1086 Blindern, N-0317 OSLO, Norway Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
Re: [VOTE] Release Apache ManifoldCF 0.2-incubating, RC2
+1 from me as well.
1. Verified the CHANGES.txt
2. Tested the binary release (tar.gz) on OS X:
- md5sums
- ant test
- Did a web crawl and indexed into Solr 3.1 using the example (Jetty + Derby)
- Deployed on Resin 4.0.15, crawled the web, indexed into Solr 3.1 and used an external PostgreSQL server (new feature in this release)
- Verified that an HTML page with meta robots noindex was not indexed by Solr (new feature in this release)
3. Tested the source release (zip) on OS X:
- md5sums
- ant test
- ant build
- Did a web crawl and indexed into Solr 3.1 using the example (Jetty + Derby)
On 27.04.11 00.12, Karl Wright wrote: The RC2 of the 0.2-incubating release is now up on http://people.apache.org/~kwright. The svn tag is at https://svn.apache.org/repos/asf/incubator/lcf/tags/release-0.2-incubating-RC2. Please vote! Karl
-- Erlend Garåsen Center for Information Technology Services University of Oslo P.O. Box 1086 Blindern, N-0317 OSLO, Norway Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
Re: [VOTE] Release ManifoldCF-0.2-incubating, RC1
-1. The executecommand.sh script cannot be run since CRLF line endings have been added (should be LF). I have created a ticket about this problem: https://issues.apache.org/jira/browse/CONNECTORS-188 Erlend On 01.04.11 12.24, Karl Wright wrote: RC1 is now available on http://people.apache.org/~kwright/apache-manifoldcf-0.2-incubating. Please check it out and vote! Karl On Fri, Apr 1, 2011 at 3:38 AM, Karl Wright <daddy...@gmail.com> wrote: Vote failed; a new release candidate with a fix is being respun. This will be RC1. Karl -- Erlend Garåsen Center for Information Technology Services University of Oslo P.O. Box 1086 Blindern, N-0317 OSLO, Norway Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050
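A CRLF problem like the one reported above can be checked for, and worked around locally, with standard tools. This is a minimal sketch and not the fix applied in the release; the function names are made up for illustration.

```shell
#!/bin/sh
# Succeeds if the file contains at least one carriage return.
has_crlf() {
  grep -q "$(printf '\r')" "$1"
}

# Writes a copy of the file to stdout with every carriage return removed.
# Note: this drops all CR characters, not only those directly before an LF.
strip_cr() {
  tr -d '\r' < "$1"
}

# Usage sketch:
#   if has_crlf executecommand.sh; then
#     strip_cr executecommand.sh > executecommand.sh.lf
#     mv executecommand.sh.lf executecommand.sh
#     chmod +x executecommand.sh
#   fi
```

The script fails with CRLF endings because the shell treats the trailing carriage return as part of the interpreter path or command name on each line.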