Re: [VOTE] Propose graduation of ManifoldCF from the Incubator as a Top Level Project

2012-04-24 Thread Erlend Garåsen


+1

Erlend

On 24.04.12 12.08, Karl Wright wrote:

Please vote +1 if you think we should propose to the Incubator that we
graduate as a top level project at this time.

If this vote passes, I will open a [DISCUSS] thread in
gene...@incubator.apache.org, and turn it into a [VOTE] thread if the
discussion looks positive.

+1(binding) from me.

Karl



--
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050


Re: Build fails with Maven while building MCF from trunk

2012-04-24 Thread Erlend Garåsen

On 12.04.12 23.46, Karl Wright wrote:

The build process has changed.  The incubator required we remove all
binaries.  You will need to do one of the following:
(a) Download the -lib package from the release candidate and follow
the instructions


I tried to put all the jars from the lib distribution into a lib 
folder in my trunk working copy, following the instructions 
in the README file.


What exactly is the corresponding lib distribution if you work with 
trunk? The latest 0.5 release? Here's what I did:

1. Downloaded apache-manifoldcf-0.5-incubating-lib.tar.gz and unpacked it
2. Created a lib directory in the project root and copied all jars from 
the lib distribution into that directory
3. Ran ant build (nothing really happens)
4. Tried ant make-deps followed by ant build.

I guess there is one detail I have missed. Your suggestions work 
perfectly if I download the corresponding source package, but not with 
the latest version from trunk.


Erlend


(b) Make sure you have svn 1.7 installed and run ant make-core-deps


It's not straightforward to upgrade SVN to version 1.7 on OS X. Fink is 
the most common tool for installing and upgrading SVN, but its 
current stable version is 1.6.17-4:

http://pdb.finkproject.org/pdb/package.php/svn

If I _should_ use SVN 1.7, I can try to install it from MacPorts.

Erlend



Re: Build fails with Maven while building MCF from trunk

2012-04-24 Thread Erlend Garåsen


On 24.04.12 21.16, Karl Wright wrote:

ant clean-core-deps make-core-deps


OK, SVN 1.7 is required. I can try to install it from MacPorts.

patch-source-via-svn:
 [exec] Unknown command: 'patch'
 [exec] Type 'svn help' for usage.

BUILD FAILED
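The "Unknown command: 'patch'" message is consistent with an svn client older than 1.7, since the svn patch subcommand first appeared in Subversion 1.7. A minimal sketch, assuming a Java 7+ runtime and an svn binary on the PATH, that checks the client version before running ant make-core-deps; the class and method names are hypothetical and not part of ManifoldCF:

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;

// Hypothetical helper, not part of ManifoldCF: verifies that the svn client
// is new enough for the "svn patch" call made during ant make-core-deps.
class SvnVersionCheck {

    /** Returns true if a version string such as "1.7.4" is at least major.minor. */
    static boolean isAtLeast(String version, int major, int minor) {
        String[] parts = version.trim().split("\\.");
        if (parts.length < 2) {
            return false;
        }
        try {
            int actualMajor = Integer.parseInt(parts[0]);
            // Drop any non-numeric suffix such as "-dev" from the minor component.
            int actualMinor = Integer.parseInt(parts[1].replaceAll("\\D.*$", ""));
            return actualMajor > major || (actualMajor == major && actualMinor >= minor);
        } catch (NumberFormatException e) {
            return false;
        }
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // "svn --version --quiet" prints just the version number, e.g. "1.6.17".
        Process process = new ProcessBuilder("svn", "--version", "--quiet")
                .redirectErrorStream(true)
                .start();
        String version;
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream()))) {
            version = reader.readLine();
        }
        process.waitFor();
        if (version != null && isAtLeast(version, 1, 7)) {
            System.out.println("svn " + version + " supports 'svn patch'");
        } else {
            System.out.println("svn " + version + " is too old; upgrade to 1.7 or later");
        }
    }
}
```

With Fink's 1.6.17 this would report the client as too old, while a 1.7.x build (for example from MacPorts) would pass.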



Missing pom.xml files for some (proprietary?) connectors

2012-04-15 Thread Erlend Garåsen
Is the following the reason why there are no pom.xml files for some 
connectors, for instance Documentum?
The biggest limitation of the current Maven build is that it does not 
support any of the proprietary connectors ...


Erlend



Re: Missing pom.xml files for some (proprietary?) connectors

2012-04-15 Thread Erlend Garåsen


Thanks for clarifying. It's not a blocker for me; I just prefer 
Maven to Ant. I noticed it while working on a Jira issue 
assigned to me.


Erlend

On 15.04.12 20.32, Karl Wright wrote:

The reason that Maven cannot build proprietary connectors is that
their dependencies are unavailable via Maven.  For instance, you cannot
build the LiveLink connector without a lapi.jar, and you can only get
that by purchasing a license from OpenText.

Karl



Re: Build fails with Maven while building MCF from trunk

2012-04-13 Thread Erlend Garåsen


I knew there were some changes due to new requirements from the Apache 
Incubator, but unfortunately I have been abroad for the last month and 
haven't been paying attention. Sorry about that.


Thanks for your help!

Erlend

On 12.04.12 23.46, Karl Wright wrote:

The build process has changed.  The incubator required we remove all
binaries.  You will need to do one of the following:
(a) Download the -lib package from the release candidate and follow
the instructions
(b) Make sure you have svn 1.7 installed and run ant make-core-deps

Only then will ant build or mvn-bootstrap work.  This is also
explained in the readme.

Thanks,
Karl
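In other words, the lib directory has to be populated before mvn-bootstrap or ant build can do anything; Erlend's mvn-bootstrap.sh error complained about a missing lib/xmlsec.jar. A small pre-flight check, sketched under the assumption that the expected jar names are passed on the command line (only xmlsec.jar is taken from the reported error; the class name is hypothetical):

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical pre-flight check: reports which expected jars are missing
// from the lib directory before running mvn-bootstrap.sh or ant build.
class LibDirCheck {

    /** Returns the names in expectedJars that are not regular files in libDir. */
    static List<String> missingJars(Path libDir, List<String> expectedJars) {
        List<String> missing = new ArrayList<>();
        for (String jar : expectedJars) {
            if (!Files.isRegularFile(libDir.resolve(jar))) {
                missing.add(jar);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Only xmlsec.jar comes from the reported error; pass the full list as arguments.
        List<String> expected = args.length > 0 ? Arrays.asList(args) : Arrays.asList("xmlsec.jar");
        List<String> missing = missingJars(Paths.get("lib"), expected);
        if (missing.isEmpty()) {
            System.out.println("lib/ looks populated");
        } else {
            System.out.println("Missing from lib/: " + missing);
        }
    }
}
```

Run from the project root, this would flag xmlsec.jar until either the -lib package has been unpacked into lib/ or ant make-core-deps has succeeded.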



Build fails with Maven while building MCF from trunk

2012-04-12 Thread Erlend Garåsen


I did an svn co to get a fresh copy of MCF trunk, since I 
had many temporary code changes that shouldn't be committed, and then I 
discovered some problems. I'm mentioning this in case the RC6 release 
candidate has similar problems.


When I run mvn-bootstrap.sh, the build fails after the dependencies have 
been downloaded:
[ERROR] Failed to execute goal 
org.apache.maven.plugins:maven-install-plugin:2.3.1:install-file 
(default-cli) on project mcf-parent: Error installing artifact 
'xml-security:xmlsec:jar': Failed to install artifact 
xml-security:xmlsec:jar:1.4.1: 
/Users/erlendfg/tmp/mcf_2012/lib/xmlsec.jar (No such file or directory) 
-> [Help 1]


I'm using Apache Maven 3.0.4

Erlend


Re: Build fails with Maven while building MCF from trunk

2012-04-12 Thread Erlend Garåsen


Yes, but it fails as I wrote.

I think something is broken in trunk at the moment. ant test and ant 
build fail as well. I double-checked by doing another svn co.


Erlend

On 12.04.12 22.18, Karl Wright wrote:

You need to run the mvn-bootstrap script, as per the instructions.

Karl



Re: Build fails with Maven while building MCF from trunk

2012-04-12 Thread Erlend Garåsen


It fails on Linux as well as on OS X. I just tried to run ant build and 
ant test on our Linux development server as well as on my laptop. Well, I 
get BUILD SUCCESSFUL, but nothing really happens. No tests run at all.


Erlend



Translations into Japanese needed for an error message

2012-04-11 Thread Erlend Garåsen


I need to translate an error message into Japanese in order to resolve a 
ticket.


The English text I need to translate: Invalid URLs in seeds list:
Google translates this to: 種子リスト内の無効なURL:

Is the translation correct?

Erlend


Re: [VOTE] Upgrade ManifoldCF to jdk 1.6 prior to next release

2012-04-11 Thread Erlend Garåsen


+1

Erlend

On 11.04.12 02.39, Karl Wright wrote:

Folks,

When doing the dependency rework, it became clear that many of our
binary dependencies are stuck without fixes or upgrades because we
are still using jdk 1.5.  I'd like to get a sense from the community
whether everyone thinks we should abandon support for jdk1.5 in our
next release.  The pros of such a move include allowing us to
upgrade to jetty 7 (which brings in a number of bug fixes), tomcat 7
components, and off-the-shelf hsqldb builds.  I cannot at this time
identify any obvious cons.  Please let me know your opinion.  +1 to
upgrade, from me.

Thanks,
Karl





Re: Postpone localization tickets for 0.5 release?

2012-03-20 Thread Erlend Garåsen

On 19.03.12 20.33, Karl Wright wrote:

Hi all,

I've not heard anything from Hitoshi Ozawa about whether he will be
able to complete some of the Japanese localization work that was
scheduled for 0.5-incubating.  Since the day is rapidly approaching
when we have to spin the first RC, I'd like a show of hands as to
whether we should postpone the two tickets currently on Hitoshi's
plate for the next release.  The tickets in question are
CONNECTORS-394 and CONNECTORS-404.

+1 from me for postponing.

Karl


+1



Re: Possible bug in seeds list (web connector)

2012-03-20 Thread Erlend Garåsen


I think it will be much easier to validate the seeds list by using 
JavaScript instead of parsing URLs with java.net.URL, simply because 
this is how we do validation elsewhere in the application.


Checking for valid URLs, supported protocols and illegal characters 
shouldn't be very complicated by using JavaScript.
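For comparison, the three checks listed above (valid URL, supported protocol, no illegal characters) can be sketched in Java on top of java.net.URL. This is a hypothetical helper, not the JavaScript that was attached to the ticket; the http/https protocol set and the "no whitespace" rule are assumptions:

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical seeds-list validator; the rules below are assumptions,
// not the checks ManifoldCF actually ships.
class SeedListValidator {

    // Assumption: only http and https seeds are supported.
    private static final Set<String> SUPPORTED_PROTOCOLS =
            new HashSet<>(Arrays.asList("http", "https"));

    /** Returns every non-blank line of the seeds text that fails validation. */
    static List<String> invalidSeeds(String seedsText) {
        List<String> invalid = new ArrayList<>();
        for (String line : seedsText.split("\\R")) {
            String seed = line.trim();
            if (!seed.isEmpty() && !isValidSeed(seed)) {
                invalid.add(seed);
            }
        }
        return invalid;
    }

    static boolean isValidSeed(String seed) {
        // "Illegal characters": here, any whitespace inside the URL.
        if (seed.matches(".*\\s.*")) {
            return false;
        }
        try {
            URL url = new URL(seed); // throws for a missing or unknown protocol
            return SUPPORTED_PROTOCOLS.contains(url.getProtocol()) && !url.getHost().isEmpty();
        } catch (MalformedURLException e) {
            return false;
        }
    }
}
```

The list returned by invalidSeeds() is the kind of thing an "Invalid URLs in seeds list:" message could report back to the user.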


What do you think?

Erlend

On 16.03.12 11.51, Karl Wright wrote:

Do you agree that a well-formed URL is what java.net.URL will accept
in the constructor's argument? Then www.example.org will fail, but
http://www.example.org (without a trailing slash) will pass.

I might even go a bit further.  See the following method in
WebcrawlerConnector:
protected String makeDocumentIdentifier(String parentIdentifier, String rawURL, DocumentURLFilter filter)

Thanks!
Karl





Re: Possible bug in seeds list (web connector)

2012-03-20 Thread Erlend Garåsen


This sounds fine. I have already written the necessary JavaScript 
code, and it works as it should. I didn't create a ticket because I 
needed time to figure out the best solution and to learn more 
about how the connector works by reading the Java code.


I will create a ticket right away and include the JavaScript, but I 
think I will create a patch as well before I commit my work.


Erlend

On 20.03.12 13.59, Karl Wright wrote:

I think this is a reasonable approach.  You may need to modify the
Python browser simulator, though, to keep the UI tests working.  I can
help you with that when the time comes.

If you create a ticket and include your proposed JavaScript, I can
review it and let you know how challenging I think it will be to
support it in the browser simulator.  Also, since we are trying to get
a release out the door, I think it makes sense to hold off on these
changes until I can make the release branch.  Sound OK?

Thanks!
Karl




Re: Possible bug in seeds list (web connector)

2012-03-20 Thread Erlend Garåsen


I have created a ticket for this and entered some unfinished JavaScript 
code. My head does not work very well with regular expressions today, so 
I will improve the code tomorrow.


Please write a few lines about what I need to do with the Python browser 
simulator in order to test the JavaScript.


Erlend



Re: Possible bug in seeds list (web connector)

2012-03-16 Thread Erlend Garåsen

On 15.03.12 19.30, Karl Wright wrote:

A seed can be a specific HTML file, so complaining about a trailing
slash would break that.  For example:

http://hello.world.com/startpage.html


I think I was a bit unclear in my previous email. By a trailing 
slash, I meant the slash right after the domain name itself, e.g. 
www.example.org/.


I will create a Jira ticket now, but I will only focus on well-formed 
URLs in the seeds list.


Do you agree that a well-formed URL is one that the java.net.URL 
constructor accepts? Then www.example.org will fail, but 
http://www.example.org (without a trailing slash) will pass.
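A quick way to check that criterion, as a minimal sketch (the class name is hypothetical): the java.net.URL constructor throws MalformedURLException ("no protocol") for www.example.org, and accepts both http://www.example.org and a seed that points at a specific page.

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical check: a seed counts as well-formed if the java.net.URL
// constructor accepts it.
class WellFormedSeedCheck {

    static boolean isWellFormed(String seed) {
        try {
            new URL(seed);
            return true;
        } catch (MalformedURLException e) {
            return false; // e.g. "no protocol: www.example.org"
        }
    }

    public static void main(String[] args) {
        String[] seeds = {
            "www.example.org",
            "http://www.example.org",
            "http://hello.world.com/startpage.html"
        };
        for (String seed : seeds) {
            System.out.println(seed + " -> " + (isWellFormed(seed) ? "accepted" : "rejected"));
        }
        // Prints:
        // www.example.org -> rejected
        // http://www.example.org -> accepted
        // http://hello.world.com/startpage.html -> accepted
    }
}
```

Note that this criterion says nothing about the trailing slash: http://www.example.org is accepted even though its path is empty.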


Erlend



Should the default value for "Include only hosts matching seeds" be yes/checked?

2012-03-15 Thread Erlend Garåsen


I suggest that we change the default value of the web connector's
"Include only hosts matching seeds?" option to yes/checked.

If you only want to crawl your own company's web pages but forget to 
check this option by mistake, you risk crawling a lot of external web 
pages as well.
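A hypothetical sketch of what such a filter does conceptually: collect the hosts of the seed URLs and accept a candidate URL only if its host is in that set. This is not the web connector's actual code. One side effect worth noting: with this approach a seed written as www.uio.no (no protocol) yields no host at all, so nothing is accepted; that would be one possible explanation for the empty crawls reported in the seeds-list thread, but it is an assumption, not a confirmed diagnosis.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Set;

// Hypothetical illustration of "Include only hosts matching seeds";
// not the web connector's implementation.
class SeedHostFilter {

    private final Set<String> seedHosts = new HashSet<>();

    SeedHostFilter(List<String> seeds) {
        for (String seed : seeds) {
            String host = hostOf(seed);
            if (host != null) {
                seedHosts.add(host);
            }
        }
    }

    /** Accepts a candidate URL only if its host equals the host of some seed. */
    boolean accepts(String candidateUrl) {
        String host = hostOf(candidateUrl);
        return host != null && seedHosts.contains(host);
    }

    /** Lower-cased host of a URL, or null if it cannot be parsed (e.g. no protocol). */
    private static String hostOf(String url) {
        try {
            String host = new URL(url).getHost();
            return host.isEmpty() ? null : host.toLowerCase(Locale.ROOT);
        } catch (MalformedURLException e) {
            return null;
        }
    }
}
```

With a seed of http://www.uio.no/, the filter lets http://www.uio.no/english/ through but rejects a link to http://www.example.org/, which is the protection a checked-by-default option would give.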


So what do you think? Should I create a ticket and change the default 
setting to checked/yes?


Erlend



Possible bug in seeds list (web connector)

2012-03-15 Thread Erlend Garåsen


If I add the following URL to my seeds list:
http://www.uio.no
and this to the "Include in crawl" list:
http://www.uio.no/.*
the job ends shortly after it starts without fetching anything 
at all. If I add the missing trailing slash to the seed URL 
(http://www.uio.no/), it works as it should.


I also discovered similar behaviour. If I add the following to 
my seeds list:

www.uio.no

select the "Include only hosts matching seeds?" option, and leave the 
"Include in crawl" list empty, the same thing happens: no URLs are 
fetched.


I suggest that we do something like this:
- In the Java code, a URL should always start with http(s)://www.myhost.com/
- If the user leaves out the protocol or the trailing slash, it should be 
added automatically instead of an error message being returned.


By "in the Java code", I mean that the URL should automatically be 
normalized to this form before we do a regular expression match.
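A minimal sketch of that normalization step in Java, assuming http as the default protocol when none is given; the class and method names are hypothetical. It only adds a slash when the path is empty, so a seed that points at a specific page, such as http://hello.world.com/startpage.html, is left unchanged:

```java
import java.net.MalformedURLException;
import java.net.URL;

// Hypothetical seed normalizer; not ManifoldCF code.
class SeedNormalizer {

    /**
     * Returns the seed as protocol://authority/..., adding "http://" if no
     * protocol is given and "/" if the path is empty; null if unparseable.
     */
    static String normalizeSeed(String seed) {
        String s = seed.trim();
        if (!s.contains("://")) {
            s = "http://" + s; // assumption: default to plain http
        }
        try {
            URL url = new URL(s);
            if (!url.getPath().isEmpty()) {
                return url.toString(); // already has a path, e.g. a specific page
            }
            StringBuilder sb = new StringBuilder()
                    .append(url.getProtocol()).append("://")
                    .append(url.getAuthority()).append('/');
            if (url.getQuery() != null) {
                sb.append('?').append(url.getQuery());
            }
            if (url.getRef() != null) {
                sb.append('#').append(url.getRef());
            }
            return sb.toString();
        } catch (MalformedURLException e) {
            return null;
        }
    }

    public static void main(String[] args) {
        String includePattern = "http://www.uio.no/.*";
        for (String seed : new String[] {"http://www.uio.no", "www.uio.no"}) {
            String normalized = normalizeSeed(seed);
            System.out.println(seed + " -> " + normalized
                    + " (matches include pattern: " + normalized.matches(includePattern) + ")");
        }
    }
}
```

Both example seeds normalize to http://www.uio.no/, which the include pattern http://www.uio.no/.* matches, whereas the raw string http://www.uio.no does not match it.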


Erlend



mvn eclipse:eclipse fails with Maven 3

2012-03-09 Thread Erlend Garåsen


When I run mvn eclipse:eclipse in order to prepare the MCF 
project for Eclipse, I get an error that seems to be related to the 
Alfresco connector, although this might be an issue with version 2.8 of the 
Maven Eclipse plugin. BTW, it works if I run the following instead:

mvn org.apache.maven.plugins:maven-eclipse-plugin:2.6:eclipse

I upgraded to Maven 3 a week ago, so I haven't seen this error (yet) 
in my other Java projects.


This is part of the error message from Maven, using the -e switch:

[ERROR] Failed to execute goal org.apache.maven.plugins:maven-eclipse-plugin:2.8:eclipse (default-cli) on project mcf-alfresco-war-test: Request to merge when 'filtering' is not identical. Original=resource src/main/resources: output=target/classes, include=[], exclude=[**/restore-context.xml|**/ldap-*.xml|**/*.java], test=false, filtering=false, merging with=resource src/main/resources: output=target/classes, include=[log4j.properties], exclude=[**/*.java], test=false, filtering=true -> [Help 1]
org.apache.maven.lifecycle.LifecycleExecutionException: Failed to execute goal org.apache.maven.plugins:maven-eclipse-plugin:2.8:eclipse (default-cli) on project mcf-alfresco-war-test: Request to merge when 'filtering' is not identical. Original=resource src/main/resources: output=target/classes, include=[], exclude=[**/restore-context.xml|**/ldap-*.xml|**/*.java], test=false, filtering=false, merging with=resource src/main/resources: output=target/classes, include=[log4j.properties], exclude=[**/*.java], test=false, filtering=true
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:217)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:153)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:145)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:84)
	at org.apache.maven.lifecycle.internal.LifecycleModuleBuilder.buildProject(LifecycleModuleBuilder.java:59)
	at org.apache.maven.lifecycle.internal.LifecycleStarter.singleThreadedBuild(LifecycleStarter.java:183)
	at org.apache.maven.lifecycle.internal.LifecycleStarter.execute(LifecycleStarter.java:161)
	at org.apache.maven.DefaultMaven.doExecute(DefaultMaven.java:320)
	at org.apache.maven.DefaultMaven.execute(DefaultMaven.java:156)
	at org.apache.maven.cli.MavenCli.execute(MavenCli.java:537)
	at org.apache.maven.cli.MavenCli.doMain(MavenCli.java:196)
	at org.apache.maven.cli.MavenCli.main(MavenCli.java:141)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
	at java.lang.reflect.Method.invoke(Method.java:597)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launchEnhanced(Launcher.java:290)
	at org.codehaus.plexus.classworlds.launcher.Launcher.launch(Launcher.java:230)
	at org.codehaus.plexus.classworlds.launcher.Launcher.mainWithExitCode(Launcher.java:409)
	at org.codehaus.plexus.classworlds.launcher.Launcher.main(Launcher.java:352)
Caused by: org.apache.maven.plugin.MojoExecutionException: Request to merge when 'filtering' is not identical. Original=resource src/main/resources: output=target/classes, include=[], exclude=[**/restore-context.xml|**/ldap-*.xml|**/*.java], test=false, filtering=false, merging with=resource src/main/resources: output=target/classes, include=[log4j.properties], exclude=[**/*.java], test=false, filtering=true
	at org.apache.maven.plugin.eclipse.EclipseSourceDir.merge(EclipseSourceDir.java:302)
	at org.apache.maven.plugin.eclipse.EclipsePlugin.extractResourceDirs(EclipsePlugin.java:1652)
	at org.apache.maven.plugin.eclipse.EclipsePlugin.buildDirectoryList(EclipsePlugin.java:1534)
	at org.apache.maven.plugin.eclipse.EclipsePlugin.createEclipseWriterConfig(EclipsePlugin.java:1222)
	at org.apache.maven.plugin.eclipse.EclipsePlugin.writeConfiguration(EclipsePlugin.java:1085)
	at org.apache.maven.plugin.ide.AbstractIdeSupportMojo.execute(AbstractIdeSupportMojo.java:511)
	at org.apache.maven.plugin.DefaultBuildPluginManager.executeMojo(DefaultBuildPluginManager.java:101)
	at org.apache.maven.lifecycle.internal.MojoExecutor.execute(MojoExecutor.java:209)
	... 19 more
[ERROR]
[ERROR] Re-run Maven using the -X switch to enable full debug logging.
[ERROR]
[ERROR] For more information about the errors and possible solutions, please read the following articles:
[ERROR] [Help 1] http://cwiki.apache.org/confluence/display/MAVEN/MojoExecutionException
[ERROR]
[ERROR] After correcting the problems, you can resume the build with the command
[ERROR]   mvn <goals> -rf :mcf-alfresco-war-test

Erlend


Re: mvn eclipse:eclipse fails with Maven 3

2012-03-09 Thread Erlend Garåsen
I forgot to mention that I tried adding the following to the parent 
pom.xml file, without any luck:

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-eclipse-plugin</artifactId>
  <version>2.6</version>
</plugin>

Erlend



Re: mvn eclipse:eclipse fails with Maven 3

2012-03-09 Thread Erlend Garåsen

Erlend


Re: Please welcome Hitoshi Ozawa to the ManifoldCF community!

2012-02-13 Thread Erlend Garåsen


Congratulations and welcome!

By the way, I hope I will meet some of you committers at Lucene 
Revolution in Boston this year.


http://www.lucenerevolution.com/

Erlend

On 07.02.12 03.39, Karl Wright wrote:

Hitoshi is now officially a ManifoldCF Committer.  Congratulations, Hitoshi!

Karl




Commands for registering agents etc.

2012-01-24 Thread Erlend Garåsen


Hello list,

I haven't been an active developer for a while since I have been sick, 
so a lot has changed since the last time I dug into the code and the 
setup process.


It seems that the documentation for the executecommand tool has 
changed, since I cannot find the commands for installing the web 
crawler, for instance.


Maybe it would be a good idea to explain the available arguments for 
the various agents and crawlers. I just need to install the Solr and web 
crawler connectors, but cannot remember how. The documentation says: 
"After you have created the necessary configuration files, you will 
need to initialize the database, register the pull-agent agent, and then 
register your individual connectors. ManifoldCF provides a set of 
commands for performing these actions, and others as well."


Here's what I did:
./processes/script/executecommand.sh org.apache.manifoldcf.agents.Install

But I no longer remember what else I did to complete the necessary 
steps. I managed to register the Solr output connector, but not the web 
crawler:
./processes/script/executecommand.sh \
  org.apache.manifoldcf.crawler.Register \
  org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector \
  WebCrawler


This gives an error message saying that the relation "connectors" does 
not exist. I think I need to register the pull agent, but I'm not sure how.


Erlend


Re: Excluding html files and following links

2011-07-25 Thread Erlend Garåsen


Unfortunately, I have only reported this problem in a local Jira ticket 
(in our own Jira at the university). Since we are in a hurry with our 
search project at the moment, the issue is still open.


The host with the large documents is now excluded from our crawl, 
mainly because it has been decided that we don't want to index it at 
all. The host contains private web pages, mostly published by students.


I will keep you informed when we have made a decision about what to do 
with large documents. I guess the new parameter will do the trick. 
Thanks for working on this issue!


Erlend

On 05.07.11 12.04, Karl Wright wrote:

Have you had a look at the feature added, and does it work for you?
I'd also still be interested in knowing where you are seeing
out-of-memory situations.

Karl

On Thu, Jun 23, 2011 at 8:03 AM, Karl Wright <daddy...@gmail.com> wrote:

Hi Erlend,

I hope you are not seeing memory issues on large files with ManifoldCF
itself.  That should not happen, and if it does we need to figure out
why.

Solr memory issues, on the other hand, I can believe.  If that is the
problem, then I agree we should try to do something about it.
Since it is a Solr limitation, the right thing to do is probably to add
a configuration parameter to the Solr connector that specifies the
maximum size of a file the connection will accept.  Files larger
than that should return a 400 if indexing is attempted, etc.

Perhaps we should also consider adding a new method to the
IOutputConnector interface that returns a maximum file size value, and
expose that in IVersionActivity and IProcessActivity.  That would
allow connectors to make output-based decisions as to whether they
should fetch large files in the first place.
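A minimal sketch of the proposed size check, in Java since ManifoldCF is a Java code base. The names `SizeLimitedOutput`, `maxDocumentLength()` and `SizeLimitSketch` are hypothetical and are not part of IOutputConnector, IVersionActivity or IProcessActivity. The sketch also assumes the repository connector learns the document length before fetching (for example from an HTTP Content-Length header), with a negative value meaning "unknown".

```java
/**
 * Hypothetical: an output connection advertises the largest document it accepts.
 * This is NOT part of ManifoldCF's real IOutputConnector interface.
 */
interface SizeLimitedOutput {
    /** Maximum accepted document length in bytes; Long.MAX_VALUE means no limit. */
    long maxDocumentLength();
}

/** Hypothetical helper a repository connector could consult before fetching. */
final class SizeLimitSketch {
    private SizeLimitSketch() {}

    /**
     * Returns true if the document should be fetched.
     * A negative length stands for "size unknown" (e.g. no Content-Length header);
     * such documents are fetched, leaving the final check to the output side.
     */
    static boolean shouldFetch(SizeLimitedOutput output, long contentLength) {
        if (contentLength < 0) {
            return true;
        }
        return contentLength <= output.maxDocumentLength();
    }
}
```

With a hypothetical 100 MB limit, a 500 MB PDF like the ones reported in this thread would be skipped without being downloaded, while documents of unknown size are still fetched and left for the output connection to reject.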

Karl


On Thu, Jun 23, 2011 at 7:32 AM, Erlend Garåsen <e.f.gara...@usit.uio.no> wrote:


I will create a ticket today. Post filtering sounds like a good idea.

Another thing: we are facing memory problems with huge documents. Maybe we
should add another feature to cope with such documents, for instance
skipping documents that exceed a preset size. We have discovered PDFs of
500 MB. What do you think? Do we need such a feature as well?

Erlend

On 23.06.11 12.08, Karl Wright wrote:


Have there been any further developments on this thread?
Karl

On Tue, Jun 21, 2011 at 6:08 AM, Karl Wright <daddy...@gmail.com> wrote:


Sure.  But you've already convinced me we need a new feature. ;-)

Karl

On Tue, Jun 21, 2011 at 3:50 AM, Erlend Garåsen <e.f.gara...@usit.uio.no> wrote:


Sure, I can create a ticket. But first I want to discuss this issue with
the two search consultants we have hired.

I decided to post to the dev list in order to get some feedback on this
issue.

Erlend

On 20.06.11 18.00, Karl Wright wrote:


Hi Erlend,

The inclusions and exclusions are based solely on URL, and block the
connector from fetching the file.  Otherwise you would easily wind up
fetching the entire web.

However, this raises an interesting issue as to whether there's a way
in the web connector to do what you are trying to do, which is to
filter based on URL after links have been extracted.  The current
inclusions/exclusions work fine for any URLs without links but do not
allow for the case you are looking for.

Can you create a ticket?  The suggestion would be to introduce
post-extraction inclusions and exclusions into the connector.

Karl


On Mon, Jun 20, 2011 at 10:53 AM, Erlend Garåsen <e.f.gara...@usit.uio.no> wrote:


I just realized that if I exclude HTML files for a job, links in these
files will not be followed. Is this desirable behaviour? Should links be
followed regardless of the exclude filter?

I discovered this issue when I tried to crawl only PDFs and realized
that the job ended without finding any documents at all. I think I had
something like this in my include list:
http://foreninger.uio.no/.*\.pdf$
http://folk.uio.no/.*\.pdf$
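The effect of this include list can be reproduced with a small Java sketch. It assumes, as Karl describes, that inclusion patterns are checked against the URL alone before anything is fetched, and that a pattern accepts a URL when `Matcher.find()` succeeds; ManifoldCF's exact matching rules may differ. `IncludeFilterSketch` and the example URLs are hypothetical.

```java
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical sketch of URL-only inclusion filtering: a URL that matches no
 * include pattern is never fetched, so links inside it are never extracted.
 */
final class IncludeFilterSketch {
    private IncludeFilterSketch() {}

    // The include list quoted in the message above.
    static final List<Pattern> INCLUDES = List.of(
            Pattern.compile("http://foreninger.uio.no/.*\\.pdf$"),
            Pattern.compile("http://folk.uio.no/.*\\.pdf$"));

    /** True if at least one include pattern is found in the URL. */
    static boolean shouldFetch(String url) {
        for (Pattern include : INCLUDES) {
            if (include.matcher(url).find()) {
                return true;
            }
        }
        return false;
    }
}
```

Every HTML page, including a seed such as http://folk.uio.no/, is rejected, so the links that would lead to the PDFs are never discovered. That is consistent with the job ending without finding any documents.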

Erlend


Re: Excluding html files and following links

2011-06-21 Thread Erlend Garåsen


Sure, I can create a ticket. But first I want to discuss this issue with 
the two search consultants we have hired.


I decided to post to the dev list in order to get some feedback on this 
issue.


Erlend

On 20.06.11 18.00, Karl Wright wrote:

Hi Erlend,

The inclusions and exclusions are based solely on URL, and block the
connector from fetching the file.  Otherwise you would easily wind up
fetching the entire web.

However, this raises an interesting issue as to whether there's a way
in the web connector to do what you are trying to do, which is to
filter based on URL after links have been extracted.  The current
inclusions/exclusions work fine for any URLs without links but do not
allow for the case you are looking for.

Can you create a ticket?  The suggestion would be to introduce
post-extraction inclusions and exclusions into the connector.
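One way the suggested post-extraction inclusions could sit alongside the existing URL filter, sketched in Java under the simplest assumption: one pattern list decides what is fetched (and therefore has its links followed), a second decides what is sent for indexing. `TwoStageFilterSketch` and its methods are hypothetical and do not describe the web connector's actual configuration model.

```java
import java.util.List;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

/**
 * Hypothetical sketch of "post-extraction" filtering: fetch patterns decide which
 * URLs are fetched (and have their links extracted); index patterns decide which
 * fetched documents are handed to the output connection.
 */
final class TwoStageFilterSketch {
    private final List<Pattern> fetchIncludes;
    private final List<Pattern> indexIncludes;

    TwoStageFilterSketch(List<String> fetchIncludes, List<String> indexIncludes) {
        this.fetchIncludes = compile(fetchIncludes);
        this.indexIncludes = compile(indexIncludes);
    }

    private static List<Pattern> compile(List<String> regexes) {
        return regexes.stream().map(Pattern::compile).collect(Collectors.toList());
    }

    private static boolean anyFound(List<Pattern> patterns, String url) {
        return patterns.stream().anyMatch(p -> p.matcher(url).find());
    }

    /** Stage 1: may the crawler fetch this URL and follow the links it contains? */
    boolean shouldFetch(String url) {
        return anyFound(fetchIncludes, url);
    }

    /** Stage 2: after fetching and link extraction, should the document be indexed? */
    boolean shouldIndex(String url) {
        return shouldFetch(url) && anyFound(indexIncludes, url);
    }
}
```

For the PDF-only crawl, the fetch list could admit everything under http://folk.uio.no/ while the index list only admits URLs ending in .pdf, so HTML pages are still crawled for links but never indexed.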

Karl


On Mon, Jun 20, 2011 at 10:53 AM, Erlend Garåsen <e.f.gara...@usit.uio.no> wrote:


I just realized that if I exclude HTML files for a job, links in these files
will not be followed. Is this desirable behaviour? Should links be
followed regardless of the exclude filter?

I discovered this issue when I tried to crawl only PDFs and realized
that the job ended without finding any documents at all. I think I had
something like this in my include list:
http://foreninger.uio.no/.*\.pdf$
http://folk.uio.no/.*\.pdf$

Erlend


Re: The ManifoldCF PPMC welcomes Shinichiro Abe as a ManifoldCF committer!

2011-05-13 Thread Erlend Garåsen


Welcome and congratulations!

Erlend

On 13.05.11 12.05, Karl Wright wrote:

Please join us in welcoming Shinichiro to the ManifoldCF team!

Karl




Re: [VOTE] Release Apache ManifoldCF 0.2-incubating, RC2

2011-04-28 Thread Erlend Garåsen

+1 from me as well.

1. Verified the CHANGES.txt

2. Tested the binary release (tar.gz) on OS X:
- md5sums
- ant test
- Did a web crawl and indexed Solr 3.1 using the example (Jetty + Derby)
- Deployed on Resin 4.0.15, crawled the web, indexed Solr 3.1 and used 
an external PostgreSQL server (new feature in this release)
- Verified that an HTML page with a robots noindex meta tag was not 
indexed by Solr (new feature in this release)


3. Tested the source release (zip) on OS X:
- md5sums
- ant test
- ant build
- Did a web crawl and indexed Solr 3.1 using the example (Jetty + Derby)
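The md5sums check amounts to hashing each downloaded artifact and comparing the result with the published checksum. A minimal Java sketch using the standard `java.security.MessageDigest` API; `Md5Sketch` is a hypothetical name, and `matches` assumes the checksum file's first whitespace-separated token is the hex digest, which may not hold for every .md5 layout.

```java
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

/** Hypothetical helper for checking a release artifact against its .md5 file. */
final class Md5Sketch {
    private Md5Sketch() {}

    /** Lower-case hex MD5 digest of the given bytes. */
    static String md5Hex(byte[] data) {
        try {
            MessageDigest md = MessageDigest.getInstance("MD5");
            StringBuilder hex = new StringBuilder();
            for (byte b : md.digest(data)) {
                hex.append(String.format("%02x", b)); // Formatter treats the byte as unsigned
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            // Every Java platform is required to provide MD5.
            throw new IllegalStateException(e);
        }
    }

    /** Compares the artifact's digest with the first token of the checksum file, ignoring case. */
    static boolean matches(byte[] artifact, String md5FileContents) {
        String expected = md5FileContents.trim().split("\\s+")[0].toLowerCase();
        return md5Hex(artifact).equals(expected);
    }
}
```

The artifact bytes can be read with `Files.readAllBytes`; for large archives, `java.security.DigestInputStream` avoids holding the whole file in memory. An MD5 match only detects corrupted downloads; the detached signatures are what establish authenticity.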


On 27.04.11 00.12, Karl Wright wrote:

The RC2 of the 0.2-incubating release is now up on
http://people.apache.org/~kwright.  The svn tag is at
https://svn.apache.org/repos/asf/incubator/lcf/tags/release-0.2-incubating-RC2.

Please vote!

Karl




Re: [VOTE] Release ManifoldCF-0.2-incubating, RC1

2011-04-26 Thread Erlend Garåsen

-1.

The executecommand.sh script cannot be run because it has CRLF line 
endings (it should have LF). I have created a ticket about this problem:

https://issues.apache.org/jira/browse/CONNECTORS-188
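CRLF endings break a shell script because each line keeps a trailing carriage return. If the script starts with a shebang line, the kernel looks for an interpreter whose name ends in `\r` and typically fails with a "bad interpreter" error. As a local workaround only (not the fix made for CONNECTORS-188), the endings can be rewritten. A minimal Java sketch; `CrlfToLfSketch` is a hypothetical name, and the file is assumed to be UTF-8 or plain ASCII.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical workaround: rewrite CRLF line endings as LF so a Unix shell can run the script. */
final class CrlfToLfSketch {
    private CrlfToLfSketch() {}

    /** Replaces every CRLF pair with a bare LF; lone CR or LF characters are left untouched. */
    static String toLf(String text) {
        return text.replace("\r\n", "\n");
    }

    /** Converts the file in place, assuming UTF-8 (or plain ASCII) content. */
    static void convertFile(Path script) throws IOException {
        String original = new String(Files.readAllBytes(script), StandardCharsets.UTF_8);
        Files.write(script, toLf(original).getBytes(StandardCharsets.UTF_8));
    }
}
```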

Erlend

On 01.04.11 12.24, Karl Wright wrote:

RC1 is now available on
http://people.apache.org/~kwright/apache-manifoldcf-0.2-incubating.
Please check it out and vote!

Karl


On Fri, Apr 1, 2011 at 3:38 AM, Karl Wright <daddy...@gmail.com> wrote:

Vote failed; a new release candidate with a fix is being respun.  This
will be RC1.
Karl



