Hello,

I'm struggling to install nutch-1.3 into my local maven repo. How do you do
it?

I'm writing a Nutch plugin (actually a Boilerpipe integration test) as a
Maven project (a module of other projects). The challenges are:
1. Getting it to compile with a dependency on Nutch.
2. Getting the tests to run.

For 1:
$ mvn install:install-file -Dfile=build/nutch-1.3.jar -DpomFile=pom.xml
works but prints:
[WARNING] 'dependencies.dependency.exclusions.exclusion.artifactId' for org.apache.tika:tika-parsers:jar is missing. @ line 173, column 22
[WARNING] 'dependencies.dependency.exclusions.exclusion.artifactId' for org.apache.tika:tika-parsers:jar is missing. @ line 176, column 22
[WARNING] 'dependencies.dependency.exclusions.exclusion.artifactId' for org.apache.tika:tika-parsers:jar is missing. @ line 179, column 22
[WARNING] 'dependencies.dependency.exclusions.exclusion.artifactId' for log4j:log4j:jar is missing. @ line 189, column 22
[WARNING] 'dependencies.dependency.exclusions.exclusion.artifactId' for log4j:log4j:jar is missing. @ line 192, column 22
[WARNING] 'dependencies.dependency.exclusions.exclusion.artifactId' for log4j:log4j:jar is missing. @ line 195, column 22
[WARNING] 'dependencies.dependency.exclusions.exclusion.artifactId' for org.apache.gora:gora-sql:jar is missing. @ line 292, column 22
[WARNING] 'dependencies.dependency.exclusions.exclusion.artifactId' for org.apache.gora:gora-sql:jar is missing. @ line 295, column 22
[WARNING] 'dependencies.dependency.exclusions.exclusion.artifactId' for org.apache.gora:gora-sql:jar is missing. @ line 298, column 22
[WARNING]
[WARNING] It is highly recommended to fix these problems because they threaten the stability of your build.
[WARNING]
[WARNING] For this reason, future Maven versions might no longer support building such malformed projects.
[WARNING]
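
Each of these warnings points at an <exclusion> element in pom.xml that has no <artifactId> child. To list the affected dependencies without reading the POM by hand, something like the following could be used. It is only a rough sketch: the class name ExclusionCheck and its methods are made up for illustration and are not part of Maven or Nutch, and it assumes the POM uses no namespace prefixes on its elements.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;

// Hypothetical helper for illustration; not part of Maven or Nutch.
class ExclusionCheck {

    /**
     * Returns one "groupId:artifactId" entry per <exclusion> that has no
     * <artifactId>, naming the dependency that contains the exclusion.
     */
    static List<String> findIncomplete(String pomXml) throws Exception {
        Element root = DocumentBuilderFactory.newInstance()
                .newDocumentBuilder()
                .parse(new ByteArrayInputStream(pomXml.getBytes(StandardCharsets.UTF_8)))
                .getDocumentElement();
        List<String> hits = new ArrayList<>();
        NodeList exclusions = root.getElementsByTagName("exclusion");
        for (int i = 0; i < exclusions.getLength(); i++) {
            Element exclusion = (Element) exclusions.item(i);
            if (childText(exclusion, "artifactId").isEmpty()) {
                // Expected nesting: dependency -> exclusions -> exclusion.
                Node dependency = exclusion.getParentNode().getParentNode();
                if (dependency instanceof Element) {
                    Element dep = (Element) dependency;
                    hits.add(childText(dep, "groupId") + ":" + childText(dep, "artifactId"));
                }
            }
        }
        return hits;
    }

    /** Text of the first direct child element with the given name, or "" if absent. */
    private static String childText(Element parent, String name) {
        for (Node n = parent.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n instanceof Element && name.equals(((Element) n).getTagName())) {
                return n.getTextContent().trim();
            }
        }
        return "";
    }

    public static void main(String[] args) throws Exception {
        // Usage: java ExclusionCheck path/to/pom.xml
        String pom = new String(Files.readAllBytes(Paths.get(args[0])), StandardCharsets.UTF_8);
        for (String hit : findIncomplete(pom)) {
            System.out.println(hit);
        }
    }
}
```

Run against the pom.xml passed to -DpomFile, this should print one line per incomplete exclusion, which can then be compared with the line numbers in the warnings above.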

Those warnings turn out to be important, since building the dummy plugin
below fails, reporting:
The POM for org.apache.gora:gora-sql:jar:0.1-incubating is missing, no
dependency information available


import java.util.logging.Level;
import java.util.logging.Logger;

import org.apache.hadoop.io.Text;
import org.apache.nutch.crawl.CrawlDatum;
import org.apache.nutch.crawl.Inlinks;
import org.apache.nutch.indexer.IndexingException;
import org.apache.nutch.indexer.IndexingFilter;
import org.apache.nutch.indexer.NutchDocument;
import org.apache.nutch.parse.Parse;

/**
 * Dummy indexing filter that only logs each call it receives.
 *
 * @author simpatico
 */
public class BPIntegrationTest implements IndexingFilter {

    @Override
    public NutchDocument filter(NutchDocument nd, Parse parse, Text text,
            CrawlDatum cd, Inlinks inlnks) throws IndexingException {
        // Log the call and return the document unchanged.
        Logger.getLogger(getClass().getName()).log(Level.SEVERE,
                "intercepted parsing of " + text);
        return nd;
    }
}

I have already gone through http://wiki.apache.org/nutch/WritingPluginExample.



-- 
Regards,
K. Gabriele

--- unchanged since 20/9/10 ---
P.S. If the subject contains "[LON]" or the addressee acknowledges the
receipt within 48 hours then I don't resend the email.
subject(this) ∈ L(LON*) ∨ ∃x. (x ∈ MyInbox ∧ Acknowledges(x, this) ∧ time(x)
< Now + 48h) ⇒ ¬resend(I, this).

If an email is sent by a sender that is not a trusted contact or the email
does not contain a valid code then the email is not received. A valid code
starts with a hyphen and ends with "X".
∀x. x ∈ MyInbox ⇒ from(x) ∈ MySafeSenderList ∨ (∃y. y ∈ subject(x) ∧ y ∈
L(-[a-z]+[0-9]X)).
