Hi - the snapshot repository is not enabled. See one of the ivy/*.xml files for how to enable it; I don't know how to do it off the top of my head. By the way, why would any23 depend on a snapshot? It doesn't make sense.

Markus
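
For reference, a minimal sketch of what declaring an extra snapshot repository in an Ivy settings file could look like. The resolver name "extra-snapshots", the root URL, and the chain name "default" are placeholders, not values taken from Nutch's own ivy/*.xml files. Note that the log quoted below shows a resolver named apache-snapshot was already tried, so the added repository would have to be one that really hosts commons-csv 1.0-SNAPSHOT-rev1148315.

```xml
<ivysettings>
  <resolvers>
    <!-- Hypothetical Maven-layout snapshot repository; name and root are placeholders. -->
    <ibiblio name="extra-snapshots"
             root="https://example.org/snapshots/"
             m2compatible="true"
             changingPattern=".*SNAPSHOT.*"
             checkmodified="true"/>

    <!-- Assumed chain name; in practice, add the ref to whichever chain the build already uses. -->
    <chain name="default">
      <resolver ref="extra-snapshots"/>
    </chain>
  </resolvers>
</ivysettings>
```

The ibiblio resolver with m2compatible="true" reads a Maven 2 style repository, and changingPattern tells Ivy to treat matching revisions as changing so they are re-checked instead of being served from the cache indefinitely.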
-----Original message-----
> From: Manish Verma <[email protected]>
> Sent: Tuesday 29th March 2016 22:59
> To: [email protected]
> Subject: Re: Extract Microdata
>
> Thanks Markus,
>
> I was trying to include any23 dependency to nutch 1.10 by adding <dependency
> org="org.apache.any23" name="apache-any23-core" rev="1.1" /> to ivy.xml.
> But when i build this I get below. Is there any other configuration change
> required to add some third party jar ?
>
> [ivy:resolve] :: problems summary ::
> [ivy:resolve] :::: WARNINGS
> [ivy:resolve] module not found: org.apache.commons#commons-csv;1.0-SNAPSHOT-rev1148315
> [ivy:resolve] ==== local: tried
> [ivy:resolve] /Users/manishverma/.ivy2/local/org.apache.commons/commons-csv/1.0-SNAPSHOT-rev1148315/ivys/ivy.xml
> [ivy:resolve] -- artifact org.apache.commons#commons-csv;1.0-SNAPSHOT-rev1148315!commons-csv.jar:
> [ivy:resolve] /Users/manishverma/.ivy2/local/org.apache.commons/commons-csv/1.0-SNAPSHOT-rev1148315/jars/commons-csv.jar
> [ivy:resolve] ==== maven2: tried
> [ivy:resolve] http://repo1.maven.org/maven2/org/apache/commons/commons-csv/1.0-SNAPSHOT-rev1148315/commons-csv-1.0-SNAPSHOT-rev1148315.pom
> [ivy:resolve] -- artifact org.apache.commons#commons-csv;1.0-SNAPSHOT-rev1148315!commons-csv.jar:
> [ivy:resolve] http://repo1.maven.org/maven2/org/apache/commons/commons-csv/1.0-SNAPSHOT-rev1148315/commons-csv-1.0-SNAPSHOT-rev1148315.jar
> [ivy:resolve] ==== apache-snapshot: tried
> [ivy:resolve] https://repository.apache.org/content/repositories/snapshots/org/apache/commons/commons-csv/1.0-SNAPSHOT-rev1148315/commons-csv-1.0-SNAPSHOT-rev1148315.pom
> [ivy:resolve] -- artifact org.apache.commons#commons-csv;1.0-SNAPSHOT-rev1148315!commons-csv.jar:
> [ivy:resolve] https://repository.apache.org/content/repositories/snapshots/org/apache/commons/commons-csv/1.0-SNAPSHOT-rev1148315/commons-csv-1.0-SNAPSHOT-rev1148315.jar
> [ivy:resolve] ==== sonatype: tried
> [ivy:resolve] http://oss.sonatype.org/content/repositories/releases/org/apache/commons/commons-csv/1.0-SNAPSHOT-rev1148315/commons-csv-1.0-SNAPSHOT-rev1148315.pom
> [ivy:resolve] -- artifact org.apache.commons#commons-csv;1.0-SNAPSHOT-rev1148315!commons-csv.jar:
> [ivy:resolve] http://oss.sonatype.org/content/repositories/releases/org/apache/commons/commons-csv/1.0-SNAPSHOT-rev1148315/commons-csv-1.0-SNAPSHOT-rev1148315.jar
> [ivy:resolve] ::::::::::::::::::::::::::::::::::::::::::::::
> [ivy:resolve] :: UNRESOLVED DEPENDENCIES ::
> [ivy:resolve] ::::::::::::::::::::::::::::::::::::::::::::::
> [ivy:resolve] :: org.apache.commons#commons-csv;1.0-SNAPSHOT-rev1148315: not found
> [ivy:resolve] ::::::::::::::::::::::::::::::::::::::::::::::
> [ivy:resolve]
> [ivy:resolve] :: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
>
>
> > On Mar 18, 2016, at 3:16 AM, Markus Jelsma <[email protected]> wrote:
> >
> > Hello! Nutch doesn't have a mechanism to extract microdata from HTML. But
> > there is a patch for Apache Tika that comes as a content handler, TIKA-980.
> > You can embed it into another content handler or use Tika's
> > TeeContentHandler in Nutch' parse-tika plugin. Downside is that you have to
> > transform the output data structure to a Writable in the plugin, otherwise
> > you cannot store it as metadata and run on Hadoop.
> >
> > https://issues.apache.org/jira/browse/TIKA-980
> >
> > Markus
> >
> >
> >
> > -----Original message-----
> >> From: Manish Verma <[email protected]>
> >> Sent: Thursday 17th March 2016 19:18
> >> To: [email protected]
> >> Subject: Extract Microdata
> >>
> >> Hi,
> >>
> >> I need to crawl on Urls and extract micro data and save to solr. Does
> >> Nutch support extraction of schema org micro data.
> >>
> >> Thanks
> >>
> >>
> >>
> >

