I tried with SOLR 4.9.1. I copied /release-2.3.1/runtime/local/conf/schema.xml to solr-4.9.1/example/solr/collection1/conf/schema.xml
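That copy step can be sketched as a script. The two directories below are temp-dir stand-ins (so the sketch runs anywhere) for the real locations in this thread — /release-2.3.1/runtime/local/conf and solr-4.9.1/example/solr/collection1/conf; substitute the real paths and restart Solr afterwards:

```shell
# Runnable sketch of the schema hand-off, using temp-dir stand-ins for the
# Nutch conf dir and the Solr collection1 conf dir named in this thread.
NUTCH_CONF=$(mktemp -d)
SOLR_CONF=$(mktemp -d)
printf '<schema name="nutch"/>\n' > "$NUTCH_CONF/schema.xml"  # stand-in for Nutch's schema.xml
cp "$NUTCH_CONF/schema.xml" "$SOLR_CONF/schema.xml"
ls "$SOLR_CONF"                                               # -> schema.xml
```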
Result of /release-2.3.1/runtime/local/bin/crawl urls method_centers http://localhost:8983/solr 2:

Injecting seed URLs
/Users/sdrulea/svn/release-2.3.1/runtime/local/bin/nutch inject urls -crawlId method_centers
InjectorJob: starting at 2015-09-29 12:54:46
InjectorJob: Injecting urlDir: urls
InjectorJob: Using class org.apache.gora.mongodb.store.MongoStore as the Gora storage class.
InjectorJob: total number of urls rejected by filters: 1
InjectorJob: total number of urls injected after normalization and filtering: 5
Injector: finished at 2015-09-29 12:54:49, elapsed: 00:00:02
Tue Sep 29 12:54:49 PDT 2015 : Iteration 1 of 2
Generating batchId
Generating a new fetchlist
/Users/sdrulea/svn/release-2.3.1/runtime/local/bin/nutch generate -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true -topN 50000 -noNorm -noFilter -adddays 0 -crawlId method_centers -batchId 1443556489-5775
GeneratorJob: starting at 2015-09-29 12:54:50
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: false
GeneratorJob: normalizing: false
GeneratorJob: topN: 50000
GeneratorJob: finished at 2015-09-29 12:54:52, time elapsed: 00:00:02
GeneratorJob: generated batch id: 1443556490-521927141 containing 5 URLs
Fetching :
/Users/sdrulea/svn/release-2.3.1/runtime/local/bin/nutch fetch -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true -D fetcher.timelimit.mins=180 1443556489-5775 -crawlId method_centers -threads 50
FetcherJob: starting at 2015-09-29 12:54:53
FetcherJob: batchId: 1443556489-5775
FetcherJob: threads: 50
FetcherJob: parsing: false
FetcherJob: resuming: false
FetcherJob : timelimit set for : 1443567293080
Using queue mode : byHost
Fetcher: threads: 50
QueueFeeder finished: total 0 records. Hit by time limit :0
-finishing thread FetcherThread0, activeThreads=0
[...snip: FetcherThread1 through FetcherThread49 finish the same way, activeThreads=0...]
Fetcher: throughput threshold: -1
Fetcher: throughput threshold sequence: 5
0/0 spinwaiting/active, 0 pages, 0 errors, 0.0 0 pages/s, 0 0 kb/s, 0 URLs in 0 queues
-activeThreads=0
Using queue mode : byHost
Fetcher: threads: 50
QueueFeeder finished: total 0 records. Hit by time limit :0
-finishing thread FetcherThread0, activeThreads=0
[...snip: FetcherThread1 through FetcherThread49 finish the same way, activeThreads=0...]
Fetcher: throughput threshold: -1
Fetcher: throughput threshold sequence: 5
0/0 spinwaiting/active, 0 pages, 0 errors, 0.0 0 pages/s, 0 0 kb/s, 0 URLs in 0 queues
-activeThreads=0
FetcherJob: finished at 2015-09-29 12:55:05, time elapsed: 00:00:12
Parsing :
/Users/sdrulea/svn/release-2.3.1/runtime/local/bin/nutch parse -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true -D mapred.skip.attempts.to.start.skipping=2 -D mapred.skip.map.max.skip.records=1 1443556489-5775 -crawlId method_centers
ParserJob: starting at 2015-09-29 12:55:06
ParserJob: resuming: false
ParserJob: forced reparse: false
ParserJob: batchId: 1443556489-5775
ParserJob: success
ParserJob: finished at 2015-09-29 12:55:08, time elapsed: 00:00:02
CrawlDB update for method_centers
/Users/sdrulea/svn/release-2.3.1/runtime/local/bin/nutch updatedb -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true 1443556489-5775 -crawlId method_centers
DbUpdaterJob: starting at 2015-09-29 12:55:09
DbUpdaterJob: batchId: 1443556489-5775
DbUpdaterJob: finished at 2015-09-29 12:55:11, time elapsed: 00:00:02
Indexing method_centers on SOLR index -> http://localhost:8983/solr
/Users/sdrulea/svn/release-2.3.1/runtime/local/bin/nutch index -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true -D solr.server.url=http://localhost:8983/solr -all -crawlId method_centers
IndexingJob: starting
Active IndexWriters :
SOLRIndexWriter
	solr.server.url : URL of the SOLR instance (mandatory)
	solr.commit.size : buffer size when sending to SOLR (default 1000)
	solr.mapping.file : name of the mapping file for fields (default solrindex-mapping.xml)
	solr.auth : use authentication (default false)
	solr.auth.username : username for authentication
	solr.auth.password : password for authentication
IndexingJob: done.
SOLR dedup -> http://localhost:8983/solr
/Users/sdrulea/svn/release-2.3.1/runtime/local/bin/nutch solrdedup -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true http://localhost:8983/solr
Tue Sep 29 12:55:17 PDT 2015 : Iteration 2 of 2
Generating batchId
Generating a new fetchlist
/Users/sdrulea/svn/release-2.3.1/runtime/local/bin/nutch generate -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true -topN 50000 -noNorm -noFilter -adddays 0 -crawlId method_centers -batchId 1443556517-13841
GeneratorJob: starting at 2015-09-29 12:55:18
GeneratorJob: Selecting best-scoring urls due for fetch.
GeneratorJob: starting
GeneratorJob: filtering: false
GeneratorJob: normalizing: false
GeneratorJob: topN: 50000
GeneratorJob: finished at 2015-09-29 12:55:20, time elapsed: 00:00:02
GeneratorJob: generated batch id: 1443556518-1067112789 containing 0 URLs
Generate returned 1 (no new segments created)
Escaping loop: no more URLs to fetch now

There are 6 URLs in my urls/seeds.txt file. Why does it say 0 URLs? The index job worked, but there's no data in SOLR. Is there a known good version of SOLR that works with the 2.3.1 schema.xml? Are the tutorial instructions still valid?

On 9/28/15, 8:53 PM, "Drulea, Sherban" <[email protected]> wrote:

>Hi Lewis,
>
>I made further progress. Following the instructions on
>https://wiki.apache.org/nutch/NutchTutorial, I tried to copy the nutch
>schema.xml to SOLR.
>
>However, the nutch tutorial is out of date for SOLR 5.1.0. It references
>different directory structures. Furthermore, the 2.3.1 SOLR schema.xml
>doesn't appear to work.
>
>I did the following:
>
>1.) Created a "nutch" folder in /usr/local/Cellar/solr/5.1.0/server/solr
>2.) Created a "conf" folder in the "nutch" folder.
>3.) Copied /usr/local/Cellar/solr/5.1.0/server/solr/configsets/basic_configs/conf/solrconfig.xml to /usr/local/Cellar/solr/5.1.0/server/solr/nutch/conf/solrconfig.xml
>4.) Copied ~/svn/release-2.3.1/runtime/local/conf/schema.xml to /usr/local/Cellar/solr/5.1.0/server/solr/nutch/conf/schema.xml
>5.) Went into the SOLR admin UI and added a new core called "nutch".
>6.) Got the following error (screenshot attached):
>Error CREATEing SolrCore 'nutch': Unable to create core [nutch] Caused by:
>enablePositionIncrements is not a valid option as of Lucene 5.0
>
>7.) Deleted all enablePositionIncrements="true" entries in /usr/local/Cellar/solr/5.1.0/server/solr/nutch/conf/schema.xml
>8.) Tried creating the nutch core again (same step as #6).
>9.)
 Now I get this error (screenshot attached):
>Error CREATEing SolrCore 'nutch': Unable to create core [nutch] Caused by:
>copyField source :'rawcontent' is not a glob and doesn't match any
>explicit field or dynamicField.
>
>The schema.xml in 2.3.1 seems incompatible with SOLR 5.1.0. Can
>someone please publish a working schema.xml and document how to load it
>into SOLR 5.1.0?
>
>Cheers,
>Sherban
>
>--
>Sherban Drulea, RAND Corporation
>Senior Research Software Engineer, Information Services
>m5129 x7384 [email protected]
>
>On 9/28/15, 6:38 PM, "Drulea, Sherban" <[email protected]> wrote:
>
>>Hi Lewis,
>>
>>I made progress. I downloaded and installed the release candidate from
>>https://svn.apache.org/repos/asf/nutch/tags/release-2.3.1
>>
>>I ran the "crawl" executable with a Mongo backend.
>>
>>My gora.properties:
>>-------------------------------------------------------------------
>>gora.datastore.default=org.apache.gora.mongodb.store.MongoStore
>>gora.mongodb.override_hadoop_configuration=false
>>gora.mongodb.mapping.file=/gora-mongodb-mapping.xml
>>gora.mongodb.servers=localhost:27017
>>gora.mongodb.db=method_centers
>>-------------------------------------------------------------------
>>
>>My nutch-site.xml:
>>-------------------------------------------------------------------
>><?xml version="1.0"?>
>><?xml-stylesheet type="text/xsl" href="configuration.xsl"?>
>>
>><!-- Put site-specific property overrides in this file.
>>-->
>><configuration>
>>  <property>
>>    <name>http.agent.name</name>
>>    <value>nutch Mongo Solr Crawler</value>
>>  </property>
>>
>>  <property>
>>    <name>storage.data.store.class</name>
>>    <value>org.apache.gora.mongodb.store.MongoStore</value>
>>    <description>Default class for storing data</description>
>>  </property>
>>
>>  <property>
>>    <name>plugin.includes</name>
>>    <value>protocol-httpclient|urlfilter-regex|index-(basic|more)|query-(basic|site|url|lang)|indexer-solr|nutch-extensionpoints|protocol-httpclient|urlfilter-regex|parse-(text|html|msexcel|msword|mspowerpoint|pdf)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)|protocol-http|urlfilter-regex|parse-(html|tika|metatags)|index-(basic|anchor|more|metadata)</value>
>>  </property>
>></configuration>
>>-------------------------------------------------------------------
>>
>>I run with this command:
>>./bin/crawl urls method_centers http://localhost:8983/solr 2
>>
>>Nutch successfully injects into the Mongo backend but fails on the SOLR
>>indexing.
Here's the execution trace where nutch errors out on the SOLR
>>indexing task ...
>>
>>FetcherJob: finished at 2015-09-28 18:27:57, time elapsed: 00:00:12
>>Parsing :
>>/Users/sdrulea/svn/release-2.3.1/runtime/local/bin/nutch parse -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true -D mapred.skip.attempts.to.start.skipping=2 -D mapred.skip.map.max.skip.records=1 1443490061-8003 -crawlId method_centers
>>ParserJob: starting at 2015-09-28 18:27:58
>>ParserJob: resuming: false
>>ParserJob: forced reparse: false
>>ParserJob: batchId: 1443490061-8003
>>ParserJob: success
>>ParserJob: finished at 2015-09-28 18:28:00, time elapsed: 00:00:02
>>CrawlDB update for method_centers
>>/Users/sdrulea/svn/release-2.3.1/runtime/local/bin/nutch updatedb -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true 1443490061-8003 -crawlId method_centers
>>DbUpdaterJob: starting at 2015-09-28 18:28:01
>>DbUpdaterJob: batchId: 1443490061-8003
>>DbUpdaterJob: finished at 2015-09-28 18:28:03, time elapsed: 00:00:02
>>Indexing method_centers on SOLR index -> http://localhost:8983/solr
>>/Users/sdrulea/svn/release-2.3.1/runtime/local/bin/nutch index -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true -D solr.server.url=http://localhost:8983/solr -all -crawlId method_centers
>>IndexingJob: starting
>>Active IndexWriters :
>>SOLRIndexWriter
>>  solr.server.url : URL of the SOLR instance (mandatory)
>>  solr.commit.size : buffer size when sending to SOLR (default 1000)
>>  solr.mapping.file : name of the mapping file for fields (default
>>solrindex-mapping.xml)
>>  solr.auth : use authentication (default false)
>>  solr.auth.username : username for authentication
>>  solr.auth.password : password for authentication
>>
>>SolrIndexerJob:
>>org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
>>Expected content type application/octet-stream but got
>>text/html;charset=ISO-8859-1. <html>
>><head>
>><meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"/>
>><title>Error 404 Not Found</title>
>></head>
>><body><h2>HTTP ERROR 404</h2>
>><p>Problem accessing /solr/update. Reason:
>><pre>    Not Found</pre></p><hr /><i><small>Powered by Jetty://</small></i><br/>
>></body>
>></html>
>>
>>  at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:455)
>>  at org.apache.solr.client.solrj.impl.HttpSolrServer.request(HttpSolrServer.java:197)
>>  at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:117)
>>  at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:168)
>>  at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:146)
>>  at org.apache.nutch.indexwriter.solr.SolrIndexWriter.commit(SolrIndexWriter.java:146)
>>  at org.apache.nutch.indexer.IndexWriters.commit(IndexWriters.java:124)
>>  at org.apache.nutch.indexer.IndexingJob.index(IndexingJob.java:186)
>>  at org.apache.nutch.indexer.IndexingJob.run(IndexingJob.java:202)
>>  at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
>>  at org.apache.nutch.indexer.IndexingJob.main(IndexingJob.java:211)
>>
>>Error running:
>>  /Users/sdrulea/svn/release-2.3.1/runtime/local/bin/nutch index -D mapred.reduce.tasks=2 -D mapred.child.java.opts=-Xmx1000m -D mapred.reduce.tasks.speculative.execution=false -D mapred.map.tasks.speculative.execution=false -D mapred.compress.map.output=true -D solr.server.url=http://localhost:8983/solr -all -crawlId method_centers
>>Failed with exit value 255.
>>
>>I verified my SOLR is up and running. The SOLR web GUI says solr-spec
>>5.1.0. Do I have to configure SOLR for nutch indexing? If so, are there
>>instructions for configuring SOLR for nutch?
>>
>>An unrelated question ...
>>How does nutch crawl every link in the pages listed in the seeds.txt file?
>>Is there a difference between a URL directory entry and a specific page URL?
>>For example, say http://foo.com/index.html contains 100 links. Will nutch
>>crawl these 2 seeds.txt entries the same way (i.e. crawl each of the 100
>>links)?
>>http://foo.com/index.html
>>http://foo.com
>>
>>Thanks again for your help. I'll give a +1 vote for the 2.3.1 candidate
>>once SOLR indexing works ;).
>>
>>Cheers,
>>Sherban
>>
>>On 9/28/15, 11:55 AM, "Drulea, Sherban" <[email protected]> wrote:
>>
>>>Hi Lewis,
>>>
>>>Thanks for your reply. You're right, there's no homebrew recipe for
>>>Nutch. I use the official nutch 2.3 OS X release download from the Apache
>>>website. I run nutch from /runtime/local/bin. The homebrew packages are
>>>other dependent software (mongo, cassandra, hbase, etc.).
>>>
>>>All the problems I described are with the nutch 2.3 download, not
>>>homebrew packages.
>>>
>>>Where do I download nutch 2.3.1?
Should I just pull the latest from
>>>http://svn.apache.org/viewvc/nutch/trunk/ ?
>>>
>>>Cheers,
>>>Sherban
>>>
>>>On 9/27/15, 9:57 AM, "Lewis John Mcgibbney" <[email protected]>
>>>wrote:
>>>
>>>>Hi Drulea,
>>>>
>>>>On Sun, Sep 27, 2015 at 7:36 AM, <[email protected]> wrote:
>>>>
>>>>> I'm using nutch 2.3 on OS X 10.9.5 with homebrew.
>>>>
>>>>From the start I would like to point you at the current release
>>>>candidate for Nutch 2.3.1. The VOTE is currently open and the release
>>>>candidate is being tested by the community. There are a number of bugs
>>>>fixed down in Gora (particularly within the gora-mongodb module) which
>>>>Nutch 2.3.1 will benefit from.
>>>>It can be obtained from here:
>>>>http://www.mail-archive.com/dev%40nutch.apache.org/msg19271.html
>>>>
>>>>Another thing here is that, AFAIK, we are not publishing Homebrew
>>>>recipes! Wherever you got your recipe from, I can guarantee you that it
>>>>is not an official Nutch one! I do however see two:
>>>>
>>>>lmcgibbn@LMC-032857 /usr/local(joshua) $ brew search nutch
>>>>No formula found for "nutch".
>>>>==> Searching pull requests...
>>>>Closed pull requests:
>>>>Added formula for Apache Nutch (https://github.com/Homebrew/homebrew/pull/26587)
>>>>Added Apache Nutch 2.2.1 (https://github.com/Homebrew/homebrew/pull/22004)
>>>>
>>>>None of these are from the release managers at Nutch... maybe this is
>>>>something we should look into.
>>>>
>>>>> I've been unable to use the crawl command with MySQL, Mongo, or
>>>>> Cassandra. The inject step fails in each configuration with the
>>>>> following arcane errors:
>>>>>
>>>>> 1.) MySQL (after downgrading to gora-core 0.2.1 in ivy.xml as per
>>>>> comments)
>>>>
>>>>The MySQL backend for Gora is broken by now. Things have changed and
>>>>moved on, with the SQL module being left in the dust.
Avro has also moved on
>>>>significantly and we now utilize a MUCH newer version of Avro, so your
>>>>NoSuchMethodError below is entirely understandable.
>>>>
>>>>> InjectorJob: Injecting urlDir: urls
>>>>
>>>>[...snip]
>>>>
>>>>> 2.) Mongo with default gora 0.5
>>>>>
>>>>> InjectorJob: Injecting urlDir: urls
>>>>> InjectorJob: org.apache.gora.util.GoraException:
>>>>> java.lang.NullPointerException
>>>>
>>>>[...snip]
>>>>
>>>>This is gone in the Nutch 2.3.1 release candidate.
>>>>
>>>>> 3.) Mongo (upgrading to gora 0.6.1 to resolve the previous issue above)
>>>>>
>>>>> InjectorJob: Injecting urlDir: urls
>>>>> InjectorJob: java.lang.UnsupportedOperationException: Not implemented
>>>>> by the DistributedFileSystem FileSystem implementation
>>>>
>>>>[...snip]
>>>>
>>>>Can you please try with the 2.3.1 release candidate and provide the
>>>>same feedback?
>>>>
>>>>> 4.) Cassandra using default gora 0.5
>>>>>
>>>>> InjectorJob: Injecting urlDir: urls
>>>>> Exception in thread "main" java.lang.NoSuchMethodError:
>>>>> org.apache.avro.Schema.access$1400()Ljava/lang/ThreadLocal;
>>>>
>>>>[...snip]
>>>>
>>>>I've never seen this before. On another note, Renato and I are
>>>>currently overhauling the gora-cassandra driver from Hector --> Datastax
>>>>Java Driver. Work is ongoing here:
>>>>https://github.com/renato2099/gora/tree/gora-datastax-cassandra
>>>>
>>>>> Does the "crawl" script inject task work with any backend storage
>>>>> reliably on OS X?
>>>>
>>>>Well, we can better answer that question if and when you and more
>>>>people try out the 2.3.1 release candidate.
>>>>
>>>>> Which backend is the most reliable to use with nutch 2.3?
>>>>
>>>>HBase 0.94.14
>>>>
>>>>> It's frustrating that 3 common (and supposedly supported) backends
>>>>> don't work with nutch due to arcane errors.
>>>>
>>>>I agree. But let's not throw the baby out with the bath water here. How
>>>>about you try out the above and respond, and we can take it from there?
>>>>It would be great to have more developers submitting patches for the
>>>>2.X branch. If you are keen, then it would be great to have you on board.
>>>>Thanks,
>>>>Lewis
>>
>>__________________________________________________________________________
>>
>>This email message is for the sole use of the intended recipient(s) and
>>may contain confidential information. Any unauthorized review, use,
>>disclosure or distribution is prohibited. If you are not the intended
>>recipient, please contact the sender by reply email and destroy all
>>copies of the original message.
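For readers hitting the copyField error above: the stock 2.3.1 schema.xml names 'rawcontent' as a copyField source without declaring a matching field, which Solr 5.x rejects at core creation. An untested sketch of one workaround is to declare the missing source field yourself; the field type shown is an assumption, not taken from the Nutch schema, and simply deleting the offending copyField rule should also unblock core creation:

```xml
<!-- Hypothetical addition to schema.xml, placed alongside the other <field>
     declarations. "string" is an assumed type; adjust to your content. -->
<field name="rawcontent" type="string" indexed="false" stored="true"/>
```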


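One more note on the 404 from /solr/update seen earlier in the thread: Solr 5 routes update requests per core, so the bare server root has no /update handler once you are running named cores. A sketch of the likely fix, appending the core name to solr.server.url (the "nutch" core name is an assumption carried over from the core-creation steps above):

```shell
# Build the per-core endpoint instead of the bare server root;
# /solr/update returns 404 while /solr/<core>/update is served.
SOLR_SERVER="http://localhost:8983/solr"
CORE="nutch"                          # assumed core name from this thread
CORE_URL="$SOLR_SERVER/$CORE"
echo "$CORE_URL"                      # -> http://localhost:8983/solr/nutch
# The indexing step would then be invoked as (not run here):
#   bin/nutch index -D solr.server.url="$CORE_URL" -all -crawlId method_centers
```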