All,

I realize that the documentation says you crawl first and then add to Solr,
but I previously spent several hours running the same command through Cygwin
with -solrindex http://localhost:8983/solr on the command line (e.g.
bin/nutch crawl urls -dir crawl -threads 10 -depth 100 -topN 50 -solrindex
http://localhost:8983/solr), and it worked. Does anyone know why it's not
working for me anymore? I am using the Lucid build of Solr, which is what I
was using before. I neglected to write down the command-line syntax, which is
biting me in the arse. Any tips on this one would be great!

Thanks,
Adam
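For reference, the crawl-first, index-second order that the documentation and Anurag's reply describe can be sketched as a small shell script, run from the Nutch directory in the same Cygwin shell. This is a sketch under assumptions, not a confirmed recipe: the `solrindex` argument order (`<solr url> <crawldb> <linkdb> <segment> ...`) is taken from the Nutch 1.x tutorial rather than from this thread, so check the usage message that `bin/nutch solrindex` prints for your version. The helper names `run`, `crawl_step` and `solrindex_step` and the variables `NUTCH`, `SOLR_URL`, `CRAWL_DIR` and `DRY_RUN` are hypothetical. By default it only prints the two commands; set DRY_RUN=0 to execute them.

```shell
#!/bin/sh
# Sketch: crawl first, then send the results to Solr as a separate step.
# Assumed defaults (override via environment): run from the Nutch directory,
# seed URLs in ./urls, Solr listening at http://localhost:8983/solr.
NUTCH="${NUTCH:-bin/nutch}"
SOLR_URL="${SOLR_URL:-http://localhost:8983/solr}"
CRAWL_DIR="${CRAWL_DIR:-crawl}"

# Print the command when DRY_RUN is 1 (the default); otherwise execute it.
run() {
  if [ "${DRY_RUN:-1}" = 1 ]; then
    echo "$@"
  else
    "$@"
  fi
}

# Step 1: crawl only. No Solr URL is passed to "crawl".
crawl_step() {
  run "$NUTCH" crawl urls -dir "$CRAWL_DIR" -threads 10 -depth 100 -topN 50
}

# Step 2: index the finished crawl into Solr.
# Argument order assumed from the Nutch 1.x tutorial:
#   solrindex <solr url> <crawldb> <linkdb> <segment> ...
# The segments glob is expanded when this runs, i.e. after step 1 created them.
solrindex_step() {
  run "$NUTCH" solrindex "$SOLR_URL" "$CRAWL_DIR/crawldb" "$CRAWL_DIR/linkdb" "$CRAWL_DIR"/segments/*
}

# Only index if the crawl succeeded.
crawl_step && solrindex_step
```

Keeping the Solr URL out of the `crawl` command is the point of the split: in the quoted crawl log, `rootUrlDir = http://localhost:8983/solr` and `Injector: urlDir: http://localhost:8983/solr` show the Solr URL being used as the seed URL directory, which appears to be what leads Hadoop to throw `No FileSystem for scheme: http`.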

On Mon, Dec 20, 2010 at 4:21 PM, Anurag <anurag.it.jo...@gmail.com> wrote:

>
> Why are you using solrindex in the argument? It is used when we need to
> index the crawled data in Solr.
> For more, read http://wiki.apache.org/nutch/NutchTutorial .
>
> Also, for Nutch-Solr integration, this blog post is very useful:
> http://www.lucidimagination.com/blog/2009/03/09/nutch-solr/
> I integrated Nutch and Solr and it works well.
>
> Thanks
>
> On Tue, Dec 21, 2010 at 1:57 AM, Adam Estrada-2 [via Lucene]
> <ml-node+2122347-622655030-146...@n3.nabble.com> wrote:
>
> > All,
> >
> > I have a couple of websites that I need to crawl, and the following
> > command line used to work, I think. Solr is up and running and everything
> > is fine there, and I can go through and index the site, but I really need
> > the results added to Solr after the crawl. Does anyone have any idea how
> > to make that happen or what I'm doing wrong? These errors are being thrown
> > from Hadoop, which I am not using at all.
> >
> > $ bin/nutch crawl urls -dir crawl -threads 10 -depth 100 -topN 50 -solrindex http://localhost:8983/solr
> > crawl started in: crawl
> > rootUrlDir = http://localhost:8983/solr
> > threads = 10
> > depth = 100
> > indexer=lucene
> > topN = 50
> > Injector: starting at 2010-12-20 15:23:25
> > Injector: crawlDb: crawl/crawldb
> > Injector: urlDir: http://localhost:8983/solr
> > Injector: Converting injected urls to crawl db entries.
> > Exception in thread "main" java.io.IOException: No FileSystem for scheme: http
> >         at org.apache.hadoop.fs.FileSystem.createFileSystem(FileSystem.java:1375)
> >         at org.apache.hadoop.fs.FileSystem.access$200(FileSystem.java:66)
> >         at org.apache.hadoop.fs.FileSystem$Cache.get(FileSystem.java:1390)
> >         at org.apache.hadoop.fs.FileSystem.get(FileSystem.java:196)
> >         at org.apache.hadoop.fs.Path.getFileSystem(Path.java:175)
> >         at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:169)
> >         at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:201)
> >         at org.apache.hadoop.mapred.JobClient.writeOldSplits(JobClient.java:810)
> >         at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:781)
> >         at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:730)
> >         at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:1249)
> >         at org.apache.nutch.crawl.Injector.inject(Injector.java:217)
> >         at org.apache.nutch.crawl.Crawl.main(Crawl.java:124)
> >
> >
>
>
>
> --
> Kumar Anurag
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Nutch-and-Solr-integration-tp2122347p2122623.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>
