RE: nutch 1.x tutorial with solr 6.6.0

2017-07-11 Thread Srinivasa, Rashmi
Hi Pau,

I have not used the solrindex command, but from the "input path" error message, it sounds like it wants the actual segment directory under segments/.

The nutch crawl script uses the following commands:
* inject
* generate
* fetch
* parse
* updatedb
* invertlinks
* dedup
* index
* clean
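
For illustration, one way to hand the indexer an actual segment directory instead of the segments/ root is sketched here. It assumes the crawl/ layout used in this thread and timestamp-named segment directories; `latest_segment` is a hypothetical helper, not part of Nutch.

```shell
#!/bin/sh
# Hypothetical helper (not a Nutch command): print the newest segment
# directory under a segments/ root.
# Assumption: segment names are timestamps such as 20170711204556,
# so sorting by name is the same as sorting by time.
latest_segment() {
    segments_root=$1
    # Emit each subdirectory without its trailing slash, then keep the last by name.
    for d in "$segments_root"/*/; do
        [ -d "$d" ] && printf '%s\n' "${d%/}"
    done | sort | tail -n 1
}
```

The result could then be passed as the last argument in place of crawl/segments, for example `seg=$(latest_segment crawl/segments)` followed by the same solrindex command with `"$seg"` at the end.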

Re: nutch 1.x tutorial with solr 6.6.0

2017-07-11 Thread Pau Paches
Hi Rashmi,

I have followed your suggestions. Now I'm seeing a different error.

bin/nutch solrindex http://127.0.0.1:8983/solr crawl/crawld -linkdb crawl/linkdb crawl/segments
The input path at segments is not a segment... skipping
Indexer: starting at 2017-07-11 20:45:56
Indexer: deleting gone

RE: nutch 1.x tutorial with solr 6.6.0

2017-07-11 Thread Srinivasa, Rashmi
Hi Pau, Yes, it took me a while to get things working because the tutorial is not complete or up to date. In conf/nutch-site.xml, the value for plugin.includes uses indexer-elastic by default. If you want to use SOLR, you'll have to change it to indexer-solr. I haven't tried SOLR 6.6, but
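
A quick way to see which indexer plugin a given configuration selects is sketched here. It assumes the usual Hadoop-style property layout, with the `<value>` element within a few lines of the `<name>` element; `uses_solr_indexer` is a hypothetical helper name, and `grep -A` is a GNU/BSD extension rather than strict POSIX.

```shell
#!/bin/sh
# Hypothetical check: succeed if the plugin.includes property in the given
# Nutch config file mentions indexer-solr.
# Assumption: <value> appears within 3 lines after <name>plugin.includes</name>.
uses_solr_indexer() {
    conf_file=$1
    grep -A 3 '<name>plugin.includes</name>' "$conf_file" \
        | grep 'indexer-solr' > /dev/null
}
```

For example, `uses_solr_indexer conf/nutch-site.xml && echo "indexer-solr is enabled"`.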

Re: nutch 1.x tutorial with solr 6.6.0

2017-07-11 Thread Pau Paches
Hi Yossi and BlackIce,

many thanks for your tips. However, a tutorial needs to be self-contained, or at least link to the documentation/tutorial on how to configure the parts it uses.

On Tue, Jul 11, 2017 at 1:39 PM BlackIce wrote:
> I think by default the newer SOLR

Re: nutch 1.x tutorial with solr 6.6.0

2017-07-11 Thread BlackIce
I think by default the newer SOLR starts in "schemaless" mode. One needs to create a config directory with ALL necessary configuration files, like the schema and solrconfig.xml, BEFORE creating the collection, and then run a command to create the collection using this conf directory. I don't have access to
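
The "prepare the config directory first" step could look roughly like the sketch below. All three paths are assumptions supplied by the caller, `prepare_nutch_conf` is a hypothetical helper, and removing `managed-schema` so that the copied schema.xml is picked up is an assumption about the Solr setup rather than something stated in this thread.

```shell
#!/bin/sh
# Hypothetical helper: build a conf directory for a Nutch collection
# before the collection is created.
#   $1 = conf/ directory of an existing base configset (assumed to exist)
#   $2 = schema.xml to use, e.g. the one shipped with Nutch (assumed path)
#   $3 = target conf/ directory to create
prepare_nutch_conf() {
    base_conf=$1
    nutch_schema=$2
    target_conf=$3
    mkdir -p "$target_conf" &&
    cp -r "$base_conf"/. "$target_conf"/ &&      # copy solrconfig.xml and friends
    cp "$nutch_schema" "$target_conf/schema.xml" &&
    rm -f "$target_conf/managed-schema"          # assumption: let schema.xml take effect
}
```

With the directory in place, the collection would then be created against it, for instance with `bin/solr create -c nutch -d <target conf directory>`, where the collection name `nutch` is only an example.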

RE: nutch 1.x tutorial with solr 6.6.0

2017-07-11 Thread Yossi Tamari
I struggled with this as well. Eventually I moved to ElasticSearch, which is much easier. What I did manage to find out is that in newer versions of SOLR you need to use ZooKeeper to update the conf file. See https://stackoverflow.com/a/43351358.

-----Original Message-----
From: Pau Paches

Re: nutch 1.x tutorial with solr 6.6.0

2017-07-11 Thread Pau Paches
Hi,

I am only crawling a single URL, so I am not doing whole-web crawling. So I follow option 2, and fetching and invertlinks complete successfully. This is just Nutch 1.x. Then I move on to indexing into Apache Solr, so I go to the section "Setup Solr for search". First thing that does not work:

cd ${APACHE_SOLR_HOME}/example
java -jar start.jar

No
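
For context on the start command quoted above: Solr 5 and later are started with the `bin/solr` script rather than `example/start.jar`. A minimal sketch of choosing between the two is shown here; `solr_start_cmd` is a hypothetical helper that only prints the command it would suggest and does not start anything.

```shell
#!/bin/sh
# Hypothetical helper: print a plausible start command for the Solr
# installation at the given directory, based on which files exist there.
solr_start_cmd() {
    solr_home=$1
    if [ -x "$solr_home/bin/solr" ]; then
        # Layout of Solr 5 and later: use the bin/solr control script.
        echo "$solr_home/bin/solr start"
    elif [ -f "$solr_home/example/start.jar" ]; then
        # Older layout, as assumed by the tutorial step quoted above.
        echo "cd $solr_home/example && java -jar start.jar"
    else
        echo "no known Solr start method found under $solr_home" >&2
        return 1
    fi
}
```

For example, `solr_start_cmd "${APACHE_SOLR_HOME}"` prints the command to try for that installation.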