RE: nutch 1.x tutorial with solr 6.6.0

2017-07-12 Thread Yossi Tamari
Hi Pau, I think the tutorial is still not fully up to date. If you haven't already, you should update the solr.* properties in nutch-site.xml (and run `ant runtime` again to update the runtime). Then the command for the tutorial should be: bin/nutch index crawl/crawldb/ -linkdb crawl/linkdb/ -dir
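Yossi's advice is to set the solr.* properties in nutch-site.xml and rebuild with `ant runtime`. Below is a minimal Python sketch of that edit. It assumes the usual Hadoop configuration layout, where a `<configuration>` root holds `<property>` elements that each contain a `<name>` and a `<value>`. The property name `solr.server.url` and the URL are assumptions used for illustration. Check nutch-default.xml to see which solr.* names your Nutch version actually reads.

```python
import xml.etree.ElementTree as ET

def set_property(xml_text: str, name: str, value: str) -> str:
    """Set a property in a Hadoop-style nutch-site.xml, adding it if it is missing."""
    root = ET.fromstring(xml_text)  # expects a <configuration> root element
    for prop in root.findall("property"):
        if (prop.findtext("name") or "").strip() == name:
            val = prop.find("value")
            if val is None:  # tolerate a <property> that lacks a <value>
                val = ET.SubElement(prop, "value")
            val.text = value
            break
    else:  # no existing <property> with this name: append one
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")

# Example: point the indexer at a local Solr (property name and URL are assumptions)
site = ("<configuration><property><name>http.agent.name</name>"
        "<value>my-crawler</value></property></configuration>")
print(set_property(site, "solr.server.url", "http://localhost:8983/solr"))
```

The standard-library parser discards XML comments and the declaration. If you run this against a hand-maintained conf/nutch-site.xml, keep a copy of the original first. Run `ant runtime` afterwards so the change is copied into the runtime.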

Re: nutch 1.x tutorial with solr 6.6.0

2017-07-12 Thread Pau Paches
Hi Lewis et al., I have followed the new tutorial. In the step "Step-by-Step: Indexing into Apache Solr", the command bin/nutch index http://localhost:8983/solr crawl/crawldb/ -linkdb crawl/linkdb/ crawl/segments/20131108063838/ -filter -normalize -deleteGone should be run for each segment directory
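Pau notes that the indexing command has to be repeated for every segment directory. The Python sketch below builds one command per subdirectory of a segments folder. The argument order copies the command quoted above, and the crawldb/linkdb paths are the tutorial's defaults. The function name `index_commands` is hypothetical, and it only builds the commands without running them.

```python
from pathlib import Path

def index_commands(segments_dir: str,
                   solr_url: str = "http://localhost:8983/solr") -> list[list[str]]:
    """Build one `bin/nutch index` command per segment directory (not executed here)."""
    commands = []
    # Segment names are timestamps, so sorting them gives chronological order
    for segment in sorted(p for p in Path(segments_dir).iterdir() if p.is_dir()):
        commands.append([
            "bin/nutch", "index", solr_url,
            "crawl/crawldb/", "-linkdb", "crawl/linkdb/",
            str(segment),
            "-filter", "-normalize", "-deleteGone",
        ])
    return commands
```

To execute them, run each list from the Nutch runtime directory with `subprocess.run(cmd, check=True)`. The `check=True` argument stops the loop at the first segment that fails to index.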

ElasticSearch error

2017-07-12 Thread Srinivasa, Rashmi
Hello, What's a workaround for this path.home error? https://issues.apache.org/jira/browse/NUTCH-2385 I've tried passing -Des.path.home and -Dpath.home to nutch index. Did not help. I've tried creating my ElasticSearch index with the path.home settings. Did not help. I've tried setting

Re: nutch 1.x tutorial with solr 6.6.0

2017-07-12 Thread Pau Paches
Hi Lewis, I'm just trying the tutorial again. I'm on the third round, and it's taking much longer than the other two. What's this schema for? Does the version of Nutch that we run need this new schema to be compatible with Solr 6.6.0? Or can we use Nutch 1.13? thanks, pau On 7/12/17, lewis john

Re: Google Summer of Code Weekly Reports.

2017-07-12 Thread Edward Capriolo
Nice job and very diligent. On Wed, Jul 12, 2017 at 11:38 AM, Omkar Reddy wrote: > Hello all, > > Please find my updated weekly reports here[0]. Please feel free to provide > any suggestions. > > Thanks, > Omkar. > > [0] >

Google Summer of Code Weekly Reports.

2017-07-12 Thread Omkar Reddy
Hello all, Please find my updated weekly reports here[0]. Please feel free to provide any suggestions. Thanks, Omkar. [0] https://wiki.apache.org/nutch/GoogleSummerOfCode/GraphGeneratorTool/WeeklyReports

Re: nutch 1.x tutorial with solr 6.6.0

2017-07-12 Thread lewis john mcgibbney
Hi Folks, I just updated the tutorial below; if you find any discrepancies, please let me know. https://wiki.apache.org/nutch/NutchTutorial Also, I have made available a new schema.xml which is compatible with Solr 6.6.0 at https://issues.apache.org/jira/browse/NUTCH-2400 Please scope it out

nutch is not fetching all the pages

2017-07-12 Thread Srinivasa, Rashmi
Hello, I've been trying to get nutch to crawl all of my site (let's call it my_domain_name.com) for a while now, but it's not working. These are my settings: --- nutch-site.xml: db.ignore.external.links = true db.ignore.external.links.mode = byDomain db.max.outlinks.per.page = -1 http,
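Rashmi lists the nutch-site.xml settings in a flattened `name = value` form. In the actual file, each setting is a `<property>` element. The Python sketch below renders the three complete settings quoted above into that layout. The truncated `http,` entry is left out rather than guessed. The function name `to_nutch_site` is hypothetical, and `ET.indent` requires Python 3.9 or later.

```python
import xml.etree.ElementTree as ET

# The complete settings from the message above, in their flattened form
SETTINGS = """
db.ignore.external.links = true
db.ignore.external.links.mode = byDomain
db.max.outlinks.per.page = -1
"""

def to_nutch_site(settings: str) -> str:
    """Turn 'name = value' lines into a Hadoop-style nutch-site.xml document."""
    root = ET.Element("configuration")
    for line in settings.splitlines():
        line = line.strip()
        if not line or "=" not in line:
            continue  # skip blank or malformed lines
        name, value = (part.strip() for part in line.split("=", 1))
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    ET.indent(root)  # pretty-print with two-space indentation
    return '<?xml version="1.0"?>\n' + ET.tostring(root, encoding="unicode") + "\n"

print(to_nutch_site(SETTINGS))
```

The script splits each line only on the first `=`, so a value that contains `=` (such as a URL query string) stays whole.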