Hi Pau,
I think the tutorial is still not fully up-to-date:
If you haven't already, you should update the solr.* properties in nutch-site.xml (and
run `ant runtime` again to rebuild the runtime).
Then the command for the tutorial should be:
bin/nutch index crawl/crawldb/ -linkdb crawl/linkdb/ -dir
Hi Lewis et al.,
I have followed the new tutorial.
In the step "Step-by-Step: Indexing into Apache Solr",
the command
bin/nutch index http://localhost:8983/solr crawl/crawldb/ -linkdb crawl/linkdb/
crawl/segments/20131108063838/ -filter -normalize -deleteGone
should be run for each segment directory
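Running the command once per segment directory can be sketched as a small shell loop. This is only a sketch under assumptions: it reuses the command exactly as quoted above (whether the Solr URL is still accepted as an argument depends on the Nutch version), and the function name index_all_segments and the DRY_RUN switch are hypothetical helpers, not part of Nutch.

```shell
#!/bin/sh
# Hypothetical helper: run the quoted index command once per segment directory.
# Set DRY_RUN=1 to print each command instead of executing it.
index_all_segments() {
  segments_dir=$1
  for segment in "$segments_dir"/*/; do
    # Skip anything that is not a directory (also covers an unmatched glob).
    [ -d "$segment" ] || continue
    # Command copied from the message above; only the segment path varies.
    ${DRY_RUN:+echo} bin/nutch index http://localhost:8983/solr crawl/crawldb/ \
      -linkdb crawl/linkdb/ "$segment" -filter -normalize -deleteGone
  done
}
```

For example, `DRY_RUN=1 index_all_segments crawl/segments` prints one command line per segment so the arguments can be checked before anything is indexed.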
Hello,
What's a workaround for this path.home error?
https://issues.apache.org/jira/browse/NUTCH-2385
I've tried passing -Des.path.home and -Dpath.home to nutch index. Did not help.
I've tried creating my Elasticsearch index with the path.home settings. Did not
help.
I've tried setting
Hi Lewis,
Just trying the tutorial again. Doing the third round, it's taking
much longer than the other two.
What's this schema for?
Does the version of Nutch that we run have to have this new schema for
compatibility with Solr 6.6.0?
Or can we use Nutch 1.13?
thanks,
pau
On 7/12/17, lewis john
Nice job and very diligent.
On Wed, Jul 12, 2017 at 11:38 AM, Omkar Reddy
wrote:
> Hello all,
>
> Please find my updated weekly reports here[0]. Please feel free to provide
> any suggestions.
>
> Thanks,
> Omkar.
>
> [0]
>
Hello all,
Please find my updated weekly reports here[0]. Please feel free to provide
any suggestions.
Thanks,
Omkar.
[0]
https://wiki.apache.org/nutch/GoogleSummerOfCode/GraphGeneratorTool/WeeklyReports
Hi Folks,
I just updated the tutorial below. If you find any discrepancies, please let
me know.
https://wiki.apache.org/nutch/NutchTutorial
Also, I have made available a new schema.xml which is compatible with Solr
6.6.0 at
https://issues.apache.org/jira/browse/NUTCH-2400
Please scope it out
Hello,
I've been trying to get Nutch to crawl all of my site (let's call it
my_domain_name.com) for a while now, but it's not working. These are my
settings:
---
nutch-site.xml:
db.ignore.external.links = true
db.ignore.external.links.mode = byDomain
db.max.outlinks.per.page = -1
http,