Hi,
I am using Nutch for the first time. When I crawl www.mysite.com, it
crawls for a while.
When I try to crawl a subfolder such as www.mysite.com/mysubfolder, it crawls
for only about 1 second.
My urls/seed.txt is set to:
http://www.mysite.com/mysubfolder
My regex-urlfilter.txt uses the default except for the
Maybe this is because I am trying to crawl just a subfolder, mysite.com/subfolder,
and I am having problems configuring Nutch to do this; it goes on to crawl
other pages from the parent directory.
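For reference, here is a minimal sketch of the kind of rule that usually restricts a crawl to one subfolder. It assumes the rest of the stock regex-urlfilter.txt is unchanged and that these lines replace the final catch-all "+." rule. Nutch reads the file top to bottom and the first matching rule decides: "+" accepts a URL, "-" rejects it. The host and path are the placeholders from the seed above, not a tested configuration.

```
# accept only URLs under the subfolder (placeholder host and path from the seed)
+^http://www\.mysite\.com/mysubfolder
# reject everything else
-.
```

If the seed URL itself does not pass the filter, nothing is fetched and the crawl ends almost immediately.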
Thanks!
On Tue, Oct 4, 2016 at 4:00 AM, Markus Jelsma
wrote:
> Well, probably because you or something i
Can you send it to me also?
Thanks,
Néstor
On Oct 10, 2016 9:33 PM, "MrSrivastavaRK ." wrote:
>
> Hi,
> I have successfully indexed content in Elasticsearch using the Nutch 1.12 REST
> API. I can send you the API details if you want them for reference.
>
> Regards
> Raje
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.nutch.crawl.Crawl.main(Crawl.java:55)
*How can I make it crawl the entire subfolder?*
*And what does that error mean?*
Thanks,
Néstor
--
Né§t☼r *Authority gone to one's head is the greatest enemy of Truth*