RE: [EXTERNAL] - Re: Outlinks field is not populated when page from seed URL when fetched page contains "refresh" meta tag

2017-06-22 Thread Vyacheslav Pascarel
Hi Lewis, It seems that URLs get mangled when a message is posted to the email list. The seed URL that I used was for MSNBC dot COM: http---www-msnbc-com (replace dashes with ":", "/", and ".") Regards, Vyacheslav Pascarel -Original Message- From: lewis john mcgibbney

Does Nutch 2.3.1 server support parallel calls

2017-06-22 Thread Vladimir Loubenski
Hi there, I have a Nutch 2.3.1 deployment. I run 7 threads in parallel. Each thread runs generate, fetch, parse, and updatedb sequentially in a loop, using REST API calls. Each thread uses its own crawlId for its REST API calls. So I didn't expect exceptions, but from time
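The setup described in this message can be sketched roughly as follows. This is only an illustration of the threading pattern (one worker per crawlId, steps run in order within each worker), not the poster's actual client. The function `submit_job`, the `crawl-N` ids, and the `cycles` parameter are hypothetical; `submit_job` is a stub that records the call instead of contacting a Nutch server.

```python
from concurrent.futures import ThreadPoolExecutor

# Steps each worker runs in order, as described in the message.
STEPS = ["generate", "fetch", "parse", "updatedb"]


def submit_job(crawl_id, step):
    """Hypothetical stand-in for one blocking REST call to the Nutch server.

    A real client would send the request here and wait for the job to
    finish; this stub only returns what would have been submitted.
    """
    return (crawl_id, step)


def run_cycles(crawl_id, cycles=1):
    """Run the steps sequentially for one crawlId, repeated `cycles` times."""
    log = []
    for _ in range(cycles):
        for step in STEPS:
            log.append(submit_job(crawl_id, step))
    return log


def run_parallel(num_workers=7, cycles=1):
    """Start one worker per crawlId and collect each worker's call log."""
    # Assumed naming scheme for the per-thread crawl ids.
    crawl_ids = [f"crawl-{i}" for i in range(num_workers)]
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        logs = list(pool.map(lambda cid: run_cycles(cid, cycles), crawl_ids))
    return dict(zip(crawl_ids, logs))
```

Calling `run_parallel(7, cycles=2)` returns one log per crawlId, each listing the four steps in order twice. Whether the server itself handles such concurrent calls safely is the open question of the thread, and the sketch does not answer it.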

Re: Outlinks field is not populated when page from seed URL when fetched page contains "refresh" meta tag

2017-06-22 Thread lewis john mcgibbney
Hi Vyacheslav, Can you provide me an example page with the HTTP refresh tag included? I'll try comparing behaviour between 2.X and master. Thank you Lewis On Sat, Jun 17, 2017 at 9:25 AM, wrote: > From: Vyacheslav Pascarel > To: