Re: Writing Nutch data in Parquet format

2021-05-06 Thread Lewis John McGibbney
Hi Seb,

Really interesting. Thanks for the response. Below

On 2021/05/05 11:42:04, Sebastian Nagel wrote:
> Yes, but not directly - it's a multi-step process.

As I expected ;)

> This Parquet index is optimized by sorting the rows by a special form of the
> URL [1] which
> - drops

Re: Redirection behavior

2021-05-06 Thread prateek
Thanks. I am using a custom http plugin, so I will debug with 1.16 to see what's causing it. Thanks for your help.

Regards,
Prateek

On Thu, May 6, 2021 at 11:26 AM Sebastian Nagel wrote:
> Hi Prateek,
>
> (sorry, I pressed the wrong reply button, so redirecting the discussion
> back to

Re: Redirection behavior

2021-05-06 Thread Sebastian Nagel
Hi Prateek,

(sorry, I pressed the wrong reply button, so redirecting the discussion back to user@nutch)

> I am not sure what I am missing.

Well, URL filters? Robots.txt? Don't know...

> I am currently using Nutch 1.16

Just to make sure this isn't the cause: there was a bug (NUTCH-2550