Re: Writing Nutch data in Parquet format

2021-05-06 Thread Lewis John McGibbney
Hi Seb,
Really interesting. Thanks for the response. Below
On 2021/05/05 11:42:04, Sebastian Nagel wrote:
> Yes, but not directly - it's a multi-step process.
As I expected ;)
> This Parquet index is optimized by sorting the rows by a special form of the
> URL [1] which
> - drops

Re: Writing Nutch data in Parquet format

2021-05-05 Thread Sebastian Nagel
://research.google/pubs/pub36632/
[6] https://blog.twitter.com/engineering/en_us/a/2013/dremel-made-simple-with-parquet.html
On 5/4/21 11:14 PM, Lewis John McGibbney wrote:
> Hi user@, Has anyone experimented/accomplished either 1) writing Nutch data directly as Parquet format, or 2) post-processing

Writing Nutch data in Parquet format

2021-05-04 Thread Lewis John McGibbney
Hi user@,
Has anyone experimented with, or accomplished, either
1) writing Nutch data directly in Parquet format, or
2) post-processing (Nutch) Hadoop sequence data by converting it to Parquet format?
Thank you
lewismc