Thanks ZHANG! Please find details below:

Number of rows: ~25B. Row size would be somewhere around 3-5 MB (the data
is in Parquet format, so we only need to worry about the columns to be
tagged).

Average length of the text to be parsed: ~300

Unfortunately, I don't have sample data or regexes that I can share freely.
As for the data being parsed, assume these are purchases made online and we
are trying to parse the transaction details. For example, purchases made on
Amazon can be tagged to Amazon as well as to other vendors.
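
To make the shape of the problem a bit more concrete, here is a rough sketch
in plain Python. The vendor names, the patterns, and the function names
(tag_transaction, tag_partition) are all made up for illustration and are not
our real rules; it only assumes that each record carries one text field and
can receive more than one tag.

```python
import re

# Hypothetical vendor patterns, invented for illustration only.
VENDOR_PATTERNS = {
    "amazon": r"\bamazon\b|\bamzn\b",
    "walmart": r"\bwal-?mart\b",
}

# Compile each pattern once so the compilation cost is not paid per row.
COMPILED = {
    vendor: re.compile(pattern, re.IGNORECASE)
    for vendor, pattern in VENDOR_PATTERNS.items()
}


def tag_transaction(text):
    """Return the sorted list of vendor tags whose pattern matches the text."""
    if not text:
        return []
    return sorted(vendor for vendor, rx in COMPILED.items() if rx.search(text))


def tag_partition(rows):
    """Tag an iterator of transaction strings, yielding (text, tags) pairs."""
    for text in rows:
        yield text, tag_transaction(text)
```

A generator like tag_partition is the kind of function that could be handed
to something like RDD.mapPartitions in Spark, so the patterns are compiled
once per worker rather than once per record. Whether that is the best way to
distribute the work is exactly what I am asking about.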

Appreciate your response!



On Tue, May 12, 2020 at 6:23 AM ZHANG Wei <wezh...@outlook.com> wrote:

> May I get some requirement details?
>
> Such as:
> 1. The row count and one row data size
> 2. The avg length of text to be parsed by RegEx
> 3. The sample format of text to be parsed
> 4. The sample of current RegEx
>
> --
> Cheers,
> -z
>
> On Mon, 11 May 2020 18:40:49 -0400
> Rishi Shah <rishishah.s...@gmail.com> wrote:
>
> > Hi All,
> >
> > I have a tagging problem at hand where we currently use regular
> > expressions to tag records. Is there a recommended way to distribute &
> > tag? Data is about 10TB large.
> >
> > --
> > Regards,
> >
> > Rishi Shah
>


-- 
Regards,

Rishi Shah
