Thanks ZHANG! Please find the details below:

Number of rows: ~25B
Row size: somewhere around ~3-5MB (it's Parquet-formatted data, so we only need to worry about the columns to be tagged)
Average length of the text to be parsed: ~300

Unfortunately, I don't have sample data or regexes that I can share freely. However, regarding the data being parsed: assume these are purchases made online and we are trying to parse the transaction details. For example, purchases made on Amazon can be tagged to Amazon as well as to other vendors, etc.

Appreciate your response!

On Tue, May 12, 2020 at 6:23 AM ZHANG Wei <wezh...@outlook.com> wrote:
> May I get some requirement details?
>
> Such as:
> 1. The row count and one row data size
> 2. The avg length of text to be parsed by RegEx
> 3. The sample format of text to be parsed
> 4. The sample of current RegEx
>
> --
> Cheers,
> -z
>
> On Mon, 11 May 2020 18:40:49 -0400
> Rishi Shah <rishishah.s...@gmail.com> wrote:
>
> > Hi All,
> >
> > I have a tagging problem at hand where we currently use regular expressions
> > to tag records. Is there a recommended way to distribute & tag? Data is
> > about 10TB large.
> >
> > --
> > Regards,
> >
> > Rishi Shah
>

--
Regards,
Rishi Shah
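To make the multi-vendor tagging described above concrete, here is a minimal per-record sketch in plain Python. It assumes each record's transaction text is a single string and that a record may receive several tags (e.g. both Amazon and another vendor). The vendor names and patterns below are hypothetical placeholders, since the real regexes could not be shared.

```python
import re
from typing import Dict, List

# Hypothetical vendor patterns; the actual regexes from the thread are not shared.
# Patterns are compiled once so they can be reused across many records.
VENDOR_PATTERNS: Dict[str, re.Pattern] = {
    "amazon": re.compile(r"\bamazon\b|\bamzn\b", re.IGNORECASE),
    "whole_foods": re.compile(r"\bwhole\s+foods\b", re.IGNORECASE),
}


def tag_transaction(text: str) -> List[str]:
    """Return every tag whose pattern matches the text (multi-label tagging)."""
    return [tag for tag, pattern in VENDOR_PATTERNS.items() if pattern.search(text)]


if __name__ == "__main__":
    print(tag_transaction("AMZN Mktp US*Whole Foods order"))
```

Here a single description such as "AMZN Mktp US*Whole Foods order" is tagged to both vendors, while text matching no pattern gets an empty list. In Spark, a function like this would typically be wrapped in a UDF (`pyspark.sql.functions.udf`) and applied only to the text column being tagged, so the other Parquet columns never need to be read.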