Hello. I have a task to parse dates out of incoming raw content. The date patterns can of course take any number of forms: YYYY-MM-DD, YYYY/MM/DD, YYYYMMDD, MMDDYYYY, etc. I can build a robust regex to match a broad set of such patterns in the raw data, but is there a project or library available for Groovy that already offers this?
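To make the question concrete, here is a rough sketch of the kind of regex net I have in mind. It is plain Java using only `java.util.regex`, which should port to Groovy almost line for line. The class name `DateCandidateFinder` and the pattern are my own placeholders, not from any library, and the pattern only covers the four layouts listed above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: pulls date-looking substrings out of raw text.
class DateCandidateFinder {

    // Two alternatives:
    //   1. YYYY-MM-DD or YYYY/MM/DD, where the backreference \1 forces the
    //      same separator to be used twice
    //   2. any run of exactly 8 digits (YYYYMMDD or MMDDYYYY)
    // The lookarounds reject candidates embedded in longer digit runs.
    private static final Pattern CANDIDATE = Pattern.compile(
            "(?<!\\d)(?:\\d{4}([-/])\\d{2}\\1\\d{2}|\\d{8})(?!\\d)");

    // Returns every candidate in the order it appears in the text.
    static List<String> find(String text) {
        List<String> hits = new ArrayList<>();
        Matcher m = CANDIDATE.matcher(text);
        while (m.find()) {
            hits.add(m.group());
        }
        return hits;
    }

    public static void main(String[] args) {
        System.out.println(find(
                "seen 2021-03-04, 2021/03/05 and 20210306 plus 03072021; id 123456789"));
    }
}
```

This only finds candidates by shape. It does not check that the digits form a real calendar date, so something like 99999999 would still match and has to be rejected in a later step.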
Assuming I get pattern matches parsed out of my raw data, I will have a collection of strings representing year-month-day values in a variety of formats. I'd then like to normalize them to a standard form so that I can sort and compare them. My goal is to identify the range of dates in the raw data as a sorted Groovy list. I expect my first cut to miss many pattern variations, but I do have one thing going for me: as I test against volumes of raw data, I can widen the net of patterns to catch a growing share of year-month-day expressions. I intend to write a Groovy script that will run from an Apache NiFi ExecuteScript processor. I'll read the flowfile content through a buffered reader so that I can handle large flowfiles. Any recommendations or suggestions?
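For the normalize-and-sort step, this is roughly what I am picturing, again as a plain Java sketch that Groovy could call or that I could port. It reads line by line from a `BufferedReader`, tries each known layout with `java.time` in strict mode, and collects the results in a `TreeSet` so they come out sorted and de-duplicated. The class name `DateRangeScanner`, the order in which layouts are tried, and the 1900 to 2100 year window are all my own assumptions. The NiFi wiring (getting the flowfile's stream into the reader) is not shown.

```java
import java.io.BufferedReader;
import java.io.StringReader;
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;
import java.time.format.DateTimeParseException;
import java.time.format.ResolverStyle;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: extracts, normalizes, and sorts dates from a reader.
class DateRangeScanner {

    private static final Pattern CANDIDATE = Pattern.compile(
            "(?<!\\d)(?:\\d{4}([-/])\\d{2}\\1\\d{2}|\\d{8})(?!\\d)");

    // Assumed plausibility window; helps disambiguate bare 8-digit runs.
    private static final int MIN_YEAR = 1900;
    private static final int MAX_YEAR = 2100;

    // Layouts are tried in this order. 'uuuu' is used for the year because
    // 'yyyy' (year-of-era) needs an era to resolve under ResolverStyle.STRICT.
    private static final List<DateTimeFormatter> FORMATS = Arrays.asList(
            strict("uuuu-MM-dd"), strict("uuuu/MM/dd"),
            strict("uuuuMMdd"), strict("MMdduuuu"));

    private static DateTimeFormatter strict(String pattern) {
        return DateTimeFormatter.ofPattern(pattern)
                .withResolverStyle(ResolverStyle.STRICT);
    }

    // Returns the first valid in-window interpretation, or empty if none fits.
    static Optional<LocalDate> normalize(String raw) {
        for (DateTimeFormatter f : FORMATS) {
            try {
                LocalDate d = LocalDate.parse(raw, f);
                if (d.getYear() >= MIN_YEAR && d.getYear() <= MAX_YEAR) {
                    return Optional.of(d);
                }
            } catch (DateTimeParseException e) {
                // Not this layout (or not a real date); try the next one.
            }
        }
        return Optional.empty();
    }

    // Streams the reader one line at a time; only the distinct dates are kept
    // in memory, not the content itself.
    static List<LocalDate> scan(BufferedReader reader) {
        TreeSet<LocalDate> dates = new TreeSet<>();
        reader.lines().forEach(line -> {
            Matcher m = CANDIDATE.matcher(line);
            while (m.find()) {
                normalize(m.group()).ifPresent(dates::add);
            }
        });
        return new ArrayList<>(dates);
    }

    public static void main(String[] args) {
        BufferedReader r = new BufferedReader(new StringReader(
                "a 2021/03/05 b 03072021\nc 20210306 2021-02-30 2021-03-04\n"));
        List<LocalDate> sorted = scan(r);
        System.out.println(sorted);
        System.out.println("range: " + sorted.get(0) + " to "
                + sorted.get(sorted.size() - 1));
    }
}
```

Because `LocalDate` is comparable and prints as ISO `YYYY-MM-DD`, the first and last elements of the sorted list give the date range directly. One weakness I can already see: a bare 8-digit run such as 12012020 is only resolved by trying YYYYMMDD first and falling back to MMDDYYYY, so some values will remain ambiguous unless I know more about the data source.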