Thanks, guys. replaceAll looks like the solution. I read through that doc numerous times and cannot believe I missed that one. LOL. Juan, I was looking at UpdateAttribute's advanced mode as well, but just got stuck on how to do the regex in that context using expression language.
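For reference, here is a rough Python sketch of the capture-group idea applied to the three example filenames from the original question. It is not NiFi Expression Language: Python's `Match.expand` only stands in for the kind of back-reference substitution that replaceAll is being suggested for. The function name `normalize_timestamp`, the literal `Priority`/`ABC`/`XYZ` prefixes, the appended "00" seconds, and rendering the Unix time in UTC are all assumptions made for illustration.

```python
import re
from datetime import datetime, timezone


def _from_epoch_ms(match):
    # Assumption: Unix time in milliseconds is rendered in UTC.
    seconds = int(match.group(1)) // 1000
    return datetime.fromtimestamp(seconds, tz=timezone.utc).strftime("%Y%m%d%H%M%S")


# One (pattern, converter) pair per example filename format.
# The prefixes are taken from the examples and are assumed to be stable.
FORMATS = [
    # Priority_002_20151104123456_00.csv: already yyyyMMddHHmmss
    (re.compile(r"^Priority_\d+_(\d{14})_\d+\.csv$"), lambda m: m.group(1)),
    # ABC_02_1447586912344.csv: Unix time in ms
    (re.compile(r"^ABC_\d+_(\d+)\.csv$"), _from_epoch_ms),
    # XYZ_20151104_1234.csv: yyyyMMdd_HHmm, seconds assumed to be 00
    (re.compile(r"^XYZ_(\d{8})_(\d{4})\.csv$"), lambda m: m.expand(r"\g<1>\g<2>00")),
]


def normalize_timestamp(filename):
    """Return the timestamp embedded in filename as yyyyMMddHHmmss."""
    for pattern, convert in FORMATS:
        match = pattern.match(filename)
        if match:
            return convert(match)
    raise ValueError("unrecognized filename format: " + filename)
```

Calling `normalize_timestamp("XYZ_20151104_1234.csv")` returns `"20151104123400"`; a filename that matches none of the patterns raises `ValueError` rather than passing through unchanged.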
Appreciate the help, guys. Now I can have a happy day getting this working :)

On Wed, Nov 4, 2015 at 7:04 AM, Ryan Ward <[email protected]> wrote:

> Mark,
>
> Take a look at the replaceAll function. Juan is correct; you will want to
> use UpdateAttribute in the advanced mode.
>
> https://nifi.apache.org/docs/nifi-docs/html/expression-language-guide.html#replaceall
>
> Ryan
>
> On Wed, Nov 4, 2015 at 6:52 AM, Juan Jose Escobar <
> [email protected]> wrote:
>
>> Hello, Mark,
>>
>> I think it should be possible to do it using UpdateAttribute in advanced
>> mode: define a condition for each of the different formats, and once the
>> particular format type is identified, get the appropriate substring into a
>> new attribute - or into the filename attribute if you want to normalize
>> naming. If I remember correctly, there is no support to extract the regex
>> groups in NiFi Expression Language in 0.3.0.
>>
>> Hope this helps
>>
>> J
>>
>> On Wed, Nov 4, 2015 at 7:04 AM, Mark Petronic <[email protected]>
>> wrote:
>>
>>> Looking for some help on the best way to extract a field from a filename. I
>>> need to parse out the date from the core filename attribute set by the
>>> UnpackContent processor. I am unzipping files that contain many CSV files
>>> and these CSV file names vary in format but each has a timestamp included
>>> in the filename. Example formats are:
>>>
>>> Priority_002_20151104123456_00.csv (20151104123456 is yyyyMMddHHmmss)
>>> ABC_02_1447586912344.csv (1447586912344 is Unix time in ms)
>>> XYZ_20151104_1234.csv (20151104_1234 is yyyyMMdd_HHmm)
>>>
>>> So, there are various forms to deal with. I need to normalize these into
>>> yyyyMMddHHmmss. A regex with capture groups would be perfect but I cannot
>>> quite figure out how to do it. ExtractText does regex with capture groups
>>> but only against flowfile contents and these are attributes.
>>> UpdateAttribute only supports expression language and that does not have
>>> regex-based extracts of capture groups.
>>>
>>> In Python, I would just do something like:
>>>
>>> date, time = re.search(r"XYZ_(\d+)_(\d+)\.csv",
>>> "XYZ_20151104_1234.csv").groups()
>>>
>>> Then I could use the expression language format or toDate functions to
>>> normalize the dates.
>>>
>>> I know I could use a utility script with ExecuteStreamCommand that I
>>> could call with the filepath and get back the tokens but was looking for an
>>> internal way to do it without forking out as there are a lot of archives in
>>> each zip and that would add to latency in heavy loads.
>>>
>>> Any thoughts?
>>>
>>> Thanks!
