Agreed. Bryan's suggestion lets you match each line against the regex instead of trying to match the entire file. As he said, though, it produces a new FlowFile for each line of text. If you need to rebuild a single file, those FlowFiles could be merged back together with a MergeContent processor.
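For what it's worth, here is a rough sketch of the split / match / merge idea in plain Python. It is not NiFi code and does not reproduce how the processors work internally; it only illustrates applying the pattern to one line at a time and then joining the matching lines back together. The function names and the shortened sample rows are made up for illustration.

```python
import re
from typing import List

# Same pattern as in the original question: a whole line beginning with "R
LINE_PATTERN = re.compile(r'^("R.*)$')


def split_and_match(content: str) -> List[str]:
    """Split content into lines and keep only the lines that match the pattern.

    Roughly analogous to SplitText (one line per unit) followed by a
    per-line regex match. Hypothetical helper, not a NiFi API.
    """
    matches = []
    for line in content.splitlines():
        m = LINE_PATTERN.match(line)
        if m:
            matches.append(m.group(1))
    return matches


def merge_lines(lines: List[str]) -> str:
    """Join the matched lines back into a single newline-terminated text.

    Roughly analogous to merging the per-line results into one file.
    """
    if not lines:
        return ""
    return "\n".join(lines) + "\n"


# Shortened, made-up rows in the same shape as the sample file
sample = (
    '"H","USA","BP"\n'
    '"R","1","TB","CLM"\n'
    '"R","2","TB","CLM"\n'
    '"S","1","Coolusive"\n'
)

r_lines = split_and_match(sample)
merged = merge_lines(r_lines)
print(merged, end="")
```

Only the two rows starting with "R" survive; the "H" and "S" rows are dropped because the pattern is tested against each line separately.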
________________________________
> Date: Tue, 8 Sep 2015 13:03:08 -0400
> Subject: Re: ExtractText usage
> From: [email protected]
> To: [email protected]
>
> Chris,
>
> I think the issue is that ExtractText is not reading the file line by
> line, and then applying your pattern to each line. It is applying the
> pattern to the whole content of the file so you would need a regex that
> repeated the pattern you were looking for so that it captured multiple
> times.
>
> When I tested your example, it was actually extracting the first match
> 3 times which I think is because of the following...
> - It always puts the first match in the property base name, in this
> case "regex",
> - then it puts the entire match in index 0, in this case regex.0, and
> in this case it is only matching the first occurrence
> - and then all of the matches would be in order after that starting with
> index 1, which in this case there is only 1 match so it is just regex.1
>
> Another solution that might be simpler is to put a SplitText processor
> between GetFile and ExtractText, and set the Line Split Count to 1.
> This will send 1 line at a time to your ExtractText processor which
> would then match only the lines starting with 'R'.
> The downside is that all of the lines with 'R' would be in different
> FlowFiles, but this may or may not matter depending on what you wanted to
> do with them after.
>
> -Bryan
>
>
> On Tue, Sep 8, 2015 at 12:12 PM, Christopher Wilson
> <[email protected]<mailto:[email protected]>> wrote:
> I'm trying to read a directory of .csv files which have 3 different
> schemas/list types (not my idea). The descriptor is in the first
> column of the csv file. I'm reading the files in using GetFile and
> passing them into ExtractText, but I'm only getting the first 3 (of 8)
> lines matching my first regex. What I want to do is grab all the lines
> beginning with "R" and dump them off to a file (for now). My end goal
> would be to loop through these grab lines, or blocks of lines, by regex
> and route them downstream based on that regex.
>
> Details and first 11 lines of a sample file below.
>
> Thanks in advance.
>
> -Chris
>
> NiFi version: 0.2.1
> OS: Ubuntu 14.01
> JVM: java-1.7.0-openjdk-amd64
>
> ExtractText:
>
> Enable Multiline = True
> Enable Unix Lines Mode = True
> regex = ^("R.*)$
>
>
> "H","USA","BP","20140502","9","D","BP"
> "R","1","TB","CLM"," "," ","3U"," ","47000","0","47000","0"," ","0","
> ","0"," ","0"," ","0"," ","0"," ","0"," ","0","25000","25000","
> ","650","F","D","D","6"," "," "," ","1:20PM ","1:51PM ","0122"," ","Clm
> 25000","Fast","","16","87","
> ","","","64","117.39","2266","4648","11129","0","0","
> ","","112089","Good","Cloudy","","","Y"
> "R","2","TB","CLM"," ","B","3U"," ","34000","0","34000","0"," ","0","
> ","0"," ","0"," ","0"," ","0"," ","0"," ","0","25000","25000","
> ","600","F","D","D","7"," "," "," ","1:51PM ","2:22PM ","0151"," ","Clm
> 25000N2L","Fast","","16","79","
> ","","","64","112.36","2444","4803","10003","0","0","
> ","","261868","Poor","Cloudy","","","Y"
> "R","3","TB","STK","S"," ","3U","
> ","100000","0","100000","0","A","100000"," ","0"," ","0"," ","0","
> ","0"," ","0"," ","0","0","0"," ","600","F","D","D","6"," ","Affirmed
> Success S.","AfrmdScsB","2:22PM ","2:53PM ","0222","
> ","AfrmdScsB100k","Fast","","16","88","
> ","","","64","110.54","2323","4618","5810","0","0","
> ","","259015","5","Clear","","","Y"
> "R","4","TB","MCL"," "," ","3U"," ","49200","0","49200","0"," ","0","
> ","0"," ","0"," ","0"," ","0"," ","0"," ","0","40000","40000","
> ","850","F","D","D","8"," "," "," ","2:53PM ","3:24PM ","0253"," ","Md
> 40000","Fast","Y","30","72","
> ","","","64","145.58","2425","4829","11358","13909","0","
> ","","260343","9","Clear","0","","Y"
> "R","5","TB","ALW"," "," ","3U"," ","77000","0","77000","0"," ","0","
> ","0"," ","0"," ","0"," ","0"," ","0"," ","0","0","0","
> ","900","F","D","D","7"," "," "," ","3:24PM ","3:55PM ","0325"," ","Alw
> 77000N1X","Fast","Y","30","74","
> ","","","64","151.69","2330","4643","11156","13832","0","
> ","","302065","Good","Clear","","","Y"
> "R","6","TB","MSW","S","B","3U"," ","60000","1200","60000","0","
> ","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0","0","0","
> ","800","F","D","D","5"," "," "," ","3:55PM ","4:26PM ","0355"," ","Md
> Sp Wt 58k","Fast","","30","61","
> ","","","64","140.64","2481","4931","11477","0","0","
> ","","161404","Good","Clear","","","Y"
> "R","7","TB","CLM"," ","B","3U"," ","40000","0","40000","0"," ","0","
> ","0"," ","0"," ","0"," ","0"," ","0"," ","0","20000","20000","
> ","800","F","D","D","6"," "," "," ","4:26PM ","4:57PM ","0427"," ","Clm
> 20000","Fast","","30","68","
> ","","","64","139.31","2337","4770","11402","0","0","
> ","","344306","Good","Clear","","","Y"
> "R","8","TB","ALW"," ","B","3U"," ","77000","0","77000","0"," ","0","
> ","0"," ","0"," ","0"," ","0"," ","0"," ","0","0","0","
> ","850","F","D","D","7"," "," "," ","4:57PM ","5:28PM ","0457"," ","Alw
> 77000N1X","Fast","","30","76","
> ","","","64","144.76","2416","4847","11365","13836","0","
> ","","213021","Good","Clear","","","Y"
> "R","9","TB","STR"," "," ","3U"," ","60000","0","60000","0"," ","0","
> ","0"," ","0"," ","0"," ","0"," ","0"," ","0","0","40000","
> ","700","F","D","D","8"," "," "," ","5:28PM "," ","0528"," ","Alw
> 40000s","Fast","Y","16","81","
> ","","","64","124.66","2339","4740","11211","0","0","
> ","","332649","6,8","Clear","0","","Y"
> "S","1","000008813341TB","Coolusive","20100124","KY","TB","Colt","Bay","Ice
> Cool Kitty","2003","TB","Elusive Quality","1993","TB","Tomorrows
> Cat","1995","TB","Gone
> West","1984","TB","122","0","L","","28200","Velasquez","Cornelio","H.","
> ","Jacobson","David"," ","Drawing Away Stable and Jacobson, David","
> "," ","265","N","
> ","0","N","5","5","3","3","4","0","0","1","1","1","10","200","0","0","100","75","510","320","0","0","0","0","N","25000","4w
>
> into lane, held","chase 2o turn, bid 4w turning for home,took over,
> held
> sway","7.30","3.80","2.70","Y","000000002103TE","TE","Barbara","Robert","
> ","000001976480O6","O6","Averill","Bradley","E.","
> ","N","0","N","","0","","87","Lansdon B. Robbins & Kevin
> Callahan","000000257611TE","000000002695JE"
