Agreed. Bryan's suggestion will let you match each line against the regex,
rather than trying to match the entire file. As he said, though, it would
result in a new FlowFile for each line of text. If you need to rebuild a
single file, those FlowFiles could be merged back together using a
MergeContent processor.
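A minimal plain-Java sketch of the "rebuild a single file" step, assuming each per-line piece still knows the position it had in the original file (the hypothetical `index` field below). This is not MergeContent's implementation. It only shows the idea of restoring order and joining the pieces, one record per line. `MergeByIndex` and `Piece` are illustrative names.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

public class MergeByIndex {
    // Hypothetical per-line piece: its text plus its position in the original file.
    public static final class Piece {
        final int index;
        final String text;

        public Piece(int index, String text) {
            this.index = index;
            this.text = text;
        }
    }

    // Restore the original order, then join the pieces into one body,
    // writing a newline after each piece.
    public static String merge(List<Piece> pieces) {
        List<Piece> sorted = new ArrayList<>(pieces);
        sorted.sort(Comparator.comparingInt(p -> p.index));
        StringBuilder body = new StringBuilder();
        for (Piece p : sorted) {
            body.append(p.text).append('\n');
        }
        return body.toString();
    }

    public static void main(String[] args) {
        // Pieces may arrive out of order after being processed separately.
        List<Piece> arrived = new ArrayList<>();
        arrived.add(new Piece(3, "\"R\",\"2\""));
        arrived.add(new Piece(2, "\"R\",\"1\""));
        System.out.print(merge(arrived));
    }
}
```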

________________________________
> Date: Tue, 8 Sep 2015 13:03:08 -0400 
> Subject: Re: ExtractText usage 
> From: [email protected] 
> To: [email protected] 
> 
> Chris, 
> 
> I think the issue is that ExtractText is not reading the file line by 
> line, and then applying your pattern to each line. It is applying the 
> pattern to the whole content of the file so you would need a regex that 
> repeated the pattern you were looking for so that it captured multiple 
> times. 
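Here is a plain java.util.regex sketch of the difference Bryan describes. It is not NiFi's code, and `WholeContentMatching` and its methods are hypothetical. A single `find()` over the whole content stops at the first line that matches, while calling `find()` in a loop walks through every match. Going by Bryan's test, ExtractText here behaves like the single-`find()` case.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeContentMatching {
    // The pattern from the original post, with multiline and Unix-lines flags.
    static final Pattern R_LINE =
            Pattern.compile("^(\"R.*)$", Pattern.MULTILINE | Pattern.UNIX_LINES);

    // One find() over the whole content: only the first matching line.
    public static String firstMatch(String content) {
        Matcher m = R_LINE.matcher(content);
        return m.find() ? m.group(1) : null;
    }

    // Repeated find() calls: every matching line, in order.
    public static List<String> allMatches(String content) {
        List<String> out = new ArrayList<>();
        Matcher m = R_LINE.matcher(content);
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String content = "\"H\",\"USA\"\n\"R\",\"1\"\n\"R\",\"2\"\n\"S\",\"1\"\n";
        System.out.println(firstMatch(content)); // "R","1"
        System.out.println(allMatches(content)); // ["R","1", "R","2"]
    }
}
```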
> 
> When I tested your example, it was actually extracting the first match
> 3 times, which I think is because of the following:
> - It always puts the first capture group in the attribute named after
> the property, in this case "regex",
> - then it puts the entire match in index 0, in this case regex.0, and
> here it only matches the first occurrence,
> - and then all of the capture groups follow in order, starting with
> index 1; your pattern has only 1 group, so there is just regex.1.
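The naming scheme in the list above can be re-created with java.util.regex as a hypothetical sketch. This is not NiFi's source, and `AttributeNaming.extract` is an illustrative name. Because the whole pattern `^("R.*)$` is a single capture group, the base attribute, `regex.0` and `regex.1` all end up holding the same first "R" line, which matches the "first match 3 times" effect.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AttributeNaming {
    // Hypothetical re-creation of the attribute naming described above.
    public static Map<String, String> extract(String name, Pattern p, String content) {
        Map<String, String> attrs = new LinkedHashMap<>();
        Matcher m = p.matcher(content);
        if (!m.find()) {
            return attrs; // no match: no attributes
        }
        // The base name gets the first capture group, if the pattern has one.
        if (m.groupCount() >= 1) {
            attrs.put(name, m.group(1));
        }
        // name.0 is the entire match; name.1 .. name.N are the capture groups.
        for (int i = 0; i <= m.groupCount(); i++) {
            attrs.put(name + "." + i, m.group(i));
        }
        return attrs;
    }

    public static void main(String[] args) {
        Pattern p = Pattern.compile("^(\"R.*)$", Pattern.MULTILINE | Pattern.UNIX_LINES);
        String content = "\"H\",\"USA\"\n\"R\",\"1\"\n\"R\",\"2\"\n";
        // {regex="R","1", regex.0="R","1", regex.1="R","1"}
        System.out.println(extract("regex", p, content));
    }
}
```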
> 
> Another solution that might be simpler is to put a SplitText processor
> between GetFile and ExtractText, and set the Line Split Count to 1.
> This will send 1 line at a time to your ExtractText processor, which
> would then match only the lines starting with 'R'.
> The downside is that all of the lines with 'R' would be in different
> FlowFiles, but this may or may not matter depending on what you want to
> do with them afterwards.
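Here is a rough plain-Java picture of the SplitText-then-ExtractText idea, assuming each line of the file becomes its own piece of content. Matching each line separately means every "R" line yields its own result, where NiFi would produce one FlowFile per line. `PerLineExtract` and `matchEachLine` are hypothetical names, not NiFi APIs.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PerLineExtract {
    // Same pattern as in the post; once the input is a single line,
    // ^ and $ bound that line without any extra flags.
    static final Pattern R_LINE = Pattern.compile("^(\"R.*)$");

    // Stand-in for SplitText (Line Split Count = 1) followed by ExtractText:
    // each line is matched on its own and each match is kept separately.
    public static List<String> matchEachLine(String content) {
        List<String> results = new ArrayList<>();
        for (String line : content.split("\n")) {
            Matcher m = R_LINE.matcher(line);
            if (m.find()) {
                results.add(m.group(1));
            }
        }
        return results;
    }

    public static void main(String[] args) {
        String content = "\"H\",\"USA\"\n\"R\",\"1\"\n\"R\",\"2\"\n\"S\",\"1\"\n";
        for (String match : matchEachLine(content)) {
            System.out.println(match);
        }
    }
}
```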
> 
> -Bryan 
> 
> 
> On Tue, Sep 8, 2015 at 12:12 PM, Christopher Wilson
> <[email protected]> wrote:
> I'm trying to read a directory of .csv files that have 3 different
> schemas/list types (not my idea). The descriptor is in the first
> column of each CSV file. I'm reading the files in using GetFile and
> passing them into ExtractText, but I'm only getting the first 3 (of 8)
> lines matching my first regex. What I want to do is grab all the lines
> beginning with "R" and dump them to a file (for now). My end goal
> is to loop through these, grab lines or blocks of lines by regex,
> and route them downstream based on that regex.
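One way to picture the end goal is to key each line on the descriptor in its first column and collect the lines per key, with each key standing for a downstream route. The sketch below does this in plain Java. `RouteByDescriptor` is hypothetical. In a flow this would correspond to something like per-line extraction followed by attribute-based routing, not to a single processor.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class RouteByDescriptor {
    // Captures the quoted descriptor in the first column ("H", "R", "S", ...).
    static final Pattern DESCRIPTOR = Pattern.compile("^\"([^\"]*)\"");

    // Group every line under its descriptor; lines without a quoted
    // first column are collected under "unmatched".
    public static Map<String, List<String>> route(String content) {
        Map<String, List<String>> routes = new LinkedHashMap<>();
        for (String line : content.split("\n")) {
            Matcher m = DESCRIPTOR.matcher(line);
            String key = m.find() ? m.group(1) : "unmatched";
            routes.computeIfAbsent(key, k -> new ArrayList<>()).add(line);
        }
        return routes;
    }

    public static void main(String[] args) {
        String content = "\"H\",\"USA\"\n\"R\",\"1\"\n\"R\",\"2\"\n\"S\",\"1\"\n";
        route(content).forEach((key, lines) ->
                System.out.println(key + " -> " + lines.size() + " line(s)"));
    }
}
```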
> 
> Details and first 11 lines of a sample file below. 
> 
> Thanks in advance. 
> 
> -Chris 
> 
> NiFi version: 0.2.1 
> OS: Ubuntu 14.01 
> JVM: java-1.7.0-openjdk-amd64 
> 
> ExtractText: 
> 
> Enable Multiline = True 
> Enable Unix Lines Mode = True 
> regex = ^("R.*)$ 
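This sketch assumes the two settings above map onto java.util.regex's `Pattern.MULTILINE` and `Pattern.UNIX_LINES` flags, which is what their names and descriptions suggest, and shows what each flag changes for this pattern. Without MULTILINE, `^` only matches at the very start of the content, which in this sample is the "H" line, so nothing is found. With UNIX_LINES, only `\n` ends a line, so on a file with `\r\n` line endings the captured text keeps a trailing `\r`. `FlagEffect.firstR` is a hypothetical helper.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FlagEffect {
    // Apply the pattern from the post to the whole content with the given
    // flags and return the first captured line, or null if nothing matches.
    public static String firstR(String content, int flags) {
        Matcher m = Pattern.compile("^(\"R.*)$", flags).matcher(content);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String lf = "\"H\",\"USA\"\n\"R\",\"1\"\n";
        String crlf = "\"H\",\"USA\"\r\n\"R\",\"1\"\r\n";
        int both = Pattern.MULTILINE | Pattern.UNIX_LINES;

        // No flags: ^ is only the start of input (the "H" line), so no match.
        System.out.println(firstR(lf, 0));
        // MULTILINE: ^ and $ also match at line boundaries.
        System.out.println(firstR(lf, both));
        // UNIX_LINES with \r\n input: '.' also consumes the '\r'.
        System.out.println(firstR(crlf, both).endsWith("\r"));
        // MULTILINE alone treats \r\n as a terminator, so no '\r' is captured.
        System.out.println(firstR(crlf, Pattern.MULTILINE).endsWith("\r"));
    }
}
```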
> 
> 
> "H","USA","BP","20140502","9","D","BP"
> "R","1","TB","CLM"," "," ","3U"," ","47000","0","47000","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0","25000","25000"," ","650","F","D","D","6"," "," "," ","1:20PM ","1:51PM ","0122"," ","Clm 25000","Fast","","16","87"," ","","","64","117.39","2266","4648","11129","0","0"," ","","112089","Good","Cloudy","","","Y"
> "R","2","TB","CLM"," ","B","3U"," ","34000","0","34000","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0","25000","25000"," ","600","F","D","D","7"," "," "," ","1:51PM ","2:22PM ","0151"," ","Clm 25000N2L","Fast","","16","79"," ","","","64","112.36","2444","4803","10003","0","0"," ","","261868","Poor","Cloudy","","","Y"
> "R","3","TB","STK","S"," ","3U"," ","100000","0","100000","0","A","100000"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0","0","0"," ","600","F","D","D","6"," ","Affirmed Success S.","AfrmdScsB","2:22PM ","2:53PM ","0222"," ","AfrmdScsB100k","Fast","","16","88"," ","","","64","110.54","2323","4618","5810","0","0"," ","","259015","5","Clear","","","Y"
> "R","4","TB","MCL"," "," ","3U"," ","49200","0","49200","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0","40000","40000"," ","850","F","D","D","8"," "," "," ","2:53PM ","3:24PM ","0253"," ","Md 40000","Fast","Y","30","72"," ","","","64","145.58","2425","4829","11358","13909","0"," ","","260343","9","Clear","0","","Y"
> "R","5","TB","ALW"," "," ","3U"," ","77000","0","77000","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0","0","0"," ","900","F","D","D","7"," "," "," ","3:24PM ","3:55PM ","0325"," ","Alw 77000N1X","Fast","Y","30","74"," ","","","64","151.69","2330","4643","11156","13832","0"," ","","302065","Good","Clear","","","Y"
> "R","6","TB","MSW","S","B","3U"," ","60000","1200","60000","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0","0","0"," ","800","F","D","D","5"," "," "," ","3:55PM ","4:26PM ","0355"," ","Md Sp Wt 58k","Fast","","30","61"," ","","","64","140.64","2481","4931","11477","0","0"," ","","161404","Good","Clear","","","Y"
> "R","7","TB","CLM"," ","B","3U"," ","40000","0","40000","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0","20000","20000"," ","800","F","D","D","6"," "," "," ","4:26PM ","4:57PM ","0427"," ","Clm 20000","Fast","","30","68"," ","","","64","139.31","2337","4770","11402","0","0"," ","","344306","Good","Clear","","","Y"
> "R","8","TB","ALW"," ","B","3U"," ","77000","0","77000","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0","0","0"," ","850","F","D","D","7"," "," "," ","4:57PM ","5:28PM ","0457"," ","Alw 77000N1X","Fast","","30","76"," ","","","64","144.76","2416","4847","11365","13836","0"," ","","213021","Good","Clear","","","Y"
> "R","9","TB","STR"," "," ","3U"," ","60000","0","60000","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0"," ","0","0","40000"," ","700","F","D","D","8"," "," "," ","5:28PM "," ","0528"," ","Alw 40000s","Fast","Y","16","81"," ","","","64","124.66","2339","4740","11211","0","0"," ","","332649","6,8","Clear","0","","Y"
> "S","1","000008813341TB","Coolusive","20100124","KY","TB","Colt","Bay","Ice Cool Kitty","2003","TB","Elusive Quality","1993","TB","Tomorrows Cat","1995","TB","Gone West","1984","TB","122","0","L","","28200","Velasquez","Cornelio","H."," ","Jacobson","David"," ","Drawing Away Stable and Jacobson, David"," "," ","265","N"," ","0","N","5","5","3","3","4","0","0","1","1","1","10","200","0","0","100","75","510","320","0","0","0","0","N","25000","4w into lane, held","chase 2o turn, bid 4w turning for home,took over, held sway","7.30","3.80","2.70","Y","000000002103TE","TE","Barbara","Robert"," ","000001976480O6","O6","Averill","Bradley","E."," ","N","0","N","","0","","87","Lansdon B. Robbins & Kevin Callahan","000000257611TE","000000002695JE"
> 
> 