Prabhu,

It's possible to do what you're asking, but it's not especially efficient. You can run SplitText twice (splitting into 10,000 lines and then into 1 line), outputting the header on each split, and then run the result through ExtractText. Your regex would be something like ([^,]*?),([^,]*),.... so that each group matches 0 or more non-comma characters followed by a comma. ExtractText will place the matched capture groups into attributes like you mentioned (data.1 -> the_captured_text).
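
To see why [^,]* copes with the empty columns where (.+) does not, here is a rough sketch of the same pattern in plain Python, outside NiFi. The five groups are assumed from your sample header (No,Name,Age,PAN,City), and the extract() helper and its data.N naming are only an illustration of what you described, not ExtractText's actual implementation.

```python
import re

# One capture group per column. [^,]* matches zero or more non-comma
# characters, so an empty field still yields an (empty) group.
# Assumption: exactly five columns, taken from the sample header.
ROW = re.compile(r"^([^,]*),([^,]*),([^,]*),([^,]*),([^,]*)")


def extract(row, prefix="data"):
    """Hypothetical helper: map capture groups to data.1, data.2, ..."""
    match = ROW.match(row)
    if match is None:
        return {}
    return {f"{prefix}.{i}": value
            for i, value in enumerate(match.groups(), start=1)}


# Sample rows from the original message (the first has a trailing comma).
rows = ["1,Siva,22,91230,Londan,", "2,,23,91231,UK", "3,Greck,22,,US"]
for row in rows:
    print(extract(row))
```

With this pattern the second row keeps its position information: data.2 is an empty string rather than being skipped, so data.3 is still the Age column for every row.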

However, it's not very efficient, or at least it hasn't been in my case, because you're moving the FlowFile contents into attributes, and attributes are stored in memory. Depending on how large the file is, you *may* experience excessive GC activity or OOM errors. Using InferAvroSchema (if you don't know the schema in advance) and then ConvertCSVToAvro may be a better option, depending on where the data is ultimately going. One caveat, though, is that ConvertCSVToAvro seems to only work with properly quoted and escaped CSV that conforms to RFC 4180.

I'm just getting started with NiFi myself, so I'm not an expert or anything, but I hope that helps.

-Jason

On Tue, Nov 22, 2016 at 3:34 AM, prabhu Mahendran <[email protected]> wrote:

> Hi All,
>
> I have CSV unstructured data with comma as delimiter which contains 100
> rows.
>
> Is it possible to extract the data's in csv file using comma as seperator
> in nifi processors.
>
> *See my Sample data 3 from 100 rows.*
>
> *No,Name,Age,PAN,City*
> *1,Siva,22,91230,Londan,*
> *2,,23,91231,UK*
> *3,Greck,22,,US*
>
> In 1st row having all values which can be seperated by "data" attribute
> having regex *(.+),(.+),(.+),(.+),(.+)* then row will be split like
> below..,
>
> data.1-->1
> data.2-->Siva
> data.3-->22
> data.4-->91230
> data.5-->Londan
>
> But in Second row which having Empty values in Name column can using regex
> (.+),,(.+),(.+),(.+) then row will be split like below..,
>
> data.1-->2
> data.2-->23
> data.3-->91231
> data.4-->UK
>
> Third row same as PAN Column empty it can able to split using another
> regex attribute.
>
> But my problem is now data having 100 rows.In future this may having
> another 100 rows.So again need to write more regex attributes to capture
> group wise .
>
> *So I think i have given comma(,) as common regex for all rows in csv
> file then it will split data as into data.1,data.2,...data.5 *
>
> *But i gets an validation failed error in Bulletins Indicator in
> ExtractTextProcessor.So is this possible to write delimiter wise splitting
> of rows in CSV File?Is this possible to write common regex for all csv data
> in ExtractText or any other processor?*
>
