Hello. I am trying to extract the header line and a count of the data lines
from a flowfile in CSV format.

Most of us are familiar with Matt B's outstanding Cookbook series, and I am
trying to use that as my starting point. Here is my Groovy code:

import org.apache.commons.io.IOUtils
import org.apache.nifi.processor.io.InputStreamCallback
import java.nio.charset.StandardCharsets

def ff = session.get()
if (!ff) return
try {
     def text = ''
     def header = ''
     def datacount = 0
     // Cast a closure with an inputStream parameter to InputStreamCallback
     session.read(ff, { inputStream ->
          text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
          // Do something with text here
          // get header from the second line of the flowfile
          // set datacount as the total line count of the file - 2
          // ...
     } as InputStreamCallback)
     // Set attributes only after session.read() returns; the session does not
     // allow the flowfile to be modified from inside its own read callback
     ff = session.putAttribute(ff, 'mdb.table.header', header)
     // Attribute values must be Strings
     ff = session.putAttribute(ff, 'mdb.table.datarecords', String.valueOf(datacount))
     session.transfer(ff, REL_SUCCESS)
} catch (e) {
     log.error('Error occurred identifying tables in mdb file', e)
     session.transfer(ff, REL_FAILURE)
}

I want to avoid the text = IOUtils.toString(...) line because, as Matt
cautions in his cookbook, it reads the entire flowfile content into memory,
and our CSV files are too large for that. I do not want to read the whole
file into the variable text.

How in Groovy can I cherry pick only the line I want from the stream (line
#2 in this case)?
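One way to pick out just line 2 is to wrap the stream in a BufferedReader and call readLine() until you reach the line you want, then stop. Below is a minimal sketch in plain Groovy so it can run outside NiFi. readNthLine is a hypothetical helper name, the sample CSV is made-up data, and the content is assumed to be UTF-8. The commented lines at the end show one assumed way to call it from an ExecuteScript body.

```groovy
import java.nio.charset.StandardCharsets

// Hypothetical helper: return line n (1-based) of the stream, or null if the
// stream has fewer than n lines. It stops reading after line n (BufferedReader
// may buffer a few KB past that point, but never the whole file).
// The reader is deliberately not closed: the caller (e.g. NiFi) owns the stream.
String readNthLine(InputStream inputStream, int n) {
    def reader = new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8))
    String line = null
    for (int i = 0; i < n; i++) {
        line = reader.readLine()
        if (line == null) {
            return null // stream ended before line n
        }
    }
    return line
}

// Standalone demo: an in-memory CSV (made-up data) stands in for flowfile content
def csv = "report title\nid,name,qty\n1,apple,3\n2,pear,5\n"
def header = readNthLine(new ByteArrayInputStream(csv.getBytes(StandardCharsets.UTF_8)), 2)
println header

// Assumed ExecuteScript usage: capture the value inside the callback and set
// the attribute only after session.read() has returned:
//   def hdr = null
//   session.read(ff, { inputStream -> hdr = readNthLine(inputStream, 2) } as InputStreamCallback)
//   ff = session.putAttribute(ff, 'mdb.table.header', hdr ?: '')
```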

Also, how can I get a count of the total lines without loading them all
into text?
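Counting works the same way: loop over readLine() and increment a counter, so only the current line is ever held in memory. The sketch below does both jobs in one streaming pass, keeping line 2 as the header and computing the data count as the total line count minus 2, following the rule in the script's comments. scanCsv is a hypothetical name, and the sample data and UTF-8 encoding are assumptions. Note that this counts text lines, not CSV records: if quoted fields can contain embedded newlines, the line count will overstate the number of records.

```groovy
import java.nio.charset.StandardCharsets

// Hypothetical helper: one streaming pass that keeps line 2 as the header and
// counts every line. Only the current line is held in memory.
// The reader is not closed, because the caller owns the underlying stream.
Map scanCsv(InputStream inputStream) {
    def reader = new BufferedReader(new InputStreamReader(inputStream, StandardCharsets.UTF_8))
    String header = null
    long total = 0
    String line
    while ((line = reader.readLine()) != null) {
        total++
        if (total == 2) {
            header = line // second line holds the column header
        }
    }
    // Data records = total lines - 2, clamped so very short files give 0
    long dataCount = Math.max(0L, total - 2)
    return [header: header, total: total, dataCount: dataCount]
}

// Standalone demo with made-up data
def csv = "report title\nid,name,qty\n1,apple,3\n2,pear,5\n"
def result = scanCsv(new ByteArrayInputStream(csv.getBytes(StandardCharsets.UTF_8)))
println result

// Assumed ExecuteScript usage:
//   def stats = null
//   session.read(ff, { inputStream -> stats = scanCsv(inputStream) } as InputStreamCallback)
//   ff = session.putAttribute(ff, 'mdb.table.header', stats.header ?: '')
//   ff = session.putAttribute(ff, 'mdb.table.datarecords', String.valueOf(stats.dataCount))
```

Reading the flowfile once for both values avoids a second pass over a large file.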

Thanks in advance for your help.
