Hello. I am trying to extract the header line and a count of the data lines
from a flowfile that is in CSV format.
Most of us are familiar with Matt B's outstanding Cookbook series, and I am
trying to use that as my starting point. Here is my Groovy code:
import org.apache.commons.io.IOUtils
import java.nio.charset.StandardCharsets

def ff = session.get()
if (!ff) return
try {
    def text = ''
    // Declared out here so they are still visible after the callback returns;
    // both are assigned in the elided step below
    def header = null
    def datacount = null
    // Cast a closure with an inputStream parameter to InputStreamCallback
    session.read(ff, { inputStream ->
        text = IOUtils.toString(inputStream, StandardCharsets.UTF_8)
        // Do something with text here
        // get header from the second line of the flowfile
        // set datacount as the total line count of the file - 2
        ...
    } as InputStreamCallback)
    // Set attributes only after the read callback has finished, since the
    // flowfile cannot be modified while it is still being read.
    // Attribute values must be Strings.
    ff = session.putAttribute(ff, 'mdb.table.header', header)
    ff = session.putAttribute(ff, 'mdb.table.datarecords', String.valueOf(datacount))
    session.transfer(ff, REL_SUCCESS)
} catch (e) {
    log.error('Error occurred identifying tables in mdb file', e)
    session.transfer(ff, REL_FAILURE)
}
I want to avoid the IOUtils.toString line (highlighted in red in my original
message). Our CSV files are very large and, as Matt cautions in his cookbook,
reading the entire content into memory is a problem for large files. I do not
want to load the whole file into the variable text.
How in Groovy can I cherry pick only the line I want from the stream (line
#2 in this case)?
Also, how can I get a count of the total lines without loading them all
into text?
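To make the question concrete, here is a minimal sketch of the kind of thing I am picturing, assuming that wrapping the stream in a BufferedReader and reading it one line at a time is acceptable inside the callback. The helper name scanLines and the map it returns are my own invention for illustration, not something from the cookbook, so please treat this as a starting point rather than a confirmed solution:

```groovy
import java.nio.charset.StandardCharsets

// Hypothetical helper: makes a single pass over the stream, holding only
// the current line in memory. It keeps line 2 and counts every line.
def scanLines(InputStream inputStream) {
    def header = null
    long total = 0
    def reader = new BufferedReader(
        new InputStreamReader(inputStream, StandardCharsets.UTF_8))
    String line
    while ((line = reader.readLine()) != null) {
        total++
        if (total == 2) header = line   // the header sits on line 2
    }
    return [header: header, total: total]
}

// Intended use inside ExecuteScript (assumption: attributes are set
// after the read callback has returned, not inside it):
//
// def result = null
// session.read(ff, { inputStream ->
//     result = scanLines(inputStream)
// } as InputStreamCallback)
// ff = session.putAttribute(ff, 'mdb.table.header', result.header)
// ff = session.putAttribute(ff, 'mdb.table.datarecords',
//         String.valueOf(result.total - 2))
```

The idea is that the count and the header both come out of the same pass, so the file would only be streamed once and never held in a single String.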
Thanks in advance for your help.