This may sound odd, but I am hoping to use a transform script that takes a
file path as its input, reads that file, and produces a set of rows in
Hive. In this case the data is pcaps. I have a location accessible to all
nodes, and I want my transform script to read in a file location and then
emit, for example, the IP addresses seen in the packet capture (using a
script I've already written). Can I load my file locations into a Hive
table (one file per row), feed that table into a transform script, and
get only one map task per source row? I don't want a single script
invocation to parse several files, since that would parallelize poorly,
but I am having trouble forcing such a small record count per map task.
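
For concreteness, here is a minimal sketch of the kind of transform script
described above, assuming Hive streams one tab-separated row per line on
stdin and reads tab-separated rows back from stdout. extract_ips is a
hypothetical stand-in for the existing pcap parser (here it only reads one
address per line from a text file), and run_transform is an illustrative
name.

```python
import sys


def extract_ips(path):
    """Hypothetical placeholder for the existing pcap parser.

    Assumption for this sketch only: the file at `path` is plain text with
    one IP address per line. A real version would parse the capture.
    """
    with open(path) as handle:
        return [line.strip() for line in handle if line.strip()]


def run_transform(lines, out, extractor=extract_ips):
    """Read one file path per input row and emit one (path, ip) row per IP."""
    for line in lines:
        # Hive passes columns tab-separated; the path is the first column.
        path = line.rstrip("\n").split("\t")[0]
        if not path:
            continue  # skip empty rows
        for ip in extractor(path):
            out.write(f"{path}\t{ip}\n")


if __name__ == "__main__":
    run_transform(sys.stdin, sys.stdout)
```

It could be invoked along the lines of
SELECT TRANSFORM(path) USING 'python pcap_ips.py' AS (path, ip) FROM pcap_files;
where pcap_files, path, and pcap_ips.py are made-up names. The script
handles however many rows it is given, so it does not by itself control
how many rows each map task receives.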

Thoughts?
