Hi,

This may be a long shot, as I don't know how many combinations of column lengths with | and + there are, but you could try the ReplaceTextWithMapping processor, with all the combinations of +--, | etc. in a text file alongside what each one represents in terms of counts, e.g.

+--    [0]
| +--    [1]
| +--    [3]
etc. (tab separated)

Also, I'm not particularly experienced with sed, awk etc., but I'm guessing some bash guru could come up with a script that does this, which could be called from the ExecuteScript processor (or, for a shell script, ExecuteStreamCommand).

Regards
Conrad

From: Pat Trainor
Date: Sunday, 5 June 2016 at 18:33
Subject: Nifi & Parsey McParseface! RegEx in a Processor...

I have had success using the ReplaceText processor out of the box to modify the output of a NiFi-called script. I'm applying NiFi to running the Parsey McParseface system (SyntaxNet) from Google. The output of the application looks like this:

---
Input: It is to two English scholars , father and son , Edward Pococke , senior and junior , that the world is indebted for the knowledge of one of the most charming productions Arabian philosophy can boast of .
Parse:
is VBZ ROOT
+-- It PRP nsubj
+-- to IN prep
| +-- scholars NNS pobj
| +-- two CD num
| +-- English JJ amod
| +-- , , punct
| +-- father NN conj
| | +-- and CC cc
| | +-- son NN conj
| +-- Pococke NNP appos
[...]
---

As you can see, my ExecuteProcessorStream is working fine. But there is an important piece of information that needs to be extracted from this text. The ReplaceText processor I used (the first one) is shown in the attachment. It only removes characters, but how many 'spaces' precede each '+' sign is important. Simply removing the leading spaces, + and | characters moves the first word of each line to the first column, without telling you how many columns over the word started in the original input. What is needed is a way to count the number of columns at the beginning of each line that precede the first alphanumeric character.
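Counting those columns is simple in a script. The sketch below uses plain Python and leaves out the wiring into a NiFi processor. It is a minimal sketch, assuming a tree prefix consists only of spaces and `|` characters followed by the `+-- ` branch marker. Rather than searching for the first alphanumeric character, it measures the width of that prefix, because on punctuation lines such as `, , punct` the first alphanumeric character is the `p` of the label, not the token. The names `word_column` and `first_alnum_column` are hypothetical.

```python
import re

# Assumed shape of a tree prefix: any run of spaces and '|' characters,
# then the '+-- ' branch marker. The root line has no prefix at all.
TREE_PREFIX = re.compile(r"^[ |]*\+-- ")
FIRST_ALNUM = re.compile(r"[A-Za-z0-9]")


def word_column(line):
    """Hypothetical helper: 0-based column where the token itself starts."""
    m = TREE_PREFIX.match(line)
    return m.end() if m else 0


def first_alnum_column(line):
    """Naive approach: 0-based column of the first alphanumeric character."""
    m = FIRST_ALNUM.search(line)
    return m.start() if m else -1


if __name__ == "__main__":
    for line in ["is VBZ ROOT", " +-- It PRP nsubj", " |       +-- , , punct"]:
        print(word_column(line), first_alnum_column(line), line.strip())
```

For the punctuation line, `word_column` gives 13 (where the `,` token starts) while the first-alphanumeric approach gives 17, which is why the prefix-based measurement is the safer of the two.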
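On Conrad's question of how many combinations a ReplaceTextWithMapping file would need, the count can be estimated under an assumption about the unflattened layout. The assumption is that each level is drawn as a leading space, then either `|   ` or four spaces for every intermediate level, then `+-- `. Under that layout, a word at depth d can be preceded by 2^(d-1) distinct prefixes, so the mapping file roughly doubles with every extra level. The hypothetical generator below writes tab-separated prefix and `[depth]` pairs, numbering depth from the root as Pat does (root = [0]). It does not verify whether ReplaceTextWithMapping can be configured to match keys that contain spaces.

```python
from itertools import product


def prefixes(depth):
    """Yield every tree prefix for a word at `depth` (>= 1), assuming a
    leading space, one 4-character segment per intermediate level, then '+-- '."""
    segments = ("|   ", "    ")  # ancestor has later siblings / has none
    for combo in product(segments, repeat=depth - 1):
        yield " " + "".join(combo) + "+-- "


def mapping_lines(max_depth):
    """Hypothetical mapping-file lines: prefix, a tab, then '[depth]'."""
    lines = []
    for depth in range(1, max_depth + 1):
        for p in prefixes(depth):
            lines.append(p + "\t[" + str(depth) + "]")
    return lines


if __name__ == "__main__":
    for depth in range(1, 6):
        print(depth, sum(1 for _ in prefixes(depth)))
```

Depths 1 to 5 need 1, 2, 4, 8 and 16 prefixes respectively. A mapping file is therefore workable for shallow parses, but a script that computes the column directly scales better.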
It doesn't matter whether the same processor also performs the cleanup my present efforts achieve:

Input: It is to two English scholars , father and son , Edward Pococke , senior and junior , that the world is indebted for the knowledge of one of the most charming productions Arabian philosophy can boast of .
Parse:
is VBZ ROOT
It PRP nsubj
to IN prep
[...]

I am hoping to somehow use expressions (a la ${line:blah...) in NiFi, or another mechanism I'm not aware of, to gather the column count and make it available for later processing/storage:

[0]is VBZ ROOT
[1]It PRP nsubj
[1]to IN prep
[2] ...

with [X] being the number of columns from the left at which the alphanumeric character started. The reasoning is that the position signifies how 'important' that attribute is in the sentence. It looks like a tree, but the number (indentation) is the length of the branch the word is on.

Is there a clever way to accomplish most or all of this in NiFi, either with () regex groups or named attributes?

Thanks!
pat<http://about.me/PatTrainor>
( ͡° ͜ʖ ͡°)
"A wise man can learn more from a foolish question than a fool can learn from a wise answer." ~ Bruce Lee
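Following Conrad's ExecuteScript suggestion, the whole transformation into Pat's `[X]word` format can be sketched as one function over the script's text output. It is a minimal sketch, assuming the unflattened SyntaxNet layout uses a four-character step per level. Under that assumption, a word at depth d (d >= 1) starts at column 4d + 1, so integer division of the column by four gives the depth, and the root starts at column 0. The name `to_depth_lines` is hypothetical. `Input:` lines and blank lines are passed through unchanged.

```python
import re

# Assumed tree prefix: spaces and '|' characters, then the '+-- ' marker.
TREE_PREFIX = re.compile(r"^[ |]*\+-- ")


def to_depth_lines(text):
    """Rewrite each parse line as '[depth]token POS label'.

    Assumes a word at depth d >= 1 starts at column 4*d + 1 (so
    column // 4 == d) and the root word starts at column 0.
    """
    out = []
    in_parse = False
    for line in text.splitlines():
        if line.startswith("Input:"):
            in_parse = False          # a new sentence begins
            out.append(line)
        elif line.startswith("Parse:"):
            in_parse = True           # tree lines follow
            out.append(line)
        elif not in_parse or not line.strip():
            out.append(line)          # pass through anything else
        else:
            m = TREE_PREFIX.match(line)
            column = m.end() if m else 0
            out.append("[%d]%s" % (column // 4, line[column:]))
    return "\n".join(out)


if __name__ == "__main__":
    # Hypothetical input laid out in the assumed SyntaxNet style.
    sample = "\n".join([
        "Parse:",
        "is VBZ ROOT",
        " +-- It PRP nsubj",
        " +-- to IN prep",
        " |   +-- scholars NNS pobj",
    ])
    print(to_depth_lines(sample))
```

On that sample, the output lines are `Parse:`, `[0]is VBZ ROOT`, `[1]It PRP nsubj`, `[1]to IN prep` and `[2]scholars NNS pobj`, which matches the numbering Pat describes. The bracketed depth could then be split off into an attribute by a later processor.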
