Does anybody have any thoughts on UTF-8 flow files with TransformXml and other processors?
Anuj

On Mon, Jun 13, 2016 at 4:45 PM, Anuj Handa <[email protected]> wrote:

> So it seems like it's a UTF-8 issue. When I changed the string to use Hex
> instead of Text and used the hex code with 00 (2 bytes per character),
> SplitContent worked.
>
> <POSTransaction xmlns is the string I was looking to split on, which
> translates into the following hex code:
>
> *3c0050004f0053005400720061006e00730061006300740069006f006e00200078006d006c006e007300*
>
> TransformXml is now failing, I think because of the UTF-8. I know I had
> it working with a normal ASCII file.
>
> Do I need to specify someplace that the flow files are UTF-8, or is it smart
> enough to figure it out on its own?
> Based on some reading I see that some processors expect UTF-8, so the next
> question would be: do all processors support UTF-8?
>
> Anuj
>
> On Mon, Jun 13, 2016 at 3:01 PM, Anuj Handa <[email protected]> wrote:
>
>> Thanks Joe, unfortunately since my XML has namespaces (xmlns) that
>> approach won't work.
>> Any thoughts on why the split doesn't work using the tag? Does it accept
>> UTF-8 flow files?
>>
>> Anuj
>>
>> On Mon, Jun 13, 2016 at 2:50 PM, ski n <[email protected]> wrote:
>>
>>> You can also make your input XML well-formed by creating a custom root
>>> element (e.g. <PostTransactions>...xmldocuments</PostTransactions>)
>>> and then use the SplitXml processor (or just the transformation step).
>>>
>>> 2016-06-13 18:04 GMT+02:00 Anuj Handa <[email protected]>:
>>>
>>>> I have a text file which contains multiple XML documents, each of which
>>>> starts with
>>>> <POSTransaction
>>>> xmlns
>>>> I am trying to break each one of the XML docs into 1 flow file so I can
>>>> then use evaluate XML, convert to JSON and load into a
>>>> database.
>>>>
>>>> I tried just SplitContent and that didn't work. The file is UTF-8,
>>>> not sure if that plays into it. I am running NiFi on Linux and the
>>>> file is also local on Linux.
>>>>
>>>> [image: Inline image 1]
>>>>
>>>> This is my entire workflow.
>>>>
>>>> [image: Inline image 2]
>>>>
>>>> On Mon, Jun 13, 2016 at 11:43 AM, Joe Percivall <[email protected]> wrote:
>>>>
>>>>> Awesome, and what processor were you planning to use to split on
>>>>> "#|#|#"? The SplitContent processor[1] can be used to split the content on
>>>>> a sequence of text characters which could split on "<POSTransaction xmlns"
>>>>> without needing to add "#|#|#".
>>>>>
>>>>> Also I see "xmlns" and think this is an xml file you are trying to
>>>>> split. If so are you by chance trying to split evenly on each child? If so
>>>>> the "SplitXml" processor[2] would easily take care of that.
>>>>>
>>>>> [1]
>>>>> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitContent/index.html
>>>>> [2]
>>>>> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.SplitXml/index.html
>>>>>
>>>>> Joe
>>>>> - - - - - -
>>>>> Joseph Percivall
>>>>> linkedin.com/in/Percivall
>>>>> e: [email protected]
>>>>>
>>>>> On Monday, June 13, 2016 11:26 AM, Anuj Handa <[email protected]> wrote:
>>>>>
>>>>> Yes that's exactly correct.
>>>>>
>>>>> > On Jun 13, 2016, at 11:14 AM, Joe Percivall <[email protected]> wrote:
>>>>> >
>>>>> > Sorry I got a bit confused, in your original question you said that
>>>>> > you wanted to append the value and I took it that you just wanted to
>>>>> > append the value to the end of the line or text.
>>>>> >
>>>>> > Let me try and restate your goal so I'm sure I understand,
>>>>> > ultimately you want to split the incoming FlowFile on each occurrence of
>>>>> > "<POSTransaction xmlns" and you are planning on using ReplaceText to add
>>>>> > "#|#|#" before each occurrence so that it will be easy to split?
>>>>> >
>>>>> > Joe
>>>>> > - - - - - -
>>>>> > Joseph Percivall
>>>>> > linkedin.com/in/Percivall
>>>>> > e: [email protected]
>>>>> >
>>>>> > On Monday, June 13, 2016 11:05 AM, Anuj Handa <[email protected]> wrote:
>>>>> >
>>>>> > Anuj
>>>>> > Hi Joe,
>>>>> >
>>>>> > I modified the process per your suggestion but it only replaces
>>>>> > the first occurrence. There are multiple such tags which it
>>>>> > doesn't replace.
>>>>> > When I used evaluation mode line by line it appended it to every
>>>>> > line in the file and not just to the one I wanted.
>>>>> >
>>>>> > On Mon, Jun 13, 2016 at 10:40 AM, Joe Percivall <[email protected]> wrote:
>>>>> >
>>>>> > Hello,
>>>>> >>
>>>>> >> In order to use ReplaceText[1] to solely append a value to the end
>>>>> >> of the entire text, change the "Replacement Strategy" to "Append" and
>>>>> >> leave "Evaluation Mode" as "Entire Text". This will take whatever is the
>>>>> >> "Replacement Value" and append it as a literal (without interpreting
>>>>> >> back-references) to the end of the text.
>>>>> >>
>>>>> >> Alternatively, if you want to append to the end of each line then
>>>>> >> change "Evaluation Mode" to "Line-by-Line".
>>>>> >>
>>>>> >> [1]
>>>>> >> https://nifi.apache.org/docs/nifi-docs/components/org.apache.nifi.processors.standard.ReplaceText/index.html
>>>>> >>
>>>>> >> Hope that helps,
>>>>> >> Joe
>>>>> >> - - - - - -
>>>>> >> Joseph Percivall
>>>>> >> linkedin.com/in/Percivall
>>>>> >> e: [email protected]
>>>>> >>
>>>>> >> On Monday, June 13, 2016 10:05 AM, Anuj Handa <[email protected]> wrote:
>>>>> >>
>>>>> >> Hi,
>>>>> >>
>>>>> >> I am trying to read a file and then use ReplaceText to append a
>>>>> >> string so I can split the line in the next step. I am unable to make
>>>>> >> ReplaceText work.
>>>>> >> The flowfile is going through as success without the string being
>>>>> >> appended or replaced.
>>>>> >>
>>>>> >> Any thoughts on what I could be doing wrong?
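
A note on the hex string quoted near the top of the thread: every character is followed by a 00 byte, which is what UTF-16 little-endian looks like, not UTF-8 (UTF-8 encodes these ASCII characters as single bytes). The sketch below is plain Python, outside NiFi, and only checks which encoding reproduces the quoted hex, then splits a decoded buffer on the marker. The function names `guess_encoding` and `split_documents` are made up for illustration, and nothing here says how SplitContent or TransformXml treat encodings internally.

```python
from typing import List, Optional

MARKER = "<POSTransaction xmlns"

# Hex pattern quoted in the thread (the one that made the split match).
HEX_FROM_THREAD = (
    "3c0050004f0053005400720061006e00730061006300740069006f006e00"
    "200078006d006c006e007300"
)


def guess_encoding(marker: str, observed_hex: str) -> Optional[str]:
    """Return the first candidate encoding whose bytes match observed_hex."""
    for name in ("utf-8", "utf-16-le", "utf-16-be"):
        if marker.encode(name).hex() == observed_hex.lower():
            return name
    return None


def split_documents(data: bytes, marker: str, encoding: str) -> List[str]:
    """Split concatenated XML documents, keeping the marker on each piece."""
    text = data.decode(encoding).lstrip("\ufeff")  # drop a leading BOM if any
    pieces = text.split(marker)
    # pieces[0] is whatever came before the first marker; skip it.
    return [marker + piece for piece in pieces[1:]]


if __name__ == "__main__":
    print(guess_encoding(MARKER, HEX_FROM_THREAD))
```

If the file really is UTF-16, each piece returned by `split_documents` could be re-encoded with `.encode("utf-8")` before further processing; whether that is needed for the XML transformation step is an assumption to verify against the actual file and flow.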
