Hi,

Some time back (a couple of years ago) I used split with xtokenize on a large XML file at three levels, and it worked very well. Two of the three elements had the same tags, but I needed elements from each level, so I did the split/tokenize in separate routes: split/tokenize to get an element, call the next route, which split/tokenized the next section(s), and then finally the third level. I used the XML DSL (this was with Camel 2.15.1, so I'm sure it will still work). The file had 500,000 to 1,000,000 rows, and I tried different approaches to handle both the size and the processing speed (Claus had a good article on processing large XML files, which I followed).
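A minimal sketch of why a path-aware streaming split (roughly the idea behind a path-based splitter such as xtokenize) copes with a repeated tag where the regexp-based tokenizeXML does not: the parser tracks the element path from the root and only starts a fragment when that path matches exactly, so a nested element with the same name is copied into its parent's fragment instead of ending it. This uses plain JDK StAX (javax.xml.stream), not Camel's implementation; the class PathSplitter and its split method are hypothetical, paths are matched on local names only (namespaces are dropped), and text and attribute values are not re-escaped.

```java
import java.io.StringReader;
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper, not a Camel API.
final class PathSplitter {

    private PathSplitter() {}

    /**
     * Streams xml and returns one serialized fragment per element whose
     * local-name path from the root equals targetPath (e.g. "/Root/Rpt/Tx").
     * A nested element with the same name has a longer path, so it is copied
     * into the enclosing fragment instead of starting or ending one.
     * Sketch only: namespaces are dropped and text/attribute values are not re-escaped.
     */
    static List<String> split(String xml, String targetPath) {
        try {
            XMLStreamReader r = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new StringReader(xml));
            Deque<String> path = new ArrayDeque<>(); // local names from the root down
            List<String> fragments = new ArrayList<>();
            StringBuilder current = null;           // non-null while inside a matched element
            int matchDepth = -1;                    // path length at which the match started

            while (r.hasNext()) {
                int event = r.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    path.addLast(r.getLocalName());
                    if (current == null && ("/" + String.join("/", path)).equals(targetPath)) {
                        current = new StringBuilder();
                        matchDepth = path.size();
                    }
                    if (current != null) {
                        current.append('<').append(r.getLocalName());
                        for (int i = 0; i < r.getAttributeCount(); i++) {
                            current.append(' ').append(r.getAttributeLocalName(i))
                                   .append("=\"").append(r.getAttributeValue(i)).append('"');
                        }
                        current.append('>');
                    }
                } else if (event == XMLStreamConstants.CHARACTERS && current != null) {
                    current.append(r.getText());
                } else if (event == XMLStreamConstants.END_ELEMENT) {
                    if (current != null) {
                        current.append("</").append(r.getLocalName()).append('>');
                        if (path.size() == matchDepth) { // the matched element itself closed
                            fragments.add(current.toString());
                            current = null;
                        }
                    }
                    path.removeLast();
                }
            }
            r.close();
            return fragments;
        } catch (XMLStreamException e) {
            throw new IllegalArgumentException("Malformed XML", e);
        }
    }
}
```

For the document further down this thread the call would be PathSplitter.split(xml, "/UVMiFIRDocument/Document/Document/FinInstrmRptgTxRpt/Tx"), which yields one fragment per outer Tx, with the inner Tx and the elements after it (FinInstrm, ExctgPrsn, AddtlAttrbts) kept inside. Collecting whole strings in a list is only for the sketch; with files of 500,000+ rows you would hand each fragment on as soon as it completes.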
Cheers,
Bob

On Wed, Jun 17, 2020 at 1:22 PM Claus Ibsen <claus.ib...@gmail.com> wrote:
> Hi
>
> It can be both; however, it's a little better as an argument:
>
> split(xpath(xxxx), xxxx)
>
> On Wed, Jun 17, 2020 at 1:10 PM Mikael Andersson Wigander
> <mikael.grevs...@gmail.com> wrote:
> >
> > Hi
> >
> > Makes sense.
> >
> > I tried splitting using xpath, but it didn't work either.
> > Should xpath be an argument of split or its own statement (.xpath())?
> >
> > // Mikael Andersson Wigander
> >
> > > On 17 June 2020 at 10:14, Claus Ibsen <claus.ib...@gmail.com> wrote:
> > >
> > > Hi
> > >
> > > No, tokenizeXML is not for complex XML with tags that are nested like this.
> > > It uses regexp parsing etc.
> > >
> > > Instead, use camel-stax or camel-jaxb or something like that.
> > >
> > >> On Wed, Jun 17, 2020 at 9:14 AM Mikael Andersson Wigander
> > >> <mikael.grevs...@gmail.com> wrote:
> > >>
> > >> Hi
> > >>
> > >> We have an XML file to split on the tag <Tx>.
> > >> However, this tag is also present in a node further down the tree.
> > >>
> > >> tokenizeXML is used in our application, but now it won't work because it ends prematurely.
> > >>
> > >> Here's the XML:
> > >>
> > >> <?xml version="1.0" encoding="UTF-8"?>
> > >> <UVMiFIRDocument xmlns="urn:uv:xsd:unavista.mifir.iso20022.001.001.001">
> > >>   <UVHeader>
> > >>     <UVHeader xmlns="unavista.header.001.001.001">
> > >>       <SubmittingEntityID>1312312</SubmittingEntityID>
> > >>     </UVHeader>
> > >>   </UVHeader>
> > >>   <Document>
> > >>     <Document xmlns="urn:iso:std:iso:20022:tech:xsd:DRAFT15auth.016.001.01">
> > >>       <FinInstrmRptgTxRpt>
> > >>         <Tx>
> > >>           <New>
> > >>             <TxId>197X85138XMT</TxId>
> > >>             <ExctgPty>1231231</ExctgPty>
> > >>             <InvstmtPtyInd>true</InvstmtPtyInd>
> > >>             <SubmitgPty>312312</SubmitgPty>
> > >>             <Buyr>
> > >>               <AcctOwnr>
> > >>                 <Id>
> > >>                   <LEI>123123</LEI>
> > >>                 </Id>
> > >>                 <CtryOfBrnch>NL</CtryOfBrnch>
> > >>               </AcctOwnr>
> > >>               <DcsnMakr>
> > >>                 <LEI>549300DLR3UX38D4Z689</LEI>
> > >>               </DcsnMakr>
> > >>             </Buyr>
> > >>             <Sellr>
> > >>               <AcctOwnr>
> > >>                 <Id>
> > >>                   <LEI>123123123</LEI>
> > >>                 </Id>
> > >>               </AcctOwnr>
> > >>             </Sellr>
> > >>             <OrdrTrnsmssn>
> > >>               <TrnsmssnInd>true</TrnsmssnInd>
> > >>             </OrdrTrnsmssn>
> > >>             <Tx>
> > >>               <TradDt>2020-06-05T21:18:32.000Z</TradDt>
> > >>               <TradgCpcty>AOTC</TradgCpcty>
> > >>               <Qty>
> > >>                 <NmnlVal Ccy="EUR">3.57</NmnlVal>
> > >>               </Qty>
> > >>               <Pric>
> > >>                 <Pric>
> > >>                   <MntryVal>
> > >>                     <Amt Ccy="USD">1.131818</Amt>
> > >>                   </MntryVal>
> > >>                 </Pric>
> > >>               </Pric>
> > >>               <TradVn>XOFF</TradVn>
> > >>             </Tx>
> > >>             <FinInstrm>
> > >>               <Othr>
> > >>                 <FinInstrmGnlAttrbts>
> > >>                   <FullNm>USD/EUR</FullNm>
> > >>                   <ClssfctnTp>JFTXFP</ClssfctnTp>
> > >>                   <NtnlCcy>USD</NtnlCcy>
> > >>                 </FinInstrmGnlAttrbts>
> > >>                 <DerivInstrmAttrbts>
> > >>                   <XpryDt>2020-06-09</XpryDt>
> > >>                   <PricMltplr>1</PricMltplr>
> > >>                   <UndrlygInstrm>
> > >>                     <Othr>
> > >>                       <Sngl>
> > >>                         <Indx>
> > >>                           <Nm>
> > >>                             <RefRate>
> > >>                               <Nm>USD/EUR</Nm>
> > >>                             </RefRate>
> > >>                           </Nm>
> > >>                         </Indx>
> > >>                       </Sngl>
> > >>                     </Othr>
> > >>                   </UndrlygInstrm>
> > >>                   <DlvryTp>PHYS</DlvryTp>
> > >>                 </DerivInstrmAttrbts>
> > >>               </Othr>
> > >>             </FinInstrm>
> > >>             <ExctgPrsn>
> > >>               <Clnt>NORE</Clnt>
> > >>             </ExctgPrsn>
> > >>             <AddtlAttrbts>
> > >>               <SctiesFincgTxInd>false</SctiesFincgTxInd>
> > >>             </AddtlAttrbts>
> > >>           </New>
> > >>         </Tx>
> > >>       </FinInstrmRptgTxRpt>
> > >>     </Document>
> > >>   </Document>
> > >> </UVMiFIRDocument>
> > >>
> > >> The debugger reveals that the split result is "broken":
> > >>
> > >> <Tx>
> > >>   <New>
> > >>     <TxId>197X85138XMT</TxId>
> > >>     <ExctgPty>549300DLR3UX38D4Z689</ExctgPty>
> > >>     <InvstmtPtyInd>true</InvstmtPtyInd>
> > >>     <SubmitgPty>549300FVRWYPDFJTH118</SubmitgPty>
> > >>     <Buyr>
> > >>       <AcctOwnr>
> > >>         <Id>
> > >>           <LEI>5493000WZY3YLO3WB727</LEI>
> > >>         </Id>
> > >>         <CtryOfBrnch>NL</CtryOfBrnch>
> > >>       </AcctOwnr>
> > >>       <DcsnMakr>
> > >>         <LEI>549300DLR3UX38D4Z689</LEI>
> > >>       </DcsnMakr>
> > >>     </Buyr>
> > >>     <Sellr>
> > >>       <AcctOwnr>
> > >>         <Id>
> > >>           <LEI>5493006KMX1VFTPYPW14</LEI>
> > >>         </Id>
> > >>       </AcctOwnr>
> > >>     </Sellr>
> > >>     <OrdrTrnsmssn>
> > >>       <TrnsmssnInd>true</TrnsmssnInd>
> > >>     </OrdrTrnsmssn>
> > >>     <Tx>
> > >>       <TradDt>2020-06-05T21:18:32.000Z</TradDt>
> > >>       <TradgCpcty>AOTC</TradgCpcty>
> > >>       <Qty>
> > >>         <NmnlVal Ccy="EUR">3.57</NmnlVal>
> > >>       </Qty>
> > >>       <Pric>
> > >>         <Pric>
> > >>           <MntryVal>
> > >>             <Amt Ccy="USD">1.131818</Amt>
> > >>           </MntryVal>
> > >>         </Pric>
> > >>       </Pric>
> > >>       <TradVn>XOFF</TradVn>
> > >>     </Tx>
> > >>
> > >> Can this be done using tokenizeXML, or is there another way?
> > >>
> > >> Thx
> > >
> > > --
> > > Claus Ibsen
> > > -----------------
> > > http://davsclaus.com @davsclaus
> > > Camel in Action 2: https://www.manning.com/ibsen2

--
Bob Anderson
+27 (0) 82 389 0335
LinkedIn: http://ng.linkedin.com/pub/bob-anderson/2/25/9b5
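On Claus's split(xpath(...)) suggestion: one thing worth checking when an XPath split seems not to work on this document is namespaces. Every element is in a default namespace (urn:uv:xsd:unavista.mifir.iso20022.001.001.001 at the top, urn:iso:std:iso:20022:tech:xsd:DRAFT15auth.016.001.01 from the inner Document down), and in XPath 1.0 an unprefixed name such as Tx only matches elements in no namespace, so on a namespace-aware parse //Tx selects nothing. Even a namespace-qualified //iso:Tx would also select the inner Tx, so the expression below is anchored on FinInstrmRptgTxRpt. This is a standalone sketch using only the JDK's javax.xml.xpath, not a Camel route; the class XPathTxSplit, the method splitTx and the prefixes uv and iso are hypothetical (any prefixes work as long as they map to the document's URIs). In a Camel route the same expression would go inside split(xpath(...)), with the prefixes bound through Camel's Namespaces helper (check the XPath language docs for your Camel version).

```java
import java.io.StringReader;
import java.io.StringWriter;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Iterator;
import java.util.List;
import javax.xml.XMLConstants;
import javax.xml.namespace.NamespaceContext;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.dom.DOMSource;
import javax.xml.transform.stream.StreamResult;
import javax.xml.xpath.XPath;
import javax.xml.xpath.XPathConstants;
import javax.xml.xpath.XPathFactory;
import org.w3c.dom.Document;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper, not a Camel API.
final class XPathTxSplit {

    // The prefixes are arbitrary; only the namespace URIs must match the document.
    static final String UV_NS = "urn:uv:xsd:unavista.mifir.iso20022.001.001.001";
    static final String ISO_NS = "urn:iso:std:iso:20022:tech:xsd:DRAFT15auth.016.001.01";

    // Only Tx elements that are direct children of the report are selected;
    // the inner Tx (a child of New) stays inside its parent's fragment.
    static final String TX_PATH =
            "/uv:UVMiFIRDocument/uv:Document/iso:Document/iso:FinInstrmRptgTxRpt/iso:Tx";

    private XPathTxSplit() {}

    static List<String> splitTx(String xml) {
        try {
            DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
            dbf.setNamespaceAware(true); // off by default; prefixed XPath needs it
            Document doc = dbf.newDocumentBuilder().parse(new InputSource(new StringReader(xml)));

            XPath xpath = XPathFactory.newInstance().newXPath();
            xpath.setNamespaceContext(new NamespaceContext() {
                @Override
                public String getNamespaceURI(String prefix) {
                    if ("uv".equals(prefix)) return UV_NS;
                    if ("iso".equals(prefix)) return ISO_NS;
                    return XMLConstants.NULL_NS_URI;
                }
                @Override
                public String getPrefix(String uri) { return null; } // not needed for this sketch
                @Override
                public Iterator<String> getPrefixes(String uri) { return Collections.emptyIterator(); }
            });

            NodeList txs = (NodeList) xpath.evaluate(TX_PATH, doc, XPathConstants.NODESET);

            // Serialize each selected element back to a string (one split message each).
            Transformer t = TransformerFactory.newInstance().newTransformer();
            t.setOutputProperty(OutputKeys.OMIT_XML_DECLARATION, "yes");
            List<String> out = new ArrayList<>();
            for (int i = 0; i < txs.getLength(); i++) {
                StringWriter sw = new StringWriter();
                t.transform(new DOMSource(txs.item(i)), new StreamResult(sw));
                out.add(sw.toString());
            }
            return out;
        } catch (Exception e) { // parser, XPath and transformer exceptions are all checked
            throw new IllegalArgumentException("Could not split <Tx> elements", e);
        }
    }
}
```

Note that this approach parses the whole file into a DOM before evaluating the expression, so for files in the 500,000 to 1,000,000 row range mentioned above, a streaming approach such as the camel-stax route Claus suggests is likely to scale better.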