Re: [Jprogramming] Is is good idea to use J for reading large XML files ?

Raul Miller Thu, 09 Sep 2021 11:16:09 -0700

Ok, I went ahead and tested it, and it looks good.

There is also an issue you can see here, which has to do with memory
allocation and nested XML elements. Here's my test code:


require 'api/expat'
cocurrent 'jexpat'


expat_parse_file=: 3 : 0
'' expat_parse_file y
:
expat_init x
parser=. XML_ParserCreate <<0
f=. [: 15!:13 (IFWIN#'+') , ' x' $~ +:@>:
XML_SetElementHandler parser, (f 3), (f 2)
XML_SetCharacterDataHandler parser, (f 4)
size =. 1!:4 y [ pos=. 0 [ inc=. 2^24

while. pos < size do.
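  NB. <: rather than <, so that a file whose size is an exact multiple
  NB. of inc still has its last block passed with XML_TRUE
  if. (size-pos+inc) <: 0 do.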
    block=. 1!:11 (>y); pos, size-pos
    end_flag=.XML_TRUE
  else.
    block=. 1!:11 (>y); pos, inc
    end_flag=.XML_FALSE
  end.
  if. XML_STATUS_ERROR = XML_Parse parser; block; (PARLEN=: #block); end_flag
  do.
    err=. memr 0 _1 2,~ XML_ErrorString XML_GetErrorCode parser
    lncol=. (XML_GetCurrentLineNumber parser), XML_GetCurrentColumnNumber parser
    XML_ParserFree parser
  end.
  pos=. pos+inc
end.

XML_ParserFree parser
expat_parse_xmlx''
)

NB. test against test.xml
cocurrent'base' NB. technically unnecessary - included for emphasis
coclass 'testtest'
coinsert 'jexpat'

expat_start_elementx=: {{
   'enm anm avl'=. x
   echo (":y),'  ',enm,;,' ',each,.anm,each ' ',each avl
}}
expat_end_elementx=: echo@]
expat_parse_xml 1!:1 < jpath '~addons/api/expat/test/test.xml'

When running this code, I saw that expat retained the memory location
for the outer xml element and used that location when later closing
that xml element.

But if I change the buffer size, changing the line that says
size =. 1!:4 y [ pos=. 0 [ inc=. 2^24
to
size =. 1!:4 y [ pos=. 0 [ inc=. 2

this behavior changes, and it's using the same memory location for all
xml elements.
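
To see how the choice of inc divides a file, here is a small sketch that lists the blocks a loop of this shape would read, one row per block: offset, length, final flag. The verb name blocks is made up for this illustration, and nothing here touches expat or the file system; it only does the arithmetic.

```j
NB. blocks (hypothetical name): x is the block size, y is the file size.
NB. Each row of the result is: offset, length, final flag.
blocks=: 4 : 0
pos=. x * i. >. y % x                 NB. start offset of every block
len=. x <. y - pos                    NB. the last block may be shorter
pos ,. len ,. (i. # pos) = <: # pos   NB. flag only the last block as final
)

4 blocks 10   NB. three blocks, the last one 2 bytes long
4 blocks 8    NB. size is an exact multiple of the block size
```

In the second call the last block is a full-sized one, and it is still the block that should carry XML_TRUE. That boundary case is worth checking in any chunked read loop.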

Anyways, that's sort of fine, but now that quirk cannot be used for
tracking which xml element is being closed in expat_end_elementx.
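
One way to get that information without relying on addresses is to keep an explicit stack of element names. This is only a minimal sketch, assuming the handlers are called as in the test script above: x of the start handler is the boxed triple of element name, attribute names and attribute values, and the end handler is called with one argument. The names elem_stack, closed and noatt are made up for this example, and the calls at the end merely simulate what the parser would do for <outer><inner/></outer>.

```j
NB. Sketch: track open elements by name on an explicit stack,
NB. instead of relying on the address that arrives as y.
elem_stack=: 0$<''   NB. names of the currently open elements, innermost last
closed=: 0$<''       NB. names in the order they were closed (for inspection)

expat_start_elementx=: 4 : 0
'enm anm avl'=. x
elem_stack=: elem_stack , <enm   NB. push the element name
EMPTY
)

expat_end_elementx=: 3 : 0
closed=: closed , {: elem_stack  NB. the innermost open element is the one closing
elem_stack=: }: elem_stack       NB. pop it
EMPTY
)

NB. Simulated callbacks; the y arguments are dummies here.
noatt=: 0$<''
('outer';noatt;<noatt) expat_start_elementx 0
('inner';noatt;<noatt) expat_start_elementx 0
expat_end_elementx 0
expat_end_elementx 0
closed
```

Because XML elements nest properly, the element being closed is always the last one pushed, so this does not depend on the block size or on how expat reuses memory.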

Thanks,


--
Raul

On Thu, Sep 9, 2021 at 7:09 AM Raul Miller <[email protected]> wrote:
>
> That looks plausible (though I have not tested it).
>
> That said, I would like to recommend that when using gmail you use
> "paste and match style" rather than the default "paste" mechanism.
> Perhaps like this:
>
> expat_parse_file=: 3 : 0
> '' expat_parse_file y
> :
> expat_init x
> parser=. XML_ParserCreate <<0
> f=. [: 15!:13 (IFWIN#'+') , ' x' $~ +:@>:
> XML_SetElementHandler parser, (f 3), (f 2)
> XML_SetCharacterDataHandler parser, (f 4)
> size =. 1!:4 y [ pos=. 0 [ inc=. 2^24
>
> while. pos < size do.
>   if. (size-pos+inc) < 0 do.
>     block=. 1!:11 (>y); pos, size-pos
>     end_flag=.XML_TRUE
>   else.
>     block=. 1!:11 (>y); pos, inc
>     end_flag=.XML_FALSE
>   end.
>   if. XML_STATUS_ERROR = XML_Parse parser; block; (PARLEN=: #block); end_flag
>   do.
>     err=. memr 0 _1 2,~ XML_ErrorString XML_GetErrorCode parser
>     lncol=. (XML_GetCurrentLineNumber parser), XML_GetCurrentColumnNumber parser
>     XML_ParserFree parser
>   end.
>   pos=. pos+inc
> end.
>
> XML_ParserFree parser
> expat_parse_xmlx''
> )
>
> Thanks,
>
> --
> Raul
>
> On Thu, Sep 9, 2021 at 6:40 AM Mariusz Grasko <[email protected]> wrote:
> >
> > Repasting it with small fix and without my comments.
> >
> > expat_parse_file=: 3 : 0
> >
> > '' expat_parse_file y
> >
> > :
> >
> > expat_init x
> >
> > parser=. XML_ParserCreate <<0
> >
> > f=. [: 15!:13 (IFWIN#'+') , ' x' $~ +:@>:
> >
> > XML_SetElementHandler parser, (f 3), (f 2)
> >
> > XML_SetCharacterDataHandler parser, (f 4)
> >
> >
> > size =. 1!:4 y [ pos=. 0 [ inc=. 2^24
> >
> > while. pos < size do.
> >
> > if. (size-pos+inc) < 0 do.
> >
> > block=. 1!:11 (>y); pos, size-pos
> >
> > end_flag=.XML_TRUE
> >
> > else.
> >
> > block=. 1!:11 (>y); pos, inc
> >
> > end_flag=.XML_FALSE
> >
> > end.
> >
> > if. XML_STATUS_ERROR = XML_Parse parser; block; (PARLEN=: #block); end_flag
> > do.
> >
> > err=. memr 0 _1 2,~ XML_ErrorString XML_GetErrorCode parser
> >
> > lncol=. (XML_GetCurrentLineNumber parser), XML_GetCurrentColumnNumber parser
> >
> > XML_ParserFree parser
> >
> > end.
> >
> > pos=. pos+inc
> >
> > end.
> >
> > XML_ParserFree parser
> >
> > expat_parse_xmlx''
> >
> > )
> >
> > On Thu, 9 Sep 2021 at 12:36, Mariusz Grasko <[email protected]> wrote:
> >
> > > Raul,
> > >
> > > Thank you very much for helping me out! I have managed to tweak expat.ijs
> > > so that it now works in a stream-parsing style. It took me a long time to
> > > realise that api/expat/expat.ijs in the addons directory was constantly
> > > being overwritten by J with the default version. So now I just load my
> > > Expat version with 0!:1 at the start of the program; I will read up more
> > > on the load and require verbs later.
> > > Anyway, now I have this (I also have a version that takes the block size
> > > as an argument, so the user of the verb can tune it: 'path inc' =. y):
> > >
> > > expat_parse_file=: 3 : 0
> > >
> > > '' expat_parse_file y
> > >
> > > :
> > >
> > > expat_init x
> > >
> > > parser=. XML_ParserCreate <<0
> > >
> > > f=. [: 15!:13 (IFWIN#'+') , ' x' $~ +:@>:
> > >
> > > XML_SetElementHandler parser, (f 3), (f 2)
> > >
> > > XML_SetCharacterDataHandler parser, (f 4)
> > >
> > >
> > > NB. Here my changes start: file size, position for indexed read, index increment
> > >
> > > size =. 1!:4 y [ pos=. 0 [ inc=. 2^24
> > >
> > > while. pos < size do.
> > >
> > > if. (size-pos+inc) < 0 do. NB. End of file: just read the remainder and set the flag to XML_TRUE
> > >
> > > block=. 1!:11 (>y); pos, size-pos
> > >
> > > end_flag=.XML_TRUE
> > >
> > > else. NB. Before end, read next block
> > >
> > > block=. 1!:11 (>y); pos, inc
> > >
> > > end_flag=.XML_FALSE
> > >
> > > end.
> > >
> > > if. XML_STATUS_ERROR = XML_Parse parser; block; (PARLEN=: #block); XML_TRUE do.
> > >
> > > err=. memr 0 _1 2,~ XML_ErrorString XML_GetErrorCode parser
> > >
> > > lncol=. (XML_GetCurrentLineNumber parser), XML_GetCurrentColumnNumber parser
> > >
> > > XML_ParserFree parser
> > >
> > > end.
> > >
> > > pos=. pos+inc
> > >
> > > end.
> > >
> > > XML_ParserFree parser
> > >
> > > expat_parse_xmlx''
> > >
> > > )
> > >
> > > Please let me know if you think this is acceptable, I can make a pull
> > > request.
> > >
> > > Best regards,
> > > M.G.
> > >
> > > On Wed, 8 Sep 2021 at 12:32, bill lam <[email protected]> wrote:
> > >
> > >> IIRC expat uses the push model to get elements and should be memory
> > >> efficient. Please try the approach suggested by Raul to see if there will
> > >> be any improvement.
> > >>
> > >> On Tue, Sep 7, 2021, 7:57 PM Mariusz Grasko <[email protected]> wrote:
> > >>
> > >> > So after a break and some experimenting with other things, I have
> > >> > revisited api/expat. It seems that expat, the C library for
> > >> > stream-oriented XML parsing, should not load elements into RAM but only
> > >> > process them as it goes, yielding tokens like StartElements, chardata,
> > >> > attributes etc. In this J library that does not seem to be the case,
> > >> > unless I am misunderstanding how to use it (which is the most likely
> > >> > explanation).
> > >> >
> > >> > This is my version of a dumb parser that should do nothing, just pass
> > >> > through a file without capturing any data in variables:
> > >> >
> > >> > NB. PROGRAM START
> > >> >
> > >> > require 'api/expat'
> > >> >
> > >> > coinsert 'jexpat'
> > >> >
> > >> >
> > >> > expat_initx=: 3 : 0
> > >> >
> > >> > id_offset=: y
> > >> >
> > >> > elements=: 0 0$''
> > >> >
> > >> > idnames=: 0$<''
> > >> >
> > >> > parents=: 0$0
> > >> >
> > >> > )
> > >> >
> > >> >
> > >> > expat_start_elementx=: 4 : 0
> > >> >
> > >> > 'elm att val'=. x
> > >> >
> > >> > smoutput 7!:0''
> > >> >
> > >> > EMPTY
> > >> >
> > >> > )
> > >> >
> > >> >
> > >> > expat_end_elementx=: 3 : 0
> > >> >
> > >> > EMPTY
> > >> >
> > >> > )
> > >> >
> > >> >
> > >> > expat_parse_xmlx=: 3 : 0
> > >> >
> > >> > EMPTY
> > >> >
> > >> > )
> > >> >
> > >> >
> > >> > 1 expat_parse_xml 1!:1 < 'C:\Users\mariusz\Documents\Programing\J\parseXML\test.xml'
> > >> >
> > >> >
> > >> > smoutput 'FINISH'
> > >> >
> > >> > NB. PROGRAM END
> > >> >
> > >> >
> > >> > I added smoutput 7!:0'' to confirm the increase in memory usage. I
> > >> > already knew about it, because my large file would eventually crash
> > >> > Windows. Small files take a small amount of memory and larger files
> > >> > take more (I am not sure whether the relationship is linear; I haven't
> > >> > tested it). Suffice it to say that a 296MB file would eventually reach
> > >> > over 6GB of RAM usage.
> > >> >
> > >> > My guesses, as someone who is just starting with J:
> > >> >
> > >> > 1). The line 'elm att val'=. x is responsible for this memory spike;
> > >> > maybe there is a way to free those variables before parsing the next
> > >> > start element?
> > >> >
> > >> > 2). There is some accumulation happening behind the curtains, maybe for
> > >> > buffering and speed; is there a way to limit it or reclaim the memory?
> > >> >
> > >> > 3). My dumb parser is actually not just running through the file, and
> > >> > does something I don't understand that causes it to capture data.
> > >> >
> > >> > 4). This J implementation of expat is not a stream parser; it just
> > >> > utilizes Expat but loads the whole document into memory beforehand
> > >> > (but then why does memory keep increasing as each next element is
> > >> > read, as confirmed by smoutput 7!:0''?).
> > >> >
> > >> >
> > >> > Best regards,
> > >> >
> > >> > M.G.
> > >> >
> > >> > On Tue, 17 Aug 2021 at 20:45, Jan-Pieter Jacobs <[email protected]> wrote:
> > >> >
> > >> > > Since no one has mentioned it yet: memory-mapped files also work
> > >> > > very well for big data files.
> > >> > >
> > >> > >
> > >> > > See the "Mapped Files" lab (Help > Studio > Labs > Mapped Files) and
> > >> > > "doc_jmf_ [ load'jmf' " in a J session for a short summary.
> > >> > >
> > >> > > You can map any file as character data with
> > >> > >
> > >> > > JCHAR map_jmf_ 'var';'filename.csv'
> > >> > >
> > >> > > Now the variable 'var' will look as if it contains all your data.
> > >> > > Two warnings, though:
> > >> > > 1) Mind the mode: the default mode for mapping a file is RW, so
> > >> > > changes to the variable are immediately written to disk. You can use
> > >> > > MTRO_jmf_ as the mode to avoid this kind of mistake.
> > >> > > 2) Watch out for operations that might copy a big chunk of the
> > >> > > variable (e.g. var2=:var ; var2=: }. var ; ...), which could make
> > >> > > your program, or others, crash due to memory exhaustion.
> > >> > >   Things like this do work well, though:
> > >> > >   LF +/@:= var NB. number of LF's, i.e. line count
> > >> > >
> > >> > > I once used them to grok through some 4GB CSVs to split them into
> > >> > > records, which went quite satisfactorily. To make matters easier for
> > >> > > myself afterwards, I wrote the records out sequentially, recording in
> > >> > > a normal array the starting point and length of each record, for easy
> > >> > > indexing into the files when mapped again using map_jmf_. E.g. get
> > >> > > record n using:
> > >> > >   recordsJMF {~ (+i.)/ n { start_len
> > >> > > or even:
> > >> > >   recordsJMF ({:@] {. {.@] }. [) n { start_len
> > >> > > The latter is, in recent versions of J, far more efficient for long
> > >> > > lengths because it uses virtual nouns (see
> > >> > > https://code.jsoftware.com/wiki/Vocabulary/SpecialCombinations#Virtual_Nouns
> > >> > > )
> > >> > >
> > >> > > Of course, what makes sense to you probably depends on your
> > >> > > situation, e.g. whether the data is going to change, how many times
> > >> > > you intend to use the same data, ...
> > >> > >
> > >> > > Best regards,
> > >> > > Jan-Pieter
> > >> > >
> > >> > > On Tue, 17 Aug 2021 at 18:56, Devon McCormick <[email protected]> wrote:
> > >> > >
> > >> > > > Hi Mariusz,
> > >> > > > A while back I wrote some adverbs to apply a verb across a file in
> > >> > > > pieces:
> > >> > > > https://github.com/DevonMcC/JUtilities/blob/master/workOnLargeFile.ijs
> > >> > > > .
> > >> > > >
> > >> > > > The simplest, most general one is "doSomethingSimple". It applies
> > >> > > > the supplied verb to successive chunks of the file and allows work
> > >> > > > already done to be passed on to the next iteration.
> > >> > > >
> > >> > > > NB.* doSomethingSimple: apply verb to file making minimal assumptions about file structure.
> > >> > > > doSomethingSimple=: 1 : 0
> > >> > > >    'curptr chsz max flnm passedOn'=. 5{.y
> > >> > > >    if. curptr>:max do. ch=. curptr;chsz;max;flnm
> > >> > > >    else. ch=. readChunk curptr;chsz;max;flnm
> > >> > > >        passedOn=. u (_1{ch),<passedOn  NB. Allow u's work to be passed on to next invocation
> > >> > > >    end.
> > >> > > >    (4{.ch),<passedOn
> > >> > > > NB.EG ([:~.;) doSomethingSimple ^:_ ] 0x;1e6;(fsize 'bigFile.txt');'bigFile.txt';<'' NB. Return unique characters in file.
> > >> > > > )
> > >> > > >
> > >> > > > The sub-function "readChunk" looks like this:
> > >> > > >
> > >> > > > readChunk=: 3 : 0
> > >> > > >    'curptr chsz max flnm'=. 4{.y
> > >> > > >    if. 0<chsz2=. chsz<.0>.max-curptr do. chunk=. fread flnm;curptr,chsz2
> > >> > > >    else. chunk=. '' end.
> > >> > > >    (curptr+chsz2);chsz2;max;flnm;chunk
> > >> > > > NB.EG chunk=. >_1{ch0=. readChunk 0;1e6;(fsize 'bigFile.txt');'bigFile.txt'
> > >> > > > )
> > >> > > >
> > >> > > > Another adverb, "doSomething", is similar but assumes you have
> > >> > > > something like line delimiters and that you only want to process
> > >> > > > complete lines each time through.
> > >> > > >
> > >> > > > If you get a chance to take a look at these, please let me know what
> > >> > > > you think.
> > >> > > >
> > >> > > > Good luck,
> > >> > > >
> > >> > > > Devon
> > >> > > >
> > >> > > >
> > >> > > >
> > >> > > > On Tue, Aug 17, 2021 at 12:34 PM John Baker <[email protected]> wrote:
> > >> > > >
> > >> > > > > Mariusz,
> > >> > > > >
> > >> > > > > I've used the following adverb (see below) to process 4 GB CSVs.
> > >> > > > > Basically, it works through the file in byte chunks. As the J forum
> > >> > > > > email tends to wreck embedded code, you can see how this adverb is
> > >> > > > > used in the database ETL system that uses it here:
> > >> > > > >
> > >> > > > > https://bakerjd99.files.wordpress.com/2021/08/swiftprep.pdf
> > >> > > > >
> > >> > > > > You might also find this amusing:
> > >> > > > >
> > >> > > > >
> > >> > > > > https://analyzethedatanotthedrivel.org/2021/08/11/jetl-j-extract-transform-and-load/
> > >> > > > >
> > >> > > > > ireadapply=:1 : 0
> > >> > > > >
> > >> > > > >
> > >> > > > > NB.*ireadapply v-- apply verb (u) to n byte line blocks of static file.
> > >> > > > >
> > >> > > > > NB.
> > >> > > > >
> > >> > > > > NB. adv: u ireadapply (clFileIn ; clFileOut ; clDel ; iaBlockSize ;< uuData)
> > >> > > > >
> > >> > > > > NB.
> > >> > > > >
> > >> > > > > NB. fi=. winpathsep ;1 dir SwiftZipCsvDir,'ItemSales-*.csv'
> > >> > > > >
> > >> > > > > NB. fo=. SwiftTsvDir,'land_ItemSales.txt'
> > >> > > > >
> > >> > > > > NB. smoutput@:(>@{. ; ($&.>)@:}.) ireadapply fi;fo;CRLF;20000000;<''
> > >> > > > >
> > >> > > > >
> > >> > > > > NB. file in, file out, line delimiter, block size, (u) verb data
> > >> > > > >
> > >> > > > > 'fi fo d k ud'=. y
> > >> > > > >
> > >> > > > >
> > >> > > > > p=. 0 NB. file pointer
> > >> > > > >
> > >> > > > > c=. 0 NB. block count
> > >> > > > >
> > >> > > > > s=. fsize fi NB. file bytes
> > >> > > > >
> > >> > > > > k=. k<.s NB. first block size
> > >> > > > >
> > >> > > > > NB.debug. b=. i.0 NB. block sizes (chk)
> > >> > > > >
> > >> > > > >
> > >> > > > > while. p < s do.
> > >> > > > >
> > >> > > > > 'iread error' assert -. _1 -: r=. (1!:11 :: _1:) fi;p,k
> > >> > > > >
> > >> > > > > c=. >:c NB. block count
> > >> > > > >
> > >> > > > > NB. complete lines
> > >> > > > >
> > >> > > > > if. 0 = #l=. d beforelaststr r do.
> > >> > > > >
> > >> > > > > NB. final shard
> > >> > > > >
> > >> > > > > NB.debug. b=. b,#r
> > >> > > > >
> > >> > > > > u c;1;d;fo;r;<ud break.
> > >> > > > >
> > >> > > > > end.
> > >> > > > >
> > >> > > > > p=. p + #l NB. inc file pointer
> > >> > > > >
> > >> > > > > k=. k <. s - p NB. next block size
> > >> > > > >
> > >> > > > > NB.debug. b=. b,#l NB. block sizes list
> > >> > > > >
> > >> > > > > NB. block number, shard, delimiter, file out, line bytes, (u) data
> > >> > > > >
> > >> > > > > u c;0;d;fo;l;<ud
> > >> > > > >
> > >> > > > > end.
> > >> > > > >
> > >> > > > >
> > >> > > > > NB.debug. 'byte mismatch' assert s = +/b
> > >> > > > >
> > >> > > > > c NB. blocks processed
> > >> > > > >
> > >> > > > > )
> > >> > > > >
> > >> > > > > On Mon, Aug 16, 2021 at 7:17 PM Raul Miller <[email protected]> wrote:
> > >> > > > >
> > >> > > > > > 1. As you have noticed, certainly. There are details, of course
> > >> > > > > > (what block size to use? Are files guaranteed to be well formed?
> > >> > > > > > If not, what are the error conditions? (Are certain characters
> > >> > > > > > illegal? Are lines longer than the block size allowed?) Do you
> > >> > > > > > want a callback interface for each block? If so, do you need an
> > >> > > > > > "end of file" indication? If so, is that a separate callback or
> > >> > > > > > a distinct argument to the block callback? etc.)
> > >> > > > > >
> > >> > > > > > 2. Again, as you have noticed: yes. And there are analogous
> > >> > > > > > details here...
> > >> > > > > >
> > >> > > > > > 3. The expat API should only require J knowledge. There are a
> > >> > > > > > couple of examples in the addons/api/expat/test/ directory, named
> > >> > > > > > test0.ijs and test1.ijs.
> > >> > > > > >
> > >> > > > > > I hope this helps,
> > >> > > > > >
> > >> > > > > > --
> > >> > > > > > Raul
> > >> > > > > >
> > >> > > > > > On Mon, Aug 16, 2021 at 4:23 PM Mariusz Grasko
> > >> > > > > > <[email protected]> wrote:
> > >> > > > > > >
> > >> > > > > > > Thank you for some ideas on using an external parser.
> > >> > > > > > > Okay, now I have 3 questions:
> > >> > > > > > > 1. Is it possible to read a CSV file streaming-style (for example,
> > >> > > > > > > record by record) without loading everything into memory? Even if
> > >> > > > > > > I use some external parsing solution like XSLT, or just write
> > >> > > > > > > something myself in some language other than J, I will end up
> > >> > > > > > > with a large CSV instead of a large XML. It makes no difference.
> > >> > > > > > > The reason I need to parse it like this is that there are some
> > >> > > > > > > rows that I won't need; those would be discarded depending on
> > >> > > > > > > their field values.
> > >> > > > > > > If it is not possible, I would do more work outside of J in this
> > >> > > > > > > first XML -> CSV parser.
> > >> > > > > > > 2. Is there a way to call an external program from a J script? If
> > >> > > > > > > so, is it possible to wait for it to finish?
> > >> > > > > > > If it is not possible, there are definitely ways to run J from
> > >> > > > > > > other programs.
> > >> > > > > > > 3. Can someone give me a few pointers on how to use the api/expat
> > >> > > > > > > library? Do I need to familiarize myself with expat (the C
> > >> > > > > > > library), or should a good understanding of J and reading the
> > >> > > > > > > small tests in the package directory be enough?
> > >> > > > > > > I could send an example file, as Devon McCormick suggested.
> > >> > > > > > >
> > >> > > > > > > Right now I am working through the book "J: The Natural Language
> > >> > > > > > > for Analytic Computing" and playing around with problems like
> > >> > > > > > > Project Euler, but I could really see myself using J in serious
> > >> > > > > > > work.
> > >> > > > > > >
> > >> > > > > > > Best regards,
> > >> > > > > > > MG
> > >> > > > > > >
> > >> > > > > > >
> > >> > > > > > > On Wed, 11 Aug 2021 at 09:51, <[email protected]> wrote:
> > >> > > > > > >
> > >> > > > > > > > In similar situations (though my files are not huge) I extract
> > >> > > > > > > > what I want into flattened CSV using one or more XQuery scripts,
> > >> > > > > > > > and then load the CSV files with J. The code is clean, compact
> > >> > > > > > > > and easy to maintain. For recurrent XQuery patterns, m4
> > >> > > > > > > > occasionally comes to the rescue. Expect minor portability
> > >> > > > > > > > issues when using different XQuery processors (extensions,
> > >> > > > > > > > language level...).
> > >> > > > > > > >
> > >> > > > > > > > Never got round to SAX parsing beyond tutorials, so I cannot
> > >> > > > > > > > compare.
> > >> > > > > > > >
> > >> > > > > > > >
> > >> > > > > > > > From: Mariusz Grasko <[email protected]>
> > >> > > > > > > > To: [email protected]
> > >> > > > > > > > Subject: [Jprogramming] Is is good idea to use J for reading large XML files ?
> > >> > > > > > > > Date: 10/08/2021 18:05:45 Europe/Paris
> > >> > > > > > > >
> > >> > > > > > > > Hi,
> > >> > > > > > > >
> > >> > > > > > > > We are an ecommerce company and have a lot of integrations with
> > >> > > > > > > > suppliers; product info is nearly always in XML files. I am
> > >> > > > > > > > thinking about using J as an analysis tool. Do you think that
> > >> > > > > > > > working with large files that need to be parsed SAX-style,
> > >> > > > > > > > without reading everything at once, is a good idea in J? Also,
> > >> > > > > > > > is this even advantageous (as in, would the code be terse)?
> > >> > > > > > > > Right now XML parsing is done in Golang, so if parsing in J is
> > >> > > > > > > > not very good we could try to rely more on CSV exports. CSV is
> > >> > > > > > > > definitely very good in J.
> > >> > > > > > > > I am hoping that maybe XML parsing is very good in J and the
> > >> > > > > > > > code would become much smaller; if this is the case, then I
> > >> > > > > > > > would think about using J for XMLs with new suppliers.
> > >> > > > > > > >
> > >> > > > > > > > Best Regards
> > >> > > > > > > > M.G.
> > >> > > > > > > >
> > >> > > > > > > > ----------------------------------------------------------------------
> > >> > > > > > > > For information about J forums see http://www.jsoftware.com/forums.htm
> > >> > > > > > > >
> > >> > > > > --
> > >> > > > > John D. Baker
> > >> > > > > [email protected]
> > >> > > >
> > >> > > > --
> > >> > > >
> > >> > > > Devon McCormick, CFA
> > >> > > >
> > >> > > > Quantitative Consultant
