Hi again,

My data are big. I'm trying to create subsets from the Medline database, and I also have to read the CDATA sections and store the ones I am interested in. My current version of the code simply reads the XML elements and stores them, and that takes 13 hours to process, which is not good at all :S
thanks
Ashjan

On Sat, 6 Jul 2019 at 13:00, <xml-requ...@gnome.org> wrote:

> Today's Topics:
>
>    1. Re: Xml Question (Eric Eberhard)
>    2. Re: Xml Question (Liam R E Quin)
>    3. Re: Xml Question (Eric Eberhard)
>    4. Re: Xml Question (Eric Eberhard)
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 5 Jul 2019 12:18:41 -0700
> From: "Eric Eberhard" <fl...@vicsmba.com>
> Subject: Re: [xml] Xml Question
>
> Dear Ashjan,
>
> If it was me I'd do it the cheap way and not use the parser. Get the file
> and then read through it with your favorite language and look for the
> starting tags you want moved, then scan until you hit the ending tag, and
> write that out. Rinse and repeat. You can use the parser on each piece you
> write out.
>
> It is surely possible to do it in both ways described, and I know of
> another that works on small files. But this is a LOT easier.
>
> Eric
>
> -----Original Message-----
> From: Liam R E Quin
> Sent: Thursday, July 04, 2019 6:28 AM
> Subject: Re: [xml] Xml Question
>
> On Thu, 2019-07-04 at 10:33 +0100, Ashjan Alsulaimani wrote:
> > What's the best way to approach such a task and the most efficient way
> > as I'm dealing with Medline database!
>
> If your input files are a few hundred megabytes or less, start with the
> XSLT identity transform and add empty templates to match what you want to
> delete.
>
> If your input is over a gigabyte (say), or you do lots of different
> subsets of the same document, you may find XQuery Update works better for
> you, with a database (e.g. BaseX or eXist-db).
>
> Liam
>
> --
> Liam Quin, https://www.delightfulcomputing.com/
>
> ------------------------------
>
> Message: 2
> Date: Fri, 05 Jul 2019 17:24:05 -0400
> From: Liam R E Quin <l...@holoweb.net>
> Subject: Re: [xml] Xml Question
>
> On Fri, 2019-07-05 at 12:18 -0700, Eric Eberhard wrote:
> > If it was me I'd do it the cheap way and not use the parser.
>
> Make sure to handle markup in comments and CDATA sections properly, and
> to process external files included with XInclude or by entities defined
> in the DTD.
>
> Working with XML at the text level can be reasonably safe if you know
> the input files well, and yes, I sometimes do it too, but cheap isn't
> the same as good :)
>
> Liam
>
> ------------------------------
>
> Message: 3
> Date: Fri, 5 Jul 2019 14:49:01 -0700
> From: "Eric Eberhard" <fl...@vicsmba.com>
> Subject: Re: [xml] Xml Question
>
> Your answer is spot on. I don't know if he has markup and CDATA or if his
> files are large. If none of those are true, cheap is good :-) If it is a
> gig file with CDATA and markup, cheap would be bad.
>
> E
>
> ------------------------------
>
> Message: 4
> Date: Fri, 5 Jul 2019 14:57:57 -0700
> From: "Eric Eberhard" <fl...@vicsmba.com>
> Subject: Re: [xml] Xml Question
>
> Oh -- if it is a smaller file, here is some cheap code that works fine.
> You will have to create a new document for each smaller piece and then
> copy the pieces over like so:
>
>     /* walk the sibling list starting at the source node */
>     for (cur = fromwrk->cur; cur; cur = cur->next) {
>         tmp = xmlCopyNode(cur, 1);      /* 1 = recursive (deep) copy */
>         xmlAddChild(towrk->cur, tmp);
>     }
>
> fromwrk being your original file and towrk being your current little file.
>
> E
>
> ------------------------------
>
> End of xml Digest, Vol 180, Issue 4
> ***********************************
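For reference, the text-level approach Eric describes in the quoted thread (find a start tag, scan to its end tag, write the piece out) might look roughly like the C sketch below. It is only an illustration under stated assumptions, not anyone's actual code: the function names (skip_opaque, next_element, copy_elements) are made up for this example, the input is assumed to be already loaded into a NUL-terminated buffer, elements with the wanted name are assumed not to nest inside each other, and a '>' inside an attribute value is not handled. Following Liam's caution, comments and CDATA sections are skipped so that markup-like text inside them is not mistaken for tags; XInclude and DTD entities are not processed at all.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* If p starts a comment or CDATA section, return the position just past
   it (NULL if it is unterminated); otherwise return p unchanged. */
static const char *skip_opaque(const char *p)
{
    const char *end;
    if (strncmp(p, "<!--", 4) == 0) {
        end = strstr(p + 4, "-->");
        return end ? end + 3 : NULL;
    }
    if (strncmp(p, "<![CDATA[", 9) == 0) {
        end = strstr(p + 9, "]]>");
        return end ? end + 3 : NULL;
    }
    return p;
}

/* True if the name at p is followed by a character that ends a tag name. */
static int name_ends(char c)
{
    return c == '>' || c == '/' || c == ' ' || c == '\t' ||
           c == '\n' || c == '\r';
}

static int at_start_tag(const char *p, const char *tag, size_t tlen)
{
    return p[0] == '<' && strncmp(p + 1, tag, tlen) == 0 &&
           name_ends(p[1 + tlen]);
}

static int at_end_tag(const char *p, const char *tag, size_t tlen)
{
    return p[0] == '<' && p[1] == '/' && strncmp(p + 2, tag, tlen) == 0 &&
           name_ends(p[2 + tlen]);
}

/* Find the next complete <tag ...>...</tag> (or <tag .../>) in xml.
   On success store its start and byte length and return 1; else 0. */
static int next_element(const char *xml, const char *tag,
                        const char **start, size_t *len)
{
    size_t tlen = strlen(tag);
    const char *p = xml, *begin = NULL, *q;

    while ((p = strchr(p, '<')) != NULL) {
        q = skip_opaque(p);
        if (q == NULL)
            return 0;                     /* unterminated comment/CDATA */
        if (q != p) {
            p = q;                        /* jumped over comment/CDATA */
            continue;
        }
        if (begin == NULL && at_start_tag(p, tag, tlen)) {
            q = strchr(p, '>');
            if (q == NULL)
                return 0;
            if (q[-1] == '/') {           /* empty element: <tag/> */
                *start = p;
                *len = (size_t)(q + 1 - p);
                return 1;
            }
            begin = p;
        } else if (begin != NULL && at_end_tag(p, tag, tlen)) {
            q = strchr(p, '>');
            if (q == NULL)
                return 0;
            *start = begin;
            *len = (size_t)(q + 1 - begin);
            return 1;
        }
        p++;
    }
    return 0;
}

/* Write every matching element to out, one per line; return the count. */
static size_t copy_elements(const char *xml, const char *tag, FILE *out)
{
    const char *start;
    size_t len, count = 0;

    assert(xml && tag && tag[0] && out);
    while (next_element(xml, tag, &start, &len)) {
        fwrite(start, 1, len, out);
        fputc('\n', out);
        xml = start + len;                /* resume after this element */
        count++;
    }
    return count;
}
```

A call such as copy_elements(buffer, "PubmedArticle", stdout) would then print each matching record on its own line (the element name here is only an example), and each piece could afterwards be handed to the real parser, as Eric suggests. For inputs in the gigabyte range the whole-buffer assumption no longer holds, so this is best read as a picture of the idea rather than a solution for the full Medline set.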
_______________________________________________ xml mailing list, project page http://xmlsoft.org/ xml@gnome.org https://mail.gnome.org/mailman/listinfo/xml