Re: [xml] xml Digest, Vol 180, Issue 4

Ashjan Alsulaimani Thu, 11 Jul 2019 01:32:36 -0700

Hi again
My data are big. I'm trying to create subsets from the Medline database, and
I also have to read CDATA sections and store the ones I am interested in. My
current version of the code simply reads the XML elements and stores them,
and that takes 13 hours to process, which is not good at all :S


thanks
Ashjan

On Sat, 6 Jul 2019 at 13:00, <xml-requ...@gnome.org> wrote:

> Today's Topics:
>
>    1. Re: Xml Question (Eric Eberhard)
>    2. Re: Xml Question (Liam R E Quin)
>    3. Re: Xml Question (Eric Eberhard)
>    4. Re: Xml Question (Eric Eberhard)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 5 Jul 2019 12:18:41 -0700
> From: "Eric Eberhard" <fl...@vicsmba.com>
> To: "'Liam R E Quin'" <l...@holoweb.net>,       "'Ashjan Alsulaimani'"
>         <alsul...@tcd.ie>, <xml@gnome.org>
> Subject: Re: [xml] Xml Question
> Message-ID: <0abb01d53366$75b87000$61295000$@vicsmba.com>
> Content-Type: text/plain;       charset="us-ascii"
>
> Dear Ashjan,
>
> If it was me I'd do it the cheap way and not use the parser.  Get the file
> and then read through it with your favorite language and look for starting
> tags you want moved, then scan until you hit the ending tag, write that
> out.  Rinse and repeat.  You can use the parser on each piece you write out.
>
> It is surely possible to do it in both ways described, and I know of another
> that works on small files.  But this is a LOT easier.
>
> Eric
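
A minimal C sketch of the text-level approach Eric describes, for illustration only: scan a buffer for a start tag, then for the matching end tag, and hand back that span. The function name next_record and the element name Article used in the note below are hypothetical, not from the thread. It assumes the whole input sits in one NUL-terminated buffer, that records of the chosen name do not nest and are not empty elements, and that the tag name never occurs inside a comment or CDATA section.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Find the next <tag ...> ... </tag> span in a NUL-terminated buffer.
 * Returns a pointer to the '<' of the start tag and stores the span length
 * (through the '>' of the end tag) in *len; returns NULL if no complete
 * record is found.  Hypothetical helper: no nesting, no empty-element tags,
 * no whitespace inside the end tag, no comment/CDATA awareness.
 */
static const char *next_record(const char *buf, const char *tag, size_t *len)
{
    assert(buf != NULL && tag != NULL && len != NULL);
    size_t tlen = strlen(tag);
    assert(tlen > 0);

    /* Step 1: locate "<tag" followed by '>' or whitespace, so that
     * "<Article" does not match "<ArticleTitle". */
    const char *p = buf;
    while ((p = strchr(p, '<')) != NULL) {
        if (strncmp(p + 1, tag, tlen) == 0) {
            char c = p[1 + tlen];
            if (c == '>' || c == ' ' || c == '\t' || c == '\n' || c == '\r')
                break;
        }
        p++;
    }
    if (p == NULL)
        return NULL;

    /* Step 2: scan forward to the first "</tag>" after the start tag. */
    const char *q = p;
    while ((q = strstr(q, "</")) != NULL) {
        if (strncmp(q + 2, tag, tlen) == 0 && q[2 + tlen] == '>') {
            *len = (size_t)(q + 2 + tlen + 1 - p);
            return p;
        }
        q += 2;
    }
    return NULL;
}
```

Calling next_record(buf, "Article", &n) repeatedly, each time starting at the end of the previous span, walks the records one by one; each span can then be written out or passed to the parser.
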
>
> -----Original Message-----
> From: xml [mailto:xml-boun...@gnome.org] On Behalf Of Liam R E Quin
> Sent: Thursday, July 04, 2019 6:28 AM
> To: Ashjan Alsulaimani <alsul...@tcd.ie>; xml@gnome.org
> Subject: Re: [xml] Xml Question
>
> On Thu, 2019-07-04 at 10:33 +0100, Ashjan Alsulaimani wrote:
> >
> >
> > What's the best way to approach such a task and the most efficient way
> > as I'm dealing with Medline database!
>
> If your input files are a few hundred megabytes or less, start with the XSLT
> identity transform and add empty templates to match what you want to delete.
>
> If your input is over a gigabyte (say) or you do lots of different subsets
> of the same document, you may find XQuery update works better for you, with
> a database (e.g. BaseX or eXist-db).
>
> Liam
>
>
> --
> Liam Quin, https://www.delightfulcomputing.com/
> Available for XML/Document/Information Architecture/XSLT/
> XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
> Upcoming courses: DocBook (sold out); CSS for XML People
>
> _______________________________________________
> xml mailing list, project page  http://xmlsoft.org/ xml@gnome.org
> https://mail.gnome.org/mailman/listinfo/xml
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Fri, 05 Jul 2019 17:24:05 -0400
> From: Liam R E Quin <l...@holoweb.net>
> To: Eric Eberhard <fl...@vicsmba.com>, 'Ashjan Alsulaimani'
>         <alsul...@tcd.ie>,  xml@gnome.org
> Subject: Re: [xml] Xml Question
> Message-ID:
>         <717eaaaf79ba56458eeb6551a2272637c77f76b8.ca...@holoweb.net>
> Content-Type: text/plain; charset="UTF-8"
>
> On Fri, 2019-07-05 at 12:18 -0700, Eric Eberhard wrote:
> > Dear Ashjan,
> >
> > If it was me I'd do it the cheap way and not use the parser.
>
> Make sure to handle markup in comments and CDATA sections properly, and
> to process external files included with XInclude or by entities defined
> in the DTD.
>
> Working with XML at the text level can be reasonably safe if you know
> the input files well, and yes, I sometimes do it too, but cheap isn't
> the same as good :)
>
> Liam
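
To make the comment and CDATA caveat concrete, here is a hedged C sketch of a scanner that a text-level splitter could use so that tag-like text inside comments or CDATA sections is never mistaken for markup. The name next_markup is hypothetical. It assumes a NUL-terminated buffer and does not deal with processing instructions, a DOCTYPE internal subset, XInclude, or external entities, which would still need a real parser.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Return the next '<' at or after p that starts ordinary markup, skipping
 * whole comments (<!-- ... -->) and CDATA sections (<![CDATA[ ... ]]>).
 * Returns NULL if there is no further markup, or if a comment or CDATA
 * section is left unterminated.  Hypothetical helper for illustration.
 */
static const char *next_markup(const char *p)
{
    assert(p != NULL);
    while ((p = strchr(p, '<')) != NULL) {
        const char *end;
        if (strncmp(p, "<!--", 4) == 0) {
            end = strstr(p + 4, "-->");      /* end of the comment */
            if (end == NULL)
                return NULL;
            p = end + 3;
        } else if (strncmp(p, "<![CDATA[", 9) == 0) {
            end = strstr(p + 9, "]]>");      /* end of the CDATA section */
            if (end == NULL)
                return NULL;
            p = end + 3;
        } else {
            return p;                        /* a real tag starts here */
        }
    }
    return NULL;
}
```

A splitter that looks for start and end tags only at positions returned by next_markup will ignore an element name that appears as plain text inside a comment or CDATA section.
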
>
>
> --
> Liam Quin, https://www.delightfulcomputing.com/
>
> Upcoming course:   CSS for XML People, Rockville MD, August 2019
>                    See https://www.delightfulcomputing.com/
>
>
>
> ------------------------------
>
> Message: 3
> Date: Fri, 5 Jul 2019 14:49:01 -0700
> From: "Eric Eberhard" <fl...@vicsmba.com>
> To: "'Liam R E Quin'" <l...@holoweb.net>,       "'Ashjan Alsulaimani'"
>         <alsul...@tcd.ie>, <xml@gnome.org>
> Subject: Re: [xml] Xml Question
> Message-ID: <0adb01d5337b$768e9f30$63abdd90$@vicsmba.com>
> Content-Type: text/plain;       charset="utf-8"
>
> Your answer is spot on.  I don't know if he has markup and CDATA or if his
> files are large.  If none of those are true, cheap is good :-)  If it is a
> gig file with CDATA and markup, cheap would be bad.
>
> E
>
>
> ------------------------------
>
> Message: 4
> Date: Fri, 5 Jul 2019 14:57:57 -0700
> From: "Eric Eberhard" <fl...@vicsmba.com>
> To: "'Liam R E Quin'" <l...@holoweb.net>,       "'Ashjan Alsulaimani'"
>         <alsul...@tcd.ie>, <xml@gnome.org>
> Subject: Re: [xml] Xml Question
> Message-ID: <0adc01d5337c$b59ec460$20dc4d20$@vicsmba.com>
> Content-Type: text/plain;       charset="us-ascii"
>
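> Oh -- if it is a smaller file, here is some cheap code that works fine.  You
> will have to create a new document for each smaller piece and then copy the
> pieces over like so:
>
> /* cur and tmp are xmlNodePtr; needs <libxml/tree.h> */
> for (cur = fromwrk->cur; cur != NULL; cur = cur->next) {
>     tmp = xmlCopyNode(cur, 1);          /* 1 = deep (recursive) copy */
>     if (tmp != NULL)
>         xmlAddChild(towrk->cur, tmp);   /* append the copy to the new document */
> }
>
> fromwrk being your original file and towrk->cur being your current little file.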
>
> E
>
>
> ------------------------------
>
> End of xml Digest, Vol 180, Issue 4
> ***********************************
>
