i didn't want pages of text within the body tag because i want to parse the file very quickly.

at what point would i start to notice a decrease in speed?
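
One way to answer this empirically is to time a parse on generated data. The sketch below is in Python, which is an assumption because the thread does not name a language. It streams the file with `xml.etree.ElementTree.iterparse`, keeps only the id and title of each document, and discards each body as soon as it has been read. The helper names `make_sample`, `stream_titles`, and `time_parse` are hypothetical, and the sample data is synthetic.

```python
import io
import time
import xml.etree.ElementTree as ET


def make_sample(n_docs, body_chars):
    """Build a synthetic <data> file with n_docs documents (made-up content)."""
    parts = ["<data>"]
    for i in range(n_docs):
        parts.append(
            f"<document><id>{i}</id><title>Title {i}</title>"
            f"<body>{'x' * body_chars}</body></document>"
        )
    parts.append("</data>")
    return "".join(parts).encode("utf-8")


def stream_titles(xml_bytes):
    """Collect (id, title) pairs without keeping any body text in memory."""
    titles = []
    for _event, elem in ET.iterparse(io.BytesIO(xml_bytes), events=("end",)):
        if elem.tag == "document":
            titles.append((elem.findtext("id"), elem.findtext("title")))
            elem.clear()  # drop this document's children, including <body>
    return titles


def time_parse(n_docs, body_chars):
    """Return (number of documents parsed, elapsed seconds)."""
    data = make_sample(n_docs, body_chars)
    start = time.perf_counter()
    titles = stream_titles(data)
    return len(titles), time.perf_counter() - start
```

Calling `time_parse` with growing values of `n_docs` and `body_chars` on the target machine should show where the slowdown becomes noticeable. The numbers depend on the hardware and the parser, so none are quoted here.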



On Sep 17, 2003, at 6:37 AM, Hans Fugal wrote:

...doesn't seem to make much sense, because...

        1.      it is so much bigger than the other content

        2.      you'd have to put a <br> in there for it to display
                correctly... or perhaps a "\n" would work... or
                something... ?

        3.      if i want to parse the entire directory... it'll get
                really big if the "body" tags contain more than a
                sentence or two

I'm not sure I see the problem with body being big. If having a big xml
file really is a problem, then you could probably do something like
<body filename="body1.txt"/>. If you want to do that sort of thing
the xml way you might have a look at XLink, but I'm not familiar enough
with it to know whether it fits here.
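
A minimal sketch of the separate-file idea, in Python (the choice of language is an assumption; the thread does not name one). The element names follow the example in this thread and the `filename` attribute is the one suggested above. The helper names `list_titles` and `load_body`, and the sample index, are hypothetical.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# Hypothetical index file: title information only, bodies stored elsewhere.
SAMPLE_INDEX = """\
<data>
  <document>
    <id>1</id>
    <title>First</title>
    <body filename="body1.txt"/>
  </document>
  <document>
    <id>2</id>
    <title>Second</title>
    <body filename="body2.txt"/>
  </document>
</data>
"""


def list_titles(xml_text):
    """Return (id, title) for every document; no body file is opened."""
    root = ET.fromstring(xml_text)
    return [(d.findtext("id"), d.findtext("title")) for d in root.iter("document")]


def load_body(xml_text, doc_id, base_dir):
    """Read the body file of one document, or return None if it is not listed."""
    root = ET.fromstring(xml_text)
    for doc in root.iter("document"):
        if doc.findtext("id") == doc_id:
            body = doc.find("body")
            if body is None or "filename" not in body.attrib:
                return None
            path = Path(base_dir) / body.get("filename")
            return path.read_text(encoding="utf-8")
    return None
```

The listing page only ever touches the small index, and a body file is read only when a single document is displayed.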

You need to think about what's being stored in this body tag. If it's
HTML, then the < and & (at least) should be escaped as &lt; and &amp;, and
you can store <br> and any other tag you feel like. If the body is
well-formed xml (say XHTML) then you wouldn't even have to escape the
<br/>. If it's plain text, you can choose whether or not to care about
whitespace; if you do, your newlines can be preserved. I don't think your
issue is really about XML supporting the formatting of your data, but
about you knowing how to properly encode it.
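
To illustrate the encoding point, here is a small Python sketch (Python is an assumption; the thread does not name a language). It stores a body containing `<br>`, `&`, and a newline, then reads it back unchanged. The helper names `build_document` and `read_body` are hypothetical; `escape` is the standard-library function from `xml.sax.saxutils`.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def build_document(body_text):
    """Serialize a <document> whose <body> holds body_text; ElementTree escapes it."""
    doc = ET.Element("document")
    ET.SubElement(doc, "body").text = body_text
    return ET.tostring(doc, encoding="unicode")


def read_body(xml_text):
    """Parse the document and return the body text with escapes decoded."""
    return ET.fromstring(xml_text).findtext("body")


def escape_by_hand(body_text):
    """Escape &, < and > manually, for code that writes XML with string concatenation."""
    return escape(body_text)
```

A body written with `build_document` and read with `read_body` comes back with its `<br>` and its newline intact, so no special end-of-line character is needed.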


* Wade Preston Shearer [Tue, 16 Sep 2003 at 18:04 -0600]
<quote>
i have a question for all of you xml parsers.

i have an xml file like...

        <data>
                <document>
                        <id></id>
                        <title></title>
                        <subtitle></subtitle>
                        <date></date>
                        <author></author>
                </document>
                <document>
                ...etc...
        </data>


...that has many entries. as you can see, this xml file is acting like
a database of documents. the information in the xml is what i will call
the "title" information for the document. my question is... what is the
best way to put in the "body" of the document?


the reason that i ask is because, putting it all in a tag, like this...

        <data>
                <document>
                        <id></id>
                        <title></title>
                        <subtitle></subtitle>
                        <date></date>
                        <author></author>
                        <body>Since every penny I earn depends on copyright
                        protection, I'm all in favor of reasonable laws to
                        do the job.

                        But there's something kind of sad about the
                        recording industry's indecent passion to punish
                        the "criminals" who are violating their rights.

                        Copyright is a temporary monopoly granted by
                        the government -- it creates the legal fiction
                        that a piece of writing or composing (or, as
                        technologies were created, a recorded
                        performance) is property and can only be sold by
                        those who have been licensed to do so by the
                        copyright holder.
                        </body>
                </document>
                <document>
                ...etc...
        </data>


...doesn't seem to make much sense, because...


        1.      it is so much bigger than the other content

        2.      you'd have to put a <br> in there for it to display
                correctly... or perhaps a "\n" would work... or
                something... ?

        3.      if i want to parse the entire directory... it'll get
                really big if the "body" tags contain more than a
                sentence or two


How should i approach this? The best thing that I have come up with is
to use the info in the "id" tag to reference a text file by the same
name ($id . ".txt"). So, the xml file would then be a list of the
documents' "title" info while the "body" data is in separate text files.


Although everything isn't in the same "database," it seems like a good
idea because I want to be able to use the data in the xml file for both
displaying one document at a time and also displaying a listing of all
of the "titles."


If all of the data was the same size (ie: one sentence or less), this
would be easy, but, hey... it's never easy. For those of you that have
done an XML parsing project like this before, with data that requires a
hard return within data that is within a tag...


how have you done it?

does xml have a special character for end-of-lines like this?

does this approach (separate txt files for the "body") sound good?


eager for your input,


wade



____________________
BYU Unix Users Group
http://uug.byu.edu/
___________________________________________________________________
List Info: http://uug.byu.edu/cgi-bin/mailman/listinfo/uug-list
</quote>

--
 Hans Fugal                 | De gustibus non disputandum est.
 http://hans.fugal.net/     | Debian, vim, mutt, ruby, text, gpg
 http://gdmxml.fugal.net/   | WindowMaker, gaim, UTF-8, RISC, JS Bach
---------------------------------------------------------------------
GnuPG Fingerprint: 6940 87C5 6610 567F 1E95  CB5E FC98 E8CD E0AA D460