Re: [oXygen-user] Xpath and Saxon return tabs as text

Wendell Piez Tue, 09 Sep 2008 07:25:54 -0700

Hi,

In the meantime, Philipp should be aware that there is generally only a loose binding between a schema (or DTD) and a document, such that (other things being equal) processors will not automatically strip whitespace-only text nodes from documents without explicit instruction to do so. This is by design, since schemas are not always available to processors, and indeed some operations can and should be able to run without schemas. Whitespace stripping without a schema is dangerous and can frequently result in corrupt data where whitespace was stripped improperly.
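As a rough illustration of that default behaviour, here is a minimal sketch using Python's standard xml.dom.minidom rather than Saxon or Oxygen. The tiny document and the helper name text_nodes are made up for the example; the point is only that a parser with no stripping instruction reports the indentation as text nodes, much as //text() does.

```python
from xml.dom import minidom

# Hypothetical miniature of the indented document from the thread
# (tabs and newlines between the elements are whitespace-only text).
DOC = """<TEI>
\t<teiHeader>
\t\t<title>MS Einsiedeln</title>
\t</teiHeader>
</TEI>"""

def text_nodes(node):
    """Yield every text node below `node` in document order (roughly //text())."""
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            yield child
        else:
            yield from text_nodes(child)

dom = minidom.parseString(DOC)
all_text = [t.data for t in text_nodes(dom)]

# Explicit instruction: keep only nodes that contain non-whitespace characters.
non_blank = [s for s in all_text if s.strip()]

print(len(all_text))  # whitespace-only nodes are counted here
print(non_blank)      # only the title text survives the explicit filter
```

In XPath itself, the comparable explicit filter would be something like //text()[normalize-space()], which selects only text nodes with non-whitespace content.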

Accordingly, although the XPath 2.0/XQuery family of technologies provides this feature, Philipp may have to get used to its not always being available, for example when using XPath 1.0.

In general, it's something to watch out for; automatic whitespace stripping can easily fall into the category of "be careful what you wish for".
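To make the risk concrete, here is a small sketch, again with Python's xml.dom.minidom and a made-up mixed-content paragraph (the helper string_value and the sample are assumptions for illustration, not anything from Saxon). The single space between the two inline elements is a whitespace-only text node, so blanket stripping fuses the words.

```python
from xml.dom import minidom

# Hypothetical mixed-content paragraph: the space between the two <ex>
# elements is a whitespace-only text node, and it is significant.
DOC = "<p><ex>in</ex> <ex>ime</ex></p>"

def string_value(node, strip_blank):
    """Concatenate descendant text, optionally dropping whitespace-only nodes."""
    parts = []
    for child in node.childNodes:
        if child.nodeType == child.TEXT_NODE:
            if strip_blank and not child.data.strip():
                continue  # simulate blanket whitespace stripping
            parts.append(child.data)
        else:
            parts.append(string_value(child, strip_blank))
    return "".join(parts)

p = minidom.parseString(DOC).documentElement
kept = string_value(p, strip_blank=False)
stripped = string_value(p, strip_blank=True)

print(repr(kept))      # word boundary preserved
print(repr(stripped))  # word boundary lost
```

This is why stripping is normally tied to knowledge of the content model: it is safe inside element-only content such as <TEI> or <teiHeader>, but not inside mixed content such as <p>.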


Cheers,
Wendell

At 11:23 AM 9/3/2008, Sorin wrote:

Hello,
Saxon 9 has an option for stripping whitespace nodes, but Oxygen allows you to set it only for transformations (Preferences -> XML -> XSLT-FO-XQuery -> XSLT -> Saxon -> Saxon-B/SA). If you set the above option to strip whitespace nodes and run an XSLT transform that uses the expression //text(), you can see that the list of nodes does not contain such nodes. In the next version we will add this Saxon 9 option for XPath expressions too.

...

Philipp Steinkrüger wrote:
Dear Oxygen-Users,
I am having a problem with an indented XML file. The file looks like this:
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns="http://www.i-d-e.de/ns/1.0">
    <teiHeader>
        <fileDesc>
            <titleStmt>
                <title>MS Einsiedeln</title>
            </titleStmt>
            <publicationStmt>
                <p>publicationsStmt empty</p>
            </publicationStmt><sourceDesc>
                <p>sourceDesc empty</p>
            </sourceDesc></fileDesc>
    </teiHeader>
    <text>
        <body>
            <div>
                <div>
                    <div>
<p><c>D</c>ie gotheitit beloſen<lb/>in dem vater n<ex>atur</ex>elichdar<lb/>vmbeit er alvermvgende<lb/>vnd enpfat niht von ite<lb/>des<gap reason=""/> er elber nit en it an<lb/>iner go<unclear
>                                 >tl</unclear>icher macht wan<lb/>ers
weelich i<ex>n</ex> ime vnd
an<lb/>imeelben beloſen hat<space unit="letters" quantity="1"
                        /></p>
   </div>
     </div>
       </div>
    </body>
  </text>
</TEI>
Now, using the following XPath 2.0 expression: //text(), the tabs are returned as text nodes, for example the first tab before the tag <teiHeader>. In fact, my DTD does not allow #PCDATA inside <TEI>, but the document is validated without any problems. To me this seems kind of schizophrenic, or am I mistaken? Btw: the same file in XMLSpy with its built-in XSLT engine, as well as the MS XML parser with the same XPath expression, does not return the tabs as text nodes.
Any ideas?
Philipp
PS: I am using Oxygen 9.3



======================================================================
Wendell Piez                            mailto:[EMAIL PROTECTED]
Mulberry Technologies, Inc.                http://www.mulberrytech.com
17 West Jefferson Street                    Direct Phone: 301/315-9635
Suite 207                                          Phone: 301/315-9631
Rockville, MD  20850                                 Fax: 301/315-8285
----------------------------------------------------------------------
  Mulberry Technologies: A Consultancy Specializing in SGML and XML
======================================================================
