Re: [basex-talk] Passing through entities unchanged when serializing

2019-09-09 Thread Bridger Dyson-Smith
Ha ha, awesome Liam! Thank you for clarifying!

Best,
Bridger




Re: [basex-talk] Passing through entities unchanged when serializing

2019-09-09 Thread Andreas Mixich
On Tue, Sep 10, 2019 at 3:37 AM Liam R. E. Quin wrote:

> XML entities are expanded by the XML parser, so by the time XQuery (or
> XSLT) sees the document, they are gone.
>

Ah, yes, I totally forgot about that! Thanks for clarification!


Re: [basex-talk] Passing through entities unchanged when serializing

2019-09-09 Thread Liam R. E. Quin
On Tue, 2019-09-10 at 02:59 +0200, Andreas Mixich wrote:
> I wonder why the serialization behaves that way. It does not make
> sense to
> me. If a user has the need to escape XML, it should be thorough,
> shouldn't it?

XML entities are expanded by the XML parser, so by the time XQuery (or
XSLT) sees the document, they are gone.

Consider an entity like
blackgreySteven">



It'd be really complex to have that visible to XPath and to have to
write, e.g.
/students/entity(*)/person

If it's an external parsed entity, it's visible in that the base-uri
property changes, but that's all.

Character entities, such as a named entity for ŗ, are just special cases of
general entities, and XML does not distinguish them. I wish it did, but
we never got back to that work after publishing XML 1.0.
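A small sketch of the point about expansion, assuming an XQuery 3.1 processor (BaseX, for instance) whose `fn:parse-xml` processes the internal DTD subset, as XML 1.0 requires even of non-validating parsers. The entity name `who` is made up for the example.

```xquery
(: Hypothetical entity "who"; inside a single-quoted XQuery literal,
   &amp; stands for "&", so the XML parser receives &who; and &#x157;. :)
let $doc := parse-xml(
  '<!DOCTYPE p [ <!ENTITY who "Steven"> ]><p>&amp;who; &amp;#x157;</p>'
)
return (
  (: only the replacement text is left in the tree :)
  string($doc/p),
  (: re-serializing shows no DOCTYPE and no entity references :)
  serialize($doc)
)
```

Both the general entity and the character reference should come back as plain characters; nothing in the resulting tree records that an entity was ever there.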

Liam

-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Web slave for vintage clipart http://www.fromoldbooks.org/



Re: [basex-talk] Passing through entities unchanged when serializing

2019-09-09 Thread Bridger Dyson-Smith
Hi Andreas -
I'm not sure (way outside of my wheelhouse :), but I think it's because
arbitrary serialization can generate invalid XML, so having a character map
makes the possible invalidity explicit?
Now that I've typed that, I'm not sure if that captures the rationale or
not. :) In any case, here's what the specification has to say[1].

Best,
Bridger

[1] https://www.w3.org/TR/xslt-xquery-serialization-31/#character-maps





Re: [basex-talk] Passing through entities unchanged when serializing

2019-09-09 Thread Andreas Mixich
I wonder why the serialization behaves that way. It does not make sense to
me. If a user has the need to escape XML, it should be thorough, shouldn't
it?


-- 
Minden jót, all the best, Alles Gute,
Andreas Mixich


Re: [basex-talk] Passing through entities unchanged when serializing

2019-09-09 Thread Liam R. E. Quin
On Mon, 2019-09-09 at 15:04 +0200, Andreas Mixich wrote:
> when serializing a string that contains literal XML with entities,
> how do I pass through those entities unchanged?

One way is to use a character map, as Bridger Dyson-Smith described.
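For a string like the one in the question, the character map can also be handed straight to `fn:serialize` instead of going through a parameter document. This is a sketch assuming a processor that accepts the XQuery 3.1 map form of the options, where `use-character-maps` is a map from single characters to replacement strings; the input string is made up to mirror the example.

```xquery
(: Hypothetical input: literal XML held in a string, with an apostrophe
   that should come out as the entity reference &apos; :)
let $input := "<p>Lorem ipsum dolor sit amet, ' consectetur adipisicing elit.</p>"
return serialize(
  $input,
  map {
    'method': 'xml',
    (: mapped characters are written out as the map string, unescaped;
       "&amp;apos;" is the XQuery literal for the text &apos; (writing
       "&apos;" here would already be expanded to "'" by the query parser) :)
    'use-character-maps': map { "'": "&amp;apos;" }
  }
)
```

The `<` and `>` in the string are still escaped, because the string is serialized as a text node; only the apostrophe goes through the map.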

Sometimes another way can be to have a version of the DTD in which the
replacement text of the entity marks the presence of the entity, e.g.

but this will affect full-text searching of course.
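The DTD-based alternative might look like this. Everything here is an illustrative assumption rather than anything from the thread: the marker element `ent`, its `name` attribute, and the last step that turns each marker back into entity-reference text are just one possible way to use such a DTD.

```xquery
(: Hypothetical "marker" DTD: the entity copy expands to an <ent/> element
   recording its own name, instead of to the character it normally stands for.
   In a single-quoted XQuery literal, '' is one apostrophe and &amp; is "&". :)
let $doc := parse-xml(
  '<!DOCTYPE p [ <!ENTITY copy "<ent name=''copy''/>"> ]><p>Text &amp;copy; 2019</p>'
)
(: rebuild the text, turning each marker back into an entity reference :)
return string-join(
  for $n in $doc/p/node()
  return if ($n instance of element(ent))
         then '&amp;' || $n/@name || ';'
         else string($n)
)
```

The result is a plain string, so it suits text output; serializing it with the XML method would escape the `&` again.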

Liam




Re: [basex-talk] Passing through entities unchanged when serializing

2019-09-09 Thread Bridger Dyson-Smith
Hi Andreas -

Have you tried using different serialization options? For example,
serialize.xq:
```
declare option output:method "xml";
declare option output:parameter-document "map.xml";
declare variable $input := "Lorem ipsum,  dolor sit amet.";
serialize($input)
```

map.xml:
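```
<output:serialization-parameters xmlns:output="http://www.w3.org/2010/xslt-xquery-serialization">
  <output:use-character-maps>
    <output:character-map character="…" map-string="…"/>
  </output:use-character-maps>
</output:serialization-parameters>
```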

When run in the BaseX GUI, I get
`&lt;p&gt;Lorem ipsum,  dolor sit amet.&lt;/p&gt;`,
which might be closer?

I think you might have been seeing the default 'basex' serialization
method at work (see [1] for more).
Hope that helps.
Best,
Bridger

[1] http://docs.basex.org/wiki/Serialization



[basex-talk] Passing through entities unchanged when serializing

2019-09-09 Thread Andreas Mixich
Hi,

when serializing a string that contains literal XML with entities, how do
I pass through those entities unchanged?
Example:

let $input := "Lorem ipsum  dolor sit amet "
return serialize($input)

results in:

&lt;p&gt;Lorem ipsum dolor sit amet, ' consectetur adipisicing
elit.&lt;/p&gt;

but I want:

&lt;p&gt;Lorem ipsum dolor sit amet,  consectetur adipisicing
elit.&lt;/p&gt;
