Re: [RFC] Roundtripping namespaced xml documents for data.xml

2014-06-10 Thread Herwig Hochleitner
Hi Paul,

2014-06-06 17:28 GMT+02:00 Paul Gearon gea...@gmail.com:

 I've split up the namespace pretty much the same way that you did (since I
 totally agreed with that). To that end I ought to put your name down as the
 author, though I typically end up changing things in each file, so it's got
 my mark on it too. Is the standard for the namespace :author tag to use a
 comma separated string?


I'm not aware of any convention here. To me it's also fine if you just put
your name.


 Parsing goes to the representation tier, which is compatible with the
 existing clojure.data.xml output. As mentioned before, the changes are a
 new namespaces field, and metadata to hold the contextual namespaces (so
 it's invisible to printing, equality, etc). For the test data I've used,
 emitting the representation tier generates the original XML (with new
 formatting).


OK, I don't think that adding a separate namespaces field is compatible
with existing output, but it's a change that makes sense in a certain light.
I would still oppose that change, in case it should be proposed.

This data format contains each of the elements of your infoset
 representation. So your model tier example in the design doc of:
<D:propfind xmlns:D="DAV:" />

 Maps to your representation of infoset:

 {:tag ::dav/propfind :attrs {} :in-scope {"D" "DAV:" "xml" xml-uri}
 :namespace-attrs {::xmlns/D "DAV:"}}

 Whereas my code represents the same data with:

 (with-meta
  {:tag :D/propfind :attrs {} :namespaces {"D" "DAV:"} :content ()}
  {:xml "http://www.w3.org/XML/1998/namespace"})

 So it's the same info, but represented differently.


My proposed model tier would be leaning on xpath, because it fits with
clojure.core/=. So that would be:

(with-meta
  {:tag #xml/name "{DAV:}propfind" :attrs {} :content ()}
  {:clojure.data.xml/namespaces (xml/to-ns {"D" "DAV:"})})
;; xml and xmlns prefixes are added by to-ns

I don't intend to implement the infoset or the dom representation at all,
sorry that the design page is unclear on that.

The ::dav/propfind convention is convenience for notation, it doesn't
interact well with equality, so there would be a special version of =
and/or a converter.

I'm sure that you are aware that your proposed representation doesn't work
with clojure.core/=, so I'm not going to dissect the issues there.
Let me add that people might want to store arbitrary metadata on parsed
xml, so using a namespaced key there is prudent.
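
A minimal sketch of the equality point, assuming javax.xml.namespace.QName stands in for the resolved name type (the #xml/name reader tag and xml/to-ns above are not used here, and the ::namespaces metadata key is only illustrative). Metadata is ignored by clojure.core/=, and QName compares by namespace URI and local part, so two parses that differ only in prefix compare equal:

```clojure
(import 'javax.xml.namespace.QName)

;; The same element parsed from two documents, one binding prefix D and
;; one binding prefix E to "DAV:". The tag is a resolved name; the prefix
;; context is kept in metadata under a namespaced key (illustrative name).
(def parsed-with-d
  (with-meta {:tag (QName. "DAV:" "propfind") :attrs {} :content ()}
    {::namespaces {"D" "DAV:"}}))

(def parsed-with-e
  (with-meta {:tag (QName. "DAV:" "propfind") :attrs {} :content ()}
    {::namespaces {"E" "DAV:"}}))

;; Metadata takes no part in equality.
(= parsed-with-d parsed-with-e)
;; => true

;; QName.equals looks at namespace URI and local part, not the prefix.
(= (QName. "DAV:" "propfind" "D") (QName. "DAV:" "propfind" "E"))
;; => true
```

The prefix information is still available through (meta parsed-with-d) for an emitter that wants to reproduce the original prefixes.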

I've also implemented a resolve-xml function (I'd have liked resolve, but
 I hate reusing names that appear in clojure.core). When applied to data in
 the representation tier it generates a version of the model tier. This uses
 QNames for tags and attributes. They are still QNames even when they do not
 have a namespace, since QNames support this. This transformation was easy,
 as it just applied the namespace info to the keywords in the representation
 tier. I was expecting to create a new Element type to represent the model
 tier, but in the end I didn't see the need, since the existing types do
 everything needed for the model tier as well. The main reason I can think
 of to create a new element type for the model tier would be for a tree
 walker to be able to use protocol dispatch on the element and not the
 contents of the element.


Cool, maybe I can steal your converter? I have only implemented parsing
directly to model tier, for now.

- a function to go from the model tier to the representation tier. I hope
 to do this soon.


So, basically a prefix-assigner for model tier


 - graceful handling of un-mapped prefixes


I'm still thinking hard about erroring out in this case, as StAX does, IIRC


  - handling a default prefix (the pseudo-raw section of the design
 document)


I think that this can be modeled as the empty prefix.


 - The QName reader


I've already written that, I'm just not sure how to deploy it in a contrib
project.

Thanks for following up on this, I'll make sure and ping you, before I get
back to my version.

cheers

-- 
You received this message because you are subscribed to the Google
Groups Clojure group.
To post to this group, send email to clojure@googlegroups.com
Note that posts from new members are moderated - please be patient with your 
first post.
To unsubscribe from this group, send email to
clojure+unsubscr...@googlegroups.com
For more options, visit this group at
http://groups.google.com/group/clojure?hl=en
--- 
You received this message because you are subscribed to the Google Groups 
Clojure group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to clojure+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [RFC] Roundtripping namespaced xml documents for data.xml

2014-05-28 Thread Christophe Grand
On Tue, May 27, 2014 at 2:05 AM, Herwig Hochleitner
hhochleit...@gmail.com wrote:

 My use case is parsing and generating webdav and I'd much rather work with
 model tier than just looking at tag names and assuming that the prefixes
 are set up correctly. Yes, that might exclude clients, that generate
 invalid webdav. Yes, I think that's a feature.


On the other hand working at the representation tier may exclude correct
clients emitting funky prefixes.


-- 
On Clojure http://clj-me.cgrand.net/
Clojure Programming http://clojurebook.com
Training, Consulting & Contracting http://lambdanext.eu/



Re: [RFC] Roundtripping namespaced xml documents for data.xml

2014-05-26 Thread Paul Gearon
Hi Herwig,

First, I have to start with an apology, and it's to do with this section:

If I compare the a:bar element from both documents with func-deep-equal
 then they should compare equal, despite the fact that the a:bar qname
 resolves differently in each context.


 Are you saying that deep-equals compares the actual serialization (with
 prefixes), or that the default equality should do that?
 If so, please read the infoset specification:
 http://www.w3.org/TR/xml-infoset/#infoitem.element
 The relevant quote for this case:


I really don't know how I sent that.

I confess to writing it (mea culpa), but it felt wrong as I did so. I went
back to the docs, and realized that I had been thinking of how attribute
values are not affected by namespace. Stupid error, and I don't know what I
was thinking.

The weird thing is that I *thought* that I edited the email to remove this
error. Maybe I mentioned it twice and only removed it once? Anyway, it was
wrong. I apologise for confusing the conversation with this furphy.

Subsequent parts of the email were written with the correct understanding.

Incidentally, looking this over did make me think again about storing the
fully resolved QNames in the parsed document. I still came down on not
doing so, but some of my reasoning is more due to non-XML arguments.

Back to the main thread (I'll remove the part about deep-equals not
comparing resolved qnames)...

On Sun, May 25, 2014 at 8:19 AM, Herwig Hochleitner
hhochleit...@gmail.com wrote:

 2014-05-23 19:01 GMT+02:00 Paul Gearon gea...@gmail.com:


 I still argue for using keywords here. The existing API uses them, and
 they're a natural fit.


 The fact that they have established meaning (for denoting literal xml
 names + their prefix in a given serialization) in the API is exactly one of
 my reasons for not wanting to change those semantics. Having a separate
 tier for representing raw, serialized xml is fine. It's what the library
 currently does. Adding new behavior, like proper xml namespacing, warrants
 adding a new tier.


 The one real problem is elements would need a special hash/equality check
 to deal with namespace contexts (presuming that fn:deep-equal maps to
 Object.equals).


 I had been thinking along those lines before. Check out the dev thread, I
 try to argue that at first there, but at some point I realized that it
 makes no sense to strictly stick to the raw representation and compute
 other info just on the fly. The key observation is, that a tree of raw,
 prefixed xml doesn't make any sense without xmlns declarations, whereas
 they are redundant, as soon as the tree has been resolved.


My point of view is that processing real-world XML rarely needs the fully
resolved URIs. Instead, most applications only care about the element names
as they appear in the document. Also, keywords have the nice property of
being interned, which matters when parsing 20GB XML files. It's possible to
intern a QName implementation, but they will still use more memory per
element.

The counter argument is that the standard requires support for handling
fully resolved QNames, so these need to be dealt with. However, I don't
believe that the use-cases for dealing with fully resolved data comes up as
often. Also, when it does come up, my experience has been that it is
usually in smaller documents (<1MB), where efficiency of comparison is not
as paramount.

The issue here may be more of dealing with the representation tier vs. the
model tier. I will address that question more at the bottom of this email.


 To your point from below:


 I didn't follow the discussion for putting URIs into keywords, as I could
 not see why we would want this (am I missing something obvious?)


 We need the URIs for xml processing and the XmlNamespace metadata can get
 lost or not be there in the first place. Also the URI counts for equality,
 see below.
 I totally agree that it makes no sense putting them in keywords.


OK, I agree. My difference has been that I don't think that the entire URI
need exist in the element tag, but rather allow it to be built from the
element tag + the namespace context. (That would be the
representation-to-model tier mapping. I mention this, along with namespace
contexts at the end)

The keywords would need to be translated according to the current context.
 However, that approach still works for fragments that can be evaluated in
 different contexts,


 The problem is fragments that are taken out of their (xmlns -
 declaring) root-element and/or that have no XmlNamespace metadata. Apart
 from actual prefix assignment (which can be done in the emitter), QNames
 are completely context free in that regard. See the key observation above.


This is why I have advocated attaching the namespace context as metadata.

while storing URIs directly needs everything to be rebuilt for another
 context.


 Are you talking about prefix assignments? See my comment about diffing
 metadata below. I also detailed on this 

Re: [RFC] Roundtripping namespaced xml documents for data.xml

2014-05-26 Thread Herwig Hochleitner
2014-05-26 22:46 GMT+02:00 Paul Gearon gea...@gmail.com:

 Hi Herwig,

 First, I have to start with an apology,


Hi Paul,

it's alright. I have to admit, that I'm relieved that you sent that in
error.

My point of view is that processing real-world XML rarely needs the fully
 resolved URIs.


Can we agree, that an application doing namespace aware processing,
actually _does_ care about the URI? Of course, as long as no additional
xmlns attrs are introduced, it might compare by prefix, but it's still the
target uri of the prefix, that counts.


  Instead, most applications only care about the element names as they
 appear in the document. Also, keywords have the nice property of being
 interned, which matters when parsing 20GB XML files.


It's worthwhile to optimize for large files, but correctness considerations
must always come first. We don't want to encourage mostly correct xml
processing.

It's possible to intern a QName implementation, but they will still use
 more memory per element.


I don't buy that argument. Have you looked at the java types and done the
math? Also, if it's really significant, we can always use a custom deftype.

The counter argument is that the standard requires support for handling
 fully resolved QNames, so these need to be dealt with. However, I don't
 believe that the use-cases for dealing with fully resolved data comes up as
 often. Also, when it does come up, my experience has been that it is
 usually in smaller documents (<1MB), where efficiency of comparison is not
 as paramount.

 The issue here may be more of dealing with the representation tier vs. the
 model tier. I will address that question more at the bottom of this email.


In my view, if we were to make data.xml namespace aware, it needs to
actually implement the standard. Users can always stay in representation
tier if they don't like the overhead that comes with resolving. And yes, I
think the distinction between representation and model tier is critical.

OK, I agree. My difference has been that I don't think that the entire URI
 need exist in the element tag, but rather allow it to be built from the
 element tag + the namespace context. (That would be the
 representation-to-model tier mapping. I mention this, along with namespace
 contexts at the end)


As I hinted in my reply, I actually started off in the same direction: Keep
everything in representation tier and just give the user tools to resolve
the prefixes properly. You can actually follow the process in the dev
thread of how I got convinced that for default xml handling it's better to
add a model tier.

This is why I have advocated attaching the namespace context as metadata.


My current implementation of representation tier actually has that. It's
still no substitute for working with fully resolved data. Just think of
what people might do to a parsed xml tree with clojure's core functions.

 What about the QName {http://www.w3.org/1999/xhtml}body? Notice that
 :http://www.w3.org/1999/xhtml/body would be read like
 (keyword "http:" "/www.w3.org/1999/xhtml/body"). Another point that's
 already been made on the dev thread.


 Not sure what you're trying to get at with this example.


What I might have misunderstood is that I thought you argued for cgrand
and chouser's original approach of putting the namespace uri into the
keyword namespace.
All I'm trying to say is that keywords are inappropriate for storing
_resolved QNames_. They are, however, appropriate for storing prefixed
names in representation tier.

The syntax {http://www.w3.org/1999/xhtml}body is a universal name in
 Clark's notation, and it's used for describing a resolved QName.  As Clark
 points out, it's not valid to use a universal name in XML: you only use it
 in the data model.


Yes, I was talking about the data model and how to encode it in clojure
data structures. I never suggested using Clark's notation in actual
documents.


 In this case, the QName would presumably be either just body with the
 default namespace, or xhtml:body with an in-context namespace of
 xmlns:xhtml="http://www.w3.org/1999/xhtml".


I have to admit that I had gotten this part of terminology wrong in my
mental model. I thought that QName always referred to a universal name, the
way java's QName implementation does. I just learned that in the standard
qualified name refers to a possibly prefixed tag or attr name within the
serialization.

Consequently, the keyword to be constructed should look like either
 (keyword "body") or (keyword "xhtml" "body"). Somewhere nearby there will
 be metadata of {:xhtml "http://www.w3.org/1999/xhtml"}


OK, that's representation tier. We still need to recognize the fact, that
the metadata just might not be there, even if we always generate it in the
parser.

As an aside, I was curious, so I tried both full URIs and universal names
 in a couple of XML validators, and was surprised to see that they
 validated. 

Re: [RFC] Roundtripping namespaced xml documents for data.xml

2014-05-25 Thread Herwig Hochleitner
2014-05-23 19:01 GMT+02:00 Paul Gearon gea...@gmail.com:


 I still argue for using keywords here. The existing API uses them, and
 they're a natural fit.


The fact that they have established meaning (for denoting literal xml names
+ their prefix in a given serialization) in the API is exactly one of my
reasons for not wanting to change those semantics. Having a separate tier
for representing raw, serialized xml is fine. It's what the library
currently does. Adding new behavior, like proper xml namespacing, warrants
adding a new tier.


 The one real problem is elements would need a special hash/equality check
 to deal with namespace contexts (presuming that fn:deep-equal maps to
 Object.equals).


I had been thinking along those lines before. Check out the dev thread, I
try to argue that at first there, but at some point I realized that it
makes no sense to strictly stick to the raw representation and compute
other info just on the fly. The key observation is, that a tree of raw,
prefixed xml doesn't make any sense without xmlns declarations, whereas
they are redundant, as soon as the tree has been resolved.

To your point from below:


 I didn't follow the discussion for putting URIs into keywords, as I could
 not see why we would want this (am I missing something obvious?)


We need the URIs for xml processing and the XmlNamespace metadata can get
lost or not be there in the first place. Also the URI counts for equality,
see below.
I totally agree that it makes no sense putting them in keywords.


  The keywords would need to be translated according to the current
 context. However, that approach still works for fragments that can be
 evaluated in different contexts,


The problem is fragments that are taken out of their (xmlns - declaring)
root-element and/or that have no XmlNamespace metadata. Apart from actual
prefix assignment (which can be done in the emitter), QNames are completely
context free in that regard. See the key observation above.


 while storing URIs directly needs everything to be rebuilt for another
 context.


Are you talking about prefix assignments? See my comment about diffing
metadata below. I also detailed on this point in the design page.

Most possible QNames can be directly expressed as a keyword (for instance,
 the QName 㑦:㒪 can be represented as the read-able keyword :㑦/㒪). The
 keyword function is just a workaround for exotic values. While I know they
 can exist (e.g. containing \u000A), I've yet to see any QNames in the wild
 that cannot be represented as a read-able keyword.


Seen xhtml? What about the QName {http://www.w3.org/1999/xhtml}body? Notice
that :http://www.w3.org/1999/xhtml/body would be read like
(keyword "http:" "/www.w3.org/1999/xhtml/body"). Another point that's
already been made on the dev thread.


 In case I'm not clear, say I have these two docs:

 <a:foo xmlns="http://ex.com/" xmlns:a="http://a.com/">
   <a:bar xmlns:b="http://b.org" b:baz="blah"/>
 </a:foo>

 <a:foo xmlns:a="http://something.else.com/">
   <a:bar xmlns:b="http://b.org" b:baz="blah"/>
 </a:foo>

 If I compare the a:bar element from both documents with func-deep-equal
 then they should compare equal, despite the fact that the a:bar qname
 resolves differently in each context.


Are you saying that deep-equals compares the actual serialization (with
prefixes), or that the default equality should do that?
If so, please read the infoset specification:
http://www.w3.org/TR/xml-infoset/#infoitem.element

The relevant quote for this case:

*[prefix]* The namespace prefix part of the element-type name. If the name
 is unprefixed, this property has no value. Note that namespace-aware
 applications should use the namespace name rather than the prefix to
 identify elements.
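
To make the infoset point concrete, here is a rough sketch that resolves the prefixed name a:bar against the in-scope namespaces of each of the two documents above. The resolve-name helper and the two context maps are hypothetical and written only for this illustration; they are not part of data.xml or of either proposed patch. Unprefixed names are resolved against the "" prefix, which is the rule for elements (unprefixed attributes do not pick up the default namespace), and unmapped prefixes raise an error.

```clojure
(import 'javax.xml.namespace.QName)

(defn resolve-name
  "Resolves a prefixed keyword such as :a/bar against a prefix->URI map.
  Hypothetical helper, element-name rules only."
  [kw prefix->uri]
  (let [prefix (or (namespace kw) "")
        uri    (get prefix->uri prefix)]
    (cond
      uri           (QName. uri (name kw))      ; prefix (or default) is bound
      (= "" prefix) (QName. (name kw))          ; unprefixed, no default namespace
      :else         (throw (ex-info "Unmapped prefix"
                                    {:prefix prefix :name kw})))))

;; In-scope namespaces at the a:bar element of each document.
(def context-1 {""  "http://ex.com/"
                "a" "http://a.com/"
                "b" "http://b.org"})
(def context-2 {"a" "http://something.else.com/"
                "b" "http://b.org"})

;; The prefixed names are identical ...
(= :a/bar :a/bar)
;; => true

;; ... but they resolve to different namespace names,
(= (resolve-name :a/bar context-1) (resolve-name :a/bar context-2))
;; => false

;; while b:baz resolves identically in both contexts.
(= (resolve-name :b/baz context-1) (resolve-name :b/baz context-2))
;; => true
```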



 I still don't see why the reverse mapping is needed. Is it because I'm
 storing the QName in a keyword and can look up the current namespace for
 the URI, while you are storing the fully qualified name?


First, terminology: In xml the namespace _is_ the uri. The thing that you
write before the : in the serialization is a prefix. It is only an artifact
of serialization, completely meaningless except when you actually read or
write xml. So I want the user to be able to write xml without javax.xml, just
by transforming the tree back to its context-dependent keyworded
prefix-representation. So we need a way to find the (a) current prefix for
a namespace.

Sorry, I'm not following what you were getting at with this. In this
 example D and E both get mapped to the same namespace, meaning that
 <D:foo/> and <E:foo/> can be made to resolve the same way. But in a
 different context they could be different values.


Which is the reason we need to lift elements out of their context as soon
as possible. We don't want an element to change its namespace, just because
we transplant it into another xml fragment. Chouser went to great lengths
about this point, before he realized that this was exactly my goal as well.

If both the explicit declarations of namespaces 

Re: [RFC] Roundtripping namespaced xml documents for data.xml

2014-05-23 Thread Paul Gearon
Hi Herwig,

I spent some time going through the design, and an email thread
(particularly Chouser and Christophe's responses), plus I spent a bit more
time on my own implementation where it was clear that I'd missed some
things.

I've yet to go through your code in fine detail, so I'll still have some
gaps in my knowledge.

On Thu, May 22, 2014 at 2:44 PM, Herwig Hochleitner
hhochleit...@gmail.com wrote:

 2014-05-21 21:06 GMT+02:00 Paul Gearon gea...@gmail.com:


 Are QNames strictly necessary? Keywords seem to do the trick, and they
 work in nicely with what already exists.

 I know that there are some QName forms that are not readable as a
 keyword, but the XML parsing code will always call (keyword ...) and that
 holds any kind of QName,


 I've argued this at some length on the dev thread. IMO QNames are not
 necessary, but we want another datatype than keywords.
 I think the main argument for using keywords would be xml literals in
 code, and there readability (i.e. not having to use (keyword ..)) counts. A
 reader tag is far better suited for this.
 In the course of that argument, I also came up with a way to represent
 resolved names as keywords in literals. Please check out the design page
 for this.


I still argue for using keywords here. The existing API uses them, and
they're a natural fit.

The one real problem is elements would need a special hash/equality check
to deal with namespace contexts (presuming that fn:deep-equal maps to
Object.equals). The keywords would need to be translated according to the
current context. However, that approach still works for fragments that can
be evaluated in different contexts, while storing URIs directly needs
everything to be rebuilt for another context.

Most possible QNames can be directly expressed as a keyword (for instance,
the QName 㑦:㒪 can be represented as the read-able keyword :㑦/㒪). The
keyword function is just a workaround for exotic values. While I know they
can exist (e.g. containing \u000A), I've yet to see any QNames in the wild
that cannot be represented as a read-able keyword.

In case I'm not clear, say I have these two docs:

<a:foo xmlns="http://ex.com/" xmlns:a="http://a.com/">
  <a:bar xmlns:b="http://b.org" b:baz="blah"/>
</a:foo>

<a:foo xmlns:a="http://something.else.com/">
  <a:bar xmlns:b="http://b.org" b:baz="blah"/>
</a:foo>

If I compare the a:bar element from both documents with func-deep-equal
then they should compare equal, despite the fact that the a:bar qname
resolves differently in each context.

The representation I've used was only a small extension to the existing one:
#clojure.data.xml.Element{:tag :a/bar, :attrs {:b/baz "blah"}, :namespaces
{:b "http://b.org"}, :content ()}

I agree with the use of meta to handle the namespaces, since it's not
included in equality testing. Namespaces are declared on, and scoped to the
element, so it makes sense to add them as a map there (this is what I've
done). In the first case, the meta-data for the a:bar element is:
{:b "http://b.org", :a "http://a.com/", :xmlns "http://ex.com/"}

I didn't follow the discussion for putting URIs into keywords, as I could
not see why we would want this (am I missing something obvious?)

Are the reverse mappings (uri-prefix) definitely necessary? My first look
 at this made me think that they were (particularly so I could call
 XMLStreamWriter.getPrefix), but it seemed that the XmlWriter keeps enough
 state that it isn't necessary. My final code didn't need them at all.


 The XmlWriter does keep enough state, but I also want to support tree
 transformers that have the full information without needing to pipe through
 Xml{Reader,Parser}.
 uri-prefix could be reconstructed from prefix-uri in linear time, so
 again, the reason for the reverse mapping is performance.


I still don't see why the reverse mapping is needed. Is it because I'm
storing the QName in a keyword and can look up the current namespace for
the URI, while you are storing the fully qualified name?


 I was mostly considering round-tripping the data, and the parser is good
 at not repeating namespaces for child elements, so the emitter didn't need
 to either. As a result I didn't need to filter out prefix-uri bindings
 from parent elements when emitting namespaces, though that should be easy.


 What I meant are redundant prefixes, e.g. binding xmlns:D="DAV:" at the
 root element, xmlns:E="DAV:" in a child element.


Sorry, I'm not following what you were getting at with this. In this
example D and E both get mapped to the same namespace, meaning that
<D:foo/> and <E:foo/> can be made to resolve the same way. But in a
different context they could be different values.

If both the explicit declarations of namespaces on elements and current
context are stored with the element (one in the :namespaces field, the
other in metadata), then this allows resolution to be handled correctly,
while also maintaining where each namespace needs to be emitted.

If uri-prefix is needed, then a simple map would need that, yes. However,
 

Re: [RFC] Roundtripping namespaced xml documents for data.xml

2014-05-22 Thread Thomas
Slightly off topic, but how can I add a new element to an existing XML
file with data.xml? For instance I have:

<a>
  <b>
  </b>
</a>

and I want to add element c to this like this:

<a>
  <b>
    <c>
    </c>
  </b>
</a>

The documentation isn't particularly clear on how to use the library
unfortunately.

Thomas



Re: [RFC] Roundtripping namespaced xml documents for data.xml

2014-05-22 Thread Herwig Hochleitner
2014-05-21 21:06 GMT+02:00 Paul Gearon gea...@gmail.com:


 Are QNames strictly necessary? Keywords seem to do the trick, and they
 work in nicely with what already exists.

 I know that there are some QName forms that are not readable as a keyword,
 but the XML parsing code will always call (keyword ...) and that holds any
 kind of QName,


I've argued this at some length on the dev thread. IMO QNames are not
necessary, but we want another datatype than keywords.
I think the main argument for using keywords would be xml literals in
code, and there readability (i.e. not having to use (keyword ..)) counts. A
reader tag is far better suited for this.
In the course of that argument, I also came up with a way to represent
resolved names as keywords in literals. Please check out the design page
for this.

Are the reverse mappings (uri-prefix) definitely necessary? My first look
 at this made me think that they were (particularly so I could call
 XMLStreamWriter.getPrefix), but it seemed that the XmlWriter keeps enough
 state that it isn't necessary. My final code didn't need them at all.


The XmlWriter does keep enough state, but I also want to support tree
transformers that have the full information without needing to pipe through
Xml{Reader,Parser}.
uri-prefix could be reconstructed from prefix-uri in linear time, so
again, the reason for the reverse mapping is performance.
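
As a rough illustration of the linear-time reconstruction mentioned above, assuming the bindings are held in a plain Clojure map (XmlNamespaceImpl itself is not shown in this thread, so this is not its implementation):

```clojure
(require '[clojure.set :as set])

;; Forward mapping as it might look inside a child element that rebinds
;; the DAV: namespace under a second prefix.
(def prefix->uri
  {"D"   "DAV:"
   "E"   "DAV:"
   "xml" "http://www.w3.org/XML/1998/namespace"})

;; One pass over the map. When several prefixes share a URI, only one of
;; them survives; which one depends on iteration order.
(def uri->prefix (set/map-invert prefix->uri))

;; Variant that keeps every prefix bound to a URI.
(def uri->prefixes
  (reduce-kv (fn [m prefix uri]
               (update-in m [uri] (fnil conj #{}) prefix))
             {}
             prefix->uri))

(get uri->prefixes "DAV:")
;; => #{"D" "E"}
```

Both variants walk the whole forward map, which is the cost a precomputed reverse mapping avoids when prefixes are looked up for many elements.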

I was mostly considering round-tripping the data, and the parser is good at
 not repeating namespaces for child elements, so the emitter didn't need to
 either. As a result I didn't need to filter out prefix-uri bindings from
 parent elements when emitting namespaces, though that should be easy.


What I meant are redundant prefixes, e.g. binding xmlns:D="DAV:" at the
root element, xmlns:E="DAV:" in a child element.


 If uri-prefix is needed, then a simple map would need that, yes. However,
 if I needed the reverse mapping then I'd use a pair of stacks of maps - one
 for each direction.

 (BTW, a stack of maps sounds complex, but the top of the stack is just
 the new bindings merged onto the previous top of the stack).


In this case, XmlNamespaceImpl is just that, modulo the stack. It is meant
to be updated at every child element that binds xmlns prefixes, so the
stack is implicit. I don't keep the parent XmlNamespaceImpl, because an xml
element doesn't keep a parent pointer either.
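
The "stack of maps" idea discussed above can be sketched as follows. The push-bindings helper and the example bindings are hypothetical, and this is neither XmlNamespaceImpl nor Paul's code, only an illustration of merging new bindings onto the previous top of the stack:

```clojure
(defn push-bindings
  "Pushes a new scope onto the stack: the previous top of the stack with
  the element's own xmlns bindings merged over it."
  [stack new-bindings]
  (conj stack (merge (peek stack) new-bindings)))

;; Root element binds D; a child element adds E and rebinds D.
(def root-scope  (push-bindings [] {"D" "DAV:"}))
(def child-scope (push-bindings root-scope {"E" "DAV:" "D" "urn:other"}))

;; Inside the child, both bindings are visible and D is shadowed.
(= (peek child-scope) {"D" "urn:other" "E" "DAV:"})
;; => true

;; Leaving the child element restores the parent's bindings.
(= (peek (pop child-scope)) {"D" "DAV:"})
;; => true
```

Keeping only the merged map per element, without the rest of the vector, gives the "implicit stack" variant described in the reply: each element carries its full set of in-scope bindings and no parent pointer.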

Ad Thomas' question:

 Slightly off topic, but how can I add a new element to an existing XML
file with data.xml?

Since you mentioned zippers, I assume you are familiar with them. I
wholeheartedly recommend them for manipulating xml.
Enlive is also built on zippers and I think it shouldn't take too much
effort to make it work with the proposed namespace support.
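
A minimal sketch of the zipper approach for the example in the question, using clojure.zip/xml-zip on plain maps with :tag, :attrs and :content keys. The maps are written by hand here so the snippet has no dependencies; elements returned by data.xml's parser have the same three keys, so the same steps should carry over, but that is an assumption rather than something tested against the library:

```clojure
(require '[clojure.zip :as zip])

;; <a><b></b></a> as a tree of plain maps.
(def doc
  {:tag :a :attrs {} :content [{:tag :b :attrs {} :content []}]})

;; Walk down to <b>, append an empty <c>, and rebuild the tree from the root.
(def updated
  (-> (zip/xml-zip doc)
      zip/down
      (zip/append-child {:tag :c :attrs {} :content []})
      zip/root))

(= updated
   {:tag :a :attrs {}
    :content [{:tag :b :attrs {}
               :content [{:tag :c :attrs {} :content []}]}]})
;; => true
```

zip/down moves to the first child only; for documents with several children, zip/right or a predicate-based search would be needed to reach the target element.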



Re: [RFC] Roundtripping namespaced xml documents for data.xml

2014-05-21 Thread Paul Gearon
Hi Herwig,

I needed this myself and took a stab at it before it occurred to me to look
for other implementations and found yours.

I like the way that you have split the implementation of functionality into
different namespaces. The current release is a little monolithic.

One thing that bothers me is what's going on in clojure.data.xml.impl. I
haven't tried to go through all of it yet, but I didn't think that the code
needed to be that complex.

The way I handled prefix-uri mappings was to use a stack of maps, which I
thought was adequate. Your implementation of XmlNamespaceImpl has me
thinking that I've been missing something important. Could you explain why
XmlNamespaceImpl is structured the way it is?

Thanks in advance,

Paul



On Wed, Mar 26, 2014 at 10:34 AM, Herwig Hochleitner hhochleit...@gmail.com
 wrote:

 Hi,

 I'm taking a stab at namespaced xml support:
 http://dev.clojure.org/jira/browse/DXML-4

 I've uploaded a patch, that should implement 1:1 roundtripping, fully
 preserving prefixes and xmlns declarations:
 http://dev.clojure.org/jira/secure/attachment/12899/roundtrip-documents.patch

 This doesn't implement any advanced serialization or deserialization
 strategies described in the design page:
 http://dev.clojure.org/display/DXML/Fuller+XML+support
 However, it allows such strategies to be implemented by transforming
 clojure data structures, hence it should be a suitable common
 representation for any namespaced xml needs.

 I'd like to work on some namespace-related treewalking next, most
 importantly normalizing prefixes and default namespaces, so that one can
 actually parse namespaced xml.

 Meanwhile, I'd be delighted if you could try out the patch on any
 (well-formed) namespaced xml you have at hand and see, if it roundtrips
 correctly.

 kind regards



Re: [RFC] Roundtripping namespaced xml documents for data.xml

2014-05-21 Thread Herwig Hochleitner
2014-05-21 19:31 GMT+02:00 Paul Gearon gea...@gmail.com:

 I needed this myself and took a stab at it before it occurred to me to
 look for other implementations and found yours.


Hi Paul,

good to hear you are interested in this effort.

 I like the way that you have split the implementation of functionality into
 different namespaces. The current release is a little monolithic.


The patch I attached to DXML-4 [1] contains just the minimal changes to make
roundtripping work. No refactoring.

I've started to implement a much more advanced approach to namespaced xml
in my repo [2], the design of which came from feedback on clojure-dev
thread [3] and is documented in a new design page [4].

This approach will add another data tier (and make it the default), where
tags and attribute names will be represented as QNames and xmlns attributes
won't be present in :attrs.

 One thing that bothers me is what's going on in clojure.data.xml.impl. I
 haven't tried to go through all of it yet, but I didn't think that the code
 needed to be that complex.


Good thing that you didn't bother consuming it all: impl currently has a
lot of code that will get deleted again, basically all the stuff for
automagically mixing raw and resolved data.
The relevant parts for now are mostly XmlNamespace and the reader tag
implementations.

 The way I handled prefix-uri mappings was to use a stack of maps, which I
 thought was adequate. Your implementation of XmlNamespaceImpl has me
 thinking that I've been missing something important. Could you explain why
 XmlNamespaceImpl is structured the way it is?


Apart from basic bidirectional mapping functionality, XmlNamespaceImpl is
built in such a way that:
- Adding mappings is semantically equivalent to a child element declaring
xmlns attributes.
- A new prefix for an already established mapping won't displace the
current reverse mapping (uri-prefix), but gets recorded as an alternative.
- Remapping a prefix to a new uri, or deleting it, will establish an
alternative prefix for that uri, in FIFO manner.

The aim is to strip out redundant xmlns attributes in the emitter, while
retaining xml semantics. There is indeed something missing: Two
XmlNamespaces must be diffable, to discover which xmlns attributes actually
have to be emitted.
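
As a rough illustration of that diffing step, here is a minimal sketch. It
assumes each namespace context is just a plain prefix-to-uri map rather than
an XmlNamespaceImpl; the name xmlns-to-emit is hypothetical, and the sketch
ignores prefixes that a child undeclares:

```clojure
(defn xmlns-to-emit
  "Returns the prefix->uri bindings of child-ns that are new or rebound
   relative to parent-ns, i.e. the xmlns attributes the child must declare."
  [parent-ns child-ns]
  (into {}
        (remove (fn [[prefix uri]]
                  ;; already in scope with the same uri: redundant, skip it
                  (= uri (get parent-ns prefix)))
                child-ns)))

(def parent {"D" "DAV:"})
(def child  {"D" "DAV:" "x" "urn:example:x"})

;; Only the binding the parent does not already provide is left over.
(def needed (xmlns-to-emit parent child))
```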

I'm sure it could be built a lot simpler. It's currently basically a
product of making its tests work [5]. It could probably be made to work
with stacks of maps, but then each lookup would involve a linear search, no?

--

What do you think of the proposed design?

[1] http://dev.clojure.org/jira/browse/DXML-4
[2] https://github.com/bendlas/data.xml
[3] https://groups.google.com/d/topic/clojure-dev/2ckb0NxJTlQ/discussion
[4] http://dev.clojure.org/display/DXML/Namespaced+XML
[5]
https://github.com/bendlas/data.xml/blob/master/src/test/clojure/clojure/data/xml/test_namespace.clj

kind regards



Re: [RFC] Roundtripping namespaced xml documents for data.xml

2014-05-21 Thread Paul Gearon
Hi Herwig,

Thanks for the response.

I'll remove some of the less relevant bits as I reply inline...

On Wed, May 21, 2014 at 2:26 PM, Herwig Hochleitner
hhochleit...@gmail.com wrote:

 2014-05-21 19:31 GMT+02:00 Paul Gearon gea...@gmail.com:

 I like the way that you have split the implementation of functionality
 into different namespaces. The current release is a little monolithic.


 The patch I attached to DXML-4 [1] are just minimal changes to make
 roundtripping work. No refactoring.

 I've started to implement a much more advanced approach to namespaced xml
 in my repo [2], the design of which came from feedback on clojure-dev
 thread [3] and is documented in a new design page [4].


I wasn't really paying attention to the patch, since I saw your link to
your fork.

Someone mentioned the thread to me, but I hadn't found it yet. Thanks for
the link. I'll read up on it, along with the design page. I'll continue to
comment while I'm here, though I may get some of my answers as I read more.

 This approach will add another data tier (and make it the default), where
 tags and attribute names will be represented as QNames and xmlns attributes
 won't be present in :attrs.


Are QNames strictly necessary? Keywords seem to do the trick, and they fit
in nicely with what already exists.

I know that there are some QName forms that are not readable as a keyword,
but the XML parsing code will always call (keyword ...), and that holds any
kind of QName.
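
For illustration, a small sketch of a prefixed name held in a keyword. It
uses only clojure.core; the D prefix and the propfind name are borrowed from
the DAV example earlier in the thread, and the var names are hypothetical:

```clojure
;; Build a tag keyword from a prefix and a local name, as a parser might.
(def tag (keyword "D" "propfind"))

(def tag-prefix (namespace tag)) ; the prefix part of the keyword
(def tag-local  (name tag))      ; the local-name part of the keyword

;; For a readable name the result is the same value as the literal.
;; keyword does not validate its arguments, so names the reader would
;; reject as literals can still be constructed at runtime.
(def same-as-literal? (= tag :D/propfind))
```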


 One thing that bothers me is what's going on in clojure.data.xml.impl. I
 haven't tried to go through all of it yet, but I didn't think that the code
 needed to be that complex.


 Good thing that you didn't bother consuming it all, impl currently has a
 lot of code, that will get deleted again. Basically all the stuff for
 automagically mixing raw and resolved data.
 The relevant parts for now are mostly XmlNamespace and the reader tag
 implementations.

 The way I handled prefix-uri mappings was to use a stack of maps, which I
 thought was adequate. Your implementation of XmlNamespaceImpl has me
 thinking that I've been missing something important. Could you explain why
 XmlNamespaceImpl is structured the way it is?


 Apart from basic bidirectional mapping functionality, XmlNamespaceImpl is
 built in such a way that:
 - Adding mappings is semantically equivalent to a child element declaring
 xmlns attributes
 - A new prefix for an already established mapping won't displace the
 current reverse mapping (uri-prefix), but get recorded as an alternative
 - Remapping a prefix to a new uri or deleting it, will establish an
 alternative prefix for that uri, in FIFO manner


Are the reverse mappings (uri-prefix) definitely necessary? My first look
at this made me think that they were (particularly so I could call
XMLStreamWriter.getPrefix), but it seemed that the XmlWriter keeps enough
state that it isn't necessary. My final code didn't need them at all.

I thought I covered the various cases, but I could well believe that I
missed one, since there are so many ways to represent the data. Do you have
some good examples?

 The aim is to strip out redundant xmlns attributes in the emitter, while
 retaining xml semantics. There is indeed something missing: Two
 XmlNamespaces must be diffable, to discover which xmlns attributes actually
 have to be emitted.


I was mostly considering round-tripping the data, and the parser is good at
not repeating namespaces for child elements, so the emitter didn't need to
either. As a result I didn't need to filter out prefix-uri bindings from
parent elements when emitting namespaces, though that should be easy.


 I'm sure it could be built a lot simpler. It's currently basically a
 product of making its tests work [5]. It could probably be made to work
 with stacks of maps, but then each lookup would involve a linear search, no?


If uri-prefix is needed, then a simple map would need that, yes. However,
if I needed the reverse mapping then I'd use a pair of stacks of maps - one
for each direction.

(BTW, a stack of maps sounds complex, but the top of the stack is just
the new bindings merged onto the previous top of the stack).
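
A minimal sketch of what I mean, using a vector as the stack. The function
names are hypothetical and this is not the code from either implementation:

```clojure
(defn push-bindings
  "Enters an element: the new scope is the element's own bindings merged
   onto the previous top of the stack."
  [stack bindings]
  (conj stack (merge (peek stack) bindings)))

(defn pop-bindings
  "Leaves an element, restoring the parent's scope."
  [stack]
  (pop stack))

(defn lookup
  "Resolves a prefix against the innermost scope with one map lookup."
  [stack prefix]
  (get (peek stack) prefix))

(def s0 [{"xml" "http://www.w3.org/XML/1998/namespace"}])
(def s1 (push-bindings s0 {"D" "DAV:"}))
(def s2 (push-bindings s1 {"D" "urn:example:other"})) ; child rebinds D
```

Because every level already holds the merged bindings, a lookup only consults
the top map and never walks down the stack.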



 What do you think of the proposed design?


I will take the time to go through them, thanks. Even if it ends up
different to how I'd do it, it'll be the core library, and I need to work
with it no matter what.

Thanks for being on top of this. I'm looking to reimplement a parser in
Clojure when 99% of the world is using a single Java library for this job.
That's no good for diversity (as OpenSSL showed recently) and also because
I can hardly say I'm offering an alternative system when core elements are
built on the same library that everyone else uses. It's a W3C standard, so
it makes ridiculously heavy use of namespaces. :)

Regards,
Paul


Re: [RFC] Roundtripping namespaced xml documents for data.xml

2014-05-21 Thread Thomas
Hi,

Speaking of Clojure and XML, what is the preferred way of dealing with XML 
in Clojure these days? In the past I have used clojure.xml and clojure.zip. 
Is clojure.data.xml the best way to do this now?

TIA,

Thomas



Re: [RFC] Roundtripping namespaced xml documents for data.xml

2014-05-21 Thread Paul Gearon
My understanding is that this is the central library for XML, so
theoretically it's preferred. This is why I'm trying to use it. Also, it's
one of the few implementations using StAX, which is the best way to do lazy
streaming (though core.async makes SAX a viable option again).

However, without namespaces there are numerous applications where data.xml
won't work, so it's still immature.

Paul


On Wed, May 21, 2014 at 3:47 PM, Thomas th.vanderv...@gmail.com wrote:

 Hi,

 Speaking of Clojure and XML, what is the preferred way of dealing with XML
 in Clojure these days? In the past I have used clojure.xml and clojure.zip.
 Is clojure.data.xml the best way to do this now?

 TIA,

 Thomas



[RFC] Roundtripping namespaced xml documents for data.xml

2014-03-26 Thread Herwig Hochleitner
Hi,

I'm taking a stab at namespaced xml support:
http://dev.clojure.org/jira/browse/DXML-4

I've uploaded a patch that should implement 1:1 roundtripping, fully
preserving prefixes and xmlns declarations:
http://dev.clojure.org/jira/secure/attachment/12899/roundtrip-documents.patch

This doesn't implement any advanced serialization or deserialization
strategies described in the design page:
http://dev.clojure.org/display/DXML/Fuller+XML+support
However, it allows such strategies to be implemented by transforming
clojure data structures, hence it should be a suitable common
representation for any namespaced xml needs.

I'd like to work on some namespace-related treewalking next, most
importantly normalizing prefixes and default namespaces, so that one can
actually parse namespaced xml.

Meanwhile, I'd be delighted if you could try out the patch on any
(well-formed) namespaced xml you have at hand and see if it roundtrips
correctly.

kind regards
