Re: [RFC] Roundtripping namespaced xml documents for data.xml
Hi Paul,

2014-06-06 17:28 GMT+02:00 Paul Gearon gea...@gmail.com:

> I've split up the namespace pretty much the same way that you did (since I totally agreed with that). To that end I ought to put your name down as the author, though I typically end up changing things in each file, so it's got my mark on it too. Is the standard for the namespace :author tag to use a comma separated string?

I'm not aware of any convention here. To me it's also fine if you just put your name.

> Parsing goes to the representation tier, which is compatible with the existing clojure.data.xml output. As mentioned before, the changes are a new namespaces field, and metadata to hold the contextual namespaces (so it's invisible to printing, equality, etc). For the test data I've used, emitting the representation tier generates the original XML (with new formatting).

OK, I don't think that adding a separate namespaces field is compatible with existing output, but it's a change that makes sense in a certain light. I would still oppose that change, in case it should be proposed.

> This data format contains each of the elements of your infoset representation. So your model tier example in the design doc of:
>
>     <D:propfind xmlns:D="DAV:"/>
>
> maps to your representation of infoset:
>
>     {:tag ::dav/propfind
>      :attrs {}
>      :in-scope {"D" "DAV:" "xml" xml-uri}
>      :namespace-attrs {::xmlns/D "DAV:"}}
>
> Whereas my code represents the same data with:
>
>     (with-meta {:tag :D/propfind :attrs {} :namespaces {"D" "DAV:"} :content ()}
>                {:xml "http://www.w3.org/XML/1998/namespace"})
>
> So it's the same info, but represented differently.

My proposed model tier would be leaning on xpath, because it fits with clojure.core/=. So that would be:

    (with-meta {:tag #xml/name "{DAV:}propfind" :attrs {} :content ()}
               {:clojure.data.xml/namespaces (xml/to-ns {"D" "DAV:"})})
    ;; xml and xmlns prefixes are added by to-ns

I don't intend to implement the infoset or the dom representation at all, sorry that the design page is unclear on that.
The ::dav/propfind convention is a convenience for notation; it doesn't interact well with equality, so there would be a special version of = and/or a converter. I'm sure that you are aware that your proposed representation doesn't work with clojure.core/=, so I'm not going to dissect the issues there. Let me add that people might want to store arbitrary metadata on parsed xml, so using a namespaced key there is prudent.

> I've also implemented a resolve-xml function (I'd have liked resolve, but I hate reusing names that appear in clojure.core). When applied to data in the representation tier it generates a version of the model tier. This uses QNames for tags and attributes. They are still QNames even when they do not have a namespace, since QNames support this. This transformation was easy, as it just applied the namespace info to the keywords in the representation tier. I was expecting to create a new Element type to represent the model tier, but in the end I didn't see the need, since the existing types do everything needed for the model tier as well. The main reason I can think of to create a new element type for the model tier would be for a tree walker to be able to use protocol dispatch on the element and not the contents of the element.

Cool, maybe I can steal your converter? I have only implemented parsing directly to model tier, for now.

> - a function to go from the model tier to the representation tier. I hope to do this soon.

So, basically a prefix-assigner for model tier.

> - graceful handling of un-mapped prefixes

I'm still thinking hard about erroring out in this case, as StAX does, IIRC.

> - handling a default prefix (the pseudo-raw section of the design document)

I think that this can be modeled as the empty prefix.

> - The QName reader

I've already written that, I'm just not sure how to deploy it in a contrib project.

Thanks for following up on this, I'll make sure to ping you before I get back to my version.
cheers
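A minimal sketch of the kind of resolve step discussed in this message: it takes a representation-tier element (prefixed keyword tags plus a :namespaces field) and returns model-tier data whose tags are javax.xml.namespace.QName instances. The names resolve-name and resolve-tree, the string-keyed prefix map, and the use of "" as the key for the default namespace are assumptions made for illustration; this is not the code from either implementation in the thread.

```clojure
(import '(javax.xml.namespace QName))

(defn resolve-name
  "Resolve a possibly prefixed keyword against ctx, a map of
  prefix string -> namespace URI. default-uri applies to unprefixed names."
  [ctx kw default-uri]
  (let [prefix (namespace kw)
        local  (name kw)]
    (cond
      (nil? prefix)          (QName. (or default-uri "") local)
      (contains? ctx prefix) (QName. (get ctx prefix) local prefix)
      ;; Error out on un-mapped prefixes (one of the options discussed).
      :else (throw (ex-info "Unmapped prefix" {:prefix prefix :name kw})))))

(defn resolve-tree
  "Walk a representation-tier element and return model-tier data."
  [ctx {:keys [tag attrs namespaces content]}]
  (let [ctx (merge ctx namespaces)]        ; declarations scope to this subtree
    {:tag     (resolve-name ctx tag (get ctx ""))
     ;; Unprefixed attributes are in no namespace, so no default applies.
     :attrs   (into {} (for [[k v] attrs] [(resolve-name ctx k nil) v]))
     :content (mapv #(if (map? %) (resolve-tree ctx %) %) content)}))

(def d-doc {:tag :D/propfind :attrs {} :namespaces {"D" "DAV:"} :content []})
(def e-doc {:tag :E/propfind :attrs {} :namespaces {"E" "DAV:"} :content []})

;; Different prefixes, same namespace: equal once resolved, because
;; QName equality looks only at namespace URI and local part.
(def same? (= (resolve-tree {} d-doc) (resolve-tree {} e-doc)))
```

Since the resolved tags are plain QName values, clojure.core/= works on the result without a special equality function, which is the property the model tier is after.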
Re: [RFC] Roundtripping namespaced xml documents for data.xml
On Tue, May 27, 2014 at 2:05 AM, Herwig Hochleitner hhochleit...@gmail.com wrote:

> My use case is parsing and generating webdav and I'd much rather work with model tier than just looking at tag names and assuming that the prefixes are set up correctly. Yes, that might exclude clients that generate invalid webdav. Yes, I think that's a feature.

On the other hand working at the representation tier may exclude correct clients emitting funky prefixes.

--
On Clojure http://clj-me.cgrand.net/
Clojure Programming http://clojurebook.com
Training, Consulting Contracting http://lambdanext.eu/
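A small sketch of the point about funky prefixes, assuming elements shaped like the representation tier discussed in the thread (a prefixed keyword tag plus a string-keyed prefix map). The two predicate names are hypothetical.

```clojure
(import '(javax.xml.namespace QName))

;; The same WebDAV element as two correct clients might send it.
(def client-a {:tag :D/propfind   :namespaces {"D" "DAV:"}})
(def client-b {:tag :dav/propfind :namespaces {"dav" "DAV:"}})

(defn prefixed-propfind?
  "Representation-tier check: trusts that the prefix is D."
  [el]
  (= :D/propfind (:tag el)))

(defn resolved-propfind?
  "Model-tier check: compares namespace URI and local name only."
  [{:keys [tag namespaces]}]
  (= (QName. "DAV:" "propfind")
     (QName. (get namespaces (namespace tag) "") (name tag))))

(def results
  {:prefixed (mapv prefixed-propfind? [client-a client-b])
   :resolved (mapv resolved-propfind? [client-a client-b])})
```

The prefix-based check rejects client-b even though its document is valid, while the resolved check accepts both.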
Re: [RFC] Roundtripping namespaced xml documents for data.xml
Hi Herwig,

First, I have to start with an apology, and it's to do with this section:

>> If I compare the a:bar element from both documents with func-deep-equal then they should compare equal, despite the fact that the a:bar qname resolves differently in each context.

> Are you saying that deep-equals compares the actual serialization (with prefixes), or that the default equality should do that? If so, please read the infoset specification: http://www.w3.org/TR/xml-infoset/#infoitem.element The relevant quote for this case:

I really don't know how I sent that. I confess to writing it (mea culpa), but it felt wrong as I did so. I went back to the docs, and realized that I had been thinking of how attribute values are not affected by namespace. Stupid error, and I don't know what I was thinking. The weird thing is that I *thought* that I edited the email to remove this error. Maybe I mentioned it twice and only removed it once? Anyway, it was wrong. I apologise for confusing the conversation with this furphy. Subsequent parts of the email were written with the correct understanding.

Incidentally, looking this over did make me think again about storing the fully resolved QNames in the parsed document. I still came down on not doing so, but some of my reasoning is more due to non-XML arguments.

Back to the main thread (I'll remove the part about deep-equals not comparing resolved qnames)...

On Sun, May 25, 2014 at 8:19 AM, Herwig Hochleitner hhochleit...@gmail.com wrote:

> 2014-05-23 19:01 GMT+02:00 Paul Gearon gea...@gmail.com:

>> I still argue for using keywords here. The existing API uses them, and they're a natural fit.

> The fact that they have established meaning (for denoting literal xml names + their prefix in a given serialization) in the API is exactly one of my reasons for not wanting to change those semantics. Having a separate tier for representing raw, serialized xml is fine. It's what the library currently does.
> Adding new behavior, like proper xml namespacing, warrants adding a new tier.

>> The one real problem is elements would need a special hash/equality check to deal with namespace contexts (presuming that fn:deep-equal maps to Object.equals).

> I had been thinking along those lines before. Check out the dev thread, I try to argue that at first there, but at some point I realized that it makes no sense to strictly stick to the raw representation and compute other info just on the fly. The key observation is that a tree of raw, prefixed xml doesn't make any sense without xmlns declarations, whereas they are redundant as soon as the tree has been resolved.

My point of view is that processing real-world XML rarely needs the fully resolved URIs. Instead, most applications only care about the element names as they appear in the document. Also, keywords have the nice property of being interned, which matters when parsing 20GB XML files. It's possible to intern a QName implementation, but they will still use more memory per element.

The counter argument is that the standard requires support for handling fully resolved QNames, so these need to be dealt with. However, I don't believe that the use-cases for dealing with fully resolved data come up as often. Also, when they do come up, my experience has been that it is usually in smaller documents (1MB), where efficiency of comparison is not as paramount.

The issue here may be more of dealing with the representation tier vs. the model tier. I will address that question more at the bottom of this email.

> To your point from below:

>> I didn't follow the discussion for putting URIs into keywords, as I could not see why we would want this (am I missing something obvious?)

> We need the URIs for xml processing and the XmlNamespace metadata can get lost or not be there in the first place. Also the URI counts for equality, see below. I totally agree that it makes no sense putting them in keywords.

OK, I agree.
My difference has been that I don't think that the entire URI need exist in the element tag, but rather allow it to be built from the element tag + the namespace context. (That would be the representation-to-model tier mapping. I mention this, along with namespace contexts at the end)

>> The keywords would need to be translated according to the current context. However, that approach still works for fragments that can be evaluated in different contexts,

> The problem is fragments that are taken out from their (xmlns-declaring) root element and/or that have no XmlNamespace metadata. Apart from actual prefix assignment (which can be done in the emitter), QNames are completely context free in that regard. See the key observation above.

This is why I have advocated attaching the namespace context as metadata.

>> while storing URIs directly needs everything to be rebuilt for another context.

> Are you talking about prefix assignments? See my comment about diffing metadata below. I also detailed on this
Re: [RFC] Roundtripping namespaced xml documents for data.xml
2014-05-26 22:46 GMT+02:00 Paul Gearon gea...@gmail.com:

> Hi Herwig, First, I have to start with an apology,

Hi Paul, it's alright. I have to admit that I'm relieved that you sent that in error.

> My point of view is that processing real-world XML rarely needs the fully resolved URIs.

Can we agree that an application doing namespace-aware processing actually _does_ care about the URI? Of course, as long as no additional xmlns attrs are introduced, it might compare by prefix, but it's still the target uri of the prefix that counts.

> Instead, most applications only care about the element names as they appear in the document. Also, keywords have the nice property of being interned, which matters when parsing 20GB XML files.

It's worthwhile to optimize for large files, but correctness considerations must always come first. We don't want to encourage mostly correct xml processing.

> It's possible to intern a QName implementation, but they will still use more memory per element.

I don't buy that argument. Have you looked at the java types and done the math? Also, if it's really significant, we can always use a custom deftype.

> The counter argument is that the standard requires support for handling fully resolved QNames, so these need to be dealt with. However, I don't believe that the use-cases for dealing with fully resolved data come up as often. Also, when they do come up, my experience has been that it is usually in smaller documents (1MB), where efficiency of comparison is not as paramount. The issue here may be more of dealing with the representation tier vs. the model tier. I will address that question more at the bottom of this email.

In my view, if we were to make data.xml namespace aware, it needs to actually implement the standard. Users can always stay in representation tier if they don't like the overhead that comes with resolving. And yes, I think the distinction between representation and model tier is critical.

> OK, I agree.
> My difference has been that I don't think that the entire URI need exist in the element tag, but rather allow it to be built from the element tag + the namespace context. (That would be the representation-to-model tier mapping. I mention this, along with namespace contexts at the end)

As I hinted in my reply, I actually started off in the same direction: Keep everything in representation tier and just give the user tools to resolve the prefixes properly. You can actually follow the process in the dev thread of how I got convinced that for default xml handling it's better to add a model tier.

> This is why I have advocated attaching the namespace context as metadata.

My current implementation of representation tier actually has that. It's still no substitute for working with fully resolved data. Just think of what people might do to a parsed xml tree with clojure's core functions.

>> What about the QName {http://www.w3.org/1999/xhtml}body? Notice that :http://www.w3.org/1999/xhtml/body would be read like (keyword "http:" "/www.w3.org/1999/xhtml/body"). Another point that's already been made on the dev thread.

> Not sure what you're trying to get at with this example.

What I might have misunderstood is that I thought you argued for cgrand's and chouser's original approach of putting the namespace uri into the keyword namespace. All I'm trying to say is that keywords are inappropriate for storing _resolved QNames_. They are, however, appropriate for storing prefixed names in representation tier.

> The syntax {http://www.w3.org/1999/xhtml}body is a universal name in Clark's notation, and it's used for describing a resolved QName. As Clark points out, it's not valid to use a universal name in XML: you only use it in the data model.

Yes, I was talking about the data model and how to encode it in clojure data structures. I never suggested using Clark's notation in actual documents.
> In this case, the QName would presumably be either just body with the default namespace, or xhtml:body with an in-context namespace of xmlns:xhtml="http://www.w3.org/1999/xhtml".

I have to admit that I had gotten this part of terminology wrong in my mental model. I thought that QName always referred to a universal name, the way java's QName implementation does. I just learned that in the standard, qualified name refers to a possibly prefixed tag or attr name within the serialization.

> Consequently, the keyword to be constructed should look like either (keyword "body") or (keyword "xhtml" "body"). Somewhere nearby there will be metadata of {:xhtml "http://www.w3.org/1999/xhtml"}

OK, that's representation tier. We still need to recognize the fact that the metadata just might not be there, even if we always generate it in the parser.

> As an aside, I was curious, so I tried both full URIs and universal names in a couple of XML validators, and was surprised to see that they validated.
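To make the terminology concrete, here is a short sketch contrasting a resolved (universal) name with the prefixed names of the representation tier. It uses only javax.xml.namespace.QName and clojure.core; the var names are made up for illustration.

```clojure
(import '(javax.xml.namespace QName))

;; Model tier: a resolved (universal) name.
(def xhtml-body (QName. "http://www.w3.org/1999/xhtml" "body"))

;; QName prints in Clark's notation and can be parsed back from it.
(def clark (str xhtml-body))
(def reparsed (QName/valueOf clark))

;; Representation tier: prefixed names as they appear in a serialization.
(def prefixed   (keyword "xhtml" "body"))
(def unprefixed (keyword "body"))

;; A keyword can be constructed with a URI as its namespace, but its printed
;; form is not a reliable literal to read back (the concern raised above).
(def uri-kw (keyword "http://www.w3.org/1999/xhtml" "body"))
(def uri-kw-printed (pr-str uri-kw))
```

Here clark is the string {http://www.w3.org/1999/xhtml}body, while uri-kw-printed is the slash-laden :http://www.w3.org/1999/xhtml/body form that the thread warns about.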
Re: [RFC] Roundtripping namespaced xml documents for data.xml
2014-05-23 19:01 GMT+02:00 Paul Gearon gea...@gmail.com:

> I still argue for using keywords here. The existing API uses them, and they're a natural fit.

The fact that they have established meaning (for denoting literal xml names + their prefix in a given serialization) in the API is exactly one of my reasons for not wanting to change those semantics. Having a separate tier for representing raw, serialized xml is fine. It's what the library currently does. Adding new behavior, like proper xml namespacing, warrants adding a new tier.

> The one real problem is elements would need a special hash/equality check to deal with namespace contexts (presuming that fn:deep-equal maps to Object.equals).

I had been thinking along those lines before. Check out the dev thread, I try to argue that at first there, but at some point I realized that it makes no sense to strictly stick to the raw representation and compute other info just on the fly. The key observation is that a tree of raw, prefixed xml doesn't make any sense without xmlns declarations, whereas they are redundant as soon as the tree has been resolved.

To your point from below:

> I didn't follow the discussion for putting URIs into keywords, as I could not see why we would want this (am I missing something obvious?)

We need the URIs for xml processing and the XmlNamespace metadata can get lost or not be there in the first place. Also the URI counts for equality, see below. I totally agree that it makes no sense putting them in keywords.

> The keywords would need to be translated according to the current context. However, that approach still works for fragments that can be evaluated in different contexts,

The problem is fragments that are taken out from their (xmlns-declaring) root element and/or that have no XmlNamespace metadata. Apart from actual prefix assignment (which can be done in the emitter), QNames are completely context free in that regard. See the key observation above.
> while storing URIs directly needs everything to be rebuilt for another context.

Are you talking about prefix assignments? See my comment about diffing metadata below. I also detailed on this point in the design page.

> Most possible QNames can be directly expressed as a keyword (for instance, the QName 㑦:㒪 can be represented as the read-able keyword :㑦/㒪). The keyword function is just a workaround for exotic values. While I know they can exist (e.g. containing \u000A), I've yet to see any QNames in the wild that cannot be represented as a read-able keyword.

Seen xhtml? What about the QName {http://www.w3.org/1999/xhtml}body? Notice that :http://www.w3.org/1999/xhtml/body would be read like (keyword "http:" "/www.w3.org/1999/xhtml/body"). Another point that's already been made on the dev thread.

> In case I'm not clear, say I have these two docs:
>
>     <a:foo xmlns="http://ex.com/" xmlns:a="http://a.com/">
>       <a:bar xmlns:b="http://b.org" b:baz="blah"/>
>     </a:foo>
>
>     <a:foo xmlns:a="http://something.else.com/">
>       <a:bar xmlns:b="http://b.org" b:baz="blah"/>
>     </a:foo>
>
> If I compare the a:bar element from both documents with func-deep-equal then they should compare equal, despite the fact that the a:bar qname resolves differently in each context.

Are you saying that deep-equals compares the actual serialization (with prefixes), or that the default equality should do that? If so, please read the infoset specification: http://www.w3.org/TR/xml-infoset/#infoitem.element

The relevant quote for this case:

    *[prefix]* The namespace prefix part of the element-type name. If the name is unprefixed, this property has no value. Note that namespace-aware applications should use the namespace name rather than the prefix to identify elements.

> I still don't see why the reverse mapping is needed. Is it because I'm storing the QName in a keyword and can look up the current namespace for the URI, while you are storing the fully qualified name?

First, terminology: In xml the namespace _is_ the uri.
The thing that you write before the : in the serialization is a prefix. It is only an artifact of serialization, completely meaningless except when you actually read or write xml. So I want the user to be able to write xml without javax.xml, just by transforming the tree back to its context-dependent keyworded prefix-representation. So we need a way to find the (a) current prefix for a namespace.

> Sorry, I'm not following what you were getting at with this. In this example D and E both get mapped to the same namespace, meaning that <D:foo/> and <E:foo/> can be made to resolve the same way. But in a different context they could be different values.

Which is the reason we need to lift elements out of their context as soon as possible. We don't want an element to change its namespace just because we transplant it into another xml fragment. Chouser went to great lengths about this point, before he realized that this was exactly my goal as well.

> If both the explicit declarations of namespaces
Re: [RFC] Roundtripping namespaced xml documents for data.xml
Hi Herwig,

I spent some time going through the design, and an email thread (particularly Chouser and Christophe's responses), plus I spent a bit more time on my own implementation where it was clear that I'd missed some things. I've yet to go through your code in fine detail, so I'll still have some gaps in my knowledge.

On Thu, May 22, 2014 at 2:44 PM, Herwig Hochleitner hhochleit...@gmail.com wrote:

> 2014-05-21 21:06 GMT+02:00 Paul Gearon gea...@gmail.com:

>> Are QNames strictly necessary? Keywords seem to do the trick, and they work in nicely with what already exists. I know that there are some QName forms that are not readable as a keyword, but the XML parsing code will always call (keyword ...) and that holds any kind of QName,

> I've argued this at some length on the dev thread. IMO QNames are not necessary, but we want another datatype than keywords. I think the main argument for using keywords would be xml literals in code, and there readability (i.e. not having to use (keyword ..)) counts. A reader tag is far better suited for this. In the course of that argument, I also came up with a way to represent resolved names as keywords in literals. Please check out the design page for this.

I still argue for using keywords here. The existing API uses them, and they're a natural fit. The one real problem is elements would need a special hash/equality check to deal with namespace contexts (presuming that fn:deep-equal maps to Object.equals). The keywords would need to be translated according to the current context. However, that approach still works for fragments that can be evaluated in different contexts, while storing URIs directly needs everything to be rebuilt for another context.

Most possible QNames can be directly expressed as a keyword (for instance, the QName 㑦:㒪 can be represented as the read-able keyword :㑦/㒪). The keyword function is just a workaround for exotic values. While I know they can exist (e.g.
containing \u000A), I've yet to see any QNames in the wild that cannot be represented as a read-able keyword.

In case I'm not clear, say I have these two docs:

    <a:foo xmlns="http://ex.com/" xmlns:a="http://a.com/">
      <a:bar xmlns:b="http://b.org" b:baz="blah"/>
    </a:foo>

    <a:foo xmlns:a="http://something.else.com/">
      <a:bar xmlns:b="http://b.org" b:baz="blah"/>
    </a:foo>

If I compare the a:bar element from both documents with func-deep-equal then they should compare equal, despite the fact that the a:bar qname resolves differently in each context.

The representation I've used was only a small extension to the existing one:

    #clojure.data.xml.Element{:tag :a/bar, :attrs {:b/baz "blah"}, :namespaces {:b "http://b.org"}, :content ()}

I agree with the use of meta to handle the namespaces, since it's not included in equality testing. Namespaces are declared on, and scoped to the element, so it makes sense to add them as a map there (this is what I've done). In the first case, the meta-data for the a:bar element is:

    {:b "http://b.org", :a "http://a.com/", :xmlns "http://ex.com/"}

I didn't follow the discussion for putting URIs into keywords, as I could not see why we would want this (am I missing something obvious?)

>> Are the reverse mappings (uri-prefix) definitely necessary? My first look at this made me think that they were (particularly so I could call XMLStreamWriter.getPrefix), but it seemed that the XmlWriter keeps enough state that it isn't necessary. My final code didn't need them at all.

> The XmlWriter does keep enough state, but I also want to support tree transformers that have the full information without needing to pipe through Xml{Reader,Parser}. uri-prefix could be reconstructed from prefix-uri in linear time, so again, the reason for the reverse mapping is performance.

I still don't see why the reverse mapping is needed. Is it because I'm storing the QName in a keyword and can look up the current namespace for the URI, while you are storing the fully qualified name?
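A sketch of what the metadata approach implies for clojure.core/=, using plain maps in place of the Element record (an assumption; the shapes of the element and of the context maps are copied from the message above). The helper uri-of-tag is hypothetical.

```clojure
;; The a:bar element as described above; a plain map stands in for the record.
(def element
  {:tag :a/bar :attrs {:b/baz "blah"} :namespaces {:b "http://b.org"} :content ()})

;; The same element with the in-scope namespaces of each document as metadata.
(def bar-in-doc-1
  (with-meta element {:b "http://b.org" :a "http://a.com/" :xmlns "http://ex.com/"}))
(def bar-in-doc-2
  (with-meta element {:b "http://b.org" :a "http://something.else.com/"}))

;; Metadata never takes part in equality ...
(def equal-by-core? (= bar-in-doc-1 bar-in-doc-2))

(defn uri-of-tag
  "Look up the namespace URI that the tag's prefix is bound to in the metadata."
  [el]
  (get (meta el) (keyword (namespace (:tag el)))))

;; ... although the two tags are bound to different namespace URIs.
(def same-namespace? (= (uri-of-tag bar-in-doc-1) (uri-of-tag bar-in-doc-2)))
```

With this representation equal-by-core? is true while same-namespace? is false, so a namespace-aware comparison needs either a custom equality function or a resolved representation.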
>> I was mostly considering round-tripping the data, and the parser is good at not repeating namespaces for child elements, so the emitter didn't need to either. As a result I didn't need to filter out prefix-uri bindings from parent elements when emitting namespaces, though that should be easy.

> What I meant are redundant prefixes, e.g. binding xmlns:D="DAV:" at the root element, xmlns:E="DAV:" in a child element.

Sorry, I'm not following what you were getting at with this. In this example D and E both get mapped to the same namespace, meaning that <D:foo/> and <E:foo/> can be made to resolve the same way. But in a different context they could be different values.

If both the explicit declarations of namespaces on elements and current context are stored with the element (one in the :namespaces field, the other in metadata), then this allows resolution to be handled correctly, while also maintaining where each namespace needs to be emitted.

>> If uri-prefix is needed, then a simple map would need that, yes. However,
Re: [RFC] Roundtripping namespaced xml documents for data.xml
Slightly off topic, but how can I add a new element to an existing XML file with data.xml? For instance I have:

    <a>
      <b>
      </b>
    </a>

and I want to add element c to this like this:

    <a>
      <b>
        <c>
        </c>
      </b>
    </a>

The documentation isn't particularly clear on how to use the library, unfortunately.

Thomas
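One possible approach to this question is a zipper. The sketch below uses clojure.zip from core on plain :tag/:attrs/:content maps; it assumes the parsed document has that shape and leaves parsing and emitting with data.xml out, so it is an illustration rather than the library's documented recipe.

```clojure
(require '[clojure.zip :as zip])

;; <a><b></b></a> as a tree of :tag/:attrs/:content maps, the shape that
;; clojure.zip/xml-zip navigates.
(def doc
  {:tag :a :attrs {} :content [{:tag :b :attrs {} :content []}]})

(def updated
  (-> (zip/xml-zip doc)
      zip/down                                            ; move to <b>
      (zip/append-child {:tag :c :attrs {} :content []})  ; add <c> as its last child
      zip/root))                                          ; rebuild and return the whole tree
```

The original doc is left untouched; updated is a new tree in which b contains c.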
Re: [RFC] Roundtripping namespaced xml documents for data.xml
2014-05-21 21:06 GMT+02:00 Paul Gearon gea...@gmail.com:

> Are QNames strictly necessary? Keywords seem to do the trick, and they work in nicely with what already exists. I know that there are some QName forms that are not readable as a keyword, but the XML parsing code will always call (keyword ...) and that holds any kind of QName,

I've argued this at some length on the dev thread. IMO QNames are not necessary, but we want another datatype than keywords. I think the main argument for using keywords would be xml literals in code, and there readability (i.e. not having to use (keyword ..)) counts. A reader tag is far better suited for this. In the course of that argument, I also came up with a way to represent resolved names as keywords in literals. Please check out the design page for this.

> Are the reverse mappings (uri-prefix) definitely necessary? My first look at this made me think that they were (particularly so I could call XMLStreamWriter.getPrefix), but it seemed that the XmlWriter keeps enough state that it isn't necessary. My final code didn't need them at all.

The XmlWriter does keep enough state, but I also want to support tree transformers that have the full information without needing to pipe through Xml{Reader,Parser}. uri-prefix could be reconstructed from prefix-uri in linear time, so again, the reason for the reverse mapping is performance.

> I was mostly considering round-tripping the data, and the parser is good at not repeating namespaces for child elements, so the emitter didn't need to either. As a result I didn't need to filter out prefix-uri bindings from parent elements when emitting namespaces, though that should be easy.

What I meant are redundant prefixes, e.g. binding xmlns:D="DAV:" at the root element, xmlns:E="DAV:" in a child element.

> If uri-prefix is needed, then a simple map would need that, yes. However, if I needed the reverse mapping then I'd use a pair of stacks of maps - one for each direction.
> (BTW, a stack of maps sounds complex, but the top of the stack is just the new bindings merged onto the previous top of the stack).

In this case, XmlNamespaceImpl is just that, modulo the stack. It is meant to be updated at every child element that binds xmlns prefixes, so the stack is implicit. I don't keep the parent XmlNamespaceImpl, because an xml element doesn't keep a parent pointer either.

ad Thomas' question:

> Slightly Off topic, but how can I add new an element to an existing XML file with data.xml.

Since you mentioned zippers, I assume you are familiar with them. I wholeheartedly recommend them for manipulating xml. Enlive is also built on zippers and I think it shouldn't take too much effort to make it work with the proposed namespace support.
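The stack-of-maps idea mentioned above can be sketched in a few lines. All function names here are hypothetical, prefix maps are assumed to be string -> URI, and this is not the XmlNamespaceImpl from the patch.

```clojure
(defn push-bindings
  "Enter an element: the new top is the previous top with this element's
  xmlns declarations merged in."
  [stack declared]
  (conj stack (merge (peek stack) declared)))

(defn pop-bindings
  "Leave an element: drop its scope again."
  [stack]
  (pop stack))

(defn prefix->uri
  "Look a prefix up in the current (top) scope."
  [stack prefix]
  (get (peek stack) prefix))

(defn uri->prefixes
  "Rebuild the reverse mapping from the current top in linear time.
  Several prefixes may point at one URI, so values are sets."
  [stack]
  (reduce (fn [m [prefix uri]] (update-in m [uri] (fnil conj #{}) prefix))
          {}
          (peek stack)))

;; xmlns:D="DAV:" on the root element, xmlns:E="DAV:" on a child.
(def root-scope  (push-bindings [{}] {"D" "DAV:"}))
(def child-scope (push-bindings root-scope {"E" "DAV:"}))
```

Inside the child both prefixes resolve to DAV:, and popping back to the root scope makes E unbound again. Keeping a second stack for the reverse direction, as suggested in the thread, would trade the linear rebuild in uri->prefixes for extra memory.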
Re: [RFC] Roundtripping namespaced xml documents for data.xml
Hi Herwig,

I needed this myself and took a stab at it before it occurred to me to look for other implementations and found yours.

I like the way that you have split the implementation of functionality into different namespaces. The current release is a little monolithic.

One thing that bothers me is what's going on in clojure.data.xml.impl. I haven't tried to go through all of it yet, but I didn't think that the code needed to be that complex. The way I handled prefix-uri mappings was to use a stack of maps, which I thought was adequate. Your implementation of XmlNamespaceImpl has me thinking that I've been missing something important. Could you explain why XmlNamespaceImpl is structured the way it is?

Thanks in advance,
Paul

On Wed, Mar 26, 2014 at 10:34 AM, Herwig Hochleitner hhochleit...@gmail.com wrote:

> Hi,
>
> I'm taking a stab at namespaced xml support: http://dev.clojure.org/jira/browse/DXML-4
>
> I've uploaded a patch that should implement 1:1 roundtripping, fully preserving prefixes and xmlns declarations: http://dev.clojure.org/jira/secure/attachment/12899/roundtrip-documents.patch
>
> This doesn't implement any advanced serialization or deserialization strategies described in the design page: http://dev.clojure.org/display/DXML/Fuller+XML+support
>
> However, it allows such strategies to be implemented by transforming clojure data structures, hence it should be a suitable common representation for any namespaced xml needs.
>
> I'd like to work on some namespace-related treewalking next, most importantly normalizing prefixes and default namespaces, so that one can actually parse namespaced xml.
>
> Meanwhile, I'd be delighted if you could try out the patch on any (well-formed) namespaced xml you have at hand and see if it roundtrips correctly.
>
> kind regards
Re: [RFC] Roundtripping namespaced xml documents for data.xml
2014-05-21 19:31 GMT+02:00 Paul Gearon gea...@gmail.com:

> I needed this myself and took a stab at it before it occurred to me to look for other implementations and found yours.

Hi Paul, good to hear you are interested in this effort.

> I like the way that you have split the implementation of functionality into different namespaces. The current release is a little monolithic.

The patch I attached to DXML-4 [1] is just the minimal set of changes to make roundtripping work. No refactoring. I've started to implement a much more advanced approach to namespaced xml in my repo [2], the design of which came from feedback on a clojure-dev thread [3] and is documented in a new design page [4]. This approach will add another data tier (and make it the default), where tags and attribute names will be represented as QNames and xmlns attributes won't be present in :attrs.

> One thing that bothers me is what's going on in clojure.data.xml.impl. I haven't tried to go through all of it yet, but I didn't think that the code needed to be that complex.

Good thing that you didn't bother consuming it all: impl currently has a lot of code that will get deleted again, basically all the stuff for automagically mixing raw and resolved data. The relevant parts for now are mostly XmlNamespace and the reader tag implementations.

> The way I handled prefix-uri mappings was to use a stack of maps, which I thought was adequate. Your implementation of XmlNamespaceImpl has me thinking that I've been missing something important. Could you explain why XmlNamespaceImpl is structured the way it is?
Apart from basic bidirectional mapping functionality, XmlNamespaceImpl is built in such a way that:

- Adding mappings is semantically equivalent to a child element declaring xmlns attributes
- A new prefix for an already established mapping won't displace the current reverse mapping (uri-prefix), but gets recorded as an alternative
- Remapping a prefix to a new uri, or deleting it, will establish an alternative prefix for that uri, in FIFO manner

The aim is to strip out redundant xmlns attributes in the emitter, while retaining xml semantics. There is indeed something missing: two XmlNamespaces must be diffable, to discover which xmlns attributes actually have to be emitted.

I'm sure it could be built a lot simpler. It's currently basically a product of making its tests work [5]. It could probably be made to work with stacks of maps, but then each lookup would involve a linear search, no?

What do you think of the proposed design?

[1] http://dev.clojure.org/jira/browse/DXML-4
[2] https://github.com/bendlas/data.xml
[3] https://groups.google.com/d/topic/clojure-dev/2ckb0NxJTlQ/discussion
[4] http://dev.clojure.org/display/DXML/Namespaced+XML
[5] https://github.com/bendlas/data.xml/blob/master/src/test/clojure/clojure/data/xml/test_namespace.clj

kind regards
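To make the behaviour described in this message concrete, here is a rough sketch in plain Clojure maps. It is not the actual XmlNamespaceImpl: the names empty-ns, bind, prefix-for and xmlns-to-emit are invented for illustration, and prefix deletion is left out. It keeps a forward map (prefix to uri) and a reverse map (uri to prefixes in binding order), so that a second prefix for the same uri is only an alternative and remapping a prefix promotes the next-oldest one. The last function shows the kind of diff an emitter would need in order to skip redundant xmlns attributes.

```clojure
;; Hypothetical structure for illustration; not the actual XmlNamespaceImpl.
(def empty-ns {:prefix->uri {} :uri->prefixes {}})

(defn bind
  "Bind prefix to uri, as a child element's xmlns declaration would."
  [ns-map prefix uri]
  (let [old-uri (get-in ns-map [:prefix->uri prefix])]
    (cond-> ns-map
      ;; Remapping: the prefix stops being a candidate for its old uri,
      ;; so the next-oldest prefix (if any) becomes the reverse mapping.
      (and old-uri (not= old-uri uri))
      (update-in [:uri->prefixes old-uri] #(vec (remove #{prefix} %)))
      ;; New binding: record the prefix after any existing ones.
      (not= old-uri uri)
      (-> (assoc-in [:prefix->uri prefix] uri)
          (update-in [:uri->prefixes uri] (fnil conj []) prefix)))))

(defn prefix-for
  "Reverse lookup: the oldest prefix still bound to uri, without a search."
  [ns-map uri]
  (first (get-in ns-map [:uri->prefixes uri])))

(defn xmlns-to-emit
  "Bindings of child that differ from parent, i.e. the xmlns attributes
  an emitter would still have to write on the child element."
  [parent child]
  (into {}
        (remove (fn [[prefix uri]]
                  (= uri (get-in parent [:prefix->uri prefix]))))
        (:prefix->uri child)))

(def parent (-> empty-ns (bind "D" "DAV:") (bind "E" "DAV:")))
(def child  (-> parent (bind "D" "urn:other") (bind "x" "urn:x")))

(prefix-for parent "DAV:")   ;=> "D"
(prefix-for child "DAV:")    ;=> "E"
(xmlns-to-emit parent child) ;=> {"D" "urn:other", "x" "urn:x"}
```

Because each value is immutable, the parent's mappings remain available after the child adds its own, which is one way to read the remark that adding mappings is equivalent to a child element declaring xmlns attributes.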
Re: [RFC] Roundtripping namespaced xml documents for data.xml
Hi Herwig,

Thanks for the response. I'll remove some of the less relevant bits as I reply inline...

On Wed, May 21, 2014 at 2:26 PM, Herwig Hochleitner hhochleit...@gmail.com wrote:

> 2014-05-21 19:31 GMT+02:00 Paul Gearon gea...@gmail.com:
>
>> I like the way that you have split the implementation of functionality into different namespaces. The current release is a little monolithic.
>
> The patch I attached to DXML-4 [1] are just minimal changes to make roundtripping work. No refactoring. I've started to implement a much more advanced approach to namespaced xml in my repo [2], the design of which came from feedback on clojure-dev thread [3] and is documented in a new design page [4].

I wasn't really paying attention to the patch, since I saw your link to your fork. Someone mentioned the thread to me, but I hadn't found it yet. Thanks for the link. I'll read up on it, along with the design page. I'll continue to comment while I'm here, though I may get some of my answers as I read more.

> This approach will add another data tier (and make it the default), where tags and attribute names will be represented as QNames and xmlns attributes won't be present in :attrs

Are QNames strictly necessary? Keywords seem to do the trick, and they work in nicely with what already exists. I know that there are some QName forms that are not readable as a keyword, but the XML parsing code will always call (keyword ...) and that holds any kind of QName.

>> One thing that bothers me is what's going on in clojure.data.xml.impl. I haven't tried to go through all of it yet, but I didn't think that the code needed to be that complex.
>
> Good thing that you didn't bother consuming it all, impl currently has a lot of code, that will get deleted again. Basically all the stuff for automagically mixing raw and resolved data. The relevant parts for now are mostly XmlNamespace and the reader tag implementations.

>> The way I handled prefix-uri mappings was to use a stack of maps, which I thought was adequate.
>> Your implementation of XmlNamespaceImpl has me thinking that I've been missing something important. Could you explain why XmlNamespaceImpl is structured the way it is?
>
> Apart from basic bidirectional mapping functionality, XmlNamespaceImpl is built in such a way that:
> - Adding mappings is semantically equivalent to a child element declaring xmlns attributes
> - A new prefix for an already established mapping won't displace the current reverse mapping (uri-prefix), but get recorded as an alternative
> - Remapping a prefix to a new uri or deleting it, will establish an alternative prefix for that uri, in FIFO manner

Are the reverse mappings (uri-prefix) definitely necessary? My first look at this made me think that they were (particularly so I could call XMLStreamWriter.getPrefix), but it seemed that the XmlWriter keeps enough state that it isn't necessary. My final code didn't need them at all. I thought I covered the various cases, but I could well believe that I missed one, since there are so many ways to represent the data. Do you have some good examples?

> The aim is to strip out redundant xmlns attributes in the emitter, while retaining xml semantics. There is indeed something missing: Two XmlNamespaces must be diffable, to discover which xmlns attributes actually have to be emitted.

I was mostly considering round-tripping the data, and the parser is good at not repeating namespaces for child elements, so the emitter didn't need to either. As a result I didn't need to filter out prefix-uri bindings from parent elements when emitting namespaces, though that should be easy.

> I'm sure it could be built a lot simpler. It's currently basically a product of making its tests work [5]. It could probably be made to work with stacks of maps, but then each lookup would involve a linear search, no?

If uri-prefix is needed, then a simple map would need that, yes. However, if I needed the reverse mapping then I'd use a pair of stacks of maps - one for each direction.
(BTW, a stack of maps sounds complex, but the top of the stack is just the new bindings merged onto the previous top of the stack).

> What do you think of the proposed design?

I will take the time to go through them, thanks. Even if it ends up different to how I'd do it, it'll be the core library, and I need to work with it no matter what. Thanks for being on top of this.

I'm looking to reimplement a parser in Clojure when 99% of the world is using a single Java library for this job. That's no good for diversity (as OpenSSL showed recently) and also because I can hardly say I'm offering an alternative system when core elements are built on the same library that everyone else uses. It's a W3C standard, so it makes ridiculously heavy use of namespaces. :)

Regards,
Paul
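The stack-of-maps idea mentioned in this message could look roughly like the following. This is a guess at the shape, not Paul's actual code; empty-scopes, push-scope, pop-scope and resolve-prefix are illustrative names. Each frame already holds every binding in scope, so a lookup only ever consults the top of the stack.

```clojure
(def empty-scopes
  "Initial stack: a single frame holding the predefined xml prefix."
  [{"xml" "http://www.w3.org/XML/1998/namespace"}])

(defn push-scope
  "Entering an element: merge its xmlns bindings onto the current top frame."
  [stack new-bindings]
  (conj stack (merge (peek stack) new-bindings)))

(defn pop-scope
  "Leaving an element: drop its frame, restoring the parent's bindings."
  [stack]
  (pop stack))

(defn resolve-prefix
  "Look a prefix up in the innermost scope; nil if unbound."
  [stack prefix]
  (get (peek stack) prefix))

(def scopes
  (-> empty-scopes
      (push-scope {"D" "DAV:"})
      (push-scope {"D" "urn:other" "x" "urn:x"})))

(resolve-prefix scopes "D")             ;=> "urn:other"
(resolve-prefix (pop-scope scopes) "D") ;=> "DAV:"
(resolve-prefix (pop-scope scopes) "x") ;=> nil
```

A second stack built the same way, keyed by uri, would give the reverse direction described as "a pair of stacks of maps".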
Re: [RFC] Roundtripping namespaced xml documents for data.xml
Hi,

Speaking of Clojure and XML, what is the preferred way of dealing with XML in Clojure these days? In the past I have used clojure.xml and clojure.zip. Is clojure.data.xml the best way to do this now?

TIA,
Thomas
Re: [RFC] Roundtripping namespaced xml documents for data.xml
My understanding is that this is the central library for XML, so theoretically it's preferred. This is why I'm trying to use it. Also, it's one of the few implementations using StAX, which is the best way to do lazy streaming (though core.async makes SAX a viable option again). However, without namespaces there are numerous applications where data.xml won't work, so it's still immature.

Paul
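As a small illustration of why StAX suits lazy streaming, the sketch below pulls events from the JDK's javax.xml.stream API on demand and exposes the start tags as a lazy seq. It uses the JDK classes directly rather than data.xml, and open-reader, start-tags, sample and tags are made-up names for this example.

```clojure
(import '(javax.xml.stream XMLInputFactory XMLStreamConstants XMLStreamReader)
        '(java.io StringReader))

(defn open-reader
  "Create a pull parser over an XML string (namespace-aware by default)."
  ^XMLStreamReader [^String xml]
  (.createXMLStreamReader (XMLInputFactory/newInstance)
                          ^java.io.Reader (StringReader. xml)))

(defn start-tags
  "Lazy seq of [prefix local-name namespace-uri], one per start tag.
  The parser only advances when the seq is consumed."
  [^XMLStreamReader reader]
  (lazy-seq
    (when (.hasNext reader)
      (if (= XMLStreamConstants/START_ELEMENT (.next reader))
        (cons [(.getPrefix reader) (.getLocalName reader) (.getNamespaceURI reader)]
              (start-tags reader))
        (start-tags reader)))))

(def sample "<D:propfind xmlns:D=\"DAV:\"><D:prop/></D:propfind>")

(def tags (doall (start-tags (open-reader sample))))

(first tags) ;=> ["D" "propfind" "DAV:"]
```

The reader is stateful, so each element's data is copied into a vector as soon as its event is seen; a real parser would also need to close the reader when done.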
[RFC] Roundtripping namespaced xml documents for data.xml
Hi,

I'm taking a stab at namespaced xml support: http://dev.clojure.org/jira/browse/DXML-4

I've uploaded a patch that should implement 1:1 roundtripping, fully preserving prefixes and xmlns declarations: http://dev.clojure.org/jira/secure/attachment/12899/roundtrip-documents.patch

This doesn't implement any of the advanced serialization or deserialization strategies described in the design page: http://dev.clojure.org/display/DXML/Fuller+XML+support

However, it allows such strategies to be implemented by transforming clojure data structures, hence it should be a suitable common representation for any namespaced xml needs.

I'd like to work on some namespace-related treewalking next, most importantly normalizing prefixes and default namespaces, so that one can actually parse namespaced xml.

Meanwhile, I'd be delighted if you could try out the patch on any (well-formed) namespaced xml you have at hand and see if it roundtrips correctly.

kind regards