Re: [basex-talk] Shouldn't CHOP = false make xml:space="preserve" the default behavior?
I agree that it might be reasonable to introduce different defaults for WebDAV communication. Problems could arise if documents are opened with WebDAV that have been stored via REST or another API… But we could give it a try.

On Thu, Aug 10, 2017 at 11:28 PM, Andy Bunce wrote:
> It seems globally setting `indent=no` gets applied to WebDAV (and everywhere
> else where serialization is not explicitly specified). This would be my
> preference for WebDAV, as it means documents can be round-tripped without
> any changes being introduced.
Re: [basex-talk] Shouldn't CHOP = false make xml:space="preserve" the default behavior?
It seems globally setting `indent=no` gets applied to WebDAV (and everywhere else where serialization is not explicitly specified). This would be my preference for WebDAV, as it means documents can be round-tripped without any changes being introduced. The only side effect I have seen from this setting is that view-source on generated HTML is harder to read, but this is not a real issue.

I have not tried setting them in web.xml yet. I wondered if you would expect it to work :-). I will try...

Cheers
/Andy

On 10 August 2017 at 18:40, Christian Grün wrote:
> The defaults for whitespace chopping and serialization can only be
> assigned globally for all features of BaseX. Did you try to set both
> 'org.basex.chop' and 'org.basex.serializer' in web.xml / does it
> introduce other unwanted side effects?
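One way to make such settings global for the standalone client is the `.basex` configuration file that the original post in this thread works with. The sketch below is an illustration under stated assumptions rather than a confirmed recipe: `conf_dir` is a hypothetical scratch directory, the `# Local Options` line stands in for a configuration file that BaseX has already generated, and it assumes that `CHOP` and `SERIALIZER` are accepted as entries below that marker.

```shell
# Hypothetical scratch directory; in practice this would be the directory
# passed to BaseX via -Dorg.basex.path.
conf_dir="$(mktemp -d)"
conf_file="$conf_dir/.basex"

# Stand-in for a configuration file that BaseX has already written.
# Assumption: local options are listed below the "# Local Options" marker.
printf '%s\n' '# Local Options' > "$conf_file"

# Keep whitespace on import and add none on output.
printf '%s\n' 'CHOP = false' 'SERIALIZER = indent=no' >> "$conf_file"

# Show the resulting file.
cat "$conf_file"

# To use it (needs basex on the PATH), as in the original post:
#   BASEX_JVM="-Dorg.basex.path=$conf_dir" basex
```

Appending to an existing file, rather than writing a fresh one, mirrors what the original post does with `echo 'CHOP = false' >> /some/path/.basex`.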
Re: [basex-talk] Shouldn't CHOP = false make xml:space="preserve" the default behavior?
Hi Andy,

> Can the WebDAV serialization be set independently of the default, in
> web.xml?

The defaults for whitespace chopping and serialization can only be assigned globally for all features of BaseX. Did you try to set both 'org.basex.chop' and 'org.basex.serializer' in web.xml / does it introduce other unwanted side effects?

Cheers,
Christian
Re: [basex-talk] Shouldn't CHOP = false make xml:space="preserve" the default behavior?
On 7 August 2017 at 09:57, Christian Grün wrote:
> But it may definitely cause undesirable output if a document contains no
> superfluous whitespaces,

One situation where the default serialization indentation can be problematic is WebDAV. Can the WebDAV serialization be set independently of the default, in web.xml?

/Andy
Re: [basex-talk] Shouldn't CHOP = false make xml:space="preserve" the default behavior?
Dear Ottid,

Thanks for providing us with the helpful example, which helped me to understand the problem.

>> replace /a foo bar
> "a.xml" (Line 1): Open quote is expected for attribute "xml:space"
> associated with an element type "root".

Just a side note: Command-line parsing is restrictive when it comes to replacing XML. The reason is that it is possible to send multiple commands in a single line, as shown in the following example:

  create db db; replace /a ; xquery .

>> xquery /root
> foo bar

You may be surprised to hear that the whitespaces in your document were actually chopped, and that the whitespaces you see are added by the serializer, because the "indent" serialization parameter is set to "yes" by default.

I was surprised to see that no one else had pointed this out so far, and that it was not mentioned in our documentation, so I have just added some explanatory lines [1,2].

Some more technical background:

If you call the "info storage" command in BaseX, you will see which XML nodes are stored in the document:

> set chop on;create db db ; info storage
CHOP: true
Database 'db' created in 11.0 ms.
PRE  DIS  SIZ  ATS  ID  NS  KIND  CONTENT
-----------------------------------------
  0    1    2    1   0   0  DOC   db.xml
  1    1    1    1   1   0  ELEM  a

> set chop off;create db db ; info storage
CHOP: false
Database 'db' created in 20.12 ms.
PRE  DIS  SIZ  ATS  ID  NS  KIND  CONTENT
-----------------------------------------
  0    1    3    1   0   0  DOC   db.xml
  1    1    2    1   1   0  ELEM  a
  2    1    1    1   2   0  TEXT

Serialization indentation was chosen as the default because it goes hand in hand with the CHOP option. It even works fine if CHOP is disabled and a document already includes whitespaces (in which case no whitespaces will be added by the serializer). But it may definitely cause undesirable output if a document contains no superfluous whitespaces, such as in your case.

Hope this helps,
Christian

[1] http://docs.basex.org/wiki/Options#CHOP
[2] http://docs.basex.org/wiki/Full-Text#Mixed_Content
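The interplay between chopping and indentation described above can be checked from a shell. The following is a minimal sketch, not taken from the thread: it assumes a `basex` launcher on the PATH, that `-c` accepts several commands separated by semicolons, and an illustrative database `db` built from the document `<a> </a>`. With chopping off and `indent=no`, the serializer should add no whitespace of its own, so the query result reflects what is actually stored.

```shell
# Illustrative command sequence; the database name `db` and the document
# <a> </a> are assumptions made for this sketch.
# CHOP is set before CREATE DB because chopping happens at import time.
basex_cmds='SET CHOP false; SET SERIALIZER indent=no; CREATE DB db <a> </a>; XQUERY /a'

if command -v basex >/dev/null 2>&1; then
  # Runs all commands in one standalone session; the database `db`
  # is left in place afterwards and can be removed with DROP DB db.
  basex -c "$basex_cmds"
else
  # Without BaseX installed, only show what would be executed.
  echo "basex not found; would run: $basex_cmds"
fi
```

Swapping `SET CHOP false` for `SET CHOP true`, or `indent=no` for `indent=yes`, shows separately which whitespace comes from the import and which from the serializer.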
[basex-talk] Shouldn't CHOP = false make xml:space="preserve" the default behavior?
(Sorry for the noise, but because my previous mail was shown under trimmed content in my Sent Mail, I will resend it without formatting just in case it was not shown properly to others as well.)

From the documentation about the CHOP option I assumed that, since xml:space="preserve" sets CHOP = false for that part of the document, setting CHOP = false in my configuration file would apply the xml:space="preserve" behavior to the whole database (I created the database after setting the option). However, the only way I have ever been able to get this behavior has been to set xml:space="preserve" at the root element.

Am I missing something, or is this a bug? How could I get this behavior by default in my databases?

I thought this would not warrant a thorough example given the clear conditions which cause the above situation, but I was asked for it anyway, so here it goes:

A little bit of context, although it should not matter: I have had this issue for years (at least 4) under Arch Linux. At the time I just assumed I did something wrong and went with the xml:space="preserve" workaround everywhere.

> uname -a
Linux phoenix 4.9.38 #1-NixOS SMP Sat Jul 15 10:17:55 UTC 2017 x86_64 GNU/Linux

The default situation: no config file (.basex)

> basex
BaseX 8.6.4 [Standalone]
> create db chop-test
Database 'chop-test' created in 123.01 ms.
> open chop-test
Database 'chop-test' was opened in 1.19 ms.
> replace /a foo bar
0 resource(s) replaced in 103.68 ms.
> xquery /root
foo bar
Query executed in 106.05 ms.

I never use the REPL other than to create and drop databases, so I was a bit surprised that this did not work:

> replace /a foo bar
"a.xml" (Line 1): Open quote is expected for attribute "xml:space" associated with an element type "root".

While this does:

> replace /a foo bar
1 resource(s) replaced in 4.06 ms.
> xquery /root
foo bar
Query executed in 0.95 ms.
> quit
Have fun.

The resource with xml:space="preserve" shows the behavior I want to have within my database, because all my documents are mixed content.

The wiki (http://docs.basex.org/wiki/Options#CHOP) also mentions this. It explicitly states that in my use case I should set CHOP to false:

"The flag should be turned off if a document contains mixed content."

It also states that setting the xml:space="preserve" attribute is the same as having CHOP = false:

"If the xml:space="preserve" attribute is attached to an element, chopping will be turned off for all descendant text nodes."

So let's do that. Let us first confirm that the config file is correctly read:

> echo 'FOO = 0' > /some/path/.basex
> BASEX_JVM='-Dorg.basex.path=/some/path' basex
/some/path/.basex: Unknown option 'FOO'.
/some/path/.basex: writing new configuration file.

Now we set the option CHOP = false in our config:

> echo 'CHOP = false' >> /some/path/.basex

So let's see what this changes in the basex REPL:

> BASEX_JVM='-Dorg.basex.path=/some/path' basex
BaseX 8.6.4 [Standalone]
> drop db chop-test
Database 'chop-test' was dropped.
> create db chop-test
Database 'chop-test' created in 106.42 ms.
> open chop-test
Database 'chop-test' was opened in 0.05 ms.
> replace /a foo bar
0 resource(s) replaced in 39.24 ms.
> xquery /root
foo bar
Query executed in 97.09 ms.
> quit
Have fun.

This is not what I expect. It should have been:

> xquery /root
foo bar

And hence my question: Shouldn't CHOP = false make xml:space="preserve" the default behavior?

I even tried this:

> BASEX_JVM='-Dorg.basex.path=/some/path' basex
BaseX 8.6.4 [Standalone]
Try 'help' to get more information.
> open chop-test
Database 'chop-test' was opened in 90.65 ms.
> set chop off
CHOP: false
> replace /a foo bar
1 resource(s) replaced in 45.58 ms.
> xquery /root
foo bar
Query executed in 96.78 ms.
> quit
See you.

Am I making some mistake in the above? Is the wiki simply outdated and should this be configured differently? Is having mixed content in BaseX so rare that this bug has gone unnoticed for years?