Re: [basex-talk] Pretty print

2022-11-18 Thread Liam R. E. Quin
On Fri, 2022-11-18 at 18:39 +, Lizzi, Vincent wrote:
> Hi Liam,
> 
> XML's way of handling space characters is understandably an
> improvement over SGML, but it still causes problems sometimes and
> seems more complex than it perhaps could be. Although the ship has
> long since sailed, out of curiosity do you recall if there were any
> suggestions for a rule to ensure that spaces (and absence of spaces)
> would be consistently preserved without relying on a DTD or Schema?

There were. There was a lot of discussion around this. The main
proposals were
(1) disallow mixed content entirely, and require an element to contain
text.
  Karen actually smiled at this idea.
It's easy to see why this didn't get much traction from document
people.

(2) require mixed or text elements to use different syntax, e.g.
  <@p>Karen <@emph>actually smiled at this idea.
This would have ruled out XHTML, however, or any other pre-existing
SGML vocabulary, and at that time that was 100% of all content: there
was no XML content outside of the examples in the specification itself.

At one point i remember (foolishly) suggesting upper-case element names
for ones that could not contain text directly (or the other way round,
i forget), but of course this wouldn't work in a multilingual world
where not all languages have upper and lower case.

XML was developed before XML Schema. When we started, a DTD was
required; by the end, DTDs were optional (i had Charles Goldfarb
calling me at home over this, trying to find ways to keep DTDs as
mandatory!) but i think we didn't revisit all of the decisions in this
light.

> A relatively safe way to "pretty print" indent XML is to only insert
> or remove spaces between an element's name and closing > and where
> spaces already exist in text nodes.

Yes, there are tools that can do this, too.
> 
> However I haven't seen any XML editor or processor implement this
> approach.

I think maybe xmllint can, i'm not sure. And possibly xml tidy, and
maybe James' xp has something like this. Overall i think it tends to
confuse people more than it helps, though. I'm not sure.

liam


-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations:  http://www.fromoldbooks.org


Re: [basex-talk] BaseX unreachable behind load balancer

2022-11-18 Thread Tamara Marnell
Hi Harry,

When I get timeouts for our EC2 instances during development, it's almost
always because I don't have the right inbound rules in the security group.
Do you have an inbound rule in the security group on the BaseX application
that allows traffic from either the client IP (with preservation enabled)
or your VPC on port 1984?

-Tamara

On Fri, Nov 18, 2022 at 6:29 AM Harry King 
wrote:

> Thanks for the advice Marco,
>
> I’m using Amazon Linux 2 on the Docker host, which appears to have SELinux
> disabled by default already.  So great suggestion, but not apparently the
> issue here.
>
> sudo getsebool -a | grep http
> getsebool:  SELinux is disabled
>
> Looking at the Dockerfile, it looks like the image is built using an Alpine
> docker image, so no SELinux within the docker container.
> bash-4.4$ getsebool -a | grep http
> bash: getsebool: command not found
>
> On 18 Nov 2022, at 14:11, Marco Lettere  wrote:
>
> Hi Harry,
>
> one thing that hits me frequently on cloud machines provisioned by others
> is SELinux. If your host is running Linux of course...
>
> In this case there is some documentation around how to check it out by
> looking into the logs of your proxy service (the error should be something
> like "not permitted").
>
> Check the Selinux property for http proxies with:
>
> > sudo getsebool -a | grep http
>
> Whereas to disable selinux enforcement on http proxy permanently (-P flag):
>
> > sudo setsebool -P httpd_verify_dns 0
>
> Regards,
>
> Marco.
>
>
> On 18/11/22 13:18, Harry King wrote:
>
> Hi,
>
> I’m wondering if someone might be able to offer a hint or two.  I’m
> attempting to run BaseX in AWS behind a Network Load Balancer (NLB) using
> the 9.5.2 docker image with the default config to start with.  I’ve set up a
> TCP target group and the healthcheck appears happy on port
> 1984.  Reachability analyser suggests firewall is good to go.  From the local
> box I can telnet localhost 1984:
>
> telnet localhost 1984
> Trying 127.0.0.1...
> Connected to localhost.
> Escape character is '^]'.
> BaseX:3334891053316136
>
> I get a similar response if I use the server’s private IP address:
>
> telnet 10.x.x.x 1984
> Trying 10.x.x.x...
> Connected to 10.x.x.x.
> Escape character is '^]'.
> BaseX:3335242111998298
>
>
> So far, so good.
>
> If I attempt the same via the NLB I get a timeout, which suggests to me
> either a firewall issue or that the application is refusing to listen for
> some reason.  The documentation seems to suggest that, by default, BaseX
> should respond to requests from any IP or hostname.  Do I understand that
> correctly, or do I need to alter the default config?
>
> Thanks in advance!
>
>
>

-- 

Tamara Marnell
Program Manager, Systems
Orbis Cascade Alliance (orbiscascade.org)
Pronouns: she/her/hers


Re: [basex-talk] Pretty print

2022-11-18 Thread Lizzi, Vincent
Hi Liam,

XML's way of handling space characters is understandably an improvement over 
SGML, but it still causes problems sometimes and seems more complex than it 
perhaps could be. Although the ship has long since sailed, out of curiosity do 
you recall if there were any suggestions for a rule to ensure that spaces (and 
absence of spaces) would be consistently preserved without relying on a DTD or 
Schema?

A relatively safe way to "pretty print" indent XML is to only insert or remove 
spaces between an element's name and closing > and where spaces already exist 
in text nodes. Changing the spaces within an element opening tag can adjust 
formatting without inserting or removing text nodes. For example:

pretty print n2.

Can be indented without changing the node tree:

pretty
  print n2.

However I haven't seen any XML editor or processor implement this approach.
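
As an illustration only, the tag-internal indentation described above can be sketched in a few lines of Python. This is not taken from any existing tool: the function names `indent_inside_tags` and `same_tree` and the sample markup (`p`, `b`, `i`, `sub`) are assumptions made for the sketch. It assumes well-formed input without a DTD internal subset, and it moves only the whitespace that sits immediately before the closing `>` or `/>` of a tag, so no text node is added, removed or changed.

```python
import re
import xml.etree.ElementTree as ET

# Tokenizer for well-formed XML without a DTD internal subset (an assumption).
TOKEN = re.compile(
    r"<!--.*?-->"                            # comment
    r"|<\?.*?\?>"                            # processing instruction or XML declaration
    r"|<!\[CDATA\[.*?\]\]>"                  # CDATA section
    r"""|<(?:[^<>"']|"[^"]*"|'[^']*')*>"""   # start, end or empty-element tag
    r"|[^<]+",                               # character data
    re.S,
)


def indent_inside_tags(xml: str, step: str = "  ") -> str:
    """Indent by putting line breaks inside tags, just before the closing '>'."""
    out = []
    depth = 0
    for match in TOKEN.finditer(xml):
        tok = match.group(0)
        if not tok.startswith("<") or tok.startswith(("<!", "<?")):
            out.append(tok)              # text, comments, PIs, CDATA: copied verbatim
        elif tok.startswith("</"):
            depth -= 1                   # end tag: back to the parent's content level
            out.append(tok[:-1].rstrip() + "\n" + step * depth + ">")
        elif tok.endswith("/>"):
            out.append(tok[:-2].rstrip() + "\n" + step * depth + "/>")
        else:
            depth += 1                   # start tag: content begins one level deeper
            out.append(tok[:-1].rstrip() + "\n" + step * depth + ">")
    return "".join(out)


def same_tree(a: str, b: str) -> bool:
    """True if both strings parse to the same elements, attributes and text."""
    return ET.tostring(ET.fromstring(a)) == ET.tostring(ET.fromstring(b))


if __name__ == "__main__":
    source = "<p><b>pretty</b> <i>print</i> n<sub>2</sub>.</p>"
    indented = indent_inside_tags(source)
    print(indented)
    print(same_tree(source, indented))
```

For the sample input, each `>` ends up at the start of a new line (for example `<p` followed by an indented `><b`), and `same_tree` reports that the parsed tree is unchanged.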

Best regards,
Vincent

_
Vincent M. Lizzi
Head of Information Standards | Taylor & Francis Group
vincent.li...@taylorandfrancis.com



From: BaseX-Talk  On Behalf Of Liam 
R. E. Quin
Sent: Thursday, November 17, 2022 4:44 PM
To: BaseX 
Subject: Re: [basex-talk] Pretty print

On Thu, 2022-11-17 at 19:05 +0100, Christian Grün wrote:
> >
> > But is there no way to declare that when I import a file to the
> > database?
> >
>
> There's currently no way to supply this for specific elements

Both XML Schema and DTDs do have a way to say whether text is allowed
in a particular context, and the XML loader could use this information
to discard whitespace text nodes that aren't text.
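
A rough sketch of that idea, in Python rather than inside BaseX: the set `element_only` below is a hypothetical stand-in for what a DTD or schema would declare (elements whose content model allows no text), and `strip_ignorable_whitespace` is a name invented for this example, not an existing API.

```python
import xml.etree.ElementTree as ET

XML_WHITESPACE = " \t\r\n"  # the four characters XML treats as white space


def strip_ignorable_whitespace(root, element_only):
    """Remove whitespace-only text directly inside elements that cannot hold text.

    element_only: names of elements whose content model (as a DTD or schema
    would declare it) allows child elements only. Hypothetical input.
    """
    for elem in root.iter():
        if elem.tag not in element_only:
            continue  # mixed or text content: every space is kept
        if elem.text is not None and not elem.text.strip(XML_WHITESPACE):
            elem.text = None
        for child in elem:
            if child.tail is not None and not child.tail.strip(XML_WHITESPACE):
                child.tail = None
    return root


if __name__ == "__main__":
    doc = ET.fromstring(
        "<list>\n  <item>a <b>b</b> <i>c</i></item>\n  <item> </item>\n</list>"
    )
    strip_ignorable_whitespace(doc, {"list"})
    print(ET.tostring(doc, encoding="unicode"))
    # prints: <list><item>a <b>b</b> <i>c</i></item><item> </item></list>
```

Only the indentation inside `list` disappears; the space between `<b>` and `<i>` and the single space in the second `item` survive because `item` is not declared element-only.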

On how it came to be -

SGML had some really bad whitespace rules, including what was called
"pernicious whitespace" - whitespace where the parser needed
backtracking to know if it was text or not, but the parsers didn't
actually do backtracking so they flagged it as an error. This was a
very common source of problems for users.

We eliminated this for XML by requiring #PCDATA (i.e. text) always to
be in a repeatable or-group, so

and not

(to paraphrase Ambrose Bierce's Devil's Dictionary, which defined a boy
as a noise with dirt on it).

liam


--
Liam Quin, 
https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations:  
http://www.fromoldbooks.org


Re: [basex-talk] BaseX unreachable behind load balancer

2022-11-18 Thread Harry King
Thanks for the advice Marco,

I’m using Amazon Linux 2 on the Docker host, which appears to have SELinux 
disabled by default already.  So great suggestion, but not apparently the issue 
here.

sudo getsebool -a | grep http
getsebool:  SELinux is disabled

Looking at the Dockerfile, it looks like the image is built using an Alpine 
docker image, so no SELinux within the docker container.
bash-4.4$ getsebool -a | grep http
bash: getsebool: command not found

> On 18 Nov 2022, at 14:11, Marco Lettere  wrote:
> 
> Hi Harry,
> 
> one thing that hits me frequently on cloud machines provisioned by others is 
> SELinux. If your host is running Linux of course...
> 
> In this case there is some documentation around how to check it out by 
> looking into the logs of your proxy service (the error should be something like 
> "not permitted").
> 
> Check the Selinux property for http proxies with:
> 
> > sudo getsebool -a | grep http
> 
> Whereas to disable selinux enforcement on http proxy permanently (-P flag):
> 
> > sudo setsebool -P httpd_verify_dns 0
> 
> Regards,
> 
> Marco.
> 
> 
> On 18/11/22 13:18, Harry King wrote:
>> Hi,
>> 
>> I’m wondering if someone might be able to offer a hint or two.  I’m 
>> attempting to run BaseX in AWS behind a Network Load Balancer (NLB) using 
>> the 9.5.2 docker image with the default config to start with.  I’ve set up a 
>> TCP target group and the healthcheck appears happy on port 1984.  Reachability 
>> analyser suggests firewall is good to go.  From the local box I can telnet 
>> localhost 1984:
>> 
>> telnet localhost 1984
>> Trying 127.0.0.1...
>> Connected to localhost.
>> Escape character is '^]'.
>> BaseX:3334891053316136
>> 
>> I get a similar response if I use the server’s private IP address:
>> 
>> telnet 10.x.x.x 1984
>> Trying 10.x.x.x...
>> Connected to 10.x.x.x.
>> Escape character is '^]'.
>> BaseX:3335242111998298
>> 
>> 
>> So far, so good.
>> 
>> If I attempt the same via the NLB I get a timeout, which suggests to me 
>> either a firewall issue or that the application is refusing to listen for 
>> some reason.  The documentation seems to suggest that, by default, BaseX 
>> should respond to requests from any IP or hostname.  Do I understand that 
>> correctly, or do I need to alter the default config?
>> 
>> Thanks in advance!



Re: [basex-talk] BaseX unreachable behind load balancer

2022-11-18 Thread Marco Lettere

Hi Harry,

one thing that hits me frequently on cloud machines provisioned by 
others is SELinux. If your host is running Linux of course...


In this case there is some documentation around how to check it out by 
looking into the logs of your proxy service (the error should be something 
like "not permitted").


Check the Selinux property for http proxies with:

> sudo getsebool -a | grep http

Whereas to disable selinux enforcement on http proxy permanently (-P flag):

> sudo setsebool -P httpd_verify_dns 0

Regards,

Marco.


On 18/11/22 13:18, Harry King wrote:

Hi,

I’m wondering if someone might be able to offer a hint or two.  I’m attempting 
to run BaseX in AWS behind a Network Load Balancer (NLB) using the 9.5.2 docker 
image with the default config to start with.  I’ve set up a TCP target group and 
the healthcheck appears happy on port 1984.  Reachability analyser suggests 
firewall is good to go.  From the local box I can telnet localhost 1984:

telnet localhost 1984
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
BaseX:3334891053316136

I get a similar response if I use the server’s private IP address:

telnet 10.x.x.x 1984
Trying 10.x.x.x...
Connected to 10.x.x.x.
Escape character is '^]'.
BaseX:3335242111998298


So far, so good.

If I attempt the same via the NLB I get a timeout, which suggests to me either 
a firewall issue or that the application is refusing to listen for some reason.  
The documentation seems to suggest that, by default, BaseX should respond to 
requests from any IP or hostname.  Do I understand that correctly, or do I need 
to alter the default config?

Thanks in advance!


[basex-talk] BaseX unreachable behind load balancer

2022-11-18 Thread Harry King
Hi,

I’m wondering if someone might be able to offer a hint or two.  I’m attempting 
to run BaseX in AWS behind a Network Load Balancer (NLB) using the 9.5.2 docker 
image with the default config to start with.  I’ve set up a TCP target group and 
the healthcheck appears happy on port 1984.  Reachability analyser suggests 
firewall is good to go.  From the local box I can telnet localhost 1984:

telnet localhost 1984
Trying 127.0.0.1...
Connected to localhost.
Escape character is '^]'.
BaseX:3334891053316136

I get a similar response if I use the server’s private IP address:

telnet 10.x.x.x 1984
Trying 10.x.x.x...
Connected to 10.x.x.x.
Escape character is '^]'.
BaseX:3335242111998298


So far, so good.

If I attempt the same via the NLB I get a timeout, which suggests to me either 
a firewall issue or that the application is refusing to listen for some reason.  
The documentation seems to suggest that, by default, BaseX should respond to 
requests from any IP or hostname.  Do I understand that correctly, or do I need 
to alter the default config?

Thanks in advance!
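
The telnet checks above can also be scripted, which makes it easier to run the same test against localhost, the private IP and the NLB name. The following is a minimal sketch using only Python's standard `socket` module; the function name `probe` and the wording of its return values are made up for this example. It separates three outcomes that usually point in different directions: a greeting (the server is reachable), a refused connection (the host answers but nothing accepts on that port) and a timeout (packets are most likely being dropped on the way, for example by a firewall or security group rule).

```python
import socket


def probe(host: str, port: int = 1984, timeout: float = 5.0) -> str:
    """Try a TCP connection and report what happened in one short string."""
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.settimeout(timeout)
            try:
                data = sock.recv(64)  # the server speaks first, as in the telnet output
            except socket.timeout:
                return "connected, but no greeting"
            if not data:
                return "connected, closed without greeting"
            text = data.decode("ascii", "replace").rstrip("\x00")
            return "connected, greeting: " + text
    except socket.timeout:
        return "timeout"
    except ConnectionRefusedError:
        return "refused"
    except OSError as exc:
        return f"error: {exc}"


if __name__ == "__main__":
    # Hypothetical targets: add the private IP and the NLB DNS name here.
    for target in ["localhost"]:
        print(target, "->", probe(target))
```

If the private IP returns a greeting while the NLB name returns "timeout", the server side is listening and the place to look is the path between the NLB and the instance.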