Re: [basex-talk] How to escape/encode a search term using BaseX REST XQ

2020-01-24 Thread Christian Grün
Hi Ivan, hi Gerrit,

Thanks for your assessments.

Most design decisions in RESTXQ have been taken from Java’s JAX-RS API
[1]. The semantics for accessing paths are a bit more complex there, though:
JAX-RS provides two annotations @Path and @PathParam to access the
full path and segments of the path, and the segments are automatically
decoded. Automatic decoding can be disabled via an optional @Encoded
annotation.

In RESTXQ, we only have a single %rest:path annotation, which
contains both the full path as well as variables for path segments.

Requests with wrongly encoded URLs, such as http://localhost:8984/a%2,
are already rejected by Jetty (and, I guess, any other web server).
They are rejected before any RESTXQ code can intervene. If a URL is
correctly encoded, the Java servlet function getPathInfo() is used to
obtain the decoded path. I noticed there is an alternative function
getRequestURI() that could be used to access the original URL.
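
To illustrate what "wrongly encoded" means here, the following is a minimal JavaScript sketch, not Jetty's actual check. The helper name isWellFormed is hypothetical; it relies only on the standard decodeURIComponent function, which throws a URIError for an incomplete escape such as "%2".

```javascript
// Hypothetical helper: report whether a raw path can be percent-decoded.
// This only mimics the idea of rejecting malformed escapes; it is not
// how Jetty or BaseX implement the check.
function isWellFormed(rawPath) {
  try {
    decodeURIComponent(rawPath);
    return true;
  } catch (err) {
    // decodeURIComponent throws URIError on malformed percent-escapes
    if (err instanceof URIError) return false;
    throw err;
  }
}

console.log(isWellFormed("/a%2"));  // false: "%2" is an incomplete escape
console.log(isWellFormed("/a%2F")); // true: "%2F" is a complete escape
```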

Maybe the introduction of a %rest:encoded annotation could be
discussed in the EXQuery/RESTXQ repository [2]?

Best,
Christian

[1] https://download.oracle.com/otndocs/jcp/jaxrs-2_0-fr-eval-spec/index.html
[2] https://github.com/exquery/exquery/issues





Re: [basex-talk] How to escape/encode a search term using BaseX REST XQ

2020-01-24 Thread Imsieke, Gerrit, le-tex




On 24.01.2020 14:36, Imsieke, Gerrit, le-tex wrote:
So I agree, BaseX should not interpret escaped slashes as if they were 
regular slashes, thereby disallowing them as part of RESTXQ path pa


…rameters.


Re: [basex-talk] How to escape/encode a search term using BaseX REST XQ

2020-01-24 Thread Imsieke, Gerrit, le-tex
While moving the URI parameter to the query string seems like an 
acceptable workaround, I, too, suggest that if *reserved* URI characters 
such as '/' appear percent-encoded, they should not be converted to 
their decoded character prior to analyzing the URI, in line with Sect. 
2.2 of RFC 3986 [1].
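
As a rough sketch of that distinction (in JavaScript, and not BaseX's implementation), the hypothetical helper decodeUnreservedOnly below decodes only escapes that stand for unreserved characters (ALPHA, DIGIT, "-", ".", "_", "~" in Sect. 2.3 of RFC 3986) and leaves escaped reserved characters such as "%2F" or "%3A" untouched.

```javascript
// Unreserved characters per RFC 3986, Sect. 2.3
const UNRESERVED = /^[A-Za-z0-9\-._~]$/;

// Hypothetical helper: decode only escapes that represent unreserved
// characters; every other escape is kept (with uppercase hex digits).
function decodeUnreservedOnly(rawPath) {
  return rawPath.replace(/%([0-9A-Fa-f]{2})/g, (match, hex) => {
    const char = String.fromCharCode(parseInt(hex, 16));
    return UNRESERVED.test(char) ? char : match.toUpperCase();
  });
}

console.log(decodeUnreservedOnly("/search/%74ea"));      // /search/tea
console.log(decodeUnreservedOnly("/search/tea%2Ftime")); // /search/tea%2Ftime
console.log(decodeUnreservedOnly("/search/a%3ab"));      // /search/a%3Ab
```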


If I enter an escaped colon (%3A) in a path segment, it will be kept as 
%3A by BaseX, rather than converted to the reserved character ':'.


The RESTXQ specification [2] doesn’t seem to contain detailed 
instructions on how to decode the submitted URI before extracting path 
parameters, therefore I think RFC 3986 should prevail.


So I agree, BaseX should not interpret escaped slashes as if they were 
regular slashes, thereby disallowing them as part of RESTXQ path pa


Gerrit

[1] https://tools.ietf.org/html/rfc3986#section-2.2
[2] http://exquery.github.io/exquery/exquery-restxq-specification/restxq-1.0-specification.html



Re: [basex-talk] How to escape/encode a search term using BaseX REST XQ

2020-01-24 Thread Ivan Kanakarakis
Hi Christian,

thanks for the quick reply. It definitely helps, but the behaviour
still seems weird to me.
I do not see a reason to decode the URI before it is matched against a
route. What is the reason for this?

What you propose works, but if I have a route like
"/search/{$query=.+}/page/{$page}", then the query will match
everything including "/page/...". If the path was not decoded, I do
not think I would need the regex, nor any other special handling
of the route. It should work with "/search/{$query}/page/{$page}" and
it should return "tea%2Ftime". Why do I have to add workarounds to
try to guess how a part of the URL was encoded, when the URL I hit has
that part encoded?
I don't think it makes sense, and I don't see a use case for this.

When the framework receives the request, it is responsible for matching a route.
By matching the route, it will provide me with the bound parts of the
route that I requested.
Then, *I* am responsible for decoding those parts as I see fit and
handling the request as I need.

If the framework decodes the URL before matching a route, that is a
problem for me - I do not have the control I need.
If the framework decodes the URL parts before binding the route
variables, this is fine - it saves me an operation.
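
The difference between the two orders can be sketched in JavaScript as follows. This is only an illustration under stated assumptions, not how BaseX matches routes: the hypothetical function matchSearchRoute hard-codes the template "/search/{$query}/page/{$page}", splits the raw path on literal slashes, and decodes each bound segment only after the match.

```javascript
// Hypothetical matcher for the template "/search/{$query}/page/{$page}".
// It matches on the raw (still encoded) path and decodes afterwards.
function matchSearchRoute(rawPath) {
  const segments = rawPath.split("/"); // only literal slashes separate segments
  // Expected shape: ["", "search", <query>, "page", <page>]
  if (
    segments.length !== 5 ||
    segments[0] !== "" ||
    segments[1] !== "search" ||
    segments[3] !== "page"
  ) {
    return null;
  }
  return {
    query: decodeURIComponent(segments[2]),
    page: decodeURIComponent(segments[4]),
  };
}

const raw = "/search/tea%2Ftime/page/2";

// Match first, decode later: query is "tea/time", page is "2"
console.log(matchSearchRoute(raw));

// Decode first, match later: the extra slash adds a segment, so no match
console.log(matchSearchRoute(decodeURIComponent(raw))); // null
```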

While I have now refactored the endpoint handlers to work with query
params, so this is no longer a problem for me, it is still a problem
in general.


Cheers,





Re: [basex-talk] How to escape/encode a search term using BaseX REST XQ

2020-01-20 Thread Christian Grün
Hi Ivan,

A more common approach is to supply search terms as query parameters
(URL?query=...); in that case, your path won’t have new segments. If
you prefer paths, you can use a regular expression in your RESTXQ path
pattern [1]:

  "search/{$query=.+}"

In both cases, encodeURIComponent should be the appropriate function
to encode special characters.
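
A minimal client-side sketch of the query-parameter variant, in JavaScript. The helper name buildSearchUrl is hypothetical, the parameter name "query" follows the "URL?query=..." form above, and the host is the example.com placeholder used elsewhere in this thread.

```javascript
// Hypothetical helper: put the search term into the query string
// instead of the path, encoded with encodeURIComponent.
function buildSearchUrl(base, term) {
  return `${base}?query=${encodeURIComponent(term)}`;
}

const url = buildSearchUrl("https://example.com/search", "tea/time");
console.log(url); // https://example.com/search?query=tea%2Ftime

// The path keeps a single "search" segment, and the term round-trips
const parsed = new URL(url);
console.log(parsed.pathname);                  // /search
console.log(parsed.searchParams.get("query")); // tea/time
```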

Hope this helps,
Christian

[1] http://docs.basex.org/wiki/RESTXQ#Paths





On Mon, Jan 20, 2020 at 10:54 AM Ivan Kanakarakis
 wrote:
>
> Hello everyone,
>
> I am using BaseX 8.44 and the RESTXQ interface (i.e.,
> http://docs.basex.org/wiki/RESTXQ). I have an endpoint that, when
> invoked with GET, performs a full-text search (using "$db-nodes[text()
> contains text { $term } all]"), gets the results, constructs a JSON
> response and sends it back.
>
> That's all fine and works great. However, I am not sure how I should
> be doing the queries I describe below.
>
> Note: the query is initiated by a SPA JavaScript client; thus, when I
> say encode/URI-escape, I mean that I invoke the
> encodeURIComponent function
> (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent).
> Note 2: for the sake of discussion, let's consider the example
> endpoint declared as:
>
> %rest:GET
> %rest:path("/search/{$term}")
>
>
> 1. I want to search for "tea". That is the basic query. A single term,
> no problem.
>
> curl -s "https://example.com/search/tea"
>
>
> 2. I want to search for "tea time". Now, this query has a space
> between the two words. What I expect to get back is any node that
> contains both words (thus I have used "contains text" with "all"),
> even if they may be a few words apart.
> - Should I be sending an encoded/URI-escaped version of this, i.e., "tea%20time"?
> - Or should I be replacing the space with "+", i.e., "tea+time"?
> - Or some other advice?
>
> curl -s "https://example.com/search/tea%20time"
> curl -s "https://example.com/search/tea+time"
>
>
> 3. I want to search for "tea/time". This is even trickier. What I
> expect to get back is any node that contains "tea/time", i.e., a search
> result for a single term. How do I do this?
> - If I do not do anything, the slash is treated as part of the URL,
> thus not matching a route.
> - If I encode/URI-escape this term, I get "tea%2Ftime". But when I
> invoke the endpoint, I get the same result as if there were a literal slash.
> - I am not sure how I should deal with the slash. How should I
> escape/encode this?
>
> curl -s "https://example.com/search/tea/time"
> curl -s "https://example.com/search/tea%2Ftime"
>
>
> Thank you,
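
For reference, the three cases above can be reproduced on the client side with the JavaScript sketch below. The helper name searchPathFor is hypothetical and the host is the example.com placeholder from the curl calls. It shows only what standard JavaScript functions do with "%20" and "+"; how a particular server treats "+" inside a path is not shown in this thread.

```javascript
// Hypothetical helper: build the path-based search URL for a term
function searchPathFor(term) {
  return `https://example.com/search/${encodeURIComponent(term)}`;
}

console.log(searchPathFor("tea"));      // https://example.com/search/tea
console.log(searchPathFor("tea time")); // https://example.com/search/tea%20time
console.log(searchPathFor("tea/time")); // https://example.com/search/tea%2Ftime

// Percent-decoding turns "%20" into a space but leaves "+" as a plus sign
console.log(decodeURIComponent("tea%20time")); // tea time
console.log(decodeURIComponent("tea+time"));   // tea+time

// Only form-style query-string parsing treats "+" as a space
console.log(new URLSearchParams("query=tea+time").get("query")); // tea time
```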