[basex-talk] Server-side XQuery scripts understanding their "context"

2020-01-20 Thread ERRINGTON Luke
Hi all,

I'm trying to use the REST API to call server-side XQuery scripts. As I 
understand it, these scripts are not part of a database, but reside in the 
filesystem where BaseX is "installed" (or where RESTPATH points to).

However, when I execute an XQuery I would like it to run on a particular 
database. Is there any way to achieve this without passing the database as an 
argument to the query?

For example,
http://localhost:8984/rest?run=find.xq
http://localhost:8984/rest/Test?run=find.xq
... both execute the same query, and that query needs to open a database to do 
its searching, so currently the database to open is hard-coded. It would be 
nice if the query could work out what database or "collection" was its context 
and automatically use that.

Additionally, if I do
http://localhost:8984/rest?run=Test/find.xq
... then I run the query that is in the Test folder, but this still does not 
have any context and so still needs to know to open the Test database.

Is there a way to achieve this without having to pass the name of the database 
to every XQuery? Or am I not understanding things and is there a better way to 
do this sort of thing?

Thanks,
Luke




[basex-talk] Help

2020-01-20 Thread Pedro Sousa
Hello guys. I need to develop an API that compares the prices of laptops
on two websites, Fnac and Worten.

I don't know too much about XML (it's for a college degree, and it's one of
the last subjects I have left).

Here is what I need to do:

It is intended to develop an application that allows you to make a price
comparison of Laptops sold at Fnac and Worten online stores.


Information Extraction, Structuring and Storage


To preserve and share product data from each online store, such as price
history, create an XML vocabulary that can capture all storage-related
requirements. Communication between applications is thus supported by a
specific vocabulary and is independent of the tools used to record the prices
in the mentioned online stores. Always keep in mind that this vocabulary
may be distributed in the future to various “partners” so that they can
communicate the prices of their products to the price comparator developed.


The vocabulary has the following objectives:

● Represent practical store-identifying information, including the name,
primary web address, and web addresses used for information extraction;

● Represent useful information about the products involved in the price
comparator, namely: name, characteristics, brand, link to an offer page and
their prices over time, including the date/time when this information was
collected;

● For each store, represent aggregate data such as the average price of
each product (considering varying amounts of data), taking into account not
only the entire price history but also a specific period (e.g. a month);

● Represent practical information about partner pricing, including the name
of the online store and the product data described above (for example, if an
online store wants to publish its prices on the platform, it will use this
component of the vocabulary).


● You can add new data elements to the vocabulary in order to enrich the
whole process (this component will be enhanced).


Information Processing and Availability


The documents generated and valid according to the developed vocabulary
must be stored in a database and made available through a REST API using
the BaseX tool.


You can use the Postman tool to extract data. The data must be extracted
periodically (for example, hourly) from the referred addresses, and must
later be stored in an XML database with BaseX. The data stored in the
database must then be exposed through a REST API designed to support
integration with applications developed by third parties. The API should
(at a minimum) provide:

● data on a specific product with or without a price history (it should be
possible to select one of the options);

● data about the products of a specific brand;

● data about a specific online store for one or more products;

● aggregated data for the online store (including the average price of
products and maximum and minimum prices in a given period of time);

● price communication for partner products;

● it should also be possible to share information via Twitter (for example,
what is the best deal taking into account the price differences of a
particular laptop).
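
As a rough illustration of the aggregated-data requirement above (average,
minimum and maximum price in a given period), here is a minimal XQuery
sketch. The vocabulary is not defined in this message, so every element and
attribute name below (store, product, price, collected, aggregate, ...) and
all sample values are assumptions made up for the example, not part of the
assignment:

  (: Hypothetical sample document; names and values are illustrative only. :)
  let $store :=
    <store name="ExampleStore">
      <product id="p1" brand="BrandX">
        <price collected="2020-01-05T10:00:00">999.00</price>
        <price collected="2020-01-12T10:00:00">949.00</price>
        <price collected="2020-02-01T10:00:00">979.00</price>
      </product>
    </store>
  (: Period of interest (assumed: January 2020). :)
  let $from := xs:dateTime('2020-01-01T00:00:00')
  let $to   := xs:dateTime('2020-01-31T23:59:59')
  for $product in $store/product
  (: Keep only the prices collected inside the period. :)
  let $prices :=
    for $p in $product/price
    where xs:dateTime($p/@collected) ge $from
      and xs:dateTime($p/@collected) le $to
    return xs:decimal($p)
  return
    <aggregate product="{ $product/@id }" from="{ $from }" to="{ $to }">
      <average>{ avg($prices) }</average>
      <minimum>{ min($prices) }</minimum>
      <maximum>{ max($prices) }</maximum>
    </aggregate>

In a real solution the inline document would be replaced by the stored
database documents, and the period would come from request parameters.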


The number of endpoints, as well as the form of interaction with the API,
should be as expressive and consistent as possible, not only in how
resources are identified but also in the REST verbs used in HTTP requests.


Identifying common products between different stores is an extremely
important process since without this identification it will not be possible
to make a price comparison. In addition to adopting techniques to automate
the process of identifying common products, you can provide manual
mechanisms. For example, you can create a document that lists the matches
that could not be identified with 100% certainty and allow a user to
annotate or validate (yes or no) each entry in that list.


Data visualization


The API should also include a specific endpoint that provides an HTML
document with a set of views of the aggregated data components of the
vocabulary. To do this, design an HTML document that displays the required
information clearly. You can use quickchart.io to generate the views and
embed them in the HTML document(s).


Overview and tools


You should take advantage of the tools studied throughout the semester to
carry out each step. The combination of tools used to achieve the proposed
objectives is at the discretion of each working group. As an example,
Figure 1 provides an overview of a possible arrangement of the different
related concepts and technologies.


The following must be delivered:

● A set of XML schemas capable of validating all syntax rules defined for
the language and associated types;

● Evidence of the API developed using Postman, as well as its
documentation. You can deliver a GIT repository;

● XQuery files that define the REST API developed in the BaseX tool;

● Examples of documents that allow you to test the solution (for example:
sample document that features a typical response / request from each 

Re: [basex-talk] Help with a Query/Performance

2020-01-20 Thread Christian Grün
I missed the obvious next step. The following query is evaluated
in a few milliseconds:

  declare variable $OFFSET1 := 3;
  declare variable $OFFSET2 := 2;

  let $container := db:open('tr-test')/Container
  let $message := $container/*:MessageA[$OFFSET1]
  let $detail := $message/MessageADetail[$OFFSET2]
  return element { name($container) } {
    $container/*[contains(name(), 'MetaData')],
    element { name($message) } {
      $message/MessageAMetaData,
      element { name($detail) } {
        $detail/*
      }
    }
  }




Re: [basex-talk] Help with a Query/Performance

2020-01-20 Thread Christian Grün
Dear Tom,

If you have large elements, it will usually be faster to create new
elements. Here’s one way to do it:

  let $offset1 := 3
  let $offset2 := 2
  let $container := db:open('tr-test')/Container
  return element Container {
    (: add meta data elements :)
    $container/*[starts-with(name(), 'ContainerMetaData')],
    (: alternative: add everything except Message elements
    $container/(* except (MessageA, MessageB, MessageC)), :)
    $container/MessageA[$offset1] update {
      delete node MessageADetail[position() != $offset2]
    }
  }

There are probably ways to get this even faster; I may have a look at
this tomorrow.

All the best from Konstanz,
Christian





Re: [basex-talk] How to escape/encode a search term using BaseX REST XQ

2020-01-20 Thread Christian Grün
Hi Ivan,

A more common approach is to supply search terms as query parameters
(URL?query=...); in that case, your path won’t have new segments. If
you prefer paths, you can use a regular expression in your RESTXQ path
pattern [1]:

  "search/{$query=.+}"

In both cases, encodeURIComponent should be the appropriate function
to encode special characters.

Hope this helps,
Christian

[1] http://docs.basex.org/wiki/RESTXQ#Paths
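
To make the two options above concrete, here is a minimal RESTXQ sketch.
The module namespace, the function names and the <result/> element are
hypothetical and exist only for this example; the annotations and the
"{$query=.+}" path pattern are the ones described in the RESTXQ
documentation [1]:

  (: Hypothetical module namespace, chosen for illustration. :)
  module namespace search = 'http://example.org/search';

  (: Option 1: the term is a query parameter, e.g. /search?query=tea%2Ftime :)
  declare
    %rest:GET
    %rest:path('/search')
    %rest:query-param('query', '{$query}')
  function search:by-param($query as xs:string?) {
    <result term='{ $query }'/>
  };

  (: Option 2: the term is part of the path; the regular expression
     allows it to span more than one path segment. :)
  declare
    %rest:GET
    %rest:path('/search/{$query=.+}')
  function search:by-path($query as xs:string) {
    <result term='{ $query }'/>
  };

Both functions simply echo the received term, which makes it easy to check
what actually arrives on the server for a given encoded request.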







Re: [basex-talk] Help with a Query/Performance

2020-01-20 Thread Zimmel, Daniel
Hi Tom,

I think that trying to copy/modify a huge tree is definitely the bottleneck 
here.
Why don’t you copy only your third Message element and then reconstruct the 
wrapping Container with ContainerMetaData?

Since the wanted result is a transformation, perhaps a typeswitch expression 
might be an alternative, if there is something that stops you from 
reconstructing.

Daniel
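
A minimal sketch of the first suggestion, copying only the third MessageA
and rebuilding the wrapper around it. It assumes the 'tr-test' database and
the positions (third message, second detail) from the original post, and
that the container's metadata elements have names starting with
'ContainerMetaData'; it is an illustration, not a tuned solution:

  let $container := db:open('tr-test')/*:Container
  (: Copy only the one message that is needed and prune its details. :)
  let $message :=
    copy $m := $container/*:MessageA[3]
    modify delete node $m/*:MessageADetail[position() != 2]
    return $m
  (: Rebuild the wrapping Container with its metadata and the pruned copy. :)
  return element { node-name($container) } {
    $container/@*,
    $container/*[starts-with(local-name(), 'ContainerMetaData')],
    $message
  }

Only a single MessageA subtree is copied here, rather than the whole
container.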





[basex-talk] How to escape/encode a search term using BaseX REST XQ

2020-01-20 Thread Ivan Kanakarakis
Hello everyone,

I am using BaseX 8.44 and the REST XQ interface (i.e.,
http://docs.basex.org/wiki/RESTXQ). I have an endpoint that, when
invoked with GET, does a full-text search (using "$db-nodes[text()
contains text { $term } all]"), gets the results, constructs a JSON
response and sends it back.

That's all fine and works great. However, I am not sure how I should
be doing the queries I describe below.

Note: the query is initiated by an SPA JavaScript client, thus when I
say encode/uri-escape, what I mean is that I invoke the
encodeURIComponent function
(https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent).
Note 2: for the sake of conversation let's consider the example
endpoint declared as:

%rest:GET
%rest:path("/search/{$term}")


1. I want to search for "tea". That is the basic query. A single term,
no problem.

curl -s "https://example.com/search/tea"


2. I want to search for "tea time". Now, this query has a space in
between the two words. What I expect to get back, is any node that
contains both words (thus I have used "contains text" with "all"),
even if they may be a few words apart.
- Should I be sending an encoded/uri-escape version of this, ie, "tea%20time"?
- Or, should I be replacing the space with "+", ie "tea+time"?
- Or, some other advice?

curl -s "https://example.com/search/tea%20time"
curl -s "https://example.com/search/tea+time"


3. I want to search for "tea/time". This is even trickier. What I
expect to get back, is any node that contains "tea/time", ie a search
result for a single term. How do I do this?
- If I do not do anything, the slash is treated as part of the URL,
thus not matching a route.
- If I encode/uri-escape this term, I get "tea%2Ftime". But, when I
invoke the endpoint I get the same as if there was a slash.
- I am not sure how I should deal with the slash. How should I
escape/encode this?

curl -s "https://example.com/search/tea/time"
curl -s "https://example.com/search/tea%2Ftime"


Thank you,
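
For reference, the percent-encoded forms mentioned in cases 2 and 3 can be
reproduced on the XQuery side with the standard fn:encode-for-uri function.
This is only a sketch showing what the encoded terms look like; it says
nothing about how the server decodes the request path:

  (: The three search terms from the examples above. :)
  for $term in ('tea', 'tea time', 'tea/time')
  return encode-for-uri($term)
  (: yields "tea", "tea%20time", "tea%2Ftime" :)

Note that encode-for-uri encodes a space as "%20", never as "+".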


[basex-talk] Help with a Query/Performance

2020-01-20 Thread Tom Rauchenwald (UNIFITS)
Hi list,

I'm struggling with a query.

We have XML documents with a structure similar to this:


  FOO
  FOO
  

  FOO
  FOO


  FOO
  FOO


  FOO
  FOO

  
  

  FOO
  FOO


  FOO
  FOO

  
  

  FOO
  FOO


  FOO
  FOO

  


Messages are bundled in a container (up to n times for each message), and each 
message has details (also up to n times). Container and Message contain data 
that is the same for all details (it's basically a grouping).
I'd like to retrieve a Detail with all corresponding data associated with it, 
so basically a MessageADetail, MessageA (without all the other 
MessageADetails), Container (without all the other Messages).
I know the position of the message (i.e., I know that I want the second 
MessageA for example), and I know the position of the Detail (i.e., I know that 
I want the 3rd Detail).
The use case is to show the detail in context in a UI.

The query to do this I came up with is (here I want to get the 2nd detail from 
the third MessageA):

  let $fh := (
    copy $x := /*:Container
    modify (
      delete node $x/*:MessageA[position() != 3],
      delete node $x/*:MessageA[3]/*:MessageADetail[position() != 2],
      delete node $x/*:MessageB,
      delete node $x/*:MessageC
    )
    return $x
  )
  return $fh

This works well for small documents. For large documents it can take a couple 
of seconds to evaluate the query (our real-life documents do have more 
data/elements in Details and Message).
I'm wondering if there's a better/more efficient way to do this. I tried 
formulating a query that doesn't do deletes, but I couldn't come up with a 
solution that performs better and is correct.

Any pointers would be very much appreciated.

Here's a function to generate sufficiently large test data:

declare function local:sample($numberOfMessages, $numberOfDetails) {

  FOO
  FOO
  {for $i in 1 to $numberOfMessages
return
  

  FOO {$i}
  FOO {$i}

{for $j in 1 to $numberOfDetails
 return
 
   FOO {$j}
   FOO {$j}
 
}
  
  }
  

  FOO
  FOO


  FOO
  FOO

  
  

  FOO
  FOO


  FOO
  FOO

  

};

db:create('tr-test', local:sample(20, 10), 'test.xml')

Thanks,
Tom Rauchenwald