[basex-talk] Server-side XQuery scripts understanding their "context"
Hi all, I'm trying to use the REST API to call server-side XQuery scripts. As I understand it, these scripts are not part of a database, but reside in the filesystem where BaseX is installed (or where RESTPATH points to). However, when I execute an XQuery script, I would like it to run on a particular database. Is there any way to achieve this without passing the database as an argument to the query? For example,

http://localhost:8984/rest?run=find.xq
http://localhost:8984/rest/Test?run=find.xq

both execute the same query, and that query needs to open a database to do its searching, so currently the database to open is hard-coded. It would be nice if the query could work out what database or "collection" was its context and use it automatically. Additionally, if I do

http://localhost:8984/rest?run=Test/find.xq

then I run the query that is in the Test folder, but this still does not have any context, and so still needs to know to open the Test database. Is there a way to achieve this without having to pass the name of the database to every XQuery script? Or am I misunderstanding things, and is there a better way to do this sort of thing? Thanks, Luke
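[One workaround, sketched here under the assumption that your BaseX release supports binding external variables through the REST query string (parameters prefixed with "$"): the database name is passed once per request instead of being hard-coded in the script. The variable name $db and the element name "entry" are illustrative, not taken from Luke's setup.]

```xquery
(: find.xq — hypothetical script; the external variable $db is bound
   via the request URL, e.g. /rest?run=find.xq&$db=Test :)
declare variable $db external;

for $hit in db:open($db)//entry[contains(., 'search term')]
return $hit
```

Whether the database addressed in the URL path (/rest/Test?run=...) is additionally bound to the query's initial context depends on the BaseX version; if it is, db:name(.) would recover the database name without any parameter at all. Worth checking against the documentation of the release in use.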
[basex-talk] Help
Hello guys. I need to develop an API that must compare prices of laptops on two websites, Fnac and Worten. I don't know too much about XML (it's for a college degree, and it's one of the last subjects I have left). Here is what I need to do:

It is intended to develop an application that allows you to make a price comparison of laptops sold at the Fnac and Worten online stores.

Information Extraction, Structuring and Storage. To preserve and share product data from each online store, such as price history, create an XML vocabulary that can capture all storage-related requirements. Thus, communication between applications is supported by a specific vocabulary, independent of the tools used to record the prices in the mentioned online stores. Keep in mind that this vocabulary may be distributed in the future to various "partners" so that they can communicate their products and prices to the price comparator being developed. The vocabulary should meet the following objectives: ● Represent practical store-identifying information, including the name, primary web address, and the web addresses used for information extraction; ● Represent useful information about the products involved in the price comparison, namely: name, characteristics, brand, link to an offer page, and their prices over time, including the date/time when this information was collected; ● For each store, represent aggregate information, such as the average price of each product (considering varying amounts of data), not only over the entire price history but also over a specific period (e.g. a month); ● Represent practical information for partner pricing, including the name of the online store and the product data described previously (for example, if an online store wants to publish its prices on the platform, it will reuse a component of the vocabulary); ● New data elements may be added to the vocabulary in order to enrich the whole process (this component will be valued).
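[A minimal sketch of what such a vocabulary could look like. Every element and attribute name here is invented for illustration; the assignment leaves the actual design to the working group.]

```xml
<priceComparator>
  <stores>
    <store id="fnac">
      <name>Fnac</name>
      <url>https://www.fnac.pt</url>
      <extractionUrl>https://www.fnac.pt/...</extractionUrl>
    </store>
  </stores>
  <products>
    <product id="p-001" brand="Lenovo">
      <name>Lenovo IdeaPad 3</name>
      <characteristics>15.6", 8 GB RAM, 512 GB SSD</characteristics>
      <offer store="fnac" url="https://www.fnac.pt/...">
        <price collected="2020-01-20T10:00:00">549.99</price>
        <price collected="2020-01-20T11:00:00">539.99</price>
      </offer>
    </product>
  </products>
</priceComparator>
```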
Information Processing and Availability. The documents generated (and valid according to the developed vocabulary) must be stored in a database and made available through a REST API using the BaseX tool. You can use the Postman tool to test the API. The data must be extracted periodically (for example, hourly) from the referred addresses, and must then be stored in an XML database with BaseX. The data stored in the database must then be exposed through a REST API designed to support integration with applications developed by third parties. The API should (at a minimum) provide: ● data on a specific product, with or without its price history (it should be possible to select one of the two options); ● data about the products of a specific brand; ● data from a specific online store for one or more products; ● aggregated data for an online store (including the average price of products and the maximum and minimum prices in a given period of time); ● price communication for partner products; ● it should also be possible to share information via Twitter (for example, what the best deal is, taking into account the price differences for a particular laptop). The number of endpoints, as well as the form of interaction with the API, should be designed so that resources are identified expressively and consistently, not only in the URLs but also in the REST verbs used in the HTTP requests. Identifying common products between different stores is an extremely important step, since without this identification it will not be possible to make a price comparison. In addition to adopting techniques to automate the identification of common products, you can provide manual mechanisms. For example, you can keep a document that stores the matchings that could not be detected as definitely common products, and allow a user to annotate or validate (yes or no) each entry of that document.
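[As a concrete starting point, a RESTXQ endpoint for the first requirement could look roughly like this. This is a sketch: the database name 'prices', the module prefix, and the <product>/<price> element names from the vocabulary sketch above are all assumptions, not part of the assignment.]

```xquery
module namespace api = 'http://example.org/api';

(: GET /products/p-001?history=true
   Returns one product, with or without its price history. :)
declare
  %rest:GET
  %rest:path('/products/{$id}')
  %rest:query-param('history', '{$history}', 'false')
function api:product($id as xs:string, $history as xs:string) {
  let $product := db:open('prices')//product[@id = $id]
  return
    if ($history = 'true')
    then $product
    else $product update { delete node .//price }
};
```

The `update { }` expression copies the node before modifying it, so the stored document is left untouched while the response omits the price history.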
Data Visualization. The solution should also include a specific endpoint that provides an HTML document with a set of charts relating to the aggregated-data components of the vocabulary. To do this, design an HTML document that displays the required information clearly. You can use quickchart.io to generate the charts and integrate them into the generated HTML document(s).

Overview and Tools. You should take advantage of the tools studied throughout the semester at each step. The combination of tools used to achieve the proposed objectives is at the discretion of each working group. As an example, Figure 1 provides an overview of a possible arrangement of the different concepts and technologies involved. The following must be delivered: ● A set of XML Schemas capable of validating all the syntax rules defined for the vocabulary and its associated types; ● Evidence of the API developed, using Postman, as well as its documentation (you can deliver a GIT repository); ● The XQuery files that define the REST API developed in the BaseX tool; ● Example documents that allow the solution to be tested (for example: a sample document that features a typical response/request from each
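[For the chart endpoint, one lightweight option is to build a quickchart.io image URL and embed it in the returned HTML. A sketch; the labels and numbers are made up, and web:create-url is BaseX's helper for assembling URLs with percent-encoded query parameters.]

```xquery
(: Builds an <img> pointing at quickchart.io for a bar chart of
   hypothetical average prices per store. :)
let $config := '{ type: "bar", data: { labels: ["Fnac","Worten"],'
  || ' datasets: [{ label: "Average price", data: [549, 533] }] } }'
return
  <img src='{ web:create-url("https://quickchart.io/chart",
                             map { "c": $config }) }'/>
```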
Re: [basex-talk] Help with a Query/Performance
I missed the obvious next step. The following query is evaluated in a few milliseconds:

declare variable $OFFSET1 := 3;
declare variable $OFFSET2 := 2;

let $container := db:open('tr-test')/Container
let $message := $container/*:MessageA[$OFFSET1]
let $detail := $message/MessageADetail[$OFFSET2]
return element { name($container) } {
  $container/*[contains(name(), 'MetaData')],
  element { name($message) } {
    $message/MessageAMetaData,
    element { name($detail) } {
      $detail/*
    }
  }
}
Re: [basex-talk] Help with a Query/Performance
Dear Tom,

If you have large elements, it will usually be faster to create new elements. Here’s one way to do it:

let $offset1 := 3
let $offset2 := 2
let $container := db:open('tr-test')/Container
return element Container {
  (: add meta data elements :)
  $container/*[starts-with(name(), 'ContainerMetaData')],
  (: alternative: add everything except Message elements
  $container/(* except (MessageA, MessageB, MessageC)), :)
  $container/MessageA[$offset1] update {
    delete node MessageADetail[position() != $offset2]
  }
}

There are probably ways to get this even faster; I may have a look at this tomorrow.

All the best from Konstanz,
Christian
Re: [basex-talk] How to escape/encode a search term using BaseX REST XQ
Hi Ivan,

A more common approach is to supply search terms as query parameters (URL?query=...); in that case, your path won’t have new segments. If you prefer paths, you can use a regular expression in your RESTXQ path pattern [1]:

"search/{$query=.+}"

In both cases, encodeURIComponent should be the appropriate function to encode special characters.

Hope this helps,
Christian

[1] http://docs.basex.org/wiki/RESTXQ#Paths
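[Christian's first suggestion, the query parameter, could be sketched like this in RESTXQ. The module prefix, the database name 'mydb', and the search expression are illustrative, based on Ivan's description.]

```xquery
module namespace page = 'http://example.org/page';

declare
  %rest:GET
  %rest:path('/search')
  %rest:query-param('term', '{$term}')
function page:search($term as xs:string?) {
  (: /search?term=tea%2Ftime arrives here as the literal string
     "tea/time"; the slash no longer collides with path segmentation :)
  db:open('mydb')//*[text() contains text { $term } all]
};
```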
Re: [basex-talk] Help with a Query/Performance
Hi Tom,

I think that trying to copy/modify a huge tree is definitely the bottleneck here. Why don’t you copy only your third Message element and then reconstruct the wrapping Container with ContainerMetaData? Since the wanted result is a transformation, perhaps a typeswitch expression might be an alternative, if there is something that stops you from reconstructing.

Daniel

From: Tom Rauchenwald (UNIFITS)
Sent: Monday, 20 January 2020 10:01
To: basex-talk@mailman.uni-konstanz.de
Subject: [basex-talk] Help with a Query/Performance
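[Daniel's typeswitch suggestion could be sketched like this. Illustrative only: the element names are taken from the thread, only the MessageA branch is shown, and real documents may need further cases.]

```xquery
declare function local:slim(
  $node    as node(),
  $message as xs:integer,
  $detail  as xs:integer
) as node() {
  typeswitch ($node)
    case element(Container) return
      element Container {
        (: keep metadata, recurse only into the wanted message :)
        $node/*[contains(name(), 'MetaData')],
        $node/MessageA[$message] ! local:slim(., $message, $detail)
      }
    case element(MessageA) return
      element MessageA {
        $node/MessageAMetaData,
        $node/MessageADetail[$detail]
      }
    default return $node
};

local:slim(db:open('tr-test')/Container, 3, 2)
```

Like Christian's solution, this only ever constructs the few elements that end up in the result, instead of copying and pruning the whole tree.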
[basex-talk] How to escape/encode a search term using BaseX REST XQ
Hello everyone,

I am using BaseX 8.44 and the RESTXQ interface (i.e., http://docs.basex.org/wiki/RESTXQ). I have an endpoint that, when invoked with GET, does a full-text search (using "$db-nodes[text() contains text { $term } all]"), gets the results, constructs a JSON response and sends it back.

That's all fine and works great. However, I am not sure how I should be doing the queries I describe below.

Note: the query is initiated by an SPA JavaScript client, so when I say encode/URI-escape, what I mean is that I invoke the encodeURIComponent function (https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/encodeURIComponent).
Note 2: for the sake of conversation, let's consider the example endpoint declared as:

%rest:GET
%rest:path("/search/{$term}")

1. I want to search for "tea". That is the basic query. A single term, no problem.

curl -s "https://example.com/search/tea"

2. I want to search for "tea time". Now, this query has a space between the two words. What I expect to get back is any node that contains both words (hence "contains text" with "all"), even if they may be a few words apart.
- Should I be sending an encoded/URI-escaped version of this, i.e., "tea%20time"?
- Or should I be replacing the space with "+", i.e., "tea+time"?
- Or some other advice?

curl -s "https://example.com/search/tea%20time"
curl -s "https://example.com/search/tea+time"

3. I want to search for "tea/time". This is even trickier. What I expect to get back is any node that contains "tea/time", i.e., a search result for a single term. How do I do this?
- If I do not do anything, the slash is treated as part of the URL, thus not matching a route.
- If I encode/URI-escape this term, I get "tea%2Ftime". But when I invoke the endpoint I get the same result as if there were a slash.
- I am not sure how I should deal with the slash. How should I escape/encode this?
curl -s "https://example.com/search/tea/time"
curl -s "https://example.com/search/tea%2Ftime"

Thank you,
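[On the server side, XQuery's built-in fn:encode-for-uri closely mirrors encodeURIComponent: it percent-encodes everything outside the unreserved set, including the slash. That makes it handy for checking what an escaped term should look like:]

```xquery
encode-for-uri("tea time"),   (: "tea%20time" :)
encode-for-uri("tea/time")    (: "tea%2Ftime" :)
```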
[basex-talk] Help with a Query/Performance
Hi list,

I'm struggling with a query. We have XML documents with a structure similar to this:

FOO FOO FOO FOO FOO FOO FOO FOO FOO FOO FOO FOO FOO FOO FOO FOO

Messages are bundled in a container (up to n times for each message), and each message has details (also up to n times). Container and Message contain data that is the same for all details (it's basically a grouping). I'd like to retrieve a Detail with all corresponding data associated with it, so basically a MessageADetail, MessageA (without all the other MessageADetails), Container (without all the other Messages). I know the position of the message (i.e., I know that I want the second MessageA, for example), and I know the position of the Detail (i.e., I know that I want the 3rd Detail). The use case is to show the detail in context in a UI.

The query to do this I came up with is (here I want to get the 2nd detail from the third MessageA):

let $fh := (
  copy $x := /*:Container
  modify (
    delete node $x/*:MessageA[position() != 3],
    delete node $x/*:MessageA[3]/*:MessageADetail[position() != 2],
    delete node $x/*:MessageB,
    delete node $x/*:MessageC
  )
  return $x
)
return $fh

This works well for small documents. For large documents it can take a couple of seconds to evaluate the query (our real-life documents do have more data/elements in Details and Messages). I'm wondering if there's a better/more efficient way to do this. I tried formulating a query that doesn't do deletes, but I couldn't come up with a solution that performs better and is correct.

Any pointers would be very much appreciated.

Here's a function to generate sufficiently large test data:

declare function local:sample($numberOfMessages, $numberOfDetails) {
  FOO FOO
  {for $i in 1 to $numberOfMessages
   return
     FOO {$i} FOO {$i}
     {for $j in 1 to $numberOfDetails
      return
        FOO {$j} FOO {$j}
     }
  }
  FOO FOO FOO FOO FOO FOO FOO FOO
};

db:create('tr-test', local:sample(20, 10), 'test.xml')

Thanks,
Tom Rauchenwald