[basex-talk] INFO and db:system() show different values
Hi all,

After restarting the HTTP server, the script that accesses the databases no longer finds them with db:open(). The database path reported by INFO in the command-line basex client differs from the one reported by db:system():

db:system(): .../basex-webapp/webapp/WEB-INF/data
INFO:        .../basex-webapp/data

The folder /webapp is where the scripts are. There is only one .basex file, which resides under /basex-webapp and contains the same path as shown by INFO.

Any thoughts?

Best,
Lars
Re: [basex-talk] Xquery recursion and db:add() - stack overflow
Thanks for the pointer! The code has been rewritten using hof:until() and tested against a particular set at our national provider of library data. The script still accumulates data, so it will probably still run into memory trouble with larger datasets, but the stack overflow should be taken care of.

For anyone interested, the code is included below, with hof:until() as the higher-order function. To make it work, fill in the URLs for a chosen OAI endpoint, and maybe change some of the request parameters - this one fetches marc21 records and uses sets. Some error checking may also be implemented.

Cheers,
Lars

declare namespace oai = "http://www.openarchives.org/OAI/2.0/";

(: URL for resumption tokens :)
declare variable $URL := "oai-URL?verb=ListRecords&amp;resumptionToken=";

(: URL for initial request :)
declare variable $URL2 := "oai-URL?verb=ListRecords&amp;metadataPrefix=marc21&amp;set=";

(: Variable for OAI set - if not used, remove "set=" in $URL2 :)
declare variable $oai-set := "aset";

(: BaseX HTTP request options. The element was lost in the archive;
   a plain GET request is assumed here. :)
declare variable $http-option := <http:request method='get'/>;

(:
  Fetch data from the OAI endpoint using a start map containing the resumption
  token and the first set of data. The map has two keys, 'resume' and 'chunk',
  where 'chunk' is an accumulator holding data from the current and previous
  requests. hof:until() does not return an aggregated list of maps, so data
  must be collected somehow.
:)
declare function local:getResumption($startmap) {
  let $token := map:get($startmap, 'resume')
  return
    if (empty($token)) then
      $startmap
    else
      let $http-request := http:send-request($http-option, $URL || $token)
      let $result :=
        if ($http-request instance of node()) then
          $http-request
        else
          (: the wrapper element was lost in the archive; 'result' is a placeholder :)
          <result>{$http-request}</result>
      return map {
        'resume': $result//oai:resumptionToken/text(),
        'chunk': (map:get($startmap, 'chunk'), $result//oai:metadata)
      }
};

(: Issue initial request :)
let $first := http:send-request($http-option, $URL2 || $oai-set)

(: Create start map :)
let $init := map {
  'chunk': $first//oai:metadata,
  'resume': $first//oai:resumptionToken/text()
}

let $oai := hof:until(
  function($x) { empty(map:get($x, 'resume')) },
  function($y) { local:getResumption($y) },
  $init
)

(: Amend with additional code like db:add() or file:write() here :)
return element oai { map:get($oai, 'chunk') }

2016-05-12 15:07 GMT+02:00 Dirk Kirsten <d...@basex.org>:

> Hello Lars,
>
> just a thought (and really just a pointer, I am not a purely functional guy and I also feel like I am missing something obvious...): Maybe you could rewrite the recursive approach using higher-order functions. Consider a query like the following:
>
> hof:scan-left(1 to 100,
>   map { "token": "starttoken" },
>   function($result, $index) {
>     (: the request element was lost in the archive; a plain GET is assumed :)
>     let $req := http:send-request(<http:request method='get'/>,
>       "http://google.com?q=" || $result("token"))
>     return map {
>       "result": $req,
>       "token" : $req//http:header[@name = "Date"]/@value/data()
>     }
>   })
>
> It will issue 100 requests to google and use some specific token from the query before (in this case I used the date). This will output a sequence of the map entries, and in a subsequent step you could return only the actual result values.
>
> Best regards,
> Dirk
>
> --
> Dirk Kirsten, BaseX GmbH, http://basexgmbh.de
> |-- Firmensitz: Blarerstrasse 56, 78462 Konstanz
> |-- Registergericht Freiburg, HRB: 708285, Geschäftsführer:
> |   Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle
> `-- Phone: 0049 7531 91 68 276, Fax: 0049 7531 20 05 22
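The control flow of the hof:until() version can be sketched outside BaseX as well. The following is a minimal Python sketch, not the script above: `until` mimics the predicate/function/start-value shape of hof:until(), and `FAKE_PAGES` and `fetch` are hypothetical stand-ins for the OAI endpoint. It shows that the state (resumption token plus accumulated records) is carried through a plain loop, so the call stack does not grow with the number of requests.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def until(pred: Callable[[T], bool], func: Callable[[T], T], start: T) -> T:
    """Apply func repeatedly, starting from start, until pred holds (a loop, no recursion)."""
    state = start
    while not pred(state):
        state = func(state)
    return state


# Hypothetical stand-in for the OAI endpoint: token -> (records, next token).
FAKE_PAGES = {
    None: (["rec1", "rec2"], "t1"),
    "t1": (["rec3"], "t2"),
    "t2": (["rec4", "rec5"], None),
}


def fetch(token: Optional[str]):
    """Pretend to issue one ListRecords request."""
    return FAKE_PAGES[token]


def step(state: dict) -> dict:
    """One harvesting step: fetch the next page and append it to the accumulator."""
    records, next_token = fetch(state["resume"])
    return {"resume": next_token, "chunk": state["chunk"] + records}


# Initial request, then iterate until no resumption token is left.
first_records, first_token = fetch(None)
init = {"resume": first_token, "chunk": first_records}
result = until(lambda s: s["resume"] is None, step, init)
print(result["chunk"])
```

As in the XQuery version, everything is still accumulated in 'chunk', so memory use grows with the size of the set.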
Re: [basex-talk] Xquery recursion and db:add() - stack overflow
Thanks, Johan and Matti, for the useful suggestions.

Cutting down on the chunks seems to be a viable alternative.

It would have been nice, though, to have a robust harvester in XQuery that could take on anything, although the recursive version works fine as long as the dataset consists of a couple of thousand entries.

Best,
Lars

2016-05-12 8:16 GMT+02:00 Lassila, Matti:

> Hello,
>
> If your case allows using external tools for harvesting, I can highly recommend metha (https://github.com/miku/metha), which is a fairly full-featured command-line OAI-PMH harvester.
>
> Best regards,
>
> Matti L.
>
> On 11/05/16 18:31, "basex-talk-boun...@mailman.uni-konstanz.de on behalf of Johan Mörén" <johan.mo...@gmail.com> wrote:
>
> > Maybe there is some other way to get the data over. I'll have a talk with the guys providing the OAI-endpoint.
Re: [basex-talk] Xquery recursion and db:add() - stack overflow
The basexgui startup file now contains:

BASEX_JVM="-Xmx8g -Xss4m $BASEX_JVM"

This got the script a long way, but eventually it gave out. It works fine on smaller datasets, though.

Maybe there is some other way to get the data over. I'll have a talk with the guys providing the OAI-endpoint.

Thanks for the pointer to Xss!

Lars

2016-05-11 14:38 GMT+02:00 Dirk Kirsten <d...@basex.org>:

> Hello Lars,
>
> if you have a deep recursion, Java will at some point hit its stack size limit. Have you already tried to simply increase the Java stack size, e.g. by passing the parameter -Xss2m to the JVM?
>
> Cheers
> Dirk
[basex-talk] Xquery recursion and db:add() - stack overflow
The following code generates the error "Stack Overflow: try tail recursion?"

The code reads in bibliographic data using OAI-PMH and updates a database for each chunk of data. With OAI-PMH, only part of the data is available for each request, so the server returns a resumption token if more data is available.

The XQuery function making the queries is implemented recursively, with a database update request preceding the recursive call (see the last two lines). Is it db:add() that causes the stack overflow? The recursion cannot be placed any further towards the end!

declare %updating function local:getResumption($token) {
  if (empty($token)) then
    ()
  else
    let $http-request := http:send-request($http-option, $URL || $token)
    let $result :=
      if ($http-request instance of node()) then
        $http-request
      else
        (: the wrapper element was lost in the archive; 'result' is a placeholder :)
        <result>{$http-request}</result>
    let $resume := $result//oai:resumptionToken/text()
    return (
      db:add($database, element chunk {$result//oai:metadata}, $path),
      local:getResumption($resume)
    )
};

Best,
Lars
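For comparison, here is a minimal Python sketch of the same "request, store, follow the resumption token" cycle written as a generator instead of a recursive function. It is only an illustration under stated assumptions: `make_fake_endpoint` is a hypothetical stand-in for the OAI server, and the counter stands in for the per-chunk db:add(). Each chunk is handed over as soon as it arrives, so neither the call stack nor an accumulator grows with the number of requests.

```python
import sys
from typing import Callable, Iterator, List, Optional, Tuple

Page = Tuple[List[str], Optional[str]]  # (records, next resumption token)


def harvest(fetch: Callable[[Optional[str]], Page]) -> Iterator[List[str]]:
    """Yield one chunk of records per request until no resumption token is returned."""
    token = None
    while True:
        records, token = fetch(token)
        yield records
        if token is None:
            break


def make_fake_endpoint(pages: int) -> Callable[[Optional[str]], Page]:
    """Hypothetical endpoint serving `pages` chunks; the token is just the next page number."""
    def fetch(token: Optional[str]) -> Page:
        n = 0 if token is None else int(token)
        next_token = str(n + 1) if n + 1 < pages else None
        return ([f"record-{n}"], next_token)
    return fetch


# Far more requests than the interpreter's recursion limit would allow.
pages = sys.getrecursionlimit() * 5
stored = 0
for chunk in harvest(make_fake_endpoint(pages)):
    stored += len(chunk)  # stands in for storing the chunk in a database
print(stored == pages)
```

Whether per-chunk storage is possible inside a single XQuery update is a separate question; the sketch only illustrates the loop structure.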
[basex-talk] Symlinked files with RESTXQ
Dear all,

What must be done to access symlinked files when using BaseX as an HTTP server?

The scenario is that a fairly large collection of audio files is served over HTTP by BaseX. However, it seems that symlinking (the folder with the audio files exists as a symlink in the static directory of the BaseX web folder) prevents the browser from accessing them. Is there a parameter in the settings I have overlooked?

Best,
Lars G. Johnsen
National Library of Norway
Re: [basex-talk] ft:mark
Thanks for the XQuery code! I will try to integrate it with our own efforts, and let you know how it goes.

Lars

2015-07-05 15:19 GMT+02:00 Christian Grün <christian.gr...@gmail.com>:

> Hi Lars,
>
> I guess it was due to the internal XQFT representation of match positions that adjacent matches are marked one by one, and it's probably difficult to change this without too much additional effort.
>
> However, you can use XQuery for transforming your result; see e.g. the attached XQuery code. I must confess I didn't spend too much time on it (right now, it's simply too hot around here...), so I'm looking for revised versions ;)
>
> Christian
[basex-talk] ft:mark
The full-text function ft:mark() puts a mark around each of the words that occur in a match, from the first matching word to the last, including stop words but excluding punctuation characters.

Is it possible to check for the kind of characters (or strings) that ft:mark() will skip when marking matches? Or would it be possible to ask ft:mark() to put one marker around the whole match?

The case I am using it for is to get the sequence of matching words within a match, and sometimes, for long strings, there may be several sequences. A contiguous sequence of marked elements may be assumed to make up a match, while non-contiguous ones do not. One marker around the whole match would solve the problem, as would detecting the characters that are never marked.

Regards
Lars G Johnsen
National Library of Norway
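The "contiguous marks make up one match" idea can be sketched as follows. This is a minimal Python illustration, not the XQuery code attached in the reply: the token list is a hypothetical flattened view of ft:mark() output, and `is_separator` is an assumption about what counts as skipped text (anything without a letter or digit). Marked words separated only by such text are joined into one match; any unmarked word ends the current match.

```python
from typing import List, Tuple

Token = Tuple[str, bool]  # (text, is_marked)


def is_separator(text: str) -> bool:
    """True for text made only of whitespace/punctuation (no letters or digits)."""
    return not any(ch.isalnum() for ch in text)


def merge_marks(tokens: List[Token]) -> List[str]:
    """Join marked tokens that are separated only by whitespace/punctuation."""
    matches: List[str] = []
    current: List[str] = []  # pieces of the match being built
    pending: List[str] = []  # separators seen since the last marked token
    for text, marked in tokens:
        if marked:
            if current:
                current.extend(pending)  # separators inside a match are kept
            pending = []
            current.append(text)
        elif current and is_separator(text):
            pending.append(text)  # may turn out to be inside or after the match
        else:
            if current:
                matches.append("".join(current))  # an unmarked word ends the match
            current, pending = [], []
    if current:
        matches.append("".join(current))
    return matches


# Hypothetical flattened output: each word marked separately, punctuation unmarked.
tokens = [
    ("Det", False), (" ", False), ("norske", True), (" ", False), ("folk", True),
    (", ", False), ("og", False), (" ", False), ("norske", True), (" ", False),
    ("folk", True), (".", False),
]
print(merge_marks(tokens))
```

Trailing separators after the last marked word are dropped, so a match never ends in punctuation.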
Re: [basex-talk] Potential Bug - message from fulltext
After adding a list of stop words (the top 100 words removed), the full-text index works perfectly. Thanks for the suggestions!

Lars

2015-06-28 0:03 GMT+02:00 Christian Grün <christian.gr...@gmail.com>:

> Hi Lars,
>
> It looks as if the input data is indeed too large to be indexed (the internal id lists seem to exceed the maximum array size in main memory). The usual alternative to make it work is to distribute your document(s) into multiple databases.
>
> If you want, you can also provide us with the input data, but I assume it will take pretty much space?
>
> Best,
> Christian
[basex-talk] Potential Bug - message from fulltext
When trying to build a full-text index on a collection of texts, the process runs for a couple of hours and then exits with the message below. I think it is nearly complete by then: in the GUI I have at least seen the progress bar get to around 80 %, so I think it is safe to assume that the error is connected to the final stages.

The texts are unstructured and represented as one line per book. Here is the result from the index process. Parameters set in the GUI are: Norwegian Snowball, lemmatization, diacritics. 30 GB is set aside for the GUI.

Path summary:
doc(): 317259x, strings
  text: 317259x, leaf
    text(): 317259x, strings, leaf

Here is the error message:

Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 8.2 beta 7d38949
Java: Oracle Corporation, 1.7.0_79
OS: Linux, amd64
Stack Trace:
java.lang.NegativeArraySizeException
  at java.util.Arrays.copyOf(Arrays.java:2271)
  at org.basex.util.TokenBuilder.add(TokenBuilder.java:303)
  at org.basex.util.TokenBuilder.add(TokenBuilder.java:290)
  at org.basex.index.ft.FTBuilder.merge(FTBuilder.java:248)
  at org.basex.index.ft.FTBuilder.write(FTBuilder.java:155)
  at org.basex.index.ft.FTBuilder.index(FTBuilder.java:94)
  at org.basex.index.ft.FTBuilder.build(FTBuilder.java:102)
  at org.basex.index.ft.FTBuilder.build(FTBuilder.java:1)
  at org.basex.data.DiskData.createIndex(DiskData.java:195)
  at org.basex.core.cmd.ACreate.create(ACreate.java:117)
  at org.basex.core.cmd.CreateIndex.run(CreateIndex.java:62)
  at org.basex.core.Command.run(Command.java:398)
  at org.basex.core.Command.execute(Command.java:100)
  at org.basex.core.Command.execute(Command.java:123)
  at org.basex.gui.dialog.DialogProgress$1.run(DialogProgress.java:178)

Regards
Lars G Johnsen
National Library of Norway
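A NegativeArraySizeException thrown from Arrays.copyOf typically means that a requested array length, held in a signed 32-bit int, has wrapped around to a negative value. The following Python sketch illustrates that general mechanism only; it is an assumption about how such an exception can arise, not BaseX's actual TokenBuilder code, and `to_int32` and `grow` are hypothetical helpers.

```python
def to_int32(n: int) -> int:
    """Wrap an integer the way a Java int (signed 32-bit) would."""
    n &= 0xFFFFFFFF
    return n - 0x100000000 if n >= 0x80000000 else n


def grow(capacity: int) -> int:
    """Double a buffer capacity using 32-bit arithmetic (hypothetical growth policy)."""
    return to_int32(capacity * 2)


capacity = 1 << 30            # 1,073,741,824 entries, still a valid Java array length
new_capacity = grow(capacity)
print(new_capacity)           # negative: doubling no longer fits in a signed 32-bit int
```

A buffer that large cannot be doubled again within a single Java array, whatever the heap size; keeping each individual list smaller is the only way around that limit.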
Re: [basex-talk] rest vs. restxq - strange difference
Thanks for that. The new version works nicely with full-text indexing - and very fast too! I noticed that the text index seems to work differently between RESTXQ (not utilized?) and REST, judging from the response time.

Thanks again for the efforts
Lars

2015-05-19 13:29 GMT+02:00 Christian Grün <christian.gr...@gmail.com>:

> > I'll check out how this can be fixed.
>
> So I checked out how to fix it, and I fixed it [1]. Feel free to try the latest snapshot [2]!
>
> Christian
>
> [1] https://github.com/BaseXdb/basex/issues/1144
> [2] http://files.basex.org/releases/latest
Re: [basex-talk] rest vs. restxq - strange difference
As an update, after rebuilding the database with a text index and a full-text index (no language, no stemming, keep diacritics) and restarting the server:

BaseX 8.1.1 [Server]
Server was started (port: 29084)
[main] INFO org.eclipse.jetty.server.AbstractConnector - Started SelectChannelConnector@0.0.0.0:8984
HTTP Server was started (port: 8984)

RESTXQ: Norwegian characters are converted when using the full-text index; changing to the text index takes forever.
REST: Full-text works as expected, and the text index works as expected (same as running in the GUI for both).

It looks as if the index structure is treated differently.
Re: [basex-talk] rest vs. restxq - strange difference
The full-text query is blisteringly fast for both; the text index query is fast only for REST queries and seems not to be used with queries in RESTXQ.

I am rebuilding the whole database now to see how it goes, and will restart everything for a new assessment.

2015-05-18 15:00 GMT+02:00 Christian Grün <christian.gr...@gmail.com>:

> > However, when using text index instead of full text the results are the same for both, except that RESTXQ takes almost forever
>
> What about the original query: Has it been slow as well, or do you think this is a new problem?
Re: [basex-talk] rest vs. restxq - strange difference
I tried to make a small example, but then things worked the same, so I reindexed the database (no language and no stemming) and found this. It seems that it has to do with character encoding. RESTXQ finds hits for føre as fore, while REST treats it as føre, so the outputs look like this:

REST output (first two lines):
føre
fø - re 219

RESTXQ:
føre
fo - re 123

The first word quoted is føre in both cases and is what the scripts see, so the full-text query is given the same word in both cases. Could it be that within RESTXQ the full-text index is treated differently? I will work further on a self-contained example, but thought this might point to something.

Cheers
Lars

2015-05-18 13:44 GMT+02:00 Lars Johnsen <yoon...@gmail.com>:

> Hi Christian - and thanks for the fast response. The latest version, 8.1.1, is in use (same behaviour as the previous one). Let me see if I can make a self-contained example.
>
> best,
> Lars
>
> 2015-05-18 13:40 GMT+02:00 Christian Grün <christian.gr...@gmail.com>:
>
> > Hi Lars,
> >
> > hm, that's difficult to tell. All I can say is that this sounds unusual, so I'm coming up with my standard questions: Do you think you could build us a little example that allows us to reproduce the problem? Have you tried the latest version of BaseX?
> >
> > Best,
> > Christian
> >
> > On Mon, May 18, 2015 at 1:35 PM, Lars Johnsen <yoon...@gmail.com> wrote:
> >
> > > I am running a web script in two identical versions (identical as in cut and paste), one via RESTXQ and one via REST. The response is different, and I wondered what may be the trouble.
> > >
> > > For example, the output (the URLs only work locally) for
> > > http://ljohnsen:8984/hyphens/mellom
> > > is the same as for
> > > http://ljohnsen:8984/rest?run=hyphen-show.xq&word=mellom
> > > which is a set of hyphenation data:
> > >
> > > mellom
> > > mel - lom 17005
> > > Mel - lom 144
> > > mel - lom. 50
> > >
> > > but if mellom is exchanged with nasjonalbiblioteket, only the REST version shows any result, which then is the same as I get when experimenting in the GUI.
> > >
> > > The actual script is added below. It runs in both versions (identical apart from the REST and RESTXQ interfaces) and uses full-text search, but the results differ when run under the REST regime.
> > >
> > > All the best
> > > Lars G Johnsen
> > > National Library of Norway
> > >
> > > module namespace page = 'http://basex.org/modules/web-page';
> > >
> > > declare
> > >   %rest:path("/hyphens/{$word}")
> > >   %output:method("html")
> > > function page:show-hyphens($word) {
> > >   let $db := db:open('hyphen-data')
> > >   let $hyphens :=
> > >     for $hyp in $db/hyphens/hyphens[full contains text {$word}]
> > >     group by $first := $hyp/first, $second := $hyp/second
> > >     let $count := count($hyp)
> > >     order by xs:int($count) descending
> > >     return element p { attribute freq {$count}, $first, " - ", $second, $count }
> > >   let $total := sum($hyphens//@freq)
> > >   let $div := element div {
> > >     element p {$word},
> > >     for $hyp in $hyphens
> > >     return element div {
> > >       attribute class {"hyph"},
> > >       attribute style {"font-size:", 1 + round(xs:int($hyp//@freq/data()) div $total, 1) || "em"},
> > >       $hyp
> > >     }
> > >   }
> > >   return
> > >     <html encoding="UTF-8">
> > >       <head>
> > >         <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
> > >         <title>Orddelinger</title>
> > >       </head>
> > >       <body>{$div}</body>
> > >     </html>
> > > };
Re: [basex-talk] rest vs. restxq - strange difference
The codepoints for føre are identical for both: 102 248 114 101, and the same as in the GUI.

However, when using the text index instead of full text, the results are the same for both, except that RESTXQ takes almost forever (as if there was no text index), while REST gives an immediate result. So it looks as if RESTXQ accesses the index structure in a different way - could that be so, or is there something strange in my own configuration?

2015-05-18 14:28 GMT+02:00 Christian Grün <christian.gr...@gmail.com>:

> It could be that your URL is decoded in a wrong way. What happens if you run the following function with REST and RESTXQ and føre as word?
>
> declare %rest:path("/test/encoding/{$word}")
> function page:test-encoding($word) {
>   string-to-codepoints($word)
> };
>
> Thanks,
> Christian
Re: [basex-talk] rest vs. restxq - strange difference
A last update, which may illuminate a little. After reindexing the database using Norwegian (snowball), stemming, and keeping diacritics, RESTXQ processes neither the special characters (it treats them as the closest ASCII character) nor inflected forms. The words mannen (= the man, definite) and spaserer (= walks, present tense) result in no output, while using the naked stems mann and spaser the full result is displayed. This is in contrast to REST, which behaves as expected.

Cheers Lars

2015-05-18 15:28 GMT+02:00 Lars Johnsen yoon...@gmail.com:

As an update, after rebuilding the database with text index and full-text index (no language, no stemming, keep diacritics) and restarting the server:

BaseX 8.1.1 [Server]
Server was started (port: 29084)
[main] INFO org.eclipse.jetty.server.AbstractConnector - Started SelectChannelConnector@0.0.0.0:8984
HTTP Server was started (port: 8984)

RESTXQ: Norwegian characters are converted using full text index, changing to text index takes forever.
REST: Full-text works as expected, and text index works as expected (same as running in the GUI for both).

It looks as if the index structure is treated differently.

2015-05-18 15:07 GMT+02:00 Lars Johnsen yoon...@gmail.com:

The full-text query is blisteringly fast for both; the text index query is fast only for REST queries and seems not to be used with queries in RESTXQ. I am rebuilding the whole database now to see how it goes, and will restart everything for a new assessment.

2015-05-18 15:00 GMT+02:00 Christian Grün christian.gr...@gmail.com:

However, when using text index instead of full text the results are the same for both, except that RESTXQ takes almost forever

What about the original query: Has it been slow as well, or do you think this is a new problem?
Re: [basex-talk] rest vs. restxq - strange difference
Hi Christian - and thanks for the fast response. The latest version, 8.1.1, is in use (same behaviour as the previous one). Let me see if I can make a self-contained example.

best, Lars
Re: [basex-talk] memory usage when writing files
Hi Christian, and thanks a lot for the pointer to fetch:xml - it seems to do the trick! Now, a little recoding, and it should be working.

Best, Lars

2015-03-27 10:48 GMT+01:00 Christian Grün christian.gr...@gmail.com:

Hi Lars,

Here is some background information for the reported behavior (sorry in advance if this is known to you anyway): The functional semantics of XQuery requires that repeated calls to fn:doc and fn:collection return the same documents. This can e.g. be shown by the following query:

doc('x.xml') is doc('x.xml')

As it's difficult to guess in advance which of the opened documents will possibly be requested again in the same query, they are all kept in main-memory until query evaluation is completed. However, things are different with functions like fetch:xml [1]. You may need to tweak your query a little bit, because the function will always give you single XML documents.

Does this help?
Christian

[1] http://docs.basex.org/wiki/Fetch_Module#fetch:xml
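Following the pointer to fetch:xml, the folder loop from the original post might be reworked roughly as below. This is only a sketch under assumptions: the two paths are placeholders, local:to-html is a hypothetical stand-in for the conversion function of the original script, each folder is assumed to hold *.xml files, and file:list is assumed to return folder names with a trailing separator. Since fetch:xml returns one document per call, the documents of a folder are collected with an inner loop instead of collection().

```xquery
(: Placeholders, not the real locations :)
declare variable $digibooks := '/path/to/books/';
declare variable $htmlfiles := '/path/to/html/';

(: Hypothetical stand-in for the real conversion function :)
declare function local:to-html($docs as document-node()*) as element(html) {
  <html><body>{ count($docs) } pages</body></html>
};

for $folder in file:list($digibooks)
let $html := $htmlfiles || substring-before($folder, '_ocr') || '.html'
(: the script is rerun, so skip folders that were already converted :)
where not(file:exists($html))
return
  let $docs :=
    for $name in file:list($digibooks || $folder, false(), '*.xml')
    (: fetch:xml parses the file without registering it as a doc() resource :)
    return fetch:xml($digibooks || $folder || $name)
  return file:write($html, local:to-html($docs))
```

Whether this keeps memory flat across 20 000 folders has not been measured here; it only illustrates the shape of the change.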
[basex-talk] memory usage when writing files
Hi all

Here is code that gradually eats up memory, whether run in the GUI or as a command. All it does is create temporary collections out of folders and write them to file. Is there a simple way to keep this code from eating up memory? It runs out of memory (set at 12GB for command, 18GB in GUI) after 300 folders or so, and it has to process 20 000 of them.

Best
Lars G Johnsen
Norwegian National Library

Here is the actual code:

(: process list of folders :)
for $collections in file:list($digibooks)
let $html := $htmlfiles || substring-before($collections, '_ocr') || '.html'
return
  (: code is rerun so check if files exist :)
  if (not(file:exists($html))) then
    try {
      (: create a temporary collection of the files and write result to disk :)
      file:write(
        $html,
        db:digibok-to-html(collection($digibooks || $collections))
      )
    } catch * {
      $err:code
    }
  else ()
Re: [basex-talk] Error?
An attempt with a brand new family of colon-infected folders (not databases) works nicely now with the new snapshot, while the same commands in the old version still generate the FODC0002 error. So both of these worked nicely in the new snapshot (the strings refer to files and folders, not database paths):

collection("/home/larsj/terra/Synsegn/URN:NBN:no-nb_digitidsskrift_2014092382210_001")

as well as

doc("/home/larsj/terra/Synsegn/URN:NBN:no-nb_digitidsskrift_2014092382210_001/Alto_13.xml")

Cheers Lars

2015-03-12 11:05 GMT+01:00 Christian Grün christian.gr...@gmail.com:

Thanks, new snapshot downloaded, but the errors persist in some way or another (not exactly the same).

Does it still have to do with colons? If you find out more about it, feel free to tell me.

Thanks, Christian
Re: [basex-talk] Error?
Thanks, new snapshot downloaded, but the errors persist in some way or another (not exactly the same). However, since you mentioned that the problems were caused by colons in filenames, removing those colons did the trick. Now it all works as expected. The colon-free folders can be referenced with collection(), as well as being ingested. All is well.

All clear!
Lars

2015-03-11 18:57 GMT+01:00 Christian Grün christian.gr...@gmail.com:

Hi Lars,

The issue should be fixed with the latest snapshot [1].

See you again,
Christian

[1] http://files.basex.org/releases/latest
[basex-talk] Error?
Hi all

I have a database collection which gives a strange error message '[FODC0002] (Line 1): Content is not allowed in prolog.' on this query:

for $x in collection('samtiden') return doc(document-uri($x))

The results from document-uri($x) are as expected, but doc(document-uri($x)) should return $x - so what can be the cause of the error? The collection was entered in the GUI using "skip corrupt files", since some of the files had errors (ended prematurely). But the database and document-uri() work fine for all documents in the database.

I need to process the paths for grouping the documents in the collection, so the practical value is a bit more than just trying to compute an identity function!

Best,
Lars G Johnsen
National Library of Norway
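For the grouping use case, one option that avoids the doc() round trip may be db:path, which returns the path of a document inside its database. A minimal sketch, assuming the database is named samtiden as in the query above and that documents sit in subfolders; the output element and attribute names are made up for illustration.

```xquery
for $doc in collection('samtiden')
let $path := db:path($doc)
(: everything before the last slash; a path without a slash is kept as is :)
group by $folder := replace($path, '/[^/]*$', '')
return <group folder="{ $folder }" count="{ count($doc) }"/>
```

After the group by clause, $doc holds all documents of one folder, so further processing per group can replace the count.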
[basex-talk] Invoking database update from RESTXQ
Hello all I was wondering how to perform database updating, like db:add, in RESTXQ, since the adding and updating functions throw errors when invoked inside functions defined in RESTXQ-modules. Have been trying to use the forward mechanism, transferring the data to a script without functions, with no success. Best, Lars G Johnsen
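BaseX documents an %updating annotation for functions that perform updates, and RESTXQ functions can carry it. A minimal sketch, assuming a database named my-database and a hypothetical /add path (both invented for this example); whether it resolves the errors described above depends on the BaseX version and on what the function tries to return besides the update.

```xquery
module namespace page = 'http://basex.org/modules/web-page';

(: Hypothetical endpoint: PUT /add/some-name.xml with an XML body :)
declare
  %updating
  %rest:path("/add/{$name}")
  %rest:PUT("{$body}")
function page:add($name, $body) {
  (: my-database is an assumed database name :)
  db:add('my-database', $body, $name)
};
```

An updating function contributes only to the pending update list, so this sketch returns no page content of its own.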
Re: [basex-talk] RESTXQ and access blocking
Thanks - it worked out nicely! I just commented out the servlet section on REST.

Cheers, Lars

2015-01-14 15:57 GMT+01:00 Dirk Kirsten d...@basex.org:

Hello Lars,

You can disable the REST interface if you do not intend to use it (and you solely use RESTXQ). This can be done using your web server. In our default Jetty-based HTTP server you can find the servlet mapping in WEB-INF/web.xml, where you can simply disable the servlet mapping for REST. Of course you could also secure this path using your web server (e.g. requesting an HTTP authentication when accessing REST).

Cheers, Dirk

--
Dirk Kirsten, BaseX GmbH, http://basexgmbh.de
|-- Firmensitz: Blarerstrasse 56, 78462 Konstanz
|-- Registergericht Freiburg, HRB: 708285, Geschäftsführer:
|   Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle
`-- Phone: 0049 7531 28 28 676, Fax: 0049 7531 20 05 22
[basex-talk] RESTXQ and access blocking
Hi all

I was wondering how to block general access to BaseX when using RESTXQ. Our javascript/jquery web application communicates with BaseX using commands like:

$('#myobject').load('objects')

where the term 'objects' is defined as a path in a .xqm file:

declare %rest:path("/objects")

However, databases are exposed using the URL /rest, which seems built into the REST module. For example, in the javascript/jquery console (e.g. in Chrome), a div could be filled up with content outside of the application by typing things like:

$('div').load('rest/my_database')

and general queries could be made using the REST interface http://docs.basex.org/wiki/REST.

Is there a way to prevent this, while at the same time using BaseX as web server (one way is to use BaseX only as a backend database)? Or how to limit the URLs permitted?

Best
Lars
Re: [basex-talk] deduplication problem in BaseX 7.9?
Hi Michael - I got the same results with my 7.9 version. What is a bit surprising (hopefully I am not introducing any noise into your problem) is that if the last child step is cut off the paths and added to the path variables within the return clause, the result becomes 4 4 4 4 4. So each $pathN without the article-id step appears to return one copy of the parent article-meta up to the parent step:

let $path1 := $doc/child::article/child::front
                  /child::article-meta
                  /child::aff[contains(.,"Japan")]
                  /parent::article-meta,
    $path2 := $doc/descendant::aff[contains(.,"Japan")]
                  /parent::article-meta,
    $path3 := $doc/article/front/article-meta/aff[contains(.,'Japan')]
                  /..,
    $path4 := $doc//aff[contains(.,'Japan')]/..,
    $path5 := $doc//aff/..
return (count($path1/child::article-id),
        count($path2/article-id),
        count($path3/article-id),
        count($path4/article-id),
        count($path5/article-id))

Best Lars

2014-09-26 20:35 GMT+02:00 C. M. Sperberg-McQueen cms...@blackmesatech.com:

Consider the following XML document:

<article>
  <front>
    <article-meta>
      <aff id="aff1">Tropical and Infectious Disease Hospital, Kathmandu, Nepal</aff>
      <aff id="aff2">Nagasaki University, Nagasaki, Japan</aff>
      <aff id="aff3">Department of Radiology, Kyorin University Faculty of Medicine, Tokyo, Japan</aff>
      <aff id="aff5">Pentax Company Limited, Tokyo, Japan</aff>
      <aff id="aff6">National Research Laboratory of Molecular Complex Control, Yonsei University, Seoul, Korea</aff>
      <!--* ... *-->
      <article-id pub-id-type="pmc">2570825</article-id>
      <article-id pub-id-type="pmid">18325280</article-id>
      <article-id pub-id-type="publisher-id">07-0473</article-id>
      <article-id pub-id-type="doi">10.3201/eid1403.070473</article-id>
    </article-meta>
  </front>
  <!--* ... *-->
</article>

For convenience in trying to understand this problem, a copy of this document has been placed at [1].

When I issue the following search against this document, I get unexpected results.

let $doc := doc('http://blackmesatech.com/2014/LIS590DML/data/testdata.xml')
let $path1 := $doc/child::article/child::front
                  /child::article-meta
                  /child::aff[contains(.,"Japan")]
                  /parent::article-meta/child::article-id,
    $path2 := $doc/descendant::aff[contains(.,"Japan")]
                  /parent::article-meta/child::article-id,
    $path3 := $doc/article/front/article-meta/aff[contains(.,'Japan')]
                  /../article-id,
    $path4 := $doc//aff[contains(.,'Japan')]/../article-id,
    $path5 := $doc//aff/../article-id
return (count($path1), count($path2), count($path3), count($path4), count($path5))

What I expect is that path1, path2, path3, path4, and path5 should all return the same results, namely the set of four article-id elements in the document. So the sequence of counts returned should be 4 4 4 4 4.

What I am finding is that path1 and path3 are returning 12 results, with each article-id present three times in the result (once, apparently, for every aff element containing the string 'Japan'). Paths 2, 4, and 5 are all returning 4 results each, as I had expected them to. So the sequence of counts actually returned is 12 4 12 4 4.

In BaseX 7.6, for what it's worth, this query returns the sequence 12 12 12 12 20, which seems suggestive.

Interestingly, if I initialize the variable $doc with a direct element constructor, along the lines of

let $doc := document { <article>...</article> }

then all counts come out as expected in 7.6, but in 7.9 the result continues to be 12 4 12 4 4.

Is this an error in the handling of the / operator, or am I missing some subtle point? Many thanks.

[1] http://blackmesatech.com/2014/LIS590DML/data/testdata.xml

--
* C. M. Sperberg-McQueen, Black Mesa Technologies LLC
* http://www.blackmesatech.com
* http://cmsmcq.com/mib
* http://balisage.net
Re: [basex-talk] RestXQ and file paths
Hi Christian

Thanks - that solved it!

Best, Lars

2014-09-02 17:48 GMT+02:00 Christian Grün christian.gr...@gmail.com:

Hi Lars,

the relative URL static/logo.png will be rewritten by your browser to the destination URL. If your URL is e.g.

http://localhost:1234/show/xyz

...then the image URL will be rewritten to...

http://localhost:1234/show/static/logo.png

In other words, there is no way for BaseX, RESTXQ or the web server to correct this address resolution. The easiest solution to tackle this is to use absolute paths. The following URL...

<img src="/static/logo.png"/>

...will be rewritten to...

http://localhost:1234/static/logo.png

Hope this helps,
Christian
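The advice on absolute paths can be illustrated with a small RESTXQ function. This is a sketch only: the /show path and the logo file name are taken from the example URLs in the reply, while the function name and page layout are invented here.

```xquery
module namespace page = 'http://basex.org/modules/web-page';

declare
  %rest:path("/show/{$id}")
  %output:method("html")
function page:show($id as xs:string) {
  <html>
    <head><title>{ $id }</title></head>
    <body>
      <!-- leading slash: the browser resolves this against the host,
           not against /show/ -->
      <img src="/static/logo.png"/>
    </body>
  </html>
};
```

With src="static/logo.png" instead, a request to /show/xyz would make the browser ask for /show/static/logo.png, which is the failure described above.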
[basex-talk] Downloading files
I am using BaseX RESTXQ for accessing a repository from a web browser. Uploading files works smoothly, but I can't see how to make a download button work. For uploading, the recipe on the RESTXQ help page was enough to get it to work. Is there a corresponding way of making downloading work?

What I have tried is to let BaseX send an HTML page containing:

<form method="get" action="/download/{$file}">
  <button type="submit">Download</button>
</form>

To process this form there is the following RESTXQ function:

declare %rest:path("/download/{$file}") %output:method("html") function page:download-file($file) { ... }

Inside the curly braces, I have tried an a href.. element and file:read-binary, but neither of them with any success. BaseX complains about the a element, and file:read-binary outputs directly to the browser. Any suggestions?
Re: [basex-talk] Downloading files
Thanks a lot Andy! It works like a charm. Now the filename can be manipulated as well. Perfect!

Best, Lars

2014-08-21 17:10 GMT+02:00 Andy Bunce bunce.a...@gmail.com:

Hi Lars,

You need to return a sequence of two items: (restxq:response, thedata)

I do something like...

declare %rest:path("/download/{$file}") function page:download-file($file) {
  (download-response("raw", $file), file:read-binary(..))
};

(:~ headers for download :)
declare function download-response($method, $filename) {
  <restxq:response>
    <output:serialization-parameters>
      <output:method value="{$method}"/>
    </output:serialization-parameters>
    <http:response>
      <http:header name="Content-Disposition" value='attachment;filename={$filename}'/>
    </http:response>
  </restxq:response>
};

/Andy

On 21 August 2014 15:40, Lars Johnsen yoon...@gmail.com wrote:

I came a little closer by making custom http:headers, but I have to confess I'm in deep water here:

declare %rest:path("/download/{$file}") function page:download-file($file) {
  <rest:response>
    <http:response status="200" message="OK">
      <http:header name="Content-Disposition" value="Attachment"/>
      <http:header name="filename" value="{$file}"/>
    </http:response>
  </rest:response>
};

This function does trigger a download of a file with the appropriate file name (= $file) containing the text OK. If I could just find somewhere in this code to put the contents of the file, it should solve the problem.

Best, Lars
[basex-talk] Basex 404
I've recently run into a problem with basexserver/basexhttp. host:8984 responds but seems to have lost contact with the webapp directory. Here is what is happening on 3 out of 5 installations (on Linux, installed by unpacking the latest zip archive), one of which used to work as expected:

1) host:8984 does not display the RESTXQ example, but is up and running
2) host:8984/rest shows the list of databases
3) host:8984/rest/run=script-name.xq gives 404 - no function matches request

So the server appears to be connected to the database, but won't look at the webapp directory. Where is the best place to look for information?

All the best, Lars
[basex-talk] Traling empty nodes in csv:serialize
Hi all

I have a question about csv:serialize and blank nodes. Trailing blanks are not indicated with a separator, but those in the middle are.

csv:serialize(
  <rows>
    <row><f>1</f><s>2</s><t>3</t></row>
    <row><f></f><s></s><t>3</t></row>
    <row><f>1</f><s></s><t>3</t></row>
    <row><f>1</f><s></s><t></t></row>
    <row><f>1</f><s>2</s><t></t></row>
  </rows>)

Here is the output (BaseX 7.8.1), where lines 4 and 5 are missing trailing separators:

1,2,3
,,3
1,,3
1
1,2

Could this be fixed, or is it a feature?

All the best
Lars G Johnsen
National Library of Norway

___ BaseX-Talk mailing list BaseX-Talk@mailman.uni-konstanz.de https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk
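Until the serializer behaviour is settled, one possible workaround is to build the lines by hand with string-join, which keeps an empty string for every empty field. A minimal sketch, assuming the same rows/row structure as above and values that need no quoting or escaping (neither is handled here):

```xquery
let $rows :=
  <rows>
    <row><f>1</f><s>2</s><t>3</t></row>
    <row><f/><s/><t>3</t></row>
    <row><f>1</f><s/><t>3</t></row>
    <row><f>1</f><s/><t/></row>
    <row><f>1</f><s>2</s><t/></row>
  </rows>
return string-join(
  for $row in $rows/row
  (: one string per field, empty fields included :)
  return string-join($row/*/string(), ','),
  '&#10;'
)
```

Every row with three fields then yields exactly two separators, whether or not the last fields are empty.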
Re: [basex-talk] Revisiting XMLPrague
It was great meeting so many of the BaseX developers. I especially appreciated all the help with the distinction between Semmelknödel vs. Serviettenknödel! Thanks again for a great time in Prague

Lars
National Library of Norway

2014-02-17 18:29 GMT+01:00 Christian Grün christian.gr...@gmail.com:

Dear subscribers,

it was great to meet some of you live in Prague! And once again I would like to thank all our speakers (Lars Johnsen, Gerrit Imsieke, Yoann Maingon, Jean-Marc Mercier and Michael Sperberg-McQueen) for making the user meeting a memorable event!

Have fun with BaseX 7.8, and we are looking forward to your feedback,
Christian

PS: Maybe we'll see each other in London or Amsterdam, or at the Balisage? We'll tell you in time if we will be participating as well.
[basex-talk] Parameter to xquery
Is it possible to pass parameters to XQuery modules using the BaseX scripting language? I tried something similar to the HTTP specification but had no luck. For example, if the code below is stored as add.xq, could $x and $y be supplied in a RUN add.xq command?

declare variable $x external;
declare variable $y external;
let $res := $x + $y
return $res

All the best
Lars G Johnsen
National Library of Norway
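One detail that may matter once values are bound from outside: externally supplied values often arrive as strings, and $x + $y then raises a type error. Declaring types on the external variables is a way to guard against that. A sketch, assuming integer inputs; the default values are added here for illustration only, and the comments mention the BaseX BINDINGS option and -b flag as possible ways to supply the values, to be verified against the documentation of the version in use.

```xquery
(: add.xq
   Possible ways to bind the values in BaseX:
     SET BINDINGS x=1,y=2   followed by   RUN add.xq
     basex -bx=1 -by=2 add.xq
:)
declare variable $x as xs:integer external := 0;
declare variable $y as xs:integer external := 0;

$x + $y
```

With the declared types, a bound value that cannot be read as an integer is rejected when the query starts rather than in the middle of the addition.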
[basex-talk] html and css
When running the BaseX HTTP server, I wondered where resources like javascript and css should be located. The .xqm file for the welcome RESTXQ module seems to fetch its svg and css from a static directory under webapp. However, when sending HTML code from within a home-made query, it won't connect to any css, neither within webapp nor webapp/static, nor even with an absolute path.

Best
Lars G Johnsen
National Library of Norway
[basex-talk] csv-parser
Hi all

When importing CSV files, it seems that BaseX is parsing fields for balanced bracketing and quotes (a single quote or bracket causes trouble). Is it possible to turn that off, so that files are processed based only on delimiter? I couldn't find any information in the documentation.

All the best, Lars
Re: [basex-talk] csv-parser
Fabulous, that fixed it! The lines with a single quote in them are fine, as well as those with a single left parenthesis (. The data come from an SQL database with human-entered data in free-text fields, so things aren't always in balance. The new file browser with the editor is nice too!

Best, Lars

2013/12/4 Christian Grün christian.gr...@gmail.com

Hi Lars,

When importing CSV files, it seems that BaseX is parsing fields for balanced bracketing and quotes (a single quote or bracket causes trouble). Is it possible to turn that off, so that files are processed based only on delimiter? I couldn't find any information in the documentation.

I’ve just added such an option to BaseX. Could you please give us some feedback if it does what you expect?

Thanks, Christian

[1] http://docs.basex.org/wiki/CSV_Module#Options
[2] http://files.basex.org/releases/latest/
[basex-talk] Results from ft:extract()
Hello all

When using ft:extract() on nodes, it seems to clip into the match itself too often. Is it possible to have ft:extract() leave as much text before the match as after?

For example, here are two results for spise lunsj (= eat lunch; the language is Norwegian). The first is as it should be, while the second has half of the matched string clipped. The result is obtained first as a set of hits using full-text search [text() contains text {$terms} all], then each hit is processed through ft:extract($hit, $terms):

... gjerne komme for å spise lunsj med meg på Harrods. Da skal jeg servere hjortetestikler fra eiendommen min i Skottland. Vi trenger nemlig alle store...

... de franske VM-spillerne hjem til Paris. Der ble de mottatt av fans både på Charles de Gaulle-flyplassen og da de ankom Elysee-palasset for å spise ...

Regards,
Lars G Johnsen
National Library of Norway
[basex-talk] HTML serialization error
When converting files from XML to HTML, a serialization error appeared saying something to the effect that x84 was an illegal HTML character. The files were written using file:write with the parameter $params defined as:

let $params :=
  <output:serialization-parameters>
    <output:method value="html"/>
  </output:serialization-parameters>

When sending the HTML directly to the browser (not writing to file and using the above declaration), the browser (Chrome) appeared to be OK, and displayed the full HTML. Processing the text through normalize-unicode() didn't help. The error persisted. Is there a way to fix the text before submitting it to file:write()?

All the best
Lars G Johnsen
National Library of Norway
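The HTML output method rejects control characters in the range #x7F to #x9F, which includes x84. One way to clean text before serialization, offered here only as a possible approach and not as the fix that was eventually applied on the list, is to strip that range with replace(). The function name local:strip-c1 is made up for this sketch.

```xquery
(: Remove the control characters #x7F-#x9F that the HTML output method rejects :)
declare function local:strip-c1($text as xs:string) as xs:string {
  replace($text, '[&#x7F;-&#x9F;]', '')
};

(: the x84 character between a and b is dropped :)
local:strip-c1('a&#x84;b')
```

Since the words live in attribute values in the source files, the function would have to be applied to those values while the HTML is built; replacing the characters with a visible placeholder instead of deleting them is an equally valid choice.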
Re: [basex-talk] HTML serialization error
Hi Christian and thanks for the quick solution! Do you have some idea how the x84 byte was stored into the database? Each file is a digitized book, where the bytes stem from the OCR process. The words themselves are stored as values of attributes, for example STRING CONTENT="word".

All the best
Lars
[basex-talk] BaseX rest and PHP
Hi all, This is a question for those with experience using PHP with BaseX and the REST interface. There is a peculiar case of lag for queries, and I wonder whether it has to do with how PHP does things, with how BaseX does things, or with something else. This may be a long shot, but I'll still give it a try. The setup is fairly straightforward: queries are sent from an HTML page via a PHP script, which places the call file_get_contents(basex-rest-URL). The result is then sent back to the requesting HTML page. The problem is that this sometimes hangs: after two or three queries it stops, or takes an awfully long time. The peculiar thing is that the basex-rest-URL, when typed directly into the browser's address bar, receives an immediate response, even while the same request is still being processed by the HTML-PHP-BaseX cycle. Sometimes the cycle gives an answer, sometimes not. The cycle does work when file_get_contents is directed towards an SQLite database, so it seems that something is going on in the communication between PHP and BaseX. Any ideas? Regards, Lars G Johnsen National Library of Norway
[basex-talk] OAI-PMH
Hello all, I am looking into the possibility of using BaseX as an OAI-PMH metadata provider and harvester, and wondered if anyone has experience with it for this purpose. Specifically, I mean using BaseX as a repository, with the HTTP service and XQuery scripts for accessing and providing metadata records. Presumably there aren't any limitations on the database side, and since the OAI-PMH protocol is all XML (http://www.openarchives.org/pmh/), it seems like a good idea to try to make it work. So if people on this list have any experience, I would like to hear from you. Thanks, Lars G Johnsen National Library of Norway
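To make the idea of XQuery scripts providing metadata records more concrete, here is a minimal sketch of what a provider endpoint could look like as a RESTXQ module. It is an illustration under assumptions, not a complete OAI-PMH implementation: the module namespace, the path /oai, the base URL and the database name 'records' are hypothetical, the database is assumed to hold complete oai:record elements, and the required metadataPrefix argument and resumption tokens are not handled.

```xquery
(: Hypothetical RESTXQ module; namespace, path and database name
   are placeholders. :)
module namespace page = 'http://example.org/oai-sketch';

declare namespace oai = 'http://www.openarchives.org/OAI/2.0/';

declare
  %rest:path('/oai')
  %rest:GET
  %rest:query-param('verb', '{$verb}')
function page:oai($verb as xs:string?) as element(oai:OAI-PMH) {
  <oai:OAI-PMH>
    <oai:responseDate>{
      (: OAI-PMH expects the response date in UTC :)
      format-dateTime(
        adjust-dateTime-to-timezone(current-dateTime(), xs:dayTimeDuration('PT0S')),
        '[Y0001]-[M01]-[D01]T[H01]:[m01]:[s01]Z')
    }</oai:responseDate>
    <oai:request>http://example.org/oai</oai:request>
    {
      if ($verb = 'ListRecords') then
        (: Assumes the database stores ready-made oai:record elements :)
        <oai:ListRecords>{ db:open('records')//oai:record }</oai:ListRecords>
      else
        <oai:error code="badVerb">Unsupported or missing verb</oai:error>
    }
  </oai:OAI-PMH>
};
```

A request such as /oai?verb=ListRecords would then return all stored records in a single response. A real provider would also have to implement the remaining verbs (Identify, ListMetadataFormats, ListSets, ListIdentifiers, GetRecord) and split large result sets with resumption tokens.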
[basex-talk] problem with basexhttp
BaseX was installed by administrators on one of our server machines. When starting the HTTP server, the following message is displayed: $ ./basexhttp /basex/bin/src/main/webapp/WEB-INF/web.xml not found. This is a bit strange, since the webapp folder is a sister folder to the bin folder. So why is it looking for webapp in /bin/src/main/ ? The webapp folder itself does appear to contain all the relevant folders and documents, and it is similar to a working setup on another machine. All the best, Lars G Johnsen National Library of Norway