from:"Lars Johnsen"

[basex-talk] INFO and db:system() show different values

2018-01-29 Thread Lars Johnsen

Hi all

After restarting an HTTP server, the script that accesses the databases no
longer finds them using db:open().

The database path reported by INFO on the basex command line says one thing,
while the output from db:system() says another:

db:system() .../basex-webapp/webapp/WEB-INF/data
INFO:   .../basex-webapp/data

The folder /webapp is where the scripts are. There is only one .basex file,
which resides under the folder /basex-webapp and contains the same path as
shown by INFO.

Any thoughts?

Best
Lars

Re: [basex-talk] Xquery recursion and db:add() - stack overflow

2016-05-12 Thread Lars Johnsen

Thanks for pointer!

Code is rewritten using hof:until() and tested against a particular set at
our national provider of library data.

The script still accumulates data, so it will probably still run into
memory troubles with larger datasets, but the stack-overflow should be
taken care of.

For anyone interested, the code is attached below, using hof:until() as
the higher-order function. To make it work, fill in URLs for a chosen
OAI endpoint, and maybe change some of the request parameters - this one
fetches marc21 records and uses sets. Some error checking could also be
added.

Cheers,
Lars


declare namespace oai = "http://www.openarchives.org/OAI/2.0/";

(:URL for  resumption tokens :)
declare variable $URL := "oai-URL?verb=ListRecords&amp;resumptionToken=";

(:URL for initial request:)
declare variable $URL2 :=
"oai-URL?verb=ListRecords&amp;metadataPrefix=marc21&amp;set=";

(: Variable for OAI-set - if not used, remove "set=" in URL2 :)
declare variable $oai-set := "aset";


(: basex http; a plain GET request is assumed here :)
declare variable $http-option := <http:request method='get'/>;


(: --

Fetch data from OAI-endpoint using a start map containing resumption token
and the first set of data.
The map has two keys, 'resume' and 'chunk', where 'chunk' is an accumulator
holding data from the current and previous requests.
hof:until() does not return an aggregated list of maps, so data must be
collected somehow

--:)

declare function local:getResumption($startmap) {
  let $token := map:get($startmap, 'resume')
  return
    if (empty($token)) then
      $startmap
    else
      let $http-request := http:send-request($http-option, $URL || $token)
      let $result :=
        if ($http-request instance of node()) then
          $http-request
        else
          (: wrapper element; the name "result" is assumed :)
          <result>{$http-request}</result>
      return map {
        'resume': $result//oai:resumptionToken/text(),
        'chunk': (
          map:get($startmap, 'chunk'),
          $result//oai:metadata
        )
      }
};


(: Issue initial request :)

let $first := http:send-request($http-option, $URL2 || $oai-set)

(:  Create startmap :)

let $init := map {
  'chunk': $first//oai:metadata,
  'resume': $first//oai:resumptionToken/text()
}

let $oai :=  hof:until(

  function($x) {
empty(map:get($x, 'resume'))
  },

  function($y) {
local:getResumption($y)
 },
 $init
)

(: Amend with additional code like db:add() or file:write() here :)

return element oai {map:get($oai, 'chunk')}
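
Since the script above keeps every chunk in memory, one possible variation is to write each page to disk inside the step function and carry only the resumption token and a counter in the map. The following is a minimal sketch under stated assumptions, not a tested harvester: the directory $dir, the file naming, the placeholder start token and the plain GET request element are all hypothetical, and the initial request without a resumption token is left out.

```xquery
declare namespace oai = "http://www.openarchives.org/OAI/2.0/";

(: Placeholders: fill in the endpoint URL and an existing output directory :)
declare variable $URL := "oai-URL?verb=ListRecords&amp;resumptionToken=";
declare variable $dir := "/tmp/oai/";

(: One harvesting step: fetch a page, write it to disk, return the next state.
   Only the token and a counter are kept, not the harvested data. :)
declare function local:step($state as map(*)) as map(*) {
  let $response := http:send-request(
    <http:request method='get'/>, $URL || $state('resume'))
  let $n := $state('count') + 1
  return (
    (: file:write() returns the empty sequence, so only the map is returned :)
    file:write($dir || 'chunk-' || $n || '.xml',
      element chunk { $response//oai:metadata }),
    map {
      'resume': $response//oai:resumptionToken/text(),
      'count': $n
    }
  )
};

hof:until(
  function($s) { empty($s('resume')) },
  local:step#1,
  (: 'first-token' is a placeholder for the token of the initial response :)
  map { 'resume': 'first-token', 'count': 0 }
)
```

The files could then be added to a database in a separate step, which keeps the memory footprint of the harvesting loop roughly constant.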


2016-05-12 15:07 GMT+02:00 Dirk Kirsten <d...@basex.org>:

> Hello Lars,
>
> just a thought (and really just a pointer; I am not a purely
> functional guy, and I feel like I am missing something obvious...):
> Maybe you could rewrite the recursive approach using higher order
> functions. Consider a query like the following
>
> hof:scan-left(1 to 100,
>   map { "token": "starttoken" },
>   function($result, $index) {
>     let $req := http:send-request(<http:request method="GET"/>,
>       "http://google.com?q=" || $result("token"))
>     return map {
>       "result": $req,
>       "token" : $req//http:header[@name = "Date"]/@value/data()
>     }
>   })
>
> It will issue 100 requests to google and use some specific token from the
> query before (in this case I used the date). This will output a sequence of
> the map entries and in a subsequent step you could return only the actual
> result values.
>
> Best regards, Dirk
>
> --
> Dirk Kirsten, BaseX GmbH, http://basexgmbh.de
> |-- Firmensitz: Blarerstrasse 56, 78462 Konstanz
> |-- Registergericht Freiburg, HRB: 708285, Geschäftsführer:
> |   Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle
> `-- Phone: 0049 7531 91 68 276, Fax: 0049 7531 20 05 22
>
>

Re: [basex-talk] Xquery recursion and db:add() - stack overflow

2016-05-12 Thread Lars Johnsen

Thanks Johan and Matti for useful suggestions.

Cutting down on the chunks seems to be a viable alternative.

It would have been nice, though, to have a robust harvester in XQuery that
could take on anything, although the recursive version works fine as long
as the dataset consists of a couple of thousand entries.

Best,
Lars

2016-05-12 8:16 GMT+02:00 Lassila, Matti <matti.j.lass...@jyu.fi>:

> Hello,
>
> If your case allows using external tools for harvesting, I can highly
> recommend metha (https://github.com/miku/metha) which is a fairly full
> featured command line OAI-PMH harvester.
>
> Best regards,
>
> Matti L.
>
> On 11/05/16 18:31 , "basex-talk-boun...@mailman.uni-konstanz.de on behalf
> of Johan Mörén" <basex-talk-boun...@mailman.uni-konstanz.de on behalf of
> johan.mo...@gmail.com> wrote:
>
> >Maybe there is some other way to get the data over. I'll have a talk with
> >the guys providing the OAI-endpoint.
>
>

Re: [basex-talk] Xquery recursion and db:add() - stack overflow

2016-05-11 Thread Lars Johnsen

The basexgui startup file now contains:

BASEX_JVM="-Xmx8g -Xss4m  $BASEX_JVM"

It helped the script a long way, but eventually it had to kneel. It works
fine though, on smaller datasets.

Maybe there is some other way to get the data over. I'll have a talk with
the guys providing the OAI-endpoint.

Thanks for the pointer to Xss!

Lars

2016-05-11 14:38 GMT+02:00 Dirk Kirsten <d...@basex.org>:

> Hello Lars,
>
> if you have a deep recursion Java will at some point hit its stack size
> limit. Have you already tried to simply increase the Java stack size, e.g.
> by passing the parameter -Xss2m to the JVM?
>
> Cheers
> Dirk

[basex-talk] Xquery recursion and db:add() - stack overflow

2016-05-11 Thread Lars Johnsen

The following code generates the error "Stack Overflow: try tail
recursion?"

The code reads in bibliographic data using OAI-PMH and updates a database
for each chunk of data. With OAI-PMH, only part of the data is available
for each request, so the server returns a resumption token if there are
more data available.

The xquery function making the queries is implemented recursively preceded
by a database update request (see the last two lines) for each call. Is it
db:add() that causes the stack overflow? The recursion cannot be placed
further towards the end!

declare %updating function local:getResumption($token) {
  if (empty($token)) then
    ()
  else
    let $http-request := http:send-request($http-option, $URL || $token)
    let $result :=
      if ($http-request instance of node()) then
        $http-request
      else
        (: wrapper element; the name "result" is assumed :)
        <result>{$http-request}</result>
    let $resume := $result//oai:resumptionToken/text()
    return (
      db:add($database, element chunk {$result//oai:metadata}, $path),
      local:getResumption($resume)
    )
};

Best,
Lars
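
A note on the question above: in the function as posted, the recursive call sits inside a sequence constructor together with db:add(), so it is not the last expression evaluated and is probably not treated as a tail call. The sketch below shows the accumulator form of the same loop, where the recursive call is the whole result of the else branch. It is only an illustration under assumptions: the $pages map simulates an endpoint so that the query runs without network access, and the names local:harvest, 'records' and 'next' are made up for the example.

```xquery
(: Simulated endpoint: each token maps to a page with records
   and an optional next token (empty on the last page) :)
declare variable $pages := map {
  'start': map { 'records': (1, 2), 'next': 't1' },
  't1':    map { 'records': (3, 4), 'next': 't2' },
  't2':    map { 'records': (5),    'next': () }
};

(: Tail-recursive harvest: everything collected so far is passed along in $acc,
   and the recursive call is the last thing the function does :)
declare function local:harvest(
  $token as xs:string?,
  $acc   as item()*
) as item()* {
  if (empty($token)) then
    $acc
  else
    let $page := $pages($token)
    return local:harvest($page('next'), ($acc, $page('records')))
};

(: returns 1 2 3 4 5 :)
local:harvest('start', ())
```

With real data, the collected result could be handed to a single db:add() call after the loop has finished, at the cost of holding all chunks in memory.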

[basex-talk] Symlinked files with RESTXQ

2016-04-07 Thread Lars Johnsen

Dear all

What must be done to access symlinked files when using BaseX as http
server?

The scenario is that a fairly large collection of audio files is accessed
through BaseX, which serves them over HTTP. However, it seems that
symlinking (the folder with audio files exists as a symlinked copy in the
static directory of the BaseX web folder) prevents the browser from
accessing them. Is there a parameter in the settings I have overlooked?


Best,
Lars G. Johnsen
National Library of Norway

Re: [basex-talk] ft:mark

2015-07-05 Thread Lars Johnsen

Thanks for XQuery code! I will try to integrate it with  our own efforts,
and let you know how it goes.

Lars

2015-07-05 15:19 GMT+02:00 Christian Grün <christian.gr...@gmail.com>:

> Hi Lars,
>
> I guess it was due to the internal XQFT representation of match
> positions that adjacent matches are marked one by one, and it's
> probably difficult to change this without too much additional effort.
>
> However, you can use XQuery for transforming your result; see e.g. the
> attached XQuery code. I must confess I didn't spend too much time on
> it (right now, it's simply too hot around here...), so I'm looking for
> revised versions ;)
>
> Christian



[basex-talk] ft:mark

2015-07-02 Thread Lars Johnsen

The full text function ft:mark() puts a mark around each of the words that
occur in a match, starting from the first matching word to the last,
including stop words, except for punctuation characters. Is it possible to
check for the kind of characters (or strings) that ft:mark() will skip when
marking matches? Or, would it be possible to ask ft:mark() to put one
marker for the whole match?

The case I am using it for is to get the sequence of matching words within
a match, and sometimes, for long strings, there may be several sequences. A
contiguous sequence of marked elements may be assumed to make up a match,
while non-contiguous ones do not. One marker around the match solves the
problem, as would detecting characters that are never marked.

Regards
Lars G Johnsen
National Library of Norway

Re: [basex-talk] Potential Bug - message from fulltext

2015-06-30 Thread Lars Johnsen

After adding a list of stop words - removed the top 100 - the full text
index works perfectly.

Thanks for suggestions!
Lars
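
For readers who want to try the same workaround, a stop word list can be supplied when the full-text index is rebuilt. This is a minimal sketch, assuming a BaseX version in which db:optimize() accepts an options map; the database name 'books' and the file path are hypothetical, and the file is assumed to contain the stop words separated by whitespace.

```xquery
(: Hypothetical names: database 'books' and the stop word file path.
   Rebuilds all index structures, including a full-text index that
   skips the words listed in the stop word file. :)
db:optimize('books', true(), map {
  'ftindex': true(),
  'stopwords': '/path/to/stopwords.txt'
})
```

Other index parameters such as language and stemming would be passed in the same map, in the same way they are set in the GUI dialog.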

2015-06-28 0:03 GMT+02:00 Christian Grün <christian.gr...@gmail.com>:

> Hi Lars,
>
> It looks as if the input data is indeed too large to be indexed (the
> internal id lists seem to exceed the maximum array size in main
> memory). The usual alternative to make it work is to distribute your
> document(s) into multiple databases.
>
> If you want, you can also provide us with the input data, but I assume
> it will take pretty much space?
>
> Best,
> Christian



[basex-talk] Potential Bug - message from fulltext

2015-06-27 Thread Lars Johnsen

When trying to build a full-text index on a collection of texts, the process
runs for a couple of hours and then exits with the message below - I think it
is nearly complete by then. From the GUI, I have at least seen the progress
bar get to around 80 %, so I think it is safe to assume that the error is
connected to the final stages.

The texts are unstructured and represented as one line per book. Here is
the result from the index process. Parameters set in the GUI are: Norwegian
Snowball, lemmatization, diacritics. 30 GB is set aside for the GUI.

Path summary:
doc(): 317259x, strings
  text: 317259x, leaf
text(): 317259x, strings, leaf

Here is the error message:

Improper use? Potential bug? Your feedback is welcome:
Contact: basex-talk@mailman.uni-konstanz.de
Version: BaseX 8.2 beta 7d38949
Java: Oracle Corporation, 1.7.0_79
OS: Linux, amd64
Stack Trace:
java.lang.NegativeArraySizeException
at java.util.Arrays.copyOf(Arrays.java:2271)
at org.basex.util.TokenBuilder.add(TokenBuilder.java:303)
at org.basex.util.TokenBuilder.add(TokenBuilder.java:290)
at org.basex.index.ft.FTBuilder.merge(FTBuilder.java:248)
at org.basex.index.ft.FTBuilder.write(FTBuilder.java:155)
at org.basex.index.ft.FTBuilder.index(FTBuilder.java:94)
at org.basex.index.ft.FTBuilder.build(FTBuilder.java:102)
at org.basex.index.ft.FTBuilder.build(FTBuilder.java:1)
at org.basex.data.DiskData.createIndex(DiskData.java:195)
at org.basex.core.cmd.ACreate.create(ACreate.java:117)
at org.basex.core.cmd.CreateIndex.run(CreateIndex.java:62)
at org.basex.core.Command.run(Command.java:398)
at org.basex.core.Command.execute(Command.java:100)
at org.basex.core.Command.execute(Command.java:123)
at org.basex.gui.dialog.DialogProgress$1.run(DialogProgress.java:178)

Regards
Lars G Johnsen
National Library of Norway

Re: [basex-talk] rest vs. restxq - strange difference

2015-05-20 Thread Lars Johnsen

Thanks for that. New version works nicely with full text indexing - and
very fast too!

Noticed that the text index seems to work differently between RESTXQ (not
utilized?) and REST - judging from the response time.

Thanks again for the efforts

Lars

2015-05-19 13:29 GMT+02:00 Christian Grün <christian.gr...@gmail.com>:

> > I'll check out how this can be fixed.
>
> So I checked out how to fix it, and I fixed it [1]. Feel free to try
> the latest snapshot [2]!
> Christian
>
> [1] https://github.com/BaseXdb/basex/issues/1144
> [2] http://files.basex.org/releases/latest
>
>
> On Mon, May 18, 2015 at 6:46 PM, Lars Johnsen <yoon...@gmail.com> wrote:
> > A last update, which may illuminate a little. After reindexing the
> > database using Norwegian (snowball), stemming, and keeping diacritics,
> > RESTXQ processes neither the special characters (treats them as closest
> > ascii), nor inflected forms.
> >
> > The words "mannen" (= the man, definite) and "spaserer" (= walks, present
> > tense) result in no output, while using the naked stems "mann" and
> > "spaser" the full result is displayed. In contrast to REST which behaves
> > as expected.
> >
> >
> > Cheers
> > Lars
 

Re: [basex-talk] rest vs. restxq - strange difference

2015-05-18 Thread Lars Johnsen

As an update, after rebuilding database with

text index,
full text index (no language, no stemming, keep diacritics)

restarting server:
BaseX 8.1.1 [Server]
Server was started (port: 29084)
[main] INFO org.eclipse.jetty.server.AbstractConnector - Started
SelectChannelConnector@0.0.0.0:8984
HTTP Server was started (port: 8984)

RESTXQ: Norwegian characters are converted when using the full-text index;
changing to the text index takes forever.
REST: Full-text works as expected, and the text index works as expected (same
as running in the GUI for both).

It looks as if the index structure is treated differently.



Re: [basex-talk] rest vs. restxq - strange difference

2015-05-18 Thread Lars Johnsen

The full text query is blisteringly fast for both, the text index query is
fast only for REST queries and seems not to be used with queries in RESTXQ.
I am rebuilding the whole database now to see how it goes, and will restart
everything for a new assessment.



2015-05-18 15:00 GMT+02:00 Christian Grün <christian.gr...@gmail.com>:

> > However, when using text index instead of full text the results are the
> > same for both, except that RESTXQ takes almost forever
>
> What about the original query: Has it been slow as well, or do you
> think this is a new problem?
>
>
> > 2015-05-18 14:28 GMT+02:00 Christian Grün <christian.gr...@gmail.com>:
> >
> >> It could be that your URL is decoded in a wrong way.. What happens if
> >> you run the following function with REST and RESTXQ and "føre" as
> >> word?
> >>
> >>   declare
> >>     %rest:path("/test/encoding/{$word}")
> >>   function page:test-encoding($word) {
> >>     string-to-codepoints($word)
> >>   };
> >>
> >> Thanks,
> >> Christian
 
 

Re: [basex-talk] rest vs. restxq - strange difference

2015-05-18 Thread Lars Johnsen

Tried to make a small example, but then things worked the same, so I
reindexed the database (no language and no stemming) and found this. It seems
that it has to do with character encoding.

RESTXQ finds hits for "føre" as "fore", while REST treats it as "føre", so
the outputs are like this:

REST output (2 first lines):
   føre
   fø - re 219

RESTXQ
   føre
   fo - re 123

The first word quoted is føre in both cases and is what the scripts see,
so the full text is given the same in both cases. Could it be that within
RESTXQ the full text index is treated differently?

I will work closer on a  self contained example, but thought this might
point to something.

Cheers
Lars


2015-05-18 13:44 GMT+02:00 Lars Johnsen <yoon...@gmail.com>:

> Hi Christian - and thanks for fast response. Latest version 8.1.1 is in use
> (same behaviour as previous). Let me see if I can make a self contained
> example.
>
> best,
> Lars
>
> 2015-05-18 13:40 GMT+02:00 Christian Grün <christian.gr...@gmail.com>:
>
>> Hi Lars,
>>
>> hm, that's difficult to tell. All I can say is that this sounds
>> unusual, so I'm coming up with my standard questions: Do you think you
>> could build us a little example that allows us to reproduce the
>> problem? Have you tried the latest version of BaseX?
>>
>> Best,
>> Christian
>>
>>
>> On Mon, May 18, 2015 at 1:35 PM, Lars Johnsen <yoon...@gmail.com> wrote:
>> >
>> > I am running a web script in two identical versions (identical as in
>> > cut and paste), one via RESTXQ and one via REST. The response is
>> > different, and I wondered what may be the trouble.
>> >
>> > For example the output (the URLs only work locally) for
>> > http://ljohnsen:8984/hyphens/mellom
>> > is the same as
>> > http://ljohnsen:8984/rest?run=hyphen-show.xq&word=mellom
>> >
>> > which is a set of hyphenation data:
>> > mellom
>> > mel - lom 17005
>> > Mel - lom 144
>> > mel - lom. 50
>> >
>> > but if "mellom" is exchanged with "nasjonalbiblioteket" only the REST
>> > version shows any result, which then is the same as I get experimenting
>> > in the GUI.
>> >
>> > The actual script is added below, and it runs in both versions
>> > (identical apart from the rest and restxq interfaces). It uses full text
>> > search, but results differ when run under the REST-regime.
>> >
>> > All the best
>> > Lars G Johnsen
>> > National Library of Norway
>> >
>> > module namespace page = 'http://basex.org/modules/web-page';
>> >
>> > declare
>> >   %rest:path("/hyphens/{$word}")
>> >   %output:method("html")
>> > function page:show-hyphens($word) {
>> >   let $db := db:open('hyphen-data')
>> >   let $hyphens :=
>> >     for $hyp in $db/hyphens/hyphens[full contains text {$word}]
>> >     group by $first := $hyp/first, $second := $hyp/second
>> >     let $count := count($hyp)
>> >     order by xs:int($count) descending
>> >     return element p {
>> >       attribute freq {$count},
>> >       $first, " - ", $second, $count
>> >     }
>> >   let $total := sum($hyphens//@freq)
>> >   let $div := element div {
>> >     element p {$word},
>> >     for $hyp in $hyphens
>> >     return element div {
>> >       attribute class {"hyph"},
>> >       attribute style {
>> >         "font-size:",
>> >         1 + round(xs:int($hyp//@freq/data()) div $total, 1) || "em"
>> >       },
>> >       $hyp
>> >     }
>> >   }
>> >   return
>> >     <html encoding="UTF-8">
>> >       <head>
>> >         <meta http-equiv="Content-Type"
>> >               content="text/html; charset=UTF-8"/>
>> >         <title>Orddelinger</title>
>> >       </head>
>> >       <body>{$div}</body>
>> >     </html>
>> > };

Re: [basex-talk] rest vs. restxq - strange difference

2015-05-18 Thread Lars Johnsen

The codepoints for "føre" are identical in both cases:

102 248 114 101

and the same as in the GUI.

However, when using the text index instead of full text, the results are the
same for both, except that RESTXQ takes almost forever (as if there were no
text index), while REST gives an immediate result. So it looks as if
RESTXQ accesses the index structure in a different way. Could that be so,
or is there something strange in my own configuration?
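
For reference, the codepoint check can also be run on a literal, which keeps
the question of URL decoding apart from that of index lookup. A minimal
sketch, assuming the query source is saved as UTF-8:

```xquery
let $word := 'føre'
return (
  string-to-codepoints($word),                          (: 102 248 114 101 :)
  string-to-codepoints('fore'),                         (: 102 111 114 101 :)
  codepoints-to-string((102, 248, 114, 101)) eq $word   (: true :)
)
```

Codepoint 248 is "ø" and 111 is "o", so a decoding problem in the URL would
show up as a different second number.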



2015-05-18 14:28 GMT+02:00 Christian Grün christian.gr...@gmail.com:

 It could be that your URL is decoded in a wrong way. What happens if
 you run the following function with REST and RESTXQ and "føre" as
 word?

   declare
     %rest:path("/test/encoding/{$word}")
   function page:test-encoding($word) {
     string-to-codepoints($word)
   };

 Thanks,
 Christian



Re: [basex-talk] rest vs. restxq - strange difference

2015-05-18 Thread Lars Johnsen

A last update, which may shed a little light. After reindexing the database
using Norwegian (Snowball), stemming, and keeping diacritics, RESTXQ
handles neither the special characters (it treats them as the closest ASCII
character) nor inflected forms.

The words "mannen" (= the man, definite) and "spaserer" (= walks, present
tense) result in no output, while with the bare stems "mann" and
"spaser" the full result is displayed. REST, in contrast, behaves as
expected.


Cheers
Lars

2015-05-18 15:28 GMT+02:00 Lars Johnsen yoon...@gmail.com:

 As an update, after rebuilding database with

 text index,
 full text index (no language, no stemming, keep diacritics)

 restarting server:
 BaseX 8.1.1 [Server]
 Server was started (port: 29084)
 [main] INFO org.eclipse.jetty.server.AbstractConnector - Started
 SelectChannelConnector@0.0.0.0:8984
 HTTP Server was started (port: 8984)

 RESTXQ: Norwegian characters are converted when using the full-text index;
 switching to the text index takes forever.
 REST: Full text works as expected, and the text index works as expected
 (same as running in the GUI for both).

 It looks as if the index structure is treated differently.


 2015-05-18 15:07 GMT+02:00 Lars Johnsen yoon...@gmail.com:

 The full text query is blisteringly fast for both, the text index query
 is fast only for REST queries and seems not to be used with queries in
 RESTXQ. I am rebuilding the whole database now to see how it goes, and will
 restart everything for a new assessment.



 2015-05-18 15:00 GMT+02:00 Christian Grün christian.gr...@gmail.com:

  However, when using text index instead of full text the results are
 the same
  for both, except that RESTXQ takes almost forever

 What about the original query: Has it been slow as well, or do you
 think this is a new problem?



Re: [basex-talk] rest vs. restxq - strange difference

2015-05-18 Thread Lars Johnsen

Hi Christian - and thanks for the fast response. The latest version, 8.1.1, is
in use (same behaviour as the previous one). Let me see if I can make a
self-contained example.

best,
Lars

2015-05-18 13:40 GMT+02:00 Christian Grün christian.gr...@gmail.com:

 Hi Lars,

 hm, that's difficult to tell. All I can say is that this sounds
 unusual, so I'm coming up with my standard questions: Do you think you
 could build us a little example that allows us to reproduce the
 problem? Have you tried the latest version of BaseX?

 Best,
 Christian


 On Mon, May 18, 2015 at 1:35 PM, Lars Johnsen yoon...@gmail.com wrote:
 
  I am running a web script in two identical versions (identical as in cut
  and paste), one via RESTXQ and one via REST. The response is different,
  and I wondered what may be the trouble.

  For example, the output (the URLs only work locally) for
  http://ljohnsen:8984/hyphens/mellom
  is the same as
  http://ljohnsen:8984/rest?run=hyphen-show.xq&word=mellom

  which is a set of hyphenation data:
  mellom
  mel - lom 17005
  Mel - lom 144
  mel - lom. 50

  but if "mellom" is exchanged with "nasjonalbiblioteket", only the REST
  version shows any result, which is then the same as I get when
  experimenting in the GUI.

  The actual script is added below. It runs in both versions
  (identical apart from the REST and RESTXQ interfaces) and uses full-text
  search, but results differ when run under the REST regime.
 
  All the best
  Lars G Johnsen
  National Library of Norway
 
  module namespace page = 'http://basex.org/modules/web-page';

  declare
    %rest:path("/hyphens/{$word}")
    %output:method("html")
  function page:show-hyphens($word) {
    let $db := db:open('hyphen-data')
    let $hyphens :=
      for $hyp in $db/hyphens/hyphens[full contains text {$word}]
      group by $first := $hyp/first, $second := $hyp/second
      let $count := count($hyp)
      order by xs:int($count) descending
      return element p {
        attribute freq {$count},
        $first, " - ", $second, $count
      }

    let $total := sum($hyphens//@freq)
    let $div := element div {
      element p {$word},
      for $hyp in $hyphens
      return element div {
        attribute class {"hyph"},
        attribute style {"font-size:", 1 + round(xs:int($hyp//@freq/data()) div $total, 1) || "em"},
        $hyp
      }
    }
    return
      <html encoding="UTF-8">
        <head>
          <meta http-equiv="Content-Type" content="text/html; charset=UTF-8"/>
          <title>Orddelinger</title>
        </head>
        <body>{$div}</body>
      </html>
  };

Re: [basex-talk] memory usage when writing files

2015-03-27 Thread Lars Johnsen

Hi Christian, and thanks a lot for the pointer to fetch:xml - it seems to
do the trick! Now, a little recoding, and it should be working.

Best,
Lars

2015-03-27 10:48 GMT+01:00 Christian Grün christian.gr...@gmail.com:

 Hi Lars,

 Here is some background information for the reported behavior (sorry
 in advance if this is known to you anyway): The functional semantics
 of XQuery requires that repeated calls to fn:doc and fn:collection
 return the same documents. This can e.g. be shown by the following
 query:

   doc('x.xml') is doc('x.xml')

 As it's difficult to guess in advance which of the opened documents
 will possibly be requested again in the same query, they are all kept
 in main-memory until query evaluation is completed.

 However, things are different with functions like fetch:xml [1]. You
 may need to tweak your query a little bit, because the function will
 always give you single XML documents.

 Does this help?
 Christian

 [1] http://docs.basex.org/wiki/Fetch_Module#fetch:xml
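
For readers following along, the loop from the original post could perhaps be
reworked along these lines. This is a sketch only: the two directory paths and
the local:to-html function are hypothetical stand-ins (the real conversion
function is not shown in the thread), and whether memory is actually released
between folders depends on the processor.

```xquery
(: Hypothetical locations; both end with a directory separator. :)
declare variable $digibooks := '/data/digibooks/';
declare variable $htmlfiles := '/data/html/';

(: Stand-in for the real conversion, which is not shown in the thread. :)
declare function local:to-html($docs as document-node()*) as element(html) {
  <html><body><p>{ count($docs) } page(s)</p></body></html>
};

for $folder in file:list($digibooks)
let $html := $htmlfiles || substring-before($folder, '_ocr') || '.html'
(: the code is rerun, so skip folders whose output already exists :)
where not(file:exists($html))
return file:write(
  $html,
  local:to-html(
    (: fetch:xml parses each file anew instead of going through
       collection(), whose documents are kept until the query ends :)
    for $name in file:list($digibooks || $folder, false(), '*.xml')
    return fetch:xml($digibooks || $folder || $name)
  )
)
```

The main change is that each folder is read file by file with fetch:xml, so
the conversion function receives a sequence of documents and not a
collection.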



[basex-talk] memory usage when writing files

2015-03-27 Thread Lars Johnsen

Hi all

Here is code that gradually eats up memory, whether run in the GUI or from
the command line. All it does is create temporary collections out of folders
and write them to file.

Is there a simple way to keep this code from eating up memory? It runs out of
memory (set at 12GB for the command line, 18GB in the GUI) after 300 folders
or so, and it has to process 20 000 of them.

Best
Lars G Johnsen
Norwegian National Library

Here is the actual code:

(: process list of folders :)

for $collections in file:list($digibooks)
let $html := $htmlfiles || substring-before($collections, "_ocr") || ".html"
return
  (: code is rerun, so check if files exist :)
  if (not(file:exists($html))) then
    try {
      (: create a temporary collection of the files and write result to disk :)
      file:write(
        $html,
        db:digibok-to-html(collection($digibooks || $collections))
      )
    } catch * {
      $err:code
    }
  else
    ()

Re: [basex-talk] Error?

2015-03-12 Thread Lars Johnsen

An attempt with a brand new family of colon-infected folders (not
databases) works nicely now with the new snapshot, while the same commands
in the old version still generated the FODC0002 error.

So both of these worked nicely in the new snapshot (the strings refer to files
and folders, not database paths):

collection("/home/larsj/terra/Synsegn/URN:NBN:no-nb_digitidsskrift_2014092382210_001")

as well as

doc("/home/larsj/terra/Synsegn/URN:NBN:no-nb_digitidsskrift_2014092382210_001/Alto_13.xml")


Cheers
Lars



2015-03-12 11:05 GMT+01:00 Christian Grün christian.gr...@gmail.com:

  Thanks, new snapshot downloaded,  but the errors persist in some way or
  another (not exactly the same).

 Does it still have to do with colons? If you find out more about it, feel
 free to tell me.

 Thanks,
 Christian

Re: [basex-talk] Error?

2015-03-12 Thread Lars Johnsen

Thanks, new snapshot downloaded,  but the errors persist in some way or
another (not exactly the same).

However, since you mentioned that the problems were caused by colons in
filenames, removing those colons did the trick. Now it all works as
expected. The colon-free-folders can be referenced with collection(), as
well as being ingested. All is well.

All clear!
Lars

2015-03-11 18:57 GMT+01:00 Christian Grün christian.gr...@gmail.com:

 Hi Lars,

 The issue should be fixed with the latest snapshot [1].

 See you again,
 Christian

 [1] http://files.basex.org/releases/latest




[basex-talk] Error?

2015-03-10 Thread Lars Johnsen

Hi all

I have a database collection which gives a strange error message

'[FODC0002]  (Line 1): Content is not allowed in prolog.'

on this query:

for $x in collection('samtiden')
 return doc(document-uri($x))

The results from document-uri($x) are as expected, but
doc(document-uri($x)) should return $x, so what can be the cause of the
error?

The collection was created in the GUI using "skip corrupt files", since some
of the files had errors (ended prematurely). But the database and
document-uri() work fine for all elements in the database.

I need to process the paths for grouping the documents in the collection,
so the practical value is a bit more than just trying to compute an identity
function!
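
If the goal is only to group documents by path, the round trip through doc()
may not be needed at all. The sketch below assumes that the BaseX function
db:path is available in the version in use; the element and attribute names
in the result are made up for illustration.

```xquery
(: Group the documents of the database by the folder part of their path. :)
for $doc in collection('samtiden')
let $path := db:path($doc)
(: strip the last path segment, i.e. the file name :)
group by $folder := replace($path, '/?[^/]+$', '')
order by $folder
return <folder path="{ $folder }" documents="{ count($doc) }"/>
```

This works on the nodes already returned by collection(), so no path string
has to be resolved a second time.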

Best,
Lars G Johnsen
National Library of Norway

[basex-talk] Invoking database update from RESTXQ

2015-03-03 Thread Lars Johnsen

Hello all

I was wondering how to perform database updating, like db:add, in RESTXQ,
since the adding and updating functions throw errors when invoked inside
functions defined in RESTXQ-modules.

Have been trying to use the forward mechanism, transferring the data to a
script without functions, with no success.

Best,
Lars G Johnsen
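
A likely cause, though the thread does not confirm it, is that the RESTXQ
function was not declared as updating. A minimal sketch of what an updating
RESTXQ function might look like; the path, the database name and the use of
PUT with an XML body are assumptions made for illustration.

```xquery
module namespace page = 'http://basex.org/modules/web-page';

(: Hypothetical path and database name. The %updating annotation marks
   the function as updating, which functions such as db:add require. :)
declare
  %updating
  %rest:path("/add/{$name}")
  %rest:PUT("{$doc}")
function page:add($name as xs:string, $doc as document-node()) {
  db:add('my-database', $doc, $name)
};
```

An updating function returns no result of its own, so any response to the
client has to be produced separately.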

Re: [basex-talk] RESTXQ and access blocking

2015-01-14 Thread Lars Johnsen

Thanks - it worked out nicely! Just commented out the servlet-section on
REST.

Cheers,
Lars

2015-01-14 15:57 GMT+01:00 Dirk Kirsten d...@basex.org:

 Hello Lars,

 You can disable the REST interface if you do not intend to use it (and you
 solely use RESTXQ). This can be done using your web server. In our default
 Jetty-based HTTP server you can find the servlet mapping in
 WEB-INF/web.xml, where you can simply disable the servlet mapping for REST.

 Of course you could also secure this path using your web service (e.g.
 requesting HTTP authentication when accessing REST).

 Cheers,
 Dirk

 --
 Dirk Kirsten, BaseX GmbH, http://basexgmbh.de
 |-- Firmensitz: Blarerstrasse 56, 78462 Konstanz
 |-- Registergericht Freiburg, HRB: 708285, Geschäftsführer:
 | Dr. Christian Grün, Dr. Alexander Holupirek, Michael Seiferle
 `-- Phone: 0049 7531 28 28 676, Fax: 0049 7531 20 05 22

[basex-talk] RESTXQ and access blocking

2015-01-14 Thread Lars Johnsen

Hi all

I was wondering how to block general access to BaseX when using RESTXQ. Our
JavaScript/jQuery web application communicates with BaseX using commands
like:

 $('#myobject').load('objects')

where the term 'objects' is defined as a path in an .xqm file:

 declare %rest:path("/objects")

However, databases are exposed via the URL /rest, which seems to be built into
the REST module. For example, in the JavaScript console (e.g. in
Chrome), a div could be filled with content from outside the application
by typing things like:

 $('div').load('rest/my_database')

and general queries could be made using the REST interface
(http://docs.basex.org/wiki/REST).

Is there a way to prevent this while still using BaseX as the
web server (one way is to use BaseX only as a backend database)? Or a way to
limit the URLs permitted?


Best
Lars

Re: [basex-talk] deduplication problem in BaseX 7.9?

2014-09-27 Thread Lars Johnsen

Hi Michael -

I got the same results with my 7.9 version. What is a bit surprising
(hopefully I am not introducing any noise into your problem) is that if the
last child step is cut off the paths and instead added to the path variables
in the return clause, the result becomes 4 4 4 4 4. So each $pathN without
the article-id step appears to return a single copy of the parent
article-meta:

let $path1 := $doc/child::article/child::front
  /child::article-meta
  /child::aff[contains(., "Japan")]
  /parent::article-meta,
$path2 := $doc/descendant::aff[contains(., "Japan")]
  /parent::article-meta,
$path3 := $doc/article/front/article-meta/aff[contains(., 'Japan')]
  /..,
$path4 := $doc//aff[contains(., 'Japan')]/..,
$path5 := $doc//aff/..

return (count($path1/child::article-id), count($path2/article-id),
count($path3/article-id), count($path4/article-id),
count($path5/article-id))

Best
Lars

2014-09-26 20:35 GMT+02:00 C. M. Sperberg-McQueen cms...@blackmesatech.com:

 Consider the following XML document:

 <article>
   <front>
     <article-meta>
       <aff id="aff1">Tropical and Infectious Disease Hospital, Kathmandu, Nepal</aff>
       <aff id="aff2">Nagasaki University, Nagasaki, Japan</aff>
       <aff id="aff3">Department of Radiology, Kyorin University Faculty of Medicine, Tokyo, Japan</aff>
       <aff id="aff5">Pentax Company Limited, Tokyo, Japan</aff>
       <aff id="aff6">National Research Laboratory of Molecular Complex Control, Yonsei University, Seoul, Korea</aff>
       <!--* ... *-->
       <article-id pub-id-type="pmc">2570825</article-id>
       <article-id pub-id-type="pmid">18325280</article-id>
       <article-id pub-id-type="publisher-id">07-0473</article-id>
       <article-id pub-id-type="doi">10.3201/eid1403.070473</article-id>
     </article-meta>
   </front>
   <!--* ... *-->
 </article>

 For convenience in trying to understand this problem, a copy of this
 document has been placed at [1].

 When I issue the following search against this document, I get
 unexpected results.

 let $doc := doc('http://blackmesatech.com/2014/LIS590DML/data/testdata.xml')

 let $path1 := $doc/child::article/child::front
   /child::article-meta
   /child::aff[contains(., "Japan")]
   /parent::article-meta/child::article-id,
 $path2 := $doc/descendant::aff[contains(., "Japan")]
   /parent::article-meta/child::article-id,
 $path3 := $doc/article/front/article-meta/aff[contains(., 'Japan')]
   /../article-id,
 $path4 := $doc//aff[contains(., 'Japan')]/../article-id,
 $path5 := $doc//aff/../article-id

 return (count($path1), count($path2), count($path3), count($path4),
 count($path5))

 What I expect is that path1, path2, path3, path4, and path5 should
 all return the same results, namely the set of four article-id elements
 in the document.  So the sequence of counts returned should be
 4 4 4 4 4.

 What I am finding is that path1 and path3 are returning 12 results,
 with each article-id present three times in the result (once, apparently,
 for every aff element containing the string 'Japan').  Paths 2, 4, and 5
 are all returning 4 results each, as I had expected them to.  So
 the sequence of counts actually returned is 12 4 12 4 4.

 In BaseX 7.6, for what it's worth, this query returns the sequence
 12 12 12 12 20, which seems suggestive.

 Interestingly, if I initialize the variable $doc with a direct element
 constructor, along the lines of

 let $doc := document { <article>...</article> }

 then all counts come out as expected in 7.6, but in 7.9 the result
 continues to be 12 4 12 4 4.

 Is this an error in the handling of the / operator, or am I missing some
 subtle point?

 Many thanks.

 [1] http://blackmesatech.com/2014/LIS590DML/data/testdata.xml
 --
 
 * C. M. Sperberg-McQueen, Black Mesa Technologies LLC
 * http://www.blackmesatech.com
 * http://cmsmcq.com/mib
 * http://balisage.net
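
As a point of comparison, the deduplication that the "/" operator is expected
to perform can be seen on a small inline document. This is a reduced sketch
with two aff and two article-id elements, not the test file from the
message; the expected counts follow from the XPath/XQuery rules, not from
a run on any particular BaseX version.

```xquery
let $doc := document {
  <article><front><article-meta>
    <aff>Nagasaki University, Nagasaki, Japan</aff>
    <aff>Pentax Company Limited, Tokyo, Japan</aff>
    <article-id>2570825</article-id>
    <article-id>18325280</article-id>
  </article-meta></front></article>
}
let $affs := $doc//aff[contains(., 'Japan')]
return (
  count($affs/../article-id),      (: 2: "/" removes duplicate nodes :)
  count($affs ! (../article-id))   (: 4: "!" keeps one result per aff :)
)
```

The second count shows the kind of multiplication reported above: one copy
of every article-id per matching aff element.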

Re: [basex-talk] RestXQ and file paths

2014-09-02 Thread Lars Johnsen

Hi Christian

Thanks - that solved it!

Best,
Lars




2014-09-02 17:48 GMT+02:00 Christian Grün christian.gr...@gmail.com:

 Hi Lars,

 the relative URL "static/logo.png" will be rewritten by your browser
 relative to the URL of the current page. If your URL is e.g.

 http://localhost:1234/show/xyz

 ...then the image URL will be rewritten to...

 http://localhost:1234/show/static/logo.png

 In other words, there is no way for BaseX, RESTXQ or the web server to
 correct this address resolution.

 The easiest solution to tackle this is to use absolute paths. The
 following URL...

 <img src="/static/logo.png"/>

 ...will be rewritten to...

 http://localhost:1234/static/logo.png

 Hope this helps,
 Christian
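
The resolution rule described above can be reproduced with the standard
resolve-uri function, which applies the same relative-reference rules as a
browser. A small sketch using the URLs from the message:

```xquery
let $page := 'http://localhost:1234/show/xyz'
return (
  (: relative path: resolved against the directory of the page :)
  resolve-uri('static/logo.png', $page),    (: http://localhost:1234/show/static/logo.png :)
  (: absolute path: resolved against the server root :)
  resolve-uri('/static/logo.png', $page)    (: http://localhost:1234/static/logo.png :)
)
```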

[basex-talk] Downloading files

2014-08-21 Thread Lars Johnsen

I am using BaseX RESTXQ for accessing a repository from a web browser.
Uploading files works smoothly, but I can't see how to make a download
button work.
For uploading, the recipe on the RESTXQ help page was enough to get it to
work. Is there a corresponding way to make downloading work?
What I have tried is to let BaseX send an HTML page containing:

 <form method="get" action="/download/{$file}">
   <button type="submit">Download</button>
 </form>

The form is processed by the following RESTXQ function:

declare
  %rest:path("/download/{$file}")
  %output:method("html")
function page:download-file($file) { ... }

Inside the curly braces, I have tried an <a href="..."> element and
file:read-binary, but neither with any success. BaseX complains about
the a element, and file:read-binary outputs directly to the browser.

Any suggestions?

Re: [basex-talk] Downloading files

2014-08-21 Thread Lars Johnsen

Thanks a lot Andy! It works like a charm. Now the filename can be
manipulated as well. Perfect!

Best,
Lars


2014-08-21 17:10 GMT+02:00 Andy Bunce bunce.a...@gmail.com:


 Hi Lars,
 You need to return a sequence of two items: (restxq:response, thedata)
 I do something like...

 declare
   %rest:path("/download/{$file}")
 function page:download-file($file) {
   (download-response("raw", $file), file:read-binary(..))
 };

 (:~ headers for download :)
 declare function download-response($method, $filename) {
   <restxq:response>
     <output:serialization-parameters>
       <output:method value="{$method}"/>
     </output:serialization-parameters>
     <http:response>
       <http:header name="Content-Disposition"
                    value='attachment;filename={$filename}'/>
     </http:response>
   </restxq:response>
 };

 /Andy


 On 21 August 2014 15:40, Lars Johnsen yoon...@gmail.com wrote:

 I came a little closer by making custom HTTP headers, but I have to
 confess I'm in deep water here:

 declare
   %rest:path("/download/{$file}")
 function page:download-file($file) {
   <rest:response>
     <http:response status="200" message="OK">
       <http:header name="Content-Disposition" value="Attachment"/>
       <http:header name="filename" value="{$file}"/>
     </http:response>
   </rest:response>
 };

 This function does trigger a download of a file with the appropriate file
 name (= $file) containing the text "OK". If I could just find somewhere in
 this code to put the contents of the file, it should solve the problem.

 Best,
 Lars



[basex-talk] Basex 404

2014-06-16 Thread Lars Johnsen

I've recently run into a problem with basexserver/basexhttp. The host:8984
responds but seems to have lost contact with the webapp directory. Here is
what is happening on 3 out of 5 installations (on Linux, installed by
unpacking the latest ZIP archive), one of which used to work as expected:

1) host:8984 does not display the RESTXQ example, but is up and running
2) host:8984/rest shows the list of databases
3) host:8984/rest/run=script-name.xq gives "404 - no function matches
request"

So the server appears to be connected to the database, but won't look at
the webapp directory. Where is the best place to look for information?

All the best,
Lars

[basex-talk] Traling empty nodes in csv:serialize

2014-03-22 Thread Lars Johnsen

Hi all

I have a question about csv:serialize and blank nodes. Trailing blanks are
not indicated with a separator, but those in the middle are.

 csv:serialize(
 <rows>
   <row><f>1</f> <s>2</s> <t>3</t></row>
   <row><f></f>  <s></s>  <t>3</t></row>
   <row><f>1</f> <s></s>  <t>3</t></row>
   <row><f>1</f> <s></s>  <t></t></row>
   <row><f>1</f> <s>2</s> <t></t></row>
 </rows>)

Here is the output (BaseX 7.8.1), where lines 4 and 5 are missing trailing
separators:

1,2,3
,,3
1,,3
1
1,2

Could this be fixed, or is it a feature?
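
Until the behaviour is clarified, one possible workaround is to build the
lines by hand with string-join, which keeps a separator for every field,
empty or not. This is a sketch for simple values only; it does no quoting or
escaping of separators inside fields, which a real CSV writer would need.

```xquery
let $rows :=
  <rows>
    <row><f>1</f><s>2</s><t>3</t></row>
    <row><f/><s/><t>3</t></row>
    <row><f>1</f><s/><t>3</t></row>
    <row><f>1</f><s/><t/></row>
    <row><f>1</f><s>2</s><t/></row>
  </rows>
for $row in $rows/row
(: one field per child element; empty elements become empty strings :)
return string-join($row/* ! string(.), ',')
```

This yields the five lines "1,2,3", ",,3", "1,,3", "1,," and "1,2,", with
the trailing separators in place.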

All the best
Lars G Johnsen
National Library of Norway
___
BaseX-Talk mailing list
BaseX-Talk@mailman.uni-konstanz.de
https://mailman.uni-konstanz.de/mailman/listinfo/basex-talk

Re: [basex-talk] Revisiting XMLPrague

2014-02-18 Thread Lars Johnsen

It was great meeting so many of the BaseX developers.
I especially appreciated all the help with the distinction between
Semmelknödel vs. Serviettenknödel!

Thanks again for a great time in Prague

Lars
National Library of Norway


2014-02-17 18:29 GMT+01:00 Christian Grün christian.gr...@gmail.com:

 Dear subscribers,

 it was great to meet some of you live in Prague! And once again I
 would like to thank all our speakers (Lars Johnsen, Gerrit Imsieke,
 Yoann Maingon, Jean-Marc Mercier and Michael Sperberg-McQueen) for
 making the user meeting a memorable event!

 Have fun with BaseX 7.8, and we are looking forward to your feedback,
 Christian

 PS: Maybe we'll see each other in London or Amsterdam, or at the
 Balisage? We'll tell you in time if we will be participating as well.

[basex-talk] Parameter to xquery

2014-01-30 Thread Lars Johnsen

Is it possible to pass parameters to XQuery modules using the BaseX scripting
language? I tried something similar to the HTTP specification but had no
luck.

For example, if the code below is stored as add.xq, could $x and $y be
supplied in a RUN add.xq command?

declare variable $x external;
declare variable $y external;
let $res := $x + $y
return $res


All the best
Lars G Johnsen
National Library of Norway

[basex-talk] html and css

2014-01-30 Thread Lars Johnsen

When running the BaseX HTTP server, I wondered where resources like JavaScript
and CSS should be located. The .xqm file for the welcome RESTXQ module
seems to fetch its SVG and CSS from a static directory under webapp.
However, when sending HTML code from within a home-made query, it won't
connect to any CSS, neither within webapp nor webapp/static, nor even with
an absolute path.

Best
Lars G Johnsen
National Library of Norway

[basex-talk] csv-parser

2013-12-04 Thread Lars Johnsen

Hi all

When importing CSV files, it seems that BaseX is parsing fields for
balanced bracketing and quotes (a single quote or bracket causes trouble).
Is it possible to turn that off, so that files are processed based only on
delimiter? I couldn't find any information in the documentation.

All the best,
Lars

Re: [basex-talk] csv-parser

2013-12-04 Thread Lars Johnsen

Fabulous, that fixed it!

The lines with a single quote in them are fine, as well as those with a
single left parenthesis (. The data come from a SQL database with
human-entered data in free-text fields, so things aren't always in balance.

The new file browser with the editor is nice too!

Best,
Lars


2013/12/4 Christian Grün christian.gr...@gmail.com

 Hi Lars,

  When importing CSV files, it seems that BaseX is parsing fields for
  balanced bracketing and quotes (a single quote or bracket causes trouble).
  Is it possible to turn that off, so that files are processed based only on
  delimiter? I couldn't find any information in the documentation.

 I’ve just added such an option to BaseX. Could you please give us some
 feedback if it does what you expect?

 Thanks,
 Christian

 [1] http://docs.basex.org/wiki/CSV_Module#Options
 [2] http://files.basex.org/releases/latest/


[basex-talk] Results from ft:extract()

2013-12-02 Thread Lars Johnsen

Hello all

When using ft:extract() on nodes, it often seems to clip into the match
itself. Is it possible to have ft:extract() leave as much text before the
match as after?

For example, here are two results for "spise lunsj" (Norwegian for "eat
lunch"). The first is as it should be, while the second has half of the
matched string clipped. The results are obtained first as a set of hits
using full-text search [text() contains text {$terms} all], then each hit is
processed through ft:extract($hit, $terms):

... gjerne komme for å spise lunsj med meg på Harrods. Da skal jeg servere
hjortetestikler fra eiendommen min i Skottland. Vi trenger nemlig alle
store...

... de franske VM-spillerne hjem til Paris. Der ble de mottatt av fans både
på Charles de Gaulle-flyplassen og da de ankom Elysee-palasset for å spise
...
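
The behaviour asked for here, equal context on both sides of the match, can be sketched independently of BaseX. The Python function below is a hypothetical illustration (the name centered_snippet and the width parameter are made up); it says nothing about how ft:extract() is implemented.

```python
import re


def centered_snippet(text, term, width=40):
    """Return the first match of `term` with up to `width` characters on each side."""
    match = re.search(re.escape(term), text, flags=re.IGNORECASE)
    if match is None:
        return None
    start = max(0, match.start() - width)
    end = min(len(text), match.end() + width)
    # Mark the places where the surrounding text was cut off.
    prefix = "..." if start > 0 else ""
    suffix = "..." if end < len(text) else ""
    return prefix + text[start:end] + suffix


sample = "da de ankom Elysee-palasset for å spise lunsj med presidenten"
print(centered_snippet(sample, "spise lunsj", width=15))
```

Because the window is computed from the start and end of the match, the matched string is always kept whole; only the context is shortened when the match sits near the beginning or end of the text.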



Regards,
Lars G Johnsen
National Library of Norway

[basex-talk] HTML serialization error

2013-11-27 Thread Lars Johnsen

When converting files from xml to html, there appeared a serialization
error saying something to the effect that x84 was an illegal html
character. The files were written using file:write with parameter $params
defined as:

 let $params := <output:serialization-parameters>
                  <output:method value="html"/>
                </output:serialization-parameters>

When sending the HTML directly to the browser (not writing to file and
using the above declaration), the browser (Chrome) appeared to be OK, and
displayed the full HTML.

Processing the text through normalize-unicode() didn't help. The error
persisted. Is there a way to fix the text before submitting it to
file:write()?
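
One way to clean such text before serialization can be sketched as follows, assuming the error is triggered by code points in the range U+007F to U+009F and that the stray characters are Windows-1252 bytes that were read as Latin-1 during OCR. Both points are assumptions, and the Python function repair_controls is a made-up illustration rather than a BaseX feature. Unicode normalization leaves these code points unchanged, which would be consistent with normalize-unicode() not helping.

```python
import re

# DEL and the C1 control range (U+007F to U+009F).
_CONTROL = re.compile(r"[\x7f-\x9f]")


def _repair(match):
    ch = match.group(0)
    try:
        # Assumption: the code point is really a Windows-1252 byte.
        fixed = ch.encode("latin-1").decode("cp1252")
    except UnicodeDecodeError:
        return ""  # byte has no meaning in Windows-1252: drop it
    # If nothing changed, the character is still a control character: drop it.
    return "" if fixed == ch else fixed


def repair_controls(text):
    """Replace control characters with their Windows-1252 reading, or remove them."""
    return _CONTROL.sub(_repair, text)


print(repair_controls("a\x84b"))
```

Under these assumptions x84 becomes the low double quotation mark (U+201E), while code points with no Windows-1252 meaning are removed.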



All the best
Lars G Johnsen
National Library of Norway

Re: [basex-talk] HTML serialization error

2013-11-27 Thread Lars Johnsen

Hi Christian and thanks for the quick solution!


 Do you have some idea how the x84 byte was stored into the database?


Each file is a digitized book, where the bytes stem from the OCR process.
The words themselves are stored as values of attributes, for example
<STRING CONTENT="word"/>.

All the best
Lars

[basex-talk] BaseX rest and PHP

2013-11-12 Thread Lars Johnsen

Hi all

This is a question for those with experience using PHP with BaseX and the
REST interface.

There is a peculiar case of lag for queries, and I wondered if it had to do
with how PHP does things, or if it is how BaseX does things (or if there is
something else). This may be a long shot, but I'll still give it a try.

The setup is fairly straightforward, with queries sent from an HTML page
via a PHP script which places the call:

  file_get_contents(basex-rest-URL).

The result is then sent back to the requesting HTML page. The problem is
that this sometimes hangs: after two or three queries it stops or takes an
awful lot of time.

The peculiar thing is that the basex-rest-URL, when typed directly into
the browser's address line, receives an immediate response, even while it is
still being processed by the HTML-PHP-BaseX cycle. Sometimes the cycle
gives an answer, sometimes not. The cycle does work when file_get_contents is
directed towards an SQLite database, so it seems that there is something
going on in the communication between PHP and BaseX.

Any ideas?

Regards
Lars G Johnsen
National Library of Norway

[basex-talk] OAI-PMH

2013-10-17 Thread Lars Johnsen

Hello all

I am looking into the possibility of using BaseX as an OAI-PMH metadata
provider and harvester, and wondered if anyone has experience with it for
this purpose. Specifically, I mean using BaseX as a repository, with the
HTTP service and XQuery scripts for accessing and providing metadata
records.

Presumably, there aren't any limitations on the database side, and since
the OAI-PMH protocol is all XML (http://www.openarchives.org/pmh/) it seems
like a good idea to try and make it work. So if people on this list have
any experience, I would like to hear from you.
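
As a rough illustration of the harvester side, the request URLs that OAI-PMH defines for the ListRecords verb can be assembled as in the Python sketch below. The base URL https://example.org/oai, the function name list_records_url, and the marc21 and aset values are placeholders for the example, not a real endpoint.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix=None, oai_set=None, resumption_token=None):
    """Build an OAI-PMH ListRecords request URL."""
    if resumption_token is not None:
        # A resumption token is an exclusive argument: it replaces all others.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        if metadata_prefix is None:
            raise ValueError("metadata_prefix is required for the initial request")
        params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
        if oai_set is not None:
            params["set"] = oai_set
    return base_url + "?" + urlencode(params)


# Hypothetical endpoint, used only to show the shape of the two request types.
print(list_records_url("https://example.org/oai", metadata_prefix="marc21", oai_set="aset"))
print(list_records_url("https://example.org/oai", resumption_token="token123"))
```

A harvester would issue the first kind of request once and then repeat the second kind for as long as the response contains a non-empty resumptionToken element.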

thanks,

Lars G Johnsen
National Library of Norway

[basex-talk] problem with basexhttp

2013-10-14 Thread Lars Johnsen

BaseX is installed by administrators on one of our server machines.
When starting the HTTP server, the following message is displayed:

$ ./basexhttp
/basex/bin/src/main/webapp/WEB-INF/web.xml not found.

which is a bit strange, since the webapp folder is a sister folder to the
bin folder. So why is it looking for webapp in /bin/src/main/ ?

The webapp folder itself does appear to contain all the relevant folders
and documents, as it is similar to a working setup on another machine.

All the best
Lars G Johnsen
National Library of Norway

