[basex-talk] Corrupt database after update

2020-07-16 Thread Johannes Bauer

Hi,

we have a rather strange and hard-to-track problem with corrupted databases.

Our setup is:

 * Docker container with a Tomcat that hosts BaseX with some custom
   RESTXQ services
 * BaseX 9.2.4
 * Java 14
 * Docker runs on a Linux VM

Workflow:

 * Create database with RESTXQ service call
 * Import JSON document with RESTXQ service call
    o Call this multiple times.

After several import calls, an import fails, and from this point on the
database is corrupt.


We first thought that it had something to do with the content of the
document, but we found no pattern: sometimes it works, sometimes it
does not.
There is no concurrency involved. There are no other clients that read 
or write to the database.
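Because the imports run strictly one after another, the failing sequence could be replayed from a small standalone client, which may help narrow down whether the host OS matters. A minimal sketch, assuming Java 11+ `java.net.http`: the base URL `http://localhost:8080/c42-core/api/v1/restxq` is a hypothetical host and port combined with the path from the log below, `ImportRepro` is a hypothetical name, and authentication is omitted.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.nio.charset.StandardCharsets;

// Hypothetical reproduction harness: replays sequential PUT imports against
// the RESTXQ endpoint seen in the log and stops at the first error response.
class ImportRepro {
    // Assumed base URL; adjust to wherever Tomcat exposes the RESTXQ services.
    static final String BASE = "http://localhost:8080/c42-core/api/v1/restxq";

    // Builds one PUT request for /user/documents/{databaseId}/{documentId}.
    static HttpRequest buildPut(String databaseId, String encodedDocumentId, String xml) {
        return HttpRequest.newBuilder()
            .uri(URI.create(BASE + "/user/documents/" + databaseId + "/" + encodedDocumentId))
            .header("Content-Type", "application/xml") // matches %rest:consumes
            .PUT(HttpRequest.BodyPublishers.ofString(xml, StandardCharsets.UTF_8))
            .build();
    }

    // Sends the same import repeatedly, one request at a time (no concurrency),
    // and returns the iteration whose response was not 2xx, or -1 if none failed.
    static int replay(HttpClient client, String databaseId, String encodedDocumentId,
                      String xml, int iterations) throws Exception {
        for (int i = 0; i < iterations; i++) {
            HttpResponse<String> response = client.send(
                buildPut(databaseId, encodedDocumentId, xml),
                HttpResponse.BodyHandlers.ofString());
            if (response.statusCode() >= 300) {
                return i;
            }
        }
        return -1;
    }
}
```

A caller would pass `HttpClient.newHttpClient()` and the document to import; a return value other than -1 marks the first failing import.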


We also tried to deactivate the UPDINDEX option, but it had no effect:
we could reproduce the error both with and without automatic index
updates.


The logs in case of errors look like this:

    06:58:00.683  172.18.0.2:33728  admin  REQUEST  [PUT] /c42-core/api/v1/restxq/user/documents/c42-index/metadata%40document
    06:58:00.700  172.18.0.2:33728  admin  500  Unexpected error:
    Improper use? Potential bug? Your feedback is welcome:
    Contact: basex-talk@mailman.uni-konstanz.de
    Version: BaseX 9.2.4
    Java: Oracle Corporation, 14.0.1
    OS: Linux, amd64
    Stack Trace:
    java.lang.ArrayIndexOutOfBoundsException: Index 4 out of bounds for length 1
      at org.basex.io.random.TableDiskAccess.fpre(TableDiskAccess.java:507)
      at org.basex.io.random.TableDiskAccess.cursor(TableDiskAccess.java:467)
      at org.basex.io.random.TableDiskAccess.read1(TableDiskAccess.java:156)
      at org.basex.data.Data.kind(Data.java:304)
      at org.basex.query.up.DataUpdates.prepare(DataUpdates.java:133)
      at org.basex.query.up.ContextModifier.prepare(ContextModifier.java:90)
      at org.basex.query.up.Updates.prepare(Updates.java:168)
      at org.basex.query.QueryContext.update(QueryContext.java:678)
      at org.basex.query.QueryContext.iter(QueryContext.java:332)
      at org.basex.http.restxq.RestXqResponse.serialize(RestXqResponse.java:73)
      at org.basex.http.web.WebResponse.create(WebResponse.java:63)
      at org.basex.http...  16.59 ms

Our service does not do much. It just calls db:replace():

    declare variable $documents:IMPORT_OPTS := map {
      'chop': fn:false(),
      'stripns': fn:false(),
      'intparse': fn:true()
    };

    declare
      %rest:PUT("{$xml}")
      %rest:consumes("application/xml")
      %rest:produces("application/json")
      %rest:path("/user/documents/{$databaseId}/{$documentId}")
      %updating
    function documents:create($databaseId as xs:string, $documentId as xs:string,
        $xml as document-node()) {
      if (db:exists($databaseId)) then (
        update:output(response:empty(204, ())),
        db:replace($databaseId, documents:decode($documentId), $xml,
          $documents:IMPORT_OPTS)
      ) else (
        update:output(response:json(errors:error('C42UDO002',
          map { 'databaseId': $databaseId }), 404)),
        ()
      )
    };

I've attached the example input document.

One addition: we could not reproduce this error when running the Docker
container on a Windows host.

Any feedback or hints to solve this are greatly appreciated.

Best regards
Johannes


[
  {
    "databaseid": "c42-content",
    "documentid": "doc_26424521995_de-DE",
    "metadata": [
      {
        "id": "document",
        "name": "Document ID",
        "values": [
          "doc_26424521995_de-DE"
        ]
      },
      {
        "id": "media",
        "name": "Media Document ID",
        "values": [
          "media_26424521995_de-DE"
        ]
      },
      {
        "id": "projectId",
        "name": "Project ID",
        "values": [
          "26424521995"
        ]
      },
      {
        "id": "lang",
        "name": "Language",
        "values": [
          "de-DE"
        ]
      },
      {
        "id": "sysTitle",
        "name": "System Title",
        "values": [
          "Kompaktleistungsschalter 3VA mit IEC-Zertifikat"
        ]
      },
      {
        "id": "type",
        "name": "Type",
        "values": [
          "Gerätehandbuch"
        ]
      },
      {
        "id": "system",
        "name": "System",
        "values": [
          "SENTRON"
        ]
      },
      {
        "id": "productGroup",
        "name": "Product Group",
        "values": [
          "Schutzgeräte"
        ]
      },
      {
        "id": "importDate",
        "name": "Import Date",
        "values": [
          "2020-07-16T06:10:50.405Z"
        ]
      }
    ]
  },
  {
    "databaseid": "c42-content",
    "documentid": "doc_26424521995_en-US",
    "metadata": [
      {
        "id": "document",
        "name": "Document ID",
        "values": [
          "doc_26424521995_en-US"
        ]
      },
      {
        "id": "media",
        "name": "Media Document ID",
        "values": [
          "media_26424521995_en-US"
        ]
      },
      {
        "id": "projectId",
        "name": "Project ID",
        "values": [
          "26424521995"
        ]
      },
      {
        "id": "lang",
        "name": "Language",
        "values": [
          "en-US"
        ]
      },
      {

Re: [basex-talk] Database file path

2020-07-16 Thread Christian Grün
Hi Vladimir,

The DBPATH option is the one you’ll need to assign. As it’s a global
option, it should be assigned at startup time [1].
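Since DBPATH is a global option, one workaround for disposable databases is to give each job its own short-lived BaseX process with its own DBPATH, and to delete that directory once the job is done. A minimal Java sketch of the directory handling, assuming BaseX reads global options from Java system properties prefixed with `org.basex.` (worth double-checking on the Options page); `DisposableDbPath` is a hypothetical name, and launching BaseX itself is left out.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Comparator;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper: one throwaway database directory per pipeline job.
class DisposableDbPath implements AutoCloseable {
    final Path dir;

    DisposableDbPath() {
        try {
            // Fresh, empty directory under the system temp folder.
            dir = Files.createTempDirectory("basex-job-");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // JVM argument for a BaseX process started for this job
    // (assumes the org.basex.<OPTION> system property convention).
    String jvmArg() {
        return "-Dorg.basex.DBPATH=" + dir.toAbsolutePath();
    }

    // Deletes the directory and everything written into it, deepest paths first.
    @Override
    public void close() {
        try (Stream<Path> paths = Files.walk(dir)) {
            List<Path> deepestFirst = paths.sorted(Comparator.reverseOrder())
                                           .collect(Collectors.toList());
            for (Path p : deepestFirst) {
                Files.delete(p);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Each job would put `jvmArg()` on the command line of the BaseX process it starts and call `close()` after that process has exited, so no other instance ever sees the path change.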

Best,
Christian

[1] https://docs.basex.org/wiki/Options



On Thu, Jul 16, 2020 at 6:11 AM Vladimir Churyukin  wrote:
>
> Hello,
>
> We have a data transformation pipeline that works with XML files of
> different sizes, sometimes big (up to several gigabytes).
> We are using BaseX to do the transformations.
> For smaller files we use the MAINMEM option, because the whole database
> can fit in memory. For some files we can't do that; however, all databases
> we create are disposable, and we'd like to control where we put them and
> destroy them after processing.
> Is there a special option or any other way to specify where a particular
> database's files will reside?
> How do we specify that when we call the CreateDB command?
>
> thank you,
> Vladimir


[basex-talk] Transaction Support

2020-07-16 Thread Reto Peter
I am evaluating BaseX for my XML project.

I need transaction support like:

- Start transaction
- Run queries (read, write, update)
- Commit or roll back transaction

The documentation lists a Transaction Manager, but when I look at the
details, I cannot find anything like that.

Can anyone explain how transactions are supported, or is there an add-on
or something planned?

Best regards
Reto, Frauenfeld, Schweiz


Re: [basex-talk] Database file path

2020-07-16 Thread Christian Grün
Right, the option is global. As BaseX has been designed to serve concurrent
requests, it would introduce unexpected side effects if the path was
changed at runtime.

If you are careful, you can try to change the path by assigning a new value
to Context.soptions.




Re: [basex-talk] Transaction Support

2020-07-16 Thread Marco Lettere

Hi Reto,
AFAIK BaseX is transactional in the sense that whenever you start a
sequence of commands or an XQuery script, all the "updating operations"
that modify the database are stored in a PUL (pending update list).
Only when the script terminates are all operations on the database
effectively committed.

There is no explicit rollback operation.
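The pending-update behaviour can be pictured with a toy model: while a query runs, updates are only recorded; they are applied together when the query completes, and if evaluation fails first, none of them are applied. In XQuery terms, raising an error (e.g. with fn:error()) before the query ends therefore acts like an implicit rollback. The Java below is a conceptual sketch with hypothetical names, not BaseX's implementation.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.function.Consumer;

// Toy model of a pending update list (PUL); not BaseX code.
class ToyPul {
    // The "database": document path -> content.
    final Map<String, String> db = new TreeMap<>();

    // Runs a "query" that may only queue updates. The queue is applied after
    // the query returns normally; an exception skips it (implicit rollback).
    void run(Consumer<List<Runnable>> query) {
        List<Runnable> pending = new ArrayList<>();
        query.accept(pending);             // evaluation: updates are only collected
        for (Runnable update : pending) {  // end of query: apply all of them
            update.run();
        }
    }
}
```

For example, `pul.run(p -> p.add(() -> pul.db.put("a.xml", "<a/>")))` stores the document, whereas a query that queues an update and then throws leaves `db` unchanged.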
Regards,
Marco.






Re: [basex-talk] Database file path

2020-07-16 Thread Vladimir Churyukin
Ah no, I'm not talking about changing it at runtime, I'm talking about
specifying the path on database creation, for example when the CreateDB
command is executed. There shouldn't be concurrency problems at the
moment of database creation, correct?

-Vladimir



Re: [basex-talk] Transaction Support

2020-07-16 Thread Reto Peter
Hi Marco
Thanks for answering
But that means BaseX does NOT support database transactions?
Or is it possible to implement real transactions (START TRANS, do something,
COMMIT TRANS) with that PUL or something similar?
Reto





Re: [basex-talk] Database file path

2020-07-16 Thread Vladimir Churyukin
Yes, I've seen that option.
But there is no way to set it per database, correct?
I'm asking because our operations are ad hoc by nature: we don't really
"start up" the instances; we create a database, process the data, then
destroy the database.
Is there some internal limitation why this option needs to be global?

thank you,
-Vladimir



[basex-talk] Possible bug in database size (bytes) after database optimize from BaseX Database Administration

2020-07-16 Thread Geoff Alexander


I've found that when I optimize an unoptimized database, the record count
decreases as expected, but the database size (BYTES column) stays the same.
Is this a bug in reporting the database size (less severe problem) or a bug
in the database not shrinking on optimize (more severe problem)? I've
encountered this with both BaseX 8.6.7 and 9.3.3 running on Windows 10.

Thanks,
Geoff Alexander, Ph.D.
Software Engineer, Corporate Tools Development
IBM Corporation
Charlotte, NC


Re: [basex-talk] Possible bug in database size (bytes) after database optimize from BaseX Database Administration

2020-07-16 Thread Geoff Alexander
Here are steps to recreate the problem:

(1) Add one or more entries to a new (empty) database. On the BaseX
Database Administration's Database page, you'll find that the database's
COUNT column shows the number of entries added and that the database's
BYTES column shows the database size.

(2) Update one or more of the database's entries. After refreshing the BaseX
Database Administration's Database page, you should find that the
database's COUNT and BYTES columns have both increased.

(3) On the BaseX Database Administration's Database page, select the
database and press the Optimize button. You should find that the
database's COUNT column decreases back to the number of entries in the
database. However, the database's BYTES column doesn't decrease to reflect
a reduction in the database size.

Maybe I have a misunderstanding of what the Optimize button on the BaseX
Database Administration's Database page actually does.

Geoff Alexander, Ph.D.
Software Engineer, Corporate Tools Development
IBM Corporation
Charlotte, NC







Re: [basex-talk] Possible bug in database size (bytes) after database optimize from BaseX Database Administration

2020-07-16 Thread Christian Grün
Hi Geoff,

Did you run OPTIMIZE ALL or db:optimize(..., true()) ? What do you mean by
"record count"?

Best,
Christian





Re: [basex-talk] Possible bug in database size (bytes) after database optimize from BaseX Database Administration

2020-07-16 Thread Geoff Alexander

On the BaseX Database Administration's Database page at
https://localhost:10443/BaseX/dba/databases, I selected a database I knew
was unoptimized and pressed the Optimize button. The database's COUNT
column decreased to the number of entries in the database as expected.
However, the database's BYTES column did not change, even after I logged
off and back on to BaseX Database Administration.

Geoff Alexander, Ph.D.
Software Engineer, Corporate Tools Development
IBM Corporation
Charlotte, NC








Re: [basex-talk] Possible bug in database size (bytes) after database optimize from BaseX Database Administration

2020-07-16 Thread Christian Grün
It's surprising that the count value changed, as it should represent the
number of resources (documents, binary files) in your database – and this
value shouldn't change if your data is optimized. Feel free to provide us
with a little reproducible example.

The size of the database may stay the same, though. The DBA provides no way
to trigger a full optimization, but you can e.g. use the query panel for
that.
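Since the DBA's Optimize button does not trigger a full optimization, `db:optimize('name', true())` (the XQuery counterpart of OPTIMIZE ALL) can be run in the query panel or, from a client, posted to the REST API. A minimal Java sketch of the latter, assuming the REST endpoint is reachable at the given base URL (for the setup in this thread presumably `https://localhost:10443/BaseX/rest`) and accepts an XML `<query>` body in the `http://basex.org/rest` namespace; `RestOptimize` is a hypothetical name and authentication is omitted.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;

// Hypothetical helper: posts db:optimize($db, true()) to the BaseX REST API.
class RestOptimize {
    // Minimal XML escaping for element content.
    static String escape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;");
    }

    // Body format: <query xmlns="http://basex.org/rest"><text>...</text></query>
    static String body(String database) {
        // Doubling an apostrophe escapes it inside an XQuery string literal.
        String xquery = "db:optimize('" + database.replace("'", "''") + "', true())";
        return "<query xmlns=\"http://basex.org/rest\"><text>" + escape(xquery) + "</text></query>";
    }

    static HttpRequest request(String restBase, String database) {
        return HttpRequest.newBuilder()
            .uri(URI.create(restBase))
            .header("Content-Type", "application/xml")
            .POST(HttpRequest.BodyPublishers.ofString(body(database), StandardCharsets.UTF_8))
            .build();
    }
}
```

Comparing the BYTES column before and after sending this request would show whether a full rebuild shrinks the database, as opposed to the DBA's default optimization.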







Re: [basex-talk] Possible bug in database size (bytes) after database optimize from BaseX Database Administration

2020-07-16 Thread Christian Grün
> We use the BaseX REST API from a Java program to add and update documents
> in BaseX.

Do you think it’s reproducible for us?

Re: [basex-talk] Possible bug in database size (bytes) after database optimize from BaseX Database Administration

2020-07-16 Thread Geoff Alexander
I would think so.  Let me see if I can create a small example using the
BaseX WEB API that recreates the problem.

Geoff Alexander, Ph.D.
Software Engineer, Corporate Tools Development
IBM Corporation
Charlotte, NC











Re: [basex-talk] Possible bug in database size (bytes) after database optimize from BaseX Database Administration

2020-07-16 Thread Geoff Alexander
We use the BaseX REST API from a Java program to add and update documents
in BaseX.

Geoff Alexander, Ph.D.
Software Engineer, Corporate Tools Development
IBM Corporation
Charlotte, NC








Re: [basex-talk] Possible bug in database size (bytes) after database optimize from BaseX Database Administration

2020-07-16 Thread Christian Grün
>
> (2) Update one or more of the database's entries.
>
If I replace the document via the DBA, or if I run an update expression via
the Queries Panel, the count always reflects the number of resources, it
doesn’t change.

How did you update the document? Is it an XML document or a binary file you
updated?
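Geoff mentions adding and updating documents through the REST API from Java. For a reproducible example, a document is typically stored with a PUT request to `<rest base>/{database}/{path}`, which adds the resource or replaces it if it already exists. A minimal sketch, assuming that URL layout, Basic authentication and path segments that are already URL-safe; `RestStore` and the credentials in the usage are hypothetical.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical helper: stores (adds or replaces) one XML document via REST PUT.
class RestStore {
    static HttpRequest put(String restBase, String database, String path,
                           String xml, String user, String password) {
        // Basic authentication: base64("user:password").
        String credentials = Base64.getEncoder().encodeToString(
            (user + ":" + password).getBytes(StandardCharsets.UTF_8));
        return HttpRequest.newBuilder()
            // e.g. https://localhost:10443/BaseX/rest/mydb/docs/a.xml
            .uri(URI.create(restBase + "/" + database + "/" + path))
            .header("Authorization", "Basic " + credentials)
            .header("Content-Type", "application/xml")
            .PUT(HttpRequest.BodyPublishers.ofString(xml, StandardCharsets.UTF_8))
            .build();
    }
}
```

Sending such a request twice for the same path and then checking the DBA's COUNT column would mirror step (2) of the recipe above with plain XML documents.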

> Maybe I have a misunderstanding of what the Optimize button on the BaseX
> Database Administration's Database page actually does.
>
Feel free to have a look at our documentation [1].

[1] https://docs.basex.org/wiki/Commands#OPTIMIZE