[basex-talk] Test Failure From Current Source, Failure to Run GUI from Eclipse

2018-05-14 Thread Eliot Kimber
I thought I would see if I could add the Xerces grammar caching to BaseX, at 
least to see if it improved things for DITA loading.

I've updated my fork of the basex project to the current version on GitHub.

Using the master branch as the basis for my local feature branch and with no 
modified files, I get one failing test from "mvn test":

Failed tests: 
  FnTest.sum:91->AdvancedQueryTest.error:78 Query did not fail:
sum(1, 'x')
[E] Error:  err:FORG0006
[F] 1

Tests run: 1578, Failures: 1, Errors: 0, Skipped: 5

I'm also not able to run the BaseXGUI class using an Eclipse run configuration 
per the documentation on the BaseX site. I get a bunch of messages about 
things missing from English.lang:

/lang/English.lang not found.
English.lang: 'port' is missing
... lots more
English.lang: 'h_no_html_parser' is missing

Then this fatal error:

Image not found: /img/text_xml.png
at org.basex.util.Util.stack(Util.java:224)
at org.basex.gui.layout.BaseXImages.url(BaseXImages.java:125)
at org.basex.gui.layout.BaseXImages.get(BaseXImages.java:62)
at org.basex.gui.layout.BaseXImages.icon(BaseXImages.java:109)
at org.basex.gui.layout.BaseXImages.<clinit>(BaseXImages.java:34)
at org.basex.gui.GUIMacOSX.addDockIcon(GUIMacOSX.java:84)
at org.basex.gui.GUIMacOSX.<init>(GUIMacOSX.java:60)
at org.basex.BaseXGUI.<init>(BaseXGUI.java:58)
at org.basex.BaseXGUI.main(BaseXGUI.java:39)
Exception in thread "main" java.lang.ExceptionInInitializerError
at org.basex.gui.GUIMacOSX.addDockIcon(GUIMacOSX.java:84)
at org.basex.gui.GUIMacOSX.<init>(GUIMacOSX.java:60)
at org.basex.BaseXGUI.<init>(BaseXGUI.java:58)
at org.basex.BaseXGUI.main(BaseXGUI.java:39)
Caused by: java.lang.IllegalArgumentException: input == null!
at javax.imageio.ImageIO.read(ImageIO.java:1388)
at org.basex.gui.layout.BaseXImages.get(BaseXImages.java:72)
at org.basex.gui.layout.BaseXImages.get(BaseXImages.java:62)
at org.basex.gui.layout.BaseXImages.icon(BaseXImages.java:109)
at org.basex.gui.layout.BaseXImages.<clinit>(BaseXImages.java:34)
... 4 more

I suspect it's something very simple, but I have no idea what it might be.

Thanks,

Eliot

--
Eliot Kimber
http://contrext.com
 




Re: [basex-talk] "Database path [...] yields more than one document" with static file

2018-05-14 Thread Christian Grün
> One thing may be interesting to note: When I imported that file into a
> BaseX database, a few days ago, it resulted in that document being in
> the database twice.

The duplicate occurrence of this file could be the reason for the
error message (in BaseX, fn:doc may refer to documents in a database
and to documents in the local file system). How did you proceed? Does
this also happen if you add the document to an empty database?
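
One way to verify that suspicion is to group the database's documents by path. This is a sketch only (the database name "mydb" is a placeholder for the actual database):

```xquery
(: List every database path that is backed by more than one document. :)
for $doc in db:open("mydb")
group by $path := db:path($doc)
where count($doc) > 1
return $path || ": " || count($doc) || " documents"
```

If the path of dbsample.xml shows up here, re-adding the file to a fresh database should also make the fn:doc error disappear.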


Re: [basex-talk] Duplicate declaration of static variable

2018-05-14 Thread Christian Grün
Hi Andreas,

Could you possibly provide us with a little, self-contained example
that will help us to reproduce the issue?

Thanks in advance,
Christian



On Mon, May 14, 2018 at 9:12 PM, Andreas Mixich
 wrote:
> Hi,
>
> when loading my RESTXQ app I get this:
>
>   Stopped at C:/Users/dev/.jetty/webapps/basex/fwdev/lib/account.xqm, 20/18:
>   [XQST0049] Duplicate declaration of static variable $acc:last-visited.
>
>   dev@mambo ~/.jetty/webapps/basex
>   $ grep -R --include="*.*" "\$acc\:last-visited" .
>   ./fwdev/lib/account.xqm:declare variable $acc:last-visited as
> xs:dateTime? external := ();
>
> As it seems, there is no other declaration of the variable in the whole
> project. What could that be?
>
> --
> Goody Bye, Minden jót, Mit freundlichen Grüßen,
> Andreas Mixich


[basex-talk] Duplicate declaration of static variable

2018-05-14 Thread Andreas Mixich
Hi,

when loading my RESTXQ app I get this:

  Stopped at C:/Users/dev/.jetty/webapps/basex/fwdev/lib/account.xqm, 20/18:
  [XQST0049] Duplicate declaration of static variable $acc:last-visited.

  dev@mambo ~/.jetty/webapps/basex
  $ grep -R --include="*.*" "\$acc\:last-visited" .
  ./fwdev/lib/account.xqm:declare variable $acc:last-visited as
xs:dateTime? external := ();

As it seems, there is no other declaration of the variable in the whole
project. What could that be?
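
One common cause of XQST0049 is that the same library module gets compiled twice because it is reachable through two differently written location hints, e.g. once from the main module and once from another imported module. A purely hypothetical sketch (module URIs and paths invented for illustration):

```xquery
(: main.xq: imports account.xqm directly ... :)
import module namespace acc  = "dev:account:unstable" at "lib/account.xqm";
(: ... and a second module that itself imports account.xqm via a different
   relative path. If the two locations do not normalize to the same URI,
   account.xqm may be parsed twice, and its single declaration of
   $acc:last-visited is then reported as a duplicate. :)
import module namespace util = "dev:util:unstable" at "lib/util.xqm";
()
```

So it may be worth checking whether account.xqm is imported from more than one module with differently spelled paths.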

-- 
Goody Bye, Minden jót, Mit freundlichen Grüßen,
Andreas Mixich


[basex-talk] "Database path [...] yields more than one document" with static file

2018-05-14 Thread Andreas Mixich
Hi,

during development I sometimes use the static file that contains my
database:

  declare variable $app:db :=
doc('file:/S:/Users/dev/Eigene%20Projekte/codingcookbook/src/data/dbsample.xml');

So far, it went without problems, but today something very strange happened:

  [basex:doc] Database path 'file:/S:/Users/dev/Eigene%20Projekte
  /codingcookbook/src/data/dbsample.xml' yields more than one document.

Now I checked the file in question. It is well-formed and has a single
document root.

One thing may be interesting to note: When I imported that file into a
BaseX database, a few days ago, it resulted in that document being in
the database twice: once as it was on disk, and once containing only a
single comment that I had in the file after the XML prolog. I moved the
comment to another place, even removed it, just to make sure, but the error
persists when trying to load the file from disk via `doc()`.

  Error:
  Stopped at S:/Users/dev/Eigene
Projekte/xquery-appframework/src/lib/app.xqm, 37/32:
  [basex:doc] Database path
'file:/S:/Users/dev/Eigene%20Projekte/codingcookbook/src/data/dbsample.xml'
yields more than one document.

  Optimized Query:
  declare variable $app:db as item()* :=
doc("file:/S:/Users/dev/Eigene%20Projekte/codingcookbook/src/data/dbsample.xml");
  declare function local:autoComplete($search_489 as xs:string,
$record-name_490 as xs:string, $places_491 as element()*, $title_492,
$categories_493) as element(container)* { for $var_494 in
$app:db/descendant-or-self::node()/node()[({http://www.w3.org/1999/xhtml}name
= $record-name_490)]/$places_491 where $var_494/text()[. contains text {
$search_489 } using fuzzy using stemming using language 'English']
return element Q{dev:intermediatexml:unstable}container { (element
Q{dev:intermediatexml:unstable}item { ($var_494/text()) }, element
Q{dev:intermediatexml:unstable}title {
($var_494/preceding::node()[1]/node()[(name() = $title_492)]/text()) },
element Q{dev:intermediatexml:unstable}categories {
($var_494/preceding::node()[1]/node()[(name() =
$categories_493)]/text()) }, element Q{dev:intermediatexml:unstable}id {
(($var_494/parent::{http://www.w3.org/1999/xhtml}entry/@xml:id !
string())) }) } };
  local:autoComplete("nested", "cb:entry", ((dc:subject union dc:title
union {http://www.w3.org/1999/xhtml}question)), "dc:title", "cb:categories")

  Query:
  declare base-uri
"file:///S:/Users/dev/Eigene%20Projekte/xquery-appframework/src/";
import module namespace app = "dev:app:unstable" at "lib/app.xqm";
declare default element namespace "http://www.w3.org/1999/xhtml";
declare namespace cb = "http://codeblocker.org/ns/codingcookbook/1.0/";
declare namespace dc = "http://purl.org/dc/elements/1.1/"; declare
namespace itx = "dev:intermediatexml:unstable"; declare function
local:autoComplete( $search as xs:string, $record-name as xs:string,
$places as element()*, $title, $categories ) as
element(Q{dev:intermediatexml:unstable}container)* { for $var in
$app:db//node()[name=$record-name]/$places where $var/text()[. contains
text {$search} using stemming using fuzzy] return <itx:container>
<itx:item>{$var/text()}</itx:item>
<itx:title>{$var/preceding::node()[1]/node()[name()=$title]/text()}</itx:title>
<itx:categories>{$var/preceding::node()[1]/node()[name()=$categories]/text()}</itx:categories>
<itx:id>{$var/parent::entry/@xml:id/string()}</itx:id> </itx:container>
};
local:autoComplete("nested","cb:entry",(dc:subject|dc:title|question),"dc:title","cb:categories")


-- 
Goody Bye, Minden jót, Mit freundlichen Grüßen,
Andreas Mixich


Re: [basex-talk] 9.0.1: High Memory Usage Loading Docs Via GUI

2018-05-14 Thread Christian Grün
Good to know; I’ll record this as positive news ;) Feel free to give
me an update once you encounter a similar behavior.


On Mon, May 14, 2018 at 8:40 PM, Eliot Kimber  wrote:
> Hmm.
>
> In the process of testing my test data set I can't reproduce the earlier 
> behavior.
>
> In my current tests, using the same data and the same BaseX version, I get a 
> maximum of maybe 1GB for the largest file but just a few hundred MBs once 
> everything is loaded.
>
> For 3800 topics of roughly 50K each (on average) it takes just a couple of 
> seconds to load them with no DTDs, a minute or so with DTDs, which is 
> consistent with the time cost of reparsing the (large) DITA grammars for each 
> topic.
>
> So not sure what was happening when I tried this before but I definitely 
> rebooted and installed macOS updates since then, so could have been some Java 
> issue or who knows what else.
>
> The good news is that even without grammar caching the DITA topics do load in 
> a reasonable (if not ideal) amount of time and with appropriate memory usage.
>
> Cheers,
>
> E.
>
> --
> Eliot Kimber
> http://contrext.com
>
>
> On 5/14/18, 12:53 PM, "Eliot Kimber" <ekim...@contrext.com> wrote:
>
> Yes, I wouldn't expect the grammars to chew up gigabytes. I'll provide a 
> test data set for you.
>
> Cheers,
>
> E.
>
> --
> Eliot Kimber
> http://contrext.com
>
>
> On 5/14/18, 12:45 PM, "Christian Grün"  wrote:
>
> I would have expected some MBs to be sufficient for parsing even
> complex DTDs if nothing is cached (but caching could definitely speed
> up processing), so maybe there’s still something that we could have a
> look at. If you are interested, feel free to provide me with your
> files via a private message.
>
>
>
> On Mon, May 14, 2018 at 7:40 PM, Eliot Kimber  
> wrote:
> > Yes, I would want caching on by default with the option to turn it 
> off. I'm assuming it's currently not turned on (but to be honest I haven't 
> taken the time to check the source code).
> >
> > Certainly for DITA content grammar caching is the only practical 
> way to parse a large number of topics in the same JVM without both using lots 
> of memory and eating an avoidable processing cost of re-processing the 
> grammar files again for each document.
> >
> > DITA is probably somewhat unique in this regard because it takes such a 
> > different approach to grammar organization and use than pretty much 
> > any other XML application.
> >
> > Cheers,
> >
> > E.
>


Re: [basex-talk] 9.0.1: High Memory Usage Loading Docs Via GUI

2018-05-14 Thread Eliot Kimber
Hmm.

In the process of testing my test data set I can't reproduce the earlier 
behavior.

In my current tests, using the same data and the same BaseX version, I get a 
maximum of maybe 1GB for the largest file but just a few hundred MBs once 
everything is loaded.

For 3800 topics of roughly 50K each (on average) it takes just a couple of 
seconds to load them with no DTDs, a minute or so with DTDs, which is 
consistent with the time cost of reparsing the (large) DITA grammars for each 
topic. 

So not sure what was happening when I tried this before but I definitely 
rebooted and installed macOS updates since then, so could have been some Java 
issue or who knows what else.

The good news is that even without grammar caching the DITA topics do load in a 
reasonable (if not ideal) amount of time and with appropriate memory usage.

Cheers,

E.

--
Eliot Kimber
http://contrext.com
 

On 5/14/18, 12:53 PM, "Eliot Kimber" 
 
wrote:

Yes, I wouldn't expect the grammars to chew up gigabytes. I'll provide a 
test data set for you.

Cheers,

E.

--
Eliot Kimber
http://contrext.com
 

On 5/14/18, 12:45 PM, "Christian Grün"  wrote:

I would have expected some MBs to be sufficient for parsing even
complex DTDs if nothing is cached (but caching could definitely speed
up processing), so maybe there’s still something that we could have a
look at. If you are interested, feel free to provide me with your
files via a private message.



On Mon, May 14, 2018 at 7:40 PM, Eliot Kimber  
wrote:
> Yes, I would want caching on by default with the option to turn it 
off. I'm assuming it's currently not turned on (but to be honest I haven't 
taken the time to check the source code).
>
> Certainly for DITA content grammar caching is the only practical way 
to parse a large number of topics in the same JVM without both using lots of 
memory and eating an avoidable processing cost of re-processing the grammar 
files again for each document.
>
> DITA is probably somewhat unique in this regard because it takes such a 
different approach to grammar organization and use than pretty much any 
other XML application.
>
> Cheers,
>
> E.










Re: [basex-talk] Duplicate files when using webDAV and batch processes

2018-05-14 Thread Christian Grün
Hi France,

I’ve just returned after a little break. Thanks for the detailed
instructions; I’ll follow the described steps in the course of this
week.

Best,
Christian




On Sun, May 13, 2018 at 3:36 PM, France Baril
 wrote:
> Hi,
>
> Just wondering if this slipped through the cracks.
>
> On Wed, May 2, 2018 at 1:11 PM, France Baril 
> wrote:
>>
>> Hi,
>>
>> We've been having this issue for a while and we think resolving it may be
>> the key to resolving an intermittent server 500 error that we've been
>> having.
>>
>> When a user tries to save a file while a batch process is running, BaseX
>> saves duplicates of the file.
>>
>> How to reproduce:
>>
>> 1) Take a fresh BaseX 9.0.1 installation
>> 2) Copy the attached .xqm in webapp
>> 3) Create an empty DB called mydb
>> 4) Access localhost:port-num/test/create-update-a-lot-of-files to populate
>> your db.
>> 5) In OxygenXML, set a webdav connection to the db and open a file, add a
>> character in one of the elements, but don't save the file.
>> 6) From the browser, access 'localhost:port-num/test/update-something'
>> 7) While the process in the browser is still running, save the file in
>> Oxygen. You'll get a message saying that read timed out. Click ok and do not
>> try saving the file again.
>> 8) When the update-something process is done running, don't resave the
>> file in Oxygen, instead go to localhost:port-num/test/oups-duplicates.
>>You'll get a message saying that some files are duplicated. If you
>> don't, try again from step #4 a few times. You'll only get duplicates if you
>> get the time-out message while the update-something process is still
>> running. If you try to save the file many times, you'll get more duplicates,
>> 4 or 6.
>>
>> We're not sure if it's a BaseX bug or if we are setting our user
>> management and/or locking rules incorrectly.
>>
>> Do you have any suggestions?
>>
>> --
>> France Baril
>> Architecte documentaire / Documentation architect
>> france.ba...@architextus.com
>
>
>
>
> --
> France Baril
> Architecte documentaire / Documentation architect
> france.ba...@architextus.com


Re: [basex-talk] 9.0.1: High Memory Usage Loading Docs Via GUI

2018-05-14 Thread Eliot Kimber
Yes, I wouldn't expect the grammars to chew up gigabytes. I'll provide a test 
data set for you.

Cheers,

E.

--
Eliot Kimber
http://contrext.com
 

On 5/14/18, 12:45 PM, "Christian Grün"  wrote:

I would have expected some MBs to be sufficient for parsing even
complex DTDs if nothing is cached (but caching could definitely speed
up processing), so maybe there’s still something that we could have a
look at. If you are interested, feel free to provide me with your
files via a private message.



On Mon, May 14, 2018 at 7:40 PM, Eliot Kimber  wrote:
> Yes, I would want caching on by default with the option to turn it off. 
I'm assuming it's currently not turned on (but to be honest I haven't taken the 
time to check the source code).
>
> Certainly for DITA content grammar caching is the only practical way to 
parse a large number of topics in the same JVM without both using lots of 
memory and eating an avoidable processing cost of re-processing the grammar 
files again for each document.
>
> DITA is probably somewhat unique in this regard because it takes such a 
different approach to grammar organization and use than pretty much any other 
XML application.
>
> Cheers,
>
> E.






Re: [basex-talk] 9.0.1: High Memory Usage Loading Docs Via GUI

2018-05-14 Thread Christian Grün
I would have expected some MBs to be sufficient for parsing even
complex DTDs if nothing is cached (but caching could definitely speed
up processing), so maybe there’s still something that we could have a
look at. If you are interested, feel free to provide me with your
files via a private message.



On Mon, May 14, 2018 at 7:40 PM, Eliot Kimber  wrote:
> Yes, I would want caching on by default with the option to turn it off. I'm 
> assuming it's currently not turned on (but to be honest I haven't taken the 
> time to check the source code).
>
> Certainly for DITA content grammar caching is the only practical way to parse 
> a large number of topics in the same JVM without both using lots of memory 
> and eating an avoidable processing cost of re-processing the grammar files 
> again for each document.
>
> DITA is probably somewhat unique in this regard because it takes such a 
> different approach to grammar organization and use than pretty much any other 
> XML application.
>
> Cheers,
>
> E.


Re: [basex-talk] 9.0.1: High Memory Usage Loading Docs Via GUI

2018-05-14 Thread Eliot Kimber
Yes, I would want caching on by default with the option to turn it off. I'm 
assuming it's currently not turned on (but to be honest I haven't taken the 
time to check the source code).

Certainly for DITA content grammar caching is the only practical way to parse a 
large number of topics in the same JVM without both using lots of memory and 
eating an avoidable processing cost of re-processing the grammar files again 
for each document.

DITA is probably somewhat unique in this regard because it takes such a 
different approach to grammar organization and use than pretty much any other 
XML application.

Cheers,

E.

--
Eliot Kimber
http://contrext.com
 

On 5/14/18, 12:17 PM, "Christian Grün"  wrote:

Hi Eliot,

Thanks for your observations.

> I think the solution is to turn on Xerces' grammar caching.

I’m wondering what is happening here. Did you mean to say that caching
is enabled by default, and that it should be possible to turn it off?

Cheers,
Christian



The only danger there is that different DTDs within the same content
set can have different expansions for the same external parameter entity
reference (e.g., MathML DTDs), which can then lead to validation
issues. For this reason the DITA OT makes the grammar cache
switchable but on by default.
>
> Another option for DITA content in particular is to use the OT's 
preprocessing to parse all the docs and then use BaseX with the parsed docs 
where all the attributes have been expanded into the source.
>
> Cheers,
>
> E.
> --
> Eliot Kimber
> http://contrext.com
>
>
> On 5/4/18, 9:52 AM, "Eliot Kimber" 
 
wrote:
>
> Follow up--I tried giving BaseX the full 16GB of RAM and it still 
ultimately locked up with the memory meter showing 13GB.
>
> I'm thinking this must be some kind of memory leak.
>
> I tried importing the DITA Open Toolkit's documentation source and 
that worked fine with the max memory being about 2.5GB, but it's only about 250 
topics.
>
> Cheers,
>
> E.
>
> --
> Eliot Kimber
> http://contrext.com
>
> On 5/3/18, 4:59 PM, "Eliot Kimber" 
 
wrote:
>
> In the context of trying to do fun things with DITA docs in BaseX 
I downloaded the latest BaseX (9.0.1) and tried creating a new database and 
loading docs into it using the BaseX GUI. This is on macOS 10.13.4 with 16GB of 
hardware RAM available.
>
> My corpus is about 4000 DITA topics totaling about 30MB on disk. 
They are all in a single directory (not my decision) if that matters.
>
> Using the "parse DTDs" option and default indexing options (no 
token or full text indexes) I'm finding that even with 12GB of RAM allocated to 
the JVM the memory usage during load will eventually go to 12GB, at which point 
the processing appears to stop (that is, whatever I set the max memory to, when 
it's reached, things stop but I only got out of memory errors when I had much 
lower settings, like the default 2GB).
>
> I'm currently running a test with 14GB allocated and it is 
continuing but it does go to 12GB occasionally (watching the memory display on 
the Add progress panel).
>
> No individual file is that big--the biggest is 150K and typical 
is 30K or smaller.
>
> I wouldn't expect BaseX to have this kind of memory problem so 
I'm wondering if maybe there's an issue with memory on macOS or with DITA 
documents in particular (the DITA DTDs are notoriously large)?
>
> Should I expect BaseX to be able to load this kind of corpus with 
14GB of RAM?
>
> Cheers,
>
> E.
> --
> Eliot Kimber
> http://contrext.com
>
>
>
>
>
>
>
>
>
>
>






Re: [basex-talk] Unexpected unary lookup result

2018-05-14 Thread Christian Grün
Hi Sebastian,

Thanks for your bug report. I found the culprit; it was a little static
typing error in our lookup expression optimizations [1].

A new snapshot is available [2]; BaseX 9.0.2 will probably be released
around end of May.

Cheers,
Christian

[1] https://github.com/BaseXdb/basex/commit/75947cfee98d513ddb1620f84a092dc883ebcc19
[2] http://files.basex.org/releases/latest/
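
For reference, the reported expression can be checked against the snapshot (illustration only; an empty array mapped with array:for-each stays empty, so the unary wildcard lookup must yield the empty sequence):

```xquery
let $a := array:for-each([], lower-case#1)
return empty($a ! ?*)   (: true on the fixed version :)
```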




On Fri, May 11, 2018 at 12:52 PM, Sebastian Zimmer <
sebastian.zim...@uni-koeln.de> wrote:

> Sorry to bother you again, but I think there is still something wrong with
> my code and I can't figure it out. This time I checked it consistently on
> BaseX 9.0.1 (Windows and Linux, console and web server), results are always
> the same:
>
> xquery version "3.1";
> declare namespace array = "http://www.w3.org/2005/xpath-functions/array";
>
> let $array1 := []
> let $array2 := array:for-each([], function($i) { lower-case($i) })
> let $array3 := array:for-each([], function($i) { $i + 1 })
>
> return (
>   empty($array1!?*),
>   empty($array2!?*),
>   empty($array3!?*),
>   empty(
> for $i in 1 to array:size($array1)
> return $array1($i)
>   ),
>   empty(
> for $i in 1 to array:size($array2)
> return $array2($i)
>   ),
>   empty(
> for $i in 1 to array:size($array3)
> return $array3($i)
>   )
> )
>
> The results I get are:
> true
> false
> true
> true
> true
> true
>
> Why isn't the second result "true" too?
>
> Best regards,
> Sebastian
>
> Am 11.05.2018 um 11:12 schrieb Sebastian Zimmer:
>
> Hi again,
>
> the problem is gone now after a reboot. It seems that the web server was
> running on another version while the console was running with 9.0.1
>
> Sorry for the inconvenience.
>
> Best,
> Sebastian
> Am 11.05.2018 um 11:04 schrieb Sebastian Zimmer:
>
> Hi Giuseppe,
>
> thanks for checking. I double-checked again. The problem is even weirder
> now:
>
> When using the console, I too get 2x true:
>
> $ ./bin/basex "./webapp/array_test.xql"
> true
> true
>
> When using the web server, I still get this:
>
> $ curl localhost:8994/rest?run=array_test.xql
> false
> true
>
> At first I thought there was some cache at work, preventing the update,
> but it doesn't seem to be the case. I can edit the XQL und both outputs
> change accordingly, but the first boolean is still different.
>
> Best regards,
> Sebastian
>
> Am 11.05.2018 um 08:28 schrieb Giuseppe Celano:
>
> Hi Sebastian,
>
> In my Basex 9.0.1 and 8.6.7 you get two "true".
>
> Best,
> Giuseppe
>
> Universität Leipzig
> Institute of Computer Science, NLP
> Augustusplatz 10
> 
> 04109 Leipzig
> Deutschland
> E-mail: cel...@informatik.uni-leipzig.de
> 
> E-mail: giuseppegacel...@gmail.com
> Web site 1: http://www.dh.uni-leipzig.de/wo/team/
> Web site 2: https://sites.google.com/site/giuseppegacelano/
>
> On May 11, 2018, at 1:50 AM, Sebastian Zimmer <
> sebastian.zim...@uni-koeln.de> wrote:
>
> Hi,
>
> I have this script where I use the lookup operator to perform a unary
> lookup:
>
> xquery version "3.1";
> declare namespace array = "http://www.w3.org/2005/xpath-functions/array";
>
> let $array := []
>
> return (
>   empty($array!?*),   (: returns false :)
>   empty(
> for $i in 1 to array:size($array)
> return $array($i)
>   )  (: returns true  :)
> )
>
> I'm curious that the first expression returns false even though it should
> be equivalent to the second expression, if I read the XQuery spec [1] right:
> If the context item is an array: If the KeySpecifier is a wildcard ("*"),
> the UnaryLookup operator is equivalent to the following expression:
>
> for $k in 1 to array:size(.)
> return .($k)
>
> But maybe I'm missing something. I'd be glad if you could help.
>
> Best regards,
> Sebastian Zimmer
>
> [1] https://www.w3.org/TR/2017/REC-xquery-31-20170321/#id-unary-lookup
> --
> Sebastian Zimmer
> sebastian.zim...@uni-koeln.de
>
> Cologne Center for eHumanities
> DH Center at the University of Cologne
> @CCeHum
>
>
>
> --
> Sebastian Zimmer
> sebastian.zim...@uni-koeln.de
>
> Cologne Center for eHumanities
> DH Center at the University of Cologne
> @CCeHum
>
>
> --
> Sebastian Zimmer
> sebastian.zim...@uni-koeln.de
>
>
> Cologne Center for eHumanities 
> DH Center at the 

Re: [basex-talk] 9.0.1: High Memory Usage Loading Docs Via GUI

2018-05-14 Thread Christian Grün
Hi Eliot,

Thanks for your observations.

> I think the solution is to turn on Xerces' grammar caching.

I’m wondering what is happening here. Did you mean to say that caching
is enabled by default, and that it should be possible to turn it off?

Cheers,
Christian



The only danger there is that different DTDs within the same content
set can have different expansions for the same external parameter entity
reference (e.g., MathML DTDs), which can then lead to validation
issues. For this reason the DITA OT makes the grammar cache
switchable but on by default.
>
> Another option for DITA content in particular is to use the OT's 
> preprocessing to parse all the docs and then use BaseX with the parsed docs 
> where all the attributes have been expanded into the source.
>
> Cheers,
>
> E.
> --
> Eliot Kimber
> http://contrext.com
>
>
> On 5/4/18, 9:52 AM, "Eliot Kimber" <ekim...@contrext.com> wrote:
>
> Follow up--I tried giving BaseX the full 16GB of RAM and it still 
> ultimately locked up with the memory meter showing 13GB.
>
> I'm thinking this must be some kind of memory leak.
>
> I tried importing the DITA Open Toolkit's documentation source and that 
> worked fine with the max memory being about 2.5GB, but it's only about 250 
> topics.
>
> Cheers,
>
> E.
>
> --
> Eliot Kimber
> http://contrext.com
>
> On 5/3/18, 4:59 PM, "Eliot Kimber" <ekim...@contrext.com> wrote:
>
> In the context of trying to do fun things with DITA docs in BaseX I 
> downloaded the latest BaseX (9.0.1) and tried creating a new database and 
> loading docs into it using the BaseX GUI. This is on macOS 10.13.4 with 16GB 
> of hardware RAM available.
>
> My corpus is about 4000 DITA topics totaling about 30MB on disk. They 
> are all in a single directory (not my decision) if that matters.
>
> Using the "parse DTDs" option and default indexing options (no token 
> or full text indexes) I'm finding that even with 12GB of RAM allocated to the 
> JVM the memory usage during load will eventually go to 12GB, at which point 
> the processing appears to stop (that is, whatever I set the max memory to, 
> when it's reached, things stop but I only got out of memory errors when I had 
> much lower settings, like the default 2GB).
>
> I'm currently running a test with 14GB allocated and it is continuing 
> but it does go to 12GB occasionally (watching the memory display on the Add 
> progress panel).
>
> No individual file is that big--the biggest is 150K and typical is 
> 30K or smaller.
>
> I wouldn't expect BaseX to have this kind of memory problem so I'm 
> wondering if maybe there's an issue with memory on macOS or with DITA 
> documents in particular (the DITA DTDs are notoriously large)?
>
> Should I expect BaseX to be able to load this kind of corpus with 
> 14GB of RAM?
>
> Cheers,
>
> E.
> --
> Eliot Kimber
> http://contrext.com
>


Re: [basex-talk] BaseX weird error

2018-05-14 Thread Christian Grün
Hi Halit,

Thanks for reporting the stack trace. The only thing that I can derive
from your error message is that something has gone wrong with your
database in an earlier state. Unfortunately, it’s difficult to guess
what might have been the reason. Feel free to give us an update if you
manage to make the issue reproducible for us.

Best,
Christian



On Wed, May 9, 2018 at 4:28 PM, halit tiryaki  wrote:
> I'm using the C# driver, but still, it doesn't look like a high-level
> problem, nor a query problem; that's why I didn't post any code
> snippets. To make sure, I just tried to execute the INSPECT command in the
> CLI. same exception. Also the db:optimize fails with the same exception,
> although with a different stacktrace, which I will post below.
>
> Although the race-condition point of two processes interfering is a good
> one, I'm currently having trouble doing anything useful with a session
> for which a race condition would matter. If anything, it must have happened
> while the import was running along with the other, non-importing process.
>
> As I mentioned in the first post, while the import process was still
> ongoing, the other process already started to have the exception at a very
> late progress of the import. After the import was done, I was still able to
> operate on the db as normal, but only with the session in the process which
> did the long-run import.
>
> To restore the second process its session state to normal, I restarted the
> web application, which closes both sessions. After that, I'm no longer able
> to operate on the db, not even directly from the CLI.
>
> It seems to be some sort of persisted data state which the low-level BaseX
> code can't handle when trying to reread it into memory.
>
> I've already googled around for the same exception in BaseX; it seems it
> was an issue in some older versions too.
>
>
> here the stacktrace when running db:optimize:
>
> Improper use? Potential bug? Your feedback is welcome:
> Contact: basex-talk@mailman.uni-konstanz.de
> Version: BaseX 9.0.2 beta
> Java: Oracle Corporation, 1.8.0_151
> OS: Linux, amd64
> Stack Trace:
> java.lang.ArrayIndexOutOfBoundsException: 52
> at org.basex.util.hash.TokenSet.key(TokenSet.java:128)
> at org.basex.data.Data.name(Data.java:388)
> at org.basex.io.serial.Serializer.node(Serializer.java:414)
> at org.basex.io.serial.Serializer.node(Serializer.java:158)
> at org.basex.io.serial.Serializer.node(Serializer.java:345)
> at org.basex.io.serial.Serializer.node(Serializer.java:158)
> at org.basex.io.serial.Serializer.serialize(Serializer.java:109)
> at org.basex.core.cmd.OptimizeAll$DBParser.parse(OptimizeAll.java:200)
> at org.basex.build.Builder.parse(Builder.java:77)
> at org.basex.build.DiskBuilder.build(DiskBuilder.java:77)
> at org.basex.core.cmd.OptimizeAll.optimizeAll(OptimizeAll.java:122)
> at org.basex.query.up.primitives.db.DBOptimize.apply(DBOptimize.java:124)
> at org.basex.query.up.DataUpdates.apply(DataUpdates.java:175)
> at org.basex.query.up.ContextModifier.apply(ContextModifier.java:120)
> at org.basex.query.up.Updates.apply(Updates.java:157)
> at org.basex.query.QueryContext.iter(QueryContext.java:341)
> at org.basex.query.QueryProcessor.iter(QueryProcessor.java:90)
> at org.basex.core.cmd.AQuery.query(AQuery.java:92)
> at org.basex.core.cmd.XQuery.run(XQuery.java:22)
> at org.basex.core.Command.run(Command.java:257)
> at org.basex.core.Command.execute(Command.java:93)
> at org.basex.server.ClientListener.run(ClientListener.java:140)
>
>
> On Wed, May 9, 2018 at 11:45 AM, Alexander Holupirek 
> wrote:
>>
>> Hi,
>>
>> (please also respond to the list.  Others might have suggestions as well
>> ;-)
>>
>> it still would be great if you could provide even more details.  For
>> instance, what programming language do you use? Ideally include a Short,
>> Self-Contained, Correct (Compilable) Example (SSCCE) [1] in order to let people
>> reproduce the behaviour.
>>
>> From a high level perspective, are you sure the sessions do not interfere?
>> Do you additionally work with BaseXGUI on the database, while your program
>> is importing?
>>
>> Cheers,
>> Alex
>>
>>
>> [1] http://sscce.org/
>>
>> > On 9. May 2018, at 09:26, halit tiryaki  wrote:
>> >
>> > hello, thanks for the fast response.
>> >
>> > It happened in the process of a large import. There were two processes,
>> > each with its own session to BaseX. The process with the open session
>> > which did the import was doing fine. The second process however started
>> > having the mentioned exception. After stopping both processes and
>> > restarting the app, no BaseX session is able to operate normally.
>> >
>> > The ArrayIndexOutOfBoundsException already comes in executing the
>> > command INSPECT.
>> >
>> > The db-Info returns following:
>> >
>> >
>> > info db:
>> > Database Properties
>> > NAME: XXX
>> > SIZE: 5877 kB
>> > NODES: 124051