[basex-talk] Whitespace

2024-02-14 Thread Owen Ambur
Lack of capability to deal appropriately with whitespaces (and punctuation) results in false positives in our StratML-enabled query service at https://search.aboutthem.info/ Will look forward to learning if anything can be done about it. Owen Ambur https://www.linkedin.com/in/owenambur/ On

Re: [basex-talk] Help with loading of 9 million documents

2024-02-14 Thread Imsieke, Gerrit, le-tex
Whitespace is probably only a minor factor here. It can’t explain the loading times that grow non-linearly with document count. Dietmar, have you looked at the memory consumption? My experience is that if memory gets scarce, garbage collection will kick in frequently, slowing down the import

Re: [basex-talk] Help with loading of 9 million documents

2024-02-14 Thread Christian Grün
Thanks for the addition, Liam; I should have mentioned that. If your input has mixed content, and if the relevant sections have xml:space='preserve' attributes … whitespace stripping will be safe. Similarly, it may be helpful to know that the whitespace gets lost if XML strings…

Re: [basex-talk] Help with loading of 9 million documents

2024-02-14 Thread Liam R. E. Quin
On Tue, 2024-02-13 at 20:29 +0100, Christian Grün wrote: > > If your XML input has been properly indented to improve readability, > you can reduce the size of your database by dropping superfluous > whitespace during the import: > > SET STRIPWS ON; CREATE DB ... > db:create('db',