Re: [basex-talk] many distinct namespaces

2019-01-24 Thread Liam R. E. Quin
On Wed, 2019-01-16 at 15:31 +0100, Christian Grün wrote:
> Hi Liam,
> 
> > did this ever happen?
> 
> What exactly? ;)

Sorry! I meant support for more than 256 namespaces in one database.


-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Web slave for vintage clipart http://www.fromoldbooks.org/



Re: [basex-talk] many distinct namespaces

2019-01-16 Thread Christian Grün
Hi Liam,

> did this ever happen?

What exactly? ;)

> * reserve 7 bits for the namespace, and 1 bit for "uses extended
> namespace". In the top-bit-set objects only, add an extra byte.
> * Use the first 2 bytes of the element name :)

There are a few online resources with details on our chosen storage
layout [1,2,3]. I think that we pretty much use every available bit
for storing references and flags (except e.g. for text nodes, which
have less metadata than elements or attributes), but maybe there’s
indeed something specific that we could optimize? More suggestions are
welcome.

Cheers
Christian

[1] http://files.basex.org/publications/Gruen%20[2010],%20Storing%20and%20Querying%20Large%20XML%20Instances.pdf
[2] http://docs.basex.org/wiki/Storage_Layout
[3] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/data/Data.java


Re: [basex-talk] many distinct namespaces

2019-01-15 Thread Liam R. E. Quin
On Fri, 2018-11-02 at 17:25 +0100, Christian Grün wrote:


did this ever happen?

> Some more details: The current storage layout per node has been fixed
> to 16 bytes. One byte (8 bits) is reserved for the namespace
> reference.

Here are a couple of hacky approaches in the spirit of brainstorming ;)

* reserve 7 bits for the namespace, and 1 bit for "uses extended
namespace". In the top-bit-set objects only, add an extra byte.
* Use the first 2 bytes of the element name :)
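
To make the first idea concrete, here is a minimal sketch of a namespace reference that takes one byte for ids 0..127 and two bytes when the top bit is set. The class and method names are hypothetical, and this is not how BaseX stores anything today; in particular, it does not address where the extra byte would live in a fixed 16-byte record.

```java
/** Sketch of a 1-or-2-byte namespace reference (hypothetical, not BaseX code). */
final class NsRefCodec {
  /** Largest id that fits into 7 + 8 bits. */
  static final int MAX_ID = (1 << 15) - 1;

  /** Encodes an id as one byte (0..127) or two bytes (128..32767). */
  static byte[] encode(int id) {
    if (id < 0 || id > MAX_ID) {
      throw new IllegalArgumentException("id out of range: " + id);
    }
    if (id < 0x80) return new byte[] { (byte) id };
    // Top bit set: the low 7 bits carry the high part, the extra byte the low part.
    return new byte[] { (byte) (0x80 | (id >>> 8)), (byte) id };
  }

  /** Decodes the reference that starts at position pos. */
  static int decode(byte[] buf, int pos) {
    int first = buf[pos] & 0xFF;
    if ((first & 0x80) == 0) return first;          // short form
    return ((first & 0x7F) << 8) | (buf[pos + 1] & 0xFF);
  }
}
```

With this encoding, databases that stay below 128 namespaces would pay nothing extra, while larger ones could address up to 32768.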

Liam





Re: [basex-talk] many distinct namespaces

2018-11-02 Thread Christian Grün
>
> How hard can it be to move from a 16 bit architecture to a 32 bit
> architecture?


Yes, that can be tricky. Just one example that affects Java and BaseX:
Java arrays are limited to 2^31 - 1 entries. Although 64-bit CPUs became
popular a long time ago, I don’t believe this limit will ever be lifted in
a future version of Java. Right now, we use simple integer offsets to
address nodes in main-memory database instances. If we decide to introduce
support for more than 2 billion nodes per database, we’d need to use
additional redirections (which is indeed possible, but requires more
effort than replacing int with long values).
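
As an illustration of what such a redirection could look like, here is a minimal sketch of a long-indexed int array built from ordinary int[] pages. The class name and the page size are assumptions made for this example; it is not taken from the BaseX code base.

```java
/** Sketch: a long-indexed int array backed by int[] pages (hypothetical). */
final class PagedIntArray {
  private static final int PAGE_BITS = 20;              // 2^20 entries per page, chosen arbitrarily
  private static final int PAGE_SIZE = 1 << PAGE_BITS;
  private static final long PAGE_MASK = PAGE_SIZE - 1;

  private final int[][] pages;
  private final long size;

  PagedIntArray(long size) {
    if (size < 0) throw new IllegalArgumentException("negative size");
    this.size = size;
    // Number of pages, rounded up; the pages themselves are allocated lazily.
    pages = new int[Math.toIntExact((size + PAGE_SIZE - 1) >>> PAGE_BITS)][];
  }

  long size() { return size; }

  int get(long index) {
    check(index);
    int[] page = pages[(int) (index >>> PAGE_BITS)];
    return page == null ? 0 : page[(int) (index & PAGE_MASK)];
  }

  void set(long index, int value) {
    check(index);
    int p = (int) (index >>> PAGE_BITS);
    if (pages[p] == null) pages[p] = new int[PAGE_SIZE];
    pages[p][(int) (index & PAGE_MASK)] = value;
  }

  private void check(long index) {
    if (index < 0 || index >= size) throw new IndexOutOfBoundsException("index: " + index);
  }
}
```

Every access now costs one extra array lookup, which is the kind of overhead a plain int[] avoids.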

Maybe we could start a little new database project from scratch? ;) Various
issues could then be solved more portably (but the switch to 128-bit
architectures is probably still far away, so we could probably stick with
64 bit…).


Re: [basex-talk] many distinct namespaces

2018-11-02 Thread Imsieke, Gerrit, le-tex
One approach to avoid migration and backwards compatibility issues would 
be to support a standard storage and an extended storage side by side.
The storage and query functions would know beforehand which layout 
variant the current database is in, and they could use the appropriate 
optimized functions.
If dynamic lookup of these layout-specific functions would be too 
costly, maybe providing two separate binaries (classic and extended 
storage) might be an option. I cannot fathom how many pieces of code 
need to be modified in order to be able to maintain a common codebase 
for both layouts.
I’m certainly naïve in this regard because back in the day, I thought: 
How hard can it be to move from a 16-bit architecture to a 32-bit 
architecture?
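
A rough sketch of the side-by-side idea: the layout variant is read once from the database header, and all record access goes through it. The enum, the version numbers, the 32-byte extended record, and the offset of the namespace reference are all assumptions for illustration, not BaseX internals.

```java
/** Two record layouts living side by side (hypothetical, not BaseX code). */
enum StorageLayout {
  CLASSIC(16, 1),   // 16-byte records, 1-byte namespace reference
  EXTENDED(32, 2);  // assumed: 32-byte records, 2-byte namespace reference

  /** Assumed position of the namespace reference inside a record. */
  static final int NS_OFFSET = 3;

  final int recordSize;
  final int nsBytes;

  StorageLayout(int recordSize, int nsBytes) {
    this.recordSize = recordSize;
    this.nsBytes = nsBytes;
  }

  /** Number of distinct namespace references this layout can express. */
  int maxNamespaces() { return 1 << (8 * nsBytes); }

  /** Reads the big-endian namespace reference of the given node. */
  int readNamespace(byte[] table, int node) {
    int pos = node * recordSize + NS_OFFSET;
    int value = 0;
    for (int i = 0; i < nsBytes; i++) value = (value << 8) | (table[pos + i] & 0xFF);
    return value;
  }

  /** Picks the layout from a (hypothetical) version field in the database header. */
  static StorageLayout forVersion(int version) {
    switch (version) {
      case 1: return CLASSIC;
      case 2: return EXTENDED;
      default: throw new IllegalArgumentException("unknown layout version: " + version);
    }
  }
}
```

Whether a lookup like this is cheap enough on the hot path is exactly the open question raised above.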


Gerrit


--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsi...@le-tex.de, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer / Managing Directors:
Gerrit Imsieke, Svea Jelonek, Thomas Schmidt




Re: [basex-talk] many distinct namespaces

2018-11-02 Thread Christian Grün
Hi Gerrit,

thanks for your generous offer to sponsor the requested feature. I am
ashamed to confirm that it has been a long time since this issue was
opened, and it is still not closed. You are asking how much money would
be required to get this fixed. I’m honestly not sure. Maybe I would
rather ask for 3 or 4 weeks of “leisure time”, or, even better, get a
proposal into my hands for how this could be resolved without compromising
backward compatibility.

Some more details: The current storage layout per node has been fixed
to 16 bytes. One byte (8 bits) is reserved for the namespace
reference. The other 15 bytes (minus a few unused bits) are reserved
for other references and flags. We could extend the storage to 24 or
32 bits. As a result, the central database main table would get
larger, so this would affect both old databases (that need to be
imported) and the overall performance of the system. If we decide to
take this step, we could indeed overcome several of the current
limitations.
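
The arithmetic behind these numbers can be sketched as follows. The node count is made up, and the 24- and 32-byte record sizes are only one possible reading of the extension mentioned above; the helper names are hypothetical.

```java
/** Back-of-the-envelope numbers for the node table (illustrative only). */
final class NodeTableMath {
  /** Distinct values that fit into the given number of bits (bits < 63). */
  static long distinctValues(int bits) { return 1L << bits; }

  /** Size of the main table in bytes for a fixed record size. */
  static long tableBytes(long nodes, int bytesPerNode) {
    return Math.multiplyExact(nodes, (long) bytesPerNode);
  }

  public static void main(String[] args) {
    // One byte for the namespace reference gives the limit from the error message.
    System.out.println("namespaces with 8 bits: " + distinctValues(8));
    long nodes = 100_000_000L;  // hypothetical database size
    for (int size : new int[] { 16, 24, 32 }) {
      System.out.println(size + " bytes/node: " + tableBytes(nodes, size) + " bytes");
    }
  }
}
```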

Any volunteers out there who are ready for the challenge?
Christian





Re: [basex-talk] many distinct namespaces

2018-10-31 Thread Imsieke, Gerrit, le-tex

Hi Sergei,

The corresponding issue will turn 5 next March: 
https://github.com/BaseXdb/basex/issues/902


If you are an XML developer who wants to index all the XML, XSLT, XProc, 
RNG, XSD, Schematron, etc. files on your hard disk in an XML database, 
chances are that you’ll need more than 256 namespaces.


I’m willing to shell out up to 1,200 Euros (plus VAT) out of my own 
pocket for this feature. Any other funders?
Christian, how much do we need to raise collectively for you to 
prioritize storage layout redesign?


Gerrit

On 31.10.2018 22:47, Сергей Чесноков wrote:
> Hi all,
>
> Is it possible to bypass the following restriction? I cannot change
> "ep_ins_med_q.xsd" (it is part of the Central Bank XBRL schema).
>
> BaseX 9.1
>
> Command:
> CREATE DB bfo
> D:/portal/xbrl/CBR/final_1_3_1/Taxonomy_1_3_1/www.cbr.ru/xbrl/bfo/
>
> Error:
> "D:/portal/xbrl/CBR/final_1_3_1/Taxonomy_1_3_1/www.cbr.ru/xbrl/bfo/rep/2018-03-31/ep/ep_ins_med_q.xsd""ep_ins_med_q.xsd"
> (Line 21): Too many distinct namespaces (limit: 256).
>
> Best regards, Sergei.

