Re: [basex-talk] many distinct namespaces
On Wed, 2019-01-16 at 15:31 +0100, Christian Grün wrote:
> Hi Liam,
>
> > did this ever happen?
>
> What exactly? ;)

Sorry! Support for more than 256 namespaces in one db.

--
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Web slave for vintage clipart http://www.fromoldbooks.org/
Re: [basex-talk] many distinct namespaces
Hi Liam,

> did this ever happen?

What exactly? ;)

> * reserve 7 bits for the namespace, and 1 bit for "uses extended
>   namespace". In the top-bit-set objects only, add an extra byte.
> * Use the first 2 bytes of the element name :)

There are a few online resources with details on our chosen storage layout [1,2,3]. I think that we pretty much use every available bit for storing references and flags (except e.g. for text nodes, which have less metadata than elements or attributes), but maybe there’s indeed something specific that we could optimize? More suggestions are welcome.

Cheers
Christian

[1] http://files.basex.org/publications/Gruen%20[2010],%20Storing%20and%20Querying%20Large%20XML%20Instances.pdf
[2] http://docs.basex.org/wiki/Storage_Layout
[3] https://github.com/BaseXdb/basex/blob/master/basex-core/src/main/java/org/basex/data/Data.java
Re: [basex-talk] many distinct namespaces
did this ever happen?

On Fri, 2018-11-02 at 17:25 +0100, Christian Grün wrote:
> Some more details: The current storage layout per node has been fixed
> to 16 bytes. One byte (8 bits) is reserved for the namespace
> reference.

Here are a couple of hacky approaches in the spirit of brainstorming ;)

* reserve 7 bits for the namespace, and 1 bit for "uses extended namespace". In the top-bit-set objects only, add an extra byte.
* Use the first 2 bytes of the element name :)

Liam
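
The first suggestion above (7 bits plus a "uses extended namespace" flag) amounts to a variable-length encoding of the namespace reference. Below is a minimal Python sketch of how such an encoding might look. The names `encode_ns` and `decode_ns`, and the choice of putting the remaining bits into exactly one extra byte, are assumptions made for illustration; this is not BaseX's actual storage format.

```python
EXT_FLAG = 0x80  # top bit: "uses extended namespace"
LOW_MASK = 0x7F  # low 7 bits of the namespace id


def encode_ns(ns_id: int) -> bytes:
    """Encode a namespace id in one byte, or in two if it needs more than 7 bits."""
    if not 0 <= ns_id < (1 << 15):
        raise ValueError("namespace id out of range for this sketch")
    if ns_id <= LOW_MASK:
        return bytes([ns_id])
    # Set the flag, keep the low 7 bits here, store the rest in an extra byte.
    return bytes([EXT_FLAG | (ns_id & LOW_MASK), ns_id >> 7])


def decode_ns(buf: bytes) -> int:
    """Decode a namespace id written by encode_ns (hypothetical helper)."""
    first = buf[0]
    if first & EXT_FLAG:
        return (first & LOW_MASK) | (buf[1] << 7)
    return first
```

Under these assumptions, ids up to 127 still fit in a single byte and ids up to 32767 take two. The sketch leaves open where the extra byte would live in a record of fixed size, which is the hard part of the proposal.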
Re: [basex-talk] many distinct namespaces
> > How hard can it be to move from a 16 bit architecture to a 32 bit
> > architecture?

Yes, that can be tricky. Just one example that affects Java and BaseX: Java arrays are limited to 2^31 entries. Although 64 bit CPUs became popular a long time ago, I don’t believe this limit will ever be lifted in a future version of Java. Right now, we use simple integer offsets to address nodes in main-memory database instances. If we decide to introduce support for more than 2 billion nodes per database, we’d need to use additional redirections (which is possible indeed, but requires some more effort than replacing int with long values).

Maybe we could start a little new database project from scratch? ;) Various issues could then be solved in a more portable way (but the switch to 128 bit architectures is probably still far away, so we could probably stick with 64 bit…).
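
The "additional redirections" mentioned above could, for instance, take the form of a two-level table: a wide index is split into a page number and an offset within that page, so no single array has to hold more than one page of entries. The Python sketch below only illustrates that idea; the class `PagedArray`, its page size, and its lazy allocation are assumptions, not a description of BaseX internals.

```python
class PagedArray:
    """Integer table addressed by a wide index split into (page, offset)."""

    def __init__(self, page_bits: int = 16):
        self.page_bits = page_bits
        self.page_size = 1 << page_bits
        self.mask = self.page_size - 1
        self.pages = {}  # page number -> list of values, allocated lazily

    def _locate(self, index: int):
        # High bits select the page, low bits select the slot inside it.
        return index >> self.page_bits, index & self.mask

    def set(self, index: int, value: int) -> None:
        page, offset = self._locate(index)
        if page not in self.pages:
            self.pages[page] = [0] * self.page_size
        self.pages[page][offset] = value

    def get(self, index: int) -> int:
        page, offset = self._locate(index)
        chunk = self.pages.get(page)
        return 0 if chunk is None else chunk[offset]
```

Every access pays for one extra lookup, which is the kind of cost that makes this more work than swapping int for long.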
Re: [basex-talk] many distinct namespaces
One approach to avoid migration and backwards compatibility issues would be to support a standard storage and an extended storage side by side. The storage and query functions would know beforehand which layout variant the current database is in, and they could use the appropriate optimized functions. If dynamic lookup of these layout-specific functions were too costly, providing two separate binaries (classic and extended storage) might be an option.

I cannot fathom how many pieces of code would need to be modified in order to maintain a common codebase for both layouts. I’m certainly naïve in this regard because back in the day, I thought: How hard can it be to move from a 16 bit architecture to a 32 bit architecture?

Gerrit

On 02.11.2018 17:25, Christian Grün wrote:
> Maybe I would rather ask for 3, 4 weeks of “leisure time”, or – even
> better – get a proposal into my hands how this could be resolved
> without compromising backward conformance.

--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsi...@le-tex.de, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer / Managing Directors:
Gerrit Imsieke, Svea Jelonek, Thomas Schmidt
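
To make the side-by-side idea in this message a bit more concrete, here is a rough Python sketch in which the layout variant is chosen once when a database is opened, and all later reads go through that layout object. Everything in it is hypothetical: the `Layout` class, the header flag values, the 24-byte extended record, and the position of the namespace reference at offset 0 are illustrative assumptions, not BaseX's real table format.

```python
import struct
from dataclasses import dataclass


@dataclass(frozen=True)
class Layout:
    name: str
    record_size: int  # bytes per node record
    ns_format: str    # struct format of the namespace reference
    ns_offset: int    # byte offset of the reference inside a record

    def max_namespaces(self) -> int:
        return 1 << (8 * struct.calcsize(self.ns_format))

    def read_ns(self, table: bytes, node: int) -> int:
        start = node * self.record_size + self.ns_offset
        return struct.unpack_from(self.ns_format, table, start)[0]


# Hypothetical variants: one-byte reference vs. two-byte reference.
CLASSIC = Layout("classic", 16, ">B", 0)
EXTENDED = Layout("extended", 24, ">H", 0)
LAYOUTS = {0: CLASSIC, 1: EXTENDED}


def open_layout(header_flag: int) -> Layout:
    """Pick the layout once, based on a flag stored with the database."""
    return LAYOUTS[header_flag]
```

Resolving the layout once per database, rather than per node access, is one way to keep the dynamic lookup cost low; whether that is cheap enough in practice is exactly the open question raised above.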
Re: [basex-talk] many distinct namespaces
Hi Gerrit,

thanks for your generous offer to sponsor the requested feature. I am ashamed to confirm that this issue was opened a long time ago and has not been closed yet.

You are asking how much money will be required to get this fixed. I’m not sure after all. Maybe I would rather ask for 3, 4 weeks of “leisure time”, or – even better – get a proposal into my hands how this could be resolved without compromising backward conformance.

Some more details: The current storage layout per node has been fixed to 16 bytes. One byte (8 bits) is reserved for the namespace reference. The other 15 bytes (minus a few unused bits) are reserved for other references and flags. We could extend the storage to 24 or 32 bits. As a result, the central database main table would get larger, so this would affect both old databases (that need to be imported) and the overall performance of the system. If we decide to take this step, we could indeed overcome several of the current limitations.

Any volunteers out there who are ready for the challenge?

Christian

On Wed, Oct 31, 2018 at 11:40 PM Imsieke, Gerrit, le-tex wrote:
> Christian, how much do we need to raise collectively for you to
> prioritize storage layout redesign?
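
The limit of 256 follows directly from the one-byte reference described in this message: 8 bits can distinguish 2^8 = 256 values. The Python sketch below shows how a table that hands out fixed-width references runs into that ceiling. The class `NamespaceTable` is hypothetical; only the wording of the error message is taken from the report in this thread.

```python
class NamespaceTable:
    """Assigns small integer references to namespace URIs (illustrative only)."""

    def __init__(self, ref_bits: int = 8):
        self.limit = 1 << ref_bits  # 8 bits -> 256 distinct references
        self.ids = {}

    def ref(self, uri: str) -> int:
        if uri not in self.ids:
            if len(self.ids) >= self.limit:
                raise OverflowError(
                    f"Too many distinct namespaces (limit: {self.limit})."
                )
            self.ids[uri] = len(self.ids)
        return self.ids[uri]
```

With `ref_bits=8`, the 257th distinct URI raises; with a hypothetical `ref_bits=16`, the ceiling would move to 65536, at the price of a wider reference in every record.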
Re: [basex-talk] many distinct namespaces
Hi Sergei,

The corresponding issue will turn 5 next March:
https://github.com/BaseXdb/basex/issues/902

If you are an XML developer who wants to index all the XML, XSLT, XProc, RNG, XSD, Schematron, etc. files on your hard disk in an XML database, chances are that you’ll need more than 256 namespaces.

I’m willing to shell out up to 1,200 Euros (plus VAT) out of my own pocket for this feature. Any other funders?

Christian, how much do we need to raise collectively for you to prioritize storage layout redesign?

Gerrit

On 31.10.2018 22:47, Сергей Чесноков wrote:
> Hi all,
>
> Is it possible to bypass the following restriction (I cannot change
> "ep_ins_med_q.xsd" (Central Bank XBRL schema))?
>
> BaseX 9.1
>
> Command:
> CREATE DB bfo D:/portal/xbrl/CBR/final_1_3_1/Taxonomy_1_3_1/www.cbr.ru/xbrl/bfo/
>
> Error:
> "D:/portal/xbrl/CBR/final_1_3_1/Taxonomy_1_3_1/www.cbr.ru/xbrl/bfo/rep/2018-03-31/ep/ep_ins_med_q.xsd""ep_ins_med_q.xsd"
> (Line 21): Too many distinct namespaces (limit: 256).
>
> Best regards, Sergei.
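
For anyone who wants to check in advance whether a set of files is likely to hit this limit, the following Python sketch collects the distinct namespace URIs declared across a number of XML documents, using the standard library's `xml.etree.ElementTree.iterparse` with `start-ns` events. The function name `distinct_namespaces` is made up for this example, and how BaseX itself counts namespaces may differ from this simple count of declared URIs.

```python
import io
import xml.etree.ElementTree as ET


def distinct_namespaces(sources):
    """Return the set of namespace URIs declared in the given XML sources.

    Each source may be a file path or a binary file-like object.
    """
    uris = set()
    for source in sources:
        # "start-ns" events yield (prefix, uri) for every declaration.
        for _event, (_prefix, uri) in ET.iterparse(source, events=("start-ns",)):
            uris.add(uri)
    return uris


sample = io.BytesIO(
    b'<a xmlns="urn:x" xmlns:p="urn:y"><p:b xmlns:q="urn:x"/></a>'
)
print(len(distinct_namespaces([sample])))
```

Passing a list of file paths instead of the in-memory sample gives a rough count to compare against 256 before running CREATE DB.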