Re: [basex-talk] Converting a Date

2018-11-02 Thread Joe Wicentowski
Hi Ron,

You might find Ryan Grimm's date-parser library module useful if you have a
larger range of date formats to handle:


https://github.com/marklogic-community/commons/blob/master/dates/date-parser.xqy

While it was written with some MarkLogic-specific code, I adapted it for
use with eXist (but haven't tested it with BaseX):


https://github.com/HistoryAtState/twitter/blob/master/modules/date-parser.xqm

Best,
Joe



Re: [basex-talk] Converting a Date

2018-11-02 Thread Ron Katriel
Hi Christian,

Much appreciated! I hardened the code (see below) since the dates (from
CT.gov) occasionally also have the day of the month (e.g., “March 21,
2014”). Currently the function is dropping the day in such cases but I will
look into capturing it in a future iteration.

Best,
Ron


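(: Relies on $MONTHS from Christian's message and on the FunctX module
   being available under the functx prefix. :)
declare function local:to-date($string) {
  (: Accept "March 2017" or "March 21, 2014". The patterns are anchored so
     that any other string yields the empty sequence. :)
  if (fn:matches($string, '^[A-Za-z]+ [0-9]+$') or
      fn:matches($string, '^[A-Za-z]+ [0-9]+, [0-9]+$'))
  then
    let $m := index-of($MONTHS, substring-before($string, ' '))
    let $y := xs:integer(functx:substring-after-last($string, ' '))
    return xs:date(string-join(
      (
        (: four-digit year; an empty picture string is not portable :)
        format-number($y, '0000'),
        format-number($m, '00'),
        '01'
      ),
      '-')
    )
  else
    ()
};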
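
One possible way to keep the day of the month, offered only as a rough sketch: the function name local:to-date-with-day, the regular expression, and the choice to return the empty sequence for unparseable input are assumptions, not something from this thread. It uses only standard XQuery 3.1 functions, so it should not depend on FunctX.

```xquery
declare variable $MONTHS := (
  'January', 'February', 'March', 'April', 'May', 'June',
  'July', 'August', 'September', 'October', 'November', 'December'
);

(: Hypothetical helper, not from the thread: keeps the day when present. :)
declare function local:to-date-with-day($string as xs:string) as xs:date? {
  (: $1 = month name, $3 = optional day, $4 = four-digit year :)
  let $pattern := '^([A-Za-z]+) (([0-9]{1,2}), )?([0-9]{4})$'
  let $s := normalize-space($string)
  where matches($s, $pattern)
  let $m := index-of($MONTHS, replace($s, $pattern, '$1'))
  (: an unmatched optional group is replaced by the empty string :)
  let $d := replace($s, $pattern, '$3')
  let $y := replace($s, $pattern, '$4')
  let $iso := string-join((
    $y,
    format-number($m, '00'),
    if ($d) then format-number(xs:integer($d), '00') else '01'
  ), '-')
  (: unknown month names and impossible days fail the cast and are dropped :)
  where $iso castable as xs:date
  return xs:date($iso)
};

local:to-date-with-day('March 2017'),
local:to-date-with-day('March 21, 2014')
```

With these two inputs the query should return 2017-03-01 and 2014-03-21. Strings that do not fit the pattern, such as an abbreviated month name, give the empty sequence rather than an error.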


Re: [basex-talk] Converting a Date

2018-11-02 Thread Christian Grün
Hi Ron,

If your timestamp is available in IETF format, you can use
fn:parse-ietf-date [1]. Otherwise, you’ll need to write a simple
function by yourself:

  declare variable $MONTHS := (
    'January', 'February', 'March', 'April', 'May', 'June',
    'July', 'August', 'September', 'October', 'November', 'December'
  );

  declare function local:to-date($string) {
    let $m := index-of($MONTHS, substring-before($string, ' '))
    let $y := xs:integer(substring-after($string, ' '))
    return xs:date(string-join((
      (: four-digit year; an empty picture string is not portable :)
      format-number($y, '0000'),
      format-number($m, '00'),
      '01'
    ), '-'))
  };
  local:to-date('March 2017')
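
For the IETF case mentioned at the top of this message, a call could look like the following. The timestamp is made up for illustration; only fn:parse-ietf-date itself comes from the XPath 3.1 function library.

```xquery
(: RFC 822/1123-style timestamp; the value is an invented example. :)
parse-ietf-date('Fri, 02 Nov 2018 21:09:00 GMT')
```

This should yield the xs:dateTime value 2018-11-02T21:09:00Z. A string such as 'March 2017' is not in IETF format and raises an error, which is why the custom function above is needed.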

Best,
Christian

[1] https://www.w3.org/TR/xpath-functions-31/#func-parse-ietf-date




[basex-talk] Converting a Date

2018-11-02 Thread Ron Katriel
Hi,

Is there a BaseX function for converting a string date in the form of
“March 2017” to xs:date or xs:dateTime?

Thanks,
Ron



Re: [basex-talk] many distinct namespaces

2018-11-02 Thread Christian Grün
>
> How hard can it be to move from a 16 bit architecture to a 32 bit
> architecture?


Yes, that can be tricky. Just one example that affects Java and BaseX:
Java arrays are limited to 2^31 - 1 entries. Although 64-bit CPUs became
popular a long time ago, I don’t believe this limit will ever be lifted in
a future version of Java. Right now, we use simple integer offsets to
address nodes in main-memory database instances. If we decide to introduce
support for more than 2 billion nodes per database, we’d need to use
additional redirections (which is indeed possible, but requires more
effort than replacing int with long values).

Maybe we could start a little new database project from scratch? ;) Various
issues could then be solved in a more portable way (but the switch to 128-bit
architectures is probably still far away, so we could probably stick with
64 bit…).


Re: [basex-talk] many distinct namespaces

2018-11-02 Thread Imsieke, Gerrit, le-tex
One approach to avoid migration and backwards compatibility issues would
be to support a standard storage and an extended storage side by side.
The storage and query functions would know beforehand which layout
variant the current database is in, and they could use the appropriate
optimized functions.
If dynamic lookup of these layout-specific functions were too
costly, maybe providing two separate binaries (classic and extended
storage) might be an option. I cannot fathom how many pieces of code
would need to be modified in order to maintain a common codebase
for both layouts.
I’m certainly naïve in this regard because back in the day, I thought:
How hard can it be to move from a 16 bit architecture to a 32 bit
architecture?


Gerrit



--
Gerrit Imsieke
Geschäftsführer / Managing Director
le-tex publishing services GmbH
Weissenfelser Str. 84, 04229 Leipzig, Germany
Phone +49 341 355356 110, Fax +49 341 355356 510
gerrit.imsi...@le-tex.de, http://www.le-tex.de

Registergericht / Commercial Register: Amtsgericht Leipzig
Registernummer / Registration Number: HRB 24930

Geschäftsführer / Managing Directors:
Gerrit Imsieke, Svea Jelonek, Thomas Schmidt


Re: [basex-talk] many distinct namespaces

2018-11-02 Thread Christian Grün
Hi Gerrit,

thanks for your generous offer to sponsor the requested feature. I am
ashamed to confirm that this issue was opened a long time ago and has
not been closed yet. You are asking how much money will be
required to get this fixed. I’m not really sure. Maybe I would
rather ask for 3 or 4 weeks of “leisure time”, or, even better, get a
proposal into my hands how this could be resolved without compromising
backward compatibility.

Some more details: The current storage layout per node has been fixed
to 16 bytes. One byte (8 bits) is reserved for the namespace
reference. The other 15 bytes (minus a few unused bits) are reserved
for other references and flags. We could extend the storage to 24 or
32 bits. As a result, the central database main table would get
larger, so this would affect both old databases (that need to be
imported) and the overall performance of the system. If we decide to
take this step, we could indeed overcome several of the current
limitations.
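
To make the arithmetic behind the limit concrete: an 8-bit reference can distinguish 2^8 = 256 values, which matches the limit reported in the error message. The snippet below is only an illustration. The function name local:capacity is made up, and the wider reference sizes are assumed for comparison, not a statement about what a redesigned layout would use.

```xquery
(: Hypothetical helper: number of distinct values a reference of
   $bits bits can address. :)
declare function local:capacity($bits as xs:integer) as xs:integer {
  xs:integer(math:pow(2, $bits))
};

(: 8 bits is the current namespace reference; 16, 24 and 32 are
   assumed alternatives, listed only for comparison. :)
for $bits in (8, 16, 24, 32)
return $bits || ' bits: ' || local:capacity($bits)
```

The first line of the result should read "8 bits: 256"; a 16-bit reference would already allow 65536 distinct namespaces.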

Any volunteers out there who are ready for the challenge?
Christian



On Wed, Oct 31, 2018 at 11:40 PM Imsieke, Gerrit, le-tex wrote:
>
> Hi Sergei,
>
> The corresponding issue will turn 5 next March:
> https://github.com/BaseXdb/basex/issues/902
>
> If you are an XML developer who wants to index all the XML, XSLT, XProc,
> RNG, XSD, Schematron, etc. files on your hard disk in an XML database,
> chances are that you’ll need more than 256 namespaces.
>
> I’m willing to shell out up to 1,200 Euros (plus VAT) out of my own
> pockets for this feature. Any other funders?
> Christian, how much do we need to raise collectively for you to
> prioritize storage layout redesign?
>
> Gerrit
>
> On 31.10.2018 22:47, Сергей Чесноков wrote:
> >
> > Hi all,
> >
> > Is it possible to bypass the following restriction (I cannot change
> > "ep_ins_med_q.xsd" (Central Bank xbrl scheme))?:
> >
> > BaseX 9.1
> >
> > Command:
> > CREATE DB bfo
> > D:/portal/xbrl/CBR/final_1_3_1/Taxonomy_1_3_1/www.cbr.ru/xbrl/bfo/
> > Error:
> > "D:/portal/xbrl/CBR/final_1_3_1/Taxonomy_1_3_1/www.cbr.ru/xbrl/bfo/rep/2018-03-31/ep/ep_ins_med_q.xsd""ep_ins_med_q.xsd"
> > (Line 21): Too many distinct namespaces (limit: 256).
> >
> > Best regards, Sergei.