[basex-talk] More Diacritic Questions

2014-11-23 Thread Chris Yocum
Hi Everyone, I am rather confused again about diacritic handling in BaseX. For instance, with Full Text turned on, a word like athgabáil will match both athgabail and athgabáil with diacritics insensitive, which is what I would expect. However, if

Re: [basex-talk] More Diacritic Questions

2014-11-23 Thread Christian Grün
Hi Chris, Thanks for the observation. I can confirm that some characters like ṡ (U+1E61) do not seem to be properly normalized yet. I have added an issue for that [1], and I hope I will soon have it fixed. If you encounter some other surprising behavior like this, feel free to tell us. Best,

Re: [basex-talk] More Diacritic Questions

2014-11-23 Thread Chris Yocum
Hi Christian, Thanks for letting me know! I also need ḟ (U+1E1F). All the best, Chris On Sun, Nov 23, 2014 at 06:22:39PM +0100, Christian Grün wrote: Hi Chris, Thanks for the observation. I can confirm that some characters like ṡ (U+1E61) do

[basex-talk] Experiencing unexpected fallback to Xalan with xslt:transform

2014-11-23 Thread Marc van Grootel
Hi, I was using an XSLT 2 stylesheet to transform xqdoc XML to Markdown. I already had a stylesheet that did the same with HTML. This stylesheet is XSLT 2 and uses Saxon HE which is located inside the BaseX lib directory. Saxon HE is picked up just fine with xslt:transform and

Re: [basex-talk] Full text score with or

2014-11-23 Thread Christian Grün
Hi Andy, Thanks, that works for me. I always prefer less complex code :-) so it would be nice if this feature made a return at some point. So it did: In the latest snapshot, scores will again be propagated when using and, or, and predicates [2]. Cheers, Christian [1]

Re: [basex-talk] Experiencing unexpected fallback to Xalan with xslt:transform

2014-11-23 Thread Marc van Grootel
Hmm, removing all functions and named templates didn't help either. Something is causing my HTML conversion stylesheet to use Saxon, whereas the other stylesheet insists on using Xalan. Here's another error I got when I added a function and called it. [bxerr:BXSL0001] ERROR: 'Cannot

Re: [basex-talk] Experiencing unexpected fallback to Xalan with xslt:transform

2014-11-23 Thread Marc van Grootel
Stop the press. No, really. I probably have to stop now. I was switching BaseX versions and used a version where I hadn't added Saxon yet. The HTML stylesheet I used didn't use any XSLT 2 features. Face palm. Sorry for wasting your time. Can I get Google to forget this ;-) --Marc

Re: [basex-talk] Experiencing unexpected fallback to Xalan with xslt:transform

2014-11-23 Thread Christian Grün
> Can I get Google to forget this ;-)
Never ever, sorry…

Re: [basex-talk] More Diacritic Questions

2014-11-23 Thread Christian Grün
Hi Graydon, I just had a look. In BaseX, "without diacritics" can be explained by a single, glorious mapping table [1]. It's quite obvious that there are just too many cases that are not covered by this mapping. We introduced this solution in the very beginnings of our full-text

Re: [basex-talk] More Diacritic Questions

2014-11-23 Thread Christian Grün
I just found a mapping table proposed by John Cowan [1]. It's already pretty old, so it doesn't cover newer Unicode versions, but it's surely better than our current solution. [1] http://www.unicode.org/mail-arch/unicode-ml/y2003-m08/0047.html On Sun, Nov 23, 2014 at 11:19 PM, Christian Grün

Re: [basex-talk] More Diacritic Questions

2014-11-23 Thread Graydon Saunders
Hi Christian -- That is indeed a glorious table! :) Unicode defines whether or not a character has a decomposition; so e-with-acute, U+00E9, decomposes into U+0065 + U+0301 (an e and a combining acute accent.) I think the presence of a decomposition is a recoverable character property in Java.
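
To make the decomposition idea concrete, here is a minimal Java sketch, assuming the JDK's java.text.Normalizer is an acceptable way to get at canonical decompositions. The class and method names (DiacriticFolding, hasDecomposition, stripDiacritics) are made up for illustration; this is not how BaseX implements diacritics handling.

```java
import java.text.Normalizer;

public class DiacriticFolding {

    // Hypothetical helper: true if the code point changes under canonical
    // decomposition (NFD), i.e. Unicode defines a decomposition for it.
    static boolean hasDecomposition(int codePoint) {
        String s = new String(Character.toChars(codePoint));
        return !Normalizer.isNormalized(s, Normalizer.Form.NFD);
    }

    // Hypothetical helper: decompose to NFD, then drop all combining marks
    // (Unicode general category M), leaving only the base characters.
    static String stripDiacritics(String input) {
        String decomposed = Normalizer.normalize(input, Normalizer.Form.NFD);
        return decomposed.replaceAll("\\p{M}+", "");
    }

    public static void main(String[] args) {
        // U+00E1 (a with acute) decomposes into U+0061 + U+0301.
        System.out.println(stripDiacritics("athgab\u00E1il")); // athgabail
        // U+1E61 (s with dot above) and U+1E1F (f with dot above).
        System.out.println(stripDiacritics("\u1E61\u1E1F"));   // sf
        System.out.println(hasDecomposition(0x00E9));          // true
        System.out.println(hasDecomposition('e'));             // false
    }
}
```

One caveat with this approach: letters such as ø (U+00F8) have no canonical decomposition, so they pass through unchanged and would still need an explicit mapping.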

Re: [basex-talk] More Diacritic Questions

2014-11-23 Thread Christian Grün
Hi Chris, I am glad to report that the latest snapshot of BaseX [1] now provides much better support for diacritical characters. Please find more details in my next mail to Graydon. Hope this helps, Christian [1] http://files.basex.org/releases/latest/

Re: [basex-talk] More Diacritic Questions

2014-11-23 Thread Christian Grün
Hi Graydon, Thanks for your detailed reply, much appreciated. For today, I decided to choose a pragmatic solution that provides support for many more cases than before. I have added some more (glorious) mappings motivated by John Cowan's mail, which can now be found in a new class [1]. However,