Re: [basex-talk] diacritics sensitive not working

2018-09-03 Thread Graydon Saunders
Let's suppose you've got a map like: (and that by just typing this into the email I haven't left in any really horrible typos!) let $drugInfo as map(xs:string,element()) := map:merge( for $element in collection('newDrugInfo')/descendant::infoElement let $name as xs:string := (: whatever you do

Re: [basex-talk] diacritics sensitive not working

2018-09-02 Thread Ron Katriel
Hi Graydon, Thanks for the suggestion. Could you provide sample code to help with this? If needed I can share the relevant BaseX snippet. Best, Ron > On Sep 2, 2018, at 9:16 PM, Graydon Saunders wrote: > > Maps that reference nodes are pointers, rather than copies. It sounds like > you

Re: [basex-talk] diacritics sensitive not working

2018-09-02 Thread Graydon Saunders
Maps that reference nodes are pointers, rather than copies. It sounds like you could map every drug name to every "interesting" XML node that contains it using grouping during map creation and then just iterate on the keys to process the nodes. On Sun, Sep 2, 2018 at 4:52 PM Ron Katriel wrote:
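A minimal sketch of the grouping idea described above, not the code from the thread. The sample data and the element names (trials, trial, drug) are hypothetical stand-ins for the real databases; only map:merge, map:entry, map:keys and the group by clause are standard XQuery 3.1.

```xquery
xquery version "3.1";
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

(: Hypothetical sample data; the element names are assumptions. :)
let $docs :=
  <trials>
    <trial id="t1"><drug>Lenalidomide</drug></trial>
    <trial id="t2"><drug>Aspirin</drug></trial>
    <trial id="t3"><drug>Lenalidomide</drug></trial>
  </trials>

(: Group every trial node under the drug name it mentions.
   After "group by", $trial holds all trials of the current group. :)
let $byDrug as map(xs:string, element()*) := map:merge(
  for $trial in $docs/trial
  group by $name := string($trial/drug)
  return map:entry($name, $trial)
)

(: Iterate over the keys and process the nodes stored under each one. :)
for $name in map:keys($byDrug)
order by $name
return $name || ': ' || string-join($byDrug($name)/@id, ',')
```

The map values are references to the original nodes, so path steps such as /@id still work on them.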

Re: [basex-talk] diacritics sensitive not working

2018-09-02 Thread Ron Katriel
Hi Christian, As promised here is a summary of my experimentation. I replaced the expensive join with a map lookup and the program finished in 4 minutes vs. 1 hour using a naive loop over the two databases (the original 6 hours reported were due to overly aggressive virus scanning software, which
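A rough sketch of what replacing a nested-loop join with a map lookup can look like. This is not Ron's actual query: the data, the element names (drug, name, id, trial, intervention) and the lower-case key normalization are all assumptions made for illustration.

```xquery
xquery version "3.1";
declare namespace map = "http://www.w3.org/2005/xpath-functions/map";

(: Hypothetical stand-ins for the drug and clinical-trial databases. :)
let $drugs :=
  <drugs>
    <drug><name>Lenalidomide</name><id>D1</id></drug>
    <drug><name>Aspirin</name><id>D2</id></drug>
  </drugs>
let $trials :=
  <trials>
    <trial id="t1"><intervention>lenalidomide</intervention></trial>
    <trial id="t2"><intervention>Placebo</intervention></trial>
  </trials>

(: Build the lookup table once: lower-cased drug name -> drug element. :)
let $index := map:merge(
  for $d in $drugs/drug
  return map:entry(lower-case($d/name), $d)
)

(: One map lookup per trial replaces the loop over every drug. :)
for $t in $trials/trial
let $hit := $index(lower-case($t/intervention))
where exists($hit)
return string($t/@id) || ',' || string($hit/id)
```

Because lower-case() leaves diacritics untouched, a lookup keyed this way only matches names whose diacritics agree. If two drugs share a key, map:merge keeps the first entry by default.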

Re: [basex-talk] diacritics sensitive not working

2018-08-04 Thread Ron Katriel
Hi Christian, Thanks for the advice. The BaseX engine is phenomenal so I realized quickly that the problem was performing a naive cross product. Since this query is run only once a month (to serialize XML to CSV) and applied to new data (DB) each time, a BaseX map will likely be the most

Re: [basex-talk] diacritics sensitive not working

2018-08-04 Thread Christian Grün
Hi Ron, > I believe the slow execution may be due to a combinatorial issue: the cross > product of 280,000 clinical trials and ~10,000 drugs in DrugBank (not > counting synonyms). Yes, this sounds like a pretty expensive operation. Having maps (XQuery, Java) will be much faster indeed. As

Re: [basex-talk] diacritics sensitive not working

2018-08-03 Thread Ron Katriel
Christian, Thanks for sharing that. I assumed all along that this happens automatically. Anyway, I ran my query (for one drug, to save time) and see the following in the Info view: apply text index for "Lenalidomide". I believe the slow execution may be due to a combinatorial issue: the cross

Re: [basex-talk] diacritics sensitive not working

2018-08-03 Thread Christian Grün
Our documentation should help you here: http://docs.basex.org/wiki/Indexes Ron Katriel wrote on Fri, Aug 3, 2018, 23:20: > Hi Christian, > > Yes, I created a full-text index when the databases were loaded (see the > commands below). I also verified that FTINDEX is true for both databases

Re: [basex-talk] diacritics sensitive not working

2018-08-03 Thread Ron Katriel
Hi Gerrit, Thanks for the suggestions. I would like to retain the original diacritics (for output purposes) but only match them when warranted (e.g., match acétazolamide to acétazolamide, but not acétazolamide to acetazolamide). I am looking for a simple solution that does not involve modifying

Re: [basex-talk] diacritics sensitive not working

2018-08-03 Thread Ron Katriel
Hi Christian, Yes, I created a full-text index when the databases were loaded (see the commands below). I also verified that FTINDEX is true for both databases (in the GUI under Database > Open & Manage). How do I ensure that my query is rewritten for index access? Thanks, Ron SET FTINDEX

Re: [basex-talk] diacritics sensitive not working

2018-08-03 Thread Christian Grün
Hi Ron, Did you a) create a full-text index for your data and b) ensure that your query is rewritten for index access? Best, Christian On Fri, Aug 3, 2018 at 2:39 PM Ron Katriel wrote: > > Christian, > > Adding diacritics sensitive slows execution by a factor of 3. My script > (fragment

Re: [basex-talk] diacritics sensitive not working

2018-08-03 Thread Imsieke, Gerrit, le-tex
Hi Ron, You can add an extra element (or attribute) to the content when importing or modifying it. (Or another document in another database if you like – you can create and later find such an index document by giving it the same db:path as the original document.) In this extra database,

Re: [basex-talk] diacritics sensitive not working

2018-08-03 Thread Ron Katriel
Christian, Adding diacritics sensitive slows execution by a factor of 3. My script (fragment below), which joins two large databases, namely CT.gov and DrugBank, takes 2 hours without the diacritics sensitive constraint but 6 hours with it. Given the combinatorics

Re: [basex-talk] diacritics sensitive not working

2018-08-01 Thread Ron Katriel
Thanks, Christian. Strange, prior to contacting you and on a hunch, I tried adding the missing “using” keyword but still got the syntax error. Anyway, everything is good now! Best, Ron On August 1, 2018 at 3:57:51 AM, Christian Grün (christian.gr...@gmail.com) wrote: I have fixed the example in

Re: [basex-talk] diacritics sensitive not working

2018-08-01 Thread Christian Grün
I have fixed the example in the doc. Best, Christian On Wed, Aug 1, 2018 at 5:08 AM Ron Katriel wrote: > > Hi, > > The following from your website (docs.basex.org/wiki/Full-Text) appears to be > syntactically incorrect > > "'Äpfel' will not be found..." contains text "Apfel" diacritics

[basex-talk] diacritics sensitive not working

2018-07-31 Thread Ron Katriel
Hi, The following from your website (docs.basex.org/wiki/Full-Text) appears to be syntactically incorrect: "'Äpfel' will not be found..." contains text "Apfel" diacritics sensitive. In the BaseX GUI the keyword diacritics is underlined in red and the following error is reported: Unexpected end of
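For reference, a sketch of the expression with the "using" keyword that the later replies in this thread identify as missing. It assumes a processor that implements the XQuery Full Text extension, such as BaseX; the second expression is added here only for contrast and is not from the thread.

```xquery
(
  (: Diacritics-sensitive: "Äpfel" and "Apfel" are treated as different
     tokens, so per the documentation's own wording no match is expected. :)
  "'Äpfel' will not be found..." contains text "Apfel" using diacritics sensitive,

  (: Diacritics-insensitive matching, spelled out explicitly for comparison. :)
  "'Äpfel' will not be found..." contains text "Apfel" using diacritics insensitive
)
```

Each match option in a contains text expression is introduced by its own "using" keyword.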