Re: [basex-talk] king Henry VIII dalenda est

2020-10-05 Thread Liam R. E. Quin
On Mon, 2020-10-05 at 15:15 +0200, Christian Grün wrote:
> Hi Liam,
> 
> Did you find out why II et al. was ignored? Feel free to provide me
> with a little test case.

The markup in the surrogate files in the database turned out to be,
  Edward <sc>II</sc>

Changing to Edward II made it work.

Henry VIII was the same.

The query was, essentially,
let $term := "Henry VIII"
return ft:search("wobo", $term)/ancestor-or-self::p
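
A rough Python model of why the phrase search missed, assuming (as the fix above suggests) that the full-text index is matched one text node at a time, so a phrase has to sit inside a single text node. The <sc> element, the sample sentences and all function names are illustrative assumptions; this is not BaseX's implementation.

```python
import re
import xml.etree.ElementTree as ET

def tokens(s):
    """Lowercased word tokens; a crude stand-in for a full-text tokenizer."""
    return re.findall(r"\w+", s.lower())

def contains_phrase(text, phrase):
    """True if the tokens of phrase occur consecutively in text."""
    toks, want = tokens(text), tokens(phrase)
    return any(toks[i:i + len(want)] == want
               for i in range(len(toks) - len(want) + 1))

def text_nodes(elem):
    """Yield each text node under elem separately, in document order."""
    if elem.text:
        yield elem.text
    for child in elem:
        yield from text_nodes(child)
        if child.tail:
            yield child.tail

def search_per_text_node(xml, phrase):
    """Match the phrase against each text node on its own (assumed model)."""
    root = ET.fromstring(xml)
    return any(contains_phrase(t, phrase) for t in text_nodes(root))

marked_up = "<p>Henry <sc>VIII</sc> was king.</p>"   # hypothetical sample
plain = "<p>Henry VIII was king.</p>"                # same text, markup removed

print(search_per_text_node(marked_up, "Henry VIII"))  # False: phrase spans two text nodes
print(search_per_text_node(marked_up, "viii"))        # True: single token inside <sc>
print(search_per_text_node(plain, "Henry VIII"))      # True: one text node holds the phrase
```

Under this model "henry" and "viii" are each found on their own, while "Henry VIII" only matches once the markup is gone, which is the pattern reported in this thread.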

Liam

-- 
Liam Quin, https://www.delightfulcomputing.com/
Available for XML/Document/Information Architecture/XSLT/
XSL/XQuery/Web/Text Processing/A11Y training, work & consulting.
Barefoot Web-slave, antique illustrations:  http://www.fromoldbooks.org



Re: [basex-talk] king Henry VIII dalenda est

2020-10-05 Thread Christian Grün
Hi Liam,

Did you find out why II et al. was ignored? Feel free to provide me
with a little test case.

Cheers,
Christian



On Tue, Sep 29, 2020 at 3:55 AM Liam R. E. Quin wrote:
>
> On Mon, 2020-09-28 at 19:32 -0400, Liam R. E. Quin wrote:
> > At
> >
> > https://words.fromoldbooks.org/Search/
> >
> > a search for henry shows lots of matches, and so does a search for
> > henry i, but henry ii and henry vii are missing and so is henry viii.
>
> Actually it turns out (1) Henry VIII doesn't occur :) although the
> others do... and (2) in each case the roman numerals are surrounded by
> markup, <sc>III</sc> or whatever.
>
> So maybe it's behaving as expected!
>
> I'll remove the sc markup and see. Sorry for the noise.
>
> Liam


Re: [basex-talk] king Henry VIII dalenda est

2020-09-29 Thread Kristian Kankainen
Hi Liam,

Maybe you could translate the contents of those <sc> tags into the
corresponding Unicode symbols. At least I would hope that text
searching algorithms already deal with that kind of expansion, so that
they match vii with Ⅶ and ⅶ.

Best regards,
Kristian Kankainen
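
A small sketch of the kind of expansion suggested here, using Unicode compatibility normalization (NFKC) from Python's standard library. The thread does not say whether BaseX's full-text tokenizer folds these characters, so this only illustrates the idea; fold_compat is a hypothetical helper name.

```python
import unicodedata

def fold_compat(text: str) -> str:
    """Apply NFKC normalization, then casefold for case-insensitive matching."""
    return unicodedata.normalize("NFKC", text).casefold()

# U+2166 ROMAN NUMERAL SEVEN and U+2176 SMALL ROMAN NUMERAL SEVEN
print(fold_compat("\u2166"))          # vii
print(fold_compat("\u2176"))          # vii
# U+2167 ROMAN NUMERAL EIGHT
print(fold_compat("Henry \u2167"))    # henry viii
```

If both the indexed text and the query term were folded this way, a search for vii would match the single-character numerals as well as the plain letters.
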
One fine day, Mon, 28.09.2020 at 21:54, Liam R. E. Quin wrote:
> On Mon, 2020-09-28 at 19:32 -0400, Liam R. E. Quin wrote:
> > At
> > https://words.fromoldbooks.org/Search/
> > 
> > a search for henry shows lots of matches, and so does a search
> > for henry i, but henry ii and henry vii are missing and so is
> > henry viii.
> 
> Actually it turns out (1) Henry VIII doesn't occur :) although
> the others do... and (2) in each case the roman numerals are
> surrounded by markup, <sc>III</sc> or whatever.
>
> So maybe it's behaving as expected!
>
> I'll remove the sc markup and see. Sorry for the noise.
>
> Liam


Re: [basex-talk] king Henry VIII dalenda est

2020-09-28 Thread Liam R. E. Quin
On Mon, 2020-09-28 at 19:32 -0400, Liam R. E. Quin wrote:
> At
> 
> https://words.fromoldbooks.org/Search/
> 
> a search for henry shows lots of matches, and so does a search for
> henry i, but henry ii and henry vii are missing and so is henry viii.

Actually it turns out (1) Henry VIII doesn't occur :) although the
others do... and (2) in each case the roman numerals are surrounded by
markup, <sc>III</sc> or whatever.

So maybe it's behaving as expected!

I'll remove the sc markup and see. Sorry for the noise.

Liam




[basex-talk] king Henry VIII dalenda est

2020-09-28 Thread Liam R. E. Quin
At

https://words.fromoldbooks.org/Search/

a search for henry shows lots of matches, and so does a search for henry
i, but henry ii and henry vii are missing and so is henry viii.

I can search for viii and find Henry VIII and also Charles VIII, but I
can't search for Charles VIII either.

I can search for the king’s feet, and for Henry V, but not Charles II.

It looks like words ending in ii are invisible.

I'm using ft:search("wobo", $term)/ancestor-or-self::p

Might this be related to stemming? I have that turned off.

This is used to create the db (from the Perl API):

  # drop and recreate the database, then build its indexes
  t("drop db wobo");
  t("create db wobo");
  t("open wobo");
  $session->send("set chop false");
  $session->send("set ftindex true");
  $session->send("set updindex true");
  $session->send("set autooptimize true");

  txf("/home/liam/w/Search/wobo.xml");

  t("create index attribute");
  t("create index text");
  t("create index fulltext");
  t("optimize");
  t("close");
  t("quit");


where t() just does a $session->execute() on its argument after
printing a trace line, and txf does a delete followed by an add.

I can probably make a test case available if needed.

Liam


