Re: [sword-devel] Stem searching

Troy A. Griffitts Thu, 12 Jul 2012 10:54:11 -0700

Well, theoretically, yes. The logical syntax allows it, but it didn't work when I tried, and the Lucene FAQ says:

Leading wildcards (e.g. *ook) are *not* supported by the QueryParser by default. As of Lucene 2.1, they can be enabled by calling QueryParser.setAllowLeadingWildcard( true ). Note that this can be an expensive operation: it requires scanning the list of tokens in the index in its entirety to look for those that match the pattern.


So it would take flipping some flags and evaluating the performance.
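A minimal sketch of why the FAQ calls leading wildcards expensive, assuming the index keeps each field's terms in sorted order (Lucene's term dictionary does). A java.util.TreeSet stands in for the term dictionary; the class, method and field names are hypothetical and this is not Lucene's API. A trailing wildcard such as lem2@* can seek straight to its prefix and stop at the first non-matching term, while a pattern like *@mor1 has no fixed prefix and must examine every term.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.NavigableSet;
import java.util.TreeSet;

// Hypothetical stand-in for a field's sorted term dictionary (not Lucene's API).
class TermDictionaryDemo {
    // Number of terms the most recent lookup had to examine.
    static int lastExamined;

    // Trailing wildcard ("lem2@*"): seek to the prefix, then stop at the first
    // term that no longer starts with it, since sorted order rules out later hits.
    static List<String> prefixMatch(NavigableSet<String> terms, String prefix) {
        List<String> hits = new ArrayList<>();
        lastExamined = 0;
        for (String t : terms.tailSet(prefix, true)) {
            lastExamined++;
            if (!t.startsWith(prefix)) break;
            hits.add(t);
        }
        return hits;
    }

    // Leading wildcard ("*@mor1"): there is no fixed prefix to seek to,
    // so every term in the dictionary has to be examined.
    static List<String> suffixMatch(NavigableSet<String> terms, String suffix) {
        List<String> hits = new ArrayList<>();
        lastExamined = 0;
        for (String t : terms) {
            lastExamined++;
            if (t.endsWith(suffix)) hits.add(t);
        }
        return hits;
    }

    public static void main(String[] args) {
        NavigableSet<String> terms = new TreeSet<>(List.of(
            "lem1@mor1", "lem1@mor2", "lem2@mor1", "lem2@mor2", "lem3@mor1"));
        // prints "[lem2@mor1, lem2@mor2] examined 3"
        System.out.println(prefixMatch(terms, "lem2@") + " examined " + lastExamined);
        // prints "[lem1@mor1, lem2@mor1, lem3@mor1] examined 5"
        System.out.println(suffixMatch(terms, "@mor1") + " examined " + lastExamined);
    }
}
```

With five terms the difference is trivial, but the suffix scan grows with the entire vocabulary of the field, which for a lemma@morph field is every distinct lemma/morph pairing in the module.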

Troy


On 07/12/2012 07:12 PM, Daniel Owens wrote:

Troy,

I am excited about this kind of search capability. This is great work.
I have a question: will this solution also cover searching for a morph value for any lemma? It might look like:
morph:*@mor1

instead of

morph:lem1@mor1
In other words, I want to find all the masculine singular nouns, regardless of lemma.
Daniel

On 07/12/2012 10:01 AM, Troy A. Griffitts wrote:
Hey Chris,
A relational database will not contribute more to a solution than what we have available in Lucene. What I failed to get across in my last email, due to too much caffeine, was that a verse's declension data by itself is useless without being attached to the lemma which each morph code in the declension data modifies.
We have 2 things for each word:

root@declension

we refer to these as:

lemma@morph

Root, stem, and lemma are all synonyms in this discussion.
Currently in our Lucene index we have a field called 'lemma', so for a verse with 4 words, this field might look something like this:
lem1 lem2 lem3 lem4

and we can do searches for all verses with lem3

lemma:lem3
Great, but this ignores the declension data; e.g., was lem3 a 1st person or 2nd person form? Ignoring declension is usually desired when doing word studies, and it is why we have the 'lemma' Lucene index in the first place. You don't want to have to search for all forms of a word to do a word study.
... but sometimes you only care about 1 form of a word when doing a study, so how do we incorporate the declension information?
It would be useless to create a 'morph' field with contents for the same verse as:
mor1 mor2 mor3 mor4
In this scenario, you could construct a CLucene search using both fields like this:
lemma:lem2 morph:mor2
but this would not return what you want. It would return all verses which have a lem2 in the lemma field and a mor2 in the morph field, but not necessarily on the same word.
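To make the problem concrete, here is a small sketch with made-up data for a single three-word verse, treating both clauses as required as the message does. Checking the two fields independently accepts lem2 with mor2 even though mor2 belongs to a different word; pairing them into one lemma@morph token per word, as in the proposed 'morph' field, rejects it. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical verse data: position i of each list describes the i-th word.
class FieldAssociationDemo {
    // Two separate fields, both clauses required: each field is checked on its
    // own, so the lemma and the morph code may come from different words.
    static boolean separateFields(List<String> lemmas, List<String> morphs,
                                  String lemma, String morph) {
        return lemmas.contains(lemma) && morphs.contains(morph);
    }

    // Proposed combined field: one lemma@morph token per word.
    static List<String> morphTokens(List<String> lemmas, List<String> morphs) {
        List<String> tokens = new ArrayList<>();
        for (int i = 0; i < lemmas.size(); i++) {
            tokens.add(lemmas.get(i) + "@" + morphs.get(i));
        }
        return tokens;
    }

    // morph:lemma@morph only matches if the pair occurs on the same word.
    static boolean combinedField(List<String> lemmas, List<String> morphs,
                                 String lemma, String morph) {
        return morphTokens(lemmas, morphs).contains(lemma + "@" + morph);
    }

    public static void main(String[] args) {
        // lem2 is tagged mor3 here; mor2 belongs to lem1.
        List<String> lemmas = Arrays.asList("lem1", "lem2", "lem3");
        List<String> morphs = Arrays.asList("mor2", "mor3", "mor1");
        System.out.println(String.join(" ", morphTokens(lemmas, morphs))); // lem1@mor2 lem2@mor3 lem3@mor1
        System.out.println(separateFields(lemmas, morphs, "lem2", "mor2")); // true (false positive)
        System.out.println(combinedField(lemmas, morphs, "lem2", "mor2"));  // false
    }
}
```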
So... the proposed solution...
++++++++++++++++++++++++++
We have created a new field called 'morph' which will probably replace the lemma field and has data as:
lem1@mor1 lem2@mor2 lem3@mor3 lem4@mor4

This allows a Lucene search to be created like this:

morph:lem2@mor2
or, to get the functionality of the current 'lemma' field, which ignores declension, the equivalent search using the 'morph' field would be:
morph:lem2@*
This allows all kinds of queries, like: give me all verses which have lem1 and lem2 within 4 words of each other, where lem2 must have the declension mor2:
morph:"lem1@* lem2@mor2"~4
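A simplified, hypothetical model of how the combined field could answer morph:"lem1@* lem2@mor2"~4 for one verse. It treats '*' as any run of characters and '?' as exactly one character (Lucene's wildcard characters), and it measures slop the way the Lucene documentation describes it, as an edit distance: 0 when the second term directly follows the first, 2 when the two are swapped. It evaluates wildcards inside the phrase directly; whether a particular Lucene or CLucene query parser expands wildcards inside a quoted phrase is not something this sketch establishes. Names and data are made up.

```java
import java.util.Arrays;
import java.util.List;
import java.util.regex.Pattern;

// Simplified model of a two-term sloppy phrase over lemma@morph tokens (not Lucene code).
class ProximityDemo {
    // Wildcard term: '*' = any run of characters, '?' = exactly one character.
    static boolean wildcardMatch(String pattern, String token) {
        StringBuilder regex = new StringBuilder();
        for (char c : pattern.toCharArray()) {
            if (c == '*') regex.append(".*");
            else if (c == '?') regex.append('.');
            else regex.append(Pattern.quote(String.valueOf(c)));
        }
        return token.matches(regex.toString());
    }

    // True if some token matching 'first' and some other token matching 'second'
    // are within 'slop' moves of forming the exact phrase "first second".
    static boolean sloppyPhrase(List<String> tokens, String first, String second, int slop) {
        for (int i = 0; i < tokens.size(); i++) {
            if (!wildcardMatch(first, tokens.get(i))) continue;
            for (int j = 0; j < tokens.size(); j++) {
                if (j == i || !wildcardMatch(second, tokens.get(j))) continue;
                // Expected offset of 'second' is +1; distance from that is the edit count.
                if (Math.abs((j - i) - 1) <= slop) return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Made-up verse in the proposed lemma@morph form.
        List<String> verse = Arrays.asList(
            "lem1@mor1", "lem5@mor4", "lem6@mor1", "lem2@mor2", "lem4@mor4");
        System.out.println(sloppyPhrase(verse, "lem1@*", "lem2@mor2", 4)); // true
        System.out.println(sloppyPhrase(verse, "lem1@*", "lem2@mor3", 4)); // false: no lem2@mor3
    }
}
```

Under this measure a slop of 4 allows lem2 up to five positions after lem1, or up to three positions before it, so "within 4 words" is approximate. The same matcher handles a single-character wildcard such as *@PAI2?.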

Hope this makes things clearer, if there were any clouds :)

Troy

On 07/12/2012 02:17 PM, Chris Burrell wrote:
Thanks Troy. That helps put the task in perspective... An alternative might be to store both Strong's and morphology indexes in a relational database, then have a table mapping all the data together. I guess the mapping table would be based on one version of the Bible only.
Cheers
Chris
On 11 July 2012 01:09, Troy A. Griffitts <[email protected]> wrote:
    Chris,

    We've toyed around with the best way to add lemma+morph searching
    in SWORD but haven't finalized anything yet.

    Indexing morphology codes separately won't help. This would give you 2
    fields which need to be used together.

    For example, if you wish to find λογος only in the nominative
    within 3 words of any present, active, indicative, 2nd person
    singular or plural verb, you could not satisfy your search.

    Believe it or not, end users of tools like BibleWorks seem quite
    happy to learn odd syntax like:


    "λογος@* *@PAI2?"~3


    Of course, GUI tools to help them build that syntax are also
    desired.

    This is the direction we're heading, but it would require changing
    the lemma encoding from Strong's numbers to lexical forms.

    Presently we could nearly obtain this by building an index as
    (from the start of John 1.1):

    G1722@PREP G746@N-DSF G2258@V-IXI-3S

    But this would require users to know Strong's numbers rather than
    lexical forms, which would almost certainly need a GUI to help
    them build the search syntax.

    Hope this helps,

    Troy





    On 07/10/2012 11:41 PM, Chris Burrell wrote:
    Hello

    Does anyone know of, or has anyone tried, some kind of stem search
    with JSword? Is it implemented, or would we need to do a bit more work there?

    Chris



    _______________________________________________
    jsword-devel mailing list
    [email protected]
    http://www.crosswire.org/mailman/listinfo/jsword-devel
_______________________________________________
sword-devel mailing list: [email protected]
http://www.crosswire.org/mailman/listinfo/sword-devel
Instructions to unsubscribe/change your settings at above page