Synonyms for AND/OR/NOT operators
Hi!

What is the simplest way to add synonyms for AND/OR/NOT operators? I'd like to support two sets of operator words, so people can use either the original English operators or my custom ones for our local language.

Thank you for your attention!

Sanyi

To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
Re: Synonyms for AND/OR/NOT operators
On Dec 21, 2004, at 3:04 AM, Sanyi wrote:
> What is the simplest way to add synonyms for AND/OR/NOT operators? I'd like to support two sets of operator words, so people can use either the original English operators or my custom ones for our local language.

There are two options that I know of: 1) add synonyms during indexing and 2) add synonyms during querying. Generally this would be done using a custom analyzer.

If the synonym mappings are static and you don't mind a larger index, adding them during indexing avoids the complexity of rewriting the query. Injecting synonyms during querying allows the synonym mappings to change dynamically, though it does produce more complex queries.

Here's an example you'll find with the source code distribution of Lucene in Action which uses WordNet to look up synonyms.

Erik

p.s. I'm sensitive to over-marketing Lucene in Action in this forum as it would bother me to constantly see an advertisement. You can be sure that any mentions of it from me will coincide with concrete examples (which are freely available) that are directly related to questions being asked.

    % ant -emacs SynonymAnalyzerViewer
    Buildfile: build.xml

    check-environment:

    compile:

    build-test-index:

    build-perf-index:

    prepare:

    SynonymAnalyzerViewer:
    Using a custom SynonymAnalyzer, two fixed strings are analyzed with the results displayed. Synonyms, from the WordNet database, are injected into the same positions as the original words.

    See the Analysis chapter for more on synonym injection and position increments. The Tools and extensions chapter covers the WordNet feature found in the Lucene sandbox.

    Press return to continue...

    Running lia.analysis.synonym.SynonymAnalyzerViewer...
    1: [quick] [warm] [straightaway] [spry] [speedy] [ready] [quickly] [promptly] [prompt] [nimble] [immediate] [flying] [fast] [agile]
    2: [brown] [brownness] [brownish]
    3: [fox] [trick] [throw] [slyboots] [fuddle] [fob] [dodger] [discombobulate] [confuse] [confound] [befuddle] [bedevil]
    4: [jumps]
    5: [over] [o] [across]
    6: [lazy] [faineant] [indolent] [otiose] [slothful]
    7: [dogs]

    1: [oh]
    2: [we]
    3: [get] [acquire] [aim] [amaze] [arrest] [arrive] [baffle] [beat] [become] [beget] [begin] [bewilder] [bring] [can] [capture] [catch] [cause] [come] [commence] [contract] [convey] [develop] [draw] [drive] [dumbfound] [engender] [experience] [father] [fetch] [find] [fix] [flummox] [generate] [go] [gravel] [grow] [have] [incur] [induce] [let] [make] [may] [mother] [mystify] [nonplus] [obtain] [perplex] [produce] [puzzle] [receive] [scram] [sire] [start] [stimulate] [stupefy] [stupify] [suffer] [sustain] [take] [trounce] [undergo]
    4: [both]
    5: [kinds]
    6: [country] [state] [nationality] [nation] [land] [commonwealth] [area]
    7: [western] [westerly]
    8: [bb]

    BUILD SUCCESSFUL
    Total time: 10 seconds
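To make the "same position" idea in the viewer output a little more concrete, here is a small standalone sketch. It is not the SynonymAnalyzer from the book and it does not use Lucene's analysis API at all: `SynonymInjectionSketch`, `Token`, `inject` and `render` are hypothetical names, and the hand-written synonym table is only a stand-in for the WordNet lookup. The one idea it tries to show is that each original word opens a new position (increment 1) while its synonyms are stacked on that position (increment 0).

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration only; not Lucene code and not the book's analyzer. */
final class SynonymInjectionSketch {

    /** A term plus its position increment: 1 opens a new position, 0 shares the previous one. */
    static final class Token {
        final String term;
        final int positionIncrement;

        Token(String term, int positionIncrement) {
            this.term = term;
            this.positionIncrement = positionIncrement;
        }
    }

    /** Emits every word with increment 1, followed by its synonyms with increment 0. */
    static List<Token> inject(List<String> words, Map<String, List<String>> synonyms) {
        List<Token> tokens = new ArrayList<>();
        for (String word : words) {
            tokens.add(new Token(word, 1));
            for (String synonym : synonyms.getOrDefault(word, Collections.<String>emptyList())) {
                tokens.add(new Token(synonym, 0));
            }
        }
        return tokens;
    }

    /** Formats tokens one position per line, in the style of the viewer output. */
    static String render(List<Token> tokens) {
        StringBuilder sb = new StringBuilder();
        int position = 0;
        for (Token token : tokens) {
            if (token.positionIncrement > 0) {
                position += token.positionIncrement;
                if (sb.length() > 0) {
                    sb.append('\n');
                }
                sb.append(position).append(':');
            }
            sb.append(" [").append(token.term).append(']');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // Assumed, hand-written synonym table standing in for WordNet.
        Map<String, List<String>> synonyms = new HashMap<>();
        synonyms.put("brown", Arrays.asList("brownness", "brownish"));
        synonyms.put("over", Arrays.asList("o", "across"));
        System.out.println(render(inject(Arrays.asList("brown", "fox", "jumps", "over"), synonyms)));
    }
}
```

Because the synonyms share a position with the original word, a phrase query that matches the original word at that position can also match any of its synonyms there, which is the point of injecting them at increment 0 rather than appending them as extra words.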
Re: Synonyms for AND/OR/NOT operators
Hi!

I think we're talking about different things. My question is about using synonyms for the AND/OR/NOT operators, not about synonyms of words in the index.

For example, in some language:

    AND = AANNDD; OR = OORR; NOT = NNOOTT

So, the user can enter:

    (cat OR kitty) AND black AND tail

or:

    (cat OORR kitty) AANNDD black AANNDD tail

Both sets of operators must work. It must be some kind of query parser modification or parameterization, so it has nothing to do with the index.

I hope I was more specific now ;)

Thanx,
Sanyi
Re: Synonyms for AND/OR/NOT operators
Erik Hatcher writes:
> On Dec 21, 2004, at 3:04 AM, Sanyi wrote:
> > What is the simplest way to add synonyms for AND/OR/NOT operators? I'd like to support two sets of operator words, so people can use either the original English operators or my custom ones for our local language.
>
> There are two options that I know of: 1) add synonyms during indexing and 2) add synonyms during querying. Generally this would be done using a custom analyzer.

I guess you misunderstood the question. I think he wants to know how to create a query parser that understands something like 'a UND b' as well as 'a AND b', to support localized operator names (German in this case).

AFAIK that can only be done by copying the query parser's JavaCC source and adding the operators there. It shouldn't be difficult, though it's a bit ugly since it implies code duplication. And there will be no way of choosing the operators dynamically at runtime. One will need to have different query parsers for different languages.

Morus
Re: Synonyms for AND/OR/NOT operators
Wow, I really did misunderstand. My apologies.

Yes, you will need to fork QueryParser.jj and install JavaCC to build your custom parser. It should be pretty trivial to add alternatives to AND(+)/OR/NOT(-).

Erik
Re: Synonyms for AND/OR/NOT operators
Well, I guess I'd better recognize the operator synonyms and replace them with their original forms before passing the query to QueryParser. I don't feel comfortable tampering with Lucene's source code.

Anyway, thanx for the answers.

Sanyi
Re: Synonyms for AND/OR/NOT operators
Sanyi writes:
> Well, I guess I'd better recognize and replace the operator synonyms to their original format before passing them to QueryParser. I don't feel comfortable tampering with Lucene's source code.

Apart from knowing how to compile Lucene (including the JavaCC code generation), you should only need to change

    <DEFAULT> TOKEN : {
      <AND:       ("AND" | "&&") >
    | <OR:        ("OR" | "||") >
    | <NOT:       ("NOT" | "!") >

to

    <DEFAULT> TOKEN : {
      <AND:       ("AND" | "insert your version of and here" | "&&") >
    | <OR:        ("OR" | "insert your version of or here" | "||") >
    | <NOT:       ("NOT" | "insert your version of not here" | "!") >

in jakarta-lucene/src/java/org/apache/lucene/queryParser/QueryParser.jj

Replacing the operators before the query is parsed might be hard to do if you want to correctly handle cases like »"a AND b" OR c«, which is a query for the phrase "a AND b" or the token c.

Morus
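The pre-processing route Sanyi describes, with the quoted-phrase caveat above in mind, could be sketched roughly as follows. This is only an illustration under stated assumptions: `OperatorSynonyms`, `add` and `rewrite` are hypothetical names and not part of Lucene, the AANNDD/OORR/NNOOTT words are the made-up examples from earlier in the thread, and the scanner only knows about double-quoted phrases, parentheses, whitespace and backslash escapes. It does not attempt to cover the rest of the query syntax, so a real implementation would need more care.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical helper, not part of Lucene: maps localized operator words to AND/OR/NOT. */
final class OperatorSynonyms {
    private final Map<String, String> synonyms = new LinkedHashMap<>();

    /** Registers one localized word (case-sensitive) for a built-in operator. */
    OperatorSynonyms add(String localized, String operator) {
        synonyms.put(localized, operator);
        return this;
    }

    /** Replaces operator synonyms that occur as whole words outside double-quoted phrases. */
    String rewrite(String query) {
        StringBuilder out = new StringBuilder();
        StringBuilder word = new StringBuilder();
        boolean inQuotes = false;
        for (int i = 0; i < query.length(); i++) {
            char c = query.charAt(i);
            if (c == '\\' && i + 1 < query.length()) {
                // Escaped character: keep both characters as part of the current word.
                word.append(c).append(query.charAt(++i));
            } else if (c == '"') {
                flush(word, out, inQuotes);
                inQuotes = !inQuotes;
                out.append(c);
            } else if (Character.isWhitespace(c) || c == '(' || c == ')') {
                flush(word, out, inQuotes);
                out.append(c);
            } else {
                word.append(c);
            }
        }
        flush(word, out, inQuotes);
        return out.toString();
    }

    /** Writes the pending word, translated only when it lies outside a quoted phrase. */
    private void flush(StringBuilder word, StringBuilder out, boolean inQuotes) {
        if (word.length() == 0) {
            return;
        }
        String w = word.toString();
        out.append(inQuotes ? w : synonyms.getOrDefault(w, w));
        word.setLength(0);
    }

    public static void main(String[] args) {
        OperatorSynonyms ops = new OperatorSynonyms()
                .add("AANNDD", "AND").add("OORR", "OR").add("NNOOTT", "NOT");
        // prints: (cat OR kitty) AND black AND tail
        System.out.println(ops.rewrite("(cat OORR kitty) AANNDD black AANNDD tail"));
        // prints: "a AANNDD b" OR c   (the phrase is left untouched)
        System.out.println(ops.rewrite("\"a AANNDD b\" OORR c"));
    }
}
```

The rewritten string would then be handed to the stock QueryParser unchanged. Compared with editing QueryParser.jj, this keeps Lucene's source untouched and lets the synonym table be chosen at runtime, at the cost of maintaining a second, partial understanding of the query syntax outside the parser.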