Re: SynonymGraphFilter can't consume an incoming graph
I wonder what happens if you ensure that none of your synonyms contains a character that WordDelimiterGraphFilter (WDGF) cares about. The two filters would then operate on disjoint sets of tokens, and maybe they would (or could be made to) play nicely together.

Even if they don't (for example, if they detect a token graph and fail even though they could safely ignore it), you could perhaps write something using ConditionalTokenFilter that passes each token to one filter or the other, thereby keeping them separate.
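To make the routing idea concrete, here is a minimal plain-Java sketch. It is a toy model only and does not use Lucene's ConditionalTokenFilter API. The class name TokenRouter, the definition of a "character WDGF cares about" (anything that is not an ASCII letter or digit) and the two stand-in filter methods are all assumptions made for illustration. It also ignores token positions, which the real graph filters have to track.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.regex.Pattern;

// Toy model (hypothetical, not Lucene code): each token goes to exactly one
// of two processing steps, so the two steps never see each other's output.
public class TokenRouter {
    // Assumption: a "delimiter" is any run of non-alphanumeric ASCII characters.
    private static final Pattern DELIMITERS = Pattern.compile("[^\\p{Alnum}]+");

    public static boolean hasDelimiter(String token) {
        return DELIMITERS.matcher(token).find();
    }

    // Stand-in for a word-delimiter step: split the token on delimiters.
    public static List<String> splitOnDelimiters(String token) {
        List<String> parts = new ArrayList<>();
        for (String part : DELIMITERS.split(token)) {
            if (!part.isEmpty()) {
                parts.add(part);
            }
        }
        return parts;
    }

    // Stand-in for a synonym step: keep the token and append its synonyms.
    public static List<String> expandSynonyms(String token, Map<String, List<String>> synonyms) {
        List<String> out = new ArrayList<>();
        out.add(token);
        out.addAll(synonyms.getOrDefault(token, List.of()));
        return out;
    }

    // Route every token to one step or the other, never both.
    public static List<String> route(List<String> tokens, Map<String, List<String>> synonyms) {
        List<String> out = new ArrayList<>();
        for (String token : tokens) {
            if (hasDelimiter(token)) {
                out.addAll(splitOnDelimiters(token));
            } else {
                out.addAll(expandSynonyms(token, synonyms));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> synonyms = Map.of("tv", List.of("television"));
        // prints [wi, fi, tv, television]
        System.out.println(route(List.of("wi-fi", "tv"), synonyms));
    }
}
```

The routing only keeps the two steps apart if no synonym entry contains a delimiter character, which is the precondition stated above. Whether the real filters tolerate this arrangement would still need to be tested.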
Re: SynonymGraphFilter can't consume an incoming graph
Thanks Erick for your response.

So I guess the answer to Shawn Heisey's question [1]:

"Since multiple Graph filters cannot be used in an analysis chain, what is somebody running 8.0 supposed to do if they need both the WordDelimiter filter and Synonym filter in their analysis chain?"

is to have one analysis chain for the WordDelimiterGraphFilter and another for the SynonymGraphFilter, and then to query the two corresponding fields at the same time.

There is currently no better option or alternative. Am I right?

Kind regards
Patrick

[1] https://issues.apache.org/jira/browse/LUCENE-6664?focusedCommentId=16386294=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#comment-16386294
Re: SynonymGraphFilter can't consume an incoming graph
It's, well, undefined. As in nobody knows except that it'll be wrong. And exactly what the results are may change with any given release.

Best,
Erick

On Sun, Feb 10, 2019 at 10:48 AM lambda.coder lucene wrote:
> Hello,
>
> The Javadocs of SynonymGraphFilter says that it can’t consume an incoming graph and that the result will be undefined
>
> Is there any example that exhibits the limitations and what is meant by undefined ?
>
> Regards
> Patrick

-
To unsubscribe, e-mail: java-user-unsubscr...@lucene.apache.org
For additional commands, e-mail: java-user-h...@lucene.apache.org
Re: SynonymGraphFilter
Thanks Michael. I think this answers my questions.

Best regards
Re: SynonymGraphFilter
Usually one will either apply synonyms at index time or apply them at query time, but not both. I think the situation is that you will get the most correct behavior, respecting synonym graph structure, with query-time synonyms. Index-time synonyms may give better performance, but at the cost of some overlap along time positions that results from the need for flattening, as in the quote you provided. If you use only query-time synonyms there is no need to flatten.
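The cost of indexing a graph can be seen in a small plain-Java model. This is a sketch under stated assumptions, not Lucene code and not the algorithm FlattenGraphFilter uses: the class TokenGraphDemo, its Token type and the synonym "dns" for "domain name system" are invented for illustration. It only models the fact that a token in a graph has a position and a position length, and that, as far as I understand, the index keeps the position but not the length.

```java
import java.util.ArrayList;
import java.util.List;

// Toy model (hypothetical, not Lucene code) of a token graph and of what
// remains once position lengths are dropped.
public class TokenGraphDemo {
    // One token: its term, the position where it starts, and how many positions it spans.
    public static final class Token {
        final String term;
        final int position;
        final int positionLength;

        public Token(String term, int position, int positionLength) {
            this.term = term;
            this.position = position;
            this.positionLength = positionLength;
        }
    }

    // Enumerate every term sequence along a path from position `from` to position `end`.
    public static List<String> paths(List<Token> graph, int from, int end) {
        List<String> out = new ArrayList<>();
        if (from == end) {
            out.add("");
            return out;
        }
        for (Token t : graph) {
            if (t.position != from) {
                continue;
            }
            for (String rest : paths(graph, from + t.positionLength, end)) {
                out.add(rest.isEmpty() ? t.term : t.term + " " + rest);
            }
        }
        return out;
    }

    // Keep every position but force every token to span exactly one position.
    public static List<Token> dropPositionLength(List<Token> graph) {
        List<Token> flat = new ArrayList<>();
        for (Token t : graph) {
            flat.add(new Token(t.term, t.position, 1));
        }
        return flat;
    }

    // Graph for the text "dns lookup" with "dns" expanded to "domain name system".
    public static List<Token> example() {
        return List.of(
            new Token("dns", 0, 3),
            new Token("domain", 0, 1),
            new Token("name", 1, 1),
            new Token("system", 2, 1),
            new Token("lookup", 3, 1));
    }

    public static void main(String[] args) {
        List<Token> graph = example();
        // prints [dns lookup, domain name system lookup]
        System.out.println(paths(graph, 0, 4));
        // prints [dns name system lookup, domain name system lookup]
        System.out.println(paths(dropPositionLength(graph), 0, 4));
    }
}
```

With the graph intact, both "dns lookup" and "domain name system lookup" are valid paths. Once the position length is gone, "dns lookup" is no longer adjacent and the spurious sequence "dns name system lookup" appears. That is the kind of positional error that query-time synonyms avoid.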
Re: SynonymGraphFilter
So, does the statement below suggest this?

"To get fully correct positional queries when your synonym replacements are multiple tokens, you should instead apply synonyms using this TokenFilter at query time and translate the resulting graph to a TermAutomatonQuery e.g. using TokenStreamToTermAutomatonQuery."

Does this mean processing single-token synonyms at index time and multi-token synonyms at query time?

Best regards
Re: SynonymGraphFilter
Are there any examples for the following note in the Javadocs at
https://lucene.apache.org/core/6_4_1/analyzers-common/org/apache/lucene/analysis/synonym/SynonymGraphFilter.html ?

Quoted from the above URL:

However, if you use this during indexing, you must follow it with FlattenGraphFilter to squash tokens on top of one another like SynonymFilter, because the indexer can't directly consume a graph. To get fully correct positional queries when your synonym replacements are multiple tokens, you should instead apply synonyms using this TokenFilter at query time and translate the resulting graph to a TermAutomatonQuery e.g. using TokenStreamToTermAutomatonQuery.

End of quote

This will make the code really hard to maintain if we have to separate synonyms based on the number of tokens.

Any suggestions please?

Best regards
Re: SynonymGraphFilter
Mike,

Great article, thanks for that. I was thinking about exactly this reverse mapping when I was writing my question. I guess it would be nicer if Lucene applied both mappings when one is called for, or offered another parameter to activate this double mapping.

My next question is: can a synonym contain words separated by spaces?

My last question on this: should I apply the synonym filter at both index and query time?

Best regards
Re: SynonymGraphFilter
Try reading the blog post I wrote about token stream graphs?

http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html

Mike McCandless
http://blog.mikemccandless.com
Re: SynonymGraphFilter
Any comments please?

Thanks
Re: SynonymGraphFilter
Any examples on this? I think it would be nice if the Javadocs had an example of this:

However, if you use this during indexing, you must follow it with FlattenGraphFilter to squash tokens on top of one another like SynonymFilter, because the indexer can't directly consume a graph. To get fully correct positional queries when your synonym replacements are multiple tokens, you should instead apply synonyms using this TokenFilter at query time and translate the resulting graph to a TermAutomatonQuery e.g. using TokenStreamToTermAutomatonQuery.

Does "multiple tokens" mean a synonym with multiple equivalents, or a synonym with multiple words? This is not clear to me.

Best regards

On 9/10/18 3:15 PM, baris.ka...@oracle.com wrote:
> https://lucene.apache.org/core/6_4_1/analyzers-common/org/apache/lucene/analysis/synonym/SynonymGraphFilter.html
>
> Does this mean I don't have to repeat it in the search analyzer when I do this at indexing time?
>
> Best regards
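On the "multiple tokens" question: as I read the Javadoc, it refers to a synonym entry made of several words (several tokens after analysis), not to a rule that lists several single-word equivalents. The plain-Java sketch below shows the distinction. It is a hypothetical helper, not part of Lucene. It assumes rules written as comma-separated equivalents with an optional "=>" mapping, and it assumes that tokens are split on whitespace, which may not match the analyzer actually used to parse the synonyms.

```java
// Hypothetical helper (not Lucene code): reports whether a synonym rule
// contains any entry made of more than one whitespace-separated word.
public class SynonymRuleCheck {
    public static boolean hasMultiTokenEntry(String rule) {
        // Check both sides of an explicit "=>" mapping, if there is one.
        for (String side : rule.split("=>")) {
            for (String entry : side.split(",")) {
                if (entry.trim().split("\\s+").length > 1) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Several equivalents, each a single token: prints false
        System.out.println(hasMultiTokenEntry("tv, television, telly"));
        // A replacement made of three tokens: prints true
        System.out.println(hasMultiTokenEntry("dns => domain name system"));
    }
}
```

Only rules of the second kind produce a real graph with side paths, so those are the ones the Javadoc warning about positional queries applies to.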