Re: German language specific problem (automatic Spelling correction, automatic Synonyms ?)
Concerning the downtime, we found a solution that works well for us. We had already implemented an update mechanism: when authors change some content in the CMS, the index entry for that piece of content is updated as well (deleted, then indexed again). All we had to do was:

1. Change schema.xml to add the PhoneticFilter to certain field types.
2. Write a script that finds all individual content items.
3. Start the update mechanism for each content item, one after another.

So the index gradually migrates from the old state to the new phonetic one without any noticeable downtime for users of the search function. It's just that they get somewhat mixed results during the transition. Sure, it takes some time, but CMS users can keep working with content the whole time. If they create or update content during the transition, it is (re)indexed according to the new schema.xml anyway. If we need to roll back, we just replace schema.xml with the old version and start the update process again.

So far this is working. Thanks for your support!

-- 
View this message in context: http://lucene.472066.n3.nabble.com/German-language-specific-problem-automatic-Spelling-correction-automatic-Synonyms-tp3216278p3225223.html
Sent from the Solr - User mailing list archive at Nabble.com.
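A minimal Python sketch of the rolling re-index described above. It assumes the CMS already has a per-item update hook that deletes an item from the index and indexes it again under the current schema; `rolling_reindex` and `reindex_one` are hypothetical names, not part of Solr or of the poster's code.

```python
import time
from typing import Callable, Iterable


def rolling_reindex(
    content_ids: Iterable[str],
    reindex_one: Callable[[str], None],
    pause_seconds: float = 0.0,
) -> list[str]:
    """Re-index content items one at a time so search stays available.

    `reindex_one` is a hypothetical stand-in for the CMS's existing update
    hook (delete the item from the index, then index it again).
    Returns the ids that failed, instead of aborting the whole run.
    """
    failed: list[str] = []
    for content_id in content_ids:
        try:
            reindex_one(content_id)
        except Exception:  # keep going; failures can be retried afterwards
            failed.append(content_id)
        if pause_seconds:
            time.sleep(pause_seconds)  # optional throttle to limit load on Solr
    return failed
```

Returning the failed ids rather than stopping lets the migration finish and be re-run only for the leftovers; the pause is there only to keep the extra load down while users are searching.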
German language specific problem (automatic Spelling correction, automatic Synonyms ?)
Hi,

we have several entries in our database that our customer would like to find even when the search string does not match exactly. The problem is related to spelling correction and synonyms, but instead of individual entries in synonyms.txt we would like an automatic solution for this group of problems:

When searching for the name "schmid", we also want to find documents containing the name "schmidt". There are analogous names like "hildebrand" and "hildebrandt", and more. That is why we'd like an automatic solution for this group of words.

We already use the following filters in our index chain:

<filter class="solr.DictionaryCompoundWordTokenFilterFactory" dictionary="dictionary_de.txt"/>
<filter class="solr.SnowballPorterFilterFactory" language="German2" protected="protwords.txt"/>

Unfortunately, the German stemmer does not handle such cases, nor is this a problem related to compound words. Does anyone know of a solution? Maybe it's possible to set up a filter rule in the query chain that automatically extends words ending in the letter d with a t? Or, the other way round, a rule in the index chain that removes a t following a d?

Thanks a lot,
Thomas
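The two rules Thomas proposes can be sketched in a few lines of Python. This is a hypothetical illustration, not a Solr filter: `normalize_dt` rewrites a word-final "dt" to "d" and would have to run at index and query time alike, while `expand_d` is the query-only alternative that adds a "t" variant to words ending in "d". In Solr itself the first rule could probably be expressed with solr.PatternReplaceFilterFactory in both analyzer chains, but that is an untested assumption.

```python
import re

# Hypothetical rule from the question: treat a word-final "dt" like "d".
_FINAL_DT = re.compile(r"dt$")


def normalize_dt(token: str) -> str:
    """Index- and query-side normalisation: 'schmidt' -> 'schmid'."""
    return _FINAL_DT.sub("d", token.lower())


def expand_d(token: str) -> list[str]:
    """Query-only expansion: 'schmid' -> ['schmid', 'schmidt']."""
    token = token.lower()
    return [token, token + "t"] if token.endswith("d") else [token]
```

Note the asymmetry: query-only expansion finds "schmidt" documents for the query "schmid" but not the reverse, and a blunt rule like this rewrites every word ending in "dt", not just surnames.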
Re: German language specific problem (automatic Spelling correction, automatic Synonyms ?)
I'd try solr.PhoneticFilterFactory; it usually smooths out these slight differences. schmidt, smith and schmid will all come out as something like XMDT.

2011/8/1 thomas <tom.erfu...@googlemail.com>
> [...]

-- 
*Alexei Martchenko* | *CEO* | Superdownloads
ale...@superdownloads.com.br | ale...@martchenko.com.br | (11) 5083.1018/5080.3535/5080.3533
Re: German language specific problem (automatic Spelling correction, automatic Synonyms ?)
Thomas,

an alternative would be to use the Kölner phonetic (Cologne phonetics) factory. There was a discussion about it recently. But all this needs some programming.

paul

On 1 August 2011, at 17:41, Alexei Martchenko wrote:
> I'd try solr.PhoneticFilterFactory, it usually converts these slight differences... schmidt, smith and schmid will be something like XMDT
> [...]
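To see what Cologne phonetics does with Thomas's names, here is a minimal from-scratch Python sketch of the Kölner Phonetik rules as they are commonly published. `cologne_phonetic` is a hypothetical helper written for illustration; it is not the Solr or commons-codec ColognePhonetic encoder and may differ from that implementation in edge cases.

```python
_VOWELS = set("AEIJOUY")


def cologne_phonetic(word: str) -> str:
    """Kölner Phonetik code for a single word, e.g. 'Müller' -> '657'."""
    # Upper-case (str.upper() already turns 'ß' into 'SS'), fold umlauts,
    # and drop anything that is not A-Z (hyphens, digits, spaces).
    s = word.upper()
    for umlaut, plain in (("Ä", "A"), ("Ö", "O"), ("Ü", "U")):
        s = s.replace(umlaut, plain)
    s = "".join(ch for ch in s if "A" <= ch <= "Z")

    raw = []
    for i, ch in enumerate(s):
        prev = s[i - 1] if i > 0 else ""
        nxt = s[i + 1] if i + 1 < len(s) else ""
        if ch in _VOWELS:
            code = "0"
        elif ch == "H":
            code = "-"  # ignored, but still separates neighbouring codes
        elif ch == "B":
            code = "1"
        elif ch == "P":
            code = "3" if nxt == "H" else "1"
        elif ch in ("D", "T"):
            code = "8" if nxt in ("C", "S", "Z") else "2"
        elif ch in ("F", "V", "W"):
            code = "3"
        elif ch in ("G", "K", "Q"):
            code = "4"
        elif ch == "C":
            if i == 0:  # word-initial C
                code = "4" if nxt in set("AHKLOQRUX") else "8"
            elif prev in ("S", "Z") or nxt not in set("AHKOQUX"):
                code = "8"
            else:
                code = "4"
        elif ch == "X":
            code = "8" if prev in ("C", "K", "Q") else "48"
        elif ch == "L":
            code = "5"
        elif ch in ("M", "N"):
            code = "6"
        elif ch == "R":
            code = "7"
        else:  # only S and Z are left
            code = "8"
        raw.append(code)

    # Collapse runs of identical digits, then drop the H markers and
    # every "0" except a leading one.
    collapsed: list[str] = []
    for c in "".join(raw):
        if not collapsed or collapsed[-1] != c:
            collapsed.append(c)
    digits = "".join(collapsed).replace("-", "")
    return digits[:1] + digits[1:].replace("0", "")
```

With these rules "schmid", "schmidt" (and also "smith") all encode to 862, and "hildebrand" and "hildebrandt" both encode to 0521762, which is the grouping Thomas asked for. As with any phonetic filter, the codes only match if documents were indexed with the same encoder.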
Re: German language specific problem (automatic Spelling correction, automatic Synonyms ?)
Thanks Alexei, thanks Paul,

I played with solr.PhoneticFilterFactory. Analysing my query in the Solr admin backend showed me how it works, and that it does work. My major problem is that this filter needs to be applied to the index chain as well as to the query chain to produce matches for our search. We have a huge index at this point and I'm not really happy about reindexing all the content. Is there perhaps a more subtle solution that works by manipulating only the query chain? Otherwise I'll need to back up the whole index and try to reindex overnight while the CMS users are asleep.

I will have a look at the ColognePhonetic encoder. I'm just afraid I'll have to reindex the whole content there as well.

Thomas
Re: German language specific problem (automatic Spelling correction, automatic Synonyms ?)
Any change you make related to stemming or normalization is likely going to require a re-index; that's just how Solr/Lucene works. What you can achieve by normalizing only at query time is limited: almost any good solution to this type of problem requires normalization at index time as well.

If you're going to be fiddling with a production Solr, it pays to work out a workflow that lets you introduce indexing changes without downtime. This is not the last time you'll have to do it.

On 8/1/2011 12:35 PM, thomas wrote:
> [...]
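A toy illustration in plain Python (not Solr) of why query-time normalization alone falls short: the index still holds the terms produced by the old analyzer, so a normalized query only matches documents whose stored term already happened to be in normal form. `normalize` is a hypothetical stand-in for a phonetic or "dt" to "d" filter.

```python
from collections import defaultdict
from typing import Callable

Analyzer = Callable[[str], str]


def normalize(token: str) -> str:
    # Hypothetical normaliser standing in for a phonetic or "dt" -> "d" filter.
    t = token.lower()
    return t[:-1] if t.endswith("dt") else t


def build_index(docs: dict[str, str], analyzer: Analyzer) -> dict[str, set[str]]:
    """Tiny inverted index: analysed term -> ids of the documents containing it."""
    index: dict[str, set[str]] = defaultdict(set)
    for doc_id, text in docs.items():
        for token in text.split():
            index[analyzer(token)].add(doc_id)
    return index


def search(index: dict[str, set[str]], query: str, analyzer: Analyzer) -> set[str]:
    """Analyse the query with `analyzer` and look the result up in the index."""
    return set(index.get(analyzer(query), set()))


docs = {"1": "Schmidt", "2": "Schmid"}
old_index = build_index(docs, str.lower)  # analysed before the schema change
new_index = build_index(docs, normalize)  # after a full re-index
```

`search(old_index, "Schmidt", normalize)` returns only {"2"}: the normalized query "schmid" never meets document 1, which is still stored as "schmidt". Against `new_index` the same call returns {"1", "2"}.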
Re: German language specific problem (automatic Spelling correction, automatic Synonyms ?)
On 1 August 2011, at 18:35, thomas wrote:
> Thanks Alexei, thanks Paul,
> I played with solr.PhoneticFilterFactory. Analysing my query in the Solr admin backend showed me how it works, and that it does work. My major problem is that this filter needs to be applied to the index chain as well as to the query chain to produce matches for our search. We have a huge index at this point and I'm not really happy about reindexing all the content.

I doubt there's a way out.

> Is there perhaps a more subtle solution that works by manipulating only the query chain?

You'd need to program it... it's not ruled out.

> Otherwise I'll need to back up the whole index and try to reindex overnight while the CMS users are asleep.

With some work you can do this using an extra Solr instance that just pulls everything, then swapping the indexes (that needs a bit of downtime), then re-indexing whatever changed during the night. I feel this should be a standard feature of Solr...

> I will have a look at the ColognePhonetic encoder. I'm just afraid I'll have to reindex the whole content there as well.

Sure, absolutely. Also note that using phonetics really needs a separate field with query expansion (which is easy with dismax).

paul
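Paul's last point, sketched as Python that only builds the request parameters: the phonetic terms live in their own field next to the exact one, and dismax spreads the user's query over both via `qf`. The field names `name` and `name_phonetic` and the boost values are hypothetical; the phonetic field would typically be filled with a `<copyField>` and use a phonetic filter in its field type.

```python
from urllib.parse import urlencode


def dismax_params(user_query: str) -> str:
    """Query both the exact field and a phonetic copy of it via dismax.

    'name' and 'name_phonetic' are hypothetical field names. The higher
    boost on the exact field keeps correct spellings ranked above
    sound-alikes.
    """
    params = {
        "defType": "dismax",
        "q": user_query,
        "qf": "name^2.0 name_phonetic^0.5",
    }
    return urlencode(params)
```

Keeping the phonetic terms in a separate field means exact matches still score on their own, instead of every sound-alike looking identical to the original name.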
Re: German language specific problem (automatic Spelling correction, automatic Synonyms ?)
On 8/1/2011 12:42 PM, Paul Libbrecht wrote:
>> Otherwise I'll need to back up the whole index and try to reindex overnight while the CMS users are asleep.
> With some work you can do this using an extra Solr that just pulls everything, then swaps the indexes (that needs a bit of downtime), then re-indexes the things changed during the night. I feel this should be a standard feature of SOLR...

It sort of is, in the sense that you can do it with replication, with no downtime. (Although, for no downtime, you'll need enough disk and RAM on the slave to warm the replicated index while still serving queries from the older one.)

Reindex into a separate Solr (or separate core), then set up the actual production core as a slave, and have it replicate from the master when the re-indexing is done. You can have your relevant conf files (schema.xml or solrconfig.xml) replicate too, so that production gets the new ones exactly when it gets the new index they go with.

The replication feature isn't exactly designed for this, so it gets a bit confusing. I set up the 'slave' with NO polling. It still needs to be configured as a slave, though. And it still needs a 'master' URL in its config: even though you can supply or override the master URL with a manual replicate command, Solr will refuse to start up if there's no master URL at all. So I configure the master URL, but without any polling for changes. Then I manually issue an HTTP replicate command to the slave only when I have a rebuilt index in the master that I want to swap in.

It seems to be working.
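The manual trigger described above can be scripted. To the best of my knowledge, Solr's ReplicationHandler accepts `command=fetchindex` on the slave, plus an optional `masterUrl` parameter that overrides the configured master for that one pull; the host names and the helpers `fetchindex_url` and `trigger_fetchindex` are hypothetical placeholders.

```python
from typing import Optional
from urllib.parse import urlencode
from urllib.request import urlopen


def fetchindex_url(slave_core_url: str, master_url: Optional[str] = None) -> str:
    """Build the one-off 'pull the index now' request for a Solr slave.

    command=fetchindex asks the slave's ReplicationHandler to replicate once;
    masterUrl (optional) overrides the master configured on the slave.
    """
    params = {"command": "fetchindex"}
    if master_url:
        params["masterUrl"] = master_url
    return f"{slave_core_url.rstrip('/')}/replication?{urlencode(params)}"


def trigger_fetchindex(slave_core_url: str, master_url: Optional[str] = None) -> int:
    """Send the request (only once the rebuilt master index is ready)."""
    with urlopen(fetchindex_url(slave_core_url, master_url), timeout=30) as resp:
        return resp.status
```

Because the slave never polls, nothing reaches production until this request is sent, which is what makes the swap happen at a moment you choose.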
Re: German language specific problem (automatic Spelling correction, automatic Synonyms ?)
If you want to avoid re-indexing, you could consider generating a synonym file from your rule set and then using it to expand your queries. You'd need to get a list of all the terms in your index and process them to generate synonyms. Actually, I don't know how to get a list of all the terms without Java programming, though: is there a way?

-Mike

On 08/01/2011 12:35 PM, thomas wrote:
> [...]
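Mike's idea, sketched in Python: group the index terms by whatever key the rule set produces and write one comma-separated line per group, which is the equivalence format Solr's synonyms.txt uses. `synonym_lines` and `drop_final_t_after_d` are hypothetical helpers; a phonetic encoder could be passed as the key instead.

```python
from collections import defaultdict
from typing import Callable, Iterable


def synonym_lines(terms: Iterable[str], key: Callable[[str], str]) -> list[str]:
    """Group index terms that share a key and emit synonyms.txt lines.

    Each line is a comma-separated group of equivalent terms; groups with a
    single member are skipped because they add nothing.
    """
    groups: dict[str, set[str]] = defaultdict(set)
    for term in terms:
        groups[key(term)].add(term)
    return [",".join(sorted(g)) for _, g in sorted(groups.items()) if len(g) > 1]


def drop_final_t_after_d(term: str) -> str:
    # Hypothetical rule: 'schmidt' and 'schmid' share the key 'schmid'.
    return term[:-1] if term.endswith("dt") else term
```

For the terms schmid, schmidt, hildebrand, hildebrandt and meier this yields the two lines "hildebrand,hildebrandt" and "schmid,schmidt". The file has to be regenerated whenever new names enter the index, which is part of the maintenance cost raised in the next reply.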
Re: German language specific problem (automatic Spelling correction, automatic Synonyms ?)
On 8/1/2011 1:40 PM, Mike Sokolov wrote:
> If you want to avoid re-indexing, you could consider building a synonym file that is generated using your rule set, and then using that to expand your queries. You'd need to get a list of all terms in your index and then process them to generate synonyms. Actually, I don't know how to get a list of all the terms without Java programming, though: is there a way?

The terms component will give you a list of all terms, I think:
http://wiki.apache.org/solr/TermsComponent

But this is getting awfully hacky and hard to maintain just to avoid a re-index. I still think re-indexing is a normal part of evolving your Solr configuration; better to get used to it now (and work out how to do it in production with no or minimal downtime).
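A minimal Python sketch of pulling every term of one field through the TermsComponent, assuming a `/terms` request handler like the one in the example solrconfig.xml. The base URL and field name are placeholders, and the two response shapes handled in `parse_terms` reflect my understanding of Solr's JSON writer rather than anything stated in the thread.

```python
import json
from urllib.parse import urlencode


def terms_url(solr_base: str, field: str) -> str:
    """Request every indexed term of `field` from the /terms handler.

    terms.limit=-1 asks for all terms; terms.sort=index returns them in
    index order instead of by document frequency.
    """
    params = {
        "terms": "true",
        "terms.fl": field,
        "terms.limit": "-1",
        "terms.sort": "index",
        "wt": "json",
    }
    return f"{solr_base.rstrip('/')}/terms?{urlencode(params)}"


def parse_terms(body: str, field: str) -> list[str]:
    """Extract the term strings from a JSON TermsComponent response.

    The field's entry may be a flat [term, count, term, count, ...] list
    or, with json.nl=map, a {term: count} object; both are handled.
    """
    entry = json.loads(body)["terms"][field]
    if isinstance(entry, dict):
        return list(entry)
    return list(entry[0::2])
```

The resulting list is exactly the input the synonym-generation idea needs, though on a huge index it can be a very large response.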