Spellchecking - Is there a way to do this?
Hello everybody,

1. I have tons of digitized text containing the usual errors introduced by the OCR process.
2. I have indexed it with Solr and it is working OK.
3. I have added an index-based spellchecker for words and phrases, hoping to offer suggestions of suspicious new query expressions, or of expressions related to the current one, in order to find documents that contain the original expression but with OCR errors. (For example, if the user originally searched for "state and democracy", the interface would offer "stete and demcraci" as an alternate query expression.)

My first problem is that I need suggestions even when the expression has returned results. It seems that suggestions only appear when there are no results. Is there a way to do this?

The second question is: for the purposes I've mentioned, is it best to use the spellchecker or the MLT component? Or something else (such as a fuzzy query)?

Thanks a lot
German
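For the fuzzy-query option mentioned in the question, one way to try it from a client is to append Lucene's fuzzy operator (`~`) to each term and send the result as the `q` parameter. The following is a minimal sketch rather than a tested recipe: the helper names, the host and port in the example URL, and the similarity value 0.7 are assumptions made for illustration. Only the `term~similarity` syntax and the `q`, `spellcheck` and `spellcheck.count` request parameters come from Lucene/Solr, and `spellcheck=true` only has an effect if a spellcheck component is configured on the request handler.

```python
from urllib.parse import urlencode


def fuzzy_query(terms, similarity=0.7):
    """Append Lucene's fuzzy operator to every term, e.g. state -> state~0.7."""
    return " ".join(f"{term}~{similarity}" for term in terms)


def select_url(base_url, terms, similarity=0.7, suggestions=5):
    """Build a /select URL that runs a fuzzy query and asks for spellcheck suggestions."""
    params = [
        ("q", fuzzy_query(terms, similarity)),
        ("spellcheck", "true"),
        ("spellcheck.count", str(suggestions)),
    ]
    return f"{base_url}/select?{urlencode(params)}"


if __name__ == "__main__":
    # Hypothetical Solr location; adjust to the real installation.
    print(select_url("http://localhost:8983/solr", ["state", "democracy"]))
```

A fuzzy query of this kind matches index terms that are close to the typed term, so it may reach OCR variants such as "stete" without a separate suggestion step, at the cost of slower queries and less control over which variants are matched.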
Re: Problem with Query Parser
Thanks Ahmet. Using the analyzer, the English Porter filter definitely shows up as the killer ;)

Regards
German

On Sun, Oct 18, 2009 at 7:30 AM, AHMET ARSLAN iori...@yahoo.com wrote:

> > Hi everybody
> >
> > I have a simple but (for me) annoying problem. I'm a happy user of Solr 1.4 with a small collection of documents. Today one of the users reported that a query returns documents that are not pertinent to the expression. I have Spanish, Portuguese and English text inside the collection. Using the Solr administration interface I found that she was right: if I search for the Spanish term represion, the search matches only the word root, I mean it returns every document with the term repres. Using the admin debug search I found this:
> >
> >   <lst name="debug">
> >     <str name="rawquerystring">description:represion</str>
> >     <str name="querystring">description:represion</str>
> >     <str name="parsedquery">description:repres</str>
> >     <str name="parsedquery_toString">description:repres</str>
> >
> > The "ion" part of the term was deleted by the query parser. The first question is: I don't know where I should look to correct this, in schema.xml or in solrconfig.xml. The only thing that looks suspicious to me is the EnglishPorter.
>
> Yes, you are right. The "ion" part of the term was deleted by it. You can verify this using the /admin/analysis.jsp page. It will tell you which TokenFilterFactory removes it.
>
> > I've deleted it from the configuration but nothing changes. Should I reindex the collection to see the changes?
>
> Yes, a re-index is necessary.
>
> > Should I also delete it from the index section?
>
> You should remove the English Porter filter from both the query and the index analyzer.
>
> > What will I lose by deleting the English Porter filter?
>
> You will lose stemming functionality. But since you have Spanish, Portuguese and English documents, using the English Porter filter for all the documents is not meaningful.
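To see why an English stemmer shortens a Spanish word like "represion", it helps to look at a single rule from the Porter algorithm: the suffix "ion" is removed when the remaining stem ends in "s" or "t" and has a measure (the number of vowel-consonant sequences) greater than 1. The sketch below implements only that one rule as an illustration. It is not the full Porter stemmer and not the code behind EnglishPorterFilterFactory, and the function names are made up for this example.

```python
def is_consonant(word, i):
    """Porter's definition: a, e, i, o, u are vowels; y is a vowel only after a consonant."""
    ch = word[i]
    if ch in "aeiou":
        return False
    if ch == "y":
        return i == 0 or not is_consonant(word, i - 1)
    return True


def measure(stem):
    """Count vowel-to-consonant transitions, i.e. the m in Porter's [C](VC)^m[V]."""
    m = 0
    prev_vowel = False
    for i in range(len(stem)):
        cons = is_consonant(stem, i)
        if cons and prev_vowel:
            m += 1
        prev_vowel = not cons
    return m


def strip_ion(word):
    """Apply only the '-ion' rule: drop 'ion' if the stem ends in s or t and m > 1."""
    if word.endswith("ion"):
        stem = word[:-3]
        if stem and stem[-1] in "st" and measure(stem) > 1:
            return stem
    return word


if __name__ == "__main__":
    for term in ["represion", "adoption", "region"]:
        print(term, "->", strip_ion(term))
```

The rule knows nothing about the language of the input, which is why a Spanish term that happens to end in "sion" is cut exactly like an English one.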
Problem with Query Parser
Hi everybody

I have a simple but (for me) annoying problem. I'm a happy user of Solr 1.4 with a small collection of documents. Today one of the users reported that a query returns documents that are not pertinent to the expression. I have Spanish, Portuguese and English text inside the collection. Using the Solr administration interface I found that she was right: if I search for the Spanish term represion, the search matches only the word root, I mean it returns every document with the term repres. Using the admin debug search I found this:

  <lst name="debug">
    <str name="rawquerystring">description:represion</str>
    <str name="querystring">description:represion</str>
    <str name="parsedquery">description:repres</str>
    <str name="parsedquery_toString">description:repres</str>

The "ion" part of the term was deleted by the query parser. The first question is: I don't know where I should look to correct this, in schema.xml or in solrconfig.xml. In the schema, description is

  <field name="description" type="text" indexed="true" multiValued="true" stored="true"/>

and text is:

  <fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.ISOLatin1AccentFilterFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
      <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>
      <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    </analyzer>
  </fieldtype>

The only thing that looks suspicious to me is the EnglishPorter. I've deleted it from the configuration but nothing changes. Should I reindex the collection to see the changes? Should I also delete it from the index section? What will I lose by deleting the English Porter filter?

Thanks a lot for the help
German
Re: Newbie problem ordering results
Sure:

  <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>

The strange thing is that I can sort by other fields that are defined as string, but not by one that is defined as a tokenized field and then copied to a string field. I attach the schema.xml in case there is another error. The error log says the following:

INFO: UnInverted multi-valued field {field=date,memSize=70356,tindexSize=40,time=381,phase1=381,nTerms=99,bigTerms=5,termInstances=4330,uses=0}
11/08/2009 12:42:31 org.apache.solr.request.UnInvertedField uninvert
INFO: UnInverted multi-valued field {field=local_medium,memSize=70088,tindexSize=56,time=10,phase1=10,nTerms=30,bigTerms=2,termInstances=2461,uses=0}
11/08/2009 12:42:31 org.apache.solr.core.SolrCore execute
INFO: [] webapp=/solr path=/select params={rows=20&wt=json&facet.field=contributorfacet&facet.field=subjectfacet&facet.field=provenance&facet.field=local_state&facet.field=date&facet.field=local_medium&facet.limit=15&q=text:fisica&start=0&facet.mincount=1&fl=id,title,contributor,subject,provenance,date,coverage,publisher,score,local_state,local_url&sort=score+desc&facet=true} hits=312 status=0 QTime=4963
11/08/2009 12:51:46 org.apache.solr.core.SolrCore execute

*** This is the sort by date desc, which is working OK; date is defined as string

INFO: [] webapp=/solr path=/select params={rows=20&wt=json&q=text:fisica&start=0&sort=date+desc&fl=id,title,contributor,subject,provenance,date,coverage,publisher,score,local_state,local_url} hits=312 status=0 QTime=174
11/08/2009 12:52:38 org.apache.solr.common.SolrException log
GRAVE: java.lang.RuntimeException: there are more terms than documents in field "contributororder", but it's impossible to sort on tokenized fields
    at org.apache.lucene.search.FieldCacheImpl$8.createValue(FieldCacheImpl.java:518)
    at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:81)
    at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:491)
    at org.apache.solr.search.MissingLastOrdComparator.setNextReader(MissingStringLastComparatorSource.java:181)
    at org.apache.lucene.search.TopFieldCollector$OneComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:92)
    at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:242)
    at org.apache.lucene.search.Searcher.search(Searcher.java:173)
    at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:988)
    at

Thanks a lot
German

On Tue, Aug 11, 2009 at 1:56 AM, Avlesh Singh avl...@gmail.com wrote:

> Can you please post the fieldType definition for the "string" field in your schema.xml?
>
> Cheers
> Avlesh
>
> On Tue, Aug 11, 2009 at 9:52 AM, Germán Biozzoli germanbiozz...@gmail.com wrote:
>
> > Hello everybody
> >
> > I have the following (abridged) schema:
> >
> >   <field name="title" type="text" indexed="true" stored="true" multiValued="true"/>
> >   <field name="titleorder" type="string" indexed="true" stored="true" multiValued="true"/>
> >   <field name="contributor" type="text" indexed="true" stored="true" multiValued="true"/>
> >   <field name="contributorfacet" type="textFacetN" indexed="true" stored="true" multiValued="true"/>
> >   <field name="contributororder" type="string" indexed="true" stored="true" multiValued="true"/>
> >   .
> >   <copyField source="title" dest="text"/>
> >   <copyField source="title" dest="titleorder"/>
> >   <copyField source="contributor" dest="text"/>
> >   <copyField source="contributor" dest="contributorfacet"/>
> >   <copyField source="contributor" dest="contributororder"/>
> >   ...
> >
> > For instance, I use contributor for searching, contributorfacet for faceting and contributororder for ordering results, but when I try to sort by contributororder, Solr says that it cannot sort by a tokenized field...(?) I'm using a Solr 1.4 nightly build. Is this a bug? I believe I had this working in previous versions...
> >
> > Regards and thanks
> > Germán

<?xml version="1.0" encoding="UTF-8" ?>
<schema name="Test" version="1.1">
  <types>
    <fieldtype name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="integer" class="solr.IntField" omitNorms="true"/>
    <fieldtype name="long" class="solr.LongField" omitNorms="true"/>
    <fieldtype name="float" class="solr.FloatField" omitNorms="true"/>
    <fieldtype name="double" class="solr.DoubleField" omitNorms="true"/>
    <fieldtype name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="slong" class="solr.SortableLongField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="sfloat" class="solr.SortableFloatField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="sdouble" class="solr.SortableDoubleField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="date" class="solr.DateField" sortMissingLast="true" omitNorms="true"/>
    <fieldtype name="text_ws" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      </analyzer>
    </fieldtype>
Newbie problem ordering results
Hello everybody

I have the following (abridged) schema:

  <field name="title" type="text" indexed="true" stored="true" multiValued="true"/>
  <field name="titleorder" type="string" indexed="true" stored="true" multiValued="true"/>
  <field name="contributor" type="text" indexed="true" stored="true" multiValued="true"/>
  <field name="contributorfacet" type="textFacetN" indexed="true" stored="true" multiValued="true"/>
  <field name="contributororder" type="string" indexed="true" stored="true" multiValued="true"/>
  .
  <copyField source="title" dest="text"/>
  <copyField source="title" dest="titleorder"/>
  <copyField source="contributor" dest="text"/>
  <copyField source="contributor" dest="contributorfacet"/>
  <copyField source="contributor" dest="contributororder"/>
  ...

For instance, I use contributor for searching, contributorfacet for faceting and contributororder for ordering results, but when I try to sort by contributororder, Solr says that it cannot sort by a tokenized field...(?) I'm using a Solr 1.4 nightly build. Is this a bug? I believe I had this working in previous versions...

Regards and thanks
Germán
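The exception reported in this thread ("there are more terms than documents in field contributororder") is raised when the field being sorted holds more distinct values than there are documents, which can happen when a multiValued field such as contributororder receives several values per document. The Python sketch below only imitates that counting check on made-up data and shows one possible workaround, under the assumption that sorting by the first contributor is acceptable: keep a single sort value per document. The function names and the sample records are hypothetical, and this is not Lucene's actual code.

```python
def distinct_terms(docs, field):
    """Collect every distinct value stored under `field` across all documents."""
    return {value for doc in docs for value in doc.get(field, [])}


def can_sort(docs, field):
    """Imitate the check: sorting is refused when distinct terms outnumber documents."""
    return len(distinct_terms(docs, field)) <= len(docs)


def with_single_sort_value(docs, source, dest):
    """Return copies of the documents where `dest` holds only the first `source` value."""
    return [{**doc, dest: doc.get(source, [])[:1]} for doc in docs]


if __name__ == "__main__":
    # Made-up records standing in for indexed documents.
    docs = [
        {"id": "1", "contributor": ["Garcia, Ana", "Lopez, Juan"]},
        {"id": "2", "contributor": ["Perez, Luis", "Garcia, Ana", "Diaz, Eva"]},
    ]
    # Copying every contributor, as a multiValued copy target would receive them.
    copied = [{**d, "contributororder": d["contributor"]} for d in docs]
    print(can_sort(copied, "contributororder"))
    # Keeping one value per document instead.
    single = with_single_sort_value(docs, "contributor", "contributororder")
    print(can_sort(single, "contributororder"))
```

With one value per document the number of distinct terms can never exceed the number of documents, so the condition behind the error message cannot be met for that field.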