Right, I *love* problems like this... NOT....

You might get some joy out of using TrimFilterFactory along with
KeywordTokenizerFactory, something like this:

<fieldType name="trimField" class="solr.TextField" <your options here> >
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory" />
    <filter class="solr.TrimFilterFactory" />
  </analyzer>
</fieldType>

but it depends upon what your fields are padded with....

Best
Erick

On Fri, Dec 17, 2010 at 8:12 AM, Ezequiel Calderara <ezech...@gmail.com> wrote:

> Hi Erick, you were right.
>
> I'm looking at the source of the search result (instead of the rendering in
> Internet Explorer :$) and I see this:
> "<str name="SectionName">Programas_Home
> </str>"
>
> So I think the problem is in the SSIS process that retrieves data
> from the DB and sends it to Solr.
> The data type in the DB is VARCHAR(100)... but I'm sure that something
> somewhere is mapping it to CHAR(100), so its length is always 100.
>
> Thank you very much, I will keep you informed
>
> Thanksssss
>
> On Thu, Dec 16, 2010 at 9:38 PM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> > OK, it works perfectly for me on a 1.4.1 instance. I've looked over your
> > files a couple of times and see nothing obvious (but you'll never find
> > anyone better at overlooking the obvious than me!).
> >
> > Tokenizing and stemming are irrelevant in this case because your
> > type is "string", which is an untokenized type, so you don't need to
> > go there.
> >
> > The way your query parses and analyzes backs this up, so you're
> > getting to the right schema definition.
> >
> > Which may bring us to whether what's in the index is what you *think* is
> > in there. I'm betting not. Either you changed the schema and didn't
> > re-index (say changed indexed="false" to indexed="true"), you didn't
> > commit the documents after indexing or other such-like, or changed the
> > field type and didn't reindex.
> >
> > So go into ..../solr/admin. Click on "schema browser", click on "fields".
> > Along the left you should see "SectionName", click on that. That will show
> > you the #indexed# terms, and you should see, exactly, "Programas_Home" in
> > there, just like in your returned documents. Let us know if that's in fact
> > what you do see.
> > It's possible you're being misled by the difference between seeing the
> > value in a returned document (the stored value) and what's searched on
> > (the indexed token(s)).
> >
> > And I'm assuming that some asterisks in your mails were really there for
> > bolding and you are NOT doing wildcard searches for, for instance,
> > *SectionName:Programas_Home*.
> >
> > But we're at a point where my 1.4.1 instance produces the results you're
> > expecting, at least as I understand them, so I don't think it's a problem
> > with Solr, but some change you've made is producing results you don't
> > expect but are correct. Like I said, look at the indexed terms. If you see
> > "Programas_Home" in the admin console after following the steps above,
> > then I don't know what to suggest....
> >
> > Best
> > Erick
> >
> > On Thu, Dec 16, 2010 at 5:12 PM, Ezequiel Calderara <ezech...@gmail.com> wrote:
> >
> > > The jars are named like *1.4.1*. So I suppose it's version 1.4.1
> > >
> > > Thanks!
> > >
> > > On Thu, Dec 16, 2010 at 6:54 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> > >
> > > > OK, what version of Solr are you using? I can take a quick check to
> > > > see what behavior I get....
> > > >
> > > > Erick
> > > >
> > > > On Thu, Dec 16, 2010 at 4:44 PM, Ezequiel Calderara <ezech...@gmail.com> wrote:
> > > >
> > > > > I'll check the Tokenizer to see if that's the problem.
> > > > > The results of Analysis Page for "SectionName:Programas_Home"
> > > > > Query Analyzer org.apache.solr.schema.FieldType$DefaultAnalyzer {}
> > > > > term position 1
> > > > > term text Programas_Home
> > > > > term type word
> > > > > source start,end 0,14
> > > > > payload
> > > > >
> > > > > So it's not having problems with that... Also in the debug you can
> > > > > see that the parsed query is correct...
> > > > > So I don't know where to look...
> > > > >
> > > > > I know nothing about "Stemming" or tokenizing, but I will look into
> > > > > whether that has anything to do with it.
> > > > >
> > > > > If anyone can help me out, please do :D
> > > > >
> > > > > On Thu, Dec 16, 2010 at 5:55 PM, Erick Erickson <erickerick...@gmail.com> wrote:
> > > > >
> > > > > > Ezequiel:
> > > > > >
> > > > > > Nice job of including relevant details, by the way. Unfortunately
> > > > > > I'm puzzled too. Your SectionName is a "string" type, so it should
> > > > > > be placed in the index as-is. Be a bit cautious about looking at
> > > > > > returned results (as I see in one of your xml files) because the
> > > > > > returned values are the verbatim, stored field, NOT what's
> > > > > > tokenized, and the tokenized data is what's searched.
> > > > > >
> > > > > > That said, your SectionName should not be tokenized at all because
> > > > > > it's a string type. Take a look at the admin page, "schema
> > > > > > browser", and see what the values for "SectionName" look like
> > > > > > (these will be the tokenized values). They should be exactly
> > > > > > Programas_Home, complete with underscore, case changes, etc. Is
> > > > > > that the case?
> > > > > >
> > > > > > Another place that might help is the admin/analysis page. Check
> > > > > > the debug boxes and input your steps and it'll show you what
> > > > > > transformations are applied. But a quick look leaves me completely
> > > > > > baffled.
> > > > > >
> > > > > > Sorry I can't be more help
> > > > > > Erick
> > > > > >
> > > > > > On Thu, Dec 16, 2010 at 2:07 PM, Ezequiel Calderara <ezech...@gmail.com> wrote:
> > > > > >
> > > > > > > Hi all, I have the following problems.
> > > > > > > I have this set of data (View data (Pastebin) <http://pastebin.com/jKbUhjVS>)
> > > > > > > If I do a search for: *SectionName:Programas_Home* I have no results:
> > > > > > > Returned Data (PasteBin) <http://pastebin.com/wnPdHqBm>
> > > > > > > If I do a search for: *Programas_Home* I have only 1 result:
> > > > > > > Result Returned (Pastebin) <http://pastebin.com/fMZkLvYK>
> > > > > > > If I do a search for: SectionName:Programa* I have 1 result:
> > > > > > > Result Returned (Pastebin) <http://pastebin.com/kLLnVp4b>
> > > > > > >
> > > > > > > This is my *schema* <http://pastebin.com/PQM8uap4> (Pastebin) and this
> > > > > > > is my *solrconfig* <http://%3c/?xml version="1.0" encoding="UTF-8" ?>> (PasteBin)
> > > > > > >
> > > > > > > I don't understand why searching for "SectionName:Programas_Home"
> > > > > > > isn't returning any results at all...
> > > > > > >
> > > > > > > Can someone shed some light on this?
> > > > > > > --
> > > > > > > ______
> > > > > > > Ezequiel.
> > > > > > >
> > > > > > > Http://www.ironicnet.com <http://www.ironicnet.com/>
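
For anyone following along: the symptoms in this thread (the exact query on SectionName finds nothing, while the prefix query SectionName:Programa* matches) are consistent with the value being padded out to a fixed width before it reaches Solr. Below is a minimal Python sketch of that reasoning. It is a toy model, not Solr code: the function names are made up for illustration, the width of 100 comes from the CHAR(100) guess above, and padding with spaces is an assumption, since the actual padding character was never confirmed.

```python
# Illustrative model only; none of these functions are Solr APIs.
PAD_WIDTH = 100  # assumption: CHAR(100) column, as guessed in the thread


def string_field_term(value: str) -> str:
    """Model of a "string" field: the whole value is indexed verbatim as one term."""
    return value


def trim_field_term(value: str) -> str:
    """Model of KeywordTokenizer + TrimFilter: one term, surrounding whitespace removed."""
    return value.strip()


def exact_match(term: str, query: str) -> bool:
    """Model of a term query such as SectionName:Programas_Home."""
    return term == query


def prefix_match(term: str, prefix: str) -> bool:
    """Model of a prefix query such as SectionName:Programa*."""
    return term.startswith(prefix)


# Assumption: the ETL step pads with spaces up to the column width.
padded = "Programas_Home".ljust(PAD_WIDTH)

untrimmed = string_field_term(padded)
trimmed = trim_field_term(padded)

print(len(untrimmed))                            # 100
print(exact_match(untrimmed, "Programas_Home"))  # False
print(prefix_match(untrimmed, "Programa"))       # True
print(exact_match(trimmed, "Programas_Home"))    # True
```

If the padding turns out to be something other than whitespace, trimming alone would not help, which is the caveat about "what your fields are padded with" at the top of the thread. Fixing the SSIS mapping so the value arrives unpadded would avoid the problem at the source.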