RE: problems with search in solr
Remove the stemmer filter. With the stemmer filter in the chain, caso and casa are both reduced to cas. The stemmer is there to extract the root of each word, but in your case casa and caso share the same root, cas.

Regards.

From: PINA CORONADO, RAFAEL [rafael.p...@carm.es]
Sent: Thursday, 22 March 2012 13:38
To: solr-user@lucene.apache.org
Subject: problems with search in solr

Good morning:

I have a problem with the results Solr returns when I search for a string (e.g. caso). It returns records with similar terms (in this example, the same records as a search for casa).

The Solr version is 1.4.1.

The definition of the text type in schema.xml is:

  <fieldtype name="text" class="solr.TextField">
    <analyzer>
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
      <tokenizer class="solr.StandardTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.StopFilterFactory"/>
      <filter class="solr.PorterStemFilterFactory"/>
    </analyzer>
  </fieldtype>

Could you tell me whether there is an error in the configuration and how to solve it?

Thanks.

Rafael Pina Coronado
Servicio de Informática. Archivo General de la Región de Murcia
Email: rafael.p...@carm.es
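The fix suggested in this reply, dropping the stemmer from the analyzer chain, can be sketched in Python. This is only an illustration: drop_stemmers is a hypothetical helper, the XML markup of the field type is reconstructed from the message, and matching on the substring "Stem" in the class name is an assumption that will not catch every stemming filter.

```python
import xml.etree.ElementTree as ET

# Field type from the message, with the XML markup restored.
FIELD_TYPE = """
<fieldtype name="text" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldtype>
"""


def drop_stemmers(field_type_xml):
    """Hypothetical helper: return (xml_without_stemmers, removed_class_names)."""
    root = ET.fromstring(field_type_xml)
    removed = []
    for analyzer in root.findall("analyzer"):
        # findall returns a list, so removing children while looping is safe.
        for flt in analyzer.findall("filter"):
            # Assumption: stemming filters have "Stem" in their class name.
            if "Stem" in flt.get("class", ""):
                analyzer.remove(flt)
                removed.append(flt.get("class"))
    return ET.tostring(root, encoding="unicode"), removed


cleaned_xml, removed = drop_stemmers(FIELD_TYPE)
remaining = [f.get("class") for f in ET.fromstring(cleaned_xml).iter("filter")]
print("removed:", removed)
print("remaining filters:", remaining)
```

Because the analyzer chain also runs at index time, documents indexed with the old chain still carry stemmed terms, so a reindex is normally needed after the change.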
RE: SOLR 3.3 DIH and Java 1.6
Some versions of OpenJDK don't include the Rhino engine that the data import handler needs to run JavaScript. You have to use the Oracle JDK.

Juampa.

From: randolf.julian [randolf.jul...@dominionenterprises.com]
Sent: Tuesday, 20 March 2012 5:41
To: solr-user@lucene.apache.org
Subject: SOLR 3.3 DIH and Java 1.6

I am trying to use the data import handler to update the SOLR index with Oracle data. In the SOLR schema, a dynamic field called PHOTO_* has been defined. I created a script transformer:

  script

and called it in a query:

  <entity name="photo" transformer="script:pivotPhotos"
          query="select p.path||','||p.photo_barcode||','||p.display_order REC_PHOTO, lpad(p.display_order,3,'0') SEQUENCE_NUMBER from traderadm.photo p where p.realm_id = '${ad.REALM_ID}' and p.ad_id = '${ad.AD_ID}' order by p.display_order"/>

However, whenever I run a full import, it fails with this error in the solr0.log file:

  Full Import failed:org.apache.solr.handler.dataimport.DataImportHandlerException: <script> can be used only in java 6 or above

Here's the output of my java version:

  $ java -version
  java version "1.6.0_0"
  OpenJDK Runtime Environment (IcedTea6 1.6) (rhel-1.13.b16.el5-x86_64)
  OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)

I believe we are using Java 6. I am lost with this error and need help understanding why it is happening.

Thanks.
- Randolf

--
View this message in context: http://lucene.472066.n3.nabble.com/SOLR-3-3-DIH-and-Java-1-6-tp3841355p3841355.html
Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr Optimization Fail
Maybe a snapshot of your index is being generated as part of the optimize? Look for postCommit or postOptimize event listeners in your solrconfig.xml.

From: Rajani Maski [rajinima...@gmail.com]
Sent: Friday, 16 December 2011 11:11
To: solr-user@lucene.apache.org
Subject: Solr Optimization Fail

Hi,

When we optimize, it actually reduces the data size, right? I have an index of 6 GB (5 million documents). The index was created with a commit for every 1 documents. I then tried to optimize it with the HTTP optimize command. When I did that, the data size became 12 GB. Why might this have happened? And can anyone please suggest a fix?

Thanks,
Rajani
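A quick way to follow the suggestion in this reply is to scan solrconfig.xml for listeners tied to commit or optimize events. The Python sketch below assumes the usual <listener event="..."> layout; find_update_listeners is a hypothetical helper, and the sample fragment (including the snapshooter listener) is made up for illustration, not taken from the poster's setup.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only; the listener shown here is an assumption
# for the example, not the poster's actual configuration.
SAMPLE_SOLRCONFIG = """
<config>
  <updateHandler class="solr.DirectUpdateHandler2">
    <listener event="postOptimize" class="solr.RunExecutableListener">
      <str name="exe">snapshooter</str>
      <str name="dir">solr/bin</str>
      <bool name="wait">true</bool>
    </listener>
  </updateHandler>
</config>
"""


def find_update_listeners(solrconfig_xml):
    """Hypothetical helper: list (event, class, exe) for commit/optimize listeners."""
    root = ET.fromstring(solrconfig_xml)
    found = []
    for listener in root.iter("listener"):
        event = listener.get("event")
        if event in ("postCommit", "postOptimize"):
            exe = listener.find("str[@name='exe']")
            found.append(
                (event, listener.get("class"), exe.text if exe is not None else None)
            )
    return found


listeners = find_update_listeners(SAMPLE_SOLRCONFIG)
for event, cls, exe in listeners:
    print(f"{event}: {cls} runs {exe}")
```

To check a real installation, read the file contents and pass them in, for example find_update_listeners(open("solrconfig.xml").read()), with the path adjusted to the local setup.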
Re: Grouping or Facet ?
Sorry if I didn't explain my problem clearly.

I need to build a name suggester based on a prefix. My data come from two categories of people, admins and developers for example. So when the client types SAN, my results should be:

  Prefix: San
  Developers:
    Sanchez Garcia, Juan (5)
    Sanchez Roman, Ivan (2)
    San...
  Admins:
    Sanchez, Pedro (7)
    Sanchez Garcia, Javier (2)

The more common a name is, the higher its position should be. So I think it is not possible to do that with grouping.

So finally my schema will be:

  id
  nameDeveloper or nameAdmin: both String fields, but only one will have a value in a doc.

And my query with facets will be:

  /?q=*:*&facet=true&facet.field=nameDeveloper&facet.field=nameAdmin&facet.prefix=SAN&facet.mincount=1

If I try to do that with grouping I need something like group.pivot=category,name, and that is not possible in Solr yet.

Best,
Juampa.

On 08/12/2011, at 02:23, Darren Govoni wrote:

Yes. That's what I would expect. I guess I didn't understand when you said "The facet counts are the counts of the *values* in that field", because it seems it's the count of the number of matching documents. Irrespective of whether one document has 20 values for that field and another 10, the facet count will be 2, one for each document in the results.

On 12/07/2011 09:04 AM, Erick Erickson wrote:

In your example you'll have 10 facets returned, each with a value of 1.

Best
Erick

On Tue, Dec 6, 2011 at 9:54 AM, dar...@ontrenet.com wrote:

Sorry to jump into this thread, but are you saying that the facet count is not the number of result hits? So if I have 1 document with field CAT that has 10 values and I do a query that returns this 1 document with faceting, the CAT facet count will be 10, not 1? I don't seem to be seeing that behavior in my app (Solr 3.5).

Thanks.

OK, I'm not understanding here. You get the counts and the results if you facet on a single category field. The facet counts are the counts of the *values* in that field. So it would help me if you showed the output of faceting on a single category field and why that didn't work for you. But either way, faceting will probably outperform grouping.

Best
Erick
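The suggester output described at the top of this message could be assembled on the client from the facet_fields part of a facet response. The Python sketch below assumes the flat [value, count, value, count, ...] layout that Solr's JSON writer uses for facet fields by default; pair_up and build_suggestions are hypothetical helper names, and the sample counts are the ones given in the message.

```python
def pair_up(flat):
    """Turn [value, count, value, count, ...] into [(value, count), ...]."""
    return list(zip(flat[0::2], flat[1::2]))


def build_suggestions(facet_fields, labels):
    """Hypothetical helper: map each facet field to a labelled list of suggestions."""
    suggestions = {}
    for field, label in labels.items():
        pairs = pair_up(facet_fields.get(field, []))
        # Solr usually returns facet values by descending count already;
        # sorting here keeps the sketch independent of that setting.
        pairs.sort(key=lambda pair: -pair[1])
        suggestions[label] = [f"{name} ({count})" for name, count in pairs]
    return suggestions


# Sample facet_fields section, using the names and counts from the message.
SAMPLE_FACET_FIELDS = {
    "nameDeveloper": ["Sanchez Roman, Ivan", 2, "Sanchez Garcia, Juan", 5],
    "nameAdmin": ["Sanchez, Pedro", 7, "Sanchez Garcia, Javier", 2],
}
LABELS = {"nameDeveloper": "Developers", "nameAdmin": "Admins"}

result = build_suggestions(SAMPLE_FACET_FIELDS, LABELS)
for label, names in result.items():
    print(label + ":")
    for name in names:
        print("  " + name)
```

Keeping one facet field per category means a single request returns both lists, which is the property the message relies on.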
RE: Grouping or Facet ?
Because I need the count and the result to return to the client side. Both grouping and faceting give me a way to do that, but my doubt is about performance.

With grouping my results are:

  grouped:{
    category:{
      matches: ...,
      groups:[{
        groupValue:categoryXX,
        doclist:{numFound:Important_number,start:0,docs:[
          { doc:id category:XX }
        groupValue:categoryYY,
        doclist:{numFound:Important_number,start:0,docs:[
          { doc: id category:YY }

And with faceting (facet.prefix=whatever) my results are:

  facet_counts:{
    facet_queries:{},
    facet_fields:{
      namesXX:[ whatever_name_in_category,76, ...
      namesYY:[ whatever_name_in_category,76, ...

Both results are OK for me.

From: Erick Erickson [erickerick...@gmail.com]
Sent: Monday, 5 December 2011 14:48
To: solr-user@lucene.apache.org
Subject: Re: Grouping or Facet ?

Why not just use the first form of the document and just facet.field=category? You'll get two different facet counts for XX and YY that way. I don't think grouping is the way to go here.

Best
Erick
Grouping or Facet ?
I need to do some counts on a StrField field to suggest options from two different categories, and I don't know which option is best.

My schema looks like this:

  - id
  - name
  - category: XX or YY

With grouping I do:

  http://localhost:8983/?q=name:prefix*&group=true&group.field=category

But I can change my schema to:

  - id
  - nameXX
  - nameYY
  - category: XX or YY (only 1 value in nameXX or nameYY)

With faceting:

  http://localhost:8983/?q=*:*&facet=true&facet.field=nameXX&facet.field=nameYY&facet.prefix=prefix

Which option has the best performance?

Best,
Juampa.
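For comparing the two approaches, it can help to build both request URLs programmatically so that the parameters are always separated and escaped correctly. This is a minimal Python sketch: the base URL and field names are the ones from the message, while grouping_url and facet_url are hypothetical helper names.

```python
from urllib.parse import urlencode

# Host and path exactly as written in the message.
BASE = "http://localhost:8983/"


def grouping_url(prefix):
    """Prefix query on name, grouped by category."""
    params = [
        ("q", f"name:{prefix}*"),
        ("group", "true"),
        ("group.field", "category"),
    ]
    return BASE + "?" + urlencode(params)


def facet_url(prefix):
    """Match-all query, faceting on one name field per category."""
    params = [
        ("q", "*:*"),
        ("facet", "true"),
        # A list of tuples lets the same parameter name appear twice.
        ("facet.field", "nameXX"),
        ("facet.field", "nameYY"),
        ("facet.prefix", prefix),
    ]
    return BASE + "?" + urlencode(params)


print(grouping_url("san"))
print(facet_url("san"))
```

urlencode percent-encodes the ":" and "*" characters, which Solr decodes back to the original query text.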
Highlight, Dismax and local params
Hello,

I think I have found something strange with local params and edismax. If I run queries like:

  params:{
    hl.requireFieldMatch:true,
    hl.fragsize:200,
    json.wrf:callback0,
    indent:on,
    hl.fl:domicilio,deno,
    wt:json,
    hl:true,
    rows:5,
    fl:oidEmpresa,codNif,codTpoEmp,codVidaEmp,denoDef,
    debugQuery:on,
    q:{!edismax qf=$tipoDeno^5 pf=$tipoDeno^30 ps=5 qs=1}construcciones garcía,
    tipoDeno:deno,
    f.domicilio.hl.alternateField:domicilioDef,
    fq:-codTpoNif:F}},

the highlighting section of the response looks like:

  highlighting:{
    75663:{
      domicilio:[P45 FOO BAR],
      deno:[V00T06 <em>FOO BAR</em>]},
    76021:{
      domicilio:[P45 BLAH BLAH],
      deno:[V00T00 BLAH BLAH]},

But if I repeat the query with:

  q:{!edismax qf='$tipoDeno^5 ANOTHER_FIELD' pf=$tipoDeno^30 ps=5 qs=1} construcciones garcía
  tipoDeno = deno

the debug output shows:

  parsedquery:+((DisjunctionMaxQuery((deno:construcciones)) DisjunctionMaxQuery((deno:garcia)))~2),
  parsedquery_toString:+(((deno:construcciones) (deno:garcia))~2),

There is no reference to the ANOTHER_FIELD field, and the highlight for the deno field disappears from the response:

  highlighting:{
    75663:{
      domicilio:[P45 FOO BAR],
    76021:{
      domicilio:[P45 BLAH BLAH],
Re: Matching on a multi valued field
I have not found any solution to this. The only option is to denormalize your multivalued field into several docs, each with a single-valued field. Try ComplexPhraseQueryParser (https://issues.apache.org/jira/browse/SOLR-1604) if you are using Solr 1.4.

On 04/04/2011, at 21:21, Brian Lamb wrote:

I just noticed Juan's response and I find that I am encountering that very issue in a few cases. Boosting is a good way to put the more relevant results at the top, but is it possible to have only the correct results returned?

On Wed, Mar 30, 2011 at 11:51 AM, Brian Lamb brian.l...@journalexperts.com wrote:

Thank you all for your responses. The field had already been set up with positionIncrementGap=100 so I just needed to add in the slop.
Re: Matching on a multi valued field
"A multiValued field is actually a single field with all data separated with positionIncrement. Try setting that value high enough and use a PhraseQuery."

That is true, but you cannot do things like:

  q="bar* foo*"~10

with the default query parser, and if you use dismax you will have the same problem with multivalued fields. Imagine this situation:

  Doc1: field A: [foo bar, dooh] (2 values)
  Doc2: field A: [bar dooh, whatever] (another 2 values)

The query:

  qt=dismax
  qf=fieldA
  q=(bar dooh)

will return both Doc1 and Doc2. The only thing you can do in this situation is boost the phrase match with the pf parameter, so that Doc2 comes first in the results:

  pf=fieldA^1

Thanks,
JP.

On 29/03/2011, at 23:14, Markus Jelsma wrote:

orly, all replies came in while sending =)

Hi,

Your filter query is looking for a match of "man's friend" in a single field. Regardless of the analysis of the common_names field, all terms are present in the common_names field of both documents.

A multiValued field is actually a single field with all data separated with positionIncrement. Try setting that value high enough and use a PhraseQuery.

That should work.

Cheers,

Hi all,

I have a field set up like this:

  <field name="common_names" multiValued="true" type="text" indexed="true" stored="true" required="false" />

And I have some records:

  RECORD1
  <arr name="common_names">
    <str>man's best friend</str>
    <str>pooch</str>
  </arr>

  RECORD2
  <arr name="common_names">
    <str>man's worst enemy</str>
    <str>friend to no one</str>
  </arr>

Now if I do a search such as:

  http://localhost:8983/solr/search/?q=*:*&fq={!q.op=AND df=common_names}man's friend

both records are returned. However, I only want RECORD1 returned. I understand why RECORD2 is returned, but how can I structure my query so that only RECORD1 is returned?

Thanks,
Brian Lamb
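The position-gap idea from Markus's reply can be illustrated with a small simulation. The Python sketch below is a simplified model, not Lucene's actual matching code: it splits on whitespace, only handles a two-term phrase in the order given, and treats the slop as the number of positions allowed between the two terms. The helper names index_positions and phrase_matches and the slop of 10 are assumptions for illustration.

```python
def index_positions(values, gap=100):
    """Assign token positions across the values of a multivalued field,
    leaving `gap` extra positions between consecutive values."""
    positions = {}
    pos = -1
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # models positionIncrementGap between values
        for token in value.lower().split():
            pos += 1
            positions.setdefault(token, []).append(pos)
    return positions


def phrase_matches(values, first, second, slop, gap=100):
    """Simplified in-order sloppy phrase check for two terms."""
    positions = index_positions(values, gap)
    return any(
        0 <= p2 - p1 - 1 <= slop
        for p1 in positions.get(first, [])
        for p2 in positions.get(second, [])
    )


RECORD1 = ["man's best friend", "pooch"]
RECORD2 = ["man's worst enemy", "friend to no one"]

# With a gap of 100 and a slop of 10, only RECORD1 matches.
print(phrase_matches(RECORD1, "man's", "friend", slop=10))
print(phrase_matches(RECORD2, "man's", "friend", slop=10))
# With no gap, the phrase can match across value boundaries.
print(phrase_matches(RECORD2, "man's", "friend", slop=10, gap=0))
```

In this model, a slop smaller than the gap keeps a phrase from matching across two values of the same field, which is the effect the reply describes.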