RE: problems with search in solr

2012-03-22 Thread Juan Pablo Mora
Remove the stemmer filter. If you use the stemmer filter, caso and casa are both
transformed into cas, so a search for one also matches the other.

Put simply:
Remove the stemmer filter. It is used to extract the root of words, but
in your case the root of casa and caso is the same: cas.

Regards.


From: PINA CORONADO, RAFAEL [rafael.p...@carm.es]
Sent: Thursday, March 22, 2012 13:38
To: solr-user@lucene.apache.org
Subject: problems with search in solr

Good morning,
I am having problems with the results Solr returns for a search string (e.g. caso). It returns
records with similar terms (in this example, the same results as a search for
casa).
The Solr version is 1.4.1.
The definition of the text type in schema.xml is:

<fieldtype name="text" class="solr.TextField">
  <analyzer>
    <charFilter class="solr.HTMLStripCharFilterFactory"/>
    <charFilter class="solr.MappingCharFilterFactory"
                mapping="mapping-ISOLatin1Accent.txt"/>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldtype>

Could you tell me whether there is an error in the configuration, and how to solve it?

thanks

=
Rafael Pina Coronado
Servicio de Informática.
Archivo General de la Región de Murcia
Email: rafael.p...@carm.es
==



RE: SOLR 3.3 DIH and Java 1.6

2012-03-20 Thread Juan Pablo Mora
Some versions of OpenJDK don't include the Rhino engine needed to run JavaScript
in the DataImportHandler. You have to use the Oracle JDK.

Juampa.

From: randolf.julian [randolf.jul...@dominionenterprises.com]
Sent: Tuesday, March 20, 2012 5:41
To: solr-user@lucene.apache.org
Subject: SOLR 3.3 DIH and Java 1.6

I am trying to use the data import handler to update SOLR index with Oracle
data. In the SOLR schema, a dynamic field called PHOTO_* has been defined. I
created a script transformer:

  script


and called it in a query:

   <entity name="photo" transformer="script:pivotPhotos"
   query="select
p.path||','||p.photo_barcode||','||p.display_order REC_PHOTO,
 lpad(p.display_order,3,'0') SEQUENCE_NUMBER
from traderadm.photo p
where p.realm_id = '${ad.REALM_ID}'
  and p.ad_id = '${ad.AD_ID}'
order by p.display_order"/>

However, whenever I run a full import, it fails with this error in the
solr0.log file:

Full Import
failed:org.apache.solr.handler.dataimport.DataImportHandlerException:
<script> can be used only in java 6 or above

Here's the output of my java version:

$ java -version
java version 1.6.0_0
OpenJDK Runtime Environment (IcedTea6 1.6) (rhel-1.13.b16.el5-x86_64)
OpenJDK 64-Bit Server VM (build 14.0-b16, mixed mode)

I believe we are using java 6.

I am lost with this error and need help on why this is happening.

Thanks.

- Randolf


--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR-3-3-DIH-and-Java-1-6-tp3841355p3841355.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Solr Optimization Fail

2011-12-16 Thread Juan Pablo Mora
Maybe you are generating a snapshot of your index as part of the optimize?
Look for postCommit or postOptimize events in your solrconfig.xml.


From: Rajani Maski [rajinima...@gmail.com]
Sent: Friday, December 16, 2011 11:11
To: solr-user@lucene.apache.org
Subject: Solr Optimization Fail

Hi,

 When we optimize, it should reduce the data size, right?

I have an index of 6 GB (5 million documents). The index was already created,
with commits for every 1 documents.

Now I tried to optimize with the HTTP optimize command. When I
did that, the data size became 12 GB. Why might this have happened?

Can anyone please suggest a fix for it?

Thanks
Rajani


Re: Grouping or Facet ?

2011-12-09 Thread Juan Pablo Mora
Sorry if I didn't explain my problem clearly...

I need to build a name suggester based on a prefix. My data come from two
categories of people, for example admins and developers. So when the client
types SAN, my results should be:

Prefix: San
Developers: Sanchez Garcia, Juan (5)
            Sanchez Roman, Ivan (2)
            San...

Admins:     Sanchez, Pedro (7)
            Sanchez Garcia, Javier (2)


The more common a name is, the higher its position should be. So I think it is not
possible to do that with grouping. So my final schema will be:

- id
- nameDeveloper or nameAdmin: both string fields, but only one of them has a value
  in each document.

And my faceted query will be:

/?q=*:*&facet=true&facet.field=nameDeveloper&facet.field=nameAdmin&facet.prefix=SAN&facet.mincount=1


If I try to do that with grouping, I need something like
group.pivot=category,name, which is not possible in Solr yet.
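The ranking described above (most common names first, separately per category) is what facet counts give you when they are sorted by count. As a rough illustration only, here is a toy Python model of facet.field combined with facet.prefix and facet.mincount over an in-memory list of documents. The documents are hypothetical, and this is not how Solr computes facets internally.

```python
from collections import Counter

def facet_counts(docs, field, prefix="", mincount=1):
    """Toy model of facet.field on a string field: for each distinct value
    starting with `prefix`, count how many matching documents contain it."""
    counts = Counter()
    for doc in docs:
        # set(): a document adds at most 1 to each value's count
        for value in set(doc.get(field, [])):
            # startswith() is case-sensitive, like prefix matching on an unanalyzed field
            if value.startswith(prefix):
                counts[value] += 1
    kept = [(value, n) for value, n in counts.items() if n >= mincount]
    # Most frequent first; ties are broken alphabetically in this sketch
    return sorted(kept, key=lambda pair: (-pair[1], pair[0]))

# Hypothetical matching documents using the two-field schema above
docs = [
    {"id": 1, "nameDeveloper": ["Sanchez Garcia, Juan"]},
    {"id": 2, "nameDeveloper": ["Sanchez Garcia, Juan"]},
    {"id": 3, "nameDeveloper": ["Sanchez Roman, Ivan"]},
    {"id": 4, "nameAdmin": ["Sanchez, Pedro"]},
    {"id": 5, "nameAdmin": ["Perez, Luis"]},
]

for field in ("nameDeveloper", "nameAdmin"):
    print(field, facet_counts(docs, field, prefix="San"))
```

Run against a single document whose field holds ten distinct values, the same function returns ten entries with a count of 1 each. This matches the behaviour Erick describes in the quoted discussion below: a facet count is a number of documents per value, not a number of values per document.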


Best,
Juampa.



On 08/12/2011, at 02:23, Darren Govoni wrote:

 Yes. That's what I would expect. I guess I didn't understand when you said
 
 The facet counts are the counts of the *values* in that field
 
 Because it seems it's the count of the number of matching documents,
 irrespective of how many values each one has: if one document has 20 values
 for that field and another has 10, the facet count will be 2,
 one for each document in the results.
 
 On 12/07/2011 09:04 AM, Erick Erickson wrote:
 In your example you'll have 10 facets returned each with a value of 1.
 
 Best
 Erick
 
 On Tue, Dec 6, 2011 at 9:54 AM, dar...@ontrenet.com wrote:
 Sorry to jump into this thread, but are you saying that the facet count is
 not # of result hits?
 
 So if I have 1 document with field CAT that has 10 values and I do a query
 that returns this 1 document with faceting, that the CAT facet count will
 be 10 not 1? I don't seem to be seeing that behavior in my app (Solr 3.5).
 
 Thanks.
 
 OK, I'm not understanding here. You get the counts and the results if you
 facet on a single category field. The facet counts are the counts of the
 *values* in that field. So it would help me if you showed the output of
 faceting on a single category field and why that didn't work for you.
 
 But either way, faceting will probably outperform grouping.
 
 Best
 Erick
 



RE: Grouping or Facet ?

2011-12-05 Thread Juan Pablo Mora
Because I need both the counts and the results to return to the client side. Both
grouping and faceting offer me a way to do that, but my doubt is
about performance...

With Grouping my results are:

grouped:{
category:{
  matches: ...,
  groups:[{
  groupValue:categoryXX,
  doclist:{numFound:Important_number,start:0,docs:[
  {
   doc:id
   category:XX
  }  
   groupValue:categoryYY,
  doclist:{numFound:Important_number,start:0,docs:[
  {
   doc: id
   category:YY
  }  

And with faceting my results are:
facet.prefix=whatever
facet_counts:{
facet_queries:{},
facet_fields:{
  namesXX:[
whatever_name_in_category,76,
...
  namesYY:[
whatever_name_in_category,76,
...

Both results are OK to me.
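On the client side, the faceting output above has to be turned back into (name, count) pairs. By default (json.nl=flat), Solr's JSON writer renders each facet_fields entry as a flat list that alternates value and count, as in namesXX above. A minimal Python sketch, assuming a response body shaped like the one shown; the field names and values are placeholders, not real data.

```python
import json

# Hypothetical response body shaped like the faceting output above
RAW = """{"facet_counts": {"facet_queries": {},
  "facet_fields": {"namesXX": ["whatever_name_in_category", 76, "another_name", 3],
                   "namesYY": ["whatever_name_in_category", 76]}}}"""

def facet_pairs(flat):
    """Turn a flat [value1, count1, value2, count2, ...] list into (value, count) pairs."""
    return list(zip(flat[0::2], flat[1::2]))

def suggestions(response):
    """Map each faceted field (one per category) to its list of (name, count) pairs."""
    fields = response["facet_counts"]["facet_fields"]
    return {field: facet_pairs(flat) for field, flat in fields.items()}

print(suggestions(json.loads(RAW)))
```

Because each category has its own facet field, the client gets one ready-sorted suggestion list per category without any grouping step.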



From: Erick Erickson [erickerick...@gmail.com]
Sent: Monday, December 05, 2011 14:48
To: solr-user@lucene.apache.org
Subject: Re: Grouping or Facet ?

Why not just use the first form of the document
and just facet.field=category? You'll get
two different facet counts for XX and YY
that way.

I don't think grouping is the way to go here.

Best
Erick



Grouping or Facet ?

2011-12-03 Thread Juan Pablo Mora
I need to compute some counts on a StrField to suggest options from two
different categories, and I don't know which option is best:

My schema looks like this:

- id
- name
- category: XX or YY

With grouping I do:

http://localhost:8983/?q=name:prefix*&group=true&group.field=category

But I can change my schema to:

- id
- nameXX
- nameYY
- category: XX or YY (only one of nameXX or nameYY has a value)

With faceting:
http://localhost:8983/?q=*:*&facet=true&facet.field=nameXX&facet.field=nameYY&facet.prefix=prefix


Which option has the best performance?

Best,
Juampa.

Highlight, Dismax and local params

2011-04-18 Thread Juan Pablo Mora
Hello,

I think I have found something strange with local params and edismax. If I run
queries like:


params:{
  hl.requireFieldMatch:true,
  hl.fragsize:200,
  json.wrf:callback0,
  indent:on,
  hl.fl:domicilio,deno,
  wt:json,
  hl:true,
  rows:5,
  fl:oidEmpresa,codNif,codTpoEmp,codVidaEmp,denoDef,
  debugQuery:on,
  q:{!edismax qf=$tipoDeno^5 pf=$tipoDeno^30 ps=5 qs=1}construcciones 
garcía,
  tipoDeno:deno,
  f.domicilio.hl.alternateField:domicilioDef,
  fq:-codTpoNif:F}},

The highlighting section of the response looks like:


highlighting:{
75663:{
  domicilio:[P45 FOO BAR],
  deno:[V00T06 <em>FOO BAR</em>]},
76021:{
  domicilio:[P45 BLAH BLAH],
  deno:[V00T00 BLAH BLAH]},

But if I repeat the query with:

 q:{!edismax qf='$tipoDeno^5 ANOTHER_FIELD' pf=$tipoDeno^30 ps=5 qs=1} 
construcciones garcía
 tipoDeno = deno


The debug output shows:

parsedquery:+((DisjunctionMaxQuery((deno:construcciones)) 
DisjunctionMaxQuery((deno:garcia)))~2),
parsedquery_toString:+(((deno:construcciones) (deno:garcia))~2),

There is no reference to the anotherField field, and the highlighting of the
deno field disappears from the response.


highlighting:{
75663:{
  domicilio:[P45 FOO BAR],
76021:{
  domicilio:[P45 BLAH BLAH],





Re: Matching on a multi valued field

2011-04-04 Thread Juan Pablo Mora
I have not found any solution to this. The only option is to denormalize your
multivalued field into several documents, each with a single-valued field.

Try ComplexPhraseQueryParser (https://issues.apache.org/jira/browse/SOLR-1604)
if you are using Solr 1.4.


On 04/04/2011, at 21:21, Brian Lamb wrote:

I just noticed Juan's response, and I find that I am encountering that very
issue in a few cases. Boosting is a good way to put the more relevant results
at the top, but is it possible to have only the correct results returned?

On Wed, Mar 30, 2011 at 11:51 AM, Brian Lamb 
brian.l...@journalexperts.com wrote:
Thank you all for your responses. The field had already been set up with 
positionIncrementGap=100 so I just needed to add in the slop.
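A toy Python model of the fix described here (a large positionIncrementGap plus slop on the phrase), using the two records from the original question. It is a simplified sketch, not Lucene's implementation: tokenization is a plain lowercase whitespace split (the real text analyzer chain tokenizes differently), and the slop check only approximates Lucene's sloppy phrase matching. The slop of 10 is an arbitrary value below the gap.

```python
import itertools

POSITION_INCREMENT_GAP = 100  # the value Brian's field already used

def index_positions(values, gap=POSITION_INCREMENT_GAP):
    """Toy model of a multiValued field: all values share one position
    sequence, with `gap` extra positions inserted between values."""
    positions, pos = {}, 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap  # jump between consecutive values
        for token in value.lower().split():
            positions.setdefault(token, []).append(pos)
            pos += 1
    return positions

def sloppy_phrase_match(positions, terms, slop):
    """Simplified sloppy-phrase test: shift each term's position by its offset
    in the phrase and accept if the spread (max - min) is within `slop`."""
    candidates = [positions.get(t, []) for t in terms]
    for combo in itertools.product(*candidates):  # empty if any term is missing
        offsets = [p - k for k, p in enumerate(combo)]
        if max(offsets) - min(offsets) <= slop:
            return True
    return False

record1 = ["man's best friend", "pooch"]
record2 = ["man's worst enemy", "friend to no one"]
phrase = ["man's", "friend"]

for name, values in [("RECORD1", record1), ("RECORD2", record2)]:
    print(name, sloppy_phrase_match(index_positions(values), phrase, slop=10))
```

Calling `index_positions(record2, gap=0)` shows why the gap matters. Without it, the two values of RECORD2 run together, and "man's" and "friend" end up close enough for the phrase to match within the slop.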








Re: Matching on a multi valued field

2011-03-29 Thread Juan Pablo Mora
 A multiValued field
 is actually a single field with all data separated with positionIncrement.
 Try setting that value high enough and use a PhraseQuery.


That is true but you cannot do things like:

q="bar* foo*"~10 with the default query parser.

And if you use dismax, you will have the same problems with multivalued fields.
Imagine this situation:

Doc1:
field A: [foo bar,dooh] 2 values

Doc2:
field A: [bar dooh, whatever] Another 2 values

the query:
qt=dismax  qf= fieldA  q = ( bar dooh )

will return both Doc1 and Doc2. The only thing you can do in this situation is
boost phrase matches with the pf parameter, so that Doc2 appears in the first
position of the results:

pf = fieldA^1
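A toy Python model of the situation above. It assumes every query term is required and that the field has a positionIncrementGap of 100. The point it shows is that pf does not filter anything: both documents contain "bar" and "dooh", and the phrase clause only pushes the document where they are adjacent to the top. The indexing is a simplified sketch, not Lucene's.

```python
def index_positions(values, gap=100):
    """Toy model of a multiValued field: one position sequence for all values,
    with `gap` positions skipped between values (assumed gap of 100)."""
    positions, pos = {}, 0
    for i, value in enumerate(values):
        if i > 0:
            pos += gap
        for token in value.lower().split():
            positions.setdefault(token, []).append(pos)
            pos += 1
    return positions

def matches_all_terms(positions, terms):
    # Term-level match: every term somewhere in the field, values ignored
    return all(t in positions for t in terms)

def exact_phrase(positions, terms):
    # Phrase match (slop 0): the terms at consecutive positions
    return any(all(start + k in positions.get(t, []) for k, t in enumerate(terms))
               for start in positions.get(terms[0], []))

docs = {"Doc1": ["foo bar", "dooh"], "Doc2": ["bar dooh", "whatever"]}
terms = ["bar", "dooh"]

hits = [d for d, vals in docs.items() if matches_all_terms(index_positions(vals), terms)]
# The phrase boost only adds score, so phrase matches sort first; nothing is removed
ranked = sorted(hits, key=lambda d: not exact_phrase(index_positions(docs[d]), terms))
print(hits, ranked)
```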


Thanks,
JP.


On 29/03/2011, at 23:14, Markus Jelsma wrote:

 orly, all replies came in while sending =)
 
 Hi,
 
 Your filter query is looking for a match of man's friend in a single
 field. Regardless of analysis of the common_names field, all terms are
 present in the common_names field of both documents. A multiValued field
 is actually a single field with all data separated with positionIncrement.
 Try setting that value high enough and use a PhraseQuery.
 
 That should work
 
 Cheers,
 
 Hi all,
 
 I have a field set up like this:
 
 <field name="common_names" multiValued="true" type="text" indexed="true"
 stored="true" required="false" />
 
 And I have some records:
 
 RECORD1
 <arr name="common_names">
   <str>man's best friend</str>
   <str>pooch</str>
 </arr>
 
 RECORD2
 <arr name="common_names">
   <str>man's worst enemy</str>
   <str>friend to no one</str>
 </arr>
 
 Now if I do a search such as:
 http://localhost:8983/solr/search/?q=*:*&fq={!q.op=AND
 df=common_names}man's friend
 
 Both records are returned. However, I only want RECORD1 returned. I
 understand why RECORD2 is returned but how can I structure my query so
 that only RECORD1 is returned?
 
 Thanks,
 
 Brian Lamb