date:20120704

Couple questions:
1) Why are you explicitly telling solr to sort by score desc,
shouldn't it do that for you? Could this be a source of performance
problems since sorting requires the loading of the field caches?
2) Of the query parameters, q1 and q2, which one is actually doing
text searching on your index? It looks like q1 is doing non-string
related stuff, could this be better handled in either the bf or bq
section of the edismax config? Looking at the sample though I don't
understand how q1=apartment would hit non-string fields again (but see
#3)
3) Are the string fields literally of string type (i.e. no analysis
on the field) or are you saying string loosely to mean text field.
pf == phrase fields == given a multiple word query, will ensure that
the specified phrase exists in the specified fields separated by some
slop (hello my world may match hello world depending on this slop
value). The qf means that given a multi term query, each term exists
in the specified fields (name, description whatever text fields you
want).

Best
Amit

On Mon, Jul 2, 2012 at 9:35 AM, Chamnap Chhorn chamnapchh...@gmail.com wrote:
 Hi all,

 I'm using solr 3.5 with nested query on the 4 core cpu server + 17 Gb. The
 problem is that my query is so slow; the average response time is 12 secs
 against 13 millions documents.

 What I am doing is to send quoted string (q2) to string fields and
 non-quoted string (q1) to other fields and combine the result together.

 facet=truesort=score+descq2=*apartment*facet.mincount=1q1=*apartment*
 tie=0.1q.alt=*:*wt=jsonversion=2.2rows=20fl=uuidfacet.query=has_map:+truefacet.query=has_image:+truefacet.query=has_website:+truestart=0q=
 *
 _query_:+{!dismax+qf='.'+fq='..'+v=$q1}+OR+_query_:+{!dismax+qf='..'+fq='...'+v=$q2}
 *
 facet.field={!ex%3Ddt}sub_category_uuidsfacet.field={!ex%3Ddt}location_uuid

 I have done solr optimize already, but it's still slow. Any idea how to
 improve the speed? Am I done anything wrong?

 --
 Chhorn Chamnap
 http://chamnap.github.com/

Use of Solr as primary store for search engine

Hello all,

I am curious to know how people are using Solr in conjunction with
other data stores when building search engines to power web sites (say
an ecommerce site). The question I have for the group is given an
architecture where the primary (transactional) data store is MySQL
(Oracle, PostGres whatever) with periodic indexing into Solr, when
your front end issues a search query to Solr and returns results, are
there any joins with your primary Oracle/MySQL etc to help render
results?

Basically I guess my question is whether or not you store enough in
Solr so that when your front end renders the results page, it never
has to hit the database. The other option is that your search engine
only returns primary keys that your front end then uses to hit the DB
to fetch data to display to your end user.

With Solr 4.0 and Solr moving towards the NoSQL direction, I am
curious what people are doing and what application architectures with
Solr look like.

Thanks!
Amit

Re: Use of Solr as primary store for search engine

2012-07-04 Thread Paul Libbrecht

Amit,

not exactly a response to your question but doing this with a lucene index on 
i2geo.net has resulted in considerably performance boost (reading from 
stored-fields instead of reading from the xwiki objects which pull from the SQL 
database). However, it implied that we had to rewrite anything necessary for 
the rendering, hence the rendering has not re-used that many code.

Paul


Le 4 juil. 2012 à 09:54, Amit Nithian a écrit :

 Hello all,
 
 I am curious to know how people are using Solr in conjunction with
 other data stores when building search engines to power web sites (say
 an ecommerce site). The question I have for the group is given an
 architecture where the primary (transactional) data store is MySQL
 (Oracle, PostGres whatever) with periodic indexing into Solr, when
 your front end issues a search query to Solr and returns results, are
 there any joins with your primary Oracle/MySQL etc to help render
 results?
 
 Basically I guess my question is whether or not you store enough in
 Solr so that when your front end renders the results page, it never
 has to hit the database. The other option is that your search engine
 only returns primary keys that your front end then uses to hit the DB
 to fetch data to display to your end user.
 
 With Solr 4.0 and Solr moving towards the NoSQL direction, I am
 curious what people are doing and what application architectures with
 Solr look like.
 
 Thanks!
 Amit

Re: Something like 'bf' or 'bq' with MoreLikeThis

Thanks a lot, Amit! Please bear with me, I am a new Solr dev, could you
please shed me some light on how to use a patch? point me to a wiki/doc is
fine too. Thanks a lot! :)

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Something-like-bf-or-bq-with-MoreLikeThis-tp3989060p3992935.html
Sent from the Solr - User mailing list archive at Nabble.com.

How to change tmp directory

2012-07-04 Thread Erik Fäßler

Hello all,

I came about an odd issue today when I wanted to add ca. 7M documents to my 
Solr index: I got a SolrServerException telling me No space left on device. I 
had a look at the directory Solr (and its index) is installed in and there is 
plenty space (~300GB).
I then noticed a file named upload_457ee97b_1385125274b__8000_0005.tmp 
had taken up all space of the machine's /tmp directory. The partition holding 
the /tmp directory only has around 1GB of space and this file already took 
nearly 800MB. I had a look at it and I realized that the file contained the 
data I was adding to Solr in an XML format.

Is there a possibility to change the temporary directory for this action?

I use an IteratorSolrInputDocument with the HttpSolrServer's add(Iterator) 
method for performance. So I can't just do commits from time to time.

Best regards,

Erik

Solr: MLT filter by a field in matched doc

MoreLikeThis can return the matched doc. My question is that can i somehow
pass in a query param to indicate that i would like to filter on a field
value of the matched doc? Is this doable? Or, if not doable, what's the work
around? Thanks a lot!

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-MLT-filter-by-a-field-in-matched-doc-tp3992945.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Similarity of numbers in MoreLikeThisHandler

very well explained. However, you dont know the number (integer/float) field
value of a matched in advance. So even suppose the Similarity field is
constructed, how to use it in the query?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Similarity-of-numbers-in-MoreLikeThisHandler-tp486350p3992949.html
Sent from the Solr - User mailing list archive at Nabble.com.

Re: Synonyms and hyphens

2012-07-04 Thread Alireza Salimi

Hi,

Does anybody know why hyphen '-' and q.op=AND causes such a big difference
between the two queries? I thought hyphens are removed by StandardTokenizer
which means theoretically the two queries should be the same!

Thanks

On Tue, Jul 3, 2012 at 4:05 PM, Alireza Salimi alireza.sal...@gmail.comwrote:

 Hi,

 I'm not sure if anybody has experienced this behavior before or not.
 I noticed that 'hyphen' plays a very important role here.
 I used Solr's default example directory.

 http://localhost:8983/solr/select/?q=name:(gb-mb)version=2.2start=0rows=10indent=ondebugQuery=onindent=onwt=jsonq.op=AND
 results in  parsedquery:+name:gb +name:gib +name:gigabyte
 +name:gigabytes +name:mb +name:mib +name:megabyte +name:megabytes,

 While searching 
 http://localhost:8984/solr/select/?q=name:(gbmb)version=2.2start=0rows=10indent=ondebugQuery=onindent=onwt=jsonq.op=AND
 results in parsedquery:+(name:gb name:gib name:gigabyte
 name:gigabytes) +(name:mb name:mib name:megabyte name:megabytes),

 If you notice to the first query - with hyphens - you can see that the
 results of
 parsing is totally different. I know that hyphens are special characters
 in Solr,
 but there's no way that the first query returns any entry because it's
 asking for
 ALL synonyms.

 Am I missing something here?

 Thanks


 --
 Alireza Salimi
 Java EE Developer





-- 
Alireza Salimi
Java EE Developer

Re: how Solr/Lucene can support standard join operation

2012-07-04 Thread Mikhail Khludnev

FYI,

If denormalization doesn't work for you, check index time join
http://blog.mikemccandless.com/2012/01/searching-relational-content-with.html.
here is the scratch for query and index time support:
https://issues.apache.org/jira/browse/SOLR-3076
https://issues.apache.org/jira/browse/SOLR-3535

On Wed, Jun 27, 2012 at 3:47 PM, Lee Carroll
lee.a.carr...@googlemail.comwrote:

 Sorry you have that link! and I did not see the question - apols

 index schema could look something like:

 id
 name
 classList - multi value
 majorClassList - multi value

 a standard query would do the equivalent of your sql

 again apols for not seeing the link

 lee c



 On 27 June 2012 12:37, Lee Carroll lee.a.carr...@googlemail.com wrote:
  In your example de-normalising would be fine in a vast number of
  use-cases. multi value fields are fine.
 
  If you really want to, see http://wiki.apache.org/solr/Join but make
  sure you loose the default relational dba world view first
  and only go down that route if you need to.
 
 
 
  On 27 June 2012 12:27, Robert Yu robert...@morningstar.com wrote:
  The ability of join operation supported as what
 http://wiki.apache.org/solr/Join says is so limited.
  I'm thinking how to support standard join operation in Solr/Lucene
 because not all can be de-normalized efficiently.
 
  Take 2 schemas below as an example:
 
  (1)Student
  sid
  name
  cid// class id
 
  (2)class
 
  cid
 
  name
 
  major
  In SQL, it will be easy to get all students' name and its class name
 where student's name start with 'p' and class's major is CS.
  Select s.name, c.name from student s, class c where s.namelike 
  'p%' and c.major = CS.
 
  How Solr/Lucene support the above query? It seems they do not.
 
  Thanks,
  
  Robert Yu
  Application Service - Backend
  Morningstar Shenzhen Ltd.
  Morningstar. Illuminating investing worldwide.
 
  +86 755 3311-0223 voice
  +86 137-2377-0925 mobile
  +86 755 - fax
  robert...@morningstar.commailto:robert...@morningstar.com
  8FL, Tower A, Donghai International Center ( or East Pacific
 International Center)
  7888 Shennan Road, Futian district,
  Shenzhen, Guangdong province, China 518040
 
  http://cn.morningstar.comhttp://cn.morningstar.com/
 
  This e-mail contains privileged and confidential information and is
 intended only for the use of the person(s) named above. Any dissemination,
 distribution, or duplication of this communication without prior written
 consent from Morningstar is strictly prohibited. If you have received this
 message in error, please contact the sender immediately and delete the
 materials from any computer.
 




-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com

Re: Solr 3.6 issue - DataImportHandler with CachedSqlEntityProcessor not importing all multi-valued fields

2012-07-04 Thread Mikhail Khludnev

It's hard to troubleshoot without debug logs. Pls pay attention that
regular configuration for CachedSqlEP is slightly different

http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
see

  where=xid=x.id



On Wed, Jun 27, 2012 at 2:29 AM, ps_sra praveens1...@yahoo.com wrote:

 Not sure if this is the right forum to post this question.  If not, please
 excuse.

 I'm trying to use the DataImportHandler with
 processor=CachedSqlEntityProcessor to speed up import from an RDBMS.
 While
 processor=CachedSqlEntityProcessor is much faster than
 processor=SqlEntityProcessor, the resulting Solr index does not contain
 multi-valued fields on sub-entities.

 So, for example, my db-data-config.xml has the following structure:

 document
 ..
 entity name=foo  pk=id

 processor=SqlEntityProcessor
 query=SELECT
 f.id AS foo_id,

   f.name AS foo_name
  FROM
   foo f
  
 field column=foo_id name=foo_id /
 field column=foo_name name=foo_name /


 entity name=bar
 processor=CachedSqlEntityProcessor

 query=SELECT   b.name as bar_name

   FROMbar b

  WHEREb.id = '${foo.id}'
 
  field column=bar_name name=bar_name
 /
 /entity

 /entity
 ..
 /document

 where the database relationship foo:bar is 1:m.

 The issue is that when I import with processor=SqlEntityProcessor ,
 everything works fine and the multi-valued field - bar_name has multiple
 values, while importing with processor=CachedSqlEntityProcessor does not
 even create the bar_name field in the index.

 I've deployed Solr 3.6 on Weblogic 11g, with the patch
 https://issues.apache.org/jira/browse/SOLR-3360 applied.

 Any help on this issue is appreciated.


 Thanks,
 ps

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-3-6-issue-DataImportHandler-with-CachedSqlEntityProcessor-not-importing-all-multi-valued-fields-tp3991449.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Sincerely yours
Mikhail Khludnev
Tech Lead
Grid Dynamics

http://www.griddynamics.com
 mkhlud...@griddynamics.com

Re: Elevation togehter with grouping

2012-07-04 Thread tushar_k47

Hi,

I am facing an identical problem. Does anyone have any pointers on this ?

Regards,
Tushar

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Elevation-togehter-with-grouping-tp3916981p3992925.html
Sent from the Solr - User mailing list archive at Nabble.com.

WordDelimiterFilter removes ampersands

2012-07-04 Thread Stephen Lacy

If a user writes a query Apples  Oranges the word delimiter filter
factory will change this into Apples Oranges
Which isn't very useful for me as I'd prefer especially when the phrase is
wrapped in quotes that the original is preserved.
However I still want to be able to separate ApplesOranges into Apples 
Oranges so preserveOriginal isn't really useful.
What I really would like to be able to do is tell WordDelimeterFilter to
treat it like it's neither alpha nor numeric, however
that doesn't mean that you remove it completely.

Thanks for your help
Stephen

Get all matching terms of an OR query

2012-07-04 Thread Michael Jakl

Hi,
is there an easy way to get the matches of an OR query?

If I'm searching for android OR google OR apple OR iphone OR -ipod,
I'd like to know which of these terms document X contains.

I've been using debugQuery and tried to extract the info from the
explain information, unfortunately this is too slow and I'm having
troubles with the stemming of the query.

Using the highlight component doesn't work either because my fields
aren't stored (would the highlighter work with stemmed texts?)

We're using Solr 3.6 in a distributed setting. I'd like to prevent
storing the texts because of space issues, but if that's the only
reasonable solution... .

Thank you,
Michael

Re: WordDelimiterFilter removes ampersands


That's a perfectly reasonable request. But, WDF doesn't have such a feature.

Maybe what is needed is a distinct ampersand filter that runs before WDF 
and detects ampersands that are likely shorthands for and and expands 
them. It would also need to be able to detect ATT (capital letter before 
the ) and not expand it (and you can set up a character type table for WDF 
that treats  as a letter. A single  could also be expanded to and - 
that could also be done with the synonym filter, but that would not help you 
with the embedded  of ApplesOranges.


Maybe a simple character filter that always expands  to  and  would be 
good enough for a lot of common cases, as a rough approximation.


Maybe solr.PatternReplaceCharFilterFactory could be used to accomplish that. 
Match  and replace with  and .


-- Jack Krupansky

-Original Message- 
From: Stephen Lacy

Sent: Wednesday, July 04, 2012 8:16 AM
To: solr-user@lucene.apache.org
Subject: WordDelimiterFilter removes ampersands

If a user writes a query Apples  Oranges the word delimiter filter
factory will change this into Apples Oranges
Which isn't very useful for me as I'd prefer especially when the phrase is
wrapped in quotes that the original is preserved.
However I still want to be able to separate ApplesOranges into Apples 
Oranges so preserveOriginal isn't really useful.
What I really would like to be able to do is tell WordDelimeterFilter to
treat it like it's neither alpha nor numeric, however
that doesn't mean that you remove it completely.

Thanks for your help
Stephen

Urgent:Partial Search not Working

2012-07-04 Thread jayakeerthi s

All,

I am using apache-solr-4.0.0-ALPHA and trying to configure the Partial
search on two fields.

Keywords using to search are
The value inside the search ProdSymbl is M1.6X0.35 9P

and I illl have to get the results if I search for M1.6 or X0.35 (Partial
of the search value).


I have tried using  both NGramTokenizerFactory and solr.EdgeNGramFilterFactory
 in the schema.xml

!-- bigram --
  !--  fieldType name=bigram class=solr.TextField
positionIncrementGap=100
  analyzer type=index
 tokenizer class=solr.NGramTokenizerFactory minGramSize=3
maxGramSize=15 /
  filter class=solr.LowerCaseFilterFactory/
   /analyzer
 analyzer type=query
  tokenizer class=solr.WhitespaceTokenizerFactory /
  filter class=solr.LowerCaseFilterFactory/
  /analyzer

/fieldType --

fieldType name=bigram class=solr.TextField
omitNorms=false
  analyzer
tokenizer class=solr.StandardTokenizerFactory/
filter class=solr.StandardFilterFactory/
filter class=solr.LowerCaseFilterFactory/
filter class=solr.EdgeNGramFilterFactory minGramSize=2
maxGramSize=15 side=front/
  /analyzer
  /fieldType



Fields I have configured as

  field name=prodsymbl type=bigraml indexed=true stored=true
multiValued=true/
   field name=measure1 type=bigram indexed=true stored=true
multiValued=true/

Copy field as

copyField source=prodsymbl dest=text/
   copyField source=measure1 dest=text/



Please let me know IF I and missing anything, this is kind of Urgent
requirement needs to be addressed at the earliest, Please help.


Thanks in advance,

Jay

Boosting the whole documents

2012-07-04 Thread Danilak Michal

Hi,

I have the following problem.
I would like to give a boost to the whole documents as I index them. I am
sending to solr xml in the form:

adddoc boost=2.0/doc/add

But it does't seem to alter the search scores in any way. I would expect
that to multiply the final search score by two, am I correct?
Probably I would need to alter schema.xml, but I found only information on
how to do that for specific fields (just put omitNorms=false into the field
tag). But what should I do, if I want to boost the whole document?

Note: by boosting a whole document I mean, that if document A has search
score 10.0 and document B has search score 15.0 and I give document A the
boost 2.0, when I index it, I would expect its search score to be 20.0.

Thanks in advance!

Michal Danilak

Solr facet multiple constraint

2012-07-04 Thread davidbougearel

Hi,

I'm trying to make a facet search on a multiple value field and add a filter
query on it and it doesn't work.
Could you please help me find my mistake ?

Here is my solr query :

facet=true,sort=publishingdate desc,facet.mincount=1,q=service:1 AND
publicationstatus:LIVE,facet.field={!ex=dt}user,wt=javabin,fq={!tag=dt}user:10,version=2

Thanks in advance for answers, David. 

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-facet-multiple-constraint-tp3992974.html
Sent from the Solr - User mailing list archive at Nabble.com.

fl Parameter and Wildcards for Dynamic Fields

2012-07-04 Thread Josh Harness

I'm using SOLR 3.3 and would like to know how to return a list of dynamic
fields in my search results using a wildcard with the fl parameter. I found
SOLR-2444 https://issues.apache.org/jira/browse/SOLR-2444 but this
appears to be for SOLR 4.0. Am I correct in assuming this isn't doable yet?
Please note that I don't want to query the dynamic fields, I just need them
returned in the search results. Using fl=myDynamicField_* doesn't seem to
work.

Many Thanks!

Josh

Re: leap second bug

2012-07-04 Thread Michael Tsadikov

explanation of the cause:

https://lkml.org/lkml/2012/7/1/203

On Wed, Jul 4, 2012 at 1:48 AM, Óscar Marín Miró
oscarmarinm...@gmail.comwrote:

 So, this was the solution, sorry to post it so late, just in case it helps
 anyone:

 /etc/init.d/ntp stop; date; date `date +%m%d%H%M%C%y.%S`; date;
 /etc/init.d/ntp start

 And tomcat magically switched from 100% CPU to 0.5% :)

 From:


 https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/_I1_OfaL7QY

 [from Michael McCandless help on this thread]

 On Sun, Jul 1, 2012 at 6:15 PM, Jack Krupansky j...@basetechnology.com
 wrote:

  Interesting:
 
  
  The sequence of dates of the UTC second markers will be:
 
  2012 June 30, 23h 59m 59s
  2012 June 30, 23h 59m 60s
  2012 July 1, 0h 0m 0s
  
 
  See:
  http://wwp.greenwichmeantime.**com/info/leap-second.htm
 http://wwp.greenwichmeantime.com/info/leap-second.htm
 
  So, there were two consecutive second  markers which were literally
  distinct, but numerically identical.
 
  What design pattern for timing did Linux violate? In other words, what
  lesson should we be learning to assure that we don't have a similar
 problem
  at an application level on a future leap second?
 
  -- Jack Krupansky
 
  -Original Message- From: Óscar Marín Miró
  Sent: Sunday, July 01, 2012 11:02 AM
  To: solr-user@lucene.apache.org
  Subject: Re: leap second bug
 
 
  Thanks Michael, nice information :)
 
  On Sun, Jul 1, 2012 at 5:29 PM, Michael McCandless 
  luc...@mikemccandless.com wrote:
 
   Looks like this is a low-level Linux issue ... see Shay's email to the
  ElasticSearch list about it:
 
 
  https://groups.google.com/**forum/?fromgroups#!topic/**
  elasticsearch/_I1_OfaL7QY
 https://groups.google.com/forum/?fromgroups#!topic/elasticsearch/_I1_OfaL7QY
 
 
  Also see the comments here:
 
   http://news.ycombinator.com/**item?id=4182642
 http://news.ycombinator.com/item?id=4182642
 
  Mike McCandless
 
  http://blog.mikemccandless.com
 
  On Sun, Jul 1, 2012 at 8:08 AM, Óscar Marín Miró
  oscarmarinm...@gmail.com wrote:
   Hello Michael, thanks for the note :)
  
   I'm having a similar problem since yesterday, tomcats are wild on CPU
  [near
   100%]. Did your solr servers did not reply to index/query requests?
  
   Thanks :)
  
   On Sun, Jul 1, 2012 at 1:22 PM, Michael Tsadikov 
  mich...@myheritage.com
  wrote:
  
   Our solr servers went into GC hell, and became non-responsive on date
   change today.
  
   Restarting tomcats did not help.
  
   Rebooting the machine did.
  
  
  
  http://www.wired.com/**wiredenterprise/2012/07/leap-**
  second-bug-wreaks-havoc-with-**java-linux/
 http://www.wired.com/wiredenterprise/2012/07/leap-second-bug-wreaks-havoc-with-java-linux/
 
  
  
  
  
   --
   Whether it's science, technology, personal experience, true love,
   astrology, or gut feelings, each of us has confidence in something
 that
  we
   will never fully comprehend.
--Roy H. William
 
 
 
 
  --
  Whether it's science, technology, personal experience, true love,
  astrology, or gut feelings, each of us has confidence in something that
 we
  will never fully comprehend.
  --Roy H. William
 



 --
 Whether it's science, technology, personal experience, true love,
 astrology, or gut feelings, each of us has confidence in something that we
 will never fully comprehend.
  --Roy H. William

Re: fl Parameter and Wildcards for Dynamic Fields

This appears to be the case. * is the only wildcard supported by fl 
before 4.0.


-- Jack Krupansky

-Original Message- 
From: Josh Harness

Sent: Wednesday, July 04, 2012 9:08 AM
To: solr-user@lucene.apache.org
Subject: fl Parameter and Wildcards for Dynamic Fields

I'm using SOLR 3.3 and would like to know how to return a list of dynamic
fields in my search results using a wildcard with the fl parameter. I found
SOLR-2444 https://issues.apache.org/jira/browse/SOLR-2444 but this
appears to be for SOLR 4.0. Am I correct in assuming this isn't doable yet?
Please note that I don't want to query the dynamic fields, I just need them
returned in the search results. Using fl=myDynamicField_* doesn't seem to
work.

Many Thanks!

Josh

Re: Get all matching terms of an OR query

First, OR -ipod needs to be written as OR (*:* -ipod) due to an ongoing 
deficiency in Lucene query parsing, but I wonder what you really think you 
are OR'ing in that clause - all documents that don't contain ipod? That 
seems odd. Maybe you really want to constrain the preceding query to exclude 
ipod? That would be:


(android OR google OR apple OR iphone) -ipod

-- Jack Krupansky

-Original Message- 
From: Michael Jakl

Sent: Wednesday, July 04, 2012 8:29 AM
To: solr-user@lucene.apache.org
Subject: Get all matching terms of an OR query

Hi,
is there an easy way to get the matches of an OR query?

If I'm searching for android OR google OR apple OR iphone OR -ipod,
I'd like to know which of these terms document X contains.

I've been using debugQuery and tried to extract the info from the
explain information, unfortunately this is too slow and I'm having
troubles with the stemming of the query.

Using the highlight component doesn't work either because my fields
aren't stored (would the highlighter work with stemmed texts?)

We're using Solr 3.6 in a distributed setting. I'd like to prevent
storing the texts because of space issues, but if that's the only
reasonable solution... .

Thank you,
Michael

Javadocs issue on Solr web site

2012-07-04 Thread Ken Krugler

Currently all Javadoc links seem to wind up pointing at the api-4_0_0-ALPHA 
versions - is that expected?

E.g. do a Google search on StreamingUpdateSolrServer. First hit is for 
StreamingUpdateSolrServer (Solr 3.6.0 API)

Follow that link, and you get a 404 for page 
http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.html

-- Ken

--
Ken Krugler
http://www.scaleunlimited.com
custom big data solutions  training
Hadoop, Cascading, Mahout  Solr

Re: Get all matching terms of an OR query

2012-07-04 Thread Michael Jakl

Hi!

On 4 July 2012 17:01, Jack Krupansky j...@basetechnology.com wrote:
 First, OR -ipod needs to be written as OR (*:* -ipod) due to an ongoing
 deficiency in Lucene query parsing, but I wonder what you really think you
 are OR'ing in that clause - all documents that don't contain ipod? That
 seems odd. Maybe you really want to constrain the preceding query to exclude
 ipod? That would be:

 (android OR google OR apple OR iphone) -ipod

Thanks, the example was ill-chosen, the -ipod part shouldn't be there.

After some more tests and research, using the debugQuery method seems
the only viable solution(?)

Cheers,
Michael

Re: Get all matching terms of an OR query

You could always do a custom search component, but all the same information 
(which terms matched) is in the debugQuery. For example, 
queryWeight(text:the) indicates that the appears in the document.


What exactly is it that is too slow?

Yes, you do have to accept that explain uses analyzed terms. I would note 
that you could try to correlate the parsedquery with the original query 
since the parsed query will contain stemmed terms.


It would be nice to have an optional search component or query parser option 
that returned the analyzed term for each query term.


But as things stand, I would suggest that you do your own fuzzy match 
between the debugQuery terms and your source terms. That may not be 100% 
accurate, but probably would cover most/many cases.


-- Jack Krupansky

-Original Message- 
From: Michael Jakl

Sent: Wednesday, July 04, 2012 10:09 AM
To: solr-user@lucene.apache.org
Subject: Re: Get all matching terms of an OR query

Hi!

On 4 July 2012 17:01, Jack Krupansky j...@basetechnology.com wrote:
First, OR -ipod needs to be written as OR (*:* -ipod) due to an 
ongoing

deficiency in Lucene query parsing, but I wonder what you really think you
are OR'ing in that clause - all documents that don't contain ipod? That
seems odd. Maybe you really want to constrain the preceding query to 
exclude

ipod? That would be:

(android OR google OR apple OR iphone) -ipod


Thanks, the example was ill-chosen, the -ipod part shouldn't be there.

After some more tests and research, using the debugQuery method seems
the only viable solution(?)

Cheers,
Michael

Re: WordDelimiterFilter removes ampersands

2012-07-04 Thread Stephen Lacy

 solr.**PatternReplaceCharFilterFactor**y  is a brilliant idea, thanks so
much :)

On Wed, Jul 4, 2012 at 2:46 PM, Jack Krupansky j...@basetechnology.comwrote:

 That's a perfectly reasonable request. But, WDF doesn't have such a
 feature.

 Maybe what is needed is a distinct ampersand filter that runs before WDF
 and detects ampersands that are likely shorthands for and and expands
 them. It would also need to be able to detect ATT (capital letter before
 the ) and not expand it (and you can set up a character type table for WDF
 that treats  as a letter. A single  could also be expanded to and -
 that could also be done with the synonym filter, but that would not help
 you with the embedded  of ApplesOranges.

 Maybe a simple character filter that always expands  to  and  would
 be good enough for a lot of common cases, as a rough approximation.

 Maybe solr.**PatternReplaceCharFilterFactor**y could be used to
 accomplish that. Match  and replace with  and .

 -- Jack Krupansky

 -Original Message- From: Stephen Lacy
 Sent: Wednesday, July 04, 2012 8:16 AM
 To: solr-user@lucene.apache.org
 Subject: WordDelimiterFilter removes ampersands


 If a user writes a query Apples  Oranges the word delimiter filter
 factory will change this into Apples Oranges
 Which isn't very useful for me as I'd prefer especially when the phrase is
 wrapped in quotes that the original is preserved.
 However I still want to be able to separate ApplesOranges into Apples 
 Oranges so preserveOriginal isn't really useful.
 What I really would like to be able to do is tell WordDelimeterFilter to
 treat it like it's neither alpha nor numeric, however
 that doesn't mean that you remove it completely.

 Thanks for your help
 Stephen

Boosting the score of the whole documents

2012-07-04 Thread Danilak Michal

Hi guys,
I have the following problem.

I would like to give a boost to the whole documents as I index them. I am
sending to solr the xml in the form:

adddoc boost=2.0/doc/add

But it does't seem to alter the search scores in any way. I would expect
that to multiply the final search score by two, am I correct?
Probably I would need to alter schema.xml, but I found only information on
how to do that for specific fields (just put omitNorms=false into the field
tag). But what should I do, if I want to boost the whole document?

Note: by boosting a whole document I mean, that if document A has search
score 10.0 and document B has search score 15.0 and I give document A the
boost 2.0, when I index it, I would expect its search score to be 20.0.

Thanks in advance!

Re: How to change tmp directory

Solr is probably simply using Java's temp directory, which you can redefine 
by setting the java.io.tmpdir system property on the java command line or 
using a system-specific environment variable.


-- Jack Krupansky

-Original Message- 
From: Erik Fäßler

Sent: Wednesday, July 04, 2012 3:56 AM
To: solr-user@lucene.apache.org
Subject: How to change tmp directory

Hello all,

I came about an odd issue today when I wanted to add ca. 7M documents to my 
Solr index: I got a SolrServerException telling me No space left on 
device. I had a look at the directory Solr (and its index) is installed in 
and there is plenty space (~300GB).
I then noticed a file named upload_457ee97b_1385125274b__8000_0005.tmp 
had taken up all space of the machine's /tmp directory. The partition 
holding the /tmp directory only has around 1GB of space and this file 
already took nearly 800MB. I had a look at it and I realized that the file 
contained the data I was adding to Solr in an XML format.


Is there a possibility to change the temporary directory for this action?

I use an IteratorSolrInputDocument with the HttpSolrServer's add(Iterator) 
method for performance. So I can't just do commits from time to time.


Best regards,

Erik

Re: difference between stored=false and stored=true ?

1. The useless combination of stored=false and indexed=false is useful to 
ignore fields. You might have input data which has fields that you have 
decided to ignore.


2. Stored fields take up memory for documents (fields) to be returned for 
search results in the Solr query response, so fewer stored fields is better 
for performance and memory usage.


-- Jack Krupansky

-Original Message- 
From: Amit Nithian

Sent: Wednesday, July 04, 2012 12:54 AM
To: solr-user@lucene.apache.org
Subject: Re: difference between stored=false and stored=true ?

So couple questions on this (comment first then question):
1) I guess you can't have four combinations b/c
index=false/stored=false has no meaning?
2) If you set less fields stored=true does this reduce the memory
footprint for the document cache? Or better yet, I can store more
documents in the cache possibly increasing my cache efficiency?

I read about the lazy loading of fields which seems like a good way to
maximize the cache and gain the advantage of storing data in Solr too.

Thanks
Amit

On Sat, Jun 30, 2012 at 11:01 AM, Giovanni Gherdovich
g.gherdov...@gmail.com wrote:

Thank you François and Jack for those explainations.

Cheers,
GGhh

2012/6/30 François Schiettecatte:

Giovanni

stored=true means the data is stored in the index and [...]



2012/6/30 Jack Krupansky:
indexed and stored are independent [...]

Re: Synonyms and hyphens

Terms with embedded special characters are treated as phrases with spaces in 
place of the special characters. So, gb-mb is treated as if you had 
enclosed the term in quotes.


-- Jack Krupansky
-Original Message- 
From: Alireza Salimi

Sent: Wednesday, July 04, 2012 6:50 AM
To: solr-user@lucene.apache.org
Subject: Re: Synonyms and hyphens

Hi,

Does anybody know why hyphen '-' and q.op=AND causes such a big difference
between the two queries? I thought hyphens are removed by StandardTokenizer
which means theoretically the two queries should be the same!

Thanks

On Tue, Jul 3, 2012 at 4:05 PM, Alireza Salimi 
alireza.sal...@gmail.comwrote:



Hi,

I'm not sure if anybody has experienced this behavior before or not.
I noticed that 'hyphen' plays a very important role here.
I used Solr's default example directory.

http://localhost:8983/solr/select/?q=name:(gb-mb)version=2.2start=0rows=10indent=ondebugQuery=onindent=onwt=jsonq.op=AND
results in  parsedquery:+name:gb +name:gib +name:gigabyte
+name:gigabytes +name:mb +name:mib +name:megabyte +name:megabytes,

While searching 
http://localhost:8984/solr/select/?q=name:(gbmb)version=2.2start=0rows=10indent=ondebugQuery=onindent=onwt=jsonq.op=AND

results in parsedquery:+(name:gb name:gib name:gigabyte
name:gigabytes) +(name:mb name:mib name:megabyte name:megabytes),

If you notice to the first query - with hyphens - you can see that the
results of
parsing is totally different. I know that hyphens are special characters
in Solr,
but there's no way that the first query returns any entry because it's
asking for
ALL synonyms.

Am I missing something here?

Thanks


--
Alireza Salimi
Java EE Developer






--
Alireza Salimi
Java EE Developer

Re: Synonyms and hyphens

2012-07-04 Thread Alireza Salimi

Wow, I didn't know that. Is there a way to disable this feature? I mean, is
it something coming from the Analyzer?

On Wed, Jul 4, 2012 at 12:26 PM, Jack Krupansky j...@basetechnology.comwrote:

 Terms with embedded special characters are treated as phrases with spaces
 in place of the special characters. So, gb-mb is treated as if you had
 enclosed the term in quotes.

 -- Jack Krupansky
 -Original Message- From: Alireza Salimi
 Sent: Wednesday, July 04, 2012 6:50 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Synonyms and hyphens


 Hi,

 Does anybody know why hyphen '-' and q.op=AND causes such a big difference
 between the two queries? I thought hyphens are removed by StandardTokenizer
 which means theoretically the two queries should be the same!

 Thanks

 On Tue, Jul 3, 2012 at 4:05 PM, Alireza Salimi alireza.sal...@gmail.com*
 *wrote:

  Hi,

 I'm not sure if anybody has experienced this behavior before or not.
 I noticed that 'hyphen' plays a very important role here.
 I used Solr's default example directory.

 http://localhost:8983/solr/**select/?q=name:(gb-mb)**
 version=2.2start=0rows=10**indent=ondebugQuery=on**
 indent=onwt=jsonq.op=ANDhttp://localhost:8983/solr/select/?q=name:(gb-mb)version=2.2start=0rows=10indent=ondebugQuery=onindent=onwt=jsonq.op=AND
 results in  parsedquery:+name:gb +name:gib +name:gigabyte
 +name:gigabytes +name:mb +name:mib +name:megabyte +name:megabytes,

 While searching http://localhost:8984/solr/**
 select/?q=name:(gbmb)version=**2.2start=0rows=10indent=on**
 debugQuery=onindent=onwt=**jsonq.op=ANDhttp://localhost:8984/solr/select/?q=name:(gbmb)version=2.2start=0rows=10indent=ondebugQuery=onindent=onwt=jsonq.op=AND
 results in parsedquery:+(name:gb name:gib name:gigabyte
 name:gigabytes) +(name:mb name:mib name:megabyte name:megabytes),

 If you notice to the first query - with hyphens - you can see that the
 results of
 parsing is totally different. I know that hyphens are special characters
 in Solr,
 but there's no way that the first query returns any entry because it's
 asking for
 ALL synonyms.

 Am I missing something here?

 Thanks


 --
 Alireza Salimi
 Java EE Developer





 --
 Alireza Salimi
 Java EE Developer




-- 
Alireza Salimi
Java EE Developer

Re: Boosting the score of the whole documents

Make sure to review the similarity javadoc page to understand what any of 
these factors does to the document score.


See:
http://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/search/Similarity.html

Sure, a document boost applies a multiplicative factor, but that is all 
relative to all of the other factors for that document and query. In other 
words, all other things being equal, a doc-boost of 2.0 would double the 
score, but all other things are usually not equal.


Try different doc-boost values and see how the score is affected. The 
document may have such a low score that a boost of 2.0 doesn't move the 
needle relative to other documents.


I believe that the doc-boost is included within the fieldNorm value that 
is shown in the explain section if you add debugQuery=true to your query 
request. This is explained under norm in the similarity javadoc.


I did try a couple of examples with the Solr 3.6 example, such as doc 
boost=2.0, 0.2 (de-boost), 4.0, and 8.0. In my case, it took a boost of 8.0 
to move a document up.


-- Jack Krupansky

-Original Message- 
From: Danilak Michal

Sent: Wednesday, July 04, 2012 10:57 AM
To: solr-user@lucene.apache.org
Subject: Boosting the score of the whole documents

Hi guys,
I have the following problem.

I would like to give a boost to the whole documents as I index them. I am
sending to solr the xml in the form:

adddoc boost=2.0/doc/add

But it does't seem to alter the search scores in any way. I would expect
that to multiply the final search score by two, am I correct?
Probably I would need to alter schema.xml, but I found only information on
how to do that for specific fields (just put omitNorms=false into the field
tag). But what should I do, if I want to boost the whole document?

Note: by boosting a whole document I mean, that if document A has search
score 10.0 and document B has search score 15.0 and I give document A the
boost 2.0, when I index it, I would expect its search score to be 20.0.

Thanks in advance!

Debugging jetty IllegalStateException errors?

2012-07-04 Thread Aaron Daubman

Greetings,

I'm wondering if anybody has experienced (and found root cause) for errors
like this. We're running Solr 3.6.0 with latest stable Jetty 7
(7.6.4.v20120524).
I know this is likely due to a client (or the server) terminating the
connection unexpectedly, but we see these fairly frequently and can't
determine what the impact is or why they are happening (who is closing
early, why?)

Any tips/tricks on troubleshooting or what to do to possibly minimize or
help prevent these from happening (we are using a fairly old python client
to programmatically access this solr instance).

---snip---
17:25:13,250 [qtp581536050-12] WARN  jetty.server.Response null - Committed
before 500 null

org.eclipse.jetty.io.EofException
at
org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:952)
at
org.eclipse.jetty.http.AbstractGenerator.flush(AbstractGenerator.java:438)
at org.eclipse.jetty.server.HttpOutput.flush(HttpOutput.java:94)
at
org.eclipse.jetty.server.AbstractHttpConnection$Output.flush(AbstractHttpConnection.java:1016)
at sun.nio.cs.StreamEncoder.implFlush(StreamEncoder.java:278)
at sun.nio.cs.StreamEncoder.flush(StreamEncoder.java:122)
at java.io.OutputStreamWriter.flush(OutputStreamWriter.java:212)
at org.apache.solr.common.util.FastWriter.flush(FastWriter.java:115)
at
org.apache.solr.servlet.SolrDispatchFilter.writeResponse(SolrDispatchFilter.java:353)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:273)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1332)
at
org.eclipse.jetty.servlets.UserAgentFilter.doFilter(UserAgentFilter.java:77)
at
org.eclipse.jetty.servlets.GzipFilter.doFilter(GzipFilter.java:247)
at
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1332)
at
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:477)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:119)
at
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:524)
at
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:225)
at
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1031)
at
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:406)
at
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:186)
at
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:965)
at
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:117)
at
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:250)
at
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:149)
at
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:111)
at org.eclipse.jetty.server.Server.handle(Server.java:348)
at
org.eclipse.jetty.server.AbstractHttpConnection.handleRequest(AbstractHttpConnection.java:452)
at
org.eclipse.jetty.server.AbstractHttpConnection.content(AbstractHttpConnection.java:894)
at
org.eclipse.jetty.server.AbstractHttpConnection$RequestHandler.content(AbstractHttpConnection.java:948)
at org.eclipse.jetty.http.HttpParser.parseNext(HttpParser.java:851)
at
org.eclipse.jetty.http.HttpParser.parseAvailable(HttpParser.java:235)
at
org.eclipse.jetty.server.AsyncHttpConnection.handle(AsyncHttpConnection.java:77)
at
org.eclipse.jetty.io.nio.SelectChannelEndPoint.handle(SelectChannelEndPoint.java:620)
at
org.eclipse.jetty.io.nio.SelectChannelEndPoint$1.run(SelectChannelEndPoint.java:46)
at
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:603)
at
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:538)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.nio.channels.ClosedChannelException
at
sun.nio.ch.SocketChannelImpl.ensureWriteOpen(SocketChannelImpl.java:137)
at sun.nio.ch.SocketChannelImpl.write(SocketChannelImpl.java:359)
at java.nio.channels.SocketChannel.write(SocketChannel.java:360)
at
org.eclipse.jetty.io.nio.ChannelEndPoint.gatheringFlush(ChannelEndPoint.java:371)
at
org.eclipse.jetty.io.nio.ChannelEndPoint.flush(ChannelEndPoint.java:330)
at
org.eclipse.jetty.io.nio.SelectChannelEndPoint.flush(SelectChannelEndPoint.java:330)
at
org.eclipse.jetty.http.HttpGenerator.flushBuffer(HttpGenerator.java:876)
... 37 more

17:25:13,250 [qtp581536050-12] WARN  jetty.servlet.ServletHandler null -
/solr/artists/select java.lang.IllegalStateException: Committed
at org.eclipse.jetty.server.Response.resetBuffer(Response.java:1087)
at

Re: Javadocs issue on Solr web site

2012-07-04 Thread Chris Hostetter


: Currently all Javadoc links seem to wind up pointing at the api-4_0_0-ALPHA 
versions - is that expected?

yes.  

/solr/api has always pointed at the javadocs for the most recent 
release of solr.  All that's changed now is that we host multiple copies 
of hte javadocs (just like Lucene-Core has for a long time) and the 
canonical URLs make it clear which version you are looking at.

there's an open Jira to make a landing page listing all the versions that 
i'm going to try to get to later today, but you can still find the 3.6 
javadocs here...

http://lucene.apache.org/solr/api-3_6_0/

: E.g. do a Google search on StreamingUpdateSolrServer. First hit is for 
StreamingUpdateSolrServer (Solr 3.6.0 API)
: 
: Follow that link, and you get a 404 for page 
: 
http://lucene.apache.org/solr/api-4_0_0-ALPHA/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.html

that's to be expected:
  1) google hasn't recrawled yet so it doesn't know about the new versions in 
general
  2) that class was removed in 4.0


-Hoss

Re: Synonyms and hyphens

There is one other detail that should clarify the situation. At query time, 
the query parser itself is breaking your query into space-delimited terms, 
and only calling the analyzer for each of those terms, each of which will be 
treated as if a quoted phrase. So it doesn't matter whether it is the 
standard analyzer or word delimiter filter or other filter that is breaking 
up the compound term.


And the default query operator only applies to the terms as the query 
parser parsed them, not for the sub-terms of a compound term like CD-ROM or 
gb-mb.


-- Jack Krupansky

-Original Message- 
From: Alireza Salimi

Sent: Wednesday, July 04, 2012 12:05 PM
To: solr-user@lucene.apache.org
Subject: Re: Synonyms and hyphens

Wow, I didn't know that. Is there a way to disable this feature? I mean, is
it something coming from the Analyzer?

On Wed, Jul 4, 2012 at 12:26 PM, Jack Krupansky 
j...@basetechnology.comwrote:



Terms with embedded special characters are treated as phrases with spaces
in place of the special characters. So, gb-mb is treated as if you had
enclosed the term in quotes.

-- Jack Krupansky
-Original Message- From: Alireza Salimi
Sent: Wednesday, July 04, 2012 6:50 AM
To: solr-user@lucene.apache.org
Subject: Re: Synonyms and hyphens


Hi,

Does anybody know why hyphen '-' and q.op=AND causes such a big difference
between the two queries? I thought hyphens are removed by 
StandardTokenizer

which means theoretically the two queries should be the same!

Thanks

On Tue, Jul 3, 2012 at 4:05 PM, Alireza Salimi alireza.sal...@gmail.com*
*wrote:

 Hi,


I'm not sure if anybody has experienced this behavior before or not.
I noticed that 'hyphen' plays a very important role here.
I used Solr's default example directory.

http://localhost:8983/solr/**select/?q=name:(gb-mb)**
version=2.2start=0rows=10**indent=ondebugQuery=on**
indent=onwt=jsonq.op=ANDhttp://localhost:8983/solr/select/?q=name:(gb-mb)version=2.2start=0rows=10indent=ondebugQuery=onindent=onwt=jsonq.op=AND
results in  parsedquery:+name:gb +name:gib +name:gigabyte
+name:gigabytes +name:mb +name:mib +name:megabyte +name:megabytes,

While searching http://localhost:8984/solr/**
select/?q=name:(gbmb)version=**2.2start=0rows=10indent=on**
debugQuery=onindent=onwt=**jsonq.op=ANDhttp://localhost:8984/solr/select/?q=name:(gbmb)version=2.2start=0rows=10indent=ondebugQuery=onindent=onwt=jsonq.op=AND
results in parsedquery:+(name:gb name:gib name:gigabyte
name:gigabytes) +(name:mb name:mib name:megabyte name:megabytes),

If you notice to the first query - with hyphens - you can see that the
results of
parsing is totally different. I know that hyphens are special characters
in Solr,
but there's no way that the first query returns any entry because it's
asking for
ALL synonyms.

Am I missing something here?

Thanks


--
Alireza Salimi
Java EE Developer






--
Alireza Salimi
Java EE Developer





--
Alireza Salimi
Java EE Developer

Re: Synonyms and hyphens

2012-07-04 Thread Alireza Salimi

ok, so how can I prevent this behavior to happen?
As you can see the parsed query is very different in these two cases.

On Wed, Jul 4, 2012 at 1:37 PM, Jack Krupansky j...@basetechnology.comwrote:

 There is one other detail that should clarify the situation. At query
 time, the query parser itself is breaking your query into space-delimited
 terms, and only calling the analyzer for each of those terms, each of which
 will be treated as if a quoted phrase. So it doesn't matter whether it is
 the standard analyzer or word delimiter filter or other filter that is
 breaking up the compound term.

 And the default query operator only applies to the terms as the query
 parser parsed them, not for the sub-terms of a compound term like CD-ROM or
 gb-mb.


 -- Jack Krupansky

 -Original Message- From: Alireza Salimi
 Sent: Wednesday, July 04, 2012 12:05 PM

 To: solr-user@lucene.apache.org
 Subject: Re: Synonyms and hyphens

 Wow, I didn't know that. Is there a way to disable this feature? I mean, is
 it something coming from the Analyzer?

 On Wed, Jul 4, 2012 at 12:26 PM, Jack Krupansky j...@basetechnology.com*
 *wrote:

  Terms with embedded special characters are treated as phrases with spaces
 in place of the special characters. So, gb-mb is treated as if you had
 enclosed the term in quotes.

 -- Jack Krupansky
 -Original Message- From: Alireza Salimi
 Sent: Wednesday, July 04, 2012 6:50 AM
 To: solr-user@lucene.apache.org
 Subject: Re: Synonyms and hyphens


 Hi,

 Does anybody know why hyphen '-' and q.op=AND causes such a big difference
 between the two queries? I thought hyphens are removed by
 StandardTokenizer
 which means theoretically the two queries should be the same!

 Thanks

 On Tue, Jul 3, 2012 at 4:05 PM, Alireza Salimi alireza.sal...@gmail.com
 *
 *wrote:

  Hi,


 I'm not sure if anybody has experienced this behavior before or not.
 I noticed that 'hyphen' plays a very important role here.
 I used Solr's default example directory.

 http://localhost:8983/solr/select/?q=name:(gb-mb)**http://localhost:8983/solr/**select/?q=name:(gb-mb)**
 version=2.2start=0rows=10indent=ondebugQuery=on**
 indent=onwt=jsonq.op=ANDhtt**p://localhost:8983/solr/**
 select/?q=name:(gb-mb)**version=2.2start=0rows=10**
 indent=ondebugQuery=on**indent=onwt=jsonq.op=ANDhttp://localhost:8983/solr/select/?q=name:(gb-mb)version=2.2start=0rows=10indent=ondebugQuery=onindent=onwt=jsonq.op=AND
 

 results in  parsedquery:+name:gb +name:gib +name:gigabyte
 +name:gigabytes +name:mb +name:mib +name:megabyte +name:megabytes,

 While searching http://localhost:8984/solr/**
 select/?q=name:(gbmb)version=2.2start=0rows=10indent=**on**
 debugQuery=onindent=onwt=jsonq.op=ANDhttp://**
 localhost:8984/solr/select/?q=**name:(gbmb)version=2.2start=**
 0rows=10indent=on**debugQuery=onindent=onwt=**jsonq.op=ANDhttp://localhost:8984/solr/select/?q=name:(gbmb)version=2.2start=0rows=10indent=ondebugQuery=onindent=onwt=jsonq.op=AND
 

 results in parsedquery:+(name:gb name:gib name:gigabyte
 name:gigabytes) +(name:mb name:mib name:megabyte name:megabytes),

 If you notice to the first query - with hyphens - you can see that the
 results of
 parsing is totally different. I know that hyphens are special characters
 in Solr,
 but there's no way that the first query returns any entry because it's
 asking for
 ALL synonyms.

 Am I missing something here?

 Thanks


 --
 Alireza Salimi
 Java EE Developer





 --
 Alireza Salimi
 Java EE Developer




 --
 Alireza Salimi
 Java EE Developer




-- 
Alireza Salimi
Java EE Developer

Re: Boosting the score of the whole documents

2012-07-04 Thread Danilak Michal

Should there be made any modification into scheme.xml file?
For example, to enable field boosts, one has to set omitNorms to false.
Is there some similar field for document boosts?

On Wed, Jul 4, 2012 at 7:29 PM, Jack Krupansky j...@basetechnology.comwrote:

Make sure to review the similarity javadoc page to understand what any
of these factors does to the document score.

See:
http://lucene.apache.org/core/**3_6_0/api/all/org/apache/**
lucene/search/Similarity.htmlhttp://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/search/Similarity.html

Sure, a document boost applies a multiplicative factor, but that is all
relative to all of the other factors for that document and query. In other
words, all other things being equal, a doc-boost of 2.0 would double the
score, but all other things are usually not equal.

Try different doc-boost values and see how the score is affected. The
document may have such a low score that a boost of 2.0 doesn't move the
needle relative to other documents.

I believe that the doc-boost is included within the fieldNorm value that
is shown in the explain section if you add debugQuery=true to your query
request. This is explained under norm in the similarity javadoc.

I did try a couple of examples with the Solr 3.6 example, such as doc
boost=2.0, 0.2 (de-boost), 4.0, and 8.0. In my case, it took a boost of 8.0
to move a document up.

-- Jack Krupansky

-Original Message- From: Danilak Michal
Sent: Wednesday, July 04, 2012 10:57 AM
To: solr-user@lucene.apache.org
Subject: Boosting the score of the whole documents

Hi guys,
I have the following problem.

I would like to give a boost to the whole documents as I index them. I am
sending to solr the xml in the form:

adddoc boost=2.0/doc/add

But it does't seem to alter the search scores in any way. I would expect
that to multiply the final search score by two, am I correct?
Probably I would need to alter schema.xml, but I found only information on
how to do that for specific fields (just put omitNorms=false into the field
tag). But what should I do, if I want to boost the whole document?

Note: by boosting a whole document I mean, that if document A has search
score 10.0 and document B has search score 15.0 and I give document A the
boost 2.0, when I index it, I would expect its search score to be 20.0.

Thanks in advance!

Re: Urgent:Partial Search not Working

2012-07-04 Thread jayakeerthi s

Could anyone please reply the solution to this

On Wed, Jul 4, 2012 at 7:18 PM, jayakeerthi s mail2keer...@gmail.comwrote:

 All,

 I am using apache-solr-4.0.0-ALPHA and trying to configure the Partial
 search on two fields.

 Keywords using to search are
 The value inside the search ProdSymbl is M1.6X0.35 9P

 and I willl have to get the results if I search for M1.6 or X0.35 (Partial
 of the search value).


 I have tried using  both NGramTokenizerFactory and solr.EdgeNGramFilterFactory
  in the schema.xml

 !-- bigram --
   !--  fieldType name=bigram
 class=solr.TextField positionIncrementGap=100
   analyzer type=index
  tokenizer class=solr.NGramTokenizerFactory minGramSize=3
 maxGramSize=15 /
   filter class=solr.LowerCaseFilterFactory/
/analyzer
  analyzer type=query
   tokenizer class=solr.WhitespaceTokenizerFactory /
   filter class=solr.LowerCaseFilterFactory/
   /analyzer

 /fieldType --

 fieldType name=bigram class=solr.TextField
 omitNorms=false
   analyzer
 tokenizer class=solr.StandardTokenizerFactory/
 filter class=solr.StandardFilterFactory/
 filter class=solr.LowerCaseFilterFactory/
 filter class=solr.EdgeNGramFilterFactory minGramSize=2
 maxGramSize=15 side=front/
   /analyzer
   /fieldType



 Fields I have configured as

   field name=prodsymbl type=bigraml indexed=true stored=true
 multiValued=true/
field name=measure1 type=bigram indexed=true stored=true
 multiValued=true/

 Copy field as

 copyField source=prodsymbl dest=text/
copyField source=measure1 dest=text/



 Please let me know IF I and missing anything, this is kind of Urgent
 requirement needs to be addressed at the earliest, Please help.


 Thanks in advance,

 Jay

Re: Something like 'bf' or 'bq' with MoreLikeThis

No worries! What version of Solr are you using? One that you
downloaded as a tarball or one that you checked out from SVN (trunk)?
I'll take a bit of time and document steps and respond.

I'll review the patch to see that it fits a general case. Question for
you with MLT, are your users doing a blank search (no text) for
something or are you returning results More Like results that were
generated as a result of a user typing some text query. I may have
built this patch assuming a blank query but I can make it work (or try
to) make it work for text based queries.

Thanks
Amit

On Wed, Jul 4, 2012 at 1:37 AM, nanshi nanshi.e...@gmail.com wrote:
 Thanks a lot, Amit! Please bear with me, I am a new Solr dev, could you
 please shed me some light on how to use a patch? point me to a wiki/doc is
 fine too. Thanks a lot! :)

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/Something-like-bf-or-bq-with-MoreLikeThis-tp3989060p3992935.html
 Sent from the Solr - User mailing list archive at Nabble.com.

Re: Urgent:Partial Search not Working

You need to apply the edge n-gram filter only at index time, not at query 
time. So, you need to specify two analyzers for these field types, an 
index and a query analyzer. They should be roughly the same, but the 
query analyzer would not have the edge n-gram filter since you are 
accepting the single n-gram given by the user and then matching it against 
the full list of n-grams that are in the index.


It is unfortunate that the wiki example is misleading. Just as bad, we don't 
have an example in the example schema.


Basically, take a text field type that you like from the Solr example 
schema and then add the edge n-gram filter to its index analyzer, probably 
as the last token filter. I would note that the edge n-gram filter will 
interact with the stemming filter, but there is not much you can do other 
than try different stemmers and experiment with whether stemming should be 
before or after the edge n-gram filter. I suspect that having stemming after 
edge n-gram may be better.


-- Jack Krupansky

-Original Message- 
From: jayakeerthi s

Sent: Wednesday, July 04, 2012 1:41 PM
To: solr-user@lucene.apache.org ; solr-user-h...@lucene.apache.org
Subject: Re: Urgent:Partial Search not Working

Could anyone please reply the solution to this

On Wed, Jul 4, 2012 at 7:18 PM, jayakeerthi s mail2keer...@gmail.comwrote:


All,

I am using apache-solr-4.0.0-ALPHA and trying to configure the Partial
search on two fields.

Keywords using to search are
The value inside the search ProdSymbl is M1.6X0.35 9P

and I willl have to get the results if I search for M1.6 or X0.35 (Partial
of the search value).


I have tried using  both NGramTokenizerFactory and 
solr.EdgeNGramFilterFactory

 in the schema.xml

!-- bigram --
  !--  fieldType name=bigram
class=solr.TextField positionIncrementGap=100
  analyzer type=index
 tokenizer class=solr.NGramTokenizerFactory minGramSize=3
maxGramSize=15 /
  filter class=solr.LowerCaseFilterFactory/
   /analyzer
 analyzer type=query
  tokenizer class=solr.WhitespaceTokenizerFactory /
  filter class=solr.LowerCaseFilterFactory/
  /analyzer

/fieldType --

fieldType name=bigram class=solr.TextField
omitNorms=false
  analyzer
tokenizer class=solr.StandardTokenizerFactory/
filter class=solr.StandardFilterFactory/
filter class=solr.LowerCaseFilterFactory/
filter class=solr.EdgeNGramFilterFactory minGramSize=2
maxGramSize=15 side=front/
  /analyzer
  /fieldType



Fields I have configured as

  field name=prodsymbl type=bigraml indexed=true stored=true
multiValued=true/
   field name=measure1 type=bigram indexed=true stored=true
multiValued=true/

Copy field as

copyField source=prodsymbl dest=text/
   copyField source=measure1 dest=text/



Please let me know IF I and missing anything, this is kind of Urgent
requirement needs to be addressed at the earliest, Please help.


Thanks in advance,

Jay

Re: Use of Solr as primary store for search engine

Paul,

Thanks for your response! Were you using the SQL database as an object
store to pull XWiki objects or did you have to execute several queries
to reconstruct these objects? I don't know much about them sorry..
Also for those responding, can you provide a few basic metrics for me?
1) Number of nodes receiving queries
2) Approximate queries per second
3) Approximate latency per query

I know some of this may be sensitive depending on where you work so
reasonable ranges would be nice (i.e. sub-second isn't hugely helpful
since 50,100,200 ms have huge impacts depending on your site).

Thanks again!
Amit

On Wed, Jul 4, 2012 at 1:09 AM, Paul Libbrecht p...@hoplahup.net wrote:
 Amit,

 not exactly a response to your question but doing this with a lucene index on 
 i2geo.net has resulted in considerably performance boost (reading from 
 stored-fields instead of reading from the xwiki objects which pull from the 
 SQL database). However, it implied that we had to rewrite anything necessary 
 for the rendering, hence the rendering has not re-used that many code.

 Paul


 Le 4 juil. 2012 à 09:54, Amit Nithian a écrit :

 Hello all,

 I am curious to know how people are using Solr in conjunction with
 other data stores when building search engines to power web sites (say
 an ecommerce site). The question I have for the group is given an
 architecture where the primary (transactional) data store is MySQL
 (Oracle, PostGres whatever) with periodic indexing into Solr, when
 your front end issues a search query to Solr and returns results, are
 there any joins with your primary Oracle/MySQL etc to help render
 results?

 Basically I guess my question is whether or not you store enough in
 Solr so that when your front end renders the results page, it never
 has to hit the database. The other option is that your search engine
 only returns primary keys that your front end then uses to hit the DB
 to fetch data to display to your end user.

 With Solr 4.0 and Solr moving towards the NoSQL direction, I am
 curious what people are doing and what application architectures with
 Solr look like.

 Thanks!
 Amit

Re: Synonyms and hyphens

You could pre-process your queries to convert hyphen and other special 
characters to spaces.


-- Jack Krupansky

-Original Message- 
From: Alireza Salimi

Sent: Wednesday, July 04, 2012 12:56 PM
To: solr-user@lucene.apache.org
Subject: Re: Synonyms and hyphens

ok, so how can I prevent this behavior to happen?
As you can see the parsed query is very different in these two cases.

On Wed, Jul 4, 2012 at 1:37 PM, Jack Krupansky 
j...@basetechnology.comwrote:



There is one other detail that should clarify the situation. At query
time, the query parser itself is breaking your query into space-delimited
terms, and only calling the analyzer for each of those terms, each of 
which

will be treated as if a quoted phrase. So it doesn't matter whether it is
the standard analyzer or word delimiter filter or other filter that is
breaking up the compound term.

And the default query operator only applies to the terms as the query
parser parsed them, not for the sub-terms of a compound term like CD-ROM 
or

gb-mb.


-- Jack Krupansky

-Original Message- From: Alireza Salimi
Sent: Wednesday, July 04, 2012 12:05 PM

To: solr-user@lucene.apache.org
Subject: Re: Synonyms and hyphens

Wow, I didn't know that. Is there a way to disable this feature? I mean, 
is

it something coming from the Analyzer?

On Wed, Jul 4, 2012 at 12:26 PM, Jack Krupansky j...@basetechnology.com*
*wrote:

 Terms with embedded special characters are treated as phrases with spaces

in place of the special characters. So, gb-mb is treated as if you had
enclosed the term in quotes.

-- Jack Krupansky
-Original Message- From: Alireza Salimi
Sent: Wednesday, July 04, 2012 6:50 AM
To: solr-user@lucene.apache.org
Subject: Re: Synonyms and hyphens


Hi,

Does anybody know why hyphen '-' and q.op=AND causes such a big 
difference

between the two queries? I thought hyphens are removed by
StandardTokenizer
which means theoretically the two queries should be the same!

Thanks

On Tue, Jul 3, 2012 at 4:05 PM, Alireza Salimi alireza.sal...@gmail.com
*
*wrote:

 Hi,



I'm not sure if anybody has experienced this behavior before or not.
I noticed that 'hyphen' plays a very important role here.
I used Solr's default example directory.

http://localhost:8983/solr/select/?q=name:(gb-mb)**http://localhost:8983/solr/**select/?q=name:(gb-mb)**
version=2.2start=0rows=10indent=ondebugQuery=on**
indent=onwt=jsonq.op=ANDhtt**p://localhost:8983/solr/**
select/?q=name:(gb-mb)**version=2.2start=0rows=10**
indent=ondebugQuery=on**indent=onwt=jsonq.op=ANDhttp://localhost:8983/solr/select/?q=name:(gb-mb)version=2.2start=0rows=10indent=ondebugQuery=onindent=onwt=jsonq.op=AND


results in  parsedquery:+name:gb +name:gib +name:gigabyte
+name:gigabytes +name:mb +name:mib +name:megabyte +name:megabytes,

While searching http://localhost:8984/solr/**
select/?q=name:(gbmb)version=2.2start=0rows=10indent=**on**
debugQuery=onindent=onwt=jsonq.op=ANDhttp://**
localhost:8984/solr/select/?q=**name:(gbmb)version=2.2start=**
0rows=10indent=on**debugQuery=onindent=onwt=**jsonq.op=ANDhttp://localhost:8984/solr/select/?q=name:(gbmb)version=2.2start=0rows=10indent=ondebugQuery=onindent=onwt=jsonq.op=AND


results in parsedquery:+(name:gb name:gib name:gigabyte
name:gigabytes) +(name:mb name:mib name:megabyte name:megabytes),

If you notice to the first query - with hyphens - you can see that the
results of
parsing is totally different. I know that hyphens are special characters
in Solr,
but there's no way that the first query returns any entry because it's
asking for
ALL synonyms.

Am I missing something here?

Thanks


--
Alireza Salimi
Java EE Developer






--
Alireza Salimi
Java EE Developer





--
Alireza Salimi
Java EE Developer





--
Alireza Salimi
Java EE Developer

Re: Use of Solr as primary store for search engine

2012-07-04 Thread Paul Libbrecht


Le 4 juil. 2012 à 21:17, Amit Nithian a écrit :
 Thanks for your response! Were you using the SQL database as an object
 store to pull XWiki objects or did you have to execute several queries
 to reconstruct these objects?

The first. It's all fairly transparent.
There are XWiki Classes and XWiki objects which are rendered, they live as 
composite of the XWiki-java-objects which hibernate-persisted.

 I don't know much about them sorry..
 Also for those responding, can you provide a few basic metrics for me?
 1) Number of nodes receiving queries
 2) Approximate queries per second
 3) Approximate latency per query

I admire those that have this at hand.

 I know some of this may be sensitive depending on where you work so
 reasonable ranges would be nice (i.e. sub-second isn't hugely helpful
 since 50,100,200 ms have huge impacts depending on your site).

I think caching comes into play here in a very strong manner, so these measures 
are fairly difficult to establish. One Solr I run, in particular, makes 
differences between 100ms (uncached queries) and 9 ms (cached query).

Paul

Re: Urgent:Partial Search not Working

2012-07-04 Thread jayakeerthi s

Hi Jack,

Many thanks for your reply...
yes i have tried both ngram and Edgegram filterfactory, still no result.
Please le t me know any alternatives

On Thu, Jul 5, 2012 at 12:42 AM, Jack Krupansky j...@basetechnology.comwrote:

 You need to apply the edge n-gram filter only at index time, not at query
 time. So, you need to specify two analyzers for these field types, an
 index and a query analyzer. They should be roughly the same, but the
 query analyzer would not have the edge n-gram filter since you are
 accepting the single n-gram given by the user and then matching it against
 the full list of n-grams that are in the index.

 It is unfortunate that the wiki example is misleading. Just as bad, we
 don't have an example in the example schema.

 Basically, take a text field type that you like from the Solr example
 schema and then add the edge n-gram filter to its index analyzer,
 probably as the last token filter. I would note that the edge n-gram filter
 will interact with the stemming filter, but there is not much you can do
 other than try different stemmers and experiment with whether stemming
 should be before or after the edge n-gram filter. I suspect that having
 stemming after edge n-gram may be better.

 -- Jack Krupansky

 -Original Message- From: jayakeerthi s
 Sent: Wednesday, July 04, 2012 1:41 PM
 To: solr-user@lucene.apache.org ; 
 solr-user-help@lucene.apache.**orgsolr-user-h...@lucene.apache.org
 Subject: Re: Urgent:Partial Search not Working


 Could anyone please reply the solution to this

 On Wed, Jul 4, 2012 at 7:18 PM, jayakeerthi s mail2keer...@gmail.com
 wrote:

  All,

 I am using apache-solr-4.0.0-ALPHA and trying to configure the Partial
 search on two fields.

 Keywords using to search are
 The value inside the search ProdSymbl is M1.6X0.35 9P

 and I willl have to get the results if I search for M1.6 or X0.35 (Partial
 of the search value).


 I have tried using  both NGramTokenizerFactory and
 solr.EdgeNGramFilterFactory
  in the schema.xml

 !-- bigram --
   !--  fieldType name=bigram
 class=solr.TextField positionIncrementGap=100
   analyzer type=index
  tokenizer class=solr.**NGramTokenizerFactory minGramSize=3
 maxGramSize=15 /
   filter class=solr.**LowerCaseFilterFactory/
/analyzer
  analyzer type=query
   tokenizer class=solr.**WhitespaceTokenizerFactory /
   filter class=solr.**LowerCaseFilterFactory/
   /analyzer

 /fieldType --

 fieldType name=bigram class=solr.TextField
 omitNorms=false
   analyzer
 tokenizer class=solr.**StandardTokenizerFactory/
 filter class=solr.**StandardFilterFactory/
 filter class=solr.**LowerCaseFilterFactory/
 filter class=solr.**EdgeNGramFilterFactory minGramSize=2
 maxGramSize=15 side=front/
   /analyzer
   /fieldType



 Fields I have configured as

   field name=prodsymbl type=bigraml indexed=true stored=true
 multiValued=true/
field name=measure1 type=bigram indexed=true stored=true
 multiValued=true/

 Copy field as

 copyField source=prodsymbl dest=text/
copyField source=measure1 dest=text/



 Please let me know IF I and missing anything, this is kind of Urgent
 requirement needs to be addressed at the earliest, Please help.


 Thanks in advance,

 Jay

Re: Urgent:Partial Search not Working

Don't forget to test your field type analyzers on the Solr Admin analysis 
page. It will show you exactly how terms gets analyzed at both index and 
query time.


If something is not working, be specific as to what the case is and exactly 
what is not as you would expect, both the expected value and the actual 
value.


-- Jack Krupansky

-Original Message- 
From: jayakeerthi s

Sent: Wednesday, July 04, 2012 3:44 PM
To: solr-user@lucene.apache.org
Subject: Re: Urgent:Partial Search not Working

Hi Jack,

Many thanks for your reply...
yes i have tried both ngram and Edgegram filterfactory, still no result.
Please le t me know any alternatives

On Thu, Jul 5, 2012 at 12:42 AM, Jack Krupansky 
j...@basetechnology.comwrote:



You need to apply the edge n-gram filter only at index time, not at query
time. So, you need to specify two analyzers for these field types, an
index and a query analyzer. They should be roughly the same, but the
query analyzer would not have the edge n-gram filter since you are
accepting the single n-gram given by the user and then matching it against
the full list of n-grams that are in the index.

It is unfortunate that the wiki example is misleading. Just as bad, we
don't have an example in the example schema.

Basically, take a text field type that you like from the Solr example
schema and then add the edge n-gram filter to its index analyzer,
probably as the last token filter. I would note that the edge n-gram 
filter

will interact with the stemming filter, but there is not much you can do
other than try different stemmers and experiment with whether stemming
should be before or after the edge n-gram filter. I suspect that having
stemming after edge n-gram may be better.

-- Jack Krupansky

-Original Message- From: jayakeerthi s
Sent: Wednesday, July 04, 2012 1:41 PM
To: solr-user@lucene.apache.org ; 
solr-user-help@lucene.apache.**orgsolr-user-h...@lucene.apache.org

Subject: Re: Urgent:Partial Search not Working


Could anyone please reply the solution to this

On Wed, Jul 4, 2012 at 7:18 PM, jayakeerthi s mail2keer...@gmail.com
wrote:

 All,


I am using apache-solr-4.0.0-ALPHA and trying to configure the Partial
search on two fields.

Keywords using to search are
The value inside the search ProdSymbl is M1.6X0.35 9P

and I willl have to get the results if I search for M1.6 or X0.35 
(Partial

of the search value).


I have tried using  both NGramTokenizerFactory and
solr.EdgeNGramFilterFactory
 in the schema.xml

!-- bigram --
  !--  fieldType name=bigram
class=solr.TextField positionIncrementGap=100
  analyzer type=index
 tokenizer class=solr.**NGramTokenizerFactory minGramSize=3
maxGramSize=15 /
  filter class=solr.**LowerCaseFilterFactory/
   /analyzer
 analyzer type=query
  tokenizer class=solr.**WhitespaceTokenizerFactory /
  filter class=solr.**LowerCaseFilterFactory/
  /analyzer

/fieldType --

fieldType name=bigram class=solr.TextField
omitNorms=false
  analyzer
tokenizer class=solr.**StandardTokenizerFactory/
filter class=solr.**StandardFilterFactory/
filter class=solr.**LowerCaseFilterFactory/
filter class=solr.**EdgeNGramFilterFactory minGramSize=2
maxGramSize=15 side=front/
  /analyzer
  /fieldType



Fields I have configured as

  field name=prodsymbl type=bigraml indexed=true stored=true
multiValued=true/
   field name=measure1 type=bigram indexed=true stored=true
multiValued=true/

Copy field as

copyField source=prodsymbl dest=text/
   copyField source=measure1 dest=text/



Please let me know IF I and missing anything, this is kind of Urgent
requirement needs to be addressed at the earliest, Please help.


Thanks in advance,

Jay

Re: Boosting the score of the whole documents

I'm not completely sure. I wouldn't expect that document boost should 
require field norms, but glancing at the code, it seems that having 
omitNorms=true does mean that the score for a field will not get the 
document boost, and in fact such a field gets a constant score. In other 
words, that the score for any field within the document will only get the 
document boost if that field does not have omitNorms=true. But as long as at 
least one field has norms, the document score should get some boost from 
document boost. I am not sure if this is the way the code is supposed to 
work, or whether it just happens to be this way.


I would hope that some committer with detailed knowledge of norms and 
similarity weigh in on this matter.

-- Jack Krupansky

-Original Message- 
From: Danilak Michal

Sent: Wednesday, July 04, 2012 1:11 PM
To: solr-user@lucene.apache.org
Subject: Re: Boosting the score of the whole documents

Should there be made any modification into scheme.xml file?
For example, to enable field boosts, one has to set omitNorms to false.
Is there some similar field for document boosts?

On Wed, Jul 4, 2012 at 7:29 PM, Jack Krupansky 
j...@basetechnology.comwrote:



Make sure to review the similarity javadoc page to understand what any
of these factors does to the document score.

See:
http://lucene.apache.org/core/**3_6_0/api/all/org/apache/**
lucene/search/Similarity.htmlhttp://lucene.apache.org/core/3_6_0/api/all/org/apache/lucene/search/Similarity.html

Sure, a document boost applies a multiplicative factor, but that is all
relative to all of the other factors for that document and query. In other
words, all other things being equal, a doc-boost of 2.0 would double the
score, but all other things are usually not equal.

Try different doc-boost values and see how the score is affected. The
document may have such a low score that a boost of 2.0 doesn't move the
needle relative to other documents.

I believe that the doc-boost is included within the fieldNorm value that
is shown in the explain section if you add debugQuery=true to your 
query

request. This is explained under norm in the similarity javadoc.

I did try a couple of examples with the Solr 3.6 example, such as doc
boost=2.0, 0.2 (de-boost), 4.0, and 8.0. In my case, it took a boost of 
8.0

to move a document up.

-- Jack Krupansky

-Original Message- From: Danilak Michal
Sent: Wednesday, July 04, 2012 10:57 AM
To: solr-user@lucene.apache.org
Subject: Boosting the score of the whole documents


Hi guys,
I have the following problem.

I would like to give a boost to the whole documents as I index them. I am
sending to solr the xml in the form:

adddoc boost=2.0/doc/add

But it does't seem to alter the search scores in any way. I would expect
that to multiply the final search score by two, am I correct?
Probably I would need to alter schema.xml, but I found only information on
how to do that for specific fields (just put omitNorms=false into the 
field

tag). But what should I do, if I want to boost the whole document?

Note: by boosting a whole document I mean, that if document A has search
score 10.0 and document B has search score 15.0 and I give document A the
boost 2.0, when I index it, I would expect its search score to be 20.0.

Thanks in advance!

Re: Something like 'bf' or 'bq' with MoreLikeThis