Re: UI Velocity
In the velocity directory you'll find the templates that implement all this stuff, I usually copy/paste. Do understand that this is _demo_ code, not an official UI for Solr so it's rather a case of digging in and experimenting. I would _not_ use this (or anything else that gave direct access to Solr) for user-facing apps. If you give me direct access to the URL, I can delete collections, delete documents, and a myriad of other really bad things. Best, Erick On Thu, May 28, 2015 at 1:11 PM, Sznajder ForMailingList bs4mailingl...@gmail.com wrote: Hi I tried to use the UI Velocity from Solr. Could you please help in the following: - how do I define the fields from my schema that I would like to be displayed as facet in the UI? Thanks! Benjamin
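For what it's worth, the facets the Velocity templates render are just whatever facet.field parameters reach the /browse handler, whether from the handler's defaults in solrconfig.xml or from the request itself. A minimal sketch of the request-time variant (the core name "collection1" and the field name "category" are assumptions, not taken from this thread):

    import requests

    params = {
        "q": "*:*",
        "facet": "true",
        "facet.field": ["category"],   # repeat / extend the list for each field to facet on
    }
    resp = requests.get("http://localhost:8983/solr/collection1/browse", params=params)
    print(resp.status_code)            # the body is the rendered Velocity HTML

The same facet.field entries can be added under the /browse handler's defaults section in solrconfig.xml to make them permanent.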
Re: Dynamic range on numbers
: i'm not sure i follow what you're saying on #3. let me clarify in case it's : on my end. i was wanting to *eventually* set a lower bound of -10% of size1 and : an upper of +10% of size1. for the sake of experimentation i started with just lower bound of what? write out the math equation you want to satisfy, and you'll see that based on your original problem statement there are 3 numbers involved X) 0.1 Y) some fixed numeric value given by your user Z) some field value of each document You said you want to find documents where the field value Z is within 10% (X) of the value Y that your user specified... Find all docs such that: Z <= Y + Y*X and Z >= Y - Y*X At no point in the query/function you were attempting did you ever specify the fixed numeric value provided by your user. What you had written was: Find all docs such that: Z >= Z + Z*X That's the point i'm making in #3... : : 3) right. i hadn't bothered with the upper limit yet simply for sake of : : less complexity / chance to fk it up. wanted to get the function working : : for lower before worrying about adding u= and getting the query refined : : To be very clear: even if what you were trying to do worked as you wrote : it, adding an upper bound wouldn't change the fact that the comparison you : were trying to make in your query doesn't match the problem statement in : your question, and doesn't make sense in general -- you need to compare : the field value with 10% of some OTHER input value -- not 10% of itself. : : Adding an upper bound that was similar to the lower bound you were trying : would have simply prevented any docs from matching at all. ... : : : Expected identifier at pos 29 str='{!frange l=sum(size1, : product(size1, : : : .10))}size1 ... : : 3) even if you could pass a function for the l param, conceptually : what : : you are asking for doesn't really make much sense ... you are asking : solr : : to only return documents where the value of the size1 field is in a : : range between X and infinity, where X is defined as the sum of the : value : : of the size1 field plus 10% of the value of the size1 field. : : : : In other words: give me all docs where S * 1.1 <= S : : : : Basically you are asking it to return all documents with a negative : value : : in the size1 field. -Hoss http://www.lucidworks.com/
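Concretely, once the user-supplied value Y is available in the client, both bounds are fixed numbers computed once and no function query is needed. A minimal sketch (Y = 500 is an assumed value, purely for illustration):

    Y, X = 500.0, 0.10                           # Y from the user, X the 10% tolerance
    lower, upper = Y * (1 - X), Y * (1 + X)      # 450.0 and 550.0
    fq = "size1:[%s TO %s]" % (lower, upper)     # size1:[450.0 TO 550.0]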
Re: Ignoring the Document Cache per query
First, there isn't, that I know of. But why would you want to do this? On the face of it, it makes no sense to ignore the doc cache. One of its purposes is to hold the document (read off disk) for successive search components _in the same query_. Otherwise, each component might have to do a disk seek. So I must be missing why you want to do this. Best, Erick On Thu, May 28, 2015 at 1:23 PM, Bryan Bende bbe...@gmail.com wrote: Is there a way to ignore the document cache on a per-query basis? It looks like there's {!cache=false} for preventing the filter cache from being used for a given query; I'm looking for the same thing for the document cache. Thanks, Bryan
Re: When is too many fields in qf is too many?
I would reconsider the strategy of mashing so many different record types into one Solr collection. Sure, you get some advantage from denormalizing data, but if the downside cost gets too high, it may not make so much sense. I'd consider a collection per record type, or at least group similar record types, and then query as many collections - in parallel - as needed for a given user. That should also assure that a query for a given record type is much faster as well. Surely you should be able to examine the query in the app and determine what record types it might apply to. When in doubt, make your schema as clean and simple as possible. Simplicity over complexity. -- Jack Krupansky On Thu, May 28, 2015 at 12:06 PM, Erick Erickson erickerick...@gmail.com wrote: Gotta agree with Jack here. This is an insane number of fields, query performance on any significant corpus will be fraught, etc. The very first thing I'd look at is having that many fields. You have 3,500 different fields! Whatever the motivation for having that many fields is the place I'd start. Best, Erick On Thu, May 28, 2015 at 5:50 AM, Jack Krupansky jack.krupan...@gmail.com wrote: This does not even pass a basic smell test for reasonability of matching the capabilities of Solr and the needs of your application. I'd like to hear from others, but I personally would be -1 on this approach to misusing qf. I'd simply say that you need to go back to the drawing board, and that your primary focus should be on working with your application product manager to revise your application requirements to more closely match the capabilities of Solr. To put it simply, if you have more than a dozen fields in qf, you're probably doing something wrong. In this case horribly wrong. Focus on designing your app to exploit the capabilities of Solr, not to misuse them. In short, to answer the original question, more than a couple dozen fields in qf is indeed too many. More than a dozen raises a yellow flag for me. -- Jack Krupansky On Thu, May 28, 2015 at 8:13 AM, Steven White swhite4...@gmail.com wrote: Hi Charles, That is what I have done. At the moment, I have 22 request handlers, some have 3490 field items in qf (that's the most, and the qf line spans over 95,000 characters in the solrconfig.xml file) and the smallest one has 1341 fields. I'm working on seeing if I can use copyField to copy the data of that view's fields into a single pseudo-view-field and use that pseudo field for the qf of that view's request handler. The issue I still have outstanding with using copyField in this way is that it could lead to a complete re-indexing of all the data in that view when a field is added / removed from that view. Thanks Steve On Wed, May 27, 2015 at 6:02 PM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: One request handler per view? I think if you are able to make the actual view in use for the current request a single value (vs. all views that the user could use over time), it would keep the qf list down to a manageable size (e.g. specified within the request handler XML). Not sure if this is feasible for you, but it seems like a reasonable approach given the use case you describe. Just a thought ... -Original Message- From: Steven White [mailto:swhite4...@gmail.com] Sent: Tuesday, May 26, 2015 4:48 PM To: solr-user@lucene.apache.org Subject: Re: When is too many fields in qf is too many? Thanks Doug. I might have to take you up on the hangout offer. Let me refine the requirement further and if I still see the need, I will let you know.
Steve On Tue, May 26, 2015 at 2:01 PM, Doug Turnbull dturnb...@opensourceconnections.com wrote: How you have tie is fine. Setting tie to 1 might give you reasonable results. You could easily still have scores that are just always an order of magnitude or two higher, but try it out! BTW Anything you put in the URL can also be put into a request handler. If you ever just want to have a 15 minute conversation via hangout, happy to chat with you :) Might be fun to think through your prob together. -Doug On Tue, May 26, 2015 at 1:42 PM, Steven White swhite4...@gmail.com wrote: Hi Doug, I'm back to this topic. Unfortunately, due to my DB structure and business needs, I will not be able to search against a single field (i.e.: using copyField). Thus, I have to use a list of fields via qf. Given this, I see you said above to use tie=1.0; will that, more or less, address this scoring issue? Should tie=1.0 be set on the request handler like so: <requestHandler name="/select" class="solr.SearchHandler"> <lst name="defaults"> <str name="echoParams">explicit</str> <int name="rows">20</int> <str
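Since anything that works on the URL can later be moved into the handler's defaults, tie can be tried per request before being baked into solrconfig.xml. A minimal Python sketch of such a request (the collection name, field names and query text are assumptions, not taken from this thread):

    import requests

    params = {
        "q": "headline search terms",
        "defType": "edismax",
        "qf": "FieldGroup-1.Headline FieldGroup-1.Summary",  # the per-view field list
        "tie": "1.0",   # 1.0 sums the per-field scores instead of taking only the max
        "rows": 20,
        "wt": "json",
    }
    r = requests.get("http://localhost:8983/solr/collection1/select", params=params)
    print(r.json()["response"]["numFound"])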
Ignoring the Document Cache per query
Is there a way to ignore the document cache on a per-query basis? It looks like there's {!cache=false} for preventing the filter cache from being used for a given query; I'm looking for the same thing for the document cache. Thanks, Bryan
Re: Dynamic range on numbers
: 2) lame :\ Why do you say that? ... it's a practical limitation -- for each document a function is computed, and then the result of that function is compared against the (fixed) upper and lower bounds. In situations where you want something like the lower bound of the range comparison to be another function relative to the document, that's equivalent to unwinding that lower bound function and rolling it into the function you are testing -- just like i did in the example i posted in #4 of my email (below). By requiring that the upper and lower bounds be fixed, the common case can be optimized, but cases like that one can still be supported. if the lower and upper bound params supported arbitrary functions, the implementation would be a lot more complex -- slower for the common case, and the same speed for uncommon cases like what you're describing. : 3) right. i hadn't bothered with the upper limit yet simply for sake of : less complexity / chance to fk it up. wanted to get the function working : for lower before worrying about adding u= and getting the query refined To be very clear: even if what you were trying to do worked as you wrote it, adding an upper bound wouldn't change the fact that the comparison you were trying to make in your query doesn't match the problem statement in your question, and doesn't make sense in general -- you need to compare the field value with 10% of some OTHER input value -- not 10% of itself. Adding an upper bound that was similar to the lower bound you were trying would have simply prevented any docs from matching at all. : : Expected identifier at pos 29 str='{!frange l=sum(size1, product(size1, : : .10))}size1 : : : : pos 29 is the open parenthesis of product(). can i not use a function : : within a function? or is there something else i'm missing in the way i'm : : constructing this? : : 1) you're confusing the parser by trying to put whitespace inside of a : local param (specifically the 'l' param) w/o quoting the param value .. it : thinks that you want sum(size1 to be the value of the l param, and : then it doesn't know what to make of product(size1 as another local : param that it can't make sense of. : : 2) if you remove the whitespace, or quote the param, that will solve that : parsing error -- but it will lead to a new error from : ValueSourceRangeFilter (ie: frange) because the l param doesn't : support arbitrary functions -- it needs to be a concrete number. : : 3) even if you could pass a function for the l param, conceptually what : you are asking for doesn't really make much sense ... you are asking solr : to only return documents where the value of the size1 field is in a : range between X and infinity, where X is defined as the sum of the value : of the size1 field plus 10% of the value of the size1 field. : : In other words: give me all docs where S * 1.1 <= S : : Basically you are asking it to return all documents with a negative value : in the size1 field. : : : 4) your original question was about filtering docs where the value of a : field was inside a range of +/- X% of a specified value. a range query : where you computed the lower/upper bounds based on that percentage in the : client is really the most straight forward way to do that. : : the main reason to consider using frange for something like this is if you : want to filter documents based on the results of a function over multiple : fields. (ie: docs where the price field divided by the quantity_included : field was within a client specified range) : : admittedly, you could do something like this... : : fq={!frange u=0.10}div(abs(sub($target,size1)),$target)&target=345 : : ...which would tell solr to find you all documents where the size1 field : was within 10% of the target value (345 in this case) -- ie: 310.5 <= : size1 <= 379.5) : : however it's important to realize that doing something like this is going : to be less efficient than just computing the lower/upper range : bounds in the client -- because solr will be evaluating that function for : every document in order to make the comparison. (meanwhile you can : compute the upper and lower bounds exactly once and just let solr do the : comparisons) : : : -Hoss : http://www.lucidworks.com/ : : -Hoss http://www.lucidworks.com/
Re: When is too many fields in qf is too many?
Hi Folks, First, thanks for taking the time to read and reply to this subject, it is much appreciated. I have yet to come up with a final solution that optimizes Solr. To give you more context, let me give you the big picture of how the application and the database are structured, for which I'm trying to enable Solr search. Application: Has the concept of views. A view contains one or more object types. An object type may exist in any view. An object type has one or more field groups. A field group has a set of fields. A field group can be used with any object type of any view. Notice how field groups are free standing, that they can be linked to an object type of any view? Here is a diagram of the above: FieldGroup-#1 == Field-1, Field-2, Field-5, etc. FieldGroup-#2 == Field-1, Field-5, Field-6, Field-7, Field-8, etc. FieldGroup-#3 == Field-2, Field-5, Field-8, etc. View-#1 == ObjType-#2 (using FieldGroup-#1 #3) + ObjType-#4 (using FieldGroup-#1) + ObjType-#5 (using FieldGroup-#1, #2, #3, etc). View-#2 == ObjType-#1 (using FieldGroup-#3, #15, #16, #19, etc.) + ObjType-#4 (using FieldGroup-#1, #4, #19, etc.) + etc. View-#3 == ObjType-#1 (using FieldGroup-#1, #8) + etc. Do you see where this is heading? To make it even a bit more interesting, ObjType-#4 (which is in View-#1 and #2 per the above) uses FieldGroup-#1 in both views, but in one view it can be configured to have its own fields off FieldGroup-#1. With the above setting, a user is assigned a view and can be moved around views but cannot be in multiple views at the same time. Based on which view that user is in, that user will see different fields of ObjType-#1 (the example I gave for FieldGroup-#1) or even not see an object type that he was able to see in another view. If I have not lost you with the above, you can see that per view, there can be many fields. To make it even yet more interesting, a field in FieldGroup-#1 may have the exact same name as a field in another FieldGroup and the two could be of different types (one is a date, the other a string). Thus, when I build my Solr doc object (and create the list of Solr fields), those fields must be prefixed with the FieldGroup name, otherwise I could end up overwriting the type of another field. Are you still with me? :-) Now you see how a view can end up with many fields (over 3500 in my case), but a doc I post to Solr for indexing will have on average 50 fields, worst case maybe 200 fields. This is fine, and it is not my issue, but I want to call it out to get it out of our way. Another thing I need to mention is this (in case it is not clear from the above). Users create and edit records in the DB as instances of ObjType-#N. Those object type instances do NOT belong to a view, in fact they do NOT have any view concept in them. They simply have the concept of what fields the user can see / edit based on which view that user is in. In effect, in the DB, we have instances of object type data. One last thing I should point out is that views and field groups are dynamic. This month, View-#3 may have ObjType-#1, but next month it may not, or a new object type may be added to it. Still with me? If so, you are my hero!! :-) So, I set up my Solr schema.xml to include all fields off each field group that exists in the database, like so: <field name="FieldGroup-1.Headline" type="text" multiValued="true" indexed="true" stored="false" required="false"/> <field name="FieldGroup-1.Summary" type="text" multiValued="true" indexed="true" stored="false" required="false"/> <field name="FieldGroup-1. ..." ... /> <field name="FieldGroup-2.Headline" type="text" multiValued="true" indexed="true" stored="false" required="false"/> <field name="FieldGroup-2.Summary" type="text" multiValued="true" indexed="true" stored="false" required="false"/> <field name="FieldGroup-2.Date" type="text" multiValued="true" indexed="true" stored="false" required="false"/> <field name="FieldGroup-2. ..." ... /> <field name="FieldGroup-3. ..." ... /> <field name="FieldGroup-4. ..." ... /> You get the idea. Each record of an object type I index contains ALL the fields of that object type REGARDLESS of which view that object type is set to be in (remember, all that a view does is let you configure the list of fields visible / accessible in that view). Next, in Solr I created request handlers per view. The request handler utilizes qf to list all fields that are viewable for that view. When a user logs into the application, I know which view that user is in, so I issue a search request against that view; in effect the search is against the list of fields of that view. Why not create a per-view pseudo Solr field, copyField the fields' data into it, and then use that single field as the qf vs. hundreds of fields? Two reasons: 1) Like I said above, views are dynamic. On a monthly basis, object types or even field groups can be added / removed from a view. If I was using copyField it means I have
Problem indexing, value 0.0
Here's the error I am getting on Solr 4.9.1 on a production server: ERROR - 2015-05-28 14:39:13.449; org.apache.solr.common.SolrException; org.apache.solr.common.SolrException: ERROR: [doc=getty36914013] Error adding field 'price'='0.0' msg=For input string: 0.0 On a dev server, everything is fine. There are only a few tiny differences. Production Solr: 4.9.1, CentOS 6, Oracle JDK 7u72. Production build (SolrJ): CentOS 6, Oracle JDK 7u72, SolrJ 5.1.0. The production solr and production build are running on different servers. Dev Solr: 4.9.1, CentOS 7, OpenJDK 7u79. Dev build (SolrJ): CentOS 7, OpenJDK 7u79, SolrJ 5.1.0. The dev solr and dev build are on the same server. The two build programs are running the same SolrJ code. The code is compiled locally by JDK that runs it. --- Initially, price was an int field - TrieIntField with precisionStep set to 0. Because we are planning a change on this field in the database to decimal, I tried changing the price field on both solr servers to double -- TrieDoubleField with precisionStep set to 0. This didn't fix the problem. Dev was still fine, production throws the exception shown. The source database is still integer ... so why is it showing 0.0 as the value? Help? Thanks, Shawn
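The exception itself is just TrieIntField refusing to parse the decimal string "0.0". Independent of tracking down the root cause, the indexing client can coerce defensively before building the document; a minimal sketch of the idea (shown in Python for brevity, even though the build program here uses SolrJ):

    def to_solr_int(raw):
        # "0.0" -> 0, "42" -> 42; still fails loudly if the value is genuinely non-numeric
        return int(float(raw))

    doc = {"id": "getty36914013", "price": to_solr_int("0.0")}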
Re: Dynamic range on numbers
1) ooo, i see 2) lame :\ 3) right. i hadn't bothered with the upper limit yet simply for sake of less complexity / chance to fk it up. wanted to get the function working for lower before worrying about adding u= and getting the query refined 4) very good point about just doing it client side. i know in one instance (and the most immediate one as far as product development/goals is concerned) this would certainly be both easily doable and desired. there are other cases where i could see us trying to find a document and then based off of its returned sizes trying to find a range of items like it (via morelikethis i assume?). in either case, point 4 stands and i probably got carried away in the learning process w/o stepping back to think about real life implementation and workarounds. thanks! -- *John Blythe* Product Manager Lead Developer 251.605.3071 | j...@curvolabs.com www.curvolabs.com 58 Adams Ave Evansville, IN 47713 On Thu, May 28, 2015 at 3:06 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Expected identifier at pos 29 str='{!frange l=sum(size1, product(size1, : .10))}size1 : : pos 29 is the open parenthesis of product(). can i not use a function : within a function? or is there something else i'm missing in the way i'm : constructing this? 1) you're confusing the parser by trying to put whitespace inside of a local param (specifically the 'l' param) w/o quoting the param value .. it things that you want sum(size1 to be the value of the l param, and then it doesn't know what to make of product(size1 as a another local param that it can't make sense of. 2) if you remove the whitespace, or quote the param, that will solve that parsing error -- but it will lead to a new error from ValueSourceRangeFilter (ie: frange) because the l param doesn't support arbitrary functions -- it needs to be a concrete number. 3) even if you could pass a function for the l param, conceptually what you are asking for doesn't really make much sense ... you are asking solr to only return documents where the value of the size1 field is in a range between X and infinity, where X is defined as the sum of the value of the size1 field plus 10% of the value of the size1 field. In other words: give me all docs where S * 1.1 = S Basically you are asking it to return all documents with a negative value in the size1 field. 4) your original question was about filtering docs where the value of a field was inside a range of +/- X% of a specified value. a range query where you computed the lower/upper bounds bsed on that percentage in the client is really the most straight forward way to do that. the main reason to consider using frange for something like this is if you wnat to filter documents based on the reuslts of a function over multiple fields. (ie: docs where the price field divided by the quantity_included field was within a client specified range) adimitedly, you could do something like this... fq={!frange u=0.10}div(abs(sub($target,size1)),$target)target=345 ...which would tell solr to find you all documents where the size1 field was within 10% of the target value (345 in this case) -- ie: 310.5 = size1 = 379.5) however it's important to realize that doing something like this is going to be less inefficient then just computing the lower/upper range bounds in the client -- because solr will be evaluating that function for every document in order to make the comparison. (meanwhile you can compute the upper and lower bounds exactly once and just let solr do the comparisons) -Hoss http://www.lucidworks.com/
Re: Dynamic range on numbers
i'm not sure i follow what you're saying on #3. let me clarify in case it's on my end. i was wanting to *eventually* set a lower bound of -10%size1 and an upper of +10%size1. for the sake of experimentation i started with just the lower bound. i didn't care (at that point) about the results, just getting a successful query to run. upon getting to that point i would then tailor the lower and upper bounds accordingly to begin testing more true to life queries. at any rate, the #4 point seems to be the path to take for the present. thanks for the discussion! -- *John Blythe* Product Manager Lead Developer 251.605.3071 | j...@curvolabs.com www.curvolabs.com 58 Adams Ave Evansville, IN 47713 On Thu, May 28, 2015 at 4:56 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : 2) lame :\ Why do you say that? ... it's a practical limitation -- for each document a function is computed, and then the result of that function is compared against the (fixed) upper and lower bounds. In situations where you want the something like the lower bound of the range comparison to be another function relative to the document, that's equivilent to unwinding that lower bound function and rolling it into the function you are testing -- just like i did in the example i posted in #4 of my email (below) By requiring that the uper and lower bounds be fixed, the common case can be optimized, but cases like that one can stll be supported. if the lower upper bound params supported arbitrary functions, the implementation would be a lot more complex -- slower for hte common case, and hte same speed for uncommon cases like what you're describing. : 3) right. i hadn't bothered with the upper limit yet simply for sake of : less complexity / chance to fk it up. wanted to get the function working : for lower before worrying about adding u= and getting the query refined To be very clear: even if what you were trying to do worked as you wrote it, adding an upper bound wouldn't change the fact that the comparison you were trying ot make in your query doesn't match the problem statement in your question, and doesn't make sense in general -- you need to compare the field value with 10% of some OTHER input value -- not 10% of itself. Adding an upper bound that was similar to hte lower bound you were trying would have simply prevented any docs from mathcing at all. : : Expected identifier at pos 29 str='{!frange l=sum(size1, product(size1, : : .10))}size1 : : : : pos 29 is the open parenthesis of product(). can i not use a function : : within a function? or is there something else i'm missing in the way i'm : : constructing this? : : 1) you're confusing the parser by trying to put whitespace inside of a : local param (specifically the 'l' param) w/o quoting the param value .. it : things that you want sum(size1 to be the value of the l param, and : then it doesn't know what to make of product(size1 as a another local : param that it can't make sense of. : : 2) if you remove the whitespace, or quote the param, that will solve that : parsing error -- but it will lead to a new error from : ValueSourceRangeFilter (ie: frange) because the l param doesn't : support arbitrary functions -- it needs to be a concrete number. : : 3) even if you could pass a function for the l param, conceptually what : you are asking for doesn't really make much sense ... 
you are asking solr : to only return documents where the value of the size1 field is in a : range between X and infinity, where X is defined as the sum of the value : of the size1 field plus 10% of the value of the size1 field. : : In other words: give me all docs where S * 1.1 = S : : Basically you are asking it to return all documents with a negative value : in the size1 field. : : : 4) your original question was about filtering docs where the value of a : field was inside a range of +/- X% of a specified value. a range query : where you computed the lower/upper bounds bsed on that percentage in the : client is really the most straight forward way to do that. : : the main reason to consider using frange for something like this is if you : wnat to filter documents based on the reuslts of a function over multiple : fields. (ie: docs where the price field divided by the quantity_included : field was within a client specified range) : : adimitedly, you could do something like this... : : fq={!frange u=0.10}div(abs(sub($target,size1)),$target)target=345 : : ...which would tell solr to find you all documents where the size1 field : was within 10% of the target value (345 in this case) -- ie: 310.5 = : size1 = 379.5) : : however it's important to realize that doing something like this is going : to be less inefficient then just computing the lower/upper range : bounds in the client -- because solr will be evaluating that function for : every document in
RE: optimal shard assignment with low shard key cardinality using compositeId to enable shard splitting
We have used a similar sharding strategy for exactly the reasons you say. But we are fairly certain that the # of documents per user ID is < 5000 and, typically, < 500. Thus, we think the overhead of distributed searches clearly outweighs the benefits. Would you agree? We have done some load testing (with 100's of simultaneous users) and performance has been good with data and queries distributed evenly across shards. In Matteo's case, this model appears to apply well to user types B and C. Not sure about user type A, though. At 100,000 docs per user per year, on average, that load seems ok for one node. But, is it enough to benefit significantly from a parallel search? With a 2-part composite ID, each part will contribute 16 bits to a 32-bit hash value, which is then compared to the set of hash ranges for each active shard. Since the user ID will contribute the high-order bytes, it will dominate in matching the target shard(s). But dominance doesn't mean the lower-order 16 bits will always be ignored, does it? I.e. if the original shard has been split, perhaps multiple times, isn't it possible that one user ID's documents will be spread over multiple shards? In Matteo's case, it might make sense to specify fewer bits for the user ID for user category A. I.e. what I described above is the default for userId!docId. But if you use userId/8!docId/24 (8 bits for userId and 24 bits for the document ID), then couldn't one user's docs be split over multiple shards, even without shard splitting? I'm just making sure I understand how composite ID sharding works correctly. Have I got it right? Has any of this logic changed in 5.x? -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Thursday, May 21, 2015 11:30 AM To: solr-user@lucene.apache.org Subject: Re: optimal shard assignment with low shard key cardinality using compositeId to enable shard splitting I question your base assumption: bq: So shard by document producer seems a good choice Because what this _also_ does is force all of the work for a query onto one node and all indexing for a particular producer ditto. And will cause you to manually monitor your shards to see if some of them grow out of proportion to others. And I think it would be much less hassle to just let Solr distribute the docs as it may based on the uniqueKey and forget about it. Unless you want, say, to do joins etc. There will, of course, be some overhead that you pay here, but unless you can measure it and it's a pain I wouldn't add the complexity you're talking about, especially at the volumes you're talking. Best, Erick On Thu, May 21, 2015 at 3:20 AM, Matteo Grolla matteo.gro...@gmail.com wrote: Hi I'd like some feedback on how I'd like to solve the following sharding problem. I have a collection that will eventually become big. Average document size is 1.5kb. Every year 30 million documents will be indexed. Data come from different document producers (a person, owner of his documents) and queries are almost always performed by a document producer who can only query his own documents. So shard by document producer seems a good choice. there are 3 types of doc producer: type A, cardinality 105 (there are 105 producers of this type), produces 17M docs/year (the aggregated production of all type A producers) type B, cardinality ~10k, produces 4M docs/year type C, cardinality ~10M, produces 9M docs/year I'm thinking about using compositeId ( solrDocId = producerId!docId ) to send all docs of the same producer to the same shards.
When a shard becomes too large I can use shard splitting. problems: documents from type A producers could be oddly distributed among shards, because hashing doesn't work well on small numbers (105), see Appendix. As a solution I could do this when a new type A producer (producerA1) arrives: 1) client app: generate a producer code 2) client app: simulate murmurhashing and shard assignment 3) client app: check shard assignment is optimal (producer code is assigned to the shard with the least type A producers), otherwise goto 1) and try with another code. When I add documents or perform searches for producerA1 I use its producer code, respectively in the compositeId or in the route parameter. What do you think? ---Appendix: murmurhash shard assignment simulation---
import mmh3
hashes = [mmh3.hash(str(i)) >> 16 for i in xrange(105)]
num_shards = 16
shards = [0]*num_shards
for hash in hashes:
    idx = hash % num_shards
    shards[idx] += 1
print shards
print sum(shards)
- result: [4, 10, 6, 7, 8, 6, 7, 8, 11, 1, 8, 5, 6, 5, 5, 8] So with 16 shards and 105 shard keys I can have shards with 1 key and shards with 11 keys.
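A sketch of the generate / simulate / check loop from steps 1-3 above, reusing the same 16-bit simulation as the appendix (the producer-code format, shard count and the in-memory load counters are assumptions; a real client would persist them):

    import mmh3
    import random

    num_shards = 16
    producers_per_shard = [0] * num_shards   # how many type A producers each shard already holds

    def pick_producer_code(max_tries=1000):
        least_loaded = min(producers_per_shard)
        for _ in range(max_tries):
            code = "A%06d" % random.randrange(1000000)    # hypothetical producer code format
            shard = (mmh3.hash(code) >> 16) % num_shards  # same simulation as the appendix
            if producers_per_shard[shard] == least_loaded:
                producers_per_shard[shard] += 1
                return code, shard
        raise RuntimeError("could not find a code landing on a least-loaded shard")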
Re: Problem indexing, value 0.0
On 5/28/2015 3:08 PM, Shawn Heisey wrote: Because we are planning a change on this field in the database to decimal, I tried changing the price field on both solr servers to double -- TrieDoubleField with precisionStep set to 0. This didn't fix the problem. Dev was still fine, production throws the exception shown. The source database is still integer ... so why is it showing 0.0 as the value? I was too hasty in this part of my message. On the copy of the index where I changed the type to double, it IS working. The error messages were coming from the copy of the index where I haven't yet changed it -- the online copy that is being used for queries. I had to wipe the indexes on the fixed copy because the schema changed. The initial problem (suddenly getting floating point numbers from an integer database column) just appeared out of the blue, with the only notable difference between the two systems being the Java version. The Linux kernel and associated software is different as well, but that seems less likely to cause problems. Thanks, Shawn
Re: optimal shard assignment with low shard key cardinality using compositeId to enable shard splitting
Charles: You raise good points, and I didn't mean to say that co-locating docs due to some criteria was never a good idea. That said, it does add administrative complexity that I'd prefer to avoid unless necessary. I suppose it largely depends on what the load and response SLAs are. If there's 1 query/second peak load, the sharding overhead for queries is probably not noticeable. If there are 1,000 QPS, then it might be worth it. Measure, measure, measure... I think your composite ID understanding is fine. Best, Erick On Thu, May 28, 2015 at 1:40 PM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: We have used a similar sharding strategy for exactly the reasons you say. But we are fairly certain that the # of documents per user ID is < 5000 and, typically, < 500. Thus, we think the overhead of distributed searches clearly outweighs the benefits. Would you agree? We have done some load testing (with 100's of simultaneous users) and performance has been good with data and queries distributed evenly across shards. In Matteo's case, this model appears to apply well to user types B and C. Not sure about user type A, though. At 100,000 docs per user per year, on average, that load seems ok for one node. But, is it enough to benefit significantly from a parallel search? With a 2-part composite ID, each part will contribute 16 bits to a 32-bit hash value, which is then compared to the set of hash ranges for each active shard. Since the user ID will contribute the high-order bytes, it will dominate in matching the target shard(s). But dominance doesn't mean the lower-order 16 bits will always be ignored, does it? I.e. if the original shard has been split, perhaps multiple times, isn't it possible that one user ID's documents will be spread over multiple shards? In Matteo's case, it might make sense to specify fewer bits for the user ID for user category A. I.e. what I described above is the default for userId!docId. But if you use userId/8!docId/24 (8 bits for userId and 24 bits for the document ID), then couldn't one user's docs be split over multiple shards, even without shard splitting? I'm just making sure I understand how composite ID sharding works correctly. Have I got it right? Has any of this logic changed in 5.x? -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Thursday, May 21, 2015 11:30 AM To: solr-user@lucene.apache.org Subject: Re: optimal shard assignment with low shard key cardinality using compositeId to enable shard splitting I question your base assumption: bq: So shard by document producer seems a good choice Because what this _also_ does is force all of the work for a query onto one node and all indexing for a particular producer ditto. And will cause you to manually monitor your shards to see if some of them grow out of proportion to others. And I think it would be much less hassle to just let Solr distribute the docs as it may based on the uniqueKey and forget about it. Unless you want, say, to do joins etc. There will, of course, be some overhead that you pay here, but unless you can measure it and it's a pain I wouldn't add the complexity you're talking about, especially at the volumes you're talking.
Best, Erick On Thu, May 21, 2015 at 3:20 AM, Matteo Grolla matteo.gro...@gmail.com wrote: Hi I'd like some feedback on how I'd like to solve the following sharding problem I have a collection that will eventually become big Average document size is 1.5kb Every year 30 Million documents will be indexed Data come from different document producers (a person, owner of his documents) and queries are almost always performed by a document producer who can only query his own document. So shard by document producer seems a good choice there are 3 types of doc producer type A, cardinality 105 (there are 105 producers of this type) produce 17M docs/year (the aggregated production af all type A producers) type B cardinality ~10k produce 4M docs/year type C cardinality ~10M produce 9M docs/year I'm thinking about use compositeId ( solrDocId = producerId!docId ) to send all docs of the same producer to the same shards. When a shard becomes too large I can use shard splitting. problems -documents from type A producers could be oddly distributed among shards, because hashing doesn't work well on small numbers (105) see Appendix As a solution I could do this when a new typeA producer (producerA1) arrives: 1) client app: generate a producer code 2) client app: simulate murmurhashing and shard assignment 3) client app: check shard assignment is optimal (producer code is assigned to the shard with the least type A producers) otherwise goto 1) and try with another code when I add documents or perform searches for producerA1 I use it's producer code respectively in the compositeId or in the route
UI Velocity
Hi I tried to use the UI Velocity from Solr. Could you please help in the following: - how do I define the fields from my schema that I would like to be displayed as facet in the UI? Thanks! Benjamin
RE: When is too many fields in qf is too many?
Still, it seems like the right direction. Does it smell ok to have a few hundred request handlers?Again, my logic is that if any given view requires no more than 50 fields, one request handler per view would work. This is different than a request handler per user category (which requires access to any number of views and, thus, many more fields). This does require a design change for Steven's application ... Steven, do you have tables of the many-to-many relationship between fields and views and users and views? If so, you should be able to programmatically generate the request handlers. If these relationships change frequently, then some custom plugin will be required to access these tables at query time. See what I mean? -Original Message- From: Erick Erickson [mailto:erickerick...@gmail.com] Sent: Thursday, May 28, 2015 12:07 PM To: solr-user@lucene.apache.org Subject: Re: When is too many fields in qf is too many? Gotta agree with Jack here. This is an insane number of fields, query performance on any significant corpus will be fraught etc. The very first thing I'd look at is having that many fields. You have 3,500 different fields! Whatever the motivation for having that many fields is the place I'd start. Best, Erick On Thu, May 28, 2015 at 5:50 AM, Jack Krupansky jack.krupan...@gmail.com wrote: This does not even pass a basic smell test for reasonability of matching the capabilities of Solr and the needs of your application. I'd like to hear from others, but I personally would be -1 on this approach to misusing qf. I'd simply say that you need to go back to the drawing board, and that your primary focus should be on working with your application product manager to revise your application requirements to more closely match the capabilities of Solr. To put it simply, if you have more than a dozen fields in qf, you're probably doing something wrong. In this case horribly wrong. Focus on designing your app to exploit the capabilities of Solr, not to misuse them. In short, to answer the original question, more than a couple dozen fields in qf is indeed too many. More than a dozen raises a yellow flag for me. -- Jack Krupansky On Thu, May 28, 2015 at 8:13 AM, Steven White swhite4...@gmail.com wrote: Hi Charles, That is what I have done. At the moment, I have 22 request handlers, some have 3490 field items in qf (that's the most and the qf line spans over 95,000 characters in solrconfig.xml file) and the least one has 1341 fields. I'm working on seeing if I can use copyField to copy the data of that view's field into a single pseudo-view-field and use that pseudo field for qf of that view's request handler. The I still have outstanding with using copyField in this way is that it could lead to a complete re-indexing of all the data in that view when a field is adding / removing from that view. Thanks Steve On Wed, May 27, 2015 at 6:02 PM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: One request handler per view? I think if you are able to make the actual view in use for the current request a single value (vs. all views that the user could use over time), it would keep the qf list down to a manageable size (e.g. specified within the request handler XML). Not sure if this is feasible for you, but it seems like a reasonable approach given the use case you describe. Just a thought ... -Original Message- From: Steven White [mailto:swhite4...@gmail.com] Sent: Tuesday, May 26, 2015 4:48 PM To: solr-user@lucene.apache.org Subject: Re: When is too many fields in qf is too many? Thanks Doug. 
I might have to take you on the hangout offer. Let me refine the requirement further and if I still see the need, I will let you know. Steve On Tue, May 26, 2015 at 2:01 PM, Doug Turnbull dturnb...@opensourceconnections.com wrote: How you have tie is fine. Setting tie to 1 might give you reasonable results. You could easily still have scores that are just always an order of magnitude or two higher, but try it out! BTW Anything you put in teh URL can also be put into a request handler. If you ever just want to have a 15 minute conversation via hangout, happy to chat with you :) Might be fun to think through your prob together. -Doug On Tue, May 26, 2015 at 1:42 PM, Steven White swhite4...@gmail.com wrote: Hi Doug, I'm back to this topic. Unfortunately, due to my DB structer, and business need, I will not be able to search against a single field (i.e.: using copyField). Thus, I have to use list of fields via qf. Given this, I see you said above to use tie=1.0 will that, more or less, address this scoring issue? Should tie=1.0 be set on the request handler like so: requestHandler name=/select class=solr.SearchHandler lst
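If the view-to-field relationships already live in tables, the qf lists (and even the handler XML) can be generated rather than maintained by hand, along the lines Charles suggests above. A minimal sketch of that generation step; the view-to-field mapping and field names are made-up placeholders, not taken from this thread:

    # Assumed mapping pulled from the application's view / field-group tables.
    view_fields = {
        "View-1": ["FieldGroup-1.Headline", "FieldGroup-1.Summary", "FieldGroup-3.Date"],
        "View-2": ["FieldGroup-2.Headline", "FieldGroup-2.Summary"],
    }

    def qf_for_view(view):
        return " ".join(view_fields[view])

    def handler_xml(view):
        # One handler per view; the output is pasted (or templated) into solrconfig.xml.
        return (
            '<requestHandler name="/select-%s" class="solr.SearchHandler">\n'
            '  <lst name="defaults">\n'
            '    <str name="defType">edismax</str>\n'
            '    <str name="qf">%s</str>\n'
            '  </lst>\n'
            '</requestHandler>'
        ) % (view.lower(), qf_for_view(view))

    print(handler_xml("View-1"))

Regenerating and reloading the config when views change is then a scripted step rather than a hand edit of a 95,000-character qf line.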
RE: docValues: Can we apply synonym
Again, I would recommend using Nolan Lawson's SynonymExpandingExtendedDismaxQParserPlugin. http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Wednesday, May 27, 2015 6:42 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Ok and what synonym processor you is talking about maybe it could help ? With Regards Aman Tandon On Thu, May 28, 2015 at 4:01 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Sorry, my bad. The synonym processor I mention works differently. It's an extension of the EDisMax query processor and doesn't require field level synonym configs. -Original Message- From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org] Sent: Wednesday, May 27, 2015 6:12 PM To: solr-user@lucene.apache.org Subject: RE: docValues: Can we apply synonym But the query analysis isn't on a specific field, it is applied to the query string. -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Wednesday, May 27, 2015 6:08 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Hi Charles, The problem here is that the docValues works only with primitives data type only like String, int, etc So how could we apply synonym on primitive data type. With Regards Aman Tandon On Thu, May 28, 2015 at 3:19 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Is there any reason you cannot apply the synonyms at query time? Applying synonyms at indexing time has problems, e.g. polluting the term frequency for synonyms added, preventing distance queries, ... Since city names often have multiple terms, e.g. New York, Den Hague, etc., I would recommend using Nolan Lawson's SynonymExpandingExtendedDismaxQParserPlugin. Tastes great, less filling. http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ We found this to fix synonyms like ny for New York and vice versa. Haven't tried it with docValues, tho. -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Tuesday, May 26, 2015 11:15 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Yes it could be :) Anyway thanks for helping. With Regards Aman Tandon On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I should investigate that, as usually synonyms are analysis stage. A simple way is to replace the word with all its synonyms ( including original word), but simply using this kind of processor will change the token position and offsets, modifying the actual content of the document . I am from Bombay will become I am from Bombay Mumbai which can be annoying. So a clever approach must be investigated. 2015-05-26 17:36 GMT+01:00 Aman Tandon amantandon...@gmail.com: Okay So how could I do it with UpdateProcessors? With Regards Aman Tandon On Tue, May 26, 2015 at 10:00 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: mmm this is different ! Without any customisation, right now you could : - use docValues to provide exact value facets. - Than you can use a copy field, with the proper analysis, to search when a user click on a filter ! So you will see in your facets : Mumbai(3) Bombay(2) And when clicking you see 5 results. A little bit misleading for the users … On the other hand if you you want to apply the synonyms before, the indexing pipeline ( because docValues field can not be analysed), I think you should play with UpdateProcessors. 
Cheers 2015-05-26 17:18 GMT+01:00 Aman Tandon amantandon...@gmail.com: We are interested in using docValues for better memory utilization and speed. Currently we are faceting the search results on *city. *In city we have also added the synonym for cities like mumbai, bombay (These are Indian cities). So that result of mumbai is also eligible when somebody will applying filter of bombay on search results. I need this functionality to apply with docValues enabled field. With Regards Aman Tandon On Tue, May 26, 2015 at 9:19 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I checked in the Documentation to be sure, but apparently : DocValues are only available for specific field types. The types chosen determine the underlying Lucene docValue type that will be used. The available Solr field types are: - StrField and UUIDField. - If the field is single-valued (i.e., multi-valued is false), Lucene
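One way to get the effect Aman is after without analyzing the docValues field is to canonicalize the value on the client before the document is sent, which is the same idea as an update processor, just done in the indexing code. A minimal sketch; the synonym map and the city field name are assumptions:

    # Map every known alias to a single canonical facet value before indexing.
    CITY_SYNONYMS = {"bombay": "Mumbai", "mumbai": "Mumbai"}

    def canonical_city(value):
        return CITY_SYNONYMS.get(value.strip().lower(), value)

    doc = {"id": "1", "city": canonical_city("Bombay")}   # city becomes "Mumbai"

Faceting on the canonical value then shows a single Mumbai count instead of the split Mumbai(3) / Bombay(2), and a separate analyzed copyField can still handle free-text matching on either form, as Alessandro suggests.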
Re: Per field mm parameter
You could use local params with a filter query and specify a different mm in each local param. Here's an example for our VA State Laws Solr (you're free to poke around with it). Here I only allow search results that have mm=1 on catch_line (a title field) and mm=2 on the text field. http://solr.quepid.com/solr/statedecoded/select?q=deer hunting&fq={!edismax qf=text mm=2 v=$q}&fq={!edismax qf=catch_line mm=1 v=$q}&defType=edismax&qf=text catch_line&tie=1 You can see the results a bit prettier here at Splainer: http://splainer.io/#?solr=http:%2F%2Fsolr.quepid.com%2Fsolr%2Fstatedecoded%2Fselect%3Fq%3Ddeer%20hunting%0A%26fq%3D%7B!edismax%20qf%3Dtext%20mm%3D2%20v%3D$q%7D%0A%26fq%3D%7B!edismax%20qf%3Dcatch_line%20mm%3D1%20v%3D$q%7D%26defType%3Dedismax%26qf%3Dtext%20catch_line%26tie%3D1 Hope that helps, -Doug On Thu, May 28, 2015 at 12:35 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : Subject: Per field mm parameter : : How to specify per field mm parameter in edismax query. you can't. the mm param applies to the number of minimum match clauses in the final query, where each of those clauses is a disjunction over each of the qf fields. this blog might help explain the query structure... https://lucidworks.com/blog/whats-a-dismax/ -Hoss http://www.lucidworks.com/ -- *Doug Turnbull **| *Search Relevance Consultant | OpenSource Connections, LLC | 240.476.9983 | http://www.opensourceconnections.com Author: Relevant Search http://manning.com/turnbull from Manning Publications
Re: Dynamic range on numbers
: Expected identifier at pos 29 str='{!frange l=sum(size1, product(size1, : .10))}size1 : : pos 29 is the open parenthesis of product(). can i not use a function : within a function? or is there something else i'm missing in the way i'm : constructing this? 1) you're confusing the parser by trying to put whitespace inside of a local param (specifically the 'l' param) w/o quoting the param value .. it thinks that you want sum(size1 to be the value of the l param, and then it doesn't know what to make of product(size1 as another local param that it can't make sense of. 2) if you remove the whitespace, or quote the param, that will solve that parsing error -- but it will lead to a new error from ValueSourceRangeFilter (ie: frange) because the l param doesn't support arbitrary functions -- it needs to be a concrete number. 3) even if you could pass a function for the l param, conceptually what you are asking for doesn't really make much sense ... you are asking solr to only return documents where the value of the size1 field is in a range between X and infinity, where X is defined as the sum of the value of the size1 field plus 10% of the value of the size1 field. In other words: give me all docs where S * 1.1 <= S Basically you are asking it to return all documents with a negative value in the size1 field. 4) your original question was about filtering docs where the value of a field was inside a range of +/- X% of a specified value. a range query where you computed the lower/upper bounds based on that percentage in the client is really the most straight forward way to do that. the main reason to consider using frange for something like this is if you want to filter documents based on the results of a function over multiple fields. (ie: docs where the price field divided by the quantity_included field was within a client specified range) admittedly, you could do something like this... fq={!frange u=0.10}div(abs(sub($target,size1)),$target)&target=345 ...which would tell solr to find you all documents where the size1 field was within 10% of the target value (345 in this case) -- ie: 310.5 <= size1 <= 379.5) however it's important to realize that doing something like this is going to be less efficient than just computing the lower/upper range bounds in the client -- because solr will be evaluating that function for every document in order to make the comparison. (meanwhile you can compute the upper and lower bounds exactly once and just let solr do the comparisons) -Hoss http://www.lucidworks.com/
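To make point 4 concrete, computing the bounds once on the client and letting Solr do a plain range filter might look like this (the collection name is an assumption; the target value 345 and 10% tolerance are the example values from above):

    import requests

    target, pct = 345.0, 0.10
    lower, upper = target * (1 - pct), target * (1 + pct)   # 310.5 and 379.5, computed once

    params = {
        "q": "*:*",
        "fq": "size1:[%s TO %s]" % (lower, upper),
        "wt": "json",
    }
    r = requests.get("http://localhost:8983/solr/collection1/select", params=params)
    print(r.json()["response"]["numFound"])

The frange form is only worth reaching for when the quantity being tested is itself a function of several fields, as in the price / quantity_included example.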
Re: Unsubscribe
Please follow instructions here: http://lucene.apache.org/solr/resources.html Be sure to use the exact e-mail address you originally subscribed with. On Thu, May 28, 2015 at 9:49 AM, Nirali Mehta nirali...@gmail.com wrote: Unsubscribe
Re: Dynamic range on numbers
doh! 1) silly me, i knew better but was getting tunnel visioned 2) moved to fq and am now getting this error: Expected identifier at pos 29 str='{!frange l=sum(size1, product(size1, .10))}size1 pos 29 is the open parenthesis of product(). can i not use a function within a function? or is there something else i'm missing in the way i'm constructing this? thanks for helping me stumble through this! -- *John Blythe* Product Manager Lead Developer 251.605.3071 | j...@curvolabs.com www.curvolabs.com 58 Adams Ave Evansville, IN 47713 On Thu, May 28, 2015 at 12:37 PM, Erick Erickson erickerick...@gmail.com wrote: fq, not fl. fq is filter query fl is the field list, the stored fields to be returned to the user. Best, Erick On Thu, May 28, 2015 at 9:03 AM, John Blythe j...@curvolabs.com wrote: I've set the field to be processed as such: fieldType name=sizes class=solr.TrieDoubleField precisionStep=6 / and then have this in the fl box in Solr admin UI: *, score, {!frange l=sum(size1, product(size1, .10))}size1 I'm trying to use the size1 field as the item upon which a frange is being used, but also need to use the size1 value for the mathematical functions themselves I get this error: error: { msg: Error parsing fieldname, code: 400 } thanks for any assistance or insight -- *John Blythe* Product Manager Lead Developer 251.605.3071 | j...@curvolabs.com www.curvolabs.com 58 Adams Ave Evansville, IN 47713 On Wed, May 27, 2015 at 2:10 PM, John Blythe j...@curvolabs.com wrote: thanks erick. will give it a whirl later today and report back tonight or tomorrow. i imagine i'll have some more questions crop up :) best, -- *John Blythe* Product Manager Lead Developer 251.605.3071 | j...@curvolabs.com www.curvolabs.com 58 Adams Ave Evansville, IN 47713 On Wed, May 27, 2015 at 1:32 PM, Erick Erickson erickerick...@gmail.com wrote: 1 tfloat 2 fq=dimField:[4.5 TO 5.5] or even use frange to set the lower and upper bounds via function Best, Erick On Wed, May 27, 2015 at 5:29 AM, John Blythe j...@curvolabs.com wrote: hi all, i'm attempting to suggest products across a range to users based on dimensions. if there is a 5x10mm Drill Set for instance and a competitor sales something similar enough then i'd like to have it shown. the range, however, would need to be dynamic. i'm thinking for our initial testing phase we'll go with 10% in either direction of a number. thus, a document that hits drill set but has the size1 field set to 4.5 or 5.5 would match for the 5 in the query. 1) what's the best field type to use for numeric ranges? i'll need to account for decimal places, up to two places though usually only one. 2) is there a way to dynamically set the range? thanks!
Number of clustering labels to show
Hi, I'm trying to increase the number of cluster results shown during the search. I tried to set carrot.fragSize=20 but only 15 cluster labels are shown. Even when I tried to set carrot.fragSize=5, there are still 15 labels shown. Is this the correct way to do this? I understand that setting it to 20 might not necessarily mean 20 labels will be shown, as the setting is a maximum. But when I set this to 5, shouldn't it reduce the number of labels to 5? I'm using Solr 5.1. Regards, Edwin
Re: Relevancy Score and Proximity Search
I've tried to use the site. I saw that when I search for Matex, it actually only gives a boost of 0.8 to the word Latex as it is not the main word being searched, but I still can't understand why the score can be so high. This is what I get from the output explanation:
2.3716807 = weight(text:latex^0.8 in 106449) [DefaultSimilarity], result of:
  2.3716807 = score(doc=106449, freq=1.0), product of:
    0.95434946 = queryWeight, product of:
      0.8 = boost
      13.254017 = idf(docFreq=1, maxDocs=419645)
      0.09000568 = queryNorm
    2.4851282 = fieldWeight in 106449, product of:
      1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq
      13.254017 = idf(docFreq=1, maxDocs=419645)
      0.1875 = fieldNorm(doc=106449)
For the record with Matex, here is the output explanation:
0.18585733 = weight(text:matex in 163) [DefaultSimilarity], result of:
  0.18585733 = score(doc=163, freq=1.0), product of:
    0.2986924 = queryWeight, product of:
      3.318595 = idf(docFreq=41297, maxDocs=419645)
      0.09000568 = queryNorm
    0.62223655 = fieldWeight in 163, product of:
      1.0 = tf(freq=1.0), with freq of: 1.0 = termFreq
      3.318595 = idf(docFreq=41297, maxDocs=419645)
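Pulling the numbers out of the two explanations shows why the boosted latex term still wins: the score is queryWeight * fieldWeight, idf appears in both factors, and latex's idf of 13.254017 (docFreq=1) dwarfs matex's 3.318595 (docFreq=41297), so the 0.8 boost barely dents it. A quick check with the values above:

    # Values copied from the two explain outputs above.
    latex_query_weight = 0.8 * 13.254017 * 0.09000568   # boost * idf * queryNorm = 0.95434946
    latex_field_weight = 1.0 * 13.254017 * 0.1875       # tf * idf * fieldNorm  = 2.4851282
    print(latex_query_weight * latex_field_weight)      # ~2.3716807

    matex_query_weight = 1.0 * 3.318595 * 0.09000568    # = 0.2986924
    matex_field_weight = 0.62223655                     # from the explain above
    print(matex_query_weight * matex_field_weight)      # ~0.18585733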
Re: docValues: Can we apply synonym
Thanks chris. Yes we are using it for handling multiword synonym problem. With Regards Aman Tandon On Fri, May 29, 2015 at 12:38 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Again, I would recommend using Nolan Lawson's SynonymExpandingExtendedDismaxQParserPlugin. http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Wednesday, May 27, 2015 6:42 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Ok and what synonym processor you is talking about maybe it could help ? With Regards Aman Tandon On Thu, May 28, 2015 at 4:01 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Sorry, my bad. The synonym processor I mention works differently. It's an extension of the EDisMax query processor and doesn't require field level synonym configs. -Original Message- From: Reitzel, Charles [mailto:charles.reit...@tiaa-cref.org] Sent: Wednesday, May 27, 2015 6:12 PM To: solr-user@lucene.apache.org Subject: RE: docValues: Can we apply synonym But the query analysis isn't on a specific field, it is applied to the query string. -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Wednesday, May 27, 2015 6:08 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Hi Charles, The problem here is that the docValues works only with primitives data type only like String, int, etc So how could we apply synonym on primitive data type. With Regards Aman Tandon On Thu, May 28, 2015 at 3:19 AM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: Is there any reason you cannot apply the synonyms at query time? Applying synonyms at indexing time has problems, e.g. polluting the term frequency for synonyms added, preventing distance queries, ... Since city names often have multiple terms, e.g. New York, Den Hague, etc., I would recommend using Nolan Lawson's SynonymExpandingExtendedDismaxQParserPlugin. Tastes great, less filling. http://nolanlawson.com/2012/10/31/better-synonym-handling-in-solr/ We found this to fix synonyms like ny for New York and vice versa. Haven't tried it with docValues, tho. -Original Message- From: Aman Tandon [mailto:amantandon...@gmail.com] Sent: Tuesday, May 26, 2015 11:15 PM To: solr-user@lucene.apache.org Subject: Re: docValues: Can we apply synonym Yes it could be :) Anyway thanks for helping. With Regards Aman Tandon On Tue, May 26, 2015 at 10:22 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I should investigate that, as usually synonyms are analysis stage. A simple way is to replace the word with all its synonyms ( including original word), but simply using this kind of processor will change the token position and offsets, modifying the actual content of the document . I am from Bombay will become I am from Bombay Mumbai which can be annoying. So a clever approach must be investigated. 2015-05-26 17:36 GMT+01:00 Aman Tandon amantandon...@gmail.com: Okay So how could I do it with UpdateProcessors? With Regards Aman Tandon On Tue, May 26, 2015 at 10:00 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: mmm this is different ! Without any customisation, right now you could : - use docValues to provide exact value facets. - Than you can use a copy field, with the proper analysis, to search when a user click on a filter ! So you will see in your facets : Mumbai(3) Bombay(2) And when clicking you see 5 results. 
A little bit misleading for the users … On the other hand if you you want to apply the synonyms before, the indexing pipeline ( because docValues field can not be analysed), I think you should play with UpdateProcessors. Cheers 2015-05-26 17:18 GMT+01:00 Aman Tandon amantandon...@gmail.com : We are interested in using docValues for better memory utilization and speed. Currently we are faceting the search results on *city. *In city we have also added the synonym for cities like mumbai, bombay (These are Indian cities). So that result of mumbai is also eligible when somebody will applying filter of bombay on search results. I need this functionality to apply with docValues enabled field. With Regards Aman Tandon On Tue, May 26, 2015 at 9:19 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: I checked in the Documentation to be sure, but apparently : DocValues are only available
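For reference, a minimal sketch of the facet-on-docValues / search-on-copyField layout described above (field, type and file names are invented):

  <field name="city" type="string" indexed="true" stored="true" docValues="true"/>
  <field name="city_search" type="city_synonyms" indexed="true" stored="false"/>
  <copyField source="city" dest="city_search"/>

  <fieldType name="city_synonyms" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.SynonymFilterFactory" synonyms="city_synonyms.txt" ignoreCase="true" expand="true"/>
    </analyzer>
  </fieldType>

Faceting then runs on city (facet.field=city) and the click-through filter on the analysed copy (fq=city_search:bombay); as noted above, the facet labels themselves will still show Mumbai and Bombay separately.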
Re: Running Solr 5.1.0 as a Service on Windows
Hi Miller, Yes, I managed to get the zookeeper to start as a service on Windows in Windows 8 running Java 8. However, it didn't work on Windows Server 2008 R2 (SP1) when I upgrade the Java to Java 8. It is able to work when the server is running on Java 7. Regards, Edwin On 26 May 2015 at 23:54, Will Miller wmil...@fbbrands.com wrote: I am using NSSM to start zookeeper as a service on windows (and for Solr too). in NSSM I configured it to just point to to E:\zookeeper-3.4.6\bin\zkServer.cmd. As long as you can run that from the command line to validate that you have modified all of the zookeeper config files correctly, NSSM should have no problem starting up zookeeper. Will Miller Development Manager, eCommerce Services | Online Technology 462 Seventh Avenue, New York, NY, 10018 Office: 212.502.9323 | Cell: 317.653.0614 wmil...@fbbrands.com | www.fbbrands.com From: Upayavira u...@odoko.co.uk Sent: Monday, May 25, 2015 4:10 PM To: solr-user@lucene.apache.org Subject: Re: Running Solr 5.1.0 as a Service on Windows Zookeeper is just Java, so there's no reason why it can't be started in Windows. However, the startup scripts for Zookeeper on Windows are pathetic, so you are much more on your own than you are on Linux. There may be folks here who can answer your question (e.g. with Windows specific startup scripts), or you might consider asking on the Zookeeper mailing lists directly: https://zookeeper.apache.org/lists.html Upayavira On Mon, May 25, 2015, at 10:34 AM, Zheng Lin Edwin Yeo wrote: I've managed to get the Solr started as a Windows service after re-configuring the startup script, as I've previously missed out some of the custom configurations there. However, I still couldn't get the zookeeper to start the same way too. Are we able to use NSSM to start up zookeeper as a Microsoft Windows service too? Regards, Edwin On 25 May 2015 at 12:16, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi, Has anyone tried to run Solr 5.1.0 as a Microsoft Windows service? i've tried to follow the steps from this website http://www.norconex.com/how-to-run-solr5-as-a-service-on-windows/, which uses NSSM. However, when I tried to start the service from the Component Services in the Windows Control Panel Administrative tools, I get the following message: Windows could not start the Solr5 service on Local Computer. The service did not return an error. This could be an internal Windows error or an internal service error. Is this the correct way to set it up, or is there other methods? Regards, Edwin
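For reference, the NSSM setup Will describes can also be scripted from an elevated prompt, roughly like this (the service name is arbitrary, the path is the one from his mail):

  nssm install zookeeper E:\zookeeper-3.4.6\bin\zkServer.cmd
  nssm set zookeeper AppDirectory E:\zookeeper-3.4.6\bin

If zkServer.cmd runs cleanly from that directory on the command line, NSSM should be able to keep it running as a service.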
Re: Running Solr 5.1.0 as a Service on Windows
Hi Timothy, I don't really have much of a good recommendation. Basically I've written a batch file which will call the solr.cmd with all the setting like heap size and enable clustering, and I point the path in the NSSM to this batch file. If I just point it directly to solr.cmd, I not sure if these can be configured at NSSM side? Regards, Edwin On 26 May 2015 at 23:56, Timothy Potter thelabd...@gmail.com wrote: Hi Edwin, Are there changes you recommend to bin/solr.cmd to make it easier to work with NSSM? If so, please file a JIRA as I'd like to help make that process easier. Thanks. Tim On Mon, May 25, 2015 at 3:34 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: I've managed to get the Solr started as a Windows service after re-configuring the startup script, as I've previously missed out some of the custom configurations there. However, I still couldn't get the zookeeper to start the same way too. Are we able to use NSSM to start up zookeeper as a Microsoft Windows service too? Regards, Edwin On 25 May 2015 at 12:16, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi, Has anyone tried to run Solr 5.1.0 as a Microsoft Windows service? i've tried to follow the steps from this website http://www.norconex.com/how-to-run-solr5-as-a-service-on-windows/, which uses NSSM. However, when I tried to start the service from the Component Services in the Windows Control Panel Administrative tools, I get the following message: Windows could not start the Solr5 service on Local Computer. The service did not return an error. This could be an internal Windows error or an internal service error. Is this the correct way to set it up, or is there other methods? Regards, Edwin
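If it helps, the wrapper batch file Edwin describes can be as small as this (paths, port and heap size are made up; -f keeps Solr in the foreground so NSSM can supervise the process, and -p/-m are the standard bin\solr.cmd options for port and heap):

  @echo off
  cd /d C:\solr-5.1.0
  bin\solr.cmd start -f -p 8983 -m 2g

Alternatively NSSM can pass those arguments itself, e.g. nssm set Solr5 AppParameters "start -f -p 8983 -m 2g" with the application pointed at bin\solr.cmd, which avoids the wrapper script. Clustering could presumably be enabled the same way by appending -Dsolr.clustering.enabled=true, if your solr.cmd passes -D options through.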
Re: Index optimize runs in background.
The indexer takes almost 2 hours to optimize. It has a multi-threaded add of batches of documents to org.apache.solr.client.solrj.impl.CloudSolrClient. Once all the documents are indexed it invokes commit and optimize. I have seen that the optimize goes into background after 10 minutes and indexer exits. I am not sure why this 10 minutes it hangs on indexer. This behavior I have seen in multiple iteration of the indexing of same data. There is nothing significant I found in log which I can share. I can see following in log. org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} On Wed, May 27, 2015 at 10:59 PM, Erick Erickson erickerick...@gmail.com wrote: All strange of course. What do your Solr logs show when this happens? And how reproducible is this? Best, Erick On Wed, May 27, 2015 at 4:00 AM, Upayavira u...@odoko.co.uk wrote: In this case, optimising makes sense, once the index is generated, you are not updating It. Upayavira On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote: Our index has almost 100M documents running on SolrCloud of 5 shards and each shard has an index size of about 170+GB (for the record, we are not using stored fields - our documents are pretty large). We perform a full indexing every weekend and during the week there are no updates made to the index. Most of the queries that we run are pretty complex with hundreds of terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards, boosts etc. and take many minutes to execute. A difference of 10-20% is also a big advantage for us. We have been optimizing the index after indexing for years and it has worked well for us. Every once in a while, we upgrade Solr to the latest version and try without optimizing so that we can save the many hours it take to optimize such a huge index, but find optimized index work well for us. Erick I was indexing today the documents and saw the optimize happening in background. On Tue, May 26, 2015 at 9:12 PM, Erick Erickson erickerick...@gmail.com wrote: No results yet. I finished the test harness last night (not really a unit test, a stand-alone program that endlessly adds stuff and tests that every commit returns the correct number of docs). 8,000 cycles later there aren't any problems reported. Siiigh. On Tue, May 26, 2015 at 1:51 AM, Modassar Ather modather1...@gmail.com wrote: Hi, Erick you mentioned about a unit test to test the optimize running in background. Kindly share your findings if any. Thanks, Modassar On Mon, May 25, 2015 at 11:47 AM, Modassar Ather modather1...@gmail.com wrote: Thanks everybody for your replies. I have noticed the optimization running in background every time I indexed. This is 5 node cluster with solr-5.1.0 and uses the CloudSolrClient. Kindly share your findings on this issue. Our index has almost 100M documents running on SolrCloud. We have been optimizing the index after indexing for years and it has worked well for us. Thanks, Modassar On Fri, May 22, 2015 at 11:55 PM, Erick Erickson erickerick...@gmail.com wrote: Actually, I've recently seen very similar behavior in Solr 4.10.3, but involving hard commits openSearcher=true, see: https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't reproduce this at will, sii. A unit test should be very simple to write though, maybe I can get to it today. 
Erick On Fri, May 22, 2015 at 8:27 AM, Upayavira u...@odoko.co.uk wrote: On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote: On 5/21/2015 6:21 AM, Modassar Ather wrote: I am using Solr-5.1.0. I have an indexer class which invokes cloudSolrClient.optimize(true, true, 1). My indexer exits after the invocation of optimize and the optimization keeps on running in the background. Kindly let me know if it is per design and how can I make my indexer to wait until the optimization is over. Is there a configuration/parameter I need to set for the same. Please note that the same indexer with cloudSolrServer.optimize(true, true, 1) on Solr-4.10 used to wait till the optimize was over before exiting. This is very odd, because I could not get HttpSolrServer to optimize in the background, even when that was what I wanted. I wondered if maybe the Cloud object behaves differently with regard to blocking until an optimize is finished ... except that there is no code for optimizing in CloudSolrClient at all ... so I don't know where the different behavior would actually be happening. A more important
Re: Index optimize runs in background.
Are you timing out on the client request? The theory here is that it's still a synchronous call, but you're just timing out at the client level. At that point, the optimize is still running it's just the connection has been dropped Shot in the dark. Erick On Thu, May 28, 2015 at 10:31 PM, Modassar Ather modather1...@gmail.com wrote: I could not notice it but with my past experience of commit which used to take around 2 minutes is now taking around 8 seconds. I think this is also running as background. On Fri, May 29, 2015 at 10:52 AM, Modassar Ather modather1...@gmail.com wrote: The indexer takes almost 2 hours to optimize. It has a multi-threaded add of batches of documents to org.apache.solr.client.solrj.impl.CloudSolrClient. Once all the documents are indexed it invokes commit and optimize. I have seen that the optimize goes into background after 10 minutes and indexer exits. I am not sure why this 10 minutes it hangs on indexer. This behavior I have seen in multiple iteration of the indexing of same data. There is nothing significant I found in log which I can share. I can see following in log. org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} On Wed, May 27, 2015 at 10:59 PM, Erick Erickson erickerick...@gmail.com wrote: All strange of course. What do your Solr logs show when this happens? And how reproducible is this? Best, Erick On Wed, May 27, 2015 at 4:00 AM, Upayavira u...@odoko.co.uk wrote: In this case, optimising makes sense, once the index is generated, you are not updating It. Upayavira On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote: Our index has almost 100M documents running on SolrCloud of 5 shards and each shard has an index size of about 170+GB (for the record, we are not using stored fields - our documents are pretty large). We perform a full indexing every weekend and during the week there are no updates made to the index. Most of the queries that we run are pretty complex with hundreds of terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards, boosts etc. and take many minutes to execute. A difference of 10-20% is also a big advantage for us. We have been optimizing the index after indexing for years and it has worked well for us. Every once in a while, we upgrade Solr to the latest version and try without optimizing so that we can save the many hours it take to optimize such a huge index, but find optimized index work well for us. Erick I was indexing today the documents and saw the optimize happening in background. On Tue, May 26, 2015 at 9:12 PM, Erick Erickson erickerick...@gmail.com wrote: No results yet. I finished the test harness last night (not really a unit test, a stand-alone program that endlessly adds stuff and tests that every commit returns the correct number of docs). 8,000 cycles later there aren't any problems reported. Siiigh. On Tue, May 26, 2015 at 1:51 AM, Modassar Ather modather1...@gmail.com wrote: Hi, Erick you mentioned about a unit test to test the optimize running in background. Kindly share your findings if any. Thanks, Modassar On Mon, May 25, 2015 at 11:47 AM, Modassar Ather modather1...@gmail.com wrote: Thanks everybody for your replies. I have noticed the optimization running in background every time I indexed. This is 5 node cluster with solr-5.1.0 and uses the CloudSolrClient. Kindly share your findings on this issue. Our index has almost 100M documents running on SolrCloud. 
We have been optimizing the index after indexing for years and it has worked well for us. Thanks, Modassar On Fri, May 22, 2015 at 11:55 PM, Erick Erickson erickerick...@gmail.com wrote: Actually, I've recently seen very similar behavior in Solr 4.10.3, but involving hard commits openSearcher=true, see: https://issues.apache.org/jira/browse/SOLR-7572. Of course I can't reproduce this at will, sii. A unit test should be very simple to write though, maybe I can get to it today. Erick On Fri, May 22, 2015 at 8:27 AM, Upayavira u...@odoko.co.uk wrote: On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote: On 5/21/2015 6:21 AM, Modassar Ather wrote: I am using Solr-5.1.0. I have an indexer class which invokes cloudSolrClient.optimize(true, true, 1). My indexer exits after the invocation of optimize and the optimization keeps on running in the background. Kindly let me know if it is per design and how can I make my indexer to wait until the optimization is over. Is there a configuration/parameter I need to set for the same. Please note
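To make the synchronous-call point concrete, a stripped-down version of the indexer tail looks something like this (SolrJ 5.x; the collection name and ZooKeeper hosts are placeholders). If the process still returns after roughly 10 minutes, the first thing to check is the socket/read timeout of the HTTP client or any load balancer sitting between the indexer and the nodes:

  CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181,zk3:2181/solr");
  client.setDefaultCollection("mycollection");
  // ... multi-threaded adds happen before this point ...
  client.commit();                 // hard commit
  client.optimize(true, true, 1);  // waitFlush, waitSearcher, maxSegments=1; blocks until the merge finishes
  client.close();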
Re: Index optimize runs in background.
I could not notice it but with my past experience of commit which used to take around 2 minutes is now taking around 8 seconds. I think this is also running as background. On Fri, May 29, 2015 at 10:52 AM, Modassar Ather modather1...@gmail.com wrote: The indexer takes almost 2 hours to optimize. It has a multi-threaded add of batches of documents to org.apache.solr.client.solrj.impl.CloudSolrClient. Once all the documents are indexed it invokes commit and optimize. I have seen that the optimize goes into background after 10 minutes and indexer exits. I am not sure why this 10 minutes it hangs on indexer. This behavior I have seen in multiple iteration of the indexing of same data. There is nothing significant I found in log which I can share. I can see following in log. org.apache.solr.update.DirectUpdateHandler2; start commit{,optimize=true,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false} On Wed, May 27, 2015 at 10:59 PM, Erick Erickson erickerick...@gmail.com wrote: All strange of course. What do your Solr logs show when this happens? And how reproducible is this? Best, Erick On Wed, May 27, 2015 at 4:00 AM, Upayavira u...@odoko.co.uk wrote: In this case, optimising makes sense, once the index is generated, you are not updating It. Upayavira On Wed, May 27, 2015, at 06:14 AM, Modassar Ather wrote: Our index has almost 100M documents running on SolrCloud of 5 shards and each shard has an index size of about 170+GB (for the record, we are not using stored fields - our documents are pretty large). We perform a full indexing every weekend and during the week there are no updates made to the index. Most of the queries that we run are pretty complex with hundreds of terms using PhraseQuery, BooleanQuery, SpanQuery, Wildcards, boosts etc. and take many minutes to execute. A difference of 10-20% is also a big advantage for us. We have been optimizing the index after indexing for years and it has worked well for us. Every once in a while, we upgrade Solr to the latest version and try without optimizing so that we can save the many hours it take to optimize such a huge index, but find optimized index work well for us. Erick I was indexing today the documents and saw the optimize happening in background. On Tue, May 26, 2015 at 9:12 PM, Erick Erickson erickerick...@gmail.com wrote: No results yet. I finished the test harness last night (not really a unit test, a stand-alone program that endlessly adds stuff and tests that every commit returns the correct number of docs). 8,000 cycles later there aren't any problems reported. Siiigh. On Tue, May 26, 2015 at 1:51 AM, Modassar Ather modather1...@gmail.com wrote: Hi, Erick you mentioned about a unit test to test the optimize running in background. Kindly share your findings if any. Thanks, Modassar On Mon, May 25, 2015 at 11:47 AM, Modassar Ather modather1...@gmail.com wrote: Thanks everybody for your replies. I have noticed the optimization running in background every time I indexed. This is 5 node cluster with solr-5.1.0 and uses the CloudSolrClient. Kindly share your findings on this issue. Our index has almost 100M documents running on SolrCloud. We have been optimizing the index after indexing for years and it has worked well for us. Thanks, Modassar On Fri, May 22, 2015 at 11:55 PM, Erick Erickson erickerick...@gmail.com wrote: Actually, I've recently seen very similar behavior in Solr 4.10.3, but involving hard commits openSearcher=true, see: https://issues.apache.org/jira/browse/SOLR-7572. 
Of course I can't reproduce this at will, sii. A unit test should be very simple to write though, maybe I can get to it today. Erick On Fri, May 22, 2015 at 8:27 AM, Upayavira u...@odoko.co.uk wrote: On Fri, May 22, 2015, at 03:55 PM, Shawn Heisey wrote: On 5/21/2015 6:21 AM, Modassar Ather wrote: I am using Solr-5.1.0. I have an indexer class which invokes cloudSolrClient.optimize(true, true, 1). My indexer exits after the invocation of optimize and the optimization keeps on running in the background. Kindly let me know if it is per design and how can I make my indexer to wait until the optimization is over. Is there a configuration/parameter I need to set for the same. Please note that the same indexer with cloudSolrServer.optimize(true, true, 1) on Solr-4.10 used to wait till the optimize was over before exiting. This is very odd, because I could not get HttpSolrServer to optimize in the background, even when that was what I wanted. I wondered if maybe the Cloud object behaves
Re: SolrCloud: Creating more shard at runtime will lower down the load?
Hi Aman, this feature can be interesting for you : Shard Splitting When you create a collection in SolrCloud, you decide on the initial number shards to be used. But it can be difficult to know in advance the number of shards that you need, particularly when organizational requirements can change at a moment's notice, and the cost of finding out later that you chose wrong can be high, involving creating new cores and re-indexing all of your data. The ability to split shards is in the Collections API. It currently allows splitting a shard into two pieces. The existing shard is left as-is, so the split action effectively makes two copies of the data as new shards. You can delete the old shard at a later time when you're ready. More details on how to use shard splitting is in the section on the Collections API https://cwiki.apache.org/confluence/display/solr/Collections+API. To answer to your questions : 1) If your shard is properly splitter, and you use Solr Cloud to distribute the requests and load balancing, the users will not see anything 2) Of course it is but you must be careful, because maybe you want to add replicas if the amount of load is your concern. Usually sharing is because an increasing amount of content to process and search. Adding replication is because an increasing demand of queries and high load for the servers. Let me know more details if you like ! Cheers 2015-05-28 4:44 GMT+01:00 Aman Tandon amantandon...@gmail.com: Hi, I have a question regarding the solr cloud. The load on our search server are increasing day by day as our no of visitors are keep on increasing. So I have a scenario, I want to slice the data at the Runtime, by creating the more shards of the data. *i)* Does it affect the current queries *ii)* Does it lower down the load on our search servers? With Regards Aman Tandon -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
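For completeness, the Collections API calls being referred to look like this (collection and shard names are only examples):

  /admin/collections?action=SPLITSHARD&collection=mycollection&shard=shard1
  /admin/collections?action=ADDREPLICA&collection=mycollection&shard=shard1

SPLITSHARD leaves the parent shard in place (it goes inactive once the split completes and can be deleted later), while ADDREPLICA is the call to reach for when the problem is query load rather than index size, as described above.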
Re: SolrCloud: Creating more shard at runtime will lower down the load?
Thank you Alessandro. With Regards Aman Tandon On Thu, May 28, 2015 at 3:57 PM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: Hi Aman, this feature can be interesting for you : Shard Splitting When you create a collection in SolrCloud, you decide on the initial number shards to be used. But it can be difficult to know in advance the number of shards that you need, particularly when organizational requirements can change at a moment's notice, and the cost of finding out later that you chose wrong can be high, involving creating new cores and re-indexing all of your data. The ability to split shards is in the Collections API. It currently allows splitting a shard into two pieces. The existing shard is left as-is, so the split action effectively makes two copies of the data as new shards. You can delete the old shard at a later time when you're ready. More details on how to use shard splitting is in the section on the Collections API https://cwiki.apache.org/confluence/display/solr/Collections+API. To answer to your questions : 1) If your shard is properly splitter, and you use Solr Cloud to distribute the requests and load balancing, the users will not see anything 2) Of course it is but you must be careful, because maybe you want to add replicas if the amount of load is your concern. Usually sharing is because an increasing amount of content to process and search. Adding replication is because an increasing demand of queries and high load for the servers. Let me know more details if you like ! Cheers 2015-05-28 4:44 GMT+01:00 Aman Tandon amantandon...@gmail.com: Hi, I have a question regarding the solr cloud. The load on our search server are increasing day by day as our no of visitors are keep on increasing. So I have a scenario, I want to slice the data at the Runtime, by creating the more shards of the data. *i)* Does it affect the current queries *ii)* Does it lower down the load on our search servers? With Regards Aman Tandon -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
Re: Ability to load solrcore.properties from zookeeper
I think this is an oversight, rather than intentional (at least, I certainly didn't intend to write it like this!). The problem here will be that CoreDescriptors are currently built entirely from core.properties files, and the CoreLocators that construct them don't have any access to zookeeper. Maybe the way forward is to move properties out of CoreDescriptor and have an entirely separate CoreProperties object that is built and returned by the ConfigSetService, and that is read via the ResourceLoader. This would fit in quite nicely with the changes I put up on SOLR-7570, in that you could have properties specified on the collection config overriding properties from the configset, and then local core-specific properties overriding both. Do you want to open a JIRA bug, Steve? Alan Woodward www.flax.co.uk On 28 May 2015, at 00:58, Chris Hostetter wrote: : I am attempting to override some properties in my solrconfig.xml file by : specifying properties in a solrcore.properties file which is uploaded in : Zookeeper's collections/conf directory, though when I go to create a new : collection those properties are never loaded. One work-around is to specify yeah ... that's weird ... it looks like the solrcore.properties reading logic goes out ot it's way to read from the conf/ dir of the core, rather then using the SolrResourceLoader (which is ZK aware in cloud mode) I don't understand if this is intentional or some kind of weird oversight. The relevent method is CoreDescriptor.loadExtraProperties() By all means please open a bug about this -- and if you're feeling up to it, tackle a patch: IIUC CoreDescriptor.loadExtraProperties is the relevent method ... it would need to build up the path including the core name and get the system level resource loader (CoreContainer.getResourceLoader()) to access it since the core doesn't exist yet so there is no core level ResourceLoader to use. Hopefully some folks who are more recently familiar with the core loading logic (like Alan Erick) will see the Jira nad can chime in as to wether there is some fundemental reason it has to work the way it does not, or if this bug can be fixed. : easy way of updating those properties cluster-wide, I did attempt to : specify a request parameter of 'property.properties=solrcore.properties' in : the collection creation request but that also fails. yeah, looks like regardless of the filename, that method loads it the same way. -Hoss http://www.lucidworks.com/
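For anyone else hitting this, the mechanism under discussion is plain property substitution: entries in a core's solrcore.properties become ${...} variables in solrconfig.xml, for example (names and values invented):

  # solrcore.properties
  my.autocommit.maxtime=60000

  <!-- solrconfig.xml -->
  <autoCommit>
    <maxTime>${my.autocommit.maxtime:15000}</maxTime>
  </autoCommit>

The bug is that in cloud mode the file is only looked for on local disk, not in ZooKeeper. Until that is fixed, one partial workaround should be to pass individual values on the Collections API CREATE call (e.g. property.my.autocommit.maxtime=60000), which writes them into each core's core.properties and makes them available for substitution, though as noted that is awkward to update cluster-wide.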
RE: HW requirements
A classic on the importance of prototyping with your data and on the intractability of sizing in the abstract: https://lucidworks.com/blog/sizing-hardware-in-the-abstract-why-we-dont-have-a-definitive-answer/ This might be of use: https://svn.apache.org/repos/asf/lucene/dev/trunk/dev-tools/size-estimator-lucene-solr.xls but note this thread: https://mail-archives.apache.org/mod_mbox/lucene-java-user/201503.mbox/%3cCALZAj3KMiStgiFZb=RTAEqDg8dpPYcmaj25T26Hi+=c7cal...@mail.gmail.com%3e perhaps: http://docs.alfresco.com/4.1/concepts/solrnodes-memory.html -Original Message- From: Sznajder ForMailingList [mailto:bs4mailingl...@gmail.com] Sent: Wednesday, May 27, 2015 12:34 PM To: solr-user@lucene.apache.org Subject: HW requirements Hi , Could you give me some hints wrt HW requirements for Solr if I need to index about 400 Gigas of text? Thanks Benjamin
Per field mm parameter
How to specify per field mm parameter in edismax query. - Nutch Solr User The ultimate search engine would basically understand everything in the world, and it would always give you the right thing. -- View this message in context: http://lucene.472066.n3.nabble.com/Per-field-mm-parameter-tp4208325.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: When is too many fields in qf is too many?
Hi Charles, That is what I have done. At the moment, I have 22 request handlers, some have 3490 field items in qf (that's the most and the qf line spans over 95,000 characters in solrconfig.xml file) and the least one has 1341 fields. I'm working on seeing if I can use copyField to copy the data of that view's field into a single pseudo-view-field and use that pseudo field for qf of that view's request handler. The I still have outstanding with using copyField in this way is that it could lead to a complete re-indexing of all the data in that view when a field is adding / removing from that view. Thanks Steve On Wed, May 27, 2015 at 6:02 PM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: One request handler per view? I think if you are able to make the actual view in use for the current request a single value (vs. all views that the user could use over time), it would keep the qf list down to a manageable size (e.g. specified within the request handler XML). Not sure if this is feasible for you, but it seems like a reasonable approach given the use case you describe. Just a thought ... -Original Message- From: Steven White [mailto:swhite4...@gmail.com] Sent: Tuesday, May 26, 2015 4:48 PM To: solr-user@lucene.apache.org Subject: Re: When is too many fields in qf is too many? Thanks Doug. I might have to take you on the hangout offer. Let me refine the requirement further and if I still see the need, I will let you know. Steve On Tue, May 26, 2015 at 2:01 PM, Doug Turnbull dturnb...@opensourceconnections.com wrote: How you have tie is fine. Setting tie to 1 might give you reasonable results. You could easily still have scores that are just always an order of magnitude or two higher, but try it out! BTW Anything you put in teh URL can also be put into a request handler. If you ever just want to have a 15 minute conversation via hangout, happy to chat with you :) Might be fun to think through your prob together. -Doug On Tue, May 26, 2015 at 1:42 PM, Steven White swhite4...@gmail.com wrote: Hi Doug, I'm back to this topic. Unfortunately, due to my DB structer, and business need, I will not be able to search against a single field (i.e.: using copyField). Thus, I have to use list of fields via qf. Given this, I see you said above to use tie=1.0 will that, more or less, address this scoring issue? Should tie=1.0 be set on the request handler like so: requestHandler name=/select class=solr.SearchHandler lst name=defaults str name=echoParamsexplicit/str int name=rows20/int str name=defTypeedismax/str str name=qfF1 F2 F3 F4 ... ... .../str float name=tie1.0/float str name=fl_UNIQUE_FIELD_,score/str str name=wtxml/str str name=indenttrue/str /lst /requestHandler Or must tie be passed as part of the URL? Thanks Steve On Wed, May 20, 2015 at 2:58 PM, Doug Turnbull dturnb...@opensourceconnections.com wrote: Yeah a copyField into one could be a good space/time tradeoff. It can be more manageable to use an all field for both relevancy and performance, if you can handle the duplication of data. You could set tie=1.0, which effectively sums all the matches instead of picking the best match. You'll still have cases where one field's score might just happen to be far off of another, and thus dominating the summation. But something easy to try if you want to keep playing with dismax. -Doug On Wed, May 20, 2015 at 2:56 PM, Steven White swhite4...@gmail.com wrote: Hi Doug, Your blog write up on relevancy is very interesting, I didn't know this. 
Looks like I have to go back to my drawing board and figure out an alternative solution: somehow get those group-based-fields data into a single field using copyField. Thanks Steve On Wed, May 20, 2015 at 11:17 AM, Doug Turnbull dturnb...@opensourceconnections.com wrote: Steven, I'd be concerned about your relevance with that many qf fields. Dismax takes a winner takes all point of view to search. Field scores can vary by an order of magnitude (or even two) despite the attempts of query normalization. You can read more here http://opensourceconnections.com/blog/2013/07/02/getting-dissed-by-dis max-why-your-incorrect-assumptions-about-dismax-are-hurting-search-rel evancy/ I'm about to win the blashphemer merit badge, but ad-hoc all-field like searching over many fields is actually a good use case for Elasticsearch's cross field queries. https://www.elastic.co/guide/en/elasticsearch/guide/master/_cross_fiel ds_queries.html
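For what it's worth, the copyField layout being considered would look roughly like this in schema.xml (field and type names invented), one catch-all field per view:

  <field name="view1_all" type="text_general" indexed="true" stored="false" multiValued="true"/>
  <copyField source="F1" dest="view1_all"/>
  <copyField source="F2" dest="view1_all"/>
  <!-- ... one copyField per field belonging to the view ... -->

with qf=view1_all in that view's request handler. The trade-off is the one already mentioned: adding or removing a field from a view changes its copyField set, which means re-indexing the documents of that view.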
Re: When is too many fields in qf is too many?
This does not even pass a basic smell test for reasonability of matching the capabilities of Solr and the needs of your application. I'd like to hear from others, but I personally would be -1 on this approach to misusing qf. I'd simply say that you need to go back to the drawing board, and that your primary focus should be on working with your application product manager to revise your application requirements to more closely match the capabilities of Solr. To put it simply, if you have more than a dozen fields in qf, you're probably doing something wrong. In this case horribly wrong. Focus on designing your app to exploit the capabilities of Solr, not to misuse them. In short, to answer the original question, more than a couple dozen fields in qf is indeed too many. More than a dozen raises a yellow flag for me. -- Jack Krupansky On Thu, May 28, 2015 at 8:13 AM, Steven White swhite4...@gmail.com wrote: Hi Charles, That is what I have done. At the moment, I have 22 request handlers, some have 3490 field items in qf (that's the most and the qf line spans over 95,000 characters in solrconfig.xml file) and the least one has 1341 fields. I'm working on seeing if I can use copyField to copy the data of that view's field into a single pseudo-view-field and use that pseudo field for qf of that view's request handler. The I still have outstanding with using copyField in this way is that it could lead to a complete re-indexing of all the data in that view when a field is adding / removing from that view. Thanks Steve On Wed, May 27, 2015 at 6:02 PM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: One request handler per view? I think if you are able to make the actual view in use for the current request a single value (vs. all views that the user could use over time), it would keep the qf list down to a manageable size (e.g. specified within the request handler XML). Not sure if this is feasible for you, but it seems like a reasonable approach given the use case you describe. Just a thought ... -Original Message- From: Steven White [mailto:swhite4...@gmail.com] Sent: Tuesday, May 26, 2015 4:48 PM To: solr-user@lucene.apache.org Subject: Re: When is too many fields in qf is too many? Thanks Doug. I might have to take you on the hangout offer. Let me refine the requirement further and if I still see the need, I will let you know. Steve On Tue, May 26, 2015 at 2:01 PM, Doug Turnbull dturnb...@opensourceconnections.com wrote: How you have tie is fine. Setting tie to 1 might give you reasonable results. You could easily still have scores that are just always an order of magnitude or two higher, but try it out! BTW Anything you put in teh URL can also be put into a request handler. If you ever just want to have a 15 minute conversation via hangout, happy to chat with you :) Might be fun to think through your prob together. -Doug On Tue, May 26, 2015 at 1:42 PM, Steven White swhite4...@gmail.com wrote: Hi Doug, I'm back to this topic. Unfortunately, due to my DB structer, and business need, I will not be able to search against a single field (i.e.: using copyField). Thus, I have to use list of fields via qf. Given this, I see you said above to use tie=1.0 will that, more or less, address this scoring issue? Should tie=1.0 be set on the request handler like so: requestHandler name=/select class=solr.SearchHandler lst name=defaults str name=echoParamsexplicit/str int name=rows20/int str name=defTypeedismax/str str name=qfF1 F2 F3 F4 ... ... 
.../str float name=tie1.0/float str name=fl_UNIQUE_FIELD_,score/str str name=wtxml/str str name=indenttrue/str /lst /requestHandler Or must tie be passed as part of the URL? Thanks Steve On Wed, May 20, 2015 at 2:58 PM, Doug Turnbull dturnb...@opensourceconnections.com wrote: Yeah a copyField into one could be a good space/time tradeoff. It can be more manageable to use an all field for both relevancy and performance, if you can handle the duplication of data. You could set tie=1.0, which effectively sums all the matches instead of picking the best match. You'll still have cases where one field's score might just happen to be far off of another, and thus dominating the summation. But something easy to try if you want to keep playing with dismax. -Doug On Wed, May 20, 2015 at 2:56 PM, Steven White swhite4...@gmail.com wrote: Hi Doug, Your blog write up on relevancy is very interesting, I didn't know this. Looks like I have to go back to my drawing board and figure out an alternative solution: somehow
solr-user-unsubscribe
Re: Solr advanced StopFilterFactory
sylkaalex sylkaalex at gmail.com writes: The main goal to allow each user use own stop words list. For example user type th now he will see next results in his terms search: the the one the then then then and But user has stop word the and he want get next results: then then and -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-advanced-StopFilterFactory-tp4195797p4195855.html Sent from the Solr - User mailing list archive at Nabble.com. Hi, Is there any way to get the stopwords from a database? I'm checking options to get the list from a database and use that list as the stopwords for the spellcheck component. Thanks in advance.
Re: solr uima and opennlp
Hi Tommaso Thanks for the quick reply! I have another question about using the Dictionary Annotator, but I guess it's better to post it separately. Cheers Andreea -- View this message in context: http://lucene.472066.n3.nabble.com/solr-uima-and-opennlp-tp4206873p4208348.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr advanced StopFilterFactory
As Alex initially specified , the custom stop filter factory is the right way ! So is mainly related to the suggester ? Anyway with a custom stop filter, it can be possible and actually can be a nice contribution as well. Cheers 2015-05-28 13:01 GMT+01:00 Rupali rupali@gmail.com: sylkaalex sylkaalex at gmail.com writes: The main goal to allow each user use own stop words list. For example user type th now he will see next results in his terms search: the the one the then then then and But user has stop word the and he want get next results: then then and -- View this message in context: http://lucene.472066.n3.nabble.com/Solr- advanced-StopFilterFactory-tp4195797p4195855.html Sent from the Solr - User mailing list archive at Nabble.com. Hi, Is there any way to get the stopwords from database? Checking options to get the list from database and use that list as stopwords into spellcheck component. Thanks in advance. -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
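To make that concrete, a database-backed stop filter factory can be quite small. The sketch below is only an illustration (package, table and parameter names are invented, and it loads the list once when the core loads); it would be registered in the field type as <filter class="com.example.DbStopFilterFactory" jdbcUrl="jdbc:..."/>:

  package com.example;

  import java.sql.Connection;
  import java.sql.DriverManager;
  import java.sql.ResultSet;
  import java.sql.SQLException;
  import java.sql.Statement;
  import java.util.ArrayList;
  import java.util.List;
  import java.util.Map;

  import org.apache.lucene.analysis.TokenStream;
  import org.apache.lucene.analysis.core.StopFilter;
  import org.apache.lucene.analysis.util.CharArraySet;
  import org.apache.lucene.analysis.util.TokenFilterFactory;

  /** Sketch: stop words come from a database table instead of stopwords.txt. */
  public class DbStopFilterFactory extends TokenFilterFactory {

    private final CharArraySet stopWords;

    public DbStopFilterFactory(Map<String, String> args) {
      super(args);
      String jdbcUrl = require(args, "jdbcUrl"); // e.g. jdbc:mysql://host/db?user=...
      if (!args.isEmpty()) {
        throw new IllegalArgumentException("Unknown parameters: " + args);
      }
      stopWords = loadStopWords(jdbcUrl);
    }

    // Loaded once at core load; reload the core (or extend this) to pick up changes.
    private static CharArraySet loadStopWords(String jdbcUrl) {
      List<String> words = new ArrayList<>();
      try (Connection con = DriverManager.getConnection(jdbcUrl);
           Statement st = con.createStatement();
           ResultSet rs = st.executeQuery("SELECT word FROM stopwords")) { // table name invented
        while (rs.next()) {
          words.add(rs.getString(1));
        }
      } catch (SQLException e) {
        throw new RuntimeException("Could not load stop words from database", e);
      }
      return new CharArraySet(words, true); // ignoreCase = true
    }

    @Override
    public TokenStream create(TokenStream input) {
      return new StopFilter(input, stopWords);
    }
  }

Per-user stop word lists (the original question in this thread) would still need something smarter, since the analysis chain is fixed per field type rather than per request.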
Guidance needed to modify ExtendedDismaxQParserPlugin
Hi, *Problem Statement: *query - i need leather jute bags If we are searching on the *title *field using the pf2 ( *server:8003/solr/core0/select?q=i%20need%20leather%20jute%20bagspf2=titlexdebug=querydefType=edismaxwt=xmlrows=0*). Currently it will create the shingled phrases like i need, need leather, leather jute, jute bags. *str name=parsedquery_toString+(((title:i)~0.01 (title:need)~0.01 (title:leather)~0.01 (title:jute)~0.01 (title:bag)~0.01)~3) ((titlex:i need)~0.01 (titlex:need leather)~0.01 (titlex:leather jute)~0.01 (titlex:jute bag)~0.01)/str* *Requirement: * I want to customize the ExtendedDismaxQParserPlugin to generate custom phrase queries on pf2. I want to create the phrase tokens like jute bags, leather jute bags So the irrelevant tokens like *i need*, *need leather* didn't match any search results. Because in most of the scenarios in our business, we observed (from Google Analytics) that last two words are more important in the query. So I need to generate only these two tokens by calling my xyz function instead of calling the function *addShingledPhraseQueries. *Please guide me here. Should I modify the same java class or create another class. And In case of another class how and where should I need to define our customized *defType* . With Regards Aman Tandon
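On the "same class or another class" question: the usual route is to leave ExtendedDismaxQParserPlugin untouched, subclass it, register the subclass in solrconfig.xml with <queryParser name="myedismax" class="com.example.MyEdismaxQParserPlugin"/>, and then select it with defType=myedismax (in the request or in the handler defaults). A skeleton of the two classes, with all names invented and the actual override point left as a comment because the exact hook depends on the Solr version, might look like:

  package com.example;

  import org.apache.solr.common.params.SolrParams;
  import org.apache.solr.request.SolrQueryRequest;
  import org.apache.solr.search.ExtendedDismaxQParser;
  import org.apache.solr.search.ExtendedDismaxQParserPlugin;
  import org.apache.solr.search.QParser;

  public class MyEdismaxQParserPlugin extends ExtendedDismaxQParserPlugin {

    @Override
    public QParser createParser(String qstr, SolrParams localParams,
                                SolrParams params, SolrQueryRequest req) {
      return new MyEdismaxQParser(qstr, localParams, params, req);
    }

    static class MyEdismaxQParser extends ExtendedDismaxQParser {

      MyEdismaxQParser(String qstr, SolrParams localParams,
                       SolrParams params, SolrQueryRequest req) {
        super(qstr, localParams, params, req);
      }

      // Override the pf2/pf3 phrase-building hook here (addShingledPhraseQueries
      // as mentioned above; check that it is protected/overridable in your Solr
      // version) and emit only the trailing bigram, e.g. "jute bags", instead of
      // every adjacent word pair.
    }
  }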
solr and uima dictionary annotator
Hi everyone I am using the UIMA DictionaryAnnotator to tag Solr documents. It seems to be working (I do get tags), but I get some strange behavior: 1. I am using the White Space Tokenizer both for the indexed text and for creating the dictionary. Most entries in my dictionary consist of multiple words. From the documentation, it seems that with the default settings, a document must contain all words in order to match the dictionary entry. However, this is not the case in practice. I'm seeing documents being randomly tagged with single words, although my dictionary does not contain an entry for those single words (they only appear as part of multi word entries). This would be fine (even preferable), if it were consistent. But it is not. The tagging happens only for a subset of single words, not for all. What am I doing wrong? 2. If a dictionary word appears multiple times in the analyzed field, it is also added just as many times to the mapped field (i.e. my tags). Is there a way to control/disable this? Thanks! Regards Andreea -- View this message in context: http://lucene.472066.n3.nabble.com/solr-and-uima-dictionary-annotator-tp4208359.html Sent from the Solr - User mailing list archive at Nabble.com.
Relevancy Score and Proximity Search
Hi, Does anyone know how Solr does its scoring with a query that has proximity search enabled? For example, when I issue the query q=Matex~1, the result with the top score that came back was actually 'Latex', with a score of 2.27. This is despite the fact that there are several documents in my index which contain the exact word 'Matex'. All those results came back below it, and only with a score of 0.19. Shouldn't documents with the exact match have a higher score than those which are found by proximity search? I'm using Solr 5.1 and have not changed any settings. All my settings are the default settings. Regards, Edwin
Re: Relevancy Score and Proximity Search
Use the explain parameter and it should show you the scores and the calculations. Sent from my Fire On May 28, 2015, at 10:06 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi, Does anyone knows how Solr does its scoring with a query that has proximity search enabled. For example, when I issue a query q=Matex~1, the result with the top score that came back was actually 'Latex', and with a score of 2.27. This is with the fact that there are several documents in my index which contain the exact word 'Matex'. All these results came back below, and only with a score of 0.19. Shouldn't documents with the exact match supposed to have a higher score then those which are found by proximity search? I'm using Solr 5.1 and have not change any settings. All my settings are the default settings. Regards, Edwin
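(Concretely: add debug=true or debugQuery=true to the request, e.g. q=Matex~1&debugQuery=true, and look at the explain section of the response for the per-document score breakdown.)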
Re: Relevancy Score and Proximity Search
this site has been a great help to me in seeing how things shake out as far as the scores are concerned: http://splainer.io/ -- *John Blythe* Product Manager Lead Developer 251.605.3071 | j...@curvolabs.com www.curvolabs.com 58 Adams Ave Evansville, IN 47713 On Thu, May 28, 2015 at 10:06 AM, Zheng Lin Edwin Yeo edwinye...@gmail.com wrote: Hi, Does anyone knows how Solr does its scoring with a query that has proximity search enabled. For example, when I issue a query q=Matex~1, the result with the top score that came back was actually 'Latex', and with a score of 2.27. This is with the fact that there are several documents in my index which contain the exact word 'Matex'. All these results came back below, and only with a score of 0.19. Shouldn't documents with the exact match supposed to have a higher score then those which are found by proximity search? I'm using Solr 5.1 and have not change any settings. All my settings are the default settings. Regards, Edwin
Re: Native library of plugin is loaded for every core
Works as expected :) Thanks guys! -- View this message in context: http://lucene.472066.n3.nabble.com/Native-library-of-plugin-is-loaded-for-every-core-tp4207996p4208372.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Dynamic range on numbers
fq, not fl. fq is filter query fl is the field list, the stored fields to be returned to the user. Best, Erick On Thu, May 28, 2015 at 9:03 AM, John Blythe j...@curvolabs.com wrote: I've set the field to be processed as such: fieldType name=sizes class=solr.TrieDoubleField precisionStep=6 / and then have this in the fl box in Solr admin UI: *, score, {!frange l=sum(size1, product(size1, .10))}size1 I'm trying to use the size1 field as the item upon which a frange is being used, but also need to use the size1 value for the mathematical functions themselves I get this error: error: { msg: Error parsing fieldname, code: 400 } thanks for any assistance or insight -- *John Blythe* Product Manager Lead Developer 251.605.3071 | j...@curvolabs.com www.curvolabs.com 58 Adams Ave Evansville, IN 47713 On Wed, May 27, 2015 at 2:10 PM, John Blythe j...@curvolabs.com wrote: thanks erick. will give it a whirl later today and report back tonight or tomorrow. i imagine i'll have some more questions crop up :) best, -- *John Blythe* Product Manager Lead Developer 251.605.3071 | j...@curvolabs.com www.curvolabs.com 58 Adams Ave Evansville, IN 47713 On Wed, May 27, 2015 at 1:32 PM, Erick Erickson erickerick...@gmail.com wrote: 1 tfloat 2 fq=dimField:[4.5 TO 5.5] or even use frange to set the lower and upper bounds via function Best, Erick On Wed, May 27, 2015 at 5:29 AM, John Blythe j...@curvolabs.com wrote: hi all, i'm attempting to suggest products across a range to users based on dimensions. if there is a 5x10mm Drill Set for instance and a competitor sales something similar enough then i'd like to have it shown. the range, however, would need to be dynamic. i'm thinking for our initial testing phase we'll go with 10% in either direction of a number. thus, a document that hits drill set but has the size1 field set to 4.5 or 5.5 would match for the 5 in the query. 1) what's the best field type to use for numeric ranges? i'll need to account for decimal places, up to two places though usually only one. 2) is there a way to dynamically set the range? thanks!
Re: Per field mm parameter
: Subject: Per field mm parameter : : How to specify per field mm parameter in edismax query. you can't. the mm param applies to the number of minimum match clauses in the final query, where each of those clauses is a disjunction over each of the qf fields. this blog might help explain the query structure... https://lucidworks.com/blog/whats-a-dismax/ -Hoss http://www.lucidworks.com/
Re: Ability to load solrcore.properties from zookeeper
Never even considered loading core.properties from ZK, so not even an oversight on my part ;) On Thu, May 28, 2015 at 3:48 AM, Alan Woodward a...@flax.co.uk wrote: I think this is an oversight, rather than intentional (at least, I certainly didn't intend to write it like this!). The problem here will be that CoreDescriptors are currently built entirely from core.properties files, and the CoreLocators that construct them don't have any access to zookeeper. Maybe the way forward is to move properties out of CoreDescriptor and have an entirely separate CoreProperties object that is built and returned by the ConfigSetService, and that is read via the ResourceLoader. This would fit in quite nicely with the changes I put up on SOLR-7570, in that you could have properties specified on the collection config overriding properties from the configset, and then local core-specific properties overriding both. Do you want to open a JIRA bug, Steve? Alan Woodward www.flax.co.uk On 28 May 2015, at 00:58, Chris Hostetter wrote: : I am attempting to override some properties in my solrconfig.xml file by : specifying properties in a solrcore.properties file which is uploaded in : Zookeeper's collections/conf directory, though when I go to create a new : collection those properties are never loaded. One work-around is to specify yeah ... that's weird ... it looks like the solrcore.properties reading logic goes out ot it's way to read from the conf/ dir of the core, rather then using the SolrResourceLoader (which is ZK aware in cloud mode) I don't understand if this is intentional or some kind of weird oversight. The relevent method is CoreDescriptor.loadExtraProperties() By all means please open a bug about this -- and if you're feeling up to it, tackle a patch: IIUC CoreDescriptor.loadExtraProperties is the relevent method ... it would need to build up the path including the core name and get the system level resource loader (CoreContainer.getResourceLoader()) to access it since the core doesn't exist yet so there is no core level ResourceLoader to use. Hopefully some folks who are more recently familiar with the core loading logic (like Alan Erick) will see the Jira nad can chime in as to wether there is some fundemental reason it has to work the way it does not, or if this bug can be fixed. : easy way of updating those properties cluster-wide, I did attempt to : specify a request parameter of 'property.properties=solrcore.properties' in : the collection creation request but that also fails. yeah, looks like regardless of the filename, that method loads it the same way. -Hoss http://www.lucidworks.com/
Re: Dynamic range on numbers
I've set the field to be processed as such: fieldType name=sizes class=solr.TrieDoubleField precisionStep=6 / and then have this in the fl box in Solr admin UI: *, score, {!frange l=sum(size1, product(size1, .10))}size1 I'm trying to use the size1 field as the item upon which a frange is being used, but also need to use the size1 value for the mathematical functions themselves I get this error: error: { msg: Error parsing fieldname, code: 400 } thanks for any assistance or insight -- *John Blythe* Product Manager Lead Developer 251.605.3071 | j...@curvolabs.com www.curvolabs.com 58 Adams Ave Evansville, IN 47713 On Wed, May 27, 2015 at 2:10 PM, John Blythe j...@curvolabs.com wrote: thanks erick. will give it a whirl later today and report back tonight or tomorrow. i imagine i'll have some more questions crop up :) best, -- *John Blythe* Product Manager Lead Developer 251.605.3071 | j...@curvolabs.com www.curvolabs.com 58 Adams Ave Evansville, IN 47713 On Wed, May 27, 2015 at 1:32 PM, Erick Erickson erickerick...@gmail.com wrote: 1 tfloat 2 fq=dimField:[4.5 TO 5.5] or even use frange to set the lower and upper bounds via function Best, Erick On Wed, May 27, 2015 at 5:29 AM, John Blythe j...@curvolabs.com wrote: hi all, i'm attempting to suggest products across a range to users based on dimensions. if there is a 5x10mm Drill Set for instance and a competitor sales something similar enough then i'd like to have it shown. the range, however, would need to be dynamic. i'm thinking for our initial testing phase we'll go with 10% in either direction of a number. thus, a document that hits drill set but has the size1 field set to 4.5 or 5.5 would match for the 5 in the query. 1) what's the best field type to use for numeric ranges? i'll need to account for decimal places, up to two places though usually only one. 2) is there a way to dynamically set the range? thanks!
Re: When is too many fields in qf is too many?
Gotta agree with Jack here. This is an insane number of fields, query performance on any significant corpus will be fraught etc. The very first thing I'd look at is having that many fields. You have 3,500 different fields! Whatever the motivation for having that many fields is the place I'd start. Best, Erick On Thu, May 28, 2015 at 5:50 AM, Jack Krupansky jack.krupan...@gmail.com wrote: This does not even pass a basic smell test for reasonability of matching the capabilities of Solr and the needs of your application. I'd like to hear from others, but I personally would be -1 on this approach to misusing qf. I'd simply say that you need to go back to the drawing board, and that your primary focus should be on working with your application product manager to revise your application requirements to more closely match the capabilities of Solr. To put it simply, if you have more than a dozen fields in qf, you're probably doing something wrong. In this case horribly wrong. Focus on designing your app to exploit the capabilities of Solr, not to misuse them. In short, to answer the original question, more than a couple dozen fields in qf is indeed too many. More than a dozen raises a yellow flag for me. -- Jack Krupansky On Thu, May 28, 2015 at 8:13 AM, Steven White swhite4...@gmail.com wrote: Hi Charles, That is what I have done. At the moment, I have 22 request handlers, some have 3490 field items in qf (that's the most and the qf line spans over 95,000 characters in solrconfig.xml file) and the least one has 1341 fields. I'm working on seeing if I can use copyField to copy the data of that view's field into a single pseudo-view-field and use that pseudo field for qf of that view's request handler. The I still have outstanding with using copyField in this way is that it could lead to a complete re-indexing of all the data in that view when a field is adding / removing from that view. Thanks Steve On Wed, May 27, 2015 at 6:02 PM, Reitzel, Charles charles.reit...@tiaa-cref.org wrote: One request handler per view? I think if you are able to make the actual view in use for the current request a single value (vs. all views that the user could use over time), it would keep the qf list down to a manageable size (e.g. specified within the request handler XML). Not sure if this is feasible for you, but it seems like a reasonable approach given the use case you describe. Just a thought ... -Original Message- From: Steven White [mailto:swhite4...@gmail.com] Sent: Tuesday, May 26, 2015 4:48 PM To: solr-user@lucene.apache.org Subject: Re: When is too many fields in qf is too many? Thanks Doug. I might have to take you on the hangout offer. Let me refine the requirement further and if I still see the need, I will let you know. Steve On Tue, May 26, 2015 at 2:01 PM, Doug Turnbull dturnb...@opensourceconnections.com wrote: How you have tie is fine. Setting tie to 1 might give you reasonable results. You could easily still have scores that are just always an order of magnitude or two higher, but try it out! BTW Anything you put in teh URL can also be put into a request handler. If you ever just want to have a 15 minute conversation via hangout, happy to chat with you :) Might be fun to think through your prob together. -Doug On Tue, May 26, 2015 at 1:42 PM, Steven White swhite4...@gmail.com wrote: Hi Doug, I'm back to this topic. Unfortunately, due to my DB structer, and business need, I will not be able to search against a single field (i.e.: using copyField). Thus, I have to use list of fields via qf. 
Given this, I see you said above to use tie=1.0. Will that, more or less, address this scoring issue? Should tie=1.0 be set on the request handler like so: <requestHandler name="/select" class="solr.SearchHandler"> <lst name="defaults"> <str name="echoParams">explicit</str> <int name="rows">20</int> <str name="defType">edismax</str> <str name="qf">F1 F2 F3 F4 ... ... ...</str> <float name="tie">1.0</float> <str name="fl">_UNIQUE_FIELD_,score</str> <str name="wt">xml</str> <str name="indent">true</str> </lst> </requestHandler> Or must tie be passed as part of the URL? Thanks Steve On Wed, May 20, 2015 at 2:58 PM, Doug Turnbull dturnb...@opensourceconnections.com wrote: Yeah a copyField into one could be a good space/time tradeoff. It can be more manageable to use an "all" field for both relevancy and performance, if you can handle the duplication of data. You could set tie=1.0, which effectively sums all the matches instead of picking the best match. You'll still have cases where one field's score might just happen to be far off of another, and thus dominating the
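For illustration, a minimal SolrJ 5.x sketch of the per-request alternative Doug mentions (anything you put in the URL can also go in the handler defaults, and vice versa). The base URL, collection name, and query string are assumptions; the field names F1..F4 echo the thread. The tie and qf parameters behave the same here as they would in the requestHandler defaults above.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class TieQueryExample {
    public static void main(String[] args) throws Exception {
        // Base URL and collection name are placeholders.
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
        SolrQuery q = new SolrQuery("some user query");
        q.set("defType", "edismax"); // use the edismax parser for this request
        q.set("qf", "F1 F2 F3 F4");  // same fields you would otherwise list in the handler's qf default
        q.set("tie", "1.0");         // sum per-field scores instead of taking only the best field match
        QueryResponse rsp = client.query(q);
        System.out.println("hits: " + rsp.getResults().getNumFound());
        client.close();
    }
}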
Re: solr-user-unsubscribe
Please follow the instructions here: http://lucene.apache.org/solr/resources.html. Be sure to use the exact same e-mail you used to subscribe. Best, Erick On Thu, May 28, 2015 at 6:10 AM, Stefan Meise - SONIC Performance Support stefan.me...@sonic-ps.de wrote:
Re: HW requirements
You need to translate your source data size into number of documents and document size. Document size will depend on the number of fields, the type of data in each field, and the size of the data in each field. You need to think about numeric and date fields, raw string fields, and keyword text fields. Solr and Lucene do not merely index a bulk blob of bytes, but semi-structured data, in the form of documents and fields. In some cases the indexed data can be smaller than the source data, but it can sometimes be larger as well. -- Jack Krupansky On Wed, May 27, 2015 at 12:33 PM, Sznajder ForMailingList bs4mailingl...@gmail.com wrote: Hi, Could you give me some hints wrt HW requirements for Solr if I need to index about 400 gigabytes of text? Thanks Benjamin
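To make that concrete with a rough back-of-envelope example (the per-document sizes here are assumptions, not figures from this thread): if the 400 GB of source text averages around 4 KB per document, that is on the order of 100 million documents; at 1 KB per document it is closer to 400 million. How much index those documents produce then depends on which fields are indexed, stored, or given docValues, so indexing a representative sample and extrapolating is usually more reliable than estimating from the raw source size alone.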
Re: distributed search limitations via SolrCloud
5.x will still build a war file that you can deploy on Tomcat. But support for that is going away eventually, certainly by 6.0. But you do have to make the decision sometime before 6.0 at least. Best, Erick On Wed, May 27, 2015 at 1:24 PM, Vishal Swaroop vishal@gmail.com wrote: Thanks a lot Erick... great inputs... Currently our deployment is on Tomcat 7 and I think SOLR 5.x does not support Tomcat but runs on its own Jetty server, right? I will discuss this with the team. Thanks again. Regards Vishal On Wed, May 27, 2015 at 4:16 PM, Erick Erickson erickerick...@gmail.com wrote: I'd move to Solr 4.10.3 at least, but preferably Solr 5.x. Solr 5.2 is being readied for release as we speak, it'll probably be available in a week or so barring unforeseen problems, and that's the one I'd go with by preference. Do be aware, though, that the 5.x Solr world deprecates using a war file. It's still actually produced, but Solr is moving towards start scripts instead. This is something new to get used to. See: https://wiki.apache.org/solr/WhyNoWar Best, Erick On Wed, May 27, 2015 at 12:51 PM, Vishal Swaroop vishal@gmail.com wrote: Thanks a lot Erick... You are right, we should not delay moving to the sharding/SolrCloud process. As you all are experts... currently we are using SOLR 4.7.. Do you suggest we should move to the latest SOLR release 5.1.0? Or can we manage the above issue using SOLR 4.7? Regards Vishal On Wed, May 27, 2015 at 2:21 PM, Erick Erickson erickerick...@gmail.com wrote: Hard to say. I've seen 20M docs be the place you need to consider sharding/SolrCloud. I've seen 300M docs be the place you need to start sharding. That said, I'm quite sure you'll need to shard before you get to 2B. There's no good reason to delay that process. You'll have to do something about the join issue though; that's the problem you might want to solve first. The new streaming aggregation stuff might help there, you'll have to figure that out. The first thing I'd explore is whether you can denormalize your way out of the need to join. Or whether you can use block joins instead. Best, Erick On Wed, May 27, 2015 at 11:15 AM, Vishal Swaroop vishal@gmail.com wrote: Currently, we have SOLR configured on a single Linux server (24 GB physical memory) with multiple cores. We are using SOLR joins (https://wiki.apache.org/solr/Join) across cores on this single server. But, as data will grow to ~2 billion we need to assess whether we'll need to run SolrCloud, as "In a DistributedSearch environment, you can not Join across cores on multiple nodes". Please suggest at what point or index size we should start considering running SolrCloud? Regards
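For reference, a minimal SolrJ 5.x sketch of the block-join alternative Erick mentions (not his exact suggestion). It assumes parent and child documents are indexed together as a block, and the base URL, collection name, and the doc_type/child_field names are made up for illustration.

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class BlockJoinExample {
    public static void main(String[] args) throws Exception {
        HttpSolrClient client = new HttpSolrClient("http://localhost:8983/solr/mycollection");
        // Return parent documents whose nested children match child_field:value.
        SolrQuery q = new SolrQuery("{!parent which=\"doc_type:parent\"}child_field:value");
        QueryResponse rsp = client.query(q);
        System.out.println("matching parents: " + rsp.getResults().getNumFound());
        client.close();
    }
}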
Re: Solr advanced StopFilterFactory
Seems like you should be able to use the ManagedStopFilterFactory with a custom StorageIO impl that pulls from your db: http://lucene.apache.org/solr/5_1_0/solr-core/index.html?org/apache/solr/rest/ManagedResourceStorage.StorageIO.html On Thu, May 28, 2015 at 7:03 AM, Alessandro Benedetti benedetti.ale...@gmail.com wrote: As Alex initially specified, the custom stop filter factory is the right way! So is it mainly related to the suggester? Anyway, with a custom stop filter it can be possible, and it can actually be a nice contribution as well. Cheers 2015-05-28 13:01 GMT+01:00 Rupali rupali@gmail.com: sylkaalex sylkaalex at gmail.com writes: The main goal is to allow each user to use his own stop words list. For example, a user types "th" and now he will see the following results in his terms search: the the one the then then then and But the user has the stop word "the" and he wants to get the following results: then then and -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-advanced-StopFilterFactory-tp4195797p4195855.html Sent from the Solr - User mailing list archive at Nabble.com. Hi, Is there any way to get the stopwords from a database? Checking options to get the list from the database and use that list as stopwords in the spellcheck component. Thanks in advance. -- -- Benedetti Alessandro Visiting card : http://about.me/alessandro_benedetti Tyger, tyger burning bright In the forests of the night, What immortal hand or eye Could frame thy fearful symmetry? William Blake - Songs of Experience -1794 England
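As a rough sketch of the "custom stop filter factory" idea discussed above, here is what a Lucene/Solr 5.x TokenFilterFactory that loads its stopword list from a database might look like. The class name, the jdbcUrl/query argument names, and the SQL are all assumptions, and per-user lists, reloading, and real error handling are deliberately left out; it would be wired into a field type's analyzer like any other filter factory.

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.core.StopFilter;
import org.apache.lucene.analysis.util.CharArraySet;
import org.apache.lucene.analysis.util.TokenFilterFactory;

public class DbStopFilterFactory extends TokenFilterFactory {

    private final CharArraySet stopWords;

    public DbStopFilterFactory(Map<String, String> args) {
        super(args);
        // Factory arguments come from the <filter .../> element in the schema (argument names assumed).
        String jdbcUrl = args.remove("jdbcUrl");
        String query = args.remove("query"); // e.g. SELECT word FROM stopwords
        if (!args.isEmpty()) {
            throw new IllegalArgumentException("Unknown parameters: " + args);
        }
        stopWords = new CharArraySet(loadWords(jdbcUrl, query), true);
    }

    private static List<String> loadWords(String jdbcUrl, String query) {
        // Load the stopword list once at factory creation time.
        List<String> words = new ArrayList<>();
        try (Connection conn = DriverManager.getConnection(jdbcUrl);
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery(query)) {
            while (rs.next()) {
                words.add(rs.getString(1));
            }
        } catch (Exception e) {
            throw new RuntimeException("Could not load stopwords from database", e);
        }
        return words;
    }

    @Override
    public TokenStream create(TokenStream input) {
        // Wrap the incoming token stream with a standard StopFilter using the DB-loaded set.
        return new StopFilter(input, stopWords);
    }
}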
Re: Ability to load solrcore.properties from zookeeper
: Never even considered loading core.properties from ZK, so not even an : oversight on my part ;) to be very clear -- we're not talking about core.properties. we're talking about solrcore.properties -- the file that's existed for much longer than core.properties (predates both solrcloud and core discovery) as a way to specify custom user properties (substituted in the real config files) on a per core basis. : On Thu, May 28, 2015 at 3:48 AM, Alan Woodward a...@flax.co.uk wrote: : I think this is an oversight, rather than intentional (at least, I certainly didn't intend to write it like this!). The problem here will be that CoreDescriptors are currently built entirely from core.properties files, and the CoreLocators that construct them don't have any access to zookeeper. : : Maybe the way forward is to move properties out of CoreDescriptor and have an entirely separate CoreProperties object that is built and returned by the ConfigSetService, and that is read via the ResourceLoader. This would fit in quite nicely with the changes I put up on SOLR-7570, in that you could have properties specified on the collection config overriding properties from the configset, and then local core-specific properties overriding both. : : Do you want to open a JIRA bug, Steve? : : Alan Woodward : www.flax.co.uk : : : On 28 May 2015, at 00:58, Chris Hostetter wrote: : : : I am attempting to override some properties in my solrconfig.xml file by : : specifying properties in a solrcore.properties file which is uploaded in : : Zookeeper's collections/conf directory, though when I go to create a new : : collection those properties are never loaded. One work-around is to specify : : yeah ... that's weird ... it looks like the solrcore.properties reading : logic goes out of its way to read from the conf/ dir of the core, rather : than using the SolrResourceLoader (which is ZK aware in cloud mode) : : I don't understand if this is intentional or some kind of weird oversight. : : The relevant method is CoreDescriptor.loadExtraProperties(). By all means : please open a bug about this -- and if you're feeling up to it, tackle a : patch: IIUC CoreDescriptor.loadExtraProperties is the relevant method ... : it would need to build up the path including the core name and get the : system level resource loader (CoreContainer.getResourceLoader()) to access : it since the core doesn't exist yet so there is no core level : ResourceLoader to use. : : Hopefully some folks who are more recently familiar with the core loading : logic (like Alan and Erick) will see the Jira and can chime in as to whether : there is some fundamental reason it has to work the way it does now, or if : this bug can be fixed. : : : : easy way of updating those properties cluster-wide, I did attempt to : : specify a request parameter of 'property.properties=solrcore.properties' in : : the collection creation request but that also fails. : : yeah, looks like regardless of the filename, that method loads it the same : way. : : : -Hoss : http://www.lucidworks.com/ : : -Hoss http://www.lucidworks.com/
Re: Ability to load solrcore.properties from zookeeper
: certainly didn't intend to write it like this!). The problem here will : be that CoreDescriptors are currently built entirely from : core.properties files, and the CoreLocators that construct them don't : have any access to zookeeper. But they do have access to the CoreContainer, which is passed to the CoreDescriptor constructor -- it has all the ZK access you'd need at the time when loadExtraProperties() is called. Correct? As fleshed out in my last email... : patch: IIUC CoreDescriptor.loadExtraProperties is the relevant method ... : it would need to build up the path including the core name and get the : system level resource loader (CoreContainer.getResourceLoader()) to access : it since the core doesn't exist yet so there is no core level : ResourceLoader to use. -Hoss http://www.lucidworks.com/
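Not the actual patch, just a sketch of the direction described above: read solrcore.properties through the container-level resource loader rather than straight off the local conf/ directory. The helper name and the "<coreName>/conf/solrcore.properties" path layout are assumptions; whether the container-level loader is ZK-aware in a given setup should be verified against the version in question.

import java.io.IOException;
import java.io.InputStream;
import java.util.Properties;

import org.apache.solr.core.CoreContainer;
import org.apache.solr.core.SolrResourceLoader;

public class ExtraPropertiesSketch {

    // Build the resource path from the core name and let the system-level
    // resource loader resolve it, instead of opening the local conf/ file directly.
    static Properties loadExtraProperties(CoreContainer container, String coreName) {
        SolrResourceLoader loader = container.getResourceLoader();
        Properties props = new Properties();
        try (InputStream in = loader.openResource(coreName + "/conf/solrcore.properties")) {
            props.load(in);
        } catch (IOException e) {
            // solrcore.properties is optional; treat a missing file as "no extra properties".
        }
        return props;
    }
}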