Re: Solr - facet fields that contain other facet fields

2015-12-29 Thread Kevin Lopez
Erick,

I am not sure that what you said ("the only available terms are "not" and
"necessarily"") is totally correct. When I go into the schema browser I can
see that there are two terms, "not" and "not necessarily", with the correct
counts. Unless these are not the terms you are talking about, can you
explain to me what these are exactly?

http://imgur.com/m82CH2f

I see what you are saying; it may be best for me to do the entity
extraction separately and put the terms into a special field, although I
would like the terms to be highlighted (or to have some kind of position
information so that I can highlight them).
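
Something along those lines is what I have in mind; a rough sketch (field
and token names are only placeholders, not your exact suggestion):

    <!-- schema.xml: unanalyzed field that holds recognized entities -->
    <field name="entities" type="string" indexed="true" stored="true" multiValued="true"/>

    <!-- document as indexed, after entity recognition has run on the abstract -->
    <add><doc>
      <field name="content">... is not necessarily linked to colon cancer ...</field>
      <field name="entities">not_necessarily</field>
      <field name="entities">colon_cancer</field>
    </doc></add>

The facet request would then just use facet=true&facet.field=entities.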

Regards,

Kevin

On Mon, Dec 28, 2015 at 12:49 PM, Erick Erickson 
wrote:

> bq:  so I cannot copy this field to a text field with a
> keywordtokenizer or strfield
>
> 1> There is no restriction on whether a field is analyzed or not as far as
> faceting is concerned. You can freely facet on an analyzed field
> or String field or KeywordTokenized field. As Binoy says, though,
> faceting on large analyzed text fields is dangerous.
>
> 2> copyField directives are not chained. As soon as the
> field is received, before _anything_ is done the raw contents are
> pushed to the copyField destinations. So in your case the source
> for both copyField directives should be "content". Otherwise you
> get into interesting behavior if you, say,  copyField from A to B and
> have another copyField from B to A. I _suspect_ this is
> why you have no term info available, but check
>
> 3> This is not going to work as you're trying to implement it. If you
> tokenize, the only available terms are "not" and "necessarily". There
> is no "not necessarily" _token_ to facet on. If you use a String
> or KeywordAnalyzed field, likewise there is no "not necessarily"
> token, there will be a _single_ token that's the entire content of the
> field
> (I'm leaving aside, for instance, WordDelimiterFilterFactory
> modifications...).
>
> One way to approach this would be to recognize and index synthetic
> tokens representing the concepts. You'd pre-analyze the text, do your
> entity recognition and add those entities to a special "entity" field or
> some such. This would be an unanalyzed field that you facet on. Let's
> say your entity was "colon cancer". Whenever you recognized that in
> the text during indexing, you'd index "colon_cancer", or "disease_234"
> in your special field.
>
> Of course your app would then have to present this pleasingly, and
> rather than the app needing access to your dictionary the "colon_cancer"
> form would be easier to unpack.
>
> The fragility here is that changing your text file of entities would
> require
> you to re-index to re-inject them into documents.
>
> You could also, assuming you know all the entities that should match
> a given query form facet _queries_ on the phrases. This could get to be
> quite a large query, but has the advantage of not requiring re-indexing.
> So you'd have something like
> facet.query=field:"not necessarily"&facet.query=field:certainly
> etc.
>
> Best,
> Erick
>
>
> On Mon, Dec 28, 2015 at 9:13 AM, Binoy Dalal 
> wrote:
> > 1) When faceting use field of type string. That'll rid you of your
> > tokenization problems.
> > Alternatively do not use any tokenizers.
> > Also turn doc values on for the field. It'll improve performance.
> > 2) If however you do need to use a tokenized field for faceting, make
> sure
> > that they're pretty short in terms of number of tokens or else your app
> > will die real soon.
> >
> > On Mon, 28 Dec 2015, 22:24 Kevin Lopez  wrote:
> >
> >> I am not sure I am following correctly. The field I upload the document
> to
> >> would be "content" the analyzed field is "ColonCancerField". The
> "content"
> >> field contains the entire text of the document, in my case a pubmed
> >> abstract. This is a tokenized field. I made this field untokenized and I
> >> still received the same results [the results for not instead of not
> >> necessarily (in my current example I have 2 docs with not and 1 doc with
> >> not necessarily {not is of course in the document that contains not
> >> necessarily})]:
> >>
> >> http://imgur.com/a/1bfXT
> >>
> >> I also tried this:
> >>
> >> http://localhost:8983/solr/Cytokine/select?=ColonCancerField
> >> :"not+necessarily"
> >>
> >> I still receive the two documents, which is the same as doing
> >> ColonCancerField:"not"
> >>
> >> Just to clarify the structure looks like this: *content (untokenized,
> >> unanalyzed)* [copied to]==> *ColonCancerField *(tokenized, analyzed)
> then I
> >> browse the ColonCancerField and the facets state that there is 1
> document
> >> for not necessarily, but when selecting it, solr returns 2 results.
> >>
> >> -Kevin
> >>
> >> On Mon, Dec 28, 2015 at 10:22 AM, Jamie Johnson 
> wrote:
> >>
> >> > Can you do the opposite?  Index into an unanalyzed field and copy into
> >> the
> >> > analyzed?
> >> >
> >> > If I remember correctly 

RE: post.jar with security.json

2015-12-29 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
I do have authorization and authentication setup in security.json: the question 
is how to pass the login and password into post.jar and/or into 
solr-5.4.0/bin/post -- it does not seem to like the 
user:pswd@host:8983/solr/corename/update syntax from SOLR-5960: when I try 
that, it complains "SimplePostTool: FATAL: Connection error (is Solr running at 
http://user:pswd@hostname:8983/solr/five4a/update ?): 
java.net.ConnectException: Connection refused", and nothing shows up in 
solr.log (although I do set log4j.logger.org.eclipse.jetty.server.Server=DEBUG 
to check for 401 errors, etc).

FYI, I get a 404 from the link you cited: perhaps I don't have access, or 
perhaps you meant 
https://lucidworks.com/blog/2015/08/17/securing-solr-basic-auth-permission-rules
 (although that doesn't mention post.jar)

-Original Message-
From: esther.quan...@lucidworks.com [mailto:esther.quan...@lucidworks.com] 
Sent: Tuesday, December 29, 2015 12:54 PM
To: solr-user@lucene.apache.org
Subject: Re: post.jar with security.json

Hi Craig,

To pass the username and password, you'll want to enable authorization and 
authentication in security.json as is mentioned in this blog post in step 1 of 
"Enabling Basic Authentication". 

https://lucidworks.com/blog/2015/08/17/securing-solr-basic-auth--rules/

Is this what you're looking for?

Thanks,

Esther Quansah

> Le 29 déc. 2015 à 12:24, Oakley, Craig (NIH/NLM/NCBI) [C] 
>  a écrit :
> 
> Or to put it another way, how does one get security.json to work with 
> SOLR-5960?
> 
> Has anyone any suggestions?
> 
> -Original Message-
> From: Oakley, Craig (NIH/NLM/NCBI) [C] 
> Sent: Thursday, December 24, 2015 2:12 PM
> To: 'solr-user@lucene.apache.org' 
> Subject: post.jar with security.json
> 
> In the old jetty-based implementation of Basic Authentication, one could use 
> post.jar by running something like
> 
> java -Durl="http://user:pswd@host:8983/solr/corename/update" 
> -Dtype=application/xml -jar post.jar example.xml
> 
> By what mechanism does one pass in the user name and password to post.jar 
> (or, I suppose more likely, to solr-5.4.0/bin/post) when using security.json?
> 
> Thanks


RE: post.jar with security.json

2015-12-29 Thread Oakley, Craig (NIH/NLM/NCBI) [C]
Or to put it another way, how does one get security.json to work with SOLR-5960?

Has anyone any suggestions?

-Original Message-
From: Oakley, Craig (NIH/NLM/NCBI) [C] 
Sent: Thursday, December 24, 2015 2:12 PM
To: 'solr-user@lucene.apache.org' 
Subject: post.jar with security.json

In the old jetty-based implementation of Basic Authentication, one could use 
post.jar by running something like

java -Durl="http://user:pswd@host:8983/solr/corename/update" 
-Dtype=application/xml -jar post.jar example.xml

By what mechanism does one pass in the user name and password to post.jar (or, 
I suppose more likely, to solr-5.4.0/bin/post) when using security.json?

Thanks


Re: Changing Solr Schema with Data

2015-12-29 Thread Salman Ansari
Thanks guys for your responses.

@Shalin: Do you have any documentation that explains this? Moreover, is it
only for Solr 5+ or is it still applicable to Solr 3+? I am asking this as
I am working in a team and in some of our projects we are using old Solr
versions and I need to convince the guys that this is possible in the old
Solr as well.

Thanks for your help.

Regards,
Salman


On Tue, Dec 29, 2015 at 9:44 AM, Shalin Shekhar Mangar <
shalinman...@gmail.com> wrote:

> Adding new fields is not a problem. You can continue to use your
> existing index with the new schema.
>
> On Tue, Dec 29, 2015 at 1:58 AM, Salman Ansari 
> wrote:
> > You can say that we are not removing any fields (so the old data should
> not
> > get affected), however, we need to add new fields (which new data will
> > have). Does that answer your question?
> >
> >
> > Regards,
> > Salman
> >
> > On Mon, Dec 28, 2015 at 9:58 PM, Alexandre Rafalovitch <
> arafa...@gmail.com>
> > wrote:
> >
> >> Does the schema change affect the data you want to keep?
> >> 
> >> Newsletter and resources for Solr beginners and intermediates:
> >> http://www.solr-start.com/
> >>
> >>
> >> On 29 December 2015 at 01:48, Salman Ansari 
> >> wrote:
> >> > Hi,
> >> >
> >> > I am facing an issue where I need to change Solr schema but I have
> >> crucial
> >> > data that I don't want to delete. Is there a way where I can change
> the
> >> > schema of the index while keeping the data intact?
> >> >
> >> > Regards,
> >> > Salman
> >>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>


Re: Solr5.X document loss in splitting shards

2015-12-29 Thread Luca Quarello
Hi,
the only way that I found to solve my problem is to do the split using a
Solr instance configured in standalone mode.

curl
http://localhost:8983/solr/admin/cores?action=SPLIT&core=sepa&path=/nas_perf_2/FRAGMENTS/17MINDEXES/1/index&path=/nas_perf/FRAGMENTS/17MINDEXES/2/index

In SolrCloud mode, does the shard splitting action work properly for large
shards?
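
(For reference, in cloud mode the split was issued through the Collections
API, something along these lines; the parameters shown here are only
illustrative:
http://localhost:8983/solr/admin/collections?action=SPLITSHARD&collection=sepa&shard=shard13&async=1000)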

Thanks!



*Luca Quarello*

M:+39 347 018 3855

luca.quare...@xeffe.it



*X**EFFE * s.r.l

C.so Giovanni Lanza 72, 10131 Torino

T: +39 011 660 5039

F: +39 011 198 26822

www.xeffe.it

On Mon, Dec 28, 2015 at 2:58 PM, GW  wrote:

> I don't use Curl but there are a couple of things that come to mind
>
> 1: Maybe use document routing with the shards. Use an "!" in your unique
> ID. I'm using gmail to read this and it sucks for searching content so if
> you have done this please ignore this point. Example: If you were storing
> documents per domain your unique field values would look like
> www.domain1.com!123,  www.domain1.com!124,
>www.domain2.com!35, etc.
>
> This should create a two segment hash for searching shards. I do this in
> blind faith as a best practice as it is mentioned in the docs.
>
> 2: Curl works best with URL encoding. I was using Curl at one time and I
> noticed some strange results w/o url encoding
>
> What are you using to write your client?
>
> Best,
>
> GW
>
>
>
> On 27 December 2015 at 19:35, Shawn Heisey  wrote:
>
> > On 12/26/2015 11:21 AM, Luca Quarello wrote:
> > > I have a SOLR 5.3.1 CLOUD with two nodes and 8 shards per node.
> > >
> > > Each shard is about* 35 million documents (**35025882**) and 16GB
> sized.*
> > >
> > >
> > >- I launch the SPLIT command on a shard (shard 13) in the ASYNC way:
> >
> > 
> >
> > > The new created shards have:
> > > *13430316 documents (5.6 GB) and 13425924 documents (5.59 GB**)*.
> >
> > Where are you looking that shows you the source shard has 35 million
> > documents?  Be extremely specific.
> >
> > The following screenshot shows one place you might be looking for this
> > information -- the core overview page:
> >
> >
> https://www.dropbox.com/s/311n49wkp9kw7xa/admin-ui-core-overview.png?dl=0
> >
> > Is the core overview page where you are looking, or is it somewhere else?
> >
> > I'm asking because "Max Doc" and "Num Docs" on the core overview page
> > mean very different things.  The difference between them is the number
> > of deleted docs, and the split shards are probably missing those deleted
> > docs.
> >
> > This is the only idea that I have.  If it's not that, then I'm as
> > clueless as you are.
> >
> > Thanks,
> > Shawn
> >
> >
>




Re: post.jar with security.json

2015-12-29 Thread Upayavira
You will probably find that the SimplePostTool (aka post.jar) has not
been updated to take into account security.json functionality.

Thus, the way to do this would be to look at the source code (it will
just use SolrJ to connect to Solr) and make enhancements to get it to
work (or if you're not familiar with Java, get someone else to do it).
Unfortunately, that is the nature of open source - there are so many such
features that *could* be extended that they tend to get added only when
someone actually needs them.
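
In the meantime, a workaround that sidesteps post.jar entirely is to send
the file with curl and Basic Auth; a rough sketch (host, core and
credentials are placeholders):

    curl -u user:pswd -H 'Content-Type: application/xml' \
         --data-binary @example.xml \
         'http://host:8983/solr/corename/update?commit=true'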

Upayavira

On Tue, Dec 29, 2015, at 06:14 PM, Oakley, Craig (NIH/NLM/NCBI) [C]
wrote:
> I do have authorization and authentication setup in security.json: the
> question is how to pass the login and password into post.jar and/or into
> solr-5.4.0/bin/post -- it does not seem to like the
> user:pswd@host:8983/solr/corename/update syntax from SOLR-5960: when I
> try that, it complains "SimplePostTool: FATAL: Connection error (is Solr
> running at http://user:pswd@hostname:8983/solr/five4a/update ?):
> java.net.ConnectException: Connection refused", and nothing shows up in
> solr.log (although I do set
> log4j.logger.org.eclipse.jetty.server.Server=DEBUG to check for 401
> errors, etc).
> 
FYI, I get a 404 from the link you cited: perhaps I don't have access, or
> perhaps you meant
> https://lucidworks.com/blog/2015/08/17/securing-solr-basic-auth-permission-rules
> (although that doesn't mention post.jar)
> 
> -Original Message-
> From: esther.quan...@lucidworks.com
> [mailto:esther.quan...@lucidworks.com] 
> Sent: Tuesday, December 29, 2015 12:54 PM
> To: solr-user@lucene.apache.org
> Subject: Re: post.jar with security.json
> 
> Hi Craig,
> 
> To pass the username and password, you'll want to enable authorization
> and authentication in security.json as is mentioned in this blog post in
> step 1 of "Enabling Basic Authentication". 
> 
> https://lucidworks.com/blog/2015/08/17/securing-solr-basic-auth--rules/
> 
> Is this what you're looking for?
> 
> Thanks,
> 
> Esther Quansah
> 
> > Le 29 déc. 2015 à 12:24, Oakley, Craig (NIH/NLM/NCBI) [C] 
> >  a écrit :
> > 
> > Or to put it another way, how does one get security.json to work with 
> > SOLR-5960?
> > 
> > Has anyone any suggestions?
> > 
> > -Original Message-
> > From: Oakley, Craig (NIH/NLM/NCBI) [C] 
> > Sent: Thursday, December 24, 2015 2:12 PM
> > To: 'solr-user@lucene.apache.org' 
> > Subject: post.jar with security.json
> > 
> > In the old jetty-based implementation of Basic Authentication, one could 
> > use post.jar by running something like
> > 
> > java -Durl="http://user:pswd@host:8983/solr/corename/update" 
> > -Dtype=application/xml -jar post.jar example.xml
> > 
> > By what mechanism does one pass in the user name and password to post.jar 
> > (or, I suppose more likely, to solr-5.4.0/bin/post) when using 
> > security.json?
> > 
> > Thanks


Re: multi term analyzer error

2015-12-29 Thread Ahmet Arslan
Hi Eyal,

What is your analyzer definition for multi-term?
In your example, is the star character separated from the term by a space?


Ahmet

On Tuesday, December 29, 2015 3:26 PM, Eyal Naamati 
 wrote:




Hi,
 
I defined a multi-term analyzer to my analysis chain, and it works as I expect. 
However, for some queries (for example '*' or 'term *') I get an exception 
"analyzer returned no terms for multiTerm term". These queries work when I 
don't customize a multi-term analyzer.
My question: is there a way to handle this in the analyzer configuration (in my 
schema.xml)? I realize that I can also change the query I am sending the 
analyzer, but that is difficult for me since there are many places in our 
program that use this.
Thanks!
 
Eyal Naamati
Alma Developer
Tel: +972-2-6499313
Mobile: +972-547915255
eyal.naam...@exlibrisgroup.com

www.exlibrisgroup.com


Re: post.jar with security.json

2015-12-29 Thread esther . quansah
Hi Craig,

To pass the username and password, you'll want to enable authorization and 
authentication in security.json as is mentioned in this blog post in step 1 of 
"Enabling Basic Authentication". 

https://lucidworks.com/blog/2015/08/17/securing-solr-basic-auth--rules/

Is this what you're looking for?

Thanks,

Esther Quansah

> Le 29 déc. 2015 à 12:24, Oakley, Craig (NIH/NLM/NCBI) [C] 
>  a écrit :
> 
> Or to put it another way, how does one get security.json to work with 
> SOLR-5960?
> 
> Has anyone any suggestions?
> 
> -Original Message-
> From: Oakley, Craig (NIH/NLM/NCBI) [C] 
> Sent: Thursday, December 24, 2015 2:12 PM
> To: 'solr-user@lucene.apache.org' 
> Subject: post.jar with security.json
> 
> In the old jetty-based implementation of Basic Authentication, one could use 
> post.jar by running something like
> 
> > java -Durl="http://user:pswd@host:8983/solr/corename/update" 
> -Dtype=application/xml -jar post.jar example.xml
> 
> By what mechanism does one pass in the user name and password to post.jar 
> (or, I suppose more likely, to solr-5.4.0/bin/post) when using security.json?
> 
> Thanks


Re: How to achieve join like behavior on solr-cloud

2015-12-29 Thread Dennis Gove
Alok,

You can use the Streaming API to achieve this goal but joins have not been
added to a 5.X release (at least I don't see it on the changelog). They do
exist on trunk and will be a part of Solr 6.

Documentation is still under development but if you wanted to play around
with it now you could perform an inner join with the expression

innerJoin(
  search(products, fl="id,itemDescription", q="type:book", sort="id asc"),
  search(orderLines, fl="orderId,productId", q="*:*",sort="productId asc"),
  on="id=productId"
)
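
Once that lands, the expression would be sent to the collection's /stream
handler, roughly along these lines (host and collection names are just the
ones from the example above):

    curl --data-urlencode 'expr=innerJoin(
      search(products, fl="id,itemDescription", q="type:book", sort="id asc"),
      search(orderLines, fl="orderId,productId", q="*:*", sort="productId asc"),
      on="id=productId")' \
      http://localhost:8983/solr/products/stream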

On Tue, Dec 29, 2015 at 6:18 AM, Alok Bhandari <
alokomprakashbhand...@gmail.com> wrote:

> Hello ,
>
> I am aware of the fact that Solr (I am using 5.2) does not support join on
> distributed search with documents to be joined residing on different
> shards/collections.
>
> My use case is that I want to fetch the uuids of documents that result from a
> search, and also those docs which are outside this search but have a "related"
> field equal to that of one of the search result docs. This is a typical join
> scenario.
>
> Is there some way using streaming-api to achieve this behavior . Or some
> other approach.
>
> Thanks.
> Alok
>
>
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/How-to-achieve-join-like-behavior-on-solr-cloud-tp4247703.html
> Sent from the Solr - User mailing list archive at Nabble.com.
>


Re: Changing Solr Schema with Data

2015-12-29 Thread Binoy Dalal
What Shalin says is solid and will work with Solr 5.x as well as 3.x.
You could do a little POC if you want to be absolutely certain; it shouldn't
take you very long.
Your only concern will be that your old docs won't be matched by queries
against the newly added fields.
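
If you are on a managed schema (Solr 5.x), the new field can even be added
through the Schema API without editing files; a rough sketch (field name and
type are placeholders). With the classic schema.xml you would instead edit
the file and reload the core/collection:

    curl -X POST -H 'Content-Type: application/json' \
         --data-binary '{"add-field":{"name":"new_field","type":"string","indexed":true,"stored":true}}' \
         'http://localhost:8983/solr/yourcollection/schema'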

On Tue, 29 Dec 2015, 23:38 Salman Ansari  wrote:

> Thanks guys for your responses.
>
> @Shalin: Do you have a documentation that explains this? Moreover, is it
> only for Solr 5+ or is it still applicable to Solr 3+? I am asking this as
> I am working in a team and in some of our projects we are using old Solr
> versions and I need to convince the guys that this is possible in the old
> Solr as well.
>
> Thanks for your help.
>
> Regards,
> Salman
>
>
> On Tue, Dec 29, 2015 at 9:44 AM, Shalin Shekhar Mangar <
> shalinman...@gmail.com> wrote:
>
> > Adding new fields is not a problem. You can continue to use your
> > existing index with the new schema.
> >
> > On Tue, Dec 29, 2015 at 1:58 AM, Salman Ansari 
> > wrote:
> > > You can say that we are not removing any fields (so the old data should
> > not
> > > get affected), however, we need to add new fields (which new data will
> > > have). Does that answer your question?
> > >
> > >
> > > Regards,
> > > Salman
> > >
> > > On Mon, Dec 28, 2015 at 9:58 PM, Alexandre Rafalovitch <
> > arafa...@gmail.com>
> > > wrote:
> > >
> > >> Does the schema change affect the data you want to keep?
> > >> 
> > >> Newsletter and resources for Solr beginners and intermediates:
> > >> http://www.solr-start.com/
> > >>
> > >>
> > >> On 29 December 2015 at 01:48, Salman Ansari 
> > >> wrote:
> > >> > Hi,
> > >> >
> > >> > I am facing an issue where I need to change Solr schema but I have
> > >> crucial
> > >> > data that I don't want to delete. Is there a way where I can change
> > the
> > >> > schema of the index while keeping the data intact?
> > >> >
> > >> > Regards,
> > >> > Salman
> > >>
> >
> >
> >
> > --
> > Regards,
> > Shalin Shekhar Mangar.
> >
>
-- 
Regards,
Binoy Dalal


SOLR replicas performance

2015-12-29 Thread Luca Quarello
Hi,

I have a 260M-document index (90GB) with this structure:

[schema field definitions not preserved by the list archive]

The fragment field contains XML messages.

There is a search function that returns the messages satisfying a search
criterion.


TARGET:

To find the configuration that best optimizes the response time of a cloud
of two Solr instances on 2 VMs with 8 cores and 32 GB of RAM.


TEST RESULTS:


1. Configurations

   a. Best configuration without replicas:
      - CONF1: 16 shards of 17M documents (8 per VM)

   b. Configurations with replicas:
      - CONF2: 8 shards of 35M documents with replication factor of 1
      - CONF3: 16 shards of 35M documents with replication factor of 1

2. Executed tests

   - sequential requests
   - 5 parallel requests
   - 10 parallel requests
   - 20 parallel requests

   in two scenarios: during an indexing phase and not.

   Calls are:
   http://localhost:8983/solr/sepa/select?q=+fragment:*AAA*&fq=marked:T&fq=-fragmentContentType:BULK&start=0&rows=100&sort=creationTimestamp+desc,id+asc

3. Test results

   All the tests showed an I/O utilization of 100MB/s while loading data into
   the disk cache, a disk cache utilization of 20GB and a core utilization of
   100% (all 8 cores).

   No indexing (average time / maximum time):

   - CONF1: sequential 4.1 / 6.9, 5 parallel 15.6 / 19.1, 10 parallel 23.6 / 30.2, 20 parallel 48 / 52.2
   - CONF2: sequential 12.3 / 17.4, 5 parallel 32.5 / 34.2, 10 parallel 45.4 / 49, 20 parallel 64.6 / 74
   - CONF3: sequential 6.9 / 9.9, 5 parallel 33.2 / 37.5, 10 parallel 46 / 51, 20 parallel 68 / 83

   During indexing (in the Solr admin console, is it possible to view the total
   indexing throughput? I can find it only for a single shard):

   - CONF1: sequential 7.7 / 9.5, 5 parallel 26.8 / 28.4, 10 parallel 31.8 / 37.8, 20 parallel 42 / 52.5
   - CONF2: sequential 12.3 / 19, 5 parallel 39 / 40.8, 10 parallel 56.6 / 62.9, 20 parallel 79 / 116
   - CONF3: sequential 10 / 18.9, 5 parallel 36.5 / 41.9, 10 parallel 63.7 / 64.1, 20 parallel 85 / 120



I have two questions:

   - The response times of the configurations with replicas are worse (for
     sequential requests about three times worse) than the response times of
     the configuration without replicas. Is that an expected result?
   - Why don't replicas help to reduce the response time during index inserts
     and updates?


Re: Facet shows deleted values...

2015-12-29 Thread Erick Erickson
Let's be sure we're using terms similarly

That article is from 2010, so is unreliable in the 5.2 world, I'd ignore that.

First, facets should always reflect the latest commit, regardless of
expungeDeletes or optimizes/forcemerges.

_commits_ are definitely recommended. Optimize/forcemerge (or
expungedeletes) are rarely necessary and
should _not_ be necessary for facets to not count omitted documents.

Is it possible that your autowarm period is long and you're still
getting an old searcher when you run your tests?

Assuming that you commit(), then wait a few minutes, do you see
inaccurate facets? If so, what are the
exact steps you follow?

Best,
Erick

On Tue, Dec 29, 2015 at 12:54 PM, Don Bosco Durai  wrote:
> I am purging some of my data on regular basis, but when I run a facet query, 
> the deleted values are still shown in the facet list.
>
> Seems, commit with expunge resolves this issue 
> (http://grokbase.com/t/lucene/solr-user/106313v302/deleted-documents-appearing-in-facet-fields
>  ). But it seems, commit is no more recommended. Also, I am running Solr 5.2 
> in SolrCloud mode.
>
> What is the recommendation here?
>
> Thanks
>
> Bosco
>
>


Re: Issue with Join

2015-12-29 Thread William Bell
Thoughts?

I can duplicate it at will...

On Mon, Dec 28, 2015 at 9:02 PM, William Bell  wrote:

> I am having issues with {!join}. If the core has a multiValued field and
> the inner join does not have a multiValued field, it does not find the
> ones...
>
> Solr 5.3.1... 5.3.1
>
> Example.
>
> PS1226 is in practicing_specialties_codes in providersearch core. This
> field is multiValued.
>
> In the autosuggest core there is NOT a document with PS1226 in it. That
> field is called prac_spec_code and is single-valued.
>
>
>
> http://localhost:8983/solr/providersearch/select?q=*%3A*&wt=json&indent=true&fq=practicing_specialties_codes:PS1226&fl=practicing_specialties_codes
>
> I get:
>
>
>- docs:
>[
>   -
>   {
>  - practicing_specialties_codes:
>  [
> - "PS1010",
> - "PS282",
> - "PS1226"
> ]
>  }
>   ]
>
>
>
> In autosuggest there is nothing:
>
>
> http://localhost:8983/solr/autosuggest/select?q=*%3A*&wt=json&indent=true&fq=prac_spec_code:PS1226&fl=prac_spec_code
>
> Nothing.
>
> Then a join should find what is in providersearch but missing in
> autosuggest.
>
>
> http://localhost:8983/solr/providersearch/select?debugQuery=true&wt=json&q=*:*&rows=10&fq=practicing_specialties_codes:PS1226&fl=practicing_specialties_codes&fq=NOT%20{!join%20from=prac_spec_code%20to=practicing_specialties_codes%20fromIndex=autosuggest}auto_type:PRACSPEC
> 
>
> or
>
>
> http://hgsolr2sl1:8983/solr/providersearch/select?debugQuery=true&wt=json&q=*:*&rows=10&fl=practicing_specialties_codes&fq=NOT%20{!join%20from=prac_spec_code%20to=practicing_specialties_codes%20fromIndex=autosuggest}auto_type:PRACSPEC
> 
>
> or
>
>
> http://hgsolr2sl1:8983/solr/providersearch/select?debugQuery=true&wt=json&q=*:*&rows=10&fl=practicing_specialties_codes&fq=NOT%20{!join%20from=prac_spec_code%20to=practicing_specialties_codes%20fromIndex=autosuggest}*:*
> 
>
> I also tried *:* AND NOT {!join}
>
> I get 0 results. This seems to be a bug.
>
> {
>
>- responseHeader:
>{
>   - status: 0,
>   - QTime: 178,
>   - params:
>   {
>  - q: "*:*",
>  - fl: "practicing_specialties_codes",
>  - fq: "NOT {!join from=prac_spec_code
>  to=practicing_specialties_codes fromIndex=autosuggest}*:*",
>  - rows: "10",
>  - wt: "json",
>  - debugQuery: "true"
>  }
>   },
>- response:
>{
>   - numFound: 0,
>   - start: 0,
>   - docs: [ ]
>   },
>- debug:
>{
>   - rawquerystring: "*:*",
>   - querystring: "*:*",
>   - parsedquery: "MatchAllDocsQuery(*:*)",
>   - parsedquery_toString: "*:*",
>   - explain: { },
>   - QParser: "LuceneQParser",
>   - filter_queries:
>   [
>  - "NOT {!join from=prac_spec_code
>  to=practicing_specialties_codes fromIndex=autosuggest}*:*"
>  ],
>   - parsed_filter_queries:
>   [
>  - "-JoinQuery({!join from=prac_spec_code
>  to=practicing_specialties_codes fromIndex=autosuggest}*:*)"
>  ],
>   - timing:
>   {
>  - time: 177,
>  - prepare:
>  {
> - time: 0,
> - query:
> {
>- time: 0
>},
> - facet:
> {
>- time: 0
>},
> - facet_module:
> {
>- time: 0
>},
> - mlt:
> {
>- time: 0
>},
> - highlight:
> {
>- time: 0
>},
> - stats:
> {
>- time: 0
>},
> - expand:
> {
>- time: 0
>},
> - debug:
> {
>- time: 0
>}
> },
>  - process:
>  {
> - time: 177,
> - query:
> {
>- time: 177
>},
> - facet:
> {
>- time: 0
>},
> - facet_module:
> {
>- time: 0
>},
> - mlt:
> {
>- time: 0
>},
> - highlight:
> 

Memory Usage increases by a lot during and after optimization .

2015-12-29 Thread Zheng Lin Edwin Yeo
Hi,

I am facing a situation, when I do an optimization by clicking on the
"Optimized" button on the Solr Admin Overview UI, the memory usage of the
server increases gradually, until it reaches near the maximum memory
available. There is 64GB of memory available in the server.

Even after the optimized is completed, the memory usage stays near the 100%
range, and could not be reduced until I stop Solr. Why could this be
happening?

Also, I don't think the optimization is completed, as the admin page says
the index is not optimized again after I go back to the Overview page, even
though I did not do any updates to the index.

I am using Solr 5.3.0, with 1 shard and 2 replica. My index size is 183GB.

Regards,
Edwin


Re: Memory Usage increases by a lot during and after optimization .

2015-12-29 Thread Walter Underwood
Do not “optimize".

It is a forced merge, not an optimization. It was a mistake to ever name it 
“optimize”. Solr automatically merges as needed. There are a few situations 
where a force merge might make a small difference. Maybe 10% or 20%, no one had 
bothered to measure it.

If your index is continually updated, clicking that is a complete waste of 
resources. Don’t do it.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Dec 29, 2015, at 6:35 PM, Zheng Lin Edwin Yeo  wrote:
> 
> Hi,
> 
> I am facing a situation, when I do an optimization by clicking on the
> "Optimized" button on the Solr Admin Overview UI, the memory usage of the
> server increases gradually, until it reaches near the maximum memory
> available. There is 64GB of memory available in the server.
> 
> Even after the optimized is completed, the memory usage stays near the 100%
> range, and could not be reduced until I stop Solr. Why could this be
> happening?
> 
> Also, I don't think the optimization is completed, as the admin page says
> the index is not optimized again after I go back to the Overview page, even
> though I did not do any updates to the index.
> 
> I am using Solr 5.3.0, with 1 shard and 2 replica. My index size is 183GB.
> 
> Regards,
> Edwin



Re: Solr - facet fields that contain other facet fields

2015-12-29 Thread Erick Erickson
Sorry, I overlooked the ShingleFilterFactory.
You're getting that from, presumably, your
ShingleFilterFactory. Note that the minShingleSize=2
does not mean that only 2-shingles are output, there's
yet another parameter "outputUnigrams" that controls
that in combination with outputUnigramsIfNoShingles.

I suspect that the shingle factory is making things
not quite meet your expectations. It's actually unclear to me
why the search for "not necessarily" with quotes is matching
the doc with "not". Can we see the output with

debug=true=true

?

In particular I've been assuming that your fq clause is a _phrase_
search as (with quotes) fq:"not necessarily". Look in the parsed-query
of the above (ignore the scoring) to see if the fq clause is a phrase
clause. If it's not, with a default operator of OR then your results
are understandable.

BTW, just to be paranoid I'd start with some two-word phrase
> that doesn't contain "not" as that can be an operator. It
shouldn't be in this case since it's lower case, but just to be safe...

Best,
Erick



On Tue, Dec 29, 2015 at 11:14 AM, Kevin Lopez  wrote:
> Erick,
>
> I am not sure when you say "the only available terms are "not" and
> "necessarily"" is totally correct. I go into the schema browser and I can
> see that there are two terms "not" and "not necessarily" with the correct
> count. Unless these are not the terms you are talking about. Can you
> explain to me what these are exactly.
>
> http://imgur.com/m82CH2f
>
> I see what you are saying, it may be best for me to do the entity
> extraction separately, and put the terms into a special field, although I
> would like the terms to be highlighted (or have some type of position so I
> can highlight it).
>
> Regards,
>
> Kevin
>
> On Mon, Dec 28, 2015 at 12:49 PM, Erick Erickson 
> wrote:
>
>> bq:  so I cannot copy this field to a text field with a
>> keywordtokenizer or strfield
>>
>> 1> There is no restriction on whether a field is analyzed or not as far as
>> faceting is concerned. You can freely facet on an analyzed field
>> or String field or KeywordTokenized field. As Binoy says, though,
>> faceting on large analyzed text fields is dangerous.
>>
>> 2> copyField directives are not chained. As soon as the
>> field is received, before _anything_ is done the raw contents are
>> pushed to the copyField destinations. So in your case the source
>> for both copyField directives should be "content". Otherwise you
>> get into interesting behavior if you, say,  copyField from A to B and
>> have another copyField from B to A. I _suspect_ this is
>> why you have no term info available, but check
>>
>> 3> This is not going to work as you're trying to implement it. If you
>> tokenize, the only available terms are "not" and "necessarily". There
>> is no "not necessarily" _token_ to facet on. If you use a String
> >> or KeywordAnalyzed field, likewise there is no "not necessarily"
>> token, there will be a _single_ token that's the entire content of the
>> field
>> (I'm leaving aside, for instance, WordDelimiterFilterFactory
>> modifications...).
>>
>> One way to approach this would be to recognize and index synthetic
>> tokens representing the concepts. You'd pre-analyze the text, do your
>> entity recognition and add those entities to a special "entity" field or
>> some such. This would be an unanalyzed field that you facet on. Let's
>> say your entity was "colon cancer". Whenever you recognized that in
>> the text during indexing, you'd index "colon_cancer", or "disease_234"
>> in your special field.
>>
>> Of course your app would then have to present this pleasingly, and
>> rather than the app needing access to your dictionary the "colon_cancer"
>> form would be easier to unpack.
>>
>> The fragility here is that changing your text file of entities would
>> require
>> you to re-index to re-inject them into documents.
>>
>> You could also, assuming you know all the entities that should match
>> a given query form facet _queries_ on the phrases. This could get to be
>> quite a large query, but has the advantage of not requiring re-indexing.
>> So you'd have something like
> >> facet.query=field:"not necessarily"&facet.query=field:certainly
>> etc.
>>
>> Best,
>> Erick
>>
>>
>> On Mon, Dec 28, 2015 at 9:13 AM, Binoy Dalal 
>> wrote:
>> > 1) When faceting use field of type string. That'll rid you of your
>> > tokenization problems.
>> > Alternatively do not use any tokenizers.
>> > Also turn doc values on for the field. It'll improve performance.
>> > 2) If however you do need to use a tokenized field for faceting, make
>> sure
>> > that they're pretty short in terms of number of tokens or else your app
>> > will die real soon.
>> >
>> > On Mon, 28 Dec 2015, 22:24 Kevin Lopez  wrote:
>> >
>> >> I am not sure I am following correctly. The field I upload the document
>> to
>> >> would be "content" the analyzed field is "ColonCancerField". 

Re: Facet shows deleted values...

2015-12-29 Thread Tomás Fernández Löbbe
I believe the problem here is that terms from the deleted docs still appear
in the facets, even with a doc count of 0, is that it? Can you use
facet.mincount=1 or would that not be a good fit for your use case?

https://cwiki.apache.org/confluence/display/solr/Faceting#Faceting-Thefacet.mincountParameter
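
For example (the facet field name is just a placeholder):

    .../select?q=*:*&facet=true&facet.field=category&facet.mincount=1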

Tomás

On Tue, Dec 29, 2015 at 5:23 PM, Erick Erickson 
wrote:

> Let's be sure we're using terms similarly
>
> That article is from 2010, so is unreliable in the 5.2 world, I'd ignore
> that.
>
> First, facets should always reflect the latest commit, regardless of
> expungeDeletes or optimizes/forcemerges.
>
> _commits_ are definitely recommended. Optimize/forcemerge (or
> expungedeletes) are rarely necessary and
> should _not_ be necessary for facets to not count omitted documents.
>
> Is it possible that your autowarm period is long and you're still
> getting an old searcher when you run your tests?
>
> Assuming that you commit(), then wait a few minutes, do you see
> inaccurate facets? If so, what are the
> exact steps you follow?
>
> Best,
> Erick
>
> On Tue, Dec 29, 2015 at 12:54 PM, Don Bosco Durai 
> wrote:
> > I am purging some of my data on regular basis, but when I run a facet
> query, the deleted values are still shown in the facet list.
> >
> > Seems, commit with expunge resolves this issue (
> http://grokbase.com/t/lucene/solr-user/106313v302/deleted-documents-appearing-in-facet-fields
> ). But it seems, commit is no more recommended. Also, I am running Solr 5.2
> in SolrCloud mode.
> >
> > What is the recommendation here?
> >
> > Thanks
> >
> > Bosco
> >
> >
>


Facet shows deleted values...

2015-12-29 Thread Don Bosco Durai
I am purging some of my data on regular basis, but when I run a facet query, 
the deleted values are still shown in the facet list.

Seems, commit with expunge resolves this issue 
(http://grokbase.com/t/lucene/solr-user/106313v302/deleted-documents-appearing-in-facet-fields
 ). But it seems, commit is no more recommended. Also, I am running Solr 5.2 in 
SolrCloud mode.

What is the recommendation here?

Thanks

Bosco




Re: Memory Usage increases by a lot during and after optimization .

2015-12-29 Thread Zheng Lin Edwin Yeo
Hi Walter,

Thanks for your reply.

Then how about optimization after indexing?
Normally the index size is much larger after indexing, then after
optimization, the index size reduces. Do we still need to do that?

Regards,
Edwin

On 30 December 2015 at 10:45, Walter Underwood 
wrote:

> Do not “optimize".
>
> It is a forced merge, not an optimization. It was a mistake to ever name
> it “optimize”. Solr automatically merges as needed. There are a few
> situations where a force merge might make a small difference. Maybe 10% or
> 20%, no one had bothered to measure it.
>
> If your index is continually updated, clicking that is a complete waste of
> resources. Don’t do it.
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
> > On Dec 29, 2015, at 6:35 PM, Zheng Lin Edwin Yeo 
> wrote:
> >
> > Hi,
> >
> > I am facing a situation, when I do an optimization by clicking on the
> > "Optimized" button on the Solr Admin Overview UI, the memory usage of the
> > server increases gradually, until it reaches near the maximum memory
> > available. There is 64GB of memory available in the server.
> >
> > Even after the optimized is completed, the memory usage stays near the
> 100%
> > range, and could not be reduced until I stop Solr. Why could this be
> > happening?
> >
> > Also, I don't think the optimization is completed, as the admin page says
> > the index is not optimized again after I go back to the Overview page,
> even
> > though I did not do any updates to the index.
> >
> > I am using Solr 5.3.0, with 1 shard and 2 replica. My index size is
> 183GB.
> >
> > Regards,
> > Edwin
>
>


Re: Memory Usage increases by a lot during and after optimization .

2015-12-29 Thread Walter Underwood
The only time that a force merge might be useful is when you reindex all 
content every night or every week, then do not make any changes until the next 
reindex. But even then, it probably does not matter.

Just let Solr do its thing. Solr is pretty smart.

A long time ago (1996-2006), I worked on an enterprise search engine with the 
same merging algorithm as Solr (Ultraseek Server). We always had customers 
asking about force-merge/optimize. It never made a useful difference. Even with 
twenty servers at irs.gov , it didn’t make a difference.

wunder
K6WRU
Walter Underwood
CM87wj
http://observer.wunderwood.org/ (my blog)

> On Dec 29, 2015, at 6:59 PM, Zheng Lin Edwin Yeo  wrote:
> 
> Hi Walter,
> 
> Thanks for your reply.
> 
> Then how about optimization after indexing?
> Normally the index size is much larger after indexing, then after
> optimization, the index size reduces. Do we still need to do that?
> 
> Regards,
> Edwin
> 
> On 30 December 2015 at 10:45, Walter Underwood 
> wrote:
> 
>> Do not “optimize".
>> 
>> It is a forced merge, not an optimization. It was a mistake to ever name
>> it “optimize”. Solr automatically merges as needed. There are a few
>> situations where a force merge might make a small difference. Maybe 10% or
>> 20%, no one had bothered to measure it.
>> 
>> If your index is continually updated, clicking that is a complete waste of
>> resources. Don’t do it.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>>> On Dec 29, 2015, at 6:35 PM, Zheng Lin Edwin Yeo 
>> wrote:
>>> 
>>> Hi,
>>> 
>>> I am facing a situation, when I do an optimization by clicking on the
>>> "Optimized" button on the Solr Admin Overview UI, the memory usage of the
>>> server increases gradually, until it reaches near the maximum memory
>>> available. There is 64GB of memory available in the server.
>>> 
>>> Even after the optimized is completed, the memory usage stays near the
>> 100%
>>> range, and could not be reduced until I stop Solr. Why could this be
>>> happening?
>>> 
>>> Also, I don't think the optimization is completed, as the admin page says
>>> the index is not optimized again after I go back to the Overview page,
>> even
>>> though I did not do any updates to the index.
>>> 
>>> I am using Solr 5.3.0, with 1 shard and 2 replica. My index size is
>> 183GB.
>>> 
>>> Regards,
>>> Edwin
>> 
>> 



Re: Solr index segment level merge

2015-12-29 Thread Tomás Fernández Löbbe
Would collection aliases be an option (assuming you are using SolrCloud
mode)?


https://cwiki.apache.org/confluence/display/solr/Collections+API#CollectionsAPI-api4
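
The idea would be to index each new batch into its own collection and then
atomically repoint an alias that the application queries, e.g. (names here
are only illustrative):

    http://localhost:8983/solr/admin/collections?action=CREATEALIAS&name=live&collections=main_index,batch_20151229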

On Tue, Dec 29, 2015 at 9:21 PM, Erick Erickson 
wrote:

> Could you simply add the new documents to the current index?
>
> That aside, merging does not need to create a new core or a new
> folder. The form:
>
>
> action=mergeindexes&core=core0&indexDir=/opt/solr/core1/data/index&indexDir=/opt/solr/core2/data/index
>
> Should merge the indexes from the two directories into the pre-existing
> core's index.
>
> Best,
> Erick
>
> On Tue, Dec 29, 2015 at 9:00 PM, Walter Underwood 
> wrote:
> > You probably do not NEED to merge your indexes. Have you tried not
> merging the indexes?
> >
> > wunder
> > Walter Underwood
> > wun...@wunderwood.org
> > http://observer.wunderwood.org/  (my blog)
> >
> >> On Dec 29, 2015, at 7:31 PM, jeba earnest 
> wrote:
> >>
> >> I have a scenario that I need to merge the solr indexes online. I have a
> >> primary solr index of 100 Gb and it is serving the end users and it
> can't
> >> go offline for a moment. Everyday new lucene indexes(2 GB) are generated
> >> separately.
> >>
> >> I have tried coreadmin
> >> https://cwiki.apache.org/confluence/display/solr/Merging+Indexes
> >>
> >> And it will create a new core or new folder. which means it will copy
> 100Gb
> >> every time to a new folder.
> >>
> >> Is there a way I can do a segment level merging?
> >>
> >> Jeba
> >
>


Re: Solr index segment level merge

2015-12-29 Thread Walter Underwood
You probably do not NEED to merge your indexes. Have you tried not merging the 
indexes?

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)

> On Dec 29, 2015, at 7:31 PM, jeba earnest  wrote:
> 
> I have a scenario that I need to merge the solr indexes online. I have a
> primary solr index of 100 Gb and it is serving the end users and it can't
> go offline for a moment. Everyday new lucene indexes(2 GB) are generated
> separately.
> 
> I have tried coreadmin
> https://cwiki.apache.org/confluence/display/solr/Merging+Indexes
> 
> And it will create a new core or new folder. which means it will copy 100Gb
> every time to a new folder.
> 
> Is there a way I can do a segment level merging?
> 
> Jeba



Having replica will slow down Solr?

2015-12-29 Thread Zheng Lin Edwin Yeo
Hi,

I would like to find out, will having a replica slow down the search for
Solr?

Currently, I'm having 1 shard and a replicationFactor of 2 using Solr
5.3.0. I'm running SolrCloud, with 3 external ZooKeeper using ZooKeeper
3.4.6, and my index size is 183GB.

I have been getting QTime of more than 3000ms for my basic search function,
even without adding other things like faceting or highlighting.

Regards,
Edwin


Solr index segment level merge

2015-12-29 Thread jeba earnest
I have a scenario that I need to merge the solr indexes online. I have a
primary solr index of 100 Gb and it is serving the end users and it can't
go offline for a moment. Everyday new lucene indexes(2 GB) are generated
separately.

I have tried coreadmin
https://cwiki.apache.org/confluence/display/solr/Merging+Indexes

And it will create a new core or new folder. which means it will copy 100Gb
every time to a new folder.

Is there a way I can do a segment level merging?

Jeba


Re: Memory Usage increases by a lot during and after optimization .

2015-12-29 Thread Yonik Seeley
Some people also want to control when major segment merges happen, and
optimizing at a known time helps prevent a major merge at an unknown
time (which can be equivalent to an optimize/forceMerge).

The benefits of optimizing (and having fewer segments to search
across) will vary depending on the requests.
Normal full-text searches will see little benefit (merging a few terms
across many segments is not expensive), while other operations that
need to deal with many terms, like faceting, may see bigger speedups.

-Yonik


Re: Solr index segment level merge

2015-12-29 Thread Erick Erickson
Could you simply add the new documents to the current index?

That aside, merging does not need to create a new core or a new
folder. The form:

action=mergeindexes&core=core0&indexDir=/opt/solr/core1/data/index&indexDir=/opt/solr/core2/data/index

Should merge the indexes from the two directories into the pre-existing
core's index.

Best,
Erick

On Tue, Dec 29, 2015 at 9:00 PM, Walter Underwood  wrote:
> You probably do not NEED to merge your indexes. Have you tried not merging 
> the indexes?
>
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
>
>> On Dec 29, 2015, at 7:31 PM, jeba earnest  wrote:
>>
>> I have a scenario that I need to merge the solr indexes online. I have a
>> primary solr index of 100 Gb and it is serving the end users and it can't
>> go offline for a moment. Everyday new lucene indexes(2 GB) are generated
>> separately.
>>
>> I have tried coreadmin
>> https://cwiki.apache.org/confluence/display/solr/Merging+Indexes
>>
>> And it will create a new core or new folder. which means it will copy 100Gb
>> every time to a new folder.
>>
>> Is there a way I can do a segment level merging?
>>
>> Jeba
>


Re: Memory Usage increases by a lot during and after optimization .

2015-12-29 Thread Zheng Lin Edwin Yeo
Thanks for the information.

Another thing I would like to confirm: will the Java heap size setting affect
the optimization process or the memory usage?

Is there any recommended setting that we can use for an index size of 200GB?

Regards,
Edwin


On 30 December 2015 at 11:07, Walter Underwood 
wrote:

> The only time that a force merge might be useful is when you reindex all
> content every night or every week, then do not make any changes until the
> next reindex. But even then, it probably does not matter.
>
> Just let Solr do its thing. Solr is pretty smart.
>
> A long time ago (1996-2006), I worked on an enterprise search engine with
> the same merging algorithm as Solr (Ultraseek Server). We always had
> customers asking about force-merge/optimize. It never made a useful
> difference. Even with twenty servers at irs.gov , it
> didn’t make a difference.
>
> wunder
> K6WRU
> Walter Underwood
> CM87wj
> http://observer.wunderwood.org/ (my blog)
>
> > On Dec 29, 2015, at 6:59 PM, Zheng Lin Edwin Yeo 
> wrote:
> >
> > Hi Walter,
> >
> > Thanks for your reply.
> >
> > Then how about optimization after indexing?
> > Normally the index size is much larger after indexing, then after
> > optimization, the index size reduces. Do we still need to do that?
> >
> > Regards,
> > Edwin
> >
> > On 30 December 2015 at 10:45, Walter Underwood 
> > wrote:
> >
> >> Do not “optimize".
> >>
> >> It is a forced merge, not an optimization. It was a mistake to ever name
> >> it “optimize”. Solr automatically merges as needed. There are a few
> >> situations where a force merge might make a small difference. Maybe 10%
> or
> >> 20%, no one had bothered to measure it.
> >>
> >> If your index is continually updated, clicking that is a complete waste
> of
> >> resources. Don’t do it.
> >>
> >> wunder
> >> Walter Underwood
> >> wun...@wunderwood.org
> >> http://observer.wunderwood.org/  (my blog)
> >>
> >>> On Dec 29, 2015, at 6:35 PM, Zheng Lin Edwin Yeo  >
> >> wrote:
> >>>
> >>> Hi,
> >>>
> >>> I am facing a situation, when I do an optimization by clicking on the
> >>> "Optimized" button on the Solr Admin Overview UI, the memory usage of
> the
> >>> server increases gradually, until it reaches near the maximum memory
> >>> available. There is 64GB of memory available in the server.
> >>>
> >>> Even after the optimized is completed, the memory usage stays near the
> >> 100%
> >>> range, and could not be reduced until I stop Solr. Why could this be
> >>> happening?
> >>>
> >>> Also, I don't think the optimization is completed, as the admin page
> says
> >>> the index is not optimized again after I go back to the Overview page,
> >> even
> >>> though I did not do any updates to the index.
> >>>
> >>> I am using Solr 5.3.0, with 1 shard and 2 replica. My index size is
> >> 183GB.
> >>>
> >>> Regards,
> >>> Edwin
> >>
> >>
>
>


RE: multi term analyzer error

2015-12-29 Thread Eyal Naamati
Hi Ahmet,
Yes there is a space in my example.
This is my multiterm analyzer:

[analyzer definition not preserved by the list archive]

Thanks!

Eyal Naamati
Alma Developer
Tel: +972-2-6499313
Mobile: +972-547915255
eyal.naam...@exlibrisgroup.com

www.exlibrisgroup.com

-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com.INVALID] 
Sent: Tuesday, December 29, 2015 5:18 PM
To: solr-user@lucene.apache.org
Subject: Re: multi term analyzer error

Hi Eyal,

What is your analyzer definition for multi-term?
In your example, is the star character separated from the term by a space?


Ahmet

On Tuesday, December 29, 2015 3:26 PM, Eyal Naamati 
 wrote:




Hi,
 
I defined a multi-term analyzer to my analysis chain, and it works as I expect. 
However, for some queries (for example '*' or 'term *') I get an exception 
"analyzer returned no terms for multiTerm term". These queries work when I 
don't customize a multi-term analyzer.
My question: is there a way to handle this in the analyzer configuration (in my 
schema.xml)? I realize that I can also change the query I am sending the 
analyzer, but that is difficult for me since there are many places in our 
program that use this.
Thanks!
 
Eyal Naamati
Alma Developer
Tel: +972-2-6499313
Mobile: +972-547915255
eyal.naam...@exlibrisgroup.com

www.exlibrisgroup.com


Re: Memory Usage increases by a lot during and after optimization .

2015-12-29 Thread William Bell
Question: does anyone have examples of good merge settings for solrconfig.xml
to keep the number of segments small, like 6?
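
Something along these lines in the <indexConfig> section is what I mean;
the values are purely illustrative, not a recommendation:

    <indexConfig>
      <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
        <int name="maxMergeAtOnce">6</int>
        <int name="segmentsPerTier">6</int>
      </mergePolicy>
    </indexConfig>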

On Tue, Dec 29, 2015 at 8:49 PM, Yonik Seeley  wrote:

> Some people also want to control when major segment merges happen, and
> optimizing at a known time helps prevent a major merge at an unknown
> time (which can be equivalent to an optimize/forceMerge).
>
> The benefits of optimizing (and having fewer segments to search
> across) will vary depending on the requests.
> Normal full-text searches will see little benefit (merging a few terms
> across many segments is not expensive), while other operations that
> need to deal with many terms, like faceting, may see bigger speedups.
>
> -Yonik
>



-- 
Bill Bell
billnb...@gmail.com
cell 720-256-8076


RE: SolrMeter is still a feasible tool for measuring performances?

2015-12-29 Thread Gian Maria Ricci - aka Alkampfer
Thanks to everyone, I'll give it a try because it seems very useful.

Compiling with maven is really trivial, and it is indeed not a problem. I have a 
customer that asked me whether developers can find an official compiled version 
instead of compiling on their own machines. The solution is probably to share the 
compiled version on a network share :).

Thanks again and happy new year to everyone.

--
Gian Maria Ricci
Cell: +39 320 0136949


-Original Message-
From: Binoy Dalal [mailto:binoydala...@gmail.com] 
Sent: lunedì 28 dicembre 2015 19:38
To: solr-user 
Subject: Re: SolrMeter is still a feasible tool for measuring performances?

Solr meter works very well with solr 4.10.4 including the query extraction 
feature.
We've been using it for quite a while now.
You should give it a try. Won't take very long to setup and use.

On Mon, 28 Dec 2015, 23:23 Erick Erickson  wrote:

> SolrMeter has some pretty cool features, one of which is to extract 
> queries from existing Solr logs. If the Solr logging patterns have 
> changed, which they do, that may require some fixing up...
>
> Let us know...
>
> Erick
>
> On Mon, Dec 28, 2015 at 12:25 AM, Binoy Dalal 
> wrote:
> > Hi Gian
> > We've been using solr meter to test the performance of solr instances for
> > quite a while now and in my experience it is pretty reliable.
> > Finding a compiled jar is difficult but building from the code is pretty
> > straightforward and will only take you a few minutes.
> >
> > On Mon, 28 Dec 2015, 13:47 Gian Maria Ricci - aka Alkampfer <
> > alkampfer@nablasoft.com> wrote:
> >
> >> Hi,
> >>
> >>
> >>
> >> I've read on SolrWiki that solrmeter is not actively developed anymore, but
> >> I wonder if it is still valid to do some performance test or if there is
> >> some better approach / tool.
> >>
> >>
> >>
> >> I'd like also to know where I can find the latest compiled version for
> >> SolrMeter instead of compiling with maven. The release page on GitHub only
> >> gives the source code:
> >> https://github.com/tflobbe/solrmeter/releases/tag/solrmeter-parent-0.3.0
> >>
> >>
> >>
> >> Thanks in advance for any help you can give me.
> >>
> >> --
> >> Gian Maria Ricci
> >> Cell: +39 320 0136949
> >>
> > --
> > Regards,
> > Binoy Dalal
>
--
Regards,
Binoy Dalal


[More Like This] Query building

2015-12-29 Thread Alessandro Benedetti
Hi guys,
While I was exploring the way we build the More Like This query, I
discovered a part I am not convinced of :



Let's see how we build the query :
org.apache.lucene.queries.mlt.MoreLikeThis#retrieveTerms(int)

1) we extract the terms from the interesting fields, adding them to a map :

Map termFreqMap = new HashMap<>();

*(we lose the field -> term relation; we no longer know which field the term
came from!)*

org.apache.lucene.queries.mlt.MoreLikeThis#createQueue

2) we build the queue that will contain the query terms; at this point we
connect these terms to a field again, but:

...
> // go through all the fields and find the largest document frequency
> String topField = fieldNames[0];
> int docFreq = 0;
> for (String fieldName : fieldNames) {
>   int freq = ir.docFreq(new Term(fieldName, word));
>   topField = (freq > docFreq) ? fieldName : topField;
>   docFreq = (freq > docFreq) ? freq : docFreq;
> }
> ...


We identify the topField as the field with the highest document frequency
for the term t.
Then we build the termQuery :

queue.add(new ScoreTerm(word, *topField*, score, idf, docFreq, tf));

In this way we lose a lot of precision, and I am not sure why we do that.
I would prefer to keep the relation between terms and fields; it could
improve the quality of the MLT query a lot.
If I run the MLT on two fields, *description* and *facilities* for example,
it is likely I want to find documents with similar terms in the description
and similar terms in the facilities, without mixing things up and losing
the semantics of the terms.
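
To make the idea concrete, here is a rough sketch (not the current Lucene code,
just the shape of the change I have in mind, reusing the ir, fieldNames, Term,
ScoreTerm and queue already visible in the snippet above):

// Sketch only: keep one term-frequency map per field instead of a single flat
// map, so the field -> term relation survives into the priority queue.
Map<String, Map<String, Integer>> perFieldTermFreq = new HashMap<>();
for (String fieldName : fieldNames) {
  Map<String, Integer> termFreq = new HashMap<>();
  // ... fill termFreq from the term vector (or re-analysis) of fieldName ...
  perFieldTermFreq.put(fieldName, termFreq);
}

// When building the queue, score each (field, term) pair against its own field
// instead of picking a single topField across all fields.
for (Map.Entry<String, Map<String, Integer>> byField : perFieldTermFreq.entrySet()) {
  String fieldName = byField.getKey();
  for (Map.Entry<String, Integer> entry : byField.getValue().entrySet()) {
    String word = entry.getKey();
    int tf = entry.getValue();
    int docFreq = ir.docFreq(new Term(fieldName, word));
    // compute idf and score exactly as the existing code does, then:
    // queue.add(new ScoreTerm(word, fieldName, score, idf, docFreq, tf));
  }
}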

Let me know your opinion,

Cheers


-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: mlt and document boost

2015-12-29 Thread Alessandro Benedetti
Hi Upaya,
talking about wrapping the MLT query parser with additional query parsers :

let's assume I want to run my MLT query + 2 boost functions on the results
to affect the ranking.

Can you give me an example of how to wrap them together ?

Those two are the independent pieces :
{!boost b=recip(dist(2, 0, star_rating, 0, 3),1,10,10)}
{!mlt qf=name,description,facilities,resort,region,dest_level_2 mintf=1
mindf=3 maxqt=100}43083

Cheers

Cheers

On 24 December 2015 at 21:18, Upayavira  wrote:

> If you are going to go that far, you can get the parsed query from the
> debug output, but seriously, if you are using a latest Solr and don't
> need the stream.body functionality in MLT, then use the MLT query
> parser, it is by far the best way to do it - as you get all the features
> of other query parsers and such for free.
>
> Upayavira
>
> On Thu, Dec 24, 2015, at 07:37 PM, Tim Hearn wrote:
> > One workaround is to use the 'important terms' feature to grab the query
> > generated by the MLT handler, then parse that list into your own solr
> > query
> > to use through a standard search handler.  That way, you can get the same
> > results as if you used the MLT handler, and you can also use filter
> > querying, highlighting, etc.
> >
> > Note:  I am currently running a Solr 5.0.0 Single-Core installation
> >
> > On Thu, Dec 24, 2015 at 11:57 AM, Upayavira  wrote:
> >
> > > Which morelikethis are you using? Handler, SearchComponent or
> > > QueryParser?
> > >
> > > You should be a able to wrap the mlt query parser with the boost query
> > > parser with no problem.
> > >
> > > Upayavira
> > >
> > > On Thu, Dec 24, 2015, at 05:18 AM, Binoy Dalal wrote:
> > > > Have you tried applying the boosts to individual fields with mlt.qf?
> > > > Optionally, you could get the patch that is on jira and integrate it
> into
> > > > your code if you're so inclined.
> > > >
> > > > On Thu, 24 Dec 2015, 03:17 CrazyDiamond 
> wrote:
> > > >
> > > > > So no way to apply boost to mlt or any other way to change order of
> > > > > document
> > > > > in mlt result? also may be there is a way to make to mlt query  at
> > > once and
> > > > > merge.
> > > > >
> > > > >
> > > > >
> > > > > --
> > > > > View this message in context:
> > > > >
> > >
> http://lucene.472066.n3.nabble.com/mlt-and-document-boost-tp4246522p4247154.html
> > > > > Sent from the Solr - User mailing list archive at Nabble.com.
> > > > >
> > > > --
> > > > Regards,
> > > > Binoy Dalal
> > >
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: [More Like This] Query building

2015-12-29 Thread Anshum Gupta
Feel free to create a JIRA and put up a patch if you can.

On Tue, Dec 29, 2015 at 4:26 PM, Alessandro Benedetti  wrote:

> Hi guys,
> While I was exploring the way we build the More Like This query, I
> discovered a part I am not convinced of :
>
>
>
> Let's see how we build the query :
> org.apache.lucene.queries.mlt.MoreLikeThis#retrieveTerms(int)
>
> 1) we extract the terms from the interesting fields, adding them to a map :
>
> Map termFreqMap = new HashMap<>();
>
> *( we lose the relation field-> term, we don't know anymore where the term
> was coming ! )*
>
> org.apache.lucene.queries.mlt.MoreLikeThis#createQueue
>
> 2) we build the queue that will contain the query terms, at this point we
> connect again there terms to some field, but :
>
> ...
>> // go through all the fields and find the largest document frequency
>> String topField = fieldNames[0];
>> int docFreq = 0;
>> for (String fieldName : fieldNames) {
>>   int freq = ir.docFreq(new Term(fieldName, word));
>>   topField = (freq > docFreq) ? fieldName : topField;
>>   docFreq = (freq > docFreq) ? freq : docFreq;
>> }
>> ...
>
>
> We identify the topField as the field with the highest document frequency
> for the term t .
> Then we build the termQuery :
>
> queue.add(new ScoreTerm(word, *topField*, score, idf, docFreq, tf));
>
> In this way we lose a lot of precision.
> Not sure why we do that.
> I would prefer to keep the relation between terms and fields.
> The MLT query can improve a lot the quality.
> If i run the MLT on 2 fields : *description* and *facilities* for example.
> It is likely I want to find documents with similar terms in the
> description and similar terms in the facilities, without mixing up the
> things and loosing the semantic of the terms.
>
> Let me know your opinion,
>
> Cheers
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>



-- 
Anshum Gupta


multi term analyzer error

2015-12-29 Thread Eyal Naamati
Hi,

I defined a multi-term analyzer in my analysis chain, and it works as I expect.
However, for some queries (for example '*' or 'term *') I get an exception
"analyzer returned no terms for multiTerm term". These queries work when I
don't customize a multi-term analyzer.
My question: is there a way to handle this in the analyzer configuration (in my
schema.xml)? I realize that I could also change the query I am sending to the
analyzer, but that is difficult for me since there are many places in our
program that use it.
Thanks!

Eyal Naamati
Alma Developer
Tel: +972-2-6499313
Mobile: +972-547915255
eyal.naam...@exlibrisgroup.com
www.exlibrisgroup.com
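
For what it's worth, the error above ("analyzer returned no terms for multiTerm
term") is raised when the multi-term chain produces zero tokens for the wildcard
part of the query. A minimal, self-contained sketch (the filter chain here is
hypothetical, chosen only to reproduce the zero-token case) showing how an input
of "*" can come out of an analysis chain empty:

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.custom.CustomAnalyzer;

public class MultiTermSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical multi-term chain: strip '*' characters, then drop empty tokens.
    Analyzer multiterm = CustomAnalyzer.builder()
        .withTokenizer("keyword")
        .addTokenFilter("patternreplace", "pattern", "\\*", "replacement", "", "replace", "all")
        .addTokenFilter("length", "min", "1", "max", "100")
        .build();
    int tokens = 0;
    try (TokenStream ts = multiterm.tokenStream("field", "*")) {
      ts.reset();
      while (ts.incrementToken()) {
        tokens++;
      }
      ts.end();
    }
    // Zero tokens here is exactly the condition behind
    // "analyzer returned no terms for multiTerm term" for queries like '*' or 'term *'.
    System.out.println("tokens produced for '*': " + tokens);
  }
}

If the multi-term chain is adjusted so that it always emits at least one token
for such inputs (for example by not stripping the wildcard character in that
chain), the exception should go away without changing the queries themselves.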



Re: [More Like This] Query building

2015-12-29 Thread Alessandro Benedetti
Sure, I will proceed tomorrow with the Jira and the simple patch + tests.

In the meantime let's try to collect some additional feedback.

Cheers

On 29 December 2015 at 12:43, Anshum Gupta  wrote:

> Feel free to create a JIRA and put up a patch if you can.
>
> On Tue, Dec 29, 2015 at 4:26 PM, Alessandro Benedetti <
> abenede...@apache.org
> > wrote:
>
> > Hi guys,
> > While I was exploring the way we build the More Like This query, I
> > discovered a part I am not convinced of :
> >
> >
> >
> > Let's see how we build the query :
> > org.apache.lucene.queries.mlt.MoreLikeThis#retrieveTerms(int)
> >
> > 1) we extract the terms from the interesting fields, adding them to a
> map :
> >
> > Map termFreqMap = new HashMap<>();
> >
> > *( we lose the relation field-> term, we don't know anymore where the
> term
> > was coming ! )*
> >
> > org.apache.lucene.queries.mlt.MoreLikeThis#createQueue
> >
> > 2) we build the queue that will contain the query terms, at this point we
> > connect again there terms to some field, but :
> >
> > ...
> >> // go through all the fields and find the largest document frequency
> >> String topField = fieldNames[0];
> >> int docFreq = 0;
> >> for (String fieldName : fieldNames) {
> >>   int freq = ir.docFreq(new Term(fieldName, word));
> >>   topField = (freq > docFreq) ? fieldName : topField;
> >>   docFreq = (freq > docFreq) ? freq : docFreq;
> >> }
> >> ...
> >
> >
> > We identify the topField as the field with the highest document frequency
> > for the term t .
> > Then we build the termQuery :
> >
> > queue.add(new ScoreTerm(word, *topField*, score, idf, docFreq, tf));
> >
> > In this way we lose a lot of precision.
> > Not sure why we do that.
> > I would prefer to keep the relation between terms and fields.
> > The MLT query can improve a lot the quality.
> > If i run the MLT on 2 fields : *description* and *facilities* for
> example.
> > It is likely I want to find documents with similar terms in the
> > description and similar terms in the facilities, without mixing up the
> > things and loosing the semantic of the terms.
> >
> > Let me know your opinion,
> >
> > Cheers
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
>
>
>
> --
> Anshum Gupta
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England


Re: Solr5.X document loss in splitting shards

2015-12-29 Thread Luca Quarello
Hi Shawn,
I'm looking at the doc counts on the core overview page, and the situation
is:

Num Docs: 35031923
Max Doc:  35156879

The difference (about 125,000 deleted docs) doesn't explain the strange
behavior: the two child shards together hold 13430316 + 13425924 = 26856240
documents, roughly 8.2 million fewer than the parent's 35031923.





On Mon, Dec 28, 2015 at 1:35 AM, Shawn Heisey  wrote:

> On 12/26/2015 11:21 AM, Luca Quarello wrote:
> > I have a SOLR 5.3.1 CLOUD with two nodes and 8 shards per node.
> >
> > Each shard is about* 35 million documents (**35025882**) and 16GB sized.*
> >
> >
> >- I launch the SPLIT command on a shard (shard 13) in the ASYNC way:
>
> 
>
> > The new created shards have:
> > *13430316 documents (5.6 GB) and 13425924 documents (5.59 GB**)*.
>
> Where are you looking that shows you the source shard has 35 million
> documents?  Be extremely specific.
>
> The following screenshot shows one place you might be looking for this
> information -- the core overview page:
>
> https://www.dropbox.com/s/311n49wkp9kw7xa/admin-ui-core-overview.png?dl=0
>
> Is the core overview page where you are looking, or is it somewhere else?
>
> I'm asking because "Max Doc" and "Num Docs" on the core overview page
> mean very different things.  The difference between them is the number
> of deleted docs, and the split shards are probably missing those deleted
> docs.
>
> This is the only idea that I have.  If it's not that, then I'm as
> clueless as you are.
>
> Thanks,
> Shawn
>
>


How to achieve join like behavior on solr-cloud

2015-12-29 Thread Alok Bhandari
Hello ,

I am aware that Solr (I am using 5.2) does not support joins in distributed
search when the documents to be joined reside on different
shards/collections.

My use case: I want to fetch the uuid of the documents matching a search,
and also of those docs which fall outside this search but whose "related"
field matches that of one of the search-result docs. This is a typical join
scenario.

Is there some way to achieve this behavior using the streaming API, or some
other approach?

Thanks.
Alok



--
View this message in context: 
http://lucene.472066.n3.nabble.com/How-to-achieve-join-like-behavior-on-solr-cloud-tp4247703.html
Sent from the Solr - User mailing list archive at Nabble.com.
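
Purely as a sketch of the "some other approach" route (collection, field and
host names below are made up; it assumes the set of "related" keys from the
first pass is small enough to send back as a filter): a two-pass, client-side
join with SolrJ.

import java.util.LinkedHashSet;
import java.util.Set;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class ClientSideJoinSketch {
  public static void main(String[] args) throws Exception {
    try (CloudSolrClient client = new CloudSolrClient("zk1:2181,zk2:2181/solr")) {
      client.setDefaultCollection("mycollection");

      // Pass 1: run the original search and collect the "related" keys of the hits.
      QueryResponse first = client.query(new SolrQuery("body:cancer")
          .setFields("uuid", "related")
          .setRows(1000));
      Set<String> relatedKeys = new LinkedHashSet<>();
      for (SolrDocument doc : first.getResults()) {
        Object related = doc.getFieldValue("related");
        if (related != null) {
          relatedKeys.add(related.toString());
        }
      }

      // Pass 2: fetch every document whose "related" field matches one of those
      // keys; this pulls in the docs that were outside the original search.
      // (Values may need escaping; the {!terms f=related} parser is an alternative
      // for large key sets.)
      String filter = "related:(" + String.join(" OR ", relatedKeys) + ")";
      QueryResponse second = client.query(new SolrQuery("*:*")
          .setFilterQueries(filter)
          .setFields("uuid")
          .setRows(10000));
      for (SolrDocument doc : second.getResults()) {
        System.out.println(doc.getFieldValue("uuid"));
      }
    }
  }
}

This makes two round trips and breaks down when the key set gets very large,
but it works on 5.2 without any join support on the server side.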


Re: mlt and document boost

2015-12-29 Thread Upayavira
That might work, but this might be clearer:

q={!boost b=recip(dist(2, 0, star_rating, 0, 3),1,10,10) v=$mlt}&
mlt={!mlt qf=name,description,facilities,resort,region,dest_level_2
mintf=1 mindf=3 maxqt=100}43083

Upayavira

On Tue, Dec 29, 2015, at 12:00 PM, Alessandro Benedetti wrote:
> I simply concatenated them :
> 
> q={!boost b=recip(dist(2,0,star_rating,0,3),1,10,10)}{!mlt
> qf=name,description,facilities,resort,region,dest_level_2 mintf=1 mindf=3
> maxqt=100}43083
> 
> From the debug query the syntax is fine.
> 
> Am i correct ?
> 
> Cheers
> 
> On 29 December 2015 at 11:48, Alessandro Benedetti
> 
> wrote:
> 
> > Hi Upaya,
> > talking about wrapping the MLT query parser with additional query parsers :
> >
> > let's assume I want to run my MLT query + 2 boost functions on the results
> > to affect the ranking.
> >
> > Can you give me an example of how to wrap them together ?
> >
> > Those two are the independent pieces :
> > {!boost b=recip(dist(2, 0, star_rating, 0, 3),1,10,10)}
> > {!mlt qf=name,description,facilities,resort,region,dest_level_2 mintf=1
> > mindf=3 maxqt=100}43083
> >
> > Cheers
> >
> > Cheers
> >
> > On 24 December 2015 at 21:18, Upayavira  wrote:
> >
> >> If you are going to go that far, you can get the parsed query from the
> >> debug output, but seriously, if you are using a latest Solr and don't
> >> need the stream.body functionality in MLT, then use the MLT query
> >> parser, it is by far the best way to do it - as you get all the features
> >> of other query parsers and such for free.
> >>
> >> Upayavira
> >>
> >> On Thu, Dec 24, 2015, at 07:37 PM, Tim Hearn wrote:
> >> > One workaround is to use the 'important terms' feature to grab the query
> >> > generated by the MLT handler, then parse that list into your own solr
> >> > query
> >> > to use through a standard search handler.  That way, you can get the
> >> same
> >> > results as if you used the MLT handler, and you can also use filter
> >> > querying, highlighting, etc.
> >> >
> >> > Note:  I am currently running a Solr 5.0.0 Single-Core installation
> >> >
> >> > On Thu, Dec 24, 2015 at 11:57 AM, Upayavira  wrote:
> >> >
> >> > > Which morelikethis are you using? Handler, SearchComponent or
> >> > > QueryParser?
> >> > >
> >> > > You should be a able to wrap the mlt query parser with the boost query
> >> > > parser with no problem.
> >> > >
> >> > > Upayavira
> >> > >
> >> > > On Thu, Dec 24, 2015, at 05:18 AM, Binoy Dalal wrote:
> >> > > > Have you tried applying the boosts to individual fields with mlt.qf?
> >> > > > Optionally, you could get the patch that is on jira and integrate
> >> it into
> >> > > > your code if you're so inclined.
> >> > > >
> >> > > > On Thu, 24 Dec 2015, 03:17 CrazyDiamond 
> >> wrote:
> >> > > >
> >> > > > > So no way to apply boost to mlt or any other way to change order
> >> of
> >> > > > > document
> >> > > > > in mlt result? also may be there is a way to make to mlt query  at
> >> > > once and
> >> > > > > merge.
> >> > > > >
> >> > > > >
> >> > > > >
> >> > > > > --
> >> > > > > View this message in context:
> >> > > > >
> >> > >
> >> http://lucene.472066.n3.nabble.com/mlt-and-document-boost-tp4246522p4247154.html
> >> > > > > Sent from the Solr - User mailing list archive at Nabble.com.
> >> > > > >
> >> > > > --
> >> > > > Regards,
> >> > > > Binoy Dalal
> >> > >
> >>
> >
> >
> >
> > --
> > --
> >
> > Benedetti Alessandro
> > Visiting card : http://about.me/alessandro_benedetti
> >
> > "Tyger, tyger burning bright
> > In the forests of the night,
> > What immortal hand or eye
> > Could frame thy fearful symmetry?"
> >
> > William Blake - Songs of Experience -1794 England
> >
> 
> 
> 
> -- 
> --
> 
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
> 
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
> 
> William Blake - Songs of Experience -1794 England


Re: mlt and document boost

2015-12-29 Thread Alessandro Benedetti
I simply concatenated them :

q={!boost b=recip(dist(2,0,star_rating,0,3),1,10,10)}{!mlt
qf=name,description,facilities,resort,region,dest_level_2 mintf=1 mindf=3
maxqt=100}43083

From the debug query the syntax is fine.

Am I correct?

Cheers

On 29 December 2015 at 11:48, Alessandro Benedetti 
wrote:

> Hi Upaya,
> talking about wrapping the MLT query parser with additional query parsers :
>
> let's assume I want to run my MLT query + 2 boost functions on the results
> to affect the ranking.
>
> Can you give me an example of how to wrap them together ?
>
> Those two are the independent pieces :
> {!boost b=recip(dist(2, 0, star_rating, 0, 3),1,10,10)}
> {!mlt qf=name,description,facilities,resort,region,dest_level_2 mintf=1
> mindf=3 maxqt=100}43083
>
> Cheers
>
> Cheers
>
> On 24 December 2015 at 21:18, Upayavira  wrote:
>
>> If you are going to go that far, you can get the parsed query from the
>> debug output, but seriously, if you are using a latest Solr and don't
>> need the stream.body functionality in MLT, then use the MLT query
>> parser, it is by far the best way to do it - as you get all the features
>> of other query parsers and such for free.
>>
>> Upayavira
>>
>> On Thu, Dec 24, 2015, at 07:37 PM, Tim Hearn wrote:
>> > One workaround is to use the 'important terms' feature to grab the query
>> > generated by the MLT handler, then parse that list into your own solr
>> > query
>> > to use through a standard search handler.  That way, you can get the
>> same
>> > results as if you used the MLT handler, and you can also use filter
>> > querying, highlighting, etc.
>> >
>> > Note:  I am currently running a Solr 5.0.0 Single-Core installation
>> >
>> > On Thu, Dec 24, 2015 at 11:57 AM, Upayavira  wrote:
>> >
>> > > Which morelikethis are you using? Handler, SearchComponent or
>> > > QueryParser?
>> > >
>> > > You should be a able to wrap the mlt query parser with the boost query
>> > > parser with no problem.
>> > >
>> > > Upayavira
>> > >
>> > > On Thu, Dec 24, 2015, at 05:18 AM, Binoy Dalal wrote:
>> > > > Have you tried applying the boosts to individual fields with mlt.qf?
>> > > > Optionally, you could get the patch that is on jira and integrate
>> it into
>> > > > your code if you're so inclined.
>> > > >
>> > > > On Thu, 24 Dec 2015, 03:17 CrazyDiamond 
>> wrote:
>> > > >
>> > > > > So no way to apply boost to mlt or any other way to change order
>> of
>> > > > > document
>> > > > > in mlt result? also may be there is a way to make to mlt query  at
>> > > once and
>> > > > > merge.
>> > > > >
>> > > > >
>> > > > >
>> > > > > --
>> > > > > View this message in context:
>> > > > >
>> > >
>> http://lucene.472066.n3.nabble.com/mlt-and-document-boost-tp4246522p4247154.html
>> > > > > Sent from the Solr - User mailing list archive at Nabble.com.
>> > > > >
>> > > > --
>> > > > Regards,
>> > > > Binoy Dalal
>> > >
>>
>
>
>
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>



-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

"Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?"

William Blake - Songs of Experience -1794 England