Re: Phrases and edismax

2016-04-30 Thread Erick Erickson
Looks like a bug in edismax to me when you field-qualify
the terms.

As an aside, there's no need to specify the field when you only
want it to go against the fields defined in "qf" and "pf" etc. And,
that's a work-around for this particular case. But still:

So here's what I get on 5x:
q=(erick men truck)&defType=edismax&qf=name&pf=name
correctly returns:
"+((name:erick) (name:men) (name:truck)) (name:"erick men truck")",

But,
q=name:(erick men truck)&defType=edismax&qf=name&pf=name
incorrectly returns:
"+(name:erick name:men name:truck) (name:"men truck")",

And this:
q=name:(erick men truck)&defType=edismax&qf=name&pf=features
incorrectly gives this:

"+(name:erick name:men name:truck) (features:"men truck")",

Confusingly, the terms (with "erick" left out, strike 1)
go against the pf field even though they're fully qualified against the
name field. Not entirely sure whether this is intended or not, frankly.
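
For anyone who wants to reproduce the comparison, here is a minimal Python sketch that builds the two request URLs, one unqualified and one field-qualified. The host, port and collection name (localhost:8983, mycollection) are assumptions and do not come from this thread; defType, qf, pf and debugQuery are the usual request parameters.

```python
from urllib.parse import urlencode

# Assumed base URL; replace with your own host and collection.
BASE_URL = "http://localhost:8983/solr/mycollection"


def edismax_url(q, qf, pf, base_url=BASE_URL):
    """Build a /select URL that runs q through edismax with debug output."""
    params = {
        "q": q,
        "defType": "edismax",
        "qf": qf,
        "pf": pf,
        "debugQuery": "true",
    }
    return base_url + "/select?" + urlencode(params)


# Same terms, once without and once with the field qualifier.
unqualified = edismax_url("(erick men truck)", qf="name", pf="name")
qualified = edismax_url("name:(erick men truck)", qf="name", pf="name")
print(unqualified)
print(qualified)
```

Comparing the parsed query in the debug section of the two responses should show whether the first term is dropped from the phrase.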

Please go ahead and raise a JIRA.

Best,
Erick

On Fri, Apr 29, 2016 at 7:55 AM, Mark Robinson wrote:
> Hi,
>
> q=productType:(two piece bathtub white)
> &defType=edismax&pf=productType^20.0&qf=productType^15.0
>
> In the debug section this is what I see:-
> 
> (+(productType:two productType:piec productType:bathtub productType:white)
> DisjunctionMaxQuery((productType:"piec bathtub white"^20.0)))/no_coord
> 
>
> My question is related to the "pf" (phrases) section of edismax.
> As shown in the debug section, why is the phrase taken as "piec bathtub
> white"? Why is the first word "two" not considered in the phrase fields
> section?
> I want queries in which the words "two piece bathtub white" appear
> together to be boosted, not only "piece bathtub white".
>
> Could someone help me understand what I am missing?
>
> Thanks!
> Mark


RE: Solr 5.2.1 on Java 8 GC

2016-04-30 Thread Davis, Daniel (NIH/NLM) [C]
Bram, on the subject of brute force - if your script is "clever" and uses 
binary first search, I'd love to adapt it to my environment.  I am trying to 
build a truly multi-tenant Solr because each of our indexes is tiny, but all 
together they will eventually be big, and so I'll have to repeat this 
experiment, many, many times.


From: Bram Van Dam [bram.van...@intix.eu]
Sent: Saturday, April 30, 2016 7:10 AM
To: solr-user@lucene.apache.org
Subject: Re: Solr 5.2.1 on Java 8 GC

On 29/04/16 16:40, Nick Vasilyev wrote:
> Not sure if it helps anyone, but I am seeing decent results with the
> following.
>
> It was mostly a result of trial and error,

I'm ashamed to admit that I've used a similar approach: wrote a simple
test script to try out various GC settings with various values. Repeat
ad nauseam. Ended with a configuration that works reasonably well on the
environment in question, but will probably fail horribly anywhere else.

When in doubt, use brute force.

 - Bram


Re: Solr5.5:DocValues/CopyField does not work with Atomic updates

2016-04-30 Thread Nick Vasilyev
I am also running into this problem on Solr 6.
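
For reference, an atomic update of the kind described in the quoted thread below can be sketched in Python as follows. The unique key field "id", the document id and the new value are assumptions for illustration; FieldA is the field name used in the original report, and "set" is the standard atomic update modifier.

```python
import json


def atomic_set(doc_id, field, value, key_field="id"):
    """Return a JSON body for an atomic 'set' update of a single field."""
    # key_field is assumed to be the schema's uniqueKey.
    return json.dumps([{key_field: doc_id, field: {"set": value}}])


# Body to POST to the collection's /update handler as application/json.
payload = atomic_set("doc-1", "FieldA", "new value")
print(payload)
```

With FieldA copied to a docValues-only FieldB as in the report, sending a body like this is the step that reportedly triggers the "appears more than once" error.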

On Sun, Apr 24, 2016 at 6:10 PM, Karthik Ramachandran <
kramachand...@commvault.com> wrote:

> I have opened JIRA
>
> https://issues.apache.org/jira/browse/SOLR-9034
>
> I will upload the patch soon.
>
> With Thanks & Regards
> Karthik Ramachandran
> CommVault
> Direct: (732) 923-2197
> Please don't print this e-mail unless you really need to
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, April 22, 2016 8:24 PM
> To: solr-user 
> Subject: Re: Solr5.5:DocValues/CopyField does not work with Atomic updates
>
> I think I just added the right person, let us know if you don't have
> access and/or if you need access to the LUCENE JIRA.
>
> Erick
>
> On Fri, Apr 22, 2016 at 5:17 PM, Karthik Ramachandran <
> kramachand...@commvault.com> wrote:
> > Erick,
> >   I have created a JIRA id (kramachand...@commvault.com).  Once I get
> > access I will create the JIRA and submit the patch.
> >
> > With Thanks & Regards
> > Karthik Ramachandran
> > CommVault
> > Direct: (732) 923-2197
> > Please don't print this e-mail unless you really need to
> >
> >
> >
> > On 4/22/16, 8:04 PM, "Erick Erickson" wrote:
> >
> >>Karthik:
> >>
> >>The Apache mailing list is pretty aggressive about removing
> >>attachments. Could you possibly open a JIRA and attach the file as a
> >>patch? If at all possible a patch file with just the diffs would be
> >>best.
> >>
> >>One problem is that it'll be a two-step process. JIRA has been
> >>getting hit with spam, so you'll have to request access once you create
> >>a JIRA ID (asking on this list would be fine).
> >>
> >>Best,
> >>Erick
> >>
> >>On Thu, Apr 21, 2016 at 9:09 PM, Karthik Ramachandran
> >> wrote:
> >>> We feel the issue is in
> >>> RealTimeGetComponent.getInputDocument(SolrCore core, BytesRef idBytes),
> >>> where Solr calls getNonStoredDVs and adds the fields to the original
> >>> document without excluding the copyFields.
> >>>
> >>> We made changes to send the filteredList to
> >>> searcher.decorateDocValueFields and it started working.
> >>>
> >>>
> >>>
> >>> Attached is the modified file.
> >>>
> >>>
> >>>
> >>> With Thanks & Regards
> >>> Karthik Ramachandran
> >>> CommVault
> >>> Please don't print this e-mail unless you really need to
> >>>
> >>>
> >>>
> >>> -Original Message-
> >>> From: Karthik Ramachandran [mailto:mrk...@gmail.com]
> >>> Sent: Friday, April 22, 2016 12:08 AM
> >>> To: solr-user@lucene.apache.org
> >>> Subject: Re: Solr5.5:DocValues/CopyField does not work with Atomic
> >>>updates
> >>>
> >>>
> >>>
> >>> We are trying to update Field A.
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> -Karthik
> >>>
> >>>
> >>>
> >>> On Thu, Apr 21, 2016 at 10:36 PM, John Bickerstaff wrote:
> >>>
> >>>> Which field do you try to atomically update?  A or B or some other?
> >>>>
> >>>> On Apr 21, 2016 8:29 PM, "Tirthankar Chatterjee" <tchatter...@commvault.com> wrote:
> >>>>
> >>>> > Hi,
> >>>> > Here is the scenario for SOLR5.5:
> >>>> >
> >>>> > FieldA type= stored=true indexed=true
> >>>> >
> >>>> > FieldB type= stored=false indexed=true docValue=true
> >>>> > usedocvalueasstored=false
> >>>> >
> >>>> > FieldA copyTo FieldB
> >>>> >
> >>>> > Try an Atomic update and we are getting this error:
> >>>> >
> >>>> > possible analysis error: DocValuesField "mtmround" appears more than
> >>>> > once in this document (only one value is allowed per field)
> >>>> >
> >>>> > How do we resolve this.
> >>>> >
> >>>> > ***Legal Disclaimer***
> >>>> > "This communication may contain confidential and privileged material
> >>>> > for the sole use of the intended recipient. Any unauthorized review,
> >>>> > use or distribution by others is strictly prohibited. If you have
> >>>> > received the message by mistake, please advise the sender by reply
> >>>> > email and delete the message. Thank you."
> >>>
> >>
> 

Re: Tuning solr for large index with rapid writes

2016-04-30 Thread Bram Van Dam
> If I'm reading this right, you have 420M docs on a single shard?
> Yep, you were reading it right. 

As Erick mentioned, it's hard to give concrete sizing advice, but we've
found 120M to be the magic number. When a shard contains more than 120M
documents, performance goes down rapidly & GC pauses grow a lot longer.
Up until 250M things remain acceptable, but beyond that performance
drops off very quickly.
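
As a rough illustration of what that rule of thumb means for the 420M-document index discussed above, here is the arithmetic as a small Python sketch. The helper name is made up, and the thresholds are only the numbers from this message, not general sizing guidance.

```python
import math


def shards_needed(total_docs, docs_per_shard):
    """Smallest number of shards that keeps each shard at or under the limit."""
    return math.ceil(total_docs / docs_per_shard)


TOTAL_DOCS = 420_000_000

# Target of 120M documents per shard (the "magic number" above).
comfortable = shards_needed(TOTAL_DOCS, 120_000_000)
# Upper bound of 250M documents per shard (still "acceptable" above).
minimum = shards_needed(TOTAL_DOCS, 250_000_000)
print(comfortable, minimum)
```

By these numbers the index would want 4 shards to stay under 120M each, and at least 2 to stay under 250M.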

 - Bram



Re: Tuning solr for large index with rapid writes

2016-04-30 Thread Bram Van Dam
On 29/04/16 16:33, Erick Erickson wrote:
> You have one huge advantage when doing prototyping, you can
> mine your current logs for real user queries. It's actually
> surprisingly difficult to generate, say, 10,000 "realistic" queries. And
> IMO you need something approaching that number to ensure that
> your queries don't hit the caches etc.

Our approach is to log queries for a while, boil them down to their
different use cases (full text search, simple facet, complex 2D ranged
with stats, etc) and then generate realistic parameter values for each
search field used in those queries. It's not perfect, but it gives you
large amounts of reasonably realistic queries.

Also, you can bypass the query cache by adding {!cache=false} to your query.
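
A minimal sketch of that approach in Python, in case it helps. The templates, field names and sample values are invented for illustration; in practice they would come from the logged queries. Only the {!cache=false} prefix is taken from the text above.

```python
import random

# Hypothetical use cases boiled down from logs, one template each.
TEMPLATES = {
    "full_text": "text:{term}",
    "simple_facet": "category:{category}",
}

# Hypothetical realistic values per search field.
VALUES = {
    "term": ["invoice", "payment", "swift"],
    "category": ["books", "music"],
}


def generate_queries(templates, values, count, seed=0):
    """Fill randomly chosen templates with random values, bypassing the cache."""
    rng = random.Random(seed)  # seeded so a test run can be repeated
    names = sorted(templates)
    queries = []
    for _ in range(count):
        template = templates[rng.choice(names)]
        chosen = {field: rng.choice(options) for field, options in values.items()}
        queries.append("{!cache=false}" + template.format(**chosen))
    return queries


for query in generate_queries(TEMPLATES, VALUES, 5):
    print(query)
```

Seeding the generator means the same query set can be replayed against different configurations.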

 - Bram




Re: Solr 5.2.1 on Java 8 GC

2016-04-30 Thread Bram Van Dam
On 29/04/16 16:40, Nick Vasilyev wrote:
> Not sure if it helps anyone, but I am seeing decent results with the
> following.
> 
> It was mostly a result of trial and error, 

I'm ashamed to admit that I've used a similar approach: wrote a simple
test script to try out various GC settings with various values. Repeat
ad nauseam. Ended with a configuration that works reasonably well on the
environment in question, but will probably fail horribly anywhere else.

When in doubt, use brute force.
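
For the curious, the skeleton of such a brute-force script might look like the Python sketch below. This is not the actual script: the grid values are placeholders, and the measure function is something you would have to supply yourself (for example, restart Solr with the flags, run a load test and return the worst GC pause). MaxGCPauseMillis and NewRatio are ordinary HotSpot options, used here only as examples.

```python
from itertools import product

# Placeholder grid; pick options and values that make sense for your JVM.
GRID = {
    "MaxGCPauseMillis": [100, 250],
    "NewRatio": [2, 3],
}


def candidate_gc_settings(grid):
    """Yield every combination of the grid as a list of -XX: flags."""
    names = sorted(grid)
    for values in product(*(grid[name] for name in names)):
        yield [f"-XX:{name}={value}" for name, value in zip(names, values)]


def best_setting(grid, measure):
    """Return the flag list with the lowest score from the supplied measure."""
    return min(candidate_gc_settings(grid), key=measure)


for flags in candidate_gc_settings(GRID):
    print(" ".join(flags))
```

The number of combinations grows multiplicatively with every option added, which is why this gets tedious quickly.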

 - Bram


Streaming expression for suggester

2016-04-30 Thread Pranaya Behera

Hi,
 I have two collections; let's call them A and B. I want the
suggester to work on both collections while searching from the
front-end application.
In collection A I have 4 different fields, and I want to use all of them for
the suggester. Should I copy them into a new field that combines the 4
fields, use it in the spellcheck component, and then use that field
for the suggester?

In collection B I have only 1 field.

When a user searches for something in the front-end application, I would like
to show results from both collections. Would a streaming expression
be a viable option here? If so, how? I couldn't find any related
documentation for a suggester streaming expression. If not, how should
I approach this?


Re: Schema API

2016-04-30 Thread Hendrik Haddorp
Looks like I ran into the same issue that was discussed here:
http://grokbase.com/t/lucene/solr-user/15c4nr1j48/solrcloud-1-server-1-configset-multiple-collections-multiple-schemas

It would be nice if that changed in the future, as it would make
these setups much easier.
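
To make the confusing part concrete, here is a Python sketch of the kind of Schema API request involved. The base URL, collection name, field name and field type are assumptions for illustration; "add-field" is the Schema API command for adding a field. Note that the collection appears in the URL even though, as described in the quoted message, the change ends up in the config set shared by all collections created from it.

```python
import json


def add_field_request(base_url, collection, name, field_type):
    """Return the URL and JSON body for a Schema API add-field call."""
    url = f"{base_url}/{collection}/schema"
    body = json.dumps(
        {"add-field": {"name": name, "type": field_type, "stored": True}}
    )
    return url, body


# Assumed values; the body would be POSTed as application/json.
url, body = add_field_request(
    "http://localhost:8983/solr", "collection1", "title_txt", "text_general"
)
print(url)
print(body)
```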

On 29/04/16 20:07, Hendrik Haddorp wrote:
> Hi,
>
> I have a Solr Cloud 6 setup with a managed schema. It seems like when I
> create multiple collections from the same config set that they still
> share the same schema. That was rather unexpected, as in the REST and
> SolrJ API I do specify a collection when doing the schema change.
> Looking into what is stored in ZooKeeper I do however only see a config
> name stored for my collections so I guess this is the design. Or am I
> missing something? Do I really need to upload a new config set when I
> want to create a collection with unique fields? If so I would need to
> make sure to delete that once I delete my collection. Seems a bit odd
> and complicated to me.
>
> SolrJ is also behaving strangely. When I try to add multiple fields using
> a MultiUpdate request where two fields already exist and one is new I
> get no error back (getStatus() == 0) while the response object contains
> error messages for the fields that existed already and the new field did
> not get added.
>
> regards,
> Hendrik



Re: Error - Too many close [count:-1]

2016-04-30 Thread Reth RM
Could you please provide some more background on this issue? Was it reported
while indexing or querying? What is the version of Solr?


On Sat, Apr 30, 2016 at 12:04 AM, Vipul Gupta wrote:

> Solr team - Any pointers on fixing this issue ?
>
> [10:29:08] ERROR 0-thread-7 o.a.s.c.SolrCore <> Too many close [count:-1]
> on
> org.apache.solr.core.SolrCore@3d6f8ad3. Please report this exception to
> solr-user@lucene.apache.org
>