Re: [Non-DoD Source] Re: Solr 6.1.0 issue (UNCLASSIFIED)

2016-08-05 Thread Erick Erickson
You also need to find out _why_ you're trying to index such huge
tokens; they indicate that something you're ingesting isn't
reasonable.

Just truncating the input will index things, true. But a 32K token is
unexpected and indicates that what's in your index may not be what you
expect and may not be useful.

But you know best what you're indexing; this is just a general statement.

Erick

On Fri, Aug 5, 2016 at 12:55 PM, Musshorn, Kris T CTR USARMY RDECOM
ARL (US)  wrote:
> CLASSIFICATION: UNCLASSIFIED
>
> What I did was force Nutch to truncate content to a maximum of 32765 bytes before
> indexing into Solr, and that solved my problem.
>
>
> Thanks,
> Kris
>
> ~~
> Kris T. Musshorn
> FileMaker Developer - Contractor – Catapult Technology Inc.
> US Army Research Lab
> Aberdeen Proving Ground
> Application Management & Development Branch
> 410-278-7251
> kris.t.musshorn@mail.mil
> ~~
>
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Friday, August 05, 2016 3:29 PM
> To: solr-user 
> Subject: [Non-DoD Source] Re: Solr 6.1.0 issue (UNCLASSIFIED)
>
> All active links contained in this email were disabled.  Please verify the 
> identity of the sender, and confirm the authenticity of all links contained 
> within the message prior to copying and pasting the address to a Web browser.
>
>
>
>
> 
>
> What that error is telling you is that you have an unanalyzed term that is, 
> well, huge (i.e. > 32K). Is your "content" field by chance a "string" type? 
> It's very rare that a term > 32K is actually useful.
> You can't search on it except with, say, wildcards, and there's no stemming, etc. 
> So the first question is whether the "content" field is appropriately defined 
> in your schema for your use case.
>
> If your content field is some kind of text-based field (i.e.
> solr.TextField), then the second issue may be that you just have wonky data 
> coming in, say a base64-encoded image or something scraped from somewhere. 
> In that case you need to NOT index it. You can try LengthFilterFactory; see:
> https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory
>
> This is a fundamental limitation enforced at the Lucene layer, so if that 
> doesn't work, the only real solution is "don't do that". You'll have to 
> intercept the doc and omit that data, perhaps write a custom update processor 
> to throw out huge fields or the like.
>
> Best,
> Erick
>
>
> On Fri, Aug 5, 2016 at 10:59 AM, Musshorn, Kris T CTR USARMY RDECOM ARL (US) 
>  wrote:
>> CLASSIFICATION: UNCLASSIFIED
>>
>> I am trying to index from Nutch 1.12 to Solr 6.1.0.
>> I got this error:
>> java.lang.Exception:
>> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException:
>> Error from server at http://localhost:8983/solr/ARLInside:
>> Exception writing document id
>> https://emcstage.arl.army.mil/inside/fellows/corner/research.vol.3.2/index.cfm
>> to the index; possible analysis error: Document
>> contains at least one immense term in field="content" (whose UTF8
>> encoding is longer than the max length 32766
>>
>> How do I correct this?
>>
>> Thanks,
>> Kris
>>
>> ~~
>> Kris T. Musshorn
>> FileMaker Developer - Contractor - Catapult Technology Inc.
>> US Army Research Lab
>> Aberdeen Proving Ground
>> Application Management & Development Branch
>> 410-278-7251
>> kris.t.musshorn@mail.mil
>> ~~
>>
>>
>>
>> CLASSIFICATION: UNCLASSIFIED
>
>
> CLASSIFICATION: UNCLASSIFIED


RE: [Non-DoD Source] Re: Solr 6.1.0 issue (UNCLASSIFIED)

2016-08-05 Thread Musshorn, Kris T CTR USARMY RDECOM ARL (US)
CLASSIFICATION: UNCLASSIFIED

What I did was force Nutch to truncate content to a maximum of 32765 bytes before
indexing into Solr, and that solved my problem.


Thanks,
Kris

~~
Kris T. Musshorn
FileMaker Developer - Contractor – Catapult Technology Inc.  
US Army Research Lab 
Aberdeen Proving Ground 
Application Management & Development Branch 
410-278-7251
kris.t.musshorn@mail.mil
~~


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Friday, August 05, 2016 3:29 PM
To: solr-user 
Subject: [Non-DoD Source] Re: Solr 6.1.0 issue (UNCLASSIFIED)

All active links contained in this email were disabled.  Please verify the 
identity of the sender, and confirm the authenticity of all links contained 
within the message prior to copying and pasting the address to a Web browser.  






What that error is telling you is that you have an unanalyzed term that is, 
well, huge (i.e. > 32K). Is your "content" field by chance a "string" type? 
It's very rare that a term > 32K is actually useful.
You can't search on it except with, say, wildcards, and there's no stemming, etc. So 
the first question is whether the "content" field is appropriately defined in 
your schema for your use case.

If your content field is some kind of text-based field (i.e.
solr.TextField), then the second issue may be that you just have wonky data 
coming in, say a base64-encoded image or something scraped from somewhere. In 
that case you need to NOT index it. You can try LengthFilterFactory; see:
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory

This is a fundamental limitation enforced at the Lucene layer, so if that 
doesn't work, the only real solution is "don't do that". You'll have to 
intercept the doc and omit that data, perhaps write a custom update processor 
to throw out huge fields or the like.

Best,
Erick


On Fri, Aug 5, 2016 at 10:59 AM, Musshorn, Kris T CTR USARMY RDECOM ARL (US) 
 wrote:
> CLASSIFICATION: UNCLASSIFIED
>
> I am trying to index from Nutch 1.12 to Solr 6.1.0.
> I got this error:
> java.lang.Exception: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
> Error from server at http://localhost:8983/solr/ARLInside: 
> Exception writing document id 
> https://emcstage.arl.army.mil/inside/fellows/corner/research.vol.3.2/index.cfm
> to the index; possible analysis error: Document 
> contains at least one immense term in field="content" (whose UTF8 
> encoding is longer than the max length 32766
>
> How do I correct this?
>
> Thanks,
> Kris
>
> ~~
> Kris T. Musshorn
> FileMaker Developer - Contractor - Catapult Technology Inc.
> US Army Research Lab
> Aberdeen Proving Ground
> Application Management & Development Branch
> 410-278-7251
> kris.t.musshorn@mail.mil
> ~~
>
>
>
> CLASSIFICATION: UNCLASSIFIED


CLASSIFICATION: UNCLASSIFIED


Re: Solr 6.1.0 issue (UNCLASSIFIED)

2016-08-05 Thread Erick Erickson
What that error is telling you is that you have an unanalyzed term
that is, well, huge (i.e. > 32K). Is your "content" field by chance a
"string" type? It's very rare that a term > 32K is actually useful.
You can't search on it except with, say, wildcards, and there's no stemming,
etc. So the first question is whether the "content" field is
appropriately defined in your schema for your use case.

If your content field is some kind of text-based field (i.e.
solr.TextField), then the second issue may be that you just have wonky
data coming in, say a base64-encoded image or something scraped from
somewhere. In that case you need to NOT index it. You can try
LengthFilterFactory; see:
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.LengthFilterFactory
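For illustration, a LengthFilterFactory entry goes into the analyzer chain of the field
type backing "content", so oversized tokens are dropped before they ever reach Lucene.
A minimal sketch; the field type name and the min/max values are examples, not taken
from this thread:

    <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
      <analyzer>
        <tokenizer class="solr.StandardTokenizerFactory"/>
        <!-- silently drop tokens shorter than 1 or longer than 255 characters -->
        <filter class="solr.LengthFilterFactory" min="1" max="255"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>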

This is a fundamental limitation enforced at the Lucene layer, so if
that doesn't work, the only real solution is "don't do that". You'll
have to intercept the doc and omit that data, perhaps write a custom
update processor to throw out huge fields or the like.
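As a sketch of that route: Solr ships a TruncateFieldUpdateProcessorFactory that can cap
a field's value before it is indexed, which may be enough without writing a fully custom
processor. The chain name, field name, and limit below are illustrative only:

    <updateRequestProcessorChain name="truncate-content">
      <!-- cap the value of the "content" field at 32000 characters -->
      <processor class="solr.TruncateFieldUpdateProcessorFactory">
        <str name="fieldName">content</str>
        <int name="maxLength">32000</int>
      </processor>
      <processor class="solr.LogUpdateProcessorFactory"/>
      <processor class="solr.RunUpdateProcessorFactory"/>
    </updateRequestProcessorChain>

The chain then needs to be referenced from the update handler (or selected per request
with the update.chain parameter) to take effect.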

Best,
Erick


On Fri, Aug 5, 2016 at 10:59 AM, Musshorn, Kris T CTR USARMY RDECOM
ARL (US)  wrote:
> CLASSIFICATION: UNCLASSIFIED
>
> I am trying to index from Nutch 1.12 to Solr 6.1.0.
> I got this error:
> java.lang.Exception: 
> org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
> from server at http://localhost:8983/solr/ARLInside: Exception writing 
> document id 
> https://emcstage.arl.army.mil/inside/fellows/corner/research.vol.3.2/index.cfm
>  to the index; possible analysis error: Document contains at least one 
> immense term in field="content" (whose UTF8 encoding is longer than the max 
> length 32766
>
> How do I correct this?
>
> Thanks,
> Kris
>
> ~~
> Kris T. Musshorn
> FileMaker Developer - Contractor - Catapult Technology Inc.
> US Army Research Lab
> Aberdeen Proving Ground
> Application Management & Development Branch
> 410-278-7251
> kris.t.musshorn@mail.mil
> ~~
>
>
>
> CLASSIFICATION: UNCLASSIFIED


Solr 6.1.0 issue (UNCLASSIFIED)

2016-08-05 Thread Musshorn, Kris T CTR USARMY RDECOM ARL (US)
CLASSIFICATION: UNCLASSIFIED

I am trying to index from Nutch 1.12 to Solr 6.1.0.
I got this error:
java.lang.Exception: 
org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error 
from server at http://localhost:8983/solr/ARLInside: Exception writing document 
id 
https://emcstage.arl.army.mil/inside/fellows/corner/research.vol.3.2/index.cfm 
to the index; possible analysis error: Document contains at least one immense 
term in field="content" (whose UTF8 encoding is longer than the max length 32766

How do I correct this?

Thanks,
Kris

~~
Kris T. Musshorn
FileMaker Developer - Contractor - Catapult Technology Inc.  
US Army Research Lab 
Aberdeen Proving Ground 
Application Management & Development Branch 
410-278-7251
kris.t.musshorn@mail.mil
~~



CLASSIFICATION: UNCLASSIFIED

Re: Problems using fieldType text_general in copyField

2016-08-05 Thread John Bickerstaff
Many thanks for the assistance, Hoss!  After a couple of bumps, it worked
great.



I followed the recommendations (and read the explanation - thanks!)

I swear it threw the error once again at first, so just to be sure I rebooted
everything (Zookeeper included), then reloaded the configs into Zookeeper
and restarted my Solr servers - at that point the errors disappeared and
everything worked.

This will make upgrading super easy for us.  Given the relatively small
size of our data set, we have the luxury of just creating new Solr 6.1
instances in AWS, making a new node in Zookeeper, creating a collection,
adding the custom_schema file as you described and loading the data into
Solr from our Kafka store.  Gotta love it when your complete indexing into
Solr is in the neighborhood of two hours rather than two days or two weeks!

On Thu, Aug 4, 2016 at 8:42 PM, Alexandre Rafalovitch 
wrote:

> Just as a note, TYPO3 uses a lot of include files, though I do not remember
> which specific mechanism they rely on.
>
> Regards,
> Alex
>
> On 5 Aug 2016 10:51 AM, "John Bickerstaff" 
> wrote:
>
> > Many thanks for your time!  Yes, it does make sense.
> >
> > I'll give your recommendation a shot tomorrow and update the thread.
> >
> > On Aug 4, 2016 6:22 PM, "Chris Hostetter" 
> > wrote:
> >
> >
> > TL;DR: use entity includes *WITHOUT TOP-LEVEL WRAPPER ELEMENTS* like in
> > this example...
> >
> > https://github.com/apache/lucene-solr/blob/master/solr/core/src/test-files/solr/collection1/conf/schema-snippet-types.incl
> > https://github.com/apache/lucene-solr/blob/master/solr/core/src/test-files/solr/collection1/conf/schema-xinclude.xml
> >
> >
> > : The file I pasted last time is the file I was trying to include into the
> > : main schema.xml.  It was when that file was getting processed that I got
> > : the error  ['content' is not a glob and doesn't match any explicit field or
> > : dynamicField. ]
> >
> > Ok -- so just to be crystal clear, you have two files, that look roughly
> > like this...
> >
> > --- BEGIN schema.xml ---
> > <schema ... version="1.5">
> >   ...
> >   <xi:include href="statdx_custom_schema.xml"
> >       xmlns:xi="http://www.w3.org/2001/XInclude"/>
> > </schema>
> > --- END schema.xml ---
> >
> > --- BEGIN statdx_custom_schema.xml ---
> > <schema ... version="1.6">
> >   ...
> > </schema>
> > --- END statdx_custom_schema.xml ---
> >
> > ...am I correct?
> >
> >
> > I'm going to skip a lot of the nitty gritty and just summarize by saying
> > that ultimately there are 2 problems here that combine to lead to the
> > error you are getting:
> >
> > 1) what you are trying to do as far as the xinclude is not really what
> > xinclude is designed for and doesn't work the way you (or any other sane
> > person) would think it does.
> >
> > 2) for historical reasons, Solr is being sloppy in what <copyField>
> > entries it recognizes.  If anything the "bug" is that Solr is
> > willing to try to load any parts of your include file at all -- if it were
> > behaving consistently it should be ignoring all of it.
> >
> >
> > Ok ... that seems terse, I'll clarify with a little of the nitty gritty...
> >
> >
> > The root of the issue is really something you alluded to earlier that
> > didn't make sense to me at the time because I didn't realize you were
> > showing us the *includED* file when you said it...
> >
> > >>> I assumed (perhaps wrongly) that I could duplicate the <schema> ... </schema>
> > >>> arrangement from the schema.xml file.
> >
> > ...that assumption is the crux of the problem, because when the XML parser
> > evaluates your xinclude, what it produces is functionally equivalent to if
> > you had a schema.xml file that looked like this
> >
> > --- BEGIN EFFECTIVE schema.xml ---
> > <schema ... version="1.5">
> >   ...
> >   <schema ... version="1.6">
> >     ...
> >   </schema>
> > </schema>
> > --- END EFFECTIVE schema.xml ---
> >
> > ...that extra <schema> element nested inside of the original <schema>
> > element is what's confusing the hell out of Solr.  The <field> and
> > <fieldType> parsing is fairly strict, and only expects to find them as top
> > level elements (or, for historical purposes, as children of <fields> and
> > <types> -- note the plurals) while the <copyField> parsing is sloppy and
> > finds the one that gives you an error.
> >
> > (Even if the <field> and <fieldType> parsing was equally sloppy, only the
> > outermost <schema> tag would be recognized, so your default field props
> > would be based on the version="1.5" declaration, not the version="1.6"
> > declaration of the included file they'd be in ... which would be confusing
> > as hell, so it's a good thing Solr isn't sloppy about that parsing too)
> >
> >
> > In contrast to xincludes, XML Entity includes are (almost as a side effect
> > of the triviality of their design) vastly superior 90% of the time, and
> > capable of doing what you want.  The key diff being that Entity includes
> > do not require that the file being included is valid XML -- it can be an
> > arbitrary snippet of XML content (w/o a top level element) that will be
> > inlined verbatim.  So you can/should do something like this...
> >
> > --- BEGIN 
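A minimal sketch of the entity-include approach described above; the file name, entity
name, and field definitions here are illustrative, not taken from the thread:

    --- schema.xml ---
    <?xml version="1.0" encoding="UTF-8"?>
    <!DOCTYPE schema [
      <!-- declare the snippet file as an external entity -->
      <!ENTITY customfields SYSTEM "statdx_custom_schema.incl">
    ]>
    <schema name="example" version="1.5">
      <!-- existing fields, types, copyFields ... -->
      &customfields;
    </schema>

    --- statdx_custom_schema.incl ---
    <!-- bare snippet: no <schema> wrapper element, inlined verbatim -->
    <field name="content" type="text_general" indexed="true" stored="true"/>
    <copyField source="content" dest="text"/>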

Re: Solr Cloud with 5 servers cluster failed due to Leader out of memory

2016-08-05 Thread Erick Erickson
You don't really have to worry that much about memory consumed during indexing.
The ramBufferSizeMB setting in solrconfig.xml pretty much limits the amount of
RAM consumed: when adding a doc, if that limit is exceeded, the
buffer is flushed.
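For reference, a sketch of where that setting lives in solrconfig.xml (the value shown
is the default):

    <indexConfig>
      <ramBufferSizeMB>100</ramBufferSizeMB>
    </indexConfig>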

So you can reduce that number, but its default is 100MB, and if you're running
that close to your limits I suspect you'd get, at best, a bit more runway before
you hit the problem again.

NOTE: that number isn't an absolute limit; IIUC the algorithm is:
> index a doc into the in-memory structures
> check if the limit is exceeded and flush if so.

So say you were at 99% of your ramBufferSizeMB setting and then indexed a
ginormous doc - your in-memory stuff might end up significantly bigger.

Searching usually is the bigger RAM consumer, so when I say "a bit more runway"
what I'm thinking about is that when you start _searching_ the data your memory
requirements will continue to grow and you'll be back where you started.

And just as a sanity check: you didn't perchance increase the maxWarmingSearchers
parameter in solrconfig.xml, did you? If so, that's really a red flag.

Best,
Erick

On Fri, Aug 5, 2016 at 12:41 AM, Tim Chen  wrote:
> Thanks Guys. Very very helpful.
>
> I will probably look at consolidating the 4 Solr servers into 2 bigger/better 
> servers - that gives more memory, and it cuts down the replicas the Leader needs 
> to manage.
>
> Also, I may look into writing a script to monitor the Tomcat log and, if there 
> is an OOM, kill Tomcat, then restart it. A bit dirty, but it may work for the short 
> term.
>
> I don't know too much about how documents are indexed, and how to save memory 
> from that. Will probably work with a developer on this as well.
>
> Many Thanks guys.
>
> Cheers,
> Tim
>
> -Original Message-
> From: Shawn Heisey [mailto:apa...@elyograg.org]
> Sent: Friday, 5 August 2016 4:55 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Solr Cloud with 5 servers cluster failed due to Leader out of 
> memory
>
> On 8/4/2016 8:14 PM, Tim Chen wrote:
>> Couple of thoughts: 1, If Leader goes down, it should just go down,
>> like dead down, so other servers can do the election and choose the
>> new leader. This at least avoids bringing down the whole cluster. Am I
>> right?
>
> Supplementing what Erick told you:
>
> When a typical Java program throws OutOfMemoryError, program behavior is 
> completely unpredictable.  There are programming techniques that can be used 
> so that behavior IS predictable, but writing that code can be challenging.
>
> Solr 5.x and 6.x, when they are started on a UNIX/Linux system, use a Java 
> option to execute a script when OutOfMemoryError happens.  This script kills 
> Solr completely.  We are working on adding this capability when running on 
> Windows.
>
>> 2, Apparently we should not push too many documents to Solr, how do
>> you guys handle this? Set a limit somewhere?
>
> There are exactly two ways to deal with OOME problems: Increase the heap or 
> reduce Solr's memory requirements.  The number of documents you push to Solr 
> is unlikely to have a large effect on the amount of memory that Solr 
> requires.  Here's some information on this topic:
>
> https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap
>
> Thanks,
> Shawn
>
>
>


Re: TooManyClauses: maxClauseCount is set to 1024

2016-08-05 Thread Erick Erickson
Another alternative is to use the TermsQueryParser, which is intended to
deal with very large lists of values that should be ORed together. It may
be useful if your query pattern matches its intent; see:
https://cwiki.apache.org/confluence/display/solr/Other+Parsers#OtherParsers-TermsQueryParser
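For illustration, a terms query ORs a long list of values against a single field without
expanding into one Boolean clause per value; the field name and values below are made up:

    q={!terms f=id}12,35,52,187,9981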

Best,
Erick


On Fri, Aug 5, 2016 at 8:23 AM, Shawn Heisey  wrote:
> On 8/3/2016 12:49 AM, liubiaoxin1 wrote:
>> set every core's solrconfig.xml: <maxBooleanClauses>4096</maxBooleanClauses>
>
> Are you absolutely certain that you have set maxBooleanClauses on
> *EVERY* core in that Solr instance?
>
> This value is global, across the entire JVM, and the last core that
> starts will set the value for all cores.
>
> It is not possible to explicitly control the exact starting order of
> your cores.  If the config option is missing from the last core that
> gets started, then the global setting will be reset back to 1024.
>
> I tried to address this once, but the change was vetoed, and by Apache's
> rules, I wasn't allowed to do it.
>
> https://issues.apache.org/jira/browse/SOLR-4586
>
> I hope to try again and make this situation better for Solr.
>
> Thanks,
> Shawn
>


Re: TooManyClauses: maxClauseCount is set to 1024

2016-08-05 Thread Shawn Heisey
On 8/3/2016 12:49 AM, liubiaoxin1 wrote:
> set every core's solrconfig.xml: <maxBooleanClauses>4096</maxBooleanClauses>

Are you absolutely certain that you have set maxBooleanClauses on
*EVERY* core in that Solr instance?

This value is global, across the entire JVM, and the last core that
starts will set the value for all cores.

It is not possible to explicitly control the exact starting order of
your cores.  If the config option is missing from the last core that
gets started, then the global setting will be reset back to 1024.
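For reference, a sketch of the setting as it appears in the <query> section of each
core's solrconfig.xml (the value is illustrative):

    <query>
      <maxBooleanClauses>4096</maxBooleanClauses>
    </query>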

I tried to address this once, but the change was vetoed, and by Apache's
rules, I wasn't allowed to do it.

https://issues.apache.org/jira/browse/SOLR-4586

I hope to try again and make this situation better for Solr.

Thanks,
Shawn



Re: Can a MergeStrategy filter returned docs?

2016-08-05 Thread tedsolr
I don't see any field level data exposed in the SolrDocumentList I get from
shardResponse.getSolrResponse().getResponse().get("response"). I see the
unique ID field and value. Is that by design or am I being stupid?

Separate but related question: the mergeIds() method in the merge strategy
class - when TRUE, the developer is taking responsibility for the document
merge; when FALSE, it looks like the QueryComponent puts all the results in a
sorted queue and removes the "extras" - right? When rows=3, each shard
returns 3 docs, but the user only wants 3 total, not 3 per shard. So, if I
set mergeIds=FALSE I won't have to re-sort the docs, just eliminate the dupes
somehow.


Joel Bernstein wrote
> Collapse will have dups unless you use the _route_ parameter to co-locate
> documents with the same group onto the same shard.
> 
> In your scenario, co-locating docs sounds like it won't work because you
> may have different grouping criteria.
> 
> The doc counts would be inflated unless you sent all the documents from the
> shards to be merged and then de-duped them, which is how streaming
> operates. But streaming has the capability to do these types of operations
> in parallel and the merge strategy does not.







Re: Solr 6: Use facet with Streaming Expressions- LeftOuterJoin

2016-08-05 Thread Joel Bernstein
If you need to aggregate after the join then you'll need to use the
rollup() function.

The rollup function requires the tuples to be sorted by the group-by fields.
So it's easiest to accomplish this using outerHashJoin, which doesn't
require a sort on the join keys.

If you're doing a parallel join, you'll need to wrap the rollup() around
the parallel() function unless the partitionKeys for the join are the same as
the rollup group-by.

Here is the pseudocode for a non-parallel join then rollup:

rollup(outerHashJoin(search(), search()))

Here is the pseudocode for a parallel join then rollup:

rollup(parallel(outerHashJoin(search(), search())))


In both cases the searches should be sorted by the rollup() group by
fields. In the parallel case, the partitionKeys need to be the join keys.
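As an illustration only, a rollup over an outer hash join might look like the sketch
below; the collection names, fields, and metrics are invented, and the left-hand search
is sorted by the group-by field as described above:

    rollup(
      outerHashJoin(
        search(orders, q="*:*", fl="custId,region,amount", sort="region asc"),
        hashed=search(customers, q="*:*", fl="custId,name", sort="custId asc"),
        on="custId"
      ),
      over="region",
      sum(amount),
      count(*)
    )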








Joel Bernstein
http://joelsolr.blogspot.com/

On Fri, Aug 5, 2016 at 7:21 AM, vrindavda  wrote:

> Hello,
> I have two collections and need to join the results on uniqueIds.
>
> I am able to do that with Streaming Expressions- LeftOuterJoin. Is there
> any
> way to use facets along with this?
>
>
>
>
>


Re: How can I set the defaultOperator to be AND?

2016-08-05 Thread Bastien Latard | MDPI AG

Hi Steve,

I read the thread you sent me (SOLR-8812) and it seems that 6.1 
includes this fix, as you said.

I will upgrade.
Thank you!

Kind regards,
Bast

On 05/08/2016 14:37, Steve Rowe wrote:

Hi Bastien,

Have you tried upgrading to 6.1?  SOLR-8812, mentioned earlier in the thread, 
was released with 6.1, and is directly aimed at fixing the problem you are 
having in 6.0 (also a problem in 5.5): when mm is not explicitly provided and 
the query contains explicit operators (except for AND), edismax now sets mm=0.

--
Steve
www.lucidworks.com


On Aug 5, 2016, at 2:34 AM, Bastien Latard | MDPI AG  
wrote:

Hi Erick & others,
Is there any way to override the default OP when we use edismax?
Because adding the following line to solrconfig.xml doesn't solve the problem:


(Then if I do "q=black OR white", this always gives the results for "black AND 
white")

I did not find a way to define a default OP that is automatically overridden 
by the AND/OR in a query.


Example - Debug: defaultOP in solrconfig = AND / q=a or b


==> results for black AND white
The correct result should be the following (but I had to force the q.op):

==> I cannot do this in case I want to do "(a AND b) OR c"...


Kind regards,
Bastien

On 27/04/2016 05:30, Erick Erickson wrote:

Defaulting to "OR" has been the behavior since forever, so changing the behavior now is 
just not going to happen. Making it fit a new version of "correct" will change the 
behavior for every application out there that has not specified the default behavior.

There's no a-priori reason to expect "more words to equal fewer docs"; I can just as 
easily argue that "more words should return more docs". Which you expect depends on your 
mental model.

And providing the default op in your solrconfig.xml request handlers allows you 
to implement whatever model your application chooses...

Best,
Erick

On Mon, Apr 25, 2016 at 11:32 PM, Bastien Latard - MDPI AG 
 wrote:
Thank you Shawn, Jan and Georg for your answers.

Yes, it seems that if I simply remove the defaultOperator it works well for 
"composed queries" like '(a:x AND b:y) OR c:z'.
But I think that the default Operator should/could be the AND.

Because when I add an extra search word, I expect that the results get more 
accurate...
(It seems to be what Google is also doing now)

Otherwise, if you make a search and apply another filter (e.g. sort by 
publication date, facets, ...), the user can get a less relevant item (only 1 
word in 4 matching) in first position only because of its date...

What do you think?


Kind regards,
Bastien


On 25/04/2016 14:53, Shawn Heisey wrote:

On 4/25/2016 6:39 AM, Bastien Latard - MDPI AG wrote:


Remember:
If I add the following line to the schema.xml, even if I do a search
'title:"test" OR author:"me"', it will returns documents matching
'title:"test" AND author:"me"':



The settings in the schema for default field and default operator were
deprecated a long time ago.  I actually have no idea whether they are
even supported in newer Solr versions.

The q.op parameter controls the default operator, and the df parameter
controls the default field.  These can be set in the request handler
definition in solrconfig.xml -- usually in "defaults" but there might be
reason to put them in "invariants" instead.

If you're using edismax, you'd be better off using the mm parameter
rather than the q.op parameter.  The behavior you have described above
sounds like a change in behavior (some call it a bug) introduced in the
5.5 version:


https://issues.apache.org/jira/browse/SOLR-8812


If you are using edismax, I suspect that if you set mm=100% instead of
q.op=AND (or the schema default operator) that the problem might go away
... but I am not sure.  Someone who is more familiar with SOLR-8812
probably should comment.

Thanks,
Shawn
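As a sketch of the request-handler approach described above: q.op, df, and (for edismax)
mm can all be set in the "defaults" section of the handler in solrconfig.xml. The handler
name, field name, and values are illustrative:

    <requestHandler name="/select" class="solr.SearchHandler">
      <lst name="defaults">
        <str name="defType">edismax</str>
        <!-- either a default operator ... -->
        <str name="q.op">AND</str>
        <!-- ... or, with edismax, a minimum-should-match setting instead -->
        <str name="mm">100%</str>
        <str name="df">content</str>
      </lst>
    </requestHandler>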





Re: How can I set the defaultOperator to be AND?

2016-08-05 Thread Steve Rowe
Hi Bastien,

Have you tried upgrading to 6.1?  SOLR-8812, mentioned earlier in the thread, 
was released with 6.1, and is directly aimed at fixing the problem you are 
having in 6.0 (also a problem in 5.5): when mm is not explicitly provided and 
the query contains explicit operators (except for AND), edismax now sets mm=0.

--
Steve
www.lucidworks.com

> On Aug 5, 2016, at 2:34 AM, Bastien Latard | MDPI AG 
>  wrote:
> 
> Hi Erick & others,
> Is there any way to override the default OP when we use edismax?
> Because adding the following line to solrconfig.xml doesn't solve the problem:
> 
> 
> (Then if I do "q=black OR white", this always gives the results for "black 
> AND white")
> 
> I did not find a way to define a default OP that is automatically 
> overridden by the AND/OR in a query.
> 
> 
> Example - Debug: defaultOP in solrconfig = AND / q=a or b
> 
> 
> ==> results for black AND white
> The correct result should be the following (but I had to force the q.op):
> 
> ==> I cannot do this in case I want to do "(a AND b) OR c"...
> 
> 
> Kind regards,
> Bastien
> 
> On 27/04/2016 05:30, Erick Erickson wrote:
>> Defaulting to "OR" has been the behavior since forever, so changing the 
>> behavior now is just not going to happen. Making it fit a new version of 
>> "correct" will change the behavior for every application out there that has 
>> not specified the default behavior.
>> 
>> There's no a-priori reason to expect "more words to equal fewer docs"; I can 
>> just as easily argue that "more words should return more docs". Which you 
>> expect depends on your mental model.
>> 
>> And providing the default op in your solrconfig.xml request handlers allows 
>> you to implement whatever model your application chooses...
>> 
>> Best,
>> Erick
>> 
>> On Mon, Apr 25, 2016 at 11:32 PM, Bastien Latard - MDPI AG 
>>  wrote:
>> Thank you Shawn, Jan and Georg for your answers.
>> 
>> Yes, it seems that if I simply remove the defaultOperator it works well for 
>> "composed queries" like '(a:x AND b:y) OR c:z'.
>> But I think that the default Operator should/could be the AND.
>> 
>> Because when I add an extra search word, I expect that the results get more 
>> accurate...
>> (It seems to be what Google is also doing now)
>> 
>> Otherwise, if you make a search and apply another filter (e.g. sort by 
>> publication date, facets, ...), the user can get a less relevant item (only 1 
>> word in 4 matching) in first position only because of its date...
>> 
>> What do you think?
>> 
>> 
>> Kind regards,
>> Bastien
>> 
>> 
>> On 25/04/2016 14:53, Shawn Heisey wrote:
>>> On 4/25/2016 6:39 AM, Bastien Latard - MDPI AG wrote:
>>> 
 Remember:
 If I add the following line to the schema.xml, even if I do a search
 'title:"test" OR author:"me"', it will returns documents matching
 'title:"test" AND author:"me"':
  
 
>>> The settings in the schema for default field and default operator were
>>> deprecated a long time ago.  I actually have no idea whether they are
>>> even supported in newer Solr versions.
>>> 
>>> The q.op parameter controls the default operator, and the df parameter
>>> controls the default field.  These can be set in the request handler
>>> definition in solrconfig.xml -- usually in "defaults" but there might be
>>> reason to put them in "invariants" instead.
>>> 
>>> If you're using edismax, you'd be better off using the mm parameter
>>> rather than the q.op parameter.  The behavior you have described above
>>> sounds like a change in behavior (some call it a bug) introduced in the
>>> 5.5 version:
>>> 
>>> 
>>> https://issues.apache.org/jira/browse/SOLR-8812
>>> 
>>> 
>>> If you are using edismax, I suspect that if you set mm=100% instead of
>>> q.op=AND (or the schema default operator) that the problem might go away
>>> ... but I am not sure.  Someone who is more familiar with SOLR-8812
>>> probably should comment.
>>> 
>>> Thanks,
>>> Shawn
>>> 
>>> 
>>> 
> 



Re: Streaming expressions malfunctioning

2016-08-05 Thread vrindavda
Hello,

I am looking for a similar use case. Would it be possible for you to share the
corrected syntax?





Solr 6: Use facet with Streaming Expressions- LeftOuterJoin

2016-08-05 Thread vrindavda
Hello,
I have two collections and need to join the results on uniqueIds.

I am able to do that with Streaming Expressions- LeftOuterJoin. Is there any
way to use facets along with this?







RE: Solr Cloud with 5 servers cluster failed due to Leader out of memory

2016-08-05 Thread Tim Chen
Thanks Guys. Very very helpful.

I will probably look at consolidating the 4 Solr servers into 2 bigger/better servers 
- that gives more memory, and it cuts down the replicas the Leader needs to manage.

Also, I may look into writing a script to monitor the Tomcat log and, if there is an 
OOM, kill Tomcat, then restart it. A bit dirty, but it may work for the short term.

I don't know too much about how documents are indexed, and how to save memory from 
that. I will probably work with a developer on this as well.

Many Thanks guys.

Cheers,
Tim

-Original Message-
From: Shawn Heisey [mailto:apa...@elyograg.org]
Sent: Friday, 5 August 2016 4:55 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Cloud with 5 servers cluster failed due to Leader out of 
memory

On 8/4/2016 8:14 PM, Tim Chen wrote:
> Couple of thoughts: 1, If Leader goes down, it should just go down,
> like dead down, so other servers can do the election and choose the
> new leader. This at least avoids bringing down the whole cluster. Am I
> right?

Supplementing what Erick told you:

When a typical Java program throws OutOfMemoryError, program behavior is 
completely unpredictable.  There are programming techniques that can be used so 
that behavior IS predictable, but writing that code can be challenging.

Solr 5.x and 6.x, when they are started on a UNIX/Linux system, use a Java 
option to execute a script when OutOfMemoryError happens.  This script kills 
Solr completely.  We are working on adding this capability when running on 
Windows.

> 2, Apparently we should not push too many documents to Solr, how do
> you guys handle this? Set a limit somewhere?

There are exactly two ways to deal with OOME problems: Increase the heap or 
reduce Solr's memory requirements.  The number of documents you push to Solr is 
unlikely to have a large effect on the amount of memory that Solr requires.  
Here's some information on this topic:

https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

Thanks,
Shawn





Re: Solr Cloud with 5 servers cluster failed due to Leader out of memory

2016-08-05 Thread Shawn Heisey
On 8/4/2016 8:14 PM, Tim Chen wrote:
> Couple of thoughts: 1, If Leader goes down, it should just go down,
> like dead down, so other servers can do the election and choose the
> new leader. This at least avoids bringing down the whole cluster. Am I
> right? 

Supplementing what Erick told you:

When a typical Java program throws OutOfMemoryError, program behavior is
completely unpredictable.  There are programming techniques that can be
used so that behavior IS predictable, but writing that code can be
challenging.

Solr 5.x and 6.x, when they are started on a UNIX/Linux system, use a
Java option to execute a script when OutOfMemoryError happens.  This
script kills Solr completely.  We are working on adding this capability
when running on Windows.

> 2, Apparently we should not push too many documents to Solr, how do
> you guys handle this? Set a limit somewhere? 

There are exactly two ways to deal with OOME problems: Increase the heap
or reduce Solr's memory requirements.  The number of documents you push
to Solr is unlikely to have a large effect on the amount of memory that
Solr requires.  Here's some information on this topic:

https://wiki.apache.org/solr/SolrPerformanceProblems#Java_Heap

Thanks,
Shawn