Re: Feedback on Match Query Parser (for fixing multiterm synonyms and other things)

2016-09-02 Thread Doug Turnbull
Thanks Alan, that might be doable.

What would the outer query look like? A big dismax across the fields with a
tie parameter (à la Elasticsearch's most_fields/best_fields)?

If you did that, you'd probably also want to control the query parser's
behavior per field. For example, one field you might want to analyze with
synonyms and search with phrases; on another perhaps you're doing a bigram
search, etc. Perhaps something like faceting's field-specific local params
(in the f.<field>.param style) would be needed?
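
Just to make the idea concrete, here's a rough sketch of the outer query
using Solr's nested-query syntax -- note the match parser's local params
below are hypothetical placeholders, not its actual API, and this is a
plain boolean OR rather than a true dismax with a tie:

  q=_query_:"{!match qf=title v=$qq}" OR _query_:"{!match qf=body v=$qq}"&qq=sea biscuit

Each _query_ clause could then carry its own per-field analysis settings
via local params.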

-Doug


On Fri, Sep 2, 2016 at 3:57 AM Alan Woodward  wrote:

> This looks very useful!  It would be nice if you could also query multiple
> fields at the same time, to give more edismax-like functionality.  In fact,
> you could probably extend this slightly to almost entirely replace edismax,
> by allowing multiple fields and multiple analysis paths.
>
> Alan Woodward
> www.flax.co.uk
>
>
> > On 2 Sep 2016, at 01:45, Doug Turnbull <
> dturnb...@opensourceconnections.com> wrote:
> >
> > I wanted to solicit feedback on my query parser, the match query parser (
> > https://github.com/o19s/match-query-parser). It's a work in progress, so
> > any thoughts from the community would be welcome.
> >
> > The point of this query parser is that it's not a query parser!
> >
> > Instead, it's a way of selecting any analyzer to apply to the query
> > string. I use it for all kinds of things: finely controlling a bigram
> > phrase search, searching with stemmed vs. exact variants of the query,
> > etc.
> >
> > But its biggest value to me is as a fix for multiterm synonyms, because
> > I'm not giving the user's query to any underlying query parser -- I'm
> > always just doing analysis. So I know my selected analyzer will not be
> > disrupted by whitespace-based query parsing prior to query analysis.
> >
> > Those of you also in the Elasticsearch community may be familiar with the
> > match query (
> >
> https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
> > ). This is similar, except it also lets you select whether to turn the
> > resulting tokens into a term query body:(sea\ biscuit likes to fish) or a
> > phrase query body:"sea biscuit" likes to fish. See the examples above for
> > more.
> >
> > It's also similar to Solr's field query parser. However, the field query
> > parser tries to turn the fully analyzed token stream into a phrase query.
> > Moreover, the field query parser can only select the field's own
> > query-time analyzer, while the match query parser lets you select an
> > arbitrary analyzer. So match has more bells and whistles and acts as a
> > complement to the field qp.
> >
> > Thanks for any thoughts, feedback, or critiques
> >
> > Best,
> > -Doug
>
>


Re: Blank/Null value search in term filter

2016-09-02 Thread Ahmet Arslan


Hi Kishore,

You can employ an impossible token value (say XX) as a placeholder for null
values. This can be done via the default value update processor factory,
which indexes the placeholder token whenever the field is missing. Then
fq={!terms f='queryField' separator='|'}A|XX would fetch docs with A or null
values.
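
A minimal sketch of the update-chain piece (field and placeholder names are
just examples):

  <updateRequestProcessorChain name="add-null-placeholder" default="true">
    <processor class="solr.DefaultValueUpdateProcessorFactory">
      <str name="fieldName">queryField</str>
      <str name="value">XX</str>
    </processor>
    <processor class="solr.LogUpdateProcessorFactory"/>
    <processor class="solr.RunUpdateProcessorFactory"/>
  </updateRequestProcessorChain>

Note this only affects documents indexed after the chain is in place;
existing docs with missing values would need to be reindexed.
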
Ahmet

On Friday, September 2, 2016 2:03 PM, Kamal Kishore Aggarwal 
 wrote:



Hi,

We are using solr 5.4.1.

We are using term filter for multiple value matching purpose.
Example: fq={!terms f='queryField' separator='|'}A|B

A, B, C are the possible field values for solr field "queryField". There
can be docs with null values for the same field. Now, how can I create a
terms filter in the above fashion that fetches docs with A or null values?

Please suggest.

Regards
Kamal


Re: ShardDoc.sortFieldValues are not exposed in v5.2.1

2016-09-02 Thread Shawn Heisey
On 9/1/2016 12:31 PM, tedsolr wrote:
> I'm attempting to perform my own merge of IDs with a MergeStrategy in v5.2.1.
> I'm a bit hamstrung because the ShardFieldSortedHitQueue is not public. When
> trying to build my own priority queue I found out that the field
> sortFieldValues in ShardDoc is package restricted. Now, in v6.1 I see that
> both the HitQueue and the field are public.
>
> Would it be possible to patch 5.2.1, or maybe the latest v5, to expose these
> very useful objects? I can't upgrade to v6 due to the java 8 requirement.

I see that Shalin said it's already available in 5.5.

Upgrading from 5.2.1 to 5.5.x is a big enough leap that I wouldn't do it
without putting some time into QA, to make sure it doesn't break
anything, and to update my configs as necessary.  The 5.5 release in
particular had a lot of changes and new functionality added.  The 5.5.3
release should be out relatively soon, you might want to wait for that.

One thing you could do if you don't want to take the risk of upgrading,
and you have the ability to manually incorporate custom jars into your
project:  Check out the 5.2 branch from subversion, make whatever
changes you need, and then use "ant package" in the solr directory to
build the .tgz and .zip packages. Those packages and the jars inside
will have 5.2.2-SNAPSHOT as the version.  The URL that you need to check
out with svn is:
https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_5_2
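
Roughly, the whole sequence would be (assuming Ant and Ivy are set up on
the build machine):

  svn checkout https://svn.apache.org/repos/asf/lucene/dev/branches/lucene_solr_5_2
  cd lucene_solr_5_2/solr
  # make your code changes first, then:
  ant package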

The package that you end up with should be identical in function to
5.2.1, but will have a different version and your code change.  You
would replace Solr on your servers with that version, and use the new
solr jar(s) in your project.

I'm telling you about the svn repository for this, because "ant package"
will not work on the 5.2 branch if the checkout is from git.

If that option won't work for you, then you could do the work required
for an upgrade.

Thanks,
Shawn



Re: Is it safe to upgrade an existing field to docvalues?

2016-09-02 Thread Pushkar Raste
Hi Ronald,
Turning on docValues for an existing field works in Solr 4. As you mentioned,
it will use the un-inverting method if docValues are not found for an
existing document. This all works fine until segments that have documents
without docValues merge with segments that have docValues for the field. In
the merged segment, documents from the old segments will be stored without
docValues, yet the segment's metadata will indicate docValues are turned ON
for the field in question.

Now if you are sorting on the field, those poor documents would seem out of
order, and facet counts would be wrong as well.

Solr 5 doesn't throw an exception if you have such a mixed state of docValues
for a field.

I think it is better to create a copy field, reindex all of the data, and
then switch over to using the copy field.
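
Something along these lines in the schema (field names and type are only an
example):

  <field name="price_dv" type="tlong" indexed="false" stored="false" docValues="true"/>
  <copyField source="price" dest="price_dv"/>

Then point sorting and faceting at price_dv once the full reindex is done.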

On Aug 25, 2016 9:21 AM, "Ronald Wood"  wrote:

> Alessandro, yes I can see how this could be conceived of as a more general
> problem; and yes useDocValues also strikes me as being unlike the other
> properties since it would only be used temporarily.
>
> We’ve actually had to migrate fields from one to another when changing
> types, along with awkward naming like ‘fieldName’ (int) to ‘fieldNameLong’.
> But I’m not sure how a change like that could actually be done in place.
>
> The point is stronger when it comes to term vectors etc. where data exists
> in separate files and switches in code control whether they are used or not.
>
> I guess where I would argue that docValues might be different is that so
> much new functionality depends on this that it might be worth treating it
> differently. Given that docValues is now on by default, I wonder if it will
> at some point be mandatory, in which case everyone would have to migrate to
> keep up with the Solr version. (Of course, I don’t know what the general
> thinking is on this amongst the implementers.)
>
> Regardless, this change may be so important to us that we’d choose to
> branch the code on GitHub and apply the patch ourselves, use it while we
> transition, and then deploy an official build once we’re done. The
> difference in the level of effort between this approach and the
> alternatives would be too great. The risks of using a custom build for
> production would have to be weighed carefully, naturally.
>
> - Ronald S. Wood
>
>
> On 8/25/16, 06:49, "Alessandro Benedetti"  wrote:
>
> > switching is done in Solr on field.hasDocValues. The code would be
> amended
> > to (field.hasDocValues && field.useDocValues) throughout.
> >
>
> This is correct. Currently we use DocValues if they are available, and to
> check the availability we check the schema attribute.
> This can be problematic in the scenarios you described (for example, half
> the index has docValues for a field and the other half not yet).
>
> Your proposal is interesting.
> Technically it should work and should allow a transparent migration from
> non-docValues to docValues.
> But it is a risky one, because we are decreasing the readability a bit
> (although a user will specify the attribute only in special cases like
> yours).
>
> The only problem I see is that the same discussion we had for docValues
> actually applies to all other invasive schema changes:
> 1) you change the field type
> 2) you enable or disable term vectors
> 3) you enable/disable term positions, offsets, etc.
>
> So basically this is actually a general problem that would probably
> require a general re-think.
> So although it can be a quick fix that will work, I fear it can open the
> road to messy configuration attributes.
>
> Cheers
> --
> --
>
> Benedetti Alessandro
> Visiting card : http://about.me/alessandro_benedetti
>
> "Tyger, tyger burning bright
> In the forests of the night,
> What immortal hand or eye
> Could frame thy fearful symmetry?"
>
> William Blake - Songs of Experience -1794 England
>
>
>
>


Re: can't seem to get delta imports to work.

2016-09-02 Thread Shawn Heisey
On 8/31/2016 1:54 PM, Stahle, Patrick wrote:
> I am having problems getting the delta import working. Full import
> works fine. I am using the current version of solr (6.1). I have been
> looking at this pretty much all day and can't find what I am not doing
> correctly... I did try using the query attribute for both full and
> delta import and that worked, but as soon as I ran it for a full import
> via clean=true my query performance went very bad (the oracle execution
> plan must have gone bonkers). Anyways, I would appreciate any help.

One possibility for performance issues with programs that use JDBC:  The
JDBC driver may be buffering the entire result set in memory before
releasing it to the dataimport handler.  Oracle may have a JDBC option
that causes it to stream results as they are requested, rather than
buffer them.  Upgrading the driver jar may be required.  I found a
document saying that version 12c of the oracle driver does a much better
job than earlier versions when it comes to memory management.
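
If you go that route, DIH's JdbcDataSource also has a batchSize attribute
that is handed to the driver as the fetch size -- a sketch, with placeholder
connection details:

  <dataSource type="JdbcDataSource" driver="oracle.jdbc.OracleDriver"
              url="jdbc:oracle:thin:@//dbhost:1521/SERVICE"
              user="solr_ro" password="..." batchSize="500"/>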

> That exact same query executed in dbeaver SQL client: 

> SELECT COUNT(bo.LXOID) FROM lxbo_current bo WHERE bo.LXMODDATE >
> TO_DATE('2016-08-28 19:28:07', 'yyyy-mm-dd HH24:MI:SS')
> 250

Before I discovered that dbeaver is a Java program, these were two ideas
that I had about the difference in rowcount between Solr and dbeaver:
1) The user in the JDBC connection details is somehow blocked from seeing
the matching records in the database.
2) There's a bug in the JDBC driver you're using that causes the query to
return zero rows.

Since dbeaver is Java, just like Solr, those possibilities seem less
likely, but if the following doesn't help, you should explore them.

I did notice that the value for pk that you have chosen (ID) doesn't
show up in deltaQuery.  The field there seems to be "id" ... which will
be a different field than "ID".  I wonder if maybe Solr is skipping
those rows because they don't have the defined pk field?  Try changing
the field name in deltaQuery to uppercase, and make the back-reference
in deltaImportQuery match it, as sketched below.  I don't know if that's
going to help, but it's an idea.
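
In other words, something shaped like this (table and column names taken
from your query above; the rest is guesswork about your config):

  <entity name="item" pk="ID"
          query="SELECT * FROM lxbo_current"
          deltaQuery="SELECT LXOID AS ID FROM lxbo_current WHERE
                      LXMODDATE > TO_DATE('${dataimporter.last_index_time}', 'yyyy-mm-dd HH24:MI:SS')"
          deltaImportQuery="SELECT * FROM lxbo_current WHERE LXOID = '${dataimporter.delta.ID}'"/>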

Thanks,
Shawn



Re: commit it taking 1300 ms

2016-09-02 Thread Pushkar Raste
It would be worth looking into the iostat numbers for your disks.
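
For example, on Linux something like:

  iostat -x 5

shows per-device utilization and average wait times, which makes it easy to
spot a disk that is saturated while a commit is in flight.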

On Aug 22, 2016 10:11 AM, "Alessandro Benedetti" 
wrote:

> I agree with the suggestions so far.
> The cache auto-warming doesn't seem to be the problem, as the index is not massive
> and the auto-warm is for only 10 docs.
> Are you using any warming query for the new searcher ?
>
> Are you using soft or hard commit ?
> This can make the difference ( soft are much cheaper, not free but cheaper)
> .
> You said :
> " Actually earlier it was taking less but suddenly it has increased "
>
> What happened ?
> Anyway, there are a lot of questions to answer before we can help you...
>
> Cheers
>
> On Fri, Aug 12, 2016 at 4:58 AM, Esther-Melaine Quansah <
> esther.quan...@lucidworks.com> wrote:
>
> > Midas,
> >
> > I’d like further clarification as well. Are you sending commits along with
> > each document that you’re POSTing to Solr? If so, you’re essentially either
> > opening a new searcher or flushing to disk with each POST, which could
> > explain the latency between each request.
> >
> > Thanks,
> >
> > Esther
> > > On Aug 11, 2016, at 12:19 PM, Erick Erickson 
> > wrote:
> > >
> > > bq:  we post json documents through the curl it takes the time (same
> > > time i would like to say that we are not hard committing ). that curl
> > > takes time i.e. 1.3 sec.
> > >
> > > OK, I'm really confused. _what_ is taking 1.3 seconds? When you said
> > > commit, I was thinking of Solr's commit operation, which is totally
> > > distinct from just adding a doc to the index. But I read the above
> > > statement as you're saying it takes 1.3 seconds just to send a doc to Solr.
> > >
> > > Let's see the exact curl command you're using please?
> > >
> > > Best,
> > > Erick
> > >
> > >
> > > On Thu, Aug 11, 2016 at 5:32 AM, Emir Arnautovic
> > >  wrote:
> > >> Hi Midas,
> > >>
> > >> 1. How many indexing threads?
> > >> 2. Do you batch documents and what is your batch size?
> > >> 3. How frequently do you commit?
> > >>
> > >> I would recommend:
> > >> 1. Move commits to Solr (set auto soft commit to max allowed time)
> > >> 2. Use batches (bulks)
> > >> 3. tune bulk size and number of threads to achieve max performance.
> > >>
> > >> Thanks,
> > >> Emir
> > >>
> > >>
> > >>
> > >> On 11.08.2016 08:21, Midas A wrote:
> > >>>
> > >>> Emir,
> > >>>
> > >>> other queries:
> > >>>
> > >>> a) Solr cloud : NO
> > >>> b) <filterCache size="5000" initialSize="5000" autowarmCount="10"/>
> > >>> c) <queryResultCache size="1000" initialSize="1000" autowarmCount="10"/>
> > >>> d) <documentCache size="1000" initialSize="1000" autowarmCount="10"/>
> > >>> e) we are using multi threaded system.
> > >>>
> > >>> On Thu, Aug 11, 2016 at 11:48 AM, Midas A 
> > wrote:
> > >>>
> >  Emir,
> > 
> >  we post json documents through the curl it takes the time (same time i
> >  would like to say that we are not hard committing ). that curl takes
> >  time i.e. 1.3 sec.
> > 
> >  On Wed, Aug 10, 2016 at 2:29 PM, Emir Arnautovic <
> >  emir.arnauto...@sematext.com> wrote:
> > 
> > > Hi Midas,
> > >
> > > According to your autocommit configuration and your worry about commit
> > > time, I assume that you are doing explicit commits from client code and
> > > that 1.3s is the client-observed commit time. If that is the case, then
> > > it might be the opening of a new searcher that is taking time.
> > >
> > > How do you index data - single threaded or multithreaded? How frequently
> > > do you commit from the client? Can you let Solr do soft commits instead of
> > > explicitly committing? Do you have warmup queries? Is this SolrCloud?
> > > What is the number of servers (what spec), shards, docs?
> > >
> > > In any case monitoring can give you more info about server/Solr
> > behavior
> > > and help you diagnose issues more easily/precisely. One such
> > monitoring
> > > tool is our SPM .
> > >
> > > Regards,
> > > Emir
> > >
> > > --
> > > Monitoring * Alerting * Anomaly Detection * Centralized Log
> > Management
> > > Solr & Elasticsearch Support * http://sematext.com/
> > >
> > > On 10.08.2016 05:20, Midas A wrote:
> > >
> > >> Thanks for replying
> > >>
> > >> index size:9GB
> > >> 2000 docs/sec.
> > >>
> > >> Actually earlier it was taking less but suddenly it has increased.
> > >>
> > >> Currently we do not have any monitoring  tool.
> > >>
> > >> On Tue, Aug 9, 2016 at 7:00 PM, Emir Arnautovic <
> > >> emir.arnauto...@sematext.com> wrote:
> > >>
> > >> Hi Midas,
> > >>>
> > >>> Can you give us more details on your index: size, number of new docs
> > >>> between commits. Why do you think 1.3s for commit is too much and why
> > >>> do you
> > 

RE: Always add the marker when elevating documents

2016-09-02 Thread Alexandre Drouin
Thank you Emir.


Alexandre Drouin


-Original Message-
From: Emir Arnautovic [mailto:emir.arnauto...@sematext.com] 
Sent: September 2, 2016 5:32 AM
To: solr-user@lucene.apache.org
Subject: Re: Always add the marker when elevating documents
Importance: High

Hi Alexandre,

You can specify a default fl parameter for the search handler in Solr config.
You can use *,[elevated] to return all fields plus the elevated flag, but it
is recommended to limit fl to the fields needed - if you truly need all
fields, then using * is ok.

Regards,
Emir


On 01.09.2016 22:11, Alexandre Drouin wrote:
> Hi,
>
> I followed the instructions on the wiki 
> (https://wiki.apache.org/solr/QueryElevationComponent) to add a 
> QueryElevationComponent searchComponent in my Solr 4.10.2 server and it is 
> working as expected.
>
> I saw in the documentation that it is possible to see which documents were
> elevated by adding [elevated] to the fl parameter, and I would like to know if
> there is a way to always have the [elevated] property in the results without
> having to add it to the fl parameter.
> If it is not possible, is it safe to use "fl=*,[elevated]" to specify all
> fields plus the elevated marker?
>
> Thanks!
> Alexandre Drouin

--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management Solr & 
Elasticsearch Support * http://sematext.com/



Re: Use function in condition

2016-09-02 Thread nabil Kouici
Hi Emir,

Thank you for your response.
Yes, your request works, but only if all the sub-queries are function queries.
If you mix a function query with a normal query, it will not work. For example:

fq={!frange l=1}and(query($sub1),or(query($sub2),query($sub3)))&sub1=F3:Active&sub2={!frange u=2000}sum(F3,F4)&sub3={!frange l=3000}sum(F5,F6)

Regards, Nabil.

  From: Emir Arnautovic
  To: solr-user@lucene.apache.org
  Sent: Monday, August 29, 2016, 14:06
  Subject: Re: Use function in condition

Hi Nabil,

Can you try the following:

fq={!frange l=1}and(query($sub1),or(query($sub2),query($sub3)))&sub1={!frange l=1000}sum(F1,F2)&sub2={!frange u=2000}sum(F3,F4)&sub3={!frange l=3000}sum(F5,F6)
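
One detail worth noting: frange's l and u bounds are inclusive by default,
so for strict greater-than / less-than comparisons you can add incl=false or
incu=false, e.g.:

  fq={!frange l=1000 incl=false}sum(F1,F2)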

Thanks,
Emir

On 29.08.2016 11:50, nabil Kouici wrote:
> Hi Solr users,
> I'm still not able to find a solution, even with function queries :(
> My need is simple; I'd like to execute these combined filters:
> (sum of F1 and F2 greater than 1000) AND ( (sum of F3 and F4 lower than 2000) OR
> (sum of F5 and F6 greater than 3000) )
> Could you please help me translate these conditions to Solr syntax?
> Regards, Nabil.
>
>  From: Emir Arnautovic
>  To: solr-user@lucene.apache.org
>  Sent: Thursday, August 25, 2016, 16:51
>  Subject: Re: Use function in condition
> Hi Nabil,
>
> You have a limited set of functions, but there are logical functions (or,
> and, not) and you have the query function, so you can do more complex queries:
>
> fq={!frange l=1}and(query($sub1),termfreq(field3, 300))&sub1={!frange l=100}sum(field1,field2)
>
> and() will return 1 for docs matching both function terms.
>
> It would be much simpler if Solr supported relational functions: gt, lt, eq.
>
> Hope this gives you ideas how to proceed.
>
> Emir
>
> On 25.08.2016 12:06, nabil Kouici wrote:
>> Hi Emir, thank you for your reply. I've tested the function range query and
>> it is solving 50% of my need. The problem is I'm not able to use it with
>> other conditions. For example:
>> fq={!frange l=100}sum(field1,field2)  and field3:200
>>
>> or
>> fq=({!frange l=100}sum(field1,field2))  and (field3:200)
>>
>> This is giving me an exception: org.apache.solr.search.SyntaxError:
>> Unexpected text after function: AND Field3:200
>> I know that I can use multiple fq, but the problem is I can have complex
>> filters like (cond1 OR cond2 AND cond3).
>> Could you please help?
>> Regards, Nabil.
>>
>>    From: Emir Arnautovic
>>    To: solr-user@lucene.apache.org
>>    Sent: Wednesday, August 17, 2016, 17:08
>>    Subject: Re: Use function in condition
>> Hi Nabil,
>>
>> You can use frange queries, e.g. you can use fq={!frange
>> l=100}sum(field1,field2) to filter docs with a sum greater than 100.
>>
>> Regards,
>> Emir
>>
>>
>> On 17.08.2016 16:26, nabil Kouici wrote:
>>> Hi,
>>> Is it possible to use functions (function query 
>>> https://cwiki.apache.org/confluence/display/solr/Function+Queries) in q or 
>>> fq parameters to build a complex search expression?
>>> For example, take only documents where sum(field1,field2) > 100. Another
>>> example: if(test,value1,value2):value3
>>> Regards, Nabil.

-- 
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



   

Blank/Null value search in term filter

2016-09-02 Thread Kamal Kishore Aggarwal
Hi,

We are using solr 5.4.1.

We are using term filter for multiple value matching purpose.
Example: fq={!terms f='queryField' separator='|'}A|B

A, B, C are the possible field values for solr field "queryField". There
can be docs with null values for the same field. Now, how can I create a
terms filter in the above fashion that fetches docs with A or null values?

Please suggest.

Regards
Kamal


SOLR replication: different behavior for network cut off vs. machine restart

2016-09-02 Thread Grzegorz Huber
Hi,

We try to set up a SOLR Cloud environment using 1 shard with 2
replicas (1 leader). The replicas are managed by 3 zookeeper
instances.

The setup seems fine when we do the normal work. The data is being
replicated at runtime.

Now we try to simulate erroneous behavior in several cases:

1. Turning off one of the replicas, in two different scenarios: leader and non-leader
2. Cutting off the network, making the non-leader replica unreachable

In both cases the data is being written continuously to the SOLR Cloud.

CASE 1: The replication process starts after the failed machine gets
boot up again. The complete data set is present in both replicas.
Everything works fine.

CASE 2: Once reconnected to the network, the non-leader replica starts the
recovery process, but for some reason the new data from the leader is not
being replicated onto the previously failed replica.

From what I was able to read in the logs comparing both cases, I don't
understand why SOLR sees

RecoveryStrategy ## currentVersions as present and
RecoveryStrategy ## startupVersions=[[]] (empty)

compared to CASE 1, where RecoveryStrategy ## startupVersions is
filled with the objects that appear in currentVersions in CASE 2.

The general question is: why does restarting SOLR result in a successful
recovery process, while reconnecting the network does not?

Thanks for any tips / leads!

Cheers,
Greg


Re: Always add the marker when elevating documents

2016-09-02 Thread Emir Arnautovic

Hi Alexandre,

You can specify a default fl parameter for the search handler in Solr config.
You can use *,[elevated] to return all fields plus the elevated flag, but it
is recommended to limit fl to the fields needed - if you truly need all
fields, then using * is ok.
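
For example, something like this in the handler definition (handler name
assumed):

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="fl">*,[elevated]</str>
    </lst>
  </requestHandler>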


Regards,
Emir


On 01.09.2016 22:11, Alexandre Drouin wrote:

Hi,

I followed the instructions on the wiki 
(https://wiki.apache.org/solr/QueryElevationComponent) to add a 
QueryElevationComponent searchComponent in my Solr 4.10.2 server and it is 
working as expected.

I saw in the documentation that it is possible to see which documents were
elevated by adding [elevated] to the fl parameter, and I would like to know if
there is a way to always have the [elevated] property in the results without
having to add it to the fl parameter.
If it is not possible, is it safe to use "fl=*,[elevated]" to specify all 
fields plus the elevated marker?

Thanks!
Alexandre Drouin


--
Monitoring * Alerting * Anomaly Detection * Centralized Log Management
Solr & Elasticsearch Support * http://sematext.com/



Re: Replication Index fetch failed

2016-09-02 Thread Arkadi Colson

Hi

I cannot find a string in the logs matching "Could not download file...".

This info is logged on the slave:

WARN  - 2016-09-02 09:28:36.923; [c:intradesk s:shard10 r:core_node23 
x:intradesk_shard10_replica1] 
org.apache.solr.handler.IndexFetcher$FileFetcher; Error in fetching 
file: _5qd6_ya.liv (downloaded 0 of 13692 bytes)

java.io.EOFException
at 
org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:168)
at 
org.apache.solr.common.util.FastInputStream.readFully(FastInputStream.java:160)
at 
org.apache.solr.handler.IndexFetcher$FileFetcher.fetchPackets(IndexFetcher.java:1460)
at 
org.apache.solr.handler.IndexFetcher$FileFetcher.fetchFile(IndexFetcher.java:1426)
at 
org.apache.solr.handler.IndexFetcher.downloadIndexFiles(IndexFetcher.java:852)
at 
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:428)
at 
org.apache.solr.handler.IndexFetcher.fetchLatestIndex(IndexFetcher.java:251)
at 
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:388)
at 
org.apache.solr.cloud.RecoveryStrategy.replicate(RecoveryStrategy.java:156)
at 
org.apache.solr.cloud.RecoveryStrategy.doRecovery(RecoveryStrategy.java:408)
at 
org.apache.solr.cloud.RecoveryStrategy.run(RecoveryStrategy.java:221)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)

at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at 
org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$22(ExecutorUtil.java:229)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)

at java.lang.Thread.run(Thread.java:745)

And this on the master:

WARN  - 2016-09-02 09:28:36.936; [c:intradesk s:shard10 r:core_node13 
x:intradesk_shard10_replica2] 
org.apache.solr.handler.ReplicationHandler$DirectoryFileStream; 
Exception while writing response for params:
generation=124148&qt=/replication&file=_5qd6_ya.liv&checksum=true&wt=filestream&command=filecontent&maxWriteMBPerSec=18.75
java.nio.file.NoSuchFileException: 
/var/solr/data/intradesk_shard10_replica2/data/index.20160816102332501/_5qd6_ya.liv
at 
sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
at 
sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
at 
sun.nio.fs.UnixFileSystemProvider.newFileChannel(UnixFileSystemProvider.java:177)

at java.nio.channels.FileChannel.open(FileChannel.java:287)
at java.nio.channels.FileChannel.open(FileChannel.java:335)
at 
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:238)
at 
org.apache.lucene.store.NRTCachingDirectory.openInput(NRTCachingDirectory.java:192)
at 
org.apache.solr.handler.ReplicationHandler$DirectoryFileStream.write(ReplicationHandler.java:1435)

at org.apache.solr.core.SolrCore$3.write(SolrCore.java:2154)
at 
org.apache.solr.response.QueryResponseWriterUtil.writeQueryResponse(QueryResponseWriterUtil.java:49)
at 
org.apache.solr.servlet.HttpSolrCall.writeResponse(HttpSolrCall.java:731)

at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:473)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
at 
org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at 
org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
at 
org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
at 
org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)

at org.eclipse.jetty.server.Server.handle(Server.java:518)
at 
org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
at 

Re: How can I set the defaultOperator to be AND?

2016-09-02 Thread Bastien Latard | MDPI AG

Thanks Steve for your advice (i.e.: upgrade to Solr 6.2).
I finally had time to upgrade and can now use "q.op=AND" together with
"q=a OR b" and this works as expected.


I even defined the following line in the defaults settings in the
requestHandler, to overwrite the default behavior:

<str name="q.op">AND</str>
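
In context, that sits in the handler's defaults roughly like this (the
handler name and the edismax defType are assumptions about my setup):

  <requestHandler name="/select" class="solr.SearchHandler">
    <lst name="defaults">
      <str name="defType">edismax</str>
      <str name="q.op">AND</str>
    </lst>
  </requestHandler>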

Issue fixed :)

Kind regards,
Bast

On 05/08/2016 14:57, Bastien Latard | MDPI AG wrote:

Hi Steve,

I read the thread you sent me (SOLR-8812) and it seems that 6.1
includes this fix, as you said.

I will upgrade.
Thank you!

Kind regards,
Bast

On 05/08/2016 14:37, Steve Rowe wrote:

Hi Bastien,

Have you tried upgrading to 6.1?  SOLR-8812, mentioned earlier in the 
thread, was released with 6.1, and is directly aimed at fixing the 
problem you are having in 6.0 (also a problem in 5.5): when mm is not 
explicitly provided and the query contains explicit operators (except 
for AND), edismax now sets mm=0.


--
Steve
www.lucidworks.com

On Aug 5, 2016, at 2:34 AM, Bastien Latard | MDPI AG 
 wrote:


Hi Eric & others,
Is there any way to overwrite the default OP when we use edismax?
Because adding the following line to solrconfig.xml doesn't solve 
the problem:



(Then if I do "q=black OR white", this always gives the results for 
"black AND white")


I did not find a way to define a default OP, which is automatically 
overwritten by the AND/OR from a query.



Example - Debug: defaultOP in solrconfig = AND / q=a or b


==> results for black AND white
The correct result should be the following (but I had to force the 
q.op):


==> I cannot do this in case I want to do "(a AND b) OR c"...


Kind regards,
Bastien

On 27/04/2016 05:30, Erick Erickson wrote:
Defaulting to "OR" has been the behavior since forever, so changing 
the behavior now is just not going to happen. Making it fit a new 
version of "correct" will change the behavior for every application 
out there that has not specified the default behavior.


There's no a-priori reason to expect "more words to equal fewer 
docs", I can just as easily argue that "more words should return 
more docs". Which you expect depends on your mental model.


And providing the default op in your solrconfig.xml request 
handlers allows you to implement whatever model your application 
chooses...


Best,
Erick

On Mon, Apr 25, 2016 at 11:32 PM, Bastien Latard - MDPI AG 
 wrote:

Thank you Shawn, Jan and Georg for your answers.

Yes, it seems that if I simply remove the defaultOperator it works 
well for "composed queries" like '(a:x AND b:y) OR c:z'.

But I think that the default operator should/could be AND.

Because when I add an extra search word, I expect the results to get
more accurate... (It seems to be what Google is also doing now.)

Otherwise, if you make a search and apply another filter (e.g.
sort by publication date, facets, ...), the user can get a less
relevant item (only 1 word in 4 matching) in first position only
because of its date...


What do you think?


Kind regards,
Bastien


On 25/04/2016 14:53, Shawn Heisey wrote:

On 4/25/2016 6:39 AM, Bastien Latard - MDPI AG wrote:


Remember:
If I add the following line to the schema.xml, even if I do a search
'title:"test" OR author:"me"', it will return documents matching
'title:"test" AND author:"me"':

<solrQueryParser defaultOperator="AND"/>

The settings in the schema for default field and default operator were
deprecated a long time ago.  I actually have no idea whether they are
even supported in newer Solr versions.

The q.op parameter controls the default operator, and the df parameter
controls the default field.  These can be set in the request handler
definition in solrconfig.xml -- usually in "defaults" but there might be
reason to put them in "invariants" instead.

If you're using edismax, you'd be better off using the mm parameter
rather than the q.op parameter.  The behavior you have described above
sounds like a change in behavior (some call it a bug) introduced in the
5.5 version:


https://issues.apache.org/jira/browse/SOLR-8812


If you are using edismax, I suspect that if you set mm=100% instead of
q.op=AND (or the schema default operator) the problem might go away
... but I am not sure.  Someone who is more familiar with SOLR-8812
probably should comment.

Thanks,
Shawn





Re: Feedback on Match Query Parser (for fixing multiterm synonyms and other things)

2016-09-02 Thread Alan Woodward
This looks very useful!  It would be nice if you could also query multiple 
fields at the same time, to give more edismax-like functionality.  In fact, you 
could probably extend this slightly to almost entirely replace edismax, by 
allowing multiple fields and multiple analysis paths.

Alan Woodward
www.flax.co.uk


> On 2 Sep 2016, at 01:45, Doug Turnbull  
> wrote:
> 
> I wanted to solicit feedback on my query parser, the match query parser (
> https://github.com/o19s/match-query-parser). It's a work in progress, so
> any thoughts from the community would be welcome.
> 
> The point of this query parser is that it's not a query parser!
> 
> Instead, it's a way of selecting any analyzer to apply to the query string. I
> use it for all kinds of things: finely controlling a bigram phrase search,
> searching with stemmed vs. exact variants of the query, etc.
> 
> But its biggest value to me is as a fix for multiterm synonyms, because
> I'm not giving the user's query to any underlying query parser -- I'm
> always just doing analysis. So I know my selected analyzer will not be
> disrupted by whitespace-based query parsing prior to query analysis.
> 
> Those of you also in the Elasticsearch community may be familiar with the
> match query (
> https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-match-query.html
> ). This is similar, except it also lets you select whether to turn the
> resulting tokens into a term query body:(sea\ biscuit likes to fish) or a
> phrase query body:"sea biscuit" likes to fish. See the examples above for
> more.
> 
> It's also similar to Solr's field query parser. However, the field query
> parser tries to turn the fully analyzed token stream into a phrase query.
> Moreover, the field query parser can only select the field's own query-time
> analyzer, while the match query parser lets you select an arbitrary
> analyzer. So match has more bells and whistles and acts as a complement to
> the field qp.
> 
> Thanks for any thoughts, feedback, or critiques
> 
> Best,
> -Doug