from:"Greg Pendlebury"

[jira] [Commented] (SOLR-10856) ExtendedDismaxQParser (edismax) override OR when mm=100%

2017-06-24 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-10856?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16062195#comment-16062195
 ] 

Greg Pendlebury commented on SOLR-10856:


You are describing exactly what mm is supposed to do. The change made in 
SOLR-2649 was the root cause (deliberately... because of the bug caused by the 
inverse impact boolean operators had on mm), and SOLR-8812 was about choosing 
less disruptive default values when users are not specifying them.

In this case, however, you are explicitly requesting mm=100%... and getting 
answers that match. The short answer is: don't use mm=100% if you want boolean 
logic. The two features are not compatible.

The longer answer is nasty and would require delving into how boolean operators 
are truly handled by Solr when translated into OCCURS flags. The mm parameter 
operates on the SHOULD OCCUR flags, which are (roughly) what your OR terms are 
translated into.
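
As a rough illustration of that last point, here is a toy model (not Solr's or Lucene's actual code) that treats the two OR terms from the reported query as SHOULD clauses and applies a minimum-should-match count to them. The names `min_should_match` and `matches` are made up for this example, and only the simple positive-percentage form of mm is modelled. With mm=100% both clauses become mandatory, which corresponds to the `~2` in the 5.5.1 and 6.6.0 parsed queries.

```python
def min_should_match(num_should_clauses: int, mm_percent: int) -> int:
    """Toy model: how many SHOULD clauses a document must satisfy.

    Only a positive percentage such as 100 or 50 is handled here; real mm
    specs are richer (negative values, conditional expressions, and so on).
    """
    # Integer division rounds the required count down.
    return (num_should_clauses * mm_percent) // 100


def matches(doc_terms: set, should_terms: list, mm_percent: int) -> bool:
    """True if the document satisfies enough of the optional (SHOULD) terms."""
    required = min_should_match(len(should_terms), mm_percent)
    satisfied = sum(1 for term in should_terms if term in doc_terms)
    # Assumption: a query made only of SHOULD clauses still needs one match.
    return satisfied >= max(required, 1)
```

Under this model a document holding only `type_s:A` satisfies `["type_s:A", "type_s:C"]` at mm=0% but not at mm=100%, and a single-valued field can never satisfy both clauses at once.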

> ExtendedDismaxQParser (edismax) override OR when mm=100%
> 
>
> Key: SOLR-10856
> URL: https://issues.apache.org/jira/browse/SOLR-10856
> Project: Solr
>  Issue Type: Bug
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: query parsers
>Affects Versions: 5.5, 6.0, 6.6
>Reporter: Sébastien LECACHEUR
>
> Since Solr 5.5.1, the edismax parser overrides OR (with AND behavior) in 
> queries when mm=100%. This behavior is present from Solr 5.5.1 through 6.6.0.
> Query concerned:
> {code:none}
> curl -s 'http://localhost:8983/solr/mycorename/select?q=type_s%3A(A+OR+C)&wt=json&defType=edismax&mm=100%25&indent=true&debugQuery=true'
> {code}
> 1) Solr 5.4.1 :
> {code:javascript}
> "rawquerystring":"type_s:(A OR C)",
> "querystring":"type_s:(A OR C)",
> "parsedquery":"(+(type_s:A type_s:C))/no_coord",
> "parsedquery_toString":"+(type_s:A type_s:C)",
> "explain":{...},
> "QParser":"ExtendedDismaxQParser",
> {code}
> Returns docs as expected.
> 2) Solr 5.5.1 :
> {code:javascript}
> "rawquerystring":"type_s:(A OR C)",
> "querystring":"type_s:(A OR C)",
> "parsedquery":"(+((type_s:A type_s:C)~2))/no_coord",
> "parsedquery_toString":"+((type_s:A type_s:C)~2)",
> "explain":{},
> "QParser":"ExtendedDismaxQParser",
> {code}
> Returns no results
> 3) Solr 6.6.0 :
> {code:javascript}
> "rawquerystring":"type_s:(A OR C)",
> "querystring":"type_s:(A OR C)",
> "parsedquery":"(+(type_s:A type_s:C)~2)/no_coord",
> "parsedquery_toString":"+((type_s:A type_s:C)~2)",
> "explain":{},
> "QParser":"ExtendedDismaxQParser",
> {code}
> Returns no results
> This bug looks like SOLR-8812 issue.



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

Re: SolrCloud "master mode" planned?

2017-04-26 Thread Greg Pendlebury

Would it be possible to have an optional node that can dynamically assume a
leadership position?

There has been some small amount of discussion here about whether we could
have a 'search node' join a cluster with an empty/null (or zero-length) hash
range in the clusterstate file, so that it hosts no Lucene segments. That
would help it avoid any GC issues related to commits, or NIC saturation
related to replication. The node could be tuned and configured purely for
search, and a small number of these nodes could be put behind a traditional
load balancer, removing the need for search clients to understand ZooKeeper.
That last part was attractive to us simply for search clients like JMeter,
but it is nice for other reasons (like firewalling the ZK nodes and Solr).

Our theory (which never evolved beyond idle chat) was that this would
almost be possible as it currently stands, but those sorts of nodes might
be an attractive place to host any 'leadership' or features which
optionally buy in to 'some nodes more equal than others'.

Ta,
Greg


On 27 April 2017 at 11:27, Walter Underwood  wrote:

> Not fired up about approaches that have “some nodes more equal than
> others”, whether it is zk-only or replica-only. That is the opposite of
> autoscaling.
>
> Getting our Solr Cloud cluster running in test and in prod was
> surprisingly difficult and unfortunately mysterious. I started with Solr
> 1.1, so I’m not exactly a noob.
>
> This cluster needs to handle very long text queries on a large collection
> (17 million docs and growing). After too much work, I’m really happy with
> the performance. This is 4 shards, 4 replicas, with AWS c4.8xlarge nodes.
> Yes, that is nearly $200,000 per year, just for the instances.
>
> Here is what I wanted, and what I ended up with after way too much work.
>
> * Collection configs and startup parameters under version control.
> * Separation between installing software and installing configs.
> * Automated config deploy from Jenkins.
> * Data and logs on /solr EBS volume.
> * HTTP on port 6090 to match our port-to-app mapping (just gave up on this
> one, geez).
> * Five node Zookeeper ensemble to avoid single point of failure during zk
> version or instance upgrade (wasted two weeks, gave up).
> * Graceful scale out and scale in (not even nearly done).
> * Metrics reported to Graphite, the performance bug in 6.4 cost a few
> weeks.
>
> Separating executables, config, and data should not be this hard, right? I
> thought we solved that in the 1990’s.
>
> I’ve never had a problem with getting Zookeeper running, but getting the
> five node ensemble to work was impossible. I wrote my first concurrent code
> 35 years ago, so I should be able to do this. Just could not get 3.4.6 to
> actually work on five nodes, no matter how many weird things we tried, like
> switching AWS instance types. Used an existing 3 node ensemble that we had
> wanted to decommission.
>
> The magic solr script commands do not document what happens when you
> change the port or server directory. Surprisingly, many of them only work
> after you have a local running Solr instance. Also not documented. So port
> must be passed in to many of the commands. I guess the server directory
> needs to be passed in, too, but I never figured that out.
>
> The required contents of the Zookeeper “filesystem” are undocumented, as
> far as I can tell. In fact, I found it really hard to figure out exactly
> what directory to give to either the solr script or the zkCli script.
> Earlier versions of solr had $SOLR_HOME/$collection/conf/…, but where is
> the directory arg in that hierarchy? Especially because … solr.xml
>
> Still don’t have a method that I trust to create a new cluster, but
> updates to our existing clusters are pretty solid. It just seems bogus to
> have a whole filesystem-based deployment, use that to bootstrap, then never
> use it again. I have zero trust that it will work the next time.
>
> I wrote a Python program that takes the URL of any Solr node (I used the
> load balancer), the collection(s) to update, and the base directory for
> configs (use the collection name as a subdirectory). It does this:
>
> 1. Gets info from Solr (requests package, yay!) and parses out the zk
> host, including the chroot. Some ugly parsing there, that should be
> straight JSON.
> 2. Connects to zk (kazoo package, yay!). Uploads solr.xml from the base of
> the configs directory.
> 3. Optionally, removes all the files from the zk config directory we are
> about to upload to.
> 4. Uploads all the files, recursively, from $configs/$collection on the
> filesystem to zk.
> 5. Optionally, links the config to the same name as a collection.
> 6. Sends RELOAD for that collection to the Solr node as async command.
> 7. Waits.
> 8. Waits. Finally completes.
> 9. Parses the response for succeeding and failed nodes. If there are
> failed nodes, exit with a failure result code. NOTE: this could leave the
> cluster in an unfortunate s

[jira] [Commented] (SOLR-4823) Split LBHttpSolrServer into two classes one for the solrj use case and one for the solr cloud use case

2016-11-06 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15642463#comment-15642463
 ] 

Greg Pendlebury commented on SOLR-4823:
---

I know that my question will be off-topic for this particular issue, but it 
seems that it might be a viable launching point for a customization our team 
has been considering in-house. We were thinking of trying out the addition of 
one or more nodes in the cluster that had no allocated hash range in 
clusterstate (we haven't yet looked at whether we would need to modify the code 
to achieve this).

Their purpose would be to act as search entry points for the cluster with more 
stable JVM performance (because they manage no lucene segments) as well as 
internalizing cluster security at the OS level. Right now, in a 200 replica 
cluster we need to let any/all SolrJ clients have access to the ZK ensemble as 
well as ports on every replica. It also makes managing threading (such as in 
the default http client thread pool) annoying to configure and test for 
performance.

With [~phloy]'s patch we could still make use of SolrJ, but just provide a 
small whitelist of our 'search nodes' and keep client-side requirements for 
searching very simple in terms of security and thread management.

> Split LBHttpSolrServer into two classes one for the solrj use case and one 
> for the solr cloud use case
> --
>
> Key: SOLR-4823
> URL: https://issues.apache.org/jira/browse/SOLR-4823
> Project: Solr
>  Issue Type: Improvement
>  Components: SolrCloud
>Reporter: philip hoy
>Priority: Minor
> Attachments: SOLR-4823.patch, SOLR-4823.patch
>
>
> The LBHttpSolrServer has too many responsibilities. It could perhaps be 
> broken into two classes, one in solrj to be used in the place of an external 
> load balancer that balances across a known set of solr servers defined at 
> construction time and one in solr core to be used by the solr cloud 
> components that balances across servers dependent on the request.
> To save code duplication, if much arises, an abstract base class could be 
> introduced into solrj.




[jira] [Commented] (SOLR-8016) CloudSolrClient has extremely verbose error logging

2016-10-18 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15587251#comment-15587251
 ] 

Greg Pendlebury commented on SOLR-8016:
---

Not that I am aware of. I can see the problem still in our newest server 
(5.5.3). I like [~markrmil...@gmail.com]'s suggestion of lowering the log level 
to info. It is simple and we can filter it out via logging config. The deeper 
issues of whether the retry should even be attempted sound interesting to me, 
but I'd be happy to just not see the log entries.

> CloudSolrClient has extremely verbose error logging
> ---
>
> Key: SOLR-8016
> URL: https://issues.apache.org/jira/browse/SOLR-8016
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 5.2.1, 6.0
>Reporter: Greg Pendlebury
>Priority: Minor
>  Labels: easyfix
>
> CloudSolrClient has this error logging line which is fairly annoying:
> {code}
>   log.error("Request to collection {} failed due to ("+errorCode+
>   ") {}, retry? "+retryCount, collection, rootCause.toString());
> {code}
> Given that this is a client library and then gets embedded into other 
> applications this line is very problematic to handle gracefully. In today's 
> example I was looking at, every failed search was logging over 100 lines, 
> including the full HTML response from the responding node in the cluster.
> The resulting SolrServerException that comes out to our application is 
> handled appropriately but we can't stop this class complaining in logs 
> without suppressing the entire ERROR channel, which we don't want to do. This 
> is the only direct line writing to the log I could find in the client, so we 
> _could_ suppress errors, but that just feels dirty, and fragile for the 
> future.
> From looking at the code I am fairly certain it is not as simple as throwing 
> an exception instead of logging... it is right in the middle of the method. I 
> suspect the simplest answer is adding a marker 
> (http://www.slf4j.org/api/org/slf4j/Marker.html) to the logging call.
> Then solrj users can choose what to do with these log entries. I don't know 
> if there is a broader strategy for handling this that I am ignorant of; 
> apologies if that is the case.




Re: Building a Solr cluster with Maven

2016-10-18 Thread Greg Pendlebury

 a way
> to generate somewhat customized versions of solr from the artifacts that
> are published already. Publishing the whole zip would be a start,
> downstream builds could add logic to resolve it, explode, tweak, and
> re-publish. To maintain the strict separation from the war, it might be
> helpful to have a lib or "plug-ins" folder in the zip that is by default
> loaded to the classpath as an extension point for users who are re-building
> the package?
>
> -Tim
>
> From: dev@lucene.apache.org At: 10/18/16 09:52:42
> To: dev@lucene.apache.org
> Subject: Re: Building a Solr cluster with Maven
>
> My team has modified the ant scripts to publish all the jars/poms and the
> zip to our local artifactory when we run our build. We have another project
> which pulls down all of these dependencies including the zip to build our
> actual solr deploy and a maven assembly which unpacks the zip file and
> extracts all of the webapp for our real distribution.
>
> I haven't upstreamed the changes for the ant tasks thinking there wouldn't
> be too much interest in that, but I could put together a patch if there is.
>
> The changes do the following:
>
> - Packages the zip along with the parent pom if a flag is set
> - Allows changing group which the poms are published to. For example
> instead of org.apache you can push it as com.xxx to avoid shadowing
> conflicts in your local repository.
>
> On Tue, Oct 18, 2016 at 8:42 AM David Smiley 
> wrote:
>
>> Thanks for bringing this up, Greg.  I too have felt the pain of this in
>> the move away from a WAR file in a project or two.  In one of the projects
>> that comes to mind, we built scripts that re-constituted a Solr
>> distribution from artifacts in Maven. For anything that wasn't in Maven
>> (e.g. the admin UI pages, Jetty configs), we checked it into source
>> control.  In hindsight... the simplicity of what you list as (1) -- check
>> the distro zip into a Maven repo local to the organization -- sounds better...
>> but I may be forgetting requirements that led us not to do this.  I look
>> forward to that zip shrinking once the docs are gone.  Another option,
>> depending on one's needs, is to pursue Docker, which I've lately become a
>> huge fan of.  I think Docker is particularly great for integration tests.
>> Does the scenario you wish to use the assets for relate to testing or some
>> other use-case?
>>
>> ~ David
>>
>>
>>
>> On Mon, Oct 17, 2016 at 7:58 PM Greg Pendlebury <
>> greg.pendleb...@gmail.com> wrote:
>>
>> Are there any developers with a current working maven build for a
>> downstream Solr installation? ie. Not a build for Solr itself, but a build
>> that brings in the core Solr server plus local plugins, third party plugins
>> etc?
>>
>> I am in the process of updating one of our old builds (it builds both the
>> application and various shard instances) and have hit a stumbling block in
>> sourcing the dashboard static assets (everything under /webapp/web in
>> Solr's source).
>>
>> Prior to the move away from being a webapp I could get them by exploding
>> the war from Maven Central.
>>
>> In our very first foray into 5.x we had a local custom build to patch
>> SOLR-2649. We avoided solving this problem then by pushing the webapp into
>> our local Nexus as part of that build... but that wasn't a very good long
>> term choice.
>>
>> So now I'm trying to work out the best long term approach to take here.
>> Ideas so far:
>>
>>1. Manually download the required zip and add it into our Nexus
>>repository as a 3rd party artifact. Maven can source and extract anything
>>it needs from here. This is where I'm currently leaning for simplicity, but
>>the manual step required is annoying. It does have the advantage of causing
>>a build failure straight away when a version upgrade occurs, prompting the
>>developer to look into why.
>>2. Move a copy of the static assets for the dashboard into our
>>project and deploy them ourselves. This has the advantage of aligning our
>>approach with the resources we already maintain in the project (like
>>core.properties, schema.xml, solrconfig.xml, logging etc.). But I am
>>worried that it is really fragile and developers will miss it during a
>>version upgrade, resulting in the dashboard creeping out-of-date and
>>(worse) introducing subtle bugs because of a version mismatch between the
>>UI and the underlying server code.
>>3. I'd like to think a long ter

Building a Solr cluster with Maven

2016-10-17 Thread Greg Pendlebury

Are there any developers with a current working maven build for a
downstream Solr installation? ie. Not a build for Solr itself, but a build
that brings in the core Solr server plus local plugins, third party plugins
etc?

I am in the process of updating one of our old builds (it builds both the
application and various shard instances) and have hit a stumbling block in
sourcing the dashboard static assets (everything under /webapp/web in
Solr's source).

Prior to the move away from being a webapp I could get them by exploding
the war from Maven Central.

In our very first foray into 5.x we had a local custom build to patch
SOLR-2649. We avoided solving this problem then by pushing the webapp into
our local Nexus as part of that build... but that wasn't a very good long
term choice.

So now I'm trying to work out the best long term approach to take here.
Ideas so far:

   1. Manually download the required zip and add it into our Nexus
   repository as a 3rd party artifact. Maven can source and extract anything
   it needs from here. This is where I'm currently leaning for simplicity, but
   the manual step required is annoying. It does have the advantage of causing
   a build failure straight away when a version upgrade occurs, prompting the
   developer to look into why.
   2. Move a copy of the static assets for the dashboard into our project
   and deploy them ourselves. This has the advantage of aligning our approach
   with the resources we already maintain in the project (like
   core.properties, schema.xml, solrconfig.xml, logging etc.). But I am
   worried that it is really fragile and developers will miss it during a
   version upgrade, resulting in the dashboard creeping out-of-date and
   (worse) introducing subtle bugs because of a version mismatch between the
   UI and the underlying server code.
   3. I'd like to think a long term approach would be for the core Solr
   build to ship a JAR (or any other assembly) to Maven Central like
   'solr-dashboard'... but I'm not sure how that aligns with the move away
   from Solr being considered a webapp. It seems a shame that all of the Java
   code ends up in Maven central, but the web layer dead-ends in the ant build.

I might be missing something really obvious and there is already a way to
do this. Is there some other distribution of the dashboard statics? Other
than the downloadable zip that is.

Ta,
Greg

[jira] [Commented] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND and mm is not explicitly set

2016-06-10 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15325638#comment-15325638
 ] 

Greg Pendlebury commented on SOLR-8812:
---

Sounds great. Add my thanks to those you've already received.

> ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND and mm is 
> not explicitly set
> -
>
> Key: SOLR-8812
> URL: https://issues.apache.org/jira/browse/SOLR-8812
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Affects Versions: 5.5
>Reporter: Ryan Steinberg
>Assignee: Steve Rowe
> Fix For: 5.6, 6.1, 5.5.2, master (7.0), 6.0.2
>
> Attachments: SOLR-8812-barbie.patch, SOLR-8812.patch, 
> SOLR-8812.patch, SOLR-8812.patch
>
>
> The edismax parser ignores Boolean OR in queries when q.op=AND. This behavior 
> is new to Solr 5.5.0 and an unexpected major change.
> Example:
>   "q": "id:12345 OR zz",
>   "defType": "edismax",
>   "q.op": "AND",
> where "12345" is a known document ID and "zz" is a string NOT present 
> in my data
> Version 5.5.0 produces zero results:
> "rawquerystring": "id:12345 OR zz",
> "querystring": "id:12345 OR zz",
> "parsedquery": "(+((id:12345 
> DisjunctionMaxQuery((text:zz)))~2))/no_coord",
> "parsedquery_toString": "+((id:12345 (text:zz))~2)",
> "explain": {},
> "QParser": "ExtendedDismaxQParser"
> Version 5.4.0 produces one result as expected
>   "rawquerystring": "id:12345 OR zz",
> "querystring": "id:12345 OR zz",
> "parsedquery": "(+(id:12345 
> DisjunctionMaxQuery((text:zz))))/no_coord",
> "parsedquery_toString": "+(id:12345 (text:zz))"
> "explain": {},
> "QParser": "ExtendedDismaxQParser"




[jira] [Commented] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND and mm is not explicitly set

2016-06-10 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15324063#comment-15324063
 ] 

Greg Pendlebury commented on SOLR-8812:
---

Sounds (tentatively) ok to me. I was quite concerned when you said it puts 
things back to pre-SOLR-2649 functionality, but from looking at what got 
committed it seems that q.op=OR is no longer hardcoded in setDefaultOperator() 
(which was fixed in SOLR-2649). I haven't executed anything, but this seems 
like a good step with regards to mm handling.

> ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND and mm is 
> not explicitly set
> -
>
> Key: SOLR-8812
> URL: https://issues.apache.org/jira/browse/SOLR-8812
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Affects Versions: 5.5
>Reporter: Ryan Steinberg
>Assignee: Steve Rowe
> Fix For: 6.1, 5.5.2, 6.0.2
>
> Attachments: SOLR-8812-barbie.patch, SOLR-8812.patch, 
> SOLR-8812.patch, SOLR-8812.patch
>
>




[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators

2016-05-19 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15292397#comment-15292397
 ] 

Greg Pendlebury commented on SOLR-2649:
---

[~rebeccatang], that sounds like expected behaviour. Your 'OR' operator is not 
being ignored; but rather, Solr translates OR operators into SHOULD occur flags 
(ie. optional search terms)... then, if 'mm' is set to 100%, this tells Solr 
that you require every optional search term to be present in the result set.

If you are explicitly setting 'mm' you should use a different value if you want 
OR operators to function. Also see SOLR-8812, which discusses setting a better 
default value for 'mm', particularly one that changes depending on the 'q.op' 
parameter. Of course that only applies in the case where you are not explicitly 
setting 'mm'.
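
To make the distinction between an explicit and a defaulted 'mm' concrete, here is a minimal sketch of how the effective value might be chosen. It is not the edismax source: the function name `effective_mm` and the `has_explicit_operators` flag are hypothetical, and the rule that explicit operators relax the default is my assumption about the direction SOLR-8812 takes rather than something confirmed here. The only firm point is the first branch: an explicitly supplied 'mm' is always used as given.

```python
from typing import Optional


def effective_mm(explicit_mm: Optional[str],
                 q_op: str,
                 has_explicit_operators: bool) -> str:
    """Pick the mm value a query would run with (illustrative only)."""
    # An explicitly supplied mm always wins; no default is consulted.
    if explicit_mm is not None:
        return explicit_mm
    # Assumption: with no explicit mm, operators such as OR relax the default
    # so that optional terms stay optional.
    if has_explicit_operators:
        return "0%"
    # Otherwise the default follows q.op.
    return "100%" if q_op.upper() == "AND" else "0%"
```

So `effective_mm("100%", "OR", True)` stays at `"100%"`, which is the situation described above, while leaving 'mm' unset lets the default depend on 'q.op'.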

> MM ignored in edismax queries with operators
> 
>
> Key: SOLR-2649
> URL: https://issues.apache.org/jira/browse/SOLR-2649
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Magnus Bergmark
>Assignee: Erick Erickson
> Fix For: 5.5, 6.0
>
> Attachments: SOLR-2649-with-Qop.patch, SOLR-2649-with-Qop.patch, 
> SOLR-2649.diff, SOLR-2649.patch, SOLR-2649.patch
>
>
> Hypothetical scenario:
>   1. User searches for "stocks oil gold" with MM set to "50%"
>   2. User adds "-stockings" to the query: "stocks oil gold -stockings"
>   3. User gets no hits since MM was ignored and all terms were AND-ed 
> together
> The behavior seems to be intentional, although the reason why is never 
> explained:
>   // For correct lucene queries, turn off mm processing if there
>   // were explicit operators (except for AND).
>   boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; 
> (lines 232-234 taken from 
> tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)
> This makes edismax unsuitable as a replacement for dismax; mm is one of the 
> primary features of dismax.
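
The quoted check can be restated as a small sketch to show why the hypothetical scenario loses its mm handling. This is a toy re-statement, not the parser's code: the function name `do_min_matched` is borrowed from the quoted variable, and counting operators by splitting on whitespace is a simplifying assumption (the real parser counts them while parsing clauses).

```python
def do_min_matched(query: str) -> bool:
    """Toy version of the quoted 3.3 check: mm processing stays on only when
    the query has no explicit operators other than AND."""
    tokens = query.split()
    num_or = sum(1 for t in tokens if t == "OR")
    num_not = sum(1 for t in tokens if t == "NOT")
    # A leading + or - on a term counts as an explicit operator.
    num_pluses = sum(1 for t in tokens if t.startswith("+") and len(t) > 1)
    num_minuses = sum(1 for t in tokens if t.startswith("-") and len(t) > 1)
    return (num_or + num_not + num_pluses + num_minuses) == 0
```

With this model, "stocks oil gold" keeps mm processing, while adding "-stockings" switches it off, which matches steps 1 to 3 of the scenario.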




[jira] [Commented] (SOLR-8955) ReplicationHandler should throttle across all requests instead of for each client

2016-04-08 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8955?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15233113#comment-15233113
 ] 

Greg Pendlebury commented on SOLR-8955:
---

I like the idea, but maybe it should be configurable? If the master has 
multiple NICs, then hard-coding an arbitrary limit because two unrelated slaves 
on different network interfaces are both online would actually be more of a 
hindrance than an improvement.

> ReplicationHandler should throttle across all requests instead of for each 
> client
> -
>
> Key: SOLR-8955
> URL: https://issues.apache.org/jira/browse/SOLR-8955
> Project: Solr
>  Issue Type: Improvement
>  Components: replication (java), SolrCloud
>Reporter: Shalin Shekhar Mangar
>  Labels: difficulty-easy, impact-medium, newdev
> Fix For: master, 6.1
>
>
> SOLR-6485 added the ability to throttle the speed of replication but the 
> implementation rate limits each request. So, e.g., if maxWriteMBPerSec is 1 
> and 5 slaves request full replication, then the effective transfer rate from 
> the master is 5 MB/second, which is often not what is desired.
> I propose to make the rate limit global (across all replication requests) 
> instead.
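
The arithmetic in the description, and the configurable variant suggested in the comment, can be sketched roughly as follows. This is a toy calculation rather than ReplicationHandler code; the function name `total_transfer_rate` and the `global_limit` flag are hypothetical and stand in for whatever configuration option might be chosen.

```python
def total_transfer_rate(max_write_mb_per_sec: float,
                        num_requests: int,
                        global_limit: bool) -> float:
    """Aggregate outbound replication rate from the master, in MB/second."""
    if num_requests <= 0:
        return 0.0
    if global_limit:
        # One shared budget: the total never exceeds the configured limit.
        return max_write_mb_per_sec
    # Per-request limiting: every request gets the full limit, so the total
    # grows with the number of concurrent requests.
    return max_write_mb_per_sec * num_requests
```

With a limit of 1 and 5 slaves, the per-request mode gives 5 MB/second in total and the global mode gives 1 MB/second. Keeping the mode as a flag reflects the multiple-NIC case, where the per-request behaviour may still be the one an operator wants.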




[jira] [Commented] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND

2016-04-03 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15223581#comment-15223581
 ] 

Greg Pendlebury commented on SOLR-8812:
---

[~erickerickson], personally, I am ambivalent with regards to timing and 
versions. I am still not convinced there is actually an issue here, but I don't 
want to be a dick and dismiss it out-of-hand.

The patches provided are simply about choosing default parameter values that 
disrupt the least number of users who did not have mm set to an appropriate 
value. Any user (risky, broad generalisation incoming) who puts a boolean OR 
operator into an edismax query string would not want mm=100%, but that is what 
is happening here.

> ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND
> 
>
> Key: SOLR-8812
> URL: https://issues.apache.org/jira/browse/SOLR-8812
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Affects Versions: 5.5
>Reporter: Ryan Steinberg
>Assignee: Erick Erickson
>Priority: Blocker
> Fix For: 6.0, 5.5.1
>
> Attachments: SOLR-8812-barbie.patch, SOLR-8812.patch, SOLR-8812.patch
>
>




[jira] [Updated] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND

2016-04-03 Thread Greg Pendlebury (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Pendlebury updated SOLR-8812:
--
Attachment: SOLR-8812-barbie.patch

Adding a 'hair ties -barbie' example to unit tests. Not sure it demonstrates 
anything new, but it does work as I would expect.

I can't get git to generate a combined patch the way I would have in svn... my 
git-fu is weak.

> ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND
> 
>
> Key: SOLR-8812
> URL: https://issues.apache.org/jira/browse/SOLR-8812
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Affects Versions: 5.5
>Reporter: Ryan Steinberg
>Assignee: Erick Erickson
>Priority: Blocker
> Fix For: 6.0, 5.5.1
>
> Attachments: SOLR-8812-barbie.patch, SOLR-8812.patch, SOLR-8812.patch
>
>
> The edismax parser ignores Boolean OR in queries when q.op=AND. This behavior 
> is new to Solr 5.5.0 and an unexpected major change.
> Example:
>   "q": "id:12345 OR zz",
>   "defType": "edismax",
>   "q.op": "AND",
> where "12345" is a known document ID and "zz" is a string NOT present 
> in my data
> Version 5.5.0 produces zero results:
> "rawquerystring": "id:12345 OR zz",
> "querystring": "id:12345 OR zz",
> "parsedquery": "(+((id:12345 
> DisjunctionMaxQuery((text:zz)))~2))/no_coord",
> "parsedquery_toString": "+((id:12345 (text:zz))~2)",
> "explain": {},
> "QParser": "ExtendedDismaxQParser"
> Version 5.4.0 produces one result as expected
>   "rawquerystring": "id:12345 OR zz",
> "querystring": "id:12345 OR zz",
> "parsedquery": "(+(id:12345 
> DisjunctionMaxQuery((text:zz))))/no_coord",
> "parsedquery_toString": "+(id:12345 (text:zz))"
> "explain": {},
> "QParser": "ExtendedDismaxQParser"




[jira] [Commented] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND

2016-03-31 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15221079#comment-15221079
 ] 

Greg Pendlebury commented on SOLR-8812:
---

I also confirmed (for my own sanity) that q.op does indeed influence the 
default value of mm, as per [~janhoy]. Personally I don't like that, and 
perhaps it isn't relevant anymore since SOLR-2649... but I left it alone.





[jira] [Updated] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND

2016-03-31 Thread Greg Pendlebury (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Pendlebury updated SOLR-8812:
--
Attachment: SOLR-8812.patch

Attaching a possible 'fix' that defaults mm to 0% if the user has declared no 
explicit mm, but has boolean operators in their query.

First time I have generated a patch using git, so hopefully it is ok.





[jira] [Commented] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND

2016-03-31 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15220925#comment-15220925
 ] 

Greg Pendlebury commented on SOLR-8812:
---

Ok, I will try to find some time over the next week or so. I freely confess it 
doesn't look great on a Friday afternoon and school holidays begin here after 
next week. It might be a rough contribution someone else can carry over the 
line.

With regards to mixed cases of q.op and mm where users are explicitly setting 
them, I think they are already covered if you look in the unit test 
testDefaultOperatorWithMm(). The problem here seems to be the use case where 
people do not explicitly set mm and fall back to the default. This is treading 
on some expected behaviour from existing users.





[jira] [Commented] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND

2016-03-31 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15220854#comment-15220854
 ] 

Greg Pendlebury commented on SOLR-8812:
---

I don't know that what we are talking about here is a 'workaround' at all. Solr 
is doing exactly what it is being asked to do. I know it is disrupting an 
existing user base, so it warrants discussion and maybe even a 'fix'... but the 
existing user base were leaving a non-configured parameter at its default value 
(which probably didn't match their use case) and it only worked because the 
parameter was being ignored by edismax. The fact that the parameter was ignored 
is what caused the real bugs reported in SOLR-2649.

I think there has always been confusion over how this works under the hood, and 
that still continues. q.op and mm apply to two different parts of the query, 
and each of them has other factors that come into play.
 * q.op is a boolean operator, which happens pre-parse (or in the very earliest 
stages of parsing)
 * mm applies to (top level) clauses which have the SHOULD occur flag *after* 
Solr translates all the boolean operators
 * if mm is not explicitly set, the default value is determined by q.op (I 
haven't verified this, but that is Jan's input above). The old documentation 
says the default is always 100%, but I have always set it explicitly, so I 
have no experience with the default.
 * Solr translates boolean operators into occurs flags differently depending on 
the value of q.op. In particular q.op=AND causes non-intuitive generation of 
occurs flags if looked at from a purely boolean perspective.
 * mm does not make much sense at all if you think about search as a purely 
boolean query (ie. the result either matches or doesn't) instead of occurs 
flags (ie. the score of the result is either higher or lower)
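
To make the bullets above concrete, here is a minimal sketch of how mm interacts with occurs flags. It is a toy model, not Solr or Lucene code: the class ShouldClauseSketch, its Clause type and the matches method are hypothetical names, and the rule that at least one SHOULD clause has to match when there is no MUST clause is an assumption about standard Lucene boolean query behaviour.

```java
import java.util.List;
import java.util.Set;

// Toy model of top-level clauses with occurs flags (hypothetical, not Solr's classes).
final class ShouldClauseSketch {
    enum Occur { MUST, SHOULD, MUST_NOT }

    static final class Clause {
        final String term;
        final Occur occur;

        Clause(String term, Occur occur) {
            this.term = term;
            this.occur = occur;
        }
    }

    // Returns true if a document holding docTerms satisfies the clauses,
    // given the minimum number of SHOULD clauses that must match (mm).
    static boolean matches(List<Clause> clauses, Set<String> docTerms, int minShouldMatch) {
        int shouldHits = 0;
        boolean hasMust = false;
        for (Clause c : clauses) {
            boolean hit = docTerms.contains(c.term);
            switch (c.occur) {
                case MUST:
                    hasMust = true;
                    if (!hit) return false;
                    break;
                case MUST_NOT:
                    if (hit) return false;
                    break;
                case SHOULD:
                    // mm only ever counts these clauses.
                    if (hit) shouldHits++;
                    break;
            }
        }
        // Assumption: with no MUST clause, at least one SHOULD clause has to match.
        int required = Math.max(minShouldMatch, hasMust ? 0 : 1);
        return shouldHits >= required;
    }

    public static void main(String[] args) {
        // "id:12345 OR zz" with both terms carrying the SHOULD flag.
        List<Clause> query = List.of(
            new Clause("id:12345", Occur.SHOULD),
            new Clause("text:zz", Occur.SHOULD));
        Set<String> doc = Set.of("id:12345");
        System.out.println(matches(query, doc, 2)); // mm=100% of two clauses
        System.out.println(matches(query, doc, 0)); // mm=0%
    }
}
```

With the example from this issue (both clauses SHOULD, only id:12345 present in the document), the sketch prints false for a minimum of 2 and true for a minimum of 0.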

So now that SOLR-2649 has come along, it slightly muddies the water because:
 * q.op is no longer hard coded to OR. Pre-patch the user could say q.op=AND, 
but it didn't do anything to the query
 * The presence of an operator no longer turns off the mm feature

*My take on the issue is that users who want to use boolean operators in 
edismax should pay attention to the mm parameter, and make sure their choice 
matches their use case*. Previously they didn't have to... but the presence of 
the boolean operators when using edismax was buggy (? debatable... it has been 
argued that it simply wasn't the use case edismax was first written for).

Having said that, IF anything was to change, I would simply play subtly with 
choosing the default value of mm. Maybe something like this:

IF (the query contains a boolean operator) AND (mm has not been explicitly set) 
THEN (mm = 0%)

It is a tweak on the work Jan did in SOLR-2649, so that instead of turning off 
mm in response to a boolean operator being present, we instead influence the 
default value. We still let users ultimately set up their parameters however 
they want though. If the user has a use case that includes both boolean 
parameters and mm logic... have fun.
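
The IF/THEN rule above could be sketched roughly as follows. This is only an illustration of the proposal, not the attached patch: DefaultMmSketch, effectiveMm and the fallbackDefault parameter are hypothetical names, and the regular expression is a deliberately crude stand-in for however the parser actually detects operators.

```java
import java.util.regex.Pattern;

// Illustration of "only influence the default": an explicit mm always wins.
final class DefaultMmSketch {
    // Crude operator detection (assumption): the words AND/OR/NOT,
    // or a term prefixed with + or -.
    private static final Pattern OPERATOR =
        Pattern.compile("(^|\\s)(AND|OR|NOT)(\\s|$)|(^|\\s)[+-]\\S");

    // explicitMm is null when the user did not set mm;
    // fallbackDefault is whatever default would otherwise apply.
    static String effectiveMm(String query, String explicitMm, String fallbackDefault) {
        if (explicitMm != null) {
            return explicitMm;
        }
        if (OPERATOR.matcher(query).find()) {
            return "0%";
        }
        return fallbackDefault;
    }

    public static void main(String[] args) {
        System.out.println(effectiveMm("id:12345 OR zz", null, "100%"));
        System.out.println(effectiveMm("id:12345 OR zz", "100%", "100%"));
        System.out.println(effectiveMm("hair ties", null, "100%"));
    }
}
```

The point of the design is that a user who sets mm explicitly is never second-guessed; only the unset case changes.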





[jira] [Commented] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND

2016-03-30 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15219133#comment-15219133
 ] 

Greg Pendlebury commented on SOLR-8812:
---

Thanks. Hopefully that is ok. I just installed git and started cloning trunk... 
now to upgrade to Java 8.

I think it is all working as intended, it is just that there is a confusing 
legacy of not having to worry about what mm was set to for some use cases. 
SOLR-2649 will force people to check what the parameters are, but all queries 
are now supported.

It would be nice if it was less disruptive, but given that pre-patch there was 
no way to get edismax to do certain queries, no matter what parameters you set, 
I think it is still an improvement.





[jira] [Commented] (SOLR-8812) ExtendedDismaxQParser (edismax) ignores Boolean OR when q.op=AND

2016-03-30 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8812?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15219086#comment-15219086
 ] 

Greg Pendlebury commented on SOLR-8812:
---

I am happy to take a look at any issues, since I was involved in SOLR-2649. I 
need to get a new copy of the code first, but in the interim, can someone 
confirm that explicitly setting mm to 0 does not fix this? I believe mm 
defaults to 100%, so that may be the real culprit, as opposed to q.op=AND. 
Before SOLR-2649 was resolved, setting an OR operator would have caused mm to 
be ignored. Now it will use the default value unless you set it explicitly.

Our production servers are using 5.1 with SOLR-2649 applied, and we have 
q.op=AND, with perfectly functional OR operators and mm=0%. All of the obvious 
queries work, including the cases referenced above.

From memory there are a lot of subtle cliffs to fall off here, such as making 
sure we are talking about top level clauses and ultimately remembering that 
Solr does not use boolean logic... and there are some edge cases where it 
simply doesn't work the same way as the occurs flags. SHOULD vs OR is the main 
culprit.





[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators

2015-12-13 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15055202#comment-15055202
 ] 

Greg Pendlebury commented on SOLR-2649:
---

[~erickerickson] thanks for this!

> MM ignored in edismax queries with operators
> 
>
> Key: SOLR-2649
> URL: https://issues.apache.org/jira/browse/SOLR-2649
> Project: Solr
>  Issue Type: Improvement
>  Components: query parsers
>Reporter: Magnus Bergmark
>Assignee: Erick Erickson
> Fix For: 4.9, Trunk
>
> Attachments: SOLR-2649-with-Qop.patch, SOLR-2649-with-Qop.patch, 
> SOLR-2649.diff, SOLR-2649.patch, SOLR-2649.patch
>
>
> Hypothetical scenario:
>   1. User searches for "stocks oil gold" with MM set to "50%"
>   2. User adds "-stockings" to the query: "stocks oil gold -stockings"
>   3. User gets no hits since MM was ignored and all terms were AND-ed 
> together
> The behavior seems to be intentional, although the reason why is never 
> explained:
>   // For correct lucene queries, turn off mm processing if there
>   // were explicit operators (except for AND).
>   boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; 
> (lines 232-234 taken from 
> tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)
> This makes edismax unsuitable as a replacement for dismax; mm is one of the 
> primary features of dismax.




[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators

2015-12-03 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15039730#comment-15039730
 ] 

Greg Pendlebury commented on SOLR-2649:
---

I just ran it against our test system (patched Solr 5.1.0): (A OR B OR C) "D E"

1) Using mm=100%, q.op=AND and searching just the fulltext field. RAW debug:
{code}
(+(+(DisjunctionMaxQuery((fulltext:a)) DisjunctionMaxQuery((fulltext:b)) 
DisjunctionMaxQuery((fulltext:c))) +DisjunctionMaxQuery((fulltext:\"d e\"))))
{code}
I read that as:
{code}
+(a b c) +("d e")
{code}
which looks correct

2) switching to q.op=OR. RAW debug:
{code}
(+(((DisjunctionMaxQuery((fulltext:a)) DisjunctionMaxQuery((fulltext:b)) 
DisjunctionMaxQuery((fulltext:c))) DisjunctionMaxQuery((fulltext:\"d e\")))~2))
{code}
I read that as:
{code}
((a b c) "d e")~2
{code}
Which again looks correct... but we don't generally use OR, so I could be wrong

3) Finally, lowered mm to 50%, again with q.op=OR. RAW debug:
{code}
(+(((DisjunctionMaxQuery((fulltext:a)) DisjunctionMaxQuery((fulltext:b)) 
DisjunctionMaxQuery((fulltext:c))) DisjunctionMaxQuery((fulltext:\"d e\")))~1))
{code}
I read that as:
{code}
((a b c) "d e")~1
{code}
Still looks good.
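
The ~2 and ~1 suffixes in cases 2 and 3 are the minimum number of optional (SHOULD) top-level clauses that have to match; here there are two such clauses, (a b c) and "d e". A rough sketch of the arithmetic, assuming positive percentages only and that the result is rounded down as the Solr mm documentation describes. MinShouldMatchSketch and fromPercent are hypothetical names, not Solr's API.

```java
// Simplified model of turning an mm percentage into a clause count.
final class MinShouldMatchSketch {
    // optionalClauseCount: number of top-level SHOULD clauses.
    // percent: a positive mm percentage such as 50 or 100.
    static int fromPercent(int optionalClauseCount, int percent) {
        // Integer division rounds the result down.
        return optionalClauseCount * percent / 100;
    }

    public static void main(String[] args) {
        System.out.println(fromPercent(2, 100)); // case 2: mm=100%
        System.out.println(fromPercent(2, 50));  // case 3: mm=50%
    }
}
```

In case 1 (q.op=AND) both top-level clauses carry the MUST flag, so there are no optional clauses for mm to count and no ~N suffix appears.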






[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators

2015-12-03 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038688#comment-15038688
 ] 

Greg Pendlebury commented on SOLR-2649:
---

Mine shows as 18th Feb, but I assume that is just timezones. Assuming we are 
talking about the same patch, then, no, that is my patch (both of the 
'with-Qop' patches are from me). Jan submitted the earlier 2014 patch, which I 
used as a baseline to add the q.op change as well.





[jira] [Comment Edited] (SOLR-2649) MM ignored in edismax queries with operators

2015-12-03 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038626#comment-15038626
 ] 

Greg Pendlebury edited comment on SOLR-2649 at 12/3/15 9:31 PM:


I tried Jan's patch, and (whilst it is technically correct) it did not improve 
the usefulness of edismax without also addressing how q.op is handled. We 
continued to see absurd search results that failed UAT.

The combined patch with both has been on our prod servers since May 2014 
without any problems, but I have not heard any feedback from others that might 
have tried it. The corpus is nearly 200 million fulltext newspaper articles: 
http://trove.nla.gov.au/newspaper/result?q=


was (Author: gpendleb):
I tried Jan's patch, and (whilst it is technically correct) it did not improve 
the usefulness of edismax without also addressing how q.op is handled. We 
continued to see absurd search results that failed UAT.

The combined patch with both has been on our prod servers since May 2014 
without any problems, but I have not heard any feedback from others that might 
have tried it. The corpus is nearly 200 million fulltext newspapers: 
http://trove.nla.gov.au/newspaper/result?q=





[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators

2015-12-03 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15038626#comment-15038626
 ] 

Greg Pendlebury commented on SOLR-2649:
---

I tried Jan's patch, and (whilst it is technically correct) it did not improve 
the usefulness of edismax without also addressing how q.op is handled. We 
continued to see absurd search results that failed UAT.

The combined patch with both has been on our prod servers since May 2014 
without any problems, but I have not heard any feedback from others that might 
have tried it. The corpus is nearly 200 million fulltext newspapers: 
http://trove.nla.gov.au/newspaper/result?q=





[jira] [Commented] (SOLR-3274) ZooKeeper related SolrCloud problems

2015-10-28 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-3274?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14979672#comment-14979672
 ] 

Greg Pendlebury commented on SOLR-3274:
---

FWIW we ran into this issue today as well, and nothing worked until ZK was 
restarted. I would love to think that Solr could detect this issue, but it 
smells like a ZK bug to me.

> ZooKeeper related SolrCloud problems
> 
>
> Key: SOLR-3274
> URL: https://issues.apache.org/jira/browse/SOLR-3274
> Project: Solr
>  Issue Type: Bug
>  Components: SolrCloud
>Affects Versions: 4.0-ALPHA
> Environment: Any
>Reporter: Per Steffensen
>
> Same setup as in SOLR-3273. Well, if I have to tell the entire truth, we have 7 
> Solr servers running 28 slices of the same collection (collA). All slices 
> have one replica (two shards all in all: leader + replica), so 56 cores all in 
> all (8 shards on each Solr instance). But anyway...
> Besides the problem reported in SOLR-3273, the system seems to run fine under 
> high load for several hours, but eventually errors like the ones shown below 
> start to occur. I might be wrong, but they all seem to indicate some kind of 
> instability in the collaboration between Solr and ZooKeeper. I have to say 
> that I haven't been there to check ZooKeeper "at the moment where those 
> exceptions occur", but basically I don't believe the exceptions occur because 
> ZooKeeper is not running stably. At least when I go and check ZooKeeper 
> through other "channels" (e.g. my Eclipse ZK plugin), it is always accepting 
> my connection and generally seems to be doing fine.
> Exception 1) Often the first error we see in solr.log is something like this
> {code}
> Mar 22, 2012 5:06:43 AM org.apache.solr.common.SolrException log
> SEVERE: org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates are disabled.
>     at org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:678)
>     at org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:250)
>     at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:140)
>     at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:80)
>     at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:59)
>     at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
>     at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
>     at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:407)
>     at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
>     at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1212)
>     at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:399)
>     at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
>     at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:182)
>     at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:766)
>     at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:450)
>     at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
>     at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
>     at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
>     at org.mortbay.jetty.Server.handle(Server.java:326)
>     at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:542)
>     at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:945)
>     at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:756)
>     at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:218)
>     at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
>     at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:228)
>     at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:582)
> {code}
> I believe this error basically occurs because SolrZkClient.isConnected 
> reports false, which means that its internal "keeper.getState" does not 
> return ZooKeeper.States.CONNECTED. I'm pretty sure that it has been CONNECTED 
> for a long time, since this error starts occurring after s

[jira] [Commented] (SOLR-8016) CloudSolrClient has extremely verbose error logging

2015-09-08 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735813#comment-14735813
 ] 

Greg Pendlebury commented on SOLR-8016:
---

I haven't looked at the innards of the method enough to say for sure. I know in 
our particular use case it is fruitless to keep trying. The nodes are online, 
but cannot answer in the way expected:

{code}
ERROR o.a.s.c.s.i.CloudSolrClient - Request to collection trove failed due to 
(500) org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: 
Error from server at /solr/trove: Expected mime type 
application/octet-stream but got text/html. 


Error 500 {msg=SolrCore 'trove' is not available due to init failure: 
Index locked for write for core 
trove,trace=org.apache.solr.common.SolrException: SolrCore 'trove' is not 
available due to init failure: Index locked for write for core trove
{code}

And then lots and lots more html output.

The Exception that bubbles up to our code is more than enough for us to know 
where to start looking:
{code}
ERROR a.g.n.n.c.r.SolrService - Solr search failed: No live SolrServers 
available to handle this request:[]
{code}

> CloudSolrClient has extremely verbose error logging
> ---
>
> Key: SOLR-8016
> URL: https://issues.apache.org/jira/browse/SOLR-8016
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>    Affects Versions: 5.2.1, Trunk
>Reporter: Greg Pendlebury
>Priority: Minor
>  Labels: easyfix
>
> CloudSolrClient has this error logging line which is fairly annoying:
> {code}
>   log.error("Request to collection {} failed due to ("+errorCode+
>   ") {}, retry? "+retryCount, collection, rootCause.toString());
> {code}
> Given that this is a client library and then gets embedded into other 
> applications this line is very problematic to handle gracefully. In today's 
> example I was looking at, every failed search was logging over 100 lines, 
> including the full HTML response from the responding node in the cluster.
> The resulting SolrServerException that comes out to our application is 
> handled appropriately but we can't stop this class complaining in logs 
> without suppressing the entire ERROR channel, which we don't want to do. This 
> is the only direct line writing to the log I could find in the client, so we 
> _could_ suppress errors, but that just feels dirty, and fragile for the 
> future.
> From looking at the code I am fairly certain it is not as simple as throwing 
> an exception instead of logging... it is right in the middle of the method. I 
> suspect the simplest answer is adding a marker 
> (http://www.slf4j.org/api/org/slf4j/Marker.html) to the logging call.
> Then solrj users can choose what to do with these log entries. I don't know 
> if there is a broader strategy for handling this that I am ignorant of; 
> apologies if that is the case.



--
This message was sent by Atlassian JIRA
(v6.3.4#6332)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-8016) CloudSolrClient has extremely verbose error logging

2015-09-08 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-8016?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735752#comment-14735752
 ] 

Greg Pendlebury commented on SOLR-8016:
---

Lowering the level to INFO would be good in our case, although if, as you say, 
it will eventually error after all the retries, that would just delay the 
event... unless the error is thrown instead of logged. The Solr nodes were in a 
bad way and needed intervention from sysadmins because of locked index segments 
from a graceless shutdown.

Under this scenario, the UI clients were logging enormous amounts of useless 
content ('rootCause.toString()'), making it very difficult to find other lines 
in the log. Because the client also throws Exceptions, we had already 
gracefully handled the outage by degrading functionality.

With regard to Markers, I have never used them personally, but before I 
suggested them I looked at the fact that both log4j and logback support them 
via slf4j. This covers both the solr default (log4j) and the binding we use in 
production (logback) so I am selfishly happy with the possibility... and I 
think it is the simplest change. I didn't want to propose a rethink of the 
logging, or that method's flow, but I am happy if this prompts that as well.

> CloudSolrClient has extremely verbose error logging
> ---
>
> Key: SOLR-8016
> URL: https://issues.apache.org/jira/browse/SOLR-8016
> Project: Solr
>  Issue Type: Improvement
>  Components: clients - java
>Affects Versions: 5.2.1, Trunk
>Reporter: Greg Pendlebury
>Priority: Minor
>  Labels: easyfix
>
> CloudSolrClient has this error logging line which is fairly annoying:
> {code}
>   log.error("Request to collection {} failed due to ("+errorCode+
>   ") {}, retry? "+retryCount, collection, rootCause.toString());
> {code}
> Given that this is a client library and then gets embedded into other 
> applications this line is very problematic to handle gracefully. In today's 
> example I was looking at, every failed search was logging over 100 lines, 
> including the full HTML response from the responding node in the cluster.
> The resulting SolrServerException that comes out to our application is 
> handled appropriately but we can't stop this class complaining in logs 
> without suppressing the entire ERROR channel, which we don't want to do. This 
> is the only direct line writing to the log I could find in the client, so we 
> _could_ suppress errors, but that just feels dirty, and fragile for the 
> future.
> From looking at the code I am fairly certain it is not as simple as throwing 
> an exception instead of logging... it is right in the middle of the method. I 
> suspect the simplest answer is adding a marker 
> (http://www.slf4j.org/api/org/slf4j/Marker.html) to the logging call.
> Then solrj users can choose what to do with these log entries. I don't know 
> if there is a broader strategy for handling this that I am ignorant of; 
> apologies if that is the case.




[jira] [Created] (SOLR-8016) CloudSolrClient has extremely verbose error logging

2015-09-07 Thread Greg Pendlebury (JIRA)

Greg Pendlebury created SOLR-8016:
-

 Summary: CloudSolrClient has extremely verbose error logging
 Key: SOLR-8016
 URL: https://issues.apache.org/jira/browse/SOLR-8016
 Project: Solr
  Issue Type: Improvement
  Components: clients - java
Affects Versions: 5.2.1, Trunk
Reporter: Greg Pendlebury
Priority: Minor


CloudSolrClient has this error logging line which is fairly annoying:

{code}
  log.error("Request to collection {} failed due to ("+errorCode+
  ") {}, retry? "+retryCount, collection, rootCause.toString());
{code}

Given that this is a client library and then gets embedded into other 
applications this line is very problematic to handle gracefully. In today's 
example I was looking at, every failed search was logging over 100 lines, 
including the full HTML response from the responding node in the cluster.

The resulting SolrServerException that comes out to our application is handled 
appropriately but we can't stop this class complaining in logs without 
suppressing the entire ERROR channel, which we don't want to do. This is the 
only direct line writing to the log I could find in the client, so we _could_ 
suppress errors, but that just feels dirty, and fragile for the future.

From looking at the code I am fairly certain it is not as simple as throwing 
an exception instead of logging... it is right in the middle of the method. I 
suspect the simplest answer is adding a marker 
(http://www.slf4j.org/api/org/slf4j/Marker.html) to the logging call.

Then solrj users can choose what to do with these log entries. I don't know if 
there is a broader strategy for handling this that I am ignorant of; apologies 
if that is the case.




[jira] [Updated] (SOLR-2649) MM ignored in edismax queries with operators

2015-02-17 Thread Greg Pendlebury (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Pendlebury updated SOLR-2649:
--
Attachment: SOLR-2649-with-Qop.patch

Replacement patch for 'SOLR-2649-with-Qop.patch' against current trunk.

> MM ignored in edismax queries with operators
> 
>
> Key: SOLR-2649
> URL: https://issues.apache.org/jira/browse/SOLR-2649
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Reporter: Magnus Bergmark
>Assignee: Erick Erickson
>Priority: Minor
> Fix For: 4.9, Trunk
>
> Attachments: SOLR-2649-with-Qop.patch, SOLR-2649-with-Qop.patch, 
> SOLR-2649.diff, SOLR-2649.patch
>
>
> Hypothetical scenario:
>   1. User searches for "stocks oil gold" with MM set to "50%"
>   2. User adds "-stockings" to the query: "stocks oil gold -stockings"
>   3. User gets no hits since MM was ignored and all terms were AND-ed 
> together
> The behavior seems to be intentional, although the reason why is never 
> explained:
>   // For correct lucene queries, turn off mm processing if there
>   // were explicit operators (except for AND).
>   boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; 
> (lines 232-234 taken from 
> tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)
> This makes edismax unsuitable as a replacement for dismax; mm is one of the 
> primary features of dismax.




[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators

2015-02-15 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14322219#comment-14322219
 ] 

Greg Pendlebury commented on SOLR-2649:
---

Thanks Erick,

I can recreate the SOLR-2649-with-Qop.patch this week (today looks pretty busy, 
sorry). Just updating trunk now. Jan's SOLR-2649 patch is technically correct 
from everything I have looked at, but it actually makes the eDismax parser very 
confusing for novice end users. Our investigation seemed to indicate that the 
problems stem from the steps taken by Lucene/Solr to convert boolean OR 
operators to the SHOULD occur flags (but I am running off memory here). This is 
made very obvious by the fact that eDismax is hard-coded to use OR as the 
default operator. We were simply tea leaf gazing, but our assumption is that 
this confusion may have been the original cause for disabling 'mm' when 
operators were present.
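
As a rough illustration of the check this comment refers to (the 
'doMinMatched' line quoted in the issue description), the sketch below counts 
operators with a naive whitespace split. The class name and the tokenisation 
are assumptions made for illustration only; the real parser works on its own 
clause structures.

```java
// Illustrative sketch only: the class name and the whitespace tokenisation
// are assumptions, not the edismax parser's real clause handling.
final class DoMinMatchedSketch {

    // Mirrors the quoted 3.3 check: mm is applied only when the query
    // contains no OR, NOT, '+' or '-' operators.
    static boolean doMinMatched(String query) {
        int numOR = 0;
        int numNOT = 0;
        int numPluses = 0;
        int numMinuses = 0;
        for (String token : query.trim().split("\\s+")) {
            if (token.equals("OR")) {
                numOR++;
            } else if (token.equals("NOT")) {
                numNOT++;
            } else if (token.startsWith("+")) {
                numPluses++;
            } else if (token.startsWith("-")) {
                numMinuses++;
            }
        }
        return (numOR + numNOT + numPluses + numMinuses) == 0;
    }

    public static void main(String[] args) {
        // No operators, so mm processing stays on.
        System.out.println(doMinMatched("stocks oil gold"));            // true
        // A single '-' term switches mm processing off for the whole query.
        System.out.println(doMinMatched("stocks oil gold -stockings")); // false
    }
}
```

With the second query the mm setting is skipped entirely, which is the 
scenario described in the ticket.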

So the patch we submitted simply does the same as Jan's, but also makes eDismax 
read the default operator from the 'q.op' parameter. With access to both 
parameters we have always been able to respond meaningfully to the queries our 
users are submitting.

> MM ignored in edismax queries with operators
> 
>
> Key: SOLR-2649
> URL: https://issues.apache.org/jira/browse/SOLR-2649
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Reporter: Magnus Bergmark
>Assignee: Erick Erickson
>Priority: Minor
> Fix For: 4.9, Trunk
>
> Attachments: SOLR-2649-with-Qop.patch, SOLR-2649.diff, SOLR-2649.patch
>
>
> Hypothetical scenario:
>   1. User searches for "stocks oil gold" with MM set to "50%"
>   2. User adds "-stockings" to the query: "stocks oil gold -stockings"
>   3. User gets no hits since MM was ignored and all terms were AND-ed 
> together
> The behavior seems to be intentional, although the reason why is never 
> explained:
>   // For correct lucene queries, turn off mm processing if there
>   // were explicit operators (except for AND).
>   boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; 
> (lines 232-234 taken from 
> tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)
> This makes edismax unsuitable as a replacement for dismax; mm is one of the 
> primary features of dismax.




[jira] [Comment Edited] (SOLR-5722) Add catenateShingles option to WordDelimiterFilter

2015-02-12 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903781#comment-13903781
 ] 

Greg Pendlebury edited comment on SOLR-5722 at 2/13/15 3:23 AM:


The link to the doco is working for me today so I took a quick look. I think 
the other reason that the HyphenatedWordsFilter is not suitable is that it 
removes the hyphen from the material assuming that it can only have one 
meaning. The specific circumstance I am considering is when the hyphen is part 
of a legitimately hyphenated word that just happens to break across a line wrap, 
e.g. 'up-\{\n\}to-date'

The HyphenatedWordsFilter would turn this into 'upto-date', and cause user 
searches of 'up to date' to not match, since no filters later in the chain can 
really pull 'upto' apart again. Whereas the 'catenateShingles' option is 
intended to preserve the word delimiter and provide all the permutations a user 
might type to find that term: "up to date", "upto date", "up todate", "uptodate"
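
To make the list of permutations concrete, here is a minimal sketch that 
joins adjacent word parts with either a space or nothing. It is not the 
WordDelimiterFilter patch itself (which emits tokens with positions); the 
class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Illustrative only: models the strings a user might type, not the token
// stream produced by the actual filter.
final class CatenateShinglesSketch {

    // Returns every way of joining adjacent parts with a space or nothing.
    static List<String> permutations(List<String> parts) {
        List<String> results = new ArrayList<>();
        if (parts.isEmpty()) {
            return results;
        }
        int gaps = parts.size() - 1;
        // Each bit of 'mask' decides one gap: 0 keeps a space, 1 catenates.
        for (int mask = 0; mask < (1 << gaps); mask++) {
            StringBuilder sb = new StringBuilder(parts.get(0));
            for (int i = 0; i < gaps; i++) {
                if ((mask & (1 << i)) == 0) {
                    sb.append(' ');
                }
                sb.append(parts.get(i + 1));
            }
            results.add(sb.toString());
        }
        return results;
    }

    public static void main(String[] args) {
        // Prints: up to date, upto date, up todate, uptodate (one per line).
        for (String s : permutations(Arrays.asList("up", "to", "date"))) {
            System.out.println(s);
        }
    }
}
```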


was (Author: gpendleb):
The link to the doco is working for me today so I took a quick look. I think 
the other reason that the HyphenatedWordsFilter is not suitable is that it 
removes the hyphen from the material assuming that it can only have one 
meaning. The specific circumstances I am considering is when the hyphen is part 
of a legitimately hyphenated word that just happen to break across a line wrap. 
eg. 'up-\{\n\}to-date'

The HyphenatedWordsFilter would turn this into 'upto-date', and cause user 
searches of 'up to date' to not match, since no filters later in the change can 
really pull 'upto' apart again. Whereas the 'catenateShingles' option is 
intended to preserve the word delimiter and provide all the permutations a user 
might type to find that term: "up to date", "upto date", "up todate", "uptodate"

> Add catenateShingles option to WordDelimiterFilter
> --
>
> Key: SOLR-5722
> URL: https://issues.apache.org/jira/browse/SOLR-5722
> Project: Solr
>  Issue Type: Improvement
>Reporter: Greg Pendlebury
>Priority: Minor
>  Labels: filter, newbie, patch
> Attachments: WDFconcatShingles.patch
>
>
> Apologies if I put this in the wrong spot. I'm attaching a patch (against 
> current trunk) that adds support for a 'catenateShingles' option to the 
> WordDelimiterFilter. 
> We (National Library of Australia - NLA) are currently maintaining this as an 
> internal modification to the Filter, but I believe it is generic enough to 
> contribute upstream.
> Description:
> =
> {code}
> /**
>  * NLA Modification to the standard word delimiter to support various
>  * hyphenation use cases. Primarily driven by requirements for
>  * newspapers where words are often broken across line endings.
>  *
>  *  eg. "hyphenated-surname" is printed across a line ending and
>  * turns out like "hyphen-ated-surname" or "hyphenated-sur-name".
>  *
>  *  In this scenario the stock filter, with 'catenateAll' turned on, will
>  *  generate individual tokens plus one combined token, but not
>  *  sub-tokens like "hyphenated surname" and "hyphenatedsur name".
>  *
>  *  So we add a new 'catenateShingles' to achieve this.
> */
> {code}
> Includes unit tests, and as is noted in one of them CATENATE_WORDS and 
> CATENATE_SHINGLES are logically considered mutually exclusive for sensible 
> usage and can cause duplicate tokens (although they should have the same 
> positions etc).
> I'm happy to work on it more if anyone finds problems with it.




Progress on SOLR-2649 (MM ignored in edismax queries with operators)

2015-02-12 Thread Greg Pendlebury

I would like to try and get the proposed fix on SOLR-2649 (
https://issues.apache.org/jira/browse/SOLR-2649) into the codebase if
possible. We have been running a patched v4.7.2 (SOLR-2649-with-Qop.patch)
in production since May 2014, and are currently planning an upgrade to
either Solr 5 or the latest v4.X (time will tell which).

I would love to be able to drop this patch out of our build procedures, and
I am happy to help with any work involved, but not sure where to start (in
terms of procedure, not code). The wiki doco on contributing which I read
last time just said to put patches in Jira, but I guess this one needs a
little more of a push. Given the potentially disruptive nature of the
change inside edismax I think more feedback from other edismax users would
be nice, but the Jira ticket doesn't get a lot of comment normally.

Ta,
Greg

Re: 4.9

2014-06-12 Thread Greg Pendlebury

Ok, ta


On 13 June 2014 12:22, Robert Muir  wrote:

> Those both look like pretty complicated issues. I don't see any reason why
> they should block a release.
>  On Jun 12, 2014 10:15 PM, "Greg Pendlebury" 
> wrote:
>
>> Can I do anything to assist in getting these patches considered for
>> inclusion?
>>
>> https://issues.apache.org/jira/browse/SOLR-5722
>> https://issues.apache.org/jira/browse/SOLR-2649
>>
>> Ta,
>> Greg
>>
>>
>>
>> On 13 June 2014 11:56, Robert Muir  wrote:
>>
>>> We have a pretty big release already with lots of good performance
>>> improvements. I'd like to release 4.9 soon, I'll be RM. I'm thinking of
>>> spinning a RC in a week or so.
>>>
>>
>>

Re: 4.9

2014-06-12 Thread Greg Pendlebury

Can I do anything to assist in getting these patches considered for
inclusion?

https://issues.apache.org/jira/browse/SOLR-5722
https://issues.apache.org/jira/browse/SOLR-2649

Ta,
Greg



On 13 June 2014 11:56, Robert Muir  wrote:

> We have a pretty big release already with lots of good performance
> improvements. I'd like to release 4.9 soon, I'll be RM. I'm thinking of
> spinning a RC in a week or so.
>

[jira] [Comment Edited] (SOLR-2649) MM ignored in edismax queries with operators

2014-04-30 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986408#comment-13986408
 ] 

Greg Pendlebury edited comment on SOLR-2649 at 5/1/14 6:54 AM:
---

I applied this patch to 4.7.2 yesterday and tried it out on our dev servers. At 
first I thought it was pretty bad and failed completely... but then I had a 
good think and re-read everything on this ticket and this[1] article and 
realised my understanding of the problem was flawed. Using just this patch in 
isolation it converted all of the OR operators to AND operators with mm=100%. 
Very confusing behaviour for our business area, but I realise now that it is 
correct.

Perhaps the confusion stems from the way the q.op and mm parameters interact. 
If the behaviour was to instead separate them more clearly then we could change 
the config entirely. At the moment our mm is 100% because we effectively want 
q.op=AND, but if q.op was instead applied 1) always, 2) first and 3) 
independently from mm (ie. insert AND wherever an operator is missing) we could 
set mm=1 and achieve what we want by respecting the OR parameters provided by 
the user.
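
A minimal sketch of that interaction, assuming a flat list of clauses where 
explicit OR terms become SHOULD clauses and terms joined by a default AND 
become MUST clauses. The clause lists are hand-built rather than produced by 
any parser, and the class and method names are hypothetical, not Solr or 
Lucene APIs.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Simplified model: mm counts only SHOULD clauses; MUST clauses are always
// required. Real Lucene/Solr scoring and parsing are more involved.
final class MinShouldMatchSketch {

    enum Occur { MUST, SHOULD }

    static final class Clause {
        final String term;
        final Occur occur;

        Clause(String term, Occur occur) {
            this.term = term;
            this.occur = occur;
        }
    }

    static boolean matches(Set<String> docTerms, List<Clause> clauses, int minShouldMatch) {
        int shouldHits = 0;
        for (Clause clause : clauses) {
            boolean present = docTerms.contains(clause.term);
            if (clause.occur == Occur.MUST && !present) {
                return false;
            }
            if (clause.occur == Occur.SHOULD && present) {
                shouldHits++;
            }
        }
        return shouldHits >= minShouldMatch;
    }

    public static void main(String[] args) {
        Set<String> doc = new HashSet<>(Arrays.asList("x", "A"));

        // "A OR C": two SHOULD clauses.
        List<Clause> userOr = Arrays.asList(
                new Clause("A", Occur.SHOULD), new Clause("C", Occur.SHOULD));
        System.out.println(matches(doc, userOr, 2)); // mm=100% of 2 -> false, OR acts like AND
        System.out.println(matches(doc, userOr, 1)); // mm=1 -> true, OR is respected

        // Default AND on "x", user OR on A and C, with mm=1.
        List<Clause> withMust = Arrays.asList(
                new Clause("x", Occur.MUST),
                new Clause("A", Occur.SHOULD), new Clause("C", Occur.SHOULD));
        System.out.println(matches(doc, withMust, 1)); // true
        Set<String> onlyA = new HashSet<>(Arrays.asList("A"));
        System.out.println(matches(onlyA, withMust, 1)); // false, required term x is missing
    }
}
```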

I've added this on top of the patch already here and deployed again to our dev 
servers using 'q.op=AND & mm=1' and now everything appears to function as it 
should. I'll upload the patch in a minute, and it includes several unit tests 
with different mm and q.op values. From my perspective I think the two 
parameters are interacting appropriately, but perhaps someone with more 
convoluted mm settings could give it a try?

The change is simply in the constructor of the ExtendedSolrQueryParser class, 
where it was hardcoded to force the default operator to OR (presumably so that 
mm would take care of things). I've made it look at the parameter provided with 
the query (copied the code from the Simple QParser and adjusted to fit).

The unit test from the first patch that was marked TODO I have tweaked 
slightly. I think not finding a result in that case is entirely appropriate if 
the user can now tweak q.op. Opinions may vary of course.

[1] http://searchhub.org/2011/12/28/why-not-and-or-and-not/


was (Author: gpendleb):
I applied this patch to 4.7.2 Yesterday and tried it out on or dev servers. At 
first I thought it was pretty bad and failed completely... but then I had a 
good think and re-read everything on this ticket and this[1] article and 
realised my understanding of the problem was flawed. Using just this patch in 
isolation it converted all of the OR operators to AND operators with mm=100%. 
Very confusing behaviour for our business area, but I realise now that it is 
correct.

Perhaps the confusion stems from the way the q.op and mm parameters interact. 
If the behaviour was to instead separate them more clearly then we could change 
the config entirely. At the moment our mm is 100% because we effectively want 
q.op=AND, but if q.op was instead applied 1) always, 2) first and 3) 
independently from mm (ie. insert AND wherever an operator is missing) we could 
set mm=1 and achieve what we want by respecting the OR parameters provided by 
the user.

I've added this on top of the patch already here and deployed again to our dev 
servers using 'q.op=AND & mm=1' and now everything appears to function as it 
should. I'll upload the patch in a minute, and it includes several unit tests 
with different mm and q.op values. From my perspective I think the two 
parameters are interacting appropriately, but perhaps someone with more 
convoluted mm settings could give it a try?

The change is simply in the constructor of the ExtendedSolrQueryParser class 
where it was hardcoded to force the default operator to OR (presumably so that 
mm would take care of things) I've made it look at the parameter provided with 
the query (copied the code from the Simple QParser and adjusted to fit).

The unit test from the first patch that was marked TODO I have tweaked 
slightly. I think not finding a result in that case is entirely appropriate if 
the user can now tweak q.op. Opinions may vary of course.

[1] http://searchhub.org/2011/12/28/why-not-and-or-and-not/

> MM ignored in edismax queries with operators
> 
>
> Key: SOLR-2649
> URL: https://issues.apache.org/jira/browse/SOLR-2649
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Reporter: Magnus Bergmark
>Priority: Minor
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-2649-with-Qop.patch, SOLR-2649.diff, SOLR-2649.patch
>
>
> Hypothetical scenario:
>   1. User searches for "stocks oil gold" with MM set to "50%"
>   2. User adds "-sto

[jira] [Updated] (SOLR-2649) MM ignored in edismax queries with operators

2014-04-30 Thread Greg Pendlebury (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Pendlebury updated SOLR-2649:
--

Attachment: SOLR-2649-with-Qop.patch

> MM ignored in edismax queries with operators
> 
>
> Key: SOLR-2649
> URL: https://issues.apache.org/jira/browse/SOLR-2649
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Reporter: Magnus Bergmark
>Priority: Minor
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-2649-with-Qop.patch, SOLR-2649.diff, SOLR-2649.patch
>
>
> Hypothetical scenario:
>   1. User searches for "stocks oil gold" with MM set to "50%"
>   2. User adds "-stockings" to the query: "stocks oil gold -stockings"
>   3. User gets no hits since MM was ignored and all terms were AND-ed 
> together
> The behavior seems to be intentional, although the reason why is never 
> explained:
>   // For correct lucene queries, turn off mm processing if there
>   // were explicit operators (except for AND).
>   boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; 
> (lines 232-234 taken from 
> tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)
> This makes edismax unsuitable as a replacement for dismax; mm is one of the 
> primary features of dismax.



--
This message was sent by Atlassian JIRA
(v6.2#6252)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-2649) MM ignored in edismax queries with operators

2014-04-30 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2649?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13986408#comment-13986408
 ] 

Greg Pendlebury commented on SOLR-2649:
---

I applied this patch to 4.7.2 yesterday and tried it out on our dev servers. At 
first I thought it was pretty bad and failed completely... but then I had a 
good think and re-read everything on this ticket and this[1] article and 
realised my understanding of the problem was flawed. Using just this patch in 
isolation it converted all of the OR operators to AND operators with mm=100%. 
Very confusing behaviour for our business area, but I realise now that it is 
correct.

Perhaps the confusion stems from the way the q.op and mm parameters interact. 
If the behaviour was to instead separate them more clearly then we could change 
the config entirely. At the moment our mm is 100% because we effectively want 
q.op=AND, but if q.op was instead applied 1) always, 2) first and 3) 
independently from mm (ie. insert AND wherever an operator is missing) we could 
set mm=1 and achieve what we want by respecting the OR parameters provided by 
the user.

I've added this on top of the patch already here and deployed again to our dev 
servers using 'q.op=AND & mm=1' and now everything appears to function as it 
should. I'll upload the patch in a minute, and it includes several unit tests 
with different mm and q.op values. From my perspective I think the two 
parameters are interacting appropriately, but perhaps someone with more 
convoluted mm settings could give it a try?

The change is simply in the constructor of the ExtendedSolrQueryParser class, 
where it was hardcoded to force the default operator to OR (presumably so that 
mm would take care of things). I've made it look at the parameter provided with 
the query (copied the code from the Simple QParser and adjusted to fit).

The unit test from the first patch that was marked TODO I have tweaked 
slightly. I think not finding a result in that case is entirely appropriate if 
the user can now tweak q.op. Opinions may vary of course.

[1] http://searchhub.org/2011/12/28/why-not-and-or-and-not/

> MM ignored in edismax queries with operators
> 
>
> Key: SOLR-2649
> URL: https://issues.apache.org/jira/browse/SOLR-2649
> Project: Solr
>  Issue Type: Bug
>  Components: query parsers
>Reporter: Magnus Bergmark
>Priority: Minor
> Fix For: 4.9, 5.0
>
> Attachments: SOLR-2649.diff, SOLR-2649.patch
>
>
> Hypothetical scenario:
>   1. User searches for "stocks oil gold" with MM set to "50%"
>   2. User adds "-stockings" to the query: "stocks oil gold -stockings"
>   3. User gets no hits since MM was ignored and all terms were AND-ed 
> together
> The behavior seems to be intentional, although the reason why is never 
> explained:
>   // For correct lucene queries, turn off mm processing if there
>   // were explicit operators (except for AND).
>   boolean doMinMatched = (numOR + numNOT + numPluses + numMinuses) == 0; 
> (lines 232-234 taken from 
> tags/lucene_solr_3_3/solr/src/java/org/apache/solr/search/ExtendedDismaxQParserPlugin.java)
> This makes edismax unsuitable as a replacement for dismax; mm is one of the 
> primary features of dismax.




Re: Simple unit test doco?

2014-04-29 Thread Greg Pendlebury

Thank-you for the tips. I still haven't had a single run pass out of around
20 attempts now on both the server and my desktop, but I'll just keep
chipping away at it.

Ta,
Greg


On 29 April 2014 14:00, Tomás Fernández Löbbe  wrote:

> Many times cloud-related/distrib tests fail due to timeouts; this could be
> related to the overall load of your computer (probably generated by the
> tests themselves). I don't know if this is the correct way, but I found that
> they are much less likely to fail if I use fewer JVMs to run the
> tests (by default my mac would use 4, but I set it to 2 if I see failures).
> You can use the JVM parameter "tests.jvms" when running ant test.
>
> If you are working on some specific component you can filter which tests
> to run in many ways, see "ant test-help". It may be useful to use
> tests.slow=false to skip the slow tests in most of your runs.
>
> "do I need to turn on a ZK server for integration testing?"
> No, you don't. Solr will start an embedded Zookeeper for the tests.
>
> "I've tried running those tests in isolation via IntelliJ and they all
> report as passing"
> Most probably this is not related, but just in case: when you
> try to reproduce a test failure that you saw, make sure to use the same seed
> (-Dtests.seed). The seed used should be in the output of the test where you
> saw the failure.
>
>
>[junit4] Tests with failures:
>[junit4]   - org.apache.solr.hadoop.MorphlineMapperTest (suite)
>[junit4]
>
> Sorry, no idea about this one.
>
>
> On Mon, Apr 28, 2014 at 7:47 PM, Greg Pendlebury <
> greg.pendleb...@gmail.com> wrote:
>
>> Heyo,
>>
>> I'm wondering if there is any doco and/or tricks for unit
>> testing Solr beyond this wiki page? http://wiki.apache.org/solr/TestingSolr
>>
>> Some details about my troubles are below if anyone cares to read, but I'm
>> not so much looking for specific responses to why individual tests are
>> failing. I'm more trying to work out whether I'm on the right track or
>> missing some key information... like do I need to turn on a ZK server for
>> integration testing?
>>
>> Or do I need to accept failed unit tests as a baseline before applying
>> our patch? I don't typically like that, but this is an enormous test suite
>> and I'd be happy just to get a pass up to the same level that 4.7.2 had
>> prior to release.
>>
>> Ta,
>> Greg
>>
>>
>> Details
>> ==
>> I downloaded the tagged 4.7.2 release yesterday to apply a patch our team
>> wants to test, but even before touching the codebase at all I cannot get
>> the unit tests to pass. I'm struggling to even get consistent results.
>>
>> The most useful two end points I reach are:
>>[junit4] Tests with failures:
>>[junit4]   -
>> org.apache.solr.cloud.CustomCollectionTest.testDistribSearch
>>[junit4]   -
>> org.apache.solr.cloud.DistribCursorPagingTest.testDistribSearch
>>[junit4]   - org.apache.solr.cloud.DistribCursorPagingTest (suite)
>>[junit4]
>> ...
>>[junit4] Execution time total: 2 hours 6 minutes 50 seconds
>>[junit4] Tests summary: 365 suites, 1570 tests, 1 suite-level error, 2
>> errors, 187 ignored (12 assumptions)
>>
>> And another one (don't have the terminal output on hand unfortunately) in
>> the cloudera morphline suite. It is the same error as this though and fails
>> after around an hour:
>> http://mail-archives.apache.org/mod_mbox/flume-dev/201310.mbox/%3ccac6yyrj2cv89hntdeel7t0qlq8zjbwjynbtcveucxlzdmyv...@mail.gmail.com%3E
>>
>> I've tried running those tests in isolation via IntelliJ and they all
>> report as passing... the logs show exceptions about ZK session expiry for
>> some (not all) but I assume those are trapped expected exceptions since
>> JUnit is passing them?
>>
>> Given the response in the message I linked just above re: windows support
>> I tried shifting the build up to a RHEL6 server this morning but I've tried
>> two runs now and both failed with this odd error:
>>[junit4] Tests with failures:
>>[junit4]   - org.apache.solr.hadoop.MorphlineMapperTest (suite)
>>[junit4]
>> ...
>>[junit4] Execution time total: 42 seconds
>>[junit4] Tests summary: 7 suites, 35 tests, 2 suite-level errors, 5
>> ignored
>>
>> I only say odd because they run for half an hour and then report 42
>> seconds.
>>
>> Thanks again if you've read all this.
>>
>
>

Simple unit test doco?

2014-04-28 Thread Greg Pendlebury

Heyo,

I'm wondering if there is any additional doco and/or tricks to unit testing
solr than this wiki page? http://wiki.apache.org/solr/TestingSolr

Some details about my troubles are below if anyone cares to read, but I'm
not so much looking for specific responses to why individual tests are
failing. I'm more trying to work out whether I'm on the right track or
missing some key information... like do I need to turn on a ZK server for
integration testing?

Or do I need to accept failed unit tests as a baseline before applying our
patch? I don't typically like that, but this is an enormous test suite and
I'd be happy just to get a pass up to the same level that 4.7.2 had prior
to release.

Ta,
Greg

Details
==
I downloaded the tagged 4.7.2 release yesterday to apply a patch our team
wants to test, but even before touching the codebase at all I cannot get
the unit tests to pass. I'm struggling to even get consistent results.

The most useful two end points I reach are:
[junit4] Tests with failures:
[junit4] - org.apache.solr.cloud.CustomCollectionTest.testDistribSearch
[junit4] -
org.apache.solr.cloud.DistribCursorPagingTest.testDistribSearch
[junit4] - org.apache.solr.cloud.DistribCursorPagingTest (suite)
[junit4]
...
[junit4] Execution time total: 2 hours 6 minutes 50 seconds
[junit4] Tests summary: 365 suites, 1570 tests, 1 suite-level error, 2
errors, 187 ignored (12 assumptions)

And another one (don't have the terminal output on hand unfortunately) in
the cloudera morphline suite. It is the same error as this though and fails
after around an hour:
http://mail-archives.apache.org/mod_mbox/flume-dev/201310.mbox/%3ccac6yyrj2cv89hntdeel7t0qlq8zjbwjynbtcveucxlzdmyv...@mail.gmail.com%3E

I've tried running those tests in isolation via IntelliJ and they all
report as passing... the logs show exceptions about ZK session expiry for
some (not all) but I assume those are trapped expected exceptions since
JUnit is passing them?

Given the response in the message I linked just above re: windows support I
tried shifting the build up to a RHEL6 server this morning but I've tried
two runs now and both failed with this odd error:
[junit4] Tests with failures:
[junit4] - org.apache.solr.hadoop.MorphlineMapperTest (suite)
[junit4]
...
[junit4] Execution time total: 42 seconds
[junit4] Tests summary: 7 suites, 35 tests, 2 suite-level errors, 5
ignored

I only say odd because they run for half an hour and then report 42 seconds.

Thanks again if you've read all this.

[jira] [Comment Edited] (SOLR-5722) Add catenateShingles option to WordDelimiterFilter

2014-02-17 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903781#comment-13903781
 ] 

Greg Pendlebury edited comment on SOLR-5722 at 2/18/14 4:55 AM:


The link to the doco is working for me today so I took a quick look. I think 
the other reason that the HyphenatedWordsFilter is not suitable is that it 
removes the hyphen from the material assuming that it can only have one 
meaning. The specific circumstance I am considering is when the hyphen is part 
of a legitimately hyphenated word that just happens to break across a line wrap. 
eg. 'up-\{\n\}to-date'

The HyphenatedWordsFilter would turn this into 'upto-date', and cause user 
searches of 'up to date' to not match, since no filters later in the chain can 
really pull 'upto' apart again. Whereas the 'catenateShingles' option is 
intended to preserve the word delimiter and provide all the permutations a user 
might type to find that term: "up to date", "upto date", "up todate", "uptodate"


was (Author: gpendleb):
The link to the doco is working for me today so I took a quick look. I think 
the other reason that the HyphenatedWordsFilter is not suitable is that it 
removes the hyphen from the material assuming that it can only have one 
meaning. The specific circumstances I am considering is when the hyphen is part 
of a legitimately hyphenated word that just happen to break across a line wrap. 
eg. 'up-{\n}to-date'

The HyphenatedWordsFilter would turn this into 'upto-date', and cause user 
searches of 'up to date' to not match, since no filters later in the change can 
really pull 'upto' apart again. Whereas the 'catenateShingles' option is 
intended to preserve the word delimiter and provide all the permutations a user 
might type to find that term: "up to date", "upto date", "up todate", "uptodate"

> Add catenateShingles option to WordDelimiterFilter
> --
>
> Key: SOLR-5722
> URL: https://issues.apache.org/jira/browse/SOLR-5722
> Project: Solr
>  Issue Type: Improvement
>Reporter: Greg Pendlebury
>Priority: Minor
>  Labels: filter, newbie, patch
> Attachments: WDFconcatShingles.patch
>
>
> Apologies if I put this in the wrong spot. I'm attaching a patch (against 
> current trunk) that adds support for a 'catenateShingles' option to the 
> WordDelimiterFilter. 
> We (National Library of Australia - NLA) are currently maintaining this as an 
> internal modification to the Filter, but I believe it is generic enough to 
> contribute upstream.
> Description:
> =
> {code}
> /**
>  * NLA Modification to the standard word delimiter to support various
>  * hyphenation use cases. Primarily driven by requirements for
>  * newspapers where words are often broken across line endings.
>  *
>  *  eg. "hyphenated-surname" is printed across a line ending and
>  * turns out like "hyphen-ated-surname" or "hyphenated-sur-name".
>  *
>  *  In this scenario the stock filter, with 'catenateAll' turned on, will
>  *  generate individual tokens plus one combined token, but not
>  *  sub-tokens like "hyphenated surname" and "hyphenatedsur name".
>  *
>  *  So we add a new 'catenateShingles' to achieve this.
> */
> {code}
> Includes unit tests, and as is noted in one of them CATENATE_WORDS and 
> CATENATE_SHINGLES are logically considered mutually exclusive for sensible 
> usage and can cause duplicate tokens (although they should have the same 
> positions etc).
> I'm happy to work on it more if anyone finds problems with it.



--
This message was sent by Atlassian JIRA
(v6.1.5#6160)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-5722) Add catenateShingles option to WordDelimiterFilter

2014-02-17 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13903781#comment-13903781
 ] 

Greg Pendlebury commented on SOLR-5722:
---

The link to the doco is working for me today so I took a quick look. I think 
the other reason that the HyphenatedWordsFilter is not suitable is that it 
removes the hyphen from the material assuming that it can only have one 
meaning. The specific circumstance I am considering is when the hyphen is part 
of a legitimately hyphenated word that just happens to break across a line wrap. 
eg. 'up-{\n}to-date'

The HyphenatedWordsFilter would turn this into 'upto-date', and cause user 
searches of 'up to date' to not match, since no filters later in the chain can 
really pull 'upto' apart again. Whereas the 'catenateShingles' option is 
intended to preserve the word delimiter and provide all the permutations a user 
might type to find that term: "up to date", "upto date", "up todate", "uptodate"
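
To make the permutations above concrete, here is a rough sketch in Python. It is not the patch and not the WordDelimiterFilter code; the function name catenate_shingles is made up for illustration, and it only enumerates the search phrases, ignoring token positions and offsets entirely.

```python
from itertools import product


def catenate_shingles(parts):
    """Return every phrase formed by joining or splitting adjacent parts.

    Hypothetical helper for illustration only: it lists phrases and does
    not model the token positions a real filter would emit.
    """
    if not parts:
        return []
    phrases = []
    # Each gap between adjacent parts is either a space (kept apart)
    # or an empty string (catenated), giving 2 ** (len(parts) - 1) phrases.
    for gaps in product((" ", ""), repeat=len(parts) - 1):
        phrase = parts[0]
        for gap, part in zip(gaps, parts[1:]):
            phrase += gap + part
        phrases.append(phrase)
    return phrases
```

Calling catenate_shingles(["up", "to", "date"]) yields the four phrases listed above: "up to date", "up todate", "upto date" and "uptodate".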

> Add catenateShingles option to WordDelimiterFilter
> --
>
> Key: SOLR-5722
> URL: https://issues.apache.org/jira/browse/SOLR-5722
> Project: Solr
>  Issue Type: Improvement
>Reporter: Greg Pendlebury
>Priority: Minor
>  Labels: filter, newbie, patch
> Attachments: WDFconcatShingles.patch
>
>
> Apologies if I put this in the wrong spot. I'm attaching a patch (against 
> current trunk) that adds support for a 'catenateShingles' option to the 
> WordDelimiterFilter. 
> We (National Library of Australia - NLA) are currently maintaining this as an 
> internal modification to the Filter, but I believe it is generic enough to 
> contribute upstream.
> Description:
> =
> {code}
> /**
>  * NLA Modification to the standard word delimiter to support various
>  * hyphenation use cases. Primarily driven by requirements for
>  * newspapers where words are often broken across line endings.
>  *
>  *  eg. "hyphenated-surname" is printed across a line ending and
>  * turns out like "hyphen-ated-surname" or "hyphenated-sur-name".
>  *
>  *  In this scenario the stock filter, with 'catenateAll' turned on, will
>  *  generate individual tokens plus one combined token, but not
>  *  sub-tokens like "hyphenated surname" and "hyphenatedsur name".
>  *
>  *  So we add a new 'catenateShingles' to achieve this.
> */
> {code}
> Includes unit tests, and as is noted in one of them CATENATE_WORDS and 
> CATENATE_SHINGLES are logically considered mutually exclusive for sensible 
> usage and can cause duplicate tokens (although they should have the same 
> positions etc).
> I'm happy to work on it more if anyone finds problems with it.




[jira] [Commented] (SOLR-5722) Add catenateShingles option to WordDelimiterFilter

2014-02-16 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13902824#comment-13902824
 ] 

Greg Pendlebury commented on SOLR-5722:
---

I don't think it does. It has been a while since we looked into it, and that 
link is currently returning 503 for me, but my understanding was that the 
HyphenatedWordsFilter put two tokens back together when a hyphen was found on 
the end of the first token. The catenateShingles option we are using addresses 
the scenario where multiple hyphens are found internal to a single token.
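
A rough Python sketch of the joining behaviour as I understand it from the description above. This is a simplified model, not the Lucene implementation; the function name and the handling of a trailing hyphen at the end of the input are assumptions made for the example.

```python
def join_line_end_hyphens(tokens):
    """Join a token ending in '-' with the token that follows it.

    Simplified model only: the hyphen at the end of a token is dropped
    when the next token is glued on, and hyphens inside a token are
    left untouched.
    """
    joined = []
    pending = None  # text carried over from tokens that ended with a hyphen
    for token in tokens:
        if token.endswith("-"):
            pending = (pending or "") + token[:-1]
        else:
            joined.append((pending or "") + token)
            pending = None
    if pending is not None:
        # Assumption: nothing followed the hyphen, so keep it as written.
        joined.append(pending + "-")
    return joined
```

With the line-wrapped input split as ["up-", "to-date"], this produces ["upto-date"]: the internal hyphen survives, but 'up' and 'to' can no longer be separated.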

> Add catenateShingles option to WordDelimiterFilter
> --
>
> Key: SOLR-5722
> URL: https://issues.apache.org/jira/browse/SOLR-5722
> Project: Solr
>  Issue Type: Improvement
>Reporter: Greg Pendlebury
>Priority: Minor
>  Labels: filter, newbie, patch
> Attachments: WDFconcatShingles.patch
>
>
> Apologies if I put this in the wrong spot. I'm attaching a patch (against 
> current trunk) that adds support for a 'catenateShingles' option to the 
> WordDelimiterFilter. 
> We (National Library of Australia - NLA) are currently maintaining this as an 
> internal modification to the Filter, but I believe it is generic enough to 
> contribute upstream.
> Description:
> =
> {code}
> /**
>  * NLA Modification to the standard word delimiter to support various
>  * hyphenation use cases. Primarily driven by requirements for
>  * newspapers where words are often broken across line endings.
>  *
>  *  eg. "hyphenated-surname" is printed across a line ending and
>  * turns out like "hyphen-ated-surname" or "hyphenated-sur-name".
>  *
>  *  In this scenario the stock filter, with 'catenateAll' turned on, will
>  *  generate individual tokens plus one combined token, but not
>  *  sub-tokens like "hyphenated surname" and "hyphenatedsur name".
>  *
>  *  So we add a new 'catenateShingles' to achieve this.
> */
> {code}
> Includes unit tests, and as is noted in one of them CATENATE_WORDS and 
> CATENATE_SHINGLES are logically considered mutually exclusive for sensible 
> usage and can cause duplicate tokens (although they should have the same 
> positions etc).
> I'm happy to work on it more if anyone finds problems with it.




[jira] [Updated] (SOLR-5722) Add catenateShingles option to WordDelimiterFilter

2014-02-12 Thread Greg Pendlebury (JIRA)


 [ 
https://issues.apache.org/jira/browse/SOLR-5722?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Greg Pendlebury updated SOLR-5722:
--

Attachment: WDFconcatShingles.patch

Patch against trunk : http://svn.apache.org/repos/asf/lucene/dev/trunk 
(r1567824)

> Add catenateShingles option to WordDelimiterFilter
> --
>
> Key: SOLR-5722
> URL: https://issues.apache.org/jira/browse/SOLR-5722
> Project: Solr
>  Issue Type: Improvement
>    Reporter: Greg Pendlebury
>Priority: Minor
>  Labels: filter, newbie, patch
> Attachments: WDFconcatShingles.patch
>
>
> Apologies if I put this in the wrong spot. I'm attaching a patch (against 
> current trunk) that adds support for a 'catenateShingles' option to the 
> WordDelimiterFilter. 
> We (National Library of Australia - NLA) are currently maintaining this as an 
> internal modification to the Filter, but I believe it is generic enough to 
> contribute upstream.
> Description:
> =
> {code}
> /**
>  * NLA Modification to the standard word delimiter to support various
>  * hyphenation use cases. Primarily driven by requirements for
>  * newspapers where words are often broken across line endings.
>  *
>  *  eg. "hyphenated-surname" is printed across a line ending and
>  * turns out like "hyphen-ated-surname" or "hyphenated-sur-name".
>  *
>  *  In this scenario the stock filter, with 'catenateAll' turned on, will
>  *  generate individual tokens plus one combined token, but not
>  *  sub-tokens like "hyphenated surname" and "hyphenatedsur name".
>  *
>  *  So we add a new 'catenateShingles' to achieve this.
> */
> {code}
> Includes unit tests, and as is noted in one of them CATENATE_WORDS and 
> CATENATE_SHINGLES are logically considered mutually exclusive for sensible 
> usage and can cause duplicate tokens (although they should have the same 
> positions etc).
> I'm happy to work on it more if anyone finds problems with it.




[jira] [Created] (SOLR-5722) Add catenateShingles option to WordDelimiterFilter

2014-02-12 Thread Greg Pendlebury (JIRA)

Greg Pendlebury created SOLR-5722:
-

 Summary: Add catenateShingles option to WordDelimiterFilter
 Key: SOLR-5722
 URL: https://issues.apache.org/jira/browse/SOLR-5722
 Project: Solr
  Issue Type: Improvement
Reporter: Greg Pendlebury
Priority: Minor


Apologies if I put this in the wrong spot. I'm attaching a patch (against 
current trunk) that adds support for a 'catenateShingles' option to the 
WordDelimiterFilter. 

We (National Library of Australia - NLA) are currently maintaining this as an 
internal modification to the Filter, but I believe it is generic enough to 
contribute upstream.

Description:
=
{code}
/**
 * NLA Modification to the standard word delimiter to support various
 * hyphenation use cases. Primarily driven by requirements for
 * newspapers where words are often broken across line endings.
 *
 *  eg. "hyphenated-surname" is printed across a line ending and
 * turns out like "hyphen-ated-surname" or "hyphenated-sur-name".
 *
 *  In this scenario the stock filter, with 'catenateAll' turned on, will
 *  generate individual tokens plus one combined token, but not
 *  sub-tokens like "hyphenated surname" and "hyphenatedsur name".
 *
 *  So we add a new 'catenateShingles' to achieve this.
*/
{code}

Includes unit tests. As noted in one of them, CATENATE_WORDS and 
CATENATE_SHINGLES are logically considered mutually exclusive for sensible 
usage; enabling both can cause duplicate tokens (although they should have the 
same positions etc).

I'm happy to work on it more if anyone finds problems with it.




[jira] [Commented] (SOLR-4956) make maxBufferedAddsPerServer configurable

2013-08-13 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-4956?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13739058#comment-13739058
 ] 

Greg Pendlebury commented on SOLR-4956:
---

"So it seems we have three options here:
1> make it configurable with a warning that if you change it it may lead to Bad 
Stuff."

I'd support this solely from the perspective of testing its impact. Rebuilding 
code to change a hardcoded integer is a tad annoying if you are just diagnosing 
what impact things could have. We batch ingest several thousand documents at a 
time into a 96 JVM cluster (32 shards * 3 replicas). I'd love to see if we 
could lower CPU load by altering this setting... even if it is only a 
diagnostic step that is at odds with long term goals related to batching at all.
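
For a feel of the numbers, a back-of-the-envelope sketch in Python. The 5000-document ingest is a made-up figure, the function name is hypothetical, and this only counts batches forwarded to a single destination; it says nothing about what each batch actually costs in CPU.

```python
import math


def batches_needed(doc_count, docs_per_batch):
    """Number of batches required to forward doc_count documents."""
    if docs_per_batch < 1:
        raise ValueError("docs_per_batch must be at least 1")
    return math.ceil(doc_count / docs_per_batch)


# Compare the default of 10 docs/batch with the 1000 mentioned in the issue,
# for a hypothetical ingest of 5000 documents.
for size in (10, 1000):
    print(size, batches_needed(5000, size))
```

Under those assumptions the default of 10 needs 500 batches where a setting of 1000 needs 5, which is why being able to change the value without a rebuild would be handy for diagnosis.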

> make maxBufferedAddsPerServer configurable
> --
>
> Key: SOLR-4956
> URL: https://issues.apache.org/jira/browse/SOLR-4956
> Project: Solr
>  Issue Type: Improvement
>Affects Versions: 4.3, 5.0
>Reporter: Erick Erickson
>
> Anecdotal user's list evidence indicates that in high-throughput situations, 
> the default of 10 docs/batch for inter-shard batching can generate 
> significant CPU load. See the thread titled "Sharding and Replication" on 
> June 19th, but the gist is below.
> I haven't poked around, but it's a little surprising on the surface that Asif 
> is seeing this kind of difference. So I'm wondering if this change indicates 
> some other underlying issue. Regardless, this seems like it would be good to 
> investigate.
> Here's the gist of Asif's experience from the thread:
> It's a completely practical problem - we are exploring Solr to build a real
> time analytics/data solution for a system handling about 1000 qps. We have
> various metrics that are stored as different collections on the cloud,
> which means a very high volume of writes. The cloud also needs to support
> about 300-400 qps.
> We initially tested with a single Solr node on a 16 core / 24 GB box for a
> single metric. We saw that writes were not an issue at all - Solr was
> handling it extremely well. We were also able to achieve about 200 qps from
> a single node.
> When we set up the cloud (an ensemble on 6 boxes), we saw very high CPU
> usage on the replicas. Up to 10 cores were getting used for writes on the
> replicas. Hence my concern with respect to batch updates for the replicas.
> BTW, I altered the maxBufferedAddsPerServer to 1000 - and now CPU usage is
> very similar to single node installation.

--
This message is automatically generated by JIRA.
If you think it was sent incorrectly, please contact your JIRA administrators
For more information on JIRA, see: http://www.atlassian.com/software/jira

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org

[jira] [Commented] (SOLR-2487) Do not include slf4j-jdk14 jar in WAR

2012-05-02 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13267045#comment-13267045
 ] 

Greg Pendlebury commented on SOLR-2487:
---

@Neil, that's way better than the way I do things now. Thanks.

Maven continues to surprise me.

> Do not include slf4j-jdk14 jar in WAR
> -
>
> Key: SOLR-2487
> URL: https://issues.apache.org/jira/browse/SOLR-2487
> Project: Solr
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2, 4.0
>Reporter: Jan Høydahl
>Assignee: Jan Høydahl
>  Labels: logging, slf4j
> Fix For: 3.6, 4.0
>
> Attachments: SOLR-2487.patch, SOLR-2487.patch, SOLR-2487.patch
>
>
> I know we've intentionally bundled slf4j-jdk14-1.5.5.jar in the war to help 
> newbies get up and running. But I find myself re-packaging the war for every 
> customer when adapting to their choice of logger framework, which is 
> counter-productive.
> It would be sufficient to have the jdk-logging binding in example/lib to let 
> the example and tutorial still work OOTB but as soon as you deploy solr.war 
> to production you're forced to explicitly decide what logging to use.


[jira] [Issue Comment Edited] (SOLR-2487) Do not include slf4j-jdk14 jar in WAR

2011-08-14 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084967#comment-13084967
 ] 

Greg Pendlebury edited comment on SOLR-2487 at 8/15/11 5:22 AM:


"At the moment there is no way in Maven to have it exclude the jdk14 JAR"... 
Hmm, I shouldn't have stated an absolute like that. I eventually got a script 
building today that dropped the WAR as a dependency, unpacked it to a '/solr' 
context folder, then nuked the jdk14 JAR only, leaving the rest in place.

I'd still prefer a skinny WAR, since it would be a much cleaner build script, 
and allow me to eliminate duplicate/conflicting JARs on the classpath with 
greater ease. It would also be more in line with the spirit of how Maven is 
intended to work... but I have a workaround, and don't expect the world to 
conform to my wishes :)

  was (Author: greg.pendlebury):
"At the moment there is no way in Maven to have it exclude the jdk14 
JAR"... Hmm, I shouldn't have stated an absolute like that. I eventually got a 
script building today that dropped the WAR as a dependency, unpacked it to a 
'/solr' context folder, then nuked the jdk14 JAR only, leaving the rest in 
place.

I'd still prefer a skinny WAR, since it would be a much cleaner build script, 
and allow me to eliminate duplicate JARs on the classpath with greater ease. It 
would also be more in line with the spirit of how Maven is intended to work... 
but I have a workaround, and don't expect to world to conform to my wishes :)
  
> Do not include slf4j-jdk14 jar in WAR
> -
>
> Key: SOLR-2487
> URL: https://issues.apache.org/jira/browse/SOLR-2487
> Project: Solr
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2, 4.0
>Reporter: Jan Høydahl
>  Labels: logging, slf4j
>
> I know we've intentionally bundled slf4j-jdk14-1.5.5.jar in the war to help 
> newbies get up and running. But I find myself re-packaging the war for every 
> customer when adapting to their choice of logger framework, which is 
> counter-productive.
> It would be sufficient to have the jdk-logging binding in example/lib to let 
> the example and tutorial still work OOTB but as soon as you deploy solr.war 
> to production you're forced to explicitly decide what logging to use.


[jira] [Commented] (SOLR-2487) Do not include slf4j-jdk14 jar in WAR

2011-08-14 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084967#comment-13084967
 ] 

Greg Pendlebury commented on SOLR-2487:
---

"At the moment there is no way in Maven to have it exclude the jdk14 JAR"... 
Hmm, I shouldn't have stated an absolute like that. I eventually got a script 
building today that dropped the WAR as a dependency, unpacked it to a '/solr' 
context folder, then nuked the jdk14 JAR only, leaving the rest in place.

I'd still prefer a skinny WAR, since it would be a much cleaner build script, 
and allow me to eliminate duplicate JARs on the classpath with greater ease. It 
would also be more in line with the spirit of how Maven is intended to work... 
but I have a workaround, and don't expect the world to conform to my wishes :)
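
The workaround boils down to something like the sketch below. It is written in Python purely for illustration (the real script is a Maven build), and the function name and the WEB-INF/lib/slf4j-jdk14 path prefix are assumptions about how the WAR is laid out.

```python
import zipfile
from pathlib import Path


def unpack_war_without(war_path, target_dir,
                       unwanted_prefix="WEB-INF/lib/slf4j-jdk14"):
    """Unpack a WAR into target_dir, skipping entries under unwanted_prefix.

    Returns the names of the skipped entries so the caller can confirm
    that the expected JAR was actually left out.
    """
    target = Path(target_dir)
    skipped = []
    with zipfile.ZipFile(war_path) as war:
        for entry in war.infolist():
            if entry.filename.startswith(unwanted_prefix):
                skipped.append(entry.filename)
                continue
            war.extract(entry, target)
    return skipped
```

Something like unpack_war_without("solr.war", "solr") would leave everything in place under the 'solr' folder except the jdk14 binding, which is the same end result as unpacking and then deleting the JAR.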

> Do not include slf4j-jdk14 jar in WAR
> -
>
> Key: SOLR-2487
> URL: https://issues.apache.org/jira/browse/SOLR-2487
> Project: Solr
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2, 4.0
>Reporter: Jan Høydahl
>  Labels: logging, slf4j
>
> I know we've intentionally bundled slf4j-jdk14-1.5.5.jar in the war to help 
> newbies get up and running. But I find myself re-packaging the war for every 
> customer when adapting to their choice of logger framework, which is 
> counter-productive.
> It would be sufficient to have the jdk-logging binding in example/lib to let 
> the example and tutorial still work OOTB but as soon as you deploy solr.war 
> to production you're forced to explicitly decide what logging to use.


[jira] [Commented] (SOLR-2487) Do not include slf4j-jdk14 jar in WAR

2011-08-14 Thread Greg Pendlebury (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-2487?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=13084928#comment-13084928
 ] 

Greg Pendlebury commented on SOLR-2487:
---

It would be great to have a skinny WAR available as a Maven artifact. At the 
moment there is no way in Maven to have it exclude the jdk14 JAR short of 
rebuilding and rehosting the WAR elsewhere. eg: 
http://www.jarvana.com/jarvana/browse/org/dspace/dependencies/solr/dspace-solr-webapp/1.4.1.0/

And to my knowledge at the moment, there is nothing like this available for 
v3.3.0

With a skinny WAR in Maven listing all the currently bundled dependencies the 
end result for most users would be identical, since Maven will go get them all 
for you anyway. Then people that don't want jdk14 can add this to their own 
project and they will get everything but that single dependency:

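<dependency>
  <groupId>org.slf4j</groupId>
  <artifactId>slf4j-jdk14</artifactId>
  <version>1.6.1</version>
  <scope>provided</scope>
</dependency>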



> Do not include slf4j-jdk14 jar in WAR
> -
>
> Key: SOLR-2487
> URL: https://issues.apache.org/jira/browse/SOLR-2487
> Project: Solr
>  Issue Type: Improvement
>  Components: Build
>Affects Versions: 3.2, 4.0
>Reporter: Jan Høydahl
>  Labels: logging, slf4j
>
> I know we've intentionally bundled slf4j-jdk14-1.5.5.jar in the war to help 
> newbies get up and running. But I find myself re-packaging the war for every 
> customer when adapting to their choice of logger framework, which is 
> counter-productive.
> It would be sufficient to have the jdk-logging binding in example/lib to let 
> the example and tutorial still work OOTB but as soon as you deploy solr.war 
> to production you're forced to explicitly decide what logging to use.


48 matches
