Re: Accent insensitive search for greek characters

2017-10-18 Thread Chitra
Hi Alexandre,
Thank you so much for the kind response. I will check
it out.

-- 
Regards,
Chitra


Solr nodes going into recovery mode and eventually failing

2017-10-18 Thread Shamik Bandopadhyay
Hi,

  I'm having this weird issue where Solr nodes suddenly go into recovery
mode and eventually fail. That one failure kicks off a cascading effect
and eventually impacts the other nodes. Without a restart, the entire
cluster goes into limbo after a while. Looking at the log and the SPM
monitoring tool, the issue happens under the following circumstances:
1. The node gets a spike in query/index requests, exhausting its allocated
memory.
2. GC forces the CPU to 100% of its capacity.
3. None of the above, i.e. both JVM and CPU are within limits.

I'm using Solr 6.6. Here are the details about the node:

Hardware type: AWS m4.4xlarge instance
Total memory: 64 GB
CPU: 16
SSD
SOLR_JAVA_MEM="-Xms35g -Xmx35g"
GC_TUNE="-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=8m \
-XX:MaxGCPauseMillis=200 \
-XX:+UseLargePages \
-XX:+AggressiveOpts"
SOLR_OPTS="$SOLR_OPTS -Xss256k"
SOLR_OPTS="$SOLR_OPTS -Dsolr.autoCommit.maxTime=60"
SOLR_OPTS="$SOLR_OPTS -Dsolr.clustering.enabled=true"
SOLR_OPTS="$SOLR_OPTS -Dpkiauth.ttl=12"

Cache Parameters:
4096
1000







true
60

I currently have 2 shards, each with 2 replicas. The index size is
approximately 70 GB.

Here's a Solr log trace from the series of events once the node starts
getting into trouble. I've posted only the relevant entries here.


org.apache.solr.common.SolrException.log(SolrException.java:148) -
org.apache.solr.common.SolrException: Cannot talk to ZooKeeper - Updates
are disabled.
at
org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1738)

 
org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$5(CoreAdminOperation.java:143)
- It has been requested that we recover: core=knowledge
INFO647718[qtp2039328061-1526] -
org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:732)
- [admin] webapp=null path=/admin/cores
params={core=knowledge&action=REQUESTRECOVERY&wt=javabin&version=2}
status=0 QTime=0
INFO647808[qtp2039328061-1540] -
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.finish(LogUpdateProcessorFactory.java:187)
- [knowledge]  webapp=/solr path=/update
params={update.distrib=FROMLEADER&distrib.from=
http://xx.xxx.xxx.63:8983/solr/knowledge/&wt=javabin&version=2}{} 0 0

WARN657500[recoveryExecutor-3-thread-4-processing-n:xx.xxx.xxx.251:8983_solr
x:knowledge s:shard2 c:knowledge r:core_node9] -
org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:659)
- Socket timeout on send prep recovery cmd, retrying..
INFO657500[recoveryExecutor-3-thread-4-processing-n:xx.xxx.xxx.251:8983_solr
x:knowledge s:shard2 c:knowledge r:core_node9] -
org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:676)
- Sending prep recovery command to [http://xx.xxx.xxx.63:8983/solr];
[WaitForState:
action=PREPRECOVERY&core=knowledge&nodeName=xx.xxx.xxx.251:8983_solr&coreNodeName=core_node9&state=recovering&checkLive=true&onlyIfLeader=true&onlyIfLeaderActive=true]
WARN667514[recoveryExecutor-3-thread-4-processing-n:xx.xxx.xxx.251:8983_solr
x:knowledge s:shard2 c:knowledge r:core_node9] -
org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:659)
- Socket timeout on send prep recovery cmd, retrying..

The retry happens a few times, then

INFO689389[qtp2039328061-1649] -
org.apache.solr.security.RuleBasedAuthorizationPlugin.checkPathPerm(RuleBasedAuthorizationPlugin.java:147)
- request has come without principal. failed permission {
  "name":"select",
  "collection":"knowledge",
  "path":"/select",
  "role":[
"admin",
"dev",
"read"],
  "index":3}
INFO689390[qtp2039328061-1649] -
org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:500) -
USER_REQUIRED auth header null context : userPrincipal: [null] type:
[READ], collections: [knowledge,], Path: [/select] path : /select params
:q=*:*&distrib=false&sort=_docid_+asc&rows=0&wt=javabin&version=2
INFO907854[recoveryExecutor-3-thread-4-processing-n:xx.xxx.xxx.251:8983_solr
x:knowledge s:shard2 c:knowledge r:core_node9] -
org.apache.solr.cloud.RecoveryStrategy.sendPrepRecoveryCmd(RecoveryStrategy.java:676)
- Sending prep recovery command to [http://xx.xxx.xxx.63:8983/solr];
[WaitForState:
action=PREPRECOVERY&core=knowledge&nodeName=xx.xxx.xxx.251:8983_solr&coreNodeName=core_node9&state=recovering&checkLive=true&onlyIfLeader=true&onlyIfLeaderActive=true]
INFO913811[commitScheduler-13-thread-1] -
org.apache.solr.update.DirectUpdateHandler2.commit(DirectUpdateHandler2.java:603)
- start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}
INFO913812[commitScheduler-13-thread-1] -
org.apache.solr.update.SolrIndexWriter.setCommitData(SolrIndexWriter.java:174)
- Calling setCommitData with
IW:org.apache.solr.update.SolrIndexWriter@29f7c7fc
 INFO913812[commitScheduler-13-thread-1] -
org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
- [IW][commitScheduler-13-thread-1]: commit: start
 INFO913812[commitScheduler-13-thread-1] -
org.apache.solr.update.LoggingInfoStream.message(LoggingInfoStream.java:34)
- [IW][commitScheduler-13-thread-1]: commit: enter 

Goal: reverse chronological display Methods? (1) boost, and/or (2) disable idf

2017-10-18 Thread billtorcaso
Hello and thanks in advance,

I have inherited a working Solr installation, and I want to tune the order
of display (score) of search results.  I've read a fair amount of
documentation, and not found a solution.  I have debug output to paste in
here, below.

One aspect of my site is a sort of current-news report.  When a user puts in
"mexico" and "earthquake", the results that I get are all relevant.  But as
I inspect the results, I want more-recent items to appear before less-recent
ones.  (I have this already: 
bf=recip(ms(NOW/DAY,unixdate),3.16e-11,5,0.1)) and that is not enough.
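Out of curiosity, the decay of that recip boost can be checked locally. A minimal sketch, assuming `ms(NOW/DAY, unixdate)` yields the document's age in milliseconds:

```python
# Solr's recip(x, m, a, b) computes a / (m*x + b); x here is document age in ms
def recip(x, m, a, b):
    return a / (m * x + b)

MS_PER_DAY = 86_400_000
for days in (0, 30, 365, 3650):
    boost = recip(days * MS_PER_DAY, 3.16e-11, 5, 0.1)
    print(f"{days:>5} days old -> boost {boost:.2f}")
```

With these constants a brand-new document gets a boost near 50 and a one-year-old one about 4.6, so raising `a` or shrinking `m` changes how steeply recency dominates.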

I have two specific questions:

  ---  Is there a different value for 'bf' or 'boost' that will strongly
boost the most recent docs?

  ---  It seems to me that 'idf' works against my goal.  I don't want a doc
about an earthquake in Turkey to score highly in a search for mexico
earthquake.  This is using edismax.

   Does that make sense?  If so, how could I achieve that?

Here is my JSON output, with debug information, from the Solr admin query
panel.  I included the first ten results only.

'unixdate' is a floating point value that holds milliseconds from the Unix
epoch, 1970-01-01-00:00:00UTC.  The document that I want on top has
"unixdate": 1506455170, and appears in the 5th position in the list of
results.

I also included a display of the Unix epoch values and the calendar dates
they represent, below the JSON output.
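As a side note, the example value can be checked locally; 1506455170 parses to a plausible 2017 date only when treated as seconds since the epoch rather than milliseconds, which may just be a slip in the description above. A quick sketch of the conversion:

```python
from datetime import datetime, timezone

unixdate = 1506455170
# Interpreted as seconds since 1970-01-01T00:00:00Z
as_seconds = datetime.fromtimestamp(unixdate, tz=timezone.utc)
print(as_seconds.isoformat())  # 2017-09-26T19:46:10+00:00
```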


{
  "responseHeader": {
"status": 0,
"QTime": 28,
"params": {
  "lowercaseOperators": "true",
  "debugQuery": "true",
  "fl": "link, title, description,unixdate",
  "indent": "true",
  "q": "mexico earthquake\n",
  "_": "1508340944042",
  "stopwords": "true",
  "wt": "json",
  "defType": "edismax",
  "rows": "10"
}
  },
  "response": {
"numFound": 698,
"start": 0,
"docs": [
  {
"link":
"http://www.oxfamamerica.org/press/mexico-earthquake-oxfam-sends-in-assessment-teams-to-two-worst-hit-areas/",
"description": "Oxfam is sending in two teams of humanitarian
experts to Morelos and Puebla, with more on the way, following the 7.1
earthquake that struck central Mexico on September 19th. This is the second
earthquake to strike Mexico in less than two weeks.",
"unixdate": 1506029180,
"title": "Mexico earthquake: Oxfam sends in assessment teams to two
worst hit areas"
  },
  {
"link":
"http://www.oxfamamerica.org/explore/stories/deadly-earthquake-hits-mexico/",
"description": "Oxfam prepares response to Mexico earthquake",
"unixdate": 1506029570,
"title": "Deadly earthquake hits Mexico—Oxfam is there"
  },
  {
"link":
"https://firstperson.oxfamamerica.org/2017/09/an-eyewitness-account-as-the-earthquake-hit-mexico-and-the-urgent-hours-that-followed/",
"description": "Oxfam's director in Mexico describes the
earthquake, and the initial response in Mexico City.",
"title": "An eyewitness account as the earthquake hit Mexico, and
the urgent hours that followed",
"unixdate": 1505939200
  },
  {
"link":
"http://www.oxfamamerica.org/explore/research-publications/in-need-of-a-better-wash-water-sanitation-and-hygiene-policy-issues-in-post-earthquake-haiti/",
"description": "",
"unixdate": 1325325570,
"title": "In need of  a better WASH: Water, sanitation, and hygiene
policy issues in post-earthquake Haiti"
  },
  {
"link":
"http://www.oxfamamerica.org/explore/stories/oxfam-teams-focus-on-rural-areas-in-mexico-earthquake-response/",
"description": "Evaluation of impact is concentrated in poor areas
outside Mexico City",
"unixdate": 1506455170,
"title": "Oxfam teams focus on rural areas in Mexico earthquake
response"
  },
  {
"link":
"http://www.oxfamamerica.org/explore/stories/inside-the-rescue-efforts-following-mexicos-massive-earthquake/",
"description": "Oxfam is working with local partners to determine
needs and map out our response.",
"unixdate": 1506042370,
"title": "Inside the rescue efforts following Mexico's massive
earthquake"
  },
  {
"link":
"http://www.oxfamamerica.org/explore/research-publications/housing-delivery-and-housing-finance-in-haiti/",
"description": "",
"unixdate": 1367459710,
"title": "Housing Delivery and Housing Finance in Haiti"
  },
  {
"link":
"http://www.oxfamamerica.org/explore/research-publications/haiti-land-rights-land-tenure-and-urban-recovery/",
"description": "",
"unixdate": 1343861760,
"title": "Haiti land rights, land tenure, and urban recovery"
  },
  {
"link":
"http://www.oxfamamerica.org/press/oxfam-responds-to-ecuador-earthquake/",
"description": "Oxfam has deployed an evaluation team to Ecuador to
determine its humanitarian response to the 7.8 earthquake that struck on the

Re: TermsQuery Result Ordering

2017-10-18 Thread Erick Erickson
bq: Can I boost the Terms in the terms query

I'm pretty sure you can't. But how many of these do you have? You can
always increase the maxBooleanClauses limit in solrconfig.xml. It's
primarily there to say "having this many clauses is usually a bad
idea, so proceed with caution". I've seen 10,000 and higher be used
before, you're really only limited by memory.

And I'm going to guess that your application doesn't have a high query
rate, so you can likely make maxBooleanClauses be very high.

Basically, the code that TermsQueryParser uses bypasses scoring on the
theory that these very large OR clauses are usually useless for
scoring; your application is an outlier. But you knew that already ;)
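For what it's worth, generating a boosted clause list like Webster's programmatically is straightforward; a minimal sketch, with field name and boosts taken from his example:

```python
def boosted_terms_query(field, scored_ids):
    # Build field:(id1^b1 OR id2^b2 ...), preserving the structure-search
    # ordering via per-term boosts
    clauses = " OR ".join(f"{term}^{int(boost)}" for term, boost in scored_ids)
    return f"{field}:({clauses})"

q = boosted_terms_query("structure_id",
                        [("12345", 800), ("12356", 750), ("abcde", 600)])
print(q)  # structure_id:(12345^800 OR 12356^750 OR abcde^600)
```

Each clause counts against maxBooleanClauses, which is why that limit has to be raised for long lists.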


Best,
Erick

On Wed, Oct 18, 2017 at 9:42 AM, Webster Homer  wrote:
> I have an application which currently uses a boolean query. The query could
> have a large number of boolean terms. I know that the TermsQuery doesn't
> have the same limitations as the boolean query. However I need to maintain
> the order of the original terms.
>
> The query terms from the boolean query are actually values returned by a
> chemical structure search, which are returned in order of their relevancy
> in the structure search. I maintain the order by giving them a boost which
> is a function of the relevancy from the structure search.
>
> structure_id:(12345^800 OR 12356^750 OR abcde^600 ...
>
> This approach gives me the results in the order I need them in. I'd love to
> use the TermsQuery instead as it doesn't have the same limitations.
>
> Can I boost the Terms in the terms query? Is there a way to order the
> results? e.g. would the results be returned in the same order I specified
> the terms?
>
> Thanks,
>
> --
>
>
> This message and any attachment are confidential and may be privileged or
> otherwise protected from disclosure. If you are not the intended recipient,
> you must not copy this message or attachment or disclose the contents to
> any other person. If you have received this transmission in error, please
> notify the sender immediately and delete the message and any attachment
> from your system. Merck KGaA, Darmstadt, Germany and any of its
> subsidiaries do not accept liability for any omissions or errors in this
> message which may arise as a result of E-Mail-transmission or for damages
> resulting from any unauthorized changes of the content of this message and
> any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its
> subsidiaries do not guarantee that this message is free of viruses and does
> not accept liability for any damages caused by any virus transmitted
> therewith.
>
> Click http://www.emdgroup.com/disclaimer to access the German, French,
> Spanish and Portuguese versions of this disclaimer.


Re: Jetty maxThreads

2017-10-18 Thread Walter Underwood
Actually, Java 8 defaults to 1 MB for each thread stack.

-Xsssize
Sets the thread stack size (in bytes). Append the letter k or K to indicate KB, 
m or M to indicate MB, g or G to indicate GB. The default value depends on the 
platform:

Linux/ARM (32-bit): 320 KB

Linux/i386 (32-bit): 320 KB

Linux/x64 (64-bit): 1024 KB

OS X (64-bit): 1024 KB

Oracle Solaris/i386 (32-bit): 320 KB

Oracle Solaris/x64 (64-bit): 1024 KB
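The arithmetic behind this thread, sketched out with the numbers from the messages above:

```python
STACK_MB = 1          # Linux/x64 default -Xss is 1024 KB
max_threads = 10_000

# Worst-case stack memory if maxThreads is actually reached
print(round(max_threads * STACK_MB / 1024, 1), "GB of stacks")  # 9.8 GB

# The solrconfig comment's 5x-CPUs guideline, for a 36-CPU box
print(36 * 5, "threads")  # 180 threads
```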


wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Oct 18, 2017, at 1:44 PM, Walter Underwood  wrote:
> 
> With an 8GB heap, I’d like to keep thread stack memory to 2GB or under, which 
> means a maxThreads of 1000.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 
>> On Oct 18, 2017, at 1:41 PM, Walter Underwood  wrote:
>> 
>> Jetty maxThreads is set to 10,000, which seems way too big.
>> 
>> The comment suggests 5X the number of CPUs. We have 36 CPUs, which would 
>> mean 180 threads, which seems more reasonable.
>> 
>> wunder
>> Walter Underwood
>> wun...@wunderwood.org
>> http://observer.wunderwood.org/  (my blog)
>> 
>> 
> 



Re: Jetty maxThreads

2017-10-18 Thread Walter Underwood
With an 8GB heap, I’d like to keep thread stack memory to 2GB or under, which 
means a maxThreads of 1000.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)


> On Oct 18, 2017, at 1:41 PM, Walter Underwood  wrote:
> 
> Jetty maxThreads is set to 10,000, which seems way too big.
> 
> The comment suggests 5X the number of CPUs. We have 36 CPUs, which would mean 
> 180 threads, which seems more reasonable.
> 
> wunder
> Walter Underwood
> wun...@wunderwood.org
> http://observer.wunderwood.org/  (my blog)
> 
> 



Jetty maxThreads

2017-10-18 Thread Walter Underwood
Jetty maxThreads is set to 10,000, which seems way too big.

The comment suggests 5X the number of CPUs. We have 36 CPUs, which would mean 
180 threads, which seems more reasonable.

wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/  (my blog)




Re: Querying fields that don't exist in every collection

2017-10-18 Thread David Hastings
Well, it makes sense for most use cases, however. Let's say you have a
normal user enter the query
WATER:WHEN IS IT NOT A LIQUID
If it were interpreted strictly, it would throw an error unless WATER is a
field, and "NOT" needs to be turned into "not" as well.

On Wed, Oct 18, 2017 at 3:46 PM, Beach, Daniel 
wrote:

> Yes, that appears to be true unless I'm missing something obvious.
>
> It seems like in this case Solr should either 1) search against the fields
> that it /does/ know about in a given collection normally, or 2) return an
> error. That would be more intuitive than returning results but with
> different search logic.
>
> Currently we add placeholder fields to other collections in an alias to
> get around this if required, but it's messy.
>
> -Original Message-
> From: David Hastings [mailto:hastings.recurs...@gmail.com]
> Sent: Wednesday, October 18, 2017 3:20 PM
> To: solr-user@lucene.apache.org
> Subject: Re: Querying fields that don't exist in every collection
>
> I may be wrong here, but what I think is happening is that the edismax
> parser sees a field that doesn't exist, and therefore "believes" all the
> logic you entered into the query is a complete mistake and negates it as
> such. So NOT becomes the word "not" and * becomes whitespace.
>
> On Wed, Oct 18, 2017 at 3:15 PM, Beach, Daniel 
> wrote:
>
> > Hello all,
> >
> > I'm running into an issue where adding a field to the qf changes the
> > parsed query if that field doesn't exist in the Solr index. Our use
> > case for doing this is that we have multiple collections and many of
> > our queries leverage aliases to search across several of them
> > simultaneously. While we have a common group of fields that is shared
> > across schemas, it is often the case that we need to use
> > collection-specific fields to boost one content type over another.
> >
> > Here is the query debug output from a query that only searches on
> > existing fields. This is the intended outcome.
> >
> > q=companies%20NOT%20brands*&qf=Text&wt=json&debug=query&
> > defType=edismax
> >
> > "rawquerystring":"companies NOT brands*",
> > "querystring":"companies NOT brands*",
> > "parsedquery":"(+(DisjunctionMaxQuery((Text:company))
> > -DisjunctionMaxQuery((Text:brands*))))/no_coord",
> > "parsedquery_toString":"+((Text:company) -(Text:brands*))",
> > "QParser":"ExtendedDismaxQParser",
> >
> >
> > When a "BOGUS" field name is added to the second query suddenly the
> > query operators aren't interpreted correctly, and wildcards get dropped.
> >
> > q=companies%20NOT%20brands*&qf=Text%20BOGUS&wt=json&debug=
> > query&defType=edismax
> >
> > "rawquerystring":"companies NOT brands*",
> > "querystring":"companies NOT brands*",
> > "parsedquery":"(+(DisjunctionMaxQuery((Text:company))
> > DisjunctionMaxQuery((Text:not)) DisjunctionMaxQuery((Text:
> > brand))))/no_coord",
> > "parsedquery_toString":"+((Text:company) (Text:not) (Text:brand))",
> > "QParser":"ExtendedDismaxQParser"
> >
> >
> > These results are from Solr 6.5.0 and 5.5, querying against a single
> > collection endpoint. Is this Solr's expected functionality? Hopefully
> > I'm just blanking on something simple here.
> >
> > Thanks,
> > Daniel Beach
> >
> > 
> >
> > The information contained in this message is intended only for the
> > recipient, and may be a confidential attorney-client communication or
> > may otherwise be privileged and confidential and protected from
> > disclosure. If the reader of this message is not the intended
> > recipient, or an employee or agent responsible for delivering this
> > message to the intended recipient, please be aware that any
> > dissemination or copying of this communication is strictly prohibited.
> > If you have received this communication in error, please immediately
> > notify us by replying to the message and deleting it from your
> > computer. S Global Inc. reserves the right, subject to applicable
> > local law, to monitor, review and process the content of any
> > electronic message or information sent to or from S Global Inc.
> > e-mail addresses without informing the sender or recipient of the
> > message. By sending electronic message or information to S Global
> > Inc. e-mail addresses you, as the sender, are consenting to S Global
> Inc. processing any of your personal data therein.
> >
>
> 
>

RE: Querying fields that don't exist in every collection

2017-10-18 Thread Beach, Daniel
Yes, that appears to be true unless I'm missing something obvious.

It seems like in this case Solr should either 1) search against the fields that 
it /does/ know about in a given collection normally, or 2) return an error. 
That would be more intuitive than returning results but with different search 
logic.

Currently we add placeholder fields to other collections in an alias to get 
around this if required, but it's messy.

-Original Message-
From: David Hastings [mailto:hastings.recurs...@gmail.com]
Sent: Wednesday, October 18, 2017 3:20 PM
To: solr-user@lucene.apache.org
Subject: Re: Querying fields that don't exist in every collection

I may be wrong here, but what I think is happening is that the edismax parser
sees a field that doesn't exist, and therefore "believes" all the logic you
entered into the query is a complete mistake and negates it as such. So NOT
becomes the word "not" and * becomes whitespace.

On Wed, Oct 18, 2017 at 3:15 PM, Beach, Daniel 
wrote:

> Hello all,
>
> I'm running into an issue where adding a field to the qf changes the
> parsed query if that field doesn't exist in the Solr index. Our use
> case for doing this is that we have multiple collections and many of
> our queries leverage aliases to search across several of them
> simultaneously. While we have a common group of fields that is shared
> across schemas, it is often the case that we need to use
> collection-specific fields to boost one content type over another.
>
> Here is the query debug output from a query that only searches on
> existing fields. This is the intended outcome.
>
> q=companies%20NOT%20brands*&qf=Text&wt=json&debug=query&
> defType=edismax
>
> "rawquerystring":"companies NOT brands*",
> "querystring":"companies NOT brands*",
> "parsedquery":"(+(DisjunctionMaxQuery((Text:company))
> -DisjunctionMaxQuery((Text:brands*))))/no_coord",
> "parsedquery_toString":"+((Text:company) -(Text:brands*))",
> "QParser":"ExtendedDismaxQParser",
>
>
> When a "BOGUS" field name is added to the second query suddenly the
> query operators aren't interpreted correctly, and wildcards get dropped.
>
> q=companies%20NOT%20brands*&qf=Text%20BOGUS&wt=json&debug=
> query&defType=edismax
>
> "rawquerystring":"companies NOT brands*",
> "querystring":"companies NOT brands*",
> "parsedquery":"(+(DisjunctionMaxQuery((Text:company))
> DisjunctionMaxQuery((Text:not)) DisjunctionMaxQuery((Text:
> brand))))/no_coord",
> "parsedquery_toString":"+((Text:company) (Text:not) (Text:brand))",
> "QParser":"ExtendedDismaxQParser"
>
>
> These results are from Solr 6.5.0 and 5.5, querying against a single
> collection endpoint. Is this Solr's expected functionality? Hopefully
> I'm just blanking on something simple here.
>
> Thanks,
> Daniel Beach
>
> 
>
>





Re: Querying fields that don't exist in every collection

2017-10-18 Thread David Hastings
I may be wrong here, but what I think is happening is that the edismax parser
sees a field that doesn't exist, and therefore "believes" all the logic you
entered into the query is a complete mistake and negates it as such. So
NOT becomes the word "not" and * becomes whitespace.
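A toy illustration of that fallback — purely hypothetical code, not Solr's actual parser, just a sketch of the observed behavior where operators become plain lowercase terms and trailing wildcards are dropped:

```python
OPERATORS = {"AND", "OR", "NOT"}

def fallback_tokens(query):
    # Crude simulation of the escape-everything fallback seen in the
    # debug output; stemming is not simulated here
    return [tok.lower().rstrip("*") if tok in OPERATORS or tok.endswith("*")
            else tok
            for tok in query.split()]

print(fallback_tokens("companies NOT brands*"))  # ['companies', 'not', 'brands']
```

(The further brands -> brand step visible in the debug output is stemming, which this sketch doesn't attempt.)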

On Wed, Oct 18, 2017 at 3:15 PM, Beach, Daniel 
wrote:

> Hello all,
>
> I'm running into an issue where adding a field to the qf changes the
> parsed query if that field doesn't exist in the Solr index. Our use case
> for doing this is that we have multiple collections and many of our queries
> leverage aliases to search across several of them simultaneously. While we
> have a common group of fields that is shared across schemas, it is often
> the case that we need to use collection-specific fields to boost one
> content type over another.
>
> Here is the query debug output from a query that only searches on existing
> fields. This is the intended outcome.
>
> q=companies%20NOT%20brands*&qf=Text&wt=json&debug=query&
> defType=edismax
>
> "rawquerystring":"companies NOT brands*",
> "querystring":"companies NOT brands*",
> "parsedquery":"(+(DisjunctionMaxQuery((Text:company))
> -DisjunctionMaxQuery((Text:brands*))))/no_coord",
> "parsedquery_toString":"+((Text:company) -(Text:brands*))",
> "QParser":"ExtendedDismaxQParser",
>
>
> When a "BOGUS" field name is added to the second query suddenly the query
> operators aren't interpreted correctly, and wildcards get dropped.
>
> q=companies%20NOT%20brands*&qf=Text%20BOGUS&wt=json&debug=
> query&defType=edismax
>
> "rawquerystring":"companies NOT brands*",
> "querystring":"companies NOT brands*",
> "parsedquery":"(+(DisjunctionMaxQuery((Text:company))
> DisjunctionMaxQuery((Text:not)) DisjunctionMaxQuery((Text:
> brand))))/no_coord",
> "parsedquery_toString":"+((Text:company) (Text:not) (Text:brand))",
> "QParser":"ExtendedDismaxQParser"
>
>
> These results are from Solr 6.5.0 and 5.5, querying against a single
> collection endpoint. Is this Solr's expected functionality? Hopefully I'm
> just blanking on something simple here.
>
> Thanks,
> Daniel Beach
>
> 
>
>


Querying fields that don't exist in every collection

2017-10-18 Thread Beach, Daniel
Hello all,

I'm running into an issue where adding a field to the qf changes the parsed 
query if that field doesn't exist in the Solr index. Our use case for doing 
this is that we have multiple collections and many of our queries leverage 
aliases to search across several of them simultaneously. While we have a common 
group of fields that is shared across schemas, it is often the case that we 
need to use collection-specific fields to boost one content type over another.

Here is the query debug output from a query that only searches on existing 
fields. This is the intended outcome.

q=companies%20NOT%20brands*&qf=Text&wt=json&debug=query&defType=edismax

"rawquerystring":"companies NOT brands*",
"querystring":"companies NOT brands*",
"parsedquery":"(+(DisjunctionMaxQuery((Text:company)) 
-DisjunctionMaxQuery((Text:brands*))))/no_coord",
"parsedquery_toString":"+((Text:company) -(Text:brands*))",
"QParser":"ExtendedDismaxQParser",


When a "BOGUS" field name is added to the second query suddenly the query 
operators aren't interpreted correctly, and wildcards get dropped.


q=companies%20NOT%20brands*&qf=Text%20BOGUS&wt=json&debug=query&defType=edismax

"rawquerystring":"companies NOT brands*",
"querystring":"companies NOT brands*",
"parsedquery":"(+(DisjunctionMaxQuery((Text:company)) 
DisjunctionMaxQuery((Text:not)) DisjunctionMaxQuery((Text:brand))))/no_coord",
"parsedquery_toString":"+((Text:company) (Text:not) (Text:brand))",
"QParser":"ExtendedDismaxQParser"


These results are from Solr 6.5.0 and 5.5, querying against a single collection 
endpoint. Is this Solr's expected functionality? Hopefully I'm just blanking on 
something simple here.

Thanks,
Daniel Beach





Re: Trying to fix Too Many Boolean Clauses Exception

2017-10-18 Thread Yonik Seeley
On Wed, Oct 18, 2017 at 12:23 PM, Erick Erickson
 wrote:
> What have you tried? And what is the current setting?
>
> This usually occurs when you are assembling very large OR clauses,
> sometimes for ACL calculations.
>
> So if you have a query of the form
> q=field:(A OR B OR C OR)
> or
> fq=field:(A OR B OR C OR)
>
> change it to use TermsQueryParser, see
> https://lucene.apache.org/solr/guide/6_6/other-parsers.html it doesn't
> suffer this limitation.
>
> In recent versions of Solr this is automatic.

Yeah, that was implemented in 6.4, for cases where we know we don't need
scoring. So it will be automatic for things like "fq" parameters, but
not "q" unless you wrap it in a filter().
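A sketch of what that wrapping looks like when building the query string (field and terms are illustrative):

```python
terms = ["A", "B", "C"]
big_or = "field:(" + " OR ".join(terms) + ")"

# Wrapping the big OR in filter() marks it as non-scoring, so it can get
# the automatic terms-query treatment (6.4+) and be cached like an fq
q = f"something AND filter({big_or})"
print(q)  # something AND filter(field:(A OR B OR C))
```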

-Yonik


>
> Best,
> Erick
>
> On Wed, Oct 18, 2017 at 7:44 AM, Patrick R. TOKOUO  wrote:
>> Hello,
>> Please, I have unsuccessfully tried to fix this error on Solr 6.4.
>> I have increased the maxBooleanClauses value to some max, but the same error
>> appears.
>> Please, could you help me.
>>
>> Regards,
>> Patrick R. TOKOUO
>> Mob: (+237) 6 90 08 55 95
>> Skype: ptokouo
>> In: www.linkedin.com/in/patricktokouo
>>
>> 


Re: LTR 'feature' and passing date parameters

2017-10-18 Thread Dariusz Wojtas
Thank you very much Binoy.
This worked perfectly.

Best regards,
Dariusz Wojtas

On Wed, Oct 18, 2017 at 5:06 PM, Binoy Dalal  wrote:

> Dariusz,
> This problem is most probably occurring because solr does not store dates
> in the format you've specified. It's something like: 2017-10-08T12:23:00Z.
> You'll probably need to specify your date in your efi feature in the manner
> above to get it to work.
>
> You can find more details on dates here:
> https://lucene.apache.org/solr/guide/6_6/working-with-dates.html
>
> On Wed 18 Oct, 2017, 19:16 Dariusz Wojtas,  wrote:
>
> > Hi,
> > I am using the LTR functionality (SOLR 7) and need to define a feature
> that
> > will check if the given request parameter of type date (ie. '1998-11-23')
> > matches birthDate in the stored document. Date granularity should be on
> DAY
> > level.
> > Simply:
> > * if dates match - return 1
> > * otherwise (birthDate not set, or they do not match) - return 0
> >
> > I have several features and do run some model that gives me the final
> > score. I cannot find a way that will calculate value for date related
> > feature.
> >
> > Currently I am having a problem even with passing the date param, i.e.
> > '1998-11-23', to the feature to treat it as a date.
> >
> > My 'efi.' param for date is defined as follows:
> >  efi.searchBirthDate=1998-11-23
> >
> > In my feature I want to compare dates using the ms(x,y) function and
> check
> > if they are equal.
> >  ms(${searchBirthDate}, birthDate)
> >
> > But I get exception on calculating the feature:
> > Invalid Date String:'1998-11-23'
> >
> > Any idea how to solve such problem?
> >
> > Best regards,
> > Dariusz Wojtas
> >
> --
> Regards,
> Binoy Dalal
>


TermsQuery Result Ordering

2017-10-18 Thread Webster Homer
I have an application which currently uses a boolean query. The query could
have a large number of boolean terms. I know that the TermsQuery doesn't
have the same limitations as the boolean query. However I need to maintain
the order of the original terms.

The query terms from the boolean query are actually values returned by a
chemical structure search, which are returned in order of their relevancy
in the structure search. I maintain the order by giving them a boost which
is a function of the relevancy from the structure search.

structure_id:(12345^800 OR 12356^750 OR abcde^600 ...

This approach gives me the results in the order I need them in. I'd love to
use the TermsQuery instead as it doesn't have the same limitations.
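As a rough sketch of why the plain boolean form preserves the ordering (Python used only to assemble the query string; the field name and scores come from the example above), each term carries its structure-search relevancy as a boost, whereas a `{!terms}` query scores every match with the same constant score, so that information would be lost:

```python
def boosted_boolean(field, scored_ids):
    # Each term keeps its structure-search relevancy as a boost;
    # a {!terms} query would give all matches the same constant score,
    # discarding this ordering information.
    clauses = " OR ".join("%s^%d" % (term, boost) for term, boost in scored_ids)
    return "%s:(%s)" % (field, clauses)

q = boosted_boolean("structure_id", [("12345", 800), ("12356", 750), ("abcde", 600)])
print(q)
```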

Can I boost the Terms in the terms query? Is there a way to order the
results? e.g. would the results be returned in the same order I specified
the terms?

Thanks,

-- 


This message and any attachment are confidential and may be privileged or 
otherwise protected from disclosure. If you are not the intended recipient, 
you must not copy this message or attachment or disclose the contents to 
any other person. If you have received this transmission in error, please 
notify the sender immediately and delete the message and any attachment 
from your system. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not accept liability for any omissions or errors in this 
message which may arise as a result of E-Mail-transmission or for damages 
resulting from any unauthorized changes of the content of this message and 
any attachment thereto. Merck KGaA, Darmstadt, Germany and any of its 
subsidiaries do not guarantee that this message is free of viruses and does 
not accept liability for any damages caused by any virus transmitted 
therewith.

Click http://www.emdgroup.com/disclaimer to access the German, French, 
Spanish and Portuguese versions of this disclaimer.


Re: Trying to fix Too Many Boolean Clauses Exception

2017-10-18 Thread Erick Erickson
What have you tried? And what is the current setting?

This usually occurs when you are assembling very large OR clauses,
sometimes for ACL calculations.

So if you have a query of the form
q=field:(A OR B OR C OR)
or
fq=field:(A OR B OR C OR)

change it to use TermsQueryParser, see
https://lucene.apache.org/solr/guide/6_6/other-parsers.html it doesn't
suffer this limitation.

In recent versions of Solr this is automatic.

Best,
Erick

On Wed, Oct 18, 2017 at 7:44 AM, Patrick R. TOKOUO  wrote:
> Hello,
> Please, I have unsuccessfully tried to fix this error on Solr 6.4.
> I have increased the maxBooleanClauses value to some maximum, but the
> same error appears.
> Please, could you help me.
>
> Regards,
> Patrick R. TOKOUO
> Mob: (+237) 6 90 08 55 95
> Skype: ptokouo
> In: www.linkedin.com/in/patricktokouo
>
> Virus-free. www.avg.com


Re: LTR 'feature' and passing date parameters

2017-10-18 Thread Binoy Dalal
Dariusz,
This problem is most probably occurring because solr does not store dates
in the format you've specified. It's something like: 2017-10-08T12:23:00Z.
You'll probably need to specify your date in your efi feature in the manner
above to get it to work.

You can find more details on dates here:
https://lucene.apache.org/solr/guide/6_6/working-with-dates.html
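A minimal sketch of normalizing the parameter on the client side before sending it (this assumes a Python client; only the parameter value format is taken from the thread):

```python
from datetime import datetime, timezone

def to_solr_date(day):
    # Expand a yyyy-MM-dd value to Solr's canonical ISO-8601 instant
    dt = datetime.strptime(day, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    return dt.strftime("%Y-%m-%dT%H:%M:%SZ")

# efi.searchBirthDate would then be sent as:
print(to_solr_date("1998-11-23"))  # 1998-11-23T00:00:00Z
```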

On Wed 18 Oct, 2017, 19:16 Dariusz Wojtas,  wrote:

> Hi,
> I am using the LTR functionality (SOLR 7) and need to define a feature that
> will check if the given request parameter of type date (ie. '1998-11-23')
> matches birthDate in the stored document. Date granularity should be on DAY
> level.
> Simply:
> * if dates match - return 1
> * otherwise (birthDate not set, or they do not match) - return 0
>
> I have several features and do run some model that gives me the final
> score. I cannot find a way that will calculate value for date related
> feature.
>
> Currently I am having a problem even with passing the date param, i.e.
> '1998-11-23', to the feature to treat it as a date.
>
> My 'efi.' param for date is defined as follows:
>  efi.searchBirthDate=1998-11-23
>
> In my feature I want to compare dates using the ms(x,y) function and check
> if they are equal.
>  ms(${searchBirthDate}, birthDate)
>
> But I get exception on calculating the feature:
> Invalid Date String:'1998-11-23'
>
> Any idea how to solve such problem?
>
> Best regards,
> Dariusz Wojtas
>
-- 
Regards,
Binoy Dalal


Trying to fix Too Many Boolean Clauses Exception

2017-10-18 Thread Patrick R. TOKOUO
Hello,
Please, I have unsuccessfully tried to fix this error on Solr 6.4.
I have increased the maxBooleanClauses value to some maximum, but the
same error appears.
Please, could you help me.

Regards,
Patrick R. TOKOUO
Mob: (+237) 6 90 08 55 95
Skype: ptokouo
In: www.linkedin.com/in/patricktokouo


Virus-free. www.avg.com


Certificate issue

2017-10-18 Thread Younge, Kent A - Norman, OK - Contractor

Jack, 

Are you still having the same issue?





Thank you,

Kent Younge
Systems Engineer
USPS MTSC IT Support
600 W. Rock Creek Rd, Norman, OK  73069-8357
O:405 573 2273


-Original Message-
From: Younge, Kent A - Norman, OK - Contractor 
[mailto:kent.a.you...@usps.gov.INVALID] 
Sent: Monday, October 16, 2017 10:58 AM
To: solr-user@lucene.apache.org
Subject: RE: solrcloud dead-lock

Jack, 

No, I still have the issue on one box only.  I have re-requested certificates 
several times and still come back with the same issue.  If I put a working 
certificate on the box, everything works the way it should.  Also, if I browse 
https: to the server name instead of the registered certificate name, Solr 
comes up with an untrusted-certificate warning showing that the site is 
registered to my certificate name.  So Solr is working, but not with my 
certificates.  I have messed with the Java security settings; that did not 
help.  The box works like it should, and for whatever reason it will not work 
with that certificate.  I have changed the names of the certificate: I had a 
hyphen in the name and thought that was causing an issue, but taking the 
hyphen out made no difference.  In IE I get the "turn on TLS" error even 
though TLS is set.  In Chrome I get ERR_SSL_VERSION_OR_CIPHER_MISMATCH.






-Original Message-
From: SOLR6931 [mailto:solrpubl...@gmail.com] 
Sent: Monday, October 16, 2017 9:13 AM
To: solr-user@lucene.apache.org
Subject: Re: solrcloud dead-lock

Hey Kent,
Have you managed to find a solution to your problem?
I'm currently encountering the exact same issue.

Jack



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: [ANNOUNCE] Apache Solr 6.6.2 released

2017-10-18 Thread Ishan Chattopadhyaya
You can now access them. Thanks for your patience and understanding.

On 18 Oct 2017 3:15 pm, "Ishan Chattopadhyaya" 
wrote:

> Hi Bernd,
>
> > I get something like "permissionViolation=true", even after login!!!
> This is due to the security sensitive nature of those issues. Only PMC
> members have access to those issues at the moment.
> Please revisit those issues after Solr 5.5.5 release to see if you can
> access them.
>
> > Is SOLR going to be closed source?
> No
>
> > Do we have to pay for seeing the issues? ;-)
> No
>
> Regards,
> Ishan
>
>
> On Wed, Oct 18, 2017 at 2:23 PM, Bernd Fehling <
> bernd.fehl...@uni-bielefeld.de> wrote:
>
>> Thanks,
>> but I tried to access the mentioned issues of
>> https://lucene.apache.org/solr/6_6_2/changes/Changes.html
>>
>> https://issues.apache.org/jira/browse/SOLR-11477
>> https://issues.apache.org/jira/browse/SOLR-11482
>>
>> I get something like "permissionViolation=true", even after login!!!
>>
>> Is SOLR going to be closed source?
>>
>> Do we have to pay for seeing the issues? ;-)
>>
>> Regards
>> Bernd
>>
>>
>> Am 18.10.2017 um 10:29 schrieb Ishan Chattopadhyaya:
>> > 18 October 2017, Apache Solr™ 6.6.2 available
>> >
>> > The Lucene PMC is pleased to announce the release of Apache Solr 6.6.2
>> >
>> > Solr is the popular, blazing fast, open source NoSQL search platform
>> from
>> > the
>> > Apache Lucene project. Its major features include powerful full-text
>> > search,
>> > hit highlighting, faceted search and analytics, rich document parsing,
>> > geospatial search, extensive REST APIs as well as parallel SQL. Solr is
>> > enterprise grade, secure and highly scalable, providing fault tolerant
>> > distributed search and indexing, and powers the search and navigation
>> > features
>> > of many of the world's largest internet sites.
>> >
>> > This release includes a critical security fix and a bugfix. Details:
>> >
>> > * Fix for a 0-day exploit (CVE-2017-12629), details:
>> > https://s.apache.org/FJDl.
>> >   RunExecutableListener has been disabled by default (can be enabled by
>> >   -Dsolr.enableRunExecutableListener=true) and resolving external
>> entities
>> > in
>> >   the XML query parser (defType=xmlparser or {!xmlparser ... }) is
>> disabled
>> > by
>> >   default.
>> >
>> > * Fix a bug where Solr was attempting to load the same core twice (Error
>> > message:
>> >   "Lock held by this virtual machine").
>> >
>> > Furthermore, this release includes Apache Lucene 6.6.2 which includes
>> one
>> > security
>> > fix since the 6.6.1 release.
>> >
>> > The release is available for immediate download at:
>> >
>> >   http://www.apache.org/dyn/closer.lua/lucene/solr/6.6.2
>> >
>> > Please read CHANGES.txt for a detailed list of changes:
>> >
>> >   https://lucene.apache.org/solr/6_6_2/changes/Changes.html
>> >
>> > Please report any feedback to the mailing lists
>> > (http://lucene.apache.org/solr/discussion.html)
>> >
>> > Note: The Apache Software Foundation uses an extensive mirroring
>> > network for distributing releases. It is possible that the mirror you
>> > are using may not have replicated the release yet. If that is the
>> > case, please try another mirror. This also goes for Maven access.
>> >
>>
>
>


LTR 'feature' and passing date parameters

2017-10-18 Thread Dariusz Wojtas
Hi,
I am using the LTR functionality (Solr 7) and need to define a feature that
will check if the given request parameter of type date (i.e. '1998-11-23')
matches birthDate in the stored document. Date granularity should be on DAY
level.
Simply:
* if dates match - return 1
* otherwise (birthDate not set, or they do not match) - return 0

I have several features and do run some model that gives me the final
score. I cannot find a way that will calculate value for date related
feature.

Currently i am having problem even with passing the date param, ie
'1998-11-23' to the feature to treat it as a date.

My 'efi.' param for date is defined as follows:
 efi.searchBirthDate=1998-11-23

In my feature I want to compare dates using the ms(x,y) function and check
if they are equal.
 ms(${searchBirthDate}, birthDate)

But I get exception on calculating the feature:
Invalid Date String:'1998-11-23'

Any idea how to solve such problem?

Best regards,
Dariusz Wojtas


Re: zero-day exploit security issue

2017-10-18 Thread Cassandra Targett
The JIRA issues are now publicly viewable:

https://issues.apache.org/jira/browse/SOLR-11482
https://issues.apache.org/jira/browse/SOLR-11477



On Wed, Oct 18, 2017 at 4:49 AM, Ishan Chattopadhyaya
 wrote:
> There will be a 5.5.5 release soon. 6.6.2 has just been released.
>
> On Mon, Oct 16, 2017 at 8:17 PM, Keith L  wrote:
>
>> Additionally, it looks like the commits are public on github. Is this
>> backported to 5.5.x too? Users that are still on 5x might want to backport
>> some of the issues themselves since it is not officially supported anymore.
>>
>> On Mon, Oct 16, 2017 at 10:11 AM Mike Drob  wrote:
>>
>> > Given the already public nature of the disclosure, does it make
>> sense
>> > to make the work being done public prior to release as well?
>> >
>> > Normally security fixes are kept private while the vulnerabilities are
>> > private, but that's not the case here...
>> >
>> > On Mon, Oct 16, 2017 at 1:20 AM, Shalin Shekhar Mangar <
>> > shalinman...@gmail.com> wrote:
>> >
>> > > Yes, there is but it is private i.e. only the Apache Lucene PMC
>> > > members can see it. This is standard for all security issues in Apache
>> > > land. The fixes for this issue has been applied to the release
>> > > branches and the Solr 7.1.0 release candidate is already up for vote.
>> > > Barring any unforeseen circumstances, a 7.1.0 release with the fixes
>> > > should be expected this week.
>> > >
>> > > On Fri, Oct 13, 2017 at 8:14 PM, Xie, Sean  wrote:
>> > > > Is there a tracking to address this issue for SOLR 6.6.x and 7.x?
>> > > >
>> > > > https://lucene.apache.org/solr/news.html#12-october-
>> > > 2017-please-secure-your-apache-solr-servers-since-a-
>> > > zero-day-exploit-has-been-reported-on-a-public-mailing-list
>> > > >
>> > > > Sean
>> > > >
>> > > > Confidentiality Notice::  This email, including attachments, may
>> > include
>> > > non-public, proprietary, confidential or legally privileged
>> information.
>> > > If you are not an intended recipient or an authorized agent of an
>> > intended
>> > > recipient, you are hereby notified that any dissemination, distribution
>> > or
>> > > copying of the information contained in or transmitted with this e-mail
>> > is
>> > > unauthorized and strictly prohibited.  If you have received this email
>> in
>> > > error, please notify the sender by replying to this message and
>> > permanently
>> > > delete this e-mail, its attachments, and any copies of it immediately.
>> > You
>> > > should not retain, copy or use this e-mail or any attachment for any
>> > > purpose, nor disclose all or any part of the contents to any other
>> > person.
>> > > Thank you.
>> > >
>> > >
>> > >
>> > > --
>> > > Regards,
>> > > Shalin Shekhar Mangar.
>> > >
>> >
>>


Re: Concern on solr commit

2017-10-18 Thread Yonik Seeley
On Wed, Oct 18, 2017 at 5:09 AM, Leo Prince
 wrote:
> Are there any known negative impacts in setting autoSoftCommit to 1
> second other than RAM usage?

Briefly:
Don't use autowarming (but keep caches enabled!)
Use docValues for fields you will facet and sort on (this will avoid
using FieldCache)

-Yonik
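For reference, enabling docValues is a schema change; a minimal sketch follows (the field names and types here are invented examples, not taken from the thread, and the `string`/`date` field types are assumed to be defined elsewhere in the schema):

```xml
<!-- schema.xml sketch: docValues-backed fields avoid the FieldCache
     when faceting and sorting (field names are illustrative) -->
<field name="category"  type="string" indexed="true" stored="false" docValues="true"/>
<field name="published" type="date"   indexed="true" stored="true"  docValues="true"/>
```

Note that changing docValues on an existing field requires a full re-index.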


Re: AW: Howto verify that update is "in-place"

2017-10-18 Thread alessandro.benedetti
According to the concept of immutability that drives Lucene's segmenting
approach, I think Emir's observation sounds correct.

Since docValues are a column-based data structure stored in the segments, I
guess that when an in-place update happens it re-indexes just that field.
This means we need to write a new segment containing the information and
potentially merge it once it is flushed to disk.



--
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io


Re: Influencing representing document in grouped search.

2017-10-18 Thread alessandro.benedetti
If you add a filter query to your original query:
fq=genre:A

you know that your results (group heads included) will just be of that
genre.
So I think we are not getting your question properly.

Can you try to express your requirement from the beginning?
Leave grouping and field collapsing aside for the moment; let's see what is
the best way to solve the requirement in Apache Solr.



--
Alessandro Benedetti
Search Consultant, R&D Software Engineer, Director
Sease Ltd. - www.sease.io


Intermittent issue in solr index update

2017-10-18 Thread Bhaumik Joshi
Hi,


I am intermittently facing a "Cannot talk to ZooKeeper" issue during Solr 
index updates. The strange thing is that while this happens there are no 
errors in the ZooKeeper logs, and all shards show as active in the Solr 
admin panel.


Please find below the detailed logs and Solr server configuration.


Logs:

ERROR (qtp41903949-261266) [c:documents s:shard1 r:core_node4 x:documents] 
o.a.s.h.RequestHandlerBase org.apache.solr.common.SolrException: Cannot talk to 
ZooKeeper - Updates are disabled.
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.zkCheck(DistributedUpdateProcessor.java:1490)
at 
org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:678)
at 
org.apache.solr.update.processor.LogUpdateProcessorFactory$LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:103)
at 
org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:48)
at 
org.apache.solr.update.processor.AsiteDocumentUpdateReqProcessor.processAdd(AsiteDocumentUpdateReqProcessorFactory.java:125)
at org.apache.solr.handler.loader.JavabinLoader$1.update(JavabinLoader.java:97)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readOuterMostDocIterator(JavaBinUpdateRequestCodec.java:179)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readIterator(JavaBinUpdateRequestCodec.java:135)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:274)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec$1.readNamedList(JavaBinUpdateRequestCodec.java:121)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:239)
at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:157)
at 
org.apache.solr.client.solrj.request.JavaBinUpdateRequestCodec.unmarshal(JavaBinUpdateRequestCodec.java:186)
at 
org.apache.solr.handler.loader.JavabinLoader.parseAndLoadDocs(JavabinLoader.java:107)
at org.apache.solr.handler.loader.JavabinLoader.load(JavabinLoader.java:54)
at 
org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:97)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:69)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:156)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:2036)
at org.apache.solr.servlet.HttpSolrCall.execute(HttpSolrCall.java:657)
at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:464)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
at 
org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
at 
org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
at 
org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
at 
org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
at 
org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
at 
org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
at 
org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
at 
org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
at org.eclipse.jetty.server.Server.handle(Server.java:518)
at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
at 
org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)
at org.eclipse.jetty.io.FillInterest.fillable(FillInterest.java:95)
at 
org.eclipse.jetty.io.SelectChannelEndPoint$2.run(SelectChannelEndPoint.java:93)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.produceAndRun(ExecuteProduceConsume.java:246)
at 
org.eclipse.jetty.util.thread.strategy.ExecuteProduceConsume.run(ExecuteProduceConsume.java:156)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool.runJob(QueuedThreadPool.java:654)
at 
org.eclipse.jetty.util.thread.QueuedThreadPool$3.run(QueuedThreadPool.java:572)
at java.lang.Thread.run(Thread.java:745)


Solr server configuration:
Processor: Intel(R) Xeon(R) CPU E5-2630 v4 @ 2.20GHz (2 processors)
RAM : 128 GB usable
System type : 64-bit
OS : Windows Server 2012 Standard


Thanks & Regards,

Bhaumik Joshi


Re: zero-day exploit security issue

2017-10-18 Thread Ishan Chattopadhyaya
There will be a 5.5.5 release soon. 6.6.2 has just been released.

On Mon, Oct 16, 2017 at 8:17 PM, Keith L  wrote:

> Additionally, it looks like the commits are public on github. Is this
> backported to 5.5.x too? Users that are still on 5x might want to backport
> some of the issues themselves since it is not officially supported anymore.
>
> On Mon, Oct 16, 2017 at 10:11 AM Mike Drob  wrote:
>
> > Given the already public nature of the disclosure, does it make
> sense
> > to make the work being done public prior to release as well?
> >
> > Normally security fixes are kept private while the vulnerabilities are
> > private, but that's not the case here...
> >
> > On Mon, Oct 16, 2017 at 1:20 AM, Shalin Shekhar Mangar <
> > shalinman...@gmail.com> wrote:
> >
> > > Yes, there is but it is private i.e. only the Apache Lucene PMC
> > > members can see it. This is standard for all security issues in Apache
> > > land. The fixes for this issue has been applied to the release
> > > branches and the Solr 7.1.0 release candidate is already up for vote.
> > > Barring any unforeseen circumstances, a 7.1.0 release with the fixes
> > > should be expected this week.
> > >
> > > On Fri, Oct 13, 2017 at 8:14 PM, Xie, Sean  wrote:
> > > > Is there a tracking to address this issue for SOLR 6.6.x and 7.x?
> > > >
> > > > https://lucene.apache.org/solr/news.html#12-october-
> > > 2017-please-secure-your-apache-solr-servers-since-a-
> > > zero-day-exploit-has-been-reported-on-a-public-mailing-list
> > > >
> > > > Sean
> > > >
> > > > Confidentiality Notice::  This email, including attachments, may
> > include
> > > non-public, proprietary, confidential or legally privileged
> information.
> > > If you are not an intended recipient or an authorized agent of an
> > intended
> > > recipient, you are hereby notified that any dissemination, distribution
> > or
> > > copying of the information contained in or transmitted with this e-mail
> > is
> > > unauthorized and strictly prohibited.  If you have received this email
> in
> > > error, please notify the sender by replying to this message and
> > permanently
> > > delete this e-mail, its attachments, and any copies of it immediately.
> > You
> > > should not retain, copy or use this e-mail or any attachment for any
> > > purpose, nor disclose all or any part of the contents to any other
> > person.
> > > Thank you.
> > >
> > >
> > >
> > > --
> > > Regards,
> > > Shalin Shekhar Mangar.
> > >
> >
>


Re: [ANNOUNCE] Apache Solr 6.6.2 released

2017-10-18 Thread Ishan Chattopadhyaya
Hi Bernd,

> I get something like "permissionViolation=true", even after login!!!
This is due to the security sensitive nature of those issues. Only PMC
members have access to those issues at the moment.
Please revisit those issues after Solr 5.5.5 release to see if you can
access them.

> Is SOLR going to be closed source?
No

> Do we have to pay for seeing the issues? ;-)
No

Regards,
Ishan


On Wed, Oct 18, 2017 at 2:23 PM, Bernd Fehling <
bernd.fehl...@uni-bielefeld.de> wrote:

> Thanks,
> but I tried to access the mentioned issues of
> https://lucene.apache.org/solr/6_6_2/changes/Changes.html
>
> https://issues.apache.org/jira/browse/SOLR-11477
> https://issues.apache.org/jira/browse/SOLR-11482
>
> I get something like "permissionViolation=true", even after login!!!
>
> Is SOLR going to be closed source?
>
> Do we have to pay for seeing the issues? ;-)
>
> Regards
> Bernd
>
>
> Am 18.10.2017 um 10:29 schrieb Ishan Chattopadhyaya:
> > 18 October 2017, Apache Solr™ 6.6.2 available
> >
> > The Lucene PMC is pleased to announce the release of Apache Solr 6.6.2
> >
> > Solr is the popular, blazing fast, open source NoSQL search platform from
> > the
> > Apache Lucene project. Its major features include powerful full-text
> > search,
> > hit highlighting, faceted search and analytics, rich document parsing,
> > geospatial search, extensive REST APIs as well as parallel SQL. Solr is
> > enterprise grade, secure and highly scalable, providing fault tolerant
> > distributed search and indexing, and powers the search and navigation
> > features
> > of many of the world's largest internet sites.
> >
> > This release includes a critical security fix and a bugfix. Details:
> >
> > * Fix for a 0-day exploit (CVE-2017-12629), details:
> > https://s.apache.org/FJDl.
> >   RunExecutableListener has been disabled by default (can be enabled by
> >   -Dsolr.enableRunExecutableListener=true) and resolving external
> entities
> > in
> >   the XML query parser (defType=xmlparser or {!xmlparser ... }) is
> disabled
> > by
> >   default.
> >
> > * Fix a bug where Solr was attempting to load the same core twice (Error
> > message:
> >   "Lock held by this virtual machine").
> >
> > Furthermore, this release includes Apache Lucene 6.6.2 which includes one
> > security
> > fix since the 6.6.1 release.
> >
> > The release is available for immediate download at:
> >
> >   http://www.apache.org/dyn/closer.lua/lucene/solr/6.6.2
> >
> > Please read CHANGES.txt for a detailed list of changes:
> >
> >   https://lucene.apache.org/solr/6_6_2/changes/Changes.html
> >
> > Please report any feedback to the mailing lists
> > (http://lucene.apache.org/solr/discussion.html)
> >
> > Note: The Apache Software Foundation uses an extensive mirroring
> > network for distributing releases. It is possible that the mirror you
> > are using may not have replicated the release yet. If that is the
> > case, please try another mirror. This also goes for Maven access.
> >
>


Re: Concern on solr commit

2017-10-18 Thread Leo Prince
Hi Yonik,

Thank you for the inputs.

When we checked, upgrading Solr and trying out commitWithin would require a
lot of code changes in the existing application, so we are planning to set
autoSoftCommit to 1 second while keeping autoCommit at 15 seconds. That way
we can meet our NRT demands: we avoid frequent and numerous hard commits
while still being able to search newly written documents in near real time.
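In solrconfig.xml terms, the plan described above would look roughly like this (the maxTime values are taken from the message; treat this as a sketch, not a tuned recommendation):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>15000</maxTime>           <!-- hard commit every 15 s -->
    <openSearcher>false</openSearcher> <!-- no new searcher on hard commit -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>1000</maxTime>            <!-- soft commit every 1 s for NRT search -->
  </autoSoftCommit>
</updateHandler>
```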

Are there any known negative impacts in setting autoSoftCommit to 1
second other than RAM usage?

Thanks in advance.

Regards,
Leo Prince.

On Tue, Oct 17, 2017 at 7:29 PM, Yonik Seeley  wrote:

> Related: maxWarmingSearchers behavior was fixed (block for another
> commit to succeed first rather than fail) in Solr 6.4 and later.
> https://issues.apache.org/jira/browse/SOLR-9712
>
> Also, if any of your "realtime" search requests only involve
> retrieving certain documents by ID, then you can use "realtime get"
> without opening a new searcher.
>
> -Yonik
>
>
> On Tue, Oct 17, 2017 at 9:45 AM, Leo Prince
>  wrote:
> > Hi,
> >
> > Thank you Emir, Erick and Shawn for your inputs.
> >
> > I am currently using SolrCloud and planning to try out commitWithin
> > parameter to reduce hard commits as per your advice. Though, I just wanted
> > to double check whether commitWithin has any
> > negative impacts in a SolrCloud environment, like a lag to search from
> > other nodes in SolrCloud.
> >
> > Thanks,
> > Leo Prince.
> >
> > On Tue, Oct 17, 2017 at 4:01 AM, Shawn Heisey 
> wrote:
> >
> >> I'm supplementing the other replies you've already gotten.  See inline:
> >>
> >>
> >> On 10/13/2017 2:30 AM, Leo Prince wrote:
> >> > I am getting the following errors/warnings from Solr:
> >> > 1. ERROR: org.apache.solr.client.solrj.impl.HttpSolrServer$RemoteSolrException:
> >> >    Error opening new searcher. exceeded limit of maxWarmingSearchers=2,
> >> >    try again later.
> >> > 2. PERFORMANCE WARNING: Overlapping onDeckSearchers=2
> >> > 3. WARN: DistributedUpdateProcessor error sending
> >> See this FAQ entry:
> >>
> >> https://wiki.apache.org/solr/FAQ?highlight=%28ondecksearchers%29#What_
> >> does_.22exceeded_limit_of_maxWarmingSearchers.3DX.22_mean.3F
> >>
> >> > So my concern is, is there any chance of performance issues when
> >> > number of commits are high at a particular point of time. In our
> >> > application, we are approximating like 100-500 commits can happen
> >> > simultaneously from application and autocommit too for those
> >> > individual requests which are not committing individually after the
> >> > write.
> >> >
> >> > Autocommit is configured as follows:
> >> >
> >> > <autoCommit>
> >> >   <maxTime>15000</maxTime>
> >> >   <openSearcher>false</openSearcher>
> >> > </autoCommit>
> >> The commits generated by this configuration are not opening new
> >> searchers, so they are not connected in any way to the error messages
> >> you're getting, which are about new searchers.  Note that this
> >> particular kind of configuration is strongly recommended for ALL Solr
> >> installs using Solr 4.0 and later, so that transaction logs do not grow
> >> out of control.  I would personally use a value of at least 6 for
> >> autoCommit, but there is nothing wrong with a 15 second interval.
> >>
> >> The initial reply you got on this thread mentioned that commits from the
> >> application are discouraged.  I don't agree with this statement, but I
> >> will say that the way that people *use* commits from the application is
> >> frequently very wrong, and because of that, switching to fully automatic
> >> soft commits is often the best solution, because they are somewhat
> >> easier to control.
> >>
> >> We have no way of knowing how long it will take to open a new searcher
> >> on your index.  It depends on a lot of factors.  Whatever that time is,
> >> commits should not be happening on a more frequent basis than that
> >> interval.  They should happen *less* frequently than that interval if at
> >> all possible.  Depending on exactly how Solr is configured, it might be
> >> possible to reduce the amount of time that a commit with a new searcher
> >> takes to complete.
> >>
> >> Definitely avoid sending a commit after every document.  It is generally
> >> also a bad idea to send a commit with every update request.  If you want
> >> to do commits manually, then you should index a bunch of data and then
> >> send one commit to make all those changes visible, and not do another
> >> commit until you do another batch of indexing.
> >>
> >> Thanks,
> >> Shawn
> >>
> >>
>
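[Editor's note: the hard/soft commit split Shawn recommends might look like this in solrconfig.xml. This is a sketch; the soft-commit interval shown is illustrative, not taken from the thread.]

```xml
<!-- Hard commit: flush to disk and roll the transaction log regularly,
     but do not open a new searcher (cheap, safe to run often). -->
<autoCommit>
  <maxTime>15000</maxTime>           <!-- every 15 seconds -->
  <openSearcher>false</openSearcher>
</autoCommit>

<!-- Soft commit: opens a new searcher to make changes visible.
     Keep this interval longer than searcher open/warm time. -->
<autoSoftCommit>
  <maxTime>60000</maxTime>           <!-- every 60 seconds (illustrative) -->
</autoSoftCommit>
```

With both in place, the application can stop sending explicit commits entirely and rely on the automatic ones.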


Re: [ANNOUNCE] Apache Solr 6.6.2 released

2017-10-18 Thread Bernd Fehling
Thanks,
but when I try to access the issues mentioned in
https://lucene.apache.org/solr/6_6_2/changes/Changes.html

https://issues.apache.org/jira/browse/SOLR-11477
https://issues.apache.org/jira/browse/SOLR-11482

I get something like "permissionViolation=true", even after login!!!

Is SOLR going to be closed source?

Do we have to pay for seeing the issues? ;-)

Regards
Bernd


Am 18.10.2017 um 10:29 schrieb Ishan Chattopadhyaya:
> 18 October 2017, Apache Solr™ 6.6.2 available
> 
> The Lucene PMC is pleased to announce the release of Apache Solr 6.6.2
> 
> Solr is the popular, blazing fast, open source NoSQL search platform from
> the
> Apache Lucene project. Its major features include powerful full-text
> search,
> hit highlighting, faceted search and analytics, rich document parsing,
> geospatial search, extensive REST APIs as well as parallel SQL. Solr is
> enterprise grade, secure and highly scalable, providing fault tolerant
> distributed search and indexing, and powers the search and navigation
> features
> of many of the world's largest internet sites.
> 
> This release includes a critical security fix and a bugfix. Details:
> 
> * Fix for a 0-day exploit (CVE-2017-12629), details:
> https://s.apache.org/FJDl.
>   RunExecutableListener has been disabled by default (can be enabled by
>   -Dsolr.enableRunExecutableListener=true) and resolving external entities
> in
>   the XML query parser (defType=xmlparser or {!xmlparser ... }) is disabled
> by
>   default.
> 
> * Fix a bug where Solr was attempting to load the same core twice (Error
> message:
>   "Lock held by this virtual machine").
> 
> Furthermore, this release includes Apache Lucene 6.6.2 which includes one
> security
> fix since the 6.6.1 release.
> 
> The release is available for immediate download at:
> 
>   http://www.apache.org/dyn/closer.lua/lucene/solr/6.6.2
> 
> Please read CHANGES.txt for a detailed list of changes:
> 
>   https://lucene.apache.org/solr/6_6_2/changes/Changes.html
> 
> Please report any feedback to the mailing lists
> (http://lucene.apache.org/solr/discussion.html)
> 
> Note: The Apache Software Foundation uses an extensive mirroring
> network for distributing releases. It is possible that the mirror you
> are using may not have replicated the release yet. If that is the
> case, please try another mirror. This also goes for Maven access.
> 


[ANNOUNCE] Apache Solr 6.6.2 released

2017-10-18 Thread Ishan Chattopadhyaya
18 October 2017, Apache Solr™ 6.6.2 available

The Lucene PMC is pleased to announce the release of Apache Solr 6.6.2

Solr is the popular, blazing fast, open source NoSQL search platform from
the
Apache Lucene project. Its major features include powerful full-text
search,
hit highlighting, faceted search and analytics, rich document parsing,
geospatial search, extensive REST APIs as well as parallel SQL. Solr is
enterprise grade, secure and highly scalable, providing fault tolerant
distributed search and indexing, and powers the search and navigation
features
of many of the world's largest internet sites.

This release includes a critical security fix and a bugfix. Details:

* Fix for a 0-day exploit (CVE-2017-12629), details:
https://s.apache.org/FJDl.
  RunExecutableListener has been disabled by default (can be enabled by
  -Dsolr.enableRunExecutableListener=true) and resolving external entities
in
  the XML query parser (defType=xmlparser or {!xmlparser ... }) is disabled
by
  default.

* Fix a bug where Solr was attempting to load the same core twice (Error
message:
  "Lock held by this virtual machine").

Furthermore, this release includes Apache Lucene 6.6.2 which includes one
security
fix since the 6.6.1 release.

The release is available for immediate download at:

  http://www.apache.org/dyn/closer.lua/lucene/solr/6.6.2

Please read CHANGES.txt for a detailed list of changes:

  https://lucene.apache.org/solr/6_6_2/changes/Changes.html

Please report any feedback to the mailing lists
(http://lucene.apache.org/solr/discussion.html)

Note: The Apache Software Foundation uses an extensive mirroring
network for distributing releases. It is possible that the mirror you
are using may not have replicated the release yet. If that is the
case, please try another mirror. This also goes for Maven access.


Re: AW: Howto verify that update is "in-place"

2017-10-18 Thread Emir Arnautović
Hi,
Not claiming that is the case here, but I think I've read some comments (I
think in some Jira) suggesting that in-place updates are not as cheap as
one might think: they do not re-index the doc, but they do rewrite the doc
values for the segment. I did not look at the code, but if someone is
familiar with this, could they please jump in and comment on how cheap
in-place updates really are?

Thanks,
Emir
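[Editor's note: per the Solr Reference Guide, an update is only executed in-place when the target field is a single-valued, non-indexed, non-stored numeric docValues field (and the _version_ field is likewise a non-indexed, non-stored docValues field); otherwise Solr falls back to an atomic update, which re-indexes the whole document. A schema sketch with hypothetical field names:]

```xml
<!-- Eligible for in-place updates: docValues only, not indexed, not stored -->
<field name="popularity" type="long" indexed="false" stored="false"
       docValues="true" multiValued="false"/>

<!-- NOT eligible: indexed and stored, so a set/inc on this field becomes
     an atomic update and re-indexes the document -->
<field name="title" type="string" indexed="true" stored="true"/>
```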

On Oct 17, 2017 2:11 PM, "James"  wrote:

> I found a solution which works for me:
>
> Add a document with very little tokenized text and write down the QTime
> (for me: 5ms).
> Add another document with a lot of text (I used about 1MB of Lorem Ipsum
> sample text) and write down the QTime (for me: 70ms).
> Perform the update operation you want to test for being "in-place" on
> document 2 and compare the QTime.
> For me it was again 70ms, so I assume that my operation re-indexed the
> whole document and was thus not an in-place update.
>
>
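[Editor's note: a check that avoids timing noise, assuming no segment merge runs in between: request the internal Lucene document id with the `[docid]` transformer before and after the update. An atomic update rewrites the document and assigns a new internal id, while an in-place update should keep the old one. A request sketch, reusing the ID from later in this thread:]

```
q=ID:1133&fl=ID,_version_,[docid]
```

If `[docid]` is unchanged after the update (while `_version_` has advanced), the update was applied in place.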
> -Original Message-
> From: Amrit Sarkar [mailto:sarkaramr...@gmail.com]
> Sent: Tuesday, 17 October 2017 12:43
> To: solr-user@lucene.apache.org
> Subject: Re: Howto verify that update is "in-place"
>
> James,
>
> @Amrit: Are you saying that the _version_ field should not change when
> > performing an atomic update operation?
>
>
> It should change; a new version will be allotted to the document. I am not
> so sure about in-place updates; a test run will probably verify that.
>
> Amrit Sarkar
> Search Engineer
> Lucidworks, Inc.
> 415-589-9269
> www.lucidworks.com
> Twitter http://twitter.com/lucidworks
> LinkedIn: https://www.linkedin.com/in/sarkaramrit2
>
> On Tue, Oct 17, 2017 at 4:06 PM, James  wrote:
>
> > Hi Emir and Amrit, thanks for your reponses!
> >
> > @Emir: Nice idea, but after changing any document in any way and
> > committing the changes, all doc counters (Num, Max, Deleted) are still
> > the same; the only thing that changes is the Version (it increases in
> > steps of 2).
> >
> > @Amrit: Are you saying that the _version_ field should not change when
> > performing an atomic update operation?
> >
> > Thanks
> > James
> >
> >
> > -Original Message-
> > From: Amrit Sarkar [mailto:sarkaramr...@gmail.com]
> > Sent: Tuesday, 17 October 2017 11:35
> > To: solr-user@lucene.apache.org
> > Subject: Re: Howto verify that update is "in-place"
> >
> > Hi James,
> >
> > Since each update you are doing via an atomic operation contains the
> > "id" / "uniqueKey", comparing the "_version_" field value for one of
> > them would be fine for a batch. For the rest, Emir has listed them out.
> >
> > Amrit Sarkar
> > Search Engineer
> > Lucidworks, Inc.
> > 415-589-9269
> > www.lucidworks.com
> > Twitter http://twitter.com/lucidworks
> > LinkedIn: https://www.linkedin.com/in/sarkaramrit2
> >
> > On Tue, Oct 17, 2017 at 2:47 PM, Emir Arnautović <
> > emir.arnauto...@sematext.com> wrote:
> >
> > > Hi James,
> > > I did not try, but checking max and num doc might give you info if
> > > update was in-place or atomic - atomic is reindexing of existing doc
> > > so the old doc will be deleted. In-place update should just update
> > > doc values of existing doc so number of deleted docs should not change.
> > >
> > > HTH,
> > > Emir
> > > --
> > > Monitoring - Log Management - Alerting - Anomaly Detection Solr &
> > > Elasticsearch Consulting Support Training - http://sematext.com/
> > >
> > >
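[Editor's note: the counters Emir refers to can be read without the admin UI via the Luke request handler; host, port, and core name below are placeholders:]

```
/solr/MyCollection/admin/luke?numTerms=0&wt=json
```

In the response, `numDocs` is the live document count, `maxDoc` includes deleted-but-not-merged documents, and `deletedDocs` is the difference between the two.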
> > >
> > > > On 17 Oct 2017, at 09:57, James  wrote:
> > > >
> > > > I am using Solr 6.6 and carefully read the documentation about
> > > > atomic and in-place updates. I am pretty sure that everything is
> > > > set up as it
> > > should be.
> > > >
> > > >
> > > >
> > > > But how can I make certain that a simple update command actually
> > > performs an
> > > > in-place update without internally re-indexing all other fields?
> > > >
> > > >
> > > >
> > > > I am issuing this command to my server:
> > > >
> > > > (I am using implicit document routing, so I need the "Shard"
> > > > parameter.)
> > > >
> > > >
> > > >
> > > > {
> > > >
> > > > "ID":1133,
> > > >
> > > > "Property_2":{"set":124},
> > > >
> > > > "Shard":"FirstShard"
> > > >
> > > > }
> > > >
> > > >
> > > >
> > > >
> > > >
> > > > The log outputs:
> > > >
> > > >
> > > >
> > > > 2017-10-17 07:39:18.701 INFO  (qtp1937348256-643) [c:MyCollection
> > > > s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1]
> > > > o.a.s.u.p.LogUpdateProcessorFactory
> > > > [MyCollection_FirstShard_replica1]
> > > > webapp=/solr path=/update
> > > > params={commitWithin=1000=1.0=true=
> > > json&_=1508221142230}{
> > > > add=[1133 (1581489542869811200)]} 0 1
> > > >
> > > > 2017-10-17 07:39:19.703 INFO  (commitScheduler-283-thread-1)
> > > [c:MyCollection
> > > > s:FirstShard r:core_node27 x:MyCollection_FirstShard_replica1]
> > > > o.a.s.u.DirectUpdateHandler2 start
> > > > commit{,optimize=false,openSearcher=false,waitSearcher=true,
> > > expungeDeletes=f
> > > >