Re: Performance warning: Overlapping onDeskSearchers=2 solr

2017-05-17 Thread Susheel Kumar
+1 to change to new message

A strawman new message could be: "Performance warning: Overlapping
onDeskSearchers=2; consider reducing commit frequency if performance
problems encountered"

On Wed, May 17, 2017 at 1:15 PM, Mike Drob  wrote:

> You're committing too frequently, so you have new searchers getting queued
> up before the previous ones have been processed.
>
> You have several options on how to deal with this. Can increase commit
> interval, add hardware, or reduce query warming.
>
> I don't know if uncommenting that section will help because I don't know
> what your current settings are. Or if you are using manual commits.
>
> Mike
>
> On Wed, May 17, 2017, 4:58 AM Srinivas Kashyap 
> wrote:
>
> > Hi All,
> >
> > We are using Solr 5.2.1 version and are currently experiencing below
> > Warning in Solr Logging Console:
> >
> > Performance warning: Overlapping onDeskSearchers=2
> >
> > Also we encounter,
> >
> > org.apache.solr.common.SolrException: Error opening new searcher. exceeded
> > limit of maxWarmingSearchers=2, try again later.
> >
> >
> > The reason being, we are doing mass update on our application and solr
> > experiencing the higher loads at times. Data is being indexed using DIH(sql
> > queries).
> >
> > In solrconfig.xml below is the code.
> >
> > 
> >
> > Should we be uncommenting the above lines and try to avoid this error?
> > Please help me.
> >
> > Thanks and Regards,
> > Srinivas Kashyap
> >
>


Is IndexSchema addFields and addCopyFields concurrent?

2017-05-17 Thread Michael Hu
Hi all:


I am new to Solr, and I am using Solr 6.4.2. I am adding fields and copyFields 
to the schema programmatically, as shown below. On rare occasions, when I add a 
large batch (about 80 fields and 40 copyFields, each copyField having one field 
as source and another as destination), a few of the fields are not added even 
though the copyFields are. This then causes core initialization to fail because 
the fields referenced by the copyFields do not exist.


Can someone help me?


Thank you!


--Michael Hu


synchronized (oldSchema.getSchemaUpdateLock()) {
    try {
        IndexSchema newSchema =
                oldSchema.addFields(newFields).addCopyFields(newCopyFields, true);
        if (null != newSchema) {
            core.setLatestSchema(newSchema);
            cmd.getReq().updateSchemaToLatest();
            latestSchemaMap.put(core.getName(), newSchema);
            log.debug("Successfully added field(s) to the schema.");
            break; // success - exit from the retry loop
        } else {
            throw new SolrException(SolrException.ErrorCode.SERVER_ERROR,
                    "Failed to add fields.");
        }
    } catch (ManagedIndexSchema.FieldExistsException e) {
        log.error("At least one field to be added already exists in the schema - retrying.");
        oldSchema = core.getLatestSchema();
        cmd.getReq().updateSchemaToLatest();
    } catch (ManagedIndexSchema.SchemaChangedInZkException e) {
        log.debug("Schema changed while processing request - retrying.");
        oldSchema = core.getLatestSchema();
        cmd.getReq().updateSchemaToLatest();
    }
}
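
For context, the snippet above sits inside a retry loop roughly like the
following (the loop itself and MAX_RETRIES are my paraphrase, not the actual
code):

for (int attempt = 0; attempt < MAX_RETRIES; attempt++) {
    // body shown above: addFields(...).addCopyFields(...), break on success,
    // refresh oldSchema and retry on FieldExistsException /
    // SchemaChangedInZkException
}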



Re: Possible regression in Parallel SQL in 6.5.1?

2017-05-17 Thread Timothy Potter
cool, thanks ... easy enough to fix the SQL statement for now ;-)
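
Per Joel's note below, repeating the aggregate in the HAVING clause instead of
the alias should do it, something like:

SELECT movie_id, COUNT(*) as num_ratings, avg(rating) as aggAvg FROM ratings
GROUP BY movie_id HAVING COUNT(*) > 100 ORDER BY aggAvg ASC LIMIT 10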


On Tue, May 16, 2017 at 6:27 PM, Kevin Risden  wrote:
> Well didn't take as long as I thought:
> https://issues.apache.org/jira/browse/CALCITE-1306
>
> Once Calcite 1.13 is released we should upgrade and get support for this
> again.
>
> Kevin Risden
>
> On Tue, May 16, 2017 at 7:23 PM, Kevin Risden 
> wrote:
>
>> Yea this came up on the calcite mailing list. Not sure if aliases in the
>> having clause were going to be added. I'll have to see if I can find that
>> discussion or JIRA.
>>
>> Kevin Risden
>>
>> On May 16, 2017 18:54, "Joel Bernstein"  wrote:
>>
>>> Yeah, Calcite doesn't support field aliases in the having clause. The
>>> query
>>> should work if you use count(*). We could consider this a regression, but
>>> I
>>> think this will be a won't fix.
>>>
>>> Joel Bernstein
>>> http://joelsolr.blogspot.com/
>>>
>>> On Tue, May 16, 2017 at 12:51 PM, Timothy Potter 
>>> wrote:
>>>
>>> > This SQL used to work pre-calcite:
>>> >
>>> > SELECT movie_id, COUNT(*) as num_ratings, avg(rating) as aggAvg FROM
>>> > ratings GROUP BY movie_id HAVING num_ratings > 100 ORDER BY aggAvg ASC
>>> > LIMIT 10
>>> >
>>> > Now I get:
>>> > Caused by: java.io.IOException: -->
>>> > http://192.168.1.4:8983/solr/ratings_shard2_replica1/:Failed to
>>> > execute sqlQuery 'SELECT movie_id, COUNT(*) as num_ratings,
>>> > avg(rating) as aggAvg FROM ratings GROUP BY movie_id HAVING
>>> > num_ratings > 100 ORDER BY aggAvg ASC LIMIT 10' against JDBC
>>> > connection 'jdbc:calcitesolr:'.
>>> > Error while executing SQL "SELECT movie_id, COUNT(*) as num_ratings,
>>> > avg(rating) as aggAvg FROM ratings GROUP BY movie_id HAVING
>>> > num_ratings > 100 ORDER BY aggAvg ASC LIMIT 10": From line 1, column
>>> > 103 to line 1, column 113: Column 'num_ratings' not found in any table
>>> > at org.apache.solr.client.solrj.io.stream.SolrStream.read(
>>> > SolrStream.java:235)
>>> > at com.lucidworks.spark.query.TupleStreamIterator.fetchNextTupl
>>> e(
>>> > TupleStreamIterator.java:82)
>>> > at com.lucidworks.spark.query.TupleStreamIterator.hasNext(
>>> > TupleStreamIterator.java:47)
>>> > ... 31 more
>>> >
>>>
>>


Re: Performance warning: Overlapping onDeskSearchers=2 solr

2017-05-17 Thread Mike Drob
You're committing too frequently, so you have new searchers getting queued
up before the previous ones have been processed.

You have several options on how to deal with this. Can increase commit
interval, add hardware, or reduce query warming.

I don't know if uncommenting that section will help because I don't know
what your current settings are. Or if you are using manual commits.

Mike

On Wed, May 17, 2017, 4:58 AM Srinivas Kashyap 
wrote:

> Hi All,
>
> We are using Solr 5.2.1 version and are currently experiencing below
> Warning in Solr Logging Console:
>
> Performance warning: Overlapping onDeskSearchers=2
>
> Also we encounter,
>
> org.apache.solr.common.SolrException: Error opening new searcher. exceeded
> limit of maxWarmingSearchers=2, try again later.
>
>
> The reason being, we are doing mass update on our application and solr
> experiencing the higher loads at times. Data is being indexed using DIH(sql
> queries).
>
> In solrconfig.xml below is the code.
>
> 
>
> Should we be uncommenting the above lines and try to avoid this error?
> Please help me.
>
> Thanks and Regards,
> Srinivas Kashyap
>


Re: Solr Admin Documents tab

2017-05-17 Thread Rick Leir
Chris, Shawn,
I am using 5.2.1. Neither the array (Shawn) nor the document list (Chris)
works for me in the Admin panel. However, CSV works fine.

Clearly we are long overdue for an upgrade. 
Cheers -- Rick

On May 17, 2017 10:22:28 AM EDT, Shawn Heisey  wrote:
>On 5/16/2017 12:41 PM, Rick Leir wrote:
>> In the Solr Admin Documents tab, with the document type set to JSON,
>I cannot get it to accept more than one document. The legend says
>"Document(s)". What syntax is expected? It rejects an array of
>documents. Thanks -- Rick
>
>See the box labeled "Adding Multiple JSON Documents" on this page for
>an
>example of multiple JSON documents being added in one request:
>
>https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-Solr-StyleJSON
>
>The next section, labeled "Sending JSON Update Commands" shows how to
>use the command-based syntax with multiple documents and commands.
>
>Thanks,
>Shawn

-- 
Sorry for being brief. Alternate email is rickleir at yahoo dot com 

Re: Performance warning: Overlapping onDeskSearchers=2 solr

2017-05-17 Thread Shalin Shekhar Mangar
This has been changed already in 6.4.0. From the CHANGES.txt entry:

SOLR-9712: maxWarmingSearchers now defaults to 1, and more importantly commits
will now block if this limit is exceeded instead of throwing an exception (a
good thing). Consequently there is no longer a risk in overlapping commits.
Nonetheless users should continue to avoid excessive committing. Users are
advised to remove any pre-existing maxWarmingSearchers entries from their
solrconfig.xml files.
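
(The entry in question is the solrconfig.xml element that typically looks like:

  <maxWarmingSearchers>2</maxWarmingSearchers>

and on 6.4.0 or later it can simply be removed.)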

On Wed, May 17, 2017 at 8:45 PM, Jason Gerlowski  wrote:
> Hey Shawn, others.
>
> This is a pitfall that Solr users seem to run into with some
> frequency.  (Anecdotally, I've bookmarked the Lucidworks article you
> referenced because I end up referring people to it often enough.)
>
> The immediate first advice when someone encounters these
> onDeckSearcher error messages is to examine their commit settings.  Is
> there any other possible cause for those messages?  If not, can we
> consider changing the log/exception error message to be more explicit
> about the cause?
>
> A strawman new message could be: "Performance warning: Overlapping
> onDeskSearchers=2; consider reducing commit frequency if performance
> problems encountered"
>
> Happy to create a JIRA/patch for this; just wanted to get some
> feedback first in case there's an obvious reason the messages don't
> get explicit about the cause.
>
> Jason
>
> On Wed, May 17, 2017 at 8:49 AM, Shawn Heisey  wrote:
>> On 5/17/2017 5:57 AM, Srinivas Kashyap wrote:
>>> We are using Solr 5.2.1 version and are currently experiencing below 
>>> Warning in Solr Logging Console:
>>>
>>> Performance warning: Overlapping onDeskSearchers=2
>>>
>>> Also we encounter,
>>>
>>> org.apache.solr.common.SolrException: Error opening new searcher. exceeded 
>>> limit of maxWarmingSearchers=2, try again later.
>>>
>>>
>>> The reason being, we are doing mass update on our application and solr 
>>> experiencing the higher loads at times. Data is being indexed using DIH(sql 
>>> queries).
>>>
>>> In solrconfig.xml below is the code.
>>>
>>> 
>>>
>>> Should we be uncommenting the above lines and try to avoid this error? 
>>> Please help me.
>>
>> This warning means that you are committing so frequently that there are
>> already two searchers warming when you start another commit.
>>
>> DIH does a commit exactly once -- at the end of the import.  One import will 
>> not cause the warning message you're seeing, so if there is one import 
>> happening at a time, either you are sending explicit commit requests during 
>> the import, or you have autoSoftCommit enabled with values that are far too 
>> small.
>>
>> You should definitely have autoCommit configured, but I would remove
>> maxDocs and set maxTime to something like 60000 -- one minute.  The
>> autoCommit should also set openSearcher to false.  This kind of commit
>> will not make new changes visible, but it will start a new transaction
>> log frequently.
>>
>>    <autoCommit>
>>      <maxTime>60000</maxTime>
>>      <openSearcher>false</openSearcher>
>>    </autoCommit>
>>
>> An automatic commit (soft or hard) with a one second interval is going to 
>> cause that warning you're seeing.
>>
>> https://lucidworks.com/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>>
>> Thanks,
>> Shawn
>>



-- 
Regards,
Shalin Shekhar Mangar.


Re: Performance warning: Overlapping onDeskSearchers=2 solr

2017-05-17 Thread Erick Erickson
Also, what is your autoSoftCommit setting? That also opens up a new searcher.
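
(In solrconfig.xml that's the block that looks roughly like this in the stock
config:

  <autoSoftCommit>
    <maxTime>${solr.autoSoftCommit.maxTime:-1}</maxTime>
  </autoSoftCommit>

A very small maxTime there opens a new searcher on every soft commit.)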

On Wed, May 17, 2017 at 8:15 AM, Jason Gerlowski  wrote:
> Hey Shawn, others.
>
> This is a pitfall that Solr users seem to run into with some
> frequency.  (Anecdotally, I've bookmarked the Lucidworks article you
> referenced because I end up referring people to it often enough.)
>
> The immediate first advice when someone encounters these
> onDeckSearcher error messages is to examine their commit settings.  Is
> there any other possible cause for those messages?  If not, can we
> consider changing the log/exception error message to be more explicit
> about the cause?
>
> A strawman new message could be: "Performance warning: Overlapping
> onDeskSearchers=2; consider reducing commit frequency if performance
> problems encountered"
>
> Happy to create a JIRA/patch for this; just wanted to get some
> feedback first in case there's an obvious reason the messages don't
> get explicit about the cause.
>
> Jason
>
> On Wed, May 17, 2017 at 8:49 AM, Shawn Heisey  wrote:
>> On 5/17/2017 5:57 AM, Srinivas Kashyap wrote:
>>> We are using Solr 5.2.1 version and are currently experiencing below 
>>> Warning in Solr Logging Console:
>>>
>>> Performance warning: Overlapping onDeskSearchers=2
>>>
>>> Also we encounter,
>>>
>>> org.apache.solr.common.SolrException: Error opening new searcher. exceeded 
>>> limit of maxWarmingSearchers=2, try again later.
>>>
>>>
>>> The reason being, we are doing mass update on our application and solr 
>>> experiencing the higher loads at times. Data is being indexed using DIH(sql 
>>> queries).
>>>
>>> In solrconfig.xml below is the code.
>>>
>>> 
>>>
>>> Should we be uncommenting the above lines and try to avoid this error? 
>>> Please help me.
>>
>> This warning means that you are committing so frequently that there are
>> already two searchers warming when you start another commit.
>>
>> DIH does a commit exactly once -- at the end of the import.  One import will 
>> not cause the warning message you're seeing, so if there is one import 
>> happening at a time, either you are sending explicit commit requests during 
>> the import, or you have autoSoftCommit enabled with values that are far too 
>> small.
>>
>> You should definitely have autoCommit configured, but I would remove
>> maxDocs and set maxTime to something like 60000 -- one minute.  The
>> autoCommit should also set openSearcher to false.  This kind of commit
>> will not make new changes visible, but it will start a new transaction
>> log frequently.
>>
>>    <autoCommit>
>>      <maxTime>60000</maxTime>
>>      <openSearcher>false</openSearcher>
>>    </autoCommit>
>>
>> An automatic commit (soft or hard) with a one second interval is going to 
>> cause that warning you're seeing.
>>
>> https://lucidworks.com/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>>
>> Thanks,
>> Shawn
>>


Re: Performance warning: Overlapping onDeskSearchers=2 solr

2017-05-17 Thread Jason Gerlowski
Hey Shawn, others.

This is a pitfall that Solr users seem to run into with some
frequency.  (Anecdotally, I've bookmarked the Lucidworks article you
referenced because I end up referring people to it often enough.)

The immediate first advice when someone encounters these
onDeckSearcher error messages is to examine their commit settings.  Is
there any other possible cause for those messages?  If not, can we
consider changing the log/exception error message to be more explicit
about the cause?

A strawman new message could be: "Performance warning: Overlapping
onDeskSearchers=2; consider reducing commit frequency if performance
problems encountered"

Happy to create a JIRA/patch for this; just wanted to get some
feedback first in case there's an obvious reason the messages don't
get explicit about the cause.

Jason

On Wed, May 17, 2017 at 8:49 AM, Shawn Heisey  wrote:
> On 5/17/2017 5:57 AM, Srinivas Kashyap wrote:
>> We are using Solr 5.2.1 version and are currently experiencing below Warning 
>> in Solr Logging Console:
>>
>> Performance warning: Overlapping onDeskSearchers=2
>>
>> Also we encounter,
>>
>> org.apache.solr.common.SolrException: Error opening new searcher. exceeded 
>> limit of maxWarmingSearchers=2, try again later.
>>
>>
>> The reason being, we are doing mass update on our application and solr 
>> experiencing the higher loads at times. Data is being indexed using DIH(sql 
>> queries).
>>
>> In solrconfig.xml below is the code.
>>
>> 
>>
>> Should we be uncommenting the above lines and try to avoid this error? 
>> Please help me.
>
> This warning means that you are committing so frequently that there are
> already two searchers warming when you start another commit.
>
> DIH does a commit exactly once -- at the end of the import.  One import will 
> not cause the warning message you're seeing, so if there is one import 
> happening at a time, either you are sending explicit commit requests during 
> the import, or you have autoSoftCommit enabled with values that are far too 
> small.
>
> You should definitely have autoCommit configured, but I would remove
> maxDocs and set maxTime to something like 60000 -- one minute.  The
> autoCommit should also set openSearcher to false.  This kind of commit
> will not make new changes visible, but it will start a new transaction
> log frequently.
>
>    <autoCommit>
>      <maxTime>60000</maxTime>
>      <openSearcher>false</openSearcher>
>    </autoCommit>
>
> An automatic commit (soft or hard) with a one second interval is going to 
> cause that warning you're seeing.
>
> https://lucidworks.com/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/
>
> Thanks,
> Shawn
>


Re: solr /export handler - behavior during close()

2017-05-17 Thread Susmit Shukla
Thanks Joel, will try that.
A binary response would be more performant.
I observed that the server sends responses in 32 KB chunks and the client
reads them with an 8 KB buffer on the input stream. I don't know whether
changing that would have any impact on performance. Even if the buffer size is
increased on the HttpClient, it can't override the hardcoded 8 KB buffer in
sun.nio.cs.StreamDecoder.

Thanks,
Susmit

On Wed, May 17, 2017 at 5:49 AM, Joel Bernstein  wrote:

> Susmit,
>
> You could wrap a LimitStream around the outside of all the relational
> algebra. For example:
>
> parallel(limit((intersect(intersect(search, search), union(search,
> search)
>
> In this scenario the limit would happen on the workers.
>
> As far as the worker/replica ratio. This will depend on how heavy the
> export is. If it's a light export, small number of fields, mostly numeric,
> simple sort params, then I've seen a ratio of 5 (workers) to 1 (replica)
> work well. This will basically saturate the CPU on the replica. But heavier
> exports will saturate the replicas with fewer workers.
>
> Also I tend to use Direct DocValues to get the best performance. I'm not
> sure how much difference this makes, but it should eliminate the
> compression overhead fetching the data from the DocValues.
>
> Varun's suggestion of using the binary transport will provide a nice
> performance increase as well. But you'll need to upgrade. You may need to
> do that anyway as the fix on the early stream close will be on a later
> version that was refactored to support the binary transport.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, May 16, 2017 at 8:03 PM, Joel Bernstein 
> wrote:
>
> > Yep, saw it. I'll comment on the ticket for what I believe needs to be
> > done.
> >
> > Joel Bernstein
> > http://joelsolr.blogspot.com/
> >
> > On Tue, May 16, 2017 at 8:00 PM, Varun Thacker 
> wrote:
> >
> >> Hi Joel,Susmit
> >>
> >> I created https://issues.apache.org/jira/browse/SOLR-10698 to track the
> >> issue
> >>
> >> @Susmit looking at the stack trace I see the expression is using
> >> JSONTupleStream
> >> . I wonder if you tried using JavabinTupleStreamParser could it help
> >> improve performance ?
> >>
> >> On Tue, May 16, 2017 at 9:39 AM, Susmit Shukla  >
> >> wrote:
> >>
> >> > Hi Joel,
> >> >
> >> > queries can be arbitrarily nested with AND/OR/NOT joins e.g.
> >> >
> >> > (intersect(intersect(search, search), union(search, search))). If I
> cut
> >> off
> >> > the innermost stream with a limit, the complete intersection would not
> >> > happen at upper levels. Also would the limit stream have same effect
> as
> >> > using /select handler with rows parameter?
> >> > I am trying to force input stream close through reflection, just to
> see
> >> if
> >> > it gives performance gains.
> >> >
> >> > 2) would experiment with null streams. Is workers = number of replicas
> >> in
> >> > data collection a good thumb rule? is parallelstream performance upper
> >> > bounded by number of replicas?
> >> >
> >> > Thanks,
> >> > Susmit
> >> >
> >> > On Tue, May 16, 2017 at 5:59 AM, Joel Bernstein 
> >> > wrote:
> >> >
> >> > > Your approach looks OK. The single sharded worker collection is only
> >> > needed
> >> > > if you were using CloudSolrStream to send the initial Streaming
> >> > Expression
> >> > > to the /stream handler. You are not doing this, so you're approach
> is
> >> > fine.
> >> > >
> >> > > Here are some thoughts on what you described:
> >> > >
> >> > > 1) If you are closing the parallel stream after the top 1000
> results,
> >> > then
> >> > > try wrapping the intersect in a LimitStream. This stream doesn't
> exist
> >> > yet
> >> > > so it will be a custom stream. The LimitStream can return the EOF
> >> tuple
> >> > > after it reads N tuples. This will cause the worker nodes to close
> the
> >> > > underlying stream and cause the Broken Pipe exception to occur at
> the
> >> > > /export handler, which will stop the /export.
> >> > >
> >> > > Here is the basic approach:
> >> > >
> >> > > parallel(limit(intersect(search, search)))
> >> > >
> >> > >
> >> > > 2) It can be tricky to understand where the bottleneck lies when
> using
> >> > the
> >> > > ParallelStream for parallel relational algebra. You can use the
> >> > NullStream
> >> > > to get an understanding of why performance is not increasing when
> you
> >> > > increase the workers. Here is the basic approach:
> >> > >
> >> > > parallel(null(intersect(search, search)))
> >> > >
> >> > > The NullStream will eat all the tuples on the workers and return a
> >> single
> >> > > tuple with the tuple count and the time taken to run the expression.
> >> So
> >> > > you'll get one tuple from each worker. This will eliminate any
> >> bottleneck
> >> > > on tuples returning through the ParallelStream and you can focus on
> >> the
> >> > > performance of the intersect and the /export handler.
> >> > >
> >> > > 

Re: cursorMark value causes Request-URI Too Long excpetion

2017-05-17 Thread Shawn Heisey
On 5/17/2017 2:40 AM, Giedrius wrote:
> I've been using cursorMark for quite a while, but I noticed that
> sometimes the value is huge (more than 8K). It results in Request-URI
> Too Long response. Is there a way to send cursorMark in POST request's
> Body? If it is, could you please provide an example? If post is not
> possible, is there any other way to fix the issue?

Yes, you can send any/all parameters as a POST request body.

Exactly how to do this will depend on what method/language you're using
to make the requests now.  Without knowing that, I have no idea how you
would adjust to use POST.

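As a generic illustration, with curl every parameter (including cursorMark)
can go into the POST body, for example (collection name is a placeholder, and
subsequent cursor values go in the same parameter):

curl http://localhost:8983/solr/mycollection/select \
  --data-urlencode 'q=*:*' \
  --data-urlencode 'sort=id asc' \
  --data-urlencode 'cursorMark=*'
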
You can also increase the header size that the servlet container running
Solr will accept, but using POST is a better option.

Thanks,
Shawn



Re: Solr Admin Documents tab

2017-05-17 Thread Shawn Heisey
On 5/16/2017 12:41 PM, Rick Leir wrote:
> In the Solr Admin Documents tab, with the document type set to JSON, I cannot 
> get it to accept more than one document. The legend says "Document(s)". What 
> syntax is expected? It rejects an array of documents. Thanks -- Rick

See the box labeled "Adding Multiple JSON Documents" on this page for an
example of multiple JSON documents being added in one request:

https://cwiki.apache.org/confluence/display/solr/Uploading+Data+with+Index+Handlers#UploadingDatawithIndexHandlers-Solr-StyleJSON

The next section, labeled "Sending JSON Update Commands" shows how to
use the command-based syntax with multiple documents and commands.
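
For example, the multiple-document form from that page is simply a JSON array
with one object per document (field names below are just an example):

[
  {"id": "1", "title": "Doc 1"},
  {"id": "2", "title": "Doc 2"}
]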

Thanks,
Shawn



cursorMark value causes Request-URI Too Long excpetion

2017-05-17 Thread Giedrius
Hi,

I've been using cursorMark for quite a while, but I noticed that sometimes
the value is huge (more than 8K). It results in a Request-URI Too Long
response. Is there a way to send cursorMark in a POST request's body? If
there is, could you please provide an example? If POST is not possible, is
there any other way to fix the issue?

Thanks!


Re: setup solrcloud from scratch vie web-ui

2017-05-17 Thread Shawn Heisey
On 5/17/2017 6:18 AM, Thomas Porschberg wrote:
> Thank you. I am now a step further.
> I could import data into the new collection with the DIH. However I observed 
> the following exception 
> in solr.log:
>
> request: 
> http://127.0.1.1:8983/solr/hugo_shard1_replica1/update?update.distrib=TOLEADER=http%3A%2F%2F127.0.1.1%3A8983%2Fsolr%2Fhugo_shard2_replica1%2F=javabin=2
> Remote error message: This IndexSchema is not mutable.

This probably means that the configuration has an update processor that
adds unknown fields, but is using the classic schema instead of the
managed schema.  If you want unknown fields to automatically be guessed
and added, then you need the managed schema.  If not, then remove the
custom update processor chain.  If this doesn't sound like what's wrong,
then we will need the entire error message including the full Java
stacktrace.  That may be in the other instance's solr.log file.

> I imagine to split my data per day of the year. My idea was to create 365 
> shards of type compositeKey.

You cannot control shard routing explicitly with the compositeId
router.  That router uses a hash of the uniqueKey field to decide which
shard gets the document.  As its name implies, the hash can be composite
-- parts of the hash can be decided by multiple parts of the value in
the field, but it's still hashed.

You must use the implicit router (which means all routing is manual) if
you want to explicitly name the shard that receives the data.
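
With the Collections API that means creating the collection with
router.name=implicit, naming every shard up front, and optionally setting
router.field so the value of that field selects the shard. A sketch, with
illustrative names only (the real collection would need all 365 shard names):

http://localhost:8983/solr/admin/collections?action=CREATE&name=daily&router.name=implicit&shards=day_001,day_002,day_003&router.field=day_s&maxShardsPerNode=365

With the implicit router, each document then lands on the shard whose name
matches its day_s value (or the shard named in the _route_ parameter).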

Thanks,
Shawn



Re: Performance warning: Overlapping onDeskSearchers=2 solr

2017-05-17 Thread Shawn Heisey
On 5/17/2017 5:57 AM, Srinivas Kashyap wrote:
> We are using Solr 5.2.1 version and are currently experiencing below Warning 
> in Solr Logging Console:
>
> Performance warning: Overlapping onDeskSearchers=2
>
> Also we encounter,
>
> org.apache.solr.common.SolrException: Error opening new searcher. exceeded 
> limit of maxWarmingSearchers=2, try again later.
>
>
> The reason being, we are doing mass update on our application and solr 
> experiencing the higher loads at times. Data is being indexed using DIH(sql 
> queries).
>
> In solrconfig.xml below is the code.
>
> 
>
> Should we be uncommenting the above lines and try to avoid this error? Please 
> help me.

This warning means that you are committing so frequently that there are
already two searchers warming when you start another commit.

DIH does a commit exactly once -- at the end of the import.  One import will 
not cause the warning message you're seeing, so if there is one import 
happening at a time, either you are sending explicit commit requests during the 
import, or you have autoSoftCommit enabled with values that are far too small.

You should definitely have autoCommit configured, but I would remove
maxDocs and set maxTime to something like 60000 -- one minute.  The
autoCommit should also set openSearcher to false.  This kind of commit
will not make new changes visible, but it will start a new transaction
log frequently.

   
    <autoCommit>
      <maxTime>60000</maxTime>
      <openSearcher>false</openSearcher>
    </autoCommit>

An automatic commit (soft or hard) with a one second interval is going to cause 
that warning you're seeing.

https://lucidworks.com/understanding-transaction-logs-softcommit-and-commit-in-sorlcloud/

Thanks,
Shawn



Re: solr /export handler - behavior during close()

2017-05-17 Thread Joel Bernstein
Susmit,

You could wrap a LimitStream around the outside of all the relational
algebra. For example:

parallel(limit((intersect(intersect(search, search), union(search,
search)

In this scenario the limit would happen on the workers.

As far as the worker/replica ratio. This will depend on how heavy the
export is. If it's a light export, small number of fields, mostly numeric,
simple sort params, then I've seen a ratio of 5 (workers) to 1 (replica)
work well. This will basically saturate the CPU on the replica. But heavier
exports will saturate the replicas with fewer workers.

Also I tend to use Direct DocValues to get the best performance. I'm not
sure how much difference this makes, but it should eliminate the
compression overhead fetching the data from the DocValues.

Varun's suggestion of using the binary transport will provide a nice
performance increase as well. But you'll need to upgrade. You may need to
do that anyway as the fix on the early stream close will be on a later
version that was refactored to support the binary transport.

Joel Bernstein
http://joelsolr.blogspot.com/

On Tue, May 16, 2017 at 8:03 PM, Joel Bernstein  wrote:

> Yep, saw it. I'll comment on the ticket for what I believe needs to be
> done.
>
> Joel Bernstein
> http://joelsolr.blogspot.com/
>
> On Tue, May 16, 2017 at 8:00 PM, Varun Thacker  wrote:
>
>> Hi Joel,Susmit
>>
>> I created https://issues.apache.org/jira/browse/SOLR-10698 to track the
>> issue
>>
>> @Susmit looking at the stack trace I see the expression is using
>> JSONTupleStream
>> . I wonder if you tried using JavabinTupleStreamParser could it help
>> improve performance ?
>>
>> On Tue, May 16, 2017 at 9:39 AM, Susmit Shukla 
>> wrote:
>>
>> > Hi Joel,
>> >
>> > queries can be arbitrarily nested with AND/OR/NOT joins e.g.
>> >
>> > (intersect(intersect(search, search), union(search, search))). If I cut
>> off
>> > the innermost stream with a limit, the complete intersection would not
>> > happen at upper levels. Also would the limit stream have same effect as
>> > using /select handler with rows parameter?
>> > I am trying to force input stream close through reflection, just to see
>> if
>> > it gives performance gains.
>> >
>> > 2) would experiment with null streams. Is workers = number of replicas
>> in
>> > data collection a good thumb rule? is parallelstream performance upper
>> > bounded by number of replicas?
>> >
>> > Thanks,
>> > Susmit
>> >
>> > On Tue, May 16, 2017 at 5:59 AM, Joel Bernstein 
>> > wrote:
>> >
>> > > Your approach looks OK. The single sharded worker collection is only
>> > needed
>> > > if you were using CloudSolrStream to send the initial Streaming
>> > Expression
>> > > to the /stream handler. You are not doing this, so you're approach is
>> > fine.
>> > >
>> > > Here are some thoughts on what you described:
>> > >
>> > > 1) If you are closing the parallel stream after the top 1000 results,
>> > then
>> > > try wrapping the intersect in a LimitStream. This stream doesn't exist
>> > yet
>> > > so it will be a custom stream. The LimitStream can return the EOF
>> tuple
>> > > after it reads N tuples. This will cause the worker nodes to close the
>> > > underlying stream and cause the Broken Pipe exception to occur at the
>> > > /export handler, which will stop the /export.
>> > >
>> > > Here is the basic approach:
>> > >
>> > > parallel(limit(intersect(search, search)))
>> > >
>> > >
>> > > 2) It can be tricky to understand where the bottleneck lies when using
>> > the
>> > > ParallelStream for parallel relational algebra. You can use the
>> > NullStream
>> > > to get an understanding of why performance is not increasing when you
>> > > increase the workers. Here is the basic approach:
>> > >
>> > > parallel(null(intersect(search, search)))
>> > >
>> > > The NullStream will eat all the tuples on the workers and return a
>> single
>> > > tuple with the tuple count and the time taken to run the expression.
>> So
>> > > you'll get one tuple from each worker. This will eliminate any
>> bottleneck
>> > > on tuples returning through the ParallelStream and you can focus on
>> the
>> > > performance of the intersect and the /export handler.
>> > >
>> > > Then experiment with:
>> > >
>> > > 1) Increasing the number of parallel workers.
>> > > 2) Increasing the number of replicas in the data collections.
>> > >
>> > > And watch the timing information coming back from the NullStream
>> tuples.
>> > If
>> > > increasing the workers is not improving performance then the
>> bottleneck
>> > may
>> > > be in the /export handler. So try increasing replicas and see if that
>> > > improves performance. Different partitions of the streams will be
>> served
>> > by
>> > > different replicas.
>> > >
>> > > If performance doesn't improve with the NullStream after increasing
>> both
>> > > workers and replicas then we know the bottleneck is the network.
>> > >
>> > > Joel 

Re: setup solrcloud from scratch vie web-ui

2017-05-17 Thread Thomas Porschberg
> Tom Evans  wrote on 17 May 2017 at 11:48:
> 
> 
> On Wed, May 17, 2017 at 6:28 AM, Thomas Porschberg
>  wrote:
> > Hi,
> >
> > I did not manipulating the data dir. What I did was:
> >
> > 1. Downloaded solr-6.5.1.zip
> > 2. ensured no solr process is running
> > 3. unzipped solr-6.5.1.zip to ~/solr_new2/solr-6.5.1
> > 3. started an external zookeeper
> > 4. copied a conf directory from a working non-cloudsolr (6.5.1) to
> >~/solr_new2/solr-6.5.1 so that I have ~/solr_new2/solr-6.5.1/conf
> >   (see http://randspringer.de/solrcloud_test/my.zip for content)
> 
> ..in which you've manipulated the dataDir! :)
> 
> The problem (I think) is that you have set a fixed data dir, and when
> Solr attempts to create a second core (for whatever reason, in your
> case it looks like you are adding a shard), Solr puts it exactly where
> you have told it to, in the same directory as the previous one. It
> finds the lock and blows up, because each core needs to be in a
> separate directory, but you've instructed Solr to put them in the same
> one.
> 
> Start with the solrconfig from the basic_configs configset that ships
> with Solr and add the special things that your installation needs. I
> am not massively surprised that your non cloud config does not work in
> cloud mode, when we moved to SolrCloud, we rewrote from scratch
> solrconfig.xml and schema.xml, starting from basic_configs and adding
> anything particular that we needed from our old config, checking every
> difference that we have from stock config and noting/discerning why,
> and ensuring that our field types are using the same names for the
> same types as basic_config wherever possible.
> 
> I only say all that because to fix this issue is a single thing, but
> you should spend the time comparing configs because this will not be
> the only issue. Anyway, to fix this problem, in your solrconfig.xml
> you have:
> 
>   <dataDir>data</dataDir>
> 
> It should be
> 
>   <dataDir>${solr.data.dir:}</dataDir>
> 
> Which is still in your config, you've just got it commented out :)

Thank you. I have gotten a step further.
I could import data into the new collection with the DIH. However, I observed
the following exception in solr.log:

request: http://127.0.1.1:8983/solr/hugo_shard1_replica1/update?update.distrib=TOLEADER=http%3A%2F%2F127.0.1.1%3A8983%2Fsolr%2Fhugo_shard2_replica1%2F=javabin=2
Remote error message: This IndexSchema is not mutable.
at org.apache.solr.client.solrj.impl.ConcurrentUpdateSolrClient$Runner.sendUpdateStream(ConcurrentUpdateSolrClient.java:345)

I also noticed that only one shard is filled.
The wiki describes how to populate data with the REST API; however, I use the
data importer.
I plan to split my data by day of the year. My idea was to create 365 shards
of type compositeKey. In my SQL I have a date field, and it is no problem to
overwrite data after one year.
However, I'm looking for a good example of how to achieve this. Maybe in this
case I need 365 dataimport.xml files, one under each shard, with some modulo
expression for the specific day.
Currently the dataimport.xml is in the conf directory.
So I'm looking for a good example of how to use the DIH with SolrCloud.
Should it work to create an implicit router instead of a compositeKey router
(with 365 shards) and simply specify it as router.field= ?

Thomas


Performance warning: Overlapping onDeskSearchers=2 solr

2017-05-17 Thread Srinivas Kashyap
Hi All,

We are using Solr version 5.2.1 and are currently seeing the warning below in
the Solr logging console:

Performance warning: Overlapping onDeskSearchers=2

Also we encounter,

org.apache.solr.common.SolrException: Error opening new searcher. exceeded 
limit of maxWarmingSearchers=2, try again later.


The reason is that we are doing mass updates from our application, and Solr
experiences higher load at those times. Data is being indexed using DIH (SQL
queries).

The relevant section of solrconfig.xml is below.



Should we uncomment the above lines to try to avoid this error? Please help.

Thanks and Regards,
Srinivas Kashyap





Re: knowing which fields were successfully hit

2017-05-17 Thread John Blythe
hey erik, totally unaware of those two. we're able to retrieve metadata
about the query itself that way?
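
fwiw, the highlighting route i mentioned is more or less just

hl=true&hl.fl=fieldX,fieldY&hl.requireFieldMatch=true

(field names are placeholders) and then checking which field shows up in the
highlighting section of the response.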

-- 
*John Blythe*
Product Manager & Lead Developer

251.605.3071 | j...@curvolabs.com
www.curvolabs.com

58 Adams Ave
Evansville, IN 47713

On Tue, May 16, 2017 at 1:54 PM, Erik Hatcher 
wrote:

> Is this the equivalent of facet.query’s?   or maybe rather, group.query?
>
> Erik
>
>
>
> > On May 16, 2017, at 1:16 PM, Dorian Hoxha 
> wrote:
> >
> > Something like elasticsearch named-queries, right
> > https://www.elastic.co/guide/en/elasticsearch/reference/
> current/search-request-named-queries-and-filters.html
> > ?
> >
> >
> > On Tue, May 16, 2017 at 7:10 PM, John Blythe  wrote:
> >
> >> sorry for the confusion. as in i received results due to matches on
> field x
> >> vs. field y.
> >>
> >> i've gone w a highlighting solution for now. the fact that it requires
> >> field storage isn't yet prohibitive for me, so can serve well for now.
> open
> >> to any alternative approaches all the same
> >>
> >> thanks-
> >>
> >> --
> >> *John Blythe*
> >> Product Manager & Lead Developer
> >>
> >> 251.605.3071 | j...@curvolabs.com
> >> www.curvolabs.com
> >>
> >> 58 Adams Ave
> >> Evansville, IN 47713
> >>
> >> On Tue, May 16, 2017 at 11:37 AM, David Hastings <
> >> hastings.recurs...@gmail.com> wrote:
> >>
> >>> what do you mean "hit?" As in the user clicked it?
> >>>
> >>> On Tue, May 16, 2017 at 11:35 AM, John Blythe 
> >> wrote:
> >>>
>  hey all. i'm sending data out that could represent a purchased item or
> >> a
>  competitive alternative. when the results are returned i'm needing to
> >>> know
>  which of the two were hit so i can serve up the *other*.
> 
>  i can make a blunt instrument in the application layer to simply look
> >>> for a
>  match between the queried terms and the resulting fields, but the
> >> problem
>  of fuzzy matching and some of the special analysis being done to get
> >> the
>  hits will be for naught.
> 
>  cursory googling landed me at a similar discussion that suggested
> using
> >>> hit
>  highlighting or retrieving the debuggers explain data to sort through.
> 
>  is there another, more efficient means or are these the two tools in
> >> the
>  toolbox?
> 
>  thanks!
> 
> >>>
> >>
>
>


Re: setup solrcloud from scratch vie web-ui

2017-05-17 Thread Tom Evans
On Wed, May 17, 2017 at 6:28 AM, Thomas Porschberg
 wrote:
> Hi,
>
> I did not manipulating the data dir. What I did was:
>
> 1. Downloaded solr-6.5.1.zip
> 2. ensured no solr process is running
> 3. unzipped solr-6.5.1.zip to ~/solr_new2/solr-6.5.1
> 3. started an external zookeeper
> 4. copied a conf directory from a working non-cloudsolr (6.5.1) to
>~/solr_new2/solr-6.5.1 so that I have ~/solr_new2/solr-6.5.1/conf
>   (see http://randspringer.de/solrcloud_test/my.zip for content)

..in which you've manipulated the dataDir! :)

The problem (I think) is that you have set a fixed data dir, and when
Solr attempts to create a second core (for whatever reason, in your
case it looks like you are adding a shard), Solr puts it exactly where
you have told it to, in the same directory as the previous one. It
finds the lock and blows up, because each core needs to be in a
separate directory, but you've instructed Solr to put them in the same
one.

Start with the solrconfig from the basic_configs configset that ships
with Solr and add the special things that your installation needs. I am
not massively surprised that your non-cloud config does not work in
cloud mode. When we moved to SolrCloud, we rewrote solrconfig.xml and
schema.xml from scratch, starting from basic_configs and adding anything
particular that we needed from our old config, checking every difference
we have from the stock config and noting why, and ensuring that our
field types use the same names for the same types as basic_configs
wherever possible.

I only say all that because to fix this issue is a single thing, but
you should spend the time comparing configs because this will not be
the only issue. Anyway, to fix this problem, in your solrconfig.xml
you have:

  <dataDir>data</dataDir>

It should be

  <dataDir>${solr.data.dir:}</dataDir>

Which is still in your config, you've just got it commented out :)

Cheers

Tom


CollapsingQParserPlugin with more than one field

2017-05-17 Thread Mikhail Ibraheem
Hi,

I want to group a huge set of documents by 3 or 4 fields, but
CollapsingQParserPlugin works with only one field.

If I add more than one collapse fq for different fields, it doesn't group on
the fields together; instead it uses the groups from the first field as input
to the second one, and hence gives me wrong results.

I want to group by more than one field in the same collector. Is there a way
to do this?
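
(What I tried is essentially this, with placeholder field names:

fq={!collapse field=fieldA}&fq={!collapse field=fieldB}

but as described above, the second collapse operates on the output of the
first rather than both fields being collapsed together.)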

 

Thanks

Mikhail