Re: Re: Config for massive inserts into Solr master

2016-10-09 Thread Reinhard Budenstecher
>
> That's considerably larger than you initially indicated.  In just one
> index, you've got almost 300 million docs taking up well over 200GB.
> About half of them have been deleted, but they are still there.  Those
> deleted docs *DO* affect operation and memory usage.
>

Yes, that's larger than I expected. Two days ago the index was at the size I originally 
wrote. This huge increase comes from the running ETL process.

>
> usage.  The only effective way to get rid of them is to optimize the
> index ... but I will warn you that with an index of that size, the time
> required for an optimize can reach into multiple hours, and will
> temporarily require considerable additional disk space.  The fact that

Three days ago we upgraded from Solr 5.5.3 to 6.2.1. Before upgrading I had already 
optimized this index, and yes, it took some hours. So if two days of ETL cause such an 
increase in index size, running a daily optimize is not an option.
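
For reference, the deleted-document count that drives this decision can also be read 
without the UI via the CoreAdmin STATUS call; host and core name below are placeholders 
for this setup:

    curl 'http://localhost:8983/solr/admin/cores?action=STATUS&core=myshop&wt=json&indent=true'

The per-core "index" section of the response reports numDocs, maxDoc and deletedDocs, so 
the gap between maxDoc and numDocs shows how much an optimize would reclaim.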

>
> You don't need to create it.  Stacktraces are logged by Solr, in a file
> named solr.log, whenever most errors occur.
>

Really, there is nothing in solr.log. I did not change any logging-related option in the 
config. Solr died again some hours ago and the last entry is:

2016-10-09 22:02:31.051 WARN  (qtp225493257-1097) [   ] o.a.s.h.a.LukeRequestHandler Error getting file length for [segments_9102]
java.nio.file.NoSuchFileException: /var/solr/data/myshop/data/index/segments_9102
        at sun.nio.fs.UnixException.translateToIOException(UnixException.java:86)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:102)
        at sun.nio.fs.UnixException.rethrowAsIOException(UnixException.java:107)
        at sun.nio.fs.UnixFileAttributeViews$Basic.readAttributes(UnixFileAttributeViews.java:55)
        at sun.nio.fs.UnixFileSystemProvider.readAttributes(UnixFileSystemProvider.java:144)
        at sun.nio.fs.LinuxFileSystemProvider.readAttributes(LinuxFileSystemProvider.java:99)
        at java.nio.file.Files.readAttributes(Files.java:1737)
        at java.nio.file.Files.size(Files.java:2332)
        at org.apache.lucene.store.FSDirectory.fileLength(FSDirectory.java:243)
        at org.apache.lucene.store.NRTCachingDirectory.fileLength(NRTCachingDirectory.java:128)
        at org.apache.solr.handler.admin.LukeRequestHandler.getFileLength(LukeRequestHandler.java:597)
        at org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:585)
        at org.apache.solr.handler.admin.CoreAdminOperation.getCoreStatus(CoreAdminOperation.java:1007)
        at org.apache.solr.handler.admin.CoreAdminOperation.lambda$static$3(CoreAdminOperation.java:170)
        at org.apache.solr.handler.admin.CoreAdminOperation.execute(CoreAdminOperation.java:1056)
        at org.apache.solr.handler.admin.CoreAdminHandler$CallInfo.call(CoreAdminHandler.java:365)
        at org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:156)
        at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:154)
        at org.apache.solr.servlet.HttpSolrCall.handleAdminRequest(HttpSolrCall.java:658)
        at org.apache.solr.servlet.HttpSolrCall.call(HttpSolrCall.java:440)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:257)
        at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:208)
        at org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1668)
        at org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:581)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:143)
        at org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:548)
        at org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:226)
        at org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1160)
        at org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:511)
        at org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:185)
        at org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1092)
        at org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:141)
        at org.eclipse.jetty.server.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:213)
        at org.eclipse.jetty.server.handler.HandlerCollection.handle(HandlerCollection.java:119)
        at org.eclipse.jetty.server.handler.HandlerWrapper.handle(HandlerWrapper.java:134)
        at org.eclipse.jetty.server.Server.handle(Server.java:518)
        at org.eclipse.jetty.server.HttpChannel.handle(HttpChannel.java:308)
        at org.eclipse.jetty.server.HttpConnection.onFillable(HttpConnection.java:244)
        at org.eclipse.jetty.io.AbstractConnection$ReadCallback.succeeded(AbstractConnection.java:273)

Re: solr 5 leaving tomcat, will I be the only one fearing about this?

2016-10-09 Thread Shalin Shekhar Mangar
As far as configuration is concerned -- everything Solr specific goes to
zookeeper. The solr.xml can also be put into zookeeper. JNDI is not
supported -- can you explain why you need it? Can cluster properties solve
the problem? or replica properties? Both of those can go into zookeeper.

Integration testing is possible using MiniSolrCloudCluster, which is built
to be used programmatically. It starts a real ZooKeeper instance along with
real Solr instances running on Jetty.

Most other things are by design. For example, the whole cluster changes on a
configuration change; otherwise sneaky bugs where one node misses a config change
are easily possible and very, very difficult to debug. But you can, for example,
create a new collection with just one node, stage your changes, run health checks,
and then update the configuration of the main collection.
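
As a rough sketch of that staging approach (collection and config names and the
ZooKeeper address are placeholders; assumes Solr 6.x running in cloud mode):

    bin/solr zk upconfig -z zk1:2181 -n myconf-staging -d /path/to/conf
    curl 'http://localhost:8983/solr/admin/collections?action=CREATE&name=staging&numShards=1&replicationFactor=1&collection.configName=myconf-staging'

Once the staging collection checks out, the same configset changes can be uploaded
under the main collection's config name and the main collection reloaded via the
Collections API RELOAD action.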

The old master/slave setup was very simple. The cloud is a whole new ball
game. Having control of Jetty has given Solr a lot of flexibility. At this
time I discourage anyone from changing anything inside Jetty's config. If
there are certain things that are not possible without doing so, please let us
know so we can figure out how to build such things into Solr itself. We want Jetty
to be an implementation detail and no more.

If you have suggestions on how we can fix some of these problems, please
speak up.

On Sun, Oct 9, 2016 at 5:41 AM, Aristedes Maniatis  wrote:

> On 9/10/16 2:09am, Shawn Heisey wrote:
> > One of the historical challenges on this mailing list is that we were
> > rarely aware of what steps the user had taken to install or start Solr,
> > and we had to support pretty much any scenario.  Since 5.0, the number
> > of supported ways to deploy and start Solr is greatly reduced, and those
> > ways were written by the project, so we tend to have a better
> > understanding of what is happening when a user starts Solr.  We also
> > usually know the relative location of the logfiles and Solr's data.
>
>
> This migration is causing a lot of grief for us as well, and we are still
> struggling to get all the bits in place. Before:
>
> * gradle build script
> * gradle project includes our own unit tests, run in jenkins
> * generates war file
> * relevant configuration is embedded into the build
> * deployment specific variables (db uris, passwords, ip addresses)
> conveniently contained in one context.xml file
>
>
> Now:
>
> * Solr version is no longer bound to our tests or configuration
>
> * configuration is now scattered in three places:
>  - zookeeper
>  - solr.xml in the data directory
>  - jetty files as part of the solr install that you need to replace (for
> example to set JNDI properties)
>
> * deployment is also scattered:
>  - Solr platform specific package manager (pkg in FreeBSD in my case,
> which I've had to write myself since it didn't exist)
>  - updating config files above
>  - writing custom scripts to push Zookeeper configuration into production
>  - creating collections/cores using the API rather than in a config file
>
> * unit testing no longer possible since you can't run a mock zookeeper
> instance
>
> * zookeeper is very hard to integrate with deployment processes (salt,
> puppet, etc) since configuration is no longer a set of version controlled
> files
>
> * you can't change the configuration of one node as a 'soft deployment':
> the whole cluster needs to be changed at once
>
> If we didn't need a less broken replication solution, I'd stay on Solr4
> forever.
>
>
> I really liked the old war deployment. It bound the solr version and
> configuration management into our version controlled source repository
> except for one context.xml file that contained server specific deployment
> options. Nice.
>
> The new arrangement is a mess.
>
>
> Ari
>
>
>
> --
> -->
> Aristedes Maniatis
> GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A
>



-- 
Regards,
Shalin Shekhar Mangar.


Re: newSearcher autowarming queries in solrconfig.xml run but does not appear to warm cache

2016-10-09 Thread Dalton Gooding
Erick,
I have tried tuning the queries with some limited success. I still get drastic 
differences between the first time I fire my warming query (after the newSearcher 
query has run) and the second time; any variant of the query, i.e. removing 
fields or changing parameters, also runs much faster.
I am not sure what I am missing here. I put a query into the newSearcher 
section that runs fine, but the exact same query run after warming still takes 
the full time of an un-warmed query.
Can you break it down to the most basic type of newSearcher query to try and 
shrink the gap between the first query and subsequent queries sent?
I cannot see why sending the same query after a newSearcher is slow, when 
subsequent queries run faster. I figured this was the idea of the newSearcher 
stanzas.

On Friday, 7 October 2016, 14:45, Erick Erickson  
wrote:
 

Replying on the public thread; somehow your mail was sent to me privately.

Pasted your email to me below for others.

You are still confusing documents and results. Forget about the rows
parameter, for this discussion it's irrelevant.

The QTime is the time spent searching. It is unaffected by whether a
document is in the documentCache or not.
It _solely_ measures the time that Solr/Lucene take to find the top N
documents (where N is the rows param) and
record their internal Lucene doc ID.

Increasing the rows or the document cache won't change anything about
the QTime. The documentCache is
totally the wrong place to focus.


The response time when you re-submit the query suggests that the top N docs'
internal Lucene IDs are being fetched from the queryResultCache. Changing the
window size is also irrelevant to this discussion. If you vary the query even
slightly you won't hit the queryResultCache. A very easy way to check this is
the admin UI >> select core >> plugins/stats >> QueryHandler and then probably
the "select" handler. If you see the hits go up after the fast query then
you're getting the results from the queryResultCache.
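
The same numbers can be pulled from the command line if that's easier to script;
this is just the mbeans handler that backs the plugins/stats page (core name is a
placeholder):

    curl 'http://localhost:8983/solr/core1/admin/mbeans?stats=true&cat=CACHE&wt=json&indent=true'

The queryResultCache entry in the output reports lookups, hits, hitratio and
warmupTime, so running it before and after the repeated query shows whether the
hit count moved.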

What _is_ relevant is populating the low-level Lucene caches with
values from the indexed terms. My
contention is that this is not happening with match-all queries, i.e.
field:* or field:[* TO *] because in
those cases, a doc matches or doesn't based on whether it has anything
in the field. There's no point
in finding values since it doesn't matter anyway. And "finding values"
means reading indexed terms
from disk into low-level Lucene caches.

When I say "populate the low-level Lucene caches", what I'm really
talking about is reading them from
disk into your physical memory via MMapDirectory, see Uwe's excellent blog:
http://blog.thetaphi.de/2012/07/use-lucenes-mmapdirectory-on-64bit.html

So the suggestion is that you use real values from your index, or possibly
ranges, so that part or all of your index files get read into MMapDirectory
space via the firstSearcher or newSearcher event.
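
A quick way to try this by hand before wiring it into solrconfig.xml is to fire a
query that uses concrete values or ranges on the fields you actually filter and
sort on (the field names below are borrowed from the query later in this thread
and are only illustrative):

    curl 'http://localhost:8983/solr/core1/select?q=*:*&fq=DataType_s:Product&fq=Price_7_f:%5B0+TO+100%5D&sort=SalesRank_f+desc&rows=10&wt=json'

If that kind of query is what makes subsequent requests fast, the same q/fq/sort
strings are what belong in the newSearcher (and firstSearcher) listener entries so
they run automatically after every commit.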

Please just give it a try. My bet is that you'll see your QTime values
first time after autowarming
go down. Significantly. Be sure to use a wide variety of different
values for autowarming.

BTW, the autowarmCounts in solrconfig.xml filterCache and
queryResultCache are intended
to warm by using the last N fq or q clauses on the theory that the
most recent N are predictive
of the next N.

Best,
Erick


***

I believe the return time back to the command line from the curl
command and the QTime are as shown below:

time curl -v 
'http:///solr/core1/select?fq=DataType_s%3AProduct=WebSections_ms%3Ahouse=%28VisibleOnline_ms%3ANAT+OR+VisibleOnline_ms%3A7%29=%7B%21tag%3Dcurrent_group%7DGroupIds_ms%3A458=SalesRank_f+desc=true=%7B%21ex%3Dcurrent_group%7Dattr_GroupLevel0=BrandID_s=%7B%21ex%3Dcurrent_group%7Dattr_GroupLevel2=%7B%21ex%3Dcurrent_group%7Dattr_GroupLevel1=SubBrandID_s=ProductAttr_967_ms=ProductAttr_NEG21_ms=ProductAttr_1852_ms=Price_7_f%3A%5B%2A+TO+%2A%5D=Price_2_f%3A%5B%2A+TO+%2A%5D=Price_3_f%3A%5B%2A+TO+%2A%5D=Price_4_f%3A%5B%2A+TO+%2A%5D=Price_5_f%3A%5B%2A+TO+%2A%5D=Price_6_f%3A%5B%2A+TO+%2A%5D=1=json=map=%28title%3A%2A+OR+text%3A%2A%29+AND+%28ms%3ALive%29=0=24'

real    0m1.436s
user    0m0.001s
sys    0m0.006s

"QTime":1387

From what you suggested, changing the rows value from 20 to something
greater should add more documents to the cache. In conjunction with tuning
the queries to remove the * wildcard, this should provide a better
warming query?

Should I also increase the queryResultWindowSize in solrconfig.xml
to help build out the cache?

Cheers,

Guy





On Thu, Oct 6, 2016 at 4:43 PM, Dalton Gooding
 wrote:
> Erick,
>
> Thanks for the response. After I run the initial query and get a long
> response time, if I change the query to remove or add additional query
> statements, I find the speed is good.
>
> If I run the modified query after a new searcher has registered, the
> response is slow, but after the modified query has completed, the
> warming query sent from curl is much faster. I 

Re: solr 5 leaving tomcat, will I be the only one fearing about this?

2016-10-09 Thread Aristedes Maniatis
Although I don't like Docker, I do make heavy use of saltstack with jails plus 
ZFS snapshots to ship virtual machines around. Docker is mostly just a 
reinvention of BSD/Solaris jails/zones.

You might prefer Docker's own custom Dockerfile language and I prefer Python 
with salt. But ultimately it is a lot of workaround for the stupid decisions 
Solr made by scattering configuration everywhere and forcing us to customise 
the application itself. But thanks for the confirmation that "this is just the 
way it is"... I'm really not missing some easy way to do all this.

Ari


On 10/10/16 6:59am, Georg Sorst wrote:
> If you can, switch to Docker (https://hub.docker.com/_/solr/). It's a pain
> to get everything going the right way, but once it's running you get a lot
> of stuff for free:
> 
> * Deployment, scaling etc. is all taken care of by the Docker ecosystem
> * Testing is a breeze. Need a clean Solr instance to run your application
> against? It's just one command line away
> * You can version the Dockerfile (Docker build instructions), so you can
> version your whole setup. For example we add our own web app to the Docker
> image (we shouldn't be doing that, I know) and put the resulting images
> into our private Docker repository
> 
> Aristedes Maniatis  wrote on Sun., 9 Oct. 2016 at 02:14:
> 
>> On 9/10/16 11:11am, Aristedes Maniatis wrote:
>>> * deployment is also scattered:
>>>  - Solr platform specific package manager (pkg in FreeBSD in my case,
>> which I've had to write myself since it didn't exist)
>>>  - updating config files above
>>>  - writing custom scripts to push Zookeeper configuration into production
>>>  - creating collections/cores using the API rather than in a config file
>>
>> Oh, and pushing additional jars (like a JDBC adapter) into a special
>> folder. Again, not easily testable or version controlled.
>>
>>
>> Ari
>>
>>
>>
>> --
>> -->
>> Aristedes Maniatis
>> GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A
>>
> 


-- 
-->
Aristedes Maniatis
GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A


Re: Config for massive inserts into Solr master

2016-10-09 Thread Shawn Heisey
On 10/9/2016 1:59 PM, Reinhard Budenstecher wrote:
> Solr 6.2.1 on Debian Jessie, installed with: 

> Actually, there are three cores and the UI gives me the following info:
> Num Docs: 148652589, Max Doc: 298367634, Size: 219.92 GB
> Num Docs: 37396140, Max Doc: 38926989, Size: 28.81 GB
> Num Docs: 8601222, Max Doc: 9111004, Size: 6.26 GB

That's considerably larger than you initially indicated.  In just one
index, you've got almost 300 million docs taking up well over 200GB. 
About half of them have been deleted, but they are still there.  Those
deleted docs *DO* affect operation and memory usage.

Getting rid of deleted docs would go a long way towards reducing memory
usage.  The only effective way to get rid of them is to optimize the
index ... but I will warn you that with an index of that size, the time
required for an optimize can reach into multiple hours, and will
temporarily require considerable additional disk space.  The fact that
your index is on SSD will probably not improve the performance of an
optimize.  When I have looked into optimize performance, it typically
proceeds at a much slower pace than most disks can sustain, because
Lucene must process the data as it rewrites it.
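
For completeness, an optimize on a master like this is just an update request with
the optimize flag (core name is a placeholder); expect it to run for hours at this
index size and to need the extra disk space mentioned above:

    curl 'http://localhost:8983/solr/myshop/update?optimize=true&maxSegments=1&waitSearcher=false'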

> Actually, the Solr server dies nearly once a day; on the next shutdown, I'll
> reduce the heap size.

I don't know if that's going to help, but it might.
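
If the heap does get lowered, then with the install_solr_service.sh setup described
elsewhere in this thread the usual place is /etc/default/solr.in.sh (31g rather than
32g keeps compressed object pointers), followed by a restart:

    # /etc/default/solr.in.sh
    SOLR_HEAP="31g"

    service solr restart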

> How do I create such a stack trace? I have no more log information
> than what I already posted.

You don't need to create it.  Stacktraces are logged by Solr, in a file
named solr.log, whenever most errors occur.

Thanks,
Shawn



Re: solr 5 leaving tomcat, will I be the only one fearing about this?

2016-10-09 Thread Georg Sorst
If you can, switch to Docker (https://hub.docker.com/_/solr/). It's a pain
to get everything going the right way, but once it's running you get a lot
of stuff for free:

* Deployment, scaling etc. is all taken care of by the Docker ecosystem
* Testing is a breeze. Need a clean Solr instance to run your application
against? It's just one command line away
* You can version the Dockerfile (Docker build instructions), so you can
version your whole setup. For example we add our own web app to the Docker
image (we shouldn't be doing that, I know) and put the resulting images
into our private Docker repository
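
For the "one command line away" point above, with the official image that line is
roughly the following (image tag, container name and core name are just examples):

    docker run --name solr-test -d -p 8983:8983 solr:6.2.1
    docker exec -it solr-test solr create_core -c testcore

That gives a throwaway Solr with a core named testcore to run tests against, which
can be discarded afterwards with docker rm -f solr-test.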

Aristedes Maniatis  wrote on Sun., 9 Oct. 2016 at 02:14:

> On 9/10/16 11:11am, Aristedes Maniatis wrote:
> > * deployment is also scattered:
> >  - Solr platform specific package manager (pkg in FreeBSD in my case,
> which I've had to write myself since it didn't exist)
> >  - updating config files above
> >  - writing custom scripts to push Zookeeper configuration into production
> >  - creating collections/cores using the API rather than in a config file
>
> Oh, and pushing additional jars (like a JDBC adapter) into a special
> folder. Again, not easily testable or version controlled.
>
>
> Ari
>
>
>
> --
> -->
> Aristedes Maniatis
> GPG fingerprint CBFB 84B4 738D 4E87 5E5C  5EFA EF6A 7D2E 3E49 102A
>


Re: Re: Config for massive inserts into Solr master

2016-10-09 Thread Reinhard Budenstecher

> What version of Solr?  How has it been installed and started?
>
Solr 6.2.1 on Debian Jessie, installed with:

apt-get install openjdk-8-jre-headless openjdk-8-jdk-headless
wget "http://www.eu.apache.org/dist/lucene/solr/6.2.1/solr-6.2.1.tgz; && tar 
xvfz solr-*.tgz
./solr-*/bin/install_solr_service.sh solr-*.tgz

started with "service solr start"

> Is this a single index core with 150 million docs and 140GB index
> directory size, or is that the sum total of all the indexes on the machine?
>

Actually, there are three cores and the UI gives me the following info:

Num Docs: 148652589, Max Doc: 298367634, Size: 219.92 GB
Num Docs: 37396140, Max Doc: 38926989, Size: 28.81 GB
Num Docs: 8601222, Max Doc: 9111004, Size: 6.26 GB

but the last two cores are not important.

>
> It seems unlikely to me that you would see OOM errors when indexing with
> a 32GB heap and no queries.  You might try dropping the max heap to 31GB
> instead of 32GB, so your Java pointer sizes are cut in half.  You might
> actually see a net increase in the amount of memory that Solr can
> utilize with that change.

Actually, the Solr server dies nearly once a day; on the next shutdown, I'll reduce 
the heap size.

>
> Whether the errors continue or not, can you copy the full error from
> your log with stacktrace(s) so we can see it?


How do I create such a stack trace? I have no more log information than what I 
already posted.





Re: Config for massive inserts into Solr master

2016-10-09 Thread Shawn Heisey
On 10/9/2016 12:33 PM, Reinhard Budenstecher wrote:
> We have an ETL process which updates the product catalog. This produces massive 
> inserts on MASTER, but there are no reads. Often there are thousands to 
> hundreds of thousands of records per minute being inserted. But 
> sometimes I get an OOM error; the only log entry I can find is:
>
> 2016-10-09T16:17:34.440+0200: 63872,249: [Full GC (Allocation Failure) 
> 2016-10-09T16:17:34.440+0200: 63872,249: [CMS: 
> 16387099K->16387087K(16777216K), 4,2227778 secs] 
> 17782619K->17782606K(30758272K), [Metaspace: 36452K->36452K(38912K)], 
> 4,2229287 secs] [Times: user=4,22 sys=0,01, real=4,22 secs]
>
> As I'm a bit lost with all this: is there anybody who can help me with the best 
> config for massive inserts on MASTER and massive reads on SLAVE? Is there a 
> common approach? What details should I provide furthermore? Or is the 
> simplest solution to raise the heap on MASTER from 32GB of the available 64GB to 
> a higher value?

What version of Solr?  How has it been installed and started?

Is this a single index core with 150 million docs and 140GB index
directory size, or is that the sum total of all the indexes on the machine?

It seems unlikely to me that you would see OOM errors when indexing with
a 32GB heap and no queries.  You might try dropping the max heap to 31GB
instead of 32GB, so your Java pointer sizes are cut in half.  You might
actually see a net increase in the amount of memory that Solr can
utilize with that change.

Whether the errors continue or not, can you copy the full error from
your log with stacktrace(s) so we can see it?

Thanks,
Shawn



Config for massive inserts into Solr master

2016-10-09 Thread Reinhard Budenstecher
Hello,

I'm not a pro in Solr nor in Java, so please be patient. We have an ecommerce 
application with 150 million docs and a size of 140GB in Solr.
We are using the following setup:

Solr "MASTER":
- DELL R530, 1x XEON E5-1650
- 64GB ECC RAM
- 4x 480GB SSD as RAID10 on hardware RAID (but no BBU so no writeback)

Solr "SLAVE":
- DELL R730, 2x XEON E5-2600
- 128GB ECC RAM
- 4x 480GB SSD as RAID10 on hardware RAID

Both systems are running Debian Jessie, OpenJDK 8 and Solr 6.2.1. Modifications 
made to the config:

- heap memory 32GB on MASTER and 64GB on SLAVE
- disabled "AddSchemaFieldsUpdateProcessorFactory"
- set "ClassicIndexSchemaFactory"
- enabled "useFilterForSortedQuery"
- and of course: enabled replication
- autoCommit is 15 seconds and openSearcher is false
- everything else is stock

We have an ETL process which updates the product catalog. This produces massive 
inserts on MASTER, but there are no reads. Often there are thousands to hundreds of 
thousands of records per minute being inserted. But sometimes 
I get an OOM error; the only log entry I can find is:

2016-10-09T16:17:34.440+0200: 63872,249: [Full GC (Allocation Failure) 
2016-10-09T16:17:34.440+0200: 63872,249: [CMS: 16387099K->16387087K(16777216K), 
4,2227778 secs] 17782619K->17782606K(30758272K), [Metaspace: 
36452K->36452K(38912K)], 4,2229287 secs] [Times: user=4,22 sys=0,01, real=4,22 
secs]

As I'm a bit lost with all this: is there anybody who can help me with the best 
config for massive inserts on MASTER and massive reads on SLAVE? Is there a 
common approach? What details should I provide furthermore? Or is the simplest 
solution to raise the heap on MASTER from 32GB of the available 64GB to a higher 
value?






Re: Real Time Search and External File Fields

2016-10-09 Thread Shawn Heisey
On 10/8/2016 1:18 PM, Mike Lissner wrote:
> I want to make sure I understand this properly and document this for
> future people that may find this thread. Here's what I interpret your
> advice to be:
> 0. Slacken my auto soft commit interval to something more like a minute. 

Yes, I would do this.  I would also increase autoCommit to something
between one and five minutes, with openSearcher set to false.  There's
nothing *wrong* with 15 seconds for autoCommit, but I want my server to
be doing less work during normal operation.

To answer a question you posed in a later message: Yes, it's common for
users to have a longer interval on autoSoftCommit than autoCommit. 
Remember the mantra in the URL about understanding commits:  Hard
commits are about durability, soft commits are about visibility.  Hard
commits when openSearcher is false are almost always *very* fast, so
it's typically not much of a burden to have them happen more frequently,
and thus have a better data durability guarantee.  Like I said above, I
generally use an autoCommit value between one and five minutes.
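
For what it's worth, on Solr 5+ these intervals don't even require hand-editing
solrconfig.xml; the Config API can set them (the values below are the one-minute
soft / five-minute hard combination discussed here, in milliseconds, and the core
name is a placeholder):

    curl http://localhost:8983/solr/mycore/config -H 'Content-type:application/json' -d '{
      "set-property": {
        "updateHandler.autoSoftCommit.maxTime": 60000,
        "updateHandler.autoCommit.maxTime": 300000,
        "updateHandler.autoCommit.openSearcher": false
      }
    }'

The overrides end up in configoverlay.json alongside the core's configuration.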

> I'm a bit confused about the example autowarmcount for the caches, which is
> 0. Why not set this to something higher? I guess it's a RAM utilization vs.
> speed tradeoff? A low number like 16 seems like it'd have minimal impact on
> RAM?

A low autowarmCount is generally chosen for one reason: commit speed. 
If the example configs have it set to zero, I'm sure this was done so
commits would proceed as fast as possible.  Large values can turn
opening a new searcher into a process that can take *minutes*.

On my index shards, the autowarmCount on my filterCache is *four*. 
That's it -- execute only four of the most recent filters in the cache
when a new searcher opens.  That warming *still* sometimes takes as long
as 20 seconds on the larger shards.  The filters used in queries on my
indexes are very large and very complex, and can match millions of
documents.  Pleading with the dev team to decrease query complexity
doesn't help.

On the idea of reusing the external file data when it doesn't change:  I
do not know if this is possible.  I have no idea how Solr and Lucene use
the data found in the external file, so it might be completely necessary
to re-load it every time.  You can open an issue in Jira to explore the
idea, but don't be too surprised if it doesn't go anywhere.

Thanks,
Shawn



Re: configure solr kerberos with tomcat

2016-10-09 Thread Shawn Heisey
On 10/9/2016 2:14 AM, 李爽 wrote:
> I wonder how to configure Solr Kerberos authentication with Tomcat, as the
> tutorial shows the configuration procedure with the default Jetty server:
> https://cwiki.apache.org/confluence/display/solr/Kerberos+Authentication+Plugin

Since 5.0, Solr no longer officially supports running in a third-party
container like Tomcat.  You can still do it, but you're on your own.  I
have no idea whether the authentication plugins provided with Solr will
work with Tomcat.  The authentication plugins were mostly built after 5.0.

You may be able to configure Tomcat itself to handle the authentication,
but you'll need to figure out how to do that on your own.

Thanks,
Shawn



configure solr kerberos with tomcat

2016-10-09 Thread 李爽
hi,

I wonder how to configure Solr Kerberos authentication with Tomcat, as the tutorial 
shows the configuration procedure with the default Jetty server: 
https://cwiki.apache.org/confluence/display/solr/Kerberos+Authentication+Plugin

thanks.