Re: Get first value in a multivalued field

2021-03-04 Thread Walter Underwood
You can copy the field to another field, then use the FirstFieldValueUpdateProcessorFactory to limit that field to the first value. At least, that seems to be what that URP does. I have not used it. https://solr.apache.org/guide/8_8/update-request-processors.html wunder Walter Underwood wun

Re: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

2021-02-22 Thread Walter Underwood
True, but Windows does cache files. It has been a couple of decades since I ran search on Windows, but Ultraseek got large gains from setting some sort of system property to make it act like a file server and give file caching equal priority with program caching. wunder Walter Underwood wun

Re: defragmentation can improve performance on SATA class 10 disk ~10000 rpm ?

2021-02-22 Thread Walter Underwood
that could be used for caching files. 8GB of heap is usually enough. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 21, 2021, at 11:52 PM, Danilo Tomasoni wrote: > > Hello all, > we are running a solr instance with around 41 MLN documents on

Re: Why Solr questions on stackoverflow get very few views and answers, if at all?

2021-02-12 Thread Walter Underwood
stop words” and “Solr is not a database”. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 12, 2021, at 3:03 AM, Charlie Hull > wrote: > > I've answered a few in my time, but my experience is that if you do so you > then get

Shards and circuit breakers

2021-02-03 Thread Walter Underwood
“overkill” to me. If it only kills external requests, then 10% means 10%. Killing only external requests requires that external requests go roughly equally to all hosts in the cluster, or at least all NRT or PULL replicas. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org

Re: Events on updating documents

2021-01-21 Thread Walter Underwood
. Redesign your system to use a database as a data store. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jan 20, 2021, at 11:49 PM, haris.k...@vnc.biz wrote: > > Hello, > > We at VNC are using Solr for search and as a data store. We

Re: different score from different replica of same shard

2021-01-13 Thread Walter Underwood
Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jan 13, 2021, at 6:31 AM, Markus Jelsma wrote: > > Hallo Bernd, > > I see the different replica types in the 7.1 [1] manual but not in the 6.6. > ExactStatsCache should work in 6.6, just add it

Re: Apache Solr in High Availability Primary and Secondary node.

2021-01-11 Thread Walter Underwood
Use a load balancer. We’re in AWS, so we use an AWS ALB. If you don’t have a failure-tolerant load balancer implementation, the site has bigger problems than search. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jan 11, 2021, at 10:15 AM, Dmi

Re: Apache Solr in High Availability Primary and Secondary node.

2021-01-11 Thread Walter Underwood
of them. If any of them goes down, you have the capacity to handle the traffic. This is called “N+1 provisioning”. This was our rule at Netflix a dozen years ago, running Solr 1.3. I do it the same way today with large sharded clusters, one extra per shard. wunder Walter Underwood wun
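
The N+1 rule above comes down to simple arithmetic. A minimal sketch, assuming each host can sustain a known query rate and that every query touches every shard; the function name and the example numbers are hypothetical, not taken from the thread:

```python
import math


def hosts_needed(peak_qps: float, qps_per_host: float, shards: int = 1) -> int:
    """Hosts to provision under N+1: enough replicas per shard to carry
    peak load, plus one spare replica per shard."""
    if peak_qps <= 0 or qps_per_host <= 0 or shards < 1:
        raise ValueError("peak_qps and qps_per_host must be positive, shards >= 1")
    # N: replicas needed so each shard can serve the full peak query rate.
    replicas_for_load = math.ceil(peak_qps / qps_per_host)
    # +1: the extra replica that lets any single host fail without overload.
    return shards * (replicas_for_load + 1)


if __name__ == "__main__":
    # Unsharded: 900 qps at 250 qps per host needs 4 hosts, so provision 5.
    print(hosts_needed(900, 250))
    # Eight shards: one extra per shard, as described above.
    print(hosts_needed(900, 250, shards=8))
```

The per-host rate should come from a load test against production-like data rather than a guess.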

Re: Sending compressed (gzip) UpdateRequest with SolrJ

2021-01-08 Thread Walter Underwood
data probably overwhelms the network time. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jan 8, 2021, at 12:01 AM, Gael Jourdan-Weil > wrote: > > You're right Matthew. > > Jetty supports it for responses but for reques

Missing processor in example update request processor chain?

2020-12-28 Thread Walter Underwood
est-processors.html> wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: CPU and memory circuit breaker documentation issues

2020-12-18 Thread Walter Underwood
Thanks. I’m already familiar with adoc. https://issues.apache.org/jira/browse/SOLR-15056 Now I need to brush up on How To Contribute. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec

Re: CPU and memory circuit breaker documentation issues

2020-12-18 Thread Walter Underwood
OperatingSystemMXBean.getSystemCPULoad(). How do I fix the documentation? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 16, 2020, at 10:41 AM, Walter Underwood wrote: > > In https://lucene.apache.org/solr/guide/8_7/circuit-breakers.html

Re: Best example solrconfig.xml?

2020-12-16 Thread Walter Underwood
tly specify at least one SolrJmxReporter configuration.” https://lucene.apache.org/solr/8_7_0/changes/Changes.html#v7.0.0.upgrading_from_solr_6.x wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 15, 2020, at 7:36 PM, Walter Underwood wrote

CPU and memory circuit breaker documentation issues

2020-12-16 Thread Walter Underwood
/7/docs/api/java/lang/management/MemoryUsage.html> wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: Best example solrconfig.xml?

2020-12-15 Thread Walter Underwood
they got into a stable congested state. People just don’t believe that will happen until they see it. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 15, 2020, at 6:31 PM, Erick Erickson wrote: > > I’d start with that config set, ma

Best example solrconfig.xml?

2020-12-15 Thread Walter Underwood
We’re moving from 6.6 to 8.7 and I’m thinking of starting with an 8.7 solrconfig.xml and porting our changes into it. Is this the best one to start with? solr/server/solr/configsets/_default/conf/solrconfig.xml wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my

Re: Vulnerabilities in SOLR 8.6.2

2020-12-11 Thread Walter Underwood
, open a Jira issue. https://issues.apache.org/jira/projects/SOLR/issues/SOLR-14792?filter=allopenissues wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 11, 2020, at 8:50 AM, Narayanan, Lakshmi > wrote: > > Can anyone please advise

Re: SolrCloud crashing due to memory error - 'Cannot allocate memory' (errno=12)

2020-12-10 Thread Walter Underwood
How much RAM do you have on those machines? That message says you ran out. 32 GB is a HUGE heap. Unless you have a specific need for that, run with an 8 GB heap and see how that works. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 10, 2

Re: is there a way to trigger a notification when a document is deleted in solr

2020-12-07 Thread Walter Underwood
supports queries that send notifications. The original feature request could be satisfied the same way. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 7, 2020, at 6:22 AM, Pushkar Mishra wrote: > > Hi All > https://issues.apache.org

Re: Solr8.7 - How to optmize my index ?

2020-12-01 Thread Walter Underwood
Even better DO NOT OPTIMIZE. Just let Solr manage the indexes automatically. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Dec 1, 2020, at 11:31 AM, Info MatheoSoftware > wrote: > > Hi All, > > > > I found the solutio

Re: data import handler deprecated?

2020-11-29 Thread Walter Underwood
. It allows loading prod data into a test cluster for load benchmarks, for example. Also good for disaster recovery, just load the recent batches from S3. Want to know exactly which documents were in the index in October? Look at the batches in S3. wunder Walter Underwood wun...@wunderwood.org http

Re: Query generation is different for search terms with and without "-"

2020-11-25 Thread Walter Underwood
man wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Nov 25, 2020, at 9:26 AM, Erick Erickson wrote: > > Parameters, no. You could use a PatternReplaceCharFilterFactory. NOTE: > > *FilterFactory are _not_ what you want in this case, t

Re: Phrase query no hits when stopwords and FlattenGraphFilterFactory used

2020-11-10 Thread Walter Underwood
in the index at Infoseek, back in 1996. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Nov 10, 2020, at 1:16 AM, Edward Turner wrote: > > Hi all, > > Okay, I've been doing more research about this problem and from what I >

Re: Solr tag cloud - words and counts

2020-11-03 Thread Walter Underwood
For a tag cloud, the anomalous words are what you want. If you choose the most common words, then every tag cloud will have the same words. It will look like: the, be, to, it, of, and, a, in, that, have, I, it, for, not, on, with, ... wunder Walter Underwood wun...@wunderwood.org http
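
One way to find "anomalous" words is to compare how often a word appears in the documents of interest with how often it appears in the whole collection. This is a minimal sketch of that idea, not something from the thread: the scoring (a smoothed frequency ratio), the whitespace tokenizer, and all names are assumptions.

```python
from collections import Counter


def anomalous_words(subset_docs, all_docs, top_n=10, min_count=2):
    """Rank words that are unusually frequent in subset_docs relative to all_docs."""
    subset = Counter(w for d in subset_docs for w in d.lower().split())
    background = Counter(w for d in all_docs for w in d.lower().split())
    subset_total = sum(subset.values())
    background_total = sum(background.values())

    scores = {}
    for word, count in subset.items():
        if count < min_count:
            continue  # too rare in the subset to be worth showing
        subset_rate = count / subset_total
        # Add-one smoothing so a word missing from the background cannot divide by zero.
        background_rate = (background[word] + 1) / (background_total + 1)
        scores[word] = subset_rate / background_rate

    # Highest ratio first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_n]


if __name__ == "__main__":
    collection = [
        "the cat sat on the mat",
        "the dog sat on the log",
        "the solr index and the solr cache",
        "the bird and the fish",
    ]
    # "the" is the most common word, but "solr" is the anomalous one.
    print(anomalous_words([collection[2]], collection, top_n=1))
```

Ranking by raw count instead would put "the" first for every subset, which is the failure described above.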

Re: Avoiding duplicate entry for a multivalued field

2020-10-29 Thread Walter Underwood
Since you are already taking the performance hit of atomic updates, I doubt you’ll see any impact from field types or update request processors. The extra cost of atomic updates will be much greater than indexing cost. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org

Re: SOLR uses too much CPU and GC is also weird on Windows server

2020-10-28 Thread Walter Underwood
Double the heap. All that CPU is the GC trying to free up space. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 28, 2020, at 6:29 AM, Jaan Arjasepp wrote: > > Hi all, > > Its me again. Anyway, I did a little research and we

Re: Tangent: old Solr versions

2020-10-28 Thread Walter Underwood
that aren’t in 6.6.2. Moving the Solr 4 cluster is way at the bottom of the list. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 28, 2020, at 5:37 AM, Mark H. Wood wrote: > > On Tue, Oct 27, 2020 at 04:25:54PM -0500, Mike Drob wrote:

Re: SOLR uses too much CPU and GC is also weird on Windows server

2020-10-27 Thread Walter Underwood
# Settings from https://wiki.apache.org/solr/ShawnHeisey
GC_TUNE=" \
-XX:+UseG1GC \
-XX:+ParallelRefProcEnabled \
-XX:G1HeapRegionSize=8m \
-XX:MaxGCPauseMillis=200 \
-XX:+UseLargePages \
-XX:+AggressiveOpts \
"
wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/

Re: Faceting on indexed=false stored=false docValues=true fields

2020-10-19 Thread Walter Underwood
Hmm. Fields used for faceting will also be used for filtering, which is a kind of search. Are docValues OK for filtering? I expect they might be slow the first time, then cached. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 19, 2020, at 11

Re: converting string to solr.TextField

2020-10-17 Thread Walter Underwood
Because Solr is not updating documents. Solr is adding to indexes of fields. You cannot add a TextField document to a StringField index. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 17, 2020, at 5:23 AM, Vinay Rajput wrote: > > Sorr

Re: converting string to solr.TextField

2020-10-16 Thread Walter Underwood
in prod. We had created a bunch of extra fields for a series of A/B tests on relevance improvements. Those tests were finished, so we needed to remove those from the index. It was slightly simpler because we had already stopped querying those fields. wunder Walter Underwood wun...@wunderwood.org http

Re: converting string to solr.TextField

2020-10-16 Thread Walter Underwood
No. The data is already indexed as a StringField. You need to make a new field and reindex. If you want to keep the same field name, you need to delete all of the documents in the index, change the schema, and reindex. wunder Walter Underwood wun...@wunderwood.org http

Re: Solr 8.6.3

2020-10-15 Thread Walter Underwood
Solr does not index XML. It has an XML data format for indexing text. If you want to index and search XML, get MarkLogic. I used to work there. It is seriously awesome technology. https://www.marklogic.com wunder Walter Underwood wun...@wunderwood.or

Re: Memory line in status output

2020-10-13 Thread Walter Underwood
I recommend using the options mentioned in recent messages on this list. Solr has pretty specific memory demands, with lots of allocations with a lifetime of a single request, plus very long-lived allocations that aren’t freed until they are evicted from a cache. wunder Walter Underwood wun

Re: Memory line in status output

2020-10-13 Thread Walter Underwood
The home page of the Solr admin UI shows all of the options to the JVM. That will include the choice of garbage collector. You can also see the options with “ps -ef | grep solr”. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 13, 2020, at 1

Re: Any solr api to force leader on a specified node

2020-10-11 Thread Walter Underwood
biggest one is 48 hosts with 55 million documents. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 11, 2020, at 8:40 PM, yaswanth kumar wrote: > > Hi wunder > > Thanks for replying on this.. > > I did setup solr cloud with 4

Re: Any solr api to force leader on a specified node

2020-10-11 Thread Walter Underwood
That requirement is not necessary. Let Solr choose a leader. Why is someone making this bad requirement? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 11, 2020, at 8:01 PM, yaswanth kumar wrote: > > Can someone pls help m

Re: Help with uploading files to a core.

2020-10-11 Thread Walter Underwood
Solr is not a database. You can make a huge mess pretending it is a DB. Also, it doesn’t store files. What is your use case? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 11, 2020, at 1:28 PM, Guilherme dos Reis Meneguello >

Re: Folding Repeated Letters

2020-10-09 Thread Walter Underwood
agree that trying to fix this after you have the query is hard. If edismax supported fuzzy matching, it would be much easier. I know that, because we’ve been running that patch (SOLR-629) in prod for several years. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my

Re: Solr endpoint on the public internet

2020-10-08 Thread Walter Underwood
Let me know where it is and I’ll delete all the documents in your collection. It is easy, just one HTTP request. https://gist.github.com/nz/673027/313f70681daa985ea13ba33a385753aef951a0f3 wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 8, 2

Re: Term too complex for spellcheck.q param

2020-10-07 Thread Walter Underwood
The spellcheck feature was replaced by the suggester in Solr 4, released in 2012, so I would not expect any changes in spellcheck. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 7, 2020, at 3:53 PM, gnandre wrote: > > Is th

Re: Java GC issue investigation

2020-10-07 Thread Walter Underwood
\ -XX:+ParallelRefProcEnabled \ -XX:G1HeapRegionSize=8m \ -XX:MaxGCPauseMillis=200 \ -XX:+UseLargePages \ -XX:+AggressiveOpts \ " wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 7, 2020, at 2:39 AM, Karol Grzyb wrote: > > Hi Matthew, Erick! >

Re: Order of applying tokens/filter

2020-10-06 Thread Walter Underwood
Synonyms only need to be done once. Generally, expand synonyms at index time only. Also, consider the StandardTokenizer. It is a bit smarter and that can be useful. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 5, 2020, at 9:08

Re: Order of applying tokens/filter

2020-10-04 Thread Walter Underwood
SynonymGraphFilterFactory FlattenGraphFilterFactory KStemFilterFactory RemoveDuplicatesFilterFactory wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 4, 2020, at 9:24 PM, Jayadevan Maymala > wrote: > > Hi all, > > Is this the best

Re: advice on whether to use stopwords for use case

2020-10-01 Thread Walter Underwood
I can’t think of an easy way to do this in Solr. Do a bunch of string searches on the query on the client side. If any of them match, make a “no hits” result page. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 30, 2020, at 11:56 PM, De
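
The client-side check suggested above could look something like this. A minimal sketch: the function names, the empty-result shape, and the `run_search` callback are all hypothetical stand-ins for whatever the application already uses.

```python
def should_show_no_hits(query: str, blocked_phrases) -> bool:
    """True if any blocked phrase occurs in the query (case-insensitive)."""
    # Collapse whitespace and case so "Cheap  ACME" still matches "cheap acme".
    normalized = " ".join(query.lower().split())
    return any(phrase.lower() in normalized for phrase in blocked_phrases)


def search(query: str, blocked_phrases, run_search):
    """Skip the search engine entirely when the query contains a blocked phrase."""
    if should_show_no_hits(query, blocked_phrases):
        # Hypothetical "no hits" result; shape it to match the real result page.
        return {"numFound": 0, "docs": []}
    return run_search(query)


if __name__ == "__main__":
    blocked = ["acme widget"]
    fake_engine = lambda q: {"numFound": 1, "docs": [{"id": "1"}]}
    print(search("Cheap  ACME widgets", blocked, fake_engine))
    print(search("solr tuning", blocked, fake_engine))
```

Plain substring matching also fires inside longer words, so short blocked phrases may need word-boundary matching instead.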

Re: Master/Slave

2020-09-30 Thread Walter Underwood
are also our disaster recovery method for indexing. And of course, two clusters could be loaded from the same file. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 30, 2020, at 12:09 PM, David Hastings > wrote: > >> whether we shoul

Re: advice on whether to use stopwords for use case

2020-09-30 Thread Walter Underwood
examples? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 30, 2020, at 11:53 AM, Alexandre Rafalovitch > wrote: > > You may also want to look at something like: > https://docs.querqy.org/index.html > > ApacheCon had (is hav

Re: Doing what does using SolrJ API

2020-09-17 Thread Walter Underwood
If you want to ignore a field being sent to Solr, you can set indexed=false and stored=false for that field in schema.xml. It will take up room in schema.xml but zero room on disk. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 17, 2020, at

Re: Updating configset

2020-09-11 Thread Walter Underwood
I wrote some Python to get the Zookeeper address from CLUSTERSTATUS, then use the Kazoo library to upload a configset. Then it goes back to the cluster and runs an async command to RELOAD. I really should open source that thing (in my copious free time). wunder Walter Underwood wun

Re: Why use a different analyzer for "index" and "query"?

2020-09-10 Thread Walter Underwood
”, “baby-sitter”, and “baby sitter”. * remove HTML: we rarely see HTML in queries, but we never know when someone will get clever with the source text, sigh. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 10, 2020, at 9:48 AM, Erick Erick

Re: Understanding Solr heap %

2020-09-01 Thread Walter Underwood
\ -XX:+ParallelRefProcEnabled \ -XX:G1HeapRegionSize=8m \ -XX:MaxGCPauseMillis=200 \ -XX:+UseLargePages \ -XX:+AggressiveOpts \ " wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 1, 2020, at 8:39 AM, Joe Doupnik wrote: > > Erick states this

Re: Exclude a folder/directory from indexing

2020-08-28 Thread Walter Underwood
For building a crawler, I’d start with Scrapy (https://scrapy.org). It is a solid design and should be easy to use for crawling web pages, files, or an API. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Aug 28, 2020

Re: PDF extraction using Tika

2020-08-26 Thread Walter Underwood
When I worked for a search engine vendor, we did exactly the same thing. We always ran the document crackers in a different process because they tended to hang, crash, run forever, or use all of memory. Adobe PDFlib was not an exception to that rule. wunder Walter Underwood Ultraseek Server

Re: Solr doesn't run after editing solr.in.sh

2020-08-23 Thread Walter Underwood
Also, what platform is this on and what editor did you use (especially if you are on Windows)? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Aug 23, 2020, at 4:35 PM, Erick Erickson wrote: > > Well, first show exactly what you uncom

Re: SOLR indexing takes longer time

2020-08-18 Thread Walter Underwood
Instead of writing code, I’d fire up SQL Workbench/J, load the same JDBC driver that is being used in Solr, and run the query. https://www.sql-workbench.eu If that takes 3.5 hours, you have isolated the problem. wunder Walter Underwood wun...@wunderwood.or

Re: SOLR indexing takes longer time

2020-08-17 Thread Walter Underwood
while you are indexing. If it is under 50%, the bottleneck is MongoDB and single-threaded indexing. For another check, run that same query in a regular database client and time it. The Solr indexing will never be faster than that. wunder Walter Underwood wun...@wunderwood.org http

Looking for Solr contractor at Chegg

2020-08-17 Thread Walter Underwood
though they are somewhat different than the main topic of helping Solr users. wunder Walter Underwood Principal Software Engineer, Chegg wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: Multiple Collections in a Alias.

2020-08-12 Thread Walter Underwood
a different order from different collections. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Aug 12, 2020, at 4:29 PM, Jae Joo wrote: > > Good question. How can I validate if the replicas are all synched? > > > On Wed, Aug 12, 2020

Re: Multiple Collections in a Alias.

2020-08-12 Thread Walter Underwood
Are the scores the same for the documents that are ordered differently? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Aug 12, 2020, at 10:55 AM, Jae Joo wrote: > > The replications are all synched and there are no updates while I was

Re: Searching for credit card numbers

2020-07-28 Thread Walter Underwood
If you reindex, I’ve become a big fan of adding a date field with an index timestamp. That will allow you to check whether everything has been reindexed. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jul 28, 2020, at 2:11 PM, Jörn Fra
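
With an index timestamp on every document, checking a reindex becomes a single range query for anything older than the start of the run. A minimal sketch, assuming a date field named `indexed_at` (a hypothetical name; use whatever the schema defines):

```python
from datetime import datetime, timezone


def stale_docs_query(reindex_started: datetime, field: str = "indexed_at") -> str:
    """Build a Solr range query matching documents indexed before the reindex began."""
    if reindex_started.tzinfo is None:
        raise ValueError("reindex_started must be timezone-aware")
    # Solr date fields expect UTC in ISO 8601 form with a trailing Z.
    stamp = reindex_started.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
    # '[' is an inclusive lower bound, '}' an exclusive upper bound.
    return f"{field}:[* TO {stamp}}}"


if __name__ == "__main__":
    started = datetime(2020, 7, 28, 21, 0, 0, tzinfo=timezone.utc)
    print(stale_docs_query(started))
```

Running that query with rows=0 and reading numFound tells you how many documents still carry the old timestamp; zero means everything has been reindexed.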

Re: Searching for credit card numbers

2020-07-28 Thread Walter Underwood
I’d do that at index time. Add an update request processor script that does the regex and adds a field has_credit_card_number:true. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jul 28, 2020, at 11:50 AM, lstusr 5u93n4 wrote: > > Let's s
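
The flagging logic is small enough to sketch. This version is plain Python run on a document before it is sent to Solr, rather than an update request processor script, and the regex, the Luhn checksum filter, and the `body` field name are assumptions added for illustration:

```python
import re

# 13 to 19 digits, optionally separated by single spaces or hyphens.
CARD_PATTERN = re.compile(r"\b\d(?:[ -]?\d){12,18}\b")


def luhn_ok(digits: str) -> bool:
    """Luhn checksum, used here only to cut down on false positives."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:  # double every second digit, counting from the right
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def add_card_flag(doc: dict, text_field: str = "body") -> dict:
    """Return a copy of doc with has_credit_card_number set to True or False."""
    text = doc.get(text_field, "")
    found = any(
        luhn_ok(re.sub(r"[ -]", "", m.group()))
        for m in CARD_PATTERN.finditer(text)
    )
    flagged = dict(doc)
    flagged["has_credit_card_number"] = found
    return flagged


if __name__ == "__main__":
    doc = {"id": "1", "body": "card 4111 1111 1111 1111 on file"}
    print(add_card_flag(doc)["has_credit_card_number"])
```

Searches can then filter on has_credit_card_number:true instead of running a regex at query time.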

Re: tlog keeps growing

2020-07-23 Thread Walter Underwood
This is a long shot, but look in the overseer queue to see if stuff is stuck. We ran into that with 6.x. We restarted the instance that was the overseer and the newly-elected overseer cleared the queue. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog

Re: Replica goes into recovery mode in Solr 6.1.0

2020-07-21 Thread Walter Underwood
Upgrade to 6.6.2. That will be compatible, but will fix several bugs that were discovered during the 6.x releases. If the problem happens after that, ask again. It might, we’ve had some issues with 6.6.2, but upgrade first. wunder Walter Underwood wun...@wunderwood.org http

Re: Solr fails to start with G1 GC

2020-07-16 Thread Walter Underwood
Instead of editing bin/solr, you should be able to set GC_TUNE in solr.in.sh, as I showed in my post below. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jul 16, 2020, at 7:52 AM, krishan goyal wrote: > > The issue was figured out by

Re: Solr fails to start with G1 GC

2020-07-15 Thread Walter Underwood
://wiki.apache.org/solr/ShawnHeisey GC_TUNE=" \ -XX:+UseG1GC \ -XX:+ParallelRefProcEnabled \ -XX:G1HeapRegionSize=8m \ -XX:MaxGCPauseMillis=200 \ -XX:+UseLargePages \ -XX:+AggressiveOpts \ " wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jul 15, 202

Re: Replica goes into recovery mode in Solr 6.1.0

2020-07-10 Thread Walter Underwood
. Try that heap size. Upgrade to 6.6.2. That includes all bug fixes for the 6.x release. The 6.x release had several bad bugs, especially in the middle releases. We were switching prod to Solr Cloud while those were being released and it was not fun. wunder Walter Underwood wun...@wunderwood.org http

Re: Replica goes into recovery mode in Solr 6.1.0

2020-07-09 Thread Walter Underwood
JVMs) just for Java where I would configure it with a single 8 GB JVM. That would free up 100 GB for file caches. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jul 8, 2020, at 10:10 PM, vishal patel > wrote: > > Thanks for reply.

Re: Replica goes into recovery mode in Solr 6.1.0

2020-07-08 Thread Walter Underwood
Solr super fast. It makes Solr do too much work, makes the work queues fill up, and makes it fail. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jul 7, 2020, at 10:55 PM, vishal patel > wrote: > > Thanks for your reply. > > One

Re: Replica goes into recovery mode in Solr 6.1.0

2020-07-07 Thread Walter Underwood
huge JVMs? The servers will be doing a LOT of disk IO, so look at the read and write iops. I expect that the solr processes are blocked on disk reads almost all the time. "-Dsolr.autoSoftCommit.maxTime=100” is way too short (100 ms). That is probably causing your outages. wunder Walter Und

Re: Max number of documents in update request

2020-07-07 Thread Walter Underwood
will be sending the next batch so there is no pause in processing. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jul 7, 2020, at 6:12 AM, Erick Erickson wrote: > > As many as you can send before blowing up. > > Really, the question is

Re: How to use two search string in a single solr query

2020-07-02 Thread Walter Underwood
First, remove the “mm” parameter from the request handler definition. That can be added back in and tweaked later, or just left out. Second, you don’t need any query syntax to search for two words. This query should work fine: books bags wunder Walter Underwood wun...@wunderwood.org http

Re: Query in quotes cannot find results

2020-06-30 Thread Walter Underwood
that never occur in production and to missing problems that do occur. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 30, 2020, at 11:07 AM, Permakoff, Vadim > wrote: > > Hi Erick, > Thank you for the suggestion, I should have added it.

Re: Query in quotes cannot find results

2020-06-30 Thread Walter Underwood
about why we didn’t remove stopwords at Netflix. https://observer.wunderwood.org/2007/05/31/do-all-stopword-queries-matter/ wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 30, 2020, at 8:56 AM, Permakoff, Vadim > wrote: > > Hi Erik,

Re: Prevent Re-indexing if Doc Fields are Same

2020-06-26 Thread Walter Underwood
. You can check if the fields are the same with a checksum of the data. MD5 is fine for that. Check that database before sending the document and update it after new documents are indexed. You may also want to record deletes in the database. wunder Walter Underwood wun...@wunderwood.org http
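
The checksum comparison described above might be sketched like this. The dict named `seen` stands in for the external database, and the function names and JSON canonicalization are assumptions, not details from the thread:

```python
import hashlib
import json


def doc_checksum(doc: dict) -> str:
    """MD5 over a canonical JSON form, so key order does not change the result."""
    canonical = json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.md5(canonical.encode("utf-8")).hexdigest()


def needs_indexing(doc: dict, seen: dict, id_field: str = "id") -> bool:
    """True if the document is new or its fields changed since it was last indexed."""
    return seen.get(doc[id_field]) != doc_checksum(doc)


def record_indexed(doc: dict, seen: dict, id_field: str = "id") -> None:
    """Call after Solr has accepted the document."""
    seen[doc[id_field]] = doc_checksum(doc)


if __name__ == "__main__":
    seen = {}  # stand-in for the checksum table in a real database
    doc = {"id": "1", "title": "a"}
    print(needs_indexing(doc, seen))   # new document, so it must be sent
    record_indexed(doc, seen)
    print(needs_indexing(doc, seen))   # unchanged, so it can be skipped
```

MD5 is used here only to detect changes, not for security, which is why it is good enough for this job.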

Re: Retrieve disk usage & release disk space after delete

2020-06-23 Thread Walter Underwood
and start walking JSON data. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 23, 2020, at 4:27 AM, Erick Erickson wrote: > > Q1: If you’re talking about disk space used up by deleted documents, > then yes, optimize or expungeDeletes

Re: Deleting on exact match

2020-06-21 Thread Walter Underwood
Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 21, 2020, at 12:32 PM, Scott Q. wrote: > > Also note that I didn't apply the new schema yet because I don't > think it will let me change it mid-way like this without deleting all > data an

Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr

2020-06-19 Thread Walter Underwood
Delegator/handler is a common pattern, but it is not the pattern that describes traditional Solr replication. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Getting rid of Overseer nomenclature in Solr

2020-06-19 Thread Walter Underwood
level than what the overseer does. It might be something like all the customer interactions in a billing process. That usage might be confusing for the term “orchestrator”. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 18, 2020, at 10:44

Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr

2020-06-18 Thread Walter Underwood
of a current slave, add it to the load balancer, and walk away. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 18, 2020, at 7:40 PM, Trey Grainger wrote: > >> >> Let’s instead find a new good name for the cluster type. Standalon

Re: [EXTERNAL] Getting rid of Master/Slave nomenclature in Solr

2020-06-18 Thread Walter Underwood
We don’t get to decide whether “master” is a problem. The rest of the world has already decided that it is a problem. Our task is to replace the terms “master” and “slave” in Solr. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 18, 2020, a

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-18 Thread Walter Underwood
Actually, the term “master” is a problem, so master/follower doesn’t work. GitLab is renaming the master branch to main. Rice University renamed College Masters to College Magisters in 2017. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Walter Underwood
Yes, it is nice to see everyone just pitch in and do it on this list. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Walter Underwood
Master/slave is not going away in our company. That cluster has zero downtime in five years. I can’t say that about our Solr Cloud clusters. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 17, 2020, at 9:36 PM, Noble Paul wrote: > > I

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Walter Underwood
, but I can’t remember what terms I preferred because that was ten years ago. A master gives commands. That isn’t how Solr masters work. It is closer to how an NRT or TLOG leader works, actually. A Solr master just sits there and waits for other nodes to copy the index. wunder Walter Underwood wun

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Walter Underwood
in my email inbox, publisher/subscriber is an excellent choice. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 17, 2020, at 2:21 PM, Trey Grainger wrote: > > I guess I don't see it as polysemous, but instead simplifying. > >

Re: Getting rid of Master/Slave nomenclature in Solr

2020-06-17 Thread Walter Underwood
this is not a hypothetical for us. We have 100+ Solr hosts in production. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 17, 2020, at 1:36 PM, Trey Grainger wrote: > > Proposal: > "A Solr COLLECTION is composed of one or more S

Re: Solr 7.6 optimize index size increase

2020-06-17 Thread Walter Underwood
From that short description, you should not be running optimize at all. Just stop doing it. It doesn’t make that big a difference. It may take your indexes a few weeks to get back to a normal state after the forced merges. wunder Walter Underwood wun...@wunderwood.org http

Re: Master Slave Terminology

2020-06-17 Thread Walter Underwood
I’ve long thought that master/slave was not the right metaphor for a pull model anyway. We probably should not use “replica” since that already has a use in Solr Cloud. Where is the discussion? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On

Re: Solr 7.6 optimize index size increase

2020-06-16 Thread Walter Underwood
if there wasn’t enough free space. It would log an error and send an email to the admin. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 16, 2020, at 1:58 PM, David Hastings > wrote: > > I cant give you a 100% true answer but ive

Re: How to determine why solr stops running?

2020-06-11 Thread Walter Underwood
://wiki.apache.org/solr/ShawnHeisey GC_TUNE=" \ -XX:+UseG1GC \ -XX:+ParallelRefProcEnabled \ -XX:G1HeapRegionSize=8m \ -XX:MaxGCPauseMillis=200 \ -XX:+UseLargePages \ -XX:+AggressiveOpts \ " wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 11, 2020

Re: Getting rid of zookeeper

2020-06-09 Thread Walter Underwood
release notes. https://zookeeper.apache.org/releases.html Elasticsearch does not have a good record on fault tolerance. I haven’t checked recently, but it was losing updates during leader elections for several years’ worth of software releases. wunder Walter Underwood wun...@wunderwood.org http

Re: Script to check if solr is running

2020-06-08 Thread Walter Underwood
e/SOLR-14410 <https://issues.apache.org/jira/browse/SOLR-14410> Why have a cold backup and then switch? Every time I see that config, I wonder why people don’t have both servers live behind a load balancer. How do you know the cold server will work? wunder Walter Underwood wun...@w

Re: Script to check if solr is running

2020-06-05 Thread Walter Underwood
Most Linux distros are using systemd to manage server processes. https://en.wikipedia.org/wiki/Systemd wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 5, 2020, at 8:08 AM, Mark H. Wood wrote: >

Re: Multiple Solr instances using same ZooKeepers

2020-06-03 Thread Walter Underwood
it. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 2, 2020, at 6:35 AM, Gell-Holleron, Daniel > wrote: > > Many thanks for this information! > > > -Original Message- > From: Colvin Cowie > Sent: 02 June

Re: Not all EML files are indexing during indexing

2020-06-02 Thread Walter Underwood
cript on Scrapy (https://scrapy.org). I worked on a Python-based web spider for about ten years. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: SOLR cache tuning

2020-06-01 Thread Walter Underwood
of our Solr instances, including one with about 5 million docs per shard. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 1, 2020, at 8:28 AM, Tarun Jain wrote: > > Hi, I have a SOLR installation in master-slave configuration. The slave i

Re: JMX metrics for solr cloud cluster state

2020-05-31 Thread Walter Underwood
I gave up on JMX ages ago, so I can’t help there. I’d open a bug with New Relic. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On May 31, 2020, at 7:59 PM, Ganesh Sethuraman > wrote: > > Can you suggest Solr Cloud JMX metrics fo

Re: JMX metrics for solr cloud cluster state

2020-05-31 Thread Walter Underwood
I wrote a Python daemon that gets clusterstatus from the API, parses it, and sends the counts of replicas in each state to InfluxDB. From there, we chart and alert in Grafana. New Relic is good, but we need other kinds of metrics, like the load balancer status from CloudWatch. wunder Walter
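
The approach described in that message might be sketched roughly as follows. This is not the author's script, which is not shown in the thread; it is a minimal illustration that assumes the JSON layout returned by the Collections API CLUSTERSTATUS action (cluster, collections, shards, replicas, state). The function names, the base URL argument, and the sample data are all made up for the example, and the InfluxDB and Grafana parts are left out.

```python
import json
from collections import Counter
from urllib.request import urlopen


def fetch_cluster_status(base_url):
    """Fetch CLUSTERSTATUS as a dict.

    base_url is an assumption for this sketch, e.g. "http://localhost:8983/solr".
    """
    url = base_url.rstrip("/") + "/admin/collections?action=CLUSTERSTATUS&wt=json"
    with urlopen(url, timeout=10) as resp:
        return json.load(resp)


def count_replica_states(status):
    """Count replicas in each state across all collections and shards."""
    counts = Counter()
    collections = status.get("cluster", {}).get("collections", {})
    for collection in collections.values():
        for shard in collection.get("shards", {}).values():
            for replica in shard.get("replicas", {}).values():
                # Replicas with no "state" key are counted as "unknown".
                counts[replica.get("state", "unknown")] += 1
    return counts


# Hypothetical, trimmed-down response used only to exercise the parser.
SAMPLE_STATUS = {
    "cluster": {
        "collections": {
            "books": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node1": {"state": "active"},
                            "core_node2": {"state": "recovering"},
                        }
                    },
                    "shard2": {
                        "replicas": {
                            "core_node3": {"state": "active"},
                        }
                    },
                }
            }
        }
    }
}

if __name__ == "__main__":
    for state, count in sorted(count_replica_states(SAMPLE_STATUS).items()):
        print(state, count)
```

A daemon built on this would call fetch_cluster_status on a timer and write the resulting counts to whatever metrics store is in use.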

Re: Why Did It Match?

2020-05-28 Thread Walter Underwood
Are you sure they will wonder? I’d try it without that and see if the simpler UI is easier to use. Simple almost always wins the A/B test. You can use the highlighter to see if a field matched a term. Only use explain if you need all the scores. wunder Walter Underwood wun...@wunderwood.org
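
As a rough illustration of the highlighter suggestion, the sketch below lists which fields produced highlight snippets for each document. It assumes the query was sent with hl=true and the fields of interest in hl.fl, and that the JSON response carries the usual "highlighting" section keyed by document id; the function name and the sample response are hypothetical.

```python
def fields_that_matched(response):
    """Map each document id to the fields that returned highlight snippets."""
    matched = {}
    for doc_id, fields in response.get("highlighting", {}).items():
        # A field counts as matched only if it has at least one snippet.
        matched[doc_id] = sorted(
            name for name, snippets in fields.items() if snippets
        )
    return matched


# Hypothetical response fragment, not taken from a real index.
SAMPLE_RESPONSE = {
    "response": {"numFound": 2, "docs": [{"id": "doc1"}, {"id": "doc2"}]},
    "highlighting": {
        "doc1": {"title": ["<em>solr</em> tuning"], "body": []},
        "doc2": {},
    },
}

if __name__ == "__main__":
    print(fields_that_matched(SAMPLE_RESPONSE))
```

An empty list for a document only means no requested field was highlighted; the match may have come from a field not listed in hl.fl.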
