Not highlighting "and" and "or"?

2017-06-28 Thread Walter Underwood
Is there some special casing in the highlighter to skip query syntax words? The words “and” and “or” don’t get highlighted. This is in 6.5.0. question html 440 fastVector 1 wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my

Re: Suggester and fuzzy/infix suggestions

2017-06-28 Thread Walter Underwood
Yes, but it is better than nothing. Don’t let the unavailable perfect solution keep you from implementing the available good solution. If you want to easily use fuzzy search with edismax, check out the patch submitted with SOLR-629. wunder Walter Underwood wun...@wunderwood.org http

Re: Suggester and fuzzy/infix suggestions

2017-06-28 Thread Walter Underwood
I set up two suggesters, one fuzzy and one analyzing infix. That gives two sets of suggestions, so the client code has to merge them into one list and toss duplicates. They use the same weights, so I can keep the top weighted suggestions. wunder Walter Underwood wun...@wunderwood.org http
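
A minimal sketch of that client-side merge, assuming each suggester returns a list of dicts with "term" and "weight" keys (the function name and the limit parameter are illustrative, not part of Solr):

```python
def merge_suggestions(fuzzy, infix, limit=10):
    """Merge two suggestion lists, toss duplicate terms, keep the top weights."""
    best = {}
    for suggestion in list(fuzzy) + list(infix):
        term = suggestion["term"]
        # When both suggesters return the same term, keep the higher-weighted copy.
        if term not in best or suggestion["weight"] > best[term]["weight"]:
            best[term] = suggestion
    # Highest weight first; ties are ordered by term so the result is stable.
    ranked = sorted(best.values(), key=lambda s: (-s["weight"], s["term"]))
    return ranked[:limit]


fuzzy_results = [
    {"term": "microsoft office", "weight": 14},
    {"term": "microsoft excel", "weight": 13},
]
infix_results = [
    {"term": "microsoft excel", "weight": 13},
    {"term": "office 365", "weight": 9},
]
merged = merge_suggestions(fuzzy_results, infix_results)
```

Because both suggesters use the same weights, sorting the merged list by weight is enough; no rescaling between the two sources is needed.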

Re: Mixing distrib=true and false in one request handler?

2017-06-22 Thread Walter Underwood
OK. We’re going with a separate call to /suggest. For those of us with controlled vocabularies, a suggest.distrib would be a handy thing. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 22, 2017, at 4:32 PM, Alessandro Benedetti <a.

Re: Velocity UI with Analyzing Infix Suggester?

2017-06-22 Thread Walter Underwood
suggesters anyway, both fuzzy and infix. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 6, 2017, at 4:34 AM, Rick Leir <rl...@leirtech.com> wrote: > >> typeahead solutions using a separate collection > > Erik, Do you u

Re: Mixing distrib=true and false in one request handler?

2017-06-22 Thread Walter Underwood
I really don’t understand [1]. I read the JavaDoc for that, but how does it help? What do I put in the solrconfig.xml? I’m pretty good at figuring out Solr stuff. I started with Solr 1.2. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun

Mixing distrib=true and false in one request handler?

2017-06-21 Thread Walter Underwood
set the distrib default in the suggester component instead of in the request handler? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Velocity UI with Analyzing Infix Suggester?

2017-06-05 Thread Walter Underwood
page. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: Solr Web Crawler - Robots.txt

2017-06-01 Thread Walter Underwood
tps://en.wikipedia.org/robots.txt> wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 1, 2017, at 4:58 PM, Mike Drob <md...@apache.org> wrote: > > Isn't this exactly what Apache Nutch was built for? > > On Thu, Jun 1, 2017 at 6:5

Re: Solr Web Crawler - Robots.txt

2017-06-01 Thread Walter Underwood
Which was exactly what I suggested. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 1, 2017, at 3:31 PM, David Choi <choi.davi...@gmail.com> wrote: > > In the mean time I have found a better solution at the moment is to t

Re: Solr Web Crawler - Robots.txt

2017-06-01 Thread Walter Underwood
licates, etc. The output of the crawl goes to Solr. That is how we did it with Ultraseek (before Solr existed). wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Jun 1, 2017, at 3:01 PM, David Choi <choi.davi...@gmail.com> wrote: >

Re: Solr in NAS or Network Shared Drive

2017-05-26 Thread Walter Underwood
Pretty sure that master/slave was in Solr 1.2. That was very nearly ten years ago. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On May 26, 2017, at 9:52 AM, David Hastings <hastings.recurs...@gmail.com> > wrote: > > Im curious

Re: solrcloud replicas not in sync

2017-05-24 Thread Walter Underwood
the freshness to Graphite. It is generally under 300 ms. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On May 24, 2017, at 12:51 PM, Webster Homer <webster.ho...@sial.com> wrote: > > Actually I wrote a service that calls the collecti

Re: solr 6 at scale

2017-05-24 Thread Walter Underwood
text queries. Students enter queries with hundreds of words (copy/paste), but we truncate at 40 terms. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On May 24, 2017, at 12:33 PM, Nawab Zada Asad Iqbal <khi...@gmail.com> wrote: > >

Re: solr 6 at scale

2017-05-23 Thread Walter Underwood
reporting. Use 6.5.1. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On May 23, 2017, at 5:27 PM, Nawab Zada Asad Iqbal <khi...@gmail.com> wrote: > > Hi all, > > I am planning to upgrade my solr.4.x installation to a recent stable &g

Re: Indexing word with plus sign

2017-05-23 Thread Walter Underwood
That was on Solr 1.3, so I’m pretty sure it was the whitespace tokenizer. The synonym substitution for “+/-" was done in client code and indexing code, outside of Solr. We also sanitized queries to remove all query syntax characters. wunder Walter Underwood wun...@wunderwood.org

Re: Indexing word with plus sign

2017-05-23 Thread Walter Underwood
uation. And everyone searched for "[•REC]²” as “rec2”. The middot is supposed to be red. Movie studios are clueless about searchable strings. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On May 23, 2017, at 10:41 AM, Erick Erickson <erickerick...

Re: Shard marked as down while its operational & SOLR-9120

2017-05-16 Thread Walter Underwood
Look at all the bugs fixed or reported after 6.0.0. This might have been reported and there might be a workaround. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On May 16, 2017, at 11:41 AM, Susheel Kumar <susheel2...@gmail.com> wrote

Re: Shard marked as down while its operational & SOLR-9120

2017-05-16 Thread Walter Underwood
I would upgrade to 6.5.1 before doing anything else. 6.0.0 is more than a year old. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On May 16, 2017, at 10:27 AM, Susheel Kumar <susheel2...@gmail.com> wrote: > > Also this is

Re: Do developers and power users support stackoverflow.com solr tag?

2017-05-10 Thread Walter Underwood
wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On May 10, 2017, at 8:49 AM, Karl-Philipp Richter <krich...@posteo.de> wrote: > > Hi, > > Am 10.05.2017 um 17:03 schrieb Walter Underwood: >> I have contributed some answers

Re: Do developers and power users support stackoverflow.com solr tag?

2017-05-10 Thread Walter Underwood
” answer, even if it is wrong. Very frustrating. This happens a lot with questions about antennas. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On May 10, 2017, at 7:47 AM, Erick Erickson <erickerick...@gmail.com> wrote: > > Personal

Re: SOLR as nosql database store

2017-05-10 Thread Walter Underwood
CDCR doesn’t rebuild it so much as copy it. To change the schema, you’ll need to reindex. I’ve worked on two NoSQL databases (Objectivity and MarkLogic) and I’ve worked on Solr. They are utterly different designs, intended to do different things. wunder Walter Underwood wun...@wunderwood.org

Re: SessionExpiredException

2017-05-08 Thread Walter Underwood
Which garbage collector are you using? The default GC will probably give long pauses. You need to use CMS or G1. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On May 8, 2017, at 8:48 AM, Erick Erickson <erickerick...@gmail.com> wrote

Re: Solr performance on EC2 linux

2017-05-03 Thread Walter Underwood
feature. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On May 3, 2017, at 3:53 PM, Rick Leir <rl...@leirtech.com> wrote: > > +Walter test it > > Jeff, > How much CPU does the EC2 hypervisor use? I have heard 5% but that is

Re: Solr performance on EC2 linux

2017-05-02 Thread Walter Underwood
Hmm, has anyone measured the overhead of timeAllowed? We use it all the time. If nobody has, I’ll run a benchmark with and without it. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On May 2, 2017, at 9:52 AM, Chris Hostetter <hossm

Re: Solr performance on EC2 linux

2017-05-01 Thread Walter Underwood
Might want to measure the single CPU performance of your EC2 instance. The last time I checked, my MacBook was twice as fast as the EC2 instance I was using. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On May 1, 2017, at 6:24 PM, Chris Hostet

Re: Solr Query Performance benchmarking

2017-04-28 Thread Walter Underwood
ho `date` ": 90th percentiles are $pct90" echo `date` ": 95th percentiles are $pct95" echo `date` ": full results are in ${test}" wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Apr 28, 2017, at 12:00 PM,

Re: Solr Query Performance benchmarking

2017-04-28 Thread Walter Underwood
thing. Thank you SolrJ. Our SLAs are for 95th percentile. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Apr 28, 2017, at 11:39 AM, Erick Erickson <erickerick...@gmail.com> wrote: > > Well, the best way to get no cache hits is t

Re: Solr Query Performance benchmarking

2017-04-28 Thread Walter Underwood
More “unrealistic” than “amazing”. I bet the set of test queries is smaller than the query result cache size. Results from cache are about 2 ms, but network communication to the shards would add enough overhead to reach 40 ms. wunder Walter Underwood wun...@wunderwood.org http

Re: Split Shard not working

2017-04-27 Thread Walter Underwood
What is the message in the log when it crashes? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Apr 27, 2017, at 10:10 AM, Vijay Kokatnur <kokatnur.vi...@gmail.com> wrote: > > We recently upgraded 4.5 index to 6.5 using IndexUpgra

Re: 1 main collection or multiple smaller collections?

2017-04-27 Thread Walter Underwood
”, and so on for other SRPs. There were a few other filters, like G-rated movies or streaming, DVD, HD DVD, or Bluray. The full index was under 350K documents. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Apr 27, 2017, at 10:01 AM, Rick Leir

Re: 1 main collection or multiple smaller collections?

2017-04-26 Thread Walter Underwood
Also, 300,000 documents is fairly small for Solr. We handle a million queries per day with a few servers on a collection that size. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Apr 26, 2017, at 10:33 PM, Walter Underwood <wun...@wunderwo

Re: 1 main collection or multiple smaller collections?

2017-04-26 Thread Walter Underwood
Do they have the same fields or different fields? Are they updated separately or together? If they have the same fields and are updated together, I’d put them in the same collection. Otherwise, probably separate. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org

Re: Poll: Master-Slave or SolrCloud?

2017-04-25 Thread Walter Underwood
and then to host names (drives me nuts). Same thing for scaling back, take it out of the load balancer and shoot it. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Apr 25, 2017, at 9:23 AM, Erick Erickson <erickerick...@gmail.com&

Re: Data Changes Logging

2017-04-22 Thread Walter Underwood
Do a range search on that field with the desired date range. Request rows=0. Compare the numFound to the total docs. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Apr 22, 2017, at 8:40 AM, Rick Leir <rl...@leirtech.com> wrote: &g
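
A rough sketch of that count-only query, assuming a date field named last_modified (hypothetical; use whatever field holds the change date). It only builds the request parameters and does the comparison; sending the request is left to your client:

```python
from urllib.parse import urlencode


def range_count_params(field, start, end):
    """Parameters for a range query that returns only the hit count."""
    return {
        "q": f"{field}:[{start} TO {end}]",  # Solr range syntax
        "rows": 0,  # no documents wanted, only numFound
        "wt": "json",
    }


def changed_fraction(num_found, total_docs):
    """Share of the index that falls inside the date range."""
    return num_found / total_docs if total_docs else 0.0


params = range_count_params(
    "last_modified", "2017-04-01T00:00:00Z", "2017-04-22T00:00:00Z"
)
query_string = urlencode(params)
```

The total for comparison can come from the same kind of request with q=*:* and rows=0.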

Re: Modify solr score

2017-04-21 Thread Walter Underwood
Using a minimum score cut off does not work. The score is not an absolute estimate of relevance. The idf component of the score is a whole-corpus metric. When you add or delete documents, the scores for the exact same query can change. wunder Walter Underwood wun...@wunderwood.org http
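
To see why a fixed cutoff drifts, here is a small illustration using the textbook idf, log(N / df). This is an assumption for illustration only, not the exact formula Solr's similarity classes use, but the whole-corpus dependence is the same:

```python
import math


def textbook_idf(num_docs, doc_freq):
    """Textbook inverse document frequency: log(N / df)."""
    return math.log(num_docs / doc_freq)


# Same term, same ten matching documents, but the corpus doubles in size.
idf_before = textbook_idf(1000, 10)
idf_after = textbook_idf(2000, 10)
```

The matching documents did not change at all, yet the term's contribution to the score went up, so a cutoff tuned against the smaller corpus no longer means the same thing.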

Re: Modify solr score

2017-04-21 Thread Walter Underwood
than the first hit for query B. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Apr 21, 2017, at 9:35 AM, tstusr <ulfrhe...@gmail.com> wrote: > > Since we report the score, we think there will be some relation between them. > As far

Re: Need help with auto-suggester

2017-04-15 Thread Walter Underwood
Sorry, that was formatted. The quotes are actually escaped, like this: {"term":"microsoft office","weight":14,"payload":"{\"count\": 1534255, \"id\": \"microsoft office\"}"} wunder Walter Underwood wun...@wunderwood.org

Re: Need help with auto-suggester

2017-04-15 Thread Walter Underwood
"count": 1534255, "id": "microsoft office"}" }, { term: "microsoft excel", weight: 13, payload: "{"count": 940151, "id": "microsoft excel"}" }, wunder wunder Walter Underwood wun...@wunderwood.org http://observer.w

Re: Need help with auto-suggester

2017-04-14 Thread Walter Underwood
We recently needed multiple values in the payload, so I put a JSON blob in there. It comes back as a string, so you have to decode that JSON separately. Otherwise, it was a pretty clean solution. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On
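
A short sketch of the two-step decode, using the escaped suggestion shown elsewhere in this thread. The helper name is made up for the example:

```python
import json


def decode_payload(suggestion):
    """The payload arrives as a string, so it needs its own json.loads."""
    return json.loads(suggestion["payload"])


# One suggestion as it appears in the response, with the inner quotes escaped.
raw = r'{"term":"microsoft office","weight":14,"payload":"{\"count\": 1534255, \"id\": \"microsoft office\"}"}'

suggestion = json.loads(raw)          # first decode: the suggestion itself
payload = decode_payload(suggestion)  # second decode: the JSON blob in the payload
```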

Re: Filtering results by minimum relevancy score

2017-04-13 Thread Walter Underwood
. It is still too easy for a good match to have a low score. We’re back to increasing the good hits vs reducing the bad hits. You really only achieve one of those two. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Apr 12, 2017, at 7:41 PM, Koji Sekigu

Re: Filtering results by minimum relevancy score

2017-04-12 Thread Walter Underwood
, there are scraps of info about beach parking in multiple other pages. Fix the content. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Apr 12, 2017, at 11:44 AM, David Kramer <david.kra...@shoebuy.com> wrote: > > The idea is to no

KeywordTokenizer and multiValued field

2017-04-12 Thread Walter Underwood
Does the KeywordTokenizer make each value into a unitary string or does it take the whole list of values and make that a single string? I really hope it is the former. I can’t find this in the docs (including JavaDocs). wunder Walter Underwood wun...@wunderwood.org http

Re: Long GC pauses while reading Solr docs using Cursor approach

2017-04-11 Thread Walter Underwood
all the ratios and stuff. When we were running CMS, I set a size for the heap and a size for the new space. Done. With G1, I don’t even get that fussy. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Apr 11, 2017, at 8:22 PM, Shawn Heisey &

Re: Deleting a field in schema.xml, reindex needed?

2017-04-11 Thread Walter Underwood
When I have done this, it is in multiple steps. 1. Change the indexing so that no data is going to that field. 2. Reindex, so the field is empty. 3. Remove the field from the schema. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Apr 11, 2

Re: Number of shards - Best practice

2017-04-04 Thread Walter Underwood
proportional to the number of distinct terms in the index (the vocabulary). A rule of thumb is the vocabulary is proportional to the square root of the number of terms in the index. Which is often related to the number of documents. With this assumption, four shards gives a 2X speedup. Which has wor
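
The arithmetic behind that rule of thumb, as a sketch. It assumes the per-shard cost scales with the per-shard vocabulary, that vocabulary grows as the square root of the number of terms, and that terms split evenly across shards:

```python
import math


def estimated_speedup(shards):
    """Speedup under the square-root vocabulary assumption."""
    terms_per_shard = 1 / shards                  # each shard holds 1/shards of the terms
    vocabulary_ratio = math.sqrt(terms_per_shard) # vocabulary shrinks by the square root
    return 1 / vocabulary_ratio


speedup_four = estimated_speedup(4)
speedup_sixteen = estimated_speedup(16)
```

Under the same assumption the returns diminish quickly: going from four shards to sixteen only doubles the estimated speedup again.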

Re: Searchable archive of this mailing list

2017-03-31 Thread Walter Underwood
MarkMail is also good. http://markmail.org/search/?q=solr-user#query:solr-user%20list%3Aorg.apache.lucene.solr-user+page:1+state:facets <http://markmail.org/search/?q=solr-user#query:solr-user list:org.apache.lucene.solr-user+page:1+state:facets> wunder Walter Underwood wun...@wunderwo

Re: Indexing speed reduced significantly with OCR

2017-03-30 Thread Walter Underwood
http://stackoverflow.com/questions/33588262/tesseract-ocr-on-aws-lambda-via-virtualenv/35724894#35724894 <http://stackoverflow.com/questions/33588262/tesseract-ocr-on-aws-lambda-via-virtualenv/35724894#35724894> wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.or

Re: Indexing speed reduced significantly with OCR

2017-03-28 Thread Walter Underwood
Converting from PDF to text is embarrassingly parallel. You can throw as many machines at it as you want. This is a great time to use a cloud computing service. Need 1000 machines? No problem. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar

Re: Cascading failures with replicas

2017-03-18 Thread Walter Underwood
6.3.0. No idea how it is happening, but I got two replicas on the same host after one host went down. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 18, 2017, at 8:35 PM, Erick Erickson <erickerick...@gmail.com> wrote: > > Hm

Re: Cascading failures with replicas

2017-03-18 Thread Walter Underwood
rate Amazon EC2 instances, one JVM per instance, no rules, other than the default. "maxShardsPerNode":"1", > bug #1 has been more or less of a pain for quite a while, work is ongoing > there. Glad to share our logs. wunder > FWIW, > Erick > > On

Cascading failures with replicas

2017-03-17 Thread Walter Underwood
equal traffic to each core without considering the host. Each host should get equal traffic, not each core. Bug #4 is putting two replicas from the same shard on one instance. That is just asking for trouble. When it works, this cluster is awesome. wunder Walter Underwood wun...@wunderwood.org

Re: Data Import

2017-03-17 Thread Walter Underwood
That fails if Solr is not available. To avoid dropping updates, you need some kind of persistent queue. We use Amazon SQS for our incremental updates. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 17, 2017, at 10:09 AM, OTH <

Re: Solr 6.3 will not stay connected to zookeeper

2017-03-16 Thread Walter Underwood
; org.apache.solr.util.SolrCLI$ZkCpTool; Could not complete the zk operation for reason: KeeperErrorCode = ConnectionLoss for /configs/tutors/solrconfig.xml ERROR: KeeperErrorCode = ConnectionLoss for /configs/tutors/solrconfig.xml wunder Walter Underwood wun...@wunderwood.org http

Re: Solr 6.3 will not stay connected to zookeeper

2017-03-15 Thread Walter Underwood
I have a pretty good guess what happened. I requested a Zookeeper 3.4.6 cluster, but they built a 3.4.9 cluster. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 15, 2017, at 4:38 PM, Walter Underwood <wun...@wunderwood.org&

Re: Solr 6.3 will not stay connected to zookeeper

2017-03-15 Thread Walter Underwood
. The largest file uploaded was 1094 bytes. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 15, 2017, at 4:27 PM, Walter Underwood <wun...@wunderwood.org> wrote: > > Python kazoo can talk to zookeeper, uploading these same files. Solr

Solr 6.3 will not stay connected to zookeeper

2017-03-15 Thread Walter Underwood
; zkClient has disconnected ERROR: Error uploading file /apps/solr6/server/solr/configsets/tutors/conf/schema.xml to zookeeper path /configs/tutors/schema.xml wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: Data Import Handler on 6.4.1

2017-03-15 Thread Walter Underwood
Also, upgrade to 6.4.2. There are serious performance problems in 6.4.0 and 6.4.1. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 15, 2017, at 12:05 PM, Liu, Daphne <daphne@cevalogistics.com> > wrote: > > For Solr 6.

Re: Index and query

2017-03-15 Thread Walter Underwood
, I recommend MarkLogic. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 15, 2017, at 11:02 AM, rangeli nepal <rangeli.ne...@gmail.com> wrote: > > Thank you Erick for such a prompt reply. I am bit confused. > Suppose I ha

Re: Need help with date boost

2017-03-13 Thread Walter Underwood
President Clinton vs the earlier President Clinton. Oops, that doesn’t work. Well, the example used to work with “President Bush”, but now they are both pretty far in the past. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 13, 2017, at 10:21

Re: I need to index files larger than 300 Mb, helpme please

2017-03-13 Thread Walter Underwood
00 MB is extremely large. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: Changing definition of id field

2017-03-13 Thread Walter Underwood
master, then replicate. 5. When finished, stop sending updates to the old master and turn it off. It is a hassle, but it is guaranteed to work. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 13, 2017, at 5:48 AM, Shawn Heisey <apa...@elyogr

Re: [ANNOUNCE] Apache Solr 6.4.2 released

2017-03-08 Thread Walter Underwood
and it needs more performance work. We are using New Relic for monitoring. That makes this sort of check very easy. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 8, 2017, at 8:24 AM, Shawn Heisey <apa...@elyograg.org> wrote: > > On 3

Re: Managed schema vs schema.xml

2017-03-07 Thread Walter Underwood
, then do an async reload. I’ve been thinking about time stamping the config directories so I can roll back to a previous config if the reload fails. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 7, 2017, at 12:47 PM, OTH <omer.t@gma

Re: Recommendation for production SOLR

2017-03-06 Thread Walter Underwood
We are going to production this week using 6.3.0. We don’t have time to re-run all the load benchmarks on 6.4.2. We’ll qualify 6.4.2 in a couple of weeks, then upgrade prod if it passes. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 6, 2

Re: Use case for the Shingle Filter

2017-03-04 Thread Walter Underwood
that are in your content, synonyms are a better solution. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 4, 2017, at 8:57 PM, Ryan Yacyshyn <ryan.yacys...@gmail.com> wrote: > > Hi everyone, > > I was thinking of using the Shingle

Re: Solr 6.3.0, possible SYN flooding on port 8983. Sending cookies.

2017-03-04 Thread Walter Underwood
similar on the same host as PHP. Connect to it locally, and let it pool connections to Solr. That will use Unix-local connections that don’t actually run TCP. Really, don’t try to fix networking inside PHP. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog

Re: Joining across collections with Nested documents

2017-03-03 Thread Walter Underwood
Make two denormalized collections. Just don’t join at query time. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 3, 2017, at 1:01 AM, Preeti Bhat <preeti.b...@shoregrp.com> wrote: > > We can't, they are being used for different

Re: Joining across collections with Nested documents

2017-03-02 Thread Walter Underwood
Make one collection with denormalized data. This looks like a relational, multi-table schema in Solr. That will be slow and painful. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 2, 2017, at 9:55 PM, Preeti Bhat <preeti.b...@shoreg

Re: What is the bottleneck for an optimise operation?

2017-03-02 Thread Walter Underwood
, G1 collector). I recommend a smaller heap so the OS can use that RAM to cache file buffers. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 2, 2017, at 7:04 AM, Caruana, Matthew <mcaru...@icij.org> wrote: > > I’m curre

Re: Updating 100 documents in one request

2017-03-01 Thread Walter Underwood
That is exactly what we do. The entire set of loaded documents is saved as JSONL in S3. Very handy for loading up a prod index in test for diagnosis or benchmarking. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 1, 2017, at 8:14 AM, R

Re: Updating 100 documents in one request

2017-03-01 Thread Walter Underwood
Since I always need to know which document was bad, I back off to batches of one document when there is a failure. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Mar 1, 2017, at 6:25 AM, Erick Erickson <erickerick...@gmail.com> wrote:
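
Roughly, the back-off looks like this. The send_batch argument is a stand-in for whatever call posts documents to Solr and raises on failure; it is an assumption of this sketch, not a real client API:

```python
def index_with_backoff(docs, send_batch, batch_size=100):
    """Send docs in batches; if a batch fails, resend it one document at a time.

    Returns the documents that still failed when sent alone.
    """
    bad_docs = []
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        try:
            send_batch(batch)
        except Exception:
            # Something in this batch is bad. Back off to batches of one to find it.
            for doc in batch:
                try:
                    send_batch([doc])
                except Exception:
                    bad_docs.append(doc)
    return bad_docs


# Fake sender for illustration: rejects any batch containing a doc marked bad.
sent_ids = []


def fake_send(batch):
    if any(doc.get("bad") for doc in batch):
        raise ValueError("bad document in batch")
    sent_ids.extend(doc["id"] for doc in batch)


example_docs = [{"id": i} for i in range(5)]
example_docs[3]["bad"] = True
failed = index_with_backoff(example_docs, fake_send, batch_size=2)
```

The good documents in a failed batch still get indexed, and the return value says exactly which documents were bad.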

Re: Need to modify boolean AND search

2017-02-28 Thread Walter Underwood
I strongly recommend using OR instead of AND. Misspellings are in about 10% of queries. Those tend to get zero results for many variations of AND or mostly-AND. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 28, 2017, at 11:54 AM, Nil

Re: Using parameter values in a sort

2017-02-27 Thread Walter Underwood
. This is a pretty vanilla solrconfig.xml. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 27, 2017, at 6:44 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote: > > `scores` (plural), you’ve got this below: > > Remove that, and li

Re: Using parameter values in a sort

2017-02-27 Thread Walter Underwood
I added that line because I was getting an error about it being undefined. At this point, I’m just doing random shit hoping it will work. There is not enough documentation to use this. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On

Re: Using parameter values in a sort

2017-02-27 Thread Walter Underwood
. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 27, 2017, at 6:35 PM, Erik Hatcher <erik.hatc...@gmail.com> wrote: > > You have an empty “scores” parameter in there. You’re not showing your full > search request, b

Re: Using parameter values in a sort

2017-02-27 Thread Walter Underwood
at org.apache.solr.util.PropertiesUtil.substituteProperty(PropertiesUtil.java:65) at org.apache.solr.util.DOMUtil.substituteProperties(DOMUtil.java:298) wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 27, 2017, at 6:17 PM, Erik Hatcher <erik.hatc...@gmail.com&

Re: solr warning - filling logs

2017-02-27 Thread Walter Underwood
of instances, because they will do majority voting. Three or five is a good number. Three works with one failure. Five works with one failure while you are upgrading the Zookeeper ensemble. We run a three node ensemble in test and a five node ensemble in prod. wunder Walter Underwood wun
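
The majority-voting arithmetic behind those numbers, as a quick sketch (the function names are only for illustration):

```python
def quorum_size(ensemble_size):
    """Smallest number of nodes that is a strict majority."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many nodes can be down while a majority is still up."""
    return ensemble_size - quorum_size(ensemble_size)


# ensemble size -> (quorum, nodes that may be down)
summary = {n: (quorum_size(n), tolerated_failures(n)) for n in (3, 4, 5)}
```

Three nodes survive one node down. Five survive two, which covers one node out for an upgrade plus one real failure. Four nodes tolerate no more than three do, so an even count buys nothing.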

Using parameter values in a sort

2017-02-27 Thread Walter Underwood
in the parameterized portion, like this: /handler?features=feature_a_4,feature_b_4,feature_c_4,feature_a_186,feature_b_186,feature c_186 Right now, I can’t even make a solrconfig.xml that will load. I’ve read everything I can find on params and function queries. wunder Walter Underwood wun

Re: Setting Solr data dir isn't really working (6.3.0)

2017-02-24 Thread Walter Underwood
Thanks! Now I need to write up the mistakes I made trying to use the solr command. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 24, 2017, at 11:17 AM, Erick Erickson <erickerick...@gmail.com> wrote: > > bq: Which mean

Re: Setting Solr data dir isn't really working (6.3.0)

2017-02-24 Thread Walter Underwood
Running with this, which works the way we want. /solr/data/${solr.core.name} wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 24, 2017, at 10:08 AM, Walter Underwood <wun...@wunderwood.org> wrote: > > Dang it. I know better t

Re: Setting Solr data dir isn't really working (6.3.0)

2017-02-24 Thread Walter Underwood
as Unix V7 (1979), with /var. It should be easy to do in Solr. I expected to see the shard names as directories under /solr/data, but I now remember that I need to set that with a variable. Time to delete everything and rebuild everything again. wunder Walter Underwood wun...@wunderwood.org http

Re: Setting Solr data dir isn't really working (6.3.0)

2017-02-23 Thread Walter Underwood
The bug is that the dataDir is /solr/data and the index data is in /apps/solr6/server/solr. Except for the suggest data. No index data should be outside the dataDir, right? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 23, 2017, at 6:11

Setting Solr data dir isn't really working (6.3.0)

2017-02-23 Thread Walter Underwood
under@new-solr-c02.test3]# Seems pretty broken to me. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: Replicas fail immediately in new collection

2017-02-23 Thread Walter Underwood
path has a newer (6.4+) > solrj version but an older solr-core jar that cannot find this new > method. > > On Sat, Feb 18, 2017 at 5:16 AM, Walter Underwood > <walter.r.underw...@gmail.com> wrote: >> Any idea why I would be getting this on a brand new, empty collectio

Re: Question about best way to architect a Solr application with many data sources

2017-02-21 Thread Walter Underwood
to reload whenever we need to, like loading prod data in test or moving search to a different Amazon region. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 21, 2017, at 7:34 PM, Erick Erickson <erickerick...@gmail.com> wrote: > > D

Re: java.util.concurrent.TimeoutException: Idle timeout expired: 50001/50000 ms

2017-02-21 Thread Walter Underwood
of breathing space above that. Not tons, because more old space garbage means longer collections. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 21, 2017, at 5:18 PM, Erick Erickson <erickerick...@gmail.com> wrote: > > Solr is very m

Re: Question about best way to architect a Solr application with many data sources

2017-02-21 Thread Walter Underwood
Awesome advice. flat=fast in Solr. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 21, 2017, at 5:17 PM, Dave <hastings.recurs...@gmail.com> wrote: > > B is a better option long term. Solr is meant for retrieving

Re: CPU Intensive Scoring Alternatives

2017-02-21 Thread Walter Underwood
BM25. Does host have enough RAM to hold most or all of the index in file buffers? What are the hit rates on your caches? Are you using fuzzy matches? N-gram prefix matching? Phrase matching? Shingles? What version of Java are you running? What garbage collector? wunder Walter Underwood wun

Re: Sorl 6 with jetty issues

2017-02-20 Thread Walter Underwood
Use Solr 6.3.0. For us, 6.4.x is using about 10X as much CPU under heavy query load. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 20, 2017, at 5:11 AM, Michael Kuhlmann <k...@solr.info> wrote: > > This may be related to SOLR

Replicas fail immediately in new collection

2017-02-17 Thread Walter Underwood
/String;)V at org.apache.solr.update.TransactionLog.writeCommit(TransactionLog.java:457) wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: Continual garbage collection loop

2017-02-14 Thread Walter Underwood
Yes, 512 MB is far too small. I’m surprised it even starts. We run with 8 GB. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 14, 2017, at 7:39 AM, Leon STRINGER <leon.strin...@ntlworld.com> wrote: > >> >>On 1

Re: Indexing slower on a better system

2017-02-13 Thread Walter Underwood
Sorry. Haven’t used Windows since seven years ago and haven’t run Windows as a server for more than a decade. I would not recommend using Windows as your Solr OS. Windows is just not designed for that. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog

Re: Indexing slower on a better system

2017-02-13 Thread Walter Underwood
ter (4 shards, 4-way replication factor) built with the c4.8xlarge instances. I’m running 64 indexing threads and 1000 doc batches. It might go a bit faster after we switch the cloud driver in SolrJ. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) >

Re: Indexing slower on a better system

2017-02-13 Thread Walter Underwood
Are you sure the server is faster? My MacBook Pro is a lot faster than many of our Amazon EC2 servers. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 13, 2017, at 8:12 PM, Zheng Lin Edwin Yeo <edwinye...@gmail.com> wrote: > &g

Re: Heads up: SOLR-10130, Performance issue in Solr 6.4.1

2017-02-13 Thread Walter Underwood
I’m seeing similar problems here. With 6.4.0, we were handling 6000 requests/minute. With 6.4.1 it is 1000 rpm with median response times around 2.5 seconds. I also switched to the G1 collector. I’m going to back that out and retest today to see if the performance comes back. wunder Walter

Problem with collection operations in 6.4.1?

2017-02-09 Thread Walter Underwood
hutting down a node times out after three minutes and needs a kill. And collection reload times out after three minutes. Did not have this problem with 6.2.1. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: values for fairnessPolicy?

2017-02-09 Thread Walter Underwood
? https://cwiki.apache.org/confluence/display/solr/Distributed+Requests <https://cwiki.apache.org/confluence/display/solr/Distributed+Requests> wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 9, 2017, at 10:13 AM, Walter Under

values for fairnessPolicy?

2017-02-09 Thread Walter Underwood
The default is “false”. I tried “true” and it fails because it can’t parse that as an int. The docs need to describe legal values for this. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: Removing duplicate terms from query

2017-02-09 Thread Walter Underwood
as much about New York, but it needs to be the best match for the query “new york new york”. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Feb 9, 2017, at 5:18 AM, Ere Maijala <ere.maij...@helsinki.fi> wrote: > > Thanks Emir. >
