Re: A bad idea to store core data directory over NAS?

2014-11-04 Thread Walter Underwood
I did that once by accident. It was 100X slower. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Nov 4, 2014, at 1:57 PM, Gili Nachum gilinac...@gmail.com wrote: My data center is out of SAN or local disk storage - is it a big no-no to store Solr core data

Re: A bad idea to store core data directory over NAS?

2014-11-05 Thread Walter Underwood
My experience was with Solr 1.2 and regular old NFS, so that was probably worst case. I was very surprised that it was that bad, though. So benchmark it before you assume it is fast enough. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Nov 5, 2014, at 12:27

Re: Solr Cloud Cross-Core Joins

2014-11-05 Thread Walter Underwood
I am curious why you are trying to do this with Solr. This is straightforward with other systems. I would use HBase for this. This could be really hard with Solr. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Nov 5, 2014, at 5:08 PM, Steve Davids sdav

Re: Solr exceptions during batch indexing

2014-11-07 Thread Walter Underwood
Yes, I implemented exactly that fallback for Solr 1.2 at Netflix. It isn’t too hard if the code is structured for it; retry with a batch size of 1. wunder On Nov 7, 2014, at 11:01 AM, Erick Erickson erickerick...@gmail.com wrote: Yeah, this has been an ongoing issue for a _long_ time.

Re: Solr exceptions during batch indexing

2014-11-07 Thread Walter Underwood
Right, that is why we batch. When a batch of 1000 fails, drop to a batch size of 1 and start the batch over. Then it can report the exact document with problems. If you want to continue, go back to the bigger batch size. I usually fail the whole batch on one error. wunder Walter Underwood wun
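
The fallback described above could be sketched roughly as follows. This is a minimal illustration, not the code used at Netflix: `send_batch` is a hypothetical callable standing in for whatever client call posts documents to Solr, and the sketch assumes that re-sending a document is harmless (for example because it overwrites by unique key).

```python
from typing import Callable, List, Sequence


def index_with_fallback(
    docs: Sequence[dict],
    send_batch: Callable[[List[dict]], None],  # hypothetical: posts docs, raises on error
    batch_size: int = 1000,
) -> List[dict]:
    """Send docs in batches; when a batch fails, replay it one doc at a time.

    Returns the documents that still failed when sent on their own.
    """
    bad_docs: List[dict] = []
    for start in range(0, len(docs), batch_size):
        batch = list(docs[start:start + batch_size])
        try:
            send_batch(batch)
        except Exception:
            # Start the failed batch over with a batch size of 1
            # so the exact problem document can be reported.
            for doc in batch:
                try:
                    send_batch([doc])
                except Exception:
                    bad_docs.append(doc)
        # The next iteration goes back to the bigger batch size.
    return bad_docs
```

This variant keeps going after a bad document. To fail the whole batch on one error instead, raise as soon as the single-document replay finds the culprit.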

Suggest dictionaries not rebuilding after restart

2014-11-13 Thread Walter Underwood
wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/

Re: Suggest dictionaries not rebuilding after restart

2014-11-13 Thread Walter Underwood
We get no suggestions until we force a build with suggest.build=true. Maybe we need to define a spellchecker component to get that behavior? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Nov 13, 2014, at 10:56 PM, Michael Sokolov msoko

Re: Suggest dictionaries not rebuilding after restart

2014-11-14 Thread Walter Underwood
That fixed it. I bet that would fix the problem with the very long startup that another user had. That’s a bug in the default solrconfig.xml, it should persist the dictionaries. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Nov 14, 2014, at 12:42 AM

Re: Documents to query

2014-11-24 Thread Walter Underwood
This feature is called “more like this”. I think it only works for a single document, but it probably could be extended. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Nov 24, 2014, at 10:26 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: Very unlikely

Re: confused about how to set a solr query timeout when using tomcat

2014-11-27 Thread Walter Underwood
Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Nov 27, 2014, at 12:56 PM, solr-user solr-u...@hotmail.com wrote: I inherited a set of some old 1.4x Solrs running under tomcat6/java6 while I will eventually upgrade them to a more recent solr/tomcat/java, I am unable

Re: Tika HTTP 400 Errors with DIH

2014-12-04 Thread Walter Underwood
No, 400 should mean that the request was bad. When the server fails, that is a 500. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Dec 4, 2014, at 8:43 AM, Alexandre Rafalovitch arafa...@gmail.com wrote: 400 error means something wrong on the server

Re: A field-wide remove duplicate tokens filter

2014-12-17 Thread Walter Underwood
Why is that useful? It breaks phrase search. If you want to ignore term frequency in ranking, change the Similarity class. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Dec 17, 2014, at 2:40 PM, Varun Rajput varun...@hotmail.com wrote

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Walter Underwood
You want preserveOriginal="1". You should only do this processing at index time. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Dec 30, 2014, at 9:33 AM, Jonathan Rochkind rochk...@jhu.edu wrote: Okay, thanks. I'm not sure if it's my lack of understanding

Re: WordDelimiter filter, expanding to multiple words, unexpected results

2014-12-30 Thread Walter Underwood
There are two approaches for the query “mixedCase” to match “mixed Case” in the original document. 1. Add an index time synonym. 2. Add a ShingleFilterFactory to the index analysis chain. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Dec 30, 2014, at 9:50 AM
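
To illustrate the second approach, here is a small sketch of what shingling does to a token stream. It is plain Python, not Solr code, and joining adjacent tokens with an empty separator is an assumption about how the filter would be configured for this case; the real filter has more options (for example, whether single tokens are also emitted).

```python
from typing import List


def shingles(tokens: List[str], size: int = 2, separator: str = "") -> List[str]:
    """Join each run of `size` adjacent tokens into a single token."""
    return [
        separator.join(tokens[i:i + size])
        for i in range(len(tokens) - size + 1)
    ]


# Lowercased tokens from a document containing "mixed Case".
indexed = shingles(["mixed", "case"])
# A lowercased query token "mixedcase" would then find a matching indexed term.
print("mixedcase" in indexed)
```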

Re: Exception while loading 2 Billion + Documents in Solr 4.8.0

2015-02-04 Thread Walter Underwood
. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Feb 4, 2015, at 2:07 PM, Jack Krupansky jack.krupan...@gmail.com wrote: What's your cluster size? The 2 billion limit is per-node. My personal recommendation is that you don't load more than 100 million documents

Re: Changing contextRoot from /solr to /

2015-02-06 Thread Walter Underwood
Put Apache in front of it and rewrite all the URLs. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Feb 6, 2015, at 6:08 AM, Andrea Gazzarini a.gazzar...@gmail.com wrote: Sorry I didn't read your email carefully: the rename workaround doesn't work if you

Re: How do you query a sentence composed of multiple words in a description field?

2015-01-22 Thread Walter Underwood
Your query is this: summary:Oracle Fusion Middleware That searches for “Oracle” in the summary field and “Fusion” and “Middleware” in whatever your default field is. You want: summary:"Oracle Fusion Middleware" wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org
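
A short sketch of building the phrase query as a request URL, using only the Python standard library. The base URL in the usage line is an assumption for illustration; the point is that the whole phrase sits inside straight double quotes after the field name, and that the quotes and colon get URL-encoded.

```python
from urllib.parse import urlencode


def phrase_query_url(base: str, field: str, phrase: str) -> str:
    """Build a /select URL that searches `field` for the exact phrase."""
    # Escape backslashes and embedded double quotes so the phrase stays one unit.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    query = f'{field}:"{escaped}"'
    return f"{base}/select?" + urlencode({"q": query, "wt": "json"})


# Hypothetical core URL, for illustration only.
print(phrase_query_url("http://localhost:8983/solr/nvd-rss", "summary", "Oracle Fusion Middleware"))
```

Without the quotes, only the first word is tied to the named field, which is the behavior described above.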

Re: GC tuning question - can improving GC pauses cause indexing to slow down?

2015-01-08 Thread Walter Underwood
collector, the goal there is 99 percent application time and 1 percent garbage collection time.” http://www.oracle.com/technetwork/articles/java/g1gc-1984535.html wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Jan 8, 2015, at 8:53 PM, Shawn Heisey apa

Re: GC tuning question - can improving GC pauses cause indexing to slow down?

2015-01-09 Thread Walter Underwood
throughput and pause: https://engineering.linkedin.com/garbage-collection/garbage-collection-optimization-high-throughput-and-low-latency-java-applications wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Jan 8, 2015, at 11:38 PM, Shawn Heisey apa...@elyograg.org wrote

Re: American /British Dictionary for solr-4.10.2

2015-02-12 Thread Walter Underwood
” suggested “arugula”. Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Feb 12, 2015, at 12:19 AM, Markus Jelsma markus.jel...@openindex.io wrote: There are no dictionaries that sum up all possible conjugations, using a heuristics based normalizer would be more

Re: questions about default operator within solr query string

2015-01-05 Thread Walter Underwood
=slug:entertainment&q=headline:entertainment Do you really want “sort=score%20asc”? That shows the least relevant items (lowest score) first. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Jan 5, 2015, at 3:30 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote

Re: Determining the Number of Solr Shards

2015-01-07 Thread Walter Underwood
This is described as “write heavy”, so I think that is 12,000 writes/second, not queries. Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Jan 7, 2015, at 5:16 PM, Shawn Heisey apa...@elyograg.org wrote: On 1/7/2015 3:29 PM, Nishanth S wrote: I am working on coming

Re: Solr TCP layer

2015-03-08 Thread Walter Underwood
. This was designed before HTTP, so I have an excuse. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Mar 8, 2015, at 8:15 PM, Shawn Heisey apa...@elyograg.org wrote: On 3/8/2015 2:05 PM, Saumitra Srivastav wrote: I want to start working on adding a TCP layer

Re: Cores and and ranking (search quality)

2015-03-10 Thread Walter Underwood
lots of docs for the language statistics to even out. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Mar 10, 2015, at 1:23 PM, johnmu...@aol.com wrote: Thanks Walter. The design decision I'm trying to solve is this: using multiple cores, will my

Re: Solr TCP layer

2015-03-10 Thread Walter Underwood
I would strongly recommend taking a look at HTTP/2. It might not be fast enough for you, but it is fast enough for Google and there are already implementations. http://http2.github.io/faq/ wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Mar 10, 2015

Re: Cores and and ranking (search quality)

2015-03-10 Thread Walter Underwood
of the docs. idf statistics don’t settle down until at least 10K docs. You still sometimes see anomalies under a million documents. What design decision do you need to make? We can probably answer that for you. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: SOLR Index in shared/Network folder

2015-03-30 Thread Walter Underwood
Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Mar 29, 2015, at 10:42 PM, abhi Abhishek abhi26...@gmail.com wrote: Hello, Thanks for the suggestions. My aim is to reduce the disk space usage. I have 1 master with 2 slave configured, where slaves

Re: SOLR Index in shared/Network folder

2015-03-27 Thread Walter Underwood
Several years ago, I accidentally put Solr indexes on an NFS volume and it was 100X slower. If you have enough RAM, query speed should be OK, but startup time (loading indexes into file buffers) could be really long. Indexing could be quite slow. wunder Walter Underwood wun...@wunderwood.org

Re: syntax for increasing java memory

2015-02-23 Thread Walter Underwood
That depends on the JVM you are using. For the Oracle JVMs, use this to get a list of extended options: java -X wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Feb 23, 2015, at 8:21 AM, Kevin Laurie superinterstel...@gmail.com wrote: Hi Guys, I

Re: Performing DIH on predefined list of IDS

2015-02-21 Thread Walter Underwood
, you may need to re-think your design. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Feb 21, 2015, at 4:45 PM, Shawn Heisey apa...@elyograg.org wrote: On 2/21/2015 1:46 AM, steve wrote: Careful with the GETs! There is a real, hard limit on the length

Re: Committed before 500

2015-02-20 Thread Walter Underwood
Since you are getting these failures, the 90 second timeout is not “good enough”. Try increasing it. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Feb 20, 2015, at 5:22 AM, NareshJakher naresh.jak...@capgemini.com wrote: Hi Shawn, I do

Re: Basic Multilingual search capability

2015-02-23 Thread Walter Underwood
-insensitive approach. But it hits the wall pretty fast. One thing that does work pretty well is trademarked names (LaserJet, Coke, etc). Those are spelled the same in all languages and usually not inflected. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Feb

Re: Performing DIH on predefined list of IDS

2015-02-21 Thread Walter Underwood
The HTTP protocol does not set a limit on GET URL size, but individual web servers usually do. You should get a response code of “414 Request-URI Too Long” when the URL is too long. This limit is usually configurable. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org

Re: how to debug solr performance degradation

2015-02-24 Thread Walter Underwood
The other memory is used by the OS as file buffers. All the important parts of the on-disk search index are buffered in memory. When the Solr process wants a block, it is already right there, no delays for disk access. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org

Re: How do you query a sentence composed of multiple words in a description field?

2015-01-23 Thread Walter Underwood
. http://localhost:8983/solr/nvd-rss/select?wt=json&indent=true&q=summary%3A%22Oracle+Fusion%22 wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ On Jan 23, 2015, at 7:08 AM, Carl Roberts carl.roberts.zap...@gmail.com wrote: Thanks Erick, I think I am going to start

Re: Differentiating user search term in Solr

2015-04-20 Thread Walter Underwood
. * +/- * .hack//Roots * p=mv wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Apr 20, 2015, at 5:52 PM, Steven White swhite4...@gmail.com wrote: Hi Erick, I think you missed my point. My request is, Solr support a new URL parameter. If this parameter is set

Re: CDATA response is coming with &lt; instead of <

2015-04-21 Thread Walter Underwood
Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Apr 21, 2015, at 7:10 AM, mesenthil1 senthilkumar.arumu...@viacomcontractor.com wrote: Thanks. For wt=json, it is bringing the results properly. I understand the reason for getting this in &lt;. As our solr

Re: Bad contentType for search handler :text/xml; charset=UTF-8

2015-04-22 Thread Walter Underwood
text/xml is not a safe content-type, because of the way that HTTP handles charsets. Always use application/xml. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Apr 22, 2015, at 3:01 AM, bengates benga...@aliceadsl.fr wrote: Looks like Solarium

Re: Multiple index.timestamp directories using up disk space

2015-05-04 Thread Walter Underwood
a stable Solr installation. You should consider a different search engine. “Optimizing” (forced merges) will not help. It will probably cause failures more often because it always merges the largest segment. Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On May

Re: indexing java byte code in classes / jars

2015-05-11 Thread Walter Underwood
How about Krugle? http://opensearch.krugle.org/ Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On May 11, 2015, at 3:18 AM, Tomasz Borek tomasz.bo...@gmail.com wrote: There's also Perl-backed ACK. http://beyondgrep.com/ Which does the job of searching

Re: Transactional Behavior

2015-05-14 Thread Walter Underwood
. It is a full-featured database that includes search features. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On May 14, 2015, at 6:12 AM, Emir Arnautovic emir.arnauto...@sematext.com wrote: Hi Amr, As far as I am aware, SOLR does not support transaction

Re: Is it possible to search for the empty string?

2015-05-18 Thread Walter Underwood
with empty values. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On May 18, 2015, at 5:56 PM, Shawn Heisey apa...@elyograg.org wrote: Can I search for the empty string? This is distinct from searching for documents that don't have a certain fieldat

Re: Indexing PDF and MS Office files

2015-04-16 Thread Walter Underwood
Turning PDF back into a structured document is like trying to turn hamburger back into a cow. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Apr 16, 2015, at 4:55 AM, Allison, Timothy B. talli...@mitre.org wrote: +1 :) PS: one more thing

Re: Measuring QPS

2015-04-06 Thread Walter Underwood
to get percentiles. The complicated part of the servlet filter was getting it configured in Tomcat. The code itself is not too bad. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Apr 6, 2015, at 1:49 PM, Siegfried Goeschl sgoes...@gmx.at wrote

Re: Measuring QPS

2015-04-06 Thread Walter Underwood
the front end through to Solr. For load testing, we replay production logs to test that we meet the SLA at a given traffic level. Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Apr 6, 2015, at 11:31 AM, Davis, Daniel (NIH/NLM) [C] daniel.da...@nih.gov wrote

Re: Measuring QPS

2015-04-06 Thread Walter Underwood
That sounds neat. Our QA people are moving to Gatling, so we probably won’t change our JMeter approach now. We use the JMeter Plugins CMDRunner, telling it to generate only CSV. http://jmeter-plugins.org/wiki/JMeterPluginsCMD/ Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org

Re: Suggestion on field type

2015-05-19 Thread Walter Underwood
store a JSON blob in Solr with the exact values, and use approximate fields to narrow things down. Of course, MarkLogic has a graceful interface to Hadoop. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On May 19, 2015, at 4:09 PM, Erick Erickson erickerick

Re: Edismax

2015-05-20 Thread Walter Underwood
I highly recommend using boost= in edismax rather than bq=. The multiplicative boost is stable with a wide range of scores. bq is additive and has problems with high or low scores. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On May 20, 2015, at 1:04
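
The difference can be seen with some simplified arithmetic. This is a sketch of additive versus multiplicative adjustment in general, not Solr's actual scoring formula, and the scores, bonus, and factor are made-up values.

```python
def additive_lift(base_score: float, bonus: float) -> float:
    """Relative change when a fixed bonus is added to the score."""
    return (base_score + bonus) / base_score


def multiplicative_lift(base_score: float, factor: float) -> float:
    """Relative change when the score is multiplied by a factor."""
    return (base_score * factor) / base_score


# Two hypothetical queries: one whose matches score low, one whose matches score high.
for base in (0.1, 10.0):
    print(base, additive_lift(base, bonus=1.0), multiplicative_lift(base, factor=1.5))
```

With the additive bonus, the low-scoring document is lifted roughly elevenfold while the high-scoring one barely moves; the multiplicative factor has the same relative effect on both.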

Re: Edismax

2015-05-20 Thread Walter Underwood
I believe that boost is a superset of the bq functionality. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On May 20, 2015, at 1:16 PM, John Blythe j...@curvolabs.com wrote: could i do that the same way as my mention of using bq? the docs aren't very

Re: Edismax

2015-05-20 Thread Walter Underwood
I was going to post the same advice. If your approach depends on absolute scores, you need to change your approach. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On May 20, 2015, at 2:09 PM, Shawn Heisey apa...@elyograg.org wrote: On 5/20/2015 2:54

Re: How to identify field names from the suggested values in multiple fields

2015-06-03 Thread Walter Underwood
Configure two suggesters, one based on each field. Use both of them and you’ll get separate suggestions from each. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Jun 3, 2015, at 10:03 PM, Dhanesh Radhakrishnan dhan...@hifx.co.in wrote: Hi Anyone

Re: SolrCloud across Amazon Regions?

2015-06-07 Thread Walter Underwood
across three regions (or AZs), the ensemble can survive a single failure of any of them. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Jun 7, 2015, at 12:31 PM, William Bell billnb...@gmail.com wrote: Here is a weird architecture... We have a SOLR
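
The reasoning rests on ZooKeeper needing a majority of the ensemble to stay up. A small sketch of that arithmetic, where the node counts per region are hypothetical layouts chosen for illustration:

```python
from typing import Sequence


def quorum_size(ensemble_size: int) -> int:
    """Smallest number of nodes that forms a strict majority."""
    return ensemble_size // 2 + 1


def survives_any_single_site_loss(nodes_per_site: Sequence[int]) -> bool:
    """True if losing any one site still leaves a majority of the ensemble."""
    total = sum(nodes_per_site)
    needed = quorum_size(total)
    return all(total - lost >= needed for lost in nodes_per_site)


print(survives_any_single_site_loss([1, 1, 1]))  # one node in each of three regions
print(survives_any_single_site_loss([2, 1]))     # only two regions
```

With only two sites, whichever one holds the majority becomes a single point of failure, which is why three are needed.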

Re: Index optimize runs in background.

2015-06-11 Thread Walter Underwood
for managing index segments. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Jun 10, 2015, at 10:35 PM, Erick Erickson erickerick...@gmail.com wrote: If I knew, I would fix it ;). The sub-optimizes (i.e. the ones sent out to each replica) should be sent

Chef recipes for Solr

2015-06-01 Thread Walter Underwood
Anyone have Chef recipes they like for deploying Solr? I’d especially appreciate one for uploading the configs directly to a Zookeeper ensemble. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: Chef recipes for Solr

2015-06-01 Thread Walter Underwood
That sounds great. Someone else here will be making the recipes, so I’ll put him in touch with you. As always, this is a really helpful list. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Jun 1, 2015, at 10:20 PM, Upayavira u...@odoko.co.uk wrote

Re: optimize status

2015-06-29 Thread Walter Underwood
“Optimize” is a manual full merge. Solr automatically merges segments as needed. This also expunges deleted documents. We really need to rename “optimize” to “force merge”. Is there a Jira for that? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog

Re: Solr relevancy score in percentage

2015-05-26 Thread Walter Underwood
is not a probabilistic engine, it is a vector space engine. The scores are fundamentally different. Treating it as a probability of relevance will not work. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: Admin Login

2015-08-15 Thread Walter Underwood
No one runs a public-facing Solr server. Just like no one runs a public-facing MySQL server. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Aug 15, 2015, at 4:15 PM, Scott Derrick sc...@tnstaafl.net wrote: I'm somewhat puzzled there is no built

Re: Cache

2015-08-19 Thread Walter Underwood
Why? Do you evaluate Unix performance with and without file buffers? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Aug 19, 2015, at 5:00 PM, Nagasharath sharathrayap...@gmail.com wrote: Trying to evaluate the performance of queries

Re: Multiple concurrent queries to Solr

2015-08-23 Thread Walter Underwood
, it can block. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Aug 23, 2015, at 8:49 AM, Shawn Heisey apa...@elyograg.org wrote: On 8/23/2015 7:46 AM, Ashish Mukherjee wrote: I want to run few Solr queries in parallel, which are being done in a multi

Re: Correcting text at index time

2015-06-29 Thread Walter Underwood
Yes, do this in an update request processor before it gets to the analyzer chain. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Jun 29, 2015, at 3:19 PM, Erick Erickson erickerick...@gmail.com wrote: Hmmm, very hard to do currently. The _point_

Absolute path name for external file field

2015-08-13 Thread Walter Underwood
. Is that still possible? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog)

Re: Which default Boolean operator to set, AND or OR?

2015-07-15 Thread Walter Underwood
The AND default has one big problem. If the user misspells a single word, they get no results. About 10% of queries are misspelled, so that means a lot more failures. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Jul 15, 2015, at 7:21 AM, Jack

Re: LIX readability index calculation by solr

2015-10-21 Thread Walter Underwood
Can you reload all the content? If so, I would calculate this in an update request processor and put the result in its own field. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 21, 2015, at 2:53 AM, Roland Szűcs <roland.sz...@booknwa

Re: [newbie] Configuration for SolrCloud + DataImportHandler

2015-10-21 Thread Walter Underwood
Does the collection reload do a rolling reload of each node or does it do them all at once? We were planning on using the core reload on each system, one at a time. That would make sure the collection stays available. I read the documentation, it didn’t say anything about that. wunder Walter

Re: Best strategy for indexing multiple tables with multiple fields

2015-10-26 Thread Walter Underwood
with tens of thousands of fields. A thousand fields might be cumbersome, but it won’t break Solr. If the tables contain different kinds of things, you might have different collections (one per document), or one collection with a “type” field for each kind of document. wunder Walter Underwood

Re: restore quorum after majority of zk nodes down

2015-10-29 Thread Walter Underwood
the Solr cluster to talk to it. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 29, 2015, at 10:08 AM, Matteo Grolla <matteo.gro...@gmail.com> wrote: > > I'm designing a solr cloud installation where nodes from a single cluster &g

Re: Best way to track cumulative GC pauses in Solr

2015-11-13 Thread Walter Underwood
Also, what GC settings are you using? We may be able to make some suggestions. Cumulative GC pauses aren’t very interesting to me. I’m more interested in the longest ones, 90th percentile, 95th, etc. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog
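
As a rough sketch of looking at the longest pauses and percentiles rather than the cumulative total: the function below uses the nearest-rank method, and it assumes the pause durations have already been extracted from the GC log into a list (the parsing step is not shown, and the sample values are made up).

```python
import math
from typing import Sequence


def percentile(values: Sequence[float], pct: int) -> float:
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not values:
        raise ValueError("no samples")
    ordered = sorted(values)
    rank = max(1, math.ceil(pct * len(ordered) / 100))
    return ordered[rank - 1]


# Hypothetical GC pause durations in milliseconds.
pauses_ms = [12, 15, 11, 240, 14, 13, 18, 900, 16, 12]
print("longest:", max(pauses_ms))
print("90th:", percentile(pauses_ms, 90))
print("95th:", percentile(pauses_ms, 95))
```

A cumulative figure would hide the two long pauses in this sample; the maximum and the upper percentiles expose them.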

Re: Solr logging in local time

2015-11-16 Thread Walter Underwood
I’m sure it is possible, but think twice before logging in local time. Do you really want one day with 23 hours and one day with 25 hours each year? wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Nov 16, 2015, at 8:04 AM, tedsolr &

Re: Very high memory and CPU utilization.

2015-11-02 Thread Walter Underwood
Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Nov 2, 2015, at 9:39 PM, Modassar Ather <modather1...@gmail.com> wrote: > > Thanks Walter for your response, > > It is around 90GB of index (around 8 million documents) on one shard and >

Re: Fastest way to import a giant word list into Solr/Lucene?

2015-10-30 Thread Walter Underwood
this short article to learn more about spelling correction. http://norvig.com/spell-correct.html wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 30, 2015, at 4:45 PM, Robert Oschler <robert.osch.

Re: Fastest way to import a giant word list into Solr/Lucene?

2015-10-30 Thread Walter Underwood
Read the links I have sent. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 30, 2015, at 7:10 PM, Robert Oschler <robert.osch...@gmail.com> wrote: > > Thanks Walter. Are there any open source spell checkers that implement the

Re: Is it impossible to update an index that is undergoing an optimize?

2015-11-06 Thread Walter Underwood
It is pretty handy, though. Great for expunging docs that are marked deleted or are expired. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Nov 6, 2015, at 5:31 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote: > > Elasti

Re: Very high memory and CPU utilization.

2015-11-02 Thread Walter Underwood
use the EdgeNgramFilter to index prefixes. That will make your index larger, but prefix searches will be very fast. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Nov 2, 2015, at 5:17 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote: &g
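
To show why indexing prefixes makes the index larger but prefix lookups cheap, here is a plain-Python sketch of the terms an edge n-gram filter would emit for one token. The `min_gram` and `max_gram` parameters mirror the filter's minimum and maximum gram sizes; the values used are assumptions for illustration.

```python
from typing import List


def edge_ngrams(term: str, min_gram: int = 1, max_gram: int = 10) -> List[str]:
    """Return the leading substrings of `term` between min_gram and max_gram characters."""
    longest = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, longest + 1)]


# Each prefix becomes its own indexed term, so a prefix search is an exact term lookup.
print(edge_ngrams("solr"))
```

One token turns into several indexed terms, which is where the extra index size comes from.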

Re: Boosting a document score when advertised! Please help!

2015-11-05 Thread Walter Underwood
this approach is nice and clear. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Nov 5, 2015, at 3:33 AM, Alessandro Benedetti <abenede...@apache.org> > wrote: > > Hi Christian, > there are several ways : > > 1) Elevation

Re: Solr getting irrelevant results when use block join

2015-10-31 Thread Walter Underwood
This will probably work better without child documents and joins. I would denormalize into actor documents and movie documents. At least, that’s what I did at Netflix. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 31, 2015, at 1:17

Re: Fastest way to import a giant word list into Solr/Lucene?

2015-10-30 Thread Walter Underwood
g fast. In only 21 lines of Python. http://norvig.com/spell-correct.html wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 30, 2015, at 11:37 AM, Robert Oschler <robert.osch...@gmail.com> wrote

Re: How to show some documents ahead of others

2015-10-08 Thread Walter Underwood
items using the “boost” parameter in edismax. Adjust it to be a tiebreaker between documents with similar score. 2. Show two lists, one with the five most relevant paid, the next with the five most relevant unpaid. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my

Re: How to show some documents ahead of others - requirements

2015-10-10 Thread Walter Underwood
thing. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 10, 2015, at 9:31 AM, Erick Erickson <erickerick...@gmail.com> wrote: > > Would result grouping work here? If the group key was "paid", then > you'd get two gr

Re: Exclude documents having same data in two fields

2015-10-10 Thread Walter Underwood
After several days, we finally get the real requirement. It really does waste a lot of time and energy when people won’t tell us that. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 10, 2015, at 8:19 AM, Upayavira <u...@odoko.co.uk&

Re: catchall fields or multiple fields

2015-10-12 Thread Walter Underwood
a phonetic representation, then you can weight the lower case higher than the stemmed field, and stemmed higher than phonetic. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 12, 2015, at 6:12 AM, Ahmet Arslan <iori...@yahoo.com.INVALID&

Re: How to disable the admin interface

2015-10-05 Thread Walter Underwood
You understand that disabling the admin API will leave you with an unmaintainable Solr installation, right? You might not even be able to diagnose the problem. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 5, 2015, at 11:34 AM, Siddhar

Re: EdgeNGramFilterFactory question

2015-10-07 Thread Walter Underwood
in different analysis chains stored in separate fields. The exact example you list will work fine with stemming and phrase search. Check out the phrase search support in the edismax query parser. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oc

Re: Pressed optimize and now SOLR is not indexing while optimize is going on

2015-10-07 Thread Walter Underwood
LDP/sag/html/buffer-cache.html> wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 7, 2015, at 3:40 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote: > > On Wed, 2015-10-07 at 07:03 -0300, Eric Torti wrote: >> I'm sorry to d

Re: Exclude documents having same data in two fields

2015-10-09 Thread Walter Underwood
Please explain why you do not want to use an extra field. That is the only solution that will perform well on your large index. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 9, 2015, at 7:47 AM, Aman Tandon <amantandon...@gmail.com&

Re: Auto-suggest in Solr

2015-07-11 Thread Walter Underwood
Thanks, this is very helpful. Suggester config is quite underdocumented. It took me longer than I expected to get it working. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Jul 10, 2015, at 6:30 PM, Alessandro Benedetti benedetti.ale...@gmail.com

Re: Search inconsistency on Solr

2015-07-07 Thread Walter Underwood
We test the order of results, not the exact score. Score values depend on the number of documents in the index. Also, the order is the only thing we care about. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Jul 7, 2015, at 12:40 AM, joseph paulo

Re: Strange interpretation of invalid ISO date strings

2015-09-07 Thread Walter Underwood
Yes, ISO 8601 gets pretty baroque in the far nooks and crannies of the spec. I use the “web profile” of ISO 8601, which is very simple. I’ve never seen any software mishandle dates using this subset of the spec. http://www.w3.org/TR/NOTE-datetime wunder Walter Underwood wun...@wunderwood.org
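
A minimal sketch of sticking to that simple subset when producing dates: always emit a complete date and time in UTC with a trailing "Z". The function name is hypothetical, and choosing UTC with whole seconds is one conservative reading of the profile, not the only form it allows.

```python
from datetime import datetime, timedelta, timezone


def to_w3c_datetime(dt: datetime) -> str:
    """Format an aware datetime as YYYY-MM-DDThh:mm:ssZ in UTC."""
    if dt.tzinfo is None:
        raise ValueError("refusing to guess the time zone of a naive datetime")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


pacific_daylight = timezone(timedelta(hours=-7))
print(to_w3c_datetime(datetime(2015, 9, 7, 5, 30, 0, tzinfo=pacific_daylight)))
```

Rejecting naive datetimes keeps ambiguous local times from slipping into the index.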

Re: any easy way to find out when a core's index physical file has been last updated?

2015-09-03 Thread Walter Underwood
Instead of writing new code, you could configure an autocommit interval in Solr. That already does what you want, no more than one commit in the interval and no commits if there were no adds or deletes. Then the clients would never need to commit. wunder Walter Underwood wun...@wunderwood.org

Re: Solr facets implementation question

2015-09-08 Thread Walter Underwood
Every faceting implementation I’ve seen (not just Solr/Lucene) makes big in-memory lists. Lots of values means a bigger list. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) On Sep 8, 2015, at 8:33 AM, Shawn Heisey <apa...@elyograg.org> wrote: >

Re: Detect term occurrences

2015-09-10 Thread Walter Underwood
Doing a query for each term should work well. Solr is fast for queries. Write a script. I assume you only need to do this once. Running all the queries will probably take less time than figuring out a different approach. wunder Walter Underwood wun...@wunderwood.org http

Re: Cost of having multiple search handlers?

2015-09-28 Thread Walter Underwood
We did the same thing, but reporting performance metrics to Graphite. But we won’t be able to add servlet filters in 6.x, because it won’t be a webapp. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 28, 2015, at 11:32 AM, Gili Nachum <g

Re: Solr vs Lucene

2015-10-01 Thread Walter Underwood
If you want a spell checker, don’t use a search engine. Use a spell checker. Something like aspell (http://aspell.net/) will be faster and better than Solr. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Oct 1, 2015

Re: Cost of having multiple search handlers?

2015-09-28 Thread Walter Underwood
We built our own because there was no movement on that. Don’t hold your breath. Glad to contribute it. We’ve been running it in production for a year, but the config is pretty manual. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 28, 2

Re: solr get score of each doc in edis max search and more like this search result

2015-09-23 Thread Walter Underwood
limit will almost certainly not do what you want, because it doesn’t do anything useful. I recommend reading this document for more info: https://wiki.apache.org/lucene-java/ScoresAsPercentages wunder Walter Underwo

Re: is there a way to remove deleted documents from index without optimize

2015-09-22 Thread Walter Underwood
Don’t do anything. Solr will automatically clean up the deleted documents for you. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 22, 2015, at 6:01 PM, CrazyDiamond <crazy_diam...@mail.ru> wrote: > > my index is updating freque

Re: faceting is unusable slow since upgrade to 5.3.0

2015-09-22 Thread Walter Underwood
Faceting on an author field is almost always a bad idea. Or at least a slow, expensive idea. Faceting makes big in-memory lists. More values, bigger lists. An author field usually has many, many values, so you will need a lot of memory. wunder Walter Underwood wun...@wunderwood.org http

Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Walter Underwood
Sure. 1. Delete all the docs (no commit). 2. Add all the docs (no commit). 3. Commit. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 25, 2015, at 2:17 PM, Ravi Solr <ravis...@gmail.com> wrote: > > I have been trying to re
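The three steps can be sketched as three JSON bodies for Solr's update handler, assuming the JSON update format is in use. The helper name `reindex_payloads` is hypothetical, and actually sending each body (an HTTP POST to the collection's `/update` endpoint, with no commit parameter on the first two) is deliberately left out.

```python
import json


def reindex_payloads(docs):
    """Build the request bodies for delete-all, add-all, commit, in order."""
    return [
        # 1. Delete every document; sent without a commit.
        json.dumps({"delete": {"query": "*:*"}}),
        # 2. Add the full document set; also sent without a commit.
        json.dumps(list(docs)),
        # 3. One explicit commit makes the new index visible.
        json.dumps({"commit": {}}),
    ]


# Hypothetical documents, for illustration only.
payloads = reindex_payloads([{"id": "1", "title": "first"},
                             {"id": "2", "title": "second"}])
```

Searches keep seeing the old index until the final commit. That presumably holds only if no server-side autocommit opens a new searcher partway through, so check the autocommit settings before relying on it.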

Re: firstSearcher cache warming with own QuerySenderListener

2015-09-25 Thread Walter Underwood
Right. I chose the twenty most frequent terms from our documents and use those for cache warming. The list of most frequent terms is pretty stable in most collections. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 25, 2015, at 8:38
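One way to pick such a list, shown only as a sketch: count terms across the document text and keep the top twenty. The function `top_terms` and its naive `\w+` tokenizer are assumptions. A real setup would more likely read term counts from the index itself, and would want stopwords handled the same way the field's analysis chain handles them.

```python
import re
from collections import Counter


def top_terms(texts, n=20):
    """Return the n most frequent lowercase terms across all texts."""
    counts = Counter()
    for text in texts:
        # Naive tokenizer: runs of word characters, lowercased.
        counts.update(re.findall(r"\w+", text.lower()))
    return [term for term, _ in counts.most_common(n)]


# Hypothetical document texts.
sample = ["solr solr lucene", "solr lucene index"]
warming_terms = top_terms(sample, n=2)
```

Each resulting term would then become one warming query in the firstSearcher listener configuration.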

Re: bulk reindexing 5.3.0 issue

2015-09-25 Thread Walter Underwood
of them. No guarantee, but it is worth a try. Good luck. wunder Walter Underwood wun...@wunderwood.org http://observer.wunderwood.org/ (my blog) > On Sep 25, 2015, at 2:59 PM, Ravi Solr <ravis...@gmail.com> wrote: > > Walter, Not in a mood for banter right now Its 6:00pm on a f
