I did that once by accident. It was 100X slower.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
On Nov 4, 2014, at 1:57 PM, Gili Nachum gilinac...@gmail.com wrote:
My data center is out of SAN or local disk storage - is it a big no-no to
store Solr core data
My experience was with Solr 1.2 and regular old NFS, so that was probably worst
case. I was very surprised that it was that bad, though.
So benchmark it before you assume it is fast enough.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
On Nov 5, 2014, at 12:27
I am curious why you are trying to do this with Solr. This is straightforward
with other systems. I would use HBase for this. This could be really hard with
Solr.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
On Nov 5, 2014, at 5:08 PM, Steve Davids sdav
Yes, I implemented exactly that fallback for Solr 1.2 at Netflix.
It isn’t too hard if the code is structured for it: retry with a batch size of 1.
wunder
On Nov 7, 2014, at 11:01 AM, Erick Erickson erickerick...@gmail.com wrote:
Yeah, this has been an ongoing issue for a _long_ time.
Right, that is why we batch.
When a batch of 1000 fails, drop to a batch size of 1 and start the batch over.
Then it can report the exact document with problems.
If you want to continue, go back to the bigger batch size. I usually fail the
whole batch on one error.
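A minimal sketch of that fallback; `index_batch` here is a hypothetical function that indexes a list of documents and raises on failure:

```python
def index_with_fallback(docs, index_batch, batch_size=1000):
    """Index docs in batches; when a batch fails, replay it one
    document at a time so the exact bad document can be reported."""
    failed = []
    for start in range(0, len(docs), batch_size):
        batch = docs[start:start + batch_size]
        try:
            index_batch(batch)
        except Exception:
            # Drop to a batch size of 1 and retry the failed batch.
            for doc in batch:
                try:
                    index_batch([doc])
                except Exception as err:
                    failed.append((doc, err))  # the offending document
    return failed
```

Whether to stop on the first failure or collect them all (as above) is a policy choice; failing the whole load on one error, as described, just means returning early instead.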
wunder
Walter Underwood
wun
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
We get no suggestions until we force a build with suggest.build=true. Maybe we
need to define a spellchecker component to get that behavior?
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
On Nov 13, 2014, at 10:56 PM, Michael Sokolov msoko
That fixed it.
I bet that would fix the problem with the very long startup that another user
had. That’s a bug in the default solrconfig.xml; it should persist the
dictionaries.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
On Nov 14, 2014, at 12:42 AM
This feature is called “more like this”. I think it only works for a single
document, but it probably could be extended.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
On Nov 24, 2014, at 10:26 AM, Alexandre Rafalovitch arafa...@gmail.com wrote:
Very unlikely
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
On Nov 27, 2014, at 12:56 PM, solr-user solr-u...@hotmail.com wrote:
I inherited a set of some old 1.4x Solrs running under tomcat6/java6
while I will eventually upgrade them to a more recent solr/tomcat/java, I am
unable
No, 400 should mean that the request was bad. When the server fails, that is a
500.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
On Dec 4, 2014, at 8:43 AM, Alexandre Rafalovitch arafa...@gmail.com wrote:
400 error means something wrong on the server
Why is that useful? It breaks phrase search.
If you want to ignore term frequency in ranking, change the Similarity class.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
On Dec 17, 2014, at 2:40 PM, Varun Rajput varun...@hotmail.com wrote
You want preserveOriginal="1".
You should only do this processing at index time.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
On Dec 30, 2014, at 9:33 AM, Jonathan Rochkind rochk...@jhu.edu wrote:
Okay, thanks. I'm not sure if it's my lack of understanding
There are two approaches for the query “mixedCase” to match “mixed Case” in the
original document.
1. Add an index time synonym.
2. Add a ShingleFilterFactory to the index analysis chain.
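A schema sketch of the second approach (the field type name is illustrative): index-time shingles join adjacent tokens with no separator, so “mixed Case” also indexes the token “mixedcase”.

```xml
<fieldType name="text_shingle" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- emit joined two-word shingles alongside the single tokens -->
    <filter class="solr.ShingleFilterFactory" maxShingleSize="2"
            outputUnigrams="true" tokenSeparator=""/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```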
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
On Dec 30, 2014, at 9:50 AM
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
On Feb 4, 2015, at 2:07 PM, Jack Krupansky jack.krupan...@gmail.com wrote:
What's your cluster size? The 2 billion limit is per-node.
My personal recommendation is that you don't load more than 100 million
documents
Put Apache in front of it and rewrite all the URLs.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
On Feb 6, 2015, at 6:08 AM, Andrea Gazzarini a.gazzar...@gmail.com wrote:
Sorry I didn't read your email carefully: the rename workaround doesn't work
if you
Your query is this:
summary:Oracle Fusion Middleware
That searches for “Oracle” in the summary field and “Fusion” and “Middleware”
in whatever your default field is.
You want:
summary:"Oracle Fusion Middleware"
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org
collector, the goal there is 99 percent application time and 1 percent garbage
collection time.”
http://www.oracle.com/technetwork/articles/java/g1gc-1984535.html
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
On Jan 8, 2015, at 8:53 PM, Shawn Heisey apa
throughput and pause:
https://engineering.linkedin.com/garbage-collection/garbage-collection-optimization-high-throughput-and-low-latency-java-applications
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
On Jan 8, 2015, at 11:38 PM, Shawn Heisey apa...@elyograg.org wrote
” suggested “arugula”.
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Feb 12, 2015, at 12:19 AM, Markus Jelsma markus.jel...@openindex.io wrote:
There are no dictionaries that sum up all possible conjugations, using a
heuristics based normalizer would be more
=slug:entertainment&q=headline:entertainment
Do you really want sort=score%20asc? That shows the least relevant items
(lowest score) first.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
On Jan 5, 2015, at 3:30 AM, Otis Gospodnetic otis.gospodne...@gmail.com wrote
This is described as “write heavy”, so I think that is 12,000 writes/second,
not queries.
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
On Jan 7, 2015, at 5:16 PM, Shawn Heisey apa...@elyograg.org wrote:
On 1/7/2015 3:29 PM, Nishanth S wrote:
I am working on coming
This was designed before HTTP, so I have an excuse.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Mar 8, 2015, at 8:15 PM, Shawn Heisey apa...@elyograg.org wrote:
On 3/8/2015 2:05 PM, Saumitra Srivastav wrote:
I want to start working on adding a TCP layer
lots of docs for the language statistics
to even out.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Mar 10, 2015, at 1:23 PM, johnmu...@aol.com wrote:
Thanks Walter.
The design decision I'm trying to solve is this: using multiple cores, will
my
I would strongly recommend taking a look at HTTP/2. It might not be fast enough
for you, but it is fast enough for Google and there are already implementations.
http://http2.github.io/faq/
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Mar 10, 2015
of the docs.
idf statistics don’t settle down until at least 10K docs. You still sometimes
see anomalies under a million documents.
What design decision do you need to make? We can probably answer that for you.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Mar 29, 2015, at 10:42 PM, abhi Abhishek abhi26...@gmail.com wrote:
Hello,
Thanks for the suggestions. My aim is to reduce the disk space usage.
I have 1 master with 2 slave configured, where slaves
Several years ago, I accidentally put Solr indexes on an NFS volume and it was
100X slower.
If you have enough RAM, query speed should be OK, but startup time (loading
indexes into file buffers) could be really long. Indexing could be quite slow.
wunder
Walter Underwood
wun...@wunderwood.org
That depends on the JVM you are using. For the Oracle JVMs, use this to get a
list of extended options:
java -X
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Feb 23, 2015, at 8:21 AM, Kevin Laurie superinterstel...@gmail.com wrote:
Hi Guys,
I
, you may need to re-think
your design.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Feb 21, 2015, at 4:45 PM, Shawn Heisey apa...@elyograg.org wrote:
On 2/21/2015 1:46 AM, steve wrote:
Careful with the GETs! There is a real, hard limit on the length
Since you are getting these failures, the 90 second timeout is not “good
enough”. Try increasing it.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Feb 20, 2015, at 5:22 AM, NareshJakher naresh.jak...@capgemini.com wrote:
Hi Shawn,
I do
-insensitive approach. But it
hits the wall pretty fast.
One thing that does work pretty well is trademarked names (LaserJet, Coke,
etc). Those are spelled the same in all languages and usually not inflected.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Feb
The HTTP protocol does not set a limit on GET URL size, but individual web
servers usually do. You should get a response code of “414 Request-URI Too
Long” when the URL is too long.
This limit is usually configurable.
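A client-side sketch of coping with that limit, assuming a hypothetical 8 KB cap: build the encoded URL, and fall back to POST when it would be too long.

```python
from urllib.parse import urlencode

def choose_method(base_url, params, max_url=8192):
    """Pick GET or POST for a search request: fall back to POST with a
    form body when the encoded URL would exceed the server's limit
    (assumed here to be max_url bytes; the real limit is server config)."""
    url = base_url + "?" + urlencode(params)
    if len(url) <= max_url:
        return ("GET", url)
    return ("POST", base_url)  # send params as the request body instead
```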
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org
The other memory is used by the OS as file buffers. All the important parts of
the on-disk search index are buffered in memory. When the Solr process wants a
block, it is already right there, no delays for disk access.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org
http://localhost:8983/solr/nvd-rss/select?wt=json&indent=true&q=summary%3A%22Oracle+Fusion%22
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/
On Jan 23, 2015, at 7:08 AM, Carl Roberts carl.roberts.zap...@gmail.com wrote:
Thanks Erick,
I think I am going to start
* +/-
* .hack//Roots
* p=mv
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Apr 20, 2015, at 5:52 PM, Steven White swhite4...@gmail.com wrote:
Hi Erick,
I think you missed my point. My request is, Solr support a new URL
parameter. If this parameter is set
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Apr 21, 2015, at 7:10 AM, mesenthil1
senthilkumar.arumu...@viacomcontractor.com wrote:
Thanks.
For wt=json, it is bringing the results properly. I understand the reason
for getting this in &lt;. As our solr
text/xml is not a safe content-type, because of the way that HTTP handles
charsets. Always use application/xml.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Apr 22, 2015, at 3:01 AM, bengates benga...@aliceadsl.fr wrote:
Looks like Solarium
a stable Solr
installation. You should consider a different search engine.
“Optimizing” (forced merges) will not help. It will probably cause failures
more often because it always merges the largest segment.
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On May
How about Krugle?
http://opensearch.krugle.org/
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On May 11, 2015, at 3:18 AM, Tomasz Borek tomasz.bo...@gmail.com wrote:
There's also Perl-backed ACK. http://beyondgrep.com/
Which does the job of searching
. It is a full-featured database that includes search features.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On May 14, 2015, at 6:12 AM, Emir Arnautovic emir.arnauto...@sematext.com
wrote:
Hi Amr,
As far as I am aware, SOLR does not support transaction
with empty values.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On May 18, 2015, at 5:56 PM, Shawn Heisey apa...@elyograg.org wrote:
Can I search for the empty string? This is distinct from searching for
documents that don't have a certain fieldat
Turning PDF back into a structured document is like trying to turn hamburger
back into a cow.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Apr 16, 2015, at 4:55 AM, Allison, Timothy B. talli...@mitre.org wrote:
+1
:)
PS: one more thing
to get percentiles.
The complicated part of the servlet filter was getting it configured in Tomcat.
The code itself is not too bad.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Apr 6, 2015, at 1:49 PM, Siegfried Goeschl sgoes...@gmx.at wrote
the front end through to Solr.
For load testing, we replay production logs to test that we meet the SLA at a
given traffic level.
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Apr 6, 2015, at 11:31 AM, Davis, Daniel (NIH/NLM) [C] daniel.da...@nih.gov
wrote
That sounds neat. Our QA people are moving to Gatling, so we probably won’t
change our JMeter approach now.
We use the JMeter Plugins CMDRunner, telling it to generate only CSV.
http://jmeter-plugins.org/wiki/JMeterPluginsCMD/
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org
store a JSON blob in Solr
with the exact values, and use approximate fields to narrow things down. Of
course, MarkLogic has a graceful interface to Hadoop.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On May 19, 2015, at 4:09 PM, Erick Erickson erickerick
I highly recommend using boost= in edismax rather than bq=. The multiplicative
boost is stable with a wide range of scores. bq is additive and has problems
with high or low scores.
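As an illustration (field names and boost values are made up), the two styles look like this:

```
# bq adds a score term; its effect shrinks or balloons with the raw score range
q=ipad&defType=edismax&bq=category:tablet^2

# boost multiplies the score by a function query; stable across score ranges
q=ipad&defType=edismax&boost=if(termfreq(category,'tablet'),2,1)
```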
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On May 20, 2015, at 1:04
I believe that boost is a superset of the bq functionality.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On May 20, 2015, at 1:16 PM, John Blythe j...@curvolabs.com wrote:
could i do that the same way as my mention of using bq? the docs aren't
very
I was going to post the same advice. If your approach depends on absolute
scores, you need to change your approach.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On May 20, 2015, at 2:09 PM, Shawn Heisey apa...@elyograg.org wrote:
On 5/20/2015 2:54
Configure two suggesters, one based on each field. Use both of them and you’ll
get separate suggestions from each.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Jun 3, 2015, at 10:03 PM, Dhanesh Radhakrishnan dhan...@hifx.co.in wrote:
Hi
Anyone
across three regions (or AZs), the ensemble can survive a single
failure of any of them.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Jun 7, 2015, at 12:31 PM, William Bell billnb...@gmail.com wrote:
Here is a weird architecture...
We have a SOLR
for managing index segments.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Jun 10, 2015, at 10:35 PM, Erick Erickson erickerick...@gmail.com wrote:
If I knew, I would fix it ;). The sub-optimizes (i.e. the ones
sent out to each replica) should be sent
Anyone have Chef recipes they like for deploying Solr?
I’d especially appreciate one for uploading the configs directly to a Zookeeper
ensemble.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
That sounds great. Someone else here will be making the recipes, so I’ll put
him in touch with you.
As always, this is a really helpful list.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Jun 1, 2015, at 10:20 PM, Upayavira u...@odoko.co.uk wrote
“Optimize” is a manual full merge.
Solr automatically merges segments as needed. This also expunges deleted
documents.
We really need to rename “optimize” to “force merge”. Is there a Jira for that?
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog
is not a probabilistic engine; it is a vector space engine. The scores are
fundamentally different. Treating it as a probability of relevance will not
work.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
No one runs a public-facing Solr server. Just like no one runs a public-facing
MySQL server.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Aug 15, 2015, at 4:15 PM, Scott Derrick sc...@tnstaafl.net wrote:
I'm somewhat puzzled there is no built
Why? Do you evaluate Unix performance with and without file buffers?
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Aug 19, 2015, at 5:00 PM, Nagasharath sharathrayap...@gmail.com wrote:
Trying to evaluate the performance of queries
, it can block.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Aug 23, 2015, at 8:49 AM, Shawn Heisey apa...@elyograg.org wrote:
On 8/23/2015 7:46 AM, Ashish Mukherjee wrote:
I want to run few Solr queries in parallel, which are being done in a
multi
Yes, do this in an update request processor before it gets to the analyzer
chain.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Jun 29, 2015, at 3:19 PM, Erick Erickson erickerick...@gmail.com wrote:
Hmmm, very hard to do currently. The _point_
. Is that still
possible?
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
The AND default has one big problem. If the user misspells a single word, they
get no results. About 10% of queries are misspelled, so that means a lot more
failures.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Jul 15, 2015, at 7:21 AM, Jack
Can you reload all the content?
If so, I would calculate this in an update request processor and put the result
in its own field.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Oct 21, 2015, at 2:53 AM, Roland Szűcs <roland.sz...@booknwa
Does the collection reload do a rolling reload of each node or does it do them
all at once? We were planning on using the core reload on each system, one at a
time. That would make sure the collection stays available.
I read the documentation; it didn’t say anything about that.
wunder
Walter
with tens of thousands of fields. A thousand fields
might be cumbersome, but it won’t break Solr.
If the tables contain different kinds of things, you might have different
collections (one per document), or one collection with a “type” field for each
kind of document.
wunder
Walter Underwood
the Solr
cluster to talk to it.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Oct 29, 2015, at 10:08 AM, Matteo Grolla <matteo.gro...@gmail.com> wrote:
>
> I'm designing a solr cloud installation where nodes from a single cluster
&g
Also, what GC settings are you using? We may be able to make some suggestions.
Cumulative GC pauses aren’t very interesting to me. I’m more interested in the
longest ones, 90th percentile, 95th, etc.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog
I’m sure it is possible, but think twice before logging in local time. Do you
really want one day with 23 hours and one day with 25 hours each year?
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Nov 16, 2015, at 8:04 AM, tedsolr &
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Nov 2, 2015, at 9:39 PM, Modassar Ather <modather1...@gmail.com> wrote:
>
> Thanks Walter for your response,
>
> It is around 90GB of index (around 8 million documents) on one shard and
>
this
short article to learn more about spelling correction.
http://norvig.com/spell-correct.html <http://norvig.com/spell-correct.html>
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Oct 30, 2015, at 4:45 PM, Robert Oschler <robert.osch.
Read the links I have sent.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Oct 30, 2015, at 7:10 PM, Robert Oschler <robert.osch...@gmail.com> wrote:
>
> Thanks Walter. Are there any open source spell checkers that implement the
It is pretty handy, though. Great for expunging docs that are marked deleted or
are expired.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Nov 6, 2015, at 5:31 PM, Alexandre Rafalovitch <arafa...@gmail.com> wrote:
>
> Elasti
use the EdgeNgramFilter to index
prefixes. That will make your index larger, but prefix searches will be very
fast.
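A schema sketch of that (field type name and gram sizes are illustrative): every indexed term is expanded into its prefixes, so a prefix query becomes an ordinary term lookup.

```xml
<fieldType name="text_prefix" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index "solr" as s, so, sol, solr -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```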
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Nov 2, 2015, at 5:17 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
&g
this approach is nice and clear.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Nov 5, 2015, at 3:33 AM, Alessandro Benedetti <abenede...@apache.org>
> wrote:
>
> Hi Christian,
> there are several ways :
>
> 1) Elevation
This will probably work better without child documents and joins.
I would denormalize into actor documents and movie documents. At least, that’s
what I did at Netflix.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Oct 31, 2015, at 1:17
g fast. In only 21 lines of Python.
http://norvig.com/spell-correct.html <http://norvig.com/spell-correct.html>
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Oct 30, 2015, at 11:37 AM, Robert Oschler <robert.osch...@gmail.com> wrote
items using the “boost” parameter in edismax. Adjust it to be a
tiebreaker between documents with similar score.
2. Show two lists, one with the five most relevant paid, the next with the five
most relevant unpaid.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my
thing.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Oct 10, 2015, at 9:31 AM, Erick Erickson <erickerick...@gmail.com> wrote:
>
> Would result grouping work here? If the group key was "paid", then
> you'd get two gr
After several days, we finally get the real requirement. It really does waste a
lot of time and energy when people won’t tell us that.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Oct 10, 2015, at 8:19 AM, Upayavira <u...@odoko.co.uk&
a phonetic representation, then
you can weight the lower case higher than the stemmed field, and stemmed higher
than phonetic.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Oct 12, 2015, at 6:12 AM, Ahmet Arslan <iori...@yahoo.com.INVALID&
You understand that disabling the admin API will leave you with an
unmaintainable Solr installation, right? You might not even be able to diagnose
the problem.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Oct 5, 2015, at 11:34 AM, Siddhar
in different
analysis chains stored in separate fields.
The exact example you list will work fine with stemming and phrase search.
Check out the phrase search support in the edismax query parser.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Oc
LDP/sag/html/buffer-cache.html>
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Oct 7, 2015, at 3:40 AM, Toke Eskildsen <t...@statsbiblioteket.dk> wrote:
>
> On Wed, 2015-10-07 at 07:03 -0300, Eric Torti wrote:
>> I'm sorry to d
Please explain why you do not want to use an extra field. That is the only
solution that will perform well on your large index.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Oct 9, 2015, at 7:47 AM, Aman Tandon <amantandon...@gmail.com&
Thanks, this is very helpful.
Suggester config is quite under-documented. It took me longer than I expected
to get it working.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Jul 10, 2015, at 6:30 PM, Alessandro Benedetti benedetti.ale...@gmail.com
We test the order of results, not the exact score.
Score values depend on the number of documents in the index. Also, the order is
the only thing we care about.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Jul 7, 2015, at 12:40 AM, joseph paulo
Yes, ISO 8601 gets pretty baroque in the far nooks and crannies of the spec.
I use the “web profile” of ISO 8601, which is very simple. I’ve never seen any
software mishandle dates using this subset of the spec.
http://www.w3.org/TR/NOTE-datetime
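A small Python sketch of emitting that web-profile format (UTC, second precision, trailing Z):

```python
from datetime import datetime, timezone

def web_profile(dt):
    """Format an aware datetime in the W3C 'web profile' of ISO 8601,
    e.g. 2015-01-05T03:30:00Z."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

This is also the format Solr itself uses for date fields.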
wunder
Walter Underwood
wun...@wunderwood.org
Instead of writing new code, you could configure an autocommit interval in
Solr. That already does what you want, no more than one commit in the interval
and no commits if there were no adds or deletes.
Then the clients would never need to commit.
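A solrconfig.xml sketch of that (the intervals are illustrative, not recommendations):

```xml
<!-- at most one hard commit per minute, and none if nothing changed -->
<autoCommit>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>
<!-- new documents become visible within about five seconds -->
<autoSoftCommit>
  <maxTime>5000</maxTime>
</autoSoftCommit>
```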
wunder
Walter Underwood
wun...@wunderwood.org
Every faceting implementation I’ve seen (not just Solr/Lucene) makes big
in-memory lists. Lots of values means a bigger list.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
On Sep 8, 2015, at 8:33 AM, Shawn Heisey <apa...@elyograg.org> wrote:
&g
Doing a query for each term should work well. Solr is fast for queries. Write a
script.
I assume you only need to do this once. Running all the queries will probably
take less time than figuring out a different approach.
wunder
Walter Underwood
wun...@wunderwood.org
http
We did the same thing, but reporting performance metrics to Graphite.
But we won’t be able to add servlet filters in 6.x, because it won’t be a
webapp.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Sep 28, 2015, at 11:32 AM, Gili Nachum <g
If you want a spell checker, don’t use a search engine. Use a spell checker.
Something like aspell (http://aspell.net/ <http://aspell.net/>) will be faster
and better than Solr.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Oct 1, 2015
We built our own because there was no movement on that. Don’t hold your breath.
Glad to contribute it. We’ve been running it in production for a year, but the
config is pretty manual.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Sep 28, 2
limit will almost certainly not do what you want. Because
it doesn’t do anything useful.
I recommend reading this document for more info:
https://wiki.apache.org/lucene-java/ScoresAsPercentages
<https://wiki.apache.org/lucene-java/ScoresAsPercentages>
wunder
Walter Underwo
Don’t do anything. Solr will automatically clean up the deleted documents for
you.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Sep 22, 2015, at 6:01 PM, CrazyDiamond <crazy_diam...@mail.ru> wrote:
>
> my index is updating freque
Faceting on an author field is almost always a bad idea. Or at least a slow,
expensive idea.
Faceting makes big in-memory lists. More values, bigger lists. An author field
usually has many, many values, so you will need a lot of memory.
wunder
Walter Underwood
wun...@wunderwood.org
http
Sure.
1. Delete all the docs (no commit).
2. Add all the docs (no commit).
3. Commit.
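As raw XML update messages (document fields are illustrative), the three steps might look like this; searchers keep serving the old index until the final commit:

```xml
<!-- 1. delete everything, no commit -->
<delete><query>*:*</query></delete>

<!-- 2. re-add all documents, still no commit -->
<add>
  <doc>
    <field name="id">doc1</field>
    <field name="title">Example document</field>
  </doc>
</add>

<!-- 3. one commit at the end makes the new index visible -->
<commit/>
```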
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Sep 25, 2015, at 2:17 PM, Ravi Solr <ravis...@gmail.com> wrote:
>
> I have been trying to re
Right.
I chose the twenty most frequent terms from our documents and use those for
cache warming. The list of most frequent terms is pretty stable in most
collections.
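In solrconfig.xml that warming can be wired into a firstSearcher listener (the terms here are placeholders for your own top twenty):

```xml
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">the</str></lst>
    <lst><str name="q">news</str></lst>
    <!-- one entry per frequent term -->
  </arr>
</listener>
```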
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Sep 25, 2015, at 8:38
of them. No guarantee, but it is worth a try.
Good luck.
wunder
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Sep 25, 2015, at 2:59 PM, Ravi Solr <ravis...@gmail.com> wrote:
>
> Walter, Not in a mood for banter right now Its 6:00pm on a f