). And so on.
As one of my coworkers said, trying to turn a PDF into structured text is like
trying to turn hamburger back into a cow.
PDF is where text goes to die.
Walter Underwood
wun...@wunderwood.org
http://observer.wunderwood.org/ (my blog)
> On Dec 17, 2015, at 2:48 AM, Charlie H
cache.
Test with production logs. Choose logs where the number of distinct queries is
much larger than your cache sizes. If your caches hold 1024 entries, it would be
good to have 100K distinct queries. That might mean a total log size of a few
million queries.
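A quick sanity check on a candidate log, sketched in Python (the
one-query-per-line format is an assumption; adapt the parsing to your log
format):

```python
from collections import Counter

def distinct_query_stats(lines):
    """Count distinct queries in a log sample. For a 1024-entry cache you
    want the distinct count to be far above 1024, or the benchmark mostly
    measures cache hits."""
    counts = Counter(line.strip() for line in lines if line.strip())
    return len(counts), sum(counts.values())
```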
questions.
> On Dec 22, 2015, at 4:58 PM, Vincenzo D'Amore wrote:
>
> Hi All,
>
> my website is under pressure, there is a big number of concurrent searches.
> When the connected
I would do that in a middle tier. You can’t do every single thing in Solr.
> On Dec 24, 2015, at 1:21 PM, Upayavira wrote:
>
> You could create a custom DocTransformer. They can enhance t
ex is continually updated, clicking that is a complete waste of
resources. Don’t do it.
> On Dec 29, 2015, at 6:35 PM, Zheng Lin Edwin Yeo wrote:
>
> Hi,
>
> I am facing a situation, when I d
fter
> optimization, the index size reduces. Do we still need to do that?
>
> Regards,
> Edwin
>
> On 30 December 2015 at 10:45, Walter Underwood
> wrote:
>
>> Do not “optimize”.
>>
>> It is a forced merge, not an optimization. It was a mistake to eve
You probably do not NEED to merge your indexes. Have you tried not merging the
indexes?
> On Dec 29, 2015, at 7:31 PM, jeba earnest wrote:
>
> I have a scenario that I need to merge the sol
You could send the documents to both and filter out the recent ones in the
history collection.
> On Jan 5, 2016, at 5:46 AM, vidya wrote:
>
> Hi
>
> I would like to maintain two cores f
require Java 7 was made at some point in the
4.x development.
> On Jan 7, 2016, at 7:26 PM, billnb...@gmail.com wrote:
>
> Run it on 2 separate boxes
>
> Bill Bell
> Sent from mobile
>
give worse results
than a vector space model, but you can have thresholds.
> On Jan 20, 2016, at 5:11 AM, Emir Arnautovic
> wrote:
>
> Hi Sara,
> You can use funct and frange to achive n
would still be up.
If you are OK with that risk, run three nodes. If not, run five.
> On Jan 21, 2016, at 9:27 AM, Erick Erickson wrote:
>
> NP. My usual question though is "how often do you ex
very unusual queries. Median
response time was much better, about 50 milliseconds.
> On Jan 22, 2016, at 2:45 PM, Toke Eskildsen wrote:
>
> Aswath Srinivasan (TMS) wrote:
>> * To
Yo. That is the truth. You can get stuff indexed with an automatic schema, but
if you want to make your customers happy, tune it.
> On Jan 22, 2016, at 6:22 PM, Erick Erickson wrote:
>
>
think they even point out what is thread safe.
> On Jan 30, 2016, at 7:42 AM, Susheel Kumar wrote:
>
> Hi Steve,
>
> Can you please elaborate what error you are getting and i didn't un
Solr server, you have
one object.
There is no leak in HttpSolrClient, you are misusing the class, massively.
> On Jan 31, 2016, at 2:10 PM, Steven White wrote:
>
> Thank you all for your
be a lot faster after you reuse the
client class.
> On Jan 31, 2016, at 3:46 PM, Steven White wrote:
>
> Thanks Walter. Yes, I saw your answer and fixed the issue per your
> suggestion.
>
> On Feb 5, 2016, at 8:13 AM, Jack Krupansky wrote:
>
> This doesn't sound like a great use case for Solr - or any other search
> engine for that matter. I'm not sure what yo
Making two indexing calls, one to each, works until one system is not
available. Then they are out of sync.
You might want to put the updates into a persistent message queue, then have
both systems indexed from that queue.
Updating two systems in parallel gets into two-phase commit, instantly. So you
need a persistent pool of updates that both clusters pull from.
> On Feb 9, 2016, at 4:15 PM, Shawn Heisey wrote:
>
&g
I agree. If the system updates synchronously, then you are in two-phase commit
land. If you have a persistent store that each index can track, then things are
good.
> On Feb 9, 2016, at 7:37 PM, Sh
> On Feb 11, 2016, at 10:06 AM, Erick Erickson wrote:
>
> Steven's solution is a very common one, complete to the
> notion of re-chunking. Depending on the throughput requirements,
>
This happens for fonts where Tika does not have font metrics. Open the document
in Adobe Reader, then use document info to find the list of fonts.
Then post this question to the Tika list.
Fix it in Tika, don’t patch it in Solr.
good introduction.
http://rosenfeldmedia.com/books/search-analytics-for-your-site/
<http://rosenfeldmedia.com/books/search-analytics-for-your-site/>
Sea Urchin is doing some good work in search metrics: https://seaurchin.io/
<https://seaurchin.io/>
by that value.
I haven’t tried any of these, of course.
> On Feb 25, 2016, at 3:33 PM, Binoy Dalal wrote:
>
> According to the edismax documentation, negative boosts are supported, so
>
I’m creating a query from MLT terms, then sending it to edismax. The
neighboring words in the query are not meaningful phrases.
Is there a way to turn off phrase creation and search for one query? Or should
I separate them all with “OR”?
ple of the need for shingle-type synonyms.
Walter Underwood, former GO.com/Infoseek search engineer
You could index both pages and chapters, with a type field.
You could index by chapter with the page number as a payload for each token.
> On Mar 1, 2016, at 5:50 AM, Zaccheo Bagnati wrote:
>
>
If you need transactions, you should use a different system, like MarkLogic.
> On Mar 3, 2016, at 8:46 PM, sangs8788
> wrote:
>
> Hi Emir,
>
> Right now we are having only inserts i
So batch them. You get a response back from Solr saying whether the documents
were accepted. If that fails, there is a failure. What do you do then?
After every 100 docs or one minute, do a commit. Then delete the documents from
the input queue. What do you do when the commit fails?
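The batch-and-commit loop described above might look like this sketch, where
`send_batch` and `commit` stand in for whatever your client library provides
(hypothetical callables, not a real Solr API):

```python
import time

def index_in_batches(docs, send_batch, commit, batch_size=100, max_wait=60.0):
    """Send docs in batches; commit every `batch_size` docs or `max_wait`
    seconds, whichever comes first."""
    batch, last_commit = [], time.monotonic()
    for doc in docs:
        batch.append(doc)
        if len(batch) >= batch_size or time.monotonic() - last_commit >= max_wait:
            send_batch(batch)   # raises if Solr rejects the batch
            commit()            # only now is it safe to drop these from the queue
            batch, last_commit = [], time.monotonic()
    if batch:
        send_batch(batch)
        commit()
```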
> On Mar 3, 2016, at 9:54 AM, Aneesh Mon N wrote:
>
> To be noted that all the fields are stored so as to support the atomic
> updates.
Are you doing all of these updates as atomic? That could be slow. If you are
supplying all the fields, then just do a regular add.
> On Mar 6, 2016, at 7:27 AM, Jack Krupansky wrote:
>
> Back to the original question... there are two answers:
>
> 1. Yes - for guru-level Solr experts.
> 2. No - for anybody else
queries.
* 5000 queries is not nearly enough. That totally fits in cache. I usually
start with 100K, though I’d like more. Benchmarking a cached system is one of
the hardest things in devops.
> On Mar 7, 2
the popularity scale. I gave up and made it work for popular movies.
Here at Chegg, multiplicative boost works fine.
Don’t think so much about the absolute values of the scores. All we care about
is ordering. Work with real user queries, not with theory.
hundreds of views? People really will notice when the 1978
animated version shows up before the Peter Jackson films.
> On Mar 18, 2016, at 8:18 AM,
> wrote:
>
> On Friday, March 18, 2016
one
rented one million time and the one rented 800 thousand times (think about the
Twilight movies at Netflix). But it also distinguishes between the one rented
100 times and the one rented 80 times.
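A log-scale boost behaves exactly this way; a minimal sketch (the +1 guard
against log10(0) is my addition):

```python
import math

def popularity_boost(rentals):
    # log10 compresses the scale: the gap between 1,000,000 and 800,000
    # rentals is about the same as the gap between 100 and 80.
    return math.log10(rentals + 1)
```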
>
” not be
the first hit for that?
> On Mar 18, 2016, at 8:48 AM,
> wrote:
>
> On Friday, March 18, 2016 4:25 PM, wun...@wunderwood.org wrote:
>>
>> That works fine if you have a q
on this
list is an “XY problem”, where the poster has problem X and has assumed
solution Y, which is not the right solution. But they ask about Y. So we will
tell people that their approach is wrong, because that is the most helpful
thing we can do.
If possible, log in UTC. Daylight time causes amusing problems in logs, like
one day with 23 hours and one day with 25.
You can always convert to local time when you display it.
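In Python's logging module, for example, switching timestamps to UTC is one
line; the handler setup around it is illustrative:

```python
import logging
import time

# Render log timestamps in UTC so every day has exactly 24 hours.
formatter = logging.Formatter("%(asctime)sZ %(levelname)s %(message)s",
                              datefmt="%Y-%m-%dT%H:%M:%S")
formatter.converter = time.gmtime  # UTC instead of local time

handler = logging.StreamHandler()
handler.setFormatter(formatter)
logging.getLogger("app").addHandler(handler)
```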
> On Mar 25, 2016, at 8
I’m not sure this is a legal polling interval:
00:00:60
Try:
00:01:00
Also, polling every minute is very fast. Try a longer period.
Check the clocks on the two systems. If the clocks are not synchronized, that
could cause problems.
age+Analysis>
3. Learn the analysis tool in the Solr admin UI. That allows you to explore the
behavior.
4. If you really need a high grade morphological analyzer, consider purchasing
one from Basis Technology: http://www.rosette.com/solr/
<http://www.rosette.com/solr/>
e two.
If your customer absolutely insists on having every single figo doc above
non-figo docs, well, they deserve what they get.
connections open or pool them, because PHP doesn’t do that.
> On Apr 15, 2016, at 8:39 AM, Sara Woodmansee wrote:
>
> Hi Shawn,
>
> No clue what PHP client they are using.
>
>
No, Zookeeper is used for managing the locations of replicas and the leader for
indexing. Queries should still be distributed with a load balancer.
Queries do NOT go through Zookeeper.
> On Apr 17, 2
http://dvd.netflix.com/Search?v1=blade+runner
<http://dvd.netflix.com/Search?v1=blade+runner>
At Netflix (when I was there), those were shown in popularity order with a
boost function.
And for stemming, should the movie “Saw” match “see”? Maybe not.
32 GB is a pretty big heap. If the working set is really smaller than that, the
extra heap just makes a full GC take longer.
How much heap is used after a full GC? Take the largest value you see there,
then add a bit more, maybe 25% more or 2 GB more.
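One reading of that rule of thumb, as a sketch (taking "whichever is larger"
of the 25% and 2 GB headroom is my interpretation):

```python
def recommended_heap_gb(used_after_full_gc_gb):
    # Live set after a full GC, plus headroom: 25% or 2 GB, whichever
    # is larger. Extra heap beyond this mostly lengthens full GC pauses.
    headroom = max(used_after_full_gc_gb * 0.25, 2.0)
    return used_after_full_gc_gb + headroom
```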
that still
possible?
No one runs a public-facing Solr server. Just like no one runs a public-facing
MySQL server.
On Aug 15, 2015, at 4:15 PM, Scott Derrick wrote:
> I'm somewhat puzzled there is no built in secu
Why? Do you evaluate Unix performance with and without file buffers?
On Aug 19, 2015, at 5:00 PM, Nagasharath wrote:
> Trying to evaluate the performance of queries with and without cache
>
>
, it can block.
On Aug 23, 2015, at 8:49 AM, Shawn Heisey wrote:
> On 8/23/2015 7:46 AM, Ashish Mukherjee wrote:
>> I want to run few Solr queries in parallel, which are being done in a
>> multi
Instead of writing new code, you could configure an autocommit interval in
Solr. That already does what you want, no more than one commit in the interval
and no commits if there were no adds or deletes.
Then the clients would never need to commit.
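For reference, autocommit lives in solrconfig.xml; a minimal example (the
60-second and 15-second values are illustrative, not recommendations):

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>60000</maxTime>            <!-- hard commit at most once a minute -->
    <openSearcher>false</openSearcher>  <!-- don't open a new searcher here -->
  </autoCommit>
  <autoSoftCommit>
    <maxTime>15000</maxTime>            <!-- soft commit controls visibility -->
  </autoSoftCommit>
</updateHandler>
```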
Yes, ISO 8601 gets pretty baroque in the far nooks and crannies of the spec.
I use the “web profile” of ISO 8601, which is very simple. I’ve never seen any
software mishandle dates using this subset of the spec.
http://www.w3.org/TR/NOTE-datetime
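The web profile boils down to `YYYY-MM-DDThh:mm:ssZ` in UTC; in Python:

```python
from datetime import datetime, timezone

def web_profile(dt):
    # W3C profile of ISO 8601: YYYY-MM-DDThh:mm:ssZ, always UTC.
    # (This also happens to be the format Solr expects for date fields.)
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```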
Every faceting implementation I’ve seen (not just Solr/Lucene) makes big
in-memory lists. Lots of values means a bigger list.
On Sep 8, 2015, at 8:33 AM, Shawn Heisey wrote:
> On 9/8/2015 9:10 AM, adfe
Doing a query for each term should work well. Solr is fast for queries. Write a
script.
I assume you only need to do this once. Running all the queries will probably
take less time than figuring out a different approach.
. That was using too much CPU.
Right now, block the IPs. Those are hostile.
> On Sep 21, 2015, at 10:31 AM, Paul Libbrecht wrote:
>
> Writing a query component would be pretty easy or?
> It wo
Faceting on an author field is almost always a bad idea. Or at least a slow,
expensive idea.
Faceting makes big in-memory lists. More values, bigger lists. An author field
usually has many, many values, so you will need a lot of memory.
Don’t do anything. Solr will automatically clean up the deleted documents for
you.
> On Sep 22, 2015, at 6:01 PM, CrazyDiamond wrote:
>
> my index is updating frequently and i need to remo
limit will almost certainly not do what you want. Because
it doesn’t do anything useful.
I recommend reading this document for more info:
https://wiki.apache.org/lucene-java/ScoresAsPercentages
<https://wiki.apache.org/lucene-java/ScoresAsPercentages>
Right.
I chose the twenty most frequent terms from our documents and use those for
cache warming. The list of most frequent terms is pretty stable in most
collections.
> On Sep 25, 2015, at 8:38
Sure.
1. Delete all the docs (no commit).
2. Add all the docs (no commit).
3. Commit.
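As a sketch, with `client` standing in for whatever Solr client you use (the
method names here are hypothetical, not a real client API):

```python
def reindex(client, batches):
    """Delete-all, add-all, one commit: searchers keep serving the old
    index until the final commit, so users never see an empty index."""
    client.delete_all()          # 1. delete all the docs (no commit)
    for batch in batches:
        client.add(batch)        # 2. add all the docs (no commit)
    client.commit()              # 3. one commit makes it all visible
```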
> On Sep 25, 2015, at 2:17 PM, Ravi Solr wrote:
>
> I have been trying to re-index the docs (about 1.5 mi
them. No guarantee, but it is worth a try.
Good luck.
> On Sep 25, 2015, at 2:59 PM, Ravi Solr wrote:
>
> Walter, Not in a mood for banter right now Its 6:00pm on a friday and
> Iam stuck
We did the same thing, but reporting performance metrics to Graphite.
But we won’t be able to add servlet filters in 6.x, because it won’t be a
webapp.
> On Sep 28, 2015, at 11:32 AM, Gili Nachum wr
We built our own because there was no movement on that. Don’t hold your breath.
Glad to contribute it. We’ve been running it in production for a year, but the
config is pretty manual.
> On Sep 28, 2
If you want a spell checker, don’t use a search engine. Use a spell checker.
Something like aspell (http://aspell.net/ <http://aspell.net/>) will be faster
and better than Solr.
> On Oct 1, 2015
You understand that disabling the admin API will leave you with an
unmaintainable Solr installation, right? You might not even be able to diagnose
the problem.
> On Oct 5, 2015, at 11:34 AM, Siddhar
It depends on the document. In an e-commerce search, you might want to fail
immediately and be notified. That is what we do: fail, rollback, and notify.
> On Oct 6, 2015, at 7:58 AM, Alessandro Benede
get an accurate report of which
document was rejected. I wrote that same thing back at Netflix, before SolrJ.
> On Oct 6, 2015, at 9:49 AM, Alessandro Benedetti
> wrote:
>
> Hi Walter,
>
> On Oct 7, 2015, at 3:40 AM, Toke Eskildsen wrote:
>
> On Wed, 2015-10-07 at 07:03 -0300, Eric Torti wrote:
>> I'm sorry to diverge this thread a li
different
analysis chains stored in separate fields.
The exact example you list will work fine with stemming and phrase search.
Check out the phrase search support in the edismax query parser.
> On Oc
items using the “boost” parameter in edismax. Adjust it to be a
tiebreaker between documents with similar score.
2. Show two lists, one with the five most relevant paid, the next with the five
most relevant unpaid.
Please explain why you do not want to use an extra field. That is the only
solution that will perform well on your large index.
> On Oct 9, 2015, at 7:47 AM, Aman Tandon wrote:
>
> No Sushee
After several days, we finally get the real requirement. It really does waste a
lot of time and energy when people won’t tell us that.
> On Oct 10, 2015, at 8:19 AM, Upayavira wrote:
>
> In w
thing.
> On Oct 10, 2015, at 9:31 AM, Erick Erickson wrote:
>
> Would result grouping work here? If the group key was "paid", then
> you'd get two groups back, "paid"
phonetic representation, then
you can weight the lower case higher than the stemmed field, and stemmed higher
than phonetic.
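With edismax that weighting is just boosts on qf; a sketch with hypothetical
field names:

```
defType=edismax
qf=title_exact^8 title_stemmed^4 title_phonetic^1
```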
> On Oct 12, 2015, at 6:12 AM, Ahmet Arslan wrote:
>
> Hi,
>
> Catc
Can you reload all the content?
If so, I would calculate this in an update request processor and put the result
in its own field.
> On Oct 21, 2015, at 2:53 AM, Roland Szűcs wrote:
>
> Thank
Does the collection reload do a rolling reload of each node or does it do them
all at once? We were planning on using the core reload on each system, one at a
time. That would make sure the collection stays available.
I read the documentation; it didn’t say anything about that.
with tens of thousands of fields. A thousand fields
might be cumbersome, but it won’t break Solr.
If the tables contain different kinds of things, you might have different
collections (one per document), or one collection with a “type” field for each
kind of document.
igure the Solr
cluster to talk to it.
> On Oct 29, 2015, at 10:08 AM, Matteo Grolla wrote:
>
> I'm designing a solr cloud installation where nodes from a single cluster
> are distributed
g fast. In only 21 lines of Python.
http://norvig.com/spell-correct.html <http://norvig.com/spell-correct.html>
> On Oct 30, 2015, at 11:37 AM, Robert Oschler wrote:
>
> Hello everyone,
&
short article to learn more about spelling correction.
http://norvig.com/spell-correct.html <http://norvig.com/spell-correct.html>
> On Oct 30, 2015, at 4:45 PM, Robert Oschler wrote:
>
> H
Read the links I have sent.
> On Oct 30, 2015, at 7:10 PM, Robert Oschler wrote:
>
> Thanks Walter. Are there any open source spell checkers that implement the
> Peter Norvig or Damerau
This will probably work better without child documents and joins.
I would denormalize into actor documents and movie documents. At least, that’s
what I did at Netflix.
> On Oct 31, 2015, at 1:17
use the EdgeNgramFilter to index
prefixes. That will make your index larger, but prefix searches will be very
fast.
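A typical index-time-only setup, sketched for schema.xml (tokenizer choice and
gram sizes are illustrative):

```xml
<fieldType name="text_prefix" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index every prefix of each token, up to 15 chars -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The query analyzer deliberately omits the n-gram filter, so the whole typed
prefix must match an indexed gram.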
> On Nov 2, 2015, at 5:17 AM, Toke Eskildsen wrote:
>
> On Mon, 2015-11-02 at 17
> On Nov 2, 2015, at 9:39 PM, Modassar Ather wrote:
>
> Thanks Walter for your response,
>
> It is around 90GB of index (around 8 million documents) on one shard and
> there are 12 such shards. As pe
approach is nice and clear.
> On Nov 5, 2015, at 3:33 AM, Alessandro Benedetti
> wrote:
>
> Hi Christian,
> there are several ways :
>
> 1) Elevation query component - it should be
It is pretty handy, though. Great for expunging docs that are marked deleted or
are expired.
> On Nov 6, 2015, at 5:31 PM, Alexandre Rafalovitch wrote:
>
> Elasticsearch removed deleteByQuery
Also, what GC settings are you using? We may be able to make some suggestions.
Cumulative GC pauses aren’t very interesting to me. I’m more interested in the
longest ones, 90th percentile, 95th, etc.
>
I’m sure it is possible, but think twice before logging in local time. Do you
really want one day with 23 hours and one day with 25 hours each year?
> On Nov 16, 2015, at 8:04 AM, tedsolr wrote:
>
That is the approach I’ve been using for years. Simple and effective.
It probably makes the index bigger. Make sure that only one of the fields is
stored, because the stored text will be exactly the same in both.
those lists will fit in memory.
> On Nov 19, 2015, at 3:46 PM, Steven White wrote:
>
> Hi everyone
>
> What is considered too many fields for qf and fq? On average I will have
> 1500 field
The implementation for fq has changed from 4.x to 5.x, so I’ll let someone else
answer that in detail.
In 4.x, the result of each filter query can be cached. After that, they are
quite fast.
> On Nov
operating.
Specifying a list of all the zk nodes is robust. If one goes down, it tries
another.
> On Nov 29, 2015, at 12:19 PM, Don Bosco Durai wrote:
>
> This should answer your question:
e ensemble.
>
> Regards,
> Salman
>
> On Mon, Nov 30, 2015 at 1:07 AM, Walter Underwood
> wrote:
>
>> Why would that link answer the question?
>>
>> Each Solr connects to one Zookeeper node. If that node goes down,
>> Zookeeper is still available, but
> On Dec 8, 2015, at 9:56 AM, Felley, James wrote:
>
> I am trying to build an edismax search handler that will allow a fuzzy
> search, using the "query fields" property (qf).
>
grep '"status":"idle"' > /dev/null
[ $? -ne 0 ] || break
sleep 300
done
echo Solr indexing is finished
> On Dec 8, 2015, at 5:37 PM, Brian Narsi wrote:
>
&
Often Solr documents are “semi-structured”. They have some structured fields
and some free-text fields. E-mail messages are like that, with structured
headers and an unstructured body.
> On Dec 9, 2
Since you are getting these failures, the 90 second timeout is not “good
enough”. Try increasing it.
On Feb 20, 2015, at 5:22 AM, NareshJakher wrote:
> Hi Shawn,
>
> I do not want to increase t
The HTTP protocol does not set a limit on GET URL size, but individual web
servers usually do. You should get a response code of “414 Request-URI Too
Long” when the URL is too long.
This limit is usually configurable.
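When a query does blow past the limit, the usual fix is to send it as a POST
instead; a sketch with Python's urllib (the URL and core name are
placeholders):

```python
import urllib.parse
import urllib.request

# A long boolean query that would overflow most GET URL limits.
q = "id:(" + " OR ".join("doc%d" % i for i in range(1000)) + ")"
body = urllib.parse.urlencode({"q": q, "wt": "json"}).encode("utf-8")

# Supplying a body makes urllib issue a POST; Solr's /select accepts
# form-encoded POST parameters just like GET parameters.
req = urllib.request.Request("http://localhost:8983/solr/mycore/select",
                             data=body)
```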
, you may need to re-think
your design.
On Feb 21, 2015, at 4:45 PM, Shawn Heisey wrote:
> On 2/21/2015 1:46 AM, steve wrote:
>> Careful with the GETs! There is a real, hard limit on the length
That depends on the JVM you are using. For the Oracle JVMs, use this to get a
list of extended options:
java -X
On Feb 23, 2015, at 8:21 AM, Kevin Laurie wrote:
> Hi Guys,
> I am a newbie on Solr
-insensitive approach. But it
hits the wall pretty fast.
One thing that does work pretty well is trademarked names (LaserJet, Coke,
etc). Those are spelled the same in all languages and usually not inflected.
On Feb