I have not tried it but I would check the option of using the SynonymFilter
to duplicate certain query words. Another option - you can detect these words
at index time (e.g. in an UpdateProcessor) to give these documents a document
boost in case it fits your logic. Or even make a copy field that contains a
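A minimal sketch of the SynonymFilter route, assuming a hypothetical synonyms.txt in which each word to duplicate maps to itself twice (field and file names are made up):

```xml
<!-- query-time analyzer sketch; field/file names are hypothetical -->
<fieldType name="text_dup" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <!-- synonyms.txt entry like: urgent => urgent, urgent -->
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The duplicated token raises the term frequency of the query word, which is one way to get the boosting effect described.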
Right, it works!
I was not aware of this functionality, nor of being able to customize it via
the hl.requireFieldMatch param.
Thanks
Hello,
I need to expose search and highlighting capabilities over a few tens of
fields. The edismax qf param makes this possible, but the performance of
searching tens of words over tens of fields is problematic.
I made a copyField (indexed, not stored) for these fields, which gives way
Currently I use the classic highlighter, but I can change my postings format
in order to work with another highlighting component if that leads to any solution
hl.fl work in this case? Or is highlighting the 10 fields the
slowdown?
Best,
Erick
On Wed, Jul 30, 2014 at 2:55 AM, Manuel Le Normand
manuel.lenorm...@gmail.com wrote:
Currently I use the classic highlighter, but I can change my postings format
in order to work with another highlighting component
Hello,
Many of our indexed documents are scanned and OCR'ed documents.
Unfortunately we were not able to improve the OCR quality much (less than
80% word accuracy) for various reasons, a fact which badly hurts
retrieval quality.
As we use an open-source OCR, we think of changing every scanned
On Wed, Jul 2, 2014 at 10:19 AM, Manuel Le Normand
Is the issue SOLR-5478 what you were looking for?
Why wouldn't you take advantage of your use case - the chars belong to
different char classes?
You can index this field to a single Solr field (no copyField) and apply an
analysis chain that includes both languages' analysis - stopwords, stemmers,
etc.
As every filter should apply to its specific
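A sketch of such a combined chain, assuming the two languages are English and Russian (swap in your own stopword files and stemmers); since each stemmer and stopword list only touches tokens of its own script, the filters naturally apply to their own char class:

```xml
<fieldType name="text_bilingual" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- each stopword list removes only tokens of its own language -->
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_en.txt"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords_ru.txt"/>
    <!-- the Porter stemmer alters only Latin tokens, the Russian light
         stemmer only Cyrillic ones -->
    <filter class="solr.PorterStemFilterFactory"/>
    <filter class="solr.RussianLightStemFilterFactory"/>
  </analyzer>
</fieldType>
```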
Hi,
I have a performance and scoring problem for phrase queries
1. Performance - phrase queries involving frequent terms are very slow
due to the reading of large positions posting lists.
2. Scoring - I want to control the boost of phrase and entity (in
gazetteers) matches.
Indexing
Hello,
I'm trying to handle a situation with taxonomy search - that is, for each
taxonomy I have a list of words with their boosts. These taxonomies are
updated frequently, so I retrieve these scored lists at query time from an
external service.
My expectation would be:
In short, when running a distributed search every shard runs the query
separately. Each shard's collector returns the topN (rows param) internal
docIds of the matching documents.
These topN docIds are converted to their uniqueKeys in the
BinaryResponseWriter and sent to the frontend core (the
In the last days one of my Tomcat servlets, running only a Solr instance,
crashed unexpectedly twice.
Low memory usage, nothing written in the Tomcat log, and the last thing
happening in the Solr log is 'end_commit_flush' followed by 'UnInverted
multi-valued field' for the fields faceted during the
Zookeeper client for Eclipse is the tool you're looking for. You can edit
the clusterstate directly.
http://www.massedynamic.org/mediawiki/index.php?title=Eclipse_Plug-in_for_ZooKeeper
Another option can be using the bundled zkClient (distributed with Solr
4.5 and above) and uploading a new
Running solr 4.3, sharded collection. Tomcat 7.0.39
Faceting on multivalued fields works perfectly fine; I was describing this
log to emphasize the fact that the servlet failed right after a new searcher
was opened and the event listener finished running a warming faceting query.
In order to set discountOverlaps to true you must have added the
similarity class=solr.DefaultSimilarityFactory to the schema.xml, which
is commented out by default!
As this param is false by default, the above situation is expected with
correct positioning, as said.
In order to fix the field
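For reference, the schema.xml fragment referred to above would look something like this (the factory accepts a discountOverlaps bool init param):

```xml
<similarity class="solr.DefaultSimilarityFactory">
  <bool name="discountOverlaps">true</bool>
</similarity>
```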
Robert, your last reply is not accurate.
It's true that the field norms and termVectors are independent. But this
issue of higher norms for this case is expected with well-assigned
positions. The lengthNorm is assigned from FieldInvertState.length, which is
the count of incrementToken calls and not the num of
https://issues.apache.org/jira/browse/SOLR-5478
There it goes
On Mon, Nov 18, 2013 at 5:44 PM, Manuel Le Normand
manuel.lenorm...@gmail.com wrote:
Sure, I am out of office till the end of the week. I'll reply after I upload the patch
In order to accelerate BinaryResponseWriter.write we extended this
writer class to implement the docId-to-id transformation via docValues (in
memory), with no need to access stored fields for id reading, nor lazy loading
of fields, which also has a cost. That should improve the read rate, as
docValues are
It's surprising such a query takes a long time; I would assume that after
consistently trying q=*:* you should be getting cache hits and times should
be faster. Check in the adminUI how your query/doc caches perform.
Moreover, the query in itself is just asking for the first 5000 docs that were
Hi
Any distributed lookup is basically composed of two stages: the first
collecting all the matching documents from every shard, and a second which
fetches additional information about specific ids (i.e. stored fields, termVectors).
It can be seen in the logs of each shard (isShard=true), where first
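A toy sketch of that two-stage flow (not Solr's actual code; the scorer and documents are made up):

```python
# Toy sketch of the two-stage distributed lookup described above.

def score(doc, query):
    # Dummy scorer: number of query terms found in the doc body.
    return sum(term in doc["body"] for term in query.split())

def phase1(shard, query, rows):
    # Stage 1: each shard returns only (uniqueKey, score) for its topN.
    hits = [(doc["id"], score(doc, query)) for doc in shard]
    return sorted(hits, key=lambda h: -h[1])[:rows]

def distributed_search(shards, query, rows):
    # Frontend merges the per-shard topN lists into a global topN.
    merged = sorted((h for s in shards for h in phase1(s, query, rows)),
                    key=lambda h: -h[1])[:rows]
    wanted = {doc_id for doc_id, _ in merged}
    # Stage 2: fetch stored fields only for the winning ids.
    stored = {d["id"]: d for s in shards for d in s if d["id"] in wanted}
    return [stored[doc_id] for doc_id, _ in merged]

shard_a = [{"id": "a1", "body": "solr distributed search"},
           {"id": "a2", "body": "tomcat servlet"}]
shard_b = [{"id": "b1", "body": "distributed search merging"}]
top = distributed_search([shard_a, shard_b], "distributed search", rows=2)
print([d["id"] for d in top])  # ['a1', 'b1']
```

Only the winning ids ever have their full documents fetched, which is why stage 2 cost grows with rows rather than with the total hit count.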
I tried my last proposition: editing the clusterstate.json to add a dummy
frontend shard seems to work. I made sure the ranges were not overlapping.
Doesn't it resolve the SolrCloud issue as specified above?
Would adding a dummy shard instead of a dummy collection resolve the
situation? - e.g. editing clusterstate.json from a zookeeper client and
adding a shard with a 0-range so no docs are routed to this core. This core
would be on a separate server and act as the collection gateway.
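Illustrative only - a real clusterstate.json also carries replica/leader entries - but the shape of the hack would be roughly (names hypothetical):

```json
{
  "collection1": {
    "shards": {
      "shard1":  { "range": "80000000-7fffffff", "state": "active" },
      "gateway": { "range": null, "state": "active" }
    }
  }
}
```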
is the one that does not have its own index and
is doing merging of the results. Is this the case? If yes, are all 36
shards always queried?
Dmitry
On Mon, Sep 9, 2013 at 10:11 PM, Manuel Le Normand
manuel.lenorm...@gmail.com wrote:
Hi Dmitry,
I have solr 4.3 and every query
much faster if results merging can be avoided.
Dmitry
On Sun, Sep 8, 2013 at 6:56 PM, Manuel Le Normand
manuel.lenorm...@gmail.com wrote:
Hello all
Looking at the 10% slowest queries, I get very bad performance (~60 sec
per query).
These queries have lots of conditions on my main
to get more disk space. The amount of engineer time spent
trying to tune this is way more expensive than a disk...
Best,
Erick
On Sun, Sep 8, 2013 at 11:51 AM, Manuel Le Normand
manuel.lenorm...@gmail.com wrote:
Hi,
In order to delete part of my index I run a delete by query
Hi,
In order to delete part of my index I run a delete by query that intends to
erase 15% of the docs.
I added these params to the solrconfig.xml:
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">2</int>
  <int name="maxMergeAtOnceExplicit">2</int>
  <double
Hello all
Looking at the 10% slowest queries, I get very bad performance (~60 sec
per query).
These queries have lots of conditions on my main field (more than a
hundred), including phrase queries, and rows=1000. I do return only ids
though.
I can quite firmly say that this bad performance is due
Hello,
My solr cluster runs on RH Linux with tomcat7 servlet.
NumOfShards=40, replicationFactor=2, 40 servers each has 2 replicas. Solr
4.3
For experimental reasons I split my cluster into 2 sub-clusters, each
containing a single replica of each shard.
When connecting back these sub-clusters the
, Manuel Le Normand
manuel.lenorm...@gmail.com wrote:
Hello,
My solr cluster runs on RH Linux with tomcat7 servlet.
NumOfShards=40, replicationFactor=2, 40 servers each has 2 replicas. Solr
4.3
For experimental reasons I split my cluster into 2 sub-clusters, each
containing a single replica
Hi,
I have a slow storage machine and insufficient RAM to hold the whole index.
This causes the first queries (~5000) to be very slow (they are read from
disk and my CPU is most of the time in iowait), and after that the reads
from the index become very fast and read mainly
Minfeng - this issue gets tougher as the number of shards you have rises; you
can read Erick Erickson's post:
http://grokbase.com/t/lucene/solr-user/131p75p833/how-distributed-queries-works.
If you have 100M docs I guess you are hitting this issue.
The common way to deal with this issue is by
Use the PatternReplaceFilterFactory:
<filter class="solr.PatternReplaceFilterFactory" pattern="([^a-z])" replacement=""/>
This will do exactly what you asked for.
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.PatternReplaceFilterFactory
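To see what the pattern does per token, the same substitution in plain Python (illustrative, not Solr code):

```python
import re

# The filter's pattern removes every character that is not a lowercase letter.
token = "Foo-123bar"
print(re.sub(r"[^a-z]", "", token))  # oobar
```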
On Mon, Jul 22, 2013 at 12:22 PM,
Great explanation and article.
Yes, this buffer for merges seems very small, and yet still optimized. That's
impressive.
, 2013 at 8:36 AM, Manuel Le Normand
manuel.lenorm...@gmail.com wrote:
Hello,
As a result of frequent Java OOM exceptions, I try to investigate more into
the Solr JVM memory heap usage.
Please correct me if I am mistaken; this is my understanding of usages for
the heap (per replica on a Solr
Hello,
As a result of frequent Java OOM exceptions, I try to investigate more into
the Solr JVM memory heap usage.
Please correct me if I am mistaken; this is my understanding of usages for
the heap (per replica on a Solr instance):
1. Buffers for indexing - bounded by ramBufferSize
2. Solr
My schema contains about a hundred fields of various types (int,
strings, plain text, emails).
I was wondering what the common practice is for searching free text over
the index. Assuming there are no boosts related to field matching, these
are the options I see:
1. Index and query a
By field aliasing I meant something like: f.all_fields.qf=*_txt+*_s+*_int,
which would sum up to 100 fields
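For reference, edismax field aliasing itself takes an explicit field list rather than wildcards; with hypothetical field names the request params look like:

```
q=some free text&defType=edismax&qf=all_fields&f.all_fields.qf=title_txt body_txt mail_s
```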
On Wed, Jun 26, 2013 at 12:00 AM, Manuel Le Normand
manuel.lenorm...@gmail.com wrote:
My schema contains about a hundred fields of various types (int,
strings, plain text, emails).
I
Hello all,
Assuming I have a single shard with a single core, how do I run
multi-threaded queries on Solr 4.x?
Specifically, if one user sends a heavy query (a legitimate wildcard query
running for 10 sec), what happens to all other users querying during this
period? If the response is that simultaneous
Hello again,
After a heavy query on my index (returning 100K docs in a single query) my
JVM heap floods and I get a Java OOM exception, and then my GC cannot
collect anything (GC overhead limit exceeded) as these memory chunks are
not disposable.
I want to afford queries like this, my
not get the JVM heap flooded (for
example I already have everything cached and my RAM IOs are very fast)
On Mon, Jun 17, 2013 at 11:47 PM, Walter Underwood wun...@wunderwood.org wrote:
Don't request 100K docs in a single query. Fetch them in smaller batches.
wunder
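A sketch of the batched approach: page through results with start/rows instead of one huge request. fetch_page is a stand-in for the HTTP call to /select; all names here are hypothetical.

```python
# Fetch a large result set in small start/rows pages instead of one request.

def fetch_all(fetch_page, total, batch=1000):
    docs = []
    for start in range(0, total, batch):
        docs.extend(fetch_page(start=start, rows=batch))
    return docs

# Fake backend for illustration: 100K "documents" as integers.
corpus = list(range(100_000))
calls = []

def fake_page(start, rows):
    calls.append(start)
    return corpus[start:start + rows]

result = fetch_all(fake_page, total=len(corpus), batch=1000)
print(len(result), len(calls))  # 100000 100
```

Each page only materializes 1000 docs on the server, which keeps the response-building memory bounded.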
On Jun 17, 2013, at 1:44 PM, Manuel Le
in the OS - they all get a slice of the CPU time to
do their work. Not sure if that answers your question...?
Otis
--
Solr ElasticSearch Support
http://sematext.com/
On Mon, Jun 17, 2013 at 4:32 PM, Manuel Le Normand
manuel.lenorm...@gmail.com wrote:
Hello all,
Assuming I have a single
OOM because the JVM does not
have enough memory to build a response with 100K documents.
wunder
On Jun 17, 2013, at 1:57 PM, Manuel Le Normand wrote:
One of my users requested it; they are less aware of what's allowed and I
don't want to block them a priori for long specific requests
Ok! Will check eventually if it's an ACE issue and will upload the stack
trace in case something else is throwing these exceptions...
Thanks meanwhile
On Mon, May 13, 2013 at 12:11 AM, Shawn Heisey s...@elyograg.org wrote:
On 5/12/2013 2:37 PM, Manuel Le Normand wrote:
The upgrade from
Hi there,
Looking at one of my shards (about 1M docs) I see a lot of unique terms, more
than 8M, which is a significant part of my total term count. These are very
likely useless terms, binaries or other meaningless numbers that come with
a few of my docs.
I am totally fine with deleting them so these
Hello,
Since I replicated my shards (I have 2 cores per shard now), I get a
remarkable degradation in qTime. I assume it happens since my memory has to
be split across twice as many cores as it used to.
In my low-qps use-case, I use replications as shard backup only (in
case one of my servers goes
Can happen for various reasons.
Can you recreate the situation - meaning, would restarting the servlet or
server start with good qTime and degrade from that point? How fast does
this happen?
Start by monitoring the JVM process, with Oracle VisualVM for example.
Monitor for frequent garbage
Hello,
After creating a distributed collection on several different servers I
sometimes have to deal with failing servers (cores appear not available =
grey) or failing cores (down / unable to recover = brown / red).
In case I wish to delete this erroneous collection (through the collection
API) only
On the query side, another downside I see would be that for a given memory
pool, you'd have to share it with more cores because every replica uses
its own cache.
True for the inner Solr caching (JVM heap) and OS caching as well.
Adding a replicated core creates a new data set (index) that will
Hi,
We have different working hours, sorry for the reply delay. Your assumed
numbers are right, about 25-30KB per doc, giving a total of 15GB per shard;
there are two shards per server (+2 slaves that should do no work normally).
An average query has about 30 conditions (OR and AND mixed), most of them
a
response-merge (CPU resource) bottleneck?
Thanks in advance,
Manu
On Mon, Apr 8, 2013 at 10:19 PM, Shawn Heisey s...@elyograg.org wrote:
On 4/8/2013 12:19 PM, Manuel Le Normand wrote:
It seems that sharding my collection to many shards slowed down
unreasonably, and I'm trying to investigate why
After taking a look at what I wrote earlier, I will try to rephrase in a
clearer manner.
It seems that sharding my collection to many shards slowed down
unreasonably, and I'm trying to investigate why.
First, I created collection1 - a 4-shard, replicationFactor=1 collection on
2 servers. Second, I
Hello
After performing a benchmark session at small scale I moved to full scale
on 16 quad-core servers.
Observations at small scale gave me excellent qTime (about 150 ms) with up
to 2 servers, showing my searching thread was mainly CPU bound. My query
set is not faceted.
Growing to full scale
Your question is typically use-case dependent; the bottleneck will change
from user to user.
These are the two main issues that will affect the answer:
1. How do you index: what is your indexing rate (how many docs a day)? How
big is a typical document? How many documents do you plan on indexing in
specific questions related to
optimizations, but I think it's worth trying the suggestions above and
avoiding optimizations altogether. I'm pretty sure the answer to #1 is no,
and for #2, it optimizes independently.
Cheers,
Tim
On Sat, Mar 2, 2013 at 10:24 AM, Manuel Le Normand
with a single
thread? Because Solr uses multiple threads to search AFAIK.
Best
Erick
On Wed, Feb 20, 2013 at 4:01 AM, Manuel Le Normand
manuel.lenorm...@gmail.com wrote:
More to it, I do see 75 more threads under the tomcat6 process, but only
a single one is working while