Hi,
Has anybody tried to embed sorting based on the Hamming distance on
a certain field into Solr? http://en.wikipedia.org/wiki/Hamming_distance
E.g. having a document doc1 with a field doc_hash:12345678 and doc2 with
doc_hash:12345699.
When searching for doc_hash:123456780 the sort order should be
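(For reference, the bitwise Hamming distance between two numeric hashes is cheap to compute in Java; a sketch, assuming the hashes fit in a 64-bit long:)
// Hamming distance = number of bit positions in which the hashes differ
long a = 12345678L;
long b = 12345699L;
int distance = Long.bitCount(a ^ b);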
Are all the replicas up?
Did you check if there is enough space on the disk?
How are you running the queries?
-
Thanks,
Michael
I have a SolrCloud setup using Solr 4.6 with several configuration sets
and multiple collections, some sharing the same config set.
I would now like to update the schema inside a config set, adding a new
field.
1. Can I do this by directly downloading the schema file and re-uploading it after
On Google a user can query using operators like + or - and quote the
desired term in order to get an exact match.
Does something like this come by default with the edismax parser?
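(It does; edismax accepts +, -, and quoted phrases out of the box. A sketch of such a request; the field names in qf are illustrative:)
http://localhost:8983/solr/collection1/select?defType=edismax&qf=title+description&q=%2Bjohn+-doe+%22exact+phrase%22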
-
Thanks,
Michael
Investigating, it looks like the payload.bytes property is where the problem
is.
payload.toString() outputs correct values, but the .bytes property seems to
behave a little weird:
public class CustomSimilarity extends DefaultSimilarity {
@Override
public float scorePayload(int doc, int start, int end, BytesRef payload) {
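(For reference, the usual gotcha here: BytesRef.bytes can be a larger shared buffer, so the float must be decoded starting at the ref's offset, not at index 0. A minimal sketch of the decode inside scorePayload:)
import org.apache.lucene.analysis.payloads.PayloadHelper;
// decode from payload.offset -- payload.bytes may be a shared buffer
float value = PayloadHelper.decodeFloat(payload.bytes, payload.offset);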
Yes, it's float:
<filter class="solr.DelimitedPayloadTokenFilterFactory" encoder="float" delimiter="|"/>
The scenario is simple to replicate - default solr-4.6.0 example, with a
custom Similarity class (the one above) and a custom query parser (again,
listed above).
I posted the docs in XML format (docs
Hi Markus,
Do you have any examples/tutorials of your payloads-in-custom-filter
implementation?
I really want to get payloads working, in any way.
Thanks!
-
Thanks,
Michael
Hi Ahmet,
Yes, I did, and I also tried various scenarios with the same outcome. I used the
stock example, with minimal customization (custom similarity and query
parser).
-
Thanks,
Michael
Hi,
I'm trying to test payloads in Solr
Using Solr 4.6.0 and the example configuration, I posted 3 docs to Solr:
<add>
<doc>
<field name="id">1</field>
<field name="title">Doc one</field>
<field name="payloads">testone|100 testtwo|30 testthree|5</field>
<field name="text">I testone, you testtwo, they
Actually, I just checked the debugQuery output: they all have the same score:
"explain": {
  "1": "
    0.24276763 = (MATCH) weight(text:testone in 0) [DefaultSimilarity], result of:
      0.24276763 = fieldWeight in 0, product of:
        1.0 = tf(freq=1.0), with freq of:
          1.0 = termFreq=1.0
Thanks iorixxx,
Actually I've just tried it and I hit a small wall: the tutorial doesn't seem
to be up to date with the codebase.
When implementing my custom similarity class I should be using
PayloadHelper, but the following happens:
in PayloadHelper:
public static final float decodeFloat(byte []
Thanks, that indeed fixed the problem.
Now I've created a custom Similarity class and used it in schema.xml.
The problem now is that the calculated payload score is the same for all docs:
public class CustomSolrSimilarity extends DefaultSimilarity {
@Override
public float scorePayload(int doc, int start, int end, BytesRef payload) {
Thanks Eric,
I did create a custom query parser, which seems to work just fine. My only
problem now is the one above, with all docs having the same score for some
reason.
See below the query parser:
import org.apache.commons.lang.StringUtils;
import org.apache.lucene.index.Term;
import
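(For anyone following along, here is a minimal sketch of such a payload query parser for Solr 4.x, built on PayloadTermQuery; the class name is illustrative, not Michael's actual code:)
import org.apache.lucene.index.Term;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.payloads.AveragePayloadFunction;
import org.apache.lucene.search.payloads.PayloadTermQuery;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.common.util.NamedList;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.search.QParser;
import org.apache.solr.search.QParserPlugin;
import org.apache.solr.search.SyntaxError;

public class PayloadQParserPlugin extends QParserPlugin {
  @Override
  public void init(NamedList args) {}

  @Override
  public QParser createParser(String qstr, SolrParams localParams,
                              SolrParams params, SolrQueryRequest req) {
    return new QParser(qstr, localParams, params, req) {
      @Override
      public Query parse() throws SyntaxError {
        // expects field:term syntax, e.g. payloads:testone
        String[] parts = qstr.split(":", 2);
        // scores the term by its payload, averaged across occurrences
        return new PayloadTermQuery(new Term(parts[0], parts[1]),
                                    new AveragePayloadFunction());
      }
    };
  }
}
It would then get registered in solrconfig.xml with a queryParser element, e.g. <queryParser name="payload" class="PayloadQParserPlugin"/>.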
Correction:
I observed a pattern: the returned score is the same for all docs and equals
the payload of the term in the first doc:
http://localhost:8983/solr/collection1/pds-search?q=payloads:testone&wt=json&indent=true&debugQuery=true
---
"explain": {
  "1": "
    15.4 = (MATCH)
Hi,
I'm not really sure how/if payloads work (I tried out Rafal Kuc's payload
example in the Apache Solr 4 Cookbook and it did not do what I was expecting -
see below what I was expecting it to do, and please correct me if I was
looking for the wrong droid).
What I am trying to achieve is similar to the
Hi David,
They are loaded with a lot of data so avoiding a reload is of the utmost
importance.
Well, reloading a core won't cause any data loss. Is 100% availability
during the process what you need?
-
Thanks,
Michael
Hi guys, thanks for the replies!
The JSON was valid, I validated it, and the only diff between the files was
my edit.
But actually, it got fixed by itself - when I got to work today, everything
was working as it should.
Maybe it was something on my machine or browser, can't put a finger on the
Hi,
I'm trying to add SolrCloud to our internal monitoring tools and I wonder if
anybody else has proceeded in this direction and could maybe provide some tips.
I would want to be able to get from SolrCloud:
1. The status for each collection - meaning can it serve queries or not.
2. Average query
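(For what it's worth, two endpoints that expose roughly this kind of information in 4.x; the host and core names are illustrative. The zookeeper servlet serves the cluster state, and the mbeans handler serves per-handler query statistics:)
http://host:8983/solr/zookeeper?wt=json&detail=true&path=%2Fclusterstate.json
http://host:8983/solr/collection1/admin/mbeans?stats=true&cat=QUERYHANDLER&wt=json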
Hi,
Somehow my Zookeeper clusterstate has gotten messed up, and after a restart
of both Zookeeper instances and my Solr instances, in one of my collections,
for one shard, the range is now null.
Everything else is fine, but I can't index documents now because I get an
error: No active slice
Hi,
Today I changed my ZK config, removing one instance from the quorum, and then
restarted all ZKs and all Solr instances.
After this operation I noticed that one of the shards in one collection was
missing the range (range:null). The router for that collection was
compositeId.
So, I proceeded
Thanks for the reply Tim,
Yes, that was just a typo, I used cat not cate.
As for the checks, everything looks fine; my edits were:
1. updating the shard range
2. removing the header, which looked like log information, as below:
* removed header start here*
Connecting to solr3:9983
2013-12-11
I had a look, but all looks fine there too:
[Wed Dec 11 2013 17:04:41 GMT+0100 (CET)] runRoute get #/~cloud
GET tpl/cloud.html?_=1386777881244
200 OK
57ms
GET /solr/zookeeper?wt=json&_=1386777881308
200 OK
509ms
GET /solr/zookeeper?wt=json&path=%2Flive_nodes&_=1386777881822
200 OK
62ms
What are your Solr startup parameters (java options)?
You can assign more memory to the JVM by specifying -Xmx10g or whichever
value works for you.
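For example, with the stock Jetty setup (a sketch; adjust the heap size to your machine):
java -Xms10g -Xmx10g -jar start.jar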
-
Thanks,
Michael
Use the Core API, which provides the UNLOAD operation.
Simply unload the cores you don't need and they'll be automatically
removed from SolrCloud. You can also specify options like deleteDataDir or
deleteIndex to clean up the disk space, or you can do it in your script.
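A sketch of such a call (the core name is illustrative):
http://localhost:8983/solr/admin/cores?action=UNLOAD&core=collection1_shard1_replica1&deleteDataDir=true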
As Shawn stated above, when you start up Solr there will be no such thing as
caches or old searchers.
If you want to warm up, you can rely only on firstSearcher and newSearcher
queries.
/What would happen to the autowarmed queries, cache, old searcher now?/
They're all gone.
-
Thanks,
Caches are only valid as long as the IndexSearcher is valid. So, if you make
a commit that opens a new searcher, the caches will be invalidated.
However, in this scenario you can configure your caches so that the new
searcher will keep a certain number of cache entries from the previous one
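In solrconfig.xml that is the autowarmCount attribute (values illustrative):
<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="128"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="64"/>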
You could just add the queries you have set up in your batch script to the
firstSearcher queries. That way, you wouldn't need to run the script
every time you restart Solr.
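A typical firstSearcher listener in solrconfig.xml looks like this (the queries themselves are illustrative):
<listener event="firstSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst><str name="q">red shoes</str><str name="sort">price asc</str></lst>
    <lst><str name="q">*:*</str></lst>
  </arr>
</listener>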
As for crash protection and immediate action, that's outside the scope of
the Solr mailing list. You could set up a watchdog
Hi,
There's nothing unusual in what you are trying to do; this scenario is very
common.
To answer your questions:
1. as I understand I can separate the configs of each collection in
zookeeper. is it correct?
Yes, that's correct. You'll have to upload your configs to ZK and use the
Use HTTP basic authentication, set up in your servlet container
(Jetty/Tomcat).
That should work fine if you are *not* using SolrCloud.
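For the record, the standard servlet-spec way to do this in Jetty is a security-constraint plus login-config in etc/webdefault.xml, backed by a realm file (a sketch; the role and realm names are illustrative):
<security-constraint>
  <web-resource-collection>
    <url-pattern>/*</url-pattern>
  </web-resource-collection>
  <auth-constraint>
    <role-name>search-role</role-name>
  </auth-constraint>
</security-constraint>
<login-config>
  <auth-method>BASIC</auth-method>
  <realm-name>Solr Realm</realm-name>
</login-config>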
-
Thanks,
Michael
Maybe you could achieve write/read access limitation by setting up path-based
authentication:
http://wiki.apache.org/solr/SolrSecurity#Path_Based_Authentication
The update handler /solr/core/update should be protected by
authentication, with credentials known only to you. But then of course, your
I encountered this problem often when I restarted a Solr instance more than
once before replication had finished.
I would then have multiple timestamped directories besides the index directory.
However, index.properties points to the active index directory.
The moment when the replication
Hi,
The Collections API provides some more options that will prove very
useful to you:
/admin/collections?action=CREATE&name=name&numShards=number&replicationFactor=number&maxShardsPerNode=number&createNodeSet=nodelist&collection.configName=configname
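For example (the collection and config names are illustrative):
http://localhost:8983/solr/admin/collections?action=CREATE&name=mycollection&numShards=3&replicationFactor=2&maxShardsPerNode=2&collection.configName=myconfig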
Have a look at:
A few weeks ago optimization in SolrCloud was discussed in this thread:
http://lucene.472066.n3.nabble.com/SolrCloud-optimizing-a-core-triggers-optimization-of-another-td4097499.html#a4098020
That thread covered distributed optimization inside a collection.
My use case requires manually
Thanks Erick!
That's a really interesting idea, i'll try it!
Another question would be: when does the merging actually happen? Is it
triggered or conditioned by something?
Currently I have a core with ~13M maxDocs and ~3M deleted docs, and although
I see a lot of merges in SPM, deleted
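(For context: merging is driven by the merge policy configured in the indexConfig section of solrconfig.xml. A sketch of tuning the default TieredMergePolicy; the values are illustrative:)
<mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
  <int name="maxMergeAtOnce">10</int>
  <int name="segmentsPerTier">10</int>
  <!-- higher values favor merges that reclaim deleted docs -->
  <double name="reclaimDeletesWeight">3.0</double>
</mergePolicy>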
Do you do your commits from the two indexing clients, or do you have
autocommit set to maxDocs = 1000?
-
Thanks,
Michael
I did something like that also, and I was getting some nasty problems when
one of my clients would try to commit before a commit issued by another one
had finished. It might be the same problem for you.
Try not doing explicit commits from the indexing clients and instead set the
autocommit
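in solrconfig.xml, e.g. (values illustrative):
<autoCommit>
  <maxDocs>10000</maxDocs>
  <maxTime>60000</maxTime>
  <openSearcher>false</openSearcher>
</autoCommit>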
You'll have to provide some more details on your problem. What do you mean by
location A and B: 2 different machines?
By default SolrCloud shards can have replicas, which can be hosted on
different machines. This offers you redundancy: if one of your machines
dies, your search system will still
Thank you, Peter!
Last weekend I was up until 4am trying to understand why Solr was starting
so, so, sooo slowly, when I had given it enough memory to fit the entire index.
And then I remembered your trick used on the m3.xlarge machines, tried it,
and it worked like a charm!
Thank you again!
-
Thanks for the comments, Shalin. I ended up doing just that, reindexing from
the ground up.
-
Thanks,
Michael
From my understanding, if your already existing cluster satisfies your
collection (live nodes >= numShards * replicationFactor), there
wouldn't be any need to create additional replicas on the new server,
unless you directly ask for them after startup.
I usually just add the machine to
Here's the background of this topic:
I have set up a collection with 4 shards, replicationFactor=2, on two
machines.
I started to index documents, but after hitting some update deadlocks and
restarting servers, my shard ranges in the ZK state got nulled (I'm using
implicit routing). Indexing continued
Is it possible to create a replica of a shard (collection1_shard1_replica1),
in SolrCloud, by copying the collection1_shard1_replica1/ directory to the
new server, updating core.properties and restarting Solr on that machine?
Would this be faster than using the Core API to create a new core and
The first time you run a query it's always slower, because it reads data from
disk.
After the first query, caches are built and stored in RAM, so the
second run of that query will hit the caches and be noticeably faster.
To change how slow the first query is, play around with your firstSearcher
and
Thank you!
I suspect that maybe my box was too small.
I'm upgrading my machines to more CPU and RAM, and let's see how it goes from
there.
Would limiting the number of returned fields to a smaller value make
any improvement?
The behaviour I noticed was:
at start=0&rows=10 avg qtime after
Solr's query handler statistics are pretty neat: avg time per request, avg
requests in the last 5/15 min and so on.
But when using SolrCloud's distributed search, each core gets multiple
requests, making it hard to check the actual query time (the time
from when a leader gets the query request
I've set up my SolrCloud using AWS and I'm currently using 2 average machines.
I'm planning to add one more, bigger machine (by bigger I mean double the
RAM).
If they all work in a cluster, with the search being distributed, will the
smaller machines limit the performance the bigger machine could
I saw that some time ago there was a JIRA ticket discussing this, but I still
found no relevant information on how to deal with it.
When working with a big number of docs (e.g. 70M in my case), I'm using
start=0&rows=30 in my requests.
For the first request the query time is OK; the next one is visibly
Thank you, Erick!
-
Thanks,
Michael
I have a SolrCloud cluster holding 4 collections, each with 3 shards and
replication factor = 2.
They all live on 2 machines, and I am currently using this setup for
testing.
However, I would like to connect this test setup to our live application,
just for benchmarking and evaluating whether it
Thanks Jack!
I tried it and I get a really funny behaviour: I have two collections
with the same solrconfig.xml and the same schema definition, except for
the type of some fields, which in collection_DE are customized for the German
language and in collection_US for English
fieldType
One more thing I just noticed:
if for collection_US I search for
title:blue hat^100 OR text:blue hat^50 - I get the same error,
but if I search for:
title:blue hat^100 OR text:bluehat^50 - it works fine
-
Thanks,
Michael
I ran a set of queries using the Admin UI and some of them trigger a weird
error:
"error": {
  "msg": "org.apache.solr.client.solrj.SolrServerException: No live SolrServers available to handle this request:...",
  "code": 500
}
Here's the pattern, using the edismax parser:
title:blue hat OR
Thanks Jack!
Some more info: I looked a little and tried the problem query,
undistributed, on each shard.
shard2_replica1 and shard2_replica2 throw this error:
"responseHeader": {
  "status": 500,
  "QTime": 2,
  "params": {
    "lowercaseOperators": "true",
    "indent": "true",
    "q": "title:\"red shoes\""
I also narrowed my problem down to the text field.
The simple query title:red shoes works,
but text:red shoes does not.
Could you expand a little on how my schema could have omitted position
information?
I'm not really sure what you mean by that.
Thank you!
-
Thanks,
Michael
After restarting my servers, this was the first error I got when trying to
run the same query:
{
  "responseHeader": {
    "status": 500,
    "QTime": 336,
    "params": {
      "lowercaseOperators": "true",
      "indent": "true",
      "q": "text:\"blue cat\"",
      "distrib": "false",
      "stopwords": "true",
      "wt": "json",
I'm currently using a SolrCloud setup and I index my data using a couple of
in-house indexing clients.
The clients process some files and post JSON messages containing the added
documents in batches.
Initially my batch size was 100k docs and the post request took about 20-30
secs.
I switched to 10k
For maximum search accuracy on my SolrCloud system I was thinking of
combining phrase search with term search in the following way:
search term: john doe
search fields: title, description - a match in the title is more relevant
than one in the description
What I want to achieve - the following
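(One common way to express that kind of weighting is edismax's qf/pf parameters: qf sets per-term field boosts, and pf adds an extra boost when the whole phrase matches in a field. A sketch; the boost values are illustrative:)
http://localhost:8983/solr/collection1/select?defType=edismax&q=john+doe&qf=title^2+description&pf=title^5+description^2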
Thanks @Mark @Erick
Should I create a JIRA issue for this?
-
Thanks,
Michael
How do you add the documents to the index - one by one, or in batches of n?
When do you do your commits?
Because 8k docs per day is not a lot. Depending on the above, committing with
softCommit=true might also be a solution.
-
Thanks,
Michael
When one of your shards dies, your index becomes incomplete. By default
querying is distributed (to all shards - distrib=true), and if one of them
(shard X) is down, you get an error stating that there are no servers
hosting shard X.
If the other shards are still up you can query them
You're describing two different entities: Job and Employee.
Since they are clearly different in every way, you will need two different
cores with two different schemas.
-
Thanks,
Michael
Put *:* in the q field.
Then check the facet checkbox (look lower, close to the Execute button) and
insert Name in the facet.field box.
This should do the trick.
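The equivalent request URL would be something like:
http://localhost:8983/solr/collection1/select?q=*:*&facet=true&facet.field=Name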
-
Thanks,
Michael
I don't see the mentioned attachment.
Try using http://snag.gy/ to provide it.
As for where you find it, the default is
http://localhost:8983/solr/collection1/query
-
Thanks,
Michael
Maybe this can help you:
https://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.SynonymFilterFactory
-
Thanks,
Michael
Thanks Erick! I will try specifying the distrib parameter.
As for why I am optimizing: well, I do lots of deletes by id and by query, and
after a while about 30% of maxDocs are deletedDocs. On a 50G index that
means about 15G of space, which I am trying to free by doing the
optimization.
it's
Hi!
I have a SolrCloud setup on two servers: 3 shards, replicationFactor=2.
Today I triggered optimization on core *shard2_replica2*, which only
contained 3M docs and 2.7G.
The sizes of the other shards were shard3=2.7G and shard1=48G (the routing is
implicit, but after some update deadlocks and
For filtering categories I'm using something like this:
fq=category:(cat1 OR cat2 OR cat3)
-
Thanks,
Michael
Being given
<field name="title" type="string" indexed="false" stored="true" multiValued="false"/>
Changed to
<field name="title" type="string" indexed="true" stored="true" multiValued="false"/>
Once the above is done and the collection reloaded, is there a way I can
build the index on that field, without
I've made a test based on your suggestion.
Using the example in 4.5.0, I set the title field as indexed=false and indexed
a couple of docs:
<add>
<doc>
<field name="id">1</field>
<field name="title" update="set">BigApple</field>
</doc>
<doc>
<field name="id">2</field>
<field name="title" update="set">SmallApple</field>
</doc>
Hi!
I'm currently using Solr 4.4.0 and I'm having quite some trouble with
* SOLR-5216: Document updates to SolrCloud can cause a distributed deadlock.
(Mark Miller)
which should be fixed in 4.6.0.
Where could I get Solr 4.6.0 from? I want to run some tests regarding this
fix.
Thank you!
Thanks Chris, Rafal!
So the problem actually persists in 4.6.
I'll then watch this issue and cheer for Mark's fix.
-
Thanks,
Michael
Thank you, Otis!
I've integrated SPM on my Solr instances and now I have access to
monitoring data.
Could you give me some hints on which metrics I should watch?
Below I've added my query configs. Is there anything I could tweak here?
<query>
<maxBooleanClauses>1024</maxBooleanClauses>
Hmm, no, I haven't...
What would be the effect of this?
-
Thanks,
Michael
I'm using the m3.xlarge server with 15G RAM, but my index size is over 100G,
so I guess running the above command would eat all available
memory.
-
Thanks,
Michael
I have a SolrCloud environment with 4 shards, each having a replica and a
leader. The index size is about 70M docs and 60Gb, running with Jetty +
Zookeeper on 2 EC2 instances, each with 4 CPUs and 15G RAM.
I'm using SolrMeter for stress testing.
If I restart Jetty and then try to use SolrMeter to
The question was also asked some 10 months ago in
http://lucene.472066.n3.nabble.com/SolrCloud-4-1-change-config-set-for-a-collection-td4037456.html,
and then the answer was negative, but here it goes again; maybe now it's
different.
Is it possible to change the config set of a collection using the
Thank you, Shawn!
linkconfig - that's exactly what I was looking for!
Thanks Garth!
Yes, indeed, I know that issue.
I had set up my SolrCloud using 4.5.0 and then encountered this problem, so
I rolled back to 4.4.0.
-
Thanks,
Michael
Thanks Erick!
The version is 4.4.0.
I'm posting batches of 100k docs every 30-40 sec from each indexing client, and
sometimes two or more clients post within a very small timeframe. That's when I
think the deadlock happens.
I'll try to replicate the problem and check the thread dump.
I got the trace from jstack.
I found references to a semaphore, but I'm not sure if this is what you meant.
Here's the trace:
http://pastebin.com/15QKAz7U
I have set up a SolrCloud system with 3 shards, replicationFactor=3, on 3
machines, along with 3 Zookeeper instances.
My web application sends its queries to Solr specifying the hostname of one of
the machines, so that machine will always get the request and the other ones
will just serve as an aid.
So
Thanks!
I've read a lil' bit about that, but my app is PHP-based so I'm afraid I
can't use that.
Assuming that you are using the Admin UI:
The instanceDir must already exist (in your case index1).
Inside it there should be a conf/ directory holding the configuration files.
In the config field, insert only the file name (like solrconfig.xml), which
should be found in the conf/ directory
Thanks!
Could you provide some examples or details of the configuration you use?
I think this solution would suit me also.
Here are some of Solr's last words (log content before it stopped accepting
updates); maybe someone can help me interpret them.
http://pastebin.com/mv7fH62H