Re: matching starts with only
I've changed the field's type to string (the default string type defined in schema.xml), and I got what I needed. Thanks for your time.
-- View this message in context: http://lucene.472066.n3.nabble.com/matching-starts-with-only-tp4094430p4094637.html Sent from the Solr - User mailing list archive at Nabble.com.
matching starts with only
My index contains documents that are either a single word or a short sentence of up to 4-5 words. I need to return only the documents that start with the searched pattern. In regex terms it would be ^my_query.

For example, given these documents:
  black
  beautiful black cat
  cat
  cat is black
  black cat
the query black should return only "black" and "black cat".

The text field I'm using is as follows:

<fieldType name="text_general_aa" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="4" maxGramSize="15" side="front"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="4" maxGramSize="15" side="front"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

Solr version is 4.2.
Thanks!
-- View this message in context: http://lucene.472066.n3.nabble.com/matching-starts-with-only-tp4094430.html
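A toy model in plain Python (no Solr involved) shows why this field type cannot express "starts with". Because the field is whitespace-tokenized and n-grammed, a query shares terms with any document that contains the word anywhere. A single whole-value term with a prefix test only looks at the start. The model assumes front (edge) grams, which is what side="front" suggests. That attribute belongs to the edge n-gram filter. With grams taken from every position, the result for this sample data would be the same.

```python
# Toy model, not Solr: front n-grams per whitespace token vs. one whole-value term.
DOCS = ["black", "beautiful black cat", "cat", "cat is black", "black cat"]


def edge_ngrams(token, min_gram=4, max_gram=15):
    """Front n-grams of one token, e.g. 'black' -> ['blac', 'black']."""
    return [token[:n] for n in range(min_gram, min(max_gram, len(token)) + 1)]


def ngram_terms(text):
    """Whitespace-tokenize, lowercase, then emit front n-grams for each token."""
    terms = set()
    for token in text.lower().split():
        terms.update(edge_ngrams(token))
    return terms


def match_ngram_field(query):
    # Any shared gram is enough, no matter where the word sits in the document
    q_terms = ngram_terms(query)
    return [d for d in DOCS if q_terms & ngram_terms(d)]


def match_keyword_prefix(query):
    # Whole value as one lowercased term: a prefix test only sees the start
    return [d for d in DOCS if d.lower().startswith(query.lower())]


print(match_ngram_field("black"))     # ['black', 'beautiful black cat', 'cat is black', 'black cat']
print(match_keyword_prefix("black"))  # ['black', 'black cat']
```

A side effect is visible here too. With minGramSize=4, a three-letter word such as "cat" produces no grams at all.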
Re: matching starts with only
Shawn Heisey-4: thanks for the quick response. Why does this field have to be a copyField? Couldn't it be a single field, for example:

<fieldType name="text_general_long" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
<field name="my_name" type="text_general_long" stored="true" multiValued="false" required="false"/>

Thanks.
-- View this message in context: http://lucene.472066.n3.nabble.com/matching-starts-with-only-tp4094430p4094447.html
Re: matching starts with only
Searching by "starts with" is something new I have to add, as is the data I have to index for it, so it's OK to create a new field. But after I added the following field type:

<fieldType name="text_general_long" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

and the field:

<field name="my_name" type="text_general_long" stored="true" multiValued="false" required="false"/>

and reindexed, searching by my_name:/^black/ returns no results, while searching by my_name:black returns only the "black" document. What am I missing? Thanks.
-- View this message in context: http://lucene.472066.n3.nabble.com/matching-starts-with-only-tp4094430p4094453.html
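The behaviour described is consistent with how a KeywordTokenizer field is indexed. Each value becomes exactly one (lowercased) term. A plain term query such as my_name:black is therefore an exact match on the whole value, which is why only "black" comes back. A prefix query (my_name:black*) expresses "starts with". So does a regular expression that covers the whole term, such as /black.*/, because Lucene regular expressions must match the entire term.

The sketch below models those three query types in Python. It assumes the query side is analyzed the same way as the index side. Python's `re` syntax is not Lucene's RegExp syntax, and `re.fullmatch` only mimics the whole-term behaviour for simple patterns like these.

```python
import re

# Assumption: KeywordTokenizer + LowerCaseFilter index each value as one term.
INDEXED_TERMS = [d.lower() for d in
                 ["black", "beautiful black cat", "cat", "cat is black", "black cat"]]


def term_query(value):
    # my_name:black -> exact match against the single indexed term
    return [t for t in INDEXED_TERMS if t == value.lower()]


def prefix_query(prefix):
    # my_name:black* -> terms that start with the prefix
    return [t for t in INDEXED_TERMS if t.startswith(prefix.lower())]


def regexp_query(pattern):
    # Lucene regexps must match the whole term; re.fullmatch mimics that here
    return [t for t in INDEXED_TERMS if re.fullmatch(pattern, t)]


print(term_query("black"))       # ['black']
print(prefix_query("black"))     # ['black', 'black cat']
print(regexp_query("black.*"))   # ['black', 'black cat']
```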
too many boolean clauses
I got: SyntaxError: Cannot parse 'name:Bbbbm'
Using Solr 4.2.1. The name field type definition:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="1" preserveOriginal="1" types="characters.txt"/>
    <filter class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="15"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="1" splitOnCaseChange="1" splitOnNumerics="1" preserveOriginal="1" types="characters.txt"/>
    <filter class="solr.NGramTokenizerFactory" minGramSize="2" maxGramSize="15"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>

Any ideas how to fix it? Thanks
-- View this message in context: http://lucene.472066.n3.nabble.com/too-many-boolean-clauses-tp4065288.html
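The query analyzer above n-grams every query token with minGramSize=2 and maxGramSize=15, so each token expands into many terms. The arithmetic below gives a rough upper bound on the term count. It assumes every raw gram becomes its own clause and counts grams before lowercasing and duplicate removal. How the grams are really combined depends on the query parser. The WordDelimiter outputs (parts, catenations, preserved original) and synonym expansion would multiply the count further. The ten-word query is a hypothetical example; 1024 is the default maxBooleanClauses in Solr 4's solrconfig.xml.

```python
MAX_BOOLEAN_CLAUSES = 1024  # default maxBooleanClauses in Solr 4's solrconfig.xml


def ngram_count(length, min_gram=2, max_gram=15):
    """Raw grams an n-gram step emits for one token of `length` characters."""
    top = min(max_gram, length)
    return sum(length - n + 1 for n in range(min_gram, top + 1))


def clauses_for(tokens, min_gram=2, max_gram=15):
    """Upper bound on clauses if every gram of every token became a clause."""
    return sum(ngram_count(len(t), min_gram, max_gram) for t in tokens)


print(clauses_for(["Bbbbm"]))        # 10
print(clauses_for(["x" * 15] * 10))  # 1050, already over the default limit
```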
Re: too many boolean clauses
First of all, thanks for the response! Regarding the two tokenizers: that's OK. Switching to NGramFilterFactory didn't help (I didn't reindex, but I don't think that was needed since I only changed the 'query' section). Now, regarding maxBooleanClauses: how does increasing it affect performance (response times, memory usage)? Thanks
-- View this message in context: http://lucene.472066.n3.nabble.com/too-many-boolean-clauses-tp4065288p4065314.html
RE: Poll: Largest SolrCloud out there?
4 AWS hosts:
  Memory: 30822868k total
  CPU: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz x8
17M docs, 5 GB index.
8 master-slave shards (2 shards per host).
57 ms/query average (~110K queries per 24 hours).
-- View this message in context: http://lucene.472066.n3.nabble.com/Poll-Largest-SolrCloud-out-there-tp4043293p4046915.html
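Turning these figures into a request rate makes them easier to compare with other setups. The sketch below uses only the numbers above plus Little's law (average concurrency = arrival rate × average latency). It is a daily average and says nothing about peak load.

```python
SECONDS_PER_DAY = 24 * 60 * 60

queries_per_day = 110_000  # "~110K queries per 24 hours"
avg_latency_s = 0.057      # "57 ms/query average"

qps = queries_per_day / SECONDS_PER_DAY
in_flight = qps * avg_latency_s  # Little's law: L = lambda * W

print(f"{qps:.2f} queries/s on average, {in_flight:.3f} queries in flight")
```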
Re: ping query frequency
*Shawn:* here you go: http://wiki.solarium-project.org/index.php/V1:Ping_query
As for my cloud, the average response time for a ping request is 4 ms, but several pings take as long as 3 seconds (about 10 of those a day).
-- View this message in context: http://lucene.472066.n3.nabble.com/ping-query-frequency-tp4044305p4044472.html
ping query frequency
Hi, I'm wondering how often this query should be made. Currently it is sent before each select request (some very old legacy code). I googled a little and found out that this is bad practice and has a performance impact. So the question is: should I remove it completely, or just run it once in a while? What is the best practice? Thanks
-- View this message in context: http://lucene.472066.n3.nabble.com/ping-query-frequency-tp4044305.html
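One common alternative to pinging before every select is to check health at most once per interval and reuse the last answer. The sketch below assumes a hypothetical `ping_fn` that wraps whatever ping call the client library makes. The 30-second interval is arbitrary, and the clock is injectable so the behaviour can be checked without waiting.

```python
import time


class CachedPing:
    """Re-check server health at most once every `interval_s` seconds.

    `ping_fn` is a hypothetical callable returning True when the server
    answered its ping; `clock` is injectable for testing.
    """

    def __init__(self, ping_fn, interval_s=30.0, clock=time.monotonic):
        self._ping_fn = ping_fn
        self._interval_s = interval_s
        self._clock = clock
        self._last_check = None
        self._last_result = False

    def is_healthy(self):
        now = self._clock()
        # Only hit the server when the cached answer is missing or stale
        if self._last_check is None or now - self._last_check >= self._interval_s:
            self._last_result = self._ping_fn()
            self._last_check = now
        return self._last_result


# Usage with a fake clock: six checks over 31 "seconds" trigger two real pings
calls = []
fake_now = [0.0]
checker = CachedPing(lambda: calls.append(1) or True, interval_s=30,
                     clock=lambda: fake_now[0])
for t in (0, 1, 5, 29, 30, 31):
    fake_now[0] = t
    checker.is_healthy()
print(len(calls))  # 2
```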
Re: Dropping slow queries
Thanks, but that's not exactly what I need. According to the documentation, this value *only* applies to the search and *not to requests* in general. What I need is to affect the request itself, i.e. to drop it: to tell the cloud to drop every request that takes more than x ms, no matter why (a slow search on a shard, network issues between the shards, etc.). Thanks
-- View this message in context: http://lucene.472066.n3.nabble.com/Dropping-slow-queries-tp4043074p4043592.html
Re: Dropping slow queries
Am I missing something? Which answer do you mean? And just to clarify, the question was: how can I force partial results and drop the results from the slow shard?
-- View this message in context: http://lucene.472066.n3.nabble.com/Dropping-slow-queries-tp4043074p4043549.html
Dropping slow queries
Hi, Is there a way to drop a slow query in a distributed search? In other words, is there a way to tell SolrCloud to wait x ms for the responses from the shards and to return whatever results arrived within that period (x ms)? For example, x = 10 ms and there are 4 shards in the cloud. 3 of them serve the request in 3 ms and the last one in 15 seconds, so the total time for the query is 15 seconds. I prefer to return fewer results (from the 3 fast shards) in much less time. Is there any configuration that can help achieve this? Thanks.
-- View this message in context: http://lucene.472066.n3.nabble.com/Dropping-slow-queries-tp4043074.html
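The behaviour being asked for is a scatter-gather with a deadline: send the request to every shard in parallel, wait at most x ms, and merge whatever has arrived. The sketch below shows that idea client-side in Python. It is not a SolrCloud setting. The shard calls are hypothetical stand-ins that just sleep, and the timings are scaled up from the example (100 ms deadline, 500 ms straggler) so the demo is not sensitive to scheduling jitter.

```python
import concurrent.futures as cf
import time


def gather_within_deadline(shard_calls, deadline_s):
    """Run every shard call in parallel; keep only answers that arrive in time.

    `shard_calls` maps a shard name to a zero-argument callable (hypothetical
    stand-ins for per-shard requests). Returns (results, late_shards).
    """
    pool = cf.ThreadPoolExecutor(max_workers=len(shard_calls))
    futures = {pool.submit(call): name for name, call in shard_calls.items()}
    done, not_done = cf.wait(futures, timeout=deadline_s)
    # Don't block on stragglers; whatever they return later is discarded
    pool.shutdown(wait=False)
    results = {futures[f]: f.result() for f in done if f.exception() is None}
    late = sorted(futures[f] for f in not_done)
    return results, late


def fake_shard(delay_s, docs):
    def call():
        time.sleep(delay_s)
        return docs
    return call


shards = {
    "shard1": fake_shard(0.003, ["a1", "a2"]),
    "shard2": fake_shard(0.003, ["b1"]),
    "shard3": fake_shard(0.003, ["c1"]),
    "shard4": fake_shard(0.5, ["d1"]),  # the slow shard
}
results, late = gather_within_deadline(shards, deadline_s=0.1)
print(sorted(results), late)
```

Python cannot cancel a thread that is already running. The slow call keeps running in the background, but its result is simply never used.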
Errors during index optimization on solrcloud
Hi, I'm running SolrCloud (Solr 4) with 1 core, 8 shards and ZooKeeper. My index is updated every minute, so I run an optimization once a day. Every time, during the optimization, there is an error:

SEVERE: shard update error StdNode: http://host:port/solr/core_name/
SEVERE: shard update error StdNode: http://host:port/solr/core_name/:org.apache.solr.common.SolrException: Server at http://host:port/solr/core_name/ returned non ok status:503, message:Service Unavailable

Any ideas what causes this error and how to avoid it? Thanks.
-- View this message in context: http://lucene.472066.n3.nabble.com/Errors-during-index-optimization-on-solrcloud-tp4041135.html
solrcloud-zookeeper
Hi all, the first question: is there a way to reduce the timeout when a Solr shard comes up? In the log file it looks as follows:

Feb 12, 2013 1:19:08 PM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=178992
Feb 12, 2013 1:19:09 PM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=178489
Feb 12, 2013 1:19:09 PM org.apache.solr.cloud.ShardLeaderElectionContext waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=177986

And another one: let's assume I have 2 shards and one of them is down (both master and slave) for some reason. What happens now is that the cluster returns 503 on search requests. Is there a way to configure it to get responses from the other shard? Thanks.
-- View this message in context: http://lucene.472066.n3.nabble.com/solrcloud-zookeeper-tp4039934.html
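Reading the log lines numerically helps size the wait. The timeoutin values drop by 503 between consecutive lines, and the timestamps advance by roughly half a second. That fits the values being milliseconds remaining, which would put the wait at about 179 s, roughly three minutes. The unit is an inference from the log, not something stated in it. The sketch below extracts the countdown.

```python
import re

LOG_LINES = [
    "INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=178992",
    "INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=178489",
    "INFO: Waiting until we see more replicas up: total=2 found=1 timeoutin=177986",
]

# Pull the remaining-time counter out of each line
remaining_ms = [int(re.search(r"timeoutin=(\d+)", line).group(1)) for line in LOG_LINES]
# Difference between consecutive readings = polling step
steps_ms = [a - b for a, b in zip(remaining_ms, remaining_ms[1:])]

print(remaining_ms[0] / 1000, steps_ms)  # 178.992 [503, 503]
```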
copyField vs single field
Hi, Let's assume I have to search for a string (text field) in 6-7 different fields (username, firstname, lastname, etc.). Which will perform better:
  username:test OR firstname:test OR lastname:test
or defining a copyField and searching within it, like:
  somecopyfield:test
Thanks.
-- View this message in context: http://lucene.472066.n3.nabble.com/copyField-vs-single-field-tp4038832.html
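The two options differ in how many clauses each query carries. The per-field form needs one clause per field, so one term lookup per field. The copyField form needs a single clause against a catch-all field that is filled at index time, at the cost of indexing the text twice. The helper names below are hypothetical and only build the two query strings. Only three of the 6-7 fields are listed.

```python
FIELDS = ["username", "firstname", "lastname"]  # the question mentions 6-7 fields


def per_field_query(term, fields=FIELDS):
    # One clause per field: the term is looked up in every field's index
    return " OR ".join(f"{field}:{term}" for field in fields)


def copy_field_query(term, copy_field="somecopyfield"):
    # A single clause against the catch-all field populated by copyField
    return f"{copy_field}:{term}"


print(per_field_query("test"))   # username:test OR firstname:test OR lastname:test
print(copy_field_query("test"))  # somecopyfield:test
```

One trade-off to keep in mind: a single copy field can no longer weight a username match differently from a lastname match.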
RE: dataimport.properties not created/updated with solrcloud
Well, I noticed that when I ran the full/delta import process the 2nd, 3rd, etc. time, I didn't see this exception any more. So I checked my MySQL query log to see what was going on in MySQL while the delta import ran, and the queries had the correct times on each delta-import execution. I assume this property is stored somewhere in ZooKeeper, in binary format in the $SOLR_HOME/solr/zoo_data folder.
-- View this message in context: http://lucene.472066.n3.nabble.com/dataimport-properties-not-created-updated-with-solrcloud-tp4026162p4028161.html
RE: dataimport.properties not created/updated with solrcloud
Thank you for your response. I saw SOLR-3165, but I can't locate this different location, even when searching for the file with the find command. According to the patch and the warning message I got (WARNING: Could not read DIH properties from /configs/my_collection/dataimport.properties), ZooKeeper tries to access the file at an undefined path. How can I tell ZooKeeper to use the $home/solr/my_collection/conf/ folder for creating dataimport.properties?
-- View this message in context: http://lucene.472066.n3.nabble.com/dataimport-properties-not-created-updated-with-solrcloud-tp4026162p4026371.html
Partial results returned
Hello, I'm running SolrCloud with 2 shards. Let's assume I have 100 documents indexed in total, split 55/45 between the shards. When I query, for example:

curl 'http://localhost:7500/solr/index/select?q=*:*&wt=json&indent=true&rows=0'

numFound in the response is sometimes 0, sometimes 45, 55 or 100. But when I run the query:

curl 'http://localhost:7500/solr/index/select?shards=localhost:7500/solr/index,localhost:7501/solr/index&q=*:*&wt=json&indent=true&rows=0'

it always returns the complete count of 100 documents. Am I missing some configuration? Thanks.
-- View this message in context: http://lucene.472066.n3.nabble.com/Partial-results-returned-tp4026027.html
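Hand-assembled query strings are easy to get wrong, because a missing & silently merges two parameters. The sketch below builds both select URLs with Python's urllib, which inserts and escapes the separators. It also lists every total that answers from no shard, one shard, or both shards could produce with a 55/45 split. The totals are exactly the numFound values observed: 0, 45, 55 and 100. That is consistent with the plain request sometimes being counted against only part of the cluster, but it is an inference, not a diagnosis.

```python
from itertools import combinations
from urllib.parse import parse_qs, urlencode, urlparse

BASE = "http://localhost:7500/solr/index/select"
COMMON = {"q": "*:*", "wt": "json", "indent": "true", "rows": "0"}


def select_url(shards=None):
    """Build a select URL; urlencode inserts and escapes the '&' separators."""
    params = dict(COMMON)
    if shards:
        # Explicit list of shard cores to query, as in the second curl command
        params["shards"] = ",".join(shards)
    return BASE + "?" + urlencode(params)


plain = select_url()
explicit = select_url(["localhost:7500/solr/index", "localhost:7501/solr/index"])
print(parse_qs(urlparse(explicit).query)["shards"])

# Totals reachable from answers of none, one, or both shards (55/45 split)
per_shard = [55, 45]
possible_totals = sorted({sum(c) for r in range(len(per_shard) + 1)
                          for c in combinations(per_shard, r)})
print(possible_totals)  # [0, 45, 55, 100]
```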
Re: Partial results returned
I have one collection, called index. I created it as explained at http://wiki.apache.org/solr/SolrCloud, in the "Example A: Simple two shard cluster" section. Here are the startup commands:

1) java -Dbootstrap_confdir=./solr/index/conf -Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar -Djetty.port=7500 > logs/solr_server_java.`date +%Y%m%d`.log 2>&1
2) java -Dbootstrap_confdir=./solr/index/conf -Dcollection.configName=myconf -Djetty.port=7501 -DzkHost=localhost:8500 -jar start.jar > logs/solr_server_java.`date +%Y%m%d`.log 2>&1

In my http://localhost:7500/solr/#/~cloud view there is only a chart of my collection with the shards.
-- View this message in context: http://lucene.472066.n3.nabble.com/Partial-results-returned-tp4026027p4026076.html
dataimport.properties not created/updated with solrcloud
Hi, I have a problem with updating dataimport.properties. When running a single Solr instance there is no problem at all; everything works perfectly. But when I switch to a cloud configuration with 2 shards (as described at http://wiki.apache.org/solr/SolrCloud, Example A: Simple two shard cluster), this file doesn't get updated any more. I've checked the Jetty log and it is completely free of warnings and errors. Since I'm using DIH, dataimport.properties is essential for my deltas. Read/write permissions on this file are OK. Any ideas how to fix it?
-- View this message in context: http://lucene.472066.n3.nabble.com/dataimport-properties-not-created-updated-with-solrcloud-tp4026162.html
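Why the file matters for deltas: DIH records the time of the last import under the key last_index_time, and a deltaQuery typically references it as ${dataimporter.last_index_time}. The file uses the Java properties format, in which ':' inside a value is written as '\:'. The sketch below reads such a file and recovers the timestamp. The sample contents are made-up illustrative values, not taken from this setup, and the parser handles only simple key=value lines.

```python
from datetime import datetime

# Illustrative contents (hypothetical values) in the Java properties format
SAMPLE = """\
#Wed Dec 05 10:15:00 UTC 2012
last_index_time=2012-12-05 10\\:15\\:00
"""


def read_properties(text):
    """Minimal reader for simple key=value lines; skips comments and blanks."""
    props = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith(("#", "!")):
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip().replace("\\:", ":")
    return props


props = read_properties(SAMPLE)
last_run = datetime.strptime(props["last_index_time"], "%Y-%m-%d %H:%M:%S")
print(last_run)  # 2012-12-05 10:15:00
```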
SOLR4 (sharded) and join query
Hi, I'm running a join query that looks like this: {!join from=some_id to=another_id}(a_id:55 AND some_type_id:3). When I run it on a single Solr instance I get the correct result, but when I run it on the sharded system (2 shards with a replica for each shard, ~300K entries in the index in total) I get a partial result. Is there a known issue with join queries on a sharded system, or is there some configuration tweak I'm missing? Thanks.
-- View this message in context: http://lucene.472066.n3.nabble.com/SOLR4-sharded-and-join-query-tp4024547.html
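As far as I know, Solr 4's join query parser resolves from/to values only inside each shard's local index. A match whose join partner lives on another shard is therefore lost, and the merged result comes out partial. The toy model below, in plain Python rather than Solr, reproduces that effect. The documents are hypothetical and deliberately split so that one join key crosses shards.

```python
def join(docs, from_field, to_field, inner):
    """Toy {!join}: collect from_field values of docs matching `inner`,
    then return ids of docs whose to_field holds one of those values."""
    keys = {d[from_field] for d in docs if inner(d) and from_field in d}
    return sorted(d["id"] for d in docs if to_field in d and d[to_field] in keys)


def inner(d):
    # (a_id:55 AND some_type_id:3)
    return d.get("a_id") == 55 and d.get("some_type_id") == 3


# Hypothetical documents; D's join partner E sits on the other shard
shards = {
    "shard1": [
        {"id": "A", "a_id": 55, "some_type_id": 3, "some_id": 1},
        {"id": "B", "another_id": 1},
        {"id": "E", "another_id": 2},
    ],
    "shard2": [
        {"id": "C", "another_id": 1},
        {"id": "D", "a_id": 55, "some_type_id": 3, "some_id": 2},
    ],
}

all_docs = [d for docs in shards.values() for d in docs]
single_index = join(all_docs, "some_id", "another_id", inner)
per_shard = sorted(i for docs in shards.values()
                   for i in join(docs, "some_id", "another_id", inner))

print(single_index)  # ['B', 'C', 'E']
print(per_shard)     # ['B']
```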
solr/tomcat performance.
Hi, I'm running Solr + Tomcat with the following setup: I have 16 slaves, which are queried by an aggregator, and the aggregator is queried by the users. The slaveUrls variable in solr.xml (on the aggregator) looks like:

<property name="slaveUrls" value="host01/slave01,host02/slave02,host03/slave03,...,host16/slave16" />

I'm running it on a Linux machine (not dedicated; there are some other 'heavy' processes) with 16 quad CPUs and 66 GB RAM. I ran some tests, and when I sent 400 concurrent requests to the aggregator, the host stopped responding until I restarted Tomcat. I tried to 'play' with the Tomcat/Java configuration a little, but it didn't help much; the main issues were memory usage and timeouts. Currently I'm using the following settings:

Java: -Xms256m -Xmx8192m
I tried to tweak the -XX:MinHeapFreeRatio setting, but as far as I could see no memory was returned to the OS.

Tomcat:
<Executor name="HTTPThreadPool" namePrefix="HTTPThread-" maxThreads="8000" minSpareThreads="4000"/>
<Connector executor="HTTPThreadPool" port="8080" protocol="HTTP/1.1" redirectPort="8443" URIEncoding="UTF-8" maxHttpHeaderSize="8388608" enableLookups="false" acceptCount="100" connectionTimeout="1" />

Assuming ~1000 requests/second to the aggregator, across how many aggregators should I balance the load? Or can I get better performance just by tweaking the current system? Any help/advice will be appreciated, Thanks!
-- View this message in context: http://lucene.472066.n3.nabble.com/solr-tomcat-performance-tp3727199p3727199.html
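A back-of-the-envelope capacity calculation helps frame the sizing question. It assumes every user request becomes a distributed search over all 16 slaves, so the slave tier sees 16 times the aggregator rate. The per-aggregator throughput and the average latency below are hypothetical placeholders to be replaced with numbers from a load test. The in-flight figure uses Little's law (concurrency = rate × latency).

```python
import math

SLAVES = 16         # from the setup described above
TARGET_RPS = 1000   # "~1000 requests/second" to the aggregator tier


def aggregators_needed(target_rps, per_aggregator_rps):
    """Smallest number of aggregators whose combined capacity covers the target."""
    return math.ceil(target_rps / per_aggregator_rps)


def in_flight(rps, latency_s):
    """Little's law: average concurrent requests = arrival rate * latency."""
    return rps * latency_s


# Assumption: each user request fans out to all 16 slaves
slave_rps = TARGET_RPS * SLAVES

# Hypothetical load-test results; replace with measured numbers
per_aggregator_rps = 300
avg_latency_s = 0.2

print(slave_rps)                                           # 16000
print(aggregators_needed(TARGET_RPS, per_aggregator_rps))  # 4
print(in_flight(TARGET_RPS, avg_latency_s))
```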