Re: matching starts with only

2013-10-10 Thread adm1n
I've changed the field type to string (the default type defined in
schema.xml), and I got what I needed.


thanks for your time.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/matching-starts-with-only-tp4094430p4094637.html
Sent from the Solr - User mailing list archive at Nabble.com.


matching starts with only

2013-10-09 Thread adm1n
My index contains documents that are either a single word or a short sentence
of up to 4-5 words. I need to return only the documents that start with the
searched pattern.
In regex terms it would be ^my_query.

For example, for the docs:

black
beautiful black cat
cat
cat is black
black cat

and for the query: black

only "black" and "black cat" should be returned.

The text field I'm using is as follows:
<fieldType name="text_general_aa" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="4"
            maxGramSize="15" side="front"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="4"
            maxGramSize="15" side="front"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
Solr version is 4.2
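A rough Python model of the analysis chains discussed in this thread may help show why the n-gram field above cannot express "starts with". It is a simplification: real Solr analysis also tracks token positions and offsets, and the helper names are hypothetical. To my knowledge, `side` is a parameter of EdgeNGramFilterFactory rather than NGramFilterFactory. The model therefore assumes the filter above emits grams starting at every position of every word.

```python
from typing import List, Set

def whitespace_ngram_tokens(text: str, min_gram: int = 4, max_gram: int = 15) -> Set[str]:
    """Approximates WhitespaceTokenizer + NGramFilter + LowerCaseFilter:
    every substring of length min_gram..max_gram of every word becomes a token."""
    tokens: Set[str] = set()
    for word in text.split():
        word = word.lower()
        for size in range(min_gram, max_gram + 1):
            for start in range(len(word) - size + 1):
                tokens.add(word[start:start + size])
    return tokens

def keyword_tokens(text: str) -> Set[str]:
    """Approximates KeywordTokenizer + LowerCaseFilter: the whole value is one token."""
    return {text.lower()}

def term_match(tokens: Set[str], term: str) -> bool:
    """A plain term query such as my_name:black: exact match against one indexed token."""
    return term.lower() in tokens

def prefix_match(tokens: Set[str], prefix: str) -> bool:
    """A prefix query such as my_name:black*: any indexed token starting with prefix."""
    return any(token.startswith(prefix.lower()) for token in tokens)

docs: List[str] = ["black", "beautiful black cat", "cat", "cat is black", "black cat"]

ngram_term_hits = [d for d in docs if term_match(whitespace_ngram_tokens(d), "black")]
keyword_term_hits = [d for d in docs if term_match(keyword_tokens(d), "black")]
keyword_prefix_hits = [d for d in docs if prefix_match(keyword_tokens(d), "black")]
print(ngram_term_hits)      # every doc that contains the word "black" anywhere
print(keyword_term_hits)    # ['black']
print(keyword_prefix_hits)  # ['black', 'black cat']
```

In this model, only the single-token (keyword) field combined with a prefix lookup gives the "starts with" behaviour. An exact term lookup on that field returns only the "black" document.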

thanks!



--
View this message in context: 
http://lucene.472066.n3.nabble.com/matching-starts-with-only-tp4094430.html


Re: matching starts with only

2013-10-09 Thread adm1n
Shawn Heisey-4:

thanks for the quick response.

Why does this field have to be a copyField? Couldn't it be a single field, for
example:
<fieldType name="text_general_long" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="my_name" type="text_general_long" stored="true"
       multiValued="false" required="false"/>



thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/matching-starts-with-only-tp4094430p4094447.html


Re: matching starts with only

2013-10-09 Thread adm1n
Search by "starts with" is something new I have to add, as is the data I
have to index for this purpose, so it's OK to create a new field.

But once I added the following field type:
<fieldType name="text_general_long" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

And:
<field name="my_name" type="text_general_long" stored="true"
       multiValued="false" required="false"/>
After reindexing, searching by my_name:/^black/ returns no results, while
searching by my_name:black returns only the "black" document.

What am I missing?

thanks. 



--
View this message in context: 
http://lucene.472066.n3.nabble.com/matching-starts-with-only-tp4094430p4094453.html


too many boolean clauses

2013-05-22 Thread adm1n
I got:
SyntaxError: Cannot parse
'name:Bbbbm'

Using Solr 4.2.1.
The name field type definition:

<fieldType name="text_general" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.ASCIIFoldingFilterFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
            splitOnNumerics="1" preserveOriginal="1"
            types="characters.txt"/>
    <filter class="solr.NGramTokenizerFactory" minGramSize="2"
            maxGramSize="15"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.StopFilterFactory" ignoreCase="true"
            words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="1" splitOnCaseChange="1"
            splitOnNumerics="1" preserveOriginal="1"
            types="characters.txt"/>
    <filter class="solr.NGramTokenizerFactory" minGramSize="2"
            maxGramSize="15"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
  </analyzer>
</fieldType>


Any ideas how to fix it?
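Counting the grams an n-gram stage emits shows how it can inflate a query into many boolean clauses. The Python sketch below assumes each emitted gram becomes its own clause. That is an upper-bound simplification, since how grams turn into clauses depends on the query parser and token positions. WordDelimiterFilter catenations and synonym expansion in the chain above would multiply the count further. For reference, Solr's default maxBooleanClauses in solrconfig.xml is 1024. The helper names are hypothetical.

```python
from typing import List

def ngrams(term: str, min_gram: int = 2, max_gram: int = 15) -> List[str]:
    """All substrings of length min_gram..max_gram, as an n-gram stage would emit them
    (duplicates included, e.g. 'bb' three times for 'bbbbm')."""
    return [term[start:start + size]
            for size in range(min_gram, max_gram + 1)
            for start in range(len(term) - size + 1)]

def clause_estimate(query_terms: List[str], min_gram: int = 2, max_gram: int = 15) -> int:
    """Assumes one boolean clause per emitted gram, summed over all query terms."""
    return sum(len(ngrams(term, min_gram, max_gram)) for term in query_terms)

print(len(ngrams("bbbbm")))         # 10 grams for a 5-letter term
print(clause_estimate(["x" * 15]))  # 105 grams for a 15-letter term
```

The count grows roughly quadratically with term length, so a few long query words can get close to 1024. Duplicates such as the repeated "bb" are kept here, which is part of why this is only an upper bound.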

thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/too-many-boolean-clauses-tp4065288.html


Re: too many boolean clauses

2013-05-22 Thread adm1n
First of all, thanks for the response!

Regarding the two tokenizers: OK.
Switching to NGramFilterFactory didn't help (I didn't reindex, but I don't
think that was needed, since I only switched it in the 'query' section).

Now, regarding maxBooleanClauses: how does increasing it affect performance
(response times, memory usage)?


thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/too-many-boolean-clauses-tp4065288p4065314.html


RE: Poll: Largest SolrCloud out there?

2013-03-13 Thread adm1n
4 AWS hosts:
Memory: 30822868k total
CPU: Intel(R) Xeon(R) CPU E5-2670 0 @ 2.60GHz x8
17M docs
5 GB index.
8 master-slave shards (2 shards/host).
57 ms average query time (~110K queries/24 hours).





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Poll-Largest-SolrCloud-out-there-tp4043293p4046915.html


Re: ping query frequency

2013-03-04 Thread adm1n
*Shawn:*
here you go - http://wiki.solarium-project.org/index.php/V1:Ping_query


As for my cloud, the average response time for a ping request is 4 ms, but
there are several pings that take as long as 3 seconds (I see about 10 of those per day).



--
View this message in context: 
http://lucene.472066.n3.nabble.com/ping-query-frequency-tp4044305p4044472.html


ping query frequency

2013-03-03 Thread adm1n
Hi,


I'm wondering how often this query should be made. Currently it is sent
before each select request (some very old legacy code). I googled a little and
found that this is bad practice and has a performance impact. So the
question is: should I remove it completely, or just run it periodically?

What is the best practice?
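One middle ground between pinging before every select and never pinging is to cache the ping result for a fixed interval. Below is a minimal Python sketch of that idea. `ping` is a hypothetical callable (for example, one that requests Solr's /admin/ping handler), and the 30-second interval is an arbitrary assumption, not a recommendation from Solr.

```python
import time
from typing import Callable, Optional

class CachedHealthCheck:
    """Calls `ping` at most once per interval_s seconds and reuses the last
    result in between, instead of pinging before every select request."""

    def __init__(self, ping: Callable[[], bool], interval_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ping = ping
        self._interval_s = interval_s
        self._clock = clock
        self._last_checked: Optional[float] = None
        self._last_result = False

    def is_healthy(self) -> bool:
        now = self._clock()
        if self._last_checked is None or now - self._last_checked >= self._interval_s:
            self._last_result = self._ping()  # only here do we hit the server
            self._last_checked = now
        return self._last_result

# Usage with a fake clock and a fake ping, so the example runs without a server.
fake_now = [0.0]
ping_calls = []

def fake_ping() -> bool:
    ping_calls.append(fake_now[0])
    return True

check = CachedHealthCheck(fake_ping, interval_s=30.0, clock=lambda: fake_now[0])
for t in (0.0, 5.0, 29.9, 30.0, 45.0):
    fake_now[0] = t
    check.is_healthy()
print(ping_calls)  # [0.0, 30.0]: five checks, only two real pings
```

Injecting the clock keeps the sketch testable. In a real client the default `time.monotonic` would be used.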


thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/ping-query-frequency-tp4044305.html


Re: Dropping slow queries

2013-02-28 Thread adm1n
Thanks, but it's not exactly what I need. According to the documentation, this
value *only* applies to the search and *not to requests* in general.

What I need is to affect the request itself, i.e. to drop it: to tell the cloud to
drop all requests that take more than x ms, no matter why (slow search
in a shard, network issues between the shards, etc.).

thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Dropping-slow-queries-tp4043074p4043592.html


Re: Dropping slow queries

2013-02-27 Thread adm1n
Am I missing something? Which answer do you mean?

And actually, just to clarify, the question was: how can I force partial
results and drop the results from the slow shard?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Dropping-slow-queries-tp4043074p4043549.html


Dropping slow queries

2013-02-26 Thread adm1n
Hi,

Is there a way to drop a slow query in a distributed search?

In other words, is there a way to tell SolrCloud to wait x ms for the
responses from the shards in the cloud and to return the results that were
returned within that period (x ms)?


For example, x = 10 ms. There are 4 shards in the cloud. 3 of them serve the
request in 3 ms and the last one in 15 seconds, so the total time for the
query will be 15 seconds. I'd prefer to return fewer results (from the 3 fast
shards), but in much less time. Is there any configuration that can help to
achieve this?

thanks.
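The behaviour described above is a scatter-gather with a deadline. The Python sketch below shows that generic pattern under the assumption that each shard request is a plain callable. It is not how SolrCloud implements distributed search, and the shard stand-ins are invented.

Within Solr itself, the knobs I know of are the `timeAllowed` parameter and `shards.tolerant=true`. As the follow-up in this thread notes, `timeAllowed` only limits the search phase. `shards.tolerant=true` returns partial results when a shard request fails, not merely when it is slow. Check which of these your version supports.

```python
import concurrent.futures as cf
import time
from typing import Callable, Dict, List, Tuple

def gather_with_deadline(shard_calls: Dict[str, Callable[[], int]],
                         deadline_s: float) -> Tuple[Dict[str, int], List[str]]:
    """Query all shards in parallel and keep only the answers that arrive in time.

    Returns (results from shards that finished successfully within deadline_s,
    sorted names of shards that were too slow or failed)."""
    if not shard_calls:
        return {}, []
    pool = cf.ThreadPoolExecutor(max_workers=len(shard_calls))
    futures = {pool.submit(call): name for name, call in shard_calls.items()}
    done, not_done = cf.wait(futures, timeout=deadline_s)
    # Don't block on stragglers; their threads finish in the background.
    pool.shutdown(wait=False)
    results = {futures[f]: f.result() for f in done if f.exception() is None}
    failed = [futures[f] for f in done if f.exception() is not None]
    dropped = sorted([futures[f] for f in not_done] + failed)
    return results, dropped

def fake_shard(delay_s: float, hits: int) -> Callable[[], int]:
    """Stand-in for a shard request that takes delay_s seconds and returns a hit count."""
    def call() -> int:
        time.sleep(delay_s)
        return hits
    return call

shards = {
    "shard1": fake_shard(0.003, 10),
    "shard2": fake_shard(0.003, 12),
    "shard3": fake_shard(0.003, 9),
    "shard4": fake_shard(1.0, 7),  # the slow shard
}
results, dropped = gather_with_deadline(shards, deadline_s=0.2)
print(sorted(results.items()))  # [('shard1', 10), ('shard2', 12), ('shard3', 9)]
print(dropped)                  # ['shard4']
```

The slow call is not cancelled; its result is simply ignored, which mirrors "return what arrived in x ms".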



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Dropping-slow-queries-tp4043074.html


Errors during index optimization on solrcloud

2013-02-18 Thread adm1n
Hi,

I'm running SolrCloud (Solr 4) with 1 core, 8 shards and ZooKeeper.
My index is updated every minute, so I run an optimization once a
day.
Every time, during the optimization, there is an error:
SEVERE: shard update error StdNode: http://host:port/solr/core_name/
SEVERE: shard update error StdNode:
http://host:port/solr/core_name/:org.apache.solr.common.SolrException:
Server at http://host:port/solr/core_name/ returned non ok status:503,
message:Service Unavailable

Any ideas what causes this error and how to avoid it?


thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Errors-during-index-optimization-on-solrcloud-tp4041135.html


solrcloud-zookeeper

2013-02-12 Thread adm1n
Hi all,
The first question:
is there a way to reduce the timeout when a Solr shard comes up? In the log
file it looks as follows:

Feb 12, 2013 1:19:08 PM org.apache.solr.cloud.ShardLeaderElectionContext
waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=1
timeoutin=178992
Feb 12, 2013 1:19:09 PM org.apache.solr.cloud.ShardLeaderElectionContext
waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=1
timeoutin=178489
Feb 12, 2013 1:19:09 PM org.apache.solr.cloud.ShardLeaderElectionContext
waitForReplicasToComeUp
INFO: Waiting until we see more replicas up: total=2 found=1
timeoutin=177986


And another one: let's assume I have 2 shards and one of them is down (both
master and slave) for some reason.
What happens now is that the cluster returns 503 for the search request. Is
there a way to configure it to return responses from the other shard?


thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/solrcloud-zookeeper-tp4039934.html


copyField vs single field

2013-02-06 Thread adm1n
Hi,

Let's assume I have to search for a string (textField) in 6-7 different
fields (username, firstname, lastname, etc.). Which one will perform
better:
username:test OR firstname:test OR lastname:test
or defining a copyField and searching within it, like somecopyfield:test?


thanks. 
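Roughly speaking, the OR query costs one term lookup (clause) per field. The copyField costs a single lookup but duplicates every term in the index and gives up per-field scoring and boosting. The simplified Python model below illustrates that trade-off with plain dictionaries, not Lucene's data structures. The field names come from the post and the sample documents are invented. Both forms return the same documents only when the fields share the same analysis, which is assumed here.

```python
from collections import defaultdict
from typing import Dict, List, Set

Index = Dict[str, Dict[str, Set[int]]]

def build_index(docs: List[Dict[str, str]], fields: List[str], copy_field: str) -> Index:
    """Inverted index per field; every source-field term is also added to
    copy_field, the way a copyField duplicates values at index time."""
    index: Index = defaultdict(lambda: defaultdict(set))
    for doc_id, doc in enumerate(docs):
        for field in fields:
            for term in doc.get(field, "").lower().split():
                index[field][term].add(doc_id)
                index[copy_field][term].add(doc_id)
    return index

def or_query(index: Index, fields: List[str], term: str) -> Set[int]:
    """username:test OR firstname:test OR ...: one term lookup (clause) per field."""
    hits: Set[int] = set()
    for field in fields:
        hits |= index[field].get(term, set())
    return hits

def copy_field_query(index: Index, copy_field: str, term: str) -> Set[int]:
    """somecopyfield:test: a single term lookup."""
    return set(index[copy_field].get(term, set()))

fields = ["username", "firstname", "lastname"]
docs = [{"username": "test", "firstname": "anna"},
        {"lastname": "Test"},
        {"firstname": "bob"}]
index = build_index(docs, fields, "somecopyfield")
print(or_query(index, fields, "test"))                   # {0, 1}
print(copy_field_query(index, "somecopyfield", "test"))  # {0, 1}
```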



--
View this message in context: 
http://lucene.472066.n3.nabble.com/copyField-vs-single-field-tp4038832.html


RE: dataimport.properties not created/updated with solrcloud

2012-12-19 Thread adm1n
Well, I saw that when I ran the full/delta import process the 2nd, 3rd,
etc. time, I didn't see this exception any more. So I checked my MySQL
query log to see what was going on in MySQL while the delta-import process
was running, and I saw that the queries got the correct times on each
delta-import execution.

I assume that this property is stored somewhere in ZooKeeper, in binary format,
in the $SOLR_HOME/solr/zoo_data folder.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/dataimport-properties-not-created-updated-with-solrcloud-tp4026162p4028161.html


RE: dataimport.properties not created/updated with solrcloud

2012-12-12 Thread adm1n
Thank you for your response.

I saw SOLR-3165, but can't really locate this different location, even
when searching for the file with the find command.

According to the patch and the warning message I got (WARNING: Could not
read DIH properties from /configs/my_collection/dataimport.properties),
ZooKeeper tries to access the file at an undefined path.

How can I tell ZooKeeper to use the $home/solr/my_collection/conf/ folder for
creating dataimport.properties?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/dataimport-properties-not-created-updated-with-solrcloud-tp4026162p4026371.html


Partial results returned

2012-12-11 Thread adm1n
Hello,

I'm running SolrCloud with 2 shards.

Let's assume I have 100 documents indexed in total, split 55/45 between
the shards.
When I query, for example:
curl
'http://localhost:7500/solr/index/select?q=*:*&wt=json&indent=true&rows=0'
sometimes I get "response":{"numFound":0, sometimes
"response":{"numFound":45, "response":{"numFound":55 or
"response":{"numFound":100.

But when I run the query:
curl
'http://localhost:7500/solr/index/select?shards=localhost:7500/solr/index,localhost:7501/solr/index&q=*:*&wt=json&indent=true&rows=0'

it always reports the complete count of 100 documents.

Am I missing some configuration?
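The two requests above differ only in the explicit `shards` parameter. To reproduce them from code, here is a small Python helper that builds both URLs with `urllib.parse.urlencode`, so the `&` separators and any escaping are handled consistently. It does not diagnose the varying numFound; it only makes the exact requests being compared explicit. The helper name is hypothetical.

```python
from urllib.parse import urlencode

def select_url(base: str, params: dict) -> str:
    """Build a Solr /select URL; urlencode joins parameters with '&'.
    ':', '*', ',' and '/' are left unescaped only to keep the output readable."""
    return f"{base}/select?{urlencode(params, safe=':*,/')}"

base = "http://localhost:7500/solr/index"
plain = select_url(base, {"q": "*:*", "wt": "json", "indent": "true", "rows": "0"})
explicit = select_url(base, {
    "shards": "localhost:7500/solr/index,localhost:7501/solr/index",
    "q": "*:*", "wt": "json", "indent": "true", "rows": "0",
})
print(plain)     # http://localhost:7500/solr/index/select?q=*:*&wt=json&indent=true&rows=0
print(explicit)
```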

thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Partial-results-returned-tp4026027.html


Re: Partial results returned

2012-12-11 Thread adm1n
I have 1 collection, called index.
I created it as explained in the "Example A: Simple two shard cluster" section
of http://wiki.apache.org/solr/SolrCloud
Here are the startup commands:

1) java -Dbootstrap_confdir=./solr/index/conf -Dcollection.configName=myconf \
-DzkRun -DnumShards=2 -jar start.jar -Djetty.port=7500 \
> logs/solr_server_java.`date +%Y%m%d`.log 2>&1 &
2) java -Dbootstrap_confdir=./solr/index/conf -Dcollection.configName=myconf \
-Djetty.port=7501 -DzkHost=localhost:8500 -jar start.jar \
> logs/solr_server_java.`date +%Y%m%d`.log 2>&1 &


On my http://localhost:7500/solr/#/~cloud page there is only a chart of my
collection with the shards.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Partial-results-returned-tp4026027p4026076.html


dataimport.properties not created/updated with solrcloud

2012-12-11 Thread adm1n
Hi,

I have a problem with updating dataimport.properties. When running a single
Solr instance there is no problem at all; everything works perfectly. But when I
switch to a cloud configuration with 2 shards (as described in
http://wiki.apache.org/solr/SolrCloud under Example A: Simple two shard cluster),
this file doesn't get updated any more. I've looked at the Jetty log and it
is completely free of warnings or errors.
Since I'm using DIH, dataimport.properties is essential for my deltas.
Read/write permissions are OK for this file.

Any ideas how to fix it?







--
View this message in context: 
http://lucene.472066.n3.nabble.com/dataimport-properties-not-created-updated-with-solrcloud-tp4026162.html


SOLR4 (sharded) and join query

2012-12-05 Thread adm1n
Hi,

I'm running a join query; let's say it looks as follows: {!join
from=some_id to=another_id}(a_id:55 AND some_type_id:3). When I run it on a
single instance of Solr I get the correct result, but when I run it on
the sharded system (2 shards with a replica for each shard; the total index
contains ~300K entries) I get a partial result.

Is there any issue with supporting join queries on a sharded system, or is
there some configuration tweak that I'm missing?
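My understanding (worth checking against the documentation for your version) is that the join query parser runs independently on each shard's local index. A document then only matches if the documents it joins against live on the same shard. The toy Python model below uses invented documents to show how that yields partial results. It is not Solr's implementation, and the helper names are hypothetical.

```python
from typing import Callable, Dict, List

Doc = Dict[str, object]

def local_join(docs: List[Doc], from_field: str, to_field: str,
               predicate: Callable[[Doc], bool]) -> List[Doc]:
    """Model of {!join from=from_field to=to_field}<query> on ONE index: collect the
    from_field values of docs matching the inner query, then return the docs whose
    to_field value is among them."""
    keys = {d[from_field] for d in docs if from_field in d and predicate(d)}
    return [d for d in docs if to_field in d and d[to_field] in keys]

def inner_query(d: Doc) -> bool:
    # (a_id:55 AND some_type_id:3)
    return d.get("a_id") == 55 and d.get("some_type_id") == 3

parent = {"id": "p1", "some_id": 10, "a_id": 55, "some_type_id": 3}
child1 = {"id": "c1", "another_id": 10}
child2 = {"id": "c2", "another_id": 10}

single = local_join([parent, child1, child2], "some_id", "another_id", inner_query)

# Sharded: each shard joins only against its own documents, then results are merged.
shards = [[parent, child1], [child2]]
sharded = [d for shard in shards
           for d in local_join(shard, "some_id", "another_id", inner_query)]

print([d["id"] for d in single])   # ['c1', 'c2']
print([d["id"] for d in sharded])  # ['c1']: c2's shard has no matching parent
```

Under this model, co-locating the joined documents on the same shard would be needed for the sharded result to match the single-instance one.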


Thanks.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SOLR4-sharded-and-join-query-tp4024547.html


solr/tomcat performance.

2012-02-08 Thread adm1n
Hi,

I'm running Solr+Tomcat with the following configuration:
I have 16 slaves, which are queried by an aggregator, while the aggregator
is queried by the users.
My slaveUrls property in solr.xml (on the aggregator) looks like:
<property name="slaveUrls"
          value="host01/slave01,host02/slave02,host03/slave03,...,host16/slave16" />
I'm running it on a Linux machine (not dedicated; there are some other 'heavy'
processes) with 16 quad CPUs and 66 GB RAM.

I ran some tests and saw that when I sent 400 concurrent requests to the
aggregator, the host stopped responding until I restarted Tomcat. I tried
to 'play' with the Tomcat/Java configuration a little, but it didn't help
much; the main issues were memory usage and timeouts. Currently I'm using
the following settings:
Java:
-Xms256m -Xmx8192m
I tried to tweak the -XX:MinHeapFreeRatio setting, but from what I could see, no
memory was returned to the OS.
Tomcat:
<Executor name="HTTPThreadPool" namePrefix="HTTPThread-"
          maxThreads="8000"
          minSpareThreads="4000"/>

<Connector executor="HTTPThreadPool" port="8080" protocol="HTTP/1.1"
           redirectPort="8443" URIEncoding="UTF-8"
           maxHttpHeaderSize="8388608"
           enableLookups="false"
           acceptCount="100"
           connectionTimeout="1" />

Assuming the aggregator will receive ~1000 requests/second, across how many
aggregators should I balance the load? Or maybe I can achieve better
performance just by tweaking the current system?


Any help/advice will be appreciated,
Thanks!
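Little's law gives a back-of-the-envelope way to frame the "how many aggregators" question. The average number of requests in flight equals the arrival rate times the average latency. Dividing that by the concurrency one node handles safely gives a node count. The post does not give the aggregator's average latency or a safe per-node concurrency, so both numbers below are hypothetical placeholders. The 300 is simply chosen below the 400 concurrent requests at which the host stopped responding. The result covers average load only, with no headroom for peaks.

```python
import math

def in_flight(arrival_rate_per_s: float, avg_latency_s: float) -> float:
    """Little's law: average number of requests in the system, L = lambda * W."""
    return arrival_rate_per_s * avg_latency_s

def aggregators_needed(arrival_rate_per_s: float, avg_latency_s: float,
                       safe_concurrency_per_node: int) -> int:
    """Nodes required so that each one stays at or under its safe concurrency level."""
    return math.ceil(in_flight(arrival_rate_per_s, avg_latency_s) / safe_concurrency_per_node)

# 1000 req/s comes from the post; the 0.5 s latency and the 300 concurrent
# requests per node are HYPOTHETICAL and should be replaced with measured values.
print(in_flight(1000, 0.5))                # 500.0 requests in flight on average
print(aggregators_needed(1000, 0.5, 300))  # 2
```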

--
View this message in context: 
http://lucene.472066.n3.nabble.com/solr-tomcat-performance-tp3727199p3727199.html