solr-cloud performance decrease day by day

2013-04-19 Thread qibaoyuan
Hello, I am using Solr 4.1.0 and have used SolrCloud in my product. At first everything seemed good: search was fast and the delay was low. But it becomes very slow after a few days. Does anyone know whether there are any parameters or optimizations for using SolrCloud?

Re: solr-cloud performance decrease day by day

2013-04-19 Thread Furkan KAMACI
Could you give more info about your index size and the technical details of your machine? Maybe you are indexing more data day by day and your RAM capacity is no longer enough? 2013/4/19 qibaoyuan qibaoy...@gmail.com Hello, I am using Solr 4.1.0 and have used SolrCloud in my product. I

Re: solr-cloud performance decrease day by day

2013-04-19 Thread qibaoyuan
There are 6 shards, all on one machine, and the JVM param is very large. The physical memory is 16GB, the total number of docs is about 150k, and the index size of each shard is about 1GB. Indexing happens while searching; I use auto commit every 10 minutes, and data arrives at about 100 per minute. On

Re: SolrCloud loadbalancing, replication, and failover

2013-04-19 Thread John Nielsen
Well, to consume 120GB of RAM with a 120GB index, you would have to query over every single GB of data. If you only actually query over, say, 500MB of the 120GB data in your dev environment, you would only use 500MB worth of RAM for caching. Not 120GB. On Fri, Apr 19, 2013 at 7:55 AM, David

RE: SolrCloud loadbalancing, replication, and failover

2013-04-19 Thread David Parks
Interesting. I'm trying to correlate this new understanding to what I see on my servers. I've got one server with 5GB dedicated to solr, solr dashboard reports a 167GB index actually. When I do many typical queries I see between 3MB and 9MB of disk reads (watching iostat). But solr's dashboard

Re: solr-cloud performance decrease day by day

2013-04-19 Thread Manuel Le Normand
Can happen for various reasons. Can you recreate the situation, meaning restarting the servlet or server would start with good qTime and decrease from that point? How fast does this happen? Start by monitoring the JVM process, with Oracle VisualVM for example. Monitor for frequent garbage

Re: solr-cloud performance decrease day by day

2013-04-19 Thread qibaoyuan
Thanks Manu, I will check it. On 2013-4-19, at 4:26 PM, Manuel Le Normand manuel.lenorm...@gmail.com wrote: Can happen for various reasons. Can you recreate the situation, meaning restarting the servlet or server would start with good qTime and decrease from that point? How fast does this happen?

Re: shard query return 500 on large data set

2013-04-19 Thread Dmitry Kan
Can you use a paging mechanism instead? On Thu, Apr 18, 2013 at 8:03 PM, Jie Sun jsun5...@yahoo.com wrote: Hi - when I execute a shard query like:
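Paging here would mean fetching the result set in fixed-size windows with Solr's `start` and `rows` parameters rather than requesting everything at once. A minimal sketch in Python that only builds the query strings; the query, the total, and the page size are assumed values chosen for illustration:

```python
from urllib.parse import urlencode

def page_params(query, total, page_size=100):
    """Yield one query string per page, using Solr's start/rows parameters."""
    for start in range(0, total, page_size):
        # The last page may be shorter than page_size.
        rows = min(page_size, total - start)
        yield urlencode({"q": query, "start": start, "rows": rows})

# Hypothetical example: 250 matching docs, fetched 100 at a time.
pages = list(page_params("id:46", total=250, page_size=100))
```

Each string in `pages` would be appended to the select URL of the collection being queried; the total would normally come from `numFound` in the first response.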

Re: SolrCloud loadbalancing, replication, and failover

2013-04-19 Thread Shawn Heisey
On 4/19/2013 1:34 AM, John Nielsen wrote: Well, to consume 120GB of RAM with a 120GB index, you would have to query over every single GB of data. If you only actually query over, say, 500MB of the 120GB data in your dev environment, you would only use 500MB worth of RAM for caching. Not

Re: SolrCloud loadbalancing, replication, and failover

2013-04-19 Thread Shawn Heisey
On 4/19/2013 2:15 AM, David Parks wrote: Interesting. I'm trying to correlate this new understanding to what I see on my servers. I've got one server with 5GB dedicated to solr, solr dashboard reports a 167GB index actually. When I do many typical queries I see between 3MB and 9MB of disk

RE: SolrCloud loadbalancing, replication, and failover

2013-04-19 Thread David Parks
Ok, I understand better now. The Physical Memory is 90% utilized (21.18GB of 23.54GB). Solr has dark grey allocation of 602MB, and light grey of an additional 108MB, for a JVM total of 710MB allocated. If I understand correctly, Solr memory utilization is *not* for caching (unless I configured

in solrcoud, how to assign a schemaConf to a collection ?

2013-04-19 Thread sling
Hi all, help~~~ How do I specify a schema for a collection in SolrCloud? I have a SolrCloud with 3 collections, and each config file is uploaded to ZK like this: args=-Xmn3000m -Xms5000m -Xmx5000m -XX:MaxPermSize=384m -Dbootstrap_confdir=/workspace/solr/solrhome/doc/conf

solr-cloud problem about user-specified tags

2013-04-19 Thread qibaoyuan
I have plenty of docs, and each doc may be connected to many user-defined tags. I have used SolrCloud and use join to do this kind of job, and recently I learned that SolrCloud does not support distributed search. So this is a big problem so far. And the decomposition is quite impossible, because docs

Re: SolrCloud loadbalancing, replication, and failover

2013-04-19 Thread Toke Eskildsen
On Fri, 2013-04-19 at 06:51 +0200, Shawn Heisey wrote: Using SSDs for storage can speed things up dramatically and may reduce the total memory requirement to some degree, We have been using SSDs for several years in our servers. It is our clear experience that "to some degree" should be replaced

RE: SolrCloud loadbalancing, replication, and failover

2013-04-19 Thread David Parks
Wow, thank you for those benchmarks Toke, that really gives me some firm footing to stand on in knowing what to expect and thinking out which path to venture down. It's tremendously appreciated! Dave -Original Message- From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] Sent:

Re: WordDelimiterFactory

2013-04-19 Thread Erick Erickson
Ashok: You really, _really_ need to dive into the admin/analysis page. That'll show you exactly what WDFF (and all the other elements of your chain) do to input tokens. Understanding the index and query-time implications of all the settings in WDFF takes a while. But from what you're describing,

Re: in solrcoud, how to assign a schemaConf to a collection ?

2013-04-19 Thread sling
When I add a schema property to core: <core name="pic" instanceDir="pic/" loadOnStartup="true" transient="false" collection="picCollection" config="solrconfig.xml" schema="../picconf/schema.xml"/> it seems there is a default path to the schema, that is /configs/docconf/. The exception is: [18:59:09.211]

RE: Indexing problems

2013-04-19 Thread GASPARD Joel
Hello, Thank you for your answer. We have solved our problem now. I describe it here for anyone who might encounter a similar problem. Some of our fields are dynamic, and the name of one of these fields was not correct: it was sent to Solr as a Java object, e.g.

Re: in solrcoud, how to assign a schemaConf to a collection ?

2013-04-19 Thread sling
I copied the 3 schema.xml and solrconfig.xml files to $solrhome/conf/.xml, and uploaded this directory to ZK like this: args=-Xmn1000m -Xms2000m -Xmx2000m -XX:MaxPermSize=384m -Dbootstrap_confdir=/home/app/workspace/solrcloud/solr/solrhome/conf -Dcollection.configName=conf

Re: Solr using a ridiculous amount of memory

2013-04-19 Thread Erick Erickson
Hmmm. There has been quite a bit of work lately to support a couple of things that might be of interest (4.3, which Simon cut today, probably available to all mid next week at the latest). Basically, you can choose to pre-define all the cores in solr.xml (so-called old style) _or_ use the

Re: stats.facet not working for timestamp field

2013-04-19 Thread Erick Erickson
I'm guessing that your timestamp is a tdate, which stores extra information in the index for fast range searches. What happens if you try to facet on just a date field? Best Erick On Thu, Apr 18, 2013 at 8:37 AM, J Mohamed Zahoor zah...@indix.com wrote: Hi I am using Solr 4.1 with 6 shards.

Re: solr4 : disable updateLog

2013-04-19 Thread Erick Erickson
updateLog is _required_ if you're in solrCloud mode. Assuming that you're not using SolrCloud, then you can freely disable it. Why do you want to? It's not a bad idea necessarily, but this might be an XY problem. Best Erick On Thu, Apr 18, 2013 at 10:47 AM, Jamel ESSOUSSI

Update Request Processor Chains

2013-04-19 Thread Furkan KAMACI
I am trying to understand update request processor chains. Do they run one by one when indexing a document? Can I define multiple update request processor chains? Also, what are LogUpdateProcessorFactory and RunUpdateProcessorFactory?

Re: solr-cloud performance decrease day by day

2013-04-19 Thread Jack Krupansky
How are you committing data? With 4.0, CommitWithin is now a soft commit, which means that the transaction log will grow until you do a hard commit. You need to periodically do a hard commit if you are continually updating the index. How much updating are you doing? Also, check how much heap
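The periodic hard commit suggested here can be issued explicitly through the update handler. A minimal sketch that only builds the request URL; the host, port, and collection name are assumptions for illustration, while `commit` and `softCommit` are the standard update request parameters:

```python
from urllib.parse import urlencode

def commit_url(base_url, collection, soft=False):
    """Build an update-handler URL that triggers a hard or soft commit."""
    param = "softCommit" if soft else "commit"
    return f"{base_url}/{collection}/update?{urlencode({param: 'true'})}"

# Hypothetical local node and collection name.
hard = commit_url("http://localhost:8983/solr", "collection1")
soft = commit_url("http://localhost:8983/solr", "collection1", soft=True)
```

In practice the same effect is usually configured once with an `autoCommit` block in solrconfig.xml rather than triggered by hand, so this is mainly useful for testing whether a hard commit changes the behavior.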

fuzzy search issue with PatternTokenizer Factory

2013-04-19 Thread meghana
I'm using Solr 4.2. I have changed my text field definition to use solr.PatternTokenizerFactory instead of solr.StandardTokenizerFactory, and changed my schema definition as below: <fieldType name="text_token" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index">

Re: facet.method enum vs fc

2013-04-19 Thread Joel Bernstein
Faceting on a high cardinality string field, like url, on a 120 million record index is going to be very memory intensive. You will very likely need to shard the index to get the performance that you need. In Solr 4.2, you can make the url field a Disk based DocValue and shift the memory from

Import in Solr

2013-04-19 Thread hassancrowdc
I want to update (delta-import) one specific item. Is there any query to do that? For example, I can delete a specific item with the following query: localhost:8080/solr/devices/update?stream.body=<delete><query>id:46</query></delete>&commit=true Thanks. -- View this message in context:

Returning similarity values for more like this search

2013-04-19 Thread Achim Domma
Hi, I'm executing a search including a search for similar documents (mlt=true&mlt.fl=) which works fine so far. I would like to get the similarity value for each document. I expected this to be quite common and simple, but I could not find a hint how to do it. Any hint how to do it would

RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.

2013-04-19 Thread Dyer, James
I guess the first thing I'd do is to set maxCollationTries to zero. This means it will only run your main query once and not re-run it to check the collations. Now see if your queries have consistent qtime. One easy explanation is that with maxCollationTries=10, it may be running your query
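To try this suggestion without touching the request handler defaults, the setting can also be passed per request as `spellcheck.maxCollationTries`. A minimal sketch that builds such a query string; the misspelled query text is a made-up example:

```python
from urllib.parse import urlencode

def spellcheck_params(query, max_collation_tries=0):
    """Build a query string with collation enabled but no collation re-runs."""
    params = {
        "q": query,
        "spellcheck": "true",
        "spellcheck.collate": "true",
        # 0 means collations are returned without being tested against the index.
        "spellcheck.maxCollationTries": max_collation_tries,
    }
    return urlencode(params)

qs = spellcheck_params("delll laptop")  # hypothetical misspelled query
```

Comparing QTime for the same query with the value at 0 and at 10 should show how much of the variation comes from the collation re-runs.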

Re: SolrCloud loadbalancing, replication, and failover

2013-04-19 Thread Shawn Heisey
On 4/19/2013 3:48 AM, David Parks wrote: The Physical Memory is 90% utilized (21.18GB of 23.54GB). Solr has dark grey allocation of 602MB, and light grey of an additional 108MB, for a JVM total of 710MB allocated. If I understand correctly, Solr memory utilization is *not* for caching (unless

Re: Returning similarity values for more like this search

2013-04-19 Thread Koji Sekiguchi
(13/04/19 23:24), Achim Domma wrote: Hi, I'm executing a search including a search for similar documents (mlt=true&mlt.fl=) which works fine so far. I would like to get the similarity value for each document. I expected this to be quite common and simple, but I could not find a hint how

is phrase search possible in solr

2013-04-19 Thread vicky desai
I want to do a phrase search in Solr without analyzers being applied to it, e.g. if I search for *"DelhiDareDevil"* (i.e. with inverted commas), it should search the exact text and not apply any analyzers or tokenizers on this field. However, if I search for *DelhiDareDevil* it should use tokenizers

Re: SEVERE: shard update error StdNode on SolrCloud 4.2.1

2013-04-19 Thread Steve Woodcock
On 16 April 2013 11:35, Steve Woodcock steve.woodc...@gmail.com wrote: We have a simple SolrCloud setup (4.2.1) running with a single shard and two nodes, and it's working fine except whenever we send an update request, the leader logs this error: SEVERE: shard update error StdNode:

Re: is phrase search possible in solr

2013-04-19 Thread Raymond Wiker
On Apr 19, 2013, at 16:59 , vicky desai vicky.de...@germinait.com wrote: I want to do a phrase search in Solr without analyzers being applied to it, e.g. if I search for *"DelhiDareDevil"* (i.e. with inverted commas), it should search the exact text and not apply any analyzers or tokenizers on

Re: is phrase search possible in solr

2013-04-19 Thread Jack Krupansky
By definition, phrase search is one of two things: 1) match on a string field literally, or 2) analyze as a sequence of tokens as per the field type index analyzer. You could use the keyword tokenizer to store the whole field as one string, with filtering for the whole string. Or, just make

Pros and cons of using RAID or different RAIDS?

2013-04-19 Thread Furkan KAMACI
Is there any documentation that explains pros and cons of using RAID or different RAIDS?

Re: is phrase search possible in solr

2013-04-19 Thread Jack Krupansky
Oops... that's query analyzer, not index analyzer, so it's: By definition, phrase search is one of two things: 1) match on a string field literally, or 2) analyze as a sequence of tokens as per the field type query analyzer. -- Jack Krupansky -Original Message- From: Jack Krupansky

Searching

2013-04-19 Thread hassancrowdc
I want to search so that: - if I type a letter, it returns all the items that start with that letter (a returns apple, aspire, etc.). - if I ask for a whole string, it returns just the results with the exact string (e.g. a search for Samsung S3 returns only samsung s3). - if I ask for

Re: facet.method enum vs fc

2013-04-19 Thread Mingfeng Yang
Joel, Thanks for your kind reply. The problem is solved with sharding and using facet.method=enum. I am curious about what the difference is between enum and fc, so that enum works but fc does not. Do you know something about this? Thank you! Regards, Ming On Fri, Apr 19, 2013 at 6:18 AM,

Re: Searching

2013-04-19 Thread Jack Krupansky
Yes, you can do all of that... but it would be a non-trivial amount of effort - the kind of thing consultants get paid real money to do. You should also consider doing it in a middleware application layer, using possibly multiple queries of separate Solr collections. Otherwise, your index might

Re: Update Request Processor Chains

2013-04-19 Thread Erik Hatcher
You can have multiple update chains defined and use only one of them per update request. LogUpdateProcessor logs the update request and the RunUpdateProcessor is where the actual index is updated. Erik On Apr 19, 2013, at 07:49 , Furkan KAMACI wrote: I am trying to understand
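As a rough mental model only (this is not Solr's Java code), a chain can be pictured as an ordered list of processors that each see the document in turn, with the logging step and the index-writing step as the last links. All function names, the chain names, and the extra lowercase processor below are hypothetical:

```python
def log_processor(doc, state):
    """Stand-in for LogUpdateProcessor: records that an update passed through."""
    state["log"].append(f"add id={doc['id']}")
    return doc

def run_processor(doc, state):
    """Stand-in for RunUpdateProcessor: the step that actually writes to the index."""
    state["index"][doc["id"]] = doc
    return doc

def lowercase_title(doc, state):
    """Hypothetical custom processor placed ahead of the log and run steps."""
    return {**doc, "title": doc["title"].lower()}

# Several chains may be defined; each update request uses exactly one of them.
CHAINS = {
    "default": [log_processor, run_processor],
    "normalize": [lowercase_title, log_processor, run_processor],
}

def process(doc, chain_name, state):
    """Run the document through the named chain, one processor at a time."""
    for processor in CHAINS[chain_name]:
        doc = processor(doc, state)
    return doc

state = {"log": [], "index": {}}
process({"id": "1", "title": "Hello Solr"}, "normalize", state)
```

In Solr itself the chains are declared in solrconfig.xml, and a request picks one by name with the `update.chain` parameter; the dictionary lookup above only mimics that selection.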

Re: WordDelimiterFactory

2013-04-19 Thread Ashok
Yes, thank you Erick. The analysis/document handlers hold the key to deciding the type and order of the filters to employ, given one's document set and the subject matter at hand. The finalized terms they produce for SOLR search, mlt etc... are crucial to the quality of the results. - ashok -- View this

Re: fuzzy search issue with PatternTokenizer Factory

2013-04-19 Thread Jack Krupansky
Give us some examples of tokens that you are expecting that pattern to tokenize. And express the pattern in simple English as well. Show some actual input data. I suspect that Solr is working fine - but you may not have precisely specified your pattern. But we don't know what your pattern is

RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.

2013-04-19 Thread SandeepM
James, Thanks for the reply. I see your point and sure enough, reducing maxCollationTries does reduce the time, but it may then not produce results. It seems the time is taken by the collation re-runs. Is there any way we can activate caching for collations? The same query repeatedly takes the

Re: Updating clusterstate from the zookeeper

2013-04-19 Thread Michael Della Bitta
I would like to know the answer to this as well. Michael Della Bitta Appinions 18 East 41st Street, 2nd Floor New York, NY 10017-6271 www.appinions.com Where Influence Isn’t a Game On Thu, Apr 18, 2013 at 8:15 PM, Manuel Le Normand

Re: Updating clusterstate from the zookeeper

2013-04-19 Thread mike st. john
you can use the eclipse plugin for zookeeper. http://www.massedynamic.org/mediawiki/index.php?title=Eclipse_Plug-in_for_ZooKeeper -Msj. On Fri, Apr 19, 2013 at 1:53 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote: I would like to know the answer to this as well. Michael

RE: DirectSolrSpellChecker : vastly varying spellcheck QTime times.

2013-04-19 Thread Dyer, James
I do not know what it would take to have the collation tests make better use of the QueryResultCache. However, outside of a test scenario, I do not know if this would help a lot. Hopefully you wouldn't have a lot of users issuing the exact same query with the exact same misspelled words over

Re: Updating clusterstate from the zookeeper

2013-04-19 Thread Mingfeng Yang
Right. I am wondering if/how we can download a specific file from the zookeeper, modify it and then upload to rewrite it. Anyone ? Thanks, Ming On Fri, Apr 19, 2013 at 10:53 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: I would like to know the answer to this as well.

Re: Updating clusterstate from the zookeeper

2013-04-19 Thread Nate Fox
I've used zookeeper's cli to do this. I doubt it's the right way and I have no idea if it'll work for clusterstate.json, but it seems to work for certain things. cd /opt/zookeeper/bin ./zkCli.sh -server 127.0.0.1:2183 set /configs/collection1/schema.xml `cat /tmp/newschema.xml` sleep 10 # give a

Weird query issues

2013-04-19 Thread Ravi Solr
Hello, We are using Solr 3.6.2 single core ( both index and query on same machine) and randomly the server fails to query correctly. If we query from the admin console the query is not even applied and it returns numFound count equal to total docs in the index as if no query is made, and if use

Could not find an instance of QueryComponent. Disabling collation verification against the index.

2013-04-19 Thread balaji.gandhi
Hi Team, I am trying to configure the Auto-suggest feature for the businessProvince field in my schema. I followed the instructions here:- http://wiki.apache.org/solr/Suggester But then I got the following error:- INFO: Could not find an instance of QueryComponent. Disabling collation

Re: solr-cloud performance decrease day by day

2013-04-19 Thread alxsss
How many segments does each shard have, and what is the reason for running multiple shards on one machine? Alex. -Original Message- From: qibaoyuan qibaoy...@gmail.com To: solr-user solr-user@lucene.apache.org Sent: Fri, Apr 19, 2013 12:26 am Subject: Re: solr-cloud performance

Re: facet.method enum vs fc

2013-04-19 Thread Chris Hostetter
: Thanks for your kind reply. The problem is solved with sharding and using : facet.method=enum. I am curious about what the difference is between enum : and fc, so that enum works but fc does not. Do you know something about : this? method=fc/fcs uses the field caches (or uninverted fields
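The difference can be pictured with a toy model. This is a simplified illustration, not Solr's implementation, and the data and function names are made up. The enum style walks the terms and intersects each term's document set with the result set, while the fc style walks the matching documents and reads each document's value from an uninverted, per-document structure:

```python
from collections import Counter

# Toy index: document id -> field value (single-valued field).
DOC_VALUES = {0: "a.com", 1: "b.com", 2: "a.com", 3: "c.com"}

# Inverted view of the same data: term -> set of document ids containing it.
POSTINGS = {}
for doc_id, term in DOC_VALUES.items():
    POSTINGS.setdefault(term, set()).add(doc_id)

def facet_enum(result_docs):
    """Enum style: iterate over every term and intersect with the result set."""
    counts = {term: len(docs & result_docs) for term, docs in POSTINGS.items()}
    return {term: count for term, count in counts.items() if count > 0}

def facet_fc(result_docs):
    """fc style: iterate over matching docs and read the uninverted value."""
    return dict(Counter(DOC_VALUES[doc_id] for doc_id in result_docs))

hits = {0, 2, 3}  # hypothetical set of documents matching the main query
```

Both functions return the same counts; they differ in what they iterate over and what they need in memory. In this picture the per-document structure used by fc has to cover the whole index, which may be why fc was the more memory-hungry choice in this thread.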

Re: Update Request Processor Chains

2013-04-19 Thread Chris Hostetter
: I am trying to understand update request processor chains. Do they run one : by one when indexing a document? Can I define multiple update request : processor chains? Also, what are LogUpdateProcessorFactory and : RunUpdateProcessorFactory?

Re: Searching

2013-04-19 Thread hassancrowdc
Thanks. I was expecting an answer that could help me choose analyzers or tokenizers. Any help for any one of the scenarios? -- View this message in context: http://lucene.472066.n3.nabble.com/Searching-tp4057328p4057465.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Update Request Processor Chains

2013-04-19 Thread Furkan KAMACI
Thanks for the detailed answers. 2013/4/19 Chris Hostetter hossman_luc...@fucit.org : I am trying to understand update request processor chains. Do they run one : by one when indexing a document? Can I define multiple update request : processor chains? Also what are that

Re: Weird query issues

2013-04-19 Thread Shawn Heisey
On 4/19/2013 12:55 PM, Ravi Solr wrote: We are using Solr 3.6.2 single core ( both index and query on same machine) and randomly the server fails to query correctly. If we query from the admin console the query is not even applied and it returns numFound count equal to total docs in the index

external values source

2013-04-19 Thread Maciej Liżewski
I need some explanation on how ValuesSource and related classes work. There is the already implemented ExternalFileField, and an example of how to load data from a database ( http://sujitpal.blogspot.com/2011/05/custom-sorting-in-solr-using-external.html

Rogue query killed several replicas with OOM, after recovering - match all docs query problem

2013-04-19 Thread Timothy Potter
We had a rogue query take out several replicas in a large 4.2.0 cluster today, due to OOM's (we use the JVM args to kill the process on OOM). After recovering, when I execute the match all docs query (*:*), I get a different count each time. In other words, if I execute q=*:* several times in a

Re: external values source

2013-04-19 Thread Timothy Potter
Hi Maciek, I think a custom ValueSource is definitely what you want because you need to compute some derived value based on an indexed field and some external value. The trick is figuring how to make the lookup to the external data very, very fast. Here's a rough sketch of what we do: We have a

RE: SolrCloud loadbalancing, replication, and failover

2013-04-19 Thread David Parks
Again, thank you for this incredible information, I feel on much firmer footing now. I'm going to test distributing this across 10 servers, borrowing a Hadoop cluster temporarily, and see how it does with enough memory to have the whole index cached. But I'm thinking that we'll try the SSD route

Re: Pros and cons of using RAID or different RAIDS?

2013-04-19 Thread Otis Gospodnetic
Yeah, but as far as I know, there is nothing Solr-specific about that. See http://www.acnc.com/raid Otis -- Solr ElasticSearch Support http://sematext.com/ On Fri, Apr 19, 2013 at 11:19 AM, Furkan KAMACI furkankam...@gmail.com wrote: Is there any documentation that explains pros and cons

Re: Import in Solr

2013-04-19 Thread Gora Mohanty
On 19 April 2013 19:50, hassancrowdc hassancrowdc...@gmail.com wrote: I want to update(delta-import) one specific item. Is there any query to do that? No. Regards, Gora