long QTime for big index

2013-01-31 Thread Mou
I am running Solr 3.4 on Tomcat 7. Our index is very big: two cores, each 120G. We search the slaves, which are replicated every 30 min. I am using only the filterCache, and we have more than 90% cache hits. We use a lot of filter queries; queries are usually pretty big, with 10-20 fq parameters.

Re: Fwd: advice about develop AbstractSolrEventListener.

2013-01-31 Thread Miguel
Hi, After studying the Apache Solr documentation, I think the only way to learn about updated records (modify, delete and insert actions) is to develop a class that extends org.apache.solr.servlet.SolrUpdateServlet. In this class, I can access the updated record information going into the Apache Solr server. Somebody can

How to use SolrCloud in multi-threaded indexing

2013-01-31 Thread andy
Hi, I am going to upgrade from Solr 3.6 to Solr 4.1, and I want to set up two shards. I use ConcurrentUpdateSolrServer to index the documents in Solr 3.6. I saw the CloudSolrServer API in 4.1, but: 1: CloudSolrServer uses the LBHttpSolrServer to issue requests, but * LBHttpSolrServer should NOT be

searching for an id

2013-01-31 Thread b.riez...@pixel-ink.de
Hi, I have an id which is a string like this: tx-20130130-4599. I'm using a field without processing, which I confirmed via the analyser tool. But when I search for that id it gets split up, so instead of finding the specific entry with that unique id, it finds all entries with tx in them. Any idea

RE: Indexing problems

2013-01-31 Thread GASPARD Joel
Hello, After more tests, we have identified our indexing problem (Solr 4.0.0). Our problems are in fact OutOfMemoryErrors. Suspecting ZooKeeper connection problems was a mistake. We suspected them because OOMEs sometimes appear in the logs after errors in ZooKeeper leader election.

Re: long QTime for big index

2013-01-31 Thread Dmitry Kan
Does debugQuery=true tell you anything useful for these? For example, which component takes most of the 30 seconds? Do you have evictions in your Solr caches? Dmitry On Thu, Jan 31, 2013 at 10:01 AM, Mou mouna...@gmail.com wrote: I am running Solr 3.4 on Tomcat 7. Our index is very big: two

Question on Facet field constraints sort order

2013-01-31 Thread vijeshnair
It could be a foolish question or concern, but I have no option :-). We have an e-commerce site where we consume the feed from the CSE partners and index it into Solr for our search. Instead of the traditional auto-suggest, the predictive search in the header search box recommends the

Solr4.1 changing result order FIFO to LIFO

2013-01-31 Thread Bernd Fehling
Hi list, I noticed that the result order is FIFO if documents have the same score. I think this is because documents that are indexed later get a higher internal document ID, and the output for documents with the same score starts with the lowest internal document ID and ascends.
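
The tie-breaking behaviour described in this post can be illustrated with a small sketch. It is plain Python, not Solr code: the `rank` function and the sample hits are hypothetical, and the only assumption is the one the post states, namely that documents with equal scores come back in ascending internal document ID order.

```python
def rank(hits):
    """Order hits by descending score; break ties by ascending internal doc ID."""
    return sorted(hits, key=lambda hit: (-hit["score"], hit["docid"]))


# Hypothetical hits: docid stands in for the internal document ID,
# which grows as documents are indexed later.
hits = [
    {"docid": 7, "score": 1.0},
    {"docid": 2, "score": 1.0},
    {"docid": 5, "score": 2.5},
]

ranked_ids = [hit["docid"] for hit in rank(hits)]
print(ranked_ids)
```

With this ordering, the two documents that share a score appear oldest first, which is the FIFO effect the post reports.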

Re: searching for an id

2013-01-31 Thread Chandan Tamrakar
Which analyzer are you using to index that field? You can verify that from the schema file. Thanks. On Thu, Jan 31, 2013 at 2:35 PM, b.riez...@pixel-ink.de b.riez...@pixel-ink.de wrote: Hi, I have an id which is a string like this: tx-20130130-4599. I'm using a field without processing,

Thoughts on production deployment?

2013-01-31 Thread Scott Stults
Part of this is a rant, part is a plea to others who've run successful production deployments. Solr is a second-class citizen when it comes to production deployment. Every recipe I've seen (RPM, DEB, chef, or puppet) makes assumptions that in one way or another run afoul of best-practices when

RE: Solr load balancer

2013-01-31 Thread Phil Hoy
Hi, So am I correct in thinking that I add the JIRA issue myself? If so, can I add it to the 4.2 release? Also, I have further questions about the scope of my patch; should those be left to the comments of the JIRA issue itself? Phil -Original Message- From: Otis Gospodnetic

solr atomic update

2013-01-31 Thread Marcos Mendez
Is there a way to do an atomic update (inc by 1) and retrieve the updated value in one operation?
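
For context, the increment half of this question can be sketched as the JSON body of a Solr 4.x atomic update, which uses the `inc` modifier. This is a minimal sketch only: the helper name `increment_payload`, the document id `doc1`, and the field name `views` are hypothetical, and the sketch does not show how to read the updated value back, which is the open part of the question.

```python
import json


def increment_payload(doc_id, field, amount=1):
    """Build the JSON body for an atomic 'inc' update of one document.

    Assumption: the unique key field is called "id"; adjust to your schema.
    """
    return json.dumps([{"id": doc_id, field: {"inc": amount}}])


# Hypothetical document id and counter field.
body = increment_payload("doc1", "views")
print(body)
```

A body like this would be posted to the core's update handler with a JSON content type; fetching the new value would still need a separate request unless another mechanism is available.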

Re: Can I start solr with replication activated but disabled between master and slave

2013-01-31 Thread Erick Erickson
You can also do all this via HTTP commands, see: http://wiki.apache.org/solr/SolrReplication#HTTP_API which allows you to control _all_ replication from the master (i.e. tell the master not to do any replication) or just tell a slave not to replicate any more, as well as a lot of other stuff. Best
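
The HTTP commands mentioned here can be sketched as URLs against the replication handler. The command names (`disablereplication` and `enablereplication` on the master, `disablepoll` and `enablepoll` on a slave) come from the replication HTTP API; the helper `replication_url` and the host names are hypothetical, and the sketch only builds the URL rather than sending the request.

```python
from urllib.parse import urlencode

# Commands from the replication HTTP API, grouped by the node they target.
MASTER_COMMANDS = {"enablereplication", "disablereplication"}
SLAVE_COMMANDS = {"enablepoll", "disablepoll"}


def replication_url(base_url, command):
    """Return the replication-handler URL for a known command."""
    if command not in MASTER_COMMANDS | SLAVE_COMMANDS:
        raise ValueError(f"unknown replication command: {command}")
    query = urlencode({"command": command})
    return f"{base_url.rstrip('/')}/replication?{query}"


# Hypothetical hosts: stop all replication at the master, or only one slave's polling.
print(replication_url("http://master_host:8983/solr", "disablereplication"))
print(replication_url("http://slave_host:8983/solr", "disablepoll"))
```

In a multi-core setup the base URL would also include the core name.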

Re: Indexing problems

2013-01-31 Thread Erick Erickson
I'm really surprised you're hitting OOM errors, I suspect you have something else pathological in your system. So, I'd start checking things like - how many concurrent warming searchers you allow - How big your indexing RAM is set to (we find very little gain over 128M BTW). - Other load on your

Indexing nouns only - UIMA vs. OpenNLP

2013-01-31 Thread Kai Gülzau
Hi, I am stuck trying to index only the nouns of German and English texts (very similar to http://wiki.apache.org/solr/OpenNLP#Full_Example). My first try was to use UIMA with the HMMTagger: <processor class="org.apache.solr.uima.processor.UIMAUpdateRequestProcessorFactory"> <lst name="uimaConfig">

Re: Possible issue in edismax?

2013-01-31 Thread Felipe Lahti
So, it depends on your business requirement, right? If a document has matches in more searchable fields, at least for me, this document is more important than another document that has fewer matches. Example: Put this in your schema: <similarity class="com.your.namespace.NoIDFSimilarity"/> And create

Re: setting up master and slave in same machine with diff ip's and same port

2013-01-31 Thread epnRui
Hi, I solved the issue by setting up two different virtual network adapters in Ubuntu Server. Case closed ;) Thanks for the help!!

Stopping solr

2013-01-31 Thread epnRui
Hi people, First of all, this forum is a godsend!!! Second: I have a master/slave configuration, using replication. Currently in production I have only one server; there's no backup server (really...). The web application is public; everyone can see it. - How often, in

Re: Thoughts on production deployment?

2013-01-31 Thread Michael Della Bitta
On Thu, Jan 31, 2013 at 5:13 AM, Scott Stults sstu...@opensourceconnections.com wrote: Right now that blessed container is Jetty version 8.1.2.v20120308. I'd really like some confirmation from the devs that there really is a blessed status for a given container that provides advantages over

RE: Indexing problems

2013-01-31 Thread GASPARD Joel
Hello Erick, Thanks for your answer. After reading previous subjects on the user list, we had already tried to change the parameters we mentioned. - concurrent warming searchers: we have set the maxWarmingSearchers attribute to 2 <maxWarmingSearchers>2</maxWarmingSearchers> - we have tried 32

Re: Stopping solr

2013-01-31 Thread Michael Della Bitta
- How often, in your experience, and why, would solr crash? Not very often. Typically if your heap is too small, you'll end up going OOM. - If I kill solr master and slave, usually do I need to also delete the indexes? Or everything should be fine upon restarting? Restarts are fine. Order

Re: Possible issue in edismax?

2013-01-31 Thread Sandeep Mestry
Fantastic! Thanks very much.. I will do so accordingly and will let you know the results. Thanks again, Sandeep On 31 January 2013 13:54, Felipe Lahti fla...@thoughtworks.com wrote: So, it depends of your business requirement, right? If a document has matches in more searchable fields, at

Re: long QTime for big index

2013-01-31 Thread Mou
Thanks for your reply. No, there is no eviction yet. The time is spent mostly in org.apache.solr.handler.component.QueryComponent processing the request. Again, the time varies widely for the same query.

Re: searching for an id

2013-01-31 Thread Alexandre Rafalovitch
Are you using eDismax? Maybe your ID field is not part of the search fields or not a high priority. And, just maybe, you are doing a copyField * to text and the text splits the ID into parts. Enable the debug on your query and you should be able to figure it out. Regards, Alex. Personal blog:

Re: help to build query

2013-01-31 Thread Abhishek tiwari
Jack, thanks for your response. We have a deals web application with free-text search in it. Here, free text means you can type anything in it. We have deals in different categories, tagged at different merchant locations. As per the requirements, I have to do some tweaks in

Re: Thoughts on production deployment?

2013-01-31 Thread Paul Jungwirth
We have a Chef regime here, and I've written Tomcat and Solr recipes to be played against Ubuntu 12.04 Server. We do mostly the same: chef to install Tomcat (with configuration appropriate to Solr), but then instead of deploying Solr via chef, we use an ant script to package and deploy a war

RE: Indexing nouns only - UIMA vs. OpenNLP

2013-01-31 Thread Kai Gülzau
UIMA: I just found this issue https://issues.apache.org/jira/browse/SOLR-3013 Now I am able to use this analyzer for English texts and filter (un)wanted token types :-) <fieldType name="uima_nouns_en" class="solr.TextField" positionIncrementGap="100"> <analyzer> <tokenizer

RE: field space consumption - stored vs not stored

2013-01-31 Thread Petersen, Robert
Thanks Shawn. Actually now that I think about it, Yonik also mentioned something about lucene number representation once in reply to one of my questions. Here it is: Could you also tell me what these `#8;#0;#0;#0;#1; strings represent in the debug output? That's internally how a number is

Search match all tokens in Query Text

2013-01-31 Thread Bing Hua
Hello, I have a field text with type text_general here. <fieldType name="text_general" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.StandardTokenizerFactory"/>

Re: Search match all tokens in Query Text

2013-01-31 Thread Jack Krupansky
+text:a +b -- Jack Krupansky -Original Message- From: Bing Hua Sent: Thursday, January 31, 2013 12:59 PM To: solr-user@lucene.apache.org Subject: Search match all tokens in Query Text Hello, I have a field text with type text_general here. <fieldType name="text_general"

Re: Search match all tokens in Query Text

2013-01-31 Thread Bing Hua
Thanks for the quick reply. It seems you are suggesting adding an explicit AND operator. I don't think this solves my problem. I found <solrQueryParser defaultOperator="AND"/> somewhere, and this works.
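
The two approaches in this thread, marking each term as required in the query string versus setting the default operator to AND in the schema, aim at the same result. A minimal sketch of the first approach follows. The helper `require_all` is hypothetical, it splits on whitespace only (the field's real tokenizer may split differently), and it does no escaping of query syntax characters.

```python
def require_all(field, text):
    """Prefix every whitespace-separated term with '+' so each one is mandatory."""
    return " ".join(f"+{field}:{term}" for term in text.split())


# Hypothetical field name and user input.
query = require_all("text", "solr big index")
print(query)
```

Setting the default operator once in the schema avoids rewriting every query on the client, which may be why it was the preferred fix here.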

Re: long QTime for big index

2013-01-31 Thread Shawn Heisey
On 1/31/2013 1:01 AM, Mou wrote: I am running Solr 3.4 on Tomcat 7. Our index is very big: two cores, each 120G. We search the slaves, which are replicated every 30 min. I am using only the filterCache, and we have more than 90% cache hits. We use a lot of filter queries; queries are usually

Re: long QTime for big index

2013-01-31 Thread Mou
Thank you, Shawn, for reading all of my previous entries and for the detailed answer. To clarify, the third shard is used to store the recently added/updated data. The two main big cores take a very long time to replicate (when a full replication is required), so the third one helps us to return the newly

DIH and splitBy

2013-01-31 Thread Christopher Condit
I'm having an issue getting the splitBy construct from the regex transformer to work in a very basic case (with either Solr 3.6 or 4.1). I have a field defined like this: <field stored="true" name="type" type="string" multiValued="true"/> The entity is defined like this: <entity name="item"

RE: DIH and splitBy

2013-01-31 Thread Dyer, James
In your unit test, you have: "<field column=\"type\" name=\"type\" splitBy=\"\\|\" />" + And also: runner.update("INSERT INTO test VALUES 1, 'foo,bar,baz'"); So you need to decide if you want to delimit with a pipe or a comma. James Dyer Ingram Content Group (615) 213-4311 -Original Message-
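
The mismatch pointed out in this reply can be reproduced outside Solr. The sketch below uses Python's `re` module as a stand-in for the Java regex that splitBy is applied with, and reuses the sample value from the unit test: a pipe pattern finds nothing to split on, while a comma pattern yields three values.

```python
import re

# The value inserted by the unit test.
row_value = "foo,bar,baz"

# splitBy is a regular expression, so a literal pipe must be escaped.
split_on_pipe = re.split(r"\|", row_value)

# Splitting on the delimiter that is actually present in the data.
split_on_comma = re.split(",", row_value)

print(split_on_pipe)
print(split_on_comma)
```

If the field still arrives as a single value after the delimiter matches the data, the cause lies elsewhere in the configuration rather than in the regex itself.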

Re: long QTime for big index

2013-01-31 Thread Shawn Heisey
On 1/31/2013 12:47 PM, Mou wrote: To clarify, the third shard is used to store the recently added/updated data. The two main big cores take a very long time to replicate (when a full replication is required), so the third one helps us to return the newly indexed documents quickly. It gets deleted every

Re: DIH and splitBy

2013-01-31 Thread Christopher Condit
Sorry about that - even if I switch the splitBy to "," it still doesn't work. Here's the corrected unit test: http://pastie.org/5995399 On Thu, Jan 31, 2013 at 12:30 PM, Dyer, James james.d...@ingramcontent.com wrote: In your unit test, you have: "<field column=\"type\" name=\"type\" splitBy=\"\\|\" />" +

RE: long QTime for big index

2013-01-31 Thread Toke Eskildsen
Shawn Heisey [s...@elyograg.org] wrote: [...] If you have a total index size for this JVM of 240GB, then you may not have enough RAM to let the OS disk cache work efficiently. For that size of index, I would plan on a system with at least 128GB of RAM, 256GB would be better. [...] One of

Re: Stopping solr

2013-01-31 Thread Michael Della Bitta
The ping handler is how we tell our load balancers that our Solr cores are healthy. I guess if you're running more than one core behind the same balancer, it would make sense to drop a webapp in there that ran the ping queries for all your cores and only responded OK if they all came back OK. Or

Re: Thoughts on production deployment?

2013-01-31 Thread Mark Miller
On Jan 31, 2013, at 10:15 AM, Michael Della Bitta michael.della.bi...@appinions.com wrote: I'd really like some confirmation from the devs that there really is a blessed status for a given container that provides advantages over others. IMO: jetty is what all of our unit/integration tests

Re: Minimum word length for stemming

2013-01-31 Thread Jan Høydahl
Hi, I believe each stemmer implementation decides that for itself. At least the MinimalNorwegianStemmer has built-in logic which stems certain suffixes only if the token is N chars. If you want external control, you can look at

Re: Thoughts on production deployment?

2013-01-31 Thread Michael Della Bitta
That's surprising to me, mostly because a number of the Solr wiki pages don't really make that strong of a case for it: http://wiki.apache.org/solr/SolrInstall http://wiki.apache.org/solr/SolrTomcat http://wiki.apache.org/solr/SolrJetty Would it make sense to spell that out somewhere? I do

Re: Minimum word length for stemming

2013-01-31 Thread Jamie Johnson
Thanks for confirming my suspicions, the custom TokenLengthMarkerFilterFactory sounds like the best approach for doing this. On Thu, Jan 31, 2013 at 5:12 PM, Jan Høydahl jan@cominvent.com wrote: Hi, I believe each stemmer implementation decides that themselves. At least the

Re: Thoughts on production deployment?

2013-01-31 Thread Shawn Heisey
On 1/31/2013 3:21 PM, Michael Della Bitta wrote: I do notice that it seems like the version of Jetty that ships with Solr isn't the preferred one according to the wiki, so that would be an extra dependency for a config management system like Chef. Near as I can tell, the versions of jetty that

RE: Solr load balancer

2013-01-31 Thread Jeff Wartes
For what it's worth, Google has done some pretty interesting research into coping with the idea that particular shards might very well be busy doing something else when your query comes in. Check out this slide deck: http://research.google.com/people/jeff/latency.html Lots of interesting

Re: Solr load balancer

2013-01-31 Thread Lance Norskog
It is possible to do this with IP Multicast. The query goes out on the multicast and all query servers read it. The servers wait for a random amount of time, then transmit the answer. Here's the trick: it's multicast. All of the query servers listen to each other's responses, and drop out when

Re: Indexing nouns only - UIMA vs. OpenNLP

2013-01-31 Thread Lance Norskog
Thanks, Kai! About removing non-nouns: the OpenNLP patch includes two simple TokenFilters for manipulating terms with payloads. The FilterPayloadFilter lets you keep or remove terms with given payloads. In the demo schema.xml, there is an example type that keeps only nouns and verbs. There is a

Re: long QTime for big index

2013-01-31 Thread Mou
Thank you again. Unfortunately, the index files will not fit in RAM. I will have to try using the document cache. I am also moving my index to SSD again; we took our index off when the Fusion-io cards failed twice during indexing and the index was corrupted. Now, with the BIOS upgrade and new driver, it is