Re: Starts with Query

2012-06-15 Thread nutchsolruser
Thanks, Jack, for the valuable response. Actually, I am trying to match *any* numeric pattern at the start of each document. I don't know the documents in the index; I just want documents whose title starts with any digit.

IndexWrite in Lucene/Solr 3.5 is slower?

2012-06-15 Thread Ramprakash Ramamoorthy
We are upgrading our search infrastructure from Lucene 2.3.1 to Lucene 3.5. I am in the process of load testing, and I found that Lucene 2.3.1 could index 32,000 docs per second, whereas Lucene 3.5 could index only around 17,000 docs per second. Indeed, both of them use the standard analyzer

RE: Starts with Query

2012-06-15 Thread Afroz Ahmad
If you are not searching for a specific digit and want to match all documents that start with any digit, you could, as part of the indexing process, have another field, say startsWithDigit, and set it to true if the title begins with a digit. All you need to do at query time then is query for
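A minimal sketch of the index-time step described above, assuming documents are plain dicts with a `title` key before they are sent to Solr. The `startsWithDigit` field name comes from the message; the helper name `add_starts_with_digit`, the dict layout, and the choice to count only ASCII digits are assumptions made for illustration.

```python
# Only the ten ASCII digits are treated as "a digit" in this sketch.
ASCII_DIGITS = frozenset("0123456789")


def add_starts_with_digit(doc):
    """Return a copy of doc with a boolean startsWithDigit flag.

    Hypothetical helper: doc is assumed to be a dict with an optional
    'title' string. Nothing here is part of Solr itself.
    """
    title = doc.get("title") or ""
    flagged = dict(doc)
    # title[:1] is "" for an empty title, which is not in the set.
    flagged["startsWithDigit"] = title[:1] in ASCII_DIGITS
    return flagged
```

The flag is computed once per document at index time, so the query side only has to test a single boolean field instead of inspecting title terms.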

Re: IndexWrite in Lucene/Solr 3.5 is slower?

2012-06-15 Thread pravesh
BTW, have you changed the MergePolicy and MergeScheduler settings also? Since Lucene 3.x/3.5 onwards, there have been new MergePolicy and MergeScheduler implementations available, like TieredMergePolicy and ConcurrentMergeScheduler. Regards, Pravesh

Re: Starts with Query

2012-06-15 Thread Michael Kuhlmann
It's not necessary to do this. You can simply be happy about the fact that all digits are ordered strictly in Unicode, so you can use a range query: (f)q={!frange l=0 u=\: incl=true incu=false}title This finds all documents where any token from the title field starts with a digit, so if you
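The ordering argument can be checked outside Solr. A small sketch, assuming plain Python string comparison (which compares by code point) stands in for term ordering: the ASCII digits occupy code points 48 to 57 and ':' is code point 58, so the half-open range from '0' (inclusive) to ':' (exclusive) covers exactly the strings that begin with an ASCII digit. The function name `in_digit_range` is hypothetical.

```python
def in_digit_range(token, lower="0", upper=":"):
    """True if lower <= token < upper under code point ordering.

    Mirrors the bounds in the message: lower bound '0' inclusive,
    upper bound ':' exclusive. Illustrative only, not Solr code.
    """
    return lower <= token < upper
```

Any string whose first character is '9' still sorts below ':', however long it is, which is why the exclusive upper bound at ':' is enough.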

Re: IndexWrite in Lucene/Solr 3.5 is slower?

2012-06-15 Thread Ramprakash Ramamoorthy
On Fri, Jun 15, 2012 at 12:20 PM, pravesh suyalprav...@yahoo.com wrote: BTW, Have you changed the MergePolicy MergeScheduler settings also? Since Lucene 3.x/3.5 onwards, there have been new MergePolicy MergeScheduler implementations available, like TieredMergePolicy

Re: DIH idle in transaction forever

2012-06-15 Thread Jasper Floor
Btw, I removed the batchSize but performance is better with batchSize=1. I haven't done further testing to see what the best setting is, but the difference between setting it at 1 and not setting it is almost double the indexing time (~20 minutes vs ~37 minutes) On Thu, Jun 14, 2012 at

FileListEntityProcessor limit at 11 files?

2012-06-15 Thread Roland Ucker
Hello, I'm using the DIH to index some PDFs. Everything works fine for the first 11 files. But after indexing 11 PDFs the process stops, independently of the PDFs being indexed or the directory structure (recursive=true). The Lucene index for these 11 documents is valid. Is there anything like a

Re: FilterCache - maximum size of document set

2012-06-15 Thread Erick Erickson
Test first, of course, but slave on 3.6 and master on 3.5 should be fine. If you're getting evictions with the cache settings that high, you really want to look at why. Note that in particular, using NOW in your filter queries virtually guarantees that they won't be re-used as per the link I sent

SolrCloud subdirs in conf bootstrap dir

2012-06-15 Thread Markus Jelsma
Hi, We'd like to create subdirectories for each collection in our conf bootstrap directory for cleaner maintenance and not having to include the collection name in each configuration file. However, it is not working: 2012-06-15 11:31:08,483 ERROR [solr.core.CoreContainer] - [main] - :

Re: Dedupe and overwriteDupes setting

2012-06-15 Thread Shameema Umer
Hi, My solrconfig dedupe setting is as follows. <updateRequestProcessorChain name="dedupe"> <processor class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory"> <bool name="enabled">true</bool> <bool name="overwriteDupes">false</bool> <str name="signatureField">dupesign</str>

Re: IndexWrite in Lucene/Solr 3.5 is slower?

2012-06-15 Thread Ramprakash Ramamoorthy
On Fri, Jun 15, 2012 at 12:50 PM, Ramprakash Ramamoorthy youngestachie...@gmail.com wrote: On Fri, Jun 15, 2012 at 12:20 PM, pravesh suyalprav...@yahoo.com wrote: BTW, Have you changed the MergePolicy MergeScheduler settings also? Since Lucene 3.x/3.5 onwards, there have been new

Re: Building a heat map from geo data in index

2012-06-15 Thread Jamie Johnson
So I've tried this a bit, but I can't get it to look quite right. What I was doing up until now was taking the center point of the geohash cell as location for the value I am getting from the index. Doing this you end up with what appears to be islands (using HeatMap.js currently). I guess what I

Re: FilterCache - maximum size of document set

2012-06-15 Thread Pawel Rog
Thanks. I don't use NOW in queries. All my timestamp filters are rounded to hundreds of seconds to increase the hit rate. The only problem could be the price filters, which can vary (users are unpredictable :P), but also that filters from fq or setting cache=false is also a bad idea ... checked it
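A rough sketch of the rounding idea mentioned above, assuming timestamps are stored as integer epoch seconds in a field called `timestamp` (both are assumptions, as are the helper names). Rounding both bounds down to a multiple of 100 means that requests arriving within the same 100-second window produce the same filter string, so a cache keyed on that string can be reused.

```python
def round_down(epoch_seconds, bucket=100):
    """Round an integer timestamp down to a multiple of bucket."""
    return epoch_seconds - (epoch_seconds % bucket)


def timestamp_filter(field, start, end, bucket=100):
    """Build a range filter string with both bounds rounded down.

    Hypothetical helper: the field name and epoch-second storage are
    assumptions; the field:[low TO high] range syntax is standard.
    """
    low = round_down(start, bucket)
    high = round_down(end, bucket)
    return f"{field}:[{low} TO {high}]"
```

The trade-off is precision: results can be off by up to one bucket width at each end of the range.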

SolrCloud and split-brain

2012-06-15 Thread Otis Gospodnetic
Hi, How exactly does SolrCloud handle split brain situations? Imagine a cluster of 10 nodes. Imagine 3 of them being connected to the network by some switch and imagine the out port of this switch dies. When that happens, these 3 nodes will be disconnected from the other 7 nodes and we'll have

Re: SolrCloud and split-brain

2012-06-15 Thread Yury Kats
On 6/15/2012 12:49 PM, Otis Gospodnetic wrote: Hi, How exactly does SolrCloud handle split brain situations? Imagine a cluster of 10 nodes. Imagine 3 of them being connected to the network by some switch and imagine the out port of this switch dies. When that happens, these 3 nodes will

StreamingUpdateSolrServer Connection Timeout Setting

2012-06-15 Thread Kissue Kissue
Hi, Does anybody know what the default connection timeout setting is for StreamingUpdateSolrServer? Can I explicitly set one, and how? Thanks.

Re: SolrCloud and split-brain

2012-06-15 Thread Mark Miller
Zookeeper avoids split brain using Paxos (or something very like it - I can't remember if they extended it or modified and/or what they call it). So you will only ever see one Zookeeper cluster - the smaller partition will be down. There is a proof for Paxos if I remember right. Zookeeper then
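The statement that the smaller partition will be down follows from majority quorum: a ZooKeeper ensemble only serves requests while more than half of its configured servers can reach each other. A minimal sketch of that rule, where the function name and the example ensemble sizes are assumptions chosen for illustration:

```python
def has_quorum(reachable, ensemble_size):
    """True if reachable servers form a strict majority of the ensemble.

    Illustrative only: models the majority rule, not ZooKeeper's
    actual election or broadcast protocol.
    """
    return reachable > ensemble_size // 2
```

With a hypothetical 3-server ensemble split 2 and 1, only the side with 2 servers keeps quorum. An even split, such as 2 and 2 in a 4-server ensemble, leaves neither side with a majority, which is one reason odd ensemble sizes are usually chosen.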

Re: SolrCloud and split-brain

2012-06-15 Thread Otis Gospodnetic
Hi, Zookeeper avoids split brain using Paxos (or something very like it - I can't remember if they extended it or modified and/or what they call it). So you will only ever see one Zookeeper cluster - the smaller partition will be down. There is a proof for Paxos if I remember right.

Re: SolrCloud and split-brain

2012-06-15 Thread Mark Miller
On Jun 15, 2012, at 1:44 PM, Otis Gospodnetic wrote: Does this work even when outside clients (apps for indexing or searching) send their requests directly to individual nodes? Let's use the example from my email where we end up with 2 groups of nodes: 7-node group with 2 ZK nodes on the

Re: StreamingUpdateSolrServer Connection Timeout Setting

2012-06-15 Thread Sami Siren
The api doc for version 3.6.0 is available here: http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.html I think the default is coming from your OS if you are not setting it explicitly. -- Sami Siren On Fri, Jun 15, 2012 at 8:22 PM, Kissue Kissue

Re: SolrCloud and split-brain

2012-06-15 Thread Otis Gospodnetic
Ola, Thanks Mark! Does this work even when outside clients (apps for indexing or searching) send their requests directly to individual nodes? Let's use the example from my email where we end up with 2 groups of nodes: 7-node group with 2 ZK nodes on the same network and 3-node group

Re: SolrCloud and split-brain

2012-06-15 Thread Mark Miller
On Jun 15, 2012, at 2:12 PM, Otis Gospodnetic wrote: Makes sense. Do responses carry something to alert the client that something is rotten in the state of cluster? No, I don't think so - we should probably add that to the header similar to how I assume partial results will work. Feel

Re: SolrCloud and split-brain

2012-06-15 Thread Otis Gospodnetic
Thanks Mark, will open an issue in a bit. But I think the following is the real meat of the Q about split brain and SolrCloud, especially when it comes to how indexing is handled during split brain: Does this work even when outside clients (apps for indexing or searching) send their

WordBreak and default dictionary crash Solr

2012-06-15 Thread Carrie Coy
Is this a configuration problem or a bug? We use two dictionaries, default (spellcheckerFreq) and solr.WordBreakSolrSpellChecker. When a query contains 2 misspellings, one corrected by the default dictionary, and the other corrected by the wordbreak dictionary (strawberryn shortcake), Solr

Re: SolrCloud and split-brain

2012-06-15 Thread Mark Miller
On Jun 15, 2012, at 3:21 PM, Otis Gospodnetic wrote: Thanks Mark, will open an issue in a bit. But I think the following is the real meat of the Q about split brain and SolrCloud, especially when it comes to how indexing is handled during split brain: Does this work even when

RE: WordBreak and default dictionary crash Solr

2012-06-15 Thread Dyer, James
Carrie, Thank you for trying out new features! I'm pretty sure you've found a bug here. Could you tell me whether you're using a build from Trunk or Solr_4x ? Also, do you know the svn revision or the Jenkins build # (or timestamp) you're working from? Could you try instead to use

Re: How to boost a field with another field's value?

2012-06-15 Thread smita
Actually I have a title field that I am searching for my query term, and the documents have a rating field that I want to boost the results by, so the higher rated items appear before the lower rated documents. I am also boosting results on another field using bq:

Writing index files that have the right owner

2012-06-15 Thread Mike O'Leary
I have been putting together an application using Quartz to run several indexing jobs in sequence using SolrJ and Tomcat on Windows. I would like the Quartz job to do the following: 1. Delete index directories from the cores so each indexing job starts fresh with empty indexes to

Re: Solr Search Count Variance

2012-06-15 Thread Jack Krupansky
The variance is simply likely due to the fact that your text field is analyzed differently than the source fields you include in your dismax qf. For example, maybe some of them may be string with no analysis. So, fewer of those fields are matching on your query terms when using dismax. Look

Re: SolrCloud and split-brain

2012-06-15 Thread Otis Gospodnetic
Thanks Mark. The reason I asked this is because I saw mentions of SolrCloud being resilient to split brain because it uses ZooKeeper. However, if my half brain understands what split brain is then I think that's not a completely true claim because one can get unlucky and get a SolrCloud