Thanks Jack for the valuable response. Actually I am trying to match *any* numeric
pattern at the start of each document. I don't know the documents in the index; I
just want documents whose title starts with any digit.
We are upgrading our search infrastructure from Lucene 2.3.1 to Lucene 3.5.
I am in the process of load testing, and I found that Lucene 2.3.1
could index 32,000 docs per second, whereas Lucene 3.5 indexes only
around 17,000 docs per second.
Indeed, both of them use the standard analyzer
If you are not searching for a specific digit and want to match all
documents that start with any digit, you could, as part of the indexing
process, add another field, say startsWithDigit, and set it to true if
the title begins with a digit. All you need to do at query time then
is query for
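The index-time flag suggested above can be sketched like this (a minimal sketch; the doc dict is just an illustration, not any specific Solr client API):

```python
# Derive a boolean field from the title at indexing time, as suggested above.
# The field name "startsWithDigit" follows the mail; everything else is
# illustrative.

def starts_with_digit(title: str) -> bool:
    """True if the title begins with an ASCII digit 0-9."""
    return bool(title) and "0" <= title[0] <= "9"

doc = {"title": "3rd Edition Handbook"}
doc["startsWithDigit"] = starts_with_digit(doc["title"])
# Query time then reduces to a simple boolean match: q=startsWithDigit:true
print(doc["startsWithDigit"])  # True
```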
BTW, have you changed the MergePolicy/MergeScheduler settings also? Since
Lucene 3.x/3.5 onwards, there have been new MergePolicy and MergeScheduler
implementations available, like TieredMergePolicy and ConcurrentMergeScheduler.
Regards
Pravesh
It's not necessary to do this. You can simply take advantage of the fact
that all digits are ordered contiguously in Unicode, so you can use a range
query:
(f)q={!frange l=0 u=\: incl=true incu=false}title
This finds all documents where any token from the title field starts
with a digit, so if you
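The reason the range trick works can be checked directly:

```python
# Why the range query above works: the ASCII digits '0'..'9' occupy
# U+0030..U+0039, and ':' is the very next code point (U+003A), so the
# half-open range [0, :) covers exactly the digit characters.
digits = "0123456789"
assert all("0" <= ch < ":" for ch in digits)
assert not ("a" < ":")  # letters sort above ':', outside the range
print(ord("9"), ord(":"))  # 57 58
```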
On Fri, Jun 15, 2012 at 12:20 PM, pravesh suyalprav...@yahoo.com wrote:
BTW, Have you changed the MergePolicy MergeScheduler settings also? Since
Lucene 3.x/3.5 onwards,
there have been new MergePolicy MergeScheduler implementations available,
like TieredMergePolicy
Btw, I removed the batchSize setting, but performance is better with
batchSize=1. I haven't done further testing to see what the best
setting is, but the difference between setting it to 1 and not
setting it is almost double the indexing time (~20 minutes vs. ~37
minutes).
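For reference, batchSize is an attribute on the DIH JDBC data source in data-config.xml; a minimal sketch (the driver and URL are placeholders, not from this mail):

```xml
<dataConfig>
  <dataSource type="JdbcDataSource"
              driver="com.mysql.jdbc.Driver"
              url="jdbc:mysql://localhost/mydb"
              batchSize="1"/>
  <!-- for MySQL, batchSize="-1" is commonly used instead, to enable
       row streaming rather than buffering the whole result set -->
</dataConfig>
```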
On Thu, Jun 14, 2012 at
Hello,
I'm using the DIH to index some PDFs.
Everything works fine for the first 11 files, but after indexing 11 PDFs
the process stops, regardless of which PDFs are being indexed or of the
directory structure (recursive=true).
The Lucene index for these 11 documents is valid.
Is there anything like a
Test first, of course, but a slave on 3.6 and a master on 3.5 should be
fine. If you're getting evictions with the cache settings that high, you
really want to look at why.
Note that in particular, using NOW in your filter queries virtually
guarantees that they won't be re-used, as per the link I sent
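To illustrate the NOW pitfall (assuming a date field called timestamp): NOW is re-evaluated to the current millisecond on every request, so the filter string differs each time and never hits the filter cache, while a rounded expression stays stable:

```
fq=timestamp:[NOW-1DAY TO NOW]          <- distinct cache entry per request
fq=timestamp:[NOW/DAY-1DAY TO NOW/DAY]  <- stable until the day rolls over
```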
Hi,
We'd like to create subdirectories for each collection in our conf bootstrap
directory, for cleaner maintenance and so we don't have to include the
collection name in each configuration file. However, it is not working:
2012-06-15 11:31:08,483 ERROR [solr.core.CoreContainer] - [main] - :
Hi,
My solrconfig dedupe setting is as follows.
<updateRequestProcessorChain name="dedupe">
  <processor
      class="org.apache.solr.update.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <bool name="overwriteDupes">false</bool>
    <str name="signatureField">dupesign</str>
On Fri, Jun 15, 2012 at 12:50 PM, Ramprakash Ramamoorthy
youngestachie...@gmail.com wrote:
On Fri, Jun 15, 2012 at 12:20 PM, pravesh suyalprav...@yahoo.com wrote:
BTW, Have you changed the MergePolicy MergeScheduler settings also?
Since
Lucene 3.x/3.5 onwards,
there have been new
So I've tried this a bit, but I can't get it to look quite right.
What I was doing up until now was taking the center point of the
geohash cell as the location for the value I am getting from the index.
Doing this, you end up with what appear to be islands (using
HeatMap.js currently). I guess what I
Thanks
I don't use NOW in queries. All my timestamp filters are rounded to
hundreds of seconds to increase the hit rate. The only problem could be
the price filters, which can vary (users are unpredictable :P), but
dropping those filters from fq, or setting cache=false,
is also a bad idea ... checked it
Hi,
How exactly does SolrCloud handle split brain situations?
Imagine a cluster of 10 nodes.
Imagine 3 of them being connected to the network by some switch and imagine the
out port of this switch dies.
When that happens, these 3 nodes will be disconnected from the other 7 nodes
and we'll have
On 6/15/2012 12:49 PM, Otis Gospodnetic wrote:
Hi,
How exactly does SolrCloud handle split brain situations?
Imagine a cluster of 10 nodes.
Imagine 3 of them being connected to the network by some switch and imagine
the out port of this switch dies.
When that happens, these 3 nodes will
Hi,
Does anybody know what the default connection timeout setting is for
StreamingUpdateSolrServer? Can I explicitly set one, and how?
Thanks.
ZooKeeper avoids split brain using Paxos (or something very like it - I can't
remember if they extended or modified it, and/or what they call it).
So you will only ever see one ZooKeeper cluster - the smaller partition will be
down. There is a proof for Paxos, if I remember right.
Zookeeper then
Hi,
Zookeeper avoids split brain using Paxos (or something very like it - I
can't remember if they extended it or modified and/or what they call it).
So you will only ever see one Zookeeper cluster - the smaller partition will
be
down. There is a proof for Paxos if I remember right.
On Jun 15, 2012, at 1:44 PM, Otis Gospodnetic wrote:
Does this work even when outside clients (apps for indexing or searching)
send their requests directly to individual nodes?
Let's use the example from my email where we end up with 2 groups of nodes:
7-node group with 2 ZK nodes on the
The api doc for version 3.6.0 is available here:
http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/impl/StreamingUpdateSolrServer.html
I think the default is coming from your OS if you are not setting it explicitly.
--
Sami Siren
On Fri, Jun 15, 2012 at 8:22 PM, Kissue Kissue
Ola,
Thanks Mark!
Does this work even when outside clients (apps for indexing or searching)
send their requests directly to individual nodes?
Let's use the example from my email where we end up with 2 groups of
nodes: 7-node group with 2 ZK nodes on the same network and 3-node group
On Jun 15, 2012, at 2:12 PM, Otis Gospodnetic wrote:
Makes sense. Do responses carry something to alert the client that
something is rotten in the state of cluster?
No, I don't think so - we should probably add that to the header similar to how
I assume partial results will work.
Feel
Thanks Mark, will open an issue in a bit.
But I think the following is the real meat of the Q about split brain and
SolrCloud, especially when it comes to how indexing is handled during split
brain:
Does this work even when outside clients (apps for indexing or searching)
send their
Is this a configuration problem or a bug?
We use two dictionaries: the default (spellcheckerFreq) and
solr.WordBreakSolrSpellChecker. When a query contains two misspellings,
one corrected by the default dictionary and the other corrected by the
wordbreak dictionary (strawberryn shortcake), Solr
On Jun 15, 2012, at 3:21 PM, Otis Gospodnetic wrote:
Thanks Mark, will open an issue in a bit.
But I think the following is the real meat of the Q about split brain and
SolrCloud, especially when it comes to how indexing is handled during split
brain:
Does this work even when
Carrie,
Thank you for trying out new features! I'm pretty sure you've found a bug
here. Could you tell me whether you're using a build from Trunk or Solr_4x ?
Also, do you know the svn revision or the Jenkins build # (or timestamp) you're
working from?
Could you try instead to use
Actually I have a title field that I am searching for my query term, and the
documents have a rating field that I want to boost the results by, so the
higher rated items appear before the lower rated documents.
I am also boosting results on another field using bq:
I have been putting together an application using Quartz to run several
indexing jobs in sequence using SolrJ and Tomcat on Windows. I would like the
Quartz job to do the following:
1. Delete index directories from the cores so each indexing job starts
fresh with empty indexes to
The variance is likely simply due to the fact that your text field is
analyzed differently than the source fields you include in your dismax qf.
For example, some of them may be string fields with no analysis, so fewer
of those fields match your query terms when using dismax.
Look
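As an illustration of the point above (a hypothetical schema fragment, not from this thread): a string field is indexed as a single untokenized term, so only an exact match on the whole value succeeds, while an analyzed text field matches individual query terms:

```xml
<!-- hypothetical schema.xml fragment -->
<!-- "Science Fiction" here matches only the exact full value -->
<field name="category" type="string" indexed="true" stored="true"/>
<!-- tokenized and lowercased by its analyzer, so "science" alone matches -->
<field name="title" type="text" indexed="true" stored="true"/>
```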
Thanks Mark.
The reason I asked this is that I saw mentions of SolrCloud being resilient
to split brain because it uses ZooKeeper.
However, if my half brain understands what split brain is, then I think that's
not a completely true claim, because one can get unlucky and get a SolrCloud