why does a node switch state ?

2013-08-28 Thread sling
Hi, I have a SolrCloud with 8 JVMs, which has 4 shards (2 nodes for each shard). 1,000,000 docs are indexed per day, with 10 query requests per second and sometimes maybe 100 query requests per second. In each shard, one JVM has 8G RAM and the other has 5G. The JVM args are like this: -Xmx5

Re: Multiple replicas for specific shard

2013-08-28 Thread maephisto
Thanks Keith! But could this be done dynamically? Let's take the following example: a SolrCloud cluster with sport event results split into three shards by category - football shard, golf shard and baseball shard. Each of these shards has a replica on a machine. Then I realize that my football related

Re: Solr 4.2 Regular expression, returning only matched substring

2013-08-28 Thread jai2
Hi Erick, I appreciate your reply. facet.query will give the count of matches, not the count of unique pattern matches. If I give the regular expression [0-9]{3} to match a 3-digit number, it will return the total occurrences of three-digit numbers, but I want to know the occurrences of unique 3-digit numbers. Let's say i

Re: why does a node switch state ?

2013-08-28 Thread Daniel Collins
Do you see anything in the solr logs as to what the trigger for your nodes changing state was? You should see some kind of error/warning before the election is triggered. My gut feeling would be loss of communication between your leader and ZK (possibly by a GC event that locks the JVM for a whil

Re: Solr 4.2 Regular expression, returning only matched substring

2013-08-28 Thread Erick Erickson
Ah, OK. Nothing springs to mind. Even faceting on the individual values of the field counts _documents_ that match, but doesn't give you which particular values matched. I suppose that in that case you could run your regex over the returned labels for the facets. But that's a really ugly solution.
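As a rough illustration of the workaround suggested above (running the regex over the returned facet labels), here is a minimal client-side sketch in Python. It does not call Solr; the sample labels and counts are made up, and the helper name `unique_pattern_matches` is hypothetical. It assumes the facet values have already been fetched as a label-to-count mapping.

```python
import re
from collections import Counter


def unique_pattern_matches(facet_counts, pattern):
    """Sum facet counts per distinct substring that matches `pattern`.

    facet_counts: mapping of facet label -> document count (hypothetical input).
    """
    regex = re.compile(pattern)
    totals = Counter()
    for label, count in facet_counts.items():
        # A label may contain the same substring twice; count it once per label.
        for match in set(regex.findall(label)):
            totals[match] += count
    return dict(totals)


# Made-up facet labels and counts, for illustration only.
sample = {"room 101": 4, "room 202": 1, "gate 101": 2, "lobby": 7}
result = unique_pattern_matches(sample, r"[0-9]{3}")
```

Note that `[0-9]{3}` also matches the first three digits of a longer number; a pattern such as `\b[0-9]{3}\b` would restrict it to standalone three-digit numbers.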

Help to figure out why query does not match

2013-08-28 Thread heaven
Hi, please help me figure out what's going on. I have the next field type: And the next string indexed: http://plus.google.com/111950520904110959061/profile Here is what the analyzer shows: http://img607.imageshack.us/img607/5074/fn1.png Then I d

Re: How to patch Solr4.2 for SolrEnityProcessor Sub-Enity issue

2013-08-28 Thread Shalin Shekhar Mangar
This is fixed in trunk and branch_4x and will be available in the next release (4.5) See https://issues.apache.org/jira/browse/SOLR-5190 On Mon, Aug 26, 2013 at 12:37 PM, harshchawla wrote: > Thanks a lot in advance. I am eagerly waiting for your response. > > > > -- > View this message in conte

Re: How to patch Solr4.2 for SolrEnityProcessor Sub-Enity issue

2013-08-28 Thread harshchawla
Thanks a lot for this fix. I am now eagerly waiting for solr - 4.5 -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-patch-Solr4-2-for-SolrEnityProcessor-Sub-Enity-issue-tp4086292p4086973.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Multiple replicas for specific shard

2013-08-28 Thread Erick Erickson
http://wiki.apache.org/solr/SolrCloud#Creating_cores_via_CoreAdmin. Essentially you create a core on a new machine and assign it a collection and shard. It'll register itself, replicate the data from the leader and join the cluster automatically. You could script this too, but be aware that the r

Solr 4.0 -> Fuzzy query and Proximity query

2013-08-28 Thread Prasi S
Hi, with Solr 4.0 the fuzzy query syntax is like ~1 (or 2), and proximity search is like "value"~20. How does Solr differentiate between the two searches? My thought was that proximity would be on phrases and fuzzy on individual words. Is that correct? I wanted to do a proximity search for text field
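For reference, in the Lucene query syntax a tilde after a single term requests a fuzzy match (an edit distance of up to 2 in 4.0), while a tilde after a quoted phrase sets the phrase slop. The following minimal Python sketch only builds the two query strings; the helper names are hypothetical, the sample words are illustrative, and no escaping of special characters is attempted.

```python
def fuzzy_term(term, max_edits=1):
    """Build a fuzzy query on a single term, e.g. trinity~1."""
    if max_edits not in (0, 1, 2):
        raise ValueError("Lucene 4.x fuzzy queries accept an edit distance of 0, 1 or 2")
    return f"{term}~{max_edits}"


def proximity_phrase(phrase, slop):
    """Build a proximity (sloppy phrase) query, e.g. "trinity services"~20."""
    return f'"{phrase}"~{slop}'


fuzzy_q = fuzzy_term("trinity", 1)
proximity_q = proximity_phrase("trinity services", 20)
```

The quotes are what make the difference: without them the tilde applies to the single preceding term.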

Re: Help to figure out why query does not match

2013-08-28 Thread Erick Erickson
Hmmm, Certainly only the outputs of the last filter make it into the index. Consider stopwords being the last filter, you'd expect stopwords to be removed. There's nothing that I know of that'll do what you're asking, the code for ENGTF doesn't have any "preserve original" that I see. This seems l

Re: Solr 4.0 -> Fuzzy query and Proximity query

2013-08-28 Thread Erick Erickson
The first thing I'd recommend is to look at the admin/analysis page. I suspect you aren't seeing fuzzy query results at all, what you're seeing is the result of stemming. Stemming is algorithmic, so sometimes produces very surprising results, i.e. Trinidad and Trinigee may stem to something like t

Data Centre recovery/replication, does this seem plausible?

2013-08-28 Thread Daniel Collins
We have 2 separate data centers in our organisation, and in order to maintain the ZK quorum during any DC outage, we have 2 separate Solr clouds, one in each DC with separate ZK ensembles but both are fed with the same indexing data. Now in the event of a DC outage, all our Solr instances go down,

Re: Solr 4.0 -> Fuzzy query and Proximity query

2013-08-28 Thread Prasi S
hi Erick, Yes it is correct. These results are because of stemming + phonetic matching. Below is the Index time ST trinity services SF trinity services LCF trinity services SF trinity services SF trinity services WDF trinity services Query time SF triniti servic PF TRNTtriniti SRFKservic HWF TRN

Re: Solr 4.0 -> Fuzzy query and Proximity query

2013-08-28 Thread Prasi S
sry , i copied it wrong. Below is the correct analysis. Index time ST trinity services SF trinity services LCF trinity services SF trinity services SF trinity services WDF trinity services SF triniti servic PF TRNTtriniti SRFKservic HWF TRNTtriniti SRFKservic PSF TRNTtriniti SRFKservic *Query

Re: Solr 4.0 -> Fuzzy query and Proximity query

2013-08-28 Thread Erick Erickson
No, ComplexPhraseQuery has been around for quite a while but never incorporated into the code base, it's pretty much what you need to do both fuzzy and phrase at once. But, doesn't phonetic really incorporate at least a flavor of fuzzy? Is it close enough for your needs to just do phonetic matches

Re: Data Centre recovery/replication, does this seem plausible?

2013-08-28 Thread Erick Erickson
The separate DC problem has been lurking for a while. But your understanding is a little off. When a replica discovers that it's "too far" out of date, it does an old-style replication. IOW, the tlog doesn't contain the entire delta. Eventually, the old-style replications catch up to "close enough"

NPE during distributed search

2013-08-28 Thread Dmitry Kan
Solr 4.3.1; container: jetty 9 (jetty-distribution-9.0.4.v20130625); shard sizes: between 10G and 15G; two cores per shard, non-SolrCloud mode. We have a frontend Solr and several shards. When searching a smaller number of shards, the query runs OK. When asking for a larger number of shards, the query fa

how to sum a field grouping by more fields

2013-08-28 Thread hao.jin
Hello, can somebody tell me whether Solr 4.4.0 supports *stats.pivot* in order to sum a field grouped by several fields? Are there other methods to sum a field grouped by several fields? Thanks -- View this message in context: http://lucene.472066.n3.nabble.com/how-to-sum-a-field-grouping-by-more-

Group/distinct

2013-08-28 Thread Per Steffensen
Hi I have a set of collections containing documents with the fields: "a", "b" and "timestamp" A LOT of documents and a lot of them have same values for "a", and for each value of "a" there is only a very limited set of distinct values in the "b"'s. The "timestamp"-values are different for (alm

Re: Data Centre recovery/replication, does this seem plausible?

2013-08-28 Thread Timothy Potter
I've been thinking about this one too and was curious about using the Solr Entity support in the DIH to do the import from one DC to another (for the lost docs). In my mind, one configures the DIH to use the SolrEntityProcessor with a query to capture the docs in the DC that stayed online, most lik

Re: Data Centre recovery/replication, does this seem plausible?

2013-08-28 Thread Erick Erickson
If you can satisfy this statement then it seems possible. This is the same restriction as for "atomic updates": The SolrEntityProcessor can only copy fields that are stored in the source index. On Wed, Aug 28, 2013 at 9:41 AM, Timothy Potter wrote: > I've been thinking about this one too and was cu

RE: Data Centre recovery/replication, does this seem plausible?

2013-08-28 Thread Markus Jelsma
Hi - You're going to miss unstored but indexed fields. We stop any indexing process, kill the servlets on the down DC and copy over the files using scp, then remove the lock file and start it up again. It always works; it's a manual process at this point but should be easy to automate using som

RE: SOLR 4.2.1 - High Resident Memory Usage

2013-08-28 Thread Markus Jelsma
Hi - it's certainly not a rule of thumb but usually RES always grows higher than Xmx so keep an eye on it. -Original message- > From:vsilgalis > Sent: Wednesday 28th August 2013 2:53 > To: solr-user@lucene.apache.org > Subject: Re: SOLR 4.2.1 - High Resident Memory Usage > >

Re: Multiple replicas for specific shard

2013-08-28 Thread maephisto
Thanks Erick, I think this answers my question. -- View this message in context: http://lucene.472066.n3.nabble.com/Multiple-replicas-for-specific-shard-tp4086828p4087019.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: Data Centre recovery/replication, does this seem plausible?

2013-08-28 Thread Shawn Heisey
On 8/28/2013 6:13 AM, Daniel Collins wrote: > We have 2 separate data centers in our organisation, and in order to > maintain the ZK quorum during any DC outage, we have 2 separate Solr > clouds, one in each DC with separate ZK ensembles but both are fed with the > same indexing data. > > Now in t

Re: Solr 4.0 -> Fuzzy query and Proximity query

2013-08-28 Thread Walter Underwood
Mixing fuzzy with phonetic can give bizarre matches. I worked on a search engine that did that. You really don't want to mix stemming, phonetic, and fuzzy. They are distinct transformations of the surface word that do different things. Stemming: conflate different inflections of the same word,

Re: ICUTokenizer class not found with Solr 4.4

2013-08-28 Thread Tom Burton-West
Thanks Shawn and Naomi, I think I am running into the same bug, but the symptoms are a bit different. I'm wondering if it makes sense to file a separate linked bug report. >>The workaround is to remove sharedLib from solr.xml, The solr.xml that comes out-of-the-box does not have a sharedLib.

Re: ICUTokenizer class not found with Solr 4.4

2013-08-28 Thread Tom Burton-West
My point in the previous e-mail was that following the instructions in the documentation does not seem to work. The workaround I found was to simply change the name of the collection1/lib directory to collection1/foobar and then include it in solrconfig.xml. This works, but does not

Re: ICUTokenizer class not found with Solr 4.4

2013-08-28 Thread Shawn Heisey
On 8/28/2013 9:34 AM, Tom Burton-West wrote: I think I am running into the same bug, but the symptoms are a bit different. I'm wondering if it makes sense to file a separate linked bug report. The workaround is to remove sharedLib from solr.xml, The solr.xml that comes out-of-the-box does not

Re: Data Centre recovery/replication, does this seem plausible?

2013-08-28 Thread Daniel Collins
Thanks Shawn/Erick for the suggestions. Unfortunately stopping indexing whilst we recover isn't a viable option, we are using Solr as an NRT search platform, so indexing must continue at least on the DC that is fine. If we could stop indexing on the "broken" DC, then recovery is relatively straigh

Question about SOLR-5017 - Allow sharding based on the value of a field

2013-08-28 Thread adfel70
Hi, I'm looking into allowing query joins in SolrCloud. This has the limitation of having to index all the documents that are joinable together to the same shard. I'm wondering if SOLR-5017 would give me the ability to do so without implementing

Re: Data Centre recovery/replication, does this seem plausible?

2013-08-28 Thread Shawn Heisey
On 8/28/2013 10:48 AM, Daniel Collins wrote: What ideally I would like to do is at the point that I kick off recovery, divert the indexing feed for the "broken" into a transaction log on those machines, run the replication and swap the index in, then replay the transaction log to bring it all up

Re: Question about SOLR-5017 - Allow sharding based on the value of a field

2013-08-28 Thread Greg Preston
I don't know about SOLR-5017, but why don't you want to use parent_id as a shard key? So if you've got a doc with a key of "abc123" and a parent_id of 456, just use a key of "456!abc123" and all docs with the same parent_id will go to the same shard. We're doing something similar and limiting que
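The composite key described above could be built as in the following Python sketch. It assumes the collection uses Solr's compositeId routing, where the part before `!` decides the shard; the function name `composite_id` is hypothetical, and the sample values are the ones from the message.

```python
def composite_id(shard_key, doc_key):
    """Prefix the document key with the shard key, separated by '!'."""
    shard_key = str(shard_key)
    if "!" in shard_key:
        # '!' is the separator, so it cannot appear inside the shard key itself.
        raise ValueError("shard key must not contain '!'")
    return f"{shard_key}!{doc_key}"


# Values taken from the message: key "abc123", parent_id 456.
doc_id = composite_id(456, "abc123")
```

Every document built with the same `shard_key` shares the same prefix, which is what lets the router place them on the same shard.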

SolrCloud Set up

2013-08-28 Thread Jared Griffith
What is the recommended way to set up Solr so it's HA and fault tolerant? I'm assuming it would be the SolrCloud set up. I'm guessing that Example C (http://wiki.apache.org/solr/SolrCloud) would be the optimum set up. If so, would one set up a load balancer (like f5 or whatever) to direct reque

Re: Storing query results

2013-08-28 Thread Dan Davis
You could copy the existing core to a new core every once in a while, and then do your delta indexing into a new core once the copy is complete. If a Persistent URL for the search results included the name of the original core, the results you would get from a bookmark would be stable. However, if

Re: How to Manage RAM Usage at Heavy Indexing

2013-08-28 Thread Dan Davis
This could be an operating systems problem rather than a Solr problem. CentOS 6.4 (linux kernel 2.6.32) may have some issues with page flushing and I would read up on that. The VM parameters can be tuned in /etc/sysctl.conf On Sun, Aug 25, 2013 at 4:23 PM, Furkan KAMACI wrote: > Hi Erick; > >

Re: More on topic of Meta-search/Federated Search with Solr

2013-08-28 Thread Dan Davis
On Tue, Aug 27, 2013 at 2:03 AM, Paul Libbrecht wrote: > Dan, > > if you're bound to federated search then I would say that you need to work > on the service guarantees of each of the nodes and, maybe, create > strategies to cope with bad nodes. > > paul > +1 I'll think on that.

Re: More on topic of Meta-search/Federated Search with Solr

2013-08-28 Thread Dan Davis
On Tue, Aug 27, 2013 at 3:33 AM, Bernd Fehling < bernd.fehl...@uni-bielefeld.de> wrote: > Years ago when "Federated Search" was a buzzword we did some development > and > testing with Lucene, FAST Search, Google and several other Search Engines > according Federated Search in Library context. > Th

Re: More on topic of Meta-search/Federated Search with Solr

2013-08-28 Thread Dan Davis
On Mon, Aug 26, 2013 at 9:06 PM, Amit Jha wrote: > Would you like to create something like > http://knimbus.com > I work at the National Library of Medicine. We are moving our library catalog to a newer platform, and we will probably include articles. The article's content and meta-data are

Re: SolrCloud Set up

2013-08-28 Thread Shawn Heisey
On 8/28/2013 11:56 AM, Jared Griffith wrote: What is the recommended way to set up Solr so it's HA and fault tolerant? I'm assuming it would be the SolrCloud set up. I'm guessing that Example C (http://wiki.apache.org/solr/SolrCloud) would be the optimum set up. If so, would one set up a load

Re: Different Responses for 4.4 and 3.5 solr index

2013-08-28 Thread Michael Sokolov
We've been seeing changes in our rankings as well. I don't have a definite answer yet, since we're waiting on an index rebuild, but our current working theory is that the change to default omitNorms="true" for primitive types may have had an effect, possibly due to follow on confusion: our dev

Re: SolrCloud Set up

2013-08-28 Thread Jared Griffith
We are using Java here. Are you saying that the Solr java client would be aware of the multiple zookeepers and would thus do health / host checks on each zookeeper instance in turn until it got one that is working (assuming that you have one or more zookeepers down)? If that's the case, holy aweso

Re: SolrCloud Set up

2013-08-28 Thread Shawn Heisey
On 8/28/2013 1:36 PM, Jared Griffith wrote: We are using Java here. Are you saying that the Solr java client would be aware of the multiple zookeepers and would thus do health / host checks on each zookeeper instance in turn until it got one that is working (assuming that you have one or more zo

purge and optimize questions for solr 4.4.0

2013-08-28 Thread Joshi, Shital
We have SolrCloud cluster (5 shards and 2 replicas) on 10 boxes with 500 million documents. We're using custom sharding where we direct all documents with specific business date to specific shard. With Solr 3.6 we used this command to optimize documents on master and then let replication take

coordination factor in between query terms

2013-08-28 Thread Anirudha Jadhav
How can I specify a coordination factor between query terms? E.g. q="termA termB" doc1 = {field: termA} doc2 = {field: termA termB termC termD}. I want doc2 scored higher than doc1. -- Anirudha P. Jadhav

RE: coordination factor in between query terms

2013-08-28 Thread Greg Walters
Just boost the term you want to show up higher in your results. http://wiki.apache.org/solr/SolrRelevancyCookbook#Boosting_Ranking_Terms - Greg -Original Message- From: anirudh...@gmail.com [mailto:anirudh...@gmail.com] On Behalf Of Anirudha Jadhav Sent: Wednesday, August 28, 2013 3:36

Re: coordination factor in between query terms

2013-08-28 Thread Anirudha Jadhav
I don't know what term to boost. I just need the documents with both terms to be ranked higher. But since doc1 is smaller and has an exact match on the term, it is ranked higher as per tf-idf. On Wed, Aug 28, 2013 at 4:47 PM, Greg Walters wrote: > Just boost the term you want to show up higher

Re: coordination factor in between query terms

2013-08-28 Thread Chris Hostetter
1) Coordination factor is controlled by the Similarity you have configured -- there is no request time option to affect the coordination function. The default Similarity already includes a simple ratio coord factor... https://lucene.apache.org/core/4_4_0/core/org/apache/lucene/search/similariti
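To make the "simple ratio" concrete: Lucene's DefaultSimilarity documents coord as the number of query terms that match the document divided by the total number of query terms. A small Python sketch using the example from this thread follows. The helper name is hypothetical, and the real score also includes tf-idf and length norms, which are not modeled here.

```python
def coord(query_terms, doc_terms):
    """Fraction of distinct query terms that occur in the document."""
    query = set(query_terms)
    overlap = len(query & set(doc_terms))
    return overlap / len(query)


# Example from the thread: q=termA termB
query = ["termA", "termB"]
doc1 = ["termA"]
doc2 = ["termA", "termB", "termC", "termD"]

coord_doc1 = coord(query, doc1)
coord_doc2 = coord(query, doc2)
```

Coord alone favors doc2 here; doc1 can still come out ahead overall because a shorter field gets a larger length norm.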

Re: ICUTokenizer class not found with Solr 4.4

2013-08-28 Thread Tom Burton-West
Hi Shawn, I'm going to add this to your JIRA unless you think that it would be good to open another issue. The issue for me is that making a ./lib in the instanceDir is documented as working in several places and has worked in previous versions of Solr, for example solr 4.1.0. I make a ./lib

Re: coordination factor in between query terms

2013-08-28 Thread Anirudha Jadhav
My bad, typo there: q=termA termB. I know omitNorms is an index-time field option; can it be applied to the query also? Are there other solutions to this kind of problem? Curious. On Wed, Aug 28, 2013 at 4:52 PM, Chris Hostetter wrote: > > 1) Coordination factor is controlled by the Similarity you

Re: ICUTokenizer class not found with Solr 4.4

2013-08-28 Thread Shawn Heisey
On 8/28/2013 2:59 PM, Tom Burton-West wrote: Do you think I should open another JIRA and link it to yours or just add this information (i.e. other scenarios where class loading not working) to your JIRA? The documentation does sound confused. My personal opinion (which may not be what ends up

What does it mean when a shard is down in solr4.4?

2013-08-28 Thread Utkarsh Sengar
I have a 3 node solrcloud cluster with 3 shards for each collection/core. At times when I rebuild the index say on collectionA on nodeA (shard1) via UpdateCSV, the "Cloud" status page says that collectionA on nodeA (shard1) is down. Observations: 1. Other collections on nodeA work. 2. collectionA

Re: Solr show total row count in response of full import

2013-08-28 Thread Chris Hostetter
: It would be nice if you could receive a total row count like : : 10100 : : With this information we could add another information like : : 62.91 : : This would make it easier to generate a progress bar for the end user. I don't think that's possible -- DIH has no way of knowing in advance t

Re: Filter cache pollution during sharded edismax queries

2013-08-28 Thread Chris Hostetter
Ken ... I'm not really sure I'm understanding what you're trying to describe. Can you give the full details of a concrete example of what you are seeing? * full requestHandler config * example of query issued by client * every request logged on each shard * contents of filterCache and queryRes

Re: purge and optimize questions for solr 4.4.0

2013-08-28 Thread Chris Hostetter
: We have SolrCloud cluster (5 shards and 2 replicas) on 10 boxes with 500 : million documents. We're using custom sharding where we direct all : documents with specific business date to specific shard. ... : How do we optimize documents for all shards in Solr Cloud? Do we have to : fi

Feedback requested on design/implementation/extent of a proposed Solr configuration REST API

2013-08-28 Thread Steve Rowe
For mailing list participants on solr-user who aren't subscribed to the dev list: I've created a JIRA issue to discuss adding a Solr configuration REST API: . I'm interested in feedback of any kind on this proposal, preferably on the above-linked

Re: why does a node switch state ?

2013-08-28 Thread sling
Hi Daniel, thank you very much for your reply. However, my zkTimeout in solr.xml is 30s. ... -- View this message in context: http://lucene.472066.n3.nabble.com/why-does-a-node-switch-state-tp4086939p4087142.html Sent from the Solr - User mailing list archive at Nabble.com.

Re: why does a node switch state ?

2013-08-28 Thread veena rani
Kindly stop me from solr mail chain. Thanks and regards, Veena On Wed, Aug 28, 2013 at 12:55 PM, sling wrote: > hi, > I have a solrcloud with 8 jvm, which has 4 shards(2 nodes for each shard). > 1000 000 docs are indexed per day, and 10 query requests per second, and > sometimes, maybe there

RE: SOLR 4.2.1 - High Resident Memory Usage

2013-08-28 Thread vsilgalis
So we actually had 3 of the 6 machines automatically restart the SOLR service as memory pressure was too high; 2 were by SIGABRT and one was java OOMkiller. I dropped a pmap on one of the solr services before it died. Basically I need to figure out what the other direct memory references are outside