TermsComponent/SolrCloud

2012-11-22 Thread Federico Méndez
Does anyone know if the TermsComponent supports distributed search through a SolrCloud installation? I have a SolrCloud installation that works OK for regular searches, but TermsComponent is returning empty results when using: [collectionName]/terms?terms.fl=collector_name&terms.prefix=jo, the request

Re: SolrCloud and external Zookeeper ensemble

2012-11-22 Thread Luis Cappa Banda
Hello, I've been dealing with the same question these days. In architecture terms, it's always better to separate services (Solr and Zookeeper, in this case) rather than keep them in a single instance. However, when we have to deal with cost issues, all of us are quite limited and we must

Re: SolrCloud and external Zookeeper ensemble

2012-11-22 Thread Marcin Rzewucki
Yes, this is exactly my case. I prefer the 3rd option too. As I have 2 more instances to be used for my purposes (SolrCloud4x + 2 more instances for loading), it will be easier to configure a zookeeper ensemble (as I can use those 2 additional machines + 1 from SolrCloud) and avoid more instances to be

Re: TermsComponent/SolrCloud

2012-11-22 Thread Tomás Fernández Löbbe
Hi Federico, it should work. Make sure you set the shards.qt parameter too (in your case, it should be shards.qt=/terms). On Thu, Nov 22, 2012 at 6:51 AM, Federico Méndez federic...@gmail.com wrote: Does anyone know if the TermsComponent supports distributed search through a SolrCloud installation?
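
To illustrate the advice in this reply, here is a minimal Python sketch that builds a /terms request carrying shards.qt. The base URL `http://localhost:8983/solr` and the collection name `collection1` are assumptions made for the example; the field and prefix are the ones from the original question.

```python
from urllib.parse import urlencode


def terms_url(base_url, collection, field, prefix):
    """Build a /terms request whose per-shard sub-requests also go to /terms."""
    params = {
        "terms.fl": field,
        "terms.prefix": prefix,
        # Without shards.qt the sub-requests may be sent to the default
        # handler, which does not run the TermsComponent.
        "shards.qt": "/terms",
    }
    return f"{base_url}/{collection}/terms?{urlencode(params)}"


# Hypothetical host and collection name, used only for illustration.
url = terms_url("http://localhost:8983/solr", "collection1", "collector_name", "jo")
print(url)
```

Setting shards.qt in the handler defaults, as the follow-up message in this thread does, avoids having to pass it on every request.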

Re: How to use eDismax query parser on a non tokenized field

2012-11-22 Thread Tomás Fernández Löbbe
You can either escape the whitespace with \ or search as a phrase. fieldNonTokenized:foo\ bar ...or... fieldNonTokenized:"foo bar" On Thu, Nov 22, 2012 at 9:08 AM, Varun Thacker varunthacker1...@gmail.com wrote: I have indexed documents using a fieldType which does not break the word up. I
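
The two options in this reply (backslash-escaping the space, or quoting the value as a phrase) can be sketched in Python as follows. The helper names are hypothetical, and only whitespace, backslashes and double quotes are handled; other query-syntax characters would need their own escaping.

```python
def escape_whitespace(value):
    """Backslash-escape spaces so the query parser keeps the value as one term."""
    return value.replace(" ", "\\ ")


def as_phrase(value):
    """Wrap the value in double quotes, escaping embedded backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


field = "fieldNonTokenized"
q_escaped = f"{field}:{escape_whitespace('foo bar')}"
q_phrase = f"{field}:{as_phrase('foo bar')}"
print(q_escaped)
print(q_phrase)
```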

Re: TermsComponent/SolrCloud

2012-11-22 Thread Federico Méndez
Thanks Tomas, your suggestion worked!! <requestHandler name="/terms" class="solr.SearchHandler" startup="lazy"> <lst name="defaults"> <bool name="terms">true</bool> <bool name="distrib">true</bool> <str name="shards.qt">/terms</str> </lst> <arr name="components"> <str>terms</str> </arr>

Re: Suggester for numbers

2012-11-22 Thread Gustav
Hello Illu, Here you go: <field name='autocomplete' type='text_auto' indexed='true' stored='true' multiValued='true'/> <fieldType class="solr.TextField" name="text_auto"> <analyzer> <tokenizer class="solr.KeywordTokenizerFactory"/> <filter

Re: SolrCloud and external file fields

2012-11-22 Thread Martin Koch
Mikhail, to avoid freezes we deployed the patches that are now on the 4.1 trunk (bug 3985). But this wasn't good enough, because Solr would still take very long to restart when that was necessary. I don't see how we could throw more hardware at the problem without making it worse, really - the

Performance improvement for solr faceting on large index

2012-11-22 Thread Pravin Agrawal
Hi All, We are using Solr 3.4 with the following schema fields. schema.xml: <fieldType name="autosuggest_text" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index">

Re: Performance improvement for solr faceting on large index

2012-11-22 Thread Yuval Dotan
You could always try the fc facet method and maybe increase the filterCache size. On Thu, Nov 22, 2012 at 2:53 PM, Pravin Agrawal pravin_agra...@persistent.co.in wrote: Hi All, We are using solr 3.4 with following schema fields.
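
As a rough illustration of the first suggestion, the sketch below builds a facet request that asks for the fc method through the facet.method parameter. The base URL and the facet field name `category` are assumptions; the thread does not show the actual field. The filterCache size is a solrconfig.xml setting and is not covered here.

```python
from urllib.parse import urlencode


def facet_url(base_url, facet_field, method="fc"):
    """Build a facet-only request (rows=0) that selects the facet method."""
    params = [
        ("q", "*:*"),
        ("rows", "0"),  # only the facet counts are wanted, not documents
        ("facet", "true"),
        ("facet.field", facet_field),
        ("facet.method", method),
    ]
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical host and field name, used only for illustration.
url = facet_url("http://localhost:8983/solr", "category")
print(url)
```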

Re: SolrCloud and external Zookeeper ensemble

2012-11-22 Thread Jack Krupansky
That's a tradeoff for you to make based on your own requirements, but the point is that it is LESS SAFE to run zookeeper on the same machine as a Solr instance. Also keep in mind that the goal is to have at least THREE zookeeper instances running at any moment, so if you run zookeeper on the
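
The reason three instances is the usual minimum can be shown with a little arithmetic: a ZooKeeper ensemble stays available only while a strict majority of its members is up. The Python sketch below computes that majority and the number of failures an ensemble can absorb; the function names are made up for the example.

```python
def quorum_size(ensemble_size):
    """Smallest strict majority of the ensemble."""
    return ensemble_size // 2 + 1


def tolerated_failures(ensemble_size):
    """How many members can be lost while a majority is still running."""
    return ensemble_size - quorum_size(ensemble_size)


for n in (1, 2, 3, 5):
    print(f"ensemble={n} quorum={quorum_size(n)} tolerated_failures={tolerated_failures(n)}")
```

An ensemble of two tolerates no failures at all, which is why going from one to two instances adds no safety, while three instances survive the loss of one.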

Re: From Solr3.1 to SolrCloud

2012-11-22 Thread roySolr
I run a separate Zookeeper instance right now. Works great, nodes are visible in admin. Two more questions: - I change my synonyms.txt on a Solr node. How can I get Zookeeper and the other Solr nodes in sync without a restart? - I read something more about zookeeper ensemble. When i need to run

Re: Partial results with not enough hits

2012-11-22 Thread Otis Gospodnetic
Hi, Maybe your goal should be to make your queries faster instead of fighting with timeouts which are known not to work well. What is your hardware like? How about your queries? What do you see in debugQuery=true output? Otis -- SOLR Performance Monitoring - http://sematext.com/spm On Nov 21,

Re: SolrCloud and external Zookeeper ensemble

2012-11-22 Thread Otis Gospodnetic
If your Solr instances don't max out your ec2 instances you should be fine. But maybe even micro instances will suffice. Or 1 on demand and 2 spot ones. If cost is the concern, that is. Otis -- SOLR Performance Monitoring - http://sematext.com/spm On Nov 21, 2012 5:07 PM, Marcin Rzewucki

Re: SolrCloud and external file fields

2012-11-22 Thread Yonik Seeley
On Tue, Nov 20, 2012 at 4:16 AM, Martin Koch m...@issuu.com wrote: around 7M documents in the index; each document has a 45 character ID. 7M documents isn't that large. Is there a reason why you need so many shards (16 in your case) on a single box? -Yonik http://lucidworks.com

Re: SolrCloud and external Zookeeper ensemble

2012-11-22 Thread Jack Krupansky
That is an interesting point - what size of instance is needed for a zookeeper. Can it run well in a micro? Another issue I wanted to raise is that maybe questions, advice, and guidelines should be relative to the shirt size of your cluster - small, medium, or large. SolrCloud is clearly more

Re: SolrCloud and external Zookeeper ensemble

2012-11-22 Thread Shawn Heisey
On 11/22/2012 2:18 AM, Luis Cappa Banda wrote: I've been dealing with the same question these days. In architecture terms, it's always better to separate services (Solr and Zookeeper, in this case) rather than keep them in a single instance. However, when we have to deal with cost issues, all of

Re: How to get a list of servers per collection in SolrCloud using java api?

2012-11-22 Thread Luis Cappa Banda
Hello, Joe. Try something like this using SolrJ library:
String endpoints[] = // your Solr server endpoints. Example: http://localhost:8080/solr/core1
String zookeeperEndpoints = // your Zookeeper endpoints. Example: localhost:9000
String collectionName = // Your collection name. Example: core1

Re: From Solr3.1 to SolrCloud

2012-11-22 Thread Tomás Fernández Löbbe
- I change my synonyms.txt on a solr node. How can I get zookeeper in sync and the other solr nodes without restart? Well, you can upload the whole collection configuration again with zkClient (included in the cloud.scripts section). see

Re: Solr Cloud Zookeeper Namespace

2012-11-22 Thread Tomás Fernández Löbbe
You could use Zookeeper's chroot: http://zookeeper.apache.org/doc/r3.2.2/zookeeperAdmin.html#sc_bestPractices You can use chroot in Solr by specifying it in the zkHost parameter, for example -DzkHost=localhost:2181/namespace1 In order for this to work, you need to first create the initial path
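
A small Python sketch of how such a zkHost value is put together: the chroot path is appended once, after the last host in the comma-separated list. The helper name and the `zk1`/`zk2`/`zk3` host names are hypothetical.

```python
def zk_host(servers, chroot=""):
    """Join ZooKeeper host:port pairs and append an optional chroot path once."""
    hosts = ",".join(servers)
    if chroot and not chroot.startswith("/"):
        chroot = "/" + chroot
    return hosts + chroot.rstrip("/")


single = zk_host(["localhost:2181"], "namespace1")
ensemble = zk_host(["zk1:2181", "zk2:2181", "zk3:2181"], "/namespace1")
print(f"-DzkHost={single}")
print(f"-DzkHost={ensemble}")
```

As the reply notes, the chroot path has to exist in ZooKeeper before Solr is started against it.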

Re: How to get a list of servers per collection in SolrCloud using java api?

2012-11-22 Thread Luis Cappa Banda
Hello, As far as I know, you cannot do that at the moment, :-/ Regards, - Luis Cappa. 2012/11/22 joe.cohe...@gmail.com joe.cohe...@gmail.com Thanks Rakudten. I had my question mis-phrased. What I need is being able to get the solr servers storing a collection by giving the zookeeper

Re: is there a way to prevent abusing rows parameter

2012-11-22 Thread solr-user
Thanks guys. This is a problem with the front end not validating requests. I was hoping there might be a simple config value I could enter/change, rather than going the long process of migrating a proper fix all the way up to our production servers. Looks like not, but thx. -- View this

Re: Partial results with not enough hits

2012-11-22 Thread Aleksey Vorona
Thank you! That seems to be the case, I tried to execute queries without sorting and only one document in the response and I got execution time in the same range as before. -- Aleksey On 12-11-21 04:07 PM, Jack Krupansky wrote: It could be that the time to get set up to return even the

Re: Partial results with not enough hits

2012-11-22 Thread Aleksey Vorona
Thanks for the response. I have increased the timeout and it did not increase execution time or system load. It is really that I misused the timeout. Just to give you a bit of perspective, we added timeout to guarantee some level of QoS from the search engine. Our UI allows user to

SolrCloud: Very strange behavior when doing atomic updates or documents reindexation.

2012-11-22 Thread Luis Cappa Banda
Hello everyone. I've started to seriously worry about SolrCloud due to a strange behavior that I have detected. The situation is the following: *1.* SolrCloud with one shard and two Solr instances. *2.* Indexation via SolrJ with CloudServer and a custom BinaryLBHttpSolrServer that uses

Re: How to get a list of servers per collection in SolrCloud using java api?

2012-11-22 Thread Sami Siren
On Thu, Nov 22, 2012 at 7:20 PM, joe.cohe...@gmail.com joe.cohe...@gmail.com wrote: Thanks Rakudten. I had my question mis-phrased. What I need is being able to get the solr servers storing a collection by giving the zookeeper server as an input. something like: // returns a list of solr

Reloading config to zookeeper

2012-11-22 Thread Cool Techi
When we make changes to our config files, how do we reload the files into zookeeper? Also, I understand that we would need to reload the collection; would we need to do this at a per-shard level or just at the cloud level? Regards, Ayush

Re: SolrCloud: Very strange behavior when doing atomic updates or documents reindexation.

2012-11-22 Thread Sami Siren
I think the problem is that even though you were able to work around the bug in the client, Solr still uses the xml format internally, so the atomic update (with multivalued field) fails later down the stack. The bug you filed needs to be fixed to get the problem solved. On Thu, Nov 22, 2012 at

Re: SolrCloud: Very strange behavior when doing atomic updates or documents reindexation.

2012-11-22 Thread Luis Cappa Banda
Hi, Sami! But isn't it strange that some documents were updated (atomic updates) correctly and other ones not? Can't it be a more serious problem like some kind of index writer lock, or whatever? Regards, - Luis Cappa. 2012/11/22 Sami Siren ssi...@gmail.com I think the problem is that even

Re: Reloading config to zookeeper

2012-11-22 Thread Marcin Rzewucki
Hi, I'm using cloud-scripts/zkcli.sh script for reloading configuration, for example: $ ./cloud-scripts/zkcli.sh -cmd upconfig -confdir config.dir -solrhome solr.home -confname config.name -z zookeeper.host Then I'm reloading collection on each node in cloud, but maybe someone knows better
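
For scripting the upload step, the command can be assembled as an argument list, as in the Python sketch below. It uses only flags that appear in this thread (-cmd upconfig, -zkhost, -confdir, -confname); the directory, config name and ZooKeeper address are placeholder assumptions, and the command is printed rather than executed.

```python
import shlex


def upconfig_command(zk_host, conf_dir, conf_name, script="./cloud-scripts/zkcli.sh"):
    """Assemble the zkcli upconfig invocation as an argv list."""
    return [
        script,
        "-cmd", "upconfig",
        "-zkhost", zk_host,
        "-confdir", conf_dir,
        "-confname", conf_name,
    ]


# Placeholder values, used only for illustration.
cmd = upconfig_command("localhost:2181", "./solr/collection1/conf", "myconf")
print(" ".join(shlex.quote(part) for part in cmd))
# To actually run it: subprocess.run(cmd, check=True)
```

The collection still has to be reloaded afterwards for the new configuration to take effect.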

RE: Reloading config to zookeeper

2012-11-22 Thread Cool Techi
Thanks, but why do we need to specify the -solrhome? I am using the following command to load new config, java -classpath .:/Users/solr-cli-lib/* org.apache.solr.cloud.ZkCLI -cmd upconfig -zkhost localhost:2181,localhost:2182,localhost:2183,localhost:2184,localhost:2185 -confdir

Re: Reloading config to zookeeper

2012-11-22 Thread Marcin Rzewucki
I think solrhome is not mandatory. Yes, reloading is uploading config dir again. It's a pity we can't update just modified files. Regards. On 22 November 2012 19:38, Cool Techi cooltec...@outlook.com wrote: Thanks, but why do we need to specify the -solrhome? I am using the following command

Re: Reload core via CoreAdminRequest doesn't work with solr cloud? (solrj)

2012-11-22 Thread Tomás Fernández Löbbe
If you need to reload all the cores from a given collection you can use the Collections API: http://localhost:8983/solr/admin/collections?action=RELOAD&name=mycollection On Thu, Nov 22, 2012 at 3:17 PM, joe.cohe...@gmail.com joe.cohe...@gmail.com wrote: Hi, I'm using solr-4.0.0 I'm trying to
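
The reload call takes `action` and `name` as two separate query parameters. A minimal Python sketch that builds the URL is shown here; the helper name is hypothetical and the host is the one from the example in the reply.

```python
from urllib.parse import urlencode


def reload_collection_url(base_url, collection):
    """Build the Collections API URL that reloads every core of a collection."""
    query = urlencode({"action": "RELOAD", "name": collection})
    return f"{base_url}/admin/collections?{query}"


url = reload_collection_url("http://localhost:8983/solr", "mycollection")
print(url)
# To send it: urllib.request.urlopen(url).read()
```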

Re: SolrCloud: Very strange behavior when doing atomic updates or documents reindexation.

2012-11-22 Thread Sami Siren
It might even depend on the cluster layout! Let's say you have 2 shards (no replicas) if the doc belongs to the node you send it to so that it does not get forwarded to another node then the update should work and in case where the doc gets forwarded to another node the problem occurs. With

upgrading from 4.0 to 4.1 causes CorruptIndexException: checksum mismatch in segments file

2012-11-22 Thread solr-user
Hi all, I have been working on moving us from 4.0 to a newer build of 4.1. I am seeing a CorruptIndexException: checksum mismatch in segments file error when I try to use the existing index files. I did see something in the build log for #119 re LUCENE-4446 that mentions flip file formats to

Re: Error: _version_field must exist in schema

2012-11-22 Thread Nick Zadrozny
On Wed, Oct 17, 2012 at 3:20 PM, Dotan Cohen dotanco...@gmail.com wrote: I do have a Solr 4 Beta index running on Websolr that does not have such a field. It works, but throws many Service Unavailable and Communication Error errors. Might the lack of the _version_ field be the reason?

Re: SolrCloud: Very strange behavior when doing atomic updates or documents reindexation.

2012-11-22 Thread Luis Cappa Banda
Hello! I'm using a simple test configuration with nShards=1 without any replica. SolrCloudServer is supposed to forward properly those index/update operations, isn't it? I test with a complete document reindexation, not atomic updates, using the official LBHttpSolrServer, not my custom

Re: SolrCloud: Very strange behavior when doing atomic updates or documents reindexation.

2012-11-22 Thread Luis Cappa Banda
For more details, my indexation App is: 1. Multithreaded. 2. NRT indexation. 3. It's a Web App with a REST API. It receives asynchronous requests that produce those atomic updates / document reindexations I mentioned before. I'm pretty sure that the wrong behavior is related to CloudSolrServer and

Re: SolrCloud: Very strange behavior when doing atomic updates or documents reindexation.

2012-11-22 Thread Luis Cappa Banda
More info: - I'm trying to update the document by re-indexing the whole document again. I first retrieve the document querying by its id, then delete it by its id, and re-index including the new changes. - At the same time there are other index writing operations. *RESULT*: in most cases the

Re: upgrading from 4.0 to 4.1 causes CorruptIndexException: checksum mismatch in segments file

2012-11-22 Thread Jack Krupansky
Moving from the final release of 4.0 to 4.1 should be fine, but you appear to be using a snapshot of 4.0 that is even older than the ALPHA release of 4.0 and a number of format changes occurred last Spring. So, yeah, you will have to re-index. -- Jack Krupansky -Original Message-

Re: Find the matched field in each matched document

2012-11-22 Thread Jack Krupansky
No, not directly, but indirectly you can - add debugQuery=true to your request and the explain section will detail which terms matched in which fields. You could probably also implement a custom search component which annotated each document with the matched field names. In that sense, Solr
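
One way to use the explain section is to pull the field names out of it on the client side. The Python sketch below assumes that the per-document explain text contains entries of the form `weight(field:term in N)` and that the explanations are available as a mapping from document id to text; the sample data is invented for illustration and is not real Solr output.

```python
import re

# Assumed pattern: each scoring clause appears as "weight(<field>:<term> in <doc>)".
WEIGHT_RE = re.compile(r"weight\(([^:\s()]+):")


def matched_fields(explain_by_doc):
    """Map each document id to the sorted field names found in its explain text."""
    return {
        doc_id: sorted(set(WEIGHT_RE.findall(text)))
        for doc_id, text in explain_by_doc.items()
    }


# Hypothetical explain output for two documents.
sample = {
    "doc1": "0.8 = (MATCH) sum of:\n"
            "  0.5 = (MATCH) weight(title:solr in 0), product of: ...\n"
            "  0.3 = (MATCH) weight(body:solr in 0), product of: ...",
    "doc2": "0.3 = (MATCH) weight(body:solr in 4), product of: ...",
}
fields = matched_fields(sample)
print(fields)
```

Parsing debug text is fragile and adds query-time cost, so the custom search component mentioned in the reply is likely the sturdier route for production use.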

Re: Find the matched field in each matched document

2012-11-22 Thread Alireza Salimi
Hi Jack, Thanks for the reply. I'm not sure about debug components, I thought it slows down query time. Can you explain more about the custom search component? Thanks On Thu, Nov 22, 2012 at 7:02 PM, Jack Krupansky j...@basetechnology.com wrote: No, not directly, but indirectly you can - add

Re: Performance improvement for solr faceting on large index

2012-11-22 Thread Otis Gospodnetic
Hi, I don't quite follow what you are trying to do, but it almost sounds like you may be better off using something other than Solr if all you are doing is filtering by site and counting something. I see unigrams in what looks like it could be a big field and that's a red flag. Your index

Re: SolrCloud and external Zookeeper ensemble

2012-11-22 Thread Otis Gospodnetic
Note the number of zookeeper nodes is independent of the number of shards. Otis -- SOLR Performance Monitoring - http://sematext.com/spm On Nov 22, 2012 4:19 AM, Luis Cappa Banda luisca...@gmail.com wrote: Hello, I've been dealing with the same question these days. In architecture terms, it's

User context based search in apache solr

2012-11-22 Thread sagarzond
In our application we are providing product master data search with Solr. Now our requirement is to provide user-context-based search (meaning we provide top search results using user history). For that I have created one score table having the following fields: 1)product_id 2)user_id

Re: Error: _version_field must exist in schema

2012-11-22 Thread Dotan Cohen
On Thu, Nov 22, 2012 at 9:26 PM, Nick Zadrozny n...@onemorecloud.com wrote: Belated reply, but this is probably something you should let us know about directly at supp...@onemorecloud.com if it happens again. Cheers. Hi Nick. This particular issue was on a Solr 4 instance on AWS, not on the

RE: Solr UIMA with KEA

2012-11-22 Thread Markus Jelsma
See: http://nutch.apache.org/apidocs-2.1/org/apache/nutch/crawl/AdaptiveFetchSchedule.html -Original message- From:nutchsolruser nutchsolru...@gmail.com Sent: Fri 23-Nov-2012 06:53 To: solr-user@lucene.apache.org Subject: Solr UIMA with KEA Is there any way we can extract tags

RE: Solr UIMA with KEA

2012-11-22 Thread Markus Jelsma
Sorry, wrong list :) -Original message- From:Markus Jelsma markus.jel...@openindex.io Sent: Fri 23-Nov-2012 08:32 To: solr-user@lucene.apache.org Subject: RE: Solr UIMA with KEA See: http://nutch.apache.org/apidocs-2.1/org/apache/nutch/crawl/AdaptiveFetchSchedule.html