Re: Best practices for Solr (how to update jar files safely)

2016-02-22 Thread Ramkumar R. Aiyengar
I side with Toke on this. Enterprise bare metal machines often have hundreds of gigs of memory and tens of CPU cores -- you would have to fit multiple instances in a machine to make use of them to circumvent huge heaps. If this is not a common case now, it could well be in the future the way

Re: It's possible up and debug solr in eclipse IDE?

2016-01-14 Thread Ramkumar R. Aiyengar
I should add to Erick's point that the test framework allows you to test HTTP APIs through an embedded Jetty instance, so you should be able to do anything that you do with a remote Solr instance from code.. On 12 Jan 2016 18:24, "Erick Erickson" wrote: > And a neater

Re: Number of requests to each shard is different with and without using of grouping

2015-08-22 Thread Ramkumar R. Aiyengar
M is the number of ids you want for each group, specified by group.limit. It's unrelated to the number of rows requested.. On 21 Aug 2015 19:54, SolrUser1543 osta...@gmail.com wrote: Ramkumar R. Aiyengar wrote Grouping does need 3 phases.. The phases are: (2) For the N groups, each shard

Re: SOLR to SOLR communication with custom authentication

2015-08-21 Thread Ramkumar R. Aiyengar
Custom authentication support was added in 5x, and the imminent (in the next few days) 5.3 release has a lot of features in this regard, including a basic authentication module; I would suggest upgrading to it. 5x versions (including 5.3) do support Java 7, so I don't see an issue here? On 20 Aug

Re: Number of requests to each shard is different with and without using of grouping

2015-08-21 Thread Ramkumar R. Aiyengar
Grouping does need 3 phases.. The phases are: (1) Each shard is asked for the top N groups (instead of ids), with the sort value. The federator then sorts the groups from all shards and chooses the top N groups. (2) For the N groups, each shard is asked for the top M ids (M is configurable per
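The first two phases described above can be sketched as follows. This is a minimal illustration, not Solr's implementation: it assumes each shard is a plain list of (group, doc_id, score) tuples, that groups and ids are ranked by descending score, and all function names are hypothetical. The third phase is not covered in the excerpt and is left out.

```python
import heapq

def shard_top_groups(shard, n):
    """Phase 1, shard side: top N groups, each with its best sort value."""
    best = {}
    for group, _doc_id, score in shard:
        if group not in best or score > best[group]:
            best[group] = score
    return heapq.nlargest(n, best.items(), key=lambda kv: kv[1])

def merge_top_groups(per_shard_groups, n):
    """Phase 1, federator side: merge the shard answers, keep the top N groups."""
    best = {}
    for groups in per_shard_groups:
        for group, score in groups:
            if group not in best or score > best[group]:
                best[group] = score
    ranked = heapq.nlargest(n, best.items(), key=lambda kv: kv[1])
    return [group for group, _score in ranked]

def shard_top_ids(shard, groups, m):
    """Phase 2, shard side: top M (score, doc_id) pairs for each chosen group."""
    result = {}
    for group in groups:
        docs = [(score, doc_id) for g, doc_id, score in shard if g == group]
        result[group] = heapq.nlargest(m, docs)
    return result

def merge_top_ids(per_shard_ids, groups, m):
    """Phase 2, federator side: merge per-shard ids, keep the top M per group."""
    merged = {}
    for group in groups:
        candidates = [pair for ids in per_shard_ids for pair in ids[group]]
        merged[group] = [doc_id for _score, doc_id in heapq.nlargest(m, candidates)]
    return merged

def grouped_search(shards, n, m):
    """Run both phases across all shards and return {group: [doc_id, ...]}."""
    groups = merge_top_groups([shard_top_groups(s, n) for s in shards], n)
    return merge_top_ids([shard_top_ids(s, groups, m) for s in shards], groups, m)
```

Here n plays the role of the number of groups requested and m the per-group limit (group.limit); every shard is contacted once per phase, which is why grouped queries send more requests per shard than ungrouped ones.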

Re: Solr 5.2.1 on Solaris

2015-06-19 Thread Ramkumar R. Aiyengar
Please open a JIRA with details of what the issues are, we should try to support this.. On 18 Jun 2015 15:07, Bence Vass bence.v...@inso.tuwien.ac.at wrote: Hello, Is there any documentation on how to start Solr 5.2.1 on Solaris (Solaris 10)? The script (solr start) doesn't work out of the

Re: Please help test the new Angular JS Admin UI

2015-06-17 Thread Ramkumar R. Aiyengar
I started with an empty Solr instance and Firefox 38 on Linux. This is the trunk source.. There's a 'No cores available. Go and create one' button available in the old and the new UI. In the old UI, clicking it goes to the core admin, and pops open the dialog for Add Core. The new UI only goes to

Re: SolrCloud Leader Election

2015-05-21 Thread Ramkumar R. Aiyengar
This shouldn't happen, but if it does, there's no good way currently for Solr to automatically fix it. There are a couple of issues being worked on to do that currently. But till then, your best bet is to restart the node which you expect to be the leader (you can look at ZK to see who is at the

Re: Multiple index.timestamp directories using up disk space

2015-05-05 Thread Ramkumar R. Aiyengar
cleaned up, I'd file a JIRA to address the issue. Those directories should be removed over time. At times there will have to be a couple around at the same time and others may take a while to clean up. - Mark On Tue, Apr 28, 2015 at 3:27 AM Ramkumar R. Aiyengar andyetitmo...@gmail.com wrote

Re: Multiple index.timestamp directories using up disk space

2015-04-28 Thread Ramkumar R. Aiyengar
SolrCloud does need up to twice the amount of disk space as your usual index size during replication. Amongst other things, this ensures you have a full copy of the index at any point. There's no way around this, I would suggest you provision the additional disk space needed. On 20 Apr 2015 23:21,

Re: Restart solr failed after applied the patch in https://issues.apache.org/jira/browse/SOLR-6359

2015-03-31 Thread Ramkumar R. Aiyengar
It shouldn't be any different without the patch, or with the patch and (100,10) as parameters. Which is why I wanted you to check with 100,10.. If you see the same issue with that, then the patch is probably not the issue; maybe it is with the patched build in general.. On 30 Mar 2015 13:01,

Re: Restart solr failed after applied the patch in https://issues.apache.org/jira/browse/SOLR-6359

2015-03-30 Thread Ramkumar R. Aiyengar
I doubt this has anything to do with the patch. Do you observe the same behaviour if you reduce the values for the config to defaults? (100, 10) On 30 Mar 2015 09:51, forest_soup tanglin0...@gmail.com wrote: https://issues.apache.org/jira/browse/SOLR-6359 I also posted the questions to the

Re: How to use ConcurrentUpdateSolrServer for Secured Solr?

2015-03-22 Thread Ramkumar R. Aiyengar
Not a direct answer, but Anshum just created this.. https://issues.apache.org/jira/browse/SOLR-7275 On 20 Mar 2015 23:21, Furkan KAMACI furkankam...@gmail.com wrote: Is there anyway to use ConcurrentUpdateSolrServer for secured Solr as like CloudSolrServer:

Re: Want to modify Solr Source Code

2015-03-17 Thread Ramkumar R. Aiyengar
Is your concern that you want to be able to modify source code just on your machine or that you can't for some reason install svn? If it's the former, even if you checkout using svn, you can't modify anything outside the machine as changes can be checked in only by the committers of the project.

Re: Whole RAM consumed while Indexing.

2015-03-16 Thread Ramkumar R. Aiyengar
Yes, and doing so is painful and takes lots of people and hardware resources to get there for large amounts of data and queries :) As Erick says, work backwards from 60s and first establish how high the commit interval can be to satisfy your use case.. On 16 Mar 2015 16:04, Erick Erickson

Re: Jetty version

2015-03-12 Thread Ramkumar R. Aiyengar
Yes, Solr 5.0 uses Jetty 8. FYI, the upcoming release 5.1 will move to Jetty 9. Also, just in case it matters -- as noted in the 5.0 release notes, the use of Jetty is now an implementation detail and we might move away from it in the future -- so you shouldn't be depending on Solr using Jetty

Re: 4.10.4 - nodes up, shard without leader

2015-03-09 Thread Ramkumar R. Aiyengar
The update log replay issue looks like https://issues.apache.org/jira/browse/SOLR-6583 On 9 Mar 2015 01:41, Mark Miller markrmil...@gmail.com wrote: Interesting bug. First there is the already closed transaction log. That by itself deserves a look. I'm not even positive we should be replaying

Re: Using tmpfs for Solr index

2015-01-27 Thread Ramkumar R. Aiyengar
I don't have formal benchmarks, but we did get significant performance gains by switching from a RAMDirectory to a MMapDirectory on tmpfs, especially under parallel queries. Locking seemed to pull down the former.. On 23 Jan 2015 06:35, deniz denizdurmu...@gmail.com wrote: Would it boost any

Re: Solr Recovery process

2015-01-26 Thread Ramkumar R. Aiyengar
https://issues.apache.org/jira/browse/SOLR-6359 has a patch which allows this to be configured, it has not gone in as yet. Note that the current design of the UpdateLog causes it to be less efficient if the number is bumped up too much, but certainly worth experimenting with. On 22 Jan 2015

Re: Easiest way to embed solr in a desktop application

2015-01-16 Thread Ramkumar R. Aiyengar
That's correct, even though it should still be possible to embed Jetty, that could change in the future, and that's why support for pluggable containers is being taken away. If you need to deal with the index at a lower level, there's always Lucene you can use as a library instead of Solr. But I

Re: Solr startup script in version 4.10.3

2015-01-08 Thread Ramkumar R. Aiyengar
Versions 4.10.3 and beyond already use server rather than example, which still finds a reference in the script purely for back compat. A major release 5.0 is coming soon, perhaps the back compat can be removed for that. On 6 Jan 2015 09:30, Dominique Bejean dominique.bej...@eolya.fr wrote: Hi,

Re: Dealing with bad apples in a SolrCloud cluster

2014-11-26 Thread Ramkumar R. Aiyengar
As Eric mentions, his change to have a state where indexing happens but querying doesn't surely helps in this case. But these are still boolean decisions of send vs don't send. In general, it would be nice to abstract the routing policy so that it is pluggable. You could then do stuff like have a

Re: any difference between using collection vs. shard in URL?

2014-11-06 Thread Ramkumar R. Aiyengar
Do keep one thing in mind though. If you are already doing the work of figuring out the right shard leader (through solrJ or otherwise), using that location with just the collection name might be suboptimal if there are multiple shard leaders present in the same instance -- the collection name

Re: Sharding configuration

2014-11-01 Thread Ramkumar R. Aiyengar
On 30 Oct 2014 23:46, Erick Erickson erickerick...@gmail.com wrote: This configuration deals with all the replication, NRT processing, self-repair when nodes go up and down and all that, but since there's no second trip to get the docs from shards your query performance won't be affected.

Re: Sharding configuration

2014-11-01 Thread Ramkumar R. Aiyengar
On 30 Oct 2014 14:49, Shawn Heisey apa...@elyograg.org wrote: In order to see a gain in performance from multiple shards per server, the server must have a lot of CPUs and the query rate must be fairly low. If the query rate is high, then all the CPUs will be busy just handling simultaneous

Re: Sharding configuration

2014-10-28 Thread Ramkumar R. Aiyengar
As far as the second option goes, unless you are using a large amount of memory and you reach a point where a JVM can't sensibly deal with a GC load, having multiple JVMs wouldn't buy you much. With a 26GB index, you probably haven't reached that point. There are also other shared resources at an

Re: Advice on highlighting

2014-09-14 Thread Ramkumar R. Aiyengar
https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-2878 provides a Lucene API for what you are trying to do; it's not yet in, though. There's a fork which has the change in https://github.com/flaxsearch/lucene-solr-intervals On 12 Sep 2014 21:24, Craig Longman clong...@iconect.com wrote:

Re: Scaling to large Number of Collections

2014-08-31 Thread Ramkumar R. Aiyengar
On 31 Aug 2014 13:24, Mark Miller markrmil...@gmail.com wrote: On Aug 31, 2014, at 4:04 AM, Christoph Schmidt christoph.schm...@moresophy.de wrote: we see at least two problems when scaling to large number of collections. I would like to ask the community, if they are known and maybe

Re: Why does CLUSTERSTATUS return different information than the web cloud view?

2014-08-26 Thread Ramkumar R. Aiyengar
ZK has the list of live nodes available as a set of ephemeral nodes. You can use /zookeeper on Solr or talk to ZK directly to get that list. On 24 Aug 2014 03:08, Nathan Neulinger nn...@neulinger.org wrote: Is there a way to query the 'live node' state without sending a query to every node

Re: Disabling transaction logs

2014-08-13 Thread Ramkumar R. Aiyengar
(1) sounds a lot like SOLR-6261 I mention above. There are possibly other improvements since 4.6.1 as Mark mentions, I would certainly suggest you test with the latest version with the issue above patched (or use the current stable branch in svn, branch_4x) to see if that makes a difference.

Re: Disabling transaction logs

2014-08-09 Thread Ramkumar R. Aiyengar
I didn't realise you could even disable tlog when running SolrCloud, but as Anshum says it's a bad idea. In all likelihood, even if it worked, removing transaction logs would make your restart slower: SolrCloud would always be forced to do a full recovery because it cannot now use tlogs

Re: SolrCloud without NRT and indexing only on the master

2014-07-31 Thread Ramkumar R. Aiyengar
I agree with Erick that this gain you are looking at might not be worth it, so do measure and see if there's a difference. Also, the next release of Solr is to have some significant improvements when it comes to CPU usage under heavy indexing load, and we have had at least one anecdote so far where

Re: Anybody knows of a project that indexes SVN repos into Solr?

2014-06-02 Thread Ramkumar R. Aiyengar
Not an exact answer.. OpenGrok uses Lucene, but not Solr. On 2 Jun 2014 07:48, Alexandre Rafalovitch arafa...@gmail.com wrote: Hello, Anybody knows of a recent projects that index SVN repos for Solr search? With or without UI. I know of similar efforts for other VCS, but the only thing I

Re: Distributed Search in Solr with different queries per shard

2014-05-25 Thread Ramkumar R. Aiyengar
I agree with Eric that this is premature unless you can show that it makes a difference. Firstly why are you splitting the data into multiple time tiers (one recent, and one all) and then waiting to merge results from all of them? Time tiering is useful when you can do the search separately on

Re: Can I reconstruct text from tokens?

2014-04-18 Thread Ramkumar R. Aiyengar
missing something? Regards, Alex On 16/04/2014 10:59 pm, Ramkumar R. Aiyengar andyetitmo...@gmail.com wrote: Logically if you tokenize and put the results in a multivalued field, you should be able to get all values in sequence? On 16 Apr 2014 16:51, Alexandre Rafalovitch arafa

Re: Can I reconstruct text from tokens?

2014-04-16 Thread Ramkumar R. Aiyengar
Logically if you tokenize and put the results in a multivalued field, you should be able to get all values in sequence? On 16 Apr 2014 16:51, Alexandre Rafalovitch arafa...@gmail.com wrote: Hello, If I use very basic tokenizers, e.g. space based and no filters, can I reconstruct the text from

Re: svn vs GIT

2014-04-14 Thread Ramkumar R. Aiyengar
ant compile / ant -f solr dist / ant test certainly work, I use them with a git working copy. You trying something else? On 14 Apr 2014 19:36, Jeff Wartes jwar...@whitepages.com wrote: I vastly prefer git, but last I checked, (admittedly, some time ago) you couldn't build the project from the

Re: update in SolrCloud through C++ client

2014-02-16 Thread Ramkumar R. Aiyengar
If only availability is your concern, you can always keep a list of servers to which your C++ clients will send requests, and round robin amongst them. If one of the servers goes down, you will either not be able to reach it or get a 500+ error in the HTTP response, you can take it out of
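The round-robin idea above could look roughly like this. It is a sketch in Python rather than C++, under stated assumptions: the class name RoundRobinPool and the send callable are hypothetical, send(server, payload) is assumed to return an HTTP status code or raise OSError when the server is unreachable, and a failed server is dropped from the rotation for good (a real client would probably re-probe it later).

```python
class RoundRobinPool:
    """Rotate requests over a server list, dropping servers that fail."""

    def __init__(self, servers, send):
        self.servers = list(servers)  # servers still considered healthy
        self.send = send              # hypothetical transport: (server, payload) -> status
        self._next = 0                # position of the next server to try

    def request(self, payload):
        """Send payload to the next healthy server; return (server, status)."""
        while self.servers:
            idx = self._next % len(self.servers)
            server = self.servers[idx]
            try:
                status = self.send(server, payload)
            except OSError:
                status = None  # unreachable counts as a failure
            if status is not None and status < 500:
                self._next = idx + 1  # move on for the following request
                return server, status
            # Unreachable or 500+ response: take the server out of rotation.
            self.servers.pop(idx)
            self._next = idx  # the following server slid into this slot
        raise RuntimeError("no healthy servers left")
```

Passing the transport in as a callable keeps the rotation logic independent of the HTTP library, so the same structure carries over to whatever client the C++ side uses.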

Re: SolrCloud Zookeeper disconnection/reconnection

2014-02-16 Thread Ramkumar R. Aiyengar
Start with http://wiki.apache.org/solr/SolrPerformanceProblems It has a section on GC tuning and a link to some example settings. On 16 Feb 2014 21:19, lboutros boutr...@gmail.com wrote: Thanks a lot for your answer. Is there a web page, on the wiki for instance, where we could find some JVM

Re: SolrCloud Zookeeper disconnection/reconnection

2014-02-14 Thread Ramkumar R. Aiyengar
Ludovic, recent Solr changes won't do much to prevent ZK session expiry, you might want to enable GC logging on Solr and Zookeeper to check for pauses and tune appropriately. The patch below fixes a situation under which the cloud can get to a bad state during the recovery after session expiry.

Re: need help in understating solr cloud stats data

2014-02-05 Thread Ramkumar R. Aiyengar
We have had success with starting up Jolokia in the same servlet container as Solr, and then using its REST/Bulk API to JMX from the application of choice. On 4 Feb 2014 17:16, Walter Underwood wun...@wunderwood.org wrote: I agree that sorting and filtering stats in Solr is not a good idea.

Re: Removing last replica from a SolrCloud collection

2014-02-02 Thread Ramkumar R. Aiyengar
There's already an issue for this, https://issues.apache.org/jira/browse/SOLR-5209, we were once bitten by the same issue, when we were trying to relocate a shard. As Mark mentions, the idea was to do this in zk truth mode, the link also references where that work is being done. On 31 Jan 2014

Re: Solr limitations

2013-07-10 Thread Ramkumar R. Aiyengar
if they're correct, perhaps start to trim my requirements etc. FWIW, Erick On Tue, Jul 9, 2013 at 4:07 AM, Ramkumar R. Aiyengar andyetitmo...@gmail.com wrote: 5. No more than 32 nodes in your SolrCloud cluster. I hope this isn't too OT, but what tradeoffs is this based on? Would have thought

Re: Solr limitations

2013-07-09 Thread Ramkumar R. Aiyengar
5. No more than 32 nodes in your SolrCloud cluster. I hope this isn't too OT, but what tradeoffs is this based on? Would have thought it easy to hit this number for a big index and high load (hence with the view of both the number of shards and replicas horizontally scaling..) 6. Don't return

Re: whole index in memory

2013-06-01 Thread Ramkumar R. Aiyengar
In general, just increasing the cache sizes to make everything fit in memory might not always give you best results. Do keep in mind that the caches are in Java memory and that incurs the penalty of garbage collection and other housekeeping Java's memory management might have to do. Reasonably