I side with Toke on this. Enterprise bare-metal machines often have
hundreds of gigs of memory and tens of CPU cores -- you would have to fit
multiple instances on a machine to make use of them while avoiding huge
heaps.
If this is not a common case now, it could well be in the future the way
I should add to Erick's point that the test framework allows you to test
HTTP APIs through an embedded Jetty instance, so you should be able to do
anything you do with a remote Solr instance from code.
On 12 Jan 2016 18:24, "Erick Erickson" wrote:
> And a neater
M is the number of ids you want for each group, specified by group.limit.
It's unrelated to the number of rows requested.
On 21 Aug 2015 19:54, SolrUser1543 osta...@gmail.com wrote:
Ramkumar R. Aiyengar wrote
Grouping does need 3 phases. The phases are:
(2) For the N groups, each shard
Custom authentication support was added in 5.x, and the imminent (in the
next few days) 5.3 release has a lot of features in this regard, including
a basic authentication module; I would suggest upgrading to it. 5.x versions
(including 5.3) do support Java 7, so I don't see an issue here.
On 20 Aug
Grouping does need 3 phases. The phases are:
(1) Each shard is asked for the top N groups (instead of ids), with the
sort value. The federator then sorts the groups from all shards and chooses
the top N groups.
(2) For the N groups, each shard is asked for the top M ids (M is
configurable per
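The shard phases above can be sketched as a federator-side merge. This is a simplified illustration only, not Solr's actual implementation; the shard responses, group names, and sort values below are made up:

```python
# Simplified sketch of distributed grouping federation (illustrative only).
# Phase 1: each shard reports its top-N groups with a sort value; the
# federator keeps the global top N. Phase 2: shards are asked for the top
# M ids (group.limit) within each winning group.

def phase1_merge(shard_tops, n):
    """Merge each shard's top (group, sort_value) pairs; keep global top n."""
    merged = {}
    for tops in shard_tops:
        for group, sort_value in tops:
            # Keep the best (lowest) sort value seen for each group.
            if group not in merged or sort_value < merged[group]:
                merged[group] = sort_value
    return sorted(merged, key=merged.get)[:n]

def phase2_collect(top_groups, shard_ids, m):
    """For each winning group, gather up to m ids across all shards."""
    return {g: [doc for ids in shard_ids for doc in ids.get(g, [])][:m]
            for g in top_groups}

# Two shards, each reporting its top groups with a sort value:
shard_tops = [[("a", 1), ("b", 3)], [("b", 2), ("c", 4)]]
groups = phase1_merge(shard_tops, 2)
docs = phase2_collect(groups, [{"a": ["d1"]}, {"b": ["d2", "d3"]}], 2)
```

Real Solr also merges the per-group ids by sort order across shards; the sketch just concatenates them to show the shape of the two phases.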
Please open a JIRA with details of what the issues are; we should try to
support this.
On 18 Jun 2015 15:07, Bence Vass bence.v...@inso.tuwien.ac.at wrote:
Hello,
Is there any documentation on how to start Solr 5.2.1 on Solaris (Solaris
10)? The script (solr start) doesn't work out of the
I started with an empty Solr instance and Firefox 38 on Linux. This is the
trunk source.
There's a 'No cores available. Go and create one' button available in the
old and the new UI. In the old UI, clicking it goes to the core admin, and
pops open the dialog for Add Core. The new UI only goes to
This shouldn't happen, but if it does, there's no good way currently for
Solr to automatically fix it. There are a couple of issues being worked on
to do that currently. But till then, your best bet is to restart the node
which you expect to be the leader (you can look at ZK to see who is at the
cleaned up, I'd file a JIRA to address the issue. Those directories should
be removed over time. At times there will have to be a couple around at the
same time, and others may take a while to clean up.
- Mark
On Tue, Apr 28, 2015 at 3:27 AM Ramkumar
R. Aiyengar
andyetitmo...@gmail.com wrote
SolrCloud does need up to twice the amount of disk space as your usual
index size during replication. Amongst other things, this ensures you have
a full copy of the index at any point. There's no way around this; I would
suggest provisioning the additional disk space needed.
On 20 Apr 2015 23:21,
It shouldn't be any different without the patch, or with the patch and
(100, 10) as parameters. Which is why I wanted you to check with (100, 10).
If you see the same issue with that, then the patch is probably not the
problem; maybe it is with the patched build in general.
On 30 Mar 2015 13:01,
I doubt this has anything to do with the patch. Do you observe the same
behaviour if you reduce the values for the config to defaults? (100, 10)
On 30 Mar 2015 09:51, forest_soup tanglin0...@gmail.com wrote:
https://issues.apache.org/jira/browse/SOLR-6359
I also posted the questions to the
Not a direct answer, but Anshum just created this:
https://issues.apache.org/jira/browse/SOLR-7275
On 20 Mar 2015 23:21, Furkan KAMACI furkankam...@gmail.com wrote:
Is there any way to use ConcurrentUpdateSolrServer with secured Solr, like
CloudSolrServer:
Is your concern that you want to be able to modify source code just on your
machine or that you can't for some reason install svn?
If it's the former, even if you checkout using svn, you can't modify
anything outside the machine as changes can be checked in only by the
committers of the project.
Yes, and doing so is painful and takes lots of people and hardware
resources to get there for large amounts of data and queries :)
As Erick says, work backwards from 60s and first establish how high the
commit interval can be while still satisfying your use case.
On 16 Mar 2015 16:04, Erick Erickson
Yes, Solr 5.0 uses Jetty 8.
FYI, the upcoming release 5.1 will move to Jetty 9.
Also, just in case it matters -- as noted in the 5.0 release notes, the use
of Jetty is now an implementation detail and we might move away from it in
the future -- so you shouldn't be depending on Solr using Jetty
The update log replay issue looks like
https://issues.apache.org/jira/browse/SOLR-6583
On 9 Mar 2015 01:41, Mark Miller markrmil...@gmail.com wrote:
Interesting bug.
First there is the already closed transaction log. That by itself deserves
a look. I'm not even positive we should be replaying
I don't have formal benchmarks, but we did get significant performance
gains by switching from a RAMDirectory to an MMapDirectory on tmpfs,
especially under parallel queries. Lock contention seemed to hold back the
former.
On 23 Jan 2015 06:35, deniz denizdurmu...@gmail.com wrote:
Would it boost any
https://issues.apache.org/jira/browse/SOLR-6359 has a patch which allows
this to be configured; it has not gone in as yet.
Note that the current design of the UpdateLog causes it to be less
efficient if the number is bumped up too much, but it is certainly worth
experimenting with.
On 22 Jan 2015
That's correct, even though it should still be possible to embed Jetty,
that could change in the future, and that's why support for pluggable
containers is being taken away.
If you need to deal with the index at a lower level, there's always Lucene
you can use as a library instead of Solr.
But I
Versions 4.10.3 and beyond already use server rather than example; the
latter is still referenced in the script purely for back-compat. A major
release, 5.0, is coming soon; perhaps the back-compat can be removed then.
On 6 Jan 2015 09:30, Dominique Bejean dominique.bej...@eolya.fr wrote:
Hi,
As Erick mentions, his change to have a state where indexing happens but
querying doesn't surely helps in this case.
But these are still boolean decisions of send vs don't send. In general, it
would be nice to abstract the routing policy so that it is pluggable. You
could then do stuff like have a
Do keep one thing in mind though. If you are already doing the work of
figuring out the right shard leader (through solrJ or otherwise), using
that location with just the collection name might be suboptimal if there
are multiple shard leaders present in the same instance -- the collection
name
On 30 Oct 2014 23:46, Erick Erickson erickerick...@gmail.com wrote:
This configuration deals with all
the replication, NRT processing, self-repair when nodes go up and
down and all that, but since there's no second trip to get the docs
from shards, your query performance won't be affected.
On 30 Oct 2014 14:49, Shawn Heisey apa...@elyograg.org wrote:
In order to see a gain in performance from multiple shards per server,
the server must have a lot of CPUs and the query rate must be fairly
low. If the query rate is high, then all the CPUs will be busy just
handling simultaneous
As far as the second option goes, unless you are using a large amount of
memory and you reach a point where a JVM can't sensibly deal with a GC
load, having multiple JVMs wouldn't buy you much. With a 26GB index, you
probably haven't reached that point. There are also other shared resources
at an
https://issues.apache.org/jira/plugins/servlet/mobile#issue/LUCENE-2878
provides a Lucene API for what you are trying to do; it's not in yet, though.
There's a fork which has the change in
https://github.com/flaxsearch/lucene-solr-intervals
On 12 Sep 2014 21:24, Craig Longman clong...@iconect.com wrote:
On 31 Aug 2014 13:24, Mark Miller markrmil...@gmail.com wrote:
On Aug 31, 2014, at 4:04 AM, Christoph Schmidt
christoph.schm...@moresophy.de wrote:
we see at least two problems when scaling to a large number of
collections. I would like to ask the community, if they are known and maybe
ZK has the list of live nodes available as a set of ephemeral nodes. You
can use /zookeeper on Solr or talk to ZK directly to get that list.
On 24 Aug 2014 03:08, Nathan Neulinger nn...@neulinger.org wrote:
Is there a way to query the 'live node' state without sending a query to
every node
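A sketch of working with that list. In a real cluster you would fetch the children of /live_nodes via a ZK client (for instance kazoo's zk.get_children("/live_nodes")); here we only parse the conventional host:port_context node names, and the names below are invented:

```python
# Live nodes are ephemeral ZK children of /live_nodes, conventionally named
# "host:port_context", e.g. "10.0.0.1:8983_solr" (node names here are made up).

def parse_live_nodes(children):
    """Turn ZK live-node names into (host, port, context) tuples."""
    nodes = []
    for name in children:
        hostport, _, context = name.partition("_")
        host, _, port = hostport.rpartition(":")
        nodes.append((host, int(port), context))
    return nodes

nodes = parse_live_nodes(["10.0.0.1:8983_solr", "10.0.0.2:7574_solr"])
```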
(1) sounds a lot like SOLR-6261 I mention above. There are possibly other
improvements since 4.6.1 as Mark mentions, I would certainly suggest you
test with the latest version with the issue above patched (or use the
current stable branch in svn, branch_4x) to see if that makes a difference.
I didn't realise you could even disable tlog when running SolrCloud, but as
Anshum says it's a bad idea. In all likelihood, even if it worked,
removing transaction logs is likely to make your restart slower: SolrCloud
would always be forced to do a full recovery because it cannot now use
tlogs
I agree with Erick that the gain you are looking at might not be worth it,
so do measure and see if there's a difference.
Also, the next release of Solr is to have some significant improvements
when it comes to CPU usage under heavy indexing load, and we have had at
least one anecdote so far where
Not an exact answer: OpenGrok uses Lucene, but not Solr.
On 2 Jun 2014 07:48, Alexandre Rafalovitch arafa...@gmail.com wrote:
Hello,
Does anybody know of a recent project that indexes SVN repos for Solr
search? With or without a UI.
I know of similar efforts for other VCS, but the only thing I
I agree with Erick that this is premature unless you can show that it makes
a difference.
Firstly, why are you splitting the data into multiple time tiers (one
recent, and one all) and then waiting to merge results from all of them?
Time tiering is useful when you can do the search separately on
missing something?
Regards,
Alex
On 16/04/2014 10:59 pm, Ramkumar R. Aiyengar andyetitmo...@gmail.com
wrote:
Logically if you tokenize and put the results in a multivalued field, you
should be able to get all values in sequence?
On 16 Apr 2014 16:51, Alexandre Rafalovitch arafa...@gmail.com wrote:
Hello,
If I use very basic tokenizers, e.g. space based and no filters, can I
reconstruct the text from
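To illustrate the idea, a toy sketch rather than an actual analyzer chain: with a purely space-based tokenizer and no filters, the ordered token stream is enough to rebuild the text, modulo the exact whitespace the tokenizer discarded:

```python
def tokenize(text):
    # Space-based tokenization, no filters -- the "very basic" case above.
    return text.split()

def reconstruct(tokens):
    # Runs of whitespace in the original collapse to single spaces.
    return " ".join(tokens)

tokens = tokenize("the quick  brown fox")   # note the double space
text = reconstruct(tokens)                  # "the quick brown fox"
```

With any filters in the chain (stemming, lowercasing, stopwords) the original text is no longer recoverable, which is what makes the basic-tokenizer case special.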
ant compile / ant -f solr dist / ant test certainly work; I use them with a
git working copy. Are you trying something else?
On 14 Apr 2014 19:36, Jeff Wartes jwar...@whitepages.com wrote:
I vastly prefer git, but last I checked, (admittedly, some time ago) you
couldn't build the project from the
If only availability is your concern, you can always keep a list of servers
to which your C++ clients will send requests, and round-robin amongst them.
If one of the servers goes down, you will either not be able to reach it or
will get a 500+ error in the HTTP response, and you can take it out of
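The client-side scheme described above might look roughly like this. A sketch under assumptions: the server names and fake_send are invented, and send stands in for whatever HTTP call the client actually makes:

```python
# Round-robin over a fixed server list, skipping servers that are
# unreachable or return a 5xx (hypothetical client, not a real library).
import itertools

class RoundRobinClient:
    def __init__(self, servers):
        self.servers = list(servers)
        self._cycle = itertools.cycle(self.servers)

    def request(self, send):
        """Try each server at most once, in round-robin order."""
        for _ in range(len(self.servers)):
            server = next(self._cycle)
            try:
                status, body = send(server)
            except OSError:          # unreachable: move to the next server
                continue
            if status < 500:         # 5xx: treat as down, move on
                return body
        raise RuntimeError("no server available")

# Simulated: the first server is down, the second answers.
def fake_send(server):
    if server == "solr1:8983":
        raise OSError("connection refused")
    return 200, f"ok from {server}"

client = RoundRobinClient(["solr1:8983", "solr2:8983"])
result = client.request(fake_send)
```

A fuller version would also keep the failed server out of rotation for a while rather than retrying it on every request.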
Start with http://wiki.apache.org/solr/SolrPerformanceProblems It has a
section on GC tuning and a link to some example settings.
On 16 Feb 2014 21:19, lboutros boutr...@gmail.com wrote:
Thanks a lot for your answer.
Is there a web page, on the wiki for instance, where we could find some JVM
Ludovic, recent Solr changes won't do much to prevent ZK session expiry;
you might want to enable GC logging on Solr and ZooKeeper to check for
pauses and tune appropriately.
The patch below fixes a situation under which the cloud can get to a bad
state during the recovery after session expiry.
We have had success with starting up Jolokia in the same servlet container
as Solr, and then using its REST/bulk API to access JMX from the
application of your choice.
On 4 Feb 2014 17:16, Walter Underwood wun...@wunderwood.org wrote:
I agree that sorting and filtering stats in Solr is not a good idea.
There's already an issue for this:
https://issues.apache.org/jira/browse/SOLR-5209. We were once bitten by the
same issue when we were trying to relocate a shard. As Mark mentions, the
idea was to do this in zk truth mode; the link also references where that
work is being done.
On 31 Jan 2014
if they're correct, perhaps start to trim my requirements
etc.
FWIW,
Erick
On Tue, Jul 9, 2013 at 4:07 AM, Ramkumar R. Aiyengar
andyetitmo...@gmail.com wrote:
5. No more than 32 nodes in your SolrCloud cluster.
I hope this isn't too OT, but what tradeoffs is this based on? Would have
thought
5. No more than 32 nodes in your SolrCloud cluster.
I hope this isn't too OT, but what tradeoffs is this based on? Would have
thought it easy to hit this number for a big index and high load (hence
with the view of both the number of shards and replicas horizontally
scaling..)
6. Don't return
In general, just increasing the cache sizes to make everything fit in
memory might not always give you best results. Do keep in mind that the
caches are in Java memory and that incurs the penalty of garbage collection
and other housekeeping Java's memory management might have to do.
Reasonably