Anybody have an idea?
Dec 5, 2012 3:52:32 PM org.apache.solr.client.solrj.impl.HttpClientUtil
createClient
INFO: Creating new http client,
config: maxConnections=500&maxConnectionsPerHost=16
Dec 5, 2012 3:52:33 PM
org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: [intradesk]
Hi
Queries with wildcards or fuzzy operators are called multi-term queries and do
not pass through the field's analyzer as you might expect.
See: http://wiki.apache.org/solr/MultitermQueryAnalysis
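Since Solr 3.6, a field type can also declare a dedicated multiterm analyzer in schema.xml; a minimal sketch (the tokenizer and filter choices here are illustrative, not a recommendation):

```xml
<fieldType name="text_general" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- applied to wildcard/fuzzy (multi-term) queries -->
  <analyzer type="multiterm">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Without the multiterm section, Solr applies only a limited set of "multi-term aware" filters to wildcard and fuzzy terms.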
-Original message-
From:Pratyul Kapoor praty...@gmail.com
Sent: Thu 06-Dec-2012
Hi,
You can either use omitTermFreqAndPositions on that field or set a custom
similarity for that field that returns 1 for tf 0.
http://wiki.apache.org/solr/SchemaXml#Common_field_options
http://wiki.apache.org/solr/SchemaXml#Similarity
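A sketch of the first option in schema.xml (field name and type are illustrative; note that omitting positions also breaks phrase queries on that field):

```xml
<field name="title" type="text_general" indexed="true" stored="true"
       omitTermFreqAndPositions="true"/>
```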
-Original message-
From:Amit Jha
Sounds like it's worth a try! Thanks Andre.
Tom
On 5 Dec 2012, at 17:49, Andre Bois-Crettez andre.b...@kelkoo.com wrote:
If you do grouping on source_id, it should be enough to request 3 times
more documents than you need, then reorder and drop the bottom.
Is a 3x overhead acceptable ?
Hi,
The file descriptor count is always quite low. At the moment, after heavy
usage for a few days, file descriptor counts are between 100 and 150 and I don't
have any errors in the logs. My worry at the moment is around all the
CLOSE_WAIT connections I am seeing. This is particularly true on
Hey all,
I'm in the process of migrating a single Solr 4.0 instance to a SolrCloud
setup for availability reasons.
After studying the wiki page for SolrCloud I'm not sure what the absolute
minimum setup is that would allow for one machine to go down.
Would it be enough to have one shard with one
Hi,
I currently have this setup:
I bring data into the description field and then have this in my schema:
<copyField source="description" dest="truncated_description" maxChars="168"/>
To then truncate the description and move it to truncated_description.
This works fine.
I was wondering, is it possible so
-Original message-
From:Mark Miller markrmil...@gmail.com
Sent: Wed 05-Dec-2012 23:23
To: solr-user@lucene.apache.org
Subject: Re: The shard called `properties`
See the custom hashing issue - the UI has to be updated to ignore this.
Ah yes, i see it in clusterstate.json.
Thanks
It depends on if you are running embedded zk or an external zk ensemble.
One leader and a replica is all you need for Solr to allow one machine to go
down - but if those same machines are running zookeeper, you need 3.
You could also run zookeeper on one external machine and then it would be
but if those same machines are running zookeeper, you need 3.
And one of those 3 can go down? I thought 3 was the minimum number of
zookeepers.
-- Jack Krupansky
-Original Message-
From: Mark Miller
Sent: Thursday, December 06, 2012 9:30 AM
To: solr-user@lucene.apache.org
Subject:
The quorum is the minimum, so it depends on how many you have running in the
ensemble. If it's three or four, then two is the quorum and therefore the
minimum. Three is regarded as a minimum in the ensemble because two makes no
sense.
-Original message-
From:Jack Krupansky
On Thu, Dec 6, 2012 at 9:56 AM, Markus Jelsma
markus.jel...@openindex.io wrote:
The quorum is the minimum, so it depends on how many you have running in the
ensemble. If it's three or four, then two is the quorum
I think that for 4 ZK servers, then 3 would be the quorum?
-Yonik
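Yonik's point can be checked with two lines of arithmetic; this is a sketch of the majority rule, not ZooKeeper's actual code:

```python
# A quorum is a strict majority of the *configured* ensemble size.
def quorum(ensemble_size):
    return ensemble_size // 2 + 1

# How many nodes can fail while a majority still survives.
def tolerable_failures(ensemble_size):
    return ensemble_size - quorum(ensemble_size)

for n in (1, 2, 3, 4, 5):
    print(n, quorum(n), tolerable_failures(n))
```

So 3 and 4 nodes both tolerate exactly one failure, which is why an even ensemble size buys you nothing.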
Hi users,
Could you please help us with tuning Solr search performance? We have tried
some performance testing (PT) on a Solr instance with 8GB RAM and 50,000 records
in the index, and we got 33 concurrent users hitting the instance at an average
of 17.5 hits per second with a response time of 2 seconds. As it is very high
On Dec 6, 2012, at 6:54 AM, Yonik Seeley yo...@lucidworks.com wrote:
On Thu, Dec 6, 2012 at 9:56 AM, Markus Jelsma
markus.jel...@openindex.io wrote:
The quorum is the minimum, so it depends on how many you have running in the
ensemble. If it's three or four, then two is the quorum
I
-Original message-
From:Yonik Seeley yo...@lucidworks.com
Sent: Thu 06-Dec-2012 16:01
To: solr-user@lucene.apache.org
Subject: Re: Minimum HA Setup with SolrCloud
On Thu, Dec 6, 2012 at 9:56 AM, Markus Jelsma
markus.jel...@openindex.io wrote:
The quorum is the minimum, so it
First, forget about master/slave with SolrCloud! Leaders really exist to
resolve conflicts, the old notion of M/S replication is largely irrelevant.
Updates can go to any node in the cluster, leader, replica, whatever. The
node forwards the doc to the correct leader based on a hash of the
Thanks a lot guys!
On Thu, Dec 6, 2012 at 4:22 PM, Markus Jelsma markus.jel...@openindex.iowrote:
-Original message-
From:Yonik Seeley yo...@lucidworks.com
Sent: Thu 06-Dec-2012 16:01
To: solr-user@lucene.apache.org
Subject: Re: Minimum HA Setup with SolrCloud
On Thu, Dec
However the tomcat logs are reporting:
INFO: Adding
'file:/opt/solr/contrib/extraction/lib/juniversalchardet-1.0.3.jar' to
classloader
Dec 6, 2012 3:42:57 PM org.apache.solr.core.SolrResourceLoader
replaceClassLoader
Original Message
Subject:Tika error
Date:
On Dec 6, 2012, at 9:50 AM, joe.cohe...@gmail.com joe.cohe...@gmail.com
wrote:
Is there an out-of-the-box or have anyone already implemented a feature for
collecting statistics on queries?
What sort of statistics are you talking about? Are you talking about
collecting information in
On Wed, Dec 5, 2012 at 5:17 PM, Mark Miller markrmil...@gmail.com wrote:
See the custom hashing issue - the UI has to be updated to ignore this.
Unfortunately, it seems that clients have to be hard coded to realize
properties is not a shard unless we add another nested layer.
Yeah, I talked
Hi,
In most of the examples I have seen for configuring the
DirectSolrSpellChecker the minPrefix attribute is set to 1 (and this is the
default value as well).
Is there any specific reason for this - would performance take a hit if it
was set to 0? We'd like to support returning corrections
Yeah, the main problem with it didn't really occur to me until I saw the
properties shard in the cluster view.
I started working on the UI to ignore it the other day and then never got there
because I was getting all sorts of weird 'busy' errors from svn for a while and
didn't have a clean
Hi All,
I followed Michael's advice and the timings reduced to a couple of hours
now from 6-8 hours :-)
I have attached the solrconfig.xml we're using, can you let me know if I'm
missing something..
Thanks,
Sandeep
<?xml version="1.0" encoding="UTF-8" ?>
!--
Licensed to the Apache Software Foundation
: I'm sorry, I don't see how the resource loader awareness is relevant to
: schema awareness? Or perhaps you didn't imply that? Good to know about the
No, my mistake ... typed one when I meant the other one.
: I guess I use the core to get to the schema then. Hmm, I may recall trying
: that at
You'll need to tell us more about your custom component so that we can
make some suggestions as to how to update it to work with SolrCloud.
In particular: what exactly are you doing with the result from
getConfigDir() ? ... if you are just using it to build a path to a File
that you open to
http://lucenerevolution.org/
Lucene Revolution 2013 will take place at The Westin San Diego on April 29
- May 2, 2013. Many of the brightest minds in open source search will
convene at this 4th annual Lucene Revolution to discuss topics and trends
driving the next generation of search. The
: Hi - no we're not getting any errors because we enabled positions on all
: fields that are also listed in the qf-parameter. If we don't, and send a
: phrase query we would get an error such as:
:
: java.lang.IllegalStateException: field h1 was indexed without position
data; cannot run
:
Grouping should work:
group=true&group.field=source_id&group.limit=3&group.main=true
On Thu, Dec 6, 2012 at 2:35 AM, Tom Mortimer bano...@gmail.com wrote:
Sounds like it's worth a try! Thanks Andre.
Tom
On 5 Dec 2012, at 17:49, Andre Bois-Crettez andre.b...@kelkoo.com wrote:
If you do
Thanks, but even with group.main=true the results are not in relevancy (score)
order, they are in group order. Which is why I can't use it as is.
Tom
On 6 Dec 2012, at 19:00, Way Cool way1.wayc...@gmail.com wrote:
Grouping should work:
Jason,
Thanks for raising it!
Erick,
That's what I want to discuss for a long time. Frankly speaking, the
question is:
if old-school (master/slave) search deployments don't comply with the vision of
SolrCloud/ElasticSearch, does that mean they are wrong?
Let me enumerate kinds of 'old-school
Hello,
What's your OS/CPU? Is it a VM or real hardware? Which JVM do you run, with
which parameters? Have you checked the GC log? What's the index size? What are
typical query parameters? What's the average number of results in a
query? Have you tried to run a query with debugQuery=true during hard
Hi Joe,
http://sematext.com/search-analytics/index.html is free and will give you a
bunch of reports about search (Solr or anything else). Not queries by IP,
though - for that you better grep logs.
Yes, you could also implement your own SearchComponent, assuming the
servers/LBs in front of Solr
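As a sketch of the grep-the-logs approach Otis mentions, here is a minimal parser that counts queries per client IP; the log line format shown is purely illustrative (real formats vary by servlet container):

```python
import re
from collections import Counter

# Hypothetical access-log lines; adjust the regex to your container's format.
LOG = """\
10.0.0.1 - [06/Dec/2012] "GET /solr/select?q=solr+cloud&rows=10"
10.0.0.2 - [06/Dec/2012] "GET /solr/select?q=zookeeper&rows=10"
10.0.0.1 - [06/Dec/2012] "GET /solr/select?q=solr+cloud&rows=20"
"""

def query_counts_by_ip(log_text):
    """Count occurrences of each (client IP, q parameter) pair."""
    counts = Counter()
    for line in log_text.splitlines():
        m = re.search(r'^(\S+).*[?&]q=([^&"]+)', line)
        if m:
            ip, q = m.group(1), m.group(2).replace("+", " ")
            counts[(ip, q)] += 1
    return counts

print(query_counts_by_ip(LOG))
```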
1 is the minimum :)
2 makes no sense.
3 must be the most common number in the zoo.
Otis
--
Performance Monitoring - http://sematext.com/spm/index.html
Search Analytics - http://sematext.com/search-analytics/index.html
On Thu, Dec 6, 2012 at 9:46 AM, Jack Krupansky
One is the loneliest number that you'll ever do,
Two can be as bad as one, it's the loneliest number since the single Zoo.
Michael Della Bitta
Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271
www.appinions.com
Where Influence
We measured that for just 3 nodes the overhead is around 100ms. We also noticed
that CPU spikes to 100% and some queries get blocked; this happens only when
the cloud has multiple nodes but does not happen on a single node. All the nodes
have the exact same configuration, JVM settings and hardware
Rewind.
If 1 is the minimum, what is the 3 minimum all about?
The zk web page does say Three ZooKeeper servers is the minimum recommended
size for an ensemble, and we also recommend that they run on separate
machines - but it does say recommended.
But back to the original question - it
Slightly more recent link:
http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html
-- Jack Krupansky
-Original Message-
From: Jack Krupansky
Sent: Thursday, December 06, 2012 5:21 PM
To: solr-user@lucene.apache.org
Subject: Re: Minimum HA Setup with SolrCloud
Rewind.
If 1 is the
And that link includes this sentence: For example, with four machines
ZooKeeper can only handle the failure of a single machine; if two machines
fail, the remaining two machines do not constitute a majority.
wunder
On Dec 6, 2012, at 2:25 PM, Jack Krupansky wrote:
Slightly more recent link:
In case you missed the parallel thread running right now, a read of the main
zookeeper admin web page is a good background to have:
http://zookeeper.apache.org/doc/r3.4.5/zookeeperAdmin.html
-- Jack Krupansky
-Original Message-
From: Jamie Johnson
Sent: Thursday, December 06, 2012
On Thu, Dec 6, 2012 at 5:21 PM, Jack Krupansky j...@basetechnology.com wrote:
If 1 is the minimum, what is the 3 minimum all about?
The minimum for running an ensemble (a cluster) and having any sort of
fault tolerance?
The zk web page does say Three ZooKeeper servers is the minimum
Jack,
The recommended ensemble configured size takes into consideration that you
might have a node failure. You can still run with two while you replace the
third, so it's sort of like RAID-5.
If you run with four configured nodes, you're still running with
RAID-5-like failure survival
I just rethought what I wrote and it doesn't make any sense. :)
If you have two remaining nodes left when you have a three node ensemble,
how are ties broken? Or does Zookeeper not resolve ties since it doesn't
tolerate partitions?
Michael
Michael Della Bitta
3 is the minimum if you want to allow a node to go down.
1 is the minimum if you want the thing to work at all - but if the 1 goes down,
ZooKeeper may stop working…
- Mark
On Dec 6, 2012, at 2:21 PM, Jack Krupansky j...@basetechnology.com wrote:
Rewind.
If 1 is the minimum, what is the 3
On Dec 6, 2012, at 2:32 PM, Michael Della Bitta
michael.della.bi...@appinions.com wrote:
I just rethought what I wrote and it doesn't make any sense. :)
If you have two remaining nodes left when you have a three node ensemble,
how are ties broken? Or does Zookeeper not resolve ties since
I trust that you have the right answer, Mark, but maybe I'm just struggling
to parse this statement: the remaining two machines do not constitute a
majority.
If you start with 3 zk and lose one, you have an ensemble that does not
constitute a majority.
So, the question is what can or can't
On Dec 6, 2012, at 2:55 PM, Jack Krupansky j...@basetechnology.com wrote:
I mean, there must be some hard-core downside, other than that you can't lose
any more nodes.
Nope, not really. You just can't lose any more nodes.
Technically, you will also lose a bit of read performance - but write
I also did a test running a load directed at one single server in the cloud
and checked the CPU usage of the other servers. It seems that even when there is
no load directed at those servers, there is a CPU spike each minute. Did you
also do this test on SolrCloud; any observations or suggestions?
On Thu, Dec 6, 2012 at 5:55 PM, Jack Krupansky j...@basetechnology.com wrote:
I trust that you have the right answer, Mark, but maybe I'm just struggling
to parse this statement: the remaining two machines do not constitute a
majority.
If you start with 3 zk and lose one, you have an ensemble
On Thu, Dec 6, 2012 at 12:17 PM, Sandeep Mestry sanmes...@gmail.com wrote:
I followed Michael's advice and the timings reduced to a couple of hours now
from 6-8 hours :-)
Just changing from mmap to NIO, eh? What does your system look like?
operating system, JVM, drive, memory, etc?
-Yonik
see: http://wiki.apache.org/solr/DistributedSearch
joins aren't supported in distributed search. Any time you have more than
one shard in SolrCloud, you are, by definition, doing distributed search.
Best
Erick
On Wed, Dec 5, 2012 at 10:16 AM, adm1n evgeni.evg...@gmail.com wrote:
Hi,
I'm
I suspect that you're seeing a timeout issue and the simplest fix might be
to up the timeouts, probably at the servlet-level.
You might get some evidence that this is the issue if your log files for
the time when this happens show some unusual activity, garbage collection
is a popular reason for
I've seen the Waiting until we see... message as well; it seems to me to
be an artifact of bouncing servers rapidly. It took a lot of patience to
wait until the timeout value got all the way to 0, but when it did the
system recovered.
As to your original problem, are you possibly getting page
Not quite. Too much memory for the JVM means that it starves the op system
and the caching that goes on there. Objects that consume memory are created
all the time in Solr. They won't be recovered until some threshold is
passed. So you can be sure that the more memory you allocate to the JVM,
the
Why not do this at the app level? You can simply reorder the docs returned
in your groups by score and display it that way.
Or am I misunderstanding your requirement?
Best
Erick
On Thu, Dec 6, 2012 at 11:03 AM, Tom Mortimer bano...@gmail.com wrote:
Thanks, but even with group.main=true the
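Erick's suggestion, re-ordering at the app level, is a few lines in any client. A sketch, assuming the grouped response has already been parsed into per-group lists of docs with scores (group.limit=3 has already capped each group):

```python
def flatten_by_score(groups):
    """Flatten grouped results into one list ordered purely by score,
    highest first, instead of group order."""
    docs = [doc for group in groups for doc in group]
    return sorted(docs, key=lambda d: d["score"], reverse=True)

# Illustrative parsed groups, each already limited to its top docs.
groups = [
    [{"id": "a1", "score": 9.0}, {"id": "a2", "score": 3.0}],
    [{"id": "b1", "score": 7.5}, {"id": "b2", "score": 5.0}],
]
result = flatten_by_score(groups)
```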
+1 to using IntelliJ's remote debugging facilities.
I've done this with Tomcat too - just edit catalina.sh to add the parameters to
the JVM invocation that the IntelliJ remote run configuration suggests.
With Tomcat you'll have to build the war using the Ant build, but that's more
sensible
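For reference, the standard JPDA flags that go into catalina.sh look like this (port 5005 is just a common choice; use whatever your IDE's remote run configuration suggests):

```shell
# In catalina.sh (or better, setenv.sh), before Tomcat starts:
JAVA_OPTS="$JAVA_OPTS -agentlib:jdwp=transport=dt_socket,server=y,suspend=n,address=5005"
export JAVA_OPTS
```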
But that is the context I was originally referring to - that with 4 zk you
can lose only one, that you can't lose two. So, if you want to tolerate a
loss on one, 4 zk would be the minimum... but then it was claimed that you
COULD start with 3 zk and loss of one would be fine. I mean whether you
well, you could probably do what you want. Go ahead and index on the super
cool AWS instance, just don't bring the replicas up yet. All the indexing
is going to this machine. Once your index is constructed, bring up
replicas. Old-style replication will take place and you should be off to
the
The Zookeeper ensemble knows the total size. It does not adjust it each time
that a machine is partitioned or down.
Two machines is not a quorum for a four machine ensemble.
Why do you think that the documentation would get this wrong?
wunder
On Dec 6, 2012, at 4:14 PM, Jack Krupansky wrote:
Yes - it means that 001 went down (or more likely had its connection to
ZooKeeper interrupted - that's what I mean about a session timeout). If the
Solr-ZK link is broken for longer than the session timeout, that will trigger a
leader election, and when the connection is reestablished, the node
It's still an unresolved mystery, for now.
-- Jack Krupansky
-Original Message-
From: Walter Underwood
Sent: Thursday, December 06, 2012 7:30 PM
To: solr-user@lucene.apache.org
Subject: Re: Minimum HA Setup with SolrCloud
The Zookeeper ensemble knows the total size. It does not
What is the mystery? Two is not more than half of four. Therefore, two machines
is not a quorum for a four machine Zookeeper ensemble.
wunder
On Dec 6, 2012, at 4:50 PM, Jack Krupansky wrote:
It's still an unresolved mystery, for now.
-- Jack Krupansky
-Original Message- From:
OK, we think we found the issue here. When SolrCloud is started without
specifying the numShards argument, it starts with a single shard but still
thinks that there are multiple shards, so it forwards every single query to
all the nodes in the cloud. We did a tcpdump on the node where queries
There are some gains to be made in Solr's distributed search code. A few
weeks ago I spent time profiling dist search using dtrace/btrace and
found some areas for improvement. I planned on writing up some blog posts
and providing patches but I'll list them off now in case others have input.
The part I still find confusing is that if you start with 3 and lose 1, you
have 2, which means you can't always break a tie, right? How is this
explained? As opposed to saying that 4 is the minimum if you need to
tolerate a loss of 1.
-- Jack Krupansky
-Original Message-
From:
Configure an ensemble of three. When one goes down, you still have an ensemble
of three, but with one down. The ensemble size is not reset after failures.
wunder
On Dec 6, 2012, at 5:20 PM, Jack Krupansky wrote:
The part I still find confusing is that if you start with 3 and lose 1, your
And this is precisely why the mystery remains - because you're only
describing half the picture! Describe the rest of the picture - including
what exactly those two zks can and can't do, including resolution of ties
and the concept of constitu.
-- Jack Krupansky
-Original Message-
Oops...
And this is precisely why the mystery remains - because you're only
describing half the picture! Describe the rest of the picture - including
what exactly those two zks can and can't do, including resolution of ties
and the concept of constituting a majority and a quorum.
I'm not
I should have sent this some time ago:
https://issues.apache.org/jira/browse/SOLR-3940 Rejoining the leader election
incorrectly triggers the code path for a fresh cluster start rather than fail
over.
The above is a somewhat ugly bug.
It means that if you are playing around with recovery and
Ryan, my new best friend! Please, file JIRA issue(s) for these items!
I'm sure you will get some feedback.
- Mark
On Dec 6, 2012, at 5:09 PM, Ryan Zezeski rzeze...@gmail.com wrote:
There are some gains to be made in Solr's distributed search code. A few
weeks ago I spent time profiling
On Dec 6, 2012, at 5:08 PM, sausarkar sausar...@ebay.com wrote:
We solved the issue by explicitly adding numShards=1 argument to the solr
start up script. Is this a bug?
Sounds like it…perhaps related to SOLR-3971…not sure though.
- Mark
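For anyone hitting the same issue: with the example Jetty setup, pinning the shard count at first startup looks like this (zkHost value is illustrative):

```shell
# First start of the cluster: fix the shard count explicitly.
java -DzkHost=zk1:2181,zk2:2181,zk3:2181 -DnumShards=1 -jar start.jar
```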
Hi,
I'm generating a Solr index using Solr 3.3 and Apache Tomcat 7.0.19. Sometimes my
Tomcat gets hung, giving the below error in the log.
SEVERE: Full Import
failed:org.apache.solr.handler.dataimport.DataImportHandlerException:
java.lang.OutOfMemoryError: PermGen space
at
On Thu, Dec 6, 2012 at 8:42 PM, Jack Krupansky j...@basetechnology.com wrote:
And this is precisely why the mystery remains - because you're only
describing half the picture! Describe the rest of the picture - including
what exactly those two zks can and can't do, including resolution of ties
You might consider implementing some jmx tooling. Nagios is one of several
such engines.
wiki.apache.org/tomcat/FAQ/Monitoring
On Thursday, December 6, 2012, aniljayanti wrote:
Hi,
I'm generating a Solr index using Solr 3.3 and Apache Tomcat 7.0.19. Sometimes my
Tomcat gets hung, giving the below
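Enabling remote JMX so a monitoring tool like Nagios can poll Tomcat is a matter of JVM flags; a minimal sketch for testing only (no auth or SSL; the port is illustrative):

```shell
# In setenv.sh / catalina.sh - expose JMX for monitoring tools.
CATALINA_OPTS="$CATALINA_OPTS \
  -Dcom.sun.management.jmxremote \
  -Dcom.sun.management.jmxremote.port=9010 \
  -Dcom.sun.management.jmxremote.authenticate=false \
  -Dcom.sun.management.jmxremote.ssl=false"
export CATALINA_OPTS
```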
Thank you. I will read about these commands.
I don't copy anything anywhere. I just edit the code and click Run; IDEA does
everything for me. I guess IDEA's artifacts are exactly for these routines.
Anyway, there are no such instructions as you described anywhere in the Solr
wiki, so it's hard
I think I've figured out how to express it: A zk node can offer its services
if it is able to communicate with more than half of the specified ensemble
size, which assures that there is no split brain, where two or more
competing groups of inter-communicating nodes could offer services that
On 7 December 2012 12:30, Zeng Lames lezhi.z...@gmail.com wrote:
Hi,
I want to know: is there any plugin/tool to import data from Excel to Solr?
[...]
You could export to CSV from Excel, and import the CSV into Solr:
http://wiki.apache.org/solr/UpdateCSV
Regards,
Gora
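Gora's route can be scripted; a sketch that renders rows (e.g. exported from an Excel sheet) as a CSV ready for the /update/csv handler. Field names here are illustrative; the header row must list your actual Solr field names:

```python
import csv
import io

# Rows as dicts, e.g. produced by an Excel-export step.
rows = [
    {"id": "1", "name": "Alpha", "price": "9.99"},
    {"id": "2", "name": "Beta",  "price": "4.50"},
]

def to_solr_csv(rows, fields):
    """Render rows as CSV with a header line, as /update/csv expects."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fields)
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()

doc = to_solr_csv(rows, ["id", "name", "price"])
```

The resulting file can then be POSTed to your Solr instance's /update/csv handler as described on the UpdateCSV wiki page linked above.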