Hoss
many thanks for the reply
Paul
On 8 March 2011 19:45, Chris Hostetter hossman_luc...@fucit.org wrote:
: 1. Why the problem occurs (has something changed between 1.4.1 and 3x)?
Various pieces of code dealing with config parsing have changed since
1.4.1 to be better about verifying
Hi all,
Does anyone know what the 'm' on the y-axis stands for in the req/sec graph for
the update handler?
--
Thanks & Regards,
Isan Fulia.
I am using Solr for NRT with this version of Solr ...
Solr Specification Version: 4.0.0.2010.10.26.08.43.14
Solr Implementation Version: 4.0-2010-10-26_08-05-39 1027394 - hudson -
2010-10-26 08:43:14
Lucene Specification Version: 4.0-2010-10-26_08-05-39
Lucene Implementation Version:
Hi all,
I just improved the Solr UIMA integration wiki page [1] so if anyone is
using it and/or has any feedback it'd be more than welcome.
Regards,
Tommaso
[1] : http://wiki.apache.org/solr/SolrUIMA
question: http://wiki.apache.org/solr/NearRealtimeSearchTuning
'PERFORMANCE WARNING: Overlapping onDeckSearchers=x'
I got this message.
In my solrconfig.xml maxWarmingSearchers=4; if I set it to 1 or 2 I get an
exception. With 4 I get nothing but the performance warning. The wiki article
Are you using shards, or is everything in the same index?
- shards == distributed search over several cores? Yes, but not always;
generally not.
What problem did you experience with the StatsComponent?
- if I use stats on my 34-million-doc index, no matter how many docs are found,
the sum takes
I am using NRT, and the caches are not always warmed; I think this is a
problem!?
-
--- System
One Server, 12 GB RAM, 2 Solr Instances, 7 Cores,
1 core with 31 million documents, the other cores ~100,000
- Solr1 for
Great work!
On Wednesday 09 March 2011 11:20:41 Tommaso Teofili wrote:
Hi all,
I just improved the Solr UIMA integration wiki page [1] so if anyone is
using it and/or has any feedback it'd be more than welcome.
Regards,
Tommaso
[1] : http://wiki.apache.org/solr/SolrUIMA
--
Markus Jelsma
I tried to create an NRT setup like in the wiki but I got some problems with
autowarming and onDeckSearchers.
Every minute I start a delta import on one core, and the other core starts a
commit every minute so the new documents can be searched.
The wiki says ... = 1 searcher and filterCache warmupCount=3600. With this,
maxWarmingSearchers=1 is good for current stable Solr versions where memory is
important. Overlapping warming searchers can be extremely memory-consuming. I
don't know how cache warming behaves with NRT.
On Wednesday 09 March 2011 11:27:39 stockii wrote:
question:
Yes, I think this should be pushed upstream - insert a tee in the
document stream so that all documents go to both masters.
Then use a load balancer to make requests of the masters.
The tee itself then becomes a possible single point of failure, but
you didn't say anything about the
Hi Peter,
When I execute the commands you mentioned, nothing happened.
Below I show you the commands I executed and their responses.
Sorry, but I don't know how to enable the log; my JRE is at its defaults.
Remember I'm running the example-DIH (trunk\solr\example\example-DIH\solr);
java
Does it make sense to update Solr to get SOLR-571?
-
--- System
One Server, 12 GB RAM, 2 Solr Instances, 7 Cores,
1 core with 31 million documents, the other cores ~100,000
- Solr1 for Search-Requests - commit every Minute
You have a large index with tough performance requirements on one server.
I would analyze your system to see if it's got any bottlenecks.
Watch out for auto-warming taking so long that it does not finish before the
next commit()
Watch out for too-frequent commits
Monitor mem usage (JConsole or
Hi,
You've included some output in your message, so I presume something
*did* happen when you ran the 'status' command (but it might not be
what you wanted to happen :-)
If you run:
http://localhost:8983/solr/mail/dataimport?command=status
and you get something like this back:
str
Jae,
NRT hasn't been implemented in Solr as of yet, I think partially
because major features such as replication, caching, and uninverted
faceting suddenly are no longer viable, e.g., it's another round of
testing etc. It's doable; however, I think the best approach is a
separate request call
I think it's best to turn the warmupCount down to zero, because usually
there isn't time between the creation of new searchers to run the
warmup queries; e.g., it would negatively impact the desired goal of
low-latency new index readers.
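In solrconfig.xml this corresponds to the cache autowarm counts; a minimal sketch, assuming defaults otherwise (the cache sizes here are illustrative, not from this thread):

```xml
<!-- solrconfig.xml sketch: disable autowarming for low-latency NRT commits -->
<filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
<queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
<!-- keep a small cap on concurrent warming searchers -->
<maxWarmingSearchers>2</maxWarmingSearchers>
```

With autowarmCount=0, new searchers open without replaying cached queries, trading warm caches for faster visibility of new documents.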
On Wed, Mar 9, 2011 at 3:41 AM, stockii stock.jo...@googlemail.com
If you're using the delta import handler the problem would seem to go
away because you can have two separate masters running at all times,
and if one fails, you can then point the slaves to the secondary
master, that is guaranteed to be in sync because it's been importing
from the same database?
This has since been fixed. The problem was that there was not enough memory
on the machine. It works just fine now.
On Tue, Mar 8, 2011 at 6:22 PM, Chris Hostetter hossman_luc...@fucit.orgwrote:
: INFO: Creating a connection for entity id with URL:
:
Peter,
You're right; maybe I explained it wrong because of my English.
I did everything you told me. I think it doesn't find the folder when indexing.
What do you think?
Below I show you part of the log.
09/03/2011 11:52:01 org.apache.solr.core.SolrCore execute
INFO: [mail] webapp=/solr
Hi,
When you ran the status command, what was the output?
On Wed, Mar 9, 2011 at 2:55 PM, Matias Alonso matiasgalo...@gmail.com wrote:
Peter,
You're right; maybe I explained it wrong because of my English.
I did everything you told me. I think it doesn't find the folder when indexing.
What you
Log:
09/03/2011 11:54:58 org.apache.solr.core.SolrCore execute
INFO: [mail] webapp=/solr path=/dataimport params={command=status} status=0
QTime=0
XML response:
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str
I'm trying to do a search with SolrJ using digest authentication, but
I'm getting the following error:
org.apache.solr.common.SolrException: Unauthorized
I'm setting up SolrJ this way:
HttpClient client = new HttpClient();
List<String> authPrefs = new ArrayList<String>();
If you have a wrapper, like an indexer app which prepares solr docs and
sends them into solr, then it is simple. The wrapper is your 'tee' and
it can send docs to both (or N) masters.
-Original Message-
From: Michael Sokolov [mailto:soko...@ifactory.com]
Sent: Wednesday, March 09, 2011
Hi,
- Original Message
If you're using the delta import handler the problem would seem to go
away because you can have two separate masters running at all times,
and if one fails, you can then point the slaves to the secondary
master, that is guaranteed to be in sync because it's
Hi,
- Original Message
From: Robert Petersen rober...@buy.com
To: solr-user@lucene.apache.org
Sent: Wed, March 9, 2011 11:40:56 AM
Subject: RE: True master-master fail-over without data gaps
If you have a wrapper, like an indexer app which prepares solr docs and
sends them into
Oh, there is no DB involved. Think of a document stream continuously coming
in,
a component listening to that stream, grabbing docs, and pushing them to
master(s).
I don't think Solr is designed for this use case, eg, I wouldn't
expect deterministic results with the current architecture as
Hi,
- Original Message
Yes, I think this should be pushed upstream - insert a tee in the
document stream so that all documents go to both masters.
Then use a load balancer to make requests of the masters.
Hm, but this makes the tee app aware of this. What if I want to hide
Hi,
- Original Message
Oh, there is no DB involved. Think of a document stream continuously
coming
in,
a component listening to that stream, grabbing docs, and pushing it to
master(s).
I don't think Solr is designed for this use case, eg, I wouldn't
expect
Currently I use an application connected to a queue containing incoming
data which my indexer app turns into solr docs. I log everything to a
log table and have never had an issue with losing anything. I can trace
incoming docs exactly, and keep timing data in there also. If I added a
second
Hi,
- Original Message
I'd honestly think about buffer the incoming documents in some store that's
actually made for fail-over persistence reliability, maybe CouchDB or
something.
And then that's taking care of not losing anything, and the problem becomes
how
we make sure
On Mar 9, 2011, at 9:02 AM, Otis Gospodnetic wrote:
You mean it's not possible to have 2 masters that are in nearly real-time
sync?
How about with DRBD? I know people use DRBD to keep 2 Hadoop NNs (their edit
logs) in sync to avoid the current NN SPOF, for example, so I'm thinking this
...but the index resides on disk doesn't it??? lol
-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Wednesday, March 09, 2011 9:06 AM
To: solr-user@lucene.apache.org
Subject: Re: True master-master fail-over without data gaps
Hi,
- Original
Hi,
- Original Message
Currently I use an application connected to a queue containing incoming
data which my indexer app turns into solr docs. I log everything to a
log table and have never had an issue with losing anything.
Yeah, if everything goes through some storage that
On disk, yes, but only indexed, and thus far enough from the original content
that storing terms in Lucene's inverted index is acceptable.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
From: Robert
RAMdisk
...but the index resides on disk doesn't it??? lol
-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Wednesday, March 09, 2011 9:06 AM
To: solr-user@lucene.apache.org
Subject: Re: True master-master fail-over without data gaps
Hi,
This is why there's block cipher cryptography.
On Wed, Mar 9, 2011 at 9:11 AM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:
On disk, yes, but only indexed, and thus far enough from the original content
that storing terms in Lucene's inverted index is acceptable.
Otis
Sematext ::
Brian,
I had the same problem a while back and set the JAVA_OPTS env variable
to something my machine could handle. That may also be an option for
you going forward.
Adam
On Wed, Mar 9, 2011 at 9:33 AM, Brian Lamb
brian.l...@journalexperts.com wrote:
This has since been fixed. The problem was
I guess you could put a LB between slaves and masters, never thought of
that! :)
-Original Message-
From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com]
Sent: Wednesday, March 09, 2011 9:10 AM
To: solr-user@lucene.apache.org
Subject: Re: True master-master fail-over without data
Right. LB VIP on both sides of master(s). Black box.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
From: Robert Petersen rober...@buy.com
To: solr-user@lucene.apache.org
Sent: Wed, March
Is there a way to perform string logic on the key field using a subquery or
some other method?
I.e., if the left 4 characters of the key are ABCD, then include or exclude
those from the search.
Here is the layman's pseudo code for what I'm wanting to do:
*:* AND LEFT(KEY, 4) 'abcd'
How about something like:
for exclusion
+*:* -KEY:abcd*
for inclusion
+*:* +KEY:abcd*
Best
Erick
On Wed, Mar 9, 2011 at 12:34 PM, Daniel Baughman da...@hostworks.com wrote:
Is there a way to perform string logic on the key field using a subquery or
some other method.
IE. If the left 4
On 3/9/2011 12:05 PM, Otis Gospodnetic wrote:
But check this! In some cases one is not allowed to save content to
disk (think
copyrights). I'm not making this up - we actually have a customer with this
cannot save to disk (but can index) requirement.
Do they realize that a Solr index is on
Hi,
- Original Message
From: Walter Underwood wun...@wunderwood.org
On Mar 9, 2011, at 9:02 AM, Otis Gospodnetic wrote:
You mean it's not possible to have 2 masters that are in nearly real-time
sync?
How about with DRBD? I know people use DRBD to keep 2 Hadoop NNs (their
Hi,
It sounds like if you put those 4 chars in a separate field at index time you
could apply your logic on that at search time.
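One hedged way to do that in schema.xml is an analysis chain that keeps only the first four characters of the key; the field and type names below are invented for illustration, only the idea comes from the thread:

```xml
<!-- schema.xml sketch: index the first 4 chars of the key in a separate field -->
<fieldType name="prefix4" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- keep only the first 4 characters of the value -->
    <filter class="solr.PatternReplaceFilterFactory" pattern="^(.{4}).*$" replacement="$1"/>
  </analyzer>
</fieldType>
<field name="key_prefix" type="prefix4" indexed="true" stored="false"/>
<copyField source="KEY" dest="key_prefix"/>
```

Queries then become simple term clauses, e.g. +*:* +key_prefix:abcd for inclusion or +*:* -key_prefix:abcd for exclusion.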
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
From: Daniel
Can't you skip the SAN and keep the indexes locally? Then you would
have two redundant copies of the index and no lock issues.
Also, Can't master02 just be a slave to master01 (in the master farm and
separate from the slave farm) until such time as master01 fails? Then
master02 would start
After about 4-5 hours the merge completed (ran out of heap). As you
suggested, it was having memory issues.
Read queries during the merge were working just fine (they were taking
longer than normal, ~30-60 seconds).
I think I need to do more reading on understanding the merge/optimization
Hi,
I'm investigating how to set up a schema like this:
I want to index accounts and the products purchased (multiValued) by
that account but I also need the ability to search by the date the
product was purchased.
It would be easy if the purchase date wasn't part of the requirements.
How
Would having a solr-document represent a 'product purchase per account'
solve your problem?
You could then easily link the date of purchase to the document as well as
the account-number.
e.g:
fields: orderid (key), productid, product-characteristics,
order-characteristics (including date of
Hi all,
I know that I can add sort=score desc to the url to sort in descending
order. However, I would like to sort a MoreLikeThis response which returns
records like this:
<lst name="moreLikeThis">
<result name="3" numFound="113611" start="0" maxScore="0.4392774">
<result name="2" numFound= start=0
Anyone have any clue on this one?
On Tue, Mar 8, 2011 at 2:11 PM, Brian Lamb brian.l...@journalexperts.comwrote:
Hi all,
I am using dataimport to create my index and I want to use docBoost to
assign some higher weights to certain docs. I understand the concept behind
docBoost but I haven't
You can use the ScriptTransformer to perform the boost calculation and addition.
http://wiki.apache.org/solr/DataImportHandler#ScriptTransformer
<dataConfig>
<script><![CDATA[
function f1(row) {
// Add boost
row.put('$docBoost', 1.5);
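For reference, a complete data-config of this shape might look like the following; the entity, query, and field names are invented for illustration, and only the $docBoost line comes from the thread:

```xml
<dataConfig>
  <script><![CDATA[
    function f1(row) {
      // Add boost to every imported row
      row.put('$docBoost', 1.5);
      return row;
    }
  ]]></script>
  <document>
    <entity name="item" query="SELECT id, title FROM item"
            transformer="script:f1">
      <field column="id" name="id"/>
      <field column="title" name="title"/>
    </entity>
  </document>
</dataConfig>
```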
Hi,
Original Message
From: Robert Petersen rober...@buy.com
Can't you skip the SAN and keep the indexes locally? Then you would
have two redundant copies of the index and no lock issues.
I could, but then I'd have the issue of keeping them in sync, which seems more
fragile.
Hi,
You'll benefit from watching this segment merging video:
http://blog.mikemccandless.com/2011/02/visualizing-lucenes-segment-merges.html
And you'll appreciate the graph at the bottom:
http://code.google.com/p/zoie/wiki/ZoieMergePolicy
Otis
Sematext :: http://sematext.com/ :: Solr -
Hi Otis,
Have you considered using Solandra with Quorum writes
to achieve master/master with CA semantics?
-Jake
On Wed, Mar 9, 2011 at 2:48 PM, Otis Gospodnetic otis_gospodne...@yahoo.com
wrote:
Hi,
Original Message
From: Robert Petersen rober...@buy.com
Can't you skip
You will need to cap the maximum segment size using
LogByteSizeMergePolicy.setMaxMergeMB. Then you will only have
segments that are of an optimal size, and Lucene will not try to
create gigantic segments. I think, though, on the query side you will
run out of heap space due to the terms index
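In solrconfig.xml of that era this can be expressed roughly as follows; the 1024 MB cap is an arbitrary illustration, not a recommendation from the thread:

```xml
<!-- solrconfig.xml sketch: cap segment size so Lucene never builds giant segments -->
<indexDefaults>
  <mergePolicy class="org.apache.lucene.index.LogByteSizeMergePolicy">
    <double name="maxMergeMB">1024</double>
  </mergePolicy>
</indexDefaults>
```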
That makes sense. As a follow up, is there a way to only conditionally use
the boost score? For example, in some cases I want to use the boost score
and in other cases I want all documents to be treated equally.
On Wed, Mar 9, 2011 at 2:42 PM, Jayendra Patil jayendra.patil@gmail.com
wrote:
Hi all,
I'm using MoreLikeThis to find similar results but I'd like to exclude
records by the id number. For example, I use the following URL:
http://localhost:8983/solr/search/?q=id:(2 3
5)&mlt=true&mlt.fl=description,id&fl=*,score
How would I exclude record 4 from the MoreLikeThis results?
I
- Forwarded Message -
From: l blevins l.blev...@comcast.net
To: solr user mail solr-user-h...@lucene.apache.org
Sent: Wednesday, March 9, 2011 4:03:06 PM
Subject: some relational-type grouping with search
I have a large database for which we have some good search capabilities now,
Brian,
...?q=id:(2 3 5) -4
Otis
---
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
- Original Message
From: Brian Lamb brian.l...@journalexperts.com
To: solr-user@lucene.apache.org
Sent: Wed, March 9, 2011 4:05:10
Hi,
I am seeing an issue I do not understand and hope that someone can shed some
light on this. The issue is that for a particular search we are seeing a
particular result rank in position 3 on one machine and position 8 on the
production machine. Position 3 is our desired ranking, and roughly
Hi,
In one of the environments I'm working on (4 Solr 1.4.1 nodes with
replication, 3+ million docs, ~5.5GB index size, high commit rate (~1-2min),
high query rate (~50q/s), high number of updates (~1000docs/commit)) the nodes
continuously run out of memory.
During development we frequently
That doesn't seem to do it. Record 4 is still showing up in the MoreLikeThis
results.
On Wed, Mar 9, 2011 at 4:12 PM, Otis Gospodnetic otis_gospodne...@yahoo.com
wrote:
Brian,
...?q=id:(2 3 5) -4
Otis
---
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem
queryNorm is just a normalizing factor and is the same value across
all the results for a query, just to make the scores comparable.
So even if it varies across environments, you should not worry about it.
Yeah, that just restricts what items are in your main result set (and
adding -4 has no real effect).
The more like this set is constructed based on your main result set, for
each document in it.
As far as I can see from here: http://wiki.apache.org/solr/MoreLikeThis
..there seems to be no
Yes, but the identical index with the identical solrconfig.xml and the
identical query and the identical version of Solr on two different
machines should produce identical results.
So it's a legitimate question why it's not. But perhaps queryNorm isn't
enough to answer that. Sorry, it's out
Thanks. Good to know, but even so my problem remains - the end score should not
be different and is causing a dramatically different ranking of a document (3
versus 7 is dramatic for my client). This must be down to the scoring debug
differences - it's the only difference I can find :(
On Mar
Hello all,
I have a small problem with my faceting fields. In all I create a new faceting
field which is indexed and not stored, and use copyField. The problem is I
facet on category names which have examples like this
Policies Documentation
That's what I think, glad I am not going mad.
I've spent 1/2 a day comparing the config files, checking out from SVN again
and ensuring the databases are identical. I cannot see what else I can do to
make them equivalent. Both servers check out directly from SVN; I am convinced
the files are
Are you sure you have the same config ...
The boost seems different for the field text - text:dubai^0.1 text:dubai
2.286596 = (MATCH) sum of:
  1.6891675 = (MATCH) sum of:
    1.3198489 = (MATCH) max plus 0.01 times others of:
      0.023022119 = (MATCH) weight(text:dubai^0.1 in 1551),
On Wed, Mar 9, 2011 at 4:49 PM, Jayendra Patil
jayendra.patil@gmail.com wrote:
Are you sure you have the same config ...
The boost seems different for the field text - text:dubai^0.1 text:dubai
Yep...
Try adding echoParams=all and see all the parameters solr is acting on.
Oh wow, how did I miss that?
My apologies to anyone who read this post. I should have diffed my custom
dismax handler. Looks like my SVN merge didn't work properly.
Embarrassing.
Thanks everyone ;)
On Mar 9, 2011, at 4:51 PM, Yonik Seeley wrote:
On Wed, Mar 9, 2011 at 4:49 PM, Jayendra Patil
Hi,
I was wondering if it is possible during a query to create a returned
field 'on the fly' (like function query, but for concrete values, not
score).
For example, if I input this query:
q=_val_:product(15,3)&fl=*,score
For every returned document, I get score = 45.
If I change it slightly
I was just about to jump in this conversation to mention Solandra and go fig,
Solandra's committer comes in. :-) It was nice to meet you at Strata, Jake.
I haven't dug into the code yet but Solandra strikes me as a killer way to
scale Solr. I'm looking forward to playing with it; particularly
Doesn't Solandra partition by term instead of document?
On Wed, Mar 9, 2011 at 2:13 PM, Smiley, David W. dsmi...@mitre.org wrote:
I was just about to jump in this conversation to mention Solandra and go fig,
Solandra's committer comes in. :-) It was nice to meet you at Strata, Jake.
I
Wait, if you don't have identical indexes, then why would you expect
identical results?
If your indexes are different, one would expect the results for the same
query to be different -- there are different documents in the index!
The iDF portion of the TF/iDF type algorithm at the base of
Zoie adds NRT to Solr:
http://snaprojects.jira.com/wiki/display/ZOIE/Zoie+Solr+Plugin
I haven't tried it yet but looks cool.
~ David Smiley
Author: http://www.packtpub.com/solr-1-4-enterprise-search-server/
On Mar 9, 2011, at 9:01 AM, Jason Rutherglen wrote:
Jae,
NRT hasn't been
Interesting, does anyone have a summary of what techniques zoie uses to
do this? I don't see any docs on the technical details.
On 3/9/2011 5:29 PM, Smiley, David W. wrote:
Zoie adds NRT to Solr:
http://snaprojects.jira.com/wiki/display/ZOIE/Zoie+Solr+Plugin
I haven't tried it yet but looks
Jason,
Its predecessor, Lucandra, did. But Solandra is a new approach that manages
shards of documents across the cluster for you and uses Solr's distributed
search to query indexes.
Jake
On Mar 9, 2011, at 5:15 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote:
Doesn't Solandra
Jonathan, they have a wiki up there somewhere, including pretty diagrams. If
you have Lucene in Action, Zoie is one of the case studies and is described in
a lot of detail.
Otis
Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
Jake,
Maybe it's time to come up with the Solandra/Solr matrix so we can see
Solandra's strengths (e.g. RT, no replication) and weaknesses (e.g. I think I
saw a mention of some big indices?) or missing feature (e.g. no delete by
query), etc.
Thanks!
Otis
Sematext :: http://sematext.com/
Probably you can just sort by date (one way and then the other) and
limit your result set to a single document. That should free up enough
budget for the bonuses of the highly-placed people, I think :)
On 3/9/2011 4:05 PM, l.blev...@comcast.net wrote:
- Forwarded Message -
From: l
It is not just one document that would be returned; it is one document per
person. That is a little trickier.
- Original Message -
From: Michael Sokolov soko...@ifactory.com
To: solr-user@lucene.apache.org
Cc: l blevins l.blev...@comcast.net
Sent: Wednesday, March 9, 2011 7:46:10
Yeah sure. Let me update this on the Solandra wiki. I'll send across the
link
I think you hit the main two shortcomings atm.
-Jake
On Wed, Mar 9, 2011 at 6:17 PM, Otis Gospodnetic otis_gospodne...@yahoo.com
wrote:
Jake,
Maybe it's time to come up with the Solandra/Solr matrix so we can
Hello,
I'm using a recent build of the trunk (from 3/1). I've noticed that after
the index is up and running for some time I start to get intermittent errors
that look like this:
Mar 2, 2011 9:26:01 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.ClassCastException
The
I created the following SearchComponent that wraps a deduplication filter
around the current query and added it to last-components. It appears to
be working, but is there any way I can improve the performance? Would
this be considered and added to the filterCache? Am I even caching
correctly?
On Wed, Mar 9, 2011 at 8:34 PM, harish.agarwal harish.agar...@gmail.com wrote:
I'm using a recent build of the trunk (from 3/1). I've noticed that after
the index is up and running for some time I start to get intermittent errors
that look like this:
Mar 2, 2011 9:26:01 AM
So it looks like it can handle adding new documents and expiring old
documents. Updating a document is not part of the game.
This would work well for message boards or tweet type solutions.
Solr can do this as well directly. Why wouldn't you just improve the
document and facet caching so that when
Yes, just add an if statement based on a field's value and do a row.put() only
if that other value is a certain value.
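A hedged sketch of that conditional inside the ScriptTransformer; the column name 'type' and the value 'featured' are invented for illustration:

```xml
<script><![CDATA[
  function f1(row) {
    // Only boost documents whose 'type' column has a certain value
    if (row.get('type') == 'featured') {
      row.put('$docBoost', 1.5);
    }
    return row;
  }
]]></script>
```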
On 3/9/11 1:39 PM, Brian Lamb brian.l...@journalexperts.com wrote:
That makes sense. As a follow up, is there a way to only conditionally use
the boost score? For example, in some
Hi,
I'm using Solr 1.4.1.
The scenario involves user uploading multiple files. These have content
extracted using SolrCell, then indexed by Solr along with other information
about the user.
ContentStreamUpdateRequest seemed like the right choice for this - use
addFile() to send file data, and
Please start new threads for new conversations.
On Wed, Mar 9, 2011 at 2:27 AM, stockii stock.jo...@googlemail.com wrote:
question: http://wiki.apache.org/solr/NearRealtimeSearchTuning
'PERFORMANCE WARNING: Overlapping onDeckSearchers=x
i got this message.
in my solrconfig.xml:
In case the exact problem was not clear to somebody:
The problem with FileUpload interpreting file data as regular form fields is
that Solr thinks there are no content streams in the request and throws a
missing_content_stream exception.
On Thu, Mar 10, 2011 at 10:59 AM, Karthik Shiraly