Lowering the precisionStep of the trie field could help. With a
precisionStep of 8, there would be 256 top-level terms to step over.
If you lowered it to 6, that would cut it to 64; lowering it to 4 would cut it to 16.
The fastest would be to index a separate field to indicate presence or
absence of your
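As a minimal sketch, a trie field with a lower precisionStep might be declared in schema.xml like this (the type and field names here are assumptions, not from the original message):

<!-- precisionStep=4 keeps the coarsest level at 2^4 = 16 top-level terms -->
<fieldType name="tint" class="solr.TrieIntField" precisionStep="4" omitNorms="true" positionIncrementGap="0"/>
<field name="price" type="tint" indexed="true" stored="true"/>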
Hi Mark,
the README.txt in the main directory contains:
$Id: CHANGES.txt 817424 2009-09-21 21:53:41Z yonik $
I've downloaded the package as artifact from the Hudson server.
Chantal
Mark Miller schrieb:
It was added to trunk on the 11th and shouldn't require a patch. You
sure that nightly was
Sorry! I didn't replace the war file correctly. It was still the one
from the start of August.
Chantal Ackermann schrieb:
Hi Mark,
the README.txt in the main directory contains:
$Id: CHANGES.txt 817424 2009-09-21 21:53:41Z yonik $
I've downloaded the package as artifact from the Hudson server.
Well,
Any explanation why I get different scores then?
Yonik Seeley wrote:
On Fri, Oct 2, 2009 at 8:16 AM, Julian Davchev j...@drun.net wrote:
It looks for 'pari' in the ancestorName field, but the first row looks in
241135 records
while the second row looks in just 187821 records.
The in
Hi,
I'm running a read-only index with SOLR 1.3 on a server with 8GB RAM and the
Heap set to 6GB. The index contains 17 million documents and occupies 63GB of
disc space with compression turned on. Replication frequency from the SOLR
master is 5 minutes. The index should be able to support
Hi,
we need more information to clarify this.
Can you paste your solrconfig.xml and the OOM exception log?
On Mon, Oct 5, 2009 at 5:19 PM, Thomas Koch tho...@koch.ro wrote:
Hi,
I'm running a read only index with SOLR 1.3 on a server with 8GB RAM and
the
Heap set to 6GB. The index contains 17
On Fri, Oct 2, 2009 at 11:31 PM, Prasanna Ranganathan
pranganat...@netflix.com wrote:
Does the PatternReplaceFilter have an option where you can keep the
original token in addition to the modified token? From what I looked at, it
does not seem to, but I want to confirm.
No, it does
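One hedged workaround, assuming a schema where the filter runs on a "title" field (these field names are hypothetical): keep an unfiltered copy of the field via copyField so the original tokens remain searchable alongside the pattern-replaced ones.

<field name="title" type="text_pattern" indexed="true" stored="true"/>
<field name="title_raw" type="string" indexed="true" stored="false"/>
<!-- copy the raw value so both the original and replaced forms are indexed -->
<copyField source="title" dest="title_raw"/>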
On Sat, Oct 3, 2009 at 1:16 AM, Elaine Li elaine.bing...@gmail.com wrote:
Hi,
My doc has three fields, say field1, field2, field3.
My search would be q=field1:string1 field2:string2. I also need to
do some computation and comparison of the string1 and string2 with the
contents in field3
On Sun, Oct 4, 2009 at 8:05 PM, Paul Rosen p...@performantsoftware.com wrote:
Hi,
I've been trying to experiment with merging, but have been running into
some problems.
First, I'm using ruby and the solr-ruby-0.0.7 gem. It looks like there is
no support in that gem for merging. Have I
On Mon, Oct 5, 2009 at 10:24 AM, Christian Zambrano czamb...@gmail.com wrote:
I am really surprised that a query for behaviour returns behavior as a
suggestion only when the parameter spellcheck.onlyMorePopular=true is
present. I re-read the documentation and I see nothing that will imply that
Hi,
In my solrconfig.xml file, can anyone let me know what <str name="qf"> and
<str name="mm"> represent in the configuration below?
<requestHandler name="partitioned" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="echoParams">explicit</str>
Hi Gasol Wu,
thanks for your reply. I tried to make the config and syslog shorter and more
readable.
solrconfig.xml (shortened):
<config>
  <indexDefaults>
    <useCompoundFile>false</useCompoundFile>
    <mergeFactor>15</mergeFactor>
    <maxBufferedDocs>1500</maxBufferedDocs>
It seems that you need Faceted Search:
http://www.lucidimagination.com/Community/Hear-from-the-Experts/Articles/Faceted-Search-Solr
On Fri, Oct 2, 2009 at 3:35 PM, Julian Davchev j...@drun.net wrote:
Hi,
Long story short: how can I take every 100th row from a Solr result set?
What would syntax
Both of those are parameters for the dismax query, described below:
http://wiki.apache.org/solr/DisMaxRequestHandler
Koji
bhaskar chandrasekar wrote:
Hi,
In my solrconfig.xml file, can anyone let me know what <str name="qf"> and
<str name="mm"> represent in the configuration below?
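For illustration only, a hedged sketch of qf and mm inside a dismax handler's defaults (the field names and values here are assumptions):

<!-- qf: the fields, with optional boosts, that dismax searches across -->
<str name="qf">title^2.0 description</str>
<!-- mm: minimum-should-match, how many optional clauses must match -->
<str name="mm">2&lt;-25%</str>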
Probably you want to use
- multivalued field 'authors'
<add>
  <doc>
    <field name="filename">login.php</field>
    <field name="authors">alex</field>
    <field name="authors">brian</field>
    ...
  </doc>
</add>
- return facets for this field
- you can filter unwanted authors whether during indexing process or post
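A minimal sketch of the matching schema.xml declaration (field and type names are assumptions):

<field name="authors" type="string" indexed="true" stored="true" multiValued="true"/>
<!-- then facet on it with: q=*:*&facet=true&facet.field=authors -->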
Hi Shalin,
Thanks for your attention.
I am implementing a language translation search. field1 and field2
hold a sentence pair in two languages. field3 is a table of indexes of
words in field1 and field2. The table was created by some independent
algorithm. If string1 and string2 can be found
Looks like you have a huge document cache, and the warming query must
have a really high rows.
Can you lower the rows to something like 10 on the master?
-Yonik
http://www.lucidimagination.com
On Fri, Oct 2, 2009 at 11:28 AM, Jeff Newburn jnewb...@zappos.com wrote:
The warmers return 11
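For reference, a hedged sketch of a warming query in solrconfig.xml with rows kept small, as suggested above (the query itself is an assumption):

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- keep rows low so warming doesn't flood the document cache -->
    <lst><str name="q">*:*</str><str name="rows">10</str></lst>
  </arr>
</listener>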
Hello Everyone,
I use java -jar start.jar command to start Solr. And when ever i want to
stop it, I kill the process.
Is there any command to stop it?
Thanks in advance.
Sandeep
-
Sandeep Tagore
Hi Elaine,
You can make use of http://wiki.apache.org/solr/FunctionQuery Function
Query to achieve this. You can do the computations in your customized
function to determine whether it is a hit or not.
Sandeep
-
Sandeep Tagore
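One possible starting point, not spelled out in the reply above: custom functions are registered in solrconfig.xml as a ValueSourceParser plugin; the name and class below are hypothetical.

<valueSourceParser name="myfunc" class="com.example.MyFunctionParser"/>
<!-- callable in queries, e.g. q={!func}myfunc(field1,field2) -->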
Sandeep Tagore wrote:
Hello Everyone,
I use java -jar start.jar command to start Solr. And when ever i want to
stop it, I kill the process.
Is there any command to stop it?
Thanks in advance.
Sandeep
-
Sandeep Tagore
Just look up how Jetty works - it's not Solr specific.
One
Hi
I am hoping someone can point me in the right direction with regards to
indexing words that are concatenated together to make other words or product
names.
We have indexed a product database and have come across some search terms
where zero results are returned. There are products in the
How many unique fields are you sorting and faceting on?
Without knowing much, based on what you have said, for a single machine
I would recommend at least 16GB of RAM for your setup. 32GB would be
even better. 17 million docs is definitely doable on a single server, but if
you are faceting/sorting on
Hi Elaine,
couldn't you do that comparison at index time and store the result in an
additional field? At query time you could include that other field as
condition.
Cheers,
Chantal
Elaine Li schrieb:
Hi Shalin,
Thanks for your attention.
I am implementing a language translation search.
Shalin,
Thanks for the clarification. That explains a lot. I should have looked
at the lucene documentation.
On 10/05/2009 05:28 AM, Shalin Shekhar Mangar wrote:
On Mon, Oct 5, 2009 at 10:24 AM, Christian Zambrano czamb...@gmail.com wrote:
I am really surprised that a query for
Hi Sandeep,
I read about this chapter before. It did not mention how to create my
own customized function.
Can you point me to some instructions?
Thanks.
Elaine
On Mon, Oct 5, 2009 at 10:15 AM, Sandeep Tagore
sandeep.tag...@gmail.com wrote:
Hi Elaine,
You can make use of
Hi Chantal,
I thought about that - taking care of the comparison at the index
time. But the user's input scenarios are countless. That method will
not cover all the cases.
Doing comparison on the fly is better. I am just confused which way to
go since I have not done much customization of solr by
I think it doesn't make sense to enable warming if your Solr instance is just
for indexing purposes (it changes if you use it for search as well). You
could also comment out the caches in solrconfig.xml.
Setting queryResultWindowSize and queryResultMaxDocsCached to zero might
also help... (but if
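For example, a hedged sketch of those two settings in solrconfig.xml (the values are illustrative):

<queryResultWindowSize>1</queryResultWindowSize>
<queryResultMaxDocsCached>0</queryResultMaxDocsCached>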
Sorry if this is a very simple question, but I am stuck (and online searches
for this info haven't been fruitful).
Let's say that, in certain circumstances, I want to change the field names
and/or field query values being passed to SOLR.
For example, let's say my unmodified query is
We have indexed a product database and have come across some search terms
where zero results are returned. There are products in the index with
'Borderlands xxx xxx', 'Dragonfly xx xxx' in the title. Searches for
'Borderland' or 'Border Land' and 'Dragon Fly' return zero results
How often are you committing?
Every time you commit, Solr will close the old index and open the new one. If
you are doing this in parallel from multiple jobs (4-5 you mention) then
eventually the server gets behind and you start to pile up commit requests.
Once this starts to happen, it will
On Mon, Oct 5, 2009 at 12:03 PM, Giovanni Fernandez-Kincade
gfernandez-kinc...@capitaliq.com wrote:
Hi,
I’m attempting to index approximately 6 million HTML/Text files using SOLR
1.4/Tomcat6 on Windows Server 2003 x64. I’m running 64 bit Tomcat and JVM.
I’ve fired up 4-5 different jobs that
On Mon, Oct 5, 2009 at 4:42 AM, Julian Davchev j...@drun.net wrote:
Well,
Any explanation why I get different scores then?
I didn't have enough context to see if anything was wrong... by
different scores do you mean that the debugQuery scores don't match
with the scores in the main document
I'm not committing at all actually - I'm waiting for all 6 million to be done.
-Original Message-
From: Feak, Todd [mailto:todd.f...@smss.sony.com]
Sent: Monday, October 05, 2009 12:10 PM
To: solr-user@lucene.apache.org
Subject: RE: Solr Timeouts
How often are you committing?
Every
On Mon, Oct 5, 2009 at 12:30 PM, Giovanni Fernandez-Kincade
gfernandez-kinc...@capitaliq.com wrote:
I'm not committing at all actually - I'm waiting for all 6 million to be done.
You either have solr auto commit set up, or a client is issuing a commit.
-Yonik
http://www.lucidimagination.com
Ok. Guess that isn't a problem. :)
A second consideration... I could see lock contention being an issue with
multiple clients indexing at once. Is there any disadvantage to serializing the
clients to remove lock contention?
-Todd
-Original Message-
From: Giovanni Fernandez-Kincade
Actually, ignore my other response.
I believe you are committing, whether you know it or not.
This is in your provided stack trace
org.apache.solr.handler.RequestHandlerUtils.handleCommit(UpdateRequestProcessor,
SolrParams, boolean)
Is there somewhere other than solrConfig.xml that the autoCommit feature is
enabled? I've looked through that file and found autocommit to be commented out:
<!--
  Perform a <commit/> automatically under certain conditions:
  maxDocs - number of updates since last commit is greater than
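For comparison, a sketch of what the block looks like when enabled; the thresholds below are example values, not from the poster's config:

<autoCommit>
  <maxDocs>10000</maxDocs> <!-- commit after this many added docs -->
  <maxTime>60000</maxTime> <!-- or after this many milliseconds -->
</autoCommit>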
Yes, that's why we decided to expand these terms while indexing.
If we have
bayrische motoren werke => bmw
and I have a document which has bmw in it, searching for text:bayrische does
not give me results. I have to give
text:"bayrische motoren werke" - then it actually takes the synonym and gets
me
Hi Guys
I'm getting crazy with the highlighting in solr. The problem is the follow:
when I submit an exact phrase query, I get the related results and the
related snippets with highlighting. But I've noticed that the *single terms of
the phrase are highlighted too*. Here is an example:
If I start a
Hi Mauricio, thanks for your feedback.
I suppose we will move to a mixed solution Solr on Tomcat and a .Net client
(maybe SolrNet)
But Solr on KVM could be interesting. If I have time I'll try it and I'll
let you know if it works out.
Antonio
2009/9/30 Mauricio Scheffer
This is what one of my SOLR requests look like:
http://titans:8080/solr/update/extract/?literal.versionId=684936&literal.filingDate=1997-12-04T00:00:00Z&literal.formTypeId=95&literal.companyId=3567904&literal.sourceId=0&resource.name=684936.txt&commit=false
Have you verified that all of your
How long is your timeout? Maybe it should be longer, since this is normal
Solr behavior. --wunder
-Original Message-
From: Giovanni Fernandez-Kincade [mailto:gfernandez-kinc...@capitaliq.com]
Sent: Monday, October 05, 2009 9:45 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr
Ok I have eliminated all queries for warming and am still getting the heap
space dump. Any ideas at this point what could be wrong? This seems like a
huge increase in memory to go from indexing without issues to not being able
to even with warming off.
--
Jeff Newburn
Software Engineer,
Using synonyms might be a better solution, because the use of
EdgeNGramTokenizerFactory can create a large number of extra tokens
in the index, which in turn will affect the IDF score.
A query for borderland should have returned
I'm fairly certain that all of the indexing jobs are calling SOLR with
commit=false. They all construct the indexing URLs using a CLR function I
wrote, which takes in a Commit parameter, which is always set to false.
Also, I don't see any calls to commit in the Tomcat logs (whereas normally
Hi everyone,
I have a little question regarding the search engine when a wildcard character
is used in the query.
Let's take the following example :
- I have indexed the word Hésitation (with an accent on the e)
- The filters applied to the field that handles this word result in
Using synonyms might be a better solution, because the use of
EdgeNGramTokenizerFactory can create a large number of extra tokens in the
index, which in turn will affect the IDF score.
Well, I don't see a reason as to why
You are correct.
I would recommend using the Synonym TokenFilter only at index time
unless you have a very good reason to do it at query time.
On 10/05/2009 11:46 AM, darniz wrote:
Yes, that's why we decided to expand these terms while indexing.
If we have
bayrische motoren werke => bmw
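A hedged sketch of index-time-only synonym handling in schema.xml, assuming a synonyms.txt containing a line such as "bayrische motoren werke => bmw" (with an equivalence list and expand="true", both forms would be indexed):

<analyzer type="index">
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <!-- expand synonyms while indexing, so query time needs no synonym filter -->
  <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
</analyzer>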
No filters are applied to wildcard/fuzzy searches.
I couldn't find a reference to this in either the Solr or Lucene
documentation, but I read it in the Solr book from Packt.
On 10/05/2009 12:09 PM, Angel Ice wrote:
Hi everyone,
I have a little question regarding the search engine when a
Note that there is a similar question in
http://www.nabble.com/TermsComponent-to25302503.html#a25312549
OK... next step is to verify that SolrCell doesn't have a bug that
causes it to commit.
I'll try and verify today unless someone else beats me to it.
-Yonik
http://www.lucidimagination.com
On Mon, Oct 5, 2009 at 1:04 PM, Giovanni Fernandez-Kincade
gfernandez-kinc...@capitaliq.com wrote:
I'm
Jeff Newburn wrote:
Ok I have eliminated all queries for warming and am still getting the heap
space dump. Any ideas at this point what could be wrong? This seems like a
huge increase in memory to go from indexing without issues to not being able
to even with warming off.
How about a
No filters are applied to wildcard/fuzzy searches.
Ah! Not like that...
I guess it is just that phrase searches using wildcards are not
analyzed.
Cheers
Avlesh
On Mon, Oct 5, 2009 at 10:42 PM, Christian Zambrano czamb...@gmail.com wrote:
No filters are applied to wildcard/fuzzy
Thanks for your help!
-Original Message-
From: ysee...@gmail.com [mailto:ysee...@gmail.com] On Behalf Of Yonik Seeley
Sent: Monday, October 05, 2009 1:18 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Timeouts
OK... next step is to verify that SolrCell doesn't have a bug that
On Mon, Oct 5, 2009 at 1:00 PM, Jeff Newburn jnewb...@zappos.com wrote:
Ok I have eliminated all queries for warming and am still getting the heap
space dump. Any ideas at this point what could be wrong? This seems like a
huge increase in memory to go from indexing without issues to not being
Avlesh, I don't understand your answer.
First of all, I know of no way of doing wildcard phrase queries.
When I said 'no filters', I meant TokenFilters, which is what I believe
you mean by 'not analyzed'.
On 10/05/2009 12:27 PM, Avlesh Singh wrote:
No filters are applied to wildcard/fuzzy
Would you mind explaining how omitNorm has any effect on the IDF problem
I described earlier?
I agree with your second sentence. I had to use the NGramTokenFilter to
accommodate partial matches.
On 10/05/2009 12:11 PM, Avlesh Singh wrote:
Using synonyms might be a better solution because
We only have 1 custom search component none of the ones you listed.
Additionally, the last heap dump showed LRUCache and 4 instances of
IndexSchema as all of the memory. There were 5 cores live but the other 4
are all empty. I am trying again with all cores offline but the one we are
trying to
I just grabbed another stack trace for a thread that has been similarly
blocking for over an hour. Notice that there is no Commit in this one:
http-8080-Processor67 [RUNNABLE] CPU time: 1:02:05
org.apache.lucene.index.TermBuffer.read(IndexInput, FieldInfos)
fyi, if you don't want to turn off norms entirely, try this option in
lucene 2.9 DefaultSimilarity:
public void setDiscountOverlaps(boolean v)
Determines whether overlap tokens (Tokens with 0 position increment)
are ignored when computing norm. By default this is false, meaning
overlap tokens
First of all, I know of no way of doing wildcard phrase queries.
http://wiki.apache.org/lucene-java/LuceneFAQ#Can_I_combine_wildcard_and_phrase_search.2C_e.g._.22foo_ba.2A.22.3F
When I said 'no filters', I meant TokenFilters, which is what I believe you
mean by 'not analyzed'.
Analysis is a
Zambrano is right, Laurent. The analyzers for a field are not invoked for
wildcard queries. Your custom filter is not even getting executed at
query-time.
If you want to enable wildcard queries, preserving the original token (while
processing each token in your filter) might work.
Cheers
Avlesh
Hi There,
Maybe I'm missing something, but I can't seem to get the dismax
request handler to perform an OR query. It appears that OR is removed
by the stop words. I'd like to do something like
qt=dismax&q=red+OR+green and get all green and all red results.
Thanks,
David
On 10/05/2009 01:18 PM, Avlesh Singh wrote:
First of all, I know of no way of doing wildcard phrase queries.
http://wiki.apache.org/lucene-java/LuceneFAQ#Can_I_combine_wildcard_and_phrase_search.2C_e.g._.22foo_ba.2A.22.3F
Thanks for that link
When I said not filters, I meant
David,
If your schema includes fields with analyzers that use the
StopFilterFactory and the dismax QueryHandler is set-up to search within
those fields, then you are correct.
On 10/05/2009 01:36 PM, David Giffin wrote:
Hi There,
Maybe I'm missing something, but I can't seem to get the
Zambrano, I was too quick to respond to your idf explanation. I definitely
did not mean that idf and length-norms are the same thing.
Andrew, this is how I would have done it -
First, I would create a field type called prefix_text as underneath in my
schema.xml:
<fieldType name="prefix_text"
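A hedged reconstruction of what the truncated prefix_text type might have looked like, using EdgeNGram at index time only (the class choices and gram sizes are assumptions):

<fieldType name="prefix_text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- index prefixes so partial matches hit without wildcards -->
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>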
So, I removed the stop word OR from the stopwords file and got the same
result. Using the standard query handler syntax like this
fq=((tags:red)+OR+(tags:green)) I get 421,000 results. Using dismax
q=red+OR+green I get 29,000 results. The debug output from
parsedquery_toString shows this:
Ok we have done some more testing on this issue. When I only have the 1
core the reindex completes fine. However, when I added a second core with
no documents it runs out of heap again. This time the heap was 322Mb of
LRUCache. The 1 query that warms returns exactly 2 documents so I have no
Is there a reliable way to safely clean up index directories? This is needed
mainly on slave side as in several situations, an old index directory is
replaced with a new one, and I'd like to remove those that are no longer in
use.
Thanks,
--
J
On Mon, Oct 5, 2009 at 4:54 PM, Jeff Newburn jnewb...@zappos.com wrote:
Ok we have done some more testing on this issue. When I only have the 1
core the reindex completes fine. However, when I added a second core with
no documents it runs out of heap again. This time the heap was 322Mb of
Hi,
I am new to Solr. I am using Solr version 1.3.
I would like to index XML files using SolrJ API. I have gone through solr
mailing list's emails and have been able to index XML files. But when I try to
query on those files using SolrJ, I get no output. Especially, I do not find
correct
It looks like you have some confusion about queries vs. facets. You may want to
look at the Solr wiki regarding facets a bit. In the meanwhile, if you just
want to query for that field containing 21...
I would suggest that you don't set the query type, don't set any facet fields,
and only set
Have you looked at snapcleaner?
http://wiki.apache.org/solr/SolrCollectionDistributionScripts#snapcleaner
http://wiki.apache.org/solr/CollectionDistribution#snapcleaner
Bill
On Mon, Oct 5, 2009 at 4:58 PM, solr jay solr...@gmail.com wrote:
Is there a reliable way to safely clean up index
We use the snapcleaner script.
http://wiki.apache.org/solr/SolrCollectionDistributionScripts#snapcleaner
Will that do the job?
-Todd
-Original Message-
From: solr jay [mailto:solr...@gmail.com]
Sent: Monday, October 05, 2009 1:58 PM
To: solr-user@lucene.apache.org
Subject: cleanup old
I use it in our env (Prod); it has been working fine for years. Note it only
cleans up the snapshots, not the index.
I added it to a cron job that runs once a day to clean up
-francis
-Original Message-
From: Feak, Todd [mailto:todd.f...@smss.sony.com]
Sent: Monday, October 05, 2009 2:34 PM
Hi,
Thanks a lot. It worked!!
I was wondering if there is a way in SolrJ to print out the size of the index
being generated? Or else how do I determine the total size of the generated
index ?
Thanks,
Chaitali
--- On Mon, 10/5/09, Feak, Todd todd.f...@smss.sony.com wrote:
From: Feak,
Can someone please give me some pointers to the questions in my earlier
email? Any and all help is much appreciated.
Regards,
Prasanna.
On 10/2/09 11:01 AM, Prasanna Ranganathan pranganat...@netflix.com
wrote:
Does the PatternReplaceFilter have an option where you can keep the
I just saw the reply from Shalin after sending this email. Kindly excuse.
On 10/5/09 5:17 PM, Prasanna Ranganathan pranganat...@netflix.com wrote:
Can someone please give me some pointers to the questions in my earlier
email? Any and all help is much appreciated.
Regards,
On 10/5/09 2:46 AM, Shalin Shekhar Mangar shalinman...@gmail.com wrote:
Alternatively, is there a filter available which takes in a pattern and
produces additional forms of the token depending on the pattern? The use
case I am looking at here is using such a filter to automate synonym
Hi there,
I'm evaluating Solr as a replacement for our current search server, and am
trying to determine what the best strategy would be to implement our business
needs. Our problem is that we have a catalog schema with products and skus,
one to many. The most relevant content being indexed
I've added a unit test for the problem down below. It feeds document
field data into the XPathEntityProcessor via the
FieldReaderDataSource, and the XPath EP does not emit unpacked fields.
Running this under the debugger, I can see the supplied StringReader,
with the XML string, being piped into
Prasanna,
Wouldn't it be better to use built-in token filters at both index and
query that will convert 'it!' to just 'it'? I believe the
WordDelimiterFilterFactory will do that for you.
Christian
On Oct 5, 2009, at 7:31 PM, Prasanna Ranganathan pranganat...@netflix.com
wrote:
On
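A sketch of the suggestion above, assuming a whitespace tokenizer in front (the parameter values are illustrative): WordDelimiterFilterFactory splits "it!" on the punctuation boundary and emits just "it".

<filter class="solr.WordDelimiterFilterFactory"
        generateWordParts="1" generateNumberParts="1"
        catenateWords="0" catenateNumbers="0" catenateAll="0"/>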
On 10/5/09 8:59 PM, Christian Zambrano czamb...@gmail.com wrote:
Wouldn't it be better to use built-in token filters at both index and
query that will convert 'it!' to just 'it'? I believe the
WordDelimiterFilterFactory will do that for you.
We do have a field that uses