Thanks Mark,
Good, this is probably good enough to give it a try. My analyzers are
normally fast, so doing duplicate analysis (at each replica) is
probably not going to cost a lot, if there is some decent batching.
Can this be somehow controlled (depth of this buffer / time till flush
or some
Hi!
On Wed, Feb 29, 2012 at 22:21, Emmanuel Espina espinaemman...@gmail.com wrote:
No. But probably we can find another way to do what you want. Please
describe the problem and include some numbers to give us an idea of
the sizes that you are handling. Number of documents, size of the
index,
I don't think mm will help here because it defaults to 100%
already by the
following code.
Default behavior of mm has changed recently. So it is a good idea to explicitly
set it to 100%. Then all of the search terms must match.
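Since the default changed, pinning mm explicitly in the handler defaults avoids surprises. A minimal sketch of what that could look like in solrconfig.xml (the handler name and qf fields here are placeholders, not from the thread):

```xml
<!-- Hypothetical edismax handler that pins mm so every search term must match -->
<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <str name="qf">title text</str>
    <!-- explicit mm: do not rely on the (recently changed) default -->
    <str name="mm">100%</str>
  </lst>
</requestHandler>
```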
Regarding multi-word synonym, what is the best way to handle
Hi,
I've got an issue when searching with a search string like: 'title:"Blue
on Blu'. The original search string is: 'title:"Blue on Blue"' and this
works well. If I now delete the last double quote and the e, then I get the
error below. Is there any filter that can handle such searches which I
Hi,
does that affect my result list? Because if I use the dismax, and type into
my search field the title blue on blue (without quotes), I get this
product as the first result. If I use dismax without boosting and search for
blue on blue (without quotes) I'm not getting this result in the first 10
Hi,
Yesterday we had an issue with too many open files, which was solved
because a username was misspelled. But there is still a problem with
open files.
We cannot successfully index a few million documents from MapReduce to
a 5-node Solr cloud cluster. One of the problems is that after a
Hi
For the spell checking component I set extendedResults to get the frequencies and
then select the word with the best frequency. I understand the spell check
algorithm is based on Edit Distance. For example:
Query to Solr: Marien
Spell Check Text Returned: Marine (Freq: 120), Market (Freq: 900)
What is netstat telling you about the connections on the servers?
Any connections in CLOSE_WAIT (passive close) hanging?
Saw this on my servers last week.
Used a little program to spoof a local connection on those server ports
and was able to trick the TCP stack into closing those connections.
It
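A quick way to count the hung sockets Bernd describes; the awk/wc pipeline is just one way to do it, and two fake netstat lines are inlined here purely so the pipeline can be shown end to end:

```shell
# Count sockets stuck in CLOSE_WAIT (the application side never called close()).
# On a live server you would run:
#   netstat -tan | awk '$6 == "CLOSE_WAIT"' | wc -l
# Sample netstat-style lines are piped in below for illustration.
printf 'tcp 0 0 10.0.0.1:8983 10.0.0.2:51234 CLOSE_WAIT\ntcp 0 0 10.0.0.1:8983 10.0.0.3:51235 ESTABLISHED\n' |
  awk '$6 == "CLOSE_WAIT"' | wc -l
```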
Hi,
Just wondering if anyone had any experience with Solr and flashcache
[https://wiki.archlinux.org/index.php/Flashcache]; my guess is it might
be particularly useful for indices not changing that often, and for
large indices where an SSD of that size is prohibitive.
Cheers,
Dan
Do you have autocommit enabled? I tested this with 1m docs indexed by
using the default example config and saw used file descriptors go up
to 2400 (did not come down even after the final commit at the end).
Then I disabled autocommit, reindexed and the descriptor count stayed
pretty much flat at
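If autocommit turns out to be the culprit, it is tuned (or disabled by removal) in solrconfig.xml; a sketch with illustrative thresholds, not values from this thread:

```xml
<!-- Hypothetical autoCommit settings; the numbers are made up.
     Removing or commenting out the block disables autocommit entirely. -->
<autoCommit>
  <maxDocs>10000</maxDocs>   <!-- commit after this many documents -->
  <maxTime>60000</maxTime>   <!-- or after this many milliseconds -->
</autoCommit>
```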
On Thursday 01 March 2012 13:03:18 Bernd Fehling wrote:
What is netstat telling you about the connections on the servers?
Any connections in CLOSE_WAIT (passive close) hanging?
I can't tell exact numbers right now but there were a lot between all the
cores and the indexing clients.
Saw
Hmmm. ExternalFileFields can only be float values, so I'm not
sure getting the necessary data in is straightforward. Additionally, they
are used in function queries. Does this still work?
I really don't know the performance characteristics if, say, you have
users with access to all documents for SOLR-2272,
Currently, the page you referenced here:
http://wiki.apache.org/solr/SolrReplication
is the standard way to replicate incremental indexes.
You say you're worried about the extra HTTP. Why?
Do you have any evidence that this would be a problem?
HTTP isn't inherently inefficient at all, and even if
Right, there's nothing in Solr that I know of that'll help here. How would
a tokenizer understand that smartphone should be smart phone?
There's no general solution for this issue.
You can do domain-specific solutions with synonyms for instance, or
some other word list that contains terms you're
I think I didn't explain myself clearly: I need to be able to find substrings.
So, it's not that I'd expect Solr to find synonyms, but rather if a piece of
text contains the searched text, for example:
if title holds smartphone I want it to be found when someone types
martph or smar or smart.
I
Perfect! Thanks!
On Wed, Feb 29, 2012 at 3:29 PM, Emmanuel Espina
espinaemman...@gmail.com wrote:
I think that what you want is FieldCollapsing:
http://wiki.apache.org/solr/FieldCollapsing
For example
q=my search&group=true&group.field=subject&group.limit=5
Test it to see if that is what you
Hi,
What about if a search string starts with $o$? This is not recognized by
dismax either, right? Is there another filter I have to use?
Thanks,
Ramo
-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Thursday, March 1, 2012 12:44
To:
what about, if a search string starts with $o$ ? this is
not recognized by
dismax too, right? Is there another filter I have to use?
I don't fully follow your question, but it seems that you want to search special
characters too? With the raw or term query parser plugin you can do that.
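For reference, a query using the term query parser might look like this (the field name title is just an assumption); the term parser bypasses the usual query syntax, so characters like $ are taken literally:

```
q={!term f=title}$o$
```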
I'm just starting out...
for either
testing QA
TESTING QA
I can query with the following strings and find my text:
testing
TESTING
testing*
but the following doesn't work.
TESTING*
any ideas?
thanks
Neil
Hi Erick, Thanks for your post.
We are not directly providing search results from the Lucene index to the user. We
are processing the Lucene search results and adding additional information to
them by getting it from different sources [from other Lucene indexes or from
databases]. So,
consuming search results
Any segment files on SSD will be faster in cases where the file is not
in OS cache. If you have enough RAM a lot of index segment files will
end up in the OS system cache so it won't have to go to disk anyway. Since
most indexes are bigger than RAM an SSD helps a lot. But if index is
much larger
Hi!
Having just worked through the solr tutorial
(http://lucene.apache.org/solr/tutorial.html) I think I found two minor
bugs:
1.
The delete by query example
java -Ddata=args -jar post.jar "<delete><query>name:DDR</query></delete>"
should read
java -Ddata=args -jar post.jar
I once used a spell checker to break up compound words. It was slow, but worked
pretty well.
wunder
On Mar 1, 2012, at 5:53 AM, Erick Erickson wrote:
Right, there's nothing in Solr that I know of that'll help here. How would
a tokenizer understand that smartphone should be smart phone?
Yavar,
When you listed what the spell checker returns you put them in this order:
Marine (Freq: 120), Market (Freq: 900) and others
Was Marine listed first, and then did you pick Market because you thought
higher frequency is better? If so, you probably have the right settings
already but
Speaking of which, there is a spellchecker in jira that will detect word-break
errors like this. See WordBreakSpellChecker at
https://issues.apache.org/jira/browse/LUCENE-3523 .
To use it with Solr, you'd also need to apply SOLR-2993
(https://issues.apache.org/jira/browse/SOLR-2993). This
if title holds smartphone I want it to be found when
someone types
martph or smar or smart.
Peter, so you want a beginsWith/startsWith type of search? You can use
wildcard search (with the star operator) for this. e.g. q=smar*
Alternatively, if your index size is not huge, you can use
but the following doesn't work.
TESTING*
Please see the following writeups:
http://wiki.apache.org/solr/MultitermQueryAnalysis
http://www.lucidimagination.com/blog/2011/11/29/whats-with-lowercasing-wildcard-multiterm-queries-in-solr/
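As those writeups explain, recent versions (trunk / 3.6) let you control what happens to wildcard terms with an explicit multiterm analyzer. A hypothetical fieldType sketch, with placeholder names, showing lowercasing applied to wildcard queries too so TESTING* can match "testing":

```xml
<!-- Sketch only: requires multiterm analyzer support (Solr trunk / 3.6+) -->
<fieldType name="text_lower_mt" class="solr.TextField">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <!-- applied to wildcard/prefix/fuzzy terms, which skip the normal query analyzer -->
  <analyzer type="multiterm">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```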
On Thu, Mar 1, 2012 at 6:43 AM, Husain, Yavar yhus...@firstam.com wrote:
Hi
For spell checking component I set extendedResults to get the frequencies and
then select the word with the best frequency. I understand the spell check
algorithm based on Edit Distance. For an example:
Query to
@iorixxx: yes, that is what I need. But also when it's IN the text, not
necessarily at the beginning.
So using the * character like:
q=smart*
the product is found, but when I do this:
q=*mart*
it isn't... why is that?
Thanks James. I loved the last line in your mail: "But in the end, especially
with 1-word queries, I doubt even the best algorithms are going to always
accurately guess what the user wanted." Absolutely, I agree with this; if it is a
phrase (instead of a single word) then probably we can apply some
Thanks Robert. Yes, that's right, I can get some more accuracy if I use
transposition in addition to substitution, insert and deletion.
From: Robert Muir [rcm...@gmail.com]
Sent: Thursday, March 01, 2012 9:50 PM
To: solr-user@lucene.apache.org
Subject: Re:
Hi
I need to build buckets with alphanumeric values.
for example:
facet.field=person
person: Alex(10), Ben(5), George(8), Paul(3), Peter(2), Stefan(9)
Now I need all persons in the interval A-C.
With facet.query=person:[A TO C] I only get the number of matches (15),
but I want to have the values
Added it back in. I still get the same result.
On Wed, Feb 29, 2012 at 10:09 PM, Mark Miller markrmil...@gmail.com wrote:
Do you have a _version_ field in your schema? I actually just came back to
this thread with that thought and then saw your error - so that remains my
guess.
I'm going to
Hi, all!
It may seem strange, but could those of you who read this post answer some
questions? I want to understand whether maybe I want too much from my Solr, so:
1) Solr version;
2) Summary doc count;
3) Shards count (if exists);
4) rows count at query (from ... into);
5) Average queries per minute
I have the same problem. This happens only for some documents in the index.
Like sharadgaur, the problem ceased when I removed
ReversedWildcardFilterFactory from my analysis chain,
HTMLStripCharFilterFactory has been there before and after.
I am running branch-3.6 r1238628. As far as I can
P.S. FYI you will have to reindex after adding _version_ back to the schema...
On Mar 1, 2012, at 3:35 PM, Mark Miller wrote:
Any other customizations you are making to solrconfig?
On Mar 1, 2012, at 1:48 PM, Matthew Parker wrote:
Added it back in. I still get the same result.
On Wed, Feb
I have the same problem. This happens
only for some documents in the index.
Andrew, can you provide a document string and a query pair? I will try to
reproduce the exception. Then we can create a test case that fails. Others can
look into it.
--- On Thu, 3/1/12, PeterKerk vettepa...@hotmail.com wrote:
From: PeterKerk vettepa...@hotmail.com
Subject: Re: Need tokenization that finds part of stringvalue
To: solr-user@lucene.apache.org
Date: Thursday, March 1, 2012, 6:59 PM
@iorixxx: yes, that is what I need.
But also when its IN
@iorixxx: Where can I find that example schema.xml?
I downloaded the latest version here:
ftp://apache.mirror.easycolocate.nl//lucene/solr/3.5.0
And checked \example\example-DIH\solr\db\conf\schema.xml
But no text_rev type is defined in there.
And when I find it, can I just make the title field
I don't think Spatial search will fully fit into this. I have 2 approaches in
mind but I am not satisfied with either one of them.
a) Have 2 separate indexes. First one to store the information about all the
cities and second one to store the retail stores information. Whenever user
searches
I'm really confused here. Your first question seemed to be about http
involved in index replication, which really doesn't seem to be
related to your latest post. Can you start over from the beginning?
Best
Erick
On Thu, Mar 1, 2012 at 9:56 AM, Neel neelkant.potlap...@aspiresys.com wrote:
Hi
One frequent method of doing leading and trailing wildcards
is to use ngrams (as distinct from edgengrams). That in
combination with phrase queries might work well in this case.
You also might be surprised at how little space bigrams take,
give it a test and see <G>..
Best
Erick
On Thu, Mar 1,
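An ngram-based fieldType along the lines Erick suggests might look like this; the name and gram sizes are made up for illustration, so adjust and test against your own data:

```xml
<!-- Sketch: index bigrams so an infix search like "mart" can match "smartphone"
     (combined with phrase queries) without a leading-wildcard query. -->
<fieldType name="text_ngram" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="2"/>
  </analyzer>
</fieldType>
```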
Only one interval? in that case you could add a filter query and facet
in the regular way. That is:
facet.field=person&fq=person:[A TO C]
But consider that you will get the search results that include those
persons only.
Thanks
Emmanuel
2012/3/1 AlexR alexanderroessler1...@hotmail.com:
Hi
i
Hi all,
The documents in our Solr index have a parent-child relationship which we
have basically flattened in our Solr queries. We have massaged Solr into
being the query API for 3rd party data. The relationship is a simple
parent-child relationship as follows:
category
+-sub-category
this
@iorixxx: Where can I find that
example schema.xml?
Please find text_general_rev at
http://svn.apache.org/repos/asf/lucene/dev/trunk/solr/example/solr/conf/schema.xml
And when I find it, can I just make the title field which
currently is of
text type then of text_rev type?
Yes, also you
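The trunk example schema linked above defines text_general_rev roughly along these lines; this is a trimmed-down sketch (the real type carries additional filters such as stop words, so check the linked schema.xml), with ReversedWildcardFilterFactory on the index side so leading wildcards can be rewritten into fast prefix queries over reversed terms:

```xml
<!-- Trimmed sketch of the reversed-wildcard type from the example schema -->
<fieldType name="text_general_rev" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
            maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```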
On Thu, Mar 1, 2012 at 3:34 AM, Michael Jakl jakl.mich...@gmail.com wrote:
The topic field holds roughly 5
values per doc, but I wasn't able to compute the correct number right
now.
How many unique values for that field in the whole index?
If you have log output (or output from the stats page
Hi,
Apologies if this has been answered before, I tried searching for it and
didn't find anything answering this exactly.
I want to find similar documents using MLT Handler using some specified
fields but I want to filter down the returned matches with some keywords as
well.
I looked at the
I tried publishing to /update/extract request handler using manifold, but
got the same result.
I also tried swapping out the replication handlers too, but that didn't do
anything.
Otherwise, that's it.
On Thu, Mar 1, 2012 at 3:35 PM, Mark Miller markrmil...@gmail.com wrote:
Any other
I reindex every time I change something.
I also delete any zookeeper data too.
I'm assuming the Windows configuration looked correct?
On Thu, Mar 1, 2012 at 3:39 PM, Mark Miller markrmil...@gmail.com wrote:
P.S. FYI you will have to reindex after adding _version_ back the schema...
On Mar 1,
I assuming the windows configuration looked correct?
Yeah, so far I cannot spot any smoking gun... I'm confounded at the moment.
I'll re read through everything once more...
- Mark
Hi,
I am sorry if this has already been posted.
I am new to Solr.
I am crawling my site using Nutch and posting it to Solr. I am trying to
implement a feature where I want to get all data where the url starts with
"http://someurl/"
Any thoughts?
Thanks,
Stan
(12/03/02 6:05), Ahmet Arslan wrote:
I have the same problem. This happens
only for some documents in the index.
Andrew, can you provide a document string and a query pair? I will try to
re-produce the exception. Then we can create a test case that fails. Others can
look into it.
+1.
Thanks Ahmet! That's good to know someone else also tried to make phrase
queries to fix multi-word synonym issue. :-)
On Thu, Mar 1, 2012 at 1:42 AM, Ahmet Arslan iori...@yahoo.com wrote:
I don't think mm will help here because it defaults to 100%
already by the
following code.
Default
Hello Donnie,
1. Nothing besides design considerations prevents you from doing a search in a
QueryResponseWriter. You have a request, which isn't closed yet, from which you
can obtain a searcher.
2. Your use case isn't clear. If you just need to search categories, and
return the lists of subcategories
Hi
I face the issue that I have n business users. Each business user has its
own set of products. I want to provide an interface for each business user
where he can find only the products he offers. What would be a better
solution:
1.) To have one big index and filter by
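If the single-index route is taken, the usual pattern is to stamp every document with its owner and add a filter query per request; field name and value below are hypothetical:

```
q=<the user's search terms>&fq=owner_id:42
```

A filter query is cached separately from q, so repeating the same owner filter on every request is cheap.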
Also regarding the Join functionality I remember Yonik pointed out it's O(#
unique terms) but I agree with Erik on the ExternalFileField as you can use
it just inside a function query, for example, for boosting.
Tommaso
2012/3/1 Erick Erickson erickerick...@gmail.com
Hmmm. ExternalFileFields