Hi,
I'd like to clarify our use case a bit more.
We want to return the exact search query as a suggestion only if it is
present in the index. So in my example we would expect to get the
suggestion foo for the query foo but no suggestion abc for the query
abc (because abc is not in the
Hi,
I am using Solr 4.4 with ZooKeeper 3.3.5. While I was checking for error
conditions of my application, I came across a strange issue. Here is what I
tried: I have three fields defined in my schema:
a) UNIQUE_KEY - of type solr.TrieLong
b) empId - of type solr.TrieLong
c) companyId - of type
Hi,
I've recently been asked to implement an application to search products from
several stores, each store having different prices and stock for the same
product.
So I have products that have the usual fields (name, description, brand,
etc) and also number of units and price for each store. I
Hi,
We would like to implement special handling for queries that contain
certain keywords. Our particular use case:
In the example query Footitle season 1 we want to discover the keyword
season, get the subsequent number, and boost (or filter for) documents
that match 1 on the field named season.
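A minimal sketch of that keyword-discovery step, assuming an application-side preprocessor and a hypothetical season field (both are assumptions, not anything Solr provides out of the box):

```python
import re

def season_fq(query):
    """Look for 'season <number>' in a raw query string and, if found,
    return a filter-query clause for a hypothetical 'season' field."""
    m = re.search(r"\bseason\s+(\d+)\b", query, re.IGNORECASE)
    if m is None:
        return None  # no keyword: run the query unchanged
    return "season:" + m.group(1)

print(season_fq("Footitle season 1"))  # season:1
```

The same match could just as well feed a boost query (e.g. bq=season:1^5) instead of a filter, depending on whether non-matching documents should be excluded or merely demoted.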
What is the actual target speed you are pursuing? Is this for user
suggestions or something of that sort? Content-based suggestions with
faceting, and especially on Solr 1.4, won't be lightning fast.
Have you looked at TermsComponent?
http://wiki.apache.org/solr/TermsComponent
By shingles, which in the
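For reference, a TermsComponent lookup is just an HTTP request; this sketch assumes a /terms handler is configured and that the suggestion field is called name (both assumptions):

```
http://localhost:8983/solr/terms?terms=true&terms.fl=name&terms.prefix=foo&terms.limit=10&wt=json
```

Because TermsComponent reads raw indexed terms, it only ever returns suggestions that actually exist in the index, which matches the "suggest foo only if foo is indexed" requirement above.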
Hi,
I'd go with (2) also but using dynamic fields so you don't have to define all
the storeX_price fields in your schema but rather just one *_price field. Then
when you filter on store:store1 you'd know to sort with store1_price and so
forth for units. That should be pretty straightforward.
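A schema.xml sketch of that idea; the field type names here are assumptions, and the snippet is illustrative rather than a drop-in:

```xml
<!-- one rule covers store1_price, store2_price, ... without schema changes -->
<dynamicField name="*_price" type="tfloat" indexed="true" stored="true"/>
<dynamicField name="*_units" type="tint"  indexed="true" stored="true"/>
```

Filtering with fq=store:store1 and sorting with sort=store1_price asc then follows purely by naming convention.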
Hi all:
I’m currently on a Solr 4.5.0 instance and running this tutorial,
http://lucene.apache.org/solr/4_5_0/tutorial.html
My question is specific to indexing data as proposed from this tutorial,
$ java -jar post.jar solr.xml monitor.xml
The tutorial advises to validate from your localhost,
Hi Erick,
Thanks for the reply and sorry, my fault, wasn't clear enough. I was
wondering if there was a way to remove terms that would always be zero
(because the term came from a document that didn't match the filter query).
Here's an example. I have a bunch of documents with fields
I'm doing some performance testing against an 8-node Solr cloud cluster,
and I'm noticing some periodic slowness.
http://farm4.staticflickr.com/3668/10985410633_23e26c7681_o.png
I'm doing random test searches against an Alias Collection made up of four
smaller (monthly) collections. Like this:
I know it's documented that Lucene/Solr doesn't apply filters to queries with
wildcards, but this seems to trip up a lot of users. I can also see why
wildcards break a number of filters, but a number of filters (e.g. mapping
charsets) could mostly or entirely work. The N-gram filter is
That's what faceting does. The facets are only tabulated
for documents that satisfy the query, including all of
the filter queries and any other criteria.
Otherwise, facet counts would be the same no matter
what the query was.
Or I'm completely misunderstanding your question...
Best,
Erick
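As an illustration of facet counts being restricted by filter queries, a request along these lines (the field names are made up) tabulates facets only over documents matching both q and fq:

```
http://localhost:8983/solr/select?q=*:*&fq=manufacturer:acme&facet=true&facet.field=price&rows=0
```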
add Durl=http://localhost:8983/solr/collection2/update when run post.jar,
(This email was sent from my 189 Mail account)
Reyes, Mark mark.re...@bpiedu.com wrote:
Hi all:
I’m currently on a Solr 4.5.0 instance and running this tutorial,
http://lucene.apache.org/solr/4_5_0/tutorial.html
My question is specific to indexing data
I am querying test in Solr 4.3.1 over the field below and it's not finding
all occurrences. It seems that if it is a substring of a word like
Supertestplan it isn't found unless I use wildcards: *test*. This is
right because of my tokenizer, but does someone know a way around this? I
don't want to
How real time is NRT? In particular, what are your commit settings?
And can you characterize the periodic slowness? Queries that usually
take 500ms now take 10s? Or 1s? How often? How are you measuring?
Details matter, a lot...
Best,
Erick
On Thu, Nov 21, 2013 at 6:03 PM, Dave Seltzer
I have the following simplified setting:
My schema contains one text field, named text.
When I perform a query, I need to get the scores for the same text field
but for different similarity functions (e.g. TFIDF, BM25..) and combine
them externally using different weights.
An obvious way to
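One hedged sketch of the external combination step, assuming the per-similarity scores have already been fetched (for example from two request handlers configured with different similarity classes); the names and weights below are assumptions:

```python
def combine_scores(tfidf_scores, bm25_scores, w_tfidf=0.4, w_bm25=0.6):
    """Weighted sum of per-document scores from two separate result sets.
    Documents missing from one set contribute 0 for that component."""
    docs = set(tfidf_scores) | set(bm25_scores)
    return {d: w_tfidf * tfidf_scores.get(d, 0.0) +
               w_bm25 * bm25_scores.get(d, 0.0)
            for d in docs}

combined = combine_scores({"doc1": 2.0, "doc2": 1.0},
                          {"doc1": 1.5, "doc3": 3.0})
```

Re-ranking this way happens entirely outside Solr, so the weights can be tuned without reindexing.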
You're leaving off the - in front of the D:
-Durl.
Try java -jar post.jar -help for a list of options available
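With the dash restored, and with the property placed before -jar (JVM system properties must precede the jar, or they are passed to the program as arguments), the invocation would presumably look like:

```shell
$ java -Durl=http://localhost:8983/solr/collection2/update -jar post.jar solr.xml monitor.xml
```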
On Thu, Nov 21, 2013 at 12:04 PM, Reyes, Mark mark.re...@bpiedu.com wrote:
So then,
$ java -jar post.jar Durl=http://localhost:8983/solr/collection2/update
solr.xml monitor.xml
It might not be a perfect solution, but you can use the EdgeNGram filter and copy
all your field data to that field and use it for suggestions.
<fieldType name="text_autocomplete" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
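A fuller sketch of such an autocomplete field type, with an EdgeNGram filter on the index side only; the gram sizes here are assumptions to tune for your data:

```xml
<fieldType name="text_autocomplete" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="2" maxGramSize="20"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

Indexing "monitor" then produces mo, mon, moni, ... so a prefix query like mon matches without wildcards.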
Yes, more details…
Solr version, which garbage collector, how does heap usage look, cpu, etc.
- Mark
On Nov 21, 2013, at 6:46 PM, Erick Erickson erickerick...@gmail.com wrote:
How real time is NRT? In particular, what are your commit settings?
And can you characterize periodic slowness?
On 11/21/2013 1:57 AM, RadhaJayalakshmi wrote:
Hi,
I am using Solr 4.4 with ZooKeeper 3.3.5. While I was checking for error
conditions of my application, I came across a strange issue. Here is what I
tried: I have three fields defined in my schema:
a) UNIQUE_KEY - of type solr.TrieLong
b) empId - of
Hi,
On Wed, Nov 20, 2013 at 12:53 PM, Shalin Shekhar Mangar
shalinman...@gmail.com wrote:
At the Lucene level, I think it would require a directory
implementation which writes to a remote node directly. Otherwise, on
the solr side, we must move the leader itself to another node which
has
And this is the exact problem. Some characters are stored as entities, some are
not. When it is time to display, what else needs to be escaped? At a minimum, you
would have to always store & as &amp; to avoid escaping the leading ampersand
in the entities.
You could store every single character as a
(This email was sent from my 189 Mail account)
Reyes, Mark mark.re...@bpiedu.com wrote:
Hi all:
I’m currently on a Solr 4.5.0 instance and running this tutorial,
http://lucene.apache.org/solr/4_5_0/tutorial.html
My question is specific to indexing data as proposed from this tutorial,
$ java -jar post.jar solr.xml
Hi All,
Is it possible to perform a facet field query on a subset of documents (the
subset being defined via a filter query for instance)?
I understand that facet pivoting might work, but it would require that the
subset be defined by some field hierarchy, e.g. manufacturer - price (then
only
The query parser does its own tokenization and parsing before your analyzer
tokenizer and filters are called, assuring that only one white
space-delimited token is analyzed at a time.
You're probably best off having an application layer preprocessor for the
query that enriches the query in
Solr (actually Lucene) stores the input _exactly_ as it is entered, and
returns it the same way.
What you're seeing is almost certainly your display mechanism interpreting
the results; whitespace is notoriously variable in terms of how it's displayed
by various interpretations of the standard.
Hi Andreas,
If you don't want to use wildcards at query time, an alternative is to use
NGrams at indexing time. This will produce a lot of tokens, e.g. the
4-grams of your example: Supertestplan = supe uper pert erte rtes
*test* estp stpl tpla plan
Is that what you want? By the way, why do
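To see why the 4-grams make a non-wildcard query work, here is a quick simulation of what a character NGram filter of size 4 emits (this mimics the filter's output; it is not Solr code):

```python
def char_ngrams(text, n):
    """All character n-grams of length n, as an NGram filter of that size would emit."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]

grams = char_ngrams("supertestplan", 4)
print(grams)            # 10 four-character tokens, supe ... plan
print("test" in grams)  # True: a plain query for 'test' can now match
```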
OK - probably I should have said A, or &#97; :) My point was just
that there is not really anything special about special characters.
On 11/21/2013 10:50 AM, Jack Krupansky wrote:
Would you store a as &#65; ?
No, not in any case.
-- Jack Krupansky
-Original Message- From: Michael
I suppose I have to create another field with different tokenizers and set
the boost very low so it doesn't really mess with my ranking, because the
word is now in 2 fields. What kind of tokenizer can do the job?
From: Andreas Owen [mailto:a...@conx.ch]
Sent: Donnerstag, 21. November 2013
You might be able to make use of the dictionary compound word filter, but
you will have to build up a dictionary of words to use:
http://lucene.apache.org/core/4_5_1/analyzers-common/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilterFactory.html
My e-book has some examples
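An analyzer sketch using that factory; the dictionary file name and the size limits below are assumptions you would adapt:

```xml
<analyzer>
  <tokenizer class="solr.WhitespaceTokenizerFactory"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.DictionaryCompoundWordTokenFilterFactory"
          dictionary="compound-words.txt"
          minWordSize="5" minSubwordSize="4" maxSubwordSize="15"
          onlyLongestMatch="false"/>
</analyzer>
```

With "test" and "plan" listed in compound-words.txt, indexing Supertestplan would additionally emit those subwords, so plain queries can match them without wildcards.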
Lots of questions. Okay.
In digging a little deeper and looking at the config I see that
<nrtMode>true</nrtMode> is commented out. I believe this is the default
setting. So I don't know if NRT is enabled or not. Maybe just a red herring.
I don't know what Garbage Collector we're using. In this test
there is not really anything special about special characters
Well, the distinction was about named entities, which are indeed special.
Besides, in general, for more sophisticated text processing, character
types are a valid distinction.
But all of this begs the question of the original
I confirm.
Hello,
I'm using Solr 4.x. In my solr schema I have the following fields defined :
<field name="content" type="text_general" indexed="false"
       stored="true" multiValued="true"/>
<field name="all" type="text_general" indexed="true"
       stored="false" multiValued="true" termVectors="true"/>
<field
Ah... now I understand your perspective - you have taken a narrow view of
what text is. A broader view is that it can contain formatting and special
entities as well, or rich text in general. My read is that it all
depends on the nature of the application and its requirements, not a one
size
I have to agree w/Walter. Use unicode as a storage format. The entity
encodings are for transfer/interchange. Encode/decode on the way in and
out if you have to. Would you store a as &#65; ? It makes it
impossible to search for, for one thing. What if someone wants to
search for the TM
I know all about formatted text -- I worked at MarkLogic. That is why I
mentioned the XML Infoset.
Numeric entities are part of the final presentation, really, part of the
encoding. They should never be stored. Always store the Unicode.
Numeric and named entities are a convenience for tools
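In Python terms, the decode-on-the-way-in step is one call; both numeric and named entities collapse to the same Unicode, which is the point:

```python
import html

# Decode any HTML entities before indexing; store and search plain Unicode.
raw_values = ["X&#8482;", "X&trade;", "X™"]
decoded = [html.unescape(v) for v in raw_values]
print(decoded)  # all three forms become the identical string X™
```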
So then,
$ java -jar post.jar Durl=http://localhost:8983/solr/collection2/update
solr.xml monitor.xml
On 11/21/13, 8:14 AM, xiezhide xiezh...@gmail.com wrote:
add Durl=http://localhost:8983/solr/collection2/update when run post.jar,
(This email was sent from my 189 Mail account)
Reyes, Mark mark.re...@bpiedu.com wrote:
Hi
Would you store a as &#65; ?
No, not in any case.
-- Jack Krupansky
-Original Message-
From: Michael Sokolov
Sent: Thursday, November 21, 2013 8:56 AM
To: solr-user@lucene.apache.org
Subject: Re: How to index X™ as &#8482; (HTML decimal entity)
I have to agree w/Walter. Use unicode as a
Dave you might want to connect JVisualVm and see if there's any pattern
with latency and garbage collection. That's a frequent culprit for
periodic hits in latency.
More info here
http://docs.oracle.com/javase/6/docs/technotes/guides/visualvm/jmx_connections.html
There's a couple GC
Hi, guys.
I indexed 1000 documents, which have fields like title, ptime and frequency.
The title is a text field, the ptime is a date field, and the frequency is an
int field.
The frequency field goes up and down; sometimes its value is 0, and
sometimes its value is 999.
Now, in my app, the
Additional info on GC selection
http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#available_collectors
If response time is more important than overall throughput and garbage
collection pauses must be kept shorter than approximately one second, then
select the concurrent
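For the HotSpot JVMs of that era (Java 6/7), selecting the concurrent collector and logging GC activity might look like the following; the heap size and paths are assumptions:

```shell
java -Xmx8g -XX:+UseConcMarkSweepGC \
     -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCDateStamps \
     -Xloggc:gc.log \
     -jar start.jar
```

Correlating pauses in gc.log with the latency spikes is usually enough to confirm or rule out GC without profiling every node.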
Thanks Doug!
One thing I'm not clear on is how I know if this is in fact related to
Garbage Collection. If you're right, and the cluster is only as slow as its
slowest link, how do I determine that this is GC? Do I have to run the
profiler on all eight nodes?
Or is it a matter of turning on
Thanks Shawn for your response.
So, from your email, it seems that unique_key validation is handled
differently from other field validation.
But what I am not very clear on is what the unique_key has to do with finding
the live server?
Because if there is any mismatch in the unique_key, it is
On 11/21/2013 6:41 PM, Dave Seltzer wrote:
In digging a little deeper and looking at the config I see that
<nrtMode>true</nrtMode> is commented out. I believe this is the default
setting. So I don't know if NRT is enabled or not. Maybe just a red herring.
I had never seen this setting before.
Hi Robert,
That was the idea, dynamic fields, so, as you said, it is easier to sort
and filter. Besides, having dynamic fields it would be easier to add new
stores, as I wouldn't have to modify the schema :)
Thanks for the answer!
2013/11/21 Petersen, Robert robert.peter...@mail.rakuten.com
On 11/21/2013 9:51 PM, RadhaJayalakshmi wrote:
Thanks Shawn for your response.
So, from your email, it seems that unique_key validation is handled
differently from other field validation.
But what I am not very clear on is what the unique_key has to do with finding
the live server?
Because if