Re: Suggester - how to return exact match?

2013-11-21 Thread Mirko
Hi, I'd like to clarify our use case a bit more. We want to return the exact search query as a suggestion only if it is present in the index. So in my example we would expect to get the suggestion foo for the query foo but no suggestion abc for the query abc (because abc is not in the

SolrServerException while adding an invalid UNIQUE_KEY in solr 4.4

2013-11-21 Thread RadhaJayalakshmi
Hi, I am using Solr 4.4 with ZooKeeper 3.3.5. While I was checking for error conditions of my application, I came across a strange issue. Here is what I tried: I have three fields defined in my schema: a) UNIQUE_KEY - of type solr.TrieLong b) empId - of type Solr.TrieLong c) companyId - of type

Best implementation for multi-price store?

2013-11-21 Thread Alejandro Marqués Rodríguez
Hi, I've recently been asked to implement an application to search products from several stores, each store having different prices and stock for the same product. So I have products that have the usual fields (name, description, brand, etc) and also number of units and price for each store. I

Parse eDisMax queries for keywords

2013-11-21 Thread Mirko
Hi, We would like to implement special handling for queries that contain certain keywords. Our particular use case: In the example query "Footitle season 1" we want to discover the keyword "season", get the subsequent number, and boost (or filter for) documents that match "1" on field name="season".

Re: facet method=enum and uninvertedfield limitations

2013-11-21 Thread Dmitry Kan
What is the actual target speed you are pursuing? Is this for user suggestions or something of that sort? Content based suggestions with faceting and esp on 1.4 solr won't be lightning fast. Have you looked at TermsComponent? http://wiki.apache.org/solr/TermsComponent By shingles, which in the

RE: Best implementation for multi-price store?

2013-11-21 Thread Petersen, Robert
Hi, I'd go with (2) also but using dynamic fields so you don't have to define all the storeX_price fields in your schema but rather just one *_price field. Then when you filter on store:store1 you'd know to sort with store1_price and so forth for units. That should be pretty straightforward.
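The naming convention described above can be sketched in a few lines of Python. This is only an illustration of how a client might derive the filter and sort parameters from a store id: the helper name `store_query_params` and the `*_units` suffix are assumptions, while `store:store1` and `store1_price` come from the message itself.

```python
def store_query_params(store_id, sort_by="price", direction="asc"):
    """Build filter and sort parameters for one store.

    Assumes the schema declares dynamic fields such as *_price and *_units,
    so that store1 is indexed as store1_price / store1_units.
    """
    if sort_by not in ("price", "units"):
        raise ValueError("sort_by must be 'price' or 'units'")
    return {
        "fq": f"store:{store_id}",                     # restrict to one store
        "sort": f"{store_id}_{sort_by} {direction}",   # sort on that store's field
    }


params = store_query_params("store1")
print(params)
```

Adding a new store then only changes the value passed to the helper; no schema edit is needed as long as the dynamic field patterns already exist.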

Indexing data to a specific collection in Solr 4.5.0

2013-11-21 Thread Reyes, Mark
Hi all: I’m currently on a Solr 4.5.0 instance and running this tutorial, http://lucene.apache.org/solr/4_5_0/tutorial.html My question is specific to indexing data as proposed from this tutorial, $ java -jar post.jar solr.xml monitor.xml The tutorial advises to validate from your localhost,

Re: Facet field query on subset of documents

2013-11-21 Thread Luis Lebolo
Hi Erick, Thanks for the reply and sorry, my fault, wasn't clear enough. I was wondering if there was a way to remove terms that would always be zero (because the term came from a document that didn't match the filter query). Here's an example. I have a bunch of documents with fields

Periodic Slowness on Solr Cloud

2013-11-21 Thread Dave Seltzer
I'm doing some performance testing against an 8-node Solr cloud cluster, and I'm noticing some periodic slowness. http://farm4.staticflickr.com/3668/10985410633_23e26c7681_o.png I'm doing random test searches against an Alias Collection made up of four smaller (monthly) collections. Like this:

RE: search with wildcard

2013-11-21 Thread Scott Schneider
I know it's documented that Lucene/Solr doesn't apply filters to queries with wildcards, but this seems to trip up a lot of users. I can also see why wildcards break a number of filters, but a number of filters (e.g. mapping charsets) could mostly or entirely work. The N-gram filter is

Re: Facet field query on subset of documents

2013-11-21 Thread Erick Erickson
That's what faceting does. The facets are only tabulated for documents that satisfy the query, including all of the filter queries and any other criteria. Otherwise, facet counts would be the same no matter what the query was. Or I'm completely misunderstanding your question... Best, Erick

Re: Indexing data to a specific collection in Solr 4.5.0

2013-11-21 Thread xiezhide
add Durl=http://localhost:8983/solr/collection2/update when running post.jar, (Sent from 189 Mail) Reyes, Mark mark.re...@bpiedu.com wrote: Hi all: I’m currently on a Solr 4.5.0 instance and running this tutorial, http://lucene.apache.org/solr/4_5_0/tutorial.html My question is specific to indexing data

search with wildcard

2013-11-21 Thread Andreas Owen
I am querying test in solr 4.3.1 over the field below and it's not finding all occurrences. It seems that if it is a substring of a word like Supertestplan it isn't found unless I use wildcards, *test*. This is right because of my tokenizer but does someone know a way around this? I don't want to

Re: Periodic Slowness on Solr Cloud

2013-11-21 Thread Erick Erickson
How real time is NRT? In particular, what are your commit settings? And can you characterize periodic slowness? Queries that usually take 500ms now take 10s? Or 1s? How often? How are you measuring? Details matter, a lot... Best, Erick On Thu, Nov 21, 2013 at 6:03 PM, Dave Seltzer

Multiple similarity scores for the same text field

2013-11-21 Thread Nikos Voskarides
I have the following simplified setting: My schema contains one text field, named text. When I perform a query, I need to get the scores for the same text field but for different similarity functions (e.g. TFIDF, BM25..) and combine them externally using different weights. An obvious way to

Re: Indexing data to a specific collection in Solr 4.5.0

2013-11-21 Thread Erick Erickson
you're leaving off the - in front of the D, -Durl. Try java -jar post.jar -help for a list of options available On Thu, Nov 21, 2013 at 12:04 PM, Reyes, Mark mark.re...@bpiedu.com wrote: So then, $ java -jar post.jar Durl=http://localhost:8983/solr/collection2/update solr.xml monitor.xml

Re: Suggester - how to return exact match?

2013-11-21 Thread Developer
Might not be a perfect solution but you can use edgengram filter and copy all your field data to that field and use it for suggestion. <fieldType name="text_autocomplete" class="solr.TextField" positionIncrementGap="100"> <analyzer type="index"> <tokenizer class="solr.WhitespaceTokenizerFactory"/>
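To make the edge n-gram idea concrete, here is a small Python model of it. It is not Solr's implementation: the function names, the lowercasing step, and the gram-size defaults are assumptions for illustration, and only the whitespace tokenization mirrors the snippet above. Each token is expanded into its leading prefixes at index time, so a query is a plain lookup and returns nothing for terms that were never indexed.

```python
def edge_ngrams(token, min_gram=1, max_gram=15):
    """Return the leading prefixes of a token, shortest first."""
    token = token.lower()
    upper = min(max_gram, len(token))
    return [token[:size] for size in range(min_gram, upper + 1)]


def build_index(values, min_gram=1, max_gram=15):
    """Map every edge n-gram to the set of original values containing it."""
    index = {}
    for value in values:
        for token in value.split():  # whitespace tokenization
            for gram in edge_ngrams(token, min_gram, max_gram):
                index.setdefault(gram, set()).add(value)
    return index


def suggest(index, query):
    """Look the query up as-is; unknown terms yield no suggestions."""
    return sorted(index.get(query.lower(), set()))


index = build_index(["foo", "foobar"])
print(suggest(index, "foo"))
print(suggest(index, "abc"))
```

Note that this returns every indexed value starting with the query, so "foo" also brings back "foobar"; it covers the "no suggestion for abc" half of the use case, not strict exact matching.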

Re: Periodic Slowness on Solr Cloud

2013-11-21 Thread Mark Miller
Yes, more details… Solr version, which garbage collector, how does heap usage look, cpu, etc. - Mark On Nov 21, 2013, at 6:46 PM, Erick Erickson erickerick...@gmail.com wrote: How real time is NRT? In particular, what are your commit settings? And can you characterize periodic slowness?

Re: SolrServerException while adding an invalid UNIQUE_KEY in solr 4.4

2013-11-21 Thread Shawn Heisey
On 11/21/2013 1:57 AM, RadhaJayalakshmi wrote: Hi, I am using Solr 4.4 with ZooKeeper 3.3.5. While I was checking for error conditions of my application, I came across a strange issue. Here is what I tried: I have three fields defined in my schema: a) UNIQUE_KEY - of type solr.TrieLong b) empId - of

Re: Split shard and stream sub-shards to remote nodes?

2013-11-21 Thread Otis Gospodnetic
Hi, On Wed, Nov 20, 2013 at 12:53 PM, Shalin Shekhar Mangar shalinman...@gmail.com wrote: At the Lucene level, I think it would require a directory implementation which writes to a remote node directly. Otherwise, on the solr side, we must move the leader itself to another node which has

Re: How to index X™ as &#8482; (HTML decimal entity)

2013-11-21 Thread Walter Underwood
And this is the exact problem. Some characters are stored as entities, some are not. When it is time to display, what else needs escaped? At a minimum, you would have to always store & as &amp; to avoid escaping the leading ampersand in the entities. You could store every single character as a

Re: Indexing data to a specific collection in Solr 4.5.0

2013-11-21 Thread xiezhide
(Sent from 189 Mail) Reyes, Mark mark.re...@bpiedu.com wrote: Hi all: I’m currently on a Solr 4.5.0 instance and running this tutorial, http://lucene.apache.org/solr/4_5_0/tutorial.html My question is specific to indexing data as proposed from this tutorial, $ java -jar post.jar solr.xml

Facet field query on subset of documents

2013-11-21 Thread Luis Lebolo
Hi All, Is it possible to perform a facet field query on a subset of documents (the subset being defined via a filter query for instance)? I understand that facet pivoting might work, but it would require that the subset be defined by some field hierarchy, e.g. manufacturer - price (then only

Re: Parse eDisMax queries for keywords

2013-11-21 Thread Jack Krupansky
The query parser does its own tokenization and parsing before your analyzer tokenizer and filters are called, assuring that only one white space-delimited token is analyzed at a time. You're probably best off having an application layer preprocessor for the query that enriches the query in
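An application-layer preprocessor of the kind suggested here could look roughly like the Python sketch below. Everything in it is an assumption made for illustration: the function name `enrich_query`, the regular expression, and the boost value are hypothetical, and only the `season` field name and the example query come from the original question. The returned boost clause is meant to be passed to Solr separately, for instance through eDisMax's `bq` parameter.

```python
import re

# Matches the keyword "season" followed by a number, e.g. "season 1".
SEASON_RE = re.compile(r"\bseason\s+(\d+)\b", re.IGNORECASE)


def enrich_query(raw_query, boost=5):
    """Split a user query into the remaining text and an optional boost clause."""
    match = SEASON_RE.search(raw_query)
    if match is None:
        return raw_query, None
    season_number = int(match.group(1))
    # Drop the keyword phrase and collapse the leftover whitespace.
    remaining = " ".join(SEASON_RE.sub(" ", raw_query).split())
    boost_clause = f"season:{season_number}^{boost}"
    return remaining, boost_clause


main_query, boost_clause = enrich_query("Footitle season 1")
print(main_query, boost_clause)
```

Doing this before the request reaches Solr sidesteps the limitation described above, since the preprocessor sees the whole query string rather than one whitespace-delimited token at a time.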

Re: How to retain the original format of input document in search results in SOLR - Tomcat

2013-11-21 Thread Erick Erickson
Solr (actually Lucene) stores the input _exactly_ as it is entered, and returns it the same way. What you're seeing is almost certainly your display mechanism interpreting the results, whitespace is notoriously variable in terms of how it's displayed by various interpretations of the standard.

Re: search with wildcard

2013-11-21 Thread Ahmet Arslan
Hi Andreas, If you don't want to use wildcards at query time, an alternative is to use NGrams at indexing time. This will produce a lot of tokens. For example, the 4-grams of your example: Supertestplan = supe uper pert erte rtes *test* estp stpl tpla plan Is that what you want? By the way why do
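The 4-gram expansion listed above can be reproduced with a short Python function. This is a stand-alone illustration of character n-grams, not Solr's NGram filter; the function name `char_ngrams` and the lowercasing step are assumptions.

```python
def char_ngrams(text, n):
    """Return all contiguous character n-grams of text, lowercased."""
    text = text.lower()
    return [text[i:i + n] for i in range(len(text) - n + 1)]


grams = char_ngrams("Supertestplan", 4)
print(" ".join(grams))
# supe uper pert erte rtes test estp stpl tpla plan
```

A 13-character word yields 10 tokens at a single gram size, which is why this approach grows the index noticeably.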

Re: How to index X™ as &#8482; (HTML decimal entity)

2013-11-21 Thread Michael Sokolov
OK - probably I should have said A, or &#97; :) My point was just that there is not really anything special about special characters. On 11/21/2013 10:50 AM, Jack Krupansky wrote: Would you store "a" as "&#65;"? No, not in any case. -- Jack Krupansky -Original Message- From: Michael

RE: search with wildcard

2013-11-21 Thread Andreas Owen
I suppose I have to create another field with different tokenizers and set the boost very low so it doesn't really mess with my ranking, because the word is now in 2 fields. What kind of tokenizer can do the job? From: Andreas Owen [mailto:a...@conx.ch] Sent: Thursday, 21 November 2013

Re: search with wildcard

2013-11-21 Thread Jack Krupansky
You might be able to make use of the dictionary compound word filter, but you will have to build up a dictionary of words to use: http://lucene.apache.org/core/4_5_1/analyzers-common/org/apache/lucene/analysis/compound/DictionaryCompoundWordTokenFilterFactory.html My e-book has some examples

Re: Periodic Slowness on Solr Cloud

2013-11-21 Thread Dave Seltzer
Lots of questions. Okay. In digging a little deeper and looking at the config I see that <nrtMode>true</nrtMode> is commented out. I believe this is the default setting. So I don't know if NRT is enabled or not. Maybe just a red herring. I don't know what Garbage Collector we're using. In this test

Re: How to index X™ as &#8482; (HTML decimal entity)

2013-11-21 Thread Jack Krupansky
there is not really anything special about special characters Well, the distinction was about named entities, which are indeed special. Besides, in general, for more sophisticated text processing, character types are a valid distinction. But all of this begs the question of the original

Re: confirm subscribe to solr-user@lucene.apache.org

2013-11-21 Thread Paule LECUYER
I confirm.

How to implement a conditional copyField working for partial updates ?

2013-11-21 Thread Paule LECUYER
Hello, I'm using Solr 4.x. In my solr schema I have the following fields defined : <field name="content" type="text_general" indexed="false" stored="true" multiValued="true" /> <field name="all" type="text_general" indexed="true" stored="false" multiValued="true" termVectors="true" /> <field

Re: How to index X™ as &#8482; (HTML decimal entity)

2013-11-21 Thread Jack Krupansky
Ah... now I understand your perspective - you have taken a narrow view of what text is. A broader view is that it can contain formatting and special entities as well, or rich text in general. My read is that it all depends on the nature of the application and its requirements, not a one size

Re: How to index X™ as &#8482; (HTML decimal entity)

2013-11-21 Thread Michael Sokolov
I have to agree w/Walter. Use unicode as a storage format. The entity encodings are for transfer/interchange. Encode/decode on the way in and out if you have to. Would you store "a" as "&#65;"? It makes it impossible to search for, for one thing. What if someone wants to search for the TM

Re: How to index X™ as &#8482; (HTML decimal entity)

2013-11-21 Thread Walter Underwood
I know all about formatted text -- I worked at MarkLogic. That is why I mentioned the XML Infoset. Numeric entities are part of the final presentation, really, part of the encoding. They should never be stored. Always store the Unicode. Numeric and named entities are a convenience for tools

Re: Indexing data to a specific collection in Solr 4.5.0

2013-11-21 Thread Reyes, Mark
So then, $ java -jar post.jar Durl=http://localhost:8983/solr/collection2/update solr.xml monitor.xml On 11/21/13, 8:14 AM, xiezhide xiezh...@gmail.com wrote: add Durl=http://localhost:8983/solr/collection2/update when running post.jar, (Sent from 189 Mail) Reyes, Mark mark.re...@bpiedu.com wrote: Hi

Re: How to index X™ as &#8482; (HTML decimal entity)

2013-11-21 Thread Jack Krupansky
Would you store "a" as "&#65;"? No, not in any case. -- Jack Krupansky -Original Message- From: Michael Sokolov Sent: Thursday, November 21, 2013 8:56 AM To: solr-user@lucene.apache.org Subject: Re: How to index X™ as &#8482; (HTML decimal entity) I have to agree w/Walter. Use unicode as a

RE: Periodic Slowness on Solr Cloud

2013-11-21 Thread Doug Turnbull
Dave you might want to connect JVisualVm and see if there's any pattern with latency and garbage collection. That's a frequent culprit for periodic hits in latency. More info here http://docs.oracle.com/javase/6/docs/technotes/guides/visualvm/jmx_connections.html There's a couple GC

a function query of time, frequency and score.

2013-11-21 Thread sling
Hi, guys. I indexed 1000 documents, which have fields like title, ptime and frequency. The title is a text field, the ptime is a date field, and the frequency is an int field. The frequency field goes up and down; say, sometimes its value is 0, and sometimes its value is 999. Now, in my app, the

Re: Periodic Slowness on Solr Cloud

2013-11-21 Thread Doug Turnbull
Additional info on GC selection http://www.oracle.com/technetwork/java/javase/gc-tuning-6-140523.html#available_collectors If response time is more important than overall throughput and garbage collection pauses must be kept shorter than approximately one second, then select the concurrent

Re: Periodic Slowness on Solr Cloud

2013-11-21 Thread Dave Seltzer
Thanks Doug! One thing I'm not clear on is how do I know if this is in fact related to Garbage Collection. If you're right, and the cluster is only as slow as its slowest link, how do I determine that this is GC? Do I have to run the profiler on all eight nodes? Or is it a matter of turning on

Re: SolrServerException while adding an invalid UNIQUE_KEY in solr 4.4

2013-11-21 Thread RadhaJayalakshmi
Thanks Shawn for your response. So, from your email, it seems that unique_key validation is handled differently from other field validation. But what I am not very clear on is what the unique_key has to do with finding the live server. Because if there is any mismatch in the unique_key, it is

Re: Periodic Slowness on Solr Cloud

2013-11-21 Thread Shawn Heisey
On 11/21/2013 6:41 PM, Dave Seltzer wrote: In digging a little deeper and looking at the config I see that <nrtMode>true</nrtMode> is commented out. I believe this is the default setting. So I don't know if NRT is enabled or not. Maybe just a red herring. I had never seen this setting before.

Re: Best implementation for multi-price store?

2013-11-21 Thread Alejandro Marqués Rodríguez
Hi Robert, That was the idea, dynamic fields, so, as you said, it is easier to sort and filter. Besides, having dynamic fields it would be easier to add new stores, as I wouldn't have to modify the schema :) Thanks for the answer! 2013/11/21 Petersen, Robert robert.peter...@mail.rakuten.com

Re: SolrServerException while adding an invalid UNIQUE_KEY in solr 4.4

2013-11-21 Thread Shawn Heisey
On 11/21/2013 9:51 PM, RadhaJayalakshmi wrote: Thanks Shawn for your response. So, from your email, it seems that unique_key validation is handled differently from other field validation. But what I am not very clear on is what the unique_key has to do with finding the live server. Because if