RE: Spellchecking and suggesting part numbers

2014-11-03 Thread Lochschmied, Alexander
Thanks James, this did help a lot.

Is it possible to make DirectSolrSpellChecker prefer suggestions that share the 
maximum number of leading characters with the query term?

Alexander

-Original Message-
From: Dyer, James [mailto:james.d...@ingramcontent.com] 
Sent: Wednesday, September 24, 2014 16:42
To: solr-user@lucene.apache.org
Subject: RE: Spellchecking and suggesting part numbers

Alexander,

You could use a higher value for spellcheck.count, maybe 20 or so, then in your 
application pick out the suggestions that make changes on the right side.

Another option is to use DirectSolrSpellChecker (usually a better choice 
anyhow) and set the "minPrefix" parameter.  This will require up to n characters on 
the left side to match before it will make suggestions.  Taking a quick look at 
the code, it seems to me it won't try to correct anything in this prefix 
region either.  So perhaps you can set this to 2-4 (default=1).  See 
http://lucene.apache.org/core/4_10_0/suggest/org/apache/lucene/search/spell/DirectSpellChecker.html#setMinPrefix%28int%29 .
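
A rough sketch of what the DirectSolrSpellChecker variant could look like in 
solrconfig.xml, reusing the dictionary and field name from the existing setup 
(did_you_mean_part); the concrete minPrefix value is only an example:

<lst name="spellchecker">
  <str name="name">did_you_mean_part</str>
  <str name="classname">solr.DirectSolrSpellChecker</str>
  <str name="field">did_you_mean_part</str>
  <!-- require the first 3 characters to match; they are also left uncorrected -->
  <int name="minPrefix">3</int>
</lst>

For the first option, a handler default such as <str name="spellcheck.count">20</str> 
would return more candidates, with the right-side filtering done in the application.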

James Dyer
Ingram Content Group
(615) 213-4311


-Original Message-
From: Lochschmied, Alexander [mailto:alexander.lochschm...@vishay.com] 
Sent: Wednesday, September 24, 2014 9:06 AM
To: solr-user@lucene.apache.org
Subject: Spellchecking and suggesting part numbers

Hello Solr Users,

We are trying to get suggestions for part numbers using the spellchecker.

Problem scenario:

ABCD1234 // This is the search term
ABCE1234 // This is what we get from spellchecker
ABCD1244 // This is what we would like to get from spellchecker

Characters towards the left of our part numbers are more relevant.


The setup is (most of the XML markup was stripped when this message was archived; 
the surviving fragments are: solr.IndexBasedSpellChecker, ./spellchecker, 
did_you_mean_part, did_you_mean_part, on, spellcheck_part). A possible 
reconstruction is sketched below.

Can we tweak the setup so that we get more relevant part numbers?
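
A possible reconstruction of the stripped configuration above, based only on the 
surviving fragments (the searchComponent and request handler names and the overall 
layout are assumptions):

<searchComponent name="spellcheck_part" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">did_you_mean_part</str>
    <str name="classname">solr.IndexBasedSpellChecker</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
    <str name="field">did_you_mean_part</str>
  </lst>
</searchComponent>

<requestHandler name="/select" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck.dictionary">did_you_mean_part</str>
    <str name="spellcheck">on</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck_part</str>
  </arr>
</requestHandler>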

Thanks,
Alexander




Spellchecking and suggesting part numbers

2014-09-24 Thread Lochschmied, Alexander
Hello Solr Users,

We are trying to get suggestions for part numbers using the spellchecker.

Problem scenario:

ABCD1234 // This is the search term
ABCE1234 // This is what we get from spellchecker
ABCD1244 // This is what we would like to get from spellchecker

Characters towards the left of our part numbers are more relevant.


The setup is (most of the XML markup was stripped when this message was archived; 
the surviving fragments are: solr.IndexBasedSpellChecker, ./spellchecker, 
did_you_mean_part, did_you_mean_part, on, spellcheck_part; a possible 
reconstruction is sketched in the thread above).

Can we tweak the setup so that we get more relevant part numbers?

Thanks,
Alexander


RE: Special NGRAMish requirement

2014-02-04 Thread Lochschmied, Alexander
Hi Otis and everybody,

I am not sure if Solr works this way; actually, I doubt it...

Let's say I look for "ABC", and I set up my EdgeNGram to create everything down to 
one character: "ABC", "AB", "A".
And maybe I have only two docs in the index, with "ABCD" and "AB", and they are 
also set up with that EdgeNGram definition.

So in this case, I think what I would want is only the first document ("ABCD"), 
since the match in the other doc matches fewer characters. So I would want it 
not to be part of the results at all (not just have a lower score).

Although I am curious about your answers, we probably do not need it anymore. 
So if there is no (easy) way to do this, it is not a problem for us.

Thanks,
Alexander


-Original Message-
From: Otis Gospodnetic [mailto:otis.gospodne...@gmail.com] 
Sent: Tuesday, February 4, 2014 03:32
To: solr-user@lucene.apache.org
Subject: Re: Special NGRAMish requirement

Hi,

Can you provide an example, Alexander?

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Feb 3, 2014 5:28 AM, "Lochschmied, Alexander" < 
alexander.lochschm...@vishay.com> wrote:

> Hi,
>
> we need to use something very similar to EdgeNGram (minGramSize="1"
> maxGramSize="50" side="front").
> The only thing missing is that we would like to reduce the number of 
> matches. The request we need to implement is returning only those 
> matches with the longest tokens (or terms if that is the right word).
>
> Is there a way to do this in Solr (not necessarily with EdgeNGram)?
>
> Thanks,
> Alexander
>


Special NGRAMish requirement

2014-02-03 Thread Lochschmied, Alexander
Hi,

We need to use something very similar to EdgeNGram (minGramSize="1" 
maxGramSize="50" side="front").
The only thing missing is that we would like to reduce the number of matches. 
The request we need to implement is returning only those matches with the 
longest tokens (or terms if that is the right word).

Is there a way to do this in Solr (not necessarily with EdgeNGram)?

Thanks,
Alexander


RE: About Suggestions

2013-07-16 Thread Lochschmied, Alexander
Thanks Erick, that is what I suspected. We are very happy with the four 
suggestions in the example (and all the others), but we would like to know 
which of them represents a full part number.
Can you elaborate a little more how that could be achieved?

Best regards,
Alexander

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Tuesday, July 16, 2013 14:09
To: solr-user@lucene.apache.org
Subject: Re: About Suggestions

Garbage in, garbage out 

Your indexing analysis chain is breaking up the tokens via the 
EdgeNgramTokenizer and _putting those values in the index_.
Then the TermsComponent is looking _only_ at the tokens in the index and giving 
you back exactly what you're asking for.

So no, there's no way with that analysis chain to get only complete terms, at 
that level the fact that a term was part of a larger input token has been lost. 
In fact, if you were to enter something like terms.prefix=1n1 you'd likely see 
all your 3-grams that start with 1n1 etc.

So use a copyField and put these in a separate field that has only whole tokens, 
or just take the EdgeNgramTokenizer out of your current definition. If the 
latter, blow away your index and re-index from scratch.
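
A minimal sketch of the copyField idea, assuming the n-grammed field is named 
"suggest" (as in the terms.fl parameter quoted below) and an illustrative name for 
the whole-value field:

<field name="suggest_whole" type="string" indexed="true" stored="false" multiValued="true"/>
<copyField source="suggest" dest="suggest_whole"/>

Pointing terms.fl at suggest_whole would then only return complete part numbers, 
since the string type keeps each value as a single whole token.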

Best
Erick

On Tue, Jul 16, 2013 at 4:48 AM, Lochschmied, Alexander 
 wrote:
> Hi Erick and everybody else!
>
> Thanks for trying to help. Here is the example:
>
> .../terms?terms.regex.flag=case_insensitive&terms.fl=suggest&terms=true&terms.limit=20&terms.sort=index&terms.prefix=1n1187
>
> returns
>
> [four suggested terms were returned, each with a count of 1; the term values 
> themselves were stripped when this message was archived]
>
> This list contains 3 complete part numbers but the third item (1n1187r) is 
> not a complete part number. Is there a way to make terms tell if a term 
> represents a complete value?
> (My guess is that this gets lost after ngram but I'm still hoping 
> something can be done.)
>
> More config details (most of the XML markup was stripped when this message was 
> archived; the surviving fragments show a non-required, multiValued field and a 
> fieldType with positionIncrementGap="100", whose index analyzer contains a 
> stopword filter (words="stopwords.txt", enablePositionIncrements="true") and an 
> edge n-gram filter (maxGramSize="20", side="front"), and whose query analyzer 
> contains the same stopword filter):
>
> Thanks,
> Alexander
>
>
> -Original Message-
> From: Erick Erickson [mailto:erickerick...@gmail.com]
> Sent: Saturday, July 13, 2013 19:58
> To: solr-user@lucene.apache.org
> Subject: Re: About Suggestions
>
> Not quite sure what you mean here, a couple of examples would help.
>
> But since the term is using keyword tokenizer, then each thing you get back 
> is a complete term, by definition. So I'm not quite sure what you're asking 
> here.
>
> Best
> Erick
>
> On Fri, Jul 12, 2013 at 4:48 AM, Lochschmied, Alexander 
>  wrote:
>> Hi Solr people!
>>
>> We need to suggest part numbers in alphabetical order adding up to four 
>> characters to the already entered part number prefix. That works quite well 
>> with terms component acting on a multivalued field with keyword tokenizer 
>> and edge nGram filter. I am mentioning "part numbers" to indicate that each 
>> item in the multivalued field is a string without whitespace and where 
>> special characters like dashes cannot be seen as separators.
>>
>> Is there a way to know if the term (the suggestion) represents such a 
>> complete part number (without doing another query for each suggestion)?
>>
>> Since we are using SolrJ, what we would need is something like
>> boolean Term.isRepresentingCompleteFieldValue()
>>
>> Thanks,
>> Alexander


RE: About Suggestions

2013-07-16 Thread Lochschmied, Alexander
Hi Erick and everybody else!

Thanks for trying to help. Here is the example: 

.../terms?terms.regex.flag=case_insensitive&terms.fl=suggest&terms=true&terms.limit=20&terms.sort=index&terms.prefix=1n1187

returns

[four suggested terms were returned, each with a count of 1; the term values 
themselves were stripped when this message was archived]

This list contains 3 complete part numbers but the third item (1n1187r) is not 
a complete part number. Is there a way to make terms tell if a term represents 
a complete value?
(My guess is that this gets lost after ngram but I'm still hoping something can 
be done.)

More config details (the field and fieldType definitions were stripped when this 
message was archived; the reply above quotes the surviving fragments, and a 
possible reconstruction is sketched below):

Thanks,
Alexander
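
A minimal sketch of what the stripped definitions could have looked like, based on 
the fragments quoted in the reply above and on the keyword tokenizer / edge n-gram 
setup described elsewhere in this thread; everything beyond the surviving attributes 
(required, multiValued, positionIncrementGap, the stopword filter and the edge 
n-gram settings) is an assumption:

<field name="suggest" type="suggest_ngram" indexed="true" stored="false"
       required="false" multiValued="true"/>

<fieldType name="suggest_ngram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" enablePositionIncrements="true"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="20" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory" words="stopwords.txt" enablePositionIncrements="true"/>
  </analyzer>
</fieldType>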


-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Saturday, July 13, 2013 19:58
To: solr-user@lucene.apache.org
Subject: Re: About Suggestions

Not quite sure what you mean here, a couple of examples would help.

But since the term is using keyword tokenizer, then each thing you get back is 
a complete term, by definition. So I'm not quite sure what you're asking 
here.

Best
Erick

On Fri, Jul 12, 2013 at 4:48 AM, Lochschmied, Alexander 
 wrote:
> Hi Solr people!
>
> We need to suggest part numbers in alphabetical order adding up to four 
> characters to the already entered part number prefix. That works quite well 
> with terms component acting on a multivalued field with keyword tokenizer and 
> edge nGram filter. I am mentioning "part numbers" to indicate that each item 
> in the multivalued field is a string without whitespace and where special 
> characters like dashes cannot be seen as separators.
>
> Is there a way to know if the term (the suggestion) represents such a 
> complete part number (without doing another query for each suggestion)?
>
> Since we are using SolrJ, what we would need is something like
> boolean Term.isRepresentingCompleteFieldValue()
>
> Thanks,
> Alexander


About Suggestions

2013-07-12 Thread Lochschmied, Alexander
Hi Solr people!

We need to suggest part numbers in alphabetical order adding up to four 
characters to the already entered part number prefix. That works quite well 
with terms component acting on a multivalued field with keyword tokenizer and 
edge nGram filter. I am mentioning "part numbers" to indicate that each item in 
the multivalued field is a string without whitespace and where special 
characters like dashes cannot be seen as separators.

Is there a way to know if the term (the suggestion) represents such a complete 
part number (without doing another query for each suggestion)?

Since we are using SolrJ, what we would need is something like
boolean Term.isRepresentingCompleteFieldValue()

Thanks,
Alexander


RE: Surprising score?

2013-07-05 Thread Lochschmied, Alexander
Thanks Jeroen and Upayavira!

I read the warning about losing the ability to use index time boosts when I 
disable length normalization. And we actually use it; at least if it means 
having a boost field in the index and doing queries like this:

"{!boost b=boost}( series:RCWP^10 OR otherFileds:queries^2)"

Is there a way to omitNorms and still be able to use {!boost b=boost} ?

Thanks,
Alexander


-Original Message-
From: Upayavira [mailto:u...@odoko.co.uk] 
Sent: Thursday, July 4, 2013 13:07
To: solr-user@lucene.apache.org
Subject: Re: Surprising score?

And be sure to re-index your content.

Upayavira

On Thu, Jul 4, 2013, at 11:28 AM, Jeroen Steggink wrote:
> Hi Alexander,
> 
> This is because you have length normalization enabled for that field.
> http://ir.dcs.gla.ac.uk/wiki/Length_Normalisation
> 
> If you want it disabled set the following:
> 
> <fieldType ... positionIncrementGap="100" omitNorms="true">
> 
> 
>   Jeroen
> 
> On 4-7-2013 11:10, Lochschmied, Alexander wrote:
> > Hi Solr people!
> >
> > querying for "series:RCWP" returns me the response below. Why does "RCWP 
> > Moisture Resistant" score worse than "D/CRCW-P e3" with the field 
> > definition below? OK, we are ignoring dashes and spaces, but I would have 
> > expected that matches towards the beginning score better. Can I change this 
> > behavior (in Solr 4)?
> >
> > --
> > RCWP  (score 3.2698402)
> > D/CRCW-P e3  (score 1.3624334)
> > RCWP Moisture Resistant  (score 0.5449734)
> > --
> >
> > 
> > 
> > [fieldType definition; most of the XML markup was stripped when this message 
> > was archived. The surviving fragments show a pattern-replace step 
> > (pattern="[\-\s]+", replacement=""), a stopword filter (words="stopwords.txt", 
> > enablePositionIncrements="true") and an n-gram filter (maxGramSize="50") on 
> > the index side, and the same pattern-replace step on the query side.]
> >
> > Thanks,
> > Alexander
> 
> 
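
For reference, a minimal sketch of the omitNorms change Jeroen describes above, 
keeping the existing analyzers (the type name is illustrative):

<fieldType name="text_series" class="solr.TextField"
           positionIncrementGap="100" omitNorms="true">
  <!-- existing index and query analyzers unchanged -->
</fieldType>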


Surprising score?

2013-07-04 Thread Lochschmied, Alexander
Hi Solr people!

Querying for "series:RCWP" returns the response below. Why does "RCWP 
Moisture Resistant" score worse than "D/CRCW-P e3" with the field definition 
below? OK, we are ignoring dashes and spaces, but I would have expected that 
matches towards the beginning score better. Can I change this behavior (in Solr 
4)?

--
RCWP  (score 3.2698402)
D/CRCW-P e3  (score 1.3624334)
RCWP Moisture Resistant  (score 0.5449734)
--

[the field definition was stripped when this message was archived; the reply above 
quotes its surviving fragments]

Thanks,
Alexander


RE: Website (crawler for) indexing

2012-09-06 Thread Lochschmied, Alexander
Thanks Rafał and Markus for your comments.

I think Droids has a serious problem with URL parameters in the current version 
(0.2.0) from Maven Central:
https://issues.apache.org/jira/browse/DROIDS-144

I knew about Nutch, but I haven't been able to implement a crawler with it. 
Have you done that or seen an example application?
It's probably easy to call a Nutch jar and make it index a website, and maybe I 
will have to do that.
But as we already have a Java implementation to index other sources, it would 
be nice if we could integrate the crawling part too.

Regards,
Alexander 



Hello!

You can implement your own crawler using Droids
(http://incubator.apache.org/droids/) or use Apache Nutch 
(http://nutch.apache.org/), which is very easy to integrate with Solr and is a 
very powerful crawler.

--
Regards,
 Rafał Kuć
 Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch - ElasticSearch

> This may be a bit off topic: How do you index an existing website and 
> control the data going into the index?

> We already have Java code to process the HTML (or XHTML) and turn it 
> into a SolrJ Document (removing tags and other things we do not want 
> in the index). We use SolrJ for indexing.
> So I guess the question is essentially which Java crawler could be useful.

> We used to use wget on the command line in our publishing process, but we no 
> longer want to do that.

> Thanks,
> Alexander



Website (crawler for) indexing

2012-09-05 Thread Lochschmied, Alexander
This may be a bit off topic: How do you index an existing website and control 
the data going into the index?

We already have Java code to process the HTML (or XHTML) and turn it into a 
SolrJ Document (removing tags and other things we do not want in the index). We 
use SolrJ for indexing.
So I guess the question is essentially which Java crawler could be useful.

We used to use wget on the command line in our publishing process, but we no 
longer want to do that.

Thanks,
Alexander



Unexpected function query sort

2012-08-14 Thread Lochschmied, Alexander
strdist() doesn't seem to work for me (using Solr 3.6.0).

I use a query like this one:
?fl=mycol
&start=0
&rows=20
&q=mycol:abcd OR othercol:abcd
&fq=somecol:someval
&sort=strdist('abcd',mycol,edit) asc

I would have expected all documents with mycol == 'abcd' to be at the top of the 
results (or at the bottom). Is that a correct assumption?
However, I see results starting with such documents (which is good), then 
documents without any value for mycol, and then again some with mycol == 'abcd'.

mycol definition is pretty basic:

[the field and fieldType definition was stripped when this message was archived]


Is there a way to debug such (sort) function queries? Can I see the calculated 
sort value for example?

Thanks,
Alexander



RE: RE: Indexing wildcard patterns

2012-08-13 Thread Lochschmied, Alexander
Here is what we do in SQL:

mysql> select * from _tbl;
+----+------------+
| id | field      |
+----+------------+
|  1 | plain text |
|  2 | wil_c%     |
+----+------------+
2 rows in set (0.14 sec)

mysql> SELECT * FROM _TBL WHERE 'wildcard' LIKE FIELD;
+----+------------+
| id | field      |
+----+------------+
|  2 | wil_c%     |
+----+------------+
1 row in set (0.12 sec)

So the patterns are associated with the actual documents in the database. We 
use those fields as a means to manually customize some searches.

Thanks,
Alexander

-Original Message-
From: Jack Krupansky [mailto:j...@basetechnology.com] 
Sent: Friday, August 10, 2012 18:39
To: solr-user@lucene.apache.org
Subject: Re: RE: Indexing wildcard patterns

"Doc1 has the pattern "AB%CD%" associated with it (somehow?!)."

You need to clarify what you mean by that.

To be clear, Solr support for wildcards is a superset of the SQL LIKE operator, 
and the patterns used in the LIKE operator are NOT stored in the table data, 
but used at query time - same with Solr. In SQL you do not "associate" patterns 
with table data, but rather you query data using a pattern.

Step back and describe the problem you are trying to solve rather than 
prematurely jumping into a proposed solution.

So, if there is something you already do in SQL and now wish to do it in Solr, 
please tell us about it.

-- Jack Krupansky

-Original Message-
From: Lochschmied, Alexander
Sent: Friday, August 10, 2012 5:25 AM
To: solr-user@lucene.apache.org
Subject: RE: Indexing wildcard patterns

I thought my question might be confusing...

I know about Solr providing wildcards in queries, but my problem is different.

I have those patterns associated with my searchable documents before any actual 
search is done.
I need Solr to return the document that is associated with a matching pattern. 
The user does not enter the wildcard pattern; the wildcard pattern must be 
applied by Solr automatically.

So in the example I provided below, a user might enter " ABCDXYZ " and I need 
Solr to return Doc1, as Doc1 has the pattern "AB%CD%" associated with it 
(somehow?!).

Thanks,
Alexander


-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Friday, August 10, 2012 10:34
To: solr-user@lucene.apache.org
Subject: Re: Indexing wildcard patterns



--- On Fri, 8/10/12, Lochschmied, Alexander  
wrote:

> From: Lochschmied, Alexander 
> Subject: Indexing wildcard patterns
> To: "solr-user@lucene.apache.org" 
> Date: Friday, August 10, 2012, 11:07 AM
>
> Coming from a SQL database based search system, we already have a set of 
> defined patterns associated with our searchable documents.
>
> % matches no or any number of characters
> _ matches one character
>
> Example:
> Doc 1: 'AB%CD', 'AB%CD%'
> Doc 2: 'AB_CD'
> ...
>
> Thus Doc 1 matches
> ABXYZCD
> ABCD
> ABCDXYZ
> ...
>
> Whereas Doc 2 matches only
> ABXCD
> ABYCD
> ABZCD
> ...
>
> This can be achieved in SQL WHERE statements using the LIKE operator.
>
> Is there a (similar) way to do this in Solr?

Yes, wildcard search in solr

* matches no or any number of characters
? matches one character

http://lucene.apache.org/core/3_6_0/queryparsersyntax.html#Wildcard%20Searches 



RE: Indexing wildcard patterns

2012-08-10 Thread Lochschmied, Alexander
Thank you Toke, your comments made a lot of sense to me. Luckily we do not have 
many patterns and we just decided to consider only the prefixes up to the first 
wildcard. So we will no longer have to deal with patterns.
Alexander

-Original Message-
From: Toke Eskildsen [mailto:t...@statsbiblioteket.dk] 
Sent: Friday, August 10, 2012 13:29
To: solr-user@lucene.apache.org
Subject: Re: Indexing wildcard patterns

On Fri, 2012-08-10 at 10:07 +0200, Lochschmied, Alexander wrote:
> Coming from a SQL database based search system, we already have a set of 
> defined patterns associated with our searchable documents.
> 
> % matches no or any number of characters
> _ matches one character
> 
> Example:
> Doc 1: 'AB%CD', 'AB%CD%'
> Doc 2: 'AB_CD'

As I understand it: You have a list of (simple) patterns and want to find those 
that matches a given input. When you do it in SQL, it iterates all patterns and 
applies them one at a time.

I am not aware of any mechanism in Lucene/Solr that provides this 
functionality. Implementing a new Query type for this would be a possibility, 
and speed could be somewhat optimized by compiling the patterns only once; but 
as long as the underlying algorithm is "iterate all patterns and see if they 
match", this will not scale very well.

Before speculating any further, it would be nice to know the scale of your 
problem: How many unique patterns are we talking about? Is there any "pattern 
to the patterns", such as specific lengths, maximum number of substitutions or 
literal prefixes?



RE: Indexing wildcard patterns

2012-08-10 Thread Lochschmied, Alexander
I thought my question might be confusing...

I know about Solr providing wildcards in queries, but my problem is different.

I have those patterns associated with my searchable documents before any actual 
search is done.
I need Solr to return the document that is associated with a matching pattern. 
The user does not enter the wildcard pattern; the wildcard pattern must be 
applied by Solr automatically.

So in the example I provided below, a user might enter " ABCDXYZ " and I need 
Solr to return Doc1, as Doc1 has the pattern "AB%CD%" associated with it 
(somehow?!).

Thanks,
Alexander


-Original Message-
From: Ahmet Arslan [mailto:iori...@yahoo.com] 
Sent: Friday, August 10, 2012 10:34
To: solr-user@lucene.apache.org
Subject: Re: Indexing wildcard patterns



--- On Fri, 8/10/12, Lochschmied, Alexander  
wrote:

> From: Lochschmied, Alexander 
> Subject: Indexing wildcard patterns
> To: "solr-user@lucene.apache.org" 
> Date: Friday, August 10, 2012, 11:07 AM
>
> Coming from a SQL database based search system, we already have a set of 
> defined patterns associated with our searchable documents.
> 
> % matches no or any number of characters
> _ matches one character
> 
> Example:
> Doc 1: 'AB%CD', 'AB%CD%'
> Doc 2: 'AB_CD'
> ...
> 
> Thus Doc 1 matches
> ABXYZCD
> ABCD
> ABCDXYZ
> ...
> 
> Whereas Doc 2 matches only
> ABXCD
> ABYCD
> ABZCD
> ...
> 
> This can be achieved in SQL WHERE statements using the LIKE operator.
> 
> Is there a (similar) way to do this in Solr?

Yes, wildcard search in solr

* matches no or any number of characters
? matches one character

 http://lucene.apache.org/core/3_6_0/queryparsersyntax.html#Wildcard%20Searches
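
For reference, the query-time equivalents of the two SQL wildcards look roughly 
like this (the field name is illustrative):

q=partnumber:AB*CD   (* matches zero or more characters, like SQL %)
q=partnumber:AB?CD   (? matches exactly one character, like SQL _)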


Indexing wildcard patterns

2012-08-10 Thread Lochschmied, Alexander
Coming from a SQL database based search system, we already have a set of 
defined patterns associated with our searchable documents.

% matches no or any number of characters
_ matches one character

Example:
Doc 1: 'AB%CD', 'AB%CD%'
Doc 2: 'AB_CD'
...

Thus Doc 1 matches
ABXYZCD
ABCD
ABCDXYZ
...

Whereas Doc 2 matches only
ABXCD
ABYCD
ABZCD
...

This can be achieved in SQL WHERE statements using the LIKE operator.

Is there a (similar) way to do this in Solr?

Thanks,
Alexander


RE: Special suggestions requirement

2012-08-06 Thread Lochschmied, Alexander
Is there anything you cannot do with Solr? :-)
Thanks a lot Erick! I only had to use . instead of ?, e.g.

...:8983/solr/terms?terms.fl=fieldname&terms.limit=100&terms.prefix=abcd&terms.regex.flag=case_insensitive&terms=true&terms.regex=abcd..

Adding terms.sort=index even allows me to sort as I need.

Thanks,
Alexander

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Saturday, August 4, 2012 20:11
To: solr-user@lucene.apache.org
Subject: Re: Special suggestions requirement

Would it work to use TermsComponent with wildcards?
Something like terms.regex="ABCD42??"...

see: http://wiki.apache.org/solr/TermsComponent/

Best
Erick


On Fri, Aug 3, 2012 at 9:07 AM, Michael Della Bitta 
 wrote:
> I could be crazy, but it sounds to me like you need a trie, not a 
> search index: http://en.wikipedia.org/wiki/Trie
>
> But in any case, what you want to do should be achievable. It seems 
> like you need to do EdgeNgrams and facet on the results, where 
> facet.counts > 1 to exclude the actual part numbers, since each of 
> those would be distinct.
>
> I'm on the train right now, so I can't test this. :\
>
> Michael Della Bitta
>
> 
> Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017 
> www.appinions.com Where Influence Isn't a Game
>
>
> On Thu, Aug 2, 2012 at 9:19 PM, Lochschmied, Alexander 
>  wrote:
>> Even with prefix query, I do not get "ABCD02" or any "ABCD02..." back. BTW: 
>> EdgeNGramFilterFactory is used on the field we are getting the 
>> suggestions/spellchecks from.
>> I think the problem is that there are a lot of different part numbers 
>> starting with "ABCD" and every part number has the same length. I showed 
>> only 4 in the example but there might be thousands.
>>
>> Here are some full part number examples that might be in the index:
>> ABCD110040
>> ABCD00
>> ABCD99
>> ABCD155500
>> ...
>>
>> I'm looking for a way to make Solr return distinct list of fixed 
>> length substrings of them, e.g. if "ABCD" is entered, I would need
>> ABCD00
>> ABCD01
>> ABCD02
>> ABCD03
>> ...
>> ABCD99
>>
>> Then if user chose "ABCD42" from the suggestions, I would need
>> ABCD4201
>> ABCD4202
>> ABCD4203
>> ...
>> ABCD4299
>>
>> and so on.
>>
>> I would be able to do some "post processing" if needed or adjust the schema 
>> or indexing process. But the key functionality I need from Solr is returning 
>> distinct set of those suggestions where only the last two characters change. 
>> All of the available combinations of those last two characters must be 
>> considered though. I need to show alpha-numerically sorted suggestions; the 
>> smallest value first.
>>
>> Thanks,
>> Alexander
>>
>> -Original Message-
>> From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com]
>> Sent: Thursday, August 2, 2012 15:02
>> To: solr-user@lucene.apache.org
>> Subject: Re: Special suggestions requirement
>>
>> In this case, we're storing the overall value length and sorting it on that, 
>> then alphabetically.
>>
>> Also, how are your queries fashioned? If you're doing a prefix query, 
>> everything that matches it should score the same. If you're only doing a 
>> prefix query, you might need to add a term for exact matches as well to get 
>> them to show up.
>>
>> Michael Della Bitta
>>
>> 
>> Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017 
>> www.appinions.com Where Influence Isn't a Game
>>
>>
>> On Wed, Aug 1, 2012 at 9:58 PM, Lochschmied, Alexander 
>>  wrote:
>>> Is there a way to offer distinct, alphabetically sorted, fixed length 
>>> options?
>>>
>>> I am trying to suggest part numbers and I'm currently trying to do it with 
>>> the spellchecker component.
>>> Let's say "ABCD" was entered and we have indexed part numbers like
>>> ABCD
>>> ABCD2000
>>> ABCD2100
>>> ABCD2200
>>> ...
>>>
>>> I would like to have 2 characters suggested always, so for "ABCD", 
>>> it should suggest
>>> ABCD00
>>> ABCD20
>>> ABCD21
>>> ABCD22
>>> ...
>>>
>>> No smart sorting is needed, just alphabetically sorting. The problem is 
>>> that for example 00 (or ABCD00) may not be suggested currently as it 
>>> doesn't score high enough. But we are really trying to get all distinct 
>>> values starting from the smallest (up to a certain number of suggestions).
>>>
>>> I was looking already at custom comparator class option. But this would 
>>> probably not work as I would need more information to implement it there 
>>> (like at least the currently entered search term, "ABCD" in the example).
>>>
>>> Thanks,
>>> Alexander


RE: Special suggestions requirement

2012-08-02 Thread Lochschmied, Alexander
Even with prefix query, I do not get "ABCD02" or any "ABCD02..." back. BTW: 
EdgeNGramFilterFactory is used on the field we are getting the 
suggestions/spellchecks from.
I think the problem is that there are a lot of different part numbers starting 
with "ABCD" and every part number has the same length. I showed only 4 in the 
example but there might be thousands.

Here are some full part number examples that might be in the index:
ABCD110040
ABCD00
ABCD99
ABCD155500
...

I'm looking for a way to make Solr return a distinct list of fixed-length 
substrings of them, e.g. if "ABCD" is entered, I would need
ABCD00
ABCD01
ABCD02
ABCD03
...
ABCD99

Then if user chose "ABCD42" from the suggestions, I would need
ABCD4201
ABCD4202
ABCD4203
...
ABCD4299

and so on.

I would be able to do some "post processing" if needed or adjust the schema or 
indexing process. But the key functionality I need from Solr is returning 
distinct set of those suggestions where only the last two characters change. 
All of the available combinations of those last two characters must be 
considered though. I need to show alpha-numerically sorted suggestions; the 
smallest value first.

Thanks,
Alexander

-Original Message-
From: Michael Della Bitta [mailto:michael.della.bi...@appinions.com] 
Sent: Thursday, August 2, 2012 15:02
To: solr-user@lucene.apache.org
Subject: Re: Special suggestions requirement

In this case, we're storing the overall value length and sorting it on that, 
then alphabetically.

Also, how are your queries fashioned? If you're doing a prefix query, 
everything that matches it should score the same. If you're only doing a prefix 
query, you might need to add a term for exact matches as well to get them to 
show up.
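
A rough sketch of both ideas, assuming a setup where each document holds a single 
part number and using illustrative helper fields (part_len holding the value 
length, part_exact holding the untokenized value):

q=part_exact:ABCD*&sort=part_len asc, part_exact asc   (shortest values first, then alphabetically)
q=part_exact:ABCD* OR part_exact:ABCD^10               (prefix query plus a boosted exact-match term)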

Michael Della Bitta


Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017 www.appinions.com 
Where Influence Isn't a Game


On Wed, Aug 1, 2012 at 9:58 PM, Lochschmied, Alexander 
 wrote:
> Is there a way to offer distinct, alphabetically sorted, fixed length options?
>
> I am trying to suggest part numbers and I'm currently trying to do it with 
> the spellchecker component.
> Let's say "ABCD" was entered and we have indexed part numbers like
> ABCD
> ABCD2000
> ABCD2100
> ABCD2200
> ...
>
> I would like to have 2 characters suggested always, so for "ABCD", it 
> should suggest
> ABCD00
> ABCD20
> ABCD21
> ABCD22
> ...
>
> No smart sorting is needed, just alphabetically sorting. The problem is that 
> for example 00 (or ABCD00) may not be suggested currently as it doesn't score 
> high enough. But we are really trying to get all distinct values starting 
> from the smallest (up to a certain number of suggestions).
>
> I was looking already at custom comparator class option. But this would 
> probably not work as I would need more information to implement it there 
> (like at least the currently entered search term, "ABCD" in the example).
>
> Thanks,
> Alexander