Re: Apostrophes in fields

2013-09-03 Thread Jack Krupansky

Show us your full field type with analyzer.

I suspect that the problem is that one of the index-time filters is turning 
"dev's" into "devs" (WDF does that), but at query-time there is no filter 
that removes a trailing apostrophe.


Use the Solr Admin UI Analysis page to see how "dev's" gets indexed and how
"dev'" gets analyzed at query time.
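A toy illustration of the mismatch described above. It assumes a hypothetical index-time chain that lowercases and strips apostrophes (roughly what WDF's catenation does to "dev's") and a hypothetical query-time chain that does not. None of these functions are Solr or Lucene APIs; they only model the idea that both sides must normalize the same way.

```python
def strip_apostrophes(token: str) -> str:
    """Remove every ASCII apostrophe from a token (hypothetical normalization)."""
    return token.replace("'", "")


def index_analyze(text: str) -> list[str]:
    # Hypothetical index-time chain: lowercase, whitespace split, strip apostrophes
    return [strip_apostrophes(t) for t in text.lower().split()]


def query_analyze_naive(text: str) -> list[str]:
    # Hypothetical query-time chain that lacks the apostrophe step
    return text.lower().split()


def query_analyze_symmetric(text: str) -> list[str]:
    # Reuse the index-time chain so both sides normalize identically
    return index_analyze(text)


docs = ["dev", "dev's", "devendra"]

# Tiny inverted index: term -> ids of documents containing it
index: dict[str, set[int]] = {}
for doc_id, text in enumerate(docs):
    for term in index_analyze(text):
        index.setdefault(term, set()).add(doc_id)


def search(query_terms: list[str]) -> set[int]:
    """Return ids of documents containing every query term (simple AND)."""
    result = None
    for term in query_terms:
        hits = index.get(term, set())
        result = set(hits) if result is None else result & hits
    return result if result else set()
```

Here `search(query_analyze_naive("dev'"))` returns an empty set because the term "dev'" never reaches the index, while `search(query_analyze_symmetric("dev'"))` returns `{0}`. In this toy "dev's" is indexed as "devs", so it still does not match a query for "dev"; whether that is acceptable depends on the rest of the real analyzer chain.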



-- Jack Krupansky

-----Original Message-----
From: devendra W
Sent: Tuesday, September 03, 2013 5:59 AM
To: solr-user@lucene.apache.org
Subject: Re: Apostrophes in fields

In my case, the fields with an apostrophe are not returned in the results.

When I search for "dev", it shows me the following results:
dev
dev's
devendra

but when I search for "dev'" (dev followed by only an apostrophe),
nothing is returned.

What could be the workaround?


Thanks
Devendra



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Apostrophes-in-fields-tp475058p4087910.html
Sent from the Solr - User mailing list archive at Nabble.com. 



Re: Apostrophes in fields

2013-09-03 Thread Shawn Heisey
On 9/3/2013 3:59 AM, devendra W wrote:
> In my case, the fields with an apostrophe are not returned in the results.

Don't use special characters in field names.  If it wouldn't work as a
variable name, function name (or other identifier) in a typical
programming language (Java, C, Perl), then it will probably cause you
problems with a field name.

This basically means: 7-bit ASCII only.  Starts with a letter, contains
only letters, numbers, and the underscore.

Most punctuation other than the underscore has a special meaning to
Solr.  Using extended characters (UTF-8, or those beyond 7-bit ASCII)
*might* work, but it's fairly easy to screw that up and use the wrong
character set, so it's better if you just don't do it.
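The rule of thumb above can be written down as a check. A minimal sketch, assuming the rule exactly as stated (7-bit ASCII, a leading letter, then letters, digits, or underscores). `is_safe_field_name` is a hypothetical helper, not something Solr provides, and Solr itself accepts some names this stricter check rejects (for example the underscore-prefixed `_version_` field in its example schema).

```python
import re

# Hypothetical encoding of the rule of thumb: an ASCII letter first,
# followed only by ASCII letters, digits, or underscores
SAFE_FIELD_NAME = re.compile(r"[A-Za-z][A-Za-z0-9_]*")


def is_safe_field_name(name: str) -> bool:
    """Return True if `name` satisfies the conservative field-name rule."""
    return SAFE_FIELD_NAME.fullmatch(name) is not None
```

With this check, `"author"` and `"sku_2"` pass, while `"o'hara"`, `"2nd_author"`, and `"auteur_é"` are rejected.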

Thanks,
Shawn



Re: Apostrophes in fields

2013-09-03 Thread devendra W
In my case, the fields with an apostrophe are not returned in the results.

When I search for "dev", it shows me the following results:
dev
dev's
devendra

but when I search for "dev'" (dev followed by only an apostrophe),
nothing is returned.

What could be the workaround?


Thanks
Devendra





Re: Apostrophes in fields

2007-01-16 Thread Nick Jenkin

Using fuzzy searching fixed the problem; I will have a play with
the analyzers and see if I can get it working nicely.

Thanks again, much appreciated.






--
- Nick


Re: Apostrophes in fields

2007-01-16 Thread Chris Hostetter

: This problem is why some sloppiness is recommended when dealing with
: WordDelimiterFilter.

particularly when using the generate___Parts="true" options

Nick: if you want simpler matching like this, you might want to consider
simplifying your definition of "text" ... if you look at the "textTight"
fieldtype in the example schema (used by the field "sku") you'll see a
simpler usage of WordDelimiterFilter ... alternately you may just want to
use lucene's basic StandardAnalyzer ... i believe it strips apostrophes.

as a real last resort, you could use the recently added
PatternReplaceFilter to strip out apostrophes prior to
WordDelimiterFilter (if you like everything WordDelim does for you except
splitting on apostrophes)
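The effect of stripping apostrophes before the word-delimiter step can be sketched as follows. Both functions are hypothetical, much-simplified stand-ins written for illustration: `word_delimiter_split` only models WordDelimiterFilter's splitting into word parts, and `strip_apostrophes` only models a PatternReplaceFilter configured to replace "'" with nothing.

```python
import re


def word_delimiter_split(token: str) -> list[str]:
    # Hypothetical stand-in for WordDelimiterFilter word-part generation:
    # split on any run of characters that are not lowercase letters or digits
    return [part for part in re.split(r"[^0-9a-z]+", token) if part]


def strip_apostrophes(token: str) -> str:
    # Hypothetical stand-in for a PatternReplaceFilter that replaces "'" with ""
    return re.sub(r"'", "", token)


def analyze(text: str, strip_first: bool) -> list[str]:
    """Lowercase, optionally strip apostrophes, then split like the WDF stand-in."""
    terms = []
    for token in text.lower().split():
        if strip_first:
            token = strip_apostrophes(token)
        terms.extend(word_delimiter_split(token))
    return terms
```

Without the pre-strip, `analyze("Shelley O'Hara", strip_first=False)` yields `["shelley", "o", "hara"]`; with it, both "Shelley O'Hara" and "Shelley Ohara" yield `["shelley", "ohara"]`, so an exact phrase query for either spelling lines up.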

:   - optionally index ohara at *both* "o" and "hara"

then searching for "Shelley ohara memorial" fails unless you have
slop .. if you need slop, you might as well not index it twice (not to
mention it throws off the tf/idf calculations)
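The objection to indexing "ohara" at both positions can be checked with a small position model. This is a simplified sketch, not Lucene's implementation: each chosen term position is shifted back by that term's offset in the query, and the spread of the shifted positions (max minus min) is treated as the slop needed. It ignores repeated query terms and scoring, and the field "Shelley O'hara Memorial" with its positions is hypothetical.

```python
from itertools import product


def min_slop(positions, phrase):
    """Smallest slop at which `phrase` matches under the simplified model,
    or None if some phrase term is absent from the field."""
    candidates = [positions.get(term, []) for term in phrase]
    if not all(candidates):
        return None
    best = None
    for combo in product(*candidates):
        # Shift each position by the term's offset within the query phrase
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        spread = max(shifted) - min(shifted)
        best = spread if best is None else min(best, spread)
    return best


# Hypothetical field "Shelley O'hara Memorial", with the catenated "ohara"
# indexed at both the "o" position (1) and the "hara" position (2)
indexed_twice = {
    "shelley": [0],
    "o": [1],
    "ohara": [1, 2],
    "hara": [2],
    "memorial": [3],
}
```

`min_slop(indexed_twice, ["shelley", "ohara", "memorial"])` returns 1, so the exact phrase (slop 0) fails, while the split form `["shelley", "o", "hara", "memorial"]` returns 0. The two entries for "ohara" also show the tf/idf concern: a single occurrence in the text produces a term frequency of 2.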

:   - pick the "alignment" based on the token position in the stream...
: right-justify the catenations if it's the first token, otherwise
: left-justify.  One could try to identify proper names and do the
: justification correctly too (blech).

oh for the love of god please no.



-Hoss



Re: Apostrophes in fields

2007-01-16 Thread Yonik Seeley

On 1/16/07, Mike Klaas <[EMAIL PROTECTED]> wrote:

> It appears to be matching author:"Shelley Ohara" but when I do this
> search no results are returned, searches like author:"Shelley O hara",
> author:"Shelley O'hara" work as expected. Any ideas?


This problem is why some sloppiness is recommended when dealing with
WordDelimiterFilter.

"Shelley Ohara"~1 should work.


Hmm, shouldn't "ohara" be generated at the same position as "o", not
"hara"?  It looks like it is failing to do exact phrase matching
because the index contains "shelley o (ohara|hara)"


The problem is, if you do it one way, the other way breaks.  If you
index "ohara" with "o", then a field like "O'hara Shelley" wouldn't
match a query like "ohara shelley".

There are a few possible options:
 - optionally index ohara at *both* "o" and "hara"
 - pick the "alignment" based on the token position in the stream...
right-justify the catenations if it's the first token, otherwise
left-justify.  One could try to identify proper names and do the
justification correctly too (blech).
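The alignment trade-off, and why `"Shelley Ohara"~1` helps, can be reproduced with a small position model. A simplified sketch, not Lucene's actual matcher: it shifts each chosen position by the term's offset in the query and accepts a match when the spread of the shifted positions is at most `slop`. The token positions are hypothetical, following the layout Mike describes ("ohara" stacked on "hara") and the alternative of stacking it on "o".

```python
from itertools import product


def build_index(tokens):
    """Map each term to its list of positions; tokens are (term, position) pairs."""
    index = {}
    for term, pos in tokens:
        index.setdefault(term, []).append(pos)
    return index


def phrase_matches(index, phrase, slop=0):
    """Simplified sloppy-phrase check; ignores repeated query terms and scoring."""
    candidates = [index.get(term, []) for term in phrase]
    if not all(candidates):
        return False
    for combo in product(*candidates):
        shifted = [pos - offset for offset, pos in enumerate(combo)]
        if max(shifted) - min(shifted) <= slop:
            return True
    return False


# Catenated "ohara" stacked on the position of "hara" (the layout Mike observed)
right = {
    "Shelley O'hara": build_index([("shelley", 0), ("o", 1), ("hara", 2), ("ohara", 2)]),
    "O'hara Shelley": build_index([("o", 0), ("hara", 1), ("ohara", 1), ("shelley", 2)]),
}
# Catenated "ohara" stacked on the position of "o" instead
left = {
    "Shelley O'hara": build_index([("shelley", 0), ("o", 1), ("ohara", 1), ("hara", 2)]),
    "O'hara Shelley": build_index([("o", 0), ("ohara", 0), ("hara", 1), ("shelley", 2)]),
}
```

With the right-stacked layout the exact phrase `["shelley", "ohara"]` fails on "Shelley O'hara" but `["ohara", "shelley"]` succeeds on "O'hara Shelley"; the left-stacked layout flips both results. Passing `slop=1` makes all four cases match, which is the effect of the `~1` suggestion.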

-Yonik


Re: Apostrophes in fields

2007-01-16 Thread Mike Klaas

On 1/16/07, Nick Jenkin <[EMAIL PROTECTED]> wrote:

Hi Jeff, Bertrand
Thanks for your help,

The analyzers I am using are the same as in the example schema.xml
Author field:

analysis result:
http://nickjenkin.com/misc/solr.jpg

It appears to be matching author:"Shelley Ohara" but when I do this
search no results are returned, searches like author:"Shelley O hara",
author:"Shelley O'hara" work as expected. Any ideas?


Hmm, shouldn't "ohara" be generated at the same position as "o", not
"hara"?  It looks like it is failing to do exact phrase matching
because the index contains "shelley o (ohara|hara)"

-Mike


Re: Apostrophes in fields

2007-01-16 Thread Nick Jenkin

Hi Jeff, Bertrand
Thanks for your help,

The analyzers I am using are the same as in the example schema.xml
Author field:

analysis result:
http://nickjenkin.com/misc/solr.jpg

It appears to be matching author:"Shelley Ohara" but when I do this
search no results are returned, searches like author:"Shelley O hara",
author:"Shelley O'hara" work as expected. Any ideas?
Thanks
-Nick





--
- Nick


Re: Apostrophes in fields

2007-01-15 Thread Bertrand Delacretaz

On 1/16/07, Jeff Rodenburg <[EMAIL PROTECTED]> wrote:

Nick - this depends on the analyzer used to index the field as well as the
analyzer used in your search query


Note that the Solr "analysis" page, in the admin interface, allows you
to see exactly how your field's content is converted for indexing.
There's an example at http://www.xml.com/lpt/a/1668 in the "Content
Analysis" part of the article.
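The kind of stacked token stream the analysis page reveals can be imitated with a toy filter. `wdf_like` is a hypothetical, much-simplified stand-in for WordDelimiterFilter with word-part generation and word catenation turned on: it splits on non-alphanumerics and, when a token breaks into several parts, also emits the catenated form at the position of the last part, which matches the layout Mike reports for "Shelley O'hara". Real WordDelimiterFilter output depends on its options and the Solr version.

```python
import re


def wdf_like(text):
    """Return (term, position) pairs from a simplified word-delimiter pass."""
    out = []
    pos = 0
    for raw in text.lower().split():
        # Split each whitespace token into alphanumeric parts
        parts = [p for p in re.split(r"[^0-9a-z]+", raw) if p]
        for part in parts:
            out.append((part, pos))
            pos += 1
        if len(parts) > 1:
            # Catenated form stacked on the position of the last part
            out.append(("".join(parts), pos - 1))
    return out
```

`wdf_like("Shelley O'Hara")` returns `[("shelley", 0), ("o", 1), ("hara", 2), ("ohara", 2)]`, i.e. "shelley o (ohara|hara)", whereas `wdf_like("Shelley Ohara")` returns `[("shelley", 0), ("ohara", 1)]`; printing each pair with its position gives a rough text version of what the analysis page shows.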

-Bertrand


Re: Apostrophes in fields

2007-01-15 Thread Jeff Rodenburg

Nick - this depends on the analyzer used to index the field as well as the
analyzer used in your search query.  This gets handled in solr with the
fieldtype and requesthandler.  Referencing the sample schema.xml off the
wiki site, I would start with fieldtype="text" and go from there.  If it
doesn't address apostrophes (it splits on non-alpha chars) you can easily
extend it through configuration to reference the necessary filter factory
class.

Hope this helps.

-- j

On 1/15/07, Nick Jenkin <[EMAIL PROTECTED]> wrote:


Hi
This is probably more of a lucene question, but:
I have an author field,

If I query author:"Shelley Ohara" - no results are returned
If I query author:"Shelley O'hara" - many results are returned,

Is it possible to get Solr to ignore apostrophes in queries like the one
above?

e.g. doc

  Shelley  O'Hara
  true
  long description
  9780764559747
  Paperback
  IDGP
  Kierkegaard Within Your Grasp
  2004

Thanks
--
- Nick