Re: Improper Solr Search results

Alessandro Benedetti Thu, 24 Nov 2022 10:07:44 -0800

I agree with Markus. Also, in your scenarios you are using double quotes
(""), which in Apache Solr have a very specific meaning (phrase queries:
https://solr.apache.org/guide/solr/latest/query-guide/standard-query-parser.html#proximity-searches).


But are you actually using phrase queries, or did you add the quotes just to
describe your problem? I would be cautious about that!
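To make the difference concrete, here is a minimal sketch, assuming Solr's
standard (lucene) query parser, a hypothetical full-text field named `text`
(the real field name depends on your schema), and assuming that the
"Contains all of these words" option ends up as a boolean AND of the terms.
An AND query matches ABC and DEF anywhere in the field, in any order; only a
quoted phrase requires them to be adjacent and in order.

```python
def all_words_query(field, words):
    """Boolean AND: every term must occur somewhere in the field."""
    # Matches "...ABC.....DEF..." and "...DEF...ABC..." alike, because the
    # order of and distance between the terms are not constrained.
    return " AND ".join(f"{field}:{w}" for w in words)


def phrase_query(field, words, slop=0):
    """Quoted phrase: terms must occur in order and adjacent (slop=0)."""
    phrase = " ".join(words)
    query = f'{field}:"{phrase}"'
    # A positive slop (the ~N proximity suffix) tolerates up to N position moves.
    return f"{query}~{slop}" if slop else query


if __name__ == "__main__":
    # "text" is a hypothetical field name used only for illustration.
    print(all_words_query("text", ["ABC", "DEF"]))
    print(phrase_query("text", ["ABC", "DEF"]))
    print(phrase_query("text", ["ABC", "DEF"], 2))
```

If the results look like your Scenario 1 list, the search is most likely
being sent in the first form, not the second.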

The long config dump is pretty much unreadable and unusual; it almost feels
like you are using some sort of wrapper around Apache Solr.
I would suggest you just send us your schema.xml, if you like.

Also, if you run your query with the request parameter '...&debug=query',
the output can help us.
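As an illustration, a sketch of such a request URL built with Python's
standard library. The base URL, the core name `solr_index` and the field
`text` are assumptions (the core name is borrowed from the index machine name
in the settings quoted below and may not match the actual Solr core).
`hl=true` and `hl.usePhraseHighlighter=true` are standard Solr highlighting
parameters; the latter already defaults to true and is only spelled out here.

```python
from urllib.parse import urlencode


def build_select_url(base_url, core, query, debug=True):
    """Build a /select URL for a Solr core (hypothetical host and core name)."""
    params = {
        "q": query,
        # Standard highlighting parameters; usePhraseHighlighter defaults to
        # true in Solr but is spelled out here for clarity.
        "hl": "true",
        "hl.usePhraseHighlighter": "true",
    }
    if debug:
        # debug=query adds the parsed query to the response, showing whether
        # Solr treated the input as a phrase or as separate terms.
        params["debug"] = "query"
    return f"{base_url}/{core}/select?{urlencode(params)}"


if __name__ == "__main__":
    print(build_select_url("http://localhost:8983/solr", "solr_index",
                           'text:"ABC DEF"'))
```

In the response, the "parsedquery" entry of the debug section is the part
worth sharing on the list.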

Cheers
--------------------------
*Alessandro Benedetti*
Director @ Sease Ltd.
*Apache Lucene/Solr Committer*
*Apache Solr PMC Member*

e-mail: a.benede...@sease.io


*Sease* - Information Retrieval Applied
Consulting | Training | Open Source

Website: Sease.io <http://sease.io/>
LinkedIn <https://linkedin.com/company/sease-ltd> | Twitter
<https://twitter.com/seaseltd> | Youtube
<https://www.youtube.com/channel/UCDx86ZKLYNpI3gzMercM7BQ> | Github
<https://github.com/seaseltd>


On Wed, 23 Nov 2022 at 12:19, Markus Jelsma <markus.jel...@openindex.io>
wrote:

> Hello,
>
> It is unclear what you are looking for. Do you have a problem with the
> highlighted excerpts, or a problem with the sorting of the top search
> results?
>
> Also, everything below 'Here are the settings what I have used.' is not
> really helpful.
>
> Regards,
> Markus
>
> On Wed, 23 Nov 2022 at 12:35, Raj Krishna <rkris...@sandvine.com> wrote:
>
> > Hi Team,
> > Do we have any leads on this issue?
> >
> > Thanks
> > Raj
> >
> > From: Raj Krishna
> > Sent: Monday, November 21, 2022 2:56 PM
> > To: users@solr.apache.org
> > Subject: Improper Solr Search results
> >
> > Hi Solr team,
> >
> > Solr search is not returning the proper results.
> >
> > Here is what I am looking for:
> >
> > Scenario 1
> > Let's say I search for "ABC DEF" with the "Contains all of these words"
> > configuration.
> > Results I get:
> > .......ABC........................DEF.........
> > .......DEF...........ABC.............
> > .......DEF......................
> > .......ABC............
> >
> > Expected Result:
> > ..........ABC DEF.......
> >
> > In Scenario 1, in some cases when I go to the actual page of one of the
> > partial matches (let's say the 3rd one), I find the exact match on a
> > different line, not in the excerpt displayed in the result.
> >
> > Scenario 2
> > Let's say I search for "ABC DEF" with the "Contains all of these words"
> > configuration.
> > Results I get:
> > .......DEF......................
> > .......ABC............
> >
> > Expected Result:
> > ..........ABC DEF.......
> >
> > In Scenario 2, I don't get the exact match at all.
> >
> >
> >
> >
> > Here are the settings what I have used.
> >
> >
> >
> > Index name
> > Machine name: solr_index
> > Enter the displayed name for the index.
> > Machine-readable name
> > A unique machine-readable name. Can only contain lowercase letters,
> > numbers, and underscores.
> > Datasources
> >  Comment
> > Provides Comment entities for indexing and searching.
> >  Contact message
> > Provides Contact message entities for indexing and searching.
> >  Content
> > Provides Content entities for indexing and searching.
> >  Content moderation state
> > Provides Content moderation state entities for indexing and searching.
> >  Custom block
> > Provides Custom block entities for indexing and searching.
> >  Custom menu link
> > Provides Custom menu link entities for indexing and searching.
> >  File
> > Provides File entities for indexing and searching.
> >  Media
> > Provides Media entities for indexing and searching.
> >  Search task
> > Provides Search task entities for indexing and searching.
> >  Shortcut link
> > Provides Shortcut link entities for indexing and searching.
> >  Simplenews subscriber
> > Provides Simplenews subscriber entities for indexing and searching.
> >  Solr Document
> > Search through external Solr content. (Only works if this index is
> > attached to a Solr-based server.)
> >  Solr Multisite Document
> > Search through a different site's content. (Only works if this index is
> > attached to a Solr-based server.)
> >  Taxonomy term
> > Provides Taxonomy term entities for indexing and searching.
> >  URL alias
> > Provides URL alias entities for indexing and searching.
> >  User
> > Provides User entities for indexing and searching.
> >  Webform submission
> > Provides Webform submission entities for indexing and searching.
> >  Workflow scheduled transition
> > Provides Workflow scheduled transition entities for indexing and
> > searching.
> >  Workflow transition
> > Provides Workflow transition entities for indexing and searching.
> > Select one or more datasources of items that will be stored in this
> > index.
> > CONFIGURE THE CONTENT DATASOURCE
> > BUNDLES
> > LANGUAGES
> > CONFIGURE THE DEFAULT TRACKER
> > Default index tracker which uses a simple database table for tracking
> > items.
> > Indexing order
> >  Index items in the same order in which they were saved
> >  Index the most recent items first
> > The order in which items will be indexed.
> > Server
> >  - No server -
> >  solr index server
> > Select the server this index should use. Indexes cannot be enabled
> > without a connection to a valid, enabled server.
> >  Enabled
> > Only enabled indexes can be used for indexing and searching. This setting
> > will only take effect if the selected server is also enabled.
> > Description
> >
> > Enter a description for the index.
> > INDEX OPTIONS
> >  Read only
> > Do not write to this index or track the status of items in this index.
> >  Index items immediately
> > Immediately index new or updated items instead of waiting for the next
> > cron run. This might have serious performance drawbacks and is generally
> > not advised for larger sites.
> >  Track changes in referenced entities
> > Automatically queue items for re-indexing if one of the field values
> > indexed from entities they reference is changed. (For instance, when
> > indexing the name of a taxonomy term in a Content index, this would lead
> > to re-indexing when the term's name changes.) Enabling this setting can
> > lead to performance problems on large sites when saving some types of
> > entities (an often-used taxonomy term in our example). However, when the
> > setting is disabled, fields from referenced entities can go stale in the
> > search index and other steps should be taken to prevent this.
> > Cron batch size
> > Set how many items will be indexed at once when indexing items during a
> > cron run. "0" means that no items will be indexed by cron for this index,
> > "-1" means that cron should index all items at once.
> > SOLR SPECIFIC INDEX OPTIONS
> >  Finalize index before first search
> > If enabled, other modules could hook in to apply "finalizations" to the
> > index after updates or deletions have happened to indexed items.
> > MULTILINGUAL
> >  Limit to current content language.
> > Limit all search results for custom queries or search pages not managed
> > by Views to current content language if no language is specified in the
> > query.
> >  Include language independent content in search results.
> > This option will include content without a language assigned in the
> > results of custom queries or search pages not managed by Views. For
> > example, if you search for English content, but have an article with
> > language of "undefined", you will see those results as well. If you
> > disable this option, you will only see content that matches the language.
> > HIGHLIGHTER
> > If "Retrieve result data from Solr" and "Highlight retrieved data" are
> > selected for the Solr backend on the server edit page, these highlighting
> > settings will be used.
> > maxAnalyzedChars
> > Specifies the number of characters into a document that Solr should look
> > for suitable snippets.
> > fragmenter
> > Specifies a text snippet generator for highlighted text. The standard
> > fragmenter is gap, which creates fixed-sized fragments with gaps for
> > multi-valued fields. Another option is regex, which tries to create
> > fragments that resemble a specified regular expression. This parameter
> > accepts per-field overrides.
> > REGEX
> > regex.slop
> > When using the regex fragmenter, this parameter defines the factor by
> > which the fragmenter can stray from the ideal fragment size (given by
> > fragsize) to accommodate a regular expression. For instance, a slop of
> > 0.2 with fragsize=100 should yield fragments between 80 and 120
> > characters in length. It is usually good to provide a slightly smaller
> > fragsize value when using the regex fragmenter.
> > regex.pattern
> > Specifies the regular expression for fragmenting. This could be used to
> > extract sentences.
> > regex.maxAnalyzedChars
> > Instructs Solr to analyze only this many characters from a field when
> > using the regex fragmenter (after which, the fragmenter produces
> > fixed-sized fragments). Applying a complicated regex to a huge field is
> > computationally expensive.
> >  usePhraseHighlighter
> > If set, Solr will highlight phrase queries (and other advanced
> > position-sensitive queries) accurately. If false, the parts of the phrase
> > will be highlighted everywhere instead of only when it forms the given
> > phrase.
> >  highlightMultiTerm
> > If set, Solr will highlight wildcard queries (and other MultiTermQuery
> > subclasses). If false, they won't be highlighted at all.
> >  preserveMulti
> > If set, multi-valued fields will return all values in the order they were
> > saved in the index. If false, only values that match the highlight
> > request will be returned.
> >  mergeContiguous
> > Instructs Solr to collapse contiguous fragments into a single fragment. A
> > value of true indicates contiguous fragments will be collapsed into a
> > single fragment. This parameter accepts per-field overrides. The default
> > value, false, is also the backward-compatible setting.
> >  requireFieldMatch
> > If set, highlights terms only if they appear in the specified field. If
> > not set, terms are highlighted in all requested fields regardless of
> > which field matched the query.
> > snippets
> > Specifies the maximum number of highlighted snippets to generate per
> > field. It is possible for any number of snippets from zero to this value
> > to be generated. This parameter accepts per-field overrides.
> > fragsize
> > Specifies the size, in characters, of fragments to consider for
> > highlighting. 0 indicates that no fragmenting should be considered and
> > the whole field value should be used. This parameter accepts per-field
> > overrides.
> > MLT (MORELIKETHIS)
> > TERM MODIFIERS
> > ADVANCED
> >
> >
> >
> > Manage processors for search index Solr index
> >
> > Configure processors which will pre- and post-process data at index and
> > search time. Find more information on the processors documentation page.
> > ENABLED
> >  Boost more recent dates
> > Boost more recent documents and penalize older documents.
> >  Content access
> > Adds content access checks for nodes and comments.
> >  Double Quote Workaround
> > Replaces double quotes in field values and query to work around a bug in
> > Solr streaming expressions.
> >  Entity status
> > Exclude inactive users and unpublished entities (which have a "Published"
> > state) from being indexed.
> >  Highlight
> > Adds a highlighted excerpt to results and highlights returned fields.
> >  HTML filter
> > Strips HTML tags from fulltext fields and decodes HTML entities. Use this
> > processor when indexing HTML data - for example, node bodies for certain
> > text formats. The processor also allows you to boost (or ignore) the contents
> > of specific elements.
> >  Ignore case
> > Makes searches case-insensitive on selected fields.
> > It is recommended not to use this processor with the selected server.
> >  Ignore characters
> > Configure types of characters which should be ignored for searches.
> >  Index hierarchy
> > Allows the indexing of values along with all their ancestors for
> > hierarchical fields (like taxonomy term references)
> >  Number field-based boosting
> > Adds a boost to indexed items based on the value of a numeric field.
> >  Regular expression based replacements
> > Regular expression based replacements.
> >  Reverse entity references
> > Allows indexing of entities that link to the indexed entity.
> >  Role-based access
> > Adds an access check based on a user's roles. This may be sufficient for
> > sites where access is primarily granted or denied based on roles and
> > permissions. For grants-based access checks on "Content" or "Comment"
> > entities the "Content access" processor may be a suitable alternative.
> >  Solr dummy fields
> > Adds dummy fields to all datasources to register pseudo field names that
> > get their values via API, for example
> > hook_search_api_solr_documents_alter().
> >  Stemmer
> > Stems search terms (for example, talking to talk). Currently, this only
> > acts on English language content. It uses the Porter 2 stemmer algorithm
> > (More information). For best results, use after tokenizing.
> > It is recommended not to use this processor with the selected server.
> >  Stopwords
> > Allows you to define stopwords which will be ignored in searches.
> > Caution: Only use after both 'Ignore case' and 'Tokenizer' have run.
> > It is recommended not to use this processor with the selected server.
> >  Tokenizer
> > Splits text into individual words for searching.
> > It is recommended not to use this processor with the selected server.
> >  Transliteration
> > Makes searches insensitive to accents and other non-ASCII characters.
> > It is recommended not to use this processor with the selected server.
> >  Type-specific boosting
> > Adds a boost to indexed items based on their datasource and/or bundle.
> > PROCESSOR ORDER
> > PREPROCESS INDEX
> > HTML filter
> >
> > PREPROCESS QUERY
> > HTML filter
> >
> > Content access
> >
> > Boost more recent dates
> >
> > POSTPROCESS QUERY
> > Highlight
> >
> > Processor settings
> > *       Boost more recent dates: Enabled
> > *       Highlight: Enabled
> > *       HTML filter: Enabled
> > Highlight returned field data
> > Select whether returned fields should be highlighted.
> >  Highlight partial matches
> > When enabled, matches in parts of words will be highlighted as well.
> >  Create excerpt
> > When enabled, an excerpt will be created for searches with keywords,
> > containing all occurrences of keywords in a fulltext field.
> >  Create excerpt even if no search keys are available
> > When enabled, an excerpt will be created even with an empty query string.
> > Excerpt length
> > The requested length of the excerpt, in characters
> > Exclude fields from excerpt
> >  Body (body)
> >  Title (title)
> > Exclude certain fulltext fields from being included in the excerpt.
> > Highlighting prefix
> > Text/HTML that will be prepended to all occurrences of search keywords in
> > highlighted text
> > Highlighting suffix
> > Text/HTML that will be appended to all occurrences of search keywords in
> > highlighted text
> >
> > Please triage this issue.
> > Feel free to ask me for more clarity and details regarding this.
> >
> > Thanks
> > Raj
> >
> > Disclaimer:
> > This communication (including any attachments) is intended for the use of
> > the intended recipient(s) only and may contain information that is
> > considered confidential, proprietary, sensitive and/or otherwise legally
> > protected. Any unauthorized use or dissemination of this communication is
> > strictly prohibited. If you have received this communication in error,
> > please immediately notify the sender by return e-mail message and delete
> > all copies of the original communication. Thank you for your cooperation.
> >
>
