Re: Handling acronyms

2021-01-15 Thread Shaun Campbell
Hi Michael

Thanks for that, I'll have a study later.  It's just reminded me of the
expand option, which I meant to have a look at.

Thanks
Shaun

On Fri, 15 Jan 2021 at 14:33, Michael Gibney 
wrote:

> The equivalent terms on the right-hand side of the `=>` operator in the
> example you sent should be separated by a comma. You mention you already
> tried only-comma-separated (e.g. one line: `SRN,Stroke Research Network`)
> and that that yielded unexpected results as well. I would recommend
> pre-case-normalizing all the terms in synonyms.txt (i.e., lower-case), and
> applying the synonym filter _after_ case normalization in the analysis
> chain (there are other ways you could do this, but the key point is that you
> need to pay attention to case and how it interacts with the order in which
> filters are applied).
>
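The ordering Michael describes can be sketched as follows; a minimal illustration assuming a managed schema, with names that are not from Shaun's actual config:

```xml
<!-- Illustrative fieldType: the synonym filter runs *after* lower-casing,
     so every entry in synonyms.txt can safely be kept lower-case. -->
<fieldType name="text_acronyms" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!-- Graph-aware filter; handles multi-word synonyms at query time -->
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```

With this chain, a corresponding lower-cased synonyms.txt line would look like `srn,stroke research network`. Applying the synonym filter only on the query side also sidesteps the index-time expansion issue discussed below in this thread.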
> Re: Charlie's recommendation to apply these at index-time, a word of
> caution (and it's possible that this is in fact the underlying cause of
> some of the unexpected behavior you're observing?): be careful if you're
> using term _expansion_ at index-time (i.e., mapping single terms to
> multiple terms, which I note appears to be what you're trying to do in the
> example lines you provided). Multi-term index-time synonyms can lead to
> unexpected results for positional queries (either explicit phrase queries,
> or implicit, e.g. as configured by `pf` param in edismax). I'm aware of at
> least two good overviews of this topic, one by Mike McCandless focusing on
> Elasticsearch [1], one by Steve Rowe focusing on Solr [2]. The underlying
> issue is related to LUCENE-4312 [3], so both posts (ES- & Solr-related) are
> relevant.
>
> One way to work around this is to "collapse" (rather than expand) synonyms,
> at both index and query time. Another option would be to apply synonym
> expansion only at query-time. It's also worth noting that increasing phrase
> slop (`ps` param, etc.) can cause the issues with index-time synonym
> expansion to "fly under the radar" a little, wrt the most blatant "false
> negative" manifestations of index-time synonym issues for phrase queries.
>
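The "collapse" workaround can be sketched with illustrative synonyms.txt entries like these, applied identically at index and query time:

```text
# synonyms.txt, used in both the index and query analyzers:
# the multi-word form is contracted down to the acronym, so only a
# single token is ever indexed or searched for each concept.
stroke research network => srn
isolated gastric bypass => igbp
```

Because the right-hand side is a single token, positional (phrase) queries never encounter a multi-token synonym graph at index time, which is the root of the LUCENE-4312 problems.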
> [1]
>
> https://www.elastic.co/blog/multitoken-synonyms-and-graph-queries-in-elasticsearch
> [2]
>
> https://lucidworks.com/post/multi-word-synonyms-solr-adds-query-time-support/
> [3] https://issues.apache.org/jira/browse/LUCENE-4312
>
> On Fri, Jan 15, 2021 at 6:18 AM Charlie Hull <
> ch...@opensourceconnections.com> wrote:
>
> > I'm wondering if you should be using these acronyms at index time, not
> > search time. It will make your index bigger and you'll have to re-index
> > to add new synonyms (as they may apply to old documents) but this could
> > be an occasional task, and in the meantime you could use query-time
> > synonyms for the new ones.
> >
> > Maintaining 9000 synonyms in Solr's synonyms.txt file seems unwieldy to
> me.
> >
> > Cheers
> >
> > Charlie
> >
> > On 15/01/2021 09:48, Shaun Campbell wrote:
> > > I have a medical journals search application and I've a list of some
> > 9,000
> > > acronyms like this:
> > >
> > > MSNQ=>MSNQ Multiple Sclerosis Neuropsychological Screening
> Questionnaire
> > > SRN=>SRN Stroke Research Network
> > > IGBP=>IGBP isolated gastric bypass
> > > TOMADO=>TOMADO Trial of Oral Mandibular Advancement Devices for
> > Obstructive
> > > sleep apnoea–hypopnoea
> > > SRM=>SRM standardised response mean
> > > SRT=>SRT substrate reduction therapy
> > > SRS=>SRS Sexual Rating Scale
> > > SRU=>SRU stroke rehabilitation unit
> > > T2w=>T2w T2-weighted
> > > Ab-P=>Ab-P Aberdeen participation restriction subscale
> > > MSOA=>MSOA middle-layer super output area
> > > SSA=>SSA site-specific assessment
> > > SSC=>SSC Study Steering Committee
> > > SSB=>SSB short-stretch bandage
> > > SSE=>SSE sum squared error
> > > SSD=>SSD social services department
> > > NVPI=>NVPI Nausea and Vomiting of Pregnancy Instrument
> > >
> > > I tried to put them in a synonyms file, either just with a comma
> between,
> > > or with an arrow in between and the acronym repeated on the right like
> > > above, and no matter what I try I'm getting really strange search
> > results.
> > > It's like words in one acronym are matching with the same word in
> another
> > > acronym and then searching with that acronym which is completely
> > unrelated.
> > >
> > > I don't think Solr can handle this, but does anyone know of any crafty
> > > tricks in Solr to handle this situation where I can either search by
> the
> > > acronym or by the text?
> > >
> > > Shaun
> > >
> >
> > --
> > Charlie Hull - Managing Consultant at OpenSource Connections Limited
> > 
> > Founding member of The Search Network <https://thesearchnetwork.com/>
> > and co-author of Searching the Enterprise
> > <https://opensourceconnections.com/about-us/books-resources/>
> > tel/fax: +44 (0)8700 118334
> > mobile: +44 (0)7767 825828
> >
>


Re: Handling acronyms

2021-01-15 Thread Shaun Campbell
Hi Charlie

I was indexing at index time only. The synonyms/acronyms were coming from
the published journals xml files so I wasn't expecting to maintain them
myself.  If it worked, I was hoping to update the synonyms file
automatically.
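That generation step could be sketched roughly like this, assuming the acronym/expansion pairs have already been pulled out of the journal XML; this is a pure illustration, not Shaun's actual pipeline:

```python
def synonym_lines(pairs):
    """Render (acronym, expansion) pairs as lower-cased Solr
    synonyms.txt lines, e.g. 'srn,stroke research network'."""
    lines = []
    for acronym, expansion in pairs:
        # Lower-case both sides so the entries match an analysis
        # chain that applies the synonym filter after LowerCaseFilter.
        lines.append(f"{acronym.lower()},{expansion.lower()}")
    return "\n".join(lines)

pairs = [("SRN", "Stroke Research Network"),
         ("IGBP", "isolated gastric bypass")]
print(synonym_lines(pairs))
# srn,stroke research network
# igbp,isolated gastric bypass
```

Regenerating the file this way keeps the entries consistent with a case-normalized analysis chain, whatever casing the source documents use.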

As I just explained to Bernd, I'm finding that because I'm just using
acronyms supplied from the documents there's some overlap in the words used,
and it's giving me unexpected results.  For example, if I enter diabetes it
finds the acronym DM for diabetes mellitus, which then coincides with an
author's initials and puts them at the top of the list, which is completely
wrong, or is it?  Perhaps I was looking for an author DM. Just too much
noise to be useful I think.

Thanks for your input anyway.
Shaun



On Fri, 15 Jan 2021 at 11:18, Charlie Hull 
wrote:

> I'm wondering if you should be using these acronyms at index time, not
> search time. It will make your index bigger and you'll have to re-index
> to add new synonyms (as they may apply to old documents) but this could
> be an occasional task, and in the meantime you could use query-time
> synonyms for the new ones.
>
> Maintaining 9000 synonyms in Solr's synonyms.txt file seems unwieldy to me.
>
> Cheers
>
> Charlie
>
> On 15/01/2021 09:48, Shaun Campbell wrote:
> > I have a medical journals search application and I've a list of some
> 9,000
> > acronyms like this:
> >
> > MSNQ=>MSNQ Multiple Sclerosis Neuropsychological Screening Questionnaire
> > SRN=>SRN Stroke Research Network
> > IGBP=>IGBP isolated gastric bypass
> > TOMADO=>TOMADO Trial of Oral Mandibular Advancement Devices for
> Obstructive
> > sleep apnoea–hypopnoea
> > SRM=>SRM standardised response mean
> > SRT=>SRT substrate reduction therapy
> > SRS=>SRS Sexual Rating Scale
> > SRU=>SRU stroke rehabilitation unit
> > T2w=>T2w T2-weighted
> > Ab-P=>Ab-P Aberdeen participation restriction subscale
> > MSOA=>MSOA middle-layer super output area
> > SSA=>SSA site-specific assessment
> > SSC=>SSC Study Steering Committee
> > SSB=>SSB short-stretch bandage
> > SSE=>SSE sum squared error
> > SSD=>SSD social services department
> > NVPI=>NVPI Nausea and Vomiting of Pregnancy Instrument
> >
> > I tried to put them in a synonyms file, either just with a comma between,
> > or with an arrow in between and the acronym repeated on the right like
> > above, and no matter what I try I'm getting really strange search
> results.
> > It's like words in one acronym are matching with the same word in another
> > acronym and then searching with that acronym which is completely
> unrelated.
> >
> > I don't think Solr can handle this, but does anyone know of any crafty
> > tricks in Solr to handle this situation where I can either search by the
> > acronym or by the text?
> >
> > Shaun
> >
>
> --
> Charlie Hull - Managing Consultant at OpenSource Connections Limited
> 
> Founding member of The Search Network <https://thesearchnetwork.com/>
> and co-author of Searching the Enterprise
> <https://opensourceconnections.com/about-us/books-resources/>
> tel/fax: +44 (0)8700 118334
> mobile: +44 (0)7767 825828
>


Re: Handling acronyms

2021-01-15 Thread Shaun Campbell
Hi Bernd

Thanks for that. I think it is working, but unfortunately what I'm
trying to do seems to be impossible, or at least not logical.  When I enter
a term it goes off and searches using all the matching acronyms, because I'm
finding a term used in more than one synonym, e.g. diabetes.

I think at the end of the day this produces too much "noise" to make any
sense of the results.  I think I will have to park this for now.

Thanks
Shaun

On Fri, 15 Jan 2021 at 10:35, Bernd Fehling 
wrote:

> If you are using multiword synonyms, acronyms, ...
> You should escape the spaces within the multiword terms.
>
> As synonyms.txt:
> SRN, Stroke\ Research\ Network
> IGBP, isolated\ gastric\ bypass
> ...
>
> Regards
> Bernd
>
>
> Am 15.01.21 um 10:48 schrieb Shaun Campbell:
> > I have a medical journals search application and I've a list of some
> 9,000
> > acronyms like this:
> >
> > MSNQ=>MSNQ Multiple Sclerosis Neuropsychological Screening Questionnaire
> > SRN=>SRN Stroke Research Network
> > IGBP=>IGBP isolated gastric bypass
> > TOMADO=>TOMADO Trial of Oral Mandibular Advancement Devices for
> Obstructive
> > sleep apnoea–hypopnoea
> > SRM=>SRM standardised response mean
> > SRT=>SRT substrate reduction therapy
> > SRS=>SRS Sexual Rating Scale
> > SRU=>SRU stroke rehabilitation unit
> > T2w=>T2w T2-weighted
> > Ab-P=>Ab-P Aberdeen participation restriction subscale
> > MSOA=>MSOA middle-layer super output area
> > SSA=>SSA site-specific assessment
> > SSC=>SSC Study Steering Committee
> > SSB=>SSB short-stretch bandage
> > SSE=>SSE sum squared error
> > SSD=>SSD social services department
> > NVPI=>NVPI Nausea and Vomiting of Pregnancy Instrument
> >
> > I tried to put them in a synonyms file, either just with a comma between,
> > or with an arrow in between and the acronym repeated on the right like
> > above, and no matter what I try I'm getting really strange search
> results.
> > It's like words in one acronym are matching with the same word in another
> > acronym and then searching with that acronym which is completely
> unrelated.
> >
> > I don't think Solr can handle this, but does anyone know of any crafty
> > tricks in Solr to handle this situation where I can either search by the
> > acronym or by the text?
> >
> > Shaun
> >
>


Handling acronyms

2021-01-15 Thread Shaun Campbell
I have a medical journals search application and I've a list of some 9,000
acronyms like this:

MSNQ=>MSNQ Multiple Sclerosis Neuropsychological Screening Questionnaire
SRN=>SRN Stroke Research Network
IGBP=>IGBP isolated gastric bypass
TOMADO=>TOMADO Trial of Oral Mandibular Advancement Devices for Obstructive
sleep apnoea–hypopnoea
SRM=>SRM standardised response mean
SRT=>SRT substrate reduction therapy
SRS=>SRS Sexual Rating Scale
SRU=>SRU stroke rehabilitation unit
T2w=>T2w T2-weighted
Ab-P=>Ab-P Aberdeen participation restriction subscale
MSOA=>MSOA middle-layer super output area
SSA=>SSA site-specific assessment
SSC=>SSC Study Steering Committee
SSB=>SSB short-stretch bandage
SSE=>SSE sum squared error
SSD=>SSD social services department
NVPI=>NVPI Nausea and Vomiting of Pregnancy Instrument

I tried to put them in a synonyms file, either just with a comma between,
or with an arrow in between and the acronym repeated on the right like
above, and no matter what I try I'm getting really strange search results.
It's like words in one acronym are matching with the same word in another
acronym and then searching with that acronym which is completely unrelated.

I don't think Solr can handle this, but does anyone know of any crafty
tricks in Solr to handle this situation where I can either search by the
acronym or by the text?

Shaun


Re: Highlighting large text fields

2021-01-12 Thread Shaun Campbell
Hi David

Just reindexed everything and it appears to be performing well and giving
me highlights for the matched text.

Thanks for your help.
Shaun

On Tue, 12 Jan 2021, 21:00 David Smiley,  wrote:

> The last update to highlighting that I think is pertinent to
> whether highlights match or not is v7.6 which added that hl.weightMatches
> option.  So I recommend upgrading to at least that if you want to
> experiment further.  But... hl.weightMatches highlights more accurately and
> as such may actually highlight less than you are highlighting now, and
> highlighting more appears to be your goal right now.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
>
> On Tue, Jan 12, 2021 at 2:45 PM Shaun Campbell 
> wrote:
>
> > That's great David.  So hl.maxAnalyzedChars isn't that critical. I'll
> whack
> > it right up and see what happens.
> >
> > I'm running 7.4 from a few years ago. Should I upgrade?
> >
> > For your info this is what I'm doing with Solr
> > https://dev.fundingawards.nihr.ac.uk/search.
> >
> > Thanks
> > Shaun
> >
> > On Tue, 12 Jan 2021 at 19:33, David Smiley  wrote:
> >
> > > On Tue, Jan 12, 2021 at 1:08 PM Shaun Campbell <
> campbell.sh...@gmail.com
> > >
> > > wrote:
> > >
> > > > Hi David
> > > >
> > > > Getting closer now.
> > > >
> > > > First of all, a bit of a mistake on my part. I have two cores set up
> > and
> > > I
> > > > was changing the solrconfig.xml on the wrong core doh!!  That's why
> > > > highlighting wasn't being turned off.
> > > >
> > > > I think I've got the unified highlighter working.
> > > > storeOffsetsWithPositions was already configured on my field type
> > > > definition, not the field definition, so that was ok.
> > > >
> > > > What it boils down to now I think is hl.maxAnalyzedChars. I'm getting
> > > > highlighting on some records and not others, making it confusing as
> to
> > > > where the match is with my dismax parser.  I increased
> > > > my hl.maxAnalyzedChars to 130 and now it's highlighting more
> > records.
> > > > Two questions:
> > > >
> > > > 1. Have you any guidelines as to what could be a
> > > > maximum hl.maxAnalyzedChars without impacting performance or memory?
> > > >
> > >
> > > With storeOffsetsWithPositions, highlighting is super-fast, and so this
> > > hl.maxAnalyzedChars threshold is of marginal utility, like only to cap
> > the
> > > amount of memory used if you have some truly humongous docs and it's
> okay
> > > to only highlight the first X megabytes of them.  Maybe set to 100MB
> worth
> > > of text, or something like that.
> > >
> > >
> > > > 2. Do you know a way to query the maximum length of text in a field
> so
> > > that
> > > > I can set hl.maxAnalyzedChars accordingly?  Just thinking I can
> > probably
> > > > modify my java indexer to log the maximum content length.  Actually,
> I
> > > > probably don't want the maximum but some value that highlights 90-95%
> > > > records
> > > >
> > >
> > > Eh... not really.  Maybe some approximation hacks involving function
> > > queries on norms but I'd not bother in favor of just using a high
> > threshold
> > > such that this won't be an issue.
> > >
> > > All this said, this threshold is *not* the only reason why you might
> not
> > be
> > > getting highlights that you expect.  If you are using a recent Solr
> > > version, you might try toggling the hl.weightMatches boolean, which
> could
> > > make a difference for certain query arrangements.  There's a JIRA issue
> > > pertaining to this one, and I haven't investigated it yet.
> > >
> > > ~ David
> > >
> > >
> > > >
> > > > Thanks
> > > > Shaun
> > > >
> > > > On Tue, 12 Jan 2021 at 16:30, David Smiley 
> wrote:
> > > >
> > > > > On Tue, Jan 12, 2021 at 9:39 AM Shaun Campbell <
> > > campbell.sh...@gmail.com
> > > > >
> > > > > wrote:
> > > > >
> > > > > > Hi David
> > > > > >
> > > > > > First of all I wanted to say I'm working off your book!!  Third
> > > > edition,
> > > > > > and I thin

Re: Highlighting large text fields

2021-01-12 Thread Shaun Campbell
That's great David.  So hl.maxAnalyzedChars isn't that critical. I'll whack
it right up and see what happens.

I'm running 7.4 from a few years ago. Should I upgrade?

For your info this is what I'm doing with Solr
https://dev.fundingawards.nihr.ac.uk/search.

Thanks
Shaun

On Tue, 12 Jan 2021 at 19:33, David Smiley  wrote:

> On Tue, Jan 12, 2021 at 1:08 PM Shaun Campbell 
> wrote:
>
> > Hi David
> >
> > Getting closer now.
> >
> > First of all, a bit of a mistake on my part. I have two cores set up and
> I
> > was changing the solrconfig.xml on the wrong core doh!!  That's why
> > highlighting wasn't being turned off.
> >
> > I think I've got the unified highlighter working.
> > storeOffsetsWithPositions was already configured on my field type
> > definition, not the field definition, so that was ok.
> >
> > What it boils down to now I think is hl.maxAnalyzedChars. I'm getting
> > highlighting on some records and not others, making it confusing as to
> > where the match is with my dismax parser.  I increased
> > my hl.maxAnalyzedChars to 130 and now it's highlighting more records.
> > Two questions:
> >
> > 1. Have you any guidelines as to what could be a
> > maximum hl.maxAnalyzedChars without impacting performance or memory?
> >
>
> With storeOffsetsWithPositions, highlighting is super-fast, and so this
> hl.maxAnalyzedChars threshold is of marginal utility, like only to cap the
> amount of memory used if you have some truly humongous docs and it's okay
> to only highlight the first X megabytes of them.  Maybe set to 100MB worth
> of text, or something like that.
>
>
> > 2. Do you know a way to query the maximum length of text in a field so
> that
> > I can set hl.maxAnalyzedChars accordingly?  Just thinking I can probably
> > modify my java indexer to log the maximum content length.  Actually, I
> > probably don't want the maximum but some value that highlights 90-95%
> > records
> >
>
> Eh... not really.  Maybe some approximation hacks involving function
> queries on norms but I'd not bother in favor of just using a high threshold
> such that this won't be an issue.
>
> All this said, this threshold is *not* the only reason why you might not be
> getting highlights that you expect.  If you are using a recent Solr
> version, you might try toggling the hl.weightMatches boolean, which could
> make a difference for certain query arrangements.  There's a JIRA issue
> pertaining to this one, and I haven't investigated it yet.
>
> ~ David
>
>
> >
> > Thanks
> > Shaun
> >
> > On Tue, 12 Jan 2021 at 16:30, David Smiley  wrote:
> >
> > > On Tue, Jan 12, 2021 at 9:39 AM Shaun Campbell <
> campbell.sh...@gmail.com
> > >
> > > wrote:
> > >
> > > > Hi David
> > > >
> > > > First of all I wanted to say I'm working off your book!!  Third
> > edition,
> > > > and I think it's a bit out of date now. I was just going to try
> > following
> > > > the section on the Postings highlighter, but I see that's been
> absorbed
> > > > into the Unified highlighter. I find your book easier to follow than
> > the
> > > > official documentation though.
> > > >
> > >
> > > Thanks :-D.  I do maintain the Solr Reference Guide for the parts of
> > code I
> > > touch, including highlighting, so I hope what's there makes sense too.
> > >
> > >
> > > > I am going to try to configure the unified highlighter, and I will
> add
> > > that
> > > > storeOffsetsWithPositions to the schema (which I saw in your book)
> and
> > I
> > > > will try indexing again from scratch.  Was getting some funny things
> > > going
> > > > on where I thought I'd turned highlighting off and it was still
> giving
> > me
> > > > highlights.
> > > >
> > >
> > > hl=true/false
> > >
> > >
> > > > Actually just re-reading your email again, are you saying that you
> > can't
> > > > configure highlighting in solrconfig.xml? That's where I always
> > configure
> > > > original highlighting in my dismax search handler. Am I supposed to
> add
> > > > highlighting to each request?
> > > >
> > >
> > > You can set highlighting and other *parameters* in solrconfig.xml for
> > > request handlers.  But the dedicated <highlighting> plugin info is only
> > for
> > > the original and Fast Vector Highlighters.
> > >
> > > ~ David

Re: Highlighting large text fields

2021-01-12 Thread Shaun Campbell
Hi David

Getting closer now.

First of all, a bit of a mistake on my part. I have two cores set up and I
was changing the solrconfig.xml on the wrong core doh!!  That's why
highlighting wasn't being turned off.

I think I've got the unified highlighter working.
storeOffsetsWithPositions was already configured on my field type
definition, not the field definition, so that was ok.

What it boils down to now I think is hl.maxAnalyzedChars. I'm getting
highlighting on some records and not others, making it confusing as to
where the match is with my dismax parser.  I increased
my hl.maxAnalyzedChars to 130 and now it's highlighting more records.
Two questions:

1. Have you any guidelines as to what could be a
maximum hl.maxAnalyzedChars without impacting performance or memory?

2. Do you know a way to query the maximum length of text in a field so that
I can set hl.maxAnalyzedChars accordingly?  Just thinking I can probably
modify my java indexer to log the maximum content length.  Actually, I
probably don't want the maximum, but some value that highlights 90-95% of
records.

Thanks
Shaun

On Tue, 12 Jan 2021 at 16:30, David Smiley  wrote:

> On Tue, Jan 12, 2021 at 9:39 AM Shaun Campbell 
> wrote:
>
> > Hi David
> >
> > First of all I wanted to say I'm working off your book!!  Third edition,
> > and I think it's a bit out of date now. I was just going to try following
> > the section on the Postings highlighter, but I see that's been absorbed
> > into the Unified highlighter. I find your book easier to follow than the
> > official documentation though.
> >
>
> Thanks :-D.  I do maintain the Solr Reference Guide for the parts of code I
> touch, including highlighting, so I hope what's there makes sense too.
>
>
> > I am going to try to configure the unified highlighter, and I will add
> that
> > storeOffsetsWithPositions to the schema (which I saw in your book) and I
> > will try indexing again from scratch.  Was getting some funny things
> going
> > on where I thought I'd turned highlighting off and it was still giving me
> > highlights.
> >
>
> hl=true/false
>
>
> > Actually just re-reading your email again, are you saying that you can't
> > configure highlighting in solrconfig.xml? That's where I always configure
> > original highlighting in my dismax search handler. Am I supposed to add
> > highlighting to each request?
> >
>
> You can set highlighting and other *parameters* in solrconfig.xml for
> request handlers.  But the dedicated <highlighting> plugin info is only for
> the original and Fast Vector Highlighters.
>
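Such parameters can be supplied as request-handler defaults, along these lines; the handler name, field name, and values here are illustrative, not taken from this thread:

```xml
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">edismax</str>
    <!-- The unified highlighter is driven purely by parameters;
         no <highlighting> plugin configuration is involved. -->
    <str name="hl">true</str>
    <str name="hl.method">unified</str>
    <str name="hl.fl">content</str>
    <int name="hl.maxAnalyzedChars">1000000</int>
  </lst>
</requestHandler>
```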
> ~ David
>
>
> >
> > Thanks
> > Shaun
> >
> > On Mon, 11 Jan 2021 at 20:57, David Smiley  wrote:
> >
> > > Hello!
> > >
> > > I worked on the UnifiedHighlighter a lot and want to help you!
> > >
> > > On Mon, Jan 11, 2021 at 9:58 AM Shaun Campbell <
> campbell.sh...@gmail.com
> > >
> > > wrote:
> > >
> > > > I've been using highlighting for a while, using the original
> > highlighter,
> > > > and just come across a problem with fields that contain a large
> amount
> > of
> > > > text, approx 250k characters. I only have about 2,000 records but
> each
> > > one
> > > > contains a journal publication to search through.
> > > >
> > > > What I noticed is that some records didn't return a highlight even
> > though
> > > > they matched on the content. I noticed the hl.maxAnalyzedChars
> > parameter
> > > > and increased that, but  it allowed some records to be highlighted,
> but
> > > not
> > > > all, and then it caused memory problems on the server.  Performance
> is
> > > also
> > > > very poor.
> > > >
> > >
> > > I've been thinking hl.maxAnalyzedChars should maybe default to no limit
> > --
> > > it's a performance threshold but perhaps better to opt-in to such a
> limit
> > > than scratch your head for a long time wondering why a search result
> > isn't
> > > showing highlights.
> > >
> > >
> > > > To try to fix this I've tried  to configure the unified highlighter
> in
> > my
> > > > solrconfig.xml instead.   It seems to be working but again I'm
> missing
> > > some
> > > > highlighted records.
> > > >
> > >
> > > There is no configuration of that highlighter in solrconfig.xml; it's
> > > entirely parameter driven (runtime).
> > >
> > >
> > > > The other thing is I've tried to adjust my unified highlighting
> > set

Re: Highlighting large text fields

2021-01-12 Thread Shaun Campbell
Hi David

First of all I wanted to say I'm working off your book!!  Third edition,
and I think it's a bit out of date now. I was just going to try following
the section on the Postings highlighter, but I see that's been absorbed
into the Unified highlighter. I find your book easier to follow than the
official documentation though.

I am going to try to configure the unified highlighter, and I will add that
storeOffsetsWithPositions to the schema (which I saw in your book) and I
will try indexing again from scratch.  Was getting some funny things going
on where I thought I'd turned highlighting off and it was still giving me
highlights.

Actually just re-reading your email again, are you saying that you can't
configure highlighting in solrconfig.xml? That's where I always configure
original highlighting in my dismax search handler. Am I supposed to add
highlighting to each request?

Thanks
Shaun

On Mon, 11 Jan 2021 at 20:57, David Smiley  wrote:

> Hello!
>
> I worked on the UnifiedHighlighter a lot and want to help you!
>
> On Mon, Jan 11, 2021 at 9:58 AM Shaun Campbell 
> wrote:
>
> > I've been using highlighting for a while, using the original highlighter,
> > and just come across a problem with fields that contain a large amount of
> > text, approx 250k characters. I only have about 2,000 records but each
> one
> > contains a journal publication to search through.
> >
> > What I noticed is that some records didn't return a highlight even though
> > they matched on the content. I noticed the hl.maxAnalyzedChars parameter
> > and increased that, but  it allowed some records to be highlighted, but
> not
> > all, and then it caused memory problems on the server.  Performance is
> also
> > very poor.
> >
>
> I've been thinking hl.maxAnalyzedChars should maybe default to no limit --
> it's a performance threshold but perhaps better to opt-in to such a limit
> than scratch your head for a long time wondering why a search result isn't
> showing highlights.
>
>
> > To try to fix this I've tried  to configure the unified highlighter in my
> > solrconfig.xml instead.   It seems to be working but again I'm missing
> some
> > highlighted records.
> >
>
> There is no configuration of that highlighter in solrconfig.xml; it's
> entirely parameter driven (runtime).
>
>
> > The other thing is I've tried to adjust my unified highlighting settings
> in
> > solrconfig.xml and they don't  seem to be having any effect even after
> > restarting Solr.  I was just wondering whether there is any highlighting
> > information stored at index time. It's taking over 4hours to index my
> > records so it's not easy to keep reindexing my content.
> >
> > Any ideas on how to handle highlighting of large content  would be
> > appreciated.
> >
> > Shaun
> >
>
> Please read the documentation here thoroughly:
>
> https://lucene.apache.org/solr/guide/8_6/highlighting.html#the-unified-highlighter
> (or earlier version as applicable)
> Since you have large bodies of text to highlight, you would strongly
> benefit from putting offsets into the search index (and re-index) --
> storeOffsetsWithPositions.  That's an option on the field/fieldType in your
> schema; it may not be obvious reading the docs.  You have to opt-in to
> that; Solr doesn't normally store any info in the index for highlighting.
>
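The schema opt-in David mentions might look roughly like this, with illustrative field and type names:

```xml
<!-- storeOffsetsWithPositions records character offsets in the
     postings at index time, so the unified highlighter does not
     have to re-analyze large stored text at query time. -->
<field name="content" type="text_general" indexed="true" stored="true"
       storeOffsetsWithPositions="true"/>
```

The attribute can also be set on the fieldType, as Shaun found in his own schema; either way a full re-index is needed for it to take effect.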
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>


Highlighting large text fields

2021-01-11 Thread Shaun Campbell
I've been using highlighting for a while, using the original highlighter,
and just come across a problem with fields that contain a large amount of
text, approx 250k characters. I only have about 2,000 records but each one
contains a journal publication to search through.

What I noticed is that some records didn't return a highlight even though
they matched on the content. I noticed the hl.maxAnalyzedChars parameter
and increased that, but  it allowed some records to be highlighted, but not
all, and then it caused memory problems on the server.  Performance is also
very poor.

To try to fix this I've tried  to configure the unified highlighter in my
solrconfig.xml instead.   It seems to be working but again I'm missing some
highlighted records.

The other thing is I've tried to adjust my unified highlighting settings in
solrconfig.xml and they don't  seem to be having any effect even after
restarting Solr.  I was just wondering whether there is any highlighting
information stored at index time. It's taking over 4 hours to index my
records so it's not easy to keep reindexing my content.

Any ideas on how to handle highlighting of large content  would be
appreciated.

Shaun


Searching document content and mult-valued fields

2020-07-01 Thread Shaun Campbell
Hi

I've been using Solr on a project now for a couple of years and it's working well.
It's just a simple index of about 20 - 25 fields and 7,000 project records.

Now there's a requirement to be able to search on the content of documents
(web pages, Word, pdf etc) related to those projects.  My initial thought
was to just create a new index to store the Tika'd content and just search
on that. However, the requirement is to somehow search through both the
project records and the content records at the same time and list the main
project with perhaps some info on the matching content data. I tried to
explain that you may find matching main project records but no content, and
vice versa.

My only solution to this search problem is to either concatenate all the
document content into one field on the main project record, and add that to
my dismax search, and use boosting etc or to use a multi-valued field to
store the content of each project document.  I'm a bit reluctant to do this
as the application is running well and I'm a bit nervous about a change to
the schema and the indexing process.  I just wondered what you thought
about adding a lot of content to an existing schema (single or multivalued
field) that doesn't normally store big amounts of data.

Or does anyone know of any way, I can join two searches like this together
and two separate indexes?
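One avenue worth noting for the two-index idea, sketched under the assumption of a projects core and a documents core sharing a `project_id` key (all names illustrative): Solr's join query parser can return parent project records based on matches in the content core, though by default it does not carry relevance scores across the join.

```text
# Return projects whose id matches the project_id of any content
# document (in the co-located "documents" core) matching the term:
q={!join fromIndex=documents from=project_id to=id}content:diabetes
```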

Thanks
Shaun


Re: Multiple Cores

2011-06-20 Thread Shaun Campbell
I would say it all depends on what you are trying to do.  Unlike a
relational database, in Solr the data does not need to be normalised, you
need to put everything into an index so that you can achieve whatever
feature it is that you want.  For example, you may search on customer and
want a faceted count of the products.

Also in Solr you have the concept of multi valued fields, therefore you
could have a product index with a multi valued field that stores say
customer id, thereby linking products and customers in one index.

We have multiple cores which we had to create for various reasons.  To
access the cores for indexing (or searching) you just have to refer to the
cores by their names in any Solr URLs or in the Java client etc.

I think it all depends on what it is you are trying to achieve.  It's best
to read a good book such as
http://www.packtpub.com/solr-1-4-enterprise-search-server/book and see what
can be achieved and design your indexes accordingly.

Hope that helps.



On 20 June 2011 05:38, jboy79 joel_pangani...@yahoo.com wrote:

 Hi, I am new to SOLR and would like to know if multiple cores is the best
 way
 to deal with having a product and customer index. If this is the case how
 do
 you go about indexing on multiple cores.
 Thanks

 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Multiple-Cores-tp3084817p3084817.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: manual background re-indexing

2011-04-28 Thread Shaun Campbell
Hi Paul

Would a multi-core set up and the swap command do what you want it to do?

http://wiki.apache.org/solr/CoreAdmin

Shaun

On 28 April 2011 12:49, Paul Libbrecht p...@hoplahup.net wrote:


 Hello list,

 I am planning to implement a setup, to be run on unix scripts, that should
 perform a full pull-and-reindex in a background server and index then deploy
 that index. All should happen on the same machine.

 I thought the replication methods would help me but they seem to rather
 solve the issues of distribution while, what I need, is only the ability to:

 - suspend the queries
 - swap the directories with the new index
 - close all searchers
 - reload and warm-up the searcher on the new index

 Is there a part of the replication utilities (http or unix) that I could
 use to perform the above tasks?
 I intend to do this on occasion... maybe once a month or even less.
 Is reload the right term to be used?

 paul


Re: Indexing Best Practice

2011-04-11 Thread Shaun Campbell
If it's of any help I've split the processing of PDF files from the
indexing. I put the PDF content into a text file (but I guess you could load
it into a database) and use that as part of the indexing.  My processing of
the PDF files also compares timestamps on the document and the text file so
that I'm only processing documents that have changed.
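The timestamp check itself is simple; a minimal sketch in plain Java (the
file names and layout are illustrative, not my actual code):

```java
import java.io.File;
import java.io.IOException;

public class StaleCheck {
    // A PDF needs re-extracting when its text file is missing or older
    // than the PDF itself.
    static boolean needsExtraction(File pdf, File txt) {
        return !txt.exists() || txt.lastModified() < pdf.lastModified();
    }

    public static void main(String[] args) throws IOException {
        File pdf = File.createTempFile("report", ".pdf");
        File txt = File.createTempFile("report", ".txt");
        txt.setLastModified(pdf.lastModified() - 60000); // text is stale
        System.out.println(needsExtraction(pdf, txt));   // prints: true
        pdf.deleteOnExit();
        txt.deleteOnExit();
    }
}
```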

I am a newbie so perhaps there are more sophisticated approaches.

Hope that helps.
Shaun

On 11 April 2011 07:20, Darx Oman darxo...@gmail.com wrote:

 Hi guys

 I'm wondering how to best configure solr to fulfills my requirements.

 I'm indexing data from 2 data sources:
 1- Database
 2- PDF files (password encrypted)

 Every file has related information stored in the database.  Both the file
 content and the related database fields must be indexed as one document in
 solr.  Among the DB data is *per-user* permissions for every document.

 The file contents nearly never change; on the other hand, the DB data and
 especially the permissions change very frequently, which requires me to
 re-index everything for every modified document.

 My problem is in process of decrypting the PDF files before re-indexing
 them
 which takes too much time for a large number of documents, it could span to
 days in full re-indexing.

 What I'm trying to accomplish is eliminating the need to re-index the PDF
 content if not changed even if the DB data changed.  I know this is not
 possible in solr, because solr doesn't update documents.

 So how to best accomplish this:

 Can I use 2 indexes one for PDF contents and the other for DB data and have
 a common id field for both as a link between them, *and results are treated
 as one Document*?



Re: Tips for getting unique results?

2011-04-10 Thread Shaun Campbell
Hi Pete

Still think facets are what you need. We use facets to identify the most
common tags for documents in our library.  I use them to print the top 25
most common document tags.  The sort by count (the default) gives you the
one with the highest count first and then the next most common and so on.

Hope this helps.
Shaun

On 8 April 2011 19:28, Peter Spam ps...@mac.com wrote:

 Thanks for the note, Shaun, but the documentation indicates that the
 sorting is only in ascending order :-(

 facet.sort

 This param determines the ordering of the facet field constraints.

• count - sort the constraints by count (highest count first)
• index - to return the constraints sorted in their index order
 (lexicographic by indexed term). For terms in the ascii range, this will be
 alphabetically sorted.
 The default is count if facet.limit is greater than 0, index otherwise.

 Prior to Solr1.4, one needed to use true instead of count and false instead
 of index.

 This parameter can be specified on a per field basis.


 -Pete

 On Apr 8, 2011, at 2:49 AM, Shaun Campbell wrote:

  Pete
 
  Surely the default sort order for facets is by descending count order.
  See
  http://wiki.apache.org/solr/SimpleFacetParameters.  If your results are
  really sorted in ascending order can't you sort them externally eg Java?
 
  Hope that helps.
 
  Shaun




Re: Tips for getting unique results?

2011-04-08 Thread Shaun Campbell
Pete

Surely the default sort order for facets is by descending count order.  See
http://wiki.apache.org/solr/SimpleFacetParameters.  If your results are
really sorted in ascending order can't you sort them externally eg Java?
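If you do end up sorting externally, something like this works with just
the JDK (a rough sketch; the facet map would come from your query
response):

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class FacetSort {
    // Sort (term, count) facet entries by descending count, breaking
    // ties alphabetically so the order is stable.
    static List<Map.Entry<String, Integer>> byCountDesc(Map<String, Integer> facets) {
        List<Map.Entry<String, Integer>> entries =
                new ArrayList<Map.Entry<String, Integer>>(facets.entrySet());
        Collections.sort(entries, new Comparator<Map.Entry<String, Integer>>() {
            public int compare(Map.Entry<String, Integer> a, Map.Entry<String, Integer> b) {
                int byCount = b.getValue().compareTo(a.getValue()); // highest first
                return byCount != 0 ? byCount : a.getKey().compareTo(b.getKey());
            }
        });
        return entries;
    }

    public static void main(String[] args) {
        Map<String, Integer> facets = new LinkedHashMap<String, Integer>();
        facets.put("java", 3);
        facets.put("solr", 7);
        facets.put("lucene", 7);
        for (Map.Entry<String, Integer> e : byCountDesc(facets)) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
        // prints: lucene=7, solr=7, java=3
    }
}
```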

Hope that helps.

Shaun


Highlighting Issue

2010-12-09 Thread Shaun Campbell
I'm trying to highlight a field and I'm getting an exception thrown, only on
certain search terms though.  I am fairly certain that the cause of the
problem is through having synonyms on the highlighted field as I have had
highlighting working in the past on other fields.

The added complication is that the field I am highlighting also has
ngramming and stemming.  I think what is happening is that the highlighting
cannot match the criteria (which happens to be a synonym) against the actual
string retrieved from the index and crashes, I think when the string found is
longer than a certain number of characters.

I wonder if anyone has experienced this problem and knows how to get around
it?

My field definition is:

<!-- An edge nGrammed and stemmed field for the document tags. -->
<fieldType name="tagphrase_nGram" class="solr.TextField"
           positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory"
            synonyms="../../common/tag_synonyms.txt" ignoreCase="true" expand="true"/>
    <!-- <filter class="solr.StopFilterFactory" ignoreCase="true"
                 words="stopwords.txt" enablePositionIncrements="true"/> -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="1"
            catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
    <filter class="solr.EdgeNGramFilterFactory"
            minGramSize="1"
            maxGramSize="15"
            side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1" catenateWords="0"
            catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English"
            protected="protwords.txt"/>
  </analyzer>
</fieldType>

My query is:

sort=tagcount+desc&hl.snippets=1&start=0&q=(+%2Btagsearch:asset)+||+(+%2Btagsearchnostem:asset)+&hl.fl=tagsearch&wt=javabin&hl=true&rows=100&version=1

The exception being thrown is:

09-Dec-2010 11:59:26 org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException:
org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token profo
exceeds length of provided text sized 26
at
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:342)
at
org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException:
Token profo exceeds length of provided text sized 26
at
org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:254)
at
org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335)
... 18 more

Thanks
Shaun


Re: Highlighting Issue

2010-12-09 Thread Shaun Campbell
Koji

Thanks a lot it's stopped crashing now.  Can I ask one other question about
synonym highlighting which looks a bit puzzling?

I enter asset as my criteria and it returns through synonym matching other
terms highlighted as well.  My debug output is:

DEBUG: uk.co.sjp.intranet.service.SolrServiceImpl - Highlighted tag =
<em>inves</em>tment
DEBUG: uk.co.sjp.intranet.service.SolrServiceImpl - Highlighted tag =
<em>asset</em> management
DEBUG: uk.co.sjp.intranet.service.SolrServiceImpl - Highlighted tag =
<em>inves</em>tment product
DEBUG: uk.co.sjp.intranet.service.SolrServiceImpl - Highlighted tag =
alternative <em>asset</em>s

As you can see asset works well.  For the synonyms does it just highlight
the first n characters where n is the length of the input string?  Can't
figure out how it could do otherwise.

Shaun

On 9 December 2010 12:51, Koji Sekiguchi k...@r.email.ne.jp wrote:

 (10/12/09 21:22), Shaun Campbell wrote:

 I'm trying to highlight a field and I'm getting an exception thrown, only
 on
 certain search terms though.  I am fairly certain that the cause of the
 problem is through having synonyms on the highlighted field as I have had
 highlighting working in the past on other fields.

 The added complication is that the field that I am highlighting also has
 ngramming and stemming.  I think what is happening is that the
 highlighting
 cannot match the criteria (which happens to be a synonym) against the
 actual
 string retrieved from the index and crashes, I think if the string found
 is
 greater than a certain number of characters.

 I wonder if anyone has experienced this problem and knows how to get
 around
 it?


 Basically, highlighting on synonym fields is no problem, but
 highlighter doesn't support n-gram fields. FastVectorHighlighter
 supports fixed-length (minGramSize==maxGramSize) n-gram fields.

 Koji
 --
 http://www.rondhuit.com/en/



Re: Highlighting Issue

2010-12-09 Thread Shaun Campbell
OK. I'd switched to FastVectorHighlighter, which cured the exceptions and
gives me highlighting, so I assumed you could use it instead of the standard
highlighter on n-grammed fields. I guess my question was: how does the
highlighter now highlight synonym terms?

Thanks
Shaun



 As I said in my previous mail, highlighter doesn't support n-gram field.
 Please remove EdgeNGramFilter from your index analyzer and re-index.
 You'll get what you want.


 Koji
 --
 http://www.rondhuit.com/en/



Re: Highlighting Issue

2010-12-09 Thread Shaun Campbell
Sorry, I see what you mean about fixed-length (minGramSize==maxGramSize).  I
see mine aren't. :(

On 9 December 2010 14:26, Koji Sekiguchi k...@r.email.ne.jp wrote:

 (10/12/09 22:50), Shaun Campbell wrote:

 OK. I'd switch to FastVectorHighlighter which cured the exceptions and
 gives
 me highlighting so I assumed that you could use this instead of the
 standard
 highlighter on n-grammed fields. I guess my query was how does the
 highlighter now highlight synonym terms?

 Thanks
 Shaun


 FVH supports fixed-length (minGramSize==maxGramSize) n-gram tokeizer.
 I think FVH supports synonym fields, too.


 Koji
 --
 http://www.rondhuit.com/en/



Core Swapping

2010-11-16 Thread Shaun Campbell
I've got a Solr multi core system and I'm trying to swap the cores
after a re-index via SolrJ using a separate HTTP Solr web server.  My
application seems to be generating a URL that's not valid for my Solr
Tomcat installation but I can't see why or where it's getting its data
from.

Core swapping is working if I enter a URL manually into my browser.
This URL works:


http://localhost:8080/solr/admin/cores?action=SWAP&core=live&other=rebuild

As you can see in my error below the URL generated
(http://localhost:8080/solr/rebuild/admin/cores?action=SWAP&core=rebuild&other=live&wt=javabin&version=1)
has the core *rebuild* in it.  If I enter this directly into my
browser I get the following error:

HTTP Status 404 - /solr/admin/cores
type Status report
message /solr/admin/cores
description The requested resource (/solr/admin/cores) is not available.

My Java code is:
CoreAdminRequest car = new CoreAdminRequest();
car.setCoreName("rebuild");
car.setOtherCoreName("live");
car.setAction(CoreAdminParams.CoreAdminAction.SWAP);
CoreAdminResponse carp = car.process(getSolrServer());


Can anyone suggest what I might be doing wrong?

Thanks
Shaun


The error from my web application is:

SEVERE: Servlet.service() for servlet Spring MVC Dispatcher Servlet
threw exception
org.apache.solr.common.SolrException: Not Found

Not Found

request: 
http://localhost:8080/solr/rebuild/admin/cores?action=SWAP&core=rebuild&other=live&wt=javabin&version=1
    at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
    at 
org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
    at 
org.apache.solr.client.solrj.request.CoreAdminRequest.process(CoreAdminRequest.java:214)
    at 
uk.co.sjp.intranet.Factory.HttpSolrServerImpl.swapCores(HttpSolrServerImpl.java:50)
    at 
uk.co.sjp.intranet.indexing.DocumentIndexer.index(DocumentIndexer.java:185)
    at uk.co.sjp.intranet.SearchController.test(SearchController.java:107)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at 
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
    at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
    at java.lang.reflect.Method.invoke(Method.java:597)
    at 
org.springframework.web.bind.annotation.support.HandlerMethodInvoker.doInvokeMethod(HandlerMethodInvoker.java:710)
    at 
org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:167)
    at 
org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:414)
    at 
org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:402)
    at 
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:771)
    at 
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:716)
    at 
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:647)
    at 
org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:552)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
    at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
    at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
    at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at 
org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:630)
    at 
org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:436)
    at 
org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:374)
    at 
org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:302)
    at 
org.tuckey.web.filters.urlrewrite.NormalRewrittenUrl.doRewrite(NormalRewrittenUrl.java:195)
    at 
org.tuckey.web.filters.urlrewrite.RuleChain.handleRewrite(RuleChain.java:159)
    at org.tuckey.web.filters.urlrewrite.RuleChain.doRules(RuleChain.java:141)
    at 
org.tuckey.web.filters.urlrewrite.UrlRewriter.processRequest(UrlRewriter.java:90)
    at 
org.tuckey.web.filters.urlrewrite.UrlRewriteFilter.doFilter(UrlRewriteFilter.java:417)
    at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at 
uk.co.sjp.intranet.utils.JsonpCallbackFilter.doFilter(JsonpCallbackFilter.java:108)
    at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
    at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
    at 

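From the request URL in that trace, it looks as though the SolrServer
instance was created with a core name ("rebuild") in its base URL, while a
CoreAdminRequest has to be sent to the Solr root. A rough sketch of the
distinction, using plain string handling (the URLs are the ones from the
error above):

```java
public class SwapUrl {
    // A CoreAdminRequest is sent to <base>/admin/cores. If the base URL
    // already names a core (e.g. .../solr/rebuild), the resulting path
    // doesn't exist and Solr answers 404, as in the error above.
    static String swapUrl(String serverBase, String core, String other) {
        String base = serverBase.endsWith("/")
                ? serverBase.substring(0, serverBase.length() - 1)
                : serverBase;
        return base + "/admin/cores?action=SWAP&core=" + core + "&other=" + other;
    }

    public static void main(String[] args) {
        // Correct: server base is the Solr root.
        System.out.println(swapUrl("http://localhost:8080/solr", "live", "rebuild"));
        // prints: http://localhost:8080/solr/admin/cores?action=SWAP&core=live&other=rebuild

        // Wrong: base already includes a core, producing the 404 path.
        System.out.println(swapUrl("http://localhost:8080/solr/rebuild", "rebuild", "live"));
        // prints: http://localhost:8080/solr/rebuild/admin/cores?action=SWAP&core=rebuild&other=live
    }
}
```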
EmbeddedSolrServer, Indexing and Core Swapping

2010-11-16 Thread Shaun Campbell
Hi

I've switched my app to now use an EmbeddedSolrServer.  I'm doing an
index on my rebuild core and swapping cores at the end.
Unfortunately, without restarting my web app I can't see the newly
indexed data.  I can see core swapping is working, and I can see the
data after indexing without restarting servers if I use an http Solr
server.   Is there something different I need to do after indexing and
swapping cores with an embedded Solr server?  The only thing I can
possibly think of is that when I create an EmbeddedSolrServer object
for processing my swap request I need to specify a core.  Don't know
if this is significant.

Also, do you think I need to recreate my CoreContainer?

Any ideas would be welcome.
Thanks


Exception being thrown indexing a specific pdf document using Solr Cell

2010-10-15 Thread Shaun Campbell
I've got an existing Spring Solr SolrJ application that indexes a mixture of
documents.  It seems to have been working fine now for a couple of weeks but
today I've just started getting an exception when processing a certain pdf
file.

The exception is :

ERROR: org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException:
org.apache.tika.exception.TikaException: Unexpected RuntimeException from
org.apache.tika.parser.pdf.pdfpar...@4683c2
at
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
at
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at
org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
at
uk.co.sjp.intranet.service.SolrServiceImpl.loadDocuments(SolrServiceImpl.java:308)
at
uk.co.sjp.intranet.SearchController.loadDocuments(SearchController.java:297)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at
sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at
org.springframework.web.bind.annotation.support.HandlerMethodInvoker.doInvokeMethod(HandlerMethodInvoker.java:710)
at
org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:167)
at
org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:414)
at
org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:402)
at
org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:771)
at
org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:716)
at
org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:647)
at
org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:552)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:630)
at
org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:436)
at
org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:374)
at
org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:302)
at
org.tuckey.web.filters.urlrewrite.NormalRewrittenUrl.doRewrite(NormalRewrittenUrl.java:195)
at
org.tuckey.web.filters.urlrewrite.RuleChain.handleRewrite(RuleChain.java:159)
at
org.tuckey.web.filters.urlrewrite.RuleChain.doRules(RuleChain.java:141)
at
org.tuckey.web.filters.urlrewrite.UrlRewriter.processRequest(UrlRewriter.java:90)
at
org.tuckey.web.filters.urlrewrite.UrlRewriteFilter.doFilter(UrlRewriteFilter.java:417)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.tika.exception.TikaException: Unexpected
RuntimeException from org.apache.tika.parser.pdf.pdfpar...@4683c2
at
org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
at
org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
at

Re: Swapping cores with SolrJ

2010-09-14 Thread Shaun Campbell
Hi Mitch

Thanks for responding.  Not actually sure what you wanted from
CoreAdminResponse but I put the following in:

CoreAdminRequest car = new CoreAdminRequest();
car.setCoreName("live");
car.setOtherCoreName("rebuild");
car.setAction(CoreAdminParams.CoreAdminAction.SWAP);
CoreAdminResponse carp = car.process(solrServer);
logger.debug("CoreAdminResponse status : " + carp.getCoreStatus());
logger.debug("CoreAdminResponse : " + carp.getResponse().toString());

and this was the output:

DEBUG: uk.co.apps2net.intranet.service.SolrServiceImpl - CoreAdminResponse
status : null
DEBUG: uk.co.apps2net.intranet.service.SolrServiceImpl - CoreAdminResponse :
{responseHeader={status=0,QTime=31}}

Looks sort of as though it's done nothing!!

Thanks
Shaun



On 14 September 2010 15:49, MitchK mitc...@web.de wrote:


 Hi Shaun,

 I think it is more easy to fix this problem, if we got more information
 about what is going on in your application.
 Please, could you provide the CoreAdminResponse returned by car.process()
 for us?

 Kind regards,
 - Mitch
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Swapping-cores-with-SolrJ-tp1472154p1473435.html
 Sent from the Solr - User mailing list archive at Nabble.com.



SolrJ and Multi Core Set up

2010-09-03 Thread Shaun Campbell
I'm writing a client using SolrJ and was wondering how to handle a multi
core installation.  We want to use the facility to rebuild the index on one
of the cores at a scheduled time and then use the SWAP facility to switch
the live core to the newly rebuilt core.  I think I can do the SWAP with
CoreAdminRequest.setAction() with a suitable parameter.

First of all, does Solr have some concept of a default core? If I have core0
as my live core and core1 which I rebuild, then after the swap I expect
core0 to now contain my rebuilt index and core1 to contain the old live core
data.  My application should then need to keep referring to core0 as normal
with no change.  Do I have to refer to core0 programmatically? I've
currently got working client code to index and to query my Solr data, but I
was wondering whether or how I set the core when I move to multi core.
There are examples showing it set as part of the URL, so my guess is it's
done by using something like setParam on SolrQuery.

Has anyone got any advice or examples of using SolrJ in a multi core
installation?

Regards
Shaun


Re: SolrJ and Multi Core Set up

2010-09-03 Thread Shaun Campbell
Thanks Chantal I hadn't spotted that that's a big help.

Thank you.
Shaun

On 3 September 2010 12:31, Chantal Ackermann 
chantal.ackerm...@btelligent.de wrote:

 Hi Shaun,

 you create the SolrServer using multicore by just adding the core to the
 URL. You don't need to add anything with SolrQuery.

 URL url = new URL(new URL(solrBaseUrl), coreName);
 CommonsHttpSolrServer server = new CommonsHttpSolrServer(url);
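 One thing to watch with that construction: java.net.URL resolves the
 second argument as a relative reference, so the base URL needs a trailing
 slash or the last path segment gets replaced. A quick sketch:

```java
import java.net.MalformedURLException;
import java.net.URL;

public class CoreUrl {
    public static void main(String[] args) throws MalformedURLException {
        // With a trailing slash on the base, the core name is appended.
        URL good = new URL(new URL("http://localhost:8080/solr/"), "core0");
        System.out.println(good); // prints: http://localhost:8080/solr/core0

        // Without one, "core0" replaces the last path segment ("solr").
        URL bad = new URL(new URL("http://localhost:8080/solr"), "core0");
        System.out.println(bad);  // prints: http://localhost:8080/core0
    }
}
```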

 Concerning the default core thing - I wouldn't know about that.


 Cheers,
 Chantal

 On Fri, 2010-09-03 at 12:03 +0200, Shaun Campbell wrote:
  I'm writing a client using SolrJ and was wondering how to handle a multi
  core installation.  We want to use the facility to rebuild the index on
 one
  of the cores at a scheduled time and then use the SWAP facility to switch
  the live core to the newly rebuilt core.  I think I can do the SWAP
 with
  CoreAdminRequest.setAction() with a suitable parameter.
 
  First of all, does Solr have some concept of a default core? If I have
 core0
  as my live core and core1 which I rebuild, then after the swap I expect
  core0 to now contain my rebuilt index and core1 to contain the old live
 core
  data.  My application should then need to keep referring to core0 as
 normal
  with no change.  Does I have to refer to core0 programmatically? I've
  currently got working client code to index and to query my Solr data but
 I
  was wondering whether or how I set the core when I move to multi core?
  There's examples showing it set as part of the URL so my guess it's done
 by
  using something like setParam on SolrQuery.
 
  Has anyone got any advice or examples of using SolrJ in a multi core
  installation?
 
  Regards
  Shaun