Re: Handling acronyms
Hi Michael

Thanks for that, I'll have a study later. It's just reminded me of the expand option, which I meant to have a look at.

Thanks
Shaun

On Fri, 15 Jan 2021 at 14:33, Michael Gibney wrote:
> The equivalent terms on the right-hand side of the `=>` operator in the example you sent should be separated by a comma. You mention you already tried only-comma-separated (e.g. one line: `SRN,Stroke Research Network`) and that that yielded unexpected results as well. I would recommend pre-case-normalizing all the terms in synonyms.txt (i.e., lower-case), and applying the synonym filter _after_ case normalization in the analysis chain (there are other ways you could do this, but the key point being that you need to pay attention to case and how it interacts with the order in which filters are applied).
>
> Re: Charlie's recommendation to apply these at index-time, a word of caution (and it's possible that this is in fact the underlying cause of some of the unexpected behavior you're observing?): be careful if you're using term _expansion_ at index-time (i.e., mapping single terms to multiple terms, which I note appears to be what you're trying to do in the example lines you provided). Multi-term index-time synonyms can lead to unexpected results for positional queries (either explicit phrase queries, or implicit, e.g. as configured by the `pf` param in edismax). I'm aware of at least two good overviews of this topic, one by Mike McCandless focusing on Elasticsearch [1], one by Steve Rowe focusing on Solr [2]. The underlying issue is related to LUCENE-4312 [3], so both posts (ES- and Solr-related) are relevant.
>
> One way to work around this is to "collapse" (rather than expand) synonyms, at both index and query time. Another option would be to apply synonym expansion only at query-time. It's also worth noting that increasing phrase slop (the `ps` param, etc.) can cause the issues with index-time synonym expansion to "fly under the radar" a little, wrt the most blatant "false negative" manifestations of index-time synonym issues for phrase queries.
>
> [1] https://www.elastic.co/blog/multitoken-synonyms-and-graph-queries-in-elasticsearch
> [2] https://lucidworks.com/post/multi-word-synonyms-solr-adds-query-time-support/
> [3] https://issues.apache.org/jira/browse/LUCENE-4312
>
> On Fri, Jan 15, 2021 at 6:18 AM Charlie Hull <ch...@opensourceconnections.com> wrote:
>> I'm wondering if you should be using these acronyms at index time, not search time. It will make your index bigger and you'll have to re-index to add new synonyms (as they may apply to old documents) but this could be an occasional task, and in the meantime you could use query-time synonyms for the new ones.
>>
>> Maintaining 9000 synonyms in Solr's synonyms.txt file seems unwieldy to me.
>>
>> Cheers
>>
>> Charlie
>>
>> On 15/01/2021 09:48, Shaun Campbell wrote:
>>> I have a medical journals search application and I've a list of some 9,000 acronyms like this:
>>>
>>> MSNQ=>MSNQ Multiple Sclerosis Neuropsychological Screening Questionnaire
>>> SRN=>SRN Stroke Research Network
>>> IGBP=>IGBP isolated gastric bypass
>>> TOMADO=>TOMADO Trial of Oral Mandibular Advancement Devices for Obstructive sleep apnoea–hypopnoea
>>> SRM=>SRM standardised response mean
>>> SRT=>SRT substrate reduction therapy
>>> SRS=>SRS Sexual Rating Scale
>>> SRU=>SRU stroke rehabilitation unit
>>> T2w=>T2w T2-weighted
>>> Ab-P=>Ab-P Aberdeen participation restriction subscale
>>> MSOA=>MSOA middle-layer super output area
>>> SSA=>SSA site-specific assessment
>>> SSC=>SSC Study Steering Committee
>>> SSB=>SSB short-stretch bandage
>>> SSE=>SSE sum squared error
>>> SSD=>SSD social services department
>>> NVPI=>NVPI Nausea and Vomiting of Pregnancy Instrument
>>>
>>> I tried to put them in a synonyms file, either just with a comma between, or with an arrow in between and the acronym repeated on the right like above, and no matter what I try I'm getting really strange search results. It's like words in one acronym are matching with the same word in another acronym and then searching with that acronym which is completely unrelated.
>>>
>>> I don't think Solr can handle this, but does anyone know of any crafty tricks in Solr to handle this situation where I can either search by the acronym or by the text?
>>>
>>> Shaun
>>
>> --
>> Charlie Hull - Managing Consultant at OpenSource Connections Limited
>>
>> Founding member of The Search Network <https://thesearchnetwork.com/> and co-author of Searching the Enterprise <https://opensourceconnections.com/about-us/books-resources/>
>> tel/fax: +44 (0)8700 118334
>> mobile: +44 (0)7767 825828
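Michael's "collapse" suggestion can be made concrete with a small sketch. The filter factory names below are real Solr classes, but the synonym entries and fieldType name are illustrative, and the chain would need testing against the actual schema. The idea: rewrite each multi-word long form down to its acronym at both index and query time, and apply the synonym filter after the lower-casing filter so case never interferes.

```text
# synonyms.txt, "collapse" style: the long form is rewritten to the single
# acronym token on both sides, so no multi-term positional graph is created
stroke research network => srn
isolated gastric bypass => igbp
substrate reduction therapy => srt
```

```xml
<!-- Sketch of an analysis chain applying synonyms after case normalization -->
<fieldType name="text_acronym" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SynonymGraphFilterFactory" synonyms="synonyms.txt"
            ignoreCase="true" expand="false"/>
    <!-- required after a graph filter at index time -->
    <filter class="solr.FlattenGraphFilterFactory"/>
  </analyzer>
</fieldType>
```

Because both "stroke research network" and "srn" index and query as the single token `srn`, the phrase-query problems LUCENE-4312 describes do not arise.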
Re: Handling acronyms
Hi Charlie

I was indexing at index time only. The synonyms/acronyms were coming from the published journals' XML files so I wasn't expecting to maintain them myself. If it worked, I was expecting, hopefully, to update the synonyms file automatically.

As I just explained to Bernd, I'm finding that because I'm just using supplied acronyms from the documents there's some overlap in the words used and it's giving me unexpected results. For example, if I enter diabetes it finds the acronym DM for diabetes mellitus, which then coincides with an author's initials and puts them at the top of the list, which is completely wrong, or is it? Perhaps I was looking for an author DM. Just too much noise to be useful, I think.

Thanks for your input anyway.

Shaun

On Fri, 15 Jan 2021 at 11:18, Charlie Hull wrote:
> I'm wondering if you should be using these acronyms at index time, not search time. It will make your index bigger and you'll have to re-index to add new synonyms (as they may apply to old documents) but this could be an occasional task, and in the meantime you could use query-time synonyms for the new ones.
>
> Maintaining 9000 synonyms in Solr's synonyms.txt file seems unwieldy to me.
> > Cheers > > Charlie > > On 15/01/2021 09:48, Shaun Campbell wrote: > > I have a medical journals search application and I've a list of some > 9,000 > > acronyms like this: > > > > MSNQ=>MSNQ Multiple Sclerosis Neuropsychological Screening Questionnaire > > SRN=>SRN Stroke Research Network > > IGBP=>IGBP isolated gastric bypass > > TOMADO=>TOMADO Trial of Oral Mandibular Advancement Devices for > Obstructive > > sleep apnoea–hypopnoea > > SRM=>SRM standardised response mean > > SRT=>SRT substrate reduction therapy > > SRS=>SRS Sexual Rating Scale > > SRU=>SRU stroke rehabilitation unit > > T2w=>T2w T2-weighted > > Ab-P=>Ab-P Aberdeen participation restriction subscale > > MSOA=>MSOA middle-layer super output area > > SSA=>SSA site-specific assessment > > SSC=>SSC Study Steering Committee > > SSB=>SSB short-stretch bandage > > SSE=>SSE sum squared error > > SSD=>SSD social services department > > NVPI=>NVPI Nausea and Vomiting of Pregnancy Instrument > > > > I tried to put them in a synonyms file, either just with a comma between, > > or with an arrow in between and the acronym repeated on the right like > > above, and no matter what I try I'm getting really strange search > results. > > It's like words in one acronym are matching with the same word in another > > acronym and then searching with that acronym which is completely > unrelated. > > > > I don't think Solr can handle this, but does anyone know of any crafty > > tricks in Solr to handle this situation where I can either search by the > > acronym or by the text? > > > > Shaun > > > > -- > Charlie Hull - Managing Consultant at OpenSource Connections Limited > > Founding member of The Search Network <https://thesearchnetwork.com/> > and co-author of Searching the Enterprise > <https://opensourceconnections.com/about-us/books-resources/> > tel/fax: +44 (0)8700 118334 > mobile: +44 (0)7767 825828 >
Re: Handling acronyms
Hi Bernd

Thanks for that. I think it is working, but I think unfortunately what I'm trying to do is impossible/not logical. When I enter a term it goes off and searches using all the matching acronyms, because I'm finding a term used in more than one synonym, e.g. diabetes. I think at the end of the day this produces too much "noise" to make any sense of the results. Think I will have to park this for now.

Thanks
Shaun

On Fri, 15 Jan 2021 at 10:35, Bernd Fehling wrote:
> If you are using multiword synonyms, acronyms, ... you should escape the space within the multiwords.
>
> As synonyms.txt:
> SRN, Stroke\ Research\ Network
> IGBP, isolated\ gastric\ bypass
> ...
>
> Regards
> Bernd
>
> Am 15.01.21 um 10:48 schrieb Shaun Campbell:
>> I have a medical journals search application and I've a list of some 9,000 acronyms like this:
>>
>> MSNQ=>MSNQ Multiple Sclerosis Neuropsychological Screening Questionnaire
>> SRN=>SRN Stroke Research Network
>> IGBP=>IGBP isolated gastric bypass
>> TOMADO=>TOMADO Trial of Oral Mandibular Advancement Devices for Obstructive sleep apnoea–hypopnoea
>> SRM=>SRM standardised response mean
>> SRT=>SRT substrate reduction therapy
>> SRS=>SRS Sexual Rating Scale
>> SRU=>SRU stroke rehabilitation unit
>> T2w=>T2w T2-weighted
>> Ab-P=>Ab-P Aberdeen participation restriction subscale
>> MSOA=>MSOA middle-layer super output area
>> SSA=>SSA site-specific assessment
>> SSC=>SSC Study Steering Committee
>> SSB=>SSB short-stretch bandage
>> SSE=>SSE sum squared error
>> SSD=>SSD social services department
>> NVPI=>NVPI Nausea and Vomiting of Pregnancy Instrument
>>
>> I tried to put them in a synonyms file, either just with a comma between, or with an arrow in between and the acronym repeated on the right like above, and no matter what I try I'm getting really strange search results.
> > It's like words in one acronym are matching with the same word in another > > acronym and then searching with that acronym which is completely > unrelated. > > > > I don't think Solr can handle this, but does anyone know of any crafty > > tricks in Solr to handle this situation where I can either search by the > > acronym or by the text? > > > > Shaun > > >
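Bernd's escaped-space format, spelled out as a file fragment (the entries are illustrative, taken from the list above):

```text
# synonyms.txt -- the backslash-escaped spaces keep each expansion together
# as a single multi-word synonym rather than three independent terms
SRN, Stroke\ Research\ Network
IGBP, isolated\ gastric\ bypass
TOMADO, Trial\ of\ Oral\ Mandibular\ Advancement\ Devices
```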
Handling acronyms
I have a medical journals search application and I've a list of some 9,000 acronyms like this:

MSNQ=>MSNQ Multiple Sclerosis Neuropsychological Screening Questionnaire
SRN=>SRN Stroke Research Network
IGBP=>IGBP isolated gastric bypass
TOMADO=>TOMADO Trial of Oral Mandibular Advancement Devices for Obstructive sleep apnoea–hypopnoea
SRM=>SRM standardised response mean
SRT=>SRT substrate reduction therapy
SRS=>SRS Sexual Rating Scale
SRU=>SRU stroke rehabilitation unit
T2w=>T2w T2-weighted
Ab-P=>Ab-P Aberdeen participation restriction subscale
MSOA=>MSOA middle-layer super output area
SSA=>SSA site-specific assessment
SSC=>SSC Study Steering Committee
SSB=>SSB short-stretch bandage
SSE=>SSE sum squared error
SSD=>SSD social services department
NVPI=>NVPI Nausea and Vomiting of Pregnancy Instrument

I tried to put them in a synonyms file, either just with a comma between, or with an arrow in between and the acronym repeated on the right like above, and no matter what I try I'm getting really strange search results. It's like words in one acronym are matching with the same word in another acronym and then searching with that acronym which is completely unrelated.

I don't think Solr can handle this, but does anyone know of any crafty tricks in Solr to handle this situation where I can either search by the acronym or by the text?

Shaun
Re: Highlighting large text fields
Hi David

Just reindexed everything and it appears to be performing well and giving me highlights for the matched text. Thanks for your help.

Shaun

On Tue, 12 Jan 2021, 21:00 David Smiley wrote:
> The last update to highlighting that I think is pertinent to whether highlights match or not is v7.6, which added that hl.weightMatches option. So I recommend upgrading to at least that if you want to experiment further. But... hl.weightMatches highlights more accurately and as such is more likely to not highlight as much as you are highlighting now, and highlighting more is your goal right now it appears.
>
> ~ David Smiley
> Apache Lucene/Solr Search Developer
> http://www.linkedin.com/in/davidwsmiley
>
> On Tue, Jan 12, 2021 at 2:45 PM Shaun Campbell wrote:
>> That's great David. So hl.maxAnalyzedChars isn't that critical. I'll whack it right up and see what happens.
>>
>> I'm running 7.4 from a few years ago. Should I upgrade?
>>
>> For your info this is what I'm doing with Solr: https://dev.fundingawards.nihr.ac.uk/search.
>>
>> Thanks
>> Shaun
>>
>> On Tue, 12 Jan 2021 at 19:33, David Smiley wrote:
>>> On Tue, Jan 12, 2021 at 1:08 PM Shaun Campbell <campbell.sh...@gmail.com> wrote:
>>>> Hi David
>>>>
>>>> Getting closer now.
>>>>
>>>> First of all, a bit of a mistake on my part. I have two cores set up and I was changing the solrconfig.xml on the wrong core, doh!! That's why highlighting wasn't being turned off.
>>>>
>>>> I think I've got the unified highlighter working. storeOffsetsWithPositions was already configured on my field type definition, not the field definition, so that was ok.
>>>>
>>>> What it boils down to now I think is hl.maxAnalyzedChars. I'm getting highlighting on some records and not others, making it confusing as to where the match is with my dismax parser.
I increased > > > > my hl.maxAnalyzedChars to 130 and now it's highlighting more > > records. > > > > Two questions: > > > > > > > > 1. Have you any guidelines as to what could be a > > > > maximum hl.maxAnalyzedChars without impacting performance or memory? > > > > > > > > > > With storeOffsetsWithPositions, highlighting is super-fast, and so this > > > hl.maxAnalyzedChars threshold is of marginal utility, like only to cap > > the > > > amount of memory used if you have some truly humongous docs and it's > okay > > > only highlight the first X megabytes of them. Maybe set to a 100MB > worth > > > of text, or something like that. > > > > > > > > > > 2. Do you know a way to query the maximum length of text in a field > so > > > that > > > > I can set hl.maxAnalyzedChars accordingly? Just thinking I can > > probably > > > > modify my java indexer to log the maximum content length. Actually, > I > > > > probably don't want the maximum but some value that highlights 90-95% > > > > records > > > > > > > > > > Eh... not really. Maybe some approximation hacks involving function > > > queries on norms but I'd not bother in favor of just using a high > > threshold > > > such that this won't be an issue. > > > > > > All this said, this threshold is *not* the only reason why you might > not > > be > > > getting highlights that you expect. If you are using a recent Solr > > > version, you might try toggling the hl.weightMatches boolean, which > could > > > make a difference for certain query arrangements. There's a JIRA issue > > > pertaining to this one, and I haven't investigated it yet. > > > > > > ~ David > > > > > > > > > > > > > > Thanks > > > > Shaun > > > > > > > > On Tue, 12 Jan 2021 at 16:30, David Smiley > wrote: > > > > > > > > > On Tue, Jan 12, 2021 at 9:39 AM Shaun Campbell < > > > campbell.sh...@gmail.com > > > > > > > > > > wrote: > > > > > > > > > > > Hi David > > > > > > > > > > > > First of all I wanted to say I'm working off your book!! 
Third > > > edition, and I think it's a bit out of date now.
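The hl.weightMatches option discussed in this thread is an ordinary request parameter, so it can be toggled per-query without reindexing. An illustrative request (the field name `content` is an assumption):

```text
/select?q=stroke&defType=edismax&hl=true&hl.method=unified
       &hl.fl=content&hl.weightMatches=false
```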
Re: Highlighting large text fields
That's great David. So hl.maxAnalyzedChars isn't that critical. I'll whack it right up and see what happens. I'm running 7.4 from a few years ago. Should I upgrade? For your info this is what I'm doing with Solr https://dev.fundingawards.nihr.ac.uk/search. Thanks Shaun On Tue, 12 Jan 2021 at 19:33, David Smiley wrote: > On Tue, Jan 12, 2021 at 1:08 PM Shaun Campbell > wrote: > > > Hi David > > > > Getting closer now. > > > > First of all, a bit of a mistake on my part. I have two cores set up and > I > > was changing the solrconfig.xml on the wrong core doh!! That's why > > highlighting wasn't being turned off. > > > > I think I've got the unified highlighter working. > > storeOffsetsWithPositions was already configured on my field type > > definition, not the field definition, so that was ok. > > > > What it boils down to now I think is hl.maxAnalyzedChars. I'm getting > > highlighting on some records and not others, making it confusing as to > > where the match is with my dismax parser. I increased > > my hl.maxAnalyzedChars to 130 and now it's highlighting more records. > > Two questions: > > > > 1. Have you any guidelines as to what could be a > > maximum hl.maxAnalyzedChars without impacting performance or memory? > > > > With storeOffsetsWithPositions, highlighting is super-fast, and so this > hl.maxAnalyzedChars threshold is of marginal utility, like only to cap the > amount of memory used if you have some truly humongous docs and it's okay > only highlight the first X megabytes of them. Maybe set to a 100MB worth > of text, or something like that. > > > > 2. Do you know a way to query the maximum length of text in a field so > that > > I can set hl.maxAnalyzedChars accordingly? Just thinking I can probably > > modify my java indexer to log the maximum content length. Actually, I > > probably don't want the maximum but some value that highlights 90-95% > > records > > > > Eh... not really. 
Maybe some approximation hacks involving function > queries on norms but I'd not bother in favor of just using a high threshold > such that this won't be an issue. > > All this said, this threshold is *not* the only reason why you might not be > getting highlights that you expect. If you are using a recent Solr > version, you might try toggling the hl.weightMatches boolean, which could > make a difference for certain query arrangements. There's a JIRA issue > pertaining to this one, and I haven't investigated it yet. > > ~ David > > > > > > Thanks > > Shaun > > > > On Tue, 12 Jan 2021 at 16:30, David Smiley wrote: > > > > > On Tue, Jan 12, 2021 at 9:39 AM Shaun Campbell < > campbell.sh...@gmail.com > > > > > > wrote: > > > > > > > Hi David > > > > > > > > First of all I wanted to say I'm working off your book!! Third > > edition, > > > > and I think it's a bit out of date now. I was just going to try > > following > > > > the section on the Postings highlighter, but I see that's been > absorbed > > > > into the Unified highlighter. I find your book easier to follow than > > the > > > > official documentation though. > > > > > > > > > > Thanks :-D. I do maintain the Solr Reference Guide for the parts of > > code I > > > touch, including highlighting, so I hope what's there makes sense too. > > > > > > > > > > I am going to try to configure the unified highlighter, and I will > add > > > that > > > > storeOffsetsWithPositions to the schema (which I saw in your book) > and > > I > > > > will try indexing again from scratch. Was getting some funny things > > > going > > > > on where I thought I'd turned highlighting off and it was still > giving > > me > > > > highlights. > > > > > > > > > > hl=true/false > > > > > > > > > > Actually just re-reading your email again, are you saying that you > > can't > > > > configure highlighting in solrconfig.xml? That's where I always > > configure > > > > original highlighting in my dismax search handler. 
Am I supposed to > add > > > > highlighting to each request? > > > > > > You can set highlighting and other *parameters* in solrconfig.xml for > > > request handlers. But the dedicated plugin info is only > > for > > > the original and Fast Vector Highlighters. > > > > > > ~ David
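David's distinction (highlighting *parameters* in solrconfig.xml, as opposed to a dedicated highlighter plugin section) can be sketched as request-handler defaults. The handler name and field name here are assumptions:

```xml
<!-- Sketch: highlighting parameters as defaults on a search handler;
     they reach the unified highlighter at request time -->
<requestHandler name="/search" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="defType">dismax</str>
    <str name="hl">true</str>
    <str name="hl.method">unified</str>
    <str name="hl.fl">content</str>
  </lst>
</requestHandler>
```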
Re: Highlighting large text fields
Hi David Getting closer now. First of all, a bit of a mistake on my part. I have two cores set up and I was changing the solrconfig.xml on the wrong core doh!! That's why highlighting wasn't being turned off. I think I've got the unified highlighter working. storeOffsetsWithPositions was already configured on my field type definition, not the field definition, so that was ok. What it boils down to now I think is hl.maxAnalyzedChars. I'm getting highlighting on some records and not others, making it confusing as to where the match is with my dismax parser. I increased my hl.maxAnalyzedChars to 130 and now it's highlighting more records. Two questions: 1. Have you any guidelines as to what could be a maximum hl.maxAnalyzedChars without impacting performance or memory? 2. Do you know a way to query the maximum length of text in a field so that I can set hl.maxAnalyzedChars accordingly? Just thinking I can probably modify my java indexer to log the maximum content length. Actually, I probably don't want the maximum but some value that highlights 90-95% records Thanks Shaun On Tue, 12 Jan 2021 at 16:30, David Smiley wrote: > On Tue, Jan 12, 2021 at 9:39 AM Shaun Campbell > wrote: > > > Hi David > > > > First of all I wanted to say I'm working off your book!! Third edition, > > and I think it's a bit out of date now. I was just going to try following > > the section on the Postings highlighter, but I see that's been absorbed > > into the Unified highlighter. I find your book easier to follow than the > > official documentation though. > > > > Thanks :-D. I do maintain the Solr Reference Guide for the parts of code I > touch, including highlighting, so I hope what's there makes sense too. > > > > I am going to try to configure the unified highlighter, and I will add > that > > storeOffsetsWithPositions to the schema (which I saw in your book) and I > > will try indexing again from scratch. 
Was getting some funny things > going > > on where I thought I'd turned highlighting off and it was still giving me > > highlights. > > > > hl=true/false > > > > Actually just re-reading your email again, are you saying that you can't > > configure highlighting in solrconfig.xml? That's where I always configure > > original highlighting in my dismax search handler. Am I supposed to add > > highlighting to each request? > > > > You can set highlighting and other *parameters* in solrconfig.xml for > request handlers. But the dedicated plugin info is only for > the original and Fast Vector Highlighters. > > ~ David > > > > > > Thanks > > Shaun > > > > On Mon, 11 Jan 2021 at 20:57, David Smiley wrote: > > > > > Hello! > > > > > > I worked on the UnifiedHighlighter a lot and want to help you! > > > > > > On Mon, Jan 11, 2021 at 9:58 AM Shaun Campbell < > campbell.sh...@gmail.com > > > > > > wrote: > > > > > > > I've been using highlighting for a while, using the original > > highlighter, > > > > and just come across a problem with fields that contain a large > amount > > of > > > > text, approx 250k characters. I only have about 2,000 records but > each > > > one > > > > contains a journal publication to search through. > > > > > > > > What I noticed is that some records didn't return a highlight even > > though > > > > they matched on the content. I noticed the hl.maxAnalyzedChars > > parameter > > > > and increased that, but it allowed some records to be highlighted, > but > > > not > > > > all, and then it caused memory problems on the server. Performance > is > > > also > > > > very poor. > > > > > > > > > > I've been thinking hl.maxAnalyzedChars should maybe default to no limit > > -- > > > it's a performance threshold but perhaps better to opt-in to such a > limit > > > then scratch your head for a long time wondering why a search result > > isn't > > > showing highlights. 
>> To try to fix this I've tried to configure the unified highlighter in my solrconfig.xml instead. It seems to be working but again I'm missing some highlighted records.
>
> There is no configuration of that highlighter in solrconfig.xml; it's entirely parameter driven (runtime).
>
>> The other thing is I've tried to adjust my unified highlighting settings in solrconfig.xml and they don't seem to be having any effect even after restarting Solr.
Re: Highlighting large text fields
Hi David First of all I wanted to say I'm working off your book!! Third edition, and I think it's a bit out of date now. I was just going to try following the section on the Postings highlighter, but I see that's been absorbed into the Unified highlighter. I find your book easier to follow than the official documentation though. I am going to try to configure the unified highlighter, and I will add that storeOffsetsWithPositions to the schema (which I saw in your book) and I will try indexing again from scratch. Was getting some funny things going on where I thought I'd turned highlighting off and it was still giving me highlights. Actually just re-reading your email again, are you saying that you can't configure highlighting in solrconfig.xml? That's where I always configure original highlighting in my dismax search handler. Am I supposed to add highlighting to each request? Thanks Shaun On Mon, 11 Jan 2021 at 20:57, David Smiley wrote: > Hello! > > I worked on the UnifiedHighlighter a lot and want to help you! > > On Mon, Jan 11, 2021 at 9:58 AM Shaun Campbell > wrote: > > > I've been using highlighting for a while, using the original highlighter, > > and just come across a problem with fields that contain a large amount of > > text, approx 250k characters. I only have about 2,000 records but each > one > > contains a journal publication to search through. > > > > What I noticed is that some records didn't return a highlight even though > > they matched on the content. I noticed the hl.maxAnalyzedChars parameter > > and increased that, but it allowed some records to be highlighted, but > not > > all, and then it caused memory problems on the server. Performance is > also > > very poor. > > > > I've been thinking hl.maxAnalyzedChars should maybe default to no limit -- > it's a performance threshold but perhaps better to opt-in to such a limit > then scratch your head for a long time wondering why a search result isn't > showing highlights. 
> > > > To try to fix this I've tried to configure the unified highlighter in my > > solrconfig.xml instead. It seems to be working but again I'm missing > some > > highlighted records. > > > > There is no configuration of that highlighter in solrconfig.xml; it's > entirely parameter driven (runtime). > > > > The other thing is I've tried to adjust my unified highlighting settings > in > > solrconfig.xml and they don't seem to be having any effect even after > > restarting Solr. I was just wondering whether there is any highlighting > > information stored at index time. It's taking over 4hours to index my > > records so it's not easy to keep reindexing my content. > > > > Any ideas on how to handle highlighting of large content would be > > appreciated. > > > > Shaun > > > > Please read the documentation here thoroughly: > > https://lucene.apache.org/solr/guide/8_6/highlighting.html#the-unified-highlighter > (or earlier version as applicable) > Since you have large bodies of text to highlight, you would strongly > benefit from putting offsets into the search index (and re-index) -- > storeOffsetsWithPositions. That's an option on the field/fieldType in your > schema; it may not be obvious reading the docs. You have to opt-in to > that; Solr doesn't normally store any info in the index for highlighting. > > ~ David Smiley > Apache Lucene/Solr Search Developer > http://www.linkedin.com/in/davidwsmiley >
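The storeOffsetsWithPositions opt-in David describes is a single attribute in the schema, and a full reindex is required after adding it. Field and type names below are assumptions:

```xml
<!-- Sketch: store offsets in the postings so the unified highlighter
     doesn't have to re-analyze 250k-character fields at query time -->
<field name="content" type="text_general" indexed="true" stored="true"
       storeOffsetsWithPositions="true"/>
```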
Highlighting large text fields
I've been using highlighting for a while, using the original highlighter, and have just come across a problem with fields that contain a large amount of text, approx 250k characters. I only have about 2,000 records but each one contains a journal publication to search through.

What I noticed is that some records didn't return a highlight even though they matched on the content. I noticed the hl.maxAnalyzedChars parameter and increased it, which allowed some records to be highlighted, but not all, and then it caused memory problems on the server. Performance is also very poor.

To try to fix this I've tried to configure the unified highlighter in my solrconfig.xml instead. It seems to be working but again I'm missing some highlighted records.

The other thing is I've tried to adjust my unified highlighting settings in solrconfig.xml and they don't seem to be having any effect even after restarting Solr. I was just wondering whether there is any highlighting information stored at index time. It's taking over 4 hours to index my records so it's not easy to keep reindexing my content.

Any ideas on how to handle highlighting of large content would be appreciated.

Shaun
Searching document content and multi-valued fields
Hi

Been using Solr on a project now for a couple of years and it's working well. It's just a simple index of about 20-25 fields and 7,000 project records. Now there's a requirement to be able to search on the content of documents (web pages, Word, PDF etc.) related to those projects. My initial thought was to just create a new index to store the Tika'd content and just search on that. However, the requirement is to somehow search through both the project records and the content records at the same time and list the main project with perhaps some info on the matching content data. I tried to explain that you may find matching main project records but no content, and vice versa.

My only solution to this search problem is either to concatenate all the document content into one field on the main project record, add that to my dismax search, and use boosting etc., or to use a multi-valued field to store the content of each project document. I'm a bit reluctant to do this as the application is running well and I'm a bit nervous about a change to the schema and the indexing process.

I just wondered what you thought about adding a lot of content to an existing schema (single or multi-valued field) that doesn't normally store big amounts of data. Or does anyone know of any way I can join two searches like this together, and two separate indexes?

Thanks
Shaun
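One option worth noting for the two-index question: Solr's join query parser can return documents from one core based on matches in another core on the same node. A sketch, assuming a hypothetical `content` core whose documents carry a `project_id` field pointing at the `projects` core's `id` field:

```text
q={!join fromIndex=content from=project_id to=id}text:diabetes
```

Note the result list contains only the project documents, and by default the content-side match scores are not carried across, so relevance tuning across the join is limited.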
Re: Multiple Cores
I would say it all depends on what you are trying to do. Unlike a relational database, in Solr the data does not need to be normalised; you need to put everything into an index so that you can achieve whatever feature it is that you want. For example, you may search on customer and want a faceted count of the products. Also in Solr you have the concept of multi-valued fields, therefore you could have a product index with a multi-valued field that stores, say, customer id, thereby linking products and customers in one index.

We have multiple cores which we had to create for various reasons. To access the cores for indexing (or searching) you just have to refer to the cores by their names in any Solr URLs or in the Java client etc.

I think it all depends on what it is you are trying to achieve. It's best to read a good book such as http://www.packtpub.com/solr-1-4-enterprise-search-server/book and see what can be achieved and design your indexes accordingly.

Hope that helps.

On 20 June 2011 05:38, jboy79 joel_pangani...@yahoo.com wrote:
> Hi, I am new to SOLR and would like to know if multiple cores is the best way to deal with having a product and customer index. If this is the case how do you go about indexing on multiple cores. Thanks
>
> --
> View this message in context: http://lucene.472066.n3.nabble.com/Multiple-Cores-tp3084817p3084817.html
> Sent from the Solr - User mailing list archive at Nabble.com.
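To make the "refer to the cores by their names" point concrete, each core simply gets its own URL path. Host, port, and core names below are illustrative:

```text
http://localhost:8983/solr/products/select?q=*:*
http://localhost:8983/solr/customers/select?q=name:smith
http://localhost:8983/solr/products/update?commit=true
```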
Re: manual background re-indexing
Hi Paul

Would a multi-core setup and the SWAP command do what you want it to do? http://wiki.apache.org/solr/CoreAdmin

Shaun

On 28 April 2011 12:49, Paul Libbrecht p...@hoplahup.net wrote:
> Hello list,
>
> I am planning to implement a setup, to be run on unix scripts, that should perform a full pull-and-reindex in a background server and then deploy that index. All should happen on the same machine. I thought the replication methods would help me but they seem to rather solve the issues of distribution while, what I need, is only the ability to:
> - suspend the queries
> - swap the directories with the new index
> - close all searchers
> - reload and warm-up the searcher on the new index
>
> Is there a part of the replication utilities (http or unix) that I could use to perform the above tasks? I intend to do this on occasion... maybe once a month or even less. Is reload the right term to be used?
>
> paul
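The rebuild-and-swap cycle could look like this via the CoreAdmin API; the core names are assumptions:

```text
# 1. index into the offline core while "live" keeps serving queries
http://localhost:8983/solr/rebuild/update?commit=true

# 2. atomically exchange the two cores; clients keep querying "live",
#    which now points at the freshly built index
http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=rebuild
```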
Re: Indexing Best Practice
If it's of any help, I've split the processing of PDF files from the indexing. I put the PDF content into a text file (but I guess you could load it into a database) and use that as part of the indexing. My processing of the PDF files also compares timestamps on the document and the text file, so that I'm only processing documents that have changed. I am a newbie so perhaps there are more sophisticated approaches. Hope that helps. Shaun On 11 April 2011 07:20, Darx Oman darxo...@gmail.com wrote: Hi guys I'm wondering how best to configure solr to fulfil my requirements. I'm indexing data from 2 data sources: 1- Database 2- PDF files (password encrypted) Every file has related information stored in the database. Both the file content and the related database fields must be indexed as one document in solr. Among the DB data is *per-user* permissions for every document. The file contents nearly never change; on the other hand, the DB data and especially the permissions change very frequently, which requires me to re-index everything for every modified document. My problem is in the process of decrypting the PDF files before re-indexing them, which takes too much time for a large number of documents; a full re-index could span days. What I'm trying to accomplish is eliminating the need to re-index the PDF content if it hasn't changed, even if the DB data changed. I know this is not possible in solr, because solr doesn't update documents. So how best to accomplish this: can I use 2 indexes, one for PDF contents and the other for DB data, with a common id field for both as a link between them, *and results treated as one Document*?
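A minimal sketch of that kind of timestamp check (file names are made up for illustration; not my actual code):

```java
import java.io.File;
import java.io.IOException;

public class ExtractIfStale {

    // Re-extract only when the PDF is newer than its cached text file,
    // or when no text file exists yet.
    static boolean needsExtraction(File pdf, File txt) {
        return !txt.exists() || pdf.lastModified() > txt.lastModified();
    }

    public static void main(String[] args) throws IOException {
        File pdf = File.createTempFile("doc", ".pdf");
        File txt = File.createTempFile("doc", ".txt");
        // Pretend the text was extracted after the PDF last changed.
        txt.setLastModified(pdf.lastModified() + 5000);
        System.out.println(needsExtraction(pdf, txt)); // false: cached text is current
        // Now pretend the PDF was modified again.
        pdf.setLastModified(txt.lastModified() + 5000);
        System.out.println(needsExtraction(pdf, txt)); // true: re-extract
        pdf.deleteOnExit();
        txt.deleteOnExit();
    }
}
```

You'd then feed only the stale PDFs through the (expensive) decryption and Tika extraction step, and index from the text files.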
Re: Tips for getting unique results?
Hi Pete Still think facets are what you need. We use facets to identify the most common tags for documents in our library. I use them to print the top 25 most common document tags. The sort by count (the default) gives you the one with the highest count first, then the next most common, and so on. Hope this helps. Shaun On 8 April 2011 19:28, Peter Spam ps...@mac.com wrote: Thanks for the note, Shaun, but the documentation indicates that the sorting is only in ascending order :-( facet.sort This param determines the ordering of the facet field constraints. • count - sort the constraints by count (highest count first) • index - to return the constraints sorted in their index order (lexicographic by indexed term). For terms in the ascii range, this will be alphabetically sorted. The default is count if facet.limit is greater than 0, index otherwise. Prior to Solr1.4, one needed to use true instead of count and false instead of index. This parameter can be specified on a per field basis. -Pete On Apr 8, 2011, at 2:49 AM, Shaun Campbell wrote: Pete Surely the default sort order for facets is by descending count order. See http://wiki.apache.org/solr/SimpleFacetParameters. If your results are really sorted in ascending order, can't you sort them externally, e.g. in Java? Hope that helps. Shaun
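For what it's worth, if you ever did need the external-sort fallback, re-ranking facet counts client-side is only a few lines of plain Java. A sketch with made-up tag names and counts:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class TopTags {

    // Order tag counts highest-first and keep the top n,
    // mirroring what facet.sort=count gives you server-side.
    static List<Map.Entry<String, Integer>> top(Map<String, Integer> counts, int n) {
        List<Map.Entry<String, Integer>> entries = new ArrayList<>(counts.entrySet());
        entries.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return new ArrayList<>(entries.subList(0, Math.min(n, entries.size())));
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = new LinkedHashMap<>();
        counts.put("pension", 3);
        counts.put("asset management", 12);
        counts.put("investment", 7);
        for (Map.Entry<String, Integer> e : top(counts, 2)) {
            System.out.println(e.getKey() + "=" + e.getValue());
        }
    }
}
```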
Re: Tips for getting unique results?
Pete Surely the default sort order for facets is by descending count order. See http://wiki.apache.org/solr/SimpleFacetParameters. If your results are really sorted in ascending order, can't you sort them externally, e.g. in Java? Hope that helps. Shaun
Highlighting Issue
I'm trying to highlight a field and I'm getting an exception thrown, only on certain search terms though. I am fairly certain that the cause of the problem is having synonyms on the highlighted field, as I have had highlighting working in the past on other fields. The added complication is that the field I am highlighting is also ngrammed and stemmed. I think what is happening is that the highlighting cannot match the criteria (which happens to be a synonym) against the actual string retrieved from the index and crashes, I think when the string found is longer than a certain number of characters. I wonder if anyone has experienced this problem and knows how to get around it? My field definition is:

<!-- An edge nGrammed and stemmed field for the document tags. -->
<fieldType name="tagphrase_nGram" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="../../common/tag_synonyms.txt" ignoreCase="true" expand="true"/>
    <!--filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/-->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
    <filter class="solr.EdgeNGramFilterFactory" minGramSize="1" maxGramSize="15" side="front"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0" splitOnCaseChange="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="English" protected="protwords.txt"/>
  </analyzer>
</fieldType>

My query is:
sort=tagcount+desc&hl.snippets=1&start=0&q=(+%2Btagsearch:asset)+||+(+%2Btagsearchnostem:asset)+&hl.fl=tagsearch&wt=javabin&hl=true&rows=100&version=1

The exception being thrown is:

09-Dec-2010 11:59:26 org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token profo exceeds length of provided text sized 26
at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:342)
at org.apache.solr.handler.component.HighlightComponent.process(HighlightComponent.java:89)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.lucene.search.highlight.InvalidTokenOffsetsException: Token profo exceeds length of provided text sized 26
at org.apache.lucene.search.highlight.Highlighter.getBestTextFragments(Highlighter.java:254)
at org.apache.solr.highlight.DefaultSolrHighlighter.doHighlighting(DefaultSolrHighlighter.java:335)
... 18 more

Thanks Shaun
Re: Highlighting Issue
Koji Thanks a lot, it's stopped crashing now. Can I ask one other question about synonym highlighting which looks a bit puzzling? I enter asset as my criteria and it returns, through synonym matching, other terms highlighted as well. My debug output is:

DEBUG: uk.co.sjp.intranet.service.SolrServiceImpl - Highlighted tag = <em>inves</em>tment
DEBUG: uk.co.sjp.intranet.service.SolrServiceImpl - Highlighted tag = <em>asset</em> management
DEBUG: uk.co.sjp.intranet.service.SolrServiceImpl - Highlighted tag = <em>inves</em>tment product
DEBUG: uk.co.sjp.intranet.service.SolrServiceImpl - Highlighted tag = alternative <em>asset</em>s

As you can see, asset works well. For the synonyms, does it just highlight the first n characters, where n is the length of the input string? Can't figure out how it could do otherwise. Shaun On 9 December 2010 12:51, Koji Sekiguchi k...@r.email.ne.jp wrote: (10/12/09 21:22), Shaun Campbell wrote: I'm trying to highlight a field and I'm getting an exception thrown, only on certain search terms though. I am fairly certain that the cause of the problem is through having synonyms on the highlighted field as I have had highlighting working in the past on other fields. The added complication is that the field that I am highlighting also has ngramming and stemming. I think what is happening is that the highlighting cannot match the criteria (which happens to be a synonym) against the actual string retrieved from the index and crashes, I think if the string found is greater than a certain number of characters. I wonder if anyone has experienced this problem and knows how to get around it? Basically, highlighting on synonym fields is no problem, but highlighter doesn't support n-gram fields. FastVectorHighlighter supports fixed-length (minGramSize==maxGramSize) n-gram fields. Koji -- http://www.rondhuit.com/en/
Re: Highlighting Issue
OK. I'd switched to FastVectorHighlighter, which cured the exceptions and gives me highlighting, so I assumed that you could use this instead of the standard highlighter on n-grammed fields. I guess my query was: how does the highlighter now highlight synonym terms? Thanks Shaun As I said in my previous mail, highlighter doesn't support n-gram fields. Please remove EdgeNGramFilter from your index analyzer and re-index. You'll get what you want. Koji -- http://www.rondhuit.com/en/
Re: Highlighting Issue
Sorry, I see what you mean about fixed-length (minGramSize==maxGramSize). I see mine aren't. :( On 9 December 2010 14:26, Koji Sekiguchi k...@r.email.ne.jp wrote: (10/12/09 22:50), Shaun Campbell wrote: OK. I'd switched to FastVectorHighlighter which cured the exceptions and gives me highlighting so I assumed that you could use this instead of the standard highlighter on n-grammed fields. I guess my query was how does the highlighter now highlight synonym terms? Thanks Shaun FVH supports a fixed-length (minGramSize==maxGramSize) n-gram tokenizer. I think FVH supports synonym fields, too. Koji -- http://www.rondhuit.com/en/
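For reference, a fixed-length n-gram field that FVH should cope with might look something like this (a sketch only; names and gram sizes are illustrative, and note that FastVectorHighlighter also needs term vectors enabled on the highlighted field):

```xml
<!-- Sketch: a fixed-length (bigram) variant, minGramSize == maxGramSize -->
<fieldType name="tagphrase_bigram" class="solr.TextField" positionIncrementGap="100">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.NGramFilterFactory" minGramSize="2" maxGramSize="2"/>
  </analyzer>
</fieldType>

<!-- FastVectorHighlighter requires term vectors with positions and offsets: -->
<field name="tagphrase" type="tagphrase_bigram" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```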
Core Swapping
I've got a Solr multi-core system and I'm trying to swap the cores after a re-index via SolrJ, using a separate HTTP Solr web server. My application seems to be generating a URL that's not valid for my Solr Tomcat installation, but I can't see why or where it's getting its data from. Core swapping works if I enter a URL manually into my browser. This URL works: http://localhost:8080/solr/admin/cores?action=SWAP&core=live&other=rebuild As you can see in my error below, the URL generated (http://localhost:8080/solr/rebuild/admin/cores?action=SWAP&core=rebuild&other=live&wt=javabin&version=1) has the core *rebuild* in it. If I enter this directly into my browser I get the following error: HTTP Status 404 - /solr/admin/cores type Status report message /solr/admin/cores description The requested resource (/solr/admin/cores) is not available. My Java code is:

CoreAdminRequest car = new CoreAdminRequest();
car.setCoreName("rebuild");
car.setOtherCoreName("live");
car.setAction(CoreAdminParams.CoreAdminAction.SWAP);
CoreAdminResponse carp = car.process(getSolrServer());

Can anyone suggest what I might be doing wrong?
Thanks Shaun The error from my web application is:

SEVERE: Servlet.service() for servlet Spring MVC Dispatcher Servlet threw exception
org.apache.solr.common.SolrException: Not Found Not Found
request: http://localhost:8080/solr/rebuild/admin/cores?action=SWAP&core=rebuild&other=live&wt=javabin&version=1
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:435)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
at org.apache.solr.client.solrj.request.CoreAdminRequest.process(CoreAdminRequest.java:214)
at uk.co.sjp.intranet.Factory.HttpSolrServerImpl.swapCores(HttpSolrServerImpl.java:50)
at uk.co.sjp.intranet.indexing.DocumentIndexer.index(DocumentIndexer.java:185)
at uk.co.sjp.intranet.SearchController.test(SearchController.java:107)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.doInvokeMethod(HandlerMethodInvoker.java:710)
at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:167)
at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:414)
at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:402)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:771)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:716)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:647)
at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:552)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:630)
at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:436)
at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:374)
at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:302)
at org.tuckey.web.filters.urlrewrite.NormalRewrittenUrl.doRewrite(NormalRewrittenUrl.java:195)
at org.tuckey.web.filters.urlrewrite.RuleChain.handleRewrite(RuleChain.java:159)
at org.tuckey.web.filters.urlrewrite.RuleChain.doRules(RuleChain.java:141)
at org.tuckey.web.filters.urlrewrite.UrlRewriter.processRequest(UrlRewriter.java:90)
at org.tuckey.web.filters.urlrewrite.UrlRewriteFilter.doFilter(UrlRewriteFilter.java:417)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at uk.co.sjp.intranet.utils.JsonpCallbackFilter.doFilter(JsonpCallbackFilter.java:108)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
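One thing the failing request URL suggests (a guess, since I can't see getSolrServer()): the SolrServer handed to car.process() looks like it was created with the *rebuild* core's URL as its base, so SolrJ resolves the core-admin path under /solr/rebuild/ instead of /solr/. A plain-JDK sketch of the two URL shapes (host and port assumed):

```java
import java.net.URL;

public class AdminUrlDemo {
    public static void main(String[] args) throws Exception {
        URL base = new URL("http://localhost:8080/solr/");
        URL coreBase = new URL(base, "rebuild/");

        // Core admin requests belong at the top-level /solr/ URL...
        System.out.println(new URL(base, "admin/cores?action=SWAP&core=rebuild&other=live"));
        // ...but resolved against a per-core base you get the 404ing shape:
        System.out.println(new URL(coreBase, "admin/cores?action=SWAP&core=rebuild&other=live"));
    }
}
```

If that is what's happening, creating a second server instance pointed at the base /solr/ URL just for CoreAdminRequest calls would be one way around it.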
EmbeddedSolrServer, Indexing and Core Swapping
Hi I've switched my app to use an EmbeddedSolrServer. I'm doing an index on my rebuild core and swapping cores at the end. Unfortunately, without restarting my web app I can't see the newly indexed data. I can see core swapping is working, and I can see the data after indexing without restarting servers if I use an HTTP Solr server. Is there something different I need to do after indexing and swapping cores with an embedded Solr server? The only thing I can possibly think of is that when I create an EmbeddedSolrServer object for processing my swap request I need to specify a core. Don't know if this is significant. Also, do you think I need to recreate my CoreContainer? Any ideas would be welcome. Thanks
Exception being thrown indexing a specific pdf document using Solr Cell
I've got an existing Spring Solr SolrJ application that indexes a mixture of documents. It seems to have been working fine for a couple of weeks, but today I've just started getting an exception when processing a certain PDF file. The exception is:

ERROR: org.apache.solr.core.SolrCore - org.apache.solr.common.SolrException: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@4683c2
at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:211)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.RequestHandlers$LazyRequestHandlerWrapper.handleRequest(RequestHandlers.java:233)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:139)
at uk.co.sjp.intranet.service.SolrServiceImpl.loadDocuments(SolrServiceImpl.java:308)
at uk.co.sjp.intranet.SearchController.loadDocuments(SearchController.java:297)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.doInvokeMethod(HandlerMethodInvoker.java:710)
at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:167)
at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:414)
at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:402)
at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:771)
at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:716)
at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:647)
at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:552)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:617)
at javax.servlet.http.HttpServlet.service(HttpServlet.java:717)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:290)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.ApplicationDispatcher.invoke(ApplicationDispatcher.java:630)
at org.apache.catalina.core.ApplicationDispatcher.processRequest(ApplicationDispatcher.java:436)
at org.apache.catalina.core.ApplicationDispatcher.doForward(ApplicationDispatcher.java:374)
at org.apache.catalina.core.ApplicationDispatcher.forward(ApplicationDispatcher.java:302)
at org.tuckey.web.filters.urlrewrite.NormalRewrittenUrl.doRewrite(NormalRewrittenUrl.java:195)
at org.tuckey.web.filters.urlrewrite.RuleChain.handleRewrite(RuleChain.java:159)
at org.tuckey.web.filters.urlrewrite.RuleChain.doRules(RuleChain.java:141)
at org.tuckey.web.filters.urlrewrite.UrlRewriter.processRequest(UrlRewriter.java:90)
at org.tuckey.web.filters.urlrewrite.UrlRewriteFilter.doFilter(UrlRewriteFilter.java:417)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:845)
at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)
Caused by: org.apache.tika.exception.TikaException: Unexpected RuntimeException from org.apache.tika.parser.pdf.pdfpar...@4683c2
at org.apache.tika.parser.CompositeParser.parse(CompositeParser.java:121)
at org.apache.tika.parser.AutoDetectParser.parse(AutoDetectParser.java:105)
at
Re: Swapping cores with SolrJ
Hi Mitch Thanks for responding. Not actually sure what you wanted from CoreAdminResponse but I put the following in:

CoreAdminRequest car = new CoreAdminRequest();
car.setCoreName("live");
car.setOtherCoreName("rebuild");
car.setAction(CoreAdminParams.CoreAdminAction.SWAP);
CoreAdminResponse carp = car.process(solrServer);
logger.debug("CoreAdminResponse status : " + carp.getCoreStatus());
logger.debug("CoreAdminResponse : " + carp.getResponse().toString());

and this was the output:

DEBUG: uk.co.apps2net.intranet.service.SolrServiceImpl - CoreAdminResponse status : null
DEBUG: uk.co.apps2net.intranet.service.SolrServiceImpl - CoreAdminResponse : {responseHeader={status=0,QTime=31}}

Looks sort of as though it's done nothing!! Thanks Shaun On 14 September 2010 15:49, MitchK mitc...@web.de wrote: Hi Shaun, I think it is easier to fix this problem if we have more information about what is going on in your application. Please could you provide the CoreAdminResponse returned by car.process() for us? Kind regards, - Mitch -- View this message in context: http://lucene.472066.n3.nabble.com/Swapping-cores-with-SolrJ-tp1472154p1473435.html Sent from the Solr - User mailing list archive at Nabble.com.
SolrJ and Multi Core Set up
I'm writing a client using SolrJ and was wondering how to handle a multi-core installation. We want to use the facility to rebuild the index on one of the cores at a scheduled time and then use the SWAP facility to switch the live core to the newly rebuilt core. I think I can do the SWAP with CoreAdminRequest.setAction() with a suitable parameter. First of all, does Solr have some concept of a default core? If I have core0 as my live core and core1 which I rebuild, then after the swap I expect core0 to contain my rebuilt index and core1 to contain the old live core data. My application should then keep referring to core0 as normal, with no change. Do I have to refer to core0 programmatically? I've currently got working client code to index and to query my Solr data, but I was wondering whether, or how, I set the core when I move to multi-core. There are examples showing it set as part of the URL, so my guess is it's done by using something like setParam on SolrQuery. Has anyone got any advice or examples of using SolrJ in a multi-core installation? Regards Shaun
Re: SolrJ and Multi Core Set up
Thanks Chantal, I hadn't spotted that; that's a big help. Thank you. Shaun On 3 September 2010 12:31, Chantal Ackermann chantal.ackerm...@btelligent.de wrote: Hi Shaun, you create the SolrServer using multicore by just adding the core to the URL. You don't need to add anything with SolrQuery.

URL url = new URL(new URL(solrBaseUrl), coreName);
CommonsHttpSolrServer server = new CommonsHttpSolrServer(url);

Concerning the default core thing - I wouldn't know about that. Cheers, Chantal On Fri, 2010-09-03 at 12:03 +0200, Shaun Campbell wrote: I'm writing a client using SolrJ and was wondering how to handle a multi core installation. We want to use the facility to rebuild the index on one of the cores at a scheduled time and then use the SWAP facility to switch the live core to the newly rebuilt core. I think I can do the SWAP with CoreAdminRequest.setAction() with a suitable parameter. First of all, does Solr have some concept of a default core? If I have core0 as my live core and core1 which I rebuild, then after the swap I expect core0 to now contain my rebuilt index and core1 to contain the old live core data. My application should then need to keep referring to core0 as normal with no change. Does I have to refer to core0 programmatically? I've currently got working client code to index and to query my Solr data but I was wondering whether or how I set the core when I move to multi core? There's examples showing it set as part of the URL so my guess it's done by using something like setParam on SolrQuery. Has anyone got any advice or examples of using SolrJ in a multi core installation? Regards Shaun
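One small gotcha with that two-argument URL constructor, worth checking in your own code: java.net.URL resolves the core name relative to the base URL, so the base must end with a trailing slash, otherwise the core name replaces the last path segment instead of being appended:

```java
import java.net.URL;

public class TrailingSlash {
    public static void main(String[] args) throws Exception {
        // With the trailing slash the core name is appended under /solr/:
        System.out.println(new URL(new URL("http://localhost:8080/solr/"), "core0"));
        // Without it, "core0" replaces the "solr" segment entirely:
        System.out.println(new URL(new URL("http://localhost:8080/solr"), "core0"));
    }
}
```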