Re: Storing Related Data - At Different Times
Hi Otis,

Thanks, I was thinking along those lines. But having two indexes will hurt my search:

1. Searching fields that belong only to the personal details should result in 5 resumes being shown for the user (if he has 5). But now it will only show 1 link to the personal details and no resumes.

2. Searching fields that belong to both the personal details and the resume details will result in 2 sets of results, which I will have to combine manually using text processing.

Can I avoid doing this?

Thanks, Gavin

On Sun, 2008-01-20 at 22:52 -0800, Otis Gospodnetic wrote:
You could have 2 separate indices tied with a common field (a la FK-PK). Then you only need to change the item you are updating.
Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Gavin [EMAIL PROTECTED]
To: solr-user solr-user@lucene.apache.org
Sent: Monday, January 21, 2008 12:09:23 AM
Subject: Storing Related Data - At Different Times

Hi,

In the web application we are developing we have two sets of details: the personal details and the resume details. We allow 5 different resumes for each user, but we want the personal details to remain the same for all 5 resumes. Personal details are added at registration time. After that, for each resume we want to link the personal details. This is a simple join in the db, but how do we achieve it in Solr? The problem is that when the personal details are changed we will have to update all 5 resumes. I read the thread "Some sort of join in SOLR?" but am not sure it answers my problem. Would very much appreciate some help on this one.

Thanks,
Gavin Selvaratnam, Project Leader
hSenid Mobile Solutions
Phone: +94-11-2446623/4 Fax: +94-11-2307579
Web: http://www.hSenidMobile.com
Make it happen

Disclaimer: This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to which they are addressed.
The content and opinions contained in this email are not necessarily those of hSenid Software International. If you have received this email in error please contact the sender.
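With Otis's two-index suggestion, the FK-PK join has to happen on the client side after querying both indices. A minimal sketch of such a merge, assuming hypothetical result dicts that share a `user_id` field (field names are made up for illustration):

```python
def join_results(personal_hits, resume_hits, key="user_id"):
    """Attach the matching personal-details doc to each resume doc."""
    personal_by_key = {doc[key]: doc for doc in personal_hits}
    joined = []
    for resume in resume_hits:
        merged = dict(resume)
        # overlay the shared personal fields onto each resume hit
        merged.update(personal_by_key.get(resume[key], {}))
        joined.append(merged)
    return joined

personal = [{"user_id": 1, "name": "Gavin"}]
resumes = [{"user_id": 1, "title": "Resume A"},
           {"user_id": 1, "title": "Resume B"}]
print(join_results(personal, resumes))
```

This keeps all 5 resume hits visible (Gavin's case 1) while the personal details live in a single doc that only needs updating once.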
Re: Term vector
Term vectors are, to some extent, the opposite of the inverted index. They store term, position and offset (the latter two are optional) on a per-document basis, such that you can say "give me the terms, positions and offsets for document X".

In terms of MLT, they are used to figure out what the most important terms in a document are, such that a new query can be formed to find other documents that are "more like this" document. They are also useful for highlighting and other non-search related activities like clustering, etc.

For more info, see my talk at ApacheCon: http://cnlp.org/presentations/slides/AdvancedLucene.pdf Also, search for term vectors on the Lucene user mailing list (you can do this via Nabble).

-Grant

On Jan 20, 2008, at 10:04 PM, anuvenk wrote:
> what are term vectors? How do they help with mlt?
> -- View this message in context: http://www.nabble.com/Term-vector-tp14990408p14990408.html Sent from the Solr - User mailing list archive at Nabble.com.

-- Grant Ingersoll http://lucene.grantingersoll.com http://www.lucenebootcamp.com Lucene Helpful Hints: http://wiki.apache.org/lucene-java/BasicsOfPerformance http://wiki.apache.org/lucene-java/LuceneFAQ
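In Solr, term vectors are enabled per field in schema.xml. A sketch with a made-up field name (`termPositions` and `termOffsets` are the optional parts Grant mentions):

```xml
<field name="content" type="text" indexed="true" stored="true"
       termVectors="true" termPositions="true" termOffsets="true"/>
```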
Newbie with Java + typo
Hi people

First, the typo on http://wiki.apache.org/solr/mySolr: under Production, "Typically it's not recommended do have your front end" should probably be "..recommended To have..".

Second, I don't know much about Java, nor about Jetty/Resin/JBoss/Tomcat. I went through the tutorial and was impressed with how easy it all seemed. Until the tutorial ended..

As a newbie, should I use Tomcat, JBoss, Resin, Jetty or the thing that comes with the example (Jetty, or?)? All the installation pages talk about this and that, which doesn't make much sense to non-Java people like myself :-/ Some after-tutorial page for us newbies would be MUCH appreciated.

Right now I'm just looking for something that can be used on a production-level machine. It doesn't have to be the fastest, as long as it's fairly easy to install. Recommendations and pointers are very welcome :)

Thanks in advance! / d
Re: Newbie with Java + typo
Daniel:

As a fellow 'non-java' person I feel your pain (well, felt it anyway). A lot depends on your load and the machine, but I successfully ran the stock Jetty system on a box last summer for work and didn't have performance problems. The bigger issue was the other Java people complaining that I hadn't used the standard JBoss setup they already had working. However, I didn't have access to that machine, nor would anyone give it to me at the time, so it was a catch-22.

Performance-wise, the stock Jetty will probably do just fine for you. Longer term, you may want to learn more about JBoss or Tomcat or something else which can give you more application management options and such. But don't let those things stop you from running Jetty/Solr in production - it's worked fine for me.

On Jan 21, 2008 10:48 AM, Daniel Andersson [EMAIL PROTECTED] wrote:
> As a newbie, should I use Tomcat, JBoss, Resin, Jetty or the thing that comes with the example (Jetty, or?)?

-- Michael Kimsal http://webdevradio.com
Multisearching with Solr
Hi. I am checking out Solr after having some experience with Lucene using PyLucene. I am looking at the potential of Solr to search over a large index divided over multiple servers and collect the results, sort of what the parallel MultiSearcher does in Lucene on its own. From a quick scan of the archives it appears SOLR-303 may be the answer to this.

Can this functionality be incorporated into 1.2 in a sandbox environment? Has anyone written a recipe that would be helpful in getting a sandbox up and running with SOLR-303? It will most likely be a few months before I need to incorporate this type of functionality in production, but I am hoping to begin experimenting as soon as possible. On that note, is it anticipated that 1.3 will be out in a few months? If so, will it include this functionality? Lastly, what sort of load balancing and replication potential is anticipated for the multisearching capability?

Many thanks. Regards, David
Re: Newbie with Java + typo
Daniel Andersson wrote:
> First the typo on http://wiki.apache.org/solr/mySolr: Production "Typically it's not recommended do have your front end" it should probably be "..recommended To have.."

you can edit any of the wiki pages... fixing typos is a great contribution!

> As a newbie, should I use Tomcat, JBoss, Resin, Jetty or the thing that comes with the example (Jetty, or?)?

Solr is servlet container agnostic -- it should run equally well on any of them. Most people are constrained to use what they are already using. If you really have no preference, perhaps stick with the Jetty one included in the example.

> Would be MUCH appreciated with some after-tutorial page for us newbies. Right now I'm just looking for something that can be used on a production level machine. It doesn't have to be the fastest, as long as it's fairly easy to install.

jetty is fine. I think otis is using that in http://www.simpy.com/ -- I use resin. Everyone you ask will give you a different answer ;) but the three containers most used by solr developers are jetty, resin and tomcat.

ryan
Re: Multisearching with Solr
You can always use the trunk build, but you'll have to check the status of SOLR-303 to be sure it's in the trunk... Here's a thread that discusses this... http://mail.google.com/mail/?zx=wmtcqx3ngeupshva=1#label/Solr/11799e3704804489

Best Erick

On Jan 21, 2008 10:55 AM, David Pratt [EMAIL PROTECTED] wrote:
> Hi. I am checking out solr after having some experience with lucene using pyLucene. From quick scan of archives it appears SOLR-303 may be the answer to this. ...
Re: Newbie with Java + typo
On Jan 21, 2008, at 11:13 AM, Daniel Andersson wrote: Well, no. Immutable Page, and as far as I know (english not being my mother tongue), that means I can't edit the page You need to create an account first.
Re: Newbie with Java + typo
On Jan 21, 2008, at 4:53 PM, Michael Kimsal wrote: As a fellow 'non-java' person I feel your pain (well, felt it anyway). A lot depends on your load and the machine, but I successfully ran the stock jetty system on a box last summer for work and didn't have performance problems. Performance-wise, the stock jetty will probably do just fine for you. Longer term, you may want to learn more about jboss or tomcat or something else which can give you more application management options and such. But don't let those things stop you from running jetty/solr in production - it's worked fine for me. Sounds good to me, thanks! / d
Re: Multisearching with Solr
Hi Erick. Thank you for your reply. Unfortunately, I cannot access the link you provided. Is this message from the solr-user list? Many thanks.

Regards, David

Erick Erickson wrote:
> You can always use the trunk build, but you'll have to check the status of SOLR-303 to be sure it's in the trunk... Here's a thread that discusses this... http://mail.google.com/mail/?zx=wmtcqx3ngeupshva=1#label/Solr/11799e3704804489
Re: spellcheckhandler
I did try with the latest nightly build. The problem still exists. I tested with the example data that comes with the Solr package.

1) With termSourceField set to 'word', which is a string field type: q=iped nano returns 'ipod nano', which is good.

2) With termSourceField set to 'spell' (the catch-all field of the 'spell' field type, per the tutorial http://wiki.apache.org/solr/SpellCheckerRequestHandler, that has my text fields copied into it at index time): q=grapics returns 'graphics', which is good, but q=grapics card returns nothing.

Not sure if I'm missing something. Please help!!

Otis Gospodnetic wrote:
> You don't need to wait for 1.3 to be released - you can simply use a recent nightly build.
> Otis -- Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: anuvenk [EMAIL PROTECTED]
To: solr-user@lucene.apache.org
Sent: Monday, January 21, 2008 12:35:52 AM
Subject: Re: spellcheckhandler

I followed the steps outlined in http://wiki.apache.org/solr/SpellCheckerRequestHandler with regards to setting up the schema with a new field 'spell' and copying other fields to this 'spell' field at index time. It works fine with single-word queries but doesn't return anything for multi-word queries. I read previous posts where this has been discussed, and that some of the active members are in the process of releasing patches that fix this problem. I'm actually trying to implement this spell check in a production setup. Is it absolutely not possible to get spell check results back for multi-word queries - should I wait for the 1.3 release? If there is any other option please educate me. In case a patch was already released, how do I add it to the current 1.2 version that I'm using?

-- View this message in context: http://www.nabble.com/spellcheckhandler-tp14627712p14991534.html Sent from the Solr - User mailing list archive at Nabble.com.
-- View this message in context: http://www.nabble.com/spellcheckhandler-tp14627712p15002379.html Sent from the Solr - User mailing list archive at Nabble.com.
DisMax and Search Components
Is there any support for DisMax (or any search request handlers) in search components, or is that something that still needs to be done? It seems like it isn't supported at the moment. We want to be able to use a field collapsing component (https://issues.apache.org/jira/browse/SOLR-236), but still be able to use our DisMax handlers. Right now it's one or the other, and we -need- both. Thanks. doug
Re: Multisearching with Solr
Yep, it's from the SOLR user list. Well, not really. I mistakenly copied my gmail url when I was looking at the relevant post, which *of course* you can't access.

http://svn.apache.org/repos/asf/lucene/solr/trunk or http://lucene.apache.org/solr/version_control.html

Sorry 'bout that.

Erick

On Jan 21, 2008 11:34 AM, David Pratt [EMAIL PROTECTED] wrote:
> Hi Erick. Thank you for your reply. Unfortunately, I cannot access the link you provided. Is this message from the solr-user list? Many thanks. Regards, David
Re: solr 1.3
On 20-Jan-08, at 5:07 PM, anuvenk wrote:
> when will this be released? where can i find the list of improvements/enhancements in 1.3 if its been documented already?

see http://svn.apache.org/viewvc/lucene/solr/trunk/CHANGES.txt?view=markup

We're not sure on a timeframe for release yet.

-Mike
RE: solr 1.3
Would someone please consider putting a label on the Subversion repository that says, "This is a clean version"? I only do HTTP requests and have no custom software, so I don't care about internal interfaces changing.

Thanks, Lance Norskog

-----Original Message-----
From: Mike Klaas [mailto:[EMAIL PROTECTED]
Sent: Monday, January 21, 2008 11:25 AM
To: solr-user@lucene.apache.org
Subject: Re: solr 1.3

On 20-Jan-08, at 5:07 PM, anuvenk wrote:
> when will this be released? where can i find the list of improvements/enhancements in 1.3 if its been documented already?

see http://svn.apache.org/viewvc/lucene/solr/trunk/CHANGES.txt?view=markup

We're not sure on a timeframe for release yet. -Mike
Re: Missing Content Stream
I am trying solrj to index, using the following code:

String url = "http://localhost:8080/solr";
SolrServer server = new CommonsHttpSolrServer( url );

It's giving an error that there is an undefined symbol for the constructor(String). Can someone tell me why this constructor is throwing an error, while in the source file I can clearly see this constructor?

thanks

On 1/15/08, Ismail Siddiqui [EMAIL PROTECTED] wrote:
> thanks brian and otis, i will definitely try solrj.. but actually now the problem is resolved by setting the content length in the header - i was missing it:
>
> c.setRequestProperty("Content-Length", xmlText.length() + "");
>
> but now it's not throwing any error but not indexing the document either.. do I have to set autoCommit on in solrconfig.xml???
> thanks
>
> On 1/15/08, Brian Whitman [EMAIL PROTECTED] wrote:
> > On Jan 15, 2008, at 1:50 PM, Ismail Siddiqui wrote:
> > > Hi Everyone, I am new to solr. I am trying to index xml using http post as follows
> >
> > Ismail, you seem to have a few spelling mistakes in your xml string. "fiehld", "nadme" etc. (a) try fixing them, (b) try solrj instead, I agree w/ otis.
Is it possible to have append kind update operation?
Hi, is it possible to have append-like updates, where if two records with the same id are posted to Solr, the contents of the two merge and compose a single record with that id?

I am asking because my program works in a multi-threaded manner where several threads produce several parts of a final record which is to be posted and indexed. Currently I have a preprocessing step where the threads produce parts, then a post-processing step where the parts are merged into a single xml file and posted to Solr. If append-like updating were possible, each thread could post to Solr directly without writing temporary files.

For example, thread 1 produces an xml file like:
--
<?xml version="1.0" encoding="UTF-8"?>
<add allowDups="true" overwriteCommitted="false" overwritePending="false">
  <doc>
    <field name="record-id">198</field>
    <field name="description">This is my short text. This is part 1 of the record with id=198</field>
  </doc>
</add>
--
thread 2 produces xml like:
--
<?xml version="1.0" encoding="UTF-8"?>
<add allowDups="true" overwriteCommitted="false" overwritePending="false">
  <doc>
    <field name="record-id">198</field>
    <field name="title">Title here. This is part 2 of record with id=198</field>
  </doc>
</add>
--
Currently my program needs to produce the two separate files, then merge them into:
--
<?xml version="1.0" encoding="UTF-8"?>
<add allowDups="true" overwriteCommitted="false" overwritePending="false">
  <doc>
    <field name="record-id">198</field>
    <field name="description">This is my short text. This is part 1 of the record with id=198</field>
    <field name="title">Title here. This is part 2 of record with id=198</field>
  </doc>
</add>
--
and then post the final file. If I post the two separately, I get two separate records with the same id=198, where one has only the description field and the other has only the title field. Is it possible to append? Or is my allowDups setting incorrect?

Many thanks!
-- View this message in context: http://www.nabble.com/Is-it-possible-to-have-%22append%22-kind-update-operation--tp15006743p15006743.html Sent from the Solr - User mailing list archive at Nabble.com.
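Since Solr 1.2 has no field-level append, the merge has to happen before posting. A small sketch of that post-processing step, assuming (hypothetically) that each thread hands back a plain dict of fields keyed by a shared "record-id", so the merge can happen in memory instead of via temporary files:

```python
from collections import defaultdict

def merge_parts(parts):
    """Combine partial field dicts that share a record-id into one doc each."""
    docs = defaultdict(dict)
    for part in parts:
        # later parts add their fields to the doc for that record-id
        docs[part["record-id"]].update(part)
    return list(docs.values())

parts = [
    {"record-id": "198", "description": "This is my short text."},
    {"record-id": "198", "title": "Title here."},
]
print(merge_parts(parts))  # one doc carrying both fields
```

The merged dicts would then be serialized into a single `<add>` XML document and posted once.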
illegal characters in xml file to be posted?
Hi, I am using the SimplePostTool to post files to Solr and have encountered a problem with the content of the xml files. I noticed that if my xml file has fields whose values contain the character '<' or '>' or '&', the post fails and I get the exception:

javax.xml.stream.XMLStreamException: ParseError at [row, col]:[x,y] Message: The entity name must immediately follow the '&' in the entity reference

Looks like these characters are illegal in xml as embedded content - but I did extract them from xml in the first place. Is there a list of such characters I need to deal with before I pass the file to SimplePostTool? Thanks!

-- View this message in context: http://www.nabble.com/illegal-characters-in-xml-file-to-be-posted--tp15006748p15006748.html Sent from the Solr - User mailing list archive at Nabble.com.
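If the field values are assembled as raw strings before being wrapped in the add XML, escaping them first avoids the parse error. A sketch using Python's standard library (the field name and value here are made up):

```python
from xml.sax.saxutils import escape

value = 'Fish & Chips <Ltd>'
# escape() converts &, < and > into entity references
field = '<field name="title">%s</field>' % escape(value)
print(field)  # -> <field name="title">Fish &amp; Chips &lt;Ltd&gt;</field>
```

The XML parser on the Solr side turns the entities back into the original characters at index time, so nothing extra is needed on retrieval.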
RE: illegal characters in xml file to be posted?
You should encode those three characters, and it doesn't hurt to encode the ampersand and double-quote characters too: http://en.wikipedia.org/wiki/XML#Entity_references

Peter

-----Original Message-----
From: zqzuk [mailto:[EMAIL PROTECTED]
Sent: Monday, January 21, 2008 2:24 PM
To: solr-user@lucene.apache.org
Subject: illegal characters in xml file to be posted?

Hi, I am using the SimplePostTool to post files to Solr. I noticed that if my xml file has fields whose values contain the character '<' or '>' or '&', the post fails and I get the exception:

javax.xml.stream.XMLStreamException: ParseError at [row, col]:[x,y] Message: The entity name must immediately follow the '&' in the entity reference

Looks like these characters are illegal in xml as embedded content - but I did extract them from xml in the first place. Is there a list of such characters I need to deal with before I pass the file to SimplePostTool? Thanks!

-- View this message in context: http://www.nabble.com/illegal-characters-in-xml-file-to-be-posted--tp15006748p15006748.html Sent from the Solr - User mailing list archive at Nabble.com.
Wildcards
Hello, I just started to use solr and I experience strange behaviour when it comes to wildcards. When I use the StandardRequestHandler queries like eur?p?an or eur*an work fine. But garden?r or admini*tion do not bring any results (without wildcards there are some of course). All affected fields are of type text, with the standard schema.xml from the example. Does anybody know how to fix this?
RE: illegal characters in xml file to be posted?
Thanks for the quick advice!

pbinkley wrote:
> You should encode those three characters, and it doesn't hurt to encode the ampersand and double-quote characters too: http://en.wikipedia.org/wiki/XML#Entity_references
> Peter

-- View this message in context: http://www.nabble.com/illegal-characters-in-xml-file-to-be-posted--tp15006748p15007840.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Wildcards
On Jan 21, 2008 5:18 PM, dojolava [EMAIL PROTECTED] wrote:
> I just started to use solr and I experience strange behaviour when it comes to wildcards. When I use the StandardRequestHandler queries like eur?p?an or eur*an work fine. But garden?r or admini*tion do not bring any results (without wildcards there are some of course).

It's probably stemming. Something like "gardener" is probably stemmed to "garden", so a wildcard query that expects something longer than "garden" won't find anything.

If you really need more accurate wildcard queries, do a copyField of this field into another that does not have stemming (perhaps just whitespace tokenizer and lowercase filter, and maybe stop filter). Then use this alternate field for wildcard queries.

-Yonik
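Yonik's copyField suggestion might look something like this in schema.xml. The field and type names here are made up; the unstemmed type is just the whitespace tokenizer plus the lowercase filter he mentions:

```xml
<fieldType name="text_unstemmed" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

<field name="body_wild" type="text_unstemmed" indexed="true" stored="false"/>
<copyField source="body" dest="body_wild"/>
```

Wildcard queries would then go against `body_wild` (e.g. `body_wild:garden?r`), while normal queries keep using the stemmed field.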
Re: Wildcards
Thanks a lot! I checked it: when I search for g?rden it works, only g?rdener does not... I will try the copyField solution.

On Jan 21, 2008 11:23 PM, Yonik Seeley [EMAIL PROTECTED] wrote:
> It's probably stemming. Something like "gardener" is probably stemmed to "garden", so a wildcard query that expects something longer than "garden" won't find anything. If you really need more accurate wildcard queries, do a copyField of this field into another that does not have stemming. Then use this alternate field for wildcard queries.
Re: DisMax and Search Components
On Jan 21, 2008 10:23 AM, Doug Steigerwald [EMAIL PROTECTED] wrote: Is there any support for DisMax (or any search request handlers) in search components, or is that something that still needs to be done? It seems like it isn't supported at the moment. I was curious about this, too ... If it *is* something that needs to be done, am happy to help w/ the coding. But I would need some advice/guidance up front -- I'm new enough to Solr that the design behind the SearchComponents refactoring is not immediately obvious to me, either from the Jira comments or the code itself. -Charlie
Re: DisMax and Search Components
The QueryComponent supports both lucene queryparser syntax and dismax query syntax. The dismax request handler now simply sets defType (the default base query type) to dismax -Yonik On Jan 21, 2008 1:23 PM, Doug Steigerwald [EMAIL PROTECTED] wrote: Is there any support for DisMax (or any search request handlers) in search components, or is that something that still needs to be done? It seems like it isn't supported at the moment. We want to be able to use a field collapsing component (https://issues.apache.org/jira/browse/SOLR-236), but still be able to use our DisMax handlers. Right now it's one or the other, and we -need- both. Thanks. doug
Re: DisMax and Search Components
We've found a way to work around it. In our search components, we're doing something like:

defType = defType == null ? DisMaxQParserPlugin.NAME : defType;

If you add defType=dismax to the query string, it'll use the DisMaxQParserPlugin.

Unfortunately, I haven't been able to figure out an easy way to access the config for the different dismax handlers defined in the config. So on our service side (Rails app), we're going to have a configuration with all the params we need to pass (qf, pf, fl, etc.) and send them based on parameters we have coming into the service that we use to figure out which dismax handler to use (uh, yeah, I think that sounds right). This may not be the best way to do it, but it will work fine for us until we can dedicate more time to it (we roll out Solr and our search service to QA next week).

Doug

Charles Hornberger wrote:
> I was curious about this, too ... If it *is* something that needs to be done, am happy to help w/ the coding.
Re: DisMax and Search Components
On Jan 21, 2008 9:06 PM, Doug Steigerwald [EMAIL PROTECTED] wrote: We've found a way to work around it. In our search components, we're doing something like: defType = defType == null ? DisMaxQParserPlugin.NAME : defType; Would it be easier to just add it as a default parameter in the request handler? -Yonik
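Yonik's suggestion would look something like this in solrconfig.xml. The handler name, qf values, and the "collapse" component name are illustrative (the latter assumes a component registered from the SOLR-236 patch):

```xml
<requestHandler name="/collapse" class="solr.SearchHandler">
  <lst name="defaults">
    <!-- parse q with the dismax query parser by default -->
    <str name="defType">dismax</str>
    <str name="qf">title^2.0 body</str>
  </lst>
  <arr name="components">
    <str>collapse</str>
    <str>facet</str>
  </arr>
</requestHandler>
```

A client that doesn't want dismax for a given request could still override it with defType=lucene on the query string, which avoids hard-coding the fallback in the component itself.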
Re: DisMax and Search Components
We don't always want to use the dismax handler in our setup. Doug Yonik Seeley wrote: On Jan 21, 2008 9:06 PM, Doug Steigerwald [EMAIL PROTECTED] wrote: We've found a way to work around it. In our search components, we're doing something like: defType = defType == null ? DisMaxQParserPlugin.NAME : defType; Would it be easier to just add it as a default parameter in the request handler? -Yonik
RE: copyField limitation
Sorting on a non-integer has space problems. As I understand it, sorting creates an array of integers the size of the number of records in the entire index. Sorting on a non-integer type also creates a separate array of the same size with the field data copied into it. Thus sorting on a non-integer field can use several times as much memory.

We have a very large index with very small records. We are creating matching integer fields for various fields just to have faster sorts, and we are doing this after benchmarking our speed and space behaviours.

I filed a Jira issue: https://issues.apache.org/jira/browse/SOLR-464

Thanks for your time, Lance Norskog

-----Original Message-----
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik Seeley
Sent: Thursday, January 17, 2008 2:53 PM
To: solr-user@lucene.apache.org
Subject: Re: copyField limitation

On Jan 17, 2008 4:53 PM, Lance Norskog [EMAIL PROTECTED] wrote:
> Because sort works much faster on type 'integer', but range queries do not work on type 'integer',

Really? The sort speed should be identical.
-Yonik
OOE during indexing
Hi. I get an OOME with Solr 1.3. Autowarming seems to be the villain, in conjunction with the FieldCache somehow.

JVM args: -Xmx512m -Xms512m -Xss128k

Index size is ~4 million docs, where I index text and store database primary keys.

du /srv/solr/feedItem/data/index/
1.7G /srv/solr/feedItem/data/index/

To ensure that the docs I index do not swell too much, I only allow 5K per doc over the wire, i.e. I substring(0, 5000) on the field content. I have removed firstSearcher and newSearcher since the queries I used before killed performance on reindexing the whole index. I will add them again later when I get into a delta-update index state.

Stacktrace:

[06:25:53.122] [null] /update wt=xml&version=2.2 0 3165
[06:25:53.877] Error during auto-warming of key: [EMAIL PROTECTED]: java.lang.OutOfMemoryError: Java heap space
[06:25:53.877] at org.apache.lucene.index.TermBuffer.toTerm(TermBuffer.java:104)
[06:25:53.877] at org.apache.lucene.index.SegmentTermEnum.term(SegmentTermEnum.java:159)
[06:25:53.877] at org.apache.lucene.index.SegmentMergeInfo.next(SegmentMergeInfo.java:66)
[06:25:53.877] at org.apache.lucene.index.MultiTermEnum.next(MultiReader.java:315)
[06:25:53.877] at org.apache.lucene.search.FieldCacheImpl$10.createValue(FieldCacheImpl.java:388)
[06:25:53.877] at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
[06:25:53.877] at org.apache.lucene.search.FieldCacheImpl.getStringIndex(FieldCacheImpl.java:350)
[06:25:53.877] at org.apache.lucene.search.FieldSortedHitQueue.comparatorString(FieldSortedHitQueue.java:266)
[06:25:53.877] at org.apache.lucene.search.FieldSortedHitQueue$1.createValue(FieldSortedHitQueue.java:182)
[06:25:53.877] at org.apache.lucene.search.FieldCacheImpl$Cache.get(FieldCacheImpl.java:72)
[06:25:53.877] at org.apache.lucene.search.FieldSortedHitQueue.getCachedComparator(FieldSortedHitQueue.java:155)
[06:25:53.877] at org.apache.lucene.search.FieldSortedHitQueue.<init>(FieldSortedHitQueue.java:56)
[06:25:53.877] at org.apache.solr.search.SolrIndexSearcher.getDocListNC(SolrIndexSearcher.java:862)
[06:25:53.877] at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:808)
[06:25:53.877] at org.apache.solr.search.SolrIndexSearcher.access$000(SolrIndexSearcher.java:56)
[06:25:53.877] at org.apache.solr.search.SolrIndexSearcher$2.regenerateItem(SolrIndexSearcher.java:254)
[06:25:53.877] at org.apache.solr.search.LRUCache.warm(LRUCache.java:192)
[06:25:53.877] at org.apache.solr.search.SolrIndexSearcher.warm(SolrIndexSearcher.java:1393)
[06:25:53.877] at org.apache.solr.core.SolrCore$2.call(SolrCore.java:702)
[06:25:53.877] at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:269)
[06:25:53.877] at java.util.concurrent.FutureTask.run(FutureTask.java:123)
[06:25:53.877] at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:650)
[06:25:53.877] at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:675)
[06:25:53.877] at java.lang.Thread.run(Thread.java:595)

Help anyone? Attaching schema.xml and solrconfig.xml

Kindly //Marcus Herou

<?xml version="1.0" encoding="UTF-8" ?>
<!--
 Licensed to the Apache Software Foundation (ASF) under one or more
 contributor license agreements. See the NOTICE file distributed with
 this work for additional information regarding copyright ownership.
 The ASF licenses this file to You under the Apache License, Version 2.0
 (the "License"); you may not use this file except in compliance with
 the License. You may obtain a copy of the License at

     http://www.apache.org/licenses/LICENSE-2.0

 Unless required by applicable law or agreed to in writing, software
 distributed under the License is distributed on an "AS IS" BASIS,
 WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
 See the License for the specific language governing permissions and
 limitations under the License.
-->
<!--
 This is the Solr schema file. This file should be named "schema.xml" and
 should be in the conf directory under the solr home (i.e. ./solr/conf/schema.xml
 by default) or located where the classloader for the Solr webapp can find it.

 This example schema is the recommended starting point for users.
 It should be kept correct and concise, usable out-of-the-box.

 For more information, on how to customize this file, please see
 http://wiki.apache.org/solr/SchemaXml
-->
<schema name="example" version="1.1">
  <types>
    <fieldType name="string" class="solr.StrField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="boolean" class="solr.BoolField" sortMissingLast="true" omitNorms="true"/>
    <fieldType name="integer" class="solr.IntField" omitNorms="true"/>
    <fieldType name="long" class="solr.LongField" omitNorms="true"/>
    <fieldType name="float" class="solr.FloatField" omitNorms="true"/>
    <fieldType name="double" class="solr.DoubleField" omitNorms="true"/>
    <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
    <fieldType