RE: Problem comitting on 40GB index
Sorry, my bad... I replied to a current mailing-list message, only changing the subject. I didn't know about this hijacking problem; it will not happen again.

Just to close this issue: if I understand correctly, for a 40 GB index I will need, in order to run an optimize:

- 40 GB if all activity on the index is stopped
- 80 GB if the index is being searched
- 120 GB if the index is being searched and a commit is performed

Is this correct?

Thanks,
Frederico

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: Tuesday, 12 January 2010 19:18
To: solr-user@lucene.apache.org
Subject: Re: Problem comitting on 40GB index

Huh?

On Tue, Jan 12, 2010 at 2:00 PM, Chris Hostetter hossman_luc...@fucit.org wrote:
: Subject: Problem comitting on 40GB index
: In-Reply-To: 7a9c48b51001120345h5a57dbd4o8a8a39fc4a98a...@mail.gmail.com

http://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to an existing message; instead, start a fresh email. Even if you change the subject line of your email, other mail headers still track which thread you replied to, and your question is hidden in that thread and gets less attention. It makes following discussions in the mailing-list archives particularly difficult.

See Also: http://en.wikipedia.org/wiki/User:DonDiego/Thread_hijacking

-Hoss
Re: XmlUpdateRequestHandler with HTMLStripCharFilterFactory
On Wed, Jan 13, 2010 at 7:48 AM, Lance Norskog goks...@gmail.com wrote:
> You can do this stripping in the DataImportHandler. You would have to write your own stripping code using regular expressions.

Note that DIH has an HTMLStripTransformer which wraps Solr's HTMLStripReader.

--
Regards,
Shalin Shekhar Mangar.
Problem with 'sint' in More Like This feature
Hi,

I am using the More Like This feature, configured in solrconfig.xml as a dedicated request handler, and I query it through SolrJ. It works properly when the similarity fields are all text data types, but when I add a field whose data type is 'sint', it throws an exception:

Caused by: org.apache.solr.common.SolrException:
java.lang.NumberFormatException: For input string: "?"

Any help / suggestion is much appreciated.

Thanks,
Vijay
Re: Queries of type field:value not functioning
Try

/solr/select?q.alt=*:*&qt=dismax

or

/solr/select?q=some search term&qt=dismax

dismax should be configured in solrconfig.xml by default, but you have to adapt it to list the fields from your schema.xml. And for anything with a known field:

/solr/select?q=field:value&qt=standard

Cheers,
Chantal

Siddhant Goel schrieb:
> Hi all,
> Any query I make which is of type field:value does not return any documents. Same is the case for the *:* query; it doesn't return any results either. The index size is close to 1GB now, so it should be returning some documents. The rest of the queries are functioning properly. Any help?
> Thanks,
> Siddhant
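The requests above can also be assembled programmatically; a minimal Python sketch (the host, port, and core path are assumptions, not from the thread):

```python
from urllib.parse import urlencode

def solr_select_url(base, **params):
    """Build a Solr select URL; urlencode escapes characters
    like '*' and ':' in parameter values and joins with '&'."""
    return base + "/select?" + urlencode(params)

# Match-all via dismax's q.alt, used when no user query is given:
url1 = solr_select_url("http://localhost:8983/solr",
                       **{"q.alt": "*:*", "qt": "dismax"})
# Ordinary dismax search:
url2 = solr_select_url("http://localhost:8983/solr",
                       q="some search term", qt="dismax")
```

Note that urlencode percent-encodes `*:*` as `%2A%3A%2A`, which is why the raw examples above only work when the client or browser does the escaping for you.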
Restricting Facet to FilterQuery in combination with mincount
Hi all,

Is it possible to restrict the returned facets to only those that apply to the filter query, but still use mincount=0? That is: keep facet values that have a count of 0 but are covered by the filter, while leaving out those that are not covered by the filter (and are thus 0 as well).

A longer explanation of the question, with an example (don't nail me down on biology here, it's just for illustration):

q=type:mammal&facet.mincount=0&facet.field=type

returns facets for all values stored in the field "type". Results would look like:

mammal (2123)
bird (0)
dinosaur (0)
fish (0)
...

In this case, setting facet.mincount=1 solves the problem. But consider:

q=area:water&fq=type:mammal&facet.field=name&facet.mincount=0

which would return something like:

dolphin (20)
blue whale (20)
salmon (0)  <= not covered by filter query
lion (0)
dog (0)
... (all sorts of animals, every possible value in the field "name")

My question is: how can I exclude those facets from the result that are not covered by the filter query? In this example: how can I exclude the non-mammals from the facets, but keep all those mammals that are not matched by the actual query parameter?

Thanks!
Chantal
Re: Multi language support
Right, but we should not encourage users to significantly degrade overall relevance for all movies because of a few movies and a band (very special cases, as I said). In English, not using stopwords doesn't really degrade relevance that much, so it's a reasonable decision to make. This is not true in other languages!

Instead, systems that worry about all-stopword queries should use CommonGrams. It will work better for these cases, without taking away from overall relevance.

On Wed, Jan 13, 2010 at 1:08 AM, Walter Underwood wun...@wunderwood.org wrote:
> There is a band named The The. And a producer named Don Was. For a list of all-stopword movie titles at Netflix, see this post:
> http://wunderwood.org/most_casual_observer/2007/05/invisible_titles.html
> My favorite is To Be and To Have (Être et Avoir), which is all stopwords in two languages. And a very good movie.
> wunder

On Jan 12, 2010, at 6:55 PM, Robert Muir wrote:
> Sorry, I forgot to include this 2009 paper comparing what stopwords do across 3 languages:
> http://doc.rero.ch/lm.php?url=1000,43,4,20091218142456-GY/Dolamic_Ljiljana_-_When_Stopword_Lists_Make_the_Difference_20091218.pdf
> In my opinion, if stopwords annoy your users for very special cases like 'the the', then consider using CommonGrams + DefaultSimilarity.discountOverlaps = true instead, so that you still get the benefits. As you can see from the above paper, stopwords can be extremely important depending on the language; they just don't matter so much for English.

On Tue, Jan 12, 2010 at 9:20 PM, Lance Norskog goks...@gmail.com wrote:
> There are a lot of projects that don't use stopwords any more. You might consider dropping them altogether.

On Mon, Jan 11, 2010 at 2:25 PM, Don Werve d...@madwombat.com wrote:
> This is the way I've implemented multilingual search as well.

2010/1/11 Markus Jelsma mar...@buyways.nl
> Hello,
> We have implemented language-specific search in Solr using language-specific fields and field types.
> For instance, an en_text field type can use an English stemmer, a list of stopwords, and synonyms. We, however, did not use specific stopwords; instead we used one list shared by both languages. So you would have a field type like:
>
> <fieldType name="en_text" class="solr.TextField" ...>
>   <analyzer type=...>
>     <filter class="solr.StopFilterFactory" words="stopwords.en.txt"/>
>     <filter class="solr.SynonymFilterFactory" synonyms="synoyms.en.txt"/>
>     etc. etc.
>
> Cheers,
> Markus Jelsma - Buyways B.V. - Technisch Architect
> Friesestraatweg 215c, 9743 AD Groningen - http://www.buyways.nl
> Alg. 050-853 6600  KvK 01074105  Tel. 050-853 6620  Fax. 050-3118124  Mob. 06-5025 8350
> In: http://www.linkedin.com/in/markus17

On Mon, 2010-01-11 at 13:45 +0100, Daniel Persson wrote:
> Hi Solr users.
> I'm trying to set up a site with Solr search integrated, and I use the SolrJ API to feed the index with search documents. At the moment I have only activated search on the English portion of the site. I'm interested in using as many features of Solr as possible. Synonyms, stopwords, and stems all sound quite interesting and useful, but how do I set this up in a good way for a multilingual site? The site doesn't have a huge text mass, so performance issues don't really bother me, but I'd still like to hear your suggestions before I try to implement a solution.
> Best regards
> Daniel

--
Robert Muir rcm...@gmail.com
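The CommonGrams approach recommended above can be sketched roughly as follows. This is only an illustration of the pairing idea, not Solr's actual CommonGramsFilter, which also interleaves token positions:

```python
def common_grams(tokens, stopwords):
    """Emit the original tokens plus a fused bigram for every adjacent
    pair containing a stopword, so an all-stopword query like
    'the the' still yields a selective indexed term ('the_the')."""
    out = list(tokens)
    for a, b in zip(tokens, tokens[1:]):
        if a in stopwords or b in stopwords:
            out.append(f"{a}_{b}")
    return out
```

For example, common_grams(["the", "the"], {"the"}) keeps both "the" tokens and adds "the_the", which is rare in the index even though "the" alone is not; pairs with no stopword are left untouched.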
Re: Problem comitting on 40GB index
That's my understanding. But fortunately, disk space is cheap <G>.

On Wed, Jan 13, 2010 at 5:01 AM, Frederico Azeiteiro frederico.azeite...@cision.com wrote:
> Just to close this issue: if I understand correctly, for a 40 GB index I will need, in order to run an optimize:
> - 40 GB if all activity on the index is stopped
> - 80 GB if the index is being searched
> - 120 GB if the index is being searched and a commit is performed
> Is this correct?
> Thanks,
> Frederico
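The rule of thumb above, as a quick calculation. The 1x/2x/3x multiples are the worst-case figures from this thread, not a guarantee:

```python
def optimize_headroom_gb(index_gb, searching=False, committing=False):
    # Optimize rewrites the index, so one extra copy is always needed;
    # an open searcher can pin the old files (a second copy), and an
    # overlapping commit can add a third.
    copies = 1 + int(searching) + int(committing)
    return index_gb * copies
```

For a 40 GB index, that gives 40, 80, or 120 GB of free space depending on whether searches and commits overlap the optimize.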
Re: DataImportHandler - synchronous execution
Hi,

I created Jira issue SOLR-1721 and attached a simple patch (no documentation) for this.

HTH,
Alex

2010/1/13 Noble Paul നോബിള്‍ नोब्ळ् noble.p...@corp.aol.com:
> it can be added

On Tue, Jan 12, 2010 at 10:18 PM, Alexey Serba ase...@gmail.com wrote:
> Hi,
> I found that there's no explicit option to run DataImportHandler in synchronous mode. I need that option to run DIH from SolrJ (EmbeddedSolrServer) in the same thread. Currently I pass a dummy stream to DIH as a workaround, but I think it makes sense to add a specific option for that. Any objections?
> Alex

--
Noble Paul | Systems Architect | AOL | http://aol.com
Re: Multi language support
Isn't the conclusion here that stopword- and stemming-free matching should be the best match, if any, and to then gently degrade to weaker forms of matching?

paul

On 13 Jan 2010, at 07:08, Walter Underwood wrote:
> There is a band named The The. And a producer named Don Was. For a list of all-stopword movie titles at Netflix, see this post:
> http://wunderwood.org/most_casual_observer/2007/05/invisible_titles.html
> My favorite is To Be and To Have (Être et Avoir), which is all stopwords in two languages. And a very good movie.
> wunder
Problem indexing files
Hi all,

I'm trying to add multiple files to Solr 1.4 with SolrJ. With this program, 1 doc is added to Solr:

  SolrServer server = SolrHelper.getServer();
  server.deleteByQuery("*:*"); // delete everything!
  server.commit();
  QueryResponse rsp = server.query(new SolrQuery("*:*"));
  Assert.assertEquals(0, rsp.getResults().getNumFound());

  ContentStreamUpdateRequest up = new ContentStreamUpdateRequest("/update/extract");
  up.addFile(new File("d:/temp/test.txt"));
  //up.addFile(new File("d:/temp/test2.txt")); // <-- Nothing added if removing the comment from this line.
  up.setParam("literal.contid", "doc1");
  up.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
  NamedList<Object> result = server.request(up);
  UpdateResponse test = server.commit();

But no doc is added if I remove the comment tag from the second addFile. What's wrong with this?

Thanks,
Thomas
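One possible workaround, an assumption on my part rather than a confirmed fix from this thread, is to send one /update/extract request per file (the extract handler is oriented around a single content stream) and commit once at the end. A Python sketch of such a request plan; the contid values and URLs are hypothetical:

```python
def build_extract_requests(paths, ids):
    """Plan one /update/extract request per file, plus a single
    trailing commit instead of committing on every request."""
    assert len(paths) == len(ids)
    requests = [
        {"url": "/update/extract", "file": p,
         "params": {"literal.contid": i}}
        for p, i in zip(paths, ids)
    ]
    requests.append({"url": "/update", "params": {"commit": "true"}})
    return requests
```

Each entry in the returned plan would be sent as its own HTTP request by whatever client is in use (SolrJ here), so every file gets its own extraction pass.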
Boosting fields with localsolr
I have tried several variations now, but have been unable to come up with a way to boost fields in a localsolr query. What I need to do is run a localsolr search and sort the result set so that a specific value is at the top. My idea was to use a nested dismax query with a boost field, like this (with field names changed to protect the guilty):

  qt=geo
  lat=44.47
  long=-73.15
  radius=10
  _query_:"{!dismax qf=year bf=author:kevin^2}2010"
  sort=score desc

In plain English: find all posts in the given radius from the year 2010, with the posts by author 'kevin' appearing at the top of the result set. This didn't work, as _query_ wasn't recognized by the localsolr handler. I then tried the opposite, putting the localsolr query in a nested query, but the dismax handler didn't parse the nested query. So, is there any way to accomplish what I am trying?

Thanks,
Kevin
Re: Boosting fields with localsolr
On Jan 13, 2010, at 10:44 AM, Kevin Thorley wrote:
> I have tried several variations now, but have been unable to come up with a way to boost fields in a localsolr query. What I need to do is run a localsolr search and sort the result set so that a specific value is at the top. My idea was to use a nested dismax query with a boost field, like this (with field names changed to protect the guilty):
>
>   qt=geo
>   lat=44.47
>   long=-73.15
>   radius=10
>   _query_:"{!dismax qf=year bf=author:kevin^2}2010"
>   sort=score desc

Sorry if this caused any confusion... the bf param above should have been bq.

> In plain English: find all posts in the given radius from the year 2010, with the posts by author 'kevin' appearing at the top of the result set. This didn't work, as _query_ wasn't recognized by the localsolr handler. I then tried the opposite, putting the localsolr query in a nested query, but the dismax handler didn't parse the nested query. So, is there any way to accomplish what I am trying?
>
> Thanks,
> Kevin
RE: Need help Migrating to Solr
I don't have experience with migrating, but you should consider using the example schema.xml in the distro as a starting basis for creating your schema.

-----Original Message-----
From: Abin Mathew [mailto:abin.mat...@toostep.com]
Sent: Tuesday, January 12, 2010 8:42 PM
To: solr-user@lucene.apache.org
Subject: Need help Migrating to Solr

Hi,

I am new to the Solr technology. We have been using Lucene to handle searching in our web application www.toostep.com, which is a knowledge-sharing platform developed in Java using the Spring MVC architecture and iBatis as the persistence framework. Now that the application is getting very complex, we have decided to implement Solr on top of Lucene. Anyone having expertise in this area, please give me some guidelines on where to start and how to form the schema for Solr.

Thanks and Regards,
Abin Mathew
copyField with Analyzer?
Hi all,

I tried creating a case-insensitive string field populated, via copyField, with the values provided to a string field. This didn't work, since copyField does its job before the analyzer on the case-insensitive string field is invoked. Is there another way I might accomplish this field replication on the server?

Tim Harsch
Sr. Software Engineer
Dell Perot Systems
RE: Problem comitting on 40GB index
Just curious: have you checked whether the hanging you are experiencing is garbage-collection related?

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 13 January 2010 13:33
To: solr-user@lucene.apache.org
Subject: Re: Problem comitting on 40GB index

That's my understanding. But fortunately, disk space is cheap <G>.

--
This transmission is strictly confidential, possibly legally privileged, and intended solely for the addressee.
Any views or opinions expressed within it are those of the author and do not necessarily represent those of 192.com, i-CD Publishing (UK) Ltd or any of it's subsidiary companies. If you are not the intended recipient then you must not disclose, copy or take any action in reliance of this transmission. If you have received this transmission in error, please notify the sender as soon as possible. No employee or agent is authorised to conclude any binding agreement on behalf of i-CD Publishing (UK) Ltd with another party by email without express written confirmation by an authorised employee of the Company. http://www.192.com (Tel: 08000 192 192). i-CD Publishing (UK) Ltd is incorporated in England and Wales, company number 3148549, VAT No. GB 673128728.
Re: Question
On Wed, Jan 13, 2010 at 10:17 AM, Bill Bell bb...@kaango.com wrote:
> I am using Solr 1.4 and have 3 cores defined in solr.xml. Questions on replication:
>
> 1. How do I set up rsync replication from master to slaves? It was easy to do with just one core and one script.conf, but what is the easiest way with multiple cores?
> 2. I got the system to work by changing the snappuller to pass in a "-c script.conf", but there has got to be an easier way?
> 3. On the master I have 3 rsync daemons running. Is it possible to do it with one? The script.conf really needs multiple data_dir settings...
>
> --
> Bill Bell
> Vice President of Technology
> bb...@kaango.com
> mobile 720.256.8076
> Kaango, LLC - www.kaango.com
Re: How to display Highlight with VelocityResponseWriter?
Thanks a lot, it works now. When I added the line

  #set($hl = $response.highlighting)

I got the highlighting. But I wonder if there's any document that describes the usage of that; I didn't know the names of those methods, and actually just managed to guess them.

best regards,
Qiuyan

Quoting Sascha Szott sz...@zib.de:
> Qiuyan,
>> ... highlighting can also be displayed in the web gui. I've added <bool name="hl">true</bool> into the standard responseHandler and it already works, i.e. without Velocity. But the same line doesn't take effect in /itas. Should I configure anything else? Thanks in advance.
> First of all, just a few notes on the /itas request handler in your solrconfig.xml:
> 1. The entry <arr name="components"><str>highlight</str></arr> is obsolete, since the highlighting component is a default search component [1].
> 2. Note that since you didn't specify a value for hl.fl, highlighting will only affect the fields listed inside of qf.
> 3. Why did you override the default value of hl.fragmenter? In most cases the default fragmenting algorithm (gap) works fine - and maybe in yours as well?
> To make sure all your hl-related settings are correct, can you post an XML output (change the wt parameter to xml) for a search with highlighted results? And finally, can you post the VTL code snippet that should produce the highlighted output?
> -Sascha
> [1] http://wiki.apache.org/solr/SearchComponent
RE: Problem comitting on 40GB index
The hanging didn't happen again since yesterday, and I never ran out of space again. This is still a dev environment, so the number of searches is very low. Maybe I'm just lucky... Where can I see the garbage-collection info?

-----Original Message-----
From: Marc Des Garets [mailto:marc.desgar...@192.com]
Sent: Wednesday, 13 January 2010 17:20
To: solr-user@lucene.apache.org
Subject: RE: Problem comitting on 40GB index

Just curious: have you checked whether the hanging you are experiencing is garbage-collection related?
Interesting OutOfMemoryError on a 170M index
Hi,

I have a bit of an interesting OutOfMemoryError that I'm trying to figure out. My client and Solr server are running in the same JVM (for deployment simplicity). FWIW, I'm using Jetty to host Solr, the supplied code for the HTTP-based client interface, and Solr 1.3.0.

My app is adding about 20,000 documents per minute to the index, one at a time (it listens to an event stream and, for every event, adds a new document to the index). The size of the documents, however, is tiny: the total index growth is only about 170M (after about 1 hr and the OutOfMemoryError).

At this point there is zero querying happening, just updates to the index (only adding documents; no updates or deletes). After about an hour or so, my JVM runs out of heap space, and if I look at the memory utilisation over time, it looks like a classic memory leak: it slowly ramps up until we end up with constant full GCs and an eventual OOME. Max heap space is 512M.

In Solr, I'm using autocommit (to buffer the updates):

  <autoCommit>
    <maxDocs>1</maxDocs>
    <maxTime>1000</maxTime>
  </autoCommit>

(Aside: I'm not sure if I am meant to call commit or not on the client SolrServer class if I am using autocommit - but as it turns out, I get the OOME whether I do that or not.)

Any suggestions/advice of quick things to check before I dust off the profiler? Thanks in advance.

Cheers,
Nick

===
Please access the attached hyperlink for an important electronic communications disclaimer:
http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html
===
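With maxDocs set to 1, the autocommit described above effectively triggers a commit for every added document. A quick back-of-envelope on the load described in the post (this is only arithmetic on the stated rate, not a diagnosis of the leak):

```python
docs_per_minute = 20_000   # event rate from the post
commits_per_doc = 1        # maxDocs=1 => roughly one commit per add
minutes = 60               # the OOME appears after about an hour

commits_per_hour = docs_per_minute * minutes * commits_per_doc
print(commits_per_hour)    # 1,200,000 commits in the hour before the OOME
```

That commit volume is far beyond what a once-per-second maxTime would produce on its own, so the maxDocs=1 setting is one plausible thing to examine before reaching for the profiler.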
case-insensitive string type
Hi,

I have a field:

  <field name="srcANYSTRStrCI" type="string_ci" indexed="true" stored="true" multiValued="true"/>

with this type definition:

  <!-- A case-insensitive version of the string type -->
  <fieldType name="string_ci" class="solr.StrField" sortMissingLast="true" omitNorms="true">
    <analyzer type="index">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
    <analyzer type="query">
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>

When searching that field, I can't get a case-insensitive match. It works as if it were a regular string: for instance, I can do a prefix query, and as long as the prefix matches the case of the value it works, but if I change the prefix case it doesn't. Essentially, I am trying to get case-insensitive matching that supports wildcards...

Tim Harsch
Sr. Software Engineer
Dell Perot Systems
(650) 604-0374
Re: case-insensitive string type
From http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters:

  "On wildcard and fuzzy searches, no text analysis is performed on the search word."

I'd just lowercase the wildcard-ed search term in your client code, before you send it to Solr.

hth,
rob

On Wed, Jan 13, 2010 at 2:18 PM, Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS] timothy.j.har...@nasa.gov wrote:
> When searching that field, I can't get a case-insensitive match. It works as if it were a regular string: I can do a prefix query, and as long as the prefix matches the case of the value it works, but if I change the prefix case it doesn't. Essentially, I am trying to get case-insensitive matching that supports wildcards...
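Rob's suggestion, sketched in Python. Since str.lower() leaves the wildcard characters '*' and '?' untouched, no special handling is needed:

```python
def prepare_wildcard_term(term: str) -> str:
    """Lowercase a term client-side before sending it to Solr,
    since wildcard and fuzzy queries skip query-time analysis."""
    return term.lower()
```

For example, prepare_wildcard_term("MiX*CAse?") yields "mix*case?", which then matches the lowercased tokens produced by the index-time LowerCaseFilterFactory.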
Re: case-insensitive string type
What do you get when you add debugQuery=on to your lower-case query? And does Luke show you what you expect in the index?

On Wed, Jan 13, 2010 at 2:18 PM, Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS] timothy.j.har...@nasa.gov wrote:
> When searching that field, I can't get a case-insensitive match. It works as if it were a regular string... Essentially, I am trying to get case-insensitive matching that supports wildcards...
RE: case-insensitive string type
I considered that, but I'm also having the issue that I can't get an exact match to behave case-insensitively either.

-----Original Message-----
From: Rob Casson [mailto:rob.cas...@gmail.com]
Sent: Wednesday, January 13, 2010 11:26 AM
To: solr-user@lucene.apache.org
Subject: Re: case-insensitive string type

from http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters:

  "On wildcard and fuzzy searches, no text analysis is performed on the search word."

i'd just lowercase the wildcard-ed search term in your client code, before you send it to solr.

hth,
rob
RE: case-insensitive string type
From the query

  http://localhost:8080/solr/select?q=idxPartition%3ASOMEPART%20AND%20srcANYSTRStrCI:%22mixcase%20or%20lower%22&debugQuery=on

the debug info is attached.

-----Original Message-----
From: Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS] [mailto:timothy.j.har...@nasa.gov]
Sent: Wednesday, January 13, 2010 11:28 AM
To: solr-user@lucene.apache.org
Subject: RE: case-insensitive string type

I considered that, but I'm also having the issue that I can't get an exact match to behave case-insensitively either.
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">62</int>
    <lst name="params">
      <str name="debugQuery">on</str>
      <str name="q">idxPartition:SOMEPART AND srcANYSTRStrCI:"mixcase or lower"</str>
    </lst>
  </lst>
  <result name="response" numFound="0" start="0"/>
  <lst name="debug">
    <str name="rawquerystring">idxPartition:SOMEPART AND srcANYSTRStrCI:"mixcase or lower"</str>
    <str name="querystring">idxPartition:SOMEPART AND srcANYSTRStrCI:"mixcase or lower"</str>
    <str name="parsedquery">+idxPartition:SOMEPART +srcANYSTRStrCI:mixcase or lower</str>
    <str name="parsedquery_toString">+idxPartition:SOMEPART +srcANYSTRStrCI:mixcase or lower</str>
    <lst name="explain"/>
    <str name="QParser">LuceneQParser</str>
    <lst name="timing">
      <double name="time">31.0</double>
      <lst name="prepare">
        <double name="time">31.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
      <lst name="process">
        <double name="time">0.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.DebugComponent"><double name="time">0.0</double></lst>
      </lst>
    </lst>
  </lst>
</response>
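Rob's suggestion earlier in this thread (lowercase the wildcard-ed search term in the client before sending it to Solr, since Solr performs no analysis on wildcard terms) can be sketched as a small client-side helper. The class and method names here are hypothetical, and the helper only lowercases the part after the field name, because Solr field names are case-sensitive:

```java
public class LowercaseWildcard {
    // Lowercase only the term portion of a "field:term*" query string,
    // leaving the (case-sensitive) field name untouched. This mirrors
    // what a KeywordTokenizer + LowerCaseFilter chain would do at index
    // time, which wildcard queries skip.
    static String lowercaseTerm(String fieldQuery) {
        int colon = fieldQuery.indexOf(':');
        if (colon < 0) return fieldQuery.toLowerCase();
        return fieldQuery.substring(0, colon + 1)
             + fieldQuery.substring(colon + 1).toLowerCase();
    }

    public static void main(String[] args) {
        // prints "srcANYSTRStrCI:mixcase*"
        System.out.println(lowercaseTerm("srcANYSTRStrCI:miXCAse*"));
    }
}
```

The resulting string would then be passed as the q parameter in the usual way.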
RE: case-insensitive string type
The value in the srcANYSTRStrCI field is "miXCAse or LowER" according to Luke.
RE: case-insensitive string type
I created a document that has a string field and a case-insensitive string field using my string_ci type; both have the same value sent at document creation time: "miXCAse or LowER". I attach two debug query results, one against the string type and one against mine. The queries differ only in the query field. Against the string field there are results; against mine there are none. Looking at the debug info, querying my type does lowercase the query value, it seems. Does this mean the analyzer at index time is failing? Would the fact that Luke shows the value as case-preserved in both the string field and the string_ci field support this?

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">32</int>
    <lst name="params">
      <str name="debugQuery">on</str>
      <str name="q">srcANYSTRStr:(miXCAse or LowER)</str>
    </lst>
  </lst>
  <result name="response" numFound="1" start="0">
    <doc>
      <str name="idxKey">TimsUniqueKey</str>
      <arr name="srcANYSTRStr"><str>miXCAse or LowER</str></arr>
      <arr name="srcANYSTRStrCI"><str>miXCAse or LowER</str></arr>
    </doc>
  </result>
  <lst name="debug">
    <str name="rawquerystring">srcANYSTRStr:(miXCAse or LowER)</str>
    <str name="querystring">srcANYSTRStr:(miXCAse or LowER)</str>
    <str name="parsedquery">srcANYSTRStr:miXCAse or LowER</str>
    <str name="parsedquery_toString">srcANYSTRStr:miXCAse or LowER</str>
    <lst name="explain">
      <str name="TimsUniqueKey">
9.250228 = (MATCH) fieldWeight(srcANYSTRStr:miXCAse or LowER in 0), product of:
  1.0 = tf(termFreq(srcANYSTRStr:miXCAse or LowER)=1)
  9.250228 = idf(docFreq=1, maxDocs=7657)
  1.0 = fieldNorm(field=srcANYSTRStr, doc=0)
      </str>
    </lst>
    <str name="QParser">LuceneQParser</str>
    <lst name="timing">
      <double name="time">16.0</double>
      <lst name="prepare">
        <double name="time">0.0</double>
        <lst name="org.apache.solr.handler.component.QueryComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.FacetComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.MoreLikeThisComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.HighlightComponent"><double name="time">0.0</double></lst>
        <lst name="org.apache.solr.handler.component.StatsComponent"><double name="time">0.0</double></lst>
RE: case-insensitive string type
That seems to work. But why? Does the string type not support LowerCaseFilterFactory? Or KeywordTokenizerFactory?

-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Wednesday, January 13, 2010 11:51 AM
To: solr-user@lucene.apache.org
Subject: RE: case-insensitive string type

> The value in the srcANYSTRStrCI field is miXCAse or LowER according to Luke.

Can you try this fieldType declaration (which uses class="solr.TextField"), then restart Tomcat and re-index:

  <fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <filter class="solr.TrimFilterFactory"/>
    </analyzer>
  </fieldType>
RE: case-insensitive string type
Thanks. I know I read that some time back, but I guess I thought it was because there were no analyzer tags defined on the string field in the schema. Since I'm still kind of a noob, I didn't take that to mean it couldn't be made to have analyzers; a subtle but important distinction, I guess.

So my concern now is this: my use case needs a field that behaves like string (case-sensitive) and a case-insensitive version of the same. Is it the case that solr.StrField, and solr.TextField with KeywordTokenizerFactory and LowerCaseFilterFactory, differ only in their treatment of character case?

-----Original Message-----
From: Ahmet Arslan [mailto:iori...@yahoo.com]
Sent: Wednesday, January 13, 2010 12:18 PM
To: solr-user@lucene.apache.org
Subject: RE: case-insensitive string type

> That seems to work. But why? Does string type not support LowerCaseFilterFactory? Or KeywordTokenizerFactory?

From apache-solr-1.4.0\example\solr\conf\schema.xml: "The StrField type is not analyzed, but indexed/stored verbatim." solr.TextField allows the specification of custom text analyzers, specified as a tokenizer and a list of token filters.
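Putting the pieces of this thread together, a minimal schema sketch of the dual-field pattern being discussed (the field names "title" and "title_ci" are hypothetical, not from the thread) might look like this; the TextField-based type is used for the case-insensitive copy so that its analyzer chain actually runs:

```xml
<!-- Hypothetical field names: a verbatim string field plus a lowercased copy. -->
<field name="title"    type="string"    indexed="true" stored="true"/>
<field name="title_ci" type="string_ci" indexed="true" stored="false"/>
<copyField source="title" dest="title_ci"/>

<!-- string_ci per Ahmet's suggestion: solr.TextField, so analyzers apply -->
<fieldType name="string_ci" class="solr.TextField" sortMissingLast="true" omitNorms="true">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

copyField copies the raw source value before analysis, so the destination field's own analyzer chain still performs the lowercasing at index time.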
Re: Multi language support
Robert Muir: Thank you for the pointer to that paper!

On Wed, Jan 13, 2010 at 6:29 AM, Paul Libbrecht p...@activemath.org wrote:

Isn't the conclusion here that stopword- and stemming-free matching should be the best match, if ever, and to then gently degrade to weaker forms of matching?

paul

On 13 Jan 2010, at 07:08, Walter Underwood wrote:

There is a band named The The. And a producer named Don Was. For a list of all-stopword movie titles at Netflix, see this post: http://wunderwood.org/most_casual_observer/2007/05/invisible_titles.html My favorite is To Be and To Have (Être et Avoir), which is all stopwords in two languages. And a very good movie.

wunder

On Jan 12, 2010, at 6:55 PM, Robert Muir wrote:

sorry, i forgot to include this 2009 paper comparing what stopwords do across 3 languages: http://doc.rero.ch/lm.php?url=1000,43,4,20091218142456-GY/Dolamic_Ljiljana_-_When_Stopword_Lists_Make_the_Difference_20091218.pdf

in my opinion, if stopwords annoy your users for very special cases like 'the the', then instead consider using CommonGrams + DefaultSimilarity.discountOverlaps = true so that you still get the benefits. as you can see from the above paper, they can be extremely important depending on the language; they just don't matter so much for English.

On Tue, Jan 12, 2010 at 9:20 PM, Lance Norskog goks...@gmail.com wrote:

There are a lot of projects that don't use stopwords any more. You might consider dropping them altogether.

On Mon, Jan 11, 2010 at 2:25 PM, Don Werve d...@madwombat.com wrote:

This is the way I've implemented multilingual search as well.

2010/1/11 Markus Jelsma mar...@buyways.nl

Hello, we have implemented language-specific search in Solr using language-specific fields and field types. For instance, an en_text field type can use an English stemmer and lists of stopwords and synonyms. We, however, did not use specific stopwords; instead we used one list shared by both languages.
So you would have a field type like:

  <fieldType name="en_text" class="solr.TextField" ...>
    <analyzer type="...">
      <filter class="solr.StopFilterFactory" words="stopwords.en.txt"/>
      <filter class="solr.SynonymFilterFactory" synonyms="synonyms.en.txt"/>
      etc. etc.

Cheers,

- Markus Jelsma, Buyways B.V., Technisch Architect
Friesestraatweg 215c, 9743 AD Groningen, http://www.buyways.nl
Alg. 050-853 6600, KvK 01074105, Tel. 050-853 6620, Fax. 050-3118124, Mob. 06-5025 8350
In: http://www.linkedin.com/in/markus17

On Mon, 2010-01-11 at 13:45 +0100, Daniel Persson wrote:

Hi Solr users. I'm trying to set up a site with Solr search integrated, and I use the SolrJava API to feed the index with search documents. At the moment I have only activated search on the English portion of the site. I'm interested in using as many features of Solr as possible: synonyms, stopwords and stems all sound quite interesting and useful, but how do I set this up in a good way for a multilingual site? The site doesn't have a huge text mass, so performance issues don't really bother me, but still I'd like to hear your suggestions before I try to implement a solution.

Best regards

Daniel

--
Lance Norskog goks...@gmail.com

--
Robert Muir rcm...@gmail.com

--
Lance Norskog goks...@gmail.com
Re: copyField with Analyzer?
You can do this filtering in the DataImportHandler. The regular expression tool is probably enough: http://wiki.apache.org/solr/DataImportHandler#RegexTransformer

On Wed, Jan 13, 2010 at 8:57 AM, Harsch, Timothy J. (ARC-TI)[PEROT SYSTEMS] timothy.j.har...@nasa.gov wrote:

Hi all, I tried creating a case-insensitive string using the values provided to a string field, via copyField. This didn't work, since copyField does its job before the analyzer on the case-insensitive string field is invoked. Is there another way I might accomplish this field replication on the server?

Tim Harsch
Sr. Software Engineer
Dell Perot Systems

--
Lance Norskog goks...@gmail.com
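One caveat on Lance's pointer: a regex alone cannot change character case, so for the specific lowercased-copy use case a DIH ScriptTransformer is one option. The sketch below is untested and all entity, column, and function names are hypothetical:

```xml
<!-- Sketch only: DIH ScriptTransformer producing a lowercased copy of a
     column at import time. Names here are hypothetical. -->
<dataConfig>
  <script><![CDATA[
    function addLowercase(row) {
      var v = row.get('title');
      if (v != null) row.put('title_ci', v.toLowerCase());
      return row;
    }
  ]]></script>
  <document>
    <entity name="doc" transformer="script:addLowercase" ...>
      <field column="title"/>
    </entity>
  </document>
</dataConfig>
```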
Re: Interesting OutOfMemoryError on a 170M index
The time in autocommit is in milliseconds. You are committing every second while indexing. This then causes a build-up of successive index readers that absorb each commit, which is probably the cause of the out-of-memory.

On Wed, Jan 13, 2010 at 10:36 AM, Minutello, Nick nick.minute...@credit-suisse.com wrote:

Hi, I have a bit of an interesting OutOfMemoryError that I'm trying to figure out. My client and Solr server are running in the same JVM (for deployment simplicity). FWIW, I'm using Jetty to host Solr, and I'm using the supplied code for the HTTP-based client interface. Solr 1.3.0.

My app is adding about 20,000 documents per minute to the index, one at a time (it is listening to an event stream and, for every event, it adds a new document to the index). The size of the documents, however, is tiny; the total index growth is only about 170M (after about 1 hr and the OutOfMemoryError). At this point there is zero querying happening, just updates to the index (only adding documents, no updates or deletes).

After about an hour or so, my JVM runs out of heap space, and if I look at the memory utilisation over time, it looks like a classic memory leak: it slowly ramps up until we end up with constant full GCs and an eventual OOME. Max heap space is 512M.

In Solr, I'm using autocommit (to buffer the updates):

  <autoCommit>
    <maxDocs>1</maxDocs>
    <maxTime>1000</maxTime>
  </autoCommit>

(Aside: I'm not sure if I am meant to call commit or not on the client SolrServer class if I am using autocommit, but as it turns out, I get an OOME whether I do that or not.)

Any suggestions/advice of quick things to check before I dust off the profiler? Thanks in advance.

Cheers,
Nick

===
Please access the attached hyperlink for an important electronic communications disclaimer: http://www.credit-suisse.com/legal/en/disclaimer_email_ib.html
===

--
Lance Norskog goks...@gmail.com
RE: Interesting OutOfMemoryError on a 170M index
Agreed, commit every second. Assuming I understand what you're saying correctly: there shouldn't be any index readers, as at this point we are just writing to the index. Did I understand correctly what you meant?

-Nick

-----Original Message-----
From: Lance Norskog [mailto:goks...@gmail.com]
Sent: 13 January 2010 22:28
To: solr-user@lucene.apache.org
Subject: Re: Interesting OutOfMemoryError on a 170M index

The time in autocommit is in milliseconds. You are committing every second while indexing. This then causes a build-up of successive index readers that absorb each commit, which is probably the out-of-memory.
Need deployment strategy
Hi all, the way indexing works on our system is as follows. We have a separate staging server with a copy of our web app. The clients index a number of documents in a batch on the staging server (this happens about once a week), then they play with the results on the staging server for a day until satisfied. Only then do they give the OK to deploy.

What I've been doing when they want to deploy is the following: 1) merge and optimize the index on the staging server, 2) copy it to the production server, 3) stop Solr on production, 4) copy the new index on top of the old one, 5) start Solr on production.

This works, but has the following disadvantages:

1) The index is getting bigger, so it takes longer to zip it and transfer it.
2) The user only added a few records, yet we copy over all of them. If a bug causes an unrelated document to get deleted or replaced on staging, we wouldn't notice, and we'd propagate the problem to production. I'd sleep better if I were only moving the records that were new or changed, leaving the records that already work in place.
3) Solr is down on production for about 5 minutes, so users during that time get errors.

I was looking for some kind of replication strategy where I can run a task on the production server to tell it to merge a core from the staging server. Is that possible? I can open up port 8983 on the staging server only to the production server, but then what do I do on production to get the core?

Thanks, Paul
RE: Problem comitting on 40GB index
Hi! Garbage collection is an issue of the underlying JVM. You may use -XX:+PrintGCDetails as an argument to your JVM in order to collect details of the garbage collection. If you also use the parameter -XX:+PrintGCTimeStamps, you get the time stamps of the garbage collections. For further information you may want to refer to the paper http://java.sun.com/j2se/reference/whitepapers/memorymanagement_whitepaper.pdf which points you to a few other utilities related to GC.

Best, Sven Maurmann

--On Wednesday, 13 January 2010 18:03 +0000, Frederico Azeiteiro frederico.azeite...@cision.com wrote:

The hanging hasn't happened again since yesterday, and I never ran out of space again. This is still a dev environment, so the number of searches is very low. Maybe I'm just lucky... Where can I see the garbage collection info?

-----Original Message-----
From: Marc Des Garets [mailto:marc.desgar...@192.com]
Sent: Wednesday, 13 January 2010 17:20
To: solr-user@lucene.apache.org
Subject: RE: Problem comitting on 40GB index

Just curious: have you checked whether the hanging you are experiencing is garbage-collection related?

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 13 January 2010 13:33
To: solr-user@lucene.apache.org
Subject: Re: Problem comitting on 40GB index

That's my understanding. But fortunately disk space is cheap <G>

On Wed, Jan 13, 2010 at 5:01 AM, Frederico Azeiteiro frederico.azeite...@cision.com wrote:

Sorry, my bad... I replied to a current mailing-list message, only changing the subject... I didn't know about this hijacking problem; it will not happen again. Just to close this issue: if I understand correctly, for an index of 40G, to run an optimize I will need: 40G if all activity on the index is stopped; 80G if the index is being searched; 120G if the index is being searched and a commit is performed. Is this correct? Thanks.
Frederico

--
This transmission is strictly confidential, possibly legally privileged, and intended solely for the addressee. Any views or opinions expressed within it are those of the author and do not necessarily represent those of 192.com, i-CD Publishing (UK) Ltd or any of its subsidiary companies. If you are not the intended recipient then you must not disclose, copy or take any action in reliance on this transmission. If you have received this transmission in error, please notify the sender as soon as possible. No employee or agent is authorised to conclude any binding agreement on behalf of i-CD Publishing (UK) Ltd with another party by email without express written confirmation by an authorised employee of the Company. http://www.192.com (Tel: 08000 192 192). i-CD Publishing (UK) Ltd is incorporated in England and Wales, company number 3148549, VAT No. GB 673128728.
Re: Question
Bill,

If you are using Solr 1.4, don't bother with rsync; use the Java-based replication. Info on zee Wiki.

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch

From: Bill Bell bb...@kaango.com
To: solr-user@lucene.apache.org
Sent: Wed, January 13, 2010 12:21:44 PM
Subject: Re: Question

On Wed, Jan 13, 2010 at 10:17 AM, Bill Bell bb...@kaango.com wrote:

I am using Solr 1.4 and have 3 cores defined in solr.xml. Questions on replication:

1. How do I set up rsync replication from master to slaves? It was easy to do with just one core and one script.conf, but with multiple cores what is the easiest way?
2. I got the system to work by changing the snappuller to pass in a -c script.conf, but there has got to be an easier way?
3. On the master I have 3 rsync daemons running. Is it possible to do it with one? The script.conf really needs multiple data_dir settings...

--
Bill Bell
Vice President of Technology
bb...@kaango.com
mobile 720.256.8076
Kaango, LLC - www.kaango.com
Re: Queries of type field:value not functioning
Hi,

Pointers:
* What happens when you don't use a field name?
* What are your logs showing?
* What is debugQuery=on showing?
* What is the Analysis page for some of the problematic queries showing?

Otis
--
Sematext -- http://sematext.com/ -- Solr - Lucene - Nutch

From: Siddhant Goel siddhantg...@gmail.com
To: solr-user@lucene.apache.org
Sent: Wed, January 13, 2010 5:38:53 AM
Subject: Queries of type field:value not functioning

Hi all, any query I make of the type field:value does not return any documents. The same is the case for the *:* query, which doesn't return any results either. The index size is close to 1GB now, so it should be returning some documents. The rest of the queries are functioning properly. Any help?

Thanks,
--
- Siddhant
Re: Interesting OutOfMemoryError on a 170M index
On Jan 13, 2010, at 5:34 PM, Minutello, Nick wrote:

> Agreed, commit every second.

Do you need the index to be updated this often? Are you reading from it every second and needing results that fresh? If not, I imagine increasing the auto-commit time to 1 min or even 10 secs would help some.

Re: calling commit from the client with auto-commit: if you are using auto-commit, you should not call commit from the client.

ryan
RE: Interesting OutOfMemoryError on a 170M index
> if you are using auto-commit, you should not call commit from the client

Cheers, thanks.

> Do you need the index to be updated this often?

Wouldn't increasing the autocommit time make it worse (i.e. more documents buffered)? I can extend it and see what effect it has.

-Nick

-----Original Message-----
From: Ryan McKinley [mailto:ryan...@gmail.com]
Sent: 13 January 2010 23:16
To: solr-user@lucene.apache.org
Subject: Re: Interesting OutOfMemoryError on a 170M index
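For reference, a sketch of a less aggressive autoCommit block along the lines Ryan suggests in this thread; the specific values below are illustrative assumptions, not figures from the discussion:

```xml
<!-- Illustrative values: commit at most every 10 s (maxTime is in
     milliseconds) or after 10,000 buffered docs, whichever comes first,
     instead of after every single document. -->
<autoCommit>
  <maxDocs>10000</maxDocs>
  <maxTime>10000</maxTime>
</autoCommit>
```

Fewer commits means fewer short-lived index readers opened in quick succession, which is the build-up Lance identifies as the likely source of the OOME.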
Re: How to display Highlight with VelocityResponseWriter?
Hi Qiuyan, Thanks a lot. It works now. When I added the line #set($hl = $response.highlighting) I got the highlighting. But I wonder if there's any document that describes the usage of that. I mean, I didn't know the name of those methods. Actually I just managed to guess it. Solritas (aka VelocityResponseWriter) binds a number of objects into a so-called VelocityContext (consult [1] for a complete list). You can think of it as a map that allows you to access objects by symbolic names, e.g., an instance of QueryResponse is stored under response (that's why you write $response in your template). Since $response is an instance of QueryResponse you can call all methods on it the API [2] provides. Furthermore, Velocity incorporates a JavaBean-like introspection mechanism that lets you write $response.highlighting instead of $response.getHighlighting() (only a bit of syntactic sugar). -Sascha [1] http://wiki.apache.org/solr/VelocityResponseWriter#line-93 [2] http://lucene.apache.org/solr/api/org/apache/solr/client/solrj/response/QueryResponse.html Quoting Sascha Szott sz...@zib.de: Qiuyan, I'd like highlighting to also be displayed in the web gui. I've added <bool name="hl">true</bool> into the standard requestHandler and it already works, i.e. without velocity. But the same line doesn't take effect in /itas. Should I configure anything else? Thanks in advance. First of all, just a few notes on the /itas request handler in your solrconfig.xml: 1. The entry <arr name="components"> <str>highlight</str> </arr> is obsolete, since the highlighting component is a default search component [1]. 2. Note that since you didn't specify a value for hl.fl, highlighting will only affect the fields listed inside of qf. 3. Why did you override the default value of hl.fragmenter? In most cases the default fragmenting algorithm (gap) works fine - and maybe in yours as well?
To make sure all your hl-related settings are correct, can you post the XML output (change the wt parameter to xml) for a search with highlighted results? And finally, can you post the VTL code snippet that should produce the highlighted output? -Sascha [1] http://wiki.apache.org/solr/SearchComponent
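Pulling Sascha's three notes together, a hedged sketch of an /itas handler with highlighting enabled might look like the following (the field names in hl.fl and qf are hypothetical placeholders, not from the thread):

```xml
<requestHandler name="/itas" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="wt">velocity</str>
    <bool name="hl">true</bool>
    <!-- hl.fl is needed if the highlighted fields differ from qf;
         field names here are illustrative -->
    <str name="hl.fl">title,description</str>
    <str name="qf">title,description</str>
  </lst>
  <!-- no explicit "components" entry: highlighting is a default
       search component, as noted above -->
</requestHandler>
```

In the Velocity template, $response.highlighting then exposes the per-document snippets, as described in the reply.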
RE: Interesting OutOfMemoryError on a 170M index
Hm, Ryan, you may have inadvertently solved the problem. :) Going flat out in a loop, indexing 1 doc at a time, I can only index about 17,000 per minute - roughly what I was seeing with my app running... which makes me suspicious. The number is too close to be coincidental. It could very well be that I may be getting many more than 17,000 updates per minute - and because I can't index them fast enough, the event queue in the underlying library (that is providing me the events) may be growing without bound... So, looks like I have to increase the throughput with the indexing. (Indexing 1 at a time is far from ideal - even with the buffering.) I may have to either implement some client-side buffering to make it more efficient - or eliminate the http layer (go embedded). Thanks. -Nick -Original Message- From: Minutello, Nick Sent: 13 January 2010 23:29 To: solr-user@lucene.apache.org Subject: RE: Interesting OutOfMemoryError on a 170M index if you are using auto-commit, you should not call commit from the client Cheers, thanks. Do you need the index to be updated this often? Wouldn't increasing the autocommit time make it worse? (i.e. more documents buffered) I can extend it and see what effect it has -Nick -Original Message- From: Ryan McKinley [mailto:ryan...@gmail.com] Sent: 13 January 2010 23:16 To: solr-user@lucene.apache.org Subject: Re: Interesting OutOfMemoryError on a 170M index On Jan 13, 2010, at 5:34 PM, Minutello, Nick wrote: Agreed, commit every second. Do you need the index to be updated this often? Are you reading from it every second and need results that are that fresh? If not, I imagine increasing the auto-commit time to 1 min or even 10 secs would help some. Re, calling commit from the client with auto-commit... if you are using auto-commit, you should not call commit from the client ryan Assuming I understand what you're saying correctly: There shouldn't be any index readers - as at this point, just writing to the index.
Did I understand correctly what you meant? -Nick -Original Message- From: Lance Norskog [mailto:goks...@gmail.com] Sent: 13 January 2010 22:28 To: solr-user@lucene.apache.org Subject: Re: Interesting OutOfMemoryError on a 170M index The time in autocommit is in milliseconds. You are committing every second while indexing. This then causes a build-up of successive index readers that absorb each commit, which is probably the out-of-memory. On Wed, Jan 13, 2010 at 10:36 AM, Minutello, Nick nick.minute...@credit-suisse.com wrote: Hi, I have a bit of an interesting OutOfMemoryError that I'm trying to figure out. My client and Solr server are running in the same JVM (for deployment simplicity). FWIW, I'm using Jetty to host Solr. I'm using the supplied code for the http-based client interface. Solr 1.3.0. My app is adding about 20,000 documents per minute to the index - one at a time (it is listening to an event stream and for every event, it adds a new document to the index). The size of the documents, however, is tiny - the total index growth is only about 170M (after about 1 hr and the OutOfMemoryError). At this point, there is zero querying happening - just updates to the index (only adding documents, no updates or deletes). After about an hour or so, my JVM runs out of heap space - and if I look at the memory utilisation over time, it looks like a classic memory leak. It slowly ramps up until we end up with constant full GCs and eventual OOME. Max heap space is 512M. In Solr, I'm using autocommit (to buffer the updates): <autoCommit> <maxDocs>1</maxDocs> <maxTime>1000</maxTime> </autoCommit> (Aside: Now, I'm not sure if I am meant to call commit or not on the client SolrServer class if I am using autocommit - but as it turns out, I get OOME whether I do that or not.) Any suggestions/advice of quick things to check before I dust off the profiler? Thanks in advance.
Cheers, Nick -- Lance Norskog goks...@gmail.com
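Nick's client-side buffering idea can be sketched language-agnostically. The snippet below is illustrative Python (not SolrJ or the Solr PHP client); flush_fn stands in for a real bulk-add call such as sending a batch of documents in one HTTP request, which amortizes per-request overhead versus adding one doc at a time:

```python
class BufferedIndexer:
    """Accumulate documents client-side and hand them off in batches."""

    def __init__(self, flush_fn, batch_size=100):
        self.flush_fn = flush_fn      # stand-in for a bulk add() call
        self.batch_size = batch_size
        self.buffer = []

    def add(self, doc):
        """Buffer one document; flush automatically when the batch fills."""
        self.buffer.append(doc)
        if len(self.buffer) >= self.batch_size:
            self.flush()

    def flush(self):
        """Send whatever is buffered; a no-op when the buffer is empty."""
        if self.buffer:
            self.flush_fn(self.buffer)
            self.buffer = []
```

A caller would invoke flush() once more at shutdown to drain any partial batch. This is the pattern Nick describes; the real mailing-list fix would wire flush_fn to the Solr client's multi-document add.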
Re: Encountering a roadblock with my Solr schema design...use dedupe?
: Dedupe is completely the wrong word. Deduping is something else : entirely - it is about trying not to index the same document twice. Dedup can also certainly be used with field collapsing -- that was one of the initial use cases identified for the SignatureUpdateProcessorFactory ... you can compute an 'expensive' signature when adding a document, index it, and then FieldCollapse on that signature field. This gives you query-time deduplication based on a value computed when indexing (the canonical example is multiple URLs referencing the same content but with slightly different boilerplate markup). You can use a Signature class that recognizes the boilerplate and computes an identical signature value for each URL whose content is the same but still index all of the URLs and their content as distinct documents ... so use cases where people only want distinct URLs work using field collapse, but by default all matching documents can still be returned, and searches on text in the boilerplate markup also still work. -Hoss
Re: Encountering a roadblock with my Solr schema design...use dedupe?
Hoss, Would you suggest using dedup for my use case; and if so, do you know of a working example I can reference? I don't have an issue using the patched version of Solr, but I'd much rather use the GA version. -Kelly hossman wrote: : Dedupe is completely the wrong word. Deduping is something else : entirely - it is about trying not to index the same document twice. Dedup can also certainly be used with field collapsing -- that was one of the initial use cases identified for the SignatureUpdateProcessorFactory ... you can compute an 'expensive' signature when adding a document, index it, and then FieldCollapse on that signature field. This gives you query-time deduplication based on a value computed when indexing (the canonical example is multiple URLs referencing the same content but with slightly different boilerplate markup). You can use a Signature class that recognizes the boilerplate and computes an identical signature value for each URL whose content is the same but still index all of the URLs and their content as distinct documents ... so use cases where people only want distinct URLs work using field collapse, but by default all matching documents can still be returned, and searches on text in the boilerplate markup also still work. -Hoss -- View this message in context: http://old.nabble.com/Encountering-a-roadblock-with-my-Solr-schema-design...use-dedupe--tp27118977p27155115.html Sent from the Solr - User mailing list archive at Nabble.com.
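The signature-plus-collapse setup Hoss describes is configured through an update processor chain in solrconfig.xml. A sketch along the lines of the Solr wiki's deduplication example (the signatureField name and the fields list are illustrative; with overwriteDupes=false every document is kept, and the signature field is what you collapse on at query time):

```xml
<updateRequestProcessorChain name="dedupe">
  <processor class="solr.processor.SignatureUpdateProcessorFactory">
    <bool name="enabled">true</bool>
    <str name="signatureField">signature</str>
    <!-- false = keep all docs; collapse on "signature" at query time -->
    <bool name="overwriteDupes">false</bool>
    <str name="fields">url,content</str>
    <!-- TextProfileSignature tolerates near-duplicate text such as
         boilerplate markup differences -->
    <str name="signatureClass">solr.processor.TextProfileSignature</str>
  </processor>
  <processor class="solr.LogUpdateProcessorFactory" />
  <processor class="solr.RunUpdateProcessorFactory" />
</updateRequestProcessorChain>
```

The signature field itself would be declared in schema.xml as an indexed string field.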
Re: What is this error means?
Hi Israel Thank you for your response. However, I used both ini_set and set the _defaultTimeout to 6000, but the error still occurs with the same error message. Now, when I start building the index, the error pops up much faster than before the change. So do you have any idea? Thank you in advance for your help. Israel Ekpo wrote: Ellery, A preliminary look at the source code indicates that the error is happening because the solr server is taking longer than expected to respond to the client http://code.google.com/p/solr-php-client/source/browse/trunk/Apache/Solr/Service.php The default time out handed down to Apache_Solr_Service::_sendRawPost() is 60 seconds since you were calling the addDocument() method So if it took longer than that (1 minute), then it will exit with that error message. You will have to increase the default value to something very high like 10 minutes or so on line 252 in the source code since there is no way to specify that in the constructor or the addDocument method. Another alternative will be to update the default_socket_timeout in the php.ini file or in the code using ini_set I hope that helps On Tue, Jan 12, 2010 at 9:33 PM, Ellery Leung elleryle...@be-o.com wrote: Hi, here is the stack trace: Fatal error: Uncaught exception 'Exception' with message '"0" Status: Communication Error' in C:\nginx\html\lib\SolrPhpClient\Apache\Solr\Service.php:385 Stack trace: #0 C:\nginx\html\lib\SolrPhpClient\Apache\Solr\Service.php(652): Apache_Solr_Service->_sendRawPost('http://127.0.0', '<add allowDups=...') #1 C:\nginx\html\lib\SolrPhpClient\Apache\Solr\Service.php(676): Apache_Solr_Service->add('<add allowDups=...') #2 C:\nginx\html\apps\milio\lib\System\classes\SolrSearchEngine.class.php(221): Apache_Solr_Service->addDocument(Object(Apache_Solr_Document)) #3 C:\nginx\html\apps\milio\lib\System\classes\SolrSearchEngine.class.php(262): SolrSearchEngine->buildIndex(Array, 'key') #4
C:\nginx\html\apps\milio\lib\System\classes\Indexer\Indexer.class.php(51): SolrSearchEngine->createFullIndex('contacts', Array, 'key', 'www') #5 C:\nginx\html\apps\milio\lib\System\functions\createIndex.php(64): Indexer->create('www') #6 {main} thrown in C:\nginx\html\lib\SolrPhpClient\Apache\Solr\Service.php on line 385 C:\nginx\html\apps\milio\htdocs\Contacts>pause Press any key to continue . . . Thanks for helping me. Grant Ingersoll-6 wrote: Do you have a stack trace? On Jan 12, 2010, at 2:54 AM, Ellery Leung wrote: When I am building the index for around 2 ~ 25000 records, sometimes I came across this error: Uncaught exception Exception with message '0' Status: Communication Error I searched Google and Yahoo but found no answer. I am now committing documents to solr every 10 records fetched from a SQLite database with PHP 5.3. Platform: Windows 7 Home Web server: Nginx Solr Specification Version: 1.4.0 Solr Implementation Version: 1.4.0 833479 - grantingersoll - 2009-11-06 12:33:40 Lucene Specification Version: 2.9.1 Lucene Implementation Version: 2.9.1 832363 - 2009-11-03 04:37:25 Solr hosted in jetty 6.1.3 All the above are in one single test machine. The situation is that sometimes when I build the index, it can be created successfully. But sometimes it will just stop with the above error. Any clue? Please help. Thank you in advance. -- View this message in context: http://old.nabble.com/What-is-this-error-means--tp27123815p27138658.html Sent from the Solr - User mailing list archive at Nabble.com. -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once. http://www.israelekpo.com/ -- View this message in context: http://old.nabble.com/What-is-this-error-means--tp27123815p27155487.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Reverse sort facet query [SOLR-1672]
: i.e. just extend facet.sort to allow a 'count desc'. By convention, ok : to use a space in the name? - or would count.desc (and count.asc as : alias for count) be more compliant? I would use a space to remain consistent with the existing sort param. It might even make sense to refactor and (re/ab)use the existing sort parsing code in QueryParsing.parseSort ... but now that that also knows about parsing functions it's a bit hairy ... so that does seem a little crazy. : Peter -Hoss
Re: What is this error means?
Here is a workaround for this issue: On line 382 of SolrPhpClient/Apache/Solr/Service.php, I changed the code to: while(true){ $str = file_get_contents($url, false, $this->_postContext); if(empty($str) == false){ break; } } $response = new Apache_Solr_Response($str, $http_response_header, $this->_createDocuments, $this->_collapseSingleValueArrays); I found that, for some strange reason on Windows, when you post some data to add to the index, Solr may not receive it. Therefore I added an infinite loop: if it does not receive any response ($str is empty), we post it again. Side effect: when I open the Windows console to watch it, sometimes it will prompt: Failed to open stream: HTTP request failed! I haven't researched it yet, but the index is built successfully. Hope it helps someone. Ellery Leung wrote: Hi Israel Thank you for your response. However, I used both ini_set and set the _defaultTimeout to 6000, but the error still occurs with the same error message. Now, when I start building the index, the error pops up much faster than before the change. So do you have any idea? Thank you in advance for your help. Israel Ekpo wrote: Ellery, A preliminary look at the source code indicates that the error is happening because the solr server is taking longer than expected to respond to the client http://code.google.com/p/solr-php-client/source/browse/trunk/Apache/Solr/Service.php The default time out handed down to Apache_Solr_Service::_sendRawPost() is 60 seconds since you were calling the addDocument() method So if it took longer than that (1 minute), then it will exit with that error message. You will have to increase the default value to something very high like 10 minutes or so on line 252 in the source code since there is no way to specify that in the constructor or the addDocument method.
Another alternative will be to update the default_socket_timeout in the php.ini file or in the code using ini_set I hope that helps On Tue, Jan 12, 2010 at 9:33 PM, Ellery Leung elleryle...@be-o.com wrote: Hi, here is the stack trace: Fatal error: Uncaught exception 'Exception' with message '"0" Status: Communication Error' in C:\nginx\html\lib\SolrPhpClient\Apache\Solr\Service.php:385 Stack trace: #0 C:\nginx\html\lib\SolrPhpClient\Apache\Solr\Service.php(652): Apache_Solr_Service->_sendRawPost('http://127.0.0', '<add allowDups=...') #1 C:\nginx\html\lib\SolrPhpClient\Apache\Solr\Service.php(676): Apache_Solr_Service->add('<add allowDups=...') #2 C:\nginx\html\apps\milio\lib\System\classes\SolrSearchEngine.class.php(221): Apache_Solr_Service->addDocument(Object(Apache_Solr_Document)) #3 C:\nginx\html\apps\milio\lib\System\classes\SolrSearchEngine.class.php(262): SolrSearchEngine->buildIndex(Array, 'key') #4 C:\nginx\html\apps\milio\lib\System\classes\Indexer\Indexer.class.php(51): SolrSearchEngine->createFullIndex('contacts', Array, 'key', 'www') #5 C:\nginx\html\apps\milio\lib\System\functions\createIndex.php(64): Indexer->create('www') #6 {main} thrown in C:\nginx\html\lib\SolrPhpClient\Apache\Solr\Service.php on line 385 C:\nginx\html\apps\milio\htdocs\Contacts>pause Press any key to continue . . . Thanks for helping me. Grant Ingersoll-6 wrote: Do you have a stack trace? On Jan 12, 2010, at 2:54 AM, Ellery Leung wrote: When I am building the index for around 2 ~ 25000 records, sometimes I came across this error: Uncaught exception Exception with message '0' Status: Communication Error I searched Google and Yahoo but found no answer. I am now committing documents to solr every 10 records fetched from a SQLite database with PHP 5.3.
Platform: Windows 7 Home Web server: Nginx Solr Specification Version: 1.4.0 Solr Implementation Version: 1.4.0 833479 - grantingersoll - 2009-11-06 12:33:40 Lucene Specification Version: 2.9.1 Lucene Implementation Version: 2.9.1 832363 - 2009-11-03 04:37:25 Solr hosted in jetty 6.1.3 All the above are in one single test machine. The situation is that sometimes when I build the index, it can be created successfully. But sometimes it will just stop with the above error. Any clue? Please help. Thank you in advance. -- View this message in context: http://old.nabble.com/What-is-this-error-means--tp27123815p27156058.html Sent from the Solr - User mailing list archive at Nabble.com.
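The unbounded while(true) retry in the workaround above can hang forever if Solr never answers and will hammer the server in a tight loop. A bounded-retry-with-backoff sketch (illustrative Python, not the PHP client; send() here models the raw post, returning an empty string on failure like the empty $str case):

```python
import time

def post_with_retry(send, payload, max_attempts=5, base_delay=0.5):
    """Call send(payload) up to max_attempts times with exponential backoff.

    send() is a stand-in for the raw HTTP post: it returns a non-empty
    response body on success and an empty string on failure.
    """
    for attempt in range(max_attempts):
        response = send(payload)
        if response:
            return response
        if attempt < max_attempts - 1:
            # back off 0.5s, 1s, 2s, ... before retrying
            time.sleep(base_delay * (2 ** attempt))
    raise RuntimeError("no response after %d attempts" % max_attempts)
```

Raising an error after a bounded number of attempts surfaces a genuinely down server instead of masking it, which the infinite loop in the PHP workaround would not.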
Re: Queries of type field:value not functioning
Hi, Thanks for the responses. q.alt did the job. Turns out that the dismax query parser was at fault, and wasn't able to handle queries of the type *:*. Putting the query in q.alt, or adding a defType=lucene (as pointed out to me on the irc channel) worked. Thanks, -- - Siddhant