Re: Converting German special characters / umlaute
On Thu, 2007-09-27 at 13:26 -0400, J.J. Larrea wrote:

At 12:13 PM -0400 9/27/07, Steven Rowe wrote: Chris Hostetter wrote: ...

As for implementation, the first part could easily and flexibly be accomplished with the current PatternReplaceFilter, and I'm thinking the second could be done with an extension to that, or better yet a new Filter which allows parsing synonymous tokens from a flat to an overlaid format, e.g. something on the order of:

    <filter class="solr.PatternReplaceFilterFactory"
            pattern="(.*)(ü|ue)(.*)"
            replacement="$1ue$3|$1u$3"
            tokensep="|"
            replace="first"/>   <!-- tokensep not currently implemented -->

or perhaps better:

    <filter class="solr.PatternReplaceFilterFactory"
            pattern="(.*)(ü|ue)(.*)"
            replacement="$1ue$3|$1u$3"
            replace="first"/>
    <filter class="solr.OverlayTokenFilterFactory" tokensep="|"/>   <!-- not currently implemented -->

which in my fantasy implementation would map:

    Müller  -> Mueller|Muller
    Mueller -> Mueller|Muller
    Muller  -> Muller

and could be run at index-time and/or query-time as appropriate. Does anyone know if there are other (Latin-1-utilizing) languages besides German with standardized diacritic substitutions that involve something other than just stripping the diacritics?

- J.J.

I'm curious about this too. I am German, but working in Spain, so I have not faced the problem so far. Anyhow, IMO

    Müller  -> Mueller
    Mueller -> Mueller

is right; to shorten the word further does not seem right, since one would be changing the meaning too much. Further:

    groß  -> gross
    gross -> gross

ß is pronounced 'sz' but is only replaced by 'ss'.

salu2
--
Thorsten Scherler  thorsten.at.apache.org
Open Source Java consulting, training and solutions
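For readers who want to experiment outside Solr, the substitution table discussed in this thread can be sketched as a plain Java helper. This is a hypothetical class, not a Solr filter; it performs only the "umlaut to two-letter" folding, not the overlay step that would emit both variants:

```java
// Hypothetical helper (not part of Solr): applies the standardized German
// substitutions discussed above, rather than just stripping the diacritics.
public class UmlautFolder {
    public static String fold(String s) {
        return s.replace("ä", "ae").replace("ö", "oe").replace("ü", "ue")
                .replace("Ä", "Ae").replace("Ö", "Oe").replace("Ü", "Ue")
                .replace("ß", "ss");
    }
}
```

So fold("Müller") yields "Mueller" and fold("groß") yields "gross"; producing the overlaid "Mueller|Muller" form at the same token position would still require a real token filter.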
Color search
Hi, We're running an e-commerce site that provides product search. We've been able to extract colors from product images, and we think it'd be cool and useful to search products by color. A product image can have up to 5 colors (from a color space of about 100 colors), so we can implement it easily with Solr's facet search (thanks all who've developed Solr). The problem arises when we try to sort the results by the color relevancy. What's different from a normal facet search is that colors are weighted. For example, a black dress can have 70% of black, 20% of gray, 10% of brown. A search query color:black should return results in which the black dress ranks higher than other products with less percentage of black. My question is: how to configure and index the color field so that products with higher percentage of color X ranks higher for query color:X? Thanks for your help! - Guangwei
Indexing without application server
Hi, I have multiple millions of documents to index and am looking for a way to index them without a J2EE application server. This is not incremental indexing; it is a kind of index-once, use-forever, all-batch mode. I can guess that if there is a way to index without J2EE, it may be much faster... Thanks, Jae Joo
Re: Color search
Hi Guangwei,

When you index your products, you could have a single color field and include duplicates of each color component proportional to its weight. For example, if you decide to use 10% increments, for your black dress with 70% black, 20% gray, and 10% brown, you would index the following terms in the color field:

    black black black black black black black gray gray brown

This works because Lucene natively interprets document term frequencies as weights.

Steve
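The term-repetition trick described above can be sketched in plain Java. This is a hypothetical helper for building the field value before indexing; the 10% increment is the assumption from the example:

```java
// Hypothetical helper: build the color field value by repeating each color
// once per 10% of its weight, so Lucene's term frequency carries the weight.
public class ColorWeights {
    public static String terms(String[] colors, int[] percents) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < colors.length; i++) {
            for (int n = 0; n < percents[i] / 10; n++) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(colors[i]);
            }
        }
        return sb.toString();
    }
}
```

For the black dress, terms(new String[]{"black", "gray", "brown"}, new int[]{70, 20, 10}) produces exactly the field value shown above.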
Re: Color search
Another option would be to extend Solr (and donate back) to incorporate Lucene's payload functionality, in which case you could associate the percentile of the color as a payload and use the BoostingTermQuery... :-) If you're interested in this, a discussion on solr-dev is probably warranted to figure out the best way to do this.

-Grant

--
Grant Ingersoll
http://lucene.grantingersoll.com
Lucene Helpful Hints:
http://wiki.apache.org/lucene-java/BasicsOfPerformance
http://wiki.apache.org/lucene-java/LuceneFAQ
locallucene former custom-sort thread
Changing thread name; are you using local lucene or local solr, and which version?

P

--
Patrick O'Leary
AOL Local Search Technologies
Phone: +1 703 265 8763

You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles. Do you understand this? And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat. - Albert Einstein

View Patrick O Leary's profile
Re: Color search
If it were just a couple of colors, you could have a separate field for each color and then index the percent in that field:

    black:70 grey:20

and then you could use a function query to influence the score (or you could sort by the color percent). However, this doesn't scale well to a large index with a large number of colors. Each field used like that will take up 4 bytes per document in the index, so if you have 1M documents, that's 1M docs * 100 colors * 4 bytes = 400MB. Doable, depending on your index size (use the int or float types for this, not sint or sfloat... it will be better on the memory).

If you need to be better on the memory, you could encode all of the colors into a single value (perhaps into a compact string... one percentile per byte or something) and then have a custom function that extracts the value for a particular color. (This involves some Java development.)

-Yonik
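The "compact string" encoding suggested above might look something like this sketch. Class and method names are hypothetical; a real implementation would be read by a custom Solr function query:

```java
// Sketch of the compact-string idea: one char per color id, whose code point
// is that color's percent (0-100). Hypothetical names; a real version would
// live behind a custom function query.
public class ColorCodec {
    public static String encode(int[] percents) {
        char[] packed = new char[percents.length];
        for (int i = 0; i < percents.length; i++) {
            packed[i] = (char) percents[i];
        }
        return new String(packed);
    }

    public static int percentOf(String encoded, int colorId) {
        // Colors beyond the encoded length decode as 0%.
        return colorId < encoded.length() ? encoded.charAt(colorId) : 0;
    }
}
```

At one char per color, 1M documents with 100 colors each cost on the order of 200MB of char data rather than 400MB of per-field ints, and a single stored value replaces 100 fields.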
RE: custom sorting
i have been testing locallucene with our data for the last couple of days. one issue i faced when using geo sorting is that it seems to eat up all the memory, however big, and become progressively slower; finally, after several requests (10 or so in my case), it throws a java.lang.OutOfMemoryError: Java heap space. is there a way to get around this?

This email is confidential and may also be privileged. If you are not the intended recipient please notify us immediately by telephoning +44 (0)20 7452 5300 or email [EMAIL PROTECTED] You should not copy it or use it for any purpose nor disclose its contents to any other person. Touch Local cannot accept liability for statements made which are clearly the sender's own and are not made on behalf of the firm. Touch Local Limited Registered Number: 2885607 VAT Number: GB896112114 Cardinal Tower, 12 Farringdon Road, London EC1M 3NN +44 (0)20 7452 5300
RE: locallucene former custom-sort thread
Hi, i'm using local lucene, downloaded the latest zip file solr-example_s1.3_ls0.2.tgz is there a newer version available?

Thanks!
Sandeep
RE: Color search
Here's another idea: encode color mixes as one RGB value (32 bits) and sort according to those values. Finding the closest color is then like finding the closest points in the color space; it would be like a distance search.

    70% black  #000000 -> #000000
    20% gray   #f0f0f0 -> #303030
    10% brown  #8b4513 -> #0e0702
    -----------------------------
    composite:            #3e3732

The distance would be:

    sqrt( (r1 - r0)^2 + (g1 - g0)^2 + (b1 - b0)^2 )

where r0g0b0 is the color the user asked for, and r1g1b1 is the composite color of the item, calculated above.

--Renaud
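The composite-and-distance arithmetic above can be checked with a small Java sketch. The packed-int RGB layout and the helper names are assumptions of this sketch, not anything from Solr:

```java
// Sketch: composite a weighted color mix into one packed 24-bit RGB int,
// and measure Euclidean distance between two such values.
public class ColorDistance {
    public static int composite(int[] rgbs, double[] weights) {
        double r = 0, g = 0, b = 0;
        for (int i = 0; i < rgbs.length; i++) {
            r += weights[i] * ((rgbs[i] >> 16) & 0xff);
            g += weights[i] * ((rgbs[i] >> 8) & 0xff);
            b += weights[i] * (rgbs[i] & 0xff);
        }
        // Round each channel and repack into 0xRRGGBB.
        return ((int) Math.round(r) << 16) | ((int) Math.round(g) << 8) | (int) Math.round(b);
    }

    public static double dist(int rgb0, int rgb1) {
        int dr = ((rgb1 >> 16) & 0xff) - ((rgb0 >> 16) & 0xff);
        int dg = ((rgb1 >> 8) & 0xff) - ((rgb0 >> 8) & 0xff);
        int db = (rgb1 & 0xff) - (rgb0 & 0xff);
        return Math.sqrt(dr * dr + dg * dg + db * db);
    }
}
```

composite(new int[]{0x000000, 0xf0f0f0, 0x8b4513}, new double[]{0.7, 0.2, 0.1}) reproduces the #3e3732 composite worked out above.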
Re: locallucene former custom-sort thread
That's the latest. I was experimenting with caching, which might be the problem. I'll have a look; could you give me an idea of how large the radius was and how many results were coming back?

Thanks
P
Re: custom sorting
Hi all,

Regarding this issue, we tried using a custom request handler which in turn uses the custom comparator. But this has a memory leak, and we are almost stuck at that point. As somebody mentioned, we are thinking of moving towards a function query to achieve the same. Please let me know whether anybody has faced a similar issue, or whether we are doing something wrong. The additional code that we have added to the default handler is as given below:

    if (myappRequestHandler.equalsIgnoreCase(requestHandler)) {
        sort = getSortCriteria(new SimpleSortComparatorSourceImpl());
    }

Thanks and Regards
Narayanan
Re: Indexing without application server
I do not think it will be much faster. The data transfer time is small compared to the indexing time. The indexing will probably take less than a day, so if you spend more than 30 minutes coding a faster method, the project will take longer.

wunder
one query or multiple queries
Hi there,

I have a user index (each user has a unique index record). If I want to search for 10 users, should I run 10 queries or 1 query with multiple user ids? Is there any performance difference?

Thanks
Xuesong
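One common approach (a sketch, not an answer from this thread) is to batch all the ids into a single boolean OR query, so one request replaces ten; the field name "user_id" here is only an example:

```java
// Hypothetical helper: combine many user ids into one Lucene/Solr boolean
// OR query string, e.g. user_id:(u1 OR u2 OR u3).
public class UserQuery {
    public static String build(String field, String[] ids) {
        return field + ":(" + String.join(" OR ", ids) + ")";
    }
}
```

A single combined query generally saves the per-request overhead of ten round trips, at the cost of getting all matches back in one merged, scored result list rather than one list per user.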
RE: locallucene former custom-sort thread
also probably a point to consider, the index has about 2.9 million records in total
Re: custom sorting
Is the machinery in place to do this now (hook up a function query to be used in sorting)? I'm trying to figure out what's the best way to do a distance sort: custom comparator or function query.

Using a custom comparator seems straightforward and reusable across both the standard and dismax handlers. But it also seems most likely to impact performance (or at least require the most work/knowledge to get right by minimizing calculations, caching, watching out for memory leaks, etc.). (Speaking of which, could anyone with more Lucene/Solr experience than I comment on the performance characteristics of the locallucene implementation mentioned on the list recently? I've taken a first look and it seems reasonable to me.)

Using a function query, as Yonik suggests above, is another approach. But to get a true sort, you have to boost the original query to zero? How does this impact the results returned by the original query? Will the requirements (and boosts) of the original (now nested) query remain intact, only sorted by the function? Also, is there any way to do this with the dismax handler?

Thanks, - Jon

On 9/27/07, Yonik Seeley [EMAIL PROTECTED] wrote: On 9/27/07, Erik Hatcher [EMAIL PROTECTED] wrote: Using something like this, how would the custom SortComparatorSource get a parameter from the request to use in sorting calculations?

perhaps hook in via function query: dist(10.4,20.2,geoloc) And either manipulate the score with that and sort by score, q=+(foo bar)^0 dist(10.4,20.2,geoloc) sort=score asc or extend solr's sorting mechanisms to allow specifying a function to sort by. sort=dist(10.4,20.2,geoloc) asc -Yonik
RE: locallucene former custom-sort thread
Yes, I was thinking about the same. I was searching with a radius of 25 miles; we get about 2500 results back for the search. It seems like it's storing all those geo results in cache, and it keeps on adding to it each time a geo request is made... thanks for looking into it! Sandeep -Original Message- From: patrick o'leary [mailto:[EMAIL PROTECTED] Sent: 28 September 2007 17:02 To: solr-user@lucene.apache.org Subject: Re: locallucene former custom-sort thread That's the latest. I was experimenting with caching, which might be the problem. I'll have a look; could you give me an idea of how large the radius was and how many results were coming back? Thanks P Sandeep Shetty wrote: Hi, I'm using local lucene, downloaded from the latest zip file solr-example_s1.3_ls0.2.tgz. Is there a newer version available? Thanks! Sandeep -Original Message- From: patrick o'leary [mailto:[EMAIL PROTECTED] Sent: 28 September 2007 16:08 To: solr-user@lucene.apache.org Subject: locallucene former custom-sort thread Changing thread name; Are you using local lucene or local solr, and which version? P [EMAIL PROTECTED] wrote: I have been testing locallucene with our data for the last couple of days. One issue I faced is that when using geo sorting it seems to eat up all the memory, however big, and become progressively slower; finally, after several requests (10 or so in my case), it throws a java.lang.OutOfMemoryError: Java heap space error. Is there a way to get around this? -Original Message- From: Jon Pierce [mailto:[EMAIL PROTECTED] Sent: 28 September 2007 15:48 To: solr-user@lucene.apache.org Subject: Re: custom sorting Is the machinery in place to do this now (hook up a function query to be used in sorting)? I'm trying to figure out what's the best way to do a distance sort: custom comparator or function query. Using a custom comparator seems straightforward and reusable across both the standard and dismax handlers.
But it also seems most likely to impact performance (or at least require the most work/knowledge to get right by minimizing calculations, caching, watching out for memory leaks, etc.). (Speaking of which, could anyone with more Lucene/Solr experience than I comment on the performance characteristics of the locallucene implementation mentioned on the list recently? I've taken a first look and it seems reasonable to me.) Using a function query, as Yonik suggests above, is another approach. But to get a true sort, you have to boost the original query to zero? How does this impact the results returned by the original query? Will the requirements (and boosts) of the original (now nested) query remain intact, only sorted by the function? Also, is there any way to do this with the dismax handler? Thanks, - Jon On 9/27/07, Yonik Seeley [EMAIL PROTECTED] wrote: On 9/27/07, Erik Hatcher [EMAIL PROTECTED] wrote: Using something like this, how would the custom SortComparatorSource get a parameter from the request to use in sorting calculations? perhaps hook in via function query: dist(10.4,20.2,geoloc) And either manipulate the score with that and sort by score, q=+(foo bar)^0 dist(10.4,20.2,geoloc) sort=score asc or extend solr's sorting mechanisms to allow specifying a function to sort by. sort=dist(10.4,20.2,geoloc) asc -Yonik This email is confidential and may also be privileged. If you are not the intended recipient please notify us immediately by telephoning +44 (0)20 7452 5300 or email [EMAIL PROTECTED] You should not copy it or use it for any purpose nor disclose its contents to any other person. Touch Local cannot accept liability for statements made which are clearly the sender's own and are not made on behalf of the firm.
Touch Local Limited Registered Number: 2885607 VAT Number: GB896112114 Cardinal Tower, 12 Farringdon Road, London EC1M 3NN +44 (0)20 7452 5300 -- Patrick O'Leary AOL Local Search Technologies Phone: + 1 703 265 8763 You see, wire telegraph is a kind of a very, very long cat. You pull his tail in New York and his head is meowing in Los Angeles. Do you understand this? And radio operates exactly the same way: you send signals here, they receive them there. The only difference is that there is no cat. - Albert Einstein View Patrick O Leary's LinkedIn profileView Patrick O Leary's profile http://www.linkedin.com/in/pjaol
Dismax and Grouping query
Hi, I've tried to use a grouping query with DisMaxRequestHandler without success. When I sent a grouping query in Solr Admin, I could see the parens of the query escaped in the 'querystring' line with debugQuery on. Is this the cause of the failure? e.g. when I send a query like +(lucene solr), I can see the following line in the result page: <str name="querystring">+\(lucene solr\)</str> When I tried this with StandardRequestHandler, the parens of the query were not escaped, and the query was successfully answered. Digging into the source of Solr, I found the following line in DisMaxRequestHandler.java: userQuery = U.partialEscape(U.stripUnbalancedQuotes(userQuery)).toString(); And the partialEscape function seems to carry out the escaping. So... can I carry out a grouping query on DisMaxRequestHandler? If so, should I use a special character for grouping instead of parens? I'm pretty new to Solr. Any reply will help. Thanks in advance.
Re: searching remote indexes
resending due to lack of response : [We are using embedded solr 1.2 ] I need a mechanism by which i can search over 3 remote indexes? Can i use the Lucene remote apis to access the index created via Embedded solr? -Venkat On 9/4/07, Venkatraman S [EMAIL PROTECTED] wrote: Hi, [I am new to Solr]. How do i search remote indexes using Solr? I am not able to find suitable documentation on this - can you guys guide me? Regards, Venkat -- --
Re: Color search
Hi Renaud, I think your method will produce strange results, probably in most cases, e.g.: 33% red #FF0000 = #550000, 33% green #00FF00 = #005500, 33% blue #0000FF = #000055; sum = #555555. Thus, a red, a green and a blue dress would all score well against a search for medium gray. Not good. Steve Renaud Waldura wrote: Here's another idea: encode color mixes as one RGB value (32 bits) and sort according to those values. To find the closest color is like finding the closest points in the color space. It would be like a distance search. 70% black #000000 = #000000, 20% gray #f0f0f0 = #303030, 10% brown #8b4513 = #0e0702; sum = #3e3732. The distance would be: sqrt( (r1 - r0)^2 + (g1 - g0)^2 + (b1 - b0)^2 ) where r0g0b0 is the color the user asked for, and r1g1b1 is the composite color of the item, calculated above. --Renaud -Original Message- From: Steven Rowe [mailto:[EMAIL PROTECTED] Sent: Friday, September 28, 2007 7:14 AM To: solr-user@lucene.apache.org Subject: Re: Color search Hi Guangwei, When you index your products, you could have a single color field, and include duplicates of each color component proportional to its weight. For example, if you decide to use 10% increments, for your black dress with 70% of black, 20% of gray, 10% of brown, you would index the following terms for the color field: black black black black black black black gray gray brown This works because Lucene natively interprets document term frequencies as weights. Steve Guangwei Yuan wrote: Hi, We're running an e-commerce site that provides product search. We've been able to extract colors from product images, and we think it'd be cool and useful to search products by color. A product image can have up to 5 colors (from a color space of about 100 colors), so we can implement it easily with Solr's facet search (thanks all who've developed Solr). The problem arises when we try to sort the results by the color relevancy. What's different from a normal facet search is that colors are weighted.
For example, a black dress can have 70% of black, 20% of gray, 10% of brown. A search query color:black should return results in which the black dress ranks higher than other products with less percentage of black. My question is: how to configure and index the color field so that products with higher percentage of color X ranks higher for query color:X? Thanks for your help! - Guangwei
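As a sanity check on Renaud's blending arithmetic above, here is a rough Java sketch (the class and method names are mine, not anything in Solr or Lucene): it blends weighted colours into one composite RGB int and computes the Euclidean distance Renaud describes. Note it truncates rather than rounds each channel, so the low digits may differ slightly from his worked example.

```java
// Hypothetical helper, not Solr/Lucene API: blend weighted 0xRRGGBB
// colours into one composite value and measure RGB-space distance.
public class ColorDistance {
    // rgbs are 0xRRGGBB ints; weights for each colour should sum to <= 1.0
    static int composite(int[] rgbs, double[] weights) {
        double r = 0, g = 0, b = 0;
        for (int i = 0; i < rgbs.length; i++) {
            r += ((rgbs[i] >> 16) & 0xff) * weights[i];
            g += ((rgbs[i] >> 8) & 0xff) * weights[i];
            b += (rgbs[i] & 0xff) * weights[i];
        }
        // truncate each blended channel back to a byte
        return ((int) r << 16) | ((int) g << 8) | (int) b;
    }

    // Euclidean distance between two packed RGB colours
    static double distance(int c0, int c1) {
        int dr = ((c0 >> 16) & 0xff) - ((c1 >> 16) & 0xff);
        int dg = ((c0 >> 8) & 0xff) - ((c1 >> 8) & 0xff);
        int db = (c0 & 0xff) - (c1 & 0xff);
        return Math.sqrt(dr * dr + dg * dg + db * db);
    }
}
```

Steve's objection holds here: three very different mixes can blend to the same composite, so distance on the composite alone can rank unrelated items together.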
Re: one query or multiple queries
I'd guess the latter would be faster, but who knows? Try it both ways. -- Ian. On 9/28/07, Xuesong Luo [EMAIL PROTECTED] wrote: Hi, there, I have a user index(each user has a unique index record). If I want to search 10 users, should I run 10 queries or 1 query with multiple user ids? Is there any performance difference? Thanks Xuesong
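For the single-query option, the ids can be OR'ed into one query string. A minimal sketch, assuming the unique key field is named "id" (adjust to your schema); the helper class is made up for illustration:

```java
// Hypothetical helper: collapse N user-id lookups into one Solr query
// string of the form "id:(u1 OR u2 OR ...)".
public class MultiIdQuery {
    static String build(String[] ids) {
        StringBuilder sb = new StringBuilder("id:(");
        for (int i = 0; i < ids.length; i++) {
            if (i > 0) sb.append(" OR ");   // OR the ids together
            sb.append(ids[i]);
        }
        return sb.append(')').toString();
    }
}
```

One request amortizes the HTTP and query-parsing overhead across all ten lookups, which is usually why the single combined query wins.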
Re: Color search
This discussion is incredibly interesting to me! We solved this simply by indexing the color names, and faceting on that. Not a very elegant solution, to be sure - but it works. If people search for a green running shoe they get -green- running shoes. I would be very very interested in having a color picker ajax app which then went out and found the products with colors most like the one you chose. ++ | Matthew Runo | Zappos Development | [EMAIL PROTECTED] | 702-943-7833 ++ On Sep 28, 2007, at 1:00 AM, Guangwei Yuan wrote: Hi, We're running an e-commerce site that provides product search. We've been able to extract colors from product images, and we think it'd be cool and useful to search products by color. A product image can have up to 5 colors (from a color space of about 100 colors), so we can implement it easily with Solr's facet search (thanks all who've developed Solr). The problem arises when we try to sort the results by the color relevancy. What's different from a normal facet search is that colors are weighted. For example, a black dress can have 70% of black, 20% of gray, 10% of brown. A search query color:black should return results in which the black dress ranks higher than other products with less percentage of black. My question is: how to configure and index the color field so that products with higher percentage of color X ranks higher for query color:X? Thanks for your help! - Guangwei
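Steven's term-frequency trick from earlier in the thread is easy to prototype. Here is a hedged sketch (the helper names are made up, not Solr API) that expands colour percentages into repeated terms at 10% granularity, producing the field value to index:

```java
// Hypothetical helper: turn colour percentages into term frequencies by
// repeating each colour name once per 10% increment, so Lucene's
// tf-based scoring weights dominant colours higher.
public class WeightedColorField {
    static String encode(String[] colours, int[] percents) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < colours.length; i++) {
            int reps = percents[i] / 10;   // 10% buckets
            for (int r = 0; r < reps; r++) {
                if (sb.length() > 0) sb.append(' ');
                sb.append(colours[i]);
            }
        }
        return sb.toString();
    }
}
```

For the 70/20/10 black dress this yields "black" seven times, "gray" twice and "brown" once, matching Steven's example.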
Index multiple languages with multiple analyzers with the same field
Hi, I know this probably has been asked before, but I was not able to find it in the mailing list, so forgive me if I repeat the same question. We are trying to build a search application to support multiple languages. Users can potentially query in any language. The first thought that came to us was to index the text of all languages in the same field using language-specific analyzers. As all the data are indexed in the same field, it would just find results in the language that matches the user query. Looking at the Solr schema, it seems each field can have one and only one analyzer. Is it possible to have multiple analyzers for the same field? Or is there any other approach that can achieve the same thing? Daniel
Re: Index multiple languages with multiple analyzers with the same field
On 28-Sep-07, at 11:13 AM, Wu, Daniel wrote: Hi, I know this probably has been asked before, but I was not able to find it in the mailing list. So forgive me if I repeated the same question. This thread hashes out the issues in quite a lot of detail: http://www.nabble.com/Multi-language-indexing-and-searching-tf3885324.html#a11012939 -Mike
Re: searching remote indexes
Solr's main interface is http, so you can connect to that remotely. Query each machine and combine the results using your own business logic. Alternatively, you can try out the query distribution code being developed in http://issues.apache.org/jira/browse/SOLR-303 -Mike On 28-Sep-07, at 1:59 AM, Venkatraman S wrote: resending due to lack of response : [We are using embedded solr 1.2 ] I need a mechanism by which i can search over 3 remote indexes? Can i use the Lucene remote apis to access the index created via Embedded solr? -Venkat On 9/4/07, Venkatraman S [EMAIL PROTECTED] wrote: Hi, [I am new to Solr]. How do i search remote indexes using Solr? I am not able to find suitable documentation on this - can you guys guide me? Regards, Venkat
Re: Index multiple languages with multiple analyzers with the same field
I had the same problem, but never found a good solution. The best solution would be a more dynamic way of determining which analyzer to return, such as having some kind of conditional expression evaluation in the fieldType/analyzer element, where either the document or the query request could be used as the comparison object.

<fieldtype type="textMultiLingual" class="solr.TextField">
  <analyzer type="query" expression="request.lang == 'EN'">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldtype>

Analyzers could still be cached by adding the expression to the cache key. Unfortunately I have switched jobs, so I don't have the time or motivation to do this, but it should be a very useful addition. - Thom Wu, Daniel wrote: Hi, I know this probably has been asked before, but I was not able to find it in the mailing list. So forgive me if I repeated the same question. We are trying to build a search application to support multiple languages. Users can potentially query with any language. First thought come to us is to index the text of all languages in the same field using language specific analyzer. As all the data are indexed in the same field, it would just find results with the language that matches the user query. Looking at the Solr schema, it seems each field can have one and only analyzer. Is it possible to have multiple analyzers for the same field? Or is there any other approaches that can achieve the same thing? Daniel
Re: Color search
: useful to search products by color. A product image can have up to 5 colors : (from a color space of about 100 colors), so we can implement it easily with : Solr's facet search (thanks all who've developed Solr). : : The problem arises when we try to sort the results by the color relevancy. : What's different from a normal facet search is that colors are weighted. For : example, a black dress can have 70% of black, 20% of gray, 10% of brown. A if 5 is a hard max on the number of colors that you support, then you can always use 5 separate fields to store the colors in order of dominance and then query on those 5 fields with varying boosts... color_1:black^10 color_2:black^7 color_3:black^4 color_4:black color_5:black^0.1 ...something like this will lose the % granularity info that you have (so a 60% black skirt and an 80% black dress would both score the same against black since it's the dominant color). Alternately: I'm assuming your percentage data only has so much confidence -- maybe on the order of 10%? You can have a separate field for each bucket of color percentages and index the name of the color in the corresponding bucket. With 10% granularity that's only 10 fields -- a 10-clause boolean query for the color is no big deal ... even going to 5% would be trivial. Incidentally: people interested in the general topic of color faceting at a finer granularity than just color names may want to check out this thread from last... http://www.nabble.com/faceting-and-categorizing-on-color--tf1801106.html -Hoss
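The first suggestion above is just string assembly. A small illustrative sketch (field names and boosts taken from the mail; the helper class itself is hypothetical, and boosts are kept as strings so the output matches the query syntax exactly):

```java
// Hypothetical helper: build the positional-colour query Hoss describes,
// querying color_1..color_5 with decreasing boosts.
public class ColorBoostQuery {
    static String build(String colour) {
        // boosts from the mail; color_4's implicit 1 is made explicit here
        String[] boosts = {"10", "7", "4", "1", "0.1"};
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < boosts.length; i++) {
            if (i > 0) sb.append(' ');
            sb.append("color_").append(i + 1).append(':')
              .append(colour).append('^').append(boosts[i]);
        }
        return sb.toString();
    }
}
```

As Hoss notes, this ranks by colour position only; the percentage granularity is lost unless you go to the bucketed-field variant.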
Re: Request for graphics
: I am trying to make a presentation on SOLR and have been unable to find the : SOLR graphic in high quality. Could someone point me in the right direction : or provide the graphics? you're right -- I can't find the original source files for it in subversion. I think I know who made it (here at CNET); I'll ping him and see if I can get the original source files and get them into subversion so alternate resolutions can be generated. -Hoss
Schema version question
I was wondering if anyone could help me. I just completed a full index of my data (about 4 million documents) and noticed that when I was first setting up the schema I set the version number to 1.2, thinking that Solr 1.2 uses schema version 1.2... ugh... so I am wondering if I can just set the schema to 1.1 without having to rebuild the full index? I ask because I am hoping that, given an invalid schema version number, version 1.0 is not used by default, making all my fields multivalued. Any help would be greatly appreciated. Thanks in advance -- View this message in context: http://www.nabble.com/Schema-version-question-tf4536802.html#a12948588 Sent from the Solr - User mailing list archive at Nabble.com.
Re: Color search
On 28-Sep-07, at 6:31 AM, Grant Ingersoll wrote: Another option would be to extend Solr (and donate back) to incorporate Lucene's payload functionality, in which case you could associate the percentile of the color as a payload and use the BoostingTermQuery... :-) If you're interested in this, a discussion on solr-dev is probably warranted to figure out the best way to do this. For reference, here is a summary of the changes needed: 1. A payload analyzer (here is an example that tokenizes strings of token:whatever:score into token with payload score):

/** Returns the next token in the stream, or null at EOS. */
public final Token next() throws IOException {
  Token t = input.next();
  if (null == t) return null;
  String s = t.termText();
  if (s.indexOf(":") > -1) {
    String[] parts = s.split(":");
    assert parts.length == 3;
    String colour = parts[0];
    int bits = Float.floatToIntBits(Float.parseFloat(parts[1]));
    byte[] buf = new byte[4];
    for (int shift = 0, i = 0; shift < 32; shift += 8, i++) {
      buf[i] = (byte)((bits >> shift) & 0xff);
    }
    Token gen = new Token(colour, t.startOffset(), t.endOffset());
    gen.setPayload(new Payload(buf));
    t = gen;
  }
  return t;
}

2. A payload deserializer. Add this method to your custom Similarity class:

public float scorePayload(byte[] payload, int offset, int length) {
  assert length == 4;
  int accum = (payload[0 + offset] & 0xff)
            | ((payload[1 + offset] & 0xff) << 8)
            | ((payload[2 + offset] & 0xff) << 16)
            | ((payload[3 + offset] & 0xff) << 24);
  return Float.intBitsToFloat(accum);
}

3. Add a relevant query clause. In a custom request handler, you could have a parameter to add BoostingTermQueries:

Query q = new BoostingTermQuery(new Term("colourPayload", colour));
query.add(q, Occur.SHOULD);

How to add this generically is an interesting question. There are many possibilities, especially on the request handler and tokenizer side of things. If there is a consensus on a sensible way of doing this, I could contribute the bits of code that I have. HTH, -Mike
Re: small rsync index question
On 9/28/07, Brian Whitman [EMAIL PROTECTED] wrote: For some reason sending a <commit/> is not refreshing the index It should... are there any errors in the logs? Do you see the commit in the logs? Check the stats page to see info about when the current searcher was last opened too. -Yonik
Re: Schema version question
On 9/28/07, Robert Purdy [EMAIL PROTECTED] wrote: I was wondering if anyone could help me, I just completed a full index of my data (about 4 million documents) and noticed that when I was first setting up the schema I set the version number to 1.2 thinking that solr 1.2 uses schema version 1.2... ugh... so I am wondering if I can just set the schema to 1.1 without having to rebuild the full index? I ask because I am hoping that given an invalid schema version number, that version 1.0 is not used by default and all my fields are now mulitvalued. Any help would be greatly appreciated. Thanks in advance Yes, it should be OK to set it back to 1.1 w/o reindexing. The index format does not differentiate between single and multi-valued fields so you should be fine there. -Yonik
Re: Request for graphics
On 9/28/07, Clay Webster [EMAIL PROTECTED] wrote: i'm late for dinner out, so i'm just attaching it here. Most attachments are stripped :-) -Yonik
RE: Index multiple languages with multiple analyzers with the same field
Other people custom-create a separate dynamic field for each language they want to support. The spellchecker in Solr 1.2 wants just one field to use as its word source, so this fits. We have a more complex version of this problem: we have content with both English and other languages. Searching is one problem; we also want to have spelling correction dictionaries for each language. We have many world languages which need very different handling and semantics, like CJK processing. We will have to use the multiple-field trick; I don't think we can shoehorn our complexity into this technique. It is a valiant effort, though. It's possible we could separate out the different-language words in the document, put them each in separate words_en_text, word_sp_text, etc. and make the default search field out of <copyField source="*_text" dest="defaultText"/> Hmm. Lance -Original Message- From: Thom Nelson [mailto:[EMAIL PROTECTED] Sent: Friday, September 28, 2007 12:07 PM To: solr-user@lucene.apache.org; [EMAIL PROTECTED] Subject: Re: Index multiple languages with multiple analyzers with the same field I had the same problem, but never found a good solution. The best solution is to have a more dynamic way of determining which analyzer to return, such as having some kind of conditional expression evaluation in the fieldType/analyzer element, where either the document or the query request could be used as the comparison object.

<fieldtype type="textMultiLingual" class="solr.TextField">
  <analyzer type="query" expression="request.lang == 'EN'">
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.StandardFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.StopFilterFactory"/>
    <filter class="solr.PorterStemFilterFactory"/>
  </analyzer>
</fieldtype>

Analyzers could still be cached by adding the expression to the cache key. Unfortunately I have switched jobs, so I don't have the time or motivation to do this, but it should be a very useful addition.
- Thom Wu, Daniel wrote: Hi, I know this probably has been asked before, but I was not able to find it in the mailing list. So forgive me if I repeated the same question. We are trying to build a search application to support multiple languages. Users can potentially query with any language. First thought come to us is to index the text of all languages in the same field using language specific analyzer. As all the data are indexed in the same field, it would just find results with the language that matches the user query. Looking at the Solr schema, it seems each field can have one and only analyzer. Is it possible to have multiple analyzers for the same field? Or is there any other approaches that can achieve the same thing? Daniel
Re: custom sorting
: Using something like this, how would the custom SortComparatorSource : get a parameter from the request to use in sorting calculations? In general: you wouldn't. You would have to specify all options as init params for the FieldType -- which makes it pretty horrible for distance calculations, and isn't something I considered when I posted that. The only way I can think of that you can really solve the problem with a plugin at the moment (without some serious internal changes that Yonik describes below) would be to use a dynamicField when you want geodistance sort, and encode the center lat/lon point in the field name, ala: sort=geodist_-124.75_93.45 : or extend solr's sorting mechanisms to allow specifying a function to sort by. : : sort=dist(10.4,20.2,geoloc) asc That would, in fact, kick ass. Even if there is a better solution for the distance stuff, the idea of being able to specify a raw function as a sort would be pretty sick. (NOTE: that's sick as in so good it's amazing ... since the last person I used that idiom with didn't understand and thought I meant bad.) -Hoss
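If you went the encode-the-point-in-the-field-name route, the plugin would need to recover the lat/lon pair from names like geodist_-124.75_93.45. A hypothetical sketch (the class, method, and "geodist_" prefix are all illustrative assumptions, not anything in Solr):

```java
// Hypothetical helper: parse a dynamic-field name of the form
// "geodist_<lat>_<lon>" back into the centre point for the sort.
public class GeoFieldName {
    static double[] parseCentre(String fieldName) {
        // strip the fixed prefix, then split the two coordinates on '_'
        String[] parts = fieldName.substring("geodist_".length()).split("_");
        return new double[] { Double.parseDouble(parts[0]),
                              Double.parseDouble(parts[1]) };
    }
}
```

The obvious downside Hoss implies: every distinct centre point produces a new "field", so caches keyed by field name grow without bound.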
Re: Color search
Thanks for all the replies. I think creating 10 fields and feeding each field with a color's value for 10% from that color is a reasonable approach, and easy to implement too. One problem though, is that not all products have a total of 100% colors (due to various reasons including our color extraction algorithm, etc.) So, for a product with 50% of #00, and 20% of #99, I'll have to fill the remaining three fields with some dummy values. Otherwise, Lucene seems to score it higher than products that also have 50% of #00, but more than 20% of some other colors. Since I also need a way to exclude the dummy value when faceting, is there a neater solution? I'll certainly look at the payload functionality, which is new to me :) - Guangwei
Re: Dismax and Grouping query
: I've tried to use grouping query on DisMaxRequestHandler without success. : e.g. : When I send query like +(lucene solr), : I can see following line in the result page. : <str name="querystring">+\(lucene solr\)</str> the dismax handler does not consider parens to be special characters. if it did, it's not clear what the semantics would be of a query like... q=A +(B C)&qf=X Y Z ...when building the query structure ... what happens if X:B exists and Y:C exists? is that considered a match? Generally, the mm param is used to indicate how many of the query terms (that don't have a + or - prefix) are required, or you can explicitly require/prohibit a term using + or -, but there is no way to require that at least one of N sub terms matches (prohibiting any of N sub terms is easy, just prohibit them all individually) -Hoss
Re: custom sorting
: leaks, etc.). (Speaking of which, could anyone with more Lucene/Solr : experience than I comment on the performance characteristics of the : locallucene implementation mentioned on the list recently? I've taken : a first look and it seems reasonable to me.) I can't speak for anyone else, but I haven't had a chance to drill into it yet. : Using a function query, as Yonik suggests above, is another approach. : But to get a true sort, you have to boost the original query to zero? or a very close approximation thereof (0.01 perhaps). Keep in mind: a true distance sort, while easy to explain, may not be as useful as a sort by score where the distance is factored into the score ... there have been some threads about this on the java-user list in the past, and it's been discussed that a really relevant result 2 miles away is probably better than a mildly relevant result 1.5 miles away ... that's where a function query with well chosen boosts might serve you better. : How does this impact the results returned by the original query? Will : the requirements (and boosts) of the original (now nested) query : remain intact, only sorted by the function? Also, is there any way to it should ... but I won't swear to that. : do this with the dismax handler? a strict sort on the value of a function? Put the function in the bf param, don't bother with bq or pf params, and change your qf params to all have really small boosts. -Hoss
Re: small rsync index question
: To completely remove the window of inconsistency, comment out the : post-commit hook in solrconfig.xml that takes a snapshot, then send a : commit to get a new snapshot and rsync from that. I think Yonik meant UN-comment the postCommit hook in the example solrconfig.xml. -Hoss