Re: Deadlock in lucene?
Ouch, that's certainly a problem! I'll have to think some more on this one. Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Aug 19, 2008, at 1:42 PM, Otis Gospodnetic wrote:

Matthew, just because an index is read-only on some server it doesn't mean it contains no deletes (no docs marked as deleted, but not yet removed from the index). So you still want to check isDeleted(doc) *unless* you are certain the index has no docs marked as deleted (this happens after optimization).

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
From: Matthew Runo <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, August 19, 2008 4:26:59 PM
Subject: Re: Deadlock in lucene?

I know this isn't really the place for this, so please forgive me - but does this patch look reasonably safe to use to skip the isDeleted check inside of FunctionQuery?

My reasoning behind this is that many people (us included) will be building the index on a separate server, and then using the replication scripts to publish the files out to several read-only servers. On those instances, deletedDocs would always be empty, since it's a read-only instance - and so we can conveniently skip the Lucene code in question. This flag would also be good for other optimizations that can only be made when you assume the index is read-only.

Solr seems to work with the flag set - any reasons why this will crash and/or kill my kitten?

(please forgive my posting this here instead of in solr-dev!)
Index: src/java/org/apache/solr/search/FunctionQParser.java
===================================================================
--- src/java/org/apache/solr/search/FunctionQParser.java    (revision 687135)
+++ src/java/org/apache/solr/search/FunctionQParser.java    Tue Aug 19 11:08:45 PDT 2008
@@ -49,7 +49,7 @@
     }
     ***/

-    return new FunctionQuery(vs);
+    return new FunctionQuery(vs, req.getSchema().getSolrConfig().isReadOnly());
   }

   /**
Index: src/java/org/apache/solr/search/function/FunctionQuery.java
===================================================================
--- src/java/org/apache/solr/search/function/FunctionQuery.java    (revision 687135)
+++ src/java/org/apache/solr/search/function/FunctionQuery.java    Tue Aug 19 11:08:45 PDT 2008
@@ -31,12 +31,14 @@
  */
 public class FunctionQuery extends Query {
   ValueSource func;
+  Boolean readOnly;

   /**
    * @param func defines the function to be used for scoring
    */
-  public FunctionQuery(ValueSource func) {
+  public FunctionQuery(ValueSource func, Boolean readOnly) {
     this.func=func;
+    this.readOnly=readOnly;
   }

   /** @return The associated ValueSource */
@@ -113,7 +115,7 @@
       if (doc>=maxDoc) {
         return false;
       }
-      if (reader.isDeleted(doc)) continue;
+      if (!readOnly && reader.isDeleted(doc)) continue;
       // todo: maybe allow score() to throw a specific exception
       // and continue on to the next document if it is thrown...
       // that may be useful, but exceptions aren't really good
Index: src/java/org/apache/solr/core/Config.java
===================================================================
--- src/java/org/apache/solr/core/Config.java    (revision 687135)
+++ src/java/org/apache/solr/core/Config.java    Tue Aug 19 11:08:45 PDT 2008
@@ -45,6 +45,8 @@
   private final String name;
   private final SolrResourceLoader loader;

+  private Boolean readOnly;
+
   /**
    * @deprecated Use {@link #Config(SolrResourceLoader, String, InputStream, String)} instead.
    */
@@ -254,6 +256,19 @@
     return val!=null ? Double.parseDouble(val) : def;
   }

+  /**
+   * Is the index set up to be readOnly? If so, this will cause the FunctionQuery
+   * stuff to not check for deleted documents.
+   * @return boolean readOnly
+   */
+  public boolean isReadOnly() {
+    if( this.readOnly == null ){
+      readOnly = getBool("/mainIndex/readOnly", false);
+    }
+
+    return readOnly;
+  }
+
   // The following functions were moved to ResourceLoader
Index: example/solr/conf/solrconfig.xml
===================================================================
--- example/solr/conf/solrconfig.xml    (revision 687135)
+++ example/solr/conf/solrconfig.xml    Tue Aug 19 11:13:13 PDT 2008
@@ -114,6 +114,12 @@
       This is not needed if lock type is 'none' or 'single'
     -->
     <unlockOnStartup>false</unlockOnStartup>
+
+    <!-- Treat the index as read-only, skipping the deleted-docs
+         check in FunctionQuery. -->
+    <readOnly>false</readOnly>

--- end patch ---

On Aug 18, 2008, at 8:04 PM, Yonik Seeley wrote:

It's not a deadlock (just a synchronization bottleneck), but it is a known issue in Lucene and there has been some progress in improving the situation.
-Yonik

On Mon, Aug 18, 2008 at 10:55 PM, Matthew Runo wrote:
Hello folks! I was just wondering if anyone else has seen this issue under heavy load. We had some servers set to very high thread limits
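Otis's caveat is exactly why a blanket readOnly flag is risky: a replicated index still carries whatever deletions existed when it was copied, and only an optimize physically removes them. The safer gate is to ask the reader once whether any deletions exist, and only then pay the per-document isDeleted check. A self-contained sketch of that shape (plain Java; the Reader interface below is a toy stand-in for Lucene's IndexReader, not the actual Solr/Lucene code):

```java
import java.util.BitSet;

// Toy model of the hot loop in FunctionQuery's scorer: the per-document
// isDeleted check is consulted only when the reader reports deletions.
class DeletionAwareScorer {

    interface Reader {
        int maxDoc();
        boolean hasDeletions();
        boolean isDeleted(int doc);
    }

    // Count scoreable docs; hasDeletions() is hoisted out of the loop, so a
    // fully merged (optimized) index pays nothing per document.
    static int countScoreable(Reader reader) {
        final boolean checkDeletes = reader.hasDeletions();
        int n = 0;
        for (int doc = 0; doc < reader.maxDoc(); doc++) {
            if (checkDeletes && reader.isDeleted(doc)) continue;
            n++;
        }
        return n;
    }

    // Build a fixed toy reader over maxDoc docs with the given deletions.
    static Reader fixed(final int maxDoc, int... deletedDocs) {
        final BitSet del = new BitSet(maxDoc);
        for (int d : deletedDocs) del.set(d);
        return new Reader() {
            public int maxDoc() { return maxDoc; }
            public boolean hasDeletions() { return !del.isEmpty(); }
            public boolean isDeleted(int doc) { return del.get(doc); }
        };
    }
}
```

Unlike a config flag, this check can never return stale answers after replication: a freshly copied segment with deletions reports hasDeletions() == true until it is rewritten.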
Re: Deadlock in lucene?
Matthew, just because an index is read-only on some server it doesn't mean it contains no deletes (no docs marked as deleted, but not yet removed from the index). So you still want to check isDeleted(doc) *unless* you are certain the index has no docs marked as deleted (this happens after optimization).

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: Matthew Runo <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, August 19, 2008 4:26:59 PM
> Subject: Re: Deadlock in lucene?
>
> I know this isn't really the place for this, so please forgive me -
> but does this patch look reasonably safe to use to skip the isDeleted
> check inside of FunctionQuery?
>
> My reasoning behind this is that many people (us included) will be
> building the index on a separate server, and then using the
> replication scripts to publish the files out to several read-only
> servers. On those instances, deletedDocs would always be empty, since
> it's a read-only instance - and so we can conveniently skip the Lucene
> code in question. This flag would also be good for other optimizations
> that can only be made when you assume the index is read-only.
>
> Solr seems to work with the flag set - any reasons why this will crash
> and/or kill my kitten?
>
> (please forgive my posting this here instead of in solr-dev!)
> > [patch snipped - identical to the patch quoted earlier in this digest]
> >
> > --- end patch ---
> >
> > On Aug 18, 2008, at 8:04 PM, Yonik Seeley wrote:
> >
> > > It's not a deadlock (just a synchronization bottleneck), but it is a
> > > known issue in Lucene and there has been some progress in improving
> > > the situation.
> > > -Yonik
> > >
> > > On Mon, Aug 18, 2008 at 10:55 PM, Matthew Runo wrote:
Re: shards and performance
As long as Solr/Lucene makes smart use of memory (and in my experience they do), it is really easy to calculate how long a huge query/update will take when you know how long the smaller ones take. Just keep in mind that the resource consumption of memory and disk space is almost always proportional.

2008/8/19 Mike Klaas <[EMAIL PROTECTED]>
>
> On 19-Aug-08, at 12:58 PM, Phillip Farber wrote:
>
>> So your experience differs from Mike's. Obviously it's an important
>> decision as to whether to buy more machines. Can you (or Mike) weigh in on
>> what factors led to your different take on local shards vs. shards
>> distributed across machines?
>
> I do both; the only reason I have two shards on each machine is to squeeze
> maximum performance out of an equipment budget. Err on the side of multiple
> machines.
>
>>> At least for building the index, the number of shards really does
>>> help. To index Medline (1.6e7 docs, which is 60 GB in XML text) on a
>>> single machine starts at about 100 doc/s but slows down to 10 doc/s when
>>> the index grows. It seems as though the limit is reached once you run
>>> out of RAM, and it gets slower and slower in a linear fashion the
>>> larger the index gets.
>>> My sweet spot was 5 machines with 8 GB RAM for indexing about 60 GB of
>>> data.
>>
>> Can you say what the specs were for these machines? Given that I have more
>> like 1 TB of data over 1M docs, how do you think my machine requirements
>> might be affected as compared to yours?
>
> You are in a much better position to determine this than we are. See how
> big an index you can put on a single machine while maintaining acceptable
> performance using a typical query load. It's relatively safe to extrapolate
> linearly from that.
>
> -Mike

--
Alexander Ramos Jardim
Re: Deadlock in lucene?
I don't think it will help; for instance, SegmentReader in Lucene:

public synchronized Document document(int n, FieldSelector fieldSelector)

Unsynchronized (in future) Solr caching should help.

-Fuad

I know this isn't really the place for this, so please forgive me - but does this patch look reasonably safe to use to skip the isDeleted check inside of FunctionQuery?

My reasoning behind this is that many people (us included) will be building the index on a separate server, and then using the replication scripts to publish the files out to several read-only servers. On those instances, deletedDocs would always be empty, since it's a read-only instance - and so we can conveniently skip the Lucene code in question. This flag would also be good for other optimizations that can only be made when you assume the index is read-only.

Solr seems to work with the flag set - any reasons why this will crash and/or kill my kitten?

(please forgive my posting this here instead of in solr-dev!)

Index: src/java/org/apache/solr/search/FunctionQParser.java
...
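Fuad's objection is that every stored-document read funnels through one monitor, so a readOnly flag does nothing for that bottleneck. The usual mitigation (and roughly what an unsynchronized document cache buys) is a concurrent cache in front of the synchronized call, so hot documents skip the lock entirely. A toy sketch in plain Java; the Fetcher interface is made up for illustration and stands in for SegmentReader, not Solr's actual caching code:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;

class CachedDocFetcher {
    interface Fetcher { String fetch(int doc); }

    private final Fetcher underlying;
    private final ConcurrentMap<Integer, String> cache =
            new ConcurrentHashMap<Integer, String>();
    volatile int misses = 0;  // instrumentation for this sketch only

    CachedDocFetcher(Fetcher underlying) { this.underlying = underlying; }

    // A cache hit returns without touching the synchronized path; only
    // misses funnel through the single monitor Fuad is pointing at.
    String document(int doc) {
        String cached = cache.get(doc);
        if (cached != null) return cached;
        String fetched = slowFetch(doc);
        cache.putIfAbsent(doc, fetched);
        return fetched;
    }

    private synchronized String slowFetch(int doc) {
        misses++;
        return underlying.fetch(doc);
    }
}
```

Under concurrent load the synchronized method is hit once per distinct document rather than once per request, which is where the blocked http-8080 threads in the original report were piling up.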
Re: Deadlock in lucene?
FYI, I just slipped this optimization into trunk.

-Yonik

On Tue, Aug 19, 2008 at 4:37 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> It doesn't matter that it's executed on the read-only server... it
> matters if any of the docs are marked as deleted. That's the
> condition that you probably want to check for.
>
> -Yonik
>
> On Tue, Aug 19, 2008 at 4:26 PM, Matthew Runo <[EMAIL PROTECTED]> wrote:
>> I know this isn't really the place for this, so please forgive me - but does
>> this patch look reasonably safe to use to skip the isDeleted check inside of
>> FunctionQuery?
>>
>> My reasoning behind this is that many people (us included) will be building
>> the index on a separate server, and then using the replication scripts to
>> publish the files out to several read-only servers. On those instances,
>> deletedDocs would always be empty, since it's a read-only instance - and so
>> we can conveniently skip the Lucene code in question. This flag would also
>> be good for other optimizations that can only be made when you assume the
>> index is read-only.
>>
>> Solr seems to work with the flag set - any reasons why this will crash
>> and/or kill my kitten?
>>
>> (please forgive my posting this here instead of in solr-dev!)
>> [patch snipped - identical to the patch quoted earlier in this digest]
>>
>> On Aug 18, 2008, at 8:04 PM, Yonik Seeley wrote:
>>
>>> It's not a deadlock (just a synchronization bottleneck), but it is a
>>> known issue in Lucene and there has been some progress in improving
>>> the situation.
>>> -Yonik
>>>
>>> On Mon, Aug 18, 2008 at 10:55 PM, Matthew Runo <[EMAIL PROTECTED]> wrote:
>>>> Hello folks!
>>>>
>>>> I was just wondering if anyone else has seen this issue under heavy
>>>> load. We had some servers set to very high thread limits
Re: Localisation, faceting
Solr has pluggable query parsers, but the default one is the Lucene one, so I'd make use of Lucene's QueryParser.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: Pierre Auslaender <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Monday, August 18, 2008 6:08:47 PM
> Subject: Re: Localisation, faceting
>
> Excellent point about the saved queries. Thanks! So I could sniff the
> locale (from the HTML page or the Java application, ...) and infer the
> "query language", or try to do automatic "guessing" of the language
> based on the operator names (if they don't collide with indexed terms).
>
> This brings up another question: which query parser should I use? I
> guess it would be a bad idea to invent one; it would be better to reuse
> or adapt "the" query parser used by Solr - or is it Lucene? Can you
> point me to the parser?
>
> Thanks,
> Pierre
>
> Walter Underwood wrote:
> > I would do it in the client, even if it meant parsing the query,
> > modifying it, then unparsing it.
> >
> > This is exactly like changing "To:" to "Zu:" in a mail header.
> > Show that in the client, but make it standard before it goes
> > onto the network.
> >
> > If queries at the Solr/Lucene level are standard, then users
> > with different locale settings could share saved queries.
> >
> > wunder
> >
> > On 8/18/08 2:18 PM, "Pierre Auslaender" wrote:
> >
> >> Would that be of any interest to the Solr / Lucene community, given the
> >> trend to globalisation / regionalisation? My base is Switzerland - 4
> >> official national languages, none of them English.
> >>
> >> If one were to localise the boolean operators, would that have to be at
> >> the Lucene level, or could that be done at the Solr level?
> >>
> >> Thanks,
> >> Pierre
> >>
> >> Otis Gospodnetic wrote:
> >>
> >>> Hi,
> >>>
> >>> Regarding Boolean operator localization - there was a person who
> >>> submitted patches for the same functionality, but for Lucene's
> >>> QueryParser. This was a few years ago. I think his patch was never
> >>> applied. Perhaps that helps.
> >>>
> >>> Otis
> >>> --
> >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >>>
> >>> ----- Original Message -----
> >>>
> >>>> From: Pierre Auslaender
> >>>> To: solr-user@lucene.apache.org
> >>>> Sent: Saturday, August 16, 2008 12:50:53 PM
> >>>> Subject: Localisation, faceting
> >>>>
> >>>> Hello,
> >>>>
> >>>> I have a couple of questions:
> >>>>
> >>>> 1/ Is it possible to localise query operator names without writing code?
> >>>> For instance, I'd like to issue queries with French operator names, e.g.
> >>>> ET (instead of AND), OU (instead of OR), etc.
> >>>>
> >>>> 2/ Is it possible for Solr to generate, in the XML response, the URLs or
> >>>> complete queries for each facet in a faceted search?
> >>>>
> >>>> Here's an example. Say my first query is:
> >>>>
> >>>> http://localhost:8080/solr/select?q=bac&facet=true&facet.field=kind&facet.limit=-1
> >>>>
> >>>> The "kind" field has three values: material, immaterial, time. I get
> >>>> back something like this:
> >>>>
> >>>> 1024
> >>>> 27633
> >>>> 389
> >>>>
> >>>> If I want to drill down into one facet, say into "material", I have to
> >>>> "manually" rebuild a query like this:
> >>>>
> >>>> http://localhost:8080/solr/select?q=bac&facet=true&facet.field=kind&facet.limit=-1&fq=kind:"material"
> >>>>
> >>>> It's not too difficult, but surely Solr could add this URL or query
> >>>> string under the "material" element. Is this possible? Or do I have to
> >>>> XSLT the result myself?
> >>>>
> >>>> Thanks,
> >>>>
> >>>> Pierre Auslaender
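Walter's client-side suggestion can start as simply as a token-level mapping from localized operator names to the standard Lucene ones before the query reaches Solr. A rough sketch, plain Java; the operator table and method names are made up for illustration, and a real version would need a proper tokenizer so that quoted phrases and indexed terms are never clobbered, as Pierre notes:

```java
import java.util.HashMap;
import java.util.Map;

// Client-side rewrite of localized boolean operators into the standard
// Lucene ones. Assumed mapping: FR operators ET/OU/SAUF -> AND/OR/NOT.
class LocalizedOperators {

    private static final Map<String, String> FRENCH = new HashMap<String, String>();
    static {
        FRENCH.put("ET", "AND");
        FRENCH.put("OU", "OR");
        FRENCH.put("SAUF", "NOT");
    }

    // Naive whitespace split: only exact uppercase tokens are rewritten,
    // which reduces (but does not eliminate) collisions with real terms.
    static String toStandard(String query) {
        StringBuilder out = new StringBuilder();
        for (String tok : query.trim().split("\\s+")) {
            if (out.length() > 0) out.append(' ');
            String std = FRENCH.get(tok);
            out.append(std != null ? std : tok);
        }
        return out.toString();
    }
}
```

Because the rewrite happens before the query leaves the client, saved queries stay in the standard form and remain shareable across locales, which is exactly Walter's point.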
Re: Deadlock in lucene?
It doesn't matter that it's executed on the read-only server... it matters if any of the docs are marked as deleted. That's the condition that you probably want to check for.

-Yonik

On Tue, Aug 19, 2008 at 4:26 PM, Matthew Runo <[EMAIL PROTECTED]> wrote:
> I know this isn't really the place for this, so please forgive me - but does
> this patch look reasonably safe to use to skip the isDeleted check inside of
> FunctionQuery?
>
> My reasoning behind this is that many people (us included) will be building
> the index on a separate server, and then using the replication scripts to
> publish the files out to several read-only servers. On those instances,
> deletedDocs would always be empty, since it's a read-only instance - and so
> we can conveniently skip the Lucene code in question. This flag would also
> be good for other optimizations that can only be made when you assume the
> index is read-only.
>
> Solr seems to work with the flag set - any reasons why this will crash
> and/or kill my kitten?
>
> (please forgive my posting this here instead of in solr-dev!)
> [patch snipped - identical to the patch quoted earlier in this digest]
>
> On Aug 18, 2008, at 8:04 PM, Yonik Seeley wrote:
>
>> It's not a deadlock (just a synchronization bottleneck), but it is a
>> known issue in Lucene and there has been some progress in improving
>> the situation.
>> -Yonik
>>
>> On Mon, Aug 18, 2008 at 10:55 PM, Matthew Runo <[EMAIL PROTECTED]> wrote:
>>>
>>> Hello folks!
>>>
>>> I was just wondering if anyone else has seen this issue under heavy load.
>>> We had some servers set to very high thread limits (12 core servers with
>>> 32 gigs of ram), and found several threads would end up in this state
>>>
>>> Name: http-8080-891
>>> State: BLOCKED on [EMAIL PROTECTED] owned by: http-8080-191
>>> Total blocked: 97,926  Total waited: 16
>>>
>>> Stack trace
Re: shards and performance
On 19-Aug-08, at 12:58 PM, Phillip Farber wrote:

> So your experience differs from Mike's. Obviously it's an important
> decision as to whether to buy more machines. Can you (or Mike) weigh in
> on what factors led to your different take on local shards vs. shards
> distributed across machines?

I do both; the only reason I have two shards on each machine is to squeeze maximum performance out of an equipment budget. Err on the side of multiple machines.

> At least for building the index, the number of shards really does help.
> To index Medline (1.6e7 docs, which is 60 GB in XML text) on a single
> machine starts at about 100 doc/s but slows down to 10 doc/s when the
> index grows. It seems as though the limit is reached once you run out
> of RAM, and it gets slower and slower in a linear fashion the larger
> the index gets. My sweet spot was 5 machines with 8 GB RAM for indexing
> about 60 GB of data.

> Can you say what the specs were for these machines? Given that I have
> more like 1 TB of data over 1M docs, how do you think my machine
> requirements might be affected as compared to yours?

You are in a much better position to determine this than we are. See how big an index you can put on a single machine while maintaining acceptable performance using a typical query load. It's relatively safe to extrapolate linearly from that.

-Mike
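For reference, distributed search in Solr 1.3 is driven by the shards request parameter: a comma-separated list of host:port/path entries, with the receiving node fanning the query out and merging results. A trivial helper to build such a URL (plain Java; the host names below are placeholders, not a recommendation):

```java
// Build a Solr 1.3 distributed-search URL. The node that receives the
// request queries every shard in the list and merges the responses.
class ShardedQuery {
    static String url(String host, String q, String... shards) {
        return "http://" + host + "/solr/select?q=" + q
                + "&shards=" + String.join(",", shards);
    }
}
```

Note the shard entries themselves carry no scheme prefix; a real helper would also URL-encode the query string.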
Re: Deadlock in lucene?
I know this isn't really the place for this, so please forgive me - but does this patch look reasonably safe to use to skip the isDeleted check inside of FunctionQuery? My reasoning behind this is that many people (us included) will be building the index on a separate server, and then using the replication scripts to publish the files out to several read-only servers. On those instances, deletedDocs would always be empty, since it's a read only instance - and so we can conveniently skip the Lucene code in question. This flag would also be good for other optimizations that can only be made when you assume the index is read-only. Solr seems to work with the flag set - any reasons why this will crash and/or kill my kitten? (please forgive my posting this here instead of in solr-dev!) Index: src/java/org/apache/solr/search/FunctionQParser.java === --- src/java/org/apache/solr/search/FunctionQParser.java (revision 687135) +++ src/java/org/apache/solr/search/FunctionQParser.java Tue Aug 19 11:08:45 PDT 2008 @@ -49,7 +49,7 @@ } ***/ -return new FunctionQuery(vs); +return new FunctionQuery(vs, req.getSchema().getSolrConfig().isReadOnly() ); } /** Index: src/java/org/apache/solr/search/function/FunctionQuery.java === --- src/java/org/apache/solr/search/function/FunctionQuery.java (revision 687135) +++ src/java/org/apache/solr/search/function/FunctionQuery.java Tue Aug 19 11:08:45 PDT 2008 @@ -31,12 +31,14 @@ */ public class FunctionQuery extends Query { ValueSource func; + Boolean readOnly; /** * @param func defines the function to be used for scoring */ - public FunctionQuery(ValueSource func) { + public FunctionQuery(ValueSource func, Boolean readOnly) { this.func=func; +this.readOnly=readOnly; } /** @return The associated ValueSource */ @@ -113,7 +115,7 @@ if (doc>=maxDoc) { return false; } -if (reader.isDeleted(doc)) continue; +if (!readOnly && reader.isDeleted(doc)) continue; // todo: maybe allow score() to throw a specific exception // and continue on to the next 
document if it is thrown... // that may be useful, but exceptions aren't really good Index: src/java/org/apache/solr/core/Config.java === --- src/java/org/apache/solr/core/Config.java (revision 687135) +++ src/java/org/apache/solr/core/Config.java Tue Aug 19 11:08:45 PDT 2008 @@ -45,6 +45,8 @@ private final String name; private final SolrResourceLoader loader; + private Boolean readOnly; + /** * @deprecated Use [EMAIL PROTECTED] #Config(SolrResourceLoader, String, InputStream, String)} instead. */ @@ -254,6 +256,19 @@ return val!=null ? Double.parseDouble(val) : def; } + /** + * Is the index set up to be readOnly? If so, this will cause the FunctionQuery stuff to not check + * for deleted documents. + * @return boolean readOnly + */ + public boolean isReadOnly() { + if( this.readOnly == null ){ + readOnly = getBool("/mainIndex/readOnly", false); + } + + return readOnly; + } + // The following functions were moved to ResourceLoader //- Index: example/solr/conf/solrconfig.xml === --- example/solr/conf/solrconfig.xml(revision 687135) +++ example/solr/conf/solrconfig.xmlTue Aug 19 11:13:13 PDT 2008 @@ -114,6 +114,12 @@ This is not needed if lock type is 'none' or 'single' --> false + + + false
Re: solr-ruby version management
I like this idea. Perhaps separate the Solr version and the solr-ruby version with a dash instead of a dot - solr-ruby-1.3.0-0.0.6

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch

----- Original Message -----
> From: Koji Sekiguchi <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org; [EMAIL PROTECTED]
> Sent: Tuesday, August 19, 2008 4:24:31 AM
> Subject: solr-ruby version management
>
> From: http://www.nabble.com/CHANGES.txt-td18901774.html
>
> The latest version of solr-ruby is 0.0.6:
>
> solr-ruby-0.0.6.gem
> http://rubyforge.org/frs/?group_id=2875&release_id=23885
>
> I think it isn't clear which Solr version it corresponds to.
>
> I'd like to change this to solr-ruby-{solrVersion}.{solr-rubyVersion}.gem
> when Solr 1.3 is released, where solr-rubyVersion is two digits.
> That is, the first official release of solr-ruby will be
> solr-ruby-1.3.0.01.gem.
>
> Any objections to changing to this new version format?
> Or anyone who has suggestions, please let me know.
>
> Koji
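The advantage of Otis's dash is that the two version streams stay machine-separable: with dots everywhere (solr-ruby-1.3.0.01) the boundary between the Solr version and the client version is ambiguous, while with a dash it is the last separator. A one-liner sketch (plain Java; class and method names are illustrative only):

```java
// Sketch of Otis's proposed gem naming: solr-ruby-<solrVersion>-<clientVersion>.
// Splitting on the last dash recovers the client version unambiguously.
class GemVersion {
    static String gemName(String solrVersion, String clientVersion) {
        return "solr-ruby-" + solrVersion + "-" + clientVersion;
    }

    static String clientVersion(String gemName) {
        return gemName.substring(gemName.lastIndexOf('-') + 1);
    }
}
```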
Re: shards and performance
Thanks, Ian, for the considered reply. See below.

Ian Connor wrote:
> I have not seen any boost by having an index split into shards on the
> same machine. However, when you split it into smaller shards on
> different machines (cpu/ram/hdd), the performance boost is worth it.

So your experience differs from Mike's. Obviously it's an important decision as to whether to buy more machines. Can you (or Mike) weigh in on what factors led to your different take on local shards vs. shards distributed across machines?

> At least for building the index, the number of shards really does help.
> To index Medline (1.6e7 docs, which is 60 GB in XML text) on a single
> machine starts at about 100 doc/s but slows down to 10 doc/s when the
> index grows. It seems as though the limit is reached once you run out
> of RAM, and it gets slower and slower in a linear fashion the larger
> the index gets. My sweet spot was 5 machines with 8 GB RAM for indexing
> about 60 GB of data.

Can you say what the specs were for these machines? Given that I have more like 1 TB of data over 1M docs, how do you think my machine requirements might be affected as compared to yours?

> HDD speed helps with the initial rate, and I found modern cheap SATA
> drives that get 50-60 MB/s ideal. SCSI is faster but costs more. So,
> for the money, you can add more shards instead of paying extra for
> SCSI. I also tried a RAID0 array of USB drives hoping the access speeds
> would help - but it didn't, and the performance was the same as for
> cheap SATA drives. However, it took me a few weeks of experimenting to
> find this.
>
> I can add more machines, and the index will get faster. However, the
> rate of adding docs (my slope) does not degrade while I am building the
> index with 5 machines.

On Tue, Aug 19, 2008 at 2:47 PM, Mike Klaas <[EMAIL PROTECTED]> wrote:
> On 19-Aug-08, at 10:18 AM, Phillip Farber wrote:
>
>> I'm trying to understand how splitting a monolithic index into shards
>> improves query response time. Please tell me if I'm on the right track
>> here. Where does the increase in performance come from? Is it that
>> in-memory arrays are smaller when the index is partitioned into shards?
>> Or is it due to the likelihood that the solr process behind each shard
>> is running on its own CPU on a multi-CPU box?
>
> Usually, the performance is obtained by putting shards on separate
> machines. However, I have had success partitioning an index on a single
> machine so that a single query can be executed by multiple cpus. It
> also helps to have each index on a different hard disk.
>
>> And it must be the case that the overhead of merging results from
>> several shards is still less than the expense of searching a monolithic
>> index. True?
>
> Merging overhead is relatively insignificant. Fetching stored fields
> from more docs than necessary is an expense of sharding, however.
>
>> Given roughly 10 million documents in several languages inducing
>> perhaps 200K unique terms and averaging about 1 MB/doc, how many shards
>> would you recommend, and how much RAM?
>
> I'd never recommend more shards on a single machine than there are
> cpus. For an index of that size, you will need at least 8GB of ram;
> 16GB would be better.
>
>> Is it correct that Distributed Search (shards) is in 1.3, or does 1.2
>> support it?
>
> It is 1.3 only.
>
>> If 1.3, is the nightly build the best one to grab, bearing in mind that
>> we would want any protocols around distributed search to be as stable
>> as possible? Or just wait for the 1.3 release?
>
> Go for the nightly build. The release will look very similar to it.
>
> -Mike
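Ian's numbers make Mike's "extrapolate linearly" advice concrete: measure one machine's comfortable capacity under a typical query load, then divide and round up. A toy estimator (plain Java; the figures in the test come from Ian's Medline run, everything else is illustrative):

```java
// Back-of-the-envelope shard sizing: total docs divided by the measured
// per-machine capacity, rounded up. Assumes roughly uniform doc sizes.
class ShardEstimator {
    static int shardsNeeded(long totalDocs, long docsPerMachine) {
        if (docsPerMachine <= 0) {
            throw new IllegalArgumentException("docsPerMachine must be positive");
        }
        return (int) ((totalDocs + docsPerMachine - 1) / docsPerMachine);
    }
}
```

Ian's sweet spot of 5 machines for 1.6e7 Medline docs implies a per-machine capacity of roughly 3.2 million docs of that size; Phillip's 1 MB/doc corpus would have a far lower per-machine capacity, which is exactly why Mike insists the measurement be done on one's own data.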
Re: shards and performance
I have not seen any boost by having an index split into shards on the same machine. However, when you split it into smaller shards on different machines (cpu/ram/hdd), the performance boost is worth it. At least for building the index, the number of shards really does help. To index Medline (1.6e7 docs, which is 60GB in XML text) on a single machine starts at about 100 doc/s but slows down to 10 doc/s as the index grows. It seems as though the limit is reached once you run out of RAM, and it gets slower and slower in a linear fashion the larger the index gets. My sweet spot was 5 machines with 8GB RAM for indexing about 60GB of data. HDD speed helps with the initial rate, and I found modern cheap SATA drives that get 50-60MB/s ideal. SCSI is faster but costs more. So, for the money, you can add more shards instead of paying extra for SCSI. I also tried a RAID0 array of USB drives hoping the access speeds would help - but it didn't, and the performance was the same as it was for cheap SATA drives. However, it took me a few weeks of experimenting to find this. I can add more machines, and the index will get faster. However, the rate of adding docs (my slope) does not degrade while I am building the index with 5 machines. On Tue, Aug 19, 2008 at 2:47 PM, Mike Klaas <[EMAIL PROTECTED]> wrote: > On 19-Aug-08, at 10:18 AM, Phillip Farber wrote: > >> >> >> I'm trying to understand how splitting a monolithic index into shards >> improves query response time. Please tell me if I'm on the right track here. >> Where does the increase in performance come from? Is it that in-memory >> arrays are smaller when the index is partitioned into shards? Or is it due >> to the likelihood that the solr process behind each shard is running on its >> own CPU on a multi-CPU box? > > Usually, the performance is obtained by putting shards on separate machines. > However, I have had success partitioning an index on a single machine so > that a single query can be executed by multiple cpus. 
It also helps to have > each index on a different hard disk. > >> And it must be the case that the overhead of merging results from several >> shards is still less than the expense of searching a monolithic index. >> True? > > Merging overhead is relatively insignificant. Fetching stored fields from > more docs than necessary is an expense of sharding, however. > >> Given roughly 10 million documents in several languages inducing perhaps >> 200K unique terms and averaging about 1 MB/doc how many shards would you >> recommend and how much RAM? > > I'd never recommend more shards on a single machine than there are cpus. > For an index of that size, you will need at least 8GB of ram; 16GB would be > better. > >> Is it correct that Distributed Search (shards) is in 1.3 or does 1.2 >> support it? > > It is 1.3 only. > >> If 1.3, is the nightly build the best one to grab bearing in mind that we >> would want any protocols around distributed search to be as stable as >> possible? Or just wait for the 1.3 release? > > Go for the nightly build. The release will look very similar to it. > > -Mike > -- Regards, Ian Connor 1 Leighton St #605 Cambridge, MA 02141 Direct Line: +1 (978) 672 Call Center Phone: +1 (714) 239 3875 (24 hrs) Mobile Phone: +1 (312) 218 3209 Fax: +1(770) 818 5697 Suisse Phone: +41 (0) 22 548 1664 Skype: ian.connor
Re: Clarification on facets
A simple way is to query using debugQuery=true and parse the output: 0.74248177 = queryWeight(rawText:python), product of: 2.581456 = idf(docFreq=16017) 0.28762132 = queryNorm 0.4191762 = (MATCH) fieldWeight(rawText:python in 950285), product of: 5.196152 = tf(termFreq(rawText:python)=27) 2.581456 = idf(docFreq=16017) 0.03125 = fieldNorm(field=rawText, doc=950285) The =27 is the number of times 'python' appears in this document. You could also write a custom component that included this information in the response. -Mike On 18-Aug-08, at 8:16 PM, Gene Campbell wrote: Thank you for the response. Always nice to have someone willing to validate your thinking! Of course, if anyone has any ideas on how to get the number of times a term is repeated in a document, I'm all ears. cheers gene On Tue, Aug 19, 2008 at 1:42 PM, Norberto Meijome <[EMAIL PROTECTED]> wrote: On Tue, 19 Aug 2008 10:18:12 +1200 "Gene Campbell" <[EMAIL PROTECTED]> wrote: Is this interpreted as meaning, there are 10 documents that will match with 'car' in the title, and likewise 6 'boat' and 2 'bike'? Correct. If so, is there any way to get counts for the *number of times* a value is found in a document. I'm looking for a way to determine the number of times 'car' is repeated in the title, for example. Not sure - I would suggest that a field with a term repeated several times would receive a higher score when searching for that term, but not sure how you could get the information you seek... maybe with the Luke handler? (but on a per-document basis... slow...?) B _ {Beto|Norberto|Numard} Meijome Computers are like air conditioners; they can't do their job properly if you open windows. I speak for myself, not my employer. Contents may be hot. Slippery when wet. Reading disclaimers makes you go blind. Writing them is worse. You have been Warned.
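The parsing step Mike describes can be sketched like this (a hypothetical client-side helper, not an existing Solr API): pull the termFreq values out of the explain text with a regular expression.

```python
import re

def term_freqs(explain_text):
    """Extract per-term frequencies from Lucene explain output
    (debugQuery=true), e.g. 'tf(termFreq(rawText:python)=27)'
    -> {'rawText:python': 27}."""
    return {field_term: int(count)
            for field_term, count in
            re.findall(r"termFreq\(([^)]+)\)=(\d+)", explain_text)}

explain = "5.196152 = tf(termFreq(rawText:python)=27)"
term_freqs(explain)  # {'rawText:python': 27}
```

This is fragile by nature — explain output is meant for humans and its format is not a stable API — which is why a custom component returning the count directly would be the cleaner route.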
Re: shards and performance
On 19-Aug-08, at 10:18 AM, Phillip Farber wrote: I'm trying to understand how splitting a monolithic index into shards improves query response time. Please tell me if I'm on the right track here. Where does the increase in performance come from? Is it that in-memory arrays are smaller when the index is partitioned into shards? Or is it due to the likelihood that the solr process behind each shard is running on its own CPU on a multi-CPU box? Usually, the performance is obtained by putting shards on separate machines. However, I have had success partitioning an index on a single machine so that a single query can be executed by multiple cpus. It also helps to have each index on a different hard disk. And it must be the case that the overhead of merging results from several shards is still less than the expense of searching a monolithic index. True? Merging overhead is relatively insignificant. Fetching stored fields from more docs than necessary is an expense of sharding, however. Given roughly 10 million documents in several languages inducing perhaps 200K unique terms and averaging about 1 MB/doc, how many shards would you recommend and how much RAM? I'd never recommend more shards on a single machine than there are cpus. For an index of that size, you will need at least 8GB of ram; 16GB would be better. Is it correct that Distributed Search (shards) is in 1.3 or does 1.2 support it? It is 1.3 only. If 1.3, is the nightly build the best one to grab bearing in mind that we would want any protocols around distributed search to be as stable as possible? Or just wait for the 1.3 release? Go for the nightly build. The release will look very similar to it. -Mike
Re: Order of returned fields
I don't think so, as solr uses a flat index to represent data. I have made some effort towards representing relational data in a flat structure, but until now I don't have anything too concrete. My suggestion is: create classes that isolate the parsing strategy, so you can have DAOs that don't really know what is happening with the data they retrieve, and your domain classes would retrieve the data as they expect independently of the format you use to store them in the index. This is like having a pair of DAO+parser to retrieve the data in the middle tier. 2008/8/19 Pierre Auslaender <[EMAIL PROTECTED]> > Hi Alex, > > Do you think I could then specify an order on the returned fields for each > document, without reordering the fields by parsing the SOLR response ? > > Thanks, > Pierre > > Alexander Ramos Jardim a écrit : > > Hey Pierre, >> >> I don't know if my case helps you, but what I do to keep relational >> information is to put the related data all in the same field. >> >> Let me give you an example: >> >> I have a product index. Each product has a list of manufacturer >> properties, >> like dimensions, color, connections supported (usb, bluetooth and so on), >> etc etc etc. Each property belongs to a context, so I index data >> following >> this model: >> >> propertyId ^ propertyLabel ^ propertyType ^ propertyValue >> >> Then I parse each result returned on my application. >> >> Does that help you? >> >> 2008/8/18 Pierre Auslaender <[EMAIL PROTECTED]> >> >> >> >>> Order matters in my application because I'm indexing structured data - >>> actually, a domain object model (a bit like with Hibernate Search), only >>> I'm >>> adding parents to children, instead of children to parents. So say I have >>> Cities and People, with a 1-N relationship between City and People. I'm >>> indexing documents for Cities, and documents for People, and the >>> documents >>> for People contain the fields of the City they're living in. 
>>> >>> When I display the results, I'd like the People fields to display before >>> the City fields. I can parse the Solr response and rearrange the fields (in >>> the Java middle-tier, or with XSLT, or in the Javascript client), but >>> then I >>> have to "know" of the domain in too many places. I have to "know" of the >>> domain in my Java application, in the SOLR schema file, and in the >>> Javascript that rearranges the fields... I thought maybe I could avoid >>> the >>> latter and put as much application information as possible in the SOLR >>> schema, for instance specify an order for the returned fields... >>> >>> Thanks anyway, >>> >>> Pierre >>> >>> Erik Hatcher a écrit : >>> >>> Yes, this is normal behavior. >>> >>> Does order matter in your application? Could you explain why? Order is maintained with multiple values of the same field name, though - which is important. Erik On Aug 17, 2008, at 6:38 PM, Pierre Auslaender wrote: Hello, > After a Solr query, I always get the fields back in alphabetical order, > no matter how I insert them. > Is this the normal behaviour? > > This is when adding the document... > > ch.tsr.esg.domain.ProgramCollection[id: > 1] > collection > Bac à sable > > http://localhost:8080/esg/api/collections/1 > > > ... and this is when retrieving it: > > Bac à sable > > http://localhost:8080/esg/api/collections/1 > collection > ch.tsr.esg.domain.ProgramCollection[id: > 1] > > > Thanks a lot, > Pierre Auslaender > > > >>> >> >> >> > -- Alexander Ramos Jardim
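Alexander's delimiter-packed field (`propertyId ^ propertyLabel ^ propertyType ^ propertyValue`) implies a small parser on the client side, the "parser" half of his DAO+parser pair. A minimal sketch (the key names are illustrative, not from any real schema):

```python
def parse_property(field_value):
    """Split a delimiter-packed Solr field value back into a dict.
    Field layout (from the thread):
    propertyId ^ propertyLabel ^ propertyType ^ propertyValue"""
    parts = [p.strip() for p in field_value.split("^")]
    keys = ("id", "label", "type", "value")
    return dict(zip(keys, parts))

parse_property("42 ^ Color ^ string ^ red")
# {'id': '42', 'label': 'Color', 'type': 'string', 'value': 'red'}
```

Keeping this parsing in one place is exactly the isolation Alexander suggests: if the packed format changes, only the parser changes, not the domain classes.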
shards and performance
I'm trying to understand how splitting a monolithic index into shards improves query response time. Please tell me if I'm on the right track here. Where does the increase in performance come from? Is it that in-memory arrays are smaller when the index is partitioned into shards? Or is it due to the likelihood that the solr process behind each shard is running on its own CPU on a multi-CPU box? And it must be the case that the overhead of merging results from several shards is still less than the expense of searching a monolithic index. True? Given roughly 10 million documents in several languages inducing perhaps 200K unique terms and averaging about 1 MB/doc, how many shards would you recommend and how much RAM? Is it correct that Distributed Search (shards) is in 1.3 or does 1.2 support it? If 1.3, is the nightly build the best one to grab bearing in mind that we would want any protocols around distributed search to be as stable as possible? Or just wait for the 1.3 release? Thanks very much, Phil -- Phillip Farber - http://www.umdl.umich.edu
RE: Can I change "/select" to POST and not GET
Hi Ian, Thanks for the reply. I am using CURL, and the library was sending a GET request to solr. But I have changed it to POST. Now it's working properly. Thanks, Sunil -Original Message- From: Ian Connor [mailto:[EMAIL PROTECTED] Sent: Tuesday, August 19, 2008 7:53 PM To: solr-user@lucene.apache.org Subject: Re: Can I change "/select" to POST and not GET The query limit is a software imposed limit. What client are you using and can that be configured to allow more? On Tue, Aug 19, 2008 at 9:43 AM, Sunil <[EMAIL PROTECTED]> wrote: > Hi, > > My query limit is exceeding the 1024 URL length. Can I configure solr to > accept POST requests while searching content in solr? > > Thanks in advance, > Sunil. > > > -- Regards, Ian Connor
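For anyone hitting the same URL-length limit, the fix amounts to moving the query parameters from the URL into the request body. A sketch in Python (the base URL and query parameters are assumptions, not from the thread):

```python
from urllib.parse import urlencode
from urllib.request import Request

def select_request(base_url, params):
    """Build a POST request to Solr's /select so a long query travels in
    the request body instead of the URL, avoiding client- or
    server-imposed URL length limits (~1024 chars in this thread).
    base_url is an assumption, e.g. 'http://localhost:8983/solr'."""
    body = urlencode(params).encode("utf-8")
    return Request(base_url + "/select", data=body,
                   headers={"Content-Type": "application/x-www-form-urlencoded"})

# A query far too long for a 1024-char URL fits comfortably in the body:
req = select_request("http://localhost:8983/solr",
                     {"q": " OR ".join("id:%d" % i for i in range(500)),
                      "wt": "json"})
# req.get_method() == 'POST'; the long query is in req.data, not the URL
```

With the command-line curl client the equivalent change is using `-d` (which switches curl to POST) instead of putting the query string in the URL.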
Re: Can I change "/select" to POST and not GET
The query limit is a software imposed limit. What client are you using and can that be configured to allow more? On Tue, Aug 19, 2008 at 9:43 AM, Sunil <[EMAIL PROTECTED]> wrote: > Hi, > > My query limit is exceeding the 1024 URL length. Can I configure solr to > accept POST requests while searching content in solr? > > Thanks in advance, > Sunil. > > > -- Regards, Ian Connor
Can I change "/select" to POST and not GET
Hi, My query limit is exceeding the 1024 URL length. Can I configure solr to accept POST requests while searching content in solr? Thanks in advance, Sunil.
Re: which shard is a result coming from
Could this idea of a "computed field" actually just be a query filter? Can the filter just add a field on the return like this? On Tue, Aug 19, 2008 at 9:10 AM, Ian Connor <[EMAIL PROTECTED]> wrote: > I was thinking more that it would be an extra field you get back. My > understanding of doing updates requires: > > 1. get your document (either by ID or from a search) > 2. merge your update into the doc > 3. update solr with the doc (which essentially just writes it all > again but as you have done the merge nothing is lost). > > for shards, i would read from the main shard but write directly back > to the shard directly. The idea is that you don't need to concern the > main server with an update to a child shard (unless this direct bypass > is dangerous somehow). > > So finding out which shard it came from on the initial "get" is key to > know where to send the merged document. > > Some sort of "computed field" would work here. Something that is not > actually in the index but is returned. The indexing and storing of the > value is not needed as you can always filter which shards you want > when creating the query. > > On Tue, Aug 19, 2008 at 8:59 AM, Brian Whitman <[EMAIL PROTECTED]> wrote: >> >> On Aug 19, 2008, at 8:49 AM, Ian Connor wrote: >> >>> What is the current "special requestHandler" that you can set currently? >> >> If you're referring to my issue post, that's just something we have >> internally (not in trunk solr) that we use instead of /update -- it just >> inserts a hostname:port/solr into the incoming >> XML doc add stream. Not very clean but it works. Use lars's patch. >> >> >> >> > > > > -- > Regards, > > Ian Connor > -- Regards, Ian Connor
Re: Order of returned fields
Hi Alex, Do you think I could then specify an order on the returned fields for each document, without reordering the fields by parsing the SOLR response ? Thanks, Pierre Alexander Ramos Jardim a écrit : Hey Pierre, I don't know if my case helps you, but what I do to keep relational information is to put the related data all in the same field. Let me give you an example: I have a product index. Each product has a list of manufacturer properties, like dimensions, color, connections supported (usb, bluetooth and so on), etc etc etc. Each property belongs to a context, so I index data following this model: propertyId ^ propertyLabel ^ propertyType ^ propertyValue Then I parse each result returned on my application. Does that help you? 2008/8/18 Pierre Auslaender <[EMAIL PROTECTED]> Order matters in my application because I'm indexing structured data - actually, a domain object model (a bit like with Hibernate Search), only I'm adding parents to children, instead of children to parents. So say I have Cities and People, with a 1-N relationship between City and People. I'm indexing documents for Cities, and documents for People, and the documents for People contain the fields of the City they're living in. When I display the results, I'd like the People fields to display before the City fields. I can parse the Solr response and rearrange the fields (in the Java middle-tier, or with XSLT, or in the Javascript client), but then I have to "know" of the domain in too many places. I have to "know" of the domain in my Java application, in the SOLR schema file, and in the Javascript that rearranges the fields... I thought maybe I could avoid the latter and put as much application information as possible in the SOLR schema, for instance specify an order for the returned fields... Thanks anyway, Pierre Erik Hatcher a écrit : Yes, this is normal behavior. Does order matter in your application? Could you explain why? 
Order is maintained with multiple values of the same field name, though - which is important. Erik On Aug 17, 2008, at 6:38 PM, Pierre Auslaender wrote: Hello, After a Solr query, I always get the fields back in alphabetical order, no matter how I insert them. Is this the normal behaviour? This is when adding the document... ch.tsr.esg.domain.ProgramCollection[id: 1] collection Bac à sable http://localhost:8080/esg/api/collections/1 ... and this is when retrieving it: Bac à sable http://localhost:8080/esg/api/collections/1 collection ch.tsr.esg.domain.ProgramCollection[id: 1] Thanks a lot, Pierre Auslaender
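Since Solr returns fields alphabetically and offers no server-side ordering control, the rearranging discussed in this thread has to happen client-side. A minimal sketch (the field names are invented for illustration):

```python
def reorder_fields(doc, preferred_order):
    """Rearrange a Solr result document (a dict) so fields listed in
    preferred_order come first; any remaining fields keep Solr's
    alphabetical order. Purely client-side reordering."""
    ordered = {k: doc[k] for k in preferred_order if k in doc}
    # append the rest, alphabetically, skipping fields already placed
    ordered.update((k, v) for k, v in sorted(doc.items()) if k not in ordered)
    return ordered

doc = {"city_name": "Geneva", "person_name": "Ana", "id": "7"}
list(reorder_fields(doc, ["person_name", "id"]))
# ['person_name', 'id', 'city_name']
```

Keeping the preferred order in one configuration list limits the "knowing the domain in too many places" problem Pierre describes: the display order lives in exactly one spot.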
Re: which shard is a result coming from
I was thinking more that it would be an extra field you get back. My understanding of doing updates requires: 1. get your document (either by ID or from a search) 2. merge your update into the doc 3. update solr with the doc (which essentially just writes it all again but as you have done the merge nothing is lost). For shards, I would read from the main shard but write back to the shard directly. The idea is that you don't need to concern the main server with an update to a child shard (unless this direct bypass is dangerous somehow). So finding out which shard it came from on the initial "get" is key to know where to send the merged document. Some sort of "computed field" would work here. Something that is not actually in the index but is returned. The indexing and storing of the value is not needed as you can always filter which shards you want when creating the query. On Tue, Aug 19, 2008 at 8:59 AM, Brian Whitman <[EMAIL PROTECTED]> wrote: > > On Aug 19, 2008, at 8:49 AM, Ian Connor wrote: > >> What is the current "special requestHandler" that you can set currently? > > If you're referring to my issue post, that's just something we have > internally (not in trunk solr) that we use instead of /update -- it just > inserts a hostname:port/solr into the incoming > XML doc add stream. Not very clean but it works. Use lars's patch. > > > > -- Regards, Ian Connor
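Step 2 of the read-merge-write cycle above is the easy part to get wrong: re-adding a document replaces it entirely, so forgetting to carry over unchanged stored fields silently drops them. A sketch of the merge step (plain dicts standing in for Solr documents):

```python
def merge_update(fetched_doc, changes):
    """Merge changed fields into a fetched document before re-adding it,
    so untouched stored fields survive the full-document replace
    (Solr at this point has no partial-update support)."""
    merged = dict(fetched_doc)   # keep every stored field
    merged.update(changes)       # overlay only the edited fields
    return merged

doc = {"id": "d1", "title": "old", "body": "unchanged"}
merge_update(doc, {"title": "new"})
# {'id': 'd1', 'title': 'new', 'body': 'unchanged'}
```

The merged document is then posted back to /update on the owning shard, which is why knowing which shard the document came from matters.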
Re: which shard is a result coming from
On Aug 19, 2008, at 8:49 AM, Ian Connor wrote: What is the current "special requestHandler" that you can set currently? If you're referring to my issue post, that's just something we have internally (not in trunk solr) that we use instead of /update -- it just inserts a hostname:port/solr into the incoming XML doc add stream. Not very clean but it works. Use lars's patch.
Re: which shard is a result coming from
What is the current "special requestHandler" that you can set currently? On Tue, Aug 19, 2008 at 8:41 AM, Shalin Shekhar Mangar <[EMAIL PROTECTED]> wrote: > There's an issue open for this. Look at > https://issues.apache.org/jira/browse/SOLR-705 > > On Tue, Aug 19, 2008 at 6:08 PM, Ian Connor <[EMAIL PROTECTED]> wrote: >> Hi, >> >> Is there a way to know which shard contains a given result. This would >> help when you want to write updates back to the correct place. >> >> The idea is when you read your results, there would be an item to say >> where a given result came from. >> >> -- >> Regards, >> >> Ian Connor >> > > > > -- > Regards, > Shalin Shekhar Mangar. > -- Regards, Ian Connor
Re: which shard is a result coming from
There's an issue open for this. Look at https://issues.apache.org/jira/browse/SOLR-705 On Tue, Aug 19, 2008 at 6:08 PM, Ian Connor <[EMAIL PROTECTED]> wrote: > Hi, > > Is there a way to know which shard contains a given result. This would > help when you want to write updates back to the correct place. > > The idea is when you read your results, there would be an item to say > where a given result came from. > > -- > Regards, > > Ian Connor > -- Regards, Shalin Shekhar Mangar.
which shard is a result coming from
Hi, Is there a way to know which shard contains a given result? This would help when you want to write updates back to the correct place. The idea is that when you read your results, there would be an item to say where a given result came from. -- Regards, Ian Connor
solr-ruby version management
From: http://www.nabble.com/CHANGES.txt-td18901774.html The latest version of solr-ruby is 0.0.6: solr-ruby-0.0.6.gem http://rubyforge.org/frs/?group_id=2875&release_id=23885 I think it isn't clear which Solr version it corresponds to. I'd like to change this to solr-ruby-{solrVersion}.{solr-rubyVersion}.gem when Solr 1.3 is released, where solr-rubyVersion is two digits. That is, the first official release of solr-ruby will be solr-ruby-1.3.0.01.gem. Any objections to changing to this new version format? If anyone has suggestions, please let me know. Koji
Re: Solr won't start under jetty on RHEL5.2
On Tue, Aug 19, 2008 at 4:50 AM, Jon Drukman <[EMAIL PROTECTED]> wrote: > Jon Drukman wrote: > >> I just migrated my solr instance to a new server, running RHEL5.2. I >> installed java from yum but I suspect it's different from the one I used to >> use. >> > > > Turns out my instincts were correct. The version from yum does not work. I > installed the official sun jdk and now it starts fine. > > bad: > > java version "1.4.2" > gij (GNU libgcj) version 4.1.2 20071124 (Red Hat 4.1.2-42) > > good: > > java version "1.6.0_07" > Java(TM) SE Runtime Environment (build 1.6.0_07-b06) > Java HotSpot(TM) 64-Bit Server VM (build 10.0-b23, mixed mode) > Probably because Solr is compiled with Java 5. AFAIK, gcj does not support Java 5 features fully. -- Regards, Shalin Shekhar Mangar.
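A quick way to guard against this class of problem is to sanity-check the `java -version` output before starting Solr. A heuristic sketch (the sample version strings below are the ones reported in this thread):

```python
import re

def is_suitable_jvm(version_output):
    """Heuristic check of `java -version` output: reject GNU gij/gcj
    (which lacks full Java 5 support, so Solr won't start) and
    require Java 1.5 or later."""
    text = version_output.lower()
    if "gij" in text or "gcj" in text:
        return False
    m = re.search(r'version "1\.(\d+)', version_output)
    return bool(m) and int(m.group(1)) >= 5

bad = 'java version "1.4.2"\ngij (GNU libgcj) version 4.1.2 20071124 (Red Hat 4.1.2-42)'
good = 'java version "1.6.0_07"\nJava(TM) SE Runtime Environment (build 1.6.0_07-b06)'
# is_suitable_jvm(bad) -> False; is_suitable_jvm(good) -> True
```

The version-string pattern here only covers the `1.x` numbering scheme of the era; it is a sketch, not a general JVM detector.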