Re: Deadlock in lucene?

2008-08-19 Thread Matthew Runo
Ouch, that's certainly a problem! I'll have to think some more on this  
one.


Thanks for your time!

Matthew Runo
Software Engineer, Zappos.com
[EMAIL PROTECTED] - 702-943-7833

On Aug 19, 2008, at 1:42 PM, Otis Gospodnetic wrote:

Matthew, just because an index is read-only on some server doesn't mean
it contains no deletes (docs marked as deleted, but not yet removed from
the index).  So you still want to check isDeleted(doc) *unless* you are
certain the index has no docs marked as deleted (which is the case after
optimization).


Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 

From: Matthew Runo <[EMAIL PROTECTED]>
To: solr-user@lucene.apache.org
Sent: Tuesday, August 19, 2008 4:26:59 PM
Subject: Re: Deadlock in lucene?

I know this isn't really the place for this, so please forgive me -
but does this patch look reasonably safe to use to skip the isDeleted
check inside of FunctionQuery?

My reasoning behind this is that many people (us included) will be
building the index on a separate server, and then using the
replication scripts to publish the files out to several read-only
servers. On those instances, deletedDocs would always be empty, since
it's a read only instance - and so we can conveniently skip the Lucene
code in question. This flag would also be good for other optimizations
that can only be made when you assume the index is read-only.

Solr seems to work with the flag set - any reasons why this will crash
and/or kill my kitten?

(please forgive my posting this here instead of in solr-dev!)

Index: src/java/org/apache/solr/search/FunctionQParser.java
===================================================================
--- src/java/org/apache/solr/search/FunctionQParser.java	(revision 687135)
+++ src/java/org/apache/solr/search/FunctionQParser.java	Tue Aug 19 11:08:45 PDT 2008
@@ -49,7 +49,7 @@
     }
     ***/

-    return new FunctionQuery(vs);
+    return new FunctionQuery(vs, req.getSchema().getSolrConfig().isReadOnly() );
   }

   /**
Index: src/java/org/apache/solr/search/function/FunctionQuery.java
===================================================================
--- src/java/org/apache/solr/search/function/FunctionQuery.java	(revision 687135)
+++ src/java/org/apache/solr/search/function/FunctionQuery.java	Tue Aug 19 11:08:45 PDT 2008
@@ -31,12 +31,14 @@
  */
 public class FunctionQuery extends Query {
   ValueSource func;
+  Boolean readOnly;

   /**
    * @param func defines the function to be used for scoring
    */
-  public FunctionQuery(ValueSource func) {
+  public FunctionQuery(ValueSource func, Boolean readOnly) {
     this.func=func;
+    this.readOnly=readOnly;
   }

   /** @return The associated ValueSource */
@@ -113,7 +115,7 @@
       if (doc>=maxDoc) {
         return false;
       }
-      if (reader.isDeleted(doc)) continue;
+      if (!readOnly && reader.isDeleted(doc)) continue;
       // todo: maybe allow score() to throw a specific exception
       // and continue on to the next document if it is thrown...
       // that may be useful, but exceptions aren't really good
Index: src/java/org/apache/solr/core/Config.java
===================================================================
--- src/java/org/apache/solr/core/Config.java	(revision 687135)
+++ src/java/org/apache/solr/core/Config.java	Tue Aug 19 11:08:45 PDT 2008
@@ -45,6 +45,8 @@
   private final String name;
   private final SolrResourceLoader loader;

+  private Boolean readOnly;
+
   /**
    * @deprecated Use {@link #Config(SolrResourceLoader, String, InputStream, String)} instead.
    */
@@ -254,6 +256,19 @@
     return val!=null ? Double.parseDouble(val) : def;
   }

+  /**
+   * Is the index set up to be readOnly? If so, this will cause the FunctionQuery stuff to not check
+   * for deleted documents.
+   * @return boolean readOnly
+   */
+  public boolean isReadOnly() {
+    if( this.readOnly == null ){
+      readOnly = getBool("/mainIndex/readOnly", false);
+    }
+
+    return readOnly;
+  }
+
   // The following functions were moved to ResourceLoader
   //-----------------------------------------------------------------------------
Index: example/solr/conf/solrconfig.xml
===================================================================
--- example/solr/conf/solrconfig.xml	(revision 687135)
+++ example/solr/conf/solrconfig.xml	Tue Aug 19 11:13:13 PDT 2008
@@ -114,6 +114,12 @@
        This is not needed if lock type is 'none' or 'single'
    -->
    <unlockOnStartup>false</unlockOnStartup>
+
+   <!-- ... -->
+   <readOnly>false</readOnly>
  </mainIndex>

--- end patch ---

On Aug 18, 2008, at 8:04 PM, Yonik Seeley wrote:

It's not a deadlock (just a synchronization bottleneck), but it is a
known issue in Lucene and there has been some progress in improving
the situation.
-Yonik


On Mon, Aug 18, 2008 at 10:55 PM, Matthew Runo
wrote:

Hello folks!

I was just wondering if anyone else has seen this issue under heavy
load. W

Re: Deadlock in lucene?

2008-08-19 Thread Otis Gospodnetic
Matthew, just because an index is read-only on some server doesn't mean it 
contains no deletes (docs marked as deleted, but not yet removed from the 
index).  So you still want to check isDeleted(doc) *unless* you are certain the 
index has no docs marked as deleted (which is the case after optimization).

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Matthew Runo <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Tuesday, August 19, 2008 4:26:59 PM
> Subject: Re: Deadlock in lucene?
> 
> I know this isn't really the place for this, so please forgive me -  
> but does this patch look reasonably safe to use to skip the isDeleted  
> check inside of FunctionQuery?
> 
> My reasoning behind this is that many people (us included) will be  
> building the index on a separate server, and then using the  
> replication scripts to publish the files out to several read-only  
> servers. On those instances, deletedDocs would always be empty, since  
> it's a read only instance - and so we can conveniently skip the Lucene  
> code in question. This flag would also be good for other optimizations  
> that can only be made when you assume the index is read-only.
> 
> Solr seems to work with the flag set - any reasons why this will crash  
> and/or kill my kitten?
> 
> (please forgive my posting this here instead of in solr-dev!)
> 
> Index: src/java/org/apache/solr/search/FunctionQParser.java
> ===
> --- src/java/org/apache/solr/search/FunctionQParser.java(revision  
> 687135)
> +++ src/java/org/apache/solr/search/FunctionQParser.javaTue Aug 19  
> 11:08:45 PDT 2008
> @@ -49,7 +49,7 @@
>   }
>   ***/
> 
> -return new FunctionQuery(vs);
> +return new FunctionQuery(vs,  
> req.getSchema().getSolrConfig().isReadOnly() );
> }
> 
> /**
> Index: src/java/org/apache/solr/search/function/FunctionQuery.java
> ===
> --- src/java/org/apache/solr/search/function/FunctionQuery.java
> (revision 687135)
> +++ src/java/org/apache/solr/search/function/FunctionQuery.javaTue  
> Aug 19 11:08:45 PDT 2008
> @@ -31,12 +31,14 @@
>*/
>   public class FunctionQuery extends Query {
> ValueSource func;
> +  Boolean readOnly;
> 
> /**
>  * @param func defines the function to be used for scoring
>  */
> -  public FunctionQuery(ValueSource func) {
> +  public FunctionQuery(ValueSource func, Boolean readOnly) {
>   this.func=func;
> +this.readOnly=readOnly;
> }
> 
> /** @return The associated ValueSource */
> @@ -113,7 +115,7 @@
>   if (doc>=maxDoc) {
> return false;
>   }
> -if (reader.isDeleted(doc)) continue;
> +if (!readOnly && reader.isDeleted(doc)) continue;
>   // todo: maybe allow score() to throw a specific exception
>   // and continue on to the next document if it is thrown...
>   // that may be useful, but exceptions aren't really good
> Index: src/java/org/apache/solr/core/Config.java
> ===
> --- src/java/org/apache/solr/core/Config.java(revision 687135)
> +++ src/java/org/apache/solr/core/Config.javaTue Aug 19 11:08:45 PDT  
> 2008
> @@ -45,6 +45,8 @@
> private final String name;
> private final SolrResourceLoader loader;
> 
> +  private Boolean readOnly;
> +
> /**
>  * @deprecated Use {@link #Config(SolrResourceLoader, String,  
> InputStream, String)} instead.
>  */
> @@ -254,6 +256,19 @@
>return val!=null ? Double.parseDouble(val) : def;
>  }
> 
> +  /**
> +   * Is the index set up to be readOnly? If so, this will cause the  
> FunctionQuery stuff to not check
> +   * for deleted documents.
> +   * @return boolean readOnly
> +   */
> +   public boolean isReadOnly() {
> +   if( this.readOnly == null ){
> +   readOnly = getBool("/mainIndex/readOnly", false);
> +   }
> +
> +   return readOnly;
> +   }
> +
> // The following functions were moved to ResourceLoader
> 
> //-
> 
> Index: example/solr/conf/solrconfig.xml
> ===
> --- example/solr/conf/solrconfig.xml(revision 687135)
> +++ example/solr/conf/solrconfig.xmlTue Aug 19 11:13:13 PDT 2008
> @@ -114,6 +114,12 @@
>This is not needed if lock type is 'none' or 'single'
>-->
>   <unlockOnStartup>false</unlockOnStartup>
> +
> +<!-- ... -->
> +<readOnly>false</readOnly>
>   </mainIndex>
> 
> 
> 
> --- end patch ---
> 
> On Aug 18, 2008, at 8:04 PM, Yonik Seeley wrote:
> 
> > It's not a deadlock (just a synchronization bottleneck), but it is a
> > known issue in Lucene and there has been some progress in improving
> > the situation.
> > -Yonik
> >
> >
> > On Mon, Aug 18, 2008 at 10:55 PM, Matthew R

Re: shards and performance

2008-08-19 Thread Alexander Ramos Jardim
As long as Solr/Lucene makes smart use of memory (and in my experience they
do), it is really easy to estimate how long a huge query/update will take
once you know how long the smaller ones take. Just keep in mind that memory
and disk space consumption is almost always proportional.
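
A back-of-envelope sketch of that extrapolation, using Ian Connor's Medline
figures from elsewhere in this digest as assumed inputs, and assuming the
indexing rate decays roughly linearly over the course of the run (so the mean
rate is the midpoint):

// Hedged estimate, not a benchmark: all three inputs are assumptions
// taken from this thread.
public class IndexTimeEstimate {
  public static void main(String[] args) {
    double startRate = 100.0;   // doc/s on an empty index
    double endRate   = 10.0;    // doc/s once the index is large
    double totalDocs = 1.6e7;   // Medline
    double meanRate  = (startRate + endRate) / 2;  // linear decay in time
    System.out.printf("~%.0f hours%n", totalDocs / meanRate / 3600);
  }
}

(About 81 hours with these numbers; the point is only that small measured
runs extrapolate.)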

2008/8/19 Mike Klaas <[EMAIL PROTECTED]>

>
> On 19-Aug-08, at 12:58 PM, Phillip Farber wrote:
>
>>
>> So your experience differs from Mike's.  Obviously it's an important
>> decision as to whether to buy more machines.  Can you (or Mike) weigh in on
>> what factors led to your different take on local shards vs. shards
>> distributed across machines?
>>
>
> I do both; the only reason I have two shards on each machine is to squeeze
> maximum performance out of an equipment budget.  Err on the side of multiple
> machines.
>
>  At least for building the index, the number of shards really does
>>> help. To index Medline (1.6e7 docs which is 60Gb in XML text) on a
>>> single machine starts at about 100doc/s but slows down to 10doc/s when
>>> the index grows. It seems as though the limit is reached once you run
>>> out of RAM and it gets slower and slower in a linear fashion the
>>> larger the index gets.
>>> My sweet spot was 5 machines with 8GB RAM for indexing about 60GB of
>>> data.
>>>
>>
>> Can you say what the specs were for these machines? Given that I have more
>> like 1TB of data over 1M docs how do you think my machine requirements might
>> be affected as compared to yours?
>>
>
> You are in a much better position to determine this than we are.  See how
> big an index you can put on a single machine while maintaining acceptable
> performance using a typical query load.  It's relatively safe to extrapolate
> linearly from that.
>
> -Mike
>



-- 
Alexander Ramos Jardim


Re: Deadlock in lucene?

2008-08-19 Thread Fuad Efendi


I don't think it will help; for instance, Lucene's SegmentReader has:

public synchronized Document document(int n, FieldSelector fieldSelector)


Unsynchronized SOLR caching (in the future) should help.

-Fuad
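
A rough illustration of that bottleneck and of the caching idea: concurrent
threads serialize on the synchronized document() call, so an application-side
cache in front of it keeps most fetches off the reader entirely. The LRU
wrapper below is invented for the sketch; Solr's own document cache plays
this role.

import java.io.IOException;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.Map;
import org.apache.lucene.document.Document;
import org.apache.lucene.index.IndexReader;

class CachingDocFetcher {
  private final IndexReader reader;
  private final Map<Integer, Document> cache;

  CachingDocFetcher(IndexReader reader, final int maxSize) {
    this.reader = reader;
    // access-ordered LinkedHashMap evicting the eldest entry: a minimal LRU
    this.cache = Collections.synchronizedMap(
        new LinkedHashMap<Integer, Document>(16, 0.75f, true) {
          protected boolean removeEldestEntry(Map.Entry<Integer, Document> e) {
            return size() > maxSize;
          }
        });
  }

  Document document(int n) throws IOException {
    Document d = cache.get(n);
    if (d == null) {
      d = reader.document(n);   // the synchronized hot spot Fuad points at
      cache.put(n, d);
    }
    return d;
  }
}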



I know this isn't really the place for this, so please forgive me - but
does this patch look reasonably safe to use to skip the isDeleted check
inside of FunctionQuery?

My reasoning behind this is that many people (us included) will be
building the index on a separate server, and then using the replication
scripts to publish the files out to several read-only servers. On those
instances, deletedDocs would always be empty, since it's a read only
instance - and so we can conveniently skip the Lucene code in question.
This flag would also be good for other optimizations that can only be
made when you assume the index is read-only.

Solr seems to work with the flag set - any reasons why this will crash
and/or kill my kitten?

(please forgive my posting this here instead of in solr-dev!)

Index: src/java/org/apache/solr/search/FunctionQParser.java

...




Re: Deadlock in lucene?

2008-08-19 Thread Yonik Seeley
FYI, I just slipped this optimization into trunk.

-Yonik

On Tue, Aug 19, 2008 at 4:37 PM, Yonik Seeley <[EMAIL PROTECTED]> wrote:
> It doesn't matter that it's executed on the read-only server... it
> matters if any of the docs are marked as deleted.   That's the
> condition that you probably want to check for.
>
> -Yonik
>
> On Tue, Aug 19, 2008 at 4:26 PM, Matthew Runo <[EMAIL PROTECTED]> wrote:
>> I know this isn't really the place for this, so please forgive me - but does
>> this patch look reasonably safe to use to skip the isDeleted check inside of
>> FunctionQuery?
>>
>> My reasoning behind this is that many people (us included) will be building
>> the index on a separate server, and then using the replication scripts to
>> publish the files out to several read-only servers. On those instances,
>> deletedDocs would always be empty, since it's a read only instance - and so
>> we can conveniently skip the Lucene code in question. This flag would also
>> be good for other optimizations that can only be made when you assume the
>> index is read-only.
>>
>> Solr seems to work with the flag set - any reasons why this will crash
>> and/or kill my kitten?
>>
>> (please forgive my posting this here instead of in solr-dev!)
>>
>> Index: src/java/org/apache/solr/search/FunctionQParser.java
>> ===
>> --- src/java/org/apache/solr/search/FunctionQParser.java(revision
>> 687135)
>> +++ src/java/org/apache/solr/search/FunctionQParser.javaTue Aug 19
>> 11:08:45 PDT 2008
>> @@ -49,7 +49,7 @@
>> }
>> ***/
>>
>> -return new FunctionQuery(vs);
>> +return new FunctionQuery(vs,
>> req.getSchema().getSolrConfig().isReadOnly() );
>>   }
>>
>>   /**
>> Index: src/java/org/apache/solr/search/function/FunctionQuery.java
>> ===
>> --- src/java/org/apache/solr/search/function/FunctionQuery.java (revision
>> 687135)
>> +++ src/java/org/apache/solr/search/function/FunctionQuery.java Tue Aug 19
>> 11:08:45 PDT 2008
>> @@ -31,12 +31,14 @@
>>  */
>>  public class FunctionQuery extends Query {
>>   ValueSource func;
>> +  Boolean readOnly;
>>
>>   /**
>>* @param func defines the function to be used for scoring
>>*/
>> -  public FunctionQuery(ValueSource func) {
>> +  public FunctionQuery(ValueSource func, Boolean readOnly) {
>> this.func=func;
>> +this.readOnly=readOnly;
>>   }
>>
>>   /** @return The associated ValueSource */
>> @@ -113,7 +115,7 @@
>> if (doc>=maxDoc) {
>>   return false;
>> }
>> -if (reader.isDeleted(doc)) continue;
>> +if (!readOnly && reader.isDeleted(doc)) continue;
>> // todo: maybe allow score() to throw a specific exception
>> // and continue on to the next document if it is thrown...
>> // that may be useful, but exceptions aren't really good
>> Index: src/java/org/apache/solr/core/Config.java
>> ===
>> --- src/java/org/apache/solr/core/Config.java   (revision 687135)
>> +++ src/java/org/apache/solr/core/Config.java   Tue Aug 19 11:08:45 PDT 2008
>> @@ -45,6 +45,8 @@
>>   private final String name;
>>   private final SolrResourceLoader loader;
>>
>> +  private Boolean readOnly;
>> +
>>   /**
>> +   * @deprecated Use {@link #Config(SolrResourceLoader, String, InputStream,
>> String)} instead.
>>*/
>> @@ -254,6 +256,19 @@
>>  return val!=null ? Double.parseDouble(val) : def;
>>}
>>
>> +  /**
>> +   * Is the index set up to be readOnly? If so, this will cause the
>> FunctionQuery stuff to not check
>> +   * for deleted documents.
>> +   * @return boolean readOnly
>> +   */
>> +   public boolean isReadOnly() {
>> +   if( this.readOnly == null ){
>> +   readOnly = getBool("/mainIndex/readOnly", false);
>> +   }
>> +
>> +   return readOnly;
>> +   }
>> +
>>   // The following functions were moved to ResourceLoader
>>
>> //-
>>
>> Index: example/solr/conf/solrconfig.xml
>> ===
>> --- example/solr/conf/solrconfig.xml(revision 687135)
>> +++ example/solr/conf/solrconfig.xmlTue Aug 19 11:13:13 PDT 2008
>> @@ -114,6 +114,12 @@
>>  This is not needed if lock type is 'none' or 'single'
>>  -->
>> <unlockOnStartup>false</unlockOnStartup>
>> +
>> +   <!-- ... -->
>> +   <readOnly>false</readOnly>
>>   </mainIndex>
>>
>> On Aug 18, 2008, at 8:04 PM, Yonik Seeley wrote:
>>
>>> It's not a deadlock (just a synchronization bottleneck), but it is a
>>> known issue in Lucene and there has been some progress in improving
>>> the situation.
>>> -Yonik
>>>
>>>
>>> On Mon, Aug 18, 2008 at 10:55 PM, Matthew Runo <[EMAIL PROTECTED]> wrote:

 Hello folks!

 I was just wondering if anyone else has seen this issue under heavy load.
 We
 had some servers set to very high th

Re: Localisation, faceting

2008-08-19 Thread Otis Gospodnetic
Solr has pluggable query parsers, but the default one is the Lucene one, so I'd 
make use of Lucene's QueryParser.

Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
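
A minimal sketch of the client-side translation Walter suggests below. The
French operator set is illustrative only, and a real version would have to
leave quoted phrases untouched:

public class OperatorLocalizer {
  /** Map localized boolean operators onto the standard Lucene ones. */
  public static String toStandard(String q) {
    return q.replaceAll("\\bET\\b", "AND")
            .replaceAll("\\bOU\\b", "OR")
            .replaceAll("\\bSAUF\\b", "NOT");   // SAUF is an assumed mapping
  }

  public static void main(String[] args) {
    System.out.println(toStandard("chat ET chien OU lapin"));
    // prints: chat AND chien OR lapin
  }
}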



- Original Message 
> From: Pierre Auslaender <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org
> Sent: Monday, August 18, 2008 6:08:47 PM
> Subject: Re: Localisation, faceting
> 
> Excellent point about the saved queries. Thanks! So I could sniff the 
> locale (from the HTML page or the Java application,...) and infer the 
> "query language", or try to do automatic "guessing" of the language 
> based on the operator names (if they don't collide with indexed terms).
> 
> This brings up another question: which query parser should I use? I 
> guess it would be a bad idea to invent one, it would be better to reuse 
> or adapt "the" query parser used by SOLR - or is it Lucene? Can you 
> point me to the parser?
> 
> Thanks,
> Pierre
> 
> Walter Underwood wrote:
> > I would do it in the client, even if it meant parsing the query,
> > modifying it, then unparsing it.
> >
> > This is exactly like changing "To:" to "Zu:" in a mail header.
> > Show that in the client, but make it standard before it goes
> > onto the network.
> >
> > If queries at the Solr/Lucene level are standard, then users
> > with different locale settings could share saved queries.
> >
> > wunder
> >
> > On 8/18/08 2:18 PM, "Pierre Auslaender" wrote:
> >
> >  
> >> Would that be of any interest to the SOLR / Lucene community, given the
> >> trend to globalisation / regionalisation ? My base is Switzerland - 4
> >> official national tongues, none of them English.
> >>
> >> If one were to localise the boolean operators, would that have to be at
> >> the Lucene level, or could that be done at the SOLR level ?
> >>
> >> Thanks,
> >> Pierre
> >>
> >> Otis Gospodnetic wrote:
> >>
> >>> Hi,
> >>>
> >>> Regarding Boolean operator localization -- there was a person who 
> >>> submitted
> >>> patches for the same functionality, but for Lucene's QueryParser.  This 
> >>> was 
> a
> >>> few years ago.  I think his patch was never applied.  Perhaps that helps.
> >>>
> >>> Otis
> >>> --
> >>> Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch
> >>>
> >>>
> >>>
> >>> - Original Message 
> >>>  
> >>>  
>  From: Pierre Auslaender 
>  To: solr-user@lucene.apache.org
>  Sent: Saturday, August 16, 2008 12:50:53 PM
>  Subject: Localisation, faceting
> 
>  Hello,
> 
>  I have a couple of questions:
> 
>  1/ Is it possible to localise query operator names without writing code?
>  For instance, I'd like to issue queries with French operator names, e.g.
>  ET (instead of AND), OU (instead of OR), etc.
> 
>  2/ Is it possible for Solr to generate, in the XML response, the URLs or
>  complete queries for each facet in a faceted search?
> 
>  Here's an example. Say my first query is :
>  
> http://localhost:8080/solr/select?q=bac&facet=true&facet.field=kind&facet.li
>  mit=-1
> 
>  The "kind" field has three values: material, immaterial, time. I get
>  back something like this:
> 
> 
> 
> 
> 
>  1024
>  27633
>  389
> 
> 
> 
> 
>  If I want to drill down into one facet, say into "material", I have to
>  "manually" rebuild a query like this:
>  
> http://localhost:8080/solr/select?q=bac&facet=true&facet.field=kind&facet.li
>  mit=-1&fq=kind:"material"
> 
>  It's not too difficult, but surely Solr could add this URL or query
>  string under the "material" element. Is this possible? Or do I have to
>  XSLT the result myself?
> 
>  Thanks,
> 
>  Pierre Auslaender
> 
> 
> >>>  
> >>>  
> >
> >
> >  
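
A minimal sketch of the drill-down Pierre rebuilds "manually" above: append
an fq for the chosen facet value to the original facet query. The URL and
field name come straight from his example:

import java.net.URLEncoder;

public class FacetDrillDown {
  public static void main(String[] args) throws Exception {
    String base = "http://localhost:8080/solr/select"
        + "?q=bac&facet=true&facet.field=kind&facet.limit=-1";
    // drill into the "material" facet value
    String drill = base + "&fq=" + URLEncoder.encode("kind:\"material\"", "UTF-8");
    System.out.println(drill);
  }
}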



Re: Deadlock in lucene?

2008-08-19 Thread Yonik Seeley
It doesn't matter that it's executed on the read-only server... it
matters if any of the docs are marked as deleted.   That's the
condition that you probably want to check for.

-Yonik
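
A minimal sketch of checking that condition instead, not the change that went
into trunk: Lucene's IndexReader already reports whether anything is marked
deleted, so the per-document test can be guarded once per reader. The cursor
class is invented for the sketch; hasDeletions(), isDeleted(int) and
maxDoc() are all existing IndexReader methods of this era.

import org.apache.lucene.index.IndexReader;

class DeletionAwareCursor {
  private final IndexReader reader;
  private final boolean hasDeletions;  // computed once per reader
  private int doc = -1;

  DeletionAwareCursor(IndexReader reader) {
    this.reader = reader;
    this.hasDeletions = reader.hasDeletions();
  }

  /** Advance to the next non-deleted doc; false when exhausted. */
  boolean next() {
    final int maxDoc = reader.maxDoc();
    while (++doc < maxDoc) {
      // only pay for isDeleted() when some deletions actually exist
      if (hasDeletions && reader.isDeleted(doc)) continue;
      return true;
    }
    return false;
  }
}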

On Tue, Aug 19, 2008 at 4:26 PM, Matthew Runo <[EMAIL PROTECTED]> wrote:
> I know this isn't really the place for this, so please forgive me - but does
> this patch look reasonably safe to use to skip the isDeleted check inside of
> FunctionQuery?
>
> My reasoning behind this is that many people (us included) will be building
> the index on a separate server, and then using the replication scripts to
> publish the files out to several read-only servers. On those instances,
> deletedDocs would always be empty, since it's a read only instance - and so
> we can conveniently skip the Lucene code in question. This flag would also
> be good for other optimizations that can only be made when you assume the
> index is read-only.
>
> Solr seems to work with the flag set - any reasons why this will crash
> and/or kill my kitten?
>
> (please forgive my posting this here instead of in solr-dev!)
>
> Index: src/java/org/apache/solr/search/FunctionQParser.java
> ===
> --- src/java/org/apache/solr/search/FunctionQParser.java(revision
> 687135)
> +++ src/java/org/apache/solr/search/FunctionQParser.javaTue Aug 19
> 11:08:45 PDT 2008
> @@ -49,7 +49,7 @@
> }
> ***/
>
> -return new FunctionQuery(vs);
> +return new FunctionQuery(vs,
> req.getSchema().getSolrConfig().isReadOnly() );
>   }
>
>   /**
> Index: src/java/org/apache/solr/search/function/FunctionQuery.java
> ===
> --- src/java/org/apache/solr/search/function/FunctionQuery.java (revision
> 687135)
> +++ src/java/org/apache/solr/search/function/FunctionQuery.java Tue Aug 19
> 11:08:45 PDT 2008
> @@ -31,12 +31,14 @@
>  */
>  public class FunctionQuery extends Query {
>   ValueSource func;
> +  Boolean readOnly;
>
>   /**
>* @param func defines the function to be used for scoring
>*/
> -  public FunctionQuery(ValueSource func) {
> +  public FunctionQuery(ValueSource func, Boolean readOnly) {
> this.func=func;
> +this.readOnly=readOnly;
>   }
>
>   /** @return The associated ValueSource */
> @@ -113,7 +115,7 @@
> if (doc>=maxDoc) {
>   return false;
> }
> -if (reader.isDeleted(doc)) continue;
> +if (!readOnly && reader.isDeleted(doc)) continue;
> // todo: maybe allow score() to throw a specific exception
> // and continue on to the next document if it is thrown...
> // that may be useful, but exceptions aren't really good
> Index: src/java/org/apache/solr/core/Config.java
> ===
> --- src/java/org/apache/solr/core/Config.java   (revision 687135)
> +++ src/java/org/apache/solr/core/Config.java   Tue Aug 19 11:08:45 PDT 2008
> @@ -45,6 +45,8 @@
>   private final String name;
>   private final SolrResourceLoader loader;
>
> +  private Boolean readOnly;
> +
>   /**
> +   * @deprecated Use {@link #Config(SolrResourceLoader, String, InputStream,
> String)} instead.
>*/
> @@ -254,6 +256,19 @@
>  return val!=null ? Double.parseDouble(val) : def;
>}
>
> +  /**
> +   * Is the index set up to be readOnly? If so, this will cause the
> FunctionQuery stuff to not check
> +   * for deleted documents.
> +   * @return boolean readOnly
> +   */
> +   public boolean isReadOnly() {
> +   if( this.readOnly == null ){
> +   readOnly = getBool("/mainIndex/readOnly", false);
> +   }
> +
> +   return readOnly;
> +   }
> +
>   // The following functions were moved to ResourceLoader
>
> //-
>
> Index: example/solr/conf/solrconfig.xml
> ===
> --- example/solr/conf/solrconfig.xml(revision 687135)
> +++ example/solr/conf/solrconfig.xmlTue Aug 19 11:13:13 PDT 2008
> @@ -114,6 +114,12 @@
>  This is not needed if lock type is 'none' or 'single'
>  -->
> <unlockOnStartup>false</unlockOnStartup>
> +
> +   <!-- ... -->
> +   <readOnly>false</readOnly>
>   </mainIndex>
>
> On Aug 18, 2008, at 8:04 PM, Yonik Seeley wrote:
>
>> It's not a deadlock (just a synchronization bottleneck), but it is a
>> known issue in Lucene and there has been some progress in improving
>> the situation.
>> -Yonik
>>
>>
>> On Mon, Aug 18, 2008 at 10:55 PM, Matthew Runo <[EMAIL PROTECTED]> wrote:
>>>
>>> Hello folks!
>>>
>>> I was just wondering if anyone else has seen this issue under heavy load.
>>> We
>>> had some servers set to very high thread limits (12 core servers with 32
>>> gigs of ram), and found several threads would end up in this state
>>>
>>> Name: http-8080-891
>>> State: BLOCKED on [EMAIL PROTECTED] owned
>>> by:
>>> http-8080-191
>>> Total blocked: 97,926  Total waited: 16
>>>
>>> Stack trace

Re: shards and performance

2008-08-19 Thread Mike Klaas


On 19-Aug-08, at 12:58 PM, Phillip Farber wrote:


So your experience differs from Mike's.  Obviously it's an important  
decision as to whether to buy more machines.  Can you (or Mike)  
weigh in on what factors led to your different take on local shards  
vs. shards distributed across machines?


I do both; the only reason I have two shards on each machine is to  
squeeze maximum performance out of an equipment budget.  Err on the  
side of multiple machines.



At least for building the index, the number of shards really does
help. To index Medline (1.6e7 docs which is 60Gb in XML text) on a
single machine starts at about 100doc/s but slows down to 10doc/s when
the index grows. It seems as though the limit is reached once you run
out of RAM and it gets slower and slower in a linear fashion the
larger the index gets.
My sweet spot was 5 machines with 8GB RAM for indexing about 60GB of
data.


Can you say what the specs were for these machines? Given that I  
have more like 1TB of data over 1M docs how do you think my machine  
requirements might be affected as compared to yours?


You are in a much better position to determine this than we are.  See  
how big an index you can put on a single machine while maintaining  
acceptable performance using a typical query load.  It's relatively  
safe to extrapolate linearly from that.


-Mike


Re: Deadlock in lucene?

2008-08-19 Thread Matthew Runo
I know this isn't really the place for this, so please forgive me -  
but does this patch look reasonably safe to use to skip the isDeleted  
check inside of FunctionQuery?


My reasoning behind this is that many people (us included) will be  
building the index on a separate server, and then using the  
replication scripts to publish the files out to several read-only  
servers. On those instances, deletedDocs would always be empty, since  
it's a read only instance - and so we can conveniently skip the Lucene  
code in question. This flag would also be good for other optimizations  
that can only be made when you assume the index is read-only.


Solr seems to work with the flag set - any reasons why this will crash  
and/or kill my kitten?


(please forgive my posting this here instead of in solr-dev!)

Index: src/java/org/apache/solr/search/FunctionQParser.java
===
--- src/java/org/apache/solr/search/FunctionQParser.java	(revision  
687135)
+++ src/java/org/apache/solr/search/FunctionQParser.java	Tue Aug 19  
11:08:45 PDT 2008

@@ -49,7 +49,7 @@
 }
 ***/

-return new FunctionQuery(vs);
+return new FunctionQuery(vs,  
req.getSchema().getSolrConfig().isReadOnly() );

   }

   /**
Index: src/java/org/apache/solr/search/function/FunctionQuery.java
===
--- src/java/org/apache/solr/search/function/FunctionQuery.java	 
(revision 687135)
+++ src/java/org/apache/solr/search/function/FunctionQuery.java	Tue  
Aug 19 11:08:45 PDT 2008

@@ -31,12 +31,14 @@
  */
 public class FunctionQuery extends Query {
   ValueSource func;
+  Boolean readOnly;

   /**
* @param func defines the function to be used for scoring
*/
-  public FunctionQuery(ValueSource func) {
+  public FunctionQuery(ValueSource func, Boolean readOnly) {
 this.func=func;
+this.readOnly=readOnly;
   }

   /** @return The associated ValueSource */
@@ -113,7 +115,7 @@
 if (doc>=maxDoc) {
   return false;
 }
-if (reader.isDeleted(doc)) continue;
+if (!readOnly && reader.isDeleted(doc)) continue;
 // todo: maybe allow score() to throw a specific exception
 // and continue on to the next document if it is thrown...
 // that may be useful, but exceptions aren't really good
Index: src/java/org/apache/solr/core/Config.java
===
--- src/java/org/apache/solr/core/Config.java   (revision 687135)
+++ src/java/org/apache/solr/core/Config.java	Tue Aug 19 11:08:45 PDT  
2008

@@ -45,6 +45,8 @@
   private final String name;
   private final SolrResourceLoader loader;

+  private Boolean readOnly;
+
   /**
* @deprecated Use {@link #Config(SolrResourceLoader, String,  
InputStream, String)} instead.

*/
@@ -254,6 +256,19 @@
  return val!=null ? Double.parseDouble(val) : def;
}

+  /**
+   * Is the index set up to be readOnly? If so, this will cause the  
FunctionQuery stuff to not check

+   * for deleted documents.
+   * @return boolean readOnly
+   */
+   public boolean isReadOnly() {
+   if( this.readOnly == null ){
+   readOnly = getBool("/mainIndex/readOnly", false);
+   }
+
+   return readOnly;
+   }
+
   // The following functions were moved to ResourceLoader
   
//-

Index: example/solr/conf/solrconfig.xml
===
--- example/solr/conf/solrconfig.xml(revision 687135)
+++ example/solr/conf/solrconfig.xmlTue Aug 19 11:13:13 PDT 2008
@@ -114,6 +114,12 @@
  This is not needed if lock type is 'none' or 'single'
  -->
 <unlockOnStartup>false</unlockOnStartup>
+
+   <!-- ... -->
+   <readOnly>false</readOnly>
   </mainIndex>

   

Re: solr-ruby version management

2008-08-19 Thread Otis Gospodnetic
I like this idea.  Perhaps separate the solr version and the solr-ruby version 
with a dash instead of dot -- solr-ruby-1.3.0-0.0.6

 
Otis
--
Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



- Original Message 
> From: Koji Sekiguchi <[EMAIL PROTECTED]>
> To: solr-user@lucene.apache.org; [EMAIL PROTECTED]
> Sent: Tuesday, August 19, 2008 4:24:31 AM
> Subject: solr-ruby version management
> 
> From: http://www.nabble.com/CHANGES.txt-td18901774.html
> 
> The latest version of solr-ruby is 0.0.6:
> 
> solr-ruby-0.0.6.gem
> http://rubyforge.org/frs/?group_id=2875&release_id=23885
> 
> I think it isn't clear which Solr version it corresponds to.
> 
> I'd like to change this to solr-ruby-{solrVersion}.{solr-rubyVersion}.gem
> when Solr 1.3 is released, where solr-rubyVersion is two digits.
> That is, the first official release of solr-ruby will be
> solr-ruby-1.3.0.01.gem.
> 
> Any objections to changing to this new version format?
> Or anyone who has suggestions, please let me know.
> 
> Koji



Re: shards and performance

2008-08-19 Thread Phillip Farber

Thanks, Ian, for the considered reply.  See below.

Ian Connor wrote:

I have not seen any boost by having an index split into shards on the
same machine. However, when you split it into smaller shards on
different machines (cpu/ram/hdd), the performance boost is worth it.


So your experience differs from Mike's.  Obviously it's an important 
decision as to whether to buy more machines.  Can you (or Mike) weigh in 
on what factors led to your different take on local shards vs. shards 
distributed across machines?




At least for building the index, the number of shards really does
help. To index Medline (1.6e7 docs which is 60Gb in XML text) on a
single machine starts at about 100doc/s but slows down to 10doc/s when
the index grows. It seems as though the limit is reached once you run
out of RAM and it gets slower and slower in a linear fashion the
larger the index gets.

My sweet spot was 5 machines with 8GB RAM for indexing about 60GB of
data. 


Can you say what the specs were for these machines? Given that I have 
more like 1TB of data over 1M docs how do you think my machine 
requirements might be affected as compared to yours?



HDD speed helps with the initial rate and I found modern cheap
SATA drives that get 50-60MB/s ideal. SCSI is faster but costs more.
So, for the money, you can add more shards instead of paying extra for
SCSI. I also tried a RAID0 array of USB drives hoping the access
speeds would help - but it didn't and the performance was the same as
it was for cheap SATA drives.

However, it took me a few weeks of experimenting to find this. I can
add more machines, and the index will get faster. However, the rate of
adding docs (my slope) does not degrade while I am building the index
with 5 machines.

On Tue, Aug 19, 2008 at 2:47 PM, Mike Klaas <[EMAIL PROTECTED]> wrote:

On 19-Aug-08, at 10:18 AM, Phillip Farber wrote:



I'm trying to understand how splitting a monolithic index into shards
improves query response time. Please tell me if I'm on the right track here.
  Where does the increase in performance come from?  Is it that in-memory
arrays are smaller when the index is partitioned into shards?  Or is it due
to the likelihood that the solr process behind each shard is running on its
own CPU on a multi-CPU box?

Usually, the performance is obtained by putting shards on separate machines.
 However, I have had success partitioning an index on a single  machine so
that a single query can be executed by multiple cpus.  It also helps to have
each index on a different hard disk.


And it must be the case that the overhead of merging results from several
shards is still less than the expense of searching a monolithic index.
 True?

Merging overhead is relatively insignificant.  Fetching stored fields from
more docs than necessary is an expense of sharding, however.


Given roughly 10 million documents in several languages inducing perhaps
200K unique terms and averaging about 1 MB/doc how many shards would you
recommend and how much RAM?

I'd never recommend more shards on a single machine than there are cpus.
 For an index of that size, you will need at least 8GB of ram; 16GB would be
better.


Is it correct that Distributed Search (shards) is in 1.3 or does 1.2
support it?

It is 1.3 only.


If 1.3, is the nightly build the best one to grab bearing in mind that we
would want any protocols around distributed search to be as stable as
possible?  Or just wait for the 1.3 release?

Go for the nightly build.  The release will look very similar to it.

-Mike







Re: shards and performance

2008-08-19 Thread Ian Connor
I have not seen any boost by having an index split into shards on the
same machine. However, when you split it into smaller shards on
different machines (cpu/ram/hdd), the performance boost is worth it.

At least for building the index, the number of shards really does
help. To index Medline (1.6e7 docs which is 60Gb in XML text) on a
single machine starts at about 100doc/s but slows down to 10doc/s when
the index grows. It seems as though the limit is reached once you run
out of RAM and it gets slower and slower in a linear fashion the
larger the index gets.

My sweet spot was 5 machines with 8GB RAM for indexing about 60GB of
data. HDD speed helps with the initial rate and I found modern cheap
SATA drives that get 50-60MB/s ideal. SCSI is faster but costs more.
So, for the money, you can add more shards instead of paying extra for
SCSI. I also tried a RAID0 array of USB drives hoping the access
speeds would help - but it didn't and the performance was the same as
it was for cheap SATA drives.

However, it took me a few weeks of experimenting to find this. I can
add more machines, and the index will get faster. However, the rate of
adding docs (my slope) does not degrade while I am building the index
with 5 machines.

On Tue, Aug 19, 2008 at 2:47 PM, Mike Klaas <[EMAIL PROTECTED]> wrote:
> On 19-Aug-08, at 10:18 AM, Phillip Farber wrote:
>
>>
>>
>> I'm trying to understand how splitting a monolithic index into shards
>> improves query response time. Please tell me if I'm on the right track here.
>>   Where does the increase in performance come from?  Is it that in-memory
>> arrays are smaller when the index is partitioned into shards?  Or is it due
>> to the likelihood that the solr process behind each shard is running on its
>> own CPU on a multi-CPU box?
>
> Usually, the performance is obtained by putting shards on separate machines.
>  However, I have had success partitioning an index on a single  machine so
> that a single query can be executed by multiple cpus.  It also helps to have
> each index on a different hard disk.
>
>> And it must be the case that the overhead of merging results from several
>> shards is still less than the expense of searching a monolithic index.
>>  True?
>
> Merging overhead is relatively insignificant.  Fetching stored fields from
> more docs than necessary is an expense of sharding, however.
>
>> Given roughly 10 million documents in several languages inducing perhaps
>> 200K unique terms and averaging about 1 MB/doc how many shards would you
>> recommend and how much RAM?
>
> I'd never recommend more shards on a single machine than there are cpus.
>  For an index of that size, you will need at least 8GB of ram; 16GB would be
> better.
>
>> Is it correct that Distributed Search (shards) is in 1.3 or does 1.2
>> support it?
>
> It is 1.3 only.
>
>> If 1.3, is the nightly build the best one to grab bearing in mind that we
>> would want any protocols around distributed search to be as stable as
>> possible?  Or just wait for the 1.3 release?
>
> Go for the nightly build.  The release will look very similar to it.
>
> -Mike
>



-- 
Regards,

Ian Connor
1 Leighton St #605
Cambridge, MA 02141
Direct Line: +1 (978) 672
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Mobile Phone: +1 (312) 218 3209
Fax: +1(770) 818 5697
Suisse Phone: +41 (0) 22 548 1664
Skype: ian.connor


Re: Clarification on facets

2008-08-19 Thread Mike Klaas

A simple way is to query using debugQuery=true and parse the output:

0.74248177 = queryWeight(rawText:python), product of:
   2.581456 = idf(docFreq=16017)
   0.28762132 = queryNorm
0.4191762 = (MATCH) fieldWeight(rawText:python in 950285), product of:
   5.196152 = tf(termFreq(rawText:python)=27)
   2.581456 = idf(docFreq=16017)
   0.03125 = fieldNorm(field=rawText, doc=950285)

The =27 is the number of times 'python' appears in this document.

You could also write a custom component that included in this  
information in the response.


-Mike
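
A minimal sketch of that parsing step; the regular expression is an
assumption keyed to the explain format shown above:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TermFreqFromExplain {
  // matches fragments like "termFreq(rawText:python)=27"
  private static final Pattern TF = Pattern.compile("termFreq\\(([^)=]+)\\)=(\\d+)");

  public static void main(String[] args) {
    String explain = "5.196152 = tf(termFreq(rawText:python)=27)";
    Matcher m = TF.matcher(explain);
    while (m.find()) {
      System.out.println(m.group(1) + " appears " + m.group(2) + " times");
    }
  }
}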

On 18-Aug-08, at 8:16 PM, Gene Campbell wrote:


Thank you for the response.  Always nice to have someone willing to
validate your thinking!

Of course, if anyone has any ideas on how to get the number of times a
term is repeated in a document, I'm all ears.

cheers
gene


On Tue, Aug 19, 2008 at 1:42 PM, Norberto Meijome  
<[EMAIL PROTECTED]> wrote:

On Tue, 19 Aug 2008 10:18:12 +1200
"Gene Campbell" <[EMAIL PROTECTED]> wrote:

Is this interpreted as meaning, there are 10 documents that will match
with 'car' in the title, and likewise 6 'boat' and 2 'bike'?


Correct.


If so, is there any way to get counts for the *number of times* a value
is found in a document.  I'm looking for a way to determine the number
of times 'car' is repeated in the title, for example


Not sure - i would suggest that a field with a term repeated  
several times would receive a higher score when searching for that  
term, but not sure how you could get the information you  
seek...maybe with the Luke handler ? ( but on a per-document  
basis...slow... ? )


B
_
{Beto|Norberto|Numard} Meijome

Computers are like air conditioners; they can't do their job  
properly if you open windows.


I speak for myself, not my employer. Contents may be hot. Slippery  
when wet. Reading disclaimers makes you go blind. Writing them is  
worse. You have been Warned.






Re: shards and performance

2008-08-19 Thread Mike Klaas

On 19-Aug-08, at 10:18 AM, Phillip Farber wrote:




I'm trying to understand how splitting a monolithic index into  
shards improves query response time. Please tell me if I'm on the  
right track here.   Where does the increase in performance come  
from?  Is it that in-memory arrays are smaller when the index is  
partitioned into shards?  Or is it due to the likelihood that the  
solr process behind each shard is running on its own CPU on a multi- 
CPU box?


Usually, the performance is obtained by putting shards on separate  
machines.  However, I have had success partitioning an index on a  
single  machine so that a single query can be executed by multiple  
cpus.  It also helps to have each index on a different hard disk.


And it must be the case that the overhead of merging results from  
several shards is still less than the expense of searching a  
monolithic index.  True?


Merging overhead is relatively insignificant.  Fetching stored fields  
from more docs than necessary is an expense of sharding, however.


Given roughly 10 million documents in several languages inducing  
perhaps 200K unique terms and averaging about 1 MB/doc how many  
shards would you recommend and how much RAM?


I'd never recommend more shards on a single machine than there are  
cpus.  For an index of that size, you will need at least 8GB of ram;  
16GB would be better.


Is it correct that Distributed Search (shards) is in 1.3 or does 1.2  
support it?


It is 1.3 only.

If 1.3, is the nightly build the best one to grab bearing in mind  
that we would want any protocols around distributed search to be as  
stable as possible?  Or just wait for the 1.3 release?


Go for the nightly build.  The release will look very similar to it.

-Mike


Re: Order of returned fields

2008-08-19 Thread Alexander Ramos Jardim
I don't think so, as solr uses a flat index to represent data. I have made
some effort towards representing relational data on a flat structure, but so
far I don't have anything too concrete.

My suggestion is: create classes that isolate the parsing strategy, so you
can have DAOs that don't really know what is happening with the data they
retrieve, and your domain classes can retrieve the data as they expect,
independently of the format you put them in on the index.

This is like having a pair of dao+parser to retrieve the data in the middle
tier.
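
A minimal sketch of such a dao+parser pair, reusing the packed-field layout
(propertyId ^ propertyLabel ^ propertyType ^ propertyValue) from the quoted
message below; the sample values are invented:

public class PropertyParser {
  /** Split one packed property field back into its four parts. */
  public static String[] parse(String packed) {
    return packed.split("\\s*\\^\\s*");
  }

  public static void main(String[] args) {
    String[] p = parse("42 ^ Connections ^ string ^ usb, bluetooth");
    System.out.println("id=" + p[0] + " label=" + p[1]
        + " type=" + p[2] + " value=" + p[3]);
  }
}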

2008/8/19 Pierre Auslaender <[EMAIL PROTECTED]>

> Hi Alex,
>
> Do you think I could then specify an order on the returned fields for each
> document, without reordering the fields by parsing the SOLR response ?
>
> Thanks,
> Pierre
>
> Alexander Ramos Jardim wrote:
>
>  Hey Pierre,
>>
>> I don't know if my case helps you, but what I do to keep relational
>> information is to put the related data all in the same field.
>>
>> Let me give you an example:
>>
>> I have a product index. Each product has a list of manufacturer
>> properties,
>> like dimensions, color, connections supported (usb, bluetooth and so on),
>> etc etc etc. Each property  belongs to a context, so I index data
>> following
>> this model:
>>
>> propertyId ^ propertyLabel ^ propertyType ^ propertyValue
>>
>> Then I parse each result returned on my application.
>>
>> Does that help you?
>>
>> 2008/8/18 Pierre Auslaender <[EMAIL PROTECTED]>
>>
>>
>>
>>> Order matters in my application because I'm indexing structured data -
>>> actually, a domain object model (a bit like with Hibernate Search), only
>>> I'm
>>> adding parents to children, instead of children to parents. So say I have
>>> Cities and People, with a 1-N relationship between City and People. I'm
>>> indexing documents for Cities, and documents for People, and the
>>> documents
>>> for People contain the fields of the City they're living in.
>>>
>>> When I display the results, I'd like the People fields to display before
>>> the City fields. I can parse the Solr response and rearrange the fields
>>> (in
>>> the Java middle-tier, or with XSLT, or in the Javascript client), but
>>> then I
>>> have to "know" of the domain in too many places. I have to "know" of the
>>> domain in my Java application, in the SOLR schema file, and in the
>>> Javascript that rearranges the fields... I thought maybe I could avoid
>>> the
>>> latter and put as much application information as possible in the SOLR
>>> schema, for instance specifiy an order for the returned fields...
>>>
>>> Thanks anyway,
>>>
>>> Pierre
>>>
>>> Erik Hatcher wrote:
>>>
>>>  Yes, this is normal behavior.
>>>
>>>
 Does order matter in your application?  Could you explain why?

 Order is maintained with multiple values of the same field name, though
 -
 which is important.

   Erik


 On Aug 17, 2008, at 6:38 PM, Pierre Auslaender wrote:

  Hello,


> After a Solr query, I always get the fields back in alphabetical order,
> no matter how I insert them.
> Is this the normal behaviour?
>
> This is when adding the document...
>  
> ch.tsr.esg.domain.ProgramCollection[id:
> 1]
> collection
> Bac à sable
> 
> http://localhost:8080/esg/api/collections/1
>  
>
> ... and this is when retrieving it:
> 
> Bac à sable
> 
> http://localhost:8080/esg/api/collections/1
> collection
> ch.tsr.esg.domain.ProgramCollection[id:
> 1]
> 
>
> Thanks a lot,
> Pierre Auslaender
>
>
>



>>>
>>
>>
>>
>


-- 
Alexander Ramos Jardim


shards and performance

2008-08-19 Thread Phillip Farber



I'm trying to understand how splitting a monolithic index into shards 
improves query response time. Please tell me if I'm on the right track 
here.   Where does the increase in performance come from?  Is it that 
in-memory arrays are smaller when the index is partitioned into shards? 
 Or is it due to the likelihood that the solr process behind each shard 
is running on its own CPU on a multi-CPU box?


And it must be the case that the overhead of merging results from 
several shards is still less than the expense of searching a monolithic 
index.  True?


Given roughly 10 million documents in several languages inducing perhaps 
200K unique terms and averaging about 1 MB/doc how many shards would you 
recommend and how much RAM?


Is it correct that Distributed Search (shards) is in 1.3 or does 1.2 
support it?


If 1.3, is the nightly build the best one to grab bearing in mind that 
we would want any protocols around distributed search to be as stable as 
possible?  Or just wait for the 1.3 release?




Thanks very much,

Phil

--
Phillip Farber - http://www.umdl.umich.edu
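
For reference, distributed search in 1.3 is driven by a shards parameter on
the query; each listed host:port/path is searched and the partial results are
merged. The hosts below are illustrative:

http://localhost:8983/solr/select?shards=host1:8983/solr,host2:8983/solr&q=...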







RE: Can I change "/select" to POST and not GET

2008-08-19 Thread Sunil
Hi Ian,

Thanks for the reply. I am using CURL, and the library was sending a GET
request to solr. But I have changed it to POST. Now it's working
properly.

Thanks,
Sunil
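
A minimal sketch of that switch using only java.net: Solr's /select accepts
the same parameters form-encoded in a POST body, which sidesteps URL length
limits. The query string is an invented placeholder:

import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;

public class PostSelect {
  public static void main(String[] args) throws Exception {
    String longQuery = "field:(a OR b OR c)";  // imagine thousands of terms
    URL url = new URL("http://localhost:8983/solr/select");
    HttpURLConnection con = (HttpURLConnection) url.openConnection();
    con.setRequestMethod("POST");
    con.setDoOutput(true);
    con.setRequestProperty("Content-Type",
        "application/x-www-form-urlencoded; charset=UTF-8");
    OutputStream out = con.getOutputStream();
    out.write(("q=" + URLEncoder.encode(longQuery, "UTF-8")).getBytes("UTF-8"));
    out.close();
    System.out.println("HTTP " + con.getResponseCode());
  }
}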

-Original Message-
From: Ian Connor [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, August 19, 2008 7:53 PM
To: solr-user@lucene.apache.org
Subject: Re: Can I change "/select" to POST and not GET

The query limit is a software imposed limit. What client are you using
and can that be configured to allow more?

On Tue, Aug 19, 2008 at 9:43 AM, Sunil <[EMAIL PROTECTED]> wrote:
> Hi,
>
> My query limit is exceeding the 1024 URL length. Can I configure solr
to
> accept POST requests while searching content in solr?
>
> Thanks in advance,
> Sunil.
>
>
>



-- 
Regards,

Ian Connor




Re: Can I change "/select" to POST and not GET

2008-08-19 Thread Ian Connor
The query limit is a software imposed limit. What client are you using
and can that be configured to allow more?

On Tue, Aug 19, 2008 at 9:43 AM, Sunil <[EMAIL PROTECTED]> wrote:
> Hi,
>
> My query limit is exceeding the 1024 URL length. Can I configure solr to
> accept POST requests while searching content in solr?
>
> Thanks in advance,
> Sunil.
>
>
>



-- 
Regards,

Ian Connor


Can I change "/select" to POST and not GET

2008-08-19 Thread Sunil
Hi,

My query limit is exceeding the 1024 URL length. Can I configure solr to
accept POST requests while searching content in solr?

Thanks in advance,
Sunil.




Re: which shard is a result coming from

2008-08-19 Thread Ian Connor
Could this idea of a  "computed field" actually just be a query
filter? Can the filter just add a field on the return like this?

On Tue, Aug 19, 2008 at 9:10 AM, Ian Connor <[EMAIL PROTECTED]> wrote:
> I was thinking more that it would be an extra field you get back. My
> understanding of doing updates requires:
>
> 1. get your document (either by ID or from a search)
> 2. merge your update into the doc
> 3. update solr with the doc (which essentially just writes it all
> again but as you have done the merge nothing is lost).
>
> for shards, i would read from the main shard but write directly back
> to the shard directly. The idea is that you don't need to concern the
> main server with an update to a child shard (unless this direct bypass
> is dangerous somehow).
>
> So finding out which shard it came from on the initial "get" is key to
> know where to send the merged document.
>
> Some sort of "computed field" would work here. Something that is not
> actually in the index but is returned. The indexing and storing of the
> value is not needed as you can always filter which shards you want
> when creating the query.
>
> On Tue, Aug 19, 2008 at 8:59 AM, Brian Whitman <[EMAIL PROTECTED]> wrote:
>>
>> On Aug 19, 2008, at 8:49 AM, Ian Connor wrote:
>>
>>> What is the current "special requestHandler" that you can set currently?
>>
>> If you're referring to my issue post, that's just something we have
>> internally (not in trunk solr) that we use instead of /update -- it just
> inserts a field containing hostname:port/solr into the incoming
>> XML doc add stream. Not very clean but it works. Use lars's patch.
>>
>>
>>
>>
>
>
>
> --
> Regards,
>
> Ian Connor
> 1 Leighton St #605
> Cambridge, MA 02141
> Direct Line: +1 (978) 672
> Call Center Phone: +1 (714) 239 3875 (24 hrs)
> Mobile Phone: +1 (312) 218 3209
> Fax: +1(770) 818 5697
> Suisse Phone: +41 (0) 22 548 1664
> Skype: ian.connor
>



-- 
Regards,

Ian Connor


Re: Order of returned fields

2008-08-19 Thread Pierre Auslaender

Hi Alex,

Do you think I could then specify an order on the returned fields for 
each document, without reordering the fields by parsing the SOLR response ?


Thanks,
Pierre

Alexander Ramos Jardim wrote:

Hey Pierre,

I don't know if my case helps you, but what I do to keep relational
information is to put the related data all in the same field.

Let me give you an example:

I have a product index. Each product has a list of manufacturer properties,
like dimensions, color, connections supported (usb, bluetooth and so on),
etc etc etc. Each property  belongs to a context, so I index data following
this model:

propertyId ^ propertyLabel ^ propertyType ^ propertyValue

Then I parse each result returned on my application.

Does that help you?

2008/8/18 Pierre Auslaender <[EMAIL PROTECTED]>

  

Order matters in my application because I'm indexing structured data -
actually, a domain object model (a bit like with Hibernate Search), only I'm
adding parents to children, instead of children to parents. So say I have
Cities and People, with a 1-N relationship between City and People. I'm
indexing documents for Cities, and documents for People, and the documents
for People contain the fields of the City they're living in.

When I display the results, I'd like the People fields to display before
the City fields. I can parse the Solr response and rearrange the fields (in
the Java middle-tier, or with XSLT, or in the Javascript client), but then I
have to "know" of the domain in too many places. I have to "know" of the
domain in my Java application, in the SOLR schema file, and in the
Javascript that rearranges the fields... I thought maybe I could avoid the
latter and put as much application information as possible in the SOLR
schema, for instance specifiy an order for the returned fields...

Thanks anyway,

Pierre

Erik Hatcher wrote:

 Yes, this is normal behavior.


Does order matter in your application?  Could you explain why?

Order is maintained with multiple values of the same field name, though -
which is important.

   Erik


On Aug 17, 2008, at 6:38 PM, Pierre Auslaender wrote:

 Hello,
  

After a Solr query, I always get the fields back in alphabetical order,
no matter how I insert them.
Is this the normal behaviour?

This is when adding the document...
 
 ch.tsr.esg.domain.ProgramCollection[id: 1]
 collection
 Bac à sable
 
http://localhost:8080/esg/api/collections/1
 

... and this is when retrieving it:
 
 Bac à sable
 
http://localhost:8080/esg/api/collections/1
 collection
 ch.tsr.esg.domain.ProgramCollection[id: 1]
 

Thanks a lot,
Pierre Auslaender




  



  


Re: which shard is a result coming from

2008-08-19 Thread Ian Connor
I was thinking more that it would be an extra field you get back. My
understanding of doing updates requires:

1. get your document (either by ID or from a search)
2. merge your update into the doc
3. update solr with the doc (which essentially just writes it all
again but as you have done the merge nothing is lost).

for shards, i would read from the main shard but write directly back
to the shard directly. The idea is that you don't need to concern the
main server with an update to a child shard (unless this direct bypass
is dangerous somehow).

So finding out which shard it came from on the initial "get" is key to
know where to send the merged document.

Some sort of "computed field" would work here. Something that is not
actually in the index but is returned. The indexing and storing of the
value is not needed as you can always filter which shards you want
when creating the query.
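
A minimal sketch of that flow with the SolrJ client of the 1.3 era, assuming
each document carries a stored "shard" field (e.g. "host:8983/solr") stamped
at index time; the field name and the merge step are illustrative:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrInputDocument;

public class ShardWriteBack {
  public static void main(String[] args) throws Exception {
    SolrServer main = new CommonsHttpSolrServer("http://main:8983/solr");

    // 1. get the document (by ID or from a search)
    SolrDocument doc = main.query(new SolrQuery("id:42")).getResults().get(0);

    // 2. merge the update into a writable copy
    SolrInputDocument merged = new SolrInputDocument();
    for (String f : doc.getFieldNames()) {
      merged.addField(f, doc.getFieldValue(f));
    }
    merged.setField("status", "updated");

    // 3. write straight back to the shard that owns it
    String shard = (String) doc.getFieldValue("shard");
    SolrServer owner = new CommonsHttpSolrServer("http://" + shard);
    owner.add(merged);
    owner.commit();
  }
}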

On Tue, Aug 19, 2008 at 8:59 AM, Brian Whitman <[EMAIL PROTECTED]> wrote:
>
> On Aug 19, 2008, at 8:49 AM, Ian Connor wrote:
>
>> What is the current "special requestHandler" that you can set currently?
>
> If you're referring to my issue post, that's just something we have
> internally (not in trunk solr) that we use instead of /update -- it just
> inserts a hostname:port/solr into the incoming
> XML doc add stream. Not very clean but it works. Use lars's patch.
>
>
>
>



-- 
Regards,

Ian Connor
1 Leighton St #605
Cambridge, MA 02141
Direct Line: +1 (978) 672
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Mobile Phone: +1 (312) 218 3209
Fax: +1(770) 818 5697
Suisse Phone: +41 (0) 22 548 1664
Skype: ian.connor


Re: which shard is a result coming from

2008-08-19 Thread Brian Whitman


On Aug 19, 2008, at 8:49 AM, Ian Connor wrote:

What is the current "special requestHandler" that you can set  
currently?


If you're referring to my issue post, that's just something we have  
internally (not in trunk solr) that we use instead of /update -- it  
just inserts a field containing hostname:port/solr into the  
incoming XML doc add stream. Not very clean but it works. Use lars's  
patch.






Re: which shard is a result coming from

2008-08-19 Thread Ian Connor
What is the current "special requestHandler" that you can set currently?

On Tue, Aug 19, 2008 at 8:41 AM, Shalin Shekhar Mangar
<[EMAIL PROTECTED]> wrote:
> There's an issue open for this. Look at
> https://issues.apache.org/jira/browse/SOLR-705
>
> On Tue, Aug 19, 2008 at 6:08 PM, Ian Connor <[EMAIL PROTECTED]> wrote:
>> Hi,
>>
>> Is there a way to know which shard contains a given result. This would
>> help when you want to write updates back to the correct place.
>>
>> The idea is when you read your results, there would be an item to say
>> where a given result came from.
>>
>> --
>> Regards,
>>
>> Ian Connor
>>
>
>
>
> --
> Regards,
> Shalin Shekhar Mangar.
>



-- 
Regards,

Ian Connor
1 Leighton St #605
Cambridge, MA 02141
Direct Line: +1 (978) 672
Call Center Phone: +1 (714) 239 3875 (24 hrs)
Mobile Phone: +1 (312) 218 3209
Fax: +1(770) 818 5697
Suisse Phone: +41 (0) 22 548 1664
Skype: ian.connor


Re: which shard is a result coming from

2008-08-19 Thread Shalin Shekhar Mangar
There's an issue open for this. Look at
https://issues.apache.org/jira/browse/SOLR-705

On Tue, Aug 19, 2008 at 6:08 PM, Ian Connor <[EMAIL PROTECTED]> wrote:
> Hi,
>
> Is there a way to know which shard contains a given result. This would
> help when you want to write updates back to the correct place.
>
> The idea is when you read your results, there would be an item to say
> where a given result came from.
>
> --
> Regards,
>
> Ian Connor
>



-- 
Regards,
Shalin Shekhar Mangar.


which shard is a result coming from

2008-08-19 Thread Ian Connor
Hi,

Is there a way to know which shard contains a given result. This would
help when you want to write updates back to the correct place.

The idea is when you read your results, there would be an item to say
where a given result came from.

-- 
Regards,

Ian Connor


solr-ruby version management

2008-08-19 Thread Koji Sekiguchi
From: http://www.nabble.com/CHANGES.txt-td18901774.html

The latest version of solr-ruby is 0.0.6:

solr-ruby-0.0.6.gem
http://rubyforge.org/frs/?group_id=2875&release_id=23885

I think it isn't clear which Solr version it corresponds to.

I'd like to change this to solr-ruby-{solrVersion}.{solr-rubyVersion}.gem
when Solr 1.3 is released, where solr-rubyVersion is two digits.
That is, the first official release of solr-ruby will be
solr-ruby-1.3.0.01.gem.

Any objections to changing to this new version format?
Or anyone who has suggestions, please let me know.

Koji



Re: Solr won't start under jetty on RHEL5.2

2008-08-19 Thread Shalin Shekhar Mangar
On Tue, Aug 19, 2008 at 4:50 AM, Jon Drukman <[EMAIL PROTECTED]> wrote:

> Jon Drukman wrote:
>
>> I just migrated my solr instance to a new server, running RHEL5.2.  I
>> installed java from yum but I suspect it's different from the one I used to
>> use.
>>
>
>
> Turns out my instincts were correct.  The version from yum does not work. I
> installed the official sun jdk and now it starts fine.
>
> bad:
>
> java version "1.4.2"
> gij (GNU libgcj) version 4.1.2 20071124 (Red Hat 4.1.2-42)
>
> good:
>
> java version "1.6.0_07"
> Java(TM) SE Runtime Environment (build 1.6.0_07-b06)
> Java HotSpot(TM) 64-Bit Server VM (build 10.0-b23, mixed mode)
>

Probably because Solr is compiled with Java 5. AFAIK, gcj does not support
Java 5 features fully.


-- 
Regards,
Shalin Shekhar Mangar.