Re: Which terms in the query match

2007-10-17 Thread Bertrand Delacretaz
On 10/16/07, Nishant Soni [EMAIL PROTECTED] wrote:

> ...So is there a way to query solr about which of the tokens in the query
> actually matched ?...

The analyzer admin page should help, see
http://wiki.apache.org/solr/FAQ#head-b25df8c8393bbcca28f1f344c432975002e29ca9

-Bertrand


Re: Which terms in the query match

2007-10-17 Thread Nishant Soni

Thanks for the tip but I guess I should have been more specific. I want
to do it programmatically on the fly and use those results in other
ways. 

Essentially what I want to do is this:

1. Query for a set of terms against a field.
2. Run a second query, restricted to the results of the first query, for
the terms that did not match in the first query, against another field.

I am thinking this should not be too uncommon a problem so maybe there
is something that I am missing.

thanks
nishant
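
One client-side way to approximate the two steps above, offered only as a sketch: issue one small query per term to learn which terms match, then build a follow-up query for the remaining terms. The field names `title` and `body` and the `hit_counts` mapping (standing in for the `numFound` value read from each per-term response) are assumptions for illustration. This is not a built-in Solr feature, and the terms are not escaped for the query syntax.

```python
from urllib.parse import urlencode


def per_term_params(field, terms, rows=0):
    """Build one set of request parameters per term (rows=0: counts only)."""
    return {t: urlencode({"q": f"{field}:{t}", "rows": rows}) for t in terms}


def split_terms(terms, hit_counts):
    """Partition terms into (matched, unmatched) using per-term hit counts."""
    matched = [t for t in terms if hit_counts.get(t, 0) > 0]
    unmatched = [t for t in terms if hit_counts.get(t, 0) == 0]
    return matched, unmatched


def followup_params(first_field, matched, second_field, unmatched):
    """Second query: unmatched terms against another field, restricted by a
    filter query (fq) to documents that matched the first query."""
    q = " OR ".join(f"{second_field}:{t}" for t in unmatched)
    fq = " OR ".join(f"{first_field}:{t}" for t in matched)
    return urlencode({"q": q, "fq": fq})


terms = ["solr", "lucene", "tomcat"]
# Hypothetical counts, as if read from numFound of each per-term query.
hit_counts = {"solr": 12, "lucene": 3, "tomcat": 0}
matched, unmatched = split_terms(terms, hit_counts)
print(followup_params("title", matched, "body", unmatched))
```

The cost is one extra request per term, so this only suits short queries.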



-- 
www.reviewgist.com



Re: Search results problem

2007-10-17 Thread Maximilian Hütter
Daniel Naber wrote:
> On Tuesday 16 October 2007 12:03, Maximilian Hütter wrote:
>
>> the content of one document is completely contained in another,
>> but when I search for a specific word I only get one document as result.
>> I am absolutely sure it is contained in the other document, but I will
>> only get the parent doc if I add a word.
>
> You should try debugging the problem with Luke, e.g. use "Reconstruct &
> Edit" to see if the term is really indexed in both documents.
>
> Regards
>  Daniel

Thank you for the tip. After using Luke I can see that the term really
is missing from the other document.
Is there a size restriction on field content in Solr/Lucene? The
fulltext field I use as the default field (after Luke reconstruction)
seems to be missing a lot of strings I expected to find there.

Best regards,

Max

-- 
Maximilian Hütter
blue elephant systems GmbH
Wollgrasweg 49
D-70599 Stuttgart

Tel:  (+49) 0711 - 45 10 17 578
Fax:  (+49) 0711 - 45 10 17 573
e-mail :  [EMAIL PROTECTED]
Sitz   :  Stuttgart, Amtsgericht Stuttgart, HRB 24106
Geschäftsführer:  Joachim Hörnle, Thomas Gentsch, Holger Dietrich


Re: Search results problem

2007-10-17 Thread Pieter Berkel
There is a configuration option called maxFieldLength in solrconfig.xml
with a default value of 10,000.  You may need to increase this value if
you are indexing fields that are longer.



Re: Search results problem

2007-10-17 Thread Thorsten Scherler

Is there a way to define an unlimited value for maxFieldLength? Like -1?

TIA

salu2

 
 
-- 
Thorsten Scherler thorsten.at.apache.org
Open Source Java  consulting, training and solutions



Re: Search results problem

2007-10-17 Thread Pieter Berkel
Just to clarify, maxFieldLength refers to the maximum number of *terms*
that will be indexed per field, not the character length of the field (I
wasn't clear about that in my previous post).

Unfortunately there is no way to specify an unlimited value, although if you
set it to a suitably large value, you shouldn't really have any problems
(other than running out of memory).

Piete
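
To illustrate why a term limit makes words near the end of a long field unsearchable, here is a minimal sketch. It assumes whitespace tokenization and lowercasing, which is not what a real analyzer chain does; `index_terms` is a hypothetical helper, not part of Solr or Lucene.

```python
def index_terms(text, max_field_length=10_000):
    """Return the set of terms kept when only the first
    `max_field_length` tokens of a field are indexed.

    Assumption: tokens are split on whitespace and lowercased; a real
    analyzer chain may produce a different number of tokens.
    """
    tokens = text.lower().split()
    return set(tokens[:max_field_length])


# A long document whose last word falls just beyond the default limit.
long_doc = " ".join(["filler"] * 10_000 + ["needle"])
short_doc = "a needle in a small document"

print("needle" in index_terms(short_doc))   # True
print("needle" in index_terms(long_doc))    # False: token 10,001 is dropped
print("needle" in index_terms(long_doc, max_field_length=20_000))  # True
```

The same word is found in the short document but not in the long one, which matches the symptom described earlier in the thread.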




Re: Search results problem

2007-10-17 Thread Maximilian Hütter

I hadn't seen the maxFieldLength option, but that is surely the problem,
as the document is truncated at 10,000 terms.
The question is what to do about it; I certainly need a much higher
number. I doubt it is possible to set it to unlimited.

I also found this:

Controls the maximum number of terms that can be added to a Field for a
given Document, thereby truncating the document. Increase this number if
large documents are expected. However, setting this value too high may
result in out-of-memory errors.

Coming from: http://www.ibm.com/developerworks/library/j-solr2/index.html

That might be a problem for me.

I was thinking about using copyFields, instead of one large fulltext
field. Would that solve my problem, or would the maxFieldLength still
apply when using copyFields?

Best regards,

Max




Re: End user session tracking

2007-10-17 Thread Karl Wettin


On 16 Oct 2007, at 17:12, Ryan McKinley wrote:

>> So I'll start with an ad hoc session manager within Solr. Where in
>> Solr should I add such a service?
>
> I am using a custom filter that extends SolrDispatchFilter.

Alright, thanks!

--
karl


Re: Search results problem

2007-10-17 Thread Yonik Seeley
maxFieldLength is a setting on the IndexWriter and applies to all fields.
If you want more tokens indexed, simply increase the value of
maxFieldLength to something like 20 and you should be fine.

There's no penalty for setting it higher than the largest field you
are indexing (no diff between 1M and 2B if all your docs have field
lengths less than 1M tokens anyway).

-Yonik
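
Since the limit applies to every field, one rough way to choose a value is to measure the largest field before indexing. The sketch below assumes documents are available as plain dictionaries and approximates token counts by whitespace splitting; `longest_field` and `truncated_fields` are hypothetical helpers, and the actual analyzer may emit more or fewer tokens.

```python
def longest_field(docs):
    """Return (doc_id, field, token_count) for the largest field.

    `docs` maps a document id to a {field_name: text} dict. Token counts
    use whitespace splitting, which only approximates what the configured
    analyzer would emit.
    """
    best = (None, None, 0)
    for doc_id, fields in docs.items():
        for name, text in fields.items():
            count = len(text.split())
            if count > best[2]:
                best = (doc_id, name, count)
    return best


def truncated_fields(docs, max_field_length):
    """List (doc_id, field) pairs whose token count exceeds the limit."""
    return [
        (doc_id, name)
        for doc_id, fields in docs.items()
        for name, text in fields.items()
        if len(text.split()) > max_field_length
    ]


docs = {
    "doc1": {"title": "short title", "fulltext": "word " * 12_000},
    "doc2": {"title": "another title", "fulltext": "word " * 500},
}
print(longest_field(docs))
print(truncated_fields(docs, 10_000))
print(truncated_fields(docs, 1_000_000))
```

Any limit above the largest measured count leaves the last list empty, which is the situation described above where raising the value further makes no difference.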


Solr + Tomcat Undeploy Leaks

2007-10-17 Thread Stu Hood
Hello,

I'm using the Tomcat Manager app with 6.0.14 to start and stop Solr instances, 
and I believe I am running into a variant of the linked issue:

http://wiki.apache.org/jakarta-commons/Logging/UndeployMemoryLeak?action=print

According to `top`, the 'size' of the Tomcat process reaches the limit I have 
set for it with the Java -Xmx flag soon after starting and launching a few 
instances. The 'RSS' varies based on how full the caches are at any particular 
time, but I don't think it ever reaches the 'size'.

After a few days, I will get OOM errors in the logs when I try and start new 
instances (note: this is typically in the middle of the night, when usage is 
low), and all of the instances will stop responding until I (hard) restart 
Tomcat.



Has anyone run into this issue before? Is logging the culprit? If so, what 
options do I have (besides setting up a cron job to restart Tomcat nightly)?

Thanks,

Stu Hood
Webmail.us
You manage your business. We'll manage your email.®



Re: Solr, operating systems and globalization

2007-10-17 Thread Chris Hostetter

: However, SolrSharp culture settings should be reflective and consistent with
: the solr server instance's culture.  This leads to my question: does Solr
: control its culture & language settings through the various language
: components that can be incorporated, or does the underlying OS have a say in
: how that data is treated?

As a general rule:
  1) Solr (the server) should operate as culture- and locale-agnostic as 
possible.
  2) Solr Clients that want to act culturally appropriate should 
 explicitly translate from local formats to the absolute concepts that 
 they send to the server.  (a la: the absolute unambiguous date format)

Ideally you should be able to take a Solr install from one box, move it to 
another JVM on a different OS in a different timezone with different 
Locale settings and everything will keep working the same.

(I think once upon a time i argued that Solr should assume the 
charencoding of the local JVM, and wiser people than me pointed out that 
was bad).

There may be exceptions to this -- but those exceptions should be in cases 
where: a) the person configuring Solr is in complete control; and b) the 
exception is prudent because doing the work in the client would require 
more complexity.  Analysis is a good example of this: we don't make the 
clients analyze the text according to the native language customs -- we 
let the person creating the schema.xml specify what the Analysis should 
be.

As i recall, the issue that prompted this email had to do with C# and the 
various cultural ways to specify a floating point number: 1,234 vs 1.234 
(comma vs period).  this is the kind of thing that should be translated in 
clients to the canonical floating point representation. ... by which i 
mean: the one the solr server uses :)

*IF* Solr has the behavior where setting the JVM locale to something random 
makes Solr assume floats should be in the comma format, then i would 
consider that a Bug in Solr ... Solr should always be consistent.

-Hoss
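
A minimal sketch of the client-side translation described above, written in Python rather than C#. The helper names and the separator parameters are assumptions for illustration; the date form shown is the UTC `YYYY-MM-DDThh:mm:ssZ` style that Solr date fields accept.

```python
from datetime import datetime, timezone, timedelta


def to_canonical_float(text, decimal_sep=",", group_sep="."):
    """Convert a locally formatted number such as "1.234,5" to "1234.5".

    The separators are passed explicitly instead of being read from the
    process locale, so the result does not depend on the machine settings.
    """
    cleaned = text.strip().replace(group_sep, "").replace(decimal_sep, ".")
    return repr(float(cleaned))


def to_canonical_date(dt):
    """Format an aware datetime as UTC, e.g. 1995-12-31T23:59:59Z."""
    if dt.tzinfo is None:
        raise ValueError("naive datetime: the time zone must be explicit")
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


# "1,234" typed by a user whose culture uses a decimal comma.
print(to_canonical_float("1,234"))
# The same digits with English-style separators mean something else.
print(to_canonical_float("1,234", decimal_sep=".", group_sep=","))
# A local time in UTC+10 converted to the unambiguous UTC form.
local = datetime(2007, 10, 17, 20, 44, tzinfo=timezone(timedelta(hours=10)))
print(to_canonical_date(local))
```

Keeping the conversion in the client means the server never has to know which culture the user typed the value in.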



Re: Solr, operating systems and globalization

2007-10-17 Thread Jeff Rodenburg
Thanks for the comments Hoss.  More notes embedded below...

On 10/17/07, Chris Hostetter [EMAIL PROTECTED] wrote:

> Ideally you should be able to take a Solr install from one box, move it to
> another JVM on a different OS in a different timezone with different
> Locale settings and everything will keep working the same.

I fully understand that approach.  Going back to C#/Windows, this is known
as an Invariant culture setting, which we're incorporating into Solrsharp
(along with configurable culture settings as appropriate).

> As i recall, the issue that prompted this email had to do with C# and the
> various cultural ways to specify a floating point number: 1,234 vs 1.234
> (comma vs period).  this is the kind of thing that should be translated in
> clients to the canonical floating point representation. ... by which i
> mean: the one the solr server uses :)

This is exactly the scenario.  Ideally what I'd like to achieve is for
Solrsharp to discover the culture settings from the targeted Solr instance
and configure the client to match.

> *IF* Solr has the behavior where setting the JVM locale to something random
> makes Solr assume floats should be in the comma format, then i would
> consider that a Bug in Solr ... Solr should always be consistent.

This would be an interesting discovery exercise for those who deal with
multi-lingual systems across different JVM and OS platforms.  If it *were*
the case that different underlying system stacks affected solr in such a
way, Solrsharp should follow the server's lead.




multilingual list of stopwords

2007-10-17 Thread Maria Mosolova
Hi,

I am looking for a multilingual list of stopwords to use with
Solr/Lucene and would greatly appreciate any advice on where I could
find one.

Thanks,

Maria