trie fields and sortMissingLast

2009-10-01 Thread Steve Conover
Am I correct in thinking that trie fields don't support
sortMissingLast (my tests show that they don't).  If not, is there any
plan for adding it in?

Regards,
Steve


Re: Invalid response with search key having numbers

2009-10-01 Thread Shalin Shekhar Mangar
On Wed, Sep 30, 2009 at 3:01 PM, con convo...@gmail.com wrote:


 Hi all,
 I am getting incorrect results when I search with numbers only or with a
 string containing numbers.
 When such a search is done, all the results in the index are returned,
 irrespective of the search key.
 For example, the phone number field is mapped to TextField. It can contain
 values like , 653-23345.
 Also, a search string like john25, searched against name, will show all the
 results.


Getting all results irrespective of the query is very odd. Try adding
debugQuery=on to the queries. That will show you exactly how the query is
being parsed.
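
For example, a request along these lines (host, core and field names are just
placeholders for whatever you are using):

  http://localhost:8983/solr/select?q=phone:653-23345&debugQuery=on

The parsedquery section of the debug output shows what the query was turned
into after analysis.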

-- 
Regards,
Shalin Shekhar Mangar.


Re: Problem with Wildcard...

2009-10-01 Thread Shalin Shekhar Mangar
On Tue, Sep 29, 2009 at 6:42 PM, Jörg Agatz joerg.ag...@googlemail.com wrote:

 Hi Users...

 I have a problem.

 I have a lot of fields (type=text). To search across all fields I copy all
 fields into the default text field and use it for the default search.

 Now I want to search...

 This is in a field:

 RI-MC500034-1
 When I search RI-MC500034-1 I find it...
 If I search RI-MC5000* I don't.

 When I search 500034 I find it...
 If I search 5000* I don't.

 What can I do to use the wildcards?


I guess one thing you need to do is to add preserveOriginal=true in the
WordDelimiterFilterFactory section in your field type. That would help match
things like RI-MC5000*. Make sure you re-index all documents after this
change.
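
For example, something like this in the index analyzer of that field type
(the other attribute values are just a reasonable starting point, not a
prescription):

  <filter class="solr.WordDelimiterFilterFactory"
          generateWordParts="1" generateNumberParts="1"
          catenateWords="1" catenateNumbers="1"
          preserveOriginal="1"/>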

As for the others, add debugQuery=on as a request parameter and see how the
query is being parsed. If you have a doubt, paste it on the list and we can
help you.

-- 
Regards,
Shalin Shekhar Mangar.


Why isn't the DateField implementation of ISO 8601 broader?

2009-10-01 Thread Tricia Williams

Hi All,

   I'm working with data that has multiple date precisions, most of 
which do not have a time associated with them: rather centuries (like 
the 1800s), years (like 1867), and year/month (like 1918-11).  I'm 
able to sort and search using a workaround where we store the date as a 
string CCYYMM, where YYMM are optional.


   I was hoping to be able to tie this into the DateField type so that 
it becomes possible to facet on them without much work and duplication 
of data.  Unfortunately it requires the canonical representation of 
dateTime, which means the time part of the string is mandatory.


   My question is: why isn't the DateField implementation of ISO 8601 
broader, so that it could include  and MM as acceptable date 
strings?  What would it take to do so?  Are there any work-arounds for 
faceting by century, year, or month without creating new fields in my 
schema?  The last resort would be to create these new fields, but I'm 
hoping to leverage the power of the DateField and the trie-based range 
handling.
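
   Ideally I'd love to end up with requests along these lines (assuming I
understand the date faceting parameters correctly; the field name is made up):

   facet=true&facet.date=pub_date&facet.date.start=1800-01-01T00:00:00Z
      &facet.date.end=2000-01-01T00:00:00Z&facet.date.gap=%2B100YEARS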


Thanks,
Tricia

Some interesting observations from tinkering with the DateFieldTest:

   * 2003-03-00T00:00:00Z becomes 2003-02-28T00:00:00Z
   * 2008-03-00T00:00:00Z becomes 2008-02-29T00:00:00Z
   * 2003-00-00T00:00:00Z becomes 2002-11-30T00:00:00Z
   * 2000-00-00T00:00:00Z becomes 1999-11-30T00:00:00Z
   * 1979-00-31T00:00:00Z becomes 1978-12-31T00:00:00Z
   * 2005-04-00T00:00:00Z becomes 2005-03-31T00:00:00Z
   * 1850-10-00T00:00:00Z becomes 1850-09-30T00:00:00Z

The rounding to /YEAR, /MONTH, etc. artificially imposes extra precision 
that the original data wouldn't have.  In any case where the month is zero, 
weird rounding happens.


Re: field collapsing sums

2009-10-01 Thread Martijn v Groningen
Hi Joe,

Currently the patch does not do that, but you can do something else
that might help you in getting your summed stock.

In the latest patch you can include fields of collapsed documents in
the result per distinct field value.
If you specify collapse.includeCollapseDocs.fl=num_in_stock in the
request, and let's say you collapse on brand, then in the response you
will receive the following xml:
<lst name="collapsedDocs">
   <result name="brand1" numFound="48" start="0">
     <doc>
       <str name="num_in_stock">2</str>
     </doc>
     <doc>
       <str name="num_in_stock">3</str>
     </doc>
     ...
   </result>
   <result name="brand2" numFound="9" start="0">
     ...
   </result>
</lst>

On the client side you can do whatever you want with this data, for
example sum it together. Although the patch does not sum for you, I
think it will allow you to implement your requirement without too much
hassle.
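
A request would then look something like this (URL and the other parameters
are placeholders; the collapse parameters are the ones described above):

  http://localhost:8983/solr/select?q=*:*&collapse.field=brand&collapse.includeCollapseDocs.fl=num_in_stock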

Cheers,

Martijn

2009/10/1 Matt Weber m...@mattweber.org:
 You might want to see how the stats component works with field collapsing.

 Thanks,

 Matt Weber

 On Sep 30, 2009, at 5:16 PM, Uri Boness wrote:

 Hi,

 At the moment I think the most appropriate place to put it is in the
 AbstractDocumentCollapser (in the getCollapseInfo method). Though, it might
 not be the most efficient.

 Cheers,
 Uri

 Joe Calderon wrote:

 hello all, i have a question on the field collapsing patch, say i have
 an integer field called num_in_stock and i collapse by some other
 column, is it possible to sum up that integer field and return the
 total in the output, if not how would i go about extending the
 collapsing component to support that?


 thx much

 --joe






Question on modifying solr behavior on indexing xml files..

2009-10-01 Thread Thung, Peter C CIV SPAWARSYSCEN-PACIFIC, 56340
1.  In my playing around with
sending in an XML document within an XML CDATA tag,
with termVectors=true,
I noticed the following behavior:
<person>peter</person>
collapses to the term
personpeterperson
instead of
person
and
peter separately.
 
I realize I could try to do a search-and-replace of characters like '<', '=',
and '>' to a space so that the default parser/indexer can preserve element
names.
However, I'm wondering if someone could point me to where one might do
this within the Solr or Apache Lucene code as a proper plug-in, with maybe an
example that I could use as a template.  Also, where in the solrconfig.xml
file would I want to make a change to reference the new parser?
 
2.  My other question is whether this technique would work for XML-type
messages embedded in Microsoft Excel or PowerPoint presentations, where I
would like to preserve knowledge of XML element term frequencies and where I
would try to leverage the component that automatically indexes Microsoft
documents.
Would I need to modify that component and customize it?
 
-Peter
 
 



Where to place ReversedWildcardFilterFactory in Chain

2009-10-01 Thread Chantal Ackermann

Hi all,

I would have two questions about the ReversedWildcardFilterFactory:
a) put it into both chains, index and query, or into index only?
b) where exactly in the/each chain do I have to put it? (Do I have to 
respect a certain order - as I have wordDelimiter and lowercase in 
there, as well.)


More Details:

I understand it is used to allow queries like *sport.

My current configuration for the field I want to use it for contains 
this setup:


<fieldType name="text_cn" class="solr.TextField">
  <analyzer>
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="1" splitOnNumerics="1"
            stemEnglishPossessive="1" generateWordParts="1"
            generateNumberParts="1" catenateAll="1"
            preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

The wiki page 
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters states for 
the ReversedWildcardFF:

Add this filter to the index analyzer, but not the query analyzer.

However, the API for it says it provides functionality at index and 
query time (my understanding):
When this factory is added to an analysis chain, it will be used both 
for filtering the tokens during indexing, and to determine the query 
processing of this field during search.


Any help is greatly appreciated.
Thanks!
Chantal



--
Chantal Ackermann


Query filters/analyzers

2009-10-01 Thread Claudio Martella
Hello list.

So, I set up my schema.xml with the different chains of analyzers and
filters for each field (i.e. I created types text-en, text-de, text-it).
As I have to index documents in different languages, this is good.
But what defines the analyzers and filters for the query?

Let's suppose I have my web app with my input form where you
fill in the query. I detect the language so I can query the field
content-en or content-it or content-de according to the detection.
But how is the query going to be analyzed? Of course I want the query to
be analyzed according to the field I'm going to search in.
Where is this defined?

TIA

Claudio

-- 
Claudio Martella
Digital Technologies
Unit Research  Development - Engineer

TIS innovation park
Via Siemens 19 | Siemensstr. 19
39100 Bolzano | 39100 Bozen
Tel. +39 0471 068 123
Fax  +39 0471 068 129
claudio.marte...@tis.bz.it http://www.tis.bz.it





Re: Query filters/analyzers

2009-10-01 Thread Chantal Ackermann

Hi Claudio,

in schema.xml, the analyzer element accepts the attribute type.
If you need different analyzer chains during indexing and querying, 
configure it like this:


<fieldType name="channel_name" class="solr.TextField">
  <analyzer type="index">
    <!-- indexing analyzer chain defined here -->
  </analyzer>
  <analyzer type="query">
    <!-- query analyzer chain defined here -->
  </analyzer>
</fieldType>

If there is no difference, just remove one analyzer element and the type 
attribute from the remaining one.


You can check after indexing in the schema browser (admin web frontend) 
what analyzer chain is applied for indexing and querying on a certain field.


When you have detected the input language, simply choose the correct 
field, and the configured analyzer chain for that field will be applied 
automatically.


E.g. input is italian:
q=text-it:input

text-it has the italian analyzers configured for index and query, so to 
the input, the italian analyzers will also be applied.
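
For example, a text-it field type could look something like this (just a
sketch; pick whatever tokenizer and filters suit your content):

<fieldType name="text-it" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.StandardTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <filter class="solr.SnowballPorterFilterFactory" language="Italian"/>
  </analyzer>
</fieldType>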


Cheers,
Chantal

Claudio Martella schrieb:

Hello list.

So, i setup my schema.xml with the different chains of analyzers and
filters for each field (i.e. i created types text-en, text-de, text-it).
As i have to index documents in different languages, this is good.
But what defines the analyzers and filters for the query?

Let's suppose i have my web-app with my input form where you
fill in the query. I detect the language so i can query the field
content-en or content-it or content-de according to the detection.
But how is the query going to be analyzed? Of course i want the query to
be analyzed accordingly to the field i'm going to search in.
Where is this defined?

TIA

Claudio

--
Claudio Martella
Digital Technologies
Unit Research  Development - Engineer

TIS innovation park
Via Siemens 19 | Siemensstr. 19
39100 Bolzano | 39100 Bozen
Tel. +39 0471 068 123
Fax  +39 0471 068 129
claudio.marte...@tis.bz.it http://www.tis.bz.it






Solr 1.4 Release date/ lucene 2.9 API ?

2009-10-01 Thread Jérôme Etévé
Hi all,

Have you planned a release date for Solr 1.4? If I understood correctly, it
will use the Lucene 2.9 release from last Sept. 24th, with a stable API?

Thanks.

Jerome.

-- 
Jerome Eteve.
http://www.eteve.net
jer...@eteve.net


index size before and after commit

2009-10-01 Thread Phillip Farber
I am trying to automate a build process that adds documents to 10 shards 
over 5 machines and need to limit the size of a shard to no more than 
200GB because I only have 400GB of disk available to optimize a given shard.


Why does the size (du) of an index typically decrease after a commit?  
I've observed a decrease in size of as much as from 296GB down to 151GB 
or as little as from 183GB to 182GB.  Is that size after a commit close 
to the size the index would be after an optimize?  For that matter, are 
there cases where optimization can take more than 2x?  I've heard of 
cases but have not observed them in my system.  I only do adds to the 
shards, never query them. An LVM snapshot of the shard receives the queries.


Is doing a commit before I take a du a reliable way to gauge the size of 
the shard?  It is really bad news to allow a shard to go over 200GB in 
my use case.  How do others manage this problem of 2x space needed to 
optimize with limited disk space?


Advice greatly appreciated.

Phil



Re: Solr 1.4 Release date/ lucene 2.9 API ?

2009-10-01 Thread Grant Ingersoll


On Oct 1, 2009, at 8:32 AM, Jérôme Etévé wrote:


Hi all,

Have you planned a release date for solr 1.4? If I understood well, it
will use lucene 2.9 release from last sept. 24th with a stable API?



Please have a look at
https://issues.apache.org/jira/secure/BrowseVersion.jspa?id=12310230&versionId=12313351&showOpenIssuesOnly=true
(assuming JIRA is up) and see if there is any way you can contribute
to testing, etc.  Once these 9 issues are cleared up, we can do a
release.


Yes, it will use 2.9.0

--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: index size before and after commit

2009-10-01 Thread Grant Ingersoll
It may take some time before resources are released and garbage  
collected, so that may be part of the reason why things hang around  
and du doesn't report much of a drop.


On Oct 1, 2009, at 8:54 AM, Phillip Farber wrote:

I am trying to automate a build process that adds documents to 10  
shards over 5 machines and need to limit the size of a shard to no  
more than 200GB because I only have 400GB of disk available to  
optimize a given shard.


Why does the size (du) of an index typically decrease after a  
commit?  I've observed a decrease in size of as much as from 296GB  
down to 151GB or as little as from 183GB to 182GB.  Is that size  
after a commit close to the size the index would be after an  
optimize?  For that matter, are there cases where optimization can  
take more than 2x?  I've heard of cases but have not observed them  
in my system.


I seem to recall a case where it can be 3x, but I don't know that it  
has been observed much.


I only do adds to the shards, never query them. An LVM snapshot of  
the shard receives the queries.


Is doing a commit before I take a du a reliable way to gauge the  
size of the shard?  It is really bad news to allow a shard to go  
over 200GB in my use case.  How do others manage this problem of 2x  
space needed to optimize with limited dosk space?


Do you need to optimize at all?


--
Grant Ingersoll
http://www.lucidimagination.com/

Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids)  
using Solr/Lucene:

http://www.lucidimagination.com/search



Re: index size before and after commit

2009-10-01 Thread Mark Miller
Phillip Farber wrote:
 I am trying to automate a build process that adds documents to 10
 shards over 5 machines and need to limit the size of a shard to no
 more than 200GB because I only have 400GB of disk available to
 optimize a given shard.

 Why does the size (du) of an index typically decrease after a commit? 
 I've observed a decrease in size of as much as from 296GB down to
 151GB or as little as from 183GB to 182GB.  Is that size after a
 commit close to the size the index would be after an optimize?  
Likely. Until you commit or close the Writer, the unoptimized index is
the live index. And then you also have the optimized index. Once you
commit and make the optimized index the live index, the unoptimized
index can be removed (depending on your delete policy, which by default
only keeps the latest commit point).
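
The deletion policy lives in solrconfig.xml, by the way; a sketch of what I
believe the default looks like:

<deletionPolicy class="solr.SolrDeletionPolicy">
  <str name="maxCommitsToKeep">1</str>
  <str name="maxOptimizedCommitsToKeep">0</str>
</deletionPolicy>
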
 For that matter, are there cases where optimization can take more than
 2x?  I've heard of cases but have not observed them in my system.  I
 only do adds to the shards, never query them. An LVM snapshot of the
 shard receives the queries.
There are cases where it takes over 2x - but they involve using reopen.
If you have more than one Reader on the index, and only reopen some of
them, the new Readers created can hold open the partially optimized
segments that existed at that moment, creating a need for greater than 2x.

 Is doing a commit before I take a du a reliable way to gauge the size
 of the shard?  It is really bad news to allow a shard to go over 200GB
 in my use case.  How do others manage this problem of 2x space needed
 to optimize with limited dosk space?
Get more disk space ;) Or don't optimize. A lower mergefactor can make
optimizations less necessary.
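
For example, in solrconfig.xml (the default is 10, if I remember right; a
lower value means more merging at index time, so fewer segments to clean up
later):

<indexDefaults>
  <mergeFactor>4</mergeFactor>
</indexDefaults>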

 Advice greatly appreciated.

 Phil



-- 
- Mark

http://www.lucidimagination.com





Re: index size before and after commit

2009-10-01 Thread Mark Miller
Whoops - the way I have mail come in, it's not easy to tell if I'm replying
to the Lucene or Solr list ;)

The way Solr works with Searchers and reopen, it shouldn't run into a
situation that requires greater than
2x to optimize. I won't guarantee it ;) But based on what I know, it
shouldn't happen under normal circumstances.

Mark Miller wrote:
 Phillip Farber wrote:
   
 I am trying to automate a build process that adds documents to 10
 shards over 5 machines and need to limit the size of a shard to no
 more than 200GB because I only have 400GB of disk available to
 optimize a given shard.

 Why does the size (du) of an index typically decrease after a commit? 
 I've observed a decrease in size of as much as from 296GB down to
 151GB or as little as from 183GB to 182GB.  Is that size after a
 commit close to the size the index would be after an optimize?  
 
 Likely. Until you commit or close the Writer, the unoptimized index is
 the live index. And then you also have the optimized index. Once you
 commit and make the optimized index the live index, the unoptimized
 index can be removed (depending on your delete policy, which by default
 only keeps the latest commit point).
   
 For that matter, are there cases where optimization can take more than
 2x?  I've heard of cases but have not observed them in my system.  I
 only do adds to the shards, never query them. An LVM snapshot of the
 shard receives the queries.
 
 There are cases where it takes over 2x - but they involve using reopen.
 If you have more than one Reader on the index, and only reopen some of
 them, the new Readers created can hold open the partially optimized
 segments that existed at that moment, creating a need for greater than 2x.
   
 Is doing a commit before I take a du a reliable way to gauge the size
 of the shard?  It is really bad news to allow a shard to go over 200GB
 in my use case.  How do others manage this problem of 2x space needed
 to optimize with limited dosk space?
 
 Get more disk space ;) Or don't optimize. A lower mergefactor can make
 optimizations less necessary.
   
 Advice greatly appreciated.

 Phil

 


   


-- 
- Mark

http://www.lucidimagination.com





Re: Query filters/analyzers

2009-10-01 Thread Claudio Martella
Thanks, that's exactly the kind of answer I was looking for.


Chantal Ackermann wrote:
 Hi Claudio,

 in schema.xml, the analyzer element accepts the attribute type.
 If you need different analyzer chains during indexing and querying,
 configure it like this:

  <fieldType name="channel_name" class="solr.TextField">
  <analyzer type="index">
  <!-- indexing analyzer chain defined here -->
  </analyzer>
  <analyzer type="query">
  <!-- query analyzer chain defined here -->
  </analyzer>
  </fieldType>

 If there is no difference, just remove one analyzer element and the
 type attribute from the remaining one.

 You can check after indexing in the schema browser (admin web
 frontend) what analyzer chain is applied for indexing and querying on
 a certain field.

 When you have detected the input language, simply choose the correct
 field, and the configured analyzer chain for that field will be
 applied automatically.

 E.g. input is italian:
 q=text-it:input

 text-it has the italian analyzers configured for index and query, so
 to the input, the italian analyzers will also be applied.

 Cheers,
 Chantal

 Claudio Martella schrieb:
 Hello list.

 So, i setup my schema.xml with the different chains of analyzers and
 filters for each field (i.e. i created types text-en, text-de, text-it).
 As i have to index documents in different languages, this is good.
 But what defines the analyzers and filters for the query?

 Let's suppose i have my web-app with my input form where you
 fill in the query. I detect the language so i can query the field
 content-en or content-it or content-de according to the detection.
 But how is the query going to be analyzed? Of course i want the query to
 be analyzed accordingly to the field i'm going to search in.
 Where is this defined?

 TIA

 Claudio

 -- 
 Claudio Martella
 Digital Technologies
 Unit Research  Development - Engineer

 TIS innovation park
 Via Siemens 19 | Siemensstr. 19
 39100 Bolzano | 39100 Bozen
 Tel. +39 0471 068 123
 Fax  +39 0471 068 129
 claudio.marte...@tis.bz.it http://www.tis.bz.it






-- 
Claudio Martella
Digital Technologies
Unit Research  Development - Engineer

TIS innovation park
Via Siemens 19 | Siemensstr. 19
39100 Bolzano | 39100 Bozen
Tel. +39 0471 068 123
Fax  +39 0471 068 129
claudio.marte...@tis.bz.it http://www.tis.bz.it





Re: Where to place ReversedWildcardFilterFactory in Chain

2009-10-01 Thread Mark Miller
Chantal Ackermann wrote:
 Hi all,

 I would have two questions about the ReversedWildcardFilterFactory:
 a) put it into both chains, index and query, or into index only?
 b) where exactly in the/each chain do I have to put it? (Do I have to
 respect a certain order - as I have wordDelimiter and lowercase in
 there, as well.)

 More Details:

 I understand it is used to allow queries like *sport.

 My current configuration for the field I want to use it for contains
 this setup:

 <fieldType name="text_cn" class="solr.TextField">
   <analyzer>
     <filter class="solr.WordDelimiterFilterFactory"
             splitOnCaseChange="1" splitOnNumerics="1"
             stemEnglishPossessive="1" generateWordParts="1"
             generateNumberParts="1" catenateAll="1"
             preserveOriginal="1" />
     <filter class="solr.LowerCaseFilterFactory" />
   </analyzer>
 </fieldType>

 The wiki page
 http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters states for
 the ReversedWildcardFF:
 Add this filter to the index analyzer, but not the query analyzer.

 However, the API for it says it provides functionality at index and
 query time (my understanding):
 When this factory is added to an analysis chain, it will be used both
 for filtering the tokens during indexing, and to determine the query
 processing of this field during search.

 Any help is greatly appreciated.
 Thanks!
 Chantal



You just put it in the index chain, not the query chain. The
SolrQueryParser will consult it when building a wildcard search - don't
put it in the query chain. I know, appears like a bit of magic. That
Andrzej is a wizard though, so it makes sense ;)
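
In other words, something along these lines in the index analyzer only (the
attribute names here are from memory, so double-check them against the
factory's javadoc; withOriginal keeps the forward form of each token as well):

<analyzer type="index">
  ...
  <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
          maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
</analyzer>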

-- 
- Mark

http://www.lucidimagination.com





Keepwords Schema

2009-10-01 Thread matrix_psj

Hi guys,

Although I've been looking at Solr on and off for a few months, I'm still
getting to grips with the schema and filters/tokenizers.

I'm having trouble using the solr.KeepWordFilterFactory functionality, and
there doesn't appear to be a previous discussion here regarding it. I
basically have a short text field (~100 chars on average) that I wish to
return as a facet, but only some or parts of the field, based on keepwords
stored in a file.

An example:
My schema is about web files. Part of the syntax is a text field of authors
that have worked on each file, e.g.
<file>
  <filename>login.php</filename>
  <lastModDate>2009-01-01</lastModDate>
  <authors>alex, brian, carl carlington, dave alpha, eddie, dave beta</authors>
</file>

When I perform a search and get 20 web files back, I would like a facet of
the individual authors, but only if their name appears in a
public_authors.txt file.

So if the public_authors.txt file contained:
Anna,
Bob,
Carl Carlington,
Dave Alpha,
Elvis,
Eddie,

The facet returned would be:
Carl Carlington
Dave Alpha
Eddie



Not sure if that makes sense? If it does, could someone explain the
schema fieldType declarations that would bring back this sort of result?
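
My rough guess is something along these lines (assuming KeepWordFilterFactory
reads the words file from the conf directory; I haven't verified this works):

<fieldType name="public_authors" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.PatternTokenizerFactory" pattern=",\s*" />
    <filter class="solr.TrimFilterFactory" />
    <filter class="solr.KeepWordFilterFactory" words="public_authors.txt"
            ignoreCase="true" />
  </analyzer>
</fieldType>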

Thanks for any help

Paul
-- 
View this message in context: 
http://www.nabble.com/Keepwords-Schema-tp25696896p25696896.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: index size before and after commit

2009-10-01 Thread Mark Miller
bq. and reindex without any merges.

That's actually quite a hoop to jump through as well - though if you're
determined and you have tons of RAM, it's somewhat doable.

Mark Miller wrote:
 Nice one ;) It's not technically a case where optimize requires > 2x
 though, in case the user asking gets confused. It's a case unrelated to
 optimize that can grow your index. Then you need < 2x for the optimize,
 since you won't copy the deletes.

 It also requires that you jump through hoops to delete everything. If you
 delete everything with *:*, that is smart enough not to just do a delete on
 every document - it just creates a new index, allowing the removal of
 the old very efficiently.
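
 That is, posting the usual delete-by-query message to the /update handler:

   <delete><query>*:*</query></delete>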

 Def agree on the more disk space.

 Walter Underwood wrote:
   
 Here is how you need 3X. First, index everything and optimize. Then
 delete everything and reindex without any merges.

 You have one full-size index containing only deleted docs, one
 full-size index containing reindexed docs, and need that much space
 for a third index.

 Honestly, disk is cheap, and there is no way to make Lucene work
 reliably with less disk. 1TB is a few hundred dollars. You have a free
 search engine, buy some disk.

 wunder

 On Oct 1, 2009, at 6:25 AM, Grant Ingersoll wrote:

 
 151GB or as little as from 183GB to 182GB.  Is that size after a
 commit close to the size the index would be after an optimize?  For
 that matter, are there cases where optimization can take more than
 2x?  I've heard of cases but have not observed them in my system.
 
 I seem to recall a case where it can be 3x, but I don't know that it
 has been observed much.
   


   


-- 
- Mark

http://www.lucidimagination.com





Re: Query filters/analyzers

2009-10-01 Thread Claudio Martella
Ok, one more question on this issue. I used to have an all field where
I used to copyField title, content and keywords, defined with
typeField text, which used to have English-language-dependent
analyzers/filters. Now I can copyField all the three content-* fields,
as I know that only one of the three will be filled per document. My
problem is once again that I have to define a typeField for this all
field that should be language-independent.
The solution is once again to create three all fields, or to create
only one defined as text-ws (no language-dependent analysis). But in the
latter case it would be desynched with the content-* fields, which are
stemmed and stopped.

About the copyField issue in general: as it copies the content to the
other field, what is the sense of defining analyzers for the destination
field? The source is already analyzed, so I guess that the RESULT of the
analysis is copied there. In that case a text-ws should be sufficient.
But then I guess the problem is again with the QUERY time analysis. Right?


Chantal Ackermann wrote:
 Hi Claudio,

 in schema.xml, the analyzer element accepts the attribute type.
 If you need different analyzer chains during indexing and querying,
 configure it like this:

 <fieldType name="channel_name" class="solr.TextField">
 <analyzer type="index">
 <!-- indexing analyzer chain defined here -->
 </analyzer>
 <analyzer type="query">
 <!-- query analyzer chain defined here -->
 </analyzer>
 </fieldType>

 If there is no difference, just remove one analyzer element and the
 type attribute from the remaining one.

 You can check after indexing in the schema browser (admin web
 frontend) what analyzer chain is applied for indexing and querying on
 a certain field.

 When you have detected the input language, simply choose the correct
 field, and the configured analyzer chain for that field will be
 applied automatically.

 E.g. input is italian:
 q=text-it:input

 text-it has the italian analyzers configured for index and query, so
 to the input, the italian analyzers will also be applied.

 Cheers,
 Chantal

 Claudio Martella schrieb:
 Hello list.

 So, i setup my schema.xml with the different chains of analyzers and
 filters for each field (i.e. i created types text-en, text-de, text-it).
 As i have to index documents in different languages, this is good.
 But what defines the analyzers and filters for the query?

 Let's suppose i have my web-app with my input form where you
 fill in the query. I detect the language so i can query the field
 content-en or content-it or content-de according to the detection.
 But how is the query going to be analyzed? Of course i want the query to
 be analyzed accordingly to the field i'm going to search in.
 Where is this defined?

 TIA

 Claudio

 -- 
 Claudio Martella
 Digital Technologies
 Unit Research  Development - Engineer

 TIS innovation park
 Via Siemens 19 | Siemensstr. 19
 39100 Bolzano | 39100 Bozen
 Tel. +39 0471 068 123
 Fax  +39 0471 068 129
 claudio.marte...@tis.bz.it http://www.tis.bz.it






-- 
Claudio Martella
Digital Technologies
Unit Research  Development - Engineer

TIS innovation park
Via Siemens 19 | Siemensstr. 19
39100 Bolzano | 39100 Bozen
Tel. +39 0471 068 123
Fax  +39 0471 068 129
claudio.marte...@tis.bz.it http://www.tis.bz.it


Re: Only one usage of each socket address error

2009-10-01 Thread Steinar Asbjørnsen

Hi.

This situation is still bugging me.
I thought I had it fixed yesterday, but no...

Seems like this goes both for deleting and adding, but I'll explain  
the delete situation here:
When I'm deleting documents (~5k) from an index, I get an error message  
saying
Only one usage of each socket address (protocol/network address/port)  
is normally permitted 127.0.0.1:8983.


I've tried both delete by id and delete by query, and both give me  
the same error.
The commands that are giving me the error message are solr.Delete(id) and  
solr.Delete(new SolrQuery("id:" + id)).


The commands are issued with SolrNet, and I'm not sure if this is  
SolrNet or Solr related.


I cannot find anything that helps me out in the catalina-log.
Are there any other logs that should be checked?

I'm grateful for any pointers :)

Thanks,
Steinar

Den 29. sep. 2009 kl. 11.15 skrev Steinar Asbjørnsen:

Seems like the post in the SolrNet group:
http://groups.google.com/group/solrnet/browse_thread/thread/7e3034b626d3e82d?pli=1
helped me get through.


Thanks you solr-user's for helping out too!

Steinar

Videresendt melding:


Fra: Steinar Asbjørnsen steinar...@gmail.com
Dato: 28. september 2009 17.07.15 GMT+02.00
Til: solr-user@lucene.apache.org
Emne: Re: Only one usage of each socket address error

I'm using the add(MyObject) command form ()in a foreach loop to add  
my objects to the index.


In the catalina-log i cannot see anything that helps me out.
It stops at:
28.sep.2009 08:58:40  
org.apache.solr.update.processor.LogUpdateProcessor finish

INFO: {add=[12345]} 0 187
28.sep.2009 08:58:40 org.apache.solr.core.SolrCore execute
INFO: [core2] webapp=/solr path=/update params={} status=0 QTime=187
Which indicates nothing wrong.

Are there any other logs that should be checked?

What it seems like to me at the moment is that the foreach is  
passing objects (documents) to Solr faster than Solr can add them to  
the index. As in, I'm eventually running out of connections (to  
Solr?) or something.


I'm running another incremental update with other objects  
where the foreach isn't quite as fast. This job has added over  
100k documents without failing, and is still going. Whereas the  
problematic job fails after ~3k.


What I've learned through the day, though, is that the index where my  
feed is failing is actually redundant.

I.e I'm off the hook for now.

Still, I'd like to figure out what's going wrong.

Steinar

There's nothing in that output that indicates something we can  
help with over in solr-user land.  What is the call you're making  
to Solr?  Did Solr log anything anomalous?


Erik


On Sep 28, 2009, at 4:41 AM, Steinar Asbjørnsen wrote:

I just posted to the SolrNet-group since i have the exact same(?)  
problem.
Hope I'm not beeing rude posting here as well (since the SolrNet- 
group doesn't seem as active as this mailinglist).


The problem occurs when I'm running an incremental feed(self  
made) of a index.


My post:
[snip]
Whats happening is that i get this error message (in VS):
A first chance exception of type
'SolrNet.Exceptions.SolrConnectionException' occurred in  
SolrNet.DLL

And the web browser (which i use to start the feed says:
System.Data.SqlClient.SqlException: Timeout expired.  The timeout
period elapsed prior to completion of the operation or the server  
is

not responding.
At the time of writing my index contains 15k docs, and lacks  
~700k

docs that the incremental feed should take care of adding to the
index.
The error message appears after 3k docs are added, and before 4k
docs are added.
I'm committing each 1%1000==0.
In addition, autocommit is set to:
<autoCommit>
  <maxDocs>1</maxDocs>
</autoCommit>
More info:
From schema.xml:
<field name="id" type="text" indexed="true" stored="true"
       required="true" />
<field name="name" type="string" indexed="true" stored="true"
       required="false" />
I'm fetching data from a (remote) Sql 2008 Server, using  
sqljdbc4.jar.

And Solr is running on a local Tomcat-installation.
SolrNet version: 0.2.3.0
Solr Specification Version: 1.3.0.2009.08.29.08.05.39

[/snip]
Any suggestions on how to fix this would be much apreceiated.

Regards,
Steinar










Re: Where to place ReversedWildcardFilterFactory in Chain

2009-10-01 Thread Chantal Ackermann

Thanks, Mark!
But I suppose it does matter where in the index chain it goes? I would 
guess it is applied to the tokens, so I suppose I should put it at the 
very end - after WordDelimiter and Lowercase have been applied.



Is that correct?

<analyzer type="index">
  <filter class="solr.WordDelimiterFilterFactory"
          splitOnCaseChange="1" splitOnNumerics="1"
          stemEnglishPossessive="1" generateWordParts="1"
          generateNumberParts="1" catenateAll="1"
          preserveOriginal="1" />
  <filter class="solr.LowerCaseFilterFactory" />
  <filter class="solr.ReversedWildcardFilterFactory" />
</analyzer>


Cheers,
Chantal

Mark Miller schrieb:
 You just put it in the index chain, not the query chain. The
 SolrQueryParser will consult it when building a wildcard search - don't
 put it in the query chain. I know, appears like a bit of magic. That
 Andrzej is a wizard though, so it makes sense ;)

 --
 - Mark

 http://www.lucidimagination.com




Chantal Ackermann wrote:

Hi all,

I would have two questions about the ReversedWildcardFilterFactory:
a) put it into both chains, index and query, or into index only?
b) where exactly in the/each chain do I have to put it? (Do I have to
respect a certain order - as I have wordDelimiter and lowercase in
there, as well.)

More Details:

I understand it is used to allow queries like *sport.

My current configuration for the field I want to use it for contains
this setup:

<fieldType name="text_cn" class="solr.TextField">
  <analyzer>
    <filter class="solr.WordDelimiterFilterFactory"
            splitOnCaseChange="1" splitOnNumerics="1"
            stemEnglishPossessive="1" generateWordParts="1"
            generateNumberParts="1" catenateAll="1"
            preserveOriginal="1" />
    <filter class="solr.LowerCaseFilterFactory" />
  </analyzer>
</fieldType>

The wiki page
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters states for
the ReversedWildcardFF:
Add this filter to the index analyzer, but not the query analyzer.

However, the API for it says it provides functionality at index and
query time (my understanding):
When this factory is added to an analysis chain, it will be used both
for filtering the tokens during indexing, and to determine the query
processing of this field during search.

Any help is greatly appreciated.
Thanks!
Chantal







Re: index size before and after commit

2009-10-01 Thread Walter Underwood
I've now worked on three different search engines and they all have a 3X worst
case on space, so I'm familiar with this case. --wunder

On Oct 1, 2009, at 7:15 AM, Mark Miller wrote:


Nice one ;) It's not technically a case where optimize requires > 2x
though, in case the user asking gets confused. It's a case unrelated to
optimize that can grow your index. Then you need < 2x for the optimize,
since you won't copy the deletes.

It also requires that you jump hoops to delete everything. If you  
delete

everything with *:*, that is smart enough not to just do a delete on
every document - it just creates a new index, allowing the removal of
the old very efficiently.

Def agree on the more disk space.

Walter Underwood wrote:

Here is how you need 3X. First, index everything and optimize. Then
delete everything and reindex without any merges.

You have one full-size index containing only deleted docs, one
full-size index containing reindexed docs, and need that much space
for a third index.

Honestly, disk is cheap, and there is no way to make Lucene work
reliably with less disk. 1TB is a few hundred dollars. You have a  
free

search engine, buy some disk.

wunder

On Oct 1, 2009, at 6:25 AM, Grant Ingersoll wrote:


151GB or as little as from 183GB to 182GB.  Is that size after a
commit close to the size the index would be after an optimize?  For
that matter, are there cases where optimization can take more than
2x?  I've heard of cases but have not observed them in my system.


I seem to recall a case where it can be 3x, but I don't know that it
has been observed much.





--
- Mark

http://www.lucidimagination.com







Re: trie fields and sortMissingLast

2009-10-01 Thread Steve Conover
I just noticed this comment in the default schema:

<!--
   These types should only be used for back compatibility with existing
   indexes, or if sortMissingLast functionality is needed. Use
   Trie based fields instead.
-->

Does that mean TrieFields are never going to get sortMissingLast?

Do you all think that a reasonable strategy is to use a copyField and
use s fields for sorting (only), and trie for everything else?

On Wed, Sep 30, 2009 at 10:59 PM, Steve Conover scono...@gmail.com wrote:
 Am I correct in thinking that trie fields don't support
 sortMissingLast (my tests show that they don't).  If not, is there any
 plan for adding it in?

 Regards,
 Steve



RE: Sorting/paging problem

2009-10-01 Thread Charlie Jackson
Oops, the missing trailing Z was probably just a cut and paste error.

It might be tough to come up with a case that can reproduce it -- it's a
sticky issue. I'll post it if I can, though. 


-Original Message-
From: Chris Hostetter [mailto:hossman_luc...@fucit.org] 
Sent: Tuesday, September 29, 2009 6:08 PM
To: solr-user@lucene.apache.org
Subject: Re: Sorting/paging problem


: <doc><date name="indexed_date">2009-09-23T19:25:03.400Z</date></doc>
: 
: <doc><date name="indexed_date">2009-09-23T19:25:19.951</date></doc>
: 
: <doc><date name="indexed_date">2009-09-23T20:10:07.919Z</date></doc>

is that a cut/paste error, or did you really get a date back from Solr 
w/o the trailing Z ?!?!?!

...

: So, not only is the date sorting wrong, but the exact same document
: shows up on the next page, also still out of date order. I've seen the
: same document show up in 4-5 pages in some cases. It's always the last
: record on the page, too. If I change the page size, the problem seems
to

that is really freaking weird.  can you reproduce this in a simple 
example?  maybe an index that's small enough (and doesn't contain 
confidential information) that you could zip up and post online?



-Hoss



Solr Trunk Heap Space Issues

2009-10-01 Thread Jeff Newburn
I am trying to update to the newest version of Solr from trunk; I had been
running a build from trunk as of May 5th, and I updated and compiled from
trunk as of yesterday (09/30/2009).  When I try to do a full import I am
receiving a GC heap error after changing nothing in the configuration files.
Why would this happen in the most recent version but not in the version from
a few months ago?  The stack trace is below.

Oct 1, 2009 8:34:32 AM org.apache.solr.update.processor.LogUpdateProcessor
finish
INFO: {add=[166400, 166608, 166698, 166800, 166811, 167097, 167316, 167353,
...(83 more)]} 0 35991
Oct 1, 2009 8:34:32 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.<init>(String.java:215)
at com.ctc.wstx.util.TextBuffer.contentsAsString(TextBuffer.java:384)
at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:821)
at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:280)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:879)
at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:719)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2080)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)

Oct 1, 2009 8:40:06 AM org.apache.solr.core.SolrCore execute
INFO: [zeta-main] webapp=/solr path=/update params={} status=500 QTime=5265
Oct 1, 2009 8:40:12 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded

-- 
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562



Re: field collapsing sums

2009-10-01 Thread Joe Calderon
Hello Martijn, thanks for the tip. I tried that approach but ran into two
snags: 1. returning the fields makes collapsing a lot slower for
results, but that might just be the nature of iterating large results;
2. it seems like only dupes of records on the first page are returned.

Or is there a setting I'm missing? Currently I'm only sending
collapse.field=brand and collapse.includeCollapseDocs.fl=num_in_stock

--joe

On Thu, Oct 1, 2009 at 1:14 AM, Martijn v Groningen
martijn.is.h...@gmail.com wrote:
 Hi Joe,

 Currently the patch does not do that, but you can do something else
 that might help you in getting your summed stock.

 In the latest patch you can include fields of collapsed documents in
 the result per distinct field value.
 If you specify collapse.includeCollapseDocs.fl=num_in_stock in the
 request, and let's say you collapse on brand, then in the response you
 will receive the following xml:
 <lst name="collapsedDocs">
   <result name="brand1" numFound="48" start="0">
     <doc>
       <str name="num_in_stock">2</str>
     </doc>
     <doc>
       <str name="num_in_stock">3</str>
     </doc>
     ...
   </result>
   <result name="brand2" numFound="9" start="0">
     ...
   </result>
 </lst>

 On the client side you can do whatever you want with this data and for
 example sum it together. Although the patch does not sum for you, I
 think it will allow to implement your requirement without to much
 hassle.

 Cheers,

 Martijn

 2009/10/1 Matt Weber m...@mattweber.org:
 You might want to see how the stats component works with field collapsing.

 Thanks,

 Matt Weber

 On Sep 30, 2009, at 5:16 PM, Uri Boness wrote:

 Hi,

 At the moment I think the most appropriate place to put it is in the
 AbstractDocumentCollapser (in the getCollapseInfo method). Though, it might
 not be the most efficient.

 Cheers,
 Uri

 Joe Calderon wrote:

 hello all, i have a question on the field collapsing patch, say i have
 an integer field called num_in_stock and i collapse by some other
 column, is it possible to sum up that integer field and return the
 total in the output, if not how would i go about extending the
 collapsing component to support that?


 thx much

 --joe







Re: Where to place ReversedWildcardFilterFactory in Chain

2009-10-01 Thread Andrzej Bialecki

Chantal Ackermann wrote:

Thanks, Mark!
But I suppose it does matter where in the index chain it goes? I would 
guess it is applied to the tokens, so I suppose I should put it at the 
very end - after WordDelimiter and Lowercase have been applied.



Is that correct?

<analyzer type="index">
  <filter class="solr.WordDelimiterFilterFactory"
          splitOnCaseChange="1" splitOnNumerics="1"
          stemEnglishPossessive="1" generateWordParts="1"
          generateNumberParts="1" catenateAll="1"
          preserveOriginal="1" />
  <filter class="solr.LowerCaseFilterFactory" />
  <filter class="solr.ReversedWildcardFilterFactory" />
</analyzer>


Yes. Care should be taken that the query analyzer chain produces the 
same forward tokens, because the code in QueryParser that optionally 
reverses tokens acts on tokens that it receives _after_ all other query 
analyzers have run on the query.



--
Best regards,
Andrzej Bialecki 
 ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



What to set in query.setMaxRows()?

2009-10-01 Thread Paul Tomblin
Sorry about asking this here, but I can't reach wiki.apache.org right now.
 What do I set in query.setMaxRows() to get all the rows?

-- 
http://www.linkedin.com/in/paultomblin


Re: Solr Trunk Heap Space Issues

2009-10-01 Thread Mark Miller
Jeff Newburn wrote:
 I am trying to update to the newest version of solr from trunk as of May
 5th.  I updated and compiled from trunk as of yesterday (09/30/2009).  When
 I try to do a full import I am receiving a GC heap error after changing
 nothing in the configuration files.  Why would this happen in the most
 recent versions but not in the version from a few months ago.  
Good question. The error means it's spending too much time trying to
garbage collect without making much progress.
Why so much more garbage to collect just by updating? Not sure...

 The stack
 trace is below.

 Oct 1, 2009 8:34:32 AM org.apache.solr.update.processor.LogUpdateProcessor
 finish
 INFO: {add=[166400, 166608, 166698, 166800, 166811, 167097, 167316, 167353,
 ...(83 more)]} 0 35991
 Oct 1, 2009 8:34:32 AM org.apache.solr.common.SolrException log
 SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded
 [... stack trace snipped; same as above ...]

 Oct 1, 2009 8:40:06 AM org.apache.solr.core.SolrCore execute
 INFO: [zeta-main] webapp=/solr path=/update params={} status=500 QTime=5265
 Oct 1, 2009 8:40:12 AM org.apache.solr.common.SolrException log
 SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded

   


-- 
- Mark

http://www.lucidimagination.com





Correction: query.setRows

2009-10-01 Thread Paul Tomblin
Sorry, in my last question I meant setRows, not setMaxRows.  What do I pass to 
setRows to get all matches, not just the first 10?

-- Sent from my Palm Prē



Re: Solr Trunk Heap Space Issues

2009-10-01 Thread Bill Au
You probably want to add the following command line option to java to
produce a heap dump:

-XX:+HeapDumpOnOutOfMemoryError

Then you can use jhat to see what's taking up all the space in the heap.
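
For example (a sketch; paths and the dump file name will differ on your
system):

  JAVA_OPTS="$JAVA_OPTS -XX:+HeapDumpOnOutOfMemoryError -XX:HeapDumpPath=/tmp"
  jhat -J-Xmx2g /tmp/java_pid12345.hprof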

Bill

On Thu, Oct 1, 2009 at 11:47 AM, Mark Miller markrmil...@gmail.com wrote:

 Jeff Newburn wrote:
  I am trying to update to the newest version of solr from trunk as of May
  5th.  I updated and compiled from trunk as of yesterday (09/30/2009).
  When
  I try to do a full import I am receiving a GC heap error after changing
  nothing in the configuration files.  Why would this happen in the most
  recent versions but not in the version from a few months ago.
 Good question. The error means its spending too much time trying to
 garbage collect without making much progress.
 Why so much more garbage to collect just by updating? Not sure...

  The stack
  trace is below.
 
  Oct 1, 2009 8:34:32 AM
 org.apache.solr.update.processor.LogUpdateProcessor
  finish
  INFO: {add=[166400, 166608, 166698, 166800, 166811, 167097, 167316,
 167353,
  ...(83 more)]} 0 35991
  Oct 1, 2009 8:34:32 AM org.apache.solr.common.SolrException log
  SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded
  at java.util.Arrays.copyOfRange(Arrays.java:3209)
  at java.lang.String.init(String.java:215)
  at com.ctc.wstx.util.TextBuffer.contentsAsString(TextBuffer.java:384)
  at
 com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:821)
  at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:280)
  at
 org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
  at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
  at
 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentSt
  reamHandlerBase.java:54)
  at
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.
  java:131)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:3
  38)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:
  241)
  at
 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Application
  FilterChain.java:235)
  at
 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterCh
  ain.java:206)
  at
 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.ja
  va:233)
  at
 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.ja
  va:175)
  at
 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128
  )
  at
 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102
  )
  at
 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java
  :109)
  at
 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
  at
 
 org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:
  879)
  at
 
 org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(H
  ttp11NioProtocol.java:719)
  at
 
 org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:
  2080)
  at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.ja
  va:886)
  at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:9
  08)
  at java.lang.Thread.run(Thread.java:619)
 
  Oct 1, 2009 8:40:06 AM org.apache.solr.core.SolrCore execute
  INFO: [zeta-main] webapp=/solr path=/update params={} status=500
 QTime=5265
  Oct 1, 2009 8:40:12 AM org.apache.solr.common.SolrException log
  SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded
 
 


 --
 - Mark

 http://www.lucidimagination.com






Re: Where to place ReversedWildcardFilterFactory in Chain

2009-10-01 Thread Chantal Ackermann

Hi Andrzej,

thanks! Unfortunately, I get a ClassNotFoundException for the 
solr.ReversedWildcardFilterFactory with my nightly build from 22nd of 
September. I've found the corresponding JIRA issue, but from the wiki 
it's not obvious that this might require a patch? I'll have a closer 
look at the JIRA issue, in any case.


Cheers,
Chantal


Andrzej Bialecki schrieb:

Chantal Ackermann wrote:

Thanks, Mark!
But I suppose it does matter where in the index chain it goes? I would
guess it is applied to the tokens, so I suppose I should put it at the
very end - after WordDelimiter and Lowercase have been applied.


Is that correct?

<analyzer type="index">
  <filter class="solr.WordDelimiterFilterFactory"
          splitOnCaseChange="1" splitOnNumerics="1"
          stemEnglishPossessive="1" generateWordParts="1"
          generateNumberParts="1" catenateAll="1"
          preserveOriginal="1" />
  <filter class="solr.LowerCaseFilterFactory" />
  <filter class="solr.ReversedWildcardFilterFactory" />
</analyzer>


Yes. Care should be taken that the query analyzer chain produces the
same forward tokens, because the code in QueryParser that optionally
reverses tokens acts on tokens that it receives _after_ all other query
analyzers have run on the query.
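
For illustration, a matching query-time chain might look like the sketch below (tokenizer omitted, as in the index-time snippet above). The reversal itself is applied by the query parser once it sees the factory in the index-time chain, so the filter is not repeated at query time -- treat this as a sketch, not a drop-in config:

<analyzer type="query">
  <filter class="solr.WordDelimiterFilterFactory"
          splitOnCaseChange="1" splitOnNumerics="1"
          stemEnglishPossessive="1" generateWordParts="1"
          generateNumberParts="1" catenateAll="1"
          preserveOriginal="1" />
  <filter class="solr.LowerCaseFilterFactory" />
</analyzer>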


--
Best regards,
Andrzej Bialecki 
  ___. ___ ___ ___ _ _   __
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com



Re: Problem getting Solr home from JNDI in Tomcat

2009-10-01 Thread Andrew Clegg



Andrew Clegg wrote:
 
 
 hossman wrote:
 
 
 This is why the examples of using context files on the wiki talk about 
 keeping the war *outside* of the webapps directory, and using docBase in 
 your Context declaration...
   http://wiki.apache.org/solr/SolrTomcat
 
 
 
 Great, I'll try it this way and see if it clears up. Is it okay to keep
 the war file *inside* the Solr home directory (/opt/solr in my case) so
 it's all self-contained?
 

For the benefit of future searchers -- I tried it this way and it works
fine. Thanks again to everyone for helping.
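
A minimal sketch of the kind of context fragment this ends up as, assuming the war lives at /opt/solr/solr.war and the fragment goes in conf/Catalina/localhost/solr.xml (names and paths are examples, following the SolrTomcat wiki convention):

<Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String" value="/opt/solr" override="true"/>
</Context>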

Andrew.

-- 
View this message in context: 
http://www.nabble.com/Problem-getting-Solr-home-from-JNDI-in-Tomcat-tp25662200p25701748.html
Sent from the Solr - User mailing list archive at Nabble.com.



ExtractingRequestHandler unknown field 'stream_source_info'

2009-10-01 Thread Tricia Williams

Hi All,

   I'm trying Solr Cell outside of the example and running into trouble 
because I can't refer to 
http://wiki.apache.org/solr/ExtractingRequestHandler (the wiki's down).  
After realizing I needed to copy all the jars from /example/solr/lib to 
my index's /lib dir, I am now hitting this particular wall:


INFO: [] webapp=/solr path=/update/extract 
params={myfile=MHGL016341T.pdf&commit=true&literal.id=MHGL.1634} 
status=0 QTime=5967
1-Oct-2009 10:06:34 AM 
org.apache.solr.update.processor.LogUpdateProcessor finish

INFO: {} 0 260248
1-Oct-2009 10:06:38 AM org.apache.solr.common.SolrException log
SEVERE: org.apache.solr.common.SolrException: ERROR:unknown field 
'stream_source_info'
at 
org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:289)
at 
org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60)
at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:118)
at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:123)
at 
org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:192)
at 
org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712)
at 
org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139)

at org.mortbay.jetty.Server.handle(Server.java:285)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835)

at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378)
at 
org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226)
at 
org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442)

while running:
curl "http://localhost:8983/solr/update/extract?literal.id=MHGL.1634&commit=true" \
 -F "myfile=@MHGL016341T.pdf"


It feels like I'm not mapping something correctly either in my POST 
request or in solrconfig.xml/schema.xml.  I can see that 
STREAM_SOURCE_INFO is supposed to be an internal field from the code but 
I'm not following why it would cause this error.


Any suggestions would be appreciated.

Many Thanks,
Tricia


Quotes in query string cause NullPointerException

2009-10-01 Thread Andrew Clegg

Hi folks,

I'm using the 2009-09-30 build, and any single or double quotes in the query
string cause an NPE. Is this normal behaviour? I never tried it with my
previous installation.

Example:

http://myserver:8080/solr/select/?title:%22Creatine+kinase%22

(I've also tried without the URL encoding, no difference)

Response:

HTTP Status 500 - null java.lang.NullPointerException at
java.io.StringReader.init(StringReader.java:33) at
org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:173) at
org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78) at
org.apache.solr.search.QParser.getQuery(QParser.java:131) at
org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)
at
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
at
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
at
org.apache.catalina.valves.RequestFilterValve.process(RequestFilterValve.java:269)
at
org.apache.catalina.valves.RemoteAddrValve.invoke(RemoteAddrValve.java:81)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:568)
at
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at
org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
at
org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
at
org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
at
org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
at
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844)
at
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583)
at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619) 

Single quotes have the same effect.

Is there another way to specify exact phrases?

Thanks,

Andrew.

-- 
View this message in context: 
http://www.nabble.com/Quotes-in-query-string-cause-NullPointerException-tp25702207p25702207.html
Sent from the Solr - User mailing list archive at Nabble.com.



Re: ExtractingRequestHandler unknown field 'stream_source_info'

2009-10-01 Thread Walter Lewis

On 1 Oct 09, at 12:46 PM, Tricia Williams wrote:


STREAM_SOURCE_INFO



https://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-2

appears to be a constant from this page:
  http://lucene.apache.org/solr/api/constant-values.html

This has it embedded as an arr in the results
http://www.nabble.com/Solr-question-td25271706.html

Whether any of these help or not ...

Walter


Re: Where to place ReversedWildcardFilterFactory in Chain

2009-10-01 Thread Mark Miller
It was added to trunk on the 11th and shouldn't require a patch. Are you
sure that nightly was actually built after then?

solr.ReversedWildcardFilterFactory should work fine.

Chantal Ackermann wrote:
 Hi Andrzej,

 thanks! Unfortunately, I get a ClassNotFoundException for the
 solr.ReversedWildcardFilterFactory with my nightly build from 22nd of
 September. I've found the corresponding JIRA issue, but from the wiki
 it's not obvious that this might require a patch? I'll have a closer
 look at the JIRA issue, in any case.

 Cheers,
 Chantal


 Andrzej Bialecki schrieb:
 Chantal Ackermann wrote:
 Thanks, Mark!
 But I suppose it does matter where in the index chain it goes? I would
 guess it is applied to the tokens, so I suppose I should put it at the
 very end - after WordDelimiter and Lowercase have been applied.


 Is that correct?

 <analyzer type="index">
   <filter class="solr.WordDelimiterFilterFactory"
           splitOnCaseChange="1" splitOnNumerics="1"
           stemEnglishPossessive="1" generateWordParts="1"
           generateNumberParts="1" catenateAll="1"
           preserveOriginal="1" />
   <filter class="solr.LowerCaseFilterFactory" />
   <filter class="solr.ReversedWildcardFilterFactory" />
 </analyzer>

 Yes. Care should be taken that the query analyzer chain produces the
 same forward tokens, because the code in QueryParser that optionally
 reverses tokens acts on tokens that it receives _after_ all other query
 analyzers have run on the query.


 -- 
 Best regards,
 Andrzej Bialecki 
   ___. ___ ___ ___ _ _   __
 [__ || __|__/|__||\/|  Information Retrieval, Semantic Web
 ___|||__||  \|  ||  |  Embedded Unix, System Integration
 http://www.sigram.com  Contact: info at sigram dot com



-- 
- Mark

http://www.lucidimagination.com





Re: Quotes in query string cause NullPointerException

2009-10-01 Thread Erik Hatcher

don't forget q=...  :)

Erik

On Oct 1, 2009, at 9:49 AM, Andrew Clegg wrote:



Hi folks,

I'm using the 2009-09-30 build, and any single or double quotes in  
the query
string cause an NPE. Is this normal behaviour? I never tried it with  
my

previous installation.

Example:

http://myserver:8080/solr/select/?title:%22Creatine+kinase%22

(I've also tried without the URL encoding, no difference)

Response:

HTTP Status 500 - null java.lang.NullPointerException at
java.io.StringReader.init(StringReader.java:33) at
org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java: 
173) at
org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java: 
78) at

org.apache.solr.search.QParser.getQuery(QParser.java:131) at
org 
.apache 
.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)

at
org 
.apache 
.solr 
.handler 
.component.SearchHandler.handleRequestBody(SearchHandler.java:174)

at
org 
.apache 
.solr 
.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at
org 
.apache 
.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)

at
org 
.apache 
.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)

at
org 
.apache 
.catalina 
.core 
.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java: 
235)

at
org 
.apache 
.catalina 
.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)

at
org 
.apache 
.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java: 
233)

at
org 
.apache 
.catalina.core.StandardContextValve.invoke(StandardContextValve.java: 
175)

at
org 
.apache 
.catalina.valves.RequestFilterValve.process(RequestFilterValve.java: 
269)

at
org 
.apache.catalina.valves.RemoteAddrValve.invoke(RemoteAddrValve.java: 
81)
at  
org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java: 
568)

at
org 
.apache 
.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)

at
org 
.jstripe 
.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)

at
org 
.jstripe 
.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)

at
org 
.jstripe 
.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)

at
org 
.jstripe 
.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)

at
org 
.apache 
.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)

at
org 
.apache 
.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java: 
109)

at
org 
.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java: 
286)

at
org 
.apache.coyote.http11.Http11Processor.process(Http11Processor.java: 
844)

at
org.apache.coyote.http11.Http11Protocol 
$Http11ConnectionHandler.process(Http11Protocol.java:583)
at org.apache.tomcat.util.net.JIoEndpoint 
$Worker.run(JIoEndpoint.java:447)

at java.lang.Thread.run(Thread.java:619)

Single quotes have the same effect.

Is there another way to specify exact phrases?

Thanks,

Andrew.

--
View this message in context: 
http://www.nabble.com/Quotes-in-query-string-cause-NullPointerException-tp25702207p25702207.html
Sent from the Solr - User mailing list archive at Nabble.com.





Re: Quotes in query string cause NullPointerException

2009-10-01 Thread Andrew Clegg

Sorry! I'm officially a complete idiot.

Personally I'd try to catch things like that and rethrow a
'QueryParseException' or something -- but don't feel under any obligation to
listen to me because, well, I'm an idiot.

Thanks :-)

Andrew.


Erik Hatcher-4 wrote:
 
 don't forget q=...  :)
 
   Erik
 
 On Oct 1, 2009, at 9:49 AM, Andrew Clegg wrote:
 

 Hi folks,

 I'm using the 2009-09-30 build, and any single or double quotes in  
 the query
 string cause an NPE. Is this normal behaviour? I never tried it with  
 my
 previous installation.

 Example:

 http://myserver:8080/solr/select/?title:%22Creatine+kinase%22

 (I've also tried without the URL encoding, no difference)

 Response:

 HTTP Status 500 - null java.lang.NullPointerException at
 java.io.StringReader.init(StringReader.java:33) at
 org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java: 
 173) at
 org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java: 
 78) at
 org.apache.solr.search.QParser.getQuery(QParser.java:131) at
 org 
 .apache 
 .solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)
 at
 org 
 .apache 
 .solr 
 .handler 
 .component.SearchHandler.handleRequestBody(SearchHandler.java:174)
 at
 org 
 .apache 
 .solr 
 .handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at
 org 
 .apache 
 .solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
 at
 org 
 .apache 
 .solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
 at
 org 
 .apache 
 .catalina 
 .core 
 .ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java: 
 235)
 at
 org 
 .apache 
 .catalina 
 .core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
 at
 org 
 .apache 
 .catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java: 
 233)
 at
 org 
 .apache 
 .catalina.core.StandardContextValve.invoke(StandardContextValve.java: 
 175)
 at
 org 
 .apache 
 .catalina.valves.RequestFilterValve.process(RequestFilterValve.java: 
 269)
 at
 org 
 .apache.catalina.valves.RemoteAddrValve.invoke(RemoteAddrValve.java: 
 81)
 at  
 org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java: 
 568)
 at
 org 
 .apache 
 .catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
 at
 org 
 .jstripe 
 .tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
 at
 org 
 .jstripe 
 .tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
 at
 org 
 .jstripe 
 .tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
 at
 org 
 .jstripe 
 .tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
 at
 org 
 .apache 
 .catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
 at
 org 
 .apache 
 .catalina.core.StandardEngineValve.invoke(StandardEngineValve.java: 
 109)
 at
 org 
 .apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java: 
 286)
 at
 org 
 .apache.coyote.http11.Http11Processor.process(Http11Processor.java: 
 844)
 at
 org.apache.coyote.http11.Http11Protocol 
 $Http11ConnectionHandler.process(Http11Protocol.java:583)
 at org.apache.tomcat.util.net.JIoEndpoint 
 $Worker.run(JIoEndpoint.java:447)
 at java.lang.Thread.run(Thread.java:619)

 Single quotes have the same effect.

 Is there another way to specify exact phrases?

 Thanks,

 Andrew.

 -- 
 View this message in context:
 http://www.nabble.com/Quotes-in-query-string-cause-NullPointerException-tp25702207p25702207.html
 Sent from the Solr - User mailing list archive at Nabble.com.

 
 
 

-- 
View this message in context: 
http://www.nabble.com/Quotes-in-query-string-cause-NullPointerException-tp25702207p25704050.html
Sent from the Solr - User mailing list archive at Nabble.com.



How to access the information from SolrJ

2009-10-01 Thread Paul Tomblin
When I do a query directly from the web, the XML of the response
includes how many results would have been returned if it hadn't
restricted itself to the first 10 rows:

For instance, the query:
http://localhost:8080/solrChunk/nutch/select/?q=*:*fq=category:mysites
returns:
<response>
<lst name='responseHeader'>
<int name='status'>0</int>
<int name='QTime'>0</int>
<lst name='params'>
<str name='q'>*:*</str>
<str name='fq'>category:mysites</str>
</lst>
</lst>
<result name='response' numFound='1251' start='0'>
<doc>
<str name='category'>mysites</str>
<long name='chunkNum'>0</long>
<str name='chunkUrl'>http://localhost/Chunks/mysites/0-http___xcski.com_.xml</str>
<str name='concept'>Anatomy</str>
...

The value I'm talking about is in the numFound attribute of the result tag.

I don't see any way to retrieve it through SolrJ - it's not in the
QueryResponse.getHeader(), for instance.  Can I retrieve it somewhere?
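
(For reference, a hedged SolrJ sketch of where I'd expect the count to live -- on the SolrDocumentList returned by getResults(), not in the header; class names as in the SolrJ 1.3/1.4 API, URL is the example above:)

// imports: org.apache.solr.client.solrj.SolrServer, org.apache.solr.client.solrj.SolrQuery,
//          org.apache.solr.client.solrj.impl.CommonsHttpSolrServer,
//          org.apache.solr.client.solrj.response.QueryResponse, org.apache.solr.common.SolrDocumentList
SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solrChunk/nutch");
SolrQuery query = new SolrQuery("*:*");
query.addFilterQuery("category:mysites");
QueryResponse rsp = server.query(query);
SolrDocumentList docs = rsp.getResults();
long total = docs.getNumFound();   // 1251 in the example above, independent of rows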

--
http://www.linkedin.com/in/paultomblin


Re: Solr Trunk Heap Space Issues

2009-10-01 Thread Jeff Newburn
Added the parameter and it didn't seem to dump when it hit the gc limit
error.  Any other thoughts?

-- 
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562


 From: Bill Au bill.w...@gmail.com
 Reply-To: solr-user@lucene.apache.org
 Date: Thu, 1 Oct 2009 12:16:53 -0400
 To: solr-user@lucene.apache.org
 Subject: Re: Solr Trunk Heap Space Issues
 
 You probably want to add the following command line option to java to
 produce a heap dump:
 
 -XX:+HeapDumpOnOutOfMemoryError
 
 Then you can use jhat to see what's taking up all the space in the heap.
 
 Bill
 
 On Thu, Oct 1, 2009 at 11:47 AM, Mark Miller markrmil...@gmail.com wrote:
 
 Jeff Newburn wrote:
 I am trying to update to the newest version of solr from trunk as of May
 5th.  I updated and compiled from trunk as of yesterday (09/30/2009).
  When
 I try to do a full import I am receiving a GC heap error after changing
 nothing in the configuration files.  Why would this happen in the most
 recent versions but not in the version from a few months ago.
 Good question. The error means its spending too much time trying to
 garbage collect without making much progress.
 Why so much more garbage to collect just by updating? Not sure...
 
 The stack
 trace is below.
 
 Oct 1, 2009 8:34:32 AM
 org.apache.solr.update.processor.LogUpdateProcessor
 finish
 INFO: {add=[166400, 166608, 166698, 166800, 166811, 167097, 167316,
 167353,
 ...(83 more)]} 0 35991
 Oct 1, 2009 8:34:32 AM org.apache.solr.common.SolrException log
 SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded
 at java.util.Arrays.copyOfRange(Arrays.java:3209)
 at java.lang.String.init(String.java:215)
 at com.ctc.wstx.util.TextBuffer.contentsAsString(TextBuffer.java:384)
 at
 com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:821)
 at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:280)
 at
 org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
 at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
 at
 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentSt
 reamHandlerBase.java:54)
 at
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.
 java:131)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
 at
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:3
 38)
 at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:
 241)
 at
 
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Application
 FilterChain.java:235)
 at
 
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterCh
 ain.java:206)
 at
 
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.ja
 va:233)
 at
 
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.ja
 va:175)
 at
 
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128
 )
 at
 
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102
 )
 at
 
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java
 :109)
 at
 
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
 at
 
 org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:
 879)
 at
 
 org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(H
 ttp11NioProtocol.java:719)
 at
 
 org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:
 2080)
 at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.ja
 va:886)
 at
 
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:9
 08)
 at java.lang.Thread.run(Thread.java:619)
 
 Oct 1, 2009 8:40:06 AM org.apache.solr.core.SolrCore execute
 INFO: [zeta-main] webapp=/solr path=/update params={} status=500
 QTime=5265
 Oct 1, 2009 8:40:12 AM org.apache.solr.common.SolrException log
 SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded
 
 
 
 
 --
 - Mark
 
 http://www.lucidimagination.com
 
 
 
 



Re: Solr Trunk Heap Space Issues

2009-10-01 Thread Mark Miller
Jeff Newburn wrote:
 Added the parameter and it didn't seem to dump when it hit the gc limit
 error.  Any other thoughts?

   
You might use jmap to take a look at the heap (you can do it while it's
live with Java 6) or to force a heap dump when you specify.
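
For example (these are the Java 6 forms I'd try; pid and file name are illustrative):

  jmap -histo:live <pid>                               (class histogram of live objects)
  jmap -dump:live,format=b,file=solr-heap.hprof <pid>  (force a binary heap dump)
  jhat solr-heap.hprof                                 (browse the dump)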

Since it's spending 98% of the time in GC and recovering less than 2% of
the heap, it's likely you're right at the memory limits of what your app
now requires. Why that should be different from what it was in a March
build, I still dunno.

Can you give us the info on your stats page regarding the new fieldcache
insanity checker?

-- 
- Mark

http://www.lucidimagination.com





Re: Solr Trunk Heap Space Issues

2009-10-01 Thread Mark Miller
Mark Miller wrote:

 You might use jmap to take a look at the heap (you can do it well its
 live with Java6)
Errr - just so I don't screw anyone in a production environment - it
will freeze your app while it's getting the info.

-- 
- Mark

http://www.lucidimagination.com





Re: Why isn't the DateField implementation of ISO 8601 broader?

2009-10-01 Thread Lance Norskog
 My question is why isn't the DateField implementation of ISO 8601 broader so 
 that it could include  and MM as acceptable date strings?  What would 
 it take to do so?

Nobody ever cared? But yes, you're right, the spurious precision is
annoying. However, there is no fuzzy search for dates so the
precision is always used. Let's say I want to limit it to 19th-century
American culture: 1790-1910 is a fairly contiguous sequence
in US history, with a massive break at 1910 for WW1.

 Are there any work-arounds for faceting by century, year, month without 
 creating new fields in my schema?  The last resort would be to create these 
 new fields but I'm hoping to leverage the power of the DateField and the trie 
 to replace range stuff.

There are no workarounds as yet. You do not have to store the
century/year etc. fields, only index them.

Tries do not support faceting yet.
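
If you do go the extra-fields route, a sketch of what they might look like (field names are made up; you'd populate them from your indexing code, since copyField can't reformat the value):

  <field name="date_century" type="string" indexed="true" stored="false"/>
  <field name="date_year"    type="string" indexed="true" stored="false"/>
  <field name="date_month"   type="string" indexed="true" stored="false"/>

and then facet with facet.field=date_century, facet.field=date_year, and so on.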

 Some interesting observations from tinkering with the DateFieldTest:
   * 2003-03-00T00:00:00Z becomes 2003-02-28T00:00:00Z

The date parser should blow up with these values!


Re: ExtractingRequestHandler unknown field 'stream_source_info'

2009-10-01 Thread Tricia Williams

If the wiki isn't working, 
https://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-2 
gave me more information.  The LucidImagination article helps too.

Now that the wiki is up again it is more obvious that I need to add:

<str name="fmap.content">fulltext</str>
<str name="defaultField">text</str>

to my solrconfig.xml

Tricia


Re: Quotes in query string cause NullPointerException

2009-10-01 Thread Israel Ekpo
Don't be too hard on yourself.

Sometimes, mistakes like that can happen even to the most brilliant and most
experienced.

On Thu, Oct 1, 2009 at 2:15 PM, Andrew Clegg andrew.cl...@gmail.com wrote:


 Sorry! I'm officially a complete idiot.

 Personally I'd try to catch things like that and rethrow a
 'QueryParseException' or something -- but don't feel under any obligation
 to
 listen to me because, well, I'm an idiot.

 Thanks :-)

 Andrew.


 Erik Hatcher-4 wrote:
 
  don't forget q=...  :)
 
Erik
 
  On Oct 1, 2009, at 9:49 AM, Andrew Clegg wrote:
 
 
  Hi folks,
 
  I'm using the 2009-09-30 build, and any single or double quotes in
  the query
  string cause an NPE. Is this normal behaviour? I never tried it with
  my
  previous installation.
 
  Example:
 
  http://myserver:8080/solr/select/?title:%22Creatine+kinase%22
 
  (I've also tried without the URL encoding, no difference)
 
  Response:
 
  HTTP Status 500 - null java.lang.NullPointerException at
  java.io.StringReader.init(StringReader.java:33) at
  org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:
  173) at
  org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:
  78) at
  org.apache.solr.search.QParser.getQuery(QParser.java:131) at
  org
  .apache
  .solr.handler.component.QueryComponent.prepare(QueryComponent.java:89)
  at
  org
  .apache
  .solr
  .handler
  .component.SearchHandler.handleRequestBody(SearchHandler.java:174)
  at
  org
  .apache
  .solr
  .handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at
  org
  .apache
  .solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
  at
  org
  .apache
  .solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
  at
  org
  .apache
  .catalina
  .core
  .ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:
  235)
  at
  org
  .apache
  .catalina
  .core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
  at
  org
  .apache
  .catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:
  233)
  at
  org
  .apache
  .catalina.core.StandardContextValve.invoke(StandardContextValve.java:
  175)
  at
  org
  .apache
  .catalina.valves.RequestFilterValve.process(RequestFilterValve.java:
  269)
  at
  org
  .apache.catalina.valves.RemoteAddrValve.invoke(RemoteAddrValve.java:
  81)
  at
  org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:
  568)
  at
  org
  .apache
  .catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
  at
  org
  .jstripe
  .tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
  at
  org
  .jstripe
  .tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
  at
  org
  .jstripe
  .tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
  at
  org
  .jstripe
  .tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
  at
  org
  .apache
  .catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
  at
  org
  .apache
  .catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:
  109)
  at
  org
  .apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:
  286)
  at
  org
  .apache.coyote.http11.Http11Processor.process(Http11Processor.java:
  844)
  at
  org.apache.coyote.http11.Http11Protocol
  $Http11ConnectionHandler.process(Http11Protocol.java:583)
  at org.apache.tomcat.util.net.JIoEndpoint
  $Worker.run(JIoEndpoint.java:447)
  at java.lang.Thread.run(Thread.java:619)
 
  Single quotes have the same effect.
 
  Is there another way to specify exact phrases?
 
  Thanks,
 
  Andrew.
 
  --
  View this message in context:
 
 http://www.nabble.com/Quotes-in-query-string-cause-NullPointerException-tp25702207p25702207.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 
 
 
 

 --
 View this message in context:
 http://www.nabble.com/Quotes-in-query-string-cause-NullPointerException-tp25702207p25704050.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.


Re: trie fields and sortMissingLast

2009-10-01 Thread Lance Norskog
Trie fields also do not support faceting. They also take more RAM in
some operations.

Given these defects, I'm not sure that promoting tries as the default
is appropriate at this time. (I'm sure this is an old argument. :)

On Thu, Oct 1, 2009 at 7:39 AM, Steve Conover scono...@gmail.com wrote:
 I just noticed this comment in the default schema:

 <!--
       These types should only be used for back compatibility with existing
       indexes, or if sortMissingLast functionality is needed. Use
 Trie based fields instead.
    -->

 Does that mean TrieFields are never going to get sortMissingLast?

 Do you all think that a reasonable strategy is to use a copyField and
 use s fields for sorting (only), and trie for everything else?

 On Wed, Sep 30, 2009 at 10:59 PM, Steve Conover scono...@gmail.com wrote:
 Am I correct in thinking that trie fields don't support
 sortMissingLast (my tests show that they don't).  If not, is there any
 plan for adding it in?

 Regards,
 Steve





-- 
Lance Norskog
goks...@gmail.com


Re: Quotes in query string cause NullPointerException

2009-10-01 Thread Erik Hatcher
Indeed... and the only reason I knew the answer right away is because  
I've experienced this myself numerous times :)


Erik

On Oct 1, 2009, at 11:46 AM, Israel Ekpo wrote:


Don't be too hard on yourself.

Sometimes, mistakes like that can happen even to the most brilliant  
and most

experienced.

On Thu, Oct 1, 2009 at 2:15 PM, Andrew Clegg  
andrew.cl...@gmail.com wrote:




Sorry! I'm officially a complete idiot.

Personally I'd try to catch things like that and rethrow a
'QueryParseException' or something -- but don't feel under any  
obligation

to
listen to me because, well, I'm an idiot.

Thanks :-)

Andrew.


Erik Hatcher-4 wrote:


don't forget q=...  :)

 Erik

On Oct 1, 2009, at 9:49 AM, Andrew Clegg wrote:



Hi folks,

I'm using the 2009-09-30 build, and any single or double quotes in
the query
string cause an NPE. Is this normal behaviour? I never tried it  
with

my
previous installation.

Example:

http://myserver:8080/solr/select/?title:%22Creatine+kinase%22

(I've also tried without the URL encoding, no difference)

Response:

HTTP Status 500 - null java.lang.NullPointerException at
java.io.StringReader.init(StringReader.java:33) at
org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:
173) at
org 
.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:

78) at
org.apache.solr.search.QParser.getQuery(QParser.java:131) at
org
.apache
.solr 
.handler.component.QueryComponent.prepare(QueryComponent.java:89)

at
org
.apache
.solr
.handler
.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
at
org
.apache
.solr
.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java: 
131)

at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at
org
.apache
.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java: 
338)

at
org
.apache
.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java: 
241)

at
org
.apache
.catalina
.core
.ApplicationFilterChain 
.internalDoFilter(ApplicationFilterChain.java:

235)
at
org
.apache
.catalina
.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java: 
206)

at
org
.apache
.catalina 
.core.StandardWrapperValve.invoke(StandardWrapperValve.java:

233)
at
org
.apache
.catalina 
.core.StandardContextValve.invoke(StandardContextValve.java:

175)
at
org
.apache
.catalina 
.valves.RequestFilterValve.process(RequestFilterValve.java:

269)
at
org
.apache 
.catalina.valves.RemoteAddrValve.invoke(RemoteAddrValve.java:

81)
at
org 
.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:

568)
at
org
.apache
.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at
org
.jstripe
.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
at
org
.jstripe
.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
at
org
.jstripe
.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
at
org
.jstripe
.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20)
at
org
.apache
.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at
org
.apache
.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:
109)
at
org
.apache 
.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:

286)
at
org
.apache.coyote.http11.Http11Processor.process(Http11Processor.java:
844)
at
org.apache.coyote.http11.Http11Protocol
$Http11ConnectionHandler.process(Http11Protocol.java:583)
at org.apache.tomcat.util.net.JIoEndpoint
$Worker.run(JIoEndpoint.java:447)
at java.lang.Thread.run(Thread.java:619)

Single quotes have the same effect.

Is there another way to specify exact phrases?

Thanks,

Andrew.

--
View this message in context:


http://www.nabble.com/Quotes-in-query-string-cause-NullPointerException-tp25702207p25702207.html

Sent from the Solr - User mailing list archive at Nabble.com.







--
View this message in context:
http://www.nabble.com/Quotes-in-query-string-cause-NullPointerException-tp25702207p25704050.html
Sent from the Solr - User mailing list archive at Nabble.com.





--
Good Enough is not good enough.
To give anything less than your best is to sacrifice the gift.
Quality First. Measure Twice. Cut Once.




Re: ExtractingRequestHandler unknown field 'stream_source_info'

2009-10-01 Thread Lance Norskog
For future reference, the Solr & Lucene wikis and mailing lists are
indexed on http://www.lucidimagination.com/search/

On Thu, Oct 1, 2009 at 11:40 AM, Tricia Williams
williams.tri...@gmail.com wrote:
 If the wiki isn't working


 https://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-2

 gave me more information.  The LucidImagination article helps too.

 Now that the wiki is up again it is more obvious that I need to add:

 <str name="fmap.content">fulltext</str>
 <str name="defaultField">text</str>

 to my solrconfig.xml

 Tricia




-- 
Lance Norskog
goks...@gmail.com


Re: Solr Trunk Heap Space Issues

2009-10-01 Thread Yonik Seeley
On Thu, Oct 1, 2009 at 11:41 AM, Jeff Newburn jnewb...@zappos.com wrote:
 I am trying to update to the newest version of solr from trunk as of May
 5th.

Tons of changes since... including the per-segment
searching/sorting/function queries (I think).

Do you sort on any single valued fields that you also facet on?
Do you use ord() or rord() in any function queries?

Unfortunately, some of these things will take up more memory because
some things still cache FieldCache elements with the top-level reader,
while some use segment readers.  The direction is going toward all
segment readers, but we're not there yet (and won't be for 1.4).
ord() rord() will never be fixed... people need to migrate to
something else.

http://issues.apache.org/jira/browse/SOLR- is the main issue for this.

Of course, I've really only been talking about search-related changes.
 Nothing on the indexing side should cause greater memory usage,
but perhaps the indexing side could run out of memory due to the
search side taking up more.

-Yonik
http://www.lucidimagination.com

  I updated and compiled from trunk as of yesterday (09/30/2009).  When
 I try to do a full import I am receiving a GC heap error after changing
 nothing in the configuration files.  Why would this happen in the most
 recent versions but not in the version from a few months ago.  The stack
 trace is below.

 Oct 1, 2009 8:34:32 AM org.apache.solr.update.processor.LogUpdateProcessor
 finish
 INFO: {add=[166400, 166608, 166698, 166800, 166811, 167097, 167316, 167353,
 ...(83 more)]} 0 35991
 Oct 1, 2009 8:34:32 AM org.apache.solr.common.SolrException log
 SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOfRange(Arrays.java:3209)
    at java.lang.String.init(String.java:215)
    at com.ctc.wstx.util.TextBuffer.contentsAsString(TextBuffer.java:384)
    at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:821)
    at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:280)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
    at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentSt
 reamHandlerBase.java:54)
    at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.
 java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:3
 38)
    at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:
 241)
    at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Application
 FilterChain.java:235)
    at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterCh
 ain.java:206)
    at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.ja
 va:233)
    at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.ja
 va:175)
    at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128
 )
    at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102
 )
    at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java
 :109)
    at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at
 org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:
 879)
    at
 org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(H
 ttp11NioProtocol.java:719)
    at
 org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:
 2080)
    at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.ja
 va:886)
    at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:9
 08)
    at java.lang.Thread.run(Thread.java:619)

 Oct 1, 2009 8:40:06 AM org.apache.solr.core.SolrCore execute
 INFO: [zeta-main] webapp=/solr path=/update params={} status=500 QTime=5265
 Oct 1, 2009 8:40:12 AM org.apache.solr.common.SolrException log
 SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded

 --
 Jeff Newburn
 Software Engineer, Zappos.com
 jnewb...@zappos.com - 702-943-7562




Re: index size before and after commit

2009-10-01 Thread Lance Norskog
I've heard there is a new partial optimize feature in Lucene, but it
is not mentioned in the Solr or Lucene wikis so I cannot advise you
how to use it.

On a previous project we had a 500GB index for 450m documents. It took
14 hours to optimize. We found that Solr worked well (given enough RAM
for sorting and faceting requests) but that the IT logistics of a 500G
fileset were too much.

Also, if you want your query servers to continue serving while
propagating the newly optimized index, you need 2X space to store both
copies on the slave during the transfer. For us this took 35 minutes over
1G Ethernet.

On Thu, Oct 1, 2009 at 7:36 AM, Walter Underwood wun...@wunderwood.org wrote:
 I've now worked on three different search engines and they all have a 3X
 worst
 case on space, so I'm familiar with this case. --wunder

 On Oct 1, 2009, at 7:15 AM, Mark Miller wrote:

 Nice one ;) It's not technically a case where optimize requires > 2x
 though, in case the user asking gets confused. It's a case unrelated to
 optimize that can grow your index. Then you need > 2x for the optimize,
 since you won't copy the deletes.

 It also requires that you jump hoops to delete everything. If you delete
 everything with *:*, that is smart enough not to just do a delete on
 every document - it just creates a new index, allowing the removal of
 the old very efficiently.

 Def agree on the more disk space.

 Walter Underwood wrote:

 Here is how you need 3X. First, index everything and optimize. Then
 delete everything and reindex without any merges.

 You have one full-size index containing only deleted docs, one
 full-size index containing reindexed docs, and need that much space
 for a third index.

 Honestly, disk is cheap, and there is no way to make Lucene work
 reliably with less disk. 1TB is a few hundred dollars. You have a free
 search engine, buy some disk.

 wunder

 On Oct 1, 2009, at 6:25 AM, Grant Ingersoll wrote:

 151GB or as little as from 183GB to 182GB.  Is that size after a
 commit close to the size the index would be after an optimize?  For
 that matter, are there cases where optimization can take more than
 2x?  I've heard of cases but have not observed them in my system.

 I seem to recall a case where it can be 3x, but I don't know that it
 has been observed much.



 --
 - Mark

 http://www.lucidimagination.com








-- 
Lance Norskog
goks...@gmail.com


Re: trie fields and sortMissingLast

2009-10-01 Thread Yonik Seeley
On Thu, Oct 1, 2009 at 10:39 AM, Steve Conover scono...@gmail.com wrote:
 I just noticed this comment in the default schema:

 <!--
       These types should only be used for back compatibility with existing
       indexes, or if sortMissingLast functionality is needed. Use
 Trie based fields instead.
    -->

 Does that mean TrieFields are never going to get sortMissingLast?

Not in time for 1.4, but yes, they will eventually get it.
It has to do with the representation... currently we can't tell the
difference between a 0 and a missing value.

 Do you all think that a reasonable strategy is to use a copyField and
 use s fields for sorting (only), and trie for everything else?

If you don't need the fast range queries, use the s fields only.
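
For illustration, the copyField arrangement being discussed might look like this in schema.xml (field and type names here are examples -- "sint" being the legacy sortable int type, "tint" the trie type):

  <field name="price"      type="tint" indexed="true" stored="true"/>
  <field name="price_sort" type="sint" indexed="true" stored="false"/>
  <copyField source="price" dest="price_sort"/>

then sort on price_sort and do range queries on price.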

-Yonik
http://www.lucidimagination.com


 On Wed, Sep 30, 2009 at 10:59 PM, Steve Conover scono...@gmail.com wrote:
 Am I correct in thinking that trie fields don't support
 sortMissingLast (my tests show that they don't).  If not, is there any
 plan for adding it in?

 Regards,
 Steve




Re: Solr Trunk Heap Space Issues

2009-10-01 Thread Mark Miller
bq. Tons of changes since... including the per-segment
searching/sorting/function queries (I think).

Yup. I actually didn't think so, because that was committed to Lucene in
February - but it didn't come into Solr till March 10th. March 5th just
ducked it.

Yonik Seeley wrote:
 On Thu, Oct 1, 2009 at 11:41 AM, Jeff Newburn jnewb...@zappos.com wrote:
   
 I am trying to update to the newest version of solr from trunk as of May
 5th.
 

 Tons of changes since... including the per-segment
 searching/sorting/function queries (I think).

 Do you sort on any single valued fields that you also facet on?
 Do you use ord() or rord() in any function queries?

 Unfortunately, some of these things will take up more memory because
 some things still cache FieldCache elements with the top-level reader,
 while some use segment readers.  The direction is going toward all
 segment readers, but we're not there yet (and won't be for 1.4).
 ord() rord() will never be fixed... people need to migrate to
 something else.

 http://issues.apache.org/jira/browse/SOLR- is the main issue for this.

 If course, I've really only been talking about search related changes.
  Nothing on the indexing side should cause greater memory usage
 but perhaps the indexing side could run out of memory due to the
 search side taking up more.

 -Yonik
 http://www.lucidimagination.com

   
  I updated and compiled from trunk as of yesterday (09/30/2009).  When
 I try to do a full import I am receiving a GC heap error after changing
 nothing in the configuration files.  Why would this happen in the most
 recent versions but not in the version from a few months ago.  The stack
 trace is below.

 Oct 1, 2009 8:34:32 AM org.apache.solr.update.processor.LogUpdateProcessor
 finish
 INFO: {add=[166400, 166608, 166698, 166800, 166811, 167097, 167316, 167353,
 ...(83 more)]} 0 35991
 Oct 1, 2009 8:34:32 AM org.apache.solr.common.SolrException log
 SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.init(String.java:215)
at com.ctc.wstx.util.TextBuffer.contentsAsString(TextBuffer.java:384)
at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:821)
at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:280)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentSt
 reamHandlerBase.java:54)
at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.
 java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:3
 38)
at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:
 241)
at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Application
 FilterChain.java:235)
at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterCh
 ain.java:206)
at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.ja
 va:233)
at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.ja
 va:175)
at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128
 )
at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102
 )
at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java
 :109)
at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at
 org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:
 879)
at
 org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(H
 ttp11NioProtocol.java:719)
at
 org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:
 2080)
at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.ja
 va:886)
at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:9
 08)
at java.lang.Thread.run(Thread.java:619)

 Oct 1, 2009 8:40:06 AM org.apache.solr.core.SolrCore execute
 INFO: [zeta-main] webapp=/solr path=/update params={} status=500 QTime=5265
 Oct 1, 2009 8:40:12 AM org.apache.solr.common.SolrException log
 SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded

 --
 Jeff Newburn
 Software Engineer, Zappos.com
 jnewb...@zappos.com - 702-943-7562


 


-- 
- Mark

http://www.lucidimagination.com





Re: index size before and after commit

2009-10-01 Thread Lance Norskog
Ha! Searching partial optimize on
http://www.lucidimagination.com/search , we discover SOLR-603 which
gives the 'maxSegments' option to the optimize command. The text
does not include the word 'partial'.

It's on http://wiki.apache.org/solr/UpdateXmlMessages. The command
takes a target number of Lucene segments, and I have no idea how this will
translate to disk space. To minimize disk space, you could run it
repeatedly with the number of segments decreasing to one.
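
For example (the attribute name comes from SOLR-603; the target segment count is up to you):

  curl http://localhost:8983/solr/update -H 'Content-Type: text/xml' --data-binary '<optimize maxSegments="4"/>'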

On Thu, Oct 1, 2009 at 11:49 AM, Lance Norskog goks...@gmail.com wrote:
 I've heard there is a new partial optimize feature in Lucene, but it
 is not mentioned in the Solr or Lucene wikis so I cannot advise you
 how to use it.

 On a previous project we had a 500GB index for 450m documents. It took
 14 hours to optimize. We found that Solr worked well (given enough RAM
 for sorting and faceting requests) but that the IT logistics of a 500G
 fileset were too much.

 Also, if you want your query servers to continue serving while
 propogating the newly optimized index, you need 2X space to store both
 copies on the slave during the transfer. For us this 35 minutes over
 1G ethernet.

 On Thu, Oct 1, 2009 at 7:36 AM, Walter Underwood wun...@wunderwood.org 
 wrote:
 I've now worked on three different search engines and they all have a 3X
 worst
 case on space, so I'm familiar with this case. --wunder

 On Oct 1, 2009, at 7:15 AM, Mark Miller wrote:

 Nice one ;) Its not technically a case where optimize requires  2x
 though in case the user asking gets confused. Its a case unrelated to
 optimize that can grow your index. Then you need  2x for the optimize,
 since you won't copy the deletes.

 It also requires that you jump hoops to delete everything. If you delete
 everything with *:*, that is smart enough not to just do a delete on
 every document - it just creates a new index, allowing the removal of
 the old very efficiently.

 Def agree on the more disk space.

 Walter Underwood wrote:

 Here is how you need 3X. First, index everything and optimize. Then
 delete everything and reindex without any merges.

 You have one full-size index containing only deleted docs, one
 full-size index containing reindexed docs, and need that much space
 for a third index.

 Honestly, disk is cheap, and there is no way to make Lucene work
 reliably with less disk. 1TB is a few hundred dollars. You have a free
 search engine, buy some disk.

 wunder

 On Oct 1, 2009, at 6:25 AM, Grant Ingersoll wrote:

 151GB or as little as from 183GB to 182GB.  Is that size after a
 commit close to the size the index would be after an optimize?  For
 that matter, are there cases where optimization can take more than
 2x?  I've heard of cases but have not observed them in my system.

 I seem to recall a case where it can be 3x, but I don't know that it
 has been observed much.



 --
 - Mark

 http://www.lucidimagination.com








 --
 Lance Norskog
 goks...@gmail.com




-- 
Lance Norskog
goks...@gmail.com


Re: Solr Trunk Heap Space Issues

2009-10-01 Thread Yonik Seeley
On Thu, Oct 1, 2009 at 3:14 PM, Mark Miller markrmil...@gmail.com wrote:
 bq. Tons of changes since... including the per-segment
 searching/sorting/function queries (I think).

 Yup. I actually didn't think so, because that was committed to Lucene in
 Feburary - but it didn't come into Solr till March 10th. March 5th just
 ducked it.

Jeff said May 5th

But it wasn't until the end of May that Solr started using Lucene's
new sorting facilities that worked per-segment.

-Yonik
http://www.lucidimagination.com


 Yonik Seeley wrote:
 On Thu, Oct 1, 2009 at 11:41 AM, Jeff Newburn jnewb...@zappos.com wrote:

 I am trying to update to the newest version of solr from trunk as of May
 5th.


 Tons of changes since... including the per-segment
 searching/sorting/function queries (I think).

 Do you sort on any single valued fields that you also facet on?
 Do you use ord() or rord() in any function queries?

 Unfortunately, some of these things will take up more memory because
 some things still cache FieldCache elements with the top-level reader,
 while some use segment readers.  The direction is going toward all
 segment readers, but we're not there yet (and won't be for 1.4).
 ord() rord() will never be fixed... people need to migrate to
 something else.

 http://issues.apache.org/jira/browse/SOLR- is the main issue for this.

 If course, I've really only been talking about search related changes.
  Nothing on the indexing side should cause greater memory usage
 but perhaps the indexing side could run out of memory due to the
 search side taking up more.

 -Yonik
 http://www.lucidimagination.com


  I updated and compiled from trunk as of yesterday (09/30/2009).  When
 I try to do a full import I am receiving a GC heap error after changing
 nothing in the configuration files.  Why would this happen in the most
 recent versions but not in the version from a few months ago.  The stack
 trace is below.

 Oct 1, 2009 8:34:32 AM org.apache.solr.update.processor.LogUpdateProcessor
 finish
 INFO: {add=[166400, 166608, 166698, 166800, 166811, 167097, 167316, 167353,
 ...(83 more)]} 0 35991
 Oct 1, 2009 8:34:32 AM org.apache.solr.common.SolrException log
 SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOfRange(Arrays.java:3209)
    at java.lang.String.init(String.java:215)
    at com.ctc.wstx.util.TextBuffer.contentsAsString(TextBuffer.java:384)
    at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:821)
    at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:280)
    at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
    at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
    at
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentSt
 reamHandlerBase.java:54)
    at
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.
 java:131)
    at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
    at
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:3
 38)
    at
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:
 241)
    at
 org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(Application
 FilterChain.java:235)
    at
 org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterCh
 ain.java:206)
    at
 org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.ja
 va:233)
    at
 org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.ja
 va:175)
    at
 org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128
 )
    at
 org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102
 )
    at
 org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java
 :109)
    at
 org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
    at
 org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:
 879)
    at
 org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(H
 ttp11NioProtocol.java:719)
    at
 org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:
 2080)
    at
 java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.ja
 va:886)
    at
 java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:9
 08)
    at java.lang.Thread.run(Thread.java:619)

 Oct 1, 2009 8:40:06 AM org.apache.solr.core.SolrCore execute
 INFO: [zeta-main] webapp=/solr path=/update params={} status=500 QTime=5265
 Oct 1, 2009 8:40:12 AM org.apache.solr.common.SolrException log
 SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded

 --
 Jeff Newburn
 Software Engineer, Zappos.com
 jnewb...@zappos.com - 702-943-7562





 --
 - Mark

 http://www.lucidimagination.com






Re: Solr Trunk Heap Space Issues

2009-10-01 Thread Mark Miller
Whoops. There is my lazy brain for you - march, may, august - all the
same ;)

Okay - forgot Solr went straight down and used FieldSortedHitQueue.

So it all still makes sense ;)

Still interested in seeing his field sanity output to see what's possibly
being doubled.

Yonik Seeley wrote:
 On Thu, Oct 1, 2009 at 3:14 PM, Mark Miller markrmil...@gmail.com wrote:
   
 bq. Tons of changes since... including the per-segment
 searching/sorting/function queries (I think).

 Yup. I actually didn't think so, because that was committed to Lucene in
 February - but it didn't come into Solr till March 10th. March 5th just
 ducked it.
 

 Jeff said May 5th

 But it wasn't until the end of May that Solr started using Lucene's
 new sorting facilities that worked per-segment.

 -Yonik
 http://www.lucidimagination.com


   
 Yonik Seeley wrote:
 
 On Thu, Oct 1, 2009 at 11:41 AM, Jeff Newburn jnewb...@zappos.com wrote:

   
 I am trying to update to the newest version of solr from trunk as of May
 5th.

 
 Tons of changes since... including the per-segment
 searching/sorting/function queries (I think).

 Do you sort on any single valued fields that you also facet on?
 Do you use ord() or rord() in any function queries?

 Unfortunately, some of these things will take up more memory because
 some things still cache FieldCache elements with the top-level reader,
 while some use segment readers.  The direction is going toward all
 segment readers, but we're not there yet (and won't be for 1.4).
 ord() rord() will never be fixed... people need to migrate to
 something else.

 http://issues.apache.org/jira/browse/SOLR- is the main issue for this.

 Of course, I've really only been talking about search related changes.
  Nothing on the indexing side should cause greater memory usage
 but perhaps the indexing side could run out of memory due to the
 search side taking up more.

 -Yonik
 http://www.lucidimagination.com


   

Re: Solr Trunk Heap Space Issues

2009-10-01 Thread Yonik Seeley
On Thu, Oct 1, 2009 at 3:37 PM, Mark Miller markrmil...@gmail.com wrote:
 Still interested in seeing his field sanity output to see what's possibly
 being doubled.

Strangely enough, I'm having a hard time seeing caching at the different levels.
I made a multi-segment index (2 segments), and then did a sort and facet:
http://localhost:8983/solr/select?q=*:*sort=popularity%20descfacet=truefacet.field=popularity

Seems like that should do it, but the statistics fieldCache section
shows only 2 entries.
  entries_count : 2
entry#0 : 
'org.apache.lucene.index.compoundfilereader$csindexin...@5b38d7'='popularity',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=[I#949587
(size =~ 92 bytes)
entry#1 : 
'org.apache.lucene.index.compoundfilereader$csindexin...@1582a7c'='popularity',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=[I#3534544
(size =~ 28 bytes)
insanity_count : 0

Investigating further...

-Yonik
http://www.lucidimagination.com

 Yonik Seeley wrote:
 On Thu, Oct 1, 2009 at 3:14 PM, Mark Miller markrmil...@gmail.com wrote:

 bq. Tons of changes since... including the per-segment
 searching/sorting/function queries (I think).

 Yup. I actually didn't think so, because that was committed to Lucene in
 February - but it didn't come into Solr till March 10th. March 5th just
 ducked it.


 Jeff said May 5th

 But it wasn't until the end of May that Solr started using Lucene's
 new sorting facilities that worked per-segment.

 -Yonik
 http://www.lucidimagination.com



 Yonik Seeley wrote:

 On Thu, Oct 1, 2009 at 11:41 AM, Jeff Newburn jnewb...@zappos.com wrote:


 I am trying to update to the newest version of solr from trunk as of May
 5th.


 Tons of changes since... including the per-segment
 searching/sorting/function queries (I think).

 Do you sort on any single valued fields that you also facet on?
 Do you use ord() or rord() in any function queries?

 Unfortunately, some of these things will take up more memory because
 some things still cache FieldCache elements with the top-level reader,
 while some use segment readers.  The direction is going toward all
 segment readers, but we're not there yet (and won't be for 1.4).
 ord() rord() will never be fixed... people need to migrate to
 something else.

 http://issues.apache.org/jira/browse/SOLR- is the main issue for this.

 Of course, I've really only been talking about search related changes.
  Nothing on the indexing side should cause greater memory usage
 but perhaps the indexing side could run out of memory due to the
 search side taking up more.

 -Yonik
 http://www.lucidimagination.com




Re: field collapsing sums

2009-10-01 Thread Martijn v Groningen
1) That is correct. Including collapsed documents' fields can make your
search significantly slower (depending on how many documents are
returned).
2) It seems that you are using the parameters as intended. The
collapsed documents will contain all documents (from the whole query
result) that have been collapsed on a field value that occurs in the
result set being displayed. That is how it should work.
But if I'm understanding you correctly, you want to display all dupes
from the whole query result set (also those whose collapse field value
does not occur in the displayed result set)?

Martijn

2009/10/1 Joe Calderon calderon@gmail.com:
 hello martijn, thx for the tip, i tried that approach but ran into two
 snags: 1. returning the fields makes collapsing a lot slower for large
 results, but that might just be the nature of iterating large results.
 2. it seems like only dupes of records on the first page are returned

 or is there a setting i'm missing? currently i'm only sending
 collapse.field=brand and collapse.includeCollapseDocs.fl=num_in_stock

 --joe

 On Thu, Oct 1, 2009 at 1:14 AM, Martijn v Groningen
 martijn.is.h...@gmail.com wrote:
 Hi Joe,

 Currently the patch does not do that, but you can do something else
 that might help you in getting your summed stock.

 In the latest patch you can include fields of collapsed documents in
 the result per distinct field value.
 If you specify collapse.includeCollapseDocs.fl=num_in_stock in the
 request, and let's say you collapse on brand, then in the response you
 will receive the following XML:
 <lst name="collapsedDocs">
   <result name="brand1" numFound="48" start="0">
     <doc>
       <str name="num_in_stock">2</str>
     </doc>
     <doc>
       <str name="num_in_stock">3</str>
     </doc>
     ...
   </result>
   <result name="brand2" numFound="9" start="0">
     ...
   </result>
 </lst>

 On the client side you can do whatever you want with this data, for
 example sum it up. Although the patch does not do the summing for you, I
 think it will allow you to implement your requirement without too much
 hassle.
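
 For example, a rough client-side sketch with SolrJ (untested; it assumes the
 response layout shown above, the 2009-09-26 patch, and a placeholder core URL):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.util.NamedList;

SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr"); // placeholder URL
SolrQuery q = new SolrQuery("your query");
q.set("collapse.field", "brand");
q.set("collapse.includeCollapseDocs.fl", "num_in_stock");
QueryResponse rsp = server.query(q);

// "collapsedDocs" holds one document list per distinct collapse value, as in the XML above;
// this assumes SolrJ parses each nested <result> into a SolrDocumentList.
NamedList collapsed = (NamedList) rsp.getResponse().get("collapsedDocs");
long total = 0;
if (collapsed != null) {
  for (int i = 0; i < collapsed.size(); i++) {
    SolrDocumentList docs = (SolrDocumentList) collapsed.getVal(i);
    for (SolrDocument doc : docs) {
      total += Long.parseLong(String.valueOf(doc.getFieldValue("num_in_stock")));
    }
  }
}
System.out.println("summed num_in_stock = " + total);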

 Cheers,

 Martijn

 2009/10/1 Matt Weber m...@mattweber.org:
 You might want to see how the stats component works with field collapsing.

 Thanks,

 Matt Weber

 On Sep 30, 2009, at 5:16 PM, Uri Boness wrote:

 Hi,

 At the moment I think the most appropriate place to put it is in the
 AbstractDocumentCollapser (in the getCollapseInfo method). Though, it might
 not be the most efficient.

 Cheers,
 Uri

 Joe Calderon wrote:

 hello all, i have a question on the field collapsing patch, say i have
 an integer field called num_in_stock and i collapse by some other
 column, is it possible to sum up that integer field and return the
 total in the output, if not how would i go about extending the
 collapsing component to support that?


 thx much

 --joe









-- 
Met vriendelijke groet,

Martijn van Groningen


Re: Solr Trunk Heap Space Issues

2009-10-01 Thread Yonik Seeley
On Thu, Oct 1, 2009 at 4:05 PM, Yonik Seeley yo...@lucidimagination.com wrote:
 On Thu, Oct 1, 2009 at 3:37 PM, Mark Miller markrmil...@gmail.com wrote:
 Still interested in seeing his field sanity output to see what's possibly
 being doubled.

 Strangely enough, I'm having a hard time seeing caching at the different 
 levels.
 I made a multi-segment index (2 segments), and then did a sort and facet:
 http://localhost:8983/solr/select?q=*:*sort=popularity%20descfacet=truefacet.field=popularity

 Seems like that should do it, but the statistics fieldCache section
 shows only 2 entries.
  entries_count : 2
 entry#0 : 
 'org.apache.lucene.index.compoundfilereader$csindexin...@5b38d7'='popularity',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=[I#949587
 (size =~ 92 bytes)
 entry#1 : 
 'org.apache.lucene.index.compoundfilereader$csindexin...@1582a7c'='popularity',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=[I#3534544
 (size =~ 28 bytes)
 insanity_count : 0

 Investigating further...

Ahhh, TrieField.isTokenized() returns true.
The facet code has
boolean multiToken = sf.multiValued() || ft.isTokenized();
and if multiToken==true then it uses multi-valued faceting, which
doesn't use the field cache.

Since isTokenized() more reflects if something is tokenized at the
Lucene level, perhaps we need something that specifies if there is
more than one logical value per field value?  I'm drawing a blank on a
good name for such a method though...

-Yonik
http://www.lucidimagination.com



 -Yonik
 http://www.lucidimagination.com

 Yonik Seeley wrote:
 On Thu, Oct 1, 2009 at 3:14 PM, Mark Miller markrmil...@gmail.com wrote:

 bq. Tons of changes since... including the per-segment
 searching/sorting/function queries (I think).

 Yup. I actually didn't think so, because that was committed to Lucene in
 February - but it didn't come into Solr till March 10th. March 5th just
 ducked it.


 Jeff said May 5th

 But it wasn't until the end of May that Solr started using Lucene's
 new sorting facilities that worked per-segment.

 -Yonik
 http://www.lucidimagination.com



 Yonik Seeley wrote:

 On Thu, Oct 1, 2009 at 11:41 AM, Jeff Newburn jnewb...@zappos.com wrote:


 I am trying to update to the newest version of solr from trunk as of May
 5th.


 Tons of changes since... including the per-segment
 searching/sorting/function queries (I think).

 Do you sort on any single valued fields that you also facet on?
 Do you use ord() or rord() in any function queries?

 Unfortunately, some of these things will take up more memory because
 some things still cache FieldCache elements with the top-level reader,
 while some use segment readers.  The direction is going toward all
 segment readers, but we're not there yet (and won't be for 1.4).
 ord() rord() will never be fixed... people need to migrate to
 something else.

 http://issues.apache.org/jira/browse/SOLR- is the main issue for this.

 Of course, I've really only been talking about search related changes.
  Nothing on the indexing side should cause greater memory usage
 but perhaps the indexing side could run out of memory due to the
 search side taking up more.

 -Yonik
 http://www.lucidimagination.com




Re: Solr Trunk Heap Space Issues

2009-10-01 Thread Yonik Seeley
On Thu, Oct 1, 2009 at 4:35 PM, Yonik Seeley yo...@lucidimagination.com wrote:
 Since isTokenized() more reflects if something is tokenized at the
 Lucene level, perhaps we need something that specifies if there is
 more than one logical value per field value?  I'm drawing a blank on a
 good name for such a method though...

boolean singleValuedFieldCache()?

-Yonik
http://www.lucidimagination.com


Re: field collapsing sums

2009-10-01 Thread Joe Calderon
thx for the reply, i just want the number of dupes in the query
result, but it seems i don't get the correct totals.

for example, a non-collapsed dismax query for belgian beer returns X
results,
but when i collapse and sum the number of docs under collapse_counts,
it's much less than X.

it does seem to work when the collapsed results fit on one page (10
rows in my case)


--joe

 2) It seems that you are using the parameters as was intended. The
 collapsed documents will contain all documents (from whole query
 result) that have been collapsed on a certain field value that occurs
 in the result set that is being displayed. That is how it should work.
 But if I'm understanding you correctly you want to display all dupes
 from the whole query result set (also those which collapse field value
 does not occur in the in the displayed result set)?


Authentication/Authorization with Master-Slave over HTTP

2009-10-01 Thread Fuad Efendi
Is that possible? Implemented?

I want to be able to have a SOLR Slave instance on a publicly available host
(accessible via HTTP), and synchronize it with the Master securely (via HTTP).

I had it implicitly with cron jobs running as the 'root' user and Tomcat as
'tomcat'... the Slave wasn't able to update the index because of file system
permissions... but now I want to move the instances far apart (Master in the
lab, and Slave at a hosting company) - and I want to secure it...


Thanks,
FUad
http://www.linkedin.com/in/liferay






Re: Solr Trunk Heap Space Issues

2009-10-01 Thread Mark Miller
Yonik Seeley wrote:
 On Thu, Oct 1, 2009 at 4:35 PM, Yonik Seeley yo...@lucidimagination.com 
 wrote:
   
 Since isTokenized() more reflects if something is tokenized at the
 Lucene level, perhaps we need something that specifies if there is
 more than one logical value per field value?  I'm drawing a blank on a
 good name for such a method though...
 

 boolean singleValuedFieldCache()?

 -Yonik
 http://www.lucidimagination.com
   
Since everything seems to weigh towards calling out multi, why not
multiValuedFieldCache?

Either one sounds good to me though.
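
A rough sketch of how that might look (illustrative only, nothing here is committed):

// Hypothetical addition to FieldType, name per the discussion above:
public boolean multiValuedFieldCache() {
  // by default, a tokenized type can produce more than one logical value per field value
  return isTokenized();
}

// TrieField would override it to return false, and the facet code could then use:
boolean multiToken = sf.multiValued() || ft.multiValuedFieldCache();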

-- 
- Mark

http://www.lucidimagination.com





Re: Solr Trunk Heap Space Issues

2009-10-01 Thread Jeff Newburn
Ok I was able to get a heap dump from the GC Limit error.

1 instance of LRUCache is taking 170mb
1 instance of SchemaIndex is taking 56Mb
4 instances of SynonymMap is taking 112mb

There is no searching going on during this index update process.

Any ideas what on earth is going on?  Like I said my May version did this
without any problems whatsoever.

-- 
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562


 From: Mark Miller markrmil...@gmail.com
 Reply-To: solr-user@lucene.apache.org
 Date: Thu, 01 Oct 2009 17:57:28 -0400
 To: solr-user@lucene.apache.org
 Subject: Re: Solr Trunk Heap Space Issues
 
 Yonik Seeley wrote:
 On Thu, Oct 1, 2009 at 4:35 PM, Yonik Seeley yo...@lucidimagination.com
 wrote:
   
 Since isTokenized() more reflects if something is tokenized at the
 Lucene level, perhaps we need something that specifies if there is
 more than one logical value per field value?  I'm drawing a blank on a
 good name for such a method though...
 
 
 boolean singleValuedFieldCache()?
 
 -Yonik
 http://www.lucidimagination.com
   
 Since everything seems to weigh towards calling out multi, why not
 multiValuedFieldCache?
 
 Either one sounds good to me though.
 
 -- 
 - Mark
 
 http://www.lucidimagination.com
 
 
 



Re: ExtractingRequestHandler unknown field 'stream_source_info'

2009-10-01 Thread Tricia Williams

Thanks Lance,

  I have lucid's search as one of my open search tools in my browser.  
Generally pretty useful (especially the ability to filter) but it's not 
of much help when the tool points out that the best info is on the wiki 
and the link to the wiki reveals that it can't be reached.  This is the 
second time in a couple of weeks I've seen the wiki down.  Is there an 
ongoing problem?  I do appreciate the tip though.


Tricia

Lance Norskog wrote:

For future reference, the Solr & Lucene wikis and mailing lists are
indexed on http://www.lucidimagination.com/search/

On Thu, Oct 1, 2009 at 11:40 AM, Tricia Williams
williams.tri...@gmail.com wrote:
  

If the wiki isn't working


https://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-2
  

gave me more information.  The LucidImagination article helps too.

Now that the wiki is up again it is more obvious that I need to add:

<str name="fmap.content">fulltext</str>
<str name="defaultField">text</str>

to my solrconfig.xml

Tricia






  




JVM OOM when using field collapse component

2009-10-01 Thread Joe Calderon
i've gotten two different out-of-memory errors while using the field
collapsing component, using the latest patch (2009-09-26) and the
latest nightly.

has anyone else encountered similar problems? my collection is 5
million results, but i've gotten the error collapsing as little as a few
thousand

SEVERE: java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:173)
at 
org.apache.lucene.util.OpenBitSet.ensureCapacityWords(OpenBitSet.java:749)
at org.apache.lucene.util.OpenBitSet.ensureCapacity(OpenBitSet.java:757)
at 
org.apache.lucene.util.OpenBitSet.expandingWordNum(OpenBitSet.java:292)
at org.apache.lucene.util.OpenBitSet.set(OpenBitSet.java:233)
at 
org.apache.solr.search.AbstractDocumentCollapser.addCollapsedDoc(AbstractDocumentCollapser.java:402)
at 
org.apache.solr.search.NonAdjacentDocumentCollapser.doCollapsing(NonAdjacentDocumentCollapser.java:115)
at 
org.apache.solr.search.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:208)
at 
org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:98)
at 
org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:66)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1148)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:387)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at 
org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
at 
org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:539)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at 
org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
at 
org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:520)

SEVERE: java.lang.OutOfMemoryError: Java heap space
at 
org.apache.solr.util.DocSetScoreCollector.init(DocSetScoreCollector.java:44)
at 
org.apache.solr.search.NonAdjacentDocumentCollapser.doQuery(NonAdjacentDocumentCollapser.java:68)
at 
org.apache.solr.search.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:205)
at 
org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:98)
at 
org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:66)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at 
org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1148)
at 
org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:387)
at 
org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at 
org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at 
org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
at 
org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at 
org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at 
org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)

Re: Solr Trunk Heap Space Issues

2009-10-01 Thread Mark Miller
Jeff Newburn wrote:
 Ok I was able to get a heap dump from the GC Limit error.

 1 instance of LRUCache is taking 170mb
 1 instance of SchemaIndex is taking 56Mb
 4 instances of SynonymMap is taking 112mb

 There is no searching going on during this index update process.

 Any ideas what on earth is going on?  Like I said my May version did this
 without any problems whatsoever.

   
Had any searching gone on though? Even if its not occurring during the
indexing, you will still have the data structure loaded if searches had
occurred.

What heap size do you have - that doesn't look like much data to me ...

-- 
- Mark

http://www.lucidimagination.com





Re: Solr Trunk Heap Space Issues

2009-10-01 Thread Jeffery Newburn
I loaded the JVM and started indexing. It is a test server, so unless
some errant query came in, there was no searching. Our instance has only
512MB, but my concern is the obvious memory requirement leap since it
worked before. What other data would be helpful with this?




On Oct 1, 2009, at 5:14 PM, Mark Miller markrmil...@gmail.com wrote:


Jeff Newburn wrote:

Ok I was able to get a heap dump from the GC Limit error.

1 instance of LRUCache is taking 170mb
1 instance of SchemaIndex is taking 56Mb
4 instances of SynonymMap is taking 112mb

There is no searching going on during this index update process.

Any ideas what on earth is going on?  Like I said my May version  
did this

without any problems whatsoever.



Had any searching gone on though? Even if its not occurring during the
indexing, you will still have the data structure loaded if searches  
had

occurred.

What heap size do you have - that doesn't look like much data to  
me ...


--
- Mark

http://www.lucidimagination.com





Re: Ranking of search results

2009-10-01 Thread bhaskar chandrasekar


--- On Wed, 9/23/09, Amit Nithian anith...@gmail.com wrote:


Hi Amit,
 
Thanks for your reply. How do I set a preference for which links should
appear first and second in the search results?
Which configuration file in Solr needs to be modified to achieve this?
 
Regards
Bhaskar
From: Amit Nithian anith...@gmail.com
Subject: Re: Ranking of search results
To: solr-user@lucene.apache.org
Date: Wednesday, September 23, 2009, 11:33 AM


It depends on several things:
1) The query handler that you are using
2) The fields that you are searching on and the default fields specified

For the default handler, it will issue a query against the default field and
return results accordingly. To see what is going on, pass debugQuery=true at
the end of the URL to see detailed output. If you are using the DisMax handler
(Disjunction Max) then you will have qf, pf and bf (query fields, phrase
fields, boosting function). I would start by looking at
http://wiki.apache.org/solr/DisMaxRequestHandler
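
For instance, with SolrJ (the field names and query here are made up for illustration):

SolrQuery q = new SolrQuery("some query terms");
q.set("defType", "dismax");
q.set("qf", "name^10 description^2"); // query fields with boosts: matches in "name" outrank "description"
q.set("pf", "name^20");               // extra boost when the whole phrase matches in "name"
q.set("debugQuery", "true");          // shows how the query is parsed and scored
QueryResponse rsp = server.query(q);  // "server" is a CommonsHttpSolrServer pointed at your core

The same qf/pf/bf values are normally placed in the dismax requestHandler's
defaults in solrconfig.xml, so they apply to every request.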

- Amit

On Wed, Sep 23, 2009 at 10:25 AM, bhaskar chandrasekar bas_s...@yahoo.co.in
 wrote:

 Hi,

 When i give an input string to search in Solr, it displays the
 corresponding results for the given input string.

 How are the results ranked and displayed? On what basis are the search
 results displayed?
 Is there any algorithm followed for displaying the results, with a first
 result and so on?


 Regards
 Bhaskar







  

Re: trie fields and sortMissingLast

2009-10-01 Thread Steve Conover
 Not in time for 1.4, but yes they will eventually get it.
 It has to do with the representation... currently we can't tell
 between a 0 and missing.

Hmm.  So does that mean that a query for latitudes, stored as trie
floats, from -10 to +10 matches documents with no (i.e. null) latitude
value?


Re: How to access the information from SolrJ

2009-10-01 Thread Noble Paul നോബിള്‍ नोब्ळ्
QueryResponse#getResults()#getNumFound()
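
For example (SolrJ sketch, pointed at the core from the query below):

SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solrChunk/nutch");
SolrQuery q = new SolrQuery("*:*");
q.addFilterQuery("category:mysites");
QueryResponse rsp = server.query(q);
long numFound = rsp.getResults().getNumFound(); // 1251 for the response quoted below
long start    = rsp.getResults().getStart();    // offset into the full result set
int  returned = rsp.getResults().size();        // rows actually returned (10 by default)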

On Thu, Oct 1, 2009 at 11:49 PM, Paul Tomblin ptomb...@xcski.com wrote:
 When I do a query directly from the web, the XML of the response
 includes how many results would have been returned if it hadn't
 restricted itself to the first 10 rows:

 For instance, the query:
 http://localhost:8080/solrChunk/nutch/select/?q=*:*fq=category:mysites
 returns:
 <response>
   <lst name='responseHeader'>
     <int name='status'>0</int>
     <int name='QTime'>0</int>
     <lst name='params'>
       <str name='q'>*:*</str>
       <str name='fq'>category:mysites</str>
     </lst>
   </lst>
   <result name='response' numFound='1251' start='0'>
     <doc>
       <str name='category'>mysites</str>
       <long name='chunkNum'>0</long>
       <str name='chunkUrl'>http://localhost/Chunks/mysites/0-http___xcski.com_.xml</str>
       <str name='concept'>Anatomy</str>
 ...

 The value I'm talking about is in the numFound attribute of the <result>
 tag.

 I don't see any way to retrieve it through SolrJ - it's not in the
 QueryResponse.getHeader(), for instance.  Can I retrieve it somewhere?

 --
 http://www.linkedin.com/in/paultomblin




-- 
-
Noble Paul | Principal Engineer| AOL | http://aol.com


Re: trie fields and sortMissingLast

2009-10-01 Thread Yonik Seeley
On Thu, Oct 1, 2009 at 11:09 PM, Steve Conover scono...@gmail.com wrote:
 Not in time for 1.4, but yes they will eventually get it.
 It has to do with the representation... currently we can't tell
 between a 0 and missing.

 Hmm.  So does that mean that a query for latitudes, stored as trie
 floats, from -10 to +10 matches documents with no (i.e. null) latitude
 value?

No, because normal queries work off of the inverted index
(term -> docids_that_match), and there won't be any values indexed for
that document.
Sorting and function queries work off of a non-inverted index
(docid -> value), which, depending on the representation, can't tell
non-matching from a default value.
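
A concrete sketch of the difference (assuming a trie float field named "latitude", as in the question):

SolrQuery q = new SolrQuery("*:*");
// Range filter: answered from the inverted index. A doc with no latitude has no
// indexed terms for the field, so it can never match the filter.
q.addFilterQuery("latitude:[-10 TO 10]");
// Sort: answered from the FieldCache. A missing latitude currently looks like 0.0,
// which is why sortMissingLast can't be honored for trie fields yet.
q.addSortField("latitude", SolrQuery.ORDER.asc);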

-Yonik
http://www.lucidimagination.com


Re: Solr Trunk Heap Space Issues

2009-10-01 Thread Yonik Seeley
On Thu, Oct 1, 2009 at 8:45 PM, Jeffery Newburn jnewb...@zappos.com wrote:
 I loaded the jvm and started indexing. It is a test server so unless some
 errant query came in then no searching. Our instance has only 512mb but my
 concern is the obvious memory requirement leap since it worked before. What
 other data would be helpful with this?

Interesting... not too much should have changed for memory
requirements on the indexing side.
TokenStreams are now reused (and hence cached) per thread... but that
normally wouldn't amount to much.

There was recently another bug where compound file format was being
used regardless of the config settings... but I think that was fixed
on the 29th.

Maybe you were already close to the limit required?
Also, your heap dump did show LRUCache taking up 170MB, and only
searches populate that (perhaps you have warming searches configured
on this server?)

-Yonik
http://www.lucidimagination.com







 On Oct 1, 2009, at 5:14 PM, Mark Miller markrmil...@gmail.com wrote:

 Jeff Newburn wrote:

 Ok I was able to get a heap dump from the GC Limit error.

 1 instance of LRUCache is taking 170mb
 1 instance of SchemaIndex is taking 56Mb
 4 instances of SynonymMap is taking 112mb

 There is no searching going on during this index update process.

 Any ideas what on earth is going on?  Like I said my May version did this
 without any problems whatsoever.


 Had any searching gone on though? Even if its not occurring during the
 indexing, you will still have the data structure loaded if searches had
 occurred.

 What heap size do you have - that doesn't look like much data to me ...

 --
 - Mark

 http://www.lucidimagination.com