trie fields and sortMissingLast
Am I correct in thinking that trie fields don't support sortMissingLast? (My tests show that they don't.) If they don't, is there any plan for adding it in? Regards, Steve
Re: Invalid response with search key having numbers
On Wed, Sep 30, 2009 at 3:01 PM, con convo...@gmail.com wrote: Hi all, I am getting incorrect results when I search with numbers only, or with a string containing numbers. When such a search is done, all the results in the index are returned, irrespective of the search key. For example, the phone number field is mapped to TextField; it can contain values like 653-23345. Also, a search string like john25, searched against name, will show all the results. Getting all results irrespective of the query is very odd. Try adding debugQuery=on to the queries. That will show you exactly how the query is being parsed. -- Regards, Shalin Shekhar Mangar.
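For example (host, port and field name here are illustrative, not taken from the original report):

http://localhost:8983/solr/select?q=phone:653-23345&debugQuery=on

The parsedquery entry in the debug section of the response shows exactly which terms the query text was broken into.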
Re: Problem with Wildcard...
On Tue, Sep 29, 2009 at 6:42 PM, Jörg Agatz joerg.ag...@googlemail.com wrote: Hi Users... I have a problem. I have a lot of fields (type=text). To search in all fields, I copy all fields into the default text field and use this for the default search. Now I will search... this is in a field: RI-MC500034-1. When I search RI-MC500034-1 I find it... if I search RI-MC5000* I don't. When I search 500034 I find it... if I search 5000* I don't. What can I do to use the wildcards? I guess one thing you need to do is to add preserveOriginal=true in the WordDelimiterFilterFactory section in your field type. That would help match things like RI-MC5000*. Make sure you re-index all documents after this change. As for the others, add debugQuery=on as a request parameter and see how the query is being parsed. If you have a doubt, paste it on the list and we can help you. -- Regards, Shalin Shekhar Mangar.
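A sketch of that change in schema.xml (the field type name, tokenizer and remaining attributes are illustrative; adapt them to the existing text type):

<fieldType name="text" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- preserveOriginal="1" keeps RI-MC500034-1 in the index as one
         token alongside its split parts, so RI-MC5000* can match -->
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateNumbers="1"
            preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

As noted above, all documents must be re-indexed for the new tokens to appear.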
Why isn't the DateField implementation of ISO 8601 broader?
Hi All, I'm working with data that has multiple date precisions, most of which do not have a time associated with them: centuries (like the 1800's), years (like 1867), and years/months (like 1918-11). I'm able to sort and search using a workaround where we store the date as a string CCYYMM, where YYMM are optional. I was hoping to be able to tie this into the DateField type so that it becomes possible to facet on them without much work and duplication of data. Unfortunately it requires the canonical representation of dateTime, which means the time part of the string is mandatory. My question is: why isn't the DateField implementation of ISO 8601 broader, so that it could include CCYY and CCYY-MM as acceptable date strings? What would it take to do so? Are there any work-arounds for faceting by century, year, or month without creating new fields in my schema? The last resort would be to create these new fields, but I'm hoping to leverage the power of the DateField and the trie range stuff instead. Thanks, Tricia
Some interesting observations from tinkering with the DateFieldTest:
* 2003-03-00T00:00:00Z becomes 2003-02-28T00:00:00Z
* 2008-03-00T00:00:00Z becomes 2008-02-29T00:00:00Z
* 2003-00-00T00:00:00Z becomes 2002-11-30T00:00:00Z
* 2000-00-00T00:00:00Z becomes 1999-11-30T00:00:00Z
* 1979-00-31T00:00:00Z becomes 1978-12-31T00:00:00Z
* 2005-04-00T00:00:00Z becomes 2005-03-31T00:00:00Z
* 1850-10-00T00:00:00Z becomes 1850-09-30T00:00:00Z
The rounding /YEAR, /MONTH, etc. artificially imposes extra precision that the original data wouldn't have. In any case where months are zero, weird rounding happens.
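If the extra-field route does become necessary, one hedged sketch of it (field names are hypothetical, and it assumes the copyField maxChars attribute is available in your Solr version): keep the CCYYMM string field and copy just its first four characters into a year facet field:

<field name="date_str" type="string" indexed="true" stored="true"/>
<field name="year_facet" type="string" indexed="true" stored="false"/>
<copyField source="date_str" dest="year_facet" maxChars="4"/>

Faceting on year is then a matter of facet=true&facet.field=year_facet.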
Re: field collapsing sums
Hi Joe, Currently the patch does not do that, but you can do something else that might help you in getting your summed stock. In the latest patch you can include fields of collapsed documents in the result per distinct field value. If you specify collapse.includeCollapseDocs.fl=num_in_stock in the request, and let's say you collapse on brand, then in the response you will receive the following xml:
<lst name="collapsedDocs">
  <result name="brand1" numFound="48" start="0">
    <doc>
      <str name="num_in_stock">2</str>
    </doc>
    <doc>
      <str name="num_in_stock">3</str>
    </doc>
    ...
  </result>
  <result name="brand2" numFound="9" start="0">
    ...
  </result>
</lst>
On the client side you can do whatever you want with this data, for example sum it together. Although the patch does not sum for you, I think it will allow you to implement your requirement without too much hassle. Cheers, Martijn 2009/10/1 Matt Weber m...@mattweber.org: You might want to see how the stats component works with field collapsing. Thanks, Matt Weber On Sep 30, 2009, at 5:16 PM, Uri Boness wrote: Hi, At the moment I think the most appropriate place to put it is in the AbstractDocumentCollapser (in the getCollapseInfo method). Though, it might not be the most efficient. Cheers, Uri Joe Calderon wrote: hello all, i have a question on the field collapsing patch, say i have an integer field called num_in_stock and i collapse by some other column, is it possible to sum up that integer field and return the total in the output, if not how would i go about extending the collapsing component to support that? thx much --joe
Question on modifying solr behavior on indexing xml files..
1. In my playing around with sending in an XML document within an XML CDATA tag, with termVectors=true, I noticed the following behavior: <person>peter</person> collapses to the term personpeterperson instead of person and peter separately. I realize I could try to do a search and replace of characters like '<' and '>' to a space so that the default parser/indexer can preserve element names. However, I'm wondering if someone could point me to where one might do this within the Solr or Apache Lucene code as a proper plug-in, with maybe an example that I could use as a template. Also, where in the solrconfig.xml file would I want to change to reference the new parser? 2. My other question would be whether this technique would work for XML-type messages embedded in Microsoft Excel or Powerpoint presentations, where I would like to preserve knowing XML element term frequencies, and where I would try to leverage the component that automatically indexes Microsoft documents. Would I need to modify that component and customize it? -Peter
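A configuration-level sketch of the search-and-replace idea mentioned above, as an alternative to the custom-parser plugin route (the field type name and mapping file are hypothetical, and MappingCharFilterFactory is assumed to be available in your Solr version):

<fieldType name="text_xml" class="solr.TextField">
  <analyzer>
    <!-- conf/mapping-xml.txt maps markup characters to spaces so
         element names become separate tokens -->
    <charFilter class="solr.MappingCharFilterFactory" mapping="mapping-xml.txt"/>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>

with mapping-xml.txt containing:

"<" => " "
">" => " "
"/" => " "

That way <person>peter</person> indexes as the separate terms person and peter instead of collapsing into one.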
Where to place ReversedWildcardFilterFactory in Chain
Hi all, I would have two questions about the ReversedWildcardFilterFactory: a) put it into both chains, index and query, or into index only? b) where exactly in the/each chain do I have to put it? (Do I have to respect a certain order, as I have wordDelimiter and lowercase in there as well?) More Details: I understand it is used to allow queries like *sport. My current configuration for the field I want to use it for contains this setup:
<fieldType name="text_cn" class="solr.TextField">
  <analyzer>
    <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="1" generateWordParts="1" generateNumberParts="1" catenateAll="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
The wiki page http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters states for the ReversedWildcardFF: Add this filter to the index analyzer, but not the query analyzer. However, the API for it says it provides functionality at index and query time (my understanding): When this factory is added to an analysis chain, it will be used both for filtering the tokens during indexing, and to determine the query processing of this field during search. Any help is greatly appreciated. Thanks! Chantal -- Chantal Ackermann
Query filters/analyzers
Hello list. So, i setup my schema.xml with the different chains of analyzers and filters for each field (i.e. i created types text-en, text-de, text-it). As i have to index documents in different languages, this is good. But what defines the analyzers and filters for the query? Let's suppose i have my web-app with my input form where you fill in the query. I detect the language so i can query the field content-en or content-it or content-de according to the detection. But how is the query going to be analyzed? Of course i want the query to be analyzed accordingly to the field i'm going to search in. Where is this defined? TIA Claudio -- Claudio Martella Digital Technologies Unit Research Development - Engineer TIS innovation park Via Siemens 19 | Siemensstr. 19 39100 Bolzano | 39100 Bozen Tel. +39 0471 068 123 Fax +39 0471 068 129 claudio.marte...@tis.bz.it http://www.tis.bz.it
Re: Query filters/analyzers
Hi Claudio, in schema.xml, the analyzer element accepts the attribute type. If you need different analyzer chains during indexing and querying, configure it like this:
<fieldType name="channel_name" class="solr.TextField">
  <analyzer type="index">
    <!-- indexing analyzer chain defined here -->
  </analyzer>
  <analyzer type="query">
    <!-- query analyzer chain defined here -->
  </analyzer>
</fieldType>
If there is no difference, just remove one analyzer element and the type attribute from the remaining one. You can check after indexing in the schema browser (admin web frontend) what analyzer chain is applied for indexing and querying on a certain field. When you have detected the input language, simply choose the correct field, and the configured analyzer chain for that field will be applied automatically. E.g. input is Italian: q=text-it:input. text-it has the Italian analyzers configured for index and query, so the Italian analyzers will also be applied to the input. Cheers, Chantal Claudio Martella schrieb: Hello list. So, i setup my schema.xml with the different chains of analyzers and filters for each field (i.e. i created types text-en, text-de, text-it). ... Where is this defined? TIA Claudio
Solr 1.4 Release date/ lucene 2.9 API ?
Hi all, Have you planned a release date for Solr 1.4? If I understood correctly, it will use the Lucene 2.9 release from last Sept. 24th, with a stable API? Thanks. Jerome. -- Jerome Eteve. http://www.eteve.net jer...@eteve.net
index size before and after commit
I am trying to automate a build process that adds documents to 10 shards over 5 machines, and I need to limit the size of a shard to no more than 200GB because I only have 400GB of disk available to optimize a given shard. Why does the size (du) of an index typically decrease after a commit? I've observed a decrease in size of as much as from 296GB down to 151GB, or as little as from 183GB to 182GB. Is that size after a commit close to the size the index would be after an optimize? For that matter, are there cases where optimization can take more than 2x? I've heard of cases but have not observed them in my system. I only do adds to the shards, never query them. An LVM snapshot of the shard receives the queries. Is doing a commit before I take a du a reliable way to gauge the size of the shard? It is really bad news to allow a shard to go over 200GB in my use case. How do others manage this problem of 2x space needed to optimize with limited disk space? Advice greatly appreciated. Phil
Re: Solr 1.4 Release date/ lucene 2.9 API ?
On Oct 1, 2009, at 8:32 AM, Jérôme Etévé wrote: Hi all, Have you planned a release date for Solr 1.4? If I understood correctly, it will use the Lucene 2.9 release from last Sept. 24th, with a stable API? Please have a look at https://issues.apache.org/jira/secure/BrowseVersion.jspa?id=12310230&versionId=12313351&showOpenIssuesOnly=true (assuming JIRA is up) and see if there is any way you can contribute to testing, etc. Once these 9 issues are cleared up, we can do a release. Yes, it will use 2.9.0 -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: index size before and after commit
It may take some time before resources are released and garbage collected, so that may be part of the reason why things hang around and du doesn't report much of a drop. On Oct 1, 2009, at 8:54 AM, Phillip Farber wrote: I am trying to automate a build process that adds documents to 10 shards over 5 machines and need to limit the size of a shard to no more than 200GB because I only have 400GB of disk available to optimize a given shard. Why does the size (du) of an index typically decrease after a commit? I've observed a decrease in size of as much as from 296GB down to 151GB or as little as from 183GB to 182GB. Is that size after a commit close to the size the index would be after an optimize? For that matter, are there cases where optimization can take more than 2x? I've heard of cases but have not observed them in my system. I seem to recall a case where it can be 3x, but I don't know that it has been observed much. I only do adds to the shards, never query them. An LVM snapshot of the shard receives the queries. Is doing a commit before I take a du a reliable way to gauge the size of the shard? It is really bad news to allow a shard to go over 200GB in my use case. How do others manage this problem of 2x space needed to optimize with limited disk space? Do you need to optimize at all? -- Grant Ingersoll http://www.lucidimagination.com/ Search the Lucene ecosystem (Lucene/Solr/Nutch/Mahout/Tika/Droids) using Solr/Lucene: http://www.lucidimagination.com/search
Re: index size before and after commit
Phillip Farber wrote: I am trying to automate a build process that adds documents to 10 shards over 5 machines and need to limit the size of a shard to no more than 200GB because I only have 400GB of disk available to optimize a given shard. Why does the size (du) of an index typically decrease after a commit? I've observed a decrease in size of as much as from 296GB down to 151GB or as little as from 183GB to 182GB. Is that size after a commit close to the size the index would be after an optimize? Likely. Until you commit or close the Writer, the unoptimized index is the live index. And then you also have the optimized index. Once you commit and make the optimized index the live index, the unoptimized index can be removed (depending on your delete policy, which by default only keeps the latest commit point). For that matter, are there cases where optimization can take more than 2x? I've heard of cases but have not observed them in my system. I only do adds to the shards, never query them. An LVM snapshot of the shard receives the queries. There are cases where it takes over 2x - but they involve using reopen. If you have more than one Reader on the index, and only reopen some of them, the new Readers created can hold open the partially optimized segments that existed at that moment, creating a need for greater than 2x. Is doing a commit before I take a du a reliable way to gauge the size of the shard? It is really bad news to allow a shard to go over 200GB in my use case. How do others manage this problem of 2x space needed to optimize with limited disk space? Get more disk space ;) Or don't optimize. A lower mergeFactor can make optimizations less necessary. Advice greatly appreciated. Phil -- - Mark http://www.lucidimagination.com
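For reference, the merge factor lives in solrconfig.xml; a sketch with an illustrative value (lower values mean fewer segments on disk, at the cost of slower indexing):

<indexDefaults>
  <mergeFactor>4</mergeFactor>
</indexDefaults>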
Re: index size before and after commit
Whoops - the way I have mail come in, it's not easy to tell if I'm replying to the Lucene or Solr list ;) The way Solr works with Searchers and reopen, it shouldn't run into a situation that requires greater than 2x to optimize. I won't guarantee it ;) But based on what I know, it shouldn't happen under normal circumstances. Mark Miller wrote: Phillip Farber wrote: I am trying to automate a build process that adds documents to 10 shards over 5 machines and need to limit the size of a shard to no more than 200GB ... -- - Mark http://www.lucidimagination.com
Re: Query filters/analyzers
Thanks, that's exactly the kind of answer I was looking for. Chantal Ackermann wrote: Hi Claudio, in schema.xml, the analyzer element accepts the attribute type. If you need different analyzer chains during indexing and querying, configure it like this... -- Claudio Martella Digital Technologies Unit Research Development - Engineer TIS innovation park Via Siemens 19 | Siemensstr. 19 39100 Bolzano | 39100 Bozen Tel. +39 0471 068 123 Fax +39 0471 068 129 claudio.marte...@tis.bz.it http://www.tis.bz.it
Re: Where to place ReversedWildcardFilterFactory in Chain
Chantal Ackermann wrote: Hi all, I would have two questions about the ReversedWildcardFilterFactory: a) put it into both chains, index and query, or into index only? b) where exactly in the/each chain do I have to put it? ... Any help is greatly appreciated. Thanks! Chantal You just put it in the index chain, not the query chain. The SolrQueryParser will consult it when building a wildcard search - don't put it in the query chain. I know, appears like a bit of magic. That Andrzej is a wizard though, so it makes sense ;) -- - Mark http://www.lucidimagination.com
Keepwords Schema
Hi guys, Although I've been looking at Solr on and off for a few months, I'm still getting to grips with the schema and filters/tokenizers. I'm having trouble using the solr.KeepWordFilterFactory functionality, and there don't appear to be previous discussions here regarding it. I basically have a short text field (~100 chars on average) that I wish to be returned as a facet, but only some or parts of the field, based on keepwords stored in a file. An example: my schema is about web files. Part of the syntax is a text field of authors that have worked on each file, e.g.
<file>
  <filename>login.php</filename>
  <lastModDate>2009-01-01</lastModDate>
  <authors>alex, brian, carl carlington, dave alpha, eddie, dave beta</authors>
</file>
When I perform a search and get 20 web files back, I would like a facet of the individual authors, but only if their name appears in a public_authors.txt file. So if the public_authors.txt file contained: Anna, Bob, Carl Carlington, Dave Alpha, Elvis, Eddie, the facet returned would be:
Carl Carlington
Dave Alpha
Eddie
Not sure if that makes sense? If it does, could someone explain to me the schema fieldtype declarations that would bring back this sort of result? Thanks for any help. Paul
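A sketch of a field type that might produce that facet (names are illustrative; it assumes public_authors.txt sits in the core's conf directory, and uses a pattern tokenizer so multi-word names like Carl Carlington survive as single tokens):

<fieldType name="public_author" class="solr.TextField">
  <analyzer>
    <!-- split the authors string on commas, not whitespace -->
    <tokenizer class="solr.PatternTokenizerFactory" pattern=",\s*"/>
    <!-- drop any token not listed in conf/public_authors.txt -->
    <filter class="solr.KeepWordFilterFactory" words="public_authors.txt" ignoreCase="true"/>
  </analyzer>
</fieldType>
<field name="public_authors" type="public_author" indexed="true" stored="false"/>
<copyField source="authors" dest="public_authors"/>

Faceting on public_authors should then return only the whitelisted names.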
Re: index size before and after commit
bq. and reindex without any merges. That's actually quite a hoop to jump as well - though if you're determined and you have tons of RAM, it's somewhat doable. Mark Miller wrote: Nice one ;) It's not technically a case where optimize requires 2x though, in case the user asking gets confused. It's a case unrelated to optimize that can grow your index. Then you need 2x for the optimize, since you won't copy the deletes. It also requires that you jump hoops to delete everything. If you delete everything with *:*, that is smart enough not to just do a delete on every document - it just creates a new index, allowing the removal of the old very efficiently. Def agree on the more disk space. Walter Underwood wrote: Here is how you need 3X. First, index everything and optimize. Then delete everything and reindex without any merges. You have one full-size index containing only deleted docs, one full-size index containing reindexed docs, and need that much space for a third index. Honestly, disk is cheap, and there is no way to make Lucene work reliably with less disk. 1TB is a few hundred dollars. You have a free search engine, buy some disk. wunder On Oct 1, 2009, at 6:25 AM, Grant Ingersoll wrote: 151GB or as little as from 183GB to 182GB. Is that size after a commit close to the size the index would be after an optimize? For that matter, are there cases where optimization can take more than 2x? I've heard of cases but have not observed them in my system. I seem to recall a case where it can be 3x, but I don't know that it has been observed much. -- - Mark http://www.lucidimagination.com
Re: Query filters/analyzers
Ok, one more question on this issue. I used to have an all field, where I used to copyField title, content and keywords, defined with typeField text, which used to have English-language-dependent analyzers/filters. Now I can copyField all three content-* fields, as I know that only one of the three will be filled per document. My problem is once again that I have to define a typeField for this all field that should be language-independent. The solution is once again to create three all fields, or to create only one defined as text-ws (no language-dependent analysis). But in the latter case it would be desynched with the content-* fields, which are stemmed and stopped. About the copyField issue in general: as it copies the content to the other field, what is the sense of defining analyzers for the destination field? The source is already analyzed, so I guess that the RESULT of the analysis is copied there. In this case a text-ws should be sufficient. But then I guess the problem is again with the QUERY time analysis. Right? Chantal Ackermann wrote: Hi Claudio, in schema.xml, the analyzer element accepts the attribute type. If you need different analyzer chains during indexing and querying, configure it like this... -- Claudio Martella Digital Technologies Unit Research Development - Engineer TIS innovation park Via Siemens 19 | Siemensstr. 19 39100 Bolzano | 39100 Bozen Tel. +39 0471 068 123 Fax +39 0471 068 129 claudio.marte...@tis.bz.it http://www.tis.bz.it
Re: Only one usage of each socket address error
Hi. This situation is still bugging me. I thought I had it fixed yesterday, but no... Seems like this goes both for deleting and adding, but I'll explain the delete situation here: When I'm deleting documents (~5k) from an index, I get an error message saying Only one usage of each socket address (protocol/network address/port) is normally permitted 127.0.0.1:8983. I've tried both delete by id and delete by query, and both give me the same error. The command that is giving me the error message is solr.Delete(id) and solr.Delete(new SolrQuery("id:" + id)). The command is issued with SolrNet, and I'm not sure if this is SolrNet or Solr related. I cannot find anything that helps me out in the catalina log. Are there any other logs that should be checked? I'm grateful for any pointers :) Thanks, Steinar On 29 Sep 2009, at 11:15, Steinar Asbjørnsen wrote: Seems like the post in the SolrNet group: http://groups.google.com/group/solrnet/browse_thread/thread/7e3034b626d3e82d?pli=1 helped me get through. Thank you, solr-users, for helping out too! Steinar Forwarded message: From: Steinar Asbjørnsen steinar...@gmail.com Date: 28 September 2009 17.07.15 GMT+02.00 To: solr-user@lucene.apache.org Subject: Re: Only one usage of each socket address error I'm using the add(MyObject) command in a foreach loop to add my objects to the index. In the catalina log I cannot see anything that helps me out. It stops at:
28.sep.2009 08:58:40 org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[12345]} 0 187
28.sep.2009 08:58:40 org.apache.solr.core.SolrCore execute
INFO: [core2] webapp=/solr path=/update params={} status=0 QTime=187
Which indicates nothing wrong. Are there any other logs that should be checked? What it seems like to me at the moment is that the foreach is passing objects (documents) to Solr faster than Solr can add them to the index. As in, I'm eventually running out of connections (to Solr?) or something. I'm running another incremental update with other objects where the foreach isn't quite as fast. That job has added over 100k documents without failing, and is still going, whereas the problematic job fails after ~3k. What I've learned through the day, though, is that the index where my feed is failing is actually redundant, i.e. I'm off the hook for now. Still, I'd like to figure out what's going wrong. Steinar There's nothing in that output that indicates something we can help with over in solr-user land. What is the call you're making to Solr? Did Solr log anything anomalous? Erik On Sep 28, 2009, at 4:41 AM, Steinar Asbjørnsen wrote: I just posted to the SolrNet group since I have the exact same(?) problem. Hope I'm not being rude posting here as well (since the SolrNet group doesn't seem as active as this mailing list). The problem occurs when I'm running an incremental feed (self-made) of an index. My post: [snip] What's happening is that I get this error message (in VS): A first chance exception of type 'SolrNet.Exceptions.SolrConnectionException' occurred in SolrNet.DLL. And the web browser (which I use to start the feed) says: System.Data.SqlClient.SqlException: Timeout expired. The timeout period elapsed prior to completion of the operation or the server is not responding. At the time of writing my index contains 15k docs, and lacks ~700k docs that the incremental feed should take care of adding to the index. The error message appears after 3k docs are added, and before 4k docs are added. I'm committing on each i % 1000 == 0.
In addition, autocommit is set to:
<autoCommit>
  <maxDocs>1</maxDocs>
</autoCommit>
More info, from schema.xml:
<field name="id" type="text" indexed="true" stored="true" required="true"/>
<field name="name" type="string" indexed="true" stored="true" required="false"/>
I'm fetching data from a (remote) Sql 2008 Server, using sqljdbc4.jar. And Solr is running on a local Tomcat installation. SolrNet version: 0.2.3.0. Solr Specification Version: 1.3.0.2009.08.29.08.05.39 [/snip] Any suggestions on how to fix this would be much appreciated. Regards, Steinar
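One thing worth double-checking here, as a hedged guess rather than a confirmed diagnosis: maxDocs set to 1 forces a commit for every single added document, which multiplies the request traffic against the server. A less aggressive sketch (the thresholds are illustrative):

<autoCommit>
  <maxDocs>10000</maxDocs>
  <maxTime>60000</maxTime> <!-- milliseconds -->
</autoCommit>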
Re: Where to place ReversedWildcardFilterFactory in Chain
Thanks, Mark! But I suppose it does matter where in the index chain it goes? I would guess it is applied to the tokens, so I suppose I should put it at the very end, after WordDelimiter and Lowercase have been applied. Is that correct?
<analyzer type="index">
  <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="1" generateWordParts="1" generateNumberParts="1" catenateAll="1" preserveOriginal="1"/>
  <filter class="solr.LowerCaseFilterFactory"/>
  <filter class="solr.ReversedWildcardFilterFactory"/>
</analyzer>
Cheers, Chantal Mark Miller schrieb: You just put it in the index chain, not the query chain. The SolrQueryParser will consult it when building a wildcard search - don't put it in the query chain. I know, appears like a bit of magic. That Andrzej is a wizard though, so it makes sense ;) -- - Mark http://www.lucidimagination.com
Re: index size before and after commit
I've now worked on three different search engines and they all have a 3X worst case on space, so I'm familiar with this case. --wunder On Oct 1, 2009, at 7:15 AM, Mark Miller wrote: Nice one ;) Its not technically a case where optimize requires 2x though in case the user asking gets confused. Its a case unrelated to optimize that can grow your index. Then you need 2x for the optimize, since you won't copy the deletes. It also requires that you jump hoops to delete everything. If you delete everything with *:*, that is smart enough not to just do a delete on every document - it just creates a new index, allowing the removal of the old very efficiently. Def agree on the more disk space. Walter Underwood wrote: Here is how you need 3X. First, index everything and optimize. Then delete everything and reindex without any merges. You have one full-size index containing only deleted docs, one full-size index containing reindexed docs, and need that much space for a third index. Honestly, disk is cheap, and there is no way to make Lucene work reliably with less disk. 1TB is a few hundred dollars. You have a free search engine, buy some disk. wunder On Oct 1, 2009, at 6:25 AM, Grant Ingersoll wrote: 151GB or as little as from 183GB to 182GB. Is that size after a commit close to the size the index would be after an optimize? For that matter, are there cases where optimization can take more than 2x? I've heard of cases but have not observed them in my system. I seem to recall a case where it can be 3x, but I don't know that it has been observed much. -- - Mark http://www.lucidimagination.com
Re: trie fields and sortMissingLast
I just noticed this comment in the default schema:
<!-- These types should only be used for back compatibility with existing indexes, or if sortMissingLast functionality is needed. Use Trie based fields instead. -->
Does that mean TrieFields are never going to get sortMissingLast? Do you all think that a reasonable strategy is to use a copyField, and use the old sortable (sint, etc.) fields for sorting (only), and trie for everything else? On Wed, Sep 30, 2009 at 10:59 PM, Steve Conover scono...@gmail.com wrote: Am I correct in thinking that trie fields don't support sortMissingLast (my tests show that they don't). If not, is there any plan for adding it in? Regards, Steve
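A sketch of that copyField strategy (field and type names are illustrative, with the sortable type taken from the old default schema):

<fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
<field name="price" type="tint" indexed="true" stored="true"/>
<field name="price_sort" type="sint" indexed="true" stored="false"/>
<copyField source="price" dest="price_sort"/>

Queries would then range/filter on price but sort on price_sort.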
RE: Sorting/paging problem
Oops, the missing trailing Z was probably just a cut and paste error. It might be tough to come up with a case that can reproduce it -- it's a sticky issue. I'll post it if I can, though. -Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Tuesday, September 29, 2009 6:08 PM To: solr-user@lucene.apache.org Subject: Re: Sorting/paging problem
: <doc><date name="indexed_date">2009-09-23T19:25:03.400Z</date></doc>
: <doc><date name="indexed_date">2009-09-23T19:25:19.951</date></doc>
: <doc><date name="indexed_date">2009-09-23T20:10:07.919Z</date></doc>
is that a cut/paste error, or did you really get a date back from Solr w/o the trailing Z ?!?!?! ... : So, not only is the date sorting wrong, but the exact same document : shows up on the next page, also still out of date order. I've seen the : same document show up in 4-5 pages in some cases. It's always the last : record on the page, too. If I change the page size, the problem seems to that is really freaking weird. can you reproduce this in a simple example? maybe an index that's small enough (and doesn't contain confidential information) that you could zip up and post online? -Hoss
Solr Trunk Heap Space Issues
I am trying to update to the newest version of solr from trunk as of May 5th. I updated and compiled from trunk as of yesterday (09/30/2009). When I try to do a full import I am receiving a GC heap error after changing nothing in the configuration files. Why would this happen in the most recent versions but not in the version from a few months ago? The stack trace is below.
Oct 1, 2009 8:34:32 AM org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {add=[166400, 166608, 166698, 166800, 166811, 167097, 167316, 167353, ...(83 more)]} 0 35991
Oct 1, 2009 8:34:32 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded
at java.util.Arrays.copyOfRange(Arrays.java:3209)
at java.lang.String.<init>(String.java:215)
at com.ctc.wstx.util.TextBuffer.contentsAsString(TextBuffer.java:384)
at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:821)
at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:280)
at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139)
at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69)
at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
at org.apache.coyote.http11.Http11NioProcessor.process(Http11NioProcessor.java:879)
at org.apache.coyote.http11.Http11NioProtocol$Http11ConnectionHandler.process(Http11NioProtocol.java:719)
at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.run(NioEndpoint.java:2080)
at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:619)
Oct 1, 2009 8:40:06 AM org.apache.solr.core.SolrCore execute
INFO: [zeta-main] webapp=/solr path=/update params={} status=500 QTime=5265
Oct 1, 2009 8:40:12 AM org.apache.solr.common.SolrException log
SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded
-- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562
Re: field collapsing sums
hello martijn, thx for the tip, i tried that approach but ran into two snags: 1. returning the fields makes collapsing a lot slower for results, but that might just be the nature of iterating large results. 2. it seems like only dupes of records on the first page are returned, or is there a setting i'm missing? currently i'm only sending collapse.field=brand and collapse.includeCollapseDocs.fl=num_in_stock --joe On Thu, Oct 1, 2009 at 1:14 AM, Martijn v Groningen martijn.is.h...@gmail.com wrote: Hi Joe, Currently the patch does not do that, but you can do something else that might help you in getting your summed stock. In the latest patch you can include fields of collapsed documents in the result per distinct field value. ... Cheers, Martijn
Re: Where to place ReversedWildcardFilterFactory in Chain
Chantal Ackermann wrote: Thanks, Mark! But I suppose it does matter where in the index chain it goes? I would guess it is applied to the tokens, so I suppose I should put it at the very end, after WordDelimiter and Lowercase have been applied. Is that correct? ... Yes. Care should be taken that the query analyzer chain produces the same forward tokens, because the code in QueryParser that optionally reverses tokens acts on tokens that it receives _after_ all other query analyzers have run on the query. -- Best regards, Andrzej Bialecki http://www.sigram.com Contact: info at sigram dot com
What to set in query.setMaxRows()?
Sorry about asking this here, but I can't reach wiki.apache.org right now. What do I set in query.setMaxRows() to get all the rows? -- http://www.linkedin.com/in/paultomblin
Re: Solr Trunk Heap Space Issues
Jeff Newburn wrote: I am trying to update to the newest version of solr from trunk as of May 5th. I updated and compiled from trunk as of yesterday (09/30/2009). When I try to do a full import I am receiving a GC heap error after changing nothing in the configuration files. Why would this happen in the most recent versions but not in the version from a few months ago? Good question. The error means it's spending too much time trying to garbage collect without making much progress. Why so much more garbage to collect just by updating? Not sure... The stack trace is below. ... -- - Mark http://www.lucidimagination.com
Correction: query.setRows
Sorry, in my last question I meant setRows, not setMaxRows. What do I pass to setRows to get all matches, not just the first 10? -- Sent from my Palm Prē
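There is no special value meaning all rows; a common workaround, sketched here with SolrJ (the server URL and query are illustrative, and fetching everything in one response can use a lot of memory, so paging in chunks is safer for large result sets):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FetchAll {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8983/solr");
        SolrQuery query = new SolrQuery("*:*");
        query.setRows(0);                       // no docs yet, just the count
        QueryResponse rsp = server.query(query);
        long total = rsp.getResults().getNumFound();
        query.setRows((int) total);             // now ask for every match
        rsp = server.query(query);
        System.out.println(rsp.getResults().size() + " documents fetched");
    }
}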
Re: Solr Trunk Heap Space Issues
You probably want to add the following command line option to java to produce a heap dump: -XX:+HeapDumpOnOutOfMemoryError Then you can use jhat to see what's taking up all the space in the heap. Bill On Thu, Oct 1, 2009 at 11:47 AM, Mark Miller markrmil...@gmail.com wrote: Jeff Newburn wrote: I am trying to update to the newest version of solr from trunk as of May 5th. ... Good question. The error means it's spending too much time trying to garbage collect without making much progress. Why so much more garbage to collect just by updating? Not sure... -- - Mark http://www.lucidimagination.com
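For example (the heap size, start command and dump file name are illustrative; the dump lands in the working directory unless -XX:HeapDumpPath says otherwise):

java -Xmx1024m -XX:+HeapDumpOnOutOfMemoryError -jar start.jar
jhat java_pid12345.hprof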
Re: Where to place ReversedWildcardFilterFactory in Chain
Hi Andrzej, thanks! Unfortunately, I get a ClassNotFoundException for the solr.ReversedWildcardFilterFactory with my nightly build from 22nd of September. I've found the corresponding JIRA issue, but from the wiki it's not obvious that this might require a patch? I'll have a closer look at the JIRA issue, in any case. Cheers, Chantal Andrzej Bialecki wrote: Chantal Ackermann wrote: Thanks, Mark! But I suppose it does matter where in the index chain it goes? I would guess it is applied to the tokens, so I suppose I should put it at the very end - after WordDelimiter and Lowercase have been applied. Is that correct? <analyzer type="index"> <filter class="solr.WordDelimiterFilterFactory" splitOnCaseChange="1" splitOnNumerics="1" stemEnglishPossessive="1" generateWordParts="1" generateNumberParts="1" catenateAll="1" preserveOriginal="1"/> <filter class="solr.LowerCaseFilterFactory"/> <filter class="solr.ReversedWildcardFilterFactory"/> </analyzer> Yes. Care should be taken that the query analyzer chain produces the same forward tokens, because the code in QueryParser that optionally reverses tokens acts on tokens that it receives _after_ all other query analyzers have run on the query. -- Best regards, Andrzej Bialecki http://www.sigram.com Contact: info at sigram dot com
Re: Problem getting Solr home from JNDI in Tomcat
Andrew Clegg wrote: hossman wrote: This is why the examples of using context files on the wiki talk about keeping the war *outside* of the webapps directory, and using docBase in your Context declaration... http://wiki.apache.org/solr/SolrTomcat Great, I'll try it this way and see if it clears up. Is it okay to keep the war file *inside* the Solr home directory (/opt/solr in my case) so it's all self-contained? For the benefit of future searchers -- I tried it this way and it works fine. Thanks again to everyone for helping. Andrew.
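For the record, a sketch of the context fragment this arrangement uses, assuming the war sits at /opt/solr/solr.war and the fragment is deployed as conf/Catalina/localhost/solr.xml:

  <Context docBase="/opt/solr/solr.war" debug="0" crossContext="true">
    <!-- JNDI entry Solr looks up to find its home directory -->
    <Environment name="solr/home" type="java.lang.String" value="/opt/solr" override="true"/>
  </Context>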
ExtractingRequestHandler unknown field 'stream_source_info'
Hi All, I'm trying Solr Cell outside of the example and running into trouble because I can't refer to the http://wiki.apache.org/solr/ExtractingRequestHandler (the wiki's down). After realizing I needed to copy all the jars from /example/solr/lib to my index's /lib dir, I am now hitting this particular wall: INFO: [] webapp=/solr path=/update/extract params={myfile=MHGL016341T.pdf&commit=true&literal.id=MHGL.1634} status=0 QTime=5967 1-Oct-2009 10:06:34 AM org.apache.solr.update.processor.LogUpdateProcessor finish INFO: {} 0 260248 1-Oct-2009 10:06:38 AM org.apache.solr.common.SolrException log SEVERE: org.apache.solr.common.SolrException: ERROR:unknown field 'stream_source_info' at org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:289) at org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:60) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.doAdd(ExtractingDocumentLoader.java:118) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.addDoc(ExtractingDocumentLoader.java:123) at org.apache.solr.handler.extraction.ExtractingDocumentLoader.load(ExtractingDocumentLoader.java:192) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1089) at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:365) at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216) at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181) at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:712) at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:405) at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:211) at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114) at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:139) at org.mortbay.jetty.Server.handle(Server.java:285) at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:502) at org.mortbay.jetty.HttpConnection$RequestHandler.content(HttpConnection.java:835) at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:641) at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:202) at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:378) at org.mortbay.jetty.bio.SocketConnector$Connection.run(SocketConnector.java:226) at org.mortbay.thread.BoundedThreadPool$PoolThread.run(BoundedThreadPool.java:442) while running: curl "http://localhost:8983/solr/update/extract?literal.id=MHGL.1634&commit=true" -F "myfile=@MHGL016341T.pdf" It feels like I'm not mapping something correctly either in my POST request or in solrconfig.xml/schema.xml. I can see that STREAM_SOURCE_INFO is supposed to be an internal field from the code but I'm not following why it would cause this error. Any suggestions would be appreciated. Many Thanks, Tricia
Quotes in query string cause NullPointerException
Hi folks, I'm using the 2009-09-30 build, and any single or double quotes in the query string cause an NPE. Is this normal behaviour? I never tried it with my previous installation. Example: http://myserver:8080/solr/select/?title:%22Creatine+kinase%22 (I've also tried without the URL encoding, no difference) Response: HTTP Status 500 - null java.lang.NullPointerException at java.io.StringReader.<init>(StringReader.java:33) at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:173) at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:78) at org.apache.solr.search.QParser.getQuery(QParser.java:131) at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:89) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175) at org.apache.catalina.valves.RequestFilterValve.process(RequestFilterValve.java:269) at org.apache.catalina.valves.RemoteAddrValve.invoke(RemoteAddrValve.java:81) at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:568) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20) at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20) at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20) at org.jstripe.tomcat.probe.Tomcat55AgentValve.invoke(Tomcat55AgentValve.java:20) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:844) at org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:447) at java.lang.Thread.run(Thread.java:619) Single quotes have the same effect. Is there another way to specify exact phrases? Thanks, Andrew.
Re: ExtractingRequestHandler unknown field 'stream_source_info'
On 1 Oct 09, at 12:46 PM, Tricia Williams wrote: STREAM_SOURCE_INFO appears to be a constant from this page: http://lucene.apache.org/solr/api/constant-values.html There's also this article: https://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-2 And this thread has it embedded as an arr in the results: http://www.nabble.com/Solr-question-td25271706.html Whether any of these help or not ... Walter
Re: Where to place ReversedWildcardFilterFactory in Chain
It was added to trunk on the 11th and shouldn't require a patch. You sure that nightly was actually built after then? solr.ReversedWildcardFilterFactory should work fine. Chantal Ackermann wrote: Hi Andrzej, thanks! Unfortunately, I get a ClassNotFoundException for the solr.ReversedWildcardFilterFactory with my nightly build from 22nd of September. I've found the corresponding JIRA issue, but from the wiki it's not obvious that this might require a patch? I'll have a closer look at the JIRA issue, in any case. Cheers, Chantal ... -- - Mark http://www.lucidimagination.com
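Putting that together, a sketch of a complete field type; the names and filter parameters are illustrative (the reversal parameters come from the 1.4 example schema). The reversal filter goes last in the index chain, and the query chain repeats the same forward filters without it:

  <fieldType name="text_rev" class="solr.TextField" positionIncrementGap="100">
    <analyzer type="index">
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
              generateNumberParts="1" catenateAll="1" preserveOriginal="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
      <!-- last, so it reverses the final forward tokens -->
      <filter class="solr.ReversedWildcardFilterFactory" withOriginal="true"
              maxPosAsterisk="3" maxPosQuestion="2" maxFractionAsterisk="0.33"/>
    </analyzer>
    <analyzer type="query">
      <!-- same forward chain, no reversal: the query parser handles reversing -->
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1"
              generateNumberParts="1" catenateAll="1" preserveOriginal="1"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>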
Re: Quotes in query string cause NullPointerException
don't forget q=... :) Erik On Oct 1, 2009, at 9:49 AM, Andrew Clegg wrote: Hi folks, I'm using the 2009-09-30 build, and any single or double quotes in the query string cause an NPE. Is this normal behaviour? I never tried it with my previous installation. Example: http://myserver:8080/solr/select/?title:%22Creatine+kinase%22 (I've also tried without the URL encoding, no difference) ... Single quotes have the same effect. Is there another way to specify exact phrases? Thanks, Andrew.
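In other words, the fielded query needs to be the value of the q parameter:

  http://myserver:8080/solr/select/?q=title:%22Creatine+kinase%22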
Re: Quotes in query string cause NullPointerException
Sorry! I'm officially a complete idiot. Personally I'd try to catch things like that and rethrow a 'QueryParseException' or something -- but don't feel under any obligation to listen to me because, well, I'm an idiot. Thanks :-) Andrew. Erik Hatcher-4 wrote: don't forget q=... :) Erik
How to access the information from SolrJ
When I do a query directly from the web, the XML of the response includes how many results would have been returned if it hadn't restricted itself to the first 10 rows. For instance, the query: http://localhost:8080/solrChunk/nutch/select/?q=*:*&fq=category:mysites returns: <response> <lst name='responseHeader'> <int name='status'>0</int> <int name='QTime'>0</int> <lst name='params'> <str name='q'>*:*</str> <str name='fq'>category:mysites</str> </lst> </lst> <result name='response' numFound='1251' start='0'> <doc> <str name='category'>mysites</str> <long name='chunkNum'>0</long> <str name='chunkUrl'>http://localhost/Chunks/mysites/0-http___xcski.com_.xml</str> <str name='concept'>Anatomy</str> ... The value I'm talking about is in the numFound attribute of the result tag. I don't see any way to retrieve it through SolrJ - it's not in the QueryResponse.getHeader(), for instance. Can I retrieve it somewhere? -- http://www.linkedin.com/in/paultomblin
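For the archives: in SolrJ the total lives on the document list rather than the header. A minimal sketch against the 1.4-era API, with the core URL taken from the query above:

  import org.apache.solr.client.solrj.SolrQuery;
  import org.apache.solr.client.solrj.SolrServer;
  import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
  import org.apache.solr.client.solrj.response.QueryResponse;
  import org.apache.solr.common.SolrDocumentList;

  public class NumFoundExample {
      public static void main(String[] args) throws Exception {
          SolrServer server = new CommonsHttpSolrServer("http://localhost:8080/solrChunk/nutch");
          SolrQuery query = new SolrQuery("*:*").addFilterQuery("category:mysites");
          QueryResponse rsp = server.query(query);
          SolrDocumentList docs = rsp.getResults();
          // getNumFound() is the numFound attribute: total matches, not rows returned
          System.out.println("numFound=" + docs.getNumFound() + ", returned=" + docs.size());
      }
  }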
Re: Solr Trunk Heap Space Issues
Added the parameter and it didn't seem to dump when it hit the gc limit error. Any other thoughts? -- Jeff Newburn Software Engineer, Zappos.com jnewb...@zappos.com - 702-943-7562 From: Bill Au bill.w...@gmail.com Reply-To: solr-user@lucene.apache.org Date: Thu, 1 Oct 2009 12:16:53 -0400 To: solr-user@lucene.apache.org Subject: Re: Solr Trunk Heap Space Issues You probably want to add the following command line option to java to produce a heap dump: -XX:+HeapDumpOnOutOfMemoryError Then you can use jhat to see what's taking up all the space in the heap. Bill ...
Re: Solr Trunk Heap Space Issues
Jeff Newburn wrote: Added the parameter and it didn't seem to dump when it hit the gc limit error. Any other thoughts? You might use jmap to take a look at the heap (you can do it while it's live with Java 6) or to force a heap dump when you specify. Since it's spending 98% of the time in gc and recovering less than 2% of the heap, it's likely you're just right at the mem limits of what your app now requires. Why that should be different than what it was in a March build, I still dunno. Can you give us the info on your stats page regarding the new fieldcache insanity checker? -- - Mark http://www.lucidimagination.com
Re: Solr Trunk Heap Space Issues
Mark Miller wrote: You might use jmap to take a look at the heap (you can do it while it's live with Java 6) Errr - just so I don't screw anyone in a production environment - it will freeze your app while it's getting the info. -- - Mark http://www.lucidimagination.com
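Concretely, with the JDK 6 tools (the pid 12345 and file path are placeholders):

  # histogram of live objects; forces a full GC and pauses the app
  jmap -histo:live 12345

  # or take a full binary dump to feed to jhat
  jmap -dump:live,format=b,file=/tmp/solr-heap.bin 12345
  jhat /tmp/solr-heap.bin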
Re: Why isn't the DateField implementation of ISO 8601 broader?
My question is why isn't the DateField implementation of ISO 8601 broader so that it could include CCYY and CCYY-MM as acceptable date strings? What would it take to do so? Nobody ever cared? But yes, you're right, the spurious precision is annoying. However, there is no fuzzy search for dates, so the precision is always used. Let's say I want to limit it to 19th-century American culture. 1790-1910 is a fairly contiguous sequence in US history, with a massive break at 1910 for WW1. Are there any work-arounds for faceting by century, year, month without creating new fields in my schema? The last resort would be to create these new fields but I'm hoping to leverage the power of the DateField and the trie to replace range stuff. There are no workarounds as yet. You do not have to store the century/year etc. fields, only index them. Tries do not support faceting yet. Some interesting observations from tinkering with the DateFieldTest: * 2003-03-00T00:00:00Z becomes 2003-02-28T00:00:00Z The date parser should blow up with these values!
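For what it's worth, the rounding pattern looks exactly like lenient java.util.Calendar arithmetic (day 00 rolls back to the last day of the previous month, month 00 to December of the previous year). A standalone sketch, not Solr code, that reproduces the observations above:

  import java.text.SimpleDateFormat;
  import java.util.TimeZone;

  public class LenientDateDemo {
      public static void main(String[] args) throws Exception {
          SimpleDateFormat fmt = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
          fmt.setTimeZone(TimeZone.getTimeZone("UTC"));
          fmt.setLenient(true); // lenient is already the default; shown for emphasis
          // day-of-month 00 rolls back a day: prints 2003-02-28T00:00:00Z
          System.out.println(fmt.format(fmt.parse("2003-03-00T00:00:00Z")));
          // month 00 and day 00 roll back twice: prints 2002-11-30T00:00:00Z
          System.out.println(fmt.format(fmt.parse("2003-00-00T00:00:00Z")));
      }
  }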
Re: ExtractingRequestHandler unknown field 'stream_source_info'
If the wiki isn't working https://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-2 gave me more information. The LucidImagination article helps too. Now that the wiki is up again it is more obvious that I need to add: <str name="fmap.content">fulltext</str> <str name="defaultField">text</str> to my solrconfig.xml Tricia
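For completeness, a sketch of the matching solrconfig.xml section; the uprefix line is an assumption that routes any unmapped Tika metadata field, stream_source_info included, into ignored_* dynamic fields instead of triggering the unknown-field error:

  <requestHandler name="/update/extract"
                  class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
    <lst name="defaults">
      <!-- put the extracted body into the fulltext field -->
      <str name="fmap.content">fulltext</str>
      <str name="defaultField">text</str>
      <!-- unmapped metadata (stream_source_info, etc.) becomes ignored_* -->
      <str name="uprefix">ignored_</str>
    </lst>
  </requestHandler>

paired with the example schema's catch-all in schema.xml:

  <dynamicField name="ignored_*" type="ignored" multiValued="true"/>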
Re: Quotes in query string cause NullPointerException
Don't be too hard on yourself. Sometimes, mistakes like that can happen even to the most brilliant and most experienced. On Thu, Oct 1, 2009 at 2:15 PM, Andrew Clegg andrew.cl...@gmail.com wrote: Sorry! I'm officially a complete idiot. Personally I'd try to catch things like that and rethrow a 'QueryParseException' or something -- but don't feel under any obligation to listen to me because, well, I'm an idiot. Thanks :-) Andrew. Erik Hatcher-4 wrote: don't forget q=... :) Erik ... -- Good Enough is not good enough. To give anything less than your best is to sacrifice the gift. Quality First. Measure Twice. Cut Once.
Re: trie fields and sortMissingLast
Trie fields also do not support faceting. They also take more ram in some operations. Given these defects, I'm not sure that promoting tries as the default is appropriate at this time. (I'm sure this is an old argument. :) On Thu, Oct 1, 2009 at 7:39 AM, Steve Conover scono...@gmail.com wrote: I just noticed this comment in the default schema: <!-- These types should only be used for back compatibility with existing indexes, or if sortMissingLast functionality is needed. Use Trie based fields instead. --> Does that mean TrieFields are never going to get sortMissingLast? Do you all think that a reasonable strategy is to use a copyField and use s fields for sorting (only), and trie for everything else? On Wed, Sep 30, 2009 at 10:59 PM, Steve Conover scono...@gmail.com wrote: Am I correct in thinking that trie fields don't support sortMissingLast (my tests show that they don't). If not, is there any plan for adding it in? Regards, Steve -- Lance Norskog goks...@gmail.com
Re: Quotes in query string cause NullPointerException
Indeed... and the only reason I knew the answer right away is because I've experienced this myself numerous times :) Erik On Oct 1, 2009, at 11:46 AM, Israel Ekpo wrote: Don't be too hard on yourself. Sometimes, mistakes like that can happen even to the most brilliant and most experienced. ...
Re: ExtractingRequestHandler unknown field 'stream_source_info'
For future reference, the Solr and Lucene wikis and mailing lists are indexed on http://www.lucidimagination.com/search/ On Thu, Oct 1, 2009 at 11:40 AM, Tricia Williams williams.tri...@gmail.com wrote: If the wiki isn't working https://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-2 gave me more information. ... Tricia -- Lance Norskog goks...@gmail.com
Re: Solr Trunk Heap Space Issues
On Thu, Oct 1, 2009 at 11:41 AM, Jeff Newburn jnewb...@zappos.com wrote: I am trying to update to the newest version of solr from trunk as of May 5th. Tons of changes since... including the per-segment searching/sorting/function queries (I think). Do you sort on any single valued fields that you also facet on? Do you use ord() or rord() in any function queries? Unfortunately, some of these things will take up more memory because some things still cache FieldCache elements with the top-level reader, while some use segment readers. The direction is going toward all segment readers, but we're not there yet (and won't be for 1.4). ord() rord() will never be fixed... people need to migrate to something else. http://issues.apache.org/jira/browse/SOLR- is the main issue for this. Of course, I've really only been talking about search related changes. Nothing on the indexing side should cause greater memory usage but perhaps the indexing side could run out of memory due to the search side taking up more. -Yonik http://www.lucidimagination.com
Re: index size before and after commit
I've heard there is a new partial optimize feature in Lucene, but it is not mentioned in the Solr or Lucene wikis so I cannot advise you how to use it. On a previous project we had a 500GB index for 450m documents. It took 14 hours to optimize. We found that Solr worked well (given enough RAM for sorting and faceting requests) but that the IT logistics of a 500G fileset were too much. Also, if you want your query servers to continue serving while propagating the newly optimized index, you need 2X space to store both copies on the slave during the transfer. For us this took 35 minutes over 1G ethernet. On Thu, Oct 1, 2009 at 7:36 AM, Walter Underwood wun...@wunderwood.org wrote: I've now worked on three different search engines and they all have a 3X worst case on space, so I'm familiar with this case. --wunder On Oct 1, 2009, at 7:15 AM, Mark Miller wrote: Nice one ;) It's not technically a case where optimize requires 2x though, in case the user asking gets confused. It's a case unrelated to optimize that can grow your index. Then you need 2x for the optimize, since you won't copy the deletes. It also requires that you jump hoops to delete everything. If you delete everything with *:*, that is smart enough not to just do a delete on every document - it just creates a new index, allowing the removal of the old very efficiently. Def agree on the more disk space. Walter Underwood wrote: Here is how you need 3X. First, index everything and optimize. Then delete everything and reindex without any merges. You have one full-size index containing only deleted docs, one full-size index containing reindexed docs, and need that much space for a third index. Honestly, disk is cheap, and there is no way to make Lucene work reliably with less disk. 1TB is a few hundred dollars. You have a free search engine, buy some disk. wunder On Oct 1, 2009, at 6:25 AM, Grant Ingersoll wrote: 151GB or as little as from 183GB to 182GB. Is that size after a commit close to the size the index would be after an optimize? For that matter, are there cases where optimization can take more than 2x? I've heard of cases but have not observed them in my system. I seem to recall a case where it can be 3x, but I don't know that it has been observed much. -- - Mark http://www.lucidimagination.com -- Lance Norskog goks...@gmail.com
Re: trie fields and sortMissingLast
On Thu, Oct 1, 2009 at 10:39 AM, Steve Conover scono...@gmail.com wrote: I just noticed this comment in the default schema: !-- These types should only be used for back compatibility with existing indexes, or if sortMissingLast functionality is needed. Use Trie based fields instead. -- Does that mean TrieFields are never going to get sortMissingLast? Not in time for 1.4, but yes they will eventually get it. It has to do with the representation... currently we can't tell between a 0 and missing. Do you all think that a reasonable strategy is to use a copyField and use s fields for sorting (only), and trie for everything else? If you don't need the fast range queries, use the s fields only. -Yonik http://www.lucidimagination.com On Wed, Sep 30, 2009 at 10:59 PM, Steve Conover scono...@gmail.com wrote: Am I correct in thinking that trie fields don't support sortMissingLast (my tests show that they don't). If not, is there any plan for adding it in? Regards, Steve
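In schema.xml terms that strategy is something like the following sketch (field and type names are illustrative); the sortable type carries sortMissingLast, so you sort on price_sort and run range queries against price:

  <fieldType name="sint" class="solr.SortableIntField" sortMissingLast="true" omitNorms="true"/>
  <fieldType name="tint" class="solr.TrieIntField" precisionStep="8" omitNorms="true"/>

  <field name="price"      type="tint" indexed="true" stored="true"/>
  <field name="price_sort" type="sint" indexed="true" stored="false"/>

  <copyField source="price" dest="price_sort"/>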
Re: Solr Trunk Heap Space Issues
bq. Tons of changes since... including the per-segment searching/sorting/function queries (I think). Yup. I actually didn't think so, because that was committed to Lucene in February - but it didn't come into Solr till March 10th. March 5th just ducked it. Yonik Seeley wrote: ... -- - Mark http://www.lucidimagination.com
Re: index size before and after commit
Ha! Searching partial optimize on http://www.lucidimagination.com/search, we discover SOLR-603 which gives the 'maxSegments' option to the optimize command. The text does not include the word 'partial'. It's on http://wiki.apache.org/solr/UpdateXmlMessages. The command gives a number of Lucene segments, and I have no idea how this will translate to disk space. To minimize disk space, you could run it repetitively with the number of segments decreasing to one. On Thu, Oct 1, 2009 at 11:49 AM, Lance Norskog goks...@gmail.com wrote: I've heard there is a new partial optimize feature in Lucene, but it is not mentioned in the Solr or Lucene wikis so I cannot advise you how to use it. ... -- Lance Norskog goks...@gmail.com
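The maxSegments attribute rides on the optimize message itself, so the stepwise approach would look something like this (URL and segment counts are illustrative):

  curl http://localhost:8983/solr/update -H 'Content-type: text/xml' --data-binary '<optimize maxSegments="16"/>'
  curl http://localhost:8983/solr/update -H 'Content-type: text/xml' --data-binary '<optimize maxSegments="4"/>'
  curl http://localhost:8983/solr/update -H 'Content-type: text/xml' --data-binary '<optimize maxSegments="1"/>'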
Re: Solr Trunk Heap Space Issues
On Thu, Oct 1, 2009 at 3:14 PM, Mark Miller markrmil...@gmail.com wrote: bq. Tons of changes since... including the per-segment searching/sorting/function queries (I think). Yup. I actually didn't think so, because that was committed to Lucene in February - but it didn't come into Solr till March 10th. March 5th just ducked it. Jeff said May 5th. But it wasn't until the end of May that Solr started using Lucene's new sorting facilities that worked per-segment. -Yonik http://www.lucidimagination.com
Re: Solr Trunk Heap Space Issues
Whoops. There is my lazy brain for you - march, may, august - all the same ;) Okay - forgot Solr went straight down and used FieldSortedHitQueue. So it all still makes sense ;) Still interested in seeing his field sanity output to see whats possibly being doubled. Yonik Seeley wrote: On Thu, Oct 1, 2009 at 3:14 PM, Mark Miller markrmil...@gmail.com wrote: bq. Tons of changes since... including the per-segment searching/sorting/function queries (I think). Yup. I actually didn't think so, because that was committed to Lucene in Feburary - but it didn't come into Solr till March 10th. March 5th just ducked it. Jeff said May 5th But it wasn't until the end of May that Solr started using Lucene's new sorting facilities that worked per-segment. -Yonik http://www.lucidimagination.com Yonik Seeley wrote: On Thu, Oct 1, 2009 at 11:41 AM, Jeff Newburn jnewb...@zappos.com wrote: I am trying to update to the newest version of solr from trunk as of May 5th. Tons of changes since... including the per-segment searching/sorting/function queries (I think). Do you sort on any single valued fields that you also facet on? Do you use ord() or rord() in any function queries? Unfortunately, some of these things will take up more memory because some things still cache FieldCache elements with the top-level reader, while some use segment readers. The direction is going toward all segment readers, but we're not there yet (and won't be for 1.4). ord() rord() will never be fixed... people need to migrate to something else. http://issues.apache.org/jira/browse/SOLR- is the main issue for this. If course, I've really only been talking about search related changes. Nothing on the indexing side should cause greater memory usage but perhaps the indexing side could run out of memory due to the search side taking up more. -Yonik http://www.lucidimagination.com I updated and compiled from trunk as of yesterday (09/30/2009). When I try to do a full import I am receiving a GC heap error after changing nothing in the configuration files. Why would this happen in the most recent versions but not in the version from a few months ago. The stack trace is below. Oct 1, 2009 8:34:32 AM org.apache.solr.update.processor.LogUpdateProcessor finish INFO: {add=[166400, 166608, 166698, 166800, 166811, 167097, 167316, 167353, ...(83 more)]} 0 35991 Oct 1, 2009 8:34:32 AM org.apache.solr.common.SolrException log SEVERE: java.lang.OutOfMemoryError: GC overhead limit exceeded at java.util.Arrays.copyOfRange(Arrays.java:3209) at java.lang.String.init(String.java:215) at com.ctc.wstx.util.TextBuffer.contentsAsString(TextBuffer.java:384) at com.ctc.wstx.sr.BasicStreamReader.getText(BasicStreamReader.java:821) at org.apache.solr.handler.XMLLoader.readDoc(XMLLoader.java:280) at org.apache.solr.handler.XMLLoader.processUpdate(XMLLoader.java:139) at org.apache.solr.handler.XMLLoader.load(XMLLoader.java:69) at org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentSt reamHandlerBase.java:54) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase. 
Re: Solr Trunk Heap Space Issues
On Thu, Oct 1, 2009 at 3:37 PM, Mark Miller markrmil...@gmail.com wrote: Still interested in seeing his field sanity output to see what's possibly being doubled.

Strangely enough, I'm having a hard time seeing caching at the different levels. I made a multi-segment index (2 segments), and then did a sort and facet:

http://localhost:8983/solr/select?q=*:*&sort=popularity%20desc&facet=true&facet.field=popularity

Seems like that should do it, but the statistics fieldCache section shows only 2 entries.

entries_count : 2
entry#0 : 'org.apache.lucene.index.compoundfilereader$csindexin...@5b38d7'='popularity',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=[I#949587 (size =~ 92 bytes)
entry#1 : 'org.apache.lucene.index.compoundfilereader$csindexin...@1582a7c'='popularity',int,org.apache.lucene.search.FieldCache.NUMERIC_UTILS_INT_PARSER=[I#3534544 (size =~ 28 bytes)
insanity_count : 0

Investigating further...
-Yonik
http://www.lucidimagination.com
Re: field collapsing sums
1) That is correct. Including the fields of collapsed documents can make your search significantly slower (depending on how many documents are returned).

2) It seems that you are using the parameters as intended. The collapsed documents will contain all documents (from the whole query result) that have been collapsed on a certain field value that occurs in the result set being displayed. That is how it should work. But if I'm understanding you correctly, you want to display all dupes from the whole query result set (also those whose collapse field value does not occur in the displayed result set)?

Martijn

2009/10/1 Joe Calderon calderon@gmail.com:
Hello Martijn, thanks for the tip. I tried that approach but ran into two snags: 1. returning the fields makes collapsing a lot slower for results, but that might just be the nature of iterating large results. 2. it seems like only dupes of records on the first page are returned - or is there a setting I'm missing? Currently I'm only sending collapse.field=brand and collapse.includeCollapseDocs.fl=num_in_stock --joe

On Thu, Oct 1, 2009 at 1:14 AM, Martijn v Groningen martijn.is.h...@gmail.com wrote:
...

--
Met vriendelijke groet,
Martijn van Groningen
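For reference, the client-side summing Martijn describes might look roughly like this in SolrJ. This is only a sketch: the collapsedDocs structure comes from the uncommitted field-collapsing patch, so the exact response shape (and whether it maps onto a NamedList of SolrDocumentList) may differ between patch versions.

import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;
import org.apache.solr.common.util.NamedList;

public class StockSummer {
    /** Sum num_in_stock over the collapsed documents of one brand.
     *  Assumes the patch returns collapsedDocs as a NamedList of
     *  SolrDocumentList keyed by the collapse-field value. */
    public static long sumStock(NamedList<?> response, String brand) {
        NamedList<?> collapsed = (NamedList<?>) response.get("collapsedDocs");
        SolrDocumentList docs = (SolrDocumentList) collapsed.get(brand);
        long total = 0;
        for (SolrDocument doc : docs) {
            // the patch returns the field as a string, per the XML above
            total += Long.parseLong((String) doc.getFieldValue("num_in_stock"));
        }
        return total;
    }
}

You would call this with the NamedList from QueryResponse#getResponse() after issuing the collapsing request.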
Re: Solr Trunk Heap Space Issues
On Thu, Oct 1, 2009 at 4:05 PM, Yonik Seeley yo...@lucidimagination.com wrote:
Strangely enough, I'm having a hard time seeing caching at the different levels. ... Investigating further...

Ahhh, TrieField.isTokenized() returns true. The facet code has

    boolean multiToken = sf.multiValued() || ft.isTokenized();

and if multiToken==true then it uses multi-valued faceting, which doesn't use the field cache. Since isTokenized() more reflects whether something is tokenized at the Lucene level, perhaps we need something that specifies whether there is more than one logical value per field value? I'm drawing a blank on a good name for such a method though...

-Yonik
http://www.lucidimagination.com
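A side note on why TrieField.isTokenized() is true at all: a single numeric value is expanded into one token per precision step at index time. A small sketch against the Lucene 2.9 API (precisionStep=8 is just an illustrative choice):

import org.apache.lucene.analysis.NumericTokenStream;

public class TrieTokens {
    public static void main(String[] args) throws Exception {
        // One int value becomes 32/8 = 4 prefix-coded tokens at
        // precisionStep=8, which is why a trie field looks "tokenized"
        // to Solr even when the field itself is single-valued.
        NumericTokenStream ts = new NumericTokenStream(8).setIntValue(42);
        ts.reset();
        int count = 0;
        while (ts.incrementToken()) {
            count++;
        }
        System.out.println(count + " tokens for a single int value"); // prints 4
    }
}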
Re: Solr Trunk Heap Space Issues
On Thu, Oct 1, 2009 at 4:35 PM, Yonik Seeley yo...@lucidimagination.com wrote: Since isTokenized() more reflects if something is tokenized at the Lucene level, perhaps we need something that specifies if there is more than one logical value per field value? I'm drawing a blank on a good name for such a method though...

    boolean singleValuedFieldCache()?

-Yonik
http://www.lucidimagination.com
Re: field collapsing sums
Thanks for the reply. I just want the number of dupes in the query result, but it seems I don't get the correct totals. For example, a non-collapsed dismax query for belgian beer returns X results, but when I collapse and sum the number of docs under collapse_counts, it's much less than X. It does seem to work when the collapsed results fit on one page (10 rows in my case). --joe

2) It seems that you are using the parameters as intended. The collapsed documents will contain all documents (from the whole query result) that have been collapsed on a certain field value that occurs in the result set being displayed. That is how it should work. But if I'm understanding you correctly, you want to display all dupes from the whole query result set (also those whose collapse field value does not occur in the displayed result set)?
Authentication/Authorization with Master-Slave over HTTP
Is that possible? Implemented? I want to be able to have a Solr slave instance on a publicly available host (accessible via HTTP) and synchronize with the master securely (via HTTP). I had this implicitly with cron jobs running as the 'root' user and Tomcat as 'tomcat' - the slave wasn't able to update the index because of file system permissions... but now I want to move the instances far apart (master in the LAB, slave at a hosting company) - and I want to secure it... Thanks, Fuad http://www.linkedin.com/in/liferay
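If you move to the Java-based replication that is new in 1.4, the slave polls the master over plain HTTP and, as far as I recall, the slave config accepts basic-auth credentials that the container in front of the master can enforce. A sketch for the slave's solrconfig.xml (host name, poll interval, and credentials are placeholders):

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <!-- the master in the LAB; expose it to the slave only behind basic auth -->
    <str name="masterUrl">http://master.example.com:8983/solr/replication</str>
    <str name="pollInterval">00:05:00</str>
    <!-- sent as HTTP basic auth when fetching index files from the master -->
    <str name="httpBasicAuthUser">replication</str>
    <str name="httpBasicAuthPassword">secret</str>
  </lst>
</requestHandler>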
Re: Solr Trunk Heap Space Issues
Yonik Seeley wrote:
On Thu, Oct 1, 2009 at 4:35 PM, Yonik Seeley yo...@lucidimagination.com wrote: Since isTokenized() more reflects if something is tokenized at the Lucene level, perhaps we need something that specifies if there is more than one logical value per field value? I'm drawing a blank on a good name for such a method though...

    boolean singleValuedFieldCache()?

-Yonik
http://www.lucidimagination.com

Since everything seems to weigh towards calling out multi, why not multiValuedFieldCache? Either one sounds good to me though.
--
- Mark
http://www.lucidimagination.com
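To make the naming debate concrete, the shape being discussed might look like this - a hypothetical sketch, not a committed API; the default mirrors today's isTokenized() behavior, so only trie-style types would need to override it:

// Hypothetical sketch of the method under discussion -- not committed Solr API.
public abstract class FieldTypeSketch {
    // Existing method: trie fields return true because one value is
    // indexed as several precision-step tokens at the Lucene level.
    public abstract boolean isTokenized();

    // Proposed: does the FieldCache hold more than one logical value
    // per field value? The default preserves current behavior; a trie
    // type would override this to return false so that faceting could
    // use the FieldCache again.
    public boolean multiValuedFieldCache() {
        return isTokenized();
    }
}

The facet code's test would then become sf.multiValued() || ft.multiValuedFieldCache().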
Re: Solr Trunk Heap Space Issues
Ok, I was able to get a heap dump from the GC limit error.

1 instance of LRUCache is taking 170MB
1 instance of SchemaIndex is taking 56MB
4 instances of SynonymMap are taking 112MB

There is no searching going on during this index update process. Any ideas what on earth is going on? Like I said, my May version did this without any problems whatsoever.
--
Jeff Newburn
Software Engineer, Zappos.com
jnewb...@zappos.com - 702-943-7562

From: Mark Miller markrmil...@gmail.com
Reply-To: solr-user@lucene.apache.org
Date: Thu, 01 Oct 2009 17:57:28 -0400
To: solr-user@lucene.apache.org
Subject: Re: Solr Trunk Heap Space Issues

Yonik Seeley wrote:
...

Since everything seems to weigh towards calling out multi, why not multiValuedFieldCache? Either one sounds good to me though.
--
- Mark
http://www.lucidimagination.com
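For scale: those three alone account for 170 + 56 + 112 = 338MB, which on a 512MB heap (the size mentioned later in the thread) leaves well under 200MB for indexing buffers and everything else. For anyone wanting to capture the same kind of dump, starting the JVM with -XX:+HeapDumpOnOutOfMemoryError, or running jmap -dump:format=b,file=heap.bin <pid> against a live process, are the usual routes.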
Re: ExtractingRequestHandler unknown field 'stream_source_info'
Thanks Lance, I have Lucid's search as one of my open search tools in my browser. It's generally pretty useful (especially the ability to filter), but it's not of much help when the tool points out that the best info is on the wiki and the link to the wiki reveals that it can't be reached. This is the second time in a couple of weeks I've seen the wiki down. Is there an ongoing problem? I do appreciate the tip though. Tricia

Lance Norskog wrote:
For future reference, the Solr/Lucene wikis and mailing lists are indexed on http://www.lucidimagination.com/search/

On Thu, Oct 1, 2009 at 11:40 AM, Tricia Williams williams.tri...@gmail.com wrote:
If the wiki isn't working, https://www.packtpub.com/article/indexing-data-solr-1.4-enterprise-search-server-2 gave me more information. The LucidImagination article helps too. Now that the wiki is up again it is more obvious that I need to add:

    <str name="fmap.content">fulltext</str>
    <str name="defaultField">text</str>

to my solrconfig.xml. Tricia
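For context, those two lines belong in the extraction handler's defaults. A minimal sketch of the relevant solrconfig.xml section (the fulltext and text field names come from Tricia's schema, so treat them as placeholders):

<requestHandler name="/update/extract"
                class="org.apache.solr.handler.extraction.ExtractingRequestHandler">
  <lst name="defaults">
    <!-- map Tika's extracted body onto the schema's fulltext field -->
    <str name="fmap.content">fulltext</str>
    <!-- metadata fields with no schema match (like stream_source_info)
         fall through here instead of triggering an unknown-field error -->
    <str name="defaultField">text</str>
  </lst>
</requestHandler>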
JVM OOM when using field collapse component
I've gotten two different out of memory errors while using the field collapsing component, using the latest patch (2009-09-26) and the latest nightly. Has anyone else encountered similar problems? My collection is 5 million documents, but I've gotten the error collapsing as few as a few thousand results.

SEVERE: java.lang.OutOfMemoryError: Java heap space
at org.apache.lucene.util.ArrayUtil.grow(ArrayUtil.java:173)
at org.apache.lucene.util.OpenBitSet.ensureCapacityWords(OpenBitSet.java:749)
at org.apache.lucene.util.OpenBitSet.ensureCapacity(OpenBitSet.java:757)
at org.apache.lucene.util.OpenBitSet.expandingWordNum(OpenBitSet.java:292)
at org.apache.lucene.util.OpenBitSet.set(OpenBitSet.java:233)
at org.apache.solr.search.AbstractDocumentCollapser.addCollapsedDoc(AbstractDocumentCollapser.java:402)
at org.apache.solr.search.NonAdjacentDocumentCollapser.doCollapsing(NonAdjacentDocumentCollapser.java:115)
at org.apache.solr.search.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:208)
at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:98)
at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:66)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1148)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:387)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
at org.mortbay.jetty.Server.handle(Server.java:326)
at org.mortbay.jetty.HttpConnection.handleRequest(HttpConnection.java:534)
at org.mortbay.jetty.HttpConnection$RequestHandler.headerComplete(HttpConnection.java:864)
at org.mortbay.jetty.HttpParser.parseNext(HttpParser.java:539)
at org.mortbay.jetty.HttpParser.parseAvailable(HttpParser.java:212)
at org.mortbay.jetty.HttpConnection.handle(HttpConnection.java:404)
at org.mortbay.io.nio.SelectChannelEndPoint.run(SelectChannelEndPoint.java:409)
at org.mortbay.thread.QueuedThreadPool$PoolThread.run(QueuedThreadPool.java:520)

SEVERE: java.lang.OutOfMemoryError: Java heap space
at org.apache.solr.util.DocSetScoreCollector.<init>(DocSetScoreCollector.java:44)
at org.apache.solr.search.NonAdjacentDocumentCollapser.doQuery(NonAdjacentDocumentCollapser.java:68)
at org.apache.solr.search.AbstractDocumentCollapser.collapse(AbstractDocumentCollapser.java:205)
at org.apache.solr.handler.component.CollapseComponent.doProcess(CollapseComponent.java:98)
at org.apache.solr.handler.component.CollapseComponent.process(CollapseComponent.java:66)
at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
at org.mortbay.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1148)
at org.mortbay.jetty.servlet.ServletHandler.handle(ServletHandler.java:387)
at org.mortbay.jetty.security.SecurityHandler.handle(SecurityHandler.java:216)
at org.mortbay.jetty.servlet.SessionHandler.handle(SessionHandler.java:181)
at org.mortbay.jetty.handler.ContextHandler.handle(ContextHandler.java:765)
at org.mortbay.jetty.webapp.WebAppContext.handle(WebAppContext.java:417)
at org.mortbay.jetty.handler.ContextHandlerCollection.handle(ContextHandlerCollection.java:230)
at org.mortbay.jetty.handler.HandlerCollection.handle(HandlerCollection.java:114)
at org.mortbay.jetty.handler.HandlerWrapper.handle(HandlerWrapper.java:152)
Re: Solr Trunk Heap Space Issues
Jeff Newburn wrote:
Ok I was able to get a heap dump from the GC Limit error. 1 instance of LRUCache is taking 170mb 1 instance of SchemaIndex is taking 56Mb 4 instances of SynonymMap is taking 112mb There is no searching going on during this index update process. Any ideas what on earth is going on? Like I said my May version did this without any problems whatsoever.

Had any searching gone on though? Even if it's not occurring during the indexing, you will still have the data structures loaded if searches had occurred. What heap size do you have - that doesn't look like much data to me...
--
- Mark
http://www.lucidimagination.com
Re: Solr Trunk Heap Space Issues
I loaded the JVM and started indexing. It is a test server, so unless some errant query came in, there was no searching. Our instance has only 512MB, but my concern is the obvious leap in memory requirements, since this worked before. What other data would be helpful with this?

On Oct 1, 2009, at 5:14 PM, Mark Miller markrmil...@gmail.com wrote:
Jeff Newburn wrote: Ok I was able to get a heap dump from the GC Limit error. 1 instance of LRUCache is taking 170mb 1 instance of SchemaIndex is taking 56Mb 4 instances of SynonymMap is taking 112mb There is no searching going on during this index update process. Any ideas what on earth is going on? Like I said my May version did this without any problems whatsoever.

Had any searching gone on though? Even if it's not occurring during the indexing, you will still have the data structures loaded if searches had occurred. What heap size do you have - that doesn't look like much data to me...
--
- Mark
http://www.lucidimagination.com
Re: Ranking of search results
--- On Wed, 9/23/09, Amit Nithian anith...@gmail.com wrote:

Hi Amit, Thanks for your reply. How do I set a preference for which links should appear first and second in the search results? Which configuration file in Solr needs to be modified to achieve this? Regards, Bhaskar

From: Amit Nithian anith...@gmail.com
Subject: Re: Ranking of search results
To: solr-user@lucene.apache.org
Date: Wednesday, September 23, 2009, 11:33 AM

It depends on several things: 1) the query handler that you are using, 2) the fields that you are searching on and the default fields specified. For the default handler, it will issue a query for the default field and return results accordingly. To see what is going on, pass debugQuery=true at the end of the URL to see detailed output. If you are using the DisMax handler (Disjunction Max), then you will have qf, pf and bf (query fields, phrase fields, boosting function). I would start by looking at http://wiki.apache.org/solr/DisMaxRequestHandler - Amit

On Wed, Sep 23, 2009 at 10:25 AM, bhaskar chandrasekar bas_s...@yahoo.co.in wrote:
Hi, When I give an input string to search in Solr, it displays the corresponding results for that string. How are the results ranked and displayed? On what basis are the search results ordered? Is there an algorithm that decides which result comes first, and so on? Regards, Bhaskar
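As a concrete illustration against the stock example schema (the field names are placeholders for your own data), a dismax query that prefers title matches and nudges popular documents upward could look like:

    http://localhost:8983/solr/select?defType=dismax&q=ipod&qf=title^2.0+text&bf=popularity^0.5&debugQuery=true

Here qf scores a title match twice as high as a text match, bf adds a function boost on the popularity field, and debugQuery=true shows the resulting score breakdown. There is no config file that pins result #1 and #2 directly; you influence the order through boosts like these (or per-document index-time boosts).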
Re: trie fields and sortMissingLast
Not in time for 1.4, but yes, they will eventually get it. It has to do with the representation... currently we can't tell a 0 from a missing value.

Hmm. So does that mean that a query for latitudes, stored as trie floats, from -10 to +10 matches documents with no (i.e. null) latitude value?
Re: How to access the information from SolrJ
QueryResponse#getResults()#getNumFound()

On Thu, Oct 1, 2009 at 11:49 PM, Paul Tomblin ptomb...@xcski.com wrote:
When I do a query directly from the web, the XML of the response includes how many results would have been returned if it hadn't restricted itself to the first 10 rows. For instance, the query:

http://localhost:8080/solrChunk/nutch/select/?q=*:*&fq=category:mysites

returns:

<response>
  <lst name='responseHeader'>
    <int name='status'>0</int>
    <int name='QTime'>0</int>
    <lst name='params'>
      <str name='q'>*:*</str>
      <str name='fq'>category:mysites</str>
    </lst>
  </lst>
  <result name='response' numFound='1251' start='0'>
    <doc>
      <str name='category'>mysites</str>
      <long name='chunkNum'>0</long>
      <str name='chunkUrl'>http://localhost/Chunks/mysites/0-http___xcski.com_.xml</str>
      <str name='concept'>Anatomy</str>
      ...

The value I'm talking about is in the numFound attribute of the result tag. I don't see any way to retrieve it through SolrJ - it's not in QueryResponse.getHeader(), for instance. Can I retrieve it somewhere?
--
http://www.linkedin.com/in/paultomblin

--
- Noble Paul | Principal Engineer | AOL | http://aol.com
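Expanding Noble's one-liner into a runnable SolrJ sketch (the server URL and filter are taken from Paul's example; class names are from SolrJ 1.4):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class NumFoundDemo {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer server =
            new CommonsHttpSolrServer("http://localhost:8080/solrChunk/nutch");
        SolrQuery query = new SolrQuery("*:*");
        query.addFilterQuery("category:mysites");
        QueryResponse rsp = server.query(query);
        // numFound: the total hit count, independent of how many rows came back
        System.out.println(rsp.getResults().getNumFound()); // e.g. 1251
    }
}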
Re: trie fields and sortMissingLast
On Thu, Oct 1, 2009 at 11:09 PM, Steve Conover scono...@gmail.com wrote:
Not in time for 1.4, but yes they will eventually get it. It has to do with the representation... currently we can't tell between a 0 and missing.

Hmm. So does that mean that a query for latitudes, stored as trie floats, from -10 to +10 matches documents with no (i.e. null) latitude value?

No, because normal queries work off of the inverted index (term -> docids that match), and there won't be any values indexed for that document. Sorting and function queries work off of a non-inverted index (docid -> value), which, depending on the representation, can't tell non-matching from the default value.
-Yonik
http://www.lucidimagination.com
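Concretely, for a trie float field named latitude (a hypothetical name), the two code paths behave like this:

    q=latitude:[-10 TO 10]    -- inverted index: docs with no latitude value never match
    sort=latitude asc         -- FieldCache: a doc with no latitude currently sorts as if it were 0

which is also why sortMissingLast can't be honored for trie fields yet.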
Re: Solr Trunk Heap Space Issues
On Thu, Oct 1, 2009 at 8:45 PM, Jeffery Newburn jnewb...@zappos.com wrote:
I loaded the JVM and started indexing. It is a test server, so unless some errant query came in, there was no searching. Our instance has only 512MB, but my concern is the obvious leap in memory requirements, since this worked before. What other data would be helpful with this?

Interesting... not too much should have changed for memory requirements on the indexing side. TokenStreams are now reused (and hence cached) per thread, but that normally wouldn't amount to much. There was recently another bug where compound file format was being used regardless of the config settings, but I think that was fixed on the 29th. Maybe you were already close to the limit required? Also, your heap dump did show LRUCache taking up 170MB, and only searches populate that (perhaps you have warming searches configured on this server?).
-Yonik
http://www.lucidimagination.com

On Oct 1, 2009, at 5:14 PM, Mark Miller markrmil...@gmail.com wrote:
...
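For reference, "warming searches" are the solrconfig.xml listeners that fire queries against a newly opened searcher; the stock example config ships something similar to this sketch, and any such query would populate the LRUCache seen in the heap dump:

<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <lst>
      <str name="q">fast_warm</str>
      <str name="start">0</str>
      <str name="rows">10</str>
    </lst>
  </arr>
</listener>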