IndexSearcher and Caches
Hello all,

I have a few questions about the caches and the IndexSearcher in Solr. I am using Solr 1.3.

- The Solr wiki states that the caches are per IndexSearcher object, i.e. if I set my filterCache size to 1000, then 1000 entries can be held for every IndexSearcher object. Is this true for the queryResultCache, filterCache and documentCache? For the document cache, the wiki states that the size should be greater than (number of records) * (max number of queries). If the document cache is also sized per IndexSearcher object, why do we need the (max number of queries) factor in the formula?
- In a web application where multiple users may log into the system and query concurrently, should we assign a new IndexSearcher object to every user? I tried sharing the IndexSearcher object but noticed that the search criteria and filters of one user get carried over to another. Or is there some way around that?
- Combining the two points above: if the caches are per IndexSearcher object, and if we have to assign a new IndexSearcher to every new user (in a web application), will the total cache size not explode?

Apologies if these seem really basic. Thank you.

Regards
Rahul
Re: How real-time are Solr/Lucene queries?
You may wish to look at Lucandra: http://github.com/tjake/Lucandra

On 21 May 2010, at 06:12, Walter Underwood wrote:

Solr is a very good engine, but it is not real-time. You can turn off the caches and reduce the delays, but it is fundamentally not real-time. I work at MarkLogic, and we have a real-time transactional search engine (and repository). If you are curious, contact me directly. I do like Solr for lots of applications -- I chose it when I was at Netflix. wunder

On May 20, 2010, at 7:22 PM, Thomas J. Buhr wrote:

Hello Solr,

Solr looks like an excellent API, and it's nice to have a tutorial that makes it easy to discover the basics of what Solr does. I'm impressed. I can see plenty of potential uses of Solr/Lucene, and I'm now interested in just how real-time the queries made to an index can be.

For example, in my application I have time-ordered data being processed by a paint method in real time. Each piece of data is identified and its associated renderer is invoked. The Java2D renderer would then look up, from the layout and style indexes, any layout and style values it requires to render the data it has just received. What I'm wondering is whether this lookup, which would be a Lucene search, will be fast enough.

Would it be best to make Lucene queries for the relevant layout and style values required by the renderers ahead of rendering time, and have the query results placed into the most performant collection (map/array) so that renderer lookup is as fast as possible? Or can Lucene handle many individual lookup queries fast enough that rendering stays quick?

Best regards from Canada, Thom
Re: How real-time are Solr/Lucene queries?
Further to my earlier note re Lucandra: I note that Cassandra, which Lucandra backs onto, is 'eventually consistent', so given your real-time requirements you may want to review this first if Lucandra is of interest.
Re: How real-time are Solr/Lucene queries?
Thanks for the new information. It's really great to see so many options for Lucene.

In my scenario there are the following pieces:

1 - A local Java client with an embedded Solr instance and its own local index/s.
2 - A remote server running Solr with index/s that are more like a repository that local clients query for extra goodies.
3 - The client is also a JXTA node so it can share indexes or documents too.
4 - There is no browser involved whatsoever.

My music composing application is a local client that uses configurations which would become many different document types. A subset of these configurations will be bundled with the application, and many more would be made available via a server/s running Solr.

I would not expect the queries made from within the local client to the remote server to be returned in real time. I would only expect such queries to be answered in reasonable time. The client would have its local Lucene index system (embedded Solr using SolrJ), which would be updated with the results of the query made to the Solr instance running on the remote server.

Then the user on the client would issue queries to the local Lucene index/s to obtain results which are used to set up contexts for different aspects of the client. For example: an activated context for musical scales and rhythms used for creating musical notes, or an activated context for rendering, with layout and style information for different music symbol renderer types.

I'm not yet sure, but it may be best to make queries against the local Lucene index/s and then convert the results into some context objects, maybe an array or map (I'd like to learn more about how query results can be returned as arrays or maps as well). Then the tools and renderers which require the information in the contexts would do any real-time lookup directly from the context objects, not from the local or remote Lucene or Solr index/s.

The local client is also a JXTA node, so it can share its own index/s with fellow peers.

This is how I envision this happening with my limited knowledge of Lucene/Solr at this time. What are your thoughts on the feasibility of such a scenario?

I'm just reading through the Solr reference PDF now and looking over the Solr admin application. Looking at schema.xml, it seems to be field-oriented, not document-oriented. From my point of view, I think in terms of configuration types, which would be documents. In the schema it seems that only fields are defined and it does not matter which configuration/document they belong to? I guess this is fine as long as the indexing takes my unique document types into account and I can search for them as a whole as well, not only for specific values across a set of indexed documents.

Also, does the schema allow me to index certain documents into specific indexes, or are they all just bunched together? I'd rather have unique indexes for specific document types. I've just read about multiple cores running under one Solr instance; is this the only way to support multiple indexes?

I'm thinking of ordering the Lucene in Action v2 book, which is due this month, and also the Solr 1.4 book. Before I do, I just need to understand a few things, which is why I'm writing such a long message :-)

Thom
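The idea raised in this message, running the queries ahead of rendering time and letting the renderers read from a plain in-memory map, might look roughly like the sketch below. It is only an illustration in Python: the documents are hard-coded stand-ins for whatever a Lucene/Solr query would return, and the field names (`symbol`, `font_size`, `color`) and the functions `build_context` and `render` are invented for the example.

```python
from typing import Dict, Iterable, Mapping


def build_context(docs: Iterable[Mapping[str, object]],
                  key_field: str) -> Dict[str, Mapping[str, object]]:
    """Index query results by one field so later lookups are plain dict reads."""
    context: Dict[str, Mapping[str, object]] = {}
    for doc in docs:
        context[str(doc[key_field])] = doc
    return context


# Hypothetical stand-in for the documents returned by a "style" query.
style_docs = [
    {"symbol": "quarter_note", "font_size": 12, "color": "black"},
    {"symbol": "treble_clef", "font_size": 18, "color": "black"},
]

# Built once, before rendering starts.
style_context = build_context(style_docs, "symbol")


def render(symbol: str) -> str:
    """Renderer-side lookup: no index access, just a dictionary read."""
    style = style_context[symbol]
    return f"{symbol}: size={style['font_size']} color={style['color']}"


print(render("treble_clef"))
```

Whether this is actually needed depends on how often the paint method runs; the sketch only shows that the render path can be kept free of index access once the context has been built.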
Re: IndexSearcher and Caches
Rahul,

Solr's IndexSearcher is shared by every request that arrives between two commits. That means one IndexSearcher and its caches have a lifetime of one commit; after every commit, a new one is created.

The caches are not something that gets applied to a query automatically. Rather, a filter from a query is cached, and whenever another user query requires the same filter criterion, Solr uses the cached filter instead of creating a new one on the fly.

For example: fq=inStock:true

The result of this filter criterion is cached once. If another user sends a query with fq=inStock:true, Solr reuses the already existing filter. Since such filters are cached as bit vectors, they are not large. The cached filter does not depend on what the user is searching for in the q parameter.

BTW: the IndexSearcher is thread-safe, so there is no problem with concurrent usage.

Hope this helps.

Kind regards
- Mitch
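To make the sharing behaviour concrete, here is a toy model in Python of one searcher with one filter cache that is used by every request. It is a sketch of the idea only, not Solr's implementation: the classes `FilterCache` and `Searcher`, their methods, and the sample documents are all invented for the illustration, and the cached value is a plain frozenset of document numbers rather than Solr's internal structure.

```python
from collections import OrderedDict
from typing import Callable, Dict, FrozenSet, List


class FilterCache:
    """Toy LRU cache: filter-query string -> set of matching document numbers."""

    def __init__(self, max_size: int) -> None:
        self.max_size = max_size
        self._entries: "OrderedDict[str, FrozenSet[int]]" = OrderedDict()
        self.hits = 0
        self.misses = 0

    def get(self, fq: str, compute: Callable[[], FrozenSet[int]]) -> FrozenSet[int]:
        if fq in self._entries:
            self.hits += 1
            self._entries.move_to_end(fq)      # mark as most recently used
            return self._entries[fq]
        self.misses += 1
        result = compute()
        self._entries[fq] = result
        if len(self._entries) > self.max_size:
            self._entries.popitem(last=False)  # evict the least recently used entry
        return result


class Searcher:
    """Stand-in for one searcher: shared by all requests until the next commit."""

    def __init__(self, docs: List[Dict[str, object]], cache_size: int) -> None:
        self.docs = docs
        self.filter_cache = FilterCache(cache_size)

    def search(self, q: str, fq_field: str, fq_value: object) -> List[int]:
        fq = f"{fq_field}:{fq_value}"
        allowed = self.filter_cache.get(
            fq,
            lambda: frozenset(
                i for i, d in enumerate(self.docs) if d.get(fq_field) == fq_value
            ),
        )
        # The q part is evaluated per request; only the filter result is reused.
        return [i for i in sorted(allowed) if q in str(self.docs[i]["name"])]


docs = [
    {"name": "red shirt", "inStock": True},
    {"name": "blue shirt", "inStock": False},
    {"name": "red hat", "inStock": True},
]
searcher = Searcher(docs, cache_size=1000)

first = searcher.search("shirt", "inStock", True)   # user A: filter is computed
second = searcher.search("red", "inStock", True)    # user B: filter is reused
print(first, second, searcher.filter_cache.hits, searcher.filter_cache.misses)
```

In this model nothing from user A's q leaks into user B's results; the only thing they share is the cached answer to "which documents have inStock:true", which is the same for both.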
Re: Solr 1.4 Enterprise Search Server book examples
Hi,

everybody who owns the book can now download the source code examples again. The zip file is fixed now; I just got a message from Packt! :)

https://www.packtpub.com/support?nid=4191

Have fun :)

Cheers, Stefan

On 06.05.2010 16:15, Antonello Mangone wrote:

I had the same problem and I sent a message ... I'm waiting for an answer :D

2010/5/6 Stefan Moises moi...@shoptimax.de

Hi fellow Solr users,

I have contacted Packt regarding this issue and they are already working on fixing it. Here is the friendly response I've received:

Dear Stefan, Thank you for writing to PacktPub.com. I'm sorry to know that you were not able to access the code files for our title. Our IT team is now investigating this issue, which they feel is due to the heavy file size. They have also taken the author's help for this. The author has split the example code file into parts due to its massive size. The author is in the process of uploading a few record files to a different site (Musicbrainz), so they have to spend some time on getting permissions from them too. We are currently just waiting for him to send the remaining part of the code to us, which will be made available on our website shortly. Please accept our apologies for the trouble. With warm regards For Packt Publishing Verus Pereira Sales Executive

I'll let you all know once they get back to me that the files are updated.

Cheers, Stefan

On 27.04.2010 12:00, findbestopensource wrote:

I downloaded the 5883_Code.zip file but am not able to extract the complete contents. Regards Aditya www.findbestopensource.com

On Tue, Apr 27, 2010 at 12:45 AM, Johan Cwiklinski johan.cwiklin...@ajlsm.com wrote:

Hello,

On 26/04/2010 20:53, findbestopensource wrote:

I am able to successfully download the code. It is 360 MB and took a lot of time to download.

I'm also able to download the file, but not to extract many of the files it contains after download (I can list them but not extract them; an error occurs). Are you able to extract the ZIP archive you've downloaded?

https://www.packtpub.com/solr-1-4-enterprise-search-server/book Select the download the code link and provide your email id. The download link will be sent via email. Regards Aditya www.findbestopensource.com

On Mon, Apr 26, 2010 at 8:34 PM, Abdelhamid ABID aeh.a...@gmail.com wrote:

Hi, I'm also interested in getting those examples. Would someone share them?

On 4/26/10, markus.rietz...@rzf.fin-nrw.de markus.rietz...@rzf.fin-nrw.de wrote:

I have sent you a private mail. markus

-Original Message- From: Johan Cwiklinski [mailto:johan.cwiklin...@ajlsm.com] Sent: Monday, 26 April 2010 10:58 To: solr-user@lucene.apache.org Subject: Solr 1.4 Enterprise Search Server book examples

Hello,

We've recently acquired the Solr 1.4 Enterprise Search Server book. I've tried to download the example ZIP file from the publisher's website, but the file is actually corrupted, and I cannot unzip it :( Could someone tell me if I can get these examples from another location? I sent a message to the publisher last week reporting the issue, but it is not yet fixed, and I'd really like to take a look at the example code and run some tests.

Regards,
-- Johan Cwiklinski

-- Abdelhamid ABID Software Engineer - J2EE / WEB

-- *** Stefan Moises Senior Software Developer shoptimax GmbH Guntherstraße 45 a 90461 Nürnberg Amtsgericht Nürnberg HRB 21703 Managing Director: Friedrich Schreieck Tel.: 0911/25566-25 Fax: 0911/25566-29 moi...@shoptimax.de http://www.shoptimax.de ***
Re: Solr 1.4 Enterprise Search Server book examples
Hello,

On 21/05/2010 13:29, Stefan Moises wrote:

Hi, everybody who owns the book can now download the source code examples again, the zip file is fixed now - just got a message from Packt! :) https://www.packtpub.com/support?nid=4191 Have fun :) Cheers, Stefan

I've received the same message today; finally, I'll be able to take a look at those examples :)

Regards,
-- Johan Cwiklinski AJLSM
Wildcard queries
Hi folks,

what's the idea behind the fact that no text analysis (e.g. lowercasing) is performed on wildcarded search terms? In my context this behaviour seems counter-intuitive (I guess that's the case in the majority of applications), and my application needs to lowercase any input term before sending the HTTP request to my Solr server.

Would it be easy to disable this behaviour in Solr (1.5)? I would like to see a config parameter (per field type) that allows disabling this odd behaviour if needed. To ensure backward compatibility, the odd behaviour should remain the default.

Am I missing any drawbacks?

Best, Sascha
Re: Wildcard queries
We can use stemming as an example. Let's say your query is c?ns?st?nt?y. How will this match 'consistently', which the Porter stemmer reduces to the stem 'consist'? Furthermore, note that I replaced the vowels with ?'s here. The Porter stemmer doesn't just rip stuff off the end; it attempts to guess syllables as part of the process, so it cannot possibly work on a pattern like this.

The only way it would work in this situation would be to form all the possible words this wildcard could match, run analysis on each form, and search on all the resulting stems. But this is impossible, since the * operator allows an infinite language.

-- Robert Muir rcm...@gmail.com
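The mismatch is easy to reproduce outside Solr. The following Python sketch uses a deliberately crude suffix stripper, `toy_stem`, which is an invented stand-in and NOT the Porter algorithm, together with the standard library's `fnmatchcase` as a stand-in for wildcard matching against index terms.

```python
from fnmatch import fnmatchcase


def toy_stem(word: str) -> str:
    """Crude suffix stripper used only for illustration (not the Porter stemmer)."""
    for suffix in ("ently", "ly", "ing"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


raw_terms = ["consistently", "constantly", "running"]
indexed_terms = [toy_stem(t) for t in raw_terms]  # what a stemmed field would hold

pattern = "c?ns?st?nt?y"

# Against the unanalysed words the wildcard behaves as the user expects ...
matches_raw = [t for t in raw_terms if fnmatchcase(t, pattern)]

# ... but the index only contains stems, so nothing matches.
matches_indexed = [t for t in indexed_terms if fnmatchcase(t, pattern)]

# Running the "analyzer" over the pattern itself does not help: the stemmer
# sees '?' characters instead of letters and finds no suffix to strip.
stemmed_pattern = toy_stem(pattern)

print(indexed_terms, matches_raw, matches_indexed, stemmed_pattern)
```

With a real stemmer the details differ, but the shape of the problem is the same: the pattern describes surface forms, while the index holds the output of an analysis step that cannot be applied to a pattern.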
Re: Wildcard queries
I absolutely consider this a bug too. Cast your vote: https://issues.apache.org/jira/browse/SOLR-219

~ David
Re: Wildcard queries
Hi Robert,

thanks, you're absolutely right. I should refine my initial question: what's the idea behind the fact that no *lowercasing* is performed on wildcarded search terms if the field in question contains a LowerCaseFilter in its associated field type definition?

-Sascha
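The client-side workaround mentioned in this thread (lowercasing the input term before sending the HTTP request) could be sketched as follows in Python. The base URL, the field name `title` and the function name `build_select_url` are assumptions made for the example; only the standard `/select` path and the `q` parameter are taken as given. As the replies in this thread point out, a plain lowercase call is only equivalent to the field's analysis in simple cases.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_select_url(base_url: str, field: str, user_term: str) -> str:
    """Lowercase a wildcard term on the client, since it will not be analysed."""
    term = user_term.lower()
    params = {"q": f"{field}:{term}"}
    return f"{base_url}/select?{urlencode(params)}"


# Hypothetical server location and field name.
url = build_select_url("http://localhost:8983/solr", "title", "Sol*")

# Decode the query string again to see what the server would receive as q.
sent_q = parse_qs(urlsplit(url).query)["q"][0]
print(url)
print(sent_q)
```

This keeps the workaround in one place in the client, which makes it easy to remove if the server-side behaviour ever becomes configurable.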
Re: Wildcard queries
This lowercasing can 'sort of work' (depending on your analysis and even your language; not all case folding is as simple as in English). But the more general problem cannot be a bug, as it is mathematically impossible to solve for queries such as wildcards, which allow an infinite language, when the analysis is non-reversible.

-- Robert Muir rcm...@gmail.com
Re: Wildcard queries
I honestly do not know the rationale behind this in Solr, except to say that similar problems exist even if you reduce the scope to just casing. For example, if you are using a German stemmer, it will case-fold ß to 'ss' (such that it will match SS). So doing some lowercasing at query time will not correct the situation for that character, and furthermore it will be inconsistent with the '?' operator, which only matches one character.

-- Robert Muir rcm...@gmail.com
Re: Wildcard queries
On May 21, 2010, at 10:35 AM, Robert Muir wrote:

I honestly do not know the rationale behind this in Solr, except to say similar problems exist even if you reduce the scope to just casing:

Then why are you talking about stemming in the following example? We know stemming is problematic with wildcard searching. But casing... I argue not.

For example, if you are using a German stemmer, it will case-fold ß to 'ss' (such that it will match SS). So doing some lowercasing at query time will not correct the situation for that character, and furthermore it will be inconsistent with the '?' operator... (which only matches one character)
Re: Wildcard queries
On Fri, May 21, 2010 at 10:40 AM, Smiley, David W. dsmi...@mitre.org wrote:

Then why are you talking about stemming in the following example? We know stemming is problematic with wildcard searching. But casing... I argue not.

I just mentioned an example stemmer that properly case-folds this German character. Another token stream that does so is the Unicode case-folding algorithm (which requires code dependent on ICU at the moment). LowerCaseFilter is *not* Unicode-compliant as far as casing goes: toLowerCase is intended for display, not for case-insensitive matching.

-- Robert Muir rcm...@gmail.com
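The difference between lowercasing and case folding, and the resulting trouble with the '?' operator, can be seen with a few lines of Python. This uses Python's own `str.lower` and `str.casefold` as an analogy; it is not Java's toLowerCase or Lucene's filters, and `fnmatchcase` merely stands in for a single-character wildcard match.

```python
from fnmatch import fnmatchcase

words = ["Straße", "STRASSE"]

# Lowercasing keeps 'ß' as it is, so the two spellings stay different.
lowered = [w.lower() for w in words]

# Case folding maps 'ß' to 'ss', so both spellings become the same term.
folded = [w.casefold() for w in words]

# A user who types one '?' for the 'ß' matches the lowercased term,
# but not the folded one, because the folded term is one character longer.
pattern = "stra?e"
matches_lowered = fnmatchcase(lowered[0], pattern)
matches_folded = fnmatchcase(folded[0], pattern)

print(lowered, folded, matches_lowered, matches_folded)
```

So even when the discussion is restricted to casing, the query-side treatment of a wildcard term has to agree with whatever the index-side analysis did to that character, and a single '?' cannot stand in for a character that was expanded into two.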
Re: How real-time are Solr/Lucene queries?
Did your successor choose Solr? I seem to have read an article or seen a 'mobcast' where the Search Engine Guy (SEG) @ Netflix used Solr. (Or maybe it was another video chain.)

Dennis Gearon

Signature Warning EARTH has a Right To Life, otherwise we all die. Read 'Hot, Flat, and Crowded' Laugh at http://www.yert.com/film.php
Re: How real-time are Solr/Lucene queries?
I chose it, and it doesn't look like they've replaced it in the eight months since I left. At the time, I was the entire search engineering department, so it was me.

wunder

-- Walter Underwood Venture ASM, Troop 14, Palo Alto
Re: Personalized Search
It will likely be what you suggested, one or two multi-value fields. But with 10,000+ members, does Solr scale with this schema?

On Thu, May 20, 2010 at 6:27 PM, findbestopensource findbestopensou...@gmail.com wrote:

Hi Rih,

Are you going to include either of the two fields, bought or like, per member/visitor, OR a unique field per member/visitor? If only one or two common fields are included, there will not be any impact on performance. If you want to include a unique field per member, then you need to consider a multi-value field; otherwise you will certainly hit the wall.

Regards Aditya www.findbestopensource.com

On Thu, May 20, 2010 at 12:13 PM, Rih tanrihae...@gmail.com wrote:

Has anybody done personalized search with Solr? I'm thinking of including fields such as bought or like per member/visitor via dynamic fields in a product search schema. Another option is to have a multi-value field that can contain user IDs. What are the possible performance issues with this setup? Looking forward to your ideas.

Rih
Re: Personalized Search
Well, it's not really a recommendation engine per se but more of a filter for the user. Say I already own some stuff from the result set; I just want to exclude it from the results. What I'm concerned about is reindexing the document every time someone marks, votes on, likes or buys an item.

On Thu, May 20, 2010 at 11:04 PM, Ken Krugler kkrugler_li...@transpac.com wrote:

On May 19, 2010, at 11:43pm, Rih wrote:

Has anybody done personalized search with Solr? I'm thinking of including fields such as bought or like per member/visitor via dynamic fields in a product search schema. Another option is to have a multi-value field that can contain user IDs. What are the possible performance issues with this setup?

Mitch is right, what you're looking for here is a recommendation engine, if I understand your question properly. And yes, Mahout should work, though the Taste recommendation engine it supports is pretty new. But Sean Owen and Robin Anil have a Mahout in Action book that's in early release via Manning, and it has lots of good information about Mahout recommender systems.

Assuming you have a list of recommendations for a given user, based on their past behavior and the recommendation engine, you could use this to adjust search results. I'm waiting for Hoss to jump in here on how best to handle that :)

-- Ken

Ken Krugler +1 530-210-6378 http://bixolabs.com e l a s t i c w e b m i n i n g
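One way to avoid reindexing for the "exclude what I already own" case is to keep the per-user list outside the index and send it as a negative filter at query time. A minimal sketch in Python, assuming the documents have a unique key field called `id` with simple numeric values and that the owned IDs come from some external store; `exclusion_filter` and `build_query_string` are names invented for the example.

```python
from urllib.parse import parse_qs, urlencode
from typing import Iterable


def exclusion_filter(field: str, owned_ids: Iterable[int]) -> str:
    """Build a negative filter such as -id:(1 OR 2 OR 3)."""
    ids = " OR ".join(str(i) for i in sorted(owned_ids))
    return f"-{field}:({ids})"


def build_query_string(q: str, owned_ids: Iterable[int]) -> str:
    """Combine the user's query with an optional exclusion filter."""
    owned = list(owned_ids)
    params = [("q", q)]
    if owned:  # no filter at all when the user owns nothing
        params.append(("fq", exclusion_filter("id", owned)))
    return urlencode(params)


# Hypothetical: IDs loaded from a database table of purchases for this user.
query_string = build_query_string("ipod", {3, 1, 2})
print(parse_qs(query_string))
```

The obvious limit is the length of the list: with hundreds of owned items the filter becomes long and may run into the server's limit on boolean clauses, so this is probably only reasonable while the per-user list stays small.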
Re: Personalized Search
Just want to throw this in: If you're worried about scaling, etc. you could take a look at item-based collaborative filtering instead of user-based, i.e.: DO NIGHTLY/BATCH: - calculate the similarity between items based on their properties DO ON EACH REQUEST - have a user store/update its interest as a vector of item-properties. How to update this based on click / browse behavior is the interesting thing and depends a lot on your environment. - Next is to recommend 'neighboring' items that are close to the defined 'interest-vector'. The code is similar to user-based colab. filtering, but scaling is invariant to the number of users. other advantages: - new items/ products can be recommended as soon as they are added to the catalog (no need for users to express interest in them before the item can be suggested) disadvantage: - top-N results tend to be less dynamic than when using user-based colab. filtering. Of course, this doesn't touch on how to integrate this with Solr. Perhaps some combination with Mahout is indeed the best solution. I haven't given this much thought yet I must say. For info on Mahout Taste (+ an explanation on item-based filtering vs. user-based filtering) see: http://lucene.apache.org/mahout/taste.html Cheers, Geert-Jan 2010/5/21 Rih tanrihae...@gmail.com - keep the SOLR index independent of bought/like - have a db table with user prefs on a per item basis I have the same idea this far. at query time, specify boosts for 'my items' items I believe this works if you want to sort results by faved/not faved. But how does it scale if users already favorited/liked hundreds of item? The query can be quite long. Looking forward to your idea. 
On Thu, May 20, 2010 at 6:37 PM, dc tech dctech1...@gmail.com wrote: Another approach would be to do query time boosts of 'my' items under the assumption that count is limited: - keep the SOLR index independent of bought/like - have a db table with user prefs on a per item basis - at query time, specify boosts for 'my items' items We are planning to do this in the context of document management where documents in 'my (used/favorited ) folders' provide a boost factor to the results. On 5/20/10, findbestopensource findbestopensou...@gmail.com wrote: Hi Rih, You going to include either of the two field bought or like to per member/visitor OR a unique field per member / visitor? If it's one or two common fields are included then there will not be any impact in performance. If you want to include unique field then you need to consider multi value field otherwise you certainly hit the wall. Regards Aditya www.findbestopensource.com On Thu, May 20, 2010 at 12:13 PM, Rih tanrihae...@gmail.com wrote: Has anybody done personalized search with Solr? I'm thinking of including fields such as bought or like per member/visitor via dynamic fields to a product search schema. Another option is to have a multi-value field that can contain user IDs. What are the possible performance issues with this setup? Looking forward to your ideas. Rih -- Sent from my mobile device
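To make Geert-Jan's item-based outline a bit more concrete, here is a minimal sketch in plain Python. It is not Mahout Taste code: the catalog, the property names, and the helpers `cosine`, `update_interest` and `recommend` are all made up for illustration, and for brevity it scores items against the user's interest vector at request time rather than precomputing item-to-item similarities in a nightly batch.

```python
import math


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(weight * b.get(key, 0.0) for key, weight in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def update_interest(interest, item_vector, weight=1.0):
    """Fold the properties of a clicked or bought item into the interest vector."""
    updated = dict(interest)
    for prop, value in item_vector.items():
        updated[prop] = updated.get(prop, 0.0) + weight * value
    return updated


def recommend(interest, catalog, top_n=3, exclude=()):
    """Return up to top_n item ids whose properties are closest to the interest vector."""
    scored = [
        (cosine(interest, vector), item_id)
        for item_id, vector in catalog.items()
        if item_id not in exclude
    ]
    # Highest similarity first; ties broken by item id for a stable order.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [item_id for score, item_id in scored[:top_n] if score > 0.0]


# Hypothetical catalog: item id -> property vector.
catalog = {
    "tent": {"outdoor": 1, "camping": 1},
    "stove": {"camping": 1, "cooking": 1},
    "novel": {"books": 1},
}

# The user bought the tent, so its properties become the interest vector.
interest = update_interest({}, catalog["tent"])
suggestions = recommend(interest, catalog, exclude={"tent"})
print(suggestions)
```

Because the work per request depends only on the size of the catalog and of the interest vector, adding more users does not change the cost of a single recommendation, which is the scaling property described above.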
Re: Which Solr to use?
On Tue, May 18, 2010 at 12:31 PM, Sixten Otto six...@sfko.com wrote: So features are being actively added to / code rearranged in trunk/4.0, with some of the work being back-ported to this branch to form a stable 3.1 release? Is that accurate? Is there any thinking about when that might drop (beyond the quite understandable when it's done)? Or, perhaps more reasonably, when it might freeze? I'm also interested in the recommend testing branch (to borrow a Debian term) to use. I'm planning a deployment in 2 months or so and have been experiencing too many problems with the older version of Tika to use the 1.4 version. Jim
Re: Which Solr to use?
: Is there any thinking about when that might drop (beyond the quite : understandable when it's done)? Or, perhaps more reasonably, when it : might freeze? FWIW: I have no idea ... it's all a question of when someone takes charge on the release process -- quite frankly, so much is in flux right now (because of the java+solr code tree merges, *and* the decision to create parallel dev branches so the trunk could be more aggressive about API changes, *and* the decision to refactor modules) that i suspect a lot of things kind of need to shake out before anyone is going to feel comfortable doing a new release. : I'm also interested in the recommend testing branch (to borrow a : Debian term) to use. I'm planning a deployment in 2 months or so and : have been experiencing too many problems with the older version of : Tika to use the 1.4 version. FWIW: If the only problem you are having with 1.4 is that you want to upgrade Tika, patching 1.4 to make the necessary changes to use a new version of Tika is probably going to be far less invasive/risky than using the 3x branch (but that is only my opinion, and i'm not even that well informed about what it takes to upgrade the Tika dependency) -Hoss
Re: Special Circumstances for embedded Solr
Any other commonly compelling reasons to use SolrJ? The most compelling reason (I think) is that if you program against the Solrj API, you can switch between embedded/http/streaming implementations without changing anything. This is great for our app, which is run either as a small local instance or in a big enterprise setting. ryan
Re: Moving from Lucene to Solr?
On Wed, May 19, 2010 at 6:38 AM, Peter Karich peat...@yahoo.de wrote: Hi all, while asking a question on stackoverflow [1] some other questions appear: Is SolrJ a recommended way to access Solr or should I prefer the HTTP interface? solrj vs HTTP interface? That will just be a matter of taste. If you are working in java, then solrj is likely a good option. How can I (j)unit-test Solr? (e.g. create+delete index via Java call) If you want to mess with creating/removing indexes at runtime, see: http://wiki.apache.org/solr/CoreAdmin Is Lucene faster than Solr? ... do you have experiences, preferable with the same index? solr is built on top of lucene, so in that regard it is the same speed. Depending on your app, the abstractions that solr makes may make it less efficient than working directly in lucene. Unless you have very specialized needs, I doubt this will make a big difference.
DataImportHandler and running out of disk space
I'm noticing some data differences between my database and Solr. About a week ago my Solr server ran out of disk space, so now I'm observing how the DataImportHandler behaves when Solr runs out of disk space. In a word, I'd say it behaves badly! It looks like out-of-disk-space exceptions are treated like any other document-level exception (so my updates report successful completion). After running out of disk space I see the index shrink, then updates continue until I run out of disk space again, then the index shrinks, etc. I'm running a nightly build from December, 2009. Has this behaviour changed since then? Is there a DIH configuration option to fail on certain exceptions? Thanks, Wojtek
Re: DIH post import event listener for errors
I have a similar need so I've opened up a ticket: http://issues.apache.org/jira/browse/SOLR-1922 Should be pretty trivial to add.
Re: Personalized Search
In our specific case, we would get the user's folders and then do a function query that provides a boost if the document.folder is in {my folder list}. Another approach that will work for our intranet use is to add the userids in a multi-valued field as others have suggested. On 5/20/10, MitchK mitc...@web.de wrote: Hi dc, - at query time, specify boosts for 'my items' items Do you mean something like document-boost or do you want to include something like OR myItemId:100^100 ? Can you tell us how you would specify document-boostings at query-time? Or are you querying something like a boolean field (i.e. isFavorite:true^10) or a numeric field? Kind regards - Mitch
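One way to picture a query-time boost for "my folders" is sketched below. Note the swap: it uses a dismax boost query (`bq`) rather than the function query dc tech mentions, because that is simpler to show. The field name `folder`, the boost of 10, and the helper names are assumptions for illustration, not anything taken from dc tech's setup.

```python
def escape_term(value):
    """Quote a value as a phrase, escaping backslashes and double quotes."""
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def folder_boost_query(folders, field="folder", boost=10):
    """Build a boost clause such as folder:("a" OR "b")^10, or None if there is nothing to boost."""
    if not folders:
        return None
    terms = " OR ".join(escape_term(name) for name in folders)
    return f"{field}:({terms})^{boost}"


# Hypothetical folder list, e.g. loaded from the user-preferences table.
my_folders = ["inbox", "my docs"]

params = {"q": "budget report", "defType": "dismax"}
boost_clause = folder_boost_query(my_folders)
if boost_clause is not None:
    params["bq"] = boost_clause  # documents in these folders score higher

print(params)
```

The clause grows linearly with the number of folders, so Rih's concern about very long queries for users with hundreds of favorites applies here as well.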
Re: Tipps for develop a own RequestHandler ?!
: I would write an own RH for my system. is an howto in the www ? i didnt : found anythin about it. I would start by looking at how existing RequestHandlers are implemented -- the ones that ship with Solr are heavily refactored to reuse a lot of functionality, which can sometimes make it hard to follow what's going on, but at the same time they help make it clear where there is functionality you can reuse. My other big tip would be: rethink whether you really need to write a RequestHandler. once upon a time this was the main type of plugin for doing things at search time, but with the introduction of QParsers and SearchComponents there are now usually much easier ways to do things -- if you tell us what type of custom logic you want to write, folks might be able to point out a simpler way to implement it. : can i develop in the svn-checkout and test in without building an new : solr.war ? debug ? if you take a look at the JARs that are included in the Solr release, you can compile against them -- you also don't have to rebuild the WAR with your custom classes, you can put them in a Jar that is loaded at runtime... http://wiki.apache.org/solr/SolrPlugins As for debugging Plugins: i tend to do it all via UnitTests and stack traces. there are some base classes in Solr that make this fairly easy to setup (TestHarness, AbstractSolrTestCase, and the new JUnit4 test base class whose name escapes me at the moment) : i setting up solr in eclipse like this: : http://www.lucidimagination.com/developers/articles/setting-up-apache-solr-in-eclipse : but so its not possible to develop, right ? can you clarify your question? ... if you followed those steps i'm not sure why you wouldn't be able to develop your plugin and test it in eclipse. -Hoss
Re: Statistics exposed as JSON
: Are the Solr 1.4 statistics like #docs, #docsPending etc. exposed in : JSON format? if you are referring to the output from stats.jsp, then no -- that is not available in JSON format in Solr 1.4. In future versions of Solr a new RequestHandler will replace stats.jsp (and registry.jsp) making this info available in all the formats supported by the ResponseWriters. -Hoss
No hits returned from shard search on multi-core setup
I cannot get hits back and do not get a correct total number of records when using shard searching. I have 2 cores, core0 and core1. Both have the same schema.xml and solrconfig.xml (different datadirs in solrconfig.xml). Our id field contains globally unique id's across both cores, but they use the same id field (same schema.xml). Issue exists when testing with Jetty and Tomcat. Using Solr 1.4.1. I found two other instances of this exact error on Google and neither have a solution, just a description like mine with lots of responses. Multi-core searching is something we need due to data layout including multiple languages. Details: Folder layout: C:\apache-solr-1.4.0\example\solr_multicore\solr\cores core0\data core0\conf core1\data core1\conf solr.xml My solr.xml: ?xml version=1.0 encoding=UTF-8 ? !-- snip the comments from Apache wiki -- solr persistent=true sharedLib=lib cores adminPath=/admin/cores core name=core0 instanceDir=core0 / core name=core1 instanceDir=core1 / /cores /solr Core0 search: http://localhost:8080/solr/core0/select/?q=*:*version=2.2start=0rows=10indent=on results: ?xml version=1.0 encoding=UTF-8 ? - response - lst name=responseHeader int name=status0/int int name=QTime0/int - lst name=params str name=rows10/str str name=start0/str str name=indenton/str str name=q*:*/str str name=version2.2/str /lst /lst - result name=response numFound=131 start=0 - doc ... Core1 search: http://localhost:8080/solr/core1/select/?q=*:*version=2.2start=0rows=10indent=on results: ?xml version=1.0 encoding=UTF-8 ? - response - lst name=responseHeader int name=status0/int int name=QTime16/int - lst name=params str name=rows10/str str name=start0/str str name=indenton/str str name=q*:*/str str name=version2.2/str /lst /lst - result name=response numFound=302 start=0 - doc ... 
Shard'd search: http://localhost:8080/solr/core0/select?q=*%:*version=2.2start=0rows=10indent=onshards=localhost:8080/solr/core0,localhost:8080/solr/core1 results: ?xml version=1.0 encoding=UTF-8 ? - response - lst name=responseHeader int name=status0/int int name=QTime31/int - lst name=params str name=rows10/str str name=start0/str str name=indenton/str str name=q*:*/str str name=shardslocalhost:8080/solr/core0,localhost:8983/solr/core1/str str name=version2.2/str /lst /lst result name=response numFound=423 start=0 / /response Notice no doc's. numFound does not equal total for both cores (131+302=433). Query info from Catalina.log: May 21, 2010 12:27:32 PM org.apache.solr.core.SolrCore execute INFO: [core0] webapp=/solr path=/select params={wt=javabinisShard=truerows=10start=0fsv=trueq=*:*fl=sedocid,scoreversion=1} hits=131 status=0 QTime=0 May 21, 2010 12:27:32 PM org.apache.solr.core.SolrCore execute INFO: [core1] webapp=/solr path=/select params={wt=javabinisShard=truerows=10start=0fsv=trueq=*:*fl=sedocid,scoreversion=1} hits=302 status=0 QTime=0 May 21, 2010 12:27:32 PM org.apache.solr.core.SolrCore execute INFO: [core0] webapp=/solr path=/select params={wt=javabinisShard=truerows=10start=0ids=core0_IM10009_0,core0_IM10006_0,core0_IM10002_0,core0_IM10010_0,core0_IM10007_0,core0_IM10004_0,core0_IM10001_0,core0_IM10003_0,core0_IM10008_0,core0_IM10005_0q=*:*version=1} status=0 QTime=0 May 21, 2010 12:27:32 PM org.apache.solr.core.SolrCore execute INFO: [core0] webapp=/solr path=/select params={rows=10start=0indent=onq=*:*shards=localhost:8080/solr/core0,localhost:8080/solr/core1version=2.2} status=0 QTime=172 -- View this message in context: http://lucene.472066.n3.nabble.com/No-hits-returned-from-shard-search-on-multi-core-setup-tp835169p835169.html Sent from the Solr - User mailing list archive at Nabble.com.
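The URLs in this message appear to have lost their `&` separators somewhere in the archive, so here is a small sketch of how the sharded request could be assembled with proper encoding. The parameter names and values (`q`, `version`, `start`, `rows`, `indent`, `shards`) are the ones from the post; the helper `shard_query_url` is made up for illustration, and the sketch only builds the URL, it does not contact Solr or explain the missing docs.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def shard_query_url(base, shards, query="*:*", start=0, rows=10):
    """Build a distributed-search URL; shards is a list of host:port/path entries."""
    params = [
        ("q", query),
        ("version", "2.2"),
        ("start", start),
        ("rows", rows),
        ("indent", "on"),
        ("shards", ",".join(shards)),  # comma-separated, no http:// prefix
    ]
    return base + "?" + urlencode(params)


url = shard_query_url(
    "http://localhost:8080/solr/core0/select",
    ["localhost:8080/solr/core0", "localhost:8080/solr/core1"],
)
print(url)

# Sanity check on the totals quoted in the post: with globally unique ids,
# the sharded numFound should equal the sum of the per-core hits.
per_core_hits = {"core0": 131, "core1": 302}
expected_total = sum(per_core_hits.values())
print(expected_total)
```

Decoding the query string again (for example with `parse_qs`) is a quick way to confirm that both shards point at the same port, since the echoed parameters in the post show 8983 for core1 while the request URL uses 8080.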
RE: Non-English query via Solr Example Admin corrupts text
: I wanted to improve the documentation in the solr wiki by adding in my : findings. However, when I try to log in and create a new account, I : receive this error message: : : You are not allowed to do newaccount on this page. Login and try again. : : Does anyone know how I can get permission to add a page to the : documentation? Hmmm... yes, there definitely seems to be a problem with creating new wiki accounts on wiki.apache.org -- i've opened an issue with INFRA... https://issues.apache.org/jira/browse/INFRA-2726 -Hoss
Re: Personalized Search
Excluding favorited items is an easier problem - get the results - get exclude list from db - scan results and exclude the items in the item list You'd have to do some code to manage 'holes' in the result list ie fetch more etc. You could marry this with the solr batch based approach to reduce the holes : - Every night, update the item.users field. This can be simple string type of field. - query with negative criteria ie content:search_term AND -users:userid - then do the steps outlined earlier On 5/21/10, Rih tanrihae...@gmail.com wrote: - keep the SOLR index independent of bought/like - have a db table with user prefs on a per item basis I have the same idea this far. at query time, specify boosts for 'my items' items I believe this works if you want to sort results by faved/not faved. But how does it scale if users already favorited/liked hundreds of item? The query can be quite long. Looking forward to your idea. On Thu, May 20, 2010 at 6:37 PM, dc tech dctech1...@gmail.com wrote: Another approach would be to do query time boosts of 'my' items under the assumption that count is limited: - keep the SOLR index independent of bought/like - have a db table with user prefs on a per item basis - at query time, specify boosts for 'my items' items We are planning to do this in the context of document management where documents in 'my (used/favorited ) folders' provide a boost factor to the results. On 5/20/10, findbestopensource findbestopensou...@gmail.com wrote: Hi Rih, You going to include either of the two field bought or like to per member/visitor OR a unique field per member / visitor? If it's one or two common fields are included then there will not be any impact in performance. If you want to include unique field then you need to consider multi value field otherwise you certainly hit the wall. Regards Aditya www.findbestopensource.com On Thu, May 20, 2010 at 12:13 PM, Rih tanrihae...@gmail.com wrote: Has anybody done personalized search with Solr? 
I'm thinking of including fields such as bought or like per member/visitor via dynamic fields to a product search schema. Another option is to have a multi-value field that can contain user IDs. What are the possible performance issues with this setup? Looking forward to your ideas. Rih
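The "scan results, exclude, and fetch more to fill the holes" steps from dc tech's reply can be sketched roughly as follows. `fetch_page` is a hypothetical callable standing in for a Solr request with `start` and `rows`; the document shape (`{"id": ...}`) and the limits are likewise assumptions.

```python
def search_excluding(fetch_page, excluded_ids, wanted=10, page_size=10, max_pages=20):
    """Collect up to `wanted` docs, skipping excluded ids and paging on to fill the holes."""
    results = []
    start = 0
    for _ in range(max_pages):  # hard stop so a heavily filtered query cannot loop forever
        page = fetch_page(start, page_size)
        if not page:
            break  # ran out of matching documents
        for doc in page:
            if doc["id"] not in excluded_ids:
                results.append(doc)
                if len(results) == wanted:
                    return results
        start += len(page)
    return results


# Stand-in for the search engine: 30 ranked documents with ids 1..30.
index = [{"id": n} for n in range(1, 31)]


def fake_fetch(start, rows):
    return index[start:start + rows]


owned = {2, 3, 11}  # exclude list, e.g. read from the user-preferences table
page_ids = [doc["id"] for doc in search_excluding(fake_fetch, owned)]
print(page_ids)
```

The nightly `users` field described above would shrink the number of holes, since most owned items are then already removed by the negative clause before the client-side scan runs.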
Re: TikaEntityProcessor on Solr 1.4?
2010/5/19 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com: I guess it should work because Tika Entityprocessor does not use any new 1.4 APIs On Wed, May 19, 2010 at 1:17 AM, Sixten Otto six...@sfko.com wrote: The TikaEntityProcessor class that enables DataImportHandler to process business documents was added after the release of Solr 1.4, ... Has anyone tried back-porting those changes to Solr 1.4? Did you mean new 1.5 APIs (since TEP was added *after* 1.4 was released)? Even then, that doesn't make a lot of sense to me, as at least a couple of new things (the binary data sources) *were* added to support TikaEntityProcessor. I'm sorry if I'm being dense, but I'm having trouble understanding this answer. Sixten
RE: Non-English query via Solr Example Admin corrupts text
This should be fixed now -- please update the Jira issue if you have any other problems creating an account. : Hmmm... yes, there definitely seems to be a problem with creating new wiki : accounts on wiki.apache.org -- i've opened an issue with INFRA... : :https://issues.apache.org/jira/browse/INFRA-2726 -Hoss
RE: seemingly impossible query
I just realized something that may make the fieldcollapsing strategy insufficient. My 'ids' field is multi-valued. From what I've read you cannot field collapse on a multi-valued field. Any other ideas? Thanks, -Kallin Nagelberg -Original Message- From: Geert-Jan Brits [mailto:gbr...@gmail.com] Sent: Thursday, May 20, 2010 1:03 PM To: solr-user@lucene.apache.org Subject: Re: seemingly impossible query Hi Kallin, again please look at FieldCollapsinghttp://wiki.apache.org/solr/FieldCollapsing , that should do the trick. basically: first you constrain the field: 'listOfIds' to only contain docs that contain any of the (up to) 100 random ids as you know how to do Next, in the same query, specify to collapse on field 'listOfIds ' basically: q=listOfIds:1 OR listOfIds:10 OR listOfIds:24 collapse.threshold=1collapse.field=listOfIdscollapse.type=normal this would return the top-matching doc for each id left in listOfIds. Since you constrained this field by the ids specified you are left with 1 matching doc for each id. Again it is not guarenteed that all docs returned are different. Since you didn't specify this as a requirement I think this will suffics. Cheers, Geert-Jan 2010/5/20 Nagelberg, Kallin knagelb...@globeandmail.com Yeah I need something like: (id:1 and maxhits:1) OR (id:2 and maxits:1).. something crazy like that.. I'm not sure how I can hit solr once. If I do try and do them all in one big OR query then I'm probably not going to get a hit for each ID. I would need to request probably 1000 documents to find all 100 and even then there's no guarantee and no way of knowing how deep to go. -Kallin Nagelberg -Original Message- From: dar...@ontrenet.com [mailto:dar...@ontrenet.com] Sent: Thursday, May 20, 2010 12:27 PM To: solr-user@lucene.apache.org Subject: RE: seemingly impossible query I see. Well, now you're asking Solr to ignore its prime directive of returning hits that match a query. Hehe. I'm not sure if Solr has a unique attribute. 
But this sounds, to me, like you will have to filter the results yourself. But at least you hit Solr only once before doing so. Good luck! Thanks Darren, The problem with that is that it may not return one document per id, which is what I need. IE, I could give 100 ids in that OR query and retrieve 100 documents, all containing just 1 of the IDs. -Kallin Nagelberg -Original Message- From: dar...@ontrenet.com [mailto:dar...@ontrenet.com] Sent: Thursday, May 20, 2010 12:21 PM To: solr-user@lucene.apache.org Subject: Re: seemingly impossible query Ok. I think I understand. What's impossible about this? If you have a single field name called id that is multivalued then you can retrieved the documents with something like: id:1 OR id:2 OR id:56 ... id:100 then add limit 100. There's probably a more succinct way to do this, but I'll leave that to the experts. If you also only want the documents within a certain time, then you also create a time field and use a conjunction (id:0 ...) AND time:NOW-1H or something similar to this. Check the query syntax wiki for specifics. Darren Hey everyone, I've recently been given a requirement that is giving me some trouble. I need to retrieve up to 100 documents, but I can't see a way to do it without making 100 different queries. My schema has a multi-valued field like 'listOfIds'. Each document has between 0 and N of these ids associated to them. My input is up to 100 of these ids at random, and I need to retrieve the most recent document for each id (N Ids as input, N docs returned). I'm currently planning on doing a single query for each id, requesting 1 row, and caching the result. This could work OK since some of these ids should repeat quite often. Of course I would prefer to find a way to do this in Solr, but I'm not sure it's capable. Any ideas? Thanks, -Kallin Nagelberg
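Darren's suggestion (one OR query, then filter the results yourself) could look something like the sketch below. It assumes the response is already sorted newest first; the `key` field and the sample documents are invented for illustration, while `listOfIds` is the field name from Kallin's schema.

```python
def build_or_query(ids, field="listOfIds"):
    """Build a single query matching any of the given ids."""
    return " OR ".join(f"{field}:{value}" for value in ids)


def newest_doc_per_id(docs, wanted_ids):
    """Pick the first (newest) doc for each wanted id; docs must be sorted newest first.

    Returns (picked, missing): a dict of id -> doc, and the set of ids not seen.
    """
    remaining = set(wanted_ids)
    picked = {}
    for doc in docs:
        for value in doc["listOfIds"]:
            if value in remaining:
                picked[value] = doc
                remaining.discard(value)
        if not remaining:
            break  # every id is covered, no need to scan further
    return picked, remaining


query = build_or_query([1, 10, 24])
print(query)

# Hypothetical response, newest document first.
docs = [
    {"key": "a", "listOfIds": [1, 10]},
    {"key": "b", "listOfIds": [1]},
    {"key": "c", "listOfIds": [24]},
]
picked, missing = newest_doc_per_id(docs, [1, 10, 24, 99])
print({value: doc["key"] for value, doc in picked.items()}, missing)
```

This does not remove the problem Kallin raises about not knowing how deep to page; the `missing` set only tells the caller which ids still need a follow-up request.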
Full Import failed
I am getting this error, any hint as where i should look SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodError: isEmpty at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:424) at org.apache.solr.handler.dataimport.DocBuilder.doFullDump(DocBuilder.java:242) at org.apache.solr.handler.dataimport.DocBuilder.execute(DocBuilder.java:180) at org.apache.solr.handler.dataimport.DataImporter.doFullImport(DataImporter.java:331) at org.apache.solr.handler.dataimport.DataImporter.runCmd(DataImporter.java:389) at org.apache.solr.handler.dataimport.DataImporter$1.run(DataImporter.java:370) Caused by: java.lang.NoSuchMethodError: isEmpty at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:391) ... 5 more P.S: I am using ClobTransformer and HTMLStripTransformer
Re: Full Import failed
The last time I encountered that exception, it was caused by the use of String.isEmpty, which is new in Java 1.6. Could it be that you are running Java 1.5? paul On 21 May 2010, at 22:44, Mohamed Parvez wrote: SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodError: isEmpty at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:424)
Re: TikaEntityProcessor on Solr 1.4?
You are right that TikaEntityProcessor has a couple of other prereqs beyond stock Solr 1.4. I think the main point is that they're relatively minor. I've merged TikaEntityProcessor (and some prereqs) and its dependencies into my Solr 1.4 tree and it compiles fine, though I haven't yet tested that TikaEntityProcessor actually works in my setup. Actually, rather than cherry-pick just the changes from SOLR-1358 and SOLR-1583 what I did was to merge in all DataImportHandler-related changes from between the 1.4 release up through Solr trunk r890679 (inclusive). I'm not sure if that's what would work best for you, but it's one option. On Fri, May 21, 2010 at 1:28 PM, Sixten Otto six...@sfko.com wrote: 2010/5/19 Noble Paul നോബിള് नोब्ळ् noble.p...@gmail.com: I guess it should work because Tika Entityprocessor does not use any new 1.4 APIs On Wed, May 19, 2010 at 1:17 AM, Sixten Otto six...@sfko.com wrote: The TikaEntityProcessor class that enables DataImportHandler to process business documents was added after the release of Solr 1.4, ... Has anyone tried back-porting those changes to Solr 1.4? Did you mean new 1.5 APIs (since TEP was added *after* 1.4 was released)? Even then, that doesn't make a lot of sense to me, as at least a couple of new things (the binary data sources) *were* added to support TikaEntityProcessor. I'm sorry if I'm being dense, but I'm having trouble understanding this answer. Sixten
Re: DataImportHandler and running out of disk space
I ran through some more failure scenarios (scenarios and results below). The concerning ones in my deployment are when data does not get updated, but the DIH's .properties file does. I could only simulate that scenario when I ran out of disk space (all disk space issues behaved consistently). Is this worthy of a JIRA issue?
1. Successful import: all dates updated in .properties (title date updated, each [entity name].last_index_time updated to its own update time, last_index_time set to earliest entity update time)
2. Running out of disk space during import (in data directory only, conf directory still has space): no data updated, but dataimport.properties updated as in 1
3. Running out of disk space during import (in both data directory and conf directory): some data updated, but dataimport.properties updated as in 1
4. Running out of disk space during commit/optimize (in data directory only, conf directory still has space): no data updated, but dataimport.properties updated as in 1
5. Running out of disk space during commit/optimize (in both data directory and conf directory): no data updated, but dataimport.properties updated as in 1
6. File permissions prevent writing (on index directory): data not updated, failure reported, properties file not updated
7. File permissions prevent writing (on segment files): data updated, failure reported, properties file not updated
8. File permissions prevent writing (on .properties file): data updated, failure reported, properties file not updated
9. Shutting down Solr during import (killing process): data not updated, .properties not updated, no result reported
10. Shutting down Solr during import (issuing shutdown message): some data updated, .properties not updated, no result reported
11. DB connection lost (unplugging network cable): data not updated, .properties not updated, failure reported
12. Updating single entity fails (first one): data not updated, .properties not updated, failure reported
13. Updating single entity fails (after another one succeeds): data not updated, .properties not updated, failure reported
Re: Full Import failed
Yes, I am running 1.5. Any idea how we can run Solr 1.4 using Java 1.5? --- Thanks/Regards, Parvez On Fri, May 21, 2010 at 4:17 PM, Paul Libbrecht p...@activemath.org wrote: Last I encountered that exception was with the usage of String.isEmpty which is a 1.6 novelty. Can it be you've been running 1.5? paul On 21 May 2010, at 22:44, Mohamed Parvez wrote: SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodError: isEmpty at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:424)
Re: TikaEntityProcessor on Solr 1.4?
On Fri, May 21, 2010 at 5:30 PM, Chris Harris rygu...@gmail.com wrote: Actually, rather than cherry-pick just the changes from SOLR-1358 and SOLR-1583 what I did was to merge in all DataImportHandler-related changes from between the 1.4 release up through Solr trunk r890679 (inclusive). I'm not sure if that's what would work best for you, but it's one option. I'd rather, of course, not to have to build my own. But if I'm going to dabble in the source at all, it's just a slippery slope from the former to the latter. :-) (My main hesitation in doing so would be that I'm new enough to the code that I have no idea what core changes the trunk's DIH might also depend on. And my Java's pretty rusty.) How did you arrive at your patch? Just grafting the entire trunk/solr/contrib/dataimporthandler onto 1.4's code? Or did you go through Jira/SVN looking for applicable changesets? I'll be very interested to hear how your testing goes! Sixten
Re: Full Import failed
Fixing that precise line is very easy, and recompiling is easy as well. But I am not at all sure this will be the only occurrence of a 1.6 dependency. paul On 21 May 2010, at 23:40, Mohamed Parvez wrote: yes i am running 1.5, Any idea how we can run Solr 1.4 using Java 1.5 --- Thanks/Regards, Parvez On Fri, May 21, 2010 at 4:17 PM, Paul Libbrecht p...@activemath.org wrote: Last I encountered that exception was with the usage of String.isEmpty which is a 1.6 novelty. Can it be you've been running 1.5? paul On 21 May 2010, at 22:44, Mohamed Parvez wrote: SEVERE: Full Import failed org.apache.solr.handler.dataimport.DataImportHandlerException: java.lang.NoSuchMethodError: isEmpty at org.apache.solr.handler.dataimport.DocBuilder.buildDocument(DocBuilder.java:424)
field collapsing on multi-valued field
As I understand from looking at https://issues.apache.org/jira/login.jsp?os_destination=/browse/SOLR-236 field collapsing has been disabled on multi-valued fields. Is this really necessary? Let's say I have a multi-valued field, 'my-mv-field'. I have a query like (my-mv-field:1 OR my-mv-field:5) that returns docs with the following values for 'my-mv-field': Doc1: 1, 2, 3 Doc2: 1, 3 Doc3: 2, 4, 5, 6 Doc4: 1 If I collapse on that field with that query I imagine it should mean 'collect the docs, starting from the top, so that I find 1 and 5'. In this case if it returned Doc1 and Doc3 I would be happy. There must be some ambiguity or implementation detail I am unaware of that is preventing this. It may be a critical piece of functionality for an application I'm working on, so I'm curious if there is any point in pursuing development of this functionality or if I am missing something. Thanks, Kallin Nagelberg
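The behaviour Kallin describes ("collect the docs, starting from the top, so that I find 1 and 5") can be written down as a small client-side sketch using the four example documents from the message. This is only an illustration of the intended semantics, not how the SOLR-236 patch works; the function name `collapse_multivalued` is made up.

```python
def collapse_multivalued(ranked_docs, query_values):
    """Walk the ranked docs and keep each one that covers a query value not yet seen."""
    uncovered = set(query_values)
    kept = []
    for name, values in ranked_docs:
        newly_covered = uncovered.intersection(values)
        if newly_covered:
            kept.append(name)
            uncovered -= newly_covered
        if not uncovered:
            break  # every queried value has a representative document
    return kept


# The example from the message, in ranked order.
ranked = [
    ("Doc1", [1, 2, 3]),
    ("Doc2", [1, 3]),
    ("Doc3", [2, 4, 5, 6]),
    ("Doc4", [1]),
]
kept = collapse_multivalued(ranked, [1, 5])
print(kept)
```

One possible source of the ambiguity mentioned above is visible here: a document holding several of the queried values belongs to more than one group at once, so the engine would have to decide which group it represents, whereas with a single-valued field each document falls into exactly one group.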
RE: Solr 1.4 Enterprise Search Server book examples
I downloaded the examples and unzipped into C:\Examples C:\Examples\3 C:\Examples\7 C:\Examples\8 C:\Examples\9 C:\Examples\cores C:\Examples\solr Starting in the C:\Examples\solr folder, I run the command 'java -jar start.jar' and it starts OK, but all the URIs return 404. I can get Solr running with Tomcat quite easily but wanted to try their Jetty version. Chapter 1 pages 15-18 just don't explain the OOTB install of the examples. I have tried java -jar start.jar java -Dsolr.solr.home=c:/Examples/solr/ -jar start.jar java -Dsolr.solr.home=c:\Examples\solr\ -jar start.jar java -Dsolr.solr.home=c:/Examples/solr -jar start.jar java -Dsolr.solr.home=c:\Examples\solr -jar start.jar java -Dsolr.solr.home=solr/ -jar start.jar What am I missing? --Robert -Original Message- From: Johan Cwiklinski [mailto:johan.cwiklin...@ajlsm.com] Sent: Friday, May 21, 2010 5:00 AM To: solr-user@lucene.apache.org Subject: Re: Solr 1.4 Enterprise Search Server book examples Hello, On 21/05/2010 13:29, Stefan Moises wrote: Hi, everybody who owns the book can now download the source code examples again, the zip file is fixed now - just got a message from Packt! :) https://www.packtpub.com/support?nid=4191 Have fun :) Cheers, Stefan I received the same message today; finally, I can take a look at those examples :) Regards, -- Johan Cwiklinski AJLSM
SolrJ/EmbeddedSolrServer
I've got a situation where my data directory (a) needs to live elsewhere besides inside of Solr home, (b) moves to a different location when updating indexes, and (c) setting up a symlink from solr_home/data isn't a great option. So what's the best approach to making this work with SolrJ? The low-level solution seems to be: - create my own SolrCore instance, where I specify the data directory - use that to update the CoreContainer - create a new EmbeddedSolrServer. But recreating the EmbeddedSolrServer with each index update feels wrong, and I'd like to avoid mucking around with low-level SolrCore instantiation. Any other approaches? Thanks, -- Ken Ken Krugler +1 530-210-6378 http://bixolabs.com e l a s t i c w e b m i n i n g
Re: DIH post import event listener for errors
Added a patch on the latest trunk: http://issues.apache.org/jira/browse/SOLR-1922 -- View this message in context: http://lucene.472066.n3.nabble.com/DIH-post-import-event-listener-for-errors-tp834645p835704.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: DIH post import event listener for errors
I'd consider using the logging framework. I do this with Log4j in other apps. It's a generic approach that works for just about any system. ~ David Smiley - Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book
RE: Solr 1.4 Enterprise Search Server book examples
Hello Rob, Thank you for buying the book. I'm the lead author. There is a README.txt file in the root of the zip which includes a rather full invocation of java to kick off Solr that is to be used for the example data. The options that are part of the invocation should elucidate what's going on. The layout of where Solr's home is in relation to where Jetty is does not coincide with a standard Solr distribution's example directory. In hindsight, I should have made it the same so as not to confuse people. Sorry. And I have no idea why the download got corrupted on Packt's server. I made a smaller distribution for them (~127MB vs 300-something) and put the data files on MusicBrainz' servers; they are downloaded as part of the setup script, which you should run. ~ David Smiley - Author: https://www.packtpub.com/solr-1-4-enterprise-search-server/book