Re: System requirements in my case?
A dedicated server may not be required; if you want to cut costs, prefer a shared server. How much RAM does the machine have?

Regards
Aditya
www.findbestopensource.com

On Tue, May 22, 2012 at 12:36 PM, Bruno Mannina bmann...@free.fr wrote:

Dear Solr users,

My company would like to use Solr to index around 80,000,000 documents (XML files of around 5-10 KB each). My program (a robot) will connect to this Solr instance with boolean requests.

Number of users: around 1000
Number of requests per user per day: 300
Number of users per day: 30

I would like to subscribe to a hosting provider with this configuration:
- Dedicated server
- Ubuntu
- Intel Xeon i7, 2x 2.66+ GHz
- 12 GB RAM
- 2 x 1500 GB disk
- Unlimited bandwidth
- Fixed IP

Do you think this configuration is enough?

Thanks for your info,
Sincerely
Bruno
Re: Strategy for maintaining De-normalized indexes
That's how de-normalization works: you need to update all the child products. If you just need the count and you are using facets, then maintain a mapping between category and main product, and between main product and child product. Lucene has no fixed schema, so you can keep both record types in one index and retrieve data based on a type field:

- A category record holds the category name, the product name and type CATEGORY_TYPE.
- A child product record holds the product name, the main product name, the product details and type PRODUCT_TYPE.

With this layout you need two queries: given the category name, fetch the main product names, then query with those names to fetch the child products (see the sketch after the quoted message).

Hope it helps.

Regards
Aditya
www.findbestopensource.com

On Tue, May 22, 2012 at 1:37 PM, Sohail Aboobaker sabooba...@gmail.com wrote:

Hi,

I have a very basic question and hopefully there is a simple answer to it. We are trying to index a simple product catalog which has master products and child products. Each master product can have multiple child products, and a master product can be assigned one or more product categories. We need to be able to show counts of categories based on the number of child products in each category.

We have indexed the data using a join, selecting appropriate values for the index from each table. This is basically a de-normalized result set, and it works perfectly for our search purposes. However, maintaining the index and keeping it up to date is an issue: whenever a product master is updated with a new category, we need to delete all the index entries for its child products and insert them again. That seems like a lot of activity for a regular, ongoing operation such as product category updates.

Since joins between schemas are only available in 4.0, what are other strategies for maintaining such an index or building such queries?

Thanks for your help.

Regards,
Sohail
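A sketch of the two-query approach described above, using the SolrJ 1.4-era API. The field names (type, categoryName, productName, mainProductName) are assumptions for illustration, not from the original thread:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrDocument;

    public class CategoryDrilldown {
        // Query 1: category -> main products. Query 2: main products -> child products.
        static long childProductCount(SolrServer server, String category) throws Exception {
            SolrQuery q1 = new SolrQuery("type:CATEGORY_TYPE AND categoryName:" + category);
            QueryResponse r1 = server.query(q1);

            StringBuilder mains = new StringBuilder();
            for (SolrDocument d : r1.getResults()) {
                if (mains.length() > 0) mains.append(" OR ");
                mains.append("mainProductName:\"").append(d.getFieldValue("productName")).append('"');
            }
            if (mains.length() == 0) return 0;   // no main products in this category

            SolrQuery q2 = new SolrQuery("type:PRODUCT_TYPE AND (" + mains + ")");
            QueryResponse r2 = server.query(q2);
            return r2.getResults().getNumFound();
        }
    }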
Re: Multicore Solr
Having a core per user is not a good idea; the count is too high. Keep everything in a single core and filter the data by user name or user id (see the sketch after the quoted message).

Regards
Aditya
www.findbestopensource.com

On Tue, May 22, 2012 at 2:29 PM, Shanu Jha shanuu@gmail.com wrote:

Hi all, greetings from my end. This is my first post on this mailing list. I have a few questions on multicore Solr. For background: we want to create a core for each user logged in to our application. That may be 50, 100, 1000, N cores. Each core will be used to write and search an index in real time.

1. Is this a good idea to go with?
2. What are the pros and cons of this approach?

Awaiting your response.

Regards
AJ
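A sketch of the single-core, filter-per-user approach (SolrJ 1.4-era API; the field name user_id and an already-initialized SolrServer named server are assumptions). Filter queries are cached by Solr, so repeated per-user searches stay cheap:

    SolrQuery query = new SolrQuery("laptop");      // the user's search terms
    query.addFilterQuery("user_id:12345");          // restrict to this user's documents
    QueryResponse rsp = server.query(query);
    System.out.println(rsp.getResults().getNumFound() + " hits for this user");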
Re: System requirements in my case?
Seems fine. Go ahead. Before hosting, have you tried / tested your application in a local setup? RAM usage is what matters most for Solr. Benchmark your app with 100,000 documents, log the memory used, and extrapolate: if 100,000 documents need X MB of heap, 80,000,000 documents will need roughly 800 times that, so calculate the required RAM accordingly.

Regards
Aditya
www.findbestopensource.com

On Tue, May 22, 2012 at 2:36 PM, Bruno Mannina bmann...@free.fr wrote:

My choice: http://www.ovh.com/fr/serveurs_dedies/eg_best_of.xml
24 GB DDR3

On 22/05/2012 10:26, findbestopensource wrote:

A dedicated server may not be required; if you want to cut costs, prefer a shared server. How much RAM does the machine have?

Regards
Aditya
www.findbestopensource.com

On Tue, May 22, 2012 at 12:36 PM, Bruno Mannina bmann...@free.fr wrote:

Dear Solr users, my company would like to use Solr to index around 80,000,000 documents (XML files of around 5-10 KB each). My program (a robot) will connect to this Solr instance with boolean requests. Number of users: around 1000. Number of requests per user per day: 300. Number of users per day: 30. I would like to subscribe to a hosting provider with this configuration: dedicated server, Ubuntu, Intel Xeon i7 2x 2.66+ GHz, 12 GB RAM, 2 x 1500 GB disk, unlimited bandwidth, fixed IP. Do you think this configuration is enough?

Thanks for your info,
Sincerely
Bruno
Re: is commit a sequential process in solr indexing
Yes. Lucene / Solr supports multi-threaded use. You can commit from two different threads, to the same core or to different cores (see the sketch after the quoted message).

Regards
Aditya
www.findbestopensource.com

On Tue, May 22, 2012 at 12:35 AM, jame vaalet jamevaa...@gmail.com wrote:

hi,

My use case here is to search all incoming documents for certain combinations of words which are pre-determined. So what I am doing is: create a batch of x docs according to their creation date, index them, commit them and search them for the pre-determined query. My question is: if I make the entire process multi-threaded and two threads try to commit two different batches, will the commits happen in parallel? What if I am committing to different Solr cores?

--
-JAME
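A sketch of two threads committing in parallel to two different cores (SolrJ 1.4-era API; the URLs and core names are placeholders). Overlapping commits to the same core are also thread-safe, but if they arrive faster than searchers can warm you will hit the maxWarmingSearchers limit:

    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;

    public class ParallelCommit {
        public static void main(String[] args) throws Exception {
            final CommonsHttpSolrServer coreA =
                new CommonsHttpSolrServer("http://localhost:8983/solr/coreA");
            final CommonsHttpSolrServer coreB =
                new CommonsHttpSolrServer("http://localhost:8983/solr/coreB");

            ExecutorService pool = Executors.newFixedThreadPool(2);
            pool.submit(new Runnable() {
                public void run() {
                    try { coreA.commit(); } catch (Exception e) { e.printStackTrace(); }
                }
            });
            pool.submit(new Runnable() {
                public void run() {
                    try { coreB.commit(); } catch (Exception e) { e.printStackTrace(); }
                }
            });
            pool.shutdown();   // both commits proceed independently
        }
    }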
Re: Fault tolerant Solr replication architecture
Hi Parvin,

How fault tolerant the architecture needs to be is something you have to decide from your requirements; at some point manual intervention may be required to recover from a crash. You need to decide what percentage of fault tolerance you can support: it will certainly not be 100%. We can handle network failures, but crashes are harder to handle.

Consider one master and two slaves. You can put a load balancer in front of the slaves, so that you get round-robin or fail-over between them; if you are not using a load balancer, you have to handle this in your application. If the master crashes, you may need to rebuild the index, but the chances of that are low. (A sketch of a repeater-style replication config follows the quoted message.)

Regards
Aditya
www.findbestopensource.com

On Mon, May 21, 2012 at 12:55 PM, Parvin Gasimzade parvin.gasimz...@gmail.com wrote:

Hi,

I am using Solr with replication. I have one master that indexes data and two slaves which pull the index from the master and respond to queries. My question is: how can I create a fault-tolerant architecture? What should I do when the master server crashes? I heard that a repeater is used for this type of architecture. Do I have to create one master, one slave acting as a repeater, and one slave? Another question: if the master crashes, does the slave with the repeater start indexing automatically, or should I configure it manually?

I asked a similar question on Stack Overflow: http://stackoverflow.com/questions/10597053/fault-tolerant-solr-replication-architecture

Any help will be appreciated.

Regards,
Parvin
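For reference, a sketch of how a repeater is typically set up with Solr 1.4-style replication: the ReplicationHandler on the repeater core is configured as both master and slave. The masterUrl, poll interval and file names below are placeholders, not from this thread:

    <requestHandler name="/replication" class="solr.ReplicationHandler">
      <lst name="master">
        <str name="replicateAfter">commit</str>
        <str name="confFiles">schema.xml,stopwords.txt</str>
      </lst>
      <lst name="slave">
        <str name="masterUrl">http://master-host:8983/solr/replication</str>
        <str name="pollInterval">00:00:60</str>
      </lst>
    </requestHandler>

The repeater does not start indexing on its own when the master dies; promoting it to master is a manual (or externally scripted) step.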
Re: curl or nutch
You could very well use Solr on its own; it has support for indexing PDF and XML files. If you want to crawl websites and rank results PageRank-style, then choose Nutch.

Regards
Aditya
www.findbestopensource.com

On Wed, May 16, 2012 at 1:13 PM, Tolga to...@ozses.net wrote:

Hi,

I have been trying for a week. I really want to get a start, so what should I use, curl or Nutch? I want to be able to index PDF, XML etc. and search within them as well.

Regards,
Re: authentication for solr admin page?
I have written an article on this, covering the steps to restrict / authenticate access to the Solr admin interface: http://www.findbestopensource.com/article-detail/restrict-solr-admin-access

Regards
Aditya
www.findbestopensource.com

On Thu, Mar 29, 2012 at 1:06 AM, geeky2 gee...@hotmail.com wrote:

update - ok - I was reading about replication here: http://wiki.apache.org/solr/SolrReplication and noticed comments in the solrconfig.xml file related to HTTP Basic Authentication and the usage of the following tags:

    <str name="httpBasicAuthUser">username</str>
    <str name="httpBasicAuthPassword">password</str>

Can I place these tags in the request handler to achieve an authentication scheme for the /admin page?

    <!-- snipped from the solrconfig.xml file -->
    <requestHandler name="/admin/" class="org.apache.solr.handler.admin.AdminHandlers"/>

thanks for any help
mark

--
View this message in context: http://lucene.472066.n3.nabble.com/authentication-for-solr-admin-page-tp3865665p3865747.html
Sent from the Solr - User mailing list archive at Nabble.com.
Large data set or data corpus
Hello all,

Recently I saw a couple of discussions in LinkedIn groups about generating large data sets, and I have compiled them into an article. Hope it is helpful. If you have any other links where one can get a large data set for free, please reply to this thread and I will update the article.

http://www.findbestopensource.com/article-detail/free-large-data-corpus

Regards
Aditya
www.findbestopensource.com
Re: Search Issue
While indexing, the @ is removed. You need to use your own tokenizer that treats @rohit as one word. Another option is to split the tweet into two fields, @username and the tweet text. Index both fields, but don't tokenize the @username field; index it as-is. While querying, search both fields. This also makes it easy to fetch the tweets of a particular user. (A schema sketch follows the quoted message.)

Regards
Aditya
www.findbestopensource.com

On Wed, Jan 11, 2012 at 3:50 PM, Rohit ro...@in-rev.com wrote:

Hi,

We are storing a large number of tweets and blog feeds in Solr. Now if the user searches for Twitter mentions like @rohit, records which just contain the word rohit are also returned, even if we do an exact match on @rohit. I understand this happens because of the WordDelimiterFilterFactory, which splits on special characters: http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory

How can I force Solr to not return matches without the @? Hope I am being clear.

Regards,
Rohit
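A sketch of the two-field approach in schema.xml (the field names are assumptions). The built-in string type (StrField) is not analyzed at all, so @rohit stays one token:

    <field name="username" type="string" indexed="true" stored="true"/>
    <field name="tweet"    type="text"   indexed="true" stored="true"/>

A query such as username:@rohit then matches only that user's mentions, while tweet:rohit still searches the tweet text.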
Re: Thoughts on Search Analytics?
1. Reports based on location, grouped by city / country
2. Total searches performed per hour / week / month
3. Frequently used search keywords
4. Analytics based on search keywords

Regards
Aditya
www.findbestopensource.com

On Fri, May 6, 2011 at 3:55 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote:

Hi,

I'd like to solicit your thoughts about search analytics, if you are doing any sort of analysis/reporting of search logs or click streams or anything related.

* Which information or reports do you find the most useful, and why?
* Which reports would you like to have but don't, for whatever reason (you don't have the needed data, it's too hard to produce such reports, ...)?
* Which tool(s) or service(s) do you use and find the most useful?

I'm preparing a presentation on the topic of search analytics, so I'm trying to solicit opinions, practices, desires, etc. Your thoughts would be greatly appreciated. If you could reply directly, that would be great, since this may be a bit OT for the list.

Thanks!
Otis

Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch
Lucene ecosystem search :: http://search-lucene.com/
Re: How can i use Solr based Search Engine for My University?
Hello Anurag,

Google is always there for internet search; what you need to support is search within your university. My opinion would be: don't crawl the sites. You require only Solr, not Nutch. Provide an interface for university students to upload documents: previous years' question papers, notes, e-books etc. Scan the documents, convert them to PDF and upload them. Providing search over these would be more valuable than crawling the sites.

Regards
Aditya
www.findbestopensource.com

On Fri, May 6, 2011 at 1:31 PM, Anurag anurag.it.jo...@gmail.com wrote:

I am a student at Jamia Millia Islamia (http://jmi.ac.in/index.htm), a central university in India. I want to use my search engine for the benefit of students. The university has undergraduate, graduate, PhD etc. courses, including engineering. Earlier one of my teachers suggested developing an intranet search (for the LAN), but I am not able to figure out how to implement it. My university uses Google as its own site search tool. I am in the engineering department, and I see students (including me) using Xerox copies, previous years' papers, notes etc. during exam time. People use the internet, say Google, to learn any topic not included in a book. Please give some valuable suggestions.

Thanks
-
Kumar Anurag

--
View this message in context: http://lucene.472066.n3.nabble.com/How-can-i-use-Solr-based-Search-Engine-for-My-University-tp2907168p2907168.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Is it possible to use sub-fields or multivalued fields for boosting?
Hello deniz,

You could create a new field, say fullname, which is a copyField of firstname and surname. Search on both the new field and location, but boost the new-field clause higher. (A schema sketch follows the quoted message.)

Regards
Aditya
www.findbestopensource.com

On Thu, May 5, 2011 at 9:21 AM, deniz denizdurmu...@gmail.com wrote:

okay... let me make the situation more clear... I am trying to create a universal field which includes information about users, like firstname, surname, gender, location etc. When I enter something, e.g. London, I would like to match any user having 'London' in any of the fields firstname, surname or location. But if it matches name or surname, I would like to give it a higher weight.

so my question is... is it possible to have sub-fields? like

    <field name="universal">
      <field name="firstname">blabla</field>
      <field name="surname">blabla</field>
      <field name="gender">blabla</field>
      <field name="location">blabla</field>
    </field>

or any other ideas for implementing such a feature?

--
View this message in context: http://lucene.472066.n3.nabble.com/Is-it-possible-to-use-sub-fields-or-multivalued-fields-for-boosting-tp2901992p2901992.html
Sent from the Solr - User mailing list archive at Nabble.com.
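A sketch of the copyField approach in schema.xml (the field names and boost factor are assumptions):

    <field name="fullname" type="text" indexed="true" stored="false" multiValued="true"/>
    <copyField source="firstname" dest="fullname"/>
    <copyField source="surname"   dest="fullname"/>

A query like fullname:london^10 OR location:london then ranks name matches above location matches.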
Re: [ANNOUNCE] Web Crawler
Hello Dominique Bejean,

Good job. We have identified almost 8 open source web crawlers (http://www.findbestopensource.com/tagged/webcrawler); I don't know how far yours differs from the rest. Note that your license states it is not open source, but free for personal use.

Regards
Aditya
www.findbestopensource.com

On Wed, Mar 2, 2011 at 5:55 AM, Dominique Bejean dominique.bej...@eolya.fr wrote:

Hi,

I would like to announce Crawl Anywhere. Crawl-Anywhere is a Java web crawler. It includes:

* a crawler
* a document processing pipeline
* a Solr indexer

The crawler has a web administration in order to manage the web sites to be crawled. Each web site crawl is configured with a lot of possible parameters (not all mandatory):

* number of simultaneous items crawled per site
* recrawl period rules based on item type (HTML, PDF, ...)
* item type inclusion / exclusion rules
* item path inclusion / exclusion / strategy rules
* max depth
* web site authentication
* language
* country
* tags
* collections
* ...

The pipeline includes various ready-to-use stages (text extraction, language detection, a Solr ready-to-index XML writer, ...). Everything is very configurable and extensible, either by scripting or by Java coding. With scripting, you can help the crawler handle JavaScript links, or help the pipeline extract the relevant title and clean up the HTML pages (remove menus, headers, footers, ...). With Java coding, you can develop your own pipeline stage.

The Crawl Anywhere web site provides good explanations and screen shots, and everything is documented in a wiki. The current version is 1.1.4. You can download and try it out from here: www.crawl-anywhere.com

Regards
Dominique
Re: Does Solr supports indexing search for Hebrew.
You may need to use a Hebrew analyzer: http://www.findbestopensource.com/search/?query=hebrew

Regards
Aditya
www.findbestopensource.com

On Tue, Jan 18, 2011 at 2:34 PM, prasad deshpande prasad.deshpand...@gmail.com wrote:

Hello,

With reference to the links below, I haven't found Hebrew support in Solr:

http://wiki.apache.org/solr/LanguageAnalysis
http://lucene.apache.org/java/3_0_3/api/all/index.html

If I want to index and search Hebrew files/data, how would I achieve this?

Thanks,
Prasad
Re: Spatial Search - Best choice ?
Some more pointers for spatial search:

http://www.jteam.nl/products/spatialsolrplugin.html
http://code.google.com/p/spatial-search-lucene/
http://sujitpal.blogspot.com/2008/02/spatial-search-with-lucene.html

Regards
Aditya
www.findbestopensource.com

On Thu, Jul 15, 2010 at 3:54 PM, Saïd Radhouani r.steve@gmail.com wrote:

Hi,

Using Solr 1.4, I'm now working on adding spatial search options, such as distance-based sorting, bounding-box filters, etc. To the best of my knowledge, there are three possible starting points:

1. http://blog.jteam.nl/2009/08/03/geo-location-search-with-solr-and-lucene/
2. gissearch.com
3. http://www.ibm.com/developerworks/opensource/library/j-spatial/index.html#resources

I have seen all three options used, but no comparison between them. Can anyone out there recommend one option over another?

Thanks,
-S
Re: Cache full text into memory
You have two options:

1. Store the compressed text as part of a stored field in Solr.
2. Use external caching (http://www.findbestopensource.com/tagged/distributed-caching): you could use Ehcache / Memcached / Membase.

The problem with external caching is that you need to synchronize deletions and modifications yourself. Fetching the stored field from Solr is also fast. (A small Ehcache sketch follows the quoted message.)

Regards
Aditya
www.findbestopensource.com

On Wed, Jul 14, 2010 at 12:08 PM, Li Li fancye...@gmail.com wrote:

I want to cache full text in memory to improve performance. The full text is only used for highlighting in my application (but it's very time-consuming: my average query time is about 250 ms, and I guess it would cost about 50 ms if I just fetched the top 10 full texts; things get worse when fetching more full texts, because on disk they are scattered everywhere for a single query). My full text per machine is about 200 GB. The memory available for storing full text is about 10 GB, so I want to compress it in memory. Suppose the compression ratio is 1:5; then I can load 1/4 of the full text into memory. I need a cache component for this. Has anyone faced this problem before? I need some advice. Is it possible to use external tools such as memcached?

Thank you.
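A minimal sketch of option 2 with Ehcache 2.x, assuming the text is compressed to a byte[] before caching; the cache name, sizes and key scheme are assumptions, not from this thread:

    import net.sf.ehcache.Cache;
    import net.sf.ehcache.CacheManager;
    import net.sf.ehcache.Element;

    public class FullTextCache {
        public static void main(String[] args) {
            CacheManager manager = CacheManager.create();   // picks up ehcache.xml if present
            // name, max in-memory elements, overflowToDisk, eternal, TTL secs, TTI secs
            manager.addCache(new Cache("fulltext", 100000, false, false, 3600, 1800));
            Cache cache = manager.getCache("fulltext");     // evicts least-recently-used by default

            byte[] compressed = new byte[] { /* gzip/deflate-compressed document text */ };
            cache.put(new Element("doc-42", compressed));

            Element hit = cache.get("doc-42");
            if (hit != null) {
                byte[] value = (byte[]) hit.getObjectValue(); // decompress before highlighting
            }
            manager.shutdown();
        }
    }

The default in-memory eviction policy is LRU, which matches the requirement discussed in the follow-up messages below.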
Re: Cache full text into memory
I have just given you the two options. Since you already store the text as part of the index, you could try external caching: try Ehcache / Membase (http://www.findbestopensource.com/tagged/distributed-caching). The caching system will do LRU eviction and is quite efficient.

On Wed, Jul 14, 2010 at 12:39 PM, Li Li fancye...@gmail.com wrote:

I already store it in the Lucene index. But it is on disk, and when a query comes, Lucene must seek the disk to get it. I am not familiar with the Lucene cache. I just want to make full use of my memory: load 10 GB of it into memory, with an LRU strategy when the cache is full. To load more into memory, I want to compress it in memory. I don't care much about disk space, so it doesn't matter whether or not it's compressed in Lucene.

2010/7/14 findbestopensource findbestopensou...@gmail.com:

You have two options: 1. Store the compressed text as part of a stored field in Solr. 2. Use external caching (http://www.findbestopensource.com/tagged/distributed-caching): Ehcache / Memcached / Membase. The problem with external caching is that you need to synchronize deletions and modifications. Fetching the stored field from Solr is also fast.

Regards
Aditya
www.findbestopensource.com

On Wed, Jul 14, 2010 at 12:08 PM, Li Li fancye...@gmail.com wrote:

I want to cache full text in memory to improve performance. The full text is only used for highlighting in my application (but it's very time-consuming: my average query time is about 250 ms, and I guess it would cost about 50 ms if I just fetched the top 10 full texts; things get worse when fetching more full texts, because on disk they are scattered everywhere for a single query). My full text per machine is about 200 GB. The memory available for storing full text is about 10 GB, so I want to compress it in memory. Suppose the compression ratio is 1:5; then I can load 1/4 of the full text into memory. I need a cache component for this. Has anyone faced this problem before? I need some advice. Is it possible to use external tools such as memcached?

Thank you.
Re: Cache full text into memory
I doubt that. A caching system is a key-value store; you have to use a compression library to compress and decompress your data yourself. The caching system only helps you retrieve it fast. Anyway, please take a look at each caching system's features.

Regards
Aditya
www.findbestopensource.com

On Wed, Jul 14, 2010 at 3:06 PM, Li Li fancye...@gmail.com wrote:

Thank you. I don't know which cache system to use. In my application, the cache system must support a compression algorithm with a high compression ratio and fast decompression speed (because every get from the cache must decompress).

2010/7/14 findbestopensource findbestopensou...@gmail.com:

I have just given you the two options. Since you already store the text as part of the index, you could try external caching: try Ehcache / Membase (http://www.findbestopensource.com/tagged/distributed-caching). The caching system will do LRU eviction and is quite efficient.

On Wed, Jul 14, 2010 at 12:39 PM, Li Li fancye...@gmail.com wrote:

I already store it in the Lucene index. But it is on disk, and when a query comes, Lucene must seek the disk to get it. I am not familiar with the Lucene cache. I just want to make full use of my memory: load 10 GB into memory, with an LRU strategy when the cache is full. To load more into memory, I want to compress it. I don't care much about disk space, so it doesn't matter whether it's compressed in Lucene.

2010/7/14 findbestopensource findbestopensou...@gmail.com:

You have two options: 1. Store the compressed text as part of a stored field in Solr. 2. Use external caching (http://www.findbestopensource.com/tagged/distributed-caching): Ehcache / Memcached / Membase. The problem with external caching is that you need to synchronize deletions and modifications. Fetching the stored field from Solr is also fast.

Regards
Aditya
www.findbestopensource.com

On Wed, Jul 14, 2010 at 12:08 PM, Li Li fancye...@gmail.com wrote:

I want to cache full text in memory to improve performance. The full text is only used for highlighting in my application, but it's very time-consuming: my average query time is about 250 ms, and I guess it would cost about 50 ms if I just fetched the top 10 full texts; things get worse when fetching more, because on disk they are scattered everywhere for a single query. My full text per machine is about 200 GB, and the memory available for storing it is about 10 GB, so I want to compress it in memory. Suppose the compression ratio is 1:5; then I can load 1/4 of the full text into memory. I need a cache component for this. Has anyone faced this problem before? I need some advice. Is it possible to use external tools such as memcached?

Thank you.
Re: Use of EmbeddedSolrServer
Refer to http://wiki.apache.org/solr/Solrj#EmbeddedSolrServer (a sketch along those lines follows the quoted message).

Regards
Aditya
www.findbestopensource.com

On Fri, Jun 11, 2010 at 2:25 PM, Robert Naczinski robert.naczin...@googlemail.com wrote:

Hello experts,

We would like to use Solr in our search application to index a large inventory held in a database. The initial indexing is not a problem, but updates to the database should also be indexed; we plan to put triggers on the relevant tables. The complication is that the database is on z/OS and the updates come from PL/1: the procedure would send a message through MQSeries to the application, and the index would then be updated. Is this plan valid? If so, I would use a message-driven bean (EJB) in my application. A standalone Solr server is one server too many for us, so I would use EmbeddedSolrServer in an application deployed on WebSphere Application Server. Can I find a manual for the use of EmbeddedSolrServer somewhere?

Regards,
Robert
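A sketch in the spirit of that wiki page, using the Solr 1.4-era SolrJ API; the solr home path and core name below are placeholders:

    import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
    import org.apache.solr.core.CoreContainer;

    public class EmbeddedExample {
        public static void main(String[] args) throws Exception {
            System.setProperty("solr.solr.home", "/path/to/solr/home");
            CoreContainer.Initializer initializer = new CoreContainer.Initializer();
            CoreContainer coreContainer = initializer.initialize();
            EmbeddedSolrServer server = new EmbeddedSolrServer(coreContainer, "core1");
            // use server.add(...), server.commit(), server.query(...) as with any SolrServer
            coreContainer.shutdown();
        }
    }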
Re: Indexing link targets in HTML fragments
Could you tell us the schema you use for indexing? In my opinion, the StandardAnalyzer or the Snowball analyzer would do best here; they will not break up URLs. Add href and the other HTML tag names to your stop words, and they will be removed at index time.

Regards
Aditya
www.findbestopensource.com

On Mon, Jun 7, 2010 at 12:20 PM, Andrew Clegg andrew.cl...@gmail.com wrote:

Lance Norskog-2 wrote:

The PatternReplace and HTMLStrip tokenizers might be the right bet. The easiest way to go about this is to make a bunch of text fields with different analysis stacks and investigate them in the Schema Browser. You can paste an HTML document into the text box and see exactly how the words and markup get torn apart.

Thanks Lance, I'll experiment. For reference, for anyone else who comes across this thread: the HTML in my original post might have got munged on the way into or out of the list server. It was supposed to look like this:

This is the entire content of my field, but [a href=http://example.com/]some of the words[/a] are a hyperlink.

(but with real HTML tags instead of the square brackets) and I am just trying to extract the words and the link target but lose the rest of the markup.

Cheers,
Andrew.

--
View this message in context: http://lucene.472066.n3.nabble.com/Indexing-link-targets-in-HTML-fragments-tp874547p875503.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: Query Question
Which analyzer are you using for indexing and searching? Check schema.xml. You are currently using an analyzer that breaks up words. If you don't want the title broken up, you need to use <tokenizer class="solr.KeywordTokenizerFactory"/> (a schema sketch follows the quoted message).

Regards
Aditya
www.findbestopensource.com

On Wed, Jun 2, 2010 at 2:41 PM, M.Rizwan muhammad.riz...@sigmatec.com.pk wrote:

Hi,

I have Solr 1.4. In the schema I have a field called title, of type text. The problem is, when I search for Test_Title it brings back all documents with titles like Test-Title, Test_Title, Test,Title, Test Title, Test.Title. What can I do to avoid this? Test_Title should only return documents having the title Test_Title.

Any idea?

Thanks
- Riz
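A sketch of a field type that keeps the whole title as one token; the type name is an assumption, and the LowerCaseFilterFactory is optional (for case-insensitive matching):

    <fieldType name="text_exact" class="solr.TextField">
      <analyzer>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
      </analyzer>
    </fieldType>

    <field name="title" type="text_exact" indexed="true" stored="true"/>

With this, Test_Title is indexed as the single token test_title, so Test-Title or Test Title no longer match. Re-indexing is required after the schema change.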
Re: logic for auto-index
You need to schedule your task. Check out the job schedulers available for all programming languages: http://www.findbestopensource.com/tagged/job-scheduler (a plain-JDK sketch follows the quoted message).

Regards
Aditya
www.findbestopensource.com

On Wed, Jun 2, 2010 at 2:39 PM, Jonty Rhods jonty.rh...@gmail.com wrote:

Hi Peter,

Actually I want the indexing process to start automatically; right now I am doing it manually. Likewise, I want indexing to start when the load on the server is low, i.e. late at night. So an automatic schedule would fix my problem.

On Wed, Jun 2, 2010 at 2:00 PM, Peter Karich peat...@yahoo.de wrote:

Hi Jonty,

What is your specific problem? You could use a cronjob or the Java library called Quartz to automate this task. Or did you mean replication?

Regards,
Peter.

Hi All,

I am very new to Solr, and to Java too. I need to use SolrJ for indexing, and I also need to index automatically once every 24 hours. I wrote Java code for indexing; now I want to do further coding for the automatic process. Could you suggest or give me sample code for an automatic indexing process?

please help..

with regards
Jonty.
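A plain-JDK sketch, as an alternative to cron or Quartz: run the indexing job every 24 hours, with the first run delayed until a quiet hour. The Indexer class and the initial delay are placeholders for your own code:

    import java.util.concurrent.Executors;
    import java.util.concurrent.ScheduledExecutorService;
    import java.util.concurrent.TimeUnit;

    public class NightlyIndexer {
        public static void main(String[] args) {
            ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
            Runnable indexJob = new Runnable() {
                public void run() {
                    try {
                        new Indexer().reindex();   // your existing SolrJ indexing code
                    } catch (Exception e) {
                        e.printStackTrace();       // never let the task die silently
                    }
                }
            };
            long hoursUntilQuietTime = 5;          // compute this from the current time
            scheduler.scheduleAtFixedRate(indexJob, hoursUntilQuietTime, 24, TimeUnit.HOURS);
        }
    }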
Re: newbie question on how to batch commit documents
Add the commit after the loop. I would also advise committing from a separate thread: I keep a separate timer thread that commits every minute, and at the end of every day I optimize the index. (A corrected sketch of your loop follows the quoted message.)

Regards
Aditya
www.findbestopensource.com

On Tue, Jun 1, 2010 at 2:57 AM, Steve Kuo kuosen...@gmail.com wrote:

I have a newbie question on the best way to batch add/commit a large collection of document data via SolrJ. My first attempt was to write a multi-threaded application that did the following:

    Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
    for (Widget w : widgets) {
        doc.addField("id", w.getId());
        doc.addField("name", w.getName());
        doc.addField("price", w.getPrice());
        doc.addField("category", w.getCat());
        doc.addField("srcType", w.getSrcType());
        docs.add(doc);

        // commit docs to solr server
        server.add(docs);
        server.commit();
    }

And I got this exception:

    org.apache.solr.common.SolrException: Error_opening_new_searcher_exceeded_limit_of_maxWarmingSearchers2_try_again_later
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:424)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:243)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.commit(SolrServer.java:86)

The SolrJ wiki/documents seemed to indicate that multiple threads calling SolrServer.commit(), which in turn called CommonsHttpSolrServer.request(), resulted in multiple searchers. My first thought was to change the configs for autowarming, but after looking at the autowarm params I am not sure what should be changed, or whether a different approach is recommended:

    <filterCache class="solr.FastLRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <queryResultCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>
    <documentCache class="solr.LRUCache" size="512" initialSize="512" autowarmCount="0"/>

Your help is much appreciated.
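A corrected sketch of the loop above (same server and widgets as in the quoted code): create a new document per widget, send in batches, and commit once after the loop. The batch size of 1000 is an assumption; tune it to your heap:

    Collection<SolrInputDocument> docs = new ArrayList<SolrInputDocument>();
    for (Widget w : widgets) {
        SolrInputDocument doc = new SolrInputDocument();  // fresh document each iteration
        doc.addField("id", w.getId());
        doc.addField("name", w.getName());
        doc.addField("price", w.getPrice());
        doc.addField("category", w.getCat());
        doc.addField("srcType", w.getSrcType());
        docs.add(doc);
        if (docs.size() >= 1000) {    // flush in batches to bound client memory
            server.add(docs);
            docs.clear();
        }
    }
    if (!docs.isEmpty()) {
        server.add(docs);             // flush the remainder
    }
    server.commit();                  // one commit at the end, outside the loop

With a single commit (or a timed commit from one thread), concurrent adds no longer open overlapping warming searchers, which is what the maxWarmingSearchers error was complaining about.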
Re: Using solrJ to get all fields in a particular schema/index
To retrieve all documents, you need to use the match-all query *:* as the query/filter.

Regards
Aditya
www.findbestopensource.com

On Tue, May 25, 2010 at 4:14 PM, Rakhi Khatwani rkhatw...@gmail.com wrote:

Hi,

Is there any way to get all the fields (irrespective of whether they contain a value or null) of a SolrDocument? Or is there any way to get all the fields of the schema.xml at the URL (http://localhost:8983/solr/core0/)?

Regards,
Raakhi
Re: Using solrJ to get all fields in a particular schema/index
Resending, as there was a typo. To retrieve all documents, you need to use the query/filter *:* .

Regards
Aditya
www.findbestopensource.com

On Tue, May 25, 2010 at 4:29 PM, findbestopensource findbestopensou...@gmail.com wrote:

To retrieve all documents, you need to use the match-all query *:* as the query/filter.

Regards
Aditya
www.findbestopensource.com

On Tue, May 25, 2010 at 4:14 PM, Rakhi Khatwani rkhatw...@gmail.com wrote:

Hi,

Is there any way to get all the fields (irrespective of whether they contain a value or null) of a SolrDocument? Or is there any way to get all the fields of the schema.xml at the URL (http://localhost:8983/solr/core0/)?

Regards,
Raakhi
Re: Using solrJ to get all fields in a particular schema/index
If a field doesn't have a value, you will get null when retrieving it; how could you expect a value for a field that was never provided? You have two options, choose either one:

1. If the field value comes back null, display a proper error / user-defined message, i.e. handle it in your application.
2. Add a dummy value, say NO_VALUE, to any title field that doesn't have a value (see the schema sketch after the quoted message).

Regards
Aditya
www.findbestopensource.com

On Tue, May 25, 2010 at 5:20 PM, Rakhi Khatwani rkhatw...@gmail.com wrote:

Hi Aditya,

I can retrieve all documents, but cannot retrieve all the fields of a document (if a field does not have any value). For example, I get a list of documents; some of them have a value for the title field, and others might not. In any case I need to get an entry for title in getFieldNames(). How do I go about that?

Regards,
Raakhi

On Tue, May 25, 2010 at 5:07 PM, findbestopensource findbestopensou...@gmail.com wrote:

Resending, as there was a typo. To retrieve all documents, you need to use the query/filter *:* .

Regards
Aditya
www.findbestopensource.com

On Tue, May 25, 2010 at 4:29 PM, findbestopensource findbestopensou...@gmail.com wrote:

To retrieve all documents, you need to use the match-all query *:* as the query/filter.

Regards
Aditya
www.findbestopensource.com

On Tue, May 25, 2010 at 4:14 PM, Rakhi Khatwani rkhatw...@gmail.com wrote:

Hi,

Is there any way to get all the fields (irrespective of whether they contain a value or null) of a SolrDocument? Or is there any way to get all the fields of the schema.xml at the URL (http://localhost:8983/solr/core0/)?

Regards,
Raakhi
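Option 2 can also be expressed in schema.xml: Solr fields accept a default attribute, so documents indexed without a title still carry a sentinel value. The field name and type here are assumptions:

    <field name="title" type="string" indexed="true" stored="true" default="NO_VALUE"/>

With this in place, getFieldNames() on a returned document will always include title.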
Re: Personalized Search
Hi Rih,

Are you going to include one or two common fields ('bought', 'like') per member/visitor, or a unique field per member/visitor? If it's one or two common fields, there will be no impact on performance. If you want a unique field per user, you should consider a multi-valued field instead; otherwise you will certainly hit a wall as the number of fields grows with your user base.

Regards
Aditya
www.findbestopensource.com

On Thu, May 20, 2010 at 12:13 PM, Rih tanrihae...@gmail.com wrote:

Has anybody done personalized search with Solr? I'm thinking of including fields such as bought or like per member/visitor via dynamic fields in a product search schema. Another option is to have a multi-value field that can contain user IDs. What are the possible performance issues with these setups?

Looking forward to your ideas.

Rih
Re: Moving from Lucene to Solr?
Hi Peter,

You would pick Lucene:

- to have more control
- when you cannot depend on any web server
- to use TermVector, TermDocs etc. directly
- to easily extend it with your own Analyzer

You would pick Solr:

- to index and search documents with very little code
- because Solr is a standalone app that takes care of most of the housekeeping, like optimizing and warming up the reader
- because Solr can be scaled out to multiple nodes
- to use faceting

If you are developing your client in Java and want to use Solr, I would advise using SolrJ, as it is easy and you don't need to care about the HTTP plumbing. I use Solr via SolrJ in my project www.findbestopensource.com. (A minimal SolrJ sketch follows the quoted message.)

Regards
Aditya
www.findbestopensource.com

On Wed, May 19, 2010 at 4:08 PM, Peter Karich peat...@yahoo.de wrote:

Hi all,

While asking a question on Stack Overflow [1], some other questions appeared:

- Is SolrJ a recommended way to access Solr, or should I prefer the HTTP interface?
- How can I (j)unit-test Solr? (e.g. create + delete an index via a Java call)
- Is Lucene faster than Solr? ... do you have experiences, preferably with the same index?

The background is an application which uses Lucene at the moment, but I badly need the faceting feature of Solr and I don't want to implement it in Lucene myself.

Regards,
Peter.

[1] http://stackoverflow.com/questions/2856427/situations-to-prefer-apache-lucene-over-solr
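A minimal SolrJ sketch (1.4-era API; the URL and field names are assumptions): index one document and query it back without writing any HTTP code by hand:

    import org.apache.solr.client.solrj.SolrQuery;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.client.solrj.response.QueryResponse;
    import org.apache.solr.common.SolrInputDocument;

    public class SolrJExample {
        public static void main(String[] args) throws Exception {
            CommonsHttpSolrServer server =
                new CommonsHttpSolrServer("http://localhost:8983/solr");

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "1");
            doc.addField("name", "hello solr");
            server.add(doc);
            server.commit();

            QueryResponse rsp = server.query(new SolrQuery("name:hello"));
            System.out.println(rsp.getResults().getNumFound() + " hits");
        }
    }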
Re: Solr Deployment Question
Please explain how you have handled two indexes in a single JVM; is it multicore? To identify memory consumption, calculate the used memory before and after loading the indexes; more generally, calculate used memory before and after any checkpoint you want to analyse. The difference gives you the actual memory consumed (a small sketch follows the quoted message). There is no reliable formula: the heap needed depends on the number of unique terms, stored fields, cache sizes and so on, so measuring is the practical approach.

Regards
Aditya
http://www.findbestopensource.com

On Fri, May 14, 2010 at 11:14 AM, Maduranga Kannangara mkannang...@infomedia.com.au wrote:

But even when we used a single index, we were running out of memory. What do you mean by "active"? No queries run on the masters; only one index is being processed/optimized. Also, if I may add to my question: how can I find the amount of memory that an index would use, theoretically? i.e., is there a formula etc.?

Thanks
Madu

-----Original Message-----
From: findbestopensource [mailto:findbestopensou...@gmail.com]
Sent: Friday, 14 May 2010 3:34 PM
To: solr-user@lucene.apache.org
Subject: Re: Solr Deployment Question

You may use one index at a time, but both indexes are active and have loaded all their terms into memory. Memory consumption will certainly be higher.

Regards
Aditya
http://www.findbestopensource.com

On Fri, May 14, 2010 at 10:28 AM, Maduranga Kannangara mkannang...@infomedia.com.au wrote:

Hi,

We use separate JVMs for indexing and querying (client applications query only the slaves, while the master does only indexing). Recently we moved two master indexes into a single JVM. Our memory allocation for the indexes was 512 MB and 1 GB. Once we moved both indexes to a single VM, we thought it would still index using 1 GB, as we use only one index at a time. But to our surprise it needed more than that (1.2 GB), even though only one index was used at a time. Can I know why, or how to find out why?

Solr 1.4
Java 1.6.0_20
We use a VPS for deployment.

Thanks in advance
Madu
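A minimal sketch of the checkpoint measurement (plain JDK; run it inside the JVM that hosts your cores). System.gc() is only a hint, so treat the numbers as approximate:

    public class MemoryCheckpoint {
        static long usedMemory() {
            Runtime rt = Runtime.getRuntime();
            return rt.totalMemory() - rt.freeMemory();
        }

        public static void main(String[] args) throws Exception {
            System.gc();
            long before = usedMemory();
            // ... checkpoint under test: load a core, run warm-up queries, etc. ...
            System.gc();
            long after = usedMemory();
            System.out.println("Approx. bytes consumed: " + (after - before));
        }
    }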
Re: Solr Deployment Question
You may use one index at a time, but both indexes are active and have loaded all their terms into memory, so memory consumption will certainly be higher.

Regards
Aditya
http://www.findbestopensource.com

On Fri, May 14, 2010 at 10:28 AM, Maduranga Kannangara mkannang...@infomedia.com.au wrote:

Hi,

We use separate JVMs for indexing and querying (client applications query only the slaves, while the master does only indexing). Recently we moved two master indexes into a single JVM. Our memory allocation for the indexes was 512 MB and 1 GB. Once we moved both indexes to a single VM, we thought it would still index using 1 GB, as we use only one index at a time. But to our surprise it needed more than that (1.2 GB), even though only one index was used at a time. Can I know why, or how to find out why?

Solr 1.4
Java 1.6.0_20
We use a VPS for deployment.

Thanks in advance
Madu
Re: multi-valued associated fields
Hello Eric,

Certainly it is possible. I would strongly advise adding a field that identifies the record type (e.g. RECORD_TYPE:CAR or RECORD_TYPE:PROPERTY), so the differently-shaped documents can be told apart and filtered. As for your question about tag filters: you achieve that drill-down with faceting queries (a sketch follows the quoted message).

Regards
Aditya
www.findbestopensource.com

On Wed, May 12, 2010 at 12:29 PM, Eric Grobler impalah...@googlemail.com wrote:

Hello Solr community,

We are considering Solr for searching content from various partners with wildly different content. Is it possible or practical to work with multi-valued associated fields like these?

Make:Audi, Model:A4, Color:Blue, Year:1998, KM:20, Extras:GPS
Type:Flat, Rooms:2, Period:6 months
Make:Toshiba, Model:Tecra, RAM:4GB, Extras:BlueRay;Lock
Breed:Siamese, Age:9 weeks

and do:
- searching on individual keys
- range queries within multi-valued fields
- faceting

I suppose an alternative would be to create unnamed fields like range1, range2, range3, with a descriptor field like Year, KM, EngineSize for a car document and Rooms for a property document, for example.

In general I was also wondering how Solr developers implement websites that use tag filters. For example, a user clicks on Hard drives, then gets the tags External and Internal, then clicks on External and gets usb, firewire etc.

Any suggestions and feedback would be greatly appreciated.

Regards
Eric
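A sketch of facet drill-down in SolrJ (1.4-era API; the field name tag, the values and the initialized server variable are assumptions). Each click adds a filter query, and the facet counts over the remaining documents become the next level of tags:

    SolrQuery query = new SolrQuery("hard drive");
    query.addFilterQuery("tag:external");          // tag the user already clicked
    query.setFacet(true);
    query.addFacetField("tag");                    // counts for the next drill-down level
    QueryResponse rsp = server.query(query);

    for (FacetField.Count c : rsp.getFacetField("tag").getValues()) {
        System.out.println(c.getName() + " (" + c.getCount() + ")");   // e.g. usb (12)
    }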
Re: Solr 1.4 Enterprise Search Server book examples
I downloaded the 5883_Code.zip file, but I am also not able to extract the complete contents.

Regards
Aditya
www.findbestopensource.com

On Tue, Apr 27, 2010 at 12:45 AM, Johan Cwiklinski johan.cwiklin...@ajlsm.com wrote:

Hello,

On 26/04/2010 20:53, findbestopensource wrote:

I am able to successfully download the code. It is 360 MB and took a lot of time to download.

I'm also able to download the file, but not to extract many of the files it contains (I can list them, but an error occurs on extraction). Were you able to extract the ZIP archive you downloaded?

https://www.packtpub.com/solr-1-4-enterprise-search-server/book
Select the "download the code" link and provide your email id; the download link will be sent via email.

Regards
Aditya
www.findbestopensource.com

On Mon, Apr 26, 2010 at 8:34 PM, Abdelhamid ABID aeh.a...@gmail.com wrote:

Hi,

I'm also interested in getting those examples. Would someone share them?

On 4/26/10, markus.rietz...@rzf.fin-nrw.de markus.rietz...@rzf.fin-nrw.de wrote:

I have sent you a private mail.
markus

-----Original Message-----
From: Johan Cwiklinski [mailto:johan.cwiklin...@ajlsm.com]
Sent: Monday, 26 April 2010 10:58
To: solr-user@lucene.apache.org
Subject: Solr 1.4 Enterprise Search Server book examples

Hello,

We've recently acquired the Solr 1.4 Enterprise Search Server book. I've tried to download the example ZIP file from the editor's website, but the file is actually corrupted and I cannot unzip it :( Could someone tell me if I can get these examples from another location? I sent a message last week to the editor reporting the issue, but it is not yet fixed, and I'd really like to take a look at the example code and make some tests.

Regards,
--
Johan Cwiklinski

--
Abdelhamid ABID
Software Engineer - J2EE / WEB

--
Johan Cwiklinski
Re: hybrid approach to using cloud servers for Solr/Lucene
Hello Dennis,

On "if the load goes up, then queries are sent to the cloud at a certain point": my advice is to do load balancing between local and cloud. Your local machine seems capable, as it is a dedicated host. Another option is to do the indexing locally and sync the index with the cloud, using the cloud only for search.

Hope it helps.

Regards
Aditya
www.findbestopensource.com

On Mon, Apr 26, 2010 at 7:47 AM, Dennis Gearon gear...@sbcglobal.net wrote:

I'm working on an app that could grow much faster and bigger than I could scale local resources for, at least on certain dates and for other reasons. So I'd like to run a local machine at a dedicated host, or even a virtual machine at a host. If the load goes up, then queries are sent to the cloud at a certain point. Is this practical? Does anyone have experience with this? This is obviously a search-engine app based on Solr/Lucene, if someone is wondering.

Dennis Gearon

Signature Warning
EARTH has a Right To Life, otherwise we all die.
Read 'Hot, Flat, and Crowded'
Laugh at http://www.yert.com/film.php
Re: Best Open Source
Thank you Dave and Michael for your feedback. We are currently in beta and will fix these issues soon.

Regards
Aditya
www.findbestopensource.com

On Tue, Apr 20, 2010 at 3:01 PM, Michael Kuhlmann michael.kuhlm...@zalando.de wrote:

Nice site. Really!

In addition to Dave: how do I search with tags enabled? If I search for Blog, I can see that there's one blog software written in Java. When I click on the Java tag, my search is discarded and I get all Java software; when I do my search again, the tag filter is lost. It seems to be impossible to combine tag filters with search.

-Michael

On 20.04.2010 11:00, solai ganesh wrote:

Hello all,

We have launched a new site hosting the best open source products and libraries across all categories. This site is powered by Solr search. There are many open source products available in every category, and it is sometimes difficult to identify which is the best; we identify the best. As open source users, you might be using many open source products and libraries. It would be great if you could help us identify the best.

http://www.findbestopensource.com/

Regards
Aditya