Re: performance jetty (jetty.xml)
ok, thanks! ;)

2011/10/19 Otis Gospodnetic otis_gospodne...@yahoo.com:

Gastone, those numbers are probably OK. Let us know if you have any actual problems with Solr 3.4. Oh, and please use the solr-user mailing list instead. Otis -- Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/

From: Gastone Penzo gastone.pe...@gmail.com To: solr-user@lucene.apache.org; d...@lucene.apache.org Sent: Tuesday, October 18, 2011 10:03 AM Subject: performance jetty (jetty.xml)

Hi, I just changed my Solr installation from 1.4 to 3.4, and I notice that the Jetty configuration file (jetty.xml) has also changed: the default thread count is higher, the thread pool is larger, and other default values are higher. Is this normal? What values would be correct for me? I have a dedicated machine with 2 Solr instances; the machine has 8 GB of RAM and 8 CPUs, and I make about 200,000-250,000 calls to Solr a day. Can someone help me?
- thread counts (min, max, and low)
- corePoolSize and maximumPoolSize

-- Gastone Penzo
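For reference, the thread pool in the Jetty 6 jetty.xml shipped with Solr's example is configured roughly as below; the numbers are illustrative placeholders to show where the min/max/low settings live, not tuned recommendations for this workload:

```xml
<!-- jetty.xml thread pool section (Jetty 6, as bundled with Solr);
     values below are examples only -->
<Set name="ThreadPool">
  <New class="org.mortbay.thread.QueuedThreadPool">
    <Set name="minThreads">10</Set>
    <Set name="lowThreads">50</Set>
    <Set name="maxThreads">10000</Set>
  </New>
</Set>
```

At roughly 250,000 requests/day (a few requests per second on average), the defaults are unlikely to be a bottleneck; the pool only grows to what is actually needed.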
Re: add thumbnail image for search result
Hadi, I do not think Solr or SolrJ does this. Are your documents HTML documents? I would look in the crawler resources, but I note that rendering is a rather server-unfriendly task and it bears some security risk if the documents are not fully trusted. In i2geo.net, we finally gave up on automated rendering and allowed the users to upload a snapshot; this gives thumbnails that focus on the relevant things instead of a global picture of the initial situation (which, with learning resources, is often close to a blank page).

paul

On 19 Oct 2011, at 07:53, hadi wrote:

I want to know how I can add a thumbnail image for my files when I am indexing files with SolrJ? Thanks

--
View this message in context: http://lucene.472066.n3.nabble.com/add-thumnail-image-for-search-result-tp3433440p3433440.html
Sent from the Solr - User mailing list archive at Nabble.com.
Re: score based on unique words matching???
Here's my problem:

field1 (text) - subject
q=david bowie changes

Problem: if a record mentions "david bowie" a lot, it beats out something more relevant (more unique matches)...

A. (now appearing: david bowie at the cineplex, 7pm david bowie goes on stage, then mr. bowie will sign autographs)
B. song: david bowie - changes

(A) ends up more relevant because of the frequency of words in it... not cool... I want the number of matching words to trump density/weight.

You need to disable the term frequency (tf) factor. I am not sure a plain omitTf option is available, but omitTermFreqAndPositions exists. If you mark your field with omitTermFreqAndPositions=true you will obtain what you want. But phrase queries won't work with this.
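A minimal sketch of such a field definition in schema.xml; the field and type names here are made up for illustration:

```xml
<!-- omitTermFreqAndPositions=true disables tf (and positions, so
     phrase queries on this field will no longer work) -->
<field name="subject" type="text" indexed="true" stored="true"
       omitTermFreqAndPositions="true"/>
```

After changing this attribute, the field needs to be re-indexed for the new setting to take effect.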
Re: IndexBasedSpellChecker on multiple fields
Hi James! Terrific suggestion, thanks a lot!!! And sorry for the delay (due to my timezone ;) ). I'll let you know how things go; thanks once again and have a nice day!

Simo

http://people.apache.org/~simonetripodi/ http://simonetripodi.livejournal.com/ http://twitter.com/simonetripodi http://www.99soft.org/

On Tue, Oct 18, 2011 at 5:16 PM, Dyer, James james.d...@ingrambook.com wrote:

Simone, you can set up a master dictionary, but with a few caveats. What you'll need to do is copyField all of the fields you want to include in your master dictionary into one field, and base your IndexBasedSpellChecker dictionary on that. In addition, I would recommend you use the collate feature and set spellcheck.maxCollationTries to something greater than zero (5-10 is usually good). Otherwise, you will probably get a lot of ridiculous suggestions from it trying to correct words from one field with values from another. See http://wiki.apache.org/solr/SpellCheckComponent#spellcheck.collate for more information.

There is still a big problem with this approach, however. Unless you set onlyMorePopular=true, Solr will never suggest a correction for a word that exists in the dictionary. By creating a huge master dictionary, you increase the chances that Solr will assume your users' misspelled words are in fact correct. One way to work around this is, instead of blindly using copyField, to hand-pick a subset of your terms for the master field on which you base your dictionary. Another workaround is to use onlyMorePopular, although this has its own problems. See the discussion for SOLR-2585 (https://issues.apache.org/jira/browse/SOLR-2585), which aims to solve these problems.
James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311

-----Original Message----- From: simone.trip...@gmail.com [mailto:simone.trip...@gmail.com] On Behalf Of Simone Tripodi Sent: Tuesday, October 18, 2011 7:06 AM To: solr-user@lucene.apache.org Subject: IndexBasedSpellChecker on multiple fields

Hi all, I need to configure the IndexBasedSpellChecker to use more than just one field as a spelling dictionary; is that possible to achieve? In the meanwhile I configured two spellcheckers and let users switch from one checker to the other via params on the GET request, but it looks like people are not particularly happy about it... The main problem is that the fields I need to spellcheck contain different information; I mean, the intersection between the two sets could be empty. Many thanks in advance, all the best!

Simo

http://people.apache.org/~simonetripodi/ http://simonetripodi.livejournal.com/ http://twitter.com/simonetripodi http://www.99soft.org/
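The copyField-based master dictionary described above could be sketched like this; the field names ("spell", "title", "author") and the analysis type are hypothetical placeholders:

```xml
<!-- schema.xml: funnel the source fields into one dictionary field -->
<field name="spell" type="textSpell" indexed="true" stored="false"
       multiValued="true"/>
<copyField source="title"  dest="spell"/>
<copyField source="author" dest="spell"/>

<!-- solrconfig.xml: base the IndexBasedSpellChecker on that field -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">spell</str>
    <str name="spellcheckIndexDir">./spellchecker</str>
  </lst>
</searchComponent>
```

Per the advice above, requests would then also pass spellcheck.collate=true and spellcheck.maxCollationTries=5 (or similar) to filter out cross-field nonsense suggestions.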
Re: Dismax boost + payload boost
Hello Milan,

You may also be interested in the following article: "Using Payloads with DisMaxQParser in SOLR", http://digitalpebble.blogspot.com/2010/08/using-payloads-with-dismaxqparser-in.html

I have implemented something close to what is explained in this article and I am now checking in depth whether it works as I expect. I have some problems with the bq parameter when generating several bq statements dynamically with setParam(bq, arrayOfValues).

Best

On Wed, Oct 19, 2011 at 12:02 AM, Milan Dobrota mi...@milandobrota.com wrote:

Is it possible to combine dismax boost (query time) and payload boost (index time)? I've done something very similar to this post http://sujitpal.blogspot.com/2011/01/payloads-with-solr.html but it seems that query-time boosts get ignored.

-- Jean-Claude Dauphin jc.daup...@gmail.com jc.daup...@afus.unesco.org http://kenai.com/projects/j-isis/ http://www.unesco.org/isis/ http://www.unesco.org/idams/ http://www.greenstone.org
Dismax and phrases
Hello, I've inherited a Solr/Lucene project which I continue to develop. This particular Solr (1.4.1) uses dismax for the queries, but I am getting some results that I do not understand. Mainly, when I search for two terms I get some results; however, when I put quotes around the two terms I get a lot more results, which goes against my understanding of what should happen, i.e. a smaller result set. Where should I start digging for the answer? solrconfig.xml or some other place?

Best regards, Lauri Hyttinen
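One way to start digging is to compare how the two queries are actually parsed, with debugging turned on; the handler path and terms below are placeholders for whatever your config uses:

```
/select?q=term1 term2&defType=dismax&debugQuery=on
/select?q="term1 term2"&defType=dismax&debugQuery=on
```

The parsedquery section of the debug output shows exactly which clauses each request produced; with dismax, the mm (minimum-should-match) setting and any pf/qs/ps parameters in solrconfig.xml are the usual places where quoted and unquoted behavior diverge.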
Optimization /Commit memory
Do we require 2 or 3 times the OS RAM or hard disk space while performing a commit, an optimize, or both? What are the requirements in terms of RAM and hard disk size for commit and optimize?

Regards Sujatha
Re: solr/lucene and its database (a silly question)
Hello Alireza, thank you for the link again ;-) Cheers Loren
Re: solr/lucene and its database (a silly question)
Hi Robert, many thanks to you as well; your short descriptions/explanations of my questions were again really helpful. Cheers, have a nice day. Loren
Re: Solr MultiValue Fields and adding values
I was hoping that wasn't going to be the case... I ended up querying for all unique IDs in the DB, then querying for each unique ID to get all its names, and inserting them that way... Seems a lot slower than it really should be in theory... Thanks.

--Tiernan

On 18/10/2011 23:20, Otis Gospodnetic wrote:

Hi, you'll need to construct the whole document and index it as such. You can't append values to document fields. Otis -- Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/

From: Tiernan OToole lsmart...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, October 18, 2011 11:41 AM Subject: Solr MultiValue Fields and adding values

Good morning. I asked this question on StackOverflow, but thought this group may be able to help... the question is available on SO here: http://bit.ly/r6MAWU Here goes:

I am building a search engine and have a not-so-unique ID for a lot of different names... So, for example, there could be an ID of B0051QVF7A which would have multiple names like Kindle, Amazon Kindle, Amazon Kindle 3G, Kindle Ebook Reader, New Kindle, etc.

The problem, and the question I have, is that I am trying to enter this data from a DB of roughly 11 million rows, each read one at a time, so I don't have all the names for each ID up front. I am adding new documents to the list each time. What I am trying to find out is: how do I add names to an existing document? If I am reading the documentation correctly, it seems to overwrite the whole document, not add extra info to the field... I just want to add an extra name to the document's multivalued field...

I know this could cause some weird and wonderful issues if a name is removed (in the example above, New Kindle could be removed when a newer Kindle gets released), but I am thinking of recreating the index every now and again to clear out issues like that (once a month or so; it currently takes about 45 min to create the index).

So, how do you add a value to a multivalued field in Solr for an existing document? Thanks in advance. --Tiernan
Re: Solr scraping: Nutch and other alternatives.
Hello Marco, Markus and Óscar. Thank you very much for your answers. What you suggest, Óscar, sounds very interesting; I mean the alternative that covers data mining with any popular search engine. Do you know of any tutorial or book that can teach me the first steps? Bye!
Re: Find Documents with field = maxValue
What I'm looking for is to do everything in a single shot in Solr. I'm not even sure if it's possible or not. Finding the max value and then running another query is NOT my ideal solution. Thanks everybody

On Tue, Oct 18, 2011 at 6:28 PM, Sujit Pal sujit@comcast.net wrote:

Hi Alireza, would this work? Sort the results by age desc, then loop through the results as long as age == age[0]. -sujit

On Tue, 2011-10-18 at 15:23 -0700, Otis Gospodnetic wrote:

Hi, are you just looking for age:[target age]? This will return all documents/records where the age field is equal to the target age. But maybe you want age:[0 TO target age here], which will include people aged from 0 to the target age. Otis -- Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/

From: Alireza Salimi alireza.sal...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, October 18, 2011 10:15 AM Subject: Re: Find Documents with field = maxValue

Hi Ahmet, thanks for your reply, but I want ALL documents with age = max_age.

On Tue, Oct 18, 2011 at 9:59 AM, Ahmet Arslan iori...@yahoo.com wrote:

--- On Tue, 10/18/11, Alireza Salimi alireza.sal...@gmail.com wrote:

It might be a naive question. Assume we have a list of Documents, each containing the information of a person, and there is a numeric field named 'age'. How can we find those Documents whose *age* field is *max(age)*, in one query?

Maybe http://wiki.apache.org/solr/StatsComponent? Or sort by age: q=*:*&start=0&rows=1&sort=age desc

-- Alireza Salimi Java EE Developer
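If a two-pass approach turns out to be acceptable after all, it can be sketched as two plain requests (the field name "age" is taken from the thread; everything else is illustrative):

```
# 1) find the max: one row, sorted descending, only the age field returned
/select?q=*:*&rows=1&fl=age&sort=age desc

# 2) fetch all documents having that value (substitute the value read in step 1)
/select?q=age:42
```

The StatsComponent (stats=true&stats.field=age) can also return the max in a single request, but it still takes a second query to retrieve the matching documents, so there is no obvious single-shot solution here.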
RE: Filter Question
Thanks Steven, that's just the kind of feedback I needed. And thanks also to Jan. I'll do a little clean-up on my filter and submit it...

-Monica

-----Original Message----- From: Steven A Rowe [mailto:sar...@syr.edu] Sent: Friday, October 14, 2011 3:18 AM To: solr-user@lucene.apache.org Subject: RE: Filter Question

Hi Monica, AFAIK there is nothing like the filter you've described, and I believe it would be generally useful. Maybe it could be called StopTermTypesFilter? (Plural on Types to signify that more than one type of term can be stopped by a single instance of the filter.) Such a filter should have an enablePositionIncrements option like StopFilter.

Steve

-----Original Message----- From: Monica Skidmore [mailto:monica.skidm...@careerbuilder.com] Sent: Thursday, October 13, 2011 1:04 PM To: solr-user@lucene.apache.org; Otis Gospodnetic Subject: RE: Filter Question

Thanks, Otis - yes, this is different from the synonyms filter, which we also use. For example, if you wanted all tokens that were marked 'lemma' to be removed, you could specify that, and all tokens with any type other than 'lemma' would still be returned. You could also choose to remove all tokens of types 'lemma' and 'word' (although that would probably be a bad idea!), etc.

Normally, if you don't want a token type, you just don't include/run the filter that produces that type. However, we have a third-party filter that produces multiple types, and this allows us to select a subset of those types.

I did see the HowToContribute wiki, but I'm relatively new to Solr, and I wanted to see if this looked familiar to someone before I started down the contribution path. Thanks again!

-Monica

-----Original Message----- From: Otis Gospodnetic [mailto:otis_gospodne...@yahoo.com] Sent: Thursday, October 13, 2011 12:37 PM To: solr-user@lucene.apache.org Subject: Re: Filter Question

Monica, this is different from Solr's synonyms filter with different synonyms files, one for index-time and the other for query-time expansion, right? If so, maybe you can describe what your filter does differently and then follow http://wiki.apache.org/solr/HowToContribute - thanks in advance! :)

Otis -- Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/

From: Monica Skidmore monica.skidm...@careerbuilder.com To: solr-user@lucene.apache.org Sent: Thursday, October 13, 2011 11:37 AM Subject: Filter Question

Our Solr implementation includes a third-party filter that adds additional, multiple term types to the token list (beyond word, etc.). Most of the time this is exactly what we want, but we felt we could improve our search results by having different tokens on the index and query side. Since the filter in question was third-party and we didn't have access to source code, we wrote our own filter that will take out tokens based on their term attribute type. We didn't see another filter available that does this - did we overlook it? And if not, is this something that would be of value if we contribute it back to the Solr community?

Monica Skidmore
Painfully slow indexing
Hi guys, I have set up a Solr instance and, upon attempting to index documents, the whole process is painfully slow. I will try to put as much info as I can in this mail; please feel free to ask me anything else that might be required.

I am sending documents in batches not exceeding 2,000. The size of each batch varies but is usually around 10-15 MiB. My indexing script tells me that Solr took T seconds to add N documents of size S. For the same data, the Solr log add QTime is QT. Some sample data:

N         | S (bytes)   | T     | QT (ms)
390 docs  | 3,478,804   | 14.5s | 2297
852 docs  | 6,039,535   | 25.3s | 4237
1345 docs | 11,147,512  | 47s   | 8543
1147 docs | 9,457,717   | 44s   | 2297
1096 docs | 13,058,204  | 54.3s | 8782

The time T includes converting an array of Hash objects into XML, POSTing it to Solr, and getting the response acknowledged from Solr. Clearly there is a huge difference between T and QT; after a lot of effort, I have no clue why these times do not match.

The server has 16 cores and 48 GiB RAM. JVM options are -Xms5000M -Xmx5000M -XX:+UseParNewGC. I believe my indexing is getting slow. The relevant portions of my config are as follows. On a related note, every document has one dynamic field. At this rate it takes me ~30 hrs to do a full index of my database. I would really appreciate the kindness of the community in helping me get this indexing faster.
<indexDefaults>
  <useCompoundFile>false</useCompoundFile>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxMergeCount">10</int>
    <int name="maxThreadCount">10</int>
  </mergeScheduler>
  <ramBufferSizeMB>2048</ramBufferSizeMB>
  <maxMergeDocs>2147483647</maxMergeDocs>
  <maxFieldLength>300</maxFieldLength>
  <writeLockTimeout>1000</writeLockTimeout>
  <maxBufferedDocs>5</maxBufferedDocs>
  <termIndexInterval>256</termIndexInterval>
  <mergeFactor>10</mergeFactor>
  <useCompoundFile>false</useCompoundFile>
  <!--
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnceExplicit">19</int>
    <int name="segmentsPerTier">9</int>
  </mergePolicy>
  -->
</indexDefaults>
<mainIndex>
  <unlockOnStartup>true</unlockOnStartup>
  <reopenReaders>true</reopenReaders>
  <deletionPolicy class="solr.SolrDeletionPolicy">
    <str name="maxCommitsToKeep">1</str>
    <str name="maxOptimizedCommitsToKeep">0</str>
  </deletionPolicy>
  <infoStream file="INFOSTREAM.txt">false</infoStream>
</mainIndex>
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10</maxDocs>
  </autoCommit>
</updateHandler>

*Pranav Prakash* "temet nosce" Twitter http://twitter.com/pranavprakash | Blog http://blog.myblive.com | Google http://www.google.com/profiles/pranny
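One setting in the posted config stands out: autoCommit maxDocs is 10, which forces a commit (a comparatively expensive operation) roughly every 10 added documents, and maxBufferedDocs is only 5. A less aggressive commit policy might look like the sketch below; the numbers are illustrative starting points, not tuned recommendations:

```xml
<!-- commit by time and/or a much larger document count;
     10000 docs / 60 s are example values to tune for your load -->
<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxDocs>10000</maxDocs>
    <maxTime>60000</maxTime> <!-- milliseconds -->
  </autoCommit>
</updateHandler>
```

With fewer forced commits, the gap between client-side time T and the reported add QTime (which excludes commit and HTTP overhead) would also be expected to shrink.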
Merging Remote Solr Indexes?
Hi, I thought of a useful capability, if it doesn't already exist: is it possible to do an index merge between two remote Solr instances? To handle massive index-time scalability, wouldn't it be useful to have distributed indexes accepting local input, then merge them into one central index afterwards? Darren
RE: Solr MultiValue Fields and adding values
While Solr/Lucene can't support true document updates, there are 2 ways you might be able to work around this in your situation.

1. If you store all of the fields, you can write something that will read back everything already indexed for the document, append whatever data you want, then write it back. This will increase index size and possibly make indexing too slow. On the other hand, it might be more efficient than requiring the database to return everything in order.

2. You could store your data as multiple documents per ID (pick something else as your unique id), then use the grouping functionality to roll up on your unique ID whenever you query. This will mean changes to your application, probably a bigger index, and likely somewhat slower querying. But the performance losses might be slight, and this seems to me like it could be a good solution in your case. Perhaps it would mean you wouldn't have to entirely re-index each month or so. See http://wiki.apache.org/solr/FieldCollapsing for more information.

James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311

-----Original Message----- From: Tiernan OToole [mailto:lsmart...@gmail.com] Sent: Wednesday, October 19, 2011 5:11 AM To: solr-user@lucene.apache.org Cc: Otis Gospodnetic Subject: Re: Solr MultiValue Fields and adding values

I was hoping that wasn't going to be the case... I ended up querying for all unique IDs in the DB, then querying for each unique ID to get all its names, and inserting them that way... Seems a lot slower than it really should be in theory... Thanks. --Tiernan
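For workaround 2 above, a query using the grouping feature (available from Solr 3.3) might look like this; the field names (product_id as the shared not-so-unique ID, name as the per-document value) are hypothetical:

```
# each name indexed as its own document; roll up on the shared ID at query time
/select?q=kindle&group=true&group.field=product_id&group.limit=10
```

Each group in the response then carries up to group.limit documents sharing one product_id, which effectively reassembles the set of names without ever updating an existing document.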
Re: Solr scraping: Nutch and other alternatives.
Try this if you haven't used Python before: http://gun.io/blog/python-for-the-web/

Keep in mind that scraping well-known search engines is usually not in line with their ToS, so they will sooner or later block you, at the least. Be gentle and polite, and you might even make it work... ;)

On Wed, Oct 19, 2011 at 2:08 PM, Luis Cappa Banda luisca...@gmail.com wrote:

Do you know of any tutorial or book that can teach me the first steps?

-- Igor Milovanović http://about.me/igor.milovanovic http://umotvorine.com/
Re: Solr MultiValue Fields and adding values
Thanks for the comment. It sounds like too much of a change, in all fairness... I have actually made a tweak to my DB to allow multiple names, storing them off the main table; my query then only needs to hit the IDs, and then the second table to get the names. But I will keep the comments in mind and see how things go over the next while.

As a side note, if I were to go down the "get doc from Solr, modify, commit back to Solr" route, is it really that simple? Run a query on Solr, get the document, add the extra data, and insert it back into Solr?

Thanks.

--Tiernan

On 19/10/2011 15:26, Dyer, James wrote:

While Solr/Lucene can't support true document updates, there are 2 ways you might be able to work around this in your situation. ...
Re: Solr MultiValue Fields and adding values
That's what I thought too... we'll see what the speed difference actually is... running some tests now... Thanks for the info!

--Tiernan

On 19/10/2011 16:07, Dyer, James wrote:

Not that I am doing this with any of my indexes, but I'm pretty sure the "get doc from Solr, modify, commit back to Solr" approach really is that simple. Just be sure you are storing the exact raw data that came from your database (typically you would). The problem with this approach is that it could potentially be very slow if you're updating lots of documents.

James Dyer E-Commerce Systems Ingram Content Group (615) 213-4311

From: Tiernan OToole [mailto:lsmart...@gmail.com] Sent: Wednesday, October 19, 2011 10:01 AM To: solr-user@lucene.apache.org Cc: Dyer, James Subject: Re: Solr MultiValue Fields and adding values

Thanks for the comment. It sounds like too much of a change, in all fairness... I have actually made a tweak to my DB to allow multiple names, storing them off the main table; my query then only needs to hit the IDs, and then the second table to get the names. But I will keep the comments in mind and see how things go over the next while. As a side note, if I were to go down the "get doc from Solr, modify, commit back to Solr" route, is it really that simple? Run a query on Solr, get the document, add the extra data, and insert it back into Solr? Thanks. --Tiernan
stemEnglishPossessive and contractions
We utilize a comprehensive dictionary of English words, place names, surnames, male and female first names... you get the point. As such, the possessive forms of these words are recognized as 'misspelled'. I simply thought that turning on this option for the WordDelimiterFilterFactory would address my concerns; however, I also got an unintended consequence: contractions (isn't, wouldn't, shouldn't, he'll, we'll...) also seem to be affected. Is this intended behavior? When I read 'English possessive' I hear 'apostrophe s' and not 'apostrophe anything'. Is there something I'm missing here?
Re: Merging Remote Solr Indexes?
Hi Darren, http://search-lucene.com/?q=solr+merge&fc_project=Solr Check hit #1 Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: dar...@ontrenet.com dar...@ontrenet.com To: solr-user@lucene.apache.org Sent: Wednesday, October 19, 2011 10:04 AM Subject: Merging Remote Solr Indexes? Hi, I thought of a useful capability if it doesn't already exist. Is it possible to do an index merge between two remote Solr's? To handle massive index-time scalability, wouldn't it be useful to have distributed indexes accepting local input, then merge them into one central index after? Darren
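For readers landing here later, the merge the wiki describes is local-only. A sketch (Python; the core name and directories are invented, and, per Otis's later reply in this thread, the source index directories must live on the same machine as the merging Solr) of the parameters a CoreAdmin mergeindexes call takes:

```python
def merge_indexes_params(target_core, index_dirs):
    """Build CoreAdmin 'mergeindexes' parameters; the source index
    directories must be local to the Solr instance doing the merge."""
    params = [("action", "mergeindexes"), ("core", target_core)]
    params += [("indexDir", d) for d in index_dirs]  # one per source index
    return params

params = merge_indexes_params(
    "core0", ["/idx/core1/data/index", "/idx/core2/data/index"])
```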
java.lang.NoSuchMethodError: org.slf4j.spi.LocationAwareLogger.log
I'm working on upgrading to Solr 3.4.0 and am seeing this error in my tomcat log. I'm using the following slf4j jars: slf4j-api-1.6.1.jar slf4j-jdk14-1.6.1.jar Has anybody run into this? I can reproduce it by doing curl calls to the Solr ExtractingRequestHandler via /solr/update/extract. TIA - Tod
Re: stemEnglishPossessive and contractions
The word delimiter filter also does other things; it treats ' as punctuation by default. So it normally splits on ', except if it's 's (in that case it removes the 's completely if you use stemEnglishPossessive). There are a couple of approaches you can use: 1. You can keep WordDelimiterFilter with this option on, but disable splitting on ' by customizing its type table. In this case specify types=mycustomtypes.txt, and in that file specify ' to be treated as ALPHANUM or similar. See https://issues.apache.org/jira/browse/SOLR-2059 for some examples of this. I would only do this if you want WordDelimiterFilter for other purposes; if you just want to remove possessives and don't need WordDelimiterFilter's other features, see below. 2. You can instead use EnglishPossessiveFilterFactory, which only does this exact thing (removes 's) and nothing else. On Wed, Oct 19, 2011 at 5:30 PM, Herman Kiefus herm...@angieslist.com wrote: We utilize a comprehensive dictionary of English words, place names, surnames, male and female first names, ... you get the point. As such, the possessive plural forms of these words are recognized as 'misspelled'. I simply thought that 'turning on' this option for the WordDelimiterFactory would address my concerns; however, I also got an unintended consequence: Contractions (isn't, wouldn't, shouldn't, he'll, we'll...) also seem to be affected. Is this intended behavior? When I read 'English possessive' I hear 'apostrophe s' and not 'apostrophe anything'. Is there something I'm missing here? -- lucidimagination.com
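Option 2 looks roughly like this in schema.xml (a sketch; the fieldType name and the rest of the analyzer chain are invented, only the possessive filter line is the point):

```xml
<fieldType name="text_en_possessive" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- removes a trailing 's but leaves contractions like isn't intact -->
    <filter class="solr.EnglishPossessiveFilterFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```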
How to make UnInvertedField faster?
I was wondering if anyone has any ideas for making UnInvertedField.uninvert() faster, or other alternatives for generating facets quickly. The vast majority of the CPU time for our Solr instances is spent generating UnInvertedFields after each commit. Here's an example of one of our slower fields: [2011-10-19 17:46:01,055] INFO 125974 [pool-1-thread-1] - (SolrCore:440) - UnInverted multi-valued field {field=authorCS,memSize=38063628,tindexSize=422652,time=15610,phase1=15584,nTerms=1558514,bigTerms=0,termInstances=4510674,uses=0} That is from an index with approximately 8 million documents. After each commit, it takes on average about 90 seconds to uninvert all the fields that we facet on. Any ideas at all would be greatly appreciated. -Michael
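One standard mitigation (it does not make uninvert() itself faster, but it moves the ~90 seconds out of the live request path) is to warm the facets in a newSearcher listener, so the UnInvertedField is built during warmup before the searcher is exposed. A solrconfig.xml sketch; the field name is copied from the log above, the rest is illustrative:

```xml
<listener event="newSearcher" class="solr.QuerySenderListener">
  <arr name="queries">
    <!-- forces UnInvertedField construction for authorCS during warmup -->
    <lst>
      <str name="q">*:*</str>
      <str name="facet">true</str>
      <str name="facet.field">authorCS</str>
    </lst>
  </arr>
</listener>
```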
dataimport indexing fails: where are my log files ? ;-)
dumb question ... today I set up solr3.4/example, indexing to 8983 via post is working, so is search, but solr/dataimport reports: <str name="Total Rows Fetched">0</str> <str name="Total Documents Processed">0</str> <str name="Total Documents Skipped">0</str> <str name="Full Dump Started">2011-10-19 18:13:57</str> <str name="">Indexing failed. Rolled back all changes.</str> Google tells me to look at the exception logs to find out what's happening ... but, I can't find the logs! Where are they? example/logs is an empty directory.
Re: dataimport indexing fails: where are my log files ? ;-)
On 10/19/2011 12:42 PM, Fred Zimmerman wrote: dumb question ... today I set up solr3.4/example, indexing to 8983 via post is working, so is search, but solr/dataimport reports: <str name="Total Rows Fetched">0</str> <str name="Total Documents Processed">0</str> <str name="Total Documents Skipped">0</str> <str name="Full Dump Started">2011-10-19 18:13:57</str> <str name="">Indexing failed. Rolled back all changes.</str> Google tells me to look at the exception logs to find out what's happening ... but, I can't find the logs! Where are they? example/logs is an empty directory. I believe that if you are running the example Solr without any changes related to logging, that information will be dumped to stdout/stderr. If you are starting Solr as a daemon or a service, it may be going someplace you can't retrieve it. Start it directly from the commandline and/or alter your startup command to redirect stdout/stderr to files. I hope that's actually helpful! Thanks, Shawn
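An alternative to shell redirection: the example distribution routes logging through JDK logging (java.util.logging, via slf4j-jdk14), so it can be pointed at a file directly. A logging.properties sketch (the path and levels are assumptions), passed on the startup command with -Djava.util.logging.config.file=logging.properties:

```properties
# send JDK logging to both the console and a file
handlers = java.util.logging.ConsoleHandler, java.util.logging.FileHandler
.level = INFO
java.util.logging.FileHandler.pattern = logs/solr.log
java.util.logging.FileHandler.formatter = java.util.logging.SimpleFormatter
```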
Re: java.lang.NoSuchMethodError: org.slf4j.spi.LocationAwareLogger.log
Hi Tod, I had similar issue with slf4j, but it was NoClassDefFound. Do you have some other dependencies in your application that use some other version of slf4j? You can use mvn dependency:tree to get all dependencies in your application. Or maybe there's some other version already in your tomcat or application server. /Tim 2011/10/19 Tod listac...@gmail.com: I'm working on upgrading to Solr 3.4.0 and am seeing this error in my tomcat log. I'm using the following slf jars: slf4j-api-1.6.1.jar slf4j-jdk14-1.6.1.jar Has anybody run into this? I can reproduce it doing curl calls to the Solr ExtractingRequestHandler ala /solr/update/extract. TIA - Tod
RE: stemEnglishPossessive and contractions
Thanks Robert, exactly what I was looking for. -Original Message- From: Robert Muir [mailto:rcm...@gmail.com] Sent: Wednesday, October 19, 2011 1:15 PM To: solr-user@lucene.apache.org Subject: Re: stemEnglishPossessive and contractions The word delimiter filter also does other things, it treats ' as punctuation by default. So it normally splits on ', except if its 's (in this case it removes the 's completely if you use this stemEnglishPossessive). There are a couple approaches you can use: 1. you can keep worddelimiterfilter with this option on, but disabling splitting on ' by customize its type table. in this case specify types=mycustomtypes.txt, and in that file specify ' to be treated as ALPHANUM or similar. see https://issues.apache.org/jira/browse/SOLR-2059 for some examples of this. i would only do this if you want worddelimiterfilter for other purposes, if you just want to remove possessives and don't need worddelimiterfilter's other features, look below. 2. you can instead use EnglishPossessiveFilterFactory, which only does this exact thing (remove 's) and nothing else. On Wed, Oct 19, 2011 at 5:30 PM, Herman Kiefus herm...@angieslist.com wrote: We utilize a comprehensive dictionary of English words, place names, surnames, male and female first names, ... you get the point. As such, the possessive plural forms of these words are recognized as 'misspelled'. I simply thought that 'turning on' this option for the WordDelimiterFactory would address my concerns; however, I also got an unintended consequence: Contractions (isn't, wouldn't, shouldn't, he'll, we'll...) also seem to be affected. Is this intended behavior? When I read 'English possessive' I hear 'apostrophe s' and not 'apostrophe anything'. Is there something I'm missing here? -- lucidimagination.com
where is solr data import handler looking for my file?
Solr dataimport is reporting file not found when it looks for foo.xml. Where is it looking for /data? Is this a URL off the apache2/htdocs on the server, or is it a URL within example/solr/...? <entity name="page" processor="XPathEntityProcessor" stream="true" forEach="/mediawiki/page/" url="/data/foo.xml" transformer="RegexTransformer,DateFormatTransformer"/>
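In case it helps: with XPathEntityProcessor the url is resolved by the entity's data source, not by the servlet container, so if a FileDataSource is in play a basePath can pin the location down explicitly. A hedged data-config sketch; the paths here are invented:

```xml
<dataSource type="FileDataSource" encoding="UTF-8" basePath="/var/data/import/"/>
<!-- url is now resolved relative to basePath, not apache2/htdocs -->
<entity name="page" processor="XPathEntityProcessor" stream="true"
        forEach="/mediawiki/page/" url="foo.xml"
        transformer="RegexTransformer,DateFormatTransformer"/>
```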
Re: Merging Remote Solr Indexes?
Hi Otis, Yeah, I saw page, but it says for merging cores, which I presume must reside locally to the solr instance doing the merging? What I'm interested in doing is merging across solr instances running on different machines into a single solr running on another machine (programmatically). Is it still possible or did I misread the wiki? Thanks! Darren On 10/19/2011 11:57 AM, Otis Gospodnetic wrote: Hi Darren, http://search-lucene.com/?q=solr+mergefc_project=Solr Check hit #1 Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: dar...@ontrenet.comdar...@ontrenet.com To: solr-user@lucene.apache.org Sent: Wednesday, October 19, 2011 10:04 AM Subject: Merging Remote Solr Indexes? Hi, I thought of a useful capability if it doesn't already exist. Is it possible to do an index merge between two remote Solr's? To handle massive index-time scalability, wouldn't it be useful to have distributed indexes accepting local input, then merge them into one central index after? Darren
RE: how was developed solr admin page and the UI part?
I believe that if you have the Solr distribution, you have the source for the web UI already: it is just .jsp pages. They are inside the solr .war file. JRJ -Original Message- From: nagarjuna [mailto:nagarjuna.avul...@gmail.com] Sent: Wednesday, October 19, 2011 12:07 AM To: solr-user@lucene.apache.org Subject: how was developed solr admin page and the UI part? Hi everybody... i would like know how was the solr admin page and the total UI part developed i would like to download the source code of solr UI part can anybody send me the links please Thanx in advance -- View this message in context: http://lucene.472066.n3.nabble.com/how-was-developed-solr-admin-page-and-the-UI-part-tp3433345p3433345.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: OS Cache - Solr
200 instances of what? The Solr application with lucene, etc. per usual? Solr cores? ??? Either way, 200 seems to be very very very many: unusually so. Why so many? If you have 200 instances of Solr in a 20 GB JVM, that would only be 100MB per Solr instance. If you have 200 instances of Solr all accessing the same physical disk, the results are not likely to be satisfactory - the disk head will go nuts trying to handle all of the requests. JRJ -Original Message- From: Sujatha Arun [mailto:suja.a...@gmail.com] Sent: Wednesday, October 19, 2011 12:25 AM To: solr-user@lucene.apache.org; Otis Gospodnetic Subject: Re: OS Cache - Solr Thanks ,Otis, This is our Solr Cache Allocation.We have the same Cache allocation for all our *200+ instances* in the single Server.Is this too high? *Query Result Cache*:LRU Cache(maxSize=16384, initialSize=4096, autowarmCount=1024, ) *Document Cache *:LRU Cache(maxSize=16384, initialSize=16384) *Filter Cache* LRU Cache(maxSize=16384, initialSize=4096, autowarmCount=4096, ) Regards Sujatha On Wed, Oct 19, 2011 at 4:05 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Maybe your Solr Document cache is big and that's consuming a big part of that JVM heap? If you want to be able to run with a smaller heap, consider making your caches smaller. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: Sujatha Arun suja.a...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, October 18, 2011 12:53 AM Subject: Re: OS Cache - Solr Hello Jan, Thanks for your response and clarification. We are monitoring the JVM cache utilization and we are currently using about 18 GB of the 20 GB assigned to JVM. Out total index size being abt 14GB Regards Sujatha On Tue, Oct 18, 2011 at 1:19 AM, Jan Høydahl jan@cominvent.com wrote: Hi Sujatha, Are you sure you need 20Gb for Tomcat? Have you profiled using JConsole or similar? Try with 15Gb and see how it goes. 
The reason why this is beneficial is that you WANT your OS to have available memory for disk caching. If you have 17Gb free after starting Solr, your OS will be able to cache all index files in memory and you get very high search performance. With your current settings, there is only 12Gb free for both caching the index and for your MySql activities. Chances are that when you backup MySql, the cached part of your Solr index gets flushed from disk caches and need to be re-cached later. How to interpret memory stats vary between OSes, and seing 163Mb free may simply mean that your OS has used most RAM for various caches and paging, but will flush it once an application asks for more memory. Have you seen http://wiki.apache.org/solr/SolrPerformanceFactors ? You should also slim down your index maximally by setting stored=false and indexed=false wherever possible. I would also upgrade to a more current Solr version. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 17. okt. 2011, at 19:51, Sujatha Arun wrote: Hello I am trying to understand the OS cache utilization of Solr .Our server has several solr instances on a server .The total combined Index size of all instances is abt 14 Gb and the size of the maximum single Index is abt 2.5 GB . Our Server has Quad processor with 32 GB RAM .Out of which 20 GB has been assigned to JVM. We are running solr1.3 on tomcat 5.5 and Java 1.6 Our current Statistics indicate that solr uses 18-19 GB of 20 GB RAM assigned to JVM .However the Free physical seems to remain constant as below. Free physical memory = 163 Mb Total physical memory = 32,232 Mb, The server also serves as a backup server for Mysql where the application DB is backed up and restored .During this activity we see that lot of queries that nearly take even 10+ minutes to execute .But other wise maximum query time is less than 1-2 secs The physical memory that is free seems to be constant . 
Why is this constant and how this will be used between the Mysql backup and solr while backup activity is happening How much free physical memory should be available to OS given out stats.? Any pointers would be helpful. Regards Sujatha
RE: How to update document with solrj?
Solr does not have an update per se: you have to re-add the document. A document with the same value for the field defined as the uniqueKey will replace any existing document with that key (you do not have to query and explicitly delete it first). JRJ -Original Message- From: hadi [mailto:md.anb...@gmail.com] Sent: Wednesday, October 19, 2011 12:50 AM To: solr-user@lucene.apache.org Subject: How to update document with solrj? I have indexed some files that do not have any tag or description, and I want to add some fields without deleting them. How can I update or add info to my indexed files with solrj? My idea for this issue is to query the specific file, delete it, add some info and re-index it, but I think it is not a good idea
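The replace-on-re-add behavior JRJ describes can be modeled in a few lines (a toy in-memory sketch, not SolrJ; the field names are made up):

```python
class ToyIndex:
    """Mimics Solr's uniqueKey behavior: adding a document whose key
    already exists replaces the old document wholesale."""
    def __init__(self, key_field="id"):
        self.key_field = key_field
        self.docs = {}

    def add(self, doc):
        # overwrite, never merge: fields absent from the new doc are lost
        self.docs[doc[self.key_field]] = doc

idx = ToyIndex()
idx.add({"id": "1", "title": "report.pdf"})
idx.add({"id": "1", "title": "report.pdf", "tags": ["finance"]})
```

So the second add must carry every field you want kept, which is why the usual pattern is to rebuild the full document (from the original source, or by querying the stored fields first) and re-add it.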
RE: add thumnail image for search result
It won't do it for you automatically. I suppose you might create the thumbnail image beforehand, Base64 encode it, and add it as a stored, non-indexed, binary field (see schema: solr.BinaryField) when you index the document. JRJ -Original Message- From: hadi [mailto:md.anb...@gmail.com] Sent: Wednesday, October 19, 2011 12:54 AM To: solr-user@lucene.apache.org Subject: add thumnail image for search result I want to know how I can add a thumbnail image for my files when I am indexing files with solrj? Thanks
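The Base64 step is the easy part; a small sketch (Python here rather than SolrJ, with fake thumbnail bytes) of the round trip into and out of a stored binary field value:

```python
import base64

def thumbnail_field_value(image_bytes):
    """Base64-encode raw thumbnail bytes for storage in a stored,
    non-indexed binary field; decode on the way back out."""
    return base64.b64encode(image_bytes).decode("ascii")

raw = b"\x89PNG\r\n\x1a\n..."  # pretend thumbnail bytes
encoded = thumbnail_field_value(raw)
decoded = base64.b64decode(encoded)  # what the display layer would do
```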
RE: Optimization /Commit memory
Commit does not particularly spike disk or memory usage, unless you are adding a very large number of documents between commits. A commit can cause a need to merge indexes, which can increase disk space temporarily. An optimize is *likely* to merge indexes, which will usually increase disk space temporarily. How much disk space depends very much upon how big your index is in the first place. A factor of 2 to 3 times the sum of your peak index file sizes seems safe to me. Solr uses only modest amounts of memory in the JVM for this stuff. JRJ -Original Message- From: Sujatha Arun [mailto:suja.a...@gmail.com] Sent: Wednesday, October 19, 2011 4:04 AM To: solr-user@lucene.apache.org Subject: Optimization /Commit memory Do we require 2 or 3 times the OS RAM or hard disk space while performing commit or optimize, or both? What is the requirement in terms of RAM and HD size for commit and optimize? Regards Sujatha
Re: Merging Remote Solr Indexes?
Darren, No, that is not possible without one copying an index/shard to a single machine on which you would then merge indices as described on the Wiki. H, wouldn't it be nice to make use of existing replication code to make it possible to move shards around the cluster? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: Darren Govoni dar...@ontrenet.com To: solr-user@lucene.apache.org Sent: Wednesday, October 19, 2011 5:15 PM Subject: Re: Merging Remote Solr Indexes? Hi Otis, Yeah, I saw page, but it says for merging cores, which I presume must reside locally to the solr instance doing the merging? What I'm interested in doing is merging across solr instances running on different machines into a single solr running on another machine (programmatically). Is it still possible or did I misread the wiki? Thanks! Darren On 10/19/2011 11:57 AM, Otis Gospodnetic wrote: Hi Darren, http://search-lucene.com/?q=solr+mergefc_project=Solr Check hit #1 Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: dar...@ontrenet.comdar...@ontrenet.com To: solr-user@lucene.apache.org Sent: Wednesday, October 19, 2011 10:04 AM Subject: Merging Remote Solr Indexes? Hi, I thought of a useful capability if it doesn't already exist. Is it possible to do an index merge between two remote Solr's? To handle massive index-time scalability, wouldn't it be useful to have distributed indexes accepting local input, then merge them into one central index after? Darren
Re: Merging Remote Solr Indexes?
Actually, yeah. If you think about it a remote merge is like the inverse of replication. Where replication is a one to many away from an index, the inverse would be merging many back to the one. Sorta like a recall. I think it would be a great analog to replication. On 10/19/2011 06:18 PM, Otis Gospodnetic wrote: Darren, No, that is not possible without one copying an index/shard to a single machine on which you would then merge indices as described on the Wiki. H, wouldn't it be nice to make use of existing replication code to make it possible to move shards around the cluster? Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: Darren Govonidar...@ontrenet.com To: solr-user@lucene.apache.org Sent: Wednesday, October 19, 2011 5:15 PM Subject: Re: Merging Remote Solr Indexes? Hi Otis, Yeah, I saw page, but it says for merging cores, which I presume must reside locally to the solr instance doing the merging? What I'm interested in doing is merging across solr instances running on different machines into a single solr running on another machine (programmatically). Is it still possible or did I misread the wiki? Thanks! Darren On 10/19/2011 11:57 AM, Otis Gospodnetic wrote: Hi Darren, http://search-lucene.com/?q=solr+mergefc_project=Solr Check hit #1 Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: dar...@ontrenet.comdar...@ontrenet.com To: solr-user@lucene.apache.org Sent: Wednesday, October 19, 2011 10:04 AM Subject: Merging Remote Solr Indexes? Hi, I thought of a useful capability if it doesn't already exist. Is it possible to do an index merge between two remote Solr's? To handle massive index-time scalability, wouldn't it be useful to have distributed indexes accepting local input, then merge them into one central index after? Darren
RE: how was developed solr admin page and the UI part?
Thank you for your reply, Jaeger. I saw that, and I would like to use that JSP code; I thought to modify the Solr UI a little bit for user convenience. Now my question is: is it possible to develop that using Spring MVC architecture?
Re: Optimization /Commit memory
Thanks Jay , I was trying to compute the *OS RAM requirement* *not JVM RAM* for a 14 GB Index [cumulative Index size of all Instances].And I put it thus - Requirement of Operating System RAM for an Index of 14GB is - Index Size + 3 Times the maximum Index Size of Individual Instance for Optimize . That is to say ,I have several Instances ,combined Index Size is 14GB .Maximum Individual Index Size is 2.5GB .so My requirement for OS RAM is 14GB +3 * 2.5 GB ~ = 22GB. Correct? Regards Sujatha On Thu, Oct 20, 2011 at 3:45 AM, Jaeger, Jay - DOT jay.jae...@dot.wi.govwrote: Commit does not particularly spike disk or memory usage, unless you are adding a very large number of documents between commits. A commit can cause a need to merge indexes, which can increase disk space temporarily. An optimize is *likely* to merge indexes, which will usually increase disk space temporarily. How much disk space depends very much upon how big your index is in the first place. A 2 to 3 times factor of the sum of your peak index file size seems safe, to me. Solr uses only modest amounts of memory for the JVM for this stuff. JRJ -Original Message- From: Sujatha Arun [mailto:suja.a...@gmail.com] Sent: Wednesday, October 19, 2011 4:04 AM To: solr-user@lucene.apache.org Subject: Optimization /Commit memory Do we require 2 or 3 Times OS RAM memory or Hard Disk Space while performing Commit or Optimize or Both? what is the requirement in terms of size of RAM and HD for commit and Optimize Regards Sujatha
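Sujatha's arithmetic, written out (note that JRJ's 2-3x factor was stated for disk space; applying it to OS RAM headroom is this thread's extrapolation):

```python
total_index_gb = 14.0    # combined index size across all instances
largest_index_gb = 2.5   # largest single instance's index
optimize_factor = 3      # worst-case temporary growth during optimize

needed_gb = total_index_gb + optimize_factor * largest_index_gb
# 14 + 3 * 2.5 = 21.5, i.e. roughly the 22 GB quoted
```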
Re: OS Cache - Solr
Yes, 200 individual Solr instances, not Solr cores. We get an avg response time of below 1 sec. The number of documents is not many for most of the instances; some of the instances have about 5 lac (500,000) documents on average. Regards Sujatha On Thu, Oct 20, 2011 at 3:35 AM, Jaeger, Jay - DOT jay.jae...@dot.wi.gov wrote: 200 instances of what? The Solr application with lucene, etc. per usual? Solr cores? ??? Either way, 200 seems to be very very very many: unusually so. Why so many? If you have 200 instances of Solr in a 20 GB JVM, that would only be 100MB per Solr instance. If you have 200 instances of Solr all accessing the same physical disk, the results are not likely to be satisfactory - the disk head will go nuts trying to handle all of the requests. JRJ -Original Message- From: Sujatha Arun [mailto:suja.a...@gmail.com] Sent: Wednesday, October 19, 2011 12:25 AM To: solr-user@lucene.apache.org; Otis Gospodnetic Subject: Re: OS Cache - Solr Thanks, Otis. This is our Solr cache allocation. We have the same cache allocation for all our *200+ instances* in the single server. Is this too high? *Query Result Cache*: LRU Cache(maxSize=16384, initialSize=4096, autowarmCount=1024) *Document Cache*: LRU Cache(maxSize=16384, initialSize=16384) *Filter Cache*: LRU Cache(maxSize=16384, initialSize=4096, autowarmCount=4096) Regards Sujatha On Wed, Oct 19, 2011 at 4:05 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Maybe your Solr Document cache is big and that's consuming a big part of that JVM heap? If you want to be able to run with a smaller heap, consider making your caches smaller. Otis Sematext :: http://sematext.com/ :: Solr - Lucene - Nutch Lucene ecosystem search :: http://search-lucene.com/ From: Sujatha Arun suja.a...@gmail.com To: solr-user@lucene.apache.org Sent: Tuesday, October 18, 2011 12:53 AM Subject: Re: OS Cache - Solr Hello Jan, Thanks for your response and clarification.
We are monitoring the JVM cache utilization and we are currently using about 18 GB of the 20 GB assigned to JVM. Out total index size being abt 14GB Regards Sujatha On Tue, Oct 18, 2011 at 1:19 AM, Jan Høydahl jan@cominvent.com wrote: Hi Sujatha, Are you sure you need 20Gb for Tomcat? Have you profiled using JConsole or similar? Try with 15Gb and see how it goes. The reason why this is beneficial is that you WANT your OS to have available memory for disk caching. If you have 17Gb free after starting Solr, your OS will be able to cache all index files in memory and you get very high search performance. With your current settings, there is only 12Gb free for both caching the index and for your MySql activities. Chances are that when you backup MySql, the cached part of your Solr index gets flushed from disk caches and need to be re-cached later. How to interpret memory stats vary between OSes, and seing 163Mb free may simply mean that your OS has used most RAM for various caches and paging, but will flush it once an application asks for more memory. Have you seen http://wiki.apache.org/solr/SolrPerformanceFactors ? You should also slim down your index maximally by setting stored=false and indexed=false wherever possible. I would also upgrade to a more current Solr version. -- Jan Høydahl, search solution architect Cominvent AS - www.cominvent.com Solr Training - www.solrtraining.com On 17. okt. 2011, at 19:51, Sujatha Arun wrote: Hello I am trying to understand the OS cache utilization of Solr .Our server has several solr instances on a server .The total combined Index size of all instances is abt 14 Gb and the size of the maximum single Index is abt 2.5 GB . Our Server has Quad processor with 32 GB RAM .Out of which 20 GB has been assigned to JVM. We are running solr1.3 on tomcat 5.5 and Java 1.6 Our current Statistics indicate that solr uses 18-19 GB of 20 GB RAM assigned to JVM .However the Free physical seems to remain constant as below. 
Free physical memory = 163 Mb Total physical memory = 32,232 Mb, The server also serves as a backup server for Mysql where the application DB is backed up and restored .During this activity we see that lot of queries that nearly take even 10+ minutes to execute .But other wise maximum query time is less than 1-2 secs The physical memory that is free seems to be constant . Why is this constant and how this will be used between the Mysql backup and solr while backup activity is happening How much free physical memory should be available to OS given out stats.? Any pointers would be helpful. Regards Sujatha