Re: Is it OK to have very big number of fields in solr/lucene ?
Thanks for the help! The use case: there is a file to which different users have different assessments. If the flags were only for filtering, I could create a single field, say status_field, with values like user1_important user2_important user3_unimportant ..., then use a filter to get all files that are important to user1: status_field:user1_important. The tough part is that we may also need to sort files according to such flags, per user (I should have mentioned this in my last mail). My solution is to add many fields to the file document, like user1_status, user2_status, user3_status (the value can be important or unimportant), so I can sort files according to each user's assessment. My concern is that with too many fields it would not scale well?

Best regards, Lisheng

On Sat, Aug 9, 2014 at 6:18 AM, Toke Eskildsen t...@statsbiblioteket.dk wrote: Lisheng Zhang [lz0522...@gmail.com] wrote: In our application there are many complicated filter conditions; very often those conditions are specific to each user (like whether or not a doc is important to, or already read by, a user), two possible solutions to implement those filters in lucene: 1/ create many fields 2/ create many collections (for each user, for example)

3) Define a few fields or even just a single field where users can provide their special tags.
misc_field:important
misc_field:read
misc_field:personal_letters
misc_field:todo
misc_field:very_important
Chances are that there will be a lot of overlap of terms between the users (if not, please describe in more detail what the user-specific things are), so that the filter caches can be re-used between them. - Toke Eskildsen
Re: what os env you use to develop lucene or solr?
On Mon, 2014-08-11 at 03:49 +0200, rulinma wrote: I want to know: is Linux the best choice? The only special OS-thing about Lucene/Solr is that you should use a 64-bit OS for proper memory mapping. With that out of the way, the question becomes "What OS should one use for development?", and that is extremely dependent on taste and context. The three obvious choices are Windows, OS X and a Linux variant. I would argue that OS X and Linux are fairly similar when developing in Java, as file locking and scripting work the same way: if your server is OS X or Linux, using either OS X or Linux for development makes for easier testing. Likewise, if your server runs Windows, you might catch some errors earlier by developing under Windows. - Toke Eskildsen, State and University Library, Denmark
Re: what os env you use to develop lucene or solr?
I have used Mac OS X for development for more than 10 years. It is, by far, the most user-friendly Unix-based system. Copy and paste works correctly from the terminal to the IDE. Find in the terminal behaves nicely (really!). This is kilometers away from X Windows terminals and megameters away from the DOS shell (I do have limited experience with both). Lucene and Solr work flawlessly on all such systems. Beware not to use network disks for the index file system. paul On 11 August 2014, at 03:49, rulinma ruli...@gmail.com wrote: HI everybody, I want to know: is Linux the best choice? And what does Doug Cutting use: CentOS, Ubuntu, something else, even a Mac? thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/what-os-env-you-use-to-develop-lucene-or-solr-tp4152219.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: How to grab matching stats in Similarity class
On Thu, Aug 7, 2014 at 6:10 AM, Hafiz Mian M Hamid mianhami...@yahoo.com.invalid wrote: We're using Solr 4.2.1 and use an extension of Lucene's DefaultSimilarity as our similarity class. I am trying to figure out how we could get hold of the matching stats (i.e. how many/which terms in the query matched on different fields in the retrieved document set) in our similarity class, since we want to add some custom boost to our scoring function. The scoring logic needs to know the number of terms matched on each field in the query to determine the boost value.

The score is calculated on a per-field basis, hence the similarity will never know how many fields the term matches against. In Solr, if you are using eDisMax ( https://cwiki.apache.org/confluence/display/solr/The+Extended+DisMax+Query+Parser ) then the same term is searched across all the fields individually and the best score from the highest-scoring field is taken. This could solve your custom logic part in a crude way.

Basically we want our similarity class to be aware of the global matching stats even for scoring a single term in its TFIDFDocScorer.score() method. I was wondering how we could get hold of that information. It looks like the exactSimScorer() and sloppySimScorer() methods get an instance of AtomicReaderContext as a second parameter, but it doesn't look like we could retrieve matching stats from this object. Is there any other way we could make the similarity class aware of the global matching stats? I'd highly appreciate any help. Thanks, Hamid

-- Regards, Varun Thacker http://www.vthacker.in/
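The DisMax combination Varun describes, taking the score of the best-matching field and blending in the remaining fields via the tie parameter, can be modeled in a few lines of plain Java. This is a simplified sketch of the scoring formula only, not Solr's actual implementation; the field scores below are invented inputs:

```java
import java.util.Map;

public class DisMaxSketch {
    /**
     * Simplified DisMax combination: the maximum per-field score, plus
     * `tie` times the sum of the remaining field scores. With tie = 0.0
     * only the best field counts; with tie = 1.0 all fields are summed.
     * Scores are assumed non-negative, as Lucene scores are.
     */
    static double disMaxScore(Map<String, Double> fieldScores, double tie) {
        double max = 0.0, sum = 0.0;
        for (double s : fieldScores.values()) {
            if (s > max) max = s;
            sum += s;
        }
        return max + tie * (sum - max);
    }

    public static void main(String[] args) {
        Map<String, Double> scores = Map.of("title", 3.0, "content", 1.0);
        System.out.println(disMaxScore(scores, 0.0)); // only the best field counts
        System.out.println(disMaxScore(scores, 0.1)); // blended, as with tie=0.1
    }
}
```

With tie=0.1 a match on a second field nudges the score up slightly, which is why eDisMax can crudely approximate "the more fields matched, the better" without the similarity class ever seeing cross-field stats.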
Re: Is it OK to have very big number of fields in solr/lucene ?
On Mon, 2014-08-11 at 09:44 +0200, Lisheng Zhang wrote: [...] But tough part is that we may need to sort files according to such flag, for each user (I should have mentioned in last mail). My solution is to add many fields to file document, like user1_status, user2_status, user3_status // value can be important or unimportant then I can sort files according to each user's accessment.

Flip assessment and user name:
status:important_user1
status:unimportant_user2
status:customflag_user1
status:anothercustomflag_user2

That way sorting will work. Unfortunately there will be no re-use of the filter cache between users, so you might want to disable filter caching. If you also need to be able to filter on which documents a user has assessed, then add a second field for that:
assessed_by:user1
assessed_by:user2

My concern is that with too many fields it would not scale well ?

I have successfully experimented with 10,000 fields, but I doubt that it will work with millions. - Toke Eskildsen, State and University Library, Denmark
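Toke's flipped-term scheme (assessment first, then user name) can be sketched with a small helper that builds the indexed term values and a matching filter query. The field name status and the underscore separator follow the example above; everything else here is illustrative:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class StatusTerms {
    /**
     * Flipped term values (assessment first, then user) for the single
     * multi-valued status field, built from a per-user assessment map.
     */
    static List<String> statusTerms(Map<String, String> assessmentByUser) {
        List<String> terms = new ArrayList<>();
        for (Map.Entry<String, String> e : assessmentByUser.entrySet()) {
            terms.add(e.getValue() + "_" + e.getKey());
        }
        return terms;
    }

    /** Filter query restricting to one user's assessment of a given kind. */
    static String filterQuery(String assessment, String user) {
        return "status:" + assessment + "_" + user;
    }

    public static void main(String[] args) {
        Map<String, String> byUser = Map.of("user1", "important");
        System.out.println(statusTerms(byUser));               // [important_user1]
        System.out.println(filterQuery("important", "user1")); // status:important_user1
    }
}
```

The point of the flipped order is that all of one user's terms sort by assessment value first, which is what makes per-user sorting behave, at the cost of per-user filter-cache entries.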
Solr search \ special cases
Hi, I have some strange cases while searching with Solr. I have docs with names like: rule #22, rule +33, rule %44. When I search for #22 or %55 or +33, Solr brings me, as expected: rule #22, rule +33 and rule %44. But when appending a star (*) to each search (#22*, +33*, %55*), just the one with the + sign brings rule +33; all the others return nothing. Can someone explain? Thanks, Shay.
Re: Solr search \ special cases
Hi Shay, I believe '+' is treated as a space. Is this a REST call or an API call? What is your field type? Regards Harshvardhan Ojha On Mon, Aug 11, 2014 at 4:04 PM, Shay Sofer sha...@checkpoint.com wrote: Hi, I have some strange cases while search with Solr. I have doc with names like: rule #22, rule +33, rule %44. When search for #22 or %55 or +33 Solr bring me as expected: rule #22 and rule +33 and rule %44. But when appending star (*) to each search (#22*, +33*, %55*), just the one with + sign bring rule +33, all other result none. Can someone explain? Thanks, Shay.
RE: Solr search \ special cases
I call the Solr web API directly. The field type is string. Shouldn't * bring more results? Isn't this a suffix search? Am I wrong? Thanks! -Original Message- From: Harshvardhan Ojha [mailto:ojha.harshvard...@gmail.com] Sent: Monday, August 11, 2014 1:40 PM To: solr-user@lucene.apache.org Subject: Re: Solr search \ special cases Hi Shay, I believe + is treated as space, is it a rest call or api ? what is your field type ? Regards Harshvardhan Ojha On Mon, Aug 11, 2014 at 4:04 PM, Shay Sofer sha...@checkpoint.com wrote: Hi, I have some strange cases while search with Solr. I have doc with names like: rule #22, rule +33, rule %44. When search for #22 or %55 or +33 Solr bring me as expected: rule #22 and rule +33 and rule %44. But when appending star (*) to each search (#22*, +33*, %55*), just the one with + sign bring rule +33, all other result none. Can someone explain? Thanks, Shay.
Re: Solr search \ special cases
The use of a wildcard suppresses analysis of the query term, so the special characters remain, but... they were removed when the terms were indexed, so no match. You must manually emulate the index term analysis in order to use wildcards. -- Jack Krupansky -Original Message- From: Shay Sofer Sent: Monday, August 11, 2014 6:34 AM To: solr-user@lucene.apache.org Subject: Solr search \ special cases Hi, I have some strange cases while search with Solr. I have doc with names like: rule #22, rule +33, rule %44. When search for #22 or %55 or +33 Solr bring me as expected: rule #22 and rule +33 and rule %44. But when appending star (*) to each search (#22*, +33*, %55*), just the one with + sign bring rule +33, all other result none. Can someone explain? Thanks, Shay.
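Jack's point, that wildcard terms bypass analysis and so the client must apply the same normalization the index-time analyzer would, can be sketched like this. The normalization below (lowercase, strip everything that is not a letter or digit) is an assumption standing in for whatever the actual field's analyzer chain does; it must be made to match your schema:

```java
public class WildcardPrep {
    /**
     * Emulate a hypothetical index-time analysis (lowercase, drop
     * non-alphanumerics) before appending the wildcard, since Solr
     * will not analyze a query term that contains a wildcard.
     */
    static String prefixQuery(String userInput) {
        String normalized = userInput.toLowerCase()
                                     .replaceAll("[^\\p{L}\\p{N}]", "");
        return normalized + "*";
    }

    public static void main(String[] args) {
        System.out.println(prefixQuery("#22")); // 22*
        System.out.println(prefixQuery("+33")); // 33*
    }
}
```

If "#22" was indexed as the term "22", then searching for "22*" matches while "#22*" does not, which is the behavior described in the thread.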
RE: SqlEntityProcessor
I've heard of a user adding a separate <entity> section to the end of their data-config.xml with a SqlEntityProcessor and an UPDATE statement. It would run after your main <entity> section. I have not tried it myself, and surely DIH was not designed to do this, but it might work. A better solution might be to write a class implementing EventListener that does the db update you want and put an onImportEnd listener in your configuration. See http://wiki.apache.org/solr/DataImportHandler#EventListeners for details. James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Christof Lorenz [mailto:loc...@web.de] Sent: Sunday, August 10, 2014 6:52 AM To: solr-user@lucene.apache.org Subject: SqlEntityProcessor Hi folks, I am searching for a way to update a certain column in the RDBMS for each item as soon as the item has been indexed by Solr. The column will be the indicator in the delta-query to select un-indexed items. We don't want to use the default timestamp-based mechanism. Any ideas how we could implement this? Regards, Lochri
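James's onImportEnd suggestion might look like the sketch below. DIH's real EventListener interface receives a DIH Context; here a local stand-in interface and a stubbed JDBC call are used so the shape is visible without the Solr classes on the classpath, and the table and column names are invented for illustration:

```java
// Stand-in for DIH's EventListener interface, whose single method
// receives the DIH Context. The real listener would be registered in
// data-config.xml via an onImportEnd attribute naming your class.
interface ImportEventListener {
    void onEvent(Object context);
}

public class MarkIndexedListener implements ImportEventListener {
    static String lastSql; // recorded for inspection; not needed in a real listener

    @Override
    public void onEvent(Object context) {
        // On import end, flag the rows that were just indexed so the
        // delta-query can skip them. Table/column names are hypothetical.
        executeUpdate("UPDATE items SET indexed = 1 WHERE indexed = 0");
    }

    // Stubbed JDBC call so the sketch is self-contained; a real listener
    // would open a Connection and run PreparedStatement.executeUpdate.
    static int executeUpdate(String sql) {
        lastSql = sql;
        System.out.println("would run: " + sql);
        return 1;
    }

    public static void main(String[] args) {
        new MarkIndexedListener().onEvent(null);
    }
}
```

The design point is that the flag is set once, after the whole import succeeds, rather than per-row mid-import where a failed import would leave rows wrongly marked.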
Re: SolrCloud Scale Struggle
On 8/10/2014 11:07 PM, anand.mahajan wrote: Thank you for your suggestions. With the autoCommit (every 10 mins) and softCommit (every 10 secs) frequencies reduced, things work much better now. The CPU usage has gone down considerably too (by about 60%) and the read/write throughput is showing considerable improvements as well. There are certain shards that are giving poor response times; these have over 10M listings. I guess this is due to the fact that these are starving for RAM? Would it help if I split these up into smaller shards, but with the existing set of hardware? (I cannot allocate more machines to the cloud as yet)

Memory requirements are actually likely to go *up* a little bit with more shards on the same hardware, not down. The ideal RAM setup is to have enough RAM in the machine to equal or exceed the sum of your max Solr heap and the size of all the index data on that machine. This allows the operating system to load the entire index into the disk cache. Disks are *slow*; if the OS can pull the data it needs out of RAM (the operating system disk cache), that becomes *very* fast. If you have enough RAM to load at least two thirds of the index size, performance is likely to also be very good, but 100% of the index is better. If at all possible, put more RAM in the machine. Thanks, Shawn
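Shawn's sizing rule (total RAM at or above Solr heap plus the full on-disk index for the ideal case, or heap plus two thirds of the index for still-good performance) works out as a quick calculation. The 6 GB heap and 90 GB index below are illustrative numbers, not measurements from this thread:

```java
public class RamSizing {
    /** Ideal RAM in GB: the Java heap plus the full on-disk index size. */
    static double idealRamGb(double heapGb, double indexGb) {
        return heapGb + indexGb;
    }

    /** Still-good RAM in GB, per the two-thirds-of-the-index guideline. */
    static double goodRamGb(double heapGb, double indexGb) {
        return heapGb + indexGb * 2.0 / 3.0;
    }

    public static void main(String[] args) {
        // Hypothetical node: 6 GB heap, 90 GB of index data on disk.
        System.out.println(idealRamGb(6, 90)); // 96.0
        System.out.println(goodRamGb(6, 90));  // 66.0
    }
}
```

Both figures leave out the few GB the OS and other services need, so real machines should be sized a little above these numbers.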
When I use minimum match and maxCollationTries parameters together in edismax, Solr gets stuck
Hi, In the following configuration, when I uncomment both the mm and maxCollationTries lines and run a query on /select, Solr gets stuck with no exception. I tried different values for both parameters and found that values for mm less than 40% still work.

<requestHandler name="/select" class="solr.SearchHandler">
  <!-- default values for query parameters can be specified, these
       will be overridden by parameters in the request -->
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <str name="defType">edismax</str>
    <int name="timeAllowed">1000</int>
    <str name="qf">title^3 title_s^2 content</str>
    <str name="pf">title content</str>
    <str name="fl">id,title,content,score</str>
    <float name="tie">0.1</float>
    <str name="lowercaseOperators">true</str>
    <str name="stopwords">true</str>
    <!-- <str name="mm">75%</str> -->
    <int name="rows">10</int>
    <str name="spellcheck">on</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.dictionary">wordbreak</str>
    <str name="spellcheck.onlyMorePopular">true</str>
    <str name="spellcheck.count">5</str>
    <str name="spellcheck.maxResultsForSuggest">5</str>
    <str name="spellcheck.extendedResults">false</str>
    <str name="spellcheck.alternativeTermCount">2</str>
    <str name="spellcheck.collate">true</str>
    <str name="spellcheck.collateExtendedResults">true</str>
    <str name="spellcheck.maxCollationTries">5</str>
    <!-- <str name="spellcheck.collateParam.mm">100%</str> -->
    <str name="spellcheck.maxCollations">3</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>

Any idea? Thanks

-- Harun Reşit Zafer TÜBİTAK BİLGEM BTE Bulut Bilişim ve Büyük Veri Analiz Sistemleri Bölümü T +90 262 675 3268 W http://www.hrzafer.com
RE: When I use minimum match and maxCollationTries parameters together in edismax, Solr gets stuck
Harun, Just to clarify, is this happening during startup when a warmup query is running, or is this once the server is fully started? This might be another instance of https://issues.apache.org/jira/browse/SOLR-5386 . James Dyer Ingram Content Group (615) 213-4311 -Original Message- From: Harun Reşit Zafer [mailto:harun.za...@tubitak.gov.tr] Sent: Monday, August 11, 2014 8:39 AM To: solr-user@lucene.apache.org Subject: When I use minimum match and maxCollationTries parameters together in edismax, Solr gets stuck Hi, In the following configuration when uncomment both mm and maxCollationTries lines, and run a query on |/select|, Solr gets stuck with no exception. I tried different values for both parameters and found that values for mm less than %40 still works. |requestHandler name=/select class=solr.SearchHandler !-- default values for query parameters can be specified, these will be overridden by parameters in the request -- lst name=defaults str name=echoParamsexplicit/str str name=defTypeedismax/str int name=timeAllowed1000/int str name=qftitle^3 title_s^2 content/str str name=pftitle content/str str name=flid,title,content,score/str float name=tie0.1/float str name=lowercaseOperatorstrue/str str name=stopwordstrue/str !-- str name=mm75%/str-- int name=rows10/int str name=spellcheckon/str str name=spellcheck.dictionarydefault/str str name=spellcheck.dictionarywordbreak/str str name=spellcheck.onlyMorePopulartrue/str str name=spellcheck.count5/str str name=spellcheck.maxResultsForSuggest5/str str name=spellcheck.extendedResultsfalse/str str name=spellcheck.alternativeTermCount2/str str name=spellcheck.collatetrue/str str name=spellcheck.collateExtendedResultstrue/str str name=spellcheck.maxCollationTries5/str !-- str name=spellcheck.collateParam.mm100%/str-- str name=spellcheck.maxCollations3/str /lst arr name=last-components strspellcheck/str /arr /requestHandler Any idea? 
Thanks -- Harun Reşit Zafer TÜBİTAK BİLGEM BTE Bulut Bilişim ve Büyük Veri Analiz Sistemleri Bölümü T +90 262 675 3268 W http://www.hrzafer.com
Performance comparison of uploading SolrInputDocument vs JSONRequestHandler
I have a large number of documents that I am trying to load into Solr. I am about to begin benchmarking this effort, but I thought I would ask here. I have the documents in JSONArrays already. I am most concerned with ingest rate on the server, so I don't mind performing extra work on the client to speed up the server... Assuming I am using ConcurrentUpdateSolrServer, will I get better ingest performance if I convert all my documents to SolrInputDocuments before sending, or if I use the JsonRequestHandler on the server and send the JSONArrays via a ContentStreamUpdateRequest? Thanks, George
Re: Performance comparison of uploading SolrInputDocument vs JSONRequestHandler
I think you're worrying about the wrong problem ;) Often, the difference between the JSON and SolrInputDocument decoding on the server is dwarfed by the time it takes the client to assemble the docs to send. Quick test: When you start indexing, how hard is the Solr server working (measure crudely by looking at CPU utilization). Very frequently you'll find the server sitting around waiting for the client to send documents. Very often you'll get _much_ greater throughput gains by racking N clients together all sending to the Solr server than you will get by worrying about whether JSON or SolrInputDocument (or even XML docs) is more efficient on the server. That said, SolrInputDocuments are somewhat faster I think. FWIW Erick On Mon, Aug 11, 2014 at 7:34 AM, georgelav...@comcast.net wrote: I have a large number of documents that I am trying to load into SOLR. I am about to begin bench marking this effort, but I thought I would ask here. I have the documents in JSONArrays already. I am most concerned with ingest rate on the server. So I don't mind performing extra work on the client to speed up the server... Assuming I am using ConcurrentUpdateSolrServer, will I get better ingest performance if I convert all my documents to SolrInputDocuments before sending, or if I use the JsonRequestHandler on the server and send the JSONArrays via a ContentStreamUpdateRequest? Thanks, George
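Erick's suggestion, running N client workers all feeding the server at once, can be sketched with a thread pool. The sendBatch call below is a stub standing in for whatever client you actually use (ConcurrentUpdateSolrServer, HTTP posts of JSON, etc.), so the sketch runs without a Solr server:

```java
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

public class ParallelIndexer {
    // Stub: a real client would POST this batch to Solr here.
    // Returns the number of documents accepted.
    static int sendBatch(List<String> docs) {
        return docs.size();
    }

    /** Split docs into batches and send them from nThreads workers. */
    static int indexAll(List<String> docs, int batchSize, int nThreads) {
        AtomicInteger accepted = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(nThreads);
        for (int i = 0; i < docs.size(); i += batchSize) {
            List<String> batch = docs.subList(i, Math.min(i + batchSize, docs.size()));
            pool.submit(() -> accepted.addAndGet(sendBatch(batch)));
        }
        pool.shutdown();
        try {
            pool.awaitTermination(1, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return accepted.get();
    }

    public static void main(String[] args) {
        List<String> docs = IntStream.range(0, 1000)
                .mapToObj(i -> "doc" + i)
                .collect(Collectors.toList());
        System.out.println(indexAll(docs, 100, 4)); // 1000
    }
}
```

The idea is simply to keep the server's CPU busy: while one worker is assembling or transmitting a batch, the others keep the indexing pipeline fed.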
RES: SOLRJ Stop Streaming
Hey guys, any ideas? Thanks, Felipe

From: Felipe Paiva
Sent: Wednesday, August 6, 2014 16:40
To: solr-user@lucene.apache.org
Subject: SOLRJ Stop Streaming

Hi guys, in version 4.0 of SolrJ, support for streaming responses was added: https://issues.apache.org/jira/browse/SOLR-2112 In my application, the output for the Solr input stream is a response stream from a REST web service. It works fine, but if the client closes the connection with the REST server, the Solr stream continues to work. As a result, CPU remains in use although nothing is being delivered to the client. Is there a way to force the Solr stream to be closed? I think I would have to modify the class StreamingBinaryResponseParser by adding a new method that checks whether the Solr stream should be closed. Am I right? I am using version 4.1.0 of SolrJ. Thank you all. Cheers, Felipe Paiva UOL - Analista de Sistemas Av. Brig. Faria Lima, 1384, 3° andar . 01452-002 . São Paulo/SP Telefone: 11 3092 6938
Need some help with solr not restarting
I'm very new to SolrCloud. When I tried restarting our Tomcat server running SolrCloud, I started getting this in our logs:

SEVERE: org.apache.zookeeper.KeeperException$ConnectionLossException: KeeperErrorCode = ConnectionLoss for /configs/configuration1/default-collection/data/index/_3ts3_Lucene41_0.doc
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:99)
    at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
    at org.apache.zookeeper.ZooKeeper.create(ZooKeeper.java:783)
    at org.apache.solr.common.cloud.SolrZkClient$10.execute(SolrZkClient.java:407)
    at org.apache.solr.common.cloud.ZkCmdExecutor.retryOperation(ZkCmdExecutor.java:65)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:404)
    at org.apache.solr.common.cloud.SolrZkClient.makePath(SolrZkClient.java:314)
    at org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:1325)
    at org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:1327)
    at org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:1327)
    at org.apache.solr.cloud.ZkController.uploadToZK(ZkController.java:1327)
    at org.apache.solr.cloud.ZkController.uploadConfigDir(ZkController.java:1099)
    at org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:199)
    at org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:74)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:206)
    at org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:177)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:127)
    at org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:281)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:262)
    at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:107)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4797)
    at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5473)
    at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:634)
    at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1074)
    at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1858)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(Unknown Source)
    at java.lang.Thread.run(Unknown Source)

Aug 11, 2014 2:21:08 PM org.apache.solr.servlet.SolrDispatchFilter init
SEVERE: Could not start Solr. Check solr/home property and the logs
Aug 11, 2014 2:21:08 PM org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.cloud.ZooKeeperException:
    at org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:224)
    at org.apache.solr.core.ZkContainer.initZooKeeper(ZkContainer.java:74)
    at org.apache.solr.core.CoreContainer.load(CoreContainer.java:206)
    at org.apache.solr.servlet.SolrDispatchFilter.createCoreContainer(SolrDispatchFilter.java:177)
    at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:127)
    at org.apache.catalina.core.ApplicationFilterConfig.initFilter(ApplicationFilterConfig.java:281)
    at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:262)
    at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:107)
    at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:4797)
    at org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5473)
    at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
    at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
    at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
    at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:634)
    at org.apache.catalina.startup.HostConfig.deployWAR(HostConfig.java:1074)
    at org.apache.catalina.startup.HostConfig$DeployWar.run(HostConfig.java:1858)
    at java.util.concurrent.Executors$RunnableAdapter.call(Unknown Source)
    at java.util.concurrent.FutureTask.run(Unknown Source)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(Unknown Source)
regexTransformer returns no results if there is no match
Hello, I am trying to construct a Wikipedia page URL from the page title using regexTransformer, with <field column="title_underscore" regex="\s+" replaceWith="_" sourceColName="title"/>. This does not work for titles that have no space: title_underscore for them comes out empty. Any ideas what is wrong here? This is with solr-4.8.1. Thanks. Alex.
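For comparison, plain String.replaceAll returns the input unchanged when the pattern never matches, so if title_underscore comes back empty the culprit is likely DIH's RegexTransformer treating a non-matching regex as "no output", rather than the regex itself. That is a hedged guess worth verifying against your Solr version; the intended transform in plain Java looks like this:

```java
public class TitleUnderscore {
    /** Intended transform: runs of whitespace become underscores;
     *  a no-op (not an empty string) when there is no whitespace. */
    static String underscore(String title) {
        return title.replaceAll("\\s+", "_");
    }

    public static void main(String[] args) {
        System.out.println(underscore("Main Page")); // Main_Page
        System.out.println(underscore("Solr"));      // Solr (unchanged, not empty)
    }
}
```

If the transformer really does skip non-matching rows, one workaround is to make the regex trivially match everything (or apply this logic in a ScriptTransformer) so every row gets a value.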
SolrCloud OOM Problem
My SolrCloud of 3 shards / 3 replicas is having a lot of OOM errors. Here are some specs on my setup:

hosts: all are EC2 m1.large with 250G data volumes
documents: 120M total
zookeeper: 5 external t1.micros

startup command with memory and GC values
===
root 12499 1 61 19:36 pts/0 01:49:18 /usr/lib/jvm/jre/bin/java -XX:NewSize=1536m -XX:MaxNewSize=1536m -Xms5120m -Xmx5120m -XX:+UseParNewGC -XX:+CMSParallelRemarkEnabled -XX:+UseConcMarkSweepGC -Djavax.sql.DataSource.Factory=org.apache.commons.dbcp.BasicDataSourceFactory -DnumShards=3 -Dbootstrap_confdir=/data/solr/lighting_products/conf -Dcollection.configName=lighting_products_cloud_conf -DzkHost=ec2-00-17-55-217.compute-1.amazonaws.com:2181,ec2-00-82-150-252.compute-1.amazonaws.com:2181,ec2-00-234-237-109.compute-1.amazonaws.com:2181,ec2-00-224-205-204.compute-1.amazonaws.com:2181,ec2-00-20-72-124.compute-1.amazonaws.com:2181 -classpath :/usr/share/tomcat6/bin/bootstrap.jar:/usr/share/tomcat6/bin/tomcat-juli.jar:/usr/share/java/commons-daemon.jar -Dcatalina.base=/usr/share/tomcat6 -Dcatalina.home=/usr/share/tomcat6 -Djava.endorsed.dirs= -Djava.io.tmpdir=/var/cache/tomcat6/temp -Djava.util.logging.config.file=/usr/share/tomcat6/conf/logging.properties -Djava.util.logging.manager=org.apache.juli.ClassLoaderLogManager org.apache.catalina.startup.Bootstrap start

Linux top command output with no indexing
===
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8654 root 20 0 95.3g 6.4g 1.1g S 27.6 87.4 83:46.19 java

Linux top command output with indexing
===
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
12499 root 20 0 95.8g 5.8g 556m S 164.3 80.2 110:40.99 java

So it appears our indexing is clobbering the CPU but not the memory. The user queries are pretty bad and I will provide a few examples here. Note that the sort date updated_dt is not the order in which documents are indexed; the user application should be sorting on a different date.

===
INFO: [lighting_products] webapp=/solr path=/select params={facet=true&sort=updated_dt+desc&f.content_videotype_s.facet.missing=true&spellcheck.q=candid+camera&nocache=1407790600582&distrib=false&version=2&oe=UTF-8&fl=id,score&df=text&shard.url=10.211.82.113:80/solr/lighting_products/|10.249.34.65:80/solr/lighting_products/&NOW=1407790600883&ie=UTF-8&facet.field=content_videotype_s&fq=my_database_s:training&fq=my_server_s:mydomain\-arc\-v2.lightingservices.com&fq=allnamespaces_s_mv:(happyland\-site+OR+tags+OR+predicates+OR+authorities+OR+happyland+OR+movies+OR+global+OR+devicegroup+OR+people+OR+entertainment)&fq={!tag%3Dct0}_contenttype_s:Standard\:ShowVideo&fsv=true&site=solr_arc&restype=SOLR&wt=javabin&defType=dismax&rows=50&start=0&f.content_videotype_s.facet.limit=160&q=candid+camera&q.op=AND&isShard=true} hits=22 status=0 QTime=118
INFO: [lighting_products] webapp=/solr path=/select
Re: SolrCloud OOM Problem
On 8/11/2014 5:27 PM, dancoleman wrote: My SolrCloud of 3 shard / 3 replicas is having a lot of OOM errors. Here are some specs on my setup: hosts: all are EC2 m1.large with 250G data volumes documents: 120M total zookeeper: 5 external t1.micros snip Linux top command output with no indexing === PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 8654 root 20 0 95.3g 6.4g 1.1g S 27.6 87.4 83:46.19 java Linux top command output with indexing === PID USER PR NI VIRT RES SHR S %CPU %MEMTIME+ COMMAND 12499 root 20 0 95.8g 5.8g 556m S 164.3 80.2 110:40.99 java I think you're likely going to need a much larger heap than 5GB, or you're going to need a lot more machines and shards, so that each machine has a much smaller piece of the index. The java heap is only one part of the story here, though. Solr performance is terrible when the OS cannot effectively cache the index, because Solr must actually read the disk to get the data required for a query. Disks are incredibly SLOW. Even SSD storage is a *lot* slower than RAM. Your setup does not have anywhere near enough memory for the size of your shards. Amazon's website says that the m1.large instance has 7.5GB of RAM. You're allocating 5GB of that to Solr (the java heap) according to your startup options. If you subtract a little more for the operating system and basic system services, that leaves about 2GB of RAM for the disk cache. Based on the numbers from top, that Solr instance is handling nearly 90GB of index. 2GB of RAM for caching is nowhere near enough -- you will want between 32GB and 96GB of total RAM for that much index. http://wiki.apache.org/solr/SolrPerformanceProblems#RAM Thanks, Shawn
Re: SolrCloud OOM Problem
90G is correct, each host is currently holding that much data. Are you saying that 32GB to 96GB would be needed for each host? Assuming we did not add more shards that is. -- View this message in context: http://lucene.472066.n3.nabble.com/SolrCloud-OOM-Problem-tp4152389p4152401.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: SolrCloud OOM Problem
90G is correct, each host is currently holding that much data. Are you saying that 32GB to 96GB would be needed for each host? Assuming we did not add more shards that is. If you want good performance and enough memory to give Solr the heap it will need, yes. Lucene (the search API that Solr uses) relies on good operating system caching for the index. Having enough memory to cache the ENTIRE index is not usually required, but it is recommended. Alternatively, you can add a lot more hosts and create a new collection with a lot more shards. The total memory requirement across the whole cloud won't go down, but each host won't require as much. Thanks, Shawn
what's the difference between solr and elasticsearch in hdfs case?
Hi~ I'm new to both Solr and Elasticsearch. I have read that both support creating an index on HDFS. So, what's the difference between Solr and Elasticsearch in the HDFS case? -- View this message in context: http://lucene.472066.n3.nabble.com/what-s-the-difference-between-solr-and-elasticsearch-in-hdfs-case-tp4152413.html Sent from the Solr - User mailing list archive at Nabble.com.
SpatialForTimeDurations question
Hello. I am sorry for my bad English. I am using Solr 4.7.1. I want to run date-range queries against a multiValued field. Then I found a solution: http://wiki.apache.org/solr/SpatialForTimeDurations The solution is almost perfect, but for some values I get an error message.

# message
2014/08/10 23:28:12.558: ERROR: Fatal: 400: ERROR: [doc=12345] Error adding field 'disp_date_range'='[20140418 201408201908]' msg=Index: 0, Size: 0

# schema.xml
<fieldType name="date_range" class="solr.SpatialRecursivePrefixTreeFieldType" multiValued="true" geo="false" worldBounds="0 0 1231 1231" distErrPct="0" maxDistErr="1" units="degrees"/>

# data
<doc>
  <field name="id">12345</field>
  <field name="disp_date_range" update="set">20140418 201408201908</field>
</doc>

I checked other values, and there are some values that Solr cannot store. For example:
202805041049 202805041049
201301041353 201301041353
200305281316 200305281316
200601011536 200601011536
203005271640 203005271640
201505211646 201505211646
202602071904 202602071904

Can anyone help please?! Thanks. Shu Ogawa
Re: what's the difference between solr and elasticsearch in hdfs case?
Are you comparing Solr vs. Elasticsearch, or Cloudera vs. Elasticsearch? Because Cloudera is also commercial, like Elasticsearch, and has a full HDFS story. Regards, Alex. Personal: http://www.outerthoughts.com/ and @arafalov Solr resources and newsletter: http://www.solr-start.com/ and @solrstart Solr popularizers community: https://www.linkedin.com/groups?gid=6713853 On Tue, Aug 12, 2014 at 4:43 AM, Jianyi phoenix.w.2...@qq.com wrote: Hi~ I'm new to both solr and elasticsearch. I have read that both the two support creating index on hdfs. So, what's the difference between solr and elasticsearch in hdfs case? -- View this message in context: http://lucene.472066.n3.nabble.com/what-s-the-difference-between-solr-and-elasticsearch-in-hdfs-case-tp4152413.html Sent from the Solr - User mailing list archive at Nabble.com.