Solr Optimization Fail
Hi, When we do optimize, it actually reduces the data size, right? I have an index of size 6 GB (5 million documents). The index was built with commits for every 1 documents. Now I was trying to optimize with the HTTP optimize command. When I did that, the data size became 12 GB. Why might this have happened? And can anyone please suggest a fix for it? Thanks, Rajani
disable stemming on query parser.
Hi All, I am using stemming in my Solr, but I don't want stemming applied to every search request. I am thinking of disabling stemming on one specific query parser; can I do this? Any help much appreciated. Thanks in advance -- View this message in context: http://lucene.472066.n3.nabble.com/disable-stemming-on-query-parser-tp3591420p3591420.html Sent from the Solr - User mailing list archive at Nabble.com.
RE: Solr Optimization Fail
Maybe you are generating a snapshot of your index attached to the optimize? Look for post-commit or post-optimize events in your solrconfig.xml. From: Rajani Maski [rajinima...@gmail.com] Sent: Friday, December 16, 2011 11:11 To: solr-user@lucene.apache.org Subject: Solr Optimization Fail Hi, When we do optimize, it actually reduces the data size right? I have index of size 6gb(5 million documents). Index is already created with commits for every 1 documents. Now I was trying to do optimization with http optimize command. When i did that, data size became - 12gb. Why this might have happened? And can anyone please suggest me fix for it? Thanks Rajani
Re: Solr Optimization Fail
These parameters are commented out in my solrconfig.xml; see the parameters below.

<!-- The RunExecutableListener executes an external command from a hook such as
     postCommit or postOptimize.
     exe - the name of the executable to run
     dir - dir to use as the current working directory. default="."
     wait - the calling thread waits until the executable returns. default="true"
     args - the arguments to pass to the program. default=nothing
     env - environment variables to set. default=nothing
-->
<!-- A postCommit event is fired after every commit or optimize command
<listener event="postCommit" class="solr.RunExecutableListener">
  <str name="exe">solr/bin/snapshooter</str>
  <str name="dir">.</str>
  <bool name="wait">true</bool>
  <arr name="args"> <str>arg1</str> <str>arg2</str> </arr>
  <arr name="env"> <str>MYVAR=val1</str> </arr>
</listener>
-->
<!-- A postOptimize event is fired only after every optimize command
<listener event="postOptimize" class="solr.RunExecutableListener">
  <str name="exe">snapshooter</str>
  <str name="dir">solr/bin</str>
  <bool name="wait">true</bool>
</listener>
-->

When I do optimize on an index of size 400 MB, it reduces the size of the data folder to 200 MB. But when the data is huge it doubles it. Why is that so? Optimization: should it actually reduce the size of the data, or just improve search query performance? On Fri, Dec 16, 2011 at 5:40 PM, Juan Pablo Mora jua...@informa.es wrote: Maybe you are generating a snapshot of your index attached to the optimize ??? Look for post-commit or post-optimize events in your solr-config.xml From: Rajani Maski [rajinima...@gmail.com] Sent: Friday, December 16, 2011 11:11 To: solr-user@lucene.apache.org Subject: Solr Optimization Fail Hi, When we do optimize, it actually reduces the data size right? I have index of size 6gb(5 million documents). Index is already created with commits for every 1 documents. Now I was trying to do optimization with http optimize command. When i did that, data size became - 12gb. Why this might have happened? And can anyone please suggest me fix for it? Thanks Rajani
full-data import suddenly stopped working. Total Rows Fetched remains 0
My full-data import stopped working all of a sudden. Afaik I have not made any changes that would cause this. The response is:

<response>
  <script/>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">0</int>
  </lst>
  <lst name="initArgs">
    <lst name="defaults">
      <str name="config">wedding-data-config.xml</str>
    </lst>
  </lst>
  <str name="command">full-import</str>
  <str name="status">busy</str>
  <str name="importResponse">A command is still running...</str>
  <lst name="statusMessages">
    <str name="Time Elapsed">0:6:4.112</str>
    <str name="Total Requests made to DataSource">1</str>
    <str name="Total Rows Fetched">0</str>
    <str name="Total Documents Processed">0</str>
    <str name="Total Documents Skipped">0</str>
    <str name="Full Dump Started">2011-12-16 13:12:29</str>
  </lst>
  <str name="WARNING">This response format is experimental. It is likely to change in the future.</str>
</response>

It doesn't matter how often I refresh this page, it stays like this; the only thing changing is Time Elapsed. Here's the log:

Dec 16, 2011 1:20:04 PM org.apache.solr.handler.dataimport.DataImporter doFullImport
INFO: Starting Full Import
Dec 16, 2011 1:20:04 PM org.apache.solr.handler.dataimport.SolrWriter readIndexerProperties
INFO: Read dataimport.properties
Dec 16, 2011 1:20:04 PM org.apache.solr.update.DirectUpdateHandler2 deleteAll
INFO: [cam] REMOVING ALL DOCUMENTS FROM INDEX
Dec 16, 2011 1:20:04 PM org.apache.solr.core.SolrDeletionPolicy onInit
INFO: SolrDeletionPolicy.onInit: commits:num=1 commit{dir=C:\My Dropbox\inetpub\apache-solr-4.0-2010-10-12_08-05-48\example\example-DIH\solr\cam\data\index,segFN=segments_jb,version=1286962723772,generation=695,filenames=[_iv.prx, _iv.frq, segments_jb, _iv.tis, _iv.nrm, _iv.fdt, _iv.fdx, _iv.fnm, _iv.tii]
Dec 16, 2011 1:20:04 PM org.apache.solr.core.SolrDeletionPolicy updateCommits
INFO: newest commit = 1286962723772
Dec 16, 2011 1:20:04 PM org.apache.solr.handler.dataimport.JdbcDataSource$1 call
INFO: Creating a connection for entity camera with URL: jdbc:sqlserver://localhost:1433;databaseName=tt
Dec 16, 2011 1:20:05 PM org.apache.solr.core.SolrCore execute
INFO: [cam] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=0
Dec 16, 2011 1:20:06 PM org.apache.solr.core.SolrCore execute
INFO: [cam] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=0
Dec 16, 2011 1:20:07 PM org.apache.solr.core.SolrCore execute
INFO: [cam] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=0
Dec 16, 2011 1:20:09 PM org.apache.solr.core.SolrCore execute
INFO: [cam] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=0
Dec 16, 2011 1:20:09 PM org.apache.solr.core.SolrCore execute
INFO: [cam] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=0
Dec 16, 2011 1:20:10 PM org.apache.solr.core.SolrCore execute
INFO: [cam] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=0
Dec 16, 2011 1:20:10 PM org.apache.solr.core.SolrCore execute
INFO: [cam] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=0
Dec 16, 2011 1:20:11 PM org.apache.solr.core.SolrCore execute
INFO: [cam] webapp=/solr path=/dataimport params={command=full-import} status=0 QTime=0

Once again...this always used to work. I have no idea why it now doesn't, since I see no error whatsoever. -- View this message in context: http://lucene.472066.n3.nabble.com/full-data-import-suddenly-stopped-working-Total-Rows-Fetched-remains-0-tp3591479p3591479.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Optimization Fail
Are you on Windows? There is a JVM bug that makes Solr keep the old files even if they are not used anymore. The files are going to be removed eventually, but if you want them out of there immediately, try optimizing twice; the second optimize doesn't do much, but it will remove the old files. On Fri, Dec 16, 2011 at 9:10 AM, Juan Pablo Mora jua...@informa.es wrote: Maybe you are generating a snapshot of your index attached to the optimize ??? Look for post-commit or post-optimize events in your solr-config.xml From: Rajani Maski [rajinima...@gmail.com] Sent: Friday, December 16, 2011 11:11 To: solr-user@lucene.apache.org Subject: Solr Optimization Fail Hi, When we do optimize, it actually reduces the data size right? I have index of size 6gb(5 million documents). Index is already created with commits for every 1 documents. Now I was trying to do optimization with http optimize command. When i did that, data size became - 12gb. Why this might have happened? And can anyone please suggest me fix for it? Thanks Rajani
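To make the "optimize twice" suggestion concrete: the second pass can be triggered the same way as the first, either by hitting the update handler with ?optimize=true or by posting an optimize message to it. A minimal sketch (the core URL http://localhost:8983/solr/update is only a placeholder for whatever your setup uses):

<!-- POST this to the /update handler, wait for it to return, then POST it once more;
     on Windows the second pass lets Lucene delete the old segment files it could not
     remove while they were still held open. -->
<optimize/>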
Re: Solr Optimization Fail
Oh yes, on Windows, using Java 1.6 and Solr 1.4.1. OK, let me try that one... Thank you so much. Regards, Rajani 2011/12/16 Tomás Fernández Löbbe tomasflo...@gmail.com Are you on Windows? There is a JVM bug that makes Solr keep the old files, even if they are not used anymore. The files are going to be eventually removed, but if you want them out of there immediately try optimizing twice, the second optimize doesn't do much but it will remove the old files. On Fri, Dec 16, 2011 at 9:10 AM, Juan Pablo Mora jua...@informa.es wrote: Maybe you are generating a snapshot of your index attached to the optimize ??? Look for post-commit or post-optimize events in your solr-config.xml From: Rajani Maski [rajinima...@gmail.com] Sent: Friday, December 16, 2011 11:11 To: solr-user@lucene.apache.org Subject: Solr Optimization Fail Hi, When we do optimize, it actually reduces the data size right? I have index of size 6gb(5 million documents). Index is already created with commits for every 1 documents. Now I was trying to do optimization with http optimize command. When i did that, data size became - 12gb. Why this might have happened? And can anyone please suggest me fix for it? Thanks Rajani
How to disable Auto Commit and Auto optimize operation after addition of few documents through dataimport handler
Hi, I would like to know how we can disable the commit and optimize operations that are called by default after documents are added through the dataimport handler. In our application, the master Solr instance is used for indexing and the slave Solr is for user search requests, so replication has to happen at regular intervals. The master Solr has around 1.4 million documents (size: 2.7 GB). We have frequent addition/deletion of documents in the master Solr. After each addition/deletion, commit and optimize operations are called by default, which tends to be costly. This also makes replication take longer. So what I thought is that the commit operation should be performed after a certain number of documents are added, and the optimize operation should be performed only once a day or done manually. Please let me know how to customize the settings for the commit and optimize operations in solrconfig.xml. Do we have any documentation regarding the same? Any pointers would be of great help. Thanks in advance. Thanks & Regards, Sivaganesh -- View this message in context: http://lucene.472066.n3.nabble.com/How-to-disable-Auto-Commit-and-Auto-optimize-operation-after-addition-of-few-documents-through-datair-tp3591560p3591560.html Sent from the Solr - User mailing list archive at Nabble.com.
SolrCloud Cores
What is the most appropriate way to configure Solr when deploying in a cloud environment? Should the core name on all instances be the collection name or is it more appropriate that each shard be a separate core, or should each solr instance be a separate core (i.e. master1, master1-replica are 2 separate cores)?
Re: How to disable Auto Commit and Auto optimize operation after addition of few documents through dataimport handler
On 12/16/2011 5:57 AM, mechravi25 wrote: I would like to know how can we disable the commit and optimize operation is called by deafult after addition of few documents through dataimport handlers. Add this to the url you use to call the handler: commit=false&optimize=false Thanks, Shawn
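To make Shawn's suggestion concrete, the full-import call would look something like http://localhost:8983/solr/dataimport?command=full-import&commit=false&optimize=false (host, port and handler path are whatever your setup uses). If you then want commits to happen automatically every so many documents rather than per request, one possible solrconfig.xml sketch is the autoCommit block below; the thresholds are illustrative only, not recommendations:

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- commit automatically after 50,000 added docs or 5 minutes, whichever comes first -->
  <autoCommit>
    <maxDocs>50000</maxDocs>
    <maxTime>300000</maxTime>
  </autoCommit>
</updateHandler>

There is no equivalent "auto optimize" setting; a once-a-day optimize would be triggered externally, for example from a cron job hitting the update handler.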
Lock obtain timed out
Hi, I'm doing a lot of reads and writes against a single Solr server (on the order of 50ish per second), and have around 300,000 documents in the index. Now every 5 minutes I get this exception: SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@./solr/data/index/write.lock and I have to restart my Solr process. I've done some googling; some people have suggested raising the Linux open-file limit or changing the merge factor, but that didn't work. Does anyone have insights into this? Thanks, Eric
Re: disable stemming on query parser.
You can disable stemming in a copy field. So you need to define one field with your input data on which stemming will be done and the other field (copy field), on which stemming will not be done. Then on the client you can decide which field to search against. Dmitry On Fri, Dec 16, 2011 at 2:00 PM, meghana meghana.rav...@amultek.com wrote: Hi All, I am using Stemming in my solr , but i don't want to apply stemming always for each search request. i am thinking of to disable stemming on one specific query parser , can i do this? Any help much appreciated. Thanks in Advance -- View this message in context: http://lucene.472066.n3.nabble.com/disable-stemming-on-query-parser-tp3591420p3591420.html Sent from the Solr - User mailing list archive at Nabble.com.
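A minimal schema.xml sketch of what Dmitry describes, with hypothetical field and type names (text_stem is assumed to be a type whose analyzer includes a stemming filter, text_plain the same chain without it):

<field name="body" type="text_stem" indexed="true" stored="true"/>
<field name="body_exact" type="text_plain" indexed="true" stored="false"/>
<copyField source="body" dest="body_exact"/>

At query time the client searches body when stemming is wanted and body_exact when it is not, or lists both in the dismax/edismax qf parameter and adjusts the boosts.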
Re: Lock obtain timed out
Hi Eric, And you are using the latest version of Solr, 3.5.0? What is the timeout in solrconfig.xml? How many CPU cores does the machine have and how many concurrent indexer threads do you have running? Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html From: Eric Tang eric.x.t...@gmail.com To: solr-user@lucene.apache.org Sent: Friday, December 16, 2011 10:08 AM Subject: Lock obtain timed out Hi, I'm doing a lot reads and writes into a single solr server (on the magnitude of 50ish per second), and have around 300,000 documents in the index. Now every 5 minutes I get this exception: SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@./solr/data/index/write.lock And I have to restart my solr process. I've done some googling, some people have suggested raising the limit for linux file open #, or changing the merge factor, but that didn't work. Does anyone have insights into this? Thanks, Eric
Re: Replication file become very very big
Hi, Hm, I don't know what this could be caused by. But if you want to get rid of it, remove that Linux server out of the load balancer pool, stop Solr, remove the index, and restart Solr. Then force replication and put the server back in the load balancer pool. If you use SPM (see link in my signature below) you will see how your indices grow (and shrink!) over time and will catch this problem when it happens next time by looking at the graph that shows info about your index - size on FS, # of segments, documents, etc. Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html From: ZiLi dangld...@163.com To: solr-user@lucene.apache.org Cc: dangld...@163.com Sent: Thursday, December 15, 2011 9:28 PM Subject: Replication file become very very big Hi all, I meet a very strange problem. We use a Windows server as master, serving 5 Windows slaves and 3 Linux slaves. It has worked normally for 2 months. But today we found that one of the Linux slaves' index has become very, very big (150G! The others are 300M). And we can't find the index folder under the data folder; there are just four entries: index.20111203090855 (150G), index.properties, replication.properties, spellchecker. By the way, although this index is 150G, the service is normal and queries are very fast. Also, our Linux slaves poll the index from the server every 40 minutes, and every 15 minutes our program updates these servers' Solr index. We disabled AutoCommit in solrconfig.xml. Could the problem have been caused by some big transaction? Any suggestion will be appreciated.
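For the "force replication" step Otis mentions, the slave's ReplicationHandler can be told to pull the index immediately instead of waiting for the next poll; a sketch, with the slave host and core path as placeholders:

http://slave-host:8983/solr/replication?command=fetchindex

Polling can also be paused and resumed around this kind of maintenance with command=disablepoll and command=enablepoll on the same handler.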
Re: Core overhead
Hi, I used to think this, too, but have learned this not to be entirely true. We had a customer with a query rate of a few hundred QPS and 32 or 64 GB RAM (don't recall which any more) and a pretty large JVM heap. Most queries were very fast, but once in a while a query would be very slow. GC, we thought! So the initial thinking was was - must be that big heap of theirs. But long story short, instead of making the heap smaller we just tuned the JVM and took care of those slow queries. Using SPM (link in sig) and seeing GC info (collection counts, times, heap size, etc.) was invaluable! Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html - FREE! From: Robert Stewart bstewart...@gmail.com To: solr-user@lucene.apache.org Sent: Thursday, December 15, 2011 2:16 PM Subject: Re: Core overhead One other thing I did not mention is GC pauses. If you have smaller heap sizes, you would have less very long GC pauses, so that can be an advantage having many cores (if cores are distributed into seperate SOLR instances, as seperate processes). I think you can expect 1 second pause for each GB of heap size in worst case. On Thu, Dec 15, 2011 at 2:14 PM, Robert Stewart bstewart...@gmail.com wrote: It is true number of terms may be much more than N/10 (or even N for each core), but it is the number of docs per term that will really matter. So you can have N terms in each core but each term has 1/10 number of docs on avg. 2011/12/15 Yury Kats yuryk...@yahoo.com: On 12/15/2011 1:07 PM, Robert Stewart wrote: I think overall memory usage would be close to the same. Is this really so? I suspect that the consumed memory is in direct proportion to the number of terms in the index. I also suspect that if I divided 1 core with N terms into 10 smaller cores, each smaller core would have much more than N/10 terms. Let's say I'm indexing English texts, it's likely that all smaller cores would have almost the same number of terms, close to the original N. Not so?
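For readers wondering what "tuned the JVM" can mean in practice, the knobs that usually matter for occasional slow queries are the collector choice and GC logging so the pauses become visible. A hypothetical starting point of that era, with flags that are purely illustrative and not the ones used in the case Otis describes:

JAVA_OPTS="$JAVA_OPTS -Xms8g -Xmx8g -XX:+UseConcMarkSweepGC -XX:+UseParNewGC -XX:CMSInitiatingOccupancyFraction=70 -verbose:gc -XX:+PrintGCDetails -XX:+PrintGCTimeStamps -Xloggc:gc.log"

The GC log is what tells you whether the slow queries actually line up with collections before you start changing heap sizes.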
Re: Lock obtain timed out
Hi Otis, I'm using 3.2 because I can't get velocity to run on 3.5. I've changed my writeLockTimeout from 1000 to 1, and my commitLockTimeout from 1 to 5 Running on a large ec2 box, which has 2 virtual cores. I don't know how to find out the # of concurrent indexer threads. Is that the same as maxWarmingSearchers? If that's the case I've changed it from 2 to 5. I have about 12 processes running concurrently to read/write to solr at the moment, but this is just a test and I'm planning to up this number to 50 - 100. Thanks, Eric On Fri, Dec 16, 2011 at 10:14 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi Eric, And you are using the latest version of Solr, 3.5.0? What is the timeout in solrconfig.xml? How many CPU cores does the machine have and how many concurrent indexer threads do you have running? Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html From: Eric Tang eric.x.t...@gmail.com To: solr-user@lucene.apache.org Sent: Friday, December 16, 2011 10:08 AM Subject: Lock obtain timed out Hi, I'm doing a lot reads and writes into a single solr server (on the magnitude of 50ish per second), and have around 300,000 documents in the index. Now every 5 minutes I get this exception: SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@./solr/data/index/write.lock And I have to restart my solr process. I've done some googling, some people have suggested raising the limit for linux file open #, or changing the merge factor, but that didn't work. Does anyone have insights into this? Thanks, Eric
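For reference, the lock timeouts Eric mentions live in the <indexDefaults> (or <mainIndex>) section of solrconfig.xml in the 1.4/3.x line; a sketch with the values shipped in the example config (shown for orientation, not as a recommendation for this case):

<indexDefaults>
  <writeLockTimeout>1000</writeLockTimeout>
  <commitLockTimeout>10000</commitLockTimeout>
  <lockType>native</lockType>
</indexDefaults>

Note that maxWarmingSearchers is unrelated to indexing threads; it only caps how many new searchers may be warming at once after commits, so raising it does not help with write.lock contention.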
Re: Core overhead
Hi Yury, Not sure if this was already covered in this thread, but with N smaller cores on a single N-CPU-core box you could run N queries in parallel over smaller indices, which may be faster than a single query going against a single big index, depending on how many concurrent query requests the box is handling (i.e. how busy or idle the CPU cores are). Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html From: Yury Kats yuryk...@yahoo.com To: solr-user@lucene.apache.org Sent: Thursday, December 15, 2011 12:58 PM Subject: Core overhead Does anybody have an idea, or better yet, measured data, to see what the overhead of a core is, both in memory and speed? For example, what would be the difference between having 1 core with 100M documents versus having 10 cores with 10M documents?
Re: Lock obtain timed out
Hi, I'm using 3.2 because I can't get velocity to run on 3.5. Maybe this is worth asking about in a separate thread or maybe you already did that. I've changed my writeLockTimeout from 1000 to 1, and my commitLockTimeout from 1 to 5 Running on a large ec2 box, which has 2 virtual cores. I don't know how to Note: *2* *virtual* cores. find out the # of concurrent indexer threads. Is that the same as maxWarmingSearchers? If that's the case I've changed it from 2 to 5. I 2 is better than 5 here have about 12 processes running concurrently to read/write to solr at the moment, but this is just a test and I'm planning to up this number to 50 - 100. Some of these processes are writing to Solr (indexing), others are reading from it (searching). Having more than 1-2 indexing processes on an EC2 box with just 2 *virtual* cores will be suboptimal. Does the error go away if you change your application to have just 1 indexing thread? Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html On Fri, Dec 16, 2011 at 10:14 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi Eric, And you are using the latest version of Solr, 3.5.0? What is the timeout in solrconfig.xml? How many CPU cores does the machine have and how many concurrent indexer threads do you have running? Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html From: Eric Tang eric.x.t...@gmail.com To: solr-user@lucene.apache.org Sent: Friday, December 16, 2011 10:08 AM Subject: Lock obtain timed out Hi, I'm doing a lot reads and writes into a single solr server (on the magnitude of 50ish per second), and have around 300,000 documents in the index. Now every 5 minutes I get this exception: SEVERE: org.apache.lucene.store.LockObtainFailedException: Lock obtain timed out: NativeFSLock@./solr/data/index/write.lock And I have to restart my solr process. I've done some googling, some people have suggested raising the limit for linux file open #, or changing the merge factor, but that didn't work. Does anyone have insights into this? Thanks, Eric
Re: how to setup to archive expired documents?
Hi, We've done a fair number of such things over the years. :) If daily shards don't work for you, why not weekly or monthly? Have a look at Zoie's Hourglass concept/code. Some Solr alternatives are currently better suited to handle this sort of setup... Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html - Original Message - From: Robert Stewart bstewart...@gmail.com To: solr-user@lucene.apache.org Cc: Sent: Thursday, December 15, 2011 12:55 PM Subject: Re: how to setup to archive expired documents? I think managing 100 cores will be too much headache. Also performance of querying 100 cores will not be good (need page_number*page_size from 100 cores, and then merge). I think having around 10 SOLR instances, each one about 10M docs. Always search all 10 nodes. Index using some hash(doc) to distribute new docs among nodes. Run some nightly/weekly job to delete old docs and force merge (optimize) to some min/max number of segments. I think that will work ok, but not sure about how to handle replication/failover so each node is redundant. If we use SOLR replication it will have problems with replication after optimize for large indexes. Seems to take a long time to move 10M doc index from master to slave (around 100GB in our case). Doing it once per week is probably ok. 2011/12/15 Avni, Itamar itamar.a...@verint.com: What about managing a core for each day? This way the deletion/archive is very simple. No holes in the index (which is often when deleting document by document). The index done against core [today-0]. The query is done against cores [today-0],[today-1]...[today-99]. Quite a headache. Itamar -Original Message- From: Robert Stewart [mailto:bstewart...@gmail.com] Sent: יום ה 15 דצמבר 2011 16:54 To: solr-user@lucene.apache.org Subject: how to setup to archive expired documents? We have a large (100M) index where we add about 1M new docs per day. We want to keep index at a constant size so the oldest ones are removed and/or archived each day (so index contains around 100 days of data). What is the best way to do this? We still want to keep older data in some archive index, not just delete it (so is it possible to export older segments, etc. into some other index?). If we have some daily job to delete old data, I assume we'd need to optimize the index to actually remove and free space, but that will require very large (and slow) replication after optimize which will probably not work out well for so large an index. Is there some way to shard the data or other best practice? Thanks Bob This electronic message may contain proprietary and confidential information of Verint Systems Inc., its affiliates and/or subsidiaries. The information is intended to be for the use of the individual(s) or entity(ies) named above. If you are not the intended recipient (or authorized to receive this e-mail for the intended recipient), you may not use, copy, disclose or distribute to anyone this message or any information contained in this message. If you have received this electronic message in error, please notify us by replying to this e-mail.
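If the nightly purge route is taken, the deletion itself can be an ordinary delete-by-query using Solr date math; a sketch assuming a hypothetical indexed date field called doc_date and a 100-day window (POSTed to the /update handler, followed by a commit or optimize as appropriate):

<delete>
  <query>doc_date:[* TO NOW/DAY-100DAYS]</query>
</delete>

This only marks documents as deleted; the space is reclaimed when the affected segments are merged or the index is optimized, which is exactly the expensive step being discussed in this thread.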
Re: Solr AutoComplete - Address Search
Just to add to it, I'm using Suggester component to implement Auto Complete http://wiki.apache.org/solr/Suggester -- View this message in context: http://lucene.472066.n3.nabble.com/Solr-AutoComplete-Address-Search-tp3590112p3592017.html Sent from the Solr - User mailing list archive at Nabble.com.
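For anyone following along, the Suggester setup on that wiki page is a spellcheck-style search component plus a request handler; a sketch along those lines, where the field name address_suggest is just a placeholder for whatever field feeds the suggestions:

<searchComponent name="suggest" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">suggest</str>
    <str name="classname">org.apache.solr.spelling.suggest.Suggester</str>
    <str name="lookupImpl">org.apache.solr.spelling.suggest.tst.TSTLookup</str>
    <str name="field">address_suggest</str>
    <str name="buildOnCommit">true</str>
  </lst>
</searchComponent>

<requestHandler name="/suggest" class="org.apache.solr.handler.component.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">suggest</str>
    <str name="spellcheck.count">10</str>
  </lst>
  <arr name="components">
    <str>suggest</str>
  </arr>
</requestHandler>

Queries then go to /suggest?q=prefix and the completions come back in the spellcheck section of the response.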
Re: Poor performance on distributed search
The thing that jumps out at me is rows=2000. If you documentCache in solrconfig.xml is still the defaults, it only holds 512. So you're running all over your disk gathering up the fields to return, especially since you also specified fl=*,score. And if you have large fields stored, you're doing an awful lot of disk reading. simple tests to see if this is on the right track, try these, singly and in combination. 1 try with rows=10 2 try with fl=id assuming id is your uniqueKey Best Erick On Thu, Dec 15, 2011 at 5:00 PM, ku3ia dem...@gmail.com wrote: Hi, all! I have a problem with distributed search. I downloaded one shard from my production. It has: * ~29M docs * 11 fields * ~105M terms * size of shard is: 13GB On production there are near 30 the same shards. I split this shard to 4 more smaller shards, so now I have: small shard1: docs: 6.2M terms: 27.2M size: 2.89GB small shard2: docs: 6.3M terms: 28.7M size: 2.98GB small shard3: docs: 7.9M terms: 32.8M size: 3.60GB small shard4: docs: 8.2M terms: 32.6M size: 3.70GB My machine confguration: ABIT AX-78 AMD Athlon 64 X2 5200+ DDR2 Kingston 2x2G+2x1G = 6G WDC WD2500JS (System here) WDC WD20EARS (6 partitions = 30 GB for shards at begin of drive, and other empty, all partitions are well aligned) GNU/Linux Debian Squeeze Tomcat 6.0.32 with JAVA_OPTS: JAVA_OPTS=$JAVA_OPTS -XX:+DisableExplicitGC -server \ -XX:PermSize=512M -XX:MaxPermSize=512M -Xmx4096M -Xms4096M -XX:NewSize=128M -XX:MaxNewSize=128M \ -XX:+UseParNewGC -XX:+UseConcMarkSweepGC -XX:+CMSClassUnloadingEnabled \ -XX:CMSInitiatingOccupancyFraction=50 -XX:GCTimeRatio=9 -XX:MinHeapFreeRatio=25 -XX:MaxHeapFreeRatio=25 \ -verbose:gc -XX:+PrintGCTimeStamps -Xloggc:$CATALINA_HOME/logs/gc.log Solr 3.5 I configured 4 cores and start Tomcat. I write a bash script. It's runing during 300 seconds and sending every 6 seconds queries like http://127.0.0.1:8080/solr/shard1/select/?ident=trueq=(assistants)rows=2000start=0fl=*,scoreqt=requestShards where qt=requestShards is my 4 shards. After test I have the results: Elapsed time: 299 secs --- solr --- Queries processed: 21 this is full response file Queries cancelled: 29 this is number of killed curls Average QTime is: 59645.6 ms Average RTime is: 59.7619 sec(s) this is average time difference between start and stop the curl. There is a part of script: # dcs=`date +%s` # curl ${url} -s -H 'Content-type:text/xml; charset=utf-8' ${F_DATADIR}/$dest.fdata # dce=`date +%s` # dcd=$(echo $dce - $dcs | bc) Size of data-dir is: 3346766 bytes this is response dir size I'm using nmon to to monitor R/W disk speed, and I was surprised that read speed of my shards volumes WDC20EAR's drive was nearly 3 MB/s when script is working. After this I run benchmark test from disk utility. Here is results: Minimum read rate: 53.2MB/s Maximum Read rate: 126.4 MB/s Average Read rate: 95.8 MB/s But from the other side I tested queries like http://127.0.0.1:8080/solr/shard1/select/?ident=trueq=(assistants)rows=2000start=0fl=*,score results is: Elapsed time: 299 secs --- solr --- Queries processed: 50 Queries cancelled: 0 Average QTime is: 139.76 ms Average RTime is: 2.2 sec(s) Size of data-dir is: 6819259 bytes and quesries like http://127.0.0.1:8080/solr/shard1/select/?ident=trueq=(assistants)rows=2000start=0fl=*,scoreshards=127.0.0.1:8080/solr/shard1 and result is: Elapsed time: 299 secs --- solr --- Queries processed: 49 Queries cancelled: 1 Average QTime is: 1878.37 ms Average RTime is: 1.95918 sec(s) Size of data-dir is: 4274099 bytes So we see the results are the same. 
My big question is: why is the drive read speed so slow when Solr is working? Thanks for any replies. P.S. And maybe my general problem is too many terms in a shard; for example, the query http://127.0.0.1:8080/solr/shard1/terms?terms.fl=field1 shows:

<lst name="field1">
  <int name="a">58641</int>
  <int name="the">45022</int>
  <int name="i">36339</int>
  <int name="s">35637</int>
  <int name="d">34247</int>
  <int name="m">33869</int>
  <int name="b">28961</int>
  <int name="r">28147</int>
  <int name="e">27654</int>
  <int name="n">26940</int>
</lst>

Thanks. -- View this message in context: http://lucene.472066.n3.nabble.com/Poor-performance-on-distributed-search-tp3590028p3590028.html Sent from the Solr - User mailing list archive at Nabble.com.
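Since Erick's reply above points at the documentCache, here is where that setting lives in solrconfig.xml. This is only a sketch, with a size picked to comfortably hold rows=2000 per shard rather than the default 512; the right number depends on how large the stored fields are and how much heap is available:

<documentCache class="solr.LRUCache"
               size="8192"
               initialSize="2048"
               autowarmCount="0"/>

Trimming fl down to only the fields actually needed reduces the cost of each cache miss as well, which is the other half of the suggestion.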
Re: how to setup to archive expired documents?
We actually have a system that uses weekly shards but that is all .NET (Lucene.NET) and has lots of code to manage adding new indexes. We want to move to SOLR for performance and maintenance reasons. So if we use some sort of weekly or daily sharding, there needs to be some mechanism in place to dynamically add the new shard when the current one fills up. (Which would also ideally know where to put the new shards on what server, etc.) Since SOLR does not implement that I was thinking of just having a static set of shards. On Dec 16, 2011, at 10:54 AM, Otis Gospodnetic wrote: Hi, We've done a fair number of such things over the years. :) If daily shards don't work for you, why not weekly or monthly? Have a look at Zoie's Hourglass concept/code. Some Solr alternatives are currently better suited to handle this sort of setup... Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html - Original Message - From: Robert Stewart bstewart...@gmail.com To: solr-user@lucene.apache.org Cc: Sent: Thursday, December 15, 2011 12:55 PM Subject: Re: how to setup to archive expired documents? I think managing 100 cores will be too much headache. Also performance of querying 100 cores will not be good (need page_number*page_size from 100 cores, and then merge). I think having around 10 SOLR instances, each one about 10M docs. Always search all 10 nodes. Index using some hash(doc) to distribute new docs among nodes. Run some nightly/weekly job to delete old docs and force merge (optimize) to some min/max number of segments. I think that will work ok, but not sure about how to handle replication/failover so each node is redundant. If we use SOLR replication it will have problems with replication after optimize for large indexes. Seems to take a long time to move 10M doc index from master to slave (around 100GB in our case). Doing it once per week is probably ok. 2011/12/15 Avni, Itamar itamar.a...@verint.com: What about managing a core for each day? This way the deletion/archive is very simple. No holes in the index (which is often when deleting document by document). The index done against core [today-0]. The query is done against cores [today-0],[today-1]...[today-99]. Quite a headache. Itamar -Original Message- From: Robert Stewart [mailto:bstewart...@gmail.com] Sent: יום ה 15 דצמבר 2011 16:54 To: solr-user@lucene.apache.org Subject: how to setup to archive expired documents? We have a large (100M) index where we add about 1M new docs per day. We want to keep index at a constant size so the oldest ones are removed and/or archived each day (so index contains around 100 days of data). What is the best way to do this? We still want to keep older data in some archive index, not just delete it (so is it possible to export older segments, etc. into some other index?). If we have some daily job to delete old data, I assume we'd need to optimize the index to actually remove and free space, but that will require very large (and slow) replication after optimize which will probably not work out well for so large an index. Is there some way to shard the data or other best practice? Thanks Bob This electronic message may contain proprietary and confidential information of Verint Systems Inc., its affiliates and/or subsidiaries. The information is intended to be for the use of the individual(s) or entity(ies) named above. 
Re: edismax doesn't obey 'pf' parameter
A side note: specifying qt and defType on the same query is probably not what you intend. I'd just omit the qt bit since you're essentially passing all the info you intend explicitly... I see the same behavior when I specify a non-tokenized field in 3.5. But I don't think this is a bug, since it doesn't make sense to specify a phrase field on a non-tokenized field: there's always exactly one token at position 0. The whole idea of phrases is that multiple tokens must appear within the slop. Best Erick On Thu, Dec 15, 2011 at 5:46 PM, entdeveloper cameron.develo...@gmail.com wrote: I'm observing strange results with both the correct and incorrect behavior happening depending on which field I put in the 'pf' param. I wouldn't think this should be analyzer specific, but is it? If I try: http://localhost:8080/solr/collection1/select?qt=%2Fsearch&q=mickey%20mouse&debugQuery=on&defType=edismax&pf=blah_exact&qf=blah It looks correct:

<str name="rawquerystring">mickey mouse</str>
<str name="querystring">mickey mouse</str>
<str name="parsedquery">+((DisjunctionMaxQuery((blah:mickey)) DisjunctionMaxQuery((blah:mouse)))~2) DisjunctionMaxQuery((blah_exact:mickey mouse))</str>
<str name="parsedquery_toString">+(((blah:mickey) (blah:mouse))~2) (blah_exact:mickey mouse)</str>

However, if I put in the field I want, for some reason that phrase portion of the query just completely drops off: http://localhost:8080/solr/collection1/select?qt=%2Fsearch&q=mickey%20mouse&debugQuery=on&defType=edismax&pf=name_exact&qf=name Results:

<str name="rawquerystring">mickey mouse</str>
<str name="querystring">mickey mouse</str>
<str name="parsedquery">+((DisjunctionMaxQuery((name:mickey)) DisjunctionMaxQuery((name:mouse)))~2) ()</str>
<str name="parsedquery_toString">+(((name:mickey) (name:mouse))~2) ()</str>

The name_exact field's analyzer uses KeywordTokenizer, but again, I think this query is being formed too early in the process for that to matter at this point. -- View this message in context: http://lucene.472066.n3.nabble.com/edismax-doesn-t-obey-pf-parameter-tp3589763p3590153.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Version Upgrade issue
Please start another thread and provide some details, there's not enough information here to say anything. You might review: http://wiki.apache.org/solr/UsingMailingLists Best Erick On Thu, Dec 15, 2011 at 11:50 PM, Pawan Darira pawan.dar...@gmail.com wrote: Thanks. I re-started from scratch at least things have started working now. I upgraded by deploying 3.2 war in my jboss. Also, did conf changes as mentioned in CHANGES.txt It did expected to have a separate libdirectory which was not required in 1.4. New problem is that it's taking very long to build indexes more than an hour. it took only 10 minutes in 1.4. Can u please guide regarding this. Should i attach my solrconfig.xml for reference On Wed, Dec 7, 2011 at 8:22 PM, Erick Erickson erickerick...@gmail.comwrote: How did you upgrade? What steps did you follow? Do you have any custom code? Any additional lib entries in your solrconfig.xml? These details help us diagnose your problem, but it's almost certainly that you have a mixture of jar files lying around your machine in a place you don't expect. Best Erick On Wed, Dec 7, 2011 at 1:28 AM, Pawan Darira pawan.dar...@gmail.com wrote: I checked that. there are only latest jars. I am not able to figure out the issue. On Tue, Dec 6, 2011 at 6:57 PM, Mark Miller markrmil...@gmail.com wrote: Looks like you must have a mix of old and new jars. On Tuesday, December 6, 2011, Pawan Darira pawan.dar...@gmail.com wrote: Hi I am trying to upgrade my SOLR version from 1.4 to 3.2. but it's giving me below exception. I have checked solr home path it is correct.. Please help SEVERE: Could not start Solr. Check solr/home property java.lang.NoSuchMethodError: org.apache.solr.common.SolrException.logOnce(Lorg/slf4j/Logger;Ljava/lang/String;Ljava/lang/Throwable;)V at org.apache.solr.core.CoreContainer.load(CoreContainer.java:321) at org.apache.solr.core.CoreContainer.load(CoreContainer.java:207) at org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:130) at org.apache.solr.servlet.SolrDispatchFilter.init(SolrDispatchFilter.java:94) at org.apache.catalina.core.ApplicationFilterConfig.getFilter(ApplicationFilterConfig.java:275) at org.apache.catalina.core.ApplicationFilterConfig.setFilterDef(ApplicationFilterConfig.java:397) at org.apache.catalina.core.ApplicationFilterConfig.init(ApplicationFilterConfig.java:108) at org.apache.catalina.core.StandardContext.filterStart(StandardContext.java:3720) at org.apache.catalina.core.StandardContext.start(StandardContext.java:4358) at org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:752) at org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:732) at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:553) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:585) at org.apache.tomcat.util.modeler.BaseModelMBean.invoke(BaseModelMBean.java:297) at org.jboss.mx.server.RawDynamicInvoker.invoke(RawDynamicInvoker.java:164) at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:659) at org.apache.catalina.core.StandardContext.init(StandardContext.java:5300) at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39) at 
sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25) at java.lang.reflect.Method.invoke(Method.java:585) at org.apache.tomcat.util.modeler.BaseModelMBean.invoke(BaseModelMBean.java:297) at org.jboss.mx.server.RawDynamicInvoker.invoke(RawDynamicInvoker.java:164) at org.jboss.mx.server.MBeanServerImpl.invoke(MBeanServerImpl.java:659) at org.jboss.web.tomcat.service.TomcatDeployer.performDeployInternal(TomcatDeployer.java:301) at org.jboss.web.tomcat.service.TomcatDeployer.performDeploy(TomcatDeployer.java:104) at org.jboss.web.AbstractWebDeployer.start(AbstractWebDeployer.java:375) at org.jboss.web.WebModule.startModule(WebModule.java:83) -- - Mark http://www.lucidimagination.com -- Thanks, Pawan
Re: edismax doesn't obey 'pf' parameter
That was a little confusing! there's always exactly one token at position 0. Of course. What I meant to say was there is always exactly one token in a non-tokenized field and it's offset is always exactly 0. There will never be tokens at position 1. So asking to match phrases, which is based on term positions is basically a no-op. Hope that makes more sense Erick On Fri, Dec 16, 2011 at 11:44 AM, Erick Erickson erickerick...@gmail.com wrote: A side note: specifying qt and defType on the same query is probably not what you intend. I'd just omit the qt bit since you're essentially passing all the info you intend explicitly... I see the same behavior when I specify a non-tokenized field in 3.5 But I don't think this is a bug since it doesn't make sense to specify a phrase field on a non-tokenized field since there's always exactly one token at position 0. The whole idea of phrases is that multiple tokens must appear within the slop. Best Erick On Thu, Dec 15, 2011 at 5:46 PM, entdeveloper cameron.develo...@gmail.com wrote: I'm observing strange results with both the correct and incorrect behavior happening depending on which field I put in the 'pf' param. I wouldn't think this should be analyzer specific, but is it? If I try: http://localhost:8080/solr/collection1/select?qt=%2Fsearchq=mickey%20mousedebugQuery=ondefType=edismaxpf=blah_exactqf=blah It looks correct: str name=rawquerystringmickey mouse/str str name=querystringmickey mouse/str str name=parsedquery+((DisjunctionMaxQuery((blah:mickey)) DisjunctionMaxQuery((blah:mouse)))~2) DisjunctionMaxQuery((blah_exact:mickey mouse))/str str name=parsedquery_toString+(((blah:mickey) (blah:mouse))~2) (blah_exact:mickey mouse)/str However, If I put in the field I want, for some reason that phrase portion of the query just completely drops off: http://localhost:8080/solr/collection1/select?qt=%2Fsearchq=mickey%20mousedebugQuery=ondefType=edismaxpf=name_exactqf=name Results: str name=rawquerystringmickey mouse/str str name=querystringmickey mouse/str str name=parsedquery+((DisjunctionMaxQuery((name:mickey)) DisjunctionMaxQuery((name:mouse)))~2) ()/str str name=parsedquery_toString+(((name:mickey) (name:mouse))~2) ()/str The name_exact field's analyzer uses KeywordTokenizer, but again, I think this query is being formed too early in the process for that to matter at this point -- View this message in context: http://lucene.472066.n3.nabble.com/edismax-doesn-t-obey-pf-parameter-tp3589763p3590153.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Announcement of Soldash - a dashboard for multiple Solr instances
Nice! May be good to upload some screenshots there... Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html - Original Message - From: Alexander Valet | edelight alexander.va...@edelight.de To: solr-user@lucene.apache.org Cc: Sent: Thursday, December 15, 2011 9:50 AM Subject: Announcement of Soldash - a dashboard for multiple Solr instances We use Solr quite a bit at edelight -- and love it. However, we encountered one minor peeve: although each individual Solr server has its own dashboard, there's no easy way of getting a complete overview of an entire Solr cluster and the status of its nodes. Over the last weeks our own Aengus Walton developed Soldash, a dashboard for your entire Solr cluster. Although still in its infancy, Soldash gives you an overview of: - your Solr servers - what version of Solr they're running - what index version they have, and whether slaves are in sync with their master as well as allowing you to: - turn polling and replication on or off - force an index fetch on a slave - display a file list of the current index - backup the index - reload the index It is worth noting that due to the set-up of our own environment, Soldash has been programmed to automatically presume all Solr instances have the same cores. This may change in future releases, depending on community reaction. The project is open-source and hopefully some of you shall find this tool useful in day-to-day administration of Solr. The newest version (0.2.2) can be downloaded at: https://github.com/edelight/soldash/tags Instructions on how to configure Soldash can be found at the project's homepage on github: https://github.com/edelight/soldash Feedback and suggestions are very welcome! -- edelight GmbH, Wilhelmstr. 4a, 70182 Stuttgart Fon: +49 (0)711-912590-14 | Fax: +49 (0)711-912590-99 Geschäftsführer: Peter Ambrozy, Tassilo Bestler Amtsgericht Stuttgart, HRB 722861 Ust.-IdNr. DE814842587 Diese E-Mail ist vertraulich. Wenn Sie nicht der rechtmäßige Empfänger sind, dürfen Sie den Inhalt weder kopieren noch verbreiten oder benutzen. Sollten Sie diese E-Mail versehentlich erhalten haben, senden Sie sie bitte an uns zurück und löschen Sie sie anschließend. This email is confidential. If you are not the intended recipient, you must not copy, disclose or use its contents. If you have received it in error, please inform us immediately by return email and delete the document.
Re: Core overhead
Wow the shameless plugging of product (footer) has hit a new low Otis. On Fri, Dec 16, 2011 at 7:32 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi Yury, Not sure if this was already covered in this thread, but with N smaller cores on a single N-CPU-core box you could run N queries in parallel over smaller indices, which may be faster than a single query going against a single big index, depending on how many concurrent query requests the box is handling (i.e. how busy or idle the CPU cores are). Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html From: Yury Kats yuryk...@yahoo.com To: solr-user@lucene.apache.org Sent: Thursday, December 15, 2011 12:58 PM Subject: Core overhead Does anybody have an idea, or better yet, measured data, to see what the overhead of a core is, both in memory and speed? For example, what would be the difference between having 1 core with 100M documents versus having 10 cores with 10M documents?
updates to runbot.sh script
http://wiki.apache.org/nutch/Crawl This script no longer works. See:

echo - Index (Step 5 of $steps) -
$NUTCH_HOME/bin/nutch index crawl/NEWindexes crawl/crawldb crawl/linkdb \
  crawl/segments/*

The index call doesn't exist, so what does this line get replaced with? Is there an updated runbot.sh script? Has anyone created a new one that will work? I've made some changes to it, but I just don't know what to do for this part. Thanks! -- Chris
Re: Poor performance on distributed search
Hi, Erick, thanks for your reply Yeah, you are right - document cache is default, but I tried to decrease and increase values but I didn't get the desired result. I tried the tests. Here are results:

1) try with rows=10
successfully started at 19:48:34
Queries interval is: 10 queries per minute
http://127.0.0.1:8080/solr/shard1/select/?ident=true&q=(gulping)&rows=10&start=0&fl=*,score&qt=requestShards
...
http://127.0.0.1:8080/solr/shard1/select/?ident=true&q=(tabors)&rows=10&start=0&fl=*,score&qt=requestShards
utility successfully stopped at 19:53:33
Elapsed time: 299 secs
--- solr ---
Queries processed: 50
Queries cancelled: 0
Average QTime is: 764 ms
Average RTime is: 0.68 sec(s)
Size of data-dir is: 235784 bytes

2) try with fl=id assuming id is your uniqueKey
successfully started at 19:56:23
Queries interval is: 10 queries per minute
http://127.0.0.1:8080/solr/shard1/select/?ident=true&q=(psyche's)&rows=2000&start=0&fl=RecordID&qt=requestShards
...
http://127.0.0.1:8080/solr/shard1/select/?ident=true&q=(betook)&rows=2000&start=0&fl=RecordID&qt=requestShards
utility successfully stopped at 20:01:24
Elapsed time: 301 secs
--- solr ---
Queries processed: 15
Queries cancelled: 35
Average QTime is: 52775.7 ms
Average RTime is: 53.2667 sec(s)
Size of data-dir is: 212978 bytes

In first test disk usage by nmon: ~30-40% and in the second - 100%. Drive read speed starting from 3-5 MB/s and falls to 500-700 KB/s in both tests. Have you any ideas? -- View this message in context: http://lucene.472066.n3.nabble.com/Poor-performance-on-distributed-search-tp3590028p3592364.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Core overhead
I thought it was slightly clumsy, but it was informative. It seemed like a fine thing to say. Effectively it was I/we have developed a tool that will help you solve your problem. That is responsive to the OP and it is clear that it is a commercial deal. On Fri, Dec 16, 2011 at 10:02 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Wow the shameless plugging of product (footer) has hit a new low Otis. On Fri, Dec 16, 2011 at 7:32 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi Yury, Not sure if this was already covered in this thread, but with N smaller cores on a single N-CPU-core box you could run N queries in parallel over smaller indices, which may be faster than a single query going against a single big index, depending on how many concurrent query requests the box is handling (i.e. how busy or idle the CPU cores are). Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html From: Yury Kats yuryk...@yahoo.com To: solr-user@lucene.apache.org Sent: Thursday, December 15, 2011 12:58 PM Subject: Core overhead Does anybody have an idea, or better yet, measured data, to see what the overhead of a core is, both in memory and speed? For example, what would be the difference between having 1 core with 100M documents versus having 10 cores with 10M documents?
Re: updates to runbot.sh script
: http://wiki.apache.org/nutch/Crawl : : This script no longer works. See: If you have a question about something on the nutch wiki, or included in the nutch release, i would suggest you email the nutch user list. -Hoss
Re: updates to runbot.sh script
Ha, sorry Hoss. Thought i hit user@nutch, gmail did the replace and I wasn't paying attention. -- Chris On Fri, Dec 16, 2011 at 2:46 PM, Chris Hostetter hossman_luc...@fucit.org wrote: : http://wiki.apache.org/nutch/Crawl : : This script no longer works. See: If you have a question about something on the nutch wiki, or included in the nutch release, i would suggest you email the nutch user list. -Hoss
Re: Core overhead
Ted, ...- FREE! is stupid idiot spam. It's annoying and not suitable. On Fri, Dec 16, 2011 at 11:45 AM, Ted Dunning ted.dunn...@gmail.com wrote: I thought it was slightly clumsy, but it was informative. It seemed like a fine thing to say. Effectively it was I/we have developed a tool that will help you solve your problem. That is responsive to the OP and it is clear that it is a commercial deal. On Fri, Dec 16, 2011 at 10:02 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Wow the shameless plugging of product (footer) has hit a new low Otis. On Fri, Dec 16, 2011 at 7:32 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi Yury, Not sure if this was already covered in this thread, but with N smaller cores on a single N-CPU-core box you could run N queries in parallel over smaller indices, which may be faster than a single query going against a single big index, depending on how many concurrent query requests the box is handling (i.e. how busy or idle the CPU cores are). Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html From: Yury Kats yuryk...@yahoo.com To: solr-user@lucene.apache.org Sent: Thursday, December 15, 2011 12:58 PM Subject: Core overhead Does anybody have an idea, or better yet, measured data, to see what the overhead of a core is, both in memory and speed? For example, what would be the difference between having 1 core with 100M documents versus having 10 cores with 10M documents?
Re: Core overhead
Sounds like we disagree. On Fri, Dec 16, 2011 at 11:56 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Ted, ...- FREE! is stupid idiot spam. It's annoying and not suitable. On Fri, Dec 16, 2011 at 11:45 AM, Ted Dunning ted.dunn...@gmail.com wrote: I thought it was slightly clumsy, but it was informative. It seemed like a fine thing to say. Effectively it was I/we have developed a tool that will help you solve your problem. That is responsive to the OP and it is clear that it is a commercial deal. On Fri, Dec 16, 2011 at 10:02 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Wow the shameless plugging of product (footer) has hit a new low Otis. On Fri, Dec 16, 2011 at 7:32 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi Yury, Not sure if this was already covered in this thread, but with N smaller cores on a single N-CPU-core box you could run N queries in parallel over smaller indices, which may be faster than a single query going against a single big index, depending on how many concurrent query requests the box is handling (i.e. how busy or idle the CPU cores are). Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html From: Yury Kats yuryk...@yahoo.com To: solr-user@lucene.apache.org Sent: Thursday, December 15, 2011 12:58 PM Subject: Core overhead Does anybody have an idea, or better yet, measured data, to see what the overhead of a core is, both in memory and speed? For example, what would be the difference between having 1 core with 100M documents versus having 10 cores with 10M documents?
Re: Poor performance on distributed search
OK, so your speed differences are pretty much dependent upon whether you specify rows=2000 or rows=10, right? Why do you need 2,000 rows? Or is the root question why there's such a difference when you specify qt=requestShards? In which case I'm curious to see that request handler definition... Best Erick On Fri, Dec 16, 2011 at 1:38 PM, ku3ia dem...@gmail.com wrote: Hi, Erick, thanks for your reply Yeah, you are right - document cache is default, but I tried to decrease and increase values but I didn't get the desired result. I tried the tests. Here are results: 1 try with rows=10 successfully started at 19:48:34 Queries interval is: 10 queries per minute http://127.0.0.1:8080/solr/shard1/select/?ident=trueq=(gulping)rows=10start=0fl=*,scoreqt=requestShards ... http://127.0.0.1:8080/solr/shard1/select/?ident=trueq=(tabors)rows=10start=0fl=*,scoreqt=requestShards utility successfully stopped at 19:53:33 Elapsed time: 299 secs --- solr --- Queries processed: 50 Queries cancelled: 0 Average QTime is: 764 ms Average RTime is: 0.68 sec(s) Size of data-dir is: 235784 bytes 2 try with fl=id assuming id is your uniqueKey successfully started at 19:56:23 Queries interval is: 10 queries per minute http://127.0.0.1:8080/solr/shard1/select/?ident=trueq=(psyche's)rows=2000start=0fl=RecordIDqt=requestShards ... http://127.0.0.1:8080/solr/shard1/select/?ident=trueq=(betook)rows=2000start=0fl=RecordIDqt=requestShards utility successfully stopped at 20:01:24 Elapsed time: 301 secs --- solr --- Queries processed: 15 Queries cancelled: 35 Average QTime is: 52775.7 ms Average RTime is: 53.2667 sec(s) Size of data-dir is: 212978 bytes In first test disk usage by nmon: ~30-40% and in the second - 100%. Drive read speed starting from 3-5 MB/s and falls to 500-700 KB/s in both tests. Have you any ideas? -- View this message in context: http://lucene.472066.n3.nabble.com/Poor-performance-on-distributed-search-tp3590028p3592364.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Possible to facet across two indices, or document types in single index?
: Chris, you replied: : : : But there is a workaround: : : 1) Do a normal query without facets (you only need to request doc ids : : at this point) : : 2) Collect all the IDs of the documents returned : : 3) Do a second query for all fields and facets, adding a filter to : : restrict result to those IDs collected in step 2. FYI: that was actually Erick's suggestion, i just pointed out that #3 wasn't neccessary if you *only* care about the docs on page #1 ... but given thta in your situation you really need data from two different collections, it's a much differnet problem. : When the initial search query comes in, I can do 1-3 above as you : describe. I have fewer than 200K documents in the index. Given the : generalness of the search terms, let's say I get 7500 document IDs back : per 1 and 2. It sounds like I need to create a filter query which : includes all 7500 IDs, and issue the 2nd query (in my case to another : core) and have it facet on the additional field(s) I'm interested in. : I don't need to return results from this, just get the facet : values/counts. so far so good -- what you are describing is exactly what Join does (or specificly: what it was designed to do except for the anoying bug in how it parses the query) except that you are choosing to ignore the results and only look at the facet counts. : Step 4 for me is to search the first index again, to obtain the : requested number of rows of results, return the appropriate fields, and : calculate facets for that content. I can then merge the facet results : of both indexes, and the client is none the wiser. here's where you've lost me... how are you going to merge the facet counts from the two cores? you could just lump them all in together (fieldA1 and fieldA2 from coreA, in a map with fieldB1 and fieldB2 from coreB) but they are counting ocmpleltey differnet things from comletely differnet cores -- if your main result set is from coreA, but you also show these facet counts based on the join against coreB, the constraint counts for values from fieldB2 aren't going to mean much relative to the results you return. I mean: consider a concrete example of having a books core and an authors core - wher every book has a field identifying the author by id. if a user searches for authors who live in oregon, and then you get that list of 98 authors, and join them against the books core and facet on genre you can return some data like this... Genre: Biography: 1023 Romance: 854 Mystery: 674 ... ...but thta doesn't really tell you anything about the author documents you are returning does it? you know that some subset of those 98 authors wrote a total of 854 romance novels, but is that actaully useful in some way? I suspect what you really want is to know the number of *authors* who have written books in each of those genres -- and nothing you've described so far will get you that. (once again, we're back to the issue of denormalizing) Setting asside that issue for a moment... : A couple questions though (aren't there always? :)) Is this very : efficient? Beyond building the string of 7500 IDs within my app, can : Solr swallow that okay? I'm using SolrJ, javabin format, so hopefully : there is not a URL length issue (between my app and Solr)? I'm guessing : javabin uses HTTP POST. efficient is vauge... it can be done, but there's a lot of data going over the wire. 
it would probably be more efficient to do this server side in a custom request handler (similar to how Join works).

: What is a reasonable way for the facets derived from the 2nd index to be used for narrowing like those in the main content index? That is, pinning down facet values from the second index is not going to affect the results (document IDs) from searching the first index. Perhaps that

Now we're back to the problem I mentioned before, except you're describing it at the moment when a person attempts to filter on a facet constraint -- but as I've pointed out, you already have to deal with this just to generate the list of facet constraints and their counts. -Hoss
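[Editorial note: for readers following along, here is a minimal SolrJ sketch of the two-query workaround discussed above (query one core for matching IDs, then facet on a second core filtered to those IDs). It is not code from the thread; the core URLs, the "id" and "author_id" field names, and the "genre" facet field are illustrative assumptions, and the query is sent as a POST to sidestep the URL-length concern raised above.]

import java.util.ArrayList;
import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class CrossCoreFacetSketch {
  public static void main(String[] args) throws Exception {
    // Hypothetical core URLs -- adjust to your own deployment.
    SolrServer authors = new CommonsHttpSolrServer("http://localhost:8983/solr/authors");
    SolrServer books   = new CommonsHttpSolrServer("http://localhost:8983/solr/books");

    // Steps 1+2: run the user query against the first core, fetching only ids.
    SolrQuery idQuery = new SolrQuery("state:oregon");
    idQuery.setFields("id");      // assumes "id" is the uniqueKey
    idQuery.setRows(10000);       // large enough to cover the full result set
    List<String> ids = new ArrayList<String>();
    for (SolrDocument doc : authors.query(idQuery).getResults()) {
      ids.add(doc.getFieldValue("id").toString());
    }

    // Step 3: facet on the second core, restricted to docs that reference those ids.
    // Assumes the ids are simple tokens that need no query escaping.
    StringBuilder fq = new StringBuilder("author_id:(");
    for (int i = 0; i < ids.size(); i++) {
      if (i > 0) fq.append(" OR ");
      fq.append(ids.get(i));
    }
    fq.append(")");

    SolrQuery facetQuery = new SolrQuery("*:*");
    facetQuery.setRows(0);              // only facet counts are needed
    facetQuery.setFacet(true);
    facetQuery.addFacetField("genre");
    facetQuery.addFilterQuery(fq.toString());
    // POST keeps the long filter query out of the request URL.
    QueryResponse rsp = books.query(facetQuery, SolrRequest.METHOD.POST);
    System.out.println(rsp.getFacetField("genre").getValues());
  }
}

As Hoss notes above, the counts this produces are counts of coreB documents, not of the coreA documents you actually return, so whether they are meaningful depends entirely on your data model.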
Re: SolrCloud Cores
On Fri, Dec 16, 2011 at 8:14 AM, Jamie Johnson jej2...@gmail.com wrote: What is the most appropriate way to configure Solr when deploying in a cloud environment? Should the core name on all instances be the collection name, or is it more appropriate that each shard be a separate core, or should each solr instance be a separate core (i.e. master1, master1-replica are 2 separate cores)?

At this point, it's probably best/easiest to name them after the collection. -- - Mark http://www.lucidimagination.com
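[Editorial note: as an illustration of the naming Mark suggests (not from the thread), a solr.xml along these lines would give every node a core named after the collection; the collection name and instanceDir are assumptions:]

<solr persistent="true">
  <cores adminPath="/admin/cores" defaultCoreName="collection1">
    <!-- one core per node, named after the collection it belongs to -->
    <core name="collection1" instanceDir="collection1" />
  </cores>
</solr>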
Re: Core overhead
Ted, The list would be unreadable if everyone spammed at the bottom their email like Otis'. It's just bad form. Jason On Fri, Dec 16, 2011 at 12:00 PM, Ted Dunning ted.dunn...@gmail.com wrote: Sounds like we disagree. On Fri, Dec 16, 2011 at 11:56 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Ted, ...- FREE! is stupid idiot spam. It's annoying and not suitable. On Fri, Dec 16, 2011 at 11:45 AM, Ted Dunning ted.dunn...@gmail.com wrote: I thought it was slightly clumsy, but it was informative. It seemed like a fine thing to say. Effectively it was I/we have developed a tool that will help you solve your problem. That is responsive to the OP and it is clear that it is a commercial deal. On Fri, Dec 16, 2011 at 10:02 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Wow the shameless plugging of product (footer) has hit a new low Otis. On Fri, Dec 16, 2011 at 7:32 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi Yury, Not sure if this was already covered in this thread, but with N smaller cores on a single N-CPU-core box you could run N queries in parallel over smaller indices, which may be faster than a single query going against a single big index, depending on how many concurrent query requests the box is handling (i.e. how busy or idle the CPU cores are). Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html From: Yury Kats yuryk...@yahoo.com To: solr-user@lucene.apache.org Sent: Thursday, December 15, 2011 12:58 PM Subject: Core overhead Does anybody have an idea, or better yet, measured data, to see what the overhead of a core is, both in memory and speed? For example, what would be the difference between having 1 core with 100M documents versus having 10 cores with 10M documents?
Re: Core overhead
We still disagree. On Fri, Dec 16, 2011 at 12:29 PM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Ted, The list would be unreadable if everyone spammed at the bottom their email like Otis'. It's just bad form. Jason On Fri, Dec 16, 2011 at 12:00 PM, Ted Dunning ted.dunn...@gmail.com wrote: Sounds like we disagree. On Fri, Dec 16, 2011 at 11:56 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Ted, ...- FREE! is stupid idiot spam. It's annoying and not suitable. On Fri, Dec 16, 2011 at 11:45 AM, Ted Dunning ted.dunn...@gmail.com wrote: I thought it was slightly clumsy, but it was informative. It seemed like a fine thing to say. Effectively it was I/we have developed a tool that will help you solve your problem. That is responsive to the OP and it is clear that it is a commercial deal. On Fri, Dec 16, 2011 at 10:02 AM, Jason Rutherglen jason.rutherg...@gmail.com wrote: Wow the shameless plugging of product (footer) has hit a new low Otis. On Fri, Dec 16, 2011 at 7:32 AM, Otis Gospodnetic otis_gospodne...@yahoo.com wrote: Hi Yury, Not sure if this was already covered in this thread, but with N smaller cores on a single N-CPU-core box you could run N queries in parallel over smaller indices, which may be faster than a single query going against a single big index, depending on how many concurrent query requests the box is handling (i.e. how busy or idle the CPU cores are). Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html From: Yury Kats yuryk...@yahoo.com To: solr-user@lucene.apache.org Sent: Thursday, December 15, 2011 12:58 PM Subject: Core overhead Does anybody have an idea, or better yet, measured data, to see what the overhead of a core is, both in memory and speed? For example, what would be the difference between having 1 core with 100M documents versus having 10 cores with 10M documents?
Re: Poor performance on distributed search
OK, so your speed differences are pretty much dependent upon whether you specify rows=2000 or rows=10, right? Why do you need 2,000 rows?

Yes, the big difference is 10 vs. 2K records. The limit of 2K rows is set by my manager and I can't decrease it. It is the minimum row count needed to process the data.

Or is the root question why there's such a difference when you specify qt=requestShards? In which case I'm curious to see that request handler definition...

<requestHandler name="requestShards" class="solr.SearchHandler" default="false">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="shards">127.0.0.1:8080/solr/shard1,127.0.0.1:8080/solr/shard2,127.0.0.1:8080/solr/shard3,127.0.0.1:8080/solr/shard4</str>
  </lst>
</requestHandler>

This request handler is defined in shard1's solrconfig. -- View this message in context: http://lucene.472066.n3.nabble.com/Poor-performance-on-distributed-search-tp3590028p3592734.html Sent from the Solr - User mailing list archive at Nabble.com.
Retrieving Documents
I've been doing a fair amount of reading and experimenting with Solr lately. I find that it does a good job of indexing very structured documents. However, the application I have in mind is built around long EPUB documents. Of course, I found the Extract components useful for indexing the EPUBs. However, I would like to be able to
* Size the highlighted portion of text around the query parameters (i.e. show 20 or 30 words), and
* Retrieve a location within the document so I can display that page from the EPUB.
What is common practice for these? I notice that if I have a list of (short) text segments in fields, they are stored without too much fuss and are retrievable. However, I'm talking about a field of potentially hundreds of words. Thanks for any pointers, Dan -- Dan McGinn-Combs dgco...@gmail.com Peachtree City, Georgia USA
Call RequestHandler from QueryComponent
Hi! I have a solrconfig.xml like:

<requestHandler name="/ABC" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">all</str>
    <int name="start">0</int>
    <int name="rows">10</int>
    <str name="wt">ABC</str>
    <str name="sort">score desc,rating asc</str>
    <str name="fq">CUSTOM FQ</str>
    <str name="version">2.2</str>
    <str name="fl">CUSTOM FL</str>
  </lst>
  <arr name="components">
    <str>validate</str>
    <str>CUSTOM ABC QUERY COMPONENT</str>
    <str>stats</str>
    <str>debug</str>
  </arr>
</requestHandler>

<requestHandler name="/XYZ" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="echoParams">all</str>
    <int name="start">0</int>
    <int name="rows">1</int>
    <str name="wt">XYZ</str>
    <str name="sort">score desc</str>
    <str name="fl">CUSTOM FL</str>
    <str name="version">2.2</str>
    <str name="defType">edismax</str>
    <float name="tie">1</float>
    <str name="qf">CUSTOM QF</str>
    <str name="qs">0</str>
    <str name="mm">1</str>
    <str name="q.alt">*:*</str>
  </lst>
  <arr name="components">
    <str>validate</str>
    <str>CUSTOM XYZ QUERY COMPONENT</str>
    <str>stats</str>
    <str>debug</str>
  </arr>
</requestHandler>

In ABC QUERY COMPONENT, I customize prepare() and process(). In its process() I want to call the /XYZ request handler and include those results in the results for ABC. Is that possible? I know the org.apache.solr.spelling.SpellCheckCollator calls a QueryComponent and invokes prepare and process on it, but I want to invoke the request handler directly. It'd be silly to use SolrJ since both handlers are in the same core. Any suggestions? Thanks! Maria
Re: r1201855 broke stats.facet on long fields
Wow ... either I'm a huge idiot and everyone has just been really polite about it in most threads, or something about this thread in particular made me really stupid. (Luis: I'm sorry for all the things I have said so far in this email thread that were a complete waste of your time - hopefully this email will make up for it)

Idiocy #1...
: Solr can not reasonably compute stats on a multivalued field
: Wasn't that added here? https://issues.apache.org/jira/browse/SOLR-1380

Yes, correct. I didn't realize that functionality had ever been added, but it was, and it does still work just fine in Solr 3.5 (you can see this in any of the StatsComponentTest methods that call doTestMVFieldStatisticsResult).

Idiocy #2...
Subject: Re: r1201855 broke stats.facet on long fields
...in spite of this subject, and multiple references to stats.facet in Luis's original email, I completely overlooked the entire crux of Luis's problem. I thought the issue was that he couldn't get *stats* on a multi-valued field; I didn't realize that it was the stats.facet param that had started failing for him in Solr 3.5.

I believe that the intention of the code Luis quoted, which was committed as part of SOLR-1023 in r1201855, was actually to pre-emptively avoid the problems mentioned in SOLR-1782 (which Luis actually mentioned, and I *still* didn't realize this was about stats.facet - Idiocy #3) ...

if (facetFieldType.isTokenized() || facetFieldType.isMultiValued()) {
  throw new SolrException(SolrException.ErrorCode.BAD_REQUEST, ...

...given the way the stats faceting code works, that sanity check does make sense, and seems like a good idea. but the crux of the issue in Luis's case...

<fieldType name="long" class="solr.TrieLongField" precisionStep="0" omitNorms="true" positionIncrementGap="0"/>
<field name="ts" type="long" indexed="true" stored="true" required="true"/>

...seems to be that it is the isTokenized() test that's failing (and *not* the isMultiValued() check that I immediately assumed - Idiocy #4), because TrieField.isTokenized() is hardcoded to return true. I *think* TrieField.isTokenized should be changed to depend on the value of the precisionStep, but I'm not sure what all the ramifications of that are just yet -- but I've opened SOLR-2976 to look into it. -Hoss
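[Editorial note: for context (not from the thread), a request of the shape Luis was making would look something like the following; the field names are illustrative. stats.field asks for min/max/sum/etc. over a numeric field, and stats.facet breaks those stats down per value of another field -- it is that per-field facet path which rejects tokenized or multi-valued fields in the quoted check.]

http://localhost:8983/solr/select?q=*:*&stats=true&stats.field=price&stats.facet=ts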
Re: Call RequestHandler from QueryComponent
Maria: sending the same email 4 times in less than 48 hours isn't really a good way to encourage people to help you -- it just means more total mail people have to wade through, which slows them down and makes them less likely to want to help.

: In ABC QUERY COMPONENT, I customize prepare() and process(). In its process() I want to call the /XYZ request handler and include those results in the results for ABC. Is that possible?

certainly -- you can execute any java code you want in a custom component. take a look at how SolrDispatchFilter executes the original request on the SolrCore; you can do something similar in your custom component (but you'll want to use a LocalSolrQueryRequest that you populate with params -- see the TestHarness for an example) and then take whatever data you want out of the inner SolrQueryResponse you get back and add it directly to the outer SolrQueryResponse.

One thing you might have to watch out for is ensuring that the same SolrIndexSearcher used in the outer request is also the one used in the inner request -- that consistency is crucial to ensuring any DocList you copy is meaningful -- but I'm not sure if you can do that easily with LocalSolrQueryRequest, you might need to tweak it. -Hoss
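[Editorial note: a rough sketch of the approach Hoss describes might look like the custom component below. This is not code from the thread; the handler name /XYZ and the params are assumptions, package names are as of Solr 3.x and may need adjusting for other versions, and the searcher-consistency caveat above still applies.]

import java.io.IOException;

import org.apache.solr.common.params.ModifiableSolrParams;
import org.apache.solr.core.SolrCore;
import org.apache.solr.handler.component.ResponseBuilder;
import org.apache.solr.handler.component.SearchComponent;
import org.apache.solr.request.LocalSolrQueryRequest;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.request.SolrRequestHandler;
import org.apache.solr.response.SolrQueryResponse;

public class AbcQueryComponent extends SearchComponent {

  @Override
  public void prepare(ResponseBuilder rb) throws IOException {
    // nothing extra to prepare in this sketch
  }

  @Override
  public void process(ResponseBuilder rb) throws IOException {
    SolrCore core = rb.req.getCore();
    SolrRequestHandler xyz = core.getRequestHandler("/XYZ");

    // Build params for the inner request; here we simply reuse the user's q.
    ModifiableSolrParams params = new ModifiableSolrParams();
    params.set("q", rb.req.getParams().get("q", "*:*"));
    params.set("rows", "1");

    SolrQueryRequest innerReq = new LocalSolrQueryRequest(core, params);
    SolrQueryResponse innerRsp = new SolrQueryResponse();
    try {
      core.execute(xyz, innerReq, innerRsp);
      // Copy whatever the inner handler produced into the outer response.
      rb.rsp.add("xyzResponse", innerRsp.getValues().get("response"));
    } finally {
      innerReq.close(); // releases the searcher the local request obtained
    }
  }

  @Override public String getDescription() { return "ABC component that calls /XYZ"; }
  @Override public String getSource() { return ""; }
  @Override public String getSourceId() { return ""; }
  @Override public String getVersion() { return "1.0"; }
}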
Re: NRT or similar for Solr 3.5?
Hey Vikram, I finally got around to getting Solr-RA installed but I'm having trouble getting the NRT to work. Could you help me out? I added these four lines immediately after <config> in solrconfig.xml:

<realtime visible="200">true</realtime>
<library>rankingalgorithm</library>
<realtime visible="200" facet="true">true</realtime>
<library>rankingalgorithm</library>

Is that correct? I also read something about disabling caching, so I took out the queryResultCache. Is that right? What else do I need to do to get NRT working? Do I need to switch some engine to Solr-RA? If so, how do I do that? Are there other caches I need to disable? Any help appreciated. Thanks. -- Steven Ou | 歐偉凡 *ravn.com* | Chief Technology Officer steve...@gmail.com | +1 909-569-9880

2011/12/12 vikram kamath kmar...@gmail.com: @Steven .. try some alternate email address (besides google/yahoo) and check your spam. http://twitter.com/kmarkiv http://facebook.com/kmarkiv http://profiles.google.com/kmarkiv#buzz http://linkedin.com/in/vikramkamathc Regards Vikram Kamath

2011/12/13 Steven Ou steve...@gmail.com: Yeah, running Chrome on OSX and doesn't do anything. Just switched to Firefox and it works. *But*, I also don't seem to be receiving the confirmation email. -- Steven Ou | 歐偉凡 *ravn.com* | Chief Technology Officer steve...@gmail.com | +1 909-569-9880

2011/12/12 vikram kamath kmar...@gmail.com: The onclick handler does not seem to be called on Google Chrome (Ubuntu). Also, I don't seem to receive the email with the confirmation link on registering (I have checked my spam). Regards Vikram Kamath

2011/12/12 Nagendra Nagarajayya nnagaraja...@transaxtions.com: Steven: There is an onclick handler that allows you to download the src. BTW, an early access Solr 3.5 with RankingAlgorithm 1.3 (NRT) release is available for download. So please give it a try. Regards, - Nagendra Nagarajayya http://solr-ra.tgels.org http://rankingalgorithm.tgels.org

On 12/10/2011 11:18 PM, Steven Ou wrote: All the links in the download section link to http://solr-ra.tgels.org/# -- Steven Ou | 歐偉凡 *ravn.com* | Chief Technology Officer steve...@gmail.com | +1 909-569-9880

2011/12/11 Nagendra Nagarajayya nnagaraja...@transaxtions.com: Steven: Not sure why you had problems; #downloads ( http://solr-ra.tgels.org/#downloads ) should point you to the downloads section showing the different versions available for download. Please share if this is not so (there were downloads yesterday with no problems). Regarding NRT, you can switch between RA and Lucene at query level or at config level; in the current version, NRT is in effect with RA, while with Lucene it is not. You can get more information from here: http://solr-ra.tgels.org/papers/Solr34_with_RankingAlgorithm13.pdf Solr 3.5 with RankingAlgorithm 1.3 should be available next week. Regards, - Nagendra Nagarajayya http://solr-ra.tgels.org http://rankingalgorithm.tgels.org

On 12/9/2011 4:49 PM, Steven Ou wrote: Hey Nagendra, I took a look and Solr-RA looks promising - but:
- I could not figure out how to download it. It seems like all the download links just point to #
- I wasn't looking for another ranking algorithm, so would it be possible for me to use NRT but *not* RA (i.e. just use the normal Lucene library)?
-- Steven Ou | 歐偉凡 *ravn.com* | Chief Technology Officer steve...@gmail.com | +1 909-569-9880

On Sat, Dec 10, 2011 at 5:13 AM, Nagendra Nagarajayya nnagaraja...@transaxtions.com wrote: Steven: Please take a look at Solr with RankingAlgorithm.
It offers NRT functionality. You can set your autoCommit to about 15 mins. You can get more information from here: http://solr-ra.tgels.com/wiki/en/Near_Real_Time_Search_ver_3.x Regards, - Nagendra Nagarajayya http://solr-ra.tgels.org http://rankingalgorithm.tgels.org

On 12/8/2011 9:30 PM, Steven Ou wrote: Hi guys, I'm looking for NRT functionality or similar in Solr 3.5. Is that possible? From what I understand there's NRT in Solr 4, but I can't figure out whether or not 3.5 can do it as well? If not, is it feasible to use an autoCommit every 1000ms? We don't currently process *that* much data so I wonder if it's OK to just commit very often? Obviously not scalable on a large scale, but it is feasible for
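[Editorial note: for anyone wanting to experiment with the frequent-commit approach Steven mentions, this is the standard solrconfig.xml fragment; the 1000 ms and 10000-doc values are just examples, not advice from the thread.]

<updateHandler class="solr.DirectUpdateHandler2">
  <autoCommit>
    <maxTime>1000</maxTime>   <!-- commit at most every 1000 ms -->
    <maxDocs>10000</maxDocs>  <!-- or after this many pending docs, whichever comes first -->
  </autoCommit>
</updateHandler>

Keep in mind that in Solr 3.x every commit opens a new searcher and re-warms caches, which is why very frequent commits tend not to scale.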
Re: Core overhead
: The list would be unreadable if everyone spammed at the bottom of their email like Otis'. It's just bad form.

If you'd like to debate project policy on what is/isn't acceptable on any of the Lucene mailing lists, please start a new thread on general@lucene (the list that exists precisely for the purpose of discussing meta-issues related to the Project/Community) instead of spamming the substantial solr-user@lucene subscriber base, who probably subscribed to this list because they were interested in getting emails about using Solr, not debating email etiquette. -Hoss
Re: how to setup to archive expired documents?
: So if we use some sort of weekly or daily sharding, there needs to be some mechanism in place to dynamically add the new shard when the current one fills up. (Which would also ideally know where to put the new shards on what server, etc.) Since SOLR does not implement that, I was thinking of just having a static set of shards.

You may want to consider taking a look at the ongoing work on improving Solr Cloud -- particularly the distributed indexing and shard failure logic. My understanding (from past discussions, this may have changed w/o me realizing it) is that doc-shard mapping will nominally be a simple hash function on the uniqueKey, but that a plugin could customize that so that documents are sharded by date -- and then when you only need to query recent docs you could query those shards explicitly. (no idea if things are stable enough yet for such a plugin to be written -- but the sooner someone tries to tackle it to solve their use case, the sooner people will be confident that the API is stable)

https://wiki.apache.org/solr/SolrCloud
https://issues.apache.org/jira/browse/SOLR-2358
https://svn.apache.org/viewvc/lucene/dev/branches/solrcloud/
-Hoss
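[Editorial note: as a purely illustrative aside (not from the thread, and not tied to any Solr plugin API), routing documents to time-based shards can be as simple as deriving a shard name from the document's timestamp:]

import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

public class DateShardRouter {
  // Produces names like "shard_2011_50" for week 50 of 2011.
  private static final SimpleDateFormat WEEK_FORMAT =
      new SimpleDateFormat("yyyy_ww", Locale.US);

  /** Pick the target shard/core name for a document timestamp. */
  public static String shardFor(Date docTimestamp) {
    return "shard_" + WEEK_FORMAT.format(docTimestamp);
  }

  public static void main(String[] args) {
    // A document stamped "now" goes to this week's shard, and queries over
    // recent data only need to hit the newest shards.
    System.out.println(shardFor(new Date()));
  }
}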
Re: Poor performance on distributed search
Right, are you falling afoul of the recursive shard thing? That is, do your shards point back to the handler itself? As far as I understand, the shards parameter in your request handler shouldn't point back to itself. But I'm guessing here. Best Erick

On Fri, Dec 16, 2011 at 4:27 PM, ku3ia dem...@gmail.com wrote: OK, so your speed differences are pretty much dependent upon whether you specify rows=2000 or rows=10, right? Why do you need 2,000 rows? Yes, the big difference is 10 vs. 2K records. The limit of 2K rows is set by my manager and I can't decrease it. It is the minimum row count needed to process the data. Or is the root question why there's such a difference when you specify qt=requestShards? In which case I'm curious to see that request handler definition...

<requestHandler name="requestShards" class="solr.SearchHandler" default="false">
  <lst name="defaults">
    <str name="echoParams">explicit</str>
    <int name="rows">10</int>
    <str name="shards">127.0.0.1:8080/solr/shard1,127.0.0.1:8080/solr/shard2,127.0.0.1:8080/solr/shard3,127.0.0.1:8080/solr/shard4</str>
  </lst>
</requestHandler>

This request handler is defined in shard1's solrconfig. -- View this message in context: http://lucene.472066.n3.nabble.com/Poor-performance-on-distributed-search-tp3590028p3592734.html Sent from the Solr - User mailing list archive at Nabble.com.
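[Editorial note: one configuration pattern that avoids this kind of self-reference (an aside, not something suggested in the thread) is to keep the aggregating handler separate from the handler the per-shard sub-requests hit, e.g. by pointing shards.qt at a plain handler with no shards parameter. Handler names here are illustrative.]

<!-- Aggregator: clients query this handler. -->
<requestHandler name="requestShards" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="shards">127.0.0.1:8080/solr/shard1,127.0.0.1:8080/solr/shard2,127.0.0.1:8080/solr/shard3,127.0.0.1:8080/solr/shard4</str>
    <!-- Sub-requests sent to each shard use this handler instead of requestShards. -->
    <str name="shards.qt">shardInternal</str>
  </lst>
</requestHandler>

<!-- Per-shard handler: no shards parameter, so it can never fan out again. -->
<requestHandler name="shardInternal" class="solr.SearchHandler" />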
RE: Call RequestHandler from QueryComponent
I am very, very sorry. My mail client was not working from work and it looked like the message was not being delivered; that's why I tried a few times. Sorry everybody!

-Original Message- From: Chris Hostetter [mailto:hossman_luc...@fucit.org] Sent: Friday, December 16, 2011 3:23 PM To: solr-user@lucene.apache.org Subject: Re: Call RequestHandler from QueryComponent

Maria: sending the same email 4 times in less than 48 hours isn't really a good way to encourage people to help you -- it just means more total mail people have to wade through, which slows them down and makes them less likely to want to help.

: In ABC QUERY COMPONENT, I customize prepare() and process(). In its process() I want to call the /XYZ request handler and include those results in the results for ABC. Is that possible?

certainly -- you can execute any java code you want in a custom component. take a look at how SolrDispatchFilter executes the original request on the SolrCore; you can do something similar in your custom component (but you'll want to use a LocalSolrQueryRequest that you populate with params -- see the TestHarness for an example) and then take whatever data you want out of the inner SolrQueryResponse you get back and add it directly to the outer SolrQueryResponse.

One thing you might have to watch out for is ensuring that the same SolrIndexSearcher used in the outer request is also the one used in the inner request -- that consistency is crucial to ensuring any DocList you copy is meaningful -- but I'm not sure if you can do that easily with LocalSolrQueryRequest, you might need to tweak it. -Hoss
Looking for a good Text on Solr
I am looking for a good book to read to get a better understanding of Solr. On Amazon, all the books on Solr have average ratings (which I suppose means no one tried them or bothered to post a review), but this one -- Solr 1.4 Enterprise Search Server by David Smiley and Eric Pugh -- has a pretty decent review. But the current version of Solr is 3.5, so should I proceed with David Smiley's book, or is there a better text available? Thanks, Shiv Deepak
Re: Looking for a good Text on Solr
There is an update to that book for Solr 3: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book I actually bought it recently, but haven't looked at it yet. Good luck. Brendan On Dec 16, 2011, at 9:01 PM, Shiv Deepak wrote: I am looking for a good book to read from and get a better understanding of solr. On amazon, all the books on Solr have average rating (which I supposed no one tried them or bothered to post a review) but this one: Solr 1.4 Enterprise Search Server by David Smiley, Eric Pugh has a pretty decent review. But the current version of Solr is 3.5, so should I proceed with David Smiley's book or is there a better text available. Thanks, Shiv Deepak
Re: Looking for a good Text on Solr
Hi Shiv, For me, a combination of the following has helped me learn a lot about Solr in a short period of time: * Apache Solr 3 Enterprise Search Server: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book * Solr Wiki: http://wiki.apache.org/solr/ * Pretty much every single post on this blog: http://www.hathitrust.org/blogs/large-scale-search Hope this helps, -- Hector On Friday, December 16, 2011 at 9:01 PM, Shiv Deepak wrote: I am looking for a good book to read from and get a better understanding of solr. On amazon, all the books on Solr have average rating (which I supposed no one tried them or bothered to post a review) but this one: Solr 1.4 Enterprise Search Server by David Smiley, Eric Pugh has a pretty decent review. But the current version of Solr is 3.5, so should I proceed with David Smiley's book or is there a better text available. Thanks, Shiv Deepak
Re: Looking for a good Text on Solr
Hey Brendan, Hey Hector, That was very helpful. :) Thanks, Shiv Deepak On 17-Dec-2011, at 07:52 , Hector Castro wrote: Hi Shiv, For me, a combination of the following has helped me learn a lot about Solr in a short period of time: * Apache Solr 3 Enterprise Search Server: http://www.packtpub.com/apache-solr-3-enterprise-search-server/book * Solr Wiki: http://wiki.apache.org/solr/ * Pretty much every single post on this blog: http://www.hathitrust.org/blogs/large-scale-search Hope this helps, -- Hector On Friday, December 16, 2011 at 9:01 PM, Shiv Deepak wrote: I am looking for a good book to read from and get a better understanding of solr. On amazon, all the books on Solr have average rating (which I supposed no one tried them or bothered to post a review) but this one: Solr 1.4 Enterprise Search Server by David Smiley, Eric Pugh has a pretty decent review. But the current version of Solr is 3.5, so should I proceed with David Smiley's book or is there a better text available. Thanks, Shiv Deepak
Re: Retrieving Documents
Hi Dan, 1) Are you looking for http://wiki.apache.org/solr/HighlightingParameters#hl.fragsize ? 2) Hundreds of words in a field should not be a problem for highlighting. But it sounds like this long field may contain content that corresponds to N different pages in a publication and you would like to inform the searcher which page the match was on, and not just that a match was somewhere in that big piece of text. One way to deal with that is to break your document into N smaller documents - one document for each page. Otis Performance Monitoring SaaS for Solr - http://sematext.com/spm/solr-performance-monitoring/index.html From: Dan McGinn-Combs dgco...@gmail.com To: solr-user@lucene.apache.org Sent: Friday, December 16, 2011 4:33 PM Subject: Retrieving Documents I've been doing a fair amount of reading and experimenting with Solr lately. I find that it does a good job of indexing very structured documents. However, the application I have in mind is build around long EPUB documents. Of course, I found the Extract components useful for indexing the EPUBs. However, I would like to be able to * Size the highlight portion of text around the query parameters (i.e. show 20 or 30 words) and * Retrieve a location within the document so I can display that page from the EPUB. What is common practice for these? I notice that if I have a list of (short) text segments in fields, they are stored without too much fuss and are retrievable. However, I'm talking about a field of potentially hundreds of words. Thanks for any pointers, Dan -- Dan McGinn-Combs dgco...@gmail.com Peachtree City, Georgia USA
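[Editorial note: to make Otis's first pointer concrete (an example added here, not from the thread; the field name "text" and the sizes are assumptions): hl.fragsize controls the approximate snippet length in characters, so roughly 20-30 words is on the order of 150-200 characters.]

http://localhost:8983/solr/select?q=whale&hl=true&hl.fl=text&hl.snippets=1&hl.fragsize=180

The highlighted field has to be stored. Breaking the EPUB into one document per page, as Otis suggests, then gives you the page location for free, since each hit corresponds to a single page.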