Re: run filter queries after post filter
Hey, so the post filter logs the number of ids that it receives. With the above filter having cost=200, the post filter should have received the same number of ids as before (when the filter was not present). But that does not seem to be the case: with the filter query on the index, the number of ids that the post filter receives is reduced. Thanks, Rohit

On Tue, Oct 8, 2013 at 8:29 PM, Erick Erickson erickerick...@gmail.com wrote:
Hmmm, seems like it should. What's our evidence that it isn't working? Best, Erick

On Tue, Oct 8, 2013 at 4:10 PM, Rohit Harchandani rhar...@gmail.com wrote:
Hey, I am using Solr 4.0 with my own PostFilter implementation, which is executed after the normal Solr query is done. This filter has a cost of 100. Is it possible to run filter queries on the index after the execution of the post filter? I tried adding the line below to the URL, but it did not seem to work: fq={!cache=false cost=200}field:value Thanks, Rohit
Re: run filter queries after post filter
Yes, I get that. Actually, I should have explained in more detail:
- I have a query which gets certain documents.
- The post filter gets these matched documents, does some processing on them, and filters the results.
- But after this is done I need to apply another filter, which is why I gave it a higher cost.

The reason I need to do this is that the processing done by the post filter depends on the documents matching the query up to that point. Since the normal fq clause is also executed before the post filter (despite the cost), the final results are not accurate. Thanks, Rohit

On Wed, Oct 9, 2013 at 4:14 PM, Erick Erickson erickerick...@gmail.com wrote:
Ah, I think you're misunderstanding the nature of post-filters. Or I'm confused, which happens a lot! The whole point of post filters is that they're assumed to be expensive (think ACL calculation), so you want them to run on the fewest documents possible. Only docs that make it through the primary query _and_ all lower-cost filters will get to this post-filter. This means they can't be cached, for instance, because they don't see (hopefully) very many docs. This is radically different from normal fq clauses, which are calculated on the entire corpus and can thus be cached. Best, Erick

On Wed, Oct 9, 2013 at 11:59 AM, Rohit Harchandani rhar...@gmail.com wrote:
Hey, so the post filter logs the number of ids that it receives. With the above filter having cost=200, the post filter should have received the same number of ids as before (when the filter was not present). But that does not seem to be the case: with the filter query on the index, the number of ids that the post filter receives is reduced. Thanks, Rohit

On Tue, Oct 8, 2013 at 8:29 PM, Erick Erickson erickerick...@gmail.com wrote:
Hmmm, seems like it should. What's our evidence that it isn't working? Best, Erick

On Tue, Oct 8, 2013 at 4:10 PM, Rohit Harchandani rhar...@gmail.com wrote:
Hey, I am using Solr 4.0 with my own PostFilter implementation, which is executed after the normal Solr query is done. This filter has a cost of 100. Is it possible to run filter queries on the index after the execution of the post filter? I tried adding the line below to the URL, but it did not seem to work: fq={!cache=false cost=200}field:value Thanks, Rohit
run filter queries after post filter
Hey, I am using Solr 4.0 with my own PostFilter implementation, which is executed after the normal Solr query is done. This filter has a cost of 100. Is it possible to run filter queries on the index after the execution of the post filter? I tried adding the line below to the URL, but it did not seem to work: fq={!cache=false cost=200}field:value Thanks, Rohit
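[Editor's note] A detail of general Solr behavior that may explain what this thread observes: cost alone does not turn an fq into a post filter. A clause only runs after the main query when its query type implements the PostFilter interface, and a plain term query does not, so {!cache=false cost=200}field:value still executes as an ordinary filter ahead of the custom PostFilter. Of the stock parsers, {!frange} does implement PostFilter; a hedged sketch of the same filter in that form (field and value are the thread's placeholders):

```
fq={!frange cache=false cost=200 l=1}termfreq(field,'value')
```

With cache=false and cost >= 100 this runs as a true post filter, and post filters are consulted in ascending cost order, so at cost=200 it should only see documents that survived the custom cost=100 PostFilter.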
adding custom fields to solr response
Hi, I have created a custom component with some post-filtering ability. Now I am trying to add certain fields to the Solr response. I was able to add them as a separate response section, but I am having difficulty adding them to the docs themselves. Is there an example of any component which adds fields to the docs using a DocTransformer? Thanks, Rohit
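[Editor's note] A minimal sketch of the DocTransformer route asked about above, written against the Solr 4.x transformer API; the factory and field names are hypothetical, and how the per-doc value is computed is left as a placeholder:

```java
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.params.SolrParams;
import org.apache.solr.request.SolrQueryRequest;
import org.apache.solr.response.transform.DocTransformer;
import org.apache.solr.response.transform.TransformerFactory;

public class MyDataTransformerFactory extends TransformerFactory {
  @Override
  public DocTransformer create(final String field, SolrParams params, SolrQueryRequest req) {
    return new DocTransformer() {
      @Override
      public String getName() { return field; }

      @Override
      public void transform(SolrDocument doc, int docid) {
        // Hypothetical: look up whatever the post-filtering component
        // produced for this docid and attach it to the returned document.
        doc.setField(field, "value-for-" + docid);
      }
    };
  }
}
```

Assuming a registration in solrconfig.xml such as `<transformer name="mydata" class="com.example.MyDataTransformerFactory"/>`, the field then appears on each doc when requested with `fl=*,[mydata]`.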
Re: solr postfilter question
Hi, I did finally manage to do this. I get all the documents in the post filter and then call collect on each one that matches the filter criteria. But for some reason, it does not seem to hit the query result cache (equals() succeeds and the hashCode is good too). Not sure what I am missing here? Thanks, Rohit

On Wed, Jul 10, 2013 at 6:10 PM, Yonik Seeley yo...@lucidworks.com wrote:
On Wed, Jul 10, 2013 at 6:08 PM, Rohit Harchandani rhar...@gmail.com wrote:
Hey, I am trying to create a plugin which makes use of a PostFilter. I know that the collect function is called for every document matched, but is there a way I can access all the matched documents up to this point before collect is called on each of them?

You would need to collect/cache that information yourself in the post filter. -Yonik http://lucidworks.com
Re: replication getting stuck on a file
I am facing this problem in Solr 4.0 too. It's definitely not related to autowarming. It just gets stuck while downloading a file, and there is no way to abort the replication except restarting Solr.

On Wed, Jul 10, 2013 at 6:10 PM, adityab aditya_ba...@yahoo.com wrote:
I have seen this in 4.2.1 too. Once replication is finished, on the Admin UI we see 100%, and the time and download-speed information goes out of whack. The same is reflected in mbeans. But what's actually happening in the background is auto-warmup of the caches (in my case). Maybe some minor stats bug.
Re: solr postfilter question
Basically, I see that it is looking up the cache and getting a hit, but it still seems to be collecting all the documents again. Thanks, Rohit

On Thu, Aug 1, 2013 at 4:37 PM, Rohit Harchandani rhar...@gmail.com wrote:
Hi, I did finally manage to do this. I get all the documents in the post filter and then call collect on each one that matches the filter criteria. But for some reason, it does not seem to hit the query result cache (equals() succeeds and the hashCode is good too). Not sure what I am missing here? Thanks, Rohit

On Wed, Jul 10, 2013 at 6:10 PM, Yonik Seeley yo...@lucidworks.com wrote:
On Wed, Jul 10, 2013 at 6:08 PM, Rohit Harchandani rhar...@gmail.com wrote:
Hey, I am trying to create a plugin which makes use of a PostFilter. I know that the collect function is called for every document matched, but is there a way I can access all the matched documents up to this point before collect is called on each of them?

You would need to collect/cache that information yourself in the post filter. -Yonik http://lucidworks.com
solr postfilter question
Hey, I am trying to create a plugin which makes use of a PostFilter. I know that the collect function is called for every document matched, but is there a way I can access all the matched documents up to this point before collect is called on each of them? Thanks, Rohit
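[Editor's note] A sketch of the buffer-then-forward pattern Yonik suggests in the replies, written against the Solr 4.x PostFilter/DelegatingCollector API. Everything besides those API names (the class name, the keep() criterion) is hypothetical, and the re-forwarding in finish() is deliberately simplified:

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;
import org.apache.lucene.index.AtomicReaderContext;
import org.apache.lucene.search.IndexSearcher;
import org.apache.solr.search.DelegatingCollector;
import org.apache.solr.search.ExtendedQueryBase;
import org.apache.solr.search.PostFilter;

public class BufferingPostFilter extends ExtendedQueryBase implements PostFilter {

  @Override
  public boolean getCache() { return false; }  // post filters are never cached

  @Override
  public int getCost() { return Math.max(super.getCost(), 100); }  // must be >= 100 to run post

  @Override
  public DelegatingCollector getFilterCollector(IndexSearcher searcher) {
    return new DelegatingCollector() {
      private final List<AtomicReaderContext> leaves = new ArrayList<AtomicReaderContext>();
      private final List<Integer> localDocs = new ArrayList<Integer>();
      private AtomicReaderContext current;

      @Override
      public void setNextReader(AtomicReaderContext context) throws IOException {
        current = context;  // remember the segment; forwarding happens in finish()
      }

      @Override
      public void collect(int doc) throws IOException {
        leaves.add(current);  // buffer (segment, local docid) pairs instead of
        localDocs.add(doc);   // passing each match downstream immediately
      }

      @Override
      public void finish() throws IOException {
        // All matches up to this point are now known; filter and forward.
        for (int i = 0; i < localDocs.size(); i++) {
          if (keep(leaves.get(i).docBase + localDocs.get(i))) {
            super.setNextReader(leaves.get(i));  // position the delegate
            super.collect(localDocs.get(i));
          }
        }
        super.finish();
      }

      private boolean keep(int globalDoc) {
        return true;  // hypothetical: apply the app-specific criteria here
      }
    };
  }
}
```

The trade-off is memory: the collector holds every candidate docid until finish(), which is exactly the "collect/cache that information yourself" cost mentioned below.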
Re: My latest solr blog post on Solr's PostFiltering
Hi Amit, Great article. I tried it and it works well. I am new to developing in Solr and had a question: do you know if there is a way to access all the matched ids before collect is called? Thanks, Rohit

On Sat, Nov 10, 2012 at 1:12 PM, Erick Erickson erickerick...@gmail.com wrote:
That'll teach _me_ to look closely at the URL... Best, Erick

On Fri, Nov 9, 2012 at 12:03 PM, Amit Nithian anith...@gmail.com wrote:
Oh weird. I'll post URLs on their own lines next time to clarify. Thanks guys, and looking forward to any feedback! Cheers, Amit

On Fri, Nov 9, 2012 at 2:05 AM, Dmitry Kan dmitry@gmail.com wrote:
I guess the URL should have been: http://hokiesuns.blogspot.com/2012/11/using-solrs-postfiltering-to-collect.html i.e. without 'and' at the end of it. -- Dmitry

On Fri, Nov 9, 2012 at 12:03 PM, Erick Erickson erickerick...@gmail.com wrote:
It's always good when someone writes up their experiences! But when I try to follow that link, I get to your Random Writings, but it tells me that the blog post doesn't exist... Erick

On Thu, Nov 8, 2012 at 4:21 PM, Amit Nithian anith...@gmail.com wrote:
Hey all, I wanted to thank those who have helped in answering some of my esoteric questions, especially the one about using Solr's post filtering feature to implement some score-statistics gathering we had to do at Zvents. To show this appreciation, and to help advance the knowledge of this space in a more codified fashion, I have written a blog post about this work and open sourced the work as well. Please take a read by visiting http://hokiesuns.blogspot.com/2012/11/using-solrs-postfiltering-to-collect.htmland please let me know if there are any inaccuracies or points of contention so I can address/correct them. Thanks! Amit -- Regards, Dmitry Kan
Problem with Solr replication in solr 4.2
Hey, currently we are using Solr 4.0 with a master-slave setup. The data gets indexed on the master and then we issue a fetchindex command to replicate it on the slave. The slave has a postCommit listener which gets kicked off when replication finishes, and we depend on this listener to know when replication is done. But when I tried to do the same with 4.2, the commit does not seem to be happening. Is this a known issue? Is there any other way to know that replication is done? Also, when I initially tried Solr 4.2 with this setup, I noticed that with the fetchIndex command, the files were downloaded to the temp folder, but they were never pulled into the index directory on the slave. The only file which made it was the lock file. This problem does not happen anymore, though. Thanks, Rohit
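[Editor's note] For reference, the kind of postCommit listener this setup relies on is registered in the slave's solrconfig.xml. A sketch, assuming the stock RunExecutableListener; the script name is hypothetical:

```xml
<updateHandler class="solr.DirectUpdateHandler2">
  <!-- Fires after a commit is applied on this core; when replication
       installs a fetched index, the commit triggers this listener. -->
  <listener event="postCommit" class="solr.RunExecutableListener">
    <str name="exe">replication-done.sh</str>  <!-- hypothetical script -->
    <str name="dir">.</str>
    <bool name="wait">false</bool>
  </listener>
</updateHandler>
```

An alternative way to detect completion, which sidesteps the commit question entirely, is to poll the replication handler (`/replication?command=details`) and compare the slave's index version against the master's.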
Re: Run multiple instances of solr using single data directory
OK, but what are the problems when bringing up multiple instances reading from the same data directory? Also, how do I re-open the searchers without restarting Solr? Thanks, Rohit

On Tue, Nov 13, 2012 at 11:20 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:
Hi, if you have a high query rate, running multiple instances of Solr on the same server doesn't typically make sense. I'd stop and rethink :) Otis -- Solr Performance Monitoring - http://sematext.com/spm/index.html

On Tue, Nov 13, 2012 at 5:46 PM, Rohit Harchandani rhar...@gmail.com wrote:
Hi All, I am currently using Solr 4.0. The application I am working on requires a high rate of queries per second. Currently, we have set up a single master and a single slave on a production machine. We want to bring up multiple instances of Solr (slaves). Are there any problems when bringing them up on different ports but using the same data directory? These will only be serving queries; all the indexing will take place on the master machine. Also, if I have multiple instances using the same data directory and I perform replication, would that re-open searchers on all the instances? Thanks, Rohit
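[Editor's note] On the re-opening question: one way to get a new searcher without restarting Solr (a sketch, assuming the default CoreAdmin setup; the port and core name are hypothetical) is a core RELOAD, which re-opens the core over whatever is currently in its data directory:

```
curl "http://localhost:8983/solr/admin/cores?action=RELOAD&core=core1"
```

Each extra instance reading the same data directory would need its own such trigger, since searchers are per-process; nothing propagates the re-open between JVMs.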
Re: Solr 4.0 simultaneous query problem
So is it a better approach to query for smaller batches, say 500 rows, and keep increasing the start parameter? Wouldn't that be slower, since I have an increasing start parameter and I will also be sorting by the same field in each of my queries made to the multiple shards? Also, does it make sense to have all these documents in the same shard? I went for this approach because the shard which is queried the most is small, and that gives a lot of benefit in terms of time taken for all the stats queries. This shard is only about 5gb, whereas the entire index will be about 50gb. Thanks for the help, Rohit

On Mon, Nov 5, 2012 at 4:02 PM, Walter Underwood wun...@wunderwood.org wrote:
Don't query for 5000 documents. That is going to be slow no matter how it is implemented. wunder

On Nov 5, 2012, at 1:00 PM, Rohit Harchandani wrote:
Hi, so it seems that when I query multiple shards with the sort criteria for 5000 documents, it queries all shards and gets a list of document ids, and then adds the document ids to the original query and queries all the shards again. This process of joining the query results on the unique ids and getting the remaining fields is turning out to be really slow; it takes a while to search for a list of unique ids. Is there any config change to make this process faster? Also, what does isDistrib=false mean when Solr generates the queries internally? Thanks, Rohit

On Fri, Oct 19, 2012 at 5:23 PM, Rohit Harchandani rhar...@gmail.com wrote:
Hi, the same query is always fired for 500 rows; the only thing different is the start parameter. The 3 shards are in the same instance on the same server. They all have the same schema, but the inherent type of the documents is different. Also, most of the app's queries go to shard A, which has the smallest index size (4gb). The query is made to a master shard which by default goes to all 3 shards for results. (Also, the query that I am trying matches documents only in shard A mentioned above.) Will try debugQuery now and post it here. Thanks, Rohit

On Thu, Oct 18, 2012 at 11:00 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:
Hi, maybe you can narrow this down a little further. Are there some queries that are faster and some slower? Is there a pattern? Can you share examples of slow queries? Have you tried debugQuery=true? These 3 shards: is each of them on its own server, or? Is the slow one always the one that hits the biggest shard? Do they hold the same type of data? How come their sizes are so different? Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html

On Thu, Oct 18, 2012 at 12:22 PM, Rohit Harchandani rhar...@gmail.com wrote:
Hi all, I have an application which queries a Solr instance having 3 shards (4gb, 13gb and 30gb index size respectively) with 6 million documents in all. When I start 10 threads in my app to make simultaneous queries to Solr (with rows=500 and a different start parameter, a sort on 1 field and no facets) to return 500 different documents in each query, sometimes I see that most of the responses come back within no time (500ms-1000ms), but the last response takes close to 50 seconds (QTime). I am using the latest 4.0 release. What is the reason for this delay? Is there a way to prevent this? Thanks and regards, Rohit -- Walter Underwood wun...@wunderwood.org
Re: Solr 4.0 simultaneous query problem
Hi, so it seems that when I query multiple shards with the sort criteria for 5000 documents, it queries all shards and gets a list of document ids, and then adds the document ids to the original query and queries all the shards again. This process of joining the query results on the unique ids and getting the remaining fields is turning out to be really slow; it takes a while to search for a list of unique ids. Is there any config change to make this process faster? Also, what does isDistrib=false mean when Solr generates the queries internally? Thanks, Rohit

On Fri, Oct 19, 2012 at 5:23 PM, Rohit Harchandani rhar...@gmail.com wrote:
Hi, the same query is always fired for 500 rows; the only thing different is the start parameter. The 3 shards are in the same instance on the same server. They all have the same schema, but the inherent type of the documents is different. Also, most of the app's queries go to shard A, which has the smallest index size (4gb). The query is made to a master shard which by default goes to all 3 shards for results. (Also, the query that I am trying matches documents only in shard A mentioned above.) Will try debugQuery now and post it here. Thanks, Rohit

On Thu, Oct 18, 2012 at 11:00 PM, Otis Gospodnetic otis.gospodne...@gmail.com wrote:
Hi, maybe you can narrow this down a little further. Are there some queries that are faster and some slower? Is there a pattern? Can you share examples of slow queries? Have you tried debugQuery=true? These 3 shards: is each of them on its own server, or? Is the slow one always the one that hits the biggest shard? Do they hold the same type of data? How come their sizes are so different? Otis -- Search Analytics - http://sematext.com/search-analytics/index.html Performance Monitoring - http://sematext.com/spm/index.html

On Thu, Oct 18, 2012 at 12:22 PM, Rohit Harchandani rhar...@gmail.com wrote:
Hi all, I have an application which queries a Solr instance having 3 shards (4gb, 13gb and 30gb index size respectively) with 6 million documents in all. When I start 10 threads in my app to make simultaneous queries to Solr (with rows=500 and a different start parameter, a sort on 1 field and no facets) to return 500 different documents in each query, sometimes I see that most of the responses come back within no time (500ms-1000ms), but the last response takes close to 50 seconds (QTime). I am using the latest 4.0 release. What is the reason for this delay? Is there a way to prevent this? Thanks and regards, Rohit
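[Editor's note] Since the thread notes the query only matches documents in shard A, one way to skip the distributed two-phase merge entirely (assuming the matching shard is known up front; the port, core name, and criteria are placeholders) is to query that core directly. `distrib=false` keeps the request local to the core, and it is the same parameter Solr attaches to its own internal sub-requests, which is why it shows up in the logs:

```
http://localhost:9020/solr/shardA/select?q=<criteria>&distrib=false&sort=<field>+asc&rows=500
```

This avoids the id-gathering round followed by the id-lookup round described above, at the cost of the application having to know which shard to hit.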
Retrieval of large number of documents
Hi all, I have a Solr index with 5,000,000 documents and my index size is 38GB. When I query for about 400,000 documents based on certain criteria, Solr searches really quickly but does not return data for close to 2 minutes. The unique key field is the only field I am requesting. Also, I apply an XSLT transformation to the response to get a comma-separated list of unique keys. Is there a way to improve this speed? Would sharding help in this case? I am currently using Solr 4.0 beta in my application. Thanks, Rohit
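[Editor's note] If the goal is just a comma-separated list of unique keys, one option available in 4.0 is to let Solr's CSV response writer emit the list directly and skip the XSLT step. A sketch, where `id` stands in for the actual unique key field and `<criteria>` for the query from the thread:

```
http://localhost:9020/solr/core1/select?q=<criteria>&fl=id&rows=400000&wt=csv&csv.header=false
```

This removes the per-response XSLT transformation cost, though serializing 400,000 rows will still dominate the response time.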
Re: Delete all documents in the index
Thanks everyone. Adding the _version_ field in the schema worked. Deleting the data directory works for me, but I was not sure why deleting using curl was not working.

On Wed, Sep 5, 2012 at 1:49 PM, Michael Della Bitta michael.della.bi...@appinions.com wrote:
Rohit: If it's easy, the easiest thing to do is to turn off your servlet container, rm -r * inside of the data directory, and then restart the container. Michael Della Bitta Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017 www.appinions.com Where Influence Isn't a Game

On Wed, Sep 5, 2012 at 12:56 PM, Jack Krupansky j...@basetechnology.com wrote:
Check to make sure that you are not stumbling into SOLR-3432: deleteByQuery silently ignored if updateLog is enabled, but the {{_version_}} field does not exist in the schema. See: https://issues.apache.org/jira/browse/SOLR-3432 This could happen if you kept the new 4.0 solrconfig.xml, but copied in your pre-4.0 schema.xml. -- Jack Krupansky

-----Original Message----- From: Rohit Harchandani Sent: Wednesday, September 05, 2012 12:48 PM To: solr-user@lucene.apache.org Subject: Delete all documents in the index

Hi, I am having difficulty deleting documents from the index using curl. The URLs I tried were:

curl "http://localhost:9020/solr/core1/update/?stream.body=<delete><query>*:*</query></delete>&commit=true"
curl "http://localhost:9020/solr/core1/update/?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>id:[* TO *]</query></delete>'
curl "http://localhost:9020/solr/core1/update/?commit=true" -H "Content-Type: text/xml" --data-binary '<delete><query>*:*</query></delete>'

I also tried:

curl "http://localhost:9020/solr/core1/update/?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true"

as suggested on some forums. I get a response with status=0 in all cases, but none of the above seems to work. When I run

curl "http://localhost:9020/solr/core1/select?q=*:*&rows=0&wt=xml"

I still get a value for numFound. I am currently using the Solr 4.0 beta version. Thanks for your help in advance. Regards, Rohit
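[Editor's note] For the archive: the fix confirmed above (SOLR-3432) amounts to declaring the `_version_` field in schema.xml whenever the update log is enabled. The field name and type are the standard ones; the rest of the schema is assumed:

```xml
<!-- Required when <updateLog/> is enabled in solrconfig.xml; without it,
     deleteByQuery is silently ignored (SOLR-3432). -->
<field name="_version_" type="long" indexed="true" stored="true"/>
```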
Update data directory at run time
Hi All, I am new to Solr and would really appreciate some help on this issue. I have a single-core setup currently and have separate instances for querying and indexing. These two instances point to different data directories through symbolic links, since I do not want indexing to affect the live searching instance. Once the indexing is done, I swap the directories the symbolic links point to, so that the live searching instance now points to the directory where the new data was indexed. But this does not seem to work without restarting Solr. I guess my purpose can probably be achieved using multiple cores and SWAP(?), but I wanted to know if there is a way to do this with a single core. Is there some command needed once the directories are swapped? I see in the core description that the directory entry under index did not change after updating the symlinks: (org.apache.lucene.store.MMapDirectory:org.apache.lucene.store.MMapDirectory@/bb/mbigd/mbig2580/srchSolr/apache-solr-4.0.0-ALPHA/example/solr/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@2447e380) Is there a way to update this dynamically? Thanks a lot. Regards, Rohit Harchandani
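[Editor's note] For comparison, the multi-core SWAP mentioned above is a single CoreAdmin call; a sketch, where the port and the core names `live` and `staging` are hypothetical:

```
curl "http://localhost:8983/solr/admin/cores?action=SWAP&core=live&other=staging"
```

Solr exchanges the names the two cores are registered under, so queries against `live` start hitting the newly built index without a restart; the open directory handle seen in the core description is the reason a bare symlink swap goes unnoticed, since MMapDirectory holds the already-resolved path.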
Re: Update data directory at run time
Cool, thanks. I will have a look at this. But in this case, if all the files on the master are new, will the entire index on the slave be replaced, or will it add to whatever is currently present on the slave? Thanks again, Rohit

On Tue, Aug 14, 2012 at 6:04 PM, Walter Underwood wun...@wunderwood.org wrote:
Why are you not using the built-in replication? That works fine. You do not need to invent anything. wunder

On Aug 14, 2012, at 2:57 PM, Rohit Harchandani wrote:
Hi All, I am new to Solr and would really appreciate some help on this issue. I have a single-core setup currently and have separate instances for querying and indexing. These two instances point to different data directories through symbolic links, since I do not want indexing to affect the live searching instance. Once the indexing is done, I swap the directories the symbolic links point to, so that the live searching instance now points to the directory where the new data was indexed. But this does not seem to work without restarting Solr. I guess my purpose can probably be achieved using multiple cores and SWAP(?), but I wanted to know if there is a way to do this with a single core. Is there some command needed once the directories are swapped? I see in the core description that the directory entry under index did not change after updating the symlinks: (org.apache.lucene.store.MMapDirectory:org.apache.lucene.store.MMapDirectory@/bb/mbigd/mbig2580/srchSolr/apache-solr-4.0.0-ALPHA/example/solr/data/index lockFactory=org.apache.lucene.store.NativeFSLockFactory@2447e380) Is there a way to update this dynamically? Thanks a lot. Regards, Rohit Harchandani