Re: run filter queries after post filter

2013-10-09 Thread Rohit Harchandani
Hey,
So the post filter logs the number of ids that it receives.
With the above filter having cost=200, the post filter should have received
the same number of ids as before (when the filter was not present).
But that does not seem to be the case: with the filter query on the index,
the number of ids that the post filter receives goes down.

Thanks,
Rohit


On Tue, Oct 8, 2013 at 8:29 PM, Erick Erickson erickerick...@gmail.com wrote:

 Hmmm, seems like it should. What's your evidence that it isn't working?

 Best,
 Erick

 On Tue, Oct 8, 2013 at 4:10 PM, Rohit Harchandani rhar...@gmail.com
 wrote:
  Hey,
  I am using solr 4.0 with my own PostFilter implementation which is
 executed
  after the normal solr query is done. This filter has a cost of 100. Is it
  possible to run filter queries on the index after the execution of the
 post
  filter?
  I tried adding the below line to the url but it did not seem to work:
  fq={!cache=false cost=200}field:value
  Thanks,
  Rohit
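
One plausible explanation for the behavior described above: in Solr 4.x, an fq is treated as a post filter only when cache=false, cost >= 100, *and* the parsed query class actually implements the PostFilter interface. A plain field:value term query does not implement PostFilter, so it runs as a normal filter before any custom post filter, whatever its cost. Here is a toy Python model of that partitioning (a sketch of my understanding of the logic, not Solr's real code; all names are made up):

```python
# Toy model of how Solr 4.x decides whether an fq runs as a normal
# filter or as a post filter. Illustration only, not the real code.
def partition_filters(filters):
    """Split filter queries into (normal, post) lists.

    Each filter is a dict with keys: name, cache, cost, and
    implements_postfilter (whether the parsed Query object
    implements the PostFilter interface).
    """
    normal, post = [], []
    for f in filters:
        # Only uncached filters with cost >= 100 whose query class
        # implements PostFilter run after the main query.
        if not f["cache"] and f["cost"] >= 100 and f["implements_postfilter"]:
            post.append(f)
        else:
            normal.append(f)
    # Post filters are applied in ascending cost order.
    post.sort(key=lambda f: f["cost"])
    return normal, post

filters = [
    # A custom PostFilter plugin with cost=100.
    {"name": "myPostFilter", "cache": False, "cost": 100,
     "implements_postfilter": True},
    # fq={!cache=false cost=200}field:value -- a plain term query.
    # It does NOT implement PostFilter, so despite cost=200 it runs
    # as a normal filter, before the custom post filter.
    {"name": "field:value", "cache": False, "cost": 200,
     "implements_postfilter": False},
]
normal, post = partition_filters(filters)
print([f["name"] for f in normal])  # ['field:value']
print([f["name"] for f in post])    # ['myPostFilter']
```

If that reading is right, the fix would be to make the second filter itself a PostFilter implementation with a cost above 100, rather than a plain fq.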



Re: run filter queries after post filter

2013-10-09 Thread Rohit Harchandani
Yes, I get that. Actually, I should have explained in more detail.

- I have a query which matches certain documents.
- The post filter gets these matched documents, does some processing on
them, and filters the results.
- But after this is done I need to apply another filter, which is why I
gave it a higher cost.

The reason I need to do this is that the processing done by the post
filter depends on the documents matching the query up to that point.
Since the normal fq clause is also executed before the post filter
(despite the cost), the final results are not accurate.

thanks
Rohit




On Wed, Oct 9, 2013 at 4:14 PM, Erick Erickson erickerick...@gmail.com wrote:

 Ah, I think you're misunderstanding the nature of post-filters.
 Or I'm confused, which happens a lot!

 The whole point of post filters is that they're assumed to be
 expensive (think ACL calculation), so you want them to run
 on the fewest documents possible. Only docs that make it
 through the primary query _and_ all lower-cost filters will get
 to this post-filter. This means they can't be cached, for
 instance, because they (hopefully) don't see very many docs.

 This is radically different from normal fq clauses, which are
 calculated on the entire corpus and can thus be cached.

 Best,
 Erick

 On Wed, Oct 9, 2013 at 11:59 AM, Rohit Harchandani rhar...@gmail.com
 wrote:
  Hey,
  so the post filter logs the number of ids that it receives.
  With the above filter having cost=200, the post filter should have
 received
  the same number of ids as before ( when the filter was not present ).
  But that does not seem to be the case...with the filter query on the
 index,
  the number of ids that the post filter is receiving reduces.
 
  Thanks,
  Rohit
 
 
  On Tue, Oct 8, 2013 at 8:29 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  Hmmm, seems like it should. What's your evidence that it isn't working?
 
  Best,
  Erick
 
  On Tue, Oct 8, 2013 at 4:10 PM, Rohit Harchandani rhar...@gmail.com
  wrote:
   Hey,
   I am using solr 4.0 with my own PostFilter implementation which is
  executed
   after the normal solr query is done. This filter has a cost of 100.
 Is it
   possible to run filter queries on the index after the execution of the
  post
   filter?
   I tried adding the below line to the url but it did not seem to work:
   fq={!cache=false cost=200}field:value
   Thanks,
   Rohit
 



run filter queries after post filter

2013-10-08 Thread Rohit Harchandani
Hey,
I am using Solr 4.0 with my own PostFilter implementation, which is executed
after the normal Solr query is done. This filter has a cost of 100. Is it
possible to run filter queries on the index after the execution of the post
filter?
I tried adding the line below to the URL, but it did not seem to work:
fq={!cache=false cost=200}field:value
Thanks,
Rohit


adding custom fields to solr response

2013-08-13 Thread Rohit Harchandani
Hi,
I have created a custom component with some post-filtering ability. Now I
am trying to add certain fields to the Solr response. I was able to add them
as a separate response section, but I am having difficulty adding them to the
docs themselves. Is there an example of any component which adds fields to
the docs using a DocTransformer?
Thanks,
Rohit
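
In Solr 4.x the extension point for this is org.apache.solr.response.transform.DocTransformer, registered through a TransformerFactory in solrconfig.xml and requested via fl (the stock [explain] and [docid] transformers work this way). As a conceptual sketch only, here is the shape of the idea in Python; the class and field names are invented:

```python
# Toy model of Solr's DocTransformer idea: a component that is handed
# each result document just before it is written to the response and
# can add fields to it. Illustration only; in Solr you would extend
# DocTransformer and register a TransformerFactory, then request it
# with something like fl=*,[myTransformer].
class ScoreBucketTransformer:
    """Hypothetical transformer adding a 'bucket' field per doc."""
    def transform(self, doc):
        doc["bucket"] = "high" if doc.get("score", 0) >= 0.5 else "low"

def write_response(docs, transformers):
    # The response writer applies every transformer to every doc.
    for doc in docs:
        for t in transformers:
            t.transform(doc)
    return docs

docs = [{"id": "1", "score": 0.9}, {"id": "2", "score": 0.2}]
out = write_response(docs, [ScoreBucketTransformer()])
print(out[0]["bucket"], out[1]["bucket"])  # high low
```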


Re: solr postfilter question

2013-08-01 Thread Rohit Harchandani
Hi,
I did finally manage to do this. I get all the documents in the post filter
and then call collect on the ones that match the filter criteria. But
for some reason, it does not seem to hit the query results cache (equals
succeeds and the hashCode is good too). Not sure what I am missing here.
Thanks,
Rohit


On Wed, Jul 10, 2013 at 6:10 PM, Yonik Seeley yo...@lucidworks.com wrote:

 On Wed, Jul 10, 2013 at 6:08 PM, Rohit Harchandani rhar...@gmail.com
 wrote:
  Hey,
  I am trying to create a plugin which makes use of postfilter. I know that
  the collect function is called for every document matched, but is there a
  way i can access all the matched documents upto this point before collect
  is called on each of them?

 You would need to collect/cache that information yourself in the post
 filter.

 -Yonik
 http://lucidworks.com
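
Yonik's "collect/cache that information yourself" can be sketched as a collector that buffers ids in collect() and only forwards them from finish(), once the full match set is known; this mirrors the collect()/finish() shape of Solr's DelegatingCollector. A toy Python version (the filtering rule here is invented for illustration):

```python
# Toy delegating collector: buffer every matched doc id in collect(),
# then post-process the whole set in finish() and forward only the
# survivors. Not Solr code -- just the pattern.
class BufferingPostFilterCollector:
    def __init__(self, delegate, predicate):
        self.delegate = delegate      # downstream sink for kept ids
        self.predicate = predicate    # decision that needs the full set
        self.buffered = []

    def collect(self, doc_id):
        # Don't forward yet -- just remember the match.
        self.buffered.append(doc_id)

    def finish(self):
        # All matches are now known; decide with full information.
        keep = self.predicate(self.buffered)
        for doc_id in self.buffered:
            if doc_id in keep:
                self.delegate.append(doc_id)

results = []
# Hypothetical rule that needs the whole match set: keep ids above
# the average of all matched ids.
collector = BufferingPostFilterCollector(
    results, lambda ids: {d for d in ids if d > sum(ids) / len(ids)})
for d in [1, 5, 9, 3]:
    collector.collect(d)
collector.finish()
print(results)  # [5, 9]
```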



Re: replication getting stuck on a file

2013-08-01 Thread Rohit Harchandani
I am facing this problem in Solr 4.0 too. It's definitely not related to
autowarming. It just gets stuck while downloading a file, and there is no
way to abort the replication except restarting Solr.


On Wed, Jul 10, 2013 at 6:10 PM, adityab aditya_ba...@yahoo.com wrote:

 I have seen this in 4.2.1 too.
 Once replication is finished, on the Admin UI we see 100%, and the time and
 download-speed information goes out of whack. The same is reflected in
 mbeans. But what's actually happening in the background is auto-warmup of
 caches (in my case).
 Maybe it's some minor stats bug.




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/replication-getting-stuck-on-a-file-tp4076707p4077112.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: solr postfilter question

2013-08-01 Thread Rohit Harchandani
Basically, I see it is looking up the cache and getting a hit, but it still
seems to be collecting all the documents again.
Thanks,
Rohit


On Thu, Aug 1, 2013 at 4:37 PM, Rohit Harchandani rhar...@gmail.com wrote:

 Hi,
 I did finally manage to this. I get all the documents in the post filter
 and then call collect on the each ones that match the filter criteria. But
 for some reason, it does not seem to hit the query results cache (equals
 succeeds and the hashcode is good too)? Not sure what I am missing here?
 Thanks,
 Rohit


  On Wed, Jul 10, 2013 at 6:10 PM, Yonik Seeley yo...@lucidworks.com wrote:

 On Wed, Jul 10, 2013 at 6:08 PM, Rohit Harchandani rhar...@gmail.com
 wrote:
  Hey,
  I am trying to create a plugin which makes use of postfilter. I know
 that
  the collect function is called for every document matched, but is there
 a
  way i can access all the matched documents upto this point before
 collect
  is called on each of them?

 You would need to collect/cache that information yourself in the post
 filter.

 -Yonik
 http://lucidworks.com





solr postfilter question

2013-07-10 Thread Rohit Harchandani
Hey,
I am trying to create a plugin which makes use of a post filter. I know that
the collect function is called for every document matched, but is there a
way I can access all the matched documents up to this point, before collect
is called on each of them?
Thanks,
Rohit


Re: My latest solr blog post on Solr's PostFiltering

2013-07-10 Thread Rohit Harchandani
Hi Amit,

Great article. I tried it and it works well. I am new to developing in Solr
and had a question: do you know if there is a way to access all the matched
ids before collect is called?

Thanks,
Rohit


On Sat, Nov 10, 2012 at 1:12 PM, Erick Erickson erickerick...@gmail.com wrote:

 That'll teach _me_ to look closely at the URL...

 Best
 Erick


 On Fri, Nov 9, 2012 at 12:03 PM, Amit Nithian anith...@gmail.com wrote:

  Oh weird. I'll post URLs on their own lines next time to clarify.
 
  Thanks guys and looking forward to any feedback!
 
  Cheers
  Amit
 
 
  On Fri, Nov 9, 2012 at 2:05 AM, Dmitry Kan dmitry@gmail.com wrote:
 
   I guess the url should have been:
  
  
  
 
 http://hokiesuns.blogspot.com/2012/11/using-solrs-postfiltering-to-collect.html
  
   i.e. without 'and' in the end of it.
  
   -- Dmitry
  
   On Fri, Nov 9, 2012 at 12:03 PM, Erick Erickson 
 erickerick...@gmail.com
   wrote:
  
It's always good when someone writes up their experiences!
   
But when I try to follow that link, I get to your Random Writings,
  but
   it
tells me that the blog post doesn't exist...
   
Erick
   
   
On Thu, Nov 8, 2012 at 4:21 PM, Amit Nithian anith...@gmail.com
  wrote:
   
 Hey all,

 I wanted to thank those who have helped in answering some of my
   esoteric
 questions and especially the one about using Solr's post filtering
feature
 to implement some score statistics gathering we had to do at
 Zvents.

 To show this appreciation and to help advance the knowledge of this
   space
 in a more codified fashion, I have written a blog post about this
  work
and
 open sourced the work as well.

 Please take a read by visiting


   
  
 
 http://hokiesuns.blogspot.com/2012/11/using-solrs-postfiltering-to-collect.htmland
 please let me know if there are any inaccuracies or points of
 contention so I can address/correct them.

 Thanks!
 Amit

   
  
  
  
   --
   Regards,
  
   Dmitry Kan
  
 



Problem with Solr replication in solr 4.2

2013-03-21 Thread Rohit Harchandani
Hey,
Currently we are using Solr 4.0 with a master-slave setup. The data gets
indexed on the master, and then we issue a fetchindex command to replicate
it on the slave. The slave has a postCommit listener which gets kicked off
when replication finishes, and we depend on this listener to know when
replication is done. But when I tried to do the same with 4.2, the commit
does not seem to be happening. Is this a known issue? Is there any other
way to know that replication is done?
Also, when I initially tried Solr 4.2 with this setup, I noticed that with
the fetchIndex command, the files were downloaded to the temp folder, but
they were never pulled into the index directory on the slave. The only file
which made it was the lock file. Does this problem not happen anymore?
Thanks,
Rohit


Re: Run multiple instances of solr using single data directory

2012-11-14 Thread Rohit Harchandani
OK, but what are the problems when bringing up multiple instances reading
from the same data directory?
Also, how do I re-open the searchers without restarting Solr?
Thanks,
Rohit


On Tue, Nov 13, 2012 at 11:20 PM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 Hi,

 If you have high query rate, running multiple instances of Solr on the same
 server doesn't typically make sense.  I'd stop and rethink :)

 Otis
 --
 Solr Performance Monitoring - http://sematext.com/spm/index.html


 On Tue, Nov 13, 2012 at 5:46 PM, Rohit Harchandani rhar...@gmail.com
 wrote:

  Hi All,
  I am currently using solr 4.0. The application I am working on requires a
  high rate of queries per second.
  Currently, we have setup a single master and a single slave on a
 production
  machine. We want to bring up multiple instances of solr (slaves). Are
 there
  any problems, when bringing them up on different ports but using the same
  data directory? These will be only serving up queries and all the
 indexing
  will take place on the master machine.
 
  Also, if i have multiple instances from the same data directory and i
  perform replication. Would that re-open searchers on all the instances?
  Thanks,
  Rohit
 



Re: Solr 4.0 simultaneous query problem

2012-11-06 Thread Rohit Harchandani
So is it a better approach to query for smaller rows, say 500, and keep
increasing the start parameter? Wouldn't that be slower, since I have an
increasing start parameter and I will also be sorting by the same field in
each of my queries made to the multiple shards?

Also, does it make sense to have all these documents in the same shard? I
went for this approach because the shard which is queried the most is small
and gives a lot of benefit in terms of time taken for all the stats
queries. This shard is only about 5 GB, whereas the entire index will be
about 50 GB.

Thanks for the help,
Rohit

On Mon, Nov 5, 2012 at 4:02 PM, Walter Underwood wun...@wunderwood.org wrote:

 Don't query for 5000 documents. That is going to be slow no matter how it
 is implemented.

 wunder

 On Nov 5, 2012, at 1:00 PM, Rohit Harchandani wrote:

  Hi,
  So it seems that when I query multiple shards with the sort criteria for
  5000 documents, it queries all shards and gets a list of document ids and
  then adds the document ids to the original query and queries all the
 shards
  again.
  This process of doing the join of query results with the unique ids and
  getting the remaining fields is turning out to be really slow. It takes a
  while to search for a list of unique ids. Is there any config change  to
  make this process faster?
  Also what does isDistrib=false mean when solr generates the queries
  internally?
  Thanks,
  Rohit
 
  On Fri, Oct 19, 2012 at 5:23 PM, Rohit Harchandani rhar...@gmail.com
 wrote:
 
  Hi,
 
  The same query is fired always for 500 rows. The only thing different is
  the start parameter.
 
  The 3 shards are in the same instance on the same server. They all have
  the same schema. But the inherent type of the documents is different.
 Also
  most of the apps queries goes to shard A which has the smallest index
  size (4gb).
 
  The query is made to a master shard which by default goes to all 3
  shards for results. (also, the query that i am trying matches documents
  only only in shard A mentioned above)
 
  Will try debugQuery now and post it here.
 
  Thanks,
  Rohit
 
 
 
 
  On Thu, Oct 18, 2012 at 11:00 PM, Otis Gospodnetic 
  otis.gospodne...@gmail.com wrote:
 
  Hi,
 
  Maybe you can narrow this down a little further.  Are there some
  queries that are faster and some slower?  Is there a pattern?  Can you
  share examples of slow queries?  Have you tried debugQuery=true?
  These 3 shards is each of them on its own server or?  Is the slow
  one always the one that hits the biggest shard?  Do they hold the same
  type of data?  How come their sizes are so different?
 
  Otis
  --
  Search Analytics - http://sematext.com/search-analytics/index.html
  Performance Monitoring - http://sematext.com/spm/index.html
 
 
  On Thu, Oct 18, 2012 at 12:22 PM, Rohit Harchandani rhar...@gmail.com
 
  wrote:
  Hi all,
  I have an application which queries a solr instance having 3
 shards(4gb,
  13gb and 30gb index size respectively) having 6 million documents in
  all.
  When I start 10 threads in my app to make simultaneous queries (with
  rows=500 and different start parameter, sort on 1 field and no facets)
  to
  solr to return 500 different documents in each query, sometimes I see
  that
  most of the responses come back within no time (500ms-1000ms), but the
  last
  response takes close to 50 seconds (Qtime).
  I am using the latest 4.0 release. What is the reason for this delay?
 Is
  there a way to prevent this?
  Thanks and regards,
  Rohit
 
 
 

 --
 Walter Underwood
 wun...@wunderwood.org






Re: Solr 4.0 simultaneous query problem

2012-11-05 Thread Rohit Harchandani
Hi,
So it seems that when I query multiple shards with the sort criteria for
5000 documents, Solr queries all shards to get a list of document ids, then
adds those document ids to the original query and queries all the shards
again.
This process of joining the query results with the unique ids and
fetching the remaining fields is turning out to be really slow. It takes a
while to search for a list of unique ids. Is there any config change to
make this process faster?
Also, what does isDistrib=false mean when Solr generates the queries
internally?
Thanks,
Rohit

On Fri, Oct 19, 2012 at 5:23 PM, Rohit Harchandani rhar...@gmail.com wrote:

 Hi,

 The same query is fired always for 500 rows. The only thing different is
 the start parameter.

 The 3 shards are in the same instance on the same server. They all have
 the same schema. But the inherent type of the documents is different. Also
 most of the apps queries goes to shard A which has the smallest index
 size (4gb).

 The query is made to a master shard which by default goes to all 3
 shards for results. (also, the query that i am trying matches documents
 only only in shard A mentioned above)

 Will try debugQuery now and post it here.

 Thanks,
 Rohit




 On Thu, Oct 18, 2012 at 11:00 PM, Otis Gospodnetic 
 otis.gospodne...@gmail.com wrote:

 Hi,

 Maybe you can narrow this down a little further.  Are there some
 queries that are faster and some slower?  Is there a pattern?  Can you
 share examples of slow queries?  Have you tried debugQuery=true?
 These 3 shards is each of them on its own server or?  Is the slow
 one always the one that hits the biggest shard?  Do they hold the same
 type of data?  How come their sizes are so different?

 Otis
 --
 Search Analytics - http://sematext.com/search-analytics/index.html
 Performance Monitoring - http://sematext.com/spm/index.html


 On Thu, Oct 18, 2012 at 12:22 PM, Rohit Harchandani rhar...@gmail.com
 wrote:
  Hi all,
  I have an application which queries a solr instance having 3 shards(4gb,
  13gb and 30gb index size respectively) having 6 million documents in
 all.
  When I start 10 threads in my app to make simultaneous queries (with
  rows=500 and different start parameter, sort on 1 field and no facets)
 to
  solr to return 500 different documents in each query, sometimes I see
 that
  most of the responses come back within no time (500ms-1000ms), but the
 last
  response takes close to 50 seconds (Qtime).
  I am using the latest 4.0 release. What is the reason for this delay? Is
  there a way to prevent this?
  Thanks and regards,
  Rohit
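
The slow join aside, the cost of deep paging across shards can be seen with a toy model: for a given start, every shard must return its top start+rows ids, and the coordinator merge-sorts all of them and keeps only the requested page, so the work per request grows with the start parameter. A sketch of that (distributed-paging arithmetic only, not Solr's code):

```python
# Toy model of distributed paging: each shard ships its top
# (start + rows) ids; the coordinator merges them and keeps the page.
import heapq

def distributed_page(shards, start, rows):
    """shards: per-shard result lists, each already sorted ascending
    by the sort field. Returns (page, number of ids shipped)."""
    per_shard = start + rows
    candidates = [s[:per_shard] for s in shards]
    shipped = sum(len(c) for c in candidates)
    merged = list(heapq.merge(*candidates))
    return merged[start:start + rows], shipped

shards = [list(range(0, 300, 3)),   # shard A
          list(range(1, 300, 3)),   # shard B
          list(range(2, 300, 3))]   # shard C
page1, shipped1 = distributed_page(shards, start=0, rows=5)
page10, shipped10 = distributed_page(shards, start=45, rows=5)
print(page1)                # [0, 1, 2, 3, 4]
print(shipped1, shipped10)  # 15 150 -- the deeper page ships 10x the ids
```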





Retrieval of large number of documents

2012-09-12 Thread Rohit Harchandani
Hi all,
I have a Solr index with 5,000,000 documents, and my index size is 38GB. But
when I query for about 400,000 documents based on certain criteria, Solr
searches really quickly but does not return data for close to 2 minutes.
The unique key field is the only field I am requesting. Also, I apply
an XSLT transformation to the response to get a comma-separated list of
unique keys. Is there a way to improve this speed? Would sharding help in
this case?
I am currently using Solr 4.0 beta in my application.
Thanks,
Rohit


Re: Delete all documents in the index

2012-09-05 Thread Rohit Harchandani
Thanks everyone. Adding the _version_ field in the schema worked.
Deleting the data directory works for me, but I was not sure why deleting
using curl was not working.

On Wed, Sep 5, 2012 at 1:49 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

 Rohit:

 If it's easy, the easiest thing to do is to turn off your servlet
 container, rm -r * inside of the data directory, and then restart the
 container.

 Michael Della Bitta

 
 Appinions | 18 East 41st St., Suite 1806 | New York, NY 10017
 www.appinions.com
 Where Influence Isn’t a Game


 On Wed, Sep 5, 2012 at 12:56 PM, Jack Krupansky j...@basetechnology.com
 wrote:
  Check to make sure that you are not stumbling into SOLR-3432:
 deleteByQuery
  silently ignored if updateLog is enabled, but {{_version_}} field does
 not
  exist in schema.
 
  See:
  https://issues.apache.org/jira/browse/SOLR-3432
 
  This could happen if you kept the new 4.0 solrconfig.xml, but copied in
 your
  pre-4.0 schema.xml.
 
  -- Jack Krupansky
 
  -Original Message- From: Rohit Harchandani
  Sent: Wednesday, September 05, 2012 12:48 PM
  To: solr-user@lucene.apache.org
  Subject: Delete all documents in the index
 
 
  Hi,
  I am having difficulty deleting documents from the index using curl. The
  URLs I tried were:
  curl "http://localhost:9020/solr/core1/update/?stream.body=<delete><query>*:*</query></delete>&commit=true"
  curl "http://localhost:9020/solr/core1/update/?commit=true" -H
  "Content-Type: text/xml" --data-binary '<delete><query>id:[* TO
  *]</query></delete>'
  curl "http://localhost:9020/solr/core1/update/?commit=true" -H
  "Content-Type: text/xml" --data-binary
  '<delete><query>*:*</query></delete>'
  I also tried:
  curl "http://localhost:9020/solr/core1/update/?stream.body=%3Cdelete%3E%3Cquery%3E*:*%3C/query%3E%3C/delete%3E&commit=true"
  as suggested on some forums. I get a response with status=0 in all cases,
  but none of the above seem to work.
  When I run
  curl "http://localhost:9020/solr/core1/select?q=*:*&rows=0&wt=xml"
  I still get a value for numFound.
 
  I am currently using solr 4.0 beta version.
 
  Thanks for your help in advance.
  Regards,
  Rohit
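
For reference, SOLR-3432 bites when the 4.0 solrconfig.xml enables the update log but schema.xml lacks the _version_ field. The stock 4.0 example schema declares it like this (check your own schema's type names before copying):

```xml
<!-- Required by the update log (<updateLog/> in solrconfig.xml);
     without it, deleteByQuery can be silently ignored (SOLR-3432). -->
<field name="_version_" type="long" indexed="true" stored="true"/>
```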



Update data directory at run time

2012-08-14 Thread Rohit Harchandani
Hi All,
I am new to Solr and would really appreciate some help on this issue.
I have a single core setup currently and have separate instances for
querying and indexing. These two instances point to different data
directories through symbolic links since I do not want it to affect the
live searching instance.
Once the indexing is done, I swap the directories to which the symbolic
links point, so that the live searching instance now points to the
directory where the new data was indexed. But this does not seem to work,
without restarting solr.
I guess my purpose can probably be achieved using multiple cores and
SWAP(?), but wanted to know if there is a way to do this with a single
core. Is there some command needed once the directories are swapped?
I see in the core description that the directory entry under index did
not change after updating the symlinks.

(org.apache.lucene.store.MMapDirectory:org.apache.lucene.store.MMapDirectory@/bb/mbigd/mbig2580/srchSolr/apache-solr-4.0.0-ALPHA/example/solr/data/index
lockFactory=org.apache.lucene.store.NativeFSLockFactory@2447e380)

Is there a way to update this dynamically? Thanks a lot

Regards,
Rohit Harchandani
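
The multi-core SWAP hinted at above does avoid the restart: the CoreAdmin handler's SWAP action exchanges two cores, so searches move to the newly built index without bouncing Solr. A small Python sketch that only builds the request URL (host, port, and core names are hypothetical; it does not send the request):

```python
# Build a CoreAdmin SWAP request for the two-core pattern: index into
# an "on-deck" core, then swap it with the live core. Host, port, and
# core names here are made up.
from urllib.parse import urlencode

def swap_url(host, port, core, other):
    params = urlencode({"action": "SWAP", "core": core, "other": other})
    return "http://%s:%d/solr/admin/cores?%s" % (host, port, params)

url = swap_url("localhost", 9020, "live", "ondeck")
print(url)
# http://localhost:9020/solr/admin/cores?action=SWAP&core=live&other=ondeck
```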


Re: Update data directory at run time

2012-08-14 Thread Rohit Harchandani
Cool. Thanks. I will have a look at this.
But in this case, if all the files on the master are new, will the entire
index on the slave be replaced or will it add to whatever is currently
present on the slave?
Thanks again,
Rohit

On Tue, Aug 14, 2012 at 6:04 PM, Walter Underwood wun...@wunderwood.org wrote:

 Why are you not using the built-in replication? That works fine. You do
 not need to invent anything.

 wunder

 On Aug 14, 2012, at 2:57 PM, Rohit Harchandani wrote:

  Hi All,
  I am new to Solr and would really appreciate some help on this issue.
  I have a single core setup currently and have separate instances for
  querying and indexing. These two instances point to different data
  directories through symbolic links since I do not want it to affect the
  live searching instance.
  Once the indexing is done, I swap the directories to which the symbolic
  links point, so that the live searching instance now points to the
  directory where the new data was indexed. But this does not seem to work,
  without restarting solr.
  I guess my purpose can probably be acheived using multiple cores and
  SWAP(?), but wanted to know if there is a way to do this with a single
  core? Is there some command needed once the directories are swapped?
  I see in the core description, that the directory entry under index
 did
  not change after updating the symlinks.
 
 
 (org.apache.lucene.store.MMapDirectory:org.apache.lucene.store.MMapDirectory@
 /bb/mbigd/mbig2580/srchSolr/apache-solr-4.0.0-ALPHA/example/solr/data/index
  lockFactory=org.apache.lucene.store.NativeFSLockFactory@2447e380)
 
  Is there a way to update this dynamically? Thanks a lot
 
  Regards,
  Rohit Harchandani