Re: Solr Custom Filter Factory - How to pass parameters?
Can someone please point me to some samples showing how to implement custom SolrEventListeners? What is Solr's default behavior when no SolrEventListeners are configured in solrconfig.xml? I am trying to understand how a custom listener fits in with the default listeners (if there are any).

Thanks
-K'Rider

-- View this message in context: http://lucene.472066.n3.nabble.com/Solr-Custom-Filter-Factory-How-to-pass-parameters-handle-PostProcessing-tp4002217p4003014.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Solr Custom Filter Factory - How to pass parameters?
I'm reaching a bit here, since I haven't implemented one myself, but...

It seems like you're just dealing with some shared memory. Say your filter records all the stuff you want to put into the DB. When you put stuff into the shared memory, you have to figure out when to commit the batch (if you're indexing 100M docs, you probably don't want to hold all of that in memory, but what do I know). This is all done in the filter.

It seems like you could also create a SolrEventListener on the postCommit event (see: http://wiki.apache.org/solr/SolrPlugins#SolrEventListener) to put whatever remains in your list into your DB. Of course you'd have to do some synchronization so multiple threads play nicely with each other. And you'd have to be sure to fire a commit at the end of your indexing process if you want some certainty that everything is tidied up. If some delay isn't a problem and you have autocommit configured, then your event listener will be called when the next autocommit happens.

FWIW
Erick

On Tue, Aug 21, 2012 at 8:19 PM, ksu wildcats ksu.wildc...@gmail.com wrote:
> Also, does CustomFilterFactory/CustomFilter have any way to do post-processing after all documents are added to the index?
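The approach Erick describes (buffer in shared memory, write a batch when a threshold is reached, then write the remainder from a commit hook) might look roughly like the plain-Java sketch below. Everything in it is an assumption made for illustration: TokenBatchBuffer and Sink are hypothetical names, not Solr or Lucene classes, and the wiring into an actual SolrEventListener is left out. The filter would call add(), and the listener's postCommit handler would call flush().

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical shared buffer; not part of Solr or Lucene. */
class TokenBatchBuffer {
    /** Stand-in for the database write; a real sink would issue one batched insert. */
    interface Sink {
        void write(List<String> batch);
    }

    private final int batchSize;
    private final Sink sink;
    private final List<String> pending = new ArrayList<>();

    TokenBatchBuffer(int batchSize, Sink sink) {
        this.batchSize = batchSize;
        this.sink = sink;
    }

    /** Called by the filter for each term worth recording; safe across indexing threads. */
    synchronized void add(String term) {
        pending.add(term);
        if (pending.size() >= batchSize) {
            flush();
        }
    }

    /** Called by the commit hook so a partial batch below the threshold is not lost. */
    synchronized void flush() {
        if (pending.isEmpty()) {
            return;
        }
        sink.write(new ArrayList<>(pending)); // hand the sink its own copy
        pending.clear();
    }

    public static void main(String[] args) {
        List<Integer> batchSizes = new ArrayList<>();
        TokenBatchBuffer buffer = new TokenBatchBuffer(3, batch -> batchSizes.add(batch.size()));
        for (int i = 0; i < 7; i++) {
            buffer.add("term" + i);
        }
        buffer.flush(); // what the postCommit handler would do
        System.out.println(batchSizes); // prints [3, 3, 1]
    }
}
```

Holding the lock during the write keeps the sketch simple but blocks other indexing threads while the database call runs; swapping the list out under the lock and writing outside it would be the usual refinement.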
Re: Solr Custom Filter Factory - How to pass parameters?
Thanks Erick. I tried to do it all in the filter, but the problem I am running into is intercepting the final commit call. In other words, I am unable to figure out when the final flush should happen so that I don't miss any data.

One option I tried is to increase the in-memory batch size and write the data from memory to the database in the incrementToken method, but this can lose whatever is still in memory when the last batch is smaller than the set threshold.

I'll try using SolrEventListener and see if that helps resolve the issues I am running into.
RE: Solr Custom Filter Factory - How to pass parameters?
Thanks for your help. I was able to get it working using the parameters from the fieldType definition in the config files.

I am now stuck on the next step. Can you please tell me if there is a way to identify/intercept the last token that gets added to the index (across all documents)?

Here is my scenario:
1) I have a custom implementation of the incrementToken method in CustomFilter.
2) I am trying to collect all tokens from all documents, do some analysis on those tokens, and then write the result to the database.
3) I have the results saved in memory and am writing them to the database after the last token is parsed:
     if (!input.incrementToken()) {
         // custom logic that writes the data from in-memory to database
     }
4) I noticed that this approach results in too many db calls (one per document).
5) To avoid too many calls to the database, I tried to batch results from multiple documents and write them all at once, but I couldn't figure out how to determine when to flush the results from CustomFilter to the database.

Is there any method in the FilterFactory or Filter class that I can use to know that indexing is completed?
Re: Solr Custom Filter Factory - How to pass parameters?
Read through the update processor stuff. Maybe that will suggest a good place to put processing that should occur after all input has been analyzed.

http://wiki.apache.org/solr/UpdateRequestProcessor

-- Jack Krupansky

-----Original Message-----
From: ksu wildcats
Sent: Tuesday, August 21, 2012 2:02 AM
To: solr-user@lucene.apache.org
Subject: RE: Solr Custom Filter Factory - How to pass parameters?

I am now stuck on the next step. Can you please tell me if there is a way to identify/intercept the last token that gets added to the index (across all documents)? [...] Is there any method in the FilterFactory or Filter class that I can use to know that indexing is completed?
Re: Solr Custom Filter Factory - How to pass parameters?
Jack,

Reading through the documentation for UpdateRequestProcessor, my understanding is that it is good for processing documents before analysis. Is it true that processAdd (where we can have custom logic) is invoked once per document, and that it is invoked before any of the analyzers?

I couldn't figure out how I can use UpdateRequestProcessor to access the tokens stored in memory by CustomFilterFactory/CustomFilter. Can you please provide more information on how I can use UpdateRequestProcessor to handle any post-processing that needs to be done after all documents are added to the index?

Also, does CustomFilterFactory/CustomFilter have any way to do post-processing after all documents are added to the index?

Here is the code I have for CustomFilterFactory/CustomFilter. This might help explain what I am trying to do, and maybe there is a better way to do it.

The main problem I have with this approach is that I am forced to write the results stored in memory (customMap) to the database once per document, so if I have 1 million documents, that's 1 million db calls. I am trying to reduce the number of database calls by storing results in memory and writing them to the database once for every X documents (say, every 1 docs).
    // Lucene/Solr 3.x APIs (TermAttribute, BaseTokenFilterFactory)
    import java.io.IOException;
    import java.util.HashMap;
    import java.util.Map;

    import org.apache.lucene.analysis.TokenFilter;
    import org.apache.lucene.analysis.TokenStream;
    import org.apache.lucene.analysis.tokenattributes.TermAttribute;
    import org.apache.solr.analysis.BaseTokenFilterFactory;

    // CustomFilterFactory.java
    public class CustomFilterFactory extends BaseTokenFilterFactory {
        public CustomFilter create(TokenStream input) {
            // paramname: name of the attribute set on the <filter> element in schema.xml
            String databaseName = getArgs().get(paramname);
            return new CustomFilter(input, databaseName);
        }
    }

    // CustomFilter.java
    public class CustomFilter extends TokenFilter {
        private final TermAttribute termAtt;
        // Keyed on the term text: the TermAttribute instance itself is reused for every token.
        private final Map<String, Integer> customMap = new HashMap<String, Integer>();
        private final String databaseName;
        private int commitSize; // flush threshold; where it is set is not shown here

        protected CustomFilter(TokenStream input, String databaseName) {
            super(input);
            this.termAtt = addAttribute(TermAttribute.class);
            this.databaseName = databaseName;
        }

        @Override
        public final boolean incrementToken() throws IOException {
            if (!input.incrementToken()) {
                // End of this document's token stream: flush whatever is buffered.
                writeResultsToDB();
                return false;
            }
            if (addWordToCustomMap()) {
                // do some analysis on term and then populate customMap
                // customMap.put(term, somevalue);
            }
            if (customMap.size() > commitSize) {
                writeResultsToDB();
            }
            return true;
        }

        boolean addWordToCustomMap() {
            // custom logic - some validation on term to determine
            // if this should be added to customMap
            return true; // placeholder so the method compiles
        }

        void writeResultsToDB() throws IOException {
            // custom logic that reads data from customMap, does some analysis
            // and writes them to database.
        }
    }
Re: Solr Custom Filter Factory - How to pass parameters?
First, the obvious question: what kind of information? Be specific.

Second, you can pass parameters to your filter factory in your field type definitions. You could have separate schemas or separate field types for the different indexes. Is there anything this doesn't cover?

You can also provide an update processor that could supply whatever parameters you want.

-- Jack Krupansky

-----Original Message-----
From: ksu wildcats
Sent: Monday, August 20, 2012 1:19 PM
To: solr-user@lucene.apache.org
Subject: Solr Custom Filter Factory - How to pass parameters?

We are using Solr and are in the process of adding a custom filter factory to handle the processing of words/tokens to suit our needs. Here is what our custom filter factory does:
1) Reads the tokens, does some analysis, and writes the result of the analysis to a database.

We are using embedded Solr with multi-core (a separate core for each index). We have the custom filter factory configured in schema.xml.

The problem we are running into is that we are not able to pass parameters to our custom filter factory. We need to pass some additional information (index-specific, so different for each index) to our custom filter factory.

Can anyone please tell us if this is possible with Solr, or do we need to switch back to using the Lucene APIs?

Thanks
-K
Re: Solr Custom Filter Factory - How to pass parameters?
Thanks Jack. The information I want to pass is the name of the database into which the analyzed data needs to be inserted.

As I was saying earlier, our setup is:
1) We use an embedded Solr server with multiple cores, embedded into our webapp.
2) We support one index per client; each client has a separate database (RDBMS) and a separate index (core).
3) We dynamically create the config files when a client's request comes into our service for the first time. The config files (schema XML) are separate, but their content is identical for all cores.

The custom filter factory we want to add to the chain of filters in schema.xml will process tokens and write them to the client's database. I am trying to figure out a way to retrieve the database name based on the information coming in the request from the client. I hope this is clear.

Regarding your suggestion about passing parameters in field type definitions: can you please point me to documentation or an example showing how to retrieve these parameter values from within the filter factory?

Also, I am not familiar with update processors. Any link to additional information on how to provide an update processor would be greatly helpful.
RE: Solr Custom Filter Factory - How to pass parameters?
-----Original message-----
From: ksu wildcats ksu.wildc...@gmail.com
Sent: Mon 20-Aug-2012 20:28
To: solr-user@lucene.apache.org
Subject: Re: Solr Custom Filter Factory - How to pass parameters?

> Regarding your suggestion about passing parameters in field type definitions: can you please point me to documentation or an example showing how to retrieve these parameter values from within the filter factory?

You extend a TokenFilterFactory:
http://lucene.apache.org/core/4_0_0-BETA/analyzers-common/org/apache/lucene/analysis/util/TokenFilterFactory.html
which extends AbstractAnalysisFactory:
http://lucene.apache.org/core/4_0_0-BETA/analyzers-common/org/apache/lucene/analysis/util/AbstractAnalysisFactory.html

Use the get() method to get the parameters defined in the XML. Check how the stop filter retrieves its parameters:
http://svn.apache.org/viewvc/lucene/dev/trunk/lucene/analysis/common/src/java/org/apache/lucene/analysis/core/StopFilterFactory.java?view=markup
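As a rough illustration of the advice above, the sketch below mimics how a factory might read a per-core database name from the attribute map Solr passes to it. It is a plain-Java stand-in, not the real Solr base class: DatabaseAwareFactory, the "databaseName" attribute, and the "client_a_db" value are all hypothetical. In an actual factory the same lookup would go through the argument map supplied by the base class (for example getArgs().get(...) as in the CustomFilterFactory posted in this thread).

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for a token filter factory; it does not extend any Solr class. */
class DatabaseAwareFactory {
    private String databaseName;

    /** Receives the attributes of the <filter> element from schema.xml as a String map. */
    void init(Map<String, String> args) {
        databaseName = args.get("databaseName"); // hypothetical attribute name
        if (databaseName == null || databaseName.isEmpty()) {
            // Fail at startup instead of on the first indexed document.
            throw new IllegalArgumentException("Missing required attribute: databaseName");
        }
    }

    String getDatabaseName() {
        return databaseName;
    }

    public static void main(String[] unused) {
        // Plays the role of databaseName="client_a_db" on the <filter> element of one core's schema.xml.
        Map<String, String> args = new HashMap<>();
        args.put("databaseName", "client_a_db");

        DatabaseAwareFactory factory = new DatabaseAwareFactory();
        factory.init(args);
        System.out.println(factory.getDatabaseName()); // prints client_a_db
    }
}
```

Because each core has its own schema.xml, generating a different attribute value per client when the config files are created would give each core's factory its own database name without any request-time lookup.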
RE: Solr Custom Filter Factory - How to pass parameters?
Thanks Markus. The links are helpful. I will give it a try and see if that solves my problem.