Re: Subclassing DIH
I'll give the deletedEntity "trick" a try... igneous -- View this message in context: http://lucene.472066.n3.nabble.com/Subclassing-DIH-tp830954p863108.html Sent from the Solr - User mailing list archive at Nabble.com.
Re: Subclassing DIH
On 01.06.2010, at 23:35, Chris Hostetter wrote:

> : http://lucene.472066.n3.nabble.com/StackOverflowError-during-Delta-Import-td811053.html#a824780
>
> yeah, i remember that thread -- it really seems like a driver issue, but
> understandable that "fixing the driver" is probably more out of scope than
> "working around in solr"
>
> : I never did find a "good" solution to that bug however I did come up with a
> : workaround. I noticed if I removed my deletedPkQuery then the delta-import
> : would work as expected. Obviously I still have the need to delete items out
> : of the index during indexing so I wanted to subclass the DataImportHandler
> : to first update all documents then I would delete all the documents that my
> : deletedPkQuery would have deleted.
>
> i'm not a DIH expert, but have you considered the possibility of having two
> distinct "entities" declared in your config, that both refer to the same
> logical entity -- one that you use for the delta importing, and one that
> you use for the deletedPkQuery ?
>
> I'm not sure if it would work, but based on another recent thread i saw, i
> think it might...

To me the entire delta-query approach makes no sense, but I digress.

Here is a cut-down version of the config I use to do full imports, deletes and updates.

As you can see, I have parameterized the DSN information. Plus I have one query defined for the deletes and another one for both the full import and updates. If clear is set to anything but false, the where condition evaluates to true and the updated_at check is ignored in pretty much any decent RDBMS. If it's false, then updated_at is checked as per usual.

regards,
Lukas Kahwe Smith
m...@pooteeweet.org
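The config itself did not survive in this archive. As a rough, untested sketch of the kind of query described above (not the poster's actual config): the table and column names are hypothetical, the dataSource definition with the parameterized DSN is left out, and the only names taken from the message are the clear request parameter and the updated_at column.

```xml
<!-- Sketch only: table and column names are hypothetical, and the
     dataSource definition is omitted. -->
<dataConfig>
  <document>
    <!-- One query serves both full imports and updates.
         If the "clear" request parameter is anything but 'false' (including
         unset), the first condition is true and every row is selected.
         If it is 'false', only rows changed since the last import match. -->
    <entity name="item" pk="id"
            query="SELECT id, title, updated_at
                   FROM item
                   WHERE '${dataimporter.request.clear}' != 'false'
                      OR updated_at &gt; '${dataimporter.last_index_time}'"/>
  </document>
</dataConfig>
```

Under these assumptions an update run would be requested as a full-import with clear=false, most likely together with DIH's own clean=false so the existing index is not wiped first.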
Re: Subclassing DIH
: http://lucene.472066.n3.nabble.com/StackOverflowError-during-Delta-Import-td811053.html#a824780

yeah, i remember that thread -- it really seems like a driver issue, but understandable that "fixing the driver" is probably more out of scope than "working around in solr"

: I never did find a "good" solution to that bug however I did come up with a
: workaround. I noticed if I removed my deletedPkQuery then the delta-import
: would work as expected. Obviously I still have the need to delete items out
: of the index during indexing so I wanted to subclass the DataImportHandler
: to first update all documents then I would delete all the documents that my
: deletedPkQuery would have deleted.

i'm not a DIH expert, but have you considered the possibility of having two distinct "entities" declared in your config, that both refer to the same logical entity -- one that you use for the delta importing, and one that you use for the deletedPkQuery ?

I'm not sure if it would work, but based on another recent thread i saw, i think it might...

http://lucene.472066.n3.nabble.com/deleteDocByID-td858903.html#a858951

...in any event, subclassing the entire DataImportHandler definitely seems like overkill for what you are trying to achieve -- we just need to get some of the DIH experts to chime in here.

-Hoss
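For readers of the archive, the two-entity idea might be sketched roughly as follows. This is an untested illustration, not a confirmed fix for the StackOverflowError: the entity, table and column names are hypothetical, the dataSource definition is omitted, and using TemplateTransformer to populate the special $deleteDocById and $skipDoc fields is one assumed way of wiring up the deletes.

```xml
<!-- Sketch only: entity, table and column names are hypothetical;
     dataSource definition omitted. -->
<dataConfig>
  <document>
    <!-- Entity used for the delta import. Note: no deletedPkQuery here. -->
    <entity name="item" pk="id"
            query="SELECT id, title FROM item"
            deltaQuery="SELECT id FROM item
                        WHERE updated_at &gt; '${dataimporter.last_index_time}'"
            deltaImportQuery="SELECT id, title FROM item
                              WHERE id = '${dataimporter.delta.id}'"/>

    <!-- Second entity over the same logical data, used only for deletes.
         Each row sets $deleteDocById so the matching document is removed,
         and $skipDoc so the row itself is not indexed as a new document. -->
    <entity name="item_deleted" pk="id"
            transformer="TemplateTransformer"
            query="SELECT id FROM item_deleted
                   WHERE deleted_at &gt; '${dataimporter.last_index_time}'">
      <field column="$deleteDocById" template="${item_deleted.id}"/>
      <field column="$skipDoc" template="true"/>
    </entity>
  </document>
</dataConfig>
```

With a layout like this, the delete entity would presumably be run on its own as a full-import with clean=false and entity=item_deleted, separately from the delta-import of item.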
Re: Subclassing DIH
Ok, to further explain myself: first off, I was experiencing a StackOverflowError during my delta-imports after doing a full-import. The strange thing was, it only happened sometimes. The thread is here: http://lucene.472066.n3.nabble.com/StackOverflowError-during-Delta-Import-td811053.html#a824780

I never did find a "good" solution to that bug; however, I did come up with a workaround. I noticed that if I removed my deletedPkQuery, the delta-import would work as expected. Obviously I still need to delete items from the index during indexing, so I wanted to subclass the DataImportHandler to first update all documents and then delete all the documents that my deletedPkQuery would have deleted.

I can actually accomplish the above behavior using the onImportEnd EventListener; however, I lose the ability to know how many documents were actually deleted, since my manual deletion of documents doesn't get picked up in the data importer's cumulativeStatistics. My hope was that I could subclass DIH and "massage" the cumulativeStatistics after my manual deletion of documents.

FYI, my manual deletion is accomplished by sending a deleteById query to an instance of CommonsHttpSolrServer that I create from the current context of the EventListener.

Side question: how can I retrieve the number of items actually removed from the index after a deleteById query?

Thoughts on the process? There just has to be an easier way.
Re: Subclassing DIH
: I am trying to subclass DIH to add I am having a hard time trying to get
: access to the current Solr Context. How is this possible?

I don't think DIH was particularly designed to be subclassed (i'm surprised it's not final) ... it was built with the assumption that people would write plugins (transformers, datasources, etc...)

If you elaborate a little bit more on what you hope to achieve by subclassing, people can provide more insight into the best way to go about it...

http://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an "XY Problem" ... that is: you are dealing with "X", you are assuming "Y" will help you, and you are asking about "Y" without giving more details about the "X" so that we can understand the full issue. Perhaps the best solution doesn't involve "Y" at all?

See Also: http://www.perlmonks.org/index.pl?node_id=542341

-Hoss
Subclassing DIH
I am trying to subclass DIH to add I am having a hard time trying to get access to the current Solr Context. How is this possible? Is there any way to get access to the current DataSource, DataImporter etc?

On a related note... when working with an onImportEnd or onImportStart listener, how can I get a reference to the current Request/Response that initiated the import?

From the DIH subclass I can access the request/response but not the context.
From the event listener I can access the Context but not the request/response.