Re: Subclassing DIH

2010-06-01 Thread Chris Hostetter

: http://lucene.472066.n3.nabble.com/StackOverflowError-during-Delta-Import-td811053.html#a824780

yeah, i remember that thread -- it really seems like a driver issue, but 
it's understandable that fixing the driver is probably more out of scope than 
working around it in solr

: I never did find a good solution to that bug however I did come up with a
: workaround. I noticed if I removed my deletedPkQuery then the delta-import
: would work as expected. Obviously I still have the need to delete items out
: of the index during indexing so I wanted to subclass the DataImportHandler
: to first update all documents then I would delete all the documents that my
: deletedPkQuery would have deleted.

i'm not a DIH expert, but have you considered the possibility of having 
two distinct entities declared in your config, that both refer to the same 
logical entity -- one that you use for the delta importing, and one that 
you use for the deletedPkQuery ?

I'm not sure if it would work, but based on another recent thread i saw, i 
think it might...

http://lucene.472066.n3.nabble.com/deleteDocByID-td858903.html#a858951


...in any event, subclassing the entire DataImportHandler definitely seems 
like overkill for what you are trying to achieve -- we just need to get 
some of the DIH experts to chime in here.

-Hoss



Re: Subclassing DIH

2010-06-01 Thread Lukas Kahwe Smith

On 01.06.2010, at 23:35, Chris Hostetter wrote:

> : http://lucene.472066.n3.nabble.com/StackOverflowError-during-Delta-Import-td811053.html#a824780
> 
> yeah, i remember that thread -- it really seems like a driver issue, but 
> it's understandable that fixing the driver is probably more out of scope than 
> working around it in solr
> 
> : I never did find a good solution to that bug however I did come up with a
> : workaround. I noticed if I removed my deletedPkQuery then the delta-import
> : would work as expected. Obviously I still have the need to delete items out
> : of the index during indexing so I wanted to subclass the DataImportHandler
> : to first update all documents then I would delete all the documents that my
> : deletedPkQuery would have deleted.
> 
> i'm not a DIH expert, but have you considered the possibility of having 
> two distinct entities declared in your config, that both refer to the same 
> logical entity -- one that you use for the delta importing, and one that 
> you use for the deletedPkQuery ?
> 
> I'm not sure if it would work, but based on another recent thread i saw, i 
> think it might...


to me the entire delta-query approach makes no sense, but i digress. here is a 
cut-down version of the config i use to do full imports, deletes and updates:

<dataConfig>
  <dataSource type="JdbcDataSource" driver="com.mysql.jdbc.Driver"
              url="${dataimporter.request.source_dsn}" batchSize="-1"
              user="${dataimporter.request.user}"
              password="${dataimporter.request.password}"/>
  <document>
    <entity name="deletedentity" query="SELECT NULL" pk="id"
            deletedPkQuery="SELECT e.id AS `$deleteDocById`
                            FROM deletedentity AS e"/>
    <entity name="entity" query="SELECT
            e.id, e.status, e.name
            FROM entity AS e
            WHERE ('${dataimporter.request.clear}' != 'false' OR e.updated_at &gt;
            '${dataimporter.last_index_time}')"/>
  </document>
</dataConfig>

As you can see, I have parameterized the DSN information. Plus I have one query 
defined for the deletes and another one for both the full import and updates. 
if clear is set to anything but false, the where condition evaluates to true and 
the updated_at check is ignored in pretty much any decent RDBMS. if it's false, 
then the updated_at is checked as per usual.
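
To make the effect of the WHERE clause concrete, here is a minimal Java sketch of the same predicate. It is only an illustration: the class and method names are made up, and it assumes the comparison is "updated_at later than last_index_time", which is what the explanation above implies.

```java
import java.time.Instant;

// Illustrative stand-in for the SQL WHERE clause in the config;
// the class and method names are hypothetical and not part of DIH.
class DeltaFilterSketch {

    // Mirrors: ('${dataimporter.request.clear}' != 'false'
    //           OR e.updated_at > '${dataimporter.last_index_time}')
    static boolean shouldImport(String clear, Instant updatedAt, Instant lastIndexTime) {
        // Anything other than the literal string "false" selects every row.
        boolean selectEverything = !"false".equals(clear);
        return selectEverything || updatedAt.isAfter(lastIndexTime);
    }

    public static void main(String[] args) {
        Instant lastIndex = Instant.parse("2010-06-01T00:00:00Z");
        Instant older = Instant.parse("2010-05-20T00:00:00Z");
        Instant newer = Instant.parse("2010-06-01T12:00:00Z");

        System.out.println(shouldImport("true", older, lastIndex));  // prints true
        System.out.println(shouldImport("false", older, lastIndex)); // prints false
        System.out.println(shouldImport("false", newer, lastIndex)); // prints true
    }
}
```

So a request with clear=false behaves like an update that only picks up rows changed since the last index time, while any other value of clear selects every row.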

regards,
Lukas Kahwe Smith
m...@pooteeweet.org





Re: Subclassing DIH

2010-06-01 Thread Blargy

I'll give the deletedEntity trick a try... ingenious 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Subclassing-DIH-tp830954p863108.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Subclassing DIH

2010-05-20 Thread Chris Hostetter

: I am trying to subclass DIH and I am having a hard time trying to get
: access to the current Solr Context. How is this possible? 

I don't think DIH was particularly designed to be subclassed (i'm surprised 
it's not final) ... it was built with the assumption that people would 
write plugins (transformers, datasources, etc...)
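
As an illustration of the plugin route, here is a minimal sketch of a row transformer in Java. It relies on two things that, to my understanding of the DIH documentation, hold but should be checked against your Solr version: a transformer class does not have to extend Transformer as long as it has a transformRow(Map) method, and a row carrying the special `$deleteDocById` key causes that document to be deleted. The class name and the "deleted" status value are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of a DIH row transformer. The class name and the "deleted" status
// value are hypothetical. For real use the class would likely need to be
// public and in its own file so that DIH can load it by name.
class DeletedRowTransformerSketch {

    // Assumed to be called once per row produced by the entity's query.
    public Object transformRow(Map<String, Object> row) {
        if ("deleted".equals(row.get("status"))) {
            // Special DIH command key: delete the document with this id.
            row.put("$deleteDocById", row.get("id"));
        }
        return row;
    }

    public static void main(String[] args) {
        Map<String, Object> row = new HashMap<>();
        row.put("id", 42);
        row.put("status", "deleted");
        new DeletedRowTransformerSketch().transformRow(row);
        System.out.println(row.get("$deleteDocById")); // prints 42
    }
}
```

Whether this fits depends on how deleted rows are represented in the database, so treat it as a starting point rather than a recommendation.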

If you elaborate a little bit more on what you hope to achieve by 
subclassing, people can provide more insight into the best way to go about 
it...

http://people.apache.org/~hossman/#xyproblem
XY Problem

Your question appears to be an XY Problem ... that is: you are dealing
with X, you are assuming Y will help you, and you are asking about Y
without giving more details about the X so that we can understand the
full issue.  Perhaps the best solution doesn't involve Y at all?
See Also: http://www.perlmonks.org/index.pl?node_id=542341


-Hoss



Re: Subclassing DIH

2010-05-20 Thread Blargy

Ok to further explain myself.

Well, first off, I was experiencing a StackOverflowError during my
delta-imports after doing a full-import. The strange thing was, it only
happened sometimes. Thread is here:
http://lucene.472066.n3.nabble.com/StackOverflowError-during-Delta-Import-td811053.html#a824780

I never did find a good solution to that bug however I did come up with a
workaround. I noticed if I removed my deletedPkQuery then the delta-import
would work as expected. Obviously I still have the need to delete items out
of the index during indexing so I wanted to subclass the DataImportHandler
to first update all documents then I would delete all the documents that my
deletedPkQuery would have deleted.

I can actually accomplish the above behavior using the onImportEnd
EventListener; however, I lose the ability to know how many documents were
actually deleted, since my manual deletion of documents doesn't get picked up in
the data importer's cumulativeStatistics. 

My hope was that I could subclass DIH and massage the cumulativeStatistics
after my manual deletion of documents.

FYI my manual deletion is accomplished by sending a deleteById request to an
instance of CommonsHttpSolrServer that I create from the current context of
the EventListener. Side question: How can I retrieve the # of items actually
removed from the index after a deleteById request???

Thoughts on the process? There just has to be an easier way.
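
On the side question: as far as I know, the response to a deleteById update does not say how many documents were removed, so one workaround is to check which IDs are present before deleting and count those. The sketch below shows the idea in plain Java. The IndexClientSketch interface and the in-memory implementation are hypothetical stand-ins for a real SolrJ client, and the check-then-delete sequence is not atomic if other writers touch the index at the same time.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical minimal view of an index client. With SolrJ, exists() would
// be backed by a query on the unique key and deleteById() by the client's
// own deleteById call.
interface IndexClientSketch {
    boolean exists(String id);
    void deleteById(List<String> ids);
}

// In-memory stand-in so the sketch runs without a Solr server.
class InMemoryIndexSketch implements IndexClientSketch {
    private final Set<String> stored;

    InMemoryIndexSketch(String... ids) {
        this.stored = new HashSet<>(Arrays.asList(ids));
    }

    public boolean exists(String id) {
        return stored.contains(id);
    }

    public void deleteById(List<String> ids) {
        stored.removeAll(ids);
    }
}

class CountingDeleterSketch {

    // Deletes the given IDs and returns how many were present beforehand.
    static int deleteAndCount(IndexClientSketch client, List<String> ids) {
        List<String> present = new ArrayList<>();
        for (String id : new LinkedHashSet<>(ids)) { // ignore duplicate IDs
            if (client.exists(id)) {
                present.add(id);
            }
        }
        if (!present.isEmpty()) {
            client.deleteById(present);
        }
        return present.size();
    }

    public static void main(String[] args) {
        InMemoryIndexSketch index = new InMemoryIndexSketch("1", "2", "3");
        int removed = deleteAndCount(index, Arrays.asList("2", "3", "9"));
        System.out.println(removed); // prints 2
    }
}
```

The returned count could then be logged from the onImportEnd listener, although it would still not show up in cumulativeStatistics.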
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Subclassing-DIH-tp830954p832684.html


Subclassing DIH

2010-05-19 Thread Blargy

I am trying to subclass DIH and I am having a hard time trying to get
access to the current Solr Context. How is this possible? 

Is there any way to get access to the current DataSource, DataImporter etc?

On a related note... when working with an onImportEnd or onImportStart
listener, how can I get a reference to the current Request/Response that
initiated the import? 

From the DIH subclass I can access the request/response but not the context.
From the event listener I can access the Context but not the
request/response. 
-- 
View this message in context: 
http://lucene.472066.n3.nabble.com/Subclassing-DIH-tp830954p830954.html