Moser: You may not need to resort to workarounds. There are two solutions: one using delta-import and one using full-import.

Solution 1: using delta-import

If you want DIH to manage your deletes, there is also a deletedPkQuery attribute. The config may look like this:

  <entity name="posts"
          query="SELECT p.forumid, p.messageid, p.message FROM posts p, forums f WHERE f.forumid = p.forumid"
          deletedPkQuery="SELECT p.messageid FROM posts p, forums f WHERE f.forumid = p.forumid AND (p.deleted = true OR f.deleted = true)"/>

* I am assuming that p.messageid is the pk.

The deletedPkQuery is run at the beginning, and the pks it returns are used to delete documents.

Solution 2: using full-import

The config may look like this. It does a clean full import every time:

  <entity name="posts"
          query="SELECT p.forumid, p.messageid, IF (p.deleted OR f.deleted,true,false) as deleted, p.message FROM posts p, forums f WHERE f.forumid = p.forumid"/>

This adds the flag 'deleted' to each document.

If you wish to do incremental indexing, run the full-import command with clean=false. This ensures that the index is not cleaned prior to indexing.

  <entity name="posts"
          query="SELECT p.forumid, p.messageid, p.message FROM posts p, forums f WHERE f.forumid = p.forumid AND p.last_modified > '${dataimporter.last_index_time}'"
          deletedPkQuery="SELECT p.messageid FROM posts p, forums f WHERE f.forumid = p.forumid AND (p.deleted = true OR f.deleted = true)"/>

I am assuming that you maintain a last_modified timestamp in the posts table.

Note: the full-import may not be as expensive as you think. We do a full import of 3 million docs in 20 mins.

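One detail about the deleted-row condition in deletedPkQuery: in SQL, AND binds more tightly than OR, so the OR branch needs parentheses to stay inside the join. Below is a minimal sketch of the difference. It uses Python's sqlite3 module and toy tables that only mirror the posts/forums schema from the quoted message. The sample rows, the helper name deleted_ids, and the 0/1 encoding of the boolean columns are assumptions for illustration and are not part of DIH.

```python
import sqlite3


def deleted_ids(where_clause):
    """Return the messageids selected by where_clause on hypothetical toy data."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE forums (forumid INTEGER, name TEXT, deleted INTEGER);
        CREATE TABLE posts (forumid INTEGER, messageid INTEGER,
                            deleted INTEGER, message TEXT);
        -- forum 1 is live, forum 2 is deleted
        INSERT INTO forums VALUES (1, 'live forum', 0), (2, 'deleted forum', 1);
        -- post 10 is live, post 11 is deleted, post 20 sits in the deleted forum
        INSERT INTO posts VALUES (1, 10, 0, 'kept'),
                                 (1, 11, 1, 'deleted post'),
                                 (2, 20, 0, 'post in deleted forum');
    """)
    rows = conn.execute(
        "SELECT p.messageid FROM posts p, forums f WHERE " + where_clause
    ).fetchall()
    conn.close()
    return sorted(row[0] for row in rows)


# Without parentheses the condition parses as (join AND p.deleted) OR f.deleted,
# so every post is paired with every deleted forum.
unparenthesized = deleted_ids(
    "f.forumid = p.forumid AND p.deleted = 1 OR f.deleted = 1")

# With parentheses the join condition applies to both branches.
parenthesized = deleted_ids(
    "f.forumid = p.forumid AND (p.deleted = 1 OR f.deleted = 1)")

print("unparenthesized:", unparenthesized)
print("parenthesized:  ", parenthesized)
```

On this toy data the unparenthesized form also returns messageid 10, a live post in a live forum, which is the kind of row a deletedPkQuery should never hand back.
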
--Noble

On Tue, May 13, 2008 at 5:36 AM, Chris Moser (JIRA) <[EMAIL PROTECTED]> wrote:
>
>    [ https://issues.apache.org/jira/browse/SOLR-469?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12596237#action_12596237 ]
>
> Chris Moser commented on SOLR-469:
> ----------------------------------
>
> Hi Shalin,
>
> I'm indexing forums with Solr and have tables with a structure similar to
> this:
>
> {code}
> posts
> ------
> forumid int
> messageid int
> deleted boolean
> message text
>
> forums
> ------
> forumid int
> name text
> deleted boolean
>
> {code}
>
> The simplified data query I'm running goes like this:
>
> {code}
> SELECT
>   p.forumid,
>   p.messageid,
>   IF (p.deleted OR f.deleted,true,false) as deleted,
>   p.message
>
> FROM
>   posts p, forums f
>
> WHERE
>   f.forumid = p.forumid
> {code}
>
> The query checks to see if the post or the forum is deleted, and marks it in
> the index as deleted in either case (which is why I'm doing the join). The
> problem I'm running into is that the importer is running the WHERE clause
> like this:
>
> {code}
> WHERE
>   f.forumid = p.forumid and forumid=123 and messageid=123456789
> {code}
>
> In this case, the _forumid=123_ part is ambiguous (forumid being in the
> posts and the forums table) so this causes a SQL error. So I added an
> additional attribute to the entity definition (pkTable) which prepends the
> _forumid=123_ with the pkTable value so it generates _pkTable.forumid=123_.
>
> Not sure if this is the best way to do it but it fixed the problem :)
>
> > Data Import RequestHandler
> > --------------------------
> >
> >                 Key: SOLR-469
> >                 URL: https://issues.apache.org/jira/browse/SOLR-469
> >             Project: Solr
> >          Issue Type: New Feature
> >          Components: update
> >    Affects Versions: 1.3
> >            Reporter: Noble Paul
> >            Assignee: Grant Ingersoll
> >             Fix For: 1.3
> >
> >         Attachments: SOLR-469-contrib.patch, SOLR-469.patch,
> >                      SOLR-469.patch, SOLR-469.patch, SOLR-469.patch,
> >                      SOLR-469.patch, SOLR-469.patch, SOLR-469.patch,
> >                      SOLR-469.patch
> >
> >
> > We need a RequestHandler which can import data from a DB or other
> > dataSources into the Solr index. Think of it as an advanced form of the
> > SqlUpload Plugin (SOLR-103).
> > The way it works is as follows.
> >     * Provide a configuration file (xml) to the Handler which takes in the
> >       necessary SQL queries and mappings to a solr schema
> >           - It also takes in a properties file for the data source
> >             configuration
> >     * Given the configuration it can also generate the solr schema.xml
> >     * It is registered as a RequestHandler which can take two commands:
> >       do-full-import, do-delta-import
> >           - do-full-import - dumps all the data from the Database into
> >             the index (based on the SQL query in configuration)
> >           - do-delta-import - dumps all the data that has changed since
> >             last import. (We assume a modified-timestamp column in tables)
> >     * It provides an admin page
> >           - where we can schedule it to be run automatically at regular
> >             intervals
> >           - It shows the status of the Handler (idle, full-import,
> >             delta-import)
>
> --
> This message is automatically generated by JIRA.
> -
> You can reply to this email to add a comment to the issue online.
>

--
--Noble Paul