[ https://issues.apache.org/jira/browse/SOLR-1613?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]
Matt Inger updated SOLR-1613:
-----------------------------

    Description:

It is desirable to be able to segment imports by a particular field in the root entity record, so that you can update a particular segment of your database when bulk updates occur on the backend database.

For instance, if a bulk update occurs for a particular customer, it would be more efficient to update the full segment of your index for that customer than to issue updates for every single user in your index for that customer, or to update the entire index. Either of those would be a waste of processing power.

Instead, it would be more efficient to specify that a particular document field in the root entity is a segmentation field, and to define an additional query on the root entity (I'm basing my example on a JDBC-based datasource):

  <entity name="user" pk="userid" segment="customerid" ...
          query="..."
          segmentQuery="select ... where customerid=${dataimporter.request.segment}" />

Then, when you request a segment update, you specify the segment as a parameter to your request:

  /solr/db/dataimport?command=segment-import&segment=1000

It would automatically remove documents whose field corresponding to the "segment" attribute on the root entity matches the segment you are importing. In the above example, it would remove documents matching:

  customerid:1000

Though I'm not sure that's exactly the right thing to do, as we would need to ensure exact field matching, and I'm not sure what the default behavior is of the query matching that is used in the delete routines, so that would need to be looked into.

I've worked out the code segments required to do this for the JdbcDataSource, though I'm not sure what additional changes would be necessary for other datasource types, and am attaching a patch which includes these changes.
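To make the request and the delete-by-segment step concrete, here is a minimal sketch in Python. It is not taken from the attached patch. The helper names `segment_import_url` and `segment_delete_query` are hypothetical, and `segment-import` is the command proposed in this issue, not an existing DataImportHandler command. Quoting the value is one possible way to approach exact matching; it is assumed here that the segment field is a non-tokenized type, since an analyzed field could still match more loosely.

```python
from urllib.parse import urlencode


def segment_import_url(handler_path: str, segment: str) -> str:
    """Build the request URL for the proposed (hypothetical) segment-import command."""
    params = urlencode({"command": "segment-import", "segment": segment})
    return f"{handler_path}?{params}"


def segment_delete_query(field: str, value: str) -> str:
    """Build a delete query that matches one segment value as a quoted term.

    Backslashes and double quotes are escaped so the value cannot break out
    of the quotes. Assumption: the field is non-tokenized, so the quoted
    value is compared as a whole.
    """
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'{field}:"{escaped}"'


if __name__ == "__main__":
    print(segment_import_url("/solr/db/dataimport", "1000"))
    print(segment_delete_query("customerid", "1000"))
```

With the values from the example above, the first helper reproduces the request path shown in the description, and the second produces a quoted form of the `customerid:1000` query.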
  was:

It is desirable to be able to segment imports by a particular field in the root entity record, so that you can update a particular segment of your database when bulk updates occur on the backend database.

For instance, if a bulk update occurs for a particular customer, it would be more efficient to update the full segment of your index for that customer than to issue updates for every single user in your index for that customer, or to update the entire index. Either of those would be a waste of processing power.

Instead, it would be more efficient to specify that a particular document field in the root entity is a segmentation field, and to define an additional query on the root entity (I'm basing my example on a JDBC-based datasource):

  <entity name="user" pk="userid" segment="customerid" ...
          query="..."
          segmentQuery="select ... where customerid=${dataimporter.request.segment}" />

Then, when you request a segment update, you specify the segment as a parameter to your request:

  /solr/db/dataimport?command=segment-import&segment=1000

I've worked out the code segments required to do this for the JdbcDataSource, though I'm not sure what additional changes would be necessary for other datasource types, and am attaching a patch which includes these changes.

> Segmentation of data imports (not just full or single record imports)
> ---------------------------------------------------------------------
>
>                 Key: SOLR-1613
>                 URL: https://issues.apache.org/jira/browse/SOLR-1613
>             Project: Solr
>          Issue Type: New Feature
>          Components: contrib - DataImportHandler
>    Affects Versions: 1.4
>            Reporter: Matt Inger
>         Attachments: SOLR-1613.patch