Nice!

On Thu, Apr 8, 2010 at 6:50 AM, Brendan Grainger <brendan.grain...@gmail.com> wrote:
> For what it's worth, it's also really easy to implement your own
> EntityProcessor. Extend EntityProcessorBase, then implement the getNext
> method to return a Map<String, Object> representing the row you want indexed.
> I did exactly this so I could reuse my Hibernate domain models to query
> for the data instead of SQL.
>
> Brendan
>
> On Apr 8, 2010, at 9:17 AM, Shawn Heisey wrote:
>
>> On 4/7/2010 9:26 PM, bbarani wrote:
>>> Hi,
>>>
>>> I am currently using DIH to index the data from a database. I am just trying
>>> to figure out whether there are any other open source tools I could use just
>>> for the indexing part while keeping SOLR for querying.
>>>
>>> I also thought of writing custom code to retrieve the data from the
>>> database and using SOLRJ to add it as documents into Lucene. One doubt
>>> here: if I use custom code to retrieve the data and SOLRJ to commit it,
>>> will the schema file still be used? I mean the field types / analyzers /
>>> tokenizers etc. present in the schema file, or do I need to massage each
>>> value (to fit the corresponding data type) in my SOLRJ program?
>>>
>>
>> This response is more of an answer to your earlier message, where you asked
>> about batch importing, than to this exact question, but this is where the
>> discussion is, so I'm answering here. You could continue to use DIH and
>> control the batches externally. I actually wrote most of this in reply to
>> another email a few minutes ago.
>>
>> You can pass variables into DIH to specify the range of documents you want
>> to work on and handle the batching externally. Start with a full-import, or
>> a delete/optimize to clear out the index, and then run multiple
>> delta-imports.
>>
>> Here's what I'm using as the queries in my latest iteration. The
>> deltaImportQuery is identical to the regular query used for full-import.
>> The deltaQuery is just something related that returns quickly; the
>> information it returns is thrown away when the delta-import runs.
>>
>> query="SELECT * FROM ${dataimporter.request.dataTable}
>>   WHERE did > ${dataimporter.request.minDid}
>>   AND did <= ${dataimporter.request.maxDid}
>>   AND (did % ${dataimporter.request.numShards}) IN (${dataimporter.request.modVal})"
>>
>> deltaQuery="SELECT MAX(did) FROM ${dataimporter.request.dataTable}"
>>
>> deltaImportQuery="SELECT * FROM ${dataimporter.request.dataTable}
>>   WHERE did > ${dataimporter.request.minDid}
>>   AND did <= ${dataimporter.request.maxDid}
>>   AND (did % ${dataimporter.request.numShards}) IN (${dataimporter.request.modVal})"
>>
>> Then here is my URL template:
>>
>> http://HOST:PORT/solr/CORE/dataimport?command=COMMAND&dataTable=DATATABLE&numShards=NUMSHARDS&modVal=MODVAL&minDid=MINDID&maxDid=MAXDID
>>
>> And the Perl data structure that holds the replacements for the uppercase
>> parts:
>>
>> $urlBits = {
>>   HOST      => $cfg{'shards/inc.host1'},
>>   PORT      => $cfg{'shards/inc.port'},
>>   MODVAL    => $cfg{'shards/inc.modVal'},
>>   CORE      => "live",
>>   COMMAND   => "delta-import&commit=true&optimize=false",
>>   DATATABLE => $cfg{dataTable},
>>   NUMSHARDS => $cfg{numShards},
>>   MINDID    => $cfg{maxDid},
>>   MAXDID    => $dbMaxDid,
>> };
>>
>> Good luck with your setup!
>>
>> Shawn
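
To make Brendan's suggestion a bit more concrete, here is a rough sketch of a
custom EntityProcessor for the 1.4-era DIH. It is only an illustration, not
Brendan's actual code: the class name is made up, the hard-coded row stands in
for whatever Hibernate (or other) query you would really run, and the override
point shown here is nextRow() on EntityProcessorBase.

    package com.example.dih;   // hypothetical package name

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.Iterator;
    import java.util.List;
    import java.util.Map;

    import org.apache.solr.handler.dataimport.Context;
    import org.apache.solr.handler.dataimport.EntityProcessorBase;

    // Sketch of a custom EntityProcessor: load rows from any source you like
    // (Hibernate, a web service, flat files) and hand them to DIH one at a time.
    public class MyEntityProcessor extends EntityProcessorBase {

      private Iterator<Map<String, Object>> rows;

      @Override
      public void init(Context context) {
        super.init(context);
        if (rows == null) {
          // Replace this stand-in data with your real query
          // (e.g. a Hibernate session lookup against your domain models).
          List<Map<String, Object>> data = new ArrayList<Map<String, Object>>();
          Map<String, Object> example = new HashMap<String, Object>();
          example.put("id", 1);
          example.put("name", "example row");
          data.add(example);
          rows = data.iterator();
        }
      }

      @Override
      public Map<String, Object> nextRow() {
        if (rows == null || !rows.hasNext()) {
          return null;   // null tells DIH this entity has no more rows
        }
        // Map keys should match the field/column names used in data-config.xml
        return rows.next();
      }
    }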
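
On bbarani's question about whether schema.xml still applies when documents
come in through SolrJ: yes. The field types, analyzers and tokenizers defined
in schema.xml are applied on the Solr server at index time regardless of how
the document arrives, so the client only supplies raw field values. A minimal
SolrJ sketch, assuming a 1.4-era client; the URL and field names are invented
for illustration:

    import org.apache.solr.client.solrj.SolrServer;
    import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
    import org.apache.solr.common.SolrInputDocument;

    public class SolrjIndexExample {
      public static void main(String[] args) throws Exception {
        // Hypothetical core URL and field names, just for illustration.
        SolrServer server = new CommonsHttpSolrServer("http://localhost:8983/solr/live");

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("did", 12345);
        doc.addField("title", "raw text goes in; schema.xml analysis runs on the server");
        server.add(doc);

        // Commit so the document becomes searchable.
        server.commit();
      }
    }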
--
Lance Norskog
goks...@gmail.com