Nice!

On Thu, Apr 8, 2010 at 6:50 AM, Brendan Grainger
<brendan.grain...@gmail.com> wrote:
> For what it's worth, it's also really easy to implement your own 
> EntityProcessor. Extend EntityProcessorBase, then implement the getNext 
> method to return a Map<String, Object> representing the row you want indexed. 
> I did exactly this so I could reuse my Hibernate domain models to query 
> for the data instead of SQL.
>
> Brendan
>
> On Apr 8, 2010, at 9:17 AM, Shawn Heisey wrote:
>
>> On 4/7/2010 9:26 PM, bbarani wrote:
>>> Hi,
>>>
>>> I am currently using DIH to index the data from a database. I am just trying
>>> to figure out whether there are any other open source tools I can use just
>>> for indexing and then use Solr for querying.
>>>
>>> I also thought of writing custom code to retrieve the data from the
>>> database and using SolrJ to add the data as documents into the index. One
>>> doubt here: if I use custom code to retrieve the data and use SolrJ to
>>> commit it, will the schema file still be used? I mean the field types /
>>> analyzers / tokenizers etc. present in the schema file, or do I need to
>>> massage each value (to fit the corresponding data type) in my SolrJ
>>> program?
>>>
>>>
>>
>> This response is more of an answer to your earlier message about batch 
>> importing than to this exact question, but this is where the discussion 
>> is, so I'm answering here.  You could continue to use DIH and specify the 
>> batches externally; I actually wrote most of this in reply to another 
>> email just a few minutes ago.
>>
>> You can pass variables into the DIH to specify the range of documents that 
>> you want to work on, and handle the batching externally.  Start with a 
>> full-import or a delete/optimize to clear out the index and then do multiple 
>> delta-imports.
>>
>> Here's what I'm using as the queries in my latest iteration.  The 
>> deltaImportQuery is identical to the regular query used for full-import.  
>> The deltaQuery is just something related that returns quickly; the 
>> information it returns is thrown away when the delta-import runs.
>>
>> query="SELECT * FROM ${dataimporter.request.dataTable} WHERE did &gt; 
>> ${dataimporter.request.minDid} AND did &lt;= ${dataimporter.request.maxDid} 
>> AND (did % ${dataimporter.request.numShards}) IN 
>> (${dataimporter.request.modVal})"
>>
>> deltaQuery="SELECT MAX(did) FROM ${dataimporter.request.dataTable}"
>>
>> deltaImportQuery="SELECT * FROM ${dataimporter.request.dataTable} WHERE did 
>> &gt; ${dataimporter.request.minDid} AND did &lt;= 
>> ${dataimporter.request.maxDid} AND (did % ${dataimporter.request.numShards}) 
>> IN (${dataimporter.request.modVal})"
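For context, these attributes live on an <entity> element in DIH's data-config.xml, which is why the comparison operators are escaped as &gt; and &lt;. A hedged guess at the surrounding configuration (the dataSource details and entity name here are made up, not Shawn's actual config):

```xml
<dataConfig>
  <!-- Hypothetical JDBC data source; driver and URL are placeholders. -->
  <dataSource driver="com.mysql.jdbc.Driver" url="jdbc:mysql://dbhost/mydb"/>
  <document>
    <entity name="item"
            query="SELECT * FROM ${dataimporter.request.dataTable} WHERE did &gt; ${dataimporter.request.minDid} AND did &lt;= ${dataimporter.request.maxDid} AND (did % ${dataimporter.request.numShards}) IN (${dataimporter.request.modVal})"
            deltaQuery="SELECT MAX(did) FROM ${dataimporter.request.dataTable}"
            deltaImportQuery="SELECT * FROM ${dataimporter.request.dataTable} WHERE did &gt; ${dataimporter.request.minDid} AND did &lt;= ${dataimporter.request.maxDid} AND (did % ${dataimporter.request.numShards}) IN (${dataimporter.request.modVal})"/>
  </document>
</dataConfig>
```

The ${dataimporter.request.*} variables are filled from the request parameters on the dataimport URL, which is what makes the external batching possible.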
>>
>> Then here is my URL template:
>>
>> http://HOST:PORT/solr/CORE/dataimport?command=COMMAND&dataTable=DATATABLE&numShards=NUMSHARDS&modVal=MODVAL&minDid=MINDID&maxDid=MAXDID
>>
>> And the perl data structure that holds the replacements for the uppercase 
>> parts:
>>
>> $urlBits = {
>>  HOST => $cfg{'shards/inc.host1'},
>>  PORT => $cfg{'shards/inc.port'},
>>  MODVAL => $cfg{'shards/inc.modVal'},
>>  CORE => "live",
>>  COMMAND => "delta-import&commit=true&optimize=false",
>>  DATATABLE => $cfg{dataTable},
>>  NUMSHARDS => $cfg{numShards},
>>  MINDID => $cfg{maxDid},   # max did from the previous run
>>  MAXDID => $dbMaxDid,      # current MAX(did) in the database
>> };
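The substitution itself is just a string replace per placeholder. For illustration, the same idea in Java, with made-up host, port, and parameter values (not Shawn's actual configuration):

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class Main {
    // Shawn's URL template, with uppercase placeholders to be filled in.
    static final String TEMPLATE =
        "http://HOST:PORT/solr/CORE/dataimport?command=COMMAND"
        + "&dataTable=DATATABLE&numShards=NUMSHARDS&modVal=MODVAL"
        + "&minDid=MINDID&maxDid=MAXDID";

    public static void main(String[] args) {
        Map<String, String> bits = new LinkedHashMap<>();
        bits.put("HOST", "idx1.example.com");   // assumed hostname
        bits.put("PORT", "8983");               // assumed Solr port
        bits.put("CORE", "live");
        bits.put("COMMAND", "delta-import&commit=true&optimize=false");
        bits.put("DATATABLE", "documents");     // assumed table name
        bits.put("NUMSHARDS", "6");
        bits.put("MODVAL", "0");
        bits.put("MINDID", "1000");             // previous run's max did
        bits.put("MAXDID", "2000");             // current MAX(did) in the DB

        String url = TEMPLATE;
        for (Map.Entry<String, String> e : bits.entrySet()) {
            // Case-sensitive replace, so lowercase parameter names are safe.
            url = url.replace(e.getKey(), e.getValue());
        }
        System.out.println(url);
    }
}
```

Each delta-import batch then just shifts the minDid/maxDid window and requests the same URL again.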
>>
>> Good luck with your setup!
>>
>> Shawn
>>
>
>



-- 
Lance Norskog
goks...@gmail.com
