Search the list for my post "DIH - deleting documents, high performance
(delta) imports, and passing parameters", which shows a different
approach to 1:M sub-entities.

Ephraim Ofir

-----Original Message-----
From: Tim Gilbert [mailto:tim.gilb...@morningstar.com] 
Sent: Thursday, April 14, 2011 6:02 PM
To: solr-user@lucene.apache.org
Subject: RE: Fast DIH with 1:M multValue entities

How did I miss that?  Thanks, I will try that, as it seems to be the
"in-memory" lookup solution I needed.

Thanks Erick,

Tim

-----Original Message-----
From: Erick Erickson [mailto:erickerick...@gmail.com] 
Sent: Thursday, April 14, 2011 10:58 AM
To: solr-user@lucene.apache.org
Subject: Re: Fast DIH with 1:M multValue entities

I'm not sure this applies, but have you looked at
http://wiki.apache.org/solr/DataImportHandler#CachedSqlEntityProcessor
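
For the config quoted below, a minimal sketch of that approach might look
like the following. It assumes a stored procedure (hypothetically named
dbo.getFundManager_AllData here; it is not in the original post) that
returns every FundId/ManagerName pair in a single result set. The "where"
attribute asks the processor to match cached rows on their FundId column
against the parent entity's FundId:

```xml
<document name="allfund">
  <entity name="FundId" dataSource="getFundManager"
          query="{call dbo.getFundManager_Id()}">
    <field column="FundId" name="HS04C" />
    <!-- Hypothetical procedure: returns all (FundId, ManagerName) rows at once.
         The query runs a single time; its rows are cached in memory and
         looked up for each parent row instead of re-querying the database. -->
    <entity name="FundData" dataSource="getFundManager"
            processor="CachedSqlEntityProcessor"
            query="{call dbo.getFundManager_AllData()}"
            where="FundId=FundId.FundId">
      <field column="ManagerName" name="OF015" />
    </entity>
  </entity>
</document>
```

The target field (OF015 here) would still need multiValued="true" in
schema.xml, and the whole Many side has to fit in the JVM heap.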

Best
Erick

On Thu, Apr 14, 2011 at 9:12 AM, Tim Gilbert
<tim.gilb...@morningstar.com> wrote:

> We are working on importing a large number of records into Solr using
> DIH.  We have one schema with ~2000 fields declared which map off to
> several database schemas so that typically each document will have ~500
> fields in use.  We have about 2 million "rows" which we are importing,
> and we are seeing < 20 minutes in test across 14 different entities
> which really map off to one virtual document.  Then we added our
> multiValued stuff and, well, it didn't work out nearly as well. :-)
>
>
>
> We have several fields which are 1:M and so in our data-config.xml we
> might have something like this:
>
>
>
> <document name="allfund">
>   <entity name="FundId" dataSource="getFundManager"
>           query="{call dbo.getFundManager_Id()}">
>     <field column="FundId" name="HS04C" />
>     <entity name="FundData" dataSource="getFundManager"
>             query="{call dbo.getFundManager_Data(${FundId.FundId})}">
>       <field column="ManagerName" name="OF015" />
>     </entity>
>   </entity>
> </document>
>
>
>
> That is a lot of database queries for a small result set, which is
> really slowing things down for us.
>
>
>
> My question is more to ask advice, so it's a multi-parter :-)
>
>
>
> 1) Is there a way to declare in DIH an in-memory lookup where we can
> query for the entire Many side of the query in one database query, and
> match up on the PK?  Then we can declare that field multiValued.
>
> 2) Assuming that isn't currently available, I thought of
> "denormalizing" the 1:M into a delimited list and then using
> http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters#solr.WordDelimiterFilterFactory
> to tokenize.  That would allow us to search on individual bits, and
> build something into the front-end to handle the display.  That means
> we wouldn't use multiValued and we'd have to modify our db, but we'd
> lose out on some of the abilities.
>
> 3) The third option was to open up DIH and try to add the first
> feature into it ourselves.
>
>
>
> Am I approaching this the right way?  Are there other ways I haven't
> considered or don't know about?
>
>
>
> Thanks in advance,
>
>
>
> Tim
>
>
