Seems like you could split it into separate operations.

Operation 1
  - Select ProductID from tblProducts

That first operation determines the product ids for the rest of the
operations.

Operation 2
  - Branching. Use the BranchingOperation or
MultiThreadedBranchingOperation to split the list of product ids into
multiple streams

Operation 3, 4 and 5
  - Separate operations for categories, materials and types that you wire
up to operation 2. Each operation will take the original result set and do
what you need with it. You can run a join for each to pull the additional
information needed.

Nathan

On Sun, Jul 22, 2012 at 2:47 PM, Remco Ros <[email protected]> wrote:

> The tables don't have the same schema/data.
> I need to denormalize product data from sql server to solr documents (with
> multivalued fields).
>
> the source schema is somewhat like i posted before (product / categories /
> articlegroup / product attributes).
> But we don't need all data, we need just the products matching some sql
> criteria.
> We found this query is somewhat costly, so we optimized it.
>
> All in one sql command:
> - We first determine all product id's we want and insert them into a temp
> table.
> - we select al product data (one row for each product)
> - then select categories from products which match the temp table.
> - then select other attributes for products in the temp table
> - etc.
>
> on the client side, I fetch the first results (products) and construct a
> Product object from it.
> i put those in a dictionary<productid, product> for quick access later.
> Then I fetch the next resultset.
> I loop trough this set once, and add that data to the correct product
> using the previous dictionary:
> productsById[productid].Categories.Add(rdr["CatId"], etc.)
>
> In the end, this leaves me with a large collection of denormalized product
> data.
>
> I do all this in one AbstractOperation currently, passing all the products
> as a result for further processing.
>
> I was hoping I could split this one big operation into several, so we can
> for example add a new operation for each new product attribute to
> extract/transform/load.
>
> I could express this as one big SQL query and figure the joining out on
> the client side, but this would lead to one big cartesian product.
>
> Op zondag 22 juli 2012 16:35:44 UTC+2 schreef Nathan Palmer het volgende:
>
>> Remco,
>>
>> You can do multiple result-sets by using the join operators. Here is an
>> example using the DSL.
>>
>> https://github.com/**hibernating-rhinos/rhino-etl/**
>> blob/master/Rhino.Etl.Tests/**Dsl/InnerJoin.boo<https://github.com/hibernating-rhinos/rhino-etl/blob/master/Rhino.Etl.Tests/Dsl/InnerJoin.boo>
>>
>> However if you need multiple streams into the same result-set it can
>> still be done but with a custom operation. You'd want to selected from the
>> first table and emit all of the rows.. then select from the second and emit
>> all of those. At the end of the operation you'd get the combination of all
>> of the tables. This is of course assuming all tables have the same columns.
>>
>> Let me know if this makes sense or not. I might have an example of this
>> somewhere if I dig through some code.
>>
>> Nathan Palmer
>>
>> On Fri, Jul 20, 2012 at 10:34 AM, Remco Ros <[email protected]> wrote:
>>
>>> Hi,
>>>
>>> I want to use Rhino ETL do normalize relational data to documents.
>>>
>>> We need to fetch all products based on some heavy query.
>>> then fetch all related data about these products (around 6 tables).
>>>
>>> Because of performance I figured I don't want to join all these tables
>>> to products, because of the duplicate data.
>>>
>>> So we use something like this now:
>>>
>>> (all in one query):
>>>
>>> ---
>>> CREATE TABLE #productids
>>> (
>>> productId uniqueidentifier
>>> )
>>>
>>> INSERT INTO #productids
>>> SELECT p.ProductId
>>> FROM
>>> tblProduct p
>>> // snip... some heavy query to determine all products we want.
>>>
>>> SELECT * FROM tblProduct WHERE productId IN (SELECT productId FROM
>>> @productids)
>>> SELECT * FROM tblProductCategories WHERE productId IN (SELECT productId
>>> FROM @productids)
>>> SELECT * FROM tblProductMaterialls WHERE productId IN (SELECT productId
>>> FROM @productids)
>>> SELECT * FROM tblProductTypes WHERE productId IN (SELECT productId FROM
>>> @productids)
>>> // etc....
>>> ---
>>>
>>> currently I read all this data using SqlDataReader into a 'Product'
>>> class.
>>> then convert this list of products to a dictionary (for fast access)
>>> then rdr.NextResult()
>>> then loop trough the datareader and fill the extra product data
>>> then rdr.NextResult()
>>> then loop trough the datareader and fill the extra product data
>>> etc.
>>>
>>>
>>> How can this process be expressed as an ETL process? Because, As far as
>>> I studied Rhino ETL. It works only of one set of rows, and has no support
>>> for multiple result sets.
>>>
>>> Any ideas?
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "Rhino Tools Dev" group.
>>> To view this discussion on the web visit https://groups.google.com/d/**
>>> msg/rhino-tools-dev/-/_**tQtqa4iCvkJ<https://groups.google.com/d/msg/rhino-tools-dev/-/_tQtqa4iCvkJ>
>>> .
>>> To post to this group, send email to 
>>> rhino-tools-dev@googlegroups.**com<[email protected]>
>>> .
>>> To unsubscribe from this group, send email to
>>> rhino-tools-dev+unsubscribe@**googlegroups.com<rhino-tools-dev%[email protected]>
>>> .
>>> For more options, visit this group at http://groups.google.com/**
>>> group/rhino-tools-dev?hl=en<http://groups.google.com/group/rhino-tools-dev?hl=en>
>>> .
>>>
>>
>>  --
> You received this message because you are subscribed to the Google Groups
> "Rhino Tools Dev" group.
> To view this discussion on the web visit
> https://groups.google.com/d/msg/rhino-tools-dev/-/CVw-zFwzUQ4J.
>
> To post to this group, send email to [email protected].
> To unsubscribe from this group, send email to
> [email protected].
> For more options, visit this group at
> http://groups.google.com/group/rhino-tools-dev?hl=en.
>

-- 
You received this message because you are subscribed to the Google Groups 
"Rhino Tools Dev" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
For more options, visit this group at 
http://groups.google.com/group/rhino-tools-dev?hl=en.

Reply via email to