Takeshi Yamamuro created HIVEMALL-185:
-----------------------------------------

             Summary: Add an optimizer rule to push down a Sample plan node into fact tables
                 Key: HIVEMALL-185
                 URL: https://issues.apache.org/jira/browse/HIVEMALL-185
             Project: Hivemall
          Issue Type: Sub-task
            Reporter: Takeshi Yamamuro
            Assignee: Takeshi Yamamuro


Sampling is a common technique for extracting a subset of the data in joined
relations (fact tables and dimension tables) to use as training data. The
optimizer in Spark cannot push a Sample plan node down into the larger fact
tables because this node is non-deterministic. However, by using referential
integrity (RI) constraints, we could push this node down into the fact tables
in some cases.
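
To illustrate the intuition, here is a minimal sketch in plain Python (not
Spark, and not the proposed optimizer rule itself). The table contents and the
helper names `join` and `bernoulli_sample` are hypothetical, made up for this
example. It assumes an RI constraint under which every fact row references
exactly one dimension row, so the join neither drops nor duplicates fact rows;
in that case sampling the fact table before the join selects the same rows as
sampling the join result.

```python
import random

# Hypothetical toy data: each fact row holds a foreign key into the dimension table.
dim = {1: "tokyo", 2: "osaka", 3: "kyoto"}      # dim_id -> attribute
fact = [(i, 1 + i % 3) for i in range(1000)]    # (fact_id, dim_id)


def join(fact_rows, dim_table):
    """Inner join on dim_id, preserving the order of the fact rows."""
    return [(fid, did, dim_table[did]) for fid, did in fact_rows if did in dim_table]


def bernoulli_sample(rows, fraction, seed):
    """Keep each row independently with probability `fraction`."""
    rng = random.Random(seed)
    return [row for row in rows if rng.random() < fraction]


# Original plan shape: Sample on top of the Join.
sample_after_join = bernoulli_sample(join(fact, dim), 0.1, seed=42)

# Rewritten plan shape: Sample pushed down into the fact table.
join_after_sample = join(bernoulli_sample(fact, 0.1, seed=42), dim)

# With the assumed RI constraint, both plans produce the same rows.
print(sample_after_join == join_after_sample)
```

If the constraint did not hold (for example, a fact row whose key is missing
from the dimension table, or a key matching several dimension rows), the join
would change the number of rows and this row-by-row correspondence would no
longer be guaranteed, which is why the rewrite is only possible in some cases.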



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
