[GitHub] drill issue #671: DRILL-4347: Propagate distinct row count for joins from lo...
Github user jinfengni commented on the issue: https://github.com/apache/drill/pull/671 Agreed with your comment about the importance of doing calcite rebasing. AFAIK, someone(s) else in Ukraine have been working on calcite rebasing for a while. Last time I heard is they managed to get a rebased calcite branch and are dealing with regressions on Drill side. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #671: DRILL-4347: Propagate distinct row count for joins from lo...
Github user julianhyde commented on the issue: https://github.com/apache/drill/pull/671 Rebasing onto Calcite is like running after a train: you can't just stop and take a rest. :) And by the way, the state of Drill-Arrow integration makes me very sad. Now Drill has fallen behind there, I doubt whether it will ever catch up. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #671: DRILL-4347: Propagate distinct row count for joins from lo...
Github user jinfengni commented on the issue: https://github.com/apache/drill/pull/671 @jacques-n , CachingRelMetadataProvider provides caching capability per meta-method / rel node [1]. Since Drill logical rel (DrillJoinRel) and Drill physical rel (JoinPrel) are different rels, CachingRelMetadataProvider probably would not help avoiding the first meta data call for the physical rel nodes, even the meta data for logical rels are in the cache. @julianhyde , I probably once tried to cherry-pick CALCITE-604 to Drill's calcite fork, and I aborted that effort after seeing many merging conflicts (If I remember correctly). Since there has been ongoing effort to rebase Drill onto latest Calcite, it might make sense to see if the rebase work could be done shortly. At that time, Drill will benefit from CALCITE-604. 1. https://github.com/apache/calcite/blob/master/core/src/main/java/org/apache/calcite/rel/metadata/CachingRelMetadataProvider.java#L113-L120 --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #671: DRILL-4347: Propagate distinct row count for joins from lo...
Github user amansinha100 commented on the issue: https://github.com/apache/drill/pull/671 Thanks @julianhyde... CALCITE-604 should potentially help with this. Drill's calcite version has not caught up to this yet. Let me confer with @jinfengni sometime next week (he is on vacation until then) and get back on what can be done to get this into Drill. In the meantime, even though my patch addresses the hang issue, I will hold it for now. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #671: DRILL-4347: Propagate distinct row count for joins from lo...
Github user julianhyde commented on the issue: https://github.com/apache/drill/pull/671 Yes, CachingRelMetadataProvider is meant for this. After https://issues.apache.org/jira/browse/CALCITE-604, providers are considerably more efficient (not called via reflection), and RelMetadataQuery contains a per-request cache (because some metadata does change, but slowly). --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #671: DRILL-4347: Propagate distinct row count for joins from lo...
Github user amansinha100 commented on the issue: https://github.com/apache/drill/pull/671 It is quite likely that the CachingRelMetadataProvider is meant for this. Based on the stack trace, there are multiple instances of "at org.apache.calcite.rel.metadata.CachingRelMetadataProvider$CachingInvocationHandler.invoke(CachingRelMetadataProvider.java:132)" and that line # indicates that there was either a cache miss or the entry was stale. So, the caching provider does in fact get used but then subsequently gets stuck in the apply() method of the ReflectiveRelMetadataProvider. I did not attempt to debug why it got stuck there...partly because I am not very familiar with the way reflection is used in this provider. Hence, my fix is an attempt to circumvent the issue. --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---
[GitHub] drill issue #671: DRILL-4347: Propagate distinct row count for joins from lo...
Github user jacques-n commented on the issue: https://github.com/apache/drill/pull/671 Random question: won't a caching metadata provider achieve the same thing without having to do per rel node changes? At least that is what I thought it was for. Maybe @julianhyde can provide additional feedback? --- If your project is set up for it, you can reply to this email and have your reply appear on GitHub as well. If your project does not have this feature enabled and wishes so, or if the feature is enabled but not working, please contact infrastructure at infrastruct...@apache.org or file a JIRA ticket with INFRA. ---