[GitHub] carbondata issue #2335: [CARBONDATA-2573] integrate carbonstore mv branch
Github user eychih commented on the issue: https://github.com/apache/carbondata/pull/2335 @ravipesala @jackylk It looks like the discussion become uninteresting and unproductive now. I just try to make some constructive comments. The comments I made in the previous discussions remain true. In terms of MV, the performance bottleneck comes from the amount of data we need to process and how we process them, e.g. using cache, rather than matching time of the logical plan. The reason I take the approach in chowbranch because the old code is difficult to understand and to extend. This is not just my opinion. I will stop here and not make any further comment. ---
[GitHub] carbondata issue #2335: [CARBONDATA-2573] integrate carbonstore mv branch
Github user jackylk commented on the issue: https://github.com/apache/carbondata/pull/2335 LGTM ---
[GitHub] carbondata issue #2335: [CARBONDATA-2573] integrate carbonstore mv branch
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2335 @eychih The code added to rewrite with MV table is done outside so there would not be any harm to the purpose of MV. And moreover checking the sameresult multiple times gives performance problems while doing matching. And we have tested the sameresult and gives the problem if the session changes unless we maintain the cache for each session, I think that is not a good idea. ---
[GitHub] carbondata issue #2335: [CARBONDATA-2573] integrate carbonstore mv branch
Github user eychih commented on the issue: https://github.com/apache/carbondata/pull/2335 @ravipesala Incremental loading of MV tables should be done separately from query rewrite. Mixing of query rewrite with incremental loading defeats the purpose of MV. Also, the method sameresult is independent of sessions. The method can be applied to logical plans created in different sessions. Finally, the way used in the chowbranch will have better performance when an MV is cached in memory because Spark replaces the corresponding subplan with an in-memory table first. ---
[GitHub] carbondata issue #2335: [CARBONDATA-2573] integrate carbonstore mv branch
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2335 @eychih I have merged the necessary changes from chowbranch to this PR. And left outer join changes will be taken in a different PR. Factored out MV subplan to table is handled in `withMVTable` method. I cannot use the chowbranch's way it give issues related to incremental loading of table and sameresult cannot work if the session is changed.And also it has performnace impact if we use the chowbranch's way. Dropping of preAgg function is handled in MVHelper class ---
[GitHub] carbondata issue #2335: [CARBONDATA-2573] integrate carbonstore mv branch
Github user eychih commented on the issue: https://github.com/apache/carbondata/pull/2335 In addition, for consistency, please use the following way to remove PreAggFunction: val planToRegister = sparkSession.sql(updatedQuery).queryExecution.analyzed. transform { case p @ logical.Project(head :: tail, _) if head.isInstanceOf[Alias] && head.name.equals("preAgg") => p.copy(projectList = tail) case a @ logical.Aggregate(_, head :: tail, _) if head.isInstanceOf[Alias] && head.name.equals("preAgg") => a.copy(aggregateExpressions = tail) } ---
[GitHub] carbondata issue #2335: [CARBONDATA-2573] integrate carbonstore mv branch
Github user eychih commented on the issue: https://github.com/apache/carbondata/pull/2335 For 2 in the previous comment, in SummaryDatasetCatalog.scala, I add a method useSummaryDataset(plan: LogicalPlan) to replace a subplan of an MV with the corresponding MV table. ---
[GitHub] carbondata issue #2335: [CARBONDATA-2573] integrate carbonstore mv branch
Github user eychih commented on the issue: https://github.com/apache/carbondata/pull/2335 I saw you already integrate some code in chowbranch. However, in MVAnalyzerRule.scala, there are some issues: 1. At line 68, you call withMVTable directly. The issue is, if the method throws exceptions, there is no way to handle them. 2. where do you replace the factored out MV subplan with the corresponding MV table? (please refer the same file in chowbranch, to see how this get handled) ---
[GitHub] carbondata issue #2335: [CARBONDATA-2573] integrate carbonstore mv branch
Github user eychih commented on the issue: https://github.com/apache/carbondata/pull/2335 can you merge changes in chowbranch into master? code in chowbranch has passed test cases in both old and new test cases and has support for the left outer join view? ---
[GitHub] carbondata issue #2335: [CARBONDATA-2573] integrate carbonstore mv branch
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2335 @jackylk Please review and merge ---
[GitHub] carbondata issue #2335: [CARBONDATA-2573] integrate carbonstore mv branch
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2335 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5222/ ---
[GitHub] carbondata issue #2335: [CARBONDATA-2573] integrate carbonstore mv branch
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2335 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5090/ ---
[GitHub] carbondata issue #2335: [CARBONDATA-2573] integrate carbonstore mv branch
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2335 Build Success with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6252/ ---
[GitHub] carbondata issue #2335: [CARBONDATA-2573] integrate carbonstore mv branch
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2335 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5215/ ---
[GitHub] carbondata issue #2335: [CARBONDATA-2573] integrate carbonstore mv branch
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2335 Build Success with Spark 2.2.1, Please check CI http://88.99.58.216:8080/job/ApacheCarbonPRBuilder/5083/ ---
[GitHub] carbondata issue #2335: [CARBONDATA-2573] integrate carbonstore mv branch
Github user CarbonDataQA commented on the issue: https://github.com/apache/carbondata/pull/2335 Build Failed with Spark 2.1.0, Please check CI http://136.243.101.176:8080/job/ApacheCarbonPRBuilder1/6245/ ---
[GitHub] carbondata issue #2335: [CARBONDATA-2573] integrate carbonstore mv branch
Github user ravipesala commented on the issue: https://github.com/apache/carbondata/pull/2335 SDV Build Success , Please check CI http://144.76.159.231:8080/job/ApacheSDVTests/5214/ ---