[jira] [Commented] (KYLIN-2937) Intermediate data of non-partitioned cubes accumulates
    [ https://issues.apache.org/jira/browse/KYLIN-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16207049#comment-16207049 ]

zhengzfand commented on KYLIN-2937:
-----------------------------------

It works, thank you.

> Intermediate data of non-partitioned cubes accumulates
> ------------------------------------------------------
>
>         Key: KYLIN-2937
>         URL: https://issues.apache.org/jira/browse/KYLIN-2937
>     Project: Kylin
>  Issue Type: Bug
>    Reporter: zhengzfand
>
> After a non-partitioned cube is built, its intermediate data is not cleaned up.
> The dictionary files stored on HDFS keep accumulating. Because a Kylin cube build
> job may load all of these accumulated dictionary files, the JVM heap can run out
> of memory if the dictionary files are large and numerous enough.

--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
[jira] [Closed] (KYLIN-2937) Intermediate data of non-partitioned cubes accumulates
    [ https://issues.apache.org/jira/browse/KYLIN-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

zhengzfand closed KYLIN-2937.
-----------------------------
    Resolution: Fixed

> Intermediate data of non-partitioned cubes accumulates
> ------------------------------------------------------
>
>         Key: KYLIN-2937
>         URL: https://issues.apache.org/jira/browse/KYLIN-2937
>     Project: Kylin
>  Issue Type: Bug
>    Reporter: zhengzfand
>
> After a non-partitioned cube is built, its intermediate data is not cleaned up.
> The dictionary files stored on HDFS keep accumulating. Because a Kylin cube build
> job may load all of these accumulated dictionary files, the JVM heap can run out
> of memory if the dictionary files are large and numerous enough.
[jira] [Commented] (KYLIN-1892) merge interval support
    [ https://issues.apache.org/jira/browse/KYLIN-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16206955#comment-16206955 ]

fengYu commented on KYLIN-1892:
-------------------------------

Sorry for the delay. You can finish it if you have planned to. Thanks.

> merge interval support
> ----------------------
>
>               Key: KYLIN-1892
>               URL: https://issues.apache.org/jira/browse/KYLIN-1892
>           Project: Kylin
>        Issue Type: Improvement
>        Components: Job Engine
> Affects Versions: v1.5.2
>          Reporter: fengYu
>          Assignee: Yang Hao
>
> We always have some data that needs to be amended a few days later.
> In current Kylin, once I set Auto Merge Thresholds, newly built segments are
> merged as soon as they reach the thresholds, so the next day's refresh has to
> refresh the merged segment, which is unnecessary.
> So I want to add an interval configuration, meaning that auto merge only merges
> segments outside of the interval.
> For example, with interval = 2 and Auto Merge Thresholds = 7: when 07-01 to 07-07
> are built, auto merge is not triggered; when 07-09 is built successfully, auto
> merge triggers and merges the segments from 07-01 to 07-07.
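The interval rule in the KYLIN-1892 description can be sketched as follows. This is only an illustration of the 07-01 to 07-09 example, not Kylin's merge logic: the class `MergeIntervalSketch`, its method names, the use of one segment per day, and the year 2017 are all assumptions made for the sketch.

```java
import java.time.LocalDate;
import java.util.ArrayList;
import java.util.List;

class MergeIntervalSketch {
    // Hypothetical helper: returns the daily segments to merge, or an empty list
    // if auto merge should not trigger. builtDays must be sorted ascending.
    static List<LocalDate> segmentsToMerge(List<LocalDate> builtDays, int intervalDays, int thresholdDays) {
        if (builtDays.isEmpty()) {
            return new ArrayList<>();
        }
        LocalDate latest = builtDays.get(builtDays.size() - 1);
        // Segments newer than this cutoff are inside the interval and stay unmerged.
        LocalDate cutoff = latest.minusDays(intervalDays);
        List<LocalDate> eligible = new ArrayList<>();
        for (LocalDate day : builtDays) {
            if (!day.isAfter(cutoff)) {
                eligible.add(day);
            }
        }
        if (eligible.size() < thresholdDays) {
            return new ArrayList<>();
        }
        return new ArrayList<>(eligible.subList(0, thresholdDays));
    }

    // Builds the inclusive list of days between from and to.
    static List<LocalDate> days(LocalDate from, LocalDate to) {
        List<LocalDate> result = new ArrayList<>();
        for (LocalDate d = from; !d.isAfter(to); d = d.plusDays(1)) {
            result.add(d);
        }
        return result;
    }

    public static void main(String[] args) {
        LocalDate start = LocalDate.of(2017, 7, 1);
        // 07-01..07-07 built: only 5 segments lie outside the 2-day interval.
        System.out.println(segmentsToMerge(days(start, LocalDate.of(2017, 7, 7)), 2, 7));
        // 07-01..07-09 built: 07-01..07-07 lie outside the interval and are merged.
        System.out.println(segmentsToMerge(days(start, LocalDate.of(2017, 7, 9)), 2, 7));
    }
}
```

With interval = 2, the two most recent days are always left alone, so a later refresh of recent data never touches a merged segment.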
[jira] [Created] (KYLIN-2941) Configuration setting for SSO
Pan, Julian created KYLIN-2941:
----------------------------------

     Summary: Configuration setting for SSO
         Key: KYLIN-2941
         URL: https://issues.apache.org/jira/browse/KYLIN-2941
     Project: Kylin
  Issue Type: Improvement
    Reporter: Pan, Julian
    Priority: Minor

I noticed that kylin.properties has a kylin.security.saml.metadata-file property that is never used, while classpath:samlKeystore.jks and sso_metadata.xml are configured in kylinSecurity.xml.
Could we configure both of them in kylin.properties and reference them from kylinSecurity.xml, e.g. ${kylin.security.saml.metadata-file}?
[jira] [Assigned] (KYLIN-1892) merge interval support
    [ https://issues.apache.org/jira/browse/KYLIN-1892?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yang Hao reassigned KYLIN-1892:
-------------------------------
    Assignee: Yang Hao  (was: fengYu)

> merge interval support
> ----------------------
>
>               Key: KYLIN-1892
>               URL: https://issues.apache.org/jira/browse/KYLIN-1892
>           Project: Kylin
>        Issue Type: Improvement
>        Components: Job Engine
> Affects Versions: v1.5.2
>          Reporter: fengYu
>          Assignee: Yang Hao
>
> We always have some data that needs to be amended a few days later.
> In current Kylin, once I set Auto Merge Thresholds, newly built segments are
> merged as soon as they reach the thresholds, so the next day's refresh has to
> refresh the merged segment, which is unnecessary.
> So I want to add an interval configuration, meaning that auto merge only merges
> segments outside of the interval.
> For example, with interval = 2 and Auto Merge Thresholds = 7: when 07-01 to 07-07
> are built, auto merge is not triggered; when 07-09 is built successfully, auto
> merge triggers and merges the segments from 07-01 to 07-07.
[jira] [Commented] (KYLIN-2926) DumpMerger return incorrect results
    [ https://issues.apache.org/jira/browse/KYLIN-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16206944#comment-16206944 ]

fengYu commented on KYLIN-2926:
-------------------------------

This is what I meant in my response above. I think creating a codec for every dump is a good approach too. However, to solve the problem for good, removing the ThreadLocal is the better way, because it removes the trap for later developers.

> DumpMerger return incorrect results
> -----------------------------------
>
>               Key: KYLIN-2926
>               URL: https://issues.apache.org/jira/browse/KYLIN-2926
>           Project: Kylin
>        Issue Type: Bug
> Affects Versions: v2.0.0
>          Reporter: fengYu
>          Assignee: fengYu
>       Attachments: 0001-KYLIN-2926-DumpMerger-return-incorrect-results-creat.patch
>
> In our scenario, a cube query returns wrong results once the coprocessor needs
> to spill to disk. Our version is 2.0.0, and I found that the root cause is in
> DumpMerger.enqueueFromDump: DataTypeSerializer uses a ThreadLocal variable
> 'current', which leads to different elements in dumpCurrentValues sharing the
> same object, so the next fill of measure values changes the existing values.
> The incorrect measures are HLLC and raw, which use the current variable in
> deserialize.
[jira] [Commented] (KYLIN-2926) DumpMerger return incorrect results
    [ https://issues.apache.org/jira/browse/KYLIN-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16206936#comment-16206936 ]

zhengdong commented on KYLIN-2926:
----------------------------------

Hi [~feng_xiao_yu] and [~Shaofengshi], since so far we have only found that {{DumpMerger}} does not serialize measure instances in sequence, what about just changing {{DumpMerger}}? For instance, the encoded measure values could be kept in {{dumpCurrentValues}} instead of the decoded values until they are used by the final result aggregator.

> DumpMerger return incorrect results
> -----------------------------------
>
>               Key: KYLIN-2926
>               URL: https://issues.apache.org/jira/browse/KYLIN-2926
>           Project: Kylin
>        Issue Type: Bug
> Affects Versions: v2.0.0
>          Reporter: fengYu
>          Assignee: fengYu
>       Attachments: 0001-KYLIN-2926-DumpMerger-return-incorrect-results-creat.patch
>
> In our scenario, a cube query returns wrong results once the coprocessor needs
> to spill to disk. Our version is 2.0.0, and I found that the root cause is in
> DumpMerger.enqueueFromDump: DataTypeSerializer uses a ThreadLocal variable
> 'current', which leads to different elements in dumpCurrentValues sharing the
> same object, so the next fill of measure values changes the existing values.
> The incorrect measures are HLLC and raw, which use the current variable in
> deserialize.
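The suggestion to keep encoded values until they are consumed might look roughly like this. It is a minimal sketch and not Kylin code: `EncodedValuesSketch`, `Counter`, and `aggregate` are hypothetical names, and a single `long` stands in for a real measure such as HLLC.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

class EncodedValuesSketch {
    // Mutable stand-in for a decoded measure value (assumption for this sketch).
    static final class Counter {
        long value;
    }

    // One reusable object per thread, mimicking the 'current' variable in the issue.
    private static final ThreadLocal<Counter> CURRENT = ThreadLocal.withInitial(Counter::new);

    static byte[] encode(long value) {
        return ByteBuffer.allocate(Long.BYTES).putLong(value).array();
    }

    // Returns the shared per-thread object; it is only valid until the next decode().
    static Counter decode(byte[] encoded) {
        Counter c = CURRENT.get();
        c.value = ByteBuffer.wrap(encoded).getLong();
        return c;
    }

    // Pending rows stay encoded; each one is decoded and consumed immediately,
    // so the shared object is never held across two decode() calls.
    static long aggregate(List<byte[]> pendingEncoded) {
        long total = 0;
        for (byte[] encoded : pendingEncoded) {
            total += decode(encoded).value;
        }
        return total;
    }

    public static void main(String[] args) {
        List<byte[]> pending = new ArrayList<>();
        pending.add(encode(1));
        pending.add(encode(2));
        pending.add(encode(3));
        System.out.println(aggregate(pending)); // prints 6
    }
}
```

The byte arrays are independent of each other, so holding several of them at once is safe even though the decoder still reuses one object.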
[jira] [Assigned] (KYLIN-2903) support cardinality calculation for Hive view
    [ https://issues.apache.org/jira/browse/KYLIN-2903?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhong Yanghong reassigned KYLIN-2903:
-------------------------------------
    Assignee: Wang, Gang  (was: Zhong Yanghong)

> support cardinality calculation for Hive view
> ---------------------------------------------
>
>         Key: KYLIN-2903
>         URL: https://issues.apache.org/jira/browse/KYLIN-2903
>     Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>    Reporter: Wang, Gang
>    Assignee: Wang, Gang
>    Priority: Minor
>
> Currently, Kylin leverages HCatalog to calculate column cardinality for Hive
> tables. However, HCatalog does not actually support Hive views.
[jira] [Assigned] (KYLIN-2898) Introduce memcached as a distributed cache for queries
    [ https://issues.apache.org/jira/browse/KYLIN-2898?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhong Yanghong reassigned KYLIN-2898:
-------------------------------------
    Assignee: Wang Ken  (was: Zhong Yanghong)

> Introduce memcached as a distributed cache for queries
> ------------------------------------------------------
>
>               Key: KYLIN-2898
>               URL: https://issues.apache.org/jira/browse/KYLIN-2898
>           Project: Kylin
>        Issue Type: Sub-task
>        Components: Query Engine
> Affects Versions: v2.1.0
>          Reporter: Zhong Yanghong
>          Assignee: Wang Ken
[jira] [Assigned] (KYLIN-2913) Enable job retry for configurable exceptions
    [ https://issues.apache.org/jira/browse/KYLIN-2913?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhong Yanghong reassigned KYLIN-2913:
-------------------------------------
    Assignee: Wang, Gang  (was: Dong Li)

> Enable job retry for configurable exceptions
> --------------------------------------------
>
>               Key: KYLIN-2913
>               URL: https://issues.apache.org/jira/browse/KYLIN-2913
>           Project: Kylin
>        Issue Type: Improvement
>        Components: Job Engine
> Affects Versions: v2.1.0
>          Reporter: Wang, Gang
>          Assignee: Wang, Gang
>           Fix For: v2.3.0
>
> In our production environment, we regularly get certain exceptions from Hadoop
> or HBase, such as "org.apache.kylin.job.exception.NoEnoughReplicationException"
> and "java.util.ConcurrentModificationException", which result in job failure.
> However, these exceptions can actually be handled by retrying. So it would be
> much more convenient if jobs could be retried on a configurable set of
> exceptions.
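One way to picture retrying on a configurable set of exceptions is the sketch below. It is not the KYLIN-2913 patch: `RetrySketch`, `parseRetryable`, and `runWithRetry` are hypothetical names, the comma-separated configuration string is an assumption, and only unchecked exceptions are handled to keep the example short.

```java
import java.util.Arrays;
import java.util.ConcurrentModificationException;
import java.util.HashSet;
import java.util.Set;
import java.util.function.Supplier;

class RetrySketch {
    // Parses a comma-separated list of exception class names (hypothetical config format).
    static Set<String> parseRetryable(String configured) {
        Set<String> names = new HashSet<>();
        for (String name : configured.split(",")) {
            if (!name.trim().isEmpty()) {
                names.add(name.trim());
            }
        }
        return names;
    }

    // Runs the step and retries only when the thrown exception's class name is configured.
    // Checked exceptions would need a Callable-based variant; this sketch skips that.
    static <T> T runWithRetry(Supplier<T> step, Set<String> retryable, int maxRetries) {
        int retries = 0;
        while (true) {
            try {
                return step.get();
            } catch (RuntimeException e) {
                boolean canRetry = retryable.contains(e.getClass().getName());
                if (!canRetry || retries >= maxRetries) {
                    throw e;
                }
                retries++;
            }
        }
    }

    public static void main(String[] args) {
        Set<String> retryable = parseRetryable("java.util.ConcurrentModificationException");
        int[] calls = {0};
        String result = runWithRetry(() -> {
            // Fails twice with a retryable exception, then succeeds.
            if (++calls[0] < 3) {
                throw new ConcurrentModificationException("transient failure");
            }
            return "ok";
        }, retryable, 3);
        System.out.println(result + " after " + calls[0] + " calls");
        System.out.println(Arrays.toString(retryable.toArray()));
    }
}
```

Exceptions that are not in the configured set still fail the job on the first occurrence, which keeps genuine errors visible.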
[jira] [Assigned] (KYLIN-2932) Simplify the thread model for in-memory cubing
    [ https://issues.apache.org/jira/browse/KYLIN-2932?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zhong Yanghong reassigned KYLIN-2932:
-------------------------------------
    Assignee: Wang Ken  (was: Dong Li)

> Simplify the thread model for in-memory cubing
> ----------------------------------------------
>
>         Key: KYLIN-2932
>         URL: https://issues.apache.org/jira/browse/KYLIN-2932
>     Project: Kylin
>  Issue Type: Improvement
>  Components: Job Engine
>    Reporter: Wang Ken
>    Assignee: Wang Ken
>
> The current implementation uses split threads, task threads, and the main
> thread to do the cube building, with complex join and error-handling logic.
> The new implementation leverages the ForkJoinPool from the JDK; the event split
> logic is handled in the main thread. Cuboid tasks and sub-tasks are handled in
> the fork-join pool, and cube results are collected asynchronously and can be
> written to the output earlier.
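The fork-join structure described in KYLIN-2932 can be illustrated with a small sketch. It is not the actual in-memory cubing code: `ForkJoinCubingSketch` and `CuboidTask` are hypothetical, a cuboid is reduced to a bitmask of dimensions, and "building" a cuboid only records its id in a concurrent queue that stands in for the asynchronous result collection.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.ForkJoinPool;
import java.util.concurrent.RecursiveAction;

class ForkJoinCubingSketch {
    // Each task handles one cuboid and forks one sub-task per child cuboid.
    static final class CuboidTask extends RecursiveAction {
        private final int cuboid;   // bitmask of the dimensions kept in this cuboid
        private final int minBit;   // lowest dimension this task may still drop
        private final int dims;
        private final Queue<Integer> output;

        CuboidTask(int cuboid, int minBit, int dims, Queue<Integer> output) {
            this.cuboid = cuboid;
            this.minBit = minBit;
            this.dims = dims;
            this.output = output;
        }

        @Override
        protected void compute() {
            // The result is published as soon as it is ready, before the children finish.
            output.add(cuboid);
            List<CuboidTask> children = new ArrayList<>();
            for (int bit = minBit; bit < dims; bit++) {
                if ((cuboid & (1 << bit)) != 0) {
                    // Dropping only dimensions >= minBit gives every cuboid exactly one parent.
                    children.add(new CuboidTask(cuboid & ~(1 << bit), bit + 1, dims, output));
                }
            }
            invokeAll(children);
        }
    }

    // Builds all 2^dims cuboids starting from the base cuboid.
    static Queue<Integer> build(int dims) {
        Queue<Integer> output = new ConcurrentLinkedQueue<>();
        ForkJoinPool pool = new ForkJoinPool();
        try {
            pool.invoke(new CuboidTask((1 << dims) - 1, 0, dims, output));
        } finally {
            pool.shutdown();
        }
        return output;
    }

    public static void main(String[] args) {
        System.out.println(build(3).size()); // prints 8
    }
}
```

Joining and error propagation come from `invokeAll` and `pool.invoke`, which is the kind of hand-written coordination the issue wants to remove.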
[jira] [Commented] (KYLIN-2926) DumpMerger return incorrect results
    [ https://issues.apache.org/jira/browse/KYLIN-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16206898#comment-16206898 ]

fengYu commented on KYLIN-2926:
-------------------------------

[~Shaofengshi] I totally agree with you, but there are more places that refer to HLLC and RAW, and more tests are needed to cover them. I fixed it quickly with this patch because I think the bug is very serious: a big cube that contains an HLLC or RAW measure may return incorrect results once the coprocessor needs to dump data.

> DumpMerger return incorrect results
> -----------------------------------
>
>               Key: KYLIN-2926
>               URL: https://issues.apache.org/jira/browse/KYLIN-2926
>           Project: Kylin
>        Issue Type: Bug
> Affects Versions: v2.0.0
>          Reporter: fengYu
>          Assignee: fengYu
>       Attachments: 0001-KYLIN-2926-DumpMerger-return-incorrect-results-creat.patch
>
> In our scenario, a cube query returns wrong results once the coprocessor needs
> to spill to disk. Our version is 2.0.0, and I found that the root cause is in
> DumpMerger.enqueueFromDump: DataTypeSerializer uses a ThreadLocal variable
> 'current', which leads to different elements in dumpCurrentValues sharing the
> same object, so the next fill of measure values changes the existing values.
> The incorrect measures are HLLC and raw, which use the current variable in
> deserialize.
[jira] [Commented] (KYLIN-2937) Intermediate data of non-partitioned cubes accumulates
    [ https://issues.apache.org/jira/browse/KYLIN-2937?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16206885#comment-16206885 ]

Shaofeng SHI commented on KYLIN-2937:
-------------------------------------

"StorageCleanupJob" won't clean the dict files. Run:

$KYLIN_HOME/bin/metastore.sh clean --delete

This will remove the useless metadata entries, including dictionaries, snapshots, etc. Taking a backup before running this is recommended.

> Intermediate data of non-partitioned cubes accumulates
> ------------------------------------------------------
>
>         Key: KYLIN-2937
>         URL: https://issues.apache.org/jira/browse/KYLIN-2937
>     Project: Kylin
>  Issue Type: Bug
>    Reporter: zhengzfand
>
> After a non-partitioned cube is built, its intermediate data is not cleaned up.
> The dictionary files stored on HDFS keep accumulating. Because a Kylin cube build
> job may load all of these accumulated dictionary files, the JVM heap can run out
> of memory if the dictionary files are large and numerous enough.
[jira] [Created] (KYLIN-2940) List job restful throw NPE when time filter not set
Pan, Julian created KYLIN-2940:
----------------------------------

          Summary: List job restful throw NPE when time filter not set
              Key: KYLIN-2940
              URL: https://issues.apache.org/jira/browse/KYLIN-2940
          Project: Kylin
       Issue Type: Improvement
       Components: REST Service
 Affects Versions: v2.1.0
         Reporter: Pan, Julian
         Assignee: Pan, Julian

Here is the error response:
{"code":"999","data":null,"msg":null,"stacktrace":"java.lang.NullPointerException\n\tat org.apache.kylin.rest.controller.JobController.list(JobController.java:72)\n\tat sun.reflect.GeneratedMethodAccessor283.invoke(Unknown Source)\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)\n\tat java.lang.reflect.Method.invoke(Method.java:606)\n\tat org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:221)\n\tat org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:136)\n\tat org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:110)\n\tat org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:832)\n\tat org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:743)\n\tat org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:85)\n\tat org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:961)\n\tat org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:895)\n\tat org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:967)\n\tat org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:858)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:624)\n\tat 
org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:843)\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:731)\n\tat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:303)\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)\n\tat org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52)\n\tat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:241)\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:208)\n\tat org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:316)\n\tat org.springframework.security.web.access.intercept.FilterSecurityInterceptor.invoke(FilterSecurityInterceptor.java:126)\n\tat org.springframework.security.web.access.intercept.FilterSecurityInterceptor.doFilter(FilterSecurityInterceptor.java:90)\n\tat org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)\n\tat org.springframework.security.web.access.ExceptionTranslationFilter.doFilter(ExceptionTranslationFilter.java:114)\n\tat org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)\n\tat org.springframework.security.web.session.SessionManagementFilter.doFilter(SessionManagementFilter.java:122)\n\tat org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)\n\tat org.springframework.security.web.authentication.AnonymousAuthenticationFilter.doFilter(AnonymousAuthenticationFilter.java:111)\n\tat org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)\n\tat org.springframework.security.web.servletapi.SecurityContextHolderAwareRequestFilter.doFilter(SecurityContextHolderAwareRequestFilter.java:169)\n\tat 
org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)\n\tat org.springframework.security.web.savedrequest.RequestCacheAwareFilter.doFilter(RequestCacheAwareFilter.java:48)\n\tat org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)\n\tat org.springframework.security.web.authentication.www.BasicAuthenticationFilter.doFilterInternal(BasicAuthenticationFilter.java:213)\n\tat org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107)\n\tat org.springframework.security.web.FilterChainProxy$VirtualFilterChain.doFilter(FilterChainProxy.java:330)\n\tat org.springframework.security.web.authentication.AbstractAuthenticationProcessingFilter.doFilter(AbstractAuthenticationProcessingFilter.java:205)\n\tat org.springframework.security.web.FilterChainProxy$Vi
[jira] [Commented] (KYLIN-2926) DumpMerger return incorrect results
    [ https://issues.apache.org/jira/browse/KYLIN-2926?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16206842#comment-16206842 ]

Shaofeng SHI commented on KYLIN-2926:
-------------------------------------

Hi Yu, good catch. Using ThreadLocal in the DataTypeSerializers is dangerous. The assumption is that the deserialized object will be serialized in sequence (as in the MR case) or is immutable. In the GTAggregateScanner, that requirement is obviously not met for HLLC and RAW.

The ultimate solution is to fix this in each Serializer class, making deserialize() return a new object or an immutable object, as we already do in TopNCounterSerializer. That will be clearer and easier to maintain in the future.

[~liyang.g...@gmail.com] what's your opinion?

> DumpMerger return incorrect results
> -----------------------------------
>
>               Key: KYLIN-2926
>               URL: https://issues.apache.org/jira/browse/KYLIN-2926
>           Project: Kylin
>        Issue Type: Bug
> Affects Versions: v2.0.0
>          Reporter: fengYu
>          Assignee: fengYu
>       Attachments: 0001-KYLIN-2926-DumpMerger-return-incorrect-results-creat.patch
>
> In our scenario, a cube query returns wrong results once the coprocessor needs
> to spill to disk. Our version is 2.0.0, and I found that the root cause is in
> DumpMerger.enqueueFromDump: DataTypeSerializer uses a ThreadLocal variable
> 'current', which leads to different elements in dumpCurrentValues sharing the
> same object, so the next fill of measure values changes the existing values.
> The incorrect measures are HLLC and raw, which use the current variable in
> deserialize.
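The aliasing problem and the proposed fix can be reproduced in isolation. The sketch below does not use Kylin's serializer classes: `ReusingSerializer`, `FreshSerializer`, and `Counter` are hypothetical stand-ins, with a single `long` in place of an HLLC or RAW value.

```java
import java.nio.ByteBuffer;

class SharedObjectSketch {
    // Mutable stand-in for a deserialized measure value (assumption for this sketch).
    static final class Counter {
        long value;
    }

    // Mimics the problematic pattern: one reusable object per thread,
    // so every deserialize() call on a thread returns the same instance.
    static final class ReusingSerializer {
        private final ThreadLocal<Counter> current = ThreadLocal.withInitial(Counter::new);

        Counter deserialize(ByteBuffer in) {
            Counter c = current.get();
            c.value = in.getLong();
            return c;
        }
    }

    // Mimics the proposed fix: every call returns a new object.
    static final class FreshSerializer {
        Counter deserialize(ByteBuffer in) {
            Counter c = new Counter();
            c.value = in.getLong();
            return c;
        }
    }

    static ByteBuffer encode(long value) {
        ByteBuffer buf = ByteBuffer.allocate(Long.BYTES);
        buf.putLong(value);
        buf.flip();
        return buf;
    }

    public static void main(String[] args) {
        ReusingSerializer reusing = new ReusingSerializer();
        Counter a = reusing.deserialize(encode(1));
        Counter b = reusing.deserialize(encode(2));
        // Both references point at the same object, so the first value is lost.
        System.out.println((a == b) + " " + a.value); // prints "true 2"

        FreshSerializer fresh = new FreshSerializer();
        Counter c = fresh.deserialize(encode(1));
        Counter d = fresh.deserialize(encode(2));
        System.out.println((c == d) + " " + c.value); // prints "false 1"
    }
}
```

Reusing the object is harmless when each value is consumed before the next one is read; it breaks as soon as a caller holds several deserialized values at once, which is the situation described for dumpCurrentValues.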
[jira] [Commented] (KYLIN-2722) Introduce a new measure, called active reservoir, for actively pushing metrics to reporters
    [ https://issues.apache.org/jira/browse/KYLIN-2722?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16206826#comment-16206826 ]

Zhong Yanghong commented on KYLIN-2722:
---------------------------------------

bq. The term "active reservoir" sounds a little strange, because I don't see an inactive reservoir anywhere. "Active" is not meaningful without "inactive", just as there is no good without evil.

In codahale metrics, there is already a class named {{Reservoir}}, which is used in {{Histogram}} just for holding events with different strategies. Here, {{ActiveReservoir}} not only holds events but also actively flushes them to reporters.

> Introduce a new measure, called active reservoir, for actively pushing metrics to reporters
> -------------------------------------------------------------------------------------------
>
>         Key: KYLIN-2722
>         URL: https://issues.apache.org/jira/browse/KYLIN-2722
>     Project: Kylin
>  Issue Type: Sub-task
>    Reporter: Zhong Yanghong
>    Assignee: Zhong Yanghong
> Attachments: APACHE-KYLIN-2722.patch
>
> Many existing metrics frameworks focus on maintaining metrics in memory
> independently for each instance. However, a Kylin server may consist of
> multiple instances. Thus we extend the existing metrics framework by introducing
> an *active reservoir* that actively pushes metrics to reporters, which report
> the metrics of their instance to a unified storage.
> Here we introduce two *active reservoirs*. One is called
> {{BlockingReservoir}}, which buffers the metrics. The other is called
> {{InstantReservoir}}, which owns no buffer and pushes metrics directly to
> reporters.
> Generally, one *active reservoir* can push its metrics to multiple reporters,
> and one reporter can only listen to one *active reservoir*.
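The difference between a buffering and an instant reservoir can be sketched as below. This is not the attached patch: the classes are hypothetical, records are plain strings, reporters are modelled as `Consumer` callbacks, and `BufferingReservoir` flushes by batch size, which is only an assumption about how a buffering reservoir such as {{BlockingReservoir}} might behave.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.function.Consumer;

class ActiveReservoirSketch {
    // An active reservoir pushes records to its reporters instead of waiting to be polled.
    static abstract class ActiveReservoir {
        private final List<Consumer<List<String>>> reporters = new ArrayList<>();

        // One reservoir may feed several reporters.
        void addReporter(Consumer<List<String>> reporter) {
            reporters.add(reporter);
        }

        protected void push(List<String> records) {
            for (Consumer<List<String>> reporter : reporters) {
                reporter.accept(records);
            }
        }

        abstract void update(String record);
    }

    // Owns no buffer: every record is pushed immediately.
    static final class InstantReservoir extends ActiveReservoir {
        @Override
        void update(String record) {
            push(Collections.singletonList(record));
        }
    }

    // Buffers records and pushes them in batches (batch-size flushing is an assumption).
    static final class BufferingReservoir extends ActiveReservoir {
        private final int batchSize;
        private final List<String> buffer = new ArrayList<>();

        BufferingReservoir(int batchSize) {
            this.batchSize = batchSize;
        }

        @Override
        synchronized void update(String record) {
            buffer.add(record);
            if (buffer.size() >= batchSize) {
                push(new ArrayList<>(buffer));
                buffer.clear();
            }
        }
    }

    public static void main(String[] args) {
        BufferingReservoir reservoir = new BufferingReservoir(2);
        reservoir.addReporter(batch -> System.out.println("reported " + batch));
        reservoir.update("query-1");
        reservoir.update("query-2"); // the second record fills the batch and triggers a push
    }
}
```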