[jira] [Comment Edited] (KYLIN-4500) Timeout waiting for connection from pool
[ https://issues.apache.org/jira/browse/KYLIN-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423927#comment-17423927 ]

Gabor Arki edited comment on KYLIN-4500 at 10/4/21, 1:14 PM:
-------------------------------------------------------------

This has happened to our production environment today, now with Kylin 3.1.0 running on EMR 5.28. Restarting the query+job server released the connections again and resolved the issue. I assume there is another potential leak somewhere similar to KYLIN-4396 that is yet unfixed, at least in v3.1.0.

was (Author: arkigabor):
This has happened to our production environment today, now with Kylin 3.1.0 running on EMR 5.28. Restarting the query server released the connections again and resolved the issue.

> Timeout waiting for connection from pool
> ----------------------------------------
>
>                 Key: KYLIN-4500
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4500
>             Project: Kylin
>          Issue Type: Bug
>    Affects Versions: v3.0.0, v3.1.0
>            Reporter: Gabor Arki
>            Priority: Major
>         Attachments: kylin-connection-timeout.txt
>
> h4. Environment
> * Kylin server 3.0.0
> * EMR 5.28
> h4. Issue
> After an extended uptime, both the Kylin query server and jobs running on EMR stop working. The root cause in both cases is:
> {noformat}
> Caused by: java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool
>     at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.getFileStatus(S3NativeFileSystem2.java:257) ~[emrfs-hadoop-assembly-2.37.0.jar:?]{noformat}
> Based on [https://aws.amazon.com/premiumsupport/knowledge-center/emr-timeout-connection-wait/], increasing the fs.s3.maxConnections setting only delays the issue, so the underlying problem is likely a connection leak. The fact that restarting the Kylin service solves the problem also points to a leak.
> A full stack trace from the QueryService is attached.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)
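The failure mode described above is easy to reproduce outside EMR. A minimal sketch (a toy Python stand-in, not the EMRFS/AWS SDK code; the pool size and names are illustrative): when borrowed connections are never returned, a fixed-size pool eventually times out every caller, and raising the pool size only postpones the failure.

```python
import queue


class ConnectionPool:
    """Toy fixed-size connection pool; stands in for the EMRFS HTTP pool."""

    def __init__(self, max_connections):
        self._pool = queue.Queue()
        for i in range(max_connections):
            self._pool.put(f"conn-{i}")

    def borrow(self, timeout=0.1):
        try:
            return self._pool.get(timeout=timeout)
        except queue.Empty:
            raise TimeoutError("Timeout waiting for connection from pool")

    def release(self, conn):
        self._pool.put(conn)


pool = ConnectionPool(max_connections=2)

# A leaky caller borrows but never releases: after max_connections borrows,
# every further borrow times out. Raising max_connections only delays this.
leaked = [pool.borrow() for _ in range(2)]
try:
    pool.borrow()
    timed_out = False
except TimeoutError:
    timed_out = True
print(timed_out)  # True
```

Restarting the server is the equivalent of releasing everything at once, which matches the observation that a restart "fixes" the problem until the leak exhausts the pool again.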
[jira] [Comment Edited] (KYLIN-4500) Timeout waiting for connection from pool
[ https://issues.apache.org/jira/browse/KYLIN-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423927#comment-17423927 ]

Gabor Arki edited comment on KYLIN-4500 at 10/4/21, 1:08 PM:
-------------------------------------------------------------

This has happened to our production environment today, now with Kylin 3.1.0 running on EMR 5.28. Restarting the query server released the connections again and resolved the issue.
[jira] [Updated] (KYLIN-4500) Timeout waiting for connection from pool
[ https://issues.apache.org/jira/browse/KYLIN-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabor Arki updated KYLIN-4500:
------------------------------
    Affects Version/s: v3.0.0
                       v3.1.0
[jira] [Commented] (KYLIN-4500) Timeout waiting for connection from pool
[ https://issues.apache.org/jira/browse/KYLIN-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423927#comment-17423927 ]

Gabor Arki commented on KYLIN-4500:
-----------------------------------

This has happened to our production environment today, now with Kylin 3.1.0 running on EMR 5.28. Restarting the query server released the connections again and resolved the issue.
[jira] [Commented] (KYLIN-5022) Kylin upgrade: large numbers of kylin-coprocessor files generated under /mnt/tmp/hbase-hbase/local/jars/tmp
[ https://issues.apache.org/jira/browse/KYLIN-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17384695#comment-17384695 ]

Gabor Arki commented on KYLIN-5022:
-----------------------------------

Maybe using [checksum|https://hadoop.apache.org/docs/r2.8.2/api/org/apache/hadoop/fs/FileSystem.html#getFileChecksum(org.apache.hadoop.fs.Path)], although I couldn't tell whether it is applicable to the S3 filesystem. Also, the implemented algorithm depends on the filesystem used.

Another workaround could be to extend the upload method with the solution I am currently using: after the file has been copied to the {{hdfsWorkingDirectory}}, copy it back to the local file system. Or an even easier solution might be to reverse the direction of the timestamp assignment: instead of trying to set the last-modified time on the remote filesystem, take the last-modified time of the newly uploaded file and set that value on the original jar on the local file system.

> Kylin upgrade: large numbers of kylin-coprocessor files generated under /mnt/tmp/hbase-hbase/local/jars/tmp
> -----------------------------------------------------------------------------------------------------------
>
>                 Key: KYLIN-5022
>                 URL: https://issues.apache.org/jira/browse/KYLIN-5022
>             Project: Kylin
>          Issue Type: Bug
>          Components: Metadata, Storage - HBase
>    Affects Versions: v3.1.1
>            Reporter: star_dev
>            Priority: Major
>         Attachments: Capture.PNG, 屏幕快照1.png, 屏幕快照2.png, 日志.log
>
> We upgraded Kylin from 3.0.2 to 3.1.1, keeping the original metadata.
> We found a large number of kylin-coprocessor files being generated on the EMR core nodes (see attached screenshot 1), occupying a lot of space and reducing the free space of the HDFS file system. The path is /mnt/tmp/hbase-hbase/local/jars/tmp.
> Following the official documentation [http://kylin.apache.org/docs/howto/howto_update_coprocessor.html] and running the command below still did not help; the log output is attached:
> {{$KYLIN_HOME/bin/kylin.sh org.apache.kylin.storage.hbase.util.DeployCoprocessorCLI default all}}
> We also found a large number of kylin-coprocessor-3.1.1-*.jar files under the Kylin metadata path kylin_metadata/coprocessor/ (see attached screenshot 2).
> What is causing this behavior?
> How can we stop large numbers of files from being generated under /mnt/tmp/hbase-hbase/local/jars/tmp?
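The checksum idea could look roughly like this. This is a sketch using local files and MD5 as a stand-in for Hadoop's {{FileSystem.getFileChecksum}}; as noted above, the real call is filesystem-dependent and may not be usable on S3, and the function names here are illustrative, not Kylin's.

```python
import hashlib
import os
import shutil
import tempfile


def checksum(path):
    """Stand-in for FileSystem.getFileChecksum(); hashes the file contents."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            h.update(chunk)
    return h.hexdigest()


def is_same(local_jar, uploaded_jar):
    # Compare by content instead of (size, last-modified), which breaks
    # on S3 because setTimes() is silently ignored there.
    return checksum(local_jar) == checksum(uploaded_jar)


# Demo: an uploaded copy is recognized even though its mtime differs wildly.
tmp = tempfile.mkdtemp()
local = os.path.join(tmp, "kylin-coprocessor.jar")
uploaded = os.path.join(tmp, "kylin-coprocessor-uploaded.jar")
with open(local, "wb") as f:
    f.write(b"coprocessor-bytes")
shutil.copyfile(local, uploaded)
os.utime(uploaded, (0, 0))  # mismatched timestamp, as on S3
print(is_same(local, uploaded))  # True despite the timestamp mismatch
```

A content comparison is immune to the timestamp problem, at the cost of reading (or at least checksumming) both files on every comparison.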
[jira] [Commented] (KYLIN-5022) Kylin upgrade: large numbers of kylin-coprocessor files generated under /mnt/tmp/hbase-hbase/local/jars/tmp
[ https://issues.apache.org/jira/browse/KYLIN-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17383245#comment-17383245 ]

Gabor Arki commented on KYLIN-5022:
-----------------------------------

[~xxyu], I did some additional investigation, and after finding the aforementioned {{isSame}} method I also found the root cause. We are using S3 instead of HDFS, and the problem is caused by that. S3 does not support setting the last-modified timestamp: after the coprocessor jar has been copied to the remote file system, the invoked {{setTimes}} is silently ignored on an S3 filesystem. Because of that, {{isSame}} always returns false, and a new coprocessor jar is uploaded for each and every table.

As a manual workaround, I copied the coprocessor jar manually to S3 and then copied it back from S3 to the local file system. This way the last-modified timestamps match and only one coprocessor jar is used.
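The copy-back workaround can be sketched as follows (illustrative Python, not Kylin's DeployCoprocessorCLI code; {{upload}} and {{copy_back}} are hypothetical names, and a local directory stands in for S3):

```python
import os
import shutil
import tempfile


def upload(local_path, remote_dir):
    """Stand-in for copying the coprocessor jar to hdfsWorkingDirectory.
    Like S3, the uploaded copy gets its own last-modified time, and any
    later attempt to set it (setTimes) would be silently ignored."""
    dest = os.path.join(remote_dir, os.path.basename(local_path))
    shutil.copyfile(local_path, dest)  # copyfile does NOT preserve mtime
    return dest


def copy_back(remote_path, local_path):
    """The workaround: pull the uploaded jar back so the local copy
    carries the remote last-modified timestamp."""
    shutil.copyfile(remote_path, local_path)
    st = os.stat(remote_path)
    os.utime(local_path, ns=(st.st_atime_ns, st.st_mtime_ns))


tmp = tempfile.mkdtemp()
remote_dir = os.path.join(tmp, "remote")
os.makedirs(remote_dir)
local = os.path.join(tmp, "kylin-coprocessor.jar")
with open(local, "wb") as f:
    f.write(b"coprocessor-bytes")
os.utime(local, (0, 0))  # stale local timestamp

remote = upload(local, remote_dir)
copy_back(remote, local)
# The (size, last-modified) comparison now matches, so the jar is reused.
print(os.stat(local).st_mtime_ns == os.stat(remote).st_mtime_ns)  # True
```

Instead of pushing the timestamp to the remote side (which S3 ignores), the timestamp is pulled from the remote side onto the local jar, so subsequent comparisons see the two copies as identical.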
[jira] [Comment Edited] (KYLIN-5022) Kylin upgrade: large numbers of kylin-coprocessor files generated under /mnt/tmp/hbase-hbase/local/jars/tmp
[ https://issues.apache.org/jira/browse/KYLIN-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381978#comment-17381978 ]

Gabor Arki edited comment on KYLIN-5022 at 7/16/21, 10:44 AM:
--------------------------------------------------------------

The root cause seems to be that Kylin is creating and configuring a unique jar per HBase table, so HBase region servers download this 5.5 MB jar for each table separately. In our case, the ~11000 tables result in 50+ GB of space needed on our HBase region servers.

To make this issue worse, it seems that over time HBase starts to delete these jars (maybe when a table is cleaned up, maybe it does so occasionally anyway). But since the HBase region server process keeps running, the disk space occupied by these deleted jars is not freed up either unless the region server is shut down. Only then are these deleted files released and removed from the disk.
[jira] [Commented] (KYLIN-5022) Kylin upgrade: large numbers of kylin-coprocessor files generated under /mnt/tmp/hbase-hbase/local/jars/tmp
[ https://issues.apache.org/jira/browse/KYLIN-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17381978#comment-17381978 ]

Gabor Arki commented on KYLIN-5022:
-----------------------------------

The root cause seems to be that Kylin is creating and configuring a unique jar per HBase table, so HBase region servers download this 5.5 MB jar for each table separately. In our case, the ~11000 tables result in 50+ GB of space needed on our HBase region servers.

To make this issue worse, it seems that over time HBase starts to delete these jars (maybe when a table is cleaned up, maybe it does so occasionally anyway). But since the HBase region server process keeps running, the disk space occupied by these deleted jars is not freed up unless the region server is shut down. Only then are these deleted files released and removed from the disk.
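A quick sanity check of the arithmetic above, using the figures quoted in this comment (one ~5.5 MB jar deployed once per table, ~11000 tables):

```python
# Back-of-the-envelope estimate of coprocessor jar disk usage per region
# server, using the numbers reported in the comment above.
jar_size_mb = 5.5
tables = 11000
total_gib = jar_size_mb * tables / 1024
print(f"{total_gib:.1f} GiB")  # ~59.1 GiB, matching the observed 50+ GB
```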
[jira] [Updated] (KYLIN-5022) Kylin upgrade: large numbers of kylin-coprocessor files generated under /mnt/tmp/hbase-hbase/local/jars/tmp
[ https://issues.apache.org/jira/browse/KYLIN-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabor Arki updated KYLIN-5022:
------------------------------
    Attachment: Capture.PNG
[jira] [Comment Edited] (KYLIN-5022) Kylin upgrade: large numbers of kylin-coprocessor files generated under /mnt/tmp/hbase-hbase/local/jars/tmp
[ https://issues.apache.org/jira/browse/KYLIN-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17378143#comment-17378143 ]

Gabor Arki edited comment on KYLIN-5022 at 7/9/21, 4:01 PM:
------------------------------------------------------------

We are encountering a similar issue with v3.1.0. Apart from the {{kylin-coprocessor-*.jar}} files present in {{/mnt/tmp}}, this is also causing a disk-space leak. A lot of deleted files are still referenced by the HBase region server process, so the disk space cannot actually be freed. This causes a significant discrepancy between the {{du}} and {{df}} calculations.

In our case, after running the EMR cluster for Kylin for a few months, 50+ GB of such deleted {{kylin-coprocessor-3.1.0-*.jar}} files are still occupying disk space on each core node:
{code:java}
[hadoop@ip-23-0-3-131 mnt]$ sudo lsof | grep "/mnt" | grep delete | more
java  16611  hbase  666r  REG  259,3  5592785  143780992  /mnt/tmp/hbase-hbase/local/jars/tmp/.1369857329.kylin-coprocessor-3.1.0-SNAPSHOT-1597.jar.1602171232484.jar (deleted)
java  16611  hbase  679r  REG  259,3  5592785  144773588  /mnt/tmp/hbase-hbase/local/jars/tmp/.1170685576.kylin-coprocessor-3.1.0-SNAPSHOT-129.jar.1602182959858.jar (deleted)
java  16611  hbase  680r  REG  259,3  5592785  144321908  /mnt/tmp/hbase-hbase/local/jars/tmp/.-1329141342.kylin-coprocessor-3.1.0-SNAPSHOT-3653.jar.1602180128061.jar (deleted)
java  16611  hbase  681r  REG  259,3  5592785  144248531  /mnt/tmp/hbase-hbase/local/jars/tmp/.-832621882.kylin-coprocessor-3.1.0-SNAPSHOT-3651.jar.1602179699713.jar (deleted)
...{code}
[jira] [Commented] (KYLIN-5022) Kylin upgrade: large numbers of kylin-coprocessor files generated under /mnt/tmp/hbase-hbase/local/jars/tmp
[ https://issues.apache.org/jira/browse/KYLIN-5022?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17378143#comment-17378143 ]

Gabor Arki commented on KYLIN-5022:
-----------------------------------

We are encountering a similar issue with v3.1.0. Apart from the {{kylin-coprocessor-*.jar}} files present in {{/mnt/tmp}}, this is also causing a disk-space leak. A lot of deleted files are still referenced by the HBase region server process, so the disk space cannot actually be freed. This causes a significant discrepancy between the {{du}} and {{df}} calculations.

In our case, after running the EMR cluster for Kylin for a few months, 50+ GB of such deleted {{kylin-coprocessor-3.1.0-*.jar}} files are still occupying disk space on each core node:
{code:java}
[hadoop@ip-23-0-3-131 mnt]$ sudo lsof | grep "/mnt" | grep delete | more
java  16611  hbase  666r  REG  259,3  5592785  143780992  /mnt/tmp/hbase-hbase/local/jars/tmp/.1369857329.kylin-coprocessor-3.1.0-SNAPSHOT-1597.jar.1602171232484.jar (deleted)
java  16611  hbase  679r  REG  259,3  5592785  144773588  /mnt/tmp/hbase-hbase/local/jars/tmp/.1170685576.kylin-coprocessor-3.1.0-SNAPSHOT-129.jar.1602182959858.jar (deleted)
java  16611  hbase  680r  REG  259,3  5592785  144321908  /mnt/tmp/hbase-hbase/local/jars/tmp/.-1329141342.kylin-coprocessor-3.1.0-SNAPSHOT-3653.jar.1602180128061.jar (deleted)
java  16611  hbase  681r  REG  259,3  5592785  144248531  /mnt/tmp/hbase-hbase/local/jars/tmp/.-832621882.kylin-coprocessor-3.1.0-SNAPSHOT-3651.jar.1602179699713.jar (deleted)
...{code}
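The {{du}}/{{df}} discrepancy follows from POSIX unlink semantics and is easy to demonstrate (illustrative Python, not HBase code; the demo assumes a POSIX system such as Linux):

```python
import os
import tempfile

# A file unlinked while a process still holds it open keeps its blocks
# allocated: `du` (which walks directory entries) no longer sees it, but
# `df` (which reports filesystem-level usage) still does -- exactly the
# discrepancy described in the comment above.
fd, path = tempfile.mkstemp()
os.write(fd, b"x" * 4096)
os.unlink(path)                  # directory entry removed...
print(os.path.exists(path))      # False: du no longer counts it
print(os.fstat(fd).st_size)      # 4096: the data is still allocated
os.close(fd)                     # only now does the kernel free the space
```

This is why only shutting down (or restarting) the region server releases the space: closing the last file descriptor is what lets the kernel reclaim the blocks.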
[jira] [Commented] (KYLIN-4689) Deadlock in Kylin job execution
[ https://issues.apache.org/jira/browse/KYLIN-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17203901#comment-17203901 ]

Gabor Arki commented on KYLIN-4689:
-----------------------------------

[~xxyu], I might be wrong, but I think this issue is caused by a race condition rather than a broken lock:
* Even though the {{kylin.job.max-concurrent-jobs}} config defines 10, Kylin submits more stream jobs than this limit (my experience is around 30 with a 3-cube setup)
* The 30 or so jobs compete for 10 slots and 1 lock per cube
* At first, this is not a problem. Let's say *jobX* starts running, acquires the lock, and runs the _Build Dimension Dictionaries For Steaming Job_ step
* But after each step finishes, the jobs compete for the resources again
* If *jobX* gets one of the 10 slots, everything works as expected: it runs the _Save Cube Dictionaries_ step and unlocks
* But if *jobX* does not get a slot, it is a deadlock: the other jobs occupy all 10 slots and wait for the lock indefinitely, while *jobX* holds the lock and waits for a free slot indefinitely

Removing the lock of *jobX* at this point fixes the deadlock but re-introduces the possibility of running into https://issues.apache.org/jira/browse/KYLIN-4165.

> Deadlock in Kylin job execution
> -------------------------------
>
>                 Key: KYLIN-4689
>                 URL: https://issues.apache.org/jira/browse/KYLIN-4689
>             Project: Kylin
>          Issue Type: Bug
>          Components: Job Engine
>    Affects Versions: v3.0.0, v3.1.0, v3.0.1, v3.0.2
>            Reporter: Gabor Arki
>            Assignee: Xiaoxiang Yu
>            Priority: Critical
>             Fix For: v3.1.1
>
> h4. Reproduction steps
> * Install Kylin 3.1.0
> * Deploy a streaming cube
> * Enable the cube while historical data is present in the Kafka topic
> * Note: in our case, we had 3 cubes deployed, each consuming ~20 hourly segments from Kafka when the cubes were enabled
> h4.
Expected result
> * Kylin starts to process stream segments with stream jobs, eventually processing the older segments and catching up with the stream
> h4. Actual result
> * A short time after the stream jobs have started (37 successful stream jobs), all jobs are completely stuck without any progress, some in running state, some in pending state.
> * The following logs are continuously written:
> {code:java}
> 2020-08-06 06:16:22 INFO [Scheduler 116797841 Job 12750aea-3b96-c817-64e8-bf893d8c120f-254] MapReduceExecutable:409 - 12750aea-3b96-c817-64e8-bf893d8c120f-00, parent lock path(/cube_job_lock/cube_vm) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_vm is locked by other job result is true,will try after one minute
> 2020-08-06 06:16:33 WARN [FetcherRunner 787667774-43] FetcherRunner:56 - There are too many jobs running, Job Fetch will wait until next schedule time
> {code}
> * Zookeeper indicates the following locks are in place:
> {code:java}
> ls /kylin/kylin_metadata/cube_job_ephemeral_lock
> [cube_cm, cube_vm, cube_jm]
> ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_cm
> []
> ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_vm
> []
> ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_jm
> []
> ls /kylin/kylin_metadata/cube_job_lock
> [cube_cm, cube_vm, cube_jm]
> ls /kylin/kylin_metadata/cube_job_lock/cube_cm
> [f888380e-9ff4-98f5-2df4-1ae71e045f93]
> ls /kylin/kylin_metadata/cube_job_lock/cube_vm
> [fc186bd9-1186-6ed4-e58c-bbbf6dd8ef74]
> ls /kylin/kylin_metadata/cube_job_lock/cube_jm
> [d1a6475a-9ab2-5ee4-6714-f395e20cfc01]
> {code}
> * The job IDs for the running jobs:
> ** 169f75fa-a02f-221b-fc48-037bc7a842d0
> ** 0b5dae1b-6faf-66c5-71dc-86f5b820f1c4
> ** 00924699-8b51-8091-6e71-34ccfeba3a98
> ** 4620192a-71e1-16dd-3b05-44d7f9144ad4
> ** 416355c2-a3d7-57eb-55c6-c042aa256510
> ** 12750aea-3b96-c817-64e8-bf893d8c120f
> ** 42819dde-5857-fd6b-b075-439952f47140
> ** 00128937-bd4a-d6c1-7a4e-744dee946f67
> ** 46a0233f-217e-9155-725b-c815ad77ba2c
> ** 062150ba-bacd-6644-4801-3a51b260d1c5
> As you can see, the 10 jobs that are actually running do not possess the locks and thus cannot actually do anything (they were all stuck at the step Build Dimension Dictionaries For Steaming Job). On the other hand, the 3 jobs possessing the locks cannot resume running because there are already 10 jobs in running state, and thus cannot proceed and release the locks. This is a deadlock and the cluster is completely stuck.
> We have been observing this behavior in 3.0.0 (where rolling back https://issues.apache.org/jira/browse/KYLIN-4165 resolved the issue), and now in 3.1.0 as well. It was originally reported in the comments of https://issues.apache.org/jira/browse/KYLIN-4348, but I'm not sure it is related to that bug/epic.
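The stuck state described above can be reduced to a small invariant check (an illustrative sketch, not Kylin's scheduler code; the job names and slot count are made up to mirror the report):

```python
MAX_SLOTS = 10  # mirrors kylin.job.max-concurrent-jobs


def is_deadlocked(slot_holders, lock_holder):
    """The stuck state from the comment above: every execution slot is
    occupied by jobs waiting on the cube lock, while the lock is held by
    a job that has no slot. Nothing can make progress."""
    all_slots_taken = len(slot_holders) >= MAX_SLOTS
    holder_starved = lock_holder is not None and lock_holder not in slot_holders
    return all_slots_taken and holder_starved


running = {f"job-{i}" for i in range(MAX_SLOTS)}  # 10 slots, all taken
print(is_deadlocked(running, "jobX"))   # True: jobX holds the lock, no slot
print(is_deadlocked(running, "job-3"))  # False: the lock holder can run
```

This also shows why removing *jobX*'s lock breaks the cycle: once `lock_holder` is a job that also holds a slot (or nobody), the invariant no longer holds and the jobs can drain.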
[jira] [Updated] (KYLIN-4689) Deadlock in Kylin job execution
[ https://issues.apache.org/jira/browse/KYLIN-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Arki updated KYLIN-4689: -- Affects Version/s: v3.0.1 v3.0.2 > Deadlock in Kylin job execution > --- > > Key: KYLIN-4689 > URL: https://issues.apache.org/jira/browse/KYLIN-4689 > Project: Kylin > Issue Type: Bug > Components: Job Engine >Affects Versions: v3.0.0, v3.1.0, v3.0.1, v3.0.2 >Reporter: Gabor Arki >Priority: Critical > > h4. Reproduction steps > * Install Kylin 3.1.0 > * Deploy a streaming cube > * Enable the cube having historical data present in the Kafka topic > * Note: in our case, we had 3 cubes deployed, each consuming ~20-20 hourly > segments from Kafka when the cubes were enabled > h4. Expected result > * Kylin is starting to process stream segments with stream jobs, eventually > processing the older segments and catching up with the stream > h4. Actual result > * After a short time, all jobs are completely stuck without any progress. > Some in running state, some in pending state. 
> * The following logs are continuously written: > {code:java} > 2020-08-06 06:16:22 INFO [Scheduler 116797841 Job > 12750aea-3b96-c817-64e8-bf893d8c120f-254] MapReduceExecutable:409 - > 12750aea-3b96-c817-64e8-bf893d8c120f-00, parent lock > path(/cube_job_lock/cube_vm) is locked by other job result is true ,ephemeral > lock path :/cube_job_ephemeral_lock/cube_vm is locked by other job result is > true,will try after one minute > 2020-08-06 06:16:33 WARN [FetcherRunner 787667774-43] FetcherRunner:56 - > There are too many jobs running, Job Fetch will wait until next schedule time > {code} > * Zookeeper indicates the following locks are in place: > {code:java} > ls /kylin/kylin_metadata/cube_job_ephemeral_lock > [cube_cm, cube_vm, cube_jm] > ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_cm > [] > ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_vm > [] > ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_jm > [] > ls /kylin/kylin_metadata/cube_job_lock > [cube_cm, cube_vm, cube_jm] > ls /kylin/kylin_metadata/cube_job_lock/cube_cm > [f888380e-9ff4-98f5-2df4-1ae71e045f93] > ls /kylin/kylin_metadata/cube_job_lock/cube_vm > [fc186bd9-1186-6ed4-e58c-bbbf6dd8ef74] > ls /kylin/kylin_metadata/cube_job_lock/cube_jm > [d1a6475a-9ab2-5ee4-6714-f395e20cfc01] > {code} > * The job IDs for the running jobs: > ** 169f75fa-a02f-221b-fc48-037bc7a842d0 > ** 0b5dae1b-6faf-66c5-71dc-86f5b820f1c4 > ** 00924699-8b51-8091-6e71-34ccfeba3a98 > ** 4620192a-71e1-16dd-3b05-44d7f9144ad4 > ** 416355c2-a3d7-57eb-55c6-c042aa256510 > ** 12750aea-3b96-c817-64e8-bf893d8c120f > ** 42819dde-5857-fd6b-b075-439952f47140 > ** 00128937-bd4a-d6c1-7a4e-744dee946f67 > ** 46a0233f-217e-9155-725b-c815ad77ba2c > ** 062150ba-bacd-6644-4801-3a51b260d1c5 > As you can see, the 10 jobs that are actually running do not possess the > locks and thus cannot make any progress. 
On the other hand, the 3 jobs > holding the locks cannot resume running because there are already 10 jobs > in running state, so they can never proceed and release the locks. This is a > deadlock and the cluster is completely stuck. > We have been observing this behavior in 3.0.0 (where rolling back > https://issues.apache.org/jira/browse/KYLIN-4165 resolved the issue), and now > in 3.1.0 as well. It was originally reported in the comments of > https://issues.apache.org/jira/browse/KYLIN-4348 but I'm not sure that it's > related to that bug/epic. -- This message was sent by Atlassian Jira (v8.3.4#803005)
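For context, the general way to avoid this class of deadlock is to never block on one resource while holding the other. The sketch below is a generic in-process analogy (plain `java.util.concurrent`, not Kylin's ZooKeeper-based locking, and all names are hypothetical): a job first takes a running slot non-blockingly, then tries the lock, and backs the slot out if the lock is busy, so neither resource is ever held while waiting on the other.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.locks.ReentrantLock;

// Generic deadlock-avoidance sketch (not Kylin code): acquire both the
// running slot and the cube lock with non-blocking tries, and roll back
// the slot if the lock is unavailable.
public class NoDeadlockSketch {
    static final Semaphore runningSlots = new Semaphore(10); // scheduler capacity
    static final ReentrantLock cubeLock = new ReentrantLock(); // one cube's lock

    static boolean tryStartJob() {
        if (!runningSlots.tryAcquire()) return false; // no slot: don't touch the lock
        if (!cubeLock.tryLock()) {                    // lock busy: give the slot back
            runningSlots.release();
            return false;
        }
        return true; // job now holds both resources and can run
    }

    static void finishJob() {
        cubeLock.unlock();
        runningSlots.release();
    }

    public static void main(String[] args) {
        System.out.println(tryStartJob()); // prints: true
        finishJob();
    }
}
```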
[jira] [Updated] (KYLIN-4689) Deadlock in Kylin job execution
[ https://issues.apache.org/jira/browse/KYLIN-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Arki updated KYLIN-4689: -- Description: h4. Reproduction steps * Install Kylin 3.1.0 * Deploy a streaming cube * Enable the cube having historical data present in the Kafka topic * Note: in our case, we had 3 cubes deployed, each consuming ~20-20 hourly segments from Kafka when the cubes were enabled h4. Expected result * Kylin is starting to process stream segments with stream jobs, eventually processing the older segments and catching up with the stream h4. Actual result * After a short time, all jobs are completely stuck without any progress. Some in running state, some in pending state. * The following logs are continously written: {code:java} 2020-08-06 06:16:22 INFO [Scheduler 116797841 Job 12750aea-3b96-c817-64e8-bf893d8c120f-254] MapReduceExecutable:409 - 12750aea-3b96-c817-64e8-bf893d8c120f-00, parent lock path(/cube_job_lock/cube_vm) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_vm is locked by other job result is true,will try after one minute 2020-08-06 06:16:33 WARN [FetcherRunner 787667774-43] FetcherRunner:56 - There are too many jobs running, Job Fetch will wait until next schedule time {code} * Zookeeper indicates the following locks are in place: {code:java} ls /kylin/kylin_metadata/cube_job_ephemeral_lock [cube_cm, cube_vm, cube_jm] ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_cm [] ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_vm [] ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_jm [] ls /kylin/kylin_metadata/cube_job_lock [cube_cm, cube_vm, cube_jm] ls /kylin/kylin_metadata/cube_job_lock/cube_cm [f888380e-9ff4-98f5-2df4-1ae71e045f93] ls /kylin/kylin_metadata/cube_job_lock/cube_vm [fc186bd9-1186-6ed4-e58c-bbbf6dd8ef74] ls /kylin/kylin_metadata/cube_job_lock/cube_jm [d1a6475a-9ab2-5ee4-6714-f395e20cfc01] {code} * The job IDs for the running jobs: * ** 
169f75fa-a02f-221b-fc48-037bc7a842d0 ** 0b5dae1b-6faf-66c5-71dc-86f5b820f1c4 ** 00924699-8b51-8091-6e71-34ccfeba3a98 ** 4620192a-71e1-16dd-3b05-44d7f9144ad4 ** 416355c2-a3d7-57eb-55c6-c042aa256510 ** 12750aea-3b96-c817-64e8-bf893d8c120f ** 42819dde-5857-fd6b-b075-439952f47140 ** 00128937-bd4a-d6c1-7a4e-744dee946f67 ** 46a0233f-217e-9155-725b-c815ad77ba2c ** 062150ba-bacd-6644-4801-3a51b260d1c5 As you can see, the 10 jobs that are actually running do not possess the locks thus cannot actually do anything. On the other hand, the 3 jobs possessing the locks cannot resume running because there are already 10 jobs in running state, thus cannot proceed and release the locks. This is a deadlock and the cluster is completely stuck. We have been observing this behavior in 3.0.0 (where rolling back https://issues.apache.org/jira/browse/KYLIN-4165 resolved the issue), and now in 3.1.0 as well. It has been originally reported in the comments of https://issues.apache.org/jira/browse/KYLIN-4348 but I'm not sure that it's related to that bug/epic. was: h4. Reproduction steps * Install Kylin 3.1.0 * Deploy a streaming cube * Enable the cube having historical data present in the Kafka topic * Note: in our case, we had 3 cubes deployed, each consuming ~20-20 hourly segments from Kafka when the cubes were enabled h4. Expected result * Kylin is starting to process stream segments with stream jobs, eventually processing the older segments and catching up with the stream h4. Actual result * After a short time, all jobs are completely stuck without any progress. Some in running state, some in pending state. 
* The following logs are continously written: {code:java} 2020-08-06 06:16:22 INFO [Scheduler 116797841 Job 12750aea-3b96-c817-64e8-bf893d8c120f-254] MapReduceExecutable:409 - 12750aea-3b96-c817-64e8-bf893d8c120f-00, parent lock path(/cube_job_lock/cube_vm) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_vm is locked by other job result is true,will try after one minute 2020-08-06 06:16:33 WARN [FetcherRunner 787667774-43] FetcherRunner:56 - There are too many jobs running, Job Fetch will wait until next schedule time {code} * Zookeeper indicates the following locks are in place: {code:java} ls /kylin/kylin_metadata/cube_job_ephemeral_lock [cube_cm, cube_vm, cube_jm] ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_cm [] ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_vm [] ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_jm [] ls /kylin/kylin_metadata/cube_job_lock [cube_cm, cube_vm, cube_jm] ls /kylin/kylin_metadata/cube_job_lock/cube_cm [f888380e-9ff4-98f5-2df4-1ae71e045f93] ls /kylin/kylin_metadata/cube_job_lock/cube_vm [fc186bd9-1186-6ed4-e58c-bbbf6dd8ef74] ls /kylin/kylin_metadata/cube_job_lock/cube_jm [d1a6475a-9ab2-5ee4-6714-f395e20cfc01] {code} * The job IDs for the running jobs: *
[jira] [Updated] (KYLIN-4689) Deadlock in Kylin job execution
[ https://issues.apache.org/jira/browse/KYLIN-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Arki updated KYLIN-4689: -- Affects Version/s: v3.0.0 v3.1.0 > Deadlock in Kylin job execution > --- > > Key: KYLIN-4689 > URL: https://issues.apache.org/jira/browse/KYLIN-4689 > Project: Kylin > Issue Type: Bug > Components: Job Engine >Affects Versions: v3.0.0, v3.1.0 >Reporter: Gabor Arki >Priority: Critical > > h4. Reproduction steps > * Install Kylin 3.1.0 > * Deploy a streaming cube > * Enable the cube having historical data present in the Kafka topic > * Note: in our case, we had 3 cubes deployed, each consuming ~20-20 hourly > segments from Kafka when the cubes were enabled > h4. Expected result > * Kylin is starting to process stream segments with stream jobs, eventually > processing the older segments and catching up with the stream > h4. Actual result > * After a short time, all jobs are completely stuck without any progress. > Some in running state, some in pending state. 
> * The following logs are continously written: > {code:java} > 2020-08-06 06:16:22 INFO [Scheduler 116797841 Job > 12750aea-3b96-c817-64e8-bf893d8c120f-254] MapReduceExecutable:409 - > 12750aea-3b96-c817-64e8-bf893d8c120f-00, parent lock > path(/cube_job_lock/cube_vm) is locked by other job result is true ,ephemeral > lock path :/cube_job_ephemeral_lock/cube_vm is locked by other job result is > true,will try after one minute > 2020-08-06 06:16:33 WARN [FetcherRunner 787667774-43] FetcherRunner:56 - > There are too many jobs running, Job Fetch will wait until next schedule time > {code} > * Zookeeper indicates the following locks are in place: > {code:java} > ls /kylin/kylin_metadata/cube_job_ephemeral_lock > [cube_cm, cube_vm, cube_jm] > ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_cm > [] > ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_vm > [] > ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_jm > [] > ls /kylin/kylin_metadata/cube_job_lock > [cube_cm, cube_vm, cube_jm] > ls /kylin/kylin_metadata/cube_job_lock/cube_cm > [f888380e-9ff4-98f5-2df4-1ae71e045f93] > ls /kylin/kylin_metadata/cube_job_lock/cube_vm > [fc186bd9-1186-6ed4-e58c-bbbf6dd8ef74] > ls /kylin/kylin_metadata/cube_job_lock/cube_jm > [d1a6475a-9ab2-5ee4-6714-f395e20cfc01] > {code} > * The job IDs for the running jobs: > * > ** 169f75fa-a02f-221b-fc48-037bc7a842d0 > ** 0b5dae1b-6faf-66c5-71dc-86f5b820f1c4 > ** 00924699-8b51-8091-6e71-34ccfeba3a98 > ** 4620192a-71e1-16dd-3b05-44d7f9144ad4 > ** 416355c2-a3d7-57eb-55c6-c042aa256510 > ** 12750aea-3b96-c817-64e8-bf893d8c120f > ** 42819dde-5857-fd6b-b075-439952f47140 > ** 00128937-bd4a-d6c1-7a4e-744dee946f67 > ** 46a0233f-217e-9155-725b-c815ad77ba2c > ** 062150ba-bacd-6644-4801-3a51b260d1c5 > As you can see, the 10 jobs that are actually running do not possess the > locks thus cannot actually do anything. 
On the other hand, the 3 jobs > possessing the locks cannot resume running because there are already 10 jobs > in running state, thus cannot proceed and release the locks. This is a > deadlock and the cluster is completely stuck. > We have been observing this behavior in 3.0.0 (where rolling back > https://issues.apache.org/jira/browse/KYLIN-4165 resolved the issue), and now > in 3.1.0 as well. It has been originally reported in the comments of > https://issues.apache.org/jira/browse/KYLIN-4348. -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Updated] (KYLIN-4689) Deadlock in Kylin job execution
[ https://issues.apache.org/jira/browse/KYLIN-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Arki updated KYLIN-4689: -- Description: h4. Reproduction steps * Install Kylin 3.1.0 * Deploy a streaming cube * Enable the cube having historical data present in the Kafka topic * Note: in our case, we had 3 cubes deployed, each consuming ~20-20 hourly segments from Kafka when the cubes were enabled h4. Expected result * Kylin is starting to process stream segments with stream jobs, eventually processing the older segments and catching up with the stream h4. Actual result * After a short time, all jobs are completely stuck without any progress. Some in running state, some in pending state. * The following logs are continously written: {code:java} 2020-08-06 06:16:22 INFO [Scheduler 116797841 Job 12750aea-3b96-c817-64e8-bf893d8c120f-254] MapReduceExecutable:409 - 12750aea-3b96-c817-64e8-bf893d8c120f-00, parent lock path(/cube_job_lock/cube_vm) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_vm is locked by other job result is true,will try after one minute 2020-08-06 06:16:33 WARN [FetcherRunner 787667774-43] FetcherRunner:56 - There are too many jobs running, Job Fetch will wait until next schedule time {code} * Zookeeper indicates the following locks are in place: {code:java} ls /kylin/kylin_metadata/cube_job_ephemeral_lock [cube_cm, cube_vm, cube_jm] ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_cm [] ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_vm [] ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_jm [] ls /kylin/kylin_metadata/cube_job_lock [cube_cm, cube_vm, cube_jm] ls /kylin/kylin_metadata/cube_job_lock/cube_cm [f888380e-9ff4-98f5-2df4-1ae71e045f93] ls /kylin/kylin_metadata/cube_job_lock/cube_vm [fc186bd9-1186-6ed4-e58c-bbbf6dd8ef74] ls /kylin/kylin_metadata/cube_job_lock/cube_jm [d1a6475a-9ab2-5ee4-6714-f395e20cfc01] {code} * The job IDs for the running jobs: * ** 
169f75fa-a02f-221b-fc48-037bc7a842d0 ** 0b5dae1b-6faf-66c5-71dc-86f5b820f1c4 ** 00924699-8b51-8091-6e71-34ccfeba3a98 ** 4620192a-71e1-16dd-3b05-44d7f9144ad4 ** 416355c2-a3d7-57eb-55c6-c042aa256510 ** 12750aea-3b96-c817-64e8-bf893d8c120f ** 42819dde-5857-fd6b-b075-439952f47140 ** 00128937-bd4a-d6c1-7a4e-744dee946f67 ** 46a0233f-217e-9155-725b-c815ad77ba2c ** 062150ba-bacd-6644-4801-3a51b260d1c5 As you can see, the 10 jobs that are actually running do not possess the locks thus cannot actually do anything. On the other hand, the 3 jobs possessing the locks cannot resume running because there are already 10 jobs in running state, thus cannot proceed and release the locks. This is a deadlock that completely stuck the cluster. We have been observing this behavior in 3.0.0 (where rolling back https://issues.apache.org/jira/browse/KYLIN-4165 resolved the issue), and now in 3.1.0 as well. It has been originally reported in the comments of https://issues.apache.org/jira/browse/KYLIN-4348. was: h4. Reproduction steps * Install Kylin 3.1.0 * Deploy a streaming cube * Enable the cube having historical data present in the Kafka topic * Note: in our case, we had 3 cubes deployed, each consuming ~20-20 hourly segments from Kafka when the cubes were enabled h4. Expected result * Kylin is starting to process stream segments with stream jobs, eventually processing the older segments and catching up with the stream h4. Actual result * After a short time, all jobs are completely stuck without any progress. Some in running state, some in pending state. 
* The following logs are continously written: {code:java} 2020-08-06 06:16:22 INFO [Scheduler 116797841 Job 12750aea-3b96-c817-64e8-bf893d8c120f-254] MapReduceExecutable:409 - 12750aea-3b96-c817-64e8-bf893d8c120f-00, parent lock path(/cube_job_lock/cube_vm) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_vm is locked by other job result is true,will try after one minute 2020-08-06 06:16:33 WARN [FetcherRunner 787667774-43] FetcherRunner:56 - There are too many jobs running, Job Fetch will wait until next schedule time {code} * Zookeeper indicates the following locks are in place: {code:java} ls /kylin/kylin_metadata/cube_job_ephemeral_lock [cube_cm, cube_vm, cube_jm] ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_cm [] ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_vm [] ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_jm [] ls /kylin/kylin_metadata/cube_job_lock [cube_cm, cube_vm, cube_jm] ls /kylin/kylin_metadata/cube_job_lock/cube_cm [f888380e-9ff4-98f5-2df4-1ae71e045f93] ls /kylin/kylin_metadata/cube_job_lock/cube_vm [fc186bd9-1186-6ed4-e58c-bbbf6dd8ef74] ls /kylin/kylin_metadata/cube_job_lock/cube_jm [d1a6475a-9ab2-5ee4-6714-f395e20cfc01] {code} * The job IDs for the running jobs: * ** 169f75fa-a02f-221b-fc48-037bc7a842d0 **
[jira] [Updated] (KYLIN-4689) Deadlock in Kylin job execution
[ https://issues.apache.org/jira/browse/KYLIN-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Arki updated KYLIN-4689: -- Description: h4. Reproduction steps * Install Kylin 3.1.0 * Deploy a streaming cube * Enable the cube having historical data present in the Kafka topic * Note: in our case, we had 3 cubes deployed, each consuming ~20-20 hourly segments from Kafka when the cubes were enabled h4. Expected result * Kylin is starting to process stream segments with stream jobs, eventually processing the older segments and catching up with the stream h4. Actual result * After a short time, all jobs are completely stuck without any progress. Some in running state, some in pending state. * The following logs are continously written: {code:java} 2020-08-06 06:16:22 INFO [Scheduler 116797841 Job 12750aea-3b96-c817-64e8-bf893d8c120f-254] MapReduceExecutable:409 - 12750aea-3b96-c817-64e8-bf893d8c120f-00, parent lock path(/cube_job_lock/cube_vm) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_vm is locked by other job result is true,will try after one minute 2020-08-06 06:16:33 WARN [FetcherRunner 787667774-43] FetcherRunner:56 - There are too many jobs running, Job Fetch will wait until next schedule time {code} * Zookeeper indicates the following locks are in place: {code:java} ls /kylin/kylin_metadata/cube_job_ephemeral_lock [cube_cm, cube_vm, cube_jm] ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_cm [] ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_vm [] ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_jm [] ls /kylin/kylin_metadata/cube_job_lock [cube_cm, cube_vm, cube_jm] ls /kylin/kylin_metadata/cube_job_lock/cube_cm [f888380e-9ff4-98f5-2df4-1ae71e045f93] ls /kylin/kylin_metadata/cube_job_lock/cube_vm [fc186bd9-1186-6ed4-e58c-bbbf6dd8ef74] ls /kylin/kylin_metadata/cube_job_lock/cube_jm [d1a6475a-9ab2-5ee4-6714-f395e20cfc01] {code} * The job IDs for the running jobs: * ** 
169f75fa-a02f-221b-fc48-037bc7a842d0 ** 0b5dae1b-6faf-66c5-71dc-86f5b820f1c4 ** 00924699-8b51-8091-6e71-34ccfeba3a98 ** 4620192a-71e1-16dd-3b05-44d7f9144ad4 ** 416355c2-a3d7-57eb-55c6-c042aa256510 ** 12750aea-3b96-c817-64e8-bf893d8c120f ** 42819dde-5857-fd6b-b075-439952f47140 ** 00128937-bd4a-d6c1-7a4e-744dee946f67 ** 46a0233f-217e-9155-725b-c815ad77ba2c ** 062150ba-bacd-6644-4801-3a51b260d1c5 As you can see, the 10 jobs that are actually running do not possess the locks thus cannot actually do anything. On the other hand, the 3 jobs possessing the locks cannot resume running because there are already 10 jobs in running state, thus cannot proceed and release the locks. This is a deadlock the cluster is completely stuck. We have been observing this behavior in 3.0.0 (where rolling back https://issues.apache.org/jira/browse/KYLIN-4165 resolved the issue), and now in 3.1.0 as well. It has been originally reported in the comments of https://issues.apache.org/jira/browse/KYLIN-4348. was: h4. Reproduction steps * Install Kylin 3.1.0 * Deploy a streaming cube * Enable the cube having historical data present in the Kafka topic * Note: in our case, we had 3 cubes deployed, each consuming ~20-20 hourly segments from Kafka when the cubes were enabled h4. Expected result * Kylin is starting to process stream segments with stream jobs, eventually processing the older segments and catching up with the stream h4. Actual result * After a short time, all jobs are completely stuck without any progress. Some in running state, some in pending state. 
* The following logs are continously written: {code:java} 2020-08-06 06:16:22 INFO [Scheduler 116797841 Job 12750aea-3b96-c817-64e8-bf893d8c120f-254] MapReduceExecutable:409 - 12750aea-3b96-c817-64e8-bf893d8c120f-00, parent lock path(/cube_job_lock/cube_vm) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_vm is locked by other job result is true,will try after one minute 2020-08-06 06:16:33 WARN [FetcherRunner 787667774-43] FetcherRunner:56 - There are too many jobs running, Job Fetch will wait until next schedule time {code} * Zookeeper indicates the following locks are in place: {code:java} ls /kylin/kylin_metadata/cube_job_ephemeral_lock [cube_cm, cube_vm, cube_jm] ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_cm [] ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_vm [] ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_jm [] ls /kylin/kylin_metadata/cube_job_lock [cube_cm, cube_vm, cube_jm] ls /kylin/kylin_metadata/cube_job_lock/cube_cm [f888380e-9ff4-98f5-2df4-1ae71e045f93] ls /kylin/kylin_metadata/cube_job_lock/cube_vm [fc186bd9-1186-6ed4-e58c-bbbf6dd8ef74] ls /kylin/kylin_metadata/cube_job_lock/cube_jm [d1a6475a-9ab2-5ee4-6714-f395e20cfc01] {code} * The job IDs for the running jobs: * ** 169f75fa-a02f-221b-fc48-037bc7a842d0 **
[jira] [Updated] (KYLIN-4689) Deadlock in Kylin job execution
[ https://issues.apache.org/jira/browse/KYLIN-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Arki updated KYLIN-4689: -- Description: h4. Reproduction steps * Install Kylin 3.1.0 * Deploy a streaming cube * Enable the cube having historical data present in the Kafka topic * Note: in our case, we had 3 cubes deployed, each consuming ~20-20 hourly segments from Kafka when the cubes were enabled h4. Expected result * Kylin is starting to process stream segments with stream jobs, eventually processing the older segments and catching up with the stream h4. Actual result * After a short time, all jobs are completely stuck without any progress. Some in running state, some in pending state. * The following logs are continously written: {code:java} 2020-08-06 06:16:22 INFO [Scheduler 116797841 Job 12750aea-3b96-c817-64e8-bf893d8c120f-254] MapReduceExecutable:409 - 12750aea-3b96-c817-64e8-bf893d8c120f-00, parent lock path(/cube_job_lock/cube_vm) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_vm is locked by other job result is true,will try after one minute 2020-08-06 06:16:33 WARN [FetcherRunner 787667774-43] FetcherRunner:56 - There are too many jobs running, Job Fetch will wait until next schedule time {code} * Zookeeper indicates the following locks are in place: {code:java} ls /kylin/kylin_metadata/cube_job_ephemeral_lock [cube_cm, cube_vm, cube_jm] ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_cm [] ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_vm [] ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_jm [] ls /kylin/kylin_metadata/cube_job_lock [cube_cm, cube_vm, cube_jm] ls /kylin/kylin_metadata/cube_job_lock/cube_cm [f888380e-9ff4-98f5-2df4-1ae71e045f93] ls /kylin/kylin_metadata/cube_job_lock/cube_vm [fc186bd9-1186-6ed4-e58c-bbbf6dd8ef74] ls /kylin/kylin_metadata/cube_job_lock/cube_jm [d1a6475a-9ab2-5ee4-6714-f395e20cfc01] {code} * The job IDs for the running jobs: * ** 
169f75fa-a02f-221b-fc48-037bc7a842d0 ** 0b5dae1b-6faf-66c5-71dc-86f5b820f1c4 ** 00924699-8b51-8091-6e71-34ccfeba3a98 ** 4620192a-71e1-16dd-3b05-44d7f9144ad4 ** 416355c2-a3d7-57eb-55c6-c042aa256510 ** 12750aea-3b96-c817-64e8-bf893d8c120f ** 42819dde-5857-fd6b-b075-439952f47140 ** 00128937-bd4a-d6c1-7a4e-744dee946f67 ** 46a0233f-217e-9155-725b-c815ad77ba2c ** 062150ba-bacd-6644-4801-3a51b260d1c5 As you can see, the 10 jobs that are actually running do not possess the locks thus cannot actually do anything. On the other hand, the 3 jobs possessing the locks cannot resume running because there are already 10 jobs in running state, thus cannot proceed and release the locks. This is a deadlock and the cluster is completely stuck. We have been observing this behavior in 3.0.0 (where rolling back https://issues.apache.org/jira/browse/KYLIN-4165 resolved the issue), and now in 3.1.0 as well. It has been originally reported in the comments of https://issues.apache.org/jira/browse/KYLIN-4348. was: h4. Reproduction steps * Install Kylin 3.1.0 * Deploy a streaming cube * Enable the cube having historical data present in the Kafka topic * Note: in our case, we had 3 cubes deployed, each consuming ~20-20 hourly segments from Kafka when the cubes were enabled h4. Expected result * Kylin is starting to process stream segments with stream jobs, eventually processing the older segments and catching up with the stream h4. Actual result * After a short time, all jobs are completely stuck without any progress. Some in running state, some in pending state. 
* The following logs are continously written: {code:java} 2020-08-06 06:16:22 INFO [Scheduler 116797841 Job 12750aea-3b96-c817-64e8-bf893d8c120f-254] MapReduceExecutable:409 - 12750aea-3b96-c817-64e8-bf893d8c120f-00, parent lock path(/cube_job_lock/cube_vm) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_vm is locked by other job result is true,will try after one minute 2020-08-06 06:16:33 WARN [FetcherRunner 787667774-43] FetcherRunner:56 - There are too many jobs running, Job Fetch will wait until next schedule time {code} * Zookeeper indicates the following locks are in place: {code:java} ls /kylin/kylin_metadata/cube_job_ephemeral_lock [cube_cm, cube_vm, cube_jm] ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_cm [] ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_vm [] ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_jm [] ls /kylin/kylin_metadata/cube_job_lock [cube_cm, cube_vm, cube_jm] ls /kylin/kylin_metadata/cube_job_lock/cube_cm [f888380e-9ff4-98f5-2df4-1ae71e045f93] ls /kylin/kylin_metadata/cube_job_lock/cube_vm [fc186bd9-1186-6ed4-e58c-bbbf6dd8ef74] ls /kylin/kylin_metadata/cube_job_lock/cube_jm [d1a6475a-9ab2-5ee4-6714-f395e20cfc01] {code} * The job IDs for the running jobs: * ** 169f75fa-a02f-221b-fc48-037bc7a842d0 **
[jira] [Updated] (KYLIN-4689) Deadlock in Kylin job execution
[ https://issues.apache.org/jira/browse/KYLIN-4689?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Arki updated KYLIN-4689:
--
Description:
h4. Reproduction steps
* Install Kylin 3.1.0
* Deploy a streaming cube
* Enable the cube having historical data present in the Kafka topic
* Note: in our case, we had 3 cubes deployed, each consuming ~20 hourly segments from Kafka when the cubes were enabled

h4. Expected result
* Kylin starts to process stream segments with stream jobs, eventually processing the older segments and catching up with the stream

h4. Actual result
* After a short time, all jobs are completely stuck without any progress. Some are in running state, some in pending state.
* The following logs are continuously written:
{code:java}
2020-08-06 06:16:22 INFO [Scheduler 116797841 Job 12750aea-3b96-c817-64e8-bf893d8c120f-254] MapReduceExecutable:409 - 12750aea-3b96-c817-64e8-bf893d8c120f-00, parent lock path(/cube_job_lock/cube_vm) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_vm is locked by other job result is true,will try after one minute
2020-08-06 06:16:33 WARN [FetcherRunner 787667774-43] FetcherRunner:56 - There are too many jobs running, Job Fetch will wait until next schedule time
{code}
* Zookeeper indicates the following locks are in place:
{code:java}
ls /kylin/kylin_metadata/cube_job_ephemeral_lock
[cube_cm, cube_vm, cube_jm]
ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_cm
[]
ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_vm
[]
ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_jm
[]
ls /kylin/kylin_metadata/cube_job_lock
[cube_cm, cube_vm, cube_jm]
ls /kylin/kylin_metadata/cube_job_lock/cube_cm
[f888380e-9ff4-98f5-2df4-1ae71e045f93]
ls /kylin/kylin_metadata/cube_job_lock/cube_vm
[fc186bd9-1186-6ed4-e58c-bbbf6dd8ef74]
ls /kylin/kylin_metadata/cube_job_lock/cube_jm
[d1a6475a-9ab2-5ee4-6714-f395e20cfc01]
{code}
* The job IDs for the 10 running jobs, none of which show progress:
** 169f75fa-a02f-221b-fc48-037bc7a842d0
** 0b5dae1b-6faf-66c5-71dc-86f5b820f1c4
** 00924699-8b51-8091-6e71-34ccfeba3a98
** 4620192a-71e1-16dd-3b05-44d7f9144ad4
** 416355c2-a3d7-57eb-55c6-c042aa256510
** 12750aea-3b96-c817-64e8-bf893d8c120f
** 42819dde-5857-fd6b-b075-439952f47140
** 00128937-bd4a-d6c1-7a4e-744dee946f67
** 46a0233f-217e-9155-725b-c815ad77ba2c
** 062150ba-bacd-6644-4801-3a51b260d1c5

As you can see, the 10 jobs that are actually running do not possess the locks and thus cannot actually do anything. On the other hand, the 3 jobs possessing the locks are not running and thus cannot proceed and release them. This is a deadlock that has completely stalled the cluster.

We have been observing this behavior in 3.0.0 (where rolling back https://issues.apache.org/jira/browse/KYLIN-4165 resolved the issue), and now in 3.1.0 as well. It was originally reported in the comments of https://issues.apache.org/jira/browse/KYLIN-4348.
[jira] [Created] (KYLIN-4689) Deadlock in Kylin job execution
Gabor Arki created KYLIN-4689:
-
Summary: Deadlock in Kylin job execution
Key: KYLIN-4689
URL: https://issues.apache.org/jira/browse/KYLIN-4689
Project: Kylin
Issue Type: Bug
Components: Job Engine
Reporter: Gabor Arki

h4. Reproduction steps
* Install Kylin 3.1.0
* Deploy a streaming cube
* Enable the cube having historical data present in the Kafka topic
* Note: in our case, we had 3 cubes deployed, each consuming ~20 hourly segments from Kafka when the cubes were enabled

h4. Expected result
* Kylin starts to process stream segments with stream jobs, eventually processing the older segments and catching up with the stream

h4. Actual result
* After a short time, all jobs are completely stuck without any progress. Some are in running state, some in pending state.
* The following logs are continuously written:
{code:java}
2020-08-06 06:16:22 INFO [Scheduler 116797841 Job 12750aea-3b96-c817-64e8-bf893d8c120f-254] MapReduceExecutable:409 - 12750aea-3b96-c817-64e8-bf893d8c120f-00, parent lock path(/cube_job_lock/cube_vm) is locked by other job result is true ,ephemeral lock path :/cube_job_ephemeral_lock/cube_vm is locked by other job result is true,will try after one minute
2020-08-06 06:16:33 WARN [FetcherRunner 787667774-43] FetcherRunner:56 - There are too many jobs running, Job Fetch will wait until next schedule time
{code}
* Zookeeper indicates the following locks are in place:
{code:java}
ls /kylin/kylin_metadata/cube_job_ephemeral_lock
[cube_cm, cube_vm, cube_jm]
ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_cm
[]
ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_vm
[]
ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_jm
[]
ls /kylin/kylin_metadata/cube_job_lock
[cube_cm, cube_vm, cube_jm]
ls /kylin/kylin_metadata/cube_job_lock/cube_cm
[f888380e-9ff4-98f5-2df4-1ae71e045f93]
ls /kylin/kylin_metadata/cube_job_lock/cube_vm
[fc186bd9-1186-6ed4-e58c-bbbf6dd8ef74]
ls /kylin/kylin_metadata/cube_job_lock/cube_jm
[d1a6475a-9ab2-5ee4-6714-f395e20cfc01]
{code}
* The job IDs for the 10 running jobs, none of which show progress:
** 169f75fa-a02f-221b-fc48-037bc7a842d0
** 0b5dae1b-6faf-66c5-71dc-86f5b820f1c4
** 00924699-8b51-8091-6e71-34ccfeba3a98
** 4620192a-71e1-16dd-3b05-44d7f9144ad4
** 416355c2-a3d7-57eb-55c6-c042aa256510
** 12750aea-3b96-c817-64e8-bf893d8c120f
** 42819dde-5857-fd6b-b075-439952f47140
** 00128937-bd4a-d6c1-7a4e-744dee946f67
** 46a0233f-217e-9155-725b-c815ad77ba2c
** 062150ba-bacd-6644-4801-3a51b260d1c5

As you can see, the 10 jobs that are actually running do not possess the locks and thus cannot actually do anything. On the other hand, the 3 jobs possessing the locks are not running and thus cannot proceed and release them. This is a deadlock that has completely stalled the cluster.

We have been observing this behavior in 3.0.0 (where rolling back https://issues.apache.org/jira/browse/KYLIN-4165 resolved the issue), and now in 3.1.0 as well. It was originally reported in the comments of https://issues.apache.org/jira/browse/KYLIN-4348.

-- This message was sent by Atlassian Jira (v8.3.4#803005)
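The deadlock condition described above can be verified mechanically: if the set of job IDs holding the ZooKeeper lock znodes is disjoint from the set of jobs in RUNNING state, no lock can ever be released. A minimal sketch using the IDs quoted in this report (the list-comparison script itself is illustrative, not part of Kylin):

```shell
# Job IDs holding /cube_job_lock/* znodes (from the zkCli listing above)
holders="f888380e-9ff4-98f5-2df4-1ae71e045f93
fc186bd9-1186-6ed4-e58c-bbbf6dd8ef74
d1a6475a-9ab2-5ee4-6714-f395e20cfc01"

# Job IDs reported as RUNNING (from the job list above)
running="169f75fa-a02f-221b-fc48-037bc7a842d0
0b5dae1b-6faf-66c5-71dc-86f5b820f1c4
00924699-8b51-8091-6e71-34ccfeba3a98
4620192a-71e1-16dd-3b05-44d7f9144ad4
416355c2-a3d7-57eb-55c6-c042aa256510
12750aea-3b96-c817-64e8-bf893d8c120f
42819dde-5857-fd6b-b075-439952f47140
00128937-bd4a-d6c1-7a4e-744dee946f67
46a0233f-217e-9155-725b-c815ad77ba2c
062150ba-bacd-6644-4801-3a51b260d1c5"

# `uniq -d` keeps only lines that appear in both (sorted) lists,
# i.e. the intersection; an empty intersection means no lock holder
# is actually running, so the locks can never be released.
active_holders=$(printf '%s\n%s\n' "$holders" "$running" | sort | uniq -d)
if [ -z "$active_holders" ]; then
  echo "DEADLOCK: no lock-holding job is in RUNNING state"
fi
```

With the IDs above this prints the deadlock message, since the three lock holders (all pending) do not appear among the ten running jobs.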
[jira] [Commented] (KYLIN-4348) Fix distributed concurrency lock bug
[ https://issues.apache.org/jira/browse/KYLIN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17172255#comment-17172255 ] Gabor Arki commented on KYLIN-4348: --- Not sure if it is actually related to this ticket/epic, so created a separate bug: https://issues.apache.org/jira/browse/KYLIN-4689 > Fix distributed concurrency lock bug > > > Key: KYLIN-4348 > URL: https://issues.apache.org/jira/browse/KYLIN-4348 > Project: Kylin > Issue Type: Sub-task >Reporter: wangxiaojing >Assignee: wangxiaojing >Priority: Major > Fix For: v3.1.0 > > Attachments: image-2020-02-03-10-54-21-976.png, > image-2020-02-03-10-54-53-468.png > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KYLIN-4348) Fix distributed concurrency lock bug
[ https://issues.apache.org/jira/browse/KYLIN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17172237#comment-17172237 ] Gabor Arki commented on KYLIN-4348:
---
Altogether we have 10 running jobs in the cluster which show no progress:
* 169f75fa-a02f-221b-fc48-037bc7a842d0
* 0b5dae1b-6faf-66c5-71dc-86f5b820f1c4
* 00924699-8b51-8091-6e71-34ccfeba3a98
* 4620192a-71e1-16dd-3b05-44d7f9144ad4
* 416355c2-a3d7-57eb-55c6-c042aa256510
* 12750aea-3b96-c817-64e8-bf893d8c120f
* 42819dde-5857-fd6b-b075-439952f47140
* 00128937-bd4a-d6c1-7a4e-744dee946f67
* 46a0233f-217e-9155-725b-c815ad77ba2c
* 062150ba-bacd-6644-4801-3a51b260d1c5

However, the ones possessing the locks are all pending:
* f888380e-9ff4-98f5-2df4-1ae71e045f93
* fc186bd9-1186-6ed4-e58c-bbbf6dd8ef74
* d1a6475a-9ab2-5ee4-6714-f395e20cfc01

So, essentially the jobs that are running cannot actually run because they are unable to acquire a lock. However, the ones that possess the lock cannot continue because there are already 10 running jobs. This seems to be a deadlock to me.

> Fix distributed concurrency lock bug > > > Key: KYLIN-4348 > URL: https://issues.apache.org/jira/browse/KYLIN-4348 > Project: Kylin > Issue Type: Sub-task >Reporter: wangxiaojing >Assignee: wangxiaojing >Priority: Major > Fix For: v3.1.0 > > Attachments: image-2020-02-03-10-54-21-976.png, > image-2020-02-03-10-54-53-468.png > > -- This message was sent by Atlassian Jira (v8.3.4#803005)
[jira] [Commented] (KYLIN-4348) Fix distributed concurrency lock bug
[ https://issues.apache.org/jira/browse/KYLIN-4348?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17172216#comment-17172216 ] Gabor Arki commented on KYLIN-4348:
---
Zookeeper:
{code:java}
ls /kylin/kylin_metadata/cube_job_ephemeral_lock
[cube_cm, cube_vm, cube_jm]
ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_cm
[]
ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_vm
[]
ls /kylin/kylin_metadata/cube_job_ephemeral_lock/cube_jm
[]
ls /kylin/kylin_metadata/cube_job_lock
[cube_cm, cube_vm, cube_jm]
ls /kylin/kylin_metadata/cube_job_lock/cube_cm
[f888380e-9ff4-98f5-2df4-1ae71e045f93]
ls /kylin/kylin_metadata/cube_job_lock/cube_vm
[fc186bd9-1186-6ed4-e58c-bbbf6dd8ef74]
ls /kylin/kylin_metadata/cube_job_lock/cube_jm
[d1a6475a-9ab2-5ee4-6714-f395e20cfc01]
{code}
State of the {{f888380e-9ff4-98f5-2df4-1ae71e045f93}} job:
Build Dimension Dictionaries For Steaming Job: FINISHED
Save Cube Dictionaries: PENDING
Overall job status: PENDING
Last logs:
{code:java}
2020-08-05 22:44:44 INFO [Scheduler 116797841 Job f888380e-9ff4-98f5-2df4-1ae71e045f93-354] ExecutableManager:479 - job id:f888380e-9ff4-98f5-2df4-1ae71e045f93-00 from RUNNING to SUCCEED
2020-08-05 22:44:44 INFO [Scheduler 116797841 Job f888380e-9ff4-98f5-2df4-1ae71e045f93-354] ExecutableManager:479 - job id:f888380e-9ff4-98f5-2df4-1ae71e045f93 from RUNNING to READY
{code}
Since then, I can only see the following for this job ID:
{code:java}
2020-08-05 23:18:01 INFO [streaming_job_submitter-thread-1] BuildJobSubmitter:287 - No left quota to build segments for cube:cube_cm at 0
2020-08-05 23:18:01 INFO [streaming_job_submitter-thread-1] BuildJobSubmitter:462 - Job:b730dd18-173c-53d9-250b-ab9fb30a83b8 is in running, job state: READY.
2020-08-05 23:18:01 INFO [streaming_job_submitter-thread-1] BuildJobSubmitter:287 - No left quota to build segments for cube:cube_cm at 0
2020-08-05 23:18:01 INFO [streaming_job_submitter-thread-1] BuildJobSubmitter:462 - Job:46a0233f-217e-9155-725b-c815ad77ba2c is in running, job state: RUNNING.
2020-08-05 23:18:01 INFO [streaming_job_submitter-thread-1] BuildJobSubmitter:287 - No left quota to build segments for cube:cube_cm at 0
2020-08-05 23:18:01 INFO [streaming_job_submitter-thread-1] BuildJobSubmitter:287 - No left quota to build segments for cube:cube_cm at 0
2020-08-05 23:18:01 INFO [streaming_job_submitter-thread-1] BuildJobSubmitter:462 - Job:f888380e-9ff4-98f5-2df4-1ae71e045f93 is in running, job state: READY.
2020-08-05 23:18:01 INFO [streaming_job_submitter-thread-1] BuildJobSubmitter:287 - No left quota to build segments for cube:cube_cm at 0
2020-08-05 23:18:01 INFO [streaming_job_submitter-thread-1] BuildJobSubmitter:462 - Job:70a52f13-e401-4f2f-8a33-b35b5ef955c4 is in running, job state: READY.
2020-08-05 23:18:01 INFO [streaming_job_submitter-thread-1] BuildJobSubmitter:287 - No left quota to build segments for cube:cube_cm at 0
2020-08-05 23:18:01 INFO [streaming_job_submitter-thread-1] BuildJobSubmitter:462 - Job:4620192a-71e1-16dd-3b05-44d7f9144ad4 is in running, job state: RUNNING.
2020-08-05 23:18:01 INFO [streaming_job_submitter-thread-1] BuildJobSubmitter:287 - No left quota to build segments for cube:cube_cm at 0
2020-08-05 23:18:01 INFO [streaming_job_submitter-thread-1] BuildJobSubmitter:462 - Job:00924699-8b51-8091-6e71-34ccfeba3a98 is in running, job state: RUNNING.
2020-08-05 23:18:01 INFO [streaming_job_submitter-thread-1] BuildJobSubmitter:287 - No left quota to build segments for cube:cube_cm at 0
2020-08-05 23:18:01 INFO [streaming_job_submitter-thread-1] BuildJobSubmitter:462 - Job:169f75fa-a02f-221b-fc48-037bc7a842d0 is in running, job state: RUNNING.
2020-08-05 23:18:01 INFO [streaming_job_submitter-thread-1] BuildJobSubmitter:287 - No left quota to build segments for cube:cube_cm at 0
2020-08-05 23:18:01 INFO [streaming_job_submitter-thread-1] BuildJobSubmitter:462 - Job:456ebd63-9202-f142-eee6-2156846e5c11 is in running, job state: READY.
2020-08-05 23:18:01 INFO [streaming_job_submitter-thread-1] BuildJobSubmitter:287 - No left quota to build segments for cube:cube_cm at 0
2020-08-05 23:18:01 INFO [streaming_job_submitter-thread-1] BuildJobSubmitter:462 - Job:522b8b86-5f89-cffb-3423-cca8d6908613 is in running, job state: READY.
2020-08-05 23:18:01 INFO [streaming_job_submitter-thread-1] BuildJobSubmitter:287 - No left quota to build segments for cube:cube_cm at 0
2020-08-05 23:18:01 INFO [streaming_job_submitter-thread-1] BuildJobSubmitter:287 - No left quota to build segments for cube:cube_cm at 0
{code}
The rest of the jobs for this cube are:
* pending on Calculate Statistics from Base Cuboid (one, {{b730dd18-173c-53d9-250b-ab9fb30a83b8}})
* pending on Build Dimension Dictionaries For Steaming Job
* running, but stuck on Build Dimension Dictionaries For Steaming Job (four, as in the list above)

The other 2 cubes we have are similarly stuck.
[jira] [Commented] (KYLIN-4500) Timeout waiting for connection from pool
[ https://issues.apache.org/jira/browse/KYLIN-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17170903#comment-17170903 ] Gabor Arki commented on KYLIN-4500:
---
The issue has happened again today on one of our test environments. I checked the open connections, and the count was over 10,000:
{code:java}
[hadoop@ip-24-0-1-221 ~]$ netstat -anp | grep 21053 | grep CLOSE_WAIT | wc -l
10007
{code}
Based on this, it seems highly likely that https://issues.apache.org/jira/browse/KYLIN-4396 is causing the issue; it just manifests with a different error in our case. I will be monitoring the 3.1.0 version once we make the upgrade and will provide an update.

> Timeout waiting for connection from pool > > > Key: KYLIN-4500 > URL: https://issues.apache.org/jira/browse/KYLIN-4500 > Project: Kylin > Issue Type: Bug >Reporter: Gabor Arki >Priority: Major > Attachments: kylin-connection-timeout.txt > > > h4. Environment > * Kylin server 3.0.0 > * EMR 5.28 > h4. Issue > After an extended uptime, both Kylin query server and jobs running on EMR > stop working. The root cause in both cases is: > {noformat} > Caused by: java.io.IOException: > com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable > to execute HTTP request: Timeout waiting for connection from pool > at > com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.getFileStatus(S3NativeFileSystem2.java:257) > ~[emrfs-hadoop-assembly-2.37.0.jar:?]{noformat} > Based on > [https://aws.amazon.com/premiumsupport/knowledge-center/emr-timeout-connection-wait/] > increasing the fs.s3.maxConnections setting to 1 is just delaying the > issue thus the underlying issue is likely a connection leak. It also > indicates a leak that restarting the kylin service solves the problem. > A full stack trace from the QueryService is attached. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
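Leaks of this kind can be spotted before the pool is exhausted by tracking the CLOSE_WAIT count from the same netstat pipeline shown above. A small sketch that counts CLOSE_WAIT sockets in netstat-style output (the sample lines below are fabricated for illustration; in production the input would come from `netstat -anp` on the Kylin host):

```shell
# Fabricated netstat-style sample; real input would be: netstat -anp | grep <port>
sample='tcp 0 0 10.0.1.221:21053 52.1.2.3:443 CLOSE_WAIT 1234/java
tcp 0 0 10.0.1.221:21053 52.1.2.4:443 CLOSE_WAIT 1234/java
tcp 0 0 10.0.1.221:21053 52.1.2.5:443 ESTABLISHED 1234/java'

# A steadily growing CLOSE_WAIT count (10007 in the report above) means the
# application is holding sockets the peer has already closed, i.e. the
# connections are opened but never returned to the pool.
close_wait=$(printf '%s\n' "$sample" | grep -c CLOSE_WAIT)
echo "CLOSE_WAIT sockets: $close_wait"
```

Periodically sampling this number (e.g. from cron) shows whether the count grows monotonically, which distinguishes a genuine leak from a transient burst.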
[jira] [Comment Edited] (KYLIN-4656) Guava classpath conflict caused by kylin-jdbc 3.1.0 jar
[ https://issues.apache.org/jira/browse/KYLIN-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163407#comment-17163407 ] Gabor Arki edited comment on KYLIN-4656 at 7/23/20, 10:24 AM:
--
[~zhangyaqian] because you are excluding it from the dependencies of the released pom, but not from the shading itself.
* Instead of providing an exclude list and shading everything else into the jdbc jar:
{noformat}
org.slf4j:jcl-over-slf4j:*
{noformat}
you should define an include list and shade only what you actually intend to shade:
{noformat}
org.apache.kylin:kylin-shaded-guava
org.apache.calcite.avatica:avatica-core
org.apache.calcite.avatica:avatica-metrics
com.fasterxml.jackson.core:jackson-annotations
com.fasterxml.jackson.core:jackson-core
com.fasterxml.jackson.core:jackson-databind
com.google.protobuf:protobuf-java
org.apache.httpcomponents:httpclient
org.apache.httpcomponents:httpcore
commons-codec:commons-codec
commons-logging:commons-logging
{noformat}
This set is the one that seems to be included in 3.0.2; I am not sure whether everything in it is actually needed.
* Also, the relocation in the kylin-shaded-guava module is not complete:
{noformat}
com.google.common
${shadeBase}.com.google.common
{noformat}
It relocates only the {{com.google.common}} package of guava but keeps {{com.google.thirdparty}} as is, again allowing classpath conflicts. It should be:
{noformat}
com.google
${shadeBase}.com.google
{noformat}
* These workarounds seem to fix the issue locally, but defining dependencies in the [parent pom|https://github.com/apache/kylin/blob/master/pom.xml#L1095] is still a bad practice because these are transitively also defined as dependencies for anyone who is using your public libraries like kylin-jdbc.

> Guava classpath conflict caused by kylin-jdbc 3.1.0 jar > --- > > Key: KYLIN-4656 > URL: https://issues.apache.org/jira/browse/KYLIN-4656 > Project: Kylin > Issue Type: Bug > Components: Driver - JDBC >Affects Versions: v3.1.0 >Reporter: Gabor Arki >Priority: Critical > Attachments: image-2020-07-23-07-44-40-675.png > > > The newly released kylin-jdbc 3.1.0 jar contains a shaded, non-repackaged > version of the Guava library. This is causing class duplication with the > original guava jar if it is also on the classpath which results in > non-deterministic, runtime errors depending on which version of a certain > guava class has been picked up by the class-loader from the 2 versions. Based > on the runtime errors of the missing classes and methods, it seems to be a > very old version, probably <=14. > > Either implement a proper shading with package relocation or rely on > transitive dependency, but do not shade non-repackaged versions of libraries. -- This message was sent by Atlassian Jira (v8.3.4#803005)
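Put together in maven-shade-plugin terms, the include list and the widened relocation suggested in the comment above would look roughly like this (a sketch following the standard shade-plugin configuration schema, not the actual Kylin pom; {{${shadeBase}}} is the property the comment refers to):

```xml
<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <configuration>
    <artifactSet>
      <!-- shade only what is explicitly intended, instead of excluding
           a few artifacts and shading everything else by default -->
      <includes>
        <include>org.apache.kylin:kylin-shaded-guava</include>
        <include>org.apache.calcite.avatica:avatica-core</include>
        <include>org.apache.calcite.avatica:avatica-metrics</include>
        <include>com.fasterxml.jackson.core:jackson-annotations</include>
        <include>com.fasterxml.jackson.core:jackson-core</include>
        <include>com.fasterxml.jackson.core:jackson-databind</include>
        <include>com.google.protobuf:protobuf-java</include>
        <include>org.apache.httpcomponents:httpclient</include>
        <include>org.apache.httpcomponents:httpcore</include>
        <include>commons-codec:commons-codec</include>
        <include>commons-logging:commons-logging</include>
      </includes>
    </artifactSet>
    <relocations>
      <!-- relocating all of com.google (not just com.google.common)
           also covers com.google.thirdparty, avoiding the conflict -->
      <relocation>
        <pattern>com.google</pattern>
        <shadedPattern>${shadeBase}.com.google</shadedPattern>
      </relocation>
    </relocations>
  </configuration>
</plugin>
```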
[jira] [Comment Edited] (KYLIN-4656) Guava classpath conflict caused by kylin-jdbc 3.1.0 jar
[ https://issues.apache.org/jira/browse/KYLIN-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163352#comment-17163352 ] Gabor Arki edited comment on KYLIN-4656 at 7/23/20, 9:01 AM: - Based on the dependency tree of the jdbc module, guava 14 is a transitive dependency of the kylin-shaded-guava:
{noformat}
[INFO] < org.apache.kylin:kylin-jdbc >-
[INFO] Building Apache Kylin - JDBC Driver 3.1.1-SNAPSHOT [31/35]
[INFO] [ jar ]-
[INFO]
[INFO] --- maven-dependency-plugin:2.10:tree (default-cli) @ kylin-jdbc ---
[INFO] org.apache.kylin:kylin-jdbc:jar:3.1.1-SNAPSHOT
[INFO] +- org.apache.kylin:kylin-shaded-guava:jar:3.1.1-SNAPSHOT:compile
[INFO] |  \- com.google.guava:guava:jar:14.0:compile
{noformat}
The dependency tree of the external module:
{noformat}
[INFO] --< org.apache.kylin:kylin-external >---
[INFO] Building Apache Kylin - kylin External 3.1.1-SNAPSHOT [2/35]
[INFO] [ pom ]-
[INFO]
[INFO] --- maven-dependency-plugin:2.10:tree (default-cli) @ kylin-external ---
[INFO] org.apache.kylin:kylin-external:pom:3.1.1-SNAPSHOT
[INFO] +- log4j:log4j:jar:1.2.17:provided
[INFO] +- org.slf4j:slf4j-log4j12:jar:1.7.21:provided
[INFO] +- org.slf4j:jcl-over-slf4j:jar:1.7.21:compile
[INFO] +- org.slf4j:slf4j-api:jar:1.7.21:compile
[INFO] \- org.apache.hadoop:hadoop-common:jar:2.7.1:provided
[INFO]    +- org.apache.hadoop:hadoop-annotations:jar:2.7.1:provided
[INFO]    |  \- jdk.tools:jdk.tools:jar:1.8:system
[INFO]    +- com.google.guava:guava:jar:14.0:provided
[INFO]    +- commons-cli:commons-cli:jar:1.2:provided
[INFO]    +- org.apache.commons:commons-math3:jar:3.1.1:provided
[INFO]    +- xmlenc:xmlenc:jar:0.52:provided
[INFO]    +- commons-httpclient:commons-httpclient:jar:3.1:provided
[INFO]    +- commons-codec:commons-codec:jar:1.4:provided
[INFO]    +- commons-io:commons-io:jar:2.4:provided
[INFO]    +- commons-net:commons-net:jar:3.1:provided
[INFO]    +- commons-collections:commons-collections:jar:3.2.2:provided
[INFO]    +- org.mortbay.jetty:jetty:jar:6.1.26:provided
[INFO]    +- org.mortbay.jetty:jetty-util:jar:6.1.26:provided
[INFO]    +- com.sun.jersey:jersey-core:jar:1.9:provided
[INFO]    +- com.sun.jersey:jersey-json:jar:1.9:provided
[INFO]    |  +- org.codehaus.jettison:jettison:jar:1.1:provided
[INFO]    |  +- com.sun.xml.bind:jaxb-impl:jar:2.2.3-1:provided
[INFO]    |  |  \- javax.xml.bind:jaxb-api:jar:2.2.2:provided
[INFO]    |  |     +- javax.xml.stream:stax-api:jar:1.0-2:provided
[INFO]    |  |     \- javax.activation:activation:jar:1.1:provided
[INFO]    |  +- org.codehaus.jackson:jackson-jaxrs:jar:1.8.3:provided
[INFO]    |  \- org.codehaus.jackson:jackson-xc:jar:1.8.3:provided
[INFO]    +- com.sun.jersey:jersey-server:jar:1.9:provided
[INFO]    |  \- asm:asm:jar:3.1:provided
[INFO]    +- commons-logging:commons-logging:jar:1.1.3:provided
[INFO]    +- commons-lang:commons-lang:jar:2.6:provided
[INFO]    +- commons-configuration:commons-configuration:jar:1.6:provided
[INFO]    |  +- commons-digester:commons-digester:jar:1.8:provided
[INFO]    |  |  \- commons-beanutils:commons-beanutils:jar:1.7.0:provided
[INFO]    |  \- commons-beanutils:commons-beanutils-core:jar:1.8.0:provided
[INFO]    +- org.codehaus.jackson:jackson-core-asl:jar:1.9.13:provided
[INFO]    +- org.codehaus.jackson:jackson-mapper-asl:jar:1.9.13:provided
[INFO]    +- org.apache.avro:avro:jar:1.7.4:provided
[INFO]    |  +- com.thoughtworks.paranamer:paranamer:jar:2.3:provided
[INFO]    |  \- org.xerial.snappy:snappy-java:jar:1.0.4.1:provided
[INFO]    +- com.google.protobuf:protobuf-java:jar:2.5.0:provided
[INFO]    +- com.google.code.gson:gson:jar:2.2.4:provided
[INFO]    +- org.apache.hadoop:hadoop-auth:jar:2.7.1:provided
[INFO]    |  +- org.apache.httpcomponents:httpclient:jar:4.3.6:provided
[INFO]    |  |  \- org.apache.httpcomponents:httpcore:jar:4.3.3:provided
[INFO]    |  +- org.apache.directory.server:apacheds-kerberos-codec:jar:2.0.0-M15:provided
[INFO]    |  |  +- org.apache.directory.server:apacheds-i18n:jar:2.0.0-M15:provided
[INFO]    |  |  +- org.apache.directory.api:api-asn1-api:jar:1.0.0-M20:provided
[INFO]    |  |  \- org.apache.directory.api:api-util:jar:1.0.0-M20:provided
[INFO]    |  \- org.apache.curator:curator-framework:jar:2.12.0:provided
[INFO]    +- com.jcraft:jsch:jar:0.1.54:provided
[INFO]    +- org.apache.curator:curator-client:jar:2.12.0:provided
[INFO]    +- org.apache.curator:curator-recipes:jar:2.12.0:provided
[INFO]    +- com.google.code.findbugs:jsr305:jar:3.0.1:provided
[INFO]    +- org.apache.htrace:htrace-core:jar:3.1.0-incubating:provided
[INFO]    +- org.apache.zookeeper:zookeeper:jar:3.4.14:provided
[INFO]    |  +- com.github.spotbugs:spotbugs-annotations:jar:3.1.9:provided
[INFO]    |  +- org.apache.yetus:audience-annotations:jar:0.5.0:provided
[INFO]    |  \- io.netty:netty:jar:3.10.6.Final:provided
[INFO]    \- org.apache.commons:commons-compress:jar:1.19:provided
{noformat}
It is inheriting all
[jira] [Commented] (KYLIN-4656) Guava classpath conflict caused by kylin-jdbc 3.1.0 jar
[ https://issues.apache.org/jira/browse/KYLIN-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163352#comment-17163352 ] Gabor Arki commented on KYLIN-4656: --- Based on the dependency tree of the jdbc module, guava 14 is a transitive dependency of the kylin-shaded-guava:
{noformat}
[INFO] < org.apache.kylin:kylin-jdbc >-
[INFO] Building Apache Kylin - JDBC Driver 3.1.1-SNAPSHOT [31/35]
[INFO] [ jar ]-
[INFO]
[INFO] --- maven-dependency-plugin:2.10:tree (default-cli) @ kylin-jdbc ---
[INFO] org.apache.kylin:kylin-jdbc:jar:3.1.1-SNAPSHOT
[INFO] +- org.apache.kylin:kylin-shaded-guava:jar:3.1.1-SNAPSHOT:compile
[INFO] |  \- com.google.guava:guava:jar:14.0:compile
{noformat}
[jira] [Comment Edited] (KYLIN-4656) Guava classpath conflict caused by kylin-jdbc 3.1.0 jar
[ https://issues.apache.org/jira/browse/KYLIN-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163235#comment-17163235 ] Gabor Arki edited comment on KYLIN-4656 at 7/23/20, 5:52 AM: - Download the published 3.1.0 jar: [https://mvnrepository.com/artifact/org.apache.kylin/kylin-jdbc/3.1.0] Then unzip the file. You will see the com.google package inside the jar containing the guava library within the kylin-jdbc jar. From my findings, this is likely version 14 of guava. !image-2020-07-23-07-44-40-675.png! If you also have the real guava jar on your classpath, you end up having 2 separate and highly incompatible versions of guava classes present, and the class-loader is loading one of the competing classes randomly, causing numerous runtime issues. Also, the shaded version you are referring to is also present in the jar and is properly relocated to the org.apache.kylin.shaded.com.google package, and is fine as is. was (Author: arkigabor): Download the published 3.1.0 jar: [https://mvnrepository.com/artifact/org.apache.kylin/kylin-jdbc/3.1.0] Then unzip the file. You will see the com.google package inside the jar containing the guava library within the kylin-jdbc jar. From my findings, this is likely version 14 of guava. !image-2020-07-23-07-44-40-675.png! If you also have the real guava jar on your classpath, you end up having 2 separate and highly incompatible versions of guava classes present, and the class-loader is loading one of the competing classes randomly causing numerous runtime issues.
[jira] [Comment Edited] (KYLIN-4656) Guava classpath conflict caused by kylin-jdbc 3.1.0 jar
[ https://issues.apache.org/jira/browse/KYLIN-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163235#comment-17163235 ] Gabor Arki edited comment on KYLIN-4656 at 7/23/20, 5:48 AM: - Download the published 3.1.0 jar: [https://mvnrepository.com/artifact/org.apache.kylin/kylin-jdbc/3.1.0] Then unzip the file. You will see the com.google package inside the jar containing the guava library within the kylin-jdbc jar. From my findings, this is likely version 14 of guava. !image-2020-07-23-07-44-40-675.png! If you also have the real guava jar on your classpath, you end up having 2 separate and highly incompatible versions of guava classes present, and the class-loader is loading one of the competing classes randomly causing numerous runtime issues. was (Author: arkigabor): Download the published 3.1.0 jar: [https://mvnrepository.com/artifact/org.apache.kylin/kylin-jdbc/3.1.0] Then unzip the file. You will see the com.google package inside the jar containing the guava library within the kylin-jdbc jar. From my findings, this is likely version 14 of guava. !image-2020-07-23-07-44-40-675.png! If you also have the real guava jar on your classpath, you end up having 2 separate and highly incompatible versions of guava classes present, and the class-loader is loading one of them randomly causing numerous runtime issues.
[jira] [Comment Edited] (KYLIN-4656) Guava classpath conflict caused by kylin-jdbc 3.1.0 jar
[ https://issues.apache.org/jira/browse/KYLIN-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163235#comment-17163235 ] Gabor Arki edited comment on KYLIN-4656 at 7/23/20, 5:47 AM: - Download the published 3.1.0 jar: [https://mvnrepository.com/artifact/org.apache.kylin/kylin-jdbc/3.1.0] Then unzip the file. You will see the com.google package inside the jar containing the guava library within the kylin-jdbc jar. From my findings, this is likely version 14 of guava. !image-2020-07-23-07-44-40-675.png! If you also have the real guava jar on your classpath, you end up having 2 separate and highly incompatible versions of guava classes present, and the class-loader is loading one of them randomly causing numerous runtime issues. was (Author: arkigabor): Download the published 3.1.0 jar: [https://mvnrepository.com/artifact/org.apache.kylin/kylin-jdbc/3.1.0] Then unzip the file. You will see the com.google package inside the jar containing the guava library within the kylin-jdbc jar. !image-2020-07-23-07-44-40-675.png! If you also have the real guava jar on your classpath, you end up having 2 separate and highly incompatible versions of guava classes present, and the class-loader is loading one of them randomly causing numerous runtime issues.
[jira] [Commented] (KYLIN-4656) Guava classpath conflict caused by kylin-jdbc 3.1.0 jar
[ https://issues.apache.org/jira/browse/KYLIN-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17163235#comment-17163235 ] Gabor Arki commented on KYLIN-4656: --- Download the published 3.1.0 jar: [https://mvnrepository.com/artifact/org.apache.kylin/kylin-jdbc/3.1.0] Then unzip the file. You will see the com.google package inside the jar containing the guava library within the kylin-jdbc jar. !image-2020-07-23-07-44-40-675.png! If you also have the real guava jar on your classpath, you end up having 2 separate and highly incompatible versions of guava classes present, and the class-loader is loading one of them randomly causing numerous runtime issues.
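The duplication described above can also be confirmed programmatically. A small Python sketch (the jar paths in the usage comment are placeholders) that lists the {{com/google/}} class entries present in both archives, i.e. the classes the class-loader would resolve non-deterministically:

```python
import zipfile

def overlapping_classes(jar_a, jar_b, prefix="com/google/"):
    """Return sorted .class entries under `prefix` that exist in both jars."""
    def class_entries(path):
        with zipfile.ZipFile(path) as zf:
            return {name for name in zf.namelist()
                    if name.startswith(prefix) and name.endswith(".class")}
    return sorted(class_entries(jar_a) & class_entries(jar_b))

# Example usage (file names are hypothetical):
# print(overlapping_classes("kylin-jdbc-3.1.0.jar", "guava-14.0.jar"))
```

A non-empty result means two copies of the same class are on the classpath, and which one wins depends on class-loader ordering.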
[jira] [Updated] (KYLIN-4656) Guava classpath conflict caused by kylin-jdbc 3.1.0 jar
[ https://issues.apache.org/jira/browse/KYLIN-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Arki updated KYLIN-4656: -- Attachment: image-2020-07-23-07-44-40-675.png
[jira] [Commented] (KYLIN-4500) Timeout waiting for connection from pool
[ https://issues.apache.org/jira/browse/KYLIN-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17162925#comment-17162925 ] Gabor Arki commented on KYLIN-4500: --- For now, I will keep monitoring our server with netstat and try to determine whether there is any correlation with the S3 pool exhaustion. Also, we will try to upgrade to 3.1.0 but will probably take some time to tell whether the issue is still reproducible with that version. I will post an update with our findings once I have them. > Timeout waiting for connection from pool > > > Key: KYLIN-4500 > URL: https://issues.apache.org/jira/browse/KYLIN-4500 > Project: Kylin > Issue Type: Bug >Reporter: Gabor Arki >Priority: Major > Attachments: kylin-connection-timeout.txt > > > h4. Environment > * Kylin server 3.0.0 > * EMR 5.28 > h4. Issue > After an extended uptime, both Kylin query server and jobs running on EMR > stop working. The root cause in both cases is: > {noformat} > Caused by: java.io.IOException: > com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable > to execute HTTP request: Timeout waiting for connection from pool > at > com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.getFileStatus(S3NativeFileSystem2.java:257) > ~[emrfs-hadoop-assembly-2.37.0.jar:?]{noformat} > Based on > [https://aws.amazon.com/premiumsupport/knowledge-center/emr-timeout-connection-wait/] > increasing the fs.s3.maxConnections setting to 1 is just delaying the > issue thus the underlying issue is likely a connection leak. It also > indicates a leak that restarting the kylin service solves the problem. > A full stack trace from the QueryService is attached. > -- This message was sent by Atlassian Jira (v8.3.4#803005)
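For the netstat monitoring mentioned above, a small Python sketch that counts CLOSE_WAIT sockets toward a given remote port from `netstat -tn` output (it assumes the common Linux column layout of proto, recv-q, send-q, local address, remote address, state; port 443 is used since S3 traffic is HTTPS):

```python
def close_wait_count(netstat_output, remote_port=443):
    """Count CLOSE_WAIT sockets whose remote endpoint uses `remote_port`."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto recv-q send-q local-addr remote-addr state
        if len(fields) >= 6 and fields[5] == "CLOSE_WAIT":
            if fields[4].rsplit(":", 1)[-1] == str(remote_port):
                count += 1
    return count

# Typical use: feed it the output of `netstat -tn` on the Kylin server and
# log the count periodically; a steady ramp-up suggests a connection leak.
```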
[jira] [Comment Edited] (KYLIN-4500) Timeout waiting for connection from pool
[ https://issues.apache.org/jira/browse/KYLIN-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17162898#comment-17162898 ] Gabor Arki edited comment on KYLIN-4500 at 7/22/20, 4:21 PM: - [~hit_lacus], the linked issue seems to be unrelated. But if the dictionary is stored on the cluster, it could be related; however, I do not see any FileNotFoundException logs when we hit this issue. I do see the slow ramp-up of CLOSE_WAIT connections on the server, though. We are running Kylin on an AWS EMR cluster and use S3 (EMRFS) for data storage instead of HDFS to make the cluster stateless. However, after some continuous uptime, we always face this issue where both the query server and the Kylin MR jobs suddenly fail with the aforementioned exception. The root cause of these failures is that the connection pool of the EMR cluster to S3 is exhausted, and new operations fail to acquire a connection and time out while waiting for an S3 connection. No matter how large a pool size we configure for the fs.s3.maxConnections value, this keeps happening. The underlying issue is very likely a connection leak where some code is not properly closing and returning a connection to the pool. Given that a query server restart resolves the issue, I suspect the pool is exhausted somewhere in the Kylin query server code. was (Author: arkigabor): [~hit_lacus], the linked issue seems to be unrelated. We are running Kylin on AWS EMR cluster and use S3 (EMRFS) for data storage instead of HDFS to make the cluster stateless. However, after some continuous uptime, we are always facing this issue where both the query server and the Kylin MR jobs are suddenly failing with the aforementioned Exception. The root cause of these failures is that the connection pool of the EMR cluster to S3 is exhausted and new operations fail to acquire a connection and time out while waiting for an S3 connection. 
No matter how much of a pool size we are configuring for the fs.s3.maxConnections value, this keeps happening. The underlying issue is very likely a connection leak where some code is not properly closing and returning a connection to the pool. Given a query server restart is solving the issue, I suspect the pool is exhausted somewhere in the Kylin query server code.
[jira] [Updated] (KYLIN-4656) Guava classpath conflict caused by kylin-jdbc 3.1.0 jar
[ https://issues.apache.org/jira/browse/KYLIN-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Arki updated KYLIN-4656: -- Description: The newly released kylin-jdbc 3.1.0 jar contains a shaded, non-repackaged version of the Guava library. This is causing class duplication with the original guava jar if it is also on the classpath which results in non-deterministic, runtime errors depending on which version of a certain guava class has been picked up by the class-loader from the 2 versions. Based on the runtime errors of the missing classes and methods, it seems to be a very old version, probably <=14. Either implement a proper shading with package relocation or rely on transitive dependency, but do not shade non-repackaged versions of libraries. was: The newly released kylin-jdbc 3.1.0 jar contains a shaded, non-repackaged version of the Guava library. This is causing class duplication with the original guava jar if it is also on the classpath which results in non-deterministic, runtime errors depending on which version of a certain guava class has been picked up by the class-loader from the 2 versions. Based on the runtime errors of the missing classes and methods, it seems to be a very old version, probably <=14. Because of the, Either implement a proper shading with package relocation or rely on transitive dependency, but do not shade non-repackaged versions of libraries.
[jira] [Updated] (KYLIN-4656) Guava classpath conflict caused by kylin-jdbc 3.1.0 jar
[ https://issues.apache.org/jira/browse/KYLIN-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Arki updated KYLIN-4656: -- Description: The newly released kylin-jdbc 3.1.0 jar contains a shaded, non-repackaged version of the Guava library. This is causing class duplication with the original guava jar if it is also on the classpath which results in non-deterministic, runtime errors depending on which version of a certain guava class has been picked up by the class-loader from the 2 versions. Based on the runtime errors of the missing classes and methods, it seems to be a very old version, probably <=14. Because of the, Either implement a proper shading with package relocation or rely on transitive dependency, but do not shade non-repackaged versions of libraries. was: The newly released kylin-jdbc 3.1.0 jar contains a shaded, non-repackaged version of the Guava library. This is causing class duplication with the original guava jar if it is also on the classpath which results in non-deterministic, runtime errors depending on which version of a certain guava class has been picked up by the class-loader from the 2 versions. Based on the runtime errors of the missing classes and methods, it seems to be a very old version, probably <=14. Either implement a proper shading with package relocation or rely on transitive dependency, but do not shade non-repackaged versions of libraries.
[jira] [Updated] (KYLIN-4656) Guava classpath conflict caused by kylin-jdbc 3.1.0 jar
[ https://issues.apache.org/jira/browse/KYLIN-4656?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Arki updated KYLIN-4656: -- Description: The newly released kylin-jdbc 3.1.0 jar contains a shaded, non-repackaged version of the Guava library. This is causing class duplication with the original guava jar if it is also on the classpath which results in non-deterministic, runtime errors depending on which version of a certain guava class has been picked up by the class-loader from the 2 versions. Based on the runtime errors of the missing classes and methods, it seems to be a very old version, probably <=14. Either implement a proper shading with package relocation or rely on transitive dependency, but do not shade non-repackaged versions of libraries. was: The newly released kylin-jdbc 3.1.0 jar contains a shaded, non-repackaged version of the Guava library. This is causing class duplications with the original guava jar if it is also on the classpath which results in non-deterministic, runtime errors depending on which version of a certain guava class has been picked up by the classloader from the 2 versions. Either implement a proper shading with package relocation or rely on transitive dependency, but do not shade non-repackaged versions of libraries.
[jira] [Created] (KYLIN-4656) Guava classpath conflict caused by kylin-jdbc 3.1.0 jar
Gabor Arki created KYLIN-4656: - Summary: Guava classpath conflict caused by kylin-jdbc 3.1.0 jar Key: KYLIN-4656 URL: https://issues.apache.org/jira/browse/KYLIN-4656 Project: Kylin Issue Type: Bug Components: Driver - JDBC Affects Versions: v3.1.0 Reporter: Gabor Arki The newly released kylin-jdbc 3.1.0 jar contains a shaded, non-repackaged version of the Guava library. This is causing class duplications with the original guava jar if it is also on the classpath which results in non-deterministic, runtime errors depending on which version of a certain guava class has been picked up by the classloader from the 2 versions. Either implement a proper shading with package relocation or rely on transitive dependency, but do not shade non-repackaged versions of libraries.
[jira] [Commented] (KYLIN-4500) Timeout waiting for connection from pool
[ https://issues.apache.org/jira/browse/KYLIN-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17162898#comment-17162898 ] Gabor Arki commented on KYLIN-4500: --- [~hit_lacus], the linked issue seems to be unrelated. We are running Kylin on an AWS EMR cluster and use S3 (EMRFS) instead of HDFS for data storage to keep the cluster stateless. However, after some continuous uptime we always hit this issue: both the query server and the Kylin MR jobs suddenly fail with the aforementioned exception. The root cause of these failures is that the EMR cluster's connection pool to S3 is exhausted, so new operations time out while waiting for an S3 connection. No matter how large a pool we configure via fs.s3.maxConnections, this keeps happening. The underlying issue is very likely a connection leak, where some code path does not close its connection and return it to the pool. Given that a query-server restart resolves the issue, I suspect the leak is somewhere in the Kylin query server code.
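The suspected failure mode can be illustrated with a minimal, self-contained sketch (hypothetical code, not Kylin or EMRFS internals): a semaphore-backed pool where one code path forgets to return its connection. Once every permit has leaked, all later acquisitions time out, which is exactly the "Timeout waiting for connection from pool" symptom.

```java
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;

// TinyPool models a fixed-size connection pool: acquire() takes a permit
// (a pooled connection), release() returns it. A caller that acquires
// without releasing "leaks" a connection permanently.
class TinyPool {
    private final Semaphore permits;

    TinyPool(int maxConnections) {
        this.permits = new Semaphore(maxConnections);
    }

    boolean acquire(long timeoutMs) throws InterruptedException {
        return permits.tryAcquire(timeoutMs, TimeUnit.MILLISECONDS);
    }

    void release() {
        permits.release();
    }
}

public class LeakDemo {
    public static void main(String[] args) throws InterruptedException {
        TinyPool pool = new TinyPool(2); // think: fs.s3.maxConnections = 2

        // Leaky pattern: acquire with no matching release.
        pool.acquire(10);
        pool.acquire(10);

        // Pool exhausted -- this acquire waits out its timeout and fails.
        // Raising the pool size only postpones the same failure.
        System.out.println("acquire after leak: " + pool.acquire(10)); // false

        // Correct pattern: pair every acquire with a release in finally.
        pool.release();
        pool.release();
        if (pool.acquire(10)) {
            try {
                // ... use the connection ...
            } finally {
                pool.release();
            }
        }
        System.out.println("acquire after balanced use: " + pool.acquire(10)); // true
    }
}
```

The try/finally pairing is the property a leak audit would look for in the query server's S3 I/O paths: every opened stream or status call must reach its close.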
[jira] [Updated] (KYLIN-4500) Timeout waiting for connection from pool
[ https://issues.apache.org/jira/browse/KYLIN-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Arki updated KYLIN-4500: -- Description: h4. Environment * Kylin server 3.0.0 * EMR 5.28 h4. Issue After an extended uptime, both Kylin query server and jobs running on EMR stop working. The root cause in both cases is: {noformat} Caused by: java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.getFileStatus(S3NativeFileSystem2.java:257) ~[emrfs-hadoop-assembly-2.37.0.jar:?]{noformat} Based on [https://aws.amazon.com/premiumsupport/knowledge-center/emr-timeout-connection-wait/] increasing the fs.s3.maxConnections setting to 1 is just delaying the issue thus the underlying issue is likely a connection leak. It also indicates a leak that restarting the kylin service solves the problem. A full stack trace from the QueryService is attached. was: h4. Environment * Kylin server 3.0.0 * EMR 5.28 h4. Issue After an extended uptime, both Kylin query server and jobs running on EMR stop working. The root cause in both cases is: {noformat} Caused by: java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.getFileStatus(S3NativeFileSystem2.java:257) ~[emrfs-hadoop-assembly-2.37.0.jar:?]{noformat} {{Based on [https://aws.amazon.com/premiumsupport/knowledge-center/emr-timeout-connection-wait/] increasing the *fs.s3.maxConnections* setting to 1 is just delaying the issue thus the underlying issue is likely a connection leak. It also indicates a leak that restarting the kylin service solves the problem.}} {{A full stack trace from the QueryService is attached.}}
[jira] [Updated] (KYLIN-4500) Timeout waiting for connection from pool
[ https://issues.apache.org/jira/browse/KYLIN-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Arki updated KYLIN-4500: -- Description: h4. Environment * Kylin server 3.0.0 * EMR 5.28 h4. Issue After an extended uptime, both Kylin query server and jobs running on EMR stop working. The root cause in both cases is: {noformat} Caused by: java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.getFileStatus(S3NativeFileSystem2.java:257) ~[emrfs-hadoop-assembly-2.37.0.jar:?]{noformat} {{Based on [https://aws.amazon.com/premiumsupport/knowledge-center/emr-timeout-connection-wait/] increasing the *fs.s3.maxConnections* setting to 1 is just delaying the issue thus the underlying issue is likely a connection leak. It also indicates a leak that restarting the kylin service solves the problem.}} {{A full stack trace from the QueryService is attached.}} was: h4. Environment * Kylin server 3.0.0 * EMR 5.28 h4. Issue After an extended uptime, both Kylin query server and jobs running on EMR stop working. The root cause is both cases is: {noformat} Caused by: java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.getFileStatus(S3NativeFileSystem2.java:257) ~[emrfs-hadoop-assembly-2.37.0.jar:?]{noformat} {{Based on [https://aws.amazon.com/premiumsupport/knowledge-center/emr-timeout-connection-wait/] increasing the *fs.s3.maxConnections* setting to 1 is just delaying the issue thus the underlying issue is likely a connection leak. It also indicates a leak that restarting the kylin service solves the problem.}} {{A full stack trace from the QueryService is attached.}}
[jira] [Updated] (KYLIN-4500) Timeout waiting for connection from pool
[ https://issues.apache.org/jira/browse/KYLIN-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Arki updated KYLIN-4500: -- Description: h4. Environment * Kylin server 3.0.0 * EMR 5.28 h4. Issue After an extended uptime, both Kylin query server and jobs running on EMR stop working. The root cause is both cases is: {noformat} Caused by: java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.getFileStatus(S3NativeFileSystem2.java:257) ~[emrfs-hadoop-assembly-2.37.0.jar:?]{noformat} {{Based on [https://aws.amazon.com/premiumsupport/knowledge-center/emr-timeout-connection-wait/] increasing the *fs.s3.maxConnections* setting to 1 is just delaying the issue thus the underlying issue is likely a connection leak. It also indicates a leak that restarting the kylin service solves the problem.}} {{A full stack trace from the QueryService is attached.}} was: h4. Environment * Kylin server 3.0.0 * EMR 5.28 h4. Issue After an extended uptime, both Kylin query server and jobs running on EMR stop working. The root cause is both cases is: {{Caused by: java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool}} {{ at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.getFileStatus(S3NativeFileSystem2.java:257) ~[emrfs-hadoop-assembly-2.37.0.jar:?]}} {{Based on [https://aws.amazon.com/premiumsupport/knowledge-center/emr-timeout-connection-wait/] increasing the *fs.s3.maxConnections* setting to 1 is just delaying the issue thus the underlying issue is likely a connection leak. It also indicates a leak that restarting the kylin service solves the problem.}} {{A full stack trace from the QueryService is attached.}}
[jira] [Updated] (KYLIN-4500) Timeout waiting for connection from pool
[ https://issues.apache.org/jira/browse/KYLIN-4500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Arki updated KYLIN-4500: -- Attachment: kylin-connection-timeout.txt
[jira] [Created] (KYLIN-4500) Timeout waiting for connection from pool
Gabor Arki created KYLIN-4500: - Summary: Timeout waiting for connection from pool Key: KYLIN-4500 URL: https://issues.apache.org/jira/browse/KYLIN-4500 Project: Kylin Issue Type: Bug Reporter: Gabor Arki h4. Environment * Kylin server 3.0.0 * EMR 5.28 h4. Issue After an extended uptime, both Kylin query server and jobs running on EMR stop working. The root cause in both cases is: {{Caused by: java.io.IOException: com.amazon.ws.emr.hadoop.fs.shaded.com.amazonaws.SdkClientException: Unable to execute HTTP request: Timeout waiting for connection from pool}} {{ at com.amazon.ws.emr.hadoop.fs.s3n2.S3NativeFileSystem2.getFileStatus(S3NativeFileSystem2.java:257) ~[emrfs-hadoop-assembly-2.37.0.jar:?]}} {{Based on [https://aws.amazon.com/premiumsupport/knowledge-center/emr-timeout-connection-wait/] increasing the *fs.s3.maxConnections* setting to 1 is just delaying the issue thus the underlying issue is likely a connection leak. It also indicates a leak that restarting the kylin service solves the problem.}} {{A full stack trace from the QueryService is attached.}}
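For reference, the pool-size knob from the linked AWS article is set in emrfs-site.xml. The value below is purely illustrative; as the report notes, enlarging the pool only delays exhaustion when connections leak:

```xml
<!-- emrfs-site.xml sketch (illustrative value, not a recommendation):
     enlarges the EMRFS S3 connection pool, which postpones but does
     not fix a connection leak -->
<property>
  <name>fs.s3.maxConnections</name>
  <value>1000</value>
</property>
```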
[jira] [Updated] (KYLIN-4382) Unable to use DATE type in prepared statements
[ https://issues.apache.org/jira/browse/KYLIN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Arki updated KYLIN-4382: -- Description: h4. Environment * Kylin JDBC driver: 3.0.0 * Kylin server: 3.0.0 h4. Reproduction steps * Use a cube with a DATE column (like the derived day_start) * Create a prepared statement and try to filter with this column in a where clause * Pass the values as java.sql.Date type h4. Expected result * The proper response is provided for the query with the values for the specified date(s) h4. Actual result * No data is returned * StreamStorageQuery's _Skip cube segment_ log message is containing the filter with an epoch day value, for example: {{DAY_START GTE [18231]}} * Executing the same query from the web UI you get the expected response. Now the same log message is containing the filter in epoch millis format, for example: {{DAY_START IN [158077440, 158086080]}} * Passing the value as String instead of java.sql.Date fails on server-side with: {{exception while executing query: java.lang.String cannot be cast to java.lang.Integer}} * Passing the value as java.sql.Timestamp or java.util.Date fails on server-side with: {{exception while executing query: java.lang.Long cannot be cast to java.lang.Integer}} * Trying to CAST a String to DATE fails with the error described here: https://issues.apache.org/jira/browse/CALCITE-3100 was: h4. Environment * Kylin JDBC driver: 3.0.0 * Kylin server: 3.0.0 h4. Reproduction steps * Use a cube with a DATE column (like the derived day_start) * Create a prepared statement and try to filter with this column in a where clause * Pass the values as java.sql.Date type h4. Expected result * The proper response is provided for the query with the values for the specified date(s) h4. 
Actual result * No data is returned * StreamStorageQuery's _Skip cube segment_ log message is containing the filter with an epoch day value, for example: {{DAY_START GTE [18231]}} * Executing the same query from the web UI you get the expected response. Now the same log message is containing the filter in epoch millis format, for example: {{DAY_START IN [158077440, 158086080]}} * Passing the value as String instead of java.sql.Date fails on server-side with: {{exception while executing query: java.lang.String cannot be cast to java.lang.Integer}} * Passing the value as java.sql.Timestamp or java.util.Date fails on server-side with: {{exception while executing query: java.lang.Long cannot be cast to java.lang.Integer}}
[jira] [Updated] (KYLIN-4382) Unable to use DATE type in prepared statements
[ https://issues.apache.org/jira/browse/KYLIN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Arki updated KYLIN-4382: -- Description: h4. Environment * Kylin JDBC driver: 3.0.0 * Kylin server: 3.0.0 h4. Reproduction steps * Use a cube with a DATE column (like the derived day_start) * Create a prepared statement and try to filter with this column in a where clause * Pass the values as java.sql.Date type h4. Expected result * The proper response is provided for the query with the values for the specified date(s) h4. Actual result * No data is returned * StreamStorageQuery's _Skip cube segment_ log message is containing the filter with an epoch day value, for example: {{DAY_START GTE [18231]}} * Executing the same query from the web UI you get the expected response. Now the same log message is containing the filter in epoch millis format, for example: {{DAY_START IN [158077440, 158086080]}} * Passing the value as String instead of java.sql.Date fails on server-side with: {{exception while executing query: java.lang.String cannot be cast to java.lang.Integer}} * Passing the value as java.sql.Timestamp or java.util.Date fails on server-side with: {{exception while executing query: java.lang.Long cannot be cast to java.lang.Integer}} was: h4. Environment * Kylin JDBC driver: 3.0.0 * Kylin server: 3.0.0 h4. Reproduction steps * Use a cube with a DATE column (like the derived day_start) * Create a prepared statement and try to filter with this column in a where clause * Pass the values as java.sql.Date type h4. Expected result * The proper response is provided for the query with the values for the specified date(s) h4. Actual result * No data is returned * StreamStorageQuery's _Skip cube segment_ log message is containing the filter with an epoch day value, for example: {{}}{{DAY_START GTE [18231]}} * Executing the same query from the web UI you get the expected response. 
Now the same log message is containing the filter in epoch millis format, for example: {{DAY_START IN [158077440, 158086080]}} * Passing the value as String instead of java.sql.Date fails on server-side with: {{exception while executing query: java.lang.String cannot be cast to java.lang.Integer}} * Passing the value as java.sql.Timestamp or java.util.Date fails on server-side with: {{exception while executing query: java.lang.Long cannot be cast to java.lang.Integer}}
[jira] [Updated] (KYLIN-4382) Unable to use DATE type in prepared statements
[ https://issues.apache.org/jira/browse/KYLIN-4382?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Arki updated KYLIN-4382: -- Description: h4. Environment * Kylin JDBC driver: 3.0.0 * Kylin server: 3.0.0 h4. Reproduction steps * Use a cube with a DATE column (like the derived day_start) * Create a prepared statement and try to filter with this column in a where clause * Pass the values as java.sql.Date type h4. Expected result * The proper response is provided for the query with the values for the specified date(s) h4. Actual result * No data is returned * StreamStorageQuery's _Skip cube segment_ log message is containing the filter with an epoch day value, for example: {{}}{{DAY_START GTE [18231]}} * Executing the same query from the web UI you get the expected response. Now the same log message is containing the filter in epoch millis format, for example: {{DAY_START IN [158077440, 158086080]}} * Passing the value as String instead of java.sql.Date fails on server-side with: {{exception while executing query: java.lang.String cannot be cast to java.lang.Integer}} * Passing the value as java.sql.Timestamp or java.util.Date fails on server-side with: {{exception while executing query: java.lang.Long cannot be cast to java.lang.Integer}} was: h4. Environment * Kylin JDBC driver: 3.0.0 * Kylin server: 3.0.0 h4. Reproduction steps * Use a cube with a DATE column (like the derived day_start) * Create a prepared statement and try to filter with this column in a where clause * Pass the values as java.sql.Date type h4. Expected result * The proper response is provided for the query with the values for the specified date(s) h4. Actual result * No data is returned * StreamStorageQuery's _Skip cube segment_ log message is containing the filter with an epoch day value, for example: {{}}{{DAY_START GTE [18231]}} * Executing the same query from the web UI you get the expected response. 
Now the same log message is containing the filter in epoch millis format, for example: {{DAY_START IN [158077440, 158086080]}} * Passing the value as String instead of java.sql.Date fails on server-side with: {{exception while executing query: java.lang.String cannot be cast to java.lang.Integer}} * Passing the value as java.sql.Timestamp or java.util.Date fails on server-side with: {{exception while executing query: java.lang.Long cannot be cast to java.lang.Integer}}
[jira] [Created] (KYLIN-4382) Unable to use DATE type in prepared statements
Gabor Arki created KYLIN-4382: - Summary: Unable to use DATE type in prepared statements Key: KYLIN-4382 URL: https://issues.apache.org/jira/browse/KYLIN-4382 Project: Kylin Issue Type: Bug Components: Query Engine Reporter: Gabor Arki h4. Environment * Kylin JDBC driver: 3.0.0 * Kylin server: 3.0.0 h4. Reproduction steps * Use a cube with a DATE column (like the derived day_start) * Create a prepared statement and try to filter with this column in a where clause * Pass the values as java.sql.Date type h4. Expected result * The proper response is provided for the query with the values for the specified date(s) h4. Actual result * No data is returned * StreamStorageQuery's _Skip cube segment_ log message is containing the filter with an epoch day value, for example: {{DAY_START GTE [18231]}} * Executing the same query from the web UI you get the expected response. Now the same log message is containing the filter in epoch millis format, for example: {{DAY_START IN [158077440, 158086080]}} * Passing the value as String instead of java.sql.Date fails on server-side with: {{exception while executing query: java.lang.String cannot be cast to java.lang.Integer}} * Passing the value as java.sql.Timestamp or java.util.Date fails on server-side with: {{exception while executing query: java.lang.Long cannot be cast to java.lang.Integer}}
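The epoch-day vs. epoch-millis mismatch visible in the two log lines can be reproduced with plain java.time arithmetic. The date below is illustrative, chosen because 18231 happens to be the epoch-day count for 2019-12-01, matching the {{DAY_START GTE [18231]}} example:

```java
import java.time.LocalDate;

public class EpochDayDemo {
    public static void main(String[] args) {
        LocalDate day = LocalDate.of(2019, 12, 1);

        // What the JDBC prepared-statement path logged:
        // a count of days since 1970-01-01.
        long epochDay = day.toEpochDay();
        System.out.println("epoch day:    " + epochDay); // 18231

        // What the web-UI path logged: milliseconds since the epoch
        // (UTC midnight of the same day).
        long epochMillis = epochDay * 86_400_000L;
        System.out.println("epoch millis: " + epochMillis); // 1575158400000

        // A segment filter comparing one unit against the other can never
        // match, which would explain the empty result set and the
        // "Skip cube segment" log entries.
    }
}
```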