[jira] [Comment Edited] (IMPALA-9491) Compilation failure in KuduUtil.java
[ https://issues.apache.org/jira/browse/IMPALA-9491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059130#comment-17059130 ] Joe McDonnell edited comment on IMPALA-9491 at 3/13/20, 11:52 PM: -- One option may be to specify a custom version when building Kudu (i.e. not a SNAPSHOT). It looks like Kudu gets its version from the version.txt file in its repo. If each toolchain build had a different version with different jar names, this wouldn't be a problem. For example, we might do a search and replace in version.txt for SNAPSHOT and put in the githash we are building. Then we would update IMPALA_KUDU_JAVA_VERSION to include it: [https://github.com/apache/impala/blob/master/bin/impala-config.sh#L727] I'm not sure what complications that would introduce. was (Author: joemcdonnell): One option may be to specify a custom version when building Kudu (i.e. not a SNAPSHOT). It looks like Kudu gets its version from the version.txt file in its repo. If each toolchain build had a different version with different jar names, this wouldn't be a problem. For example, we might do a search and replace in version.txt for SNAPSHOT and put in the githash we are building. 
Then we would update IMPALA_KUDU_JAVA_VERSION to include it: [https://github.com/apache/impala/blob/master/bin/impala-config.sh#L727] > Compilation failure in KuduUtil.java > > > Key: IMPALA-9491 > URL: https://issues.apache.org/jira/browse/IMPALA-9491 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: David Rorke >Assignee: Csaba Ringhofer >Priority: Blocker > Labels: broken-build > > Build is failing with the following: > {noformat} > 12:40:33 [INFO] BUILD FAILURE > 12:40:33 [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.3:compile (default-compile) > on project impala-frontend: Compilation failure: Compilation failure: > 12:40:33 [ERROR] > /data0/jenkins/workspace/impala-asf-master-core/repos/Impala/fe/src/main/java/org/apache/impala/util/KuduUtil.java:[181,12] > an enum switch case label must be the unqualified name of an enumeration > constant > 12:40:33 [ERROR] > /data0/jenkins/workspace/impala-asf-master-core/repos/Impala/fe/src/main/java/org/apache/impala/util/KuduUtil.java:[183,12] > cannot find symbol > 12:40:33 [ERROR] symbol: method addDate(int,java.sql.Date) > 12:40:33 [ERROR] location: variable key of type > org.apache.kudu.client.PartialRow > 12:40:33 [ERROR] > /data0/jenkins/workspace/impala-asf-master-core/repos/Impala/fe/src/main/java/org/apache/impala/util/KuduUtil.java:[239,12] > an enum switch case label must be the unqualified name of an enumeration > constant > 12:40:33 [ERROR] > /data0/jenkins/workspace/impala-asf-master-core/repos/Impala/fe/src/main/java/org/apache/impala/util/KuduUtil.java:[442,45] > cannot find symbol > 12:40:33 [ERROR] symbol: variable DATE > 12:40:33 [ERROR] location: class org.apache.kudu.Type > 12:40:33 [ERROR] > /data0/jenkins/workspace/impala-asf-master-core/repos/Impala/fe/src/main/java/org/apache/impala/util/KuduUtil.java:[468,12] > an enum switch case label must be the unqualified name of an enumeration > constant > {noformat} > Likely related to this change: 
https://gerrit.cloudera.org/#/c/14705/ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
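The version-stamping idea in the comment above (replace SNAPSHOT in version.txt with the githash being built) could be sketched roughly as follows. This is only an illustration: the function names, the 7-character hash length, and the example version string are assumptions, not part of the Kudu or Impala build scripts.

```python
def stamp_version(version_text: str, githash: str, hash_len: int = 7) -> str:
    """Replace the SNAPSHOT qualifier in a version string with a short git hash.

    version_text is the content of a version.txt-style file; githash is the
    commit being built. Both the helper and hash_len are hypothetical.
    """
    return version_text.strip().replace("SNAPSHOT", githash[:hash_len])


def kudu_java_version_line(version: str) -> str:
    """Render a shell export line for IMPALA_KUDU_JAVA_VERSION (assumed format)."""
    return f"export IMPALA_KUDU_JAVA_VERSION={version}"


if __name__ == "__main__":
    # "1.12.0-SNAPSHOT" is an illustrative value, not read from a real repo.
    stamped = stamp_version("1.12.0-SNAPSHOT\n", "0123456789abcdef")
    print(kudu_java_version_line(stamped))
```

Because every toolchain build would then produce a distinct version string, the resulting jar names would differ as well, which is the property the comment is after.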
[jira] [Commented] (IMPALA-9491) Compilation failure in KuduUtil.java
[ https://issues.apache.org/jira/browse/IMPALA-9491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059130#comment-17059130 ] Joe McDonnell commented on IMPALA-9491: --- One option may be to specify a custom version when building Kudu (i.e. not a SNAPSHOT). It looks like Kudu gets its version from the version.txt file in its repo. If each toolchain build had a different version with different jar names, this wouldn't be a problem. For example, we might do a search and replace in version.txt for SNAPSHOT and put in the githash we are building. Then we would update IMPALA_KUDU_JAVA_VERSION to include it: [https://github.com/apache/impala/blob/master/bin/impala-config.sh#L727] > Compilation failure in KuduUtil.java > > > Key: IMPALA-9491 > URL: https://issues.apache.org/jira/browse/IMPALA-9491 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: David Rorke >Assignee: Csaba Ringhofer >Priority: Blocker > Labels: broken-build > > Build is failing with the following: > {noformat} > 12:40:33 [INFO] BUILD FAILURE > 12:40:33 [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.3:compile (default-compile) > on project impala-frontend: Compilation failure: Compilation failure: > 12:40:33 [ERROR] > /data0/jenkins/workspace/impala-asf-master-core/repos/Impala/fe/src/main/java/org/apache/impala/util/KuduUtil.java:[181,12] > an enum switch case label must be the unqualified name of an enumeration > constant > 12:40:33 [ERROR] > /data0/jenkins/workspace/impala-asf-master-core/repos/Impala/fe/src/main/java/org/apache/impala/util/KuduUtil.java:[183,12] > cannot find symbol > 12:40:33 [ERROR] symbol: method addDate(int,java.sql.Date) > 12:40:33 [ERROR] location: variable key of type > org.apache.kudu.client.PartialRow > 12:40:33 [ERROR] > /data0/jenkins/workspace/impala-asf-master-core/repos/Impala/fe/src/main/java/org/apache/impala/util/KuduUtil.java:[239,12] > an enum switch case label must be the unqualified 
name of an enumeration > constant > 12:40:33 [ERROR] > /data0/jenkins/workspace/impala-asf-master-core/repos/Impala/fe/src/main/java/org/apache/impala/util/KuduUtil.java:[442,45] > cannot find symbol > 12:40:33 [ERROR] symbol: variable DATE > 12:40:33 [ERROR] location: class org.apache.kudu.Type > 12:40:33 [ERROR] > /data0/jenkins/workspace/impala-asf-master-core/repos/Impala/fe/src/main/java/org/apache/impala/util/KuduUtil.java:[468,12] > an enum switch case label must be the unqualified name of an enumeration > constant > {noformat} > Likely related to this change: https://gerrit.cloudera.org/#/c/14705/ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-9191) Provide a way to build Impala with only one of Sentry / Ranger
[ https://issues.apache.org/jira/browse/IMPALA-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17059126#comment-17059126 ]

Joe McDonnell commented on IMPALA-9191:
---

I think it makes sense to split this up into multiple parts. The first part is to make it possible to run tests without Sentry. Here is what I think would be needed for that:
# Add an environment variable (something like DISABLE_SENTRY)
# Only start up Sentry (in testdata/bin/run-all.sh) if DISABLE_SENTRY is false
# Only run Sentry tests if DISABLE_SENTRY is false (through pytest skips and frontend test equivalents)
# Set DISABLE_SENTRY to true for USE_CDP_HIVE=true.

This would still build against Sentry. A second part of this would be to be able to build without Sentry. That is harder, so it can be postponed.

> Provide a way to build Impala with only one of Sentry / Ranger
> --
>
> Key: IMPALA-9191
> URL: https://issues.apache.org/jira/browse/IMPALA-9191
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 3.4.0
> Reporter: Joe McDonnell
> Assignee: Fang-Yu Rao
> Priority: Critical
>
> Deployments of Impala will use either Ranger or Sentry, and deployments would
> not switch back and forth between the two. It makes sense to provide a way to
> pick at compile time which one to include. This allows packagers of Impala to
> avoid a dependency for whichever authorization provider they don't need.
> In particular, compilation of the USE_CDP_HIVE=true side of Impala currently
> needs only a few things from the CDH_BUILD_NUMBER and one of them is Sentry. In
> the other direction, the only thing a USE_CDP_HIVE=false configuration uses
> from the CDP_BUILD_NUMBER is Ranger.
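The DISABLE_SENTRY proposal in the comment above could be wired into a Python test harness along these lines. This is a minimal sketch under stated assumptions: DISABLE_SENTRY is only the name suggested in the comment, the "true"/"false" string convention is assumed, and `sentry_disabled` is a hypothetical helper rather than existing Impala code.

```python
import os


def _is_true(value: str) -> bool:
    """Interpret an environment value the way shell scripts usually do."""
    return value.strip().lower() == "true"


def sentry_disabled(environ=None) -> bool:
    """Return True when Sentry should be neither started nor tested.

    Mirrors the proposed steps: an explicit DISABLE_SENTRY=true wins, and
    USE_CDP_HIVE=true implies DISABLE_SENTRY=true.
    """
    environ = os.environ if environ is None else environ
    if _is_true(environ.get("USE_CDP_HIVE", "false")):
        return True
    return _is_true(environ.get("DISABLE_SENTRY", "false"))


if __name__ == "__main__":
    print("Sentry disabled:", sentry_disabled())
```

In a pytest suite the result could feed a skip marker such as `pytest.mark.skipif(sentry_disabled(), reason="Sentry is disabled")`, so Sentry tests are skipped rather than failing when the service is not started.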
[jira] [Updated] (IMPALA-9505) CatalogdMetaProvider using bad TUnit for profile counters
[ https://issues.apache.org/jira/browse/IMPALA-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Rorke updated IMPALA-9505: Summary: CatalogdMetaProvider using bad TUnit for profile counters (was: CatalogMetaProvider using bad TUnit for profile counters) > CatalogdMetaProvider using bad TUnit for profile counters > - > > Key: IMPALA-9505 > URL: https://issues.apache.org/jira/browse/IMPALA-9505 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: David Rorke >Priority: Major > Labels: observability > > CatalogMetaProvider is using a TUnit value of TUnit.NONE for several runtime > profile counters: > [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java#L554-L558] > and > [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java#L387] > TUnit.UNIT would be a better choice here and the use of TUnit.NONE may break > existing profile readers which won't expect this value for a numerical > counter. The appropriate set of TUnit types to maximize reader compatibility > is described in IMPALA-8236. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-9505) CatalogdMetaProvider using bad TUnit for profile counters
[ https://issues.apache.org/jira/browse/IMPALA-9505?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] David Rorke updated IMPALA-9505: Description: CatalogdMetaProvider is using a TUnit value of TUnit.NONE for several runtime profile counters: [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java#L554-L558] and [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java#L387] TUnit.UNIT would be a better choice here and the use of TUnit.NONE may break existing profile readers which won't expect this value for a numerical counter. The appropriate set of TUnit types to maximize reader compatibility is described in IMPALA-8236. was: CatalogMetaProvider is using a TUnit value of TUnit.NONE for several runtime profile counters: [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java#L554-L558] and [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java#L387] TUnit.UNIT would be a better choice here and the use of TUnit.NONE may break existing profile readers which won't expect this value for a numerical counter. The appropriate set of TUnit types to maximize reader compatibility is described in IMPALA-8236. 
> CatalogdMetaProvider using bad TUnit for profile counters > - > > Key: IMPALA-9505 > URL: https://issues.apache.org/jira/browse/IMPALA-9505 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: David Rorke >Priority: Major > Labels: observability > > CatalogdMetaProvider is using a TUnit value of TUnit.NONE for several runtime > profile counters: > [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java#L554-L558] > and > [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java#L387] > TUnit.UNIT would be a better choice here and the use of TUnit.NONE may break > existing profile readers which won't expect this value for a numerical > counter. The appropriate set of TUnit types to maximize reader compatibility > is described in IMPALA-8236. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-9505) CatalogMetaProvider using bad TUnit for profile counters
David Rorke created IMPALA-9505: --- Summary: CatalogMetaProvider using bad TUnit for profile counters Key: IMPALA-9505 URL: https://issues.apache.org/jira/browse/IMPALA-9505 Project: IMPALA Issue Type: Bug Components: Frontend Reporter: David Rorke CatalogMetaProvider is using a TUnit value of TUnit.NONE for several runtime profile counters: [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java#L554-L558] and [https://github.com/apache/impala/blob/master/fe/src/main/java/org/apache/impala/catalog/local/CatalogdMetaProvider.java#L387] TUnit.UNIT would be a better choice here and the use of TUnit.NONE may break existing profile readers which won't expect this value for a numerical counter. The appropriate set of TUnit types to maximize reader compatibility is described in IMPALA-8236. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
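To illustrate why TUnit.NONE on a numerical counter can trip up a profile reader, here is a small sketch of a reader that only knows how to render a fixed set of units. The reader itself, its formatting, and the counter name are hypothetical; only the unit names UNIT and NONE come from the issue text, and BYTES and TIME_NS are included as assumed further examples.

```python
# Hypothetical formatters keyed by unit name; a real reader may differ.
FORMATTERS = {
    "UNIT": lambda v: f"{v}",
    "BYTES": lambda v: f"{v} B",
    "TIME_NS": lambda v: f"{v} ns",
}


def render_counter(name: str, value: int, unit: str) -> str:
    """Render a numeric counter, rejecting units the reader does not expect."""
    formatter = FORMATTERS.get(unit)
    if formatter is None:
        raise ValueError(f"unexpected unit {unit!r} for numeric counter {name!r}")
    return f"{name}: {formatter(value)}"


if __name__ == "__main__":
    print(render_counter("ExampleCounter", 3, "UNIT"))
    try:
        render_counter("ExampleCounter", 3, "NONE")
    except ValueError as err:
        print("reader failed:", err)
```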
[jira] [Assigned] (IMPALA-9191) Provide a way to build Impala with only one of Sentry / Ranger
[ https://issues.apache.org/jira/browse/IMPALA-9191?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fang-Yu Rao reassigned IMPALA-9191:
---

Assignee: Fang-Yu Rao

> Provide a way to build Impala with only one of Sentry / Ranger
> --
>
> Key: IMPALA-9191
> URL: https://issues.apache.org/jira/browse/IMPALA-9191
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Affects Versions: Impala 3.4.0
> Reporter: Joe McDonnell
> Assignee: Fang-Yu Rao
> Priority: Critical
>
> Deployments of Impala will use either Ranger or Sentry, and deployments would
> not switch back and forth between the two. It makes sense to provide a way to
> pick at compile time which one to include. This allows packagers of Impala to
> avoid a dependency for whichever authorization provider they don't need.
> In particular, compilation of the USE_CDP_HIVE=true side of Impala currently
> needs only a few things from the CDH_BUILD_NUMBER and one of them is Sentry. In
> the other direction, the only thing a USE_CDP_HIVE=false configuration uses
> from the CDP_BUILD_NUMBER is Ranger.
[jira] [Work started] (IMPALA-9494) Support displaying complex types
[ https://issues.apache.org/jira/browse/IMPALA-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on IMPALA-9494 started by Gabor Kaszab.

> Support displaying complex types
>
>
> Key: IMPALA-9494
> URL: https://issues.apache.org/jira/browse/IMPALA-9494
> Project: IMPALA
> Issue Type: Epic
> Components: Backend, Frontend
> Reporter: Gabor Kaszab
> Assignee: Gabor Kaszab
> Priority: Major
> Labels: complextype
>
> Currently, displaying complex types is not supported in Impala. There is a
> workaround to see the unnested content of the collections, but there is no
> functionality to display a complex type in a single result column formatted
> as Json. Note that Hive does support this.
> The scope of this Jira is to have an umbrella for the following tasks:
> - Support all complex types (Struct, Array, Map) in the SELECT list and
> display them as Json.
> - Display complex type columns in Json for a "SELECT *" query
> - Allow returning complex types in a sub-select
> - Have the display support for Parquet and ORC
> Note, I made the sub-tasks as small as possible so that this granularity
> allows the work to be parallelised.
[jira] [Work started] (IMPALA-9495) Allow Struct type in SELECT list for ORC tables
[ https://issues.apache.org/jira/browse/IMPALA-9495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Work on IMPALA-9495 started by Gabor Kaszab. > Allow Struct type in SELECT list for ORC tables > --- > > Key: IMPALA-9495 > URL: https://issues.apache.org/jira/browse/IMPALA-9495 > Project: IMPALA > Issue Type: New Feature > Components: Backend, Frontend >Reporter: Gabor Kaszab >Assignee: Gabor Kaszab >Priority: Major > Labels: complextype > > Output is expected in Json format, e.g.: > {"a":5,"b":"five"} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-9500) Allow returning complex types from a subselect
[ https://issues.apache.org/jira/browse/IMPALA-9500?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab updated IMPALA-9500: - Labels: complextype (was: ) > Allow returning complex types from a subselect > -- > > Key: IMPALA-9500 > URL: https://issues.apache.org/jira/browse/IMPALA-9500 > Project: IMPALA > Issue Type: New Feature >Reporter: Gabor Kaszab >Priority: Major > Labels: complextype > > Once the rest of the tasks are implemented from the same epic there is a > chance that there won't be anything to do with this issue. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-9498) Allow Collection types in SELECT list for Parquet tables
[ https://issues.apache.org/jira/browse/IMPALA-9498?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab updated IMPALA-9498: - Labels: complextype (was: ) > Allow Collection types in SELECT list for Parquet tables > > > Key: IMPALA-9498 > URL: https://issues.apache.org/jira/browse/IMPALA-9498 > Project: IMPALA > Issue Type: New Feature >Reporter: Gabor Kaszab >Priority: Major > Labels: complextype > > This covers collections: Array, Map > Expected printout format: > Array: [null,1,2,null,3,null] > Map: {"k1":2,"k2":null} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
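The expected printout formats quoted in these sub-tasks match compact JSON with no whitespace between tokens. As a rough illustration only (this is Python's standard `json` module, not Impala's implementation, and `render_complex` is a hypothetical name), the same strings can be produced like this:

```python
import json


def render_complex(value) -> str:
    """Serialize a nested value as compact JSON (no spaces after ',' or ':')."""
    return json.dumps(value, separators=(",", ":"))


if __name__ == "__main__":
    print(render_complex([None, 1, 2, None, 3, None]))  # Array example
    print(render_complex({"k1": 2, "k2": None}))        # Map example
    print(render_complex({"a": 5, "b": "five"}))        # Struct example
```

Note that NULL elements and NULL map values are printed as `null`, as in the formats given in the issue descriptions.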
[jira] [Updated] (IMPALA-9499) Display support for all complex types in a SELECT * query
[ https://issues.apache.org/jira/browse/IMPALA-9499?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab updated IMPALA-9499: - Labels: complextype (was: ) > Display support for all complex types in a SELECT * query > - > > Key: IMPALA-9499 > URL: https://issues.apache.org/jira/browse/IMPALA-9499 > Project: IMPALA > Issue Type: New Feature >Reporter: Gabor Kaszab >Priority: Major > Labels: complextype > > Covers all complex types (Struct, Array, Map) for both Parquet and ORC file > formats. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-9495) Allow Struct type in SELECT list for ORC tables
[ https://issues.apache.org/jira/browse/IMPALA-9495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab updated IMPALA-9495: - Labels: complextype (was: ) > Allow Struct type in SELECT list for ORC tables > --- > > Key: IMPALA-9495 > URL: https://issues.apache.org/jira/browse/IMPALA-9495 > Project: IMPALA > Issue Type: New Feature > Components: Backend, Frontend >Reporter: Gabor Kaszab >Assignee: Gabor Kaszab >Priority: Major > Labels: complextype > > Output is expected in Json format, e.g.: > {"a":5,"b":"five"} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-9497) Allow Collection types in SELECT list for ORC tables
[ https://issues.apache.org/jira/browse/IMPALA-9497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Gabor Kaszab updated IMPALA-9497:
-

Labels: complextype (was: )

> Allow Collection types in SELECT list for ORC tables
>
>
> Key: IMPALA-9497
> URL: https://issues.apache.org/jira/browse/IMPALA-9497
> Project: IMPALA
> Issue Type: New Feature
> Reporter: Gabor Kaszab
> Priority: Major
> Labels: complextype
>
> This covers collections: Array, Map
> Expected printout format:
> Array: [null,1,2,null,3,null]
> Map: {"k1":2,"k2":null}
[jira] [Updated] (IMPALA-9496) Allow Struct type in SELECT list for Parquet tables
[ https://issues.apache.org/jira/browse/IMPALA-9496?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab updated IMPALA-9496: - Labels: complextype (was: ) > Allow Struct type in SELECT list for Parquet tables > --- > > Key: IMPALA-9496 > URL: https://issues.apache.org/jira/browse/IMPALA-9496 > Project: IMPALA > Issue Type: New Feature >Reporter: Gabor Kaszab >Priority: Major > Labels: complextype > -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Work started] (IMPALA-9467) Impala Doc: Improve Impala shell usability by enabling live_progress in the interactive mode
[ https://issues.apache.org/jira/browse/IMPALA-9467?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on IMPALA-9467 started by Kris Hahn.
-

> Impala Doc: Improve Impala shell usability by enabling live_progress in the
> interactive mode
>
>
> Key: IMPALA-9467
> URL: https://issues.apache.org/jira/browse/IMPALA-9467
> Project: IMPALA
> Issue Type: Documentation
> Components: Clients
> Reporter: Alice Fan
> Assignee: Kris Hahn
> Priority: Major
>
> https://gerrit.cloudera.org/#/c/15219/
> The live_progress shell option is now enabled by default in interactive mode.
> Live reporting is not supported in non-interactive mode, so impala-shell
> disables live_progress when it detects that mode. The doc needs to be updated
> to reflect these changes.
[jira] [Created] (IMPALA-9504) DOC: Remove "experimental" from description of ORC support.
Kris Hahn created IMPALA-9504: - Summary: DOC: Remove "experimental" from description of ORC support. Key: IMPALA-9504 URL: https://issues.apache.org/jira/browse/IMPALA-9504 Project: IMPALA Issue Type: Documentation Reporter: Kris Hahn Assignee: Kris Hahn -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-9503) Expose 'healthz' endpoint for statestored and catalogd
Abhishek Rawat created IMPALA-9503: -- Summary: Expose 'healthz' endpoint for statestored and catalogd Key: IMPALA-9503 URL: https://issues.apache.org/jira/browse/IMPALA-9503 Project: IMPALA Issue Type: Task Reporter: Abhishek Rawat Assignee: Alice Fan IMPALA-8895 exposed the end points for impalads. It seems only coordinator and executors expose 'healthz' endpoint. It will be good to expose the endpoints on statestored and catalogd. {code:java} curl http://localhost:25010/healthz No URI handler for '/healthz' curl http://localhost:25020/healthz No URI handler for '/healthz'{code} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
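The curl checks in the issue above can be scripted as a small probe. This is a sketch only: the ports 25010 and 25020 are taken from the curl commands in the issue, while treating HTTP 200 as "healthy" and the helper names are assumptions.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def healthz_url(host: str, port: int) -> str:
    """Build the URL of a daemon's 'healthz' endpoint."""
    return f"http://{host}:{port}/healthz"


def http_status(url: str, timeout: float) -> int:
    """Return the HTTP status code for url, including error statuses."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except HTTPError as err:
        return err.code


def is_healthy(url: str, fetch=http_status, timeout: float = 2.0) -> bool:
    """Treat a 200 response as healthy; any other status or I/O error as not."""
    try:
        return fetch(url, timeout) == 200
    except OSError:  # URLError, timeouts, and refused connections derive from OSError
        return False


if __name__ == "__main__":
    for port in (25010, 25020):
        url = healthz_url("localhost", port)
        print(url, "healthy" if is_healthy(url) else "not healthy")
```

The `fetch` parameter exists so the check can be exercised without a running cluster, for example by passing a stub that returns a fixed status code.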
[jira] [Resolved] (IMPALA-9369) Inserts on large tables could be very slow when event processing it turned on
[ https://issues.apache.org/jira/browse/IMPALA-9369?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Anurag Mantripragada resolved IMPALA-9369.
--

Target Version: Impala 3.4.0
Resolution: Fixed

> Inserts on large tables could be very slow when event processing it turned on
> -
>
> Key: IMPALA-9369
> URL: https://issues.apache.org/jira/browse/IMPALA-9369
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Reporter: Vihang Karajgaonkar
> Assignee: Anurag Mantripragada
> Priority: Critical
>
> When a large number of files are being inserted into a table, the
> {{createInsertEvents}} method fires insert events to HMS for each partition
> one at a time. This could be very slow for an insert statement that
> adds hundreds or thousands of files.
> We should see if we can fire the insert events asynchronously instead of
> blocking the query from returning to the user until all the insert events are
> fired.
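The asynchronous approach suggested in the issue above can be sketched as follows. This is an illustration in Python rather than the Java of the actual catalog code, and it does not describe how the fix was implemented; `fire_event` stands in for whatever call sends one partition's insert event to HMS, and all names here are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def fire_insert_events_async(partitions, fire_event, executor):
    """Submit one insert event per partition and return immediately.

    The caller gets back futures it can check later, instead of waiting
    for every event to be fired before returning to the user.
    """
    return [executor.submit(fire_event, partition) for partition in partitions]


if __name__ == "__main__":
    def fake_fire_event(partition):
        # Placeholder for the per-partition call to the metastore.
        return f"fired insert event for {partition}"

    with ThreadPoolExecutor(max_workers=4) as pool:
        futures = fire_insert_events_async(["p=1", "p=2", "p=3"], fake_fire_event, pool)
        for future in futures:
            print(future.result())
```

One design consequence is that failures surface later, through the futures, so the caller needs some policy for logging or retrying events that fail after the query has already returned.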
[jira] [Commented] (IMPALA-2272) Parquet scanner always materializes NULL for empty collections
[ https://issues.apache.org/jira/browse/IMPALA-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058932#comment-17058932 ] Gabor Kaszab commented on IMPALA-2272: -- Thanks for drawing my attention to this ticket [~tarmstrong]. I'll put a label on this so that we can find these complex types related issues with one search. > Parquet scanner always materializes NULL for empty collections > -- > > Key: IMPALA-2272 > URL: https://issues.apache.org/jira/browse/IMPALA-2272 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 2.3.0 >Reporter: Skye Wanderman-Milne >Priority: Minor > Labels: complextype, nested_types > > Currently the Parquet scanner will always materialize a NULL slot for an > empty collection, rather than an empty ArrayValue/CollectionValue. It is not > currently possible to write a query that exposes this bug (i.e. it's not > possible to write a query that distinguishes between an empty and NULL > collection), but it will be once we add expressions that take collections as > input (e.g. "select array_column is null from tbl"). > We have this bug because the parquet scanner only looks at the repeated field > of an array, not the containing group field. To fix it, it will have to > consider the def/rep levels of both. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-2272) Parquet scanner always materializes NULL for empty collections
[ https://issues.apache.org/jira/browse/IMPALA-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab updated IMPALA-2272: - Labels: complextype nested_types (was: nested_types) > Parquet scanner always materializes NULL for empty collections > -- > > Key: IMPALA-2272 > URL: https://issues.apache.org/jira/browse/IMPALA-2272 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 2.3.0 >Reporter: Skye Wanderman-Milne >Priority: Minor > Labels: complextype, nested_types > > Currently the Parquet scanner will always materialize a NULL slot for an > empty collection, rather than an empty ArrayValue/CollectionValue. It is not > currently possible to write a query that exposes this bug (i.e. it's not > possible to write a query that distinguishes between an empty and NULL > collection), but it will be once we add expressions that take collections as > input (e.g. "select array_column is null from tbl"). > We have this bug because the parquet scanner only looks at the repeated field > of an array, not the containing group field. To fix it, it will have to > consider the def/rep levels of both. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
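The distinction between a NULL and an empty collection can be read off the definition levels once the containing group is taken into account. The sketch below is not the Impala scanner; it assumes the standard three-level Parquet LIST layout (optional outer group, repeated inner group, optional element), for which the maximum definition level is 3 and the maximum repetition level is 1.

```python
def assemble_lists(levels, values):
    """Rebuild one list (or None) per row from (rep_level, def_level) pairs.

    Assumed meaning of the definition level for this layout:
      0 -> the collection itself is NULL
      1 -> the collection exists but is empty
      2 -> an element slot exists but the element is NULL
      3 -> a non-NULL element (the only case that consumes a stored value)
    A repetition level of 0 starts a new row; 1 continues the current list.
    """
    remaining = iter(values)
    rows = []
    for rep_level, def_level in levels:
        if rep_level == 0:
            if def_level == 0:
                rows.append(None)   # NULL collection
                continue
            rows.append([])         # collection is defined
            if def_level == 1:
                continue            # ...but has no elements
        rows[-1].append(next(remaining) if def_level == 3 else None)
    return rows


if __name__ == "__main__":
    levels = [(0, 0), (0, 1), (0, 3), (1, 2), (1, 3)]
    print(assemble_lists(levels, [1, 3]))
```

A reader that only inspects the repeated field cannot tell definition level 0 from level 1, which is the ambiguity the issue describes.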
[jira] [Commented] (IMPALA-2939) Nested Types : Address Runtime & Scoped timer overhead
[ https://issues.apache.org/jira/browse/IMPALA-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058928#comment-17058928 ]

Gabor Kaszab commented on IMPALA-2939:
--

[~tarmstrong] Agree, a lot could have changed since this was opened. I'll put this in one of the complex types milestones and we'll re-measure once we get there.

> Nested Types : Address Runtime & Scoped timer overhead
> --
>
> Key: IMPALA-2939
> URL: https://issues.apache.org/jira/browse/IMPALA-2939
> Project: IMPALA
> Issue Type: Improvement
> Components: Backend
> Affects Versions: Impala 2.4.0
> Reporter: Mostafa Mokhtar
> Priority: Minor
> Labels: complextype, nested_types, performance
> Attachments: Bottom-Up-HotFunctions.csv, Top-Down-HotFunctions.csv,
> nestedTypesQ1.zip
>
>
> For the following query, about 45% of the time is spent updating timers and
> the RuntimeProfile and checking query state. Because nested types don't always
> operate on batches, the overhead of updating counters is amplified.
> {code}
> select
> l.l_shipdate, count(*) as wins
> from
> customer.c_orders o,
> o.o_lineitems l
> where
> o_orderdate = '1993-12-12'
> group by l.l_shipdate
> order by wins;
> {code}
> ||Function||Effective Time by Utilization||
> |clock_gettime (librt.so.1)|29.8%|
> |impala::RuntimeProfile::Counter::Add|5.3%|
> |std::map std::less, std::allocator impala::RuntimeProfile::Counter*>>>::operator[]|4.7%|
> |impala::RuntimeState::CheckQueryState|3.2%|
> |impala::MonotonicStopWatch::Stop|2.7%|
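The amplification described in the issue above comes from paying the counter-update cost once per row instead of once per batch. The following sketch shows the general amortization idea only; it is not Impala code, and the `Counter` class, function names, and batch size of 1024 are hypothetical.

```python
class Counter:
    """Stand-in for a profile counter that also records how often it is updated."""

    def __init__(self):
        self.value = 0
        self.updates = 0

    def add(self, delta: int) -> None:
        self.value += delta
        self.updates += 1


def count_per_row(rows, counter: Counter) -> None:
    """Update the shared counter once for every row (high overhead)."""
    for _ in rows:
        counter.add(1)


def count_per_batch(rows, counter: Counter, batch_size: int = 1024) -> None:
    """Accumulate locally and update the shared counter once per batch."""
    pending = 0
    for _ in rows:
        pending += 1
        if pending == batch_size:
            counter.add(pending)
            pending = 0
    if pending:
        counter.add(pending)


if __name__ == "__main__":
    per_row, per_batch = Counter(), Counter()
    count_per_row(range(2500), per_row)
    count_per_batch(range(2500), per_batch)
    print(per_row.value, per_row.updates)
    print(per_batch.value, per_batch.updates)
```

Both variants end with the same counter value; they differ only in how many times the comparatively expensive update path is taken.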
[jira] [Resolved] (IMPALA-4085) Create impala docker images from asf-gerrit/master instead of origin/cdh5-trunk
[ https://issues.apache.org/jira/browse/IMPALA-4085?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong resolved IMPALA-4085. --- Resolution: Won't Fix The docker images referred to weren't maintained by the apache impala project > Create impala docker images from asf-gerrit/master instead of > origin/cdh5-trunk > --- > > Key: IMPALA-4085 > URL: https://issues.apache.org/jira/browse/IMPALA-4085 > Project: IMPALA > Issue Type: Improvement > Components: Infrastructure >Affects Versions: Impala 2.7.0 >Reporter: Lars Volker >Priority: Minor > Labels: asf > > Currently the Impala docker images are built from the cdh5-trunk branch in > the Impala project on gerrit.cloudera.org > (https://gerrit.cloudera.org/#/q/project:Impala). > However, due to the switch to the Impala-ASF project, they should be built > from the master branch in the Impala-ASF project > (https://gerrit.cloudera.org/#/q/project:Impala-ASF). -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Resolved] (IMPALA-6073) No response from expr-codegen-test/expr-test
[ https://issues.apache.org/jira/browse/IMPALA-6073?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Tim Armstrong resolved IMPALA-6073.
---

Resolution: Cannot Reproduce

> No response from expr-codegen-test/expr-test
>
>
> Key: IMPALA-6073
> URL: https://issues.apache.org/jira/browse/IMPALA-6073
> Project: IMPALA
> Issue Type: Bug
> Components: Backend
> Affects Versions: Impala 2.9.0
> Environment: ubuntu 14.04.5 LTS
> Reporter: Jin Chul Kim
> Assignee: Philip Martin
> Priority: Major
>
> I checked out HEAD of the Impala repo. There are no code modifications in my
> local repo.
> After a full build, ./build/debug/exprs/expr-test and
> ./build/debug/exprs/expr-codegen-test seem to hang. The other gtests in be
> are working appropriately. Some threads are waiting and the other threads are
> sleeping.
> I guess this issue is similar to the deadlock:
> https://issues.apache.org/jira/browse/HDFS-11851
> Please let me know if you have any workaround.
> Here are stack traces on expr-codegen-test using GDB:
> {code:java}
> jinchulkim@ubuntu:~/workspace/Impala/be$ gdb ./build/debug/exprs/expr-codegen-test
> ...
> (gdb) info thread > Id Target Id Frame > 16 Thread 0x7fffe20ae700 (LWP 23374) "expr-codegen-te" > pthread_cond_timedwait@@GLIBC_2.3.2 () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 > 15 Thread 0x7fffe21af700 (LWP 23373) "expr-codegen-te" > pthread_cond_wait@@GLIBC_2.3.2 () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 > 14 Thread 0x7fffe22b0700 (LWP 23372) "expr-codegen-te" > pthread_cond_wait@@GLIBC_2.3.2 () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 > 13 Thread 0x7fffe23b1700 (LWP 23371) "expr-codegen-te" > pthread_cond_wait@@GLIBC_2.3.2 () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 > 12 Thread 0x7fffe24b2700 (LWP 23370) "expr-codegen-te" sem_wait () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/sem_wait.S:85 > 11 Thread 0x7fffe2c95700 (LWP 23369) "expr-codegen-te" > pthread_cond_wait@@GLIBC_2.3.2 () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 > 10 Thread 0x7fffe2d96700 (LWP 23368) "expr-codegen-te" > pthread_cond_wait@@GLIBC_2.3.2 () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 > 9Thread 0x7fffe2e97700 (LWP 23367) "expr-codegen-te" > pthread_cond_timedwait@@GLIBC_2.3.2 () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_timedwait.S:238 > 8Thread 0x7fffe915a700 (LWP 23366) "expr-codegen-te" > pthread_cond_wait@@GLIBC_2.3.2 () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 > 7Thread 0x7fffe925b700 (LWP 23365) "expr-codegen-te" > pthread_cond_wait@@GLIBC_2.3.2 () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 > 6Thread 0x7fffe935c700 (LWP 23364) "expr-codegen-te" > pthread_cond_wait@@GLIBC_2.3.2 () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 > 5Thread 0x7fffe945d700 (LWP 23363) "expr-codegen-te" > pthread_cond_wait@@GLIBC_2.3.2 () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/pthread_cond_wait.S:185 > 4Thread 0x7fffee536700 (LWP 23362) 
"expr-codegen-te" 0x7037bb9d > in nanosleep () at ../sysdeps/unix/syscall-template.S:81 > 3Thread 0x7fffeed37700 (LWP 23361) "expr-codegen-te" 0x7037bb9d > in nanosleep () at ../sysdeps/unix/syscall-template.S:81 > 2Thread 0x7fffef538700 (LWP 23356) "expr-codegen-te" 0x70067dfd > in nanosleep () at ../sysdeps/unix/syscall-template.S:81 > * 1Thread 0x7fffef5408c0 (LWP 23150) "expr-codegen-te" __lll_lock_wait () > at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 > (gdb) where > #0 __lll_lock_wait () at > ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135 > #1 0x70376649 in _L_lock_909 () from > /lib/x86_64-linux-gnu/libpthread.so.0 > #2 0x70376470 in __GI___pthread_mutex_lock (mutex=0x74080600 > ) at ../nptl/pthread_mutex_lock.c:79 > #3 0x73e7b666 in mutexLock (m=) at > /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/os/posix/mutexes.c:28 > #4 0x73e73a11 in setTLSExceptionStrings (rootCause=0x0, > stackTrace=0x0) at > /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/jni_helper.c:581 > #5 0x73e73393 in printExceptionAndFreeV (env=0x307c1d8, > exc=0x301adc0, noPrintFlags=, fmt=0x73e7bf6e > "loadFileSystems", ap=0x7fffaab0) at > /data/2/jenkins/workspace/impala-hadoop-dependency/hadoop/hadoop-hdfs-project/hadoop-hdfs/src/main/native/libhdfs/exception.c:183 > #6 0x73e735ef in
[jira] [Commented] (IMPALA-2939) Nested Types : Address Runtime & Scoped timer overhead
[ https://issues.apache.org/jira/browse/IMPALA-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058896#comment-17058896 ] Tim Armstrong commented on IMPALA-2939: --- [~gaborkaszab] We would probably want to re-profile these queries to find the current bottleneck before making targeted fixes. > Nested Types : Address Runtime & Scoped timer overhead > -- > > Key: IMPALA-2939 > URL: https://issues.apache.org/jira/browse/IMPALA-2939 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.4.0 >Reporter: Mostafa Mokhtar >Priority: Minor > Labels: complextype, nested_types, performance > Attachments: Bottom-Up-HotFunctions.csv, Top-Down-HotFunctions.csv, > nestedTypesQ1.zip > > > For the following query, about 45% of the time is spent updating timers and the > RuntimeProfile and checking query state. Because nested types don't always > operate on batches, the overhead of updating counters is amplified. > {code} > select > l.l_shipdate, count(*) as wins > from > customer.c_orders o, > o.o_lineitems l > where > o_orderdate = '1993-12-12' > group by l.l_shipdate > order by wins; > {code} > ||Function||Effective Time by Utilization|| > |clock_gettime (librt.so.1)|29.8%| > |impala::RuntimeProfile::Counter::Add|5.3%| > |std::map std::less, std::allocator impala::RuntimeProfile::Counter*>>>::operator[]|4.7%| > |impala::RuntimeState::CheckQueryState|3.2%| > |impala::MonotonicStopWatch::Stop|2.7%|
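Editor's illustration for IMPALA-2939: the amplification described above can be shown with a toy model. This is a sketch, not Impala code. `CountingClock` and `ScopedTimer` are hypothetical stand-ins for `clock_gettime` and a scoped timer, and the only thing measured is how many clock reads each batching strategy needs for the same number of rows.

```python
import time


class CountingClock:
    """Hypothetical clock wrapper that counts how often it is read."""

    def __init__(self):
        self.reads = 0

    def now(self):
        self.reads += 1
        return time.monotonic()


class ScopedTimer:
    """Hypothetical scoped timer: two clock reads per scope (start and stop)."""

    def __init__(self, clock):
        self.clock = clock
        self.total = 0.0

    def __enter__(self):
        self._start = self.clock.now()
        return self

    def __exit__(self, *exc_info):
        self.total += self.clock.now() - self._start
        return False


def clock_reads(num_rows, batch_size):
    """Return the number of clock reads needed to time num_rows in batches."""
    clock = CountingClock()
    timer = ScopedTimer(clock)
    for start in range(0, num_rows, batch_size):
        with timer:
            # Stand-in for processing one batch of rows.
            sum(range(start, min(start + batch_size, num_rows)))
    return clock.reads


per_row = clock_reads(num_rows=4096, batch_size=1)
per_batch = clock_reads(num_rows=4096, batch_size=1024)
print(per_row, per_batch)
```

With one row per timed scope the clock is read twice per row; with 1024-row batches the same work needs two reads per batch, which is why operators that do not work on full batches pay proportionally more for timers.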
[jira] [Updated] (IMPALA-2272) Parquet scanner always materializes NULL for empty collections
[ https://issues.apache.org/jira/browse/IMPALA-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Tim Armstrong updated IMPALA-2272: -- Labels: nested_types (was: ) > Parquet scanner always materializes NULL for empty collections > -- > > Key: IMPALA-2272 > URL: https://issues.apache.org/jira/browse/IMPALA-2272 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 2.3.0 >Reporter: Skye Wanderman-Milne >Priority: Minor > Labels: nested_types > > Currently the Parquet scanner will always materialize a NULL slot for an > empty collection, rather than an empty ArrayValue/CollectionValue. It is not > currently possible to write a query that exposes this bug (i.e. it's not > possible to write a query that distinguishes between an empty and NULL > collection), but it will be once we add expressions that take collections as > input (e.g. "select array_column is null from tbl"). > We have this bug because the parquet scanner only looks at the repeated field > of an array, not the containing group field. To fix it, it will have to > consider the def/rep levels of both.
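Editor's illustration for IMPALA-2272: the sketch below shows why the definition level of the containing group is needed to tell a NULL collection from an empty one. It is not the Impala scanner. It assumes the standard three-level Parquet list layout with an optional outer group and an optional element (maximum definition level 3), and `assemble` is a hypothetical helper name.

```python
# Assumed schema (illustrative only):
#   optional group c (LIST) { repeated group list { optional int32 element; } }
# Definition levels under that schema:
#   0 -> the list itself is NULL
#   1 -> the list is present but empty
#   2 -> a list entry exists but the element is NULL
#   3 -> a list entry exists and the element has a value


def assemble(def_levels, rep_levels, values):
    """Rebuild one Python value per row from def/rep levels and non-null values."""
    rows = []
    value_iter = iter(values)
    for def_level, rep_level in zip(def_levels, rep_levels):
        if rep_level == 0:
            # Repetition level 0 starts a new top-level row.
            if def_level == 0:
                rows.append(None)  # NULL collection
                continue
            rows.append([])
            if def_level == 1:
                continue  # empty collection, no entries to add
        # def_level >= 2: one entry in the current row's list.
        rows[-1].append(next(value_iter) if def_level == 3 else None)
    return rows


# Rows encoded below: NULL, [], [1, NULL], [7]
rows = assemble(
    def_levels=[0, 1, 3, 2, 3],
    rep_levels=[0, 0, 0, 1, 0],
    values=[1, 7],
)
print(rows)
```

A reader that only checks whether the repeated field is defined (level 2 or higher here) cannot separate levels 0 and 1, which matches the behaviour the issue describes.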
[jira] [Commented] (IMPALA-2272) Parquet scanner always materializes NULL for empty collections
[ https://issues.apache.org/jira/browse/IMPALA-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058895#comment-17058895 ] Tim Armstrong commented on IMPALA-2272: --- [~gaborkaszab] this is good to keep in mind. > Parquet scanner always materializes NULL for empty collections > -- > > Key: IMPALA-2272 > URL: https://issues.apache.org/jira/browse/IMPALA-2272 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 2.3.0 >Reporter: Skye Wanderman-Milne >Priority: Minor > Labels: nested_types > > Currently the Parquet scanner will always materialize a NULL slot for an > empty collection, rather than an empty ArrayValue/CollectionValue. It is not > currently possible to write a query that exposes this bug (i.e. it's not > possible to write a query that distinguishes between an empty and NULL > collection), but it will be once we add expressions that take collections as > input (e.g. "select array_column is null from tbl"). > We have this bug because the parquet scanner only looks at the repeated field > of an array, not the containing group field. To fix it, it will have to > consider the def/rep levels of both.
[jira] [Created] (IMPALA-9502) Avoid copying TExecRequest when retrying queries
Sahil Takiar created IMPALA-9502: Summary: Avoid copying TExecRequest when retrying queries Key: IMPALA-9502 URL: https://issues.apache.org/jira/browse/IMPALA-9502 Project: IMPALA Issue Type: Sub-task Reporter: Sahil Takiar There are a few issues that occur when re-using a {{TExecRequest}} across query retries. We should investigate if there is a way to work around those issues so that the {{TExecRequest}} does not need to be copied when retrying a query.
[jira] [Created] (IMPALA-9501) Upgrade sqlparse to a version that supports python 3.0
David Knupp created IMPALA-9501: --- Summary: Upgrade sqlparse to a version that supports python 3.0 Key: IMPALA-9501 URL: https://issues.apache.org/jira/browse/IMPALA-9501 Project: IMPALA Issue Type: Improvement Components: Infrastructure Reporter: David Knupp The current version (0.1.19) was selected, per IMPALA-6999, because it is the last version compatible with Python 2.6. However, it is not compatible with Python 3.x. {noformat} Traceback (most recent call last): File "/home/dknupp/Impala/shell/build/impala-shell-3.4.0-SNAPSHOT/impala_shell.py", line 37, in import sqlparse File "", line 983, in _find_and_load File "", line 967, in _find_and_load_unlocked File "", line 668, in _load_unlocked File "", line 638, in _load_backward_compatible File "/home/dknupp/Impala/shell/build/impala-shell-3.4.0-SNAPSHOT/ext-py/sqlparse-0.1.19-py2.7.egg/sqlparse/__init__.py", line 13, in File "", line 983, in _find_and_load File "", line 967, in _find_and_load_unlocked File "", line 668, in _load_unlocked File "", line 638, in _load_backward_compatible File "/home/dknupp/Impala/shell/build/impala-shell-3.4.0-SNAPSHOT/ext-py/sqlparse-0.1.19-py2.7.egg/sqlparse/engine/__init__.py", line 8, in File "", line 983, in _find_and_load File "", line 963, in _find_and_load_unlocked File "", line 906, in _find_spec File "", line 1280, in find_spec File "", line 1254, in _get_spec File "", line 1235, in _legacy_get_spec File "", line 441, in spec_from_loader File "", line 594, in spec_from_file_location File "/home/dknupp/Impala/shell/build/impala-shell-3.4.0-SNAPSHOT/ext-py/sqlparse-0.1.19-py2.7.egg/sqlparse/lexer.py", line 84 except Exception, err: ^ SyntaxError: invalid syntax {noformat}
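Editor's illustration for IMPALA-9501: the failing line in the traceback uses the Python 2 comma form of `except`. The sketch below is not taken from sqlparse. It only asks the running interpreter whether it can parse the comma form and the `as` form, which has been accepted since Python 2.6 and is the spelling that Python 3 expects. It reports the result instead of hard-coding one, because parser behaviour for the comma form depends on the interpreter version. `compiles` is a hypothetical helper name.

```python
import sys

# The spelling from the traceback (Python 2 style).
PY2_ONLY = "try:\n    pass\nexcept Exception, err:\n    pass\n"
# The spelling accepted by Python 2.6+ and Python 3.
PORTABLE = "try:\n    pass\nexcept Exception as err:\n    pass\n"


def compiles(source):
    """Return True if this interpreter can parse the given source text."""
    try:
        compile(source, "<snippet>", "exec")
        return True
    except SyntaxError:
        return False


py2_only_ok = compiles(PY2_ONLY)
portable_ok = compiles(PORTABLE)
print("interpreter:", sys.version_info[:2])
print("comma form parses:", py2_only_ok)
print("'as' form parses:", portable_ok)
```

A check like this only covers syntax. Whether a given newer sqlparse release also behaves correctly under Python 3 has to be verified separately.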
[jira] [Commented] (IMPALA-3841) Avoid materializing nested collections if top-level predicates already disqualify the row.
[ https://issues.apache.org/jira/browse/IMPALA-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058894#comment-17058894 ] Tim Armstrong commented on IMPALA-3841: --- There's also an opportunity to optimise this for any kind of column, i.e. evaluate predicates before materialising other columns. > Avoid materializing nested collections if top-level predicates already > disqualify the row. > -- > > Key: IMPALA-3841 > URL: https://issues.apache.org/jira/browse/IMPALA-3841 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.5.0, Impala 2.6.0 >Reporter: Alexander Behm >Priority: Minor > Labels: complextype, nested_types, parquet, performance > > Today, we fully materialize a row before evaluating the top-level conjuncts > when scanning Parquet. This includes materializing nested collections. We > should avoid materializing nested collections if top-level conjuncts already > discard the row. Our recent move to column-wise materialization makes this > improvement feasible (IMPALA-2736). > To illustrate the problem, consider this query: > {code} > select * from customer c, c.orders o where c.id = 10 > {code} > Even though we have a very selective predicate on the top-level customer, our > scanner will still fully materialize all orders of all customers. The > non-matches will be filtered, but we still pay the cost of materializing the > orders. > The proposed improvement is to avoid materializing the orders of > non-qualifying customers. > The improvement will require several things: > * Analyze and separate the top-level conjuncts into those that can be > evaluated before materializing the nested collections and those that require > nested collections to be materialized. In particular, we need to be careful > with our auto-generated !empty() predicates on nested collections. 
> * Add a new SkipValues() or similar interface to the Parquet column readers > to advance the scanner without actually materializing values. If possible, > we should skip entire blocks.
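Editor's illustration for IMPALA-3841: the proposal above can be sketched with a toy column-wise scan. This is not Impala code. `ColumnReader`, `read_value` and `skip_values` are hypothetical names modelled on the SkipValues() idea, and the readers work on in-memory lists rather than Parquet pages.

```python
class ColumnReader:
    """Hypothetical column reader over an in-memory list of values."""

    def __init__(self, values):
        self._values = values
        self._pos = 0
        self.materialized = 0  # how many values were actually produced

    def read_value(self):
        value = self._values[self._pos]
        self._pos += 1
        self.materialized += 1
        return value

    def skip_values(self, count):
        # Advance the read position without producing any values.
        self._pos += count


def scan(id_reader, orders_reader, num_rows, predicate):
    """Evaluate the top-level predicate first, then materialize orders if needed."""
    rows = []
    for _ in range(num_rows):
        customer_id = id_reader.read_value()
        if predicate(customer_id):
            rows.append((customer_id, orders_reader.read_value()))
        else:
            orders_reader.skip_values(1)
    return rows


ids = ColumnReader([8, 9, 10, 11])
orders = ColumnReader([["a"], ["b", "c"], ["d"], []])
result = scan(ids, orders, num_rows=4, predicate=lambda cid: cid == 10)
print(result, orders.materialized)
```

Only the orders of the single qualifying customer are materialized here. A real implementation would also have to keep the skipped column in step across pages and handle predicates that themselves need the nested collection, as the first bullet in the issue points out.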
[jira] [Closed] (IMPALA-2777) Support complex-typed expressions in the select list
[ https://issues.apache.org/jira/browse/IMPALA-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab closed IMPALA-2777. Fix Version/s: Not Applicable Resolution: Duplicate I created an umbrella Jira to cover display support for complex types. Let me close this as a duplicate and use the umbrella going forward. > Support complex-typed expressions in the select list > > > Key: IMPALA-2777 > URL: https://issues.apache.org/jira/browse/IMPALA-2777 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 2.3.0 >Reporter: Andrew Lam >Priority: Minor > Labels: complextype, planner, usability > Fix For: Not Applicable > > > We use `CREATE VIEW` in HIVE to strip out columns containing sensitive > information from tables, and grant access to these views to a subset of users. > Impala queries involving complex types work on the original table but not on > the view. > Is this expected?
[jira] [Created] (IMPALA-9500) Allow returning complex types from a subselect
Gabor Kaszab created IMPALA-9500: Summary: Allow returning complex types from a subselect Key: IMPALA-9500 URL: https://issues.apache.org/jira/browse/IMPALA-9500 Project: IMPALA Issue Type: New Feature Reporter: Gabor Kaszab Once the other tasks in the same epic are implemented, there may be nothing left to do for this issue.
[jira] [Created] (IMPALA-9499) Display support for all complex types in a SELECT * query
Gabor Kaszab created IMPALA-9499: Summary: Display support for all complex types in a SELECT * query Key: IMPALA-9499 URL: https://issues.apache.org/jira/browse/IMPALA-9499 Project: IMPALA Issue Type: New Feature Reporter: Gabor Kaszab Covers all complex types (Struct, Array, Map) for both Parquet and ORC file formats.
[jira] [Updated] (IMPALA-9497) Allow Collection types in SELECT list for ORC tables
[ https://issues.apache.org/jira/browse/IMPALA-9497?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab updated IMPALA-9497: - Description: This covers collections: Array, Map Expected printout format: Array: [null,1,2,null,3,null] Map: \{"k1":2,"k2":null} was: This covers collections: Array, Map Expected printout format: Array: [null,1,2,null,3,null] Map:{"k1":2,"k2":null} > Allow Collection types in SELECT list for ORC tables > > > Key: IMPALA-9497 > URL: https://issues.apache.org/jira/browse/IMPALA-9497 > Project: IMPALA > Issue Type: New Feature >Reporter: Gabor Kaszab >Priority: Major > > This covers collections: Array, Map > Expected printout format: > Array: [null,1,2,null,3,null] > Map: \{"k1":2,"k2":null} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Created] (IMPALA-9498) Allow Collection types in SELECT list for Parquet tables
Gabor Kaszab created IMPALA-9498: Summary: Allow Collection types in SELECT list for Parquet tables Key: IMPALA-9498 URL: https://issues.apache.org/jira/browse/IMPALA-9498 Project: IMPALA Issue Type: New Feature Reporter: Gabor Kaszab This covers collections: Array, Map Expected printout format: Array: [null,1,2,null,3,null] Map: {"k1":2,"k2":null}
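Editor's illustration for IMPALA-9498 and IMPALA-9497: the expected printout format is compact JSON with no whitespace after separators. The sketch below reproduces that textual form with Python's standard `json` module. It says nothing about how Impala will generate the output, and `format_collection` is a hypothetical helper name.

```python
import json


def format_collection(value):
    """Render a list or dict as compact JSON (no spaces after ',' or ':')."""
    return json.dumps(value, separators=(",", ":"))


array_text = format_collection([None, 1, 2, None, 3, None])
map_text = format_collection({"k1": 2, "k2": None})
print("Array:", array_text)
print("Map:", map_text)
```

The default `json.dumps` separators insert a space after each comma and colon, so the explicit `separators` argument is what yields the form shown in the issue.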
[jira] [Created] (IMPALA-9497) Allow Collection types in SELECT list for ORC tables
Gabor Kaszab created IMPALA-9497: Summary: Allow Collection types in SELECT list for ORC tables Key: IMPALA-9497 URL: https://issues.apache.org/jira/browse/IMPALA-9497 Project: IMPALA Issue Type: New Feature Reporter: Gabor Kaszab This covers collections: Array, Map Expected printout format: Array: [null,1,2,null,3,null] Map:{"k1":2,"k2":null}
[jira] [Created] (IMPALA-9496) Allow Struct type in SELECT list for Parquet tables
Gabor Kaszab created IMPALA-9496: Summary: Allow Struct type in SELECT list for Parquet tables Key: IMPALA-9496 URL: https://issues.apache.org/jira/browse/IMPALA-9496 Project: IMPALA Issue Type: New Feature Reporter: Gabor Kaszab
[jira] [Created] (IMPALA-9495) Allow Struct type in SELECT list for ORC tables
Gabor Kaszab created IMPALA-9495: Summary: Allow Struct type in SELECT list for ORC tables Key: IMPALA-9495 URL: https://issues.apache.org/jira/browse/IMPALA-9495 Project: IMPALA Issue Type: Bug Components: Backend, Frontend Reporter: Gabor Kaszab Assignee: Gabor Kaszab Output is expected in JSON format, e.g.: {"a":5,"b":"five"}
[jira] [Updated] (IMPALA-9495) Allow Struct type in SELECT list for ORC tables
[ https://issues.apache.org/jira/browse/IMPALA-9495?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab updated IMPALA-9495: - Issue Type: New Feature (was: Bug) > Allow Struct type in SELECT list for ORC tables > --- > > Key: IMPALA-9495 > URL: https://issues.apache.org/jira/browse/IMPALA-9495 > Project: IMPALA > Issue Type: New Feature > Components: Backend, Frontend >Reporter: Gabor Kaszab >Assignee: Gabor Kaszab >Priority: Major > > Output is expected in JSON format, e.g.: > {"a":5,"b":"five"}
[jira] [Commented] (IMPALA-8908) Bad error message when failing to connect to HTTPS endpoint with shell
[ https://issues.apache.org/jira/browse/IMPALA-8908?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058781#comment-17058781 ] Tamas Mate commented on IMPALA-8908: I started working on this recently and checked the commits required to apply THRIFT-3634, which is available from thrift-0.10. I have found the following commits that should be backported to the native-toolchain: {code:java} THRIFT-3634 Fix Python TSocket resource leak on connection failure THRIFT-3618 Python TSSLSocket deprecation message should print caller… THRIFT-3615 Fix Python SSL client resource leak on connection failure THRIFT-3599 Validate client IP address against cert's SubjectAltName THRIFT-2103 [python] Support for SSL certificates with Subject Altern… THRIFT-3596 Better conformance to PEP8 THRIFT-1857 Python 3 Support {code} I was able to build a native-toolchain with these patches applied, but the Impala build failed. Today I stumbled upon IMPALA-9489, which looks like it resolves this issue as well by using thrift-0.11 instead of thrift-0.9.3 when running impala-shell. [~dknupp], could you confirm whether my understanding is correct? > Bad error message when failing to connect to HTTPS endpoint with shell > -- > > Key: IMPALA-8908 > URL: https://issues.apache.org/jira/browse/IMPALA-8908 > Project: IMPALA > Issue Type: Bug > Components: Clients >Affects Versions: Impala 3.3.0 >Reporter: Tim Armstrong >Assignee: Tamas Mate >Priority: Critical > Labels: observability, ramp-up > > Legitimate connection errors get masked with an UnboundLocalError. It looks > like THRIFT-3634 fixed this. > {noformat} > $ impala-shell.sh -i ip-10-97-80-186.cloudera.site --protocol=hs2 --ldap > --user csso_tarmstrong --ssl > Starting Impala Shell using LDAP-based authentication > SSL is enabled. 
Impala server certificates will NOT be verified (set > --ca_cert to change) > LDAP password for csso_tarmstrong: > Traceback (most recent call last): > File "/home/tarmstrong/Impala/incubator-impala/shell/impala_shell.py", line > 1880, in > impala_shell_main() > File "/home/tarmstrong/Impala/incubator-impala/shell/impala_shell.py", line > 1841, in impala_shell_main > with ImpalaShell(options, query_options) as shell: > File "/home/tarmstrong/Impala/incubator-impala/shell/impala_shell.py", line > 243, in __init__ > self.do_connect(options.impalad) > File "/home/tarmstrong/Impala/incubator-impala/shell/impala_shell.py", line > 812, in do_connect > self._connect() > File "/home/tarmstrong/Impala/incubator-impala/shell/impala_shell.py", line > 860, in _connect > self.server_version, self.webserver_address = self.imp_client.connect() > File "/home/tarmstrong/Impala/incubator-impala/shell/impala_client.py", > line 176, in connect > self.transport = self._get_transport(self.client_connect_timeout_ms) > File "/home/tarmstrong/Impala/incubator-impala/shell/impala_client.py", > line 472, in _get_transport > transport.open() > File "/home/tarmstrong/Impala/incubator-impala/shell/thrift_sasl.py", line > 61, in open > self._trans.open() > File > "/opt/Impala-Toolchain/thrift-0.9.3-p7/python/lib/python2.7/site-packages/thrift/transport/TSSLSocket.py", > line 258, in open > logger.error('Error while connecting with %s.', ip_port, exc_info=True) > UnboundLocalError: local variable 'ip_port' referenced before assignment > {noformat}
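Editor's illustration for IMPALA-8908: the traceback ends in an error handler that references a local name which was never bound, so the original connection error is replaced by an UnboundLocalError. The sketch below reproduces that general pattern in isolation. It is not the Thrift source. `connect_buggy`, `connect_fixed` and `failing_resolve` are hypothetical names, and the fix shown (binding the name before the `try`) is one common remedy, not necessarily what THRIFT-3634 does.

```python
def connect_buggy(resolve):
    """ip_port is bound inside the try block, but the handler assumes it exists."""
    try:
        addresses = resolve()  # may raise before ip_port is bound
        ip_port = addresses[0]
        return "connected to %s:%d" % ip_port
    except OSError:
        # If resolve() failed, ip_port was never assigned and this line raises
        # UnboundLocalError, hiding the original connection error.
        return "Error while connecting with %s." % (ip_port,)


def connect_fixed(resolve):
    """Bind the name before the try block so the handler can always use it."""
    ip_port = None
    try:
        addresses = resolve()
        ip_port = addresses[0]
        return "connected to %s:%d" % ip_port
    except OSError as err:
        return "Error while connecting with %s: %s" % (ip_port, err)


def failing_resolve():
    """Stand-in for an address lookup or connect call that fails."""
    raise OSError("name resolution failed")


try:
    connect_buggy(failing_resolve)
    masked = False
except UnboundLocalError:
    masked = True

print("original error masked:", masked)
print(connect_fixed(failing_resolve))
```

With the name bound up front, the handler reports the real failure, which is the behaviour the issue asks for in impala-shell.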
[jira] [Updated] (IMPALA-9494) Support displaying complex types
[ https://issues.apache.org/jira/browse/IMPALA-9494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab updated IMPALA-9494: - Labels: complextype (was: ) > Support displaying complex types > > > Key: IMPALA-9494 > URL: https://issues.apache.org/jira/browse/IMPALA-9494 > Project: IMPALA > Issue Type: Epic > Components: Backend, Frontend >Reporter: Gabor Kaszab >Assignee: Gabor Kaszab >Priority: Major > Labels: complextype > > Displaying complex types is currently not supported in Impala. There is a > workaround to see the unnested content of collections, but there is no > way to display a complex type in a single result column formatted > as JSON. Note that Hive already supports this. > The scope of this Jira is to serve as an umbrella for the following tasks: > - Support all complex types (Struct, Array, Map) in the SELECT list and > display them as JSON. > - Display complex type columns as JSON for a "SELECT *" query. > - Allow returning complex types from a sub-select. > - Provide display support for both Parquet and ORC. > Note: I made the sub-tasks as small as possible so that the work can be > parallelised.
[jira] [Created] (IMPALA-9494) Support displaying complex types
Gabor Kaszab created IMPALA-9494: Summary: Support displaying complex types Key: IMPALA-9494 URL: https://issues.apache.org/jira/browse/IMPALA-9494 Project: IMPALA Issue Type: Epic Components: Backend, Frontend Reporter: Gabor Kaszab Assignee: Gabor Kaszab Displaying complex types is currently not supported in Impala. There is a workaround to see the unnested content of collections, but there is no way to display a complex type in a single result column formatted as JSON. Note that Hive already supports this. The scope of this Jira is to serve as an umbrella for the following tasks: - Support all complex types (Struct, Array, Map) in the SELECT list and display them as JSON. - Display complex type columns as JSON for a "SELECT *" query. - Allow returning complex types from a sub-select. - Provide display support for both Parquet and ORC. Note: I made the sub-tasks as small as possible so that the work can be parallelised.
[jira] [Updated] (IMPALA-2777) Support complex-typed expressions in the select list
[ https://issues.apache.org/jira/browse/IMPALA-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab updated IMPALA-2777: - Issue Type: Improvement (was: Epic) > Support complex-typed expressions in the select list > > > Key: IMPALA-2777 > URL: https://issues.apache.org/jira/browse/IMPALA-2777 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 2.3.0 >Reporter: Andrew Lam >Priority: Minor > Labels: complextype, planner, usability > > We use `CREATE VIEW` in HIVE to strip out columns containing sensitive > information from tables, and grant access to these views to a subset of users. > Impala queries involving complex types work on the original table but not on > the view. > Is this expected?
[jira] [Closed] (IMPALA-3111) Selecting collection types returns json output rather than analysis error
[ https://issues.apache.org/jira/browse/IMPALA-3111?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab closed IMPALA-3111. Fix Version/s: Not Applicable Resolution: Duplicate > Selecting collection types returns json output rather than analysis error > - > > Key: IMPALA-3111 > URL: https://issues.apache.org/jira/browse/IMPALA-3111 > Project: IMPALA > Issue Type: New Feature > Components: Frontend >Affects Versions: Impala 2.3.0 >Reporter: Matthew Jacobs >Priority: Minor > Labels: complextype > Fix For: Not Applicable > > > Today, an outermost select list may not return collection types-- Impala now > throws an AnalysisException. > E.g. > {code} > [localhost:21000] > select c_orders from tpch_nested_parquet.customer limit 1; > Query: select c_orders from tpch_nested_parquet.customer limit 1 > ERROR: AnalysisException: Expr 'c_orders' in select list returns a complex > type > 'ARRAY>>>'. > Only scalar types are allowed in the select list. > {code} > Instead, we should support returning collection types in the outermost select > list as json strings. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-2777) Support complex-typed expressions in the select list
[ https://issues.apache.org/jira/browse/IMPALA-2777?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab updated IMPALA-2777: - Issue Type: Epic (was: Improvement) > Support complex-typed expressions in the select list > > > Key: IMPALA-2777 > URL: https://issues.apache.org/jira/browse/IMPALA-2777 > Project: IMPALA > Issue Type: Epic > Components: Frontend >Affects Versions: Impala 2.3.0 >Reporter: Andrew Lam >Priority: Minor > Labels: complextype, planner, usability > > We use `CREATE VIEW` in HIVE to strip out columns containing sensitive > information from tables, and grant access to these views to a subset of users. > Impala queries involving complex types work on the original table but not on > the view. > Is this expected? -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-2603) Incorrect results and plan for inline view referencing several collection types correlated with different ancestor blocks
[ https://issues.apache.org/jira/browse/IMPALA-2603?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab updated IMPALA-2603: - Labels: complextype correctness crash downgraded nested_types query_generator (was: correctness crash downgraded nested_types query_generator) > Incorrect results and plan for inline view referencing several collection > types correlated with different ancestor blocks > - > > Key: IMPALA-2603 > URL: https://issues.apache.org/jira/browse/IMPALA-2603 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Affects Versions: Impala 2.3.0 >Reporter: Taras Bobrovytsky >Priority: Critical > Labels: complextype, correctness, crash, downgraded, > nested_types, query_generator > > *Problem* > Queries with multiple nested inline views that have correlated references to > nested collections (relative table references), can return incorrect results > in RELEASE and hit a DCHECK in debug under the following condition: > * There is an inline view that references multiple nested collections which > come from different ancestor blocks at different levels of nesting. > * In the example below, Impala fails to generate a correct plan for the "a" > inline view because the references "t5" and "t6" reference different ancestor > query blocks at different nesting levels. > Query: > {code} > SELECT > 1 > FROM > customer t1 > INNER JOIN ( > SELECT > 1 > FROM > t1.c_orders t2 > INNER JOIN ( > SELECT > 1 > FROM > t2.o_lineitems t5 > INNER JOIN t1.c_orders t6 >) as a > ) as b; > {code} > Wrong Query Plan: > {code} > ++ > | Explain String >| > ++ > | Estimated Per-Host Requirements: Memory=176.00MB VCores=1 >| > | WARNING: The following tables are missing relevant table and/or column > statistics. 
| > | tpch_nested_parquet.customer >| > | >| > | 05:EXCHANGE [UNPARTITIONED] >| > | | >| > | 01:SUBPLAN >| > | | >| > | |--04:NESTED LOOP JOIN [CROSS JOIN] >| > | | | >| > | | |--02:SINGULAR ROW SRC >| > | | | >| > | | 03:UNNEST [t1.c_orders t2] >| > | | >| > | 00:SCAN HDFS [tpch_nested_parquet.customer t1] >| > |partitions=1/1 files=4 size=554.13MB >| > ++ > {code} > Stack Trace: > {code} > #0 0x7f5c10cf5cc9 in __GI_raise (sig=sig@entry=6) at > ../nptl/sysdeps/unix/sysv/linux/raise.c:56 > #1 0x7f5c10cf90d8 in __GI_abort () at abort.c:89 > #2 0x02144d09 in google::DumpStackTraceAndExit () at > src/utilities.cc:147 > #3 0x0213ddbd in google::LogMessage::Fail () at src/logging.cc:1315 > #4 0x0213fc45 in google::LogMessage::SendToLog (this=0x7f5b9ef08e00) > at src/logging.cc:1269 > #5 0x0213d913 in google::LogMessage::Flush > (this=this@entry=0x7f5b9ef08e00) at src/logging.cc:1138 > #6 0x0214059e in google::LogMessageFatal::~LogMessageFatal > (this=0x7f5b9ef08e00, __in_chrg=) at src/logging.cc:1836 > #7 0x01586657 in impala::Coordinator::ValidateCollectionSlots > (this=0xc49ca00, batch=0xc3b3e00) at > /home/dev/Impala/be/src/runtime/coordinator.cc:911 > #8 0x0158638d in impala::Coordinator::GetNext (this=0xc49ca00, > batch=0x7dd3bd0, state=0xd934400) at > /home/dev/Impala/be/src/runtime/coordinator.cc:890 > #9 0x013710c3 in > impala::ImpalaServer::QueryExecState::FetchNextBatch (this=0x7dd2000) at > /home/dev/Impala/be/src/service/query-exec-state.cc:877 > #10
[jira] [Closed] (IMPALA-3310) Cannot reference outer columns from inline view over nested collection
[ https://issues.apache.org/jira/browse/IMPALA-3310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab closed IMPALA-3310. Fix Version/s: Not Applicable Resolution: Duplicate This issue seems to be the same as IMPALA-2777 hence closing this one as duplicate. > Cannot reference outer columns from inline view over nested collection > -- > > Key: IMPALA-3310 > URL: https://issues.apache.org/jira/browse/IMPALA-3310 > Project: IMPALA > Issue Type: Improvement > Components: Frontend >Affects Versions: Impala 2.5.0 >Reporter: Skye Wanderman-Milne >Priority: Minor > Labels: planner > Fix For: Not Applicable > > > It should be possible to run this query: > {noformat} > Query: explain select id, m from complextypestbl t, (select min(t.id + item) > m from t.int_array) v > ERROR: AnalysisException: Could not resolve column/field reference: 't.id' > {noformat} > Table schema: > {noformat} > id bigint > int_array array > {noformat} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Updated] (IMPALA-3841) Avoid materializing nested collections if top-level predicates already disqualify the row.
[ https://issues.apache.org/jira/browse/IMPALA-3841?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab updated IMPALA-3841: - Labels: complextype nested_types parquet performance (was: nested_types parquet performance) > Avoid materializing nested collections if top-level predicates already > disqualify the row. > -- > > Key: IMPALA-3841 > URL: https://issues.apache.org/jira/browse/IMPALA-3841 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.5.0, Impala 2.6.0 >Reporter: Alexander Behm >Priority: Minor > Labels: complextype, nested_types, parquet, performance > > Today, we fully materialize a row before evaluating the top-level conjuncts > when scanning Parquet. This includes materializing nested collections. We > should avoid materializing nested collections if top-level conjuncts already > discard the row. Our recent move to column-wise materialization makes this > improvement feasible (IMPALA-2736). > To illustrate the problem, consider this query: > {code} > select * from customer c, c.orders o where c.id = 10 > {code} > Even though we have a very selective predicate on the top-level customer, our > scanner will still fully materialize all orders of all customers. The > non-matches will be filtered, but we still pay the cost of materializing the > orders. > The proposed improvement is to avoid materializing the orders of > non-qualifying customers. > The improvement will require several things: > * Analyze and separate the top-level conjuncts into those that can be > evaluated before materializing the nested collections and those that require > nested collections to be materialized. In particular, we need to be careful > with our auto-generated !empty() predicates on nested collections. > * Add a new SkipValues() or similar interface to the Parquet column readers > to advance the scanner without actually materializing values. If possible, > we should skip entire blocks. 
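The idea in IMPALA-3841 can be sketched roughly as follows. This is a toy model in Python, not Impala's Parquet scanner: `RawCustomer`, `Scanner`, and the `early_filter` flag are hypothetical names made up for illustration, and the "decoding" step is a stand-in for real column materialization. It only covers conjuncts that depend on top-level columns; predicates such as the auto-generated !empty() ones, which need the collection itself, are out of scope here.

```python
from dataclasses import dataclass
from typing import Callable, Iterator, List, Tuple


@dataclass
class RawCustomer:
    """Hypothetical stored row: one top-level column plus an undecoded collection."""
    id: int
    encoded_orders: List[str]  # stand-in for encoded nested column values


class Scanner:
    """Toy scanner that counts how many nested values it materializes."""

    def __init__(self, rows: List[RawCustomer]) -> None:
        self.rows = rows
        self.values_materialized = 0

    def _materialize_orders(self, row: RawCustomer) -> List[str]:
        # Pretend decoding is the expensive part we would like to avoid.
        self.values_materialized += len(row.encoded_orders)
        return [o.upper() for o in row.encoded_orders]

    def scan(self, top_level_pred: Callable[[int], bool],
             early_filter: bool) -> Iterator[Tuple[int, List[str]]]:
        for row in self.rows:
            if early_filter and not top_level_pred(row.id):
                # Roughly what a SkipValues()-style call would allow:
                # move past the nested values without decoding them.
                continue
            orders = self._materialize_orders(row)
            if top_level_pred(row.id):
                yield row.id, orders


rows = [RawCustomer(i, ["a", "b"]) for i in range(100)]
late, early = Scanner(rows), Scanner(rows)
late_result = list(late.scan(lambda cid: cid == 10, early_filter=False))
early_result = list(early.scan(lambda cid: cid == 10, early_filter=True))
```

Both scans return the same rows; the difference shows up only in `values_materialized`, which is the cost the issue wants to avoid for non-qualifying customers.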
[jira] [Updated] (IMPALA-7471) Impala crashes or returns incorrect results when querying parquet nested types
[ https://issues.apache.org/jira/browse/IMPALA-7471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab updated IMPALA-7471: - Labels: complextype correctness crash parquet (was: correctness crash parquet) > Impala crashes or returns incorrect results when querying parquet nested types > -- > > Key: IMPALA-7471 > URL: https://issues.apache.org/jira/browse/IMPALA-7471 > Project: IMPALA > Issue Type: Bug > Components: Backend >Reporter: Tim Armstrong >Assignee: Csaba Ringhofer >Priority: Critical > Labels: complextype, correctness, crash, parquet > Attachments: test_users_131786401297925138_0.parquet > > > From > http://community.cloudera.com/t5/Interactive-Short-cycle-SQL/Impala-bug-with-nested-arrays-of-structures-where-some-of/m-p/78507/highlight/false#M4779 > {quote}We found a case where Impala returns incorrect values from simple > query. Our data contains nested array of structures and structures contains > other structures. > We generated minimal sample data allowing to reproduce the issue. > > SQL to create a table: > {quote} > {code} > CREATE TABLE plat_test.test_users ( > id INT, > name STRING, > devices ARRAY< > STRUCT< > id:STRING, > device_info:STRUCT< > model:STRING > > > > > > > ) > STORED AS PARQUET > {code} > {quote} > Please put attached parquet file to the location of the table and refresh the > table. > In sample data we have 2 users, one with 2 devices, second one with 3. Some > of the devices.device_info.model fields are NULL. > > When I issue a query: > {quote} > {code} > SELECT u.name, d.device_info.model as model > FROM test_users u, > u.devices d; > {code} > {quote} > I'm expecting to get 5 records in results, but getting only one1.png > If I change query to: > {quote} > {code} > SELECT u.name, d.device_info.model as model > FROM test_users u > LEFT OUTER JOIN u.devices d; > {code} > {quote} > I'm getting two records in the results, but still not as it should be. > We found some workaround to this problem. 
If we add to the result columns > device.id we will get all records from parquet file: > {quote} > {code} > SELECT u.name, d.id, d.device_info.model as model > FROM test_users u > , u.devices d > {code} > {quote} > And result is 3.png > > But we can't rely on this workaround, because we don't need device.id in all > queries and Impala optimizes it, and as a result we are getting unpredicted > results. > > I tested Hive query on this table and it returns expected results: > {quote} > {code} > SELECT u.name, d.device_info.model > FROM test_users u > lateral view outer inline (u.devices) d; > {code} > {quote} > results: > 4.png > Please advice if it's a problem in Impala engine or we did some mistake in > our query. > > Best regards, > Come2Play team. > {quote} -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
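For reference, the result the reporter expects from the join over u.devices can be modelled in a few lines of Python. The data below is a hypothetical stand-in for the attached Parquet file (two users with 2 and 3 devices, some models NULL); the actual names and values in the attachment are not known here, and `unnest_models` is an illustrative helper, not anything in Impala.

```python
from typing import List, Optional, Tuple

# Hypothetical sample data shaped like the table in the report.
users = [
    {"name": "user1", "devices": [
        {"id": "d1", "device_info": {"model": "m1"}},
        {"id": "d2", "device_info": {"model": None}},
    ]},
    {"name": "user2", "devices": [
        {"id": "d3", "device_info": {"model": None}},
        {"id": "d4", "device_info": {"model": "m4"}},
        {"id": "d5", "device_info": {"model": None}},
    ]},
]


def unnest_models(rows) -> List[Tuple[str, Optional[str]]]:
    """One output row per (user, device); a NULL model must not drop the row."""
    out = []
    for user in rows:
        for device in user["devices"]:
            out.append((user["name"], device["device_info"]["model"]))
    return out


result = unnest_models(users)
```

Under these assumptions the join yields five rows, three of them with a NULL model, regardless of whether d.id is also selected.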
[jira] [Commented] (IMPALA-9263) Support column masking for nested types
[ https://issues.apache.org/jira/browse/IMPALA-9263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058710#comment-17058710 ] Gabor Kaszab commented on IMPALA-9263: -- Hey [~stigahuang], I'm working on putting together a roadmap for complex type support and this Jira came to my radar. Is this going to be covered by the "Ranger column masking" efforts or should I put this on the complex type roadmap? > Support column masking for nested types > --- > > Key: IMPALA-9263 > URL: https://issues.apache.org/jira/browse/IMPALA-9263 > Project: IMPALA > Issue Type: New Feature >Reporter: Quanlong Huang >Priority: Major >
[jira] [Updated] (IMPALA-2490) RowsReturned profile counter may be wrong with nested types
[ https://issues.apache.org/jira/browse/IMPALA-2490?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab updated IMPALA-2490: - Labels: complextype debugging supportability usability (was: debugging supportability usability) > RowsReturned profile counter may be wrong with nested types > --- > > Key: IMPALA-2490 > URL: https://issues.apache.org/jira/browse/IMPALA-2490 > Project: IMPALA > Issue Type: Bug > Components: Backend >Affects Versions: Impala 2.3.0 >Reporter: Matthew Jacobs >Assignee: Abhishek Rawat >Priority: Major > Labels: complextype, debugging, supportability, usability > > We don't have a consistent way of accounting for rows returned in operators > between Close/Reset cycles. While we should have a way of determining the > total number of rows returned by an operator (we reset the counter in > ExecNode::Reset), there appear to be issues with the accounting in some > places. > When executing the following query, the rows returned by the NLJ operator > appear to be wrong: > {code} > create table test3 as select t1.field_102.field_104.field_107 c1 > FROM table_3 t1 > INNER JOIN t1.field_86 t2 > INNER JOIN t1.field_102.field_104.field_108.field_110 t3 > INNER JOIN table_5 t4 > WHERE > NOT EXISTS (SELECT > tt1.pos AS int_col > FROM t1.field_102.field_104.field_108.field_110 tt1 > CROSS JOIN t1.field_86 tt2 > WHERE > ((tt1.pos) IN (tt1.pos, -581.8)) AND (((t1.field_85) = (tt2.key)) AND > ((t1.field_82) = (tt2.value.field_94 > {code} > The # of rows inserted does not match the number of rows returned by its > child, the NLJ: > {code} > HdfsTableSink:(Total: 1m31s, non-child: 1m31s, % non-child: 100.00%) > - BytesWritten: 5.36 GB (5760571200) > - CompressTimer: 0ns > - EncodeTimer: 1m22s > - FilesCreated: 1 (1) > - FinalizePartitionFileTimer: 14.38ms > - HdfsWriteTimer: 8s058ms > - PartitionsCreated: 1 (1) > - PeakMemoryUsage: 50.00 KB (51200) > - RowsInserted: 615.57M (615574800) > - TmpFileCreateTimer: 14.754ms > 
NESTED_LOOP_JOIN_NODE (id=12):(Total: 1m33s, non-child: 1m31s, % > non-child: 98.01%) > - BuildRows: 600 (600) > - BuildTime: 32.750us > - PeakMemoryUsage: 4.09 MB (4284416) > - ProbeRows: 1.02K (1024) > - ProbeTime: 0ns > - RowsReturned: 1.14B (1136695648) > - RowsReturnedRate: 12.22 M/sec > {code} > The code used to increment/set the rows_returned_counter_ does not appear to > be correct. > There is no workaround. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
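One way to keep a lifetime total across Close/Reset cycles is sketched below. This is only an illustration of the accounting idea in Python; `RowsReturnedCounter` is a hypothetical class and does not reflect how ExecNode or rows_returned_counter_ are actually implemented.

```python
class RowsReturnedCounter:
    """Tracks rows for the current cycle and a total that survives reset()."""

    def __init__(self) -> None:
        self.current_cycle = 0
        self.completed_cycles_total = 0

    def add(self, num_rows: int) -> None:
        self.current_cycle += num_rows

    def reset(self) -> None:
        # Fold the finished cycle into the running total before clearing it,
        # so a reset never loses rows that were already returned.
        self.completed_cycles_total += self.current_cycle
        self.current_cycle = 0

    @property
    def total(self) -> int:
        return self.completed_cycles_total + self.current_cycle


counter = RowsReturnedCounter()
for rows_in_cycle in (600, 0, 424):  # three Reset cycles of a subplan operator
    counter.add(rows_in_cycle)
    counter.reset()
counter.add(5)  # rows returned in the cycle that is still open
```

With this split, a per-cycle limit check can use `current_cycle` while the profile reports `total`.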
[jira] [Updated] (IMPALA-2792) Syntactic sugar for computing aggregates over nested collections.
[ https://issues.apache.org/jira/browse/IMPALA-2792?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab updated IMPALA-2792: - Labels: complextype nested_types planner ramp-up usability (was: nested_types planner ramp-up usability) > Syntactic sugar for computing aggregates over nested collections. > - > > Key: IMPALA-2792 > URL: https://issues.apache.org/jira/browse/IMPALA-2792 > Project: IMPALA > Issue Type: New Feature > Components: Frontend >Affects Versions: Impala 2.3.0 >Reporter: Alexander Behm >Priority: Major > Labels: complextype, nested_types, planner, ramp-up, usability > > For user convenience and SQL brevity, we should add syntax extensions to > concisely express aggregates over nested collections. Internally, we should > rewrite the concise versions into the more verbose equivalent with a > correlated inline view. > Example A: > {code} > New syntax: > select count(c.orders) from customer c > Internally rewrite to: > select cnt from customer c, (select count(*) cnt from c.orders) v > {code} > Example B: > {code} > New syntax: > select avg(c.orders.items.price) from customer c > Internally rewrite to: > select a from customer c, (select avg(price) a from c.orders.items) v > {code} > I suggest performing the rewrite inside StmtRewriter.java after rewriting all > subqueries from the WHERE clause. > Similar syntactic improvements should be considered for analytic functions on > nested collections.
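The shape of the proposed rewrite can be sketched as a small string-level helper. This is a Python illustration only; the real rewrite would operate on the analyzed statement inside StmtRewriter.java, and `rewrite_nested_aggregate` with its parameters is a hypothetical name. Whether a path ends in a collection or in a scalar field has to come from the schema, so the sketch takes it as an explicit argument. It aliases the aggregate inside the inline view so that the outer select list can refer to it.

```python
def rewrite_nested_aggregate(func: str, path: str, table: str, table_alias: str,
                             path_is_collection: bool, out_alias: str) -> str:
    """Rewrite 'select func(path) from table alias' into a correlated inline view."""
    if path_is_collection:
        # e.g. count(c.orders): aggregate over the collection's rows.
        collection, arg = path, "*"
    else:
        # e.g. avg(c.orders.items.price): the last path element is a scalar field.
        collection, _, arg = path.rpartition(".")
    return (f"select {out_alias} from {table} {table_alias}, "
            f"(select {func}({arg}) {out_alias} from {collection}) v")


example_a = rewrite_nested_aggregate("count", "c.orders", "customer", "c", True, "cnt")
example_b = rewrite_nested_aggregate("avg", "c.orders.items.price", "customer", "c", False, "a")
```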
[jira] [Updated] (IMPALA-2939) Nested Types : Address Runtime & Scoped timer overhead
[ https://issues.apache.org/jira/browse/IMPALA-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Gabor Kaszab updated IMPALA-2939: - Labels: complextype nested_types performance (was: nested_types performance) > Nested Types : Address Runtime & Scoped timer overhead > -- > > Key: IMPALA-2939 > URL: https://issues.apache.org/jira/browse/IMPALA-2939 > Project: IMPALA > Issue Type: Improvement > Components: Backend >Affects Versions: Impala 2.4.0 >Reporter: Mostafa Mokhtar >Priority: Minor > Labels: complextype, nested_types, performance > Attachments: Bottom-Up-HotFunctions.csv, Top-Down-HotFunctions.csv, > nestedTypesQ1.zip > > > For the following query, about 45% of the time is spent updating timers and the > RuntimeProfile and checking query state. Since nested types don't always > operate on batches, the overhead of updating counters is amplified. > {code} > select > l.l_shipdate, count(*) as wins > from > customer.c_orders o, > o.o_lineitems l > where > o_orderdate = '1993-12-12' > group by l.l_shipdate > order by wins; > {code} > ||Function||Effective Time by Utilization|| > |clock_gettime (librt.so.1)|29.8%| > |impala::RuntimeProfile::Counter::Add|5.3%| > |std::map std::less, std::allocator impala::RuntimeProfile::Counter*>>>::operator[]|4.7%| > |impala::RuntimeState::CheckQueryState|3.2%| > |impala::MonotonicStopWatch::Stop|2.7%|
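The amplification described in IMPALA-2939 can be illustrated by counting clock reads. The sketch below is a toy model in Python, not Impala code: `CountingClock` and `process` are hypothetical, and the scoped timer is reduced to one clock read on entry and one on exit of each timed region.

```python
import time
from typing import List, Tuple


class CountingClock:
    """Wraps a monotonic clock and counts how often it is read."""

    def __init__(self) -> None:
        self.reads = 0

    def now(self) -> int:
        self.reads += 1
        return time.monotonic_ns()


def process(rows: List[int], batch_size: int, clock: CountingClock) -> Tuple[int, int]:
    """Sum the rows, timing each batch with a start/stop pair of clock reads."""
    total, elapsed_ns = 0, 0
    for start in range(0, len(rows), batch_size):
        t0 = clock.now()                      # timer start for this batch
        for value in rows[start:start + batch_size]:
            total += value
        elapsed_ns += clock.now() - t0        # timer stop for this batch
    return total, elapsed_ns


rows = list(range(1024))
per_row_clock, per_batch_clock = CountingClock(), CountingClock()
per_row_total, _ = process(rows, batch_size=1, clock=per_row_clock)
per_batch_total, _ = process(rows, batch_size=1024, clock=per_batch_clock)
```

The work done is identical in both runs; only the number of clock reads changes, which is why operators that effectively see one row at a time pay far more for the same timers.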
[jira] [Created] (IMPALA-9493) Docker-based tests fail on Centos 6 with NTP not started error
Laszlo Gaal created IMPALA-9493: --- Summary: Docker-based tests fail on Centos 6 with NTP not started error Key: IMPALA-9493 URL: https://issues.apache.org/jira/browse/IMPALA-9493 Project: IMPALA Issue Type: Bug Components: Infrastructure Affects Versions: Impala 3.4.0 Reporter: Laszlo Gaal Assignee: Laszlo Gaal When using docker/test-with-docker.py for increased test parallelism, the test run fails on CentOS 6 at minicluster startup time with {code} 2020-03-12 15:14:23.905791 Starting kms (Web UI - http://localhost:9600) 2020-03-12 15:14:28.987895 Waiting for ntpd to synchronize... ntpd is not running! 2020-03-12 15:14:28.988346 ntp-wait failed; cannot start kudu 2020-03-12 15:14:28.991141 ERROR in /home/impdev/Impala/testdata/cluster/admin at line 349: return 1 2020-03-12 15:14:29.019315 Generated: /home/impdev/Impala/logs/extra_junit_xml_logs/generate_junitxml.buildall.admin.20200312_22_14_29.xml 2020-03-12 15:14:29.022498 ERROR in /home/impdev/Impala/testdata/bin/run-mini-dfs.sh at line 42: $IMPALA_HOME/testdata/cluster/admin start_cluster 2020-03-12 15:14:29.049704 Generated: /home/impdev/Impala/logs/extra_junit_xml_logs/generate_junitxml.buildall.run-mini-dfs.20200312_22_14_29.xml 2020-03-12 15:14:29.107076 ERROR in /home/impdev/Impala/testdata/bin/run-all.sh at line 55: tee ${IMPALA_CLUSTER_LOGS_DIR}/run-mini-dfs.log 2020-03-12 15:14:29.133153 Generated: /home/impdev/Impala/logs/extra_junit_xml_logs/generate_junitxml.buildall.run-all.20200312_22_14_29.xml 2020-03-12 15:14:29.134047 + ret=1 2020-03-12 15:14:29.134071 + set +x 2020-03-12 15:14:29.134094 >>> build_impdev (1) (end) {code} The failure is consistently reproducible, blocking these tests in a centos6-based container. -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
[jira] [Commented] (IMPALA-9491) Compilation failure in KuduUtil.java
[ https://issues.apache.org/jira/browse/IMPALA-9491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058672#comment-17058672 ] Csaba Ringhofer commented on IMPALA-9491: - I think that our current solution is potentially flaky by design: a maven-metadata.xml's element is based on the timestamp of the build, not the timestamp of the last change in the repo. This means that if there is a newer build for an older version of kudu-client, mvn will prefer it over the newer version. Another issue I saw is that there is actually no maven-metadata.xml in the repository in the toolchain. Shouldn't we add the maven-metadata.xml + the checksums there? > Compilation failure in KuduUtil.java > > > Key: IMPALA-9491 > URL: https://issues.apache.org/jira/browse/IMPALA-9491 > Project: IMPALA > Issue Type: Bug > Components: Frontend >Reporter: David Rorke >Assignee: Csaba Ringhofer >Priority: Blocker > Labels: broken-build > > Build is failing with the following: > {noformat} > 12:40:33 [INFO] BUILD FAILURE > 12:40:33 [ERROR] Failed to execute goal > org.apache.maven.plugins:maven-compiler-plugin:3.3:compile (default-compile) > on project impala-frontend: Compilation failure: Compilation failure: > 12:40:33 [ERROR] > /data0/jenkins/workspace/impala-asf-master-core/repos/Impala/fe/src/main/java/org/apache/impala/util/KuduUtil.java:[181,12] > an enum switch case label must be the unqualified name of an enumeration > constant > 12:40:33 [ERROR] > /data0/jenkins/workspace/impala-asf-master-core/repos/Impala/fe/src/main/java/org/apache/impala/util/KuduUtil.java:[183,12] > cannot find symbol > 12:40:33 [ERROR] symbol: method addDate(int,java.sql.Date) > 12:40:33 [ERROR] location: variable key of type > org.apache.kudu.client.PartialRow > 12:40:33 [ERROR] > /data0/jenkins/workspace/impala-asf-master-core/repos/Impala/fe/src/main/java/org/apache/impala/util/KuduUtil.java:[239,12] > an enum switch case label must be the unqualified name of an enumeration > 
constant > 12:40:33 [ERROR] > /data0/jenkins/workspace/impala-asf-master-core/repos/Impala/fe/src/main/java/org/apache/impala/util/KuduUtil.java:[442,45] > cannot find symbol > 12:40:33 [ERROR] symbol: variable DATE > 12:40:33 [ERROR] location: class org.apache.kudu.Type > 12:40:33 [ERROR] > /data0/jenkins/workspace/impala-asf-master-core/repos/Impala/fe/src/main/java/org/apache/impala/util/KuduUtil.java:[468,12] > an enum switch case label must be the unqualified name of an enumeration > constant > {noformat} > Likely related to this change: https://gerrit.cloudera.org/#/c/14705/ -- This message was sent by Atlassian Jira (v8.3.4#803005) - To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org For additional commands, e-mail: issues-all-h...@impala.apache.org
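The flakiness described in the comment above can be modelled with a few lines of Python. This is a simplified model of "pick the most recently built artifact", not Maven's actual resolution logic; `SnapshotBuild`, its fields, and the sample hashes and timestamps are all hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SnapshotBuild:
    git_hash: str      # hypothetical commit the jar was built from
    commit_time: int   # when that source change landed in the repo
    build_time: int    # when the jar was built and published
    has_date_type: bool


def pick_by_build_time(builds: List[SnapshotBuild]) -> SnapshotBuild:
    """Models preferring whichever artifact was built last."""
    return max(builds, key=lambda b: b.build_time)


def pick_by_commit_time(builds: List[SnapshotBuild]) -> SnapshotBuild:
    """Models preferring the artifact built from the newest source."""
    return max(builds, key=lambda b: b.commit_time)


builds = [
    # Newer source (with DATE support), built first.
    SnapshotBuild("newhash", commit_time=200, build_time=1000, has_date_type=True),
    # Older source, rebuilt later under the same SNAPSHOT version.
    SnapshotBuild("oldhash", commit_time=100, build_time=2000, has_date_type=False),
]
chosen_by_build = pick_by_build_time(builds)
chosen_by_commit = pick_by_commit_time(builds)
```

In this model the later rebuild of older source wins when selection follows build time, which would produce exactly the kind of missing-symbol errors quoted above. Giving each toolchain build a distinct version, for example one that embeds the git hash, removes the ambiguity because the two jars no longer compete under one version name.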
[jira] [Commented] (IMPALA-9491) Compilation failure in KuduUtil.java
[ https://issues.apache.org/jira/browse/IMPALA-9491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058569#comment-17058569 ] Csaba Ringhofer commented on IMPALA-9491: - [~attilaj] It is possible that the cause is a more general issue related to IMPALA-9279 - we are not always picking up the newer Kudu jars from toolchain. The commit that led to build failures (Kudu Date support) seems the first time we use classes that are not in CDH Kudu client.
[jira] [Commented] (IMPALA-9491) Compilation failure in KuduUtil.java
[ https://issues.apache.org/jira/browse/IMPALA-9491?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17058475#comment-17058475 ] Quanlong Huang commented on IMPALA-9491: I hit this in my local dev env and resolved it by clearing local maven stuffs of Kudu: {code:bash} rm -rf ~/.m2/repository/org/apache/kudu/*{code}