[jira] [Commented] (IMPALA-9857) Batch ALTER_PARTITION events
[ https://issues.apache.org/jira/browse/IMPALA-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17424042#comment-17424042 ]

Vihang Karajgaonkar commented on IMPALA-9857:

IMPALA-10949 has been created as a follow-up. It can improve the batching logic significantly.

> Batch ALTER_PARTITION events
>
> Key: IMPALA-9857
> URL: https://issues.apache.org/jira/browse/IMPALA-9857
> Project: IMPALA
> Issue Type: Improvement
> Components: Catalog
> Reporter: Vihang Karajgaonkar
> Assignee: Vihang Karajgaonkar
> Priority: Major
>
> When Hive inserts data into partitioned tables, it generates many
> ALTER_PARTITION (and possibly INSERT_EVENT) events in quick succession.
> Currently, EventsProcessor processes such events one by one, which can be
> slow and can cause EventsProcessor to lag behind. This JIRA proposes
> batching these events: all successive ALTER_PARTITION events for the same
> table are combined into one ALTER_PARTITIONS event and processed together,
> refreshing all the partitions from those events at once. This can
> significantly speed up event processing in such cases.

--
This message was sent by Atlassian Jira
(v8.3.4#803005)

---------------------------------------------------------------------
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org
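The batching idea in IMPALA-9857 can be sketched as a single pass that coalesces *consecutive* ALTER_PARTITION events on the same table. This is a minimal Python illustration under stated assumptions, not the EventsProcessor code: the `Event` and `Batch` types, their fields, and `batch_events` are hypothetical simplifications of a metastore notification event.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    # Hypothetical, simplified metastore notification event.
    event_id: int
    event_type: str      # e.g. "ALTER_PARTITION", "ALTER_TABLE"
    db: str
    table: str
    partition: str = ""  # partition spec such as "p=1"; empty if not a partition event


@dataclass
class Batch:
    # A group of events that will be processed (refreshed) together.
    event_type: str
    db: str
    table: str
    events: List[Event] = field(default_factory=list)

    def partitions(self) -> List[str]:
        return [e.partition for e in self.events]


def batch_events(events: List[Event]) -> List[Batch]:
    """Coalesce successive ALTER_PARTITION events on the same table.

    Only adjacent events are merged, so the relative order of everything
    else (other event types, other tables) is preserved.
    """
    batches: List[Batch] = []
    for ev in events:
        last = batches[-1] if batches else None
        if (last is not None
                and ev.event_type == "ALTER_PARTITION"
                and last.event_type == "ALTER_PARTITION"
                and (last.db, last.table) == (ev.db, ev.table)):
            last.events.append(ev)  # same table: extend the current batch
        else:
            batches.append(Batch(ev.event_type, ev.db, ev.table, [ev]))
    return batches


events = [
    Event(1, "ALTER_PARTITION", "db", "t", "p=1"),
    Event(2, "ALTER_PARTITION", "db", "t", "p=2"),
    Event(3, "ALTER_PARTITION", "db", "u", "p=1"),
    Event(4, "ALTER_TABLE", "db", "t"),
    Event(5, "ALTER_PARTITION", "db", "t", "p=3"),
]
batches = batch_events(events)
```

Here events 1 and 2 collapse into one batch that refreshes p=1 and p=2 together. The ALTER_TABLE event in between keeps event 5 out of that batch, because merging across it could apply changes out of order.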
[jira] [Created] (IMPALA-10953) Impalad may crash when TmpFileMgr initialization fails and abort_on_config_error is set to false
Yida Wu created IMPALA-10953:

Summary: Impalad may crash when TmpFileMgr initialization fails and abort_on_config_error is set to false
Key: IMPALA-10953
URL: https://issues.apache.org/jira/browse/IMPALA-10953
Project: IMPALA
Issue Type: Bug
Affects Versions: Impala 4.0.0, Impala 3.0, Impala 4.1.0
Reporter: Yida Wu

The impalad can start up successfully even though TmpFileMgr failed to initialize, when abort_on_config_error is set to false.
[https://github.com/apache/impala/blob/d2f866f9a17c2d71fb3e3e731a2dfcce68d336d9/be/src/service/impala-server.cc#L440]
This can lead to a crash at runtime when a query needs to spill data and uses the incompletely initialized TmpFileMgr. There is a DCHECK guarding against the use of an incomplete TmpFileMgr, but DCHECKs have no effect in release builds. Instead of a DCHECK, it may be better to have a function return an error when the TmpFileMgr is incomplete, so that the query fails instead of the impalad crashing.
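The proposal (return an error rather than rely on a DCHECK) can be sketched as follows. This is a toy Python model, not Impala's C++ `TmpFileMgr`: the `Status` type, `init`, `new_scratch_file`, and `start_daemon` are hypothetical names chosen for illustration, and only the `abort_on_config_error` flag comes from the issue.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Status:
    # Hypothetical status object: ok=False carries an error message.
    ok: bool
    msg: str = ""


class TmpFileMgr:
    """Toy stand-in for a scratch-space manager (not Impala's class)."""

    def __init__(self) -> None:
        self.initialized = False
        self.dirs: List[str] = []

    def init(self, dirs: List[str]) -> Status:
        if not dirs:  # e.g. none of the configured scratch dirs was usable
            return Status(False, "no usable scratch directories")
        self.dirs = list(dirs)
        self.initialized = True
        return Status(True)

    def new_scratch_file(self, query_id: str) -> Tuple[Status, Optional[str]]:
        # Runtime check instead of a debug-only assertion: an incompletely
        # initialized manager produces an error for this query only.
        if not self.initialized:
            return Status(False, f"cannot spill query {query_id}: TmpFileMgr is not initialized"), None
        return Status(True), f"{self.dirs[0]}/{query_id}.tmp"


def start_daemon(mgr: TmpFileMgr, dirs: List[str], abort_on_config_error: bool) -> Status:
    status = mgr.init(dirs)
    if not status.ok and abort_on_config_error:
        raise SystemExit(f"aborting startup: {status.msg}")
    return status  # with abort_on_config_error=False, startup continues


# Startup succeeds despite the failed init; the spilling query gets an error.
broken = TmpFileMgr()
start_daemon(broken, [], abort_on_config_error=False)
status, path = broken.new_scratch_file("q1")
```

The design choice is that a bad scratch configuration degrades only the queries that actually need to spill, instead of taking down the whole daemon.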
[jira] [Assigned] (IMPALA-10953) Impalad may crash when TmpFileMgr initialization fails and abort_on_config_error is set to false
[ https://issues.apache.org/jira/browse/IMPALA-10953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yida Wu reassigned IMPALA-10953:

Assignee: Yida Wu

> Impalad may crash when TmpFileMgr initialization fails and
> abort_on_config_error is set to false
>
> Key: IMPALA-10953
> URL: https://issues.apache.org/jira/browse/IMPALA-10953
> Project: IMPALA
> Issue Type: Bug
> Affects Versions: Impala 3.0, Impala 4.0.0, Impala 4.1.0
> Reporter: Yida Wu
> Assignee: Yida Wu
> Priority: Major
>
> The impalad can start up successfully even though TmpFileMgr failed to
> initialize, when abort_on_config_error is set to false.
> [https://github.com/apache/impala/blob/d2f866f9a17c2d71fb3e3e731a2dfcce68d336d9/be/src/service/impala-server.cc#L440]
> This can lead to a crash at runtime when a query needs to spill data and
> uses the incompletely initialized TmpFileMgr. There is a DCHECK guarding
> against the use of an incomplete TmpFileMgr, but DCHECKs have no effect in
> release builds. Instead of a DCHECK, it may be better to have a function
> return an error when the TmpFileMgr is incomplete, so that the query fails
> instead of the impalad crashing.
[jira] [Work started] (IMPALA-10929) Optimise memory usage of structs in tuples
[ https://issues.apache.org/jira/browse/IMPALA-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Work on IMPALA-10929 started by Daniel Becker.

> Optimise memory usage of structs in tuples
>
> Key: IMPALA-10929
> URL: https://issues.apache.org/jira/browse/IMPALA-10929
> Project: IMPALA
> Issue Type: Improvement
> Components: Frontend
> Reporter: Daniel Becker
> Assignee: Daniel Becker
> Priority: Major
>
> If we have both a whole struct and one of its members (or a member of a
> member, etc.) in the select list, the whole struct and the member are
> assigned to different slots in the tuple. We could use less memory if the
> member expression used the slot within the whole struct instead.
>
> Example:
> For the query
> {code:sql}
> select id, outer_struct from functional_orc_def.complextypes_nested_structs;
> {code}
> the row size is 64B, while for
> {code:sql}
> select id, outer_struct, outer_struct.inner_struct2 from
> functional_orc_def.complextypes_nested_structs;
> {code}
> it is 80B, although it should not need more memory.
>
> This is not limited to the select list; it should also work with WHERE
> clauses etc. For example,
> {code:sql}
> select id, outer_struct from functional_orc_def.complextypes_nested_structs
> where outer_struct.inner_struct2.i > 1;
> {code}
> should also have a row size of 64B instead of 68B.
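The slot-sharing rule described above can be sketched as: a referenced path reuses the slot of its shortest referenced ancestor, and is read from inside that slot. This is a hypothetical Python model over dotted path strings, not Impala's frontend (which works with slot and tuple descriptors); `assign_slots` is an illustrative name, and the sketch does not compute row sizes.

```python
from typing import Dict, List, Tuple


def assign_slots(paths: List[str]) -> Dict[str, Tuple[str, str]]:
    """Map each referenced path to (slot path, path inside that slot).

    A path reuses the slot of its shortest referenced ancestor, so a struct
    member never gets its own slot when the whole struct is materialized.
    """
    referenced = {tuple(p.split(".")) for p in paths}
    result: Dict[str, Tuple[str, str]] = {}
    for p in paths:
        parts = tuple(p.split("."))
        # The path itself is in `referenced`, so this always finds an anchor.
        anchor = next(parts[:i] for i in range(1, len(parts) + 1)
                      if parts[:i] in referenced)
        result[p] = (".".join(anchor), ".".join(parts[len(anchor):]))
    return result


# Paths from the second example query in the issue.
slots = assign_slots(["id", "outer_struct", "outer_struct.inner_struct2"])
```

With this rule the second example query needs only the `id` and `outer_struct` slots, and the WHERE-clause example maps `outer_struct.inner_struct2.i` into the `outer_struct` slot as well.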
[jira] [Resolved] (IMPALA-10935) Impala crashes on old Iceberg table property in some cases
[ https://issues.apache.org/jira/browse/IMPALA-10935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Zoltán Borók-Nagy resolved IMPALA-10935.

Fix Version/s: Impala 4.1.0
Resolution: Fixed

> Impala crashes on old Iceberg table property in some cases
>
> Key: IMPALA-10935
> URL: https://issues.apache.org/jira/browse/IMPALA-10935
> Project: IMPALA
> Issue Type: Bug
> Reporter: Zoltán Borók-Nagy
> Assignee: Zoltán Borók-Nagy
> Priority: Major
> Fix For: Impala 4.1.0
>
> Impala crashes when all of the following conditions are true:
> * local catalog mode is being used
> * an Iceberg table is being queried
> * the 'iceberg.file_format' table property is set instead of
>   'write.format.default'
> * the file format is ORC
> * the query is select count(*) from t;
> When these conditions are met, Impala wrongly assumes that PARQUET is being
> used and tries to apply the count star optimization. That optimization is
> not implemented for the ORC scanner, so the scanner crashes.
[jira] [Commented] (IMPALA-10935) Impala crashes on old Iceberg table property in some cases
[ https://issues.apache.org/jira/browse/IMPALA-10935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423858#comment-17423858 ]

ASF subversion and git services commented on IMPALA-10935:

Commit d2f866f9a17c2d71fb3e3e731a2dfcce68d336d9 in impala's branch refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d2f866f ]

IMPALA-10935: Impala crashes on old Iceberg table property

With IMPALA-10627 we switched to using the standard Iceberg table
properties: https://iceberg.apache.org/configuration/
E.g. we switched from 'iceberg.file_format' to 'write.format.default'.
For backward compatibility we also support 'iceberg.file_format'.
However, this support is not perfect: it causes a crash in some cases.

Impala crashes when the following conditions are met:
* local catalog mode is being used
* an Iceberg table is being queried
* the data file format is ORC
* 'iceberg.file_format' is set instead of the 'write.format.default'
  table property
* the query is "select count(*) from t;"

Impala wrongly assumes that PARQUET is being used and tries to apply the
count star optimization. That optimization is not implemented for the ORC
scanner, so the scanner crashes.

This patch fixes the wrong assumption. It also fixes HdfsOrcScanner so
that it raises an error instead of crashing in release builds.

This patch also enables UNSETting the file format table property for
Iceberg tables. This table property could already be modified (i.e. its
value changed via SET TBLPROPERTIES).
Testing:
* added e2e test for the above conditions

Change-Id: Iafd9baef1c124d7356a14ba24c571567629a5e50
Reviewed-on: http://gerrit.cloudera.org:8080/17877
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

> Impala crashes on old Iceberg table property in some cases
>
> Key: IMPALA-10935
> URL: https://issues.apache.org/jira/browse/IMPALA-10935
> Project: IMPALA
> Issue Type: Bug
> Reporter: Zoltán Borók-Nagy
> Assignee: Zoltán Borók-Nagy
> Priority: Major
>
> Impala crashes when all of the following conditions are true:
> * local catalog mode is being used
> * an Iceberg table is being queried
> * the 'iceberg.file_format' table property is set instead of
>   'write.format.default'
> * the file format is ORC
> * the query is select count(*) from t;
> When these conditions are met, Impala wrongly assumes that PARQUET is being
> used and tries to apply the count star optimization. That optimization is
> not implemented for the ORC scanner, so the scanner crashes.
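The corrected assumption can be sketched as resolving the file format from the table properties, honouring the legacy key, before deciding whether the Parquet-only count(*) optimization applies. This is a minimal Python illustration, not the patch itself: `resolve_file_format` and `can_apply_parquet_count_star` are hypothetical helpers, and the precedence when both properties are set is an assumption. Falling back to Parquet when neither property is set matches Iceberg's own default for 'write.format.default'.

```python
from typing import Dict

STANDARD_KEY = "write.format.default"  # standard Iceberg table property
LEGACY_KEY = "iceberg.file_format"     # older property kept for compatibility


def resolve_file_format(props: Dict[str, str]) -> str:
    """Return the table's data file format, honouring the legacy property.

    Assumption: the standard key wins if both are set; Parquet (Iceberg's
    default) is used only if neither is set.
    """
    fmt = props.get(STANDARD_KEY) or props.get(LEGACY_KEY) or "parquet"
    return fmt.upper()


def can_apply_parquet_count_star(props: Dict[str, str]) -> bool:
    # In this sketch only Parquet tables get the count(*) optimization;
    # an ORC table must take the regular scan path instead.
    return resolve_file_format(props) == "PARQUET"
```

The crashing case in the issue corresponds to `{"iceberg.file_format": "orc"}`: a resolver that ignored the legacy key would fall through to Parquet and wrongly enable the optimization.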
[jira] [Commented] (IMPALA-10627) Use standard Iceberg table properties
[ https://issues.apache.org/jira/browse/IMPALA-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423859#comment-17423859 ]

ASF subversion and git services commented on IMPALA-10627:

Commit d2f866f9a17c2d71fb3e3e731a2dfcce68d336d9 in impala's branch refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d2f866f ]

IMPALA-10935: Impala crashes on old Iceberg table property

With IMPALA-10627 we switched to using the standard Iceberg table
properties: https://iceberg.apache.org/configuration/
E.g. we switched from 'iceberg.file_format' to 'write.format.default'.
For backward compatibility we also support 'iceberg.file_format'.
However, this support is not perfect: it causes a crash in some cases.

Impala crashes when the following conditions are met:
* local catalog mode is being used
* an Iceberg table is being queried
* the data file format is ORC
* 'iceberg.file_format' is set instead of the 'write.format.default'
  table property
* the query is "select count(*) from t;"

Impala wrongly assumes that PARQUET is being used and tries to apply the
count star optimization. That optimization is not implemented for the ORC
scanner, so the scanner crashes.

This patch fixes the wrong assumption. It also fixes HdfsOrcScanner so
that it raises an error instead of crashing in release builds.

This patch also enables UNSETting the file format table property for
Iceberg tables. This table property could already be modified (i.e. its
value changed via SET TBLPROPERTIES).
Testing:
* added e2e test for the above conditions

Change-Id: Iafd9baef1c124d7356a14ba24c571567629a5e50
Reviewed-on: http://gerrit.cloudera.org:8080/17877
Reviewed-by: Impala Public Jenkins
Tested-by: Impala Public Jenkins

> Use standard Iceberg table properties
>
> Key: IMPALA-10627
> URL: https://issues.apache.org/jira/browse/IMPALA-10627
> Project: IMPALA
> Issue Type: Bug
> Reporter: Zoltán Borók-Nagy
> Assignee: Attila Jeges
> Priority: Major
> Labels: impala-iceberg
> Fix For: Impala 4.1.0
>
> Iceberg lists the following properties:
> [https://iceberg.apache.org/configuration/]
> We should also use these properties where possible, e.g. write.format.default,
> write.<format>.compression-codec
> Currently Impala uses the table property 'iceberg.file_format' to determine
> the data file format for reads and writes. In the future, read operations
> should detect the file formats automatically (IMPALA-10610), but writes
> should use 'write.format.default'.