[jira] [Commented] (IMPALA-9857) Batch ALTER_PARTITION events

2021-10-04 Thread Vihang Karajgaonkar (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-9857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17424042#comment-17424042
 ] 

Vihang Karajgaonkar commented on IMPALA-9857:
-

IMPALA-10949 is created as a follow-up which can improve the batching logic 
significantly.

> Batch ALTER_PARTITION events
> 
>
> Key: IMPALA-9857
> URL: https://issues.apache.org/jira/browse/IMPALA-9857
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Catalog
>Reporter: Vihang Karajgaonkar
>Assignee: Vihang Karajgaonkar
>Priority: Major
>
> When Hive inserts data into partitioned tables, it generates a lot of 
> ALTER_PARTITION (and possibly INSERT_EVENT) events in quick succession. 
> Currently, such events are processed one by one by EventsProcessor, which 
> can be slow and can cause EventsProcessor to lag behind. This JIRA proposes 
> to batch such ALTER_PARTITION events, so that all successive 
> ALTER_PARTITION events for the same table are combined into one 
> ALTER_PARTITIONS event and processed together to refresh all the 
> partitions named in the events. This can significantly speed up event 
> processing in such cases.
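
The batching idea described above can be sketched roughly as follows. This is only an illustration, not Impala's EventsProcessor code: the Event class, its fields, and batch_events are hypothetical names, and the sketch assumes a batch is broken by any event that is not an ALTER_PARTITION on the same table.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Event:
    # Hypothetical event record; field names are assumptions for this sketch.
    event_id: int
    event_type: str  # e.g. "ALTER_PARTITION" or "INSERT_EVENT"
    table: str       # fully qualified table name
    partitions: List[str] = field(default_factory=list)


def batch_events(events: List[Event]) -> List[Event]:
    """Merge runs of consecutive ALTER_PARTITION events on the same table."""
    batched: List[Event] = []
    for ev in events:
        last = batched[-1] if batched else None
        if (last is not None
                and ev.event_type == "ALTER_PARTITION"
                and last.event_type in ("ALTER_PARTITION", "ALTER_PARTITIONS")
                and last.table == ev.table):
            # Extend the current batch; keep the id of the newest event in it.
            batched[-1] = Event(ev.event_id, "ALTER_PARTITIONS", ev.table,
                                last.partitions + ev.partitions)
        else:
            # Different table or different event type: start a new entry.
            batched.append(Event(ev.event_id, ev.event_type, ev.table,
                                 list(ev.partitions)))
    return batched
```

With this shape, the consumer refreshes all partitions of one ALTER_PARTITIONS entry in a single pass instead of once per event.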



--
This message was sent by Atlassian Jira
(v8.3.4#803005)

-
To unsubscribe, e-mail: issues-all-unsubscr...@impala.apache.org
For additional commands, e-mail: issues-all-h...@impala.apache.org



[jira] [Created] (IMPALA-10953) Impalad may crash when TmpFileMgr initialization fails and abort_on_config_error is set false

2021-10-04 Thread Yida Wu (Jira)
Yida Wu created IMPALA-10953:


 Summary: Impalad may crash when TmpFileMgr initialization fails 
and abort_on_config_error is set false
 Key: IMPALA-10953
 URL: https://issues.apache.org/jira/browse/IMPALA-10953
 Project: IMPALA
  Issue Type: Bug
Affects Versions: Impala 4.0.0, Impala 3.0, Impala 4.1.0
Reporter: Yida Wu


The impalad can start up successfully even if TmpFileMgr initialization fails, 
as long as abort_on_config_error is set to false.

[https://github.com/apache/impala/blob/d2f866f9a17c2d71fb3e3e731a2dfcce68d336d9/be/src/service/impala-server.cc#L440]

This can lead to a crash at runtime when a query needs to spill data and uses 
the incompletely initialized TmpFileMgr. There is a DCHECK guarding against the 
use of an incomplete TmpFileMgr, but DCHECKs are compiled out of release builds. 
Instead of a DCHECK, it may be better to have a function that returns an error 
when an incomplete TmpFileMgr is used, so that we can fail the query instead of 
crashing the impalad.
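
A minimal sketch of the proposed direction, written in Python for brevity rather than Impala's C++. ScratchManager, init, reserve, and run_spilling_query are hypothetical names, not Impala's TmpFileMgr API. The point is that the "not initialized" condition is checked at runtime and reported as an error, rather than guarded by a debug-only assertion.

```python
from typing import List, Optional, Tuple


class ScratchManager:
    """Hypothetical stand-in for a temporary-file manager (not Impala's API)."""

    def __init__(self) -> None:
        self._dirs: List[str] = []
        self._initialized = False

    def init(self, dirs: List[str]) -> Optional[str]:
        """Return None on success, or an error message on failure."""
        if not dirs:
            return "no usable scratch directories"
        self._dirs = list(dirs)
        self._initialized = True
        return None

    def reserve(self, num_bytes: int) -> Tuple[Optional[str], Optional[str]]:
        """Return (directory, error); exactly one of the two is None."""
        # Explicit runtime check: unlike a debug-only assertion, this is
        # still evaluated in an optimized/release build.
        if not self._initialized:
            return None, "scratch manager not initialized; cannot spill"
        return self._dirs[0], None


def run_spilling_query(mgr: ScratchManager, num_bytes: int) -> str:
    """Fail only this query if scratch space is unavailable."""
    directory, err = mgr.reserve(num_bytes)
    if err is not None:
        return f"query failed: {err}"
    return f"spilled {num_bytes} bytes to {directory}"
```

In this sketch a failed init() can be tolerated at startup (the abort_on_config_error=false case), and a later spilling query gets an error back while the process keeps running.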






[jira] [Assigned] (IMPALA-10953) Impalad may crash when TmpFileMgr initialization fails and abort_on_config_error is set false

2021-10-04 Thread Yida Wu (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10953?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Yida Wu reassigned IMPALA-10953:


Assignee: Yida Wu

> Impalad may crash when TmpFileMgr initialization fails and 
> abort_on_config_error is set false
> -
>
> Key: IMPALA-10953
> URL: https://issues.apache.org/jira/browse/IMPALA-10953
> Project: IMPALA
>  Issue Type: Bug
>Affects Versions: Impala 3.0, Impala 4.0.0, Impala 4.1.0
>Reporter: Yida Wu
>Assignee: Yida Wu
>Priority: Major
>
> The impalad can start up successfully even if TmpFileMgr initialization 
> fails, as long as abort_on_config_error is set to false.
> [https://github.com/apache/impala/blob/d2f866f9a17c2d71fb3e3e731a2dfcce68d336d9/be/src/service/impala-server.cc#L440]
> This can lead to a crash at runtime when a query needs to spill data and 
> uses the incompletely initialized TmpFileMgr. There is a DCHECK guarding 
> against the use of an incomplete TmpFileMgr, but DCHECKs are compiled out of 
> release builds. Instead of a DCHECK, it may be better to have a function 
> that returns an error when an incomplete TmpFileMgr is used, so that we can 
> fail the query instead of crashing the impalad.






[jira] [Work started] (IMPALA-10929) Optimise memory usage of structs in tuples

2021-10-04 Thread Daniel Becker (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10929?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Work on IMPALA-10929 started by Daniel Becker.
--
> Optimise memory usage of structs in tuples
> --
>
> Key: IMPALA-10929
> URL: https://issues.apache.org/jira/browse/IMPALA-10929
> Project: IMPALA
>  Issue Type: Improvement
>  Components: Frontend
>Reporter: Daniel Becker
>Assignee: Daniel Becker
>Priority: Major
>
> If we have both a whole struct and one of its members (or a member of a 
> member etc.) in the select list, the whole struct and the member are assigned 
> to different slots in the tuple. We could use less memory if the member 
> expression used the slot within the whole struct instead.
> Example:
> For the query 
> {code:java}
> select id, outer_struct from functional_orc_def.complextypes_nested_structs;
> {code}
> the row size is 64B, while for
> {code:java}
> select id, outer_struct, outer_struct.inner_struct2 from 
> functional_orc_def.complextypes_nested_structs;
> {code}
> it is 80B, although it should not need more memory.
> This is not limited to the select list; it should also work with where 
> clauses etc., for example
> {code:java}
> select id, outer_struct from functional_orc_def.complextypes_nested_structs 
> where outer_struct.inner_struct2.i > 1;
> {code}
> should also have a row size of 64B instead of 68B.
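
The saving described above can be illustrated with a toy row-size calculation. This is only a sketch: row_size and its arguments are hypothetical, and the byte sizes are invented (picked so that the totals line up with the 64B and 80B figures in the description); they are not Impala's actual tuple layout.

```python
from typing import Dict, Tuple

Path = Tuple[str, ...]


def row_size(slots: Dict[Path, int], share_nested: bool) -> int:
    """Sum slot sizes; optionally let members reuse their parent's slot.

    `slots` maps a column path, e.g. ("outer_struct", "inner_struct2"),
    to the number of bytes it would need as a separate slot.
    """
    total = 0
    for path, size in slots.items():
        # A path is covered if any proper prefix of it is also materialized.
        covered = any(path[:i] in slots for i in range(1, len(path)))
        if share_nested and covered:
            continue  # lives inside the enclosing struct's slot
        total += size
    return total


# Invented sizes, for illustration only.
example_slots = {
    ("id",): 8,
    ("outer_struct",): 56,
    ("outer_struct", "inner_struct2"): 16,
}
separate = row_size(example_slots, share_nested=False)  # member gets its own slot
shared = row_size(example_slots, share_nested=True)     # member reuses the struct's slot
```

The same prefix check would apply to a member referenced only in a where clause, as long as the enclosing struct is materialized anyway.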






[jira] [Resolved] (IMPALA-10935) Impala crashes on old Iceberg table property in some cases

2021-10-04 Thread Zoltán Borók-Nagy (Jira)


 [ 
https://issues.apache.org/jira/browse/IMPALA-10935?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Zoltán Borók-Nagy resolved IMPALA-10935.

Fix Version/s: Impala 4.1.0
   Resolution: Fixed

> Impala crashes on old Iceberg table property in some cases
> --
>
> Key: IMPALA-10935
> URL: https://issues.apache.org/jira/browse/IMPALA-10935
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Zoltán Borók-Nagy
>Priority: Major
> Fix For: Impala 4.1.0
>
>
> Impala crashes when the following conditions are true:
> * local catalog mode is being used
> * Iceberg table is being queried
> * 'iceberg.file_format' is set instead of 'write.format.default' table 
> property
> * the file format is ORC
> * Query is select count(*) from t;
> When the above conditions are met, Impala wrongly assumes that PARQUET is 
> being used and tries to apply the count star optimization. It is not 
> implemented for the ORC scanner and causes it to crash.
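
A rough sketch of the decision logic involved, not the actual patch. resolve_file_format and can_use_count_star_optimization are hypothetical names, and both the precedence of 'write.format.default' over the legacy 'iceberg.file_format' and the fallback to PARQUET when neither is set are assumptions of this sketch.

```python
from typing import Dict


def resolve_file_format(tbl_props: Dict[str, str],
                        default: str = "PARQUET") -> str:
    """Pick the data file format from Iceberg table properties.

    Assumption: the standard property wins over the legacy one when both
    are present, and the default applies only when neither is set.
    """
    for key in ("write.format.default", "iceberg.file_format"):
        value = tbl_props.get(key)
        if value:
            return value.upper()
    return default


def can_use_count_star_optimization(tbl_props: Dict[str, str]) -> bool:
    """Only allow the optimization when the resolved format is Parquet."""
    return resolve_file_format(tbl_props) == "PARQUET"
```

A table that sets only the legacy property to ORC resolves to ORC here, so the count star optimization is skipped instead of being applied to the ORC scanner.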






[jira] [Commented] (IMPALA-10935) Impala crashes on old Iceberg table property in some cases

2021-10-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10935?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423858#comment-17423858
 ] 

ASF subversion and git services commented on IMPALA-10935:
--

Commit d2f866f9a17c2d71fb3e3e731a2dfcce68d336d9 in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d2f866f ]

IMPALA-10935: Impala crashes on old Iceberg table property

With IMPALA-10627 we switched to use standard Iceberg table
properties: https://iceberg.apache.org/configuration/

E.g. we switched from 'iceberg.file_format' to 'write.format.default'.
For backward compatibility we also support 'iceberg.file_format', though
the support is not perfect as it causes a crash in some cases.

Impala crashes when the following conditions are met:
* local catalog mode is being used
* Iceberg table is being queried
* the data file format is ORC
* 'iceberg.file_format' is set instead of 'write.format.default' table
  property
* Query is "select count(*) from t;"

Impala wrongly assumes that PARQUET is being used and tries to apply the
count star optimization. It is not implemented for the ORC scanner and
causes it to crash.

This patch fixes the wrong assumption. It also fixes the HdfsOrcScanner
so that it raises an error instead of crashing in release mode.

This patch also enables UNSETting the file format table property for
Iceberg tables. This table property was already enabled for
modifications (changing the value via SET TBLPROPERTIES).

Testing:
 * added e2e test for the above conditions

Change-Id: Iafd9baef1c124d7356a14ba24c571567629a5e50
Reviewed-on: http://gerrit.cloudera.org:8080/17877
Reviewed-by: Impala Public Jenkins 
Tested-by: Impala Public Jenkins 








[jira] [Commented] (IMPALA-10627) Use standard Iceberg table properties

2021-10-04 Thread ASF subversion and git services (Jira)


[ 
https://issues.apache.org/jira/browse/IMPALA-10627?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17423859#comment-17423859
 ] 

ASF subversion and git services commented on IMPALA-10627:
--

Commit d2f866f9a17c2d71fb3e3e731a2dfcce68d336d9 in impala's branch 
refs/heads/master from Zoltan Borok-Nagy
[ https://gitbox.apache.org/repos/asf?p=impala.git;h=d2f866f ]

IMPALA-10935: Impala crashes on old Iceberg table property


> Use standard Iceberg table properties
> -
>
> Key: IMPALA-10627
> URL: https://issues.apache.org/jira/browse/IMPALA-10627
> Project: IMPALA
>  Issue Type: Bug
>Reporter: Zoltán Borók-Nagy
>Assignee: Attila Jeges
>Priority: Major
>  Labels: impala-iceberg
> Fix For: Impala 4.1.0
>
>
> Iceberg lists the following properties:
> [https://iceberg.apache.org/configuration/]
> We should also use these properties if possible, e.g. write.format.default, 
> write..compression-codec
> Currently Impala uses the table property 'iceberg.file_format' to determine 
> the data file format for reads/writes. In the future, read operations should 
> automatically detect the file formats (IMPALA-10610), but for writes we 
> should use 'write.format.default'.


