I have another guess: in this scenario, did TEZ-4248 fail?
---- Replied Message ----
| From | Sungwoo Park<glap...@gmail.com> |
| Date | 08/27/2025 15:02 |
| To | user@hive.apache.org |
| Cc | |
| Subject | Re: ORC Predicate Pushdown (SARG) Not Applied (allowSARGToFilter: false) |

Hi,

I have a quick question. Did you try setting orc.sarg.to.filter to true in hive-site.xml?

--- Sungwoo

On Wed, Aug 27, 2025 at 3:02 PM 서연 <seoyonie...@gmail.com> wrote:

Hello Hive Development Team,

We are observing a significant performance issue with queries on a non-partitioned ORC table. Our investigation indicates that ORC predicate pushdown (SARG) is not being applied at the storage layer, forcing full data scans instead of efficient, filtered reads.

From the TezChild logs, we can see that Hive correctly identifies the pushdown predicate. However, it then explicitly instructs the ORC reader to ignore it for filtering by setting the allowSARGToFilter option to false.

```
2025-08-27 13:21:52,149 [INFO] [TezChild] |orc.OrcInputFormat|: ORC pushdown predicate: (and leaf-(BETWEEN inv_quantity_on_hand 100 500) (not leaf-(IS_NULL inv_item_sk)) (not leaf-(IS_NULL inv_date_sk)))
2025-08-27 13:21:52,149 [INFO] [TezChild] |orc.ReaderImpl|: Reading ORC rows from hdfs://.../inventory/000000_0 with {..., sarg: (and leaf-(BETWEEN inv_quantity_on_hand 100 500) ...), ..., allowSARGToFilter: false, ...}
```

However, we have confirmed that when we run the exact same query on the same data in our Hive 2.3.2 environment, predicate pushdown works correctly, and the data is filtered at the ORC reader level as expected.

Our hypothesis is that this difference is due to changes in the ORC integration. We suspect that the ORC version used in Hive 2.3.2 (likely ORC 1.3.3) did not have the allowSARGToFilter parameter and would always apply a filter if a sarg was present. The introduction of this flag in newer versions seems to have inadvertently caused this performance regression in our use case.

Given this, we strongly believe that there should be a way for users to control this behavior. We propose that Hive should provide a configuration (e.g., a session variable or a table property) to explicitly set allowSARGToFilter to true. This would restore the efficient behavior of older versions and provide a crucial performance tuning capability.

What are your thoughts on this? Is our analysis correct, and would you be open to considering such a feature?

For context, here is our environment information:

Hive Version: 4.0.1
Execution Engine: 0.10.4
Query: tpcds scale 300 query82

Thank you for your time and any guidance you can offer.

Best regards,
seoyeon.
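
For anyone wanting to check their own task logs for the same symptom, here is a minimal sketch in Python. It assumes the log lines follow the `Reading ORC rows from ... with {...}` format quoted in the message above; the names `ReaderOptions` and `scan_reader_options` are hypothetical helpers made up for this illustration and are not part of Hive, Tez, or ORC.

```python
import re
from typing import Iterable, Iterator, NamedTuple, Optional


class ReaderOptions(NamedTuple):
    """What one 'Reading ORC rows' log line says about a file read."""
    path: str
    allow_sarg_to_filter: Optional[bool]  # None if the flag is not logged


# Assumption: the line format matches the orc.ReaderImpl log line quoted above.
_READ_LINE = re.compile(r"Reading ORC rows from (?P<path>\S+) with \{(?P<opts>.*)\}")
_FLAG = re.compile(r"allowSARGToFilter:\s*(true|false)")


def scan_reader_options(lines: Iterable[str]) -> Iterator[ReaderOptions]:
    """Yield one ReaderOptions per 'Reading ORC rows' line found in `lines`."""
    for line in lines:
        match = _READ_LINE.search(line)
        if match is None:
            continue  # not a reader-options line
        flag = _FLAG.search(match.group("opts"))
        value = None if flag is None else flag.group(1) == "true"
        yield ReaderOptions(match.group("path"), value)


if __name__ == "__main__":
    import sys

    # Usage: python scan_sarg.py < tezchild.log
    for entry in scan_reader_options(sys.stdin):
        print(f"{entry.path}\tallowSARGToFilter={entry.allow_sarg_to_filter}")
```

Running it over a TezChild log lists each ORC file read together with the logged flag value, which makes it easy to compare a Hive 4.0.1 run against another environment. It only reports what the log says and does not change any reader behavior.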