[ https://issues.apache.org/jira/browse/PARQUET-2036?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=17332454#comment-17332454 ]

Gabor Szadovszky commented on PARQUET-2036:
-------------------------------------------

[~elad_yosifon], thanks for reporting this.

I am not sure I understand the circumstances under which the DEBUG log level 
becomes active without an explicit configuration.

The "magic behavior" was implemented so that the JIT compiler can remove the 
logging code paths from the compiled code, which makes it run faster. Without 
this static flag, the code would at least check the log level at every value 
read/write, even if the configured log level is much higher than DEBUG.
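A minimal sketch of the pattern described above, using {{java.util.logging}} from the JDK as a stand-in for SLF4J so it runs without extra dependencies. The class and method names are hypothetical and this is not parquet-mr code. The flag is computed once when the class is initialized; because it is a {{static final}} field, HotSpot's JIT can treat it as a constant and drop the guarded branch when it is false, whereas the dynamic variant asks the logger for its level on every value.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical demo class; java.util.logging stands in for SLF4J here.
class StaticDebugFlagDemo {
    private static final Logger LOG = Logger.getLogger(StaticDebugFlagDemo.class.getName());

    // Evaluated once, when the class is initialized. Because the field is static final,
    // the JIT can treat it as a constant and eliminate the guarded code when it is false.
    static final boolean DEBUG = LOG.isLoggable(Level.FINE);

    // Static-flag pattern: the per-value check is against a constant.
    static long writeValues(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            if (DEBUG) {
                LOG.fine("writing value " + i);
            }
            sum += i; // stands in for the actual value write
        }
        return sum;
    }

    // Without the static flag: the logger is asked for its level on every value.
    static long writeValuesDynamic(int n) {
        long sum = 0;
        for (int i = 0; i < n; i++) {
            if (LOG.isLoggable(Level.FINE)) {
                LOG.fine("writing value " + i);
            }
            sum += i;
        }
        return sum;
    }

    public static void main(String[] args) {
        System.out.println("DEBUG = " + DEBUG);
        System.out.println(writeValues(1_000_000));
        System.out.println(writeValuesDynamic(1_000_000));
    }
}
```

With the JDK's default logging configuration (root level INFO), {{DEBUG}} is false and both methods return the same sum; the difference is only whether a level check may remain inside the compiled loop.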

Regarding your tips for preventing the issue: do you think printing to STDOUT 
would help? I think STDOUT is usually checked only when there is an error. If 
you do not realize there is a performance impact, you would not notice a 
message on STDOUT either. On the other hand, as soon as you start checking the 
logs, it is clear that the log level is DEBUG.
What do you mean by "waiting for explicit configuration"? I think 
{{isDebugEnabled}} should reflect the explicitly configured log level. 
parquet-mr uses SLF4J only so that users (other components) can choose the 
logging framework and its configuration.
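To illustrate that such a flag simply mirrors whatever level the logging backend reports when the class is initialized, here is a small sketch, again with {{java.util.logging}} standing in for SLF4J and its backend. The logger name {{demo.Reader}} and both classes are hypothetical. Configuring the level before the class is first used turns the flag on; changing the level afterwards has no effect on the already initialized {{static final}} field.

```java
import java.util.logging.Level;
import java.util.logging.Logger;

// Hypothetical demo; java.util.logging stands in for SLF4J and its backend.
class FlagCaptureDemo {
    // Hypothetical reader class; its flag is computed during class initialization.
    static class Reader {
        static final boolean DEBUG = Logger.getLogger("demo.Reader").isLoggable(Level.FINE);
    }

    // Returns {flag right after initialization, flag after the level was lowered}.
    static boolean[] run() {
        // Hold a strong reference: java.util.logging keeps loggers only weakly.
        Logger config = Logger.getLogger("demo.Reader");
        config.setLevel(Level.FINE);   // explicit configuration, before first use
        boolean atInit = Reader.DEBUG; // first access triggers Reader's initialization
        config.setLevel(Level.INFO);   // later reconfiguration
        boolean later = Reader.DEBUG;  // still the value captured at initialization
        return new boolean[] {atInit, later};
    }

    public static void main(String[] args) {
        boolean[] r = run();
        System.out.println(r[0] + " " + r[1]); // prints "true true"
    }
}
```

In other words, the flag honours the explicit configuration, but only the configuration in effect when the class is loaded and initialized.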

> implicitly defining DEBUG mode in MessageColumnIO causes 80% performance overhead
> ---------------------------------------------------------------------------------
>
>                 Key: PARQUET-2036
>                 URL: https://issues.apache.org/jira/browse/PARQUET-2036
>             Project: Parquet
>          Issue Type: Bug
>    Affects Versions: 1.10.0, 1.10.1, 1.12.0
>            Reporter: Elad Yosifon
>            Priority: Critical
>
> The *parquet-column* jar uses +SLF4J with log4j as the default logger+; when no 
> log4j configuration is defined, the log level defaults to *DEBUG*.
>  
> {code:java}
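> import org.slf4j.Logger;
> import org.slf4j.LoggerFactory;
>
> public class MessageColumnIO extends GroupColumnIO {
>   private static final Logger LOG = LoggerFactory.getLogger(MessageColumnIO.class);
>   private static final boolean DEBUG = LOG.isDebugEnabled(); // <------ evaluated once, at class initialization
>   // ... rest of the class omitted
> }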
> {code}
>  
> This "magic behavior" puts the Parquet library into DEBUG mode by default, 
> without any notification or warning. Unfortunately, the 
> *RecordConsumerLoggingWrapper* implementation adds 5x performance overhead 
> compared to the *MessageColumnIORecordConsumer* implementation, causing a 
> massive performance hit and wasted server capacity.
>  
> +IMHO, two things could prevent such an issue:+
>  * printing a message to STDOUT announcing that DEBUG mode is active.
>  * defaulting to the *MessageColumnIORecordConsumer* implementation, waiting 
> for explicit configuration to enable DEBUG mode, and only then using 
> *RecordConsumerLoggingWrapper*.
>  
> In the past 2 years, this issue has probably cost my company $50,000 in excess 
> cloud costs!



--
This message was sent by Atlassian Jira
(v8.3.4#803005)
