Kimahriman commented on PR #743:
URL: https://github.com/apache/incubator-sedona/pull/743#issuecomment-1370424624

   > @Kimahriman I don't think it is a good idea to include logging dependencies in Sedona. Since Spark and Flink already package their own versions of log4j or slf4j, including these dependencies ourselves is likely to create conflicts in the future. This will introduce additional complexity to the project management and cause potential bugs.
   
   Nothing is added as a compile dependency; most of this is to standardize which framework gets used during the tests. For example, several of the test base classes contain log4j 1.x code that reduces the logging level for various packages. I don't _think_ this code does anything on current Flink or Spark versions, because both use log4j 2.x as the logging implementation, but they include `log4j-1.2-api` as a dependency so the log4j 1.x code still compiles. I'm not sure what the limitations of this bridge API are.
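
   For reference, the 1.x code in question looks roughly like this (a sketch, with illustrative logger names rather than the exact ones the test base classes tune):

   ```java
   // Log4j 1.x API, as used in the test base classes to quiet noisy packages.
   import org.apache.log4j.Level;
   import org.apache.log4j.Logger;

   public abstract class TestBaseExample {
       static {
           // Compiles against the log4j 1.x API. On Spark/Flink versions that
           // ship log4j 2.x plus log4j-1.2-api, these calls go through the 1.x
           // bridge, and it's unclear how faithfully the bridge applies them.
           Logger.getLogger("org.apache").setLevel(Level.WARN);
           Logger.getLogger("akka").setLevel(Level.WARN);
       }
   }
   ```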
   
   As I mentioned in the description, the real work that needs to happen is migrating all logging to SLF4J. As it stands, I believe Sedona would no longer work if Spark or Flink dropped the log4j-1.2-api dependency. This is a step towards that, with two main parts:
   - The two test dependencies (`log4j-core` and `log4j-slf4j-impl`) are added to make sure all tests run with log4j 2.x, so that a log4j2.properties file can be used to configure test logging
   - `log4j-1.2-api` is added purely for older versions of Spark, since we exclude the log4j 1.x dependencies to make sure 2.x is used for the tests. Because this artifact provides the log4j 1.x API that the Sedona code currently uses, it has to be a provided dependency and not a test dependency. If Sedona only used SLF4J for logging (see the sketch after this list), this wouldn't be necessary: the 1.x to 2.x change would be seamless, and we wouldn't need to exclude the SLF4J dependency from Spark.
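
   To make that second bullet concrete, here's a minimal sketch of what SLF4J-only logging in Sedona code would look like (the class is illustrative, not an actual Sedona class). Nothing references a log4j API, so the backing implementation can be swapped at the dependency level without any code changes:

   ```java
   import org.slf4j.Logger;
   import org.slf4j.LoggerFactory;

   // Illustrative class: logs only through the SLF4J API.
   public class SpatialOperation {
       private static final Logger log = LoggerFactory.getLogger(SpatialOperation.class);

       public void process(int batchSize) {
           // Parameterized logging; whether log4j 1.x, log4j 2.x, or logback
           // handles this is decided purely by the runtime classpath.
           log.debug("Processing batch of size {}", batchSize);
       }
   }
   ```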
   
   Also part of this: the Flink module currently doesn't support any form of logging during the tests, because the Flink dependencies we include don't pull in any logging modules the way the Spark dependencies do.
   
   With this in place going forward, we could easily enable logging in the common module by adding slf4j-api as a provided dependency. That would let the code compile; with the two test dependencies mentioned above, common module tests could properly output logs, and at runtime slf4j-api would be provided by either Flink or Spark directly. This matters especially if/as more of the non-Spark-related core and sql module code gets ported to the common module.
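
   As a sketch of what that would enable (a hypothetical test with illustrative names): with slf4j-api provided for compilation, and log4j-core plus log4j-slf4j-impl on the test classpath, a common module test like this would have its output routed wherever log4j2.properties points:

   ```java
   import org.junit.Test;
   import org.slf4j.Logger;
   import org.slf4j.LoggerFactory;

   // Hypothetical common module test: compiles against slf4j-api (provided)
   // and logs through log4j 2.x via the log4j-slf4j-impl test dependency.
   public class CommonModuleLoggingTest {
       private static final Logger log = LoggerFactory.getLogger(CommonModuleLoggingTest.class);

       @Test
       public void emitsLogsThroughLog4j2() {
           log.info("Routed according to log4j2.properties (e.g. to a file)");
       }
   }
   ```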
   
   Alternative options that change fewer things but still work toward that goal:
   - Only include these dependencies for Flink, so that the Flink tests can actually produce logs
   - Either don't worry about configuring logging for the Spark tests across all the versions, or also add a log4j.properties file that mirrors the log4j2.properties file, so both old and new versions of Spark get similar settings applied (either reducing the verbosity of certain modules, or doing what I've done here: writing the unit test logs to a file instead of the console)
   
   

