pan3793 commented on PR #52706:
URL: https://github.com/apache/spark/pull/52706#issuecomment-3441217757

   > Please make `beeline` work in the existing class path by default. New code 
path should be applied additionally by configuration or environment variables.
   
   @dongjoon-hyun In general, I agree with your concerns and proposal, but I 
think we can have a different default behavior, due to the following reasons: 
   
   1. Technically, BeeLine does NOT use Spark classes.
      Spark integrates the vanilla Hive BeeLine without modification; its 
dependency list can be found at [Maven 
Central](https://mvnrepository.com/artifact/org.apache.hive/hive-beeline/2.3.10).
 Excluding some classic Spark jars should NOT be risky.
   
   2. To avoid surprising users, BeeLine should work with Connect Server out 
of the box, which means we should tune the classpath automatically.
   
   3. If we want to achieve both point 2 and make `beeline` work in the 
existing classpath by default, we need a mechanism to determine which service 
BeeLine is going to connect to, which raises two issues:
      
      1. We can parse the args in `SparkClassCommandBuilder` to detect the 
Connect service if the user provides the JDBC URL in the command directly, 
e.g., `beeline -u 'jdbc:sc://xxxx'`, but this means we need to process 
`BeeLine` args in Spark Launcher, which introduces additional complexity and is 
not elegant IMO.
      2. BeeLine also allows users to use `!connect <jdbc-url>` to connect to a 
DBMS in interactive mode (after starting the CLI). In this case, we don't have 
a chance to dynamically change the classpath.
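   
   To illustrate the two issues above, here is a minimal shell sketch of 
URL-based detection. It is not the PR's code: the function name 
`detect_beeline_target` and the `connect`/`classic`/`unknown` labels are 
hypothetical, and the real logic would have to live in the Java launcher. It 
only handles the `-u <url>` form.
   
   ```shell
   # Hypothetical helper, not part of Spark: guess the target service
   # from the arguments that would be passed to BeeLine.
   detect_beeline_target() {
     local prev=""
     local arg
     for arg in "$@"; do
       # The argument right after -u is treated as the JDBC URL.
       if [ "$prev" = "-u" ]; then
         case "$arg" in
           jdbc:sc://*) echo "connect"; return 0 ;;
           *)           echo "classic"; return 0 ;;
         esac
       fi
       prev="$arg"
     done
     # No -u on the command line: the user may still run
     # `!connect <jdbc-url>` interactively, so the target is unknown
     # at the time the classpath is built.
     echo "unknown"
   }
   ```
   
   The `unknown` branch is the second issue: once the JVM has started, the 
classpath can no longer be adjusted for a later `!connect`.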
   
   Given the above reasons, I think we can change the classpath as proposed by 
this PR by default, and have an internal switch (i.e., env var 
`SPARK_BEELINE_CLASSIC`, kept at least until 5.x) as a backdoor to 
allow the user to switch back to the original classpath if something goes wrong.
   
   Or, if we are very conservative, we can provide a switch (e.g., env var 
`SPARK_BEELINE_CONNECT`) and then the user must set it explicitly before using 
BeeLine to connect to Connect Server. TBH, I think this hurts user experience.
   
   ```
   $ SPARK_BEELINE_CONNECT=1 bin/beeline -u jdbc:sc://localhost:15002
   ```
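   
   The two switch options differ only in which classpath is the default. A 
minimal sketch, assuming any non-empty value enables a switch; the function 
names and the `trimmed`/`classic` labels are hypothetical and only the two env 
var names come from the proposal above:
   
   ```shell
   # Hypothetical sketch, not the PR's code: print which classpath a
   # launcher script would pick under each policy.

   # Option A: new (trimmed) classpath by default,
   # SPARK_BEELINE_CLASSIC switches back to the original one.
   classpath_mode_default_new() {
     if [ -n "${SPARK_BEELINE_CLASSIC:-}" ]; then
       echo "classic"
     else
       echo "trimmed"
     fi
   }

   # Option B: original classpath by default,
   # SPARK_BEELINE_CONNECT must be set to use the trimmed one.
   classpath_mode_default_classic() {
     if [ -n "${SPARK_BEELINE_CONNECT:-}" ]; then
       echo "trimmed"
     else
       echo "classic"
     fi
   }
   ```
   
   Under option A a plain `bin/beeline` works against Connect Server; under 
option B the user has to know about the variable first.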
   
   also cc @LuciferYang


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

