gaborgsomogyi commented on issue #24967: [SPARK-28163][SS] Use 
CaseInsensitiveMap for KafkaOffsetReader
URL: https://github.com/apache/spark/pull/24967#issuecomment-505759715
 
 
   @HeartSaVioR thanks for your comment!
   
   This case sensitivity issue has been bothering me for at least half a year, so I'm quite open to whatever less error-prone solution we can come up with. Since I've found another issue in this area, I've invested more time and analyzed it thoroughly.
   
   Let me share my thinking, and please do the same so we can figure out how to proceed. First, let's start with the current implementation. The majority of this code uses case-insensitive maps in some way (mainly `CaseInsensitiveMap` or `CaseInsensitiveStringMap` instances arrive here).
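   Just as a reminder, here is a minimal sketch of how such instances are typically constructed (illustration only, not code from this PR):
   ```
   import scala.collection.JavaConverters._
   import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap
   import org.apache.spark.sql.util.CaseInsensitiveStringMap

   // Scala wrapper used widely on the source/option handling side.
   val v1Options = CaseInsensitiveMap(Map("kafka.bootstrap.servers" -> "host:9092"))

   // Java-based wrapper used by the DataSource V2 API.
   val v2Options = new CaseInsensitiveStringMap(Map("subscribe" -> "topic1").asJava)
   ```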
   
   > Map with lowercased key and CaseInsensitiveMap are not same
   
   This is true.
   > referring Map with lowercase key as "case-insensitive" which is not strictly true
   
   I agree with your statement, and we can discuss how to name it. A map with lowercased keys is referred to as "case-insensitive" because the user doesn't have to care about case when entries are added or looked up (put/get/contains). The following operations behave the same way:
   ```
   import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap

   val map = CaseInsensitiveMap(Map("key1" -> "value1", "keY2" -> "value2"))

   // all of these are true
   map.get("key1") == map.get("KEY1")
   map.contains("kEy1")
   map.get("key2") == map.get("KEY2")
   map.contains("kEy2")
   ```
   If the conversion which is applied consistently to the keys ever changes, the logic above still holds. For finding out whether the user has provided a given configuration, I think this is the most robust solution.
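   To illustrate why a `Map` with lowercased keys and a `CaseInsensitiveMap` are not the same thing, here is a minimal sketch (illustration only, not code from this PR):
   ```
   import java.util.Locale
   import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap

   val original = Map("startingOffsets" -> "earliest")

   // Plain map with lowercased keys: lookups only match if the caller lowercases too,
   // and the original casing of the key is lost.
   val lowercased = original.map { case (k, v) => k.toLowerCase(Locale.ROOT) -> v }
   lowercased.get("startingOffsets")   // None, the stored key is "startingoffsets"

   // CaseInsensitiveMap: lookups match with any casing and the original keys are kept.
   val ciMap = CaseInsensitiveMap(original)
   ciMap.get("STARTINGOFFSETS")        // Some(earliest)
   ciMap.originalMap.keySet            // Set(startingOffsets)
   ```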
   
   The other existing use-case is to extract the entries from these maps which start with `kafka.` and send them to the Kafka producer/consumer. This can't be called case-insensitive of course, because Kafka requires its exact (lowercase) config keys, no question. This is the reason why the following function extracts the mentioned parameters (matching the `kafka.` prefix case-insensitively and stripping it):
   ```
     private def convertToSpecifiedParams(parameters: Map[String, String]): Map[String, String] = {
       parameters
         .keySet
         .filter(_.toLowerCase(Locale.ROOT).startsWith("kafka."))
         .map { k => k.drop(6).toString -> parameters(k) }
         .toMap
     }
   ```
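   For example (hypothetical input, just to show the effect; assuming `convertToSpecifiedParams` above is in scope):
   ```
   val parameters = Map(
     "kafka.bootstrap.servers" -> "host:9092",
     "KAFKA.fetch.max.bytes" -> "1048576",
     "startingOffsets" -> "earliest")

   convertToSpecifiedParams(parameters)
   // Map(bootstrap.servers -> host:9092, fetch.max.bytes -> 1048576)
   // The "kafka." prefix is matched case-insensitively and stripped (drop(6)),
   // and the non-kafka options are filtered out.
   ```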
   
   The majority of this code uses the mentioned case-insensitive maps, but there are some parts which break this (for example the part we're modifying now).
   
   As a final conclusion from my perspective:
   * Case-insensitive maps have to be used for the first use-case because they are less error prone (this is what I intend to address; a short sketch follows after this list)
   * Whenever a parameter is passed to Kafka, `convertToSpecifiedParams` must be used (this is already the situation)
   * Changing the following strings to lowercase would have the same effect as the current state of this PR, but then we would stay in the same parameter-case hell:
   ```
     private[kafka010] val FETCH_OFFSET_NUM_RETRY = "fetchoffset.numretries"
     private[kafka010] val FETCH_OFFSET_RETRY_INTERVAL_MS = "fetchoffset.retryintervalms"
   ```
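   To make the first point concrete, here is a minimal sketch of what I mean (the helper name and the default value are only illustrative, not the exact PR diff): when the options arrive as a `CaseInsensitiveMap`, the casing of the constant simply stops mattering.
   ```
   import org.apache.spark.sql.catalyst.util.CaseInsensitiveMap

   val FETCH_OFFSET_NUM_RETRY = "fetchOffset.numRetries"

   def maxOffsetFetchAttempts(options: CaseInsensitiveMap[String]): Int =
     // The lookup matches "fetchoffset.numretries", "fetchOffset.numRetries", etc.
     options.getOrElse(FETCH_OFFSET_NUM_RETRY, "3").toInt

   maxOffsetFetchAttempts(CaseInsensitiveMap(Map("fetchOffset.numRetries" -> "5")))  // 5
   maxOffsetFetchAttempts(CaseInsensitiveMap(Map("fetchoffset.numretries" -> "5")))  // 5
   ```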
   
   That said, everything is open for discussion, so I'm waiting for your thoughts...
   
