gaborgsomogyi commented on issue #24967: [SPARK-28163][SS] Use CaseInsensitiveMap for KafkaOffsetReader
URL: https://github.com/apache/spark/pull/24967#issuecomment-505759715

@HeartSaVioR thanks for your comment! This case sensitivity issue has bothered me for at least half a year, so I'm quite open to any less error-prone solution we can come up with. Since I've found another issue, I've invested more time and analyzed it thoroughly. Let me share my thinking, and please do the same so we can figure out how to proceed.

First, let's start with the actual implementation. The majority of this code already uses case-insensitive maps in some way (mainly `CaseInsensitiveMap` or `CaseInsensitiveStringMap` instances arrive).

> Map with lowercased key and CaseInsensitiveMap are not same

This is true.

> referring Map with lowercase key as "case-insensitive" which is not strictly true

I agree with your statement, and we can discuss how to name it. A map with lowercased keys is referred to as "case-insensitive" because the user doesn't have to care about case when the following operations are used: put/get/contains. The following operations behave the same:

```
val map: CaseInsensitive...
map.put("key1", "value1")
map.put("keY2", "value2")
// all of these are true
map.get("key1") == map.get("KEY1")
map.contains("kEy1")
map.get("key2") == map.get("KEY2")
map.contains("kEy2")
```

As long as the conversion applied to the keys stays consistent, the logic above still holds. For finding out whether the user has provided a configuration, I think this is the most robust solution.

The other existing use-case is extracting the entries whose keys start with `kafka.` from these maps and sending them to the Kafka producer/consumer. This of course can't be called case-insensitive, because Kafka requires lowercase keys, no question.
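To make the put/get/contains equivalence above concrete, here is a minimal sketch. `LowerKeyMap` is a hypothetical wrapper for illustration only (it is not Spark's actual `CaseInsensitiveMap`); the point is simply that the same key normalization is applied on both write and read:

```scala
import java.util.Locale

// Hypothetical illustration: a map that normalizes every key with the same
// conversion (here: lowercasing) on put, get, and contains.
case class LowerKeyMap(underlying: Map[String, String] = Map.empty) {
  private def norm(k: String): String = k.toLowerCase(Locale.ROOT)
  def put(k: String, v: String): LowerKeyMap = LowerKeyMap(underlying + (norm(k) -> v))
  def get(k: String): Option[String] = underlying.get(norm(k))
  def contains(k: String): Boolean = underlying.contains(norm(k))
}

val map = LowerKeyMap().put("key1", "value1").put("keY2", "value2")
// All of these hold, regardless of the casing the caller uses:
assert(map.get("key1") == map.get("KEY1"))
assert(map.contains("kEy1"))
assert(map.get("key2") == map.get("KEY2"))
assert(map.contains("kEy2"))
```

Note that any consistent conversion (lowercase, uppercase) would give the same behavior, which is what makes this approach robust for checking whether the user provided a configuration.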
Kafka requiring lowercase keys is the reason why the following function extracts the mentioned parameters (matching the `kafka.` prefix case-insensitively):

```
private def convertToSpecifiedParams(parameters: Map[String, String]): Map[String, String] = {
  parameters
    .keySet
    .filter(_.toLowerCase(Locale.ROOT).startsWith("kafka."))
    .map { k => k.drop(6).toString -> parameters(k) }
    .toMap
}
```

The majority of this code uses the mentioned case-insensitive maps, but there are some parts which break this (for example the part we're modifying now). As a final conclusion, from my perspective:

* Case-insensitive maps have to be used for the first use-case because they are less error prone (this is what I intend to address)
* Whenever a parameter is passed to Kafka, `convertToSpecifiedParams` must be used (this is already the situation)
* Changing the following strings to lower case would have the same effect as the current state of this PR, but then we'd stay in the same parameter case hell:

```
private[kafka010] val FETCH_OFFSET_NUM_RETRY = "fetchoffset.numretries"
private[kafka010] val FETCH_OFFSET_RETRY_INTERVAL_MS = "fetchoffset.retryintervalms"
```

That said, everything is open for discussion, so I'm waiting on thoughts...
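To illustrate the extraction behavior, here is the same `convertToSpecifiedParams` logic in a standalone sketch with hypothetical example keys. Note the subtlety: only the `kafka.` prefix match is case-insensitive; the remainder of the key is passed through unchanged.

```scala
import java.util.Locale

// Same logic as the Spark function quoted above, standalone for experimenting.
def convertToSpecifiedParams(parameters: Map[String, String]): Map[String, String] =
  parameters
    .keySet
    .filter(_.toLowerCase(Locale.ROOT).startsWith("kafka."))
    .map { k => k.drop(6) -> parameters(k) }  // drop the 6-char "kafka." prefix
    .toMap

// Hypothetical example keys, not from the PR:
val params = Map(
  "kafka.bootstrap.servers" -> "host:9092",
  "Kafka.group.id"          -> "g1",   // prefix matched case-insensitively
  "subscribe"               -> "topic" // non-kafka. keys are dropped
)

val extracted = convertToSpecifiedParams(params)
assert(extracted == Map("bootstrap.servers" -> "host:9092", "group.id" -> "g1"))
```

This is exactly why mixed-case parameter names elsewhere in the connector are problematic: the `kafka.`-prefixed path already tolerates any casing of the prefix, while the remaining option names do not.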
