Github user koeninger commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7185#discussion_r33779900
  
    --- Diff: 
external/kafka/src/main/scala/org/apache/spark/streaming/kafka/KafkaUtils.scala 
---
    @@ -670,4 +670,9 @@ private class KafkaUtilsPythonHelper {
         TopicAndPartition(topic, partition)
     
       def createBroker(host: String, port: JInt): Broker = Broker(host, port)
    +
    +  def getParentKafkaRDDs(rdd: RDD[_]): JList[RDD[_]] = {
    +    val parentRDDs = rdd.getNarrowAncestors
    +    parentRDDs.filter(rdd => rdd.isInstanceOf[KafkaRDD[_, _, _, _, _]])
    --- End diff --
    
    The general approach seems good, but I don't understand why this is 
searching the whole list of parent rdds.  Shouldn't the original kafka rdd be a 
fixed distance away in the chain of parents?
    
    This feels wrong to me for several reasons:
    
    - There should be at most 1 parent kafka rdd, not a list of them.
    - If someone calls KafkaUtils.offsetRanges on something that isn't a 
KafkaRDD, they're going to get an empty list.  I personally would expect an 
exception.
    - The scala / java approach we recommend (typecast) is only going to work 
if it's done on the original rdd, not after a transformation.  I personally 
think this is a little clearer, as it reinforces the idea that you can't rely 
on 1:1 after a repartition.  But in any case I feel like the approach we're 
recommending for the different languages should be similar; either change this, 
or add a java / scala static method that grabs offset ranges in a similar 
manner.



---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

Reply via email to