[ 
https://issues.apache.org/jira/browse/KAFKA-6129?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Francesco vigotti updated KAFKA-6129:
-------------------------------------
    Description: 
I started writing about this in:
https://issues.apache.org/jira/browse/KAFKA-2729
but I'm opening this new issue because I have probably found the cause in my
Kubernetes setup. In my opinion Kubernetes is doing nothing wrong here (every
other application, e.g. ZooKeeper, works with the same NodePort redirection).
Kafka brokers fail silently (randomly, in multi-broker setups) and the producer
reports a misleading error, so I think Kafka should be improved with more
robust pre-startup flight checks that identify and report this problem.

After further investigation following my reply in
https://issues.apache.org/jira/browse/KAFKA-2729, using a minimal cluster
(1 zk + 1 kafka-broker), I found that the problem is related to Kubernetes.
(I don't know why it only appeared for me now: whether something changed in
recent kube-proxy versions, in Kafka 0.10+, or somewhere else.)
In any case, my old Kafka cluster became under-replicated and started
returning various errors.

The problem happens when the Kubernetes pods are exposed through a NodePort
service (over a static IP in my case) to make the Kafka brokers reachable from
the host. With hostNetwork (so no redirection) everything works. Strangely,
ZooKeeper works fine with NodePort (which creates a redirection rule in
iptables->nat->prerouting); Kafka is the only application I have found that has
problems with this Kubernetes configuration.
What is weird is that Kafka starts correctly without errors. On multi-broker
clusters there are random issues, while on a single-broker cluster the
console-producer fails with an infinite loop of:

```
[2017-10-26 09:38:23,281] WARN Error while fetching metadata with correlation id 5 : {test6=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient)
[2017-10-26 09:38:23,383] WARN Error while fetching metadata with correlation id 6 : {test6=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient)
[2017-10-26 09:38:23,485] WARN Error while fetching metadata with correlation id 7 : {test6=UNKNOWN_TOPIC_OR_PARTITION} (org.apache.kafka.clients.NetworkClient)
```
Even then, no errors are reported by the broker or by ZooKeeper.
I also came across this discussion:
https://stackoverflow.com/questions/35788697/leader-not-available-kafka-in-console-producer
but the solution proposed there for the host pod (letting it resolve its own
advertised hostname) didn't work:

```
hostAliases:
  - ip: "127.0.0.1"
    hostnames:
      - "---myhosthostname---"
```
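
As a rough illustration of the kind of pre-startup flight check suggested
earlier, the sketch below only tests whether a TCP connection to the
advertised host and port can be opened from wherever it runs. It is not Kafka
code and is not confirmed to detect the NodePort problem described here; the
function name `can_reach_advertised` and its parameters are hypothetical.

```python
import socket


def can_reach_advertised(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened.

    Hypothetical helper, not part of Kafka. It checks only that the address
    accepts connections; it says nothing about the Kafka protocol itself.
    """
    try:
        # create_connection resolves the name and connects within the timeout.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name-resolution failures.
        return False
```

It could be run inside the broker pod and from a client machine with the
advertised hostname and the NodePort as arguments; a `False` result from
either side would at least give a concrete message to report instead of the
silent failure described above.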





> kafka issue when exposing through nodeport in kubernetes
> --------------------------------------------------------
>
>                 Key: KAFKA-6129
>                 URL: https://issues.apache.org/jira/browse/KAFKA-6129
>             Project: Kafka
>          Issue Type: Bug
>    Affects Versions: 0.10.2.1
>         Environment: kubernetes
>            Reporter: Francesco vigotti
>            Priority: Critical



--
This message was sent by Atlassian JIRA
(v6.4.14#64029)
