[jira] [Created] (KAFKA-16370) offline rollback procedure from kraft mode to zookeeper mode.

2024-03-14 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-16370:


 Summary: offline rollback procedure from kraft mode to zookeeper 
mode.
 Key: KAFKA-16370
 URL: https://issues.apache.org/jira/browse/KAFKA-16370
 Project: Kafka
  Issue Type: Improvement
Reporter: kaushik srinivas


From the KIP, 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-866+ZooKeeper+to+KRaft+Migration]:

 
h2. Finalizing the Migration

Once the cluster has been fully upgraded to KRaft mode, the controller will 
still be running in migration mode and making dual writes to KRaft and ZK. 
Since the data in ZK is still consistent with that of the KRaft metadata log, 
it is still possible to revert back to ZK.

*_The time that the cluster is running all KRaft brokers/controllers, but still 
running in migration mode, is effectively unbounded._*

Once the operator has decided to commit to KRaft mode, the final step is to 
restart the controller quorum and take it out of migration mode by setting 
_zookeeper.metadata.migration.enable_ to "false" (or unsetting it). The active 
controller will only finalize the migration once it detects that all members of 
the quorum have signaled that they are finalizing the migration (again, using 
the tagged field in ApiVersionsResponse). Once the controller leaves migration 
mode, it will write a ZkMigrationStateRecord to the log and no longer perform 
writes to ZK. It will also disable its special handling of ZK RPCs.

*At this point, the cluster is fully migrated and is running in KRaft mode. A 
rollback to ZK is still possible after finalizing the migration, but it must be 
done offline and it will cause metadata loss (which can also cause partition 
data loss).*

 

We tried the same in a kafka cluster which was migrated from zookeeper to kraft 
mode. We observe that the rollback is possible by deleting the "/controller" 
node in zookeeper before the rollback from kraft mode to zookeeper is performed.

The above snippet indicates that a rollback from kraft to zk after the migration 
is finalized is still possible via an offline method. Are there any already known 
steps to be performed as part of this offline rollback method?

From our experience, we currently know of one step: deleting the /controller 
node in zookeeper to force a zookeeper-based broker to be elected as the new 
controller after the rollback is done. Are there any additional steps/actions 
apart from this?
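
For reference, a minimal sketch of the /controller deletion step we perform, using 
the plain ZooKeeper Java client; the connect string and session timeout here are 
assumptions, and the same deletion can also be done interactively via 
zookeeper-shell.sh:

import org.apache.zookeeper.ZooKeeper;

public class DeleteControllerZnode {
    public static void main(String[] args) throws Exception {
        // Assumed ZooKeeper connect string; use the ensemble of the actual cluster.
        ZooKeeper zk = new ZooKeeper("zk-0.zk:2181", 30000, event -> { });
        try {
            // Remove the ephemeral /controller znode so that a ZooKeeper-based broker
            // can win the controller election once the rollback is performed.
            zk.delete("/controller", -1); // version -1 skips the znode version check
        } finally {
            zk.close();
        }
    }
}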



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-16360) Release plan of 3.x kafka releases.

2024-03-11 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-16360:


 Summary: Release plan of 3.x kafka releases.
 Key: KAFKA-16360
 URL: https://issues.apache.org/jira/browse/KAFKA-16360
 Project: Kafka
  Issue Type: Improvement
Reporter: kaushik srinivas


The KIP 
[https://cwiki.apache.org/confluence/display/KAFKA/KIP-833%3A+Mark+KRaft+as+Production+Ready#KIP833:MarkKRaftasProductionReady-ReleaseTimeline]
 mentions:
h2. Kafka 3.7
 * January 2024
 * Final release with ZK mode

But we see in Jira that some tickets are marked for the 3.8 release. Will apache 
continue to make 3.x releases supporting both zookeeper and kraft, independent 
of the pure-kraft 4.x releases?

If yes, how many more releases can be expected on the 3.x release line?

 



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-15223) Need clarity in documentation for upgrade/downgrade across releases.

2023-07-20 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-15223:


 Summary: Need clarity in documentation for upgrade/downgrade 
across releases.
 Key: KAFKA-15223
 URL: https://issues.apache.org/jira/browse/KAFKA-15223
 Project: Kafka
  Issue Type: Improvement
Reporter: kaushik srinivas


Referring to the upgrade documentation for apache kafka.

[https://kafka.apache.org/34/documentation.html#upgrade_3_4_0]

There is confusion with respect to the below statements from the linked section 
of the apache docs.

"If you are upgrading from a version prior to 2.1.x, please see the note below 
about the change to the schema used to store consumer offsets. *Once you have 
changed the inter.broker.protocol.version to the latest version, it will not be 
possible to downgrade to a version prior to 2.1."*

The above statement says that a downgrade would not be possible to a version 
prior to "2.1" once the inter.broker.protocol.version has been upgraded to the 
latest version.

But there is another statement in the documentation, in *point 4*, as below:

"Restart the brokers one by one for the new protocol version to take effect. 
{*}Once the brokers begin using the latest protocol version, it will no longer 
be possible to downgrade the cluster to an older version.{*}"

 

These two statements are repeated across many prior kafka releases and are 
confusing.

Below are the questions:
 # Is a downgrade not possible at all to *"any"* older version of kafka once the 
inter.broker.protocol.version is updated to the latest version, *OR* are 
downgrades only impossible to versions *"<2.1"*?
 # Suppose one takes an approach similar to the upgrade even for the downgrade 
path, i.e. first downgrade the inter.broker.protocol.version to the previous 
version, then downgrade the kafka software/code to the previous release 
revision. Does a downgrade work with this approach?

Can these two questions be documented if the answers are already known?

Maybe a downgrade guide can also be created, similar to the existing upgrade 
guide?



--
This message was sent by Atlassian Jira
(v8.20.10#820010)


[jira] [Created] (KAFKA-13578) Need clarity on the bridge release version of kafka without zookeeper in the eco system - KIP500

2022-01-06 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-13578:


 Summary: Need clarity on the bridge release version of kafka 
without zookeeper in the eco system - KIP500
 Key: KAFKA-13578
 URL: https://issues.apache.org/jira/browse/KAFKA-13578
 Project: Kafka
  Issue Type: Bug
Reporter: kaushik srinivas


We see from the KIP-500

[https://cwiki.apache.org/confluence/display/KAFKA/KIP-500%3A+Replace+ZooKeeper+with+a+Self-Managed+Metadata+Quorum]

 

The below statement needs some clarity:

*We will be able to upgrade from any version of Kafka to this bridge release, 
and from the bridge release to a post-ZK release.  When upgrading from an 
earlier release to a post-ZK release, the upgrade must be done in two steps: 
first, you must upgrade to the bridge release, and then you must upgrade to the 
post-ZK release.*

 

Which apache kafka version is referred to as the bridge release in this context? 
Can apache kafka 3.0.0 be considered the bridge release even though it is not 
production-ready to run without zookeeper?



--
This message was sent by Atlassian Jira
(v8.20.1#820001)


[jira] [Resolved] (KAFKA-13177) partition failures and fewer shrink but a lot of isr expansions with increased num.replica.fetchers in kafka brokers

2021-08-09 Thread kaushik srinivas (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-13177?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas resolved KAFKA-13177.
--
Resolution: Not A Bug

> partition failures and fewer shrink but a lot of isr expansions with 
> increased num.replica.fetchers in kafka brokers
> 
>
> Key: KAFKA-13177
> URL: https://issues.apache.org/jira/browse/KAFKA-13177
> Project: Kafka
>  Issue Type: Bug
>Reporter: kaushik srinivas
>Assignee: kaushik srinivas
>Priority: Major
>
> Installing 3 node kafka broker cluster (4 core cpu and 4Gi memory on k8s)
> topics : 15, partitions each : 15 replication factor 3, min.insync.replicas  
> : 2
> producers running with acks : all
> Initially the num.replica.fetchers was set to 1 (default) and we observed 
> very frequent ISR shrinks and expansions. So the setups were tuned with a 
> higher value of 4. 
> Once after this change was done, we see below behavior and warning msgs in 
> broker logs
>  # Over a period of 2 days, there are around 10 shrinks corresponding to 10 
> partitions, but around 700 ISR expansions corresponding to almost all 
> partitions in the cluster(approx 50 to 60 partitions).
>  # we see frequent warn msg of partitions being marked as failure in the same 
> time span. Below is the trace --> {"type":"log", "host":"ww", 
> "level":"WARN", "neid":"kafka-ww", "system":"kafka", 
> "time":"2021-08-03T20:09:15.340", "timezone":"UTC", 
> "log":{"message":"ReplicaFetcherThread-2-1003 - 
> kafka.server.ReplicaFetcherThread - *[ReplicaFetcher replicaId=1001, 
> leaderId=1003, fetcherId=2] Partition test-16 marked as failed"}}*
>  
> We see the above behavior continuously after increasing the 
> num.replica.fetchers to 4 from 1. We did increase this to improve the 
> replication performance and hence reduce the ISR shrinks.
> But we see this strange behavior after the change. What would the above trace 
> indicate and is marking partitions as failed just a WARN msgs and handled by 
> kafka or is it something to worry about ? 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13178) frequent network_exception trace at kafka producer.

2021-08-08 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-13178:


 Summary: frequent network_exception trace at kafka producer.
 Key: KAFKA-13178
 URL: https://issues.apache.org/jira/browse/KAFKA-13178
 Project: Kafka
  Issue Type: Bug
Reporter: kaushik srinivas


Running a 3 node kafka cluster (4 cores of cpu and 4Gi of memory on k8s).

topics : 15, partitions each : 15, replication factor : 3, min.insync.replicas : 2

producer is running with acks : "all"
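
For reference, the producer configuration in play looks roughly like the sketch 
below; only acks is taken from the description above, while the bootstrap address, 
serializers, and retry count are assumptions for illustration:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class ProducerSettingsSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092"); // assumed address
        props.put(ProducerConfig.ACKS_CONFIG, "all");                     // as described above
        props.put(ProducerConfig.RETRIES_CONFIG, 3);                      // illustrative; the trace shows retries in progress
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);
        producer.close();
    }
}

As far as we understand, NETWORK_EXCEPTION at the producer generally means the 
connection to the broker was dropped before a produce response was received, 
which is why the client retries.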

We see frequent failures in the kafka producer with the below trace:

{"host":"ww","level":"WARN","log":{"classname":"org.apache.kafka.clients.producer.internals.Sender:595","message":"[Producer clientId=producer-1] Got error produce response with correlation id 2646 on topic-partition *-0, retrying (2 attempts left). Error: NETWORK_EXCEPTION","stacktrace":"","threadname":"kafka-producer-network-thread | producer-1"},"time":"2021-08-04T02:22:20.529Z","timezone":"UTC","type":"log","system":"w","systemid":"3"}

 

What could be the possible reasons for the above trace?

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13177) partition failures and fewer shrink but a lot of isr expansions with increased num.replica.fetchers in kafka brokers

2021-08-08 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-13177:


 Summary: partition failures and fewer shrink but a lot of isr 
expansions with increased num.replica.fetchers in kafka brokers
 Key: KAFKA-13177
 URL: https://issues.apache.org/jira/browse/KAFKA-13177
 Project: Kafka
  Issue Type: Bug
Reporter: kaushik srinivas


Installed a 3 node kafka broker cluster (4 core cpu and 4Gi memory on k8s).

topics : 15, partitions each : 15, replication factor : 3, min.insync.replicas : 2

producers running with acks : all

Initially num.replica.fetchers was set to 1 (the default) and we observed very 
frequent ISR shrinks and expansions, so the setups were tuned to a higher 
value of 4.
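
For reference, a sketch of how such a broker config change can be applied with 
the Java AdminClient; this assumes num.replica.fetchers is dynamically updatable 
in the broker version in use (otherwise the same change goes into 
server.properties followed by a rolling restart), and the bootstrap address is an 
assumption:

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class SetReplicaFetchers {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092"); // assumed address
        try (Admin admin = Admin.create(props)) {
            // An empty broker name targets the cluster-wide default for all brokers.
            ConfigResource brokers = new ConfigResource(ConfigResource.Type.BROKER, "");
            AlterConfigOp op = new AlterConfigOp(
                    new ConfigEntry("num.replica.fetchers", "4"), AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(brokers, List.of(op))).all().get();
        }
    }
}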

After this change was done, we see the below behavior and warning messages in 
the broker logs:
 # Over a period of 2 days, there are around 10 shrinks corresponding to 10 
partitions, but around 700 ISR expansions corresponding to almost all 
partitions in the cluster (approx 50 to 60 partitions).
 # We see frequent WARN messages of partitions being marked as failed in the 
same time span. Below is the trace:

{"type":"log", "host":"ww", "level":"WARN", "neid":"kafka-ww", "system":"kafka", "time":"2021-08-03T20:09:15.340", "timezone":"UTC", "log":{"message":"ReplicaFetcherThread-2-1003 - kafka.server.ReplicaFetcherThread - [ReplicaFetcher replicaId=1001, leaderId=1003, fetcherId=2] Partition test-16 marked as failed"}}

 

We see the above behavior continuously after increasing num.replica.fetchers 
from 1 to 4. We increased it to improve replication performance and hence reduce 
the ISR shrinks.

But we see this strange behavior after the change. What does the above trace 
indicate, and is marking partitions as failed just a WARN message handled by 
kafka, or is it something to worry about?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-13176) frequent ISR shrinks and expansion with default num.replica.fetchers (1) under very low throughput conditions.

2021-08-08 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-13176:


 Summary: frequent ISR shrinks and expansion with default 
num.replica.fetchers (1) under very low throughput conditions.
 Key: KAFKA-13176
 URL: https://issues.apache.org/jira/browse/KAFKA-13176
 Project: Kafka
  Issue Type: Bug
Reporter: kaushik srinivas


Running a 3 node kafka cluster (2.3.x kafka) with 4 cores of cpu and 4Gi of 
memory in a k8s environment.

num.replica.fetchers is configured to 1 (default value).

There are around 15 topics in the cluster and all of them receive a very low 
rate of records/sec (less than 100 per second in most cases).

All the topics have more than 10 partitions and a replication factor of 3. 
min.insync.replicas is set to 2, and producers are run with the acks level set 
to 'all'.

We constantly observe ISR shrinks and expansions for almost every topic 
partition, continuously. Shrinks and expansions are usually separated by around 
6 to 8 seconds.

During these shrinks and expansions we see a lot of request timeouts on the 
kafka producer side for these topics.

Are there any known configuration items we can use to overcome this?

We are confused by the continuous ISR shrinks and expansions on such very low 
throughput topics.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12858) dynamically update the ssl certificates of kafka connect worker without restarting connect process.

2021-05-28 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-12858:


 Summary: dynamically update the ssl certificates of kafka connect 
worker without restarting connect process.
 Key: KAFKA-12858
 URL: https://issues.apache.org/jira/browse/KAFKA-12858
 Project: Kafka
  Issue Type: Improvement
Reporter: kaushik srinivas


Hi,

 

We are trying to update the ssl certificates of a kafka connect worker which are 
due for expiry. Is there any way to dynamically update the ssl certificates of 
the connect worker, as is possible for kafka brokers using the kafka-configs.sh 
script?

If not, what is the recommended way to update the ssl certificates of a kafka 
connect worker without disrupting the existing traffic?
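
For context, the broker-side mechanism we are referring to is the dynamic, 
per-listener keystore update; below is a sketch of it with the Java AdminClient 
(listener name, broker id, keystore path, and password are all assumptions, and 
the equivalent kafka-configs.sh --alter works the same way). Our question is 
whether anything comparable exists for the connect worker.

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class RotateBrokerKeystore {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "kafka:9092"); // assumed address
        try (Admin admin = Admin.create(props)) {
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "1001"); // assumed broker id
            // Per-listener keystore settings are dynamically updatable on brokers;
            // altering them makes the broker reload the (renewed) keystore without a restart.
            List<AlterConfigOp> ops = List.of(
                new AlterConfigOp(new ConfigEntry(
                    "listener.name.ssl.ssl.keystore.location", "/etc/kafka/ssl/renewed.keystore.jks"),
                    AlterConfigOp.OpType.SET),
                new AlterConfigOp(new ConfigEntry(
                    "listener.name.ssl.ssl.keystore.password", "changeit"),
                    AlterConfigOp.OpType.SET));
            admin.incrementalAlterConfigs(Map.of(broker, ops)).all().get();
        }
    }
}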



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12855) Update ssl certificates of kafka connect worker runtime without restarting the worker process.

2021-05-27 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-12855:


 Summary: Update ssl certificates of kafka connect worker runtime 
without restarting the worker process.
 Key: KAFKA-12855
 URL: https://issues.apache.org/jira/browse/KAFKA-12855
 Project: Kafka
  Issue Type: Improvement
  Components: KafkaConnect
Reporter: kaushik srinivas


Is there a way to update the ssl certificates of a kafka connect worker 
dynamically, similar to the kafka-configs script for kafka brokers? Or is the 
only way to update the certificates to restart the worker processes?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12534) kafka-configs does not work with ssl enabled kafka broker.

2021-03-23 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-12534:


 Summary: kafka-configs does not work with ssl enabled kafka broker.
 Key: KAFKA-12534
 URL: https://issues.apache.org/jira/browse/KAFKA-12534
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.6.1
Reporter: kaushik srinivas


We are trying to change the truststore password on the fly using the 
kafka-configs script for an ssl-enabled kafka broker.

Below is the command used:

kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
--entity-name 1001 --alter --add-config 'ssl.truststore.password=xxx'

But we see the below error in the broker logs when the command is run:

{"type":"log", "host":"kf-2-0", "level":"INFO", 
"neid":"kafka-cfd5ccf2af7f47868e83473408", "system":"kafka", 
"time":"2021-03-23T12:14:40.055", "timezone":"UTC", 
"log":\{"message":"data-plane-kafka-network-thread-1002-ListenerName(SSL)-SSL-2 
- org.apache.kafka.common.network.Selector - [SocketServer brokerId=1002] 
Failed authentication with /127.0.0.1 (SSL handshake failed)"}}

How can one configure ssl certificates for the kafka-configs script so that the 
ssl handshake succeeds in this case?

Note :

We are trying this with a single listener, i.e. SSL.
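
For reference, what appears to be missing is client-side SSL configuration for 
the admin tool itself; below is a sketch of the equivalent Java AdminClient 
settings (truststore path, password, and address are assumptions). 
kafka-configs.sh can be given the same properties in a file via its 
--command-config option.

import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class SslAdminClientSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed SSL listener
        // Client-side SSL settings so the admin client can complete the handshake.
        props.put("security.protocol", "SSL");
        props.put("ssl.truststore.location", "/etc/kafka/ssl/client.truststore.jks"); // assumed path
        props.put("ssl.truststore.password", "changeit");                             // assumed password
        try (Admin admin = Admin.create(props)) {
            System.out.println("cluster id: " + admin.describeCluster().clusterId().get());
        }
    }
}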



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12530) kafka-configs.sh does not work while changing the sasl jaas configurations.

2021-03-23 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-12530:


 Summary: kafka-configs.sh does not work while changing the sasl 
jaas configurations.
 Key: KAFKA-12530
 URL: https://issues.apache.org/jira/browse/KAFKA-12530
 Project: Kafka
  Issue Type: Bug
Reporter: kaushik srinivas


We are trying to use the kafka-configs script to modify the sasl jaas 
configuration, but we are unable to do so.

Command used:

./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
--entity-name 59 --alter --add-config 'sasl.jaas.config=KafkaServer \{\n 
org.apache.kafka.common.security.plain.PlainLoginModule required \n 
username=\"test\" \n password=\"test\"; \n };'

error:

requirement failed: Invalid entity config: all configs to be added must be in 
the format "key=val".

command 2:

kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
--entity-name 59 --alter --add-config 
'sasl.jaas.config=[username=test,password=test]'

output:

The command does not return, but the kafka broker logs the below error:

DEBUG", "neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", 
"time":"2021-03-23T08:29:00.946", "timezone":"UTC", 
"log":\{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2
 - org.apache.kafka.common.security.authenticator.SaslServerAuthenticator - Set 
SASL server state to FAILED during authentication"}}
{"type":"log", "host":"kf-kaudynamic-0", "level":"INFO", 
"neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", 
"time":"2021-03-23T08:29:00.946", "timezone":"UTC", 
"log":\{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2
 - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1001] 
Failed authentication with /127.0.0.1 (Unexpected Kafka request of type 
METADATA during SASL handshake.)"}}

We have the below questions:
1. If one installs a kafka broker with a SASL mechanism and wants to change the 
SASL jaas config via the kafka-configs script, how is it supposed to be done? 
Does kafka-configs need client credentials to do the same?
2. Can anyone point us to example kafka-configs commands that alter the 
sasl.jaas.config property of a kafka broker? We do not see any documentation or 
examples for the same.
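
For reference, a sketch of the form we are attempting via the Java AdminClient, 
assuming the listener- and mechanism-prefixed property name 
(listener.name.<listener>.<mechanism>.sasl.jaas.config) used for dynamic broker 
configs; the listener name, broker id, credentials, and the client-side SASL 
settings are all assumptions, and the admin client itself must authenticate with 
the currently valid credentials:

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.AlterConfigOp;
import org.apache.kafka.clients.admin.ConfigEntry;
import org.apache.kafka.common.config.ConfigResource;

public class UpdateSaslJaasConfig {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        // Client-side SASL settings: the alter call must authenticate with the existing credentials.
        props.put("security.protocol", "SASL_PLAINTEXT");
        props.put("sasl.mechanism", "PLAIN");
        props.put("sasl.jaas.config",
            "org.apache.kafka.common.security.plain.PlainLoginModule required "
            + "username=\"admin\" password=\"admin-secret\";"); // assumed existing credentials

        String newJaas = "org.apache.kafka.common.security.plain.PlainLoginModule required "
            + "username=\"admin\" password=\"admin-secret\" "
            + "user_admin=\"admin-secret\" user_test=\"test\";"; // assumed new user list

        try (Admin admin = Admin.create(props)) {
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "59"); // broker id from the commands above
            AlterConfigOp op = new AlterConfigOp(
                new ConfigEntry("listener.name.sasl_plaintext.plain.sasl.jaas.config", newJaas),
                AlterConfigOp.OpType.SET);
            admin.incrementalAlterConfigs(Map.of(broker, List.of(op))).all().get();
        }
    }
}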



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12529) kafka-configs.sh does not work while changing the sasl jaas configurations.

2021-03-23 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-12529:


 Summary: kafka-configs.sh does not work while changing the sasl 
jaas configurations.
 Key: KAFKA-12529
 URL: https://issues.apache.org/jira/browse/KAFKA-12529
 Project: Kafka
  Issue Type: Bug
Reporter: kaushik srinivas


We are trying to use the kafka-configs script to modify the sasl jaas 
configuration, but we are unable to do so.

Command used:

./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
--entity-name 59 --alter --add-config 'sasl.jaas.config=KafkaServer \{\n 
org.apache.kafka.common.security.plain.PlainLoginModule required \n 
username=\"test\" \n password=\"test\"; \n };'

error:

requirement failed: Invalid entity config: all configs to be added must be in 
the format "key=val".

command 2:

kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
--entity-name 59 --alter --add-config 
'sasl.jaas.config=[username=test,password=test]'

output:

The command does not return, but the kafka broker logs the below error:

DEBUG", "neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", 
"time":"2021-03-23T08:29:00.946", "timezone":"UTC", 
"log":\{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2
 - org.apache.kafka.common.security.authenticator.SaslServerAuthenticator - Set 
SASL server state to FAILED during authentication"}}
{"type":"log", "host":"kf-kaudynamic-0", "level":"INFO", 
"neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", 
"time":"2021-03-23T08:29:00.946", "timezone":"UTC", 
"log":\{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2
 - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1001] 
Failed authentication with /127.0.0.1 (Unexpected Kafka request of type 
METADATA during SASL handshake.)"}}

We have the below questions:
1. If one installs a kafka broker with a SASL mechanism and wants to change the 
SASL jaas config via the kafka-configs script, how is it supposed to be done? 
Does kafka-configs need client credentials to do the same?
2. Can anyone point us to example kafka-configs commands that alter the 
sasl.jaas.config property of a kafka broker? We do not see any documentation or 
examples for the same.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12528) kafka-configs.sh does not work while changing the sasl jaas configurations.

2021-03-23 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-12528:


 Summary: kafka-configs.sh does not work while changing the sasl 
jaas configurations.
 Key: KAFKA-12528
 URL: https://issues.apache.org/jira/browse/KAFKA-12528
 Project: Kafka
  Issue Type: Bug
  Components: admin, core
Reporter: kaushik srinivas


We are trying to modify the sasl jaas configuration of the kafka broker at 
runtime using the dynamic config update functionality of the kafka-configs.sh 
script, but we are unable to get it working.

Below is our command:

./kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
--entity-name 59 --alter --add-config 'sasl.jaas.config=KafkaServer \{\n 
org.apache.kafka.common.security.plain.PlainLoginModule required \n 
username=\"test\" \n password=\"test\"; \n };'

 

The command exits with the error:

requirement failed: Invalid entity config: all configs to be added must be in 
the format "key=val".

 

We also tried the below format:

kafka-configs.sh --bootstrap-server localhost:9092 --entity-type brokers 
--entity-name 59 --alter --add-config 
'sasl.jaas.config=[username=test,password=test]'

The command does not return, but the kafka broker logs print the below error 
messages:

org.apache.kafka.common.security.authenticator.SaslServerAuthenticator - Set SASL server state to FAILED during authentication"}}
{"type":"log", "host":"kf-kaudynamic-0", "level":"INFO", "neid":"kafka-cfd5ccf2af7f47868e83471a5b603408", "system":"kafka", "time":"2021-03-23T08:29:00.946", "timezone":"UTC", "log":{"message":"data-plane-kafka-network-thread-1001-ListenerName(SASL_PLAINTEXT)-SASL_PLAINTEXT-2 - org.apache.kafka.common.network.Selector - [SocketServer brokerId=1001] Failed authentication with /127.0.0.1 (Unexpected Kafka request of type METADATA during SASL handshake.)"}}

 

1. If one has SASL enabled with a single listener, how are we supposed to change 
the SASL credentials using this command?

2. Can anyone point us to some example commands for modifying the sasl jaas 
configuration?



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-12164) Issue when kafka connect worker pod restarts during creation of nested partition directories in hdfs file system.

2021-01-07 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-12164:


 Summary: Issue when kafka connect worker pod restarts during 
creation of nested partition directories in hdfs file system.
 Key: KAFKA-12164
 URL: https://issues.apache.org/jira/browse/KAFKA-12164
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Reporter: kaushik srinivas


In our production labs, an issue is observed. Below is the sequence of events.
 # The hdfs connector is added to the connect worker.
 # The hdfs connector creates folders in hdfs such as /test1=1/test2=2/ based on 
a custom partitioner. Here test1 and test2 are two separate nested directories 
derived from multiple fields in the record using the custom partitioner.
 # The kafka connect hdfs connector uses the below call to create the 
directories in the hdfs file system:
fs.mkdirs(new Path(filename));
ref: 
[https://github.com/confluentinc/kafka-connect-hdfs/blob/master/src/main/java/io/confluent/connect/hdfs/storage/HdfsStorage.java]

The important thing to note is that if mkdirs() is a non-atomic operation 
(i.e. it can result in partial execution if interrupted), then suppose the first 
directory, test1, is created and the connect worker pod is restarted just before 
test2 is created in hdfs. The hdfs file system will then be left with partially 
created partition folders from the restart time frame.

So we might have conditions in hdfs as below
/test1=0/test2=0/
/test1=1/
/test1=2/test2=2
/test1=3/test2=3

So the second partition has a missing directory in it. And if hive integration 
is enabled, hive metastore exceptions will occur, since a partition expected by 
the hive table is missing for a few partitions in hdfs.
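
For illustration, the call in question in isolation; the HDFS URI and path here 
are assumptions:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class NestedPartitionDirs {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        conf.set("fs.defaultFS", "hdfs://namenode:8020"); // assumed namenode address
        FileSystem fs = FileSystem.get(conf);
        // The connector issues a single mkdirs() for the full nested partition path.
        // If this call is not atomic and the worker pod restarts mid-way, /test1=1/
        // may exist while /test1=1/test2=2/ is missing, which is the state described above.
        fs.mkdirs(new Path("/topics/test1=1/test2=2"));
        fs.close();
    }
}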

*This can occur for any connector with an ongoing non-atomic operation when a 
restart is triggered on the kafka connect worker pod. It can leave partially 
completed states in the system and may cause issues for the connector to 
continue its operation.*

*This is a very critical issue and needs some attention on ways of handling 
it.*



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-10278) kafka-configs does not show the current properties of running kafka broker upon describe.

2020-07-16 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-10278:


 Summary: kafka-configs does not show the current properties of 
running kafka broker upon describe.
 Key: KAFKA-10278
 URL: https://issues.apache.org/jira/browse/KAFKA-10278
 Project: Kafka
  Issue Type: Bug
Affects Versions: 2.4.1
Reporter: kaushik srinivas


kafka-configs.sh does not list the properties 
(read-only/per-broker/cluster-wide) with which the kafka broker is currently 
running.

The command returns nothing.

Only those properties added or updated via kafka-configs.sh are listed by the 
describe command.

bash-4.2$ env -i bin/kafka-configs.sh --bootstrap-server 
kf-test-0.kf-test-headless.test.svc.cluster.local:9092 --entity-type brokers 
--entity-default --describe
Default config for brokers in the cluster are:
  log.cleaner.threads=2 sensitive=false 
synonyms={DYNAMIC_DEFAULT_BROKER_CONFIG:log.cleaner.threads=2}
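
For comparison, a sketch using the Java AdminClient, which in our understanding 
returns all effective broker configs (static, default, and dynamic) along with 
their source; the bootstrap address and broker id are assumptions:

import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

public class DescribeBrokerConfigs {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        try (Admin admin = Admin.create(props)) {
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "1001"); // assumed broker id
            Map<ConfigResource, Config> configs = admin.describeConfigs(List.of(broker)).all().get();
            // Print every effective config with its source (STATIC_BROKER_CONFIG, DEFAULT_CONFIG, etc.).
            configs.get(broker).entries().forEach(e ->
                System.out.println(e.name() + " = " + e.value() + " (source: " + e.source() + ")"));
        }
    }
}

Newer kafka-configs.sh versions also appear to support an --all option with 
--describe for the same purpose.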

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9934) Information and doc update needed for support of AclAuthorizer when protocol is PLAINTEXT

2020-04-28 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-9934:
---

 Summary: Information and doc update needed for support of 
AclAuthorizer when protocol is PLAINTEXT
 Key: KAFKA-9934
 URL: https://issues.apache.org/jira/browse/KAFKA-9934
 Project: Kafka
  Issue Type: Improvement
  Components: security
Affects Versions: 2.4.1
Reporter: kaushik srinivas


Need information on the case where the protocol is PLAINTEXT for listeners in 
kafka.

Does authorization apply when the protocol is PLAINTEXT?

If so, what would be used as the principal name for the authorization acl 
validations?

There is no doc which describes this case.

Need info and doc update for the same.

Thanks,

kaushik.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Created] (KAFKA-9933) Need doc update on the AclAuthorizer when SASL_SSL is the protocol used.

2020-04-28 Thread kaushik srinivas (Jira)
kaushik srinivas created KAFKA-9933:
---

 Summary: Need doc update on the AclAuthorizer when SASL_SSL is the 
protocol used.
 Key: KAFKA-9933
 URL: https://issues.apache.org/jira/browse/KAFKA-9933
 Project: Kafka
  Issue Type: Improvement
  Components: security
Affects Versions: 2.4.1
Reporter: kaushik srinivas


Hello,

The documentation on the usage of the authorizer does not describe the principal 
being used when the protocol for the listener is chosen as SASL + SSL 
(SASL_SSL).

Suppose kerberos and ssl are enabled together: will the authorization be based 
on the kerberos principal names or on the ssl certificate DN names?

There is no document covering this part of the use case.

This needs information and documentation update.

Thanks,

Kaushik.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


[jira] [Resolved] (KAFKA-8622) Snappy Compression Not Working

2020-04-11 Thread kaushik srinivas (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-8622?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

kaushik srinivas resolved KAFKA-8622.
-
Resolution: Resolved

> Snappy Compression Not Working
> --
>
> Key: KAFKA-8622
> URL: https://issues.apache.org/jira/browse/KAFKA-8622
> Project: Kafka
>  Issue Type: Bug
>  Components: compression
>Affects Versions: 2.3.0, 2.2.1
>Reporter: Kunal Verma
>Assignee: kaushik srinivas
>Priority: Major
>
> I am trying to produce a message on the broker with compression enabled as 
> snappy.
> Environment :
> Brokers[Kafka-cluster] are hosted on Centos 7
> I have download the latest version (2.3.0 & 2.2.1) tar, extract it and moved 
> to /opt/kafka-
> I have executed the broker with standard configuration.
> In my producer service(written in java), I have enabled snappy compression.
> props.put(ProducerConfig.COMPRESSION_TYPE_CONFIG, "snappy");
>  
> so while sending record on broker, I am getting following errors:
> org.apache.kafka.common.errors.UnknownServerException: The server experienced 
> an unexpected error when processing the request
>  
> While investing further at broker end I got following error in log
>  
> logs/kafkaServer.out:java.lang.UnsatisfiedLinkError: 
> /tmp/snappy-1.1.7-ecd381af-ffdd-4a5c-a3d8-b802d0fa4e85-libsnappyjava.so: 
> /tmp/snappy-1.1.7-ecd381af-ffdd-4a5c-a3d8-b802d0fa4e85-libsnappyjava.so: 
> failed to map segment from shared object: Operation not permitted
> --
>  
> [2019-07-02 15:29:43,399] ERROR [ReplicaManager broker=1] Error processing 
> append operation on partition test-bulk-1 (kafka.server.ReplicaManager)
> java.lang.NoClassDefFoundError: Could not initialize class 
> org.xerial.snappy.Snappy
> at 
> org.xerial.snappy.SnappyInputStream.hasNextChunk(SnappyInputStream.java:435)
> at org.xerial.snappy.SnappyInputStream.read(SnappyInputStream.java:466)
> at java.io.DataInputStream.readByte(DataInputStream.java:265)
> at org.apache.kafka.common.utils.ByteUtils.readVarint(ByteUtils.java:168)
> at 
> org.apache.kafka.common.record.DefaultRecord.readFrom(DefaultRecord.java:293)
> at 
> org.apache.kafka.common.record.DefaultRecordBatch$1.readNext(DefaultRecordBatch.java:264)
> at 
> org.apache.kafka.common.record.DefaultRecordBatch$RecordIterator.next(DefaultRecordBatch.java:569)
> at 
> org.apache.kafka.common.record.DefaultRecordBatch$RecordIterator.next(DefaultRecordBatch.java:538)
> at 
> org.apache.kafka.common.record.DefaultRecordBatch.iterator(DefaultRecordBatch.java:327)
> at 
> scala.collection.convert.Wrappers$JIterableWrapper.iterator(Wrappers.scala:55)
> at scala.collection.IterableLike.foreach(IterableLike.scala:74)
> at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
> at 
> kafka.log.LogValidator$.$anonfun$validateMessagesAndAssignOffsetsCompressed$1(LogValidator.scala:269)
> at 
> kafka.log.LogValidator$.$anonfun$validateMessagesAndAssignOffsetsCompressed$1$adapted(LogValidator.scala:261)
> at scala.collection.Iterator.foreach(Iterator.scala:941)
> at scala.collection.Iterator.foreach$(Iterator.scala:941)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
> at scala.collection.IterableLike.foreach(IterableLike.scala:74)
> at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
> at 
> kafka.log.LogValidator$.validateMessagesAndAssignOffsetsCompressed(LogValidator.scala:261)
> at 
> kafka.log.LogValidator$.validateMessagesAndAssignOffsets(LogValidator.scala:73)
> at kafka.log.Log.liftedTree1$1(Log.scala:881)
> at kafka.log.Log.$anonfun$append$2(Log.scala:868)
> at kafka.log.Log.maybeHandleIOException(Log.scala:2065)
> at kafka.log.Log.append(Log.scala:850)
> at kafka.log.Log.appendAsLeader(Log.scala:819)
> at 
> kafka.cluster.Partition.$anonfun$appendRecordsToLeader$1(Partition.scala:772)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
> at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:259)
> at kafka.cluster.Partition.appendRecordsToLeader(Partition.scala:759)
> at 
> kafka.server.ReplicaManager.$anonfun$appendToLocalLog$2(ReplicaManager.scala:763)
> at scala.collection.TraversableLike.$anonfun$map$1(TraversableLike.scala:237)
> at scala.collection.mutable.HashMap.$anonfun$foreach$1(HashMap.scala:149)
> at scala.collection.mutable.HashTable.foreachEntry(HashTable.scala:237)
> at scala.collection.mutable.HashTable.foreachEntry$(HashTable.scala:230)
> at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:44)
> at scala.collection.mutable.HashMap.foreach(HashMap.scala:149)
> at scala.collection.TraversableLike.map(TraversableLike.scala:237)
> at scala.collection.TraversableLike.map$(TraversableLike.scala:230)
> at 

[jira] [Created] (KAFKA-8485) Kafka connect worker does not respond when kafka broker goes down with data streaming in progress

2019-06-05 Thread kaushik srinivas (JIRA)
kaushik srinivas created KAFKA-8485:
---

 Summary: Kafka connect worker does not respond when kafka broker 
goes down with data streaming in progress
 Key: KAFKA-8485
 URL: https://issues.apache.org/jira/browse/KAFKA-8485
 Project: Kafka
  Issue Type: Bug
  Components: KafkaConnect
Affects Versions: 2.2.1
Reporter: kaushik srinivas


Below is the scenario

3 kafka brokers are up and running.

Kafka connect worker is installed and a hdfs sink connector is added.

Data streaming started, data being flushed out of kafka into hdfs.

The topic is created with 3 partitions, with one partition leader on each of the 
three brokers.

Now, 2 kafka brokers are restarted. A partition rebalance happens.

Now we observe that kafka connect does not respond; the REST API keeps timing out.

Nothing useful is logged in the connect logs either.

The only way to get out of this situation currently is to restart the kafka 
connect worker, after which things get back to normal.

 

The same scenario, when tried without data streaming in progress, works fine, 
meaning the REST API does not get into the timing-out state.

Making this issue a blocker because of the impact due to a kafka broker restart.

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-8314) Managing the doc field in case of schema projection - kafka connect

2019-05-02 Thread kaushik srinivas (JIRA)
kaushik srinivas created KAFKA-8314:
---

 Summary: Managing the doc field in case of schema projection - 
kafka connect
 Key: KAFKA-8314
 URL: https://issues.apache.org/jira/browse/KAFKA-8314
 Project: Kafka
  Issue Type: Bug
Reporter: kaushik srinivas


A doc field change in the schema while writing to hdfs using the hdfs sink 
connector via the connect framework causes failures in schema projection.

 

java.lang.RuntimeException: 
org.apache.kafka.connect.errors.SchemaProjectorException: Schema parameters not 
equal. source parameters: {connect.record.doc=xxx} and target parameters: 
{connect.record.doc=yyy}

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7667) Need synchronous records send support for kafka performance producer java application.

2018-11-22 Thread kaushik srinivas (JIRA)
kaushik srinivas created KAFKA-7667:
---

 Summary: Need synchronous records send support for kafka 
performance producer java application.
 Key: KAFKA-7667
 URL: https://issues.apache.org/jira/browse/KAFKA-7667
 Project: Kafka
  Issue Type: Improvement
  Components: tools
Affects Versions: 2.0.0
Reporter: kaushik srinivas
Assignee: kaushik srinivas


Why synchronous send support for the performance producer?

The ProducerPerformance java application is used for load testing kafka brokers.
Load testing involves replicating a very high throughput of records flowing into 
kafka brokers, and many producers in the field use a synchronous way of sending 
data, i.e. blocking until the message has been written completely on all of the 
min.insync.replicas brokers.
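
In producer-API terms, the synchronous mode referred to here is simply blocking 
on the future returned by send(); a minimal sketch, where the topic name, 
payload, and bootstrap address are illustrative:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.clients.producer.RecordMetadata;
import org.apache.kafka.common.serialization.StringSerializer;

public class SyncSendSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed address
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Blocking on the future makes the send synchronous: with acks=all the call
            // returns only after all in-sync replicas have written the record.
            RecordMetadata md = producer.send(new ProducerRecord<>("perf1", "payload")).get();
            System.out.println("written at offset " + md.offset());
        }
    }
}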

Asynchronous sends would satisfy the first requirement of loading kafka brokers.

This requirement would help in performance tuning the kafka brokers when 
producers are deployed with "acks": all, "min.insync.replicas" equal to the 
replication factor, and a synchronous way of sending.
Throughput degradation happens with synchronous producers, and this would help 
in tuning resources for replication in kafka brokers.
Benchmarks could also be made from the kafka producer's perspective with a 
synchronous way of sending records, to tune the kafka producer's resources 
appropriately.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7659) dummy test

2018-11-20 Thread kaushik srinivas (JIRA)
kaushik srinivas created KAFKA-7659:
---

 Summary: dummy test
 Key: KAFKA-7659
 URL: https://issues.apache.org/jira/browse/KAFKA-7659
 Project: Kafka
  Issue Type: Improvement
  Components: tools
Reporter: kaushik srinivas






--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7171) KafkaPerformanceProducer crashes with same transaction id.

2018-07-17 Thread kaushik srinivas (JIRA)
kaushik srinivas created KAFKA-7171:
---

 Summary: KafkaPerformanceProducer crashes with same transaction id.
 Key: KAFKA-7171
 URL: https://issues.apache.org/jira/browse/KAFKA-7171
 Project: Kafka
  Issue Type: Bug
Affects Versions: 1.0.1
Reporter: kaushik srinivas


Running the org.apache.kafka.tools.ProducerPerformance code to performance test 
the kafka cluster. As a trial, the cluster has only one broker and zookeeper 
with 12GB of heap space.

Running 6 producers on 3 machines with the same transaction id (2 producers on 
each node).

Below are the settings of each producer:

kafka-run-class org.apache.kafka.tools.ProducerPerformance --print-metrics 
--topic perf1 --num-records 9223372036854 --throughput 25  --record-size 
200 --producer-props bootstrap.servers=localhost:9092 buffer.memory=524288000 
batch.size=524288

 

For 2 hours all producers run fine; then suddenly the throughput of all 
producers increases 3 times, and 4 producers on 2 nodes crash with the below 
exceptions:

[2018-07-16 14:00:18,744] ERROR Error executing user-provided callback on 
message for topic-partition perf1-6: 
(org.apache.kafka.clients.producer.internals.RecordBatch)
java.lang.ClassCastException: 
org.apache.kafka.clients.producer.internals.RecordAccumulator$RecordAppendResult
 cannot be cast to org.apache.kafka.clients.producer.internals.RecordBatch$Thunk
 at 
org.apache.kafka.clients.producer.internals.RecordBatch.done(RecordBatch.java:99)
 at 
org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:312)
 at 
org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:272)
 at 
org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:57)
 at 
org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:358)
 at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278)
 at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:229)
 at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:134)
 at java.lang.Thread.run(Thread.java:748)

 

The first machine (2 producers) runs fine.

Need some pointers on this issue.

Queries:

Why does the throughput increase 3 times after 2 hours of duration?

Why do the other producers crash?

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6961) UnknownTopicOrPartitionException & NotLeaderForPartitionException upon replication of topics.

2018-05-28 Thread kaushik srinivas (JIRA)
kaushik srinivas created KAFKA-6961:
---

 Summary: UnknownTopicOrPartitionException & 
NotLeaderForPartitionException upon replication of topics.
 Key: KAFKA-6961
 URL: https://issues.apache.org/jira/browse/KAFKA-6961
 Project: Kafka
  Issue Type: Bug
 Environment: kubernetes cluster kafka.
Reporter: kaushik srinivas
 Attachments: k8s_NotLeaderForPartition.txt, k8s_replication_errors.txt

Running kafka & zookeeper in kubernetes cluster.

No of brokers : 3

No of partitions per topic : 3

Created a topic with 3 partitions, and it looks like all the partitions are up.

Below is the snapshot to confirm the same,

Topic:applestore  PartitionCount:3  ReplicationFactor:3  Configs:
  Topic: applestore  Partition: 0  Leader: 1001  Replicas: 1001,1003,1002  Isr: 1001,1003,1002
  Topic: applestore  Partition: 1  Leader: 1002  Replicas: 1002,1001,1003  Isr: 1002,1001,1003
  Topic: applestore  Partition: 2  Leader: 1003  Replicas: 1003,1002,1001  Isr: 1003,1002,1001
 
 
But we see in the brokers, as soon as the topics are created, the below stack 
traces appear:
 
error 1: 
[2018-05-28 08:00:31,875] ERROR [ReplicaFetcher replicaId=1001, leaderId=1003, 
fetcherId=7] Error for partition applestore-2 to broker 
1003:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This 
server does not host this topic-partition. (kafka.server.ReplicaFetcherThread)
 
error 2 :
[2018-05-28 00:43:20,993] ERROR [ReplicaFetcher replicaId=1003, leaderId=1001, 
fetcherId=0] Error for partition apple-6 to broker 
1001:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server 
is not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)
 
When we try producing records to each specific partition, it works fine, and the 
log size across the replicated brokers appears to be equal, which means 
replication is happening fine.
Attaching the two stack trace files.
 
Why are these stack traces appearing? Can we ignore these stack traces if they 
are just spam messages?
 
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6600) Kafka Bytes Out lags behind Kafka Bytes In on all brokers when topics replicated with 3 and flume kafka consumer.

2018-02-27 Thread kaushik srinivas (JIRA)
kaushik srinivas created KAFKA-6600:
---

 Summary: Kafka Bytes Out lags behind Kafka Bytes In on all brokers 
when topics replicated with 3 and flume kafka consumer.
 Key: KAFKA-6600
 URL: https://issues.apache.org/jira/browse/KAFKA-6600
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.10.0.1
Reporter: kaushik srinivas


Below is the setup detail,

Kafka with 3 brokers (each broker with 10 cores and 32GBmem (12 GB heap)).

Created topic with 120 partitions and replication factor 3.

Throughput per broker is ~40k msgs/sec and bytes in ~8mb/sec.

Flume kafka source is used as the consumer.

Observations:

When the replication factor is kept at 1, bytes out and bytes in stop at exactly 
the same timestamp (i.e. when the producer to kafka is stopped).

But when the replication factor is increased to 3, there is a time lag observed 
in bytes out compared to bytes in. The flume kafka source is pulling data 
slowly, even though flume is configured with very high memory and cpu 
configurations.

 

Tried increasing num.replica.fetchers from the default value of 1 to 10, 20, 50 
etc. and replica.fetch.max.bytes from the default 1MB to 10MB and 20MB, but no 
improvement is observed in terms of the lag.

Under-replicated partitions are observed to be zero, using the replica manager 
metrics in jmx.

Kafka brokers were monitored for cpu and memory; cpu usage is at most 3% of the 
total cores and memory usage is 4GB (32GB configured).

The flume kafka source has overridden kafka consumer properties: 
max.partition.fetch.bytes is kept at the default 1MB and fetch.max.bytes is kept 
at the default 52MB. The flume kafka source batch size is kept at the default 
value of 1000.
 agent.sources..kafka.consumer.fetch.max.bytes = 10485760
 agent.sources..kafka.consumer.max.partition.fetch.bytes = 10485760
 agent.sources..batchSize = 1000
 

What more tuning is needed in order to reduce the lag between bytes in and 
bytes out at the kafka brokers with replication factor 3, or is there any 
configuration missed out?

 

 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-6356) UnknownTopicOrPartitionException & NotLeaderForPartitionException and log deletion happening with retention bytes kept at -1.

2017-12-13 Thread kaushik srinivas (JIRA)
kaushik srinivas created KAFKA-6356:
---

 Summary: UnknownTopicOrPartitionException & 
NotLeaderForPartitionException and log deletion happening with retention bytes 
kept at -1.
 Key: KAFKA-6356
 URL: https://issues.apache.org/jira/browse/KAFKA-6356
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.10.1.0
 Environment: Cent OS 7.2,
HDD : 2Tb,
CPUs: 56 cores,
RAM : 256GB
Reporter: kaushik srinivas
 Attachments: configs.txt, stderr_b0, stderr_b1, stderr_b2, stdout_b0, 
stdout_b1, stdout_b2, topic_description, topic_offsets

Facing issues with a kafka topic with partitions and a replication factor of 3.

Config used :
No of partitions : 20
replication factor : 3
No of brokers : 3
Memory for broker : 32GB
Heap for broker : 12GB

A producer is run to produce data to the 20 partitions of a single topic.
For partitions whose leader is one particular broker (broker-1), we observed 
that the offsets are never incremented, and we also see log files of 0MB size on 
that broker's disk.

Seeing the below errors in the brokers:

error 1:
2017-12-13 07:11:11,191] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[test2,5] to broker 
2:org.apache.kafka.common.errors.UnknownTopicOrPartitionException: This server 
does not host this topic-partition. (kafka.server.ReplicaFetcherThread)

error 2:
[2017-12-11 12:19:41,599] ERROR [ReplicaFetcherThread-0-2], Error for partition 
[test1,13] to broker 
2:org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is 
not the leader for that topic-partition. (kafka.server.ReplicaFetcherThread)

Attaching,
1. error and std out files of all the brokers.
2. kafka config used.
3. offsets and topic description.

Retention bytes was kept at -1 and the retention period at 96 hours.
But we still observe some of the log files being deleted on the broker.

from logs :
[2017-12-11 12:20:20,586] INFO Deleting index 
/var/lib/mesos/slave/slaves/7b319cf4-f06e-4a35-a6fe-fd4fcc0548e6-S7/frameworks/7b319cf4-f06e-4a35-a6fe-fd4fcc0548e6-0006/executors/ckafka__5f085d0c-e296-40f0-a686-8953dd14e4c6/runs/506a1ce7-23d1-45ea-bb7c-84e015405285/kafka-broker-data/broker-1/test1-12/.timeindex
 (kafka.log.TimeIndex)
[2017-12-11 12:20:20,587] INFO Deleted log for partition [test1,12] in 
/var/lib/mesos/slave/slaves/7b319cf4-f06e-4a35-a6fe-fd4fcc0548e6-S7/frameworks/7b319cf4-f06e-4a35-a6fe-fd4fcc0548e6-0006/executors/ckafka__5f085d0c-e296-40f0-a686-8953dd14e4c6/runs/506a1ce7-23d1-45ea-bb7c-84e015405285/kafka-broker-data/broker-1/test1-12.
 (kafka.log.LogManager)

We expect the logs to never be deleted if retention bytes is set to -1.






--
This message was sent by Atlassian JIRA
(v6.4.14#64029)


[jira] [Created] (KAFKA-6165) Kafka Brokers goes down with outOfMemoryError.

2017-11-03 Thread kaushik srinivas (JIRA)
kaushik srinivas created KAFKA-6165:
---

 Summary: Kafka Brokers goes down with outOfMemoryError.
 Key: KAFKA-6165
 URL: https://issues.apache.org/jira/browse/KAFKA-6165
 Project: Kafka
  Issue Type: Bug
Affects Versions: 0.11.0.0
 Environment: DCOS cluster with 4 agent nodes and 3 masters.

agent machine config :
RAM : 384 GB
DISK : 4TB


Reporter: kaushik srinivas
Priority: Major
 Attachments: kafka_config.txt, stderr_broker1.txt, stderr_broker2.txt, 
stdout_broker1.txt, stdout_broker2.txt

Performance testing kafka with end-to-end pipelines of:
Kafka Data Producer -> kafka -> spark streaming -> hdfs -- stream1
Kafka Data Producer -> kafka -> flume -> hdfs -- stream2

stream1 kafka configs :
No of topics : 10
No of partitions : 20 for all the topics

stream2 kafka configs :
No of topics : 10
No of partitions : 20 for all the topics

Some important Kafka Configuration :
"BROKER_MEM": "32768"(32GB)
"BROKER_JAVA_HEAP": "16384"(16GB)
"BROKER_COUNT": "3"
"KAFKA_MESSAGE_MAX_BYTES": "112"(1MB)
"KAFKA_REPLICA_FETCH_MAX_BYTES": "1048576"(1MB)
"KAFKA_NUM_PARTITIONS": "20"
"BROKER_DISK_SIZE": "5000" (5GB)
"KAFKA_LOG_SEGMENT_BYTES": "5000",(50MB)
"KAFKA_LOG_RETENTION_BYTES": "50"(5GB)

Data Producer to kafka Throughput:

message rate : approx 5 lakh (500,000) messages/sec across all the 3 brokers and 
topics/partitions.
message size : approx 300 to 400 bytes.

Issues observed with this config:

Issue 1:

stack trace:

[2017-11-03 00:56:28,484] FATAL [Replica Manager on Broker 0]: Halting due to 
unrecoverable I/O error while handling produce request:  
(kafka.server.ReplicaManager)
kafka.common.KafkaStorageException: I/O exception in append to log 
'store_sales-16'
at kafka.log.Log.append(Log.scala:349)
at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:443)
at kafka.cluster.Partition$$anonfun$10.apply(Partition.scala:429)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
at kafka.utils.CoreUtils$.inReadLock(CoreUtils.scala:240)
at kafka.cluster.Partition.appendMessagesToLeader(Partition.scala:429)
at 
kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:407)
at 
kafka.server.ReplicaManager$$anonfun$appendToLocalLog$2.apply(ReplicaManager.scala:393)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:234)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at 
scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:99)
at 
scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:230)
at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:40)
at scala.collection.mutable.HashMap.foreach(HashMap.scala:99)
at scala.collection.TraversableLike$class.map(TraversableLike.scala:234)
at scala.collection.AbstractTraversable.map(Traversable.scala:104)
at 
kafka.server.ReplicaManager.appendToLocalLog(ReplicaManager.scala:393)
at kafka.server.ReplicaManager.appendMessages(ReplicaManager.scala:330)
at kafka.server.KafkaApis.handleProducerRequest(KafkaApis.scala:425)
at kafka.server.KafkaApis.handle(KafkaApis.scala:78)
at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
at java.lang.Thread.run(Thread.java:745)
Caused by: java.io.IOException: Map failed
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:940)
at 
kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:116)
at 
kafka.log.AbstractIndex$$anonfun$resize$1.apply(AbstractIndex.scala:106)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
at kafka.log.AbstractIndex.resize(AbstractIndex.scala:106)
at 
kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply$mcV$sp(AbstractIndex.scala:160)
at 
kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160)
at 
kafka.log.AbstractIndex$$anonfun$trimToValidSize$1.apply(AbstractIndex.scala:160)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:234)
at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:159)
at kafka.log.Log.roll(Log.scala:771)
at kafka.log.Log.maybeRoll(Log.scala:742)
at kafka.log.Log.append(Log.scala:405)
... 22 more
Caused by: java.lang.OutOfMemoryError: Map failed
at sun.nio.ch.FileChannelImpl.map0(Native Method)
at sun.nio.ch.FileChannelImpl.map(FileChannelImpl.java:937)
... 34 more


Issue 2 :

stack trace :

[2017-11-02 23:55:49,602] FATAL [ReplicaFetcherThread-0-0], Disk error while 
replicating data for catalog_sales-3