[jira] [Updated] (KAFKA-7711) Add a bounded flush() API to Kafka Producer

2019-02-14 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7711?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-7711:
--
Component/s: producer 

> Add a bounded flush() API to Kafka Producer
> 
>
> Key: KAFKA-7711
> URL: https://issues.apache.org/jira/browse/KAFKA-7711
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Reporter: kun du
>Priority: Minor
>
> Currently a call to Producer.flush() can hang for an indeterminate 
> time.
> It would be useful to add a bounded flush() API that times out if the producer 
> is unable to flush all batched records within a limited time. In this way the 
> caller of flush() has a chance to decide what to do next instead of waiting 
> forever.
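Until such an API exists, a caller can bound its own wait by running the flush on another thread. A minimal sketch (the `tryFlush` helper is hypothetical, not part of the producer API; note the underlying flush keeps running in the background after the timeout, this only bounds how long the caller waits):

```java
import java.util.concurrent.*;

public class BoundedFlush {
    // Runs a potentially blocking flush on a separate thread and gives up
    // waiting after the timeout. Returns true if the flush finished in time.
    public static boolean tryFlush(Runnable flush, long timeout, TimeUnit unit) {
        ExecutorService ex = Executors.newSingleThreadExecutor();
        try {
            Future<?> f = ex.submit(flush);
            f.get(timeout, unit);
            return true;               // all batches flushed within the bound
        } catch (TimeoutException e) {
            return false;              // caller decides what to do next
        } catch (InterruptedException | ExecutionException e) {
            throw new RuntimeException(e);
        } finally {
            ex.shutdown();
        }
    }

    public static void main(String[] args) {
        // Stand-in lambda; a real call would pass producer::flush.
        boolean ok = tryFlush(() -> { /* flush work */ }, 1, TimeUnit.SECONDS);
        System.out.println(ok); // true
    }
}
```

A producer-internal bounded flush could instead stop waiting on the incomplete-batch count once the deadline passes, avoiding the extra thread.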



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-6149) LogCleanerManager should include topic partition name when warning of invalid cleaner offset

2018-08-20 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-6149:
--
Component/s: log

> LogCleanerManager should include topic partition name when warning of invalid 
> cleaner offset 
> -
>
> Key: KAFKA-6149
> URL: https://issues.apache.org/jira/browse/KAFKA-6149
> Project: Kafka
>  Issue Type: Improvement
>  Components: log, logging
>Reporter: Ryan P
>Priority: Major
>
> The following message would be a lot more helpful if the topic partition name 
> were included.
> if (!isCompactAndDelete(log))
>   warn(s"Resetting first dirty offset to log start offset 
> $logStartOffset since the checkpointed offset $offset is invalid.")
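The fix being asked for is simply to interpolate the topic partition into that message. In Java terms (the actual code is Scala string interpolation; the method and parameter names here are illustrative):

```java
public class WarnMessage {
    // Builds the cleaner warning with the topic-partition included,
    // as the report requests.
    public static String resetWarning(String topicPartition, long logStartOffset,
                                      long checkpointedOffset) {
        return String.format(
            "Resetting first dirty offset of %s to log start offset %d since the checkpointed offset %d is invalid.",
            topicPartition, logStartOffset, checkpointedOffset);
    }

    public static void main(String[] args) {
        System.out.println(resetWarning("my-topic-0", 100L, 42L));
    }
}
```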





[jira] [Updated] (KAFKA-5943) Reduce dependency on mock in connector tests

2018-08-20 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5943?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-5943:
--
Component/s: unit tests

> Reduce dependency on mock in connector tests
> 
>
> Key: KAFKA-5943
> URL: https://issues.apache.org/jira/browse/KAFKA-5943
> Project: Kafka
>  Issue Type: Test
>  Components: unit tests
>Reporter: Ted Yu
>Priority: Minor
>  Labels: connector, mock
>
> Currently connector tests make heavy use of mocks (EasyMock, PowerMock).
> This may hide the real logic behind operations and makes finding bugs 
> difficult.
> We should reduce the use of mocks so that developers can debug connector code 
> using unit tests.
> This would shorten the development cycle for connectors.





[jira] [Updated] (KAFKA-6033) Kafka Streams does not work with musl-libc (alpine linux)

2018-08-20 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6033?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-6033:
--
Component/s: streams

> Kafka Streams does not work with musl-libc (alpine linux)
> -
>
> Key: KAFKA-6033
> URL: https://issues.apache.org/jira/browse/KAFKA-6033
> Project: Kafka
>  Issue Type: Improvement
>  Components: streams
>Affects Versions: 0.11.0.0
> Environment: Alpine 3.6
>Reporter: Jeffrey Zampieron
>Priority: Major
>
> Using the released version of Kafka fails on Alpine-based images because 
> RocksDB uses JNI and fails to load.





[jira] [Updated] (KAFKA-6149) LogCleanerManager should include topic partition name when warning of invalid cleaner offset

2018-08-20 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6149?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-6149:
--
Component/s: logging

> LogCleanerManager should include topic partition name when warning of invalid 
> cleaner offset 
> -
>
> Key: KAFKA-6149
> URL: https://issues.apache.org/jira/browse/KAFKA-6149
> Project: Kafka
>  Issue Type: Improvement
>  Components: log, logging
>Reporter: Ryan P
>Priority: Major
>
> The following message would be a lot more helpful if the topic partition name 
> were included.
> if (!isCompactAndDelete(log))
>   warn(s"Resetting first dirty offset to log start offset 
> $logStartOffset since the checkpointed offset $offset is invalid.")





[jira] [Updated] (KAFKA-6295) Add 'Coordinator Id' to consumer metrics

2018-08-20 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6295?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-6295:
--
Component/s: metrics

> Add 'Coordinator Id' to consumer metrics
> 
>
> Key: KAFKA-6295
> URL: https://issues.apache.org/jira/browse/KAFKA-6295
> Project: Kafka
>  Issue Type: Improvement
>  Components: metrics
>Reporter: Ryan P
>Priority: Major
>  Labels: needs-kip
>
> It would be incredibly helpful to be able to review which broker was the 
> coordinator for a consumer at a given point in time. The easiest way to 
> achieve this in my opinion would be to expose a coordinator id metric. 





[jira] [Updated] (KAFKA-6143) VerifiableProducer & VerifiableConsumer need tests

2018-08-20 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6143?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-6143:
--
Component/s: system tests

> VerifiableProducer & VerifiableConsumer need tests
> --
>
> Key: KAFKA-6143
> URL: https://issues.apache.org/jira/browse/KAFKA-6143
> Project: Kafka
>  Issue Type: Bug
>  Components: system tests
>Reporter: Tom Bentley
>Priority: Minor
>
> The {{VerifiableProducer}} and {{VerifiableConsumer}} are used for system 
> tests, but don't have any tests themselves. They should.





[jira] [Updated] (KAFKA-6385) Rack awareness ignored by kafka-reassign-partitions

2018-08-20 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-6385:
--
Component/s: tools

> Rack awareness ignored by kafka-reassign-partitions
> ---
>
> Key: KAFKA-6385
> URL: https://issues.apache.org/jira/browse/KAFKA-6385
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Affects Versions: 1.0.0
> Environment: Ubuntu 16.04
>Reporter: Gal Barak
>Priority: Minor
> Attachments: actual.txt, topic-to-move.json
>
>
> Hi,
> It seems that the kafka-reassign-partitions script ignores rack awareness, 
> when suggesting a new partition layout. Came across it when doing some 
> initial testing with Kafka.
> +To reproduce:+
> #  Create a Kafka cluster with 3 brokers (1,2,3). Use 3 different racks 
> (broker.rack definition. Example: "A", "B" and "C").
> #* I used a non-root directory in zookeeper (i.e. - {{ 1>:2181,:2181,:2182/}})
> #* The tested topic was automatically created, according to a default 
> configuration of 12 partitions and 3 replicas per topic.
> # Install a 4th broker, and assign it to the same rack as the 1st broker 
> ("A").
> # Create a topics-to-move.json file for a single topic. The file I used was 
> uploaded as topic-to-move.json.
> # Run the kafka-reassign-partitions script:
> {{kafka-reassign-partitions --zookeeper :2181, 2>:2181,:2182/ 
> --topics-to-move-json-file  --broker-list "1,2,3,4" 
> --generate}}
> +Expected result:+
> A suggested reassignment that ensures no partition uses both broker 1 and 
> broker 4 among its replicas.
> +Actual results of the command:+
> The full result is attached as a file (actual.txt). It includes partitions 
> with replicas that are on both brokers 1 and 4, which are two servers on the 
> same rack.
> Example: {"topic":"","partition":6,"replicas":[1,2,4]}
> +Additional notes:+
> * I did not test starting the cluster from scratch. The same behavior might 
> be present when topic partitions are created automatically (in which case, 
> the priority might be higher).
> * I'm not sure it's related. But the original assignment seems to be 
> problematic as well: If a single server (of the 3) failed, a different single 
> server became the leader for all of its partitions. For example, if broker 1 
> failed, server 2 became the leader for all of the partitions for which 1 was 
> previously the leader, instead of having the load distributed evenly between 
> brokers 2 and 3.
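The invariant the expected result describes can be checked mechanically: no partition's replica list should contain two brokers from the same rack. A sketch using the broker-to-rack layout from the reproduction steps (the helper names are made up):

```java
import java.util.*;

public class RackCheck {
    // True if no two replicas of a partition share a rack -- the invariant
    // the generated reassignment in this report violates.
    public static boolean replicasOnDistinctRacks(List<Integer> replicas,
                                                  Map<Integer, String> rackOf) {
        Set<String> racks = new HashSet<>();
        for (int broker : replicas) {
            if (!racks.add(rackOf.get(broker))) return false; // duplicate rack
        }
        return true;
    }

    public static void main(String[] args) {
        // Brokers 1 and 4 share rack "A", as in the reproduction steps.
        Map<Integer, String> rackOf = Map.of(1, "A", 2, "B", 3, "C", 4, "A");
        // The assignment from actual.txt puts replicas [1,2,4] on racks A,B,A.
        System.out.println(replicasOnDistinctRacks(List.of(1, 2, 4), rackOf)); // false
    }
}
```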





[jira] [Updated] (KAFKA-6154) Transient failure TransactionsBounceTest.testBrokerFailure

2018-08-20 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6154?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-6154:
--
Component/s: unit tests

> Transient failure TransactionsBounceTest.testBrokerFailure
> --
>
> Key: KAFKA-6154
> URL: https://issues.apache.org/jira/browse/KAFKA-6154
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Reporter: Jason Gustafson
>Priority: Major
>
> {code}
> java.lang.AssertionError: Out of order messages detected 
> expected: 34, 37, 43, 44, 45, 47, 48, 52, 54, 55, 56, 59, 60, 61, 63, 64, 70, 
> 71, 72, 76, 79, 80, 82, 88, 89, 92, 93, 96, 98, 99, 100, ... 2212, 2213, 2218, 
> [several hundred intermediate offsets elided; the archived message is 
> truncated mid-list, before the closing {code} tag]

[jira] [Updated] (KAFKA-5887) Enable findBugs (or equivalent) when building with Java 9

2018-08-20 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5887?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-5887:
--
Component/s: build

> Enable findBugs (or equivalent) when building with Java 9
> -
>
> Key: KAFKA-5887
> URL: https://issues.apache.org/jira/browse/KAFKA-5887
> Project: Kafka
>  Issue Type: Task
>  Components: build
>Reporter: Ismael Juma
>Priority: Major
> Fix For: 2.1.0
>
>
> findBugs doesn't support Java 9 and it seems to be abandonware at this point:
> https://github.com/findbugsproject/findbugs/issues/105
> https://github.com/gradle/gradle/issues/720
> It has been forked, but the fork requires Java 8:
> https://github.com/spotbugs/spotbugs
> https://github.com/spotbugs/spotbugs/blob/master/docs/migration.rst#findbugs-gradle-plugin
> We should migrate once we move to Java 8 if spotbugs is still being actively 
> developed and findBugs continues to be dead.
> Additional tasks:
> 1. Remove the code that disables the Gradle plugin for findBugs (or spotbugs) 
> when building with Java 9.
> 2. Enable the findBugs plugin in Jenkins for the relevant builds:
> https://builds.apache.org/job/kafka-trunk-jdk9/
> https://builds.apache.org/job/kafka-pr-jdk9-scala2.12/
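Once the migration happens, the Gradle wiring would look roughly like this (a sketch; the plugin id comes from the spotbugs-gradle-plugin docs linked above, and the versions are placeholders):

```groovy
// build.gradle -- hypothetical snippet; versions are placeholders
plugins {
    id 'com.github.spotbugs' version '1.6.5'
}

spotbugs {
    toolVersion = '3.1.8'  // the SpotBugs fork replacing findBugs
}
```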





[jira] [Updated] (KAFKA-7202) Support multiple auto-generated docs formats

2018-08-20 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7202?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-7202:
--
Component/s: documentation

> Support multiple auto-generated docs formats
> 
>
> Key: KAFKA-7202
> URL: https://issues.apache.org/jira/browse/KAFKA-7202
> Project: Kafka
>  Issue Type: New Feature
>  Components: documentation
>Reporter: Joel Hamill
>Priority: Major
>
> Currently the configuration parameters for Confluent/Kafka are autogenerated 
> as HTML (and hosted at 
> [https://kafka.apache.org/documentation/#configuration]). This request is to 
> expand this to support other formats (e.g. RST) so that they can be easily 
> leveraged by other authoring formats.





[jira] [Updated] (KAFKA-6823) Transient failure in DynamicBrokerReconfigurationTest.testThreadPoolResize

2018-08-20 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-6823:
--
Component/s: unit tests

> Transient failure in DynamicBrokerReconfigurationTest.testThreadPoolResize
> --
>
> Key: KAFKA-6823
> URL: https://issues.apache.org/jira/browse/KAFKA-6823
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Reporter: Anna Povzner
>Priority: Major
>
> Saw in my PR build (JDK 10 and Scala 2.12):
> *15:58:46* kafka.server.DynamicBrokerReconfigurationTest > 
> testThreadPoolResize FAILED
> *15:58:46*     java.lang.AssertionError: Invalid threads: expected 6, got 7: 
> List(ReplicaFetcherThread-0-0, ReplicaFetcherThread-0-1, 
> ReplicaFetcherThread-0-2, ReplicaFetcherThread-0-0, ReplicaFetcherThread-0-2, 
> ReplicaFetcherThread-1-0, ReplicaFetcherThread-0-1)
> *15:58:46*         at org.junit.Assert.fail(Assert.java:88)
> *15:58:46*         at org.junit.Assert.assertTrue(Assert.java:41)
> *15:58:46*         at 
> kafka.server.DynamicBrokerReconfigurationTest.verifyThreads(DynamicBrokerReconfigurationTest.scala:1147)
> *15:58:46*         at 
> kafka.server.DynamicBrokerReconfigurationTest.maybeVerifyThreadPoolSize$1(DynamicBrokerReconfigurationTest.scala:412)
> *15:58:46*         at 
> kafka.server.DynamicBrokerReconfigurationTest.resizeThreadPool$1(DynamicBrokerReconfigurationTest.scala:431)
> *15:58:46*         at 
> kafka.server.DynamicBrokerReconfigurationTest.reducePoolSize$1(DynamicBrokerReconfigurationTest.scala:417)
> *15:58:46*         at 
> kafka.server.DynamicBrokerReconfigurationTest.$anonfun$testThreadPoolResize$3(DynamicBrokerReconfigurationTest.scala:440)
> *15:58:46*         at 
> scala.collection.immutable.Range.foreach$mVc$sp(Range.scala:156)
> *15:58:46*         at 
> kafka.server.DynamicBrokerReconfigurationTest.verifyThreadPoolResize$1(DynamicBrokerReconfigurationTest.scala:439)
> *15:58:46*         at 
> kafka.server.DynamicBrokerReconfigurationTest.testThreadPoolResize(DynamicBrokerReconfigurationTest.scala:453)





[jira] [Updated] (KAFKA-6465) Add a metrics for the number of records per log

2018-08-20 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6465?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-6465:
--
Component/s: log

> Add a metrics for the number of records per log
> ---
>
> Key: KAFKA-6465
> URL: https://issues.apache.org/jira/browse/KAFKA-6465
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Reporter: Ivan Babrou
>Priority: Major
>
> Currently there are log metrics for:
>  * Start offset
>  * End offset
>  * Size in bytes
>  * Number of segments
> I propose to add another metric to track number of record batches in the log. 
> This should provide operators with an idea of how much batching is happening 
> on the producers. Having this metric in one place seems easier than scraping 
> the metric from each producer.
> Having an absolute counter may be infeasible (batches are not assigned 
> sequential IDs), but a gauge should be ok. Average batch size can be 
> calculated as (end offset - start offset) / number of batches. This will be 
> heavily skewed for logs with long retention, though.
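The derived average described in the last paragraph is simple arithmetic; as a sketch (the helper name is illustrative):

```java
public class BatchMath {
    // Average records per batch = (endOffset - startOffset) / batchCount,
    // combining the proposed batch-count gauge with the existing offset metrics.
    public static double avgBatchSize(long startOffset, long endOffset, long batchCount) {
        return (double) (endOffset - startOffset) / batchCount;
    }

    public static void main(String[] args) {
        // e.g. 10,000 records spread over 2,500 batches -> 4 records per batch
        System.out.println(avgBatchSize(0, 10_000, 2_500)); // 4.0
    }
}
```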





[jira] [Updated] (KAFKA-7166) Links are really difficult to see - insufficient contrast

2018-08-20 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7166?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-7166:
--
Component/s: documentation

> Links are really difficult to see - insufficient contrast
> -
>
> Key: KAFKA-7166
> URL: https://issues.apache.org/jira/browse/KAFKA-7166
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation
>Reporter: Sebb
>Priority: Major
>
> {color:#0b6d88}The link colour on the pages is very similar to the text 
> colour.{color}
> This works OK where the links form part of a menu system.
>  In that case it's obvious that the text items must be links.
> However it does not provide sufficient contrast where the links are part of 
> body text.
>  This is particularly true when a whole section of text is a link, so there 
> is no contrast within the text.
> Users should not have to search to find the links.





[jira] [Updated] (KAFKA-7178) Is kafka compatible with zookeeper 3.5.x ?

2018-08-20 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-7178:
--
Component/s: zkclient

> Is kafka compatible with zookeeper 3.5.x ?
> --
>
> Key: KAFKA-7178
> URL: https://issues.apache.org/jira/browse/KAFKA-7178
> Project: Kafka
>  Issue Type: Improvement
>  Components: zkclient
>Reporter: fwq
>Priority: Major
>
> Hi, all
> I want to know whether Kafka versions (0.8.x, 0.9.x, 0.10.x, 0.11.x, 1.x) are 
> compatible with zookeeper 3.5.x and its dynamic reconfiguration feature.
> Some references: 
> https://serverfault.com/questions/854650/kafka-compatible-with-zookeeper-3-5-feature-rebalancing-client-connections





[jira] [Updated] (KAFKA-7309) Upgrade Jacoco for Java 11 support

2018-08-20 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7309?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-7309:
--
Component/s: packaging

> Upgrade Jacoco for Java 11 support
> --
>
> Key: KAFKA-7309
> URL: https://issues.apache.org/jira/browse/KAFKA-7309
> Project: Kafka
>  Issue Type: Sub-task
>  Components: packaging
>Reporter: Ismael Juma
>Priority: Major
>






[jira] [Updated] (KAFKA-7276) Consider using re2j to speed up regex operations

2018-08-20 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7276?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-7276:
--
Component/s: packaging

> Consider using re2j to speed up regex operations
> 
>
> Key: KAFKA-7276
> URL: https://issues.apache.org/jira/browse/KAFKA-7276
> Project: Kafka
>  Issue Type: Task
>  Components: packaging
>Reporter: Ted Yu
>Priority: Major
>
> https://github.com/google/re2j
> re2j claims to do linear time regular expression matching in Java.
> Its benefit is most obvious for deeply nested regex (such as a | b | c | d).
> We should consider using re2j to speed up regex operations.
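re2j's API deliberately mirrors java.util.regex (com.google.re2j.Pattern / Matcher), so adoption is mostly an import swap. The sketch below uses the standard library, since re2j is a third-party dependency; with re2j only the import line would change:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class AltMatch {
    public static void main(String[] args) {
        // With re2j this would be: import com.google.re2j.Pattern; -- same calls.
        Pattern p = Pattern.compile("a|b|c|d");  // the alternation shape cited above
        Matcher m = p.matcher("xbz");
        System.out.println(m.find()); // true
    }
}
```

re2j guarantees linear-time matching by construction (no backtracking), at the cost of dropping features like backreferences, so any migration needs an audit of the patterns in use.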





[jira] [Updated] (KAFKA-7314) MirrorMaker example in documentation does not work

2018-08-20 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7314?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-7314:
--
Component/s: mirrormaker

> MirrorMaker example in documentation does not work
> --
>
> Key: KAFKA-7314
> URL: https://issues.apache.org/jira/browse/KAFKA-7314
> Project: Kafka
>  Issue Type: Bug
>  Components: mirrormaker
>Reporter: John Wilkinson
>Priority: Critical
>
> Kafka MirrorMaker as described in the documentation 
> [here|https://kafka.apache.org/documentation/#basic_ops_mirror_maker] does 
> not work. Instead of pulling messages from the consumer-defined 
> {{bootstrap.servers}} and pushing to the producer-defined 
> {{bootstrap.servers}}, it consumes and produces on the same topic on the 
> same host repeatedly.
> To replicate, set up two instances of kafka following 
> [this|https://docs.confluent.io/current/installation/docker/docs/installation/recipes/single-node-client.html]
>  guide. The schema registry and rest proxy are unnecessary. 
> [Here|https://hub.docker.com/r/confluentinc/cp-kafka/] is the DockerHub page 
> for the image.  The Kafka version is 2.0.0.
> Using those two kafka instances, go {{docker exec}} into one and set up the 
> {{consumer.properties}} and the {{producer.properties}} following the 
> MirrorMaker guide.
> Oddly, if you put in garbage unresolvable server addresses in the config, 
> there will be an error, despite the configs not getting used.
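For context, the two files the guide has you create have roughly this shape (the addresses here are placeholders, not taken from the report):

```properties
# consumer.properties -- the cluster MirrorMaker should *read* from
bootstrap.servers=source-kafka:9092
group.id=mirror-maker-group

# producer.properties -- the cluster MirrorMaker should *write* to
bootstrap.servers=target-kafka:9092
```

The reported bug is that MirrorMaker appears to consume and produce against the same cluster regardless of these settings, even though it validates the addresses.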





[jira] [Updated] (KAFKA-7282) Failed to read `log header` from file channel

2018-08-20 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7282?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-7282:
--
Component/s: log

> Failed to read `log header` from file channel
> -
>
> Key: KAFKA-7282
> URL: https://issues.apache.org/jira/browse/KAFKA-7282
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Affects Versions: 0.11.0.2, 1.1.1, 2.0.0
> Environment: Linux
>Reporter: Alastair Munro
>Priority: Major
>
> Full stack trace:
> {code:java}
> [2018-08-13 11:22:01,635] ERROR [ReplicaManager broker=2] Error processing 
> fetch operation on partition segmenter-evt-v1-14, offset 96745 
> (kafka.server.ReplicaManager)
> org.apache.kafka.common.KafkaException: java.io.EOFException: Failed to read 
> `log header` from file channel `sun.nio.ch.FileChannelImpl@6e6d8ddd`. 
> Expected to read 17 bytes, but reached end of file after reading 0 bytes. 
> Started read from position 25935.
> at 
> org.apache.kafka.common.record.RecordBatchIterator.makeNext(RecordBatchIterator.java:40)
> at 
> org.apache.kafka.common.record.RecordBatchIterator.makeNext(RecordBatchIterator.java:24)
> at 
> org.apache.kafka.common.utils.AbstractIterator.maybeComputeNext(AbstractIterator.java:79)
> at 
> org.apache.kafka.common.utils.AbstractIterator.hasNext(AbstractIterator.java:45)
> at 
> org.apache.kafka.common.record.FileRecords.searchForOffsetWithSize(FileRecords.java:286)
> at kafka.log.LogSegment.translateOffset(LogSegment.scala:254)
> at kafka.log.LogSegment.read(LogSegment.scala:277)
> at kafka.log.Log$$anonfun$read$2.apply(Log.scala:1159)
> at kafka.log.Log$$anonfun$read$2.apply(Log.scala:1114)
> at kafka.log.Log.maybeHandleIOException(Log.scala:1837)
> at kafka.log.Log.read(Log.scala:1114)
> at 
> kafka.server.ReplicaManager.kafka$server$ReplicaManager$$read$1(ReplicaManager.scala:912)
> at 
> kafka.server.ReplicaManager$$anonfun$readFromLocalLog$1.apply(ReplicaManager.scala:974)
> at 
> kafka.server.ReplicaManager$$anonfun$readFromLocalLog$1.apply(ReplicaManager.scala:973)
> at 
> scala.collection.mutable.ResizableArray$class.foreach(ResizableArray.scala:59)
> at scala.collection.mutable.ArrayBuffer.foreach(ArrayBuffer.scala:48)
> at kafka.server.ReplicaManager.readFromLocalLog(ReplicaManager.scala:973)
> at kafka.server.ReplicaManager.readFromLog$1(ReplicaManager.scala:802)
> at kafka.server.ReplicaManager.fetchMessages(ReplicaManager.scala:815)
> at kafka.server.KafkaApis.handleFetchRequest(KafkaApis.scala:678)
> at kafka.server.KafkaApis.handle(KafkaApis.scala:107)
> at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:69)
> at java.lang.Thread.run(Thread.java:748)
> {code}





[jira] [Updated] (KAFKA-6763) Consider using direct byte buffers in SslTransportLayer

2018-08-20 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6763?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-6763:
--
Component/s: network

> Consider using direct byte buffers in SslTransportLayer
> ---
>
> Key: KAFKA-6763
> URL: https://issues.apache.org/jira/browse/KAFKA-6763
> Project: Kafka
>  Issue Type: Improvement
>  Components: network
>Reporter: Ismael Juma
>Priority: Minor
>  Labels: performance, tls
>
> We use heap byte buffers in SslTransportLayer. For netReadBuffer and 
> netWriteBuffer, it means that the NIO layer has to copy to/from a native 
> buffer before it can write/read to the socket. It would be good to test if 
> switching to direct byte buffers improves performance. We can't be sure as 
> the benefit of avoiding the copy could be offset by the specifics of the 
> operations we perform on netReadBuffer, netWriteBuffer and appReadBuffer.
> We should benchmark produce and consume performance and try a few 
> combinations of direct/heap byte buffers for netReadBuffer, netWriteBuffer 
> and appReadBuffer (the latter should probably remain as a heap byte buffer, 
> but no harm in testing it too).
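For reference, the two allocation styles being compared are standard NIO, nothing Kafka-specific:

```java
import java.nio.ByteBuffer;

public class Buffers {
    public static void main(String[] args) {
        // Heap buffer: backed by a Java byte[]; NIO must copy it to/from a
        // native buffer before socket I/O.
        ByteBuffer heap = ByteBuffer.allocate(16 * 1024);
        // Direct buffer: native memory; socket reads/writes can skip that copy,
        // but allocation is costlier and the memory sits outside the Java heap.
        ByteBuffer direct = ByteBuffer.allocateDirect(16 * 1024);
        System.out.println(heap.isDirect());   // false
        System.out.println(direct.isDirect()); // true
    }
}
```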





[jira] [Updated] (KAFKA-7310) Fix SSL tests when running with Java 11

2018-08-20 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7310?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-7310:
--
Component/s: unit tests
 packaging

> Fix SSL tests when running with Java 11
> ---
>
> Key: KAFKA-7310
> URL: https://issues.apache.org/jira/browse/KAFKA-7310
> Project: Kafka
>  Issue Type: Sub-task
>  Components: packaging, unit tests
>Reporter: Ismael Juma
>Priority: Critical
> Fix For: 2.1.0
>
>
> * Many SSL tests in clients fail. Probably related to the TLS 1.3 changes in 
> Java 11.
> * Many SSL (and some SASL) tests in Core fail. Maybe the same underlying 
> reason as the client failures, but I didn't investigate.





[jira] [Updated] (KAFKA-7232) Allow kafka-topics.sh to take brokerid as parameter to show partitions associated with it

2018-08-20 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7232?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-7232:
--
Component/s: tools

> Allow kafka-topics.sh to take brokerid as parameter to show partitions 
> associated with it
> -
>
> Key: KAFKA-7232
> URL: https://issues.apache.org/jira/browse/KAFKA-7232
> Project: Kafka
>  Issue Type: Improvement
>  Components: tools
>Reporter: Ratish Ravindran
>Priority: Minor
>
> Currently with kafka-topics.sh if we want to get the list of partitions 
> associated with a specific broker irrespective of whether it is leader or 
> replica, we pipe the output and then do grep on it.
> I am proposing the change to add option in TopicCommand.scala to pass the 
> broker id.
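The grep-based workaround the report refers to looks roughly like this; the `--describe` output below is fabricated sample text standing in for a live cluster:

```shell
# Today's workaround: pipe `kafka-topics.sh --describe` through grep to find
# partitions hosted on a given broker (3 here, leader or replica).
describe_output='Topic: demo Partition: 0 Leader: 1 Replicas: 1,2,3 Isr: 1,2,3
Topic: demo Partition: 1 Leader: 2 Replicas: 2,4,5 Isr: 2,4,5'
# Word-boundary-style match so broker 3 does not also match 13 or 30.
printf '%s\n' "$describe_output" | grep -E 'Replicas: ([0-9]+,)*3(,| |$)'
# prints only the Partition: 0 line
```

A `--broker-id` option in TopicCommand would make this filtering a first-class, less error-prone operation.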





[jira] [Updated] (KAFKA-1175) Hierarchical Topics

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-1175?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-1175:
--
Component/s: core

> Hierarchical Topics
> ---
>
> Key: KAFKA-1175
> URL: https://issues.apache.org/jira/browse/KAFKA-1175
> Project: Kafka
>  Issue Type: New Feature
>  Components: core
>Reporter: Pradeep Gollakota
>Priority: Major
>
> Allow for creation of hierarchical topics so that related topics can be 
> grouped together.





[jira] [Updated] (KAFKA-1450) check invalid leader in a more robust way

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-1450?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-1450:
--
Component/s: core

> check invalid leader in a more robust way
> -
>
> Key: KAFKA-1450
> URL: https://issues.apache.org/jira/browse/KAFKA-1450
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.8.2.0
>Reporter: Jun Rao
>Priority: Major
> Attachments: KAFKA-1450.patch
>
>
> In MetadataResponse, we only treat -1 as an invalid leader.





[jira] [Updated] (KAFKA-1206) allow Kafka to start from a resource negotiator system

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-1206?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-1206:
--
Component/s: packaging

> allow Kafka to start from a resource negotiator system
> --
>
> Key: KAFKA-1206
> URL: https://issues.apache.org/jira/browse/KAFKA-1206
> Project: Kafka
>  Issue Type: Bug
>  Components: packaging
>Reporter: Joe Stein
>Priority: Major
>  Labels: mesos
> Attachments: KAFKA-1206_2014-01-16_00:40:30.patch
>
>
> We need a generic implementation to hold the property information for 
> brokers, producers and consumers.  We want the resource negotiator to store 
> this information however it wants and respond with a 
> java.util.Properties.  This can then be used in Kafka.scala as the 
> serverConfigs for the KafkaServerStartable.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-2079) Support exhibitor

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-2079?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-2079:
--
Component/s: zkclient
 packaging

> Support exhibitor
> -
>
> Key: KAFKA-2079
> URL: https://issues.apache.org/jira/browse/KAFKA-2079
> Project: Kafka
>  Issue Type: Improvement
>  Components: packaging, zkclient
>Reporter: Aaron Dixon
>Priority: Major
>
> Exhibitor (https://github.com/Netflix/exhibitor) is a discovery/monitoring 
> solution for managing Zookeeper clusters. It supports use cases like 
> discovery, node replacements and auto-scaling of Zk cluster hosts (so you 
> don't have to manage a fixed set of Zk hosts--especially useful in cloud 
> environments.)
> The easiest way for Kafka to support connection to Zk clusters via exhibitor 
> is to use curator as its client. There is already a separate ticket for this: 
> KAFKA-873



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-1754) KOYA - Kafka on YARN

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-1754?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-1754:
--
Component/s: packaging

> KOYA - Kafka on YARN
> 
>
> Key: KAFKA-1754
> URL: https://issues.apache.org/jira/browse/KAFKA-1754
> Project: Kafka
>  Issue Type: New Feature
>  Components: packaging
>Reporter: Thomas Weise
>Priority: Major
> Attachments: DT-KOYA-Proposal- JIRA.pdf
>
>
> YARN (Hadoop 2.x) has enabled clusters to be used for a variety of workloads, 
> emerging as a distributed operating system for big data applications. 
> Initiatives are under way to bring long running services under the YARN 
> umbrella, leveraging it for centralized resource management and operations 
> ([YARN-896] and examples such as HBase, Accumulo or Memcached through 
> Slider). This JIRA is to propose KOYA (Kafka On YARN), a YARN application 
> master to launch and manage Kafka clusters running on YARN. Brokers will use 
> resources allocated through YARN with support for recovery, monitoring etc. 
> Please see attached for more details.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-1207) Launch Kafka from within Apache Mesos

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-1207?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-1207:
--
Component/s: packaging

> Launch Kafka from within Apache Mesos
> -
>
> Key: KAFKA-1207
> URL: https://issues.apache.org/jira/browse/KAFKA-1207
> Project: Kafka
>  Issue Type: Bug
>  Components: packaging
>Reporter: Joe Stein
>Priority: Major
>  Labels: mesos
> Attachments: KAFKA-1207.patch, KAFKA-1207_2014-01-19_00:04:58.patch, 
> KAFKA-1207_2014-01-19_00:48:49.patch
>
>
> There are a few components to this.
> 1) The Framework: This is going to be responsible for starting up and 
> managing the failover of brokers within the Mesos cluster. It will take 
> some Kafka-focused parameters for launching new replica brokers and 
> moving topics and partitions around based on what is happening in the grid 
> over time.
> 2) The Scheduler: This is what is going to ask for resources for Kafka 
> brokers (new ones, replacement ones, commissioned ones) and perform other 
> operations such as stopping tasks (decommissioning brokers). I think this 
> should also expose a user interface (or at least a REST API) for producers 
> and consumers, so we can have producers and consumers run inside the Mesos 
> cluster if folks want (just add the jar).
> 3) The Executor: This is the task launcher. It launches tasks and kills 
> them off.
> 4) Sharing data between Scheduler and Executor: I looked at a few 
> implementations of this. I like parts of the Storm implementation but think 
> using the environment variable 
> ExecutorInfo.CommandInfo.Environment.Variables[] is the best shot. We can 
> have a command line bin/kafka-mesos-scheduler-start.sh that would build the 
> contrib project if not already built and support conf/server.properties to 
> start.
> The Framework and operating Scheduler would run on an administrative node. 
> I am probably going to hook Apache Curator into it so it can do its own 
> failover to another follower. Running more than 2 should be sufficient as 
> long as it can bring back its state (e.g. from zk). I think we can add this 
> in afterwards, once everything is working.
> Additional detail can be found on the Wiki page 
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=38570672
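Point 4 above, sharing configuration through the executor's environment variables, boils down to a serialize/deserialize round trip. A hedged sketch (the variable name and JSON encoding are assumptions for illustration, not what the proposal specifies):

```python
import json

ENV_KEY = "KAFKA_MESOS_PROPERTIES"  # hypothetical variable name

def publish_config(env, props):
    # Scheduler side: pack broker properties into one environment variable.
    env[ENV_KEY] = json.dumps(props)

def read_config(env):
    # Executor side: recover the properties before launching the broker.
    return json.loads(env.get(ENV_KEY, "{}"))
```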



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-10) Kafka deployment on EC2 should be WHIRR based, instead of current contrib/deploy code based solution

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-10:

Component/s: packaging

> Kafka deployment on EC2 should be WHIRR based, instead of current 
> contrib/deploy code based solution
> 
>
> Key: KAFKA-10
> URL: https://issues.apache.org/jira/browse/KAFKA-10
> Project: Kafka
>  Issue Type: Improvement
>  Components: packaging
>Affects Versions: 0.6
>Priority: Major
>
> Apache Whirr is a set of libraries for running cloud services: 
> http://incubator.apache.org/whirr/ 
> It is desirable that Kafka's integration with EC2 be Whirr based, rather than 
> the code based solution we currently have in contrib/deploy. 
> The code in contrib/deploy will be deleted in the 0.6 release.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-5153) KAFKA Cluster : 0.10.2.0 : Servers Getting disconnected : Service Impacting

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang resolved KAFKA-5153.
---
   Resolution: Information Provided
Fix Version/s: 0.11.0.1

Upgrading fixed the problem based on the last comment.

> KAFKA Cluster : 0.10.2.0 : Servers Getting disconnected : Service Impacting
> ---
>
> Key: KAFKA-5153
> URL: https://issues.apache.org/jira/browse/KAFKA-5153
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.2.0, 0.11.0.0
> Environment: RHEL 6
> Java Version  1.8.0_91-b14
>Reporter: Arpan
>Priority: Critical
>  Labels: reliability
> Fix For: 0.11.0.1
>
> Attachments: ThreadDump_1493564142.dump, ThreadDump_1493564177.dump, 
> ThreadDump_1493564249.dump, server.properties, server_1_72server.log, 
> server_2_73_server.log, server_3_74Server.log
>
>
> Hi Team, 
> I was earlier referring to issue KAFKA-4477 because the problem I am facing 
> is similar. I tried to find the same reference in the release docs as well, 
> but did not see anything in 0.10.1.1 or 0.10.2.0. I am currently using 
> 2.11_0.10.2.0.
> I have a 3 node cluster for Kafka, and a cluster for ZK as well on the same 
> set of servers, in cluster mode. We have around 240GB of data 
> transferred through Kafka every day. What we are observing is the server 
> disconnecting from the cluster and the ISR shrinking, and it starts impacting 
> service.
> I have also observed the file descriptor count increasing a bit; in normal 
> circumstances we have not observed an FD count of more than 500, but when the 
> issue started we were observing it in the range of 650-700 on all 3 servers. 
> Attaching thread dumps of all 3 servers from when we started facing the issue 
> recently.
> The issue vanishes once you bounce the nodes, and the setup does not last 
> more than 5 days without this issue. Attaching server logs as well.
> Kindly let me know if you need any additional information. Attaching 
> server.properties as well for one of the servers (it's similar on all 3 
> servers).



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-3346) Rename "Mode" to "SslMode"

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3346?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-3346:
--
Component/s: security

> Rename "Mode" to "SslMode"
> --
>
> Key: KAFKA-3346
> URL: https://issues.apache.org/jira/browse/KAFKA-3346
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Reporter: Gwen Shapira
>Priority: Major
>
> In the channel builders, the Mode enum is undocumented, so it is unclear that 
> it is used to signify whether the connection is for SSL client or SSL server.
> I suggest renaming to SslMode (although adding documentation will be ok too, 
> if people object to the rename)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-3458) Selector should throw InterruptException when interrupted

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3458?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-3458:
--
Component/s: network

> Selector should throw InterruptException when interrupted
> -
>
> Key: KAFKA-3458
> URL: https://issues.apache.org/jira/browse/KAFKA-3458
> Project: Kafka
>  Issue Type: Bug
>  Components: network
>Affects Versions: 0.9.0.1
>Reporter: Sebastian Rühl
>Priority: Major
>
> Similar to [KAFKA-2704]:
> org.apache.kafka.common.network.Selector does not throw InterruptException 
> when interrupted
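The requested behavior, mirroring KAFKA-2704, is to translate a low-level interruption into a typed InterruptException so callers can react to it. A rough Python analogue of the pattern (the class and wrapper are illustrative, not the actual Selector code):

```python
class InterruptException(RuntimeError):
    """Analogue of org.apache.kafka.common.errors.InterruptException."""

def select_or_raise(selector, timeout):
    # Wrap the blocking call so interruption surfaces as a typed exception
    # instead of escaping as a raw interrupt.
    try:
        return selector.select(timeout)
    except KeyboardInterrupt as exc:
        raise InterruptException("selector interrupted") from exc
```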



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-4610) getting error:Batch containing 3 record(s) expired due to timeout while requesting metadata from brokers for test2R2P2-1

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4610?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-4610:
--
Component/s: producer 

> getting error:Batch containing 3 record(s) expired due to timeout while 
> requesting metadata from brokers for test2R2P2-1
> 
>
> Key: KAFKA-4610
> URL: https://issues.apache.org/jira/browse/KAFKA-4610
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
> Environment: Dev
>Reporter: sandeep kumar singh
>Priority: Major
>
> I am getting the below error when running the producer client, which takes 
> messages from an input file kafka_message.log. This log file is filled at 10 
> records per second, each message of length 4096.
> error - 
> [2017-01-09 14:45:24,813] ERROR Error when sending message to topic test2R2P2 
> with key: null, value: 4096 bytes with error: 
> (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
> org.apache.kafka.common.errors.TimeoutException: Batch containing 3 record(s) 
> expired due to timeout while requesting metadata from brokers for test2R2P2-0
> [2017-01-09 14:45:24,816] ERROR Error when sending message to topic test2R2P2 
> with key: null, value: 4096 bytes with error: 
> (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
> org.apache.kafka.common.errors.TimeoutException: Batch containing 3 record(s) 
> expired due to timeout while requesting metadata from brokers for test2R2P2-0
> [2017-01-09 14:45:24,816] ERROR Error when sending message to topic test2R2P2 
> with key: null, value: 4096 bytes with error: 
> (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
> org.apache.kafka.common.errors.TimeoutException: Batch containing 3 record(s) 
> expired due to timeout while requesting metadata from brokers for test2R2P2-0
> [2017-01-09 14:45:24,816] ERROR Error when sending message to topic test2R2P2 
> with key: null, value: 4096 bytes with error: 
> (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
> org.apache.kafka.common.errors.TimeoutException: Batch containing 3 record(s) 
> expired due to timeout while requesting metadata from brokers for test2R2P2-0
> [2017-01-09 14:45:24,816] ERROR Error when sending message to topic test2R2P2 
> with key: null, value: 4096 bytes with error: 
> (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
> org.apache.kafka.common.errors.TimeoutException: Batch containing 3 record(s) 
> expired due to timeout while requesting metadata from brokers for test2R2P2-0
> The command I run:
> $ bin/kafka-console-producer.sh --broker-list x.x.x.x:,x.x.x.x: 
> --batch-size 1000 --message-send-max-retries 10 --request-required-acks 1 
> --topic test2R2P2 <~/kafka_message.log
> there are 2 brokers running and the topic has partitions = 2 and replication 
> factor 2. 
> Could you please help me understand what that error means?
> Also, I see message loss when I manually restart one of the brokers while the 
> kafka-producer-perf-test command is running. Is this expected behavior?
> thanks
> Sandeep



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-5092) KIP 141 - ProducerRecord Interface Improvements

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5092?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-5092:
--
Component/s: producer 

> KIP 141 - ProducerRecord Interface Improvements
> ---
>
> Key: KAFKA-5092
> URL: https://issues.apache.org/jira/browse/KAFKA-5092
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Reporter: Stephane Maarek
>Priority: Major
>  Labels: kip
> Fix For: 2.1.0
>
>
> See KIP here: 
> https://cwiki.apache.org/confluence/display/KAFKA/KIP+141+-+ProducerRecord+Interface+Improvements



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-4385) producer is sending too many unnecessary meta data request if the meta data for a topic is not available and "auto.create.topics.enable" =false

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4385?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-4385:
--
Component/s: producer 

> producer is sending too many unnecessary meta data request if the meta data 
> for a topic is not available and "auto.create.topics.enable" =false
> ---
>
> Key: KAFKA-4385
> URL: https://issues.apache.org/jira/browse/KAFKA-4385
> Project: Kafka
>  Issue Type: Bug
>  Components: producer 
>Affects Versions: 0.9.0.1
>Reporter: Jun Yao
>Priority: Major
>
> All current kafka-client producer implementations (<= 0.10.1.0), 
> when sending a msg to a topic, will first check whether metadata for this 
> topic is available or not; 
> when it is not available, they call "metadata.requestUpdate()" and wait for 
> metadata from the brokers. 
> The thing is, inside "org.apache.kafka.clients.Metadata.awaitUpdate()" it is 
> already doing a "while (this.version <= lastVersion)" loop waiting for a new 
> version response, 
> so the loop inside 
> "org.apache.kafka.clients.producer.KafkaProducer.waitOnMetadata()" is not 
> needed. 
> When "auto.create.topics.enable" is false, sending msgs to a non-existent 
> topic will trigger too many metadata requests: every time a metadata response 
> is returned, because it does not contain the metadata for the topic, it is 
> going to try again until TimeoutException is thrown. 
> This is wasteful and sometimes causes too much overhead when unexpected msgs 
> arrive. 
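The version-gated wait described above (awaitUpdate blocking until this.version advances) can be sketched with a condition variable and a deadline, which is also what bounds the retry cost (names are illustrative, not the client's actual code):

```python
import threading
import time

class Metadata:
    """Sketch of a version-gated metadata cache with a bounded await."""

    def __init__(self):
        self._cond = threading.Condition()
        self.version = 0

    def update(self):
        # Called when a metadata response arrives: bump version, wake waiters.
        with self._cond:
            self.version += 1
            self._cond.notify_all()

    def await_update(self, last_version, timeout_ms):
        # Block until the version advances past last_version or time runs out.
        deadline = time.monotonic() + timeout_ms / 1000.0
        with self._cond:
            while self.version <= last_version:
                remaining = deadline - time.monotonic()
                if remaining <= 0:
                    raise TimeoutError("metadata not updated in time")
                self._cond.wait(remaining)
            return self.version
```

Because this loop already blocks until a strictly newer version arrives, an outer retry loop around it is redundant, which is the waste the report describes.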



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-4487) Tests should be run in Jenkins with INFO or DEBUG level

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4487?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-4487:
--
Component/s: unit tests

> Tests should be run in Jenkins with INFO or DEBUG level
> ---
>
> Key: KAFKA-4487
> URL: https://issues.apache.org/jira/browse/KAFKA-4487
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Reporter: Ismael Juma
>Priority: Major
> Fix For: 2.1.0
>
>
> KAFKA-4483 is an example of what can be missed by running them at ERROR 
> level. Worse than that would be subtle issues that would escape detection 
> altogether.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-4915) Don't include "error" in log messages at non-error levels

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4915?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-4915:
--
Component/s: logging

> Don't include "error" in log messages at non-error levels
> -
>
> Key: KAFKA-4915
> URL: https://issues.apache.org/jira/browse/KAFKA-4915
> Project: Kafka
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 0.10.1.1
>Reporter: Andrey
>Priority: Major
>
> Currently we see in logs:
> {code}
> WARN Error while fetching metadata with correlation id 659455 : 
> {kafka-connect-offsets=INVALID_REPLICATION_FACTOR} 
> (org.apache.kafka.clients.NetworkClient:600)
> {code}
> Expected:
> {code}
> WARN unable to fetch metadata with correlation id 659455 : 
> {kafka-connect-offsets=INVALID_REPLICATION_FACTOR} 
> (org.apache.kafka.clients.NetworkClient:600)
> {code}
> or
> {code}
> ERROR unable to fetch metadata with correlation id 659455 : 
> {kafka-connect-offsets=INVALID_REPLICATION_FACTOR} 
> (org.apache.kafka.clients.NetworkClient:600)
> {code}
> See for reference: 
> https://github.com/apache/kafka/blob/65650ba4dcba8a9729cb9cb6477a62a7b7c3714e/clients/src/main/java/org/apache/kafka/clients/NetworkClient.java#L723



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-5071) 2017-04-11 18:18:45.574 ERROR StreamThread:783 StreamThread-128 - stream-thread [StreamThread-128] Failed to commit StreamTask 0_304 state: org.apache.kafka.streams.erro

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5071?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-5071:
--
Component/s: streams

> 2017-04-11 18:18:45.574 ERROR StreamThread:783 StreamThread-128 - 
> stream-thread [StreamThread-128] Failed to commit StreamTask 0_304 state:  
> org.apache.kafka.streams.errors.ProcessorStateException: task [0_304] Failed 
> to flush state store fmdbt 
> -
>
> Key: KAFKA-5071
> URL: https://issues.apache.org/jira/browse/KAFKA-5071
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.0
> Environment: Linux
>Reporter: Dhana
>Priority: Major
> Attachments: RocksDB_Issue_commitFailedonFlush.7z
>
>
> Scenario: we use two consumers (application -puse10) on different machines, 
> using 400 partitions, 200 streams/consumer.
> config:
> bootstrap.servers=10.16.34.29:9092,10.16.35.134:9092,10.16.38.27:9092
> zookeeper.connect=10.16.34.29:2181,10.16.35.134:2181,10.16.38.27:2181
> num.stream.threads=200
> pulse.per.pdid.count.enable=false
> replication.factor=2
> state.dir=/opt/rocksdb
> max.poll.records=50
> session.timeout.ms=18
> request.timeout.ms=502
> max.poll.interval.ms=500
> fetch.max.bytes=102400
> max.partition.fetch.bytes=102400
> heartbeat.interval.ms = 6
> Logs - attached.
> Error:
> 2017-04-11 18:18:45.170 INFO  VehicleEventsStreamProcessor:219 
> StreamThread-32 - Current size of Treemap is 4 for pdid 
> skga11041730gedvcl2pdid2236
> 2017-04-11 18:18:45.170 INFO  VehicleEventsStreamProcessor:245 
> StreamThread-32 - GE to be processed pdid skga11041730gedvcl2pdid2236 and 
> uploadTimeStamp 2017-04-11 17:46:06.883
> 2017-04-11 18:18:45.175 INFO  VehicleEventsStreamProcessor:179 
> StreamThread-47 - Arrived GE uploadTimestamp 2017-04-11 17:46:10.911 pdid 
> skga11041730gedvcl2pdid2290
> 2017-04-11 18:18:45.176 INFO  VehicleEventsStreamProcessor:219 
> StreamThread-47 - Current size of Treemap is 4 for pdid 
> skga11041730gedvcl2pdid2290
> 2017-04-11 18:18:45.176 INFO  VehicleEventsStreamProcessor:245 
> StreamThread-47 - GE to be processed pdid skga11041730gedvcl2pdid2290 and 
> uploadTimeStamp 2017-04-11 17:46:06.911
> 2017-04-11 18:18:45.571 INFO  StreamThread:737 StreamThread-128 - 
> stream-thread [StreamThread-128] Committing all tasks because the commit 
> interval 3ms has elapsed
> 2017-04-11 18:18:45.571 INFO  StreamThread:775 StreamThread-128 - 
> stream-thread [StreamThread-128] Committing task StreamTask 0_304
> 2017-04-11 18:18:45.574 ERROR StreamThread:783 StreamThread-128 - 
> stream-thread [StreamThread-128] Failed to commit StreamTask 0_304 state: 
> org.apache.kafka.streams.errors.ProcessorStateException: task [0_304] Failed 
> to flush state store fmdbt
>   at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:325)
>   at 
> org.apache.kafka.streams.processor.internals.StreamTask$1.run(StreamTask.java:72)
>   at 
> org.apache.kafka.streams.processor.internals.StreamsMetricsImpl.measureLatencyNs(StreamsMetricsImpl.java:188)
>   at 
> org.apache.kafka.streams.processor.internals.StreamTask.commit(StreamTask.java:280)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.commitOne(StreamThread.java:777)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.commitAll(StreamThread.java:764)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.maybeCommit(StreamThread.java:739)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:661)
>   at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:368)
> Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error 
> while executing flush from store fmdbt
>   at 
> com.harman.analytics.stream.base.stores.HarmanRocksDBStore.flushInternal(HarmanRocksDBStore.java:353)
>   at 
> com.harman.analytics.stream.base.stores.HarmanRocksDBStore.flush(HarmanRocksDBStore.java:342)
>   at 
> com.harman.analytics.stream.base.stores.HarmanPersistentKVStore.flush(HarmanPersistentKVStore.java:72)
>   at 
> org.apache.kafka.streams.processor.internals.ProcessorStateManager.flush(ProcessorStateManager.java:323)
>   ... 8 more
> Caused by: org.rocksdb.RocksDBException: N
>   at org.rocksdb.RocksDB.flush(Native Method)
>   at org.rocksdb.RocksDB.flush(RocksDB.java:1642)
>   at 
> com.harman.analytics.stream.base.stores.HarmanRocksDBStore.flushInternal(HarmanRocksDBStore.java:351)
>   ... 11 more
> 2017-04-11 

[jira] [Updated] (KAFKA-5306) Official init.d scripts

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5306?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-5306:
--
Component/s: packaging

> Official init.d scripts
> ---
>
> Key: KAFKA-5306
> URL: https://issues.apache.org/jira/browse/KAFKA-5306
> Project: Kafka
>  Issue Type: Improvement
>  Components: packaging
>Affects Versions: 0.10.2.1
> Environment: Ubuntu 14.04
>Reporter: Shahar
>Priority: Minor
>
> It would be great to have an officially supported init.d script for starting 
> and stopping Kafka as a service.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-5480) Partition Leader may not be elected although there is one live replica in ISR

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5480?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-5480:
--
Component/s: replication

> Partition Leader may not be elected although there is one live replica in ISR
> -
>
> Key: KAFKA-5480
> URL: https://issues.apache.org/jira/browse/KAFKA-5480
> Project: Kafka
>  Issue Type: Bug
>  Components: replication
>Affects Versions: 0.9.0.1, 0.10.2.0
>Reporter: Pengwei
>Priority: Major
>  Labels: reliability
> Fix For: 2.1.0
>
>
> Recently we found a consumer blocking in poll() because the coordinator of 
> this consumer group is not available.
> Digging into the log, we found that the leaders of some of the 
> __consumer_offsets partitions are -1, so the coordinator is unavailable 
> because the leader is unavailable. The scenario is as follows:
> There are 3 brokers in the cluster, and the network of the cluster is not 
> stable.  At the beginning, the partition [__consumer_offsets,3] 
> Leader is 3, ISR is [3, 1, 2]
> 1. Broker 1 become the controller: 
> [2017-06-10 15:48:30,006] INFO [Controller 1]: Broker 1 starting become 
> controller state transition (kafka.controller.KafkaController)
> [2017-06-10 15:48:30,085] INFO [Controller 1]: Initialized controller epoch 
> to 8 and zk version 7 (kafka.controller.KafkaController)
> [2017-06-10 15:48:30,088] INFO [Controller 1]: Controller 1 incremented epoch 
> to 9 (kafka.controller.KafkaController)
> 2. Broker 2 soon becomes the controller, it is aware of all the brokers:
> [2017-06-10 15:48:30,936] INFO [Controller 2]: Broker 2 starting become 
> controller state transition (kafka.controller.KafkaController)
> [2017-06-10 15:48:30,936] INFO [Controller 2]: Initialized controller epoch 
> to 9 and zk version 8 (kafka.controller.KafkaController)
> [2017-06-10 15:48:30,943] INFO [Controller 2]: Controller 2 incremented epoch 
> to 10 (kafka.controller.KafkaController)
> [2017-06-10 15:48:31,574] INFO [Controller 2]: Currently active brokers in 
> the cluster: Set(1, 2, 3) (kafka.controller.KafkaController)
> [2017-06-10 15:48:31,574] INFO [Controller 2]: Currently shutting brokers in 
> the cluster: Set() (kafka.controller.KafkaController)
> So broker 2 thinks leader 3 is alive and does not need to elect a leader.
> 3. Broker 1 does not resign until 15:48:32, but it is not aware of broker 
> 3:
> [2017-06-10 15:48:31,470] INFO [Controller 1]: List of partitions to be 
> deleted: Map() (kafka.controller.KafkaController)
> [2017-06-10 15:48:31,470] INFO [Controller 1]: Currently active brokers in 
> the cluster: Set(1, 2) (kafka.controller.KafkaController)
> [2017-06-10 15:48:31,470] INFO [Controller 1]: Currently shutting brokers in 
> the cluster: Set() (kafka.controller.KafkaController)
> and change the Leader to broker 1:
> [2017-06-10 15:48:31,847] DEBUG [OfflinePartitionLeaderSelector]: Some broker 
> in ISR is alive for [__consumer_offsets,3]. Select 1 from ISR 1,2 to be the 
> leader. (kafka.controller.OfflinePartitionLeaderSelector)
> Broker 1 does not resign until 15:48:32, when the zk client becomes aware 
> that broker 2 has changed the controller's data:
> kafka.common.ControllerMovedException: Broker 1 received update metadata 
> request with correlation id 4 from an old controller 1 with epoch 9. Latest 
> known controller epoch is 10
>   at 
> kafka.server.ReplicaManager.maybeUpdateMetadataCache(ReplicaManager.scala:621)
>   at 
> kafka.server.KafkaApis.handleUpdateMetadataRequest(KafkaApis.scala:163)
>   at kafka.server.KafkaApis.handle(KafkaApis.scala:76)
>   at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:60)
>   at java.lang.Thread.run(Thread.java:748)
> [2017-06-10 15:48:32,307] INFO New leader is 2 
> (kafka.server.ZookeeperLeaderElector$LeaderChangeListener)
> 4. Then broker 2's controllerContext.partitionLeadershipInfo caches 
> leader 3 and ISR [3, 1, 2], but in zk the
> leader is 1 and the ISR is [1, 2]. It will keep this for a long time until 
> another zk event happens.
> 5. After 1 day, broker 2 received the broker 1's broker change event:
> [2017-06-12 21:43:18,287] INFO [BrokerChangeListener on Controller 2]: Broker 
> change listener fired for path /brokers/ids with children 2,3 
> (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
> [2017-06-12 21:43:18,293] INFO [BrokerChangeListener on Controller 2]: Newly 
> added brokers: , deleted brokers: 1, all live brokers: 2,3 
> (kafka.controller.ReplicaStateMachine$BrokerChangeListener)
> then broker 2 will invoke onBrokerFailure for the deleted broker 1, but 
> because the cached leader is 3, it will not change the partition to 
> OfflinePartition and will not change the leader in 
> partitionStateMachine.triggerOnlinePartitionStateChange().
> But in the 
> 

[jira] [Updated] (KAFKA-7110) Windowed changelog keys not deserialized properly by TimeWindowedSerde

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7110?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-7110:
--
Component/s: streams

> Windowed changelog keys not deserialized properly by TimeWindowedSerde
> --
>
> Key: KAFKA-7110
> URL: https://issues.apache.org/jira/browse/KAFKA-7110
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Reporter: Shawn Nguyen
>Priority: Major
>
> Currently the TimeWindowedSerde does not deserialize the windowed keys from a 
> changelog topic properly. There are a few assumptions made in the 
> TimeWindowedDeserializer that prevent the changelog windowed keys from being 
> correctly deserialized.
> 1) In the from method of WindowKeySchema (called in deserialize in 
> TimeWindowedDeserializer), we extract the window from the binary key, but we 
> call getLong(binaryKey.length -TIMESTAMP_SIZE). However, the changelog for 
> ChangeLoggingWindowBytesStore will log the windowed key as:
>  
> {noformat}
> changeLogger.logChange(WindowKeySchema.toStoreKeyBinary(key, timestamp, 
> maybeUpdateSeqnumForDups()), value);
> {noformat}
>  
> In toStoreKeyBinary, we store the key in 
> {noformat}
> final ByteBuffer buf = ByteBuffer.allocate(serializedKey.length + 
> TIMESTAMP_SIZE + SEQNUM_SIZE);
> {noformat}
> with the seqnum (used for de-duping). So the eventual result is that when we 
> deserialize, we do not account for the windowed changelog key having a 
> seqnum, and the window extracted will be gibberish values since the bytes 
> extracted won't be aligned.
> The fix here is to introduce a new Serde in WindowSerdes that explicitly 
> handles windowed changelog input topics. 
>  
> 2) In the constructor of TimeWindowedDeserializer, the windowSize is fixed to 
> Long.MAX_VALUE:
>  
> {noformat}
> // TODO: fix this part as last bits of KAFKA-4468
> public TimeWindowedDeserializer(final Deserializer<T> inner) {
>     this(inner, Long.MAX_VALUE);
> }
>
> public TimeWindowedDeserializer(final Deserializer<T> inner, final long windowSize) {
>     this.inner = inner;
>     this.windowSize = windowSize;
> }
> {noformat}
> This will cause the end times to be gibberish when we extract the window, 
> since the windowSize is used to derive the end time from the start time in:
>  
> {noformat}
> public static <K> Windowed<K> from(final byte[] binaryKey, final long windowSize,
>                                    final Deserializer<K> deserializer, final String topic) {
>     final byte[] bytes = new byte[binaryKey.length - TIMESTAMP_SIZE];
>     System.arraycopy(binaryKey, 0, bytes, 0, bytes.length);
>     final K key = deserializer.deserialize(topic, bytes);
>     final Window window = extractWindow(binaryKey, windowSize);
>     return new Windowed<>(key, window);
> }
>
> private static Window extractWindow(final byte[] binaryKey, final long windowSize) {
>     final ByteBuffer buffer = ByteBuffer.wrap(binaryKey);
>     final long start = buffer.getLong(binaryKey.length - TIMESTAMP_SIZE);
>     return timeWindowForSize(start, windowSize);
> }
> {noformat}
> So in the new serde, we will make windowSize a constructor param that can be 
> supplied.
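A quick arithmetic sketch of why the hard-coded Long.MAX_VALUE default produces nonsense end times, assuming (as in the snippet above) that the window end is computed as start plus windowSize. The method name is illustrative, not the actual Kafka API:

```java
public class WindowSizeOverflow {
    // Illustrative stand-in for timeWindowForSize: end = start + windowSize.
    static long windowEnd(long start, long windowSize) {
        return start + windowSize; // silently overflows for huge window sizes
    }

    public static void main(String[] args) {
        // With the Long.MAX_VALUE default, any positive start overflows
        // to a negative "end", i.e. a gibberish window.
        System.out.println(windowEnd(1_000L, Long.MAX_VALUE) < 0); // prints true
        // With the real window size supplied, the end is sane.
        System.out.println(windowEnd(1_000L, 500L)); // prints 1500
    }
}
```

This is why passing the real window size through the serde's constructor, as proposed, is necessary for correct deserialization.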
> I've started a patch, and will prepare a PR for the fix for 1) and 2) above. 
> Let me know if this sounds reasonable. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-3296) All consumer reads hang indefinitely

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3296?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-3296:
--
Component/s: consumer

> All consumer reads hang indefinitely
> 
>
> Key: KAFKA-3296
> URL: https://issues.apache.org/jira/browse/KAFKA-3296
> Project: Kafka
>  Issue Type: Bug
>  Components: consumer
>Affects Versions: 0.9.0.0, 0.9.0.1
>Reporter: Simon Cooper
>Priority: Critical
> Attachments: controller.zip, kafkalogs.zip
>
>
> We've got several integration tests that bring up systems on VMs for testing. 
> We've recently upgraded to 0.9, and very occasionally we see an issue 
> where every consumer that tries to read from the broker hangs, spamming 
> the following in their logs:
> {code}2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.NetworkClient 
> [pool-10-thread-1] | Sending metadata request 
> ClientRequest(expectResponse=true, callback=null, 
> request=RequestSend(header={api_key=3,api_version=0,correlation_id=21905,client_id=consumer-1},
>  body={topics=[Topic1]}), isInitiatedByNetworkClient, 
> createdTimeMs=1456489537856, sendTimeMs=0) to node 1
> 2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.Metadata [pool-10-thread-1] | 
> Updated cluster metadata version 10954 to Cluster(nodes = [Node(1, 
> server.internal, 9092)], partitions = [Partition(topic = Topic1, partition = 
> 0, leader = 1, replicas = [1,], isr = [1,]])
> 2016-02-26T12:25:37,856 | DEBUG | o.a.k.c.c.i.AbstractCoordinator 
> [pool-10-thread-1] | Issuing group metadata request to broker 1
> 2016-02-26T12:25:37,857 | DEBUG | o.a.k.c.c.i.AbstractCoordinator 
> [pool-10-thread-1] | Group metadata response 
> ClientResponse(receivedTimeMs=1456489537857, disconnected=false, 
> request=ClientRequest(expectResponse=true, 
> callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@28edb273,
>  
> request=RequestSend(header={api_key=10,api_version=0,correlation_id=21906,client_id=consumer-1},
>  body={group_id=}), createdTimeMs=1456489537856, sendTimeMs=1456489537856), 
> responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}})
> 2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.NetworkClient [pool-10-thread-1] | 
> Sending metadata request ClientRequest(expectResponse=true, callback=null, 
> request=RequestSend(header={api_key=3,api_version=0,correlation_id=21907,client_id=consumer-1},
>  body={topics=[Topic1]}), isInitiatedByNetworkClient, 
> createdTimeMs=1456489537956, sendTimeMs=0) to node 1
> 2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.Metadata [pool-10-thread-1] | 
> Updated cluster metadata version 10955 to Cluster(nodes = [Node(1, 
> server.internal, 9092)], partitions = [Partition(topic = Topic1, partition = 
> 0, leader = 1, replicas = [1,], isr = [1,]])
> 2016-02-26T12:25:37,956 | DEBUG | o.a.k.c.c.i.AbstractCoordinator 
> [pool-10-thread-1] | Issuing group metadata request to broker 1
> 2016-02-26T12:25:37,957 | DEBUG | o.a.k.c.c.i.AbstractCoordinator 
> [pool-10-thread-1] | Group metadata response 
> ClientResponse(receivedTimeMs=1456489537957, disconnected=false, 
> request=ClientRequest(expectResponse=true, 
> callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@40cee8cc,
>  
> request=RequestSend(header={api_key=10,api_version=0,correlation_id=21908,client_id=consumer-1},
>  body={group_id=}), createdTimeMs=1456489537956, sendTimeMs=1456489537956), 
> responseBody={error_code=15,coordinator={node_id=-1,host=,port=-1}})
> 2016-02-26T12:25:38,056 | DEBUG | o.a.k.c.NetworkClient [pool-10-thread-1] | 
> Sending metadata request ClientRequest(expectResponse=true, callback=null, 
> request=RequestSend(header={api_key=3,api_version=0,correlation_id=21909,client_id=consumer-1},
>  body={topics=[Topic1]}), isInitiatedByNetworkClient, 
> createdTimeMs=1456489538056, sendTimeMs=0) to node 1
> 2016-02-26T12:25:38,056 | DEBUG | o.a.k.c.Metadata [pool-10-thread-1] | 
> Updated cluster metadata version 10956 to Cluster(nodes = [Node(1, 
> server.internal, 9092)], partitions = [Partition(topic = Topic1, partition = 
> 0, leader = 1, replicas = [1,], isr = [1,]])
> 2016-02-26T12:25:38,056 | DEBUG | o.a.k.c.c.i.AbstractCoordinator 
> [pool-10-thread-1] | Issuing group metadata request to broker 1
> 2016-02-26T12:25:38,057 | DEBUG | o.a.k.c.c.i.AbstractCoordinator 
> [pool-10-thread-1] | Group metadata response 
> ClientResponse(receivedTimeMs=1456489538057, disconnected=false, 
> request=ClientRequest(expectResponse=true, 
> callback=org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler@439e25fb,
>  
> request=RequestSend(header={api_key=10,api_version=0,correlation_id=21910,client_id=consumer-1},
>  body={group_id=}), createdTimeMs=1456489538056, sendTimeMs=1456489538056), 

[jira] [Updated] (KAFKA-4418) Broker Leadership Election Fails If Missing ZK Path Raises Exception

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4418?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-4418:
--
Component/s: zkclient

> Broker Leadership Election Fails If Missing ZK Path Raises Exception
> 
>
> Key: KAFKA-4418
> URL: https://issues.apache.org/jira/browse/KAFKA-4418
> Project: Kafka
>  Issue Type: Bug
>  Components: zkclient
>Affects Versions: 0.9.0.1, 0.10.0.0, 0.10.0.1
>Reporter: Michael Pedersen
>Priority: Major
>  Labels: reliability
>
> Our Kafka cluster went down because a single node went down *and* a path in 
> Zookeeper was missing for one topic (/brokers/topics//partitions). 
> When this occurred, leadership election could not run, and produced a stack 
> trace that looked like this:
> Failed to start preferred replica election
> org.I0Itec.zkclient.exception.ZkNoNodeException: 
> org.apache.zookeeper.KeeperException$NoNodeException: KeeperErrorCode = 
> NoNode for /brokers/topics/warandpeace/partitions
>   at org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:47)
>   at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:995)
>   at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:675)
>   at org.I0Itec.zkclient.ZkClient.getChildren(ZkClient.java:671)
>   at kafka.utils.ZkUtils.getChildren(ZkUtils.scala:537)
>   at 
> kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:817)
>   at 
> kafka.utils.ZkUtils$$anonfun$getAllPartitions$1.apply(ZkUtils.scala:816)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at scala.collection.Iterator$class.foreach(Iterator.scala:727)
>   at scala.collection.AbstractIterator.foreach(Iterator.scala:1157)
>   at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
>   at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at kafka.utils.ZkUtils.getAllPartitions(ZkUtils.scala:816)
>   at 
> kafka.admin.PreferredReplicaLeaderElectionCommand$.main(PreferredReplicaLeaderElectionCommand.scala:64)
>   at 
> kafka.admin.PreferredReplicaLeaderElectionCommand.main(PreferredReplicaLeaderElectionCommand.scala)
> Caused by: org.apache.zookeeper.KeeperException$NoNodeException: 
> KeeperErrorCode = NoNode for /brokers/topics/warandpeace/partitions
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:111)
>   at org.apache.zookeeper.KeeperException.create(KeeperException.java:51)
>   at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1472)
>   at org.apache.zookeeper.ZooKeeper.getChildren(ZooKeeper.java:1500)
>   at org.I0Itec.zkclient.ZkConnection.getChildren(ZkConnection.java:114)
>   at org.I0Itec.zkclient.ZkClient$4.call(ZkClient.java:678)
>   at org.I0Itec.zkclient.ZkClient$4.call(ZkClient.java:675)
>   at org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:985)
>   ... 16 more
> I have checked through the code a bit, and have found a quick place to 
> introduce a fix that would seem to allow the leadership election to continue. 
> Specifically, the function at 
> https://github.com/apache/kafka/blob/trunk/core/src/main/scala/kafka/utils/ZkUtils.scala#L633
>  does not handle possible exceptions. Wrapping a try/catch block here would 
> work, but could introduce a number of other problems:
> * If the code is used elsewhere, the exception might be needed at a higher 
> level to prevent something else.
> * Unless the exception is logged/reported somehow, no one will know this 
> problem exists, which makes debugging other problems harder.
> I'm sure there are other issues I'm not aware of, but those two come to mind 
> quickly. What would be the best route for getting this resolved quickly?
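One shape the suggested try/catch fix could take, sketched generically in Java (the names and the per-topic skip-and-log policy here are illustrative; the real fix would live in ZkUtils.getAllPartitions and would still need the trade-offs above resolved):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

public class SkipMissingNodes {
    static class NoNodeException extends RuntimeException {}

    // Gather partitions per topic, skipping (and logging) topics whose ZK
    // path is missing, instead of letting one bad topic abort the whole
    // preferred-replica election.
    static List<String> partitionsForTopics(List<String> topics,
                                            Function<String, List<String>> fetchChildren) {
        List<String> result = new ArrayList<>();
        for (String topic : topics) {
            try {
                result.addAll(fetchChildren.apply(topic));
            } catch (NoNodeException e) {
                System.err.println("Skipping topic with missing partitions path: " + topic);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<String> out = partitionsForTopics(
                Arrays.asList("good", "warandpeace"),
                t -> {
                    if (t.equals("warandpeace")) throw new NoNodeException();
                    return Arrays.asList(t + "-0", t + "-1");
                });
        System.out.println(out); // prints [good-0, good-1]
    }
}
```

Logging the skipped topic addresses the reporter's second concern: the inconsistency is surfaced rather than silently swallowed.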





[jira] [Updated] (KAFKA-4414) Unexpected "Halting because log truncation is not allowed"

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4414?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-4414:
--
Component/s: replication

> Unexpected "Halting because log truncation is not allowed"
> --
>
> Key: KAFKA-4414
> URL: https://issues.apache.org/jira/browse/KAFKA-4414
> Project: Kafka
>  Issue Type: Bug
>  Components: replication
>Affects Versions: 0.9.0.1
>Reporter: Meyer Kizner
>Priority: Major
>
> Our Kafka installation runs with unclean leader election disabled, so brokers 
> halt when they find that their message offset is ahead of the leader's offset 
> for a topic. We had two brokers halt today with this issue. After much time 
> spent digging through the logs, I believe the following timeline describes 
> what occurred and points to a plausible hypothesis as to what happened.
> * B1, B2, and B3 are replicas of a topic, all in the ISR. B2 is currently the 
> leader, but B1 is the preferred leader. The controller runs on B3.
> * B1 fails, but the controller does not detect the failure immediately.
> * B2 receives a message from a producer and B3 fetches it to stay up to date. 
> B2 has not accepted the message, because B1 is down and so has not 
> acknowledged the message.
> * The controller triggers a preferred leader election, making B1 the leader, 
> and notifies all replicas.
> * Very shortly afterwards (~200ms), B1's broker registration in ZooKeeper 
> expires, so the controller reassigns B2 to be leader again and notifies all 
> replicas.
> * Because B3 is the controller, while B2 is on another box, B3 hears about 
> both of these events before B2 hears about either. B3 truncates its log to 
> the high water mark (before the pending message) and resumes fetching from B2.
> * B3 fetches the pending message from B2 again.
> * B2 learns that it has been displaced and then reelected, and truncates its 
> log to the high water mark, before the pending message.
> * The next time B3 tries to fetch from B2, it sees that B2 is missing the 
> pending message and halts.
> In this case, there was no data loss or inconsistency. I haven't fully 
> thought through whether either would be possible, but it seems likely that 
> they would be, especially if there had been multiple producers to this topic.
> I'm not completely certain about this timeline, but this sequence of events 
> appears to at least be possible. Looking a bit through the controller code, 
> there doesn't seem to be anything that forces {{LeaderAndIsrRequest}} to be 
> sent in a particular order. If someone with more knowledge of the code base 
> believes this is incorrect, I'd be happy to post the logs and/or do some more 
> digging.





[jira] [Updated] (KAFKA-3493) Replica fetcher load is not balanced over fetcher threads

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-3493:
--
Component/s: replication

> Replica fetcher load is not balanced over fetcher threads
> -
>
> Key: KAFKA-3493
> URL: https://issues.apache.org/jira/browse/KAFKA-3493
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.9.0.1
>Reporter: Maysam Yabandeh
>Priority: Major
>
> The replicas are not evenly distributed among the fetcher threads. This has 
> caused some fetcher threads get overloaded and hence their requests time out 
> frequently. This is especially a big issue when a new node is added to the 
> cluster and the fetch traffic is high. 
> Here is an example run in a test cluster with 10 brokers and 6 fetcher 
> threads (per source broker). A single topic consisting of 500+ partitions was 
> assigned to have a replica for each partition on the newly added broker.
> {code}[kafka-jetstream.canary]myabandeh@sjc8c-rl17-23b:~$ for i in `seq 0 5`; 
> do grep ReplicaFetcherThread-$i- /var/log/kafka/server.log | grep "reset its 
> fetch offset from 0" | wc -l; done
> 85
> 83
> 85
> 83
> 85
> 85
> [kafka-jetstream.canary]myabandeh@sjc8c-rl17-23b:~$ for i in `seq 0 5`; do 
> grep ReplicaFetcherThread-$i-22 /var/log/kafka/server.log | grep "reset its 
> fetch offset from 0" | wc -l; done
> 15
> 1
> 13
> 1
> 14
> 1
> {code}
> The problem is that AbstractFetcherManager::getFetcherId method does not take 
> the broker id into account:
> {code}
>   private def getFetcherId(topic: String, partitionId: Int) : Int = {
> Utils.abs(31 * topic.hashCode() + partitionId) % numFetchers
>   }
> {code}
> Hence although the replicas are evenly distributed among the fetcher ids 
> across all source brokers, this is not necessarily the case for each broker 
> separately. 
> I think a random function would do a much better job in distributing the load 
> over the fetcher threads from each source broker.
> Thoughts?
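The skew is reproducible with pure arithmetic. The sketch below assumes a hypothetical round-robin leader placement (leader = partition % numBrokers); with 10 brokers and 6 fetchers, each source broker's partitions land on only 3 of the 6 fetcher ids, because the hash ignores the broker id and gcd(10, 6) = 2 collapses the cycle:

```java
import java.util.HashSet;
import java.util.Set;

public class FetcherIdSkew {
    // Mirrors AbstractFetcherManager.getFetcherId: ignores the source broker id.
    static int getFetcherId(String topic, int partitionId, int numFetchers) {
        return Math.abs(31 * topic.hashCode() + partitionId) % numFetchers;
    }

    // Count how many distinct fetcher ids one source broker's partitions use,
    // assuming leaders are placed round-robin (leader = partition % numBrokers).
    static int distinctFetcherIds(String topic, int numPartitions, int numBrokers,
                                  int numFetchers, int sourceBroker) {
        Set<Integer> ids = new HashSet<>();
        for (int p = sourceBroker; p < numPartitions; p += numBrokers)
            ids.add(getFetcherId(topic, p, numFetchers));
        return ids.size();
    }

    public static void main(String[] args) {
        // 500 partitions, 10 brokers, 6 fetchers: only 3 of 6 fetcher ids
        // ever receive partitions from a given source broker.
        System.out.println(distinctFetcherIds("t", 500, 10, 6, 0)); // prints 3
    }
}
```

This matches the observed per-source-broker counts (a few heavily loaded threads, the rest nearly idle), and explains why mixing the broker id (or randomizing) into the fetcher id would spread the load.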





[jira] [Updated] (KAFKA-4666) Failure test for Kafka configured for consistency vs availability

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-4666:
--
Component/s: documentation

> Failure test for Kafka configured for consistency vs availability
> -
>
> Key: KAFKA-4666
> URL: https://issues.apache.org/jira/browse/KAFKA-4666
> Project: Kafka
>  Issue Type: Improvement
>  Components: documentation
>Reporter: Emanuele Cesena
>Priority: Major
> Attachments: consistency_test.py
>
>
> We recently had an issue with our Kafka setup because of a misconfiguration.
> In short, we thought we had configured Kafka for durability, but we didn't 
> set the producers to acks=all. During a full outage, we had situations where 
> some partitions were "partitioned", meaning that the followers started 
> without properly waiting for the right leader, and thus we lost data. Again, 
> this is not an issue with Kafka, but a misconfiguration on our side.
> I think we reproduced the issue, and we built a docker test that proves that, 
> if the producer isn't set with acks=all, then data can be lost during an 
> almost full outage. The test is attached.
> I was thinking to send a PR, but wanted to run this through you first, as 
> it's not necessarily proving that a feature works as expected.
> In addition, I think the documentation could be slightly improved, for 
> instance in the section:
> http://kafka.apache.org/documentation/#design_ha
> by clearly stating that there are three steps one should take when configuring Kafka 
> for consistency, the third being that producers should be set with acks=all 
> (which is now part of the 2nd point).
> Please let me know what you think, and I can send a PR if you agree.
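For reference, the producer-side step in question is small, which is exactly why it is easy to miss. A minimal sketch of a consistency-oriented producer configuration — the broker/topic settings (e.g. unclean.leader.election.enable=false, min.insync.replicas) are assumed to be set separately, per the two steps already documented:

```java
import java.util.Properties;

public class ConsistentProducerConfig {
    // Build producer properties for a consistency-over-availability setup.
    static Properties producerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.put("bootstrap.servers", bootstrapServers);
        // The third step proposed above: require acks from all in-sync replicas.
        props.put("acks", "all");
        return props;
    }

    public static void main(String[] args) {
        System.out.println(producerProps("broker1:9092").getProperty("acks")); // prints all
    }
}
```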





[jira] [Updated] (KAFKA-4493) Connections to Kafka brokers should be validated

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4493?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-4493:
--
Component/s: clients

> Connections to Kafka brokers should be validated
> 
>
> Key: KAFKA-4493
> URL: https://issues.apache.org/jira/browse/KAFKA-4493
> Project: Kafka
>  Issue Type: Improvement
>  Components: clients
>Reporter: Ismael Juma
>Priority: Major
>
> There have been a few reports of Kafka clients throwing an OOM because they 
> read 4 bytes from the stream and then use that to allocate a ByteBuffer 
> without validating that they are using the right security protocol or even 
> communicating with a Kafka broker.
> It would be good to perform some validation in order to show a useful error 
> message to the user instead of the OOM.
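A minimal sketch of the suggested validation (the constant, message, and method name are made up for illustration, not the actual client code): reject implausible sizes before allocating, so a client pointed at the wrong port or security protocol gets a clear error instead of an OOM.

```java
import java.nio.ByteBuffer;

public class FrameSizeCheck {
    // Illustrative upper bound; a real client would derive this from config.
    static final int MAX_FRAME_BYTES = 100 * 1024 * 1024;

    // Validate the 4-byte length prefix before allocating. Without this check,
    // arbitrary bytes (e.g. a TLS record or an HTTP response) interpreted as a
    // size can trigger a multi-gigabyte allocation.
    static ByteBuffer allocateFrame(int size) {
        if (size < 0 || size > MAX_FRAME_BYTES)
            throw new IllegalStateException("Invalid receive size " + size
                    + "; is the client using the right security protocol and port?");
        return ByteBuffer.allocate(size);
    }

    public static void main(String[] args) {
        System.out.println(allocateFrame(1024).capacity()); // prints 1024
    }
}
```

For example, the first four bytes of an HTTP response ("HTTP") decode to 1213486160, which this guard would reject immediately.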





[jira] [Updated] (KAFKA-5813) Unexpected unclean leader election due to leader/controller's unusual event handling order

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5813?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-5813:
--
Component/s: zkclient
 replication

> Unexpected unclean leader election due to leader/controller's unusual event 
> handling order 
> ---
>
> Key: KAFKA-5813
> URL: https://issues.apache.org/jira/browse/KAFKA-5813
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication, zkclient
>Affects Versions: 0.10.2.1
>Reporter: Allen Wang
>Priority: Minor
>
> We experienced an unexpected unclean leader election after a network glitch 
> happened on the leader of a partition. We have a replication factor of 2.
> Here is the sequence of event gathered from various logs:
> 1. ZK session timeout happens for leader of partition 
> 2. New ZK session is established for leader 
> 3. Leader removes the follower from ISR (which might be caused by replication 
> delay due to the network problem) and updates the ISR in ZK 
> 4. Controller processes the BrokerChangeListener event happened at step 1 
> where the leader seems to be offline 
> 5. Because the ISR in ZK is already updated by leader to remove the follower, 
> controller makes an unclean leader election 
> 6. Controller processes the second BrokerChangeListener event happened at 
> step 2 to mark the broker online again
> It seems to me that step 4 happens too late. If it happens right after step 
> 1, it will be a clean leader election and hopefully the producer will 
> immediately switch to the new leader, thus avoiding consumer offset reset. 





[jira] [Updated] (KAFKA-4040) Round Robin Assignment does not create balanced assignments

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4040?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-4040:
--
Component/s: replication

> Round Robin Assignment does not create balanced assignments
> ---
>
> Key: KAFKA-4040
> URL: https://issues.apache.org/jira/browse/KAFKA-4040
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.10.0.0, 0.11.0.0, 1.0.0
>Reporter: Mathias Herberts
>Priority: Major
>
> The RoundRobinAssignor loops over the consumers in a circular manner, 
> assigning each time N partitions to each consumer, where N is the number of 
> consuming threads.
> In the following scenario, this creates a major imbalance in the assignment.
> single topic with 64 partitions, 4 consumers (A,B,C,D), each with 12 threads. 
> The roundrobin strategy will allocate the partitions in the following manner:
> A: 24
> B: 16
> C: 12
> D: 12
> whereas the expected assignment would be 16 partitions to each consumer.
> The reason for this imbalance is that instead of allocating a single 
> partition to each consumer in sequence, the assignor attempts to assign 12 
> partitions each time it is handed a consumer by the circularIterator, so it 
> starts by assigning 12 to each of A,B,C and D, then it has 16 partitions left 
> and it is handed A again, to which it assigns 12 partitions before moving to 
> B to which 4 will be assigned.
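The arithmetic above can be reproduced with a small simulation. This is a sketch of the described N-at-a-time behavior, not the actual RoundRobinAssignor code:

```java
import java.util.Arrays;

public class RoundRobinImbalance {
    // Assign `total` partitions over `consumers`, handing out `chunk`
    // partitions at a time to each consumer in circular order.
    static int[] assign(int total, int consumers, int chunk) {
        int[] counts = new int[consumers];
        int remaining = total;
        int c = 0;
        while (remaining > 0) {
            int n = Math.min(chunk, remaining);
            counts[c % consumers] += n;
            remaining -= n;
            c++;
        }
        return counts;
    }

    public static void main(String[] args) {
        // 64 partitions, 4 consumers (A,B,C,D), each with 12 threads.
        System.out.println(Arrays.toString(assign(64, 4, 12))); // prints [24, 16, 12, 12]
        // Assigning one partition at a time balances perfectly.
        System.out.println(Arrays.toString(assign(64, 4, 1)));  // prints [16, 16, 16, 16]
    }
}
```

The second call shows that the imbalance disappears when the assignor hands out a single partition per turn instead of a full thread-count chunk.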





[jira] [Updated] (KAFKA-3630) Consider auto closing outdated pull requests

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3630?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-3630:
--
Component/s: build

> Consider auto closing outdated pull requests
> 
>
> Key: KAFKA-3630
> URL: https://issues.apache.org/jira/browse/KAFKA-3630
> Project: Kafka
>  Issue Type: Improvement
>  Components: build
>Reporter: Grant Henke
>Priority: Major
>
> Currently we don't have access to close pull requests and the list of open 
> pull requests is growing. We are nearing 200 open pull requests and many are 
> outdated. 
> I am not sure if this is possible but I think a potential improvement would 
> be to have a Jenkins job that runs periodically to:
> 1. Find all pull requests that have had no activity for 15 days
> 2. Comment on the pull requests that they were auto closed and should be 
> re-opened if there is still interest
> 3. Close the pull requests
> I don't think closing the outdated pull request will hurt project progress in 
> anyway because:
> - Jira still tracks the feature or fix
> - The pull requests likely need a rebase or feedback needs to be address
> - The notification will encourage the pull request owner and reviewers to 
> follow up
> As of today the break down of older pull requests is:
> - [Older than 15 
> days|https://github.com/apache/kafka/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+updated%3A%3C%3D2016-04-12+]:
>  153
> - [Older than 30 
> days|https://github.com/apache/kafka/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+updated%3A%3C%3D2016-03-28]:
>  107
> - [Older than 60 
> days|https://github.com/apache/kafka/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+updated%3A%3C%3D2016-02-28+]:
>  73
> - [Older than 90 
> days|https://github.com/apache/kafka/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Aopen+updated%3A%3C%3D2016-01-28+]:
>  52
> This jira is mainly to track discussion and ideas around this challenge. 
> Please feel free to propose an alternate solution. 
>  





[jira] [Updated] (KAFKA-3877) Gradle compiler daemon exits with non-zero exit code while running LogOffsetTest

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3877?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-3877:
--
Component/s: unit tests

> Gradle compiler daemon exits with non-zero exit code while running 
> LogOffsetTest
> 
>
> Key: KAFKA-3877
> URL: https://issues.apache.org/jira/browse/KAFKA-3877
> Project: Kafka
>  Issue Type: Sub-task
>  Components: unit tests
>Reporter: Ismael Juma
>Priority: Major
>  Labels: transient-unit-test-failure
>
> This happened in a recent build:
> {code}
> kafka.server.LogOffsetTest > testGetOffsetsBeforeNow STARTED
> :kafka-trunk-jdk8:core:test FAILED
> :test_core_2_11 FAILED
> FAILURE: Build failed with an exception.
> * What went wrong:
> Execution failed for task ':test_core_2_11'.
> > Process 'Gradle Compiler Daemon 1' finished with non-zero exit value 137
> {code}
> https://builds.apache.org/job/kafka-trunk-jdk8/702





[jira] [Updated] (KAFKA-4519) Delete old unused branches in git repo

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4519?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-4519:
--
Component/s: build

> Delete old unused branches in git repo
> --
>
> Key: KAFKA-4519
> URL: https://issues.apache.org/jira/browse/KAFKA-4519
> Project: Kafka
>  Issue Type: Task
>  Components: build
>Reporter: Jeff Widman
>Priority: Trivial
>
> Delete these old git branches, as they're quite outdated and not relevant for 
> various version branches:
> * consumer_redesign
> * transactional_messaging
> * 0.8.0-beta1-candidate1





[jira] [Updated] (KAFKA-4987) Topic creation allows invalid config values on running brokers

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4987?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-4987:
--
Component/s: config

> Topic creation allows invalid config values on running brokers
> --
>
> Key: KAFKA-4987
> URL: https://issues.apache.org/jira/browse/KAFKA-4987
> Project: Kafka
>  Issue Type: Bug
>  Components: config
>Affects Versions: 0.10.0.1, 0.10.1.0
>Reporter: dan norwood
>Priority: Major
>
> We use KIP-4 capabilities to make a `CreateTopicsRequest` for our topics. One 
> of the configs we use is `cleanup.policy=compact, delete`. This was 
> inadvertently run against a cluster that does not support that policy. The 
> result was that the topic was created; however, on a subsequent broker bounce 
> the broker fails to start up:
> {code}
> [2017-03-23 00:00:44,837] FATAL Fatal error during KafkaServer startup. 
> Prepare to shutdown (kafka.server.KafkaServer)
> org.apache.kafka.common.config.ConfigException: Invalid value compact,delete 
> for configuration cleanup.policy: String must be one of: compact, delete
>   at 
> org.apache.kafka.common.config.ConfigDef$ValidString.ensureValid(ConfigDef.java:827)
>   at org.apache.kafka.common.config.ConfigDef.parse(ConfigDef.java:427)
>   at 
> org.apache.kafka.common.config.AbstractConfig.<init>(AbstractConfig.java:55)
>   at kafka.log.LogConfig.<init>(LogConfig.scala:56)
>   at kafka.log.LogConfig$.fromProps(LogConfig.scala:192)
>   at kafka.server.KafkaServer$$anonfun$3.apply(KafkaServer.scala:598)
>   at kafka.server.KafkaServer$$anonfun$3.apply(KafkaServer.scala:597)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.TraversableLike$$anonfun$map$1.apply(TraversableLike.scala:244)
>   at 
> scala.collection.immutable.HashMap$HashMap1.foreach(HashMap.scala:224)
>   at 
> scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
>   at 
> scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
>   at 
> scala.collection.immutable.HashMap$HashTrieMap.foreach(HashMap.scala:403)
>   at scala.collection.TraversableLike$class.map(TraversableLike.scala:244)
>   at scala.collection.AbstractTraversable.map(Traversable.scala:105)
>   at kafka.server.KafkaServer.createLogManager(KafkaServer.scala:597)
>   at kafka.server.KafkaServer.startup(KafkaServer.scala:183)
>   at 
> kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:37)
>   at kafka.Kafka$.main(Kafka.scala:67)
>   at kafka.Kafka.main(Kafka.scala)
> [2017-03-23 00:00:44,839] INFO shutting down (kafka.server.KafkaServer)
> [2017-03-23 00:00:44,844] INFO shut down completed (kafka.server.KafkaServer)
> {code}
> I believe that the broker should reject an invalid config at topic creation 
> time rather than failing later at startup.





[jira] [Updated] (KAFKA-4629) Per topic MBeans leak

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4629?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-4629:
--
Component/s: metrics

> Per topic MBeans leak
> -
>
> Key: KAFKA-4629
> URL: https://issues.apache.org/jira/browse/KAFKA-4629
> Project: Kafka
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 0.10.0.1
>Reporter: Alberto Forti
>Priority: Minor
>
> Hi,
> In our application we create and delete topics dynamically. Most of the time, 
> when a topic is deleted the related MBeans are not deleted. Example of MBean: 
> kafka.server:type=BrokerTopicMetrics,name=TotalProduceRequestsPerSec,topic=dw_06b5f828-e452-4e22-89c9-67849a65603d
> Also, deleting a topic often produces (what I think is) noise in the logs at 
> WARN level. One example is:
> WARN  PartitionStateMachine$DeleteTopicsListener:83 - [DeleteTopicsListener 
> on 1]: Ignoring request to delete non-existing topics 
> dw_fe8ff14b-aa9b-4f24-9bc1-6fbce15d20d2
> Easily reproducible with a basic Kafka cluster with two brokers: just create 
> and delete topics few times. Sometimes the MBeans for the topic are deleted 
> and sometimes are not.
> I'm creating and deleting topics using the AdminUtils class in the Java API:
> AdminUtils.deleteTopic(zkUtils, topicName);
> AdminUtils.createTopic(zkUtils, topicName, partitions, replicationFactor, 
> topicConfiguration, kafka.admin.RackAwareMode.Enforced$.MODULE$);
> Kafka version: 0.10.0.1 (haven't tried other versions)
> Thanks,
> Alberto





[jira] [Updated] (KAFKA-5802) ScramServerCallbackHandler#handle should check username not being null before calling credentialCache.get()

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5802?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-5802:
--
Component/s: security

> ScramServerCallbackHandler#handle should check username not being null before 
> calling credentialCache.get()
> ---
>
> Key: KAFKA-5802
> URL: https://issues.apache.org/jira/browse/KAFKA-5802
> Project: Kafka
>  Issue Type: Bug
>  Components: security
>Reporter: Ted Yu
>Priority: Minor
>
> {code}
> String username = null;
> for (Callback callback : callbacks) {
> if (callback instanceof NameCallback)
> username = ((NameCallback) callback).getDefaultName();
> else if (callback instanceof ScramCredentialCallback)
> ((ScramCredentialCallback) 
> callback).scramCredential(credentialCache.get(username));
> {code}
> Since ConcurrentHashMap, used by CredentialCache, doesn't allow null keys, we 
> should check that username is not null before calling credentialCache.get()
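The guard could look like the sketch below (field and method names are illustrative, not the actual ScramServerCallbackHandler code). Note that ConcurrentHashMap.get(null) throws a NullPointerException rather than returning null, which is why the check is needed:

```java
import java.util.concurrent.ConcurrentHashMap;

public class NullUsernameGuard {
    static final ConcurrentHashMap<String, String> credentialCache = new ConcurrentHashMap<>();

    // Skip the cache lookup entirely when no NameCallback supplied a username,
    // since ConcurrentHashMap rejects null keys with a NullPointerException.
    static String credentialFor(String username) {
        return username == null ? null : credentialCache.get(username);
    }

    public static void main(String[] args) {
        credentialCache.put("alice", "scram-credential");
        System.out.println(credentialFor("alice")); // prints scram-credential
        System.out.println(credentialFor(null));    // prints null, no NPE
    }
}
```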





[jira] [Updated] (KAFKA-6619) InstanceAlreadyExistsException while Tomcat starting up

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6619?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-6619:
--
Component/s: packaging

> InstanceAlreadyExistsException while Tomcat starting up
> ---
>
> Key: KAFKA-6619
> URL: https://issues.apache.org/jira/browse/KAFKA-6619
> Project: Kafka
>  Issue Type: Bug
>  Components: packaging
>Affects Versions: 0.10.0.0
>Reporter: xiezhi
>Priority: Major
>
 I configured log4j to send application logs to Kafka.
> There is only one producer, so I couldn't figure out what happened.
> log4j.properties-
> log4j.rootLogger=INFO, kafka
> #appender kafka
>  log4j.appender.kafka=org.apache.kafka.log4jappender.KafkaLog4jAppender
>  log4j.appender.kafka.topic=UAT_APP
>  log4j.appender.A1.Threshold=INFO
>  log4j.appender.kafka.syncSend=false
> #multiple brokers are separated by comma ",".
>  log4j.appender.kafka.brokerList=localhost:9091,localhost:9092,localhost:9093,
>  log4j.appender.kafka.layout=org.apache.log4j.PatternLayout
> log4j.appender.kafka.layout.ConversionPattern=%d{yyyy-MM-dd HH:mm:ss} %p %t 
> %c (%F:%L) - %m%n
> -end log4j.properties---
> The error log is below.
> 2018-03-07 14:54:57 INFO localhost-startStop-1 
> org.apache.kafka.common.utils.AppInfoParser (AppInfoParser.java:109) - Kafka 
> version : 1.0.0
>  2018-03-07 14:54:57 INFO localhost-startStop-1 
> org.apache.kafka.common.utils.AppInfoParser (AppInfoParser.java:110) - Kafka 
> commitId : aaa7af6d4a11b29d
>  2018-03-07 14:54:57 WARN localhost-startStop-1 
> org.apache.kafka.common.utils.AppInfoParser (AppInfoParser.java:66) - Error 
> registering AppInfo mbean
>  javax.management.InstanceAlreadyExistsException: 
> kafka.producer:type=app-info,id=producer-1
>  at com.sun.jmx.mbeanserver.Repository.addMBean(Repository.java:437)
>  at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerWithRepository(DefaultMBeanServerInterceptor.java:1898)
>  at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerDynamicMBean(DefaultMBeanServerInterceptor.java:966)
>  at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerObject(DefaultMBeanServerInterceptor.java:900)
>  at 
> com.sun.jmx.interceptor.DefaultMBeanServerInterceptor.registerMBean(DefaultMBeanServerInterceptor.java:324)
>  at 
> com.sun.jmx.mbeanserver.JmxMBeanServer.registerMBean(JmxMBeanServer.java:522)
>  at 
> org.apache.kafka.common.utils.AppInfoParser.registerAppInfo(AppInfoParser.java:62)
>  at 
> org.apache.kafka.clients.producer.KafkaProducer.(KafkaProducer.java:427)
>  at 
> org.apache.kafka.clients.producer.KafkaProducer.(KafkaProducer.java:291)
>  at 
> org.apache.kafka.log4jappender.KafkaLog4jAppender.getKafkaProducer(KafkaLog4jAppender.java:246)
>  at 
> org.apache.kafka.log4jappender.KafkaLog4jAppender.activateOptions(KafkaLog4jAppender.java:240)
>  at org.apache.log4j.config.PropertySetter.activate(PropertySetter.java:307)
>  at 
> org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:172)
>  at 
> org.apache.log4j.config.PropertySetter.setProperties(PropertySetter.java:104)
>  at 
> org.apache.log4j.PropertyConfigurator.parseAppender(PropertyConfigurator.java:809)
>  at 
> org.apache.log4j.PropertyConfigurator.parseCategory(PropertyConfigurator.java:735)
>  at 
> org.apache.log4j.PropertyConfigurator.configureRootCategory(PropertyConfigurator.java:615)
>  at 
> org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:502)
>  at 
> org.apache.log4j.PropertyConfigurator.doConfigure(PropertyConfigurator.java:547)
>  at 
> org.apache.log4j.helpers.OptionConverter.selectAndConfigure(OptionConverter.java:483)
>  at org.apache.log4j.LogManager.(LogManager.java:127)
>  at org.apache.log4j.Logger.getLogger(Logger.java:104)
>  at 
> org.apache.commons.logging.impl.Log4JLogger.getLogger(Log4JLogger.java:289)
>  at org.apache.commons.logging.impl.Log4JLogger.(Log4JLogger.java:109)
>  at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
>  at 
> sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
>  at 
> sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
>  at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
>  at 
> org.apache.commons.logging.impl.LogFactoryImpl.createLogFromClass(LogFactoryImpl.java:1040)
>  at 
> org.apache.commons.logging.impl.LogFactoryImpl.discoverLogImplementation(LogFactoryImpl.java:838)
>  at 
> org.apache.commons.logging.impl.LogFactoryImpl.newInstance(LogFactoryImpl.java:601)
>  at 
> org.apache.commons.logging.impl.LogFactoryImpl.getInstance(LogFactoryImpl.java:333)
>  at 
> 

[jira] [Updated] (KAFKA-6798) Kafka leader rebalance failures

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6798?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-6798:
--
Component/s: replication

> Kafka leader rebalance failures
> ---
>
> Key: KAFKA-6798
> URL: https://issues.apache.org/jira/browse/KAFKA-6798
> Project: Kafka
>  Issue Type: Bug
>  Components: replication
>Affects Versions: 0.10.2.1, 1.0.1
>Reporter: Riley Zimmerman
>Priority: Critical
>
> I am running 3 Kafka (version 0.10.2.1 and more recently moved to 1.0.1) with 
> 3 Zookeeper (v3.4.9) as statefulsets in a kubernetes v1.9.1 deployment.  My 
> partitions are replication factor 3.  My main workload involves a kafka 
> streams consumer/producer (storing offsets in kafka) and a second kafka 
> consumer storing offsets in zookeeper (only commits every 30 seconds).  There 
> are ~200,000 kafka messages going through each per minute.  The log.retention 
> settings are all 4 hours.  I have auto.leader.rebalance.enabled.  
> I am randomly having failures during the rebalances.  The result is that 
> partitions for both topics and consumer_offsets go out of sync and the 
> partition leader becomes -1.  After 4 hours there is another (auto?) 
> rebalance and sometimes it sorts itself out.  Sometimes it runs for weeks 
> without problems, other times it happens multiple times in a few days.  It 
> appears to happen earlier in test runs if it is going to happen.   
> {noformat}
> Topic:__consumer_offsetsPartitionCount:50   ReplicationFactor:3   
>   
> Configs:segment.bytes=104857600,cleanup.policy=compact,compression.type=producer
> Topic: __consumer_offsets   Partition: 0Leader: -1  
> Replicas: 2,0,1 Isr:
> Topic: __consumer_offsets   Partition: 1Leader: 0   
> Replicas: 0,1,2 Isr: 1,2,0
> Topic: __consumer_offsets   Partition: 2Leader: 1   
> Replicas: 1,2,0 Isr: 2,1,0
> Topic: __consumer_offsets   Partition: 3Leader: -1  
> Replicas: 2,1,0 Isr:
> {noformat}
> {noformat}
> [2018-03-20 12:42:32,180] WARN [Controller 2]: Partition [agent.metadata,5] 
> failed to complete preferred replica leader election. Leader is -1 
> (kafka.controller.KafkaController)
> {noformat}
> {noformat}
> [2018-03-20 11:02:32,099] TRACE Controller 2 epoch 27 started leader election 
> for partition [__consumer_offsets,30] (state.change.logger)
> [2018-03-20 11:02:32,101] ERROR Controller 2 epoch 27 encountered error while 
> electing leader for partition [__consumer_offsets,30] due to: Preferred 
> replica 2 for partition [__consumer_offsets,30] is either not alive or not in 
> the isr. Current leader and ISR: [{"leader":-1,"leader_epoch":59,"isr":[]}]. 
> (state.change.logger)
> [2018-03-20 11:02:32,101] ERROR Controller 2 epoch 27 initiated state change 
> for partition [__consumer_offsets,30] from OnlinePartition to OnlinePartition 
> failed (state.change.logger)
> kafka.common.StateChangeFailedException: encountered error while electing 
> leader for partition [__consumer_offsets,30] due to: Preferred replica 2 for 
> partition [__consumer_offsets,30] is either not alive or not in the isr. 
> Current leader and ISR: [{"leader":-1,"leader_epoch":59,"isr":[]}].
>   at 
> kafka.controller.PartitionStateMachine.electLeaderForPartition(PartitionStateMachine.scala:362)
>   at 
> kafka.controller.PartitionStateMachine.kafka$controller$PartitionStateMachine$$handleStateChange(PartitionStateMachine.scala:202)
>   at 
> kafka.controller.PartitionStateMachine$$anonfun$handleStateChanges$2.apply(PartitionStateMachine.scala:141)
>   at 
> kafka.controller.PartitionStateMachine$$anonfun$handleStateChanges$2.apply(PartitionStateMachine.scala:140)
>   at scala.collection.immutable.Set$Set1.foreach(Set.scala:94)
>   at 
> kafka.controller.PartitionStateMachine.handleStateChanges(PartitionStateMachine.scala:140)
>   at 
> kafka.controller.KafkaController.onPreferredReplicaElection(KafkaController.scala:662)
>   at 
> kafka.controller.KafkaController$$anonfun$kafka$controller$KafkaController$$checkAndTriggerPartitionRebalance$4$$anonfun$apply$16$$anonfun$apply$5.apply$mcV$sp(KafkaController.scala:1230)
>   at 
> kafka.controller.KafkaController$$anonfun$kafka$controller$KafkaController$$checkAndTriggerPartitionRebalance$4$$anonfun$apply$16$$anonfun$apply$5.apply(KafkaController.scala:1225)
>   at 
> kafka.controller.KafkaController$$anonfun$kafka$controller$KafkaController$$checkAndTriggerPartitionRebalance$4$$anonfun$apply$16$$anonfun$apply$5.apply(KafkaController.scala:1225)
>   at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:213)
>   at 
> kafka.controller.KafkaController$$anonfun$kafka$controller$KafkaController$$checkAndTriggerPartitionRebalance$4$$anonfun$apply$16.apply(KafkaController.scala:1222)

[jira] [Updated] (KAFKA-661) Prevent a shutting down broker from re-entering the ISR

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-661?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-661:
-
Component/s: replication

> Prevent a shutting down broker from re-entering the ISR
> ---
>
> Key: KAFKA-661
> URL: https://issues.apache.org/jira/browse/KAFKA-661
> Project: Kafka
>  Issue Type: Bug
>  Components: replication
>Affects Versions: 0.8.0, 0.8.1
>Reporter: Joel Koshy
>Priority: Major
>
> There is a timing issue in controlled shutdown that affects low-volume 
> topics. The leader that is being shut down receives a leaderAndIsrRequest 
> informing it is no longer the leader and thus starts up a follower which 
> starts issuing fetch requests to the new leader. We then shrink the ISR and 
> send a StopReplicaRequest to the shutting down broker. However, the new 
> leader upon receiving the fetch request expands the ISR again.
> This does not really have critical impact in the sense that it can cause 
> producers to that topic to time out. However, there are probably very few or 
> no produce requests coming in as it primarily affects low-volume topics. The 
> shutdown logic itself seems to be working correctly in that the leader has 
> been successfully moved.
> One possible approach would be to use the callback feature in the 
> ControllerBrokerRequestBatch and wait until the StopReplicaRequest has been 
> processed by the shutting down broker before shrinking the ISR; and there are 
> probably other ways as well.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-2939) Make AbstractConfig.logUnused() tunable for clients

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-2939?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-2939:
--
Component/s: config

> Make AbstractConfig.logUnused() tunable for clients
> ---
>
> Key: KAFKA-2939
> URL: https://issues.apache.org/jira/browse/KAFKA-2939
> Project: Kafka
>  Issue Type: Improvement
>  Components: config
>Reporter: Guozhang Wang
>Priority: Major
>  Labels: newbie
>
> Today we always log unused configs in KafkaProducer / KafkaConsumer in their 
> constructors, however for some cases like Kafka Streams that make use of 
> these clients, other configs may be passed in to configure Partitioner / 
> Serializer classes, etc. So it would be better to make this function call 
> optional to avoid printing unnecessary and confusing WARN entries.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Commented] (KAFKA-764) Race Condition in Broker Registration after ZooKeeper disconnect

2018-08-03 Thread Ray Chiang (JIRA)


[ 
https://issues.apache.org/jira/browse/KAFKA-764?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16568727#comment-16568727
 ] 

Ray Chiang commented on KAFKA-764:
--

This is for a very old version of Kafka.  If I don't see any update in a week 
or so, I'm going to close this JIRA.

> Race Condition in Broker Registration after ZooKeeper disconnect
> 
>
> Key: KAFKA-764
> URL: https://issues.apache.org/jira/browse/KAFKA-764
> Project: Kafka
>  Issue Type: Bug
>  Components: zkclient
>Affects Versions: 0.7.1
>Reporter: Bob Cotton
>Priority: Major
> Attachments: BPPF_2900-Broker_Logs.tbz2
>
>
> When running our ZooKeepers in VMware, occasionally all the keepers 
> simultaneously pause long enough for the Kafka clients to time out and then 
> the keepers simultaneously un-pause.
> When this happens, the zk clients disconnect from ZooKeeper. When ZooKeeper 
> comes back ZkUtils.createEphemeralPathExpectConflict discovers the node id of 
> itself and does not re-register the broker id node and the function call 
> succeeds. Then ZooKeeper figures out the broker disconnected from the keeper 
> and deletes the ephemeral node *after* allowing the consumer to read the data 
> in the /brokers/ids/x node.  The broker then goes on to register all the 
> topics, etc.  When consumers connect, they see topic nodes associated with 
> the broker but they can't find the broker node to get connection information 
> for the broker, sending them into a rebalance loop until they reach 
> rebalance.retries.max and fail.
> This might also be a ZooKeeper issue, but the desired behavior for a 
> disconnect case might be: if the broker node is found, explicitly delete 
> and recreate it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-764) Race Condition in Broker Registration after ZooKeeper disconnect

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-764?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-764:
-
Component/s: zkclient

> Race Condition in Broker Registration after ZooKeeper disconnect
> 
>
> Key: KAFKA-764
> URL: https://issues.apache.org/jira/browse/KAFKA-764
> Project: Kafka
>  Issue Type: Bug
>  Components: zkclient
>Affects Versions: 0.7.1
>Reporter: Bob Cotton
>Priority: Major
> Attachments: BPPF_2900-Broker_Logs.tbz2
>
>
> When running our ZooKeepers in VMware, occasionally all the keepers 
> simultaneously pause long enough for the Kafka clients to time out and then 
> the keepers simultaneously un-pause.
> When this happens, the zk clients disconnect from ZooKeeper. When ZooKeeper 
> comes back ZkUtils.createEphemeralPathExpectConflict discovers the node id of 
> itself and does not re-register the broker id node and the function call 
> succeeds. Then ZooKeeper figures out the broker disconnected from the keeper 
> and deletes the ephemeral node *after* allowing the consumer to read the data 
> in the /brokers/ids/x node.  The broker then goes on to register all the 
> topics, etc.  When consumers connect, they see topic nodes associated with 
> the broker but they can't find the broker node to get connection information 
> for the broker, sending them into a rebalance loop until they reach 
> rebalance.retries.max and fail.
> This might also be a ZooKeeper issue, but the desired behavior for a 
> disconnect case might be: if the broker node is found, explicitly delete 
> and recreate it.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7244) Add note about memory map kernel limits to documentation

2018-08-03 Thread Ray Chiang (JIRA)
Ray Chiang created KAFKA-7244:
-

 Summary: Add note about memory map kernel limits to documentation
 Key: KAFKA-7244
 URL: https://issues.apache.org/jira/browse/KAFKA-7244
 Project: Kafka
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.10.0.0
Reporter: Ray Chiang
Assignee: Ray Chiang


In the documentation for 0.10.x through 2.0.0, there is mention of the file 
descriptor limit and the max socket buffer size, but no mention of the memory 
map kernel limit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
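For context, the kernel limit in question on Linux is the `vm.max_map_count` sysctl: each log segment memory-maps its index files, so brokers hosting many partitions can exhaust the default (commonly 65530) and fail with map errors. Current usage can be checked with `sysctl vm.max_map_count`. A persistent raise is a one-line config change; the value below is illustrative, not a Kafka-documented recommendation:

```
# /etc/sysctl.conf — size for your broker's partition/segment count
vm.max_map_count=262144
```

Apply it without a reboot via `sysctl -p`.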


[jira] [Updated] (KAFKA-4238) consumer-subscription not working, when accessing a newly created topic immediately after its creation with the AdminUtils

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4238?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-4238:
--
Component/s: consumer
 admin

> consumer-subscription not working, when accessing a newly created topic 
> immediately after its creation with the AdminUtils
> --
>
> Key: KAFKA-4238
> URL: https://issues.apache.org/jira/browse/KAFKA-4238
> Project: Kafka
>  Issue Type: Bug
>  Components: admin, consumer
>Affects Versions: 0.10.0.0, 0.10.0.1
>Reporter: Florian Witteler
>Priority: Major
>
> I created a test-project to reproduce the bug.
> https://github.com/FloWi/kafka-topic-creation-bug
> We use a docker container that creates a fresh topic before a testsuite gets 
> executed (see {{trait FreshKafkaTopics}}). That trait uses the AdminUtils to 
> create the topic. 
> If we access the newly created topic directly after its creation, the 
> subscriber is broken. It sometimes works though (<5%), so it seems to be a 
> race-condition.
> If I put a {{Thread.sleep(1000)}} after the topic-creation, everything's fine 
> though.
> So, the problem is twofold:
> - {{AdminUtils.createTopic}} should block until the topic-creation is 
> completed
> - {{new KafkaConsumer[String, 
> String](props).subscribe(util.Arrays.asList(topic))}} should throw an 
> exception, when the topic is "not ready"



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
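Until topic creation blocks on completion, the `Thread.sleep(1000)` workaround above is usually replaced with a bounded poll of topic metadata. This JDK-only sketch shows the retry shape; `isTopicReady` is a stand-in for a real metadata check, not a Kafka API:

```java
import java.util.function.BooleanSupplier;

public class TopicReadyWait {

    // Polls the condition until it holds or the deadline passes.
    static boolean waitUntil(BooleanSupplier condition, long timeoutMs, long pollMs)
            throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMs;
        while (System.currentTimeMillis() < deadline) {
            if (condition.getAsBoolean()) {
                return true;
            }
            Thread.sleep(pollMs);
        }
        // One last check in case the condition flipped right at the deadline.
        return condition.getAsBoolean();
    }

    public static void main(String[] args) throws InterruptedException {
        long start = System.currentTimeMillis();
        // Stand-in for "topic metadata is visible": becomes true after ~200 ms.
        BooleanSupplier isTopicReady = () -> System.currentTimeMillis() - start > 200;
        System.out.println(waitUntil(isTopicReady, 5000, 50)); // true
    }
}
```

Unlike a fixed sleep, this returns as soon as the topic is visible and fails deterministically after the timeout.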


[jira] [Updated] (KAFKA-4294) Allow password file in server.properties to separate 'secrets' from standard configs

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4294?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-4294:
--
Component/s: security
 config

> Allow password file in server.properties to separate 'secrets' from standard 
> configs 
> -
>
> Key: KAFKA-4294
> URL: https://issues.apache.org/jira/browse/KAFKA-4294
> Project: Kafka
>  Issue Type: Improvement
>  Components: config, security
>Reporter: Ryan P
>Priority: Major
>
> Java's keytool(for Windows) allows you to specify the keystore/truststore 
> password with an external file in addition to a string argument. 
> -storepass:file secret.txt
> http://docs.oracle.com/javase/7/docs/technotes/tools/windows/keytool.html
> It would be nice if Kafka could offer the same functionality allowing 
> organizations to separate concerns between standard configs and 'secrets'. 
> Ideally Kafka would add a secrets file property to the broker config which 
> could override any ssl properties which currently exist within the broker 
> config. Since the secrets file property is only used to override existing 
> SSL/TLS properties the change maintains backward compatibility. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
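The override semantics proposed above can be sketched with `java.util.Properties`: load the broker config, then let entries from the secrets file win. The class name, file contents, and keys here are illustrative, not the actual broker loading path:

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.Properties;

public class SecretsOverlaySketch {

    // Entries from `secrets` replace same-named entries in `base`,
    // which preserves backward compatibility when no secrets file is given.
    static Properties merge(Properties base, Properties secrets) {
        Properties merged = new Properties();
        merged.putAll(base);
        merged.putAll(secrets);
        return merged;
    }

    public static void main(String[] args) throws IOException {
        Properties base = new Properties();
        base.load(new StringReader(
                "ssl.keystore.location=/etc/kafka/ks.jks\nssl.keystore.password=PLACEHOLDER"));
        Properties secrets = new Properties();
        secrets.load(new StringReader("ssl.keystore.password=real-secret"));
        System.out.println(merge(base, secrets).getProperty("ssl.keystore.password")); // real-secret
    }
}
```

The secrets file can then carry tighter filesystem permissions than server.properties.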


[jira] [Updated] (KAFKA-3410) Unclean leader election and "Halting because log truncation is not allowed"

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3410?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-3410:
--
Component/s: replication

> Unclean leader election and "Halting because log truncation is not allowed"
> ---
>
> Key: KAFKA-3410
> URL: https://issues.apache.org/jira/browse/KAFKA-3410
> Project: Kafka
>  Issue Type: Bug
>  Components: replication
>Reporter: James Cheng
>Priority: Major
>  Labels: reliability
>
> I ran into a scenario where one of my brokers would continually shutdown, 
> with the error message:
> [2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting because 
> log truncation is not allowed for topic test, Current leader 1's latest 
> offset 0 is less than replica 2's latest offset 151 
> (kafka.server.ReplicaFetcherThread)
> I managed to reproduce it with the following scenario:
> 1. Start broker1, with unclean.leader.election.enable=false
> 2. Start broker2, with unclean.leader.election.enable=false
> 3. Create topic, single partition, with replication-factor 2.
> 4. Write data to the topic.
> 5. At this point, both brokers are in the ISR. Broker1 is the partition 
> leader.
> 6. Ctrl-Z on broker2. (Simulates a GC pause or a slow network) Broker2 gets 
> dropped out of ISR. Broker1 is still the leader. I can still write data to 
> the partition.
> 7. Shutdown Broker1. Hard or controlled, doesn't matter.
> 8. rm -rf the log directory of broker1. (This simulates a disk replacement or 
> full hardware replacement)
> 9. Resume broker2. It attempts to connect to broker1, but doesn't succeed 
> because broker1 is down. At this point, the partition is offline. Can't write 
> to it.
> 10. Resume broker1. Broker1 resumes leadership of the topic. Broker2 attempts 
> to join ISR, and immediately halts with the error message:
> [2016-02-25 00:29:39,236] FATAL [ReplicaFetcherThread-0-1], Halting because 
> log truncation is not allowed for topic test, Current leader 1's latest 
> offset 0 is less than replica 2's latest offset 151 
> (kafka.server.ReplicaFetcherThread)
> I am able to recover by setting unclean.leader.election.enable=true on my 
> brokers.
> I'm trying to understand a couple things:
> * In step 10, why is broker1 allowed to resume leadership even though it has 
> no data?
> * In step 10, why is it necessary to stop the entire broker due to one 
> partition that is in this state? Wouldn't it be possible for the broker to 
> continue to serve traffic for all the other topics, and just mark this one as 
> unavailable?
> * Would it make sense to allow an operator to manually specify which broker 
> they want to become the new master? This would give me more control over how 
> much data loss I am willing to handle. In this case, I would want broker2 to 
> become the new master. Or, is that possible and I just don't know how to do 
> it?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-3962) ConfigDef support for resource-specific configuration

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3962?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-3962:
--
Component/s: config

> ConfigDef support for resource-specific configuration
> -
>
> Key: KAFKA-3962
> URL: https://issues.apache.org/jira/browse/KAFKA-3962
> Project: Kafka
>  Issue Type: Improvement
>  Components: config
>Reporter: Shikhar Bhushan
>Priority: Major
>
> It often comes up with connectors that you want some piece of configuration 
> that should be overridable at the topic-level, table-level, etc.
> The ConfigDef API should allow for defining these resource-overridable config 
> properties and we should have getter variants that accept a resource 
> argument, and return the more specific config value (falling back to the 
> default).
> There are a couple of possible ways to allow for this:
> 1. Support for map-style config properties "resource1:v1,resource2:v2". There 
> are escaping considerations to think through here. Also, how should the user 
> override fallback/default values -- perhaps {{*}} as a special resource?
> 2. Templatized configs -- so you would define {{$resource.some.property}}. 
> The default value is more naturally overridable here, by the user setting 
> {{some.property}} without the {{$resource}} prefix.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
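Option 2 above (templatized configs) reduces to a prefixed lookup with fallback to the unprefixed default. A JDK-only sketch, with `topics.foo` as an illustrative resource name rather than ConfigDef's real API:

```java
import java.util.HashMap;
import java.util.Map;

public class ResourceConfigSketch {

    // Prefers "<resource>.<key>" and falls back to the bare "<key>".
    static String get(Map<String, String> config, String resource, String key) {
        String specific = config.get(resource + "." + key);
        return specific != null ? specific : config.get(key);
    }

    public static void main(String[] args) {
        Map<String, String> config = new HashMap<>();
        config.put("some.property", "default-value");
        config.put("topics.foo.some.property", "foo-value");
        System.out.println(get(config, "topics.foo", "some.property")); // foo-value
        System.out.println(get(config, "topics.bar", "some.property")); // default-value
    }
}
```

The default is overridable exactly as the description suggests: set `some.property` without the resource prefix.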


[jira] [Updated] (KAFKA-3861) Shrunk ISR before leader crash makes the partition unavailable

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3861?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-3861:
--
Component/s: replication

> Shrunk ISR before leader crash makes the partition unavailable
> --
>
> Key: KAFKA-3861
> URL: https://issues.apache.org/jira/browse/KAFKA-3861
> Project: Kafka
>  Issue Type: Bug
>  Components: replication
>Affects Versions: 0.10.0.0
>Reporter: Maysam Yabandeh
>Priority: Major
>
> We observed a case that the leader experienced a crash and lost its in-memory 
> data and latest HW offsets. Normally Kafka should be safe and be able to make 
> progress with a single node failure. However a few seconds before the crash 
> the leader shrunk its ISR to itself, which is safe since min-in-sync-replicas 
> is 2 and replication factor is 3 thus the troubled leader cannot accept new 
> produce messages. After the crash however the controller could not name any 
> of the of the followers as the new leader since as far as the controller 
> knows they are not in ISR and could potentially be behind the last leader. 
> Note that unclean-leader-election is disabled in this cluster since the 
> cluster requires a very high degree of durability and cannot tolerate data 
> loss.
> The impact could get worse if the admin brings up the crashed broker in an 
> attempt to make such partitions available again; this would take down even 
> more brokers as the followers panic when they find their offset larger than 
> HW offset in the leader:
> {code}
> if (leaderEndOffset < replica.logEndOffset.messageOffset) {
>   // Prior to truncating the follower's log, ensure that doing so is not 
> disallowed by the configuration for unclean leader election.
>   // This situation could only happen if the unclean election 
> configuration for a topic changes while a replica is down. Otherwise,
>   // we should never encounter this situation since a non-ISR leader 
> cannot be elected if disallowed by the broker configuration.
>   if (!LogConfig.fromProps(brokerConfig.originals, 
> AdminUtils.fetchEntityConfig(replicaMgr.zkUtils,
> ConfigType.Topic, 
> topicAndPartition.topic)).uncleanLeaderElectionEnable) {
> // Log a fatal error and shutdown the broker to ensure that data loss 
> does not unexpectedly occur.
> fatal("Halting because log truncation is not allowed for topic 
> %s,".format(topicAndPartition.topic) +
>   " Current leader %d's latest offset %d is less than replica %d's 
> latest offset %d"
>   .format(sourceBroker.id, leaderEndOffset, brokerConfig.brokerId, 
> replica.logEndOffset.messageOffset))
> Runtime.getRuntime.halt(1)
>   }
> {code}
> One hackish solution would be that the admin investigates the logs, determines 
> that unclean-leader-election in this particular case would be safe, and 
> temporarily enables it (while the crashed node is down) until new leaders are 
> selected for affected partitions, waits for the topics' LEOs to advance far 
> enough, and then brings up the crashed node again. This manual process is however 
> slow and error-prone and the cluster will suffer partial unavailability in 
> the meanwhile.
> We are thinking of having the controller make an exception for this case: if 
> ISR size is less than min-in-sync-replicas and the new leader would be -1, 
> then the controller does an RPC to all the replicas and inquires about the latest 
> offset, and if all the replicas responded then chooses the one with the largest 
> offset as the leader as well as the new ISR. Note that the controller cannot 
> do that if any of the non-leader replicas do not respond, since there might be 
> a case where the responding replicas were not in the last ISR and are 
> hence potentially behind the others (and the controller could not know that 
> since it does not keep track of previous ISR).
> The pros would be that Kafka will be safely available when such cases occur and 
> would not require any admin intervention. The con, however, is that the 
> controller talking to brokers inside the leader election function would break 
> the existing pattern in the source code as currently the leader is elected 
> locally without requiring any additional RPC.
> Thoughts?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
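The election rule proposed above — proceed only if every assigned replica responds, then pick the largest reported offset — can be sketched over plain maps. Broker ids and offsets are illustrative; the real controller would gather them via RPC:

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;

public class OfflineElectionSketch {

    // Elects the replica with the largest reported log-end offset, but only
    // when every assigned replica responded. If any replica is silent, it
    // might have been in the last ISR with a higher offset, so electing
    // anyone risks data loss: abstain.
    static Optional<Integer> elect(Set<Integer> assignedReplicas,
                                   Map<Integer, Long> reportedOffsets) {
        if (!reportedOffsets.keySet().containsAll(assignedReplicas)) {
            return Optional.empty();
        }
        return reportedOffsets.entrySet().stream()
                .max(Map.Entry.comparingByValue())
                .map(Map.Entry::getKey);
    }

    public static void main(String[] args) {
        Set<Integer> replicas = Set.of(1, 2, 3);
        System.out.println(elect(replicas, Map.of(1, 100L, 2, 150L, 3, 120L))); // Optional[2]
        System.out.println(elect(replicas, Map.of(1, 100L, 2, 150L)));          // Optional.empty
    }
}
```

Abstaining on partial responses is what keeps the rule compatible with unclean-leader-election being disabled.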


[jira] [Updated] (KAFKA-3329) Validation script to test expected behavior of Authorizer implementations

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3329?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-3329:
--
Component/s: tools

> Validation script to test expected behavior of Authorizer implementations
> -
>
> Key: KAFKA-3329
> URL: https://issues.apache.org/jira/browse/KAFKA-3329
> Project: Kafka
>  Issue Type: Wish
>  Components: tools
>Reporter: Grant Henke
>Priority: Major
>
> The authorizer interface and documentation defines some of the expected 
> behavior of an Authorizer implementation. However, having real tests for a 
> user implementing their own authorizer would be useful. A script like:
> {code}
> kafka-validate-authorizer.sh --authorizer-class ...
> {code}
> could be used to validate:
> * Expected operation inheritance
> ** Example: READ or WRITE automatically grants DESCRIBE
> * Expected exceptions or handling of edge cases
> ** When I add the same ACL twice
> ** When I remove an ACL that is not set
> ** When both Deny and Allow are set?
> ** When no Acl is attached to a resource?
> * Expected support for concurrent requests against multiple instances
> These same tests could be part of the Authorizer integration tests for 
> Kafka's SimpleAuthorizer implementation. 
> Users would not be required to follow all of the "default" expectations. But 
> they would at least know what assumptions their implementation breaks. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
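The first validation listed above — READ or WRITE implicitly granting DESCRIBE — is naturally table-driven. The implication table below is a simplified illustration, not Kafka's full operation matrix:

```java
import java.util.Map;
import java.util.Set;

public class AclInheritanceSketch {

    // Simplified implication table: granting an operation implicitly grants
    // the operations it implies.
    static final Map<String, Set<String>> IMPLIES = Map.of(
            "READ", Set.of("DESCRIBE"),
            "WRITE", Set.of("DESCRIBE"),
            "ALL", Set.of("READ", "WRITE", "DESCRIBE"));

    static boolean isAllowed(String granted, String requested) {
        return granted.equals(requested)
                || IMPLIES.getOrDefault(granted, Set.of()).contains(requested);
    }

    public static void main(String[] args) {
        System.out.println(isAllowed("READ", "DESCRIBE")); // true
        System.out.println(isAllowed("DESCRIBE", "READ")); // false
    }
}
```

A validation script could assert these pairs against any Authorizer implementation and report which expectations it breaks.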


[jira] [Updated] (KAFKA-3790) Default options when removing ACLs do not comply with documentation

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-3790:
--
Component/s: documentation

> Default options when removing ACLs do not comply with documentation
> ---
>
> Key: KAFKA-3790
> URL: https://issues.apache.org/jira/browse/KAFKA-3790
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation, security
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: Sébastien Launay
>Priority: Minor
>
> When removing ACLs without providing options like principal, host or 
> operation, we get a prompt for removing all the matching ACLs, but when 
> the command executes none are removed.
> The following commands can be used to reproduce the inconsistency:
> {noformat}
> $ ./bin/kafka-acls.sh --authorizer-properties 
> zookeeper.connect=localhost:2181 -list -topic test
> Current ACLs for resource `Topic:test`: 
> $ ./bin/kafka-acls.sh --authorizer-properties 
> zookeeper.connect=localhost:2181 --add --allow-principal User:Alice 
> --operation Write --topic test --allow-host 1.2.3.4
> Adding ACLs for resource `Topic:test`: 
>   User:Alice has Allow permission for operations: Write from hosts: 
> 1.2.3.4 
> Current ACLs for resource `Topic:test`: 
>   User:Alice has Allow permission for operations: Write from hosts: 
> 1.2.3.4 
> $ ./bin/kafka-acls.sh --authorizer-properties 
> zookeeper.connect=localhost:2181 --remove --allow-principal User:Alice 
> --topic test 
> Are you sure you want to remove ACLs: 
>   User:Alice has Allow permission for operations: All from hosts: * 
>  from resource `Topic:test`? (y/n)
> y
> Current ACLs for resource `Topic:test`: 
>   User:Alice has Allow permission for operations: Write from hosts: 
> 1.2.3.4 
> {noformat}
> *The Current ACLs for resource {{Topic:test}} is expected to be empty after 
> the last command.*
> Only a specific ACL (when all options mentioned above are provided) or else 
> all the ACLs for a given resource (none of the options mentioned above are 
> provided) can get removed as shown by the following code snippets:
> {noformat}
>   // AclCommand.scala
>   ...
>   private def removeAcl(opts: AclCommandOptions) {
>     withAuthorizer(opts) { authorizer =>
>       val resourceToAcl = getResourceToAcls(opts)
>       for ((resource, acls) <- resourceToAcl) {
>         if (acls.isEmpty) {
>           if (confirmAction(opts, s"Are you sure you want to delete all ACLs for resource `${resource}`? (y/n)"))
>             authorizer.removeAcls(resource)
>         } else {
>           if (confirmAction(opts, s"Are you sure you want to remove ACLs: $Newline ${acls.map("\t" + _).mkString(Newline)} $Newline from resource `${resource}`? (y/n)"))
>             authorizer.removeAcls(acls, resource)
>         }
>       }
>       listAcl(opts)
>     }
>   }
> ...
>   // SimpleAclAuthorizer.scala
> ...
>   override def removeAcls(aclsTobeRemoved: Set[Acl], resource: Resource): 
> Boolean = {
>  inWriteLock(lock) {
>updateResourceAcls(resource) { currentAcls =>
> currentAcls -- aclsTobeRemoved
>}
>  }
>}
> {noformat}
> A workaround consists of listing the ACLs in order to know which exact one to 
> remove, which makes the automation of ACL management trickier.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-3790) Default options when removing ACLs do not comply with documentation

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3790?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-3790:
--
Component/s: security

> Default options when removing ACLs do not comply with documentation
> ---
>
> Key: KAFKA-3790
> URL: https://issues.apache.org/jira/browse/KAFKA-3790
> Project: Kafka
>  Issue Type: Bug
>  Components: documentation, security
>Affects Versions: 0.9.0.1, 0.10.0.0
>Reporter: Sébastien Launay
>Priority: Minor
>
> When removing ACLs without providing options like principal, host, or 
> operation, we get a prompt for removing all the matching ACLs, but when the 
> command is executed none actually get removed.
> The following commands can be used to reproduce the inconsistency:
> {noformat}
> $ ./bin/kafka-acls.sh --authorizer-properties 
> zookeeper.connect=localhost:2181 -list -topic test
> Current ACLs for resource `Topic:test`: 
> $ ./bin/kafka-acls.sh --authorizer-properties 
> zookeeper.connect=localhost:2181 --add --allow-principal User:Alice 
> --operation Write --topic test --allow-host 1.2.3.4
> Adding ACLs for resource `Topic:test`: 
>   User:Alice has Allow permission for operations: Write from hosts: 
> 1.2.3.4 
> Current ACLs for resource `Topic:test`: 
>   User:Alice has Allow permission for operations: Write from hosts: 
> 1.2.3.4 
> $ ./bin/kafka-acls.sh --authorizer-properties 
> zookeeper.connect=localhost:2181 --remove --allow-principal User:Alice 
> --topic test 
> Are you sure you want to remove ACLs: 
>   User:Alice has Allow permission for operations: All from hosts: * 
>  from resource `Topic:test`? (y/n)
> y
> Current ACLs for resource `Topic:test`: 
>   User:Alice has Allow permission for operations: Write from hosts: 
> 1.2.3.4 
> {noformat}
> *The current ACLs for resource {{Topic:test}} are expected to be empty after 
> the last command.*
> Only a specific ACL (when all options mentioned above are provided) or else 
> all the ACLs for a given resource (none of the options mentioned above are 
> provided) can get removed as shown by the following code snippets:
> {noformat}
>   // AclCommand.scala
>   ...
>   private def removeAcl(opts: AclCommandOptions) {
>     withAuthorizer(opts) { authorizer =>
>       val resourceToAcl = getResourceToAcls(opts)
>       for ((resource, acls) <- resourceToAcl) {
>         if (acls.isEmpty) {
>           if (confirmAction(opts, s"Are you sure you want to delete all ACLs for resource `${resource}`? (y/n)"))
>             authorizer.removeAcls(resource)
>         } else {
>           if (confirmAction(opts, s"Are you sure you want to remove ACLs: $Newline ${acls.map("\t" + _).mkString(Newline)} $Newline from resource `${resource}`? (y/n)"))
>             authorizer.removeAcls(acls, resource)
>         }
>       }
>       listAcl(opts)
>     }
>   }
> ...
>   // SimpleAclAuthorizer.scala
> ...
>   override def removeAcls(aclsTobeRemoved: Set[Acl], resource: Resource): Boolean = {
>     inWriteLock(lock) {
>       updateResourceAcls(resource) { currentAcls =>
>         currentAcls -- aclsTobeRemoved
>       }
>     }
>   }
> {noformat}
> A workaround consists of listing the ACLs in order to know which exact one to 
> remove, which makes the automation of ACL management trickier.
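One reading of the snippets above is that when no options are given, the confirmation prompt displays a placeholder ACL (All operations, all hosts), but `currentAcls -- aclsTobeRemoved` performs a plain set difference, so an ACL that does not exactly match the stored one removes nothing. A minimal Java sketch of that set-difference behavior (the string stand-ins for the real `Acl` type are hypothetical):

```java
import java.util.HashSet;
import java.util.Set;

public class AclSetDifference {
    public static void main(String[] args) {
        // The ACL actually stored for the resource (illustrative string form)
        Set<String> currentAcls = new HashSet<>();
        currentAcls.add("User:Alice Allow Write 1.2.3.4");

        // The placeholder ACL shown in the confirmation prompt
        Set<String> aclsToBeRemoved = new HashSet<>();
        aclsToBeRemoved.add("User:Alice Allow All *");

        // Equivalent of Scala's `currentAcls -- aclsTobeRemoved`:
        // only exact matches are removed, so nothing changes here
        currentAcls.removeAll(aclsToBeRemoved);
        System.out.println(currentAcls);
    }
}
```

This matches the observed behavior: the prompt claims all ACLs will be removed, yet the stored `Write from 1.2.3.4` ACL survives.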



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-3980) JmxReporter uses excessive memory causing OutOfMemoryException

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3980?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-3980:
--
Component/s: metrics

> JmxReporter uses excessive memory causing OutOfMemoryException
> --
>
> Key: KAFKA-3980
> URL: https://issues.apache.org/jira/browse/KAFKA-3980
> Project: Kafka
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 0.9.0.1
>Reporter: Andrew Jorgensen
>Priority: Major
>
> I have some nodes in a Kafka cluster that occasionally run out of memory 
> whenever I restart the producers. I was able to take a heap dump from both a 
> recently restarted Kafka node, which weighed in at about 20 MB, and a node 
> that has been running for 2 months, which is using over 700 MB of memory. 
> Looking at the heap dumps, it looks like the JmxReporter is holding on to 
> metrics and causing them to build up over time. 
> !http://imgur.com/N6Cd0Ku.png!
> !http://imgur.com/kQBqA2j.png!
> The ultimate problem this causes is that there is a chance that restarting 
> the producers will cause the node to experience a Java heap space exception 
> and OOM. The nodes then fail to start up correctly and write a -1 as the 
> leader number to the partitions they were responsible for, effectively 
> resetting the offset and rendering those partitions unavailable. The Kafka 
> process then needs to be restarted in order to re-assign the node to the 
> partitions it owns.
> I have a few questions:
> 1. I am not quite sure why there are so many client id entries in that 
> JmxReporter map.
> 2. Is there a way to have the JmxReporter release metrics after a set amount 
> of time or a way to turn certain high cardinality metrics like these off?
> I can provide any logs or heap dumps if more information is needed.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-3577) Partial cluster breakdown

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang resolved KAFKA-3577.
---
Resolution: Duplicate

> Partial cluster breakdown
> -
>
> Key: KAFKA-3577
> URL: https://issues.apache.org/jira/browse/KAFKA-3577
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
> Environment: Debian GNU/Linux 7.9 (wheezy)
>Reporter: Kim Christensen
>Priority: Major
>
> We run a cluster of 3 brokers and 3 zookeepers, but after we upgraded to 
> 0.9.0.1 our cluster sometimes goes partially down, and we can't figure out 
> why. A full cluster restart fixed the problem.
> I've added a snippet of the logs on each broker below.
> Broker 4:
> {quote}
> [2016-04-18 05:58:26,390] INFO [Group Metadata Manager on Broker 4]: Removed 
> 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
> [2016-04-18 06:05:55,218] INFO Creating /controller (is it secure? false) 
> (kafka.utils.ZKCheckedEphemeral)
> [2016-04-18 06:05:57,396] ERROR Session has expired while creating 
> /controller (kafka.utils.ZKCheckedEphemeral)
> [2016-04-18 06:05:57,396] INFO Result of znode creation is: SESSIONEXPIRED 
> (kafka.utils.ZKCheckedEphemeral)
> [2016-04-18 06:05:57,400] ERROR Error while electing or becoming leader on 
> broker 4 (kafka.server.ZookeeperLeaderElector)
> org.I0Itec.zkclient.exception.ZkException: 
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired
> at 
> org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:68)
> at kafka.utils.ZKCheckedEphemeral.create(ZkUtils.scala:1090)
> at 
> kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:81)
> at 
> kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:146)
> at 
> kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:141)
> at 
> kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:141)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
> at 
> kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:141)
> at org.I0Itec.zkclient.ZkClient$9.run(ZkClient.java:823)
> at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: 
> KeeperErrorCode = Session expired
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> ... 9 more
> [2016-04-18 06:05:57,420] INFO Creating /controller (is it secure? false) 
> (kafka.utils.ZKCheckedEphemeral)
> [2016-04-18 06:05:57,424] INFO Result of znode creation is: OK 
> (kafka.utils.ZKCheckedEphemeral)
> [2016-04-18 06:05:57,425] INFO 4 successfully elected as leader 
> (kafka.server.ZookeeperLeaderElector)
> [2016-04-18 06:05:57,885] INFO [ReplicaFetcherManager on broker 4] Removed 
> fetcher for partitions 
> [__consumer_offsets,32],[__consumer_offsets,44],[cdrrecords-errors,1],[cdrrecords,0],[__consumer_offsets,38],[__consumer_offsets,8],[events
> ,2],[__consumer_offsets,20],[__consumer_offsets,2],[__consumer_offsets,14],[__consumer_offsets,26]
>  (kafka.server.ReplicaFetcherManager)
> [2016-04-18 06:05:57,892] INFO [ReplicaFetcherManager on broker 4] Removed 
> fetcher for partitions 
> [__consumer_offsets,35],[__consumer_offsets,23],[__consumer_offsets,47],[__consumer_offsets,11],[__consumer_offsets,5],[events-errors,2],[_
> _consumer_offsets,17],[__consumer_offsets,41],[__consumer_offsets,29] 
> (kafka.server.ReplicaFetcherManager)
> [2016-04-18 06:05:57,894] INFO Truncating log __consumer_offsets-17 to offset 
> 0. (kafka.log.Log)
> [2016-04-18 06:05:57,894] INFO Truncating log __consumer_offsets-23 to offset 
> 0. (kafka.log.Log)
> [2016-04-18 06:05:57,894] INFO Truncating log __consumer_offsets-29 to offset 
> 0. (kafka.log.Log)
> [2016-04-18 06:05:57,895] INFO Truncating log __consumer_offsets-35 to offset 
> 0. (kafka.log.Log)
> [2016-04-18 06:05:57,895] INFO Truncating log __consumer_offsets-41 to offset 
> 0. (kafka.log.Log)
> [2016-04-18 06:05:57,895] INFO Truncating log events-errors-2 to offset 0. 
> (kafka.log.Log)
> [2016-04-18 06:05:57,895] INFO Truncating log __consumer_offsets-5 to offset 
> 0. (kafka.log.Log)
> [2016-04-18 06:05:57,895] INFO Truncating log __consumer_offsets-11 to offset 
> 0. (kafka.log.Log)
> [2016-04-18 06:05:57,896] INFO Truncating log __consumer_offsets-47 to offset 
> 0. (kafka.log.Log)
> [2016-04-18 06:05:57,904] INFO [ReplicaFetcherManager on broker 4] Added 
> fetcher for partitions List([[__consumer_offsets,17], initOffset 0 

[jira] [Updated] (KAFKA-3494) mbeans overwritten with identical clients on a single jvm

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3494?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-3494:
--
Component/s: metrics

> mbeans overwritten with identical clients on a single jvm
> -
>
> Key: KAFKA-3494
> URL: https://issues.apache.org/jira/browse/KAFKA-3494
> Project: Kafka
>  Issue Type: Bug
>  Components: metrics
>Reporter: Onur Karaman
>Priority: Major
>
> Quotas today are implemented on a (client-id, broker) granularity. I think 
> one of the motivating factors for using a simple quota id was to allow for 
> flexibility in the granularity of the quota enforcement. For instance, entire 
> services can share the same id to get some form of (service, broker) 
> granularity quotas. From my understanding, client-id was chosen as the quota 
> id because it's a property that already exists on the clients and reusing it 
> had relatively low impact.
> Continuing the above example, let's say a service spins up multiple 
> KafkaConsumers in one JVM sharing the same client-id because they want those 
> consumers to be quota'd as a single entity. Sharing client-ids within a 
> single JVM causes problems in client metrics, since the mbean tags only go as 
> granular as the client-id.
> An easy example is kafka-metrics-count. Here's a sample code snippet:
> {code}
> package org.apache.kafka.clients.consumer;
> import java.util.Collections;
> import java.util.Properties;
> import org.apache.kafka.common.TopicPartition;
> public class KafkaConsumerMetrics {
> public static void main(String[] args) throws InterruptedException {
> Properties properties = new Properties();
> properties.setProperty(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, 
> "localhost:9092");
> properties.setProperty(ConsumerConfig.GROUP_ID_CONFIG, "test");
> properties.setProperty(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, 
> "org.apache.kafka.common.serialization.StringDeserializer");
> 
> properties.setProperty(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, 
> "org.apache.kafka.common.serialization.StringDeserializer");
> properties.setProperty(ConsumerConfig.CLIENT_ID_CONFIG, 
> "testclientid");
> KafkaConsumer kc1 = new KafkaConsumer<>(properties);
> KafkaConsumer kc2 = new KafkaConsumer<>(properties);
> kc1.assign(Collections.singletonList(new TopicPartition("t1", 0)));
> while (true) {
> kc1.poll(1000);
> System.out.println("kc1 metrics: " + kc1.metrics().size());
> System.out.println("kc2 metrics: " + kc2.metrics().size());
> Thread.sleep(1000);
> }
> }
> }
> {code}
> jconsole shows one mbean 
> kafka.consumer:type=kafka-metrics-count,client-id=testclientid consistently 
> with value 40.
> but stdout shows:
> {code}
> kc1 metrics: 69
> kc2 metrics: 40
> {code}
> I think the two possible solutions are:
> 1. add finer granularity to the mbeans to distinguish between the clients
> 2. require client ids to be unique per jvm like KafkaStreams has done



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-3420) Transient failure in OffsetCommitTest.testNonExistingTopicOffsetCommit

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3420?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-3420:
--
Component/s: unit tests

> Transient failure in OffsetCommitTest.testNonExistingTopicOffsetCommit
> --
>
> Key: KAFKA-3420
> URL: https://issues.apache.org/jira/browse/KAFKA-3420
> Project: Kafka
>  Issue Type: Sub-task
>  Components: unit tests
>Reporter: Jason Gustafson
>Priority: Major
>  Labels: transient-unit-test-failure
>
> From a recent build. Possibly related to KAFKA-2068, which was committed 
> recently.
> {code}
> java.lang.AssertionError: expected:<0> but was:<16>
>   at org.junit.Assert.fail(Assert.java:88)
>   at org.junit.Assert.failNotEquals(Assert.java:834)
>   at org.junit.Assert.assertEquals(Assert.java:645)
>   at org.junit.Assert.assertEquals(Assert.java:631)
>   at 
> kafka.server.OffsetCommitTest.testNonExistingTopicOffsetCommit(OffsetCommitTest.scala:308)
>   at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
>   at 
> sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.junit.runners.model.FrameworkMethod$1.runReflectiveCall(FrameworkMethod.java:50)
>   at 
> org.junit.internal.runners.model.ReflectiveCallable.run(ReflectiveCallable.java:12)
>   at 
> org.junit.runners.model.FrameworkMethod.invokeExplosively(FrameworkMethod.java:47)
>   at 
> org.junit.internal.runners.statements.InvokeMethod.evaluate(InvokeMethod.java:17)
>   at 
> org.junit.internal.runners.statements.RunBefores.evaluate(RunBefores.java:26)
>   at 
> org.junit.internal.runners.statements.RunAfters.evaluate(RunAfters.java:27)
>   at org.junit.runners.ParentRunner.runLeaf(ParentRunner.java:325)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:78)
>   at 
> org.junit.runners.BlockJUnit4ClassRunner.runChild(BlockJUnit4ClassRunner.java:57)
>   at org.junit.runners.ParentRunner$3.run(ParentRunner.java:290)
>   at org.junit.runners.ParentRunner$1.schedule(ParentRunner.java:71)
>   at org.junit.runners.ParentRunner.runChildren(ParentRunner.java:288)
>   at org.junit.runners.ParentRunner.access$000(ParentRunner.java:58)
>   at org.junit.runners.ParentRunner$2.evaluate(ParentRunner.java:268)
>   at org.junit.runners.ParentRunner.run(ParentRunner.java:363)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.runTestClass(JUnitTestClassExecuter.java:105)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassExecuter.execute(JUnitTestClassExecuter.java:56)
>   at 
> org.gradle.api.internal.tasks.testing.junit.JUnitTestClassProcessor.processTestClass(JUnitTestClassProcessor.java:64)
>   at 
> org.gradle.api.internal.tasks.testing.SuiteTestClassProcessor.processTestClass(SuiteTestClassProcessor.java:49)
>   at sun.reflect.GeneratedMethodAccessor12.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.messaging.dispatch.ContextClassLoaderDispatch.dispatch(ContextClassLoaderDispatch.java:32)
>   at 
> org.gradle.messaging.dispatch.ProxyDispatchAdapter$DispatchingInvocationHandler.invoke(ProxyDispatchAdapter.java:93)
>   at com.sun.proxy.$Proxy2.processTestClass(Unknown Source)
>   at 
> org.gradle.api.internal.tasks.testing.worker.TestWorker.processTestClass(TestWorker.java:106)
>   at sun.reflect.GeneratedMethodAccessor11.invoke(Unknown Source)
>   at 
> sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
>   at java.lang.reflect.Method.invoke(Method.java:606)
>   at 
> org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:35)
>   at 
> org.gradle.messaging.dispatch.ReflectionDispatch.dispatch(ReflectionDispatch.java:24)
>   at 
> org.gradle.messaging.remote.internal.hub.MessageHub$Handler.run(MessageHub.java:360)
>   at 
> org.gradle.internal.concurrent.ExecutorPolicy$CatchAndRecordFailures.onExecute(ExecutorPolicy.java:54)
>   at 
> org.gradle.internal.concurrent.StoppableExecutorImpl$1.run(StoppableExecutorImpl.java:40)
>   at 
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
>   at 
> 

[jira] [Updated] (KAFKA-3241) JmxReporter produces invalid JSON when a value is Infinity

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3241?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-3241:
--
Component/s: metrics

> JmxReporter produces invalid JSON when a value is Infinity
> --
>
> Key: KAFKA-3241
> URL: https://issues.apache.org/jira/browse/KAFKA-3241
> Project: Kafka
>  Issue Type: Bug
>  Components: metrics
>Reporter: Babak Behzad
>Priority: Major
>
> We recently realized that when JmxReporter$KafkaMbean has some metrics with 
> the value Infinity, the JSON created is invalid, since the string values 
> "Infinity" and "-Infinity" are not in double quotes! Here's an example:
> {noformat}
>  {
> "name" : 
> "kafka.producer:type=producer-node-metrics,client-id=producer-1,node-id=node-1",
> "modelerType" : "org.apache.kafka.common.metrics.JmxReporter$KafkaMbean",
> "request-rate" : 0.0,
> "request-size-avg" : 0.0,
> "incoming-byte-rate" : 0.0,
> "request-size-max" : -Infinity,
> "outgoing-byte-rate" : 0.0,
> "request-latency-max" : -Infinity,
> "request-latency-avg" : 0.0,
> "response-rate" : 0.0
>   }
> {noformat}
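The bare token comes straight from Java's double formatting: `Double.toString` renders infinities as `Infinity`/`-Infinity`, which are not legal JSON literals unless quoted. A minimal sketch of how the invalid output arises and one possible fix (the field name is reused from the example above; the quoting strategy is just one option):

```java
public class InfinityJson {
    public static void main(String[] args) {
        double requestSizeMax = Double.NEGATIVE_INFINITY;

        // Naive string conversion emits the bare token -Infinity,
        // which strict JSON parsers reject
        String invalid = "{ \"request-size-max\" : " + requestSizeMax + " }";
        System.out.println(invalid);

        // Quoting non-finite values keeps the document valid JSON
        String value = Double.isFinite(requestSizeMax)
                ? Double.toString(requestSizeMax)
                : "\"" + requestSizeMax + "\"";
        System.out.println("{ \"request-size-max\" : " + value + " }");
    }
}
```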



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-3094) Kafka process 100% CPU when no message in topic

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang resolved KAFKA-3094.
---
Resolution: Cannot Reproduce

> Kafka process 100% CPU when no message in topic
> ---
>
> Key: KAFKA-3094
> URL: https://issues.apache.org/jira/browse/KAFKA-3094
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.0
>Reporter: Omar AL Zabir
>Priority: Major
>
> When there is no message in a Kafka topic and it is not getting any traffic 
> for some time, all the Kafka nodes go to 100% CPU. 
> As soon as I post a message, the CPU comes back to normal. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-7239) Kafka Connect secret externalization not working

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-7239?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-7239:
--
Component/s: KafkaConnect

> Kafka Connect secret externalization not working
> 
>
> Key: KAFKA-7239
> URL: https://issues.apache.org/jira/browse/KAFKA-7239
> Project: Kafka
>  Issue Type: Bug
>  Components: KafkaConnect
>Reporter: satyanarayan komandur
>Priority: Major
>
> I used the Kafka FileConfigProvider to externalize properties like 
> connection.user and connection.password for the JDBC source connector. I 
> noticed that the values in the connection properties are being replaced only 
> after the connector has attempted to establish a connection with the original 
> (untransformed) key/value pairs. This results in a connection failure. I am 
> not sure if this issue belongs to the Kafka Connect framework or is an issue 
> with the JDBC Source Connector.
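For context, externalizing secrets with FileConfigProvider takes a worker-level provider registration plus `${file:...}` placeholders in the connector config; the file path and key names below are hypothetical:

```properties
# Worker configuration (e.g. connect-distributed.properties)
config.providers=file
config.providers.file.class=org.apache.kafka.common.config.provider.FileConfigProvider

# Connector configuration references keys in a local properties file
# (path and key names are made up for illustration)
connection.user=${file:/etc/kafka/secrets.properties:jdbc.user}
connection.password=${file:/etc/kafka/secrets.properties:jdbc.password}
```

The reported behavior suggests the placeholders were resolved after, rather than before, the first connection attempt.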



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-1736) Improve partition-broker assignment strategy for better availability in majority durability modes

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-1736?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-1736:
--
Component/s: replication

> Improve partition-broker assignment strategy for better availability in 
> majority durability modes
> ---
>
> Key: KAFKA-1736
> URL: https://issues.apache.org/jira/browse/KAFKA-1736
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.1.1
>Reporter: Kyle Banker
>Priority: Minor
> Attachments: Partitioner.scala
>
>
> The current random strategy of partition-to-broker distribution combined with 
> a fairly typical use of min.isr and request.acks results in a suboptimal 
> level of availability.
> Specifically, if all of your topics have a replication factor of 3, and you 
> use min.isr=2 and required.acks=all, then regardless of the number of the 
> brokers in the cluster, you can safely lose only 1 node. Losing more than 1 
> node will, 95% of the time, result in the inability to write to at least one 
> partition, thus rendering the cluster unavailable. As the total number of 
> partitions increases, so does this probability.
> On the other hand, if partitions are distributed so that brokers are 
> effectively replicas of each other, then the probability of unavailability 
> when two nodes are lost is significantly decreased. This probability 
> continues to decrease as the size of the cluster increases and, more 
> significantly, this probability is constant with respect to the total number 
> of partitions. The only requirement for getting these numbers with this 
> strategy is that the number of brokers be a multiple of the replication 
> factor.
> Here are the results of some simulations I've run:
> With Random Partition Assignment
> Number of Brokers / Number of Partitions / Replication Factor / Probability 
> that two randomly selected nodes will contain at least 1 of the same 
> partitions
> 6  / 54 / 3 / .999
> 9  / 54 / 3 / .986
> 12 / 54 / 3 / .894
> Broker-Replica-Style Partitioning
> Number of Brokers / Number of Partitions / Replication Factor / Probability 
> that two randomly selected nodes will contain at least 1 of the same 
> partitions
> 6  / 54 / 3 / .424
> 9  / 54 / 3 / .228
> 12 / 54 / 3 / .168
> Adopting this strategy will greatly increase availability for users wanting 
> majority-style durability and should not change current behavior as long as 
> leader partitions are assigned evenly. I don't know of any negative impact 
> for other use cases, as in these cases, the distribution will still be 
> effectively random.
> Let me know if you'd like to see simulation code and whether a patch would be 
> welcome.
> EDIT: Just to clarify, here's how the current partition assigner would assign 
> 9 partitions with 3 replicas each to a 9-node cluster (partition number -> 
> replica list).
> 0 = Some(List(2, 3, 4))
> 1 = Some(List(3, 4, 5))
> 2 = Some(List(4, 5, 6))
> 3 = Some(List(5, 6, 7))
> 4 = Some(List(6, 7, 8))
> 5 = Some(List(7, 8, 9))
> 6 = Some(List(8, 9, 1))
> 7 = Some(List(9, 1, 2))
> 8 = Some(List(1, 2, 3))
> Here's how I'm proposing they be assigned:
> 0 = Some(ArrayBuffer(8, 5, 2))
> 1 = Some(ArrayBuffer(8, 5, 2))
> 2 = Some(ArrayBuffer(8, 5, 2))
> 3 = Some(ArrayBuffer(7, 4, 1))
> 4 = Some(ArrayBuffer(7, 4, 1))
> 5 = Some(ArrayBuffer(7, 4, 1))
> 6 = Some(ArrayBuffer(6, 3, 0))
> 7 = Some(ArrayBuffer(6, 3, 0))
> 8 = Some(ArrayBuffer(6, 3, 0))
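The availability argument above can be checked with a small simulation: assign partitions either randomly or in fixed broker groups, then measure the fraction of broker pairs that share at least one partition (a pair sharing a partition is a pair whose simultaneous loss can make a min.isr=2 partition unwritable). This is an independent sketch, not the reporter's simulation code, so the exact figures will differ from the table above:

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;

public class AssignmentSim {
    // Fraction of broker pairs that share at least one partition replica.
    static double collisionProb(int brokers, int partitions, int rf,
                                boolean grouped, Random rnd) {
        List<Set<Integer>> replicaSets = new ArrayList<>();
        for (int p = 0; p < partitions; p++) {
            Set<Integer> replicas = new HashSet<>();
            if (grouped) {
                // Broker-replica-style: brokers form fixed groups of size rf,
                // and each partition's replicas all land in one group
                int group = p % (brokers / rf);
                for (int i = 0; i < rf; i++) replicas.add(group * rf + i);
            } else {
                // Random assignment: rf distinct brokers per partition
                while (replicas.size() < rf) replicas.add(rnd.nextInt(brokers));
            }
            replicaSets.add(replicas);
        }
        int sharing = 0, pairs = 0;
        for (int a = 0; a < brokers; a++) {
            for (int b = a + 1; b < brokers; b++) {
                pairs++;
                for (Set<Integer> r : replicaSets) {
                    if (r.contains(a) && r.contains(b)) { sharing++; break; }
                }
            }
        }
        return (double) sharing / pairs;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        System.out.printf("random:  %.3f%n", collisionProb(9, 54, 3, false, rnd));
        System.out.printf("grouped: %.3f%n", collisionProb(9, 54, 3, true, rnd));
    }
}
```

With 9 brokers and replication factor 3, grouped assignment caps the sharing fraction at the fraction of same-group pairs, while random assignment with 54 partitions makes nearly every broker pair share something.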



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-2471) Replicas Order and Leader out of sync

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-2471?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-2471:
--
Component/s: replication

> Replicas Order and Leader out of sync
> -
>
> Key: KAFKA-2471
> URL: https://issues.apache.org/jira/browse/KAFKA-2471
> Project: Kafka
>  Issue Type: Bug
>  Components: replication
>Affects Versions: 0.8.2.1
>Reporter: Manish Sharma
>Priority: Major
>
> Our 2 Kafka brokers (1 & 5) were rebooted due to the hypervisor going down, 
> and I think we encountered an issue similar to the one discussed in the 
> thread "Problem with node after restart no partitions?". The resulting JIRA 
> is closed without conclusions or recovery steps. 
> Our Brokers 5 and 1 were also running zookeeper of our cluster (along with 
> broker 2),
> we are running kafka version 0.8.2.1
> After doing controlled restarts of all brokers a few times, our cluster 
> seems OK now.
> But there are some topics that have replicas out of sync with leaders.
> Partition 2 below has leader 5, so the replica order should be 5,1:
> {code}
> Topic:2015-01-12PartitionCount:3ReplicationFactor:2 
> Configs:
> Topic: 2015-01-12   Partition: 0Leader: 4   Replicas: 4,3 
>   Isr: 3,4
> Topic: 2015-01-12   Partition: 1Leader: 0   Replicas: 0,4 
>   Isr: 0,4
> Topic: 2015-01-12   Partition: 2Leader: 5   Replicas: 1,5 
>   Isr: 5
> {code}
> I tried reassigning partition 2's replicas to broker 5 (the leader) and 
> broker 0. Now the partition reassignment has been stuck for more than a day. 
> %) /usr/local/kafka/bin/kafka-reassign-partitions.sh --zookeeper 
> kafka-trgt05:2182 --reassignment-json-file 2015-01-12_2.json --verify
> Status of partition reassignment:
> Reassignment of partition [2015-01-12,2] is still in progress
> And in ZooKeeper, reassign_partitions is empty:
> [zk: kafka-trgt05:2182(CONNECTED) 2] ls /admin/reassign_partitions
> []
> This seems like a bug being triggered that leaves the cluster in an unhealthy 
> state.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-2127) Running TopicCommand --alter causes connection close/reset errors in kafka logs

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-2127?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-2127:
--
Component/s: tools

> Running TopicCommand --alter causes connection close/reset errors in kafka 
> logs
> ---
>
> Key: KAFKA-2127
> URL: https://issues.apache.org/jira/browse/KAFKA-2127
> Project: Kafka
>  Issue Type: Bug
>  Components: tools
>Reporter: Jason Rosenberg
>Priority: Minor
>
> I am using 0.8.2.1.  I've been noticing that any time I use TopicCommand to 
> alter a topic (e.g. add partitions) or delete a topic, the broker logs show a 
> bunch of closed connections, and usually 2 or 3 Connection reset exceptions.  
> It logs these with ERROR status.
> I recently used the kafka.admin.TopicCommand tool to increase the partitions 
> for a topic from 1 to 4.  So I ran:
> {code}
>  java -cp kafka.jar kafka.admin.TopicCommand --zookeeper myzkserver:12345 
> --topic mytopic --alter --partitions 4
> {code}
> This resulted in the following sequence in the broker log (repeated pretty 
> much in the logs of each broker):
> {code}
> 2015-04-16 03:51:26,156  INFO [kafka-network-thread-27330-1] 
> network.Processor - Closing socket connection to /1.2.3.12.
> 2015-04-16 03:51:26,169  INFO [kafka-network-thread-27330-0] 
> network.Processor - Closing socket connection to /1.2.3.89.
> 2015-04-16 03:51:26,169  INFO [kafka-network-thread-27330-0] 
> network.Processor - Closing socket connection to /1.2.3.95.
> 2015-04-16 03:51:26,176 ERROR [kafka-network-thread-27330-2] 
> network.Processor - Closing socket for /1.2.4.34 because of error
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> at kafka.utils.Utils$.read(Utils.scala:380)
> at 
> kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
> at kafka.network.Processor.read(SocketServer.scala:444)
> at kafka.network.Processor.run(SocketServer.scala:340)
> at java.lang.Thread.run(Thread.java:745)
> 2015-04-16 03:51:26,178 ERROR [kafka-network-thread-27330-1] 
> network.Processor - Closing socket for /1.2.4.59 because of error
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> at kafka.utils.Utils$.read(Utils.scala:380)
> at 
> kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
> at kafka.network.Processor.read(SocketServer.scala:444)
> at kafka.network.Processor.run(SocketServer.scala:340)
> at java.lang.Thread.run(Thread.java:745)
> 2015-04-16 03:51:26,192 ERROR [kafka-network-thread-27330-1] 
> network.Processor - Closing socket for /1.2.3.11 because of error
> java.io.IOException: Connection reset by peer
> at sun.nio.ch.FileDispatcherImpl.read0(Native Method)
> at sun.nio.ch.SocketDispatcher.read(SocketDispatcher.java:39)
> at sun.nio.ch.IOUtil.readIntoNativeBuffer(IOUtil.java:223)
> at sun.nio.ch.IOUtil.read(IOUtil.java:197)
> at sun.nio.ch.SocketChannelImpl.read(SocketChannelImpl.java:379)
> at kafka.utils.Utils$.read(Utils.scala:380)
> at 
> kafka.network.BoundedByteBufferReceive.readFrom(BoundedByteBufferReceive.scala:54)
> at kafka.network.Processor.read(SocketServer.scala:444)
> at kafka.network.Processor.run(SocketServer.scala:340)
> at java.lang.Thread.run(Thread.java:745)
> 2015-04-16 03:51:26,451  INFO [kafka-request-handler-3] 
> server.ReplicaFetcherManager - [ReplicaFetcherManager on broker 45] Removed 
> fetcher for partitions [mytopic,2]
> 2015-04-16 03:51:26,453  INFO [kafka-request-handler-3] log.Log - Completed 
> load of log mytopic-2 with log end offset 0
> 2015-04-16 03:51:26,454  INFO [kafka-request-handler-3] log.LogManager - 
> Created log for partition [mytopic,2] in /data/kafka_logs with properties 
> {segment.index.bytes -> 10485760, file.delete.delay.ms -> 6, 
> segment.bytes -> 1073741824, flush.ms -> 9223372036854775807, 
> delete.retention.ms -> 8640, index.interval.bytes -> 4096, 
> retention.bytes -> 500, min.insync.replicas -> 1, cleanup.policy -> 
> delete, unclean.leader.election.enable -> 

[jira] [Updated] (KAFKA-1543) Changing replication factor

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-1543:
--
Component/s: replication

> Changing replication factor
> ---
>
> Key: KAFKA-1543
> URL: https://issues.apache.org/jira/browse/KAFKA-1543
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication, tools
>Reporter: Alexey Ozeritskiy
>Priority: Major
> Attachments: can-change-replication.patch
>
>
> It is difficult to change the replication factor by manually editing the JSON config.
> I propose to add a key to the kafka-reassign-partitions.sh command to 
> automatically create the JSON config.
> Example of usage
> {code}
> kafka-reassign-partitions.sh --zookeeper zk --replicas new-replication-factor 
> --topics-to-move-json-file topics-file --broker-list 1,2,3,4 --generate > 
> output
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-1543) Changing replication factor

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-1543?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-1543:
--
Component/s: tools

> Changing replication factor
> ---
>
> Key: KAFKA-1543
> URL: https://issues.apache.org/jira/browse/KAFKA-1543
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication, tools
>Reporter: Alexey Ozeritskiy
>Priority: Major
> Attachments: can-change-replication.patch
>
>
> It is difficult to change the replication factor by manually editing the JSON config.
> I propose to add a key to the kafka-reassign-partitions.sh command to 
> automatically create the JSON config.
> Example of usage
> {code}
> kafka-reassign-partitions.sh --zookeeper zk --replicas new-replication-factor 
> --topics-to-move-json-file topics-file --broker-list 1,2,3,4 --generate > 
> output
> {code}





[jira] [Updated] (KAFKA-1665) controller state gets stuck in message after execute

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-1665?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-1665:
--
Component/s: controller

> controller state gets stuck in message after execute
> 
>
> Key: KAFKA-1665
> URL: https://issues.apache.org/jira/browse/KAFKA-1665
> Project: Kafka
>  Issue Type: Bug
>  Components: controller
>Reporter: Joe Stein
>Priority: Major
>
> I had a 0.8.1.1 Kafka Broker go down, and I was trying to use the reassign 
> partition script to move topics off that broker. When I describe the topics, 
> I see the following:
> Topic: mini__022active_120__33__mini Partition: 0 Leader: 2131118 
> Replicas: 2131118,2166601,2163421 Isr: 2131118,2166601
> This shows that the broker “2163421” is down. So I created the following file 
> /tmp/move_topic.json:
> {
>   "version": 1,
>   "partitions": [
>     {
>       "topic": "mini__022active_120__33__mini",
>       "partition": 0,
>       "replicas": [2131118, 2166601, 2156998]
>     }
>   ]
> }
> And then do this:
> ./kafka-reassign-partitions.sh --execute --reassignment-json-file 
> /tmp/move_topic.json
> Successfully started reassignment of partitions 
> {"version":1,"partitions":[{"topic":"mini__022active_120__33__mini","partition":0,"replicas":[2131118,2166601,2156998]}]}
> However, when I try to verify this, I get the following error:
> ./kafka-reassign-partitions.sh --verify --reassignment-json-file 
> /tmp/move_topic.json
> Status of partition reassignment:
> ERROR: Assigned replicas (2131118,2166601,2156998,2163421) don't match the 
> list of replicas for reassignment (2131118,2166601,2156998) for partition 
> [mini__022active_120__33__mini,0]
> Reassignment of partition [mini__022active_120__33__mini,0] failed
> If I describe the topics, I now see there are 4 replicas. This has been like 
> this for many hours now, so it seems to have permanently moved to 4 replicas 
> for some reason.
> Topic:mini__022active_120__33__mini PartitionCount:1 ReplicationFactor:4 
> Configs:
> Topic: mini__022active_120__33__mini Partition: 0 Leader: 2131118 
> Replicas: 2131118,2166601,2156998,2163421 Isr: 2131118,2166601
> If I re-execute and re-verify, I get the same error. So it seems to be wedged.





[jira] [Updated] (KAFKA-1712) Excessive storage usage on newly added node

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-1712?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-1712:
--
Component/s: log

> Excessive storage usage on newly added node
> ---
>
> Key: KAFKA-1712
> URL: https://issues.apache.org/jira/browse/KAFKA-1712
> Project: Kafka
>  Issue Type: Bug
>  Components: log
>Reporter: Oleg Golovin
>Priority: Major
>
> When a new node is added to the cluster, data starts replicating into it. The 
> mtime of the newly created segments will be set by the last message written to 
> them. Though replication is a prolonged process, let's assume (for 
> simplicity of explanation) that their mtime is very close to the time when 
> the new node was added.
> After the replication is done, new data will start to flow into this new 
> node. After `log.retention.hours` the amount of data will be 2 * 
> daily_amount_of_data_in_kafka_node (first one is the replicated data from 
> other nodes when the node was added (let us call it `t1`) and the second is 
> the amount of replicated data from other nodes which happened from `t1` to 
> `t1 + log.retention.hours`). So by that time the node will have twice as much 
> data as the other nodes.
> This poses a big problem to us as our storage is chosen to fit normal amount 
> of data (not twice this amount).
> In our particular case it poses another problem. We have an emergency segment 
> cleaner which runs in case storage is nearly full (>90%). We try to balance 
> the amount of data for it not to run to rely solely on kafka internal log 
> deletion, but sometimes emergency cleaner runs.
> It works this way:
> - it gets all kafka segments for the volume
> - it filters out last segments of each partition (just to avoid unnecessary 
> recreation of last small-size segments)
> - it sorts them by segment mtime
> - it changes the mtime of the first N segments (with the lowest mtime) to 1, so 
> they become really, really old. Number N is chosen to free a specified 
> percentage of the volume (3% in our case).  Kafka deletes these segments later 
> (as they are very old).
> Emergency cleaner works very well. Except for the case when the data is 
> replicated to the newly added node. 
> In this case segment mtime is the time the segment was replicated and does 
> not reflect the real creation time of original data stored in this segment.
> So in this case kafka emergency cleaner will delete segments with the lowest 
> mtime, which may hold the data which is much more recent than the data in 
> other segments.
> This is not a big problem until we delete data which hasn't been fully 
> consumed.
> In that case we lose data, and that makes it a big problem.
> Is it possible to retain segment mtime during initial replication on a new 
> node?
> This will help not to load the new node with the twice as large amount of 
> data as other nodes have.
> Or maybe there are other ways to sort segments by data creation time (or 
> close to data creation time)? (for example, if this ticket is implemented 
> https://issues.apache.org/jira/browse/KAFKA-1403, we may take the time of the 
> first message from .index). In our case it will help the kafka emergency 
> cleaner, which will then really be deleting the oldest data.
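The emergency-cleaner steps described above (collect segments, sort by mtime, age the oldest N so time-based retention deletes them) can be sketched as follows. This is an illustrative reimplementation, not the poster's actual tool; the aging value of 1 second comes from the description, and the byte-based stopping rule stands in for their "percentage of volume" criterion.

```python
import os

def age_oldest_segments(segment_paths, bytes_to_free):
    """Sort segments by mtime and set the mtime of the oldest ones to
    epoch+1s until roughly bytes_to_free bytes are covered, so that
    Kafka's time-based retention later deletes them as very old."""
    by_mtime = sorted(segment_paths, key=os.path.getmtime)
    freed = 0
    aged = []
    for path in by_mtime:
        if freed >= bytes_to_free:
            break
        freed += os.path.getsize(path)
        os.utime(path, (1, 1))  # atime/mtime = 1 second after the epoch
        aged.append(path)
    return aged
```

As the issue points out, this scheme silently misbehaves when mtime reflects replication time rather than data creation time, since the sort key no longer matches data age.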





[jira] [Updated] (KAFKA-1967) Support more flexible serialization in Log4jAppender

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-1967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-1967:
--
Component/s: logging

> Support more flexible serialization in Log4jAppender
> 
>
> Key: KAFKA-1967
> URL: https://issues.apache.org/jira/browse/KAFKA-1967
> Project: Kafka
>  Issue Type: Improvement
>  Components: logging
>Reporter: Jesse Yates
>Priority: Minor
> Attachments: kafka-1967-trunk.patch
>
>
> It would be nice to allow subclasses of the standard KafkaLog4jAppender to 
> be able to serialize the LoggingEvent however they choose, rather than always 
> having to write out a string.
> A possible use case (the one I'm interested in) allows implementors to 
> convert the event to any sort of bytes. This means downstream consumers don't 
> lose data based on the logging format, but instead can get the entire event 
> to do with as they please.
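A minimal sketch of the requested hook: an appender whose `serialize()` method subclasses override to emit arbitrary bytes instead of a formatted string. All names here are hypothetical illustrations, not the actual KafkaLog4jAppender API.

```python
import json

class KafkaAppenderSketch:
    """Illustrative appender: subclasses override serialize() to emit
    whatever bytes they want instead of a formatted string."""
    def __init__(self, send):
        self._send = send  # a producer.send-like callable

    def serialize(self, event) -> bytes:
        # Default: the formatted message as UTF-8, mirroring the
        # string-only behaviour the issue wants to generalize.
        return str(event).encode("utf-8")

    def append(self, event):
        self._send(self.serialize(event))

class JsonAppender(KafkaAppenderSketch):
    """Example subclass: ship the whole event as structured JSON bytes."""
    def serialize(self, event) -> bytes:
        return json.dumps({"msg": str(event)}).encode("utf-8")
```

The design point is that downstream consumers see whatever bytes the subclass chose, so no event fields are lost to a fixed string layout.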





[jira] [Updated] (KAFKA-2575) inconsistent offset count in replication-offset-checkpoint during leader election leads to huge exceptions

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-2575:
--
Component/s: (was: offset manager)
 replication

> inconsistent offset count in replication-offset-checkpoint during leader 
> election leads to huge exceptions
> 
>
> Key: KAFKA-2575
> URL: https://issues.apache.org/jira/browse/KAFKA-2575
> Project: Kafka
>  Issue Type: Bug
>  Components: replication
>Affects Versions: 0.8.2.1
>Reporter: Warren Jin
>Priority: Major
>
> We have 3 brokers, more than 100 topics in production, the default partition 
> number is 24 for each topic, the replication factor is 3.
> We noticed the following errors in recent days.
> 2015-09-22 22:25:12,529 ERROR Error on broker 1 while processing LeaderAndIsr 
> request correlationId 438501 received from controller 2 epoch 12 for 
> partition [LOGIST.DELIVERY.SUBSCRIBE,7] (state.change.logger)
> java.io.IOException: Expected 3918 entries but found only 3904
>   at kafka.server.OffsetCheckpoint.read(OffsetCheckpoint.scala:99)
>   at kafka.cluster.Partition.getOrCreateReplica(Partition.scala:91)
>   at 
> kafka.cluster.Partition$$anonfun$makeLeader$1$$anonfun$apply$mcZ$sp$4.apply(Partition.scala:171)
>   at 
> kafka.cluster.Partition$$anonfun$makeLeader$1$$anonfun$apply$mcZ$sp$4.apply(Partition.scala:171)
>   at scala.collection.immutable.Set$Set3.foreach(Set.scala:115)
>   at 
> kafka.cluster.Partition$$anonfun$makeLeader$1.apply$mcZ$sp(Partition.scala:171)
>   at 
> kafka.cluster.Partition$$anonfun$makeLeader$1.apply(Partition.scala:163)
>   at 
> kafka.cluster.Partition$$anonfun$makeLeader$1.apply(Partition.scala:163)
>   at kafka.utils.Utils$.inLock(Utils.scala:535)
>   at kafka.utils.Utils$.inWriteLock(Utils.scala:543)
>   at kafka.cluster.Partition.makeLeader(Partition.scala:163)
>   at 
> kafka.server.ReplicaManager$$anonfun$makeLeaders$5.apply(ReplicaManager.scala:427)
>   at 
> kafka.server.ReplicaManager$$anonfun$makeLeaders$5.apply(ReplicaManager.scala:426)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>   at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
>   at kafka.server.ReplicaManager.makeLeaders(ReplicaManager.scala:426)
>   at 
> kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:378)
>   at kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:120)
>   at kafka.server.KafkaApis.handle(KafkaApis.scala:63)
>   at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:59)
>   at java.lang.Thread.run(Thread.java:745)
> It occurs during the LOGIST.DELIVERY.SUBSCRIBE partition election; 
> then it repeatedly prints out the error message:
> 2015-09-23 10:20:03 525 WARN [Replica Manager on Broker 1]: Fetch request 
> with correlation id 14943530 from client ReplicaFetcherThread-2-1 on 
> partition [LOGIST.DELIVERY.SUBSCRIBE,22] failed due to Leader not local for 
> partition [LOGIST.DELIVERY.SUBSCRIBE,22] on broker 1 
> (kafka.server.ReplicaManager)
> 2015-09-23 10:20:03 525 WARN [Replica Manager on Broker 1]: Fetch request 
> with correlation id 15022337 from client ReplicaFetcherThread-1-1 on 
> partition [LOGIST.DELIVERY.SUBSCRIBE,1] failed due to Leader not local for 
> partition [LOGIST.DELIVERY.SUBSCRIBE,1] on broker 1 
> (kafka.server.ReplicaManager)
> 2015-09-23 10:20:03 525 WARN [Replica Manager on Broker 1]: Fetch request 
> with correlation id 15078431 from client ReplicaFetcherThread-0-1 on 
> partition [LOGIST.DELIVERY.SUBSCRIBE,4] failed due to Leader not local for 
> partition [LOGIST.DELIVERY.SUBSCRIBE,4] on broker 1 
> (kafka.server.ReplicaManager)
> 2015-09-23 10:20:03 525 WARN [Replica Manager on Broker 1]: Fetch request 
> with correlation id 13477660 from client ReplicaFetcherThread-2-1 on 
> partition [LOGIST.DELIVERY.SUBSCRIBE,10] failed due to Leader not local for 
> partition [LOGIST.DELIVERY.SUBSCRIBE,10] on broker 1 
> (kafka.server.ReplicaManager)
> 2015-09-23 10:20:03 525 WARN [Replica Manager on Broker 1]: Fetch request 
> with correlation id 15022337 from client ReplicaFetcherThread-1-1 on 
> partition [LOGIST.DELIVERY.SUBSCRIBE,13] failed due to Leader not local for 
> partition [LOGIST.DELIVERY.SUBSCRIBE,13] on broker 1 
> (kafka.server.ReplicaManager)
> 2015-09-23 10:20:03 525 WARN [Replica Manager on Broker 1]: Fetch request 
> with correlation id 15078431 from client 
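The "Expected 3918 entries but found only 3904" IOException above is the checkpoint reader's entry-count check failing. A minimal sketch of such a reader, assuming the version / entry-count / "topic partition offset" line layout of the replication-offset-checkpoint file (an assumption about the format, for illustration):

```python
def read_checkpoint(lines):
    """Parse a replication-offset-checkpoint: a version line, an
    entry-count line, then one 'topic partition offset' triple per
    line. Raises if the declared count does not match the entries
    found, which is the check behind 'Expected N entries but found
    only M'."""
    it = iter(lines)
    version = int(next(it))       # format version, e.g. 0
    expected = int(next(it))      # declared number of entries
    entries = {}
    for line in it:
        topic, partition, offset = line.split()
        entries[(topic, int(partition))] = int(offset)
    if len(entries) != expected:
        raise IOError(
            f"Expected {expected} entries but found only {len(entries)}")
    return entries
```

A mismatch like the one reported suggests the count line and the entry lines were written inconsistently (e.g. a partial or concurrent write), leaving the file internally contradictory.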

[jira] [Updated] (KAFKA-2178) Loss of highwatermarks on incorrect cluster shutdown/restart

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-2178?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-2178:
--
Component/s: replication

> Loss of highwatermarks on incorrect cluster shutdown/restart
> 
>
> Key: KAFKA-2178
> URL: https://issues.apache.org/jira/browse/KAFKA-2178
> Project: Kafka
>  Issue Type: Improvement
>  Components: replication
>Affects Versions: 0.8.2.1
>Reporter: Alexey Ozeritskiy
>Priority: Major
> Attachments: KAFKA-2178.patch
>
>
> ReplicaManager flushes highwatermarks only for the partitions it received 
> from the Controller.
> If the Controller sends an incomplete list of partitions, then ReplicaManager 
> will write an incomplete list of highwatermarks.
> As a result one can lose a lot of data during incorrect broker restart.
> We got this situation in real life on our cluster.





[jira] [Updated] (KAFKA-2575) inconsistent offset count in replication-offset-checkpoint during leader election leads to huge exceptions

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-2575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-2575:
--
Component/s: offset manager

> inconsistent offset count in replication-offset-checkpoint during leader 
> election leads to huge exceptions
> 
>
> Key: KAFKA-2575
> URL: https://issues.apache.org/jira/browse/KAFKA-2575
> Project: Kafka
>  Issue Type: Bug
>  Components: offset manager
>Affects Versions: 0.8.2.1
>Reporter: Warren Jin
>Priority: Major
>
> We have 3 brokers, more than 100 topics in production, the default partition 
> number is 24 for each topic, the replication factor is 3.
> We noticed the following errors in recent days.
> 2015-09-22 22:25:12,529 ERROR Error on broker 1 while processing LeaderAndIsr 
> request correlationId 438501 received from controller 2 epoch 12 for 
> partition [LOGIST.DELIVERY.SUBSCRIBE,7] (state.change.logger)
> java.io.IOException: Expected 3918 entries but found only 3904
>   at kafka.server.OffsetCheckpoint.read(OffsetCheckpoint.scala:99)
>   at kafka.cluster.Partition.getOrCreateReplica(Partition.scala:91)
>   at 
> kafka.cluster.Partition$$anonfun$makeLeader$1$$anonfun$apply$mcZ$sp$4.apply(Partition.scala:171)
>   at 
> kafka.cluster.Partition$$anonfun$makeLeader$1$$anonfun$apply$mcZ$sp$4.apply(Partition.scala:171)
>   at scala.collection.immutable.Set$Set3.foreach(Set.scala:115)
>   at 
> kafka.cluster.Partition$$anonfun$makeLeader$1.apply$mcZ$sp(Partition.scala:171)
>   at 
> kafka.cluster.Partition$$anonfun$makeLeader$1.apply(Partition.scala:163)
>   at 
> kafka.cluster.Partition$$anonfun$makeLeader$1.apply(Partition.scala:163)
>   at kafka.utils.Utils$.inLock(Utils.scala:535)
>   at kafka.utils.Utils$.inWriteLock(Utils.scala:543)
>   at kafka.cluster.Partition.makeLeader(Partition.scala:163)
>   at 
> kafka.server.ReplicaManager$$anonfun$makeLeaders$5.apply(ReplicaManager.scala:427)
>   at 
> kafka.server.ReplicaManager$$anonfun$makeLeaders$5.apply(ReplicaManager.scala:426)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>   at 
> scala.collection.mutable.HashMap$$anonfun$foreach$1.apply(HashMap.scala:98)
>   at 
> scala.collection.mutable.HashTable$class.foreachEntry(HashTable.scala:226)
>   at scala.collection.mutable.HashMap.foreachEntry(HashMap.scala:39)
>   at scala.collection.mutable.HashMap.foreach(HashMap.scala:98)
>   at kafka.server.ReplicaManager.makeLeaders(ReplicaManager.scala:426)
>   at 
> kafka.server.ReplicaManager.becomeLeaderOrFollower(ReplicaManager.scala:378)
>   at kafka.server.KafkaApis.handleLeaderAndIsrRequest(KafkaApis.scala:120)
>   at kafka.server.KafkaApis.handle(KafkaApis.scala:63)
>   at kafka.server.KafkaRequestHandler.run(KafkaRequestHandler.scala:59)
>   at java.lang.Thread.run(Thread.java:745)
> It occurs during the LOGIST.DELIVERY.SUBSCRIBE partition election; 
> then it repeatedly prints out the error message:
> 2015-09-23 10:20:03 525 WARN [Replica Manager on Broker 1]: Fetch request 
> with correlation id 14943530 from client ReplicaFetcherThread-2-1 on 
> partition [LOGIST.DELIVERY.SUBSCRIBE,22] failed due to Leader not local for 
> partition [LOGIST.DELIVERY.SUBSCRIBE,22] on broker 1 
> (kafka.server.ReplicaManager)
> 2015-09-23 10:20:03 525 WARN [Replica Manager on Broker 1]: Fetch request 
> with correlation id 15022337 from client ReplicaFetcherThread-1-1 on 
> partition [LOGIST.DELIVERY.SUBSCRIBE,1] failed due to Leader not local for 
> partition [LOGIST.DELIVERY.SUBSCRIBE,1] on broker 1 
> (kafka.server.ReplicaManager)
> 2015-09-23 10:20:03 525 WARN [Replica Manager on Broker 1]: Fetch request 
> with correlation id 15078431 from client ReplicaFetcherThread-0-1 on 
> partition [LOGIST.DELIVERY.SUBSCRIBE,4] failed due to Leader not local for 
> partition [LOGIST.DELIVERY.SUBSCRIBE,4] on broker 1 
> (kafka.server.ReplicaManager)
> 2015-09-23 10:20:03 525 WARN [Replica Manager on Broker 1]: Fetch request 
> with correlation id 13477660 from client ReplicaFetcherThread-2-1 on 
> partition [LOGIST.DELIVERY.SUBSCRIBE,10] failed due to Leader not local for 
> partition [LOGIST.DELIVERY.SUBSCRIBE,10] on broker 1 
> (kafka.server.ReplicaManager)
> 2015-09-23 10:20:03 525 WARN [Replica Manager on Broker 1]: Fetch request 
> with correlation id 15022337 from client ReplicaFetcherThread-1-1 on 
> partition [LOGIST.DELIVERY.SUBSCRIBE,13] failed due to Leader not local for 
> partition [LOGIST.DELIVERY.SUBSCRIBE,13] on broker 1 
> (kafka.server.ReplicaManager)
> 2015-09-23 10:20:03 525 WARN [Replica Manager on Broker 1]: Fetch request 
> with correlation id 15078431 from client ReplicaFetcherThread-0-1 on 
> partition 

[jira] [Updated] (KAFKA-2717) Add Kafka logback appender

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-2717?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-2717:
--
Component/s: logging

> Add Kafka logback appender
> -
>
> Key: KAFKA-2717
> URL: https://issues.apache.org/jira/browse/KAFKA-2717
> Project: Kafka
>  Issue Type: New Feature
>  Components: logging
>Reporter: Xin Wang
>Priority: Major
>
> Many applications use logback as their logging framework, so a 
> KafkaLogbackAppender would make it easier to integrate with Kafka, just 
> like KafkaLog4jAppender.





[jira] [Updated] (KAFKA-1122) Kafka can log giant log lines

2018-08-02 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-1122?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-1122:
--
Component/s: logging

> Kafka can log giant log lines
> -
>
> Key: KAFKA-1122
> URL: https://issues.apache.org/jira/browse/KAFKA-1122
> Project: Kafka
>  Issue Type: Bug
>  Components: logging
>Affects Versions: 0.8.0
>Reporter: Jason Rosenberg
>Priority: Minor
>  Labels: usability
>
> There are a number of log lines that the kafka server, and high-level 
> consumer, can log, that can end up becoming a giant log line.  This can be 
> cumbersome to deal with in a log file.
> This happens in my case as I have a large number of topics (on the order 
> of 500-700 topics).  Typically, these giant log lines will say something 
> separately about every topic on the broker.  An example:
> 2013-11-04 23:28:11,148  INFO [kafka-request-handler-0] server.ReplicaManager 
> - [Replica Manager on Broker 10]: Handling LeaderAndIsr request 
> Name:LeaderAndIsrRequest;Version:0;Controller:11;ControllerEpoch:220;CorrelationId:5;ClientId:id_11-host_null-port_27330;PartitionState:(mytopic,0)
>  -> 
> (LeaderAndIsrInfo:(Leader:11,ISR:11,LeaderEpoch:43,ControllerEpoch:219),ReplicationFactor:2),.
> Imagine that line going on with a separate entry for 700 topics.  There are 
> many other examples of this phenomenon in the server, and high-level consumer.
> I'd think these log lines could be separated into a single line per topic.
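The suggested fix (one log line per topic instead of one giant line covering every topic on the broker) could look like this sketch; the function name and message shape are illustrative, not Kafka's actual logging code:

```python
from collections import defaultdict

def split_by_topic(partition_states):
    """Turn one giant '(topic,partition) -> state' map into one compact
    log line per topic, as the issue suggests, instead of a single line
    enumerating hundreds of topics."""
    per_topic = defaultdict(list)
    for (topic, partition), state in sorted(partition_states.items()):
        per_topic[topic].append(f"{partition}:{state}")
    return [f"Handling LeaderAndIsr for {topic}: {', '.join(parts)}"
            for topic, parts in per_topic.items()]
```

With 500-700 topics this yields 500-700 modest lines rather than one multi-megabyte line, which is far easier to grep and to ship through log pipelines.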





[jira] [Updated] (KAFKA-517) Ensure that we escape the metric names if they include user strings

2018-08-02 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-517?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-517:
-
Component/s: metrics

> Ensure that we escape the metric names if they include user strings
> ---
>
> Key: KAFKA-517
> URL: https://issues.apache.org/jira/browse/KAFKA-517
> Project: Kafka
>  Issue Type: Bug
>  Components: metrics
>Affects Versions: 0.8.0
>Reporter: Jay Kreps
>Priority: Major
>  Labels: bugs
>
> JMX has limits on valid strings. We need to check validity before blindly 
> creating a metric that includes a given topic name. If we fail to do this we 
> will get an exception like this:
> javax.management.MalformedObjectNameException: Unterminated key property part
>   at javax.management.ObjectName.construct(ObjectName.java:540)
>   at javax.management.ObjectName.(ObjectName.java:1403)
>   at 
> com.yammer.metrics.reporting.JmxReporter.onMetricAdded(JmxReporter.java:395)
>   at 
> com.yammer.metrics.core.MetricsRegistry.notifyMetricAdded(MetricsRegistry.java:516)
>   at 
> com.yammer.metrics.core.MetricsRegistry.getOrAdd(MetricsRegistry.java:491)
>   at 
> com.yammer.metrics.core.MetricsRegistry.newMeter(MetricsRegistry.java:240)
>   at com.yammer.metrics.Metrics.newMeter(Metrics.java:245)
>   at 
> kafka.metrics.KafkaMetricsGroup$class.newMeter(KafkaMetricsGroup.scala:46)
>   at kafka.server.FetcherStat.newMeter(AbstractFetcherThread.scala:180)
>   at kafka.server.FetcherStat.(AbstractFetcherThread.scala:182)
>   at 
> kafka.server.FetcherStat$$anonfun$2.apply(AbstractFetcherThread.scala:186)
>   at 
> kafka.server.FetcherStat$$anonfun$2.apply(AbstractFetcherThread.scala:186)
>   at kafka.utils.Pool.getAndMaybePut(Pool.scala:60)
>   at 
> kafka.server.FetcherStat$.getFetcherStat(AbstractFetcherThread.scala:190)
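One way to avoid the MalformedObjectNameException is to quote user-supplied values (such as topic names) before building the ObjectName, in the style of Java's javax.management.ObjectName.quote. A sketch of that quoting rule (wrap in double quotes and backslash-escape the characters that are illegal or significant in an ObjectName value):

```python
def quote_jmx_value(value: str) -> str:
    """Quote a key-property value the way ObjectName.quote does:
    surround with double quotes and backslash-escape the characters
    that are otherwise significant in an ObjectName value
    (double quote, *, ?, backslash) and newlines."""
    out = ['"']
    for ch in value:
        if ch in '"*?\\':
            out.append('\\')
            out.append(ch)
        elif ch == '\n':
            out.append('\\n')
        else:
            out.append(ch)
    out.append('"')
    return ''.join(out)
```

Applying this to the topic name before composing the metric name keeps names like `my topic=v1` from producing "Unterminated key property part".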





[jira] [Updated] (KAFKA-967) Use key range in ProducerPerformance

2018-08-02 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-967?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-967:
-
Component/s: producer 

> Use key range in ProducerPerformance
> 
>
> Key: KAFKA-967
> URL: https://issues.apache.org/jira/browse/KAFKA-967
> Project: Kafka
>  Issue Type: Improvement
>  Components: producer 
>Reporter: Guozhang Wang
>Priority: Major
> Attachments: KAFKA-967.v1.patch, KAFKA-967.v2.patch, 
> KAFKA-967.v3.patch
>
>
> Currently in ProducerPerformance, the key of the message is set to MessageID. 
> It would be better to set it to a specific key within a key range (Integer type) 
> so that we can test the semantic partitioning case. This is related to 
> KAFKA-957.
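The proposed change reduces to deriving the key from a bounded range instead of using the unique message id; a sketch (the modulo scheme is an assumption for illustration, the ticket does not prescribe one):

```python
def key_for(message_id: int, key_range: int) -> int:
    """Map a message id onto a bounded integer key so that keys repeat
    and key-based (semantic) partitioning is actually exercised,
    instead of every message carrying a unique key."""
    return message_id % key_range
```

With key_range=10, messages 0..99 reuse only 10 distinct keys, so each key consistently hashes to the same partition, which is what a semantic-partitioning performance test needs.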





[jira] [Updated] (KAFKA-960) Upgrade Metrics to 3.x

2018-08-02 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-960:
-
Component/s: metrics

> Upgrade Metrics to 3.x
> --
>
> Key: KAFKA-960
> URL: https://issues.apache.org/jira/browse/KAFKA-960
> Project: Kafka
>  Issue Type: Improvement
>  Components: metrics
>Affects Versions: 0.8.1
>Reporter: Cosmin Lehene
>Priority: Major
>
> Now that metrics 3.0 has been released 
> (http://metrics.codahale.com/about/release-notes/) we can upgrade back





[jira] [Updated] (KAFKA-1858) Make ServerShutdownTest a bit less flaky

2018-08-02 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-1858?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-1858:
--
Component/s: unit tests

> Make ServerShutdownTest a bit less flaky
> 
>
> Key: KAFKA-1858
> URL: https://issues.apache.org/jira/browse/KAFKA-1858
> Project: Kafka
>  Issue Type: Bug
>  Components: unit tests
>Reporter: Gwen Shapira
>Priority: Major
> Attachments: KAFKA-1858.patch
>
>
> ServerShutdownTest currently:
> * Starts a KafkaServer
> * Does stuff
> * Stops the server
> * Counts if there are any live kafka threads
> This is fine on its own. But when running in a test suite (i.e. gradle test), 
> the test is very sensitive to any other test freeing all its resources. If 
> you start a server in a previous test and forget to close it, 
> ServerShutdownTest will find threads from the previous test and fail.
> This makes for a flaky test that is pretty challenging to troubleshoot.
> I suggest counting the threads at the beginning and end of each test in the 
> class, and only failing if the number at the end is greater than the number 
> at the beginning.
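The suggested before/after counting can be wrapped like this sketch (a Python stand-in for the JUnit logic; `ThreadLeakCheck` is a made-up name): record the live-thread count at the start of the test and fail only if the count at the end is greater, so threads leaked by *earlier* tests do not fail this one.

```python
import threading

class ThreadLeakCheck:
    """Count live threads before a test and fail only if the count
    grew during it, so pre-existing leaks from other tests are
    tolerated instead of causing flaky failures."""
    def __enter__(self):
        self.before = threading.active_count()
        return self

    def __exit__(self, exc_type, exc, tb):
        leaked = threading.active_count() - self.before
        assert leaked <= 0, f"test leaked {leaked} thread(s)"
        return False
```

A test that cleanly joins every thread it starts passes; one that leaves a thread running fails with an attributable message, instead of blaming whichever test happens to run last.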





[jira] [Updated] (KAFKA-1704) Add PartitionConfig besides LogConfig

2018-08-02 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-1704?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-1704:
--
Component/s: config

> Add PartitionConfig besides LogConfig
> -
>
> Key: KAFKA-1704
> URL: https://issues.apache.org/jira/browse/KAFKA-1704
> Project: Kafka
>  Issue Type: Bug
>  Components: config
>Reporter: Guozhang Wang
>Priority: Major
>  Labels: newbie
>
> Today we only have two places to store configs: server configs, which are used 
> to store server-side global configs, and log configs to store the others. 
> However, many topic / partition level configs would be better stored in a 
> partition config such that they do not need to require accessing the 
> underlying logs, for example:
> 1. uncleanLeaderElectionEnable
> 2. minInSyncReplicas
> 3. compact [? this is defined per-topic / partition but maybe ok to store as 
> log configs]





[jira] [Updated] (KAFKA-1918) System test for ZooKeeper quorum failure scenarios

2018-08-02 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-1918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-1918:
--
Component/s: system tests

> System test for ZooKeeper quorum failure scenarios
> --
>
> Key: KAFKA-1918
> URL: https://issues.apache.org/jira/browse/KAFKA-1918
> Project: Kafka
>  Issue Type: Test
>  Components: system tests
>Reporter: Omid Aladini
>Priority: Major
>
> Following up on the [conversation on the mailing 
> list|http://mail-archives.apache.org/mod_mbox/kafka-users/201502.mbox/%3CCAHwHRrX3SAWDUGF5LjU4rrMUsqv%3DtJcyjX7OENeL5C_V5o3tCw%40mail.gmail.com%3E],
>  the FAQ writes:
> {quote}
> Once the Zookeeper quorum is down, brokers could result in a bad state and 
> could not normally serve client requests, etc. Although when Zookeeper quorum 
> recovers, the Kafka brokers should be able to resume to normal state 
> automatically, _there are still a few +corner cases+ the they cannot and a 
> hard kill-and-recovery is required to bring it back to normal_. Hence it is 
> recommended to closely monitor your zookeeper cluster and provision it so 
> that it is performant.
> {quote}
> As ZK quorum failures are inevitable (due to rolling upgrades of ZK, leader 
> hardware failure, etc), it would be great to identify the corner cases (if 
> they still exist) and fix them if necessary.





[jira] [Updated] (KAFKA-1794) Make config and config defaults accessible to clients

2018-08-02 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-1794?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-1794:
--
Component/s: producer 
 config

> Make config and config defaults accessible to clients
> -
>
> Key: KAFKA-1794
> URL: https://issues.apache.org/jira/browse/KAFKA-1794
> Project: Kafka
>  Issue Type: Bug
>  Components: config, producer 
>Affects Versions: 0.8.1.1
>Reporter: Navina Ramesh
>Priority: Major
>
> In the new Kafka producer API, the ProducerConfig is not accessible to the 
> clients. Samza uses the ProducerConfig instance to access the default 
> property values, which can then be used in the various helper utils. The Config 
> instance is accessible even without instantiating a Kafka producer. 
> With the new API, there is no way to instantiate a ProducerConfig as the 
> constructor is marked private. Also, it does not make the default config 
> values accessible to the client without actually instantiating a 
> KafkaProducer.
> Changes suggested:
> 1. Make the ProducerConfig constructor public
> 2. Make ConfigDef in ProducerConfig accessible by the client
> 3. Use public static variables for kafka config default "values" 





[jira] [Updated] (KAFKA-3015) Improve JBOD data balancing

2018-08-02 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3015?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-3015:
--
Component/s: (was: core)
 log

> Improve JBOD data balancing
> ---
>
> Key: KAFKA-3015
> URL: https://issues.apache.org/jira/browse/KAFKA-3015
> Project: Kafka
>  Issue Type: Improvement
>  Components: log
>Reporter: Jay Kreps
>Priority: Major
>
> When running with multiple data directories (i.e. JBOD) we currently place 
> partitions entirely within one data directory. This tends to lead to poor 
> balancing across disks as some topics have more throughput/retention and not 
> all disks get data from all topics. You can't fix this problem with smarter 
> partition placement strategies because ultimately you don't know, when a 
> partition is created, when or how heavily it will be used (this is a subtle 
> point, and the tendency is to try to think of some more sophisticated way to 
> place partitions based on current data size but this is actually 
> exceptionally dangerous and can lead to much worse imbalance when creating 
> many partitions at once as they would all go to the disk with the least 
> data). We don't support online rebalancing across directories/disks so this 
> imbalance is a big problem and limits the usefulness of this configuration. 
> Implementing online rebalancing of data across disks without downtime is 
> actually quite hard and requires lots of I/O since you have to actually 
> rewrite full partitions of data.
> An alternative would be to place each partition in *all* directories/drives 
> and round-robin *segments* within the partition across the directories. So 
> the layout would be something like:
>   drive-a/mytopic-0/
>   00000000000000000000.data
>   00000000000000000000.index
>   00000000000000024680.data
>   00000000000000024680.index
>   drive-b/mytopic-0/
>   00000000000000012345.data
>   00000000000000012345.index
>   00000000000000036912.data
>   00000000000000036912.index
> This is a little harder to implement than the current approach but not very 
> hard, and it is a lot easier than implementing online data balancing across 
> disks while retaining the current approach. I think this could easily be done 
> in a backwards compatible way.
> I think the balancing you would get from this in most cases would be good 
> enough to make JBOD the default configuration. Thoughts?



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-2569) Kafka should write its metrics to a Kafka topic

2018-08-02 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-2569?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-2569:
--
Component/s: metrics

> Kafka should write its metrics to a Kafka topic
> ---
>
> Key: KAFKA-2569
> URL: https://issues.apache.org/jira/browse/KAFKA-2569
> Project: Kafka
>  Issue Type: New Feature
>  Components: metrics
>Reporter: James Cheng
>Priority: Major
>
> Kafka is often used to hold and transport monitoring data.
> In order to monitor Kafka itself, Kafka currently exposes many metrics via 
> JMX, which requires using a tool to pull the JMX metrics and then write them 
> to the monitoring system.
> It would be convenient if Kafka could simply send its metrics to a Kafka 
> topic. This would make the most sense if the topic were in a different Kafka 
> cluster, but it could still be useful even if the metrics were sent to a 
> topic in the same cluster.
> Of course, if sent to the same cluster, the metrics would not be accessible 
> when the cluster itself was down.
> This would allow monitoring of Kafka itself without requiring people to set 
> up their own JMX-to-monitoring-system pipelines.
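
One possible shape for such a reporter, sketched without the real Kafka metrics or producer APIs: take a metrics snapshot and turn it into records destined for a metrics topic. The class name, topic name, and "name=value" payload format are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Illustrative sketch: serialize a metrics snapshot into per-metric records.
// A real reporter would hand these to a producer pointed at the metrics
// topic, ideally in a separate monitoring cluster.
class MetricsTopicReporter {
    private final String topic;

    MetricsTopicReporter(String topic) {
        this.topic = topic;
    }

    // One record per metric, sorted by name for deterministic output.
    List<String> snapshotRecords(Map<String, Double> metrics) {
        List<String> records = new ArrayList<>();
        for (Map.Entry<String, Double> e : new TreeMap<>(metrics).entrySet()) {
            records.add(topic + ": " + e.getKey() + "=" + e.getValue());
        }
        return records;
    }
}
```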



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Updated] (KAFKA-1921) Delete offsets for a group with kafka offset storage

2018-08-02 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang updated KAFKA-1921:
--
Component/s: offset manager

> Delete offsets for a group with kafka offset storage
> 
>
> Key: KAFKA-1921
> URL: https://issues.apache.org/jira/browse/KAFKA-1921
> Project: Kafka
>  Issue Type: Improvement
>  Components: offset manager
>Reporter: Onur Karaman
>Priority: Major
>
> There is currently no way to delete offsets for a consumer group when using 
> Kafka offset storage.
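
Since Kafka offset storage keeps committed offsets in a compacted topic keyed by (group, topic, partition), deleting a group's offsets would amount to writing a tombstone (null value) for each of its keys, which compaction eventually drops. The in-memory map below only simulates that compacted-topic behavior; the class and key format are assumptions for illustration, not Kafka's internal representation.

```java
import java.util.HashMap;
import java.util.Map;

// Simulation of offset deletion under Kafka offset storage: a commit upserts
// the group's key, a delete acts like a tombstone that compaction removes.
class OffsetStore {
    private final Map<String, Long> offsets = new HashMap<>();

    private static String key(String group, String topic, int partition) {
        return group + "/" + topic + "/" + partition;
    }

    void commit(String group, String topic, int partition, long offset) {
        offsets.put(key(group, topic, partition), offset);
    }

    // Tombstone: after compaction the key is gone entirely.
    void delete(String group, String topic, int partition) {
        offsets.remove(key(group, topic, partition));
    }

    Long fetch(String group, String topic, int partition) {
        return offsets.get(key(group, topic, partition));
    }
}
```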



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

