[jira] [Resolved] (KAFKA-10453) Backport of PR-7781

2020-08-31 Thread Niketh Sabbineni (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10453?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Niketh Sabbineni resolved KAFKA-10453.
--
Resolution: Workaround

> Backport of PR-7781
> ---
>
> Key: KAFKA-10453
> URL: https://issues.apache.org/jira/browse/KAFKA-10453
> Project: Kafka
>  Issue Type: Wish
>  Components: clients
>Affects Versions: 1.1.1
>Reporter: Niketh Sabbineni
>Priority: Major
>
> We have been hitting this bug (with kafka 1.1.1) where the Producer takes 
> forever to load metadata. The issue seems to have been patched in master 
> [here|[https://github.com/apache/kafka/pull/7781]]. 
> Would you *recommend* a backport of that above change to 1.1? There are 7-8 
> changes that need to be cherry picked. The other option is to upgrade to 2.5 
> (which would be much more involved)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-405: Kafka Tiered Storage

2020-08-31 Thread Jun Rao
Hi, Satish,

Thanks for the updated KIP. Made another pass. A few more comments below.

600. The topic deletion logic needs more details.
600.1 The KIP mentions "The controller considers the topic partition is
deleted only when it determines that there are no log segments for that
topic partition by using RLMM". How is this done?
600.2 "If the delete option is enabled then the leader will stop RLM task
and stop processing and it sets all the remote log segment metadata of that
partition with a delete marker and publishes them to RLMM." We discussed
this earlier. When a topic is being deleted, there may not be a leader for
the deleted partition.

601. Unclean leader election
601.1 Scenario 1: new empty follower
After step 1, the follower restores up to offset 3. So why does it have
LE-2 at offset 5?
601.2 senario 5: After Step 3, leader A has inconsistent data between its
local and the tiered data. For example. offset 3 has msg 3 LE-0 locally,
but msg 5 LE-1 in the remote store. While it's ok for the unclean leader to
lose data, it should still return consistent data, whether it's from the
local or the remote store.
601.3 The follower picks up log start offset using the following api.
Suppose that we have 3 remote segments (LE, SegmentStartOffset) as (2, 10),
(3, 20) and (7, 15) due to an unclean leader election. Using the following
api will cause logStartOffset to go backward from 20 to 15. How do we
prevent that?
earliestLogOffset(TopicPartition topicPartition, int leaderEpoch)
601.4 It seems that retention is based on
listRemoteLogSegments(TopicPartition topicPartition, long leaderEpoch).
When there is an unclean leader election, it's possible for the new leader
to not to include certain epochs in its epoch cache. How are remote
segments associated with those epochs being cleaned?
601.5 The KIP discusses the handling of unclean leader elections for user
topics. What about unclean leader elections on
__remote_log_segment_metadata?

602. It would be useful to clarify the limitations in the initial release.
The KIP mentions not supporting compacted topics. What about JBOD and
changing the configuration of a topic from delete to compact
after remote.log.storage.enable is enabled?

603. RLM leader tasks:
603.1"It checks for rolled over LogSegments (which have the last message
offset less than last stable offset of that topic partition) and copies
them along with their offset/time/transaction indexes and leader epoch
cache to the remote tier." It needs to copy the producer snapshot too.
603.2 "Local logs are not cleaned up till those segments are copied
successfully to remote even though their retention time/size is reached"
This seems weird. If the tiering stops because the remote store is not
available, we don't want the local data to grow forever.

604. "RLM maintains a bounded cache(possibly LRU) of the index files of
remote log segments to avoid multiple index fetches from the remote
storage. These indexes can be used in the same way as local segment indexes
are used." Could you provide more details on this? Are the indexes cached
in memory or on disk? If on disk, where are they stored? Are the cached
indexes bound by a certain size?

605. BuildingRemoteLogAux
605.1 In this section, two options are listed. Which one is chosen?
605.2 In option 2, it says  "Build the local leader epoch cache by cutting
the leader epoch sequence received from remote storage to [LSO, ELO]. (LSO
= log start offset)." We need to do the same thing for the producer
snapshot. However, it's hard to cut the producer snapshot to an earlier
offset. Another option is to simply take the lastOffset from the remote
segment and use that as the starting fetch offset in the follower. This
avoids the need for cutting.

606. ListOffsets: Since we need a version bump, could you document it under
a protocol change section?

607. "LogStartOffset of a topic can point to either of local segment or
remote segment but it is initialised and maintained in the Log class like
now. This is already maintained in `Log` class while loading the logs and
it can also be fetched from RemoteLogMetadataManager." What will happen to
the existing logic (e.g. log recovery) that currently depends on
logStartOffset but assumes it's local?

608. Handle expired remote segment: How does it pick up new logStartOffset
from deleteRecords?

609. RLMM message format:
609.1 It includes both MaxTimestamp and EventTimestamp. Where does it get
both since the message in the log only contains one timestamp?
609.2 If we change just the state (e.g. to DELETE_STARTED), it seems it's
wasteful to have to include all other fields not changed.
609.3 Could you document which process makes the following transitions
DELETE_MARKED, DELETE_STARTED, DELETE_FINISHED?

610. remote.log.reader.max.pending.tasks: "Maximum remote log reader thread
pool task queue size. If the task queue is full, broker will stop reading
remote log segments."  What does the broker do if the queue is full?

611. 

Re: [DISCUSS] KIP-667: Remove deprecated methods from ReadOnlyWindowStore

2020-08-31 Thread Sophie Blee-Goldman
Thanks for this KIP as well! Seems like the methods were deprecated in 2.1.
What's our rule for how something has to stay deprecated before we can go
ahead and remove it?

Assuming 3.0 comes after 2.8, it certainly seems like enough time/releases
have
passed for us to do so in 3.0. But I'm pretty sure that the deprecation
rule follows
a specific number that I'm just not remembering, so hopefully someone else
can
jump in with an official quote here.



On Fri, Aug 28, 2020 at 11:32 AM Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> Hi everyone,
>
> I'd like to propose these changes to the Window Store API.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-667%3A+Remove+deprecated+methods+from+ReadOnlyWindowStore
>
> As these changes involve removing deprecated methods, this KIP is targeting
> the next major release v3.0.
>
> Looking forward to your feedback.
>
> Cheers,
> Jorge
>


Re: [DISCUSS] KIP-666: Add Instant-based methods to ReadOnlySessionStore

2020-08-31 Thread Sophie Blee-Goldman
Thanks for bringing the IQ API into alignment -- the proposal looks good,
although
one nit: you missed updating the startTime long to Instant in both
appearances of
the fetchSession(key, startTime, sessionEndTime) method. Also, I think by
"startTime"
you actually meant "earliestSessionEndTime".

One question I do have is whether we really need to provide a default
implementation
that throws UnsupportedOperationException? Actually I'm wondering if we
shouldn't
do something similar to the WindowStore methods, and provide a default
implementation
on the SessionStore interface which then just calls the corresponding
long-based method.
WDYT?

-Sophie

On Fri, Aug 28, 2020 at 11:31 AM Jorge Esteban Quilcate Otoya <
quilcate.jo...@gmail.com> wrote:

> Hi everyone,
>
> I'd like to discuss the following proposal to align IQ Session Store API
> with the Window Store one.
>
>
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-666%3A+Add+Instant-based+methods+to+ReadOnlySessionStore
>
> Looking forward to your feedback.
>
> Cheers,
> Jorge.
>


Build failed in Jenkins: Kafka » kafka-trunk-jdk8 #36

2020-08-31 Thread Apache Jenkins Server
See 


Changes:

[github] KAFKA-5636: SlidingWindows (KIP-450) (#9039)


--
[...truncated 6.47 MB...]
org.apache.kafka.streams.internals.WindowStoreFacadeTest > 
shouldReturnIsPersistent STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > 
shouldReturnIsPersistent PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardClose 
STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardClose 
PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardFlush 
STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardFlush 
PASSED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardInit 
STARTED

org.apache.kafka.streams.internals.WindowStoreFacadeTest > shouldForwardInit 
PASSED

org.apache.kafka.streams.TestTopicsTest > testNonUsedOutputTopic STARTED

org.apache.kafka.streams.TestTopicsTest > testNonUsedOutputTopic PASSED

org.apache.kafka.streams.TestTopicsTest > testEmptyTopic STARTED

org.apache.kafka.streams.TestTopicsTest > testEmptyTopic PASSED

org.apache.kafka.streams.TestTopicsTest > testStartTimestamp STARTED

org.apache.kafka.streams.TestTopicsTest > testStartTimestamp PASSED

org.apache.kafka.streams.TestTopicsTest > testNegativeAdvance STARTED

org.apache.kafka.streams.TestTopicsTest > testNegativeAdvance PASSED

org.apache.kafka.streams.TestTopicsTest > shouldNotAllowToCreateWithNullDriver 
STARTED

org.apache.kafka.streams.TestTopicsTest > shouldNotAllowToCreateWithNullDriver 
PASSED

org.apache.kafka.streams.TestTopicsTest > testDuration STARTED

org.apache.kafka.streams.TestTopicsTest > testDuration PASSED

org.apache.kafka.streams.TestTopicsTest > testOutputToString STARTED

org.apache.kafka.streams.TestTopicsTest > testOutputToString PASSED

org.apache.kafka.streams.TestTopicsTest > testValue STARTED

org.apache.kafka.streams.TestTopicsTest > testValue PASSED

org.apache.kafka.streams.TestTopicsTest > testTimestampAutoAdvance STARTED

org.apache.kafka.streams.TestTopicsTest > testTimestampAutoAdvance PASSED

org.apache.kafka.streams.TestTopicsTest > testOutputWrongSerde STARTED

org.apache.kafka.streams.TestTopicsTest > testOutputWrongSerde PASSED

org.apache.kafka.streams.TestTopicsTest > 
shouldNotAllowToCreateOutputTopicWithNullTopicName STARTED

org.apache.kafka.streams.TestTopicsTest > 
shouldNotAllowToCreateOutputTopicWithNullTopicName PASSED

org.apache.kafka.streams.TestTopicsTest > testWrongSerde STARTED

org.apache.kafka.streams.TestTopicsTest > testWrongSerde PASSED

org.apache.kafka.streams.TestTopicsTest > testKeyValuesToMapWithNull STARTED

org.apache.kafka.streams.TestTopicsTest > testKeyValuesToMapWithNull PASSED

org.apache.kafka.streams.TestTopicsTest > testNonExistingOutputTopic STARTED

org.apache.kafka.streams.TestTopicsTest > testNonExistingOutputTopic PASSED

org.apache.kafka.streams.TestTopicsTest > testMultipleTopics STARTED

org.apache.kafka.streams.TestTopicsTest > testMultipleTopics PASSED

org.apache.kafka.streams.TestTopicsTest > testKeyValueList STARTED

org.apache.kafka.streams.TestTopicsTest > testKeyValueList PASSED

org.apache.kafka.streams.TestTopicsTest > 
shouldNotAllowToCreateOutputWithNullDriver STARTED

org.apache.kafka.streams.TestTopicsTest > 
shouldNotAllowToCreateOutputWithNullDriver PASSED

org.apache.kafka.streams.TestTopicsTest > testValueList STARTED

org.apache.kafka.streams.TestTopicsTest > testValueList PASSED

org.apache.kafka.streams.TestTopicsTest > testRecordList STARTED

org.apache.kafka.streams.TestTopicsTest > testRecordList PASSED

org.apache.kafka.streams.TestTopicsTest > testNonExistingInputTopic STARTED

org.apache.kafka.streams.TestTopicsTest > testNonExistingInputTopic PASSED

org.apache.kafka.streams.TestTopicsTest > testKeyValuesToMap STARTED

org.apache.kafka.streams.TestTopicsTest > testKeyValuesToMap PASSED

org.apache.kafka.streams.TestTopicsTest > testRecordsToList STARTED

org.apache.kafka.streams.TestTopicsTest > testRecordsToList PASSED

org.apache.kafka.streams.TestTopicsTest > testKeyValueListDuration STARTED

org.apache.kafka.streams.TestTopicsTest > testKeyValueListDuration PASSED

org.apache.kafka.streams.TestTopicsTest > testInputToString STARTED

org.apache.kafka.streams.TestTopicsTest > testInputToString PASSED

org.apache.kafka.streams.TestTopicsTest > testTimestamp STARTED

org.apache.kafka.streams.TestTopicsTest > testTimestamp PASSED

org.apache.kafka.streams.TestTopicsTest > testWithHeaders STARTED

org.apache.kafka.streams.TestTopicsTest > testWithHeaders PASSED

org.apache.kafka.streams.TestTopicsTest > testKeyValue STARTED

org.apache.kafka.streams.TestTopicsTest > testKeyValue PASSED

org.apache.kafka.streams.TestTopicsTest > 
shouldNotAllowToCreateTopicWithNullTopicName STARTED

org.apache.kafka.streams.TestTopicsTest > 

Build failed in Jenkins: Kafka » kafka-trunk-jdk15 #37

2020-08-31 Thread Apache Jenkins Server
See 


Changes:

[github] KAFKA-5636: SlidingWindows (KIP-450) (#9039)


--
[...truncated 3.25 MB...]

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualForCompareValueTimestampWithProducerRecord 
STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldPassIfValueAndTimestampIsEqualForCompareValueTimestampWithProducerRecord 
PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueTimestamp STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReversForCompareKeyValueTimestamp PASSED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReverseForCompareValueTimestampWithProducerRecord
 STARTED

org.apache.kafka.streams.test.OutputVerifierTest > 
shouldFailIfValueIsDifferentWithNullReverseForCompareValueTimestampWithProducerRecord
 PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordsFromKeyValuePairs STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordsFromKeyValuePairs PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithNullKeyAndDefaultTimestamp 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithNullKeyAndDefaultTimestamp 
PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithDefaultTimestamp 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithDefaultTimestamp 
PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithNullKey STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithNullKey PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecordWithOtherTopicNameAndTimestampWithTimetamp 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecordWithOtherTopicNameAndTimestampWithTimetamp 
PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithTimestamp STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithTimestamp PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullHeaders STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullHeaders PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithDefaultTimestamp STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithDefaultTimestamp PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicName STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicName PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithNullKey STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldRequireCustomTopicNameIfNotDefaultFactoryTopicNameWithNullKey PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecord STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecord PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecord STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateNullKeyConsumerRecord PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithOtherTopicName STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldCreateConsumerRecordWithOtherTopicName PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > shouldAdvanceTime 
STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > shouldAdvanceTime 
PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithKeyValuePairs STARTED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 
shouldNotAllowToCreateTopicWithNullTopicNameWithKeyValuePairs PASSED

org.apache.kafka.streams.test.ConsumerRecordFactoryTest > 

Build failed in Jenkins: Kafka » kafka-trunk-jdk11 #38

2020-08-31 Thread Apache Jenkins Server
See 


Changes:

[github] KAFKA-5636: SlidingWindows (KIP-450) (#9039)


--
[...truncated 3.25 MB...]

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSourceSpecificDeserializers[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSourceSpecificDeserializers[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldReturnAllStores[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldReturnAllStores[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotCreateStateDirectoryForStatelessTopology[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotCreateStateDirectoryForStatelessTopology[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldApplyGlobalUpdatesCorrectlyInRecursiveTopologies[Eos enabled = false] 
STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldApplyGlobalUpdatesCorrectlyInRecursiveTopologies[Eos enabled = false] 
PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnAllStoresNames[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnAllStoresNames[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPassRecordHeadersIntoSerializersAndDeserializers[Eos enabled = false] 
STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPassRecordHeadersIntoSerializersAndDeserializers[Eos enabled = false] 
PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessConsumerRecordList[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessConsumerRecordList[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSinkSpecificSerializers[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUseSinkSpecificSerializers[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldFlushStoreForFirstInput[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldFlushStoreForFirstInput[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourceThatMatchPattern[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldProcessFromSourceThatMatchPattern[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureSinkTopicNamesIfWrittenInto[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureSinkTopicNamesIfWrittenInto[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUpdateStoreForNewKey[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldUpdateStoreForNewKey[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldSendRecordViaCorrectSourceTopicDeprecated[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldSendRecordViaCorrectSourceTopicDeprecated[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnWallClockTime[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnWallClockTime[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldSetRecordMetadata[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldSetRecordMetadata[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotUpdateStoreForLargerValue[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldNotUpdateStoreForLargerValue[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnCorrectInMemoryStoreTypeOnly[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldReturnCorrectInMemoryStoreTypeOnly[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > shouldThrowForMissingTime[Eos 
enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > shouldThrowForMissingTime[Eos 
enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureInternalTopicNamesIfWrittenInto[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldCaptureInternalTopicNamesIfWrittenInto[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnWallClockTimeDeprecated[Eos enabled = false] STARTED

org.apache.kafka.streams.TopologyTestDriverTest > 
shouldPunctuateOnWallClockTimeDeprecated[Eos enabled = false] PASSED

org.apache.kafka.streams.TopologyTestDriverTest > 

Can't find any sendfile system call trace from Kafka process?

2020-08-31 Thread Ming Liu
Hi Kafka dev community,
 As we know, one major reason that Kafka is fast is because it is using
sendfile() for zero copy, as what it described at
https://kafka.apache.org/documentation/#producerconfigs,



*This combination of pagecache and sendfile means that on a Kafka cluster
where the consumers are mostly caught up you will see no read activity on
the disks whatsoever as they will be serving data entirely from cache.*

However, when I ran tracing on all my kafka brokers, I didn't get a
single sendfile system call, why is this? Does it eventually translate to
plain read/write syscalls?

sudo ./syscount -p 126806 -d 30
Tracing syscalls, printing top 10... Ctrl+C to quit.
[17:44:10]
SYSCALL  COUNT
epoll_wait108482
write  107165
epoll_ctl 95058
futex   86716
read   86388
pread   26910
fstat   9213
getrusage  120
close27
open 21


[DISCUSS] KIP-665 Kafka Connect Hash SMT

2020-08-31 Thread Brandon Brown
Hey everybody, I’ve created the following and would love some feedback. One 
place where this could be of use would be to say hashing the key used as an 
identifier for inserting into elasticsearch (which has a size limit) or 
obfuscating sensitive values like say passwords or ssn. 

https://cwiki.apache.org/confluence/display/KAFKA/KIP-665%3A+Kafka+Connect+Hash+SMT

The current pr with the proposed changes 
https://github.com/apache/kafka/pull/9057 and the original 3rd party 
contribution which initiated this change 
https://github.com/aiven/aiven-kafka-connect-transforms/issues/9#issuecomment-662378057.

I'm interested in any suggestions for ways to improve this as I think it would 
make a nice addition to the existing SMTs provided by Kafka Connect out of the 
box.

Thanks,
Brandon

Re: [DISCUSS] KIP-664: Provide tooling to detect and abort hanging transactions

2020-08-31 Thread Guozhang Wang
Thanks Jason, I do not have more comments on the KIP then.

On Mon, Aug 31, 2020 at 3:19 PM Jason Gustafson  wrote:

> > Hmm, but the "TxnStartOffset" is not included in the DescribeProducers
> response either?
>
> Oh, I accidentally called it `CurrentTxnStartTimestamp` in the schema.
> Fixed now!
>
> -Jason
>
> On Mon, Aug 31, 2020 at 3:04 PM Guozhang Wang  wrote:
>
> > On Mon, Aug 31, 2020 at 12:28 PM Jason Gustafson 
> > wrote:
> >
> > > Hey Guozhang,
> > >
> > > Thanks for the detailed comments. Responses inline:
> > >
> > > > 1. I'd like to clarify how we can make "--abort" work with old
> brokers,
> > > since without the additional field "Partitions" the tool needs to set
> the
> > > coordinator epoch correctly instead of "-1"? Arguably that's still
> doable
> > > but would require different call paths, and it's not clear whether
> that's
> > > worth doing for old versions.
> > >
> > > That's a good question. What I had in mind was to write the marker
> using
> > > the last coordinator epoch that was used by the respective ProducerId.
> I
> > > realized that I left the coordinator epoch out of the
> `DescribeProducers`
> > > response, so I have updated the KIP to include it. It is possible that
> > > there is no coordinator epoch associated with a given ProducerId (e.g.
> if
> > > it is the first transaction from that producer), but in this case we
> can
> > > use 0.
> > >
> > > As for whether this is worth doing, I guess I would be more inclined to
> > > leave it out if users had a reasonable alternative today to address
> this
> > > problem.
> > >
> > > > 2. Why do we have to enforce "DescribeProducers" to be sent to only
> > > leaders
> > > while ListTransactions can be sent to any brokers? Or is it really
> > > "ListTransactions to be sent to coordinators only"? From the workflow
> > > you've described, based on the results back from DescribeProducers, we
> > > should just immediately send ListTransactions to the
> > > corresponding coordinators based on the collected producer ids, instead
> > of
> > > trying to send to any brokers right?
> > >
> > > I'm going to change `DescribeProducers` so that it can be handled by
> any
> > > replica of a topic partition. This was suggested by Lucas in order to
> > allow
> > > this API to be used for replica consistency testing. As far as
> > > `ListTransactions`, I was treating this similarly to `ListGroups`.
> > Although
> > > we know that the coordinators are the leaders of the
> __transaction_state
> > > partitions, this is more of an implementation detail. From an API
> > > perspective, we say that any broker could be a transaction coordinator.
> > >
> > > > 3. One thing I'm a bit hesitant about is that, is `Describe`
> permission
> > > on
> > > the associated topic sufficient to allow any users to get all producer
> > > information writing to the specific topic-partitions including last
> > > timestamp, txn-start-timestamp etc, which may be considered sensitive?
> > > Should we require "ClusterAction" to only allow operators only?
> > >
> > > That's a fair point. Do you think `Read` permission would be
> reasonable?
> > > This is all information that could be obtained by reading the topic.
> > >
> > > Yeah that makes sense.
> >
> >
> > > > 4. From the example it seems "TxnStartOffset" should be included in
> the
> > > DescribeTransaction response schema? Otherwise the user would not get
> it
> > in
> > > the following WriteTxnMarker request.
> > >
> > > The `DescribeTransaction` API is sent to the transaction coordinator,
> > which
> > > does not know the start offset of a transaction in each topic
> partition.
> > > That is why we need `DescribeProducers`.
> > >
> >
> > Hmm, but the "TxnStartOffset" is not included in the DescribeProducers
> > response either?
> >
> >
> > >
> > > > 5. It is a bit easier for readers to highlight the added fields in
> the
> > > existing WriteTxnMarkerRequest (btw I read is that we are only adding
> > > "Partitions" with the starting offset, right?). Also as for its
> response
> > it
> > > seems we do not make any schema changes except adding one more
> potential
> > > error code "INVALID_TXN_STATE" to it, right? If that's the case we can
> > just
> > > state that explicitly.
> > >
> > > I highlighted the new field in the request. For the response, the KIP
> > > states the following: "There are no changes to the response schema, but
> > it
> > > will be bumped. Note that we are also enabling flexible version
> support."
> > >
> > > > 6. It is not clear to me for the overloaded function that the
> following
> > > option classes are not specified, what should be the default options?
> > > ...
> > >
> > > I was just trying to stick with existing conventions, but I will add
> some
> > > more detail here. I think we should probably still include
> > > `AbortTransactionOptions`. The `Options` classes are how users override
> > > timeouts.
> > >
> > > > 7.1 Is "--broker" a required or optional (in that case I presume we
> > 

Re: [DISCUSS] KIP-664: Provide tooling to detect and abort hanging transactions

2020-08-31 Thread Jason Gustafson
> Hmm, but the "TxnStartOffset" is not included in the DescribeProducers
response either?

Oh, I accidentally called it `CurrentTxnStartTimestamp` in the schema.
Fixed now!

-Jason

On Mon, Aug 31, 2020 at 3:04 PM Guozhang Wang  wrote:

> On Mon, Aug 31, 2020 at 12:28 PM Jason Gustafson 
> wrote:
>
> > Hey Guozhang,
> >
> > Thanks for the detailed comments. Responses inline:
> >
> > > 1. I'd like to clarify how we can make "--abort" work with old brokers,
> > since without the additional field "Partitions" the tool needs to set the
> > coordinator epoch correctly instead of "-1"? Arguably that's still doable
> > but would require different call paths, and it's not clear whether that's
> > worth doing for old versions.
> >
> > That's a good question. What I had in mind was to write the marker using
> > the last coordinator epoch that was used by the respective ProducerId. I
> > realized that I left the coordinator epoch out of the `DescribeProducers`
> > response, so I have updated the KIP to include it. It is possible that
> > there is no coordinator epoch associated with a given ProducerId (e.g. if
> > it is the first transaction from that producer), but in this case we can
> > use 0.
> >
> > As for whether this is worth doing, I guess I would be more inclined to
> > leave it out if users had a reasonable alternative today to address this
> > problem.
> >
> > > 2. Why do we have to enforce "DescribeProducers" to be sent to only
> > leaders
> > while ListTransactions can be sent to any brokers? Or is it really
> > "ListTransactions to be sent to coordinators only"? From the workflow
> > you've described, based on the results back from DescribeProducers, we
> > should just immediately send ListTransactions to the
> > corresponding coordinators based on the collected producer ids, instead
> of
> > trying to send to any brokers right?
> >
> > I'm going to change `DescribeProducers` so that it can be handled by any
> > replica of a topic partition. This was suggested by Lucas in order to
> allow
> > this API to be used for replica consistency testing. As far as
> > `ListTransactions`, I was treating this similarly to `ListGroups`.
> Although
> > we know that the coordinators are the leaders of the __transaction_state
> > partitions, this is more of an implementation detail. From an API
> > perspective, we say that any broker could be a transaction coordinator.
> >
> > > 3. One thing I'm a bit hesitant about is that, is `Describe` permission
> > on
> > the associated topic sufficient to allow any users to get all producer
> > information writing to the specific topic-partitions including last
> > timestamp, txn-start-timestamp etc, which may be considered sensitive?
> > Should we require "ClusterAction" to only allow operators only?
> >
> > That's a fair point. Do you think `Read` permission would be reasonable?
> > This is all information that could be obtained by reading the topic.
> >
> > Yeah that makes sense.
>
>
> > > 4. From the example it seems "TxnStartOffset" should be included in the
> > DescribeTransaction response schema? Otherwise the user would not get it
> in
> > the following WriteTxnMarker request.
> >
> > The `DescribeTransaction` API is sent to the transaction coordinator,
> which
> > does not know the start offset of a transaction in each topic partition.
> > That is why we need `DescribeProducers`.
> >
>
> Hmm, but the "TxnStartOffset" is not included in the DescribeProducers
> response either?
>
>
> >
> > > 5. It is a bit easier for readers to highlight the added fields in the
> > existing WriteTxnMarkerRequest (btw I read is that we are only adding
> > "Partitions" with the starting offset, right?). Also as for its response
> it
> > seems we do not make any schema changes except adding one more potential
> > error code "INVALID_TXN_STATE" to it, right? If that's the case we can
> just
> > state that explicitly.
> >
> > I highlighted the new field in the request. For the response, the KIP
> > states the following: "There are no changes to the response schema, but
> it
> > will be bumped. Note that we are also enabling flexible version support."
> >
> > > 6. It is not clear to me for the overloaded function that the following
> > option classes are not specified, what should be the default options?
> > ...
> >
> > I was just trying to stick with existing conventions, but I will add some
> > more detail here. I think we should probably still include
> > `AbortTransactionOptions`. The `Options` classes are how users override
> > timeouts.
> >
> > > 7.1 Is "--broker" a required or optional (in that case I presume we
> would
> > just query all brokers iteratively) in "--find-hanging"?
> >
> > I think it should be required as a reasonable way to limit the scope of
> the
> > search. This is meant to be guided by metrics after all. If we do not
> limit
> > the scope to a single broker, then the behavior might get worse as the
> > cluster grows. I will clarify this.
> >
> > > 7.2 Seems 

Re: [DISCUSS] KIP-664: Provide tooling to detect and abort hanging transactions

2020-08-31 Thread Guozhang Wang
On Mon, Aug 31, 2020 at 12:28 PM Jason Gustafson  wrote:

> Hey Guozhang,
>
> Thanks for the detailed comments. Responses inline:
>
> > 1. I'd like to clarify how we can make "--abort" work with old brokers,
> since without the additional field "Partitions" the tool needs to set the
> coordinator epoch correctly instead of "-1"? Arguably that's still doable
> but would require different call paths, and it's not clear whether that's
> worth doing for old versions.
>
> That's a good question. What I had in mind was to write the marker using
> the last coordinator epoch that was used by the respective ProducerId. I
> realized that I left the coordinator epoch out of the `DescribeProducers`
> response, so I have updated the KIP to include it. It is possible that
> there is no coordinator epoch associated with a given ProducerId (e.g. if
> it is the first transaction from that producer), but in this case we can
> use 0.
>
> As for whether this is worth doing, I guess I would be more inclined to
> leave it out if users had a reasonable alternative today to address this
> problem.
>
> > 2. Why do we have to enforce "DescribeProducers" to be sent to only
> leaders
> while ListTransactions can be sent to any brokers? Or is it really
> "ListTransactions to be sent to coordinators only"? From the workflow
> you've described, based on the results back from DescribeProducers, we
> should just immediately send ListTransactions to the
> corresponding coordinators based on the collected producer ids, instead of
> trying to send to any brokers right?
>
> I'm going to change `DescribeProducers` so that it can be handled by any
> replica of a topic partition. This was suggested by Lucas in order to allow
> this API to be used for replica consistency testing. As far as
> `ListTransactions`, I was treating this similarly to `ListGroups`. Although
> we know that the coordinators are the leaders of the __transaction_state
> partitions, this is more of an implementation detail. From an API
> perspective, we say that any broker could be a transaction coordinator.
>
> > 3. One thing I'm a bit hesitant about is that, is `Describe` permission
> on
> the associated topic sufficient to allow any users to get all producer
> information writing to the specific topic-partitions including last
> timestamp, txn-start-timestamp etc, which may be considered sensitive?
> Should we require "ClusterAction" to only allow operators only?
>
> That's a fair point. Do you think `Read` permission would be reasonable?
> This is all information that could be obtained by reading the topic.
>
> Yeah that makes sense.


> > 4. From the example it seems "TxnStartOffset" should be included in the
> DescribeTransaction response schema? Otherwise the user would not get it in
> the following WriteTxnMarker request.
>
> The `DescribeTransaction` API is sent to the transaction coordinator, which
> does not know the start offset of a transaction in each topic partition.
> That is why we need `DescribeProducers`.
>

Hmm, but the "TxnStartOffset" is not included in the DescribeProducers
response either?


>
> > 5. It is a bit easier for readers to highlight the added fields in the
> existing WriteTxnMarkerRequest (btw I read is that we are only adding
> "Partitions" with the starting offset, right?). Also as for its response it
> seems we do not make any schema changes except adding one more potential
> error code "INVALID_TXN_STATE" to it, right? If that's the case we can just
> state that explicitly.
>
> I highlighted the new field in the request. For the response, the KIP
> states the following: "There are no changes to the response schema, but it
> will be bumped. Note that we are also enabling flexible version support."
>
> > 6. It is not clear to me for the overloaded function that the following
> option classes are not specified, what should be the default options?
> ...
>
> I was just trying to stick with existing conventions, but I will add some
> more detail here. I think we should probably still include
> `AbortTransactionOptions`. The `Options` classes are how users override
> timeouts.
>
> > 7.1 Is "--broker" a required or optional (in that case I presume we would
> just query all brokers iteratively) in "--find-hanging"?
>
> I think it should be required as a reasonable way to limit the scope of the
> search. This is meant to be guided by metrics after all. If we do not limit
> the scope to a single broker, then the behavior might get worse as the
> cluster grows. I will clarify this.
>
> > 7.2 Seems "list-producers" is not exposed as a standalone feature in the
> cmd but only used in the wrapping "--find-hanging", is that intentional?
> Personally I feel exposing a "--list-producers" may be useful too: if we
> believe the user has the right ACL, it is legitimate to return the producer
> information to her anyways. But that is debatable in the meta point 3)
> above.
>
> Yeah, I was planning to add this to support the use case that Lucas
> mentioned. 

[jira] [Created] (KAFKA-10453) Backport of PR-7781

2020-08-31 Thread Niketh Sabbineni (Jira)
Niketh Sabbineni created KAFKA-10453:


 Summary: Backport of PR-7781
 Key: KAFKA-10453
 URL: https://issues.apache.org/jira/browse/KAFKA-10453
 Project: Kafka
  Issue Type: Wish
  Components: clients
Affects Versions: 1.1.1
Reporter: Niketh Sabbineni


We have been hitting this bug (with kafka 1.1.1) where the Producer takes 
forever to load metadata. The issue seems to have been patched in master 
[here|[https://github.com/apache/kafka/pull/7781]]. 

Would you *recommend* a backport of that above change to 1.1? There are 7-8 
changes that need to be cherry picked. The other option is to upgrade to 2.5 
(which would be much more involved)



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-659: Improve TimeWindowedDeserializer and TimeWindowedSerde to handle window size

2020-08-31 Thread Leah Thomas
Hey Sophie,

Thanks for the catch! It makes sense that the consumer would accept a
deserializer somewhere, so we can definitely skip the additional configs. I
updated the KIP to reflect that.

John seems to know Scala better than I do as well, but I think we need to
keep the current implicit that allows users to just pass in a serde and no
window size for backwards compatibility. It seems to me that based on the
discussion around KIP-616 , we
can pretty easily do John's third suggestion for handling this implicit:
logging an error message and passing to a non-deprecated constructor using
some default value. It seems from KIP-616 that most scala users will use
the new Serdes class anyways, and Yuriy is just removing these implicits so
it seems like whatever fix we decide for this class won't get used too
heavily.

Cheers,
Leah

On Thu, Aug 27, 2020 at 8:49 PM Sophie Blee-Goldman 
wrote:

> Ok I'm definitely feeling pretty dumb now, but I was just thinking how
> ridiculous
> it is that the Consumer forces you to configure your Deserializer through
> actual
> config maps instead of just taking the ones you pass in directly. So I
> thought
> "why not just fix the Consumer to allow passing in an actual Deserializer
> object"
> and went to go through the code in case there's some legitimate reason why
> not,
> and what do you know. You actually can pass in an actual Deserializer
> object!
> There is a KafkaConsumer constructor that accepts a key and value
> Deserializer,
> and doesn't instantiate or configure a new one if provided in this way.
> Duh.
>
> Sorry for misleading everyone on that front. I'm just happy to find out
> that a
> reasonable way of configuring deserializer actually *is *possible after
> all. In that
> case, maybe we can remove the extra configs from this KIP and just proceed
> with the deprecation?
>
> Obviously that doesn't help anything with regards to the remaining question
> that
> John/Leah have posed. Now I probably don't have anything valuable to offer
> there
> since I know next to nothing about Scala, but I do want to
> better understand: why
> would we add an "implicit" (what exactly does this mean?) that relies on
> allowing
> users to not set the windowSize, if we are explicitly taking away that
> option from
> the Java users? Or if we have already added something, can't we just
> deprecate
> it like we are deprecating the Java constructor? I may need some remedial
> lessons
> in Scala just to understand the problem that we apparently have, because I
> don't
> get it.
>
> By the way, I'm a little tempted to say that we should go one step further
> and
> deprecate the DEFAULT_WINDOWED_INNER_CLASS configs, but maybe that's
> a bit too radical for the moment. It just seems like default serde configs
> have been
> a lot more trouble than they're worth overall. That said, these particular
> configs
> don't appear to have hurt anyone thus far, at least not that we know of
> (possibly
> because no one is using it anyway) so there's no strong motivation to do so
>
> On Wed, Aug 26, 2020 at 9:19 AM Leah Thomas  wrote:
>
> > Hey John,
> >
> > Thanks for pointing this out, I wasn't sure how to handle the Scala
> > changes.
> >
> > I'm not fully versed in the Scala version of Streams, so feel free to
> > correct me if any of my assumptions are wrong. I think logging an error
> > message and then calling the constructor that requires a windowSize seems
> > like the simplest fix from my point of view. So instead of
> > calling`TimeWindowedSerde(final Serde inner)`, we could
> > call `TimeWindowedSerde(final Serde inner, final long windowSize)`
> with
> > Long.MAX_VALUE as the window size.
> >
> > I do feel like we would want to add an implicit to `Serdes.scala` that
> > takes a serde and a window size so that users can access the constructor
> > that initializes with the correct window size. I agree with your comment
> on
> > the KIP-616 PR that the serde needs to be pre-configured when it's
> passed,
> > but I'm not sure we would need a windowSize config. I think if the
> > constructor is passed the serde and the window size, then window size
> > should be set within the deserializer. The only catch is if the Scala
> > version of the consumer creates a new deserializer, and at that point
> we'd
> > need a window size config, but I'm not sure if that's the case.
> >
> > WDYT - is it possible to alter the existing implicit and add a new one?
> >
> > On Wed, Aug 26, 2020 at 10:00 AM John Roesler 
> wrote:
> >
> > > Hi Leah,
> > >
> > > I was just reviewing the PR for KIP-616 and realized that we
> > > forgot to mention the Scala API in your KIP. We should
> > > consider it because `scala.Serdes.timeWindowedSerde` is
> > > implicitly using the exact constructor you're deprecating.
> > >
> > > I had some ideas in the code review:
> > > https://github.com/apache/kafka/pull/8955#discussion_r477358755
> > >
> > > What do you think is the best approach?
> > 

[jira] [Created] (KAFKA-10452) Only expire preferred read replica if a leader is alive for the topic

2020-08-31 Thread Jeff Kim (Jira)
Jeff Kim created KAFKA-10452:


 Summary: Only expire preferred read replica if a leader is alive 
for the topic
 Key: KAFKA-10452
 URL: https://issues.apache.org/jira/browse/KAFKA-10452
 Project: Kafka
  Issue Type: Bug
  Components: clients
Reporter: Jeff Kim
Assignee: Jeff Kim


Fetch from follower functionality periodically expires and refreshes preferred 
read replica (at `metadata.max.age.ms` interval). This allows a client to 
discover a better follower to fetch from if one becomes available.

However the expiration is done even if the current partition has no leader (can 
happen in DR scenario with observers). It makes sense to get the new preferred 
replica information and update existing one, instead of expiring existing one 
and then fetching new one.

Doing this will allow clients to keep on fetching from a follower/observer 
instead of failing to find leader when all ISR replicas go offline.

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-664: Provide tooling to detect and abort hanging transactions

2020-08-31 Thread Jason Gustafson
Hi Bob,

Thanks for the comment.

> I'm not sure how much value the MaxActiveTransactionDuration metric adds,
given that we have the --find-hanging option in the tool. As you mention,
instances of these transactions are expected to be rare, and a
partition-level metric, which can generate a lot of data, seems very
heavyweight for such a rare occurrence. I think "alert on
PartitionsWithLateTransactionsCount" followed by "run kafka-transactions
--find-hanging on the relevant broker" is a reasonable process for cluster
operators to follow.

Yeah, I see your point. My original thinking was that this metric might
help an operator identify partitions specifically, but that was before I
added the --find-hanging command. I guess we can remove it.

-Jason




On Mon, Aug 31, 2020 at 12:28 PM Jason Gustafson  wrote:

> Hey Guozhang,
>
> Thanks for the detailed comments. Responses inline:
>
> > 1. I'd like to clarify how we can make "--abort" work with old brokers,
> since without the additional field "Partitions" the tool needs to set the
> coordinator epoch correctly instead of "-1"? Arguably that's still doable
> but would require different call paths, and it's not clear whether that's
> worth doing for old versions.
>
> That's a good question. What I had in mind was to write the marker using
> the last coordinator epoch that was used by the respective ProducerId. I
> realized that I left the coordinator epoch out of the `DescribeProducers`
> response, so I have updated the KIP to include it. It is possible that
> there is no coordinator epoch associated with a given ProducerId (e.g. if
> it is the first transaction from that producer), but in this case we can
> use 0.
>
> As for whether this is worth doing, I guess I would be more inclined to
> leave it out if users had a reasonable alternative today to address this
> problem.
>
> > 2. Why do we have to enforce "DescribeProducers" to be sent to only
> leaders
> while ListTransactions can be sent to any brokers? Or is it really
> "ListTransactions to be sent to coordinators only"? From the workflow
> you've described, based on the results back from DescribeProducers, we
> should just immediately send ListTransactions to the
> corresponding coordinators based on the collected producer ids, instead of
> trying to send to any brokers right?
>
> I'm going to change `DescribeProducers` so that it can be handled by any
> replica of a topic partition. This was suggested by Lucas in order to allow
> this API to be used for replica consistency testing. As far as
> `ListTransactions`, I was treating this similarly to `ListGroups`. Although
> we know that the coordinators are the leaders of the __transaction_state
> partitions, this is more of an implementation detail. From an API
> perspective, we say that any broker could be a transaction coordinator.
>
> > 3. One thing I'm a bit hesitant about is that, is `Describe` permission
> on
> the associated topic sufficient to allow any users to get all producer
> information writing to the specific topic-partitions including last
> timestamp, txn-start-timestamp etc, which may be considered sensitive?
> Should we require "ClusterAction" to only allow operators only?
>
> That's a fair point. Do you think `Read` permission would be reasonable?
> This is all information that could be obtained by reading the topic.
>
> > 4. From the example it seems "TxnStartOffset" should be included in the
> DescribeTransaction response schema? Otherwise the user would not get it in
> the following WriteTxnMarker request.
>
> The `DescribeTransaction` API is sent to the transaction coordinator,
> which does not know the start offset of a transaction in each topic
> partition. That is why we need `DescribeProducers`.
>
> > 5. It is a bit easier for readers to highlight the added fields in the
> existing WriteTxnMarkerRequest (btw I read is that we are only adding
> "Partitions" with the starting offset, right?). Also as for its response it
> seems we do not make any schema changes except adding one more potential
> error code "INVALID_TXN_STATE" to it, right? If that's the case we can just
> state that explicitly.
>
> I highlighted the new field in the request. For the response, the KIP
> states the following: "There are no changes to the response schema, but it
> will be bumped. Note that we are also enabling flexible version support."
>
> > 6. It is not clear to me for the overloaded function that the following
> option classes are not specified, what should be the default options?
> ...
>
> I was just trying to stick with existing conventions, but I will add some
> more detail here. I think we should probably still include
> `AbortTransactionOptions`. The `Options` classes are how users override
> timeouts.
>
> > 7.1 Is "--broker" a required or optional (in that case I presume we would
> just query all brokers iteratively) in "--find-hanging"?
>
> I think it should be required as a reasonable way to limit the scope of
> the search. This is meant 

[GitHub] [kafka-site] ewencp merged pull request #300: [MINOR] adding Itau Unibanco and OTICS to the powered-by page

2020-08-31 Thread GitBox


ewencp merged pull request #300:
URL: https://github.com/apache/kafka-site/pull/300


   



This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org




Re: [DISCUSS] KIP-664: Provide tooling to detect and abort hanging transactions

2020-08-31 Thread Jason Gustafson
Hey Guozhang,

Thanks for the detailed comments. Responses inline:

> 1. I'd like to clarify how we can make "--abort" work with old brokers,
since without the additional field "Partitions" the tool needs to set the
coordinator epoch correctly instead of "-1"? Arguably that's still doable
but would require different call paths, and it's not clear whether that's
worth doing for old versions.

That's a good question. What I had in mind was to write the marker using
the last coordinator epoch that was used by the respective ProducerId. I
realized that I left the coordinator epoch out of the `DescribeProducers`
response, so I have updated the KIP to include it. It is possible that
there is no coordinator epoch associated with a given ProducerId (e.g. if
it is the first transaction from that producer), but in this case we can
use 0.

As for whether this is worth doing, I guess I would be more inclined to
leave it out if users had a reasonable alternative today to address this
problem.

> 2. Why do we have to enforce "DescribeProducers" to be sent to only
leaders
while ListTransactions can be sent to any brokers? Or is it really
"ListTransactions to be sent to coordinators only"? From the workflow
you've described, based on the results back from DescribeProducers, we
should just immediately send ListTransactions to the
corresponding coordinators based on the collected producer ids, instead of
trying to send to any brokers right?

I'm going to change `DescribeProducers` so that it can be handled by any
replica of a topic partition. This was suggested by Lucas in order to allow
this API to be used for replica consistency testing. As far as
`ListTransactions`, I was treating this similarly to `ListGroups`. Although
we know that the coordinators are the leaders of the __transaction_state
partitions, this is more of an implementation detail. From an API
perspective, we say that any broker could be a transaction coordinator.

> 3. One thing I'm a bit hesitant about is that, is `Describe` permission on
the associated topic sufficient to allow any users to get all producer
information writing to the specific topic-partitions including last
timestamp, txn-start-timestamp etc, which may be considered sensitive?
Should we require "ClusterAction" to only allow operators only?

That's a fair point. Do you think `Read` permission would be reasonable?
This is all information that could be obtained by reading the topic.

> 4. From the example it seems "TxnStartOffset" should be included in the
DescribeTransaction response schema? Otherwise the user would not get it in
the following WriteTxnMarker request.

The `DescribeTransaction` API is sent to the transaction coordinator, which
does not know the start offset of a transaction in each topic partition.
That is why we need `DescribeProducers`.

> 5. It is a bit easier for readers to highlight the added fields in the
existing WriteTxnMarkerRequest (btw I read is that we are only adding
"Partitions" with the starting offset, right?). Also as for its response it
seems we do not make any schema changes except adding one more potential
error code "INVALID_TXN_STATE" to it, right? If that's the case we can just
state that explicitly.

I highlighted the new field in the request. For the response, the KIP
states the following: "There are no changes to the response schema, but it
will be bumped. Note that we are also enabling flexible version support."

> 6. It is not clear to me for the overloaded function that the following
option classes are not specified, what should be the default options?
...

I was just trying to stick with existing conventions, but I will add some
more detail here. I think we should probably still include
`AbortTransactionOptions`. The `Options` classes are how users override
timeouts.

> 7.1 Is "--broker" a required or optional (in that case I presume we would
just query all brokers iteratively) in "--find-hanging"?

I think it should be required as a reasonable way to limit the scope of the
search. This is meant to be guided by metrics after all. If we do not limit
the scope to a single broker, then the behavior might get worse as the
cluster grows. I will clarify this.

> 7.2 Seems "list-producers" is not exposed as a standalone feature in the
cmd but only used in the wrapping "--find-hanging", is that intentional?
Personally I feel exposing a "--list-producers" may be useful too: if we
believe the user has the right ACL, it is legitimate to return the producer
information to her anyways. But that is debatable in the meta point 3)
above.

Yeah, I was planning to add this to support the use case that Lucas
mentioned. There is some awkwardness since it is a little difficult to
convey different sources of information through the same command. I guess
we can do `--list producers` and `--list transactions` and explain in the
documentation. Maybe that is good enough.

> 7.3 "Describing Transactions": we should also explain how that would be
executed, e.g. at least 

[jira] [Resolved] (KAFKA-10384) Separate converters from generated messages

2020-08-31 Thread Colin McCabe (Jira)


 [ 
https://issues.apache.org/jira/browse/KAFKA-10384?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Colin McCabe resolved KAFKA-10384.
--
Fix Version/s: 2.7.0
   Resolution: Fixed

> Separate converters from generated messages
> ---
>
> Key: KAFKA-10384
> URL: https://issues.apache.org/jira/browse/KAFKA-10384
> Project: Kafka
>  Issue Type: Bug
>Reporter: Colin McCabe
>Assignee: Colin McCabe
>Priority: Major
> Fix For: 2.7.0
>
>
> Separate the JSON converter classes from the message classes, so that the 
> clients module can be used without Jackson on the CLASSPATH.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: [DISCUSS] KIP-405: Kafka Tiered Storage

2020-08-31 Thread Satish Duggana
KIP is updated with
- Remote log segment metadata topic message format/schema.
- Added remote log segment metadata state transitions and explained
how the deletion of segments is handled, including the case of
partition deletions.
- Added a few more limitations in the "Non goals" section.

Thanks,
Satish.

On Thu, Aug 27, 2020 at 12:42 AM Harsha Ch  wrote:
>
> Updated the KIP with Meeting Notes section
> https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage#KIP405:KafkaTieredStorage-MeetingNotes
>
> On Tue, Aug 25, 2020 at 1:03 PM Jun Rao  wrote:
>
> > Hi, Harsha,
> >
> > Thanks for the summary. Could you add the summary and the recording link to
> > the last section of
> >
> > https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals
> > ?
> >
> > Jun
> >
> > On Tue, Aug 25, 2020 at 11:12 AM Harsha Chintalapani 
> > wrote:
> >
> > > Thanks everyone for attending the meeting today.
> > > Here is the recording
> > >
> > >
> > https://drive.google.com/file/d/14PRM7U0OopOOrJR197VlqvRX5SXNtmKj/view?usp=sharing
> > >
> > > Notes:
> > >
> > >1. KIP is updated with follower fetch protocol and ready to reviewed
> > >2. Satish to capture schema of internal metadata topic in the KIP
> > >3. We will update the KIP with details of different cases
> > >4. Test plan will be captured in a doc and will add to the KIP
> > >5. Add a section "Limitations" to capture the capabilities that will
> > be
> > >introduced with this KIP and what will not be covered in this KIP.
> > >
> > > Please add to it I missed anything. Will produce a formal meeting notes
> > > from next meeting onwards.
> > >
> > > Thanks,
> > > Harsha
> > >
> > >
> > >
> > > On Mon, Aug 24, 2020 at 9:42 PM, Ying Zheng 
> > > wrote:
> > >
> > > > We did some basic feature tests at Uber. The test cases and results are
> > > > shared in this google doc:
> > > > https://docs.google.com/spreadsheets/d/
> > > > 1XhNJqjzwXvMCcAOhEH0sSXU6RTvyoSf93DHF-YMfGLk/edit?usp=sharing
> > > >
> > > > The performance test results were already shared in the KIP last month.
> > > >
> > > > On Mon, Aug 24, 2020 at 11:10 AM Harsha Ch 
> > wrote:
> > > >
> > > > "Understand commitments towards driving design & implementation of the
> > > KIP
> > > > further and how it aligns with participant interests in contributing to
> > > the
> > > > efforts (ex: in the context of Uber’s Q3/Q4 roadmap)." What is that
> > > about?
> > > >
> > > > On Mon, Aug 24, 2020 at 11:05 AM Kowshik Prakasam <
> > > kpraka...@confluent.io>
> > > > wrote:
> > > >
> > > > Hi Harsha,
> > > >
> > > > The following google doc contains a proposal for temporary agenda for
> > the
> > > > KIP-405  sync meeting
> > > > tomorrow:
> > > >
> > > > https://docs.google.com/document/d/
> > > > 1pqo8X5LU8TpwfC_iqSuVPezhfCfhGkbGN2TqiPA3LBU/edit
> > > >
> > > > .
> > > > Please could you add it to the Google calendar invite?
> > > >
> > > > Thank you.
> > > >
> > > > Cheers,
> > > > Kowshik
> > > >
> > > > On Thu, Aug 20, 2020 at 10:58 AM Harsha Ch 
> > wrote:
> > > >
> > > > Hi All,
> > > >
> > > > Scheduled a meeting for Tuesday 9am - 10am. I can record and upload for
> > > > community to be able to follow the discussion.
> > > >
> > > > Jun, please add the required folks on confluent side.
> > > >
> > > > Thanks,
> > > >
> > > > Harsha
> > > >
> > > > On Thu, Aug 20, 2020 at 12:33 AM, Alexandre Dupriez <
> > alexandre.dupriez@
> > > > gmail.com > wrote:
> > > >
> > > > Hi Jun,
> > > >
> > > > Many thanks for your initiative.
> > > >
> > > > If you like, I am happy to attend at the time you suggested.
> > > >
> > > > Many thanks,
> > > > Alexandre
> > > >
> > > > Le mer. 19 août 2020 à 22:00, Harsha Ch < harsha. ch@ gmail. com (
> > > harsha.
> > > > c...@gmail.com ) > a écrit :
> > > >
> > > > Hi Jun,
> > > > Thanks. This will help a lot. Tuesday will work for us.
> > > > -Harsha
> > > >
> > > > On Wed, Aug 19, 2020 at 1:24 PM Jun Rao < jun@ confluent. io ( jun@
> > > > confluent.io ) > wrote:
> > > >
> > > > Hi, Satish, Ying, Harsha,
> > > >
> > > > Do you think it would be useful to have a regular virtual meeting to
> > > > discuss this KIP? The goal of the meeting will be sharing
> > > > design/development progress and discussing any open issues to
> > > >
> > > > accelerate
> > > >
> > > > this KIP. If so, will every Tuesday (from next week) 9am-10am
> > > >
> > > > PT
> > > >
> > > > work for you? I can help set up a Zoom meeting, invite everyone who
> > > >
> > > > might
> > > >
> > > > be interested, have it recorded and shared, etc.
> > > >
> > > > Thanks,
> > > >
> > > > Jun
> > > >
> > > > On Tue, Aug 18, 2020 at 11:01 AM Satish Duggana <
> > > >
> > > > satish. duggana@ gmail. com ( satish.dugg...@gmail.com ) >
> > > >
> > > > wrote:
> > > >
> > > > Hi Kowshik,
> > > >
> > > > Thanks for looking into the KIP and sending your comments.
> > > >
> > > > 5001. Under the section "Follower fetch 

Re: Request to subscribe to kafka mailing list

2020-08-31 Thread Guozhang Wang
Hi SaiTejia,

You can add yourself to the mailing list, it's self service:


https://kafka.apache.org/contact


Guozhang

On Sat, Aug 29, 2020 at 8:35 AM SaiTeja Ramisetty 
wrote:

> Regards,
> SaiTeja - Data Engineer
>


-- 
-- Guozhang


[jira] [Created] (KAFKA-10451) system tests send large command over ssh instead of using remote file for security config

2020-08-31 Thread Ron Dagostino (Jira)
Ron Dagostino created KAFKA-10451:
-

 Summary: system tests send large command over ssh instead of using 
remote file for security config
 Key: KAFKA-10451
 URL: https://issues.apache.org/jira/browse/KAFKA-10451
 Project: Kafka
  Issue Type: Improvement
  Components: system tests
Reporter: Ron Dagostino


In `kafka.py` the pattern used to supply security configuration information to 
remote CLI tools is to send the information as part of the ssh command.  For 
example, see this --command-config definition:

{{Running ssh command: export 
KAFKA_OPTS="-Djava.security.auth.login.config=/mnt/security/admin_client_as_broker_jaas.conf
 -Djava.security.krb5.conf=/mnt/security/krb5.conf"; 
/opt/kafka-dev/bin/kafka-configs.sh --bootstrap-server worker2:9095 
--command-config <(echo '
ssl.endpoint.identification.algorithm=HTTPS
sasl.kerberos.service.name=kafka
security.protocol=SASL_SSL
ssl.keystore.location=/mnt/security/test.keystore.jks
ssl.truststore.location=/mnt/security/test.truststore.jks
ssl.keystore.password=test-ks-passwd
sasl.mechanism=SCRAM-SHA-256
ssl.truststore.password=test-ts-passwd
ssl.key.password=test-ks-passwd
sasl.mechanism.inter.broker.protocol=GSSAPI
') --entity-name kafka-client --entity-type users --alter --add-config 
SCRAM-SHA-256=[password=client-secret]}}

This ssh command length is getting pretty big.  It would be best if this 
referred to a file as opposed to sending in the file contents as part of the 
ssh command.

This happens in a few places in `kafka/py` and should be rectified.



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: There is an error on the main page

2020-08-31 Thread Sophie Blee-Goldman
Thanks for the heads up. Would you be interested in submitting a PR to fix
this?

The typo seems to appear in two files, quickstart-docker.html and
quickstart-zookeeper.html, of the kafka-site repo
.

On Mon, Aug 31, 2020 at 8:43 AM Liu Lycos  wrote:

> Hello, I'm a developer. When I wanted to learn Kafka, I found the official
> website. I found that there was an error code, the target location
>
> --> http://kafka.apache.org/documentation/#quickstart
>
> --> 1. GETTING STARTED
>
> --> 1.3 Quick Start
>
> --> STEP 7: PROCESS YOUR EVENTS WITH KAFKA STREAMS
>
> The following code
>
> KStream textLines = builder.stream ("quickstart-events");
> >
> >
> > KTable wordCounts = textLines
> >
> > .flatMapValues(line -> Arrays.asList ( line.toLowerCase ().split(" ")))
> >
> > .groupBy((keyIgnored, word) -> word)
> >
> > .count();
> >
> >
> > wordCounts.toStream ().to("output-topic"), Produced.with ( Serdes.String
> > (), Serdes.Long ()));
> >
>
>
>
> There is an extra bracket after "output topic", when I click Kafka streams
> demo  below
>
>
> But I don't know who the email was sent to, so I found this email and hope
> to optimize the page code.
>
>
>
> ---
> Name: 刘玉川 LiuYuChuan
> NickName: Lycos
> PostCode:51
> MobPhone:18615183765
> TelePhone:
> QQ: 763999883
> Addr:
> 广州市越秀区环市东路498号,柏丽商业中心
> Baili commercial center, No. 498, Huanshi East Road, Yuexiu District,
> Guangzhou City, Guangdong Province
>


[jira] [Created] (KAFKA-10450) console-producer throws Uncaught error in kafka producer I/O thread:  (org.apache.kafka.clients.producer.internals.Sender) java.lang.IllegalStateException: There are no

2020-08-31 Thread Jigar Naik (Jira)
Jigar Naik created KAFKA-10450:
--

 Summary: console-producer throws Uncaught error in kafka producer 
I/O thread:  (org.apache.kafka.clients.producer.internals.Sender) 
java.lang.IllegalStateException: There are no in-flight requests for node -1
 Key: KAFKA-10450
 URL: https://issues.apache.org/jira/browse/KAFKA-10450
 Project: Kafka
  Issue Type: Bug
  Components: producer 
Affects Versions: 2.6.0
 Environment: Kafka Version 2.6.0
MacOS Version - macOS Catalina 10.15.6 (19G2021)
java version "11.0.8" 2020-07-14 LTS
Java(TM) SE Runtime Environment 18.9 (build 11.0.8+10-LTS)
Java HotSpot(TM) 64-Bit Server VM 18.9 (build 11.0.8+10-LTS, mixed mode)
Reporter: Jigar Naik


Kafka-console-producer.sh gives below error on Mac 

ERROR [Producer clientId=console-producer] Uncaught error in kafka producer I/O 
thread:  (org.apache.kafka.clients.producer.internals.Sender)

java.lang.IllegalStateException: There are no in-flight requests for node -1

*Steps to re-produce the issue.* 

Download Kafka from 
[kafka_2.13-2.6.0.tgz|https://www.apache.org/dyn/closer.cgi?path=/kafka/2.6.0/kafka_2.13-2.6.0.tgz]
 

Change data and log directory (Optional)

Create Topic Using below command 

 
{code:java}
./kafka-topics.sh \
 --create \
 --zookeeper localhost:2181 \
 --replication-factor 1 \
 --partitions 1 \
 --topic my-topic{code}
 

Start Kafka console producer using below command

 
{code:java}
./kafka-console-consumer.sh \
 --topic my-topic \
 --from-beginning \
 --bootstrap-server localhost:9092{code}
 

Gives below output

 
{code:java}
./kafka-console-producer.sh \
     --topic my-topic \
     --bootstrap-server 127.0.0.1:9092
>[2020-09-01 00:24:18,177] ERROR [Producer clientId=console-producer] Uncaught 
>error in kafka producer I/O thread:  
>(org.apache.kafka.clients.producer.internals.Sender)
java.nio.BufferUnderflowException
at java.base/java.nio.Buffer.nextGetIndex(Buffer.java:650)
at java.base/java.nio.HeapByteBuffer.getInt(HeapByteBuffer.java:391)
at 
org.apache.kafka.common.protocol.ByteBufferAccessor.readInt(ByteBufferAccessor.java:43)
at 
org.apache.kafka.common.message.ResponseHeaderData.read(ResponseHeaderData.java:102)
at 
org.apache.kafka.common.message.ResponseHeaderData.(ResponseHeaderData.java:70)
at org.apache.kafka.common.requests.ResponseHeader.parse(ResponseHeader.java:66)
at 
org.apache.kafka.clients.NetworkClient.parseStructMaybeUpdateThrottleTimeMetrics(NetworkClient.java:717)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:834)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:553)
at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:325)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:240)
at java.base/java.lang.Thread.run(Thread.java:834)
[2020-09-01 00:24:18,179] ERROR [Producer clientId=console-producer] Uncaught 
error in kafka producer I/O thread:  
(org.apache.kafka.clients.producer.internals.Sender)
java.lang.IllegalStateException: There are no in-flight requests for node -1
at 
org.apache.kafka.clients.InFlightRequests.requestQueue(InFlightRequests.java:62)
at 
org.apache.kafka.clients.InFlightRequests.completeNext(InFlightRequests.java:70)
at 
org.apache.kafka.clients.NetworkClient.handleCompletedReceives(NetworkClient.java:833)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:553)
at org.apache.kafka.clients.producer.internals.Sender.runOnce(Sender.java:325)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:240)
at java.base/java.lang.Thread.run(Thread.java:834)
[2020-09-01 00:24:18,682] WARN [Producer clientId=console-producer] Bootstrap 
broker 127.0.0.1:9092 (id: -1 rack: null) disconnected 
(org.apache.kafka.clients.NetworkClient)
{code}
 

 

The same steps works fine with Kafka version 2.0.0 on Mac. 

The same steps works fine with Kafka version 2.6.0 on Windows. 

 

 



--
This message was sent by Atlassian Jira
(v8.3.4#803005)


Re: Permission to create KIP

2020-08-31 Thread Jun Rao
Hi, Sorin,

Thanks for your interest. Just gave you the wiki permission.

Jun

On Sat, Aug 29, 2020 at 8:35 AM sorin 
wrote:

> Hi guys I just wanted to propose an addition to the Consumer API to add
> a new poll method which would also accept a collection of paused
> partitions to automatically do the actions that are needed to resume
> serving messages for the provided partitions. If this is not the way to
> go for making such proposals, please suggest other ways.
>
> Thanks!
>
>


There is an error on the main page

2020-08-31 Thread Liu Lycos
Hello, I'm a developer. When I wanted to learn Kafka, I found the official
website. I found that there was an error code, the target location

--> http://kafka.apache.org/documentation/#quickstart

--> 1. GETTING STARTED

--> 1.3 Quick Start

--> STEP 7: PROCESS YOUR EVENTS WITH KAFKA STREAMS

The following code

KStream textLines = builder.stream ("quickstart-events");
>
>
> KTable wordCounts = textLines
>
> .flatMapValues(line -> Arrays.asList ( line.toLowerCase ().split(" ")))
>
> .groupBy((keyIgnored, word) -> word)
>
> .count();
>
>
> wordCounts.toStream ().to("output-topic"), Produced.with ( Serdes.String
> (), Serdes.Long ()));
>



There is an extra bracket after "output topic", when I click Kafka streams
demo  below


But I don't know who the email was sent to, so I found this email and hope
to optimize the page code.



---
Name: 刘玉川 LiuYuChuan
NickName: Lycos
PostCode:51
MobPhone:18615183765
TelePhone:
QQ: 763999883
Addr:
广州市越秀区环市东路498号,柏丽商业中心
Baili commercial center, No. 498, Huanshi East Road, Yuexiu District,
Guangzhou City, Guangdong Province


Re: [DISCUSS] KIP-631: The Quorum-based Kafka Controller

2020-08-31 Thread Unmesh Joshi
>>The reason for including LeaseStartTimeMs in the request is to ensure
that the time required to communicate with the controller gets included in
>>the lease time.  Since requests can potentially be delayed in the network
for a long time, this is important.
The network time will be added anyway, because the lease timer on the
active controller will start only after the heartbeat request reaches the
server. And I think, some assumption about network round trip time is
needed anyway to decide on the frequency of the heartbeat (
registration.heartbeat.interval.ms), and lease timeout (
registration.lease.timeout.ms). So I think just having a leaseTTL in the
request is easier to understand and implement.
>>>Yes, I agree that the lease timeout on the controller side should be
reset in the case of controller failover.  The alternative would be to
track the >>>lease as hard state rather than soft state, but I think that
is not really needed, and would result in more log entries.
My interpretation of the mention of BrokerRecord in the KIP was that this
record exists in the Raft log. By soft state, do you mean the broker
records exist only on the active leader and will not be replicated in the
raft log? If the live brokers list is maintained only on the active
controller (raft leader), then, in case of leader failure, there will be a
window where the new leader does not know about the live brokers, till the
brokers establish the leases again.
I think it will be safer to have leases as a hard state managed by standard
Raft replication.
Or am I misunderstanding something? (I assume that with soft state, you
mean something like zookeeper local sessions
https://issues.apache.org/jira/browse/ZOOKEEPER-1147.)

>>Our code is single threaded as well.  I think it makes sense for the
controller, since otherwise locking becomes very messy.  I'm not sure I
>>understand your question about duplicate broker ID detection, though.
There's a section in the KIP about this -- is there a detail we should add
?>>there?
I assumed broker leases are implemented as a hard state. In that case, to
check for broker id conflict, we need to check the broker ids at two places
1. Pending broker registrations (which are yet to be committed) 2. Already
committed broker registrations.

Thanks,
Unmesh



On Mon, Aug 31, 2020 at 5:42 PM Colin McCabe  wrote:

> On Sat, Aug 29, 2020, at 01:12, Unmesh Joshi wrote:
> > >>>Can you repeat your questions about broker leases?
> >
> > The LeaseStartTimeMs is expected to be the broker's
> > 'System.currentTimeMillis()' at the point of the request. The active
> > controller will add its lease period to this in order to compute the
> > LeaseEndTimeMs.
> >
> > I think the use of LeaseStartTimeMs and LeaseEndTimeMs in the KIP is a
> > bit
> > confusing.  Monotonic Clock (System.nanoTime) on the active controller
> > should be used to track leases.
> > (For example,
> >
> https://issues.apache.org/jira/browse/ZOOKEEPER-1616https://github.com/etcd-io/etcd/pull/6888/commits/e7f4010ccaf28b6ce64fe514d25a4b2fa459d114
> > )
> >
> > Then we will not need LeaseStartTimeMs?
> > Instead of LeaseStartTimeMs, can we call it LeaseTTL? The active
> controller
> > can then calculate LeaseEndTime = System.nanoTime() + LeaseTTL.
> > In this case we might just drop LeaseEndTimeMs from the response, as the
> > broker already knows about the TTL and can send heartbeats at some
> fraction
> > of TTL, say every TTL/4 milliseconds.(elapsed time on the broker measured
> > by System.nanoTime)
> >
>
> Hi Unmesh,
>
> I agree that the monotonic clock is probably a better idea here.  It is
> good to be robust against wall clock changes, although I think a cluster
> which had them might suffer other issues.  I will change it to specify a
> monotonic clock.
>
> The reason for including LeaseStartTimeMs in the request is to ensure that
> the time required to communicate with the controller gets included in the
> lease time.  Since requests can potentially be delayed in the network for a
> long time, this is important.
>
> >
> > I have a prototype built to demonstrate this as following:
> >
> https://github.com/unmeshjoshi/distrib-broker/blob/master/src/main/scala/com/dist/simplekafka/kip500/Kip631Controller.scala
> >
> > The Kip631Controller itself depends on a Consensus module, to demonstrate
> > how possible interactions with the consensus module will look like
> >  (The Consensus can be pluggable really, with an API to allow reading
> > replicated log upto HighWaterMark)
> >
> > It has an implementation of LeaseTracker
> >
> https://github.com/unmeshjoshi/distrib-broker/blob/master/src/main/scala/com/dist/simplekafka/kip500/LeaderLeaseTracker.scala
> > to demonstrate LeaseTracker's interaction with the consensus module.
> >
> > The implementation has the following aspects:
> > 1. The lease tracking happens only on the active controller (raft
> > leader)
> > 2. Once the lease expires, it needs to propose and commit a FenceBroker
> > 

Re: [VOTE] KIP-635: GetOffsetShell: support for multiple topics and consumer configuration override

2020-08-31 Thread Dániel Urbán
Hello everyone,

I'd like to ask you to consider voting for this KIP, it only needs 2 more
binding votes.

This KIP focuses on the GetOffsetShell tool. The tool is useful for
monitoring and investigations as well.
Unfortunately, the tool misses one key, and some quality-of-life features:
- Because of lack of configuration options, it cannot be used with secure
clusters - which makes it unviable in many use-cases.
- Only supports a single topic - usability improvement, also in line with
other tools using pattern based matching.
- Still utilizes the deprecated "--broker-list" argument name.

Overall, I believe that this is a non-intrusive change - minor improvements
without breaking changes. But for some, this would be a great improvement
in using Kafka.

Thank you in advance,
Daniel


Dániel Urbán  ezt írta (időpont: 2020. aug. 27., Cs,
17:52):

> Hi all,
>
> Please vote if you'd like to see this implemented. This one fixes a
> long-time debt, would be nice to see it pass.
>
> Thank you,
> Daniel
>
> Dániel Urbán  ezt írta (időpont: 2020. aug. 18.,
> K, 14:06):
>
>> Hello everyone,
>>
>> Please, if you are interested in this KIP and PR, don't forget to vote.
>>
>> Thank you,
>> Daniel
>>
>> Dániel Urbán  ezt írta (időpont: 2020. aug. 13.,
>> Cs, 14:00):
>>
>>> Hi David,
>>>
>>> Thank you for the suggestion. KIP-635 was referencing the --broker-list
>>> issue, but based on your suggestion, I pinged the PR
>>> https://github.com/apache/kafka/pull/8123.
>>> Since I got no response, I updated KIP-635 to deprecate --broker-list.
>>> Will update the PR related to KIP-635 to reflect that change.
>>>
>>> Thanks,
>>> Daniel
>>>
>>> David Jacot  ezt írta (időpont: 2020. aug. 10., H,
>>> 20:48):
>>>
 Hi Daniel,

 I was not aware of that PR. At minimum, I would add `--bootstrap-server`
 to the list in the KIP for completeness. Regarding the implementation,
 I would leave a comment in that PR asking if they plan to continue it.
 If
 not,
 we could do it as part of your PR directly.

 Cheers,
 David

 On Mon, Aug 10, 2020 at 10:49 AM Dániel Urbán 
 wrote:

 > Hi everyone,
 >
 > Just a reminder, please vote if you are interested in this KIP being
 > implemented.
 >
 > Thanks,
 > Daniel
 >
 > Dániel Urbán  ezt írta (időpont: 2020. júl.
 31., P,
 > 9:01):
 >
 > > Hi David,
 > >
 > > There is another PR linked on KAFKA-8507, which is still open:
 > > https://github.com/apache/kafka/pull/8123
 > > Wasn't sure if it will go in, and wanted to avoid conflicts. Do you
 think
 > > I should do the switch to '--bootstrap-server' anyway?
 > >
 > > Thanks,
 > > Daniel
 > >
 > > David Jacot  ezt írta (időpont: 2020. júl.
 30., Cs,
 > > 17:52):
 > >
 > >> Hi Daniel,
 > >>
 > >> Thanks for the KIP.
 > >>
 > >> It seems that we have forgotten to include this tool in KIP-499.
 > >> KAFKA-8507
 > >> is resolved
 > >> by this tool still uses the deprecated "--broker-list". I suggest
 to
 > >> include "--bootstrap-server"
 > >> in your public interfaces as well and fix this omission during the
 > >> implementation.
 > >>
 > >> +1 (non-binding)
 > >>
 > >> Thanks,
 > >> David
 > >>
 > >> On Thu, Jul 30, 2020 at 1:52 PM Kamal Chandraprakash <
 > >> kamal.chandraprak...@gmail.com> wrote:
 > >>
 > >> > +1 (non-binding), thanks for the KIP!
 > >> >
 > >> > On Thu, Jul 30, 2020 at 3:31 PM Manikumar <
 manikumar.re...@gmail.com>
 > >> > wrote:
 > >> >
 > >> > > +1 (binding)
 > >> > >
 > >> > > Thanks for the KIP!
 > >> > >
 > >> > >
 > >> > >
 > >> > > On Thu, Jul 30, 2020 at 3:07 PM Dániel Urbán <
 urb.dani...@gmail.com
 > >
 > >> > > wrote:
 > >> > >
 > >> > > > Hi everyone,
 > >> > > >
 > >> > > > If you are interested in this KIP, please do not forget to
 vote.
 > >> > > >
 > >> > > > Thanks,
 > >> > > > Daniel
 > >> > > >
 > >> > > > Viktor Somogyi-Vass  ezt írta
 (időpont:
 > >> 2020.
 > >> > > > júl.
 > >> > > > 28., K, 16:06):
 > >> > > >
 > >> > > > > +1 from me (non-binding), thanks for the KIP.
 > >> > > > >
 > >> > > > > On Mon, Jul 27, 2020 at 10:02 AM Dániel Urbán <
 > >> urb.dani...@gmail.com
 > >> > >
 > >> > > > > wrote:
 > >> > > > >
 > >> > > > > > Hello everyone,
 > >> > > > > >
 > >> > > > > > I'd like to start a vote on KIP-635. The KIP enhances the
 > >> > > > GetOffsetShell
 > >> > > > > > tool by enabling querying multiple topic-partitions,
 adding
 > new
 > >> > > > filtering
 > >> > > > > > options, and adding a config override option.
 > >> > > > > >
 > >> > > > > >
 > >> > > > >
 > >> > > >
 > >> > >
 > >> >
 > >>
 >
 

Re: [DISCUSS] KIP-631: The Quorum-based Kafka Controller

2020-08-31 Thread Colin McCabe
On Sat, Aug 29, 2020, at 01:12, Unmesh Joshi wrote:
> >>>Can you repeat your questions about broker leases?
> 
> The LeaseStartTimeMs is expected to be the broker's
> 'System.currentTimeMillis()' at the point of the request. The active
> controller will add its lease period to this in order to compute the
> LeaseEndTimeMs.
> 
> I think the use of LeaseStartTimeMs and LeaseEndTimeMs in the KIP is a 
> bit
> confusing.  Monotonic Clock (System.nanoTime) on the active controller
> should be used to track leases.
> (For example,
> https://issues.apache.org/jira/browse/ZOOKEEPER-1616https://github.com/etcd-io/etcd/pull/6888/commits/e7f4010ccaf28b6ce64fe514d25a4b2fa459d114
> )
> 
> Then we will not need LeaseStartTimeMs?
> Instead of LeaseStartTimeMs, can we call it LeaseTTL? The active controller
> can then calculate LeaseEndTime = System.nanoTime() + LeaseTTL.
> In this case we might just drop LeaseEndTimeMs from the response, as the
> broker already knows about the TTL and can send heartbeats at some fraction
> of TTL, say every TTL/4 milliseconds.(elapsed time on the broker measured
> by System.nanoTime)
> 

Hi Unmesh,

I agree that the monotonic clock is probably a better idea here.  It is good to 
be robust against wall clock changes, although I think a cluster which had them 
might suffer other issues.  I will change it to specify a monotonic clock.

The reason for including LeaseStartTimeMs in the request is to ensure that the 
time required to communicate with the controller gets included in the lease 
time.  Since requests can potentially be delayed in the network for a long 
time, this is important.

>
> I have a prototype built to demonstrate this as following:
> https://github.com/unmeshjoshi/distrib-broker/blob/master/src/main/scala/com/dist/simplekafka/kip500/Kip631Controller.scala
> 
> The Kip631Controller itself depends on a Consensus module, to demonstrate
> how possible interactions with the consensus module will look like
>  (The Consensus can be pluggable really, with an API to allow reading
> replicated log upto HighWaterMark)
> 
> It has an implementation of LeaseTracker
> https://github.com/unmeshjoshi/distrib-broker/blob/master/src/main/scala/com/dist/simplekafka/kip500/LeaderLeaseTracker.scala
> to demonstrate LeaseTracker's interaction with the consensus module.
> 
> The implementation has the following aspects:
> 1. The lease tracking happens only on the active controller (raft 
> leader)
> 2. Once the lease expires, it needs to propose and commit a FenceBroker
> record for that lease.
> 3. In case of active controller failure, the lease will be tracked by 
> the
> newly raft leader. The new raft leader starts the lease timer again, (as
> implemented in onBecomingLeader method of
> https://github.com/unmeshjoshi/distrib-broker/blob/master/src/main/scala/com/dist/simplekafka/kip500/Kip631Controller.scala
> )
> in effect extending the lease by the time spent in the leader election 
> and
> whatever time was elapsed on the old leader.

Yes, I agree that the lease timeout on the controller side should be reset in 
the case of controller failover.  The alternative would be to track the lease 
as hard state rather than soft state, but I think that is not really needed, 
and would result in more log entries.

> 
> There are working tests for this implementation here.
> https://github.com/unmeshjoshi/distrib-broker/blob/master/src/test/scala/com/dist/simplekafka/kip500/Kip631ControllerTest.scala
> and an end to end test here
> https://github.com/unmeshjoshi/distrib-broker/blob/master/src/test/scala/com/dist/simplekafka/ProducerConsumerKIP500Test.scala
> 
> >>'m not sure what you mean by "de-duplication of the broker."  Can you
> give a little more context?
> Apologies for using the confusing term deduplication. I meant broker id
> conflict.
> As you can see in the prototype handleRequest of KIP631Controller
> ,
> the duplicate broker id needs to be detected before the BrokerRecord is
> submitted to the raft module.
> Also as implemented in the prototype, the KIP631Controller is single
> threaded, handling requests one at a time. (an example of
> https://martinfowler.com/articles/patterns-of-distributed-systems/singular-update-queue.html
> )

Our code is single threaded as well.  I think it makes sense for the 
controller, since otherwise locking becomes very messy.  I'm not sure I 
understand your question about duplicate broker ID detection, though.  There's 
a section in the KIP about this -- is there a detail we should add there?

best,
Colin


> 
> Thanks,
> Unmesh
> 
> On Sat, Aug 29, 2020 at 10:49 AM Colin McCabe  wrote:
> 
> > On Fri, Aug 28, 2020, at 19:36, Unmesh Joshi wrote:
> > > Hi Colin,
> > >
> > > There were a few of 

Re: [VOTE] KIP-662: Throw Exception when Source Topics of a Streams App are Deleted

2020-08-31 Thread Bruno Cadonna

Thank you for voting!

The KIP passes with 5 votes:
- 4 binding votes (Guozhang, Bill, John, and Matthias)
- 1 non-binding vote (Walker)

Best,
Bruno

On 28.08.20 02:46, Matthias J. Sax wrote:

+1 (binding)

On 8/27/20 1:10 PM, John Roesler wrote:

Thanks, Bruno!

I'm +1 (binding)

-John

On Thu, 2020-08-27 at 15:35 -0400, Bill Bejeck wrote:

Thanks for the KIP Bruno.

+1 (binding)

-Bill

On Thu, Aug 27, 2020 at 3:15 PM Walker Carlson 
wrote:


+1 (Non Binding). Good Kip Bruno

Walker

On Tue, Aug 25, 2020 at 11:17 AM Guozhang Wang  wrote:


+1. Thanks Bruno!


Guozhang

On Tue, Aug 25, 2020 at 4:00 AM Bruno Cadonna 

wrote:

Hi,

I would like to start the vote for




https://cwiki.apache.org/confluence/display/KAFKA/KIP-662%3A+Throw+Exception+when+Source+Topics+of+a+Streams+App+are+Deleted

Best,
Bruno



--
-- Guozhang