Re: [ANNOUNCE] New Committer: Manikumar Reddy

2018-10-17 Thread Ray Chiang

Congrats Mani.

-Ray

On 10/17/18 10:19 AM, Harsha wrote:

Congrats Mani!! Very well deserved.

--Harsha
On Tue, Oct 16, 2018, at 5:20 PM, Attila Sasvari wrote:

Congratulations Manikumar! Keep up the good work.

On Tue, Oct 16, 2018 at 12:30 AM Jungtaek Lim  wrote:


Congrats Mani!
On Tue, 16 Oct 2018 at 1:45 PM Abhimanyu Nagrath <
abhimanyunagr...@gmail.com>
wrote:


Congratulations Manikumar

On Tue, Oct 16, 2018 at 10:09 AM Satish Duggana <

satish.dugg...@gmail.com>

wrote:


Congratulations Mani!


On Fri, Oct 12, 2018 at 9:41 PM Colin McCabe 

wrote:

Congratulations, Manikumar!  Well done.

best,
Colin


On Fri, Oct 12, 2018, at 01:25, Edoardo Comar wrote:

Well done Manikumar !
--

Edoardo Comar

IBM Event Streams
IBM UK Ltd, Hursley Park, SO21 2JN




From:   "Matthias J. Sax" 
To: dev 
Cc: users 
Date:   11/10/2018 23:41
Subject:Re: [ANNOUNCE] New Committer: Manikumar Reddy



Congrats!


On 10/11/18 2:31 PM, Yishun Guan wrote:

Congrats Manikumar!
On Thu, Oct 11, 2018 at 1:20 PM Sönke Liebau
 wrote:

Great news, congratulations Manikumar!!

On Thu, Oct 11, 2018 at 9:08 PM Vahid Hashemian



wrote:


Congrats Manikumar!

On Thu, Oct 11, 2018 at 11:49 AM Ryanne Dolan <

ryannedo...@gmail.com>

wrote:


Bravo!

On Thu, Oct 11, 2018 at 1:48 PM Ismael Juma <

ism...@juma.me.uk>

wrote:

Congratulations Manikumar! Thanks for your continued

contributions.

Ismael

On Thu, Oct 11, 2018 at 10:39 AM Jason Gustafson



wrote:


Hi all,

The PMC for Apache Kafka has invited Manikumar Reddy as a

committer

and

we

are
pleased to announce that he has accepted!

Manikumar has contributed 134 commits including significant

work to

add

support for delegation tokens in Kafka:

KIP-48:



https://cwiki.apache.org/confluence/display/KAFKA/KIP-48+Delegation+token+support+for+Kafka

KIP-249
<

https://cwiki.apache.org/confluence/display/KAFKA/KIP-48+Delegation+token+support+for+KafkaKIP-249

:



https://cwiki.apache.org/confluence/display/KAFKA/KIP-249%3A+Add+Delegation+Token+Operations+to+KafkaAdminClient

He has broad experience working with many of the core

components in

Kafka

and he has reviewed over 80 PRs. He has also made huge

progress

addressing

some of our technical debt.

We appreciate the contributions and we are looking forward

to

more.

Congrats Manikumar!

Jason, on behalf of the Apache Kafka PMC



--
Sönke Liebau
Partner
Tel. +49 179 7940878
OpenCore GmbH & Co. KG - Thomas-Mann-Straße 8 - 22880 Wedel -

Germany

[attachment "signature.asc" deleted by Edoardo Comar/UK/IBM]


Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with

number

741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire

PO6

3AU



--
--
Attila Sasvari
Software Engineer





Re: [ANNOUNCE] New committer: Colin McCabe

2018-09-25 Thread Ray Chiang

Nice job Colin!

-Ray

On 9/25/18 1:58 AM, Stanislav Kozlovski wrote:

Congrats Colin!

On Tue, Sep 25, 2018 at 9:51 AM Edoardo Comar  wrote:


Congratulations Colin !
--

Edoardo Comar

IBM Event Streams
IBM UK Ltd, Hursley Park, SO21 2JN




From:   Ismael Juma 
To: Kafka Users , dev 
Date:   25/09/2018 09:40
Subject:[ANNOUNCE] New committer: Colin McCabe



Hi all,

The PMC for Apache Kafka has invited Colin McCabe as a committer and we
are
pleased to announce that he has accepted!

Colin has contributed 101 commits and 8 KIPs including significant
improvements to replication, clients, code quality and testing. A few
highlights were KIP-97 (Improved Clients Compatibility Policy), KIP-117
(AdminClient), KIP-227 (Incremental FetchRequests to Increase Partition
Scalability), the introduction of findBugs and adding Trogdor (fault
injection and benchmarking tool).

In addition, Colin has reviewed 38 pull requests and participated in more
than 50 KIP discussions.

Thank you for your contributions Colin! Looking forward to many more. :)

Ismael, for the Apache Kafka PMC



Unless stated otherwise above:
IBM United Kingdom Limited - Registered in England and Wales with number
741598.
Registered office: PO Box 41, North Harbour, Portsmouth, Hampshire PO6 3AU







Re: [ANNOUNCE] New Kafka PMC member: Dong Lin

2018-08-21 Thread Ray Chiang

Congrats Dong!

-Ray

On 8/21/18 9:33 AM, Becket Qin wrote:

Congrats, Dong!


On Aug 21, 2018, at 11:03 PM, Eno Thereska  wrote:

Congrats Dong!

Eno

On Tue, Aug 21, 2018 at 7:05 AM, Ted Yu  wrote:


Congratulation Dong!

On Tue, Aug 21, 2018 at 1:59 AM Viktor Somogyi-Vass <
viktorsomo...@gmail.com>
wrote:


Congrats Dong! :)

On Tue, Aug 21, 2018 at 10:09 AM James Cheng 

wrote:

Congrats Dong!

-James


On Aug 20, 2018, at 3:54 AM, Ismael Juma  wrote:

Hi everyone,

Dong Lin became a committer in March 2018. Since then, he has

remained

active in the community and contributed a number of patches, reviewed
several pull requests and participated in numerous KIP discussions. I

am

happy to announce that Dong is now a member of the
Apache Kafka PMC.

Congratulation Dong! Looking forward to your future contributions.

Ismael, on behalf of the Apache Kafka PMC






Re: [DISCUSS] Applying scalafmt to core code

2018-08-08 Thread Ray Chiang
By doing piecemeal formatting, I don't think we can do a "hard" 
enforcement on using scalafmt with every PR, but by allowing the tool to 
run on already modified files in a patch, we can slowly migrate towards 
getting the entire code base clean.  The trade offs are pretty standard 
(giant patch polluting "git blame" vs. slower cleanup).  This came out 
of the discussion in KAFKA-2423, where most seemed against one giant patch.


The benefits of pretty-printing tends to be limited, but it does open 
the door for other linting/static analysis tools without the need to 
turn off their particular pretty-printing features (which is in some, 
but not all tools).


-Ray


On 20180807 11:41 AM, Colin McCabe wrote:

Hmm.  It would be unfortunate to make contributors include unrelated style 
changes in their PRs.  This would be especially hard on new contributors who 
might not want to make a large change.

If we really want to do something like this, I would vote for A1.  Just do the 
change all at once and get it over with.

I'm also curious what benefit we get out of making these changes.  If the code 
style was acceptable to the reviewers who committed the code, maybe we should 
leave it alone?

best,
Colin


On Tue, Aug 7, 2018, at 09:41, Guozhang Wang wrote:

Hello Ray,

I saw on the original PR Jason (cc'ed) expressed a concern comparing
scalafmt with scalastyle: the latter will throw exceptions in the build
process to notify developers while the former will automatically reformat
the code that developers may not be aware of. So I think maybe Jason can
elaborate a bit more of your thoughts on this regard.

Personally I like this idea (scalafmt). As for cherry-picking burdens,
there may be always some unavoidable, and I think B4 seems less invasive
and hence preferable.


Guozhang




On Mon, Jul 30, 2018 at 1:20 PM, Ray Chiang  wrote:


I had started on KAFKA-2423 (was Scalastyle, now Expand scalafmt to
core).  As part of the cleanup, applying the "gradlew spotlessApply"
command ended up affecting too many (435 out of 439) files.  Since this
will affect every file, this sort of change does risk polluting the git
logs.

So, I'd like to get a discussion going to find some agreement on an
approach.  Right now, I see two categories of options:

A) Getting scalafmt working on the existing code
B) Getting all the code conforming to scalafmt requirements

For the first, I see a couple of approaches:

A1) Do the minimum change that allows scalafmt to run on all the .scala
files
A2) Make the change so that scalafmt runs as-is (only on the streams code)
and add a different task/options that allow running scalafmt on a subset of
code.  (Reasons explained below)

For the second, I can think of the following options:

B1) Do one giant git commit of all cleaned code (no one seemed to like
this)
B2) Do git commits one file at a time (trunk or as a branch)
B3) Do git commits one leaf subdirectory at a time (trunk or as a branch)
B4) With each pull request on all patches, run option A2) on the affected
files

 From what I can envision, options B2 and B3 require quite a bit of manual
work if we want to cover multiple releases.  The "cleanest" option I can
think of looks something like:

C1) Contributor makes code modifications for their JIRA
C2) Contributor runs option A2 to also apply scalafmt to their existing
code
C3) Committer does the regular review process

At some point in the future, enough cleanup could be done that the final
cleanup can be done as a much smaller set of MINOR commits.

-Ray




--
-- Guozhang




Re: [VOTE] KIP-346 - Improve LogCleaner behavior on error

2018-08-07 Thread Ray Chiang

+1 (non-binding)

-Ray

On 8/7/18 9:26 AM, Ted Yu wrote:

+1

On Tue, Aug 7, 2018 at 5:25 AM Thomas Becker  wrote:


+1 (non-binding)

We've hit issues with the log cleaner in the past, and this would be a
great improvement.
On Tue, 2018-08-07 at 12:19 +0100, Stanislav Kozlovski wrote:

Hey everybody,

I'm starting a vote on KIP-346

<
https://cwiki.apache.org/confluence/display/KAFKA/KIP-346+-+Improve+LogCleaner+behavior+on+error




This email and any attachments may contain confidential and privileged
material for the sole use of the intended recipient. Any review, copying,
or distribution of this email (or any attachments) by others is prohibited.
If you are not the intended recipient, please contact the sender
immediately and permanently delete this email and any attachments. No
employee or agent of TiVo Inc. is authorized to conclude any binding
agreement on behalf of TiVo Inc. by email. Binding agreements with TiVo
Inc. may only be made by a signed written agreement.





Re: [DISCUSS] KIP-346 - Limit blast radius of log compaction failure

2018-08-06 Thread Ray Chiang

I'm okay with that.

-Ray

On 8/6/18 10:59 AM, Colin McCabe wrote:

Perhaps we could start with max.uncleanable.partitions and then implement 
max.uncleanable.partitions.per.logdir in a follow-up change if it seemed to be 
necessary?  What do you think?

regards,
Colin


On Sat, Aug 4, 2018, at 10:53, Stanislav Kozlovski wrote:

Hey Ray,

Thanks for the explanation. In regards to the configuration property - I'm
not sure. As long as it has sufficient documentation, I find
"max.uncleanable.partitions" to be okay. If we were to add the distinction
explicitly, maybe it should be `max.uncleanable.partitions.per.logdir` ?

On Thu, Aug 2, 2018 at 7:32 PM Ray Chiang  wrote:


One more thing occurred to me.  Should the configuration property be
named "max.uncleanable.partitions.per.disk" instead?

-Ray


On 8/1/18 9:11 AM, Stanislav Kozlovski wrote:

Yes, good catch. Thank you, James!

Best,
Stanislav

On Wed, Aug 1, 2018 at 5:05 PM James Cheng  wrote:


Can you update the KIP to say what the default is for
max.uncleanable.partitions?

-James

Sent from my iPhone


On Jul 31, 2018, at 9:56 AM, Stanislav Kozlovski <

stanis...@confluent.io>

wrote:

Hey group,

I am planning on starting a voting thread tomorrow. Please do reply if

you

feel there is anything left to discuss.

Best,
Stanislav

On Fri, Jul 27, 2018 at 11:05 PM Stanislav Kozlovski <

stanis...@confluent.io>

wrote:


Hey, Ray

Thanks for pointing that out, it's fixed now

Best,
Stanislav


On Fri, Jul 27, 2018 at 9:43 PM Ray Chiang 

wrote:

Thanks.  Can you fix the link in the "KIPs under discussion" table on
the main KIP landing page
<


https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#

?

I tried, but the Wiki won't let me.

-Ray


On 7/26/18 2:01 PM, Stanislav Kozlovski wrote:
Hey guys,

@Colin - good point. I added some sentences mentioning recent

improvements

in the introductory section.

*Disk Failure* - I tend to agree with what Colin said - once a disk

fails,

you don't want to work with it again. As such, I've changed my mind

and

believe that we should mark the LogDir (assume its a disk) as

offline

on

the first `IOException` encountered. This is the LogCleaner's

current

behavior. We shouldn't change that.

*Respawning Threads* - I believe we should never re-spawn a thread.

The

correct approach in my mind is to either have it stay dead or never

let

it

die in the first place.

*Uncleanable-partition-names metric* - Colin is right, this metric

is

unneeded. Users can monitor the `uncleanable-partitions-count`

metric

and

inspect logs.


Hey Ray,


2) I'm 100% with James in agreement with setting up the LogCleaner

to

skip over problematic partitions instead of dying.

I think we can do this for every exception that isn't `IOException`.

This

will future-proof us against bugs in the system and potential other

errors.

Protecting yourself against unexpected failures is always a good

thing

in

my mind, but I also think that protecting yourself against bugs in

the

software is sort of clunky. What does everybody think about this?


4) The only improvement I can think of is that if such an
error occurs, then have the option (configuration setting?) to

create a

.skip file (or something similar).

This is a good suggestion. Have others also seen corruption be

generally

tied to the same segment?

On Wed, Jul 25, 2018 at 11:55 AM Dhruvil Shah 
wrote:

For the cleaner thread specifically, I do not think respawning will

help at

all because we are more than likely to run into the same issue

again

which

would end up crashing the cleaner. Retrying makes sense for

transient

errors or when you believe some part of the system could have

healed

itself, both of which I think are not true for the log cleaner.

On Wed, Jul 25, 2018 at 11:08 AM Ron Dagostino 

wrote:

<<
you

in

an

infinite loop which consumes resources and fires off continuous

log

messages.
Hi Colin.  In case it could be relevant, one way to mitigate this

effect

is

to implement a backoff mechanism (if a second respawn is to occur

then

wait

for 1 minute before doing it; then if a third respawn is to occur

wait

for

2 minutes before doing it; then 4 minutes, 8 minutes, etc. up to

some

max

wait time).

I have no opinion on whether respawn is appropriate or not in this

context,

but a mitigation like the increasing backoff described above may

be

relevant in weighing the pros and cons.

Ron

On Wed, Jul 25, 2018 at 1:26 PM Colin McCabe 

wrote:

On Mon, Jul 23, 2018, at 23:20, James Cheng wrote:
Hi Stanislav! Thanks for this KIP!

I agree that it would be good if the LogCleaner were more

tolerant

of

errors. Currently, as you said, once it dies, it stays dead.

Things are better now than they used to be. We have the metric


kafka.log:type=LogCleanerManager,name=time-since-last-run-ms

which we can use to tell us if the threads are dead. And as of

1.1.0,

we

[jira] [Resolved] (KAFKA-5153) KAFKA Cluster : 0.10.2.0 : Servers Getting disconnected : Service Impacting

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-5153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang resolved KAFKA-5153.
---
   Resolution: Information Provided
Fix Version/s: 0.11.0.1

Upgrading fixed the problem based on the last comment.

> KAFKA Cluster : 0.10.2.0 : Servers Getting disconnected : Service Impacting
> ---
>
> Key: KAFKA-5153
> URL: https://issues.apache.org/jira/browse/KAFKA-5153
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.10.2.0, 0.11.0.0
> Environment: RHEL 6
> Java Version  1.8.0_91-b14
>Reporter: Arpan
>Priority: Critical
>  Labels: reliability
> Fix For: 0.11.0.1
>
> Attachments: ThreadDump_1493564142.dump, ThreadDump_1493564177.dump, 
> ThreadDump_1493564249.dump, server.properties, server_1_72server.log, 
> server_2_73_server.log, server_3_74Server.log
>
>
> Hi Team, 
> I was earlier referring to issue KAFKA-4477 because the problem i am facing 
> is similar. I tried to search the same reference in release docs as well but 
> did not get anything in 0.10.1.1 or 0.10.2.0. I am currently using 
> 2.11_0.10.2.0.
> I am have 3 node cluster for KAFKA and cluster for ZK as well on the same set 
> of servers in cluster mode. We are having around 240GB of data getting 
> transferred through KAFKA everyday. What we are observing is disconnect of 
> the server from cluster and ISR getting reduced and it starts impacting 
> service.
> I have also observed file descriptor count getting increased a bit, in normal 
> circumstances we have not observed FD count more than 500 but when issue 
> started we were observing it in the range of 650-700 on all 3 servers. 
> Attaching thread dumps of all 3 servers when we started facing the issue 
> recently.
> The issue get vanished once you bounce the nodes and the set up is not 
> working more than 5 days without this issue. Attaching server logs as well.
> Kindly let me know if you need any additional information. Attaching 
> server.properties as well for one of the server (It's similar on all 3 
> serversP)



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Created] (KAFKA-7244) Add note about memory map kernel limits to documentation

2018-08-03 Thread Ray Chiang (JIRA)
Ray Chiang created KAFKA-7244:
-

 Summary: Add note about memory map kernel limits to documentation
 Key: KAFKA-7244
 URL: https://issues.apache.org/jira/browse/KAFKA-7244
 Project: Kafka
  Issue Type: Improvement
  Components: documentation
Affects Versions: 0.10.0.0
Reporter: Ray Chiang
Assignee: Ray Chiang


In the documentation for 0.10.x through 2.0.0, there is mention of the file 
descriptor limit and the max socket buffer size, but no mention of the memory 
map kernel limit.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-3577) Partial cluster breakdown

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3577?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang resolved KAFKA-3577.
---
Resolution: Duplicate

> Partial cluster breakdown
> -
>
> Key: KAFKA-3577
> URL: https://issues.apache.org/jira/browse/KAFKA-3577
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
> Environment: Debian GNU/Linux 7.9 (wheezy)
>Reporter: Kim Christensen
>Priority: Major
>
> We run a cluster of 3 brokers and 3 zookeepers, but after we upgraded to 
> 0.9.0.1 our cluster sometimes goes partially down, and we can't figure why. A 
> full cluster restart fixed the problem.
> I've added a snippet of the logs on each broker below.
> Broker 4:
> {quote}
> [2016-04-18 05:58:26,390] INFO [Group Metadata Manager on Broker 4]: Removed 
> 0 expired offsets in 0 milliseconds. (kafka.coordinator.GroupMetadataManager)
> [2016-04-18 06:05:55,218] INFO Creating /controller (is it secure? false) 
> (kafka.utils.ZKCheckedEphemeral)
> [2016-04-18 06:05:57,396] ERROR Session has expired while creating 
> /controller (kafka.utils.ZKCheckedEphemeral)
> [2016-04-18 06:05:57,396] INFO Result of znode creation is: SESSIONEXPIRED 
> (kafka.utils.ZKCheckedEphemeral)
> [2016-04-18 06:05:57,400] ERROR Error while electing or becoming leader on 
> broker 4 (kafka.server.ZookeeperLeaderElector)
> org.I0Itec.zkclient.exception.ZkException: 
> org.apache.zookeeper.KeeperException$SessionExpiredException: KeeperErrorCode 
> = Session expired
> at 
> org.I0Itec.zkclient.exception.ZkException.create(ZkException.java:68)
> at kafka.utils.ZKCheckedEphemeral.create(ZkUtils.scala:1090)
> at 
> kafka.server.ZookeeperLeaderElector.elect(ZookeeperLeaderElector.scala:81)
> at 
> kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply$mcZ$sp(ZookeeperLeaderElector.scala:146)
> at 
> kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:141)
> at 
> kafka.server.ZookeeperLeaderElector$LeaderChangeListener$$anonfun$handleDataDeleted$1.apply(ZookeeperLeaderElector.scala:141)
> at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:262)
> at 
> kafka.server.ZookeeperLeaderElector$LeaderChangeListener.handleDataDeleted(ZookeeperLeaderElector.scala:141)
> at org.I0Itec.zkclient.ZkClient$9.run(ZkClient.java:823)
> at org.I0Itec.zkclient.ZkEventThread.run(ZkEventThread.java:71)
> Caused by: org.apache.zookeeper.KeeperException$SessionExpiredException: 
> KeeperErrorCode = Session expired
> at 
> org.apache.zookeeper.KeeperException.create(KeeperException.java:127)
> ... 9 more
> [2016-04-18 06:05:57,420] INFO Creating /controller (is it secure? false) 
> (kafka.utils.ZKCheckedEphemeral)
> [2016-04-18 06:05:57,424] INFO Result of znode creation is: OK 
> (kafka.utils.ZKCheckedEphemeral)
> [2016-04-18 06:05:57,425] INFO 4 successfully elected as leader 
> (kafka.server.ZookeeperLeaderElector)
> [2016-04-18 06:05:57,885] INFO [ReplicaFetcherManager on broker 4] Removed 
> fetcher for partitions 
> [__consumer_offsets,32],[__consumer_offsets,44],[cdrrecords-errors,1],[cdrrecords,0],[__consumer_offsets,38],[__consumer_offsets,8],[events
> ,2],[__consumer_offsets,20],[__consumer_offsets,2],[__consumer_offsets,14],[__consumer_offsets,26]
>  (kafka.server.ReplicaFetcherManager)
> [2016-04-18 06:05:57,892] INFO [ReplicaFetcherManager on broker 4] Removed 
> fetcher for partitions 
> [__consumer_offsets,35],[__consumer_offsets,23],[__consumer_offsets,47],[__consumer_offsets,11],[__consumer_offsets,5],[events-errors,2],[_
> _consumer_offsets,17],[__consumer_offsets,41],[__consumer_offsets,29] 
> (kafka.server.ReplicaFetcherManager)
> [2016-04-18 06:05:57,894] INFO Truncating log __consumer_offsets-17 to offset 
> 0. (kafka.log.Log)
> [2016-04-18 06:05:57,894] INFO Truncating log __consumer_offsets-23 to offset 
> 0. (kafka.log.Log)
> [2016-04-18 06:05:57,894] INFO Truncating log __consumer_offsets-29 to offset 
> 0. (kafka.log.Log)
> [2016-04-18 06:05:57,895] INFO Truncating log __consumer_offsets-35 to offset 
> 0. (kafka.log.Log)
> [2016-04-18 06:05:57,895] INFO Truncating log __consumer_offsets-41 to offset 
> 0. (kafka.log.Log)
> [2016-04-18 06:05:57,895] INFO Truncating log events-errors-2 to offset 0. 
> (kafka.log.Log)
> [2016-04-18 06:05:57,895] INFO Truncating log __consumer_offsets-5 to offset 
> 0. (kafka.log.Log)
> [2016-04-18 06:05:57,895] INFO Truncating log __consumer_offsets-11 to offset 
> 0. (kafka.log

[jira] [Resolved] (KAFKA-3094) Kafka process 100% CPU when no message in topic

2018-08-03 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-3094?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang resolved KAFKA-3094.
---
Resolution: Cannot Reproduce

> Kafka process 100% CPU when no message in topic
> ---
>
> Key: KAFKA-3094
> URL: https://issues.apache.org/jira/browse/KAFKA-3094
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.0
>Reporter: Omar AL Zabir
>Priority: Major
>
> When there's no message in a kafka topic and it is not getting any traffic 
> for some time, all the kafka nodes go 100% CPU. 
> As soon as I post a message, the CPU comes back to normal. 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


[jira] [Resolved] (KAFKA-4101) java.lang.IllegalStateException in org.apache.kafka.common.network.Selector.channelOrFail

2018-08-02 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-4101?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang resolved KAFKA-4101.
---
Resolution: Duplicate

> java.lang.IllegalStateException in 
> org.apache.kafka.common.network.Selector.channelOrFail
> -
>
> Key: KAFKA-4101
> URL: https://issues.apache.org/jira/browse/KAFKA-4101
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 0.9.0.1
> Environment: Ubuntu 14.04, AWS deployment, under heavy network load
>Reporter: Andrey Savov
>Priority: Major
>
> {code}
>  at org.apache.kafka.common.network.Selector.channelOrFail(Selector.java:467)
> at org.apache.kafka.common.network.Selector.mute(Selector.java:347)
> at 
> kafka.network.Processor$$anonfun$run$11.apply(SocketServer.scala:434)
> at 
> kafka.network.Processor$$anonfun$run$11.apply(SocketServer.scala:421)
> at scala.collection.Iterator$class.foreach(Iterator.scala:742)
> at scala.collection.AbstractIterator.foreach(Iterator.scala:1194)
> at scala.collection.IterableLike$class.foreach(IterableLike.scala:72)
> at scala.collection.AbstractIterable.foreach(Iterable.scala:54)
> at kafka.network.Processor.run(SocketServer.scala:421)
> at java.lang.Thread.run(Thread.java:745)
> {code}



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: Discussion: New components in JIRA?

2018-08-02 Thread Ray Chiang

Great.  Thanks!

-Ray

On 8/2/18 12:28 PM, Guozhang Wang wrote:

Hello Ray,

I've added these two components. People should be able to use them creating
/ updating the JIRAs now.


Guozhang

On Wed, Aug 1, 2018 at 12:56 PM, Ray Chiang  wrote:


I haven't seen any comments.  Let me know if/when you add the new
components.  Thanks.

-Ray



On 7/27/18 9:54 PM, Guozhang Wang wrote:


Hello Ray,

Any PMC member of the project can add more components in the JIRA system.
If there is no objection in the next 72 hours I can just go ahead and add
them.


Guozhang


On Thu, Jul 26, 2018 at 1:50 PM, Ray Chiang  wrote:

Thanks Guozhang.  I'm good with the way the documentation is now.

Is there any other procedure to follow to get "logging" and "mirrormaker"
added as components or can we just request a JIRA admin to do that on
this
list?

-Ray


On 7/23/18 4:56 PM, Guozhang Wang wrote:

I've just updated the web docs on http://kafka.apache.org/contributing

accordingly.

On Mon, Jul 23, 2018 at 3:30 PM, khaireddine Rezgui <
khaireddine...@gmail.com> wrote:

Good job Ray for the wiki, it's clear enough.


Le 23 juil. 2018 10:17 PM, "Ray Chiang"  a écrit :

Okay, I've created a wiki page Reporting Issues in Apache Kafka
<
https://cwiki.apache.org/confluence/display/KAFKA/
Reporting+Issues+in+Apache+Kafka>.

I'd appreciate any feedback.  If this is good enough, I can file a JIRA
to change the link under "Bugs" in the "Project information" page.


-Ray


On 7/23/18 11:28 AM, Ray Chiang wrote:

Good point.  I'll look into adding some JIRA guidelines to the

documentation/wiki.

-Ray

On 7/22/18 10:23 AM, Guozhang Wang wrote:

Hello Ray,

Thanks for brining this up. I'm generally +1 on the first two, while
for
the last category, personally I felt leaving them as part of `tools`
is
fine, but I'm also open for other opinions.

A more general question though, is that today we do not have any
guidelines
to ask JIRA reporters to set the right component, i.e. it is purely
best-effort, and we cannot disallow reporters to add any new
component
names. And so far the project does not really have a tradition to
manage
JIRA reports per-component, as the goal is to not "separate" the
project
into silos but recommending everyone to get hands on every aspect of
the
project.


Guozhang


On Fri, Jul 20, 2018 at 2:44 PM, Ray Chiang 
wrote:

I've been doing a little bit of component cleanup in JIRA.  What do


people
think of adding
one or more of the following components?

- logging: For any consumer/producer/broker logging (i.e. log4j).
This
should help disambiguate from the "log" component (i.e. Kafka
messages).

- mirrormaker: There are enough requests specific to MirrorMaker
that it
could be put into its own component.

- scripts: I'm a little more ambivalent about this one, but any of
the
bin/*.sh script fixes could belong in their own category.  I'm not
sure if
other people feel strongly for how the "tools" component should be
used
w.r.t. the run scripts.

Any thoughts?

-Ray










Re: [DISCUSS] KIP-346 - Limit blast radius of log compaction failure

2018-08-02 Thread Ray Chiang
One more thing occurred to me.  Should the configuration property be 
named "max.uncleanable.partitions.per.disk" instead?


-Ray


On 8/1/18 9:11 AM, Stanislav Kozlovski wrote:

Yes, good catch. Thank you, James!

Best,
Stanislav

On Wed, Aug 1, 2018 at 5:05 PM James Cheng  wrote:


Can you update the KIP to say what the default is for
max.uncleanable.partitions?

-James

Sent from my iPhone


On Jul 31, 2018, at 9:56 AM, Stanislav Kozlovski 

wrote:

Hey group,

I am planning on starting a voting thread tomorrow. Please do reply if

you

feel there is anything left to discuss.

Best,
Stanislav

On Fri, Jul 27, 2018 at 11:05 PM Stanislav Kozlovski <

stanis...@confluent.io>

wrote:


Hey, Ray

Thanks for pointing that out, it's fixed now

Best,
Stanislav


On Fri, Jul 27, 2018 at 9:43 PM Ray Chiang  wrote:

Thanks.  Can you fix the link in the "KIPs under discussion" table on
the main KIP landing page
<


https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#

?

I tried, but the Wiki won't let me.

-Ray


On 7/26/18 2:01 PM, Stanislav Kozlovski wrote:
Hey guys,

@Colin - good point. I added some sentences mentioning recent

improvements

in the introductory section.

*Disk Failure* - I tend to agree with what Colin said - once a disk

fails,

you don't want to work with it again. As such, I've changed my mind

and

believe that we should mark the LogDir (assume its a disk) as offline

on

the first `IOException` encountered. This is the LogCleaner's current
behavior. We shouldn't change that.

*Respawning Threads* - I believe we should never re-spawn a thread.

The

correct approach in my mind is to either have it stay dead or never

let

it

die in the first place.

*Uncleanable-partition-names metric* - Colin is right, this metric is
unneeded. Users can monitor the `uncleanable-partitions-count` metric

and

inspect logs.


Hey Ray,


2) I'm 100% with James in agreement with setting up the LogCleaner to
skip over problematic partitions instead of dying.

I think we can do this for every exception that isn't `IOException`.

This

will future-proof us against bugs in the system and potential other

errors.

Protecting yourself against unexpected failures is always a good thing

in

my mind, but I also think that protecting yourself against bugs in the
software is sort of clunky. What does everybody think about this?


4) The only improvement I can think of is that if such an
error occurs, then have the option (configuration setting?) to

create a

.skip file (or something similar).

This is a good suggestion. Have others also seen corruption be

generally

tied to the same segment?

On Wed, Jul 25, 2018 at 11:55 AM Dhruvil Shah 

wrote:

For the cleaner thread specifically, I do not think respawning will

help at

all because we are more than likely to run into the same issue again

which

would end up crashing the cleaner. Retrying makes sense for transient
errors or when you believe some part of the system could have healed
itself, both of which I think are not true for the log cleaner.

On Wed, Jul 25, 2018 at 11:08 AM Ron Dagostino 

wrote:

<<
in

an

infinite loop which consumes resources and fires off continuous log
messages.
Hi Colin.  In case it could be relevant, one way to mitigate this

effect

is

to implement a backoff mechanism (if a second respawn is to occur

then

wait

for 1 minute before doing it; then if a third respawn is to occur

wait

for

2 minutes before doing it; then 4 minutes, 8 minutes, etc. up to

some

max

wait time).

I have no opinion on whether respawn is appropriate or not in this

context,

but a mitigation like the increasing backoff described above may be
relevant in weighing the pros and cons.

Ron

On Wed, Jul 25, 2018 at 1:26 PM Colin McCabe 

wrote:

On Mon, Jul 23, 2018, at 23:20, James Cheng wrote:
Hi Stanislav! Thanks for this KIP!

I agree that it would be good if the LogCleaner were more tolerant

of

errors. Currently, as you said, once it dies, it stays dead.

Things are better now than they used to be. We have the metric
   kafka.log:type=LogCleanerManager,name=time-since-last-run-ms
which we can use to tell us if the threads are dead. And as of

1.1.0,

we

have KIP-226, which allows you to restart the log cleaner thread,
without requiring a broker restart.


https://cwiki.apache.org/confluence/display/KAFKA/KIP-226+-+Dynamic+Broker+Configuration

<

https://cwiki.apache.org/confluence/display/KAFKA/KIP-226+-+Dynamic+Broker+Configuration

I've only read about this, I haven't personally tried it.

Thanks for pointing this out, James!  Stanislav, we should probably

add a

sentence or two mentioning the KIP-226 changes somewhere in the

KIP.

Maybe

in the intro section?

I think it's clear that requiring the users to manually restart the

log

cleaner is not a very good solution.  But it's good to know that

it's a

possibility on some older releases.


Some comments:
* I like the i

Re: [DISCUSS] KIP-346 - Limit blast radius of log compaction failure

2018-08-02 Thread Ray Chiang
I see this as a fix for the LogCleaner.  Uncaught exceptions kill the 
CleanerThread and this is viewed as undesired behavior.  Some other ways 
to think of this fix:


1) If you have occasional corruption in some log segments, then with 
each broker restart, the LogCleaner will lose its state, re-find all the 
corrupted log segments, and will skip them in future runs.  In these 
cases, you will see a non-zero value for uncleanable-partitions-count 
and look in the broker logs to see if this is fixable in some way or it 
will decrement once the log segment is no longer retained.


2) If you have increasing disk corruption, then this is a way to 
potentially catch increasing corruption.  It's not a perfect approach, 
but as we've discussed before, hard drive failures tend to cascade.  
This is a useful side effect of LogCleaning.


3) If you have a non-zero uncleanable-partitions-count, you can look in 
the logs, compare the replicated partitions across brokers, use 
DumpLogSegments to possibly find/fix/delete the corrupted record(s).  
Just from the cases I've seen, this type of corruption is fixable 
roughly 30% of the time.


-Ray


On 8/1/18 11:35 AM, James Cheng wrote:

I’m a little confused about something. Is this KIP focused on log cleaner 
exceptions in general, or focused on log cleaner exceptions due to disk 
failures?

Will max.uncleanable.partitions apply to all exceptions (including log cleaner 
logic errors) or will it apply to only disk I/o exceptions?

I can understand taking the disk offline if there have been “N” I/O exceptions. 
Disk errors are user fixable (by replacing the affected disk). It turns an 
invisible (soft?) failure into a visible hard failure. And the I/O exceptions 
are possibly already causing problems, so it makes sense to limit their impact.

But I’m not sure if it makes sense to take a disk offline after”N” logic errors 
in the log cleaner. If a log cleaner logic error happens, it’s rarely user 
fixable. And it will likely several partitions at once, so you’re likely to 
bump up against the max.uncleanable.partitions limit more quickly. If a disk 
was taken due to logic errors, I’m not sure what the user would do.

-James

Sent from my iPhone


On Aug 1, 2018, at 9:11 AM, Stanislav Kozlovski  wrote:

Yes, good catch. Thank you, James!

Best,
Stanislav


On Wed, Aug 1, 2018 at 5:05 PM James Cheng  wrote:

Can you update the KIP to say what the default is for
max.uncleanable.partitions?

-James

Sent from my iPhone


On Jul 31, 2018, at 9:56 AM, Stanislav Kozlovski 

wrote:

Hey group,

I am planning on starting a voting thread tomorrow. Please do reply if

you

feel there is anything left to discuss.

Best,
Stanislav

On Fri, Jul 27, 2018 at 11:05 PM Stanislav Kozlovski <

stanis...@confluent.io>

wrote:


Hey, Ray

Thanks for pointing that out, it's fixed now

Best,
Stanislav


On Fri, Jul 27, 2018 at 9:43 PM Ray Chiang  wrote:

Thanks.  Can you fix the link in the "KIPs under discussion" table on
the main KIP landing page
<


https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#

?

I tried, but the Wiki won't let me.

-Ray


On 7/26/18 2:01 PM, Stanislav Kozlovski wrote:
Hey guys,

@Colin - good point. I added some sentences mentioning recent

improvements

in the introductory section.

*Disk Failure* - I tend to agree with what Colin said - once a disk

fails,

you don't want to work with it again. As such, I've changed my mind

and

believe that we should mark the LogDir (assume its a disk) as offline

on

the first `IOException` encountered. This is the LogCleaner's current
behavior. We shouldn't change that.

*Respawning Threads* - I believe we should never re-spawn a thread.

The

correct approach in my mind is to either have it stay dead or never

let

it

die in the first place.

*Uncleanable-partition-names metric* - Colin is right, this metric is
unneeded. Users can monitor the `uncleanable-partitions-count` metric

and

inspect logs.


Hey Ray,


2) I'm 100% with James in agreement with setting up the LogCleaner to
skip over problematic partitions instead of dying.

I think we can do this for every exception that isn't `IOException`.

This

will future-proof us against bugs in the system and potential other

errors.

Protecting yourself against unexpected failures is always a good thing

in

my mind, but I also think that protecting yourself against bugs in the
software is sort of clunky. What does everybody think about this?


4) The only improvement I can think of is that if such an
error occurs, then have the option (configuration setting?) to

create a

.skip file (or something similar).

This is a good suggestion. Have others also seen corruption be

generally

tied to the same segment?

On Wed, Jul 25, 2018 at 11:55 AM Dhruvil Shah 

wrote:

For the cleaner thread specifically, I do not think respawning will

help at

all because we are more than likely to run into the same issue again

which

wo

Re: Discussion: New components in JIRA?

2018-08-01 Thread Ray Chiang
I haven't seen any comments.  Let me know if/when you add the new 
components.  Thanks.


-Ray


On 7/27/18 9:54 PM, Guozhang Wang wrote:

Hello Ray,

Any PMC member of the project can add more components in the JIRA system.
If there is no objection in the next 72 hours I can just go ahead and add
them.


Guozhang


On Thu, Jul 26, 2018 at 1:50 PM, Ray Chiang  wrote:


Thanks Guozhang.  I'm good with the way the documentation is now.

Is there any other procedure to follow to get "logging" and "mirrormaker"
added as components or can we just request a JIRA admin to do that on this
list?

-Ray


On 7/23/18 4:56 PM, Guozhang Wang wrote:


I've just updated the web docs on http://kafka.apache.org/contributing
accordingly.

On Mon, Jul 23, 2018 at 3:30 PM, khaireddine Rezgui <
khaireddine...@gmail.com> wrote:

Good job Ray for the wiki, it's clear enough.

Le 23 juil. 2018 10:17 PM, "Ray Chiang"  a écrit :

Okay, I've created a wiki page Reporting Issues in Apache Kafka
<
https://cwiki.apache.org/confluence/display/KAFKA/
Reporting+Issues+in+Apache+Kafka>.

I'd appreciate any feedback.  If this is good enough, I can file a JIRA
to change the link under "Bugs" in the "Project information" page.


-Ray


On 7/23/18 11:28 AM, Ray Chiang wrote:


Good point.  I'll look into adding some JIRA guidelines to the
documentation/wiki.

-Ray

On 7/22/18 10:23 AM, Guozhang Wang wrote:


Hello Ray,

Thanks for brining this up. I'm generally +1 on the first two, while
for
the last category, personally I felt leaving them as part of `tools` is
fine, but I'm also open for other opinions.

A more general question though, is that today we do not have any
guidelines
to ask JIRA reporters to set the right component, i.e. it is purely
best-effort, and we cannot disallow reporters to add any new component
names. And so far the project does not really have a tradition to
manage
JIRA reports per-component, as the goal is to not "separate" the
project
into silos but recommending everyone to get hands on every aspect of
the
project.


Guozhang


On Fri, Jul 20, 2018 at 2:44 PM, Ray Chiang 
wrote:

I've been doing a little bit of component cleanup in JIRA.  What do

people
think of adding
one or more of the following components?

- logging: For any consumer/producer/broker logging (i.e. log4j). This
should help disambiguate from the "log" component (i.e. Kafka
messages).

- mirrormaker: There are enough requests specific to MirrorMaker
that it
could be put into its own component.

- scripts: I'm a little more ambivalent about this one, but any of the
bin/*.sh script fixes could belong in their own category.  I'm not
sure if
other people feel strongly for how the "tools" component should be
used
w.r.t. the run scripts.

Any thoughts?

-Ray











Re: [DISCUSS] KIP-346 - Limit blast radius of log compaction failure

2018-07-31 Thread Ray Chiang
I had one question that I was trying to do some investigation before I 
asked, but I'm having some issues with my JMX browser right now.


 * For the uncleanable-partitions-count metric, is that going to be
   per-logDir entry?
 * For max.uncleanable.partitions, is the intention to have -1 be
   "infinite" or are we going to use Int.MaxValue as a practical
   equivalent?
 * In this sentence: "When evaluating which logs to compact, skip the
   marked ones.", should we define what "marking" will be?  If we're
   going with the ".skip" file or equivalent, can we also add how
   successful retries will behave?

-Ray

On 7/31/18 9:56 AM, Stanislav Kozlovski wrote:

Hey group,

I am planning on starting a voting thread tomorrow. Please do reply if you
feel there is anything left to discuss.

Best,
Stanislav

On Fri, Jul 27, 2018 at 11:05 PM Stanislav Kozlovski 
wrote:


Hey, Ray

Thanks for pointing that out, it's fixed now

Best,
Stanislav

On Fri, Jul 27, 2018 at 9:43 PM Ray Chiang  wrote:


Thanks.  Can you fix the link in the "KIPs under discussion" table on
the main KIP landing page
<
https://cwiki.apache.org/confluence/display/KAFKA/Kafka+Improvement+Proposals#>?

I tried, but the Wiki won't let me.

-Ray

On 7/26/18 2:01 PM, Stanislav Kozlovski wrote:

Hey guys,

@Colin - good point. I added some sentences mentioning recent

improvements

in the introductory section.

*Disk Failure* - I tend to agree with what Colin said - once a disk

fails,

you don't want to work with it again. As such, I've changed my mind and
believe that we should mark the LogDir (assume its a disk) as offline on
the first `IOException` encountered. This is the LogCleaner's current
behavior. We shouldn't change that.

*Respawning Threads* - I believe we should never re-spawn a thread. The
correct approach in my mind is to either have it stay dead or never let

it

die in the first place.

*Uncleanable-partition-names metric* - Colin is right, this metric is
unneeded. Users can monitor the `uncleanable-partitions-count` metric

and

inspect logs.


Hey Ray,


2) I'm 100% with James in agreement with setting up the LogCleaner to
skip over problematic partitions instead of dying.

I think we can do this for every exception that isn't `IOException`.

This

will future-proof us against bugs in the system and potential other

errors.

Protecting yourself against unexpected failures is always a good thing

in

my mind, but I also think that protecting yourself against bugs in the
software is sort of clunky. What does everybody think about this?


4) The only improvement I can think of is that if such an
error occurs, then have the option (configuration setting?) to create a
.skip file (or something similar).

This is a good suggestion. Have others also seen corruption be generally
tied to the same segment?

On Wed, Jul 25, 2018 at 11:55 AM Dhruvil Shah 

wrote:

For the cleaner thread specifically, I do not think respawning will

help at

all because we are more than likely to run into the same issue again

which

would end up crashing the cleaner. Retrying makes sense for transient
errors or when you believe some part of the system could have healed
itself, both of which I think are not true for the log cleaner.

On Wed, Jul 25, 2018 at 11:08 AM Ron Dagostino 

wrote:

<<
in

an

infinite loop which consumes resources and fires off continuous log
messages.
Hi Colin.  In case it could be relevant, one way to mitigate this

effect

is

to implement a backoff mechanism (if a second respawn is to occur then

wait

for 1 minute before doing it; then if a third respawn is to occur wait

for

2 minutes before doing it; then 4 minutes, 8 minutes, etc. up to some

max

wait time).

I have no opinion on whether respawn is appropriate or not in this

context,

but a mitigation like the increasing backoff described above may be
relevant in weighing the pros and cons.

Ron

On Wed, Jul 25, 2018 at 1:26 PM Colin McCabe 

wrote:

On Mon, Jul 23, 2018, at 23:20, James Cheng wrote:

Hi Stanislav! Thanks for this KIP!

I agree that it would be good if the LogCleaner were more tolerant

of

errors. Currently, as you said, once it dies, it stays dead.

Things are better now than they used to be. We have the metric
kafka.log:type=LogCleanerManager,name=time-since-last-run-ms
which we can use to tell us if the threads are dead. And as of

1.1.0,

we

have KIP-226, which allows you to restart the log cleaner thread,
without requiring a broker restart.


https://cwiki.apache.org/confluence/display/KAFKA/KIP-226+-+Dynamic+Broker+Configuration

<

https://cwiki.apache.org/confluence/display/KAFKA/KIP-226+-+Dynamic+Broker+Configuration

I've only read about this, I haven't personally tried it.

Thanks for pointing this out, James!  Stanislav, we should probably

add a

sentence or two mentioning the KIP-226 changes somewhere in the KIP.

Maybe

in the intro section?

I think it's cle

[DISCUSS] Applying scalafmt to core code

2018-07-30 Thread Ray Chiang
I had started on KAFKA-2423 (was Scalastyle, now Expand scalafmt to 
core).  As part of the cleanup, applying the "gradlew spotlessApply" 
command ended up affecting too many (435 out of 439) files.  Since this 
will affect every file, this sort of change does risk polluting the git 
logs.


So, I'd like to get a discussion going to find some agreement on an 
approach.  Right now, I see two categories of options:


A) Getting scalafmt working on the existing code
B) Getting all the code conforming to scalafmt requirements

For the first, I see a couple of approaches:

A1) Do the minimum change that allows scalafmt to run on all the .scala 
files
A2) Make the change so that scalafmt runs as-is (only on the streams 
code) and add a different task/options that allow running scalafmt on a 
subset of code.  (Reasons explained below)


For the second, I can think of the following options:

B1) Do one giant git commit of all cleaned code (no one seemed to like this)
B2) Do git commits one file at a time (trunk or as a branch)
B3) Do git commits one leaf subdirectory at a time (trunk or as a branch)
B4) With each pull request on all patches, run option A2) on the 
affected files


From what I can envision, options B2 and B3 require quite a bit of 
manual work if we want to cover multiple releases.  The "cleanest" 
option I can think of looks something like:


C1) Contributor makes code modifications for their JIRA
C2) Contributor runs option A2 to also apply scalafmt to their existing code
C3) Committer does the regular review process

At some point in the future, enough cleanup could be done that the final 
cleanup can be done as a much smaller set of MINOR commits.


-Ray



Re: [DISCUSS] KIP-346 - Limit blast radius of log compaction failure

2018-07-27 Thread Ray Chiang
Thanks.  Can you fix the link in the "KIPs under discussion" table on 
the main KIP landing page 
?  
I tried, but the Wiki won't let me.


-Ray

On 7/26/18 2:01 PM, Stanislav Kozlovski wrote:

Hey guys,

@Colin - good point. I added some sentences mentioning recent improvements
in the introductory section.

*Disk Failure* - I tend to agree with what Colin said - once a disk fails,
you don't want to work with it again. As such, I've changed my mind and
believe that we should mark the LogDir (assume its a disk) as offline on
the first `IOException` encountered. This is the LogCleaner's current
behavior. We shouldn't change that.

*Respawning Threads* - I believe we should never re-spawn a thread. The
correct approach in my mind is to either have it stay dead or never let it
die in the first place.

*Uncleanable-partition-names metric* - Colin is right, this metric is
unneeded. Users can monitor the `uncleanable-partitions-count` metric and
inspect logs.


Hey Ray,


2) I'm 100% with James in agreement with setting up the LogCleaner to
skip over problematic partitions instead of dying.

I think we can do this for every exception that isn't `IOException`. This
will future-proof us against bugs in the system and potential other errors.
Protecting yourself against unexpected failures is always a good thing in
my mind, but I also think that protecting yourself against bugs in the
software is sort of clunky. What does everybody think about this?


4) The only improvement I can think of is that if such an
error occurs, then have the option (configuration setting?) to create a
.skip file (or something similar).

This is a good suggestion. Have others also seen corruption be generally
tied to the same segment?

On Wed, Jul 25, 2018 at 11:55 AM Dhruvil Shah  wrote:


For the cleaner thread specifically, I do not think respawning will help at
all because we are more than likely to run into the same issue again which
would end up crashing the cleaner. Retrying makes sense for transient
errors or when you believe some part of the system could have healed
itself, both of which I think are not true for the log cleaner.

On Wed, Jul 25, 2018 at 11:08 AM Ron Dagostino  wrote:


<<
an

infinite loop which consumes resources and fires off continuous log
messages.
Hi Colin.  In case it could be relevant, one way to mitigate this effect

is

to implement a backoff mechanism (if a second respawn is to occur then

wait

for 1 minute before doing it; then if a third respawn is to occur wait

for

2 minutes before doing it; then 4 minutes, 8 minutes, etc. up to some max
wait time).

I have no opinion on whether respawn is appropriate or not in this

context,

but a mitigation like the increasing backoff described above may be
relevant in weighing the pros and cons.

Ron

On Wed, Jul 25, 2018 at 1:26 PM Colin McCabe  wrote:


On Mon, Jul 23, 2018, at 23:20, James Cheng wrote:

Hi Stanislav! Thanks for this KIP!

I agree that it would be good if the LogCleaner were more tolerant of
errors. Currently, as you said, once it dies, it stays dead.

Things are better now than they used to be. We have the metric
   kafka.log:type=LogCleanerManager,name=time-since-last-run-ms
which we can use to tell us if the threads are dead. And as of 1.1.0,

we

have KIP-226, which allows you to restart the log cleaner thread,
without requiring a broker restart.


https://cwiki.apache.org/confluence/display/KAFKA/KIP-226+-+Dynamic+Broker+Configuration

<

https://cwiki.apache.org/confluence/display/KAFKA/KIP-226+-+Dynamic+Broker+Configuration



I've only read about this, I haven't personally tried it.

Thanks for pointing this out, James!  Stanislav, we should probably

add a

sentence or two mentioning the KIP-226 changes somewhere in the KIP.

Maybe

in the intro section?

I think it's clear that requiring the users to manually restart the log
cleaner is not a very good solution.  But it's good to know that it's a
possibility on some older releases.


Some comments:
* I like the idea of having the log cleaner continue to clean as many
partitions as it can, skipping over the problematic ones if possible.

* If the log cleaner thread dies, I think it should automatically be
revived. Your KIP attempts to do that by catching exceptions during
execution, but I think we should go all the way and make sure that a

new

one gets created, if the thread ever dies.

This is inconsistent with the way the rest of Kafka works.  We don't
automatically re-create other threads in the broker if they terminate.

In

general, if there is a serious bug in the code, respawning threads is
likely to make things worse, by putting you in an infinite loop which
consumes resources and fires off continuous log messages.


* It might be worth trying to re-clean the uncleanable partitions.

I've

seen cases where an uncleanable partition later became cleanable. I
unfortunately don't remember how 

Re: Discussion: New components in JIRA?

2018-07-26 Thread Ray Chiang

Thanks Guozhang.  I'm good with the way the documentation is now.

Is there any other procedure to follow to get "logging" and 
"mirrormaker" added as components or can we just request a JIRA admin to 
do that on this list?


-Ray

On 7/23/18 4:56 PM, Guozhang Wang wrote:

I've just updated the web docs on http://kafka.apache.org/contributing
accordingly.

On Mon, Jul 23, 2018 at 3:30 PM, khaireddine Rezgui <
khaireddine...@gmail.com> wrote:


Good job Ray for the wiki, it's clear enough.

Le 23 juil. 2018 10:17 PM, "Ray Chiang"  a écrit :

Okay, I've created a wiki page Reporting Issues in Apache Kafka
<
https://cwiki.apache.org/confluence/display/KAFKA/
Reporting+Issues+in+Apache+Kafka>.

I'd appreciate any feedback.  If this is good enough, I can file a JIRA
to change the link under "Bugs" in the "Project information" page.


-Ray


On 7/23/18 11:28 AM, Ray Chiang wrote:

Good point.  I'll look into adding some JIRA guidelines to the
documentation/wiki.

-Ray

On 7/22/18 10:23 AM, Guozhang Wang wrote:

Hello Ray,

Thanks for brining this up. I'm generally +1 on the first two, while for
the last category, personally I felt leaving them as part of `tools` is
fine, but I'm also open for other opinions.

A more general question though, is that today we do not have any
guidelines
to ask JIRA reporters to set the right component, i.e. it is purely
best-effort, and we cannot disallow reporters to add any new component
names. And so far the project does not really have a tradition to manage
JIRA reports per-component, as the goal is to not "separate" the project
into silos but recommending everyone to get hands on every aspect of the
project.


Guozhang


On Fri, Jul 20, 2018 at 2:44 PM, Ray Chiang  wrote:


I've been doing a little bit of component cleanup in JIRA.  What do
people
think of adding
one or more of the following components?

- logging: For any consumer/producer/broker logging (i.e. log4j). This
should help disambiguate from the "log" component (i.e. Kafka
messages).

- mirrormaker: There are enough requests specific to MirrorMaker
that it
could be put into its own component.

- scripts: I'm a little more ambivalent about this one, but any of the
bin/*.sh script fixes could belong in their own category.  I'm not
sure if
other people feel strongly for how the "tools" component should be used
w.r.t. the run scripts.

Any thoughts?

-Ray









Re: [DISCUSS] KIP-346 - Limit blast radius of log compaction failure

2018-07-26 Thread Ray Chiang

Thanks for creating this KIP Stanislav.  My observations:

1) I agree with Colin that threads automatically re-launching threads 
generally isn't a great idea.  Metrics and/or monitoring threads are 
generally much safer.  And there's always the issue of what happens if 
the re-launcher dies?


2) I'm 100% with James in agreement with setting up the LogCleaner to 
skip over problematic partitions instead of dying.


3) There's a lot of "feature bloat" suggestions.  From how I see things, 
a message could get corrupted in one of several states:


3a) Message is corrupted by the leader partition saving to disk. 
Replicas have the same error.
3b) Message is corrupted by one of the replica partitions saving to 
disk.  Leader and other replica(s) unlikely to have the same error

3c) Disk corruption happens later (e.g. during partition move)

If we have the simplest solution, then all of the above will not cause 
the LogCleaner to crash and 3b/3c have a chance of manual recovery.


4) In most of the issues I'm seeing via work, most of the corruption 
seems persistent on the same log segment (i.e. a 3b/3c type of 
corruption).  The only improvement I can think of is that if such an 
error occurs, then have the option (configuration setting?) to create a 
.skip file (or something similar).  If the .skip file is 
there, don't re-scan the segment.  If you want a re-try or manage to fix 
the issue manually (e.g. copying from a replica), then the .skip file 
can be deleted after the segment is fixed and the LogCleaner will try 
again on the next iteration.


5) I'm in alignment with Colin's comment about hard drive failures. By 
the time you can reliably detect HDD hardware failures, it's less about 
improving the LogCleaner as much as that data needs to be moved to a new 
drive.


-Ray

On 7/25/18 11:55 AM, Dhruvil Shah wrote:

For the cleaner thread specifically, I do not think respawning will help at
all because we are more than likely to run into the same issue again which
would end up crashing the cleaner. Retrying makes sense for transient
errors or when you believe some part of the system could have healed
itself, both of which I think are not true for the log cleaner.

On Wed, Jul 25, 2018 at 11:08 AM Ron Dagostino  wrote:


<< wrote:


On Mon, Jul 23, 2018, at 23:20, James Cheng wrote:

Hi Stanislav! Thanks for this KIP!

I agree that it would be good if the LogCleaner were more tolerant of
errors. Currently, as you said, once it dies, it stays dead.

Things are better now than they used to be. We have the metric
   kafka.log:type=LogCleanerManager,name=time-since-last-run-ms
which we can use to tell us if the threads are dead. And as of 1.1.0,

we

have KIP-226, which allows you to restart the log cleaner thread,
without requiring a broker restart.


https://cwiki.apache.org/confluence/display/KAFKA/KIP-226+-+Dynamic+Broker+Configuration

<

https://cwiki.apache.org/confluence/display/KAFKA/KIP-226+-+Dynamic+Broker+Configuration



I've only read about this, I haven't personally tried it.

Thanks for pointing this out, James!  Stanislav, we should probably add a
sentence or two mentioning the KIP-226 changes somewhere in the KIP.

Maybe

in the intro section?

I think it's clear that requiring the users to manually restart the log
cleaner is not a very good solution.  But it's good to know that it's a
possibility on some older releases.


Some comments:
* I like the idea of having the log cleaner continue to clean as many
partitions as it can, skipping over the problematic ones if possible.

* If the log cleaner thread dies, I think it should automatically be
revived. Your KIP attempts to do that by catching exceptions during
execution, but I think we should go all the way and make sure that a

new

one gets created, if the thread ever dies.

This is inconsistent with the way the rest of Kafka works.  We don't
automatically re-create other threads in the broker if they terminate.

In

general, if there is a serious bug in the code, respawning threads is
likely to make things worse, by putting you in an infinite loop which
consumes resources and fires off continuous log messages.


* It might be worth trying to re-clean the uncleanable partitions. I've
seen cases where an uncleanable partition later became cleanable. I
unfortunately don't remember how that happened, but I remember being
surprised when I discovered it. It might have been something like a
follower was uncleanable but after a leader election happened, the log
truncated and it was then cleanable again. I'm not sure.

James, I disagree.  We had this behavior in the Hadoop Distributed File
System (HDFS) and it was a constant source of user problems.

What would happen is disks would just go bad over time.  The DataNode
would notice this and take them offline.  But then, due to some
"optimistic" code, the DataNode would periodically try to re-add them to
the system.  Then one of two things would happen: the disk would just

fail

immediately 

Re: Discussion: New components in JIRA?

2018-07-23 Thread Ray Chiang
Okay, I've created a wiki page Reporting Issues in Apache Kafka 
<https://cwiki.apache.org/confluence/display/KAFKA/Reporting+Issues+in+Apache+Kafka>.  
I'd appreciate any feedback.  If this is good enough, I can file a JIRA 
to change the link under "Bugs" in the "Project information" page.


-Ray

On 7/23/18 11:28 AM, Ray Chiang wrote:
Good point.  I'll look into adding some JIRA guidelines to the 
documentation/wiki.


-Ray

On 7/22/18 10:23 AM, Guozhang Wang wrote:

Hello Ray,

Thanks for brining this up. I'm generally +1 on the first two, while for
the last category, personally I felt leaving them as part of `tools` is
fine, but I'm also open for other opinions.

A more general question though, is that today we do not have any 
guidelines

to ask JIRA reporters to set the right component, i.e. it is purely
best-effort, and we cannot disallow reporters to add any new component
names. And so far the project does not really have a tradition to manage
JIRA reports per-component, as the goal is to not "separate" the project
into silos but recommending everyone to get hands on every aspect of the
project.


Guozhang


On Fri, Jul 20, 2018 at 2:44 PM, Ray Chiang  wrote:

I've been doing a little bit of component cleanup in JIRA.  What do 
people

think of adding
one or more of the following components?

- logging: For any consumer/producer/broker logging (i.e. log4j). This
should help disambiguate from the "log" component (i.e. Kafka 
messages).


- mirrormaker: There are enough requests specific to MirrorMaker 
that it

could be put into its own component.

- scripts: I'm a little more ambivalent about this one, but any of the
bin/*.sh script fixes could belong in their own category.  I'm not 
sure if

other people feel strongly for how the "tools" component should be used
w.r.t. the run scripts.

Any thoughts?

-Ray










Re: Discussion: New components in JIRA?

2018-07-23 Thread Ray Chiang
Good point.  I'll look into adding some JIRA guidelines to the 
documentation/wiki.


-Ray

On 7/22/18 10:23 AM, Guozhang Wang wrote:

Hello Ray,

Thanks for brining this up. I'm generally +1 on the first two, while for
the last category, personally I felt leaving them as part of `tools` is
fine, but I'm also open for other opinions.

A more general question though, is that today we do not have any guidelines
to ask JIRA reporters to set the right component, i.e. it is purely
best-effort, and we cannot disallow reporters to add any new component
names. And so far the project does not really have a tradition to manage
JIRA reports per-component, as the goal is to not "separate" the project
into silos but recommending everyone to get hands on every aspect of the
project.


Guozhang


On Fri, Jul 20, 2018 at 2:44 PM, Ray Chiang  wrote:


I've been doing a little bit of component cleanup in JIRA.  What do people
think of adding
one or more of the following components?

- logging: For any consumer/producer/broker logging (i.e. log4j). This
should help disambiguate from the "log" component (i.e. Kafka messages).

- mirrormaker: There are enough requests specific to MirrorMaker that it
could be put into its own component.

- scripts: I'm a little more ambivalent about this one, but any of the
bin/*.sh script fixes could belong in their own category.  I'm not sure if
other people feel strongly for how the "tools" component should be used
w.r.t. the run scripts.

Any thoughts?

-Ray








[jira] [Resolved] (KAFKA-6918) Kafka server fails to start with IBM JAVA

2018-07-23 Thread Ray Chiang (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6918?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Ray Chiang resolved KAFKA-6918.
---
   Resolution: Fixed
Fix Version/s: 1.1.1

> Kafka server fails to start with IBM JAVA
> -
>
> Key: KAFKA-6918
> URL: https://issues.apache.org/jira/browse/KAFKA-6918
> Project: Kafka
>  Issue Type: Bug
>Affects Versions: 1.0.0
>Reporter: Nayana Thorat
>Priority: Critical
> Fix For: 1.1.1
>
>
> Kafka server start fails with below error:
> bin/kafka-server-start.sh -daemon config/server.properties
> ERROR:
> (kafka.server.KafkaConfig)
>  FATAL  (kafka.Kafka$)
> java.lang.IllegalArgumentException: Signal already used by VM: INT
>     at 
> com.ibm.misc.SignalDispatcher.registerSignal(SignalDispatcher.java:127)
>     at sun.misc.Signal.handle(Signal.java:184)
>     at kafka.Kafka$.registerHandler$1(Kafka.scala:67)
>     at kafka.Kafka$.registerLoggingSignalHandler(Kafka.scala:74)
>     at kafka.Kafka$.main(Kafka.scala:85)
>     at kafka.Kafka.main(Kafka.scala)
>  
> Tried with binaries and well as built  Apache Kafka(v1.0.0) from source.
> Installed  IBM SDK on Ubuntu 16.04. 
> IBM java link:
> wget 
> http://public.dhe.ibm.com/ibmdl/export/pub/systems/cloud/runtimes/java/8.0.5.10/linux/x86_64/ibm-java-sdk-8.0-5.10-x86_64-archive.bin
>  
>  



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Discussion: New components in JIRA?

2018-07-20 Thread Ray Chiang
I've been doing a little bit of component cleanup in JIRA.  What do 
people think of adding

one or more of the following components?

- logging: For any consumer/producer/broker logging (i.e. log4j). This 
should help disambiguate from the "log" component (i.e. Kafka messages).


- mirrormaker: There are enough requests specific to MirrorMaker that it 
could be put into its own component.


- scripts: I'm a little more ambivalent about this one, but any of the 
bin/*.sh script fixes could belong in their own category.  I'm not sure 
if other people feel strongly for how the "tools" component should be 
used w.r.t. the run scripts.


Any thoughts?

-Ray



Re: [DISCUSS] KIP-253: Support in-order message delivery with partition expansion

2018-04-09 Thread Ray Chiang
My notes from today's meeting.  Sorry if I got anyone's name wrong. Plus 
I missed a few moments with noise at home and/or dropped video.


-Ray

=

KIP-253 Discussion

- Currently, adding partitions can cause keys to be read out-of-order.
  This KIP is trying to preserve the key ordering when adding partitions.

- State management in applications (i.e. Kafka Streams) can maintain
  local state via caching.  If the number of partitions changes, how
  would those applications update their local state.  This is the current
  point of discussion/disagreement.

- Jan Filipiak is mostly worried about log compacted topics.  Not as
  concerned about producer swapping.  Worried about the consumer design is
  a bit contradictory compared to the architecture.

  Current design is to start up a new consumer in parallel with old
  topic/consumer.  Run until consumer finishes "copying" to the new topic.
  Once the consumer is caught up, point the producer at the new topic.

  Would like to have this technique as a "core primitive" to Kafka.
  - Is this a useful primitive?
  - What's the best way to support it?

  - Topic expansion as it currently exists just "adds partitions". But
    how does this affect bootstrapping applications?  How to deal with
    "moved" (from "old partition" to "new expanded partition") keys?

  - Dong's proposal example.  10 partitions growing to 15.  5 of the
    first 10 partitions are split into 2 each.  Say Kafka remembers
    parent->child relationship.  Then for each parent partition, there
    are two child partitions.  Initially, there were 10 states to
    manage.  Then bootstrapping new application would have 15 states.
    Need to know which "generation" of partition you are consuming
    from.  Until you get to "newer" generation of data, then the keys
    will be find (i.e. reading from old partition).

  - Scheme works well for transient events.  Any stateful processor will
    likely break.

  - Tracking can become extremely complicated, since each split requires
    potentially more and more offset/partition combos.

  - Need to support continuation for consumers to read the new partitions.

  - With linear hashing, integral multiple increase (2x, 3x, 4x, etc).
    Easier mapping from old partition sets to new partition sets.
    Keys end up with a clean hierarchy instead of a major reshuffling.

  - Dong's approach piggyback on existing leader epoch.  Log segment
    could be tagged with version in linear hashing case.

  - In Jan's case, existing consumers bootstrap from the beginning.

  - James' use case.  Using Kafka as a long term persistent data store.
    Putting "source of truth" information into Kafka.  Bootstrap case
    is very important.  New applications could be bootstrapping as they
    come up.

    - Increasing partitions will help with load from prodcuer and
  increasing consumer parallelism.
    - How does Kinesis handling partition splits?  They don't have
  compacted logs, so no issue with bootstrapping.  Kinesis uses
  MD5 and splits results based on md5sum into bucket ranges.
    - Is it useful for the server to know the partitioning function?
  Consumer has some implicit assumptions about keyed partitions,
  but not strongly enforced on server side.

    - KIP-213 (one to many joins in Kafka Streams)

  - MySQL case.  Primary key forced to be used as Kafka key.

    (Sorry had some audio and video drop at this point)

  - Mirror Maker.  Source cluster has custom partitioning function.
    Producer won't duplicate to same partitioning setup as source.
    Need to provide same partitioning function to producer.
    Would need to determine partitioning function based on topic.

    - Should server validate partitioning?
    - Who does actual determination of which key goes to which 
partition.


  - How to implement backfill?

    - Who will do it?  In producer?  Hard to do.  Every client would
  need to add this functionality.  Better to do on server side.
    - Add a type of "copy consumer"?  Add backoff to producer?
  Benefit of doing in consumer vs. producer?

  - Still TBD
    - How to dedupe control messages?
    - How to deal with subtle cases during transition?
    - Is it useful for the server to have the option to validate the key
  distribution?
    - Jan concerned about how a consumer application would look with the
  new "split partition" design.
    - KIP introduced callback.  Jan doesn't think is useful. Callback
  for switching "between Partition 1 and can start on Partition 11".
  Rely on marker in Partition 1 instead.  Intent for callback is
  for possibility that delivery of messages for given key is moved
  to a different consumer instance.



On 4/6/18 9:44 AM, Dong Lin wrote:

Hey John,

Thanks much for your super-detailed explanation. This is very helpful.

Now that I have finished reading through your email, I think the 

Re: [DISCUSS] KIP-253: Support in-order message delivery with partition expansion

2018-04-06 Thread Ray Chiang
Hi Jun, please add me to the invitation as well.  If this is happening 
near Palo Alto, let me know if I can join in person. Thanks.


-Ray

On 4/4/18 1:34 PM, Jun Rao wrote:

Hi, Jan, Dong, John, Guozhang,

Perhaps it will be useful to have a KIP meeting to discuss this together as
a group. Would Apr. 9 (Monday) at 9:00am PDT work? If so, I will send out
an invite to the mailing list.

Thanks,

Jun


On Wed, Apr 4, 2018 at 1:25 AM, Jan Filipiak 
wrote:


Want to quickly step in here again because it is going places again.

The last part of the discussion is just a pain to read and completely
diverged from what I suggested without making the reasons clear to me.

I don't know why this happens here are my comments anyway.

@Guozhang: That Streams is working on automatic creating
copartition-usuable topics: great for streams, has literally nothing todo
with the KIP as we want to grow the
input topic. Everyone can reshuffle rel. easily but that is not what we
need todo, we need to grow the topic in question. After streams
automatically reshuffled the input topic
still has the same size and it didn't help a bit. I fail to see why this
is relevant. What am i missing here?

@Dong
I am still on the position that the current proposal brings us into the
wrong direction. Especially introducing PartitionKeyRebalanceListener
 From this point we can never move away to proper state full handling
without completely deprecating this creature from hell again.
Linear hashing is not the optimising step we have todo here. An interface
that when a topic is a topic its always the same even after it had
grown or shrunk is important. So from my POV I have major concerns that
this KIP is benefitial in its current state.

What is it that makes everyone so addicted to the idea of linear hashing?
not attractive at all for me.
And with statefull consumers still a complete mess. Why not stick with the
Kappa architecture???





On 03.04.2018 17:38, Dong Lin wrote:


Hey John,

Thanks much for your comments!!

I have yet to go through the emails of John/Jun/Guozhang in detail. But
let
me present my idea for how to minimize the delay for state loading for
stream use-case.

For ease of understanding, let's assume that the initial partition number
of input topics and change log topic are both 10. And initial number of
stream processor is also 10. If we only increase initial partition number
of input topics to 15 without changing number of stream processor, the
current KIP already guarantees in-order delivery and no state needs to be
moved between consumers for stream use-case. Next, let's say we want to
increase the number of processor to expand the processing capacity for
stream use-case. This requires us to move state between processors which
will take time. Our goal is to minimize the impact (i.e. delay) for
processing while we increase the number of processors.

Note that stream processor generally includes both consumer and producer.
In addition to consume from the input topic, consumer may also need to
consume from change log topic on startup for recovery. And producer may
produce state to the change log topic.



The solution will include the following steps:

1) Increase partition number of the input topic from 10 to 15. Since the
messages with the same key will still go to the same consume before and
after the partition expansion, this step can be done without having to
move
state between processors.

2) Increase partition number of the change log topic from 10 to 15. Note
that this step can also be done without impacting existing workflow. After
we increase partition number of the change log topic, key space may split
and some key will be produced to the newly-added partition. But the same
key will still go to the same processor (i.e. consumer) before and after
the partition. Thus this step can also be done without having to move
state
between processors.

3) Now, let's add 5 new consumers whose groupId is different from the
existing processor's groupId. Thus these new consumers will not impact
existing workflow. Each of these new consumers should consume two
partitions from the earliest offset, where these two partitions are the
same partitions that will be consumed if the consumers have the same
groupId as the existing processor's groupId. For example, the first of the
five consumers will consume partition 0 and partition 10. The purpose of
these consumers is to rebuild the state (e.g. RocksDB) for the processors
in advance. Also note that, by design of the current KIP, each consume
will
consume the existing partition of the change log topic up to the offset
before the partition expansion. Then they will only need to consume the
state of the new partition of the change log topic.

4) After consumers have caught up in step 3), we should stop these
consumers and add 5 new processors to the stream processing job. These 5
new processors should run in the same location as the previous 5 consumers
to 

Re: Wiki edit permission request

2018-04-06 Thread Ray Chiang

Got it.  Thanks.

-Ray

On 4/6/18 12:03 PM, Matthias J. Sax wrote:

Please keep it on the mailing list.

Wiki permissions granted.


-Matthias

On 4/6/18 11:58 AM, Ray Chiang wrote:

ID is rchiang.

-Ray

On 4/6/18 11:23 AM, Matthias J. Sax wrote:

What is your wiki ID?

-Matthias

On 4/6/18 10:54 AM, Ray Chiang wrote:

As best as I can tell, I currently don't have edit access.

-Ray





Wiki edit permission request

2018-04-06 Thread Ray Chiang

As best as I can tell, I currently don't have edit access.

-Ray



Question about developer documentation

2018-02-09 Thread Ray Chiang

There is some documentation for developers at:

  http://kafka.apache.org/project

There's also another set of links at the bottom of this wiki page:

  https://cwiki.apache.org/confluence/display/KAFKA/Index

There's some minor duplication of information, but it's definitely not 
quite presented in a clean "step by step" manner.


I think it could benefit from a reorganization of how the information is 
presented.  Before I start making suggestions, does anyone have any 
thoughts on the subject?


-Ray



Re: Eclipse or Intellij workspace setup

2018-02-07 Thread Ray Chiang
Here's what I did recently with OS X/IntelliJ.  I hadn't quite fleshed 
out all the instructions to put up an updated Wiki version yet:


- Use Homebrew to install gradle/scala@2.11 (use --with-docs option)/sbt
- git clone Kafka
- Run gradle/gradlew commands as documented in 
https://cwiki.apache.org/confluence/display/KAFKA/Developer+Setup

- Upgrade to latest IntelliJ
- Import Kafka into IntelliJ
  - Point to local Gradle command
- Set up pointer to Scala SDK (OSX: File->Other Settings->Default 
Project Structure)

  - Change Compiler Classpath jars to Homebrew directory
    - scala-compiler.jar
    - scala-library.jar
    - scala-reflect.jar
  - Change Standard Library jars to Homebrew directory
    - Classes
  - scala-library.jar
  - scala-library--sources.jar (TBD)
  - scala-reflect.jar
  - scala-reflect--sources.jar (TBD)
    - Javadoc
  - scala-library--javadoc.jar
  - scala-reflect--javadoc.jar
- Install IntelliJ Scala plugin

-Ray

On 2/7/18 11:06 PM, Ramesh Kolli wrote:

Hi Team,

I am newbie to Kafka dev forum, I would like to be part of kafka
development. I tried to setup Eclipse and Intellij workspace setup by
follow below link(
https://cwiki.apache.org/confluence/display/KAFKA/Developer+Setup). But, I
am not to setup compile successfully, after importing project, i am getting
lot of error.

Can someone please help me setup the workspace?


Regards,
Ramesh