Re: [DISCUSS] Why don't support web UI

2018-05-31 Thread ????????????
Hi Rahul:
  Thanks for your reply. The web UI doesn't need to be very beautiful.
  Hadoop HDFS and YARN's UIs are nice; maybe we can provide a web interface once
the core functionality is done.




------ Original ------
From: "Rahul Singh";
Date: Thu, May 31, 2018 6:23 PM
To: "dev";
Subject: Re: [DISCUSS] Why don't support web UI



My thoughts on why:

There are open source Web UIs available, Kafka Manager being one of them. Kafka 
is also a fairly new product. Even slightly mature products like Cassandra 
(which I'm also on the dev list for) focus on core features instead of user 
interfaces, because there are hundreds of ways to take the JMX APIs and either 
build your own tools or use something else readily created.

There are also commercial providers like Confluent and Landoop that do this as 
a packaged product.
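Rahul's point about the JMX APIs can be illustrated with a small sketch: any JMX client can enumerate MBeans and build tooling on top of them. This is an illustration only, not Kafka code; it queries the local platform MBean server so it runs standalone, where a real monitoring tool would connect to a broker's JMX port via `JMXConnectorFactory` and narrow the query to Kafka's metric domains.

```java
import java.lang.management.ManagementFactory;
import java.util.Set;
import javax.management.MBeanServer;
import javax.management.ObjectName;

public class JmxInventory {
    public static void main(String[] args) throws Exception {
        // A remote tool would use JMXConnectorFactory.connect(...) against the
        // broker's JMX port; the local platform server is used here so the
        // sketch runs without a broker.
        MBeanServer server = ManagementFactory.getPlatformMBeanServer();
        // "*:*" matches every registered MBean; against a broker you would
        // narrow this to a Kafka domain such as "kafka.server:*".
        Set<ObjectName> names = server.queryNames(new ObjectName("*:*"), null);
        System.out.println("MBeans visible: " + names.size());
    }
}
```

From a listing like this, a dashboard is mostly a matter of reading attributes off the interesting MBeans and rendering them.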

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation
On May 28, 2018, 10:58 PM -0400,  <1878707...@qq.com>, wrote:
> Hi team,
> I'm curious why we don't offer a web UI for administrators. The web UI
> is very useful to display cluster status and current topic status.

Re: [DISCUSS] KIP-307: Allow to define custom processor names with KStreams DSL

2018-05-31 Thread Guozhang Wang
Hi Florian,

Re 1: I think changing the KStreamImpl / KTableImpl to allow modifying the
processor name after the operator is fine, as long as we do the check again
when modifying it. In fact, we have some topology optimization work
going on which may modify processor names in the final topology anyway (
https://github.com/apache/kafka/pull/4983). Semantically I think it is
easier for developers to understand than "deciding the processor name for
the next operator".

Re 2: Yeah, I'm thinking that for operators that translate to multiple
processor names, we can still use the provided "hint" to name the
processors, e.g. for joins we can name them `join-foo-this` and
`join-foo-that` etc. if the user calls `as("foo")`.

Re 3: My motivation for removing the suffix is that it places huge
restrictions on topology compatibility: if user code adds a new
operator, or the library does some optimization that removes some operators,
the suffix indexing may change for a large number of the processor names.
This will in turn change the internal state store names, as well as the
internal topic names, making the new application topology
incompatible with the old one. One rationale I had about this KIP is that,
aligned with this effort, moving forward we can allow users to customize
internal names so that they can still be reused even with topology changes
(e.g. KIP-230), so I think removing the suffix index would be more
applicable in the long run.



Guozhang





On Thu, May 31, 2018 at 3:08 PM, Florian Hussonnois 
wrote:

> Hi,
> Thank you very much for your feedback.
>
> 1/
> I agree that overloading most of the methods with a Processed is not ideal.
> I've started modifying the KStream API and I came to the same conclusion.
> Also, adding a new method directly to the KStreamImpl and KTableImpl classes
> seems to be a better option.
>
> However, a processor name cannot be redefined after calling an operator (or
> maybe I missed something in the code).
> From my understanding, this will only set the KStream name property, not the
> processor name previously added to the topology builder - leading to an
> InvalidTopology exception.
>
> So the new method should actually define the name of the next processor.
> Below is an example:
>
> *stream.as(Processed.name("MAPPE_TO_UPPERCASE"))*
> *  .map( (k, v) -> KeyValue.pair(k, v.toUpperCase()))*
>
> I think this approach could solve the cases for methods returning void ?
>
> Regarding this new method, we have two possible implementations:
>
>    1. Adding a method like: withName(String processorName)
>    2. or adding a method accepting a Processed object: as(Processed).
>
> I think solution 2 is preferable, as the Processed class could be enriched
> further (in the future).
>
> 2/
> As Guozhang said, some operators add internal processors.
> For example, the branch() method creates one KStreamBranch processor to route
> records and one KStreamPassThrough processor for each branch.
> In that situation only the parent processor can be named. For child
> processors we could keep the current behaviour that adds a suffix (i.e.
> KSTREAM-BRANCHCHILD-)
>
> This is also the case for the join() method, which results in adding multiple
> processors to the topology (windowing, left/right joins and a merge
> processor).
> I think, as for the branch() method, users could only define a processor
> name prefix.
>
> 3/
> I think we should still add a suffix like "-00" to the processor
> name and enforce uppercase, as this will keep some consistency with the
> ones generated by the API.
>
> 4/
> Yes, the KTable interface should be modified like KStream to allow custom
> processor name definitions.
>
> Thanks,
>
>
> On Thu, May 31, 2018 at 7:18 PM, Damian Guy wrote:
>
> > Hi Florian,
> >
> > Thanks for the KIP. What about KTable and other DSL interfaces? Will they
> > not want to be able to do the same thing?
> > It would be good to see a complete set of the public API changes.
> >
> > Cheers,
> > Damian
> >
> > On Wed, 30 May 2018 at 19:45 Guozhang Wang  wrote:
> >
> > > Hello Florian,
> > >
> > > Thanks for the KIP. I have some meta feedbacks on the proposal:
> > >
> > > 1. You mentioned that this `Processed` object will be added to a new
> > > overloaded variant of all the stateless operators, what about the
> > stateful
> > > operators? Would like to hear your opinions if you have thought about
> > that:
> > > note for stateful operators they will usually be mapped to multiple
> > > processor node names, so we probably need to come up with some ways to
> > > define all their names.
> > >
> > > 2. I share the same concern with Bill as for adding lots of new
> overload
> > > functions into the stateless operators, as we have just spent quite
> some
> > > effort in trimming them since 1.0.0 release. If the goal is to just
> > provide
> > > some "hints" on the generated processor node names, not strictly
> > enforcing
> > > the exact names that to be generated, 

Re: [VOTE] KIP-266: Add TimeoutException for KafkaConsumer#position

2018-05-31 Thread Dong Lin
Thanks for the KIP! I am in favor of option 1.

+1 as well.

On Thu, May 31, 2018 at 6:00 PM, Jason Gustafson  wrote:

> Thanks everyone for the feedback. I've updated the KIP and added
> KAFKA-6979.
>
> -Jason
>
> On Wed, May 30, 2018 at 3:50 PM, Guozhang Wang  wrote:
>
> > Thanks Jason. I'm in favor of option 1 as well.
> >
> > On Wed, May 30, 2018 at 1:37 PM, Bill Bejeck  wrote:
> >
> > > For what it's worth I'm +1 on Option 1 and the default value for the
> > > timeout.
> > >
> > > In addition to reasons outlined above by Jason, I think it will help to
> > > reason about consumer behavior (with respect to blocking) having the
> > > configuration and default value aligned with the producer.
> > >
> > > -Bill
> > >
> > > On Wed, May 30, 2018 at 3:43 PM, Ismael Juma 
> wrote:
> > >
> > > > Sounds good to me,
> > > >
> > > > On Wed, May 30, 2018 at 12:40 PM Jason Gustafson  >
> > > > wrote:
> > > >
> > > > > Perhaps one minute? That is the default used by the producer.
> > > > >
> > > > > -Jason
> > > > >
> > > > > On Wed, May 30, 2018 at 9:50 AM, Ismael Juma 
> > > wrote:
> > > > >
> > > > > > Option 1 sounds good to me provided that we can come up with a
> good
> > > > > > default. What would you suggest?
> > > > > >
> > > > > > Ismael
> > > > > >
> > > > > > On Wed, May 30, 2018 at 9:41 AM Jason Gustafson <
> > ja...@confluent.io>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi Everyone,
> > > > > > >
> > > > > > > There remains some inconsistency in the timeout behavior of the
> > > > > consumer
> > > > > > > APIs which do not accept a timeout. Some of them block forever
> > > (e.g.
> > > > > > > position()) and some of them use request.timeout.ms (e.g.
> > > > > > > partitionsFor()).
> > > > > > > I think we'd probably all agree that blocking forever is not
> > useful
> > > > > > > behavior and using request.timeout.ms has always been a hack
> > since
> > > > it
> > > > > > > controls a separate concern. I think there are basically two
> > > options
> > > > to
> > > > > > > address this:
> > > > > > >
> > > > > > > 1. We can add max.block.ms to match the producer and use it as
> > the
> > > > > > default
> > > > > > > timeout when a timeout is not explicitly provided. This will
> fix
> > > the
> > > > > > > indefinite blocking behavior and avoid conflating
> > > request.timeout.ms
> > > > .
> > > > > > > 2. We can deprecate the methods which don't accept a timeout.
> > > > > > >
> > > > > > > I'm leaning toward the first solution because I think we want
> to
> > > push
> > > > > > users
> > > > > > > to specifying timeouts through configuration rather than in
> code
> > > > (Jay's
> > > > > > > original argument). I think the overloads are still useful for
> > > > advanced
> > > > > > > usage (e.g. in kafka streams), but we should give users an easy
> > > > option
> > > > > > with
> > > > > > > reasonable default behavior.
> > > > > > >
> > > > > > > If that sounds ok, I'd propose we add it to this KIP and fix it
> > > now.
> > > > > This
> > > > > > > gives users an easy way to get the benefit of the improvements
> > from
> > > > > this
> > > > > > > KIP without changing any code.
> > > > > > >
> > > > > > > Thanks,
> > > > > > > Jason
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > >
> > > > > > > On Sun, May 13, 2018 at 7:58 PM, Richard Yu <
> > > > > yohan.richard...@gmail.com>
> > > > > > > wrote:
> > > > > > >
> > > > > > > > Hi,
> > > > > > > >
> > > > > > > > With 3 binding votes and 6 non-binding, this KIP would be
> > > accepted.
> > > > > > > >
> > > > > > > > Thanks for participating.
> > > > > > > >
> > > > > > > > On Thu, May 10, 2018 at 2:35 AM, Edoardo Comar <
> > > edoco...@gmail.com
> > > > >
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > > +1 (non-binding)
> > > > > > > > >
> > > > > > > > > On 10 May 2018 at 10:29, zhenya Sun 
> wrote:
> > > > > > > > >
> > > > > > > > > > +1 non-binding
> > > > > > > > > >
> > > > > > > > > > > On May 10, 2018, at 5:19 PM, Manikumar <
> manikumar.re...@gmail.com
> > >
> > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > > +1 (non-binding).
> > > > > > > > > > > Thanks.
> > > > > > > > > > >
> > > > > > > > > > > On Thu, May 10, 2018 at 2:33 PM, Mickael Maison <
> > > > > > > > > > mickael.mai...@gmail.com>
> > > > > > > > > > > wrote:
> > > > > > > > > > >
> > > > > > > > > > >> +1 (non binding)
> > > > > > > > > > >> Thanks
> > > > > > > > > > >>
> > > > > > > > > > >> On Thu, May 10, 2018 at 9:39 AM, Rajini Sivaram <
> > > > > > > > > > rajinisiva...@gmail.com>
> > > > > > > > > > >> wrote:
> > > > > > > > > > >>> Hi Richard, Thanks for the KIP.
> > > > > > > > > > >>>
> > > > > > > > > > >>> +1 (binding)
> > > > > > > > > > >>>
> > > > > > > > > > >>> Regards,
> > > > > > > > > > >>>
> > > > > > > > > > >>> Rajini
> > > > > > > > > > >>>
> > > > > > > > > > >>> On Wed, May 9, 2018 at 10:54 PM, Guozhang Wang <
> > > > > > > wangg...@gmail.com
> > > > > > > > >
> > > > > > > > > > >> wrote:
> > > > > > > > > > >>>
> > > > > > > > > > 

Re: [VOTE] KIP-266: Add TimeoutException for KafkaConsumer#position

2018-05-31 Thread Jason Gustafson
Thanks everyone for the feedback. I've updated the KIP and added KAFKA-6979.

-Jason

On Wed, May 30, 2018 at 3:50 PM, Guozhang Wang  wrote:

> Thanks Jason. I'm in favor of option 1 as well.
>
> On Wed, May 30, 2018 at 1:37 PM, Bill Bejeck  wrote:
>
> > For what it's worth I'm +1 on Option 1 and the default value for the
> > timeout.
> >
> > In addition to reasons outlined above by Jason, I think it will help to
> > reason about consumer behavior (with respect to blocking) having the
> > configuration and default value aligned with the producer.
> >
> > -Bill
> >
> > On Wed, May 30, 2018 at 3:43 PM, Ismael Juma  wrote:
> >
> > > Sounds good to me,
> > >
> > > On Wed, May 30, 2018 at 12:40 PM Jason Gustafson 
> > > wrote:
> > >
> > > > Perhaps one minute? That is the default used by the producer.
> > > >
> > > > -Jason
> > > >
> > > > On Wed, May 30, 2018 at 9:50 AM, Ismael Juma 
> > wrote:
> > > >
> > > > > Option 1 sounds good to me provided that we can come up with a good
> > > > > default. What would you suggest?
> > > > >
> > > > > Ismael
> > > > >
> > > > > On Wed, May 30, 2018 at 9:41 AM Jason Gustafson <
> ja...@confluent.io>
> > > > > wrote:
> > > > >
> > > > > > Hi Everyone,
> > > > > >
> > > > > > There remains some inconsistency in the timeout behavior of the
> > > > consumer
> > > > > > APIs which do not accept a timeout. Some of them block forever
> > (e.g.
> > > > > > position()) and some of them use request.timeout.ms (e.g.
> > > > > > partitionsFor()).
> > > > > > I think we'd probably all agree that blocking forever is not
> useful
> > > > > > behavior and using request.timeout.ms has always been a hack
> since
> > > it
> > > > > > controls a separate concern. I think there are basically two
> > options
> > > to
> > > > > > address this:
> > > > > >
> > > > > > 1. We can add max.block.ms to match the producer and use it as
> the
> > > > > default
> > > > > > timeout when a timeout is not explicitly provided. This will fix
> > the
> > > > > > indefinite blocking behavior and avoid conflating
> > request.timeout.ms
> > > .
> > > > > > 2. We can deprecate the methods which don't accept a timeout.
> > > > > >
> > > > > > I'm leaning toward the first solution because I think we want to
> > push
> > > > > users
> > > > > > to specifying timeouts through configuration rather than in code
> > > (Jay's
> > > > > > original argument). I think the overloads are still useful for
> > > advanced
> > > > > > usage (e.g. in kafka streams), but we should give users an easy
> > > option
> > > > > with
> > > > > > reasonable default behavior.
> > > > > >
> > > > > > If that sounds ok, I'd propose we add it to this KIP and fix it
> > now.
> > > > This
> > > > > > gives users an easy way to get the benefit of the improvements
> from
> > > > this
> > > > > > KIP without changing any code.
> > > > > >
> > > > > > Thanks,
> > > > > > Jason
> > > > > >
> > > > > >
> > > > > >
> > > > > >
> > > > > > On Sun, May 13, 2018 at 7:58 PM, Richard Yu <
> > > > yohan.richard...@gmail.com>
> > > > > > wrote:
> > > > > >
> > > > > > > Hi,
> > > > > > >
> > > > > > > With 3 binding votes and 6 non-binding, this KIP would be
> > accepted.
> > > > > > >
> > > > > > > Thanks for participating.
> > > > > > >
> > > > > > > On Thu, May 10, 2018 at 2:35 AM, Edoardo Comar <
> > edoco...@gmail.com
> > > >
> > > > > > wrote:
> > > > > > >
> > > > > > > > +1 (non-binding)
> > > > > > > >
> > > > > > > > On 10 May 2018 at 10:29, zhenya Sun  wrote:
> > > > > > > >
> > > > > > > > > +1 non-binding
> > > > > > > > >
> > > > > > > > > > On May 10, 2018, at 5:19 PM, Manikumar  >
> > > wrote:
> > > > > > > > > >
> > > > > > > > > > +1 (non-binding).
> > > > > > > > > > Thanks.
> > > > > > > > > >
> > > > > > > > > > On Thu, May 10, 2018 at 2:33 PM, Mickael Maison <
> > > > > > > > > mickael.mai...@gmail.com>
> > > > > > > > > > wrote:
> > > > > > > > > >
> > > > > > > > > >> +1 (non binding)
> > > > > > > > > >> Thanks
> > > > > > > > > >>
> > > > > > > > > >> On Thu, May 10, 2018 at 9:39 AM, Rajini Sivaram <
> > > > > > > > > rajinisiva...@gmail.com>
> > > > > > > > > >> wrote:
> > > > > > > > > >>> Hi Richard, Thanks for the KIP.
> > > > > > > > > >>>
> > > > > > > > > >>> +1 (binding)
> > > > > > > > > >>>
> > > > > > > > > >>> Regards,
> > > > > > > > > >>>
> > > > > > > > > >>> Rajini
> > > > > > > > > >>>
> > > > > > > > > >>> On Wed, May 9, 2018 at 10:54 PM, Guozhang Wang <
> > > > > > wangg...@gmail.com
> > > > > > > >
> > > > > > > > > >> wrote:
> > > > > > > > > >>>
> > > > > > > > >  +1 from me, thanks!
> > > > > > > > > 
> > > > > > > > > 
> > > > > > > > >  Guozhang
> > > > > > > > > 
> > > > > > > > >  On Wed, May 9, 2018 at 10:46 AM, Jason Gustafson <
> > > > > > > > ja...@confluent.io>
> > > > > > > > >  wrote:
> > > > > > > > > 
> > > > > > > > > > Thanks for the KIP, +1 (binding).
> > > > > > > > > >
> > > > > > > > > > One small correction: the KIP mentions that close()
> > will
> > > be
> > 

[jira] [Created] (KAFKA-6979) Add max.block.ms to consumer for default behavior

2018-05-31 Thread Jason Gustafson (JIRA)
Jason Gustafson created KAFKA-6979:
--

 Summary: Add max.block.ms to consumer for default behavior
 Key: KAFKA-6979
 URL: https://issues.apache.org/jira/browse/KAFKA-6979
 Project: Kafka
  Issue Type: Improvement
Reporter: Jason Gustafson
Assignee: Jason Gustafson
 Fix For: 2.0.0


Implement max.block.ms as described in KIP-266: 
https://cwiki.apache.org/confluence/display/KAFKA/KIP-266%3A+Fix+consumer+indefinite+blocking+behavior.
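The general pattern the KIP describes, an explicit timeout overload plus a no-arg variant bounded by a configured default, can be sketched independently of the consumer. The class below is illustrative only (not the KafkaConsumer API); the config name `max.block.ms` and the one-minute default are the ones proposed in the vote thread above.

```java
import java.time.Duration;

// Illustration of option 1 from the KIP-266 discussion: blocking calls take
// an explicit timeout overload, and the no-arg variant falls back to a
// configured default instead of blocking forever. Hypothetical class, not
// actual Kafka client code.
class BlockingClient {
    private final Duration defaultMaxBlock;

    BlockingClient(long maxBlockMs) {
        // In the consumer this would come from the proposed "max.block.ms"
        // config, defaulting to one minute like the producer.
        this.defaultMaxBlock = Duration.ofMillis(maxBlockMs);
    }

    long position(Duration timeout) {
        // A real implementation would block up to 'timeout' waiting for
        // offset metadata and throw TimeoutException on expiry.
        return 0L;
    }

    long position() {
        // No-arg variant: bounded by the configured default rather than
        // blocking indefinitely or reusing request.timeout.ms.
        return position(defaultMaxBlock);
    }
}
```

This is why option 1 needs no code changes from users: the no-arg methods keep working, just with a bounded wait.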



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [VOTE] KIP-281: ConsumerPerformance: Increase Polling Loop Timeout and Make It Reachable by the End User

2018-05-31 Thread Jason Gustafson
I'm going to call this vote since it has enough binding votes and it has
been open for a long time. The final tally:

Binding +4: me, Ismael, Rajini, and Guozhang
Non-binding +4: Ted, Moshe, Alex, and Dhruvil

There were no -1 votes.

Thanks to Alexander Dunayevsky for the KIP!

Thanks,
Jason

On Fri, May 11, 2018 at 8:55 AM, Guozhang Wang  wrote:

> +1 (binding), thanks!
>
>
> Guozhang
>
>
> On Thu, May 10, 2018 at 4:41 PM, Ismael Juma  wrote:
>
> > Thanks for the KIP, +1 (binding). A few suggestions:
> >
> > 1. We normally include the time unit in configs. Not sure if we do it for
> > command line parameters though, so can we please verify and make it
> > consistent?
> > 2. The KIP mentions --polling-loop-timeout and --timeout. Which is it?
> > 3. Can we include the description of the new parameter in the KIP? In the
> > PR it says "Consumer polling loop timeout", but I think this is a bit
> > unclear. What are we actually measuring here?
> >
> > Ismael
> >
> > On Mon, Apr 16, 2018 at 2:25 PM Alex Dunayevsky 
> > wrote:
> >
> > > Hello friends,
> > >
> > > Let's start the vote for KIP-281: ConsumerPerformance: Increase Polling
> > > Loop Timeout and Make It Reachable by the End User:
> > >
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > 281%3A+ConsumerPerformance%3A+Increase+Polling+Loop+Timeout+
> > and+Make+It+Reachable+by+the+End+User
> > >
> > > Thank you,
> > > Alexander Dunayevsky
> > >
> >
>
>
>
> --
> -- Guozhang
>


Re: [DISCUSS] KIP-201: Rationalising Policy interfaces

2018-05-31 Thread Anna Povzner
Hi Tom,


Thanks for the KIP. I am aware that the voting thread was started, but
wanted to discuss a couple of concerns here first.


I think the coupling of RequestedTopicState#generatedReplicaAssignment()
and TopicState#replicasAssignments() does not work well in cases where the
request deals only with a subset of partitions (e.g., add partitions) or no
assignment at all (alter topic config). In particular:

1) Alter topic config use case: There is no replica assignment in the
request, so generatedReplicaAssignment() returning either true or false
would be misleading. The user could interpret this as the assignment being
generated or provided by the user originally (e.g., on topic create), while
I don't think we track such a thing.

2) On add partitions, we may have a manual assignment for new partitions.
From what I understood from the KIP, generatedReplicaAssignment() will return
true or false based on whether new partitions were manually assigned or
not, while TopicState#replicasAssignments() will return replica assignments
for all partitions. I think it is confusing that the assignment of old
partitions could be auto-generated while new partitions are manually assigned.


3) Generalizing #2, suppose in the future a user can re-assign replicas for
a set of partitions.


One way to address this with minimal changes to proposed API is to rename
RequestedTopicState#generatedReplicaAssignment() to
RequestedTopicState#manualReplicaAssignment() and change the API behavior
and description to : “True if the client explicitly provided replica
assignments in this request, which means that some or all assignments
returned by TopicState#replicasAssignments() are explicitly requested by
the user”. The user then will have to diff TopicState#replicasAssignments()
from clusterState and TopicState#replicasAssignments()  from
RequestedTopicState, and assume that assignments that are different are
manually assigned (if RequestedTopicState#manualReplicaAssignment()  returns
true). We will need to clearly document this and it still seems awkward.


I think a cleaner way is to make RequestedTopicState to provide replica
assignments only for partitions that were manually assigned replicas in the
request that is being validated. Similarly, for alter topic validation, it
would be nice to make it more clear for the user what has been changed. I
remember that you already raised that point earlier by comparing current
proposed API with having separate methods for each specific command.
However, I agree that it will make it harder to change the interface in the
future.


Could we explore the option of pushing methods that are currently in
TopicState to CreateTopicRequest and AlterTopicRequest? TopicState will
still be used for requesting current topic state via ClusterState.

Something like:

interface CreateTopicRequest extends AbstractRequestMetadata {

  // requested number of partitions or if manual assignment is given,
number of partitions in the assignment

  int numPartitions();

  // requested replication factor, or if manual assignment is given, number
of replicas in assignment for partition 0

  short replicationFactor();

 // replica assignment requested by the client, or null if assignment is
auto-generated

 Map<Integer, List<Integer>> manualReplicaAssignment();

 Map<String, String> configs();

}


interface AlterTopicRequest extends AbstractRequestMetadata {

  // updated topic configs, or null if not changed

  Map<String, String> updatedConfigs();

  // proposed replica assignment in this request, or null. For adding new
partitions request, this is proposed replica assignment for new partitions.
For replica re-assignment case, this is proposed new assignment.

  Map<Integer, List<Integer>> proposedReplicaAssignment();

  // new number of partitions (due to increase/decrease), or null if number
of partitions not changed

  Integer updatedNumPartitions();

}


I did not spend much time on my AlterTopicRequest interface proposal, but
the idea is basically to return only the parts which were changed. The
advantage of this approach over having separate methods for each specific
alter topic request is that it is more flexible for future mixing of what
can be updated in the topic state.
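To make the intent concrete, here is a rough sketch of how a policy might consume such an AlterTopicRequest. The interface below is a minimal local copy of the proposal above so the example is self-contained; none of this is an existing Kafka API, and the policy class is purely hypothetical.

```java
import java.util.List;
import java.util.Map;

// Minimal local copy of the proposed interface; not an existing Kafka API.
interface AlterTopicRequest {
    Map<String, String> updatedConfigs();                     // null if unchanged
    Map<Integer, List<Integer>> proposedReplicaAssignment();  // null if none
    Integer updatedNumPartitions();                           // null if unchanged
}

// Hypothetical policy: reject any proposed assignment with fewer than 2 replicas.
class MinReplicationPolicy {
    void validate(AlterTopicRequest request) {
        Map<Integer, List<Integer>> assignment = request.proposedReplicaAssignment();
        if (assignment == null) {
            return; // this request did not touch replica assignments
        }
        for (Map.Entry<Integer, List<Integer>> e : assignment.entrySet()) {
            if (e.getValue().size() < 2) {
                throw new IllegalStateException(
                    "Partition " + e.getKey() + " would have fewer than 2 replicas");
            }
        }
    }
}
```

Because only the changed parts are non-null, the policy never has to diff against ClusterState to work out what the client actually asked for.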


What do you think?


Thanks,

Anna


On Mon, Oct 9, 2017 at 1:39 AM, Tom Bentley  wrote:

> I've added RequestedTopicState, as discussed in my last email.
>
> I've also added a paragraph to the migration plan about old clients making
> policy-violating delete topic or delete records requests.
>
> If no further comments are forthcoming in the next day or two then I will
> start a vote.
>
> Thanks,
>
> Tom
>
> On 5 October 2017 at 12:41, Tom Bentley  wrote:
>
> > I'd like to raise a somewhat subtle point about how the proposed API
> > should behave.
> >
> > The current CreateTopicPolicy gets passed either the request partition
> > count and replication factor, or the requested assignment. So if the
> > request had specified partition count and replication factor, the policy
> > sees a null replicaAssignments(). Likewise if the 

[jira] [Created] (KAFKA-6978) Make Streams Window retention time strict

2018-05-31 Thread John Roesler (JIRA)
John Roesler created KAFKA-6978:
---

 Summary: Make Streams Window retention time strict
 Key: KAFKA-6978
 URL: https://issues.apache.org/jira/browse/KAFKA-6978
 Project: Kafka
  Issue Type: Improvement
  Components: streams
Reporter: John Roesler


Currently, the configured retention time for windows is a lower bound. We 
actually keep the window around until it's time to roll a new segment. At that 
time, we drop all windows in the oldest segment.

As long as a window is still in a segment, we will continue to add 
late-arriving records to it and also serve IQ queries from it. This is sort of 
nice, because it makes optimistic use of the fact that the windows live for 
some time after their retention expires. However, it is also a source of 
(apparent) non-determinism, and it's arguably better for programmability if we 
adhere strictly to the configured constraints.

Therefore, the new behavior will be:
 * once the retention time for a window passes, Streams will drop any 
later-arriving records (with a warning log and a metric)

 * likewise, IQ will first check whether the window is younger than its 
retention time before answering queries.

No changes need to be made to the underlying segment management; this is purely 
to make the behavior stricter with respect to the configuration.
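The strict check described above amounts to a single comparison against stream time. A hypothetical helper (not actual Streams code; the exact definition of retention relative to window end is an assumption here) that both the write path and IQ could share:

```java
// Hypothetical helper, not actual Streams code: a window accepts late
// records or serves IQ only while its end is within the retention period
// of the observed stream time.
class WindowRetentionCheck {
    static boolean withinRetention(long windowEndMs, long streamTimeMs, long retentionMs) {
        return streamTimeMs - windowEndMs <= retentionMs;
    }
}
```

On the write path a failed check would drop the record with a warning log and a metric; on the IQ path it would exclude the window from query results.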



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


Re: [DISCUSS] KIP-307: Allow to define custom processor names with KStreams DSL

2018-05-31 Thread Florian Hussonnois
Hi,
Thank you very much for your feedback.

1/
I agree that overloading most of the methods with a Processed is not ideal.
I've started modifying the KStream API and I came to the same conclusion.
Also, adding a new method directly to the KStreamImpl and KTableImpl classes
seems to be a better option.

However, a processor name cannot be redefined after calling an operator (or
maybe I missed something in the code).
From my understanding, this will only set the KStream name property, not the
processor name previously added to the topology builder - leading to an
InvalidTopology exception.

So the new method should actually define the name of the next processor.
Below is an example:

*stream.as(Processed.name("MAPPE_TO_UPPERCASE"))*
*  .map( (k, v) -> KeyValue.pair(k, v.toUpperCase()))*

I think this approach could solve the cases for methods returning void ?

Regarding this new method, we have two possible implementations:

   1. Adding a method like: withName(String processorName)
   2. or adding a method accepting a Processed object: as(Processed).

I think solution 2 is preferable, as the Processed class could be enriched
further (in the future).
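For readers following along, here is a minimal sketch of what such a Processed helper might look like. The class does not exist yet; the name and factory style are taken from the discussion (mirroring Produced/Consumed/Joined), everything else is an assumption.

```java
// Hypothetical sketch of the Processed helper discussed in this KIP; the
// actual class, if adopted, may differ.
public final class Processed {
    private final String processorName;

    private Processed(String processorName) {
        this.processorName = processorName;
    }

    // Static factory mirroring the style of Produced/Consumed/Joined.
    public static Processed name(String processorName) {
        return new Processed(processorName);
    }

    public String processorName() {
        return processorName;
    }
}
```

With such a class, `stream.as(Processed.name("MAPPE_TO_UPPERCASE")).map(...)` would carry the name hint to the next operator, as in the example above.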

2/
As Guozhang said, some operators add internal processors.
For example, the branch() method creates one KStreamBranch processor to route
records and one KStreamPassThrough processor for each branch.
In that situation only the parent processor can be named. For child
processors we could keep the current behaviour that adds a suffix (i.e.
KSTREAM-BRANCHCHILD-)

This is also the case for the join() method, which results in adding multiple
processors to the topology (windowing, left/right joins and a merge
processor).
I think, as for the branch() method, users could only define a processor
name prefix.

3/
I think we should still add a suffix like "-00" to the processor
name and enforce uppercase, as this will keep some consistency with the
ones generated by the API.

4/
Yes, the KTable interface should be modified like KStream to allow custom
processor name definitions.

Thanks,


On Thu, May 31, 2018 at 7:18 PM, Damian Guy wrote:

> Hi Florian,
>
> Thanks for the KIP. What about KTable and other DSL interfaces? Will they
> not want to be able to do the same thing?
> It would be good to see a complete set of the public API changes.
>
> Cheers,
> Damian
>
> On Wed, 30 May 2018 at 19:45 Guozhang Wang  wrote:
>
> > Hello Florian,
> >
> > Thanks for the KIP. I have some meta feedbacks on the proposal:
> >
> > 1. You mentioned that this `Processed` object will be added to a new
> > overloaded variant of all the stateless operators, what about the
> stateful
> > operators? Would like to hear your opinions if you have thought about
> that:
> > note for stateful operators they will usually be mapped to multiple
> > processor node names, so we probably need to come up with some ways to
> > define all their names.
> >
> > 2. I share the same concern with Bill as for adding lots of new overload
> > functions into the stateless operators, as we have just spent quite some
> > effort in trimming them since 1.0.0 release. If the goal is to just
> provide
> > some "hints" on the generated processor node names, not strictly
> enforcing
> > the exact names that to be generated, then how about we just add a new
> > function to `KStream` and `KTable` classes like: "as(Processed)", with
> the
> > semantics as "the latest operators that generate this KStream / KTable
> will
> > be named according to this hint".
> >
> > The only caveat is that for all operators like `KStream#to` and
> > `KStream#print` that return void, this alternative would not work. But
> for
> > the current operators:
> >
> > a. KStream#print,
> > b. KStream#foreach,
> > c. KStream#to,
> > d. KStream#process
> >
> > I personally felt that except `KStream#process` users would not usually
> > bother to override their names, and for `KStream#process` we could add an
> > overload variant with the additional Processed object.
> >
> >
> > 3. In your example, the processor names are still added with a suffix
> like
> > "
> > -00", is this intentional? If yes, why (I thought with user
> > specified processor name hints we will not add suffix to distinguish
> > different nodes of the same type any more)?
> >
> >
> > Guozhang
> >
> >
> > On Tue, May 29, 2018 at 6:47 AM, Bill Bejeck  wrote:
> >
> > > Hi Florian,
> > >
> > > Thanks for the KIP.  I think being able to add more context to the
> > > processor names would be useful.
> > >
> > > I like the idea of adding a "withProcessorName" to Produced, Consumed
> and
> > > Joined.
> > >
> > > But instead of adding the "Processed" parameter to a large percentage
> of
> > > the methods, which would result in overloaded methods (which we removed
> > > quite a bit with KIP-182) what do you think of adding a method
> > > to the AbstractStream class "withName(String processorName)"? BTW I"m
> not
> > > married to the method name, it's the best I can do off 

[jira] [Created] (KAFKA-6977) Unexpected error code 2 while fetching data

2018-05-31 Thread Eugen Feller (JIRA)
Eugen Feller created KAFKA-6977:
---

 Summary:  Unexpected error code 2 while fetching data
 Key: KAFKA-6977
 URL: https://issues.apache.org/jira/browse/KAFKA-6977
 Project: Kafka
  Issue Type: Bug
  Components: streams
Affects Versions: 0.10.2.1
Reporter: Eugen Feller


We are running Kafka Streams 0.11 with Kafka Broker 0.10.2.1 and constantly run 
into the following exception:

 
{code:java}
[visitstats-mongowriter-838cf286-be42-4a49-b3ab-3a291617233e-StreamThread-1] 
INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread 
[visitstats-mongowriter-838cf286-be42-4a49-b3ab-3a291617233e-StreamThread-1] 
partition assignment took 40 ms.
current active tasks: [0_0, 0_1, 0_3, 0_4, 0_5, 0_10, 0_12, 0_13, 0_14, 0_15, 
0_17, 0_20, 0_21, 0_23, 0_25, 0_28, 0_2, 0_6, 0_7, 0_8, 0_9, 0_11, 0_16, 0_18, 
0_19, 0_22, 0_24, 0_26, 0_27, 0_29, 0_30, 0_31]
current standby tasks: []
previous active tasks: [0_0, 0_1, 0_3, 0_4, 0_5, 0_10, 0_12, 0_13, 0_14, 0_15, 
0_17, 0_20, 0_21, 0_23, 0_25, 0_28]
[visitstats-mongowriter-838cf286-be42-4a49-b3ab-3a291617233e-StreamThread-1] 
INFO org.apache.kafka.streams.processor.internals.StreamThread - stream-thread 
[visitstats-mongowriter-838cf286-be42-4a49-b3ab-3a291617233e-StreamThread-1] 
State transition from PARTITIONS_ASSIGNED to RUNNING.
[visitstats-mongowriter-838cf286-be42-4a49-b3ab-3a291617233e-StreamThread-1] 
INFO org.apache.kafka.streams.KafkaStreams - stream-client 
[visitstats-mongowriter-838cf286-be42-4a49-b3ab-3a291617233e] State transition 
from REBALANCING to RUNNING.
[visitstats-mongowriter-838cf286-be42-4a49-b3ab-3a291617233e-StreamThread-1] 
ERROR org.apache.kafka.streams.processor.internals.StreamThread - stream-thread 
[visitstats-mongowriter-838cf286-be42-4a49-b3ab-3a291617233e-StreamThread-1] 
Encountered the following error during processing:
java.lang.IllegalStateException: Unexpected error code 2 while fetching data
at 
org.apache.kafka.clients.consumer.internals.Fetcher.parseCompletedFetch(Fetcher.java:886)
at 
org.apache.kafka.clients.consumer.internals.Fetcher.fetchedRecords(Fetcher.java:525)
at 
org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:1086)
at org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:1043)
at 
org.apache.kafka.streams.processor.internals.StreamThread.pollRequests(StreamThread.java:536)
at 
org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:490)
at 
org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:480)
at 
org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:457)
{code}
 



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)


REMINDER: Apache EU Roadshow 2018 in Berlin is less than 2 weeks away!

2018-05-31 Thread sharan

Hello Apache Supporters and Enthusiasts

This is a reminder that our Apache EU Roadshow in Berlin is less than 
two weeks away and we need your help to spread the word. Please let your 
work colleagues, friends and anyone interested in attending know 
about our Apache EU Roadshow event.


We have a great schedule including tracks on Apache Tomcat, Apache Http 
Server, Microservices, Internet of Things (IoT) and Cloud Technologies. 
You can find more details at the link below:


https://s.apache.org/0hnG

Ticket prices will be going up on 8th June 2018, so please make sure 
that you register soon if you want to beat the price increase. 
https://foss-backstage.de/tickets


Remember that registering for the Apache EU Roadshow also gives you 
access to FOSS Backstage so you can attend any talks and workshops from 
both conferences. And don’t forget that our Apache Lounge will be open 
throughout the whole conference as a place to meet up, hack and relax.


We look forward to seeing you in Berlin!

Thanks
Sharan Foga,  VP Apache Community Development

http://apachecon.com/
@apachecon

PLEASE NOTE: You are receiving this message because you are subscribed 
to a user@ or dev@ list of one or more Apache Software Foundation projects.


[jira] [Resolved] (KAFKA-6666) OffsetOutOfRangeException: Replica Thread Stopped Resulting in Underreplicated Partitions

2018-05-31 Thread Srinivas Dhruvakumar (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6666?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Srinivas Dhruvakumar resolved KAFKA-6666.
-
Resolution: Fixed

This bug has been fixed as part of 
https://issues.apache.org/jira/browse/KAFKA-3978.

> OffsetOutOfRangeException: Replica Thread Stopped Resulting in 
> Underreplicated Partitions
> -
>
> Key: KAFKA-6666
> URL: https://issues.apache.org/jira/browse/KAFKA-6666
> Project: Kafka
>  Issue Type: Bug
>  Components: core
>Affects Versions: 0.11.0.1
>Reporter: Srinivas Dhruvakumar
>Priority: Critical
> Fix For: 1.1.0
>
> Attachments: Screen Shot 2018-03-15 at 3.52.13 PM.png
>
>
> Hello All, 
> Currently we are seeing a few under-replicated partitions on our test cluster 
> which is used for integration testing. On debugging further, we found the 
> replica thread was stopped due to an error:
> Caused by: org.apache.kafka.common.errors.OffsetOutOfRangeException: Cannot 
> increment the log start offset to 50 of partition  since it is larger 
> than the high watermark -1
> Kindly find the attached screenshot. 
> !Screen Shot 2018-03-15 at 3.52.13 PM.png!
>  





[DISCUSS] KIP-301 Schema Inferencing in JsonConverters

2018-05-31 Thread Andrew Otto
Hiya, I’m late to this thread, and was just referred to the KIP from the
confluent mailing list.  Nice idea!

I like the idea of being able to provide a Schema fallback for
JsonConverter.  Perhaps an interface that could return the schema given the
JsonNode? For my use case, I have JSONSchemas resolvable via a
schema_uri field in my JSON data.  I’d like to be able to implement an
interface that would look up the JSONSchema for the given message, and then
convert that JSONSchema to a Connect Schema.

Then, instead of inferSchema being a simple toggle that is either enabled
or not, we can set a schema.resolver.type. Pseudocode, e.g.:

interface JsonSchemaResolver {
  Schema apply(JsonNode value);
}

class InferSchemaResolver implements JsonSchemaResolver {
  public Schema apply(JsonNode value) {
    return inferSchema(value);
  }
}

Or for me and my custom resolver…

class JsonSchemaURIResolver implements JsonSchemaResolver {
  private final String schemaUriPath;

  public JsonSchemaURIResolver(String schemaUriPath) {
    this.schemaUriPath = schemaUriPath;
  }

  public Schema apply(JsonNode value) {
    JsonNode schemaURI = value.at(schemaUriPath);
    // cache the fetched JSONSchema here
    return asConnectSchema(getJsonSchemaAtURI(schemaURI));
  }

  public Schema asConnectSchema(JsonNode schema) {
    // recursively traverse the schema and convert "type" fields to
    // Connect Schema types
  }
}

And in config:
# perhaps the default:
schema.resolver.type=org.apache.kafka.connect.json.InferSchemaResolver
# Or for me and my custom resolver…
schema.resolver.type=org.apache.kafka.connect.json.JsonSchemaURIResolver


If we were really fancy, I suppose the Resolver interface wouldn't have to
be JsonNode specific.  Then any Converter could be configured with a schema
resolver implementation that returned a Schema from an Object value.

(I believe this is similar to how Confluent’s AvroConverter works, but that
is Confluent Schema Registry specific.)

Anyway, just a thought.

- Andrew Otto
  Senior Systems Engineer
  Wikimedia Foundation


[jira] [Created] (KAFKA-6976) Kafka Streams instances going in to DEAD state

2018-05-31 Thread Deepak Goyal (JIRA)
Deepak Goyal created KAFKA-6976:
---

 Summary: Kafka Streams instances going in to DEAD state
 Key: KAFKA-6976
 URL: https://issues.apache.org/jira/browse/KAFKA-6976
 Project: Kafka
  Issue Type: New Feature
  Components: streams
Reporter: Deepak Goyal


We are using Kafka 0.10.2.0 with Kafka Streams 1.1.0. We have a Kafka cluster of 
16 machines, and the topic being consumed by Kafka Streams has 256 partitions. 
We spawned 400 instances of the Kafka Streams application and see that all of 
the StreamThreads go into the DEAD state.
{quote}{{[2018-05-25 05:59:29,282] INFO stream-thread 
[ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7-StreamThread-1] State transition 
from PENDING_SHUTDOWN to DEAD 
(org.apache.kafka.streams.processor.internals.StreamThread) [2018-05-25 
05:59:29,282] INFO stream-client [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7] 
State transition from REBALANCING to ERROR 
(org.apache.kafka.streams.KafkaStreams) [2018-05-25 05:59:29,282] WARN 
stream-client [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7] All stream threads 
have died. The instance will be in error state and should be closed. 
(org.apache.kafka.streams.KafkaStreams) [2018-05-25 05:59:29,282] INFO 
stream-thread [ksapp-19f923d7-5f9e-4137-b79f-ee20945a7dd7-StreamThread-1] 
Shutdown complete (org.apache.kafka.streams.processor.internals.StreamThread)}}
{quote}

Please note that when we have only 100 Kafka Streams instances, things work as 
expected and the instances consume messages from the topic.





[jira] [Created] (KAFKA-6975) AdminClient.deleteRecords() may cause replicas unable to fetch from beginning

2018-05-31 Thread Anna Povzner (JIRA)
Anna Povzner created KAFKA-6975:
---

 Summary: AdminClient.deleteRecords() may cause replicas unable to 
fetch from beginning
 Key: KAFKA-6975
 URL: https://issues.apache.org/jira/browse/KAFKA-6975
 Project: Kafka
  Issue Type: Bug
Affects Versions: 1.1.0
Reporter: Anna Povzner
Assignee: Anna Povzner


AdminClient.deleteRecords(beforeOffset(offset)) will set the log start offset to 
the requested offset. If the requested offset is in the middle of a batch, 
the replica will not be able to fetch from that offset, because fetches can 
only start at a batch boundary.
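The core of the problem can be illustrated with a toy model (my own sketch in plain Java, not Kafka code): a leader serves data in whole batches addressed by base offset, so a log start offset that falls inside a batch is not a valid fetch position:

```java
import java.util.List;

public class MidBatchSketch {
    /**
     * Returns the base offset of the batch that contains fetchOffset.
     * The leader can only hand out whole batches, identified by their
     * base offset. Assumes fetchOffset >= the first base offset.
     */
    public static long servingBatchBase(List<Long> batchBases, long fetchOffset) {
        long base = batchBases.get(0);
        for (long b : batchBases) {
            if (b <= fetchOffset) base = b;
        }
        return base;
    }

    public static void main(String[] args) {
        List<Long> batchBases = List.of(0L, 40L, 80L); // a log with three batches

        // deleteRecords(beforeOffset(50)) -> log start offset = 50, mid-batch.
        long logStartOffset = 50L;

        // A new follower resets its fetch offset to the leader's LSO, but the
        // batch serving that position starts *before* the fetch offset.
        long base = servingBatchBase(batchBases, logStartOffset);
        System.out.println(base);                  // 40
        System.out.println(base < logStartOffset); // true
    }
}
```

With the LSO at a batch boundary (e.g. 40 or 80 in this model) the mismatch disappears, which is why only mid-batch deleteRecords offsets trigger the issue.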

One use-case where this could cause problems is replica re-assignment. Suppose 
we have a topic partition with 3 initial replicas, and at some point the user 
issues AdminClient.deleteRecords() for an offset that falls in the middle of a 
batch. It now becomes the log start offset for this topic partition. Suppose at 
some later time, the user starts partition re-assignment to 3 new replicas. The 
new replicas (followers) will start with HW = 0 and try to fetch from 0, then 
get "out of order offset" because 0 < log start offset (LSO); the follower will 
then reset its offset to the leader's LSO and fetch from it; the leader will 
send a batch in response with base offset 

[jira] [Created] (KAFKA-6974) Changes the interaction between request handler threads and fetcher threads into an ASYNC model

2018-05-31 Thread Lucas Wang (JIRA)
Lucas Wang created KAFKA-6974:
-

 Summary: Changes the interaction between request handler threads 
and fetcher threads into an ASYNC model
 Key: KAFKA-6974
 URL: https://issues.apache.org/jira/browse/KAFKA-6974
 Project: Kafka
  Issue Type: Improvement
Reporter: Lucas Wang


Problem Statement:
At LinkedIn, occasionally our clients complain about receiving constant 
NotLeaderForPartition exceptions.

Investigations:
For one investigated case, the cluster was going through a rolling bounce, and 
we saw an ~8 minute delay between an old partition leader resigning 
and the new leader becoming active, based on entries of "Broker xxx handling 
LeaderAndIsr request" in the state change log.
Our monitoring shows the LeaderAndISR request local time during the incident 
went up to ~4 minutes.


Explanations:
One possible explanation of the ~8 minutes of delay is:
During controlled shutdown of a broker, the partitions whose leaders lie on the 
shutting-down broker need to go through leadership transitions. And the 
controller processes partitions in batches, with each batch having 
config.controlledShutdownPartitionBatchSize partitions, e.g. 100.
If the 1st LeaderAndISR sent to a new leader broker takes too long, e.g. 4 
minutes, then the subsequent LeaderAndISR requests can have an accumulated 
delay of maybe 4 minutes, 8 minutes, or even 12 minutes... The reason is that 
subsequent LeaderAndISR requests are blocked in a muted channel, given only one 
LeaderAndISR request can be processed at a time with a 
maxInFlightRequestsPerConnection setting of 1. When that happens, no existing 
metric would show the total delay of 8 or 12 minutes for muted requests.
Now the question is why it took ~4 minutes for the 1st LeaderAndISR request 
to finish.

Explanation for the ~4 minutes of local time for LeaderAndISR request:
During processing of an LeaderAndISR request, the request handler thread needs 
to add partitions to or remove partitions from partitionStates field of the 
ReplicaFetcherThread, also shutdown idle fetcher threads by checking the size 
of the partitionStates field. On the other hand, background fetcher threads 
need to iterate through all the partitions in partitionStates in order to build 
fetch request, and process fetch responses. The synchronization between request 
handler thread and the fetcher threads is done through a partitionMapLock. 
Specifically, the fetcher threads may acquire the partitionMapLock, and then 
calls the following functions for processing the fetch response
(1) processPartitionData, which in turn calls 
(2) Replica.maybeIncrementLogStartOffset, which calls 
(3) Log.maybeIncrementLogStartOffset, which calls 
(4) LeaderEpochCache.clearAndFlushEarliest.
Now two factors contribute to the long holding of the partitionMapLock,
1. function (4) above entails calling sync() to make sure data gets persisted 
to disk, which may potentially have a long latency
2. All the 4 functions above can potentially be called for each partition in 
the fetch response, multiplying the sync() latency by a factor of n.

The end result is that the request handler thread got blocked for a long time 
trying to acquire the partitionMapLock of some fetcher inside 
AbstractFetcherManager.shutdownIdleFetcherThreads since checking each fetcher's 
partitionCount requires getting the partitionMapLock.

In our testing environment, we reproduced the problem and confirmed the 
explanation above with a request handler thread getting blocked for 10 seconds 
trying to acquire the partitionMapLock of one particular fetcher thread, while 
there are many log entries showing "Incrementing log start offset of 
partition..."

Proposed change:
We propose to change the interaction between the request handler threads and 
the fetcher threads to an ASYNC model by using an event queue. All requests to 
add or remove partitions, or shutdown idle fetcher threads are modeled as items 
in the event queue. And only the fetcher threads can take items out of the 
event queue and actually process them.
In the new ASYNC model, in order to be able to process an infinite sequence of 
FetchRequests, a fetcher thread initially has one FetchRequest, and after it's 
done processing one FetchRequest, it enqueues one more into its own event queue.
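A minimal sketch of the proposed ASYNC model (hypothetical names, my own illustration rather than the actual patch): partition add/remove requests and fetches are items on the fetcher's event queue, request handler threads only enqueue, and the fetcher re-enqueues a FETCH after completing one so fetching never starves:

```java
import java.util.ArrayDeque;
import java.util.HashSet;
import java.util.Queue;
import java.util.Set;

public class AsyncFetcherSketch {
    public enum Kind { ADD_PARTITION, REMOVE_PARTITION, FETCH }

    public record Event(Kind kind, String partition) {}

    public final Queue<Event> queue = new ArrayDeque<>();
    // Owned exclusively by the fetcher thread: no partitionMapLock needed.
    public final Set<String> partitionStates = new HashSet<>();

    // Called by request handler threads: cheap, never blocks on partitionStates.
    public void enqueue(Event e) {
        synchronized (queue) { queue.add(e); }
    }

    // One iteration of the fetcher thread's loop.
    public void processOne() {
        Event e;
        synchronized (queue) { e = queue.poll(); }
        if (e == null) return;
        switch (e.kind()) {
            case ADD_PARTITION -> partitionStates.add(e.partition());
            case REMOVE_PARTITION -> partitionStates.remove(e.partition());
            case FETCH -> {
                // ... build and send a fetch for partitionStates, process response ...
                enqueue(new Event(Kind.FETCH, null)); // keep the fetch sequence going
            }
        }
    }

    public static void main(String[] args) {
        AsyncFetcherSketch fetcher = new AsyncFetcherSketch();
        fetcher.enqueue(new Event(Kind.ADD_PARTITION, "topicA-0"));
        fetcher.enqueue(new Event(Kind.FETCH, null));
        fetcher.processOne(); // applies the partition add
        fetcher.processOne(); // handles the fetch, re-enqueues the next one
        System.out.println(fetcher.partitionStates); // [topicA-0]
        System.out.println(fetcher.queue.size());    // 1
    }
}
```

The point of the sketch is that request handler threads only ever touch the queue, so they can never be blocked behind a slow sync() in the fetcher's response processing.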
Also since the current AbstractFetcherThread logic is inherited by both the 
replica-fetcher-threads and the consumer-fetcher-threads for the old consumer, 
and the latter has been deprecated, we plan to implement the ASYNC model with a 
clean-slate approach, and only support the replica-fetcher-threads, in order to 
make the code cleaner.






Re: [DISCUSS] KIP-307: Allow to define custom processor names with KStreams DSL

2018-05-31 Thread Damian Guy
Hi Florian,

Thanks for the KIP. What about KTable and other DSL interfaces? Will they
not want to be able to do the same thing?
It would be good to see a complete set of the public API changes.

Cheers,
Damian

On Wed, 30 May 2018 at 19:45 Guozhang Wang  wrote:

> Hello Florian,
>
> Thanks for the KIP. I have some meta feedbacks on the proposal:
>
> 1. You mentioned that this `Processed` object will be added to a new
> overloaded variant of all the stateless operators, what about the stateful
> operators? Would like to hear your opinions if you have thought about that:
> note for stateful operators they will usually be mapped to multiple
> processor node names, so we probably need to come up with some ways to
> define all their names.
>
> 2. I share the same concern with Bill as for adding lots of new overload
> functions into the stateless operators, as we have just spent quite some
> effort in trimming them since 1.0.0 release. If the goal is to just provide
> some "hints" on the generated processor node names, not strictly enforcing
> the exact names to be generated, then how about we just add a new
> function to `KStream` and `KTable` classes like: "as(Processed)", with the
> semantics as "the latest operators that generate this KStream / KTable will
> be named according to this hint".
>
> The only caveat, is that for all operators like `KStream#to` and
> `KStream#print` that returns void, this alternative would not work. But for
> the current operators:
>
> a. KStream#print,
> b. KStream#foreach,
> c. KStream#to,
> d. KStream#process
>
> I personally felt that except `KStream#process` users would not usually
> bother to override their names, and for `KStream#process` we could add an
> overload variant with the additional Processed object.
>
>
> 3. In your example, the processor names are still added with a suffix like
> "
> -00", is this intentional? If yes, why (I thought with user
> specified processor name hints we will not add suffix to distinguish
> different nodes of the same type any more)?
>
>
> Guozhang
>
>
> On Tue, May 29, 2018 at 6:47 AM, Bill Bejeck  wrote:
>
> > Hi Florian,
> >
> > Thanks for the KIP.  I think being able to add more context to the
> > processor names would be useful.
> >
> > I like the idea of adding a "withProcessorName" to Produced, Consumed and
> > Joined.
> >
> > But instead of adding the "Processed" parameter to a large percentage of
> > the methods, which would result in overloaded methods (which we removed
> > quite a bit with KIP-182) what do you think of adding a method
> > to the AbstractStream class "withName(String processorName)"? BTW I'm not
> > married to the method name, it's the best I can do off the top of my
> head.
> >
> > For the methods that return void, we'd have to add a parameter, but that
> > would at least cut down on the number of overloaded methods in the API.
> >
> > Just my 2 cents.
> >
> > Thanks,
> > Bill
> >
> > On Sun, May 27, 2018 at 4:13 PM, Florian Hussonnois <
> fhussonn...@gmail.com
> > >
> > wrote:
> >
> > > Hi,
> > >
> > > I would like to start a new discussion on following KIP :
> > > https://cwiki.apache.org/confluence/display/KAFKA/KIP-
> > > 307%3A+Allow+to+define+custom+processor+names+with+KStreams+DSL
> > >
> > > This is still a draft.
> > >
> > > Looking forward for your feedback.
> > > --
> > > Florian HUSSONNOIS
> > >
> >
>
>
>
> --
> -- Guozhang
>


Re: referencing OffsetCheckpoint in downstream project

2018-05-31 Thread Ted Yu
Thanks for the reply.

It seems the following class in the newer release can do what the test needs:

class OffsetCheckpointFile(val file: File, logDirFailureChannel:
LogDirFailureChannel = null) {

Cheers

On Thu, May 31, 2018 at 9:27 AM, Ismael Juma  wrote:

> Hi Ted,
>
> There are two such classes. The example you have is for the broker class,
> not the Streams one.
>
> Ismael
>
> On Thu, 31 May 2018, 09:03 Ted Yu,  wrote:
>
> > Hi,
> > OffsetCheckpoint has been relocated
> > to org.apache.kafka.streams.state.internals package.
> >
> > Does this mean that downstream project should no longer reference this
> > class ?
> >
> > This is how the class is used (against Kafka 0.10.0.1 release) :
> >
> > // ensure that topic is removed from all cleaner offsets
> > assert(servers.forall(server => topicAndPartitions.forall { tp =>
> >   val checkpoints = server.getLogManager().logDirs.map { logDir =>
> > new OffsetCheckpoint(new File(logDir,
> > "cleaner-offset-checkpoint")).read()
> >   }
> >   checkpoints.forall(checkpointsPerLogDir =>
> > !checkpointsPerLogDir.contains(tp))
> > }), s"checkpoint for topic $topic still exists")
> >
> > Cheers
> >
>


Re: [VOTE] KIP-275 - Indicate "isClosing" in the SinkTaskContext

2018-05-31 Thread Matt Farmer
Bumping this again as it's been languishing for a few weeks. Would love to
get further feedback (or know for sure that this won't happen).

On Mon, May 14, 2018 at 3:48 PM, Matt Farmer  wrote:

> Bumping this thread.
>
> For anyone who needs a refresher the discussion thread is here:
> http://mail-archives.apache.org/mod_mbox/kafka-dev/
> 201803.mbox/%3CCAM5dya9x---9M3uEf_wrJL5dw%2B6HLV4%
> 3D5PfKKSTPE1vOHEWC_g%40mail.gmail.com%3E
>
> And there's a work in progress PR open here: https://github.com/
> apache/kafka/pull/5002
>
> Thanks!
>
> On Wed, Apr 25, 2018 at 1:04 PM, Matt Farmer  wrote:
>
>> Bump!
>>
>> We're currently at 1 non-binding +1.
>>
>> Still soliciting votes here. =)
>>
>> On Wed, Apr 18, 2018 at 3:41 PM, Ted Yu  wrote:
>>
>>> +1
>>>
>>> On Wed, Apr 18, 2018 at 12:40 PM, Matt Farmer  wrote:
>>>
>>> > Good afternoon/evening/morning all:
>>> >
>>> > I'd like to start voting on KIP-275: Indicate "isClosing" in the
>>> > SinkTaskContext
>>> > https://cwiki.apache.org/confluence/pages/viewpage.action?pa
>>> geId=75977607
>>> >
>>> > I'm going to start preparing the patch we've been using internally for
>>> PR
>>> > and get it up for review later this week.
>>> >
>>> > Thanks!
>>> > Matt
>>> >
>>>
>>
>>
>


Re: referencing OffsetCheckpoint in downstream project

2018-05-31 Thread Ismael Juma
Hi Ted,

There are two such classes. The example you have is for the broker class,
not the Streams one.

Ismael

On Thu, 31 May 2018, 09:03 Ted Yu,  wrote:

> Hi,
> OffsetCheckpoint has been relocated
> to org.apache.kafka.streams.state.internals package.
>
> Does this mean that downstream project should no longer reference this
> class ?
>
> This is how the class is used (against Kafka 0.10.0.1 release) :
>
> // ensure that topic is removed from all cleaner offsets
> assert(servers.forall(server => topicAndPartitions.forall { tp =>
>   val checkpoints = server.getLogManager().logDirs.map { logDir =>
> new OffsetCheckpoint(new File(logDir,
> "cleaner-offset-checkpoint")).read()
>   }
>   checkpoints.forall(checkpointsPerLogDir =>
> !checkpointsPerLogDir.contains(tp))
> }), s"checkpoint for topic $topic still exists")
>
> Cheers
>


referencing OffsetCheckpoint in downstream project

2018-05-31 Thread Ted Yu
Hi,
OffsetCheckpoint has been relocated
to org.apache.kafka.streams.state.internals package.

Does this mean that downstream project should no longer reference this
class ?

This is how the class is used (against Kafka 0.10.0.1 release) :

// ensure that topic is removed from all cleaner offsets
assert(servers.forall(server => topicAndPartitions.forall { tp =>
  val checkpoints = server.getLogManager().logDirs.map { logDir =>
new OffsetCheckpoint(new File(logDir,
"cleaner-offset-checkpoint")).read()
  }
  checkpoints.forall(checkpointsPerLogDir =>
!checkpointsPerLogDir.contains(tp))
}), s"checkpoint for topic $topic still exists")

Cheers


Re: [DISCUSS] Why don't support web UI

2018-05-31 Thread Rahul Singh
My opinions why:

There are open source Web UIs available, Kafka Manager being one of them. Kafka 
is also a fairly new product. Even slightly mature products like Cassandra 
(which I’m also on the dev list for) focus on core features instead of user 
interfaces, because there are 100s of ways to take the JMX APIs and either make 
your own tools or use something else readily created.

There are also commercial providers like Confluent and Landoop that do this as 
a packaged product.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation
On May 28, 2018, 10:58 PM -0400, 逐风者的祝福 <1878707...@qq.com>, wrote:
> Hi team:
> I'm curious why we don't offer a web UI for administrators. The web UI 
> is very useful to display cluster status and current topic status.


Jenkins build is back to normal : kafka-trunk-jdk8 #2687

2018-05-31 Thread Apache Jenkins Server
See 




Re: Need help

2018-05-31 Thread Rahul Singh
This is probably better suited to the User list. The Dev list is for discussing 
Kafka development, not Kafka use cases.

Also :

1. Did you mean to send a confidential email to this public list?
2. Did you search for anything on Google already and did you have trouble doing 
this on your own?
3. What database are you using? There are hundreds of “databases”.

--
Rahul Singh
rahul.si...@anant.us

Anant Corporation
On May 30, 2018, 4:04 AM -0400, satyanarayan...@dell.com, wrote:
> Dell - Internal Use - Confidential
>
> Hi Team,
>
>
> As part of my project, I need to read data from Kafka with Avro and load it 
> into my database.
>
> The data contains arrays [0], [1], [2].
>
> Under the phone number we also have phone key objects.
>
> Please provide sample code to achieve this.
>
>
> Thanks
> Satya


[jira] [Resolved] (KAFKA-6054) ERROR "SubscriptionInfo - unable to decode subscription data: version=2" when upgrading from 0.10.0.0 to 0.10.2.1

2018-05-31 Thread Matthias J. Sax (JIRA)


 [ 
https://issues.apache.org/jira/browse/KAFKA-6054?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
 ]

Matthias J. Sax resolved KAFKA-6054.

Resolution: Fixed

> ERROR "SubscriptionInfo - unable to decode subscription data: version=2" when 
> upgrading from 0.10.0.0 to 0.10.2.1
> -
>
> Key: KAFKA-6054
> URL: https://issues.apache.org/jira/browse/KAFKA-6054
> Project: Kafka
>  Issue Type: Bug
>  Components: streams
>Affects Versions: 0.10.2.1
>Reporter: James Cheng
>Assignee: Matthias J. Sax
>Priority: Major
>  Labels: kip
> Fix For: 2.0.0
>
>
> KIP: 
> [https://cwiki.apache.org/confluence/display/KAFKA/KIP-268%3A+Simplify+Kafka+Streams+Rebalance+Metadata+Upgrade]
> We upgraded an app from kafka-streams 0.10.0.0 to 0.10.2.1. We did a rolling 
> upgrade of the app, so that at one point there were both 0.10.0.0-based 
> instances and 0.10.2.1-based instances running.
> We observed the following stack trace:
> {code:java}
> 2017-10-11 07:02:19.964 [StreamThread-3] ERROR o.a.k.s.p.i.a.SubscriptionInfo 
> -
> unable to decode subscription data: version=2
> org.apache.kafka.streams.errors.TaskAssignmentException: unable to decode
> subscription data: version=2
> at 
> org.apache.kafka.streams.processor.internals.assignment.SubscriptionInfo.decode(SubscriptionInfo.java:113)
> at 
> org.apache.kafka.streams.processor.internals.StreamPartitionAssignor.assign(StreamPartitionAssignor.java:235)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.performAssignment(ConsumerCoordinator.java:260)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.onJoinLeader(AbstractCoordinator.java:404)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.access$900(AbstractCoordinator.java:81)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:358)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$JoinGroupResponseHandler.handle(AbstractCoordinator.java:340)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:679)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator$CoordinatorResponseHandler.onSuccess(AbstractCoordinator.java:658)
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture$1.onSuccess(RequestFuture.java:167)
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.fireSuccess(RequestFuture.java:133)
> at 
> org.apache.kafka.clients.consumer.internals.RequestFuture.complete(RequestFuture.java:107)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.onComplete(ConsumerNetworkClient.java:426)
> at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:278)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.clientPoll(ConsumerNetworkClient.java:360)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:224)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:192)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:163)
> at 
> org.apache.kafka.clients.consumer.internals.AbstractCoordinator.ensureActiveGroup(AbstractCoordinator.java:243)
> at 
> org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.ensurePartitionAssignment(ConsumerCoordinator.java:345)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.pollOnce(KafkaConsumer.java:977)
> at 
> org.apache.kafka.clients.consumer.KafkaConsumer.poll(KafkaConsumer.java:937)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:295)
> at 
> org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:218)
> 
> {code}
> I spoke with [~mjsax] and he said this is a known issue that happens when you 
> have both 0.10.0.0 instances and 0.10.2.1 instances running at the same time, 
> because the internal version number of the protocol changed when adding 
> Interactive Queries. Matthias asked me to file this JIRA.


