Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-12 Thread Dong Lin
Thanks for the reply, Jacob. Please see my comments inline.

On Mon, Jun 12, 2017 at 7:51 PM, Jacob Maes  wrote:

> >
> > - For users that need partition expansion of the input streams for
> > stateful jobs, they have a really big headache in the sense that Samza
> > does not allow partition expansion for stateful jobs. SEP-5 addresses
> > this headache for them.
> > You are right that SEP-5 requires users to understand and enforce
> > limitations across organizations. But it is still much better than not
> > allowing users to expand partitions for stateful jobs at all, right?
> > Did I miss something here?
>
> I guess this one is a matter of perspective.
>
> One argument is that if the system supports one case, it's better than none
> because there is one less scenario in which the system does the wrong
> thing.
>
> The counter argument is for uniform and consistent behavior, which is easy
> for users to understand and properly leverage.
>
> Specifically, I'd argue that the current rule is very simple: "you cannot
> repartition inputs on a stateful job, so you must over-partition the
> initial implementation". To me, while that rule is not ideal, its
> simplicity is better than introducing a new solution that has a bunch of
> caveats, any one of which could be missed. If any one of the assumptions in
> this SEP design is violated, the job would behave incorrectly. That puts a
> lot more burden on the users than the simpler rule.
>

I agree that we have different perspectives here. It is true that users
could break their job if they used this feature in the wrong way, i.e.
violated the assumptions made in SEP-5. On the other hand, I think there is
always a way for users to break their job if they configure it incorrectly.
I also think the assumptions made in this SEP are not particularly harder
to understand than other existing configs in Samza.

The answer to this can be subjective. I would love to hear perspectives
from other developers on this issue.


>
> That's why I mentioned a few alternatives that, while more complex to
> implement, would provide a more consistent behavior with simple rules for
> the users.
>

I am open to discussing alternative solutions that address the problem
in a better manner. I am not opposed to complexity as long as it gives us
good long-term benefits.

Here are my current concerns with the three alternatives you described
earlier:

- The first alternative requires support from the input system, which is
currently not available. It would limit partition expansion to only those
systems that support such an interface, and it is not guaranteed that we
can persuade the developers of the input system to add this interface. This
is not desirable for Samza in the long term.

- I cannot comment on the second alternative because I don't understand
how it reshuffles all existing changelog data. We can discuss more if there
are more specific details. My gut feeling is that this will be complex and
carry performance overhead.

- The third alternative incurs performance overhead. Given that users can
already use this solution to enable partition expansion, maybe Samza
developers can provide more input as to why we are not doing it by default.
My gut feeling is that it carries considerable performance overhead and
increases the cost to serve Samza jobs (e.g. disk usage), which may make it
undesirable in the long term.



>
> > Yes, we need a similar check for
> > GroupBySystemStreamPartitionWithFixedTaskNum as well. If more grouper
> > classes are needed in the future, we can solve this problem cleanly
> > without new config. Given the previousGrouperClass and newGrouperClass,
> > KafkaCheckpointLogKey will throw an exception if and only if
> > newGrouperClass is not an instance of previousGrouperClass.
> > GroupBySystemStreamPartitionWithFixedTaskNum should extend
> > GroupBySystemStreamPartition and GroupByPartitionWithFixedTaskNum should
> > extend GroupByPartition. Does this address your concern?
>
> Sounds workable, thanks.
>
> >
> > Can you be more specific why Partition-to-task mapping is not meaningful
> > without some definition of the key-to-partition assignments and why it is
> > incomplete and misleading?
>
> A partition is (in my naive interpretation) an independent queue for
> messages of a particular key set. It is not the *identity* of the partition
> that determines the contents of the associated task's local state. Rather,
> it is the *contents* of the partition that affect the task's state. A
> partition-to-task mapping only captures an identity relationship:
> partition1->task1. Without the assumptions of this SEP, this is
> insufficient to determine the assignment of keys to tasks, which is what
> really matters. Therefore, any future feature that utilizes this mapping
> without accounting for the assumptions of this SEP is likely to
> malfunction.
>
>
I am not sure it is true that "any future feature that utilizes this

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-12 Thread Jacob Maes
>
> - For users that need partition expansion of the input streams for stateful
> job, they have a really big headache in the sense that Samza does not allow
> partition expansion for stateful job. SEP-5 addresses this headache for
> them.
> You are right that SEP-5 requires user to understand and enforce
> limitations across organizations. But it is still much better than not
> allowing user to expansion partition for stateful jobs at all, right? Did I
> miss something here?

I guess this one is a matter of perspective.

One argument is that if the system supports one case, it's better than none
because there is one less scenario in which the system does the wrong
thing.

The counter argument is for uniform and consistent behavior, which is easy
for users to understand and properly leverage.

Specifically, I'd argue that the current rule is very simple: "you cannot
repartition inputs on a stateful job, so you must over-partition the
initial implementation". To me, while that rule is not ideal, its
simplicity is better than introducing a new solution that has a bunch of
caveats, any one of which could be missed. If any one of the assumptions in
this SEP design is violated, the job would behave incorrectly. That puts a
lot more burden on the users than the simpler rule.

That's why I mentioned a few alternatives that, while more complex to
implement, would provide a more consistent behavior with simple rules for
the users.

> Yes, we need a similar check for
> GroupBySystemStreamPartitionWithFixedTaskNum as well. If more grouper
> classes are needed in the future, we can solve this problem cleanly
> without new config. Given the previousGrouperClass and newGrouperClass,
> KafkaCheckpointLogKey will throw an exception if and only if
> newGrouperClass is not an instance of previousGrouperClass.
> GroupBySystemStreamPartitionWithFixedTaskNum should extend
> GroupBySystemStreamPartition and GroupByPartitionWithFixedTaskNum should
> extend GroupByPartition. Does this address your concern?

Sounds workable, thanks.

>
> Can you be more specific why Partition-to-task mapping is not meaningful
> without some definition of the key-to-partition assignments and why it is
> incomplete and misleading?

A partition is (in my naive interpretation) an independent queue for
messages of a particular key set. It is not the *identity* of the partition
that determines the contents of the associated task's local state. Rather,
it is the *contents* of the partition that affect the task's state. A
partition-to-task mapping only captures an identity relationship:
partition1->task1. Without the assumptions of this SEP, this is
insufficient to determine the assignment of keys to tasks, which is what
really matters. Therefore, any future feature that utilizes this mapping
without accounting for the assumptions of this SEP is likely to malfunction.
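
To make the point above concrete, here is a minimal Python sketch. It is not
Samza code, and every name in it is made up for illustration. It assumes a
grouper with a fixed task count that sends partition p to task
p % task_count, and it compares two key-to-partition schemes when a topic
grows from 2 to 4 partitions: the hash+modulo scheme the SEP relies on, and a
hypothetical range-based scheme.

```python
# Illustrative sketch only; none of these names are Samza APIs.

HASH_SPACE = 16  # toy hash space, so that every key hash can be enumerated


def hash_modulo(key_hash, partition_count):
    """Key-to-partition scheme assumed by the SEP: hash(key) % partition count."""
    return key_hash % partition_count


def range_partitioner(key_hash, partition_count):
    """Hypothetical alternative: split the hash space into contiguous ranges."""
    return key_hash * partition_count // HASH_SPACE


def task_for_partition(partition, task_count):
    """Assumed partition-to-task rule for a grouper with a fixed task count."""
    return partition % task_count


def moved_keys(partitioner, old_count=2, new_count=4):
    """Return the key hashes whose owning task changes after the expansion."""
    task_count = old_count  # one task per original partition
    moved = []
    for key_hash in range(HASH_SPACE):
        old_task = task_for_partition(partitioner(key_hash, old_count), task_count)
        new_task = task_for_partition(partitioner(key_hash, new_count), task_count)
        if old_task != new_task:
            moved.append(key_hash)
    return moved


if __name__ == "__main__":
    print("hash+modulo moves:", moved_keys(hash_modulo))
    print("range-based moves:", moved_keys(range_partitioner))
```

With hash+modulo and a doubled partition count, no key changes task. With the
range-based scheme, the identical partition-to-task rule sends half of the
key hashes (4 through 11 in this toy setup) to a task that does not hold
their state, even though the partition-to-task mapping itself looks the same.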


On Mon, Jun 12, 2017 at 5:09 PM, Dong Lin  wrote:

> Hey Jacob,
>
> Thanks for the explanation. It seems that your biggest concern is with the
> generality of the proposal. Let me try to address this and other comments
> below.
>
> 1) ... it will cause headaches for Samza users ...
>
> I am not sure I understand why this proposal causes headache for Samza
> users. Here is the impact of the SEP-5 on users:
>
> - For users that do not need partition expansion of the input stream, they
> can use Samza without change change in code/binary/config. Thus there is no
> headache for them.
>
> - For users that need partition expansion of the input streams for
> stateless job, they currently need to manually reboot their Samza job in
> order to let Samza consume the new partitions created for the stream. SEP-5
> actually reduced their headache by allowing Samza to automatically detect
> and consume new partitions.
>
> - For users that need partition expansion of the input streams for stateful
> job, they have a really big headache in the sense that Samza does not allow
> partition expansion for stateful job. SEP-5 addresses this headache for
> them.
>
> You are right that SEP-5 requires user to understand and enforce
> limitations across organizations. But it is still much better than not
> allowing user to expansion partition for stateful jobs at all, right? Did I
> miss something here?
>
> 2) ... Separate orgs are often difficult to coordinate and a system which
> depends on such significant process/coordination is too fragile for my
> taste ..
>
> This is true. Ideally we want a system that is fully self-serving. I think
> this is a long term goal for Samza. Still, for the reasons described above,
> I think something is better than nothing. I am open to alternative design
> that can support partition expansion for stateful jobs without requiring
> coordination.
>
> 3) There is currently no supported way of sharing state among the tasks of
> a container.  Each task has its own isolated store and that logical
> isolation is the primary thing that enables Samza jobs to scale 

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-12 Thread Dong Lin
Thanks, Xinyu, for offering a solution.

Yeah, we have actually listed it as the third rejected alternative in
SEP-5. I can move this to future work. I think it is actually a great idea
to support more general partition expansion, and I think this is what we
should do for Samza in the long term.

While this pluggable function enables support for more input systems, it
alone won't address Jacob's concern about coordination across
organizations. This is because users would still need to coordinate with the
upstream organization and manually configure their Samza job with a
new-to-old-partition mapping that is consistent with the mapping enforced in
the input system. We can make this self-serving if the input system
provides an interface for Samza to dynamically fetch the
new-to-old-partition mapping. But that needs to be future work since no
existing input system provides this interface.

On Mon, Jun 12, 2017 at 4:54 PM, xinyu liu  wrote:

> How about making the partition mapping function a pluggable component in
> the partition expansion? Mathematically, this is a mapping function which
> is able to map the new partitions to the old ones:
>
>   *f (new partition) -> old partition*
>
> If the function is a surjective function (
> https://en.wikipedia.org/wiki/Surjective_function), we are able to keep
> the tasks as they were by replacing the old partition assignment with the
> new one using the mapping function. By making this function pluggable,
> users can provide their own mapping functions to make this work for
> different kinds of input systems. Samza should check whether the function
> is surjective so it knows whether we can keep the same task count with a
> different grouping. For Kafka, we can provide a simple modulo function as
> the mapping, and it's surjective. I agree it would be very nice to have
> more general support for splitting the states of tasks, expanding the
> changelog, etc., but this SEP is still useful and can address quite a large
> number of scenarios in practice. Do you agree?
>
> Thanks,
> Xinyu
>
> On Mon, Jun 12, 2017 at 3:54 PM, Jacob Maes  wrote:
>
> > Hey Dong,
> >
> > I'm opposed (or a +0, at best) to this limited, Kafka-specific solution.
> I
> > understand that the proposal is relatively simple to implement, but I
> think
> > it will cause headaches for Samza users. They will not only have to
> > understand all the limitations (increase only, double partitions only,
> > partition using hash+modulo, etc) of this approach, but enforcing these
> > limitations can be a major problem, especially when the Samza jobs and
> > message brokers are managed by separate orgs in a company. Separate orgs
> > are often difficult to coordinate and a system which depends on such
> > significant process/coordination is too fragile for my taste.
> >
> > That said, I realize that my opinion is just one of many in the broader
> > community which may feel differently, so let me respond to some of the
> > other items in the discussion so we can clear them up:
> >
> > The task-to-container assignment matters because if the correlated tasks
> > > (i.e. tasks that consume messages with the same key) needs to be in the
> > > same container so that they can share the same key/value local store on
> > the
> > > same physical machine.
> >
> > There is currently no supported way of sharing state among the tasks of a
> > container.  Each task has its own isolated store and that logical
> isolation
> > is the primary thing that enables Samza jobs to scale with a simple
> > container count change. My feeling is that we should not change this
> > without good reason.
> >
> > I think we can hardcode new logic in KafkaCheckpointLogKey.scala such
> that
> > > exception will not be thrown if new grouper is
> > > GroupByPartitionWithFixedTaskNum and old grouper is GroupByPartition.
> > Does
> > > this look reasonable?
> >
> > With the current proposal, we'd also need a similar check for
> > GroupBySystemStreamPartitionWithFixedTaskNum as well. And if any other
> > groupers were later added with both these modes, we'd probably need to
> add
> > those too. It might be easier and cleaner to add a config to ignore that
> > check temporarily. Down side is that it further complicates the Samza
> > config, which is already huge. Thoughts?
> >
> > I think storing the previous task-to-partition mapping is more general
> than
> > > storing the partition count of all topics for the following reasons:
> > > - Samza already stores the task-to-container mapping and
> > container-to-host
> > > mapping in the coordinator stream. It seems consistent to also store
> the
> > > partition-to-task mapping. And this information may be useful for other
> > > use-case such as debugging.
> > > - By having the new interface take the previous task-to-partition
> > > assignment instead of a 

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-12 Thread Dong Lin
Hey Jacob,

Thanks for the explanation. It seems that your biggest concern is with the
generality of the proposal. Let me try to address this and other comments
below.

1) ... it will cause headaches for Samza users ...

I am not sure I understand why this proposal would cause headaches for Samza
users. Here is the impact of SEP-5 on users:

- Users that do not need partition expansion of the input stream can use
Samza without any change in code/binary/config. Thus there is no headache
for them.

- Users that need partition expansion of the input streams for stateless
jobs currently need to manually restart their Samza job in order to let
Samza consume the new partitions created for the stream. SEP-5 actually
reduces their headache by allowing Samza to automatically detect and
consume new partitions.

- Users that need partition expansion of the input streams for stateful
jobs have a really big headache in the sense that Samza does not allow
partition expansion for stateful jobs. SEP-5 addresses this headache for
them.

You are right that SEP-5 requires users to understand and enforce
limitations across organizations. But it is still much better than not
allowing users to expand partitions for stateful jobs at all, right? Did I
miss something here?

2) ... Separate orgs are often difficult to coordinate and a system which
depends on such significant process/coordination is too fragile for my
taste ..

This is true. Ideally we want a system that is fully self-serving. I think
this is a long-term goal for Samza. Still, for the reasons described above,
I think something is better than nothing. I am open to alternative designs
that can support partition expansion for stateful jobs without requiring
coordination.

3) There is currently no supported way of sharing state among the tasks of
a container.  Each task has its own isolated store and that logical
isolation is the primary thing that enables Samza jobs to scale with a
simple container count change. My feeling is that we should not change
this without good reason.

I see your point. I will remove this sentence from the motivation section.
This won't have any impact on the design of SEP-5. Does this address
the problem?

4) With the current proposal, we'd also need a similar check for
GroupBySystemStreamPartitionWithFixedTaskNum as well. And if any other groupers
were later added with both these modes, we'd probably need to add those
too. It might be easier and cleaner to add a config to ignore that check
temporarily. Down side is that it further complicates the Samza config,
which is already huge. Thoughts?

Yes, we need a similar check for GroupBySystemStreamPartitionWithFixedTaskNum
as well. If more grouper classes are needed in the future, we can
solve this problem cleanly without new config. Given the
previousGrouperClass and newGrouperClass, KafkaCheckpointLogKey will throw
an exception if and only if newGrouperClass is not an instance of
previousGrouperClass.
GroupBySystemStreamPartitionWithFixedTaskNum should extend
GroupBySystemStreamPartition
and GroupByPartitionWithFixedTaskNum should extend GroupByPartition. Does
this address your concern?
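
A minimal Python sketch of the check described above, reading the rule as
"reject the change unless the new grouper class is the previous class or a
subclass of it", which is what the proposed class hierarchy implies. The
classes below are empty stand-ins named after the groupers in this thread,
not the real Samza implementations, and check_grouper_change and
GrouperChangedError are hypothetical names.

```python
# Stand-in classes named after the groupers discussed in this thread.
# They are placeholders, not the real Samza implementations.

class GroupByPartition:
    pass


class GroupByPartitionWithFixedTaskNum(GroupByPartition):
    pass


class GroupBySystemStreamPartition:
    pass


class GroupBySystemStreamPartitionWithFixedTaskNum(GroupBySystemStreamPartition):
    pass


class GrouperChangedError(Exception):
    """Raised when the configured grouper is incompatible with the previous one."""


def check_grouper_change(previous_grouper_class, new_grouper_class):
    """Hypothetical check: allow the same class or any subclass of it."""
    if not issubclass(new_grouper_class, previous_grouper_class):
        raise GrouperChangedError(
            f"{new_grouper_class.__name__} is not compatible with "
            f"{previous_grouper_class.__name__}"
        )


if __name__ == "__main__":
    # Allowed: the new grouper extends the previous one.
    check_grouper_change(GroupByPartition, GroupByPartitionWithFixedTaskNum)
    try:
        # Rejected: the two groupers are unrelated.
        check_grouper_change(GroupByPartition, GroupBySystemStreamPartition)
    except GrouperChangedError as error:
        print("rejected:", error)
```

One consequence of a subclass-based rule, under these assumptions, is that
switching back from GroupByPartitionWithFixedTaskNum to GroupByPartition
would be rejected, because the parent class is not a subclass of the child.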

5) The task-to-container and container-to-host mappings are both meaningful
in context of the JobModel. Partition-to-task mapping is not meaningful without
some definition of the key-to-partition assignments. It's incomplete
information and therefore misleading. I think it only makes sense to use
this mapping if we adopt a solution wherein Samza also knows the partition
key assignment.

The partition-to-task mapping is currently passed explicitly from the job
coordinator to each task as part of the job model to tell tasks which
partitions to consume from. I think we can store some definition of the
key-to-partition assignments if Samza decides to get and use this
information in the future. Can you be more specific about why the
partition-to-task mapping is not meaningful without some definition of the
key-to-partition assignments, and why it is incomplete and misleading?


Thanks,
Dong

On Mon, Jun 12, 2017 at 3:54 PM, Jacob Maes  wrote:

> Hey Dong,
>
> I'm opposed (or a +0, at best) to this limited, Kafka-specific solution. I
> understand that the proposal is relatively simple to implement, but I think
> it will cause headaches for Samza users. They will not only have to
> understand all the limitations (increase only, double partitions only,
> partition using hash+modulo, etc) of this approach, but enforcing these
> limitations can be a major problem, especially when the Samza jobs and
> message brokers are managed by separate orgs in a company. Separate orgs
> are often difficult to coordinate and a system which depends on such
> significant process/coordination is too fragile for my taste.
>
> That said, I realize that my opinion is just one of many in the broader
> community which may feel differently, so let me respond to some of the
> other items in the discussion so we can 

Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-12 Thread xinyu liu
How about making the partition mapping function a pluggable component in
the partition expansion? Mathematically, this is a mapping function which
is able to map the new partitions to the old ones:

  *f (new partition) -> old partition*

If the function is a surjective function (
https://en.wikipedia.org/wiki/Surjective_function), we are able to keep the
tasks as they were by replacing the old partition assignment with the new
one using the mapping function. By making this function pluggable, users
can provide their own mapping functions to make this work for different
kinds of input systems. Samza should check whether the function is
surjective so it knows whether we can keep the same task count with a
different grouping. For Kafka, we can provide a simple modulo function as
the mapping, and it's surjective. I agree it would be very nice to have more
general support for splitting the states of tasks, expanding the
changelog, etc., but this SEP is still useful and can address quite a large
number of scenarios in practice. Do you agree?
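
The idea above can be sketched in a few lines of Python. This is only an
illustration under stated assumptions, not a proposed Samza API: the function
names are hypothetical, partitions are numbered from 0, and there is one task
per old partition with the task id equal to the old partition id.

```python
from collections import defaultdict


def modulo_mapping(old_partition_count):
    """Return f(new partition) -> old partition for a hash+modulo system."""
    def mapping(new_partition):
        return new_partition % old_partition_count
    return mapping


def is_surjective(mapping, old_partition_count, new_partition_count):
    """True if the image of the mapping is exactly the set of old partitions."""
    image = {mapping(p) for p in range(new_partition_count)}
    return image == set(range(old_partition_count))


def regroup(mapping, old_partition_count, new_partition_count):
    """Assign each new partition to the task that owned its old partition."""
    if not is_surjective(mapping, old_partition_count, new_partition_count):
        raise ValueError("mapping is not surjective; cannot keep the task count")
    tasks = defaultdict(list)
    for new_partition in range(new_partition_count):
        tasks[mapping(new_partition)].append(new_partition)
    return dict(tasks)


if __name__ == "__main__":
    # Expanding from 4 to 8 partitions keeps 4 tasks, each reading 2 partitions.
    print(regroup(modulo_mapping(4), 4, 8))
```

A pluggable design would let a user swap modulo_mapping for a function that
matches their input system, while the surjectivity check stays the same.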

Thanks,
Xinyu

On Mon, Jun 12, 2017 at 3:54 PM, Jacob Maes  wrote:

> Hey Dong,
>
> I'm opposed (or a +0, at best) to this limited, Kafka-specific solution. I
> understand that the proposal is relatively simple to implement, but I think
> it will cause headaches for Samza users. They will not only have to
> understand all the limitations (increase only, double partitions only,
> partition using hash+modulo, etc) of this approach, but enforcing these
> limitations can be a major problem, especially when the Samza jobs and
> message brokers are managed by separate orgs in a company. Separate orgs
> are often difficult to coordinate and a system which depends on such
> significant process/coordination is too fragile for my taste.
>
> That said, I realize that my opinion is just one of many in the broader
> community which may feel differently, so let me respond to some of the
> other items in the discussion so we can clear them up:
>
> The task-to-container assignment matters because if the correlated tasks
> > (i.e. tasks that consume messages with the same key) needs to be in the
> > same container so that they can share the same key/value local store on
> the
> > same physical machine.
>
> There is currently no supported way of sharing state among the tasks of a
> container.  Each task has its own isolated store and that logical isolation
> is the primary thing that enables Samza jobs to scale with a simple
> container count change. My feeling is that we should not change this
> without good reason.
>
> I think we can hardcode new logic in KafkaCheckpointLogKey.scala such that
> > exception will not be thrown if new grouper is
> > GroupByPartitionWithFixedTaskNum and old grouper is GroupByPartition.
> Does
> > this look reasonable?
>
> With the current proposal, we'd also need a similar check for
> GroupBySystemStreamPartitionWithFixedTaskNum as well. And if any other
> groupers were later added with both these modes, we'd probably need to add
> those too. It might be easier and cleaner to add a config to ignore that
> check temporarily. Down side is that it further complicates the Samza
> config, which is already huge. Thoughts?
>
> I think storing the previous task-to-partition mapping is more general than
> > storing the partition count of all topics for the following reasons:
> > - Samza already stores the task-to-container mapping and
> container-to-host
> > mapping in the coordinator stream. It seems consistent to also store the
> > partition-to-task mapping. And this information may be useful for other
> > use-case such as debugging.
> > - By having the new interface take the previous task-to-partition
> > assignment instead of a topic-to-partition-count mapping as new
> parameter,
> > we can potentially have grouper implementation to support other types of
> > input systems.
> > - It is sightly simpler to store the task-to-partition assignment because
> > we don't need to know whether this is the first time a job is started or
> > not. On the other hand, you can write topic-to-partition-count mapping to
> > the coordinator stream only if this is the first time the job is run
>
> The task-to-container and container-to-host mappings are both meaningful in
> context of the JobModel. Partition-to-task mapping is not meaningful
> without some definition of the key-to-partition assignments. It's
> incomplete information and therefore misleading. I think it only makes
> sense to use this mapping if we adopt a solution wherein Samza also knows
> the partition key assignment.
>
> -Jake
>
> On Tue, Jun 6, 2017 at 11:06 PM, Dong Lin  wrote:
>
> > Hey Jacob,
> >
> > Thanks for taking time to review the SEP.
> >
> > I agree with you and Navina that the current SEP doesn't provide support
> to
> > arbitrary input systems and it doesn't support partition shrink. I think
> > the scope of this SEP is to support partition expansion for 

[GitHub] samza pull request #223: SAMZA-<1324> :

2017-06-12 Thread PawasChhokra
GitHub user PawasChhokra opened a pull request:

https://github.com/apache/samza/pull/223

SAMZA-<1324> : 

Added a metrics class for ZK based job coordinator that reports a few 
metrics.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/PawasChhokra/samza ZkJobCoordinatorMetrics

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/223.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #223


commit 71f26c8b080bfdc709485d0ea69d0ae1a3b2
Author: Pawas Chhokra 
Date:   2017-06-12T23:11:10Z

Added initial metrics for ZooKeeper based Job Coordinator




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---


Re: [DISCUSS] SEP-5: Enable partition expansion of input streams

2017-06-12 Thread Jacob Maes
Hey Dong,

I'm opposed (or a +0, at best) to this limited, Kafka-specific solution. I
understand that the proposal is relatively simple to implement, but I think
it will cause headaches for Samza users. They will not only have to
understand all the limitations (increase only, double partitions only,
partition using hash+modulo, etc) of this approach, but enforcing these
limitations can be a major problem, especially when the Samza jobs and
message brokers are managed by separate orgs in a company. Separate orgs
are often difficult to coordinate and a system which depends on such
significant process/coordination is too fragile for my taste.

That said, I realize that my opinion is just one of many in the broader
community which may feel differently, so let me respond to some of the
other items in the discussion so we can clear them up:

> The task-to-container assignment matters because correlated tasks
> (i.e. tasks that consume messages with the same key) need to be in the
> same container so that they can share the same key/value local store on the
> same physical machine.

There is currently no supported way of sharing state among the tasks of a
container.  Each task has its own isolated store and that logical isolation
is the primary thing that enables Samza jobs to scale with a simple
container count change. My feeling is that we should not change this
without good reason.

> I think we can hardcode new logic in KafkaCheckpointLogKey.scala such that
> an exception will not be thrown if the new grouper is
> GroupByPartitionWithFixedTaskNum and the old grouper is GroupByPartition.
> Does this look reasonable?

With the current proposal, we'd need a similar check for
GroupBySystemStreamPartitionWithFixedTaskNum as well. And if any other
groupers were later added with both these modes, we'd probably need to add
those too. It might be easier and cleaner to add a config to ignore that
check temporarily. The downside is that it further complicates the Samza
config, which is already huge. Thoughts?

> I think storing the previous task-to-partition mapping is more general than
> storing the partition count of all topics for the following reasons:
> - Samza already stores the task-to-container mapping and container-to-host
> mapping in the coordinator stream. It seems consistent to also store the
> partition-to-task mapping. And this information may be useful for other
> use cases such as debugging.
> - By having the new interface take the previous task-to-partition
> assignment instead of a topic-to-partition-count mapping as a new parameter,
> we can potentially have grouper implementations that support other types of
> input systems.
> - It is slightly simpler to store the task-to-partition assignment because
> we don't need to know whether this is the first time a job is started or
> not. On the other hand, you can write the topic-to-partition-count mapping
> to the coordinator stream only if this is the first time the job is run.

The task-to-container and container-to-host mappings are both meaningful in
context of the JobModel. Partition-to-task mapping is not meaningful
without some definition of the key-to-partition assignments. It's
incomplete information and therefore misleading. I think it only makes
sense to use this mapping if we adopt a solution wherein Samza also knows
the partition key assignment.

-Jake

On Tue, Jun 6, 2017 at 11:06 PM, Dong Lin  wrote:

> Hey Jacob,
>
> Thanks for taking time to review the SEP.
>
> I agree with you and Navina that the current SEP doesn't provide support to
> arbitrary input systems and it doesn't support partition shrink. I think
> the scope of this SEP is to support partition expansion for Kafka (the most
> widely used input system of Samza) and keep the door open for partition
> support of various input systems. The current design can support any system
> that meets the two operational requirement specified in the doc.
>
> While it is possible to support more types of input systems, it will likely
> add more complexity to the design. For example, the first alternative
> solution from you requires broker-side support to negotiate hash algorithm.
> The second alternative solution requires changelog partition reshuffle
> which carries its own design complexity and performance overhead. There is
> tradeoff between the generality and the complexity among these choices. I
> like the current design because it is simple and addresses a big usage
> scenario for us. We can add more complexity to generalize the design if it
> enables important use-case. Does this sound reasonable?
>
> Note that the "Rejected Alternative" section also mentions the possibility
> of supporting a wider range of input systems by allowing user to specify
> the new-partition to old-partition mapping. We are not doing it because 1)
> we may have better understanding of the design after we have a specific
> second input system to support 2) the current design can be extended to
> support 

[GitHub] samza pull request #220: SAMZA-1330: Stand alone feature preview, known limi...

2017-06-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/samza/pull/220




[GitHub] samza pull request #217: SAMZA-1327: create zk namespace if does not exist

2017-06-12 Thread sborya
Github user sborya closed the pull request at:

https://github.com/apache/samza/pull/217




[GitHub] samza pull request #218: SAMZA-1327: fail if namespace specified in the conn...

2017-06-12 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/samza/pull/218

