Re: Custom ordering when using async

2017-08-10 Thread Jagadish Venkatraman
Hi XiaoChuan,

Are you setting task.max.concurrency > 1, which allows multiple messages
in flight? (The "keyed executor pool" is only meaningful in that
scenario.)

Also, have you tried increasing your job.container.thread.pool.size config
and setting it to the number of tasks in the container? Given that your
input topic is already partitioned by memberID, it'll probably be simpler
to try this first, benchmark your QPS, and see if it meets your performance
goals. I'd tune these config knobs first and only then confirm whether you
need the "keyed executor thread pool". You may find that it introduces more
complexity.

Please let us know if you have further questions. We are happy to help
you further tune your job for maximum performance.
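
For reference, the two settings mentioned above might look like this in a job's properties file. This is only a sketch: the values are illustrative assumptions, not recommendations, and the right numbers depend on the number of tasks per container and on benchmarking.

```properties
# Option suggested first in this reply: size the container thread pool to the
# number of tasks in the container (8 is an assumed task count).
job.container.thread.pool.size=8

# Only relevant when using processAsync with several messages in flight per
# task (4 is an assumed value); with values > 1, completions can go out of
# order within a task unless ordering is handled explicitly.
task.max.concurrency=4
```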



On Wed, Aug 9, 2017 at 4:03 PM, xinyu liu  wrote:

> Hi, XiaoChuan,
>
> For your questions:
>
> 1. A "keyed single thread executor pool" means something like a map
> from a key to a single-thread executor, e.g. Map<Integer, Executor>, where
> each Executor is created by Executors.newSingleThreadExecutor(). This means
> that work for a particular key is always executed on its designated thread,
> which guarantees the ordering for that key.
>
> 2. For your use case, you can create the above keyed executors by setting
> the key to some hash of the user ID. For example:
>
> Map<Integer, Executor> keyedExecutors = new HashMap<>();
>
> in processAsync():
> String memberId = 
> int hash = memberId.hashCode(); // you can reduce the hash size by %
> Executor executor = keyedExecutors.get(hash);
> if (executor == null) {
>   executor = Executors.newSingleThreadExecutor();
>   keyedExecutors.put(hash, executor);
> }
>
> executor.execute(() -> process your message here);
> ...
>
> So messages for the same user will always be executed on a single thread,
> which ensures the ordering. Does this make sense to you?
>
> Thanks,
> Xinyu
>
>
>
> On Wed, Aug 9, 2017 at 10:07 AM, XiaoChuan Yu 
> wrote:
>
> > Hi,
> >
> > I have a few questions regarding the order of processing when using
> > processAsync.
> >
> > The LinkedIn article here
> > <...processing-and-multithreading-in-apache-samza--part>
> > mentions the following:
> > "For parallelism within a task, Samza guarantees processAsync will be
> > invoked in order for a task. The processing or completion, however, can go
> > out of order. With this guarantee, users can implement sub-task-level data
> > pipelining with customized ordering and parallelism. For example, users can
> > use a keyed single thread executor pool to have in-order processing per key
> > while processing messages with different keys in parallel."
> >
> > 1. What exactly is meant by a "keyed single thread executor pool"? Are
> > there any code examples available on what this looks like?
> > 2. I need to process a stream keyed on user IDs in parallel using
> > processAsync but would like each user's events to be processed in order. Does
> > this then require the custom ordering logic mentioned in the article?
> >
> > Thanks,
> > Xiaochuan Yu
> >
>



-- 
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University
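
For readers of this thread, the per-key executor idea described above could be sketched roughly as follows. This is a minimal, self-contained illustration and not Samza API: the class name KeyedExecutorPool, the method names, and the fixed pool size are all assumptions made for the example. It uses a fixed array of single-thread executors indexed by the key's hash modulo the pool size (the "reduce the hash size by %" suggestion), so the number of threads stays bounded.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Hypothetical helper (not part of Samza): routes all work for a given key
// to one fixed single-thread executor so that per-key order is preserved.
final class KeyedExecutorPool {
    private final ExecutorService[] executors;

    KeyedExecutorPool(int size) {
        executors = new ExecutorService[size];
        for (int i = 0; i < size; i++) {
            executors[i] = Executors.newSingleThreadExecutor();
        }
    }

    // floorMod keeps the index non-negative even when hashCode() is negative.
    int slotFor(String key) {
        return Math.floorMod(key.hashCode(), executors.length);
    }

    // Work submitted for the same key runs on the same thread, in submission order.
    void execute(String key, Runnable work) {
        executors[slotFor(key)].execute(work);
    }

    // Stops accepting work and waits up to 10 seconds per executor.
    // Returns true only if every executor finished its queued work.
    boolean shutdownAndWait() {
        for (ExecutorService e : executors) {
            e.shutdown();
        }
        boolean done = true;
        try {
            for (ExecutorService e : executors) {
                done &= e.awaitTermination(10, TimeUnit.SECONDS);
            }
        } catch (InterruptedException ie) {
            Thread.currentThread().interrupt();
            return false;
        }
        return done;
    }

    public static void main(String[] args) {
        KeyedExecutorPool pool = new KeyedExecutorPool(4);
        List<String> log = Collections.synchronizedList(new ArrayList<>());
        for (int i = 0; i < 3; i++) {
            final int seq = i;
            pool.execute("member-A", () -> log.add("A" + seq));
            pool.execute("member-B", () -> log.add("B" + seq));
        }
        pool.shutdownAndWait();
        // A0, A1, A2 appear in that relative order, as do B0, B1, B2;
        // how the A and B entries interleave may vary between runs.
        System.out.println(log);
    }
}
```

In an actual AsyncStreamTask, the work submitted from processAsync would presumably also need to signal completion through the task callback once it finishes; that part is omitted here to keep the sketch free of Samza dependencies.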


Re: [Discuss] Samza 0.13.1 release

2017-08-10 Thread Jagadish Venkatraman
+1 for the release. Thanks for the summary and for driving this, Fred!

On Thu, Aug 10, 2017 at 5:15 PM Fred Haifeng Ji 
wrote:

> The format was messed up when sent from my yahoo mail to
> dev@samza.apache.org. I am resending it from my gmail account. Sorry for
> the inconvenience!
>
> Hi all,
>
> There have been some new features and critical bug fixes added to master
> since the 0.13.0 release, which make Samza Standalone features more stable. It
> is now good enough to warrant *a new minor release*. We will continue to
> test for stability and performance in the next few weeks.
>
> Here are the main JIRA tickets that will be included in this release (but
> not limited to):
> SAMZA-1165: Cleanup data created by ZkStandalone in ZK;
> SAMZA-1324: Add a metricsreporter lifecycle for JobCoordinator component of
> StreamProcessor;
> SAMZA-1336: Standalone session expiration propagation;
> SAMZA-1337: LocalApplicationRunner needs to support StreamTask;
> SAMZA-1339: Add standalone integration tests;
> …
>
> There are also quite a few bug fixes in 0.13.1; please check the complete
> list of changes in 0.13.1 here:
> <https://issues.apache.org/jira/browse/SAMZA-1165?jql=project%20%3D%2012314526%20AND%20fixVersion%20%3D%2012340845%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC>
>
> Most JIRAs in the list have been completed and merged, with the following
> one remaining, but we should try to get it completed before 0.13.1 is
> released.
> SAMZA-1385: Coordination utils in LocalApplicationRunner uses same Zk node
> as ZkJobCoordinatorFactory for leader election
>
> Here's what I propose:
> 1. Cut a 0.13.1 release branch.
> 2. Work on getting the remaining open JIRA done.
> 3. Target a release vote by Aug 18.
>
> Thoughts?
>
> Fred
>
-- 
Sent from my iphone.


Re: [Discuss] Samza 0.13.1 release

2017-08-10 Thread Fred Haifeng Ji
The format was messed up when sent from my yahoo mail to
dev@samza.apache.org. I am resending it from my gmail account. Sorry for
the inconvenience!

Hi all,

There have been some new features and critical bug fixes added to master
since the 0.13.0 release, which make Samza Standalone features more stable. It
is now good enough to warrant *a new minor release*. We will continue to
test for stability and performance in the next few weeks.

Here are the main JIRA tickets that will be included in this release (but
not limited to):
SAMZA-1165: Cleanup data created by ZkStandalone in ZK;
SAMZA-1324: Add a metricsreporter lifecycle for JobCoordinator component of
StreamProcessor;
SAMZA-1336: Standalone session expiration propagation;
SAMZA-1337: LocalApplicationRunner needs to support StreamTask;
SAMZA-1339: Add standalone integration tests;
…

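There are also quite a few bug fixes in 0.13.1; please check the complete
list of changes in 0.13.1 here:
<https://issues.apache.org/jira/browse/SAMZA-1165?jql=project%20%3D%2012314526%20AND%20fixVersion%20%3D%2012340845%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC>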

Most JIRAs in the list have been completed and merged, with the following
one remaining, but we should try to get it completed before 0.13.1 is
released.
SAMZA-1385: Coordination utils in LocalApplicationRunner uses same Zk node
as ZkJobCoordinatorFactory for leader election

Here's what I propose:
1. Cut a 0.13.1 release branch.
2. Work on getting the remaining open JIRA done.
3. Target a release vote by Aug 18.

Thoughts?

Fred


[GitHub] samza pull request #268: Fix ZkLatch await(timeout, TimeUnit) api.

2017-08-10 Thread shanthoosh
GitHub user shanthoosh opened a pull request:

https://github.com/apache/samza/pull/268

Fix ZkLatch await(timeout, TimeUnit) api.

Use the passed-in timeUnit value for the zkClient.waitUntilExists method rather than
hardcoding `TimeUnit.MILLISECONDS`.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/shanthoosh/samza fix_zklatch_impl

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/268.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #268


commit 7169fb919f20771f632ecffe636570ae7aef64df
Author: Shanthoosh Venkataraman 
Date:   2017-08-11T00:11:15Z

Fix ZkLatch await implementation.

Use the passed-in timeUnit value for the waitUntilExists method (rather than
hardcoding it as millis).




---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at infrastruct...@apache.org or file a JIRA ticket
with INFRA.
---
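
The kind of bug described in this pull request can be illustrated with a small sketch. The actual ZkLatch code is not shown in the notification, so everything below is an assumption made for illustration: PathWaiter is a hypothetical stand-in for the ZooKeeper client's wait call, and LatchSketch is a hypothetical wrapper. The point is only that an await(timeout, unit) method should forward the caller's unit instead of substituting a fixed one.

```java
import java.util.concurrent.TimeUnit;

// Hypothetical stand-in for the ZooKeeper client wait call named in the PR;
// this is not the real client class.
interface PathWaiter {
    boolean waitUntilExists(String path, TimeUnit unit, long time);
}

// Hypothetical latch wrapper contrasting the two behaviours.
final class LatchSketch {
    private final PathWaiter waiter;
    private final String path;

    LatchSketch(PathWaiter waiter, String path) {
        this.waiter = waiter;
        this.path = path;
    }

    // Buggy variant: the caller's unit is ignored, so await(5, SECONDS)
    // is treated as a 5 millisecond wait.
    boolean awaitHardcoded(long timeout, TimeUnit unit) {
        return waiter.waitUntilExists(path, TimeUnit.MILLISECONDS, timeout);
    }

    // Fixed variant: the caller's unit is passed through unchanged.
    boolean await(long timeout, TimeUnit unit) {
        return waiter.waitUntilExists(path, unit, timeout);
    }

    public static void main(String[] args) {
        // A fake waiter that only reports how long it would have waited.
        PathWaiter printing = (p, unit, time) -> {
            System.out.println("would wait " + unit.toMillis(time) + " ms for " + p);
            return true;
        };
        LatchSketch latch = new LatchSketch(printing, "/latch/example");
        latch.awaitHardcoded(5, TimeUnit.SECONDS); // prints "would wait 5 ms for /latch/example"
        latch.await(5, TimeUnit.SECONDS);          // prints "would wait 5000 ms for /latch/example"
    }
}
```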


Re: [Discuss] Samza 0.13.1 release

2017-08-10 Thread Jacob Maes
Hey Fred,

The email seems to have some whitespace issues. Can you try sending it
again?

Also, I'd like to include the patch for
https://issues.apache.org/jira/browse/SAMZA-1387 in the release. The PR is
up for review now. Please take a look and let me know what you think.

-Jake

On Thu, Aug 10, 2017 at 4:12 PM, Fred Ji  wrote:

> Hi all,
>
> There have been some new features and critical bug fixes added to master
> since the 0.13.0 release, which make Samza Standalone features more stable. It
> is now good enough to warrant a new minor release. We will continue to test
> for stability and performance in the next few weeks.
>
> Here are the main JIRA tickets that will be included in this release (but
> not limited to):
> SAMZA-1165: Cleanup data created by ZkStandalone in ZK
> SAMZA-1324: Add a metricsreporter lifecycle for JobCoordinator component
> of StreamProcessor
> SAMZA-1336: Standalone session expiration propagation
> SAMZA-1337: LocalApplicationRunner needs to support StreamTask
> SAMZA-1339: Add standalone integration tests
> …
>
> There are also quite a few bug fixes in 0.13.1; please check the complete
> list of changes in 0.13.1 here:
> <https://issues.apache.org/jira/browse/SAMZA-1165?jql=project%20%3D%2012314526%20AND%20fixVersion%20%3D%2012340845%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC>
>
> Most JIRAs in the list have been completed and merged, with the following
> one remaining, but we should try to get it completed before 0.13.1 is
> released.
> SAMZA-1385: Coordination utils in LocalApplicationRunner uses same
> Zk node as ZkJobCoordinatorFactory for leader election
>
> Here's what I propose:
> 1. Cut a 0.13.1 release branch.
> 2. Work on getting the remaining open JIRA done.
> 3. Target a release vote by Aug 18.
>
> Thoughts?
>
> Fred


[GitHub] samza pull request #267: Zk lock example

2017-08-10 Thread sborya
GitHub user sborya opened a pull request:

https://github.com/apache/samza/pull/267

Zk lock example

This is a suggested ZkLock implementation. The PR is not ready for final
review.
@navina 
@PawasChhokra 

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sborya/samza ZkLockExample

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/267.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #267


commit d1618d8bbc2cd5f6058d0cb91f58a2a7e9ef4ee9
Author: Boris Shkolnik 
Date:   2017-08-10T23:47:45Z

add ZkLock example

commit d76f2cd8952b7fc5374aedd6a88ae22769e4ac0f
Author: Boris Shkolnik 
Date:   2017-08-10T23:50:33Z

minor changes






[GitHub] samza pull request #266: SAMZA-1387: Unable to Start Samza App Because Regex...

2017-08-10 Thread jmakes
GitHub user jmakes opened a pull request:

https://github.com/apache/samza/pull/266

SAMZA-1387: Unable to Start Samza App Because Regex Check



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/jmakes/samza samza-1387

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/266.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #266


commit 71eb968711015d284c0a22cc86fa875d42ae07fc
Author: Jacob Maes 
Date:   2017-08-10T23:49:44Z

SAMZA-1387: Unable to Start Samza App Because Regex Check






Re: Using Amazon Kinesis as Samza Backend

2017-08-10 Thread Yi Pan
Hi, Christopher,

I am glad that you are interested in the Kinesis integration with Samza! Yes,
Renato's implementation is so far the most complete one. However, I think
that it might not handle dynamic shard split/merge cases. Aditya Toomula on
our team is currently working on a complete proposal to address these
issues, as listed in https://issues.apache.org/jira/browse/SAMZA-489. He
will be able to share more details.

-Yi

On Sun, Aug 6, 2017 at 4:16 PM, Christopher Kelly  wrote:

> Hi all!
>
> I am relatively new to using Samza and am trying to hook it up with Kinesis
> rather than Kafka as the messaging bus behind it.  There are a few
> references to functionality intended to make working with Kinesis a bit
> easier (in the latest release notes for example), but I haven't seen much
> code with examples.
>
> I think the most complete implementation I've seen is this GitHub repo:
> https://github.com/renato2099/SamzaKinesis, but I did notice that it hasn't
> been updated for a bit.  Has anyone had any experience working with Kinesis
> as the backend? Any guidance would be appreciated!
>
> Thanks,
> Chris
>


[Discuss] Samza 0.13.1 release

2017-08-10 Thread Fred Ji
Hi all,

There have been some new features and critical bug fixes added to master
since the 0.13.0 release, which make Samza Standalone features more stable. It
is now good enough to warrant a new minor release. We will continue to test
for stability and performance in the next few weeks.

Here are the main JIRA tickets that will be included in this release (but
not limited to):
SAMZA-1165: Cleanup data created by ZkStandalone in ZK
SAMZA-1324: Add a metricsreporter lifecycle for JobCoordinator component of
StreamProcessor
SAMZA-1336: Standalone session expiration propagation
SAMZA-1337: LocalApplicationRunner needs to support StreamTask
SAMZA-1339: Add standalone integration tests
…

There are also quite a few bug fixes in 0.13.1; please check the complete
list of changes in 0.13.1 here:
<https://issues.apache.org/jira/browse/SAMZA-1165?jql=project%20%3D%2012314526%20AND%20fixVersion%20%3D%2012340845%20ORDER%20BY%20priority%20DESC%2C%20key%20ASC>

Most JIRAs in the list have been completed and merged, with the following
one remaining, but we should try to get it completed before 0.13.1 is
released.
SAMZA-1385: Coordination utils in LocalApplicationRunner uses same Zk node
as ZkJobCoordinatorFactory for leader election

Here's what I propose:
1. Cut a 0.13.1 release branch.
2. Work on getting the remaining open JIRA done.
3. Target a release vote by Aug 18.

Thoughts?

Fred

[GitHub] samza pull request #263: SAMZA-1384: Race condition with async commit affect...

2017-08-10 Thread asfgit
Github user asfgit closed the pull request at:

https://github.com/apache/samza/pull/263

