[DISCUSS] SEP-8: Add in-memory system consumer & producer

2017-08-02 Thread Bharath Kumara Subramanian
Hi all,

I have created SEP-8 for in-memory system support.
Please find the design doc here


Let me know your thoughts and feedback.

Thanks,
Bharath


[CANCEL][VOTE] Apache Samza 0.13.0 release

2017-05-19 Thread Bharath Kumara Subramanian


Hi all,

This is an official CANCEL for the RC0 vote, and a status update for the
new RC. Resending this since previous emails did not get delivered to the
list.

We found a few issues in RC0 during further testing.
The following issues are already fixed and will be available in new RC:
SAMZA-1286: Close zk connection in ZkController.stop()
SAMZA-1288: Add null check for sink OutputStream
SAMZA-1289: Default id generator if not configured

We're waiting for the following additional items, after which we'll create
the new RC.
SAMZA-1064: Remove dependency of debounce timer from the CoordinationUtils
SAMZA-1298: if ZkConnect string contains extra path, it needs to be created
on the ZK.

We're downgrading the priority for the following from blocker to major. It
might not be part of 0.13.0: SAMZA-1153: Metrics should be added for ZK
based JobCoordinator

We're also updating the documentation and hello-samza tutorials for the new
features. New documentation will be available as part of the final 0.13.0
release.

Thanks,
Prateek


[VOTE] SEP-8: Add in-memory system consumer & producer

2017-09-06 Thread Bharath Kumara Subramanian
Hi all,

Can you please vote for SEP-8?
You can find the design document here
.

Thanks,
Bharath


[DISCUSS] Samza 0.14.0 release

2017-11-27 Thread Bharath Kumara Subramanian
Hi all,



We have added couple of major features to master since 0.13.1 that warrants
a major release.

Within LinkedIn, some of these features have already been tested as part of
our test suites. We plan to continue our testing in coming weeks to
validate the stability prior to release.

We wanted to kick off the discussion in open source forum to keep the
momentum flowing.



Here is the list of features that are part of the new release

   - SAMZA-1510  - Samza
   SQL
   - SAMZA-1417  - Add
   support for multistage batch to Samza on Hadoop
   - SAMZA-1438  - Event-hub
   connectors for Samza



We have also worked on stabilizing our 0.13 features. Here are some
highlights

   - SAMZA-1454 ,
   SAMZA-1493  - Add
   support for durable state for high level API
   - SAMZA-1417 
   SAMZA-1330  SAMZA-1289
    - Stabilization of
   ZooKeeper based deployment model
   - SAMZA-1471 ,
   SAMZA-1392 , SAMZA-1465
    - Performance
   improvements



You can find the concrete list of the features here

.



Here is my proposal on our release schedule and timelines.

   1. Create a release candidate with the current 0.14.0 HEAD
   2. Target a release vote on the week Dec 4st



Thoughts?



Thanks,

Bharath


[VOTE] Apache Samza 0.14.0 RC2

2017-12-13 Thread Bharath Kumara Subramanian
Hi,



This is a call for vote for the release of Apache Samza 0.14.0. We are
excited to see new features and contributors in this release.



The release candidate is signed with the pgp key, which can be found on
keyservers:

https://pgp.mit.edu/pks/lookup?op=get=0xE0A84E1C9C19AA37



The git tag is release-0.14.0-rc2 and signed with the key C31D7061
:

https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-0.14.0-rc2



Test binaries have been published to Maven's staging repository, and are
available here:

https://repository.apache.org/content/repositories/orgapachesamza-1035/


70 issues have been resolved as part of this release

*https://issues.apache.org/jira/browse/SAMZA-1534?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20%3D%200.14.0%20AND%20(status%20%3D%20Resolved%20OR%20status%20%3D%20Closed)
*



The vote will be open for 72 hours (ending at 15:00 Friday, 12/18/2017).



Please download the release candidate, check the hashes/signature, build it

and test it, and then please vote:





[ ] +1 approve



[ ] +0 no opinion



[ ] -1 disapprove (and reason why)





Thanks,

Bharath


[CANCEL][VOTE] Apache Samza 0.14.0 RC1

2017-12-05 Thread Bharath Kumara Subramanian
Hi,

This is an official CANCEL for the RC1 vote and a status update.

We are waiting on the following items

   - SAMZA-1524  - Create
   Eventhub example in hello-samza
   - SAMZA-1525  - Add
   Eventhub documentation
   - SAMZA-1526  - Add
   tooling and documentation for Samza SQL


We will have an update once the new RC is available.

Thanks,
Bharath


[VOTE] Apache Samza 0.14.0 RC1

2017-12-05 Thread Bharath Kumara Subramanian
Hi,



This is a call for vote for the release of Apache Samza 0.14.0. We are
excited to see new features and contributors in this release.



The release candidate is signed with the pgp key, which can be found on
keyservers:

https://pgp.mit.edu/pks/lookup?op=get=0xE0A84E1C9C19AA37



The git tag is release-0.14.0-rc1 and signed with the key C31D7061
:

https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-0.14.0-rc1



Test binaries have been published to Maven's staging repository, and are
available here:

https://repository.apache.org/content/repositories/orgapachesamza-1034/


61 issues have been resolved as part of this release

https://issues.apache.org/jira/browse/SAMZA-1519?jql=
project%20%3D%20SAMZA%20AND%20fixVersion%20%3D%200.14.0%
20AND%20status%20%3D%20Resolved



The vote will be open for 72 hours (ending at 11:00 AM Friday, 12/08/2017).



Please download the release candidate, check the hashes/signature, build it

and test it, and then please vote:





[ ] +1 approve



[ ] +0 no opinion



[ ] -1 disapprove (and reason why)





Thanks,

Bharath


[CANCEL][VOTE] Apache Samza 0.14.0 RC0

2017-12-05 Thread Bharath Kumara Subramanian
Hi,

This is an official CANCEL for the RC0 vote and status update on the new RC.

We have few last minute cleanup and javadoc changes related to SerDe
support through APIs. Keep an eye out for the new RC.

We are also working on updating documentation and tutorials for Samza-SQL.
It will be available as the part of the final 0.14.0 release.


Thanks,
Bharath


[VOTE] Apache Samza 0.14.0 RC0

2017-12-03 Thread Bharath Kumara Subramanian
Hi,



This is a call for vote for the release of Apache Samza 0.14.0. We are
excited to see new features and contributors in this release.



The release candidate is signed with the pgp key, which can be found on
keyservers:

https://pgp.mit.edu/pks/lookup?op=get=0xE0A84E1C9C19AA37



The git tag is release-0.14.0-rc0 and signed with the key C31D7061
:

https://git-wip-us.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-0.14.0-rc0



Test binaries have been published to Maven's staging repository, and are
available here:

https://repository.apache.org/content/repositories/orgapachesamza-1033/


61 issues have been resolved as part of this release

https://issues.apache.org/jira/browse/SAMZA-1519?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20%3D%200.14.0%20AND%20status%20%3D%20Resolved



The vote will be open for 72 hours (ending at 8:00 AM Wednesday,
01/06/2017).



Please download the release candidate, check the hashes/signature, build it

and test it, and then please vote:





[ ] +1 approve



[ ] +0 no opinion



[ ] -1 disapprove (and reason why)





Thanks,

Bharath


[VOTE] SEP-21: Samza Async API for High Level

2019-03-25 Thread Bharath Kumara Subramanian
Hi all,


This is a call for a vote for SEP-21: Samza Async API for High Level


SEP-21 has been discussed and implemented using SAMZA-2055. For reference,
the design document can be found -
https://cwiki.apache.org/confluence/display/SAMZA/SEP-21%3A+Samza+Async+API+for+High+Level


Thanks,
Bharath


Re: [DISCUSS] Change Apache Samza git comments/merge email recipient to commits@samza

2019-04-11 Thread Bharath Kumara Subramanian
+1 to separate discussion and commit threads.

On Fri, Apr 5, 2019 at 5:23 PM Jagadish Venkatraman 
wrote:

> +1 to the proposal Xinyu.
>
> Thanks,
>
> On Fri, Apr 5, 2019 at 5:01 PM Prateek Maheshwari 
> wrote:
>
> > Thanks Xinyu. Separating discussions and commit messages sounds good to
> me.
> > I'm +1, but happy to keep it as-is if others find the commit emails
> useful.
> >
> > - Prateek
> >
> > On Thu, Apr 4, 2019 at 3:14 PM Xinyu Liu  wrote:
> >
> > > Hi, All,
> > >
> > > Our dev mailing list has been flooded with github comments/merges so
> it's
> > > really hard to see the meaningful discussions and user engagement.
> Shall
> > we
> > > move dev@Samza off the JIRA/github messages? We will use
> > > comm...@samza.apache.org as the recipients.
> > >
> > > In addition, I think we should have most of our open source discussions
> > in
> > > the dev mailing list to raise visibility as well as attract
> contributors.
> > >
> > > Please let me know if you are ok or have objections to this change.
> It's
> > > a +1 on my side.
> > >
> > > Thanks,
> > > Xinyu
> > >
> >
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>


Re: Supported Versions Question

2019-06-04 Thread Bharath Kumara Subramanian
Hi Darin,

The recommended dependency version in release notes are the ones we
validate our releases. Typically, the patch and minor versions of the
recommended dependency version works.

Here is the supported matrix for Kafka clients

Dependency Supported versions Not supported version Samza version

Kafka Clients
0.11.*
1.*, 2.*
0.10 and below
1.0
1.1
2.0.* 1.* and below 1.2 (Upcoming release)

Thanks,
Bharath


On Fri, May 31, 2019 at 11:07 AM Darin Douglass 
wrote:

> All,
>
> Is there any documentation around which versions of services Samza
> supports? I, for the life of me, cannot find anything that lists explicitly
> supported versions anywhere.
>
> For example: the last reference I find to a specific Kafka version is here
> in the Samza 1.0.0 release notes:
>
> Kafka upgrade
>
> This release upgrades Samza to use Kafka’s high-level consumer (Kafka
> v0.11.1.62). This brings latency and throughput benefits for Samza
> applications that consume from Kafka, in addition to bug-fixes. This also
> means Samza applications can now better their utilization of the underlying
> Kafka cluster.
>
> That blurb says Samza _uses_ a specific feature of Kafka v0.11.1.62 but
> not whether it supports anything passed that version.
>
> Thanks in advance!
> —
> Darin Douglass
>
> ===
> Forrester names Barracuda WAF a Strong Performer!
> Get the free report here!
> https://www.barracuda.com/WAFWave
>
> DISCLAIMER:
> This e-mail and any attachments to it contain confidential and proprietary
> material of Barracuda, its affiliates or agents, and is solely for the use
> of the intended recipient. Any review, use, disclosure, distribution or
> copying of this transmittal is prohibited except by or on behalf of the
> intended recipient. If you have received this transmittal in error, please
> notify the sender and destroy this e-mail and any attachments and all
> copies, whether electronic or printed.
> ===
>


Re: Supported Kafka version by Samza

2019-06-05 Thread Bharath Kumara Subramanian
HI Qi,

You are right. I forgot to update the other thread on this. I suspected
Kafka 1.0 was the reason for NPE since it had trouble instantiating the
logger.
Here is the Kafka client support matrix for our releases

Dependency Supported versions Not supported version Samza version

Kafka
0.11.*
1.*, 2.*
1.0
1.1
2.0.* 1.* and below 1.2(Upcoming release)
Hope it helps.

Thanks,
Bharath


On Tue, Jun 4, 2019 at 11:24 PM QiShu  wrote:

> Hi,
>
> Running environment:
> Samza version: 2.12-1.1.0
> Kafka cluster version: 2.12-1.1.0
> Hadoop version: 3.1.0
>
> When using Kafka_2.12-1.1.0.jar, Samza job failed to run when
> retrieving coordinator stream meta info from Kafka:
> 2019-06-03 17:26:11.176 [main] KafkaSystemAdmin [INFO] Fetching
> SystemStreamMetadata for topics [__samza_coordinator_canal-metrics-test_1]
> on system kafka
> 2019-06-03 17:26:11.179 [main] KafkaSystemAdmin [ERROR] Fetching system
> stream metadata for: [__samza_coordinator_canal-metrics-test_1] threw an
> exception.
> java.lang.NullPointerException
> at
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
> at
> org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:75)
> at org.slf4j.LoggerFactory.getLogger(LoggerFactory.java:253)
> at kafka.utils.Logging.logger(Logging.scala:43)
> at kafka.utils.Logging.debug(Logging.scala:62)
> at kafka.utils.Logging.debug$(Logging.scala:62)
> at
> org.apache.samza.system.kafka.KafkaSystemAdminUtilsScala$.debug(KafkaSystemAdminUtilsScala.scala:45)
> at
> org.apache.samza.system.kafka.KafkaSystemAdminUtilsScala$.assembleMetadata(KafkaSystemAdminUtilsScala.scala:74)
> at
> org.apache.samza.system.kafka.KafkaSystemAdminUtilsScala.assembleMetadata(KafkaSystemAdminUtilsScala.scala)
> at
> org.apache.samza.system.kafka.KafkaSystemAdmin.fetchSystemStreamMetadata(KafkaSystemAdmin.java:461)
> at
> org.apache.samza.system.kafka.KafkaSystemAdmin.access$400(KafkaSystemAdmin.java:76)
> at
> org.apache.samza.system.kafka.KafkaSystemAdmin$4.apply(KafkaSystemAdmin.java:340)
> at
> org.apache.samza.system.kafka.KafkaSystemAdmin$4.apply(KafkaSystemAdmin.java:337)
> at
> org.apache.samza.util.ExponentialSleepStrategy.run(ExponentialSleepStrategy.scala:90)
> at
> org.apache.samza.system.kafka.KafkaSystemAdmin.getSystemStreamMetadata(KafkaSystemAdmin.java:374)
> at
> org.apache.samza.system.kafka.KafkaSystemAdmin.getSystemStreamMetadata(KafkaSystemAdmin.java:292)
> at
> org.apache.samza.coordinator.stream.CoordinatorStreamSystemConsumer.register(CoordinatorStreamSystemConsumer.java:109)
> at org.apache.samza.job.JobRunner.run(JobRunner.scala:105)
> at
> org.apache.samza.runtime.RemoteApplicationRunner.lambda$run$0(RemoteApplicationRunner.java:76)
> at java.util.ArrayList.forEach(ArrayList.java:1255)
> at
> org.apache.samza.runtime.RemoteApplicationRunner.run(RemoteApplicationRunner.java:73)
> at
> org.apache.samza.runtime.ApplicationRunnerUtil.invoke(ApplicationRunnerUtil.java:49)
> at
> org.apache.samza.runtime.ApplicationRunnerMain.main(ApplicationRunnerMain.java:53)
> Exception in thread "main" org.apache.samza.SamzaException: Failed to run
> application
> at
> org.apache.samza.runtime.RemoteApplicationRunner.run(RemoteApplicationRunner.java:79)
> at
> org.apache.samza.runtime.ApplicationRunnerUtil.invoke(ApplicationRunnerUtil.java:49)
> at
> org.apache.samza.runtime.ApplicationRunnerMain.main(ApplicationRunnerMain.java:53)
> Caused by: org.apache.samza.SamzaException: java.lang.NullPointerException
> at
> org.apache.samza.system.kafka.KafkaSystemAdmin$5.apply(KafkaSystemAdmin.java:358)
> at
> org.apache.samza.system.kafka.KafkaSystemAdmin$5.apply(KafkaSystemAdmin.java:347)
> at
> org.apache.samza.util.ExponentialSleepStrategy.run(ExponentialSleepStrategy.scala:97)
> at
> org.apache.samza.system.kafka.KafkaSystemAdmin.getSystemStreamMetadata(KafkaSystemAdmin.java:374)
> at
> org.apache.samza.system.kafka.KafkaSystemAdmin.getSystemStreamMetadata(KafkaSystemAdmin.java:292)
> at
> org.apache.samza.coordinator.stream.CoordinatorStreamSystemConsumer.register(CoordinatorStreamSystemConsumer.java:109)
> at org.apache.samza.job.JobRunner.run(JobRunner.scala:105)
> at
> org.apache.samza.runtime.RemoteApplicationRunner.lambda$run$0(RemoteApplicationRunner.java:76)
> at java.util.ArrayList.forEach(ArrayList.java:1255)
> at
> org.apache.samza.runtime.RemoteApplicationRunner.run(RemoteApplicationRunner.java:73)
> ... 2 more
> Caused by: java.lang.NullPointerException
> at
> java.util.concurrent.ConcurrentHashMap.get(ConcurrentHashMap.java:936)
> at
> org.slf4j.impl.Log4jLoggerFactory.getLogger(Log4jLoggerFactory.java:75)
> at 

Re: REMINDER. [VOTE] Apache Samza 1.2.0 RC4

2019-06-05 Thread Bharath Kumara Subramanian
+1 (non-binding)

Verified build and test on Linux. I too noticed some intermittent failures
on mac for Scala 2.12.

Thanks,
Bharath

On Tue, Jun 4, 2019 at 2:00 PM Hai Lu  wrote:

> +1 (non-binding)
>
> Verified build and test on Linux box. On mac the test is failing but seems
> like flakiness not real failure.
>
> Thanks,
> Hai
>
> On Tue, Jun 4, 2019 at 1:55 PM santhosh venkat <
> santhoshvenkat1...@gmail.com>
> wrote:
>
> > +1(non-binding)
> >
> > 1. ./bin/check-all.sh succeeded
> > 2. ./bin/integration-tests.sh succeeded
> > 3. Expanded samza-tools and followed the tutorial steps for standalone
> SQL
> > examples Succeeded.
> > 4. Verified all sha1 hash code and asc signatures successfully
> >
> > Thanks,
> >
> >
> > On Tue, Jun 4, 2019 at 1:26 PM Xinyu Liu  wrote:
> >
> > > +1 (binding).
> > >
> > > run check-all.sh and the build passed.
> > >
> > > Having trouble running the integration tests in both linux and mac,
> > > possibly due to my local machine env.
> > >
> > > Thanks,
> > > Xinyu
> > >
> > > On Mon, Jun 3, 2019 at 11:00 AM Daniel Nishimura  >
> > > wrote:
> > >
> > > > check-all.sh and integration tests passed. +1 from me.
> > > >
> > > > Just a side note, the link in the original email is a broken link.
> The
> > > link
> > > > to the RC archive is: http://home.apache.org/~boryas/samza-1.2.0-rc4
> > > >
> > > > On Sun, Jun 2, 2019 at 5:00 PM Boris Shkolnik 
> > wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > This is a call for a vote on a release of Apache Samza 1.2.0.
> Thanks
> > to
> > > > > everyone who has contributed to this release.
> > > > >
> > > > >
> > > > > The release candidate can be downloaded from here:
> > > > > http://home.apache.org/~boryas/samza-1.2.0-rc
> > > > > 4
> > > > >
> > > > > (this release has a fix for standalone integration test)
> > > > >
> > > > > The release candidate is signed with pgp key 0x7D74D0CD5B5EB041,
> > which
> > > > can
> > > > > be found
> > > > >
> > >
> http://keyserver.ubuntu.com/pks/lookup?op=get=0x7d74d0cd5b5eb041
> > > > > <
> > >
> http://keyserver.ubuntu.com/pks/lookup?op=get=0xF8B95961A401BF0F
> > > > >
> > > > > The git tag is release-1.2.0-rc4 and signed with the same pgp key:
> > > > >
> > > > >
> > > >
> > >
> >
> https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.2.0-rc
> > > > > <
> > > > >
> > > >
> > >
> >
> https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.1.0-rc1
> > > > > >
> > > > > 4
> > > > >
> > > > > Test binaries have been published to Maven's staging repository,
> and
> > > are
> > > > > available here:
> > > > >
> > https://repository.apache.org/content/repositories/orgapachesamza-106
> > > > > <
> > > > >
> > > >
> > >
> >
> https://repository.apache.org/content/repositories/orgapachesamza-1065/org/
> > > > > >
> > > > > 9
> > > > >
> > > > > The vote will be open until 06:00 PM PST Monday, 06/03/2019.
> > > > >
> > > > >
> > > > > Please download the release candidate, check the hashes/signature,
> > > build
> > > > it
> > > > > and test it, and then please vote:
> > > > >
> > > > > [ ] +1 approve
> > > > >
> > > > > [ ] +0 no opinion
> > > > >
> > > > > [ ] -1 disapprove (and reason why)
> > > > >
> > > > > I ran check-all.sh and integration tests.
> > > > >
> > > > > +1 from my side.
> > > > >
> > > > > Thanks
> > > > >
> > > >
> > >
> >
>


Re: Can't start app in Yarn

2019-05-30 Thread Bharath Kumara Subramanian
 sasl.kerberos.ticket.renew.window.factor = 0.8
> sasl.mechanism = GSSAPI
> security.protocol = PLAINTEXT
> send.buffer.bytes = 131072
> ssl.cipher.suites = null
> ssl.enabled.protocols = [TLSv1.2, TLSv1.1, TLSv1]
> ssl.endpoint.identification.algorithm = null
> ssl.key.password = null
> ssl.keymanager.algorithm = SunX509
> ssl.keystore.location = null
> ssl.keystore.password = null
> ssl.keystore.type = JKS
> ssl.protocol = TLS
> ssl.provider = null
> ssl.secure.random.implementation = null
> ssl.trustmanager.algorithm = PKIX
> ssl.truststore.location = null
> ssl.truststore.password = null
> ssl.truststore.type = JKS
>
> 2019-05-30 09:59:11.191 [main] AdminClientConfig [WARN] The configuration
> 'zookeeper.connect' was supplied but isn't a known config.
> 2019-05-30 09:59:11.191 [main] AppInfoParser [INFO] Kafka version : 1.1.0
> 2019-05-30 09:59:11.191 [main] AppInfoParser [INFO] Kafka commitId :
> fdcf75ea326b8e07
> 2019-05-30 09:59:11.208 [main] Log4jControllerRegistration$ [INFO]
> Registered kafka:type=kafka.Log4jController MBean
> 2019-05-30 09:59:11.223 [main] KafkaSystemAdmin [INFO] Created
> KafkaSystemAdmin for system kafka
> Exception in thread "main" org.apache.samza.SamzaException: Failed to run
> application
> at
> org.apache.samza.runtime.RemoteApplicationRunner.run(RemoteApplicationRunner.java:79)
> at
> org.apache.samza.runtime.ApplicationRunnerUtil.invoke(ApplicationRunnerUtil.java:49)
> at
> org.apache.samza.runtime.ApplicationRunnerMain.main(ApplicationRunnerMain.java:53)
> Caused by: java.lang.NullPointerException
> at java.util.HashMap.merge(HashMap.java:1225)
> at
> java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1320)
> at
> java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
> at
> java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1696)
> at
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
> at
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
> at
> org.apache.samza.config.JavaSystemConfig.getSystemAdmins(JavaSystemConfig.java:84)
> at
> org.apache.samza.system.SystemAdmins.(SystemAdmins.java:38)
> at
> org.apache.samza.execution.StreamManager.(StreamManager.java:55)
> at
> org.apache.samza.execution.JobPlanner.buildAndStartStreamManager(JobPlanner.java:64)
> at
> org.apache.samza.execution.JobPlanner.getExecutionPlan(JobPlanner.java:94)
> at
> org.apache.samza.execution.RemoteJobPlanner.prepareJobs(RemoteJobPlanner.java:57)
> at
> org.apache.samza.runtime.RemoteApplicationRunner.run(RemoteApplicationRunner.java:67)
> ... 2 more
>
>
> Thanks.
>
> Qi Shu
>
> > 在 2019年5月30日,上午3:41,Bharath Kumara Subramanian 
> 写道:
> >
> > Hi Qi,
> >
> > Can you share your application configuration? Especially the systems your
> > application consumes and produces to and its related configuration.
> > Also, would it be possible for to attach the entire log?
> >
> > Thanks,
> > Bharath
> >
> > On Tue, May 28, 2019 at 7:07 PM QiShu  wrote:
> >
> >> Hi,
> >>
> >>Below is the running environment:
> >> Hadoop version 3.1.0
> >> java version “1.8.0_151"
> >> samza-api-1.1.0.jar
> >> samza-core_2.12-1.1.0.jar
> >> samza-kafka_2.12-1.1.0.jar
> >> samza-kv_2.12-1.1.0.jar
> >> samza-kv-inmemory_2.12-1.1.0.jar
> >> samza-kv-rocksdb_2.12-1.1.0.jar
> >> samza-log4j_2.12-1.1.0.jar
> >> samza-shell-1.1.0-dist.tgz
> >> samza-yarn_2.12-1.1.0.jar
> >> scala-compiler-2.12.1.jar
> >> scala-library-2.12.1.jar
> >> scala-logging_2.12-3.7.2.jar
> >> scala-parser-combinators_2.12-1.0.4.jar
> >> scala-reflect-2.12.4.jar
> >> scalate-core_2.12-1.8.0.jar
> >> scalate-util_2.12-1.8.0.jar
> >> scalatra_2.12-2.5.0.jar
> >> scalatra-common_2.12-2.5.0.jar
> >> scalatra-scalate_2.12-2.5.0.jar
> >> scala-xml_2.12-1.0.6.jar
> >> kafka_2.12-1.1.0.jar
> >> kafka-clients-1.1.0.jar
> >>
> >>Below is the exception when starting ap

Re: [ANNOUNCE] Please welcome Boris Shkolnik to the Samza PMC

2019-06-07 Thread Bharath Kumara Subramanian
Congratulations Boris!

On Fri, Jun 7, 2019 at 3:19 PM Jagadish Venkatraman 
wrote:

> Congratulations Boris!
>
> On Fri, Jun 7, 2019 at 3:15 PM Xinyu Liu  wrote:
>
> > Congrats, Boris!
> >
> > Xinyu
> >
> > On Fri, Jun 7, 2019 at 3:13 PM Jakob Homan  wrote:
> >
> > > Howdy all-
> > >I'm very pleased to announce that the Samza PMC has voted Boris
> > > Shkolnik to be a Project Management Committee (PMC) Member.  The PMC
> > > is responsible for the overall health of a project andl for voting in
> > > new committers and PMC members, as well as VOTEing on releases. Over
> > > the past two years, Boris has been a valuable committer on the
> > > project.
> > >
> > > Congrats Boris!
> > >
> > > Thanks,
> > >
> > > Jakob
> > > on behalf of the Samza PMC
> > >
> >
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>


Re: Can't start app in Yarn

2019-05-29 Thread Bharath Kumara Subramanian
Hi Qi,

Can you share your application configuration? Especially the systems your
application consumes and produces to and its related configuration.
Also, would it be possible for to attach the entire log?

Thanks,
Bharath

On Tue, May 28, 2019 at 7:07 PM QiShu  wrote:

> Hi,
>
> Below is the running environment:
> Hadoop version 3.1.0
> java version “1.8.0_151"
> samza-api-1.1.0.jar
> samza-core_2.12-1.1.0.jar
> samza-kafka_2.12-1.1.0.jar
> samza-kv_2.12-1.1.0.jar
> samza-kv-inmemory_2.12-1.1.0.jar
> samza-kv-rocksdb_2.12-1.1.0.jar
> samza-log4j_2.12-1.1.0.jar
> samza-shell-1.1.0-dist.tgz
> samza-yarn_2.12-1.1.0.jar
> scala-compiler-2.12.1.jar
> scala-library-2.12.1.jar
> scala-logging_2.12-3.7.2.jar
> scala-parser-combinators_2.12-1.0.4.jar
> scala-reflect-2.12.4.jar
> scalate-core_2.12-1.8.0.jar
> scalate-util_2.12-1.8.0.jar
> scalatra_2.12-2.5.0.jar
> scalatra-common_2.12-2.5.0.jar
> scalatra-scalate_2.12-2.5.0.jar
> scala-xml_2.12-1.0.6.jar
> kafka_2.12-1.1.0.jar
> kafka-clients-1.1.0.jar
>
> Below is the exception when starting app in Yarn:
> 2019-05-29 09:52:47.851 [main] AppInfoParser [INFO] Kafka version : 1.1.0
> 2019-05-29 09:52:47.851 [main] AppInfoParser [INFO] Kafka commitId :
> fdcf75ea326b8e07
> 2019-05-29 09:52:47.862 [main] Log4jControllerRegistration$ [INFO]
> Registered kafka:type=kafka.Log4jController MBean
> 2019-05-29 09:52:47.877 [main] KafkaSystemAdmin [INFO] Created
> KafkaSystemAdmin for system kafka
> Exception in thread "main" org.apache.samza.SamzaException: Failed to run
> application
> at
> org.apache.samza.runtime.RemoteApplicationRunner.run(RemoteApplicationRunner.java:79)
> at
> org.apache.samza.runtime.ApplicationRunnerUtil.invoke(ApplicationRunnerUtil.java:49)
> at
> org.apache.samza.runtime.ApplicationRunnerMain.main(ApplicationRunnerMain.java:53)
> Caused by: java.lang.NullPointerException
> at java.util.HashMap.merge(HashMap.java:1225)
> at
> java.util.stream.Collectors.lambda$toMap$58(Collectors.java:1320)
> at
> java.util.stream.ReduceOps$3ReducingSink.accept(ReduceOps.java:169)
> at
> java.util.HashMap$EntrySpliterator.forEachRemaining(HashMap.java:1696)
> at
> java.util.stream.AbstractPipeline.copyInto(AbstractPipeline.java:481)
> at
> java.util.stream.AbstractPipeline.wrapAndCopyInto(AbstractPipeline.java:471)
> at
> java.util.stream.ReduceOps$ReduceOp.evaluateSequential(ReduceOps.java:708)
> at
> java.util.stream.AbstractPipeline.evaluate(AbstractPipeline.java:234)
> at
> java.util.stream.ReferencePipeline.collect(ReferencePipeline.java:499)
> at
> org.apache.samza.config.JavaSystemConfig.getSystemAdmins(JavaSystemConfig.java:84)
> at
> org.apache.samza.system.SystemAdmins.(SystemAdmins.java:38)
> at
> org.apache.samza.execution.StreamManager.(StreamManager.java:55)
> at
> org.apache.samza.execution.JobPlanner.buildAndStartStreamManager(JobPlanner.java:64)
> at
> org.apache.samza.execution.JobPlanner.getExecutionPlan(JobPlanner.java:94)
> at
> org.apache.samza.execution.RemoteJobPlanner.prepareJobs(RemoteJobPlanner.java:57)
> at
> org.apache.samza.runtime.RemoteApplicationRunner.run(RemoteApplicationRunner.java:67)
> ... 2 more
>
>
> Thanks for your help!
>
> Qi Shu
>
>
>


Re: [VOTE] Apache Samza 1.2.0 RC0

2019-05-22 Thread Bharath Kumara Subramanian
-1 disapprove.

We have a bug fix  for
Async API.

The fix has been committed to master and we need to include it in 1.2.
The commit hash c087f8

to cherry pick.

Thanks,
Bharath

On Wed, May 22, 2019 at 11:03 AM Boris Shkolnik  wrote:

> Hi,
>
> This is a call for a vote on a release of Apache Samza 1.2.0. Thanks to
> everyone who has contributed to this release.
>
> The release candidate can be downloaded from here:
> http://home.apache.org/~boryas/samza-1.2.0-rc0/
>
> The release candidate is signed with pgp key 0x7D74D0CD5B5EB041, which can
> be found
> http://keyserver.ubuntu.com/pks/lookup?op=get=0x7d74d0cd5b5eb041
> 
> The git tag is release-1.2.0-rc0 and signed with the same pgp key:
>
> https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.2.0-rc
> <
> https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.1.0-rc1
> >
> 0
>
> Test binaries have been published to Maven's staging repository, and are
> available here:
> https://repository.apache.org/content/repositories/orgapachesamza-1065
> <
> https://repository.apache.org/content/repositories/orgapachesamza-1065/org/
> >
>
> The vote will be open for 56 hours (ending at 06:00 PM PST Friday,
> 05/24/2019).
>
> Please download the release candidate, check the hashes/signature, build it
> and test it, and then please vote:
>
> [ ] +1 approve
>
> [ ] +0 no opinion
>
> [ ] -1 disapprove (and reason why)
>
> I ran check-all.sh and integration tests.
>
> +1 (non-binding) from my side.
>
> Thanks,
>


Re: [DISCUSS] 1.2 release

2019-05-22 Thread Bharath Kumara Subramanian
We have a critical bug fix
 for Async high level API.

The changes  have been committed
to master.



Can we include that as part of 1.2?



Thanks,
Bharath

On Mon, May 20, 2019 at 3:38 PM Jake Maes  wrote:

> Ahh, perhaps I was mistaken.
>
> I must be thinking of a different ticket.
>
> Thanks for checking!
>
> On Mon, May 20, 2019 at 3:13 PM Boris S  wrote:
>
> > @jmakes, Hmm, the git log actually shows a commit related to this JIRA,
> so
> > there is a code change associated with this ticket. And the ticket is
> > resolved as FIXED. I can remove it the tag from the jira, but I cannot
> > remove it from the git log.
> >
> > On Mon, May 20, 2019 at 9:39 AM Jake Maes  wrote:
> >
> > > I don't think we did anything for "Making sendTo(table), sendTo(stream)
> > > non-terminal". The ticket was just closed as a "won't fix" IIRC.
> > >
> > > Nevertheless, I think the Kafka 2.0 upgrade warrants a release by
> itself.
> > >
> > > Let's do it.
> > >
> > > On Fri, May 17, 2019 at 12:17 PM Boris S  wrote:
> > >
> > > > Hi all,
> > > >
> > > > We have added a number of major features and changes to master since
> > > > 1.1 that warrants a new 1.2 release.
> > > >
> > > > Within LinkedIn, some of these features have already been tested as
> > > > part of our test suites. We plan to continue our testing in coming
> > > > week to validate the stability prior to release.
> > > >
> > > > We wanted to kick off the discussion in the open source forum to keep
> > > > the momentum flowing.
> > > > Here is a selected list of features that are part of the new release
> > > >
> > > >   Kafka 2.0 upgrade
> > > >
> > > >   Couchbase support for Samza Table API
> > > >   Making sendTo(table), sendTo(stream) non-terminal
> > > >
> > > > We have also worked on the following upgrades and bugfixes.
> > > > You can find a concrete list of the features, bug-fixes, upgrades
> > > > herehttps://
> > > >
> > >
> >
> issues.apache.org/jira/issues/?jql=project%20%3D%20%22SAMZA%22%20and%20fixVersion%20in%20(1.2)
> > > >
> > > >
> > > > Some of these Jiras are not marked as fixed (but they are marked as
> > > > committed in the git log). Please close the Jiras is they are fixed.
> > > >
> > > > Here is my proposal on our release schedule and timelines.
> > > >1. Cut the 1.2 release branch.
> > > >2. Target a release vote on the week of May 20, 2019
> > > >
> > > >
> > > > Thanks
> > > > Boris
> > > >
> > >
> >
>


[DISCUSS] SAMZA-2305: Standalone rebalance bug

2019-08-20 Thread Bharath Kumara Subramanian
We recently encountered a bug in standalone that results in processors
running with inconsistent job model.

Here is a document

that describes the problem and proposes few approaches to fix the problem.

Please let me know your thoughts.

--
Bharath


Re: [PROPOSAL] Async ParDo API

2019-09-13 Thread Bharath Kumara Subramanian
Please ignore the thread. It was intended for a different community.


On Fri, Sep 13, 2019 at 9:37 AM Bharath Kumara Subramanian <
codin.mart...@gmail.com> wrote:

> Hi all,
>
> I have put a proposal
> <https://docs.google.com/document/d/1t--UYXgaij0ULEoXUnhG3r8OZPBljN9r_WWlwQJBDrI/edit?usp=sharing>
> to add async ParDo API.
>
> For context, please refer to the archive conversation:
> https://www.mail-archive.com/dev@beam.apache.org/msg12101.html
>
> Thanks,
> Bharath
>


[PROPOSAL] Async ParDo API

2019-09-13 Thread Bharath Kumara Subramanian
Hi all,

I have put a proposal

to add async ParDo API.

For context, please refer to the archive conversation:
https://www.mail-archive.com/dev@beam.apache.org/msg12101.html

Thanks,
Bharath


Re: [VOTE] SEP-18: Startpoints - Manipulating Starting Offsets for Input Streams

2019-09-06 Thread Bharath Kumara Subramanian
+1 (non-binding)

Looks good to me.

On Fri, Sep 6, 2019 at 8:11 AM Jagadish Venkatraman 
wrote:

> +1 (binding)
>
> Excellent work Dan! LGTM;
>
> On Fri, Sep 6, 2019 at 8:02 AM Daniel Nishimura 
> wrote:
>
> > Please vote for SEP-18
> > <
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-18%3A+Startpoints+-+Manipulating+Starting+Offsets+for+Input+Streams
> > >.
> > Thanks to the committers and contributors that were involved with the
> > design, review and implementation!
> >
> > SEP-18 has been discussed and implemented using SAMZA-1983
> > 
> >
> >
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-18%3A+Startpoints+-+Manipulating+Starting+Offsets+for+Input+Streams
> >
>
>
> --
> Jagadish V,
> Graduate Student,
> Department of Computer Science,
> Stanford University
>


Re: Occasional checkpoint mismatch on Samza task reload

2019-10-31 Thread Bharath Kumara Subramanian
Hi Malcolm,

The warning is not particularly related to your restarts. It happens when
the offset requested by the consumer no longer exists on the broker.
It can typically happen if you are consuming from a topic with TTL enabled
and the time retention kicked in and broker purged older offsets.

You can find the starting offset from the container logs for that specific
topic and partition. Look for

"Registering ssp: {} with offset: {}"

Once you get the starting offset, you can verify if the above theory is
true by looking at the metadata of the topic in zookeeper to get the
beginning and end offsets for a particular partition.

Let me know if you have any other questions.

Thanks,
Bharath

On Thu, Oct 31, 2019 at 6:09 PM Malcolm McFarland 
wrote:

> Hey folks,
>
> We're running Samza 0.14.1 on Hadoop 2.7.6. Every once in a while while
> restarting an application, the process will come up with some variation on
> this error:
>
> INFO Validating offset  for topic and partition
> [,]
> WARN While refreshing brokers for [,]:
> org.apache.kafka.common.errors.OffsetOutOfRangeException: The requested
> offset is not within the range of offsets maintained by the server..
> Retrying
>
> This error is really mystifying us. We're not doing anything severe here,
> just using YARN's kill command to stop the application and then submitting
> it via a normal mechanism. Are there any best practices or gotchas
> surrounding restarting Samza applications on YARN that could help here?
>
> Cheers,
> Malcolm McFarland
> Cavulus
>
>
> This correspondence is from HealthPlanCRM, LLC, d/b/a Cavulus. Any
> unauthorized or improper disclosure, copying, distribution, or use of the
> contents of this message is prohibited. The information contained in this
> message is intended only for the personal and confidential use of the
> recipient(s) named above. If you have received this message in error,
> please notify the sender immediately and delete the original message.
>


Re: [DISCUSS] SEP 25: PR Title and Description Guidelines

2019-12-15 Thread Bharath Kumara Subramanian
+1.  Template looks good to me.
It will be really helpful to sift through, categorize and prepare release
notes from notable PRs during releases.

Thanks,
Bharath


On Fri, Dec 13, 2019 at 11:24 PM Jagadish Venkatraman <
jagadish1...@gmail.com> wrote:

> +1, thanks for the write-up Prateek.
>
> Let's also update the contributor's guidelines at:
> https://samza.apache.org/contribute/contributors-corner.html
>
>
> On Friday, December 13, 2019, Prateek Maheshwari 
> wrote:
>
> > Hi folks,
> >
> > In order to make Samza PR descriptions and commit messages more
> consistent,
> > informative and discoverable, we propose the following requirements for
> new
> > PRs submitted to the Samza project
> >
> > https://cwiki.apache.org/confluence/display/SAMZA/SEP-25%3A+
> > PR+Title+And+Description+Guidelines
> >
> > Contributors should copy-paste and update the description template when
> > submitting PRs.
> > Committers should ensure that the guidelines are followed before merging
> > changes.
> >
> > Please take a look and let us know if you have any concerns or
> suggestions.
> >
> > Thanks,
> > Prateek
> >
>
>
> --
> Jagadish
>


Re: Samza App Deployment Topology - Best Practices

2019-10-14 Thread Bharath Kumara Subramanian
Hi Eric,

Answers to your questions are as follows


>
>
>
> *Can I, or is it recommended to, package multiple jobs as 1 deploymentwith
> 1 properties file or keep each app separated?  Based on thedocumentation,
> it appears to support 1 app/job within a singleconfiguration as there is no
> mechanism to assign multiple app classes andgiven each a name unless I am
> mistaken*
>

 *app.class* is a single valued configuration and your understanding about
it based on the documentation is correct.


>
>
> *If only 1 app per config+deployment, what is the best way to
> handlerequirement #3 - common/shared system services as there is no app or
> jobper say, I just need to specify the streams and output system
> (ieorg.apache.samza.system.hdfs.writer*
>

There are couple of options to achieve your #3 requirement.

   1. If there is enough commonality between your jobs, you could have one
   application class that describes the logic and have the different
   configurations to modify the behavior of the application logic. This does
   come with some of the following considerations
  1. Your deployment system needs to support deploying the same
  application with different configs.
  2. Potential duplication of configuration if you configuration system
  doesn't support hierarchies and overrides.
  3. Potentially unmanageable for evolution, since the change in
  application affects multiple jobs and requires extensive testing across
  different configurations.
   2. You could potentially have libraries to perform some piece of
   business logic and have your different jobs leverage them using
   composition. Some things to consider with this option
   1. Your application and configuration stay isolated.
  2. You could still leverage some of the common configurations if your
  configuration system supports hierarchies and overrides
  3. Alleviates concerns over evolution and testing as long as the
  changes are application specific.


I am still unclear about the second part of your 2nd question.
Do you mean to say all your jobs consume from same sources and write to
sources and only your processing logic is different?


> *common/shared system services as there is no app or jobper say, I just
> need to specify the streams and output system*


Also, I am not sure I follow what do you mean by "*there is no app or job"*.
You still have 1 app per config + deployment, right?

Thanks,
Bharath

On Thu, Oct 10, 2019 at 9:46 AM Eric Shieh  wrote:

> Hi,
>
> I am new to Samza, I am evaluating Samza as the backbone for my streaming
> CEP requirement.  I have:
>
> 1. Multiple data enrichment and ETL jobs
> 2. Multiple domain specific CEP rulesets
> 3. Common/shared system services like consuming topics/streams and
> persisting the messages in ElasticSearch and HDFS.
>
> My questions are:
>
> 1. Can I, or is it recommended to, package multiple jobs as 1 deployment
> with 1 properties file or keep each app separated?  Based on the
> documentation, it appears to support 1 app/job within a single
> configuration as there is no mechanism to assign multiple app classes and
> given each a name unless I am mistaken.
> 2. If only 1 app per config+deployment, what is the best way to handle
> requirement #3 - common/shared system services as there is no app or job
> per say, I just need to specify the streams and output system (ie
> org.apache.samza.system.hdfs.writer.
> BinarySequenceFileHdfsWriter or
>
> org.apache.samza.system.elasticsearch.indexrequest.DefaultIndexRequestFactory).
> Given it's a common shared system service not tied to specific jobs, can it
> be deployed without an app?
>
> Thank you in advance for your help, looking forward to learning more about
> Samza and developing this critical feature using Samza!
>
> Regards,
>
> Eric
>


Re: Samza App Deployment Topology - Best Practices

2019-10-22 Thread Bharath Kumara Subramanian
Hi Eric,

Thanks for additional clarification. I now have a better understanding of
your use case.
Your use case be accomplished using various approaches.


   1. One Samza application that consumes all of your input streams (A and
   D) and also your intermediate output streams (B) and routes it to HDFS or
   ElasticSearch. With this approach,
  1. You may have to scale up or scale down your entire job to react to
  changes to either of your inputs (A, D & B as well). It might potentially
  result in under utilization of resources.
  2. A failure in one component could impact other components. For
  example, it is possible that there are other writers to output
stream B and
  you don't want to disrupt them because of a bug in the
transformation logic
  of input stream A.
  3. On the similar lines of 2, it also possible that due to a HDFS or
  elastic search outage, the back pressure causes output B to grow which
  might potentially impact the time to catch up and also impact your job's
  performance (e.g. this can happen if you have priorities setup
across your
  streams)
   2. Another approach is to split them into multiple jobs; one for
   consuming input sources (A and D) and routing to appropriate destinations
   (B & C), another for consuming the output from the previous job (B) and
   routing it to HDFS or ElasticSearch. With this approach
  1. You isolate the common/shared logic to write to HDFS or
  ElasticSearch into its own job which allows you to manage it
independently
  including scale up/down.
  2. During HDFS/ElasticSearch outages, other components are
  non-impacted and the back pressure causes your stream B to grow
which Kafka
  can handle well.


Our recommend API to write Samza application is Apache Beam
<https://beam.apache.org/>. Examples on how to write a sample application
using Beam API and run it using Samza can be found:
https://github.com/apache/samza-beam-examples

Please reach out to us if you have more questions.

Thanks,
Bharath

On Wed, Oct 16, 2019 at 7:31 AM Eric Shieh  wrote:

> Thank you Bharath.  Regarding my 2nd question, perhaps the following
> scenario can help to illustrate what I am looking to achieve:
>
> Input stream A -> Job 1 -> Output stream B (Kafka Topic B)
> Input stream A -> Job 2 -> Output stream C
> Input stream D -> Job 3 -> Output stream B (Kafka Topic B)
> Input stream B (Kafka Topic B) -> Elasticsearch (or write to HDFS)
>
> In the case of "Input stream B (Kafka Topic B) -> Elasticsearch (or write
> to HDFS)" this is what I was referring to as "Common/shared system
> services" that does not have any transformation logic except message sink
> to either Elasticsearch or HDFS using Samza's systems/connectors.  In other
> words, Job 1 and Job 3 both output to "Output stream B" expecting messages
> will be persisted in Elasticsearch or HDFS, would I need to specify the
> system/connector configuration separately in Job 1 and Job 3?  Is there a
> way to have "Input stream B  (Kafka Topic B) -> Elasticsearch (or write to
> HDFS)" as its own stand-alone job so I can have the following:
>
> RESTful web services (or other none Samza services/applications) as Kafka
> producer ->  Input stream B (Kafka Topic B) -> Elasticsearch (or write to
> HDFS)
>
> Regards,
>
> Eric
>
> On Mon, Oct 14, 2019 at 8:35 PM Bharath Kumara Subramanian <
> codin.mart...@gmail.com> wrote:
>
> > Hi Eric,
> >
> > Answers to your questions are as follows
> >
> >
> > >
> > >
> > >
> > > *Can I, or is it recommended to, package multiple jobs as 1
> > deploymentwith
> > > 1 properties file or keep each app separated?  Based on
> thedocumentation,
> > > it appears to support 1 app/job within a singleconfiguration as there
> is
> > no
> > > mechanism to assign multiple app classes andgiven each a name unless I
> am
> > > mistaken*
> > >
> >
> >  *app.class* is a single valued configuration and your understanding
> about
> > it based on the documentation is correct.
> >
> >
> > >
> > >
> > > *If only 1 app per config+deployment, what is the best way to
> > > handlerequirement #3 - common/shared system services as there is no app
> > or
> > > jobper say, I just need to specify the streams and output system
> > > (ieorg.apache.samza.system.hdfs.writer*
> > >
> >
> > There are couple of options to achieve your #3 requirement.
> >
> >1. If there is enough commonality between your jobs, you could have
> one
> >application class that describes the logic and have the different
>

Re: MessageStream with multiple concurrent operation stacks

2019-10-22 Thread Bharath Kumara Subramanian
Hi Eric,

Based on the source code, it appears that each job designates a unique
> group id when subscribing to kafka topic, is my understanding correct?
>

Yes. Samza uses a combination of job name and job id to generate the group
id.


> is it possible to have 2 independent stack

of operations applied on the same InputStream?


Yes. The code snippet provided in the email should work as expected.

Hope that helps.

Thanks,
Bharath

On Fri, Oct 18, 2019 at 5:55 AM Eric Shieh  wrote:

> Hi,
>
> Based on the source code, it appears that each job designates a unique
> group id when subscribing to kafka topic, is my understanding correct?  If
> so, since one cannot call appDescriptor.getInputStream with the
> same KafkaInputDescriptor twice, is it possible to have 2 independent stack
> of operations applied on the same InputStream?  In essence, I have a
> requirement to process a message from 1 InputStream and write to 2
> OutputStreams or sinks after 2 different independent stacks of operations
> applied.  One way to solve this is to deploy 2 independent jobs but the
> downside of it is it would be difficult to synchronize the 2 jobs.  Is it
> possible to do the following:
>
> MessageStream ms = appDescriptor.getInputStream(kid);
> MessageStream msForkPoint = ms.map(mapping_logic1);
> msForkPoint.filter(filter_logic_1).sendTo(outputSream1);
> msForkPoint.map(mapping_logic2).sink(write_to_DB);
>
> Based on the source code, each operation instantiates a new instance of
> MessageStream and registers the new StreamOperatorSpec with the previous
> MessageStream instance's StreamOperatorSpec essentially forming a "linked
> list" of parent-child StreamOperatorSpecs.  Since each parent  OperatorSpec
> maintains a LinkedHashSet of next OperatorSpecs, the above code of forking
> 2 independent operation stacks after the initial map seems to be feasible.
>
> Regards,
>
> Eric
>


Re: [VOTE] Apache Samza 1.3.0 RC0

2019-11-12 Thread Bharath Kumara Subramanian
-1

Unfortunately, we discovered a bug in side inputs for standby containers.
Here is the ticket that tracks the issue -
https://issues.apache.org/jira/browse/SAMZA-2382

Can we pull https://github.com/apache/samza/pull/1218 into the release as
well?

Thanks,
Bharath

On Tue, Nov 12, 2019 at 3:41 PM Hai Lu  wrote:

> Hi,
>
> This is a call for a vote on a release of Apache Samza 1.3.0. Thanks to
> everyone who has contributed to this release.
>
> The release candidate can be downloaded from here:
> http://home.apache.org/~lhaiesp/samza-1.3.0-rc0/
>
> The release candidate is signed with pgp key 0x07678C76, which can be found
> here:
>
> https://keyserver.ubuntu.com/pks/lookup?search=0x07678C76=on=index
> or to directly see the public key here:
>
> https://keyserver.ubuntu.com/pks/lookup?op=get=0x1513eaedf69d7ca77ff283b534ea3ca507678c76
>
> The git tag is release-1.3.0-rc0 and signed with the same pgp key above:
>
> https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=aa4bafdfe54919d2dc576c18f542d5fb7066d04b
>
> Test binaries have been published to Maven's staging repository, and are
> available here:
> https://repository.apache.org/content/repositories/orgapachesamza-1071/
>
> The vote will be open for 56 hours (ending at 10:00 PM PST Thursday,
> 11/14/2019).
>
> Please download the release candidate, check the hashes/signature, build it
> and test it, and then please vote:
>
> [ ] +1 approve
>
> [ ] +0 no opinion
>
> [ ] -1 disapprove (and reason why)
>
> I ran check-all.sh and integration tests.
>
> +1 (non-binding) from my side.
>
> Thanks,
> Hai
>


Re: [VOTE] SEP 25: PR Title and Description Guidelines

2019-12-18 Thread Bharath Kumara Subramanian
+1 (non-binding).

Thanks,
Bharath

On Wed, Dec 18, 2019 at 10:42 AM Prateek Maheshwari 
wrote:

> Hi folks,
>
> This is a call for a vote on SEP 25: PR Title and Description Guidelines.
> Thanks to everyone who helped review the proposal and provided feedback.
>
> Feedback from the discussion is positive:
>
> http://mail-archives.apache.org/mod_mbox/samza-dev/201912.mbox/%3CCAMja7KeQr9C048UVZwfSC46h%3DEX_9S%2BSEvMF9NPg0V5dPTPfZg%40mail.gmail.com%3E
>  <
>
> http://mail-archives.apache.org/mod_mbox/samza-dev/201912.mbox/%3CCAMja7KeQr9C048UVZwfSC46h%3DEX_9S%2BSEvMF9NPg0V5dPTPfZg%40mail.gmail.com%3E
> r  >>
>
> SEP can be found at:
>
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-25%3A+PR+Title+And+Description+Guidelines
>  <
>
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-25%3A+PR+Title+And+Description+Guidelines
> >
>
> Please vote:
>
> [ ] +1 approve
>
> [ ] +0 no opinion
>
> [ ] -1 disapprove (and reason why)
>
> Thanks,
> Prateek
>


Re: Running Samza with YARN Node label support

2019-12-18 Thread Bharath Kumara Subramanian
Hi Debraj,

To get the node label working, set the label configurations[1] pointed out
by Yang in your application config. Samza will take care of embedding the
node label in the resource request automatically if it notices the label
configuration inside your application.
Samza framework respects node label configurations even though they are
documented in the configuration table. I have created SAMZA-2422
 to track this work item.

Let us know if you run into any issues.

Thanks,
Bharath

[1] -
*yarn.container.label* for specifying node label for the containers
*yarn.am.container.label*  for specifying node label for the application
master

On Wed, Dec 18, 2019 at 10:49 AM Debraj Manna 
wrote:

> I understood how I can assign labels to yarn nodes.
>
> But it is still not clear to me how can I specify the node label for a
> samza application. I am referring to the section "Specifying node label for
> application" in the link
> <
> https://hadoop.apache.org/docs/r2.7.3/hadoop-yarn/hadoop-yarn-site/NodeLabel.html
> >
> you shared in your last email.
>
> On Wed, Dec 18, 2019 at 11:17 PM Yang Zhang  wrote:
>
> > Hi Debraj Manna,
> >
> > The app-def in previous email is just an example where you can configure
> > node labels. Yarn node labels
> > <
> >
> https://hadoop.apache.org/docs/r2.7.3/hadoop-yarn/hadoop-yarn-site/NodeLabel.html
> > >
> > is
> > a general feature (not specific to Samza), and it depends on the
> > configuration system your system uses. The example uses xml format to
> > configure Samza job, but Samza as a framework, it does not restrict
> > configuration format. Please let us know if you have further questions,
> and
> > we should detail the documents in OSS to describe the usage of certain
> > features.
> >
> > Best,
> > Yang
> >
> > On Tue, Dec 17, 2019 at 9:58 PM Debraj Manna 
> > wrote:
> >
> > > Thanks, Yang for replaying.
> > >
> > > Yes, my use case is almost similar.
> > >
> > > Can you let me know which app-def you are referring to? I am not able
> to
> > > locate yarn.am.container.label in samza-configurations
> > > <
> > >
> >
> http://samza.apache.org/learn/documentation/latest/jobs/samza-configurations.html
> > > >
> > > .
> > > Is there any samza project whose code I can refer to regarding the
> usage
> > of
> > > these configurations?
> > >
> > > On Wed, Dec 18, 2019 at 7:42 AM Yang Zhang  wrote:
> > >
> > > > Hello Debraj,
> > > >
> > > > We do not have a formal documentation in open source to describe how
> > yarn
> > > > node label is used in general. In contrast, we have an example of
> using
> > > > yarn node label to specify Samza container to run over "HDD" rather
> > than
> > > > default "SSD" nodes. Please take a look at the following guide and
> let
> > us
> > > > know whether it can be applied for your use case. Thank you for
> > reporting
> > > > this issue!
> > > > =Step-by-step guide
> > > >
> > > >
> > > >1.
> > > >
> > > >Add the *yarn.container.label *and* yarn.am.container.label* to
> the
> > > >job's *app-def* if not already present. The default of an empty
> > string
> > > >will keep the current default behavior of using SSD nodes.
> > > >
> > > > > xmlns="urn:com:linkedin:ns:configuration:definition:1.0"
> > > >name="my-application" version="">
> > > >
> > > >
> 
> > > > 
> > > >
> > > >
> > > >2. If you had to modify your *app-def* in step 1, you will need to
> > do
> > > a
> > > >trigger-build to get the change to take effect.
> > > >3.
> > > >
> > > >Add the label to *application.src* for your job. The *hdd* label
> > will
> > > >assign your containers to machines with spinning disks instead of
> > > solid
> > > >state disks.
> > > >
> > > > > > >name="my-application">
> > > >  
> > > >
> > > >
> > > >  
> > > >
> > > >
> > > >
> > > >4.
> > > >
> > > >Deploy.
> > > >
> > > > =
> > > >
> > > >
> > > > Best,
> > > >
> > > > Yang
> > > >
> > > > On Tue, Dec 17, 2019 at 10:13 AM Debraj Manna <
> > subharaj.ma...@gmail.com>
> > > > wrote:
> > > >
> > > > > Hi
> > > > >
> > > > > I am seeing running samza with yarn node label is resolved in 0.12.
> > > > >
> > > > >
> > > > >
> > > >
> > >
> >
> https://issues.apache.org/jira/browse/SAMZA-1013?page=com.atlassian.jira.plugin.system.issuetabpanels%3Aall-tabpanel
> > > > >
> > > > > But I am not able to locate the relevant documentation in
> samza-yarn
> > > > > documentation
> > > > > <
> > > >
> > https://samza.apache.org/learn/documentation/latest/deployment/yarn.html
> > > >
> > > > >
> > > > > Can someone point me to the relevant documentation?
> > > > >
> > > >
> > >
> >
>


Re: Running Samza with YARN Node label support

2019-12-18 Thread Bharath Kumara Subramanian
Hi Debraj,

I forgot to call this out earlier. Some distribution of YARN doesn't
support node label and rack combination as part of the same request. If you
were to use node labels along with host affinity feature
<https://samza.apache.org/learn/documentation/latest/yarn/yarn-host-affinity.html>
in Samza, you might run into following issue

19:25:10.032 [main] ClusterBasedJobCoordinator [ERROR] Exception thrown in
> the JobCoordinator loop
> org.apache.hadoop.yarn.client.api.InvalidContainerRequestException: Cannot
> specify node label with rack and node at
> org.apache.hadoop.yarn.client.api.impl.AMRMClientImpl.checkNodeLabelExpression(AMRMClientImpl.java:617)
> at


Refer https://jira.apache.org/jira/browse/YARN-4925
<https://jira.apache.org/jira/browse/YARN-4925?attachmentOrder=asc> for
more information. You may want to back-port the patch to your custom YARN
distribution if applicable.

Thanks,
Bharath

On Wed, Dec 18, 2019 at 1:15 PM Bharath Kumara Subramanian <
codin.mart...@gmail.com> wrote:

> Hi Debraj,
>
> To get the node label working, set the label configurations[1] pointed out
> by Yang in your application config. Samza will take care of embedding the
> node label in the resource request automatically if it notices the label
> configuration inside your application.
> Samza framework respects node label configurations even though they are
> documented in the configuration table. I have created SAMZA-2422
> <https://issues.apache.org/jira/browse/SAMZA-2422> to track this work
> item.
>
> Let us know if you run into any issues.
>
> Thanks,
> Bharath
>
> [1] -
> *yarn.container.label* for specifying node label for the containers
> *yarn.am.container.label*  for specifying node label for the application
> master
>
> On Wed, Dec 18, 2019 at 10:49 AM Debraj Manna 
> wrote:
>
>> I understood how I can assign labels to yarn nodes.
>>
>> But it is still not clear to me how can I specify the node label for a
>> samza application. I am referring to the section "Specifying node label
>> for
>> application" in the link
>> <
>> https://hadoop.apache.org/docs/r2.7.3/hadoop-yarn/hadoop-yarn-site/NodeLabel.html
>> >
>> you shared in your last email.
>>
>> On Wed, Dec 18, 2019 at 11:17 PM Yang Zhang  wrote:
>>
>> > Hi Debraj Manna,
>> >
>> > The app-def in previous email is just an example where you can configure
>> > node labels. Yarn node labels
>> > <
>> >
>> https://hadoop.apache.org/docs/r2.7.3/hadoop-yarn/hadoop-yarn-site/NodeLabel.html
>> > >
>> > is
>> > a general feature (not specific to Samza), and it depends on the
>> > configuration system your system uses. The example uses xml format to
>> > configure Samza job, but Samza as a framework, it does not restrict
>> > configuration format. Please let us know if you have further questions,
>> and
>> > we should detail the documents in OSS to describe the usage of certain
>> > features.
>> >
>> > Best,
>> > Yang
>> >
>> > On Tue, Dec 17, 2019 at 9:58 PM Debraj Manna 
>> > wrote:
>> >
>> > > Thanks, Yang for replaying.
>> > >
>> > > Yes, my use case is almost similar.
>> > >
>> > > Can you let me know which app-def you are referring to? I am not able
>> to
>> > > locate yarn.am.container.label in samza-configurations
>> > > <
>> > >
>> >
>> http://samza.apache.org/learn/documentation/latest/jobs/samza-configurations.html
>> > > >
>> > > .
>> > > Is there any samza project whose code I can refer to regarding the
>> usage
>> > of
>> > > these configurations?
>> > >
>> > > On Wed, Dec 18, 2019 at 7:42 AM Yang Zhang  wrote:
>> > >
>> > > > Hello Debraj,
>> > > >
>> > > > We do not have a formal documentation in open source to describe how
>> > yarn
>> > > > node label is used in general. In contrast, we have an example of
>> using
>> > > > yarn node label to specify Samza container to run over "HDD" rather
>> > than
>> > > > default "SSD" nodes. Please take a look at the following guide and
>> let
>> > us
>> > > > know whether it can be applied for your use case. Thank you for
>> > reporting
>> > > > this issue!
>> > > > =Step-by-step guide
>> > > >
>> > > >
>

Re: Samza - New Relic integration

2020-02-26 Thread Bharath Kumara Subramanian
Hi,

I am not sure I fully understand the ask.

IIUC, Samza doesn't have native integration with New relic. However, you
should still be able to integrate your application with New relic on your
end without native support.
If you are particularly looking to integrate native Samza metrics w/ New
relic, you might need to implement your own custom metrics reporter. You
can find more details here


Thanks,
Bharath

On Wed, Feb 26, 2020 at 3:15 AM Vaibhav Garg 
wrote:

> Hi,
>
> I want to integrate New Relic in my Samza jobs. Can anyone help with this,
> please?
>
> Thanks,
> Vaibhav Garg
> +91-9505020924
> vaibhavgar...@gmail.com
> LinkedIn 
>


Re: [VOTE] Apache Samza 1.3.1 RC0

2020-02-18 Thread Bharath Kumara Subramanian
Ran check-all and integration tests passed with one exception.
The following test was flaky and passed on the second attempt for a
specific combination (Scala - 2.12, YARN - 2.7.1 & JDK 8)

> Task :samza-test_2.12:test


testRepartitionedSessionWindowCounter FAILED

java.lang.AssertionError: expected:<0> but was:<2>

at org.junit.Assert.fail(Assert.java:88)

at org.junit.Assert.failNotEquals(Assert.java:834)

at org.junit.Assert.assertEquals(Assert.java:645)

at org.junit.Assert.assertEquals(Assert.java:631)

at
org.apache.samza.test.operator.TestRepartitionWindowApp.testRepartitionedSessionWindowCounter(TestRepartitionWindowApp.java:72)

We can follow it up with a ticket if we someone else runs into it as well.

+1 (binding)

Thanks,
Bharath

On Tue, Feb 18, 2020 at 12:12 AM Yi Pan  wrote:

> Ran check-all and integration tests successfully.
>
> +1 (binding)
>
> On Thu, Feb 13, 2020 at 12:02 PM Hai Lu  wrote:
>
> > Hi,
> >
> > This is a call for a vote on a release of Apache Samza 1.3.1 to redress
> > certain issues found in 1.3.0
> >
> > The release candidate can be downloaded from here:
> > http://home.apache.org/~lhaiesp/samza-1.3.1-rc0/
> >
> > The release candidate is signed with pgp key 0x256F8FA2, which can be
> found
> > here:
> >
> >
> https://keyserver.ubuntu.com/pks/lookup?search=0x256F8FA2=on=index
> > or to directly see the public key here:
> >
> >
> https://keyserver.ubuntu.com/pks/lookup?op=get=0x9ebc0889d43fae16dd0d8f5ba2f50cf4256f8fa2
> >
> > The git tag is release-1.3.1-rc0 and signed with the same pgp key above:
> >
> >
> https://gitbox.apache.org/repos/asf?p=samza.git;a=commit;h=7b849c047827587dec55ac169f41aac7321ce1cb
> >
> > Test binaries have been published to Maven's staging repository, and are
> > available here:
> > https://repository.apache.org/content/repositories/orgapachesamza-1074
> >
> > The vote will be open for 128 hours (ending at 8:00 PM PST Tuesday,
> > 2/18/2020).
> >
> > Please download the release candidate, check the hashes/signature, build
> it
> > and test it, and then please vote:
> >
> > [ ] +1 approve
> >
> > [ ] +0 no opinion
> >
> > [ ] -1 disapprove (and reason why)
> >
> > I ran check-all.sh and integration tests (both YARN and standalone).
> >
> > +1 (non-binding) from my side.
> >
> > Thanks,
> > Hai
> >
>


Re: Samza throwing exception: Duplicate key org.apache.beam.runners.samza.runtime.KeyedTimerData@5a5eb850 registration for the same timer

2020-02-20 Thread Bharath Kumara Subramanian
Hi Sandeep,

Thank you for reporting this issue. We noticed this problem during our
testing for upcoming Samza release.
You can find more details about the bug -
https://issues.apache.org/jira/browse/SAMZA-2463

We have a fix and should be included in our upcoming Samza 1.4 release that
is slotted for end of this month.
However, we also need to update beam Samza runner to pick up the fix which
give or take should take another 2 weeks including testing and verification.

Hopefully, we should have a fix by mid March.

Thanks,
Bharath

On Wed, Feb 19, 2020 at 11:54 AM Kathula, Sandeep
 wrote:

> Hi,
>We are trying to build sessionization where we get clickstream hits
> from Kafka and generate sessions from the hits. We are using Apache Beam
> for our code and it runs on Samza runner. We have a PCollection Event> where key is user id and value is clickstream hit. We are grouping
> by user id and calculating sessions.
>
> We are using following windowing strategy:
> PCollection.apply("UserSessions", Window.>into(
>
> Sessions.withGapDuration(Duration.standardMinutes(30)))
> .triggering(Repeatedly
> .forever(AfterProcessingTime
> .pastFirstElementInPane()
> .plusDelayOf(Duration.standardSeconds(60)))
> )
> .discardingFiredPanes()
> .withAllowedLateness(Duration.standardDays(200))
> )
>
> But events we are getting are out of order. So, we are getting timestamp
> from the hit and adding it as event timestamp in order to have it as part
> of correct session. We are using WithTimestamps.of()   for that.
>
> We are saving intermediate state in Kafka topics. We are getting duplicate
> key registered for the same timer exception. When I tried with different
> scenarios when this issue is happening, we figured out that when events are
> coming out of order. For a user when a hit comes and later some hit of
> earlier timestamp comes then it is throwing duplicate key timer exception.
> It is writing all these events into intermediate Kafka topic from which
> duplicate key timer exception is being thrown. First out of order event is
> being written into this Kafka topic and very next moment this process is
> failing with duplicate key timer issue.
>
> Stack trace:
>
> ERROR o.a.b.r.samza.SamzaPipelineResult - org.apache.samza.SamzaException:
> Callback failed for task Partition 8, ssp SystemStreamPartition [kafka,
> cg-mint-session-take27-8c06b298-1091-4322-8da8-2f55d4f617d4-partition_by-gbk-8,
> 8], offset 825. org.apache.samza.SamzaException:
> org.apache.samza.SamzaException: Callback failed for task Partition 8, ssp
> SystemStreamPartition [kafka,
> cg-mint-session-take27-8c06b298-1091-4322-8da8-2f55d4f617d4-partition_by-gbk-8,
> 8], offset 825. at
> org.apache.samza.task.AsyncRunLoop.run(AsyncRunLoop.java:150) at
> org.apache.samza.container.SamzaContainer.run(SamzaContainer.scala:778) at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) at
> java.util.concurrent.FutureTask.run(FutureTask.java:266) at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748) Caused by:
> org.apache.samza.SamzaException: Callback failed for task Partition 8, ssp
> SystemStreamPartition [kafka,
> cg-mint-session-take27-8c06b298-1091-4322-8da8-2f55d4f617d4-partition_by-gbk-8,
> 8], offset 825. at
> org.apache.samza.task.TaskCallbackImpl.failure(TaskCallbackImpl.java:89) at
> org.apache.samza.task.AsyncStreamTaskAdapter.process(AsyncStreamTaskAdapter.java:75)
> at
> org.apache.samza.task.AsyncStreamTaskAdapter.access$000(AsyncStreamTaskAdapter.java:33)
> at
> org.apache.samza.task.AsyncStreamTaskAdapter$1.run(AsyncStreamTaskAdapter.java:58)
> ... 5 common frames omitted Caused by:
> org.apache.beam.sdk.util.UserCodeException: java.lang.RuntimeException:
> org.apache.beam.sdk.util.UserCodeException:
> java.lang.IllegalStateException: Duplicate key
> org.apache.beam.runners.samza.runtime.KeyedTimerData@dffeb6cc
> registration for the same timer at
> org.apache.beam.sdk.util.UserCodeException.wrap(UserCodeException.java:34)
> at org.apache.beam.runners.samza.runtime.OpAdapter.apply(OpAdapter.java:96)
> at org.apache.beam.runners.samza.runtime.OpAdapter.apply(OpAdapter.java:37)
> at
> org.apache.samza.operators.impl.StreamOperatorImpl.handleMessage(StreamOperatorImpl.java:55)
> at
> org.apache.samza.operators.impl.OperatorImpl.onMessage(OperatorImpl.java:178)
> at
> org.apache.samza.operators.impl.OperatorImpl.lambda$null$1(OperatorImpl.java:194)
> at java.lang.Iterable.forEach(Iterable.java:75) at
> org.apache.samza.operators.impl.OperatorImpl.lambda$onMessage$2(OperatorImpl.java:193)
> at java.util.ArrayList.forEach(ArrayList.java:1257) at
> org.apache.samza.operators.impl.OperatorImpl.onMessage(OperatorImpl.java:192)
> at
> 

Re: [DISCUSS] Samza 1.3.1 release

2020-02-13 Thread Bharath Kumara Subramanian
+1.

Cheers,
Bharath

On Thu, Feb 13, 2020 at 12:54 PM Yang Zhang  wrote:

> +1. Thanks!
>
> Best,
> Yang
>
> On Thu, Feb 13, 2020 at 12:06 PM Xinyu Liu  wrote:
>
> > Cool! Thanks for driving this release.
> >
> > Thanks,
> > Xinyu
> >
> > On Thu, Feb 13, 2020 at 10:28 AM Hai Lu  wrote:
> >
> > > Hi all,
> > >
> > > We're going to make a 1.3.1 release to address some critical issues
> that
> > > were found in 1.3.0
> > >
> > > 1.3.1 will be based off 1.3.0 but include the following additional
> > commits:
> > >
> > > SAMZA-2447: Checkpoint dir removal should only search in valid store
> dirs
> > > (#1261)
> > > SAMZA-2446: Invoke onCheckpoint only for registered SSPs (#1260)
> > > SAMZA-2431: Fix the checkpoint and changelog topic auto-creation.
> (#1251)
> > > SAMZA-2434: Fix the coordinator steam creation workflow
> > > SAMZA-2423: Heartbeat failure causes incorrect container shutdown
> (#1240)
> > > SAMZA-2305: Stream processor should ensure previous container is
> stopped
> > > during a rebalance (#1213)
> > >
> > > I'm going to create the release candidate soon.
> > >
> > > Thanks,
> > > Hai
> > >
> >
>


Re: Reduce kafka partition for a topic samza is using

2020-01-08 Thread Bharath Kumara Subramanian
There are few ways to achieve data copy.
 1. Use the vanilla Kafka consumer that consumes data from the old topic
and produce to the new topic with fewer partitions.
 2. Write a Samza job that reads from your old topic and funnels the data
to the new topic.

I'd recommend you to follow up with Kafka community too if you are looking
for more options.

You typically don't have to delete the checkpoint and coordinator topic.
The checkpoints of the new tasks should overwrite the old tasks. However,
you might be left with some stale data since the new topic has fewer
partitions and hence fewer tasks. Coordinator topic stores config for the
most part and it is possible the topics have some stale topic
configurations (if any).

Hope that helps.

Thanks,
Bharath


On Tue, Jan 7, 2020 at 6:19 AM Debraj Manna 
wrote:

> Thanks Bharath for replying.
>
> Samza job is stateless and running in YARN cluster.
>
> If I follow the below approach.
>
>1. Create a temp kafka topic
>2. Copy the messages from old topic to the new topic
>3. Delete old topic
>4. Create new topic with required partitions
>5. Delete old topic
>6. Copy messages from temp topic to new topic
>
> What I have do with the __samza_checkpoint and __samza_coordinator_ topics?
> Should I also delete them?
>
> Can you explain what do you mean by reroute?
>
> On Mon, Jan 6, 2020 at 7:40 PM Bharath Kumara Subramanian <
> codin.mart...@gmail.com> wrote:
>
> > Hi Debraj,
> >
> > Kafka doesn't support reducing the partition size and only supports
> > increasing the partition size of a topic.
> > One way to accomplish it would be to create a new topic with the desired
> > partition count and reroute data from the old topic. Although, it will be
> > good to first understand the use case behind your request. Can you
> > shed some light on this?
> >
> > In the event of change to input topic partition count, the implications
> to
> > a Samza job are as follows
> >
> >1. For stateless jobs, the job is shutdown and if you are running in a
> >cluster mode (YARN), typically containers get restarted and pick up
> the
> >change. In case of Standalone, a new rebalance is triggered.
> >2. For stateful jobs, the shutdown behavior is the same. However,
> based
> >on the choice of the grouper, it might result in additional tasks or
> >reduced number of tasks which would invalidate some of the state
> > associated
> >with the tasks. If you have changelog enabled, you might need to
> > recreate
> >the changelog topic otherwise, you might run into validation issues or
> >correctness issues with your application.
> >
> > As of how Samza detects partition count changes[1] and the actions it
> takes
> > can be found here[2]
> >
> > Thanks,
> > Bharath
> >
> >
> > [1] -
> >
> >
> https://github.com/apache/samza/blob/master/samza-core/src/main/java/org/apache/samza/coordinator/StreamPartitionCountMonitor.java
> > [2] -
> >
> >
> https://github.com/apache/samza/blob/beb5e1b40c07c092bc6e14aafc131d96eda5fcd4/samza-core/src/main/java/org/apache/samza/clustermanager/ClusterBasedJobCoordinator.java#L370
> >
> >
> >
> > On Mon, Jan 6, 2020 at 4:31 AM Debraj Manna 
> > wrote:
> >
> > > Anyone any thoughts on this?
> > >
> > > On Sat, Jan 4, 2020 at 5:16 PM Debraj Manna 
> > > wrote:
> > >
> > > > I am using samza on yarn with Kafka. I need to reduce the number of
> > > > partitions in kafka. I am ok with some data loss. Can someone suggest
> > > what
> > > > should be the recommended way of doing this?
> > > >
> > > > Samza Job Config looks like this -
> > > >
> > > > job.factory.class = org.apache.samza.job.yarn.YarnJobFactory
> > > > task.class = com.vnera.grid.task.GenericStreamTask
> > > > task.window.ms = 100
> > > > systems.kafka.samza.factory =
> > > > org.apache.samza.system.kafka.KafkaSystemFactory
> > > > systems.kafka.consumer.zookeeper.connect = localhost:2181
> > > > systems.kafka.consumer.auto.offset.reset = largest
> > > > systems.kafka.producer.metadata.broker.list = localhost:9092
> > > > systems.kafka.producer.producer.type = sync
> > > > systems.kafka.producer.batch.num.messages = 1
> > > > systems.kafka.samza.key.serde = string
> > > > serializers.registry.string.class =
> > > > org.apache.samza.serializers.StringSerdeFactory
> > > > yarn.package.path =
> > > >
> > file://${basedir}/target/${project.artifactId}-${pom.version}-dist.tar.gz
> > > > yarn.container.count = ${container.count}
> > > > yarn.am.container.memory.mb =  ${samzajobs.memory.mb}
> > > > job.name = job4
> > > > task.inputs = kafka.Topic3
> > > >
> > >
> >
>


Re: [VOTE] Apache Samza 1.4.0 RC1

2020-03-11 Thread Bharath Kumara Subramanian
+1(binding).

Check-all.sh and the integration tests passed.

Thanks,
Bharath

On Tue, Mar 10, 2020 at 3:35 PM Xinyu Liu  wrote:

> +1 (binding).
>
> Run check-all.sh and integration tests for both yarn and standalone. All
> passed.
>
> Thanks,
> Xinyu
>
>
> On Fri, Mar 6, 2020 at 6:46 PM Yi Pan  wrote:
>
> > Have downloaded the files, build with check-all.sh, and ran both YARN and
> > standalone integration tests. All passed.
> >
> > +1 (binding).
> >
> > Thanks!
> >
> > -Yi
> >
> > On Tue, Mar 3, 2020 at 3:03 PM Cameron Lee 
> > wrote:
> >
> > > Hi all,
> > >
> > > This is a call for a vote on a release of Apache Samza 1.4.0. Thanks to
> > > everyone who has contributed to this release.
> > >
> > > The release candidate can be downloaded from here:
> > > https://home.apache.org/~cameronlee/samza-1.4.0-rc1/
> > >
> > > The release candidate is signed with pgp key 0x54CB3CE3, which can be
> > found
> > > here:
> > >
> > >
> >
> https://keyserver.ubuntu.com/pks/lookup?search=0x54CB3CE3=on=index
> > > or to directly see the public key here:
> > >
> > >
> >
> https://keyserver.ubuntu.com/pks/lookup?op=get=0x71b0145290ecdbfa5caea6dbd786a7ba54cb3ce3
> > >
> > > The git tag is release-1.4.0-rc1, signed by the same pgp key above:
> > >
> > >
> >
> https://gitbox.apache.org/repos/asf?p=samza.git;a=commit;h=5327fafb8502b126482ec0c4efc8d1aa9b96ba44
> > >
> > > Test binaries have been published to Maven's staging repository, and
> are
> > > available here:
> > > https://repository.apache.org/content/repositories/orgapachesamza-1077
> > >
> > > The vote will be open for 72 hours (until Friday, March 6, 2020 at 3pm
> > > PST).
> > >
> > > Please download the release candidate, check the hashes/signature,
> build
> > it
> > > and test it, and then please vote:
> > > [ ] +1 approve
> > > [ ] +0 no opinion
> > > [ ] -1 disapprove (and reason why)
> > >
> > > I ran check-all.sh and integration tests.
> > >
> > > +1 (non-binding) from my side.
> > >
> > > Thank you,
> > > Cameron
> > >
> >
>


Re: Execution of action: JobModelVersionChange failed.

2020-03-22 Thread Bharath Kumara Subramanian
Hi Omkar,

The errors are related to timeouts during shutdown which gets triggered
during a rebalance. Whenever a new processor joins the quorum or leaves the
quorum, a rebalance is triggered which requires all the existing processors
to shutdown its container before agreeing on the new job model.
In your case, it looks like the container is taking beyond the configured
timeout (task.shutdown.ms) and hence throwing an exception.

Do you know what is taking so long in your shutdown sequence?

Meanwhile, you can start by increasing the shutdown timeout to a higher
value.
*Note:* You will need to account for the increase in the *consensus timeout*
- the time the leader of the quorum will wait for other participants to
agree on the new job model. If the other processors are still in the
shutdown phase, the leader may end up expiring the current barrier and
trigger another rebalance.

For e.g. if the current setup is
*task.shutdown.ms  = 1*
*job.coordinator.zk.consensus.timeout.ms
 = 3*
then your new setup will roughly (depending on how much room you already
have between these two configurations) need to be following where "*x*" -
denotes the increase in the value
*task.shutdown.ms  = 1 + x*
*job.coordinator.zk.consensus.timeout.ms
 = 3 + x*
Let me know how it goes.

Thanks,
Bharath

On Sat, Mar 21, 2020 at 3:17 PM Deshpande, Omkar
 wrote:

> We are using beam with samza runner - beam.version 2.19.0, samza.version
> 1.3.0
>
> And we are seeing the following excption frequently. Should we be tweaking
> some configuration? Does this point to any network connectivity issue?
>
> 2020/03/21 21:42:09.896 INFO  o.a.s.zk.ZkBarrierForVersionUpgrade -
> Subscribing data changes on the path:
> /app-clickstreamc360pl127-v1327c712f-fd45-4a67-ba80-7baae72e1f62/clickstreamc360pl127-v1327c712f-fd45-4a67-ba80-7baae72e1f62-coordinationData/jobModelGeneration/jobModelUpgradeBarrier/versionBarriers/barrier_151/barrier_state
> for barrier version: 151.
> 2020/03/21 21:42:09.896 ERROR o.a.s.zk.ScheduleAfterDebounceTime -
> Execution of action: JobModelVersionChange failed.
> java.lang.IllegalStateException: ZkClient already closed!
> at
> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:987)
> at org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:1158)
> at
> org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:194)
> at
> org.apache.samza.zk.ZkUtils.subscribeDataChanges(ZkUtils.java:337)
> at
> org.apache.samza.zk.ZkBarrierForVersionUpgrade.join(ZkBarrierForVersionUpgrade.java:144)
> at
> org.apache.samza.zk.ZkJobCoordinator$ZkJobModelVersionChangeHandler.lambda$doHandleDataChange$0(ZkJobCoordinator.java:536)
> at
> org.apache.samza.zk.ScheduleAfterDebounceTime.lambda$getScheduleableAction$0(ScheduleAfterDebounceTime.java:169)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at
> java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
> at
> java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
> at java.lang.Thread.run(Thread.java:748)
> 2020/03/21 21:42:09.897 ERROR org.apache.samza.zk.ZkJobCoordinator -
> Received exception in debounce timer! Stopping the job coordinator
> java.lang.IllegalStateException: ZkClient already closed!
> at
> org.I0Itec.zkclient.ZkClient.retryUntilConnected(ZkClient.java:987)
> at org.I0Itec.zkclient.ZkClient.watchForData(ZkClient.java:1158)
> at
> org.I0Itec.zkclient.ZkClient.subscribeDataChanges(ZkClient.java:194)
> at
> org.apache.samza.zk.ZkUtils.subscribeDataChanges(ZkUtils.java:337)
> at
> org.apache.samza.zk.ZkBarrierForVersionUpgrade.join(ZkBarrierForVersionUpgrade.java:144)
> at
> org.apache.samza.zk.ZkJobCoordinator$ZkJobModelVersionChangeHandler.lambda$doHandleDataChange$0(ZkJobCoordinator.java:536)
> at
> org.apache.samza.zk.ScheduleAfterDebounceTime.lambda$getScheduleableAction$0(ScheduleAfterDebounceTime.java:169)
> at
> java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
> at java.util.concurrent.FutureTask.run(FutureTask.java:266)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.access$201(ScheduledThreadPoolExecutor.java:180)
> at
> java.util.concurrent.ScheduledThreadPoolExecutor$ScheduledFutureTask.run(ScheduledThreadPoolExecutor.java:293)
> at
> 

Re: Execution of action: JobModelVersionChange failed.

2020-03-24 Thread Bharath Kumara Subramanian
Hi Omkar,

So what does stopping a samza container mean in context of a Beam Pipeline?


A container shutting down during a rebalance shouldn't affect the beam
application unless the shutdown was unsuccessful.


> Does pres.waitUntilFinish() return when samza container is being stopped ?
>

waitUntilFinish() is a blocking call and will only return when the
application exits (e.g. non-streaming jobs) or in the event of application
errors (errors within samza container etc)
In a standalone setup, samza application can either be run as an individual
application or along side with other services sharing the same JVM. For
this reason, any runtime errors in samza container doesn't bring down the
JVM by default. However, it might be desirable for some applications to
bring down their application and that can be achieved by using a monitor
thread that waits on the *pipeline.waitUntilFinish* and then triggers
*System.exit()* or *Runtime.halt() *or* necessary steps.*

And does rebalancing always include stopping all the containers?
>
Yes, rebalances are handled by coordinators which brings down the existing
container and spins up a new container.

I think Thread.sleep(3)  in this code is increasing the time taken for
> the shutdown sequence. Is that correct?
>
I don't think that is causing delays in the shutdown. The block of code
only gets triggered after *waitUntilFinish* returns which is due to
exception in container shutdown sequence. You may want to check if there
any pluggable components (like Consumers, TableProviders) etc take longer
time to shutdown.

As far as your code snippet, I think the reason shutdown signals don't
bring down your application is because of deadlock
<https://stackoverflow.com/questions/25204615/deadlock-in-java-shutdown-hook>
between your main thread which does a *System.exit()* and the shutdown hook
which waits on the *mainThread.join()*.


Thanks,
Bharath

On Mon, Mar 23, 2020 at 3:28 PM Deshpande, Omkar
 wrote:

> Hey Bharath,
>
> I probably know what is taking long in my shutdown sequence.
>
> My code roughly looks like this -
> https://gist.github.com/omkardeshpande8/dc4259a8aa7a726a4fe787d9ece8f44a
> I think Thread.sleep(3)  in this code is increasing the time taken for
> the shutdown sequence. Is that correct?
>
> So what does stopping a samza container mean in context of a Beam
> Pipeline? Does pres.waitUntilFinish() return when samza container is being
> stopped ?
>
> We added Runtime.getRuntime().halt() because our JVM running a Beam
> pipeline was hanging up without exiting, in a lot of cases.
> Do you have suggestions on better way to handle this?
>
> And does rebalancing always include stopping all the containers?
> We are running on K8S and the pods are often moved around. And every time
> a pod is moved, rebalance will be triggered.
> And the rebalance in turn will restart all other pods.
>
>
> On 3/22/20, 10:29 PM, "Bharath Kumara Subramanian" <
> codin.mart...@gmail.com> wrote:
>
> This email is from an external sender.
>
>
> Hi Omkar,
>
> The errors are related to timeouts during shutdown which gets triggered
> during a rebalance. Whenever a new processor joins the quorum or
> leaves the
> quorum, a rebalance is triggered which requires all the existing
> processors
> to shutdown its container before agreeing on the new job model.
> In your case, it looks like the container is taking beyond the
> configured
> timeout (task.shutdown.ms) and hence throwing an exception.
>
> Do you know what is taking so long in your shutdown sequence?
>
> Meanwhile, you can start by increasing the shutdown timeout to a higher
> value.
> *Note:* You will need to account for the increase in the *consensus
> timeout*
> - the time the leader of the quorum will wait for other participants to
> agree on the new job model. If the other processors are still in the
> shutdown phase, the leader may end up expiring the current barrier and
> trigger another rebalance.
>
> For e.g. if the current setup is
> *task.shutdown.ms <http://task.shutdown.ms> = 1*
> *job.coordinator.zk.consensus.timeout.ms
> <http://job.coordinator.zk.consensus.timeout.ms> = 3*
> then your new setup will roughly (depending on how much room you
> already
> have between these two configurations) need to be following where
> "*x*" -
> denotes the increase in the value
> *task.shutdown.ms <http://task.shutdown.ms> = 1 + x*
> *job.coordinator.zk.consensus.timeout.ms
> <http://job.coordinator.zk.consensus.timeout.ms> = 3 + x*
> Let me know how it goes.
>
> Thanks,
> Bharath
>
> On Sat, Mar 21, 2020 at 3:17 PM 

[DISCUSS] Samza 1.5.1 release

2020-08-18 Thread Bharath Kumara Subramanian
Hi all,

In 1.5 release, we enabled transactional state by default for all samza
jobs. We identified a critical bug related to trimming the state which
requires a minor release.

I wanted to kick off the discussion on the open source forum as the bug fix
has been validated internally at LinkedIn.

More details on the bug can be found in SAMZA-2578
.
The patch that contains the fix: samza/pull/1413


I'd like to target early next week for voting.

Cheers,
Bharath


[RESULT][VOTE] Apache Samza 1.5.1 RC0

2020-08-27 Thread Bharath Kumara Subramanian
Hi all,

The vote for 1.5.1 release has been out for more than 72 hours and we got
+1(binding) x3 (Yi, Boris, Bharath)

Samza 1.5.1 officially passed the VOTE!

Thanks for your contribution and help with validation.

--
Bharath


Re: [DISCUSS] Samza 1.5.1 release

2020-08-22 Thread Bharath Kumara Subramanian
It has been more than 72+ hours and the response seems positive.
Let me proceed with a voting thread.

Thanks,
Bharath

On Wed, Aug 19, 2020 at 10:31 AM Xinyu Liu  wrote:

> +1. Hopefully it's not a bug in my code :)
>
> Thanks,
> Xinyu
>
> On Wed, Aug 19, 2020 at 10:29 AM Yi Pan  wrote:
>
> > +1 to this release as well!
> >
> > On Wed, Aug 19, 2020 at 9:21 AM Prateek Maheshwari 
> > wrote:
> >
> > > +1, this is a critical bug and we should release the fix ASAP.
> > >
> > > Thanks,
> > > Prateek
> > >
> > > On Tue, Aug 18, 2020 at 9:02 PM Bharath Kumara Subramanian <
> > > codin.mart...@gmail.com> wrote:
> > >
> > > > Hi all,
> > > >
> > > > In 1.5 release, we enabled transactional state by default for all
> samza
> > > > jobs. We identified a critical bug related to trimming the state
> which
> > > > requires a minor release.
> > > >
> > > > I wanted to kick off the discussion on the open source forum as the
> bug
> > > fix
> > > > has been validated internally at LinkedIn.
> > > >
> > > > More details on the bug can be found in SAMZA-2578
> > > > <https://issues.apache.org/jira/browse/SAMZA-2578>.
> > > > The patch that contains the fix: samza/pull/1413
> > > > <https://github.com/apache/samza/pull/1413>
> > > >
> > > > I'd like to target early next week for voting.
> > > >
> > > > Cheers,
> > > > Bharath
> > > >
> > >
> >
>


[VOTE] Apache Samza 1.5.1 RC0

2020-08-22 Thread Bharath Kumara Subramanian
Hi all,

This is a call for a vote on a release of Apache Samza 1.5.1. We are
releasing 1.5.1 to address a critical bug related transaction state feature.

More details on the bug can be found here:
https://issues.apache.org/jira/browse/SAMZA-2578

The release candidate can be downloaded from here:
http://home.apache.org/~bharathkk/samza-1.5.1-rc0/

The release candidate is signed with pgp key F3B965A6B192DAB7, which is
included in the repository's KEYS file:
https://github.com/apache/samza/blob/master/KEYS

or to directly see the public key here:
https://keyserver.ubuntu.com/pks/lookup?search=Bharath+Kumarasubramanian=on=index

The git tag is release-1.5.1-rc0 and signed with the same pgp key above:
https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.5.1-rc0

Test binaries have been published to Maven's staging repository, and are
available here:
https://repository.apache.org/content/repositories/orgapachesamza-1081

The vote will be open for 72 hours (end at 07:15pm Wednesday, 08/26/2020).
Please download the release candidate, check the hashes/signature, build it
and test it, and vote:

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and reason why)

I ran check-all.sh and integration tests (both YARN and standalone) passed.

+1 on my end for the release.

Thanks,
Bharath


Re: [VOTE] SEP-22: Container Placements in Samza

2020-05-29 Thread Bharath Kumara Subramanian
+1.
Looking forward to use the feature.

--
Bharath

On Fri, May 29, 2020 at 12:16 PM Prateek Maheshwari 
wrote:

> +1 from me too. Thanks for adding this feature!
>
> - Prateek
>
> On Fri, May 29, 2020 at 11:56 AM Boris S  wrote:
>
> > Hi,
> > LGTM.
> > +1.
> >
> > On Wed, May 27, 2020 at 11:28 AM Sanil Jain 
> > wrote:
> >
> > > Hi all,
> > >
> > > This is a call for a vote on SEP-22: Container Placements in Samza
> > >
> > > Thanks to everyone who reviewed the proposal and
> > > provided feedback.
> > >
> > > I have addressed comments on the SEP, and I am not aware of any further
> > > major questions or objections, so I am starting this vote.
> > >
> > > *SEP link: *
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-22%3A+Container+Placements+in+Samza
> > >
> > >
> > > *Discuss thread:*
> > >
> > >
> > >
> >
> http://mail-archives.apache.org/mod_mbox/samza-dev/202001.mbox/%3CCAKkRg%3D94NY8cLn89u%3DVeL1K52R3XuOimzxXsy7BLzS7fpS%3DLfg%40mail.gmail.com%3E
> > >
> > > There was also some discussion through comments on the SEP page (see
> > > Resolved Comments).
> > >
> > > Please vote:
> > > [ ] +1 approve
> > > [ ] +0 no opinion
> > > [ ] -1 disapprove (and reason why)
> > >
> >
>


[DISCUSS] Samza 1.5 release

2020-05-26 Thread Bharath Kumara Subramanian
Hi all,

We have accumulated few features/improvements since the last release and
would like to make Samza 1.5 release.

I wanted to kick off the discussion on the open source forum as some of
these changes have already been tested internally at LinkedIn. Some of the
features/improvements include but not limited to Simplifying Job Runner,
Container Placements and auto enable transactional state.

A comprehensive list of changes can be found here:
https://issues.apache.org/jira/browse/SAMZA-2527?jql=project%20%3D%20%22SAMZA%22%20and%20fixVersion%20in%20(1.5)

The new release branch has already been cut and the name is "1.5.0". I
would like to target the week of June 1st for voting.

Thank you,
Bharath


Re: [VOTE] Apache Samza 1.5.0 RC0

2020-06-08 Thread Bharath Kumara Subramanian
Hi all,

This vote is canceled because of a flaky integration test issue. Details as
follows

testBatchOperationTriggeredByBatchSize FAILED

java.lang.AssertionError

at org.junit.Assert.fail(Assert.java:86)

at org.junit.Assert.assertTrue(Assert.java:41)

at org.junit.Assert.assertTrue(Assert.java:52)

at
org.apache.samza.table.batching.TestBatchProcessor$TestBatchTriggered.testBatchOperationTriggeredByBatchSize(TestBatchProcessor.java:114)

We are working on fixing the issue.

Thanks,
Bharath

On Thu, Jun 4, 2020 at 9:59 PM Bharath Kumara Subramanian <
codin.mart...@gmail.com> wrote:

> Hi all,
>
> This is a call for a vote on a release of Apache Samza 1.5.0. We are
> excited to see some new features and improvements in this release.
>
> The release candidate can be downloaded from here:
> http://home.apache.org/~bharathkk/samza-1.5.0-rc0/
>
> The release candidate is signed with pgp key F3B965A6B192DAB7, which is
> included in the repository's KEYS file:
> https://github.com/apache/samza/blob/master/KEYS
>
> or to directly see the public key here:
>
> https://keyserver.ubuntu.com/pks/lookup?search=Bharath+Kumarasubramanian=on=index
>
> The git tag is release-1.5.0-rc0 and signed with the same pgp key above:
>
> https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.5.0-rc0
>
> Test binaries have been published to Maven's staging repository, and are
> available here:
> https://repository.apache.org/content/repositories/orgapachesamza-1079/
>
> The vote will be open for 120 hours (end at 10:00pm Tuesday, 06/09/2021).
> Please download the release candidate, check the hashes/signature, build
> it and test it, and vote:
>
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and reason why)
>
> I ran check-all.sh and integration tests (both YARN and standalone) passed.
>
> +1 on my end for the release.
>
> Thanks,
> Bharath
>
>
>
>


Re: [DISCUSS] Samza 1.5 release

2020-06-04 Thread Bharath Kumara Subramanian
Since there is no objection. I am moving on with a vote for the release.

Thanks,
Bharath

On Mon, Jun 1, 2020 at 8:37 PM Yi Pan  wrote:

> lgtm. Thanks for kicking off the discussion!
>
> -Yi
>
> On Tue, May 26, 2020 at 8:55 PM Bharath Kumara Subramanian <
> codin.mart...@gmail.com> wrote:
>
> > Hi all,
> >
> > We have accumulated few features/improvements since the last release and
> > would like to make Samza 1.5 release.
> >
> > I wanted to kick off the discussion on the open source forum as some of
> > these changes have already been tested internally at LinkedIn. Some of
> the
> > features/improvements include but not limited to Simplifying Job Runner,
> > Container Placements and auto enable transactional state.
> >
> > A comprehensive list of changes can be found here:
> >
> >
> https://issues.apache.org/jira/browse/SAMZA-2527?jql=project%20%3D%20%22SAMZA%22%20and%20fixVersion%20in%20(1.5)
> >
> > The new release branch has already been cut and the name is "1.5.0". I
> > would like to target the week of June 1st for voting.
> >
> > Thank you,
> > Bharath
> >
>


[VOTE] Apache Samza 1.5.0 RC0

2020-06-04 Thread Bharath Kumara Subramanian
Hi all,

This is a call for a vote on a release of Apache Samza 1.5.0. We are
excited to see some new features and improvements in this release.

The release candidate can be downloaded from here:
http://home.apache.org/~bharathkk/samza-1.5.0-rc0/

The release candidate is signed with pgp key F3B965A6B192DAB7, which is
included in the repository's KEYS file:
https://github.com/apache/samza/blob/master/KEYS

or to directly see the public key here:
https://keyserver.ubuntu.com/pks/lookup?search=Bharath+Kumarasubramanian=on=index

The git tag is release-1.5.0-rc0 and signed with the same pgp key above:
https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.5.0-rc0

Test binaries have been published to Maven's staging repository, and are
available here:
https://repository.apache.org/content/repositories/orgapachesamza-1079/

The vote will be open for 120 hours (end at 10:00pm Tuesday, 06/09/2021).
Please download the release candidate, check the hashes/signature, build it
and test it, and vote:

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and reason why)

I ran check-all.sh and integration tests (both YARN and standalone) passed.

+1 on my end for the release.

Thanks,
Bharath


[RESULT][VOTE] Apache Samza 1.5.0 RC1

2020-06-11 Thread Bharath Kumara Subramanian
The vote for 1.5.0 release has been out for more than 72 hours and we got
+1 (binding) x3 (Yi, Boris, Bharath)
+1(non-binding) x1 (Ke)

Samza 1.5.0 officially passed the VOTE!

Thanks for your contribution to the release and help with validation.

--
Bharath


[VOTE] Apache Samza 1.5.0 RC1

2020-06-08 Thread Bharath Kumara Subramanian
Hi all,

This is a call for a vote on a release of Apache Samza 1.5.0. We are
excited to see some new features and improvements in this release.

The release candidate can be downloaded from here:
http://home.apache.org/~bharathkk/samza-1.5.0-rc1/

The release candidate is signed with pgp key F3B965A6B192DAB7, which is
included in the repository's KEYS file:
https://github.com/apache/samza/blob/master/KEYS

or to directly see the public key here:
https://keyserver.ubuntu.com/pks/lookup?search=Bharath+Kumarasubramanian=on=index

The git tag is release-1.5.0-rc0 and signed with the same pgp key above:
https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.5.0-rc1

Test binaries have been published to Maven's staging repository, and are
available here:
https://repository.apache.org/content/repositories/orgapachesamza-1080/

The vote will be open for 72 hours (end at 05:15pm Tuesday, 06/11/2021).
Please download the release candidate, check the hashes/signature, build it
and test it, and vote:

[ ] +1 approve
[ ] +0 no opinion
[ ] -1 disapprove (and reason why)

I ran check-all.sh and integration tests (both YARN and standalone) passed.

+1 on my end for the release.

Thanks,
Bharath


Re: [VOTE] Apache Samza 1.6.0 RC1

2021-01-11 Thread Bharath Kumara Subramanian
Thanks for driving this release Boris! Verified signatures and ran all the
tests.
check-all and integration tests passed.

+1 (binding)

--
Bharath

On Mon, Jan 11, 2021 at 1:28 PM Boris S  wrote:

> Quick reminder,
> Please take some time to validate the release and vote on it.
>
> Thanks,
> Boris.
>
> On Wed, Jan 6, 2021 at 11:13 PM Boris S  wrote:
>
> > Hi all,
> >
> > This is a call for a vote on a release of Apache Samza 1.6.0. We are
> > excited to see some new features and improvements in this release.
> >
> > The release candidate can be downloaded from here:
> > http://people.apache.org/~boryas/samza-1.6.0-rc1/
> > 
> >
> > The release candidate is signed with pgp key D2103453, which is
> > included in the repository's KEYS file:
> > https://github.com/apache/samza/blob/master/KEYS
> >
> > or to directly see the public key here:
> >
> >
> https://keyserver.ubuntu.com/pks/lookup?search=Boris+Shkolnik=on=index
> >
> > The git tag is release-1.6.0-rc1 and signed with the same pgp key above:
> >
> >
> https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.6.0-rc1
> > <
> https://gitbox.apache.org/repos/asf?p=samza.git;a=tag;h=refs/tags/release-1.6.0-rc0
> >
> >
> > Test binaries have been published to Maven's staging repository, and are
> > available here:
> > https://repository.apache.org/content/repositories/orgapachesamza-1088/
> >  >
> > and
> > https://repository.apache.org/content/repositories/orgapachesamza-1089/
> >  >
> > (for Scala 2.12)
> >
> > The vote will be open for 72 hours (end at 06:00pm, Mon 1/11/2021).
> > Please download the release candidate, check the hashes/signature, build
> it
> > and test it, and vote:
> >
> > [ ] +1 approve
> > [ ] +0 no opinion
> > [ ] -1 disapprove (and reason why)
> >
> > I ran check-all.sh and integration tests (both YARN and standalone)
> passed.
> >
> > +1 on my end for the release.
> >
> > Thanks,
> > Boris
> >
>


Re: [DISCUSS] Apache Samza 1.6.0 RC0

2020-11-30 Thread Bharath Kumara Subramanian
+1.
Can you please tag all the JIRAs with fix version = 1.6? The link in the
email points to JIRAs across versions.

Looking forward to the new release.

On Wed, Nov 18, 2020 at 10:36 AM Boris S  wrote:

> ** with correct subject this time**
>
> Hi all,
>
> We have added a number of major features and changes to master since
> 1.5, that warrants a major 1.6 release.
>
> Within LinkedIn, some of these features have already been tested as
> part of our test suites. We plan to continue our testing the coming
> weeks to validate the stability prior to release.
>
>
> We wanted to kick off the discussion in the open source forum to keep
> the momentum flowing.
>
>
> Here is a selected list of features that are part of the new release:
>   SAMZA-2600: Extract constants for string literals used in AM and
> container (#1439)
>
> SAMZA-2596: Replace String.format() calls to avoid
> MissingFormatArgumentException
>
> SAMZA-2595: Updated MonitorService to use separate
> ScheduleExecutor for each monitor (#1434)
>
> SAMZA-2587: IntermediateMessageSerde exception handling (#1426)
>
> SAMZA-2593: Update task callback to store only necessary fields
> instead of the message envelope (#1433)
>
> SAMZA-2574 : improve flexibility of SystemFactory interface
>
> SAMZA-2589: Consolidate Beam and High/Low Samza Apps launch workflow
> (#1428)
>
> SAMZA-2558: Refactor app.runner.class
>
> SAMZA-2424: AM should cache and serve serialized Job Model to
> containers (#1241)
>
> SAMZA-2584: Refactor ClusterBasedJobCoordinator (#1424)
>
> SAMZA-2585: Modify shutdown sequence to handle orphaned AMs (#1422)
>
> SAMZA-2439: Remove LocalityManager and container location
> information from JobModel (#1421)
>
> SAMZA-2579: Force restart feature for Container Placements (#1414)
>
>
> You can find a concrete list of the features, bug-fixes, upgrades here
>
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20and%20fixVersion%20%3D%201.6
> <
> https://issues.apache.org/jira/browse/SAMZA-1875?jql=project%20%3D%20SAMZA%20AND%20resolution%20%3D%20Fixed%20ORDER%20BY%20created%20DESC
> >
>
>
>
>
>
> Here is my proposal on our release schedule and timelines.
>
>1. Cut the 1.6.0 release branch.
>
>2. Target a release vote on the week of Nov 23rd.
>
> --
>
> Thanks
>
> Boris
>


Re: Upgrading from 0.14 -> 1.5.1

2020-12-14 Thread Bharath Kumara Subramanian
Hi Malcolm,

Based on the following log

INFO [org.apache.samza.zk.ZkJobCoordinator] New JobModel does not contain
> pid=a3e86ddf-8d18-40c9-8063-1efd588cec56. Stopping this processor. New
> JobModel: JobModel [..]
>

I'd have to guess that the processor isn't part of the quorum (list of
processors) that was used by the leader to generate the job model in the
first place and hence it is expected to ignore the job model change and
shut itself down.

I'd suggest

   1. Take a pass at whether this processor is part of the quorum and what
   happened to its membership.
   2. Take a pass at the leader's log to get some insights into what set of
   processors it started out with when generating the job model.

We will need more details to investigate the issue. If you can attach the
failed processor and leader logs, I can take a stab at it.

Thanks,
Bharath


On Mon, Dec 14, 2020 at 10:05 AM Malcolm McFarland 
wrote:

> Hey all,
>
> We have an app that's been running on v0.14.1 for the last few years, and
> we're trying to drag it forward into the present with v1.5.1. I've tried a
> few different approaches at updating it, including creating a
> TaskApplication via the low-level API and also following the "Legacy
> Applications" deploy instructions. Thus far, the legacy approach seems most
> promising, but the application isn't fully starting up. It _seems_ to be an
> issue with creating the JobModel; although there are no explicit errors, I
> do see these log messages:
>
> INFO [org.apache.samza.zk.ZkJobCoordinator] Got a notification for new
> JobModel version. Path = ..
> INFO [org.apache.samza.zk.ZkJobCoordinator]
> pid=a3e86ddf-8d18-40c9-8063-1efd588cec56: new JobModel is available.
> Version =9; JobModel = JobModel [..]
> INFO [org.apache.samza.zk.ZkJobCoordinator] New JobModel does not contain
> pid=a3e86ddf-8d18-40c9-8063-1efd588cec56. Stopping this processor. New
> JobModel: JobModel [..]
>
> At this point the ThreadJob shuts down cleanly. Afaict, the legacy
> configuration is set up correctly, and mirrors our functional build under
> 0.14.1. Any thoughts?
>
> Cheers,
> Malcolm McFarland
> Cavulus
>
>
> This correspondence is from HealthPlanCRM, LLC, d/b/a Cavulus. Any
> unauthorized or improper disclosure, copying, distribution, or use of the
> contents of this message is prohibited. The information contained in this
> message is intended only for the personal and confidential use of the
> recipient(s) named above. If you have received this message in error,
> please notify the sender immediately and delete the original message.
>


Re: Welcome Sanil Jain as Apache Samza committer

2021-02-02 Thread Bharath Kumara Subramanian
Congratulations Sanil!

On Tue, Feb 2, 2021 at 3:43 PM Miguel Sanchez Schwarz
 wrote:

> Congratulations Sanil!!
> 
> From: Wei Song 
> Sent: Tuesday, February 2, 2021 2:27 PM
> To: dev@samza.apache.org 
> Subject: Re: Welcome Sanil Jain as Apache Samza committer
>
> [weison...@gmail.com appears similar to someone who previously sent you
> email, but may not be that person. Learn why this could be a risk at
> http://aka.ms/LearnAboutSenderIdentification.]
>
> Congrats, Sanil!
>
> On 2/2/21, 2:03 PM, "Yi Pan"  wrote:
>
> Hi, everyone,
>
> I am glad to announce that Sanil Jain has officially accepted our
> invitation and become an Apache Samza committer now.
>
> Please join me to give him a warm welcome!
>
> Cheers!
>
> -Yi
>


Re: [VOTE] SEP-28: Samza State Backend Interface and Checkpointing Improvement

2021-06-22 Thread Bharath Kumara Subramanian
+1 (binding). Looking forward to seeing this running in production.

--
Bharath

On Tue, Jun 22, 2021 at 1:41 PM Sanil Jain  wrote:

> +1 (non-binding) Thanks for the contribution
>
> On Tue, 22 Jun 2021 at 13:09, Prateek Maheshwari 
> wrote:
>
> > +1 (binding). Thanks for the contribution!
> >
> > - Prateek
> >
> > On Tue, Jun 22, 2021 at 11:13 AM Yi Pan  wrote:
> >
> > > +1 (binding) this is going to improve our state recovery story
> > > significantly!
> > >
> > > -Yi
> > >
> > > On Mon, Jun 21, 2021 at 1:03 PM Daniel Chen  wrote:
> > >
> > > > Hi all,
> > > >
> > > > This is a call for a vote on SEP-28: Samza State Backend Interface
> and
> > > > Checkpointing Improvements. Thanks to everyone who was involved with
> > the
> > > > design and reviews to refine the proposal.
> > > >
> > > > Discussion thread:
> > > >
> > > >
> > >
> >
> http://mail-archives.apache.org/mod_mbox/samza-dev/202106.mbox/%3cCA+6YmWVVxz=xr244rpg2a-a6qaor0mjrw9ck41-u7tsuv8o...@mail.gmail.com%3e
> > > >
> > > > SEP-28:
> > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-28%3A+Samza+State+Backend+Interface+and+Checkpointing+Improvements
> > > >
> > > > Jira ticket:
> > > > https://issues.apache.org/jira/browse/SAMZA-2591
> > > >
> > > > Please vote:
> > > >
> > > > [ ] +1 approve
> > > >
> > > > [ ] +0 no opinion
> > > >
> > > > [ ] -1 disapprove (and reason why)
> > > >
> > > > Thanks,
> > > > Daniel
> > > >
> > >
> >
>


Re: [VOTE] SEP-29: Blob Store Based State Backup And Restore

2021-06-23 Thread Bharath Kumara Subramanian
+1 (binding). Thanks for the contribution!

--
Bharath

On Tue, Jun 22, 2021 at 8:19 PM Yi Pan  wrote:

> +1 (binding). Thanks for rolling out this big feature!
>
> -Yi
>
> On Tue, Jun 22, 2021 at 1:42 PM Sanil Jain  wrote:
>
> > +1 (non-binding) Thanks for this contribution!
> >
> > -Sanil
> >
> > On Tue, 22 Jun 2021 at 13:13, Daniel Chen  wrote:
> >
> > > +1 (non-binding), thanks!
> > >
> > > On Tue, Jun 22, 2021 at 1:10 PM Prateek Maheshwari <
> prate...@utexas.edu>
> > > wrote:
> > >
> > > > +1 (binding) from me. Thanks for the contribution!
> > > >
> > > > - Prateek
> > > >
> > > > On Tue, Jun 22, 2021 at 11:45 AM Prateek Maheshwari <
> > prate...@utexas.edu
> > > >
> > > > wrote:
> > > >
> > > > > Hi all,
> > > > >
> > > > > This is a call for a vote on SEP-29: Blob Store Based State Backup
> > And
> > > > > Restore.
> > > > >
> > > > > Discussion thread:
> > > > >
> > > >
> > >
> >
> https://mail-archives.apache.org/mod_mbox/samza-dev/202106.mbox/%3cCAMja7KdMNU_Zk-vDnwcm4GSJs==126-mu6djgtsoukzkxzf...@mail.gmail.com%3e
> > > > >
> > > > > SEP-29:
> > > > >
> > > >
> > >
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-29%3A+Blob+Store+Based+State+Backup+And+Restore
> > > > >
> > > > > Please vote:
> > > > > [  ] +1 approve
> > > > > [  ] +0 no opinion
> > > > > [  ] -1 disapprove (and the reason why)
> > > > >
> > > > > Thanks,
> > > > > Shekhar and Prateek
> > > > >
> > > >
> > >
> >
>


Draft board report for Samza

2021-04-16 Thread Bharath Kumara Subramanian
Hi all,

Here is a draft report for Samza. Please let me know if I missed anything.



## Description:
The mission of Samza is the creation and maintenance of software related to
distributed stream processing framework

## Issues:
- There are no issues requiring board attention.

## Membership Data:
Apache Samza was founded 2015-01-22 (6 years ago)
There are currently 28 committers and 17 PMC members in this project.
The Committer-to-PMC ratio is roughly 7:5.

Community changes, past quarter:
- No new PMC members. Last addition was Bharath Kumarasubramanian on
2020-02-13.
- Ke Wu was added as committer on 2021-02-25
- Sanil Jain was added as committer on 2021-02-01

## Project Activity:
- 1.6.0 was released on 2021-01-28.
- Support rack aware standby in YARN

## Community Health:
- JIRA Activity
  - 11 issues opened in JIRA, past quarter (-50% decrease)
  - 7 issues closed in JIRA, past quarter (-46% decrease)
- Commit Activity
  - 34 commits in the past quarter
  - 13 code contributors in the past quarter (44% increase)
  - 35 PRs opened on GitHub, past quarter (66% increase)
  - 38 PRs closed on GitHub, past quarter (90% increase)


Re: Kafka Offset Commit

2021-08-11 Thread Bharath Kumara Subramanian
Hi Talat,

It is expected behavior since Samza uses low level Kafka consumer instead
of the high level consumer. As a result, offset management is done by Samza
and doesn't leverage the offset management that Kafka consumer has by
default.

Additionally, kafka consumer group doesn't apply to samza as well since
Samza manages the assignments of Kafka partitions to tasks and doesn't
leverage Kafka's high level consumer behavior to assign its partition to
different consumers.

Hope that answers your question.

Thank you,
Bharath

On Tue, Aug 10, 2021 at 9:41 PM Talat Uyarer 
wrote:

> Thank you rayman. But my question is when i check kafka consumer group of
> the job. I dont see any offset movement. I chose to store checkpoints on
> file system. Do you think because of that i dont see my job's consumer
> group does not move offset ?
>
>
>
> On Tue, Aug 10, 2021, 9:32 PM rayman preet  wrote:
>
> > Hi Talat,
> >
> > Since in the job.properties the task.checkpoint.factory is set to
> > FileSystemCheckpointManagerFactory
> > and not org.apache.samza.checkpoint.kafka.KafkaCheckpointManagerFactory.
> > That is why its writing checkpoints to the filesystem (with its. location
> > controlled by task.checkpoint.path).
> >
> >
> >
> https://urldefense.proofpoint.com/v2/url?u=https-3A__samza.apache.org_learn_documentation_1.0.0_container_checkpointing.html=DwIBaQ=V9IgWpI5PvzTw83UyHGVSoW3Uc1MFWe5J8PTfkrzVSo=BkW1L6EF7ergAVYDXCo-3Vwkpy6qjsWAz7_GD7pAR8g=sDaqSy0r4X9x43flCgkiuMeZhtbLCX9uwEMkqxtvQ2g=vlTnWi-4Xxnk52pXrMZTfQekBoDWp66hovL2E_qi-Z8=
> >
> > has details on the configs we need to add to enable checkpointing to
> kafka
> > for a job.
> >
> > thanks
> >
> >
> > On Tue, Aug 10, 2021 at 5:03 PM Talat Uyarer <
> tuya...@paloaltonetworks.com
> > >
> > wrote:
> >
> > > Hi Samza Community,
> > >
> > > This is my first email. Forgive my lack of knowledge about samza. I am
> > > running a testing job in my environment. I run in local model but
> somehow
> > > my job is processing data however it does not commit offset on Kafka
> > side.
> > > I use an apache beam samza runner.
> > >
> > > My pipeline is simply read from kafka write to GCS bucket. Do you have
> > any
> > > idea where I should look for debugging this issue?
> > >
> > > This is my job.properties file
> > >
> > > app.runner.class=org.apache.samza.runtime.LocalApplicationRunner
> > > job.coordinator.factory=org.apache.samza.zk.ZkJobCoordinatorFactory
> > > job.coordinator.zk.connect=localhost:2181
> > >
> > >
> >
> task.name.grouper.factory=org.apache.samza.container.grouper.task.GroupByContainerIdsFactory
> > > job.config.rewriters=env-config
> > >
> > >
> >
> job.config.rewriter.env-config.class=org.apache.samza.config.EnvironmentConfigRewriter
> > > job.default.system=filereader
> > >
> > >
> >
> systems.filereader.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory
> > > job.container.thread.pool.size=300
> > >
> > >
> >
> job.systemstreampartition.grouper.factory=org.apache.samza.container.grouper.stream.GroupBySystemStreamPartitionFactory
> > >
> > >
> >
> task.checkpoint.factory=org.apache.samza.checkpoint.file.FileSystemCheckpointManagerFactory
> > > task.checkpoint.path=/home/checkpoints
> > >
> > > Thanks for your help in advance.
> > >
> > > Talat
> > >
> >
> >
> > --
> > thanks
> > rayman
> >
>


Re: [ANNOUNCE] Welcome Daniel Chen as Samza Committer

2021-09-20 Thread Bharath Kumara Subramanian
Congratulations Daniel!

Looking forward to more contributions.



On Mon, Sep 20, 2021 at 8:03 AM Hai Lu  wrote:

> Congrats!!!
>
> On Fri, Sep 17, 2021 at 2:06 PM Sanil Jain  wrote:
>
> > Congrats Daniel !
> >
> > On Fri, 17 Sept 2021 at 11:40, Jagadish Venkatraman <
> > jagadish1...@gmail.com>
> > wrote:
> >
> > > Congrats Daniel on this well deserved recognition.
> > >
> > > Look forward to more contributions!
> > >
> > >
> > > On Fri, Sep 17, 2021 at 11:25 AM Yi Pan  wrote:
> > >
> > > > Congrats, Daniel, well deserved!!!
> > > >
> > > > -Yi
> > > >
> > > > On Fri, Sep 17, 2021 at 11:23 AM Xinyu Liu 
> > > wrote:
> > > >
> > > > > Hi, all,
> > > > >
> > > > > I am glad to announce that Daniel Chen has officially accepted our
> > > > > invitation and become an Apache Samza Committer now.
> > > > >
> > > > > Daniel has contributed to many areas of Samza, from his early work
> on
> > > > > Eventhub connector, to recently state restore and checkpointing
> > > > > improvements. Daniel also contributed tremendously to integrate
> > Apache
> > > > Beam
> > > > > Python API on top of Samza. As an active member in Samza, he has
> > > > > participated frequently in the design, code reviews and mailing
> list
> > > > > discussions. He has also contributed to Samza tutorials, website,
> > > > releases
> > > > > and bug fixes.
> > > > >
> > > > > Considering his contributions, the Samza PMC trusts Daniel with the
> > > > > responsibilities of a Samza Committer.
> > > > >
> > > > > Please join me to give him a warm welcome!
> > > > >
> > > > > Xinyu Liu
> > > > > on behalf of the Apache Samza PMC
> > > > >
> > > >
> > >
> > >
> > > --
> > > -- Jagadish
> > >
> >
>


Re: [DISCUSS] Apache Samza 1.7.0 RC0

2022-01-26 Thread Bharath Kumara Subramanian
+1 from my end.

On Wed, Jan 26, 2022 at 3:07 PM Yi Pan  wrote:

> Huge +1! Can't wait to see this list of features coming out!
>
> -Yi
>
> On Wed, Jan 26, 2022 at 2:36 PM Daniel Chen  wrote:
>
> > Hi folks,
> >
> > We have added a number of major features and changes to master since
> >
> > 1.6, that warrants a major 1.7 release.
> >
> > Within LinkedIn, some of these features have already been tested as
> >
> > part of our test suites and are currently used in many of our production
> > jobs. We plan to continue our testing in the coming weeks to validate the
> > stability prior to release.
> >
> > We wanted to kick off the discussion in the open source forum to keep
> >
> > the momentum flowing.
> >
> > Here is a selected list of major features that are part of the new
> release:
> >
> >
> >
> >-
> >
> >SEP-28: Samza State Backend Interface and Checkpointing Improvements
> >(#1514)
> >-
> >
> >SEP-29: Blob Store as backend for Samza State backup and restore
> (#1501)
> >-
> >
> >SEP-30: Adding partial update api to Table API (#1560)
> >
> >
> >
> > You can find a concrete list of the features, bug-fixes, upgrades here
> >
> >
> >
> https://issues.apache.org/jira/issues/?jql=project%20%3D%20SAMZA%20AND%20fixVersion%20%3D%201.7
> >
>


Re: [VOTE] SEP-30: Support Updates in Table API

2022-01-26 Thread Bharath Kumara Subramanian
+1 (binding).

Looking forward to seeing this in production.

Thanks,
Bharath

On Mon, Jan 24, 2022 at 4:58 PM Yi Pan  wrote:

> Discussed and resolved the minor concerns offline. +1 (binding) for this
> one.
>
> Thanks!
>
> -Yi
>
> On Tue, Dec 21, 2021 at 1:28 PM Xinyu Liu  wrote:
>
> > +1 on my side.
> >
> > Glad to see this feature coming. Please make sure the api changes are
> > reflected in the documents, e.g.
> > https://samza.apache.org/learn/documentation/1.0.0/api/table-api.html.
> >
> > Thanks,
> > Xinyu
> >
> > On Mon, Dec 20, 2021 at 10:44 AM Ajo Thomas 
> > wrote:
> >
> > > Hi All,
> > >
> > > This is a call for a vote on SEP-30: Support Updates in Table API
> > > Thanks to everyone involved with the design and reviews to refine the
> > > proposal.
> > >
> > > Email Thread:
> > >
> > >
> >
> http://mail-archives.apache.org/mod_mbox/samza-dev/202112.mbox/%3cCAAMuQDN9fX64KONdqD1n06xTXvgMXNUqkt2RnPnt9Zr=vjn...@mail.gmail.com%3e
> > >
> > > SEP-30:
> > >
> > >
> >
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-30%3A+Support+Updates+in+Table+API
> > >
> > > Jira ticket:
> > > https://issues.apache.org/jira/browse/SAMZA-2709
> > >
> > > Please vote:
> > > [ ] +1 approve
> > > [ ] +0 no opinion
> > > [ ] -1 disapprove (and reason why)
> > >
> > > Thanks,
> > > Ajo Thomas
> > >
> >
>


Re: [DISCUSS] SEP-32: Elasticity for Samza

2023-02-07 Thread Bharath Kumara Subramanian
+1 on my end.

Looks good to me.
Thanks for putting this together, Manasa!

Cheers,
Bharath



On Mon, Feb 6, 2023 at 11:51 PM Jagadish Venkatraman 
wrote:

> Thank you Manasa for the proposal. I reviewed it and it looks good to me.
> nice work!
>
> +1 (approve) from my end.
>
>
>
> On Mon, Feb 6, 2023 at 11:41 PM Yi Pan  wrote:
>
> > Hi, Manasa,
> >
> > Sorry for the late reply. The revision lgtm. Thanks for the great work!
> >
> > Best,
> >
> > -Yi
> >
> > On Mon, Jan 30, 2023 at 12:11 PM Lakshmi Manasa <
> lakshmimanas...@gmail.com
> > >
> > wrote:
> >
> > > Hi Yi,
> > >
> > >  I have updated the SEP-32 including all feedback for the above
> > questions.
> > > Please let me know if there are any follow up questions.
> > >
> > > thanks,
> > > Manasa
> > >
> > > On Mon, Jan 23, 2023 at 8:56 AM Lakshmi Manasa <
> > lakshmimanas...@gmail.com>
> > > wrote:
> > >
> > >> Hi Yi,
> > >>
> > >> thank you for raising these questions. Please find my answers inline
> > >> below.
> > >>
> > >> *a) how are states for the virtual tasks managed during split/merge?*
> > >> for this SEP, stateful job elasticity is future work. SEP-32 currently
> > >> only deals with stateless elasticity
> > >> The idea for state preserving elasticity is to have a requirement that
> > >> only jobs can guarantee a bijective mapping between state key and
> input
> > key
> > >> will be supported.
> > >> This requirement is needed so that when input keys move from one
> virtual
> > >> task to another, it is easy to identify which state keys should be
> > present
> > >> in the store of the virtual task for correct operation.
> > >> additionally, stateful elasticity is only supported for jobs that rely
> > on
> > >> blob store for backup and restore.
> > >> Furthermore, for stateful jobs elasticity is increased or decreased
> only
> > >> in steps of 2.
> > >> With these restrictions in place, when a job starts with elasticity
> > >> factor 2, the state blob for the original task is copied for both
> > virtual
> > >> tasks during a split.
> > >> for a merge, when two virtual tasks merge into one (virtual/original)
> > >> task, the state blob of new task will need to be stitched from older
> > blobs.
> > >> This will need to be done by leveraging the stateKey input key
> bijective
> > >> mapping which will help determing for each state key in new blob, the
> > value
> > >> should come from which older blob
> > >> (older blob belonged to a virtual task that consumed an input key
> based
> > >> on the keyBucket of the virutal task)
> > >> That said the design for stateful needs more work and is planned for a
> > >> subsequent follow up SEP and this current SEP-32, focusses only on
> > >> stateless jobs
> > >>
> > >> *b) what's perf impact when we have 2 virtual tasks on the same SSP in
> > >> the same container, while one virtual task is much faster than the
> > other?*
> > >> SystemConsumer subscribes to the input system at a partition level.
> > >> Due to this even if one v. task is much faster than the other, since
> > both
> > >> consume the same SSP, system consumer of a container will only fetch
> > only
> > >> once the entire SSP buffer is empty.
> > >> This means even though one v. task is much faster, the perf will be
> > >> determined by the slower v. task.
> > >> however, this is not worse than the pre-elastic job perf and if num
> > >> containers is increased then the fast v.task can improve perf if the
> > slow
> > >> and fast v.task are in different containers (different system
> consumers)
> > >>
> > >> *c) what's the reason that a virtual task can not filter older
> messages
> > >> from a previous offset, in case the container restarts from a smaller
> > >> offset from another virtual task consuming the same SSP?*
> > >> iiuc this question is for when a containers has two v. tasks that
> > >> committed checkpoints for an SSP where one fast v.task commited a
> newer
> > >> offset and slow v.task committed an older offset.
> > >> In this scenario, the SEP says there could be duplicate processing as
> > the
> > >> SystemConsumer will start consuming from the older offset for the SSP.
> > >> Yes, an improvement can be done to enable the v.task that committed a
> > >> newer offset to start processing only from the offset after its
> > checkpoint
> > >> and filter out older messages.
> > >>
> > >> *d) how do we compare this w/ an alternative idea that implements a
> > >> KeyedOrderedExecutor w/ multiple parallel threads within the single
> > task's
> > >> main event loop to increase the parallelism?*
> > >> Is this similar to the per-key parallelism option (in the rejected
> > >> solutions section) with the difference that the num threads is fixed
> > for a
> > >> single task (as opposed to one thread per key in the rejected
> solution)?
> > >> this KeyOrdereredExecutor is better than the parallelism current
> > >> task.max.concurrency offers as it gives in-order execution per key.
> > >> However, for KeyOrderedExecutor solution num 

Re: [VOTE] SEP-32: Elasticity for Samza

2023-02-07 Thread Bharath Kumara Subramanian
+1 (binding)

Cheers,
Bharath

On Tue, Feb 7, 2023 at 12:56 PM Lakshmi Manasa 
wrote:

> Hi folks,
>
>  This is a call for vote on SEP-32: Elasticity for Samza.
> Thank you for reviewing the SEP and giving feedback.
>
> I have addressed the comments on the SEP and since there were three +1 on
> the discuss thread, starting this vote.
>
> Discussion thread:
> https://lists.apache.org/thread/vjtl5fnf64kpkoxc591466y92dlt2bsb
>
> SEP:
>
> https://cwiki.apache.org/confluence/display/SAMZA/SEP-32%3A+Elasticity+for+Samza
>
> Please vote:
> [ ] +1 approve
> [ ] +0 no opinion
> [ ] -1 disapprove (and reason why)
>
> thanks,
> Manasa
>