[GitHub] samza pull request #347: SAMZA-1478; Delete unneeded data from intermediate ...
GitHub user lindong28 opened a pull request:

    https://github.com/apache/samza/pull/347

    SAMZA-1478; Delete unneeded data from intermediate Kafka topic on offset commit

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/lindong28/samza SAMZA-1478

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/samza/pull/347.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #347

commit c418d59b45133478e0152625e2f2ec030e3400f3
Author: Dong Lin
Date:   2017-11-01T03:41:21Z

    SAMZA-1478; Delete unneeded data from intermediate Kafka topic on offset commit

---
[GitHub] samza pull request #346: Admin GetPartitionsCount should return count and no...
GitHub user sborya opened a pull request:

    https://github.com/apache/samza/pull/346

    Admin GetPartitionsCount should return count and not fake SamzaPartitionsMetadata

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sborya/samza getPartitionsCount

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/samza/pull/346.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #346

commit a31a7aa29b7be4bb46f8e651b6b8fa46a65b48e2
Author: Boris Shkolnik
Date:   2017-10-16T22:25:49Z

    reduce debugging from info to debug in KafkaCheckpointManager.java

commit 410ce78ba1ff8dafa2587481473e62ac9cfa6f4f
Author: Boris S
Date:   2017-10-17T01:20:04Z

    Merge branch 'master' of https://github.com/apache/samza

commit d4620d6690f74cad9472d0e27a1b31aeb4156c54
Author: Boris S
Date:   2017-10-25T00:11:48Z

    Merge branch 'master' of https://github.com/apache/samza

commit bbffb79b8b9799a41e8e82ded60f83550736886b
Author: Boris S
Date:   2017-10-25T00:54:20Z

    Merge branch 'master' of https://github.com/apache/samza

commit 010fa168ee2a290b93f5a3b0908709b2c19044ec
Author: Boris S
Date:   2017-10-25T01:33:03Z

    Merge branch 'master' of https://github.com/apache/samza

commit 5e6f5fb5f9a9ee12ce35ee8eb1836a058521df20
Author: Boris Shkolnik
Date:   2017-10-25T16:50:37Z

    Merge branch 'master' of https://github.com/apache/samza

commit 06b1ac36e9c67a3bd558a0fa592639b16fcbfda9
Author: Boris Shkolnik
Date:   2017-10-25T16:50:55Z

    Merge branch 'master' of https://github.com/sborya/samza

commit 1ad58d43fbe00a57054cb85b0be2eef6ee6470a6
Author: Boris S
Date:   2017-10-31T19:46:53Z

    Merge branch 'master' of https://github.com/apache/samza

commit 17f2d29c6731cf6b1d59362d43f4aa3b23bb11db
Author: Boris Shkolnik
Date:   2017-10-31T20:11:34Z

    getPartitionsCount returns Integer

---
Re: Samza questions (downtime during deployment and num partition per task)
Hi, Tony,

For your questions:

1) Having a hot-standby job instance for fail-over may introduce certain operational complications. For example, if both instances produce to the same output, then both will be running for a short period of time, which might lead to duplicates in the output. If the job has local state, it will be more complicated to continue using the previous state from the active job. So if your job doesn't have state and can handle duplicates, this might work.

Another option I can think of is to use the zk-based Samza deployment. This is a new feature we added to Samza. The description is at http://samza.apache.org/startup/preview/#flexible-deployment-model, and there is a hello-samza example for it: http://samza.apache.org/learn/tutorials/latest/hello-samza-high-level-zk.html. For zk-based deployment, we support rolling bounce, so you can upgrade your containers one at a time.

2) Yes, you can implement your own SystemStreamPartitionGrouper if you want a static assignment (from config). For example, please take a look at AllSspToSingleTaskGrouper, which groups all system stream partitions into a single task.

Thanks,
Xinyu

On Mon, Oct 30, 2017 at 2:02 PM, Tony Du wrote:

> Hi, we're looking into Samza for doing real-time processing. We have couple
> questions w.r.t Samza functionality
>
> 1. One "must-have" requirement for us is zero/minimal downtime during
> deployment of Samza jobs. One approach that we're thinking of is to start a
> new instance of the same Samza job and make sure it's running before take
> down the running one. Is there any problem to this approach? If so, is
> there any suggestion for this problem as I'm unable to find anything
> related to this in documentation.
>
> 2. Is it possible to configure multiple Kafka topic partitions to a single
> task instance within a job? Since each task has its own JVM and some of the
> job that we have use a lot of memory, it would be very wasteful to run many
> instances of them
>
> Thanks much for your help!
>
> --
> Tony T Du
> Sr. Software Engineer - Dataminr
>
> I hear I forget. I see I remember. I do I understand.
>
> --
>
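
[Editor's sketch] The static assignment suggested in point 2 of the reply above could look roughly like the following. This is only an illustration of the grouping logic, not Samza code: a real implementation would implement Samza's SystemStreamPartitionGrouper and use Samza's own task-name and partition types, whereas here plain strings stand in for both. The class name StaticGrouperSketch, the "system.stream.partition" key format, the assignment map, and the default task name are all hypothetical.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

// Hypothetical stand-in for a custom grouper. Strings replace Samza's
// task-name and system-stream-partition types so the sketch runs on its own.
class StaticGrouperSketch {

    // Groups partitions by a fixed partition -> task assignment (for example,
    // one parsed from job config). Partitions with no explicit entry fall
    // back to defaultTask, so every partition ends up in exactly one task.
    static Map<String, Set<String>> group(Set<String> ssps,
                                          Map<String, String> assignment,
                                          String defaultTask) {
        Map<String, Set<String>> grouped = new TreeMap<>();
        for (String ssp : ssps) {
            String task = assignment.getOrDefault(ssp, defaultTask);
            grouped.computeIfAbsent(task, k -> new TreeSet<>()).add(ssp);
        }
        return grouped;
    }

    public static void main(String[] args) {
        // Assumed key format: "system.stream.partition".
        Set<String> ssps = new HashSet<>(Arrays.asList(
                "kafka.events.0", "kafka.events.1", "kafka.events.2"));

        Map<String, String> assignment = new HashMap<>();
        assignment.put("kafka.events.0", "task-a");
        assignment.put("kafka.events.1", "task-a");

        // Partitions 0 and 1 share task-a; partition 2 uses the default task.
        System.out.println(group(ssps, assignment, "task-default"));
    }
}
```

Mapping every partition to the same task name reproduces the behaviour the reply attributes to AllSspToSingleTaskGrouper.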
[GitHub] samza pull request #345: SAMZA-1477: Fix issues found by BEAM tests
GitHub user xinyuiscool opened a pull request:

    https://github.com/apache/samza/pull/345

    SAMZA-1477: Fix issues found by BEAM tests

    A number of issues were found by BEAM tests, including:

    1) WatermarkFunction needs to be able to return output after processWatermark()
    2) The control message doesn't implement equals() and hashCode()
    3) Some Kafka system related code is not Scala 2.10 compatible for tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/xinyuiscool/samza SAMZA-1477

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/samza/pull/345.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #345

commit 735f4b966b19521c3d7003c25209f5344f0fb6c3
Author: xiliu
Date:   2017-10-31T17:41:56Z

    SAMZA-1477: Fix issues found by BEAM tests

---
Re: Exit code 248 from YARN and InterruptedException
Hi Xiaochuan,

Is there any exception in your custom SystemConsumer? Or, are you saying that your consumer is being interrupted externally? The entire log will be helpful.

Thanks,
Jagdish

On Mon, Oct 30, 2017 at 9:14 AM, XiaoChuan Yu wrote:

> Hi,
>
> I'm trying to debug some problem with a Samza job that loads data from a
> Cassandra database into Kafka via a custom System implementation.
> The System makes use of BlockingEnvelopeMap.
>
> What I see in the logs is InterruptedException from put calls from the
> class that inherited BlockingEnvelopeMap and then the container exiting
> with code 248:
>
> java.lang.InterruptedException: null
>     at java.util.concurrent.locks.AbstractQueuedSynchronizer.acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
>     at java.util.concurrent.locks.ReentrantLock.lockInterruptibly(ReentrantLock.java:335)
>     at java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
>     at org.apache.samza.util.BlockingEnvelopeMap.putAll(BlockingEnvelopeMap.java:199)
>     ...
>
> There is almost nothing else in the logs so I'm very puzzled on what could
> be causing this.
> Has anyone ever seen exit code 248?
> Could a problem with Kafka cause something like this?
>
> Thanks,
> Xiaochuan Yu
>

--
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University
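
[Editor's sketch] The stack trace quoted above surfaces from LinkedBlockingQueue.put(), which throws InterruptedException when the calling thread is interrupted. The following JDK-only sketch reproduces that one behaviour under assumed conditions: a queue of capacity 1 that is already full, and a producer thread interrupted by another thread. It does not use Samza and does not diagnose the job in this thread or explain exit code 248; the class and method names are hypothetical.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.atomic.AtomicBoolean;

// JDK-only illustration: a thread calling put() on a full
// LinkedBlockingQueue receives InterruptedException once it is interrupted.
class InterruptedPutSketch {

    // Returns true if the producer thread observed InterruptedException.
    static boolean putIsInterrupted() {
        LinkedBlockingQueue<String> queue = new LinkedBlockingQueue<>(1);
        queue.offer("first"); // fills the queue, so the next put() cannot complete

        AtomicBoolean sawInterrupt = new AtomicBoolean(false);
        Thread producer = new Thread(() -> {
            try {
                queue.put("second"); // cannot succeed: nobody drains the queue
            } catch (InterruptedException e) {
                sawInterrupt.set(true);
            }
        });

        producer.start();
        producer.interrupt(); // stands in for whatever interrupts the real thread

        try {
            producer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return sawInterrupt.get();
    }

    public static void main(String[] args) {
        System.out.println("put() interrupted: " + putIsInterrupted());
    }
}
```

In other words, the exception by itself only says that something interrupted the thread calling put(); which component did so, such as a shutdown already in progress, would have to come from the rest of the log, as the reply asks.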