[GitHub] samza pull request #347: SAMZA-1478; Delete unneeded data from intermediate ...

2017-10-31 Thread lindong28
GitHub user lindong28 opened a pull request:

https://github.com/apache/samza/pull/347

SAMZA-1478; Delete unneeded data from intermediate Kafka topic on offset 
commit



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/lindong28/samza SAMZA-1478

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/347.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #347


commit c418d59b45133478e0152625e2f2ec030e3400f3
Author: Dong Lin 
Date:   2017-11-01T03:41:21Z

SAMZA-1478; Delete unneeded data from intermediate Kafka topic on offset 
commit




---


[GitHub] samza pull request #346: Admin GetPartitionsCount should return count and no...

2017-10-31 Thread sborya
GitHub user sborya opened a pull request:

https://github.com/apache/samza/pull/346

Admin GetPartitionsCount should return count and not fake 
SamzaPartitionsMetadata



You can merge this pull request into a Git repository by running:

$ git pull https://github.com/sborya/samza getPartitionsCount

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/346.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #346


commit a31a7aa29b7be4bb46f8e651b6b8fa46a65b48e2
Author: Boris Shkolnik 
Date:   2017-10-16T22:25:49Z

reduce debugging from info to debug in KafkaCheckpointManager.java

commit 410ce78ba1ff8dafa2587481473e62ac9cfa6f4f
Author: Boris S 
Date:   2017-10-17T01:20:04Z

Merge branch 'master' of https://github.com/apache/samza

commit d4620d6690f74cad9472d0e27a1b31aeb4156c54
Author: Boris S 
Date:   2017-10-25T00:11:48Z

Merge branch 'master' of https://github.com/apache/samza

commit bbffb79b8b9799a41e8e82ded60f83550736886b
Author: Boris S 
Date:   2017-10-25T00:54:20Z

Merge branch 'master' of https://github.com/apache/samza

commit 010fa168ee2a290b93f5a3b0908709b2c19044ec
Author: Boris S 
Date:   2017-10-25T01:33:03Z

Merge branch 'master' of https://github.com/apache/samza

commit 5e6f5fb5f9a9ee12ce35ee8eb1836a058521df20
Author: Boris Shkolnik 
Date:   2017-10-25T16:50:37Z

Merge branch 'master' of https://github.com/apache/samza

commit 06b1ac36e9c67a3bd558a0fa592639b16fcbfda9
Author: Boris Shkolnik 
Date:   2017-10-25T16:50:55Z

Merge branch 'master' of https://github.com/sborya/samza

commit 1ad58d43fbe00a57054cb85b0be2eef6ee6470a6
Author: Boris S 
Date:   2017-10-31T19:46:53Z

Merge branch 'master' of https://github.com/apache/samza

commit 17f2d29c6731cf6b1d59362d43f4aa3b23bb11db
Author: Boris Shkolnik 
Date:   2017-10-31T20:11:34Z

getPartitionsCount returns Integer




---


Re: Samza questions (downtime during deployment and num partition per task)

2017-10-31 Thread xinyu liu
Hi, Tony,

For your questions:

1) Having a hot-standby job instance for fail-over may introduce certain
operational complications. For example, if they produce to the same output,
then both will be running in a short period of time, which might lead to
duplicates in output. If the jobs has local state, it will be more
complicated to continue using the previous states from the active job. So
if your job doesn't have state and can handle duplicates, this might work.
Another option I can think of is to use the zk-based Samza deployment. This
is a new feature we added to Samza. The description is
http://samza.apache.org/startup/preview/#flexible-deployment-model, and
there is a hello-samza example for it:
http://samza.apache.org/learn/tutorials/latest/hello-samza-high-level-zk.html.
For zk-based deployment, we support rolling bounce so you can upgrade your
container one at a time.

2) Yes, you can implement your own SystemStreamPartitionGrouper if you want
a static assignment (from config). For example, please take a look at
AllSspToSingleTaskGrouper which groups all system stream partitions into a
single task.

Thanks,
Xinyu

On Mon, Oct 30, 2017 at 2:02 PM, Tony Du  wrote:

> Hi, we're looking into Samza for doing real-time processing. We have couple
> questions w.r.t Samza functionality
>
> 1. One "must-have" requirement for us is zero/minimal downtime during
> deployment of Samza jobs. One approach that we're thinking of is to start a
> new instance of the same Samza job and make sure it's running before take
> down the running one. Is there any problem to this approach? If so, is
> there any suggestion for this problem as I'm unable to find anything
> related to this in documentation.
>
> 2. Is it possible to configure multiple Kafka topic partitions to a single
> task instance within a job? Since each task has its own JVM and some of the
> job that we have use a lot of memory, it would be very wasteful to run many
> instances of them
>
> Thanks much for your help!
>
> 
> --
> Tony T Du
> Sr. Software Engineer - Dataminr
>
> I hear I forget. I see I remember. I do I understand.
> 
> --
>


[GitHub] samza pull request #345: SAMZA-1477: Fix issues found by BEAM tests

2017-10-31 Thread xinyuiscool
GitHub user xinyuiscool opened a pull request:

https://github.com/apache/samza/pull/345

SAMZA-1477: Fix issues found by BEAM tests

A bunch of issues were found by BEAM tests, which includes:

1) WatermarkFunction needs to be able to return output after 
processWatermark()
2) control message doesn't implement the equals() and hashcode()
3) Some kafka system related code is not scala 2.10 compatible for tests.

You can merge this pull request into a Git repository by running:

$ git pull https://github.com/xinyuiscool/samza SAMZA-1477

Alternatively you can review and apply these changes as the patch at:

https://github.com/apache/samza/pull/345.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

This closes #345


commit 735f4b966b19521c3d7003c25209f5344f0fb6c3
Author: xiliu 
Date:   2017-10-31T17:41:56Z

SAMZA-1477: Fix issues found by BEAM tests




---


Re: Exit code 248 from YARN and InterruptedException

2017-10-31 Thread Jagadish Venkatraman
Hi Xiaochuan,

Is there any exception in your custom SystemConsumer? Or, are you saying
that your consumer is being interrupted externally?

The entire log will be helpful.

Thanks,
Jagdish

On Mon, Oct 30, 2017 at 9:14 AM, XiaoChuan Yu  wrote:

> Hi,
>
> I'm trying to debug some problem with a Samza job that loads data from a
> Cassandra database into Kafka via a custom System implementation.
> The System makes use of BlockingEnvelopeMap.
>
> What I see in the logs is InterruptedException from put calls from the
> class that inherited BlockingEnvelopeMap and then the container exiting
> with code 248:
>
> java.lang.InterruptedException: null
> at
> java.util.concurrent.locks.AbstractQueuedSynchronizer.
> acquireInterruptibly(AbstractQueuedSynchronizer.java:1220)
> at
> java.util.concurrent.locks.ReentrantLock.lockInterruptibly(
> ReentrantLock.java:335)
> at
> java.util.concurrent.LinkedBlockingQueue.put(LinkedBlockingQueue.java:339)
> at
> org.apache.samza.util.BlockingEnvelopeMap.putAll(
> BlockingEnvelopeMap.java:199)
> ...
>
> There is almost nothing else in the logs so I'm very puzzled on what could
> be causing this.
> Has anyone ever seen exit code 248?
> Could a problem with Kafka cause something like this?
>
> Thanks,
> Xiaochuan Yu
>



-- 
Jagadish V,
Graduate Student,
Department of Computer Science,
Stanford University