Re: Review Request 52476: Do not load task store which are older than delete tombstones.

2016-10-19 Thread Shanthoosh Venkataraman


> On Oct. 17, 2016, 11:08 p.m., Prateek Maheshwari wrote:
> > samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala,
> >  line 156
> > 
> >
> > Unrelated, but let's make this info.

Done.


> On Oct. 17, 2016, 11:08 p.m., Prateek Maheshwari wrote:
> > samza-core/src/main/scala/org/apache/samza/config/StorageConfig.scala, line 
> > 33
> > 
> >
> > Usually more readable if you write this as a multiplication: 1 * 24 * 
> > 60 * 60 * 1000L

Done.


> On Oct. 17, 2016, 11:08 p.m., Prateek Maheshwari wrote:
> > samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala, 
> > line 532
> > 
> >
> > Prefer passing the one config that we need explicitly instead of 
> > passing the config object.

Moved to use changeLogDeleteRetentions map.


> On Oct. 17, 2016, 11:08 p.m., Prateek Maheshwari wrote:
> > samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala,
> >  line 29
> > 
> >
> > Unrelated to RB but prefer explicit imports.

Done.


> On Oct. 17, 2016, 11:08 p.m., Prateek Maheshwari wrote:
> > samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala,
> >  line 26
> > 
> >
> > Delete or import explicitly.

Done.


> On Oct. 17, 2016, 11:08 p.m., Prateek Maheshwari wrote:
> > samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala,
> >  line 107
> > 
> >
> > Add method description.

Done.


> On Oct. 17, 2016, 11:08 p.m., Prateek Maheshwari wrote:
> > samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala,
> >  line 114
> > 
> >
> > Another case we ran into on Friday - if the oldest offset in the 
> > changelog topic is newer than the offset in the OFFSET file. Do you need to 
> > handle that here?
> > 
> > Nitpick: would isStaleStore be clearer?

Discussed offline. This is a regular scenario that happens with compaction 
(message expiration in general w.r.t topics). When the offset in the offset 
file is older than oldest offset in changelog, it indicates that compaction has 
happened. To not miss messages from the topic in the consumption, users have to 
consume from the oldest offset in the changelog, which is controlled by the 
config parameter systems.name.consumer.auto.offset.reset.


> On Oct. 17, 2016, 11:08 p.m., Prateek Maheshwari wrote:
> > samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala,
> >  line 122
> > 
> >
> > Mention somewhere in the message that this means that the store is 
> > stale.

Done.


- Shanthoosh


---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52476/#review152976
---


On Oct. 19, 2016, 10:04 p.m., Shanthoosh Venkataraman wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52476/
> ---
> 
> (Updated Oct. 19, 2016, 10:04 p.m.)
> 
> 
> Review request for samza.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Every local task store is backed up by a kafka changelog topic. Due to log 
> compaction, delete tombstones of the changelog topic have a ttl of 
> delete.retention.ms. Replaying the events from the changelog that has missing 
> delete tombstones, would result in creation of an inconsistent local 
> store(due to the missing of some delete events). This patch deletes the local 
> stores in which difference between current time and last modified time of the 
> offset file is greater than delete.retention.ms during the container startup.
> 
> 
> Diffs
> -
> 
>   samza-core/src/main/java/org/apache/samza/storage/StorageRecovery.java 
> 9329edf7d724f3a0d9235354bb77936f713e3b5f 
>   samza-core/src/main/scala/org/apache/samza/config/StorageConfig.scala 
> be3f1068f7921c9c8f8697577efea6580672813e 
>   samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
> 05a996c98075ea8ed3767af666b9beeb1933f2a6 
>   samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala 
> 0b7bcdda1639eea8239a69c31bdf42558e9077d2 
>   
> samza-core/src/test/scala/org/apache/samza/storage/TestTaskStorageManager.scala
>  4d40f520e54beb643acd8410c772b75e2f6a9162 
>   

Re: Review Request 52476: Do not load task store which are older than delete tombstones.

2016-10-19 Thread Shanthoosh Venkataraman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52476/
---

(Updated Oct. 19, 2016, 10:04 p.m.)


Review request for samza.


Repository: samza


Description
---

Every local task store is backed up by a kafka changelog topic. Due to log 
compaction, delete tombstones of the changelog topic have a ttl of 
delete.retention.ms. Replaying the events from the changelog that has missing 
delete tombstones, would result in creation of an inconsistent local store(due 
to the missing of some delete events). This patch deletes the local stores in 
which difference between current time and last modified time of the offset file 
is greater than delete.retention.ms during the container startup.


Diffs (updated)
-

  samza-core/src/main/java/org/apache/samza/storage/StorageRecovery.java 
9329edf7d724f3a0d9235354bb77936f713e3b5f 
  samza-core/src/main/scala/org/apache/samza/config/StorageConfig.scala 
be3f1068f7921c9c8f8697577efea6580672813e 
  samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
05a996c98075ea8ed3767af666b9beeb1933f2a6 
  samza-core/src/main/scala/org/apache/samza/storage/TaskStorageManager.scala 
0b7bcdda1639eea8239a69c31bdf42558e9077d2 
  
samza-core/src/test/scala/org/apache/samza/storage/TestTaskStorageManager.scala 
4d40f520e54beb643acd8410c772b75e2f6a9162 
  samza-kafka/src/main/scala/org/apache/samza/config/KafkaConfig.scala 
973ab8cfb3d248bec7efe5e338f5e667f097556d 

Diff: https://reviews.apache.org/r/52476/diff/


Testing
---

Unit testing and manual testing has been done to verify the functionality.


Thanks,

Shanthoosh Venkataraman



Re: Review Request 53028: SAMZA-1040: Revert the ClassLoaderHelper change in SamzaContainer

2016-10-19 Thread Jagadish Venkatraman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53028/#review153311
---


Ship it!




Ship It!

- Jagadish Venkatraman


On Oct. 19, 2016, 6:30 p.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/53028/
> ---
> 
> (Updated Oct. 19, 2016, 6:30 p.m.)
> 
> 
> Review request for samza, Jake Maes and Jagadish Venkatraman.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> The change introduced by ClassLoaderHelpler does not work for 
> AsyncStreamTask, so the patch has been reverted in 0.11.0 branch. This is the 
> fix forward for the master branch.
> 
> 
> Diffs
> -
> 
>   samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
> 05a996c98075ea8ed3767af666b9beeb1933f2a6 
> 
> Diff: https://reviews.apache.org/r/53028/diff/
> 
> 
> Testing
> ---
> 
> Tested by AsyncStreamTask jobs.
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>



Re: Review Request 53002: Do not load the monitor, if the MonitorFactoryClass is not defined for the monitor in the config.

2016-10-19 Thread Shanthoosh Venkataraman

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53002/
---

(Updated Oct. 19, 2016, 6 p.m.)


Review request for samza and Jake Maes.


Repository: samza


Description (updated)
---

This patch aims to not load the monitors if the monitor factory class is not 
defined(set) for the monitor in the config. This will enable the users to turn 
on/off the monitors in samza-rest easily(just by setting the 
monitorFactoryClass config associated with monitor a to empty string.). This is 
associated with the JIRA ticket : 
https://issues.apache.org/jira/browse/SAMZA-1039


Diffs
-

  build.gradle 98839f2d44569cfe04f12207117668afd593b4df 
  samza-rest/src/main/java/org/apache/samza/monitor/SamzaMonitorService.java 
ce947f7ae1175acc1ee9aa75991c726848072694 
  samza-rest/src/test/java/org/apache/samza/monitor/TestMonitorService.java 
4618b54f5af861383df45bf7185622d36d17cd5e 
  
samza-rest/src/test/java/org/apache/samza/monitor/mock/MockMonitorFactory.java 
PRE-CREATION 

Diff: https://reviews.apache.org/r/53002/diff/


Testing
---

Unit testing and manual testing are done to verify the functionality.


Thanks,

Shanthoosh Venkataraman



Re: Review Request 53027: SAMZA-1017 - Added disk quota based throttling to AsyncRunLoop.

2016-10-19 Thread Prateek Maheshwari

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53027/
---

(Updated Oct. 19, 2016, 10:54 a.m.)


Review request for samza and Xinyu Liu.


Changes
---

Updated description.


Bugs: SAMZA-1017
https://issues.apache.org/jira/browse/SAMZA-1017


Repository: samza


Description (updated)
---

Added disk quota based throttling to AsyncRunLoop.

Overview:
Adds a Throttleable interface, implemented by RunLoop, AsyncRunLoop and 
ThrottlingExecutor
When AsyncRunLoop is throttled, it delays the onComplete() callback from 
processAsync() by a delay amount appropriate for the desired work factor.

This implementation has a couple of known issues:
1. Adding additional delay to process()/processAsync() callback will not 
throttle the run loop as long as task processing rate > message throughput. 
E.g., a low QPS stream with process() time < message inter-arrival time. If 
desirable, this can be addressed by delaying based on the total run loop time 
instead of just the process() time.

2. If throttled, users can increase their throughput back to original by 
increasing task.max.concurrency and redeploying their jobs. I don't have a 
simple solution for this, suggestions are welcome.


Diffs
-

  samza-core/src/main/java/org/apache/samza/container/RunLoopFactory.java 
a789d04 
  
samza-core/src/main/java/org/apache/samza/container/disk/WatermarkDiskQuotaPolicy.java
 21fbca2 
  samza-core/src/main/java/org/apache/samza/task/AsyncRunLoop.java a510bb0 
  samza-core/src/main/java/org/apache/samza/util/Throttleable.java PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/util/ThrottlingExecutor.java 
afcc4c5 
  samza-core/src/main/java/org/apache/samza/util/ThrottlingScheduler.java 
PRE-CREATION 
  samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala 538ebb8 
  samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
05a996c 
  samza-core/src/test/java/org/apache/samza/task/TestAsyncRunLoop.java ca913de 
  samza-core/src/test/java/org/apache/samza/util/TestThrottlingScheduler.java 
PRE-CREATION 
  samza-core/src/test/scala/org/apache/samza/container/TestRunLoop.scala 
aa1a8d6 
  samza-core/src/test/scala/org/apache/samza/container/TestSamzaContainer.scala 
cff6b96 

Diff: https://reviews.apache.org/r/53027/diff/


Testing
---

Tested locally with a hello world app.


Thanks,

Prateek Maheshwari



Review Request 53027: SAMZA-1017 - Added disk quota based throttling to AsyncRunLoop.

2016-10-19 Thread Prateek Maheshwari

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/53027/
---

Review request for samza and Xinyu Liu.


Bugs: SAMZA-1017
https://issues.apache.org/jira/browse/SAMZA-1017


Repository: samza


Description
---

Added disk quota based throttling to AsyncRunLoop.

Overview:
Adds a Throttleable interface, implemented by SyncRunLoop, AsyncRunLoop and 
ThrottlingExecutor
Adds a RunLoop interface, implemented by SyncRunLoop (formerly RunLoop) and 
AsyncRunLoop. 
When AsyncRunLoop is throttled, it delays the onComplete() callback from 
processAsync() by a delay amount appropriate for the desired work factor.

This implementation has a couple of known issues:
1. Adding additional delay to process()/processAsync() callback will not 
throttle the run loop as long as task processing rate > message throughput. 
E.g., a low QPS stream with process() time < message inter-arrival time. If 
desirable, this can be addressed by delaying based on the total run loop time 
instead of just the process() time.

2. If throttled, users can increase their throughput back to original by 
increasing task.max.concurrency and redeploying their jobs. I don't have a 
simple solution for this, suggestions are welcome.


Diffs
-

  samza-core/src/main/java/org/apache/samza/container/RunLoopFactory.java 
a789d04 
  
samza-core/src/main/java/org/apache/samza/container/disk/WatermarkDiskQuotaPolicy.java
 21fbca2 
  samza-core/src/main/java/org/apache/samza/task/AsyncRunLoop.java a510bb0 
  samza-core/src/main/java/org/apache/samza/util/Throttleable.java PRE-CREATION 
  samza-core/src/main/java/org/apache/samza/util/ThrottlingExecutor.java 
afcc4c5 
  samza-core/src/main/java/org/apache/samza/util/ThrottlingScheduler.java 
PRE-CREATION 
  samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala 538ebb8 
  samza-core/src/main/scala/org/apache/samza/container/SamzaContainer.scala 
05a996c 
  samza-core/src/test/java/org/apache/samza/task/TestAsyncRunLoop.java ca913de 
  samza-core/src/test/java/org/apache/samza/util/TestThrottlingScheduler.java 
PRE-CREATION 
  samza-core/src/test/scala/org/apache/samza/container/TestRunLoop.scala 
aa1a8d6 
  samza-core/src/test/scala/org/apache/samza/container/TestSamzaContainer.scala 
cff6b96 

Diff: https://reviews.apache.org/r/53027/diff/


Testing
---

Tested locally with a hello world app.


Thanks,

Prateek Maheshwari



Re: Review Request 52960: SAMZA-1029: Prepare release candidate for 0.11.0

2016-10-19 Thread Jake Maes

---
This is an automatically generated e-mail. To reply, visit:
https://reviews.apache.org/r/52960/#review153288
---


Ship it!




Ship It!

- Jake Maes


On Oct. 18, 2016, 9:28 p.m., Xinyu Liu wrote:
> 
> ---
> This is an automatically generated e-mail. To reply, visit:
> https://reviews.apache.org/r/52960/
> ---
> 
> (Updated Oct. 18, 2016, 9:28 p.m.)
> 
> 
> Review request for samza and Navina Ramesh.
> 
> 
> Repository: samza
> 
> 
> Description
> ---
> 
> Doc updates for the master.
> 
> 
> Diffs
> -
> 
>   docs/_config.yml dc1a66fa743d464c70d92406540fd7122c45272c 
>   docs/_layouts/default.html 60e56b5a14f5211a5ff0e2812c1fc331a25ebfe5 
>   docs/archive/index.html b0a44c6ab40f4eeb7ea4dfc17a8c7243b7e6e035 
>   docs/learn/tutorials/versioned/deploy-samza-job-from-hdfs.md 
> ca7b5f1a59724bbae9c46c7abd0d68cb3f019e3b 
>   docs/learn/tutorials/versioned/deploy-samza-to-CDH.md 
> daf762bc9f536520cceb503c5053283a80488bb1 
>   docs/learn/tutorials/versioned/remote-debugging-samza.md 
> 40db31a8152a999b549fde8f9155f4541d03147d 
>   docs/learn/tutorials/versioned/run-in-multi-node-yarn.md 
> bf2b59e3f4e0c6a3bfde0187db0b799f76797afb 
>   docs/learn/tutorials/versioned/samza-rest-getting-started.md 
> 942329e968bb02886df44b680bea8f75a221a289 
>   docs/startup/download/index.md 6a0c670bca01e01b9d8a73482af35cc144f1d524 
>   docs/startup/hello-samza/versioned/index.md 
> 8baacd390d41c5c87a426d63eec9ce5028de0cc2 
>   gradle.properties f032b745a7ceae319996314f22c16fe0b664e705 
> 
> Diff: https://reviews.apache.org/r/52960/diff/
> 
> 
> Testing
> ---
> 
> Local website
> 
> 
> Thanks,
> 
> Xinyu Liu
> 
>