Re: [DISCUSS] SEP-2: ApplicationRunner Design

2017-06-21 Thread Chris Pettitt
t > we don't have so many variants and parameters. Otherwise, as we add more > parameters for event time support and type information for serdes, it'll > get really difficult to both discover the appropriate window variant and > understand a window specification once written. >

Re: [DISCUSS] SEP-2: ApplicationRunner Design

2017-06-20 Thread Chris Pettitt
wing first and then separately specify the aggregation operation (e.g. group by key, combine, sum, etc.). This saves from the combinatorial explosion of keyed / global, unwindowed (batch) / session / fixed / tumbling / etc., GBK / sum / count / etc. - Chris On Tue, Jun 20, 2017 at 10:46 AM, Chris Pett

Re: [DISCUSS] SEP-2: ApplicationRunner Design

2017-06-20 Thread Chris Pettitt
it for each job, >>removing the current JobRunner interface >>6. We also need a StreamTaskApplication class that allows user to >>create task-level applications, by mandate the constructor with a >> parameter >>of StreamTaskFactory >> >>

Re: [DISCUSS] SEP-6: Support Watermark Across Intermediate Streams for Batch Processing

2017-06-01 Thread Chris Pettitt
taking one input source emits watermark at 1min interval and the >> task taking another input source emits watermark at 5min interval, how does >> the downstream consumer reconcile the watermarks? >> >> Watermark propagation does not require synchronization. Chris's equati

Re: [DISCUSS] SEP-6: Support Watermark Across Intermediate Streams for Batch Processing

2017-05-30 Thread Chris Pettitt
oo and I dropped the ControlMessage type in > my > > code. I also moved taskName, taskCount to the parent ControlMessage > class. > > Just updated the SEP-6. Please take a look again. > > > > Thanks, > > Xinyu > > > > On Tue, May 30, 2017 at 9:12 AM, Chri

Re: [DISCUSS] SEP-6: Support Watermark Across Intermediate Streams for Batch Processing

2017-05-30 Thread Chris Pettitt
MessageType and ControlMessage.Type look redundant. You could either use "ControlMessage" as the type in MessageType or drop ControlMessage.Type. On Fri, May 26, 2017 at 5:14 PM, xinyu liu wrote: > Thanks a lot for the comments. I updated the SEP with more details and >

Re: [DISCUSS] SEP-2: ApplicationRunner Design

2017-05-03 Thread Chris Pettitt
ke Kafka, > artifactory or http server. For Kafka, it's straightforward but we will > have the size limit or cut it by ourselves. For the other two, we need to > investigate whether we can easily upload jars to our artifactory and > localizing it with Yarn. Any opinions on this? >

Re: [DISCUSS] SEP-2: ApplicationRunner Design

2017-04-28 Thread Chris Pettitt
.run(), the operator user classes/lamdas in the StreamGraph need to > > be serialized. As today, the existing option is to serialize to a stream, > > either the coordinator stream or the pipeline control stream, which will > > have the size limit per message. Do you see RPC as an

Re: [DISCUSS] SEP-2: ApplicationRunner Design

2017-04-27 Thread Chris Pettitt
That should have been: For #1, Beam doesn't have a hard requirement... On Thu, Apr 27, 2017 at 9:07 AM, Chris Pettitt <cpett...@linkedin.com> wrote: > For #1, I doesn't have a hard requirement for any change from Samza. A > very nice to have would be to allow the input systems

Re: [DISCUSS] SEP-2: ApplicationRunner Design

2017-04-27 Thread Chris Pettitt
For #1, I doesn't have a hard requirement for any change from Samza. A very nice to have would be to allow the input systems to be set up at the same time as the rest of the StreamGraph. An even nicer to have would be to do away with the callback based approach and treat graph building as a

Re: [DISCUSS] SEP-2: ApplicationRunner Design

2017-04-21 Thread Chris Pettitt
, Apr 20, 2017 at 3:52 PM, Chris Pettitt <cpett...@linkedin.com> wrote: > It might be worth taking a look at how Beam does test streams. The API is > more powerful than just passing in a queue, e.g.: > > TestStream source = TestStream.create(StringUtf8Coder.of()) > .addElement

Re: [DISCUSS] SEP-2: ApplicationRunner Design

2017-04-20 Thread Chris Pettitt
It might be worth taking a look at how Beam does test streams. The API is more powerful than just passing in a queue, e.g.: TestStream source = TestStream.create(StringUtf8Coder.of()) .addElements(TimestampedValue.of("this", start)) .addElements(TimestampedValue.of("that", start))

Re: Review Request 53282: SAMZA-1043: Samza performance improvements

2016-11-02 Thread Chris Pettitt
(line 25) <https://reviews.apache.org/r/53282/#comment224228> We have a HighResolutionClock that does this. I think you can use it here. - Chris Pettitt On Nov. 2, 2016, 5:56 p.m., Xinyu Liu wrote: > > --- > This is a

Re: Review Request 52403: SAMZA-1028: Moving logline before closing kafka producer and making exception thrown AtomicReference

2016-09-29 Thread Chris Pettitt
need an atomic ref - a volatile would be sufficient. However, if the code can be run in a multi-threaded path, you would indeed want to use CAS to dequeue the exception in one operation. - Chris Pettitt On Sept. 29, 2016, 7:04 p.m., Xinyu Liu wrote

Re: Review Request 47835: SAMZA-914: Initial draft for Java programming APIs on operators supporting DAGs

2016-09-14 Thread Chris Pettitt
> On Sept. 14, 2016, 7:03 p.m., Chris Pettitt wrote: > > This review turned out not to be so massive as I expected (lots of > > deletes). I don't see any serious issues. There are some minor cosmetic > > issues and some subjective stuff that you can take or leave. I t

Re: Review Request 47835: SAMZA-914: Initial draft for Java programming APIs on operators supporting DAGs

2016-09-14 Thread Chris Pettitt
35/#comment216488> This doesn't appear to be testing anything (other than that types check). samza-operator/src/test/java/org/apache/samza/task/AssembleCallGraphTask.java (line 141) <https://reviews.apache.org/r/47835/#comment216489> Minor: indentation is off here.

Re: Review Request 51346: SAMZA-974 - Support finite datasources in Samza that have a notion of End-Of-Stream

2016-09-01 Thread Chris Pettitt
> On Aug. 26, 2016, 7:51 p.m., Chris Pettitt wrote: > > samza-api/src/main/java/org/apache/samza/system/IncomingMessageEnvelope.java, > > line 31 > > <https://reviews.apache.org/r/51346/diff/4/?file=1486375#file1486375line31> > > > > How likely are we

Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-07-28 Thread Chris Pettitt
, the documentation reads like it is OK to do synchronous IO by just throwing more threads at the problem. This is not a good idea. The ability to do synchronous IO is a transition step towards async, not an end state. - Chris Pettitt On July 27, 2016, 11:05 p.m., Xinyu Liu wrote

Re: Review Request 50174: SAMZA-977: User doc for samza multithreading

2016-07-20 Thread Chris Pettitt
> > (Updated July 19, 2016, 1:05 a.m.) > > > Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data > Infrastructure). > > > Repository: samza > > > Description > --- >

Re: Review Request 47835: SAMZA-914: Initial draft for Java programming APIs on operators supporting DAGs

2016-07-18 Thread Chris Pettitt
from T to a Collection of R instead of having a separate flat map function type. This comment can be applied generally wherever we're providing "functions" - Chris Pettitt On July 13, 2016, 8:54 a.m., Yi Pan (Data Infra

Re: Review Request 50082: SAMZA-973: Disk Quotas: clamp max delay, better measure processing time

2016-07-18 Thread Chris Pettitt
/SamzaContainer.scala 18c09224bbae959342daf9b2b7a7d971cc224f48 samza-core/src/test/java/org/apache/samza/util/TestThrottlingExecutor.java 26590507b9c72a8c64171aeb1e5b7c3d5c24c41a Diff: https://reviews.apache.org/r/50082/diff/ Testing --- gradle test Thanks, Chris Pettitt

Re: Review Request 50082: SAMZA-973: Disk Quotas: clamp max delay, better measure processing time

2016-07-18 Thread Chris Pettitt
his is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/50082/#review142459 --- On July 15, 2016, 7:59 p.m., Chris Pettitt wrote: > > --- >

Re: Review Request 50082: SAMZA-973: Disk Quotas: clamp max delay, better measure processing time

2016-07-15 Thread Chris Pettitt
/SamzaContainer.scala 18c09224bbae959342daf9b2b7a7d971cc224f48 samza-core/src/test/java/org/apache/samza/util/TestThrottlingExecutor.java 26590507b9c72a8c64171aeb1e5b7c3d5c24c41a Diff: https://reviews.apache.org/r/50082/diff/ Testing (updated) --- gradle test Thanks, Chris Pettitt

Review Request 50082: SAMZA-973: Disk Quotas: clamp max delay, better measure processing time

2016-07-15 Thread Chris Pettitt
18c09224bbae959342daf9b2b7a7d971cc224f48 samza-core/src/test/java/org/apache/samza/util/TestThrottlingExecutor.java 26590507b9c72a8c64171aeb1e5b7c3d5c24c41a Diff: https://reviews.apache.org/r/50082/diff/ Testing --- Thanks, Chris Pettitt

Re: Review Request 48213: SAMZA-960: Make system producer thread safe

2016-07-15 Thread Chris Pettitt
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/48213/#review142418 --- Ship it! - Chris Pettitt On July 14, 2016, 12:43 a.m., Xinyu

Re: Review Request 49877: SAMZA-972: Holistic memory monitoring for SamzaContainer

2016-07-15 Thread Chris Pettitt
r/host/PosixCommandBasedStatisticsGetter.java (lines 47 - 55) <https://reviews.apache.org/r/49877/#comment207972> Not required, but you could simplify this with try-with-resources: https://docs.oracle.com/javase/tutorial/essential/exceptions/tryResourceClose.html - Chris Pettitt On July 14, 20

Re: Review Request 48356: RFC: Samza as a library

2016-07-12 Thread Chris Pettitt
> On July 12, 2016, 6:43 p.m., Chris Pettitt wrote: > > samza-core/src/main/java/org/apache/samza/processor/StreamProcessor.java, > > lines 174-176 > > <https://reviews.apache.org/r/48356/diff/8/?file=1441900#file1441900line174> > > > > Don't we need t

Re: Review Request 48356: RFC: Samza as a library

2016-07-12 Thread Chris Pettitt
> On June 27, 2016, 6:53 p.m., Chris Pettitt wrote: > > A few more thoughts below. > > > > Still not a fan of the direction we're going with the config. I know it is > > status quo, but it further locks us into a limited model. One other benefit > > of

Re: Review Request 48356: RFC: Samza as a library

2016-07-12 Thread Chris Pettitt
rently? If so, let's document this. - Chris Pettitt On July 12, 2016, midnight, Navina Ramesh wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.

Re: Review Request 49877: SAMZA-972: Holistic memory monitoring for SamzaContainer

2016-07-12 Thread Chris Pettitt
> On July 11, 2016, 6:47 p.m., Chris Pettitt wrote: > > Very high level question: I assume you looked at `ps -o rss` and > > disqualified it for some reason. Could you elaborate as to why? `ps` itself > > is certainly more portable than procfs (though `-o rss` is not p

Re: Review Request 49877: SAMZA-972: Holistic memory monitoring for SamzaContainer

2016-07-12 Thread Chris Pettitt
343> In the code you copied this from I prefix the thread name with "Samza-". The reason to do that is it becomes much easier when looking at a thread dump to determine if the thread is from Samza or thirdy party code. I'd recommend indicating this is a Samza thread someh

Re: Review Request 49877: SAMZA-972: Holistic memory monitoring for SamzaContainer

2016-07-11 Thread Chris Pettitt
` and disqualified it for some reason. Could you elaborate as to why? `ps` itself is certainly more portable than procfs (though `-o rss` is not part of the POSIX standard) - it works on RHEL and OSX. It also gives you the actual memory usage vs the number of pages. - Chris Pettitt On July 11, 2016

Re: Review Request 48356: RFC: Samza as a library

2016-06-27 Thread Chris Pettitt
che/samza/configbuilder/TestStandaloneConfigBuilder.java (line 50) <https://reviews.apache.org/r/48356/#comment204900> To future proof this a bit you could use AllSspToSingleTaskGrouperFactory.class.getName. Same for the one below. - Ch

Re: Review Request 48243: SAMZA-961: Async tasks and multithreading model

2016-06-16 Thread Chris Pettitt
multi-threaded context. samza-core/src/main/java/org/apache/samza/task/TaskCallbackManager.java (line 94) <https://reviews.apache.org/r/48243/#comment203223> final - Chris Pettitt On June 15, 2016, 11:41 p.m., Xin

Re: Review Request 48243: SAMZA-961: Async tasks and multithreading model

2016-06-16 Thread Chris Pettitt
> On June 15, 2016, 3:08 p.m., Chris Pettitt wrote: > > samza-core/src/main/java/org/apache/samza/task/AsyncRunLoop.java, line 188 > > <https://reviews.apache.org/r/48243/diff/2/?file=1412994#file1412994line188> > > > > Do we need to handle the case th

Re: Review Request 48243: SAMZA-961: Async tasks and multithreading model

2016-06-15 Thread Chris Pettitt
nt as above re. run loop termination. You potentially could even have a abortRunLoop(Throwable) function that would make this super clear. We should sync up on how to coordinate the disk quota changes and this change. - Chris Pettitt On June

Re: Review Request 48243: SAMZA-961: Async tasks and multithreading model

2016-06-15 Thread Chris Pettitt
his is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/48243/ > --- > > (Updated June 9, 2016, 7:49 p.m.) > > > Review request for samza, Chris Pettitt, Navina Ram

Re: Review Request 48080: SAMZA-956: Disk Quotas: Add throttler and disk quota enforcement

2016-06-13 Thread Chris Pettitt
eature to be < 150ns for Linux and OSX Thanks, Chris Pettitt

Re: Review Request 48182: SAMZA-958: Make store/cache thread safe

2016-06-09 Thread Chris Pettitt
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/48182/#review136879 --- Ship it! Ship It! - Chris Pettitt On June 9, 2016, 12:33

Re: Review Request 48182: SAMZA-958: Make store/cache thread safe

2016-06-09 Thread Chris Pettitt
generated e-mail. To reply, visit: > https://reviews.apache.org/r/48182/ > --- > > (Updated June 9, 2016, 12:33 a.m.) > > > Review request for samza, Chris Pettitt, Navina Ramesh, and Yi Pan (Data > Infrastructure). > > >

Re: Review Request 48213: SAMZA-960: Make system producer thread safe

2016-06-09 Thread Chris Pettitt
--- This is an automatically generated e-mail. To reply, visit: https://reviews.apache.org/r/48213/#review136875 --- Ship it! Ship It! - Chris Pettitt On June 8, 2016, 11:53

Re: Review Request 48356: RFC: Samza as a library

2016-06-09 Thread Chris Pettitt
Container.scala (lines 617 - 619) <https://reviews.apache.org/r/48356/#comment201973> I hope this is temporary. If not, if would be nicer to provide proper lifecycle / shutdown mechansisms. For example, on stop, I may want to flush a cache to disk. - Chris Pettitt On June 8, 2016, 9

Re: Review Request 48243: SAMZA-961: Async tasks and multithreading model

2016-06-08 Thread Chris Pettitt
as is, are a bit too heavy. samza-core/src/main/scala/org/apache/samza/container/RunLoop.scala (lines 71 - 75) <https://reviews.apache.org/r/48243/#comment201795> Same comment as for the async run loop. - Chris Pettitt O

Re: Review Request 48213: SAMZA-960: Make system producer thread safe

2016-06-06 Thread Chris Pettitt
es close correctly). Another benefit is that you reduce the number of volatile reads you need to make. - Chris Pettitt On June 3, 2016, 10:09 p.m., Xinyu Liu wrote: > > --- > This is an automatically generated e-mai

Re: Review Request 48182: SAMZA-958: Make store/cache thread safe

2016-06-06 Thread Chris Pettitt
m the main thread? It will give you a much more helpful error message. - Chris Pettitt On June 3, 2016, 9:30 p.m., Xinyu Liu wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > http

Re: Review Request 48213: SAMZA-960: Make system producer thread safe

2016-06-03 Thread Chris Pettitt
le. samza-kafka/src/main/scala/org/apache/samza/system/kafka/KafkaSystemProducer.scala (line 129) <https://reviews.apache.org/r/48213/#comment201105> same re. lock / volatile - Chris Pettitt On June 3, 2016, 5:

Re: Review Request 48182: SAMZA-958: Make store/cache thread safe

2016-06-03 Thread Chris Pettitt
a race here if this code is invoked before the store.put on line 492? - Chris Pettitt On June 2, 2016, 6:33 p.m., Xinyu Liu wrote: > > --- > This is an automatically generated e-mail. To

Re: Review Request 48080: SAMZA-956: Disk Quotas: Add throttler and disk quota enforcement

2016-06-02 Thread Chris Pettitt
reducing them a few lines down in the sleep call. The sleep call is returning the amount of error on the sleep (which may be negative). Previously I actually measured the error here instead of in the sleep call and did the decrement inline. I'm not wed to doing it in the sleep

Re: Review Request 48109: SAMZA-957 Avoid unnecessary KV Store flushes (part 3)

2016-06-01 Thread Chris Pettitt
> On June 1, 2016, 3:20 p.m., Chris Pettitt wrote: > > samza-kv/src/main/scala/org/apache/samza/storage/kv/CachedStore.scala, line > > 179 > > <https://reviews.apache.org/r/48109/diff/1/?file=1402973#file1402973line179> > > > > I think you could even

Review Request 48080: SAMZA-956: Disk Quotas: Add throttler and disk quota enforcement

2016-05-31 Thread Chris Pettitt
existing tests with gradle test - Verified throttling behavior and instrumentation with local deployment - Verified average latency impact of feature to be < 150ns for Linux and OSX Thanks, Chris Pettitt

Re: Review Request 47197: SAMZA-948 CoordinatorSystemStreamConsumer is not threadsafe

2016-05-11 Thread Chris Pettitt
/coordinator/stream/CoordinatorStreamSystemConsumer.java (line 66) <https://reviews.apache.org/r/47197/#comment197035> You actually don't need to wrap emptySet because it's already immutable. - Chris Pettitt On May 11, 2016, 8:13 p.m., Jake Maes

Re: Review Request 47197: SAMZA-948 CoordinatorSystemStreamConsumer is not threadsafe

2016-05-11 Thread Chris Pettitt
ent196969> You only need volatile here (vs. AtomicReference) since you're not using any CAS operation. For full safety, you need to wrap the set in an unmodifiable wrapper. Otherwise it would be possible to modify the set via "read only" methods like getBootstrappedStr

Re: Review Request 46644: SAMZA-889 - Change log not working properly with In memory Store

2016-05-05 Thread Chris Pettitt
elieve javadoc is not generated for private members by default). - Chris Pettitt On May 5, 2016, 9:13 p.m., Navina Ramesh wrote: > > --- > This is an automatically generated e-mail. To reply, visit: > https://reviews.

Re: Review Request 46644: SAMZA-889 - Change log not working properly with In memory Store

2016-05-05 Thread Chris Pettitt
working with history (e.g. git bisect). - Chris Pettitt On April 25, 2016, 5:25 p.m., Navina Ramesh wrote: > > --- > This is an automatically generated e-mail. To reply, visi

Re: Review Request 46856: SAMZA-943 Occasional test failure: TestStreamPartitionCountMonitor.testStartStopBehavior

2016-05-03 Thread Chris Pettitt
> On May 3, 2016, 5:23 p.m., Chris Pettitt wrote: > > samza-core/src/main/java/org/apache/samza/coordinator/StreamPartitionCountMonitor.java, > > line 107 > > <https://reviews.apache.org/r/46856/diff/2/?file=1367775#file1367775line107> > > > > This is

Re: Review Request 46856: SAMZA-943 Occasional test failure: TestStreamPartitionCountMonitor.testStartStopBehavior

2016-05-03 Thread Chris Pettitt
ient. I think you're checking the terminated flag for test, but the test code is already doing a timed wait, so you could switch it to `assertTrue(monitor.awaitTermination(...))` for the same effect. - Chris Pettitt On April 29, 201

Re: Review Request 45258: Abandon producer retry after a certain # of errors : SAMZA-911

2016-04-14 Thread Chris Pettitt
an implicit assumption that retryBackoff is providing a happens-before constraint between an invocation of the exception handler in retryBackoff and a subsequent invocation of the loop. This likely always holds, but a CAS for numRetries would cover cases where that does not hold. - Chris Pettit

Re: Review Request 44920: SAMZA-680 Refactor the Samza AppMaster to support other cluster managers

2016-04-14 Thread Chris Pettitt
; This is an automatically generated e-mail. To reply, visit: > https://reviews.apache.org/r/44920/ > --- > > (Updated April 12, 2016, 11:45 p.m.) > > > Review request for samza, Boris Shkolnik, Chris Pettitt, Jake Mae

Re: Review Request 45190: SAMZA-910 Fix expired request test in HostAwareContainerAllocator

2016-04-08 Thread Chris Pettitt
r than killing off several LoC) is that you get a timeout for the whole process versus timeouts on individual steps. Looks very good! I especially like the more intuitive host names. - Chris Pettitt On April 8, 2016, 3:02 p.m

Review Request 45504: SAMZA-924: Add disk space monitoring

2016-03-30 Thread Chris Pettitt
. Thanks, Chris Pettitt

Re: Review Request 44920: Remove tight coupling of Samza with Yarn. Define APIs for resource manager integration

2016-03-19 Thread Chris Pettitt
> On March 16, 2016, 11:10 p.m., Chris Pettitt wrote: > > Some more comments Sorry, did not mean to create issues for all of the below. I think #1 and #2 are the most interesting to look at of the group.

Re: Review Request 44920: Remove tight coupling of Samza with Yarn. Define APIs for resource manager integration

2016-03-19 Thread Chris Pettitt
.org/r/44920/#comment186274> Minor: may as well use a diamond here since you're using it on the next line. - Chris Pettitt On March 16, 2016, 6:23 p.m., Jagadish Venkatraman wrote: > > --- > This is an automatically gen

Re: Review Request 44920: Remove tight coupling of Samza with Yarn. Define APIs for resource manager integration

2016-03-19 Thread Chris Pettitt
://reviews.apache.org/r/44920/#comment186254> This should be volatile, especially as I believe it is being used to convey state across threads (e.g. whatever calls onContainerCompleted and the main thread in ClusterBasedJobCoordinator). - Chris Pettitt On March 16, 2016, 6:23 p.m., Jagadi

Re: Review Request 44920: Remove tight coupling of Samza with Yarn. Define APIs for resource manager integration

2016-03-18 Thread Chris Pettitt
> On March 16, 2016, 8:48 p.m., Chris Pettitt wrote: > > samza-core/src/main/java/org/apache/samza/clustermanager/SamzaAppState.java, > > line 36 > > <https://reviews.apache.org/r/44920/diff/1/?file=1301270#file1301270line36> > > > > Agreed. This i